
Wednesday, 31 March 2021

How to Install and Uninstall Java on Ubuntu

Step 1: Installing Java on Ubuntu

You can install one or several Java packages, and you can choose which version you want on your system by installing a specific version number. The current default and LTS version is Java 11.

Install OpenJDK

1. Open the terminal (Ctrl+Alt+T) and update the package repository to ensure you download the latest software version:

sudo apt update

2. Then, install the default Java Development Kit (JDK) with the following command:

sudo apt install default-jdk

Install Specific Version of OpenJDK

You may decide to use OpenJDK 8 instead of the default OpenJDK 11.

To do so, open the terminal and type in the following command:

sudo apt install openjdk-8-jdk

Once the installation process is complete, verify the current Java version:

java -version; javac -version

You can find the OpenJDK installation directory with the following command:

readlink -f /usr/bin/javac
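The output is the full path to the javac binary, for example (the exact path depends on the version installed):

/usr/lib/jvm/java-11-openjdk-amd64/bin/javac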

How to Set Default Java Version

As you can have multiple versions of Java installed on your system, you can decide which one is the default. First, run a command that shows all the installed versions on your computer:

sudo update-alternatives --config java
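The output lists every installed Java version with a selection number, along the lines of the following (the exact paths, priorities, and versions depend on what you have installed):

  Selection    Path                                            Priority   Status
------------------------------------------------------------
* 0            /usr/lib/jvm/java-11-openjdk-amd64/bin/java      1111      auto mode
  1            /usr/lib/jvm/java-11-openjdk-amd64/bin/java      1111      manual mode
  2            /usr/lib/jvm/java-8-openjdk-amd64/bin/java       1081      manual mode

Type the selection number of the version you want as the default and press Enter.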

Step 2: Uninstall Java on Ubuntu

In case you need to remove any of the installed Java packages, use the apt remove or apt purge command. To remove OpenJDK 11, run the command:

sudo apt remove default-jdk

To uninstall OpenJDK 8:

sudo apt remove openjdk-8-jdk

2nd Method:

1. List the installed JDK packages:

sudo dpkg --list | grep -i jdk

2. Purge the package you want to remove, for example:

sudo apt-get purge oracle-java8-installer

Friday, 26 March 2021

Introduction to Big Data and Why to use Big Data Technology

Data :

The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.

What is Big Data :

Big Data is a term used to describe a collection of data that is huge in size and yet growing exponentially with time.

It is data of such large size and complexity that none of the traditional data management tools can store or process it efficiently.

Examples Of Big Data

The New York Stock Exchange generates about one terabyte (TB) of new trade data per day.

Statistics show that 500+ terabytes of new data get ingested into the databases of the social media site Facebook every day. This data is mainly generated in the form of photo and video uploads, message exchanges, comments, etc.



[Image: different languages used in the computer science era and their uses]

Hadoop Founders :

Hadoop was created by Doug Cutting and Mike Cafarella.

When to use Hadoop :

[Image: when to use Hadoop]

Wednesday, 24 March 2021

How to connect to Windows 10 using OpenSSH Server

Step 1: Click on Start and then type Apps & features in the search box

Step 2: Click on Optional Features under Apps & features

Step 3: Click on OpenSSH Server and install it

Step 4: Click on Start and search for the Services app

Step 5: Click on OpenSSH Server, change the startup type from Manual to Automatic, and then click on Start

Step 6: Now connect to the OpenSSH server from a command prompt and enter the password when prompted. The OpenSSH server can now be accessed.
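For example, from a command prompt (the user name and IP address below are placeholders; substitute your Windows account name and the machine's address):

ssh your_username@192.168.1.10

Enter the Windows account password when prompted.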


Tuesday, 23 March 2021

Big Data Characteristics

Big Data Characteristics : (Volume, Velocity, Variety, Veracity and Value)

For a dataset to be considered Big Data, it must possess one or more characteristics that require accommodation in the solution design and architecture of the analytic environment.

This section explores the five Big Data characteristics that can be used to help differentiate data categorized as “Big” from other forms of data.



1.Volume :

The name ‘Big Data’ itself is related to a size which is enormous.

Volume refers to a huge amount of data.

The size of data plays a very crucial role in determining its value. Whether particular data can actually be considered Big Data or not depends upon the volume of that data.

Hence, while dealing with Big Data, it is necessary to consider the characteristic ‘Volume’.

Example: Typical data sources that are responsible for generating high data volumes can include:

• online transactions, such as point-of-sale and banking

• scientific and research experiments

• sensors, such as Global Positioning System (GPS) devices, RFIDs, smart meters and telematics

• social media, such as Facebook and Twitter

2.Velocity :

Velocity refers to the high speed of accumulation of data.

In Big Data, velocity refers to data flowing in from sources like machines, networks, social media, mobile phones, etc.

There is a massive and continuous flow of data. Velocity determines the potential of data: how fast the data is generated and how fast it must be processed to meet the demands.

Sampling data can help in dealing with issues like ‘velocity’.

Example: More than 3.5 billion searches per day are made on Google. Also, Facebook users are increasing by approximately 22% year over year.

3.Variety :

It refers to the nature of data: structured, semi-structured and unstructured.

It also refers to heterogeneous sources.

Variety is basically the arrival of data from new sources, both inside and outside of an enterprise. It can be structured, semi-structured or unstructured.

Structured data: This is basically organized data. It generally refers to data whose length and format are defined.

Semi-structured data: This is basically semi-organized data. It is generally a form of data that does not conform to the formal structure of data. Log files are an example of this type of data.

Unstructured data: This basically refers to unorganized data. It generally refers to data that doesn’t fit neatly into the traditional row-and-column structure of a relational database. Texts, pictures, videos etc. are examples of unstructured data, which can’t be stored in the form of rows and columns.


Note: JavaScript Object Notation (JSON) is a standard text-based format for representing structured data based on JavaScript object syntax. It is commonly used for transmitting data in web applications (e.g., sending some data from the server to the client, so it can be displayed on a web page, or vice versa).
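For instance, a minimal JSON document describing a hypothetical customer record (the field names here are made up for illustration) looks like this:

{
  "name": "Ravi",
  "age": 30,
  "orders": [
    { "id": 101, "total": 250.0 },
    { "id": 102, "total": 99.5 }
  ]
}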

4.Veracity :

It refers to inconsistencies and uncertainty in data; that is, the available data can sometimes get messy, and its quality and accuracy are difficult to control.

Data that is acquired in a controlled manner, for example via online customer registrations, usually contains less noise than data acquired via uncontrolled sources, such as blog postings.

Example: Data in bulk can create confusion, whereas a smaller amount of data can convey only half or incomplete information.


5. Value:

After taking the four Vs into account, there comes one more V, which stands for Value. Bulk data that has no value is of no good to a company unless it is turned into something useful.

Data in itself is of no use or importance; it needs to be converted into something valuable in order to extract information. Hence, you can state that Value is the most important V of all the 5 Vs.

Big Data Analytics

Big data analytics examines large and varied types of data to uncover hidden patterns, correlations, and other insights. Big Data Analytics is largely used by companies to facilitate their growth and development. It mainly involves applying various data mining algorithms to a given dataset, which then aids better decision making.

Stages in Big Data Analytics

These are the stages involved in the Big Data Analytics process:

[Image: stages in the Big Data Analytics process]

There are four general categories of analytics that are distinguished by the results they produce:

1. Descriptive Analytics        .......... hindsight

2. Diagnostic Analytics         ..........  insight

3. Predictive Analytics          ..........  insight

4. Prescriptive Analytics      ..........  foresight


Note:

Online Analytical Processing (OLAP) –
Online Analytical Processing is a category of software tools used for data analysis to support business decisions. OLAP provides an environment for getting insights from data retrieved from multiple database systems at one time.

Examples – Any type of data warehouse system is an OLAP system. Uses of OLAP are as follows:

  • Spotify analyzes the songs its users listen to in order to come up with a personalized homepage of songs and playlists.

  • Netflix movie recommendation system.

Online Transaction Processing (OLTP) –
Online transaction processing supports transaction-oriented applications in a 3-tier architecture. OLTP administers the day-to-day transactions of an organization.

Examples – Uses of OLTP are as follows:

  • An ATM center is an OLTP application.

  • OLTP handles the ACID properties during data transactions via the application.

  • It is also used for online banking, online airline ticket booking, sending a text message, and adding a book to a shopping cart.

1. Descriptive Analytics :

Descriptive analytics is carried out to answer questions about events that have already occurred. This form of analytics contextualizes data to generate information.

The reports are generally static in nature and display historical data presented in the form of data grids or charts. Queries are executed on operational data stores from within an enterprise, for example an Online Transaction Processing (OLTP) system, a Customer Relationship Management (CRM) system, or an Enterprise Resource Planning (ERP) system.



Sample questions can include:

• What was the sales volume over the past 12 months?

• What is the number of support calls received as categorized by severity and geographic location?

• What is the monthly commission earned by each sales agent?

It is estimated that 80% of generated analytics results are descriptive in nature.

2. Diagnostic Analytics :

Diagnostic Analytics aims to determine the cause of a phenomenon that occurred in the past using questions that focus on the reason behind the event. 

The goal of this type of analytics is to determine what information is related to the phenomenon in order to enable answering questions that seek to determine why something has occurred.

Diagnostic analytics usually requires collecting data from multiple sources and storing it in a structure that lends itself to performing drill-down and roll-up analysis.

Diagnostic analytics results are viewed via interactive visualization tools that enable users to identify trends and patterns. The executed queries are more complex compared to those of descriptive analytics and are performed on multidimensional data held in analytic processing systems.




Such questions include:
• Why were Q2 sales less than Q1 sales?
• Why have there been more support calls originating from the Eastern region than from the Western region?
• Why was there an increase in patient readmission rates over the past three months?

3. Predictive Analytics :
Predictive analytics is carried out in an attempt to determine the outcome of an event that might occur in the future. With predictive analytics, information is enhanced to generate knowledge that conveys how that information is related.

It is important to understand that the models used for predictive analytics have implicit dependencies on the conditions under which the past events occurred. If these underlying conditions change, then the models that make predictions need to be updated.

Predictive analytics tries to predict the outcomes of events, and predictions are made based on patterns, trends, and exceptions found in historical and current data. This can lead to the identification of both risks and opportunities.

This kind of analytics involves the use of large datasets comprised of internal and external data and various data analysis techniques.



Questions are usually formulated as follows:
• What are the chances that a customer will default on a loan if they have missed a monthly payment?
• What will be the patient survival rate if Drug B is administered instead of Drug A?
• If a customer has purchased Products A and B, what are the chances that they will also purchase Product C?

4. Prescriptive Analytics :
Prescriptive analytics builds upon the results of predictive analytics by prescribing actions that should be taken. 

Prescriptive analytics provides more value than any other type of analytics and correspondingly requires the most advanced skillset, as well as specialized software and tools.

Internal data might include current and historical sales data, customer information, product data, and business rules. 
External data may include social media data, weather forecasts and government-produced demographic data. 

Prescriptive analytics involves the use of business rules and large amounts of internal and external data to simulate outcomes and prescribe the best course of action.



Sample questions may include:
• Among three drugs, which one provides the best results?
• When is the best time to trade a particular stock?

Business Intelligence (BI) :
BI enables an organization to gain insight into the performance of an enterprise by analyzing data generated by its business processes and information systems.
BI applies analytics to large amounts of data across the enterprise, which has typically been consolidated into an enterprise data warehouse to run analytical queries.
The output of BI can be surfaced to a dashboard that allows managers to access and analyze the results and potentially refine the analytic queries to further explore the data.

Tools used in Big Data Analytics

[Image: tools used in Big Data Analytics]


How to install the Ubuntu operating system on Windows 10 without using VirtualBox or any other third-party software.


Step 1:
Open Control Panel and then click on Programs


Step 2:
Click on Turn Windows features on or off and then select Windows Subsystem for Linux



Step 3:
Click on Start and then type Microsoft Store in the search box

Step 4:
Go to the Microsoft Store, search for Ubuntu (use the app with the red logo, not the blue one), and install it


After installing Ubuntu, type this command to check the Ubuntu version:

lsb_release -a


Wednesday, 17 March 2021

HADOOP COMMANDS - hadoop fs or hdfs dfs

  1. root
  2. version of ubuntu/java
  3. how to add user in ubuntu
  4. how to delete user in ubuntu
  5. how to check the user's list in ubuntu
  6. how to distinguish the local file system from the distributed file system
  7. list out all the commands used in hadoop 
  8. fsck
  9. hadoop version
  10. ls/lsr
  11. mkdir
  12. touchz / appendToFile
  13. copyFromLocal / put
  14. copyToLocal / get
  15. mv 
  16. cp
  17. chown
  18. chgrp
  19. setrep 
  20. du
  21. df
  22. stat
  23. help
  24. count

1. root:

How to check whether you are root or not:

id -u

(This prints your numeric user ID; 0 means you are root.)

2. version of ubuntu/java:

How to check the version of Ubuntu:

lsb_release -a

How to check the Java version on Ubuntu:

java -version; javac -version

3. add user:

How to add a user in Ubuntu:

sudo adduser usrk

4. delete user:

How to delete a user in Ubuntu:

sudo deluser usrk

This removes the user account but leaves the user's home directory behind; to remove the home directory as well, use

sudo rm -r /home/usrk


5. check:
How to check the user list in Ubuntu (each user has a directory under /home):

ls /home/

6. local file system & distributed file system:
Local file system commands are ordinary shell commands, for example:

hduser@rk-virtual-machine:~$ ls

Distributed file system commands in hadoop go through hdfs dfs (or hadoop fs), for example:

hduser@rk-virtual-machine:~$ hdfs dfs -ls /

7. list out all the commands used in hadoop:

How to see all the commands available in hadoop:

hadoop fs -help

or simply:

hadoop fs

 
8. fsck:
How to check whether the file system is healthy or not:
hdfs fsck /
     or
hadoop fsck /

In this example, we are trying to check the health of the files in the ‘test’ directory present in HDFS using the fsck command.
Usage: hadoop fsck <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]
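For example, to check the health of the ‘test’ directory and print its files, blocks, and block locations (the directory name is just an example):

hadoop fsck /test -files -blocks -locations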

 
9. hadoop version:
How to check the hadoop version:
Usage: hadoop version

hadoop version
         or
hdfs version
10. ls/lsr:
How to list the contents of the HDFS root:
Usage: hadoop fs -ls /path

hadoop fs -ls /
         or
hdfs dfs -ls /

To list recursively (-lsr is deprecated in favour of -ls -R):

hadoop fs -lsr /  OR  hadoop fs -ls -R /
         or
hdfs dfs -ls -R /
 

11. mkdir:
How to create a directory in hadoop:
Usage: hadoop fs -mkdir /path/directory_name

hadoop fs -mkdir /test
         or
hdfs dfs -mkdir /test

To create nested directories in one step, use -p:

hadoop fs -mkdir -p /ram/sita
          or
hdfs dfs -mkdir -p /ram/sita

 

12. touchz / appendToFile:
How to create a zero-byte file and a non-zero-byte file in hadoop:

Usage: hadoop fs -touchz /directory/filename

1) hadoop fs -touchz /virat.txt          (zero-byte file)

                  or

 hdfs dfs -touchz /virat.txt

2) hadoop fs -appendToFile - /dhoni.txt    (non-zero-byte file)
                or
    hdfs dfs -appendToFile - /dhoni.txt

In this case you need to type some text, for example “dhoni is a boy”, and then press Ctrl+D to finish.
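You can also pipe the text in instead of typing it interactively, since “-” tells appendToFile to read from standard input:

echo "dhoni is a boy" | hadoop fs -appendToFile - /dhoni.txt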

 

13. copyFromLocal / put:
How to copy a file into hadoop with -copyFromLocal or -put (local to Hadoop):

Usage: hadoop fs -copyFromLocal <localsrc> <hdfs destination>

1) hadoop fs -copyFromLocal tanmai.txt /chiru
                  or

hdfs dfs -copyFromLocal tanmai.txt /chiru

Usage: hadoop fs -put <localsrc> <hdfs destination>

2) hadoop fs -put sita.txt /chiru
                or
    hdfs dfs -put sita.txt /chiru

Note:
This puts the file sita.txt from the local file system into hadoop. You need to create the file in the local file system first, and then copy it to the distributed file system with copyFromLocal (or put).
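A minimal end-to-end sketch (assuming the /chiru directory already exists in HDFS; the file name and contents are just examples):

echo "hello" > sita.txt
hadoop fs -put sita.txt /chiru
hadoop fs -ls /chiru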


14. copyToLocal / get:
How to copy a file out of hadoop with -copyToLocal or -get (Hadoop to local):

Usage: hadoop fs -copyToLocal <hdfs source> <localdst>
1) hadoop fs -copyToLocal /chiru/tanmai.txt /home/hduser/ramcharan/
                  or

hdfs dfs -copyToLocal /chiru/tanmai.txt /home/hduser/ramcharan/

Usage: hadoop fs -get <hdfs source> <localdst>

2) hadoop fs -get /chiru/hari.txt /home/hduser/ramcharan/
                or
hdfs dfs -get /chiru/hari.txt /home/hduser/ramcharan/

Note:
This gets the file hari.txt from hadoop into the local file system. You need to create the destination directory (ramcharan) in the local file system first, and then run copyToLocal (or get).
 
15. mv:
How to use the mv command in hadoop:
The HDFS mv command moves files or directories from a source to a destination within HDFS.

Usage: hadoop fs -mv <src> <dest>

hadoop fs -mv /chiru/hari.txt /rohith/
                or
hdfs dfs -mv /chiru/hari.txt /rohith/
 

16. cp:
How to use the cp command in hadoop:
The cp command copies a file from one directory to another within HDFS.

Usage: hadoop fs -cp <src> <dest>

hadoop fs -cp /rohith/ram.txt /chiru/
                    or
hdfs dfs -cp /rohith/ram.txt /chiru/

17. chown:
How to use the chown command in hadoop:
Here we are changing the owner of a file named ‘sample’ using the chown command.
Usage: hadoop fs -chown [-R] [owner][:[group]] <path>
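For example, to make hduser the owner and hadoop the group of the file (the user and group names are assumptions; they must exist on your system):

hadoop fs -chown hduser:hadoop /sample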

18. chgrp:
How to use the chgrp command in hadoop:
The Hadoop fs shell command chgrp changes the group of the file specified in the path.
The user must be the owner of the file or a superuser.
Usage: hadoop fs -chgrp <group> <path>
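For example, to change the group of hari.txt to hadoop (assuming such a group exists):

hadoop fs -chgrp hadoop /chiru/hari.txt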

19. setrep:
How to use the setrep command in hadoop:
Here we are trying to change the replication factor of the ‘ram.txt’ and ‘mahesh.txt’ files present in the test directory on the HDFS filesystem.
Usage: hadoop fs -setrep [-w] <rep> <path>
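For example, to set the replication factor of ram.txt to 2, with -w waiting until the replication completes:

hadoop fs -setrep -w 2 /test/ram.txt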

20. du:
How to use the du command in hadoop:
The Hadoop fs shell command du prints a summary of the amount of disk usage of all files/directories in the path; the -s option prints an aggregate summary instead of per-file figures.
Usage: hadoop fs -du [-s] /directory/filename

hdfs dfs -du /chiru
hdfs dfs -du -s /chiru

21. df:
How to use the df command in hadoop:
The Hadoop fs shell command df shows the capacity, size, and free space available on the HDFS file system.
The -h option formats the file size in a human-readable format.
Usage: hadoop fs -df [-h] <path>
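For example, to show the whole file system's capacity and free space in human-readable units:

hadoop fs -df -h /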

22. stat:
How to use the stat command in hadoop:
The Hadoop fs shell command stat prints statistics about the file or directory in the specified format. In the example below, we use the stat command to print information about the file ‘mahesh.txt’ present in the test directory of HDFS.
Usage: hadoop fs -stat [format] <path>

Formats:

%b –    file size in bytes
%g –    group name of owner
%n –    file name
%o –    block size
%r  –    replication
%u –    user name of owner

%y –    modification date
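For example, to print the name, size in bytes, replication factor and modification date of mahesh.txt using the formats listed above:

hadoop fs -stat "%n %b %r %y" /test/mahesh.txt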

23. help:
How to use the help command in hadoop:
The Hadoop fs shell command help shows help for all the commands or for the specified command.
Usage: hadoop fs -help [command]
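For example, to show help for a single command:

hadoop fs -help count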

24. count:
How to use the count command in hadoop:
Usage: hadoop fs -count [options] <path>

The Hadoop fs shell command count counts the number of files, directories, and bytes under the paths that match the specified file pattern.

Options:
-q  –  shows quotas (a quota is the hard limit on the number of names and amount of space used for individual directories)
-u  –  limits output to show quotas and usage only
-h  –  shows sizes in a human-readable format
-v  –  shows a header line
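For example, to show quotas and sizes in a human-readable format with a header line (the -v option is available on recent Hadoop versions):

hadoop fs -count -q -h -v /chiru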

Access Hadoop UI from Browser:

Hadoop NameNode: Use your preferred browser and navigate to your localhost URL or IP. The default port number 9870 gives you access to the Hadoop NameNode UI:

http://localhost:9870

Hadoop DataNode: The default port 9864 is used to access individual DataNodes directly from your browser:

http://localhost:9864

YARN Resource Manager: The YARN Resource Manager is accessible on port 8088:

http://localhost:8088

Friends-of-friends Map Reduce program