BigData: February 2022

Thursday, 24 February 2022

Hive-Joins,Subqueries

Step-1:

create database joins;

show databases;

Step-2:

create two tables namely table 1 named as sales and table2 named as products.

Step-3:

create two tables sales and product in the database named by join

sales table with two fields name and id

product table with two fields pid and pname.

Table 1: sales

name sid

raju 2

rani 3

vani 5

kani 7

poni 9

Table 2: product

pid pname

2 laptop

3 desktop

6 headphones

7 mobile

8 ipad

syntax to create table sales :

create table sales(name string,id int)

row format delimited

fields terminated by '\t';

syntax to create table product:

create table product(pid int,pname string)

row format delimited

fields terminated by '\t';

Load both the tables as shown :

INNER JOIN or JOIN:

The records common to the both tables will be retrieved by this inner-join the simplest kind of join is the inner join, where each match in the input table results in a row in the output

syntax: select table1.col1, table2.col2

from table2 join table1

on (table1.matching-col = table2.matching-col);

Example 1:

hive> select sales.name,product.pname
> from product join sales
> on(sales.sid = product.pid);

Example 2:
hive> select sales.*,product.*
> from product JOIN sales
> on(sales.sid = product.pid);

Example 3:

hive> select sales.*,product.*
> from sales JOIN product
> on(sales.sid = product.pid);

Example 4:

hive> select sales.*,product.*
> from sales INNER JOIN product
> on(sales.sid = product.pid);

LEFT OUTER JOIN:

This type of join returns all rows from the left table even if there is no matching row in the right table.Table returns all rows from the left table and matching rows from right table, unmatched right tables records will be null

syntax: select table1.col1,table2.col2

from table1 LEFT OUTER JOIN table2

on(table1.matching-col = table2.matching-col);

Example :

hive> select sales.*,product.*
> from sales LEFT OUTER JOIN product
> on(sales.sid = product.pid);

RIGHT OUTER JOIN:

It returns all rows from the right table even if there is no matching row in the left table. Table returns all rows from the right table and matching rows from left table,unmatched left table records will be null

syntax: select table1.col1,table2.col2

from table1 right outer join table2

on(table1.matching-col=table2.matching-col);

Example:

hive> select sales.*,product.*
> from sales RIGHT OUTER JOIN product
> on(sales.sid = product.pid);

FULL OUTER JOIN :

It returns all rows from the both tables that fulfill the join condition the unmatched rows from both tables will be returned as a null

syntax: select table 1.col1,table1.col2,table2.col1,table2.col2

from table1 full outer join table2

on(table1.matching-col = table2.matching-col);

Example:

hive> select sales.*,product.*

> from sales FULL OUTER JOIN product

> on(sales.sid = product.pid);

SUBQUERIES:

Step 1:

hive> create table std(sno int, sname string, scity string)
> row format delimited
> fields terminated by '\t';

hive> desc std;
OK
sno                     int
sname                   string
scity                   string
Time taken: 0.042 seconds, Fetched: 3 row(s)

hive> load data local inpath '/home/hduser/Desktop/std' into table std;
Loading data to table joins.std
OK
Time taken: 0.714 seconds

hive> select * from std;
OK
11    raju    vsp
22    rani    hyd
33    vani    vzm
44    kani    rjm
55    poni    ban

Step 2:

hive> create table dept(dno string, dname string, sno int)
> row format delimited
> fields terminated by '\t';
OK
Time taken: 0.079 seconds

hive> desc dept;
OK
dno                     string
dname                   string
sno                     int
Time taken: 0.041 seconds, Fetched: 3 row(s)

hive> load data local inpath '/home/hduser/Desktop/dept' into table dept;
Loading data to table joins.dept
OK
Time taken: 0.198 seconds

hive> select * from dept;
OK
d1   cse   11
d2   it   22
d3   ece   33
d4   mpc   44
d5   msc   55
Time taken: 0.138 seconds, Fetched: 5 row(s)

Step 3:

hive> select dno from dept where dname='mpc';
OK
d4
Time taken: 0.177 seconds, Fetched: 1 row(s)

hive> select dno,sno from dept where dname='mpc';
OK
d4 44
Time taken: 0.113 seconds, Fetched: 1 row(s)

hive> select sname from std where sno=(select sno from dept where dname='mpc');

hive> select sname,scity from std where sno=(select sno from dept where dname='mpc');

Monday, 21 February 2022

Data warehouse vs Data Mining

What is Big Data Analytics?

Basically, Big Data Analytics is largely used by companies to facilitate their growth and development. This majorly involves applying various data mining algorithms on the given set of data, which will then aid them in better decision making. There are multiple tools for processing Big Data such as Hadoop, Pig, Hive, Cassandra, Spark, Kafka, etc. depending upon the requirement of the organization.

What is Data warehouse?

A data warehouse is a technique for collecting and managing data from varied sources to provide meaningful business insights.

Data Warehouse is electronic storage of a large amount of information by a business which is designed for query and analysis instead of transaction processing. It is a process of transforming data into information and making it available to users for analysis.

What Is Data Mining?

Data mining is looking for hidden, valid, and potentially useful patterns in huge data sets. Data Mining is all about discovering unsuspected/ previously unknown relationships amongst the data.

It is a multi-disciplinary skill that uses machine learning, statistics, AI and database technology.

The insights extracted via Data mining can be used for marketing, fraud detection, and scientific discovery, etc.