Pages

Thursday 24 February 2022

Hive-Joins,Subqueries

Step-1:
create database joins;
show databases;

 

Step-2:
create two tables namely table 1 named as sales and table2 named as products. 


Step-3:
create two tables sales and product in the database named by join
sales table with two fields name and id 
product table with two fields pid and pname.

Table 1: sales

name sid
raju    2
rani    3
vani    5
kani    7
poni    9

Table 2: product

pid    pname
2    laptop
3    desktop    
6    headphones
7    mobile
8    ipad

syntax to create table sales :

create table sales(name string,id int)
row format delimited
fields terminated by '\t';

syntax to create table product:

create table product(pid int,pname string)
row format delimited
fields terminated by '\t';
 
Load both the tables as shown :

INNER JOIN or JOIN: 

The records common to the both tables will be retrieved by this inner-join the simplest kind of join is the inner join, where each match in the input table results in a row in the output
 
syntax: select table1.col1, table2.col2
from table2 join table1
on (table1.matching-col = table2.matching-col);
 
Example 1:
hive> select sales.name,product.pname
    > from product join sales
    > on(sales.sid = product.pid);

 

Example 2:
hive> select sales.*,product.*
    > from product JOIN sales
   > on(sales.sid = product.pid);
 

Example 3:
hive> select sales.*,product.*
    > from sales JOIN product
    > on(sales.sid = product.pid);

Example 4:
hive> select sales.*,product.*
    > from sales INNER JOIN product
    > on(sales.sid = product.pid);


LEFT OUTER JOIN:
This type of join returns all rows from the left table even if there is no matching row in the right table.Table returns all rows from the left table and matching rows from right table, unmatched right tables records will be null
 
syntax: select table1.col1,table2.col2
from table1 LEFT OUTER JOIN table2
on(table1.matching-col = table2.matching-col);

Example :
hive> select sales.*,product.*
    > from sales LEFT OUTER JOIN product
    > on(sales.sid = product.pid);
RIGHT OUTER JOIN:
It returns all rows from the right table even if there is no matching row in the left table. Table returns all rows from the right table and matching rows from left table,unmatched left table records will be null
 
syntax: select table1.col1,table2.col2
from table1 right outer join table2
on(table1.matching-col=table2.matching-col);
 
Example:
hive> select sales.*,product.*
    > from sales RIGHT OUTER JOIN product
    > on(sales.sid = product.pid);

FULL OUTER JOIN :

It returns all rows from the both tables that fulfill the join  condition the unmatched rows from both tables will be returned as a null

syntax: select table 1.col1,table1.col2,table2.col1,table2.col2
from table1 full outer join table2 
on(table1.matching-col = table2.matching-col);
 
Example:
hive> select sales.*,product.*
    > from sales FULL OUTER JOIN product
    > on(sales.sid = product.pid);


SUBQUERIES:
 
Step 1: 
hive> create table std(sno int, sname string, scity string)
    > row format delimited
    > fields terminated by '\t';
 
hive> desc std;
OK
sno                     int                                        
sname                   string                                     
scity                   string                                     
Time taken: 0.042 seconds, Fetched: 3 row(s)
 
hive> load data local inpath '/home/hduser/Desktop/std' into table std;
Loading data to table joins.std
OK
Time taken: 0.714 seconds
 
hive> select * from std;
OK
11    raju    vsp
22    rani    hyd
33    vani    vzm
44    kani    rjm
55    poni    ban
 

Step 2:
hive> create table dept(dno string, dname string, sno int)
    > row format delimited
    > fields terminated by '\t';
OK
Time taken: 0.079 seconds
 
hive> desc dept;
OK
dno                     string                                     
dname                   string                                     
sno                     int                                        
Time taken: 0.041 seconds, Fetched: 3 row(s)
 
hive> load data local inpath '/home/hduser/Desktop/dept' into table dept;
Loading data to table joins.dept
OK
Time taken: 0.198 seconds
 
hive> select * from dept;
OK
d1    cse    11
d2    it    22
d3    ece    33
d4    mpc    44
d5    msc    55
Time taken: 0.138 seconds, Fetched: 5 row(s)
 

Step 3:
hive> select dno from dept where dname='mpc';
OK
d4
Time taken: 0.177 seconds, Fetched: 1 row(s)
 
hive> select dno,sno from dept where dname='mpc';
OK
d4    44
Time taken: 0.113 seconds, Fetched: 1 row(s)
 
hive> select sname from std where sno=(select sno from dept where dname='mpc');



hive> select sname,scity from std where sno=(select sno from dept where dname='mpc');


Monday 21 February 2022

Data warehouse vs Data Mining

What is Big Data Analytics?
Basically, Big Data Analytics is largely used by companies to facilitate their growth and development. This majorly involves applying various data mining algorithms on the given set of data, which will then aid them in better decision making. There are multiple tools for processing Big Data such as Hadoop, Pig, Hive, Cassandra, Spark, Kafka, etc. depending upon the requirement of the organization.

What is Data warehouse?
A data warehouse is a technique for collecting and managing data from varied sources to provide meaningful business insights.

Data Warehouse is electronic storage of a large amount of information by a business which is designed for query and analysis instead of transaction processing. It is a process of transforming data into information and making it available to users for analysis.


What Is Data Mining?
Data mining is looking for hidden, valid, and potentially useful patterns in huge data sets. Data Mining is all about discovering unsuspected/ previously unknown relationships amongst the data.

It is a multi-disciplinary skill that uses machine learning, statistics, AI and database technology.

The insights extracted via Data mining can be used for marketing, fraud detection, and scientific discovery, etc.





Friends-of-friends-Map Reduce program

Program to illustrate FOF Map Reduce: import java.io.IOException; import java.util.*; import org.apache.hadoop.conf.Configuration; import or...