BigData: Chaining MapReduce jobs - Joining data from different sources

Tuesday, 11 April 2023

Chaining MapReduce jobs - Joining data from different sources - Data Flow of Reduce side join:

In data analysis we often need to combine data from two or more different sources.

For example, let's take two comma-separated files:

1. Customers file with three fields: Customer ID, Name, and Phone Number. We put four records in the file for illustration:

C_ID	Name	Phone_No
1	Ram	8977101699
2	Rani	8977101688
3	Vani	8977101677
4	Dhoni	8977101666

2. Order file with four fields: Customer ID, Order ID, Price, and Purchase Date.

C_ID	O_ID	Price	Date
3	A	100	11-05-2020
1	B	200	17-06-2021
2	C	300	19-02-2020
3	D	400	27-06-2021

If we want an inner join of the two data sets above on Customer ID, the desired output would look as listed below. Customer 4 (Dhoni) has no orders and therefore does not appear in the result:

C_ID	Name	Phone_No	O_ID	Price	Date
1	Ram	8977101699	B	200	17-06-2021
2	Rani	8977101688	C	300	19-02-2020
3	Vani	8977101677	A	100	11-05-2020
3	Vani	8977101677	D	400	27-06-2021

Data Flow of Reduce side join:
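
As a rough illustration of that flow, here is a minimal single-process sketch in Python. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, reduce_side_join) and the source tags "customers" and "orders" are assumptions made up for this example. The idea it shows is the usual reduce-side join pattern: the map step tags every record with its source and emits the join key (C_ID), the shuffle step groups all values that share a key, and the reduce step pairs every customer record with every order record in the same group.

```python
from collections import defaultdict
from itertools import product

# The two comma-separated inputs from the example above.
CUSTOMERS = """1,Ram,8977101699
2,Rani,8977101688
3,Vani,8977101677
4,Dhoni,8977101666"""

ORDERS = """3,A,100,11-05-2020
1,B,200,17-06-2021
2,C,300,19-02-2020
3,D,400,27-06-2021"""


def map_phase(lines, tag):
    """Emit (join_key, (tag, remaining_fields)) for every input record."""
    for line in lines.splitlines():
        key, *rest = line.split(",")
        yield key, (tag, rest)


def shuffle(pairs):
    """Group mapper output by join key, standing in for the framework's shuffle/sort."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items(), key=lambda item: item[0])


def reduce_phase(key, values):
    """Inner join: pair each customer record with each order record for this key."""
    customers = [fields for tag, fields in values if tag == "customers"]
    orders = [fields for tag, fields in values if tag == "orders"]
    # If either side is empty the product is empty, so the key is dropped.
    for customer, order in product(customers, orders):
        yield [key, *customer, *order]


def reduce_side_join():
    """Run the map, shuffle and reduce steps over both inputs."""
    mapped = list(map_phase(CUSTOMERS, "customers"))
    mapped += list(map_phase(ORDERS, "orders"))
    joined = []
    for key, values in shuffle(mapped):
        joined.extend(reduce_phase(key, values))
    return joined


if __name__ == "__main__":
    for row in reduce_side_join():
        print("\t".join(row))
```

Run as a script, this prints the four joined rows from the table above. Key 4 reaches the reduce step with a customer record but no order record, so nothing is emitted for it, which is what makes this an inner join.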
