Tuesday, 11 April 2023

Chaining MapReduce jobs - Joining data from different sources - Data Flow of Reduce side join:

Data analysis often requires combining data from two or more different sources.

For example, let's take two comma-separated files:
1. Customers file with three fields: Customer ID, Name, and Phone Number. We put four records in the file for illustration.

C_ID    Name     Phone_No
1       Ram      8977101699
2       Rani     8977101688
3       Vani     8977101677
4       Dhoni    8977101666

2. Order file with four fields: Customer ID, Order ID, Price, and Purchase Date.

C_ID    O_ID    Price    Date
3       A       100      11-05-2020
1       B       200      17-06-2021
2       C       300      19-02-2020
3       D       400      27-06-2021

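If we want an inner join of the two data sets above on Customer ID, the desired output would look as listed below. Customer 4 (Dhoni) has no orders and so does not appear, while customer 3 (Vani) has two orders and appears twice.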

C_ID    Name    Phone_No      O_ID    Price    Date
1       Ram     8977101699    B       200      17-06-2021
2       Rani    8977101688    C       300      19-02-2020
3       Vani    8977101677    A       100      11-05-2020
3       Vani    8977101677    D       400      27-06-2021

Data Flow of Reduce side join:
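
As a rough illustration of the usual reduce-side join pattern, the sketch below simulates the three stages in plain Java, without Hadoop: the map step keys each record by C_ID and tags it with its source, the shuffle step groups the tagged records by key, and the reduce step pairs every customer record with every order record that shares the key. The class name ReduceSideJoinSketch, the tags "CUST" and "ORD", and the helper methods are all hypothetical names chosen for this example. A real job would use Hadoop Mapper and Reducer classes instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of a reduce-side join (not Hadoop code).
class ReduceSideJoinSketch {

    // Map: key each record by C_ID (first field) and tag it with its source.
    // Each emitted element is {joinKey, sourceTag, remainingFields}.
    static void map(String tag, List<String> lines, List<String[]> out) {
        for (String line : lines) {
            String[] parts = line.split(",", 2);
            out.add(new String[] {parts[0], tag, parts[1]});
        }
    }

    // Shuffle: group the tagged records by join key (sorted by key).
    static Map<String, List<String[]>> shuffle(List<String[]> mapped) {
        Map<String, List<String[]>> groups = new TreeMap<>();
        for (String[] rec : mapped) {
            groups.computeIfAbsent(rec[0], k -> new ArrayList<>()).add(rec);
        }
        return groups;
    }

    // Reduce: for one key, split records by tag and emit every
    // customer/order combination. A key with no orders emits nothing,
    // which is what makes this an inner join.
    static List<String> reduce(String key, List<String[]> records) {
        List<String> customers = new ArrayList<>();
        List<String> orders = new ArrayList<>();
        for (String[] rec : records) {
            if (rec[1].equals("CUST")) {
                customers.add(rec[2]);
            } else {
                orders.add(rec[2]);
            }
        }
        List<String> joined = new ArrayList<>();
        for (String c : customers) {
            for (String o : orders) {
                joined.add(key + "," + c + "," + o);
            }
        }
        return joined;
    }

    // Runs map, shuffle, and reduce in sequence over both inputs.
    static List<String> join(List<String> customers, List<String> orders) {
        List<String[]> mapped = new ArrayList<>();
        map("CUST", customers, mapped);
        map("ORD", orders, mapped);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : shuffle(mapped).entrySet()) {
            result.addAll(reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> customers = Arrays.asList(
            "1,Ram,8977101699", "2,Rani,8977101688",
            "3,Vani,8977101677", "4,Dhoni,8977101666");
        List<String> orders = Arrays.asList(
            "3,A,100,11-05-2020", "1,B,200,17-06-2021",
            "2,C,300,19-02-2020", "3,D,400,27-06-2021");
        for (String row : join(customers, orders)) {
            System.out.println(row);
        }
    }
}
```

Run on the two sample files above, this prints the four joined rows from the desired output table, one comma-separated row per line. In a real cluster the grouping is done by the framework between the map and reduce phases, and the reducer for a key has to buffer the records of at least one source, which is the main cost of joining on the reduce side.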
