MapReduce is a software framework and programming model used for processing huge amounts of data. MapReduce program work in two phases, namely,
Map : Map tasks deal with splitting and mapping of data
Reduce: Reduce tasks shuffle and reduce the data.
Assumes that the matrix A is an m*n, where m represent the row i and n represent the column j whose elements aij denoted as the value
similarly Assumes that the matrix B is an n*p, where n represent the row j and p represent the column k whose elements bjk denoted as the value
The Map function Algorithm :
- for each element of Matrix A do
- produce (key ,value) pair as
- (i, k) (A,j,aij).
- for each element of Matrix B do
- produce (key ,value) pair as
- (i, k) (B,j,bjk).
- return set of (key ,value) pairs
The Reduce function Algorithm:
- for each key(i,k) do
- sort values begin with A by j in list A
- sort values begin with B by j in list B
- multiply aij and bjk for jth value of each list
- sum up aij and bjk
Example:
Consider a matrix A whose order is m*n = i*j (2 * 3)
and matrix B whose order is n*p = j*k (3 * 2)
Matrix
|
i/k
|
j
|
value
|
A
|
0
|
0
|
1
|
A
|
0
|
1
|
2
|
A
|
0
|
2
|
3
|
A
|
1
|
0
|
4
|
A
|
1
|
1
|
5
|
A
|
1
|
2
|
6
|
B
|
0
|
0
|
6
|
B
|
1
|
0
|
3
|
B
|
0
|
1
|
5
|
B
|
1
|
1
|
2
|
B
|
0
|
2
|
4
|
B
|
1
|
2
|
1
|
Where,
i represents rows of matrix A
j represents columns of matrix A
j represents rows of matrix B
k represents columns of matrix B
Map function for both Matrix A and Matrix B:
Matrix A:
- for each element of Matrix A do
- produce (key ,value) pair as
- key ---(i, k) and pair ---- (A,j,aij).
aij = 1 (where i=0 j=0)
key ---(0, k) and pair ---- (A,0,aij) where k is the duplicate and we need to do based on k value
key ---(0, 0) and pair ---- (A,0,1)
key ---(0, 1) and pair ---- (A,0,1)
aij = 2 (where i=0 j=1)
key ---(0, k) and pair ---- (A,1,aij) where k is the duplicate and we need to do based on k value
key ---(0, 0) and pair ---- (A,1,2)
key ---(0, 1) and pair ---- (A,1,2)
aij = 3 (where i=0 j=2)
key ---(0, k) and pair ---- (A,2,aij) where k is the duplicate and we need to do based on k value
key ---(0, 0) and pair ---- (A,2,3)
key ---(0, 1) and pair ---- (A,2,3)
aij = 4 (where i=1 j=0)
key ---(1, k) and pair ---- (A,0,aij) where k is the duplicate and we need to do based on k value
key ---(1, 0) and pair ---- (A,0,4)
key ---(1, 1) and pair ---- (A,0,4)
aij = 5 (where i=1 j=1)
key ---(1, k) and pair ---- (A,1,aij) where k is the duplicate and we need to do based on k value
key ---(1, 0) and pair ---- (A,1,5)
key ---(1, 1) and pair ---- (A,1,5)
aij = 6 (where i=1 j=2)
key ---(1, k) and pair ---- (A,2,aij) where k is the duplicate and we need to do based on k value
key ---(1, 0) and pair ---- (A,2,6)
key ---(1, 1) and pair ---- (A,2,6)
Matrix B:
- for each element of Matrix B do
- produce (key ,value) pair as
- key ---(i, k) and pair ---- (B,j,bjk).
bjk = 6 (where j=0 k=0)
key ---(i, 0) and pair ---- (B,0,bjk) where i is the duplicate and we need to do based on i value
key ---(0, 0) and pair ---- (B,0,6)
key ---(1, 0) and pair ---- (B,0,6)
bjk = 3 (where j=0 k=1)
key ---(i, 1) and pair ---- (B,0,bjk) where i is the duplicate and we need to do based on i value
key ---(0, 1) and pair ---- (B,0,3)
key ---(1, 1) and pair ---- (B,0,3)
bjk = 5 (where j=1 k=0)
key ---(i, 0) and pair ---- (B,1,bjk) where i is the duplicate and we need to do based on i value
key ---(0, 0) and pair ---- (B,1,5)
key ---(1, 0) and pair ---- (B,1,5)
bjk = 2 (where j=1 k=1)
key ---(i, 1) and pair ---- (B,1,bjk) where i is the duplicate and we need to do based on i value
key ---(0, 1) and pair ---- (B,1,2)
key ---(1, 1) and pair ---- (B,1,2)
bjk = 4 (where j=2 k=0)
key ---(i, 0) and pair ---- (B,2,bjk) where i is the duplicate and we need to do based on i value
key ---(0, 0) and pair ---- (B,2,4)
key ---(1, 0) and pair ---- (B,2,4)
bjk = 1 (where j=2 k=1)
key ---(i, 1) and pair ---- (B,2,bjk) where i is the duplicate and we need to do based on i value
key ---(0, 1) and pair ---- (B,2,1)
key ---(1, 1) and pair ---- (B,2,1)
Shuffle/Group :
(0,0) ->
(A,0,1) (A,1,2) (A,2,3)
(B,0,6) (B,1,5) (B,2,4)
(0,1) ->
(A,0,1) (A,1,2) (A,2,3)
(B,0,3) (B,1,2) (B,2,1)
(1,0) ->
(A,0,4) (A,1,5) (A,2,6)
(B,0,6) (B,1,5) (B,2,4)
(1,1) ->
(A,0,4) (A,1,5) (A,2,6)
(B,0,3) (B,1,2) (B,2,1)
Reduce Function:
(0,0) ->
(A,0,1) (A,1,2) (A,2,3)
(B,0,6) (B,1,5) (B,2,4)
___________________________
1*6 2*5 3*4
6 + 10 + 12 = 28
___________________________
(0,1) ->
(A,0,1) (A,1,2) (A,2,3)
(B,0,3) (B,1,2) (B,2,1)
___________________________
1*3 2*2 3*1
3 + 4 + 3 = 10
___________________________
(1,0) ->
(A,0,4) (A,1,5) (A,2,6)
(B,0,6) (B,1,5) (B,2,4)
___________________________
4*6 5*5 6*4
24 + 25 + 24 = 73
___________________________
(1,1) ->
(A,0,4) (A,1,5) (A,2,6)
(B,0,3) B,1,2) (B,2,1)
___________________________
4*3 5*2 6*1
12 + 10 + 6 = 28
___________________________
The final step in the MapReduce algorithm is to produce the matrix A × B
The unit of computation of matrix A × B is :