BigData

Thursday, 21 March 2024

Friends-of-friends-Map Reduce program

Program to illustrate FOF Map Reduce:

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class FriendCommon {
public static class FriendMapper extends Mapper<Object, Text, Text, Text> {
private Text pair = new Text();
private Text user = new Text();
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
String[] line = value.toString().split(" -> ");
if (line.length == 2) {
String[] friends = line[1].split(" ");
for (int i = 0; i < friends.length; i++) {
for (int j = i + 1; j < friends.length; j++) {
String friendPair = friends[i].trim() + "," + friends[j].trim();
pair.set(friendPair);
user.set(line[0].trim());
context.write(pair, user);
}
}
}
}
}
public static class FriendReducer extends Reducer<Text, Text, Text, Text> {
private Text commonFriends = new Text();
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
Set<String> userList = new HashSet<>();
for (Text value : values) {
userList.add(value.toString());
}
if (userList.size() > 1) {
List<String> sortedUsers = new ArrayList<>(userList);
Collections.sort(sortedUsers);
commonFriends.set(String.join(" ", sortedUsers));
context.write(key, commonFriends);
}
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "FriendCommon");
job.setJarByClass(FriendCommon.class);
job.setMapperClass(FriendMapper.class);
job.setReducerClass(FriendReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

Step to run Map Reduce program :

Input File:

A -> B C D

B -> A C D E

C -> A B D E

D -> A B C E

E -> B C D

hduser@ubuntu:~/fof$ ls

FriendCommon.java in.txt

hduser@ubuntu:~/fof$ export CLASSPATH=`hadoop classpath`

hduser@ubuntu:~/fof$ echo $CLASSPATH

hduser@ubuntu:~/fof$ javac -d . FriendCommon.java

hduser@ubuntu:~/fof$ ls

'FriendCommon$FriendMapper.class' 'FriendCommon$FriendReducer.class' FriendCommon.class FriendCommon.java in.txt

hduser@ubuntu:~/fof$ jar -cvf fmr.jar -C /home/hduser/fof .

hduser@ubuntu:~/fof$ ls

fmr.jar 'FriendCommon$FriendMapper.class' 'FriendCommon$FriendReducer.class' FriendCommon.class FriendCommon.java in.txt

hduser@ubuntu:~/fof$ hadoop fs -mkdir /fofrk

hduser@ubuntu:~/fof$ hadoop fs -put /home/hduser/fof/in.txt /fofrk

hduser@ubuntu:~/fof$ hadoop fs -lsr /fofrk

lsr: DEPRECATED: Please use 'ls -R' instead.

-rw-r--r-- 1 hduser supergroup 62 2024-03-21 11:36 /fofrk/in.txt

hduser@ubuntu:~/fof$ hadoop fs -cat /fofrk/in.txt

A -> B C D

B -> A C D E

C -> A B D E

D -> A B C E

E -> B C D

hduser@ubuntu:~/fof$ hadoop jar fmr.jar FriendCommon /fofrk/in.txt /fofrk/out

hduser@ubuntu:~/fof$ hadoop fs -lsr /fofrk

lsr: DEPRECATED: Please use 'ls -R' instead.

-rw-r--r-- 1 hduser supergroup 62 2024-03-21 11:36 /fofrk/in.txt

drwxr-xr-x - hduser supergroup 0 2024-03-21 11:40 /fofrk/out

-rw-r--r-- 1 hduser supergroup 0 2024-03-21 11:40 /fofrk/out/_SUCCESS

-rw-r--r-- 1 hduser supergroup 88 2024-03-21 11:40 /fofrk/out/part-r-00000

hduser@ubuntu:~/fof$ hadoop fs -cat /fofrk/in.txt

A -> B C D

B -> A C D E

C -> A B D E

D -> A B C E

E -> B C D

hduser@ubuntu:~/fof$ hadoop fs -cat /fofrk/out/part-r-00000

A,B C D

A,C B D

A,D B C

A,E B C D

B,C A D E

B,D A C E

B,E C D

C,D A B E

C,E B D

D,E B C

Output:

Wednesday, 13 December 2023

Display your own Text on ubuntu terminal

$ figlet -f slant "Big Data" -c | lolcat && figlet -f digital -c "Drive into DataScience" | lolcat

How to display the above text on terminal:

Useful commands:

How to check the number of directories created by the user on terminal:

csedept@cse:~$ ls /home
csedept hdoop

csedept@cse:~$ ls -l /home/hdoop/

total 16

-rw-r--r-- 1 hdoop hdoop 8980 Dec 18 16:26 examples.desktop

drwxr-xr-x 14 hdoop hdoop 4096 Dec 18 16:55 hadoop

How to enter into the root terminal:

csedept@cse:~$ sudo -s

root@cse:/home/csedept#

How to use id command on the terminal:

csedept@cse:~$ id

uid=1000(csedept) gid=1000(csedept) groups=1000(csedept),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),116(lpadmin),126(sambashare)

csedept@cse:~$ id -u

1000

csedept@cse:~$ id -n -u

csedept

csedept@cse:~$ sudo adduser duck

[sudo] password for csedept:

Adding user `duck' ...

Adding new group `duck' (1001) ...

Adding new user `duck' (1001) with group `duck' ...

Creating home directory `/home/duck' ...

Copying files from `/etc/skel' ...

New password:

Retype new password:

passwd: password updated successfully

Changing the user information for duck

Enter the new value, or press ENTER for the default

Full Name []:duck

Room Number []:

Work Phone []:

Home Phone []:

Other []:

Is the information correct? [Y/n] y

csedept@cse:~$ ls /home/

csedept duck hdoop

How to switch the user using su command:

csedept@cse:~$ su - duck

Password:

duck@cse:~$ logout

csedept@cse:~$

Friday, 21 April 2023

Friends-of-friends

Social network sites such as LinkedIn and Facebook use the FoF algorithm to help users broaden their networks.

The key ingredient to success with this approach is to order the FoFs by the number of common friends, which increases the chances that the user knows the FoF.

1.Example : How to implement the FoF algorithm in MapReduce.

Two MapReduce jobs are required to calculate the FoFs for each user in a social network. The first job calculates the common friends for each user, and the second job sorts the common friends by the number of connections to your friends

2.Example : How to implement the FoF algorithm in MapReduce.

A -> B C D

B -> A C D E

C -> A B D E

D -> A B C E

E -> B C D

Mapper 1 output of A -> B C D

(A,B) -> (B C D)

(A,C) -> (B C D)

(A,D) -> (B C D)

Mapper 2 output of B -> A C D E

(A,B) -> (A C D E)

(B,C) -> (A C D E)

(B,D) -> (A C D E)

(B,E) -> (A C D E)

Mapper 3 output of C -> A B D E

(A,C) -> (A B D E)

(B,C) -> (A B D E)

(C,D) -> (A B D E)

(C,E) -> (A B D E)

Mapper 4 output of D -> A B C E

(A,D) -> (A B C E)

(B,D) -> (A B C E)

(C,D) -> (A B C E)

(D,E) -> (A B C E)

Mapper 5 output of E -> B C D

(B,E) -> (B C D)

(C,E) -> (B C D)

(D,E) -> (B C D)

Shuffle Or Group:

(A,B) -> (A C D E)

(A,B) -> (B C D)

-------------------------

(A,B) -> (A C D E) (B C D)

(A,C) -> (B C D)

(A,C) -> (A B D E)

-------------------------

(A,C) -> (A B D E) (B C D)

(A,D) -> (B C D)

(A,D) -> (A B C E)

-------------------------

(A,D) -> (A B C E) (B C D)

(B,C) -> (A C D E)

(B,C) -> (A B D E)

-------------------------

(B,C) -> (A B D E) (A C D E)

(B,D) -> (A C D E)

(B,D) -> (A B C E)

-------------------------

(B,D) -> (A B C E) (A C D E)

(B,E) -> (A C D E)

(B,E) -> (B C D)

-------------------------

(B,E) -> (A C D E) (B C D)

(C,D) -> (A B D E)

(C,D) -> (A B C E)

-------------------------

(C,D) -> (A B D E) (A B C E)

(C,E) -> (A B D E)

(C,E) -> (B C D)

-------------------------

(C,E) -> (A B D E) (B C D)

(D,E) -> (A B C E)

(D,E) -> (B C D)

-------------------------

(D,E) -> (A B C E) (B C D)

Reducer:

(A,B) -> (A C D E) (B C D)

The common pair is

(A,B) -> (C D)

(A,C) -> (A B D E) (B C D)

The common pair is

(A,C) -> (B D)

(A,D) -> (A B C E) (B C D)

The common pair is

(A,D) -> (B C )

(B,C) -> (A B D E) (A C D E)

The common pair is

(B,C) -> (A D E)

(B,D) -> (A B C E) (A C D E)

The common pair is

(B,D) -> (A C E)

(B,E) -> (A C D E) (B C D)

The common pair is

(B,E) -> (C D)

(C,D) -> (A B D E) (A B C E)

The common pair is

(C,D) -> (A B E)

(C,E) -> (A B D E) (B C D)

The common pair is

(C,E) -> (B D)

(D,E) -> (A B C E) (B C D)

The common pair is

(D,E) -> (B C)

Saturday, 15 April 2023

Example - Shortest Path - Dijkstra's algorithm

Relaxation;

d(u) + c(u,v) < d(v)

d(v)=d(u) + c(u,v)

Source	A	B	C	D	E	F
A	0	∞	∞	∞	∞	∞
B		2	4	∞	∞	∞
C			3	6	4	∞
E				6	4	∞
D				6		6
F						6

Find the shortest path from A to F using the above table : F E B A

Find the shortest path from A to E using the above table : E B A

Find the shortest path from A to C using the above table : C B A

Find the shortest path from A to D using the above table : D B A

Friday, 14 April 2023

Page Rank Algorithm using Directed Graph

PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.

PageRank is a function that assigns a real number to each page in the Web (or at least to that portion of the Web that has been crawled and its links discovered).
The intent is that the higher the PageRank of a page, the more “important” it is.
There is one most popular model known as Random Surfer Model

Consider a directed graph with 4 nodes A,B&C and D ,in which D is a Dangling node

For Example: Consider a directed graph with 3 nodes A,B & C

Iteration 1:

Iteration 2:

The output of all the iteration in one table as shown:

Trick to find the Page Rank :

For the above given graph with ABC nodes

Page rank of a node is given by

indegree(name of the node)/(name of the node outdegree)

Page rank of A = C / 1

Page rank of B = A / 2

Page rank of C = A/2 + B/1

Iteration A B C

0 1 1 1

1 1 0.5 1.5

2 1.5 0.5 1

Thursday, 13 April 2023

Shortest Path Algorithm- Dijkstra algorithm

Dijkstra algorithm is a single-source shortest path algorithm. Here, single-source means that only one source is given, and we have to find the shortest path from the source to all the nodes.

First, we have to consider any vertex as a source vertex. Suppose we consider vertex 0 as a source vertex.

Here we assume that 0 as a source vertex, and distance to all the other vertices is infinity. Initially, we do not know the distances. First, we will find out the vertices which are directly connected to the vertex 0. As we can observe in the above graph that two vertices are directly connected to vertex 0.

Let's assume that the vertex 0 is represented by 'x' and the vertex 1 is represented by 'y'. The distance between the vertices can be calculated by using the below formula:

d(x, y) = d(x) + c(x, y) < d(y)

= (0 + 4) < ∞

= 4 < ∞

Since 4<∞ so we will update d(v) from ∞ to 4.

Therefore, we come to the conclusion that the formula for calculating the distance between the vertices:

{if( d(u) + c(u, v) < d(v))

d(v) = d(u) +c(u, v) }

Now we consider vertex 0 same as 'x' and vertex 4 as 'y'.

d(x, y) = d(x) + c(x, y) < d(y)

= (0 + 8) < ∞

= 8 < ∞

Therefore, the value of d(y) is 8. We replace the infinity value of vertices 1 and 4 with the values 4 and 8 respectively. Now, we have found the shortest path from the vertex 0 to 1 and 0 to 4. Therefore, vertex 0 is selected. Now, we will compare all the vertices except the vertex 0. Since vertex 1 has the lowest value, i.e., 4; therefore, vertex 1 is selected.

Since vertex 1 is selected, so we consider the path from 1 to 2, and 1 to 4. We will not consider the path from 1 to 0 as the vertex 0 is already selected.

First, we calculate the distance between the vertex 1 and 2. Consider the vertex 1 as 'x', and the vertex 2 as 'y'.

d(x, y) = d(x) + c(x, y) < d(y)

= (4 + 8) < ∞

= 12 < ∞

Since 12<∞ so we will update d(2) from ∞ to 12.

Now, we calculate the distance between the vertex 1 and vertex 4. Consider the vertex 1 as 'x' and the vertex 4 as 'y'.

d(x, y) = d(x) + c(x, y) < d(y)

= (4 + 11) < 8

= 15 < 8

Since 15 is not less than 8, we will not update the value d(4) from 8 to 12.

Till now, two nodes have been selected, i.e., 0 and 1. Now we have to compare the nodes except the node 0 and 1. The node 4 has the minimum distance, i.e., 8. Therefore, vertex 4 is selected.

Since vertex 4 is selected, so we will consider all the direct paths from the vertex 4. The direct paths from vertex 4 are 4 to 0, 4 to 1, 4 to 8, and 4 to 5. Since the vertices 0 and 1 have already been selected so we will not consider the vertices 0 and 1. We will consider only two vertices, i.e., 8 and 5.

First, we consider the vertex 8. First, we calculate the distance between the vertex 4 and 8. Consider the vertex 4 as 'x', and the vertex 8 as 'y'.

d(x, y) = d(x) + c(x, y) < d(y)

= (8 + 7) < ∞

= 15 < ∞

Since 15 is less than the infinity so we update d(8) from infinity to 15.

Now, we consider the vertex 5. First, we calculate the distance between the vertex 4 and 5. Consider the vertex 4 as 'x', and the vertex 5 as 'y'.

d(x, y) = d(x) + c(x, y) < d(y)

= (8 + 1) < ∞

= 9 < ∞

Since 5 is less than the infinity, we update d(5) from infinity to 9.

Till now, three nodes have been selected, i.e., 0, 1, and 4. Now we have to compare the nodes except the nodes 0, 1 and 4. The node 5 has the minimum value, i.e., 9. Therefore, vertex 5 is selected.

Since the vertex 5 is selected, so we will consider all the direct paths from vertex 5. The direct paths from vertex 5 are 5 to 8, and 5 to 6.

First, we consider the vertex 8. First, we calculate the distance between the vertex 5 and 8. Consider the vertex 5 as 'x', and the vertex 8 as 'y'.

d(x, y) = d(x) + c(x, y) < d(y)

= (9 + 15) < 15

= 24 < 15

Since 24 is not less than 15 so we will not update the value d(8) from 15 to 24.

Now, we consider the vertex 6. First, we calculate the distance between the vertex 5 and 6. Consider the vertex 5 as 'x', and the vertex 6 as 'y'.

d(x, y) = d(x) + c(x, y) < d(y)

= (9 + 2) < ∞</p>

= 11 < ∞

Since 11 is less than infinity, we update d(6) from infinity to 11.

Till now, nodes 0, 1, 4 and 5 have been selected. We will compare the nodes except the selected nodes. The node 6 has the lowest value as compared to other nodes. Therefore, vertex 6 is selected.

Since vertex 6 is selected, we consider all the direct paths from vertex 6. The direct paths from vertex 6 are 6 to 2, 6 to 3, and 6 to 7.

First, we consider the vertex 2. Consider the vertex 6 as 'x', and the vertex 2 as 'y'.

d(x, y) = d(x) + c(x, y) < d(y)

= (11 + 4) < 12

= 15 < 12

Since 15 is not less than 12, we will not update d(2) from 12 to 15

Now we consider the vertex 3. Consider the vertex 6 as 'x', and the vertex 3 as 'y'.

d(x, y) = d(x) + c(x, y) < d(y)

= (11 + 14) < ∞

= 25 < ∞

Since 25 is less than ∞, so we will update d(3) from ∞ to 25.

Now we consider the vertex 7. Consider the vertex 6 as 'x', and the vertex 7 as 'y'.

d(x, y) = d(x) + c(x, y) < d(y)

= (11 + 10) < ∞

= 22 < ∞

Since 22 is less than ∞ so, we will update d(7) from ∞ to 22.

Till now, nodes 0, 1, 4, 5, and 6 have been selected. Now we have to compare all the unvisited nodes, i.e., 2, 3, 7, and 8. Since node 2 has the minimum value, i.e., 12 among all the other unvisited nodes. Therefore, node 2 is selected.

Since node 2 is selected, so we consider all the direct paths from node 2. The direct paths from node 2 are 2 to 8, 2 to 6, and 2 to 3.

First, we consider the vertex 8. Consider the vertex 2 as 'x' and 8 as 'y'.

d(x, y) = d(x) + c(x, y) < d(y)

= (12 + 2) < 15

= 14 < 15

Since 14 is less than 15, we will update d(8) from 15 to 14.

Now, we consider the vertex 6. Consider the vertex 2 as 'x' and 6 as 'y'.

d(x, y) = d(x) + c(x, y) < d(y)

= (12 + 4) < 11

= 16 < 11

Since 16 is not less than 11 so we will not update d(6) from 11 to 16.

Now, we consider the vertex 3. Consider the vertex 2 as 'x' and 3 as 'y'.

d(x, y) = d(x) + c(x, y) < d(y)

= (12 + 7) < 25

= 19 < 25

Since 19 is less than 25, we will update d(3) from 25 to 19.

Till now, nodes 0, 1, 2, 4, 5, and 6 have been selected. We compare all the unvisited nodes, i.e., 3, 7, and 8. Among nodes 3, 7, and 8, node 8 has the minimum value. The nodes which are directly connected to node 8 are 2, 4, and 5. Since all the directly connected nodes are selected so we will not consider any node for the updation.

The unvisited nodes are 3 and 7. Among the nodes 3 and 7, node 3 has the minimum value, i.e., 19. Therefore, the node 3 is selected. The nodes which are directly connected to the node 3 are 2, 6, and 7. Since the nodes 2 and 6 have been selected so we will consider these two nodes.

Now, we consider the vertex 7. Consider the vertex 3 as 'x' and 7 as 'y'.

d(x, y) = d(x) + c(x, y) < d(y)

= (19 + 9) < 21

= 28 < 21

Since 28 is not less than 21, so we will not update d(7) from 28 to 21

Tuesday, 11 April 2023

Chaining MapReduce jobs - Joining data from different sources - Data Flow of Reduce side join:

In data analyses we need to gather the data from two or more different sources.

If we want an inner join of the two data sets above, the desired output would look as listed below For example, Let’s take a two comma-separated files

1. Customers file with three fields: Customer ID, Name, and Phone Number. We put four records in the file for illustration

C_ID	Name	Phone_No
1	Ram	8977101699
2	Rani	8977101688
3	Vani	8977101677
4	Dhoni	8977101666

2. Order file with four fields: Customer ID, Order ID, Price, and Purchase Date.

C_ID	O_ID	Price	Date
3	A	100	11-05-2020
1	B	200	17-06-2021
2	C	300	19-02-2020
3	D	400	27-06-2021

If we want an inner join of the two data sets above, the desired output would look as listed below

C_ID	Name	Phone_No	O_ID	Price	Date
1	Ram	8977101699	B	200	17-06-2021
2	Rani	8977101688	C	300	19-02-2020
3	Vani	8977101677	A	100	11-05-2020
3	Vani	8977101677	D	400	27-06-2021

Data Flow of Reduce side join:

BigData

Pages

Thursday, 21 March 2024

Friends-of-friends-Map Reduce program

Wednesday, 13 December 2023

Display your own Text on ubuntu terminal

Friday, 21 April 2023

Friends-of-friends

Saturday, 15 April 2023

Example - Shortest Path - Dijkstra's algorithm

Friday, 14 April 2023

Page Rank Algorithm using Directed Graph

Thursday, 13 April 2023

Shortest Path Algorithm- Dijkstra algorithm

Tuesday, 11 April 2023

Chaining MapReduce jobs - Joining data from different sources - Data Flow of Reduce side join:

Friends-of-friends-Map Reduce program

Report Abuse

Labels