Sunday 19 March 2023

MapReduce - Weather Dataset Using a Single File - Part II

MapReduce program on the weather dataset:

Step 1: Collect the weather dataset from the NCDC (National Climatic Data Center) and save it in a file named tempinput.txt. A few sample records:

0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
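
Each record is a fixed-width line, so fields are read by character position. As a minimal sketch in plain Java (no Hadoop needed), the following pulls the year, air temperature, and quality code out of the first sample record using the same offsets that Maxtemp.java in Step 2 uses. The class name RecordFields and its method names are hypothetical and serve only as illustration.

```java
public class RecordFields {
    // The year occupies characters 15 to 18 (0-based) of a record.
    static String year(String line) {
        return line.substring(15, 19);
    }

    // The air temperature is a sign at index 87 followed by four digits,
    // stored in tenths of a degree Celsius.
    static int airTemp(String line) {
        int magnitude = Integer.parseInt(line.substring(88, 92));
        return line.charAt(87) == '-' ? -magnitude : magnitude;
    }

    // A single-character quality code follows the temperature.
    static char quality(String line) {
        return line.charAt(92);
    }

    public static void main(String[] args) {
        // First sample record from Step 1, split in two only for readability.
        String line = "0067011990999991950051507004+68750+023550FM-12+038299999"
                + "V0203301N00671220001CN9999999N9+00001+99999999999";
        System.out.println(year(line) + " " + airTemp(line) + " " + quality(line));
    }
}
```

For the first sample record this prints the year 1950, a temperature of 0, and quality code 1.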
 

Step 2: Create a folder kk, create the file Maxtemp.java inside it, and grant permissions so that the non-root user can access it

root@ubuntu:/home/hduser# chmod -R 777 kk

//Maxtemp.java Program

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class Maxtemp
{
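    // Mapper: emits (year, air temperature) for every valid reading.
    public static class MaxtempMapper
            extends Mapper<LongWritable, Text, Text, IntWritable>
    {
        // NCDC uses 9999 to mark a missing temperature.
        private static final int MISSING = 9999;

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException
        {
            String line = value.toString();
            // Skip blank or truncated lines that cannot hold the temperature fields.
            if (line.length() < 93)
            {
                return;
            }
            String year = line.substring(15, 19);
            int airtemp;
            if (line.charAt(87) == '+')
            {
                // Skip the explicit plus sign before parsing.
                airtemp = Integer.parseInt(line.substring(88, 92));
            }
            else
            {
                airtemp = Integer.parseInt(line.substring(87, 92));
            }
            // Keep only readings that are present and carry an accepted quality code.
            String q = line.substring(92, 93);
            if (airtemp != MISSING && q.matches("[01459]"))
            {
                context.write(new Text(year), new IntWritable(airtemp));
            }
        }
    }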
 

    // Reducer: receives all temperatures for one year and writes the maximum.
    public static class MaxtempReducer
            extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException
        {
            int maxvalue = Integer.MIN_VALUE;
            for (IntWritable value : values)
            {
                maxvalue = Math.max(maxvalue, value.get());
            }
            context.write(key, new IntWritable(maxvalue));
        }
    }

    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Maxtemp");
        job.setJarByClass(Maxtemp.class);

        // Mapper and reducer classes
        job.setMapperClass(MaxtempMapper.class);
        job.setReducerClass(MaxtempReducer.class);

        // Types of the final output key and value
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // args[0]: input file or directory; args[1]: output directory (must not exist yet)
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Report success or failure to the shell through the exit status.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}
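
Before packaging the job, the map and reduce logic can be dry-run in plain Java on the five sample records from Step 1. This is only a sketch: the class LocalMaxtemp and its method maxByYear are hypothetical, no Hadoop classes are involved, and a TreeMap stands in for the shuffle by grouping values per year with sorted keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalMaxtemp {
    // Mimics map + reduce: take (year, temperature) from each valid record,
    // then keep the maximum temperature seen for each year.
    static Map<String, Integer> maxByYear(List<String> lines) {
        Map<String, Integer> max = new TreeMap<>();
        for (String line : lines) {
            if (line.length() < 93) {
                continue; // too short to hold the temperature fields
            }
            String year = line.substring(15, 19);
            // Integer.parseInt accepts a leading '+' or '-' sign.
            int airTemp = Integer.parseInt(line.substring(87, 92));
            String quality = line.substring(92, 93);
            if (airTemp != 9999 && quality.matches("[01459]")) {
                max.merge(year, airTemp, Math::max);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // The five sample records listed in Step 1.
        List<String> sample = Arrays.asList(
            "0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999",
            "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999",
            "0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999",
            "0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999",
            "0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999");
        maxByYear(sample).forEach((year, temp) -> System.out.println(year + "\t" + temp));
    }
}
```

For these five records the sketch prints 111 for 1949 and 22 for 1950, which is what the Hadoop job should also report if tempinput.txt contains only the sample lines.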

Step 3: Compile Maxtemp.java into class files using the following commands
hduser@ubuntu:~/kk$ export CLASSPATH=`hadoop classpath`
hduser@ubuntu:~/kk$ echo $CLASSPATH
hduser@ubuntu:~/kk$ javac -d . Maxtemp.java
Step 4: Create a jar file using the following command
hduser@ubuntu:~/kk$ jar -cvf max.jar -C /home/hduser/kk .  
Step 5: Create a folder rk in HDFS and copy the weather dataset file tempinput.txt into it using the following commands
hduser@ubuntu:~/kk$ hadoop fs -mkdir /rk
hduser@ubuntu:~/kk$ ls
'Maxtemp$MaxtempMapper.class'  'Maxtemp$MaxtempReducer.class'   Maxtemp.class   Maxtemp.java   max.jar   tempinput.txt
hduser@ubuntu:~/kk$ hadoop fs -put /home/hduser/kk/tempinput.txt /rk
Step 6: Run the job with the hadoop jar command, passing the input file and an output folder (/rk/joy here) that does not exist yet
hduser@ubuntu:~/kk$ hadoop jar max.jar Maxtemp /rk/tempinput.txt /rk/joy
Step 7: View the input dataset, then check the maximum temperature per year, which the job writes to /rk/joy/part-r-00000
hduser@ubuntu:~/kk$ hadoop fs -cat /rk/tempinput.txt
hduser@ubuntu:~/kk$ hadoop fs -cat /rk/joy/part-r-00000
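
Each line of part-r-00000 holds a year and its maximum temperature separated by a tab, and NCDC stores temperatures in tenths of a degree Celsius. A small sketch for turning a result line into degrees follows. The class OutputLine is hypothetical, and the line "1949\t111" is an illustrative value (what the five sample records from Step 1 should produce), not captured output.

```java
public class OutputLine {
    // Converts "year<TAB>tenths-of-a-degree" into a readable string.
    static String describe(String line) {
        String[] parts = line.split("\t");
        String year = parts[0];
        double celsius = Integer.parseInt(parts[1].trim()) / 10.0;
        return year + ": " + celsius + " C";
    }

    public static void main(String[] args) {
        System.out.println(describe("1949\t111"));
    }
}
```

With that input the sketch prints "1949: 11.1 C", meaning the maximum for 1949 is 11.1 degrees Celsius.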
