MapReduce-Execution-on-GCP

Create a Hadoop MapReduce application that finds the maximum temperature for every day of the years 1901 and 1902. The application should read its input from HDFS and write its output to HDFS. When the job completes, merge all the results into one file and store it on the local cluster.

1. Copy of Your mapper.py code (or equivalent in another programming language) (25% of total grade)

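// Imports needed in the enclosing file: java.io.IOException,
// org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.LongWritable,
// org.apache.hadoop.io.Text, org.apache.hadoop.mapreduce.Mapper
public static class MaxTempMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Sentinel value the dataset uses for a missing temperature reading.
    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Skip lines too short to contain the quality flag at index 92.
        if (line.length() < 93) {
            return;
        }
        // Characters 15-22 hold the observation date.
        String date = line.substring(15, 23);
        // Characters 87-91 hold the signed temperature; drop a leading '+' before parsing.
        int temp;
        if (line.charAt(87) == '+') {
            temp = Integer.parseInt(line.substring(88, 92));
        } else {
            temp = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);

        // Emit (date, temp) only for present readings with an accepted quality code.
        if (temp != MISSING && quality.matches("[01459]")) {
            context.write(new Text(date), new IntWritable(temp));
        }
    }
}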
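
The column offsets used by the mapper can be tried out without a cluster. The following is a minimal plain-Java sketch, not part of the submitted job: the class `NcdcRecordSketch`, its methods, and the synthetic record are hypothetical names introduced for illustration. It assumes Java 7 or later, where `Integer.parseInt` accepts a leading `+`, so it does not branch on the sign the way the mapper does.

```java
import java.util.Arrays;
import java.util.OptionalInt;

/** Hypothetical helper that mirrors the mapper's field extraction without Hadoop. */
final class NcdcRecordSketch {
    private static final int MISSING = 9999;

    /** Characters 15-22 of the record hold the date used as the map output key. */
    static String date(String line) {
        return line.substring(15, 23);
    }

    /** Returns the temperature if it is present and has an accepted quality code. */
    static OptionalInt validTemp(String line) {
        if (line.length() < 93) {
            return OptionalInt.empty(); // too short to hold the quality flag
        }
        // parseInt handles both "+0022" and "-0078" on Java 7 and later.
        int temp = Integer.parseInt(line.substring(87, 92));
        String quality = line.substring(92, 93);
        boolean accepted = temp != MISSING && quality.matches("[01459]");
        return accepted ? OptionalInt.of(temp) : OptionalInt.empty();
    }

    /** Builds a synthetic 93-character record; every other column is filler. */
    static String syntheticLine(String date, String signedTemp, char quality) {
        char[] filler = new char[93];
        Arrays.fill(filler, '0');
        StringBuilder sb = new StringBuilder(new String(filler));
        sb.replace(15, 23, date);       // 8-character date
        sb.replace(87, 92, signedTemp); // sign plus 4 digits
        sb.setCharAt(92, quality);      // quality flag
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = syntheticLine("19010101", "-0078", '1');
        System.out.println(date(line) + "\t" + validTemp(line));
    }
}
```

Returning an `OptionalInt` keeps the "missing or rejected" case explicit, which corresponds to the mapper simply not emitting a pair for that record.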

2. Copy of Your reducer.py code (or equivalent in another programming language) (25% of total grade)

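// Imports needed in the enclosing file: java.io.IOException,
// org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text,
// org.apache.hadoop.mapreduce.Reducer
public static class MaxTempReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Track the largest temperature seen for this date.
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        // Emit one (date, maximum temperature) pair per key.
        context.write(key, new IntWritable(maxValue));
    }
}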
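
To see what the reducer computes once the framework has grouped the mapper output by date, the grouping and the maximum can be imitated in plain Java. This is only an illustrative sketch under the assumption that all pairs fit in memory; `MaxPerDaySketch` and `maxPerDay` are hypothetical names, and the sample values are synthetic, not results from the actual job.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for the shuffle and reduce phases. */
final class MaxPerDaySketch {

    /** Groups (date, temperature) pairs by date and keeps the maximum per date. */
    static Map<String, Integer> maxPerDay(List<Map.Entry<String, Integer>> pairs) {
        // TreeMap keeps dates sorted, similar to the sorted keys a reducer receives.
        Map<String, Integer> maxByDate = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            // merge stores the value for a new key, otherwise keeps the larger one.
            maxByDate.merge(pair.getKey(), pair.getValue(), Integer::max);
        }
        return maxByDate;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("19010101", -78)); // synthetic sample values
        pairs.add(new SimpleEntry<>("19010101", -72));
        pairs.add(new SimpleEntry<>("19010102", -94));
        maxPerDay(pairs).forEach((date, max) -> System.out.println(date + "\t" + max));
    }
}
```

Because taking a maximum is associative and commutative, the same reducer logic could in principle also serve as a combiner, although the source does not show whether the job was configured that way.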

3. Screenshot of the execution of Hadoop MapReduce Job in the terminal (25% of total grade)

steps

execution

4. Copy of your output file (after merging) containing the results (25% of total grade)

result file
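
The exact merge command used for this step is not shown here; with Hadoop it is commonly done with `hadoop fs -getmerge`. As an illustration of what merging means, the plain-Java sketch below concatenates reducer part files in name order, assuming they have already been copied to a local directory. `MergePartsSketch`, `merge`, and the `part-*` glob are assumptions made for this example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that concatenates local copies of reducer part files. */
final class MergePartsSketch {

    /** Appends every file matching part-* in partsDir to merged, in name order. */
    static void merge(Path partsDir, Path merged) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(partsDir, "part-*")) {
            for (Path part : stream) {
                parts.add(part);
            }
        }
        // Directory listings are unordered, so sort to get part-r-00000 first.
        Collections.sort(parts);
        try (OutputStream out = Files.newOutputStream(merged)) {
            for (Path part : parts) {
                Files.copy(part, out); // stream each part into the merged file
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MergePartsSketch <local-parts-dir> <merged-output-file>
        merge(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The merged file should have a name that does not match `part-*` if it is written into the same directory, otherwise a second run would pick it up as an input.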

Extra Credit (+20%): Build a GUI to upload the two data files from your local machine to a GCP bucket automatically without having

In my code, I removed my config JSON file and hid my bucket name.

demo.mp4
