Play-Hadoop-Python

My implementation of a simple Map/Reduce program in Python.

It takes words and maps each one to the value 1. The reducer then groups the pairs by word and sums their values, giving the count of each word. I plan to change how the map and reduce steps are used, possibly to run a learning algorithm on the data.
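The word count described above can be sketched in plain Python, with no Hadoop involved, to show the three phases: map each word to 1, sort so equal words sit next to each other (Hadoop does this between the map and reduce steps), then sum per word. This is a minimal illustration, not the repository's code; the function names `map_words` and `reduce_counts` are hypothetical, and splitting on whitespace is an assumption about tokenization.

```python
from collections.abc import Iterable, Iterator
from itertools import groupby
from operator import itemgetter


def map_words(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map step: emit (word, 1) for every whitespace-separated word (assumed tokenization)."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(pairs: Iterable[tuple[str, int]]) -> dict[str, int]:
    """Reduce step: sum the values for each word.

    Hadoop sorts map output by key before reducing; sorted() plays that
    role here so groupby sees all pairs for a word consecutively.
    """
    counts: dict[str, int] = {}
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        counts[word] = sum(value for _, value in group)
    return counts


text = ["the cat sat", "the cat ran"]
print(reduce_counts(map_words(text)))
# {'cat': 2, 'ran': 1, 'sat': 1, 'the': 2}
```

Without the sort, `groupby` would split a word's pairs into several groups, which is why the shuffle/sort phase matters to the reducer.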

This program uses Hadoop Streaming. Streaming feeds the input to the mapper on stdin, sorts the mapper's stdout by key, and passes the result to the reducer on stdin; the reducer's stdout becomes the job output. This lets me write the map and reduce steps in Python rather than in Java, the traditional choice.
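The repository's mapper.py and reducer.py are not reproduced here, so the following is a hypothetical sketch of Streaming-compatible versions. It assumes whitespace tokenization and "word<TAB>count" output lines (a tab is Streaming's default key/value separator). For brevity both steps live in one file, called wordcount.py here, and are selected with a `map` or `reduce` argument; in this project they are separate mapper.py and reducer.py files.

```python
#!/usr/bin/env python3
"""Hypothetical Streaming word count: `wordcount.py map` or `wordcount.py reduce`."""
import sys
from collections.abc import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one "word<TAB>1" line per word; Streaming uses the text before the tab as the key."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word.

    Streaming sorts mapper output by key, so all lines for one word arrive
    consecutively and only a running total needs to be kept.
    """
    current_word, current_count = None, 0
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        word, _, count = line.rstrip("\n").partition("\t")
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                yield f"{current_word}\t{current_count}"
            current_word, current_count = word, int(count)
    if current_word is not None:
        yield f"{current_word}\t{current_count}"


if __name__ == "__main__":
    steps = {"map": mapper, "reduce": reducer}
    step = steps.get(sys.argv[1] if len(sys.argv) > 1 else "")
    if step is None:
        print("usage: wordcount.py map|reduce", file=sys.stderr)
    else:
        for out in step(sys.stdin):
            print(out)
```

To test without Hadoop, the shell's `sort` can stand in for Streaming's sort phase: `cat book.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`, where book.txt is any local text file. When a script is passed directly as `-mapper` or `-reducer`, it needs a shebang line and the executable bit (`chmod +x`).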

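Some commands to remember (Homebrew installation): $ cd /usr/local/Cellar/hadoop. First, make sure you can SSH into localhost ($ ssh localhost), since the start scripts use SSH to launch the daemons. Once HDFS is running, the NameNode web UI is at http://localhost:50070/ by default.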

Start the HDFS daemons: $ sbin/start-dfs.sh

Stop the HDFS daemons: $ sbin/stop-dfs.sh
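After running sbin/start-dfs.sh, one way to confirm the NameNode actually came up is to poll its web UI. The sketch below is an assumption-laden helper, not part of the project: `namenode_ui_up` is a hypothetical name, and it only checks that the UI answers with HTTP 200, using the Hadoop 2.x default address mentioned above.

```python
import urllib.request


def namenode_ui_up(url: str = "http://localhost:50070/", timeout: float = 2.0) -> bool:
    """Return True if the NameNode web UI at `url` answers with HTTP 200.

    50070 is the Hadoop 2.x default port for this UI (Hadoop 3 moved it to 9870).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError/HTTPError, refused connections and timeouts all subclass OSError.
        return False


if __name__ == "__main__":
    if namenode_ui_up():
        print("NameNode web UI is up")
    else:
        print("NameNode web UI not reachable; did sbin/start-dfs.sh succeed?")
```

Catching `OSError` rather than a narrower exception keeps the check a simple yes/no: any network failure is treated as "not up".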

Run the Streaming jar with input guten, output directory guten-output2, and mapper.py and reducer.py as the map and reduce steps:

bin/hadoop jar libexec/share/hadoop/tools/sources/hadoop-streaming-2.7.2.jar \
-input /user/ahendy/guten/* \
-output /user/ahendy/guten-output2 \
-mapper /Users/ahendy/Documents/hadoopTesting/mapper.py \
-reducer /Users/ahendy/Documents/hadoopTesting/reducer.py \
-file /Users/ahendy/Documents/hadoopTesting/mapper.py \
-file /Users/ahendy/Documents/hadoopTesting/reducer.py

Note: guten is an HDFS directory containing .txt files; guten-output2 is the output directory, which Hadoop creates and which must not already exist. The mapper and reducer live on the local file system rather than in HDFS, so they must be shipped with the job using -file.
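If the job is launched from a script, building the command as an argument list keeps the flags in one place. This is a hypothetical helper (`streaming_command` is not part of the project); it reuses the paths from the command above verbatim, including the jar path, and only prints the command unless you uncomment the `subprocess.run` line.

```python
import shlex


def streaming_command(jar: str, input_path: str, output_path: str,
                      mapper: str, reducer: str) -> list[str]:
    """Build the argument list for a Streaming job whose mapper and reducer
    are local scripts, shipped to the cluster with -file."""
    return [
        "bin/hadoop", "jar", jar,
        "-input", input_path,
        "-output", output_path,  # must not exist yet
        "-mapper", mapper,
        "-reducer", reducer,
        "-file", mapper,         # ship the local scripts with the job
        "-file", reducer,
    ]


if __name__ == "__main__":
    scripts = "/Users/ahendy/Documents/hadoopTesting"
    cmd = streaming_command(
        jar="libexec/share/hadoop/tools/sources/hadoop-streaming-2.7.2.jar",
        input_path="/user/ahendy/guten/*",  # passed literally; Hadoop expands the glob in HDFS
        output_path="/user/ahendy/guten-output2",
        mapper=f"{scripts}/mapper.py",
        reducer=f"{scripts}/reducer.py",
    )
    print(shlex.join(cmd))
    # To launch it from the Hadoop install directory (needs `import subprocess`):
    # subprocess.run(cmd, check=True)
```

Passing a list (no shell) means the `*` in the input path is never expanded against the local file system; it reaches Hadoop unchanged, which is what an HDFS glob needs.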

Files can be copied from HDFS (Hadoop Distributed File System) to the local file system with:

hadoop fs -copyToLocal <hdfs-path> <local-path>

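Similarly:

hadoop fs -copyFromLocal <local-path> <hdfs-path> (copy local files into HDFS)
hadoop fs -put <local-path> <hdfs-path> (like -copyFromLocal)
hadoop fs -mkdir <hdfs-dir> (create a directory)
hadoop fs -ls <hdfs-path> (list a directory)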
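Once the output directory has been copied down with -copyToLocal, the counts can be inspected locally. The sketch below assumes the reducer writes "word<TAB>count" lines into part-* files (the usual Streaming output naming) and that the directory was copied to ./guten-output2; `load_counts` is a hypothetical name. Other files in the directory, such as the _SUCCESS marker, are ignored.

```python
import sys
from collections import Counter
from pathlib import Path


def load_counts(output_dir: str) -> Counter:
    """Merge every part-* file in a copied-down job output directory.

    Each line is assumed to be "word<TAB>count", as written by the reducer.
    """
    counts: Counter = Counter()
    for part in sorted(Path(output_dir).glob("part-*")):
        with part.open(encoding="utf-8") as fh:
            for line in fh:
                word, sep, count = line.rstrip("\n").rpartition("\t")
                if sep:  # skip blank or malformed lines
                    counts[word] += int(count)
    return counts


if __name__ == "__main__":
    # Assumed local copy location; pass a different directory as the first argument.
    directory = sys.argv[1] if len(sys.argv) > 1 else "guten-output2"
    for word, n in load_counts(directory).most_common(10):
        print(f"{word}\t{n}")
```

Summing with `+=` rather than assigning keeps the result correct even if a word were split across several part files, for example when the job runs with more than one reducer.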

Some tutorials I used to set up Hadoop:

http://zhongyaonan.com/hadoop-tutorial/setting-up-hadoop-2-6-on-mac-osx-yosemite.html
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
https://ambari.apache.org/1.2.5/installing-hadoop-using-ambari/content/ambari-kerb-2-2-2b.html
http://www.quuxlabs.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
https://hadoop.apache.org/docs/r1.2.1/streaming.html
http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/

ahendy / hadoop-canoconical-python