
hadoop-canoconical-python's Introduction

Play-Hadoop-Python

My implementation of a simple Map/Reduce program in python.

Takes words and maps each one to a value (1). The reducer then aggregates by word, summing the values to return the count of each word. I plan to change how the map/reduce steps are used, maybe by running a learning algorithm on the data or something.
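As a rough illustration of the idea (not the repo's actual mapper.py or reducer.py, which are not reproduced here), the two steps can be sketched in memory. The function names `map_words` and `reduce_counts` and the sample lines are made up for this example.

```python
from collections import defaultdict


def map_words(lines):
    """Map step: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each word."""
    counts = defaultdict(int)
    for word, value in pairs:
        counts[word] += value
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical sample input, standing in for the text files.
    sample = ["the quick brown fox", "the lazy dog"]
    print(reduce_counts(map_words(sample)))
```

Splitting on whitespace is the simplest possible tokenizer; punctuation and letter case are left untouched, so "The" and "the" would be counted separately.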

This program makes use of "Streaming" within Hadoop. Streaming feeds the input to the mapper on stdin and passes the mapper's stdout to the reducer. This allows me to write the map and reduce steps as Python files, rather than in Java as is traditional.
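The stdin/stdout contract can be tried without a cluster. The sketch below assumes the common Streaming convention of one tab-separated "key, value" pair per line, with the mapper output sorted by key before it reaches the reducer. The functions `mapper`, `reducer`, and `run_local` are illustrative names, not code from this repo, and the in-memory sort only stands in for Hadoop's shuffle.

```python
import io
from itertools import groupby


def mapper(stdin, stdout):
    # Write one "word<TAB>1" line per word, as a streaming mapper would to stdout.
    for line in stdin:
        for word in line.split():
            stdout.write(f"{word}\t1\n")


def reducer(stdin, stdout):
    # Input is assumed to be sorted by key, so equal words are adjacent
    # and groupby can sum each run of identical keys.
    rows = (line.rstrip("\n").split("\t", 1) for line in stdin)
    for word, group in groupby(rows, key=lambda row: row[0]):
        total = sum(int(value) for _, value in group)
        stdout.write(f"{word}\t{total}\n")


def run_local(text):
    # Emulate mapper -> sort -> reducer using in-memory streams.
    mapped = io.StringIO()
    mapper(io.StringIO(text), mapped)
    shuffled = "".join(sorted(mapped.getvalue().splitlines(keepends=True)))
    reduced = io.StringIO()
    reducer(io.StringIO(shuffled), reduced)
    return reduced.getvalue()


if __name__ == "__main__":
    print(run_local("b a\na b a\n"), end="")
```

In a real script the two functions would be called with `sys.stdin` and `sys.stdout`; passing the streams in as arguments just makes the logic easy to test.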

Some commands to remember:

$ cd /usr/local/Cellar/Hadoop (brew installation)

First SSH into localhost. (By default the NameNode web UI is at http://localhost:50070/)

Starting the node for the file system:
$ sbin/start-dfs.sh

Stopping the node for the file system:
$ sbin/stop-dfs.sh

Running the jar file with input guten, output dir guten-output2, mapper.py, and reducer.py, utilizing the Streaming "API":

$ bin/hadoop jar libexec/share/hadoop/tools/sources/hadoop-streaming-2.7.2.jar \
-input /user/ahendy/guten/* \
-output /user/ahendy/guten-output2 \
-mapper /Users/ahendy/Documents/hadoopTesting/mapper.py \
-reducer /Users/ahendy/Documents/hadoopTesting/reducer.py \
-file /Users/ahendy/Documents/hadoopTesting/mapper.py \
-file /Users/ahendy/Documents/hadoopTesting/reducer.py

Note: guten contains the txt input files, and guten-output2 is the output directory that the job creates (it must not already exist). The mapper and reducer are found locally rather than on HDFS, so each one needs to be shipped with the job through a -file option.
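If the job gets launched often, the argument list can be assembled in Python instead of retyped. This is only a sketch: `build_streaming_command` is a hypothetical helper, it uses only the options that appear in the command above, and it builds the list without running anything.

```python
import shlex


def build_streaming_command(jar, input_path, output_path, mapper, reducer):
    """Return the hadoop streaming invocation as an argument list."""
    return [
        "bin/hadoop", "jar", jar,
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
        # The scripts live on the local disk, so each is shipped with -file.
        "-file", mapper,
        "-file", reducer,
    ]


if __name__ == "__main__":
    cmd = build_streaming_command(
        jar="libexec/share/hadoop/tools/sources/hadoop-streaming-2.7.2.jar",
        input_path="/user/ahendy/guten/*",
        output_path="/user/ahendy/guten-output2",
        mapper="/Users/ahendy/Documents/hadoopTesting/mapper.py",
        reducer="/Users/ahendy/Documents/hadoopTesting/reducer.py",
    )
    # Print a shell-quoted version for inspection (shlex.join needs Python 3.8+).
    print(shlex.join(cmd))
```

The list could then be handed to `subprocess.run(cmd, check=True)` from the Hadoop install directory, since the `bin/hadoop` path is relative.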

Files can be copied from HDFS (Hadoop Distributed File System) to the local file system with:

hadoop fs -copyToLocal

Similarly:

hadoop fs -copyFromLocal
hadoop fs -put
hadoop fs -mkdir
hadoop fs -ls

Some tutorials I used to set up Hadoop:

http://zhongyaonan.com/hadoop-tutorial/setting-up-hadoop-2-6-on-mac-osx-yosemite.html
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
https://ambari.apache.org/1.2.5/installing-hadoop-using-ambari/content/ambari-kerb-2-2-2b.html
http://www.quuxlabs.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
https://hadoop.apache.org/docs/r1.2.1/streaming.html
http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/
