Giter Site home page Giter Site logo

giraph's Introduction

Giraph : Large-scale graph processing on Hadoop

Web and online social graphs have been rapidly growing in size and
scale during the past decade.  In 2008, Google estimated that the
number of web pages reached over a trillion.  Online social networking
and email sites, including Yahoo!, Google, Microsoft, Facebook,
LinkedIn, and Twitter, have hundreds of millions of users and are
expected to grow much more in the future.  Processing these graphs
plays a big role in relevant and personalized information for users,
such as results from a search engine or news in an online social
networking site.

Graph processing platforms to run large-scale algorithms (such as page
rank, shared connections, personalization-based popularity, etc.) have
become quite popular.  Some recent examples include Pregel and HaLoop.
For general-purpose big data computation, the map-reduce computing
model has been well adopted and the most deployed map-reduce
infrastructure is Apache Hadoop.  We have implemented a
graph-processing framework that is launched as a typical Hadoop job to
leverage existing Hadoop infrastructure, such as Amazon’s EC2.  Giraph
builds upon the graph-oriented nature of Pregel but additionally adds
fault-tolerance to the coordinator process with the use of ZooKeeper
as its centralized coordination service and is in the process of being
open-sourced.

Giraph follows the bulk-synchronous parallel model relative to graphs
where vertices can send messages to other vertices during a given
superstep.  Checkpoints are initiated by the Giraph infrastructure at
user-defined intervals and are used for automatic application restarts
when any worker in the application fails.  Any worker in the
application can act as the application coordinator and one will
automatically take over if the current application coordinator fails.

-------------------------------

Building and testing:

You will need the following:
- Java 1.6
- Maven 2 or higher

Use the maven commands to:
- compile (i.e mvn compile)
- package (i.e. mvn package)
- test (i.e. mvn test)
-- For testing, one can submit the test to a running Hadoop instance
   (i.e. mvn test -Dprop.mapred.job.tracker=localhost:50300)

-------------------------------

Running:

You will need the following:
- Hadoop 0.20.203 supported, other versions may work as well

giraph's People

Stargazers

 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.