This project forked from igrigorik/gharchive.org

GitHub Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.

Home Page: http://www.githubarchive.org

GitHub Archive

http://www.githubarchive.org

Open-source developers all over the world are working on millions of projects: writing code & documentation, fixing & submitting bugs, and so forth. GitHub Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.

Stats


Looking for the daily top new & watched repository reports? Sign up here.


GitHub provides 18 event types, which range from new commits and fork events, to opening new tickets, commenting, and adding members to a project. The activity is aggregated in hourly archives, which you can access with any HTTP client:

Query                                     Command
Activity for April 11, 2012 at 3PM PST    wget http://data.githubarchive.org/2012-04-11-15.json.gz
Activity for April 11, 2012               wget http://data.githubarchive.org/2012-04-11-{0..23}.json.gz
Activity for April 2012                   wget http://data.githubarchive.org/2012-04-{01..30}-{0..23}.json.gz

Note: timeline data is available starting February 12, 2011.
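
The same URL pattern can be generated without shell brace expansion. The following is a minimal sketch, not part of the project: the helper names `archive_urls` and `month_archive_urls` are hypothetical, and it assumes the hour in the file name is not zero-padded, as in the wget commands above.

```ruby
require 'date'

BASE_URL = 'http://data.githubarchive.org'

# Hypothetical helper: the 24 hourly archive URLs for one day.
# Assumption: hours are written without zero-padding (0..23).
def archive_urls(date)
  day = date.strftime('%Y-%m-%d')
  (0..23).map { |hour| "#{BASE_URL}/#{day}-#{hour}.json.gz" }
end

# Hypothetical helper: every hourly archive URL for one month.
# Date.new(year, month, -1) is the month's last day, so no
# nonexistent dates are requested.
def month_archive_urls(year, month)
  last_day = Date.new(year, month, -1).day
  (1..last_day).flat_map { |d| archive_urls(Date.new(year, month, d)) }
end

puts archive_urls(Date.new(2012, 4, 11)).first(3)
```

Each URL can then be passed to any HTTP client.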


Each archive contains a stream of JSON encoded GitHub events (sample), which you can process in any language. Ruby example:

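require 'open-uri'
require 'zlib'
require 'yajl'

# Download one hourly archive and decompress it in memory.
# URI.open is used because Kernel#open no longer fetches URLs on Ruby 3.0+.
gz = URI.open('http://data.githubarchive.org/2012-03-11-12.json.gz')
js = Zlib::GzipReader.new(gz).read

# Yajl yields each JSON object in the stream to the block.
Yajl::Parser.parse(js) do |event|
  print event
end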
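
As a next step after printing events, a common first analysis is to tally them by type. The sketch below uses only the Ruby standard library and runs on a small in-memory archive, so it needs no network access. It rests on two assumptions not stated in this README: every event carries a "type" key (the BigQuery example filters on a `type` column), and each JSON object sits on its own line. If an archive concatenates objects without newlines, a streaming parser such as yajl is still needed. The names `count_event_types` and `gzip_lines` are hypothetical.

```ruby
require 'json'
require 'stringio'
require 'zlib'

# Hypothetical helper: tally events by their "type" field.
# Assumption: the gzipped stream holds one JSON object per line.
def count_event_types(gz_io)
  counts = Hash.new(0)
  Zlib::GzipReader.new(gz_io).each_line do |line|
    line = line.strip
    next if line.empty?
    counts[JSON.parse(line)['type']] += 1
  end
  counts
end

# Hypothetical helper: build a gzipped in-memory "archive" for the demo.
def gzip_lines(lines)
  io = StringIO.new
  gz = Zlib::GzipWriter.new(io)
  lines.each { |line| gz.write(line + "\n") }
  gz.finish # flushes the gzip trailer but leaves the StringIO open
  io.rewind
  io
end

sample = gzip_lines([
  '{"type":"PushEvent"}',
  '{"type":"ForkEvent"}',
  '{"type":"PushEvent"}'
])
p count_event_types(sample)
```

To run it against real data, pass the IO returned by the HTTP client instead of `sample`.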

Note: example script to import data into SQLite db


The GitHub Archive dataset is also available via Google BigQuery. The JSON data is normalized and updated every hour, allowing you to run arbitrary queries and analysis over the entire dataset in seconds. To get started, log in to the BigQuery console and add the project (name: "githubarchive"), or take a look at the 03/11..05/11 snapshot of the data under "publicdata:samples":

An example query (for more, see the repository README):

/* top 100 repos for Ruby by number of pushes */
SELECT repository_name, count(repository_name) as pushes, repository_description, repository_url
FROM [githubarchive:github.timeline]
WHERE type="PushEvent"
    AND repository_language="Ruby"
    AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC('2012-04-01 00:00:00')
GROUP BY repository_name, repository_description, repository_url
ORDER BY pushes DESC
LIMIT 100
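
To make the shape of that query concrete, here is a rough in-memory equivalent in Ruby. It is an illustration only, not how BigQuery evaluates the query: it assumes rows are hashes keyed by the column names used above (`type`, `repository_name`, `repository_language`), groups by repository name alone, and leaves out the `created_at` filter for brevity. The function name `top_repos_by_pushes` is hypothetical.

```ruby
# Hypothetical in-memory version of the "top repos by pushes" query.
# Assumption: each row is a Hash keyed by the BigQuery column names.
def top_repos_by_pushes(rows, language:, limit: 100)
  rows
    .select { |r| r['type'] == 'PushEvent' && r['repository_language'] == language } # WHERE
    .group_by { |r| r['repository_name'] }                                           # GROUP BY
    .map { |name, group| [name, group.size] }                                        # count(...)
    .sort_by { |name, pushes| [-pushes, name] }                                      # ORDER BY pushes DESC
    .first(limit)                                                                    # LIMIT
end

# Made-up sample rows, for illustration only.
rows = [
  { 'type' => 'PushEvent', 'repository_name' => 'rails',   'repository_language' => 'Ruby' },
  { 'type' => 'PushEvent', 'repository_name' => 'sinatra', 'repository_language' => 'Ruby' },
  { 'type' => 'PushEvent', 'repository_name' => 'rails',   'repository_language' => 'Ruby' },
  { 'type' => 'ForkEvent', 'repository_name' => 'rails',   'repository_language' => 'Ruby' },
  { 'type' => 'PushEvent', 'repository_name' => 'django',  'repository_language' => 'Python' }
]

p top_repos_by_pushes(rows, language: 'Ruby')
```

Ties are broken by repository name here so the output order is deterministic; the SQL query leaves tie order unspecified.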

License

(MIT License) - Copyright (c) 2012 Ilya Grigorik

Contributors

adamstac, andrew, danishkhan, igrigorik, izuzak, klangner, n0rmrx, piotrsikora, soult, tsnow, utkarshkukreti
