wooodhead / githunter-scraper

This project forked from labbsr0x/githunter-scraper

The objective of this tool is to feed a Mongo database with public information about repositories hosted on GitHub, GitLab, and other providers.

License: MIT License

JavaScript 98.69% Dockerfile 0.82% Shell 0.50%

GitHunter-Scraper

GitHunter-Scraper is a tool for scraping public information about repositories hosted on GitHub, GitLab, and other providers. Scraping starts from an entry point (the trending page on GitHub/GitLab, a Mongo database, a list of an organization's members, etc.).

How to run

Run locally

After cloning this repository, run this command in a terminal:

npm install

Then run a command like this:

node githunter-scraper.js --scraperPoint trending --provider github --nodes issuesV1

scraperPoint (required): The starting point from which the script gets the repositories to be scraped. For example, trending crawls the trending tab of the GitHub Explore page.

provider (required): The provider from which the information is read.

nodes (optional): Which kind of information to read. Known nodes are: repository, issues, pulls and commits.
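To make the required/optional split concrete, here is a minimal sketch of how the three options could be read and validated. It is not taken from githunter-scraper.js: the function name parseScraperArgs and the "--name value" pairing logic are assumptions made for illustration only.

```javascript
// Hypothetical helper, not the project's actual parser.
// Reads "--name value" pairs and enforces the two required options.
function parseScraperArgs(argv) {
  const options = {};
  for (let i = 0; i < argv.length; i += 2) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (!flag.startsWith("--") || value === undefined) {
      throw new Error(`Expected a "--name value" pair, got: ${flag}`);
    }
    options[flag.slice(2)] = value;
  }
  // scraperPoint and provider are required; nodes may be omitted.
  for (const required of ["scraperPoint", "provider"]) {
    if (!options[required]) {
      throw new Error(`Missing required option --${required}`);
    }
  }
  return options;
}

// Same arguments as the command line shown above.
const exampleOptions = parseScraperArgs([
  "--scraperPoint", "trending",
  "--provider", "github",
  "--nodes", "issuesV1",
]);
console.log(exampleOptions);
```

In a real script the array would typically come from process.argv.slice(2).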

Run in Docker with Conductor

You only need to execute the script "startDocker.sh", located in the root of the application:

./startDocker.sh

Or simply run docker-compose up -d

Usage

With Conductor

You can start a workflow defined inside ./conductor/server/provisioning by sending a POST request to the conductor-server URL defined in the Docker Compose file, like this:

curl -X POST \
  http://localhost:8080/api/workflow \
  -H 'Accept: */*' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "scraper_users",
    "version": 1,
    "input":{
      "scraperPoint": "organization.members",
      "nodes": "userStats",
      "organization": "bancodobrasil",
      "provider": "github"
    }
}'
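The same request can be sent from JavaScript. The sketch below mirrors the curl call above and assumes Node 18 or later, where fetch is available globally. The function names buildWorkflowRequest and startWorkflow are hypothetical and are not part of this project or of Conductor.

```javascript
// Assumption: Conductor listens on the same URL used in the curl example.
const CONDUCTOR_URL = "http://localhost:8080/api/workflow";

// Hypothetical helper: builds the URL and fetch options for a workflow start.
function buildWorkflowRequest(name, version, input) {
  return {
    url: CONDUCTOR_URL,
    options: {
      method: "POST",
      headers: { Accept: "*/*", "Content-Type": "application/json" },
      body: JSON.stringify({ name, version, input }),
    },
  };
}

// Hypothetical helper: sends the request and returns the raw response body.
async function startWorkflow(name, version, input) {
  const { url, options } = buildWorkflowRequest(name, version, input);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Conductor responded with HTTP ${response.status}`);
  }
  return response.text();
}
```

With the containers running, startWorkflow("scraper_users", 1, { scraperPoint: "organization.members", nodes: "userStats", organization: "bancodobrasil", provider: "github" }) would send the same payload as the curl command.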

With Schellar

Alternatively, you can start a workflow by scheduling it with Schellar, like this:

curl -X POST \
  http://localhost:3000/schedule \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
    "name": "scraper-users-minute",
    "enabled": true,
    "parallelRuns": false,
    "workflowName": "scraper_users",
    "workflowVersion": "1",
    "cronString": "* * * * *",
    "workflowContext": {
      "scraperPoint": "organization.members",
      "nodes": "userStats",
      "organization": "bancodobrasil",
      "provider": "github"
    },
    "fromDate": "2019-01-01T15:04:05Z",
    "toDate": "2029-07-01T15:04:05Z"
}'
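As a rough JavaScript counterpart to the curl call above, the sketch below assembles the schedule body and checks that the date window is ordered before posting it. The helper names buildSchedule and createSchedule are hypothetical, the date check is an addition for illustration rather than something Schellar is known to require, and fetch is assumed to be the global available in Node 18 or later.

```javascript
// Assumption: Schellar listens on the same URL used in the curl example.
const SCHELLAR_URL = "http://localhost:3000/schedule";

// Hypothetical helper: assembles the schedule body with the same fields as the curl payload.
function buildSchedule({ name, workflowName, cronString, workflowContext, fromDate, toDate }) {
  // Illustrative sanity check only; unparseable dates are not rejected here.
  if (new Date(fromDate) >= new Date(toDate)) {
    throw new Error("fromDate must be earlier than toDate");
  }
  return {
    name,
    enabled: true,
    parallelRuns: false,
    workflowName,
    workflowVersion: "1",
    cronString,
    workflowContext,
    fromDate,
    toDate,
  };
}

// Hypothetical helper: posts the schedule and returns the raw response body.
async function createSchedule(schedule) {
  const response = await fetch(SCHELLAR_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json", "cache-control": "no-cache" },
    body: JSON.stringify(schedule),
  });
  if (!response.ok) {
    throw new Error(`Schellar responded with HTTP ${response.status}`);
  }
  return response.text();
}
```

Keeping the payload construction separate from the network call makes it possible to inspect or log the schedule before anything is sent.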

License

MIT

Contributors

finhaa, rafamarts, ramonfsk
