
Knesset data pipelines


Knesset data scrapers and data sync

Uses the datapackage pipelines framework to scrape Knesset data and aggregate to different data stores (PostgreSQL, Elasticsearch, Files)

Available Endpoints

Contributing

Looking to contribute? Check out the Help Wanted Issues or the Noob Friendly Issues for some ideas.

Running the full pipelines environment using docker

A note for Windows users:

Using Windows with our Docker environment is currently neither recommended nor supported; the build process fails on numerous issues. We suggest that Windows users either dual-boot to Linux or run Linux in VirtualBox (the best supported version is Ubuntu 17.04). If you wish to use Windows, do so at your own risk, and please update this README with instructions if you succeed.

Instructions for running on Ubuntu (other distros and macOS should follow a similar process):

This will provide:

  • Pipelines dashboard: http://localhost:5000/
  • PostgreSQL server, pre-populated with data: postgresql://postgres:123456@localhost:15432/postgres
  • Minio object storage: http://localhost:9000/
    • Access Key = admin
    • Secret = 12345678
  • Adminer - DB Web UI: http://localhost:18080/
    • Database Type = PostgreSQL
    • Host = db
    • Port = 5432
    • Database = postgres
    • User = postgres
    • Password = 123456
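
For example, you can connect to the pre-populated PostgreSQL server with the psql client, using the connection details listed above (a minimal sketch; it assumes the psql client is installed on the host):

# connect to the dockerized PostgreSQL exposed on host port 15432
psql postgresql://postgres:123456@localhost:15432/postgres
# once inside psql, list the available tables
\dt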

After every change in the code, rebuild and restart by running sudo bin/build.sh && sudo bin/start.sh

Installing the project locally and running tests

You should have an activated Python 3.6 virtualenv; the following procedure will work on Ubuntu 17.04:

curl -kL https://raw.github.com/saghul/pythonz/master/pythonz-install | bash
echo '[[ -s $HOME/.pythonz/etc/bashrc ]] && source $HOME/.pythonz/etc/bashrc' >> ~/.bashrc
source ~/.bashrc
sudo apt-get install build-essential zlib1g-dev libbz2-dev libssl-dev libreadline-dev libncurses5-dev libsqlite3-dev libgdbm-dev libdb-dev libexpat-dev libpcap-dev liblzma-dev libpcre3-dev
pythonz install 3.6.2
sudo pip install virtualenvwrapper
echo 'export WORKON_HOME=$HOME/.virtualenvs; export PROJECT_HOME=$HOME/Devel; source /usr/local/bin/virtualenvwrapper.sh' >> ~/.bashrc
source ~/.bashrc
cd knesset-data-pipelines
mkvirtualenv -a `pwd` -p $HOME/.pythonz/pythons/CPython-3.6.2/bin/python3.6 knesset-data-pipelines

Before running any knesset-data-pipelines script, be sure to activate the virtualenv

You can do that by running workon knesset-data-pipelines

Once you are inside a Python 3.6 virtualenv, you can run the following:

  • bin/install.sh
  • bin/test.sh

You can set some environment variables to modify behavior; see a reference in .env.example
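
For example, to load the example defaults into the current shell before running the tests (a minimal sketch; the actual variables and their meaning are documented in .env.example):

# load the example environment variables into the current shell
source .env.example
# run the tests with those values in effect
bin/test.sh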

Running the dpp cli

  • using docker: bin/dpp.sh
  • locally (from an activated virtualenv): dpp
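
For example, to list the available pipelines and then run a single one (a minimal sketch from an activated virtualenv; the pipeline id below is illustrative, use an id from the dpp listing):

# list all available pipelines and their status
dpp
# run a single pipeline by its id (illustrative id)
dpp run ./committees/kns_committee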

Run all pipelines at once

Warning: this might seriously overload your CPU, use with caution.

docker-compose up -d redis db minio
source .env.example
for PIPELINE in `dpp | tail -n+2 | cut -d" " -f2 -`; do
    dpp run "${PIPELINE}" &
done
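
Progress of the running pipelines can then be monitored from the pipelines dashboard at http://localhost:5000/.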

Debugging committee meeting protocols

You should have the committee ID and session ID of the meeting that you want to investigate.

In this example, the session ID is 284231 and the committee ID is 196.
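
A hedged sketch of one way to investigate such a meeting locally: run the relevant committees pipeline and filter its log output for the session id (the pipeline id below is illustrative; pick the actual id from the dpp listing):

# load the example environment variables
source .env.example
# run the committee protocols pipeline (illustrative id) and watch for the session id
dpp run ./committees/committee-meeting-protocols 2>&1 | grep 284231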

