data-collectors

This project aims to build a simple data pipeline using Prefect to collect data from sources such as REST APIs, FIX, WebSocket, GraphQL, and data crawlers. All collected data is stored in PostgreSQL for further analysis and is consumed by a prediction engine that forecasts price trends for trading bots. The sources cover various free public data, such as open-exchange-rates, Etherscan, crypto-exchange OHLC, stock data, etc.
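
To give a feel for what such a pipeline looks like, here is a minimal sketch of a Prefect flow that pulls exchange rates from a public REST API and writes them to PostgreSQL. It is illustrative only: the OXR_APP_ID variable, the exchange_rates table, and the use of python-dotenv are assumptions, not part of this repo.

import os
import requests
import psycopg2
from dotenv import load_dotenv
from prefect import flow, task

@task
def fetch_rates():
    # open-exchange-rates "latest" endpoint; OXR_APP_ID is a hypothetical env var
    resp = requests.get(
        "https://openexchangerates.org/api/latest.json",
        params={"app_id": os.environ["OXR_APP_ID"]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["rates"]

@task
def store_rates(rates):
    # Connect using the credentials from the .env template in the setup section
    conn = psycopg2.connect(
        host=os.environ["POSTGRES_HOST"],
        port=os.environ["POSTGRES_PORT"],
        dbname=os.environ["POSTGRES_DBNAME"],
        user=os.environ["POSTGRES_USER"],
        password=os.environ["POSTGRES_PWD"],
    )
    with conn, conn.cursor() as cur:
        # exchange_rates is an assumed table: (currency TEXT, rate NUMERIC, fetched_at TIMESTAMPTZ)
        for currency, rate in rates.items():
            cur.execute(
                "INSERT INTO exchange_rates (currency, rate, fetched_at) VALUES (%s, %s, now())",
                (currency, rate),
            )
    conn.close()

@flow
def collect_exchange_rates():
    store_rates(fetch_rates())

if __name__ == "__main__":
    load_dotenv()  # load the .env file described in the setup section below
    collect_exchange_rates()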

How to set up the project

  1. After cloning this repo, cd into the root directory and create a Python virtual env:
python3 -m venv venv
  2. Once the virtual env is set up, activate it:
source venv/bin/activate
  3. Next, install all the required packages:
pip install -r requirements.txt
  4. Next, create a .env file containing the credentials for your services (DB, API keys, etc.). Here is a template for the .env file:
POSTGRES_USER='your_username' 
POSTGRES_PWD='your_password' 
POSTGRES_PORT='5432'
POSTGRES_DBNAME='postgres'
POSTGRES_HOST='localhost'
  5. We will be using a Dockerized PostgreSQL database for storing the data. The data is persisted into a data folder in the home directory, so this folder needs to be created first:
mkdir -p ~/data/postgres
  6. Once all packages are installed and the necessary folders are created, start the Dockerized PostgreSQL database. Please ensure the .env file has been created, since those credentials are used by Docker Compose:
docker compose up -d 
  7. Now you can run the first flow, which creates all the required tables (a sketch of such an entry point follows this list):
python run_flow.py create-tables
  8. Prefect comes with a local server where you can monitor your flow activity in a dashboard. You can start the server with:
prefect server start
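
The actual contents of run_flow.py are not shown in this README; the following is a rough sketch of what a create-tables entry point might look like, assuming psycopg2 and python-dotenv, with an illustrative ohlc table rather than the project's real schema.

import os
import sys
import psycopg2
from dotenv import load_dotenv
from prefect import flow

@flow
def create_tables():
    # Schema below is illustrative; the real project defines its own tables.
    conn = psycopg2.connect(
        host=os.environ["POSTGRES_HOST"],
        port=os.environ["POSTGRES_PORT"],
        dbname=os.environ["POSTGRES_DBNAME"],
        user=os.environ["POSTGRES_USER"],
        password=os.environ["POSTGRES_PWD"],
    )
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            CREATE TABLE IF NOT EXISTS ohlc (
                symbol TEXT NOT NULL,
                ts TIMESTAMPTZ NOT NULL,
                open NUMERIC, high NUMERIC, low NUMERIC, close NUMERIC,
                PRIMARY KEY (symbol, ts)
            )
            """
        )
    conn.close()

if __name__ == "__main__":
    load_dotenv()
    # Dispatch on the CLI argument, e.g. `python run_flow.py create-tables`
    if len(sys.argv) > 1 and sys.argv[1] == "create-tables":
        create_tables()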

DISCLAIMER

All data provided here are downloaded from public APIs or community data sources. Therefore, I do not claim ownership of the data. I have included a few CSV data dumps downloaded from these sources to help initialize the database with proper data for further analysis and usage. Continuous data downloads from the APIs need to be run manually or scheduled using a scheduler. If you need additional data dumps, please visit the hosting sites; I have included the relevant links below.
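
If you prefer to schedule the downloads with Prefect itself rather than an external scheduler, recent Prefect 2.x releases can serve a flow on a cron schedule. A minimal sketch, reusing the hypothetical collect_exchange_rates flow from the earlier example:

from prefect import flow

@flow
def collect_exchange_rates():
    ...  # fetch-and-store logic as sketched earlier

if __name__ == "__main__":
    # Runs the flow hourly for as long as this process stays up (Prefect 2.x .serve API)
    collect_exchange_rates.serve(name="hourly-exchange-rates", cron="0 * * * *")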

Public Data Sources

Data Dump CSV - historical data

REST-API endpoints to download data
