Viadot

Documentation: https://dyvenia.github.io/viadot/

Source Code: https://github.com/dyvenia/viadot


A simple data ingestion library to guide data flows from some places to other places.

Getting Data from a Source

Viadot supports several API and RDBMS sources, both private and public. Of the public APIs, it currently supports the UK Carbon Intensity API, which the examples below are based on.

from viadot.sources.uk_carbon_intensity import UKCarbonIntensity
ukci = UKCarbonIntensity()
ukci.query("/intensity")
df = ukci.to_df()
df

Output:

                from                 to  forecast  actual     index
0  2021-08-10T11:00Z  2021-08-10T11:30Z       211     216  moderate

The above df is a pandas DataFrame containing the data that viadot downloaded from the UK Carbon Intensity API.
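
For orientation, a row like the one above can be built from a raw API payload with a few lines of plain Python. This is a minimal sketch, not viadot's implementation: the payload shape (a top-level "data" list with a nested "intensity" object) is an assumption based on the public Carbon Intensity API, and flatten_intensity is a hypothetical helper.

```python
import json

# Sample payload; the shape is an assumption, the values mirror the output above.
RAW = """
{"data": [{"from": "2021-08-10T11:00Z",
           "to": "2021-08-10T11:30Z",
           "intensity": {"forecast": 211, "actual": 216, "index": "moderate"}}]}
"""


def flatten_intensity(payload: dict) -> list:
    """Turn each nested record into a flat row with the columns shown above."""
    rows = []
    for record in payload["data"]:
        row = {"from": record["from"], "to": record["to"]}
        row.update(record["intensity"])  # adds forecast, actual, index
        rows.append(row)
    return rows


rows = flatten_intensity(json.loads(RAW))
print(rows[0])
```

A list of flat dicts like this can be passed to pandas.DataFrame(rows) to get a table with the same columns.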

Loading Data to a Source

Depending on the source, viadot provides different methods of uploading data. For instance, for SQL sources, this would be bulk inserts. For data lake sources, it would be a file upload. We also provide ready-made pipelines including data validation steps using Great Expectations.

An example of loading data into SQLite from a pandas DataFrame using the SQLiteInsert Prefect task:

from viadot.tasks import SQLiteInsert

insert_task = SQLiteInsert()
insert_task.run(table_name=TABLE_NAME, dtypes=dtypes, db_path=database_path, df=df, if_exists="replace")
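
To make the if_exists="replace" behaviour concrete, the sketch below does the equivalent by hand with Python's standard sqlite3 module: drop the table if it exists, recreate it, and bulk insert the rows. It is not how SQLiteInsert is implemented. The table name, column types, sample row, and the replace_table helper are all assumptions chosen to match the earlier example.

```python
import sqlite3

TABLE_NAME = "carbon_intensity"  # hypothetical table name
dtypes = {  # assumed column types, keyed by column name
    "from": "TEXT",
    "to": "TEXT",
    "forecast": "INTEGER",
    "actual": "INTEGER",
    "index": "TEXT",
}
rows = [("2021-08-10T11:00Z", "2021-08-10T11:30Z", 211, 216, "moderate")]


def replace_table(conn, table, dtypes, rows):
    """Drop and recreate `table`, then bulk insert `rows`."""
    # Identifiers are quoted because "from", "to" and "index" are SQL keywords.
    columns = ", ".join(f'"{name}" {sql_type}' for name, sql_type in dtypes.items())
    placeholders = ", ".join("?" for _ in dtypes)
    with conn:  # commits on success, rolls back on error
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
replace_table(conn, TABLE_NAME, dtypes, rows)
replace_table(conn, TABLE_NAME, dtypes, rows)  # a second run replaces, it does not append
count = conn.execute(f'SELECT COUNT(*) FROM "{TABLE_NAME}"').fetchone()[0]
print(count)
```

Running the insert twice still leaves a single row, which is the practical difference between replacing a table and appending to it.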

Set up

Note: If you're running on Unix, after cloning the repo, you may need to grant executable privileges to the update.sh and run.sh scripts:

sudo chmod +x viadot/docker/update.sh && \
sudo chmod +x viadot/docker/run.sh

a) user

Clone the main branch, enter the docker folder, and set up the environment:

git clone https://github.com/dyvenia/viadot.git && \
cd viadot/docker && \
./update.sh

Run the environment:

./run.sh

b) developer

Clone the dev branch, enter the docker folder, and set up the environment:

git clone -b dev https://github.com/dyvenia/viadot.git && \
cd viadot/docker && \
./update.sh -t dev

Run the environment:

./run.sh -t dev

Install the library in development mode (repeat for the viadot_jupyter_lab container if needed):

docker exec -it viadot_testing pip install -e . --user

Running tests

To run tests, log into the container and run pytest:

docker exec -it viadot_testing bash
pytest

Running flows locally

You can run the example flows from the terminal:

docker exec -it viadot_testing bash
FLOW_NAME=hello_world; python -m viadot.examples.$FLOW_NAME

However, when developing, the easiest way is to use the provided Jupyter Lab container available in the browser at http://localhost:9000/.

How to contribute

  1. Fork the repository if you do not have write access
  2. Set up locally
  3. Test your changes with pytest
  4. Submit a PR. The PR should contain the following:
    • new/changed functionality
    • tests for the changes
    • changes added to CHANGELOG.md
    • any other relevant resources updated (esp. viadot/docs)

The general workflow for this repository when working from a fork:

  1. Pull before making any changes
  2. Create a new branch with
git checkout -b <name>
  3. Make your changes in the repository
  4. Stage changes with
git add <files>
  5. Commit the changes with
git commit -m <message>

Note: See our Style Guidelines for more information about commit messages and PR names

  6. Fetch the changes that may have been made upstream while you were working with
git fetch <remote> <branch>
git checkout <remote>/<branch>
  7. Push your changes to the repository using
git push origin <name>
  8. Use merge to finish your push to the repository
git checkout <where_merging_to>
git merge <branch_to_merge>

Please follow the standards and best practices used within the library (e.g., when adding tasks, see how other tasks are constructed). For any questions, please reach out to us here on GitHub.

Style guidelines

  • the code should be formatted with Black using default settings (easiest way is to use the VSCode extension)
  • commit messages should:
    • begin with an emoji
    • start with one of the following verbs, capitalized, immediately after the summary emoji: "Added", "Updated", "Removed", "Fixed", "Renamed", and, sporadically, other ones, such as "Upgraded", "Downgraded", or whatever you find relevant for your particular situation
    • contain a useful description of what the commit is doing
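
The commit message rules above can be checked mechanically. The following is a minimal sketch, not a tool shipped with viadot: check_commit_message and starts_with_emoji are hypothetical helpers, and the emoji test is a rough heuristic (a non-ASCII first character or a :shortcode: token). Because the guidelines allow other verbs "sporadically", a failed check is better treated as a warning than as a hard error.

```python
import re

# Verbs named in the guidelines above.
RECOMMENDED_VERBS = {
    "Added", "Updated", "Removed", "Fixed", "Renamed", "Upgraded", "Downgraded",
}


def starts_with_emoji(token: str) -> bool:
    """Heuristic (assumption): non-ASCII first character or a :shortcode: token."""
    if not token:
        return False
    return ord(token[0]) > 127 or re.fullmatch(r":[a-z0-9_+-]+:", token) is not None


def check_commit_message(message: str) -> bool:
    """Return True if the summary line reads '<emoji> <Verb> <description>'."""
    summary = message.strip().split("\n", 1)[0]  # only the first line is checked
    parts = summary.split(maxsplit=2)
    if len(parts) < 3:
        return False  # emoji, verb, and a description are all required
    emoji, verb, _description = parts
    return starts_with_emoji(emoji) and verb in RECOMMENDED_VERBS


print(check_commit_message("✨ Added SQLite source"))
print(check_commit_message("Added SQLite source"))
```

The first call passes and the second fails because the summary does not begin with an emoji.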

Set up Black for development in VSCode

Your code should be formatted with Black before you contribute. To set up Black in Visual Studio Code, follow the instructions below.

  1. Install Black in your environment by running in the terminal:
pip install black
  2. Open the settings: click the gear icon in the bottom left corner and select Settings, or press Ctrl + , (comma).
  3. Find the Format On Save setting and check the box.
  4. Find the Python Formatting Provider setting and select "black" in the drop-down list.
  5. Your code should now be formatted automatically on save.
