rust-etl's Introduction

🦀 Rust for Extract, Transform, and Load operations

Practice ETL with Rust and Polars

This repository walks you through examples of each ETL step so that you can apply Rust and Polars to these operations using a sample CSV dataset.

You will be using a sample dataset that contains wines from all over the world. Explore the wine dataset and familiarize yourself with the data before you start the ETL process.

Each example is a separate Cargo project meant to be run independently. To run an example, navigate to its project directory and run the following command:

cargo run ../../top-rated-wines.csv
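
The command above passes the CSV path as the first command-line argument. As a minimal sketch of how a program could pick that argument up (this is an assumption about how the examples work, not code taken from the repository, and `csv_path_from_args` is a hypothetical helper name), using only the standard library:

```rust
use std::path::PathBuf;

/// Returns the CSV path given as the first argument after the program name,
/// or a usage message when the argument is missing.
/// `csv_path_from_args` is a hypothetical helper, not part of the repository.
fn csv_path_from_args<I>(mut args: I) -> Result<PathBuf, String>
where
    I: Iterator<Item = String>,
{
    // The first item is the program name; keep it for the usage message.
    let program = args.next().unwrap_or_else(|| "example".to_string());
    match args.next() {
        Some(path) => Ok(PathBuf::from(path)),
        None => Err(format!("usage: {} <path-to-csv>", program)),
    }
}
```

In a real binary you would typically call it as `csv_path_from_args(std::env::args())` at the top of `main` and exit with the usage message on error.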

Lesson 1: Extracting data

In this lesson, you will learn how to read a CSV file and load it into a Polars DataFrame. You will run a few basic checks to confirm that the data loaded correctly and is in the expected format.
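
To make the idea of "checking that the data loaded correctly" concrete, here is a small standard-library sketch. It does not use Polars and is not the lesson's code: `Table` and `parse_csv` are hypothetical names, and the parser is deliberately naive (it splits on commas and does not handle quoted fields). It shows the kind of checks involved: the header is present and every row has as many fields as the header.

```rust
/// A very small in-memory table: column names plus rows of string cells.
/// Hypothetical type used only for this sketch.
#[derive(Debug, PartialEq)]
struct Table {
    columns: Vec<String>,
    rows: Vec<Vec<String>>,
}

/// Parses comma-separated text and checks that every row has as many
/// fields as the header. Quoted fields containing commas are NOT handled.
fn parse_csv(text: &str) -> Result<Table, String> {
    // Skip blank lines so a trailing newline does not count as a row.
    let mut lines = text.lines().filter(|line| !line.trim().is_empty());
    let header = lines.next().ok_or_else(|| "empty input".to_string())?;
    let columns: Vec<String> = header.split(',').map(|c| c.trim().to_string()).collect();

    let mut rows = Vec::new();
    for (index, line) in lines.enumerate() {
        let cells: Vec<String> = line.split(',').map(|c| c.trim().to_string()).collect();
        // Shape check: a row with a different field count signals a bad load.
        if cells.len() != columns.len() {
            return Err(format!(
                "row {} has {} fields, expected {}",
                index + 1,
                cells.len(),
                columns.len()
            ));
        }
        rows.push(cells);
    }
    Ok(Table { columns, rows })
}
```

With a DataFrame library, the equivalent checks are usually a look at the column names, the inferred types, and the row count right after loading.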

Lesson 2: Transforming data

In this lesson, you will learn how to transform the data by filtering out unnecessary columns and rows. You will use one-hot encoding to convert columns. There are two examples in this lesson: one applies one-hot encoding to all columns, and the other applies it to selected columns only.
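
One-hot encoding replaces a categorical column with one 0/1 column per distinct category. The following is a minimal standard-library sketch of that idea, not the Polars-based code from the lesson; `one_hot` is a hypothetical helper and the category labels in the usage note are made up.

```rust
use std::collections::BTreeSet;

/// One-hot encodes a categorical column.
/// Returns the sorted distinct categories and, for each input value,
/// a row with 1 in the position of its category and 0 elsewhere.
fn one_hot(values: &[&str]) -> (Vec<String>, Vec<Vec<u8>>) {
    // BTreeSet removes duplicates and gives a deterministic, sorted order.
    let categories: Vec<String> = values
        .iter()
        .map(|v| v.to_string())
        .collect::<BTreeSet<_>>()
        .into_iter()
        .collect();

    // For every value, emit one indicator per category.
    let encoded: Vec<Vec<u8>> = values
        .iter()
        .map(|v| {
            categories
                .iter()
                .map(|c| u8::from(c.as_str() == *v))
                .collect()
        })
        .collect();

    (categories, encoded)
}
```

For example, `one_hot(&["red", "white", "red"])` yields the categories `["red", "white"]` and the rows `[1, 0]`, `[0, 1]`, `[1, 0]`. Encoding every column this way can produce a very wide table, which is one reason to encode only selected columns.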

Lesson 3: Loading data

Finally, in this lesson you will learn how to save the transformed data to a Parquet file. Parquet is a columnar storage format designed for efficient compression and fast analytical reads.
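
To illustrate what "columnar" means, here is a small standard-library sketch that converts row-oriented records into one vector per field. It does not write Parquet and is not the lesson's code; `WineRow`, `WineColumns`, and their field names are hypothetical.

```rust
/// Row-oriented record (field names are hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct WineRow {
    name: String,
    rating: f64,
}

/// Column-oriented layout: one contiguous vector per field.
#[derive(Debug, Default, PartialEq)]
struct WineColumns {
    names: Vec<String>,
    ratings: Vec<f64>,
}

/// Transposes rows into columns.
fn to_columns(rows: &[WineRow]) -> WineColumns {
    let mut columns = WineColumns::default();
    for row in rows {
        columns.names.push(row.name.clone());
        columns.ratings.push(row.rating);
    }
    columns
}

/// Aggregating one field touches only that field's vector,
/// which is the access pattern columnar formats are built around.
fn mean_rating(columns: &WineColumns) -> Option<f64> {
    if columns.ratings.is_empty() {
        return None;
    }
    Some(columns.ratings.iter().sum::<f64>() / columns.ratings.len() as f64)
}
```

Because values of the same type sit next to each other, a columnar file can compress each column well and lets a reader skip the columns a query does not need.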

Extra challenges

  1. Verify Parquet file: You will save the transformed data into a Parquet file and then read it back to ensure that the data was saved correctly using the Load project as a reference.
  2. Add options for saving: Currently, none of the projects save the CSV back to the file system. Add an option to save the transformed data back to the file system.
  3. Add more transformations: Add more transformations to the data such as sorting, grouping, and aggregating data.
  4. Implement Schema validation: Use Polars Schema validation to ensure that the data is in the expected format before transforming it.
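
As a starting point for the third challenge, the grouping, aggregating, and sorting steps can be sketched with the standard library alone. This is not Polars code: `mean_by_group` is a hypothetical helper, and the group labels in the usage note are invented. A Polars version would express the same steps with the library's own group-by and sort operations.

```rust
use std::collections::BTreeMap;

/// Groups (key, value) pairs by key, computes the mean per group,
/// and returns the groups sorted by descending mean.
fn mean_by_group(pairs: &[(&str, f64)]) -> Vec<(String, f64)> {
    // Grouping: accumulate a running sum and count per key.
    let mut sums: BTreeMap<&str, (f64, usize)> = BTreeMap::new();
    for (key, value) in pairs {
        let entry = sums.entry(*key).or_insert((0.0, 0));
        entry.0 += *value;
        entry.1 += 1;
    }

    // Aggregating: turn each (sum, count) into a mean.
    let mut means: Vec<(String, f64)> = sums
        .into_iter()
        .map(|(key, (sum, count))| (key.to_string(), sum / count as f64))
        .collect();

    // Sorting: highest mean first. `total_cmp` gives a total order on f64
    // and requires Rust 1.62 or newer.
    means.sort_by(|a, b| b.1.total_cmp(&a.1));
    means
}
```

For example, `mean_by_group(&[("France", 90.0), ("Italy", 96.0), ("France", 94.0)])` returns `[("Italy", 96.0), ("France", 92.0)]`.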
