
dagshub / open-source-data-pipeline


A repository of machine learning projects that use DVC for data pipeline orchestration.

ai dagshub dvc dvc-pipeline hacktoberfest hacktoberfest-2023 machine-learning machinelearning mlops open-source


Open Source Data Pipeline 🐶

Welcome to DagsHub’s Data Pipeline contribution project for Hacktoberfest 2023!


In this exciting Hacktoberfest challenge, DagsHub invites you to build data pipelines using DVC for automation and versioning of Open Source Machine Learning projects.

What is DagsHub?

DagsHub is a centralized platform to host and manage machine learning projects including code, data, models, experiments, annotations, model registry, and more! DagsHub does the MLOps heavy lifting for its users. Every repository comes with configured S3 storage, an experiment tracking server, and an annotation workspace - all using popular open-source tools like MLflow, DVC, Git, and Label Studio.

What's the Challenge?

DagsHub is excited to introduce the DVC Data Pipeline Contribution Challenge. In this challenge, we invite you to contribute DVC (Data Version Control) data pipelines to open-source projects on DagsHub. DVC pipelines are essential for efficiently managing, versioning, and sharing data workflows in machine learning and data science projects.

How Can You Participate?

Here's a step-by-step guide to get involved in this challenge:

  1. Choose a Project: Explore open-source projects on DagsHub and select one that interests you. It can be any project that utilizes data pipelines or would benefit from one.
  2. Create the DVC Pipeline: Fork the project under your own account, then use DVC to design and run a data pipeline that suits the project's needs. Make sure it follows best practices for data versioning, reproducibility, and scalability.
  3. Document Your Pipeline: As you build the pipeline, maintain clear and concise documentation describing its purpose, data sources, processing steps, and any dependencies. This documentation is crucial for future users and contributors and should be added to the project’s README file.
  4. Tag Your Project: Add relevant tags to the DagsHub repository and its files, including dvc, data-pipeline, hacktoberfest, and hacktoberfest-2023.
  5. Submit Your Contribution: Open a Pull Request to the project on DagsHub.
  6. Proof of Contribution: Open a Pull Request in this repository with the README.md, dvc.yaml, and dvc.lock files and a link to the DagsHub repo.
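
As a rough illustration of steps 2 and 6, the Python sketch below writes a placeholder dvc.yaml and reports which of the three files requested for the proof-of-contribution Pull Request are still missing. The stage names, scripts, and paths (prepare, train, src/prepare.py, data/raw, and so on) and both helper functions are hypothetical and not part of the challenge; replace them with whatever fits the project you picked.

```python
from pathlib import Path

# Hypothetical two-stage pipeline. Stage names, scripts, and paths are
# placeholders (assumptions), to be replaced with the chosen project's own.
DVC_YAML_SKETCH = """\
stages:
  prepare:
    cmd: python src/prepare.py
    deps:
      - src/prepare.py
      - data/raw
    outs:
      - data/prepared
  train:
    cmd: python src/train.py
    deps:
      - src/train.py
      - data/prepared
    outs:
      - models/model.pkl
"""

# Files that step 6 asks for in the proof-of-contribution Pull Request.
REQUIRED_FILES = ("README.md", "dvc.yaml", "dvc.lock")


def write_pipeline_sketch(repo_dir):
    """Write the placeholder dvc.yaml into repo_dir and return its path."""
    path = Path(repo_dir) / "dvc.yaml"
    path.write_text(DVC_YAML_SKETCH, encoding="utf-8")
    return path


def missing_submission_files(repo_dir):
    """Return the required file names that are not present in repo_dir."""
    root = Path(repo_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Once the stages point at real scripts and data, running `dvc repro` executes the pipeline and writes dvc.lock, and `dvc dag` prints the stage graph. Calling `missing_submission_files(".")` before opening the Pull Request is one quick way to confirm nothing was left out.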

Why Join the Challenge?

Participating in the DagsHub DVC Data Pipeline Contribution Challenge offers numerous benefits:

  • Skill Enhancement: Sharpen your DVC skills and gain hands-on experience in creating robust data pipelines.
  • Collaborative Learning: Collaborate with open-source project maintainers and fellow contributors, expanding your network and knowledge.
  • Contribution to Open Source: Contribute to the open-source community by enhancing the data workflows of valuable projects.
  • Visibility: Showcase your expertise to a wider audience within the data science and machine learning community.

Contributors

nirbarazida
