WSCBS_Assignment4b_Brane_Pipeline

Introduction

This course project uses the Brane framework to implement a data processing pipeline. The pipeline is built for the Kaggle challenge "Titanic - Machine Learning from Disaster".

Note that this project is not only about machine learning and data processing. The other goal is to carry the production pipeline through from start to finish within the Brane framework.

This is an assignment for the Web Services and Cloud-Based Systems course at UvA, period 5, 2021-2022.

Structure

Our pipeline consists of four Brane packages: setup, getfeatures, trainandpredict and visualization.

The compute package

The visualization package

Among them, setup is used for data preparation. getfeatures and trainandpredict are used for computation, including the data processing and model training functions. visualization generates the corresponding figures from the processed data.

Getting-Started

Each package is included in this repository as a Git submodule. To clone the whole repository, run:

$ git clone --recurse-submodules https://github.com/TISNN/WSCBS_Assignment4b.git

To get an individual submodule, go to that package's Git repository. Code documentation and setup instructions are given in the README.md of each submodule.

Running the pipeline

After installing the Brane environment, use the Makefile to build all the Brane packages. This takes about 10 minutes.

$ make

Alternatively, users can import the packages directly with the brane import command.

$ brane import TISNN/brane-getfeatures
$ brane import TISNN/brane-trainandpredict
$ brane import TISNN/brane-visualization

The complete pipeline, implemented in BraneScript, is in pipeline.ipynb.

Testing

We created both Python unit tests and automated tests that run through GitHub Actions and BraneScript.

1. pytest

Since each package is written separately, unit tests for the core functions are necessary to ensure they execute correctly. To that end, we wrote Python scripts that test each of our functions individually. The pytest scripts are in the pytest.py file of each package.
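As a rough illustration of what such a unit test can look like, here is a minimal pytest-style sketch. The helper extract_title is hypothetical (a typical Titanic feature-engineering step), not a function taken from the packages; the real tests are in each package's pytest.py.

```python
import re


def extract_title(name: str) -> str:
    """Hypothetical feature helper: pull the honorific out of a passenger name."""
    # Titanic names look like "Surname, Title. Given names".
    match = re.search(r",\s*([^.]+)\.", name)
    return match.group(1).strip() if match else "Unknown"


def test_extract_title_finds_honorific():
    assert extract_title("Braund, Mr. Owen Harris") == "Mr"


def test_extract_title_handles_missing_honorific():
    assert extract_title("no title here") == "Unknown"
```

pytest collects functions whose names start with test_. Because the file here is named pytest.py rather than test_*.py, it would presumably be passed to pytest explicitly.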

2. Automated testing by BraneScript

A more complete test exercises the execution of the pipeline in Brane. For this, we created an automated test workflow for each Brane package using GitHub Actions.

The steps for testing include:

  1. Set up Docker, Docker Compose and Docker Buildx.
  2. Install the Brane CLI (by copying the usr/local/bin/brane file).
  3. Build the Brane package.
  4. Run the package with BraneScript.

The BraneScript is stored in test.txt and executed with the brane run command. We determine whether the task completed successfully by examining the results of the execution.
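One way to automate the check described above is sketched below, under stated assumptions: that a successful run exits with status 0 and prints some recognisable text. The helper run_and_check and the expected string are hypothetical, and this is not the check used in the project's workflow.

```python
import subprocess
import sys
from typing import Sequence


def run_and_check(command: Sequence[str], expected: str, timeout: float = 600.0) -> bool:
    """Run a command; report whether it exited cleanly and printed `expected`."""
    result = subprocess.run(
        list(command), capture_output=True, text=True, timeout=timeout
    )
    return result.returncode == 0 and expected in result.stdout


# In CI the command would presumably be ["brane", "run", "test.txt"].
# A Python one-liner stands in for it here so the sketch runs without Brane installed.
stand_in = [sys.executable, "-c", "print('placeholder result')"]
print(run_and_check(stand_in, expected="placeholder"))
```

Checking both the exit status and the output guards against a run that exits cleanly but never produces the expected result.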

With this in place, we have a complete CI/CD setup, which is part of a standard development workflow. Every time we use git push to update our code, GitHub Actions automatically tests it based on the workflow we created (.github/workflows/cicd_test.yml). For each package in this project, the BraneScript test takes about 6 minutes to complete.

Notes

Run pipelines on cluster

At the beginning of the project, we tried to run Brane directly on the cluster. Unfortunately, due to kernel and RAM issues (or other problems), we could not install the Brane environment there.

After installing Brane on another Linux machine, we took the compiled brane binary from it and uploaded it to /usr/local/bin on the cluster machine, which finally allowed us to run our packages on the cluster.

DOI

We created a DOI for each package by archiving it on Zenodo.

Contributors

tisnn, ym-xu