
RFlow - A workflow framework for agile machine learning

License: MIT License

Introduction

The Research Flow (RFlow) is a Python framework for creating Directed Acyclic Graph (DAG) workflows. The project's goal is to remove boilerplate code from common machine learning stages like data preprocessing, model fitting and evaluation.

The image below shows the graph visualization of the MNIST classification example. RFlow manages the connections between dataset parsing, training, and testing, along with their parameter values:

In that example, the training node is defined as follows:

class Train(rflow.Interface):
    # The `evaluate` function is every node's execution entry point.
    # Every argument is tracked by rflow (unless it is listed in `non_collateral`).
    # A node is executed again if a tracked argument has changed since the previous run.
    def evaluate(self, resource, train_dataset, test_dataset,
                 batch_size, test_batch_size, epochs, learning_rate=1.0, gamma=0.1,
                 device="cuda:0", log_interval=10):
        """Trains the MNIST model.
        """

        import torch
        from torch.utils.data import DataLoader
        from torch.optim.lr_scheduler import StepLR
        import torch.optim as optim

        train_loader = DataLoader(...)
        test_loader = DataLoader(...)

        model = Net().to(device)
        model.train()
        ...

        for epoch in range(1, epochs + 1):
            train(model, device, train_loader,
                  optimizer, epoch, log_interval)
            test(model, device, test_loader)

        ...

        torch.save(model.cpu(), resource.filepath)
        return model.to(device)

    # When a node is already up to date, rflow calls `load` instead of `evaluate`.
    def load(self, resource, device):
        """Loads the trained model.
        """
        import torch

        return torch.load(resource.filepath).to(device)

    def non_collateral(self):
        """Lists the arguments that don't change the node's output.
        """
        return ["device", "log_interval"]
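
The re-run rule described in the comments above can be pictured with a small, self-contained sketch. This is not rflow's implementation: `fingerprint` and `CachedNode` are hypothetical names, and keeping the last fingerprint in memory (rather than on disk) is a simplifying assumption. The sketch only shows the idea of hashing the tracked arguments, ignoring the non-collateral ones, and choosing between `evaluate` and `load`.

```python
import hashlib
import json


def fingerprint(args, non_collateral=()):
    """Hash the tracked arguments, skipping the ones listed as non-collateral."""
    tracked = {k: v for k, v in args.items() if k not in non_collateral}
    payload = json.dumps(tracked, sort_keys=True, default=repr)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedNode:
    """Hypothetical stand-in for a workflow node; not rflow's actual class."""

    def __init__(self, evaluate, load, non_collateral=()):
        self.evaluate = evaluate
        self.load = load
        self.non_collateral = tuple(non_collateral)
        self.last_fingerprint = None  # assumption: a real tool would persist this

    def call(self, **args):
        current = fingerprint(args, self.non_collateral)
        if current == self.last_fingerprint:
            # Tracked arguments are unchanged: reuse the stored output.
            return self.load()
        result = self.evaluate(**args)
        self.last_fingerprint = current
        return result


calls = []
store = {}


def evaluate(epochs, device):
    calls.append("evaluate")
    store["model"] = f"model trained for {epochs} epochs"
    return store["model"]


def load():
    calls.append("load")
    return store["model"]


node = CachedNode(evaluate, load, non_collateral=["device"])
node.call(epochs=14, device="cuda:0")  # first run: evaluate
node.call(epochs=14, device="cpu")     # only a non-collateral argument changed: load
node.call(epochs=20, device="cpu")     # a tracked argument changed: evaluate again
print(calls)  # ['evaluate', 'load', 'evaluate']
```

Listing `device` as non-collateral is what lets the second call reuse the stored model even though the device differs.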

Combined with other nodes for loading the dataset and testing, an experiment's DAG is created by a function decorated with @rflow.graph:

@rflow.graph()
def mnist_train(g):
    # Resources are output definitions. FSResources represent local files.  
    g.dataset = LoadDataset(rflow.FSResource("data"))

    g.train = Train(rflow.FSResource("model.torch"))
    with g.train as args:
        args.train_dataset = g.dataset[0]
        args.test_dataset = g.dataset[1]
        args.batch_size = 64
        args.test_batch_size = 1000
        args.epochs = 14
        args.learning_rate = 1.0
        args.gamma = 0.1
        args.device = "cuda:0"

    g.test = Test()
    with g.test as args:
        args.model = g.train
        args.test_dataset = g.dataset[1]
        args.test_batch_size = 1000
        args.device = "cuda:0"

The graph can then be executed with a shell command like the following:

$ rflow mnist_train run test

Here, test is the name of the target node. Any node in the graph can be given as the target, and the graph is run up to that node.
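
As a rough illustration of what running up to a node means, the sketch below resolves which nodes a target depends on and in what order they would have to run. It is plain Python under stated assumptions: `execution_order` is a hypothetical helper, the dictionary is a hand-written copy of the mnist_train edges, and rflow's real scheduling may differ.

```python
def execution_order(dependencies, target):
    """Return the nodes needed for `target`, dependencies first (depth-first)."""
    order = []
    visited = set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for dep in dependencies.get(node, ()):
            visit(dep)
        order.append(node)  # appended only after all of its dependencies

    visit(target)
    return order


# Each node maps to the nodes it reads from (assumed acyclic).
mnist_train = {
    "dataset": [],
    "train": ["dataset"],
    "test": ["train", "dataset"],
}

print(execution_order(mnist_train, "test"))   # ['dataset', 'train', 'test']
print(execution_order(mnist_train, "train"))  # ['dataset', 'train']
```

Targeting `train` leaves `test` out entirely, which is the behaviour the command line exposes by letting any node be the target.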

More information

Currently, the project focuses on workflows for prototype experiments, targeted at single-machine users.

This project is under development but should be usable for small projects.

Getting Started

Python >= 3.4 is required. Installing graphviz is also recommended.

$ sudo apt install graphviz

Install using pip:

$ pip install git+https://github.com/otaviog/rflow

For development setup, please refer to the CONTRIBUTING guide.

Create your first workflow:

import rflow

class CreateMessage(rflow.Interface):
    def evaluate(self, msg):
        return msg

class Print(rflow.Interface):
    def evaluate(self, msg):
        print(msg)

@rflow.graph()
def hello(g):
    g.create = CreateMessage()
    g.create.args.msg = "Hello"

    g.print = Print()
    g.print.args.msg = g.create

if __name__ == '__main__':
    rflow.command.main()

Save it as workflow.py and run it with the rflow command:

$ rflow hello run print
UPDATE  hello:print
.UPDATE  hello:create
.RUN  hello:create
.^hello:create
RUN  hello:print
Hello
^hello:print

Use the viz-dag command to visualize the DAG:

$ rflow hello viz-dag
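
For readers unfamiliar with Graphviz, the following sketch shows how a small dependency graph can be written out as DOT text, the input format that Graphviz renders. It is only an illustration: `to_dot` is a hypothetical helper, the dictionary is a hand-written copy of the hello graph, and the output of rflow's own viz-dag command may look quite different.

```python
def to_dot(name, dependencies):
    """Render a dependency mapping as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    for node in dependencies:
        lines.append(f'    "{node}";')
    for node, deps in dependencies.items():
        for dep in deps:
            # The edge points from the producer to the consumer.
            lines.append(f'    "{dep}" -> "{node}";')
    lines.append("}")
    return "\n".join(lines)


# Hand-written copy of the hello graph: print reads from create.
hello = {"create": [], "print": ["create"]}
dot_text = to_dot("hello", hello)
print(dot_text)
```

Saving the text to a file such as hello.dot and running `dot -Tpng hello.dot -o hello.png` renders it with the graphviz package installed earlier.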
