peterjproche / chexnet

This project is forked from edhenry/chexnet.

Implementation and fullstack pipeline for CheXNet classifier

License: MIT License

Languages: Python 2.51%, Dockerfile 0.14%, HTML 0.03%, Jupyter Notebook 97.32%

CheXNet Pipeline

This repository is an implementation of the CheXNet solution outlined in CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning.

Overview

The goal of this implementation and the surrounding tooling is to help anyone with a general interest in Machine Learning (ML) better understand what an example end-to-end ML pipeline might look like. All of the tools and utilities used in this implementation are open source, highlighting the open source and open science approach that I think is necessary for further adoption of ML solutions.

The pipeline is general enough that one can swap out the input and model portions to support other applications. This example is specific to computer vision; however, one can feasibly "slot in" another application using the same tooling and rough pipeline outline.
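
One way to picture the swappable input and model stages is as two plain callables composed into a pipeline. The sketch below is illustrative only: `Pipeline`, `load_input`, `predict`, and the toy stage functions are hypothetical names and do not come from this repository.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Pipeline:
    """Hypothetical two-stage pipeline: an input stage feeding a model stage."""
    load_input: Callable[[Any], Any]  # e.g. image preprocessing
    predict: Callable[[Any], Any]     # e.g. a trained classifier

    def run(self, raw: Any) -> Any:
        # Preprocess the raw input, then hand the result to the model stage.
        return self.predict(self.load_input(raw))


# A stand-in "vision" pipeline: scale pixel values, then threshold their mean.
def scale_pixels(pixels):
    return [p / 255.0 for p in pixels]


def mean_threshold(features):
    return "positive" if sum(features) / len(features) > 0.5 else "negative"


vision = Pipeline(load_input=scale_pixels, predict=mean_threshold)

# Swapping both stages gives a text pipeline with the same outline.
text = Pipeline(load_input=str.split, predict=len)
```

Only the two callables change between applications; the surrounding `run` logic stays the same, which is the sense in which another application can be "slotted in".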

A video explaining how to get the "production" pipeline up and running on a machine can be found here: https://www.youtube.com/watch?v=AQLgIxQC5g0

Tools and utilities

The pipeline consists of many different tools and utilities. I will provide an outline of each tool and utility below and what they are used for.

I will work through the list starting with data acquisition and continuing through training and deployment of a machine learning model. I will also cover the process of re-training a model and performing A/B testing on the two models to measure whether the new model performs better.
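
As a rough illustration of the comparison between two models described above, the sketch below scores both on the same labeled examples and reports whether the candidate beats the current model. The function names, the toy models, and the use of plain accuracy as the metric are assumptions for illustration; the repository may measure performance differently.

```python
from typing import Callable, Sequence


def accuracy(model: Callable[[float], int],
             inputs: Sequence[float],
             labels: Sequence[int]) -> float:
    """Fraction of inputs for which the model's prediction matches the label."""
    if not labels:
        raise ValueError("at least one labeled example is required")
    correct = sum(1 for x, y in zip(inputs, labels) if model(x) == y)
    return correct / len(labels)


def candidate_is_better(current, candidate, inputs, labels) -> bool:
    """True when the candidate scores strictly higher than the current model."""
    return accuracy(candidate, inputs, labels) > accuracy(current, inputs, labels)


# Toy stand-ins for two model versions: each thresholds a single score.
def model_a(x):
    return int(x > 0.8)


def model_b(x):
    return int(x > 0.5)


inputs = [0.2, 0.6, 0.7, 0.9]
labels = [0, 1, 1, 1]
```

In practice both models would be evaluated on the same incoming images, and a statistical test would usually be preferable to a single point comparison.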

Project Jupyter exists to develop open-source software, open standards, and services for interactive computing across dozens of programming languages. Jupyter is used within this project as a UI for uploading images to run through the model.

Pachyderm - (WIP)

Pachyderm is used for data versioning and pipelining. This project leverages Pachyderm to create pipelines that are used not only for the preprocessing required for input images, but also for A/B testing.
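
For orientation, a Pachyderm pipeline is described by a JSON spec that names the pipeline, its input repository, and the container command to run. The sketch below builds such a spec in Python. The pipeline name `preprocess`, the repository name `images`, the image `example/chexnet-preprocess`, and the script path are all placeholders, not values taken from this project.

```python
import json


def preprocessing_pipeline_spec(name: str, input_repo: str, image: str) -> dict:
    """Build a minimal Pachyderm pipeline spec as a plain dict."""
    return {
        "pipeline": {"name": name},
        "input": {
            # "/*" asks Pachyderm to treat each top-level file or directory
            # in the repo as its own datum.
            "pfs": {"repo": input_repo, "glob": "/*"},
        },
        "transform": {
            "image": image,
            # Input data is mounted under /pfs/<repo>; output goes to /pfs/out.
            # The script path is a placeholder.
            "cmd": ["python3", "/code/preprocess.py",
                    f"/pfs/{input_repo}", "/pfs/out"],
        },
    }


spec = preprocessing_pipeline_spec(
    name="preprocess",
    input_repo="images",
    image="example/chexnet-preprocess",
)
spec_json = json.dumps(spec, indent=2)
```

The resulting JSON could be saved to a file and submitted with `pachctl create pipeline -f <file>` (the exact command form depends on the Pachyderm version).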

Ansible is an automation framework that can be used to define and provision software environments. We use it to provision the tooling required for the end-to-end pipeline.

There will be another README at the root of the playbooks directory that outlines what each play accomplishes should anyone want to modify or extend the framework.

Docker is an open source container platform that is great for dependency management, especially in projects with a diverse set of tools and libraries required for production. Docker is leveraged heavily in this example environment for dependency management and shipping models to production.

Docker Compose is used to define the environments for all of the supporting services surrounding the pipeline. A docker-compose.yml can be found under each of the roles within the playbooks directory of this repository.

TensorFlow is the popular machine learning library released by Google. It is the library used for defining and training the CheXNet model.

This tool is used for tracking the training of the machine learning model.

Kafka is used as a message bus between the various services that consume the images as they are fed into the system from the UI.
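
A Kafka message value is just bytes, so an uploaded image has to be serialized before it is published. The sketch below shows one possible encoding (a JSON envelope with a base64 payload). The field names, the example filename, and the choice of encoding are assumptions, not the format this repository uses.

```python
import base64
import json
from typing import Tuple


def encode_image_message(filename: str, image_bytes: bytes) -> bytes:
    """Pack an image and its filename into a bytes payload for a message bus."""
    envelope = {
        "filename": filename,
        # base64 keeps the binary image data safe inside a JSON string.
        "data": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(envelope).encode("utf-8")


def decode_image_message(payload: bytes) -> Tuple[str, bytes]:
    """Inverse of encode_image_message: returns (filename, image_bytes)."""
    envelope = json.loads(payload.decode("utf-8"))
    return envelope["filename"], base64.b64decode(envelope["data"])


# Hypothetical upload: a filename plus the first bytes of a PNG file.
payload = encode_image_message("scan-001.png", b"\x89PNG\r\n")
```

With a client library such as kafka-python, the payload could then be published with `KafkaProducer.send(topic, value=payload)`, and each consuming service would call the decoder on the message value. The topic name is left to the deployment.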

TensorFlow Serving is a model server that will be used to serve trained models. API calls will be made against the TensorFlow Serving instance to perform inference with a trained model.
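
TensorFlow Serving exposes a REST endpoint of the form `/v1/models/<name>:predict` that accepts a JSON body with an `instances` list. The sketch below only builds such a request and does not send it. The host, the port (8501 is the usual REST default), the model name `chexnet`, and the dummy input are assumptions about the deployment rather than values from this repository.

```python
import json
import urllib.request


def build_predict_request(instances: list,
                          model_name: str = "chexnet",
                          host: str = "localhost",
                          port: int = 8501) -> urllib.request.Request:
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One dummy 2x2 single-channel "image"; a real input must match the
# shape the served model expects.
image = [[[0.0], [0.5]], [[1.0], [0.25]]]
request = build_predict_request([image])
```

Passing the request to `urllib.request.urlopen` would send it; a successful response is a JSON document with a `predictions` field.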

Contributors

edhenry, kholohan
