aporia-ai / mlops.toys

🎲 A curated list of MLOps projects, tools and resources

Home Page: https://mlops.toys

License: Creative Commons Attribution 4.0 International

JavaScript 28.69% SCSS 2.17% Vue 69.14%

data-science machine-learning awesome mlops list awesome-list

mlops.toys's Introduction

Curated list of useful MLOps projects, tools and resources.

Visit at https://mlops.toys!

Made with ❤️ by Aporia

Contribute

We'd love your help!

If we missed a project, please create an issue with the name of the project and we'll add it :) You can also directly create a pull request.

To run the project locally: npm run dev

mlops.toys's Issues

Overlapping features

Hi, I can't help but notice that many of the tools listed can fulfill more than one aspect of MLOps. Should this be broken down for each product so it's more obvious?

[Request] FuseML - Open Source AI Orchestrator

Hi there, thanks for this amazing collection of tools and platforms. I was wondering if you could also add FuseML. It's a new project that three others and I are running as an incubation project at SUSE, and we are looking for help from other contributors.
We have a micro-website (https://fuseml.github.io) and of course a GitHub repo (https://github.com/fuseml).

A brief description of the project:
FuseML allows you to re-use existing components and tools to create and manage the end-to-end ML lifecycle.
Optimize AI workload with a low-code approach and an extensible framework.
Allows teams to quickly iterate and re-use existing, well known tools for rapid experimentation and fast production releases.
It's all about Open Source, MLOps and fast delivery.

Hope this is enough. We also have a YouTube channel.

Feathr - LinkedIn's Feature Store

LinkedIn has now released its feature store as open source. It should be added to the feature store section.

https://github.com/linkedin/feathr

Datapipe

Datapipe is a real-time, incremental Python ETL library for machine learning with record-level dependency tracking.

The library is designed for describing data processing pipelines and is capable of tracking dependencies for each record in the pipeline. This ensures that tasks within the pipeline receive only the data that has been modified, thereby improving the overall efficiency of data handling.

https://datapipe.dev/

Key Features:

Incremental Processing: Datapipe processes only new or modified data, significantly reducing computation time and resource usage.
Real-time ETL: The library supports real-time data extraction, transformation, and loading.
Dependency Tracking: Automatic tracking of data dependencies and processing states.
Python Integration: Seamlessly integrates with Python applications, offering a Pythonic way to describe data pipelines.
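
The idea behind incremental processing with record-level tracking can be illustrated with a minimal sketch. This is not Datapipe's API: the `IncrementalStep` class, the `record_hash` helper, and the content-hash approach are all assumptions made for illustration. The sketch remembers a hash per record and re-runs the transform only for records that are new or whose content changed.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List


def record_hash(record: Dict[str, Any]) -> str:
    """Stable content hash of a record (keys sorted so key order does not matter)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class IncrementalStep:
    """Hypothetical pipeline step that reprocesses only new or modified records."""

    def __init__(self, transform: Callable[[Dict[str, Any]], Dict[str, Any]]) -> None:
        self.transform = transform
        self.seen_hashes: Dict[str, str] = {}          # record id -> hash at last run
        self.outputs: Dict[str, Dict[str, Any]] = {}   # record id -> last output

    def run(self, records: Dict[str, Dict[str, Any]]) -> List[str]:
        """Process changed records and return the ids that were recomputed."""
        recomputed = []
        for record_id, record in records.items():
            digest = record_hash(record)
            if self.seen_hashes.get(record_id) == digest:
                continue  # unchanged since the last run, so skip it
            self.outputs[record_id] = self.transform(record)
            self.seen_hashes[record_id] = digest
            recomputed.append(record_id)
        # Forget records that no longer exist in the input.
        for record_id in list(self.seen_hashes):
            if record_id not in records:
                del self.seen_hashes[record_id]
                del self.outputs[record_id]
        return recomputed


# Illustrative data only.
step = IncrementalStep(lambda r: {"text_upper": r["text"].upper()})
data = {"a": {"text": "hello"}, "b": {"text": "world"}}
first_run = step.run(data)       # both records are new

data["b"] = {"text": "mlops"}    # modify one record
data["c"] = {"text": "new"}      # add one record
second_run = step.run(data)      # only "b" and "c" are recomputed
```

On the second run, record "a" is skipped because its hash is unchanged, which is the saving the feature list describes.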

Ideal projects for Datapipe

Projects with complex ML pipelines with a human-in-the-loop component
ML projects that require real-time model retraining based on newly labeled data
Projects that require content moderation

GitHub

https://github.com/epoch8/datapipe – Datapipe Core

https://github.com/epoch8/datapipe-examples/ – Usage examples

Add support for multiple categories per project

Hi maintainers,
We would like a way to add projects to multiple categories, which is not possible today (I'm sure MLRun is not the only project that needs this).
I feel like this is a pretty basic need 😄

This is in the context of #9.
For now I'll add a PR with MLRun in one category.
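
One way to picture the request is a listing whose `category` field holds a comma-separated string, as in some of the listing requests on this page, and an index that places each project under every category it names. The sketch below is only an assumption about how this could work. The site's actual data schema is not shown here, and `parse_categories` and `index_by_category` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List


def parse_categories(raw: str) -> List[str]:
    """Split a comma-separated category string into clean category names."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def index_by_category(projects: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each category to the names of all projects that list it."""
    index = defaultdict(list)
    for project in projects:
        for category in parse_categories(project["category"]):
            index[category].append(project["name"])
    return dict(index)


# Illustrative entries; the field names are assumptions.
projects = [
    {"name": "MLRun", "category": "Feature Store, Model Serving, Experiment Tracking"},
    {"name": "Bodywork", "category": "Model Serving, Training Orchestration"},
]
category_index = index_by_category(projects)
```

With this shape, a project appears under each of its categories without duplicating its entry.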

New Project/Company recommendation

I'd like to suggest our company as an addition to the MLOps list.

Name: OctoML
Suggested new category: ML model optimization and acceleration

URL: https://octoml.ai/

Description:
OctoML automatically optimizes machine learning models to deliver up to 30x faster inference or prediction time, without sacrificing accuracy.

Deep learning models optimized with our open source Apache TVM technology have less user-perceived lag, maximize hardware utilization, reduce deployment costs, and are energy efficient for edge/IoT devices.

We also comprehensively benchmark customers’ models across CPU, GPU and Accelerator chips to help select the ideal hardware, balancing cost and performance.

How does OctoML speed up your machine learning predictions automatically?
Built on Apache TVM, the OctoML platform does the hard work of automatically making a model production-ready. Our technology uses machine learning to search the space of possible optimizations for a given model, freeing machine learning engineers from having to do it manually using specialized vendor/kernel libraries. It works by running experiments against the target hardware (CPU, GPU, etc.) to learn how the hardware behaves when certain automatically chosen optimizations are applied. We explore thousands to millions of permutations of a model. When the process is finished, we deliver a fast, energy-efficient, and accurate model ready to be pushed to production.

Explainer video: https://www.youtube.com/watch?v=gpO4y1mPMWA

Add TerminusDB

Suggestion to include TerminusDB in the list of MLOps tooling:

TerminusDB is an open-source graph platform and document store. It is designed for building data-intensive applications and knowledge graphs.

Quickly build versioned, bitemporal data products and give access to your domain teams
Visually construct data models that are easy to maintain and are enforced
Share your work and collaborate with colleagues
Versioning first - a full audit log with commit history. You can see how data has changed, query diffs, and roll back errors. You can also time-travel to any point in the data's history, so you always know what happened.
Data lineage - where data comes from and how it got here

Link to introductory video: https://youtu.be/RNeYYvYIZbs

Try TerminusDB

Katonic.Ai

MLOps Platform

Add Syndicai

Hey, great initiative!

Would be amazing to add Syndicai there. I already prepared a PR, so you can just have a look!

New tool to add, KitOps

Check out KitOps. We just launched it to help ease model handoffs between data scientists and app developers or DevOps folks.

https://github.com/jozu-ai/kitops

Automate your cycle of Intelligence

Katonic MLOps Platform is a collaborative platform with a Unified UI to manage all data science activities in one place and introduce MLOps practice into the production systems of customers and developers. It is a collection of cloud-native tools for all of these stages of MLOps:

- Data exploration
- Feature preparation
- Model training/tuning
- Model serving, testing and versioning

Katonic is for both data scientists and data engineers looking to build production-grade machine learning implementations and can be run either locally in your development environment or on a production cluster. Katonic provides a unified system, leveraging Kubernetes for containerization and scalability, for the portability and repeatability of its pipelines.

It would be great if you could list it.

Website: https://katonic.ai/
Attachment: Katonic One Pager.pdf

Hello? Is this abandoned? 😅

Hi folks
As I'm still in the MLOps domain, I found myself revisiting the site and repo to add more projects (from iterative.ai), but I see open PRs that are over a year old...

Is this project/website dead? It would be great to either revive or archive it (and kill the website if so) for clarity since some people may still use this to discover MLOps tools.
If you still want to keep this alive but could use assistance with PR reviews/curation every now and then, let me know.

(CC @SnirShechter @alongubkin)
Thanks!

Additional label/tag for Bodywork

Hello,

Many thanks for including Bodywork!

Is it possible to add Training Orchestration in addition to Model Serving, as we cover both in equal measure?

Many thanks,

Alex

Missing project request!

Hi,

We are making it super easy for data scientists to deploy AI at inferrd.com. We'd love to be included on your website. Can I create a pull request?

Please add Katonic.ai.

[Request] Missing project - PrimeHub

Hi there,

Please help add the PrimeHub project. Many thanks.

PrimeHub is a Kubernetes-based collaborative ML platform for teams of data scientists and administrators.

Cluster Computing
One-Click Notebook Environments
Group-centric Dataset Management / Resources Management / Access-control Management
Custom Machine Learning Environments
Model Tracking and Deployment
Capability Augmentation with 3rd-party Apps

It equips administrators with group-centric management and eases MLOps for data scientists with pluggable capabilities.

Try PrimeHub CE
Try PrimeHub

Add Kedro

I'm not 100% sure it fits your criteria, but I think Kedro would be of interest to people who land here

https://github.com/quantumblacklabs/kedro

Add Activeloop

Hey there team Aporia!

Would love for our stack to be featured on the list, but I don't think there's a good category.

Our stack comprises Activeloop Hub, our open-source dataset format for AI (it allows streaming, version control, and querying of data in a tensor-based format), as well as our platform, which helps visualize, version-control, and query image, video, and audio data and plug it into TF/PT/other frameworks.

Would you be able to point me in the right direction?

Changes to Aim listing

Hello, I would like to make the following text changes to Aim and to add a demo video to our listing https://www.youtube.com/watch?v=g_rxmOiphgw&t=303s&ab_channel=DataTalksClub. Thank you very much!

An easy-to-use & supercharged open-source experiment tracker. Aim logs your training runs, enables a beautiful UI to compare them and an API to query them programmatically.

Why use Aim?

Compare runs easily to build models faster. Group and aggregate 100s of metrics. Analyze and learn correlations. Query with easy pythonic search.

Deep dive into details of each run for easy debugging. Explore hparams, metrics, images, distributions, audio, text, etc. Track plotly and matplotlib plots. Analyze system resource usage.

Have all relevant information centralized for easy governance. Centralized dashboard to view all your runs. Use SDK to query/access tracked runs. You own your data - Aim is open source and self hosted.

Filter by open source

Hi, great idea!

Would it be possible to add a filter so that viewers can choose to display only open source projects?

Cheers,
D.
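
A filter like this could be as simple as a boolean flag on each listing plus a toggle in the UI. The sketch below assumes a hypothetical `open_source` field and a hypothetical `filter_projects` helper. Neither comes from the site's code, and "ExampleClosedTool" is a made-up entry.

```python
from typing import Any, Dict, List


def filter_projects(projects: List[Dict[str, Any]], open_source_only: bool) -> List[Dict[str, Any]]:
    """Return all projects, or only those flagged open source when the toggle is on."""
    if not open_source_only:
        return list(projects)
    # Entries without the flag are treated as not open source.
    return [p for p in projects if p.get("open_source", False)]


# Illustrative data; the `open_source` field is an assumption.
projects = [
    {"name": "lakeFS", "open_source": True},
    {"name": "MLRun", "open_source": True},
    {"name": "ExampleClosedTool", "open_source": False},  # made-up entry
    {"name": "UnlabeledTool"},                             # made-up entry without the flag
]
visible_names = [p["name"] for p in filter_projects(projects, open_source_only=True)]
```

Treating a missing flag as "not open source" keeps the filtered view conservative until every listing has been labeled.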

New Tool: Wallaroo.AI

New tool to add: Wallaroo.AI, a platform to deploy, manage, and observe any model at scale across any environment, from cloud to edge. It lets you go from a Python notebook to inference in minutes.

Add lakeFS

Would love to see lakeFS added to this cool project!

Name: lakeFS
Category: Data Versioning
URL: https://lakefs.io

Description:
lakeFS is an open-source data lake management platform that transforms your object storage into a Git-like repository. lakeFS enables you to manage your data lake the way you manage your code. Run parallel pipelines for experimentation and CI/CD for your data.

lakeFS features:

Exabyte-scale version control
Git-like operations: branch, commit, merge, revert
Zero copy branching for frictionless experiments
Full reproducibility of data and code
Pre-commit/merge hooks for data CI/CD
Instantly revert changes to data
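
To make the Git-like operations concrete, here is a toy model of branching by pointer. It is not lakeFS's implementation or API. `ToyRepo` and its methods are hypothetical, and the sketch assumes that commits are immutable maps from paths to object identifiers and that a branch is just a name pointing at a commit. Under that assumption, creating a branch copies no data, and a revert only moves a pointer.

```python
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional


class ToyRepo:
    """Toy versioned store: commits are immutable path -> object-id maps."""

    def __init__(self) -> None:
        self.commits: List[Mapping[str, str]] = [MappingProxyType({})]  # commit 0 is empty
        self.branches: Dict[str, int] = {"main": 0}                     # branch -> commit id

    def create_branch(self, name: str, source: str) -> None:
        """Zero-copy branch: only the commit pointer is copied, never the objects."""
        self.branches[name] = self.branches[source]

    def commit(self, branch: str, changes: Dict[str, str]) -> int:
        """Record new path -> object-id mappings on a branch; return the commit id."""
        snapshot = dict(self.commits[self.branches[branch]])  # copies metadata only
        snapshot.update(changes)
        self.commits.append(MappingProxyType(snapshot))
        self.branches[branch] = len(self.commits) - 1
        return self.branches[branch]

    def revert(self, branch: str, commit_id: int) -> None:
        """Point the branch back at an earlier commit."""
        self.branches[branch] = commit_id

    def read(self, branch: str, path: str) -> Optional[str]:
        """Look up which object a path refers to on a branch."""
        return self.commits[self.branches[branch]].get(path)


repo = ToyRepo()
base = repo.commit("main", {"data/train.csv": "obj-1"})
repo.create_branch("experiment", "main")
repo.commit("experiment", {"data/train.csv": "obj-2"})

main_view = repo.read("main", "data/train.csv")        # unaffected by the experiment
exp_view = repo.read("experiment", "data/train.csv")   # sees the new object
repo.revert("experiment", base)
reverted_view = repo.read("experiment", "data/train.csv")
```

The experiment branch changes what a path points to without touching `main`, and reverting restores the earlier view without rewriting any stored object.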

Getting started video: https://www.youtube.com/watch?v=xThorxDzmrw&t=5s

MarkovML - Data to Gen AI Faster

MarkovML is an easy-to-use no-code platform to understand data, streamline AI workflows, and build apps to get from data to actionable AI faster. It allows you to:

Perform Data Analysis: Quickly analyze text-based datasets in just a few clicks without using code.

No-Code Auto EDA: Unlock deep insights from your data using our Auto Data Analyzers powered by AI. Identify data gaps, outliers, and patterns to make informed modeling decisions.
Collaborative Reporting: Together, create and share comprehensive visual reports to eliminate scattered information, siloed knowledge, and disconnected communication.

Easy Organization And Discovery: Use our Intelligent Data Catalog to effortlessly organize AI data, metrics, and insights from all your ML workflows in one centralized place for seamless discovery, traceability, and lineage.

Build Hosted AI Application: Effortlessly build interactive AI & GenAI Apps from your data using a drag-n-drop interface.
Adopt GenAI With Ease: Boost your speed of innovation by streamlining Generative AI development with our no-code, intuitive drag-and-drop interface - Mizzen.
Seamlessly Versatile: Build a wide range of applications, from summarization and classification to semantic search and Q&A, with just a few clicks.
Build Confidently From Day One: Ensure robust data governance, privacy, and uncompromising security for your AI applications, freeing your team to focus on building better ones.

Automate Workflows: Create automated workflows using our intuitive no-code workflow builder.

Build Custom Workflows: Boost team productivity by effortlessly building complex data workflows from scratch with our intuitive, drag-and-drop interface.
Reusable Workflows: Save effort by using our pre-built templates or reusing and sharing your previously created workflows.
Automate Manual Data Tasks: Save time and effort by automating monotonous data tasks, auto scheduling workflows and eliminating human errors.

Click here to Sign-up for free

Adding Modzy

Name: Modzy
buttonText: Try Modzy
link: https://www.modzy.com/try-free/
category: Model Monitoring, Model Serving, Experiment Tracking, Explainability
description: >- Modzy is an MLOps platform that accelerates the deployment, integration, and monitoring of production-ready AI.

Features:

Easy model deployment and monitoring for data scientists, with drift detection and explainability
APIs and SDKs in Python, Java, JavaScript, and Go for developers to integrate AI models into any application
Support for cloud, on-premise, edge or hybrid deployments, with military-grade security

gitHubRepo: https://github.com/modzy
YouTube demo link: https://youtu.be/TluT0ZG-QRM

Terraform Provider Iterative (TPI)

Hello! We have a new MLOps tool we'd love to add to mlops.toys!

You can find the repo here: https://github.com/iterative/terraform-provider-iterative
Read the blog post: https://dvc.org/blog/terraform-provider
Watch the video: https://youtu.be/2fEgO8SazSE

Let me know if you need anything else or would like to collaborate in some way!

Add Streamlit

Streamlit seems appropriate for at least the Model Serving category.

https://streamlit.io/

Adding MLRun

name: MLRun
buttonText: Try MLRun
link: 'https://mlrun.org'
category: Feature Store, Model Monitoring, Model Serving, Training Orchestration, Experiment Tracking
description: >-
MLRun is an end-to-end open-source MLOps solution to manage and automate your entire analytics and machine learning lifecycle, from data ingestion, through model development to full pipeline deployment. MLRun eases the development of machine learning pipelines at scale and helps ML teams build a robust process for moving from the research phase to fully operational production deployments.

Feature and Artifact Store: handles the ingestion, processing, metadata, and storage of data and features across multiple repositories and technologies.
Elastic Serverless Runtimes: converts simple code to scalable and managed microservices with workload-specific runtime engines (such as Kubernetes jobs, Nuclio, Dask, Spark, and Horovod).
ML Pipeline Automation: automates data preparation, model training and testing, deployment of real-time production pipelines, and end-to-end monitoring.
Central Management: provides a unified portal for managing the entire MLOps workflow. The portal includes a UI, a CLI, and an SDK, which are accessible from anywhere.

gitHubRepoName: mlrun/mlrun
youTubeVideoId: _3mxz3zMPpw

logo:

Add support for categories (plural)

Please add 'Feathr' from LinkedIn to the list

Please add 'Feathr' from LinkedIn to the list of open source tools.