
modelfoxdotdev / modelfox

ModelFox makes it easy to train, deploy, and monitor machine learning models.

License: Other

Rust 83.77% CSS 1.56% Elixir 1.27% Go 1.43% JavaScript 0.33% TypeScript 0.76% Python 6.25% Jinja 0.01% Ruby 2.04% Nix 0.48% HTML 0.01% PHP 2.09%
rust machine-learning elixir elixir-lang go golang js javascript python python3

modelfox's People

Contributors

deciduously, isabella, joelchen, kiskoza, legneato, nitsky, tangrambot, wgoodall01

modelfox's Issues

Ruby: RuntimeError when following the docs

When following the "predict with Ruby" instructions here, it raises a RuntimeError

$ ruby test.rb
/home/vagrant/.rbenv/versions/3.0.1/lib/ruby/gems/3.0.0/gems/tangram-0.6.1/lib/tangram/tangram.rb:424:in `block in new_predict_input': tangram error (RuntimeError)
        from /home/vagrant/.rbenv/versions/3.0.1/lib/ruby/gems/3.0.0/gems/tangram-0.6.1/lib/tangram/tangram.rb:416:in `each'
        from /home/vagrant/.rbenv/versions/3.0.1/lib/ruby/gems/3.0.0/gems/tangram-0.6.1/lib/tangram/tangram.rb:416:in `new_predict_input'
        from /home/vagrant/.rbenv/versions/3.0.1/lib/ruby/gems/3.0.0/gems/tangram-0.6.1/lib/tangram/tangram.rb:406:in `block in new_predict_input_vec'
        from /home/vagrant/.rbenv/versions/3.0.1/lib/ruby/gems/3.0.0/gems/tangram-0.6.1/lib/tangram/tangram.rb:404:in `each'
        from /home/vagrant/.rbenv/versions/3.0.1/lib/ruby/gems/3.0.0/gems/tangram-0.6.1/lib/tangram/tangram.rb:404:in `new_predict_input_vec'
        from /home/vagrant/.rbenv/versions/3.0.1/lib/ruby/gems/3.0.0/gems/tangram-0.6.1/lib/tangram/tangram.rb:277:in `predict'
        from test.rb:29:in `<main>'

A couple of side notes I encountered while getting started with tangram:

  1. The DockerHub link on the installation page is wrong - missing a /r/ (fixed link)
  2. The tangram app command does not output anything. It took me a few attempts to get it to listen on the correct port, and I didn't know whether it was running at all. It would be nice if it simply stated "Server running on 0.0.0.0:8080".

  • OS: Ubuntu 18 (Vagrant on Windows)
  • Ruby: 3.0.1
  • Installed Tangram via the Ubuntu 18 instructions

Serving tangram models in the browser with WASM

I'm interested in deploying tangram models client-side via WASM. Is this something you'd be interested in supporting officially?

In any case, I'd be keen to experiment in that direction. Any pointers on how to get started?

PHP Library

This issue tracks creating a tangram library for PHP using the FFI functionality introduced in PHP 7.4.

terminated by signal SIGILL (Illegal instruction)

Given an input file input.csv

content,label
some-text,a
other-text,b

The following invocation

$ tangram --version
$ tangram train --file input.csv --target label --output output.tangram

fails with

✅ Inferring train table columns. 0ms
✅ Loading train table. 0ms
✅ Shuffling. 0ms
✅ Computing train stats. 0ms
thread panicked while panicking. aborting.
fish: 'tangram  train --file input.csv…' terminated by signal SIGILL (Illegal instruction)

Add support for missing data?

Unless I have overlooked something, it appears that tangram doesn't support missing data. Being able to handle missing data would be great :-)

Tutorial should cover `tangram predict` CLI command

I was following the tutorial, and I got to this page: https://www.tangram.dev/docs/getting_started/train

I ran tangram train --file heart_disease.csv --target diagnosis and generated my heart_disease.tangram model file - then I clicked the "Next: Make a Prediction. >" link at the bottom of the page and it took me here: https://www.tangram.dev/docs/getting_started/predict/

I'm not quite ready to spin up a development environment for one of those languages! I was hoping that I'd be able to use the CLI tool to make a prediction.

It looks like I can, with tangram predict -m heart_disease.tangram - but in order to use that I need to feed the tool a CSV file with new data in it, and the tutorial doesn't provide me with one of those.

Instead, I have to manually create a CSV from the examples in the different languages, e.g. here: https://www.tangram.dev/docs/getting_started/predict/python

It would be great if the first page of the "Make a Prediction" documentation (this page here) gave me a copy-and-paste CLI example using tangram predict, so that I could try that out before digging into the programming language demos.
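
Until the docs add such an example, here is a hedged sketch of producing that CSV with Python's csv module. The column names and values below are placeholders, not the actual heart_disease.csv schema, and the piping at the end relies on `tangram predict` reading CSV over stdin.

```python
# Hypothetical sketch: build the one-row input CSV for `tangram predict`
# from a dict of feature values. Column names here are placeholders,
# not the real heart_disease.csv schema.
import csv

def write_predict_csv(path, row):
    """Write a single-row CSV whose header matches the row's keys."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        writer.writeheader()
        writer.writerow(row)

write_predict_csv("input.csv", {"age": 63, "chest_pain": "typical angina"})
# then, assuming predict reads CSV on stdin:
#   cat input.csv | tangram predict -m heart_disease.tangram
```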

Add HTTP prediction server to the CLI

Right now it is possible to make predictions from the CLI over stdin/stdout with tangram predict, but it would be nice to add an HTTP server for serving predictions which could be started like so: tangram serve --model <MODEL_PATH>.
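
A minimal sketch of what such a server could look like, using Python's standard http.server. The `predict` stub, the route, and the JSON request/response shapes are all assumptions for illustration, not the proposed CLI's actual behavior:

```python
# Hypothetical sketch of a prediction HTTP server. `predict` is a stub
# standing in for a loaded .tangram model; the JSON shapes here are
# assumptions, not tangram's actual API.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(row):
    # Stub: a real server would run the loaded model on `row`.
    return {"className": "Positive", "probability": 0.9}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        row = json.loads(self.rfile.read(length))
        body = json.dumps(predict(row)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet

def serve(host="0.0.0.0", port=8080):
    HTTPServer((host, port), PredictHandler).serve_forever()

# serve()  # blocks; POST a JSON row, get a JSON prediction back
```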

Train without header

Is it possible to train without a header row, specifying columns by number instead?
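
Until that is supported, one workaround is to synthesize a header so the target can be named. A sketch, assuming (as the question implies) that the trainer requires a header row:

```python
# Hypothetical workaround sketch: prepend a synthetic header
# (col0, col1, ...) so the target can be referred to by name.
import csv, io

def add_header(src_lines):
    rows = list(csv.reader(src_lines))
    header = [f"col{i}" for i in range(len(rows[0]))]
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()

# add_header(["1,2,a", "3,4,b"]) yields a CSV starting with "col0,col1,col2"
# then: tangram train --file with_header.csv --target col2
```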

Create size optimized model for making predictions

Currently the .tangram file contains both the model needed to make predictions and the report information. We would like to add the ability to generate a size-optimized model file that strips the report information, so that the serialized file you incorporate into your code is smaller.

Could not find a train file at path for csv with multiline values

Given the following input file

content,label
a
,0
b,1

and the tangram invocation

tangram train --file input.csv --target label --output output.tangram

the following error is produced

🤔 Inferring train table columns. 0B / 23B 0% 0ms elapsed
[                                                                              ]
error: Could not find a train file at path: "input.csv"
   0: backtrace::capture::Backtrace::new
   1: tangram_core::train::Trainer::prepare
   2: tangram::train::train::{{closure}}
   3: tangram::main
   4: std::sys_common::backtrace::__rust_begin_short_backtrace
   5: _main


A ;-delimited multiline CSV yields a different exception

✅ Inferring train table columns. 0ms
✅ Loading train table. 0ms
✅ Shuffling. 0ms
✅ Computing train stats. 0ms
error: panicked at 'called `Result::unwrap()` on an `Err` value: Any', cli/train.rs:158:44
   0: backtrace::capture::Backtrace::new
   1: tangram::train::train::{{closure}}
   2: std::panicking::rust_panic_with_hook
   3: std::panicking::begin_panic_handler::{{closure}}
   4: std::sys_common::backtrace::__rust_end_short_backtrace
   5: _rust_begin_unwind
   6: core::panicking::panic_fmt
   7: core::option::expect_none_failed
   8: core::ptr::drop_in_place<core::option::Option<tangram::train::ProgressThread>>
   9: tangram::train::train::{{closure}}
  10: tangram::main
  11: std::sys_common::backtrace::__rust_begin_short_backtrace
  12: _main

   0: backtrace::capture::Backtrace::new
   1: tangram::main
   2: std::sys_common::backtrace::__rust_begin_short_backtrace
   3: _main
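
For reference, an embedded newline is only valid CSV inside a quoted field (RFC 4180); the input above has a raw line break with no quotes. Python's csv module, for example, quotes such fields automatically:

```python
# A multiline field is only well-formed CSV when quoted (RFC 4180).
# csv.writer emits the embedded newline inside double quotes.
import csv, io

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["content", "label"])
writer.writerow(["a\nmultiline value", 0])
writer.writerow(["b", 1])
# out.getvalue() round-trips: the newline stays inside one field
```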

Support for Multilabel Classification

Strategies under consideration:

  • use multiple binary classifiers, one for each of the labels
  • use a single multiclass classifier with the label powerset method

Email address?

Hi, I attempted to email the address listed on the pricing page today and got the following bounce. Is there a better email address for pricing inquiries?

[screenshot of the bounce message]

Allow for use of sensible defaults when empty value is present

When running my first test model, I ran into an issue where my target field was sometimes empty (signifying zero, or 'no data' for that timeframe). It was a numeric field, so it'd be nice to be able to set a value in my config.json that it defaults to when there is no data for a field.

An additional side note: the lack of a --verbose flag to tell me which row was failing made it take longer to find the actual problem. The error returned was "error: The target column contains invalid values.", which was a bit vague until I realized it meant the target I was setting via the CLI.
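
Until a default is configurable, a hypothetical preprocessing sketch in that spirit: fill empty cells in a named column before handing the CSV to `tangram train`. The column name and the default value are illustration-only:

```python
# Hypothetical sketch: replace empty cells in one column with a default
# before training. Column name and default are assumptions.
import csv, io

def fill_empty(rows_csv, column, default):
    reader = csv.DictReader(io.StringIO(rows_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row[column] == "":
            row[column] = default
        writer.writerow(row)
    return out.getvalue()

# fill_empty("target,x\n,1\n5,2\n", "target", "0") fills the blank target
```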

Support piping training data through stdin

$ cat heart_disease.csv | tangram train --target diagnosis
error: panicked at 'internal error: entered unreachable code', crates/core/train.rs:85:18
   0: backtrace::capture::Backtrace::new
   1: tangram::train::train::{{closure}}
   2: std::panicking::rust_panic_with_hook
   3: std::panicking::begin_panic_handler::{{closure}}
   4: std::sys_common::backtrace::__rust_end_short_backtrace
   5: _rust_begin_unwind
   6: core::panicking::panic_fmt
   7: core::panicking::panic
   8: tangram_core::train::Trainer::prepare
   9: tangram::main
  10: std::sys_common::backtrace::__rust_begin_short_backtrace
  11: _main

Not sure this is actually supported, but since stdin works for "tangram predict" it would be less surprising to support stdin for train as well.

This would also allow integrating training into other software without writing a temporary CSV file to disk, and make it possible to pipe compressed CSV directly (gunzip heart_disease.csv.gz | tangram train --target diagnosis) or transform CSV on the fly (e.g. tangramdotdev/tangram#35 (comment)).

Add NA to DEFAULT_INVALID_VALUES

Please add "NA" to DEFAULT_INVALID_VALUES. "NA" is used by default by most R functions (both base R and various packages) that write CSV files.
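
Until "NA" lands in DEFAULT_INVALID_VALUES, a possible workaround sketch: rewrite "NA" cells to empty strings before training. This assumes the empty string is already treated as missing, and note it would also blank a field whose legitimate value happens to be the literal string "NA":

```python
# Hypothetical workaround sketch: blank out R-style "NA" cells so the
# trainer sees them as empty. Caveat: this also blanks fields whose
# real value is the string "NA".
import csv, io

def blank_na(src):
    out = io.StringIO()
    writer = csv.writer(out)
    for row in csv.reader(io.StringIO(src)):
        writer.writerow(["" if cell == "NA" else cell for cell in row])
    return out.getvalue()
```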

Support Column Names with Spaces

Problem: Clicking a column whose name contains spaces on the "Training Stats" page returns "not found".

Add support for column names that contain spaces.

Java Library

This issue tracks adding a tangram library for Java. Ideally, the library will be compatible with other JVM languages such as Scala and Kotlin.

error: invalid IP address syntax

Error when running the tangram app: error: invalid IP address syntax
tangram version: tangram_cli_0.7.0_x86_64-unknown-linux-gnu.tar.gz

Ignoring CSV columns in input file

(Great project by the way!)

It seems like it isn't currently possible to ignore CSV columns in the input file. This would be useful for training without having to preprocess CSV files to remove things like database ID columns first.
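
Until something like this is supported in the CLI, ID-like columns can be dropped in a preprocessing step. A sketch using Python's csv module (the function name and the set of ignored columns are hypothetical):

```python
# Hypothetical preprocessing sketch: drop named columns (e.g. database
# IDs) from a CSV before passing it to `tangram train`.
import csv, io

def drop_columns(src, ignore):
    reader = csv.DictReader(io.StringIO(src))
    keep = [c for c in reader.fieldnames if c not in ignore]
    out = io.StringIO()
    # extrasaction="ignore" silently discards the dropped columns' cells
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()

# drop_columns("id,x,label\n1,2,a\n", {"id"}) keeps only x and label
```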

More informative CLI train output

Currently when training a model, the output I get is

Computing features. 0ms
Training model 2 of 8. 0s 116ms
Computing model comparison features. 0ms
Computing comparison metric. 0ms

I'd prefer to see what kind of model it's training and what the value of the comparison metric is: something similar to the app's "All Models" table under "Grid", but without having to go there. For example

Computing features. 0ms
Training model 2 of 8: Linear. 0s 116ms
Computing model comparison features. 0ms
Computing comparison metric: AUC 0.765832. 0ms
...
Finalizing the best model: Linear AUC 0.765832. 0ms

I also noticed that I can guess the final model type from the file size, but it would be nice if it were stated clearly in the output.

502 bad gateway when attempting to view model on app.tangram.xyz, or local version

[screenshot: 502 Bad Gateway]

The local version does this upon clicking upload:
[screenshot]
but the model appears in the list:
[screenshot]
and clicking it once again results in:
[screenshot]

This is the output of the local version: thread 'tokio-runtime-worker' panicked at 'called `Option::unwrap()` on a `None` value', crates/app/pages/repos/_/models/_/index/server/get.rs:435:30

The offending code: https://github.com/tangramxyz/tangram/blob/main/crates/app/pages/repos/_/models/_/index/server/get.rs#L435

Model documentation

I'm reading through the documentation on tangram.xyz but don't see any indication of what model or combination of models tangram is selecting for the data. Can you please provide some documentation on this? Thanks!

hdf5 file support

It would be great if tangram allowed hdf5 files as input in addition to csv. The CLI could determine which file format to use based on the extension of the input file, and possibly offer an additional command-line argument to force the format.
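
The extension-based dispatch described above might look like this sketch (the mapping, function name, and default are assumptions; reading hdf5 itself, e.g. via h5py, is out of scope here):

```python
# Hypothetical sketch of extension-based input format detection, with a
# default standing in for a --format override flag.
from pathlib import Path

def detect_format(path, default="csv"):
    ext = Path(path).suffix.lower().lstrip(".")
    return {"csv": "csv", "h5": "hdf5", "hdf5": "hdf5"}.get(ext, default)

# detect_format("input.h5") and detect_format("data.HDF5") both map to hdf5
```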

Support for Deno and native ES modules

The JavaScript library currently has support for Node.js and bundlers. We should be able to add support for Deno and native ES modules like so:

import * as tangram from "https://js.tangram.xyz";

Strange behavior when importing JS library

After I install via npm install @tangramdotdev/tangram
and attempt to import using let tangram = require("@tangramdotdev/tangram")
and follow the remainder of the tutorial steps

I can run the script and get a working prediction as expected. However I am noticing some strange behaviors.

A) ESLint is not happy with the import. It always gives me "Unable to resolve path to module '@tangramdotdev/tangram'. eslint(import/no-unresolved)"

B) Even though the script executes and works as expected, I cannot test any function that imports the tangram library. Jest always throws "Cannot find module '@tangramdotdev/tangram' from 'index.js'".

I've never observed this before in any other library. Any guidance? I've included some screenshots:

Working script: (Note the ESLINT error on import)
[screenshot of the working script]
Ignore blocks at line 13 and 161 - Just some data marshalling / cleaning to get it in the right shape.

Working result: (after directly uncommenting line 177 in above image to run directly)
[screenshot of the working result]

A simple jest test of above foo function:

const foo = require("./index")

test("foo", () => {
    expect(foo()).toEqual({})
})

But when I run above test:
[screenshot of the failing test output]

It's so baffling to me that I can run the script and everything works as normal, but ESLint and Jest are both throwing fits.

C# Library

This issue tracks adding a library for C#.

Unable to install tangram python library

On Ubuntu 18

$ pip3 install tangram
Collecting tangram
  Could not find a version that satisfies the requirement tangram (from versions: )
No matching distribution found for tangram

$ pip3 --version
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)

This is reproducible on two different systems (although similar OS and specs).

implement `Into<PredictInputValue>` for more types

I am getting E0277 following the advanced Rust example (https://github.com/tangramdotdev/tangram/blob/main/languages/rust/examples/advanced/main.rs):

the trait bound `PredictInputValue: From<i32>` is not satisfied
the following implementations were found:
  <PredictInputValue as From<&str>>
  <PredictInputValue as From<f32>>
  <PredictInputValue as From<f64>>
  <PredictInputValue as From<std::string::String>>
required because of the requirements on the impl of `Into<PredictInputValue>` for `i32` rustc(E0277)

There is also a mismatched types error at predict_one.
[screenshot of the mismatched types error]

% rustc -Vv
rustc 1.54.0 (a178d0322 2021-07-26)
binary: rustc
commit-hash: a178d0322ce20e33eac124758e837cbd80a6f633
commit-date: 2021-07-26
host: aarch64-apple-darwin
release: 1.54.0

Check mark shows up as tofu

This is not a big deal, but in the CLI training output, the check mark you're using shows up as tofu for me. I'm using the Cascadia Code font, which doesn't include that character. Maybe you could switch to the regular check mark, U+2713? That's in Cascadia Code and probably more common elsewhere too. (For what it's worth, I don't really like to see fancy characters in CLI output at all, but that seems to be a lost battle as it's more and more common.)
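
For reference, the two code points involved: the ✅ visible in the training output earlier in this page versus the proposed plain replacement:

```python
# U+2705 WHITE HEAVY CHECK MARK (emoji-style, missing from many
# monospace fonts such as Cascadia Code) vs U+2713 CHECK MARK (plain,
# present in most fonts).
emoji = "\u2705"  # the glyph the CLI output shows
plain = "\u2713"  # the suggested replacement
print(plain, emoji)
```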

Building from source and running

Can you please provide very straightforward instructions on building from source? The scripts aren't straightforward to use. Additionally, it would be more idiomatic for a rust repo like this to include at least one example which is easy to run, like so:

cargo run --example heart_disease

I would also suggest including the heart_disease.csv test file in your github repo so folks compiling from source can have a command like the above (or a script could do this...)

Optimizing the size of the model

Which hyperparameters are the most important ones for minimizing the size of a Gradient Boosted Tree model? From my experiments so far, it seems like min_examples_per_node and max_rounds have the biggest effect.

Allow fixing a single parameter value

I have two potential use cases where I'd like to fix a single parameter value, but otherwise get the full default parameter grid. If I understand correctly, currently I'd have to specify that full grid in quite verbose JSON, which is a bit much.

  1. For the data I have (lots of dummy variables), both linear and GBDT models do well and, perhaps depending on the downsampling stochastics, sometimes I get a linear model as best, sometimes GBDT. I'd like to fix that so that only linear models are tried (or only GBDT), because I don't want a discontinuity in production of having now a linear model, a month later GBDT, then linear again, etc.
  2. If I oversample (which I'm not currently doing), I believe I need to fix the min_examples_per_node to more than the upsampling replication count. I'd like to set that but otherwise get the default grid.

If you want to keep the CLI simple, having these via Python (#12) would be fine for me too.

panic after completing training

tangram_cli 0.5.0

invoked with

tangram train --file test.csv --target price_60 --output output.tangram
error: panicked at 'called `Option::unwrap()` on a `None` value', crates/core/train.rs:1699:65
   0: tangram::train::train::{{closure}}
   1: std::panicking::rust_panic_with_hook
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:595:17
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:495:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/sys_common/backtrace.rs:141:18
   4: rust_begin_unwind
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:493:5
   5: core::panicking::panic_fmt
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:92:14
   6: core::panicking::panic
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:50:5
   7: tangram::train::train::{{closure}}
   8: tangram::main
   9: std::sys_common::backtrace::__rust_begin_short_backtrace
  10: main
  11: __libc_start_main
  12: _start

I've attached my test data that causes this issue. Apologies it's so large (edit: maybe it's not that bad 😄); it was the smallest reproducible test set I could come up with.

Let me know if there's anything else I can provide!

test.zip

Node import issue

There seems to be an error on the Node example (https://www.tangram.xyz/docs/getting_started/predict/node).

Reproduce:

  1. run: npm install @tangramxyz/tangram
  2. create an index.js file with the provided code
  3. test and run with: node index.js
  4. Note the error below

Node-version 16.4.2

node:internal/modules/cjs/loader:1112
      throw new ERR_REQUIRE_ESM(filename, parentPath, packageJsonPath);
      ^

Error [ERR_REQUIRE_ESM]: Must use import to load ES Module: /Users/fnordell/Documents/private/machine_learning/node_modules/@tangramxyz/tangram/dist/node.js
require() of ES modules is not supported.
require() of /Users/fnordell/Documents/private/machine_learning/node_modules/@tangramxyz/tangram/dist/node.js from /Users/fnordell/Documents/private/machine_learning/index.js is an ES module file as it is a .js file whose nearest parent package.json contains "type": "module" which defines all .js files in that package scope as ES modules.
Instead rename node.js to end in .cjs, change the requiring code to use import(), or remove "type": "module" from /Users/fnordell/Documents/private/machine_learning/node_modules/@tangramxyz/tangram/package.json.

    at new NodeError (node:internal/errors:363:5)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1112:13)
    at Module.load (node:internal/modules/cjs/loader:975:32)
    at Function.Module._load (node:internal/modules/cjs/loader:816:12)
    at Module.require (node:internal/modules/cjs/loader:999:19)
    at require (node:internal/modules/cjs/helpers:93:18)
    at Object.<anonymous> (/Users/fnordell/Documents/private/machine_learning/index.js:3:17)
    at Module._compile (node:internal/modules/cjs/loader:1095:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1124:10)
    at Module.load (node:internal/modules/cjs/loader:975:32) {
  code: 'ERR_REQUIRE_ESM'
}

CLI login and upload

Right now, models can only be uploaded to the app from the browser. It would be nice to allow upload from the CLI as well. This will require implementing authentication in the CLI as well. The proposed interface is:

  1. tangram login to authenticate, optionally with --app-url <APP_URL> to support self-hosted apps.
  2. tangram upload --app-url <APP_URL> --repo-id <REPO_ID> --model <MODEL_PATH>.

Chocolatey Package

Tangram is currently packaged for Scoop on Windows, but it would be great to add a Chocolatey package as well.

Error E0658 when building on Arch Linux

I received the following error when running cargo build, with cargo version 1.53.0 and rustc 1.53.0:

Compiling tangram_table v0.6.0 (/usr/src/tangram/crates/table)
Compiling hyper v0.14.9
Compiling tracing-wasm v0.2.0
Compiling aws-creds v0.26.0
Compiling wasm-bindgen-cli-support v0.2.74
Compiling tangram_license v0.0.0 (/usr/src/tangram/crates/license)
Compiling tangram_features v0.6.0 (/usr/src/tangram/crates/features)
Compiling tangram_tree v0.6.0 (/usr/src/tangram/crates/tree)
Compiling web-sys v0.3.51
Compiling wasm-bindgen-futures v0.4.24
Compiling serde-wasm-bindgen v0.3.0
Compiling sunfish v0.2.5
Compiling hyper-rustls v0.22.1
Compiling tangram_linear v0.6.0 (/usr/src/tangram/crates/linear)
error[E0658]: arbitrary expressions in key-value attributes are unstable
--> crates/linear/lib.rs:1:10
|
1 | #![doc = include_str!("./README.md")]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: see issue #78835 rust-lang/rust#78835 for more information

error: aborting due to previous error

For more information about this error, try `rustc --explain E0658`.
error: could not compile tangram_linear

To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
error: build failed

What kind of models and training methods are used by tangram?

I am trying to find out what kind of models and training methods are used by tangram, but I struggle to find any information. The help message from tangram train --help says nothing about it. I can only see that tangram trains 8 models. On the About page I can read "... of the gradient boosted decision tree algorithm.", but this is also very rudimentary.
