bigmler's People

Contributors

aficionado, arnaudsj, cheesinglee, florent2, jaor, mmerce, osroca, petersen-poul

bigmler's Issues

when running a batch prediction, predictions.csv can be incomplete

Ran a batch prediction on 50k rows. This usually works. Got an incomplete predictions.csv file back, despite no complaints from bigmler. It exited with code 0 and appeared to work fine. The file was cut off after about 10k records, halfway through a line.

My guess is that it should catch any errors and retry.
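
A minimal sketch of the kind of check-and-retry suggested above. This is not bigmler's code: `fetch` is a hypothetical callable standing in for whatever downloads the file, and the sketch assumes the expected row count (header included) is known and that the server terminates every line with a newline.

```python
import csv


def csv_is_complete(path, expected_rows):
    """Return True if the file ends with a newline and has the expected row count."""
    with open(path, "rb") as handle:
        data = handle.read()
    # Assumption: a complete file always ends with a line terminator,
    # so a missing one means the download was cut off mid-line.
    if not data.endswith(b"\n"):
        return False
    with open(path, newline="") as handle:
        rows = sum(1 for _ in csv.reader(handle))
    return rows == expected_rows


def download_with_retry(fetch, path, expected_rows, attempts=3):
    """Call the hypothetical fetch(path) until the CSV is complete."""
    for _ in range(attempts):
        fetch(path)
        if csv_is_complete(path, expected_rows):
            return path
    raise IOError("incomplete download after %d attempts: %s" % (attempts, path))
```

Raising after the last attempt would also give a non-zero exit code instead of a silent truncation.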

KeyError: 'fields' when using bigmler analyze

Hi,
I have been using bigmler analyze with the following options:

bigmler analyze --features \
    --dataset "${DATASET_ID}" \
    --penalty 0.002 --staleness 3 --k-folds 2 \
    --predictions-csv \
    --optimize-category='True' \
    --optimize precision

And I'm getting the following error:

Creating the best features set..........
Traceback (most recent call last):
  File ".../.env/pyenv-2.7.10-default/bin/bigmler", line 11, in <module>
    sys.exit(main())
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigmler/bigmler.py", line 97, in main
    analyze_dispatcher(args=new_args)
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigmler/analyze/dispatcher.py", line 117, in analyze_dispatcher
    resume=resume)
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigmler/analyze/k_fold_cv.py", line 239, in create_features_analysis
    objective_name=objective_name, resume=resume)
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigmler/analyze/k_fold_cv.py", line 586, in best_first_search
    fields = Fields(dataset)
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigml/fields.py", line 175, in __init__
    resource_info = get_fields_structure(resource_or_fields, True)
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigml/fields.py", line 116, in get_fields_structure
    fields = resource['fields']
KeyError: 'fields'

Any idea what is causing this? If you want to reproduce this, the dataset ID is dataset/574f4a0c7e0a8d09ab0093ae.

Best,
Aurélien

can't whitelist fields AND blacklist auto-generated date fields

let's say i have a dataset with fields a, b, c. note that a is a date. i create a model excluding c using a whitelist:

--model-fields a,b

then i decide i don't want a.month or a.day-of-month which are auto-generated, so i try

--model-fields a,b,-a.day-of-month,-a.month

this doesn't work; it says there's no such field.
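
For illustration, here is one way the requested behaviour could work. It is a sketch, not how bigmler parses `--model-fields`: the function names are hypothetical, and it assumes auto-generated date fields are named `parent.child` as in the example above.

```python
def parse_field_spec(spec):
    """Split a comma-separated spec into (included, excluded) name lists."""
    included, excluded = [], []
    for token in spec.split(","):
        name = token.strip()
        if not name:
            continue
        if name.startswith("-"):
            excluded.append(name[1:])
        else:
            included.append(name)
    return included, excluded


def select_fields(spec, available):
    """Apply the whitelist first, then remove the blacklisted names."""
    included, excluded = parse_field_spec(spec)
    # Assumption: whitelisting a date field also brings in the fields
    # generated from it (a.month, a.day-of-month, ...).
    selected = [name for name in available
                if name in included or name.split(".")[0] in included]
    return [name for name in selected if name not in excluded]
```

With fields `a`, `a.year`, `a.month`, `a.day-of-month`, `b`, `c`, the spec `a,b,-a.day-of-month,-a.month` would then keep `a`, `a.year` and `b`.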

uses 3gb+ memory running predictions

[screenshot, 2013-09-13 2:41:37 am]

here you can see an ensemble running

it seems to load half the models (1gb ram), run predictions, load the next half of models (2gb), run predictions and finally combine (3gb).

every time you load new models, can't you clear references to the old ones and let them be garbage collected?
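
A rough sketch of what is being asked for, assuming a plurality vote over the ensemble. `load_model` and the `predict` method on the loaded models are hypothetical stand-ins, not bigmler's API.

```python
from collections import Counter


def predict_in_batches(model_ids, load_model, row, batch_size=10):
    """Vote over an ensemble while holding at most batch_size models in memory."""
    votes = Counter()
    for start in range(0, len(model_ids), batch_size):
        batch = [load_model(mid) for mid in model_ids[start:start + batch_size]]
        for model in batch:
            votes[model.predict(row)] += 1
        # Drop the references before the next batch is loaded, so the old
        # models can be garbage collected instead of piling up.
        del batch, model
    return votes.most_common(1)[0][0]
```

Only the vote counts survive between batches, so peak memory should stay near the size of one batch rather than growing with every load.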

can't set `ordering` when creating evaluations

we want to perform multiple evaluations using the same dataset, so we need to be able to set ordering:

Specifies the type of ordering followed to pick the instances of the dataset to evaluate the model. There are three different types that you can specify:
0 Deterministic
1 Linear
2 Random

i believe bigmler currently doesn't support this.

excluding non-existent fields with `--model-fields` blows up

If you try to run

--model-fields " -foo,-bar"

and foo doesn't exist, then bigmler raises an exception.

For exclusions, I suggest only a warning be logged.

(For inclusions, on the other hand, a missing field should still raise an exception.)
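
The suggested behaviour could look roughly like this. The function name and signature are hypothetical and not taken from bigmler.

```python
import logging

logger = logging.getLogger(__name__)


def check_field_names(included, excluded, available):
    """Raise on unknown inclusions; only warn on unknown exclusions.

    Returns the exclusions that actually exist in the dataset.
    """
    known = set(available)
    missing = [name for name in included if name not in known]
    if missing:
        raise ValueError("unknown fields to include: %s" % ", ".join(missing))
    kept = []
    for name in excluded:
        if name in known:
            kept.append(name)
        else:
            # Excluding something that is not there is harmless.
            logger.warning("ignoring exclusion of unknown field: %s", name)
    return kept
```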

`--output` seems to be ignored

the latest version of bigmler seems to ignore the --output option...

for example, I specify /tmp/foo/ensemble and it outputs to /tmp/foo/ensembles

Adding ~ expansion to paths

The options that point to paths do not include user home directory expansion or ~ and ~user. We should add it.
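
A small sketch using the standard library's `os.path.expanduser`, which handles both `~` and `~user`. The `expand_path` helper and the bare parser below are illustrative only; how bigmler actually declares its options is not shown here.

```python
import argparse
import os


def expand_path(path):
    """Expand a leading ~ or ~user; other paths are returned unchanged."""
    return os.path.expanduser(path)


# Illustrative parser: passing the helper as `type` expands the value
# at parse time for any option that points to a path.
parser = argparse.ArgumentParser()
parser.add_argument("--output", type=expand_path)
```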

`--test-split` and `--json-filter` don't work together

I've got a master dataset with lots of labels and features.

I want to manually generate a split for every label with --test-split so that I can create multiple ensembles and evaluations against it (different feature sets).

I want to use --json-filter to make sure that a split only contains labeled records - in other words, ["not",["missing", "labelX"]]. That way I can ensure that train and test really are 70%/30%.

Currently, --test-split and --json-filter don't work together - the filter is ignored.

--resources-log and --clear-logs fail in Python 3[.7]

To reproduce:

  1. Install bigmler for Python3
  2. Run
bigmler execute \
                --code '(+ 1 1)' \
                --output-dir tmp \
                --resources-log foo

or

bigmler execute \
                --code '(+ 1 1)' \
                --output-dir tmp \
                --clear-logs

Expected behavior:
Both succeed, and the first command writes to the log file foo

Actual behavior:
Fails with stack trace

Traceback (most recent call last):
  File "/usr/local/bin/bigmler", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/bigmler/bigmler.py", line 97, in main
    bd.subcommand_dispatcher(subcommand, new_args)
  File "/usr/local/lib/python3.7/site-packages/bigmler/dispatchers.py", line 55, in subcommand_dispatcher
    return globals()["%s_dispatcher" % subcommand.replace("-", "_")](args)
  File "/usr/local/lib/python3.7/site-packages/bigmler/execute/dispatcher.py", line 63, in execute_dispatcher
    execute_whizzml(command_args, api, session_file)
  File "/usr/local/lib/python3.7/site-packages/bigmler/execute/dispatcher.py", line 78, in execute_whizzml
    clear_log_files([log])
  File "/usr/local/lib/python3.7/site-packages/bigmler/dispatcher.py", line 105, in clear_log_files
    open(log_file, 'w', 0).close()
ValueError: can't have unbuffered text I/O

Suggested fix:

This line

open(log_file, 'w', 0).close()

This works in Python 2 but not Python 3, where unbuffered I/O is only allowed in binary mode. The Internet seems to suggest replacing it with

open(log_file, 'wb', 0).close()

Which works in both (tested on Python 2.7.16)
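
Another option, sketched below, is to drop the buffering argument altogether: the file is closed right away, so buffering makes no difference when truncating. The function name matches the one in the traceback, but the body is an illustration, not the actual bigmler source.

```python
def clear_log_files(log_files):
    """Truncate each log file to zero length."""
    for log_file in log_files:
        # Opening for writing truncates the file. Default buffering is
        # accepted in text mode on both Python 2 and Python 3.
        with open(log_file, "w"):
            pass
```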

Little render error in docs

Here, where you explain Flatline, there is a missing link. It reads like this

Flatline expression <https://github.com/bigmlcom/flatline>
