ludwig-ai / ludwig
Low-code framework for building custom LLMs, neural networks, and other AI models
Home Page: http://ludwig.ai
License: Apache License 2.0
I cannot find the reuters-allcats.csv file, or a link to it, on https://uber.github.io/ludwig/examples/.
I think the Examples page should provide this file, so that users can actually follow the example to try Ludwig out.
Thanks for open sourcing this project.
Would it be possible to have S3 or other object storage support? E.g., reading input data from S3 or saving the model to S3.
I know there could be a workaround using S3FS, but is this supported out of the box?
Thanks
Ludwig looks good, and I'd like to know how to use it to estimate time of arrival (ETA). Many thanks.
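ETA estimation can be framed as regression: one `numerical` output feature predicted from whatever trip attributes you have, using the same `input_features`/`output_features` layout as the model definitions elsewhere on this page. A minimal sketch, assuming a CSV with hypothetical columns `origin`, `destination`, `hour_of_day`, `distance_km` and a target `eta_minutes`; the column names and type choices are illustrative only.

```python
# Hypothetical column names; replace them with the columns in your own CSV.
CATEGORICAL_COLUMNS = ["origin", "destination"]
NUMERICAL_COLUMNS = ["hour_of_day", "distance_km"]
TARGET_COLUMN = "eta_minutes"  # travel time, predicted as a regression target

model_definition = {
    "input_features": (
        [{"name": col, "type": "category"} for col in CATEGORICAL_COLUMNS]
        + [{"name": col, "type": "numerical"} for col in NUMERICAL_COLUMNS]
    ),
    # A single numerical output makes this a regression problem.
    "output_features": [{"name": TARGET_COLUMN, "type": "numerical"}],
    "training": {"epochs": 10},
}
```

The same structure can be written out as YAML and passed with `--model_definition_file`.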
I'm new to machine learning and have managed to train Ludwig using an F1 dataset:
team,surname,position,track,year
Mercedes,Rosberg,1,Albert Park Grand Prix Circuit,2014
McLaren,Magnussen,2,Albert Park Grand Prix Circuit,2014
McLaren,Button,3,Albert Park Grand Prix Circuit,2014
...
My model is as follows:
input_features:
-
name: team
type: category
-
name: track
type: category
-
name: surname
type: category
-
name: year
type: category
output_features:
-
name: position
type: numerical
training:
epochs: 10
When I run the predict function, I am trying to get a driver/position prediction for each track. What do I need to do to make this happen? I have tried making a new data file without the position, but it falls over with a missing 'position' key, so clearly that column is needed even though it is the field I am trying to predict:
team,surname,track,year
Mercedes,Hamilton,Albert Park Grand Prix Circuit,2019
McLaren,Magnussen,Albert Park Grand Prix Circuit,2019
McLaren,Button,Albert Park Grand Prix Circuit,2019
Note the above data is truncated from the last 5 years with positions of 10th and above. I can expand this if it helps training!
When I add some positions, the prediction just returns a list of numerical values:
1.7585578
2.0917244
1.6508131
I'm assuming this is the position it would expect for that driver/track/team combination, but I'm not sure.
Can someone explain this further? Do I need a model without the 'position' key for prediction? Is there any way to tell Ludwig to use whole integers for positions, and perhaps assign each position only once per track (or per year)?
Or can I be pointed to somewhere I can learn this myself?
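Because `position` is declared as `numerical`, it is treated as a regression target, so one real number per row is what you would expect back. One way to get distinct integer positions is to rank the predictions within each track and year after predicting. A minimal post-processing sketch in plain Python, assuming the three values above line up with the three rows of the prediction file in order; `rank_positions` is a hypothetical helper, not part of Ludwig.

```python
from collections import defaultdict

def rank_positions(rows):
    """Assign distinct integer positions per (track, year) group.

    `rows` is a list of dicts with 'surname', 'track', 'year' and a float
    'predicted_position' (lower means closer to first place).
    Returns new dicts with an added integer 'rank'.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["track"], row["year"])].append(row)

    ranked = []
    for group in groups.values():
        # Sort by predicted value; ties keep input order because sorted() is stable.
        ordered = sorted(group, key=lambda r: r["predicted_position"])
        for rank, row in enumerate(ordered, start=1):
            ranked.append({**row, "rank": rank})
    return ranked

TRACK = "Albert Park Grand Prix Circuit"
rows = [
    {"surname": "Hamilton", "track": TRACK, "year": 2019, "predicted_position": 1.7585578},
    {"surname": "Magnussen", "track": TRACK, "year": 2019, "predicted_position": 2.0917244},
    {"surname": "Button", "track": TRACK, "year": 2019, "predicted_position": 1.6508131},
]
for row in rank_positions(rows):
    print(row["surname"], row["rank"])
```

Ranking guarantees each position is used once per race, which rounding the raw values alone would not.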
Hi there, thanks for launching this exciting tool! I am trying to modify the training parameters in the YAML model_definition file, and I am getting an error indicating that what should be a dictionary is a list.
YAML File
# ludwig yaml test
input_features:
-
name: image_path
type: image
encoder: stacked_cnn
output_features:
-
name: class
type: category
training:
-
epochs: 01
Stack Trace:
File "/home/mendeza/Documents/uber-ludwig/py3env/lib/python3.6/site-packages/ludwig/cli.py", line 86, in main
CLI()
File "/home/mendeza/Documents/uber-ludwig/py3env/lib/python3.6/site-packages/ludwig/cli.py", line 64, in __init__
getattr(self, args.command)()
File "/home/mendeza/Documents/uber-ludwig/py3env/lib/python3.6/site-packages/ludwig/cli.py", line 67, in experiment
experiment.cli(sys.argv[2:])
File "/home/mendeza/Documents/uber-ludwig/py3env/lib/python3.6/site-packages/ludwig/experiment.py", line 548, in cli
experiment(**vars(args))
File "/home/mendeza/Documents/uber-ludwig/py3env/lib/python3.6/site-packages/ludwig/experiment.py", line 159, in experiment
model_definition = merge_with_defaults(yaml.load(def_file))
File "/home/mendeza/Documents/uber-ludwig/py3env/lib/python3.6/site-packages/ludwig/utils/defaults.py", line 148, in merge_with_defaults
default_training_params[param])
File "/home/mendeza/Documents/uber-ludwig/py3env/lib/python3.6/site-packages/ludwig/utils/misc.py", line 136, in set_default_value
dictionary[key] = value
TypeError: list indices must be integers or slices, not str
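The trace fails in `set_default_value` at `dictionary[key] = value` with a list-index error, which indicates that `training` was loaded as a list: the `-` line under `training:` makes it a YAML sequence holding one mapping, while the defaults-merging code indexes it by key. A minimal sketch of the two shapes as Python structures, with a hypothetical `check_training_section` helper (not part of Ludwig) that reports the problem clearly:

```python
def check_training_section(model_definition):
    """Return the 'training' mapping, or raise if it is not a dict."""
    training = model_definition.get("training", {})
    if not isinstance(training, dict):
        raise TypeError(
            "'training' must be a mapping such as {'epochs': 1}, got %s"
            % type(training).__name__
        )
    return training

# Roughly what "training:\n  -\n    epochs: 1" loads as: a list holding one dict.
broken = {"training": [{"epochs": 1}]}
# Roughly what "training:\n  epochs: 1" loads as: a plain dict.
fixed = {"training": {"epochs": 1}}

print(check_training_section(fixed))
try:
    check_training_section(broken)
except TypeError as err:
    print(err)
```

In the YAML file this amounts to dropping the `-` line so that `epochs` sits directly under `training:`.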
Installed ludwig today (pip install ludwig) on Python 3.5.5 on Ubuntu 16.04 and on Python 3.6.8 on Windows 10 (version 1809). In both environments, ludwig seems to be failing to open the model file and issues the error "AttributeError: 'str' object has no attribute 'get'".
Here is my cmd line:
ludwig experiment --model_definition model_definition.yaml --data_csv e:\github\LabCoat\Client\data\cars.json
Here is the stack trace:
Traceback (most recent call last):
File "c:\anaconda3\envs\babyai\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\anaconda3\envs\babyai\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "c:\anaconda3\envs\babyai\Scripts\ludwig.exe\__main__.py", line 9, in <module>
File "c:\anaconda3\envs\babyai\lib\site-packages\ludwig\cli.py", line 86, in main
CLI()
File "c:\anaconda3\envs\babyai\lib\site-packages\ludwig\cli.py", line 64, in __init__
getattr(self, args.command)()
File "c:\anaconda3\envs\babyai\lib\site-packages\ludwig\cli.py", line 67, in experiment
experiment.cli(sys.argv[2:])
File "c:\anaconda3\envs\babyai\lib\site-packages\ludwig\experiment.py", line 548, in cli
experiment(**vars(args))
File "c:\anaconda3\envs\babyai\lib\site-packages\ludwig\experiment.py", line 161, in experiment
model_definition = merge_with_defaults(model_definition)
File "c:\anaconda3\envs\babyai\lib\site-packages\ludwig\utils\defaults.py", line 126, in merge_with_defaults
model_definition.get('preprocessing', {})
AttributeError: 'str' object has no attribute 'get'
It doesn't seem to be opening the model file (same error with existing or non-existing model file specified).
Maybe I missed something, but when trying to get the image classification example working, I always end up with the following error just after training starts:
Allocator (GPU_0_bfc) ran out of memory trying to allocate 10.99GiB. Current allocation summary follows.
No matter what images I use, whether or not I set resize_image in the input_features, and whether I use bigger or smaller width/height values, it always tries to allocate those 10.99GiB (which my GTX 1080 doesn't have :))
Even when using --gpu_fraction it still tries to allocate 10.99GiB (I tried values from 0.1 to 0.5).
I'm a TensorFlow noob, so maybe I just got something completely wrong.
Definition:
input_features:
-
name: image_path
type: image
encoder: stacked_cnn
resize_image: true
width: 28
height: 28
output_features:
-
name: class
type: category
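One way to sanity-check the 10.99GiB request is to estimate the size of a single float32 input batch. A minimal sketch, assuming the default `batch_size` of 128 shown in the training defaults logged elsewhere on this page and 3-channel images; it ignores activations, weights and optimizer state, so real usage is higher, but it gives the order of magnitude to expect. `input_batch_bytes` is a hypothetical helper.

```python
def input_batch_bytes(batch_size, height, width, channels=3, bytes_per_value=4):
    """Bytes needed to hold one batch of images as float32 (4 bytes per value)."""
    return batch_size * height * width * channels * bytes_per_value

GIB = 1024 ** 3

# 28x28 RGB inputs at batch size 128: roughly 1.15 MiB.
resized = input_batch_bytes(128, 28, 28)
print(resized, "bytes")

# Square image side length at which the input batch alone would reach 10.99 GiB.
side = int((10.99 * GIB / input_batch_bytes(128, 1, 1)) ** 0.5)
print(side)
```

If the 28x28 resize were in effect, the input batch would be about a megabyte, so a request near 11GiB suggests that either much larger inputs are reaching the model or something other than the input batch dominates memory.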
On my Mac I'm trying to use a file with a .CSV extension, such as input.CSV. Unfortunately, Ludwig fails to load such files in preprocessing, since string.replace in Python is case sensitive.
Steps to reproduce (provided you already have a trained model and a data file named prediction.CSV):
ludwig predict --data_csv prediction.CSV --model_path results/experiment_run_0/model
Expected behavior:
ludwig can read the file
Actual:
An exception is thrown:
| |_ _ __| |_ __ _(_)__ _
| | || / _` \ V V / / _` |
|_|\_,_\__,_|\_/\_/|_\__, |
|___/
ludwig v0.1.0 - Predict
Dataset type: generic
Dataset path: prediction.CSV
Model path: results/experiment_run_0/model
Output path: results_0
Found hdf5 with the same filename of the csv, using it instead
Loading metadata from: results/experiment_run_0/model/train_set_metadata.json
Loading data from: prediction.CSV
Traceback (most recent call last):
File "/Users/vood/.pyenv/versions/3.6.3/bin/ludwig", line 11, in <module>
sys.exit(main())
File "/Users/vood/.pyenv/versions/3.6.3/lib/python3.6/site-packages/ludwig/cli.py", line 86, in main
CLI()
File "/Users/vood/.pyenv/versions/3.6.3/lib/python3.6/site-packages/ludwig/cli.py", line 64, in __init__
getattr(self, args.command)()
File "/Users/vood/.pyenv/versions/3.6.3/lib/python3.6/site-packages/ludwig/cli.py", line 73, in predict
predict.cli(sys.argv[2:])
File "/Users/vood/.pyenv/versions/3.6.3/lib/python3.6/site-packages/ludwig/predict.py", line 379, in cli
full_predict(**vars(args))
File "/Users/vood/.pyenv/versions/3.6.3/lib/python3.6/site-packages/ludwig/predict.py", line 86, in full_predict
only_predictions
File "/Users/vood/.pyenv/versions/3.6.3/lib/python3.6/site-packages/ludwig/data/preprocessing.py", line 660, in preprocess_for_prediction
shuffle_training=False
File "/Users/vood/.pyenv/versions/3.6.3/lib/python3.6/site-packages/ludwig/data/preprocessing.py", line 239, in load_data
hdf5_data = h5py.File(hdf5_file_path, 'r')
File "/Users/vood/.pyenv/versions/3.6.3/lib/python3.6/site-packages/h5py/_hl/files.py", line 394, in __init__
swmr=swmr)
File "/Users/vood/.pyenv/versions/3.6.3/lib/python3.6/site-packages/h5py/_hl/files.py", line 170, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 85, in h5py.h5f.open
OSError: Unable to open file (file signature not found)
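The log line "Found hdf5 with the same filename of the csv" fits the reporter's diagnosis: if the cache path is derived with a case-sensitive replace of `csv`, then `prediction.CSV` comes back unchanged, the "hdf5" path is the CSV itself, and `h5py` fails on it. A minimal sketch of a case-insensitive alternative using `os.path.splitext`; `replace_extension` is a hypothetical helper, not Ludwig's code.

```python
import os

def replace_extension(path, new_ext):
    """Swap the final file extension regardless of case: data.CSV -> data.hdf5."""
    root, _ext = os.path.splitext(path)
    return root + "." + new_ext

# A case-sensitive replace leaves an upper-case extension untouched.
print("prediction.CSV".replace("csv", "hdf5"))
print(replace_extension("prediction.CSV", "hdf5"))
print(replace_extension("csv_data/train.csv", "json"))
```

A side benefit of `splitext` is that it only touches the final extension, so a directory such as `csv_data/` is left alone, which a plain substring replace would not guarantee.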
Hi, I'm getting the following stack trace when trying to do a simple prediction that should yield a text output containing UTF-8 text, in this case the symbol ♋ (U+264B). I believe this comes down to csv.writer (in the save_csv function in data_utils.py) not handling UTF-8 out of the box. I'm not very familiar with Python's native CSV writing, though.
Traceback (most recent call last):
File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\Scripts\ludwig-script.py", line 11, in <module>
load_entry_point('ludwig==0.1.0', 'console_scripts', 'ludwig')()
File "c:\users\xxx\appdata\local\programs\python\python36\lib\site-packages\ludwig\cli.py", line 86, in main
CLI()
File "c:\users\xxxi\appdata\local\programs\python\python36\lib\site-packages\ludwig\cli.py", line 64, in __init__
getattr(self, args.command)()
File "c:\users\xxx\appdata\local\programs\python\python36\lib\site-packages\ludwig\cli.py", line 73, in predict
predict.cli(sys.argv[2:])
File "c:\users\xxx\appdata\local\programs\python\python36\lib\site-packages\ludwig\predict.py", line 379, in cli
full_predict(**vars(args))
File "c:\users\xxx\appdata\local\programs\python\python36\lib\site-packages\ludwig\predict.py", line 120, in full_predict
save_prediction_outputs(postprocessed_output, experiment_dir_name)
File "c:\users\xxx\appdata\local\programs\python\python36\lib\site-packages\ludwig\predict.py", line 210, in save_prediction_outputs
save_csv(csv_filename.format(output_field, output_type), values)
File "c:\users\xxx\appdata\local\programs\python\python36\lib\site-packages\ludwig\utils\data_utils.py", line 60, in save_csv
writer.writerow(row)
File "c:\users\xxx\appdata\local\programs\python\python36\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u264b' in position 0: character maps to <undefined>
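The `cp1252.py` frame shows the output file was opened with the Windows default encoding, which has no mapping for U+264B. A minimal sketch of a `save_csv`-style function that opens the file explicitly as UTF-8. It reuses the name from the trace, but it is a sketch, not Ludwig's implementation.

```python
import csv

def save_csv(filename, rows):
    """Write rows to a CSV file as UTF-8 so symbols like U+264B survive on Windows."""
    # encoding="utf-8" overrides the platform default (cp1252 on many Windows
    # setups); newline="" is what the csv module docs recommend for writer files.
    with open(filename, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)
```

If the file will be opened in Excel, `encoding="utf-8-sig"` writes a byte-order mark that Excel uses to detect UTF-8.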
Hello,
My training CSV file looks like this:
mbploreto:script loretoparisi$ head -n2 /root/spam_dataset.csv
label text
HAM waiting waiting waiting waiting solitude stands by the window as someone said i tried hard to find you i found fake promises instead the thought behind to join the thought before i thought i was blind sometimes i feel i feel the way to live i thought i had strength to overcome these walls i thought i was wonderful memories keep together things now would you like to know how it feels to be always stuck in the past without any rest the thought behind to join the thought before i thought i was blind
SPAM please every body click cross
So I have my configuration as a string:
"{input_features: [{name: text, type: text}], output_features: [{name: label, type: category}]}"
and then I start training:
ludwig train --data_csv /root/spam_dataset.csv --model_definition "{input_features: [{name: text, type: text}], output_features: [{name: label, type: category}]}"
Suddenly I get this error about the text field:
_ _ _
| |_ _ __| |_ __ _(_)__ _
| | || / _` \ V V / / _` |
|_|\_,_\__,_|\_/\_/|_\__, |
|___/
ludwig v0.1.0 - Train
Experiment name: experiment
Model name: run
Output path: results/experiment_run_1
ludwig_version: '0.1.0'
command: ('ludwig train '
'--data_csv /root/spam_dataset.csv --model_definition {input_features: '
'[{name: text, type: text}], output_features: [{name: label, type: '
'category}]}')
commit_hash: '98b82b3f56c0'
dataset_type: '/root/spam_dataset.csv'
model_definition: { 'combiner': {'type': 'concat'},
'input_features': [ { 'encoder': 'parallel_cnn',
'level': 'word',
'name': 'text',
'tied_weights': None,
'type': 'text'}],
'output_features': [ { 'dependencies': [],
'loss': { 'class_distance_temperature': 0,
'class_weights': 1,
'confidence_penalty': 0,
'distortion': 1,
'labels_smoothing': 0,
'negative_samples': 0,
'robust_lambda': 0,
'sampler': None,
'type': 'softmax_cross_entropy',
'unique': False,
'weight': 1},
'name': 'label',
'reduce_dependencies': 'sum',
'reduce_input': 'sum',
'top_k': 3,
'type': 'category'}],
'preprocessing': { 'bag': { 'fill_value': '',
'format': 'space',
'lowercase': 10000,
'missing_value_strategy': 'fill_with_const',
'most_common': False},
'binary': { 'fill_value': 0,
'missing_value_strategy': 'fill_with_const'},
'category': { 'fill_value': '<UNK>',
'lowercase': False,
'missing_value_strategy': 'fill_with_const',
'most_common': 10000},
'force_split': False,
'image': {'missing_value_strategy': 'backfill'},
'numerical': { 'fill_value': 0,
'missing_value_strategy': 'fill_with_const'},
'sequence': { 'fill_value': '',
'format': 'space',
'lowercase': False,
'missing_value_strategy': 'fill_with_const',
'most_common': 20000,
'padding': 'right',
'padding_symbol': '<PAD>',
'sequence_length_limit': 256,
'unknown_symbol': '<UNK>'},
'set': { 'fill_value': '',
'format': 'space',
'lowercase': False,
'missing_value_strategy': 'fill_with_const',
'most_common': 10000},
'split_probabilities': (0.7, 0.1, 0.2),
'stratify': None,
'text': { 'char_format': 'characters',
'char_most_common': 70,
'char_sequence_length_limit': 1024,
'fill_value': '',
'lowercase': True,
'missing_value_strategy': 'fill_with_const',
'padding': 'right',
'padding_symbol': '<PAD>',
'unknown_symbol': '<UNK>',
'word_format': 'space_punct',
'word_most_common': 20000,
'word_sequence_length_limit': 256},
'timeseries': { 'fill_value': '',
'format': 'space',
'missing_value_strategy': 'fill_with_const',
'padding': 'right',
'padding_value': 0,
'timeseries_length_limit': 256}},
'training': { 'batch_size': 128,
'bucketing_field': None,
'decay': False,
'decay_rate': 0.96,
'decay_steps': 10000,
'dropout_rate': 0.0,
'early_stop': 3,
'epochs': 200,
'gradient_clipping': None,
'increase_batch_size_on_plateau': 0,
'increase_batch_size_on_plateau_max': 512,
'increase_batch_size_on_plateau_patience': 5,
'increase_batch_size_on_plateau_rate': 2,
'learning_rate': 0.001,
'learning_rate_warmup_epochs': 5,
'optimizer': { 'beta1': 0.9,
'beta2': 0.999,
'epsilon': 1e-08,
'type': 'adam'},
'reduce_learning_rate_on_plateau': 0,
'reduce_learning_rate_on_plateau_patience': 5,
'reduce_learning_rate_on_plateau_rate': 0.5,
'regularization_lambda': 0,
'regularizer': 'l2',
'staircase': False,
'validation_field': 'combined',
'validation_measure': 'loss'}}
Using full raw csv, no hdf5 and json file with the same name have been found
Building dataset (it may take a while)
Traceback (most recent call last):
File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2656, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'text'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "ludwig", line 11, in <module>
load_entry_point('ludwig==0.1.0', 'console_scripts', 'ludwig')()
File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/cli.py", line 86, in main
File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/cli.py", line 64, in __init__
File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/cli.py", line 70, in train
File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/train.py", line 663, in cli
File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/train.py", line 224, in full_train
File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/data/preprocessing.py", line 457, in preprocess_for_training
File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/data/preprocessing.py", line 62, in build_dataset
File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/data/preprocessing.py", line 83, in build_dataset_df
File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/data/preprocessing.py", line 123, in build_metadata
File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/pandas/core/frame.py", line 2927, in __getitem__
indexer = self.columns.get_loc(key)
File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2658, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'text'
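The `head` output shows `label` and `text` separated by whitespace rather than a comma. Assuming the file is tab-separated, a comma-based parser would see a single column whose name contains both words, which would explain `KeyError: 'text'`. A minimal stdlib sketch that inspects the header and rewrites a tab-separated file as comma-separated; both helper names and the file paths in the comments are hypothetical.

```python
import csv

def header_columns(path, delimiter=","):
    """Return the header fields as the given delimiter would split them."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f, delimiter=delimiter))

def tsv_to_csv(src, dst):
    """Copy a tab-separated file to a comma-separated one, quoting as needed."""
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin, delimiter="\t"):
            writer.writerow(row)

# Placeholder paths:
# header_columns("spam_dataset.csv")  # a single field suggests tabs, not commas
# tsv_to_csv("spam_dataset.csv", "spam_dataset_comma.csv")
```

`csv.writer` quotes any field that contains a comma, so free text such as the `text` column stays in one cell after conversion.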
Great project!
Given the stress on modular design, I'm wondering how hard it is to "swap-in" the pytorch library and the models.
I want to follow the examples; where can I get the datasets used in them?
Since Ludwig is running on TensorFlow, it can run on a TPU, which can be useful especially combined with the free TPU provided by Colaboratory.
Is there a way to provide the appropriate TPU-connector URL to a Ludwig model? (for both training and inference).
I get this error when trying to perform prediction with a new image on trained model:
Traceback (most recent call last):
File "/home/andrey/.venvs/ludwig-learn/bin/ludwig", line 11, in <module>
sys.exit(main())
File "/home/andrey/.venvs/ludwig-learn/lib/python3.6/site-packages/ludwig/cli.py", line 86, in main
CLI()
File "/home/andrey/.venvs/ludwig-learn/lib/python3.6/site-packages/ludwig/cli.py", line 64, in __init__
getattr(self, args.command)()
File "/home/andrey/.venvs/ludwig-learn/lib/python3.6/site-packages/ludwig/cli.py", line 73, in predict
predict.cli(sys.argv[2:])
File "/home/andrey/.venvs/ludwig-learn/lib/python3.6/site-packages/ludwig/predict.py", line 379, in cli
full_predict(**vars(args))
File "/home/andrey/.venvs/ludwig-learn/lib/python3.6/site-packages/ludwig/predict.py", line 104, in full_predict
debug
File "/home/andrey/.venvs/ludwig-learn/lib/python3.6/site-packages/ludwig/predict.py", line 173, in predict
gpu_fraction=gpu_fraction
File "/home/andrey/.venvs/ludwig-learn/lib/python3.6/site-packages/ludwig/models/model.py", line 1182, in predict
only_predictions=only_predictions
File "/home/andrey/.venvs/ludwig-learn/lib/python3.6/site-packages/ludwig/models/model.py", line 756, in batch_evaluation
is_training=is_training
File "/home/andrey/.venvs/ludwig-learn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/andrey/.venvs/ludwig-learn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1128, in _run
str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1,) for Tensor 'image_path/image_path:0', which has shape '(?, 100, 100, 3)'
The model definition I use:
input_features:
-
name: image_path
type: image
encoder: stacked_cnn
in_memory: false
height: 100
width: 100
output_features:
-
name: tags
type: set
Example of testing csv-file:
image_path,tags
testdata/cat.101.jpg
I've tried output features with category and set types but eventually get that error.
When using text features, the following error is raised:
Traceback (most recent call last):
File "/Users/user/.virtualenvs/ml/bin/ludwig", line 10, in <module>
sys.exit(main())
File "/Users/user/.virtualenvs/ml/lib/python3.6/site-packages/ludwig/cli.py", line 86, in main
CLI()
File "/Users/user/.virtualenvs/ml/lib/python3.6/site-packages/ludwig/cli.py", line 64, in __init__
getattr(self, args.command)()
File "/Users/user/.virtualenvs/ml/lib/python3.6/site-packages/ludwig/cli.py", line 70, in train
train.cli(sys.argv[2:])
File "/Users/user/.virtualenvs/ml/lib/python3.6/site-packages/ludwig/train.py", line 663, in cli
full_train(**vars(args))
File "/Users/user/.virtualenvs/ml/lib/python3.6/site-packages/ludwig/train.py", line 224, in full_train
random_seed=random_seed
File "/Users/user/.virtualenvs/ml/lib/python3.6/site-packages/ludwig/data/preprocessing.py", line 562, in preprocess_for_training
[training_set, validation_set, test_set]
File "/Users/user/.virtualenvs/ml/lib/python3.6/site-packages/ludwig/data/preprocessing.py", line 777, in replace_text_feature_level
level)
KeyError: 'name_word'
This is due to the following lines https://github.com/uber/ludwig/blob/9de6ee32f0e2e6cc6157f0772aa8c28a8d662fe8/ludwig/data/preprocessing.py#L774-L778, which remove the columns regardless of whether they exist. Removing the columns conditionally fixes the issue, and the model trains successfully.
The model definition I'm using is:
input_features:
-
name: name
type: text
encoder: rnn
level: char
output_features:
-
name: sex
type: category
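The fix described above, removing the level-specific columns only when they exist, can be sketched in plain Python. The function below is a hypothetical stand-in for `replace_text_feature_level`, assuming each dataset is a dict mapping column names such as `name_word` and `name_char` to data; it is not the code at the linked lines.

```python
def replace_text_feature_level(features, datasets):
    """Hypothetical sketch, not Ludwig's code.

    Point each text feature's column at its chosen level, then drop the
    level-specific columns. `features` holds dicts with 'name', 'type',
    'level'; `datasets` holds dicts mapping column names to data (or None).
    """
    for feature in features:
        if feature["type"] != "text":
            continue
        name, level = feature["name"], feature["level"]
        for dataset in datasets:
            if dataset is None:
                continue
            dataset[name] = dataset[name + "_" + level]
            for lvl in ("word", "char"):
                # pop(..., None) skips columns that were never built,
                # where `del` would raise KeyError: 'name_word'.
                dataset.pop(name + "_" + lvl, None)
```

With `level: char`, only `name_char` exists, so the `name_word` removal is simply skipped.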
The error is raised when you run ludwig experiment --data_csv src/data/input/training.csv a second time with the same .csv input file name, so you have to change the file name every time you run the experiment. When it creates the .json and .hdf5 files the first time, it works, but when it loads the previously created .json and .hdf5 files, it raises the following error:
ludwig experiment --data_csv src/data/input/training.csv --model_definition_file model_definition.yaml
Found hdf5 and json with the same filename of the csv, using them instead
Using full hdf5 and json
Loading data from: src/data/input/training.hdf5
Loading metadata from: src/data/input/training.json
usr/lib/python3.5/site-packages/h5py/_hl/dataset.py:313: H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead.
"Use dataset[()] instead.", H5pyDeprecationWarning)
Traceback (most recent call last):
File "/usr/local/bin/ludwig", line 11, in <module>
load_entry_point('ludwig==0.1.0', 'console_scripts', 'ludwig')()
File "usr/lib/python3.5/site-packages/ludwig/cli.py", line 86, in main
CLI()
File "usr/lib/python3.5/site-packages/ludwig/cli.py", line 64, in __init__
getattr(self, args.command)()
File "usr/lib/python3.5/site-packages/ludwig/cli.py", line 67, in experiment
experiment.cli(sys.argv[2:])
File "usr/lib/python3.5/site-packages/ludwig/experiment.py", line 548, in cli
experiment(**vars(args))
File "usr/lib/python3.5/site-packages/ludwig/experiment.py", line 234, in experiment
random_seed=random_seed
File "usr/lib/python3.5/site-packages/ludwig/data/preprocessing.py", line 562, in preprocess_for_training
[training_set, validation_set, test_set]
File "usr/lib/python3.5/site-packages/ludwig/data/preprocessing.py", line 777, in replace_text_feature_level
level)
KeyError: 'text_char'
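Until the underlying bug is fixed, an alternative to renaming the CSV each time may be to delete the cached files the log reports reusing (`training.hdf5` and `training.json` next to `training.csv`), so that the next run rebuilds them as the first run did. A minimal sketch; `remove_cached_preprocessing` is a hypothetical helper, and it assumes the cache files share the CSV's base name, as in the log above.

```python
import os

def remove_cached_preprocessing(csv_path):
    """Delete <name>.hdf5 and <name>.json next to a CSV; return the removed paths."""
    root, _ext = os.path.splitext(csv_path)
    removed = []
    for ext in (".hdf5", ".json"):
        cached = root + ext
        if os.path.exists(cached):
            os.remove(cached)
            removed.append(cached)
    return removed

# Example with the path from the report:
# remove_cached_preprocessing("src/data/input/training.csv")
```

Rebuilding costs the preprocessing time on every run, which is the trade-off compared with reusing the cache.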
Starting ludwig is really slow, taking more than 10 seconds on a MacBook Pro. This is probably due to some model imports.
$ time ludwig --help
usage: ludwig <command> [<args>]
Available sub-commands:
experiment Runs a full experiment training a model and testing it
train Trains a model
predict Predicts using a pretrained model
visualize Visualizes experimental results
collect_weights Collects tensors containing a pretrained model weights
collect_activations Collects tensors for each datapoint using a pretrained model
ludwig cli runner
positional arguments:
command Subcommand to run
optional arguments:
-h, --help show this help message and exit
real 0m11.582s
user 0m8.958s
sys 0m1.447s
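If heavy imports are the cause, Python 3.7+ can confirm it with `python -X importtime -c "import ludwig.cli"`, which prints per-module import times to stderr. A common remedy is to import each subcommand's module only when that subcommand runs. A minimal sketch of that pattern with `importlib`; the command table is an assumption based on the `ludwig/experiment.py`, `ludwig/train.py` and `ludwig/predict.py` modules and their `cli()` calls visible in the tracebacks on this page.

```python
import importlib

# Subcommand -> module implementing it (assumed from the tracebacks above).
COMMANDS = {
    "experiment": "ludwig.experiment",
    "train": "ludwig.train",
    "predict": "ludwig.predict",
}

def load_command(name, commands=None):
    """Import the module behind `name` only when that subcommand is requested."""
    commands = COMMANDS if commands is None else commands
    if name not in commands:
        raise SystemExit("unknown command: %s (choose from %s)"
                         % (name, ", ".join(sorted(commands))))
    return importlib.import_module(commands[name])

def main(argv):
    if not argv or argv[0] in ("-h", "--help"):
        # Help needs only the table of names, so no model code is imported.
        print("Available sub-commands: " + ", ".join(sorted(COMMANDS)))
        return
    module = load_command(argv[0])
    module.cli(argv[1:])  # assumes each command module exposes cli(argv)
```

With this layout `ludwig --help` returns without touching TensorFlow, and each subcommand pays only for its own imports.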
Traceback (most recent call last):
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/bin/ludwig", line 11, in <module>
sys.exit(main())
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/ludwig/cli.py", line 86, in main
CLI()
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/ludwig/cli.py", line 64, in __init__
getattr(self, args.command)()
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/ludwig/cli.py", line 67, in experiment
experiment.cli(sys.argv[2:])
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/ludwig/experiment.py", line 548, in cli
experiment(**vars(args))
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/ludwig/experiment.py", line 258, in experiment
debug=debug
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/ludwig/train.py", line 368, in train
debug=debug
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/ludwig/models/model.py", line 110, in __init__
**kwargs
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/ludwig/models/model.py", line 184, in __build
is_training=self.is_training
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/ludwig/models/outputs.py", line 43, in build_outputs
**kwargs
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/ludwig/models/outputs.py", line 93, in build_single_output
**kwargs
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/ludwig/features/base_feature.py", line 314, in concat_dependencies_and_build_output
**kwargs
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/ludwig/features/sequence_feature.py", line 250, in build_output
kwarg=kwargs
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/ludwig/features/sequence_feature.py", line 320, in build_sequence_output
eval_loss
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/ludwig/features/sequence_feature.py", line 445, in sequence_measures
self.name
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/ludwig/models/modules/measure_modules.py", line 73, in masked_accuracy
maxlen=correct_predictions.shape[1], dtype=tf.int32)
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 2504, in sequence_mask
maxlen = ops.convert_to_tensor(maxlen)
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1050, in convert_to_tensor
as_ref=False)
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1146, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/Users/liusitong/lstworkspace/wowo-server/services/crawler_py/venv/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 282, in _dimension_tensor_conversion_function
raise ValueError("Cannot convert an unknown Dimension to a Tensor: %s" % d)
ValueError: Cannot convert an unknown Dimension to a Tensor: ?
Either I couldn't find instructions for this in the documentation, or it's missing from the current functionality: is there a way to have something like a .predict_proba
set of methods that would output not only the target label(s) but also the confidence/probability the model assigns to them?
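For a category output, a `predict_proba`-style result is a probability per class plus the winning label's confidence. A minimal sketch of deriving those from raw per-class scores with a softmax; the labels reuse HAM/SPAM from earlier on this page, the scores are made-up inputs, and whether Ludwig already exposes per-class probabilities directly is not covered here. Both function names are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_confidence(labels, scores):
    """Return (best_label, its_probability, {label: probability})."""
    probs = dict(zip(labels, softmax(scores)))
    best = max(probs, key=probs.get)
    return best, probs[best], probs

label, confidence, all_probs = predict_with_confidence(["HAM", "SPAM"], [2.0, 0.5])
print(label, round(confidence, 3), all_probs)
```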
Upon first running, I get the error below. I traced this back to a problem with the way in which Windows resolves executable extensions, per the links below. I am able to resolve this for now by adding the shell=True argument to line 41 of ludwig\utils\misc.py, but this solution is not recommended. Any advice?
I am running Anaconda on Windows 10. Github Desktop and Git for Windows installed.
the Windows API function CreateProcess, used by subprocess under the hood, doesn't auto-resolve other executable extensions than .exe. On Windows, the 'git' command is really installed as git.cmd. Therefore, you should modify your example to explicitly invoke git.cmd
Traceback (most recent call last):
File "c:\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Anaconda3\Scripts\ludwig.exe\__main__.py", line 9, in <module>
File "c:\anaconda3\lib\site-packages\ludwig\cli.py", line 86, in main
CLI()
File "c:\anaconda3\lib\site-packages\ludwig\cli.py", line 64, in __init__
getattr(self, args.command)()
File "c:\anaconda3\lib\site-packages\ludwig\cli.py", line 67, in experiment
experiment.cli(sys.argv[2:])
File "c:\anaconda3\lib\site-packages\ludwig\experiment.py", line 548, in cli
experiment(**vars(args))
File "c:\anaconda3\lib\site-packages\ludwig\experiment.py", line 201, in experiment
random_seed
File "c:\anaconda3\lib\site-packages\ludwig\utils\misc.py", line 43, in get_experiment_description
stdout=open(os.devnull, 'w')) == 0
File "c:\anaconda3\lib\subprocess.py", line 267, in call
with Popen(*popenargs, **kwargs) as p:
File "c:\anaconda3\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "c:\anaconda3\lib\subprocess.py", line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
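An alternative to `shell=True` is to resolve the executable first with `shutil.which`, which on Windows consults `PATHEXT` and so can find `git.cmd` as well as `git.exe`, and to skip the git step gracefully when nothing is found. A minimal sketch; `is_git_repo` is a hypothetical helper, not the code in `ludwig/utils/misc.py`, and launching a resolved `.cmd` path directly is an assumption worth verifying on your setup.

```python
import shutil
import subprocess

def is_git_repo(command="git"):
    """Return True if `command` resolves and reports we are inside a git work tree."""
    executable = shutil.which(command)  # honours PATHEXT on Windows
    if executable is None:
        return False  # git not installed or not on PATH: skip git metadata
    return subprocess.call(
        [executable, "rev-parse", "--is-inside-work-tree"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ) == 0
```

Passing the full resolved path avoids relying on `CreateProcess` to guess extensions, and avoids the quoting and injection pitfalls that come with `shell=True`.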
The doc page here shows the command line argument to use for specifying a model definition file as "--model_definition", but I just learned that it should be "--model_definition_file". Please update ASAP as new users are likely to get tripped up by this (as I did). Thanks!
It's possible that I missed this in the documentation, but is there a way to initialize from an existing model? For example, initializing an image encoder with a ResNet trained on the ImageNet task.
Thanks,
csg
Hello,
My small_dataset.csv looks like this:
user_id,item_id,dow,hod,category_id,cusine_id,restaurant_id,item_count
-6088859261566846925,601d4dc5-2ad,7,3,9c64431e-79f,fb8b5b7f-013,706247b3-86e,1
-4662982070704311223,af48a23f-f80,7,2,a989cb37-db5,8631ea96-a78,ea90883c-a2f,1
-4849981472186386714,bab8cfae-930,1,1,4f31b9ea-059,5aa1a680-942,d9ab8a13-380,1
-5400975689533971605,3ae9d553-282,4,0,37d510f8-745,c9aa3993-86f,0b8d021a-c5c,1
4810905585891548302,3d673534-8f8,3,1,4264559d-470,5aa1a680-942,5772716f-1f3,1
8523163246689801258,f5cab6a9-8f5,2,3,60db56cd-21e,2d470ebd-e46,4e4c60a4-bca,1
6141444640923780397,e1d46baf-89c,6,17,b7226a64-569,5aa1a680-942,910396af-1dd,1
7782246672655206426,617f0d50-e63,2,23,68f8231d-6e1,8631ea96-a78,344071ad-459,1
-3808793554476278979,56903221-9d1,5,5,79803e34-c64,e153fa17-ac6,5746c349-5e1,1
My model_definition looks like this (I defined it programmatically):
INPUT_COLUMN_NAMES = "user_id, item_id, dow, hod, category_id, cusine_id, restaurant_id, item_count".split(", ")
NUMERIC_COLUMN_NAMES = ["dow", "hod", "item_count"]
OUTPUT_COLUMN_NAMES = "item_id, category_id, restaurant_id".split(", ")
model_definition = {
"input_features": [
{
"name": col_name,
"type": "numerical" if col_name in NUMERIC_COLUMN_NAMES else "text",
"encoder": "parallel_cnn"
} for col_name in INPUT_COLUMN_NAMES
],
"output_features": [
{
"name": col_name,
"type": "numerical" if col_name in NUMERIC_COLUMN_NAMES else "text",
} for col_name in OUTPUT_COLUMN_NAMES
],
"training": {
"epochs": 10
}
}
When I run the train() function I get the error KeyError: 'item_id_char'.
Log
INFO:root:Model name: run
INFO:root:Output path: ../ludwig_model\_run_5
INFO:root:
INFO:root:ludwig_version: '0.1.0'
INFO:root:command: 'D:/playground/ml-playground/ludwig-test/cnn.py'
INFO:root:dataset_type: 'generic'
INFO:root:random_seed: 42
INFO:root:model_definition: { 'combiner': {'type': 'concat'},
'input_features': [ { 'encoder': 'parallel_cnn',
'level': 'word',
'name': 'user_id',
'tied_weights': None,
'type': 'text'},
{ 'encoder': 'parallel_cnn',
'level': 'word',
'name': 'item_id',
'tied_weights': None,
'type': 'text'},
{ 'encoder': 'parallel_cnn',
'name': 'dow',
'tied_weights': None,
'type': 'numerical'},
{ 'encoder': 'parallel_cnn',
'name': 'hod',
'tied_weights': None,
'type': 'numerical'},
{ 'encoder': 'parallel_cnn',
'level': 'word',
'name': 'category_id',
'tied_weights': None,
'type': 'text'},
{ 'encoder': 'parallel_cnn',
'level': 'word',
'name': 'cusine_id',
'tied_weights': None,
'type': 'text'},
{ 'encoder': 'parallel_cnn',
'level': 'word',
'name': 'restaurant_id',
'tied_weights': None,
'type': 'text'},
{ 'encoder': 'parallel_cnn',
'name': 'item_count',
'tied_weights': None,
'type': 'numerical'}],
'output_features': [ { 'decoder': 'generator',
'dependencies': [],
'level': 'char',
'loss': { 'class_distance_temperature': 0,
'class_weights': 1,
'type': 'softmax_cross_entropy',
'weight': 1},
'name': 'item_id',
'reduce_dependencies': 'sum',
'reduce_input': 'sum',
'type': 'text',
'weight': 1},
{ 'decoder': 'generator',
'dependencies': [],
'level': 'char',
'loss': { 'class_distance_temperature': 0,
'class_weights': 1,
'type': 'softmax_cross_entropy',
'weight': 1},
'name': 'category_id',
'reduce_dependencies': 'sum',
'reduce_input': 'sum',
'type': 'text',
'weight': 1},
{ 'decoder': 'generator',
'dependencies': [],
'level': 'char',
'loss': { 'class_distance_temperature': 0,
'class_weights': 1,
'type': 'softmax_cross_entropy',
'weight': 1},
'name': 'restaurant_id',
'reduce_dependencies': 'sum',
'reduce_input': 'sum',
'type': 'text',
'weight': 1}],
'preprocessing': { 'bag': { 'fill_value': '',
'format': 'space',
'lowercase': 10000,
'missing_value_strategy': 'fill_with_const',
'most_common': False},
'binary': { 'fill_value': 0,
'missing_value_strategy': 'fill_with_const'},
'category': { 'fill_value': '<UNK>',
'lowercase': False,
'missing_value_strategy': 'fill_with_const',
'most_common': 10000},
'force_split': False,
'image': {'missing_value_strategy': 'backfill'},
'numerical': { 'fill_value': 0,
'missing_value_strategy': 'fill_with_const'},
'sequence': { 'fill_value': '',
'format': 'space',
'lowercase': False,
'missing_value_strategy': 'fill_with_const',
'most_common': 20000,
'padding': 'right',
'padding_symbol': '<PAD>',
'sequence_length_limit': 256,
'unknown_symbol': '<UNK>'},
'set': { 'fill_value': '',
'format': 'space',
'lowercase': False,
'missing_value_strategy': 'fill_with_const',
'most_common': 10000},
'split_probabilities': (0.7, 0.1, 0.2),
'stratify': None,
'text': { 'char_format': 'characters',
'char_most_common': 70,
'char_sequence_length_limit': 1024,
'fill_value': '',
'lowercase': True,
'missing_value_strategy': 'fill_with_const',
'padding': 'right',
'padding_symbol': '<PAD>',
'unknown_symbol': '<UNK>',
'word_format': 'space_punct',
'word_most_common': 20000,
'word_sequence_length_limit': 256},
'timeseries': { 'fill_value': '',
'format': 'space',
'missing_value_strategy': 'fill_with_const',
'padding': 'right',
'padding_value': 0,
'timeseries_length_limit': 256}},
'training': { 'batch_size': 128,
'bucketing_field': None,
'decay': False,
'decay_rate': 0.96,
'decay_steps': 10000,
'dropout_rate': 0.0,
'early_stop': 3,
'epochs': 50,
'gradient_clipping': None,
'increase_batch_size_on_plateau': 0,
'increase_batch_size_on_plateau_max': 512,
'increase_batch_size_on_plateau_patience': 5,
'increase_batch_size_on_plateau_rate': 2,
'learning_rate': 0.001,
'learning_rate_warmup_epochs': 5,
'optimizer': { 'beta1': 0.9,
'beta2': 0.999,
'epsilon': 1e-08,
'type': 'adam'},
'reduce_learning_rate_on_plateau': 0,
'reduce_learning_rate_on_plateau_patience': 5,
'reduce_learning_rate_on_plateau_rate': 0.5,
'regularization_lambda': 0,
'regularizer': 'l2',
'staircase': False,
'validation_field': 'combined',
'validation_measure': 'loss'}}
INFO:root:
INFO:root:Using full dataframe
INFO:root:Building dataset (it may take a while)
INFO:root:Loading NLP pipeline
D:\playground\ml-playground\ludwig-test\venv\lib\site-packages\ludwig\features\numerical_feature.py:63: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
np.float32).as_matrix()
INFO:root:Writing train set metadata with vocabulary
Traceback (most recent call last):
File "D:/playground/ml-playground/ludwig-test/cnn.py", line 27, in <module>
main()
File "D:/playground/ml-playground/ludwig-test/cnn.py", line 23, in main
train(model)
File "D:/playground/ml-playground/ludwig-test/cnn.py", line 13, in train
model.train()
File "D:\playground\ml-playground\ludwig-test\model.py", line 63, in train
self.model.train(data_df=self.data_frame, logging_level=logging.INFO, output_directory="../ludwig_model")
File "D:\playground\ml-playground\ludwig-test\venv\lib\site-packages\ludwig\api.py", line 448, in train
random_seed=random_seed)
File "D:\playground\ml-playground\ludwig-test\venv\lib\site-packages\ludwig\data\preprocessing.py", line 561, in preprocess_for_training
[training_set, validation_set, test_set]
File "D:\playground\ml-playground\ludwig-test\venv\lib\site-packages\ludwig\data\preprocessing.py", line 769, in replace_text_feature_level
feature['level']
KeyError: 'item_id_char'
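The traceback ends in replace_text_feature_level, and the logged definition lists item_id, category_id and restaurant_id twice: once as word-level text inputs and once as char-level text outputs. A plausible reading (not confirmed in the issue) is that giving the same column two text levels is what breaks the lookup of item_id_char. The sketch below sidesteps that under two assumptions: the three ID columns are only predicted, never also fed in as inputs, and opaque IDs are modelled as category features rather than text. The parallel_cnn encoder is dropped because it is meant for sequence and text features.

```python
# Sketch: rebuild the definition so no column is both an input and an output.
# Assumption: opaque ID columns are better modelled as "category" than "text".
ALL_COLUMN_NAMES = "user_id, item_id, dow, hod, category_id, cusine_id, restaurant_id, item_count".split(", ")
NUMERIC_COLUMN_NAMES = {"dow", "hod", "item_count"}
OUTPUT_COLUMN_NAMES = ["item_id", "category_id", "restaurant_id"]

# Inputs are every column that is not being predicted.
INPUT_COLUMN_NAMES = [c for c in ALL_COLUMN_NAMES if c not in OUTPUT_COLUMN_NAMES]


def feature_type(col_name):
    """Numeric columns stay numerical; every ID-like column becomes a category."""
    return "numerical" if col_name in NUMERIC_COLUMN_NAMES else "category"


model_definition = {
    "input_features": [
        {"name": c, "type": feature_type(c)} for c in INPUT_COLUMN_NAMES
    ],
    "output_features": [
        {"name": c, "type": feature_type(c)} for c in OUTPUT_COLUMN_NAMES
    ],
    "training": {"epochs": 10},
}
```

Whether dropping the IDs from the inputs fits the recommendation task is a modelling decision; the point is only that each column ends up with a single feature definition.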
The datatype "numerical" is incorrectly specified as "numeric" in the getting started guide.
I tried to install it on Ubuntu, details:
I get this error when trying to install ludwig:
Collecting ludwig
Could not find a version that satisfies the requirement ludwig (from versions: )
No matching distribution found for ludwig
Saving the current best model after each epoch does not take much time relative to training big models, but it can be a significant cost when training small models. I have found no way to turn off saving the best model. I tried setting skip_save_progress_weights=True and skip_save_processed_input=True, but it did not help. Thanks
Would it be possible to export the models as TensorFlow models? That way we wouldn't need to depend on ludwig and its dependencies just to run inference, and we could additionally convert the models to tflite so that we can run them on mobile.
May be related to #55.
When I try to use the programmatic API,
model = LudwigModel(model_definition={}, model_definition_file="some/path/to/file.yml")
it throws AttributeError: 'str' object has no attribute 'get'. It seems that when loading the model_definition_file, Ludwig uses yaml.load, which, according to the PyYAML documentation, takes a byte string, a Unicode string, an open binary file object, or an open text file object. A path passed as a string is therefore parsed as a one-line YAML document and comes back as a plain str rather than a dict. However, the Ludwig documentation says that model_definition_file should be a file path. I tried passing in an open file object and it works.
I think either the documentation needs to be changed or the model initialization should be modified to actually accept a file path.
Please add support for automated feature engineering, ideally built on top of the already wonderful Featuretools library.
This would be especially helpful if one could just declare the target feature in the YAML file and then set "autofeatures=True" to let Featuretools find and rank the best features for classification.
https://github.com/Featuretools/featuretools
https://blog.featurelabs.com/deep-feature-synthesis/
Similarly, for time series, please add automated feature extraction based on the tsfresh library.
This would be beyond incredible because it would save so much time on feature engineering.
I'm trying to improve an existing model by continuing to train it with the train command.
However, using --model_load_path results/experiment_run_0/model/ creates a new experiment_run_1 whose starting results are those of experiment_run_0. Am I doing something wrong?
To help CLI users get started with Ludwig, it would be great if you could include an example with an associated downloadable .csv file and a matching model file to train it with.
The current "Getting started" documentation (and the associated examples section) only sketches what the .csv file could look like for a particular scenario, without a specific file that users can try. Providing a starting "hello world" pair of .csv/model files would help eliminate these as a source of error when users are trying to get their first successful run with ludwig.
As soon as my brand new model is ready, I would like ludwig to enable predictions via the command line, so that one could do something like
echo "call me at +3912345679 for offers" | ludwig predict --only_prediction --model_path /path/to/model -
where the - may indicate stdin (as an example), and get the model predictions directly. This would help to integrate the ludwig command into an inference pipeline.
If I have understood the API correctly (my model is not ready yet...), it should be possible via the programmatic API like this:
import logging
from ludwig import LudwigModel
# load a model (model_path points to the trained model directory)
model_path = "/path/to/model"
model = LudwigModel.load(model_path)
# obtain predictions
myDict = {'text': ['call me at +3912345679 for offers']}
predictions = model.predict(data_dict=myDict,
                            return_type=dict,
                            batch_size=128,
                            gpus=None,
                            gpu_fraction=1,
                            logging_level=logging.DEBUG)
# close model (eventually)
model.close()
Is that correct?
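Until the CLI accepts stdin, the idea can be approximated with a small wrapper around the programmatic API. The sketch below only covers turning stdin lines into a data_dict: read_data_dict and main are hypothetical helpers, "text" is assumed to be the input feature name (as in the example above), and the predict call is left as a comment because it needs a trained model.

```python
import sys


def read_data_dict(stream, field="text"):
    """Hypothetical helper: one non-empty input line per row, keyed by the
    input feature name, in the column-oriented shape data_dict uses."""
    return {field: [line.rstrip("\n") for line in stream if line.strip()]}


def main():
    data_dict = read_data_dict(sys.stdin)
    # With a trained model loaded via LudwigModel.load(...), this is where
    # the call would go, e.g.:
    # predictions = model.predict(data_dict=data_dict, return_type=dict)
    print(data_dict)
```

Calling main() from a script's entry point would give the echo "..." | python script.py pattern from the request.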
I'm not sure if I missed it somewhere, but having the CSVs used in the examples would be really helpful.
CSV is good for numerical data, but when you have text data that may contain , and ", escaping the values in the columns can be tricky, and identifying the delimiter comma is harder for CSV parsers.
Can you add support for the TSV and JSON data file formats, which do not suffer as much from the problems above?
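In the meantime, Python's standard csv module shows how quoting covers both concerns: a field containing the delimiter or a quote is wrapped in quotes, and embedded quotes are doubled. pandas.read_csv also understands this convention by default (doublequote=True). The column names and rows below are illustrative assumptions; the same two functions handle TSV by passing "\t" as the delimiter.

```python
import csv
import io

FIELDNAMES = ["text", "label"]


def to_delimited(rows, delimiter=","):
    """Serialise rows; QUOTE_MINIMAL wraps fields containing special characters
    such as the delimiter or a quote, and doubles embedded quote characters."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES, delimiter=delimiter,
                            quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def from_delimited(text, delimiter=","):
    """Parse the text back into plain dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter=delimiter)]


rows = [
    {"text": 'she said "call me, maybe"', "label": "quote"},
    {"text": "no special characters", "label": "plain"},
]
```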
I have tensorflow-gpu installed and Keras can use the GPU effectively. I only have one GPU.
With ludwig, I tried a regression problem and found that training is very slow.
train_stats = ludwig_model.train(data_df=df, logging_level=logging.ERROR, gpus=[0])
Using watch -n 1 nvidia-smi, I found that training did not actually utilize the GPU, yet the data was stored in GPU memory anyway.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.93 Driver Version: 410.93 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro P6000 Off | 00000000:03:00.0 On | Off |
| 26% 44C P8 19W / 250W | 24289MiB / 24449MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 822 G /usr/bin/gnome-shell 186MiB |
| 0 4687 C /home/yshi1/anaconda3/bin/python 22929MiB |
| 0 10166 C /home/yshi1/anaconda3/bin/python 977MiB |
| 0 23459 G /usr/bin/X 191MiB |
+-----------------------------------------------------------------------------+
It might save others a bit of time to see an example of specifying training parameters in a YAML file. Took me a bit to figure out the role of dashes in YAML format, and that the training section of the model definition file expects a dictionary, not a list of dictionaries.
As in (note the lack of dashes under the 'training' section):
input_features:
-
name: image_path
type: image
encoder: stacked_cnn
-
name: some_number
type: numerical
output_features:
-
name: label
type: category
training:
batch_size: 64
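The same shape can be checked once the YAML has been parsed into a dict. check_definition_shape below is a hypothetical helper, not part of Ludwig; it only encodes the rule described in this comment: the feature sections are lists of mappings (the dashes), while training is a single mapping.

```python
def check_definition_shape(model_definition):
    """Hypothetical helper: return a list of shape problems (empty if none)."""
    problems = []
    for key in ("input_features", "output_features"):
        value = model_definition.get(key)
        if not isinstance(value, list):
            problems.append(f"{key} should be a list, got {type(value).__name__}")
        elif not all(isinstance(feature, dict) for feature in value):
            problems.append(f"every entry in {key} should be a mapping")
    # training is optional, but when present it must be one mapping.
    training = model_definition.get("training", {})
    if not isinstance(training, dict):
        problems.append(f"training should be a mapping, got {type(training).__name__}")
    return problems
```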
Currently adding a new feature / encoder / decoder requires forking the package.
It would be awesome to be able to add them at run time as part of the API and/or a conf.py e.g.
ludwig.add_base_feature(NotNumericalBaseFeature)
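The tracebacks on this page show Ludwig already routing feature types through registries (for example output_type_registry in ludwig/features/feature_registries.py), so a runtime hook could plausibly be a thin layer over them. The sketch below is a hypothetical, standalone version of that pattern: FeatureRegistry, add_base_feature and the "not_numerical" type name are invented for illustration and are not Ludwig's API.

```python
class FeatureRegistry:
    """Hypothetical plugin registry mapping a feature type name to a class."""

    def __init__(self):
        self._classes = {}

    def register(self, type_name, cls):
        # Refuse silent overrides so two plugins cannot clash unnoticed.
        if type_name in self._classes:
            raise ValueError(f"feature type {type_name!r} is already registered")
        self._classes[type_name] = cls
        return cls

    def get(self, type_name):
        try:
            return self._classes[type_name]
        except KeyError:
            raise KeyError(f"unknown feature type {type_name!r}") from None


base_feature_registry = FeatureRegistry()


def add_base_feature(cls):
    """Mirror of the requested call: register a class under its `type`
    attribute. Returns the class, so it also works as a decorator."""
    return base_feature_registry.register(cls.type, cls)


@add_base_feature
class NotNumericalBaseFeature:
    type = "not_numerical"
```

Raising on duplicate names is a design choice; a real plugin API might instead allow explicit overrides.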
It would be a nice touch to expose the models once they are created. Perhaps TensorFlow Serving or Seldon could play a role here.
Hello!
Can't install with pip on Ubuntu 18.10. The installation fails with the following error:
Collecting tensorflow>=1.12 (from ludwig)
Could not find a version that satisfies the requirement tensorflow>=1.12 (from ludwig) (from versions: 1.13.0rc1, 1.13.0rc2)
No matching distribution found for tensorflow>=1.12 (from ludwig)
Could you please look into it? Tensorflow version is 1.13 GPU.
I get "No module found 'en_core_web_sm'" while training on a dataset.
I ran the following command:
ludwig train --data_csv path/to/camera_dataset.csv --model_definition "{input_features: [{name: Model, type: text}], output_features: [{name: Price, type: numerical}]}"
After running ludwig visualize --visualization learning_curves -ts results/experiment_run_4/training_statistics.json, it returns nothing. Where can I find the visualization results? Are they saved as an image somewhere?
Do I need to set PYTHONPATH in ~/.bash_profile and then run source ~/.bash_profile?
What is going on? Why can't I run it after following the commands in the Ludwig getting started document?
Thanks in advance!
Hello,
When trying to install Ludwig on Windows 7 with Python 3.6.8, I started with this command:
pip install ludwig
It goes well for some time, then it shows the error below. It says Microsoft Visual C++ 14.0 is required, so which component of it does it need?
copying cytoolz\tests\test_tlz.py -> build\lib.win-amd64-3.6\cytoolz\tests
copying cytoolz\tests\test_utils.py -> build\lib.win-amd64-3.6\cytoolz\tests
running build_ext
building 'cytoolz.dicttoolz' extension
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": https://visualstudio.microsoft.com/downloads/
thanks
Ahmad
Seems like the example here for the predict API is wrong. It says the arguments are dataset_csv and dataset_df when they should be data_csv and data_df. It's correct in the later part of the docs, though.
E:\repos\tensorflow>pushd E:\repos\tensorflow
E:\repos\tensorflow>ludwig experiment --data_csv translation.csv --model_definition_file model_definition.yaml
e:\programdata\anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "e:\programdata\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "e:\programdata\anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "E:\ProgramData\Anaconda3\Scripts\ludwig.exe\__main__.py", line 5, in <module>
File "e:\programdata\anaconda3\lib\site-packages\ludwig\__init__.py", line 16, in <module>
from ludwig.api import LudwigModel
File "e:\programdata\anaconda3\lib\site-packages\ludwig\api.py", line 39, in <module>
from ludwig.data.postprocessing import postprocess_df, postprocess
File "e:\programdata\anaconda3\lib\site-packages\ludwig\data\postprocessing.py", line 19, in <module>
from ludwig.features.feature_registries import output_type_registry
File "e:\programdata\anaconda3\lib\site-packages\ludwig\features\feature_registries.py", line 33, in <module>
from ludwig.features.image_feature import ImageBaseFeature
File "e:\programdata\anaconda3\lib\site-packages\ludwig\features\image_feature.py", line 24, in <module>
from skimage.io import imread
File "e:\programdata\anaconda3\lib\site-packages\skimage\__init__.py", line 158, in <module>
from .util.dtype import *
File "e:\programdata\anaconda3\lib\site-packages\skimage\util\__init__.py", line 7, in <module>
from .arraycrop import crop
File "e:\programdata\anaconda3\lib\site-packages\skimage\util\arraycrop.py", line 8, in <module>
from numpy.lib.arraypad import _validate_lengths
ImportError: cannot import name '_validate_lengths'
Hi, using Python 3.6 under Windows 10 with tensorflow or tensorflow-gpu, I got this message.
I installed it yesterday on another Windows machine with no problem, but it failed on my personal one.
It failed with this evening's version (#74, I think).
BTW, great job!
Same thing if I try to import ludwig in a Jupyter notebook.
D:>ludwig
Traceback (most recent call last):
File "d:\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "d:\anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "d:\Anaconda3\Scripts\ludwig.exe\__main__.py", line 5, in <module>
File "d:\anaconda3\lib\site-packages\ludwig\__init__.py", line 16, in <module>
from ludwig.api import LudwigModel
File "d:\anaconda3\lib\site-packages\ludwig\api.py", line 38, in <module>
from ludwig.data.dataset import Dataset
File "d:\anaconda3\lib\site-packages\ludwig\data\dataset.py", line 17, in <module>
import h5py
File "d:\anaconda3\lib\site-packages\h5py\__init__.py", line 26, in <module>
from . import _errors
ImportError: cannot import name '_errors'
Using the "train" command with separate training and validation data causes an invalid attribute access on a DataFrame.
I was running ludwig on Windows 10 and Python 3.6.8.
ludwig train --data_train_csv data_train.csv --data_validation_csv data_val.csv -mdf model.yaml
Using training raw csv, no hdf5 and json file with the same name have been found
Building dataset (it may take a while)
Loading training csv...
done
Loading validation csv..
done
Loading test csv..
done
Concatenating csvs..
done
Traceback (most recent call last):
File "d:\python\python36\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "d:\python\python36\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\Python\python36\Scripts\ludwig.exe\__main__.py", line 9, in <module>
File "d:\python\python36\lib\site-packages\ludwig\cli.py", line 86, in main
CLI()
File "d:\python\python36\lib\site-packages\ludwig\cli.py", line 64, in __init__
getattr(self, args.command)()
File "d:\python\python36\lib\site-packages\ludwig\cli.py", line 70, in train
train.cli(sys.argv[2:])
File "d:\python\python36\lib\site-packages\ludwig\train.py", line 663, in cli
full_train(**vars(args))
File "d:\python\python36\lib\site-packages\ludwig\train.py", line 224, in full_train
random_seed=random_seed
File "d:\python\python36\lib\site-packages\ludwig\data\preprocessing.py", line 487, in preprocess_for_training
random_seed=random_seed
File "d:\python\python36\lib\site-packages\ludwig\data\preprocessing.py", line 90, in build_dataset_df
global_preprocessing_parameters
File "d:\python\python36\lib\site-packages\ludwig\data\preprocessing.py", line 165, in build_data
preprocessing_parameters
File "d:\python\python36\lib\site-packages\ludwig\features\image_feature.py", line 57, in add_feature_data
csv_path = os.path.dirname(os.path.abspath(dataset_df.csv))
File "d:\python\python36\lib\site-packages\pandas\core\generic.py", line 5067, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'csv'
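The failing line derives the image directory from dataset_df.csv, an attribute that, judging by this traceback, is not set when the training and validation CSVs are concatenated. A defensive version of that lookup could fall back to an explicitly known CSV path and then to the working directory. resolve_image_base_dir and its fallback order are assumptions for illustration, not Ludwig's actual fix.

```python
import os


def resolve_image_base_dir(dataset_df, fallback_csv_path=None):
    """Hypothetical helper: directory that relative image paths are resolved
    against. getattr with a default returns None instead of raising the
    AttributeError seen above when the object has no `csv` attribute."""
    csv_path = getattr(dataset_df, "csv", None)
    # Ignore anything that is not a path string (e.g. a column named "csv").
    if not isinstance(csv_path, str):
        csv_path = fallback_csv_path
    if csv_path is None:
        # Last resort: resolve relative paths against the working directory.
        return os.getcwd()
    return os.path.dirname(os.path.abspath(csv_path))
```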
Receiving this error trying to run a simple image classification example
File "/Users/colinpetit/.pyenv/versions/3.6.5/lib/python3.6/site-packages/ludwig/features/image_feature.py", line 101, in add_feature_data
data[feature['name']][i, :, :, :] = img
ValueError: could not broadcast input array from shape (743,620,3) into shape (383,442,4)
model_definition_file:
input_features:
-
name: img
type: image
encoder: stacked_cnn
output_features:
-
name: type
type: category
training:
epochs: 10
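The shapes in the error suggest the preprocessing array was sized from one image (383x442 with 4 channels, e.g. an RGBA PNG) and a later 743x620 RGB image did not fit. One workaround is to bring every image to a common size and channel count before training. The sketch below assumes NumPy arrays of shape (height, width) or (height, width, channels) and uses a plain top-left crop or zero-pad; resizing with an image library would usually preserve content better.

```python
import numpy as np


def normalize_image(img, height, width):
    """Return a (height, width, 3) array: grayscale is replicated to three
    channels, an alpha channel is dropped, and the image is cropped or
    zero-padded (anchored at the top-left corner) to the target size."""
    if img.ndim == 2:                  # H x W grayscale
        img = img[:, :, np.newaxis]
    if img.shape[2] == 1:              # single channel -> RGB
        img = np.repeat(img, 3, axis=2)
    elif img.shape[2] == 4:            # RGBA -> RGB
        img = img[:, :, :3]
    out = np.zeros((height, width, 3), dtype=img.dtype)
    h = min(height, img.shape[0])
    w = min(width, img.shape[1])
    out[:h, :w] = img[:h, :w]
    return out
```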
I'm facing an import error while running ludwig from Google Colab!
Below is the code I executed, step by step:
Loading Ludwig
!pip install ludwig
Upload file from local to Drive - to use in Colab
from google.colab import files
uploaded = files.upload()
Train DataSet using ludwig cmd
!ludwig train --data_csv ludwig_traindataset_1.csv --model_definition_file model_def.yaml
My model_def.yaml
input_features:
- name: utterance
type: sequence
encoder: rnn
cell_type: lstm
bidirectional: true
num_layers: 2
reduce_output: None
output_features:
- name: intent
type: category
reduce_input: sum
num_fc_layers: 1
fc_size: 64
- name: slots
type: sequence
decoder: tagger
Dummy Dataset:
Attached is a screenshot of the error:
Can anyone help me with this? Environment: Google Colab GPU / Python 3.6
Are unittests going to be open sourced? At the moment only integration tests have been published.
On the positive side, the Developers Guide says:
Current test coverage is limited to several integration tests which ensure end-to-end functionality but we are planning to expand it.
which hints that more testing is yet to be added.