tensorflow / decision-forests
A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras.
License: Apache License 2.0
I trained a model successfully. I was also able to use model.evaluate, model.summary, and tfdf.model_plotter.plot_model_in_colab(model, tree_idx=0, max_depth=4).
But when I tried to save it using:
model.save("hypermodels/model")
I am getting the following error:
ValueError: Got non-flat/non-unique argument names for SavedModel signature 'serving_default': more than one argument to '__inference_signature_wrapper_12650' was named 'build_existing_model.geometry_foundation_type_Heated Basement'. Signatures have one Tensor per named input, so to have predictable names Python functions used to generate these signatures should avoid *args and Tensors in nested structures unless unique names are specified for each. Use tf.TensorSpec(..., name=...) to provide a name for a Tensor input.
I have successfully run this Decision Forest algorithm. However, my data is severely imbalanced between categories, so accuracy is not a fair measure of model performance. Are f1, precision, and recall available as metrics?
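Whatever the library reports during training, precision, recall and F1 can always be computed afterwards from the true labels and the predicted labels. A minimal sketch in plain Python, assuming binary labels encoded as 0/1; the function name precision_recall_f1 is made up for this example and is not part of TF-DF:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Imbalanced toy data: 2 positives out of 10 examples.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

On this toy data 8 of 10 predictions are correct, yet only one of the two positives is found, which is the kind of gap accuracy hides on imbalanced data.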
I was able to fit my RandomForest model; however, when I try to convert it to TFLite format, it throws an error.
The error is: InvalidArgumentError: Cannot convert a Tensor of dtype resource to a NumPy array.
I am getting the following error when I try a simple model.
csv_feature_columns = ['weekday_weekend'] + weather_columns + building_columns + schedules_columns + encoded_time_columns + ["total_site_electricity_kwh"]
train_df = pd.read_csv(timeseries_file_path,usecols=csv_feature_columns,nrows=10000)
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="total_site_electricity_kwh")
model = tfdf.keras.RandomForestModel()
model.fit(train_ds)
157/157 [==============================] - 6s 18ms/step
---------------------------------------------------------------------------
NotFoundError Traceback (most recent call last)
<ipython-input-6-ce1e05e4d2c8> in <module>
1 # Train a Random Forest model.
2 model = tfdf.keras.RandomForestModel()
----> 3 model.fit(train_ds)
4
~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py in fit(self, x, y, callbacks, **kwargs)
743
744 history = super(CoreModel, self).fit(
--> 745 x=x, y=y, epochs=1, callbacks=callbacks, **kwargs)
746
747 self._build(x)
~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1227 epoch_logs.update(val_logs)
1228
-> 1229 callbacks.on_epoch_end(epoch, epoch_logs)
1230 training_logs = epoch_logs
1231 if self.stop_training:
~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py in on_epoch_end(self, epoch, logs)
433 logs = self._process_logs(logs)
434 for callback in self.callbacks:
--> 435 callback.on_epoch_end(epoch, logs)
436
437 def on_train_batch_begin(self, batch, logs=None):
~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py in on_epoch_end(***failed resolving arguments***)
930 del logs
931 if epoch == 0:
--> 932 self._model._train_model() # pylint:disable=protected-access
933
934
~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py in _train_model(self)
864 guide=guide,
865 training_config=self._advanced_arguments.yggdrasil_training_config,
--> 866 deployment_config=self._advanced_arguments.yggdrasil_deployment_config,
867 )
868
~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow_decision_forests/tensorflow/core.py in train(input_ids, label_id, model_id, learner, task, generic_hparms, ranking_group, training_config, deployment_config, guide, model_dir, keep_model_in_resource)
503 training_config=training_config.SerializeToString(),
504 deployment_config=deployment_config.SerializeToString(),
--> 505 guide=guide.SerializeToString())
506
507
~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow/python/util/tf_export.py in wrapper(*args, **kwargs)
402 'Please pass these args as kwargs instead.'
403 .format(f=f.__name__, kwargs=f_argspec.args))
--> 404 return f(**kwargs)
405
406 return tf_decorator.make_decorator(f, wrapper, decorator_argspec=f_argspec)
~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow_decision_forests/tensorflow/ops/training/op.py in simple_ml_model_trainer(feature_ids, label_id, weight_id, model_id, model_dir, learner, hparams, task, training_config, deployment_config, guide, name)
510 return _result
511 except _core._NotOkStatusException as e:
--> 512 _ops.raise_from_not_ok_status(e, name)
513 except _core._FallbackException:
514 pass
~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
6895 message = e.message + (" name: " + name if name is not None else "")
6896 # pylint: disable=protected-access
-> 6897 six.raise_from(core._status_to_exception(e.code, message), None)
6898 # pylint: enable=protected-access
6899
~/.conda/envs/tensorflow25/lib/python3.7/site-packages/six.py in raise_from(value, from_value)
NotFoundError: Resource decision_forests/ 12-in/N27tensorflow_decision_forests3ops23AbstractFeatureResourceE does not exist. [Op:SimpleMLModelTrainer]
Running a few RFs in parallel with multiprocessing works. But when I run a few RFs with multiprocessing after having already run an RF, it gets stuck. I'm using the multiprocessing module with the following commands:
pool = multiprocessing.Pool()
pool.map(func, input)
In func I'm running a TensorFlow RF.
Any idea why this is happening?
Thanks,
Tsachi
I just ran out of space on /tmp/ after training about 200 decision forests. I think the temporary directory created at
is never cleaned up, even after the Python process ends. Each model that I was training required about 20 MB, so after 200 models I had 4 GB in /tmp/ and my operating system said "what is all that doing there??" and got mad at me. I have two ideas about this.
_train_model
. So we should be able to use a https://docs.python.org/3/library/tempfile.html#tempfile.TemporaryDirectory context manager, just for that call (unless the user explicitly provides a temporary directory). Thoughts?
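The context-manager idea can be sketched with the standard library alone. Here train_with_scratch_dir and the train_fn callback are hypothetical names standing in for whatever the training call does internally; this is not TF-DF's actual code:

```python
import os
import tempfile


def train_with_scratch_dir(train_fn, temp_directory=None):
    """Run train_fn(directory); delete the directory afterwards unless the caller supplied it."""
    if temp_directory is not None:
        # The user owns this directory, so leave it in place.
        return train_fn(temp_directory)
    with tempfile.TemporaryDirectory() as scratch:
        # The directory and its contents are removed when this block exits.
        return train_fn(scratch)


def fake_train(directory):
    # Stand-in for real training: write a file and report where it went.
    path = os.path.join(directory, "model.bin")
    with open(path, "wb") as f:
        f.write(b"\x00" * 1024)
    return path


written_path = train_with_scratch_dir(fake_train)
still_exists = os.path.exists(written_path)
```

With this shape, nothing accumulates in /tmp/ across repeated trainings, while a user-provided directory is left untouched.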
I tried to disable early stopping and the validation data, but it does not seem to work:
model = tfdf.keras.GradientBoostedTreesModel(
num_trees=n_trees,
growing_strategy="BEST_FIRST_GLOBAL",
max_depth=depth,
min_examples=1,
shrinkage=learning_rate,
categorical_algorithm="RANDOM",
use_hessian_gain=True,
validation_ratio=0.0,
early_stopping=None,
temp_directory=tmp_dir_name
)
model.fit(x=x_selected, y=y)
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow_decision_forests/keras/core.py", line 780, in fit
history = super(CoreModel, self).fit(
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1229, in fit
callbacks.on_epoch_end(epoch, epoch_logs)
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py", line 435, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow_decision_forests/keras/core.py", line 994, in on_epoch_end
self._model._train_model() # pylint:disable=protected-access
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow_decision_forests/keras/core.py", line 915, in _train_model
tf_core.train(
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow_decision_forests/tensorflow/core.py", line 494, in train
return training_op.SimpleMLModelTrainer(
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow/python/util/tf_export.py", line 404, in wrapper
return f(**kwargs)
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow_decision_forests/tensorflow/ops/training/op.py", line 512, in simple_ml_model_trainer
_ops.raise_from_not_ok_status(e, name)
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 6897, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: TensorFlow: INVALID_ARGUMENT: Early stopping requires a validation set. Either set "validation_set_ratio" to be greater than 0, or disable early stopping. [Op:SimpleMLModelTrainer]
Good day.
Just to check:
Would I be able to save a tfdf model (in, for example, a tflite format) and then load the model to perform on-device inference in smartphones?
Thank you.
Cheers.
I tried to evaluate the model using:
evaluation = model.evaluate(test_ds, return_dict=True)
But I am getting the following error:
ValueError: SyncOnReadVariable does not support assign_add in cross-replica context when aggregation is set to tf.VariableAggregation.SUM.
The training was completed successfully.
Decision Forests work well on small datasets, where cross-validation is commonly used. It would be valuable to easily run cross-validations and report cross-validation metrics (evaluation metrics, confidence intervals, statistical tests, etc.).
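Until such a feature exists, the folds can be built by hand. A minimal k-fold index splitter in plain Python; k_fold_indices is an illustrative helper, not a TF-DF function, and the training and evaluation calls on each split are left out:

```python
import random


def k_fold_indices(num_examples, num_folds, seed=0):
    """Yield (train_idx, test_idx) pairs; every example is in exactly one test fold."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every num_folds-th shuffled index, starting at offset i.
    folds = [indices[i::num_folds] for i in range(num_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(num_examples=10, num_folds=5))
```

Each pair of index lists can be used to slice a DataFrame into training and test parts before converting them to datasets; the per-fold metrics are then averaged.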
Hey,
I ran the feature significance and compared the results to the sklearn output. Not only are the results different, but the results I'm getting from this implementation also don't make any sense (given what I know about my data).
For example, a feature that is constant is one of the most significant features (it got the highest value).
Maybe I don't know how to read the output properly?
("data:0.33" (1; #27), 235)
Does this mean that feature number 27 got a score of 235?
Tsachi
Background
Currently, the training graph contains one tf op for each input feature. With a large number of features (or with multi-dimensional features), this can lead to a large overhead (high memory consumption, long training initialization stage).
Feature request
Support for multi-dimensional features without creating an op for each dimension.
My training data is in a multi-GB CSV file. I have built a data pipeline using tf.data to stream this data and do some pre-processing. Can I use these dataset objects in tfdf model.fit (similar to how it is done in Keras), or does tfdf need the dataset to have all the data stored in memory?
I would like to be able to explicitly name my model. I've seen that the models have a name attribute but it does not appear to be possible to set this manually.
I've tried:
setattr(model, 'name', 'my_cool_model')
And:
tfdf.keras.RandomForestModel(name = 'my_cool_model')
Unlike GBT, it seems that RandomForest does not have a logit output.
The logits are available in v0.1.7, but the signature is different from sklearn:
model = tfdf.keras.GradientBoostedTreesModel(apply_link_function=False)
Hey there,
First of all, congratulations for your effort, this is a great initiative!
I am raising this issue because I have faced a problem with installation. I have created a Python 3.8.6 virtual environment on my Mac and installed tensorflow 2.5.0 successfully. When I ran the installation command for the "Tensorflow Decision Forests" package,
pip3 install tensorflow_decision_forests --upgrade
I got:
ERROR: Could not find a version that satisfies the requirement tensorflow_decision_forests (from versions: none) ERROR: No matching distribution found for tensorflow_decision_forests
It's a bit confusing because the installation command on PyPI (I guess this is the right one) contains dashes, instead of underscores, in the package name.
Any ideas?
Thanks a lot
I'm asking for this feature because the dataset I'm working on is generally larger than RAM (>1.5 TiB).
For regular TensorFlow tasks, this can be worked around by tweaking training loops and the dataset API.
As for TFDF, if I understand correctly, it is a wrapper over the Yggdrasil C API, and datasets are either copied or moved to Yggdrasil as a whole.
However, I'm seeing some interesting code in Yggdrasil:
https://github.com/google/yggdrasil-decision-forests/blob/52ed2571c46baa9738f81d7341dc27700dbfec73/yggdrasil_decision_forests/utils/filesystem_test.cc#L84-L93
https://github.com/google/yggdrasil-decision-forests/blob/52ed2571c46baa9738f81d7341dc27700dbfec73/yggdrasil_decision_forests/utils/filesystem_test.cc#L132-L140
I wonder if you could clarify a bit how datasets are handled in and between TFDF and Yggdrasil. Is it even possible to train on a large dataset (> RAM size)? If that could be achieved by playing around with TFRecord, does it relate to how we define the TFRecord data layout?
There is a problem with tensorflow_decision_forests after updating to version 2.6.0.
Here is the gist: https://colab.research.google.com/gist/lukebor/70f7abd84d547bf39c4a8b47394e7017/beginner_colab.ipynb
I have used the TensorFlow beginner tutorial and upgraded tf. If there is another way to import tfdf, please let me know.
Hello, I found a performance issue in the definition of _synthetic_train_and_test in tensorflow_decision_forests/keras/keras_test.py: compression_type="GZIP").map(parse) was called without num_parallel_calls.
I think adding it will increase the efficiency of your program.
The same issue also exists in test_path, compression_type="GZIP").map(parse).batch(50).map(preprocess).
Here is the TensorFlow documentation to support this.
Looking forward to your reply. By the way, I would be glad to create a PR to fix it if you are too busy.
It seems the Keras ModelCheckpoint callback doesn't work with TFDF. Is there an alternate way to create checkpoints during training? I am training on a data set with tens of millions of samples and it takes several hours to train. I want to save the progress so that it doesn't need to retrain from scratch in case training crashes.
Hello!
First of all, I highly appreciate your efforts on TFDF.
I found that there are multiple options for variable importance, such as NUM_AS_ROOT:
variable_importance = model.make_inspector().variable_importances()['NUM_AS_ROOT']
Thank you!
I can't install TFDF on Google Colab. The minimum working example I've made is here, and the first cell where I install and load the library fails.
The error is
NotFoundError Traceback (most recent call last)
<ipython-input-5-4f2d5416ffd4> in <module>()
1 get_ipython().system('pip install tensorflow_decision_forests')
2 import tensorflow as tf
----> 3 import tensorflow_decision_forests as tfdf
/usr/local/lib/python3.7/dist-packages/tensorflow_decision_forests/__init__.py in <module>()
49 __author__ = "Mathieu Guillame-Bert"
50
---> 51 from tensorflow_decision_forests import keras
52 from tensorflow_decision_forests.component import py_tree
53 from tensorflow_decision_forests.component.builder import builder
/usr/local/lib/python3.7/dist-packages/tensorflow_decision_forests/keras/__init__.py in <module>()
47 from typing import Callable, List
48
---> 49 from tensorflow_decision_forests.keras import core
50 from tensorflow_decision_forests.keras import wrappers
51
/usr/local/lib/python3.7/dist-packages/tensorflow_decision_forests/keras/core.py in <module>()
58 from tensorflow.python.training.tracking import base as base_tracking # pylint: disable=g-direct-tensorflow-import
59 from tensorflow_decision_forests.component.inspector import inspector as inspector_lib
---> 60 from tensorflow_decision_forests.tensorflow import core as tf_core
61 from tensorflow_decision_forests.tensorflow.ops.inference import api as tf_op
62 from tensorflow_decision_forests.tensorflow.ops.training import op as training_op
/usr/local/lib/python3.7/dist-packages/tensorflow_decision_forests/tensorflow/core.py in <module>()
29 import tensorflow as tf
30
---> 31 from tensorflow_decision_forests.tensorflow.ops.training import api as training_op
32 from yggdrasil_decision_forests.dataset import data_spec_pb2
33 from yggdrasil_decision_forests.learner import abstract_learner_pb2
/usr/local/lib/python3.7/dist-packages/tensorflow_decision_forests/tensorflow/ops/training/api.py in <module>()
22 from tensorflow.python.framework import load_library
23 from tensorflow.python.platform import resource_loader
---> 24 tf.load_op_library(resource_loader.get_path_to_datafile("training.so"))
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/load_library.py in load_op_library(library_filename)
55 Raises:
56 RuntimeError: when unable to load the library or get the python wrappers.
---> 57 """
58 lib_handle = py_tf.TF_LoadLibrary(library_filename)
59 try:
NotFoundError: /usr/local/lib/python3.7/dist-packages/tensorflow_decision_forests/tensorflow/ops/training/training.so: undefined symbol: _ZN10tensorflow14kernel_factory17OpKernelRegistrar12InitInternalEPKNS_9KernelDefEN4absl14lts_2020_09_2311string_viewESt10unique_ptrINS0_15OpKernelFactoryESt14default_deleteIS9_EE
I want to predict the probabilities of all classes in a multiclass classification problem. How do I do it?
Hi
An error happens when I try to load the saved model.
The code worked well with other Keras models. Thus, this may be a TFDF bug.
How to reproduce the issue
I saved a random forest model as follows
model = tfdf.keras.RandomForestModel(num_trees=4000,
max_depth=16,
min_examples=1,
winner_take_all=False,
categorical_algorithm="RANDOM")
model.fit(x=X, y=y)
model.save('./random_forest_model')
When I tried to load the saved model in a different file, an error occurred (shown below).
The issue did not occur if I loaded the model in the same file where the model was generated and saved.
classifier_model_loaded = tf.keras.models.load_model(classifier_model, compile=False)
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py", line 206, in load_model
return saved_model_load.load(filepath, compile, options)
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py", line 152, in load
loaded = tf_load.load_partial(path, nodes_to_load, options=options)
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 775, in load_partial
return load_internal(export_dir, tags, options, filters=filters)
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 908, in load_internal
raise FileNotFoundError(
FileNotFoundError: Op type not registered 'SimpleMLInferenceOpWithHandle' in binary running on c1boes2. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
If trying to load on a different device from the computational device, consider using setting the experimental_io_device option on tf.saved_model.LoadOptions to the io_device such as '/job:localhost'.
Hi
Issue symptom
When loading a saved model from disk, predict is missing.
How to reproduce
Create a model
Save it to disk
Load it from disk
Call model.predict (calling model(X, training=False) instead works well)
model = tfdf.keras.RandomForestModel(num_trees=n_trees, max_depth=depth, min_examples=1)
model.fit(x=x_selected, y=y)
path = './random_forest'
os.makedirs(path, exist_ok=True)
file_name = tempfile.TemporaryDirectory(dir=path).name
model.save(file_name)
loaded_model = tf.saved_model.load(file_name)
score = loaded_model.predict(x_selected)
AttributeError: '_UserObject' object has no attribute 'predict'
Thanks!
Hi, thanks for this package!
I'm looking into converting an existing neural network model to use the decision forests approach, for comparison and/or use in an ensemble.
The existing neural network I've been developing has multiple outputs (mixture of regression and classification), and some of these outputs feed into other outputs (the other outputs also have access to the rest of the training data).
It might be easier to explain via simple example (in the likely case that I'm not putting it into words well!). Let's say I have input features In and 3 target labels A, B, and C. My neural network works something like this:
A = Model(In)
B = Model(In + A)
C = Model(In + A + B)
This gives me a unified model for A, B, and C, which can be trained, saved, and loaded as one entity.
I can see how I might achieve something like this with decision-forests by using the preprocessing argument, passing in the training data and the previous model, and returning the training data with an added column for the prediction of the previous model. The final model would give a single output, but I could write something to load each model and make a list of all the predictions. In a similar vein, I could write something to load and augment the training data before the training of each model, as an alternative to using preprocessing.
Is there a way to obtain multi-output models in a way that is nicer than the (potentially silly) approach above?
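The chaining described above can be written down independently of any particular library. In this sketch each stage is any callable that maps a feature dict to a prediction; ChainedPredictor and the toy lambdas are hypothetical and only illustrate the data flow, not a TF-DF feature:

```python
class ChainedPredictor:
    """Run stages in order, feeding each prediction to later stages as an extra feature."""

    def __init__(self, stages):
        # stages: list of (output_name, predict_fn) pairs, in dependency order.
        self.stages = stages

    def predict(self, features):
        augmented = dict(features)  # copy so the caller's dict is left untouched
        outputs = {}
        for name, predict_fn in self.stages:
            outputs[name] = predict_fn(augmented)
            augmented[name] = outputs[name]
        return outputs


# Toy stand-ins for three trained models: A = f(In), B = f(In + A), C = f(In + A + B).
chain = ChainedPredictor([
    ("A", lambda f: f["x"] * 2),
    ("B", lambda f: f["x"] + f["A"]),
    ("C", lambda f: f["A"] + f["B"]),
])
result = chain.predict({"x": 1})
```

In practice each lambda would be replaced by a wrapper around a separately trained and loaded model, so the three models still have to be saved individually.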
Thanks for any help, sorry if this question doesn't make much sense, I can clarify if needed!
I am able to save and then re-load a model. But when I use the re-loaded model for prediction or evaluation, I get the following error:
model.save("hypermodels/model")
model = tf.keras.models.load_model("hypermodels/model/")
energy_predictions = model.predict(train_ds,verbose=1)
InvalidArgumentError: Unexpected dimension of numerical_features bank.
[[{{node gradient_boosted_trees_model_1/StatefulPartitionedCall/StatefulPartitionedCall/inference_op}}]] [Op:__inference_predict_function_28619]
Function call stack:
predict_function
Hello everyone,
I have a short question regarding the training time needed by this specific model.
To dig into the material, I used the example from the TensorFlow website with the Penguin data and started the training on my Linux laptop with an NVIDIA GeForce GTX 1050 Ti with GPU support enabled.
Now I am wondering why the model takes more than an hour to train on only 300 rows of data with 5 features or so...
Does anyone have a benchmark value?
I would really appreciate your help, guys.
Best regards
Julian
After running !pip install tensorflow_decision_forests --upgrade and trying to import tensorflow_decision_forests as tfdf, I got this error: NotFoundError: /opt/conda/lib/python3.7/site-packages/tensorflow_decision_forests/tensorflow/ops/training/training.so: undefined symbol: _ZN10tensorflow11GetNodeAttrERKNS_9AttrSliceEN4absl14lts_2020_09_2311string_viewEPSs
I've tried uninstalling tensorflow and reinstalling tensorflow==2.3.0, but it does not work.
Please let me know if you have any comments.
Problem:
Installing via pip install tensorflow-decision-forests returns a warning message:
After installation the library can't be used because of:
Specifying the version via ==0.1.3 doesn't help, and neither does reinstalling.
Hi
Please review my small update for the README file here.
I know it's not big, but it's just a beginning.
I've been following the example posted here to obtain predictions from individual trees within a GradientBoostedTreesModel
i.e.
# Train model
model = tfdf.keras.GradientBoostedTreesModel()
model.compile(metrics=["accuracy"])
model.fit(train_ds)
# Extract trees
inspector = model.make_inspector()
trees = inspector.extract_all_trees()
# Build model with one tree
builder = tfdf.builder.GradientBoostedTreeBuilder(
    path="model",
    objective=inspector.objective()
)
builder.add_tree(trees[0])
builder.close()
However, it fails when calling builder.close()
with the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-21-f4a8f4f498e3> in <module>
7 # Add first tree
8 builder_bt.add_tree(trees_bt[0])
----> 9 builder_bt.close()
/usr/local/lib/python3.6/site-packages/tensorflow_decision_forests/component/builder/builder.py in close(self)
737
738 # Should be called last.
--> 739 super(GradientBoostedTreeBuilder, self).close()
740
741 def specialized_header(self) -> Any:
/usr/local/lib/python3.6/site-packages/tensorflow_decision_forests/component/builder/builder.py in close(self)
500
501 for tree in self._trees:
--> 502 self._write_branch(tree.root)
503 self._trees = []
504
/usr/local/lib/python3.6/site-packages/tensorflow_decision_forests/component/builder/builder.py in _write_branch(self, node)
586
587 # Converts the node into a proto node.
--> 588 core_node = py_tree.node.node_to_core_node(node, self.dataspec)
589
590 # Write the node to disk.
/usr/local/lib/python3.6/site-packages/tensorflow_decision_forests/component/py_tree/node.py in node_to_core_node(node, dataspec)
153 condition_lib.set_core_node(node.condition, dataspec, core_node)
154 if node.value is not None:
--> 155 value_lib.set_core_node(node.value, core_node)
156
157 elif isinstance(node, LeafNode):
/usr/local/lib/python3.6/site-packages/tensorflow_decision_forests/component/py_tree/value.py in set_core_node(value, core_node)
154 core_node.regressor.top_value = value.value
155 if value.standard_deviation is not None:
--> 156 dist = core_node.regressor.dist
157 dist.count = value.num_examples
158 dist.sum = 0
AttributeError: dist
I've tested a possible fix for this by changing this line (line 156 above) to dist = core_node.regressor.distribution, as used elsewhere in the codebase (see here), and it seems to work, but I'd appreciate the eyes of someone who is more familiar with the code than I am.
It's possible that this hasn't been caught previously, as none of the tests here seem to include the standard deviation in the RegressionValue.
Background
My TensorFlow code runs on GPU. It has some matrix operations which run fast on GPU. If it runs with tfdf, the data must be downloaded from the GPU and uploaded back to the GPU when classification is done. In terms of throughput, this is a great loss.
Feature Request
Please support GPU, especially for inference such as the predict function. Training can take time because a user can try various configurations to find the best one; this is understandable. However, applying the trained model must meet the runtime requirement.
Which tf.distribute strategy would be most suitable to use with tfdf if we were to use it with multiple nodes of an HPC?
The upstream dmlc xgboost has a feature called interaction constraints.
This feature is useful to train highly explainable models for high-risk applications like lending. It would be wonderful if TFDF boosting supported a similar option.
I'm working on an application where I'd like to retrieve the standard deviation of the predictions made by the trees within an ensemble (currently a tfdf.keras.RandomForestModel) to use as an estimate of the confidence of a given prediction.
It looks like I could do this by running a prediction on each individual tree with inspector.iterate_on_nodes(), but is there a better way to do this via the main predict method, and if not, would you consider this as an enhancement?
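If per-tree predictions can be obtained by whatever means, the spread itself is straightforward to compute. A sketch with the standard library, assuming per_tree_predictions holds one list of regression outputs per tree (the variable and function names are made up for illustration):

```python
import statistics


def ensemble_mean_and_std(per_tree_predictions):
    """Return a (mean, population std) pair per example, computed across trees."""
    results = []
    # zip(*...) regroups the values so each tuple holds one example's predictions.
    for example_preds in zip(*per_tree_predictions):
        results.append((statistics.mean(example_preds),
                        statistics.pstdev(example_preds)))
    return results


# Three trees, two examples.
per_tree_predictions = [
    [1.0, 4.0],
    [2.0, 4.0],
    [3.0, 4.0],
]
summary = ensemble_mean_and_std(per_tree_predictions)
```

The second example, where all trees agree, gets a standard deviation of zero; the first, where they disagree, does not.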
Hi
Could you support something like model.predict_log_proba?
The last layer of TFDF should have something like sigmoid(A) or exp(A) to produce output in the [0, 1] range. This is good.
However, I also need the output A without the sigmoid or exp. This would give me a much wider output range than sigmoid(A) or exp(A).
I hope that you consider this positively, because this is very important to my tasks ;-)
I really look forward to seeing this feature in the next release.
Thanks a lot!
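For a binary model whose output is p = sigmoid(A), the pre-activation value can be recovered from the probability as A = log(p / (1 - p)). A small sketch; the clipping constant eps is an arbitrary choice to avoid infinities at exactly 0 or 1, and precision is lost near those extremes, which is one reason a native raw-score output would be preferable:

```python
import math


def logit(p, eps=1e-12):
    """Invert the sigmoid: return A such that sigmoid(A) is approximately p."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    return math.log(p / (1.0 - p))


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


recovered = logit(sigmoid(2.5))
```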
First of all, thanks a lot. I love TFDF!
Issue symptom
I found the issue when I ran the same code twice.
If I overwrite an existing model, loading the model from disk fails.
How to reproduce the issue
Generate a model, save it to a file, and then load it to perform inference. This works.
model = tfdf.keras.RandomForestModel(num_trees=n_trees, max_depth=depth, min_examples=1)
model.fit(x=X, y=y)
model.save('./slf_random_forest')
loaded_model = tf.saved_model.load('./random_forest')
Score = loaded_model(X, training=False)
Run the same code again and get the following error. If I remove './random_forest' first, then everything is good.
File "/dl_data/users/howardlee/opwi_algo/Post/Selfi/Algo/Classifiers.py", line 76, in decision_function
Score = self.clf(X, training=False)
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 670, in _call_attribute
return instance.__call__(*args, **kwargs)
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
result = self._call(*args, **kwds)
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 956, in _call
return self._concrete_stateful_fn._call_flat(
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1960, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 591, in call
outputs = execute.execute(
File "/opt/conda/envs/tf2.5.0/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Unexpected dimension of numerical_features bank.
[[{{node StatefulPartitionedCall/StatefulPartitionedCall/inference_op}}]]
(1) Invalid argument: Unexpected dimension of numerical_features bank.
[[{{node StatefulPartitionedCall/StatefulPartitionedCall/inference_op}}]]
[[StatefulPartitionedCall/StatefulPartitionedCall/inference_op/_4]]
0 successful operations.
0 derived errors ignored. [Op:__inference_restored_function_body_5262]
Function call stack:
restored_function_body -> restored_function_body
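Since removing the directory first avoids the error, one workaround is to clear the target path before every save. A sketch using only the standard library; fresh_directory is a hypothetical helper, and the demonstration uses a throwaway directory rather than a real model:

```python
import os
import shutil
import tempfile


def fresh_directory(path):
    """Delete path if it exists so the next save starts from an empty location."""
    if os.path.isdir(path):
        shutil.rmtree(path)
    return path


# Demonstration with a stale file left over from a previous run.
base = tempfile.mkdtemp()
target = os.path.join(base, "random_forest")
os.makedirs(target)
with open(os.path.join(target, "stale.txt"), "w") as f:
    f.write("left over from an earlier save")

fresh_directory(target)
target_exists_after_reset = os.path.exists(target)
shutil.rmtree(base)  # clean up the demonstration directory
```

The helper returns the path, so it can wrap the argument of the save call directly, for example model.save(fresh_directory('./random_forest')).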
Hi,
It would be very helpful to have a C5.0 decision tree algorithm implementation in tfdf, as there is none so far for Python, and I guess there is quite some demand to have this well-known algorithm, one of the best, at hand in Python!
It is quite different from CART:
-multiple branches,
I am sure it would boost the recognition and usability of the tfdf library and make it especially useful when strong and simple models that are directly explainable are needed.
Thank you for taking note!
TensorFlow Decision Forests appears to be memory hungry. I compared it with PyCaret on Colab. TensorFlow Decision Forests crashed with the message “Your session crashed after using all available RAM.”, while PyCaret completed the work. Is there any feasible way to solve this problem?
Dear authors,
I used tfdf.keras.pd_dataframe_to_tf_dataset for the train and test sets respectively, after making sure that both train and test had all 4 classes (a single label for each data point).
I found that labels in the two sets were integer encoded ([0 1 2 3]).
I defined:
train = tfdf.keras.pd_dataframe_to_tf_dataset(df_train, label=label_column_name)
test = tfdf.keras.pd_dataframe_to_tf_dataset(df_test, label=label_column_name)
model = RandomForestModel(num_trees=5)
model.fit(train, validation_data=test)
It raised error:
ValueError: Shapes (None, 4) and (None, 1) are incompatible
Then I move to this code:
model.fit(train)
model.evaluate(test)
It raised error:
ValueError: Shapes (None, 4) and (None, 1) are incompatible
Then, I checked:
pred = model.predict(test)
print(pred[0])
print(np.unique(pred))
Output:
[0. 1. 0. 0.]
[0. 0.2 0.4 0.6 0.8 1. ]
Please help me to fix this error.
Thank you so much.
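The prediction printed above has four columns (one score per class), while each label is a single integer, which matches the two shapes named in the error message. As a minimal sketch of comparing the two by hand, independent of any Keras metric, the per-class scores can be reduced to a class index first. The helper names and the sample values below are hypothetical and only for illustration.

```python
def predicted_classes(probabilities):
    """Return the index of the highest-scoring class for each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def accuracy(probabilities, labels):
    """Fraction of rows whose highest-scoring class equals the integer label."""
    predictions = predicted_classes(probabilities)
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical per-class scores for three examples and four classes.
# With 5 trees, scores that are multiples of 0.2 are what one would expect.
probs = [
    [0.0, 1.0, 0.0, 0.0],
    [0.6, 0.2, 0.2, 0.0],
    [0.0, 0.0, 0.4, 0.6],
]
labels = [1, 0, 2]

print(predicted_classes(probs))
print(accuracy(probs, labels))
```

This only works around the metric computation; it does not explain why model.evaluate itself rejects the shapes.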
Hi,
I have built a TF-DF model and I am trying to serve it using Docker. I am using the following commands:
# Saved the model using the command:
model.save(MODEL_SAVE_PATH)
# Docker commands
docker pull tensorflow/serving
docker run -d --name serv_base_img tensorflow/serving
docker cp $PWD/models/my_classifier1 serv_base_img:/models/my_classifier1
docker commit --change "ENV MODEL_NAME my_classifier1" serv_base_img my_classifier1
docker run -p 8501:8501 --mount type=bind,source=$PWD/models/my_classifier1,target=/models/my_classifier1 -e MODEL_NAME=my_classifier1 -t tensorflow/serving &
I am getting the following issue:
[1] 76832
2021-06-16 13:03:59.138269: I tensorflow_serving/model_servers/server.cc:89] Building single TensorFlow model file config: model_name: my_classifier1 model_base_path: /models/my_classifier1
2021-06-16 13:03:59.138494: I tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
2021-06-16 13:03:59.138511: I tensorflow_serving/model_servers/server_core.cc:591] (Re-)adding model: my_classifier1
2021-06-16 13:03:59.258773: I tensorflow_serving/core/basic_manager.cc:740] Successfully reserved resources to load servable {name: my_classifier1 version: 1}
2021-06-16 13:03:59.258814: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: my_classifier1 version: 1}
2021-06-16 13:03:59.258834: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: my_classifier1 version: 1}
2021-06-16 13:03:59.259636: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: /models/my_classifier1/001
2021-06-16 13:03:59.300033: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:90] Reading meta graph with tags { serve }
2021-06-16 13:03:59.300099: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /models/my_classifier1/001
2021-06-16 13:03:59.301471: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-06-16 13:03:59.351039: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: fail: Not found: Op type not registered 'SimpleMLCreateModelResource' in binary running on de74cefbb44d. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.. Took 91403 microseconds.
2021-06-16 13:03:59.351122: E tensorflow_serving/util/retrier.cc:37] Loading servable: {name: my_classifier1 version: 1} failed: Not found: Op type not registered 'SimpleMLCreateModelResource' in binary running on de74cefbb44d. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
Any solution for this?
Thank you!!!
I got an error when loading a model with keras.models.load_model on a different device. This is my complete code:
from tensorflow import keras
model_path = '/content/drive/MyDrive/saved_model/my_model'
imported = keras.models.load_model(model_path)
I got an error like this:
NotFoundError: Op type not registered 'SimpleMLCreateModelResource' in binary running on 135bfc4bd927. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
The training and evaluation were completed successfully.
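The "Op type not registered 'SimpleMLCreateModelResource'" message usually indicates that the process loading the SavedModel has not registered the TF-DF custom ops. A minimal sketch of a possible fix for the Python case, assuming tensorflow_decision_forests is installed in the loading environment and that importing it registers those ops; load_tfdf_model is a hypothetical helper name, not part of any library.

```python
def load_tfdf_model(model_path):
    """Load a saved TF-DF model after registering the TF-DF custom ops."""
    # Assumption: importing the package is what registers ops such as
    # SimpleMLCreateModelResource. The module is not used afterwards.
    import tensorflow_decision_forests  # noqa: F401
    import tensorflow as tf

    return tf.keras.models.load_model(model_path)


# Usage (path taken from the report above):
# imported = load_tfdf_model('/content/drive/MyDrive/saved_model/my_model')
```

The same reasoning suggests that a serving binary without the TF-DF ops compiled in would fail in the same way, although that case cannot be fixed with a Python import.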
I heard that predict_log_proba would be supported in 0.1.7, but it is still missing.
Please take a look at #26
Hello,
The intermediate_colab ("Combine With Other Models") tutorial does a good job at showing how to preprocess a string to a categorical set. This is the example function provided:
def prepare_dataset(example):
  label = (example["label"] + 1) // 2
  return {"sentence" : tf.strings.split(example["sentence"])}, label
train_ds = all_ds["train"].batch(64).map(prepare_dataset)
test_ds = all_ds["validation"].batch(64).map(prepare_dataset)
From my understanding, tf.strings.split isn't the best way of doing this because it won't drop duplicates. For example, a text feature "The TV is the best" would be represented by {"The","TV","is","the","best"} when using tf.strings.split. According to this article, it should instead be transformed to the following categorical set: {"best", "is", "the", "TV"}.
Is dropping duplicates necessary?
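To make the difference concrete, here is a minimal pure-Python sketch of turning a sentence into a set of unique tokens. It is not the tutorial's preprocessing; to_categorical_set is a hypothetical helper, and lowercasing every token is an assumption made so that "The" and "the" collapse into one item (which also turns "TV" into "tv", unlike the set quoted above).

```python
def to_categorical_set(sentence, lowercase=True):
    """Split on whitespace and keep each distinct token once."""
    tokens = sentence.split()
    if lowercase:
        # Assumption: case-insensitive matching is wanted, so "The" == "the".
        tokens = [token.lower() for token in tokens]
    return set(tokens)


# Plain splitting keeps "The" and "the" as two separate items.
print(sorted(to_categorical_set("The TV is the best", lowercase=False)))

# With lowercasing, the duplicate collapses and four items remain.
print(sorted(to_categorical_set("The TV is the best")))
```

Whether the learner treats a repeated token differently from a single occurrence is exactly the open question in this issue; the sketch only shows what deduplication would look like.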
I just wanted to know if there is a plan for GPU support.
My code is in TensorFlow, so running on GPU is very important in terms of throughput.
df_and_nn_model = tfdf.keras.GradientBoostedTreesModel(
    preprocessing=regmodel_wo_head,
    task=tfdf.keras.Task.REGRESSION,
    num_trees=500,
    max_depth=2,
    max_num_nodes=-1,
    min_examples=5,
    validation_ratio=0.2,
    subsample=0.9,
    early_stopping='MIN_LOSS_FINAL',
    shrinkage=0.001)
and after
df_and_nn_model.compile(metrics=[tf.keras.metrics.RootMeanSquaredError()])
with sys_pipes():
    df_and_nn_model.fit(train_dataset, validation_data=val_dataset)
[INFO kernel.cc:772] Configure learner
[FATAL hyper_parameters.cc:49] Already consumed hyper-parameter "max_depth".
This was working yesterday morning, but after I updated packages on Kaggle it throws this exception. I have no idea what it means.