big data out of memory (tensorrec, closed, 9 comments)

jfkirk commented on May 24, 2024
big data out of memory

Comments (9)

jfkirk commented on May 24, 2024

I was able to get up to your scale of data and fit a model with RMSE loss.

from tensorrec import TensorRec
from tensorrec.util import generate_dummy_data

model = TensorRec(n_components=10)

interactions, user_features, item_features = generate_dummy_data(
    num_users=4000000, num_items=20000, interaction_density=.0015,
    num_user_features=200, num_item_features=200, n_features_per_user=10,
    n_features_per_item=1, pos_int_ratio=1
)

model.fit(user_features=user_features, item_features=item_features, interactions=interactions, verbose=True)

I had to scale n_components down to 10, or TensorFlow started hitting segmentation faults once the data passed ~1M users.

Closing for now -- please let me know if you run into more trouble!

jfkirk commented on May 24, 2024

Ah, this makes sense. The issue is that the prediction matrix is dense: even if the loss function doesn't require it, the dense predictions are calculated and the serial predictions are pulled out of it. I had previously written two separate prediction graphs, one serial and one dense, but at some point I (incorrectly) consolidated down to one. I'll fix this by re-adding the serial prediction graph.
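
To illustrate the difference (a NumPy sketch with made-up sizes, not TensorRec's actual graph code): the dense graph materializes every (user x item) score, while a serial graph only scores the (user, item) pairs that actually appear in the interactions.

import numpy as np

# Toy illustration of dense vs. serial prediction (hypothetical sizes).
n_users, n_items, n_components = 1000, 200, 10
user_repr = np.random.rand(n_users, n_components)  # user representations
item_repr = np.random.rand(n_items, n_components)  # item representations

# Dense prediction: O(n_users * n_items) memory -- this is what OOMs at scale.
dense_predictions = user_repr.dot(item_repr.T)     # shape (n_users, n_items)

# Serial prediction: score only the known (user, item) pairs.
user_ids = np.array([0, 0, 3, 7])                  # example interaction rows
item_ids = np.array([5, 9, 1, 42])                 # example interaction columns
serial_predictions = (user_repr[user_ids] * item_repr[item_ids]).sum(axis=1)  # shape (4,)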

It's worth noting that this will not solve the issue for dense loss functions (like WMRB), but RMSE and separation loss will be okay.

In the meantime, you can get around this by shrinking the size of (users x items) until the dense prediction matrix doesn't hit OOM. If you have too many users and items for that to be feasible, try breaking the users up into manageable chunks and calling fit_partial on the chunks in sequence (sketched below).
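
Roughly like this (a sketch only -- it assumes user_features, item_features, interactions, and model from the earlier example, that the sparse matrices have one row per user, and that fit_partial takes the same keyword arguments as fit):

# Hypothetical chunked training loop -- tune chunk_size so the dense
# (chunk_users x items) prediction matrix fits in memory.
chunk_size = 100000
n_users = user_features.shape[0]
for start in range(0, n_users, chunk_size):
    end = min(start + chunk_size, n_users)
    model.fit_partial(user_features=user_features[start:end],
                      item_features=item_features,
                      interactions=interactions[start:end],
                      verbose=True)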

Thank you for reporting this, @mmmmming18!

jfkirk commented on May 24, 2024

Quick update: I've refactored the serial prediction graph so it can handle a larger scale of data without having to do dense prediction. I also had to refactor the dummy data generator to be able to handle this much data.

I'm now testing to see if the new graph can get close to the scale of data you have.

jfkirk commented on May 24, 2024

What kind of numbers are you looking at for "big"? If you give me:

#users
#user features
#items
#item features
#interactions

I'll try to find the weak link and smooth it out.

MingCong18 commented on May 24, 2024

@jfkirk
#users: 4M users
#user features: 200
#items: 20000
#item features: 200
#interactions: 4M * 20000

jfkirk commented on May 24, 2024

That's 80B interactions -- is that data dense, or are most interactions 0? What data type are you using to feed in the interactions?

MingCong18 commented on May 24, 2024

The interaction matrix is sparse; most entries are 0.
I created the matrix using "interactions = sp.lil_matrix((num_users, num_items))", similar to your code in util.py.
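
For reference, a minimal version of that construction (the filled-in indices below are just hypothetical examples):

import scipy.sparse as sp

num_users, num_items = 4000000, 20000
interactions = sp.lil_matrix((num_users, num_items))  # starts all-zero

# Record observed interactions -- example (user, item) pairs only.
interactions[0, 17] = 1.0
interactions[0, 523] = 1.0
interactions[1, 42] = 1.0

print(interactions.tocoo().nnz)  # stored non-zeros, not num_users * num_items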

jfkirk commented on May 24, 2024

How many are non-zero? I imagine this is where the problem is coming from -- at full density, this matrix would be at least ~320GB.
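
Back-of-envelope, assuming 32-bit floats:

# Rough size of a fully dense 4M x 20k float32 matrix.
num_users, num_items = 4000000, 20000
dense_bytes = num_users * num_items * 4   # 4 bytes per float32
print(dense_bytes / 1e9)                  # ~320 GB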

Also, are the feature values dense or sparse? If sparse, how many of the features are non-zero?

Can you drop the stack trace you're getting, too, just to help me debug? Thanks for taking the time!

MingCong18 commented on May 24, 2024

In each row, the average number of non-zero values is about 30 in the interaction matrix, 10 in the user feature matrix, and only 1 in the item feature matrix.

Below is the OOM error. I ran it using CPU only.

2018-01-30 10:09:23.212384: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[4539556,29101]
Traceback (most recent call last):
  File "", line 2, in
  File "/usr/local/lib/python2.7/dist-packages/tensorrec/tensorrec.py", line 225, in fit
    out_sample_interactions)
  File "/usr/local/lib/python2.7/dist-packages/tensorrec/tensorrec.py", line 272, in fit_partial
    session.run(self.tf_optimizer, feed_dict=feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[4539556,29101]
  [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](SparseTensorDenseMatMul/SparseTensorDenseMatMul, SparseTensorDenseMatMul_1/SparseTensorDenseMatMul)]]

Caused by op u'MatMul', defined at:
  File "", line 2, in
  File "/usr/local/lib/python2.7/dist-packages/tensorrec/tensorrec.py", line 225, in fit
    out_sample_interactions)
  File "/usr/local/lib/python2.7/dist-packages/tensorrec/tensorrec.py", line 257, in fit_partial
    self._build_tf_graph(n_user_features=user_features.shape[1], n_item_features=item_features.shape[1])
  File "/usr/local/lib/python2.7/dist-packages/tensorrec/tensorrec.py", line 172, in _build_tf_graph
    + tf.expand_dims(self.tf_projected_item_biases, 0)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1891, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 2437, in _mat_mul
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4539556,29101]
  [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](SparseTensorDenseMatMul/SparseTensorDenseMatMul, SparseTensorDenseMatMul_1/SparseTensorDenseMatMul)]]
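
For scale, the tensor that failed to allocate is DT_FLOAT (4 bytes per element), so that single dense prediction matrix would need roughly 500 GB on its own:

# Size of the tensor TensorFlow failed to allocate, assuming 4-byte floats.
rows, cols = 4539556, 29101
print(rows * cols * 4 / 1e9)  # ~528 GB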
