hehefan / pointrnn
TensorFlow implementation of PointRNN, PointGRU and PointLSTM.
License: MIT License
Hello @hehefan, thanks for your paper. I'm interested in the implementation of the `EMD` loss; when will you share your code? Thanks.
Hi @hehefan,
I am trying to implement the complete PointRNN architecture in PyTorch, using the modules you have provided, on the radar points of the nuScenes dataset. There are a few issues. First, my loss values grow very large: I use Chamfer distance and Earth Mover's distance as the loss, as stated in the paper, and the values exceed 100,000. I do not know whether this is normal. Then, at a certain point, the states returned from the RNN cells blow up to values on the order of 10^36, and the loss becomes NaN. Any help from you would be highly appreciated, as I am working under a deadline.
Thank you.
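Since the run eventually produces NaNs, one generic mitigation (my own suggestion, not something taken from this repo) is to clip gradients by their global norm and refuse updates when that norm is non-finite, the same idea behind tf.clip_by_global_norm and torch.nn.utils.clip_grad_norm_. A minimal NumPy sketch of the mechanism:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale a list of gradient arrays so their joint L2 norm is <= max_norm,
    mirroring tf.clip_by_global_norm / torch.nn.utils.clip_grad_norm_."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if not np.isfinite(global_norm):
        # A non-finite norm means some gradient is Inf/NaN: skip this update.
        raise ValueError("non-finite global norm: skip this update")
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads], global_norm

# Deliberately huge (hypothetical) gradients to trigger clipping.
grads = [np.full((4,), 100.0), np.full((2, 2), -50.0)]
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
```

In PyTorch this corresponds to calling torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm) between loss.backward() and optimizer.step(), which often keeps the recurrent states from blowing up.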
Thanks for sharing the source code!
I ran into a problem when running the visualization.py file. At line 183, flow = flows[i], I noticed that "flows" is not defined.
Hoping for your answer, thanks!
Hi, @hehefan !
Thanks for your nice work!
While training on the Argoverse dataset, I ran into some problems. I downloaded the original dataset and processed the data according to the format of the test set you provided; the data can be read normally. After training, I checked the training log and found that the two evaluation metrics did not drop significantly, but fluctuated within a less than ideal range. The evaluation results also did not reach the numbers reported in the paper.
My questions are: 1) Is there a problem with my data processing? Did you use all of the data? 2) Could you provide your train.log, so that I can locate the problem more easily?
I have just started in this field, and your work has inspired me a lot. Thank you.
tensorflow.python.framework.errors_impl.NotFoundError: /home/wh/wyt/PointRNN-master/modules/tf_ops/sampling/tf_sampling_so.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv
The generated file tf_sampling_so.so does exist at that path, but the undefined-symbol error is still reported and the op cannot be loaded.
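That undefined symbol (tensorflow::internal::CheckOpMessageBuilder::NewString) typically means the .so was compiled against a different TensorFlow version/ABI than the one now installed. A possible fix, sketched as a build fragment below, is to recompile the op with the compile and link flags of the installed TensorFlow; the source file names and the nvcc invocation are assumptions based on the usual PointNet++-style op layout, not verified against this repo:

```shell
# Assumed layout: modules/tf_ops/sampling contains tf_sampling.cpp and
# tf_sampling_g.cu (hypothetical names, check your checkout).
TF_CFLAGS=$(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))')
TF_LFLAGS=$(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))')
cd modules/tf_ops/sampling
nvcc tf_sampling_g.cu -o tf_sampling_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
g++ -std=c++11 tf_sampling.cpp tf_sampling_g.cu.o -o tf_sampling_so.so \
    -shared -fPIC ${TF_CFLAGS} ${TF_LFLAGS} -O2
```

If it still fails, check whether your TensorFlow build expects -D_GLIBCXX_USE_CXX11_ABI=0 and pass the same flag to g++.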
Why does this happen when I run training on the Moving MNIST dataset?
/home/wh/anaconda3/envs/tensor12/bin/python /home/wh/wyt/minst/PointRNN-master/train-mmnist.py
/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
2021-12-11 10:52:19.865697: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2021-12-11 10:52:20.149034: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-11 10:52:20.149929: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Quadro K620M major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:08:00.0
totalMemory: 1.96GiB freeMemory: 1.38GiB
2021-12-11 10:52:20.149977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2021-12-11 10:52:35.883070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-12-11 10:52:35.883148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2021-12-11 10:52:35.883169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2021-12-11 10:52:35.883457: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1122 MB memory) -> physical GPU (device: 0, name: Quadro K620M, pci bus id: 0000:08:00.0, compute capability: 5.0)
2021-12-11 10:53:09.238386: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-12-11 10:53:09.294904: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-12-11 10:53:09.800776: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.02GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-12-11 10:53:09.801196: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.03GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-12-11 10:53:09.803051: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.03GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-12-11 10:53:09.804134: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-12-11 10:53:13.046394: E tensorflow/core/kernels/check_numerics_op.cc:185] abnormal_detected_host @0x202da0800 = {1, 0} Found Inf or NaN global norm.
Traceback (most recent call last):
File "/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Found Inf or NaN global norm. : Tensor had NaN values
[[{{node VerifyFinite/CheckNumerics}} = CheckNumericsT=DT_FLOAT, message="Found Inf or NaN global norm.", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
[[{{node Adam/update/_1332}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_30256_Adam/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/wh/wyt/minst/PointRNN-master/train-mmnist.py", line 86, in
cd, emd, step, summary, predictions, _ = sess.run([model.cd, model.emd, model.global_step, summary_op, model.predicted_frames, model.train_op], feed_dict=feed_dict)
File "/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Found Inf or NaN global norm. : Tensor had NaN values
[[node VerifyFinite/CheckNumerics (defined at /home/wh/wyt/minst/PointRNN-master/models/mmnist.py:600) = CheckNumericsT=DT_FLOAT, message="Found Inf or NaN global norm.", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
[[{{node Adam/update/_1332}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_30256_Adam/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Caused by op 'VerifyFinite/CheckNumerics', defined at:
File "/home/wh/wyt/minst/PointRNN-master/train-mmnist.py", line 63, in
is_training=True)
File "/home/wh/wyt/minst/PointRNN-master/models/mmnist.py", line 600, in init
clipped_gradients, norm = tf.clip_by_global_norm(gradients, max_gradient_norm)
File "/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/ops/clip_ops.py", line 265, in clip_by_global_norm
"Found Inf or NaN global norm.")
File "/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/ops/numerics.py", line 47, in verify_tensor_all_finite
verify_input = array_ops.check_numerics(t, message=msg)
File "/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 817, in check_numerics
"CheckNumerics", tensor=tensor, message=message, name=name)
File "/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/wh/anaconda3/envs/tensor12/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Found Inf or NaN global norm. : Tensor had NaN values
[[node VerifyFinite/CheckNumerics (defined at /home/wh/wyt/minst/PointRNN-master/models/mmnist.py:600) = CheckNumericsT=DT_FLOAT, message="Found Inf or NaN global norm.", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
[[{{node Adam/update/_1332}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_30256_Adam/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Process finished with exit code 1
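For what it's worth, the CheckNumerics failure above fires inside tf.clip_by_global_norm: the global norm is a single L2 norm over all gradients, so one Inf/NaN anywhere poisons it. A tiny NumPy illustration of that propagation:

```python
import numpy as np

# One healthy gradient and one containing a NaN (hypothetical values).
grads = [np.ones(3), np.array([1.0, np.nan])]
global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
# Essentially the check that VerifyFinite/CheckNumerics performs:
print(np.isfinite(global_norm))  # False: a single NaN makes the whole norm NaN
```

In practice this usually points to a too-large learning rate or an exploding activation earlier in the graph, not to the clipping op itself.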
Hey!
While calculating the Chamfer distance loss with the tf_nndistance module:
dists_forward, _, dists_backward, _ = tf_nndistance.nn_distance(predicted_frames[i], frames[i+int(seq_length/2)])
loss_cd = tf.reduce_mean(input_tensor=dists_forward+dists_backward)
dists_forward has shape (batch_size, #points_1), so the reduce_mean already averages over num_points; hence you are not dividing loss_cd by num_points again.
While calculating the Earth Mover's distance with the tf_approxmatch module:
match = tf_approxmatch.approx_match(frames[i+int(seq_length/2)], predicted_frames[i])
emd_distance = tf.reduce_mean(input_tensor=tf_approxmatch.match_cost(frames[i+int(seq_length/2)], predicted_frames[i], match))
match has shape batch_size * #query_points * #dataset_points, and match_cost sums the matching cost over the whole point cloud; reduce_mean then averages over the batch only, which is why you later divide the EMD with self.emd /= (int(seq_length/2)*num_points).
But in the loss calculation you use self.loss += (alpha*loss_cd + beta*loss_emd) before dividing loss_emd by num_points. That means loss_cd is a per-point average while loss_emd is a per-point-cloud sum. If we take alpha as 1 and beta as 1, doesn't that mean the two losses do not contribute equally?
@hehefan Could you please clarify this? Thanks.
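The scaling mismatch described above can be sketched with plain NumPy. This is a stand-in for the repo's CUDA ops: the identity matching used as an EMD proxy is an assumption made purely to show the orders of magnitude, not the real approximate matching.

```python
import numpy as np

# Hypothetical batch of predicted and target clouds, num_points points each.
rng = np.random.default_rng(0)
batch, num_points = 2, 128
pred = rng.normal(size=(batch, num_points, 3))
target = rng.normal(size=(batch, num_points, 3))

def chamfer(pred, target):
    # Per-point nearest-neighbour squared distances, like tf_nndistance.
    d = np.sum((pred[:, :, None, :] - target[:, None, :, :]) ** 2, axis=-1)
    dists_forward = d.min(axis=2)   # (batch, num_points)
    dists_backward = d.min(axis=1)  # (batch, num_points)
    # reduce_mean averages over batch AND points: already a per-point value.
    return np.mean(dists_forward + dists_backward)

def emd_proxy(pred, target):
    # Stand-in for match_cost: a transport cost SUMMED over all points
    # (identity matching, just to illustrate the scale).
    per_cloud_cost = np.sum(np.sum((pred - target) ** 2, axis=-1), axis=1)
    return np.mean(per_cloud_cost)  # mean over batch only: still a per-cloud sum

cd = chamfer(pred, target)
emd = emd_proxy(pred, target)
# The EMD term stays roughly num_points times larger until divided:
print(cd, emd, emd / num_points)
```

With alpha = beta = 1, the summed EMD term dominates the per-point Chamfer term by roughly a factor of num_points, which is the imbalance the question points at.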
Hello @hehefan ,
First thank you for the work and to make it available!
I am having difficulties loading the Argoverse dataset.
The Argoverse dataset I downloaded and extracted is in CSV format; it appears I should convert it to .npy files in order to load it into the program.
Could you explain how you extracted and preprocessed the Argoverse dataset in order to run "train-argo-nu.py"?
Thank you
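In the meantime, a minimal conversion sketch may help. The x/y/z column names and the (N, 3) float32 layout below are assumptions about the CSV contents, not the repo's actual preprocessing:

```python
import csv
import io
import tempfile

import numpy as np

def csv_points_to_npy(csv_text, out_path):
    """Parse x/y/z columns (hypothetical names) from CSV text and save
    an (N, 3) float32 array, the layout np.load-based loaders expect."""
    reader = csv.DictReader(io.StringIO(csv_text))
    pts = np.array(
        [[float(r["x"]), float(r["y"]), float(r["z"])] for r in reader],
        dtype=np.float32,
    )
    np.save(out_path, pts)
    return pts

# Tiny usage example with an in-memory CSV.
sample = "x,y,z\n1.0,2.0,3.0\n4.0,5.0,6.0\n"
with tempfile.NamedTemporaryFile(suffix=".npy", delete=False) as f:
    pts = csv_points_to_npy(sample, f.name)
    loaded = np.load(f.name)
```

Whether train-argo-nu.py expects one .npy per frame or one per sequence still needs confirmation from the author.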
Hi, @hehefan ,
Thanks for releasing such a useful package. However, there is little guidance on how to use it. Could you provide some (such as how to visualize the predictions, the training steps, etc.)?
THX!