isl-org / deeplagrangianfluids

Code repository for "Lagrangian Fluid Simulation with Continuous Convolutions", ICLR 2020.

License: Other

Shell 0.89% Python 99.11%

cnn convnet convolutional-neural-networks deeplearning fluids simulation

deeplagrangianfluids's Issues

Can you provide your generated data?

It takes so long for me to generate data. I will be very grateful if you can provide the generated data.

A question about the comparison of efficiency between cconv and traditional PBF

Excellent work!
But I didn't find a comparison of efficiency between PBF and your approach (CConv) in the paper.
So I wonder whether your approach runs much faster than traditional PBF when simulating the same scenes.
Thanks a lot!

Question about the code of evaluation on the whole sequence

Hi,
I wonder whether there is an issue with the code for evaluating on the whole sequence. The input of the model as well as gt_pos is the position of the fluid at current time step. After the model forwards, the new pr_pos should be the position of the next time step. However, in Line 233, the errors are still computed between pr_pos and gt_pos, which are not aligned in time.

Is my understanding wrong, or is this a bug in the code?

DeepLagrangianFluids/scripts/evaluate_network.py

Lines 221 to 226 in d651c6f

    
            init_pos = torch.from_numpy(data['pos0'][0]).to(device)
            init_vel = torch.from_numpy(data['vel0'][0]).to(device)
            inputs = (init_pos, init_vel, None, box, box_normals)
        else:
            inputs = (pr_pos, pr_vel, None, box, box_normals)

DeepLagrangianFluids/scripts/evaluate_network.py

Lines 232 to 238 in d651c6f

    
        gt_pos = data['pos0'][0]
        fluid_errors.add_errors(scene_id,
                                0,
                                frame_id,
                                scale * pr_pos.cpu().numpy(),
                                scale * gt_pos,
                                compute_gt2pred_distance=True)
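
For readers following the indexing question, here is a minimal sketch of a rollout evaluation in which the prediction for a frame is always compared with the ground truth for that same frame. The `step` function and the scalar "positions" are hypothetical stand-ins, not the repository's model or data, and the sketch makes no claim about what evaluate_network.py actually does.

```python
def rollout_errors(gt_frames, step):
    """Roll a one-step model forward and compare time-aligned frames.

    gt_frames: list of ground-truth states, one per frame (toy scalars here).
    step: function mapping the state at frame t to a prediction for frame t+1.
    """
    errors = []
    pred = gt_frames[0]  # frame 0 is initialised from ground truth
    for frame_id, gt in enumerate(gt_frames):
        if frame_id > 0:
            pred = step(pred)  # advance from frame_id - 1 to frame_id
        errors.append(abs(pred - gt))  # both values refer to frame_id
    return errors


# Toy data: the "true" dynamics add 1.0 per frame, the model adds 0.9.
gt = [0.0, 1.0, 2.0, 3.0]
errs = rollout_errors(gt, lambda x: x + 0.9)
```

The point of the layout is that no forward pass happens for frame 0, so a forward pass executed while visiting frame k produces the state for frame k rather than k+1.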

Question about the input of viscosity?

Hi! I noticed that the input of fluid particles is described by , which includes the viscosity. But in the code the input just contains [1, v] without viscosity. That makes me confused.
What's more, what's the meaning of the '1' in the tuple?
Looking forward to your reply!
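
To make the question concrete, here is a small sketch of a per-particle feature row of the form [1, v] that the post describes, with viscosity as an optional extra column. The function name and the way viscosity is appended are assumptions for illustration, not the repository's code. One common reading of the constant 1, offered as an assumption rather than the authors' answer, is that a constant feature lets a convolution respond to how many neighbours a particle has.

```python
def fluid_features(velocities, viscosities=None):
    """Build one feature row per particle: [1, vx, vy, vz] (+ viscosity)."""
    rows = []
    for i, v in enumerate(velocities):
        row = [1.0, *v]  # constant 1 followed by the velocity components
        if viscosities is not None:
            row.append(viscosities[i])  # hypothetical extra input channel
        rows.append(row)
    return rows


vel = [(0.0, -1.0, 0.0), (0.5, 0.0, 0.0)]
feats = fluid_features(vel)                                  # 4 values per particle
feats_visc = fluid_features(vel, viscosities=[0.01, 0.01])   # 5 values per particle
```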

The inference and training code doesn't work properly

After running the pretrained model:

cd scripts
./run_network.py --weights pretrained_model_weights.h5 \
                 --scene example_scene.json \
                 --output example_out \
                 --write-ply \
                 train_network.py

I got the right output (after visualizing it). However, I found that the code ran on the CPU even though GPU memory usage was about 100%, so inference took about 6 hours.
After running the training code:

cd datasets
./create_data.sh
cd ../
python scripts/train_network.py

it ran on the GPU, but the loss was NaN. I installed the latest SPlisHSPlasH (v2.7.0), which has replaced the Static/DynamicBoundarySimulator with SPHSimulator, which can handle both dynamic and static scenes, so I generated the data with SPHSimulator. I also tried an older version of SPlisHSPlasH (v2.5.0), but 'create_data.sh' cannot run successfully with DynamicBoundarySimulator.
I wonder whether the generated training data is wrong. Could you please provide your generated data or check the code?

Training the network with small timestep

Hi team,

I have a question about the timestep used in the paper. According to the paper, the input data was sampled at a frequency of 50 Hz. The ground-truth simulations run for 1 s each, and the timestep in this case is therefore 1/50 = 0.02 s. Why do we need to sample the simulation data at this frequency? Why not use all the data, with the timestep used in the SPH simulation? In my case I use the simulation's own timestep, which is on the order of 10^(-5) s, and the network behaves very strangely. Could you please explain the reason for this?

Thanks for your help.
Cheers
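
As an illustration of the sampling described in this question, the sketch below computes how many simulation steps lie between two stored frames when a simulation with a small timestep is subsampled to 50 Hz. The function names and the value `sim_dt=1e-5` are assumptions chosen to match the orders of magnitude mentioned in the post; a simulator with an adaptive timestep would need time-based rather than index-based selection.

```python
def frame_stride(sim_dt, target_dt=0.02):
    """Number of simulation steps between two stored frames."""
    stride = round(target_dt / sim_dt)
    if stride < 1:
        raise ValueError("target_dt must be at least sim_dt")
    return stride


def subsample(frames, stride):
    """Keep every stride-th frame, starting with the first."""
    return frames[::stride]


stride = frame_stride(sim_dt=1e-5)             # steps per 0.02 s frame
kept = subsample(list(range(10001)), stride)   # indices of the frames kept
```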

module 'partio' has no attribute 'read'

I got this error when I tried to run the example:

./run_network.py --weights pretrained_model_weights.h5 \
                 --scene example_scene.json \
                 --output example_out \
                 --write-ply \
                 train_network_tf.py

Is the version of partio different now?

pyopenvdb module

Hi,

I'm currently trying to use scripts/create_surface_meshes.py, but it asks for pyopenvdb. I'm wondering which version I should be downloading.
When I used python3 -m pip install pyopenvdb, it generates the following error

ERROR: Could not find a version that satisfies the requirement pyopenvdb
ERROR: No matching distribution found for pyopenvdb

Any insight on how this can be done is immensely appreciated!

*.msgpack.zst': No such file or directory

Hi,
After running the data generation script in the datasets subfolder (./create_data.sh), I get the following error for all sim__ folders generated under ours_default_scenes:
mv: cannot stat 'ours_default_data/sim__*.msgpack.zst': No such file or directory
This leads to empty valid and train folders under ours_default_data.
Any guidance on how to solve this would be great.

Thanks

importerror when run the example pretrained network

I got an error saying
ImportError: /home/xxx/FluidCC/MY_ENV/lib/python3.7/site-packages/open3d/cpu/pybind.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNSt3__119__shared_weak_countD2Ev
when running the example

./run_network.py --weights pretrained_model_weights.h5 \
                 --scene example_scene.json \
                 --output example_out \
                 --write-ply \
                 train_network_tf.py

I have added the path where the .so file is installed to LD_LIBRARY_PATH and restarted bash, but the error persists.
So I think it might be a problem with how the .so file was built, but I have no idea what causes this error.

The build log shows I only have three things that are skipped:

Performing Test CMAKE_HAVE_LIBC_PTHREAD - FAILED
Could not find Vulkan
Check the size of off64_t failed

I am running Ubuntu 20.04 and use a CPU version of TensorFlow 2.0 with Python 3.7 in my virtual environment.
Could you give me some suggestions?

SPlisHSPlasH binaries

splishsplash_config.py is described as the "Config file that stores the paths to the SPlisHSPlasH binaries", but I could not find the DynamicBoundarySimulator binary or the VolumeSampling binary in SPlisHSPlasH-2.4.0. There are only directories named DynamicBoundarySimulator and VolumeSampling. How do I get the binaries? Thanks!

Please set the path to the DynamicBoundarySimulator in

When I run:

cd datasets
./create_data.sh

it raises:

Traceback (most recent call last):
  File "create_physics_scenes.py", line 22, in <module>
    from splishsplash_config import SIMULATOR_BIN, VOLUME_SAMPLING_BIN
  File "/home/weiwenying/projects/tests/DeepLagrangianFluids/datasets/splishsplash_config.py", line 6, in <module>
    raise ValueError(
ValueError: Please set the path to the DynamicBoundarySimulator in /home/weiwenying/projects/tests/DeepLagrangianFluids/datasets/splishsplash_config.py

Where is DynamicBoundarySimulator, and how do I get it? Thanks!
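
The traceback shows that splishsplash_config.py is expected to define `SIMULATOR_BIN` and `VOLUME_SAMPLING_BIN`. Below is a minimal sketch of such a file with a check that both paths point at executables. The paths are hypothetical placeholders and `missing_binaries` is an illustrative helper, not part of the repository; the executables themselves usually exist only after SPlisHSPlasH has been built from source.

```python
import os

# Hypothetical install locations; replace them with wherever your own
# SPlisHSPlasH build placed its executables.
SIMULATOR_BIN = "/path/to/SPlisHSPlasH/bin/DynamicBoundarySimulator"
VOLUME_SAMPLING_BIN = "/path/to/SPlisHSPlasH/bin/VolumeSampling"


def missing_binaries(paths):
    """Return the entries of `paths` that are not executable files."""
    return [p for p in paths
            if not (os.path.isfile(p) and os.access(p, os.X_OK))]


missing = missing_binaries([SIMULATOR_BIN, VOLUME_SAMPLING_BIN])
if missing:
    print("Not found or not executable:", ", ".join(missing))
```

Running the check before ./create_data.sh gives a clearer message than a failure halfway through data generation.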

Will you provide a pytorch version continuous conv?

Thanks for the great work! I am amazed by the efficiency of Continuous Convolutions as described in the paper and hope that I can use it in pytorch. I see a pytorch folder in open3d.ml. Does this mean you will port CConv to pytorch soon?

Question about learning viscosity

Hi Benjamin,

Could you please describe in more detail how you trained the network to learn different viscosities? The results in the paper show that the model can predict different behaviors for different viscosities, but it is not clear how to obtain those results. Thanks for your help.

Cheers,
Duy

Visualizing Bounding Box Simulation in Blender

Hello,

Thanks for open-sourcing the code for this great paper! I'm exploring this field of learning particle-based fluid simulation, and I was looking to your codebase for how to visualize simulations. It seems the workflow involves importing a .blend file into Blender and using the blender_external_mesh.py file to load particle data.

The dataset I'm working with involves fluid simulation in bounding box containers. Is it possible for you to provide bounding_boxes.blend? I see the canyon blend file, but 'bounding_boxes.blend' is referenced in the obj files but not provided. I would really appreciate it. In addition, if you could provide any guidance on how to create such a blend file for known bounding box coordinates, that would be extremely helpful! (as I am very new to Blender). Thanks very much, I look forward to hearing from you.

[Question] Open3D with ML module -> Building Tensorflow ops on Windows is currently not supported.

Hey, this is not directly related to your repo, but you might know the answer:

When compiling Open3D with the ML module on Windows (with BUILD_GLFW=ON to avoid a problem with the GLFW lib), I got the following error:

$ cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_TENSORFLOW_OPS=ON -DBUILD_CUDA_MODULE=ON -DBUILD_GLFW=ON
-- New GUI is currently unsupported on Windows
<...>
-- Building library 3rdparty_poisson from source
CMake Error at src/Open3D/ML/CMakeLists.txt:2 (message):
Building Tensorflow ops on Windows is currently not supported.

That seems unexpected. Do you have a hint about what could be happening here?

Thanks!

Florent

Timestep used in the training process

Hello everyone,

I'm currently running some experiments with the code to understand its parameters. One thing I'm not clear about is the timestep used during training. Should the timestep (dt) used for training the model match the timestep used in the simulation? My simulation produces a list of files, each of which records the states of all particles in one frame. The timestep I used in the simulation is quite small (around 2e-5 seconds).

When I train the model with a timestep of 0.02 s, there is no issue: the prediction is not really good (especially for boundary collisions), but it looks sensible. The problem starts when I train the model with the same timestep as the simulation. During training the loss decreases over time, but it gets stuck and fluctuates around 0.3 or 0.4 without further improvement (I also tried modifying the learning rate, but that did not help). After training, I use that model to run a test case. However, when I feed in the first frame and let the model predict, the predicted position and the delta x blow up (reach very large values) after just a few steps (4 or 5).

Could you please suggest some possible reasons for this?

Thanks.
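
One way to see why the timestep matters is to look at how a position correction turns into a velocity. The one-dimensional sketch below assumes an update in which an intermediate position is integrated from the external acceleration, a correction `dx` is added, and the new velocity is recomputed from the position difference divided by `dt`. That form, the function name and the numbers are assumptions for illustration, not the repository's code, and this is one possible contributing factor rather than a diagnosis.

```python
def advance(x, v, dx, dt, a=-9.81):
    """One 1-D step: integrate, apply the correction dx, recompute velocity."""
    v_star = v + dt * a                    # velocity after external acceleration
    x_star = x + dt * (v + v_star) / 2.0   # intermediate position
    x_new = x_star + dx                    # apply the (learned) correction
    v_new = (x_new - x) / dt               # velocity from the position difference
    return x_new, v_new


dx = 1e-3  # hypothetical correction magnitude, identical in both cases
_, v_coarse = advance(0.0, 0.0, dx, dt=0.02)
_, v_fine = advance(0.0, 0.0, dx, dt=2e-5)
```

With the same correction, the velocity obtained for dt = 2e-5 is about a thousand times larger in magnitude than for dt = 0.02, so an output scale tuned for the larger timestep can destabilise a rollout at the smaller one.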

tf.function retracing warning

Hello,

I tried to use different sets of data (generated by me) for network training, however, for some datasets, I get the following warning, but the training finishes.

WARNING:tensorflow:5 out of the last 5 calls to <function main..train at 0x7f48328aa560> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.

For other datasets, sometimes the warning shows up and the training crashes right after or after several steps. Sometimes, everything goes ok.

What could be the reason for the warning? All datasets go through the same collection and conversion procedure.

GPU consumption keeps increasing until CUDA out of memory

Hi, I ran into an OOM issue during training. GPU memory consumption seems to keep increasing inside the training loop, so I added some print statements to track the allocated GPU memory.

    while trainer.keep_training(step,
                                train_params.max_iter,
                                checkpoint_manager=manager,
                                display_str_list=display_str_list):
        print("begin loop:{}".format(torch.cuda.memory_allocated()))
        data_fetch_start = time.time()
        batch = next(data_iter)
        batch_torch = {}
        for k in ('pos0', 'vel0', 'pos1', 'pos2', 'box', 'box_normals'):
            batch_torch[k] = [torch.from_numpy(x).to(device) for x in batch[k]]
        data_fetch_latency = time.time() - data_fetch_start
        trainer.log_scalar_every_n_minutes(5, 'DataLatency', data_fetch_latency)
        # print("batch prepared:{}".format(torch.cuda.memory_allocated()))

        current_loss = train(model, batch_torch)
        scheduler.step()
        display_str_list = ['loss', float(current_loss)]
        # print("loss computed:{}".format(torch.cuda.memory_allocated()))

        if trainer.current_step % 10 == 0:
            trainer.summary_writer.add_scalar('TotalLoss', float(current_loss),
                                              trainer.current_step)
            trainer.summary_writer.add_scalar('LearningRate',
                                              scheduler.get_last_lr()[0],
                                              trainer.current_step)
            # print("write loss and lr to summary:{}".format(torch.cuda.memory_allocated()))

        if trainer.current_step % (1 * _k) == 0:
            for k, v in evaluate(model,
                                 val_dataset,
                                 frame_skip=20,
                                 device=device).items():
                trainer.summary_writer.add_scalar('eval/' + k, v,
                                                  trainer.current_step) 
            # print("evaluate completed:{}".format(torch.cuda.memory_allocated()))
        torch.cuda.empty_cache()
        # print(torch.cuda.memory_allocated(device=0))
        del current_loss
        del batch
        del batch_torch
        # print("after delete:{}".format(torch.cuda.memory_allocated()))

I find that the del lines at the bottom free some memory, but not enough to offset what is consumed during batch generation and loss computation. Do you have any suggestions? I am confused about what causes the consumption to keep increasing.

Questions about boundary particle in run_network.py

I see that you use 1.9 * S(obj) / S(particle) to roughly match the number of boundary particles. Why don't you directly use the boundary particles output by SPlisHSPlasH's surface sampling?
That is, you use its volume sampling to generate the fluid particles but do not use its surface sampling to generate the boundary particles, which introduces a new variable. For example, if I want to compare the pretrained model with the ground truth generated by SPlisHSPlasH, I would want both to start from the same fluid particles and boundary particles. But now they have different boundary particles.

A question about network's output

I have read your paper. It's really excellent work.
But I'm a little confused about why the output is divided by 128 at the end.

This is the sentence in the paper that mentions it:

The output of the network is scaled with 1/128 to roughly adjust the output range to the ground truth position correction of the training data.

I hope you can answer my question. Thanks!

The shape of pre-trained model does not match the model defined in ‘default.py’

Hello, I am very interested in this exciting work. I tried to run the inference code, but I ran into a problem: the shape of the pre-trained model does not match the model defined in ‘default.py’. Could you please provide the 'default.py' that matches the pre-trained parameters?

can not create object

Hi, I tried to use open3d 0.17.0 to generate the training data, but it always reports that creating the object failed, and no fluid particles are generated. Are newer versions of open3d not supported?

Error when running ./train_network_tf.py

Here is the full output of ./train_network_tf.py.
Any idea how to solve this?

2021-04-08 12:05:13.984267: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
['./../datasets/ours_default_data/valid/sim_0201_00.msgpack.zst', './../datasets/ours_default_data/valid/sim_0201_01.msgpack.zst', './../datasets/ours_default_data/valid/sim_0201_02.msgpack.zst', './../datasets/ours_default_data/valid/sim_0201_03.msgpack.zst', './../datasets/ours_default_data/valid/sim_0201_04.msgpack.zst', './../datasets/ours_default_data/valid/sim_0201_05.msgpack.zst', './../datasets/ours_default_data/valid/sim_0201_06.msgpack.zst', './../datasets/ours_default_data/valid/sim_0201_07.msgpack.zst', './../datasets/ours_default_data/valid/sim_0201_08.msgpack.zst', './../datasets/ours_default_data/valid/sim_0201_09.msgpack.zst', './../datasets/ours_default_data/valid/sim_0201_10.msgpack.zst', './../datasets/ours_default_data/valid/sim_0201_11.msgpack.zst', './../datasets/ours_default_data/valid/sim_0201_12.msgpack.zst', './../datasets/ours_default_data/valid/sim_0201_13.msgpack.zst', './../datasets/ours_default_data/valid/sim_0201_14.msgpack.zst', './../datasets/ours_default_data/valid/sim_0201_15.msgpack.zst', './../datasets/ours_default_data/valid/sim_0202_00.msgpack.zst', './../datasets/ours_default_data/valid/sim_0202_01.msgpack.zst', './../datasets/ours_default_data/valid/sim_0202_02.msgpack.zst', './../datasets/ours_default_data/valid/sim_0202_03.msgpack.zst'] ...
['./../datasets/ours_default_data/train/sim_0001_00.msgpack.zst', './../datasets/ours_default_data/train/sim_0001_01.msgpack.zst', './../datasets/ours_default_data/train/sim_0001_02.msgpack.zst', './../datasets/ours_default_data/train/sim_0001_03.msgpack.zst', './../datasets/ours_default_data/train/sim_0001_04.msgpack.zst', './../datasets/ours_default_data/train/sim_0001_05.msgpack.zst', './../datasets/ours_default_data/train/sim_0001_06.msgpack.zst', './../datasets/ours_default_data/train/sim_0001_07.msgpack.zst', './../datasets/ours_default_data/train/sim_0001_08.msgpack.zst', './../datasets/ours_default_data/train/sim_0001_09.msgpack.zst', './../datasets/ours_default_data/train/sim_0001_10.msgpack.zst', './../datasets/ours_default_data/train/sim_0001_11.msgpack.zst', './../datasets/ours_default_data/train/sim_0001_12.msgpack.zst', './../datasets/ours_default_data/train/sim_0001_13.msgpack.zst', './../datasets/ours_default_data/train/sim_0001_14.msgpack.zst', './../datasets/ours_default_data/train/sim_0001_15.msgpack.zst', './../datasets/ours_default_data/train/sim_0002_00.msgpack.zst', './../datasets/ours_default_data/train/sim_0002_01.msgpack.zst', './../datasets/ours_default_data/train/sim_0002_02.msgpack.zst', './../datasets/ours_default_data/train/sim_0002_03.msgpack.zst'] ...
[0408 12:05:28 @parallel.py:340] [MultiProcessRunnerZMQ] Will fork a dataflow more than one times. This assumes the datapoints are i.i.d.
2021-04-08 12:05:29.748225: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Process _Worker-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/parallel.py", line 313, in run
    dp = next(itr)
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/parallel.py", line 50, in _repeat_iter
    yield from get_itr()
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/common.py", line 657, in __iter__
    for dp in self._inf_iter:
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/common.py", line 389, in __iter__
    yield from self.ds
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/common.py", line 389, in __iter__
    yield from self.ds
  File "./../datasets/dataset_reader_physics.py", line 46, in __iter__
    box = data[0]['box']
IndexError: list index out of range
2021-04-08 12:05:32.033298: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
./../datasets/ours_default_data/train/sim_0118_08.msgpack.zst HERE !!
Process _Worker-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/parallel.py", line 313, in run
    dp = next(itr)
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/parallel.py", line 50, in _repeat_iter
    yield from get_itr()
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/common.py", line 657, in __iter__
    for dp in self._inf_iter:
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/common.py", line 389, in __iter__
    yield from self.ds
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/common.py", line 389, in __iter__
    yield from self.ds
  File "./../datasets/dataset_reader_physics.py", line 46, in __iter__
    box = data[0]['box']
IndexError: list index out of range
2021-04-08 12:05:44.522652: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-04-08 12:05:47.379310: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-04-08 12:05:47.437509: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-08 12:05:47.438251: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:03:00.0 name: GeForce 940MX computeCapability: 5.0
coreClock: 1.2415GHz coreCount: 3 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 13.41GiB/s
2021-04-08 12:05:47.438293: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-04-08 12:05:47.438434: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory
2021-04-08 12:05:47.464373: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-04-08 12:05:47.494165: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-04-08 12:05:47.544487: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-04-08 12:05:47.565652: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-04-08 12:05:47.662424: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-04-08 12:05:47.662515: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-04-08 12:05:47.663295: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-04-08 12:05:47.793170: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2899885000 Hz
2021-04-08 12:05:47.794340: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5b046d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-04-08 12:05:47.794459: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-04-08 12:05:47.823957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-08 12:05:47.824033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      
# 2021-04-08 12:05:48        0 n/a ips                 n/a rem | 
[0408 12:05:48 @parallel.py:351] ERR Exception '<class 'IndexError'>' in worker:
Traceback (most recent call last):
  File "./train_network_tf.py", line 165, in <module>
    sys.exit(main())
  File "./train_network_tf.py", line 134, in main
    batch = next(data_iter)
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/common.py", line 120, in __iter__
    for data in self.ds:
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/parallel.py", line 363, in __iter__
    yield self._recv()
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/parallel.py", line 352, in _recv
    raise exc.exc_type(exc.exc_msg)
IndexError: Traceback (most recent call last):
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/parallel.py", line 313, in run
    dp = next(itr)
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/parallel.py", line 50, in _repeat_iter
    yield from get_itr()
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/common.py", line 657, in __iter__
    for dp in self._inf_iter:
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/common.py", line 389, in __iter__
    yield from self.ds
  File "/****/DeepLagrangianFluids/env36/lib/python3.6/site-packages/dataflow/dataflow/common.py", line 389, in __iter__
    yield from self.ds
  File "./../datasets/dataset_reader_physics.py", line 46, in __iter__
    box = data[0]['box']
IndexError: list index out of range

MultiProcessRunnerZMQ successfully cleaned-up.
MultiProcessRunnerZMQ successfully cleaned-up.
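
The `IndexError` at `data[0]['box']` means that `data` was an empty list for at least one file. Below is a small sketch for locating such files before training. `find_empty_records` and the injected `load` callable are hypothetical; a real loader would decompress and unpack each .msgpack.zst file, which is left out so the sketch stays dependency-free.

```python
def find_empty_records(paths, load):
    """Return the paths whose decoded content is an empty list.

    load: callable taking a path and returning the decoded list of frames.
    """
    return [p for p in paths if len(load(p)) == 0]


# Stand-in for decoded files; the names and contents are made up.
fake_store = {
    "a.msgpack.zst": [{"box": [0.0, 0.0, 0.0]}],
    "b.msgpack.zst": [],
}
empty = find_empty_records(sorted(fake_store), fake_store.__getitem__)
```

Files reported this way can then be regenerated or excluded from the train and valid folders.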

During simulation, "Error(s) in loading state_dict for MyParticleNetwork"

Hi, thanks for the wonderful work!

I encountered an error when trying to run the simulation for the canyon scene by following the instructions. I'm using PyTorch on Ubuntu 20.04.

When I train the network from scratch, I get a list of models under train_network_torch_default/checkpoints, such as ckpt-50000.pt. Then when I run:

../scripts/run_network.py --weights ../train_network_torch_default/checkpoints/ckpt-50000.pt \
                          --scene canyon_scene.json \
                          --output canyon_out \
                          --num_steps 1500 \
                          ../scripts/train_network_torch.py

I get an error telling me the .pt file isn't correct:

Namespace(device='cuda', num_steps=1500, output='canyon_out', scene='canyon_scene.json', trainscript='../scripts/train_network_torch.py', weights='../exp_original_code/checkpoints/ckpt-50000.pt', write_bgeo=False, write_ply=False)
Traceback (most recent call last):
  File "../scripts/run_network.py", line 240, in <module>
    sys.exit(main())
  File "../scripts/run_network.py", line 236, in main
    args.num_steps, args.output, args)
  File "../scripts/run_network.py", line 110, in run_sim_torch
    model.load_state_dict(weights)
  File "/home/rayne/anaconda3/envs/py3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1605, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for MyParticleNetwork:
	Missing key(s) in state_dict: "gravity", "conv0_fluid.kernel", "conv0_fluid.bias", "conv0_fluid.offset", "conv0_obstacle.kernel", "conv0_obstacle.bias", "conv0_obstacle.offset", "dense0_fluid.weight", "dense0_fluid.bias", "dense1.weight", "dense1.bias", "conv1.kernel", "conv1.bias", "conv1.offset", "dense2.weight", "dense2.bias", "conv2.kernel", "conv2.bias", "conv2.offset", "dense3.weight", "dense3.bias", "conv3.kernel", "conv3.bias", "conv3.offset". 
	Unexpected key(s) in state_dict: "step", "model", "optimizer", "scheduler".

When I replace ckpt-50000.pt with the file pretrained_model_weights.pt, the command works. Is the ckpt-50000.pt file structured differently from the provided pretrained_model_weights.pt file?

Could you help me with this? Many thanks!
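
The error message lists "step", "model", "optimizer" and "scheduler" as unexpected keys, which suggests the training checkpoint wraps the model weights in a larger dictionary. Under that assumption, the sketch below unwraps either kind of file. `extract_model_weights` is a hypothetical helper, and plain dictionaries stand in for the objects that `torch.load` would return.

```python
def extract_model_weights(checkpoint):
    """Return the model state dict from a bare or a wrapped checkpoint."""
    wrapper_keys = {"step", "model", "optimizer", "scheduler"}
    is_wrapped = (isinstance(checkpoint, dict)
                  and "model" in checkpoint
                  and set(checkpoint) <= wrapper_keys)
    # Wrapped training checkpoint: the weights live under "model".
    # Bare weights file: use the dictionary as it is.
    return checkpoint["model"] if is_wrapped else checkpoint


# Toy stand-ins for loaded files (with PyTorch: checkpoint = torch.load(path)).
wrapped = {"step": 50000, "model": {"dense1.weight": [0.0]},
           "optimizer": {}, "scheduler": {}}
bare = {"dense1.weight": [0.0]}
```

The returned dictionary is what would be passed to `model.load_state_dict`.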

Got an AttributeError when the code call the "open3d.ml.tf.layers()"

After I ran the commands in the "Building Open3D with ML module" section to install the Open3D ML module:

git clone --branch ml-module https://github.com/intel-isl/Open3D.git

mkdir Open3D/build
cd Open3D/build

cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_TENSORFLOW_OPS=ON

I got a CMake error:

CMake Error at /d/WangYue/venv/lib/python3.5/site-packages/cmake/data/share/cmake-3.17/Modules/FindPackageHandleStandardArgs.cmake:164 (message):
  Could NOT find nanoflann (missing: nanoflann_INCLUDE_DIR)
Call Stack (most recent call first):
  /d/WangYue/venv/lib/python3.5/site-packages/cmake/data/share/cmake-3.17/Modules/FindPackageHandleStandardArgs.cmake:445 (_FPHSA_FAILURE_MESSAGE)
  3rdparty/CMake/Findnanoflann.cmake:15 (find_package_handle_standard_args)
  src/Open3D/ML/TensorFlow/CMakeLists.txt:12 (find_package)

So I ran these commands to clone and configure it:

git clone --recursive https://github.com/intel-isl/Open3D
cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_TENSORFLOW_OPS=ON
make install-pip-package

But I got an error message when I tried to run the pretrained model:

Traceback (most recent call last):
  File "./run_network.py", line 13, in <module>
    from create_physics_scenes import obj_surface_to_particles, obj_volume_to_particles
  File "./../datasets/create_physics_scenes.py", line 20, in <module>
    import open3d as o3d
  File "/d/WangYue/venv/lib/python3.5/site-packages/open3d/__init__.py", line 46, in <module>
    from open3d.open3d_pybind import camera
ImportError: /d/WangYue/venv/lib/python3.5/site-packages/open3d/open3d_pybind.cpython-35m-x86_64-linux-gnu.so: undefined symbol: XcursorImageLoadCursor

Then I changed the cmake command:

cmake -DBUILD_EIGEN3=ON  \
      -DBUILD_GLEW=ON    \
      -DBUILD_JSONCPP=ON \
      -DBUILD_PNG=ON     \
      -DBUILD_TENSORFLOW_OPS=ON \
      ..

However, I got a new error when I ran the pretrained model:

Traceback (most recent call last):
  File "./run_network.py", line 152, in <module>
    sys.exit(main())
  File "./run_network.py", line 148, in main
    args.output, args)
  File "./run_network.py", line 40, in run_sim
    model = trainscript_module.create_model()
  File "/e/WangYue/DeepLagrangianFluids/scripts/train_network.py", line 32, in create_model
    model = MyParticleNetwork()
  File "/e/WangYue/DeepLagrangianFluids/scripts/../models/default.py", line 58, in __init__
    activation=None)
  File "/e/WangYue/DeepLagrangianFluids/scripts/../models/default.py", line 36, in Conv
    conv_fn = o3dml.layers.ContinuousConv
AttributeError: module 'open3d.ml.tf' has no attribute 'layers'

By the way, "import open3d.ml.tf" itself succeeds without errors.
How can I fix it?
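One way to narrow down an error like this is to check which installed file the import resolves to and which attribute in the chain is missing. The sketch below is a generic diagnostic only; `describe_module` is a hypothetical helper name, and it does not fix the build. It is demonstrated on the standard library `os` module so it runs anywhere.

```python
import importlib


def describe_module(name, attr_path):
    """Import `name` and walk `attr_path` (dot-separated), reporting the first missing part."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return {"imported": False, "file": None, "missing": None, "error": str(exc)}

    obj = module
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            # Stop at the first attribute that does not exist.
            return {"imported": True,
                    "file": getattr(module, "__file__", None),
                    "missing": part,
                    "error": None}
        obj = getattr(obj, part)

    return {"imported": True,
            "file": getattr(module, "__file__", None),
            "missing": None,
            "error": None}


# Demonstration on a standard-library module.
ok = describe_module("os", "path.join")
bad = describe_module("os", "layers")
print(ok["missing"], bad["missing"])
```

For the case above, calling `describe_module("open3d.ml.tf", "layers.ContinuousConv")` would show the path of the package that was actually imported, which helps to tell whether an older or differently configured Open3D build is being picked up.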

Open3D/3rdparty/fmt/include/fmt/core.h(477): error: identifier "parse_context" is undefined

I ran the commands from the "Building Open3D with ML module" section to install the Open3D ML module:
git clone --recursive --branch ml-module https://github.com/intel-isl/Open3D.git
mkdir Open3D/build
cd Open3D/build

cmake .. -DBUILD_EIGEN3=ON -DBUILD_GLEW=ON -DBUILD_JSONCPP=ON -DBUILD_PNG=ON -DCMAKE_BUILD_TYPE=Release -DBUILD_TENSORFLOW_OPS=ON -DBUILD_CUDA_MODULE=ON
make install-pip-package
After running make install-pip-package, I got a make error:
[ 91%] Linking CXX shared module ../../lib/Release/Python/pybind.cpython-37m-x86_64-linux-gnu.so
[ 91%] Built target pybind
Scanning dependencies of target open3d_tf_ops
[ 91%] Building CUDA object src/Open3D/ML/TensorFlow/CMakeFiles/open3d_tf_ops.dir/__/ContinuousConv/Detail/ContinuousConvCUDAKernels.cu.o
[ 91%] Building CXX object src/Open3D/ML/TensorFlow/CMakeFiles/open3d_tf_ops.dir/ContinuousConv/ContinuousConvBackpropFilterOpKernel.cpp.o
[ 91%] Building CUDA object src/Open3D/ML/TensorFlow/CMakeFiles/open3d_tf_ops.dir/ContinuousConv/ContinuousConvBackpropFilterOpKernel.cu.o
/home/es2/anaconda3/lib/python3.7/site-packages/tensorflow/include/Eigen/src/Core/util/XprHelper.h(114): warning: __host__ annotation is ignored on a function("no_assignment_operator") that is explicitly defaulted on its first declaration

/home/es2/anaconda3/lib/python3.7/site-packages/tensorflow/include/Eigen/src/Core/util/XprHelper.h(114): warning: __device__ annotation is ignored on a function("no_assignment_operator") that is explicitly defaulted on its first declaration

/home/es2/anaconda3/lib/python3.7/site-packages/tensorflow/include/Eigen/src/Core/util/XprHelper.h(115): warning: __host__ annotation is ignored on a function("no_assignment_operator") that is explicitly defaulted on its first declaration
/home/es2/Open3D/3rdparty/fmt/include/fmt/core.h(477): error: identifier "parse_context" is undefined

/home/es2/Open3D/3rdparty/fmt/include/fmt/core.h(477): error: expected a ";"

/home/es2/Open3D/3rdparty/fmt/include/fmt/core.h(478): error: identifier "wparse_context" is undefined
/home/es2/Open3D/3rdparty/fmt/include/fmt/core.h(478): error: expected a ";"

/home/es2/Open3D/3rdparty/fmt/include/fmt/format.h(2599): error: identifier "writer" is undefined

/home/es2/Open3D/3rdparty/fmt/include/fmt/format.h(2599): error: expected a ";"

/home/es2/Open3D/3rdparty/fmt/include/fmt/format.h(2600): error: identifier "wwriter" is undefined

/home/es2/Open3D/3rdparty/fmt/include/fmt/format.h(2600): error: expected a ";"

8 errors detected in the compilation of "/tmp/tmpxft_00003f1d_00000000-6_ContinuousConvBackpropFilterOpKernel.cpp1.ii".
src/Open3D/ML/TensorFlow/CMakeFiles/open3d_tf_ops.dir/build.make:108: recipe for target 'src/Open3D/ML/TensorFlow/CMakeFiles/open3d_tf_ops.dir/ContinuousConv/ContinuousConvBackpropFilterOpKernel.cu.o' failed
make[3]: *** [src/Open3D/ML/TensorFlow/CMakeFiles/open3d_tf_ops.dir/ContinuousConv/ContinuousConvBackpropFilterOpKernel.cu.o] Error 1
CMakeFiles/Makefile2:1584: recipe for target 'src/Open3D/ML/TensorFlow/CMakeFiles/open3d_tf_ops.dir/all' failed
make[2]: *** [src/Open3D/ML/TensorFlow/CMakeFiles/open3d_tf_ops.dir/all] Error 2
CMakeFiles/Makefile2:1977: recipe for target 'src/pybind/CMakeFiles/install-pip-package.dir/rule' failed
make[1]: *** [src/pybind/CMakeFiles/install-pip-package.dir/rule] Error 2
Makefile:691: recipe for target 'install-pip-package' failed
make: *** [install-pip-package] Error 2

How can I fix it?

Is there any description about the dataset?

Hi, I downloaded the dataset, but it cannot be decoded with a single msgpack.loads call. Many things seem to need further decoding: the dictionary keys are binary strings and the pos values are bytes, and I don't know how the data was encoded.
Do you have any suggestions?

Thank you!
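As a rough illustration of the post-processing described above, the sketch below converts bytes dictionary keys to strings and unpacks a raw byte buffer into 3D points. It uses only the standard library. The helper names `decode_bytes_keys` and `unpack_float32_points` are hypothetical, the sample record is made up, and the assumption that positions are little-endian float32 triples is not confirmed by the dataset. Note also that msgpack's `unpackb` accepts `raw=False`, which decodes string keys to `str` directly.

```python
import struct


def decode_bytes_keys(obj):
    """Recursively convert bytes dictionary keys to str; values are left untouched."""
    if isinstance(obj, dict):
        return {
            (key.decode("utf-8") if isinstance(key, bytes) else key): decode_bytes_keys(value)
            for key, value in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        return [decode_bytes_keys(item) for item in obj]
    return obj


def unpack_float32_points(raw):
    """Interpret a bytes buffer as little-endian float32 (x, y, z) triples (assumed layout)."""
    count = len(raw) // 4
    flat = struct.unpack("<%df" % count, raw)
    return [flat[i:i + 3] for i in range(0, count, 3)]


# Made-up stand-in for one decoded frame; the real dataset layout may differ.
sample = [{b"frame_id": 0, b"pos": struct.pack("<6f", 0.0, 0.0, 0.0, 1.0, 2.0, 3.0)}]

decoded = decode_bytes_keys(sample)
points = unpack_float32_points(decoded[0]["pos"])
print(decoded[0]["frame_id"], points)
```

If the values turn out to be nested dictionaries that describe arrays (dtype, shape, data) rather than plain buffers, the buffer step would need to read the dtype and shape from those fields instead of assuming float32 triples.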

MapInvert in scene.json not set to TRUE and the scene has abnormal behavior

Hi,

I used ./create_data.sh to generate the training and test sets. However, when I rendered the generated scene.json in SPlisHSPlasH, the scene behaved abnormally. Unlike the DamBreak files, it does not set MapInvert to TRUE. Are the actual training sets generated like this?

Thank you!

Zeyi

        init_pos = torch.from_numpy(data['pos0'][0]).to(device)
        init_vel = torch.from_numpy(data['vel0'][0]).to(device)

        inputs = (init_pos, init_vel, None, box, box_normals)
    else:
        inputs = (pr_pos, pr_vel, None, box, box_normals)

    gt_pos = data['pos0'][0]
    fluid_errors.add_errors(scene_id,
                            0,
                            frame_id,
                            scale * pr_pos.cpu().numpy(),
                            scale * gt_pos,
                            compute_gt2pred_distance=True)
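To make the time-alignment concern concrete, here is a toy sketch that is independent of the repository code. The `step` function is a hypothetical stand-in for the network and advances a single 1D particle exactly. Even with this perfect model, comparing the prediction with the ground truth of the input frame gives a nonzero error, while comparing with the next frame gives zero. Whether this applies to evaluate_network.py depends on how the data loader indexes frames, which the excerpt does not show.

```python
def step(pos, vel, dt=1.0):
    """Toy stand-in for the network: advance one particle by one time step."""
    return pos + vel * dt, vel


# Ground-truth 1D trajectory with constant velocity: gt[t] = 2 * t.
vel = 2.0
gt = [vel * t for t in range(5)]

aligned_errors = []
unaligned_errors = []
pr_pos, pr_vel = gt[0], vel
for t in range(len(gt) - 1):
    # The output of the step is the prediction for frame t + 1.
    pr_pos, pr_vel = step(pr_pos, pr_vel)
    aligned_errors.append(abs(pr_pos - gt[t + 1]))  # compared at the same time step
    unaligned_errors.append(abs(pr_pos - gt[t]))    # compared one frame too early

print(aligned_errors)
print(unaligned_errors)
```

In this toy setting the unaligned error equals the distance travelled in one step, so it would bias every frame by roughly the per-step displacement.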