brummi / behindthescenes
Official implementation of the paper: Behind the Scenes: Density Fields for Single View Reconstruction (CVPR 2023)
Home Page: https://fwmb.github.io/bts/
Hi Brummi,
Thanks a lot for the awesome code! I noticed you use quite a small MLP to render the density field, e.g., ResnetFC with a small hidden dimension (64) and without any ResNet blocks (0).
I wonder whether, in your previous experiments, a larger MLP with 1) larger hidden dimensions or 2) more layers led to suboptimal results.
Thanks a lot for the information :)
Hi Brummi,
I tried to evaluate your provided model on the KITTI-raw and KITTI-360 datasets; both yielded suboptimal results:
o_acc: 0.944 | ie_acc: 0.771 | ie_rec: 0.439
o_acc: 0.95 | ie_acc: 0.82 | ie_rec: 0.47
abs_rel: 0.102 | rmse: 4.409 | a1: 0.881
abs_rel: 0.102 | rmse: 4.407 | a1: 0.882
Even using your provided model, there is a large evaluation gap on KITTI-360: for ie_acc, the gap is 0.771 vs. 0.82. Although the KITTI-raw scores differ only slightly from yours, the numbers are not exactly the same. I hope to make sure:
I also observed a further performance decline with my own trained model, i.e., for KITTI-raw, abs_rel: 0.104 | rmse: 4.554 | a1: 0.874, and for KITTI-360, o_acc: 0.948 | ie_acc: 0.784 | ie_rec: 0.369. Can you provide some suggestions to faithfully reproduce your results?
Thank you for your information!
Hi,
When I launch the training process, I get the error below.
I think the training needs preprocessed data, not the raw data.
File "/rockywin.wang/NeRF/BehindTheScenes/datasets/kitti_360/kitti_360_dataset.py", line 571, in __getitem__
imgs_p_left, imgs_f_left, imgs_p_right, imgs_f_right = self.load_images(sequence, img_ids, load_left, load_right, img_ids_fish=img_ids_fish)
File "/rockywin.wang/NeRF/BehindTheScenes/datasets/kitti_360/kitti_360_dataset.py", line 442, in load_images
img_perspective = cv2.cvtColor(cv2.imread(os.path.join(self.data_path, "data_2d_raw", seq, "image_00", self._perspective_folder, f"{id:010d}.png")), cv2.COLOR_BGR2RGB).astype(np.float32) / 255
cv2.error: OpenCV(4.5.3) /tmp/pip-req-build-afu9cjzs/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'
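The `!_src.empty()` assertion in `cvtColor` typically means `cv2.imread` returned `None` because the image file could not be read. A minimal sketch for listing which expected frames are missing on disk, assuming the directory layout visible in the traceback above. The helper name `find_missing_frames` and the folder name passed in the usage line are hypothetical and not part of the repository:

```python
import os

def find_missing_frames(data_path, seq, perspective_folder, img_ids):
    """Return the paths of expected perspective images that do not exist on disk."""
    missing = []
    for img_id in img_ids:
        # Same layout as in the traceback: data_2d_raw/<seq>/image_00/<folder>/<id>.png
        path = os.path.join(data_path, "data_2d_raw", seq, "image_00",
                            perspective_folder, f"{img_id:010d}.png")
        if not os.path.isfile(path):
            missing.append(path)
    return missing
```

For example, `find_missing_frames("/path/to/KITTI-360", "<sequence>", "<perspective folder>", range(10))` returns an empty list only if all ten files exist, which helps tell apart a wrong data path from a missing preprocessing step.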
Hi, I noticed that the load_depth function in kitti_360_dataset.py is wrong because a line like
points = points[points[:, 0] >= 0, :]
is missing. As a result, points behind the sensor can mask out points in front of it in the subsequent processing.
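A minimal sketch of the filtering step this report refers to, assuming `points` is an (N, 3) or (N, 4) array whose first column is the forward distance in the sensor frame. The helper name is hypothetical, and this is not the repository's `load_depth`:

```python
import numpy as np

def drop_points_behind_sensor(points):
    """Keep only points with a non-negative first (forward) coordinate."""
    points = np.asarray(points)
    return points[points[:, 0] >= 0, :]

pts = np.array([[ 5.0, 1.0, 0.2],
                [-5.0, 1.0, 0.2],   # behind the sensor
                [ 0.0, 2.0, 0.1]])
front = drop_points_behind_sensor(pts)  # two rows remain
```

Without such a filter, a point behind the sensor can still land inside the image after the perspective division, which is consistent with the masking effect described above.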
Is the result of this method on KITTI-Raw based on stereo cameras? I would like to know whether performance is still competitive with only a monocular camera or with surround-view cameras.
Nice work! Some details confuse me, like the code below. Why do you define two evaluators? They are identical except for when they are called. There might be some engineering strategy at play, and I would appreciate the opportunity to learn from it.
# We define two evaluators as they wont have exactly similar roles:
# - `evaluator` will save the best model based on validation score
evaluator = create_evaluator(model, metrics=eval_metrics, criterion=criterion if loss_during_validation else None, config=config)
if vis_loader is not None:
    visualizer = create_evaluator(model, metrics=eval_metrics, criterion=criterion if loss_during_validation else None, config=config)
else:
    visualizer = None
I'm asking because it seems there haven't been any commits for a few months and the evaluation code has not been released yet.
Hey Brummi~ another amazing job after MonoRec ~
I have a question about ray_batch_size: is this an empirically chosen parameter?
If I train BTS on a custom dataset, does this parameter have a big impact?
Hi, @Brummi
Can you show me the example code for how to generate the camera trajectories?
So I can set the camera trajectories on our own dataset?
Hi, I'm currently testing on the KITTI-360 dataset and trying to visualize the ground truth depth map of a data sample (the output returned by the load_depth function of Kitti360Dataset). However, it doesn't show the proper depth profile of the corresponding scene image, as shown below:
real image:
Is there any additional processing I need to apply to the depth maps in order to visualize them properly? Thanks
Thanks for your great work! I am interested in visualizing occupancies behind the scenes as depicted in Fig. 4 of your paper. I succeeded in training on KITTI-360 and generating depth sequences, but the transition to BEV is not very successful. Do you use the script python scripts/videos/gen_vid_transition.py to get results similar to Fig. 4?
Thanks for your great work! The log file confuses me a little. Take epoch 2 as an example: it appears 5 times. Could you please explain?
Epoch 1 - Evaluation time (seconds): 4.40 - Vis metrics:
abs_rel: 0.16088220118284555
sq_rel: 1.6475946958800654
rmse: 6.0004214998035215
rmse_log: 0.2652074425737867
a1: 0.8132480978965759
a2: 0.9111361503601074
a3: 0.9498652219772339
2023-07-08 16:39:02,464 kitti_raw INFO: Epoch[1] Complete. Time taken: 00:46:03.668
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[2023-07-08 16:39:19,277][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Evaluation (val): [1/1] 100%|███████████████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-08 16:39:23,725][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:04.238
[2023-07-08 16:39:23,725][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:04.447
2023-07-08 16:39:23,809 kitti_raw INFO:
Epoch 2 - Evaluation time (seconds): 4.45 - Vis metrics:
abs_rel: 0.23017341934595395
sq_rel: 3.088896086703511
rmse: 7.271316281467014
rmse_log: 0.3078563333555099
a1: 0.7813498377799988
a2: 0.8981010913848877
a3: 0.9393996000289917
[2023-07-08 16:47:56,868][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Evaluation (val): [1/1] 100%|███████████████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-08 16:48:01,295][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:04.211
[2023-07-08 16:48:01,296][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:04.427
2023-07-08 16:48:01,401 kitti_raw INFO:
Epoch 2 - Evaluation time (seconds): 4.43 - Vis metrics:
abs_rel: 0.1650394400605743
sq_rel: 1.998698890271272
rmse: 5.943043702966046
rmse_log: 0.2682881054646016
a1: 0.8516638278961182
a2: 0.9178416728973389
a3: 0.9528733491897583
[2023-07-08 16:56:36,516][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Evaluation (val): [1/1] 100%|███████████████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-08 16:56:40,894][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:04.135
[2023-07-08 16:56:40,895][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:04.378
2023-07-08 16:56:40,991 kitti_raw INFO:
Epoch 2 - Evaluation time (seconds): 4.38 - Vis metrics:
abs_rel: 0.14595142936506972
sq_rel: 1.4979753871574057
rmse: 5.749837607629393
rmse_log: 0.26670149436806356
a1: 0.8354953527450562
a2: 0.9211630821228027
a3: 0.9507426023483276
conda activate base
[2023-07-08 17:05:13,192][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[2023-07-08 17:08:11,160][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:02:57.756
[2023-07-08 17:08:11,161][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:02:57.969
2023-07-08 17:08:11,282 kitti_raw INFO:
Epoch 2 - Evaluation time (seconds): 177.97 - Test metrics:
abs_rel: 0.11900233822874062
sq_rel: 0.8671512578064269
rmse: 4.559270895221629
rmse_log: 0.199316740456192
a1: 0.8598688319325447
a2: 0.9558506403118372
a3: 0.9797724287491292
[2023-07-08 17:08:11,282][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Evaluation (val): [1/1] 100%|███████████████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-08 17:08:15,634][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:04.166
[2023-07-08 17:08:15,635][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:04.353
2023-07-08 17:08:15,740 kitti_raw INFO:
Epoch 2 - Evaluation time (seconds): 4.35 - Vis metrics:
abs_rel: 0.13582281319264153
sq_rel: 1.4082434631998875
rmse: 6.013943428623624
rmse_log: 0.28424850291586035
a1: 0.8557999134063721
a2: 0.923481822013855
a3: 0.9515572786331177
[2023-07-08 17:16:50,828][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Evaluation (val): [1/1] 100%|███████████████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-08 17:16:55,211][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:04.195
[2023-07-08 17:16:55,212][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:04.384
2023-07-08 17:16:55,296 kitti_raw INFO:
Epoch 2 - Evaluation time (seconds): 4.38 - Vis metrics:
abs_rel: 0.17579734838730143
sq_rel: 2.5830991379591985
rmse: 6.899043454084597
rmse_log: 0.30531200850440265
a1: 0.8277871608734131
a2: 0.9161496162414551
a3: 0.941655695438385
2023-07-08 17:25:02,273 kitti_raw INFO: Epoch[2] Complete. Time taken: 00:45:59.807
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[2023-07-08 17:25:31,259][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Evaluation (val): [1/1] 100%|████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-08 17:25:35,633][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:04.175
[2023-07-08 17:25:35,634][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:04.374
2023-07-08 17:25:35,720 kitti_raw INFO:
Epoch 3 - Evaluation time (seconds): 4.37 - Vis metrics:
abs_rel: 0.15186456882564142
sq_rel: 2.0138081328306496
rmse: 6.890541053109251
rmse_log: 0.2912098378889751
a1: 0.846023678779602
a2: 0.9128282070159912
a3: 0.944538414478302
Hi,
I added the depth-supervised loss in the config file.
However, I found that the depth loss starts small and then oscillates upward.
The visualization of the depth map was also poor.
Hi, thanks for your work. I tried your model architecture on my custom data; here are some problems and my insights.
1. There are two key advantages of this model:
(1) It is self-supervised and needs only continuous video sequences with the relative [R|T] between adjacent frames. This reduces human labor enormously, and we can use a high-precision localization algorithm or device to obtain the camera poses automatically.
(2) It is not computationally intensive if we do not want to render depth but only want a 3D occupancy grid. E.g., with x range [-8, 8], y range [-0.4, 2.2], z range [1, 21], and a voxel resolution of 0.2, there are about 100,000 sample points, and one inference of the MLP with input [1, 100000, 103] is enough. The grid sample op can also be easily implemented as a CUDA kernel function, and there are no 3D conv ops.
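The point count for the grid quoted above can be checked with a few lines of arithmetic. This is only an illustrative sketch; the helper name `grid_point_count` is hypothetical:

```python
def grid_point_count(ranges, resolution):
    """Number of voxels in an axis-aligned grid with the given cell size."""
    count = 1
    for lo, hi in ranges:
        # round() guards against floating-point noise such as 2.6 / 0.2
        count *= round((hi - lo) / resolution)
    return count

n_points = grid_point_count([(-8, 8), (-0.4, 2.2), (1, 21)], 0.2)
print(n_points)  # 80 * 13 * 100 = 104000
```

This gives 104,000 query points, in line with the "about 100,000" figure.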
2. But to be honest, there are some inevitable problems associated with the model:
(1) The training signal depends both on the image quality itself and on the [R|T] precision:
-- First, if the image has reflections on the ground, such as in an underground parking lot, then no matter how precise the [R|T] matrix is, the training signal will be vague and weak in these regions, because there is no way to tune the predicted density along the camera ray so that it finds the best stereo-matching point on the epipolar line of the rendered image.
-- Second, if the [R|T] matrix is not precise, the training signal will also not be clear enough to give a notable result. For example, if the video sequence is monocular, there is always some road-surface vibration when driving, so the [R|T] matrix may be imprecise, which may lead to sub-optimal training results. With a stereo camera, however, the [R|T] between the left and right cameras is very precise and hardly changes while driving, so the training signal is much clearer and the result is much better than with mono data. So, to get very good results, it seems that the data-collection car needs to be equipped with a stereo camera with a larger baseline when applied to outdoor driving scenarios.
(2) The generalization ability is weak
Even when trained with a stereo camera or with very precise [R|T] between adjacent frames, and with very good image quality without reflections or artifacts, the generalization ability is not so impressive. For example, a model trained on KITTI-raw or KITTI-360 performs very badly on a custom dataset without fine-tuning (zero-shot). The depth map is far from precise, especially in the road-surface region. When fine-tuned on a custom mono dataset, the model performs better but is still far from precise, especially in the road-surface region, and texture-copy artifacts occur in the rendered depth map.
The problem cannot be solved even with a larger dataset, and I think it is an intrinsic limitation of the model. The key problem may lie in how the point feature in 3D space is constructed. In your paper, the 3D point feature consists of three components: 1. the image feature sampled at the projected pixel using interpolation; 2. the position embedding [sin(fu) sin(fv) sin(fz) cos(fu) cos(fv) cos(fz) sin(2fu) sin(2fv) sin(2fz) cos(2fu) cos(2fv) cos(2fz) ... ]; 3. the normalized position itself (u, v, z).
I think this may be the key problem that weakens generalization, because the image-feature part of the point feature vector may be the dominant factor in determining the decoded density at that point, and image features may differ greatly between dataset domains. So when applied zero-shot to a completely new dataset, the model performs so badly that it has to be fine-tuned to adapt to the new image feature space. If the custom dataset has only a mono camera, the training result will not be so good. This really hinders its practical use in autonomous driving.
I wonder whether I have misunderstood something about this model, and I would also like to know how to improve the model's generalization and get good results on my custom mono video sequences with moderately precise [R|T].
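For readers following the argument, the three-part point feature described in this issue can be sketched as below. This is an illustration under stated assumptions, not the repository's encoder: the function names are hypothetical, the frequency list is left to the caller because the issue elides it, and the 4-channel image feature is a toy value.

```python
import math

def positional_encoding(u, v, z, freqs):
    """Sin/cos embedding in the order listed above: per frequency, sines then cosines."""
    out = []
    for f in freqs:
        out += [math.sin(f * u), math.sin(f * v), math.sin(f * z)]
        out += [math.cos(f * u), math.cos(f * v), math.cos(f * z)]
    return out

def point_feature(image_feature, u, v, z, freqs):
    """Concatenate sampled image feature, positional embedding, and raw position."""
    return list(image_feature) + positional_encoding(u, v, z, freqs) + [u, v, z]

# Toy example: 4 image channels and the two frequencies spelled out in the issue.
feat = point_feature([0.0] * 4, 0.1, -0.2, 0.5, freqs=[1.0, 2.0])
print(len(feat))  # 4 + 2 * 6 + 3 = 19
```

Only the first block of the vector depends on image content, which is the part the issue suspects of dominating the decoded density.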
Hi,
I am confused that the pose matrix is inverted and then multiplied by itself. Isn't that the identity matrix?
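One possible reading, stated as an assumption because the line in question is not quoted here: if the code multiplies the inverse of the first frame's pose with the poses of all frames, only the first product is the identity, and the others become poses relative to the first frame. A small NumPy sketch with made-up poses (the helper `make_pose` is hypothetical):

```python
import numpy as np

def make_pose(tx):
    """4x4 pose with identity rotation and a translation of tx along x."""
    pose = np.eye(4)
    pose[0, 3] = tx
    return pose

poses = np.stack([make_pose(2.0), make_pose(3.5)])  # shape (2, 4, 4)

# Inverse of the first pose, broadcast against every pose in the stack.
relative = np.linalg.inv(poses[:1]) @ poses

first_is_identity = np.allclose(relative[0], np.eye(4))
offset_of_second = relative[1][0, 3]  # translation of frame 1 relative to frame 0
```

Under this reading, the first frame becomes the reference coordinate system and the second frame ends up about 1.5 units along x from it.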
Given the R and T of the current view, how do I generate the motion trajectory of a new view on the KITTI dataset to produce the demo? Can you provide the code?
How can I get the file "./scripts/videos/trajectories/simple_movement.npy"?
These files are not in the link you mentioned.
As shown, the 0928 directory only has two files.
The others are missing!
Originally posted by @liguopeng0923 in #20 (comment)
Dear authors,
Great work. Please add a license. Please also check and respect the licenses of the projects this repo was derived from (pixelNeRF, monodepth2).
Thanks
Does it support a webcam for live depth prediction?
Thanks for your great work! I have a question about using the ground truth occupancy. Could you please help me with it?
I think that if objects move during a shoot, there is a possibility that the whole object or parts of it will be carved out.
When you were creating the ground truth occupancy, how did you deal with moving objects?
Thank you!
Hi,
I am confused about the input data shape. Can you help me?
The input data shape is [4, 8, 3,192, 640] when the batch size is 16.
The input data shape is [2, 8, 3,192, 640] when the batch size is 8.
The input data shape is [1, 8, 3,192, 640] when the batch size is 2.
And what's the meaning of the number 8?
Do you plan to release the visualization code that represents density as occupancy, like the density field image shown in Figure 1?
Hi,
I have the KITTI odometry dataset, but I don't have the KITTI raw data.
Can I run the BehindTheScenes code on KITTI odometry data with the config for KITTI raw data?
I found that SceneRF is trained on the KITTI odometry dataset.
Hello. Thank you for sharing this amazing project!
May I know when the code will be released?
I am looking forward to running this project!
Hello, thank you for sharing this wonderful work.
When I use gen_img_custom.py with the pre-trained model you shared to visualize the depth map and profile map for the KITTI-Raw data, I get results like this:
They are different from the results in Figure 4 of your paper.
I am not sure what the problem is.
hey brummi ~
I have a question about FisheyeToPinhole.
I tried to visualize the fisheye image after this resampling function.
I used the KITTI-360 data_fisheye_calibration data and the calibration results you provide, and then applied the resample function. After resampling, the image is cropped heavily and a lot of information is missing. I was wondering whether I could focus on the fisheye camera information; maybe I don't need such resampling? Looking forward to your reply, I really appreciate it.
Hi,
I am confused about the setting of frame_sample_mode.
for cam in range(4):
    ids_loss += [cam * steps + i for i in range(start_from, steps, 2)]
    ids_render += [cam * steps + i for i in range(1 - start_from, steps, 2)]
    start_from = 1 - start_from
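To see what this loop produces, here is a self-contained sketch wrapped in a hypothetical function. It assumes that `steps` is the number of frames per camera (total frames divided by the number of cameras), that `start_from` is 0 or 1, and that the flip of `start_from` happens once per camera inside the loop. None of these are confirmed by the snippet itself.

```python
def split_frame_ids(n_frames, start_from=0, n_cams=4):
    """Alternate frames between loss and render sets, flipping the offset per camera."""
    steps = n_frames // n_cams  # assumed: frames per camera
    ids_loss, ids_render = [], []
    for cam in range(n_cams):
        ids_loss += [cam * steps + i for i in range(start_from, steps, 2)]
        ids_render += [cam * steps + i for i in range(1 - start_from, steps, 2)]
        start_from = 1 - start_from
    return ids_loss, ids_render

print(split_frame_ids(8))  # ([0, 3, 4, 7], [1, 2, 5, 6])
```

With 8 frames and 4 cameras, each camera contributes one frame to the loss set and one to the render set, and the flip makes neighbouring cameras use opposite time steps.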
The images are the reconstructed depth, the reconstructed image, and the invalid regions.
The training loss doesn't decrease when I use fisheye data to train on KITTI-360. I have checked that I use the same config as the original one, except that I didn't preprocess the dataset. It seems that the invalid regions are large.
Could you give me some advice for proper training?