behindthescenes's Introduction

Behind the Scenes: Density Fields for Single View Reconstruction

Paper | Video | Project Page

This is the official implementation for the CVPR 2023 paper:

Behind the Scenes: Density Fields for Single View Reconstruction

Felix Wimbauer¹, Nan Yang¹, Christian Rupprecht² and Daniel Cremers¹
¹Technical University of Munich, ²University of Oxford

CVPR 2023 (arXiv)

If you find our work useful, please consider citing our paper:

@article{wimbauer2023behind,
  title={Behind the Scenes: Density Fields for Single View Reconstruction},
  author={Wimbauer, Felix and Yang, Nan and Rupprecht, Christian and Cremers, Daniel},
  journal={arXiv preprint arXiv:2301.07668},
  year={2023}
}

(Demo video: voxel_video.mp4)

📋 Abstract

Inferring a meaningful geometric scene representation from a single image is a fundamental problem in computer vision. Approaches based on traditional depth map prediction can only reason about areas that are visible in the image. Currently, neural radiance fields (NeRFs) can capture true 3D including color but are too complex to be generated from a single image. As an alternative, we introduce a neural network that predicts an implicit density field from a single image. It maps every location in the frustum of the image to volumetric density. Our network can be trained through self-supervision from only video data. By not storing color in the implicit volume, but directly sampling color from the available views during training, our scene representation becomes significantly less complex compared to NeRFs, and we can train neural networks to predict it. Thus, we can apply volume rendering to perform both depth prediction and novel view synthesis. In our experiments, we show that our method is able to predict meaningful geometry for regions that are occluded in the input image. Additionally, we demonstrate the potential of our approach on three datasets for depth prediction and novel-view synthesis.

🪧 Overview

Overview Figure

a) Our method first predicts a pixel-aligned feature map F, which describes a density field, from the input image I_I. For every pixel u', the feature f_u' implicitly describes the density distribution along the ray from the camera origin through u'. Crucially, this distribution can model density even in occluded regions (e.g. the house).

b) To render novel views, we perform volume rendering. For any point x, we project x into F and sample f_u'. This feature is combined with positional encoding and fed into an MLP to obtain the density σ. We obtain the color c by projecting x into one of the views, in this case I_1, and directly sampling the image.
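
Below is a minimal sketch of the density query described in part b), assuming PyTorch; the names (query_density, density_mlp, positional_encoding) and tensor layouts are illustrative and are not the repository's actual API.

    import torch
    import torch.nn.functional as F

    def query_density(feature_map, K, x_cam, density_mlp, positional_encoding):
        # feature_map: [1, C, H, W] pixel-aligned features predicted from the input image
        # K: [3, 3] camera intrinsics; x_cam: [N, 3] points in the input camera's frame
        uvz = x_cam @ K.T                             # project points onto the image plane
        uv = uvz[:, :2] / uvz[:, 2:3]                 # pixel coordinates u'
        H, W = feature_map.shape[-2:]
        grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
        f = F.grid_sample(feature_map, grid.view(1, 1, -1, 2), align_corners=True)
        f = f.view(feature_map.shape[1], -1).T        # [N, C] sampled features f_u'
        sigma = density_mlp(torch.cat([f, positional_encoding(x_cam)], dim=-1))
        return sigma                                  # [N, 1] volumetric density

The color c is obtained analogously by projecting x into one of the other available views and bilinearly sampling the RGB image itself instead of a feature map.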

🏗️ Setup

🐍 Python Environment

We use Conda to manage our Python environment:

conda env create -f environment.yml

Then, activate the conda environment:

conda activate bts

💾 Datasets

All data should be placed under the data/ folder (or linked to there) in order to match our config files for the different datasets. The folder structure should look like:

data/KITTI-360
data/KITTI-Raw
data/RealEstate10K

All non-standard data (like precomputed poses and datasplits) comes with this repository and can be found in the datasets/ folder.

KITTI-360

To download KITTI-360, go to https://www.cvlibs.net/datasets/kitti-360/index.php and create an account. We require the perspective images, fisheye images, raw velodyne scans, calibrations, and vehicle poses.

KITTI (Raw)

To download KITTI, go to https://www.cvlibs.net/datasets/kitti/raw_data.php and create an account. We require all synched+rectified data, as well as the calibrations. The website also provides scripts for automatic downloading of the different sequences. As we have found the provided ground truth poses to be lacking in quality, we computed our own poses with ORB-SLAM3 and use them by default. They can be found under datasets/kitti_raw/orb-slam_poses.

RealEstate10K

You first have to download the camera trajectories and video information from https://google.github.io/realestate10k/download.html. Place these files under data/RealEstate10K/train and data/RealEstate10K/test, respectively. We then provide scripts to download and preprocess the videos. Note that these scripts may take several days to run. Further, the download script uses a temporary folder (default: /dev/shm/).

python datasets/realestate10k/download_realestate10k.py -d data/RealEstate10K -o data/RealEstate10K -m train
python datasets/realestate10k/download_realestate10k.py -d data/RealEstate10K -o data/RealEstate10K -m test
python datasets/realestate10k/process_realestate10k.py -d data/RealEstate10K -m train
python datasets/realestate10k/process_realestate10k.py -d data/RealEstate10K -m test

Other Dataset Implementations

This repository contains dataloader implementations for other datasets, too. These are not officially supported and are not guaranteed to work out of the box. However, they might be helpful when extending this codebase.

📸 Checkpoints

We provide download links for pretrained models for KITTI-360, KITTI, and RealEstate10K (soon). Models will be stored under out/<dataset>/pretrained/<checkpoint-name>.pth.

download_checkpoint.sh {kitti-360|kitti-raw|realestate10k}
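
If you want to inspect a downloaded checkpoint, a plain torch.load is sufficient (a minimal sketch; the exact layout of the checkpoint dictionary is an assumption here):

    import torch

    # Placeholder path as above; replace <dataset> and <checkpoint-name> accordingly.
    ckpt = torch.load("out/<dataset>/pretrained/<checkpoint-name>.pth", map_location="cpu")
    print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))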

🏃 Running the Example

We provide a script to run our pretrained models with custom data. The script can be found under scripts/images/gen_img_custom.py and takes the following flags:

  • --img <path> / -i <path>: Path to the input image. The image will be resized to match the model's default resolution.
  • --model <model> / -m <model>: Which pretrained model to use (KITTI-360 (default), KITTI-Raw, RealEstate10K).
  • --plot / -p: Plot outputs instead of saving them.

media/example/ contains two example images. Note that we use the default projection matrices of the respective datasets to compute the density profiles (bird's-eye views). Therefore, if your custom data comes from a camera with different intrinsics, the output profiles might be skewed.

# Plot outputs
python scripts/images/gen_img_custom.py --img media/example/0000.png --model KITTI-360 --plot

# Save outputs to disk
python scripts/images/gen_img_custom.py --img media/example/0000.png --model KITTI-360

🏋 Training

We provide training configurations for our different models. Generally, all training runs use a single Nvidia A40 GPU with 48 GB of memory.

KITTI-360

python train.py -cn exp_kitti_360

KITTI (Raw)

python train.py -cn exp_kitti_raw

RealEstate10K

python train.py -cn exp_re10k

📊 Evaluation

We further provide configurations to reproduce the evaluation results from the paper for occupancy and depth estimation.

# KITTI-360 Lidar Occupancy
python eval.py -cn eval_lidar_occ

# KITTI Raw Depth
python eval.py -cn eval_depth
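
For reference, the depth numbers reported by this evaluation follow the standard monocular depth metrics (abs_rel, sq_rel, rmse, rmse_log, a1-a3). The sketch below shows their usual definitions; the repository's own evaluation may differ in masking and depth-capping details.

    import numpy as np

    def depth_metrics(pred, gt):
        # pred, gt: 1D arrays of predicted and ground-truth depths at valid pixels
        thresh = np.maximum(gt / pred, pred / gt)
        a1 = (thresh < 1.25).mean()
        a2 = (thresh < 1.25 ** 2).mean()
        a3 = (thresh < 1.25 ** 3).mean()
        abs_rel = np.mean(np.abs(pred - gt) / gt)
        sq_rel = np.mean((pred - gt) ** 2 / gt)
        rmse = np.sqrt(np.mean((pred - gt) ** 2))
        rmse_log = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))
        return dict(abs_rel=abs_rel, sq_rel=sq_rel, rmse=rmse,
                    rmse_log=rmse_log, a1=a1, a2=a2, a3=a3)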

📽 Rendering Images & Videos

We provide scripts to generate images and videos from the outputs of our models. Generally, you can adapt the model and the output configuration by changing some constants in the scripts. Generated files are stored under media/.

Inference on custom images

Please refer to the example section.

Generate images for samples from the datasets

python scripts/images/gen_imgs.py

Generate depth / profile videos

python scripts/videos/gen_vid_seq.py

Generate novel view animations

python scripts/videos/gen_vid_nvs.py

We provide different camera trajectories under scripts/videos/trajectories.
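
If you want to try your own camera path, the sketch below shows one hypothetical way to generate and save a simple forward-motion trajectory with NumPy. It assumes the trajectory files are arrays of 4x4 camera-to-world poses; this may not match the actual format of the provided .npy files, so check those first.

    import numpy as np

    # Hypothetical example: 120 poses moving 0.1 m forward along the z axis per frame.
    poses = []
    for i in range(120):
        pose = np.eye(4, dtype=np.float32)   # camera-to-world, identity rotation
        pose[2, 3] = 0.1 * i                 # translate along the (assumed) forward axis
        poses.append(pose)
    np.save("scripts/videos/trajectories/custom_movement.npy", np.stack(poses))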

Generate animation from depth map to top-down profile

python scripts/videos/gen_vid_transition.py

🗣️ Acknowledgements

This work was supported by the ERC Advanced Grant SIMULACRON, the GNI project AI4Twinning and the Munich Center for Machine Learning. C. R. is supported by VisualAI EP/T028572/1 and ERC-UNION-CoG-101001212.

This repository is based on the PixelNeRF code base and takes a lot of inspiration from Monodepth2.

behindthescenes's Issues

Does a larger MLP affect the final results?

Hi Brummi,

Thanks a lot for the awesome code! I noticed you use a quite small MLP to render the density field, e.g., a ResnetFC with a small hidden dimension (64) and without any ResNet blocks (0).

I wonder whether a larger MLP with 1) larger dimensions or 2) more layers would lead to suboptimal results, according to your previous experiments.

Thanks a lot for the information :)

Occupancy visualization code

Do you plan to release the visualization code that represents density as occupancy, like the density field image shown in Figure 1?

Inferior scores of both provided models and trained models

Hi Brummi,

I tried to evaluate your provided models on the KITTI-Raw and KITTI-360 datasets, and both yielded suboptimal results.

  1. KITTI-360
  • testing images: the unzipped PNG images (w/o preprocessing)
  • my evaluated results: o_acc: 0.944 | ie_acc: 0.771 | ie_rec: 0.439
  • results in the paper: o_acc: 0.95 | ie_acc: 0.82 | ie_rec: 0.47
  2. KITTI-raw
  • testing images: KITTI-raw images (converted to .jpg as in Monodepth2)
  • my evaluated results: abs_rel: 0.102 | rmse: 4.409 | a1: 0.881
  • results in the paper: abs_rel: 0.102 | rmse: 4.407 | a1: 0.882

Even using your provided model, there is a large evaluation gap on KITTI-360, where for ie_acc the gap is 0.771 vs. 0.82. Although the KITTI-raw scores differ only slightly from yours, the numbers are not exactly the same. I would like to confirm:

  • whether I should use the preprocessed images for KITTI-360 for evaluation
  • whether some Python environment settings influence the scores. Currently, I use PyTorch 2.0.

I also observed a further performance decline with my own trained models, i.e., for KITTI-raw: abs_rel: 0.104 | rmse: 4.554 | a1: 0.874, and for KITTI-360: o_acc: 0.948 | ie_acc: 0.784 | ie_rec: 0.369. Can you provide some suggestions to faithfully reproduce your results?

Thank you for your information!

Confusion about frame_sample_mode

Hi,
I am confused about the setting of frame_sample_mode.

  1. I don't know the meaning of the magic numbers 4 and 2 in the code block below.
  2. I don't know the meaning of this for loop:
                for cam in range(4):
                    ids_loss += [cam * steps + i for i in range(start_from, steps, 2)]
                    ids_render += [cam * steps + i for i in range(1 - start_from, steps, 2)]
                    start_from = 1 - start_from
  3. When I train on my own dataset, which has only a front camera (no stereo or fisheye cameras), how can I modify these settings to fit my dataset?

Question about generating novel view animations

Given the R and T of the current view, how do I generate the motion trajectory for novel views on the KITTI dataset to produce the demo? Can you provide the code?
How can I obtain the file ./scripts/videos/trajectories/simple_movement.npy?

Please add license

Dear authors,

Great work. Please add a license. Please also check and respect the licenses of the projects this repo was derived from (PixelNeRF, Monodepth2).

Thanks

Train on KITTI-360 with fisheye

(Images attached: reconstructed depth, reconstructed image, and invalid masks.)
The training loss did not decrease when I used the fisheye data to train on KITTI-360. I have checked that I use the same config as the original one, except that I did not preprocess the dataset. It seems that the invalid regions are large.
Could you give me some advice for proper training?

Why are the profile results different from yours?

Hello, thank you for sharing this wonderful work.

When I use gen_img_custom.py with the pre-trained model you shared to visualize the depth map and profile map for the KITTI-Raw data, I get results like this:
(Images attached: 0000000003 input, depth, and profile.)
They differ from the results in Figure 4 of your paper.

I am not sure what the problem is.

Wrong depth projection for the KITTI-360 dataset

Hi, I noticed that the load_depth function in kitti_360_dataset.py seems wrong, since code like points = points[points[:, 0] >= 0, :] was omitted. Hence, points behind the camera can mask out the points in front of it in the subsequent processing.

Some problems with this architecture

Hi, thanks for your work. I tried your model architecture on my custom data, and here are some problems and my insights.

1. There are two key advantages to this model:
(1) It is self-supervised and needs only continuous video sequences with relative [R|T] between adjacent frames. This reduces human labor enormously, and we can use a high-precision localization algorithm or device to obtain the camera poses automatically.
(2) It is not computationally intensive if we do not want to render depth but only want a 3D occupancy grid. For example, with an x range of [-8, 8], a y range of [-0.4, 2.2], a z range of [1, 21], and a voxel resolution of 0.2, there are about 100,000 sample points, and a single MLP inference with an input of shape [1, 100000, 103] is enough (see the sketch below). The grid-sample op can also be implemented easily with a CUDA kernel, and there are no 3D conv ops.
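
A small sketch of the point count implied by these numbers (the axis conventions are assumptions; only the number of grid points matters here):

    import numpy as np

    res = 0.2                                 # voxel resolution in metres
    xs = np.arange(-8.0, 8.0, res)            # lateral range
    ys = np.arange(-0.4, 2.2, res)            # height range
    zs = np.arange(1.0, 21.0, res)            # depth range
    grid = np.stack(np.meshgrid(xs, ys, zs, indexing="ij"), -1).reshape(1, -1, 3)
    print(grid.shape)                         # (1, ~104000, 3): roughly 100k query points,
                                              # i.e. one MLP forward pass over [1, N, feat_dim]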

2. But, to be honest, there are some inevitable problems associated with the model.
(1) The training signal depends on both the image quality itself and the [R|T] precision:
-- First, if the image has reflections on the ground, such as in an underground parking lot, then no matter how precise the [R|T] matrix is, the training signal will be vague and weak in these regions, because there is no way to tune the predicted density along the camera ray so that it finds the best stereo-matching point on the epipolar line of the rendered image.

-- Second, if the [R|T] matrix is not precise, then the training signal will also not be clear enough to get a notable result. For example, if the video sequence is monocular, then because there is always some road-surface vibration when driving, the [R|T] matrix may not be precise, which may lead to a sub-optimal training result. However, when trained on a stereo camera, the [R|T] between the left and right cameras is very precise and hardly changes while driving, so the training signal is much clearer and the result is much better compared to the mono case. So, in order to get a very good result, it seems that the data-gathering car needs to be equipped with a stereo camera with a larger baseline for outdoor driving scenarios.

(2) The generalization ability is weak.
Even when trained with a stereo camera or with very precise [R|T] between adjacent frames, and when the image quality is very good with no reflections or artifacts, the generalization ability is not so impressive. For example, a model trained on KITTI-raw or KITTI-360 performs very badly on a custom dataset without fine-tuning (zero-shot). The depth map is far from precise, especially in the road-surface region. When fine-tuned on a custom mono dataset, the model performs better but is still far from precise, especially in the road-surface region, and texture-copy artifacts occur in the rendered depth map.

The problem cannot be solved even with a larger dataset, and I think it is really an intrinsic limitation of the model. The key problem may lie in how the point feature in 3D space is constructed. In your paper, the 3D point feature consists of three components: 1. the image feature sampled from the projected pixel using interpolation; 2. a positional embedding of the form [sin(fu), sin(fv), sin(fz), cos(fu), cos(fv), cos(fz), sin(2fu), sin(2fv), sin(2fz), cos(2fu), cos(2fv), cos(2fz), ...] (see the sketch below); 3. the normalized position itself (u, v, z).
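
For reference, this is the generic NeRF-style positional encoding being described (a sketch of the standard formulation, not necessarily the exact variant used in this repository):

    import torch

    def positional_encoding(x, num_freqs=6):
        # x: [N, 3] normalized point coordinates (u, v, z); output: [N, 3 + 6 * num_freqs]
        feats = [x]
        for k in range(num_freqs):
            feats.append(torch.sin((2.0 ** k) * x))
            feats.append(torch.cos((2.0 ** k) * x))
        return torch.cat(feats, dim=-1)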

So I think this may be the key factor that weakens its generalization: the image-feature part of the point feature vector may be the dominant factor in determining the decoded density at that point, and image features can be very different across dataset domains. So when zero-shotting to a completely different dataset, the model performs so badly that it has to be fine-tuned to adapt to the new image feature space. If this custom dataset has only a mono camera, then the training result will not be very good. This really hinders its practical usage in autonomous driving.

I wonder if I have some misunderstanding about this model, and I also want to know how to enhance the model's generalization and get good results on my custom mono video sequences with moderately precise [R|T].

Question about the ground truth occupancy.

Thanks for your great work! I have a question about using the ground truth occupancy. Could you please help me with it?
I think that if objects move during capture, there is a possibility that the whole object, or parts of it, will be carved out.
When you were creating the ground truth occupancy, how did you deal with moving objects?
Thank you!

How to visualize Fig.4 in your paper?

Thanks for your great work! I am interested in visualizing occupancy behind the scenes as depicted in Fig. 4 of your paper. I succeeded in training on KITTI-360 and generating depth sequences, but the transition to BEV is not very successful. Do you use the script scripts/videos/gen_vid_transition.py to get results similar to Fig. 4?

When will the code be released?

Hello. Thank you for sharing this amazing project!

May I know when the code will be released?

I am looking forward to running this project!

FisheyeToPinholeSampler

Hey Brummi,
I have some questions about FisheyeToPinholeSampler.
I tried to visualize the fisheye image after this sampling function.
I used the KITTI-360 fisheye calibration data and the calibration results you provide, and then I used the resample function. After resampling, a lot is cut off and a lot of information is missing. I was wondering if I could keep the full fisheye camera information; maybe I don't need such resampling? Looking forward to your reply, really appreciated.
(Image attached: resampled fisheye example 0000001216.)

Question about the log file

Thanks for your great work! The log file confuses me a little. Take epoch 2 as an example: it appears 5 times. Could you please give some explanation?
Epoch 1 - Evaluation time (seconds): 4.40 - Vis metrics:
abs_rel: 0.16088220118284555
sq_rel: 1.6475946958800654
rmse: 6.0004214998035215
rmse_log: 0.2652074425737867
a1: 0.8132480978965759
a2: 0.9111361503601074
a3: 0.9498652219772339
2023-07-08 16:39:02,464 kitti_raw INFO: Epoch[1] Complete. Time taken: 00:46:03.668
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[2023-07-08 16:39:19,277][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Evaluation (val): [1/1] 100%|████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-08 16:39:23,725][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:04.238
[2023-07-08 16:39:23,725][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:04.447
2023-07-08 16:39:23,809 kitti_raw INFO:
Epoch 2 - Evaluation time (seconds): 4.45 - Vis metrics:
abs_rel: 0.23017341934595395
sq_rel: 3.088896086703511
rmse: 7.271316281467014
rmse_log: 0.3078563333555099
a1: 0.7813498377799988
a2: 0.8981010913848877
a3: 0.9393996000289917
[2023-07-08 16:47:56,868][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Evaluation (val): [1/1] 100%|████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-08 16:48:01,295][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:04.211
[2023-07-08 16:48:01,296][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:04.427
2023-07-08 16:48:01,401 kitti_raw INFO:
Epoch 2 - Evaluation time (seconds): 4.43 - Vis metrics:
abs_rel: 0.1650394400605743
sq_rel: 1.998698890271272
rmse: 5.943043702966046
rmse_log: 0.2682881054646016
a1: 0.8516638278961182
a2: 0.9178416728973389
a3: 0.9528733491897583
[2023-07-08 16:56:36,516][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Evaluation (val): [1/1] 100%|████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-08 16:56:40,894][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:04.135
[2023-07-08 16:56:40,895][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:04.378
2023-07-08 16:56:40,991 kitti_raw INFO:
Epoch 2 - Evaluation time (seconds): 4.38 - Vis metrics:
abs_rel: 0.14595142936506972
sq_rel: 1.4979753871574057
rmse: 5.749837607629393
rmse_log: 0.26670149436806356
a1: 0.8354953527450562
a2: 0.9211630821228027
a3: 0.9507426023483276
conda activate base
[2023-07-08 17:05:13,192][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[2023-07-08 17:08:11,160][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:02:57.756
[2023-07-08 17:08:11,161][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:02:57.969
2023-07-08 17:08:11,282 kitti_raw INFO:
Epoch 2 - Evaluation time (seconds): 177.97 - Test metrics:
abs_rel: 0.11900233822874062
sq_rel: 0.8671512578064269
rmse: 4.559270895221629
rmse_log: 0.199316740456192
a1: 0.8598688319325447
a2: 0.9558506403118372
a3: 0.9797724287491292
[2023-07-08 17:08:11,282][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Evaluation (val): [1/1] 100%|████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-08 17:08:15,634][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:04.166
[2023-07-08 17:08:15,635][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:04.353
2023-07-08 17:08:15,740 kitti_raw INFO:
Epoch 2 - Evaluation time (seconds): 4.35 - Vis metrics:
abs_rel: 0.13582281319264153
sq_rel: 1.4082434631998875
rmse: 6.013943428623624
rmse_log: 0.28424850291586035
a1: 0.8557999134063721
a2: 0.923481822013855
a3: 0.9515572786331177
[2023-07-08 17:16:50,828][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Evaluation (val): [1/1] 100%|████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-08 17:16:55,211][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:04.195
[2023-07-08 17:16:55,212][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:04.384
2023-07-08 17:16:55,296 kitti_raw INFO:
Epoch 2 - Evaluation time (seconds): 4.38 - Vis metrics:
abs_rel: 0.17579734838730143
sq_rel: 2.5830991379591985
rmse: 6.899043454084597
rmse_log: 0.30531200850440265
a1: 0.8277871608734131
a2: 0.9161496162414551
a3: 0.941655695438385
2023-07-08 17:25:02,273 kitti_raw INFO: Epoch[2] Complete. Time taken: 00:45:59.807
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[2023-07-08 17:25:31,259][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Evaluation (val): [1/1] 100%|████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-08 17:25:35,633][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:04.175
[2023-07-08 17:25:35,634][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:04.374
2023-07-08 17:25:35,720 kitti_raw INFO:
Epoch 3 - Evaluation time (seconds): 4.37 - Vis metrics:
abs_rel: 0.15186456882564142
sq_rel: 2.0138081328306496
rmse: 6.890541053109251
rmse_log: 0.2912098378889751
a1: 0.846023678779602
a2: 0.9128282070159912
a3: 0.944538414478302

question about visualizing ground truth depth

Hi, I'm currently testing on the KITTI-360 dataset and trying to visualize the ground-truth depth map of a data sample (the output returned from the load_depth function of Kitti360Dataset). However, it doesn't show the proper depth profile of the corresponding scene image, as shown below:
(Images attached: the real image and the visualized gt_depth map.)

Is there any additional processing I need to do with the depth maps in order to visualize them properly? Thanks.

Some details I feel confused about

Nice work! There are some details I feel confused about, like the code below. Why do you define two evaluators? They are exactly the same except for when they are called. There might be some engineering strategies at play, and I would appreciate the opportunity to learn from them.

# We define two evaluators as they wont have exactly similar roles:
# - `evaluator` will save the best model based on validation score
evaluator = create_evaluator(model, metrics=eval_metrics, criterion=criterion if loss_during_validation else None, config=config)

if vis_loader is not None:
    visualizer = create_evaluator(model, metrics=eval_metrics, criterion=criterion if loss_during_validation else None, config=config)
else:
    visualizer = None

Details about the key parameter ray_batch_size: 4096

Hey Brummi, another amazing job after MonoRec!
I have some questions about ray_batch_size. Is it an empirically chosen parameter?
If I train BTS on a custom dataset, does this parameter have a big impact?

How to generate the training data?

Hi,
When I launch the training process, I get the error below.
I think the training needs preprocessed data, not the raw data.

  File "/rockywin.wang/NeRF/BehindTheScenes/datasets/kitti_360/kitti_360_dataset.py", line 571, in __getitem__
    imgs_p_left, imgs_f_left, imgs_p_right, imgs_f_right = self.load_images(sequence, img_ids, load_left, load_right, img_ids_fish=img_ids_fish)
  File "/rockywin.wang/NeRF/BehindTheScenes/datasets/kitti_360/kitti_360_dataset.py", line 442, in load_images
    img_perspective = cv2.cvtColor(cv2.imread(os.path.join(self.data_path, "data_2d_raw", seq, "image_00", self._perspective_folder, f"{id:010d}.png")), cv2.COLOR_BGR2RGB).astype(np.float32) / 255
cv2.error: OpenCV(4.5.3) /tmp/pip-req-build-afu9cjzs/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'

Why is the input data shape [4, 8, 3, 192, 640] when the batch size is 16?

Hi,
I am confused about the input data shape. Can you help me?
The input data shape is [4, 8, 3, 192, 640] when the batch size is 16.
The input data shape is [2, 8, 3, 192, 640] when the batch size is 8.
The input data shape is [1, 8, 3, 192, 640] when the batch size is 2.
And what's the meaning of the number 8?
