alexklwong / calibrated-backprojection-network
PyTorch Implementation of Unsupervised Depth Completion with Calibrated Backprojection Layers (Oral, ICCV 2021)
License: Other
Hi, Alex,
Thank you for your excellent work.
I ran into a problem when running the pretrained model and training the model.
I haven't changed the code, but the following error was reported:
RuntimeError: inverse_cuda: For batch 0: U( , ) is zero, singular U.
(The values in parentheses are different each time I run it.)
Have you encountered this error before, and how can I solve it?
Thanks in advance.
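For reference, this error is raised by torch.inverse when the input matrix is singular. A minimal sketch of a common workaround (not from this repo; safe_inverse and eps are illustrative names) is to regularize the matrix before inverting:

import torch

# Sketch: add a small value to the diagonal so the matrix is well-conditioned
# before calling torch.inverse, avoiding the "singular U" failure
def safe_inverse(matrix, eps=1e-6):
    identity = torch.eye(matrix.shape[-1], device=matrix.device, dtype=matrix.dtype)
    return torch.inverse(matrix + eps * identity)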
Thank you for releasing your nice code.
I'm trying to test your pre-trained model on the VOID-1500 dataset, so I followed the instructions step by step to configure the dataset.
However, when 'bash bash/void/run_kbnet_void1500.sh' is executed, the following error occurs:
< major errors >
Traceback (most recent call last):
File "src/run_kbnet.py", line 149, in <module>
device=args.device)
File "/home/zinuok/mono_depth/calibrated-backprojection-network/src/kbnet.py", line 899, in run
for idx, (inputs, ground_truth) in enumerate(zip(dataloader, ground_truths)):
File "/home/zinuok/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/home/zinuok/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
return self._process_data(data)
File "/home/zinuok/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/home/zinuok/.local/lib/python3.6/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/zinuok/.local/lib/python3.6/site-packages/numpy/lib/shape_base.py", line 867, in split
len(indices_or_sections)
TypeError: object of type 'int' has no len()
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/zinuok/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/home/zinuok/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/zinuok/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/zinuok/mono_depth/calibrated-backprojection-network/src/datasets.py", line 264, in __getitem__
normalize=False)
File "/home/zinuok/mono_depth/calibrated-backprojection-network/src/datasets.py", line 44, in load_image_triplet
image1, image0, image2 = np.split(images, indices_or_sections=3, axis=-1)
File "<__array_function__ internals>", line 6, in split
File "/home/zinuok/.local/lib/python3.6/site-packages/numpy/lib/shape_base.py", line 873, in split
'array split does not result in an equal division')
ValueError: array split does not result in an equal division
I think the main cause of my problem is the final error:
ValueError: array split does not result in an equal division
raised from the following code:
# Split along width
image1, image0, image2 = np.split(images, indices_or_sections=3, axis=-1)
I've printed the shape of the 'images' variable; its size is (3, 480, 640).
How can I solve this error?
I've tried several changes, like changing N_THREADS from 8 to 1, but I couldn't fix it.
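For context, np.split with indices_or_sections=3 requires the last axis to be evenly divisible by three. Since load_image_triplet splits along width, the triplets are evidently stored as three frames concatenated side by side; a minimal sketch, with shapes assumed from the traceback:

import numpy as np

# Three 480x640 frames concatenated along width give a (3, 480, 1920) array
# that splits evenly into three parts
triplet = np.zeros((3, 480, 1920))
image1, image0, image2 = np.split(triplet, indices_or_sections=3, axis=-1)

# A single frame of shape (3, 480, 640) cannot be split into three equal
# parts along width (640 is not divisible by 3), which raises "array split
# does not result in an equal division" -- so the file loaded here likely
# contains one frame rather than a triplet
single = np.zeros((3, 480, 640))
# np.split(single, indices_or_sections=3, axis=-1)  # raises ValueError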
Hello,
In run_kbnet.py, you have an argument 'input_channels_depth'
whose default is 2; I'm not sure what this means.
In networks.py, it is described as the "number of input channels for the depth branch", but why would depth have 2 channels?
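A likely explanation (consistent with the --input_type sparse_depth validity_map flag that appears elsewhere in these issues, though not confirmed by the authors here) is that the depth branch input stacks the sparse depth map with its binary validity map, giving 2 channels. A sketch:

import torch

sparse_depth = torch.zeros(1, 1, 480, 640)   # metric depth, 0 where no measurement
validity_map = (sparse_depth > 0).float()    # 1 where a measurement exists
depth_input = torch.cat([sparse_depth, validity_map], dim=1)
print(depth_input.shape)  # torch.Size([1, 2, 480, 640]) -> input_channels_depth = 2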
Hello,
I captured some data, set it up in the VOID format, and trained the model on it. While training, I get the best results as:
Step MAE RMSE iMAE iRMSE
21000 119.343 200.820 56.941 120.669
But if I run the evaluation script with the weights from this best step (21000), I get very different results with much higher errors:
Evaluation results:
MAE RMSE iMAE iRMSE
3989.622 3996.284 517.797 540.038
+/- +/- +/- +/-
55.942 58.938 51.691 54.052
This doesn't make sense to me, since I'm using the same set of data for validation and testing, similar to what is provided in the scripts. Do you have any ideas on why this could be happening?
Hello, I have a question about the values in "K.txt"
In the original VOID dataset, the intrinsic parameters provided here are:
"f_x": 514.638,
"f_y": 518.858,
"c_x": 315.267,
"c_y": 247.358,
However, in the "K.txt":
5.471833965147203571e+02 0.000000000000000000e+00 3.176305425559989430e+02
0.000000000000000000e+00 5.565094509450176474e+02 2.524727249693490592e+02
0.000000000000000000e+00 0.000000000000000000e+00 1.000000000000000000e+00
where, as I understand it,
K = f_x  0   c_x
    0    f_y c_y
    0    0   1
They are somewhat different.
Q1. Is the camera's distortion model (radtan) already applied in "K.txt" ?
Q2. My second question: why are the intrinsic parameters different across sequences? Did you use a different sensor setup for each sequence? (In your paper, it is written that a D435i was used for data acquisition.) If so, which intrinsics should be used in practice, e.g. for VIO?
Many thanks in advance.
Thank you very much for providing this code!
But I still have a small question:
The path you use in setup_dataset_kitti.py doesn't seem to be a directory path, so there were a lot of empty txt files after I ran the code.
In line 258:
sequence = sparse_depth_paths[0].split(os.sep)[5]
sequence_date = sequence[0:10]
When I debugged the code, sequence was 'data', so sequence_date was also 'data', which I think is wrong (see the sketch below).
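An illustration of why the hard-coded split index is fragile (the example path below is hypothetical): the position of the sequence name in the split depends on how many directories precede the dataset root on a given machine.

import os

# Hypothetical KITTI raw path; on this layout the sequence name sits at
# index 3, not 5, and index 5 lands on the literal directory 'data'
path = os.path.join(
    'data', 'kitti_raw_data', '2011_09_26',
    '2011_09_26_drive_0001_sync', 'image_02', 'data', '0000000000.png')
parts = path.split(os.sep)
print(parts[5])        # 'data' -- the symptom described above
print(parts[3])        # '2011_09_26_drive_0001_sync'
print(parts[3][0:10])  # '2011_09_26', the sequence date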
I'm looking forward to your answer.
Hi, Alex,
Thanks for your nice work. I'm facing a problem with uneven GPU memory usage when training the model with multiple GPUs: GPU#0 uses much more memory than the others. I think the main reason is that DataParallel can only compute losses on GPU#0. Could you give some advice on balancing the GPU memory? Thanks in advance.
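A common mitigation (a sketch from me, not the repo's code): compute the loss inside the module that DataParallel wraps, so the loss is evaluated on each replica and only small tensors are gathered on GPU 0; torch.nn.parallel.DistributedDataParallel avoids the imbalance altogether.

import torch

class ModelWithLoss(torch.nn.Module):
    '''Wraps a model and its loss so DataParallel computes both per replica.'''

    def __init__(self, model, loss_fn):
        super().__init__()
        self.model = model
        self.loss_fn = loss_fn

    def forward(self, inputs, targets):
        outputs = self.model(inputs)
        # Returning the per-replica loss keeps the large activations on each
        # replica instead of gathering them all on GPU 0
        return self.loss_fn(outputs, targets)

# wrapped = torch.nn.DataParallel(ModelWithLoss(model, loss_fn))
# loss = wrapped(inputs, targets).mean()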
Hi~ Alex,
Thank you for your nice work. I succeeded in reproducing your results. However, I have some questions about whether the image or depth is normalized. They don't seem to be normalized in your code; what is the reason for that? Does normalization yield better results? Also, if the image resolution is too large for the available CUDA memory, could I resize the image and depth instead of random cropping, and then upsample the predicted low-resolution depth map back to the original resolution at test time?
Looking forward to your reply!
Best regards,
Viot
Hi~, Alex,
Could you tell me the number of training samples you used? Following your data and processing method I get 72,400; is this quantity correct?
Looking forward to your reply!
Thank you for your excellent work.
I would like to ask whether you used CUDA 11.0 when testing on Ubuntu 20.04.
When I train the network with CUDA 11.0 + PyTorch 1.7 on an RTX 3090, the loss does not decrease normally, and I cannot find the reason.
Could you help me?
The last comma needs to be removed.
Hi Alex,
Thanks for your great work. I'm reproducing the NYU v2 (generalization) experiments. I followed your instructions to prepare the NYU data. When I used your pre-trained model kbnet-void1500.pth to evaluate on NYU v2, I got these errors:
(kbnet) zlq@ivg-SYS-7048GR-TR:/home/disk2/code/calibrated-backprojection-network$ bash bash/void/run_knet_nyu_v2_test.sh
usage: run_kbnet.py [-h] --image_path IMAGE_PATH --sparse_depth_path
SPARSE_DEPTH_PATH --intrinsics_path INTRINSICS_PATH
[--ground_truth_path GROUND_TRUTH_PATH]
[--input_channels_image INPUT_CHANNELS_IMAGE]
[--input_channels_depth INPUT_CHANNELS_DEPTH]
[--normalized_image_range NORMALIZED_IMAGE_RANGE [NORMALIZED_IMAGE_RANGE ...]]
[--outlier_removal_kernel_size OUTLIER_REMOVAL_KERNEL_SIZE]
[--outlier_removal_threshold OUTLIER_REMOVAL_THRESHOLD]
[--min_pool_sizes_sparse_to_dense_pool MIN_POOL_SIZES_SPARSE_TO_DENSE_POOL [MIN_POOL_SIZES_SPARSE_TO_DENSE_POOL ...]]
[--max_pool_sizes_sparse_to_dense_pool MAX_POOL_SIZES_SPARSE_TO_DENSE_POOL [MAX_POOL_SIZES_SPARSE_TO_DENSE_POOL ...]]
[--n_convolution_sparse_to_dense_pool N_CONVOLUTION_SPARSE_TO_DENSE_POOL]
[--n_filter_sparse_to_dense_pool N_FILTER_SPARSE_TO_DENSE_POOL]
[--n_filters_encoder_image N_FILTERS_ENCODER_IMAGE [N_FILTERS_ENCODER_IMAGE ...]]
[--n_filters_encoder_depth N_FILTERS_ENCODER_DEPTH [N_FILTERS_ENCODER_DEPTH ...]]
[--resolutions_backprojection RESOLUTIONS_BACKPROJECTION [RESOLUTIONS_BACKPROJECTION ...]]
[--n_filters_decoder N_FILTERS_DECODER [N_FILTERS_DECODER ...]]
[--deconv_type DECONV_TYPE]
[--min_predict_depth MIN_PREDICT_DEPTH]
[--max_predict_depth MAX_PREDICT_DEPTH]
[--weight_initializer WEIGHT_INITIALIZER]
[--activation_func ACTIVATION_FUNC]
[--min_evaluate_depth MIN_EVALUATE_DEPTH]
[--max_evaluate_depth MAX_EVALUATE_DEPTH]
[--output_path OUTPUT_PATH] [--save_outputs]
[--keep_input_filenames]
[--depth_model_restore_path DEPTH_MODEL_RESTORE_PATH]
[--device DEVICE]
run_kbnet.py: error: unrecognized arguments: --avg_pool_sizes_sparse_to_dense_pool 0 --encoder_type knet_v1 fusion_conv_previous sparse_to_dense_pool_v1 --input_type sparse_depth validity_map 3 3 3 0 --n_resolutions_encoder_intrinsics 0 1 2 3 --skip_types image depth --decoder_type multi-scale --output_kernel_size 3 --outlier_removal_method remove
So I deleted the unrecognized arguments and ran it again; this time I got these numbers:
Evaluation results:
MAE RMSE iMAE iRMSE
122.836 228.426 24.147 50.003
+/- +/- +/- +/-
71.550 130.133 16.920 36.531
I know the numbers are close to the results reported in this repo, but I suspect I could reproduce them exactly if the unrecognized arguments were taken into account.
My question is: how can I reproduce results closer to the ones you reported? Is there something wrong with my setup?
My environment is as follows:
torch 1.3.0
torchvision 0.4.1
Python 3.7.12
CUDA 10.2
Thank you in advance.
In the given demo, you show sparse points and colored predicted point clouds; I would like to know how to produce a similar demo.
Hello,
When validation is run on images, the dataloader for inference in line 764 of kbnet.py is initialized as:
dataloader = torch.utils.data.DataLoader(
datasets.KBNetInferenceDataset(
image_paths=image_paths,
sparse_depth_paths=sparse_depth_paths,
intrinsics_paths=intrinsics_paths),
batch_size=1,
shuffle=False,
num_workers=1,
drop_last=False)
in which case the KBNetInferenceDataset class is initialized with the default use_image_triplet=True and tries to fetch and split a triplet of images. I understand its function during training, but why is it so during validation?
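If validation really only needs single frames, a plausible adjustment (a sketch; whether this flag is the intended fix is an assumption on my part) would be to pass the flag explicitly:

dataloader = torch.utils.data.DataLoader(
    datasets.KBNetInferenceDataset(
        image_paths=image_paths,
        sparse_depth_paths=sparse_depth_paths,
        intrinsics_paths=intrinsics_paths,
        use_image_triplet=False),  # fetch single frames instead of triplets
    batch_size=1,
    shuffle=False,
    num_workers=1,
    drop_last=False)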
Hey there!
Thank you for the work!
I tried it out on my own sparse depth map + RGB image, and it didn't perform well at all. I also visualized some ground truth data from the VOID dataset and found that some point clouds look similarly bad. The copyroom folder looks fine, but the first image from birthplace_of_internet already looks bad. I can understand that my own dataset could be problematic regarding sparse depth map resolution, but after looking at some ground truth data, I'm wondering if the problem lies elsewhere.
Any idea why my custom dataset would look like this? The "bad" ground truth from VOID still looks better than my results.
Here is a link to the visualization of a VOID and a custom point cloud.
I ran kbnet on both a Python 3.7 venv with the given dependency versions and a 3.9 venv with newer library versions.
I visualized everything using Open3D:
import cv2
import numpy as np
import open3d as o3d

image_opencv = cv2.imread(image_file)
image = o3d.io.read_image(image_file)
depth_image = o3d.io.read_image(depth_file)
K = np.loadtxt(intrinsics_file)
# PinholeCameraIntrinsic takes (width, height, fx, fy, cx, cy); cv2 shape is (H, W, C)
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    image_opencv.shape[1], image_opencv.shape[0],
    K[0][0], K[1][1], K[0][2], K[1][2])
rgbd_image = o3d.geometry.RGBDImage.create_from_color_and_depth(
    image, depth_image, convert_rgb_to_intensity=False)
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd_image, intrinsic)
o3d.visualization.draw_geometries([pcd])
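One thing worth checking in this visualization (an editorial aside, not a confirmed cause): create_from_color_and_depth defaults to depth_scale=1000.0 (16-bit depth in millimeters) and depth_trunc=3.0 meters, so depth stored at a different scale, or scenes deeper than 3 m, will produce distorted or clipped clouds unless the parameters are set explicitly:

# Match depth_scale to how the depth PNGs were written; raise depth_trunc
# for scenes deeper than the 3 m default
rgbd_image = o3d.geometry.RGBDImage.create_from_color_and_depth(
    image, depth_image,
    depth_scale=1000.0, depth_trunc=10.0,
    convert_rgb_to_intensity=False)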
Hey, Alex.
I noticed that you used a kitti_static_frames.txt file when processing the dataset. Is the purpose of this to pick out pictures of scenes without moving objects?
Hello,
I was wondering how the gradients of a loss function, say the color consistency loss, are computed with respect to the weights by backprop.
You have your depth d(X, i) at pixel location i, which is a function of your weights X. From this depth, you get your warped image coordinate, say W(d). For color consistency, you compare the pixel values of the source image and the warped image, which is some difference between img(i) and img(W(d(X, i))).
Numerically, I can see how you can obtain the derivative of this function with respect to the weights, but how does auto-differentiation do it analytically, since img(i) cannot really be stated analytically?
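For context (my understanding of how frameworks handle this, not the authors' answer): img(W(d)) is made differentiable by sampling the image with bilinear interpolation, which is a piecewise-analytic function of the continuous sampling coordinates. In PyTorch this is F.grid_sample, and gradients flow back to the coordinates, hence to depth and the weights:

import torch
import torch.nn.functional as F

# Bilinear sampling expresses each warped intensity as a weighted sum of the
# four neighboring pixels, so autograd can differentiate it w.r.t. the grid
image = torch.rand(1, 3, 480, 640)
grid = torch.rand(1, 480, 640, 2, requires_grad=True)  # sampling coords in [0, 1)
warped = F.grid_sample(image, grid * 2.0 - 1.0, mode='bilinear', align_corners=True)
loss = warped.mean()
loss.backward()
print(grid.grad.shape)  # torch.Size([1, 480, 640, 2])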
Hello Alex, thank you for releasing this work!
I had a quick question: what is the purpose of min_predict_depth and max_predict_depth / what do these two lines do? https://github.com/alexklwong/calibrated-backprojection-network/blob/master/src/kbnet_model.py#L183-L184
From my reading, these lines do not seem to scale output_depth to lie between min_predict_depth and max_predict_depth.
Thank you!
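For reference, a common pattern in depth networks (a hypothetical sketch of one such scheme, not necessarily what those two lines implement) is to map a bounded network output into the depth range by interpolating in inverse depth, which allocates more output resolution to nearby depths:

import torch

def scale_depth(sigmoid_output, min_predict_depth=1.5, max_predict_depth=100.0):
    # sigmoid_output in (0, 1) maps to inverse depth between 1/max and 1/min
    inv_depth = 1.0 / max_predict_depth + sigmoid_output * (
        1.0 / min_predict_depth - 1.0 / max_predict_depth)
    return 1.0 / inv_depth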
Hi Alex,
I noticed that you updated the script for downloading the NYUv2 dataset. Thanks.
I downloaded the raw data from the NYUv2 official website weeks ago, but I found the unzipped data contains many files with extensions like '.dump', '.pgm', and '.ppm'.
However, the NYUv2 setup Python file seems to only accept files with the '.png' extension.
So my question is: did I download the correct NYUv2 data, or how should I set up the data for your code?
Thanks in advance!
Hello,
Though the absolute poses are available for each frame in the VOID dataset, it looks like PoseNet is used for getting the poses between cameras. Is there a particular reason for this?
Hello,
While running the training script train_kbnet_void1500.sh, I inevitably end up getting this error after 5000+ steps:
Begin training...
Step= 1000/67350 Loss=1.33181 Time Elapsed=0.15h Time Remaining=10.22h
Step= 2000/67350 Loss=1.39467 Time Elapsed=0.31h Time Remaining=9.97h
Step= 3000/67350 Loss=1.27011 Time Elapsed=0.46h Time Remaining=9.78h
Step= 4000/67350 Loss=1.13345 Time Elapsed=0.61h Time Remaining=9.62h
Step= 5000/67350 Loss=1.04204 Time Elapsed=0.76h Time Remaining=9.49h
Traceback (most recent call last):
File "src/train_kbnet.py", line 251, in <module>
n_thread=args.n_thread)
File "/home/madharak/ws/calibrated-backprojection-network/src/kbnet.py", line 524, in train
log_path=log_path)
File "/home/madharak/ws/calibrated-backprojection-network/src/kbnet.py", line 571, in validate
for idx, (inputs, ground_truth) in enumerate(zip(dataloader, ground_truths)):
File "/home/madharak/anaconda3/envs/depth_completion/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
return self._process_data(data)
File "/home/madharak/anaconda3/envs/depth_completion/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/home/madharak/anaconda3/envs/depth_completion/lib/python3.7/site-packages/torch/_utils.py", line 385, in reraise
raise self.exc_type(msg)
TypeError: function takes exactly 5 arguments (1 given)
Have you encountered this error before, and do you have an idea of why it could be occurring? I guess it must be something to do with my data, but I have no idea what.
I'm running torch 1.3.0 and torchvision 0.4.1.
I'm trying to compute the RMSE from the ground_truth and output_depth folders generated by the model. I use data_utils.load_depth_with_validity_map to read the ground truth and validity map, and data_utils.load_depth to read the output depth, but I can't reproduce the original results.
Thanks
import numpy as np

# data_utils and eval_utils are the repo's modules under src/
import data_utils
import eval_utils

images = []
for path in image_paths:
    images.append(data_utils.load_image(path))

ground_truths = []
for path in ground_truth_paths:
    ground_truth, validity_map = data_utils.load_depth_with_validity_map(path)
    ground_truths.append(np.stack([ground_truth, validity_map], axis=-1))

output_depths = []
for path in output_depth_paths:
    output_depths.append(data_utils.load_depth(path))

length = len(output_depths)
mae = np.zeros(length)
rmse = np.zeros(length)
imae = np.zeros(length)
irmse = np.zeros(length)

for i in range(length):
    # ground_truths and output_depths are Python lists, so they take a single index
    ground_truth = ground_truths[i]
    output_depth = output_depths[i]
    validity_map = ground_truth[:, :, 1]
    ground_truth = ground_truth[:, :, 0]

    # Evaluate only where ground truth is valid and within the depth range
    validity_mask = np.where(validity_map > 0, 1, 0)
    min_max_mask = np.logical_and(
        ground_truth > 0,
        ground_truth < 100)
    mask = np.where(np.logical_and(validity_mask, min_max_mask) > 0)

    # Applying the mask is essential: without it the metrics include all the
    # unannotated (zero) pixels and the errors blow up
    output_depth = output_depth[mask]
    ground_truth = ground_truth[mask]

    mae[i] = eval_utils.mean_abs_err(1000.0 * output_depth, 1000.0 * ground_truth)
    rmse[i] = eval_utils.root_mean_sq_err(1000.0 * output_depth, 1000.0 * ground_truth)
    imae[i] = eval_utils.inv_mean_abs_err(0.001 * output_depth, 0.001 * ground_truth)
    irmse[i] = eval_utils.inv_root_mean_sq_err(0.001 * output_depth, 0.001 * ground_truth)

# Compute mean metrics
mae = np.mean(mae)
rmse = np.mean(rmse)
imae = np.mean(imae)
irmse = np.mean(irmse)
Hi, Alex, thank you so much for the excellent work. I am using the pretrained model to run on the VOID test dataset, but the performance is pretty bad for all three cases; here is a screenshot of the results. I am wondering where the problem could be. (1) Does the PyTorch version matter? Mine is PyTorch 1.8.1 + CUDA 11.1. (2) Is the pretrained model on the shared drive the final version used in the paper? Thank you very much.