
alexklwong / calibrated-backprojection-network


PyTorch Implementation of Unsupervised Depth Completion with Calibrated Backprojection Layers (ORAL, ICCV 2021)

License: Other

Shell 8.63% Python 91.37%
machine-learning deep-learning computer-vision 3d-vision sensor-fusion unsupervised-learning self-supervised-learning depth-estimation depth-completion 3d-reconstruction

calibrated-backprojection-network's People

Contributors

alexklwong

calibrated-backprojection-network's Issues

Question about RuntimeError: inverse_cuda: For batch 0: U( , ) is zero, singular U.

Hi, Alex,
Thank you for your excellent work.
I have some problems when running the pretrained model and training the model.
I haven't changed the code, but the following error is reported:
RuntimeError: inverse_cuda: For batch 0: U( , ) is zero, singular U.
(The values in the parentheses are different each time I run it.)
Have you met this error before, and how can I solve it?
Thanks in advance.

Error: Caught ValueError in DataLoader worker process 0

Thank you for releasing your nice code.
I'm trying to test your pre-trained model on the VOID-1500 dataset, so I followed the instructions step by step to configure the dataset.

However, when 'bash bash/void/run_kbnet_void1500.sh' is executed, the following error occurs:
< major errors >

  • ValueError: Caught ValueError in DataLoader worker process 0
  • TypeError: object of type 'int' has no len()
  • ValueError: array split does not result in an equal division <-- major error
Traceback (most recent call last):
  File "src/run_kbnet.py", line 149, in <module>
    device=args.device)
  File "/home/zinuok/mono_depth/calibrated-backprojection-network/src/kbnet.py", line 899, in run
    for idx, (inputs, ground_truth) in enumerate(zip(dataloader, ground_truths)):
  File "/home/zinuok/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/zinuok/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/home/zinuok/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/zinuok/.local/lib/python3.6/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/zinuok/.local/lib/python3.6/site-packages/numpy/lib/shape_base.py", line 867, in split
    len(indices_or_sections)
TypeError: object of type 'int' has no len()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zinuok/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/zinuok/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zinuok/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zinuok/mono_depth/calibrated-backprojection-network/src/datasets.py", line 264, in __getitem__
    normalize=False)
  File "/home/zinuok/mono_depth/calibrated-backprojection-network/src/datasets.py", line 44, in load_image_triplet
    image1, image0, image2 = np.split(images, indices_or_sections=3, axis=-1)
  File "<__array_function__ internals>", line 6, in split
  File "/home/zinuok/.local/lib/python3.6/site-packages/numpy/lib/shape_base.py", line 873, in split
    'array split does not result in an equal division')
ValueError: array split does not result in an equal division

I think the last error is the main cause of my problem:

  File "/home/zinuok/mono_depth/calibrated-backprojection-network/src/datasets.py", line 44, in load_image_triplet
    image1, image0, image2 = np.split(images, indices_or_sections=3, axis=-1)
  File "<__array_function__ internals>", line 6, in split
  File "/home/zinuok/.local/lib/python3.6/site-packages/numpy/lib/shape_base.py", line 873, in split
    'array split does not result in an equal division')
ValueError: array split does not result in an equal division

caused by the following code:

# Split along width
    image1, image0, image2 = np.split(images, indices_or_sections=3, axis=-1)

I've printed the shape of the 'images' variable, and its size is (3, 480, 640).

How can I solve this error?
I've tried several changes, like changing N_THREADS from 8 to 1, but I couldn't fix it.
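
As a sanity check (a sketch, assuming the loader returns a channels-first (C, H, W) array in which the three frames of a triplet are concatenated along the width axis), you can verify the width is divisible by 3 before splitting:

    import numpy as np

    def check_triplet(images):
        # Under the assumption above, a VOID training triplet would have
        # shape (3, 480, 1920), i.e. three 640-pixel-wide frames side by side
        width = images.shape[-1]
        if width % 3 != 0:
            raise ValueError(
                'Expected a triplet whose width is divisible by 3, got shape {}. '
                'A shape like (3, 480, 640) suggests a single frame was loaded '
                'instead of a 1920-pixel-wide training triplet.'.format(images.shape))
        return np.split(images, indices_or_sections=3, axis=-1)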

Why is the quality so bad?

Hi!

Thanks for your great work! I have tested your repository and got bad results.
I tested with bash/void/run_kbnet_void150.sh

Here is the input image:
[image: frame 0000000043]

Depth map:
[image: predicted depth map (grayscale)]

Sparse point cloud:
[screenshot: sparse point cloud]

And resulting dense point cloud:
[screenshot: dense point cloud]

It looks much worse than in your README. Is this normal, or is there additional postprocessing I don't know about?
Thanks.

What is the 'input_channels_depth' parameter for?

Hello,
In run_kbnet.py, you have an argument 'input_channels_depth' whose default is 2; I'm not sure what this means.
In networks.py, it's written that it is the "number of input channels for depth branch", but why would the depth have 2 channels?
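
A guess, not confirmed here but consistent with the '--input_type sparse_depth validity_map' argument that appears in the NYU issue further down this page: the depth branch may take the sparse depth map stacked with its binary validity map, giving 2 channels.

    import numpy as np

    # Hypothetical construction of a 2-channel depth input:
    # channel 0 = sparse depth, channel 1 = binary validity map
    sparse_depth = np.zeros((480, 640), dtype=np.float32)
    rows = np.random.randint(0, 480, 1500)
    cols = np.random.randint(0, 640, 1500)
    sparse_depth[rows, cols] = np.random.uniform(0.5, 5.0, 1500)
    validity_map = np.where(sparse_depth > 0, 1.0, 0.0).astype(np.float32)
    depth_input = np.stack([sparse_depth, validity_map], axis=0)  # shape (2, 480, 640)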

Different results between validation while training, and testing

Hello,
I captured some data, set it up in the VOID format, and trained the model on it. While training, I get the best results as:

    Step       MAE      RMSE      iMAE     iRMSE
   21000   119.343   200.820    56.941   120.669

But if I run the evaluation script with the weights of this best-result model (step 21000), I get very different results with much higher errors:

Evaluation results:
     MAE      RMSE      iMAE     iRMSE
3989.622  3996.284   517.797   540.038
     +/-       +/-       +/-       +/-
  55.942    58.938    51.691    54.052

This doesn't make sense to me, since I'm using the same set of data for validation and testing, similar to what is provided in the scripts. Do you have any ideas on why this could be happening?

camera intrinsic matrix

Hello, I have a question about the values in "K.txt"

In the original VOID dataset, the intrinsic parameters provided there are:

"f_x": 514.638,
"f_y": 518.858,
"c_x": 315.267,
"c_y": 247.358,

However, in the "K.txt":

5.471833965147203571e+02 0.000000000000000000e+00 3.176305425559989430e+02
0.000000000000000000e+00 5.565094509450176474e+02 2.524727249693490592e+02
0.000000000000000000e+00 0.000000000000000000e+00 1.000000000000000000e+00

where (as far as I know) K =
f_x   0    c_x
0     f_y  c_y
0     0    1

They are somewhat different.

Q1. Is the camera's distortion model (radtan) already applied in "K.txt"?

Q2. Why are the intrinsic parameters different across the different sequences? Did you use a different sensor setup for each sequence? (In your paper, it is written that a D435i was used for data acquisition.) If so, which intrinsics should be used for real applications, like VIO?

Many thanks in advance.
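
For reference, a minimal sketch (assuming K.txt is a plain 3x3 row-major text file, as the values above suggest) of reading it and pulling out the parameters for comparison:

    import numpy as np

    K = np.loadtxt('K.txt')              # 3x3 row-major intrinsics matrix
    f_x, f_y = K[0, 0], K[1, 1]
    c_x, c_y = K[0, 2], K[1, 2]
    print(f_x, f_y, c_x, c_y)
    # For the sequence above this prints roughly 547.18, 556.51, 317.63, 252.47,
    # versus the 514.638, 518.858, 315.267, 247.358 published with the dataset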

Could you provide the results after the setup_dataset_kitti.py, please?

Thank you very much for providing this code!
But I still have a small question:
The path you provide in setup_dataset_kitti.py doesn't seem to be a directory path, so there were a lot of empty txt files after I ran the code.
In line 258:
sequence = sparse_depth_paths[0].split(os.sep)[5]
sequence_date = sequence[0:10]
When I debug the code, sequence is 'data', so sequence_date is also 'data', but I think this is wrong.
I'm looking forward to your answer.
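
One possible hardening (a sketch, not the repo's code): match the KITTI sequence directory name directly instead of relying on a fixed path depth, so the result does not depend on where the dataset root lives:

    import re

    def get_sequence(path):
        # KITTI raw sequences are named like '2011_09_26_drive_0001_sync'
        match = re.search(r'\d{4}_\d{2}_\d{2}_drive_\d{4}_sync', path)
        if match is None:
            raise ValueError('No KITTI sequence found in path: {}'.format(path))
        sequence = match.group(0)
        sequence_date = sequence[0:10]   # e.g. '2011_09_26'
        return sequence, sequence_date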

uneven GPU memory caused by multi-gpu training

Hi, Alex,
Thanks for your nice work. I'm facing the problem of uneven GPU memory usage when training the model with multiple GPUs: GPU #0 uses much more memory than the others. I think the main reason is that DataParallel can only compute losses on GPU #0. Would you give some advice on how to balance the GPU memory? Thanks in advance.
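
One common workaround (a sketch, not the repo's implementation; the forward signature here is hypothetical): wrap the model so the loss is computed inside forward(). DataParallel then evaluates the loss on each GPU and only scalar losses are gathered on GPU 0, instead of gathering full-resolution predictions there:

    import torch
    import torch.nn as nn

    class ModelWithLoss(nn.Module):
        def __init__(self, model, loss_fn):
            super().__init__()
            self.model = model
            self.loss_fn = loss_fn

        def forward(self, image, sparse_depth, intrinsics, target):
            # Loss is computed on the same GPU as the forward pass,
            # so only scalars are gathered back on GPU 0
            output_depth = self.model(image, sparse_depth, intrinsics)
            return self.loss_fn(output_depth, target)

    # wrapped = nn.DataParallel(ModelWithLoss(model, loss_fn)).cuda()
    # loss = wrapped(image, sparse_depth, intrinsics, target).mean()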

normalization or not

Hi~ Alex,

Thank you for your nice work. I succeeded in reproducing your results. However, I have a question about whether the image or depth is normalized. They don't seem to be normalized in your code, so what is the reason for that? Does normalization yield better results? By the way, if the image resolution is too large for the available CUDA memory, could I resize the image and depth instead of using a random crop, and then, during testing, upsample the low-resolution depth map back to the original resolution?
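
If resizing is used, the intrinsics would need to be scaled by the same factors. A minimal sketch (not part of the repo), assuming an (H, W, 3) image, an (H, W) sparse depth map, and a 3x3 intrinsics matrix K:

    import cv2
    import numpy as np

    def resize_sample(image, sparse_depth, K, scale):
        h, w = image.shape[:2]
        new_w, new_h = int(w * scale), int(h * scale)
        image = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
        # Nearest neighbor avoids interpolating between measured and empty pixels
        sparse_depth = cv2.resize(sparse_depth, (new_w, new_h), interpolation=cv2.INTER_NEAREST)
        K = K.copy()
        K[0, 0] *= scale   # f_x
        K[1, 1] *= scale   # f_y
        K[0, 2] *= scale   # c_x
        K[1, 2] *= scale   # c_y
        return image, sparse_depth, K
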
Looking forward to your reply!

Best regards,
Viot

about dataset

Hi~, Alex,
Could you tell me the number of your training samples? I get 72,400 according to your data and processing method. Is this quantity correct?
Looking forward to your reply!

Question about CUDA 11.0 and PyTorch 1.7.0

Thank you for your excellent work.

I would like to ask whether you used CUDA 11.0 when testing on Ubuntu 20.04.
When I train the network with CUDA 11.0 + PyTorch 1.7 on an RTX 3090, the loss does not decrease normally, and I cannot find the reason.
Could you help me?

NYU performance

Hi Alex,
Thanks for your great work. I'm reproducing the NYU v2 (generalization) experiments. I followed the instructions you provided to prepare the NYU data. When I used your pre-trained model kbnet-void1500.pth to evaluate on NYU v2, I got these errors:
(kbnet) zlq@ivg-SYS-7048GR-TR:/home/disk2/code/calibrated-backprojection-network$ bash bash/void/run_knet_nyu_v2_test.sh
usage: run_kbnet.py [-h] --image_path IMAGE_PATH --sparse_depth_path
SPARSE_DEPTH_PATH --intrinsics_path INTRINSICS_PATH
[--ground_truth_path GROUND_TRUTH_PATH]
[--input_channels_image INPUT_CHANNELS_IMAGE]
[--input_channels_depth INPUT_CHANNELS_DEPTH]
[--normalized_image_range NORMALIZED_IMAGE_RANGE [NORMALIZED_IMAGE_RANGE ...]]
[--outlier_removal_kernel_size OUTLIER_REMOVAL_KERNEL_SIZE]
[--outlier_removal_threshold OUTLIER_REMOVAL_THRESHOLD]
[--min_pool_sizes_sparse_to_dense_pool MIN_POOL_SIZES_SPARSE_TO_DENSE_POOL [MIN_POOL_SIZES_SPARSE_TO_DENSE_POOL ...]]
[--max_pool_sizes_sparse_to_dense_pool MAX_POOL_SIZES_SPARSE_TO_DENSE_POOL [MAX_POOL_SIZES_SPARSE_TO_DENSE_POOL ...]]
[--n_convolution_sparse_to_dense_pool N_CONVOLUTION_SPARSE_TO_DENSE_POOL]
[--n_filter_sparse_to_dense_pool N_FILTER_SPARSE_TO_DENSE_POOL]
[--n_filters_encoder_image N_FILTERS_ENCODER_IMAGE [N_FILTERS_ENCODER_IMAGE ...]]
[--n_filters_encoder_depth N_FILTERS_ENCODER_DEPTH [N_FILTERS_ENCODER_DEPTH ...]]
[--resolutions_backprojection RESOLUTIONS_BACKPROJECTION [RESOLUTIONS_BACKPROJECTION ...]]
[--n_filters_decoder N_FILTERS_DECODER [N_FILTERS_DECODER ...]]
[--deconv_type DECONV_TYPE]
[--min_predict_depth MIN_PREDICT_DEPTH]
[--max_predict_depth MAX_PREDICT_DEPTH]
[--weight_initializer WEIGHT_INITIALIZER]
[--activation_func ACTIVATION_FUNC]
[--min_evaluate_depth MIN_EVALUATE_DEPTH]
[--max_evaluate_depth MAX_EVALUATE_DEPTH]
[--output_path OUTPUT_PATH] [--save_outputs]
[--keep_input_filenames]
[--depth_model_restore_path DEPTH_MODEL_RESTORE_PATH]
[--device DEVICE]
run_kbnet.py: error: unrecognized arguments: --avg_pool_sizes_sparse_to_dense_pool 0 --encoder_type knet_v1 fusion_conv_previous sparse_to_dense_pool_v1 --input_type sparse_depth validity_map 3 3 3 0 --n_resolutions_encoder_intrinsics 0 1 2 3 --skip_types image depth --decoder_type multi-scale --output_kernel_size 3 --outlier_removal_method remove

So I deleted the unrecognized arguments and ran it again; this time I got these numbers:

Evaluation results:
     MAE      RMSE      iMAE     iRMSE
 122.836   228.426    24.147    50.003
     +/-       +/-       +/-       +/-
  71.550   130.133    16.920    36.531

I know the numbers are close to the results reported in this repo, but I think I might reproduce them exactly if the unrecognized arguments were taken into account.
My question is: how can I reproduce results closer to the ones you reported? Is there something wrong with my setup?

My environment is as follows:
torch                  1.3.0
torchvision            0.4.1
Python 3.7.12
CUDA 10.2

Thank you in advance.

How to get the beautiful demo?

In the given demo, you show sparse points and colored predicted point clouds. I would like to know how to produce a similar demo.

Role of 'datasets.load_image_triplet' in validation

Hello,
When validation is run on images, the dataloader for inference in line 764 of kbnet.py is initialized as:

    dataloader = torch.utils.data.DataLoader(
        datasets.KBNetInferenceDataset(
            image_paths=image_paths,
            sparse_depth_paths=sparse_depth_paths,
            intrinsics_paths=intrinsics_paths),
        batch_size=1,
        shuffle=False,
        num_workers=1,
        drop_last=False) 

in which case the KBNetInferenceDataset class is initialized with the default use_image_triplet=True and tries to fetch and split a triplet of images. I understand its function during training, but why is it done this way in validation?

Questionable depth map on VOID ground truth + inference

Hey there!
Thank you for the work!
I tried it out on my own sparse depth map + RGB image, and it didn't perform well at all. I also visualized some ground truth data from the VOID dataset, and found that some point clouds look similarly bad. The copyroom folder looks fine, but the first image from birthplace_of_internet already looks bad. I can understand that my own dataset could be problematic in terms of sparse depth map resolution, but after looking at some ground truth data, I'm wondering if the problem lies elsewhere.

Any idea why my custom dataset would look like this? The "bad" ground truth from VOID still looks better than my results.
Here is a link to the visualization of a VOID point cloud and a custom point cloud.

I ran kbnet on both a Python 3.7 venv with the given dependency versions and a 3.9 venv with newer library versions.

I visualized everything using Open3D:

        import cv2
        import numpy as np
        import open3d as o3d

        image_opencv = cv2.imread(image_file)
        image = o3d.io.read_image(image_file)
        depth_image = o3d.io.read_image(depth_file)
        K = np.loadtxt(intrinsics_file)
        # PinholeCameraIntrinsic expects (width, height, fx, fy, cx, cy): shape[1] is width, shape[0] is height
        intrinsic = o3d.camera.PinholeCameraIntrinsic(image_opencv.shape[1], image_opencv.shape[0], K[0][0], K[1][1], K[0][2], K[1][2])
        # Depth is assumed to be in millimeters by default (depth_scale=1000, depth_trunc=3.0)
        rgbd_image = o3d.geometry.RGBDImage.create_from_color_and_depth(image, depth_image, convert_rgb_to_intensity=False)
        pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd_image, intrinsic)
        o3d.visualization.draw_geometries([pcd])

Kitti static frames

Hey, Alex.
I noticed that you used a kitti_static_frames.txt file when processing the dataset. Is the purpose of this to pick out pictures of scenes without moving objects?

How are the gradients for the image information loss computed for backprop?

Hello,
I was wondering how the gradients of a loss, say the color consistency loss, are computed w.r.t. the weights by backprop.

You have your depth d at a coordinate, which is a function of your weights, say X, at a pixel location i, i.e. d(X, i).
From this depth, you get your warped image coordinate, say W(d). For the color consistency, you compare the pixel values of the source image and the warped image, which would be some difference between img(i) and img(W(d(X, i))).

Numerically, I can see how you could obtain the derivative of this function w.r.t. the weights, but how does auto-differentiation do it analytically, since img(i) cannot really be stated analytically?
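
A sketch of the key idea (not the repo's code): the warped image value img(W(d)) is obtained by bilinear sampling, i.e. a weighted sum of the four neighboring pixels, and those weights are an analytic function of the continuous warped coordinates. Autograd can therefore differentiate the loss w.r.t. the coordinates, and hence w.r.t. depth and the weights, through the sampling operation:

    import torch
    import torch.nn.functional as F

    image = torch.rand(1, 3, 480, 640)           # source image
    grid = torch.rand(1, 480, 640, 2) * 2 - 1    # normalized warped coordinates in [-1, 1]
    grid.requires_grad_(True)

    # Differentiable bilinear sampling: each output pixel is a weighted sum of neighbors
    warped = F.grid_sample(image, grid, mode='bilinear', align_corners=True)

    loss = (warped - torch.rand_like(warped)).abs().mean()
    loss.backward()
    print(grid.grad.shape)   # gradients flow back through the sampling into the coordinates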

about NYUv2 data

Hi Alex,
I noticed that you updated the script for downloading the NYUv2 dataset. Thanks.
I downloaded the raw data from the official NYUv2 website weeks ago, but I found that the unzipped data contains many files with extensions like '.dump', '.pgm', and '.ppm'.
The names of the files look like this:
[screenshot: file names]

However, the NYUv2 setup Python script seems to only accept files with the '.png' extension.

# Example: nyu/training/depths/raw_data/bedroom_0001/r-1294886360.208451-2996770081.png

So my question is: did I download the correct NYUv2 data, and how should I set up the data for your code?
Thanks in advance!

Error while training

Hello,
While running the training script train_kbnet_void1500.sh, I inevitably end up getting this error after 5000+ steps:

Begin training...
Step=  1000/67350  Loss=1.33181  Time Elapsed=0.15h  Time Remaining=10.22h
Step=  2000/67350  Loss=1.39467  Time Elapsed=0.31h  Time Remaining=9.97h
Step=  3000/67350  Loss=1.27011  Time Elapsed=0.46h  Time Remaining=9.78h
Step=  4000/67350  Loss=1.13345  Time Elapsed=0.61h  Time Remaining=9.62h
Step=  5000/67350  Loss=1.04204  Time Elapsed=0.76h  Time Remaining=9.49h
Traceback (most recent call last):
  File "src/train_kbnet.py", line 251, in <module>
    n_thread=args.n_thread)
  File "/home/madharak/ws/calibrated-backprojection-network/src/kbnet.py", line 524, in train
    log_path=log_path)
  File "/home/madharak/ws/calibrated-backprojection-network/src/kbnet.py", line 571, in validate
    for idx, (inputs, ground_truth) in enumerate(zip(dataloader, ground_truths)):
  File "/home/madharak/anaconda3/envs/depth_completion/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "/home/madharak/anaconda3/envs/depth_completion/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/madharak/anaconda3/envs/depth_completion/lib/python3.7/site-packages/torch/_utils.py", line 385, in reraise
    raise self.exc_type(msg)
TypeError: function takes exactly 5 arguments (1 given)

Have you encountered this error before, and do you have an idea of why it could be occurring? I guess it must be something to do with my data, but I have no idea what.
I'm running torch 1.3.0 and torchvision 0.4.1.

how to get RMSE

I am trying to compute the RMSE from the ground_truth and output_depth folders generated by the model. I use data_utils.load_depth_with_validity_map to read the ground truth and validity map, and data_utils.load_depth to read the output depth, but I can't reproduce the original results.
Thanks

import numpy as np

import data_utils
import eval_utils

# Load RGB images (not needed for the metrics, kept from the original snippet)
images = []
for path in image_paths:
    images.append(data_utils.load_image(path))

# Load ground truth depth maps together with their validity maps
ground_truths = []
for path in ground_truth_paths:
    ground_truth, validity_map = data_utils.load_depth_with_validity_map(path)
    ground_truths.append(np.stack([ground_truth, validity_map], axis=-1))

# Load predicted (output) depth maps
output_depths = []
for path in output_depth_paths:
    output_depths.append(data_utils.load_depth(path))

length = len(output_depths)
mae = np.zeros(length)
rmse = np.zeros(length)
imae = np.zeros(length)
irmse = np.zeros(length)

for i in range(length):
    ground_truth = ground_truths[i]
    output_depth = output_depths[i]

    validity_map = ground_truth[:, :, 1]
    ground_truth = ground_truth[:, :, 0]

    # Only evaluate where ground truth is valid and within the depth range
    validity_mask = np.where(validity_map > 0, 1, 0)
    min_max_mask = np.logical_and(
        ground_truth > 0,
        ground_truth < 100)
    mask = np.where(np.logical_and(validity_mask, min_max_mask) > 0)
    output_depth = output_depth[mask]
    ground_truth = ground_truth[mask]

    # MAE/RMSE in millimeters, iMAE/iRMSE in 1/km (depth is in meters)
    mae[i] = eval_utils.mean_abs_err(1000.0 * output_depth, 1000.0 * ground_truth)
    rmse[i] = eval_utils.root_mean_sq_err(1000.0 * output_depth, 1000.0 * ground_truth)
    imae[i] = eval_utils.inv_mean_abs_err(0.001 * output_depth, 0.001 * ground_truth)
    irmse[i] = eval_utils.inv_root_mean_sq_err(0.001 * output_depth, 0.001 * ground_truth)

# Compute mean metrics
mae = np.mean(mae)
rmse = np.mean(rmse)
imae = np.mean(imae)
irmse = np.mean(irmse)

Question on coordinate frames for pose data

Hello,
[image: photometric loss (Photo_loss) equation]

In the above, the relative pose g_τt ∈ SE(3) refers to the transformation from the world frame to the camera frame, right? That is, the pose is defined w.r.t. the camera frame.

Performance of pretrained model is not good

Hi Alex, thank you so much for the excellent work. I am using the pretrained model to run on the VOID test set. However, the performance is pretty bad for all three cases; here is a screenshot of the results. I am wondering where the problem could be. (1) Does the PyTorch version matter? Mine is torch 1.8.1 + CUDA 11.1. (2) Is the pretrained model on the shared drive the final version used in the paper? Thank you very much.

Ian
[screenshot: evaluation results]
