
disk's Introduction

DISK

Official code release for DISK: learning local features with policy gradient. If you use this code in your work, please cite us as

@article{tyszkiewicz2020disk,
  title={DISK: Learning local features with policy gradient},
  author={Tyszkiewicz, Micha{\l} and Fua, Pascal and Trulls, Eduard},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}

Table of contents

  1. Installation
  2. Inference
  3. Training
  4. Extending

Installation

  1. Clone this repo recursively
  2. cd into this repo: the next step uses relative paths
  3. Execute pip install --user -r requirements.txt

Inference

Feature extraction

To extract features, execute

python detect.py h5_artifacts_destination images_directory

This should create h5_artifacts_destination/keypoints.h5 and h5_artifacts_destination/descriptors.h5, compatible with the IMW benchmark. By default the model uses a 4-layer U-Net architecture, which means that the image dimensions have to be a multiple of 16; for this reason you will probably want to specify the --height and --width flags to scale the input images accordingly. The images will be scaled while preserving their aspect ratio (by 0-padding the missing values), and the keypoint locations will be rescaled and filtered with respect to the original image dimensions.
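
For example, to cap the input resolution (the values below are only illustrative; both must be multiples of 16):

python detect.py --height 1024 --width 1024 h5_artifacts_destination images_directory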

You can use --help to learn about other options; in particular it is possible to specify the weights file with --model-path. We provide depth-save.pth, the checkpoint trained with depth-based reward and reported in the paper (default), as well as epipolar-save.pth, the checkpoint trained with epipolar reward and shown in the supplementary material.
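
For instance, to extract features with the epipolar checkpoint instead of the default (assuming the weights file sits in the current directory):

python detect.py --model-path epipolar-save.pth h5_artifacts_destination images_directory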

Keypoint matching

Execute

python match.py h5_artifacts_destination

(or use --help to learn about other options). This should create h5_artifacts_destination/matches.h5. Note that this file is not compatible with the IMW benchmark: instead of saving matches as {image_name_1}-{image_name_2}, it saves them as {image_name_1}/{image_name_2}, which creates HDF5 groups and therefore allows this approach to scale to large image collections (saving HDF5 files with more than 300k top-level groups becomes painfully slow due to hashing overhead).
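
As a minimal sketch of how the resulting file can be read (assuming the nested-group layout described above; the contents of each dataset are assumed to be matched keypoint indices):

import h5py

with h5py.File('h5_artifacts_destination/matches.h5', 'r') as f:
    # top-level keys are image names; nested access follows {image_name_1}/{image_name_2}
    matches = f['image_name_1']['image_name_2'][()]  # assumed: array of matched keypoint index pairs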

Viewing results

The view_h5.py script can be used to view artifacts generated by detect.py and match.py.

Exporting to COLMAP

After features are detected and matched, the results can be converted into a COLMAP-compatible database format with colmap/h5_to_db.py h5_artifacts_location raw_images_location. Note that the features are inserted WITHOUT their descriptors, so our match.py has to be used to perform the matching beforehand. At the same time, match.py doesn't run pose estimation, so the exhaustive feature matching stage of the COLMAP pipeline still has to be run. An example pipeline is shown below:

# assume we have the images in scene/images
python detect.py --height 1024 --width 1024 --n 2048 scene/h5 scene/images
python match.py --rt 0.95 --save-threshold 100 scene/h5
python colmap/h5_to_db.py --database-path scene/database.db scene/h5 scene/images

# don't use GPU since we aren't computing the descriptor distance matrices anyway,
# only RANSAC
colmap exhaustive_matcher --database_path scene/database.db --SiftMatching.use_gpu 0
mkdir scene/sparse
colmap mapper --database_path scene/database.db --image_path scene/images --output_path scene/sparse

Please try h5_to_db.py --help for additional options.
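
As a quick sanity check after the export, the database can be inspected with Python's built-in sqlite3 module (a sketch; the table names below follow the standard COLMAP database schema):

import sqlite3

# Count rows in a few standard COLMAP tables of the exported database.
db = sqlite3.connect('scene/database.db')
for table in ('cameras', 'images', 'keypoints', 'matches'):
    count, = db.execute('SELECT COUNT(*) FROM {}'.format(table)).fetchone()
    print(table, count)
db.close()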

Training

The training script

Assuming data is available, python train.py DATASETS_LOCATION starts training. The --reward switch allows for choosing the reward scheme (depth or epipolar). For more information, execute python train.py --help.
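
For example, to train with the epipolar reward scheme (DATASETS_LOCATION as above):

python train.py --reward epipolar DATASETS_LOCATION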

Reproducing our results

The data we used for training and validation can be downloaded by executing the download_dataset script (~164 GB). It will download the data into datasets.epfl.ch/disk-data/, and this is the path that should be given to train.py. With the default settings, training anneals the inverse softmax matching temperature inverse_T (called θ_M in the paper) from 15 to 50 over the course of the first 20 epochs. We then pick the best checkpoint according to validation AUC, as reported by python compute_validation_auc.py TENSORBOARD_LOG_FILE. Following this schedule allowed us to obtain 0.50432 stereo AUC and 0.72624 multiview AUC on the IMW2020 test set with 2k features, slightly less than reported in the paper (0.51315 and 0.72705, respectively).
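
For reference, a minimal sketch of the annealing schedule described above (assuming a linear ramp; see train.py for the exact implementation):

def inverse_T(epoch, start=15.0, end=50.0, ramp_epochs=20):
    # Linearly ramp the inverse softmax temperature (θ_M in the paper) from
    # `start` to `end` over the first `ramp_epochs` epochs, then hold it constant.
    t = min(epoch / ramp_epochs, 1.0)
    return start + t * (end - start)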

The paper results (available as depth-save.pth, the default checkpoint in detect.py) were obtained through an ad-hoc schedule of annealing θ_M between 15 and 25 over 10 epochs and then training for a further 40 epochs. We picked the best checkpoint obtained this way (the 39th) and fine-tuned it with a schedule of θ_M = 25 + epoch_number for another 50 epochs, obtaining the best model at the 20th epoch (θ_M = 45). We default to the schedule presented above for simplicity, while disclosing the original process for completeness.

As people often request this, we have uploaded the cached results for the MMA metric on HPatches (Figure 5 in the NeurIPS paper) to this repository: they are available in the results/hpatches folder. You can read them with this notebook, similarly to the cached results provided by that repository.

Low GPU memory training

We performed our experiments with the 32 GB version of the Nvidia V100 GPU. However, running python train.py --substep 2 --batch-size 1 --chunk-size 10000 --warmup 500 should be functionally equivalent to that setup and fit within 11/12 GB GPUs (note that training in this mode may take on the order of 2 weeks!).

Custom data preparation

Alternatively, one can use a custom dataset laid out in the proper format, as explained more in depth here. We provide a script to automate that process in the case of photo collections posed with COLMAP.

Creating new datasets by importing from COLMAP

A new dataset (for instance with custom scenes) can be created by importing from COLMAP outputs. One should run COLMAP on the images, including steps of image rectification and patch match depth estimation. This should leave the user with a directory structured as

$ tree colmap_output
colmap_output/
├── images
│   ├── 2020_07_25__12_09_03.jpg
│   ├── 2020_07_25__12_09_05.jpg
│   ├── ...
├── run-colmap-geometric.sh
├── run-colmap-photometric.sh
├── sparse
│   ├── cameras.bin
│   ├── images.bin
│   └── points3D.bin
└── stereo
    ├── consistency_graphs
    ├── depth_maps
    │   ├── 2020_07_25__12_09_03.jpg.geometric.bin
    │   ├── 2020_07_25__12_09_03.jpg.photometric.bin
    │   ├── 2020_07_25__12_09_05.jpg.geometric.bin
    │   ├── 2020_07_25__12_09_05.jpg.photometric.bin
    │   ├── ...
    ├── fusion.cfg
    ├── normal_maps
    │   ├── ...
    └── patch-match.cfg

One can then execute python colmap/colmap2dataset.py colmap_output --name my_scene to create an extra "dataset" directory:

$ tree colmap_output/dataset/
colmap_output/dataset/
├── calibration
│   ├── calibration_2020_07_25__12_09_03.jpg.h5
│   ├── calibration_2020_07_25__12_09_05.jpg.h5
│   ├── ...
├── dataset.json
└── depth
    ├── 2020_07_25__12_09_03.h5
    ├── 2020_07_25__12_09_05.h5
    ├── ...

dataset.json is the file used to instantiate DISK dataloaders; it contains a collection of absolute paths to the contents of colmap_output/dataset and colmap_output/images, so those should not be moved afterwards. colmap_output/stereo and colmap_output/sparse can be safely deleted to conserve disk space.
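
Since the paths in dataset.json are absolute, a quick way to verify that nothing has been moved is to check that every referenced file still exists. A minimal sketch (it assumes nothing about the JSON layout beyond treating absolute string values as file references):

import json, os

def missing_paths(node):
    # Recursively walk the parsed JSON and yield every absolute path that
    # no longer exists on disk.
    if isinstance(node, dict):
        for value in node.values():
            yield from missing_paths(value)
    elif isinstance(node, list):
        for value in node:
            yield from missing_paths(value)
    elif isinstance(node, str) and os.path.isabs(node) and not os.path.exists(node):
        yield node

with open('colmap_output/dataset/dataset.json') as f:
    for path in missing_paths(json.load(f)):
        print('missing:', path)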

In case one wants to merge multiple scenes into a single dataset, one can execute python colmap/merge_datasets.py my_scene_1/dataset/dataset.json my_scene_2/dataset/dataset.json ... in order to obtain a single file called merged.json which contains all the scenes (and still references the files in their original locations for each of the scenes!). Scenes with repeating names (as given by the --name flag of colmap2dataset) will be renamed to unique (but non-informative) names.

Extending

We tried to keep the code easy to understand and reasonably documented. Please open an issue if problems are encountered.

@dimchecked

We extensively use torch_dimcheck (the @dimchecked function decorator) for clarifying function signatures: please refer to the repository for extra information.
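
An illustrative signature in the style used throughout this codebase (the function itself is hypothetical; see the torch_dimcheck repository for the authoritative API):

import torch
from torch_dimcheck import dimchecked

@dimchecked
def l2_normalize(descriptors: ['B', 'N', 'C']) -> ['B', 'N', 'C']:
    # The annotations declare that the output has the same [batch, keypoint,
    # channel] shape as the input; @dimchecked verifies this at call time.
    return torch.nn.functional.normalize(descriptors, dim=-1)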

NpArray

We often deal with collections of tensors which are semantically batched but of different shapes (such as lists of features in different images of the same scene). Since PyTorch doesn't have the concept of jagged tensors, we wrap them in numpy arrays with dtype=object, rather than standard lists. This allows us to retain numpy's reshaping, stacking and indexing functionality. In signatures, those are often annotated with the NpArray type annotation.
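
A minimal sketch of this pattern (the tensor shapes are only illustrative):

import numpy as np
import torch

# Per-image keypoint tensors of different lengths: a jagged collection.
per_image = [torch.rand(120, 2), torch.rand(87, 2), torch.rand(203, 2)]

# Wrap them in a dtype=object array instead of a plain list, so numpy-style
# indexing and slicing still apply to the collection as a whole.
jagged = np.empty(len(per_image), dtype=object)
jagged[:] = per_image

print(jagged.shape)     # (3,)
print(jagged[1].shape)  # torch.Size([87, 2])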

disk's People

Contributors

ducha-aiki, etrulls, jatentaki

disk's Issues

backbone of unet

Hi @ducha-aiki @jatentaki, thanks for your great work!
I have some questions about the changes you made to the U-Net backbone:

  1. Why did you replace BN with instance normalization? Did you do any ablation experiments?
  2. Why did you replace ReLU with PReLU? Did you do any ablation experiments?
  3. Why did you replace the two convolutional layers per block with a single conv layer? Did you do any ablation experiments?
  4. What is the meaning of (NL), the number of landmarks, in your results on ETH-COLMAP? I didn't see this in other papers, which usually report # Reg. Images, # Sparse Points, Track Length, Reproj. Error and # Dense Points.

Looking forward to your reply! Thanks.

descriptor

Hello, regarding the DISK descriptor, do you have a 256-dimensional pretrained model?

too few features during training

Hi,

Thank you for open-sourcing your training code.
I want to change the backbone and train a model using your training scheme. I use the default configuration. However, the model cannot extract any features during training (~300 steps) and it causes errors when calculating the "distance" and "affinity" matrix. Could you give me some advice?

Looking forward to your reply :)

AUC for HPatches Image Matching MMA

Hi!
Thank you so much for this work.
I noticed in this paper you reported AUC(5px) for HPatches Image Matching MMA (0.698 for DISK 8k).
Could you please explain how this AUC is calculated at each threshold? How did you derive 0.698?
Or if you could kindly provide the code for this, that would be great!
Thank you so much!

questions about disk's features

Hi, @jatentaki
Thank you for your great work! I have some further questions:

  1. Have you tested DISK on the Aachen localization benchmark and the ETH benchmark (for 3D reconstruction tasks)? How are the results?
  2. Why did you choose a U-Net as the feature extraction network? Have you compared it with other networks?
  3. Have you tried performing feature detection at different layers (as in a multi-scale strategy)?
  4. I found that DISK produces fewer keypoints than other SOTA methods, yet it still works great. What is the main reason? Is it because of the grid strategy during training?

That's a lot of questions... but I'm really looking forward to your reply. Thanks a lot in advance!

RuntimeError: downsample feature map of size torch.Size([1, 64, 135, 240])

Hi @ducha-aiki @jatentaki, thanks for your great work! I ran into the following error:
Traceback (most recent call last):
File "detect.py", line 249, in
described_samples = extract(dataset, args.h5_path)
File "detect.py", line 201, in extract
batched_features = extract(bitmaps)
File "/home/anaconda3/envs/pytorch1/lib/python3.6/site-packages/torch_dimcheck/dimcheck.py", line 110, in wrapped
result = func(*args, **kwargs)
File "/home/disk/disk/model/disk.py", line 57, in features
descriptors, heatmaps = self._split(self.unet(images))
File "/home/anaconda3/envs/pytorch1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/anaconda3/envs/pytorch1/lib/python3.6/site-packages/torch_dimcheck/dimcheck.py", line 110, in wrapped
result = func(*args, **kwargs)
File "/home/anaconda3/envs/pytorch1/lib/python3.6/site-packages/unets/unet.py", line 100, in forward
features.append(layer(features[-1]))
File "/home/anaconda3/envs/pytorch1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/anaconda3/envs/pytorch1/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/anaconda3/envs/pytorch1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/anaconda3/envs/pytorch1/lib/python3.6/site-packages/unets/ops.py", line 42, in forward
raise RuntimeError(msg)
RuntimeError: Trying to downsample feature map of size torch.Size([1, 64, 135, 240])

Which version of PyTorch should I use? Any suggestions? Thanks!

question about the detection scores?

Thanks for your great work!
I wonder why the keypoint scores look like [226.70332 214.92577 209.83643 ... 59.53643 59.525322 59.50951]?

I noticed that the scores may be log probabilities, but log x (for 0 < x < 1) should be negative, so why are the values above positive?
And how can I get a probability for each keypoint?

The website of dataset is down?

Hi,

I'm trying to download the MegaDepth training dataset using your script. But I found that the website is down. Do you have any idea about it?

Thank you very much.

Missing data request

Thank you for the amazing work!

During the dataset download, some of the scenes seem to be corrupted. I tried to download the MegaDepth DISK dataset from the website, but it is down. 😢

Could you please provide these scenes? The missing files are ["0411", "0472", "0476", "0478", '0482']

How to get matched keypoints

Hi, thank you for opensourcing such a great model,

I'm now trying to run keypoint matching between two images ("image_0.png", "image_1.png"). I put those images into the directory ./pairs, created an output directory ./output for the h5 files, and placed the contents of this repo in ./ as well. I ran

python detect.py output ./pairs
python match.py output

and it seems to have worked, creating

ls output

descriptors.h5 keypoints.h5 matches.h5

However, when I check with

keypoint_f = h5py.File('output/keypoints.h5', 'r')

it doesn't contain any keys. Is it working correctly?

Thanks a lot,

Tensor RT

Hi there,

Thanks for this great work.

Is there a TensorRT version of the weights available?

Regards,
Jacob

Slow inference speed

On my system with an RTX 3080 8GB and a Ryzen 9 6900HX, I get around 4 FPS for feature detection (input images were at 1280x720 resolution). Is there any way to increase the inference speed?

Btw, amazing work! I found DISK to be far more robust than SuperPoint+SuperGlue and SOSNet in terms of matching under large changes in rotation as well as illumination. The only bottleneck is the inference time; it's far too slow to be used in a real-time pipeline (in my case, 25 FPS).

Using DISK for aerial images

Thank you for your amazing work.
I am using DISK quite successfully on aerial images, however, these mostly have high resolutions (talking about 16000 x 16000 pixels).
I am using the original images as .tif files in your proposed inference workflow.

If I use python detect.py --height 3200 --width 3200 (the maximum for our GPUs), will the final keypoints still be accurate with respect to the original, full-resolution image?
Or would you propose a different workflow?

Question about the training time

This is a great study. I'd like to ask the author, if I use a V100 GPU to train DISK from scratch, approximately how long would it take? I hope to have a reference, thank you very much.

AttributeError: 'JpegImageFile' object has no attribute 'getexif'

Hi and thanks for your work!
I ran your "Exporting to COLMAP" pipeline.
When I used the script h5_to_db.py I ran into the following issue:

Traceback (most recent call last):
File "colmap/h5_to_db.py", line 145, in
fname_to_id = add_keypoints(db, args.h5_path, args.image_path)
File "colmap/h5_to_db.py", line 72, in add_keypoints
camera_id = create_camera(db, path)
File "colmap/h5_to_db.py", line 40, in create_camera
focal = get_focal(image_path)
File "colmap/h5_to_db.py", line 12, in get_focal
exif = image.getexif()
AttributeError: 'JpegImageFile' object has no attribute 'getexif'

So I changed line 12 to exif = image._getexif() (added an underscore) and the pipeline worked. Maybe it is helpful for others.

dataset request

Excellent work! I would like to conduct further research based on this work. However, the official website for the MegaDepth DISK dataset has been shut down, making it impossible to download. Could anyone share the MegaDepth DISK dataset? I am willing to pay for it.

Get AttributeError: 'tuple' object has no attribute 'encode' from `match.py`

Could you please give some help to solve the following problem?

I have tested the matching with two images.

In /mnt/HDD4TB2/Disk-NIPS2020/H5_destination ... there are three files generated by detect.py:

- matches.h5
- keypoints.h5
- descriptors.h5

I have executed the following command:

python3 match.py /mnt/HDD4TB2/Disk-NIPS2020/H5_destination

However, I have received the following problem:

Processing /mnt/HDD4TB2/Disk-NIPS2020/H5_destination with DEV=cuda
0it [00:00, ?it/s]Traceback (most recent call last):
  File "match.py", line 194, in <module>
    brute_match(described_samples, hdf)
  File "match.py", line 149, in brute_match
    desc_1 = descriptors[key_1].to(DEV)
  File "match.py", line 68, in __getitem__
    descriptors = self.ds_file[ix][()]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/xxxxxx/anaconda3/envs/SGAT/lib/python3.6/site-packages/h5py/_hl/group.py", line 287, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "/home/xxxxxx/anaconda3/envs/SGAT/lib/python3.6/site-packages/h5py/_hl/base.py", line 200, in _e
    name = name.encode('ascii')
AttributeError: 'tuple' object has no attribute 'encode'
0it [00:00, ?it/s]

No model 'disk' saved in depth-save.pth

Hi, thanks for your great work.

I get an error when running detect.py; it seems that unet.path_down.0.1.1.weight is missing:

Missing key(s) in state_dict: "unet.path_down.0.1.1.weight".

Then I tried to read the state_dict and found that only a model named 'extractor' exists. Will you provide the model named 'disk'?

question about the training result

Hello~ I wonder whether the performance of the final models would be similar or different each time I train the network from scratch. Have you ever evaluated this?
Thanks.

environment

Hello, what are the specific version numbers of the libraries in requirements.txt?
