
r2d2's Introduction

R2D2: Reliable and Repeatable Detector and Descriptor

This repository contains the implementation of the following paper:

@inproceedings{r2d2,
  author    = {Jerome Revaud and Philippe Weinzaepfel and C{\'{e}}sar Roberto de Souza and
               Martin Humenberger},
  title     = {{R2D2:} Repeatable and Reliable Detector and Descriptor},
  booktitle = {NeurIPS},
  year      = {2019},
}

Fast-R2D2

This repository also contains the code needed to train and extract Fast-R2D2 keypoints. Fast-R2D2 is a revised version of R2D2 that is significantly faster and uses less memory, yet achieves the same order of precision as the original network.

License

Our code is released under the Creative Commons BY-NC-SA 3.0 license (see LICENSE for more details) and is available for non-commercial use only.

Getting started

You just need Python 3.6+ with the standard scientific packages and PyTorch 1.1+. Typically, conda is one of the easiest ways to get started:

conda install python tqdm pillow numpy matplotlib scipy
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

Pretrained models

For your convenience, we provide five pre-trained models in the models/ folder:

  • r2d2_WAF_N16.pt: this is the model used in most experiments of the paper (on HPatches MMA@3=0.686). It was trained with Web images (W), Aachen day-time images (A) and Aachen optical flow pairs (F)
  • r2d2_WASF_N16.pt: this is the model used in the visual localization experiments (on HPatches MMA@3=0.721). It was trained with Web images (W), Aachen day-time images (A), Aachen day-night synthetic pairs (S), and Aachen optical flow pairs (F).
  • r2d2_WASF_N8_big.pt: same as the previous model, but trained with N=8 instead of N=16 in the repeatability loss. In other words, it outputs a higher density of keypoints. This can be interesting for certain applications like visual localization, but it implies a drop in MMA since keypoints get slightly less reliable.
  • faster2d2_WASF_N16.pt: The Fast-R2D2 equivalent of r2d2_WASF_N16.pt
  • faster2d2_WASF_N8_big.pt: The Fast-R2D2 equivalent of r2d2_WASF_N8_big.pt

For more details about the training data, see the dedicated section below. Here is a table that summarizes the performance of each model:

| model name | model size (#weights) | number of keypoints | MMA@3 on HPatches |
|---|---|---|---|
| r2d2_WAF_N16.pt | 0.5M | 5K | 0.686 |
| r2d2_WASF_N16.pt | 0.5M | 5K | 0.721 |
| r2d2_WASF_N8_big.pt | 1.0M | 10K | 0.692 |
| faster2d2_WASF_N8_big.pt | 1.0M | 5K | 0.650 |

Feature extraction

To extract keypoints for a given image, simply execute:

python extract.py --model models/r2d2_WASF_N16.pt --images imgs/brooklyn.png --top-k 5000

This also works for multiple images (separated by spaces) or a .txt image list. For each image, this will save the top-k keypoints in a file with the same path as the image and a .r2d2 extension. For example, they will be saved in imgs/brooklyn.png.r2d2 for the sample command above.

The keypoint file is in the npz numpy format and contains 3 fields:

  • keypoints (N x 3): keypoint position (x, y and scale). Scale denotes here the patch diameters in pixels.
  • descriptors (N x 128): l2-normalized descriptors.
  • scores (N): keypoint scores (the higher the better).
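For reference, here is a minimal sketch of loading such a file with NumPy (the field names simply follow the list above):

import numpy as np

# Load the keypoint file written by extract.py (a standard .npz archive).
feats = np.load('imgs/brooklyn.png.r2d2')

keypoints = feats['keypoints']      # (N, 3): x, y, scale (patch diameter in pixels)
descriptors = feats['descriptors']  # (N, 128): L2-normalized descriptors
scores = feats['scores']            # (N,): keypoint scores, higher is better

print(keypoints.shape, descriptors.shape, scores.shape)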

Note: You can modify the extraction parameters (scale factor, scale range...). Run python extract.py --help for more information. By default, they correspond to what is used in the paper, i.e., a scale factor equal to 2^0.25 (--scale-f 1.189207) and image sizes in the range [256, 1024] (--min-size 256 --max-size 1024).

Note2: You can significantly improve the MMA@3 score (by ~4 pts) if you can afford more computations. To do so, you just need to increase the upper-limit on the scale range by replacing --min-size 256 --max-size 1024 with --min-size 0 --max-size 9999 --min-scale 0.3 --max-scale 1.0.

Feature extraction with kapture datasets

Kapture is a pivot file format, based on text and binary files, used to describe SFM (Structure From Motion) and more generally sensor-acquired data.

It is available at https://github.com/naver/kapture. It contains conversion tools for popular formats and several popular datasets are directly available in kapture.

It can be installed with:

pip install kapture

Datasets can be downloaded with:

kapture_download_dataset.py update
kapture_download_dataset.py list
# e.g.: install mapping and query of Extended-CMU-Seasons_slice22
kapture_download_dataset.py install "Extended-CMU-Seasons_slice22_*"

If you want to convert your own dataset into kapture, please find some examples here.

Once installed, you can extract keypoints for your kapture dataset with:

python extract_kapture.py --model models/r2d2_WASF_N16.pt --kapture-root pathto/yourkapturedataset --top-k 5000

Run python extract_kapture.py --help for more information on the extraction parameters.

Evaluation on HPatches

The evaluation is based on the code from D2-Net.

git clone https://github.com/mihaidusmanu/d2-net.git
cd d2-net/hpatches_sequences/
bash download.sh
bash download_cache.sh
cd ../..
ln -s d2-net/hpatches_sequences # finally create a soft-link

Once this is done, extract all the features:

python extract.py --model models/r2d2_WAF_N16.pt --images d2-net/image_list_hpatches_sequences.txt

Finally, evaluate using the IPython notebook d2-net/hpatches_sequences/HPatches-Sequences-Matching-Benchmark.ipynb. You should normally get the following MMA plot: [MMA plot image]

New: we have uploaded some pre-computed plots to the results/ folder; you can visualize them using the aforementioned IPython notebook from d2-net (you need to place them in the d2-net/hpatches_sequences/cache/ folder).

  • r2d2_*_N16.size-256-1024.npy: keypoints were extracted using a limited image resolution (i.e. with python extract.py --min-size 256 --max-size 1024 ...)
  • r2d2_*_N16.scale-0.3-1.npy: keypoints were extracted using a full image resolution (i.e. with python extract.py --min-size 0 --max-size 9999 --min-scale 0.3 --max-scale 1.0).

Here is a summary of the results:

| result file | training set | resolution | MMA@3 on HPatches | note |
|---|---|---|---|---|
| r2d2_W_N16.scale-0.3-1.npy | W only | full | 0.699 | no annotation whatsoever |
| r2d2_WAF_N16.size-256-1024.npy | W+A+F | 1024 px | 0.686 | as in NeurIPS paper |
| r2d2_WAF_N16.scale-0.3-1.npy | W+A+F | full | 0.718 | +3.2% just from resolution |
| r2d2_WASF_N16.size-256-1024.npy | W+A+S+F | 1024 px | 0.721 | with style transfer |
| r2d2_WASF_N16.scale-0.3-1.npy | W+A+S+F | full | 0.758 | +3.7% just from resolution |

Evaluation on visuallocalization.net

In our paper, we report visual localization results on the Aachen Day-Night dataset (nighttime images) available at visuallocalization.net. We used the local feature evaluation pipeline provided here: https://github.com/tsattler/visuallocalizationbenchmark/tree/master/local_feature_evaluation. In the meantime, the ground-truth poses as well as the error thresholds of the Aachen nighttime images (which are used for the local feature evaluation) have been improved and changed on the website; thus, the original results reported in the paper can no longer be reproduced.

Training the model

We provide all the code and data to retrain the model as described in the paper.

Downloading training data

The first step is to download the training data. First, create a folder that will host all data in a place where you have sufficient disk space (15 GB required).

DATA_ROOT=/path/to/data
mkdir -p $DATA_ROOT
ln -fs $DATA_ROOT data 
mkdir $DATA_ROOT/aachen

Then, manually download the Aachen dataset here and save it as $DATA_ROOT/aachen/database_and_query_images.zip. Finally, execute the download script to complete the installation. It will download the remaining training data and will extract all files properly.

./download_training_data.sh

The following datasets are now installed:

| full name | tag | Disk | # imgs | # pairs | python instance |
|---|---|---|---|---|---|
| Random Web images | W | 2.7GB | 3125 | 3125 | auto_pairs(web_images) |
| Aachen DB images | A | 2.5GB | 4479 | 4479 | auto_pairs(aachen_db_images) |
| Aachen style transfer pairs | S | 0.3GB | 8115 | 3636 | aachen_style_transfer_pairs |
| Aachen optical flow pairs | F | 2.9GB | 4479 | 4770 | aachen_flow_pairs |

Note that you can visualize the content of each dataset using the following command:

python -m tools.dataloader "PairLoader(aachen_flow_pairs)"


Training details

To train the model, simply run this command:

python train.py --save-path /path/to/model.pt 

On a recent GPU, it takes 30 min per epoch, so ~12h for 25 epochs. You should get a model that scores 0.71 +/- 0.01 in MMA@3 on HPatches (this standard-deviation is similar to what is reported in Table 1 of the paper).

If you want to retrain fast-r2d2 architectures, run:

python train.py --save-path /path/to/fast-model.pt --net 'Fast_Quad_L2Net_ConfCFS()'

Note that you can fully configure the training (i.e. select the data sources, change the batch size, learning rate, number of epochs etc.). One easy way to improve the model is to train for more epochs, e.g. --epochs 50. For more details about all parameters, run python train.py --help.

r2d2's People

Contributors

humenbergerm, jerome-revaud, mhumenbe, yocabon


r2d2's Issues

Some questions of sampler.py

1.
pscores = (feat1[None,:,:] * feat2[b2, :, xy2p[1], xy2p[0]]).sum(dim=-1).t()

I can't understand the line above, which is used in sampler.py to compute the pscores.

2.
Is the mask obtained from the dataloader actually used? I found that the mask used by the loss comes from sampler.py.
I tried to pop 'mask' from the inputs of the net and it still worked.

Could someone explain this to me?
Thanks a lot!

How to extract patch descriptor only?

Hi,

Is there a way of getting a single descriptor out of a 32x32 patch, similar to HardNet or SIFT?
The reason is that I am interested in running the original, patch-based HPatches protocol.
I do realize that this is not the working mode R2D2 is optimized for, but I would like to have some results anyway.

If I do something like:

out = r2d2([torch.zeros(1,3,32,32)])['descriptors'][0]

then I get a [1,128,32,32] tensor, while I would like [1,128,1,1].
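(For illustration only, not an official mode of the repository: one way to collapse the dense 32x32 descriptor map into a single vector is to take the central descriptor or to average-pool the map and re-normalize. The sketch below assumes r2d2 is the loaded network from the line above.)

import torch
import torch.nn.functional as F

patch = torch.zeros(1, 3, 32, 32)                        # dummy 32x32 patch
out = r2d2([patch])['descriptors'][0]                    # (1, 128, 32, 32) dense descriptor map

center_desc = out[:, :, 16, 16]                          # option A: descriptor at the patch center, (1, 128)
pooled_desc = F.adaptive_avg_pool2d(out, 1).flatten(1)   # option B: average-pooled descriptor, (1, 128)
pooled_desc = F.normalize(pooled_desc, p=2, dim=1)       # re-normalize to unit length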

Could you release the calculation of matching performance, such as Repeatability and M-Score?

Thank you for releasing your code implementation. I enjoyed reading your paper and have a very good impression of the performance of R2D2.

It seems that your method clearly outperforms the others in terms of MMA, and it is nice that you have picked a similar evaluation technique to D2-Net. This makes it clear that your method is among the best. However, many recent papers appear to employ similar measurements, yet with minor differences tuned to their specific assumptions.

Therefore, I would like to kindly ask for an implementation of how you calculate Repeatability and M-Score. It would be really helpful because many people could use it as a reference (at least a Python version).

when is the loss_from_ap in the ReliabilityLoss class called ?

Hi,

I am wondering when loss_from_ap in the ReliabilityLoss class is called, and why there is a minus sign in the equation instead of a plus, as in Eq. (5) of the paper:

return 1 - ap*rel - (1-rel)*self.base

Also, could you please provide details about the input parameters and the functioning of NghSampler2?
Thanks

About matching score

Hello, your work is excellent and I would like to follow it. Could you release your test code for the matching score?

Why is R2D2 feature extraction slower than SuperPoint?

I notice that R2D2 has fewer weights than SuperPoint, but on my machine R2D2 takes longer to extract features than SuperPoint.
My machine has a GTX 1660 Ti GPU; SuperPoint takes about 0.03 ms to extract features, while R2D2 takes about 0.08 (not multi-scale). My input is a 640 * 480 picture.

Questions about the time cost of R2D2 feature extraction

Hi @jerome-revaud,

  1. I tried to extract R2D2 features using the default multi-scale settings, and for a 640*480 image it takes about 200 ms+ (about 4 scales). So I tried single scale on the same image, and it takes about 70 ms+. Is this time cost normal? And can I speed up the extraction process?

  2. I was surprised to find that extracting 2.5k, 5k or 10k keypoints takes the same time. Is this normal?

Thanks in advance! Really hope to get your reply!

Doubts about reliability performance in custom image

Hello. Thanks for releasing the code. It works amazingly well on completely new data as well!

While testing it out on difficult views of a building, this was the output that I got (after matching features):

[screenshot of matched features]

Focusing purely on the building (mostly on the right-hand side of each image), surprisingly, the features have been matched extremely well. However, since the patterns of the building are repeated, according to Figure 1 of your paper (and the reliability score), shouldn't the feature extraction stage prevent finding such repeated patterns in the image?

While these extremely good matches do suit my purpose, I was wondering if this in fact goes against what is stated in the paper, or if this behaviour is to be expected. If it is expected, could you elaborate a little more on the reliability of keypoints from this perspective?

Thanks

Note: I am not overly worried about the other 'wrong' matches in the image for now, since I understand the model has not been trained on anything remotely similar to such views/images, but the reliability on the building is my primary query.

Looking forward to more example programs, thanks again.

Thank you for your very significant work.
Could you explain how to match two images after extracting their features, e.g., finding a homography matrix and drawing connections between the keypoints of the two images? Looking forward to more example programs, thanks again.
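(For illustration only, not part of the repository: a minimal sketch of matching two .r2d2 files with mutual nearest neighbours and estimating a homography with OpenCV, assuming the .npz format described in the Feature extraction section; the file names are placeholders.)

import numpy as np
import cv2

def load_r2d2(path):
    # .r2d2 files are .npz archives with 'keypoints' (N x 3) and 'descriptors' (N x 128)
    f = np.load(path)
    return f['keypoints'][:, :2], f['descriptors']

kpts1, desc1 = load_r2d2('img1.png.r2d2')
kpts2, desc2 = load_r2d2('img2.png.r2d2')

# Mutual nearest-neighbour matching on the L2-normalized descriptors.
sim = desc1 @ desc2.T
nn12, nn21 = sim.argmax(1), sim.argmax(0)
mutual = nn21[nn12] == np.arange(len(nn12))
matches = np.stack([np.where(mutual)[0], nn12[mutual]], axis=1)

# Robust homography between the matched keypoints (3-pixel RANSAC threshold).
H, inlier_mask = cv2.findHomography(kpts1[matches[:, 0]], kpts2[matches[:, 1]],
                                    cv2.RANSAC, ransacReprojThreshold=3.0)
print('homography:\n', H, '\ninliers:', int(inlier_mask.sum()))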

Some details about Oxford dataset

Hi, sorry for bothering you. When I try to reproduce the network's performance on the Oxford dataset, I get stuck on the process of using a DoG detector to detect keypoints in an image. I am wondering whether the DoG detector was implemented by yourselves or whether you resorted to some open-source code (if so, could you please provide the link?). Thank you!!!

question about two losses

Hi @jerome-revaud ,
I'm quite interested in your two losses. You have tested the Aachen localization results with a limited number of keypoints; the results decrease by 3~6% (from 10k kpts to 2k), which seems acceptable. I'm curious whether the peakiness loss is what makes limited keypoints work, perhaps because it ensures discriminativeness within a patch?
If I want to localize with a limited number of keypoints, do you have any further suggestions/strategies that I can use?

thanks in advance!

Visualize heatmap

Hi, thank you for the significant work. I really enjoy your paper.

I would like to visualize the heatmaps of repeatability and reliability (as in Fig. 6). Could you please guide me on how to do it?

Thank you
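(A rough sketch of one way to do this, assuming the network's output dict exposes 'repeatability' and 'reliability' maps alongside 'descriptors', and that net and img denote the loaded model and a preprocessed 1x3xHxW tensor; the key names and shapes are assumptions, not confirmed by the repository.)

import torch
import matplotlib.pyplot as plt

with torch.no_grad():
    res = net([img])                                   # hypothetical: loaded R2D2 model and input tensor

rep = res['repeatability'][0][0, 0].cpu().numpy()      # assumed HxW repeatability map
rel = res['reliability'][0][0, 0].cpu().numpy()        # assumed HxW reliability map

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].imshow(rep, cmap='jet'); axes[0].set_title('repeatability'); axes[0].axis('off')
axes[1].imshow(rel, cmap='jet'); axes[1].set_title('reliability'); axes[1].axis('off')
plt.show()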

Generate image pairs using random homography

Hello!
I want to use my own data to fine-tune your model to better fit our scene, so I need to build my own training dataset. Could you please give me some advice on generating image pairs with the random homographies mentioned in your paper?
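(As an illustration of the general idea only, not the authors' pipeline: one can warp an image with a random homography obtained by jittering the four image corners and keep the homography as ground truth. The jitter amount below is an arbitrary choice.)

import numpy as np
import cv2

def random_homography_pair(img, max_shift=0.15):
    # Jitter the four corners by up to `max_shift` of the image size and fit a homography.
    h, w = img.shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = ((np.random.rand(4, 2) - 0.5) * 2 * max_shift * np.float32([w, h])).astype(np.float32)
    H = cv2.getPerspectiveTransform(corners, corners + jitter)
    warped = cv2.warpPerspective(img, H, (w, h))
    return warped, H

img = cv2.imread('my_image.png')          # placeholder image path
warped, H = random_homography_pair(img)
# A pixel (x, y) in `img` corresponds to the point H @ (x, y, 1), after dehomogenization, in `warped`.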

Matching-score

I used your r2d2_WASF_N16.pt to evaluate on HPatches and I get MMA@3 = 0.721, the same as your result, but I only get an M-score of 0.292. Can you tell me some details about your M-score evaluation code?

Reliability loss converges to 0.5 instead of 0.

Hello,

I am using your proposed framework to match historical and present-day satellite imagery. However, the reliability loss does not converge to 0 but to 0.5, which results in an AP of 0 and a reliability score of 0. I would guess this is due to the initial AP being too low.

Did you experience any similar behavior while testing? Or do you have an idea why this could happen?

Unable to extract features with pretrained models under conda

Hello,

we are trying to run the feature extraction part of the R2D2 project in our lab, but we are facing some issues with it.

We have set up everything under Anaconda, according to the project's GitHub README file.

Each time we want to run the feature extraction script, we receive the following error:

(r2d2) geza@labor10-HP-Z440-Workstation:~/r2d2$ python extract.py --model models/r2d2_WAF_N16.pt --images /home/geza/Asztal/KiszomborEXT_Canon_vs_Canon_v3/Canon/001/resized/4R1A2625.png --top-k 5000
Launching on GPUs 0

>> Creating net = Quad_L2Net_ConfCFS()
 ( Model size: 486K parameters )

Extracting features for /home/geza/Asztal/KiszomborEXT_Canon_vs_Canon_v3/Canon/001/resized/4R1A2625.png
Traceback (most recent call last):
  File "extract.py", line 182, in <module>
    extract_keypoints(args)
  File "extract.py", line 143, in extract_keypoints
    verbose = True)
  File "extract.py", line 97, in extract_multiscale
    img = F.interpolate(img, (nh,nw), mode='bilinear', align_corners=False)
  File "/home/geza/anaconda3/envs/r2d2/lib/python3.6/site-packages/torch/nn/functional.py", line 3013, in interpolate
    scale_factor_list[0], scale_factor_list[1])
RuntimeError: CUDA error: no kernel image is available for execution on the device

Our regular environment consists of an HP workstation, equipped with an NVIDIA Tesla K20c GPU.
Here are some environment info:

Architecture: x86_64
CPU: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
RAM: 32 GB
OS: Ubuntu 18.04.1 LTS (Bionic Beaver)
GPU: Tesla K20c and GeForce GT 710

NVIDIA driver info + SMI output:

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  418.87.00  Thu Aug  8 15:35:46 CDT 2019
GCC version:  gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

nvidia-smi

Wed Jul  1 10:53:20 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20c          On   | 00000000:05:00.0 Off |                    0 |
| 30%   37C    P8    17W / 225W |      0MiB /  4743MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GT 710      On   | 00000000:06:00.0 N/A |                  N/A |
| 50%   53C    P8    N/A /  N/A |    104MiB /   973MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1                    Not Supported                                       |
+-----------------------------------------------------------------------------+

Conda environment information:

conda info

     active environment : r2d2
    active env location : /home/geza/anaconda3/envs/r2d2
            shell level : 1
       user config file : /home/geza/.condarc
 populated config files : /home/geza/.condarc
          conda version : 4.8.3
    conda-build version : 3.18.9
         python version : 3.7.4.final.0
       virtual packages : __cuda=10.1
                          __glibc=2.27
       base environment : /home/geza/anaconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/geza/anaconda3/pkgs
                          /home/geza/.conda/pkgs
       envs directories : /home/geza/anaconda3/envs
                          /home/geza/.conda/envs
               platform : linux-64
             user-agent : conda/4.8.3 requests/2.23.0 CPython/3.7.4 Linux/4.15.0-108-generic ubuntu/18.04.1 glibc/2.27
                UID:GID : 1010:1010
             netrc file : None
           offline mode : False

Installed packages with version:

# packages in environment at /home/geza/anaconda3/envs/r2d2:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
blas                      1.0                         mkl
ca-certificates           2020.6.24                     0
certifi                   2020.6.20                py36_0
cudatoolkit               10.1.243             h6bb024c_0
cycler                    0.10.0                   py36_0
dbus                      1.13.16              hb2f20db_0
expat                     2.2.9                he6710b0_2
fontconfig                2.13.0               h9420a91_0
freetype                  2.10.2               h5ab3b9f_0
glib                      2.65.0               h3eb4bd4_0
gst-plugins-base          1.14.0               hbbd80ab_1
gstreamer                 1.14.0               hb31296c_0
icu                       58.2                 he6710b0_3
intel-openmp              2020.1                      217
jpeg                      9b                   h024ee3a_2
kiwisolver                1.2.0            py36hfd86e86_0
ld_impl_linux-64          2.33.1               h53a641e_7
libedit                   3.1.20191231         h7b6447c_0
libffi                    3.3                  he6710b0_1
libgcc-ng                 9.1.0                hdf63c60_0
libgfortran-ng            7.3.0                hdf63c60_0
libpng                    1.6.37               hbc83047_0
libstdcxx-ng              9.1.0                hdf63c60_0
libtiff                   4.1.0                h2733197_1
libuuid                   1.0.3                h1bed415_2
libxcb                    1.14                 h7b6447c_0
libxml2                   2.9.10               he19cac6_1
lz4-c                     1.9.2                he6710b0_0
matplotlib                3.2.2                         0
matplotlib-base           3.2.2            py36hef1b27d_0
mkl                       2020.1                      217
mkl-service               2.3.0            py36he904b0f_0
mkl_fft                   1.1.0            py36h23d657b_0
mkl_random                1.1.1            py36h0573a6f_0
ncurses                   6.2                  he6710b0_1
ninja                     1.9.0            py36hfd86e86_0
numpy                     1.18.5           py36ha1c710e_0
numpy-base                1.18.5           py36hde5b4d6_0
olefile                   0.46                     py36_0
openssl                   1.1.1g               h7b6447c_0
pcre                      8.44                 he6710b0_0
pillow                    7.1.2            py36hb39fc2d_0
pip                       20.1.1                   py36_1
pyparsing                 2.4.7                      py_0
pyqt                      5.9.2            py36h05f1152_2
python                    3.6.10               h7579374_2
python-dateutil           2.8.1                      py_0
pytorch                   1.5.1           py3.6_cuda10.1.243_cudnn7.6.3_0    pytorch
qt                        5.9.7                h5867ecd_1
readline                  8.0                  h7b6447c_0
scipy                     1.5.0            py36h0b6359f_0
setuptools                47.3.1                   py36_0
sip                       4.19.8           py36hf484d3e_0
six                       1.15.0                     py_0
sqlite                    3.32.3               h62c20be_0
tk                        8.6.10               hbc83047_0
torchvision               0.6.1                py36_cu101    pytorch
tornado                   6.0.4            py36h7b6447c_1
tqdm                      4.47.0                     py_0
wheel                     0.34.2                   py36_0
xz                        5.2.5                h7b6447c_0
zlib                      1.2.11               h7b6447c_3
zstd                      1.4.4                h0b5b093_3

The CUDA toolkit has recently been upgraded from version 10.0 to version 10.1. When the versions were mismatched, we were receiving errors too:

Traceback (most recent call last):
  File "extract.py", line 182, in <module>
    extract_keypoints(args)
  File "extract.py", line 112, in extract_keypoints
    iscuda = common.torch_set_gpu(args.gpu)
  File "/home/geza/r2d2/tools/common.py", line 32, in torch_set_gpu
    os.environ['HOSTNAME'],os.environ['CUDA_VISIBLE_DEVICES'])
  File "/home/geza/anaconda3/envs/r2d2/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'HOSTNAME'

We tried to run this on two machines (workstations) with identical configurations, and the errors were the same.

Is there something that we're missing?

Thank you for your answer in advance!

How to reproduce the detector repeatability on the Oxford dataset?

Hi,
I tested detector repeatability using the pretrained models and the Oxford dataset, but my results were not the same as the paper's (my results are half or less of those reported).

I followed the method of 'Scale & affine invariant interest point detectors', which considers that two points A and B correspond if the error in relative point location is less than 1.5 pixels.
(In Section 4.1, you said you followed this paper) (LINK)
A is a keypoint of image 1 and B is a point of image 2, 3, ...

Could you tell me how you ran the repeatability experiment, or could you upload the code for that experiment?
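(For reference, a bare-bones sketch of that 1.5-pixel criterion for a planar pair with a known 3x3 homography H, without any overlap or visibility filtering; this is not the authors' evaluation code.)

import numpy as np

def repeatability(kpts1, kpts2, H, thresh=1.5):
    # Warp keypoints of image 1 into image 2 with the homography H.
    pts = np.concatenate([kpts1[:, :2], np.ones((len(kpts1), 1))], axis=1)
    proj = (H @ pts.T).T
    proj = proj[:, :2] / proj[:, 2:3]
    # Distance from each warped point to its nearest keypoint in image 2.
    d = np.linalg.norm(proj[:, None, :] - kpts2[None, :, :2], axis=2)
    # Fraction of keypoints of image 1 with a neighbour within `thresh` pixels.
    return float(np.mean(d.min(axis=1) < thresh))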

Matching Score

Hi, can you provide the code for computing the Matching Score? I used the model in your code to compute it, but I only get 0.311 @ 3 pixels.

Codes for making datasets

Hi, thanks for your great work. The datasets you made for training are crucial to your work. Will you release the code you used to make the datasets, and more details about them? Thank you.

Testing

How do I test the code on a single pair of images?

code trouble

Thank you for your great work. Can you explain the details of line 104 in train.py? I really want to figure it out. Thank you very much.

How to generate .npy file ?

Thanks for your great work. I want to know how to generate the .npy file. I changed the model, so I want to evaluate on HPatches, but I don't know how to generate the .npy file. Can you tell me? Thanks again.

Learn from Correspondences instead of aflow

Hello,

While training the reliability AP loss, is it possible to feed correspondences instead of aflow? This is because the optical flow is mostly dense while the correspondences are sparse in nature. Have you tried using correspondences to train the network? Would this affect the sampler and the AP loss?
Thanks!
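(For what it's worth, a rough sketch of turning sparse correspondences into a dense absolute-flow map, leaving all unannotated pixels invalid; marking invalid pixels with NaN is an assumption about the aflow convention, not something confirmed here.)

import numpy as np

def correspondences_to_aflow(corrs, shape):
    # corrs: (N, 4) array of (x1, y1, x2, y2) pixel correspondences; shape: (H, W) of image 1.
    h, w = shape
    aflow = np.full((h, w, 2), np.nan, dtype=np.float32)   # NaN = no correspondence (assumed convention)
    x1 = np.clip(np.round(corrs[:, 0]).astype(int), 0, w - 1)
    y1 = np.clip(np.round(corrs[:, 1]).astype(int), 0, h - 1)
    aflow[y1, x1] = corrs[:, 2:4]                           # absolute target coordinates in image 2
    return aflow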

Images used for style transfer

Hello, I am interested in the images you used for transferring the style of an Aachen night image onto a day image. If I look at the provided dataset, the image names contain the name of the day image, but not the original name of the night image used for the style transfer. For example, the image '1031.jpg.st_3473.jpg' corresponds to the day image '1031.jpg', but I cannot find any '3473.jpg' in the set of night images. Could you tell me how I can find out which night images you used, please?

Also, which style transfer model did you use? I couldn't find it in the paper.

Thank you and kind regards

Reproduce the results in the paper

Hi,

Thanks for your excellent paper and code for the feature detection.
In this repo, you only released the MMA metric.
I want to test the pretrained model on repeatability, matching score and visual localization.
Could you tell me the specific details of the repeatability and matching score metrics?
And could you tell me which localization pipeline was used in your experiments?

Is the model structure in the code the same as described in the paper?

While debugging the code in train.py, I found the model structure of Quad_L2Net as follows:

ModuleList(
(0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
(8): ReLU(inplace=True)
(9): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
(10): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
(11): ReLU(inplace=True)
(12): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
(13): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
(14): ReLU(inplace=True)
(15): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(4, 4), dilation=(4, 4))
(16): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
(17): ReLU(inplace=True)
(18): Conv2d(128, 128, kernel_size=(2, 2), stride=(1, 1), padding=(2, 2), dilation=(4, 4))
(19): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
(20): Conv2d(128, 128, kernel_size=(2, 2), stride=(1, 1), padding=(4, 4), dilation=(8, 8))
(21): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
(22): Conv2d(128, 128, kernel_size=(2, 2), stride=(1, 1), padding=(8, 8), dilation=(16, 16))
)

Is dilated convolution used only at the green arrows in Figure 2 of the paper? In the model above, the later convolutional layers all use dilated convolutions, and the dilation value keeps doubling.
1) Is the structure of the model I inspected correct?
2) How should I understand Figure 2 in the paper?

Welcome to the Learning Group!!!

To discuss and apply the wonderful R2D2 model, we have set up a learning group. Feel free to add me on WeChat (BaichuanHuang) and you will be invited into the group.

What if I want it work in a low resolution situation?

This is more of a general question...
Considering the problem I am working on (light fields), I wish to use R2D2 with low-resolution images (around 40 by 40, and the images are supposed to have more than 60% overlap), but it seems hard for a feature extractor to work at such low resolution.
Any suggestions on how to tweak it?

About the difference between MMA and M-Score

Hi @jerome-revaud, I tried to understand the difference between MMA and M-Score, and they really look the same:
{putative matches considered correct (< threshold)} / {putative matches (with one covisible kpt)}

Could you post the implementation of the metrics?
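(For context, a simplified sketch of MMA at a given threshold for one planar pair with ground-truth homography H, using mutual nearest-neighbour matches; this is an illustration, not the paper's exact protocol.)

import numpy as np

def mma(kpts1, desc1, kpts2, desc2, H, thresh=3.0):
    # Mutual nearest-neighbour matches on L2-normalized descriptors.
    sim = desc1 @ desc2.T
    nn12, nn21 = sim.argmax(1), sim.argmax(0)
    mutual = np.where(nn21[nn12] == np.arange(len(nn12)))[0]
    # Warp the matched keypoints of image 1 into image 2 with the homography H.
    pts = np.concatenate([kpts1[mutual, :2], np.ones((len(mutual), 1))], axis=1)
    proj = (H @ pts.T).T
    proj = proj[:, :2] / proj[:, 2:3]
    # A match is correct if its reprojection error is below the threshold.
    err = np.linalg.norm(proj - kpts2[nn12[mutual], :2], axis=1)
    return float(np.mean(err < thresh))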

Getting NaN at reliability loss occasionally during training

Could you please suggest how to solve this problem and why it happens?

During training, I occasionally get NaN in the reliability loss; this happens more often when the batch size is set to a small number such as 1 or 2 (threads = 1).
Here is a screenshot of when it happens: [screenshot]

I have used the default settings in train.py, except that batch size = 1. My computer does not have enough GPU memory when batch size > 4.

Initially, I suspected that this problem happens due to a lack of corresponding pixels between the two images. Therefore, I tried to skip any sample that causes a NaN loss by forcing MyTrainer.forward_backward() in train.py to return before calling loss.backward() (as shown in the screenshot below), adding a "continue" and a print(details) in the if-condition at line 55, and then continued training until it finished 25 epochs.

[screenshot]

However, my trained R2D2 WASF N16 has a drop in MMA performance, as shown in the attached plot.
Here, I have used the default settings with WASF N16 and min_size = 256, max_size = 1024.

The performance of my trained model is denoted as R2D2 WASF N16 (self-trained) and
is compared with the downloaded pretrained r2d2_WASF_N16.pt, at the same feature extraction settings.

But the performance drop is quite obvious:
the MMA @ 3px is 0.67 instead of 0.72 +/- 0.01.

[MMA plot]

Explanation of this implementation

Hi, thanks for the great repo! I'm interested in finetuning the model for my research and I'm trying to understand your implementation here:

scale = np.sqrt(np.clip(np.abs(dx[1]*dy[0] - dx[0]*dy[1]), 1e-16, 1e16))

  1. What is this line doing?

  2. The comment above it says "applying a median filter", but I didn't see a median filter afterwards. Is this comment outdated?

  3. for _ in range(50*self.n_samples):
    It seems here you are doing some window selection based on the best optical-flow quality. Is this part necessary? Does it make a big difference compared with randomly sampling a valid window inside the image region (without the score comparison)?

Request for the image data of Zero

Hello,
Thanks for your great work!
However, when I try to download the images of the Zero dataset from image_url, I sometimes come across a '404 error', and it really takes a very long time to download them from the website links.
So could you provide the images of the Zero dataset directly?

Pipeline for Optical Flow

Hello,
I am quite curious about the implementation of the optical-flow data generation. My goal is to apply this process to another dataset. In the paper, under Section 3.3 Inference and training details > Ground-truth correspondences, the process is roughly described. Is the pipeline also available as open source?

Update: Following the citations in that section, I came across a confusing situation. In the R2D2 paper, EpicFlow (cited as [40]) and DeepMatching (cited as [41]) are referenced. In the EpicFlow paper, the authors refer to another paper as DeepMatching (cited as [34] there). Specifically:
The DeepMatching paper cited in R2D2:
DeepMatching: Hierarchical Deformable Dense Matching
The DeepMatching paper cited in EpicFlow:
DeepFlow: Large Displacement Optical Flow with Deep Matching
I am guessing that you wanted to cite DeepFlow instead. :) I have yet to make a deep dive into those papers; maybe they are about the exact same thing, since all the authors are the same. Just wanted to share this interesting situation :)

Best regards,
Kagan

Confusion about the different architectures model size

Hi there

In the README file there are details about some of the pretrained models, including a column about the network size (# of parameters).

| model name | model size (#weights) | number of keypoints | MMA@3 on HPatches |
|---|---|---|---|
| r2d2_WAF_N16.pt | 0.5M | 5K | 0.686 |
| r2d2_WASF_N16.pt | 0.5M | 5K | 0.721 |
| r2d2_WASF_N8_big.pt | 1.0M | 10K | 0.692 |
| faster2d2_WASF_N8_big.pt | 1.0M | 5K | 0.650 |

When I check the code, the number of parameters of Fast-R2D2 is about 0.5M, not 1.0M. Please clarify this.
Thanks
