
Comments (4)

TheProjectsGuy avatar TheProjectsGuy commented on May 27, 2024

Hey @euncheolChoi
Thanks for taking an interest in our work. I'll first review the vocabulary terminology, then answer your two questions.

Vocabulary Terminology

A vocabulary is a "superset" of images that we use to build the database descriptors (for the VLAD aggregation technique). By superset, I mean a collection of datasets. We experiment with the following vocabulary types:

  • Global: where we include the database images from all the datasets (in Table III and IV).
  • Structured: where we include the database images from only the structured datasets (in Table III)
  • Unstructured: where we include the database images from only the unstructured datasets (in Table IV)
  • Map-Specific: where we include the database images from only a single dataset (the one we're testing on). For example, if we're testing on Oxford, we "train" VLAD with only images from the Oxford database set (no other dataset). This can also be read as "dataset-specific".
  • Domain-Specific: A "domain" is a collection of datasets with similar properties. We find this through tSNE and PCA projections (Figure 1, for example). We color-coded these domains in the paper. For example, the "Urban" domain comprises the Pitts-30k, St. Lucia, and Oxford datasets; the "Aerial" domain contains the Nardo Air datasets and the VP-Air dataset. In the paper, we project the GeM-pooled descriptors.

Unless specified otherwise, the results in Tables III and IV use the domain-specific vocabularies. We found these to give the best performance (see Table V).

This is further described in Sections III.D, V.B, and A2 of our paper on arXiv.
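To make the vocabulary's role concrete, here is a minimal, self-contained sketch of VLAD aggregation (hard assignment with per-cluster intra-normalization, which is standard for VLAD). This is an illustration only, not the repo's implementation; the random cluster centers stand in for a domain vocabulary:

```python
import torch
import torch.nn.functional as F

def vlad_aggregate(local_descs, centers):
    """Aggregate local descriptors into a VLAD vector.

    local_descs: (N, D) local features from one image
    centers:     (K, D) vocabulary cluster centers
    Returns a flattened, L2-normalized (K*D,) descriptor.
    """
    # Hard-assign each local descriptor to its nearest center
    dists = torch.cdist(local_descs, centers)   # (N, K)
    assign = dists.argmin(dim=1)                # (N,)
    K, D = centers.shape
    vlad = torch.zeros(K, D)
    for k in range(K):
        members = local_descs[assign == k]
        if len(members) > 0:
            # Sum of residuals to the center, intra-normalized per cluster
            vlad[k] = F.normalize((members - centers[k]).sum(dim=0), dim=0)
    return F.normalize(vlad.flatten(), dim=0)

# Example: 256 local descriptors of dim 1536 against 32 clusters
descs = torch.randn(256, 1536)
centers = torch.randn(32, 1536)
v = vlad_aggregate(descs, centers)
print(v.shape)  # torch.Size([49152])
```

Note how the 49152-dim descriptor size in the torch.hub examples below follows directly from 32 clusters times the 1536-dim DINOv2 ViT-G/14 features.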

Getting Vocabularies

How can I get the vocabularies made by the AnyLoc method?

You could do it in either of two ways; I suggest the first.

Method 1: Use the torch.hub model

This is still in beta but should suit your needs for benchmarking. See issue #11 for more details. A simple tutorial follows:

# Load model
import torch
model = torch.hub.load("AnyLoc/DINO", "get_vlad_model", 
        backbone="DINOv2", device="cuda")
# Images
img = torch.rand(1, 3, 224, 224)
# Result: VLAD descriptors of shape [1, 49152]
res = model(img)
# Also supports batching
img = torch.rand(16, 3, 224, 224)
# Result: VLAD descriptors of shape [16, 49152]
res = model(img)
# More help
print(torch.hub.list("AnyLoc/DINO"))
r = torch.hub.help("AnyLoc/DINO", "get_vlad_model")
print(r)

The default is the indoor domain (since we show major improvement there, and it's from the more widely available structured set). However, you can use the aerial domain by loading

model = torch.hub.load("AnyLoc/DINO", "get_vlad_model", 
        backbone="DINOv2", domain="aerial", device="cuda")

The above will load the model we used for the aerial dataset columns (Nardo Air and VP-Air) in Table IV.
This method works for our paper's indoor, urban, and aerial domains.
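Once you have global descriptors, place recognition reduces to nearest-neighbor search over the database. A minimal sketch with random placeholder descriptors (in practice, db and qu would come from model(img) as above):

```python
import torch
import torch.nn.functional as F

# Placeholder database and query descriptors; in practice these come
# from model(img) and are already L2-normalized global descriptors
db = F.normalize(torch.randn(100, 49152), dim=1)
qu = F.normalize(torch.randn(5, 49152), dim=1)

# Cosine similarity: with L2-normalized vectors a dot product suffices
sims = qu @ db.T                          # (5, 100)
top1 = sims.argmax(dim=1)                 # best database match per query
topk = sims.topk(k=5, dim=1).indices      # top-5 candidates per query
print(top1.shape, topk.shape)
```

For large databases you would typically replace the dense matrix product with an approximate nearest-neighbor index, but the retrieval logic is the same.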

Method 2: Using the repo's codebase

I suggest this only if you're going to replicate the paper's results rather than benchmark your own data, because the repo setup is more tedious and requires setting up a container.

  1. Set up the repository and the datasets as described here

  2. Create the cluster centers using dino_v2_global_vocab_vlad.py. The call to the script is documented in dino_v2_global_vocab_vlad_ablations.sh; it creates a GlobalVLADVocabularyDataset object and loads the datasets accordingly. Note that you must run the scripts from the repo's home folder (instead of cd-ing into the ./scripts folder) so that the script can find the repo's other files and utilities. The following set of arguments gives a more accurate picture of which datasets are used for which vocabularies:

    if [ "$global_vocab" == "indoor" ]; then
        python_cmd+=" --db-samples.baidu-datasets 1"
        python_cmd+=" --db-samples.gardens 1"
        python_cmd+=" --db-samples.17places 1"
    elif [ "$global_vocab" == "urban" ]; then
        python_cmd+=" --db-samples.Oxford 1"
        python_cmd+=" --db-samples.st-lucia 1"
        python_cmd+=" --db-samples.pitts30k 4"
    elif [ "$global_vocab" == "aerial" ]; then
        python_cmd+=" --db-samples.Tartan-GNSS-test-rotated 1"
        python_cmd+=" --db-samples.Tartan-GNSS-test-notrotated 1"
        python_cmd+=" --db-samples.VPAir 2"
    elif [ "$global_vocab" == "hawkins" ]; then
        python_cmd+=" --db-samples.hawkins 1"
    elif [ "$global_vocab" == "laurel_caverns" ]; then
        python_cmd+=" --db-samples.laurel-caverns 1"
    elif [ "$global_vocab" == "structured" ]; then
        python_cmd+=" --db-samples.Oxford 1"
        python_cmd+=" --db-samples.gardens 1"
        python_cmd+=" --db-samples.17places 1"
        python_cmd+=" --db-samples.baidu-datasets 1"
        python_cmd+=" --db-samples.st-lucia 1"
        python_cmd+=" --db-samples.pitts30k 4"
    elif [ "$global_vocab" == "unstructured" ]; then
        python_cmd+=" --db-samples.Tartan-GNSS-test-rotated 1"
        python_cmd+=" --db-samples.Tartan-GNSS-test-notrotated 1"
        python_cmd+=" --db-samples.hawkins 1"
        python_cmd+=" --db-samples.laurel-caverns 1"
        python_cmd+=" --db-samples.eiffel 1"
        python_cmd+=" --db-samples.VPAir 2"
    elif [ "$global_vocab" == "both" ]; then # Global vocabulary
        # Structured
        python_cmd+=" --db-samples.Oxford 1"
        python_cmd+=" --db-samples.gardens 1"
        python_cmd+=" --db-samples.17places 1"
        python_cmd+=" --db-samples.baidu-datasets 1"
        python_cmd+=" --db-samples.st-lucia 1"
        python_cmd+=" --db-samples.pitts30k 4"
        # Unstructured
        python_cmd+=" --db-samples.Tartan-GNSS-test-rotated 1"
        python_cmd+=" --db-samples.Tartan-GNSS-test-notrotated 1"
        python_cmd+=" --db-samples.hawkins 1"
        python_cmd+=" --db-samples.laurel-caverns 1"
        python_cmd+=" --db-samples.eiffel 1"
        python_cmd+=" --db-samples.VPAir 2"
    else
        echo "Invalid global vocab!"
        exit 1
    fi

  3. When running the above script, you might want to "test" it on a small dataset first (the bulk of the time is spent computing and caching the cluster centers). You can later call it with the same arguments to test on other datasets with the same vocabulary; the cluster centers will then be loaded from the cache (if caching is enabled).

If you do not want to run step 2 (due to data or compute constraints), we also release the cluster centers in the public material. Download and unzip the Colab1/cache.zip file and navigate to the ./cache/vocabulary/dinov2_vitg14/l31_value_c32 folder; you'll find a c_centers.pt file in each of the listed vocabulary folders. You could also get these from the torch.hub release.
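As a rough illustration of loading and sanity-checking a downloaded c_centers.pt: the folder layout below follows the path quoted above, the load_centers helper is hypothetical (not part of the repo), and the (32, 1536) shape assumes 32 clusters of ViT-G/14 features — verify it against your actual download. The demo saves dummy centers so the snippet is self-contained:

```python
import os
import tempfile
import torch

def load_centers(cache_dir, domain,
                 ext_specifier="dinov2_vitg14/l31_value_c32"):
    """Load cached VLAD cluster centers for a domain vocabulary.
    (Hypothetical helper mirroring the cache layout described above.)"""
    path = os.path.join(cache_dir, "vocabulary", ext_specifier,
                        domain, "c_centers.pt")
    return torch.load(path, map_location="cpu")

# Round-trip demo with dummy centers (32 clusters, 1536-dim features)
root = tempfile.mkdtemp()
d = os.path.join(root, "vocabulary", "dinov2_vitg14/l31_value_c32", "aerial")
os.makedirs(d)
torch.save(torch.randn(32, 1536), os.path.join(d, "c_centers.pt"))
print(load_centers(root, "aerial").shape)  # torch.Size([32, 1536])
```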

You can visualize the tSNE and PCA clusters by downloading the datasets and using the scripts dino_v2_datasets_gem_tsne_clustering.py and dino_v2_datasets_gem_pca_clustering.py, respectively. See this for other methods/scripts.
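For intuition, the projection step can be reproduced in miniature. The sketch below GeM-pools dummy dense features and projects the pooled descriptors to 2D with PCA via torch.pca_lowrank; the paper's scripts do this on real data (and use tSNE as well), so treat this only as an outline of the idea:

```python
import torch

def gem_pool(feat_map, p=3.0, eps=1e-6):
    """GeM pooling: generalized mean over spatial locations.
    feat_map: (B, D, H, W) dense features -> (B, D) global descriptor."""
    return feat_map.clamp(min=eps).pow(p).mean(dim=(2, 3)).pow(1.0 / p)

torch.manual_seed(0)
# Three dummy "datasets" of 50 images each, shifted so they separate
pooled = torch.cat([gem_pool(torch.rand(50, 1536, 16, 16)) + i
                    for i in range(3)])          # (150, 1536)

# Project to 2D with PCA; similar datasets should cluster together
U, S, V = torch.pca_lowrank(pooled, q=2)
proj = pooled @ V[:, :2]                          # (150, 2) scatter points
print(proj.shape)
```

Each row of proj is one image's 2D coordinate; coloring points by dataset reproduces the kind of domain grouping shown in Figure 1.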

Vocabulary used in Table IV

In Table IV of the paper, the recall values are calculated for each dataset. For AnyLoc-VLAD-DINOv2, I am curious which vocabulary was used to obtain each of these results. In particular, I am interested in the results for the datasets in the Aerial domain.

The Hawkins, Laurel Caverns, and Mid-Atlantic Ridge results use the respective datasets alone. We used map-specific vocabularies here because we only tested with one dataset from each of their domains. If, for example, we wanted to include another underwater dataset (say, DB-A), we would compute the cluster centers using both DB-A and Mid-Atlantic Ridge, since those two datasets would belong to the "underwater" domain. Some code changes (specifically in step 2 of Method 2 above) would be needed for this to work, and you would also need to write a Dataset class for the new DB-A dataset in the custom_datasets folder.
Note that although Hawkins (degraded) and Laurel Caverns (subterranean) have similar imagery (camera properties and a lack of distinct features), they are projected to different places in Figure 1.

The results for Nardo-Air, Nardo-Air R, and VP-Air use the "aerial" domain vocabulary. All the database images from these datasets are used (actually, every second image from VP-Air, to avoid out-of-memory errors and to counter VP-Air's class imbalance). You can verify this here:

    elif [ "$global_vocab" == "aerial" ]; then
        python_cmd+=" --db-samples.Tartan-GNSS-test-rotated 1"
        python_cmd+=" --db-samples.Tartan-GNSS-test-notrotated 1"
        python_cmd+=" --db-samples.VPAir 2"
    elif [ "$global_vocab" == "hawkins" ]; then
        python_cmd+=" --db-samples.hawkins 1"
    elif [ "$global_vocab" == "laurel_caverns" ]; then
        python_cmd+=" --db-samples.laurel-caverns 1"

Additionally, to get the results of AnyLoc-VLAD-DINOv2 after creating the cluster centers, can I use anyloc_vlad_generate.py?

After you get the cluster centers (assuming you're following Method 2 above), you'll have to point c_centers_file to the new .pt file here:

c_centers_file = os.path.join(cache_dir, "vocabulary",
ext_specifier, domain, "c_centers.pt")

Assuming you've configured the dataset directory, it should work fine. However, I recommend using Method 1 directly, as it doesn't require setting up the repository.

from anyloc.

euncheolChoi avatar euncheolChoi commented on May 27, 2024

Thank you so much for the detailed guidelines. Of the methods you mentioned, I am using
model = torch.hub.load("AnyLoc/DINO", "get_vlad_model", backbone="DINOv2", domain="aerial", device="cuda")
to load the model and extract the descriptors. However, I am getting the following error while loading the model. Is this a temporary error? If you have a solution, I would appreciate it if you could share it.

------------ Dataset loaded ------------
------- Generating global descriptors -------
Using model : Anyloc-dino-VLAD_aerial_domain
Using cache found in /root/.cache/torch/hub/AnyLoc_DINO_main
Exception: invalid syntax (, line 1)
[ERROR]: Exit is not safe
Traceback (most recent call last):
File "/root/workspace/aerial_pr/Anyloc_docker_aerial/AnyLoc/aeria_scripts/dino_v2_global_vpr.py", line 327, in
main(largs)
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/root/workspace/aerial_pr/Anyloc_docker_aerial/AnyLoc/aeria_scripts/dino_v2_global_vpr.py", line 239, in main
db_descs, qu_descs = build_cache(largs, gpr_ds)
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/root/workspace/aerial_pr/Anyloc_docker_aerial/AnyLoc/aeria_scripts/dino_v2_global_vpr.py", line 129, in build_cache
anyloc = torch.hub.load("AnyLoc/DINO", "get_vlad_model", backbone="DINOv2", device="cuda")
File "/opt/conda/lib/python3.7/site-packages/torch/hub.py", line 540, in load
model = _load_local(repo_or_dir, model, *args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/hub.py", line 566, in _load_local
hub_module = _import_module(MODULE_HUBCONF, hubconf_path)
File "/opt/conda/lib/python3.7/site-packages/torch/hub.py", line 89, in _import_module
spec.loader.exec_module(module)
File "", line 724, in exec_module
File "", line 860, in get_code
File "", line 791, in source_to_code
File "", line 219, in _call_with_frames_removed
File "", line 1
(backbone = )


TheProjectsGuy avatar TheProjectsGuy commented on May 27, 2024

Hey @euncheolChoi,

We haven't experienced this error before. Are you trying this from a clean install? Make sure your torch hub directory (usually ~/.cache/torch/hub) doesn't contain stale entries before trying again. Also ensure that you've activated a conda environment with Python and PyTorch installed.

# Install dependencies
conda install -c conda-forge einops
pip install fast_pytorch_kmeans

And you can run the following

import torch
# Load the model (this will download the vocabulary and the ViT model from torch.hub)
model = torch.hub.load("AnyLoc/DINO", "get_vlad_model", backbone="DINOv2", domain="aerial", device="cuda")
# Your images here
img = torch.rand(16, 3, 224, 224)
# Global descriptors
res = model(img)    # (16, 49152) dim

Are you following the same steps as above?

I get the following when I run this

Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> model = torch.hub.load("AnyLoc/DINO", "get_vlad_model", backbone="DINOv2", domain="aerial", device="cuda")
Using cache found in /home/avneesh/.cache/torch/hub/AnyLoc_DINO_main
Storing (torch.hub) cache in: /home/avneesh/.cache/torch/hub/checkpoints/anyloc_files
100.0%
Downloading: "https://github.com/facebookresearch/dinov2/zipball/main" to /home/avneesh/.cache/torch/hub/main.zip
/home/avneesh/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/swiglu_ffn.py:51: UserWarning: xFormers is not available (SwiGLU)
  warnings.warn("xFormers is not available (SwiGLU)")
/home/avneesh/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/attention.py:33: UserWarning: xFormers is not available (Attention)
  warnings.warn("xFormers is not available (Attention)")
/home/avneesh/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/block.py:40: UserWarning: xFormers is not available (Block)
  warnings.warn("xFormers is not available (Block)")
Downloading: "https://dl.fbaipublicfiles.com/dinov2/dinov2_vitg14/dinov2_vitg14_pretrain.pth" to /home/avneesh/.cache/torch/hub/checkpoints/dinov2_vitg14_pretrain.pth
100.0%
VLAD caching is disabled.
Desc dim set to 1536
>>> img = torch.rand(16, 3, 224, 224)
>>> res = model(img)
>>> res.shape
torch.Size([16, 49152])

This takes about 5.3 GB on the GPU.


euncheolChoi avatar euncheolChoi commented on May 27, 2024

Oh, it was just an issue with my environment. Now I can get the results. Thank you.

