sample4geo's People

Contributors

konradhabel, skyy93

sample4geo's Issues

Training results on the university-1652 dataset

Thank you for your reply.
May I ask which weight file you used for training? I used the weight file downloaded from Hugging Face; why is the final AP higher than the 91.39 reported in your paper? The results for two different weight files follow:
1. Model: convnext_base.fb_in22k_ft_in1k_384

{'input_size': (3, 224, 224), 'interpolation': 'bicubic', 'mean': (0.485, 0.456, 0.406), 'std': (0.229, 0.224, 0.225), 'crop_pct': 0.875, 'crop_mode': 'center'}
Start from: pretrained/university/convnext_base.fb_in22k_ft_in1k_384/weights_e1_0.9515.pth
GPUs available: 8

Image Size Query: (384, 384)
Image Size Ground: (384, 384)
Mean: (0.485, 0.456, 0.406)
Std: (0.229, 0.224, 0.225)

Query Images Test: 701
Gallery Images Test: 51355

------------------------------[University-1652]------------------------------
Extract Features:
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:04<00:00, 1.43it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 402/402 [01:43<00:00, 3.87it/s]
Compute Scores:
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 701/701 [00:03<00:00, 193.97it/s]
Recall@1: 95.1498 - Recall@5: 97.0043 - Recall@10: 97.4322 - Recall@top1: 99.2867 - AP: 91.3901

2. Model: convnext_base.fb_in22k_ft_in1k_384

{'input_size': (3, 224, 224), 'interpolation': 'bicubic', 'mean': (0.485, 0.456, 0.406), 'std': (0.229, 0.224, 0.225), 'crop_pct': 0.875, 'crop_mode': 'center'}
Start from: /slurm-files/sgy/code/Sample4Geo/pretrained/university/convnext_base.fb_in22k_ft_in1k_384/weights_e1_0.9501.pth
GPUs available: 8

Image Size Query: (384, 384)
Image Size Ground: (384, 384)
Mean: (0.485, 0.456, 0.406)
Std: (0.229, 0.224, 0.225)

Query Images Test: 701
Gallery Images Test: 51355

------------------------------[University-1652]------------------------------
Extract Features:
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:11<00:00, 2.00s/it]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 402/402 [03:41<00:00, 1.82it/s]
Compute Scores:
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 701/701 [00:06<00:00, 104.41it/s]
Recall@1: 95.0071 - Recall@5: 96.7190 - Recall@10: 97.1469 - Recall@top1: 99.2867 - AP: 91.6377
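The two runs above differ in opposite directions (Recall@1 is lower in the second run while AP is higher), which is possible because the two metrics measure different things. The sketch below is a minimal, self-contained illustration of Recall@k and a plain (non-interpolated) average precision for a single query. The function names and the toy ranking are assumptions for illustration; the repository's evaluation code may differ in details such as interpolation.

```python
def recall_at_k(ranked_labels, query_label, k):
    """Return 1.0 if any of the top-k gallery labels matches the query label."""
    return 1.0 if query_label in ranked_labels[:k] else 0.0


def average_precision(ranked_labels, query_label):
    """Mean of the precision values measured at each correct match."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy ranking for one query of building "0001"; gallery sorted by similarity.
ranking = ["0001", "0007", "0001", "0003", "0001"]
r1 = recall_at_k(ranking, "0001", 1)
ap = average_precision(ranking, "0001")
```

Recall@1 only looks at the first gallery entry, while AP depends on where every correct match lands in the ranking, so one can go up while the other goes down.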

Model performance about convnext_tiny model

Hello, thank you for your great work. We tested your method using the 'convnext_tiny.fb_in22k_ft_in1k_384' model with default parameters, training for 40 epochs. However, the model only reached around 40% mean average precision on the University-1652 dataset, which deviates from the results reported in your paper. Could this be due to parameter issues, such as the default setting of epoch to 1 in the training code?

About CVUSA

I tried training on CVUSA after applying the polar coordinate transformation, but the metrics were abnormally high, and I don't know why. This is very important to me, thank you!

About break_counter in dataset shuffle

Thanks for your implementation.
I would like to ask about break_counter. What does this variable mean? And why do different datasets use different values for ending the shuffle loop that avoids the same ID appearing twice in one batch?
Thanks for your time.
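For readers with the same question, here is a minimal sketch of how a shuffle with a break counter could work. It is not the repository's implementation: the function name, the `max_breaks` parameter, and the exact reset rule are assumptions. The idea is that a pair whose ID is already in the current batch is pushed back into the queue, and the counter tracks consecutive failed placements so the loop can stop once no remaining pair fits.

```python
import random
from collections import deque


def shuffle_unique_ids(pairs, batch_size, max_breaks=512, seed=0):
    """Order (sample, class_id) pairs so no class_id repeats inside a batch.

    max_breaks is a hypothetical name for the limit on consecutive failed
    placements before the loop gives up on filling the current batch.
    """
    rng = random.Random(seed)
    pool = list(pairs)
    rng.shuffle(pool)
    queue = deque(pool)

    batches, current, ids_in_batch = [], [], set()
    break_counter = 0
    while queue and break_counter < max_breaks:
        sample, class_id = queue.popleft()
        if class_id in ids_in_batch:
            # Same ID already in this batch: retry the pair later.
            queue.append((sample, class_id))
            break_counter += 1
            continue
        current.append((sample, class_id))
        ids_in_batch.add(class_id)
        break_counter = 0
        if len(current) == batch_size:
            batches.append(current)
            current, ids_in_batch = [], set()

    # Pairs in an incomplete last batch (and anything still queued) are dropped.
    left_out = len(current) + len(queue)
    return batches, left_out, break_counter


# 6 classes with 5 images each, batches of 4.
data = [(f"img_{c}_{i}", c) for c in range(6) for i in range(5)]
batches, left_out, breaks = shuffle_unique_ids(data, batch_size=4)
```

Under this reading, the limit only needs to be large enough that the loop does not give up while a valid pair is still in the queue, which would depend on how many images each ID has in a given dataset.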

How to get CVACT dataset

There is no link to the official CVACT dataset and the author's email is not working, can you share the CVACT dataset with me? Thank you very much!

About the temperature parameter τ of the InfoNCE loss

Thank you very much for your paper and code. I would like to ask about the InfoNCE loss:
does the temperature parameter τ have an initial value? According to the paper, it is a learnable parameter. How is it learned?
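A learnable temperature is usually just one more scalar parameter that receives gradients from the loss like any weight. The sketch below illustrates this in plain Python under stated assumptions: the starting value τ = 0.07 is only illustrative (the source does not state the repository's value), the scale is stored as log(1/τ) so it stays positive, and a finite-difference gradient stands in for autograd.

```python
import math


def info_nce(sim, logit_scale):
    """Symmetric InfoNCE over a square similarity matrix.

    sim[i][j] is the cosine similarity between query i and reference j;
    matching pairs sit on the diagonal. logit_scale plays the role of 1/tau.
    """
    n = len(sim)

    def cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            logits = [logit_scale * s for s in row]
            m = max(logits)
            log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
            total += log_sum - logits[i]
        return total / n

    cols = [list(col) for col in zip(*sim)]
    return 0.5 * (cross_entropy(sim) + cross_entropy(cols))


sim = [[0.9, 0.2, 0.1],
       [0.3, 0.8, 0.2],
       [0.1, 0.4, 0.7]]

# Assumed starting point: tau = 0.07, stored as log(1 / tau).
log_scale = math.log(1 / 0.07)
lr, eps = 0.1, 1e-5

losses = []
for step in range(5):
    loss = info_nce(sim, math.exp(log_scale))
    # Finite-difference gradient with respect to log_scale (autograd stand-in).
    grad = (info_nce(sim, math.exp(log_scale + eps)) - loss) / eps
    log_scale -= lr * grad
    losses.append(loss)
```

Because the matching pairs in this toy matrix already have the highest similarity in every row and column, gradient descent increases the scale (lowers τ) and the loss shrinks step by step.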

Training result error

Thank you for your wonderful work.
I encountered a problem during training: as the epochs increased, Recall@1 and AP actually decreased.
These are the outputs at different epochs:
------------------------------[Epoch: 1]------------------------------
100%|████████████████████████████████████████████████████████████████████████| 295/295 [04:40<00:00, 1.05it/s, loss=0.8647, loss_avg=1.0110, lr=0.000999]
Epoch: 1, Train Loss = 1.011, Lr = 0.000999

------------------------------[Evaluate]------------------------------
Extract Features:
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.08it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 402/402 [01:32<00:00, 4.35it/s]
Compute Scores:
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 701/701 [00:03<00:00, 197.38it/s]
Recall@1: 93.5806 - Recall@5: 96.5763 - Recall@10: 97.1469 - Recall@top1: 99.2867 - AP: 89.7420

Shuffle Dataset:
42428it [00:00, 224071.11it/s]
Original Length: 37854 - Length after Shuffle: 37760
Break Counter: 512
Pairs left out of last batch to avoid creating noise: 94
First Element ID: 0945 - Last Element ID: 1646

------------------------------[Epoch: 25]------------------------------
100%|████████████████████████████████████████████████████████████████████████| 295/295 [04:45<00:00, 1.04it/s, loss=0.8206, loss_avg=0.8082, lr=0.000313]
Epoch: 25, Train Loss = 0.808, Lr = 0.000313

------------------------------[Evaluate]------------------------------
Extract Features:
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.12it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 402/402 [01:38<00:00, 4.10it/s]
Compute Scores:
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 701/701 [00:03<00:00, 197.02it/s]
Recall@1: 77.8887 - Recall@5: 85.8773 - Recall@10: 88.7304 - Recall@top1: 98.5735 - AP: 54.7721

Shuffle Dataset:
42376it [00:00, 223377.40it/s]
Original Length: 37854 - Length after Shuffle: 37760
Break Counter: 512
Pairs left out of last batch to avoid creating noise: 94
First Element ID: 0956 - Last Element ID: 1375

Geo-Localization for Drone Images Using Cross-View Image Embeddings

Thanks for your amazing work!

Problem
When the model extracts embeddings, the positional information of the image is not retained. This makes it impossible to pinpoint the exact position of the drone's target view based on the similarity of embeddings alone. The inability to retain positional data within the embeddings hampers the accuracy and reliability of the geo-localization process.

Steps to Reproduce

  • Use the model to calculate embeddings for cross-view images.

  • Compare the similarity between embeddings of different images.

  • Attempt to determine the exact position of the drone's target view based on the embeddings.

Expected Behavior
The model should retain the positional information within the embeddings, allowing for accurate determination of the drone's target view position based on the similarity of embeddings.

Actual Behavior
The positional information is lost during the embedding extraction process, making it impossible to accurately pinpoint the drone's target view position.

Possible Solution
Explore modifications to the embedding extraction process to retain positional information.
Investigate alternative architectures or additional components that can preserve positional data within the embeddings.
Consider combining the current embedding approach with a complementary method that retains positional information.

Additional Context
The model is implemented for geo-localization purposes in drone imagery.
The primary objective is to accurately determine the position of the drone's target view using image embeddings.

So is there any other optimized algorithm or approach to get the correct position of the drone's target view using this model?
Thanks!

Model performance with ConvNext-tiny

Thanks for your response! However, when I train with the default parameters you suggested on 4× 3090 GPUs, the network performs poorly on the University-1652 dataset at Epoch=1.

The problem we notice is that the loss does not decrease throughout the training process.
The only difference from the default training code is that we downloaded the ConvNeXt-T model from https://github.com/facebookresearch/ConvNeXt, fine-tuned on the ImageNet-1k dataset, and loaded it locally via:
model_state_dict = torch.load('./pretrained/university/{}.pth'.format(config.model))
model.load_state_dict(model_state_dict, strict=False)

Originally posted by @MingkunLishigure in #1 (comment)
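One possible explanation worth checking: with strict=False, PyTorch's load_state_dict silently skips parameter names that do not match, and it returns an object whose missing_keys and unexpected_keys fields list what was skipped. If the checkpoint's names do not line up with the model's, few or no weights are loaded and training effectively starts from random initialization. The sketch below shows the same comparison on plain lists of names; the key names are hypothetical and only illustrate a prefix mismatch.

```python
def compare_state_dict_keys(model_keys, checkpoint_keys):
    """Report which parameter names would be skipped by a non-strict load."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # keep their initial values
    unexpected = sorted(checkpoint_keys - model_keys)   # ignored by the load
    matched = len(model_keys & checkpoint_keys)
    return missing, unexpected, matched


# Hypothetical key names: a wrapper that stores the backbone under "model."
# versus a checkpoint saved from the bare backbone.
model_keys = ["model.stem.0.weight", "model.stem.0.bias", "logit_scale"]
checkpoint_keys = ["stem.0.weight", "stem.0.bias", "head.fc.weight"]

missing, unexpected, matched = compare_state_dict_keys(model_keys, checkpoint_keys)
```

In this toy case nothing matches, so a non-strict load would complete without an error while loading no weights at all.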

Get log file

Thank you for your excellent work. Can you provide the log files for the different sampling strategies, including those for the different datasets?

About drawing heat maps

Thank you for your great work and your excellent and responsible answers! But I have a question. Regarding the drawing of feature heat maps in your paper, can you provide the code?

inverse polar

Thanks for the great work. Could you please share the code you used to plot the inverse polar transformation in your supplementary material?

find a mistake in code

Thanks to the author for the well-commented code. I found a mistake in the code.
In train_university.py (lines 94-95):
config.query_folder_train = './data/U1652/train/satellite'
config.gallery_folder_train = './data/U1652/train/drone'
should be changed to:
config.query_folder_train = './data/U1652/train/drone'
config.gallery_folder_train = './data/U1652/train/satellite'

Question about the memory

Hi, thank you very much for your excellent work and code. Your code structure is intuitive and effective and has inspired me a lot. However, I do not have a high-performance machine for my reproduction experiments, so I reduced the batch size from 128 to 16 at the cost of slightly lower results. During the evaluation phase on the VIGOR dataset, the program often gets killed for lack of memory. I bought an extra memory module to bring my computer to 32 GB of RAM, and it still gets killed. I would appreciate it if you could share the memory size of your experimental machine for reference.
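If the memory peak comes from holding a full query-by-gallery similarity matrix (an assumption; the source does not say where the memory goes), one generic workaround is to score the gallery in chunks and keep only the running top-k. The sketch below shows the idea in plain Python with hypothetical function names; with tensors the same pattern applies to blocks of the matrix product.

```python
import heapq
import random


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def topk_chunked(query, gallery, k, chunk_size):
    """Return the k best (score, index) pairs, scoring one chunk at a time."""
    best = []
    for start in range(0, len(gallery), chunk_size):
        chunk = gallery[start:start + chunk_size]
        # Only chunk_size scores exist at once, never the full score list.
        scores = [(dot(query, feat), start + i) for i, feat in enumerate(chunk)]
        best = heapq.nlargest(k, best + scores)
    return best


rng = random.Random(0)
gallery = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(100)]
query = [rng.uniform(-1, 1) for _ in range(8)]

top5 = topk_chunked(query, gallery, k=5, chunk_size=16)
```

The result is identical to sorting all scores at once; only the peak memory changes, since at most chunk_size + k scores are alive at any time.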

In addition, I noticed the following settings in the training scripts :

neighbour_select: int = 64     # max selection size from pool
neighbour_range: int = 128     # pool size for selection

Based on your description in 3.4 of the paper, while reducing the batch size to 16, should I adjust neighbour_range to 16 and neighbour_select to 8?
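The values proposed in the question follow from scaling both settings by the same factor as the batch size. The sketch below only reproduces that arithmetic; the helper name and its reference defaults (batch size 128, range 128, select 64, taken from the question) are assumptions, and whether proportional scaling preserves the sampling behaviour is not confirmed by the source.

```python
def scale_neighbour_settings(batch_size, ref_batch_size=128, ref_range=128, ref_select=64):
    """Scale the pool and selection sizes by the same factor as the batch size."""
    factor = batch_size / ref_batch_size
    neighbour_range = max(1, round(ref_range * factor))
    neighbour_select = max(1, round(ref_select * factor))
    return neighbour_select, neighbour_range


neighbour_select, neighbour_range = scale_neighbour_settings(16)
```

For a batch size of 16 this yields neighbour_select = 8 and neighbour_range = 16, the values suggested in the question.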

ViT model

Thank you for your excellent work. I would like to ask which ViT model you used in the ablation experiment?

Confused

Thank you for your excellent work, but I found that retrieval with the polar-transformed images looked worse than retrieval with the original satellite views. This is counterintuitive, because the polar coordinate transformation seems to be beneficial in other networks.

Accuracy of the cross-area mode in VIGOR

Hello! Thank you very much for sharing your research results! I have a question: in this paper, the accuracy on the VIGOR dataset in cross-area mode is very high. Is this due to the use of metric learning?

Are there any instructions?

Hello. Good to see your great work!

I wonder if there are any instructions about the packages and dependencies to be installed?

Thank you!

Training results on the university-1652 dataset

Thank you for your reply.
May I ask which weight file you used for training? I used the weight file downloaded from Hugging Face; why is the final AP higher than the 91.39 reported in your paper? Here are my training results:
------------------------------[Epoch: 1]------------------------------
100%|████████████████████████████████████████████████████████████████████████| 295/295 [04:45<00:00, 1.03it/s, loss=0.8654, loss_avg=1.0052, lr=0.000000]
Epoch: 1, Train Loss = 1.005, Lr = 0.000000

------------------------------[Evaluate]------------------------------
Extract Features:
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:03<00:00, 1.94it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 402/402 [01:43<00:00, 3.90it/s]
Compute Scores:
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 701/701 [00:03<00:00, 194.62it/s]
Recall@1: 95.0071 - Recall@5: 96.7190 - Recall@10: 97.1469 - Recall@top1: 99.2867 - AP: 91.6379

Shuffle Dataset:
42428it [00:00, 191806.88it/s]
Original Length: 37854 - Length after Shuffle: 37760
Break Counter: 512
Pairs left out of last batch to avoid creating noise: 94
First Element ID: 0945 - Last Element ID: 1646

Results in CVACT_val.

Hi @Skyy93,

I want to report your "Random" results on CVACT_val in my paper, but I cannot find them. Can you tell me what they are?

Best,
Guopeng.

test_160k.py

Hi, I really appreciate the method you employed in this competition. It appears that your approach also places emphasis on the university_160k dataset. Could you please upload test_160k.py? Thanks a lot :)

sharing weights

Thanks for the great work. Have you tried ConvNeXt-B without sharing weights?

Missing implementation of DSS in University-1652

Thanks for your interesting work.
However, I cannot find the implementation of Dynamic Similarity Sampling in the code; there is only one custom sampler, which prevents multiple images from the same class appearing together.
Does this mean that there is no special sampling method for University-1652?

get train script

Thank you for your excellent work. Can you provide the training script?

The training process of university-1652

May I ask which initial weight file is used to train on the University-1652 dataset? When I train from Hugging Face's convnext_base.fb_in22k_ft_in1k_384/pytorch_model.bin, why do the AP and Recall@1 values get lower and lower?

Training result error

Thank you for your reply.
Does this mean that the number of epochs should be set to 1 during training? Won't that cause underfitting?

GPS sampling

Thank you for your excellent work, but I have a question. You mention in the ablation section of the paper that GPS sampling can also be used for CVUSA, but the dataset does not seem to provide latitude and longitude.
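For context, GPS-based neighbour selection needs one (latitude, longitude) pair per sample, which is exactly what the question asks about. Assuming such coordinates are available, the sketch below shows one generic way to find each sample's geographically closest neighbours with the haversine distance. The function names and coordinates are made up for illustration, and this is not the repository's sampling code.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_neighbours(coords, k):
    """For each sample, list the indices of its k geographically closest samples."""
    result = {}
    for i, (lat_i, lon_i) in enumerate(coords):
        dists = [(haversine_km(lat_i, lon_i, lat_j, lon_j), j)
                 for j, (lat_j, lon_j) in enumerate(coords) if j != i]
        result[i] = [j for _, j in sorted(dists)[:k]]
    return result


# Made-up coordinates, for illustration only.
coords = [(40.7128, -74.0060), (40.7306, -73.9866),
          (34.0522, -118.2437), (34.0407, -118.2468)]
neighbours = nearest_neighbours(coords, k=1)
```

The brute-force loop is quadratic in the number of samples; for a full dataset a spatial index would be the more practical choice.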
