tsattler / geometric_burstiness
License: BSD 3-Clause "New" or "Revised" License
@tsattler Is there a ready-to-use visual vocabulary available, or must I train one myself?
Hi @tsattler, I ran into the same problem mentioned in #2. The database images were captured uniformly along the camera's trajectory inside a room, so adjacent images look the same. Does that matter? I trained the vocabulary on the ANN_SIFT1M dataset (http://corpus-texmex.irisa.fr/), and the features of both the database and query images were extracted with hesaff. Is anything wrong?
Here is the log:

```text
Loading the inverted index
Index loaded
Weights loaded
Found 2 query files
Found 836 db images in image_db.txt
Query 0 : ./q/000230.jpg.bin
Prepared 1287 query descriptors
Computing the self-similarity for the query image from 1287 query descriptors
Self-similarity: 26333.7
Normalization weight 0.00616232
Query found 836 potentially relevant database images
Score of most relevant image 0.0912435 (image 726) with 4322 matches
Starting spatial verification
```
In the function DetermineDBImageSelfSimilarities() in inverted_file.h, the self-similarity of each database image is computed for use as the normalization factor in the final similarity score between the query and the database images.
However, the self-similarity seems to be computed simply by accumulating the squared idf weight of each visual word. The relevant code is:
```cpp
for (int i = 0; i < num_entries; ++i) {
  current_image_id = entries_[i].image_id;
  score_ref[current_image_id] += idf_squared;
}
```
However, this ignores repeated occurrences of the same visual word within a database image, which violates the original definition of self-similarity: the similarity score of an image with itself should be 1, as stated in the ACCV 2014 DisLoc paper.
By contrast, we modified the computation of the self-similarity as follows:
```cpp
int num_score = score_ref.size();
std::vector<double> num_vw(num_score, 0.0);
for (int i = 0; i < num_entries; ++i) {
  current_image_id = entries_[i].image_id;
  num_vw[current_image_id] += 1;
}
for (int i = 0; i < num_score; ++i) {
  score_ref[i] += num_vw[i] * num_vw[i] * idf_squared;
}
```
I have found that this modification improves recall in my experiments on the Pittsburgh 250k dataset: recall@1 improves from 0.508 to 0.527 without the spatial verification step.
I'm not sure whether this is a bug or whether the computation in the public code is the intended definition of self-similarity.
1. Could you tell me the detailed recall (e.g., recall@1) after the initial retrieval, without spatial verification, in your implementation?
I have evaluated two different feature extraction approaches: hesaff (https://github.com/perdoch/hesaff) and the VGG binaries with a low cornerness threshold (http://www.robots.ox.ac.uk/~vgg/research/affine/detectors.html#binaries).
The latter with threshold 100 extracts about 260M local features, and the former extracts about 218M local features.
However, with either feature extraction step, the recall falls short of the results reported in the ACCV 2014 paper and in your paper on both place recognition datasets.
2. Could you tell me how you convert the JPEG images to PPM images? I use the jpegtopnm command on Linux.
3. If you have time, could you share the feature extraction output for the first image of the Pittsburgh dataset, i.e., the imgname.hesaff file? I want to verify that I have extracted enough features.
Thanks!
Thank you for sharing your source code; it is really helpful to the CBIR research community.
The disclaimer mentions that this source code may differ from the code used for the paper. However, I observe a huge difference.
The reported performance on Oxf105k and Par106k is:
| | retrieval_rank | inliers | eff. inliers | inter-image | inter-place | inter-place+pop |
|---|---|---|---|---|---|---|
| Oxf105k (mAP) | - | 0.710 | 0.730 | 0.708 | 0.735 | 0.745 |
| Par106k (mAP) | - | 0.613 | 0.619 | 0.611 | 0.649 | 0.682 |
I tried to reproduce the results on Oxford 5k with a 200k vocabulary trained on Paris 6k.
The results I got seem wrong for two reasons:
| | retrieval_rank | inliers | eff. inliers | inter-image | inter-place | inter-place+pop |
|---|---|---|---|---|---|---|
| Oxf5k (mAP) | 0.700 | 0.679 | 0.682 | 0.674 | x | x |
Maybe I made some obvious mistake. I assume you have run many experiments; if you have any clues, please give me a hint.
@tsattler Hi tsattler. When running compute_word_assignments, the input requires a visual vocabulary. So to compute word assignments, I need to write all the 128-D RootSIFT descriptors extracted from the images to a text file (i.e., the vocabulary). Did I understand that correctly?
@tsattler Hey tsattler, I wonder how long a query should take. I collected 340 images as the database, used a 20k vocabulary trained on INRIA Holidays, and changed the num_words parameter to 20k in computing_hamming_thresholds.cc.
Since I set the number of nearest words to 1 in every step (as asked in the README), I also changed kNumNNWords to 1 in query.cc. However, with these settings the run takes very long: the result file is updated only every few minutes. Is there anything wrong with my settings?
Dear Dr. Sattler:
I'm trying to evaluate the code on the Pittsburgh dataset.
I selected 20k images from the dataset and extracted Hessian-affine SIFT features (using the code from CUVT, Perdoch, sift_type=2) to train a 200k codebook, which gives about 17M features.
However, I only achieve recall@1 = 0.47, which is lower than the DisLoc result (about 0.55).
Could you make the codebook public?
If I have the detect_points.ln and compute_descriptors.ln binaries for extracting local features, such as the software provided at http://www.robots.ox.ac.uk/~vgg/research/affine/detectors.html#binaries, how should I extract the local features?
Could you give us the demo shell script you used to extract the local features of an image in your experiments?
I want to verify that I am extracting the local features correctly.
Thanks in advance!
The purpose of lines 255-234 of the function ReRankingInterPlaceGeometricBurstiness in ranking_schemes.h confuses me.
When a new place is found, the place information of the other images is updated based on the previous minimum distance and the new distance between each image and the place.
However, the distance is still computed as the distance from the first image.
Is this right?