clovaai / generative-evaluation-prdc
Code base for the precision, recall, density, and coverage metrics for generative models. ICML 2020.
License: MIT License
When extracting random embeddings, does using a randomly initialized VGG16 mean using a completely untrained model?
Or does it mean training VGG16 only up to a certain point, as in deep image prior? If so, how did you choose that stopping threshold?
May I know the specific code you used, or the code you referenced?
Could you please add a command-line interface that takes the image directories as arguments?
e.g.
./fid_score.py path/to/dataset1 path/to/dataset2
Thank you for providing the code and implementation for density and coverage. It is great to have the code ready for use in practice; I cite the paper whenever possible.
I took the liberty of researching possible improvements and found that an exact similarity search, as offered by faiss, can speed up the calculation of density and coverage by a great deal.
Here are my results for num_real_samples = num_fake_samples = 1024, feature_dim = 12, nearest_k = 5:
--------------------------------------------------------------------------------------- benchmark: 4 tests ---------------------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_bench_my_coverage 10.7225 (1.0) 53.4196 (1.09) 13.9665 (1.0) 8.7931 (1.0) 11.1186 (1.0) 1.4884 (1.0) 1;4 71.5998 (1.0) 24 1
test_bench_my_density 11.9193 (1.11) 49.0908 (1.0) 16.7892 (1.20) 9.0503 (1.03) 12.8918 (1.16) 3.9669 (2.67) 2;3 59.5619 (0.83) 18 1
test_bench_prdc_coverage 316.7985 (29.55) 400.5574 (8.16) 354.7417 (25.40) 31.7475 (3.61) 355.2325 (31.95) 43.6299 (29.31) 2;0 2.8190 (0.04) 5 1
test_bench_prdc_density 365.5958 (34.10) 400.6876 (8.16) 382.3611 (27.38) 12.6120 (1.43) 380.4541 (34.22) 12.9641 (8.71) 2;0 2.6153 (0.04) 5 1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I tagged every algorithm using faiss with the prefix test_bench_my. With the similarity-index approach, this line in the original code:
real_nearest_neighbour_distances = compute_nearest_neighbour_distances(
    real_features, nearest_k)
is accelerated substantially thanks to the index's efficient neighbour lookup.
As such a change would drag in a dependency on faiss, I am reluctant to send a PR to this repo. Let me know what you think!
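A minimal sketch of the idea (the helper name is hypothetical, not the repo's API; it uses faiss's exact IndexFlatL2 when available and falls back to a brute-force NumPy computation otherwise):

```python
import numpy as np

def knn_distances(features, nearest_k):
    """Distance from each sample to its k-th nearest neighbour (self excluded)."""
    try:
        import faiss  # optional dependency; exact, but much faster at scale
        x = np.ascontiguousarray(features, dtype=np.float32)
        index = faiss.IndexFlatL2(x.shape[1])  # exact (non-approximate) L2 search
        index.add(x)
        # search k+1 neighbours because the closest hit is the query itself
        squared, _ = index.search(x, nearest_k + 1)
        return np.sqrt(squared[:, nearest_k])  # faiss returns squared L2 distances
    except ImportError:
        # brute-force fallback, equivalent to a full pairwise-distance matrix
        dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
        return np.sort(dists, axis=1)[:, nearest_k]  # column 0 is the self-distance
```

Both branches return the same radii up to float32 rounding; only the lookup strategy differs.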
Hello, your research is very significant to my work.
What I want to ask is: how do I run this code to evaluate my own images? I have already prepared the real images and the generated images, but I am not quite sure how to invoke your program. Can you tell me how to run the code on my own image dataset?
Is it something like this?
run prdc ./generate-images ./real-images
or compute_prdc ./generate-images ./real-images
Hi, thanks for sharing the great work!
I would like to use this metric to evaluate results on the image-to-image translation task. However, in I2I datasets the number of real samples is almost always less than 10K, and most datasets have only around 1K. In this case, does the metric still produce a reliable score? Can I use it directly?
Looking forward to your reply, thanks a lot!
Dear Sir,
it seems that your work does not include tools for extracting feature vectors from images. Do you have any advice on how to obtain these feature vectors from my real and fake image datasets?
Thank you so much!
Hello and thank you for this great paper and implementation.
I've run your method with a dummy example:
fake_features = torch.ones((1024, 4096))
real_features = torch.ones((1024, 4096))
and would expect 1.0 for both density and coverage, but actually got 0.0.
There are two changes to the density metric that might help:
(distance_real_fake < np.expand_dims(real_nearest_neighbour_distances, axis=1)
=> (distance_real_fake <= np.expand_dims(real_nearest_neighbour_distances, axis=1)
and
(distance_real_fake <= real_nearest_neighbour_distances.unsqueeze(1)).sum(dim=0)
=> (distance_real_fake <= real_nearest_neighbour_distances.unsqueeze(1)).sum(dim=0).clamp(0, self.nearest_k)
Does that make sense, or am I missing something?
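The 0.0 can be reproduced with a tiny NumPy sketch of the density/coverage definitions: with identical constant features, every pairwise distance, and hence every k-NN radius, is exactly 0, so the strict < comparison never fires:

```python
import numpy as np

real = np.ones((16, 8))
fake = np.ones((16, 8))
nearest_k = 3

# all pairwise distances are 0, so every k-NN radius is 0
d_real = np.linalg.norm(real[:, None, :] - real[None, :, :], axis=-1)
radii = np.sort(d_real, axis=1)[:, nearest_k]  # -> all zeros

d_rf = np.linalg.norm(real[:, None, :] - fake[None, :, :], axis=-1)  # all zeros

density = (d_rf < radii[:, None]).sum(axis=0).mean() / nearest_k  # 0 < 0 is False
coverage = (d_rf.min(axis=1) < radii).mean()
print(density, coverage)  # -> 0.0 0.0
```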
Hi! I understand that the density metric is not upper-bounded by 1 and that the expected density for two identical distributions is 1. However, when I evaluate the density for StyleGAN2 trained on FFHQ, it is much larger than 1. For the pre-trained StyleGAN2-F, the density is around 1.12; for a fine-tuned StyleGAN2 that obtains higher precision, it goes up to around 1.5. Is this ill behavior of the density metric?
Thanks in advance!
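Values above 1 are possible by construction: density counts how many real k-NN balls contain each fake sample, so fake samples concentrated where real neighbourhoods overlap push the average above 1. A minimal 1-D example (toy numbers, not the repo's code):

```python
import numpy as np

real = np.array([[0.0], [0.5], [-0.5], [100.0]])
fake = np.array([[0.2]])  # sits inside the 1-NN balls of both 0.0 and 0.5
nearest_k = 1

d_real = np.linalg.norm(real[:, None, :] - real[None, :, :], axis=-1)
radii = np.sort(d_real, axis=1)[:, nearest_k]  # [0.5, 0.5, 0.5, 99.5]

d_rf = np.linalg.norm(real[:, None, :] - fake[None, :, :], axis=-1)
# two real k-NN balls contain the single fake sample -> density 2/1 = 2
density = (d_rf < radii[:, None]).sum(axis=0).mean() / nearest_k
print(density)  # -> 2.0
```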
Thanks for the interesting work!
This is not an issue with your source code, just a comment on your study.
I believe it would be fair to discuss and cite the recent work by Min Jin Chong and David Forsyth, "Effectively Unbiased FID and Inception Score and where to find them", where they propose unbiased drop-in replacements for the FID score.