Comments (8)
Hi, I guess your results are obtained on YouCookII. If you run directly on youcookii_data.no_transcript.pickle, the scores you got are correct given your hyperparameters. Our best scores are generated with the transcript; youcookii_data.no_transcript.pickle is a version without the transcript. See the readme file for the information below:
If using video only as input (youcookii_data.no_transcript.pickle), the results are close to:
BLEU_1: 0.3921, BLEU_2: 0.2522, BLEU_3: 0.1655, BLEU_4: 0.1117
METEOR: 0.1769, ROUGE_L: 0.4049, CIDEr: 1.2725
You should compare your scores with the third line from the bottom of Table 3 in our paper.
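To double-check which data file you are actually training on, a quick look inside the pickle can help. This is a minimal sketch, not code from the repo; only the file name comes from the readme, and the dict-of-dicts layout assumed here is a guess:

```python
import pickle

# Peek into the YouCookII data pickle to see whether transcript fields
# are present. The structure printed below depends on how the file was
# built; adjust to whatever pickle.load actually returns.
with open("youcookii_data.no_transcript.pickle", "rb") as f:
    data = pickle.load(f)

print(type(data), len(data) if hasattr(data, "__len__") else "")
if isinstance(data, dict):
    key = next(iter(data))
    entry = data[key]
    print(key, list(entry.keys()) if isinstance(entry, dict) else type(entry))
```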
Hi, @ArrowLuo, thanks for the information. In fact, in Table 3 of the paper, the scores for single-V input are: B-3: 16.46, B-4: 11.17, M: 17.57, R-L: 40.09, CIDEr: 1.27. These are much larger than BLEU_1: 0.3921, BLEU_2: 0.2522, BLEU_3: 0.1655, BLEU_4: 0.1117, METEOR: 0.1769, ROUGE_L: 0.4049, CIDEr: 1.2725. So may I ask whether the above scores were obtained with a better hyperparameter setting or something else?
Sorry for the confusion. These metrics are printed as real values and reported as percentages in the paper (except CIDEr). So your scores are right.
Hi, @ArrowLuo, so the metrics printed by the program are correct. Then may I ask: are all the metrics in Table 3 of the paper normalized, meaning the metrics for all the models involved in the table? And how do you normalize them? I cannot get a sense of the performance from these pre-normalized metrics (BLEU_1: 0.3921, BLEU_2: 0.2522, BLEU_3: 0.1655, BLEU_4: 0.1117, METEOR: 0.1769, ROUGE_L: 0.4049), since I want to compare with those in the paper. Many thanks!
There is no normalization operation; just multiply these metrics by 100 (except for CIDEr).
For example, the print is:
BLEU_1: 0.3921, BLEU_2: 0.2522, BLEU_3: 0.1655, BLEU_4: 0.1117, METEOR: 0.1769, ROUGE_L: 0.4049
and the scores after multiplying by 100 are:
BLEU_1: 39.21, BLEU_2: 25.22, BLEU_3: 16.55, BLEU_4: 11.17, METEOR: 17.69, ROUGE_L: 40.49
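For reference, the same conversion as a short Python snippet; the dictionary and variable names here are illustrative, not from the UniVL codebase:

```python
# Convert raw captioning metrics (as printed by the program) to the
# percentage form reported in the paper; CIDEr is left unscaled.
raw_scores = {
    "BLEU_1": 0.3921, "BLEU_2": 0.2522, "BLEU_3": 0.1655, "BLEU_4": 0.1117,
    "METEOR": 0.1769, "ROUGE_L": 0.4049, "CIDEr": 1.2725,
}
paper_scores = {
    name: value if name == "CIDEr" else round(value * 100, 2)
    for name, value in raw_scores.items()
}
print(paper_scores)
# {'BLEU_1': 39.21, 'BLEU_2': 25.22, 'BLEU_3': 16.55, 'BLEU_4': 11.17,
#  'METEOR': 17.69, 'ROUGE_L': 40.49, 'CIDEr': 1.2725}
```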
Hi, @ArrowLuo, many thanks, got it! However, I found a mismatch between the metric scores in the UniVL paper and those in the original "End-to-End Dense Video Captioning with Masked Transformer" paper. In their paper, the scores are:
| Method | B4 (GT Proposals) | M (GT Proposals) | B4 (Learned Proposals) | M (Learned Proposals) |
|---|---|---|---|---|
| Bi-LSTM + TempoAttn | 0.87 | 8.15 | 0.08 | 4.62 |
| Our Method | 1.42 | 11.20 | 0.30 | 6.58 |
Meanwhile, in Table 3 of the UniVL paper, the E2E masked transformer has B4: 4.38 and M: 11.55. Was this result obtained from your own experiments based on their released model, using ground-truth proposals during inference?
This part of the baseline results was copied from Table 4 in https://arxiv.org/pdf/1906.05743.pdf. I notice it is indeed different from the original paper.
Ok, I see, thanks!
Related Issues (20)
- How to fine-tune with additional layers before UniVL? HOT 2
- Run Without Distributed HOT 3
- TypeError: bad operand type for unary -: 'list' HOT 6
- How to run captioning task on my own video datasets? HOT 1
- Pre-training acceleration using multi-machine distributed training HOT 1
- Can you share your HowTo100M.csv file? HOT 3
- This repo is missing important files HOT 1
- Unable to run video captioning code HOT 3
- where to get transcript to generate youcookii_data.pickle HOT 2
- end-to-end video file captioning process HOT 3
- feature & data shape HOT 6
- How can I create my video feature pickle HOT 4
- video only test for youcook HOT 2
- How to only input text feature or video feature HOT 2
- Is there a code for Finetune on CMU-MOSI here? HOT 1
- Issues about Freezing some additional layers instead of meanP in CLIP4Clip HOT 2
- Error message (torch.distributed.elastic.multiprocessing.errors.ChildFailedError:)
- Estimate of zero-shot performance HOT 1
- Zero score (every output is None) on evaluation captioning with pretrained model HOT 1
- Non-Configurable GPU Count via Arguments