shannonai / openvidial
Code, Models and Datasets for the OpenViDial Dataset
The valid and test datasets saved on Google Drive cannot be downloaded. Could you upload compressed versions of the valid and test data?
Hello, I am writing to report an error: I am missing this file. I downloaded one online, but its dimensions are wrong. How should I solve this? Thank you.
Thank you for the excellent work! I am trying to extract features from my own dataset, but the model and config in the feature_extract README seem mismatched; I can't load them in run_rcnn.py.
cd data
wget https://dl.fbaipublicfiles.com/vilbert-multi-task/detectron_model.pth
wget https://dl.fbaipublicfiles.com/vilbert-multi-task/detectron_config.yaml
I'm trying to extract RCNN features myself using the provided run_rcnn.py script, but it imports a missing module, rcnn, at line 38:
from rcnn.dataset import get_dataloader
So where can I find the get_dataloader function? Thanks.
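Since the rcnn module is not included in the repository, a minimal stand-in for the missing get_dataloader can be sketched. This is an assumption-driven placeholder, not the authors' implementation: it presumes run_rcnn.py only needs batches of image paths to feed the detector.

```python
# Hypothetical stand-in for the missing rcnn.dataset.get_dataloader.
# Assumes run_rcnn.py only needs batches of image file paths; the real
# module shipped with the authors' internal code and is not in the repo.
def get_dataloader(image_paths, batch_size=8):
    """Yield lists of image paths, each of size <= batch_size."""
    for start in range(0, len(image_paths), batch_size):
        yield image_paths[start:start + batch_size]
```

If the script expects tensors rather than paths, the image loading and preprocessing would need to be added inside the loop.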
Hi. When I try to reproduce the MMI part, I cannot find train/valid/test.src.jsonl, so I cannot reproduce it. Could you please tell me where it is? Thanks a lot.
Hi,
What does "episode" stand for in your dataset paper? I can't find any explanation of it.
Thank you.
Hi, how did you split these subtitles: by time, by turn, or at random? Thanks.
When I reproduce this baseline, the first error is a FileNotFoundError: cannot find the file /preprocessed_data/dict.txt. How can I get it?
Could you provide the names of the TV series and/or movies used to construct the dataset? Thanks.
We note that the provided training-set archive 10.zip contains only 4 images; is this correct? We found that the total training set is less than 170 GB.
Hi, I could not find the label of each object in the FV features. Have the labels been saved, or do they need to be extracted? Thanks a lot.
As most of the compressed files of OpenViDial 2.0 are larger than 100 GB, could you please split them into smaller parts for more stable downloading?
I ran your NV and CV models; the BLEU-4 scores are 1.21 and 1.22 respectively.
Then I used grep ^D gen.out | cut -f3- > sys.txt to obtain sys.txt.
But the performance is poor.
=====Stats of /deepo_data/sys_NV.txt=====
Diversity-1: 0.0028171826554375134
Diversity-2: 0.012234149152032867
Diversity-3: 0.02608896729461698
Diversity-4: 0.04205556064912441
StopWords%: 0.5369782208034367; StopWords/Sent: 3.8842692900782727
AvgLength: 7.233569518455623
=====Stats of /deepo_data/sys_CV.txt=====
Diversity-1: 0.0029348115275008298
Diversity-2: 0.012985147977712393
Diversity-3: 0.027433267080460996
Diversity-4: 0.04410504609184471
StopWords%: 0.5485848448526971; StopWords/Sent: 3.9040424742831488
AvgLength: 7.116570045480276
The stopword statistics seem normal, but the diversity scores are poor. sys_NV and sys_CV both contain 51231 lines.
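The Diversity-n numbers quoted above are presumably Distinct-n scores (the ratio of unique n-grams to total n-grams over the whole corpus). A minimal sketch of that metric, assuming whitespace tokenization, for sanity-checking the reported values:

```python
def distinct_n(sentences, n):
    """Distinct-n: unique n-grams divided by total n-grams in the corpus."""
    total = 0
    unique = set()
    for sent in sentences:
        tokens = sent.split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0
```

Because the denominator is corpus-wide, scores shrink as the corpus grows; very low Distinct-1 on 51231 sentences often just means the model repeats a few generic responses.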
sys_CV.txt
2021-03-20 12:27:30 | INFO | fairseq.utils | CUDA enviroments for all 4 workers
2021-03-20 12:27:30 | INFO | fairseq_cli.train | training on 4 devices (GPUs/TPUs)
2021-03-20 12:27:30 | INFO | fairseq_cli.train | max tokens per GPU = 8000 and max sentences per GPU = 32
2021-03-20 12:27:30 | INFO | fairseq.trainer | no existing checkpoint found train_logs/reproduce_img_object/layer3_lr2e-4_bsz128_drop0.3_warmup6000/checkpoint_last.pt
2021-03-20 12:27:30 | INFO | fairseq.trainer | loading train data for epoch 1
2021-03-20 12:27:30 | INFO | video_dialogue_model.data.object_dataset | find minimum truncate of preprocessed_data_dir-train: 0
2021-03-20 12:38:39 | INFO | fairseq.data.data_utils | loaded 974803 examples from: preprocessed_data_dir/train
2021-03-20 12:39:10 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16
epoch 001: 0%| | 0/7045 [00:00<?, ?it/s]2021-03-20 12:39:10 | INFO | fairseq.trainer | begin training epoch 1
Traceback (most recent call last):
  File "/usr/local/bin/fairseq-train", line 8, in <module>
    sys.exit(cli_main())
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/train.py", line 352, in cli_main
    distributed_utils.call_main(args, main)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/distributed_utils.py", line 286, in call_main
    nprocs=args.distributed_num_procs,
  File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 108, in join
    (error_index, name)
Exception: process 1 terminated with signal SIGKILL
This seems to be caused by insufficient memory, but my machine has 200 GB of RAM.
How much memory does the FV model need? Or is something else causing this?
Hi,
I ran your code (text_only) and obtained the gen.out file. The last line of gen.out is:
Generate test with beam=5: BLEU4 = 1.21, 15.6/1.4/0.5/0.2 (BP=0.953, ratio=0.954, syslen=370561, reflen=388568)
Are these results correct? What do they mean? Do they include BLEU-1, BLEU-2, BLEU-4, Dis-1, Dis-2, Dis-3, and Dis-4?
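For reference, this line follows fairseq's BLEU summary format: the overall corpus BLEU-4, then the four slash-separated 1- to 4-gram precisions (these are per-order precisions, not cumulative BLEU-1 to BLEU-4 scores), then brevity penalty, length ratio, and system/reference token counts. Distinct (Dis-n) scores are not part of it and must be computed separately. A small sketch pulling the fields apart:

```python
import re

# Parse a fairseq-style BLEU summary line into its components.
# The four slash-separated numbers are the 1- to 4-gram precisions.
LINE = ("Generate test with beam=5: BLEU4 = 1.21, 15.6/1.4/0.5/0.2 "
        "(BP=0.953, ratio=0.954, syslen=370561, reflen=388568)")

m = re.search(
    r"BLEU4 = ([\d.]+), ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) "
    r"\(BP=([\d.]+), ratio=([\d.]+), syslen=(\d+), reflen=(\d+)\)",
    LINE)
bleu4, p1, p2, p3, p4, bp, ratio, syslen, reflen = m.groups()
```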
There is a bug as follows:
mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
ValueError: mmap length is greater than file size
The size of object features files:
149G train.objects.mmap
11G train.objects.mmap.splitaa
11G train.objects.mmap.splitab
11G train.objects.mmap.splitac
11G train.objects.mmap.splitad
11G train.objects.mmap.splitae
11G train.objects.mmap.splitaf
11G train.objects.mmap.splitag
11G train.objects.mmap.splitah
11G train.objects.mmap.splitai
11G train.objects.mmap.splitaj
11G train.objects.mmap.splitak
11G train.objects.mmap.splital
11G train.objects.mmap.splitam
2.1G train.objects.mmap.splitan
17G train.objects.mmap.splitao
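The .splitaa to .splitao files look like output of the Unix split command, so "mmap length is greater than file size" usually means the parts were not concatenated back (or the rebuilt file is truncated). A sketch of the concatenation, equivalent to `cat train.objects.mmap.split* > train.objects.mmap` (paths are taken from the listing above; adjust as needed):

```python
import glob
import shutil

def join_splits(prefix, out_path):
    """Concatenate split parts (prefix + 'aa', 'ab', ...) into one file."""
    with open(out_path, "wb") as out:
        # Lexicographic order matches split's suffix order (aa, ab, ...).
        for part in sorted(glob.glob(prefix + "*")):
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)
```

After joining, it is worth verifying that the resulting file size matches the size the dataset index expects before starting training.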
The script provided to preprocess the text data only binarizes the .txt files into .bin files.
How exactly can I get files like train.sent_num.npy?
Thanks.
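On the assumption that train.sent_num.npy simply stores the number of utterances in each dialogue episode (the repository's preprocessing appears to derive this from the raw data layout), it could be reconstructed from the episode groupings. This is a hypothetical sketch, not the repo's actual script:

```python
import numpy as np

# Hypothetical reconstruction: assumes train.sent_num.npy holds one
# integer per episode, i.e. how many sentences that episode contains.
def build_sent_num(episodes, out_path):
    """episodes: list of episodes, each a list of sentences."""
    sent_num = np.array([len(ep) for ep in episodes], dtype=np.int64)
    np.save(out_path, sent_num)
    return sent_num
```

If the real file encodes something else (e.g. cumulative offsets), the repo's dataset loader is the authoritative place to check.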
Q1: Under what configuration did you obtain the experimental results published in the paper?
Q2: Can the experiments be reproduced on a single 1080 Ti GPU?
Hi,
I ran your code (FV) and obtained the gen.out file. The last line of gen.out is:
Generate test with beam=5: BLEU4 = 0.40, 6.2/0.5/0.1/0.1 (BP=1.000, ratio=2.087, syslen=810940, reflen=388568)
Diversity-1: 0.0006535270732838133
Diversity-2: 0.0031576051690483616
Diversity-3: 0.006970750791777929
Diversity-4: 0.010780433006226303
StopWords%: 0.3479316484665542; StopWords/Sent: 5.507739454627082
AvgLength: 15.829946711951749
Are the results correct?
When I try to view the shape of train.features.mmap, numpy reports an error. How can I solve this problem?
By the way, can I directly use the mmap files (such as train/valid/test.features.mmap) as the video features, for example by saving them as .npy files for multimodal training?
Thank you.
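A likely cause of the numpy error: a raw .mmap file has no header, so numpy cannot infer its dtype or shape; both must be passed explicitly. A sketch under assumed dimensions (the (num_images, num_objects, feature_dim) shape and float32 dtype here are illustrative; the repo's dataset code has the real values):

```python
import numpy as np

# A raw .mmap file has no header, so dtype and shape must be supplied.
# The default dims below are assumptions, not values from the repo.
def load_features(path, num_images, num_objects=100, feature_dim=2048):
    return np.memmap(path, dtype=np.float32, mode="r",
                     shape=(num_images, num_objects, feature_dim))
```

Note that np.memmap keeps the data on disk and reads slices lazily, which is why training can use a 149 GB file; converting it to .npy with np.save would first materialize the whole array in RAM.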