pliang279 / multibench

License: MIT License

Topics: machine-learning multimodal-learning robotics natural-language-processing computer-vision deep-learning healthcare representation-learning speech-processing

multibench's Introduction

MultiBench: Multiscale Benchmarks for Multimodal Representation Learning

MultiBench website

Documentation, Tutorials and examples

Contributors

Correspondence to:

Paper

MultiZoo & MultiBench: A Standardized Toolkit for Multimodal Deep Learning
Paul Pu Liang, Yiwei Lyu, Xiang Fan, Arav Agarwal, Yun Cheng, Louis-Philippe Morency, Ruslan Salakhutdinov
JMLR 2023 Open Source Software.

MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency
NeurIPS 2021 Datasets and Benchmarks Track.

If you find this repository useful, please cite our paper and corresponding software package:

@article{liang2023multizoo,
  title={MULTIZOO \& MULTIBENCH: A Standardized Toolkit for Multimodal Deep Learning},
  author={Liang, Paul Pu and Lyu, Yiwei and Fan, Xiang and Agarwal, Arav and Cheng, Yun and Morency, Louis-Philippe and Salakhutdinov, Ruslan},
  journal={Journal of Machine Learning Research},
  volume={24},
  pages={1--7},
  year={2023}
}
@inproceedings{liang2021multibench,
  title={MultiBench: Multiscale Benchmarks for Multimodal Representation Learning},
  author={Liang, Paul Pu and Lyu, Yiwei and Fan, Xiang and Wu, Zetian and Cheng, Yun and Wu, Jason and Chen, Leslie Yufan and Wu, Peter and Lee, Michelle A and Zhu, Yuke and others},
  booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
  year={2021}
}

Overview

Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities.

In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark for multimodal learning spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To reflect real-world requirements, MultiBench is designed to holistically evaluate (1) performance across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities.

To accompany MultiBench, we also provide a standardized implementation of 20 core approaches in multimodal learning unifying innovations in fusion paradigms, optimization objectives, and training approaches which we call MultiZoo. MultiZoo implements these methods in a modular fashion to enable accessibility for new researchers, compositionality of approaches, and reproducibility of results.

Datasets currently supported

  1. Affective computing: MUStARD, CMU-MOSI, UR-FUNNY, CMU-MOSEI
  2. Healthcare: MIMIC
  3. Robotics: MuJoCo Push, Vision & Touch
  4. Finance: Stocks-food, Stocks-health, Stocks-tech
  5. HCI: ENRICO
  6. Multimedia: AV-MNIST, MM-IMDb, Kinetics-S, Kinetics-L
  7. Reinforcement learning: RTFM

To add a new dataset:

  1. Go to datasets/
  2. Add a new folder if appropriate
  3. Write a Python file with a get_dataloader function that returns a tuple of 3 dataloaders (for train, valid, and test data respectively) containing preprocessed data. Please follow the existing examples (such as avmnist: datasets/avmnist/get_data.py); a minimal sketch is also shown after this list
  4. Go to examples/ and write an example training python file following the existing examples
  5. Check that calling the dataloader and running a simple training script works
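
As a rough illustration of step 3, a minimal dataset module might look like the sketch below. This is not an existing MultiBench file: the dataset name, file layout, and batch size are hypothetical placeholders, and only the requirement of returning a (train, valid, test) tuple of dataloaders comes from the instructions above.

# datasets/mydataset/get_data.py -- hypothetical sketch following the avmnist example
import torch
from torch.utils.data import DataLoader, TensorDataset


def get_dataloader(data_dir, batch_size=32):
    """Return a tuple of (train, valid, test) dataloaders of preprocessed data."""
    loaders = []
    for split in ('train', 'valid', 'test'):
        # Assumes each split was preprocessed into tensors of modality-1 data,
        # modality-2 data, and labels; adapt this to your own storage format.
        mod1, mod2, labels = torch.load(f'{data_dir}/{split}.pt')
        dataset = TensorDataset(mod1, mod2, labels)
        loaders.append(DataLoader(dataset, batch_size=batch_size,
                                  shuffle=(split == 'train')))
    return tuple(loaders)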

Algorithms supported

See Appendix Section F for detailed descriptions of each part.

  1. Unimodal models: MLP, GRU, LeNet, CNN, LSTM, Transformer, FCN, Random Forest, ResNet, etc... (see unimodals/)
  2. Fusion paradigms: early/late fusion, NL-gate, tensor fusions, Multiplicative Interactions, Low-Rank Tensor Fusion, etc (see fusions/)
  3. Optimization objectives: (default: CrossEntropyLoss for classification tasks, MSELoss for regression tasks), ELBO, Weighted Reconstruction Loss, CCA loss, Contrastive Loss, etc (see objective_functions/)
  4. Training structures: Supervised Learning (which supports Early Fusion, Late Fusion, MVAE, MFM, etc), Gradient Blend, Architecture Search, etc (see training_structures/)

To add a new algorithm:

  1. Figure out which subfolder to add it to:
  • unimodals/ : unimodal architectures
  • fusions/ : multimodal fusion architectures
  • objective_functions/ : objective functions in addition to the supervised training loss (e.g., VAE loss, contrastive loss)
  • training_structures/ : training algorithms excluding objective functions (e.g., balancing generalization, architecture search outer RL loop)
  2. Go to examples/ and write an example training Python file following the existing examples
  3. Check that calling the added functions and running a simple training script works
  4. Make sure your new modules are well documented, with comments describing their input and output formats and shapes (see the sketch after this list)
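
As a rough illustration, a new fusion module would typically be a torch.nn.Module whose forward takes a list of unimodal representations. The sketch below is hypothetical (the class name and the concatenate-then-MLP design are not an existing MultiBench module) and mainly shows the input/output documentation asked for in step 4.

# fusions/my_fusion.py -- hypothetical sketch, not an existing MultiBench module
import torch
from torch import nn


class ConcatMLPFusion(nn.Module):
    """Concatenate unimodal features and fuse them with a small MLP.

    Input: a list of 2D tensors of shape [batch_size, input_dims[i]], one per modality.
    Output: a 2D tensor of shape [batch_size, out_dim].
    """

    def __init__(self, input_dims, out_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(sum(input_dims), out_dim), nn.ReLU())

    def forward(self, modalities):
        return self.mlp(torch.cat(modalities, dim=1))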

Open call for research areas, datasets, tasks, algorithms, and evaluation

We welcome new contributions to MultiBench through new research areas, datasets, tasks, algorithms, and evaluation. Please refer to the sections above for instructions on adding new datasets and algorithms, and open a pull request if you would like to see a specific dataset or algorithm added. We plan to use MultiBench as a theme for future workshops, competitions, and academic courses - stay tuned for upcoming calls for participation!

Experiments

Affective Computing

We release the processed datasets: sarcasm, mosi, mosei, and humor. The original datasets are also publicly available: MultimodalSDK for MOSI and MOSEI, MUStARD, and UR-FUNNY. You can obtain the processed data with datasets/affect/get_data.py; note that sarcasm refers to MUStARD and humor refers to UR-FUNNY.

There are several example scripts for running the affect datasets under examples/affect/. For example, to run an affect dataset with simple late fusion, first load the data with

traindata, validdata, test_robust = get_dataloader('/home/pliang/multibench/affect/pack/mosi/mosi_raw.pkl', data_type='mosi')

Or, if you don't want to use packed data and instead expect data padded to the same maximum sequence length, use the max_pad and max_seq_len options, and remember to set is_packed=False in the train and test functions:

traindata, validdata, testdata = get_dataloader('/home/pliang/multibench/affect/pack/mosi/mosi_raw.pkl', data_type='mosi', max_pad=True, max_seq_len=50)

then do

python3 examples/affect/affect_late_fusion.py

Healthcare

The MIMIC dataset has restricted access. To gain access to the preprocessed version of this dataset, please follow the instructions here to obtain the necessary credentials. Once you have the credentials, email [email protected] with proof of your credentials and ask for the preprocessed 'im.pk' file.

After you have the 'im.pk' file, you can get the dataloaders for this dataset by calling the get_dataloader function in examples/mimic/get_data.py. The get_dataloader function takes two inputs; the first specifies which task you want to do (-1 means the mortality task, 1 means the ICD-9 10-19 task, 7 means the ICD-9 70-79 task). The input modalities are static (a vector of size 5) and time-series (shape 24x30).
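
For illustration, loading the ICD-9 10-19 task might look like the following minimal sketch. The keyword name for the im.pk path is an assumption (check the healthcare example scripts for the exact signature), and the import path simply mirrors the file location given above.

# Minimal sketch; the imputed_path keyword is an assumption -- adjust the import
# and arguments to match get_data.py in your checkout.
from examples.mimic.get_data import get_dataloader

# -1: mortality task, 1: ICD-9 10-19 task, 7: ICD-9 70-79 task
traindata, validdata, testdata = get_dataloader(1, imputed_path='im.pk')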

There are several example scripts for running MIMIC under examples/healthcare/. For example, to run MIMIC with Low Rank Tensor Fusion, do

python3 examples/healthcare/mimic_low_rank_tensor.py

Robotics

Vision & Touch

For the Vision & Touch dataset, the script for downloading the data is included in the dataset/robotics/ folder (download_data.sh). After the data is downloaded, use dataset/robotics/data_loader.py to access the preprocessed dataloaders. Note that this dataset only has train and valid sets, so the output will be a tuple of 2 dataloaders instead of 3. The default task is Contact, but you can get the dataloaders for the End Effector task by passing output='ee_yaw_next' to the get_data function.
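
A rough usage sketch under these assumptions is shown below; only output='ee_yaw_next' is taken from the description above, and any other arguments required by get_data are omitted here, so treat this as a sketch rather than exact usage.

# Minimal sketch; arguments other than output='ee_yaw_next' are assumptions.
from dataset.robotics.data_loader import get_data  # module path as given above

# Only train and valid dataloaders are returned for this dataset.
traindata, validdata = get_data(output='ee_yaw_next')  # omit `output` for the default Contact task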

For more detailed information on this dataset, see the original repo.

There are several example scripts for running Vision and Touch under examples/robotics/. For example, to run Vision and Touch with Low Rank Tensor Fusion on Contact Task, do

python3 examples/robotics/LRTF.py

MuJoCo Push (Gentle Push)

The code for MuJoCo Push experiments can be found under the examples/gentle_push directory. Each model type has its own Python file under this directory, which can be directly executed to run the experiments.

For example, to run the late fusion model:

python examples/gentle_push/LF.py

This will also download the dataset to datasets/gentle_push/cache on the first run. Since the original dataset is hosted on Google Drive, sometimes the automatic download may fail for various reasons. We observed that running on Colab solves the issue. Additionally, you can download these files manually and place them at the correct locations:

  • Download gentle_push_10.hdf5 to datasets/gentle_push/cache/1qmBCfsAGu8eew-CQFmV1svodl9VJa6fX-gentle_push_10.hdf5.
  • Download gentle_push_300.hdf5 to datasets/gentle_push/cache/18dr1z0N__yFiP_DAKxy-Hs9Vy_AsaW6Q-gentle_push_300.hdf5.
  • Download gentle_push_1000.hdf5 to datasets/gentle_push/cache/1JTgmq1KPRK9HYi8BgvljKg5MPqT_N4cR-gentle_push_1000.hdf5.

Finance

The code for finance experiments can be found under the examples/finance directory. Each model type has its own Python file under this directory. Each file accepts two arguments, --input-stocks and --target-stock. For example, to run simple late fusion on the stocks benchmarked in the paper:

python examples/finance/stocks_late_fusion.py --input-stocks 'MCD SBUX HSY HRL' --target-stock 'MCD'
python examples/finance/stocks_late_fusion.py --input-stocks 'AAPL MSFT AMZN INTC AMD MSI' --target-stock 'MSFT'
python examples/finance/stocks_late_fusion.py --input-stocks 'MRK WST CVS MCK ABT UNH TFX' --target-stock 'UNH'

You can specify arbitrary stocks to be downloaded. The data loader will automatically download the data for you. If the stocks do not cover the date range defined in datasets/stocks/get_data.py, a different date range can be specified.

For unimodal experiments, run stocks_early_fusion.py with the same stock passed to --input-stocks and --target-stock.

Below is a full list of stocks under each category outlined in the paper:

F&B (18): CAG CMG CPB DPZ DRI GIS HRL HSY K KHC LW MCD MDLZ MKC SBUX SJM TSN YUM
Health (63): ABT ABBV ABMD A ALXN ALGN ABC AMGN ANTM BAX BDX BIO BIIB BSX BMY CAH CTLT CNC CERN CI COO CVS DHR DVA XRAY DXCM EW GILD HCA HSIC HOLX HUM IDXX ILMN INCY ISRG IQV JNJ LH LLY MCK MDT MRK MTD PKI PRGO PFE DGX REGN RMD STE SYK TFX TMO UNH UHS VAR VRTX VTRS WAT WST ZBH ZTS
Tech (100): AAPL ACN ADBE ADI ADP ADSK AKAM AMAT AMD ANET ANSS APH ATVI AVGO BR CDNS CDW CHTR CMCSA CRM CSCO CTSH CTXS DIS DISCA DISCK DISH DXC EA ENPH FB FFIV FIS FISV FLIR FLT FOX FOXA FTNT GLW GOOG GOOGL GPN HPE HPQ IBM INTC INTU IPG IPGP IT JKHY JNPR KEYS KLAC LRCX LUMN LYV MA MCHP MPWR MSFT MSI MU MXIM NFLX NLOK NOW NTAP NVDA NWS NWSA NXPI OMC ORCL PAYC PAYX PYPL QCOM QRVO SNPS STX SWKS T TEL TER TMUS TRMB TTWO TWTR TXN TYL V VIAC VRSN VZ WDC WU XLNX ZBRA

HCI

The code for HCI experiments can be found under the examples/hci directory. Our experiments use the ENRICO dataset, which contains application screenshots and their UI layout. App screens are classified into 20 different design categories.

The unimodal examples can be run using the following commands:

Screenshot modality

python examples/hci/enrico_unimodal_0.py

UI Layout modality

python examples/hci/enrico_unimodal_1.py

The multimodal examples are found in the same directory. As an example:

Simple Late Fusion

python examples/hci/enrico_simple_late_fusion.py

Multimedia

To access AV-MNIST, download the avmnist.tar.gz file from here and untar it. Then, pass the location of the avmnist folder to the get_dataloader function in the datasets/avmnist/get_data.py script. The input modalities are black-and-white images (28x28 tensors) and audio spectrograms (112x112 tensors).
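
For example, loading the dataloaders might look like the following minimal sketch; '/path/to/avmnist' is a placeholder for the untarred data directory, and any optional keyword arguments of get_dataloader are left at their defaults.

# Minimal sketch; '/path/to/avmnist' is a placeholder for the untarred data.
from datasets.avmnist.get_data import get_dataloader

traindata, validdata, testdata = get_dataloader('/path/to/avmnist')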

There are several example scripts for running AV-MNIST under examples/multimedia/. For example, to run AV-MNIST with Simple Late Fusion with Concatenation, do

python examples/multimedia/avmnist_simple_late_fusion.py

To access MM-IMDb, download multimodal_imdb.hdf5 from here; we also use the raw data from here to test models' robustness.

There are several example scripts for running MM-IMDb under examples/multimedia/. To run experiments, pass the location of the hdf5 file to the get_dataloader function in each of the examples. Then, for example, to run Text and Image with Simple Late Fusion with Concatenation, do

python examples/multimedia/mmimdb_simple_late_fusion.py

Scripts for the Kinetics dataset are located in the special/ directory. Run python special/kinetics_*.py for the respective script.

To access Clotho, clone the clotho-dataset repository somewhere on your device and follow the instructions in the README of that repository to download and preprocess the data (use the one-step preprocessing approach). To get the dataloaders, pass the path of the clotho-dataset repo to the get_dataloaders function in the datasets/clotho/get_data.py script. The default data are audio features (padded to 2574x64) and text caption word indices (padded to 20x18).
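
A rough sketch of loading the Clotho dataloaders is shown below; the path is a placeholder, and the returned tuple is assumed to follow the usual (train, valid, test) convention described earlier, so check datasets/clotho/get_data.py for the exact return values.

# Minimal sketch; '/path/to/clotho-dataset' is a placeholder for the cloned repo.
from datasets.clotho.get_data import get_dataloaders

traindata, validdata, testdata = get_dataloaders('/path/to/clotho-dataset')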

Evaluation

Complexity

We have a script (eval_scripts/complexity.py) for recording complexity data for training and testing, including peak memory, number of parameters, and time for training, and number of parameters and time for testing. You will need to install memory_profiler to run this script. It provides two useful functions: all_in_one_train, which takes a function reference of the training process as well as all the modules involved in training, runs the training process, and prints out the total runtime, peak memory, and total number of parameters; and all_in_one_test, which takes a function reference of the testing process as well as all the modules involved in testing, runs the testing process, and prints out the total runtime and total number of parameters.

For example usage, see examples/healthcare/mimic_baseline_track_complexity.py (which adds complexity measurement to the script examples/healthcare/mimic_baseline.py).
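
A rough usage sketch of the two helpers is shown below. It assumes eval_scripts/complexity.py is importable from the repo root; the tiny model and the training/testing loops are placeholders standing in for a normal MultiBench script, not a real example.

# Minimal sketch; the model and loops below are placeholders.
import torch
from torch import nn

from eval_scripts.complexity import all_in_one_train, all_in_one_test

model = nn.Linear(10, 2)

def trainprocess():
    # Stand-in for a normal MultiBench training loop.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(100):
        loss = model(torch.randn(8, 10)).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

def testprocess():
    # Stand-in for a normal MultiBench testing loop.
    with torch.no_grad():
        model(torch.randn(8, 10))

all_in_one_train(trainprocess, [model])  # prints total runtime, peak memory, and number of parameters
all_in_one_test(testprocess, [model])    # prints total runtime and number of parameters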

Robustness

Modality-specific and multimodal imperfection implementations are under robustness/, organized by modality. We have a script (eval_scripts/robustness.py) that reports robustness metrics for testing on data with modality-specific and multimodal imperfections. It also plots the performance-imperfection curve and saves it to the default directory.

All robustness experiments are now integrated into the standard training/testing scripts.

We visualize the experiment results using two metrics, relative and effective robustness, as well as a combination of both. These plots indicate the tradeoff between accuracy and robustness.

Patch Note / Major Updates

6/11/2021: Refactored some code. Specifically, we replaced the Simple_Early_Fusion, Simple_Late_Fusion, MVAE, MFM, CCA, and Contrastive training structures with the new Supervised_Learning training structure, and modified some examples/ files accordingly. We also integrated the dataloaders and testing scripts for robustness experiments into the regular ones. The deprecated training structures as well as their examples can be found in the deprecated_training_structures/ and deprecated_examples/ folders. The deprecated dataloaders and testing scripts specifically for robustness can be found in the deprecated_dataloaders/ and deprecated_examples_robust/ folders.

7/9/2021: Added support for Clotho (audio captioning), Yummly-28K (image-text retrieval), RTFM (language-guided reinforcement learning). We plan to use this as a starting point to gradually expand our repo to include QA, retrieval, generative, and RL tasks as well.

multibench's People

Contributors

arav-agarwal2, js0nwu, kapikantzari, lvyiwei1, mrbeann, neal-ztwu, peter-yh-wu, pliang279, sfanxiang, vanvan2017

multibench's Issues

Question regarding the DHG-14/28 dataset

Hello. I wished to open a PR sometime to add support for the DHG-14/28 dataset [ site | paper ]. It's a challenging dynamic hand-gesture recognition dataset consisting of three modalities:

  • Depth videos / sequences of 16-bit depth-maps, at resolution 640x480
  • Sequences of 2D skeleton coordinates (in the image space) of 22 hand joints (frames, 22*2)
  • Sequences of 3D skeleton coordinates (in the world space), (frames, 22*3)

However, there's a small issue: the standard evaluation process of this dataset is a bit different from the norm.

There are exactly 2800 data instances in the dataset, performed by 20 unique people. Benchmarks on this dataset are evaluated through a 20-fold, 'leave-one-out' cross validation process. Models are trained 20 times: each time 19 people's data is used for training, while 1 person's data is strictly isolated and used as validation. This prevents any data leakage, and is supposed to increase the robustness of the evaluation.

The instructions in MultiBench mention implementing get_dataloader and having it return 3 dataloaders for train, val, and test respectively. However, there is no test set in this dataset, but rather 20 combinations of train and val.

Would it be okay to implement it in such a way that it returns training and validation dataloaders only?

Some algorithms hang while running.

I tested avmnist with different algorithms, but some of them hang while running, e.g., unimodal_1 (the strange thing is that unimodal_0 is fine), MFM, and CCA.

EMAP evaluation

Hi There!

Thanks for your work in putting together MultiBench --- this benchmark seems quite promising! I got a google scholar ping from your arxiv paper about the potential inclusion of Empirical Multimodally Additive Projections (EMAP) as a means of evaluating whether or not algorithms are using multi-modal interactions to get better accuracy or not. I'm one of the authors of that paper, and after seeing your RFC, wanted to reach out. Don't hesitate to let me know if I can be helpful implementation-wise for that potential addition!

Jack

Errors running mmimdb examples

First off, grateful for the repo and hats off to the tremendous effort that went into building this.

When experimenting with one of the given examples, MultiBench/examples/multimedia/mmimdb_simple_early_fusion.py, I am facing multiple errors.

  1. The files vgg.tar, synset_words.txt, and GoogleNews-vectors-negative300.bin.gz are required to run the get_dataloader function imported via from datasets.imdb.get_data import get_dataloader and to initialize the class imported via from .vgg import VGGClassifier. These are loaded locally in the authors' source code but are not available in the git repo. This makes it hard for developers like me to run tests and experiment with the repo.

I would also like to point out that the blocks package isn't listed in the environment.yml file, so it had to be installed separately. If possible, please share the above files so I can run experiments for my project as well.

Questions about the video encodings of the mosi and mosei datasets

Thank you for writing a brilliant paper and providing a convenient code repository to reproduce the results. I have gone through the repo and the paper, but I still have questions about the implemented datasets and dataloaders.

Could you please lend some time to elucidate the following questions about the datasets?

  1. For the MOSEI dataset, the encodings for a datapoint are of size 713. I understand that these features are obtained from the OpenFace and Facet libraries, but could you tell us which components/indices in the encodings are obtained from which library?

  2. For the MOSI dataset, the encodings are only of size 35. It seems that only the Facet features are provided for this dataset. Is there a reason why the other (OpenFace) features are not used/provided, as in MOSEI?

  3. Are you fine-tuning on the training data of MOSI/MOSEI to obtain the video encodings?

Thank you again for your efforts. Your answers would save many hours banging our heads around the code.

[QUESTION] Availability of trained models

Hello everyone :)

First, congratulations on this amazing work and benchmark! It's really, really huge! Second, I was wondering if some already trained models could potentially be shared (e.g., best state of the art, specific to some domains, etc.).

Again, a huge congrats!!

Léo

Suggestion about PyTorch version

The authors did not mention it in requirement.txt, but after testing, torch 2.0 and the corresponding version of torchtext turn out to be necessary.

Question about the mortality label on im.pk file

First, thanks for replying to my last question. But I still have a few questions about the mortality labels in the im.pk file. According to the paper, there are six labels that correspond to six different times of death. So shouldn't only one of these six labels be 1? But when I was debugging, I found that the labels for different times of death sometimes equal 1 at the same time. So I think there is some deviation in my understanding; please give me some guidance. Thank you!

Labels for CMU MOSEI

@pliang279 When I access the CMU-MOSEI labels using mosei.hdf, I get an array of size 7. What labels do each of the array elements correspond to?

requirements for imdb dataset

It seems the imdb dataset uses Theano and blocks; what are the system and version requirements for these? I have tried the official link, but it does not seem to work.
https://blocks.readthedocs.io/en/latest/setup.html

I get an error message like:

File "/tmp/pip-install-sazv5l9d/toolz_ccc093e7dfa34bf2af1fb5c703132aa3/toolz/functoolz.py", line 467
      f.__name__ for f in reversed((self.first,) + self.funcs),
      ^
  SyntaxError: Generator expression must be parenthesized

Question about relative robustness

Hi, I have a question about relative robustness.
There is a function relative_robustness_helper in eval_scripts/robustness.py.
It may be my misunderstanding, but is that function correct?
I think it is necessary to compare with the 'LF' result, but it doesn't do that.
(I checked the paper and there is some explanation about it, but I think it does not match the code.)

Code to obtain features from raw data

Could you please tell us if scripts are available to obtain features from the raw video and audio data? Could you point us to the code or provide us with the code?

Questions about the im.pk data & MFM

How exactly are the im.pk files used in the healthcare examples obtained and preprocessed? For example, what is the specific content of the labels in the im.pk file, what is the specific data of the two modalities in the im.pk file, and what data is recorded?
Finally, I would like to ask what exactly the MFM method mentioned and used in the project is, and what the full name of MFM is.
I would really appreciate your reply! As I have been preparing my project recently, MultiBench has helped me a lot!

Info regarding preprocessed MOSEI

Hi,

Thank you for this amazing repo. I would like to ask for further information about how the MOSEI dataset was preprocessed for the released files in the affective computing part. I was wondering why the sentiment data includes 22,777 datapoints while the whole dataset seems to have 23,453. It would also be useful to include a small README with additional info on how each modality was preprocessed.

What's the meaning of modalities in MUJOCO PUSH dataset?

Hi, I recently tried the MuJoCo Push dataset, but I cannot figure out the concrete meaning of the modalities. The paper mentions:

The multimodal inputs are gray-scaled images (1 × 32 × 32) from an RGB camera, forces (and binary contact information) from a force/torque sensor, and the 3D position of the robot end-effector.

I found that the modalities in the dataset are "control", "image", "sensor", and "pos". What is the correspondence between these modalities and those in the paper (i.e., what is the meaning of these modalities)?

Error while testing with avmnist_simple_late_fusion.

Firstly, thank you for this great repo.
I tried to run examples/multimedia/avmnist_simple_late_fusion.py. The training procedure is all right, but when it runs the test, it fails with the following error.

single_test(encoder, head, test_dataloaders_all[list(test_dataloaders_all.keys())[0]][0], auprc, modalnum, task, criterion)
AttributeError: 'DataLoader' object has no attribute 'keys'

I guess this is because the refactoring is not complete? How can I fix this? Thanks!

questions about mosei dataset

Hi, thanks for your code.

When I use your dataloader to load the mosei affect dataset [I used the dataset provided in your repo], I found that the batch video data shape is [batchsize, 50, 35] and the batch audio data shape is [batchsize, 50, 74]. What do the 50 and 35 in the video data shape mean? BTW, the regular video batch data format should be [batchsize, channel, clip_length, crop_size, crop_size]; it seems that [batchsize, 50, 35] doesn't follow this format. What is the reason for this?

Thanks!

Question about the mosei dataset

Hi, Thanks for your code.

When I use your code to train a model on the mosei dataset, I find that after 10 epochs the model starts overfitting. Is this normal?

By the way, in your example, you train your model for 1000 epochs. Is this hyperparameter the result of your experiments?

Thanks!

My model is as follows:

encoders=[GRU(35,300,dropout=True,has_padding=False).cuda(), \
    GRU(74,300,dropout=True,has_padding=False).cuda(),\
    GRU(300,300,dropout=True,has_padding=False).cuda()]
head=MLP(300,150,1).cuda()
fusion =add()

Leaderboard

Hello,
I am currently conducting some experiments on CMU-MOSI and CMU-MOSEI using mmsdk, but I would like to use MultiBench too for my research. Where can I find the sota? It seems to me that you are busy with other things than creating a Leaderboard right now, but do you have any suggestions on how to reproduce the state-of-the-art for MultiBench? Are you aware of the state-of-the-art right now?

Extracting info from the H5 files

Hello,

I would be interested to train an audio-only model (or, perhaps, a bimodal audio-text one) using CMU-MOSEI data.

I would be recomputing the audio embeddings.

So I would need only the links to the videos plus the timestamps and the annotated emotions per timestamp range.

How would I go about extracting this information?

Thanks,

Ed

Process mosei_senti_data.pkl to match the text id in mosei.hdf5

If you are using mosei_senti_data.pkl and want to get the raw text by matching the id in mosei.hdf5, please consider using the following script to process the data.

import pickle

import numpy as np
from tqdm import tqdm

file1 = pickle.load(open('data/mosei_senti_data.pkl', 'rb'))

data = file1['test']['id']

# Keep the first element of each id and append an occurrence counter.
modified_data = []
counters = {}
for element in tqdm(data, desc="Processing elements"):
    key = element[0]
    if key not in counters:
        counters[key] = 0
    modified_data.append(f"{key}[{counters[key]}]")
    counters[key] += 1

file1['test']['id'] = np.array(modified_data)

with open('data/mosei_new.pkl', 'wb') as f:
    pickle.dump(file1, f)

print('all done!')
