Librispeech960 Pretrained Model (ssast issue, open)

JDRanpariya commented on August 20, 2024

Librispeech960 Pretrained Model


Comments (3)

JDRanpariya commented on August 20, 2024

How much data should I use for fine-tuning to get decent results and avoid early overfitting? Would 50 files of 5 seconds each be enough for training? What's the general rule for dealing with overfitting in transformers? Do we really need more fine-tuning data, or is it a matter of hyperparameters?


YuanGongND commented on August 20, 2024

Hi there,

> Hey, I'm curious why you don't have a Librispeech960 model pretrained with frame-based input. I saw you recommending frame-based models. Do you have a frame-based model pretrained on Librispeech?

We do have an AudioSet+Librispeech pretrained checkpoint for frame-based AST; see https://github.com/YuanGongND/ssast#pretrained-models. One conclusion of our ablation study is that this checkpoint performs better than a model pretrained solely on Librispeech, even on speech tasks.

Note that by speech tasks we do not mean ASR but speech classification, e.g., command recognition, emotion recognition, etc.
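To make the frame/patch distinction concrete, here is an illustrative sketch (not code from the ssast repo or from this thread) that counts how many tokens each scheme produces for one spectrogram. The shapes are assumptions: 16×16 patches for patch-based models and 128×2 frames for frame-based ones, which we believe match the `fshape`/`tshape` settings described in the ssast README, plus a hypothetical 128×1024 input. Check the README before relying on these numbers.

```python
def num_tokens(n_freq: int, n_time: int, fshape: int, tshape: int,
               fstride: int, tstride: int) -> int:
    """Count tokens produced by sliding an fshape x tshape window over a
    (n_freq x n_time) spectrogram with the given strides and no padding,
    i.e. the output size of a Conv2d patch embedding."""
    f_steps = (n_freq - fshape) // fstride + 1
    t_steps = (n_time - tshape) // tstride + 1
    return f_steps * t_steps


# Hypothetical input: 128 mel bins x 1024 time frames.
N_FREQ, N_TIME = 128, 1024

# Patch-based (assumed 16x16, non-overlapping): squares tile frequency and time.
patch_tokens = num_tokens(N_FREQ, N_TIME, fshape=16, tshape=16, fstride=16, tstride=16)

# Frame-based (assumed 128x2, non-overlapping): each token spans every mel bin
# but only 2 time frames.
frame_tokens = num_tokens(N_FREQ, N_TIME, fshape=128, tshape=2, fstride=128, tstride=2)

print(f"patch-based: {patch_tokens} tokens, frame-based: {frame_tokens} tokens")
# prints "patch-based: 512 tokens, frame-based: 512 tokens"
```

Under these assumptions both schemes yield the same sequence length; what differs is the region each token covers (every mel bin over 2 frames versus a 16×16 time-frequency square).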

> How much data should I use for fine-tuning to get decent results and avoid early overfitting? Would 50 files of 5 seconds each be enough for training? What's the general rule for dealing with overfitting in transformers? Do we really need more fine-tuning data, or is it a matter of hyperparameters?

It is hard to estimate because there are many factors (e.g., how many classes there are and how easy the sounds are to separate). You would need to experiment, but 50 files is a very small number. The smallest dataset we tested is ESC-50 (50 classes with 40 samples each, 2,000 samples in total).
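The size gap can be put into numbers with a quick back-of-the-envelope comparison. This is an illustrative sketch, not part of the original reply: the ESC-50 counts come from the reply above, the 5 s clip length is ESC-50's standard clip duration, and the 5-class split of the 50 files is a hypothetical assumption, since the question doesn't say how many classes there are.

```python
# ESC-50 figures quoted in the reply: 50 classes x 40 clips each.
# Every ESC-50 clip is 5 seconds long.
ESC50_CLASSES, ESC50_CLIPS_PER_CLASS, ESC50_CLIP_SEC = 50, 40, 5.0


def summarize(n_clips: int, clip_sec: float, n_classes: int) -> dict:
    """Return total clips, total audio in hours, and average clips per class."""
    return {
        "clips": n_clips,
        "hours": n_clips * clip_sec / 3600,
        "clips_per_class": n_clips / n_classes,
    }


esc50 = summarize(ESC50_CLASSES * ESC50_CLIPS_PER_CLASS, ESC50_CLIP_SEC, ESC50_CLASSES)
# Hypothetical target set: 50 five-second files; the 5-class split is an assumption.
target = summarize(50, 5.0, n_classes=5)

for name, s in (("ESC-50", esc50), ("target", target)):
    print(f"{name}: {s['clips']} clips, {s['hours']:.2f} h, "
          f"{s['clips_per_class']:.0f} clips/class")
print(f"ESC-50 has {esc50['clips'] // target['clips']}x as many clips")
```

Even with only 5 assumed classes, the target set has a quarter of ESC-50's clips per class and a fortieth of its total audio.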

-Yuan


JDRanpariya commented on August 20, 2024

Hey, thanks Yuan,

Nice answer. I think I now understand which factors to consider when deciding on the smallest viable dataset. Kudos!

I appreciate your answer about the Librispeech model; I should have framed the question a little differently. Anyway, from what I understand, the Frame-400 model pretrained on both AudioSet and Librispeech should perform better than the others for speech classification.

Looking at the ablation study in the paper and Table 2, I can't tell whether the Librispeech-only model was pretrained with patches or frames. From Table 5, I can see that the Librispeech-only model was paired with patches.

It would be nice to see benchmarks for a Librispeech-only frame-based model on speech tasks; I just can't find them in either the paper or the GitHub README. Apologies for the inconvenience.

Best Regards,
Jaydeep
