Invalid compatible shapes caused by op LogicalAnd in triplet_loss.py about speaker-recognition-papers HOT 4 CLOSED

bjfu-ai-institute commented on June 11, 2024

Invalid compatible shapes caused by op LogicalAnd in triplet_loss.py

from speaker-recognition-papers.

Comments (4)

vzxxbacq commented on June 11, 2024

Hello @pranoot, I'm glad that my code is helpful to you.

In fact, this error is caused by one-hot encoding of the labels. It has been solved by adding np.argmax(labels, axis=1). Contact me if you still have questions.

Thanks for your report.
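
For readers hitting the same error, here is a minimal sketch of what that fix does. It assumes the labels arrive as a one-hot matrix of shape (batch, num_speakers); the array contents and the names one_hot_labels, class_ids and same_speaker are illustrative and not taken from triplet_loss.py. The pairwise mask at the end only shows why 1-D integer labels are convenient for triplet mining; the repository's actual masking code may differ.

```python
import numpy as np

# Hypothetical batch of 4 utterances from 3 speakers, one-hot encoded.
one_hot_labels = np.array([
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
])

# The fix from the comment above: collapse each one-hot row to a class index.
class_ids = np.argmax(one_hot_labels, axis=1)
print(class_ids.shape)  # (4,) instead of (4, 3)

# With 1-D labels, a pairwise "same speaker" mask is a plain broadcasted
# comparison with shape (batch, batch). This is an illustration only.
same_speaker = class_ids[:, None] == class_ids[None, :]
print(same_speaker.shape)  # (4, 4)
```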

pranoot commented on June 11, 2024

Hello @vzxxbacq ,

Thanks for the reply and the fix. Both were helpful, and the script now runs without any errors.

Now I am trying to train the same model on my own dataset.
I have a few questions; could you please help me with them?

For preprocessing and feature extraction in Deep Speaker, I used ext_fbank_feature, as specified in the paper. It contains SLIDE_WINDOW. Is it required, and if so, what shape should it have?
The shape of the input placeholder is (None, 100, 64, 1). Is 100 the number of frames in an individual clip, and 64 the number of filterbank coefficients?
What should the duration of each individual training clip be?
Should a clip contain silence along with the utterances, or should it contain no silence at all?

Thank you very much! :)

vzxxbacq commented on June 11, 2024

Hi @pranoot ,
Actually, once you understand the slide_window function, all of these questions are answered. In the paper, the authors use fixed-length audio (10ms), but many of our datasets have variable-length audio, so I wrote slide_window to apply the model to those datasets. The slide_window parameter is a list [l, r], and the function computes feature[i] = feature[i-l : i+r]. So 100 corresponds to slide_window=[49, 50], i.e. feature[i] = feature[i-49 : i+50], and 64 is the feature dimension used in the paper.
I did not write a VAD method; I used another toolkit for that step.
If you still have questions, feel free to contact me.
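
The windowing described above can be sketched roughly as follows. This is not the repository's slide_window implementation: the function name slide_window_sketch, the edge padding at the start and end of the clip, and the random stand-in features are all assumptions made for illustration. The range i-49 .. i+50 is read as inclusive here, which is what yields 100 frames per window.

```python
import numpy as np

def slide_window_sketch(features, left, right):
    """Stack frames i-left .. i+right (inclusive) around every frame i.

    features: array of shape (num_frames, feat_dim)
    returns:  array of shape (num_frames, left + 1 + right, feat_dim)
    """
    num_frames = features.shape[0]
    # Assumption: repeat the first/last frame so edge windows are full length.
    padded = np.pad(features, ((left, right), (0, 0)), mode="edge")
    width = left + 1 + right
    # padded[i + left] is the original frame i, so this slice is centred on it.
    return np.stack([padded[i:i + width] for i in range(num_frames)])

# Stand-in for 64-dimensional filterbank features of a 300-frame clip.
fbank = np.random.rand(300, 64).astype(np.float32)

windows = slide_window_sketch(fbank, 49, 50)

# Add the trailing channel axis expected by a (None, 100, 64, 1) placeholder.
batch = windows[..., np.newaxis]
print(batch.shape)  # (300, 100, 64, 1)
```

With this reading, a clip of any length becomes a batch of fixed-size (100, 64, 1) inputs, one per frame, which is how variable-length recordings can feed a fixed-shape placeholder.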

pranoot commented on June 11, 2024

Oh cool, got it!
Thanks a lot! :D
