
Comments (4)

vzxxbacq commented on June 11, 2024

Hello @pranoot, I'm glad that my code is helpful to you.

This error is caused by the one-hot encoding of the labels. It has been fixed by adding np.argmax(labels, axis=1). Contact me if you still have questions.
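For readers hitting the same problem, here is a minimal sketch of what that conversion does. The original error message is not shown in this thread, so the sketch assumes that `labels` is a 2-D one-hot array and that the code downstream expects one integer class index per example. The array contents are made up for illustration.

```python
import numpy as np

# Hypothetical one-hot labels: 3 examples, 3 speakers (illustrative data only).
one_hot_labels = np.array([
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
])

# argmax along axis=1 returns the column index of the 1 in each row,
# turning shape (num_examples, num_classes) into shape (num_examples,).
class_ids = np.argmax(one_hot_labels, axis=1)

print(class_ids)        # [2 0 1]
print(class_ids.shape)  # (3,)
```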

Thanks for your report.

from speaker-recognition-papers.

pranoot commented on June 11, 2024

Hello @vzxxbacq ,

Thanks for the reply and the fix. Both were helpful, and the script now runs without errors.

Now I am trying to train the same model on my own dataset.
I have a few questions; could you please help me with them?

  1. For feature extraction in Deep Speaker preprocessing, I used ext_fbank_feature, as specified in the paper. It includes SLIDE_WINDOW. Is it required, and if so, what shape should it have?
  2. The shape of the input placeholder is (None, 100, 64, 1). Is 100 the number of frames in an individual clip, and 64 the number of filterbank coefficients?
  3. What should the duration of each training clip be?
  4. Should the clips contain silence along with the utterances, or should silence be removed?

Thank you very much! :)


vzxxbacq commented on June 11, 2024

Hi @pranoot ,
Actually, once you understand the slide_window function, these questions answer themselves. In the paper, the authors use fixed-length audio (10 ms), but many of our datasets contain audio of variable length, so I wrote slide_window to apply the model to those datasets. The slide_window parameter is a list [l, r], and the function sets feature[i] = feature[i-l : i+r]. So 100 means feature[i] = feature[i-49 : i+50], that is, slide_window=[49, 50]; counting both endpoints, that is 49 + 1 + 50 = 100 frames. And 64 is the feature dimension used in the paper.
I didn't write a VAD method; I used another toolkit for that part.
If you still have questions, feel free to contact me.
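The windowing described above can be sketched as follows. This is not the repository's slide_window implementation, only an illustration under stated assumptions: the features form a (num_frames, 64) NumPy array, the window covers frames i-l through i+r inclusive, and edge frames are repeated as padding so that every frame gets a full window. The padding strategy in particular is a guess. Note that a plain Python slice feature[i-49 : i+50] excludes its end index and would return only 99 frames, so the sketch uses an explicit width of l + r + 1.

```python
import numpy as np

def slide_window_sketch(features, left, right):
    """Stack frames i-left .. i+right (inclusive) around every frame i.

    Hypothetical helper for illustration; not the repository's function.
    features: array of shape (num_frames, feature_dim)
    returns:  array of shape (num_frames, left + right + 1, feature_dim, 1)
    """
    num_frames = features.shape[0]
    width = left + right + 1
    # Assumption: repeat the first and last frame so edge frames get full context.
    padded = np.pad(features, ((left, right), (0, 0)), mode="edge")
    # padded[i] corresponds to original frame i - left, so this slice
    # covers original frames i - left .. i + right.
    windows = np.stack([padded[i:i + width] for i in range(num_frames)])
    # Add a trailing channel axis to match an input of shape (None, 100, 64, 1).
    return windows[..., np.newaxis]

# Illustrative input: 250 frames of 64 filterbank features (random data).
fbank = np.random.rand(250, 64)
batch = slide_window_sketch(fbank, left=49, right=50)
print(batch.shape)  # (250, 100, 64, 1)
```

Under these assumptions, each utterance yields one 100 x 64 example per frame regardless of its length, which is how a variable-length clip can feed a fixed-size input.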


pranoot commented on June 11, 2024

Oh cool, got it!
Thanks a lot!! :D

