bjfu-ai-institute / speaker-recognition-papers

Share some recent speaker recognition papers and their implementations.

Python 99.35% Shell 0.65%

tensorflow paper-implementations speaker-recognition speaker-verification

speaker-recognition-papers's Issues

Fighting Wolf

I would appreciate it if Fangshen could provide documentation or guidance in Chinese. Because I! Am! Chinese!

How to generate enrollment data for testing CTDNN model?

Hi @vzxxbacq, can you please tell me what enrollment data is? Also, can you walk me through how to test my saved model, which I trained on 100 speakers? Thanks!
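
For context, in speaker verification "enrollment data" usually means a few utterances from each known speaker that are used to build a reference for that speaker; test utterances are then scored against that reference. A minimal sketch of the idea follows, assuming fixed-length embeddings have already been taken from a trained model (random vectors stand in for them here). The `enroll` and `verify` helpers and the 0.5 threshold are hypothetical and are not part of this repository.

```python
import numpy as np

def enroll(embeddings):
    """Average a speaker's enrollment embeddings into one unit-length reference."""
    mean = np.mean(embeddings, axis=0)
    return mean / np.linalg.norm(mean)

def verify(reference, test_embedding, threshold=0.5):
    """Score a test embedding against a reference with cosine similarity."""
    test = test_embedding / np.linalg.norm(test_embedding)
    score = float(np.dot(reference, test))
    return score, score >= threshold

# Stand-in data: five noisy enrollment embeddings around one speaker "center".
rng = np.random.RandomState(0)
speaker_center = rng.randn(128)
enrollment = speaker_center + 0.1 * rng.randn(5, 128)

reference = enroll(enrollment)
score, accepted = verify(reference, speaker_center + 0.1 * rng.randn(128))
print(score, accepted)
```

With a real model, the random vectors would be replaced by embeddings computed from enrollment and test audio, and the threshold would be tuned on held-out trials.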

CTDNN approach doubt

Hi Fang,

Thanks for implementing these research papers. It means a lot.

I would like to know what 9, 40, 1 represent in this line of code.
How can I use an MFCC feature extractor to create this array?

tf.placeholder(tf.float32, shape=[None, 9, 40, 1], name='pred_x')

Thanks
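
A note on the shape, as a sketch rather than the author's confirmed pipeline: a later issue on this page describes the CTDNN input as a sliding window of size 9 over 40 f-bank dimensions, which would make `[None, 9, 40, 1]` read as (batch, context frames, feature dimensions, channels). Assuming a per-utterance feature matrix of shape `(T, 40)` from some extractor (the extractor itself is not shown, and `frames_to_windows` is a hypothetical helper), the windows could be stacked like this:

```python
import numpy as np

def frames_to_windows(features, context=9):
    """Stack overlapping windows of `context` consecutive frames.

    features: array of shape (T, D), one row per frame.
    Returns an array of shape (T - context + 1, context, D, 1).
    """
    features = np.asarray(features, dtype=np.float32)
    n_frames, n_dims = features.shape
    if n_frames < context:
        raise ValueError("need at least `context` frames")
    n_windows = n_frames - context + 1
    # Row i of `index` holds the frame indices i, i+1, ..., i+context-1.
    index = np.arange(n_windows)[:, None] + np.arange(context)[None, :]
    windows = features[index]            # shape (n_windows, context, D)
    return windows[..., np.newaxis]      # add the trailing channel axis

# Stand-in for real features: 100 frames of 40 coefficients each.
fake_features = np.random.random((100, 40))
pred_x = frames_to_windows(fake_features)
print(pred_x.shape)  # (92, 9, 40, 1)
```

An array built this way has a shape compatible with the placeholder above; whether the model expects MFCCs or log filterbank energies in the 40-dimensional axis is not stated in the issue.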

Incompatible shapes caused by op LogicalAnd in triplet_loss.py

Hi @vzxxbacq ,

Thanks a lot for sharing the work and code for Deep Speaker. I was trying to run the script deep_speaker.py and encountered an error. I have changed the batch size to 4 and am not using a GPU for training. The _main() function is as follows:

def _main():
    """
    Test model.
    """
    from pyasv.data_manage import DataManage
    from pyasv import Config
    import sys
    sys.path.append("../..")
    config = Config(name='deepspeaker', n_speaker=10, batch_size=4, n_gpu=0, max_step=20, is_big_dataset=False,
                 learning_rate=0.01, save_path='./dataset/save', conv_weight_decay=0.01, fc_weight_decay=0.01, bn_epsilon=1e-3 )
    x = np.random.random([120, 100, 64, 1])
    y = np.random.randint(0, 10, [120, 1])
    train = DataManage(x, y, config)

    x = np.random.random([64, 100, 64, 1])
    y = np.random.randint(0, 10, [64, 1])
    validation = DataManage(x, y, config)

    run(config, train, validation)

The error I encountered is as follows:

  File "deep_speaker.py", line 512, in <module>
    _main()
  File "deep_speaker.py", line 508, in _main
    run(config, train, validation)
  File "deep_speaker.py", line 459, in run
    _no_gpu(config, train, validation)
  File "deep_speaker.py", line 247, in _no_gpu
    feed_dict={x: batch_x, y: batch_y})
  File "/Users/Desktop/speaker_verification/speaker_env/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/Users/Desktop/speaker_verification/speaker_env/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1128, in _run
    feed_dict_tensor, options, run_metadata)
  File "/Users/Desktop/speaker_verification/speaker_env/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1344, in _do_run
    options, run_metadata)
  File "/Users/Desktop/speaker_verification/speaker_env/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1363, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [4,4] vs. [4,4,10]
	 [[Node: LogicalAnd = LogicalAnd[_device="/job:localhost/replica:0/task:0/device:CPU:0"](LogicalNot, Equal_1)]]

Caused by op u'LogicalAnd', defined at:
  File "deep_speaker.py", line 512, in <module>
    _main()
  File "deep_speaker.py", line 508, in _main
    run(config, train, validation)
  File "deep_speaker.py", line 459, in run
    _no_gpu(config, train, validation)
  File "deep_speaker.py", line 223, in _no_gpu
    model = DeepSpeaker(config=config, x=x, y=y)
  File "deep_speaker.py", line 49, in __init__
    self._build_train_graph(x, y)
  File "deep_speaker.py", line 80, in _build_train_graph
    self._loss = self._triplet_loss(output, y)
  File "deep_speaker.py", line 142, in _triplet_loss
    loss = triplet_loss.batch_hard_triplet_loss(targets, inp, 1.0)
  File "/speaker-recognition-papers/pyasv/loss/triplet_loss.py", line 239, in batch_hard_triplet_loss
    mask_anchor_positive = _get_anchor_positive_triplet_mask(labels)
  File "/speaker-recognition-papers/pyasv/loss/triplet_loss.py", line 94, in _get_anchor_positive_triplet_mask
    mask = tf.logical_and(indices_not_equal, labels_equal)
  File "/Users/Desktop/speaker_verification/speaker_env/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2401, in logical_and
    "LogicalAnd", x=x, y=y, name=name)
  File "/Users/Desktop/speaker_verification/speaker_env/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/Users/Desktop/speaker_verification/speaker_env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
    op_def=op_def)
  File "/Users/Desktop/speaker_verification/speaker_env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Incompatible shapes: [4,4] vs. [4,4,10]

I also tried different numbers of speakers, but it still gives me this error. Can you please let me know why this happens?

Thanks a lot!
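
One possible reading of the shapes, offered as a guess rather than a confirmed diagnosis: `[4,4]` matches a batch-by-batch mask, while `[4,4,10]` is what a pairwise label comparison produces when the labels arrive as one-hot rows of width `n_speaker=10` instead of a flat vector of integer IDs. The NumPy sketch below mimics such a comparison. It assumes the mask is built by broadcasting the labels against themselves, which has not been checked against `triplet_loss.py`, and `pairwise_label_equal` is a hypothetical name.

```python
import numpy as np

def pairwise_label_equal(labels):
    """Compare every label in the batch with every other label by broadcasting."""
    return np.equal(np.expand_dims(labels, 0), np.expand_dims(labels, 1))

batch_size, n_speaker = 4, 10
ids = np.array([3, 7, 3, 1])                       # integer speaker IDs
one_hot = np.eye(n_speaker, dtype=np.int64)[ids]   # shape (4, 10)

# One-hot labels keep the class axis, so the comparison is 3-D.
print(pairwise_label_equal(one_hot).shape)   # (4, 4, 10)

# Collapsing one-hot rows to integer IDs first gives the expected 2-D mask.
flat = np.argmax(one_hot, axis=1)
mask = pairwise_label_equal(flat)
print(mask.shape)                            # (4, 4)
```

If this is indeed the cause, passing integer labels to the loss (for example via an argmax over the one-hot axis) would be one thing to try.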

Using DataManage4BigData class

Hello @vzxxbacq ,

I want to use the class DataManage4BigData but am stuck on a few questions. I have listed them below; can you please help me with them?

There is a parameter 'split_type' for the class. What should its value be? Should it be 'train' or 'validation' depending on the type of clips we are using?
There is a write-file method that uses the extracted features, but since my dataset is huge, it raises a MemoryError once I extract features for the whole dataset. Am I doing something wrong when training with a large dataset?

Thank you for your help! :)
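
On the memory error, one common workaround (a generic sketch, not the `DataManage4BigData` API) is to extract and write features one shard at a time, so that only a single shard is held in memory at once. Everything named below (`extract_features`, `write_shards`, the shard file naming) is hypothetical; the placeholder extractor returns random values and would be replaced by real feature extraction.

```python
import os
import tempfile
import numpy as np

def extract_features(path):
    """Placeholder extractor: returns a (100, 64) random array for any clip."""
    return np.random.random((100, 64)).astype(np.float32)

def write_shards(clip_paths, labels, out_dir, shard_size=32):
    """Extract features shard by shard and save each shard to its own .npz file."""
    shard_files = []
    for start in range(0, len(clip_paths), shard_size):
        paths = clip_paths[start:start + shard_size]
        # Only the features of the current shard exist in memory here.
        x = np.stack([extract_features(p) for p in paths])
        y = np.asarray(labels[start:start + shard_size])
        name = os.path.join(out_dir, "shard_%05d.npz" % (start // shard_size))
        np.savez(name, x=x, y=y)
        shard_files.append(name)
    return shard_files

with tempfile.TemporaryDirectory() as out_dir:
    clips = ["clip_%d.wav" % i for i in range(70)]   # made-up file names
    speakers = [i % 10 for i in range(70)]
    files = write_shards(clips, speakers, out_dir)
    with np.load(files[-1]) as last:
        print(len(files), last["x"].shape)   # 3 (6, 100, 64)
```

Training code can then load one shard file at a time instead of the whole feature set.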

ModuleNotFoundError: No module named 'pyasv.data_manage'

According to the documentation, data_manage is present in the repo, yet importing pyasv.data_manage raises this error.

Question

Why does the config.py file in the pyasv folder differ from the contents of the config.html page provided in the html folder?

Low validation accuracy while training for 50 speakers

Hi,

First of all, thank you for sharing your implementation of the CTDNN model for ASV. I have been trying to use your code to train a 50-speaker model. However, I am unable to achieve a validation accuracy above 2.7% no matter what I do. I have tried tuning parameters and customizing the loss and optimizer, but to no effect. I have been trying to replicate the inputs described in the CTDNN paper, with a sliding window of size 9 and 40 f-bank dimensions. I am calculating validation accuracy using Jaccard similarity. The loss doesn't decrease even after training for a long time. It would be great if you could share a pretrained model with me, or guide me in reproducing the results you got with your implementation.

Validation Accuracy low for Deep Speaker Model

Hi @vzxxbacq

I have been trying to train my model for 18 speakers, but the validation accuracy is really low. I have tried various architectures, but both training and validation accuracy stay low and the model does not converge well. Can you please help me out and tell me what is going wrong?

Thank you
