bjfu-ai-institute / speaker-recognition-papers
Share some recent speaker recognition papers and their implementations.
I would appreciate it if Fangshen could provide documentation or guidance in Chinese. Because I! Am! Chinese!
Hi @vzxxbacq, can you please tell me what enrollment data is? Also, can you walk me through how to test my saved model, which I trained for 100 speakers? Thanks!
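For readers with the same question: in speaker verification, enrollment data usually means the utterances used to register a speaker, so that later test utterances can be scored against them. The sketch below illustrates that general idea with NumPy only. It is not the pyasv API: the function names, the 64-dimensional embeddings, the 0.5 threshold, and the random vectors standing in for model outputs are all assumptions made for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def enroll(embeddings):
    """Average a speaker's enrollment embeddings into one reference vector."""
    return np.mean(embeddings, axis=0)

def verify(reference, test_embedding, threshold=0.5):
    """Score a test embedding against the reference; threshold is hypothetical."""
    score = cosine_similarity(reference, test_embedding)
    return score, score >= threshold

# Random vectors stand in for embeddings that a trained model would produce.
rng = np.random.RandomState(0)
speaker_direction = rng.randn(64)
enrollment = speaker_direction + 0.1 * rng.randn(5, 64)  # 5 enrollment utterances
reference = enroll(enrollment)

same_speaker = speaker_direction + 0.1 * rng.randn(64)   # new utterance, same speaker
other_speaker = rng.randn(64)                            # unrelated speaker

score_same, accepted_same = verify(reference, same_speaker)
score_other, accepted_other = verify(reference, other_speaker)
print(score_same, accepted_same)
print(score_other, accepted_other)
```

With a real model, the embeddings would come from a forward pass over each utterance, and the threshold would be tuned on held-out trials.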
Hi Fang,
Thanks for implementing these research papers. It means a lot.
I would like to know what 9, 40, 1 represents in the line of code below.
How can I use an MFCC feature extractor to create this array?
tf.placeholder(tf.float32, shape=[None, 9, 40, 1], name='pred_x')
Thanks
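A possible reading of that shape, based on another issue in this list that describes the CTDNN input as a sliding window of size 9 over 40 f-bank dimensions: 9 is the number of context frames, 40 is the number of feature coefficients per frame, and 1 is a channel axis. Under that assumption, a [T, 40] feature matrix can be cut into [N, 9, 40, 1] patches as sketched below. The helper name `stack_context_windows` is hypothetical, the stride of 1 is an assumption, and random numbers stand in for real MFCC or f-bank features.

```python
import numpy as np

def stack_context_windows(features, window=9):
    """Slice a [T, D] feature matrix into overlapping [window, D, 1] patches."""
    n_frames, n_dims = features.shape
    if n_frames < window:
        raise ValueError("need at least `window` frames")
    # One patch per valid start position (stride of 1 frame).
    patches = np.stack(
        [features[i:i + window] for i in range(n_frames - window + 1)]
    )
    # Append the trailing channel axis expected by a [None, 9, 40, 1] placeholder.
    return patches[..., np.newaxis].astype(np.float32)

# Stand-in for the per-frame features of one utterance: 100 frames, 40 coefficients.
features = np.random.random([100, 40])
pred_x = stack_context_windows(features)
print(pred_x.shape)  # (92, 9, 40, 1)
```

The resulting array could then be fed to the placeholder in batches, provided the feature extractor is configured to emit 40 coefficients per frame.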
Hi @vzxxbacq,
Thanks a lot for sharing the work and code for Deep Speaker. I was trying to run the script deep_speaker.py and encountered an error. I changed the batch size to 4 and am not using a GPU for training. The _main() function is as follows:
def _main():
    """
    Test model.
    """
    import numpy as np  # needed for the random test data below
    from pyasv.data_manage import DataManage
    from pyasv import Config
    import sys
    sys.path.append("../..")
    config = Config(name='deepspeaker', n_speaker=10, batch_size=4, n_gpu=0, max_step=20, is_big_dataset=False,
                    learning_rate=0.01, save_path='./dataset/save', conv_weight_decay=0.01, fc_weight_decay=0.01, bn_epsilon=1e-3)
    # Random training data: 120 samples of shape [100, 64, 1], labels in [0, 10).
    x = np.random.random([120, 100, 64, 1])
    y = np.random.randint(0, 10, [120, 1])
    train = DataManage(x, y, config)
    # Random validation data: 64 samples with the same layout.
    x = np.random.random([64, 100, 64, 1])
    y = np.random.randint(0, 10, [64, 1])
    validation = DataManage(x, y, config)
    run(config, train, validation)
The error I encountered is as follows:
  File "deep_speaker.py", line 512, in <module>
    _main()
  File "deep_speaker.py", line 508, in _main
    run(config, train, validation)
  File "deep_speaker.py", line 459, in run
    _no_gpu(config, train, validation)
  File "deep_speaker.py", line 247, in _no_gpu
    feed_dict={x: batch_x, y: batch_y})
  File "/Users/Desktop/speaker_verification/speaker_env/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/Users/Desktop/speaker_verification/speaker_env/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1128, in _run
    feed_dict_tensor, options, run_metadata)
  File "/Users/Desktop/speaker_verification/speaker_env/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1344, in _do_run
    options, run_metadata)
  File "/Users/Desktop/speaker_verification/speaker_env/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1363, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [4,4] vs. [4,4,10]
  [[Node: LogicalAnd = LogicalAnd[_device="/job:localhost/replica:0/task:0/device:CPU:0"](LogicalNot, Equal_1)]]

Caused by op u'LogicalAnd', defined at:
  File "deep_speaker.py", line 512, in <module>
    _main()
  File "deep_speaker.py", line 508, in _main
    run(config, train, validation)
  File "deep_speaker.py", line 459, in run
    _no_gpu(config, train, validation)
  File "deep_speaker.py", line 223, in _no_gpu
    model = DeepSpeaker(config=config, x=x, y=y)
  File "deep_speaker.py", line 49, in __init__
    self._build_train_graph(x, y)
  File "deep_speaker.py", line 80, in _build_train_graph
    self._loss = self._triplet_loss(output, y)
  File "deep_speaker.py", line 142, in _triplet_loss
    loss = triplet_loss.batch_hard_triplet_loss(targets, inp, 1.0)
  File "/speaker-recognition-papers/pyasv/loss/triplet_loss.py", line 239, in batch_hard_triplet_loss
    mask_anchor_positive = _get_anchor_positive_triplet_mask(labels)
  File "/speaker-recognition-papers/pyasv/loss/triplet_loss.py", line 94, in _get_anchor_positive_triplet_mask
    mask = tf.logical_and(indices_not_equal, labels_equal)
  File "/Users/Desktop/speaker_verification/speaker_env/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2401, in logical_and
    "LogicalAnd", x=x, y=y, name=name)
  File "/Users/Desktop/speaker_verification/speaker_env/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/Users/Desktop/speaker_verification/speaker_env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
    op_def=op_def)
  File "/Users/Desktop/speaker_verification/speaker_env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Incompatible shapes: [4,4] vs. [4,4,10]
I also tried different numbers of speakers, but I still get the same error. Can you please let me know what is causing it?
Thanks a lot!
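The shapes in the error message, [4,4] vs. [4,4,10] with batch_size=4 and n_speaker=10, would be consistent with the labels reaching `_get_anchor_positive_triplet_mask` as one-hot vectors of shape [batch_size, n_speaker] instead of integer ids of shape [batch_size]. This is a guess from the traceback, not a confirmed diagnosis. The NumPy sketch below reproduces the shape arithmetic under the assumption that `labels_equal` is built by comparing the labels against themselves pairwise; the contents of triplet_loss.py are not shown here, so the helper `pairwise_label_equal` is a hypothetical stand-in.

```python
import numpy as np

batch_size, n_speaker = 4, 10
ids = np.array([3, 7, 3, 1])          # integer speaker ids, shape [4]
one_hot = np.eye(n_speaker)[ids]      # one-hot labels, shape [4, 10]

# True everywhere except the diagonal, shape [4, 4] (anchor != positive index).
indices_not_equal = ~np.eye(batch_size, dtype=bool)

def pairwise_label_equal(labels):
    """Compare every label with every other label (assumed mask construction)."""
    return np.equal(labels[np.newaxis], labels[:, np.newaxis])

print(pairwise_label_equal(one_hot).shape)  # (4, 4, 10): clashes with [4, 4]
print(pairwise_label_equal(ids).shape)      # (4, 4): matches indices_not_equal

# Converting one-hot labels back to ids makes the two masks compatible.
ids_recovered = one_hot.argmax(axis=1)
mask = np.logical_and(indices_not_equal, pairwise_label_equal(ids_recovered))
print(mask.shape)
```

If this reading is right, passing integer ids to the loss (for example via an argmax over the one-hot axis) would be one way to avoid the mismatch, but that should be checked against how DataManage encodes `y`.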
Hello @vzxxbacq,
I wanted to use the class DataManage4BigData but I am stuck on a few questions. I have listed them; could you please help me with them?
Thank you for your help! :)
As mentioned in the data_manage documentation present in the repo.
May I ask why the config.py file in the pyasv folder differs from the contents of the config.html page in the html folder?
Hi,
First of all, thank you for sharing your implementation of the CTDNN model for ASV. I have been trying to use your code to train a 50-speaker model. However, I am unable to achieve a validation accuracy above 2.7%. I have tried various parameter settings and customized the loss and optimizer, but to no effect. I have been trying to replicate the inputs described in the CTDNN paper, with a sliding window of size 9 and 40 f-bank dimensions. I am calculating validation accuracy using Jaccard similarity. The loss does not decrease even after training for a long time. It would be really great if you could share a pretrained model with me, or guide me in reproducing the results you got with your implementation.
Hi @vzxxbacq
I have been trying to train my model for 18 speakers, but the validation accuracy is really low. I have tried various architectures, but both training and validation accuracy stay low and the model does not converge well. Can you please help me out and tell me what is going wrong?
Thank you