llsourcell / tensorflow_speech_recognition_demo

This is the code for 'How to Make a Simple Tensorflow Speech Recognizer' by @Sirajology on Youtube

Overview

This is the full code for 'How to Make a Simple Tensorflow Speech Recognizer' by @Sirajology on Youtube. In this demo we use TFLearn, a high-level library built on TensorFlow, to build an LSTM recurrent neural network and train it on a labeled dataset of spoken digits. We then test it on spoken digits.

Dependencies

tflearn (http://tflearn.org/)
tensorflow (https://www.tensorflow.org/versions/r0.12/get_started/os_setup.html)
future

Use pip to install any missing dependencies

Usage

Run the following command in a terminal. Training fully takes a couple of hours.

python demo.py

Challenge

The weekly challenge is from the last video, it's still running! Check it out here

Credits

Credit for the vast majority of code here goes to pannous. I've merely created a wrapper to get people started!

tensorflow_speech_recognition_demo's Issues

TypeError: maybe_download() takes exactly 2 arguments (1 given)

$ python speech_data.py
downloading speech datasets
Traceback (most recent call last):
File "speech_data.py", line 372, in <module>
maybe_download( Source.DIGIT_SPECTROS)
TypeError: maybe_download() takes exactly 2 arguments (1 given)

Feature shape

Hi,

Just wanted to mention that the lstm layer of tflearn accepts features of shape [batch x time x features_dim], whereas the script passes them as [batch x features_dim x time].
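A minimal sketch of the axis swap this issue implies, using NumPy only. The sizes (64 clips, 20 MFCC coefficients, 80 time frames) are assumptions for illustration; the network's input layer would presumably need the matching shape as well.

```python
import numpy as np

# Hypothetical batch laid out as [batch, features_dim, time]:
# 2 clips, 20 MFCC coefficients, 80 time frames (sizes are assumptions).
batch = np.arange(2 * 20 * 80, dtype=np.float32).reshape(2, 20, 80)

# Swap the last two axes so the layout becomes [batch, time, features_dim].
batch_time_major = np.transpose(batch, (0, 2, 1))

print(batch_time_major.shape)  # (2, 80, 20)
```

After the swap, element [b, t, f] of the new array is element [b, f, t] of the old one, so no values are changed, only their order.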

Coordinator stopped with threads still running

I am getting this runtime error. How to solve this?

not working on Windows

Hi,

I get the following error when I run demo.py

for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>

I tried to manually download the data files but the link is not working!

Appreciate your help,

Why is the training loop an infinite loop?

Maybe this question is silly, but I do not get why the training loop is "while 1". Why wouldn't that train for an infinite number of times?

Failed to load the native TensorFlow runtime

Traceback (most recent call last):
File "demo.py", line 2, in <module>
import tflearn
File "/usr/local/lib/python2.7/site-packages/tflearn/__init__.py", line 4, in <module>
from . import config
File "/usr/local/lib/python2.7/site-packages/tflearn/config.py", line 3, in <module>
import tensorflow as tf
File "/usr/local/lib/python2.7/site-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import *
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 72, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 61, in <module>
from tensorflow.python import pywrap_tensorflow
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
_pywrap_tensorflow = swig_import_helper()
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
ImportError: dlopen(/usr/local/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so, 10): Library not loaded: @rpath/libcudart.8.0.dylib
Referenced from: /usr/local/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so
Reason: image not found

Failed to load the native TensorFlow runtime.

See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#import_error

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.

how to use the model to predict?

model.load("tflearn.lstm.model")
_y=model.predict(X)
print( numpy.argmax( Y[63]))
print (numpy.argmax( _y[63]))

The output is:
[ 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0.032069575041532516, 0.0994795486330986, 0.2319323718547821, 0.04906868934631348, 0.11966893821954727, 0.1732122153043747, 0.013387128710746765, 0.07908126711845398, 0.1753220111131668, 0.02677823416888714]

The argmax never seems to match, and I can't find anything useful in the output.
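To make the mismatch concrete, here is a small sketch that takes the argmax of the two vectors quoted above. The helper name argmax is introduced here for illustration; it stands in for numpy.argmax on a 1-D list.

```python
# One-hot label and predicted probabilities, copied from the output above.
label = [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.]
predicted = [0.032069575041532516, 0.0994795486330986, 0.2319323718547821,
             0.04906868934631348, 0.11966893821954727, 0.1732122153043747,
             0.013387128710746765, 0.07908126711845398, 0.1753220111131668,
             0.02677823416888714]


def argmax(values):
    """Return the index of the largest value (stand-in for numpy.argmax)."""
    return max(range(len(values)), key=lambda i: values[i])


print(argmax(label))      # 6
print(argmax(predicted))  # 2
```

The label says "6" while the most probable class is "2" at roughly 0.23, which is what a model that has not yet learned much would produce.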

speech_data fails to properly extract .tar

In the demo, I'm getting

Looking for data spoken_numbers_pcm.tar in data/
Extracting data/spoken_numbers_pcm.tar to data/
Data ready!

and then

FileNotFoundError: [Errno 2] No such file or directory: 'data/spoken_numbers_pcm/'

It successfully creates data/ but the file spoken_numbers_pcm.tar fails to extract, I'm left with the plain tar file in the dir.

I don't think it's a permissions thing. Here is the permissions of the downloaded file:
-rw-r--r-- 1 mm mm 38M Dec 10 11:20 spoken_numbers_pcm.tar
Setting chmod 666 doesn't help, so I don't think that is it.

I'm pretty sure this block in speech_data.maybe_download() is the point of failure:

if os.path.exists(filepath):
    print('Extracting %s to %s' % ( filepath, work_directory))
    os.system('tar xf '+filepath)
    print('Data ready!')

Not sure why it's failing, but I would recommend using the tarfile library for better portability and reliability. Have you looked into using the subprocess library at all? I highly recommend for times when you have to interface with other programs!
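A minimal sketch of the tarfile approach suggested above. The function name extract_archive is hypothetical, not part of speech_data.py. One plausible cause of the failure is that a bare `tar xf` unpacks into the current working directory rather than into data/, so this version passes the target directory explicitly.

```python
import os
import tarfile


def extract_archive(filepath, work_directory):
    """Extract a tar archive into work_directory; return True on success."""
    if not os.path.exists(filepath):
        return False
    with tarfile.open(filepath) as archive:
        # Extract into work_directory instead of whatever the current
        # working directory happens to be. Only use this on archives you trust.
        archive.extractall(path=work_directory)
    return True
```

Because it does not shell out, this also avoids depending on a tar executable being installed.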

System: Python 3.5.2, Jupyter, Mint 18

Cheers,

ValueError: At least two variables have the same name: FullyConnected/W

/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Looking for data spoken_numbers_pcm.tar in data/
Extracting data/spoken_numbers_pcm.tar to data/
Data ready!
loaded batch of 2402 files
WARNING:tensorflow:VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
2018-01-31 12:07:20.026732: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
Traceback (most recent call last):
File "demo.py", line 32, in <module>
model = tflearn.DNN(net, tensorboard_verbose=0)
File "/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tflearn/models/dnn.py", line 65, in __init__
best_val_accuracy=best_val_accuracy)
File "/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tflearn/helpers/trainer.py", line 137, in __init__
allow_empty=True)
File "/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1239, in __init__
self.build()
File "/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1248, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1284, in _build
build_save=build_save, build_restore=build_restore)
File "/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 743, in _build_internal
saveables = self._ValidateAndSliceInputs(names_to_saveables)
File "/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 596, in _ValidateAndSliceInputs
names_to_saveables = BaseSaverBuilder.OpListToDict(names_to_saveables)
File "/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 561, in OpListToDict
name)
ValueError: At least two variables have the same name: FullyConnected/W

This project doesn't work, bad project.

while(1):
means that it will never stop. I have been training for a few days, but it never finishes.
The code is also wrong in another way: the training data is always the initial 64 audio clips. Because X, Y are fetched once and training then runs under while(true), the model is always trained on the same initial 64 clips.
In fact, X, Y = next(batch) should be called inside the while loop, so that each iteration gets the next 64 audio clips to train the model.
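The fix described here can be sketched without any TensorFlow code. The names batch_generator, train, and train_step are hypothetical stand-ins: the generator plays the role of the MFCC batch generator, and train_step the role of one model.fit call. The loop is bounded by training_iters rather than running forever.

```python
def batch_generator(samples, batch_size):
    """Yield successive batches forever, wrapping around at the end."""
    start = 0
    while True:
        batch = [samples[(start + i) % len(samples)] for i in range(batch_size)]
        start = (start + batch_size) % len(samples)
        yield batch


def train(batches, train_step, training_iters):
    """Run a bounded number of steps, fetching a fresh batch each time."""
    for _ in range(training_iters):
        batch = next(batches)  # fetched inside the loop, not once before it
        train_step(batch)


seen = []
train(batch_generator(list(range(10)), batch_size=4), seen.append, training_iters=3)
print(seen)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 1]]
```

Each step now sees different data, and training ends after a known number of iterations.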

Traceback demo.py line 15, and urllib.error.URLError

Hi, I get an error saying urllib.error.URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>. The traceback says:
Traceback (most recent call last): File "demo.py", line 15, in <module> X, Y = next(batch).

The screenshot of my screen in Command Prompt is here.

import skimage.io?

I'm having trouble locating where I can download scikit (I think that's what I need). Can someone point me in the right direction? I'm using python 3.5.2 64-bit windows 10. Thanks! Here is the error message:
Traceback (most recent call last):
File "C:\Users--------\Desktop\listen\demo.py", line 3, in <module>
import speech_data
File "C:\Users--------\Desktop\listen\speech_data.py", line 11, in <module>
import skimage.io # scikit-image
ImportError: No module named 'skimage'

missing predict.py

I wonder where the predict.py file is that was mentioned in https://www.youtube.com/watch?v=u9FPqkuoEJ8 at 6:41 of the video.

'tar' is not recognized as an internal or external command, operable program, or batch file

Como es what? It's throwing this error.

'tar' is not recognized as an internal or external command,
operable program or batch file.
Traceback (most recent call last):
File "", line 1, in
File "C:\Python\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
execfile(filename, namespace)
File "C:\Python\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Jackson Andrews/Irene.py", line 16, in <module>
X, Y = next(batch)
File "C:\Users\Jackson Andrews\speech_data.py", line 164, in mfcc_batch_generator
files = os.listdir(path)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'data/spoken_numbers_pcm/'

How to realise the output prediction

After editing lines #7 and #30, the code gives some output, but as an array like
[[0.08132637 0.09584257 0.0770199 0.0861607 0.10533866 0.12470371
0.1259584 0.10570324 0.08132752 0.11661901].
How do I turn this into a predicted word, and how do I know which audio file each prediction belongs to?
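One way to read such an output row, sketched under the assumption that position i in the row corresponds to digit i (which matches one-hot labels built from the first character of each file name). The helper most_likely_digit is hypothetical.

```python
# Probabilities copied from the output row quoted above.
probabilities = [0.08132637, 0.09584257, 0.0770199, 0.0861607, 0.10533866,
                 0.12470371, 0.1259584, 0.10570324, 0.08132752, 0.11661901]


def most_likely_digit(probs):
    """Return (index of the largest probability, that probability)."""
    index = max(range(len(probs)), key=lambda i: probs[i])
    return index, probs[index]


digit, confidence = most_likely_digit(probabilities)
print(digit, confidence)  # 6 0.1259584
```

Here the answer is "6", but at about 0.13 against a chance level of 0.10 the model is barely better than guessing. Rows of the prediction array are presumably in the same order as the rows of the input batch, so keeping the list of file names alongside the batch tells you which file each row belongs to.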

TypeError in Speech_data.py: maybe_download takes 2 arguments but only one is passed

Note: I am new to Python and machine learning, so I may be missing an obvious issue.

I am getting an error in file "speech_data.py"
TypeError: maybe_download() takes exactly 2 arguments (1 given)

Can someone explain to me why there is only one argument being passed into maybe_download when it is defined with two parameters? If the argument being passed is multivariate, then how do I address the error that I am being given?
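A hedged sketch of what the error means and two ways around it. Both functions are hypothetical stand-ins that only build a string; the real maybe_download in speech_data.py downloads and extracts data. The DATA_DIR value is an assumption based on the data/ paths in the logs on this page.

```python
DATA_DIR = "data/"  # assumed default location


def maybe_download_strict(source, work_directory):
    """Stand-in with two required parameters, like the failing definition."""
    return "%s -> %s" % (source, work_directory)


def maybe_download(source, work_directory=DATA_DIR):
    """Stand-in where the directory has a default, so one argument is enough."""
    return "%s -> %s" % (source, work_directory)


try:
    maybe_download_strict("spoken_numbers_pcm.tar")  # one argument: fails
except TypeError as err:
    print("TypeError:", err)

print(maybe_download_strict("spoken_numbers_pcm.tar", DATA_DIR))  # fix 1: pass both
print(maybe_download("spoken_numbers_pcm.tar"))                   # fix 2: default
```

So the argument is not "multivariate"; the call simply omits the second argument, and either passing the directory at the call site or giving the parameter a default resolves it.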

Can we test a voice which is different from the training samples?

I tried to use test samples that differ from the training samples.
For example, I used this training set (0_Samantha_100.wav, 1_Samantha_100.wav, 2_Samantha_100.wav, 3_Samantha_100.wav, 4_Samantha_100.wav, 5_Samantha_100.wav, 6_Samantha_100.wav, 7_Samantha_100.wav, 8_Samantha_100.wav, 9_Samantha_100.wav) and tested the 0_Samantha_120.wav file.
But the system does not recognize the test voice as "zero".

little change to the project, then it works well

Hello, everyone!
I ran the project, found some problems, and changed the code as below; after that it works well.
1. The mfcc_batch_generator function generates one batch of sound features and labels, but in the training step the data is never refreshed with next() inside the loop. So I added a new function, mfcc_batch_generatorEx, similar to mfcc_batch_generator, to the speech_data.py file:
def mfcc_batch_generatorEx(batch_size=10, source=Source.DIGIT_WAVES, target=Target.digits):
    # Loads every .wav file once and returns all features and labels
    # (batch_size is accepted for compatibility but not used here).
    maybe_download(source, DATA_DIR)
    if target == Target.speaker:
        speakers = get_speakers()
    batch_features = []
    labels = []
    files = os.listdir(path)

    print("loaded batch of %d files" % len(files))
    shuffle(files)
    for wav in files:
        if not wav.endswith(".wav"):
            continue
        wave, sr = librosa.load(path + wav, mono=True)
        if target == Target.speaker:
            label = one_hot_from_item(speaker(wav), speakers)
        elif target == Target.digits:
            label = dense_to_one_hot(int(wav[0]), 10)
        elif target == Target.first_letter:
            label = dense_to_one_hot((ord(wav[0]) - 48) % 32, 32)
        else:
            raise Exception("todo : labels for Target!")
        labels.append(label)
        # Keyword arguments are required by newer librosa releases.
        mfcc = librosa.feature.mfcc(y=wave, sr=sr)
        # print(np.array(mfcc).shape)
        # Zero-pad the time axis to 80 frames (assumes clips are at most 80 frames long).
        mfcc = np.pad(mfcc, ((0, 0), (0, 80 - len(mfcc[0]))), mode='constant', constant_values=0)
        batch_features.append(np.array(mfcc))
    return batch_features, labels

2. In the demo.py file, generate all sound features and labels using the line below:
X, Y = speech_data.mfcc_batch_generatorEx(batch_size)
3. In the training step, use the code below:
with tf.Session() as sess:
    model.fit(trainX, trainY, n_epoch=training_iters)#, validation_set=(testX, testY), show_metric=True,batch_size=batch_size)
    _y = model.predict(X)
    YY = [x.tolist() for x in Y]
    # Compare the predicted class with the labelled class for every sample.
    correct_prediction = tf.equal(tf.argmax(_y, 1), tf.argmax(YY, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print("\n\ncorrect_prediction = ", sess.run(accuracy))

model.save("tflearn.lstm.model")
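The accuracy check in step 3 does not strictly need a TensorFlow session. A sketch of the same comparison in plain NumPy follows; the function name accuracy and the toy data are assumptions for illustration, not part of the project.

```python
import numpy as np


def accuracy(predictions, one_hot_labels):
    """Fraction of rows whose most probable class matches the label."""
    predicted = np.argmax(np.asarray(predictions), axis=1)
    expected = np.argmax(np.asarray(one_hot_labels), axis=1)
    return float(np.mean(predicted == expected))


# Toy data: four samples, three classes; the third row is misclassified.
predictions = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(accuracy(predictions, labels))  # 0.75
```

Note that measuring this on the same data used for training only shows how well the model memorized it; a held-out set would be needed to estimate real performance.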

Open Questions, Failed to load the native TensorFlow runtime

Hi, I have deployed my AI chatbot application on Heroku and I'm getting an error like "Failed to load the native TensorFlow runtime". Why does this happen, and how do I resolve it?

from tensorflow.contrib.rnn.python.ops.core_rnn import static_rnn as _rnn, \ ImportError: No module named core_rnn

$ python demo.py
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
File "demo.py", line 2, in <module>
import tflearn
File "/usr/local/lib/python2.7/dist-packages/tflearn/__init__.py", line 21, in <module>
from .layers import normalization
File "/usr/local/lib/python2.7/dist-packages/tflearn/layers/__init__.py", line 10, in <module>
from .recurrent import lstm, gru, simple_rnn, bidirectional_rnn,
File "/usr/local/lib/python2.7/dist-packages/tflearn/layers/recurrent.py", line 8, in <module>
from tensorflow.contrib.rnn.python.ops.core_rnn import static_rnn as _rnn,
ImportError: No module named core_rnn

LINK NOT WORKING

This link to download data is not working.
http://pannous.net/files/spoken_numbers_pcm.

url not working

Total number of steps while training

What is the total number of training steps?

Failed to load the native TensorFlow runtime.

Traceback (most recent call last):
File "C:\python36\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 41, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "C:\python36\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 35, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "C:\python36\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 30, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "C:\python36\lib\imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "C:\python36\lib\imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\speech\tacotron-master\synthesizer.py", line 3, in <module>
import tensorflow as tf
File "C:\python36\lib\site-packages\tensorflow\__init__.py", line 24, in <module>
from tensorflow.python import *
File "C:\python36\lib\site-packages\tensorflow\python\__init__.py", line 51, in <module>
from tensorflow.python import pywrap_tensorflow
File "C:\python36\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 52, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "C:\python36\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 41, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "C:\python36\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 35, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "C:\python36\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 30, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "C:\python36\lib\imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "C:\python36\lib\imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: DLL load failed: The specified module could not be found.

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.

Training not using GPU

Hey everyone!

I'm trying to train my model with Siraj's code. Unfortunately it takes ages to train (a few days). I have installed TensorFlow with GPU support, but the code is not using any of the GPU's capacity. What am I getting wrong? Any suggestions? In Siraj's video he says that it takes some hours to train fully...

Thanks!
Cheers
julitos

How can I use my voice to predict? Can someone provide some idea? Please

Voice change to text

Can this demo convert voice to text?

Invalid objective: catagorical_crossentropy

While executing demo.py I came across some errors. I solved some of them but I can't solve this one.

Looking for data spoken_numbers_pcm.tar in data/
Extracting data/spoken_numbers_pcm.tar to data/
'tar' is not recognized as an internal or external command,
operable program or batch file.
Data ready!
loaded batch of 2402 files
Traceback (most recent call last):
File "demo.py", line 15, in <module>
net = tflearn.regression(net, optimizer='adam', learning_rate=learning_rate, loss='catagorical_crossentropy')
File "C:\python35\lib\site-packages\tflearn\layers\estimator.py", line 174, in regression
loss = objectives.get(loss)(incoming, placeholder)
File "C:\python35\lib\site-packages\tflearn\objectives.py", line 10, in get
return get_from_module(identifier, globals(), 'objective')
File "C:\python35\lib\site-packages\tflearn\utils.py", line 25, in get_from_module
raise Exception('Invalid ' + str(module_name) + ': ' + str(identifier))
Exception: Invalid objective: catagorical_crossentropy

This occurs every time I run demo.py (note that the data set is already downloaded and saved in the folder "data/").

Can someone tell me how to fix this issue?
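The traceback shows the loss being looked up by name, and the name passed in is spelled 'catagorical_crossentropy'. TFLearn's objective is spelled categorical_crossentropy, so the lookup fails. The sketch below imitates that lookup with a hypothetical registry; it is not TFLearn's actual code.

```python
def categorical_crossentropy(y_pred, y_true):
    """Placeholder objective; the real computation is not the point here."""
    raise NotImplementedError


# Hypothetical registry of known objectives, keyed by name.
OBJECTIVES = {"categorical_crossentropy": categorical_crossentropy}


def get_objective(identifier):
    """Look up an objective by name, failing loudly on unknown names."""
    if identifier not in OBJECTIVES:
        raise Exception("Invalid objective: " + str(identifier))
    return OBJECTIVES[identifier]


try:
    get_objective("catagorical_crossentropy")  # misspelled, as in the traceback
except Exception as err:
    print(err)  # Invalid objective: catagorical_crossentropy

print(get_objective("categorical_crossentropy").__name__)  # categorical_crossentropy
```

Correcting the spelling of the loss argument in demo.py should therefore get past this particular exception.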

Exception in thread Thread-1915

Running python demo.py, I get this error:
Exception in thread Thread-1915:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/tensorflow/.local/lib/python2.7/site-packages/tflearn/data_flow.py", line 240, in wait_for_threads
self.coord.join(self.threads)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 390, in join
" ".join(stragglers))
RuntimeError: Coordinator stopped with threads still running: Thread-1914

Exception in thread Thread-1915

spoken_words.tar

Does anyone have the file spoken_words.tar? I can't get it from the Dropbox link.
