tensorflow_speech_recognition_demo's Introduction

tensorflow_speech_recognition_demo

This is the code for 'How to Make a Simple Tensorflow Speech Recognizer' by @Sirajology on YouTube.

Overview

This is the full code for 'How to Make a Simple Tensorflow Speech Recognizer' by @Sirajology on YouTube. In this demo we build an LSTM recurrent neural network with TFLearn, a high-level library built on TensorFlow, and train it on a labeled dataset of spoken digits. Then we test it on spoken digits.

Dependencies

Use pip to install any missing dependencies

Usage

Run the following command in a terminal. Training fully takes a couple of hours.

python demo.py

Challenge

The weekly challenge is from the last video, and it's still running! Check it out here.

Credits

Credit for the vast majority of code here goes to pannouse. I've merely created a wrapper to get people started!

tensorflow_speech_recognition_demo's People

Contributors

llsourcell

tensorflow_speech_recognition_demo's Issues

Feature shape

Hi,

Just wanted to mention that the lstm layer of tflearn accepts features of shape [batch x time x features_dim], but in the script they are passed as [batch x features_dim x time].
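A minimal sketch of the reordering this issue points at, assuming the batch really is laid out as [batch x features_dim x time]. The 64 x 20 x 80 shape is an assumption for illustration (64 clips per batch and padding to 80 frames appear elsewhere in these issues; 20 is the default number of MFCC coefficients in librosa), and the names batch and batch_time_major are hypothetical.

```python
import numpy as np

# Hypothetical batch: 64 clips, 20 MFCC coefficients, 80 time frames,
# laid out as [batch, features_dim, time].
batch = np.zeros((64, 20, 80), dtype=np.float32)

# Swap the last two axes to get [batch, time, features_dim].
batch_time_major = np.transpose(batch, (0, 2, 1))

print(batch.shape)
print(batch_time_major.shape)
```

If the data is reordered this way, the shape declared for the network's input layer would have to be changed to match.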

not working on Windows

Hi,

I get the following error when I run demo.py

for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>

I tried to manually download the data files but the link is not working!

Appreciate your help,

Failed to load the native TensorFlow runtime

Traceback (most recent call last):
  File "demo.py", line 2, in <module>
    import tflearn
  File "/usr/local/lib/python2.7/site-packages/tflearn/__init__.py", line 4, in <module>
    from . import config
  File "/usr/local/lib/python2.7/site-packages/tflearn/config.py", line 3, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python2.7/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 72, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 61, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
ImportError: dlopen(/usr/local/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so, 10): Library not loaded: @rpath/libcudart.8.0.dylib
  Referenced from: /usr/local/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so
  Reason: image not found

Failed to load the native TensorFlow runtime.

See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#import_error

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.

how to use the model to predict?

model.load("tflearn.lstm.model")
_y=model.predict(X)
print( numpy.argmax( Y[63]))
print (numpy.argmax( _y[63]))

the output is :
[ 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0.032069575041532516, 0.0994795486330986, 0.2319323718547821, 0.04906868934631348, 0.11966893821954727, 0.1732122153043747, 0.013387128710746765, 0.07908126711845398, 0.1753220111131668, 0.02677823416888714]

The argmax never seems to match, and I can't find anything useful in the output.
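One way to check predictions against labels over a whole batch, rather than at a single index, is to compare the argmax of each row. The sketch below treats the label and prediction rows printed in this issue as a one-sample batch; the names labels and predictions are illustrative, not from demo.py.

```python
import numpy as np

# One-hot label row and predicted probabilities, copied from the output above.
labels = np.array([[0., 0., 0., 0., 0., 0., 1., 0., 0., 0.]])
predictions = np.array([[
    0.032069575041532516, 0.0994795486330986, 0.2319323718547821,
    0.04906868934631348, 0.11966893821954727, 0.1732122153043747,
    0.013387128710746765, 0.07908126711845398, 0.1753220111131668,
    0.02677823416888714,
]])

# Index of the largest entry in each row is the digit class.
true_digits = np.argmax(labels, axis=1)
predicted_digits = np.argmax(predictions, axis=1)

# Fraction of rows where the predicted class equals the labeled class.
accuracy = float(np.mean(true_digits == predicted_digits))

print(true_digits, predicted_digits, accuracy)
```

For this row the label is class 6 while the largest predicted probability (about 0.23) sits at class 2, so the two do disagree for this sample.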

speech_data fails to properly extract .tar

In the demo, I'm getting

Looking for data spoken_numbers_pcm.tar in data/
Extracting data/spoken_numbers_pcm.tar to data/
Data ready!

and then

FileNotFoundError: [Errno 2] No such file or directory: 'data/spoken_numbers_pcm/'

It successfully creates data/ but the file spoken_numbers_pcm.tar fails to extract, I'm left with the plain tar file in the dir.

I don't think it's a permissions thing. Here are the permissions of the downloaded file:
-rw-r--r-- 1 mm mm 38M Dec 10 11:20 spoken_numbers_pcm.tar
Setting chmod 666 doesn't help, so I don't think that is it.

I'm pretty sure this block in speech_data.maybe_download() is the point of failure:

if os.path.exists(filepath):
    print('Extracting %s to %s' % ( filepath, work_directory))
    os.system('tar xf '+filepath)
    print('Data ready!')

Not sure why it's failing, but I would recommend using the tarfile library for better portability and reliability. Have you looked into using the subprocess library at all? I highly recommend for times when you have to interface with other programs!
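A minimal sketch of the tarfile approach suggested above. One observation, offered as a possibility rather than a confirmed diagnosis: `tar xf` without a `-C` option extracts into the current working directory, not into work_directory, which could leave data/ holding only the archive. The function name extract_archive is hypothetical and not part of speech_data.py.

```python
import os
import tarfile


def extract_archive(filepath, work_directory):
    """Extract a tar archive into work_directory; return False if it is missing."""
    if not os.path.exists(filepath):
        return False
    print('Extracting %s to %s' % (filepath, work_directory))
    with tarfile.open(filepath) as archive:
        # Only extract archives that come from a source you trust.
        archive.extractall(path=work_directory)
    print('Data ready!')
    return True
```

Because tarfile is part of the standard library, this also avoids depending on a tar executable being installed.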

System: Python 3.5.2, Jupyter, Mint 18

Cheers,

ValueError: At least two variables have the same name: FullyConnected/W

/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Looking for data spoken_numbers_pcm.tar in data/
Extracting data/spoken_numbers_pcm.tar to data/
Data ready!
loaded batch of 2402 files
WARNING:tensorflow:VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
2018-01-31 12:07:20.026732: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
Traceback (most recent call last):
  File "demo.py", line 32, in <module>
    model = tflearn.DNN(net, tensorboard_verbose=0)
  File "/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tflearn/models/dnn.py", line 65, in __init__
    best_val_accuracy=best_val_accuracy)
  File "/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tflearn/helpers/trainer.py", line 137, in __init__
    allow_empty=True)
  File "/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1239, in __init__
    self.build()
  File "/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1248, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1284, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 743, in _build_internal
    saveables = self._ValidateAndSliceInputs(names_to_saveables)
  File "/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 596, in _ValidateAndSliceInputs
    names_to_saveables = BaseSaverBuilder.OpListToDict(names_to_saveables)
  File "/home/mg/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 561, in OpListToDict
    name)
ValueError: At least two variables have the same name: FullyConnected/W

This project doesn't work, bad project.

while(1):
means that it never stops. I have been training for a few days, but it never finishes.
The code is also wrong in some ways. The training data is always the initial 64 audio clips, and since training runs inside while(true), we are always training the model on those same 64 clips.
In fact, X, Y = next(batch) should be called inside the while loop, so that each epoch gets the next 64 audio clips to train the model on.
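The loop structure this issue asks for can be sketched without TFLearn. In the sketch below, batch_generator is a toy stand-in for the demo's generator, and train and fit_one_batch are hypothetical names; the point is only that the batch is fetched inside a bounded loop.

```python
def batch_generator(samples, batch_size):
    """Yield successive batches forever, cycling through samples."""
    while True:
        for start in range(0, len(samples), batch_size):
            yield samples[start:start + batch_size]


def train(batches, fit_one_batch, training_iters):
    """Run a fixed number of iterations, pulling a fresh batch each time."""
    for _ in range(training_iters):
        batch = next(batches)  # fetched inside the loop, so the data changes
        fit_one_batch(batch)


# Record which batches the "model" would have been trained on.
seen = []
train(batch_generator(list(range(10)), 4), seen.append, training_iters=3)
print(seen)
```

Replacing the endless while loop with a fixed iteration count also gives the script a defined end point.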

Traceback demo.py line 15, and urllib.error.URLError

Hi, I get an error saying urllib.error.URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>. I also get a traceback saying:
Traceback (most recent call last): File "demo.py", line 15, in <module> X, Y = next(batch)

The screenshot of my screen in Command Prompt is here.

import skimage.io?

I'm having trouble locating where I can download scikit (I think that's what I need). Can someone point me in the right direction? I'm using python 3.5.2 64-bit windows 10. Thanks! Here is the error message:
Traceback (most recent call last):
  File "C:\Users--------\Desktop\listen\demo.py", line 3, in <module>
    import speech_data
  File "C:\Users--------\Desktop\listen\speech_data.py", line 11, in <module>
    import skimage.io # scikit-image
ImportError: No module named 'skimage'

'tar' is not recognized as an internal or external command, operable program, or batch file

What is this? It's throwing this error.

'tar' is not recognized as an internal or external command,
operable program or batch file.
Traceback (most recent call last):
  File "", line 1, in <module>
  File "C:\Python\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)
  File "C:\Python\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/Jackson Andrews/Irene.py", line 16, in <module>
    X, Y = next(batch)
  File "C:\Users\Jackson Andrews\speech_data.py", line 164, in mfcc_batch_generator
    files = os.listdir(path)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'data/spoken_numbers_pcm/'

How to realise the output prediction

After editing lines 7 and 30, the code gives some output, but as an array like
[[0.08132637 0.09584257 0.0770199 0.0861607 0.10533866 0.12470371
0.1259584 0.10570324 0.08132752 0.11661901]]
So how do I get the predicted word from this, and how do I know which audio file a given prediction belongs to?
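Assuming the network ends in a 10-way softmax over the digits 0-9 (the one-hot labels built from the first character of each file name suggest this), the position of the largest value is the predicted digit. A minimal sketch using the numbers from this issue; the variable names are illustrative.

```python
# Probabilities copied from the output above, one per digit class 0-9.
probabilities = [0.08132637, 0.09584257, 0.0770199, 0.0861607, 0.10533866,
                 0.12470371, 0.1259584, 0.10570324, 0.08132752, 0.11661901]

# The index of the largest probability is the predicted digit.
predicted_digit = max(range(len(probabilities)), key=probabilities.__getitem__)
confidence = probabilities[predicted_digit]

print(predicted_digit, confidence)
```

Here the largest value is at index 6, but every entry is close to 0.1, which is what a uniform guess over ten classes looks like, so this particular prediction carries little confidence. As for matching predictions to audio files, one approach is to keep the list of file names in the same order as the feature arrays passed to the model and index both with the same row number.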

TypeError in Speech_data.py: maybe_download takes 2 arguments but only one is passed

Note: I am new to Python and machine learning, so I may be missing an obvious issue.

I am getting an error in file "speech_data.py"
TypeError: maybe_download() takes exactly 2 arguments (1 given)

Can someone explain to me why there is only one argument being passed into maybe_download when it is defined with two parameters? If the argument being passed is multivariate, then how do I address the error that I am being given?
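The error itself can be reproduced in isolation: Python raises TypeError whenever a function with two required parameters is called with one argument. The sketch below is not the code from speech_data.py; both functions are simplified stand-ins, and the default value "data/" for DATA_DIR is an assumption based on the log output quoted in other issues.

```python
DATA_DIR = "data/"  # assumed value, based on the "in data/" log lines


def maybe_download(file, work_directory):
    """Stand-in with two required parameters; only builds the target path."""
    return work_directory + file


def maybe_download_with_default(file, work_directory=DATA_DIR):
    """Same stand-in, but work_directory falls back to DATA_DIR."""
    return work_directory + file


try:
    maybe_download("spoken_numbers_pcm.tar")  # one argument: raises TypeError
    failed = False
except TypeError as err:
    failed = True
    print("TypeError:", err)

# Either pass both arguments, or give the second parameter a default.
print(maybe_download("spoken_numbers_pcm.tar", DATA_DIR))
print(maybe_download_with_default("spoken_numbers_pcm.tar"))
```

So two possible fixes are to pass the directory explicitly at the call site or to give the second parameter a default value in the definition.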

Can we test a voice which is different from the training samples?

I tried to use test samples that are different from the training samples.
For example, I used this training set (0_Samantha_100.wav, 1_Samantha_100.wav, 2_Samantha_100.wav, 3_Samantha_100.wav, 4_Samantha_100.wav, 5_Samantha_100.wav, 6_Samantha_100.wav, 7_Samantha_100.wav, 8_Samantha_100.wav, 9_Samantha_100.wav) and tested on the 0_Samantha_120.wav file.
But the system does not recognize the test voice as "zero".

little change to the project, then it works well

Hello, everyone!
I ran the project, found some problems, and changed the code as below; now it works well.
1. The mfcc_batch_generator function generates a batch of sound feature data and labels, but in the training step the data is not updated by next() in the loop. So I added a new function, mfcc_batch_generatorEx, similar to mfcc_batch_generator, to the speech_data.py file:
def mfcc_batch_generatorEx(batch_size=10, source=Source.DIGIT_WAVES, target=Target.digits):
    maybe_download(source, DATA_DIR)
    if target == Target.speaker:
        speakers = get_speakers()
    batch_features = []
    labels = []
    files = os.listdir(path)

    print("loaded batch of %d files" % len(files))
    shuffle(files)
    for wav in files:
        if not wav.endswith(".wav"):
            continue
        wave, sr = librosa.load(path+wav, mono=True)
        if target==Target.speaker:
            label=one_hot_from_item(speaker(wav), speakers)
        elif target==Target.digits:
            label=dense_to_one_hot(int(wav[0]),10)
        elif target==Target.first_letter:
            label=dense_to_one_hot((ord(wav[0]) - 48) % 32,32)
        else:
            raise Exception("todo : labels for Target!")
        labels.append(label)
        # keyword arguments work with both older and newer librosa releases
        mfcc = librosa.feature.mfcc(y=wave, sr=sr)
        # print(np.array(mfcc).shape)
        # pad every clip to 80 time frames so all feature arrays have the same shape
        mfcc = np.pad(mfcc,((0,0),(0,80-len(mfcc[0]))), mode='constant', constant_values=0)
        batch_features.append(np.array(mfcc))
    return batch_features, labels

2. In the demo.py file, generate all sound features and labels using the code below:
X, Y = speech_data.mfcc_batch_generatorEx(batch_size)
3. In the training step, use the code below:
with tf.Session() as sess:
    model.fit(trainX, trainY, n_epoch=training_iters)#, validation_set=(testX, testY), show_metric=True,batch_size=batch_size)
    _y = model.predict(X)
    YY = [x.tolist() for x in Y]
    # compare the predicted class with the labeled class for every sample
    correct_prediction = tf.equal(tf.arg_max(_y,1), tf.arg_max(YY,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print("\n\ncorrect_prediction = " , sess.run(accuracy) )

model.save("tflearn.lstm.model")

from tensorflow.contrib.rnn.python.ops.core_rnn import static_rnn as _rnn, \ ImportError: No module named core_rnn

$ python demo.py
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
  File "demo.py", line 2, in <module>
    import tflearn
  File "/usr/local/lib/python2.7/dist-packages/tflearn/__init__.py", line 21, in <module>
    from .layers import normalization
  File "/usr/local/lib/python2.7/dist-packages/tflearn/layers/__init__.py", line 10, in <module>
    from .recurrent import lstm, gru, simple_rnn, bidirectional_rnn, \
  File "/usr/local/lib/python2.7/dist-packages/tflearn/layers/recurrent.py", line 8, in <module>
    from tensorflow.contrib.rnn.python.ops.core_rnn import static_rnn as _rnn, \
ImportError: No module named core_rnn

Failed to load the native TensorFlow runtime.

Traceback (most recent call last):
  File "C:\python36\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\python36\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 35, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "C:\python36\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 30, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "C:\python36\lib\imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "C:\python36\lib\imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\speech\tacotron-master\synthesizer.py", line 3, in <module>
    import tensorflow as tf
  File "C:\python36\lib\site-packages\tensorflow\__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "C:\python36\lib\site-packages\tensorflow\python\__init__.py", line 51, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "C:\python36\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 52, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "C:\python36\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\python36\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 35, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "C:\python36\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 30, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "C:\python36\lib\imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "C:\python36\lib\imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: DLL load failed: The specified module could not be found.

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.

Training not using GPU

Hey everyone!

I'm trying to train my model with Siraj's code. Unfortunately it takes ages to train (a few days). I have installed TensorFlow with GPU support, but the code is not using any of the GPU's capacity. What am I getting wrong? Any suggestions? In Siraj's video he says that it takes a few hours to train fully...

Thanks!
Cheers
julitos

Invalid objective: catagorical_crossentropy

While executing demo.py I ran into some errors. I solved some of them, but I can't solve this one.

Looking for data spoken_numbers_pcm.tar in data/
Extracting data/spoken_numbers_pcm.tar to data/
'tar' is not recognized as an internal or external command,
operable program or batch file.
Data ready!
loaded batch of 2402 files
Traceback (most recent call last):
  File "demo.py", line 15, in <module>
    net = tflearn.regression(net, optimizer='adam', learning_rate=learning_rate, loss='catagorical_crossentropy')
  File "C:\python35\lib\site-packages\tflearn\layers\estimator.py", line 174, in regression
    loss = objectives.get(loss)(incoming, placeholder)
  File "C:\python35\lib\site-packages\tflearn\objectives.py", line 10, in get
    return get_from_module(identifier, globals(), 'objective')
  File "C:\python35\lib\site-packages\tflearn\utils.py", line 25, in get_from_module
    raise Exception('Invalid ' + str(module_name) + ': ' + str(identifier))
Exception: Invalid objective: catagorical_crossentropy

This occurs every time I run demo.py. (Note that the data set is already downloaded and saved in the folder "data/".)

Can someone tell me how to fix this issue?
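The traceback shows the loss being looked up by its string name, and the name passed in is spelled 'catagorical_crossentropy'. TFLearn's objective is named categorical_crossentropy, so correcting the spelling in demo.py is the likely fix. The sketch below only imitates that kind of name lookup to show why a one-letter typo fails; it is not TFLearn's code, and KNOWN_OBJECTIVES and get_objective_name are hypothetical.

```python
import difflib

# Illustrative subset of loss names; not a complete list of TFLearn objectives.
KNOWN_OBJECTIVES = ["categorical_crossentropy", "binary_crossentropy", "mean_square"]


def get_objective_name(identifier):
    """Return identifier if it is known, else raise with a spelling suggestion."""
    if identifier in KNOWN_OBJECTIVES:
        return identifier
    close = difflib.get_close_matches(identifier, KNOWN_OBJECTIVES, n=1)
    hint = " (did you mean '%s'?)" % close[0] if close else ""
    raise ValueError("Invalid objective: %s%s" % (identifier, hint))


try:
    get_objective_name("catagorical_crossentropy")
except ValueError as err:
    message = str(err)
    print(message)
```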

Exception in thread Thread-1915

Running python demo.py, I get this error:
Exception in thread Thread-1915:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/tensorflow/.local/lib/python2.7/site-packages/tflearn/data_flow.py", line 240, in wait_for_threads
self.coord.join(self.threads)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 390, in join
" ".join(stragglers))
RuntimeError: Coordinator stopped with threads still running: Thread-1914

spoken_words.tar

Does anyone have the file spoken_words.tar? I can't get it from the Dropbox link.
