rkuo2000 / speech_quantum_dl

This project forked from huckiyang/quantumspeech-qcnn


Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition.

License: MIT License


Speech Quantum Deep Learning

Quantum Machine Learning for Speech Recognition.

  • NEW Released the quantum speech processing code! (12/24)

  • Paper Link "Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition"

1. Environment

  • Option 1: install from conda and pip
conda install -c anaconda tensorflow-gpu=2.0
conda install -c conda-forge scikit-learn 
conda install -c conda-forge librosa 
pip install pennylane --upgrade 
  • Option 2: create from environment.yml (for a 2080 Ti with CUDA 10.0)
conda env create -f environment.yml

Originally developed with TensorFlow 2.0 and CUDA 10.0.

2. Dataset

We use Google Speech Commands Dataset V1 for Limited-Vocabulary Speech Recognition.

mkdir ../dataset
cd ../dataset
wget http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz
tar -xf speech_commands_v0.01.tar.gz

2.1. Pre-processed Features

We provide 2000 pre-processed features in ./data_quantum, which include both Mel features and (2,2) quanvolution features, split into 1500 for training and 500 for testing. You can reach 90.6% test accuracy with the provided data.

You can use np.load to load these features and train your own quantum speech processing model as in 3.1.
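For instance, a minimal sketch of loading such .npy feature files with np.load (the file names below are placeholders written by the sketch itself; check ./data_quantum for the actual provided file names):

```python
import numpy as np

# Dummy arrays standing in for the provided feature files, so the sketch
# is self-contained. The (1500, 60, 126) shape mirrors the 1500 training
# examples and the 60x126 Mel maps mentioned in this README.
x_train = np.random.rand(1500, 60, 126).astype(np.float32)
y_train = np.random.randint(0, 10, size=1500)

np.save("x_train_demo.npy", x_train)
np.save("y_train_demo.npy", y_train)

# Loading works the same way for the .npy files shipped in ./data_quantum.
x = np.load("x_train_demo.npy")
y = np.load("y_train_demo.npy")
print(x.shape, y.shape)  # (1500, 60, 126) (1500,)
```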

2.2. Audio Features Extraction (optional)

Please set the sampling rate sr and the data ratio (--port N uses 1/N of the data; --port 1 uses all data) when extracting Mel features.

python main_qsr.py --sr 16000 --port 100 --mel 1 --quanv 1

2.3. Quanvolution Encoding (optional)

If you have pre-loaded audio features from 2.2, you can set the quantum convolution kernel size in the quanv function of helper_q_tool.py. We provide an example for kernel size = 3 at line 57.

You will see a message like the one below during quanvolution encoding when running the feature extraction command from 2.2.

===== Shape 60 126
Kernal =  2
Quantum pre-processing of train Speech:
2/175
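The "Shape 60 126" message above refers to the 2-D Mel map that the kernel slides over. A pure-NumPy sketch of the non-overlapping patch extraction behind the kernel-size choice (the per-patch quantum circuit call is omitted, and the function and variable names are illustrative, not the repo's):

```python
import numpy as np

def quanv_patches(feat, kernel=2):
    """Slide a non-overlapping (kernel x kernel) window over a 2-D
    feature map, collecting the flattened patches that a quanvolution
    step would feed to the quantum circuit one at a time."""
    h, w = feat.shape
    patches = []
    for r in range(0, h - kernel + 1, kernel):
        for c in range(0, w - kernel + 1, kernel):
            patches.append(feat[r:r + kernel, c:c + kernel].ravel())
    # Each patch becomes one pixel of the downsampled output map.
    return np.array(patches), (h // kernel, w // kernel)

mel = np.random.rand(60, 126)        # matches "Shape 60 126" in the log
p2, shape2 = quanv_patches(mel, 2)   # kernel = 2 -> 30 x 63 output map
p3, shape3 = quanv_patches(mel, 3)   # kernel = 3 -> 20 x 42 output map
print(shape2, shape3)  # (30, 63) (20, 42)
```

A larger kernel downsamples the time-frequency map more aggressively, trading resolution for fewer circuit evaluations.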

3. Training

3.1 QCNN U-Net Bi-LSTM Attention Model

Spoken-term recognition with the additional U-Net encoder discussed in our work.

python main_qsr.py

Training runs for 25 epochs. One way to improve recognition performance is to encode more data for training; refer to 2.2. and 2.3.

1500/1500 [==============================] - 3s 2ms/sample - val_loss: 0.4408 - val_accuracy: 0.9060                              

To train the model without the U-Net encoder, set use_Unet = False in model.py:

def attrnn_Model(x_in, labels, ablation = False):
    # simple LSTM
    rnn_func = L.LSTM
    use_Unet = False

3.2 Neural Saliency by Class Activation Mapping (CAM)

python cam_sp.py

3.3 CTC Model for Automatic Speech Recognition

We also provide a CTC model with word error rate (WER) evaluation for future studies by the community; refer to the discussion.

For example, an output "y-e--a" for the input "yes" is identified as an incorrect word after CTC alignment.
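The collapse rule behind that example can be sketched in a few lines (an illustrative greedy CTC decode, not the repo's evaluation code):

```python
def ctc_collapse(seq, blank="-"):
    """Greedy CTC collapse: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for ch in seq:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)

print(ctc_collapse("y-e--a"))    # "yea" != "yes" -> scored as an error
print(ctc_collapse("yy-ee-ss"))  # "yes" -> scored as correct
```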

Note that this quantum ASR CTC version only supports tensorflow-gpu==2.3. Please create a new environment for running this experiment.

  • unzip the features for ASR
cd data_quantum/asr_set
bash unzip.sh
  • run the CTC model in ./speech_quantum_dl
python qsr_ctc_wer.py

The resulting pre-trained weights are in checkpoints/asr_ctc_demo.hdf5.

Epoch 32/50
107/107 [==============================] - 5s 49ms/step - loss: 0.1191 - val_loss: 0.7115
Epoch 33/50
107/107 [==============================] - 5s 49ms/step - loss: 0.1547 - val_loss: 0.6701
=== WER: 9.895833333333334  % 
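The WER reported above is the standard word-level edit distance between reference and hypothesis; a minimal sketch of the metric (not the repo's exact implementation):

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + insertions + deletions) / ref length,
    computed with a classic dynamic-programming edit distance over words."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(r)][len(h)] / len(r)

print(wer("yes no up down", "yes nah up"))  # 0.5: one substitution + one deletion
```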

Tutorial Link.

  • For academic purposes only. Feel free to contact the author for other purposes.

Reference

If this work helps your research or you use the code, please consider citing our paper. Thank you!

@article{yang2020decentralizing,
  title={Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition},
  author={Yang, Chao-Han Huck and Qi, Jun and Chen, Samuel Yen-Chi and Chen, Pin-Yu and Siniscalchi, Sabato Marco and Ma, Xiaoli and Lee, Chin-Hui},
  journal={arXiv preprint arXiv:2010.13309},
  year={2020}
}

Acknowledgment

We would like to thank Xanadu AI for providing PennyLane and IBM Research for providing Qiskit and quantum hardware to the community. There is no conflict of interest.

FAQ

Since the area between speech and quantum ML is still quite new, please feel free to open an issue for discussion.

Feel free to use this implementation for other speech processing or sequence modeling tasks (e.g., speaker recognition, speech separation, event detection, ...), following the quantum advantages discussed in the paper.

