rkuo2000 / speech_quantum_dl

This project forked from huckiyang/quantumspeech-qcnn


Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition.

License: MIT License


Speech Quantum Deep Learning

Quantum Machine Learning for Speech Recognition.

  • NEW Released the quantum speech processing code! (12/24)

  • Paper Link "Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition"

1. Environment

  • Option 1: install from conda and pip
conda install -c anaconda tensorflow-gpu=2.0
conda install -c conda-forge scikit-learn 
conda install -c conda-forge librosa 
pip install pennylane --upgrade 
  • Option 2: create from environment.yml (for a 2080 Ti with CUDA 10.0)
conda env create -f environment.yml

Originally developed with TensorFlow 2.0 and CUDA 10.0.

2. Dataset

We use Google Speech Commands Dataset V1 for Limited-Vocabulary Speech Recognition.

mkdir ../dataset
cd ../dataset
wget http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz
tar -xf speech_commands_v0.01.tar.gz

2.1. Pre-processed Features

We provide 2000 pre-processed features in ./data_quantum, which include both Mel features and (2,2) quanvolution features, split into 1500 for training and 500 for testing. You can reach 90.6% test accuracy with the provided data.

You can use np.load to load these features and train your own quantum speech processing model as in 3.1.
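For instance, a minimal sketch of loading such .npy feature files with np.load (the file names below are placeholders written by the sketch itself; check ./data_quantum for the actual provided file names):

```python
import numpy as np

# Dummy arrays standing in for the provided feature files, so the sketch
# is self-contained. The (1500, 60, 126) shape mirrors the 1500 training
# examples and the 60x126 Mel maps mentioned in this README.
x_train = np.random.rand(1500, 60, 126).astype(np.float32)
y_train = np.random.randint(0, 10, size=1500)

np.save("x_train_demo.npy", x_train)
np.save("y_train_demo.npy", y_train)

# Loading works the same way for the .npy files shipped in ./data_quantum.
x = np.load("x_train_demo.npy")
y = np.load("y_train_demo.npy")
print(x.shape, y.shape)  # (1500, 60, 126) (1500,)
```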

2.2. Audio Features Extraction (optional)

Please set the sampling rate sr and the data ratio (--port N uses 1/N of the data; --port 1 uses all data) when extracting Mel features.

python main_qsr.py --sr 16000 --port 100 --mel 1 --quanv 1

2.3. Quanvolution Encoding (optional)

If you have pre-loaded audio features from 2.2, you can set the quantum convolution kernel size in the quanv function of helper_q_tool.py. We provide an example for kernel size = 3 at line 57.

You will see a message like the one below during quanvolution encoding when running the feature extraction command from 2.2.

===== Shape 60 126
Kernal =  2
Quantum pre-processing of train Speech:
2/175
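The "Shape 60 126" message above refers to the 2-D Mel map that the kernel slides over. A pure-NumPy sketch of the non-overlapping patch extraction behind the kernel-size choice (the per-patch quantum circuit call is omitted, and the function and variable names are illustrative, not the repo's):

```python
import numpy as np

def quanv_patches(feat, kernel=2):
    """Slide a non-overlapping (kernel x kernel) window over a 2-D
    feature map, collecting the flattened patches that a quanvolution
    step would feed to the quantum circuit one at a time."""
    h, w = feat.shape
    patches = []
    for r in range(0, h - kernel + 1, kernel):
        for c in range(0, w - kernel + 1, kernel):
            patches.append(feat[r:r + kernel, c:c + kernel].ravel())
    # Each patch becomes one pixel of the downsampled output map.
    return np.array(patches), (h // kernel, w // kernel)

mel = np.random.rand(60, 126)        # matches "Shape 60 126" in the log
p2, shape2 = quanv_patches(mel, 2)   # kernel = 2 -> 30 x 63 output map
p3, shape3 = quanv_patches(mel, 3)   # kernel = 3 -> 20 x 42 output map
print(shape2, shape3)  # (30, 63) (20, 42)
```

A larger kernel downsamples the time-frequency map more aggressively, trading resolution for fewer circuit evaluations.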

3. Training

3.1 QCNN U-Net Bi-LSTM Attention Model

Spoken-term recognition with the additional U-Net encoder discussed in our work.

python main_qsr.py

Training runs for 25 epochs. One way to improve recognition performance is to encode more data for training; refer to 2.2. and 2.3.

1500/1500 [==============================] - 3s 2ms/sample - val_loss: 0.4408 - val_accuracy: 0.9060                              

To train the model without the U-Net encoder, set use_Unet = False in model.py:

def attrnn_Model(x_in, labels, ablation = False):
    # simple LSTM
    rnn_func = L.LSTM
    use_Unet = False

3.2 Neural Saliency by Class Activation Mapping (CAM)

python cam_sp.py

3.3 CTC Model for Automatic Speech Recognition

We also provide a CTC model with word error rate (WER) evaluation for future studies by the community; refer to the discussion.

For example, an output "y-e--a" for the input "yes" is identified as an incorrect word after CTC alignment.
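The collapse rule behind that example can be sketched in a few lines (an illustrative greedy CTC decode, not the repo's evaluation code):

```python
def ctc_collapse(seq, blank="-"):
    """Greedy CTC collapse: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for ch in seq:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)

print(ctc_collapse("y-e--a"))    # "yea" != "yes" -> scored as an error
print(ctc_collapse("yy-ee-ss"))  # "yes" -> scored as correct
```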

Note that this quantum ASR CTC version only supports tensorflow-gpu==2.3. Please create a new environment for running this experiment.

  • unzip the features for ASR
cd data_quantum/asr_set
bash unzip.sh
  • run the CTC model in ./speech_quantum_dl
python qsr_ctc_wer.py

The resulting pre-trained weights are in checkpoints/asr_ctc_demo.hdf5.

Epoch 32/50
107/107 [==============================] - 5s 49ms/step - loss: 0.1191 - val_loss: 0.7115
Epoch 33/50
107/107 [==============================] - 5s 49ms/step - loss: 0.1547 - val_loss: 0.6701
=== WER: 9.895833333333334  % 
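The WER reported above is the standard word-level edit distance between reference and hypothesis; a minimal sketch of the metric (not the repo's exact implementation):

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + insertions + deletions) / ref length,
    computed with a classic dynamic-programming edit distance over words."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(r)][len(h)] / len(r)

print(wer("yes no up down", "yes nah up"))  # 0.5: one substitution + one deletion
```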

Tutorial Link.

  • For academic purposes only. Feel free to contact the author for other purposes.

Reference

If this work helps your research or you use the code, please consider citing our paper. Thank you!

@article{yang2020decentralizing,
  title={Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition},
  author={Yang, Chao-Han Huck and Qi, Jun and Chen, Samuel Yen-Chi and Chen, Pin-Yu and Siniscalchi, Sabato Marco and Ma, Xiaoli and Lee, Chin-Hui},
  journal={arXiv preprint arXiv:2010.13309},
  year={2020}
}

Acknowledgment

We would like to thank Xanadu AI for providing PennyLane and IBM Research for providing Qiskit and quantum hardware to the community. There is no conflict of interest.

FAQ

Since the area between speech and quantum ML is still quite new, please feel free to open an issue for discussion.

Feel free to use this implementation for other speech processing or sequence modeling tasks (e.g., speaker recognition, speech separation, event detection, ...), following the quantum advantages discussed in the paper.

