
g2p-seq2seq's Introduction


Sequence-to-Sequence G2P toolkit

The tool performs Grapheme-to-Phoneme (G2P) conversion using the Transformer model from the Tensor2Tensor toolkit [1]. Many approaches to sequence modeling and transduction problems use recurrent neural networks; the Transformer architecture instead eschews recurrence and relies entirely on an attention mechanism to draw global dependencies between input and output [2].

The implementation is based on TensorFlow, which allows efficient training on both CPU and GPU.

Installation

The tool requires TensorFlow version 1.8.0 or higher and Tensor2Tensor version 1.6.6 or higher. Please see the TensorFlow installation guide for TensorFlow installation details and the Tensor2Tensor guide for Tensor2Tensor installation details.
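A quick way to check that compatible versions are installed (a small sketch, assuming both packages are already on the Python path):

  import pkg_resources
  import tensorflow as tf

  # Print the installed versions; the tool expects TensorFlow >= 1.8.0
  # and Tensor2Tensor >= 1.6.6.
  print("TensorFlow:", tf.__version__)
  print("Tensor2Tensor:", pkg_resources.get_distribution("tensor2tensor").version)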

The g2p-seq2seq package itself uses setuptools, so you can follow the standard installation process:

sudo python setup.py install

You can also run the tests

python setup.py test

The runnable script g2p-seq2seq is installed in the /usr/local/bin folder by default (you can adjust this with setup.py options if needed). Make sure this folder is included in your PATH so you can run the script from the command line.

Running G2P

A pretrained 3-layer Transformer model with 256 hidden units is available for download from the CMUSphinx website. Unpack the model after downloading. The model is trained on the CMU English dictionary:

wget -O g2p-seq2seq-cmudict.tar.gz https://sourceforge.net/projects/cmusphinx/files/G2P%20Models/g2p-seq2seq-model-6.2-cmudict-nostress.tar.gz/download
tar xf g2p-seq2seq-cmudict.tar.gz

The easiest way to check how the tool works is to run it in interactive mode and type words:

$ g2p-seq2seq --interactive --model_dir model_folder_path
...
> hello
...
Pronunciations: [HH EH L OW]
...
>

To generate pronunciations for an English word list with a trained model, run

  g2p-seq2seq --decode your_wordlist --model_dir model_folder_path [--output decode_output_file_path]

The wordlist is a text file with one word per line.

If you wish to list the top N decoding variants, set the --return_beams flag and specify --beam_size:

  g2p-seq2seq --decode your_wordlist --model_dir model_folder_path --return_beams --beam_size number_returned_beams [--output decode_output_file_path]

To evaluate the word error rate (WER) of a trained model, run

  g2p-seq2seq --evaluate your_test_dictionary --model_dir model_folder_path

The test dictionary should be a dictionary in standard format:

hello HH EH L OW
bye B AY

You may also calculate the word error rate considering all top N decoded results. In this case, a word counts as an error only if none of the decoded pronunciations matches the ground-truth pronunciation of the word.
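A minimal sketch of that rule (illustrative only; references maps each word to its ground-truth phone sequence and nbest maps each word to its list of top N decoded phone sequences):

  def wer_over_nbest(references, nbest):
    """Count a word as an error only if none of its N-best pronunciations matches."""
    errors = 0
    for word, ref_phones in references.items():
      beams = nbest.get(word, [])
      if not any(beam == ref_phones for beam in beams):
        errors += 1
    return float(errors) / len(references)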

Training G2P system

To train a G2P model you need a dictionary (one word and its phone sequence per line). See an example dictionary.

  g2p-seq2seq --train train_dictionary.dic --model_dir model_folder_path

You can set a maximum number of training epochs:

  "--max_epochs" - Maximum number of training epochs (Default: 0).
     If 0, train until no improvement is observed.

It is a good idea to experiment with the following parameters:

  "--size" - Size of each model layer (Default: 256).

  "--num_layers" - Number of layers in the model (Default: 3).

  "--filter_size" - Size of the feed-forward (filter) layer (Default: 512).

  "--num_heads" - Number of attention heads in the multi-head attention mechanism (Default: 4).

You can explicitly specify development and test datasets:

  "--valid" - Development dictionary (Default: created from train_dictionary.dic)
  "--test" - Test dictionary (Default: created from train_dictionary.dic)

Otherwise, the program will split the training dictionary itself. In the directory with the training data you will then find three data files with the extensions ".train", ".dev" and ".test".
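If you want to reproduce such a split yourself, a simple random split could look like this (an illustrative sketch; the proportions are assumptions, not the tool's exact scheme):

  import random

  def split_dictionary(lines, dev_frac=0.05, test_frac=0.05, seed=0):
    """Shuffle dictionary lines and split them into train/dev/test lists."""
    lines = list(lines)
    random.Random(seed).shuffle(lines)
    n_dev = int(len(lines) * dev_frac)
    n_test = int(len(lines) * test_frac)
    dev = lines[:n_dev]
    test = lines[n_dev:n_dev + n_test]
    train = lines[n_dev + n_test:]
    return train, dev, test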

If you have a raw dictionary with stress markers (for example, the CMU English dictionary), you may set the following parameter when launching training (a rough sketch of the cleanup follows below):

  "--cleanup" - Set to True to clean stress markers and comments out of the dictionary.

If you need to continue training a saved model, just point --model_dir at the directory with the existing model:

  g2p-seq2seq --train train_dictionary.dic --model_dir model_folder_path

And, if you want to start training from scratch:

  "--reinit" - Overwrite the model in model_folder_path.

Also, to solve the inverse problem:

  "--p2g" - Run the program in phoneme-to-grapheme conversion mode.

The differences in pronunciation between short and long words can be significant, so seq2seq models apply a bucketing technique to account for this. On the other hand, splitting the data into too many buckets can worsen the final results, because each bucket may then contain too few examples. To get better results, you may tune the following three parameters, which change the number and size of the buckets; a sketch of the boundary computation follows the list below:

  "--min_length_bucket" - the size of the minimal bucket (Default: 6)
  "--max_length" - maximal possible length of words or maximal number of phonemes in pronunciations (Default: 30)
  "--length_bucket_step" - multiplier that controls the number of length buckets in the data. The buckets have maximum lengths from min_bucket_length to max_length, increasing by factors of length_bucket_step (Default: 1.5)

After training the model, you may freeze it:

  g2p-seq2seq --model_dir model_folder_path --freeze

File "frozen_model.pb" will appear in "model_folder_path" directory after launching previous command. And now, if you run one of the decoding modes, The program will load and use this frozen graph.

Word error rate on CMU dictionary data sets

System                                 WER (CMUdict PRONASYL 2007), %   WER (CMUdict latest*), %
Baseline WFST (Phonetisaurus)          24.4                             33.89
Transformer (num_layers=3, size=256)   20.6                             30.2
* These results are for a dictionary without stress.

References


[1] Lukasz Kaiser. "Accelerating Deep Learning Research with the Tensor2Tensor Library." In Google Research Blog, 2017.

[2] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. "Attention Is All You Need." arXiv preprint arXiv:1706.03762, 2017.


g2p-seq2seq's Issues

WER 0.46?

I ran with 2 layers and 512 units but got nowhere close to the reported result.
Is this execution correct?
python -u g2p.py --train ../../cmudict/cmudict.dict --size 512
Preparing G2P data
Creating vocabularies in /tmp
Creating vocabulary /tmp/vocab.phoneme
Creating vocabulary /tmp/vocab.grapheme
Reading development and training data.
Creating 2 layers of 512 units.
Reading model parameters from /tmp/translate.ckpt-200
global step 400 learning rate 0.5000 step-time 2.78 perplexity 7.83
eval: bucket 0 perplexity 4.88
eval: bucket 1 perplexity 6.30
eval: bucket 2 perplexity 12.34
global step 600 learning rate 0.5000 step-time 2.71 perplexity 4.34
eval: bucket 0 perplexity 2.48
eval: bucket 1 perplexity 2.96
eval: bucket 2 perplexity 4.78
global step 800 learning rate 0.5000 step-time 2.63 perplexity 2.72
eval: bucket 0 perplexity 1.75
eval: bucket 1 perplexity 2.15
eval: bucket 2 perplexity 3.45
global step 1000 learning rate 0.5000 step-time 2.56 perplexity 2.26
eval: bucket 0 perplexity 1.65
eval: bucket 1 perplexity 1.84
eval: bucket 2 perplexity 3.17
global step 1200 learning rate 0.5000 step-time 2.68 perplexity 2.00
eval: bucket 0 perplexity 1.29
eval: bucket 1 perplexity 1.69
eval: bucket 2 perplexity 2.57
global step 1400 learning rate 0.5000 step-time 2.86 perplexity 1.84
eval: bucket 0 perplexity 1.48
eval: bucket 1 perplexity 1.70
eval: bucket 2 perplexity 2.15
global step 1600 learning rate 0.5000 step-time 3.40 perplexity 1.76
eval: bucket 0 perplexity 1.65
eval: bucket 1 perplexity 1.67
eval: bucket 2 perplexity 2.18
global step 1800 learning rate 0.5000 step-time 3.65 perplexity 1.71
eval: bucket 0 perplexity 1.42
eval: bucket 1 perplexity 1.79
eval: bucket 2 perplexity 2.04
global step 2000 learning rate 0.5000 step-time 2.68 perplexity 1.56
eval: bucket 0 perplexity 1.30
eval: bucket 1 perplexity 1.53
eval: bucket 2 perplexity 1.83
global step 2200 learning rate 0.5000 step-time 3.33 perplexity 1.61
eval: bucket 0 perplexity 1.50
eval: bucket 1 perplexity 1.66
eval: bucket 2 perplexity 1.70
global step 2400 learning rate 0.5000 step-time 3.01 perplexity 1.52
eval: bucket 0 perplexity 1.29
eval: bucket 1 perplexity 1.47
eval: bucket 2 perplexity 1.79
global step 2600 learning rate 0.5000 step-time 3.09 perplexity 1.53
eval: bucket 0 perplexity 1.34
eval: bucket 1 perplexity 1.57
eval: bucket 2 perplexity 1.90
global step 2800 learning rate 0.5000 step-time 2.92 perplexity 1.49
eval: bucket 0 perplexity 1.35
eval: bucket 1 perplexity 1.67
eval: bucket 2 perplexity 1.85
global step 3000 learning rate 0.5000 step-time 2.82 perplexity 1.44
eval: bucket 0 perplexity 1.39
eval: bucket 1 perplexity 1.55
eval: bucket 2 perplexity 1.81
global step 3200 learning rate 0.5000 step-time 2.68 perplexity 1.43
eval: bucket 0 perplexity 1.49
eval: bucket 1 perplexity 1.35
eval: bucket 2 perplexity 1.87
global step 3400 learning rate 0.5000 step-time 2.90 perplexity 1.41
eval: bucket 0 perplexity 1.35
eval: bucket 1 perplexity 1.56
eval: bucket 2 perplexity 1.73
global step 3600 learning rate 0.5000 step-time 2.79 perplexity 1.40
eval: bucket 0 perplexity 1.27
eval: bucket 1 perplexity 1.32
eval: bucket 2 perplexity 1.59
global step 3800 learning rate 0.5000 step-time 2.87 perplexity 1.38
eval: bucket 0 perplexity 1.52
eval: bucket 1 perplexity 1.46
eval: bucket 2 perplexity 1.52
global step 4000 learning rate 0.5000 step-time 2.74 perplexity 1.36
eval: bucket 0 perplexity 1.49
eval: bucket 1 perplexity 1.41
eval: bucket 2 perplexity 1.83
global step 4200 learning rate 0.5000 step-time 2.80 perplexity 1.37
eval: bucket 0 perplexity 1.23
eval: bucket 1 perplexity 1.36
eval: bucket 2 perplexity 1.58
global step 4400 learning rate 0.5000 step-time 2.94 perplexity 1.36
eval: bucket 0 perplexity 1.58
eval: bucket 1 perplexity 1.53
eval: bucket 2 perplexity 1.73
global step 4600 learning rate 0.5000 step-time 3.16 perplexity 1.35
eval: bucket 0 perplexity 1.25
eval: bucket 1 perplexity 1.54
eval: bucket 2 perplexity 1.58
global step 4800 learning rate 0.5000 step-time 2.74 perplexity 1.33
eval: bucket 0 perplexity 1.44
eval: bucket 1 perplexity 1.60
eval: bucket 2 perplexity 1.72
global step 5000 learning rate 0.5000 step-time 2.77 perplexity 1.33
eval: bucket 0 perplexity 1.36
eval: bucket 1 perplexity 1.38
eval: bucket 2 perplexity 1.60
global step 5200 learning rate 0.5000 step-time 2.97 perplexity 1.32
eval: bucket 0 perplexity 1.29
eval: bucket 1 perplexity 1.41
eval: bucket 2 perplexity 1.66
global step 5400 learning rate 0.5000 step-time 2.77 perplexity 1.30
eval: bucket 0 perplexity 1.31
eval: bucket 1 perplexity 1.52
eval: bucket 2 perplexity 1.45
global step 5600 learning rate 0.5000 step-time 2.80 perplexity 1.30
eval: bucket 0 perplexity 1.31
eval: bucket 1 perplexity 1.28
eval: bucket 2 perplexity 1.75
global step 5800 learning rate 0.5000 step-time 2.64 perplexity 1.29
eval: bucket 0 perplexity 1.42
eval: bucket 1 perplexity 1.33
eval: bucket 2 perplexity 1.41
global step 6000 learning rate 0.5000 step-time 2.76 perplexity 1.28
eval: bucket 0 perplexity 1.26
eval: bucket 1 perplexity 1.39
eval: bucket 2 perplexity 1.48
global step 6200 learning rate 0.5000 step-time 2.55 perplexity 1.28
eval: bucket 0 perplexity 1.37
eval: bucket 1 perplexity 1.37
eval: bucket 2 perplexity 1.67
global step 6400 learning rate 0.5000 step-time 2.68 perplexity 1.26
eval: bucket 0 perplexity 1.23
eval: bucket 1 perplexity 1.50
eval: bucket 2 perplexity 1.44
global step 6600 learning rate 0.5000 step-time 2.98 perplexity 1.26
eval: bucket 0 perplexity 1.12
eval: bucket 1 perplexity 1.54
eval: bucket 2 perplexity 1.47
global step 6800 learning rate 0.5000 step-time 2.87 perplexity 1.26
eval: bucket 0 perplexity 1.22
eval: bucket 1 perplexity 1.29
eval: bucket 2 perplexity 1.56
global step 7000 learning rate 0.5000 step-time 2.81 perplexity 1.26
eval: bucket 0 perplexity 1.22
eval: bucket 1 perplexity 1.45
eval: bucket 2 perplexity 1.54
global step 7200 learning rate 0.5000 step-time 2.76 perplexity 1.25
eval: bucket 0 perplexity 1.35
eval: bucket 1 perplexity 1.46
eval: bucket 2 perplexity 1.40
global step 7400 learning rate 0.5000 step-time 3.06 perplexity 1.24
eval: bucket 0 perplexity 1.18
eval: bucket 1 perplexity 1.26
eval: bucket 2 perplexity 1.48
global step 7600 learning rate 0.5000 step-time 3.15 perplexity 1.25
eval: bucket 0 perplexity 1.47
eval: bucket 1 perplexity 1.31
eval: bucket 2 perplexity 1.50
global step 7800 learning rate 0.5000 step-time 3.13 perplexity 1.24
eval: bucket 0 perplexity 1.50
eval: bucket 1 perplexity 1.43
eval: bucket 2 perplexity 1.46
global step 8000 learning rate 0.5000 step-time 2.76 perplexity 1.23
eval: bucket 0 perplexity 1.39
eval: bucket 1 perplexity 1.37
eval: bucket 2 perplexity 1.47
global step 8200 learning rate 0.5000 step-time 2.64 perplexity 1.22
eval: bucket 0 perplexity 1.30
eval: bucket 1 perplexity 1.25
eval: bucket 2 perplexity 1.59
global step 8400 learning rate 0.5000 step-time 2.38 perplexity 1.23
eval: bucket 0 perplexity 1.42
eval: bucket 1 perplexity 1.43
eval: bucket 2 perplexity 1.45
global step 8600 learning rate 0.5000 step-time 2.53 perplexity 1.21
eval: bucket 0 perplexity 1.42
eval: bucket 1 perplexity 1.33
eval: bucket 2 perplexity 1.39
global step 8800 learning rate 0.5000 step-time 2.58 perplexity 1.21
eval: bucket 0 perplexity 1.21
eval: bucket 1 perplexity 1.31
eval: bucket 2 perplexity 1.50
global step 9000 learning rate 0.5000 step-time 2.88 perplexity 1.21
eval: bucket 0 perplexity 1.36
eval: bucket 1 perplexity 1.30
eval: bucket 2 perplexity 1.57
global step 9200 learning rate 0.5000 step-time 3.03 perplexity 1.21
eval: bucket 0 perplexity 1.47
eval: bucket 1 perplexity 1.45
eval: bucket 2 perplexity 1.38
global step 9400 learning rate 0.5000 step-time 2.77 perplexity 1.20
eval: bucket 0 perplexity 1.39
eval: bucket 1 perplexity 1.29
eval: bucket 2 perplexity 1.55
global step 9600 learning rate 0.5000 step-time 2.86 perplexity 1.19
eval: bucket 0 perplexity 1.53
eval: bucket 1 perplexity 1.35
eval: bucket 2 perplexity 1.46
global step 9800 learning rate 0.5000 step-time 2.87 perplexity 1.19
eval: bucket 0 perplexity 1.43
eval: bucket 1 perplexity 1.43
eval: bucket 2 perplexity 1.80
global step 10000 learning rate 0.5000 step-time 2.74 perplexity 1.18
eval: bucket 0 perplexity 1.36
eval: bucket 1 perplexity 1.50
eval: bucket 2 perplexity 1.45
Training process stopped.
Beginning calculation word error rate (WER) on test sample.
WER : 0.469490521327
Accuracy : 0.530509478673

Supported languages.

Does it support the Arabic language? If not, what alternative method could be used for text-to-phoneme conversion of Arabic text?

Review training stop criteria

Many consecutive steps show the same perplexity before training ends. The number of steps could therefore be reduced significantly if we stopped once we see the same perplexity about 4 times in a row; it should not affect the accuracy.
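A minimal sketch of such a stopping rule (illustrative; eval_perplexities is assumed to be the list of evaluation perplexities collected so far):

  def should_stop(eval_perplexities, patience=4, tol=1e-3):
    """Stop when the last `patience` eval perplexities are essentially identical."""
    if len(eval_perplexities) < patience:
      return False
    recent = eval_perplexities[-patience:]
    return max(recent) - min(recent) < tol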

Float division by zero

g2p-seq2seq --evaluate NEWARABIC/test.wordlist --model NEWARABIC
Creating 2 layers of 64 units.
Reading model parameters from NEWARABIC
Beginning calculation word error rate (WER) on test sample.
Words : 0
Errors: 0
Traceback (most recent call last):
File "/usr/local/bin/g2p-seq2seq", line 9, in
load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/app.py", line 81, in main
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 348, in evaluate
ZeroDivisionError: float division by zero

When I decode the same wordlist, it works fine.

No need for two-pass loops

Here you can do it in a single pass, and there is no need for the intermediate list:

  lst = []
  for line in inp_dictionary:
    lst.append(line.strip().split())

  graphemes, phonemes = [], []
  for line in lst:
    if len(line)>1:
      graphemes.append(list(line[0]))
      phonemes.append(line[1:])
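A single-pass version might look like this (a sketch reusing the names from the snippet above):

  graphemes, phonemes = [], []
  for line in inp_dictionary:
    parts = line.strip().split()
    if len(parts) > 1:
      graphemes.append(list(parts[0]))
      phonemes.append(parts[1:])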

Where is the Train Model file?

I used the following command to train a G2P model:
python g2p.py --train /home/cmudict.dict --model /home/MyModel --max_steps 8400

here is the log:

Preparing G2P data
Creating vocabularies in /home/MyModel
Creating vocabulary /home/MyModel/vocab.phoneme
Creating vocabulary /home/MyModel/vocab.grapheme
Reading development and training data.
Creating 2 layers of 64 units.
........
Reading model parameters from /home/MyModel/translate.ckpt-8200
global step 8400 learning rate 0.4901 step-time 3.43 perplexity 1.37
  eval: bucket 0 perplexity 1.46
  eval: bucket 1 perplexity 1.29
  eval: bucket 2 perplexity 1.47
Training process stopped.
Beginning calculation word error rate (WER) on test sample.
WER :  0.4961492891
Accuracy :  0.5038507109

In MyModel directory there are so many generated files present, but there is no "model" file.

translate.ckpt-200
translate.ckpt-200.meta
translate.ckpt-400
translate.ckpt-400.meta
translate.ckpt-600
translate.ckpt-600.meta
translate.ckpt-7200
translate.ckpt-7200.meta
translate.ckpt-7400
translate.ckpt-7400.meta
translate.ckpt-7600
translate.ckpt-7600.meta
translate.ckpt-7800
translate.ckpt-7800.meta
translate.ckpt-8000
translate.ckpt-8000.meta
translate.ckpt-8200
translate.ckpt-8200.meta
translate.ckpt-8400
translate.ckpt-8400.meta
model.params
vocab.phoneme
vocab.grapheme
translate.ckpt-8600
translate.ckpt-8600.meta
translate.ckpt-8800
checkpoint
translate.ckpt-8800.meta

Where do I get that "model" file?
Or do I have to rename the file translate.ckpt-8800 to "model"?

Provide a link to reference dictionary

Also update the error rates in the README. The Phonetisaurus error rate on this set is also 24.4%; on the latest cmudict it is 33.89%. Provide our results on the latest cmudict as well.

Last word in the list is skipped

[shmyrev@alpha g2p_seq2seq]$ cat > word.list
hello
world
how
are 
you
[shmyrev@alpha g2p_seq2seq]$ python g2p.py --model /home/shmyrev/cmudict-g2p-model --decode word.list
HH EH L OW
W ER L D
HH AW
AA R

The last word is missing.

Moreover, each line of the output should contain the word, not just the phonemes. It should create a ready-to-use dictionary:

[shmyrev@alpha g2p_seq2seq]$ python g2p.py --model /home/shmyrev/cmudict-g2p-model --decode word.list
hello HH EH L OW
world W ER L D
how HH AW
are AA R
you Y UW

PER?

Is it possible to get the phone error rate in addition to the word error rate?
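Phone error rate is usually the total edit distance between decoded and reference phone sequences divided by the total number of reference phones; a minimal sketch of that computation (not part of the tool):

  def edit_distance(hyp, ref):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
      cur = [i]
      for j, r in enumerate(ref, 1):
        cur.append(min(prev[j] + 1,              # deletion
                       cur[j - 1] + 1,           # insertion
                       prev[j - 1] + (h != r)))  # substitution or match
      prev = cur
    return prev[-1]

  def phone_error_rate(pairs):
    """pairs: iterable of (decoded_phones, reference_phones) lists."""
    edits = sum(edit_distance(h, r) for h, r in pairs)
    total = sum(len(r) for _, r in pairs)
    return float(edits) / total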

Train with short dictionaries

Traceback (most recent call last):
  File "g2p.py", line 442, in <module>
    tf.app.run()
  File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/default/_app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "g2p.py", line 425, in main
    g2p_model.train(g2p_params, FLAGS.train, FLAGS.valid, FLAGS.test)
  File "g2p.py", line 243, in train
    self.__run_evals()
  File "g2p.py", line 269, in __run_evals
    self.valid_set, bucket_id)
  File "/usr/lib/python2.7/site-packages/tensorflow/models/rnn/translate/seq2seq_model.py", line 252, in get_batch
    encoder_input, decoder_input = random.choice(data[bucket_id])
  File "/usr/lib64/python2.7/random.py", line 274, in choice
    return seq[int(self.random() * len(seq))]  # raises IndexError if seq is empty
IndexError: list index out of range

Check size for embedding layer

When you convert letter and phoneme symbols to numerical ids, isn't it confusing for the model to train with integers as class labels? Would it be better to use one-hot encoding, or maybe even letter embeddings, to make the distances between letters or phonemes more meaningful?
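For what it's worth, in typical seq2seq and Transformer implementations the integer ids are not fed to the network directly; they only index a learned embedding table, which already gives trainable letter/phoneme vectors. A rough TensorFlow 1.x sketch (the sizes are made up for illustration):

  import tensorflow as tf

  vocab_size, embed_dim = 45, 64                     # assumed sizes
  ids = tf.placeholder(tf.int32, shape=[None])       # integer grapheme/phoneme ids
  embedding = tf.get_variable("embedding", [vocab_size, embed_dim])
  vectors = tf.nn.embedding_lookup(embedding, ids)   # each id becomes a trainable vector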

Zero accuracy

I run

python /home/ubuntu/g2p-seq2seq/g2p_seq2seq/g2p.py --train cmudict.dict --num_layers 4 --size 64 --model model

I get

WER :  0.964269283852
Accuracy :  0.0357307161478

coding problem

Dear All,

I got an encoding error in the test phase (training and interactive phases were both fine). My training dictionary is a mixture of cmudict (ASCII) and Chinese (UTF-8) lexicons. What should I do? Should I convert all cmudict entries to UTF-8?

Thanks a lot in advance!

Here is the log:

global step 91200 learning rate 0.2425 step-time 0.13 perplexity 1.02
Training done.
Creating 2 layers of 512 units.
Reading model parameters from g2p-seq2seq-oc16
Beginning calculation word error rate (WER) on test sample.
Traceback (most recent call last):
File "/home/liao/anaconda3/envs/python2.7/bin/g2p-seq2seq", line 9, in
load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/app.py", line 67, in main
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 234, in train
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 347, in evaluate
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 323, in calc_error
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 279, in decode_word
UnicodeEncodeError: 'ascii' codec can't encode character u'\u86c8' in position 9: ordinal not in range(128)

Here is my training dictionary:

瘦西湖 sh ou4 x i1 h u2
睃 s uo1
supercuts S UW1 P ER0 K AH2 T S
电报机 d ian4 b ao4 j i1
galka G AE1 L K AH0
知 zh ix4
Unipus Y UW1 N IH0 P AH0 S
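One common workaround in Python 2 (an assumption, not a confirmed fix for this version of the tool) is to read and write all dictionary files explicitly as UTF-8 instead of relying on the default ASCII codec, e.g.:

  import io

  def write_utf8(path, lines):
    """Write unicode lines as UTF-8 so mixed ASCII/Chinese entries do not raise UnicodeEncodeError."""
    with io.open(path, "w", encoding="utf-8") as out:
      for line in lines:          # lines are assumed to be unicode strings
        out.write(line + u"\n")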

Fix bad code patterns

Never convert an integer to a string only to convert it back to an integer later; this is very inefficient.

Never join list items into a string only to split them and join them again later.

Remove code that is not used.

Avoid redundant dictionary construction

If you only need the direct and reversed dictionaries, it is better to change this method:

def initialize_vocabulary(vocabulary_path):
  """Initialize vocabulary from file.
  We assume the vocabulary is stored one-item-per-line, so a file:
    d
    c
  will result in a vocabulary {"d": 0, "c": 1}, and this function will
  also return the reversed-vocabulary ["d", "c"].

To a method with an optional reverse parameter:

def load_vocab(vocabulary_path, reverse=False)

This method should return only one vocabulary, direct or reversed, based on the optional flag.
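A sketch of the proposed method (illustrative only):

  def load_vocab(vocabulary_path, reverse=False):
    """Load a one-item-per-line vocabulary; return {item: id}, or [item, ...] if reverse."""
    with open(vocabulary_path) as f:
      items = [line.strip() for line in f]
    if reverse:
      return items                                   # position in the list is the id
    return dict((item, idx) for idx, item in enumerate(items))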

test accuracy with trained model

I want to test the accuracy of the trained model on cmudict.
Are there any standard training, validation, and test dictionaries for this task?

How is it compared fairly in papers if there are no standard partitions?
Thanks a lot for this code.

Running question for this command (g2p-seq2seq --interactive --model model_folder_path)

sam@speechws13:~/g2p-seq2seq-master$ g2p-seq2seq --interactive --model g2p-seq2seq-cmudict/g2p-seq2seq-cmudict/modle
Traceback (most recent call last):
  File "/usr/local/bin/g2p-seq2seq", line 9, in <module>
    load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 542, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2569, in load_entry_point
    return ep.load()
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2229, in load
    return self.resolve()
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2235, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "build/bdist.linux-x86_64/egg/g2p_seq2seq/__init__.py", line 23, in <module>
  File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 36, in <module>
ImportError: No module named data_utils

How can I fix this problem?
Thank you~

Logic is not clear

  create_vocabulary(ph_vocab_path, train_ph)
  create_vocabulary(gr_vocab_path, train_gr)

  # Initialize vocabularies.
  ph_vocab = initialize_vocabulary(ph_vocab_path, False)
  gr_vocab = initialize_vocabulary(gr_vocab_path, False)

Why do you need to initialize the vocabulary after you have created it? The logic should be more straightforward: first build the vocabulary, then save it; then there is no need to reload it again.
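In other words, create_vocabulary could simply return the vocabulary it just built, something like (a sketch using the names from the snippet above):

  def create_vocabulary(vocab_path, sequences):
    """Build a symbol vocabulary from training sequences, save it, and return it."""
    vocab = {}
    for seq in sequences:
      for symbol in seq:
        vocab.setdefault(symbol, len(vocab))
    with open(vocab_path, "w") as out:
      for symbol in sorted(vocab, key=vocab.get):
        out.write(symbol + "\n")
    return vocab

  ph_vocab = create_vocabulary(ph_vocab_path, train_ph)
  gr_vocab = create_vocabulary(gr_vocab_path, train_gr)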

Time to restore saved model?

In g2p.py I added time.time() calls around the command

self.model.saver.restore(self.session, os.path.join(self.model_dir, "model"))

to see how long it takes to load a pretrained model for decoding words. With a model trained with 512 nodes I get:

Time to load model: 2.53336596489

with only 64 nodes I don't get much savings:

Time to load model: 2.50763916969

which, according to the Python time module, is in seconds. That seems really slow. I am using the CPU instead of the GPU because, if we end up including a similar NN model in our software, we won't have any GPU power on our servers. Still, when I compare it with our current OpenFst implementation of an n-gram model, that one takes only 300 ms (0.3 s) to load in C++.

It may be faster if I can restore the saved model from C++, but I'll have to look into writing code to allow that.

training model did not improve accuracy!

I was training a new model on the CMUSphinx dictionary with --max_steps 10000 --size 512 --num_layers 3 --learning_rate 0.5. After the training finished, I got this output from the trained model:

a
M HH HH HH HH UH UH UH UH UH
b
M M UH UH UH UH UH UH UH UH
c
M M M UH UH UH UH UH UH UH
d
M M M M UH UH UH UH UH UH
hello
M HH HH HH HH UW UW UW UW M M M M M M
aa
HH HH HH HH HH HH HH HH UH UH

Is there anything wrong with my approach?

This was the last output of the training run:
global step 10000 learning rate 0.4000 step-time 3.48 perplexity 1.15
eval: bucket 0 perplexity 1.40
eval: bucket 1 perplexity 1.25
eval: bucket 2 perplexity 1.34

Cleanup main function

Move

    train_gr, train_ph = data_utils.split_to_grapheme_phoneme(train_dic)
    valid_gr, valid_ph = data_utils.split_to_grapheme_phoneme(valid_dic)
    test_gr, test_ph = data_utils.split_to_grapheme_phoneme(test_dic)

from the main function to the train function.

TypeError in g2p-seq2seq --interactive

Hi,
thanks so much for this great project!

I have it running in --decode mode but run into this error in --interactive mode, where I receive this message:

$ sudo g2p-seq2seq --interactive --model g2p-seq2seq-cmudict
Creating 2 layers of 512 units.
Reading model parameters from g2p-seq2seq-cmudict
> hello
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.5/bin/g2p-seq2seq", line 11, in <module>
    load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/g2p_seq2seq-5.0.0a0-py3.5.egg/g2p_seq2seq/app.py", line 78, in main
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/g2p_seq2seq-5.0.0a0-py3.5.egg/g2p_seq2seq/g2p.py", line 308, in interactive
TypeError: decoding str is not supported

Sorry if this is a newbie error. Any help much appreciated :)

List index out of range

I am trying to do everything right but this error still persists.

Creating 2 layers of 64 units.
Created model with fresh parameters.
global step 200 learning rate 0.5000 step-time 3.09 perplexity 1.57
Traceback (most recent call last):
File "/usr/local/bin/g2p-seq2seq", line 9, in
load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/app.py", line 67, in main
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 217, in train
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 253, in __run_evals
File "/usr/local/lib/python2.7/dist-packages/tensorflow/models/rnn/translate/seq2seq_model.py", line 250, in get_batch
encoder_input, decoder_input = random.choice(data[bucket_id])
File "/usr/lib/python2.7/random.py", line 275, in choice
return seq[int(self.random() * len(seq))] # raises IndexError if seq is empty
IndexError: list index out of range
