hujinsen / stargan-voice-conversion

Full TensorFlow implementation of the paper "StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks" (https://arxiv.org/abs/1806.02169)

License: MIT License

Python 100.00%
voice-conversion voice-converter stargan-vc stargan cyclegan-vc tensorflow

stargan-voice-conversion's People

Contributors

akitaikeda, erjanmx, hujinsen

stargan-voice-conversion's Issues

Issues in training the model

python train.py
Loading Data...
found stat file: ./etc/children-stats.npz
found stat file: ./etc/adult-stats.npz
Loading Data Done.
d1: [None, 36, 512, 32]
d2: [None, 18, 256, 64]
d3: [None, 9, 128, 128]
d4: [None, 9, 128, 64]
[None, 1, 128, 4]
u1.shape :[None, 9, 128, 64]
c1 shape: (?, 9, 128, 4)
u1_concat.shape :[None, 9, 128, 68]
u2.shape :[None, 9, 128, 128]
u3.shape :[None, 18, 256, 64]
u4.shape :[None, 36, 512, 32]
u4_concat.shape :[None, 36, 512, 36]
u5.shape :[None, 36, 512, 1]
d1: [None, 36, 512, 32]
d2: [None, 18, 256, 64]
d3: [None, 9, 128, 128]
d4: [None, 9, 128, 64]
[None, 1, 128, 4]
u1.shape :[None, 9, 128, 64]
c1 shape: (?, 9, 128, 4)
u1_concat.shape :[None, 9, 128, 68]
u2.shape :[None, 9, 128, 128]
u3.shape :[None, 18, 256, 64]
u4.shape :[None, 36, 512, 32]
u4_concat.shape :[None, 36, 512, 36]
u5.shape :[None, 36, 512, 1]
d1: [None, 36, 512, 32]
d2: [None, 18, 256, 64]
d3: [None, 9, 128, 128]
d4: [None, 9, 128, 64]
[None, 1, 128, 4]
u1.shape :[None, 9, 128, 64]
c1 shape: (?, 9, 128, 4)
u1_concat.shape :[None, 9, 128, 68]
u2.shape :[None, 9, 128, 128]
u3.shape :[None, 18, 256, 64]
u4.shape :[None, 36, 512, 32]
u4_concat.shape :[None, 36, 512, 36]
u5.shape :[None, 36, 512, 1]
domain_classifier_d1: (?, 8, 512, 8)
domain_classifier_d1_p: (?, 4, 256, 8)
domain_classifier_d12: (?, 4, 256, 16)
domain_classifier_d2_p: (?, 2, 128, 16)
domain_classifier_d3: (?, 2, 128, 32)
domain_classifier_d3_p: (?, 1, 64, 32)
domain_classifier_d4: (?, 1, 64, 16)
domain_classifier_d4_p: (?, 1, 32, 16)
domain_classifier_d5: (?, 1, 32, 4)
domain_classifier_d5_p: (?, 1, 16, 4)
classifier_output: (?, 1, 1, 4)
domain_classifier_d1: (?, 8, 512, 8)
domain_classifier_d1_p: (?, 4, 256, 8)
domain_classifier_d12: (?, 4, 256, 16)
domain_classifier_d2_p: (?, 2, 128, 16)
domain_classifier_d3: (?, 2, 128, 32)
domain_classifier_d3_p: (?, 1, 64, 32)
domain_classifier_d4: (?, 1, 64, 16)
domain_classifier_d4_p: (?, 1, 32, 16)
domain_classifier_d5: (?, 1, 32, 4)
domain_classifier_d5_p: (?, 1, 16, 4)
classifier_output: (?, 1, 1, 4)
d1: [None, 36, 512, 32]
d2: [None, 18, 256, 64]
d3: [None, 9, 128, 128]
d4: [None, 9, 128, 64]
[None, 1, 128, 4]
u1.shape :[None, 9, 128, 64]
c1 shape: (?, 9, 128, 4)
u1_concat.shape :[None, 9, 128, 68]
u2.shape :[None, 9, 128, 128]
u3.shape :[None, 18, 256, 64]
u4.shape :[None, 36, 512, 32]
u4_concat.shape :[None, 36, 512, 36]
u5.shape :[None, 36, 512, 1]
2019-03-12 12:55:15.378518: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
  File "train.py", line 282, in <module>
    train(processed_dir, test_wav_dir)
  File "train.py", line 158, in train
    lambda_classifier=lambda_classifier
  File "/home/StarGan/model.py", line 148, in train
    self.generator_learning_rate: generator_learning_rate})
  File "/homeStarGan/myvenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/StarGan/myvenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1111, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (8, 2) for Tensor 'source_label:0', which has shape '(?, 4)'

Python: 3.6
TensorFlow: 1.8

Help appreciated.
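
The error reports a label batch with 2 columns fed into a placeholder that expects 4. That would be consistent with training on two speakers (the log finds stat files for `children` and `adult`) while the graph was built for four. The sketch below assumes the labels are one-hot vectors over the speaker list. The helper names `one_hot_labels` and `check_label_width` are hypothetical and do not come from the repository.

```python
import numpy as np

def one_hot_labels(speaker_names, all_speakers):
    """Return a (batch, len(all_speakers)) one-hot matrix (hypothetical helper)."""
    index = {name: i for i, name in enumerate(all_speakers)}
    labels = np.zeros((len(speaker_names), len(all_speakers)), dtype=np.float32)
    for row, name in enumerate(speaker_names):
        labels[row, index[name]] = 1.0
    return labels

def check_label_width(labels, expected_dim):
    """Fail early, with a readable message, if the label width differs from the graph."""
    if labels.shape[1] != expected_dim:
        raise ValueError(
            "labels have %d columns but the model expects %d; "
            "the number of speakers and the label dimension must agree"
            % (labels.shape[1], expected_dim)
        )
    return labels

# Two speakers, batch of 8: the label matrix is (8, 2), as in the error message.
speakers = ["children", "adult"]
batch = one_hot_labels(["children", "adult"] * 4, speakers)
print(batch.shape)
```

If this is indeed the cause, the label dimension used when building the model would need to match the number of speakers in the training data.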

Exception: ====no match files!====

Platform: Ubuntu 18.04

process:
python3.6 train.py --processed_dir ./data/processed --test_wav_dir ./data/fourspeakers_test

issue log:
t_wav_dir ./data/fourspeakers_test
Loading Data...
found stat file: ./etc/TM1-stats.npz
Traceback (most recent call last):
  File "StarGAN-Voice-Conversion/utility.py", line 57, in normalizer_dict
    stat_filepath = [fn for fn in glob.glob(p) if one_speaker in fn][0]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 282, in <module>
    train(processed_dir, test_wav_dir)
  File "train.py", line 48, in train
    normlizer = Normalizer()
  File "StarGAN-Voice-Conversion/utility.py", line 29, in __init__
    self.norm_dict = self.normalizer_dict()
  File "StarGAN-Voice-Conversion/utility.py", line 59, in normalizer_dict
    raise Exception('====no match files!====')
Exception: ====no match files!====

Workflow: download dataset --> preprocess dataset --> train.
Preprocessing completed without errors, but training fails with the error above. Can you provide guidance?

Thanks.
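
The traceback shows that the lookup filters `glob.glob(p)` by speaker name and takes the first match, so a speaker with no matching stat file raises the `IndexError`. The sketch below lists which speakers lack a stat file. It assumes the files follow the `./etc/<speaker>-stats.npz` pattern seen in the log. The function name and the speaker list in the usage comment are hypothetical.

```python
import glob
import os

def missing_stat_files(speakers, stats_dir="./etc"):
    """Return the speakers that have no '*-stats.npz' file naming them.

    Assumes stat files are named '<speaker>-stats.npz', as in the log line
    'found stat file: ./etc/TM1-stats.npz'.
    """
    paths = glob.glob(os.path.join(stats_dir, "*-stats.npz"))
    names = [os.path.basename(p) for p in paths]
    return [s for s in speakers if not any(s in name for name in names)]

# Usage (hypothetical speaker list):
# print(missing_stat_files(["SF1", "SF2", "TM1", "TM2"]))
```

Any speaker returned here would trigger the "no match files" exception, which may mean preprocessing did not write statistics for that speaker.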

Cannot open the generated .wav file

Hello, I was experimenting with your code, but the files generated by the command below cannot be opened.
python convert.py --model_dir ./out/100_2018-11-11-20-46/model --source_speaker SF1 --target_speaker SM1
I manually placed SF1, SF2, SM1, and SM2 in both the train and test directories; preprocessing and training then completed successfully. Do you have any idea what is wrong?

KeyError: 'coded_sps_mean'

OS: Windows
I only changed the code that handles folders and files so that it works on Windows; the rest of the code is untouched.

If I start train.py:
python train.py --processed_dir .\data\processed --test_wav_dir .\data\fourspeakers_test

I get the error:

Traceback (most recent call last):
  File "train.py", line 266, in <module>
    train(processed_dir, test_wav_dir)
  File "train.py", line 115, in train
    one_file = normlizer.forward_process(one_file, speaker_name)
  File "C:\work\StarGAN-Voice-Conversion\utility.py", line 33, in forward_process
    mean = self.norm_dict[speakername]['coded_sps_mean']
KeyError: 'coded_sps_mean'

The dict stored under 'speakername' has these keys:
dict_keys(['f0', 'ap', 'sp', 'coded_sp'])

Any idea how I could fix it?
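
The lookup expects a `coded_sps_mean` entry, but the loaded dict has a different layout, so the stat file may have been written in a format the training code does not expect. The sketch below is a diagnostic lookup that reports the available keys when the expected one is missing. `get_stat` is a hypothetical helper, not part of the repository, and it does not fix the underlying mismatch.

```python
def get_stat(norm_dict, speaker, key):
    """Look up one statistic and report the available keys on failure."""
    entry = norm_dict[speaker]
    if key not in entry:
        raise KeyError(
            "%r missing for speaker %r; available keys: %s"
            % (key, speaker, sorted(entry))
        )
    return entry[key]

# Example with the layout reported above (values are placeholders).
example = {"SF1": {"f0": None, "ap": None, "sp": None, "coded_sp": None}}
try:
    get_stat(example, "SF1", "coded_sps_mean")
except KeyError as err:
    print(err)
```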

network implementation differences (InstanceNorm, probability mean, max-pooling)

Thanks for your great implementation.
I found some implementation differences compared to the original paper.

Difference list

  • IN instead of BN
  • mean instead of product in the last layer of D/C
  • (in C only) max-pooling instead of strided convolution

IN instead of BN

In the Generator, Discriminator, and Classifier, Instance Normalization (IN) is used instead of Batch Normalization (BN).
The Batch Normalization calls are commented out, so is there a problem with BN?

Mean instead of product in the last layer of D/C

In the original paper, the probabilities (one per patch) are multiplied together (a product).

the final output D(y, c) is given by the product of all these probabilities.
...
“Product” denote ... product pooling layers,

But in this implementation, the last layer averages the probabilities.

c1_red = tf.reduce_mean(c1, keepdims=True)

Is this an intentional choice based on your experiments, or is there some other reason?

(In C only) max-pooling instead of strided convolution

In D, down-sampling is done by strided convolution, as in the original paper.
But in C, down-sampling is done by max-pooling.
Why do they differ?
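
To make the mean-versus-product question concrete, here is a small NumPy sketch of the two pooling rules applied to the same patch probabilities. The values are made up for illustration. The product is computed in the log domain, an assumption chosen here for numerical stability and not taken from the paper or the repository.

```python
import numpy as np

def product_pool(probs, eps=1e-12):
    """Product of patch probabilities, computed as exp(sum(log p))."""
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0)
    return float(np.exp(np.sum(np.log(probs))))

def mean_pool(probs):
    """Arithmetic mean of patch probabilities."""
    return float(np.mean(np.asarray(probs, dtype=np.float64)))

# Three confident patches and one doubtful patch (illustrative values).
patch_probs = np.array([0.9, 0.9, 0.9, 0.1])
print(product_pool(patch_probs))  # about 0.0729
print(mean_pool(patch_probs))     # about 0.7
```

A single low-probability patch pulls the product close to zero, while the mean stays high. The product also shrinks as the number of patches grows, so the two rules can behave quite differently during training.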

ValueError: Cannot feed value of shape (8, 10) for Tensor 'source_label:0', which has shape '(?, 4)'

Now I get this error:
Traceback (most recent call last):
  File "train.py", line 283, in <module>
    train(processed_dir, test_wav_dir)
  File "train.py", line 158, in train
    lambda_classifier=lambda_classifier
  File "/home/wuli/work/deeplearning/tf-star-gan-voice-conversion/StarGAN-Voice-Conversion/model.py", line 148, in train
    self.generator_learning_rate: generator_learning_rate})
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1111, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (8, 10) for Tensor 'source_label:0', which has shape '(?, 4)'

Have you met this error before?

ValueError: Object arrays cannot be loaded when allow_pickle=False

[485:486]svaing file: ./data/processed/TM1-100131_0.npy
[485:486]svaing file: ./data/processed/TM1-100131_512.npy
save file: ./data/processed/TM1-100132
audio mcep shape (36, 512)
[486:486]svaing file: ./data/processed/TM1-100132_0.npy
Traceback (most recent call last):
  File "preprocess.py", line 200, in <module>
    generator.generate_stats()
  File "/home/user/sources/StarGAN-Voice-Conversion/utility.py", line 158, in generate_stats
    d = t.f.arr_0.item()
  File "/home/user/.local/lib/python3.6/site-packages/numpy/lib/npyio.py", line 94, in __getattribute__
    return object.__getattribute__(self, '_obj')[key]
  File "/home/user/.local/lib/python3.6/site-packages/numpy/lib/npyio.py", line 262, in __getitem__
    pickle_kwargs=self.pickle_kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/numpy/lib/format.py", line 722, in read_array
    raise ValueError("Object arrays cannot be loaded when "
ValueError: Object arrays cannot be loaded when allow_pickle=False
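
Since NumPy 1.16.3, `np.load` defaults to `allow_pickle=False`, so a dict stored in an `.npz` file (which NumPy saves as a pickled object array) no longer loads unless pickling is enabled explicitly. The sketch below reproduces the save/load round trip with `allow_pickle=True`. The file name and the dict keys are hypothetical stand-ins, and where the `np.load` call sits in `utility.py` is not shown in the traceback.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for per-speaker statistics stored as a dict.
stats = {"mean": np.zeros(36), "std": np.ones(36)}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example-stats.npz")
    # A positional argument is stored under the name 'arr_0';
    # the dict becomes a 0-d object array, which requires pickling.
    np.savez(path, stats)
    # allow_pickle=True is needed to read the object array back.
    with np.load(path, allow_pickle=True) as t:
        loaded = t["arr_0"].item()

print(sorted(loaded))
```

Only enable `allow_pickle=True` for files you created yourself or otherwise trust, because unpickling can execute arbitrary code.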

Version Issues

Hello,

I can't find the model file that should be generated after running train.py. The first argument of convert.py is causing an issue.

About the final result

I can't open the final result unless I first use sox on Linux to re-encode it as 32-bit or 16-bit, and I am curious why. By the way, have you obtained good results? I have tried some non-parallel data, and the result is bad. Although I can specify the voice after conversion, the voice is not clear. Do you have any advice on improving the voice quality?
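
Re-encoding to 16-bit making the file playable suggests the output may be written in a sample format that many players do not decode, such as 64-bit float. This is an assumption and is not confirmed by the repository. The sketch below writes a mono float waveform as 16-bit PCM using only NumPy and the standard `wave` module. `write_pcm16` is a hypothetical helper, and the 16 kHz sample rate in the usage is illustrative.

```python
import wave

import numpy as np

def write_pcm16(path, samples, sample_rate):
    """Write a mono float waveform in [-1, 1] as a 16-bit PCM WAV file."""
    samples = np.asarray(samples, dtype=np.float64)
    clipped = np.clip(samples, -1.0, 1.0)          # avoid integer overflow
    pcm = (clipped * 32767.0).astype("<i2")        # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)                         # mono
        wf.setsampwidth(2)                         # 2 bytes = 16 bits
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())

# Usage: one second of a 440 Hz tone at an assumed 16 kHz sample rate.
t = np.arange(16000) / 16000.0
tone = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)
# write_pcm16("tone.wav", tone, 16000)
```

Samples outside [-1, 1] are clipped, so a waveform with a larger range would need to be normalized first.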

Converted samples are corrupted!

In the samples folder, the converted speech files won't open with any player. I even tried playing them on my phone, but it didn't work.
