Comments (9)
Hi,
I used the same setup as you, placing SF1, SF2, SM1, and SM2 in the train and test directories, then trained the model and converted the speech. Everything went well, and the generated wav files can be opened.
My environments:
Ubuntu 16.04, tensorflow 1.7, librosa 0.6.2.
I'm wondering whether the wav files generated during training are supposed to sound good?
from stargan-voice-conversion.
@hujinsen I have a similar problem. The training goes pretty quickly (not sure if it's supposed to take longer). Here's the output, truncated:
Iteration: 0014745, Generator Learning Rate: 0.0001526, Discriminator Learning Rate: 0.0000763,Generator Loss : 1.193, Discriminator Loss : 1.130, domain_classifier_loss: 0.000
============test model============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============test finished!============
============save model============
save model: ./out/100_2018-11-12-22-23/model
Time Elapsed for This Epoch: 00:00:42
Training Time: 00:56:07
The wavs in the out directory, as well as the ones produced after running convert.py, aren't good. Wondering what I might be doing wrong? I ran the code as-is with the data in the README. I'll attach a sample of what I get.
I cannot open the files generated during training either.
The following file was generated at ./out/100_2018-11-11-20-46/wav/:
SF1-SM1+200001.zip
(P.S. I am working on Google Colaboratory, which runs Ubuntu 18.04.1, TensorFlow 1.12, librosa 0.6.2.)
The first row is your converted audio's spectrogram; the second row is the same file generated by the 90-epoch model. The first audio's frequency content sits almost entirely below 500 Hz, which is not normal. The problem may occur in the conversion process: f0 may not be converted correctly. Could you please check whether the generated speaker statistics are OK? If the problem persists, please mail me the model you trained and I'll locate the problem.
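For reference, the f0 conversion in this family of voice-conversion repos is usually a log-Gaussian normalization from source to target speaker statistics. A minimal sketch for checking that step in isolation (the statistics below are made up for illustration, not taken from the repo's ./etc files):

```python
import numpy as np

def convert_f0(f0, log_mean_src, log_std_src, log_mean_tgt, log_std_tgt):
    """Log-Gaussian normalized f0 transform; unvoiced frames (f0 == 0) stay 0."""
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0
    out = np.zeros_like(f0)
    out[voiced] = np.exp(
        (np.log(f0[voiced]) - log_mean_src) / log_std_src * log_std_tgt + log_mean_tgt
    )
    return out

# Made-up statistics, roughly a female-to-male pitch shift
f0 = np.array([0.0, 220.0, 240.0])
converted = convert_f0(f0, np.log(220.0), 0.2, np.log(120.0), 0.2)
print(converted)
```

If the converted f0 track never moves into the target speaker's range (or collapses to zero), the conversion step rather than the model is the likely culprit.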
Hi, the generated waveform has almost all values near zero. That's strange. Could you please send me the model you trained by mail?
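One way to verify the "values near zero" symptom without listening is to read a converted wav back and look at its peak sample. A stdlib-only sketch (it assumes 16-bit PCM output; the path in the comment is just an example):

```python
import array
import wave

def peak_amplitude(path):
    """Peak absolute sample of a 16-bit PCM wav, normalized to [0, 1]."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", w.readframes(w.getnframes()))
    return max((abs(s) for s in samples), default=0) / 32768.0

# e.g. peak_amplitude("./out/.../wav/SF1-SM1+200001.wav") < 0.001
# would mean the file is effectively silence.
```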
@hujinsen Thank you for the follow-up!
f0 for ./SF1/200001.wav seems to have no problem. (I was able to play it with IPython.display.)
However, I did not understand how I should check the "generated speaker statistics". Could you explain in detail what I should check?
(P.S. my working folder is the following shared Google Drive: StarGAN-Voice-Conversion)
@keikunn4 The "generated speaker statistics" are the files in the ./etc directory. I listened to the generated voice; it's just noise. I also used your model to convert the speech, and the results are the same as yours: noise. The model may have collapsed; I suggest retraining your model.
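A quick way to sanity-check those ./etc stats files is to load each .npz archive and flag NaN/inf or all-zero arrays. A sketch (the array key names vary, so it just iterates whatever is in the archive; the dummy file here stands in for a real one like ./etc/SF1-stats.npz):

```python
import numpy as np

def check_stats(path):
    """Print each array in a -stats.npz file; False if any has NaN/inf or is all zero."""
    stats = np.load(path)
    ok = True
    for key in stats.files:
        arr = np.asarray(stats[key])
        finite = bool(np.all(np.isfinite(arr)))
        nonzero = bool(np.any(arr != 0))
        print(key, arr.shape, "finite:", finite, "nonzero:", nonzero)
        ok = ok and finite and nonzero
    return ok

# Example with a dummy file; the real ones live at e.g. ./etc/SF1-stats.npz
np.savez("dummy-stats.npz", mean=np.array([1.0, 2.0]), std=np.array([0.5, 0.5]))
print(check_stats("dummy-stats.npz"))  # True for this healthy dummy file
```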
@ZohaibAhmed The cause of the "generated waveform almost all values nearly zero" problem is the numpy version. A newer numpy causes a warning when running train.py: "numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88". If you see this warning, change numpy to 1.14.5 and tensorflow to 1.8.0; that fixes the problem.
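A small runtime guard for the version mismatch described above can make the failure mode explicit instead of silent. A sketch (the version numbers are the ones reported in this comment, nothing more):

```python
import warnings
import numpy as np

def numpy_version_ok(version=None):
    """True if the numpy version is at or below the 1.14.x series recommended above."""
    version = version or np.__version__
    major, minor = (int(x) for x in version.split(".")[:2])
    return (major, minor) <= (1, 14)

if not numpy_version_ok():
    warnings.warn(
        "numpy %s may trigger the dtype-size warning; the reported fix is "
        "numpy==1.14.5 with tensorflow==1.8.0" % np.__version__
    )
```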
@hujinsen I don't get that warning, but the results still have values near zero, even with the etc folder that you sent.
Here's the full log:
~/Development/StarGAN-Voice-Conversion$ python3 convert.py --model_dir ./out/100_2018-11-12-22-23/model/ --source_speaker SF1 --target_speaker TM1
d1: [None, 36, 512, 32]
d2: [None, 18, 256, 64]
d3: [None, 9, 128, 128]
d4: [None, 9, 128, 64]
[None, 1, 128, 4]
u1.shape :[None, 9, 128, 64]
c1 shape: (?, 9, 128, 4)
u1_concat.shape :[None, 9, 128, 68]
u2.shape :[None, 9, 128, 128]
u3.shape :[None, 18, 256, 64]
u4.shape :[None, 36, 512, 32]
u4_concat.shape :[None, 36, 512, 36]
u5.shape :[None, 36, 512, 1]
d1: [None, 36, 512, 32]
d2: [None, 18, 256, 64]
d3: [None, 9, 128, 128]
d4: [None, 9, 128, 64]
[None, 1, 128, 4]
u1.shape :[None, 9, 128, 64]
c1 shape: (?, 9, 128, 4)
u1_concat.shape :[None, 9, 128, 68]
u2.shape :[None, 9, 128, 128]
u3.shape :[None, 18, 256, 64]
u4.shape :[None, 36, 512, 32]
u4_concat.shape :[None, 36, 512, 36]
u5.shape :[None, 36, 512, 1]
d1: [None, 36, 512, 32]
d2: [None, 18, 256, 64]
d3: [None, 9, 128, 128]
d4: [None, 9, 128, 64]
[None, 1, 128, 4]
u1.shape :[None, 9, 128, 64]
c1 shape: (?, 9, 128, 4)
u1_concat.shape :[None, 9, 128, 68]
u2.shape :[None, 9, 128, 128]
u3.shape :[None, 18, 256, 64]
u4.shape :[None, 36, 512, 32]
u4_concat.shape :[None, 36, 512, 36]
u5.shape :[None, 36, 512, 1]
domain_classifier_d1: (?, 8, 512, 8)
domain_classifier_d1_p: (?, 4, 256, 8)
domain_classifier_d12: (?, 4, 256, 16)
domain_classifier_d2_p: (?, 2, 128, 16)
domain_classifier_d3: (?, 2, 128, 32)
domain_classifier_d3_p: (?, 1, 64, 32)
domain_classifier_d4: (?, 1, 64, 16)
domain_classifier_d4_p: (?, 1, 32, 16)
domain_classifier_d5: (?, 1, 32, 4)
domain_classifier_d5_p: (?, 1, 16, 4)
classifier_output: (?, 1, 1, 4)
domain_classifier_d1: (?, 8, 512, 8)
domain_classifier_d1_p: (?, 4, 256, 8)
domain_classifier_d12: (?, 4, 256, 16)
domain_classifier_d2_p: (?, 2, 128, 16)
domain_classifier_d3: (?, 2, 128, 32)
domain_classifier_d3_p: (?, 1, 64, 32)
domain_classifier_d4: (?, 1, 64, 16)
domain_classifier_d4_p: (?, 1, 32, 16)
domain_classifier_d5: (?, 1, 32, 4)
domain_classifier_d5_p: (?, 1, 16, 4)
classifier_output: (?, 1, 1, 4)
d1: [None, 36, 512, 32]
d2: [None, 18, 256, 64]
d3: [None, 9, 128, 128]
d4: [None, 9, 128, 64]
[None, 1, 128, 4]
u1.shape :[None, 9, 128, 64]
c1 shape: (?, 9, 128, 4)
u1_concat.shape :[None, 9, 128, 68]
u2.shape :[None, 9, 128, 128]
u3.shape :[None, 18, 256, 64]
u4.shape :[None, 36, 512, 32]
u4_concat.shape :[None, 36, 512, 36]
u5.shape :[None, 36, 512, 1]
2018-11-19 09:45:33.388578: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-11-19 09:45:34.039516: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-19 09:45:34.039874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.665
pciBusID: 0000:01:00.0
totalMemory: 10.73GiB freeMemory: 10.53GiB
2018-11-19 09:45:34.039885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-19 09:45:34.261052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-19 09:45:34.261073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-11-19 09:45:34.261078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-11-19 09:45:34.261211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10165 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
found stat file: ./etc/TM1-stats.npz
found stat file: ./etc/SF2-stats.npz
found stat file: ./etc/SF1-stats.npz
found stat file: ./etc/TM2-stats.npz