
Comments (9)

hujinsen avatar hujinsen commented on May 24, 2024

Hi,
I used the same setup as you, placing SF1, SF2, SM1, and SM2 in the train and test directories, then trained the model and converted the speech. Everything went well, and the generated wav files can be opened.

My environment:
Ubuntu 16.04, tensorflow 1.7, librosa 0.6.2.

I'm wondering if the wav files generated during training are good.


ZohaibAhmed avatar ZohaibAhmed commented on May 24, 2024

@hujinsen I have a similar problem. The training does happen pretty quickly (not sure if it's supposed to take longer). Here's the output, truncated:

Iteration: 0014745, Generator Learning Rate: 0.0001526, Discriminator Learning Rate: 0.0000763,Generator Loss : 1.193, Discriminator Loss : 1.130, domain_classifier_loss: 0.000
============test model============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============test finished!============
============save model============
save model: ./out/100_2018-11-12-22-23/model
Time Elapsed for This Epoch: 00:00:42
Training Time: 00:56:07

The wavs in the out directory, as well as the ones produced by convert.py, aren't good. I'm wondering what I might be doing wrong. I ran the code as-is with the data from the README. I'll attach a sample of what I get.

SF1-TM1+200001.wav.zip


keikunn4 avatar keikunn4 commented on May 24, 2024

I cannot open the files generated during training either.
The following file was generated at ./out/100_2018-11-11-20-46/wav/:
SF1-SM1+200001.zip
(P.S. I am working on Google Colaboratory, which runs Ubuntu 18.04.1, tensorflow 1.12, librosa 0.6.2.)


hujinsen avatar hujinsen commented on May 24, 2024

I cannot open the files generated during training either.
The following file was generated at ./out/100_2018-11-11-20-46/wav/:
SF1-SM1+200001.zip
(P.S. I am working on Google Colaboratory, which runs Ubuntu 18.04.1, tensorflow 1.12, librosa 0.6.2.)

[Screenshot (2018-11-15 10:45): spectrogram comparison of the two converted audio files]

The first row is your converted audio's spectrogram; the second row is the same file generated by my 90-epoch model. The first audio's energy appears to be confined below 500 Hz, which is not normal. The problem may occur in the conversion process: f0 may not be converted correctly. Could you please check whether the generated speaker statistics are OK? If the problem still exists, please mail me the model you trained and I'll locate the problem.
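You can reproduce a spectrogram like the ones in the screenshot with librosa and matplotlib; this is only a minimal sketch (the wav path and plot settings are examples, not something the repo provides):

import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load the converted audio at its native sample rate (path is an example).
wav, sr = librosa.load('SF1-SM1+200001.wav', sr=None)

# Log-magnitude spectrogram, similar to the comparison in the screenshot above.
spec_db = librosa.amplitude_to_db(np.abs(librosa.stft(wav)), ref=np.max)
librosa.display.specshow(spec_db, sr=sr, x_axis='time', y_axis='hz')
plt.colorbar(format='%+2.0f dB')
plt.title('SF1-SM1+200001 (converted)')
plt.tight_layout()
plt.show()

If most of the energy sits below 500 Hz, you should see it immediately in this plot.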


hujinsen avatar hujinsen commented on May 24, 2024

@hujinsen I have a similar problem. The training does happen pretty quickly (not sure if it's supposed to take longer). Here's the output, truncated:

Iteration: 0014745, Generator Learning Rate: 0.0001526, Discriminator Learning Rate: 0.0000763,Generator Loss : 1.193, Discriminator Loss : 1.130, domain_classifier_loss: 0.000
============test model============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============test finished!============
============save model============
save model: ./out/100_2018-11-12-22-23/model
Time Elapsed for This Epoch: 00:00:42
Training Time: 00:56:07

The wavs in the out directory, as well as the ones produced by convert.py, aren't good. I'm wondering what I might be doing wrong. I ran the code as-is with the data from the README. I'll attach a sample of what I get.

SF1-TM1+200001.wav.zip

[Screenshot (2018-11-15 11:07): waveform of the attached converted audio, with values close to zero]

Hi, the generated waveform has almost all values near zero. That's strange. Could you please send me the model you trained by mail?
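A quick way to confirm this on your side is to load the file and look at its amplitude; a minimal sketch (the path is an example):

import numpy as np
import librosa

# Load the converted file at its native sample rate (path is an example).
wav, sr = librosa.load('SF1-TM1+200001.wav', sr=None)

print('duration (s):', len(wav) / sr)
print('peak amplitude:', np.max(np.abs(wav)))    # a value near zero means the file is essentially silence
print('mean |amplitude|:', np.mean(np.abs(wav)))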


keikunn4 avatar keikunn4 commented on May 24, 2024

@hujinsen Thank you for the follow-up!
f0 for ./SF1/200001.wav seems to have no problem (I was able to play it with IPython.display).
However, I did not understand how I should check the "generated speaker statistics". Could you explain in detail what I should check?
(P.S. My working folder is the following shared Google Drive: StarGAN-Voice-Conversion.)


hujinsen avatar hujinsen commented on May 24, 2024

@keikunn4 The "generated speaker statistics" are the files in the ./etc directory. I listened to the generated voice, and it's just noise. I also used your model to convert the speech, and the results are the same as yours: noise. The model may have collapsed; I suggest retraining your model.
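To check them, you can load the .npz archives in ./etc directly; a minimal sketch (the file name is taken from the convert.py log below, but the exact array keys stored inside are an assumption, so list them first):

import numpy as np

# Inspect one of the speaker statistics files in ./etc.
stats = np.load('./etc/SF1-stats.npz')
print('arrays in the archive:', stats.files)
for key in stats.files:
    arr = stats[key]
    print(key, arr.shape, arr)  # NaNs or zeros here would explain a broken f0 conversion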


hujinsen avatar hujinsen commented on May 24, 2024

@ZohaibAhmed The cause of "generated waveform almost all values near zero" is the numpy version. A higher numpy version causes a warning when running train.py: "numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88". When you see this warning, change the numpy version to 1.14.5 and tensorflow to 1.8.0; this fixes the problem.
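A quick way to confirm which versions are actually being imported (a minimal sketch; it only reports the versions, it does not change them):

import numpy as np
import tensorflow as tf

print('numpy:', np.__version__)       # should be 1.14.5
print('tensorflow:', tf.__version__)  # should be 1.8.0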


ZohaibAhmed avatar ZohaibAhmed commented on May 24, 2024

@hujinsen I don't get that warning, but the results still have values near zero, even with the etc folder that you sent.

Here's the full log:

~/Development/StarGAN-Voice-Conversion$ python3 convert.py --model_dir ./out/100_2018-11-12-22-23/model/  --source_speaker SF1 --target_speaker TM1
d1: [None, 36, 512, 32]
d2: [None, 18, 256, 64]
d3: [None, 9, 128, 128]
d4: [None, 9, 128, 64]
[None, 1, 128, 4]
u1.shape :[None, 9, 128, 64]
c1 shape: (?, 9, 128, 4)
u1_concat.shape :[None, 9, 128, 68]
u2.shape :[None, 9, 128, 128]
u3.shape :[None, 18, 256, 64]
u4.shape :[None, 36, 512, 32]
u4_concat.shape :[None, 36, 512, 36]
u5.shape :[None, 36, 512, 1]
d1: [None, 36, 512, 32]
d2: [None, 18, 256, 64]
d3: [None, 9, 128, 128]
d4: [None, 9, 128, 64]
[None, 1, 128, 4]
u1.shape :[None, 9, 128, 64]
c1 shape: (?, 9, 128, 4)
u1_concat.shape :[None, 9, 128, 68]
u2.shape :[None, 9, 128, 128]
u3.shape :[None, 18, 256, 64]
u4.shape :[None, 36, 512, 32]
u4_concat.shape :[None, 36, 512, 36]
u5.shape :[None, 36, 512, 1]
d1: [None, 36, 512, 32]
d2: [None, 18, 256, 64]
d3: [None, 9, 128, 128]
d4: [None, 9, 128, 64]
[None, 1, 128, 4]
u1.shape :[None, 9, 128, 64]
c1 shape: (?, 9, 128, 4)
u1_concat.shape :[None, 9, 128, 68]
u2.shape :[None, 9, 128, 128]
u3.shape :[None, 18, 256, 64]
u4.shape :[None, 36, 512, 32]
u4_concat.shape :[None, 36, 512, 36]
u5.shape :[None, 36, 512, 1]
domain_classifier_d1: (?, 8, 512, 8)
domain_classifier_d1_p: (?, 4, 256, 8)
domain_classifier_d12: (?, 4, 256, 16)
domain_classifier_d2_p: (?, 2, 128, 16)
domain_classifier_d3: (?, 2, 128, 32)
domain_classifier_d3_p: (?, 1, 64, 32)
domain_classifier_d4: (?, 1, 64, 16)
domain_classifier_d4_p: (?, 1, 32, 16)
domain_classifier_d5: (?, 1, 32, 4)
domain_classifier_d5_p: (?, 1, 16, 4)
classifier_output: (?, 1, 1, 4)
domain_classifier_d1: (?, 8, 512, 8)
domain_classifier_d1_p: (?, 4, 256, 8)
domain_classifier_d12: (?, 4, 256, 16)
domain_classifier_d2_p: (?, 2, 128, 16)
domain_classifier_d3: (?, 2, 128, 32)
domain_classifier_d3_p: (?, 1, 64, 32)
domain_classifier_d4: (?, 1, 64, 16)
domain_classifier_d4_p: (?, 1, 32, 16)
domain_classifier_d5: (?, 1, 32, 4)
domain_classifier_d5_p: (?, 1, 16, 4)
classifier_output: (?, 1, 1, 4)
d1: [None, 36, 512, 32]
d2: [None, 18, 256, 64]
d3: [None, 9, 128, 128]
d4: [None, 9, 128, 64]
[None, 1, 128, 4]
u1.shape :[None, 9, 128, 64]
c1 shape: (?, 9, 128, 4)
u1_concat.shape :[None, 9, 128, 68]
u2.shape :[None, 9, 128, 128]
u3.shape :[None, 18, 256, 64]
u4.shape :[None, 36, 512, 32]
u4_concat.shape :[None, 36, 512, 36]
u5.shape :[None, 36, 512, 1]
2018-11-19 09:45:33.388578: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-11-19 09:45:34.039516: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-19 09:45:34.039874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.665
pciBusID: 0000:01:00.0
totalMemory: 10.73GiB freeMemory: 10.53GiB
2018-11-19 09:45:34.039885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-19 09:45:34.261052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-19 09:45:34.261073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2018-11-19 09:45:34.261078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2018-11-19 09:45:34.261211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10165 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
found stat file: ./etc/TM1-stats.npz
found stat file: ./etc/SF2-stats.npz
found stat file: ./etc/SF1-stats.npz
found stat file: ./etc/TM2-stats.npz
