
Comments (9)

hujinsen avatar hujinsen commented on May 24, 2024

Hi,
I used the same setup as you, placing SF1, SF2, SM1, and SM2 in the train and test directories, then trained the model and converted the speech. Everything went well, and the generated wav files can be opened.

My environment:
Ubuntu 16.04, tensorflow 1.7, librosa 0.6.2.

I'm wondering if the wav files generated during training are good.


ZohaibAhmed avatar ZohaibAhmed commented on May 24, 2024

@hujinsen I have a similar problem. The training does happen pretty quickly (not sure if it's supposed to take longer). Here's the output, truncated:

Iteration: 0014745, Generator Learning Rate: 0.0001526, Discriminator Learning Rate: 0.0000763,Generator Loss : 1.193, Discriminator Loss : 1.130, domain_classifier_loss: 0.000
============test model============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============test finished!============
============save model============
save model: ./out/100_2018-11-12-22-23/model
Time Elapsed for This Epoch: 00:00:42
Training Time: 00:56:07

The wavs in the out directory, as well as the ones produced by convert.py, aren't good. I'm wondering what I might be doing wrong. I ran the code as-is with the data from the README. I'll attach a sample of what I get.

SF1-TM1+200001.wav.zip


keikunn4 avatar keikunn4 commented on May 24, 2024

I cannot open the files generated during training either.
The following file was generated at ./out/100_2018-11-11-20-46/wav/:
SF1-SM1+200001.zip
(P.S. I am working on Google Colaboratory, which runs Ubuntu 18.04.1, tensorflow 1.12, librosa 0.6.2.)


hujinsen avatar hujinsen commented on May 24, 2024

I cannot open the files generated during training either.
The following file was generated at ./out/100_2018-11-11-20-46/wav/:
SF1-SM1+200001.zip
(P.S. I am working on Google Colaboratory, which runs Ubuntu 18.04.1, tensorflow 1.12, librosa 0.6.2.)

[Screenshot (2018-11-15 10:45): spectrogram comparison of the two converted audio files]

The first row is your converted audio's spectrogram; the second row is the same file generated by my 90-epoch model. The first audio's energy appears to be confined below 500 Hz, which is not normal. The problem may occur in the conversion process: f0 may not be converted correctly. Could you please check whether the generated speaker statistics are OK? If the problem still exists, please mail me the model you trained and I'll locate the problem.
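You can reproduce a spectrogram like the ones in the screenshot with librosa and matplotlib; this is only a minimal sketch (the wav path and plot settings are examples, not something the repo provides):

import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load the converted audio at its native sample rate (path is an example).
wav, sr = librosa.load('SF1-SM1+200001.wav', sr=None)

# Log-magnitude spectrogram, similar to the comparison in the screenshot above.
spec_db = librosa.amplitude_to_db(np.abs(librosa.stft(wav)), ref=np.max)
librosa.display.specshow(spec_db, sr=sr, x_axis='time', y_axis='hz')
plt.colorbar(format='%+2.0f dB')
plt.title('SF1-SM1+200001 (converted)')
plt.tight_layout()
plt.show()

If most of the energy sits below 500 Hz, you should see it immediately in this plot.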


hujinsen avatar hujinsen commented on May 24, 2024

@hujinsen I have a similar problem. The training does happen pretty quickly (not sure if it's supposed to take longer). Here's the output, truncated:

Iteration: 0014745, Generator Learning Rate: 0.0001526, Discriminator Learning Rate: 0.0000763,Generator Loss : 1.193, Discriminator Loss : 1.130, domain_classifier_loss: 0.000
============test model============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============save converted audio============
============test finished!============
============save model============
save model: ./out/100_2018-11-12-22-23/model
Time Elapsed for This Epoch: 00:00:42
Training Time: 00:56:07

The wavs in the out directory, as well as the ones produced by convert.py, aren't good. I'm wondering what I might be doing wrong. I ran the code as-is with the data from the README. I'll attach a sample of what I get.

SF1-TM1+200001.wav.zip

[Screenshot (2018-11-15 11:07): waveform of the attached converted audio, with values close to zero]

Hi, the generated waveform has almost all values near zero. That's strange. Could you please send me the model you trained by mail?
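A quick way to confirm this on your side is to load the file and look at its amplitude; a minimal sketch (the path is an example):

import numpy as np
import librosa

# Load the converted file at its native sample rate (path is an example).
wav, sr = librosa.load('SF1-TM1+200001.wav', sr=None)

print('duration (s):', len(wav) / sr)
print('peak amplitude:', np.max(np.abs(wav)))    # a value near zero means the file is essentially silence
print('mean |amplitude|:', np.mean(np.abs(wav)))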


keikunn4 avatar keikunn4 commented on May 24, 2024

@hujinsen Thank you for the follow-up!
f0 for ./SF1/200001.wav seems to have no problem (I was able to play it with IPython.display).
However, I did not understand how I should check the "generated speaker statistics". Could you explain in detail what I should check?
(P.S. My working folder is the following shared Google Drive: StarGAN-Voice-Conversion.)


hujinsen avatar hujinsen commented on May 24, 2024

@keikunn4 The "generated speaker statistics" are the files in the ./etc directory. I listened to the generated voice, and it's just noise. I also used your model to convert the speech, and the results are the same as yours: noise. The model may have collapsed; I suggest retraining your model.
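To check them, you can load the .npz archives in ./etc directly; a minimal sketch (the file name is taken from the convert.py log below, but the exact array keys stored inside are an assumption, so list them first):

import numpy as np

# Inspect one of the speaker statistics files in ./etc.
stats = np.load('./etc/SF1-stats.npz')
print('arrays in the archive:', stats.files)
for key in stats.files:
    arr = stats[key]
    print(key, arr.shape, arr)  # NaNs or zeros here would explain a broken f0 conversion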


hujinsen avatar hujinsen commented on May 24, 2024

@ZohaibAhmed The cause of "generated waveform almost all values near zero" is the numpy version. A higher numpy version causes a warning when running train.py: "numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88". When you see this warning, change the numpy version to 1.14.5 and tensorflow to 1.8.0; this fixes the problem.
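A quick way to confirm which versions are actually being imported (a minimal sketch; it only reports the versions, it does not change them):

import numpy as np
import tensorflow as tf

print('numpy:', np.__version__)       # should be 1.14.5
print('tensorflow:', tf.__version__)  # should be 1.8.0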


ZohaibAhmed avatar ZohaibAhmed commented on May 24, 2024

@hujinsen I don't get that warning, but the results still have values near zero, even with the etc folder that you sent.

Here's the full log:

~/Development/StarGAN-Voice-Conversion$ python3 convert.py --model_dir ./out/100_2018-11-12-22-23/model/  --source_speaker SF1 --target_speaker TM1
d1: [None, 36, 512, 32]
d2: [None, 18, 256, 64]
d3: [None, 9, 128, 128]
d4: [None, 9, 128, 64]
[None, 1, 128, 4]
u1.shape :[None, 9, 128, 64]
c1 shape: (?, 9, 128, 4)
u1_concat.shape :[None, 9, 128, 68]
u2.shape :[None, 9, 128, 128]
u3.shape :[None, 18, 256, 64]
u4.shape :[None, 36, 512, 32]
u4_concat.shape :[None, 36, 512, 36]
u5.shape :[None, 36, 512, 1]
d1: [None, 36, 512, 32]
d2: [None, 18, 256, 64]
d3: [None, 9, 128, 128]
d4: [None, 9, 128, 64]
[None, 1, 128, 4]
u1.shape :[None, 9, 128, 64]
c1 shape: (?, 9, 128, 4)
u1_concat.shape :[None, 9, 128, 68]
u2.shape :[None, 9, 128, 128]
u3.shape :[None, 18, 256, 64]
u4.shape :[None, 36, 512, 32]
u4_concat.shape :[None, 36, 512, 36]
u5.shape :[None, 36, 512, 1]
d1: [None, 36, 512, 32]
d2: [None, 18, 256, 64]
d3: [None, 9, 128, 128]
d4: [None, 9, 128, 64]
[None, 1, 128, 4]
u1.shape :[None, 9, 128, 64]
c1 shape: (?, 9, 128, 4)
u1_concat.shape :[None, 9, 128, 68]
u2.shape :[None, 9, 128, 128]
u3.shape :[None, 18, 256, 64]
u4.shape :[None, 36, 512, 32]
u4_concat.shape :[None, 36, 512, 36]
u5.shape :[None, 36, 512, 1]
domain_classifier_d1: (?, 8, 512, 8)
domain_classifier_d1_p: (?, 4, 256, 8)
domain_classifier_d12: (?, 4, 256, 16)
domain_classifier_d2_p: (?, 2, 128, 16)
domain_classifier_d3: (?, 2, 128, 32)
domain_classifier_d3_p: (?, 1, 64, 32)
domain_classifier_d4: (?, 1, 64, 16)
domain_classifier_d4_p: (?, 1, 32, 16)
domain_classifier_d5: (?, 1, 32, 4)
domain_classifier_d5_p: (?, 1, 16, 4)
classifier_output: (?, 1, 1, 4)
domain_classifier_d1: (?, 8, 512, 8)
domain_classifier_d1_p: (?, 4, 256, 8)
domain_classifier_d12: (?, 4, 256, 16)
domain_classifier_d2_p: (?, 2, 128, 16)
domain_classifier_d3: (?, 2, 128, 32)
domain_classifier_d3_p: (?, 1, 64, 32)
domain_classifier_d4: (?, 1, 64, 16)
domain_classifier_d4_p: (?, 1, 32, 16)
domain_classifier_d5: (?, 1, 32, 4)
domain_classifier_d5_p: (?, 1, 16, 4)
classifier_output: (?, 1, 1, 4)
d1: [None, 36, 512, 32]
d2: [None, 18, 256, 64]
d3: [None, 9, 128, 128]
d4: [None, 9, 128, 64]
[None, 1, 128, 4]
u1.shape :[None, 9, 128, 64]
c1 shape: (?, 9, 128, 4)
u1_concat.shape :[None, 9, 128, 68]
u2.shape :[None, 9, 128, 128]
u3.shape :[None, 18, 256, 64]
u4.shape :[None, 36, 512, 32]
u4_concat.shape :[None, 36, 512, 36]
u5.shape :[None, 36, 512, 1]
2018-11-19 09:45:33.388578: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-11-19 09:45:34.039516: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-19 09:45:34.039874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.665
pciBusID: 0000:01:00.0
totalMemory: 10.73GiB freeMemory: 10.53GiB
2018-11-19 09:45:34.039885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-19 09:45:34.261052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-19 09:45:34.261073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2018-11-19 09:45:34.261078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2018-11-19 09:45:34.261211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10165 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
found stat file: ./etc/TM1-stats.npz
found stat file: ./etc/SF2-stats.npz
found stat file: ./etc/SF1-stats.npz
found stat file: ./etc/TM2-stats.npz
