With so much noise it's basically unusable. Google's was perfectly noise-free. Hop

Why does it have so much noise and vibration, how can we remove it? (randomcnn-voice-transfer, CLOSED)

mazzzystar commented on July 3, 2024

Why does it have so much noise and vibration, and how can we remove it?

from randomcnn-voice-transfer.

Comments (10)

wintdkyo commented on July 3, 2024 1

I was actually impressed by the results! I tried it myself, and my own results are promising. I think it's a great starting point given the lack of pre-training prep required compared to other voice conversion projects out here.

ishandutta2007 commented on July 3, 2024 1

Thanks @mazzzystar. I have already been trying r9y9's ones.
I am consolidating everything I found and also whatever you suggested here

We can close this issue here.

For mutual benefits feel free to stay connected, here are my contact details:

google hangout/gmail: [email protected]
whatsapp: +919952917263
wechat: ishandutta2007offi
linkedin https://www.linkedin.com/in/ishandutta2007

mazzzystar commented on July 3, 2024

@ishandutta2007 Yes, you are looking at the correct samples. I listened to them again and don't think my results differ much from Google's in audio quality. Basically they're all NOT good, which is why I didn't write a paper about it, though I think my transferred style "tastes" more like the target person in voice texture. Sorry, I just like to share some of my experiments, which may give others some inspiration on this topic.

ishandutta2007 commented on July 3, 2024

I was talking about the various style transfer papers from here and their corresponding results. All of them are totally noise-free. https://google.github.io/tacotron.
Can you help me find where you downloaded these examples from?

mazzzystar commented on July 3, 2024

@ishandutta2007 https://google.github.io/speech_style_transfer/samples.html
Here it is. I wrote about it in the README.

ishandutta2007 commented on July 3, 2024

OK, got it. My mistake.
But my point was that this is not usable yet; there is a long way to go before it can be integrated into apps.
Is there a way to pass this generated voice through another NN and get a smooth, noise-free voice like in the other Google papers that I just showed you?
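
The post-processing idea in this question can be illustrated with something far simpler than a neural network. The sketch below is not what anyone in this thread used: it applies a plain moving-average filter to a synthetic noisy tone, only to show the shape of a "generated audio in, smoother audio out" stage. The signal, the noise level, and the `moving_average` helper are all hypothetical stand-ins, and on real speech such a filter would also muffle high frequencies.

```python
import math
import random


def moving_average(signal, window=9):
    """Smooth a 1-D signal with a centered moving average (a crude low-pass filter)."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical stand-in for a noisy generated voice: a 220 Hz tone plus white noise.
random.seed(0)
sample_rate = 16000
clean = [math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(1600)]
noisy = [c + random.gauss(0.0, 0.3) for c in clean]

# The "second stage": take the noisy output and return a smoother version.
denoised = moving_average(noisy, window=9)

print("error before filtering:", mse(noisy, clean))
print("error after filtering: ", mse(denoised, clean))
```

A learned denoiser or vocoder would replace `moving_average` in this pipeline; whether that actually helps on this project's output is exactly the open question in this thread.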

mazzzystar commented on July 3, 2024

@ishandutta2007 In my view, Tacotron is a framework for the text-to-speech synthesis task: it generates audio features from text. I don't see the similarity with this project. Maybe you mean: can we use the output of this work as the input features for some vocoder (like a WaveNet-based vocoder)? You can try, though I don't think it will work.

Yeah, currently this is just a toy. As far as I know, it's currently difficult to integrate speech conversion into apps. If you would like to do research, you may find the works below worth trying out:

Hope these can help you.

ishandutta2007 commented on July 3, 2024

I just checked your profile and found you have worked on Tacotron and WaveNet too, and those results are very much on par with Google's. Have you worked on any of the style transfer methodologies like 1803.09047 or 1803.09017?

Apologies, my questions might be too naive, as I am more of a developer than a DL researcher.

ishandutta2007 commented on July 3, 2024

As of now I am just looking for a TTS which is indistinguishable from humans (at least to the level Google demonstrated as part of Google Duplex).
I tried WaveNet; it's too robotic, in spite of adding all the prosody.
Same for Tacotron.
The Tacotron-with-style-transfer papers which I linked above look promising, but the results most people have uploaded on GitHub are too noisy and don't match Google's.

To be honest, style transfer is not my requirement; sounding realistic is all I need.

mazzzystar commented on July 3, 2024

@ishandutta2007 Actually, I'm also new to speech and haven't tried Tacotron-based style transfer work. I came across these two papers and think they may be adding some condition (e.g. speaker embedding, prosody, etc.) to Tacotron when converting text -> audio features. This may work for the style transfer task; one of my colleagues is trying out this method. The paper I posted above, "Neural Discrete Representation Learning", is what I'm now working on for style transfer between raw audio. My results are not as good as the demo, but it's still worth a try.
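
The "adding some condition" idea above can be sketched in a few lines. This is only an illustration of one common way to condition a sequence model, namely appending the same embedding vector to every encoder frame before decoding. It is an assumption, not something the thread confirms about those papers' implementations; the `condition_on_speaker` helper and all values are hypothetical.

```python
def condition_on_speaker(encoder_frames, speaker_embedding):
    """Append the same speaker embedding to every encoder frame.

    encoder_frames: list of per-timestep feature vectors (lists of floats).
    speaker_embedding: one fixed-size vector describing the target speaker/style.
    """
    return [frame + speaker_embedding for frame in encoder_frames]


# Hypothetical values: 3 text-encoder frames of size 4, and an embedding of size 2.
encoder_frames = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]
speaker_embedding = [1.0, -1.0]

conditioned = condition_on_speaker(encoder_frames, speaker_embedding)

# Each frame keeps its text features and gains the speaker dimensions.
print(len(conditioned), len(conditioned[0]))  # prints "3 6"
```

Swapping `speaker_embedding` for a different vector changes the conditioning while the text features stay the same, which is the property that makes this kind of setup a candidate for style transfer.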

As for indistinguishable TTS, I think Tacotron + WaveNet vocoder is currently good enough to get reasonable results. You may find these useful:
