Comments (10)

wintdkyo avatar wintdkyo commented on July 3, 2024 1

I was actually impressed by the results! I tried it myself, and my own results are promising. I think it's a great starting point, given how little pre-training prep it requires compared to other voice conversion projects out there.

from randomcnn-voice-transfer.

ishandutta2007 avatar ishandutta2007 commented on July 3, 2024 1

Thanks @mazzzystar. I have already been trying r9y9's ones.
I am consolidating everything I found, along with whatever you suggested here.

We can close this issue here.

For mutual benefit, feel free to stay connected; here are my contact details:

mazzzystar avatar mazzzystar commented on July 3, 2024

@ishandutta2007 Yes, you are looking at the correct samples. I listened to them again and don't think my results differ much from Google's in audio quality. Basically they are all NOT good, which is why I didn't write a paper about it, though I think my transferred style "tastes" more like the target person in voice texture. Sorry, I just like to share some of my experiments, which may give others some inspiration on this topic.

ishandutta2007 avatar ishandutta2007 commented on July 3, 2024

I was talking about the various style transfer papers listed at https://google.github.io/tacotron and their corresponding results. All of them are totally noise-free.
Can you help me find where you downloaded these examples from?

mazzzystar avatar mazzzystar commented on July 3, 2024

@ishandutta2007 https://google.github.io/speech_style_transfer/samples.html
Here it is; I mentioned it in the README.

ishandutta2007 avatar ishandutta2007 commented on July 3, 2024

OK, got it. My mistake.
But my point was that this is not usable yet; there is a long way to go before it can be integrated into apps.
Is there a way to pass this generated voice through another NN and get a smooth, noise-free voice like those in the other Google papers I just showed you?

mazzzystar avatar mazzzystar commented on July 3, 2024

@ishandutta2007 In my view, Tacotron is a framework for the text-to-speech synthesis task: it generates audio features from text. I don't see the similarity with this work. Maybe you mean using the output of this work as the input features for a vocoder (such as a WaveNet-based vocoder)? You can try, though I don't think it will work.

Yeah, currently this is just a toy. As far as I know, it is still difficult to integrate speech conversion into apps. If you would like to do research, you may find the works below worth trying out:

Hope these can help you.

ishandutta2007 avatar ishandutta2007 commented on July 3, 2024

I just checked your profile and found you have worked on Tacotron and WaveNet too, and those results are very much on par with Google's. Have you worked on any of the style transfer methodologies like 1803.09047 or 1803.09017?

Apologies, my questions might be too naive, as I am more of a developer than a DL researcher.

ishandutta2007 avatar ishandutta2007 commented on July 3, 2024

As of now I am just looking for a TTS that is indistinguishable from humans (at least to the level Google demonstrated as part of Google Duplex).
I tried WaveNet; it's too robotic, in spite of adding all the prosody.
Same for Tacotron.
The Tacotron-with-style-transfer papers I linked above look promising, but the results most people have uploaded on GitHub are too noisy and do not match Google's.

To be honest, style transfer is not my requirement; sounding realistic is all I need.

mazzzystar avatar mazzzystar commented on July 3, 2024

@ishandutta2007 Actually I'm also new to speech and haven't tried Tacotron-based style transfer work. I came across these two papers and think they add some conditioning (e.g. speaker embedding, prosody, etc.) to Tacotron when converting text -> audio features, which may work for the style transfer task. One of my colleagues is trying out this method. The paper I posted above, "Neural Discrete Representation Learning", is what I'm now working on for style transfer between raw audio; my results are not as good as the demo, but it is still worth a try.

As for indistinguishable TTS, I think Tacotron + WaveNet vocoder is currently good enough to get reasonable results. You may find these useful:
