Dataset licence about tortoise-tts (OPEN)

neonbjb commented on August 16, 2024
Dataset licence

from tortoise-tts.

Comments (7)

neonbjb commented on August 16, 2024

The dataset consists of thousands of audiobooks and podcasts that were scraped from the web. Many are copyrighted, which is why I am not releasing the dataset.

If you know or believe the laws in your jurisdiction will consider ML models as extensions of their datasets, then you should consider Tortoise license encumbered and you should not use it for commercial purposes.

C00reNUT commented on August 16, 2024

Thank you for the clarification.

Just one more thing: I am asking because I would like to use the train_ voices. Quoting the README:

This repo comes with several pre-packaged voices. Voices prepended with "train_" came from the training set and perform far better than the others. If your goal is high quality speech, I recommend you pick one of them. If you want to see what Tortoise can do for zero-shot mimicking, take a look at the others.

I just want to be sure that they are not an exact 1:1 copy of the original voices. The model's generalization may be fine according to the law, but I wouldn't be so sure about an exact voice match.

neonbjb commented on August 16, 2024

This is a good point. You should not use any of the pre-packaged voices for business purposes for the time being. I will re-open this and investigate which voices have copyrights attached to them and remove them.

neonbjb commented on August 16, 2024

FYI: LibriTTS and HiFiTTS datasets were used to train Tortoise. If you are looking for license-free voices that will work very well with this program, use one of those.

C00reNUT commented on August 16, 2024

Excellent, that is very valuable information. There should be plenty of public domain options; finding a good one will just take a bit of trial and error.

Aspie96 commented on August 16, 2024

Just as a (probably) dumb (related) question: is there any reason to favour those datasets over LibriSpeech or some other dataset based on LibriVox (maybe a public domain one, since LibriSpeech is not exactly public domain)?

neonbjb commented on August 16, 2024

Not a dumb question; this is something that took me some pain to figure out. ASR-focused datasets are often poor for TTS because they are missing punctuation and have bad splitting (e.g. not split on sentences). These are both important cues for a TTS system. Both of these apply to LibriSpeech.

I believe LibriSpeech intersects with LibriTTS, so the model should work equally well with voices from either dataset.
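
As a rough illustration of the two cues mentioned above, here is a minimal sketch that checks whether a transcript carries punctuation and ends on a sentence boundary. It is not part of Tortoise or of any dataset tooling; the function names and the set of characters treated as punctuation are assumptions made for this example.

```python
import re
from typing import Dict, List

# Assumption for this sketch: these characters count as punctuation cues.
ANY_PUNCT = re.compile(r"[.,;:!?]")
# A clip "ends on a sentence" if its text ends in . ! or ?,
# optionally followed by closing quotes or brackets.
SENTENCE_END = re.compile(r"[.!?][\"')\]]*$")


def transcript_cues(text: str) -> Dict[str, bool]:
    """Report which of the two cues a single transcript carries."""
    stripped = text.strip()
    return {
        "has_punctuation": bool(ANY_PUNCT.search(stripped)),
        "ends_on_sentence": bool(SENTENCE_END.search(stripped)),
    }


def usable_fraction(transcripts: List[str]) -> float:
    """Fraction of transcripts that carry both cues."""
    if not transcripts:
        return 0.0
    good = sum(1 for t in transcripts if all(transcript_cues(t).values()))
    return good / len(transcripts)
```

An upper-case, unpunctuated line in the style of an ASR transcript fails both checks, while a normally punctuated sentence passes both. Running `usable_fraction` over a sample of a dataset's transcripts gives a quick, rough sense of how many clips carry these cues.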
