mrwellmann / unity-text-to-seech-with-sentis

This repo contains an editor-only TTS system for Unity 3D. It outputs the audio, writes it to an output.wav file, and puts that file into the StreamingAssets folder.

License: Creative Commons Zero v1.0 Universal

C# 8.61% Python 9.41% HLSL 14.01% ShaderLab 67.97%

unity-text-to-seech-with-sentis's Introduction

Text to speech implementation for Unity 3D

Overview

This repo contains a text to speech implementation for Unity 3D. Because it uses the python.scripting package, it's not possible to make a build. If you run it in the editor, it will output the audio and put an output.wav file into the StreamingAssets folder. See a sample output here.

Model

I'm using the Unity Sentis Package together with ljspeech-jets-onnx. The ONNX model is not included in this repo and has to be downloaded separately. You can download it from Hugging Face ljspeech-jets-onnx.

Tokenizer

The tokenization is done with the help of the python.scripting package. Note: The package can only be used in the Editor. To perform tokenization into phonemes without Python, have a look at this post.

How to run ljspeech-jets-onnx in Unity 3D with Sentis

  1. Download the ONNX file from https://huggingface.co/NeuML/ljspeech-jets-onnx/tree/main. The model file, Assets/[TTS]/Data Models/ljspeech-jets-onnx/model.onnx, is not part of the repository and is ignored via the .gitignore to save LFS space.
  2. You might need to reimport the model: right-click the model asset and reimport it. (Explanation: https://discussions.unity.com/t/binarizer-sample-add-a-custom-layer-only-needs-a-reimport/279200).
  3. The import will still display an error, but it can be disregarded. We will be removing the final layers.
  4. You can execute the scene "TTS Test", which will produce Assets\StreamingAssets\output.wav for the string "Hello World! I wish I could speak."
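
The repository's own code for saving the audio is not reproduced on this page. As a rough illustration of what producing output.wav in step 4 involves, here is a minimal sketch that writes mono float samples as a 16-bit PCM WAV file. The class name `WavWriter`, the method signature, and the mono, 16-bit layout are assumptions made for this example, not the author's implementation.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not taken from the repository.
public static class WavWriter
{
    // Writes mono float samples (expected range -1..1) as a 16-bit PCM WAV file.
    public static void Write(string path, float[] samples, int sampleRate)
    {
        const short channels = 1;
        const short bitsPerSample = 16;
        int dataSize = samples.Length * (bitsPerSample / 8);

        using (var stream = new FileStream(path, FileMode.Create))
        using (var writer = new BinaryWriter(stream))
        {
            // RIFF header
            writer.Write(Encoding.ASCII.GetBytes("RIFF"));
            writer.Write(36 + dataSize);               // file size minus the first 8 bytes
            writer.Write(Encoding.ASCII.GetBytes("WAVE"));

            // fmt chunk
            writer.Write(Encoding.ASCII.GetBytes("fmt "));
            writer.Write(16);                           // fmt chunk size for PCM
            writer.Write((short)1);                     // audio format 1 = PCM
            writer.Write(channels);
            writer.Write(sampleRate);
            writer.Write(sampleRate * channels * bitsPerSample / 8);  // byte rate
            writer.Write((short)(channels * bitsPerSample / 8));      // block align
            writer.Write(bitsPerSample);

            // data chunk
            writer.Write(Encoding.ASCII.GetBytes("data"));
            writer.Write(dataSize);
            foreach (float sample in samples)
            {
                // Clamp to avoid overflow when converting to 16-bit integers.
                float clamped = Math.Max(-1f, Math.Min(1f, sample));
                writer.Write((short)(clamped * short.MaxValue));
            }
        }
    }
}
```

Inside Unity, the target path could be built with `Path.Combine(Application.streamingAssetsPath, "output.wav")`. The sample rate has to match the model's output; 22050 Hz is a common value for LJSpeech-based models, but treat that as an assumption and check it against the model card.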

The Story

This was my first attempt at using Sentis over a weekend, and I had no prior experience working with AI or ML code.

I spent several hours searching for other ONNX models without the "If" operator but didn't find any. I then got more help at https://discussions.unity.com/t/model-didnt-import-ljspeech-jets-onnx/265609/13.

As pointed out in the forum, modifying the ONNX outside of Unity might have been faster, especially with the assistance of Chat GPT-4 providing me with step-by-step guidance. What wasn't feasible was using the code interpreter, as it lacked access to the ONNX library.

Another option would have been to learn how to create an ONNX myself; however, I did not want to take that route this time.

A very helpful tool was https://netron.app/. It makes it easy to understand what the ONNX model is doing. Here is a screenshot of the sections with the "If" operator as well as the inputs and outputs of ljspeech-jets-onnx: [screenshot]

Muse Chat history of importing ljspeech-jets-onnx into Unity 3D

I posted some questions that I could have answered myself, but I thought it would be fun to see where the conversation led.

  • You can find a text copy of the conversation at Copy Of Chat
  • And here are the more readable images

[12 screenshots of the Muse Chat conversation]

unity-text-to-seech-with-sentis's Issues

Tokenization

Thank you for your Sentis example project. The tokenization is done right; I just changed the output string a bit and parsed it as an int array.

In TokenizerRunner.cs:

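            // Get the output from the StringWriter
            string output = stringWriter.ToString();
            // Drop the first character and the last two
            // (presumably the surrounding brackets and the trailing newline).
            output = output.Substring(1, output.Length - 3);
            return output;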
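
The Substring call above depends on exact character counts, so it would cut off the wrong characters if the tokenizer output ended with a different line ending. A minimal sketch of a count-independent variant follows. It assumes the raw output is a bracketed, space-separated list such as "[12 7 3]" followed by a newline, which is inferred from the Substring arguments and not confirmed by the source. `TokenizerOutputCleaner` and `StripBrackets` are hypothetical names.

```csharp
using System;

// Hypothetical helper for illustration; not part of the repository.
public static class TokenizerOutputCleaner
{
    // Removes surrounding whitespace (including "\n" or "\r\n") and the
    // enclosing square brackets, whatever the line ending is.
    public static string StripBrackets(string rawOutput)
    {
        return rawOutput.Trim().TrimStart('[').TrimEnd(']').Trim();
    }
}
```

With this helper, `TokenizerOutputCleaner.StripBrackets("[12 7 3]\r\n")` and `TokenizerOutputCleaner.StripBrackets("[12 7 3]\n")` both return "12 7 3".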

In TestTextToSpeech.cs:

        // Convert input text to tensor (requires "using System.Linq;").
        tokenizedOutput = tokenizerRunner.ExecuteTokenizer(text);
        string[] tokens = tokenizedOutput.Split(' ');
        for (int i = 0; i < tokens.Length; i++)
        {
            // Consecutive spaces produce empty entries; replace them with "0".
            if (tokens[i] == "")
            {
                tokens[i] = "0";
            }
        }
        int[] inputValues = tokens.Select(int.Parse).ToArray();

Fix for the gibberish bug

Thanks for sharing your code. 👍

After applying the changes from bearman, replace this part of the code in the TestTextToSpeech.cs file:

    // Convert input text to tensor (requires "using System.Linq;").
    var tokenizedOutput = tokenizerRunner.ExecuteTokenizer(inputText);
    var tokenList = tokenizedOutput.Split(' ').ToList();
    // Iterate backwards so that removing an entry does not shift the indices still to be visited.
    for (int i = tokenList.Count - 1; i >= 0; i--)
    {
        if (tokenList[i] == "")
        {
            tokenList.RemoveAt(i);
        }
    }
    int[] inputValues = tokenList.Select(int.Parse).ToArray();

Instead of replacing each empty string with a zero, the unneeded tokens are now removed.
An output .wav is attached...

output.zip
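
The empty-entry removal described in this issue can also be written without an explicit loop, using the standard `StringSplitOptions.RemoveEmptyEntries` option. The following is a minimal sketch of that alternative, not the code used in the repository. `TokenParsing` and `ParseTokenIds` are hypothetical names, and the input is assumed to be a whitespace-separated list of integers with the brackets already stripped.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper for illustration; not part of the repository.
public static class TokenParsing
{
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    // Splits on whitespace, skips the empty entries that consecutive
    // separators would otherwise produce, and parses each token as an int.
    public static int[] ParseTokenIds(string tokenizedOutput)
    {
        return tokenizedOutput
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Select(token => int.Parse(token, CultureInfo.InvariantCulture))
            .ToArray();
    }
}
```

For example, `TokenParsing.ParseTokenIds("12  7 3 ")` returns the array { 12, 7, 3 }: the double space and the trailing space no longer produce extra entries.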
