andrew-fennell / cognative

Translated vocal synthesis - Clone a voice and output speech in another language

Python 99.81% Batchfile 0.19%
ai ml python speech-synthesis text-to-speech translation tts voice

cognative's Introduction

Hi there, I'm Andrew! 👋


I recently graduated from Texas A&M University with a BS in Computer Engineering. Outside of work, I enjoy working on personal and open source projects.

About me

  • 🚧  I'm always working on projects for fun
  • 💻  I enjoy contributing to open source projects
  • 📷  I am learning photography and chess
  • 🗻  僕は日本語を勉強しています (I'm studying Japanese)

How I work

  • I am highly motivated and enjoy learning new things.
  • I study and learn well through habit building.
  • I enjoy checking things off the list. ✅
  • I like improving skills, languages, and technologies I've learned through putting them into action with projects.

GitHub stats

These are a few stats from the current year that show my contributions to (public) personal and open source projects.

andrew-fennell's GitHub stats

Get in touch

If you have any questions about what I'm working on or just want to get in touch, feel free to contact me!

cognative's People

Contributors

andrew-fennell, aref-sadeghi-gh, austin-currington, ctis98, dependabot[bot], jacobsmith1020, ptjohn0122


cognative's Issues

Clean up unused RTVC files

There are some different GUI files and things that won't be used. These just need to be cleaned up before a "release".

Develop initial user interface

Create initial user interface.

File browser:

  • Source audio file path
  • Destination audio file path

Dropdown:

  • Source language (checkbox to auto detect language)
  • Destination language

Button:

  • Run vocal synthesis
  • Play audio

Some decisions need to be made about technologies. Those decisions can be added here as comments.

Fix RTVC printing colors

Some colored print statements are never reset.

For example, if something prints in green, the next line may also print in green.
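The usual fix is to terminate every colored print with the ANSI reset code. A minimal sketch (the helper name is illustrative, not from the repo):

```python
# ANSI escape codes: "\033[32m" switches to green, "\033[0m" resets all styling.
GREEN = "\033[32m"
RESET = "\033[0m"

def print_green(message: str) -> None:
    """Print a message in green and always reset the terminal color afterwards."""
    print(f"{GREEN}{message}{RESET}")
```

Wrapping every colored print in a helper like this makes it impossible to forget the reset.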

Create hardware documentation

Hardware documentation should be created (preferably in .md format).

This documentation can replace the current README.md file at CogNative/CogNative/hardware.

Documentation should cover all of the hardware that we are using. It should also include anything that someone would need to run this project (e.g., CUDA).

Feel free to set minimum specs in this document for other components, but I don't think it is required.

Add cross-language model to vocal synthesis

Add at least two cross-language models to vocal synthesis. This is a core function of the project, so it should be one of the primary focuses going forward.

The specific source languages can be chosen by the ML/AI team. The destination language should be English.

Make user interface improvements (if needed)

After an initial user interface is created, there may be new features by April 11th that need to be added.

If there are any adjustments to be made, this is a reminder to do so.

Integrate speech_recognition module with vocal synthesis models.

Create a module that will handle all vocal synthesis interaction.

This should include:

  • Taking in audio file path
  • Choosing language (or auto identifying language, when that is added)
  • Transcribing the audio file (in any supported language)
  • Translating the audio file (to any supported language)
  • Synthesizing audio in that language that mimics the source voice
  • Handling the output location of those files

The output audio may be bad before an improved model is implemented for cross-language vocal synthesis to the destination language. This is okay. The point is to add a module that will handle all of these top-level things.

We will narrow the scope to specific source and destination languages that are supported by our vocal synthesis models when we begin creating those models.
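The steps above can be sketched as a single orchestration function. All names here are illustrative, and the three stage callables stand in for the existing STT, translation, and synthesis modules:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SynthesisRequest:
    """Top-level inputs the module would accept (field names are hypothetical)."""
    audio_path: str
    source_lang: Optional[str]  # None -> auto-detect, once that feature exists
    target_lang: str
    output_path: str

def run_pipeline(request, transcribe, translate, synthesize):
    """Chain the three stages: transcribe -> translate -> synthesize.

    Each callable is injected so the real modules can be swapped in; the
    return value is whatever the synthesis stage reports (e.g., the output path).
    """
    text = transcribe(request.audio_path, request.source_lang)
    translated = translate(text, request.target_lang)
    return synthesize(translated, request.audio_path, request.output_path)
```

Injecting the stages as parameters keeps the module testable with stubs before the cross-language models exist.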

Adjust README to reflect project proposal

The README needs to be adjusted: the joke content should be removed, the project name should be added to match the proposal, and the description should be updated to reflect what we decided to build.

Auto-detect source audio language

The first 10 seconds could be clipped (if the source audio is longer than 10 seconds) to auto-detect the language of the source audio.

This would allow us to remove another input from the UI and CLI.

Integrate STT and Translation

Add functions that transcribe and translate audio in different languages, using the modules already developed for STT and translation.

Noise reduction in vocal synthesis

Currently, the synthesized voice contains significant static.

There are libraries that we can use to improve the sound quality and remove static. This would make the listener's experience significantly better.
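As a toy illustration of the idea only (a real fix would use a dedicated noise-reduction library on the synthesized waveform), a simple moving average acts as a crude low-pass filter and smooths out high-frequency static:

```python
def moving_average(samples, window: int = 5):
    """Smooth a sequence of audio samples with a simple moving average.

    This is a toy sketch of static reduction, not the project's method:
    averaging neighboring samples attenuates high-frequency noise.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed
```

A purpose-built library would do this properly with spectral gating rather than naive averaging.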

Add language auto-detection

Add a language auto-detection feature. This should auto-detect the language being spoken in an input audio file (preferably a .wav file).

Setup new hardware

The NUC may no longer work. We need to set up new hardware, with the help of our professor, and secure a place to store it.

Return file cut short when testing autotranslation

When testing autotranslation, I ran the following command on a one-minute Spanish clip cut from the beginning of Angelina.wav and received a five-second, one-sentence output instead of the full-length audio. The file is in the zip below as well.

(I used the one-minute file to save time/money, but also because the original 25-minute file returns an unrelated error about being too long.)

python -m CogNative.main -sampleAudio "I:\github\repo\CogNative\CogNative\examples\AngelinaShort.wav" -synType audio -dialogueAudio "I:\github\repo\CogNative\CogNative\examples\AngelinaShort.wav" -out "I:\github\repo\CogNative\CogNative\examples\AngelinaShortClone.wav" -useExistingEmbed y

AngelinaShort.zip

Test Hardware

Hardware needs to be tested.

  • Test the GPU for functionality
  • Can we access the machine?
  • Is it updated?

Add functionality to UI

The UI is currently taking user input, but it isn't actually producing a cloned output.

I think the easiest (maybe not the best) route is to call main.py when the "Clone voice" button is clicked.

I think you can do this with the subprocess module (sys alone cannot launch a process). Read stdout to display success or errors on the UI.
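A hedged sketch of that route, launching the CLI entry point in a child process and capturing its output (the helper names and the exact CLI flags are assumptions to be matched against the real entry point):

```python
import subprocess
import sys

def run_command(cmd: list) -> str:
    """Run a command, returning stdout on success or stderr on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else result.stderr

def clone_voice(cli_args: list) -> str:
    """Invoke the project's CLI from the UI button handler (illustrative)."""
    return run_command([sys.executable, "-m", "CogNative.main", *cli_args])
```

The UI's "Clone voice" handler would call `clone_voice([...])` with the same flags used on the command line and display the returned text.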

No ending punctuation issue

If there is no ending punctuation, no temp output chunk is produced, which leads to an error (screenshot omitted).

I don't think this is the case for all inputs, so here is the input to demo this issue:

Input text that fails
Avsikten hos gemenskapen är att utgå från ett ‘globalt perspektiv’, där förhållanden i Sverige inte ska ses som normerande eller viktigare än de i omvärlden

Input text that works
Avsikten hos gemenskapen är att utgå från ett ‘globalt perspektiv’, där förhållanden i Sverige inte ska ses som normerande eller viktigare än de i omvärlden.
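Since the failing and working inputs differ only in the final period, one defensive fix is to normalize input text before chunking. A minimal sketch (the helper name is illustrative, and the assumption that the chunker splits on sentence-ending punctuation is inferred from this bug, not confirmed in the code):

```python
TERMINAL_PUNCTUATION = (".", "!", "?")

def ensure_terminal_punctuation(text: str) -> str:
    """Append a period when input text lacks ending punctuation.

    Guards against the missing-chunk error: if the chunker splits on
    sentence-ending punctuation, unterminated text yields no final chunk.
    """
    stripped = text.rstrip()
    if stripped and not stripped.endswith(TERMINAL_PUNCTUATION):
        return stripped + "."
    return stripped
```

Applying this at the text-input boundary would make both example inputs behave identically.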

Audio file vocal synthesis INPUT cannot exceed a certain length

Problem

Audio files longer than about a minute do not work as vocal synthesis text input (i.e., when an audio file supplies the words to "copy", rather than text being provided directly).

An error is raised (screenshot omitted).

Proposed solution

  • Cut the provided audio into segments
  • Transcribe each audio segment
  • Combine transcriptions

This could run into issues with words and sentences being cut, which would decrease the quality of the transcriptions.
