
danielswolf / rhubarb-lip-sync


Rhubarb Lip Sync is a command-line tool that automatically creates 2D mouth animation from voice recordings. You can use it for characters in computer games, in animated cartoons, or in any other project that requires animating mouths based on existing recordings.

License: Other

lip-sync animation command-line game-development cli

rhubarb-lip-sync's Introduction





Rhubarb Lip Sync allows you to quickly create 2D mouth animation from voice recordings. It analyzes your audio files, recognizes what is being said, then automatically generates lip sync information. You can use it for animating speech in computer games, animated cartoons, or any similar project.

Rhubarb Lip Sync integrates with the following applications:

  • Adobe After Effects (see below)

  • Moho and OpenToonz (see below)

  • Spine by Esoteric Software (see below)

  • Vegas Pro by Magix (see below)

  • Visionaire Studio (see external link)

In addition, you can use Rhubarb Lip Sync’s command line interface (CLI) to generate files in various output formats (TSV/XML/JSON).

Demo video

Click the image for a demo video.


Integrations

Adobe After Effects

You can use Rhubarb Lip Sync to animate dialog right from Adobe After Effects. For more information, follow this link or see the directory extras/AdobeAfterEffects.


Moho and OpenToonz

Rhubarb Lip Sync can create .dat switch data files, which are understood by Moho and OpenToonz. You can set the frame rate using the --datFrameRate option; to control the shape names, use the --datUsePrestonBlair flag. For more details, see Command-line options.
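
For example (file names are placeholders), an invocation that writes a 30 fps switch file using Preston Blair shape names might look like this: rhubarb -f dat --datFrameRate 30 --datUsePrestonBlair -o mouth.dat my-recording.wav.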


Spine by Esoteric Software

Rhubarb Lip Sync for Spine is a graphical tool that allows you to import a Spine project, perform automatic lip sync, then re-import the result into Spine. For more information, follow this link or see the directory extras/EsotericSoftwareSpine of the download.


Vegas Pro by Magix

Rhubarb Lip Sync also comes with two plugin scripts for Vegas Pro (previously Sony Vegas). For more information, follow this link or see the directory extras/MagixVegas of the download.


Mouth shapes

Rhubarb Lip Sync can use between six and nine different mouth positions. The first six mouth shapes (Ⓐ-Ⓕ) are the basic mouth shapes and the absolute minimum you have to draw for your character. These six mouth shapes were invented at the Hanna-Barbera studios for shows such as Scooby-Doo and The Flintstones. Since then, they have evolved into a de-facto standard for 2D animation, and have been widely used by studios like Disney and Warner Bros.

In addition to the six basic mouth shapes, there are three extended mouth shapes: Ⓖ, Ⓗ, and Ⓧ. These are optional. You may choose to draw all three of them, pick just one or two, or leave them out entirely.

Shape Ⓐ

Closed mouth for the “P”, “B”, and “M” sounds. This is almost identical to the Ⓧ shape, but there is ever-so-slight pressure between the lips.

Shape Ⓑ

Slightly open mouth with clenched teeth. This mouth shape is used for most consonants (“K”, “S”, “T”, etc.). It’s also used for some vowels such as the “EE” sound in bee.

Shape Ⓒ

Open mouth. This mouth shape is used for vowels like “EH” as in men and “AE” as in bat. It’s also used for some consonants, depending on context.

This shape is also used as an in-between when animating from Ⓐ or Ⓑ to Ⓓ. So make sure the animations ⒶⒸⒹ and ⒷⒸⒹ look smooth!

Shape Ⓓ

Wide open mouth. This mouth shape is used for vowels like “AA” as in father.

Shape Ⓔ

Slightly rounded mouth. This mouth shape is used for vowels like “AO” as in off and “ER” as in bird.

This shape is also used as an in-between when animating from Ⓒ or Ⓓ to Ⓕ. Make sure the mouth isn’t wider open than for Ⓒ. Both ⒸⒺⒻ and ⒹⒺⒻ should result in smooth animation.

Shape Ⓕ

Puckered lips. This mouth shape is used for “UW” as in you, “OW” as in show, and “W” as in way.

Shape Ⓖ

Upper teeth touching the lower lip for “F” as in for and “V” as in very.

This extended mouth shape is optional. If your art style is detailed enough, it greatly improves the overall look of the animation. If you decide not to use it, you can specify so using the extendedShapes option.

Shape Ⓗ

This shape is used for long “L” sounds, with the tongue raised behind the upper teeth. The mouth should be at least as far open as in Ⓒ, but not quite as far as in Ⓓ.

This extended mouth shape is optional. Depending on your art style and the angle of the head, the tongue may not be visible at all. In this case, there is no point in drawing this extra shape. If you decide not to use it, you can specify so using the extendedShapes option.

Shape Ⓧ

Idle position. This mouth shape is used for pauses in speech. This should be the same mouth drawing you use when your character is walking around without talking. It is almost identical to Ⓐ, but with slightly less pressure between the lips: For Ⓧ, the lips should be closed but relaxed.

This extended mouth shape is optional. Whether there should be any visible difference between the rest position Ⓧ and the closed talking mouth Ⓐ depends on your art style and personal taste. If you decide not to use it, you can specify so using the extendedShapes option.

How to run Rhubarb Lip Sync

General usage

Rhubarb Lip Sync is a command-line tool that is currently available for Windows and OS X.

  • Download the latest release and unzip the file anywhere on your computer.

  • Call rhubarb, passing it an audio file as an argument and telling it where to create the output file. In its simplest form, this might look like this: rhubarb -o output.txt my-recording.wav. There are additional command-line options you can specify in order to get better results.

  • Rhubarb Lip Sync will analyze the sound file, animate it, and create an output file containing the animation. If an error occurs, Rhubarb Lip Sync will instead print an error message to stderr and exit with a non-zero exit code.
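
If you are automating this step, the same invocation can be scripted. Here is a minimal Python sketch (assuming the rhubarb executable is on your PATH; file names are placeholders):

import subprocess

# Run Rhubarb on a recording, writing the result to output.txt.
# On failure, Rhubarb prints a message to stderr and exits non-zero.
result = subprocess.run(
    ["rhubarb", "-o", "output.txt", "my-recording.wav"],
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    raise RuntimeError("Rhubarb failed: " + result.stderr.strip())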

Command-line options

Basic command-line options

The following command-line options are the most common:


<input file>

The audio file to be analyzed. This must be the last command-line argument. Supported file formats are WAVE (.wav) and Ogg Vorbis (.ogg).

-r <recognizer>, --recognizer <recognizer>

Specifies how Rhubarb Lip Sync recognizes speech within the recording. Options: pocketSphinx (use for English recordings), phonetic (use for non-English recordings). For details, see Recognizers.

Default value: pocketSphinx

-f <format>, --exportFormat <format>

The export format. Options: tsv (tab-separated values, see details), xml (see details), json (see details), dat (see Moho and OpenToonz).

Default value: tsv

-d <path>, --dialogFile <path>

With this option, you can provide Rhubarb Lip Sync with the dialog text to get more reliable results. Specify the path to a plain-text file (in ASCII or UTF-8 format) containing the dialog contained in the audio file. Rhubarb Lip Sync will still perform word recognition internally, but it will prefer words and phrases that occur in the dialog file. This leads to better recognition results and thus more reliable animation.

For instance, let’s say you’re recording dialog for a computer game. The script says: “That’s all gobbledygook to me.” But actually, the voice artist ends up saying “That’s just gobbledygook to me,” deviating from the dialog. If you specify a dialog file with the original line (“That’s all gobbledygook to me”), this will still allow Rhubarb Lip Sync to produce better results, because it will watch out for the uncommon word “gobbledygook”. Rhubarb Lip Sync will ignore the dialog file where it audibly differs from the recording, and benefit from it where it matches.

It is always a good idea to specify the dialog text. This will usually lead to more reliable mouth animation, even if the text is not completely accurate.

--extendedShapes <string>

As described in Mouth shapes, Rhubarb Lip Sync uses six basic mouth shapes and up to three extended mouth shapes, which are optional. Use this option to specify which extended mouth shapes should be used. For example, to use only the Ⓖ and Ⓧ extended mouth shapes, specify GX; to use only the six basic mouth shapes, specify an empty string: "".

Default value: GHX

-o, --output <output file>

The name of the output file to create. If the file already exists, it will be overwritten. If you don’t specify an output file, the result will be written to stdout.

--version

Displays version information and exits.

-h, --help

Displays usage information and exits.

--datFrameRate <number>

Only valid when using the dat export format. Controls the frame rate for the output file.

Default value: 24

--datUsePrestonBlair

Only valid when using the dat export format. Uses Preston Blair mouth shape names instead of the default alphabetical ones. This applies the following mapping:

Alphabetic name    Preston Blair name
A                  MBP
B                  etc
C                  E
D                  AI
E                  O
F                  U
G                  FV
H                  L
X                  rest

Caution: This mapping is only applied when exporting, after the recording has been animated. To control which mouth shapes to use, use the extendedShapes option with the alphabetic names.

Tip: For optimal results, make sure your mouth drawings follow the guidelines in the Mouth shapes section. This is easier if you stick to the alphabetic names instead of the Preston Blair names. The only situation where you need to use the Preston Blair names is when you’re using OpenToonz, because OpenToonz only supports the Preston Blair names.
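
For reference, here is the mapping above expressed as a small Python dictionary (a hypothetical helper, not part of Rhubarb itself):

# Mapping from alphabetic shape names to Preston Blair names,
# as applied by the --datUsePrestonBlair flag (see the table above).
PRESTON_BLAIR_NAMES = {
    "A": "MBP", "B": "etc", "C": "E",
    "D": "AI",  "E": "O",   "F": "U",
    "G": "FV",  "H": "L",   "X": "rest",
}

def to_preston_blair(shape):
    # Translate an alphabetic shape name, e.g. "A" -> "MBP".
    return PRESTON_BLAIR_NAMES[shape]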

Advanced command-line options

The following command-line options can be helpful in special situations, especially when automating Rhubarb Lip Sync.


-q, --quiet

By default, Rhubarb Lip Sync writes a number of progress messages to stderr. If you’re using it as part of a batch process, this may clutter your console. If you specify the --quiet flag, there won’t be any output to stderr unless an error occurred.

You can combine this option with the consoleLevel option to change the minimum event level that is printed to stderr.

--machineReadable

This option is useful if you want to integrate Rhubarb Lip Sync with another (possibly graphical) application. All status messages to stderr will be in structured JSON format, allowing your program to parse them and display a graphical progress bar or something similar. For details, see Machine-readable status messages.

--consoleLevel <level>

Sets the log level for reporting to the console (stderr). Options: trace, debug, info, warning, error, fatal.

If --quiet is also specified, only events with the specified level or higher will be printed. Otherwise, a small number of essential events (startup, progress, etc.) will be printed even if their levels are below the specified value.

Default value: error

--logFile <path>

Creates a log file with diagnostic information at the specified path.

--logLevel <level>

Sets the log level for the log file. Only events with the specified level or higher will be logged. Options: trace, debug, info, warning, error, fatal.

Default value: debug

--threads <number>

Rhubarb Lip Sync uses multithreading to speed up processing. By default, it creates as many worker threads as there are cores on your CPU, which results in optimal processing speed. You may choose to specify a lower number if you feel that Rhubarb Lip Sync is slowing down other applications. Specifying a higher number is not recommended, as it won’t result in any additional speed-up.

Note that for short audio files, Rhubarb Lip Sync may choose to use fewer threads than specified.

Default value: as many threads as your CPU has cores

Recognizers

The first step in processing an audio file is determining what is being said. More specifically, Rhubarb Lip Sync uses speech recognition to figure out what sound is being said at what point in time. You can choose between two recognizers:

PocketSphinx

PocketSphinx is an open-source speech recognition library that generally gives good results. This is the default recognizer. The downside is that PocketSphinx only recognizes English dialog. So if your recordings are in a language other than English, this is not a good choice.

Phonetic

Rhubarb Lip Sync also comes with a phonetic recognizer. Phonetic means that this recognizer won’t try to understand entire (English) words and phrases. Instead, it will recognize individual sounds and syllables. The results are usually less precise than those from the PocketSphinx recognizer. The advantage is that this recognizer is language-independent. Use it if your recordings are not in English.

Output formats

The output of Rhubarb Lip Sync is a file that tells you which mouth shape to display at what time within the recording. You can choose between three file formats — TSV, XML, and JSON. The following paragraphs show you what each of these formats looks like.

Tab-separated values (tsv)

TSV is the simplest and most compact export format supported by Rhubarb Lip Sync. Each line starts with a timestamp (in seconds), followed by a tab, followed by the name of the mouth shape. The following is the output for a recording of a person saying 'Hi.'

0.00	X
0.05	D
0.27	C
0.31	B
0.43	X
0.47	X

Here’s how to read it:

  • At the beginning of the recording (0.00s), the mouth is closed (shape Ⓧ). The very first output will always have the timestamp 0.00s.

  • 0.05s into the recording, the mouth opens wide (shape Ⓓ) for the “HH” sound, anticipating the “AY” sound that will follow.

  • The second half of the “AY” diphthong (0.31s into the recording) requires clenched teeth (shape Ⓑ). Before that, shape Ⓒ is inserted as an in-between at 0.27s. This allows for a smoother animation from Ⓓ to Ⓑ.

  • 0.43s into the recording, the dialog is finished and the mouth closes again (shape Ⓧ).

  • The last output line in TSV format is special: Its timestamp is always the very end of the recording (truncated to a multiple of 0.01s) and its value is always a closed mouth (shape Ⓧ or Ⓐ, depending on your extendedShapes settings).
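
To illustrate, here is a minimal Python sketch (hypothetical; not shipped with Rhubarb) that parses such a TSV file into (timestamp, shape) pairs:

# Parse Rhubarb's TSV output. Each line is "<seconds><TAB><shape name>";
# the final line marks the end of the recording with a closed-mouth shape.
def read_tsv_cues(path):
    cues = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                time_str, shape = line.rstrip("\n").split("\t")
                cues.append((float(time_str), shape))
    return cues

print(read_tsv_cues("output.txt"))
# [(0.0, 'X'), (0.05, 'D'), (0.27, 'C'), (0.31, 'B'), (0.43, 'X'), (0.47, 'X')]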

XML format (xml)

XML format is rather verbose. The following is the output for a person saying 'Hi,' the same recording as above.

<?xml version="1.0" encoding="utf-8"?>
<rhubarbResult>
  <metadata>
    <soundFile>C:\Users\Daniel\Desktop\av\hi\hi.wav</soundFile>
    <duration>0.47</duration>
  </metadata>
  <mouthCues>
    <mouthCue start="0.00" end="0.05">X</mouthCue>
    <mouthCue start="0.05" end="0.27">D</mouthCue>
    <mouthCue start="0.27" end="0.31">C</mouthCue>
    <mouthCue start="0.31" end="0.43">B</mouthCue>
    <mouthCue start="0.43" end="0.47">X</mouthCue>
  </mouthCues>
</rhubarbResult>

The file starts with a metadata block containing the full path of the original recording and its duration (truncated to a multiple of 0.01s). After that, each mouthCue element indicates the start and end of a certain mouth shape, as explained for TSV format. Note that the end of each mouth cue is identical with the start of the following one. This is a bit redundant, but it means that we don’t need a special final element like in TSV format.

JSON format (json)

JSON format is very similar to XML format. The choice mainly depends on the programming language you use, which may have built-in support for one format but not the other. The following is the output for a person saying 'Hi,' the same recording as above.

{
  "metadata": {
    "soundFile": "C:\\Users\\Daniel\\Desktop\\av\\hi\\hi.wav",
    "duration": 0.47
  },
  "mouthCues": [
    { "start": 0.00, "end": 0.05, "value": "X" },
    { "start": 0.05, "end": 0.27, "value": "D" },
    { "start": 0.27, "end": 0.31, "value": "C" },
    { "start": 0.31, "end": 0.43, "value": "B" },
    { "start": 0.43, "end": 0.47, "value": "X" }
  ]
}

There is nothing surprising here; everything said about XML format applies to JSON, too.
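
As an illustration, here is a small Python sketch (the frame rate and file name are assumptions) that loads the JSON output and samples the cues into one mouth shape per animation frame:

import json

with open("output.json", encoding="utf-8") as f:
    result = json.load(f)

fps = 24  # assumed frame rate
cues = result["mouthCues"]
frame_count = int(result["metadata"]["duration"] * fps) + 1

# Hold each shape from its start time until the next cue begins.
frames = []
for i in range(frame_count):
    t = i / fps
    shape = next((c["value"] for c in cues if c["start"] <= t < c["end"]),
                 cues[-1]["value"])
    frames.append(shape)

print("".join(frames))  # "XXDDDDDCBBBX" for the "Hi" recording above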

Machine-readable status messages

Use the --machineReadable command-line option to enable machine-readable status messages. In this mode, each line printed to stderr will be an object in JSON format. Every object contains the following:

  • Property type: The type of the event. Currently, one of "start" (application start), "progress" (numeric progress), "success" (successful termination), "failure" (unsuccessful termination), and "log" (a log message without structured information).

  • Event-specific structured data. For instance, a "progress" event contains the property value with a numeric value between 0.0 and 1.0.

  • Property log: A log message describing the event, plus severity information. If you aren’t interested in the structured data, you can display this as a fallback. For instance, a "progress" event with the structured information "value": 0.69 may contain the following redundant log message: "Progress: 69%".

You can combine this option with the consoleLevel option. Note, however, that this only affects unstructured events of type "log" (not to be confused with the log property each event contains).

The following is an example output to stderr from a successful run:

{ "type": "start", "file": "hi.wav", "log": { "level": "Info", "message": "Application startup. Input file: \"hi.wav\"." } }
{ "type": "progress", "value": 0.00, "log": { "level": "Trace", "message": "Progress: 0%" } }
{ "type": "progress", "value": 0.01, "log": { "level": "Trace", "message": "Progress: 1%" } }
{ "type": "progress", "value": 0.03, "log": { "level": "Trace", "message": "Progress: 3%" } }
{ "type": "progress", "value": 0.06, "log": { "level": "Trace", "message": "Progress: 6%" } }
{ "type": "progress", "value": 0.69, "log": { "level": "Trace", "message": "Progress: 68%" } }
{ "type": "progress", "value": 1.00, "log": { "level": "Trace", "message": "Progress: 100%" } }
// Result data, printed to stdout...
{ "type": "success", "log": { "level": "Info", "message": "Application terminating normally." } }

The following is an example output to stderr from a failed run:

{ "type": "start", "file": "no-such-file.wav", "log": { "level": "Info", "message": "Application startup. Input file: \"no-such-file.wav\"." } }
{ "type": "failure", "reason": "Error processing file \"no-such-file.wav\".\nCould not open sound file \"no-such-file.wav\".\nNo such file or directory", "log": { "level": "Fatal", "message": "Application terminating with error: Error processing file \"no-such-file.wav\".\nCould not open sound file \"no-such-file.wav\".\nNo such file or directory" } }

Note that the output format adheres to SemVer. That means that the JSON output created after a minor upgrade will still be compatible. Note, however, that the following kinds of changes may occur at any time, because I consider them non-breaking:

  • Additional event types. Just ignore events whose types you do not know, or use their unstructured log property.

  • Additional properties in any object. Just ignore properties you aren’t interested in.

  • Changes in JSON formatting, such as a re-ordering of properties or changes in whitespace (except for line breaks: every event will remain on a single line)

  • Fewer or more events of type "log" or changes in the wording of log messages

Versioning (SemVer)

Rhubarb Lip Sync uses Semantic Versioning (SemVer) for its command-line interface. For general information on Semantic Versioning, have a look at the official SemVer website.

As a rule of thumb, everything you can use through the command-line interface adheres to SemVer. Everything else (i.e., the source code, integrations with third-party software, etc.) does not.

I’d love to hear from you!

Have you created something great using Rhubarb Lip Sync? — Let me know on Twitter or send me an email at [email protected]!

Do you need help? Have you spotted a bug? Do you have a suggestion? — Create an issue!


JetBrains have been kind enough to supply me with a free Open Source license of ReSharper C++.


rhubarb-lip-sync's Issues

Directly outputting words/timestamps to mouth shapes, bypassing PocketSphinx

Rhubarb is a little slow; I think it's because of the time PocketSphinx takes to recognize the words in the audio file. In my project, I will use Google Speech-to-Text, which will output something like this:

[
  {
    "startTime": "1.300s",
    "endTime": "1.400s",
    "word": "Four"
  },
  {
    "startTime": "1.400s",
    "endTime": "1.600s",
    "word": "score"
  },
  {
    "startTime": "1.600s",
    "endTime": "1.600s",
    "word": "and"
  },
  {
    "startTime": "1.600s",
    "endTime": "1.900s",
    "word": "twenty"
  }
]

Which file/function in this repo should I look into to see how Rhubarb converts words to mouth shapes directly? I mean, which function does Rhubarb use to convert words to shapes, e.g.:

0.00	A
0.05	B
0.63	C
0.70	B
0.84	F
0.98	D

I looked into rhubarb-lip-sync/rhubarb/src/core/ and could not figure it out. So far, I understand that the words are first converted to the DARPA phonetic alphabet and then converted to mouth shapes, e.g. AA becomes shape A, EH becomes shape C, etc.
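
Here is a toy Python sketch of that kind of phone-to-shape lookup, based on the shape descriptions in the “Mouth shapes” section above (this is not Rhubarb's actual code, which also handles context and in-between shapes):

# Naive ARPAbet-phone-to-shape lookup derived from the "Mouth shapes" section.
PHONE_TO_SHAPE = {
    "P": "A", "B": "A", "M": "A",    # closed mouth
    "IY": "B",                        # "EE" as in bee
    "EH": "C", "AE": "C",             # men, bat
    "AA": "D",                        # father
    "AO": "E", "ER": "E",             # off, bird
    "UW": "F", "OW": "F", "W": "F",   # you, show, way
    "F": "G", "V": "G",               # for, very
    "L": "H",                         # long L
}

def shapes_for_phones(phones):
    # Most remaining consonants use shape B (see the Mouth shapes section).
    return [PHONE_TO_SHAPE.get(p, "B") for p in phones]

print(shapes_for_phones(["HH", "AY"]))  # -> ['B', 'B'] (no context handling)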

Could you kindly provide a high-level overview of the process Rhubarb follows to convert words to shapes?

Thanks

Is GMock required?

As far as I know, GMock is only needed for testing.
When I uncheck it in CMake, configuration always fails with this error:
CMake Error at CMakeLists.txt:97 (set_target_properties):
set_target_properties Can not find target to add properties to: gmock

Supporting Rhubarb Lip-Sync in KorGE Game Engine

I'm adding Rhubarb Lip-Sync support (still a WIP) to my game engine:
http://docs.korge.soywiz.com/audio/lipsync

And I wanted to let you know about it, and to ask a question:

Is it OK to host the binary files of Rhubarb in one of my repositories, download them with a Gradle task/IntelliJ plugin, and run them locally to generate lip sync files from audio files?

Also, can I use those mouth images and link the Rhubarb Lip-Sync 1.0 demo video as I'm doing in the documentation?

Small Platforms and Speed of Translations

Hi.
A couple of questions.
Firstly, would this run on an RPi 3 or smaller?
Secondly, can this be run in a continuous-speech environment?

I am looking at ways to implement this in a real-time environment where an 'actor' wearing a mascot-style costume has servos around the character's lips. When the actor speaks, the suit's mouth will give an approximate shape of the spoken sound.

Many many thanks
Dave

Different results on OSX vs. Linux

I am getting different results on OSX vs. Linux; should this be expected?
OSX gives the better result.

Linux:
{
  "metadata": {
    "soundFile": "audio.wav",
    "duration": 3.56
  },
  "mouthCues": [
    { "start": 0.00, "end": 0.09, "value": "X" },
    { "start": 0.09, "end": 0.30, "value": "B" },
    { "start": 0.30, "end": 0.37, "value": "C" },
    { "start": 0.37, "end": 0.44, "value": "B" },
    { "start": 0.44, "end": 0.51, "value": "C" },
    { "start": 0.51, "end": 0.72, "value": "D" },
    { "start": 0.72, "end": 0.80, "value": "C" },
    { "start": 0.80, "end": 1.12, "value": "X" },
    { "start": 1.12, "end": 1.26, "value": "B" },
    { "start": 1.26, "end": 1.54, "value": "C" },
    { "start": 1.54, "end": 2.83, "value": "X" },
    { "start": 2.83, "end": 2.89, "value": "C" },
    { "start": 2.89, "end": 3.09, "value": "G" },
    { "start": 3.09, "end": 3.23, "value": "B" },
    { "start": 3.23, "end": 3.56, "value": "X" }
  ]
}

OSX:
{
  "metadata": {
    "soundFile": "audio.wav",
    "duration": 3.56
  },
  "mouthCues": [
    { "start": 0.00, "end": 0.09, "value": "X" },
    { "start": 0.09, "end": 0.30, "value": "B" },
    { "start": 0.30, "end": 0.37, "value": "C" },
    { "start": 0.37, "end": 0.44, "value": "B" },
    { "start": 0.44, "end": 0.51, "value": "C" },
    { "start": 0.51, "end": 0.72, "value": "D" },
    { "start": 0.72, "end": 0.80, "value": "C" },
    { "start": 0.80, "end": 1.12, "value": "X" },
    { "start": 1.12, "end": 1.26, "value": "B" },
    { "start": 1.26, "end": 1.54, "value": "C" },
    { "start": 1.54, "end": 2.30, "value": "X" },
    { "start": 2.30, "end": 2.46, "value": "B" },
    { "start": 2.46, "end": 2.60, "value": "D" },
    { "start": 2.60, "end": 2.67, "value": "B" },
    { "start": 2.67, "end": 2.88, "value": "C" },
    { "start": 2.88, "end": 3.02, "value": "B" },
    { "start": 3.02, "end": 3.56, "value": "X" }
  ]
}

How to build the project on a Linux system

Hi, I would like to use Rhubarb to extract mouth shapes from a real-time audio stream, but I don't know how to do this. I have downloaded the source code, but I can't find a Makefile. How can I build the project myself?

can't find an .exe

Hi - I'm not sure if I should be asking this of you or the creator of the Blender addon...

I've downloaded and unzipped Rhubarb, and installed this Blender addon:
https://github.com/scaredyfish/blender-rhubarb-lipsync
but when I come to select the .exe file, I can't find one! There's a batch file, but no .exe.

Do you know of this addon?

By the way, I'm currently in the middle of creating a series of tutorials on simple automated lip sync in Blender, called Lipsinc for the Lazy in Blender. If I can get Rhubarb working, it will make the series obsolete! :S But I don't really mind; I want to get it working and do a tutorial on it. I would also be interested in doing a Skype interview if that interests you.

How does --dialogFile work?

I'm working on a project which needs to grab mouth shape info from audio. I already did the speech recognition and got the script using Sphinx. I'd like to use Rhubarb to get the mouth shapes, but instead of having it work from the audio alone, I'm wondering if I can use the script I already got from Sphinx. I noticed the "-d, --dialogFile" option, but I don't know what that file looks like. It would be helpful if you could tell me more about the dialog file. Does it contain timing info matching the script? An example or more reference material for the dialog file would be great. Any suggestions or advice would be appreciated!

Unwanted mouth movement at beginning of file

If a recording starts with a pause, the resulting animation should start with a closed mouth (X shape). For some recordings, however, the animation wrongly starts with a B shape.

This is caused by the way Rhubarb uses WebRTC for voice activity detection. For some recordings, WebRTC erroneously detects a noise sound at the very beginning of the recording, which Rhubarb then translates into a B shape.

A build with phonetic recognizer support

Can we get a build with the phonetic recognizer, please? The latest build doesn't have that support. I tried downloading the project and building it myself, but I didn't get far: it kept saying some libraries or paths were missing.

Adjustable detection sensitivity

Hey, thanks for the awesome work you do. However, I've had some problems with breathing sounds registering as phonemes; also, during clear, long phonemes (like holding AAH or OOH for a long time), it can briefly switch to other phonemes that aren't there. Can we please get some sort of adjustable sensitivity for the detection algorithm, or something along those lines?

pointer being freed was not allocated

The stdout output is OK, but this is printed to stderr:

rhubarb(7885,0x7fffabb45380) malloc: *** error for object 0x627261: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug

macOS 10.3.6, build from master

rhubarb error "Exec format error" in Blender 2.79

Hi. I'm trying to run Rhubarb with Blender 2.79, but it doesn't work. I've created a pose library and loaded a .wav audio file, but when I press the Rhubarb Lipsync button, it pops up the error: "OSError: [Errno 8] Exec format error".
I'm using Ubuntu 16.04.
Yes, User Preferences is correct.
No, rhubarb does not run from the command line; there are no additional error messages, only "Exec format error".
There are no additional error messages when running Blender from the command line.
Thanks for any advice.

After Effects

Hi, do you have an After Effects script that parses the XML into time-remapped keyframes?

Change install path

The Rhubarb binary is installed to /usr/local/ by default. I'm pretty sure the correct location should be /usr/local/bin/.

rhubarb.exe missing

Hi, this may be a noob question, but I'm trying to use Rhubarb with After Effects.
The directions say that there is an executable called rhubarb.exe.

Do I need to compile the project to create this file?

PocketSphinx not finding acoustic model

Hi, I'm on the most recent version of OSX with the most recent downloadable binary, trying to use the CLI to generate lip sync from a Google API text-to-speech WAV and the dialog text.
The command I'm using is:
rhubarb input.wav -f json -d input.dialog --extendedShapes GX

I'm getting this output:
Generating lip sync data for "input.wav".
Progress: [#-------------------] 6%
[Error] "acmod.c", line 83: Folder '/usr/local/bin/res/sphinx/acoustic-model' does not contain acoustic model definition 'mdef'
(the same error repeats several times as progress continues)
[Fatal] Application terminating with error: Error processing file "1552349650_dinosaureggsoatmeal.wav". Error performing speech recognition via PocketSphinx tools. Error creating speech decoder.
From what I can tell after researching, this seems to be PocketSphinx looking in the wrong location for the acoustic model. Indeed, the referenced folder doesn't exist, and Homebrew puts Sphinx elsewhere, but I can't seem to find any way to customize the directory for Rhubarb.

I also get the same output when I start the program with -r phonetic, which I hoped might avoid calling on the sphinx acoustic model.

Any idea of the best way to direct rhubarb to the acoustic model's actual directory, or what else might be going on here?

Crash with message "Time range start must not be less than end."

There is a rare corner case that isn't handled correctly by the algorithm that optimizes the animation timing. As a result, Rhubarb will crash for some WAVE files with this error message (numbers may differ):

Time range start must not be less than end. Start: 115cs, end: 113cs

Segfault with wav file containing some initial music before spoken words

I have been working to have Rhubarb convert an audio podcast into lip sync animation data for a podcast animation project. These podcasts all start with about 10 seconds of music, and although some run through Rhubarb OK, most end up with a crash. After reviewing some debug logs, I was able to isolate the problem to the first part of the podcast, where the music plays before the speaking begins. If I remove the first 10 seconds from the ~10-minute podcast, Rhubarb works fine, and the output works very well. So I was able to create a small test case by clipping the first 20 seconds of audio into a wave file, which exhibits the crash. I am using Rhubarb version 1.6.0-win32. Attached is a tarball with the WAV file and the log. I am running Rhubarb like this:

./tools/rhubarb-lip-sync-1.6.0-win32/rhubarb.exe podcast3_first20s.wav --logFile podcast3_first20s.log --logLevel Debug

rhubarb_first20_crash.tar.gz

animation speed looks slow

Hello,
I've created a small demo using the library, both with and without a dialog file, and the end result looks like it's skipping frames.
Is there a possibility to specify sensitivity or mouth motion speed?
Thank you very much.

Generic error message in Rhubarb for Spine

When Rhubarb Lip Sync for Spine is run from a JAR file (as opposed to a .class file), it fails to determine its binary directory. This results in the generic error message "Error performing lip sync for event ."

To fix this, a more robust way of determining the binary directory needs to be implemented.

Improve read performance for WAVE files

When reading a WAVE audio file, WaveFileReader performs a superfluous seek operation for every single sample. Performing this operation only when necessary results in a significant speedup.

Lack of documentation: Adobe After Effects

Hi,

I'm currently encountering some difficulties using Rhubarb in After Effects.

The binary is installed, etc.; everything should work as expected, but we get the following error: "No animation result, animation was probably canceled". We have a mouth composition with A to F (one file per frame) and a WAVE audio file. Is there any step-by-step documentation we could follow?

Any help will be very appreciated :)

How to tweak the results?

This is an awesome project. Great work!

You've stated that you're getting about 75% accuracy. That means at least 25% is still inaccurate. How do you tweak those results manually? Modifying the output files by hand is tedious and error-prone. Do you have a GUI for that?

Cheers

Making Animations Smoother

Daniel,

I like Rhubarb and am very grateful to you for making it open source. But I have a gripe with it; this goes for most lip animation software, actually. Anyway, here it is:

The generated animation is fine for 2D cartoons, but for 3D/realistic-looking subjects it looks like watching a stop-motion animation, because transitioning from mouth shape A to B happens instantly. This makes Rhubarb very limiting. It would be great if there were an option to add more mouth shapes related to each mouth shape so that the animation looks smoother.

Could the animation be made smoother if we used more mouth shapes? For example, for transitioning from mouth shape A to B, maybe use intermediate shapes such as lips_AB1, lips_AB2, lips_AB3. Would that work?

I am using the Spine extension. Did you write the Spine extension in Java? I see that you have written the After Effects extension in JavaScript. Since I am a JavaScript developer, can I modify the After Effects extension to use it with Spine?

Related #28

Thank You

"Animated" dialogue text?

Would it be practical to add functionality to export the times at which words are spoken in the optionally provided dialogue file? This would be a really useful feature for printing text at the precise moment a character speaks it.

Segmentation fault

Issue:
karoshi@karoshi-HP-Pavilion-15-Notebook-PC:~/LipSync/rhubarb-lip-sync/build$ ./rhubarb ../../test.wav
Analyzing input file [#-------------------] 6% -Segmentation fault (core dumped)

Input: wav file checked with avprobe.
avprobe version 11.2-6:11.2-1, Copyright (c) 2007-2014 the Libav developers
built on Jan 18 2015 05:12:33 with gcc 4.9.2 (Ubuntu 4.9.2-10ubuntu2)
[wav @ 0x1f31980] max_analyze_duration 5000000 reached
Input #0, wav, from '../../test.wav':
Duration: 00:00:09.91, bitrate: 705 kb/s
Stream #0.0: Audio: pcm_s16le, 44100 Hz, 1 channels, s16, 705 kb/s


When using gdb to get a backtrace:

(gdb) backtrace
#0 0x00007ffff79079a0 in std::string::find_first_of(char const*, unsigned long, unsigned long) const () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00000000007f0e41 in boost::filesystem::path::m_path_iterator_increment(boost::filesystem::path::iterator&) ()
#2 0x00000000007e98cd in boost::filesystem::detail::canonical(boost::filesystem::path const&, boost::filesystem::path const&, boost::system::error_code*) ()
#3 0x0000000000764e86 in boost::filesystem::canonical(boost::filesystem::path const&, boost::filesystem::path const&) ()
#4 0x00000000007646b1 in getBinPath() ()
#5 0x000000000077e334 in getBinaryNameabi:cxx11 ()
#6 0x000000000077e760 in NiceCmdLineOutput::printShortUsage(TCLAP::CmdLineInterface&, std::ostream&) const ()
#7 0x000000000077e62c in NiceCmdLineOutput::failure(TCLAP::CmdLineInterface&, TCLAP::ArgException&) ()
#8 0x00000000006cd4c6 in main ()

Output format for Moho

Would it be possible to implement an output format for Moho? This program (formerly Anime Studio) is very often used, and some other programs use the same format.
It is similar to the standard format, but it uses frames instead of seconds.
The first line of a Moho switch data file (*.dat) must have the content "MohoSwitch1".

best regards
Norbert Linda

Real-time lip sync for Cosplay

Hi.
I found your GitHub page with the Rhubarb Lip Sync app on it, and I was wondering if you could give me some advice.

I do animatronics for cosplay and other amateur/hobby applications. One thing I have been working on for a long time is a way to take a continuous real-time speech stream from a microphone and generate a number corresponding to the lip shape. From there, a secondary processor can take this number and manipulate servos to give an approximation on an animatronic character.

For instance, when they are making a movie where the main character is a werewolf consisting of an actor in a suit, a team of puppeteers conforms the lips and facial expressions using remote control. All of this has to be carefully scripted or put in afterwards as CGI.

However, take the same character and put them in a live performance situation, i.e. a cosplay convention, and you do not have that flexibility. I already have ways to pick up the actor's facial movements underneath the costume, but building a small, self-contained real-time lip sync is beyond me.

Could Rhubarb be compiled in Mono to run on an RPi or similar? The output would be just a stream of numbers covering the lip shapes; the input would be a microphone worn by the actor.

Any suggestions would be greatly appreciated.

Many thanks
Dave

Magix Vegas and Rhubarb: need help, please

Hi,
I'm trying the latest version of Rhubarb with Magix Vegas 16. I have created the XML file and selected the image file, but when I press OK, I get the following error: "Image file name doesn't have expected format."
I have tried BMP, PNG, and JPG. What should I do?

Best regards

"" is not a valid shape value

Hello,

Thanks for providing such a useful tool. Unfortunately, I came across a problem when using the extras/EsotericSoftwareSpine tool:


Here's the underlying error output:

KEVINXZHANG-MC4:EsotericSoftwareSpine rdm$ java -jar rhubarb-for-spine-1.9.1.jar 
objc[67391]: Class FIFinderSyncExtensionHost is implemented in both /System/Library/PrivateFrameworks/FinderKit.framework/Versions/A/FinderKit (0x7fff9ab9c1d0) and /System/Library/PrivateFrameworks/FileProvider.framework/OverrideBundles/FinderSyncCollaborationFileProviderOverride.bundle/Contents/MacOS/FinderSyncCollaborationFileProviderOverride (0x132f8fdc8). One of the two will be used. Which one is undefined.
com.rhubarb_lip_sync.rhubarb_for_spine.EndUserException: " is not a valid Shape value.
	at com.rhubarb_lip_sync.rhubarb_for_spine.RhubarbTask.call(RhubarbTask.kt:54)
	at com.rhubarb_lip_sync.rhubarb_for_spine.AudioFileModel$startAnimation$wrapperTask$1.run(AudioFileModel.kt:152)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
com.rhubarb_lip_sync.rhubarb_for_spine.EndUserException: " is not a valid Shape value.
	at com.rhubarb_lip_sync.rhubarb_for_spine.RhubarbTask.call(RhubarbTask.kt:54)
	at com.rhubarb_lip_sync.rhubarb_for_spine.AudioFileModel$startAnimation$wrapperTask$1.run(AudioFileModel.kt:152)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

And if it helps, I've attached the Spine project I'm trying to animate:

spineboy.zip

As you can see, I just modified the example spineboy given by Esoteric Software a little bit. Let me know if I need to give you more details. Thank you~

Fails to compile with Boost 1.56.0+.

sudo apt-get -y install libboost-all-dev fetches Boost 1.58, and there was a change in some templates in Boost 1.56, so compilation fails.

Relevant links : http://lists.boost.org/boost-users/2014/08/82693.php and https://www.facebook.com/libracode/posts/1339039419469305

[ 13%] Building CXX object CMakeFiles/rhubarb-exporters.dir/src/exporters/XmlExporter.cpp.o
In file included from /usr/include/boost/property_tree/detail/xml_parser_utils.hpp:15:0,
                 from /usr/include/boost/property_tree/detail/xml_parser_write.hpp:15,
                 from /usr/include/boost/property_tree/xml_parser.hpp:15,
                 from /home/ubuntu/rhubarb-lip-sync/src/exporters/XmlExporter.cpp:3:
/usr/include/boost/property_tree/detail/xml_parser_writer_settings.hpp: In instantiation of ‘class boost::property_tree::xml_parser::xml_writer_settings<char>’:
/home/ubuntu/rhubarb-lip-sync/src/exporters/XmlExporter.cpp:28:53:   required from here
/usr/include/boost/property_tree/detail/xml_parser_writer_settings.hpp:38:35: error: ‘char’ is not a class, struct, or union type
  typedef typename Str::value_type Ch;
                                   ^
/usr/include/boost/property_tree/detail/xml_parser_writer_settings.hpp:40:9: error: ‘char’ is not a class, struct, or union type
         xml_writer_settings(Ch inchar = Ch(' '),
         ^
/usr/include/boost/property_tree/detail/xml_parser_writer_settings.hpp:50:33: error: ‘char’ is not a class, struct, or union type
         typename Str::size_type indent_count;
                                 ^
/home/ubuntu/rhubarb-lip-sync/src/exporters/XmlExporter.cpp: In member function ‘virtual void XmlExporter::exportAnimation(const boost::filesystem::path&, JoiningContinuousTimeline<Shape>&, const ShapeSet&, std::ostream&)’:
/home/ubuntu/rhubarb-lip-sync/src/exporters/XmlExporter.cpp:28:53: error: no matching function for call to ‘boost::property_tree::xml_parser::xml_writer_settings<char>::xml_writer_settings(char, int)’
  write_xml(outputStream, tree, writer_setting(' ', 2));
                                                     ^
In file included from /usr/include/boost/property_tree/detail/xml_parser_utils.hpp:15:0,
                 from /usr/include/boost/property_tree/detail/xml_parser_write.hpp:15,
                 from /usr/include/boost/property_tree/xml_parser.hpp:15,
                 from /home/ubuntu/rhubarb-lip-sync/src/exporters/XmlExporter.cpp:3:
/usr/include/boost/property_tree/detail/xml_parser_writer_settings.hpp:36:11: note: candidate: boost::property_tree::xml_parser::xml_writer_settings<char>::xml_writer_settings()
     class xml_writer_settings
           ^
/usr/include/boost/property_tree/detail/xml_parser_writer_settings.hpp:36:11: note:   candidate expects 0 arguments, 2 provided
/usr/include/boost/property_tree/detail/xml_parser_writer_settings.hpp:36:11: note: candidate: constexpr boost::property_tree::xml_parser::xml_writer_settings<char>::xml_writer_settings(const boost::property_tree::xml_parser::xml_writer_settings<char>&)
/usr/include/boost/property_tree/detail/xml_parser_writer_settings.hpp:36:11: note:   candidate expects 1 argument, 2 provided
/usr/include/boost/property_tree/detail/xml_parser_writer_settings.hpp:36:11: note: candidate: constexpr boost::property_tree::xml_parser::xml_writer_settings<char>::xml_writer_settings(boost::property_tree::xml_parser::xml_writer_settings<char>&&)
/usr/include/boost/property_tree/detail/xml_parser_writer_settings.hpp:36:11: note:   candidate expects 1 argument, 2 provided
CMakeFiles/rhubarb-exporters.dir/build.make:134: recipe for target 'CMakeFiles/rhubarb-exporters.dir/src/exporters/XmlExporter.cpp.o' failed
make[2]: *** [CMakeFiles/rhubarb-exporters.dir/src/exporters/XmlExporter.cpp.o] Error 1
CMakeFiles/Makefile2:73: recipe for target 'CMakeFiles/rhubarb-exporters.dir/all' failed
make[1]: *** [CMakeFiles/rhubarb-exporters.dir/all] Error 2
Makefile:149: recipe for target 'all' failed
make: *** [all] Error 2

Incorrect animation before some pauses

In rare cases, mouth animation before a dialog pause can be wrong. This is the case if the mouth ought to be opening and closing rapidly just before the pause. The optimization algorithm will then try to stretch those last lip flaps over the entire duration of the pause.

Linux documentation

Hi, can you please let us know how to use this on Linux? I tried running the executable on Ubuntu, but it's not working.

Magix Vegas: any way to use Rhubarb multiple times on the same project?

Really enjoying Rhubarb.
I need to use it multiple times on the same project, but every time I import Rhubarb from the scripting menu (Magix Vegas 16), it exits my current project and starts a new one. Is there any way to bypass this and have multiple instances of Rhubarb running on the same project?

Best regards

Faster Result Generation

I just want to ask if there is actually a way to speed up result generation, since the --threads flag seems geared toward limiting the number of threads. Is there any way or configuration that might speed up the process? Thanks, man!

XML and JSON formats may contain only relative path to audio file

The XML and JSON exporters add metadata to the output file, including the path of the audio file. If the audio file is given as a relative path on the command line, it is also written as a relative path. This means that code interpreting that information may not be able to find the original audio file.

When using the Vegas plugin on such an XML file, it responds with the (not really helpful) message "Media stream not specified."

As a fix, the exporters should always write the absolute file path.

Languages

Wow, this looks great!
Just wondering about language support: do you have any plans to support languages other than English?

I'm planning to do lip sync for my game in Unity, with animations done in Spine (Esoteric Software).

Phonetic recognition

Rhubarb Lip Sync uses word-based speech recognition. That works well for English dialog. For non-English dialog, however, phonetic recognition might work better. Rather than try to extract English words from non-English speech, this will extract phonemes.

I'm planning to add a CLI option to switch to phonetic recognition.

This is only a temporary solution. In the long run, I still plan to implement full (word-based) recognition for languages other than English (see #5).

Extract phonemes rather than shapes

Hello,

Very interesting project! I am trying to extract phonemes from a wave file as well. Your project extracts target shapes rather than phonemes. Is it possible to modify the program to support extraction of phonemes given a phoneme set (e.g. Arpabet, IPA)?

Constantinos

Special characters in output file name get garbled

Special characters (non-ASCII Unicode) in file paths are handled correctly for the input file and the log file. However, they don't work for the output file (at least on Windows): specifying -o der-frühe-vogel.xml results in a file with a garbled name.

The animation from the sync result doesn't match very well

I took a piece of speech, used Rhubarb to extract the mouth shapes, and put the mouth shape changes into an animation video. It seems to match less than 50% of the speech. Would you mind taking a look at what's wrong? See the animated video.

0.00 X
0.56 C
0.79 B
1.37 D
1.57 C
......

I just put mouth shape X from 0 to 0.56 seconds, C from 0.56 to 0.79 seconds, and so on. Is that the right way of creating the animation? I feel like the sample rate is a bit low: when the speech is fast, many syllables are not recognized. Is there an option to adjust the sampling rate?

Thanks
