
svito-zar / gesticulator


The official implementation for ICMI 2020 Best Paper Award "Gesticulator: A framework for semantically-aware speech-driven gesture generation"

Home Page: https://svito-zar.github.io/gesticulator/

License: GNU General Public License v3.0

Dockerfile 0.12% Python 99.88%
gesture-generation gestures autoregression neural-networks machine-learning animation agents graphics human-computer-interaction

gesticulator's Introduction

Explanation video

This repository contains the PyTorch-based implementation of the ICMI 2020 Best Paper Award recipient paper Gesticulator: A framework for semantically-aware speech-driven gesture generation.

0. Set up

Requirements

  • python3.6+
  • ffmpeg (for visualization)

Installation

NOTE: during installation, there will be several error messages (one for bert-embedding and one for mxnet) about conflicting packages - please ignore them, they don't affect the functionality of the repository.

  • Clone the repository:

    git clone git@github.com:Svito-zar/gesticulator.git
    
  • (optional) Create and activate virtual environment:

    virtualenv gest_env --python=3.6.9
    source gest_env/bin/activate
    

    or

    conda create -n gest_env python=3.6.9
    conda activate gest_env
    
  • Install the dependencies:

    python install_script.py
    

Demonstration

Head over to the demo folder for a quick demonstration if you're not interested in training the model yourself.

Documentation

Each of the scripts referred to in this repository description accepts several command-line arguments, which you can list by calling the script with the --help flag.

Loading and saving models

  • Pretrained model files can be loaded with the following snippet (see also the usage sketch after this list):
    from gesticulator.model.model import GesticulatorModel
    
    loaded_model = GesticulatorModel.load_from_checkpoint(<PATH_TO_MODEL_FILE>)
    
  • If the --save_model_every_n_epochs argument is provided to train.py, then the model will be saved regularly during training.
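As a minimal usage sketch (the checkpoint path and feature type below are placeholders, and the GesturePredictor import path is an assumption based on how the class is used in demo.py), a loaded model can be wrapped for inference like this:

from gesticulator.model.model import GesticulatorModel
from gesticulator.interface.gesture_predictor import GesturePredictor  # import path is an assumption

# Load a trained checkpoint (placeholder path) in inference mode, as done in demo.py
model = GesticulatorModel.load_from_checkpoint("results/last_run/trained_model.ckpt", inference_mode=True)

# GesturePredictor is a convenience wrapper around the model (see demo.py)
gp = GesturePredictor(model, "Spectro")  # feature type; the pretrained model reportedly uses spectrograms

# Predict gestures from an audio file and its time-annotated JSON transcription
motion = gp.predict_gestures("input/speech.wav", "input/speech.json")
print(motion.shape)  # e.g. (1, n_frames, 45)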

Training the model

1. Obtain the data

  • Sign the license for the Trinity Speech-Gesture dataset
  • Obtain training data from the GENEA_Challenge_2020_data_release folder of the Trinity Speech-Gesture dataset, using the acquired credentials:
    cd dataset
    mkdir genea_data && cd genea_data
    
    # Change USERNAME to the actual username you received for the dataset
    wget --user USERNAME --ask-password -r -np -nH --cut-dirs=2 -R index.html* https://trinityspeechgesture.scss.tcd.ie/data/Trinity%20Speech-Gesture%20I/GENEA_Challenge_2020_data_release/ 
    

2.1 Rename and move files

# rename files from the GENEA Challenge names to the Trinity Speech-Gesture dataset naming
python rename_data_files.py

# Go back to the gesticulator/gesticulator directory
cd ..

2.2 Pre-process the data

cd gesticulator/data_processing

# encode motion from BVH files into an exponential map representation
python bvh2features.py
# (this will take a while)

# Split the dataset into training and validation
python split_dataset.py

# Encode all the features
python process_dataset.py

# Go back to the gesticulator/gesticulator directory
cd ..

By default, the model expects the dataset in the dataset/raw_data folder, and the processed dataset will be available in the dataset/processed_data folder. If your dataset is elsewhere, please provide the correct paths with the --raw_data_dir and --proc_data_dir command line arguments.

3. Learn speech- and text-driven gesture generation model

In order to train the model, run

python train.py 

The model configuration and the training parameters are automatically read from the gesticulator/config/default_model_config.yaml file.

Notes

The results will be available in the results/last_run/ folder, where you will find the Tensorboard logs alongside the trained model file.

It is possible to visualize the predicted motion on the validation data during training by setting the save_val_predictions_every_n_epoch parameter in the config file.

If the --run_name <name> command-line argument is provided, the results/<name> folder will be created and the results will be stored there. This can be very useful when you want to keep your logs and outputs for separate runs.

To train the model on the GPU, provide the --gpus argument (a PyTorch Lightning Trainer argument). For details regarding the training parameters, please refer to the PyTorch Lightning documentation.


Evaluating the model

Visualizing the results

In order to generate and visualize gestures on the test dataset, run

python evaluate.py --use_semantic_input --use_random_input

If you set the run_name argument during training, then please provide the path to the saved model checkpoint by using the --model_file option.

The generated motion is stored in the results/<run_name>/generated_gestures folder 1) in the exponential map format, 2) as .mp4 videos, and 3) as 3D coordinates (which can be used for objective evaluation).

For nicer visualization, you can use the following repository: https://github.com/jonepatr/genea_visualizer

Quantitative evaluation

For the quantitative evaluation (velocity histograms and jerk), you may use the scripts in the gesticulator/obj_evaluation folder.
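For the jerk metric, a minimal sketch of the computation is shown below. This is not the repository's own script: the file path is hypothetical, and the 20 fps frame rate is taken from the pre-processing step described elsewhere in this repo.

import numpy as np

def average_jerk(positions, fps=20):
    # positions: 3D joint coordinates of shape (n_frames, n_joints, 3)
    velocity = np.diff(positions, axis=0) * fps          # 1st time derivative
    acceleration = np.diff(velocity, axis=0) * fps       # 2nd time derivative
    jerk = np.diff(acceleration, axis=0) * fps           # 3rd time derivative
    return np.linalg.norm(jerk, axis=-1).mean()

# Hypothetical path to the generated 3D coordinates mentioned above
coords = np.load("results/last_run/generated_gestures/gesture_3d_coords.npy")
print("Average jerk:", average_jerk(coords))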

Citing

If you use this code in your research please cite it:

@inproceedings{kucherenko2020gesticulator,
  title={Gesticulator: A framework for semantically-aware speech-driven gesture generation},
  author={Kucherenko, Taras and Jonell, Patrik and van Waveren, Sanne and Henter, Gustav Eje and Alexanderson, Simon and Leite, Iolanda and Kjellstr{\"o}m, Hedvig},
  booktitle={Proceedings of the ACM International Conference on Multimodal Interaction},
  year={2020}
}

If you use the dataset from this work, please also cite the Trinity Speech-Gesture dataset and the GENEA Gesture Generation Challenge using the following BibTeX entries:

@inproceedings{ferstl2018investigating,
author = {Ferstl, Ylva and McDonnell, Rachel},
title = {Investigating the Use of Recurrent Motion Modelling for Speech Gesture Generation},
year = {2018},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
booktitle = {Proceedings of the 18th International Conference on Intelligent Virtual Agents},
series = {IVA '18}
}

@inproceedings{kucherenko2021large,
  author = {Kucherenko, Taras and Jonell, Patrik and Yoon, Youngwoo and Wolfert, Pieter and Henter, Gustav Eje},
  title = {A Large, Crowdsourced Evaluation of Gesture Generation Systems on Common Data: {T}he {GENEA} {C}hallenge 2020},
  year = {2021},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  doi = {10.1145/3397481.3450692},
  booktitle = {26th International Conference on Intelligent User Interfaces},
  pages = {11--21},
  numpages = {11},
  keywords = {evaluation paradigms, conversational agents, gesture generation},
  location = {College Station, TX, USA},
  series = {IUI '21}
}

Contact

If you have any questions, please use the Discussions tab.

If you encounter any problems, bugs, or issues, please create an issue on GitHub.

gesticulator's People

Contributors

dependabot[bot], nagyrajmund, svito-zar


gesticulator's Issues

Update to Python 3.8

Hello,

I attended your talk at HBRS on March 15th, 2023. After this lecture, we students have to complete an assignment that involves using the demo from this repository.

I am using Windows 10 with an Ubuntu 18.04 subsystem, in which Python 3.8.10 is installed.

While installing, I saw that the module dataclasses-0.8 was needed. Installing it automatically with pip install dataclasses only installed version 0.6, so I had to download it manually from this page.

Installing this module with pip install dataclasses-0.8-py3-none-any.whl yields the error

ERROR: Package 'dataclasses' requires a different python: 3.8.10 not in '>=3.6 and <3.7'

Is there any way to make this demo compatible with Python 3.8?

Thanks
Alex

CUDA error

Dear Sir,
I intend to run the training on a GPU; however, I get the error below. What is the problem? Thanks!
/home/zf223669/Mount/anaconda3/envs/Gesticulator/bin/python3.6 /home/zf223669/Mount/Gesticulator/gesticulator/gesticulator/train.py --gpus 1
/home/zf223669/Mount/anaconda3/envs/Gesticulator/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
CUDA_VISIBLE_DEVICES: [0]

| Name | Type | Params

0 | activation | Tanh | 0
1 | first_layer | Sequential | 156 K
2 | second_layer | Sequential | 131 K
3 | third_layer | Sequential | 196 K
4 | hidden_to_output | Sequential | 13 K
5 | encode_speech | Sequential | 238 K
6 | reduce_speech_enc | Sequential | 2 M
7 | conditioning_1 | Sequential | 69 K
8 | loss | MSELoss | 0
Validation sanity check: 0it [00:00, ?it/s]Traceback (most recent call last):
File "/home/zf223669/Mount/Gesticulator/gesticulator/gesticulator/train.py", line 69, in
main(hyperparams)
File "/home/zf223669/Mount/Gesticulator/gesticulator/gesticulator/train.py", line 43, in main
trainer.fit(model)
File "/home/zf223669/Mount/anaconda3/envs/Gesticulator/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1003, in fit
results = self.single_gpu_train(model)
File "/home/zf223669/Mount/anaconda3/envs/Gesticulator/lib/python3.6/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 186, in single_gpu_train
results = self.run_pretrain_routine(model)
File "/home/zf223669/Mount/anaconda3/envs/Gesticulator/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1196, in run_pretrain_routine
False)
File "/home/zf223669/Mount/anaconda3/envs/Gesticulator/lib/python3.6/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 293, in _evaluate
output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
File "/home/zf223669/Mount/anaconda3/envs/Gesticulator/lib/python3.6/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 470, in evaluation_forward
output = model.validation_step(*args)
File "/home/zf223669/Mount/Gesticulator/gesticulator/gesticulator/model/model.py", line 464, in validation_step
predicted_gesture = self.forward(speech, text, use_conditioning=True, motion = None, use_teacher_forcing=False)
File "/home/zf223669/Mount/Gesticulator/gesticulator/gesticulator/model/model.py", line 293, in forward
speech_encoding_full = self.encode_speech(curr_speech)
File "/home/zf223669/Mount/anaconda3/envs/Gesticulator/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/home/zf223669/Mount/anaconda3/envs/Gesticulator/lib/python3.6/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/zf223669/Mount/anaconda3/envs/Gesticulator/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/home/zf223669/Mount/anaconda3/envs/Gesticulator/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "/home/zf223669/Mount/anaconda3/envs/Gesticulator/lib/python3.6/site-packages/torch/nn/functional.py", line 1372, in linear
output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)

About computer configuration

Hello, I would like to ask about the computer configuration you used for training the model and the approximate training time.

Mismatching number of frames in original transcripts vs. BERT for GENEA Challenge

Hi!

I have cloned this repository and performed steps 1 and 2 from the README. I placed the files from folder GENEA_Challenge_2020_data_release into one directory per category instead of having two directories Test_data and Training_data. During step 2, when I run python process_dataset.py, the following error occurs:

ERROR: The number of frames in the encoded transcript (4678)
       does not match the number of frames in the input (4762)!

The error comes from processing Recording_004.json in Training_data/transcripts with parse_json_transcript.py.

How can I fix this error?

I assume the error might have to do with the fact that I am using the transcriptions originally published for the GENEA challenge. I assume this because in an earlier commit, the README proposed to use the original dataset and transcribe it with Google ASR. But now it says to use the GENEA folder, so I wanted to give it a try.

Any info on this issue would help me out a great deal! Thanks in advance.

Cheers,
Stefan

about output layer

Dear Taras,
Today I revisited your code; I am curious about this line in model.py (line 329):
curr_pose = self.hidden_to_output(final_h) * math.pi #because it is from -pi to pi

To me, multiplying by math.pi is redundant, because the final function in your output layer is a linear transformation (line 161), which should be capable of learning that constant scale automatically:
self.hidden_to_output = nn.Sequential(nn.Linear(final_hid_l_sz, self.output_dim),
nn.Tanh(), nn.Dropout(args.dropout),
nn.Linear(self.output_dim, self.output_dim))

In my case, I am working on 2D lip-syncing, and both the training input and the ground truth (I am using MFCCs as input and face landmarks detected by dlib as ground truth) are already normalized by removing the mean and dividing by the standard deviation, so I did not multiply the output by anything:
curr_pose = self.hidden_to_output(final_h)

I am not sure whether I am doing this correctly.

Thanks for your sharing,
Kelvin

evaluation

Dear authors,

Thanks for the excellent work and the code contribution!
----- training --------
I have successfully run the train.py script

  1. How many epochs did you use for training? In your paper, you mention pre-training the model without autoregression for the first 7 epochs and then training for 5 epochs with autoregression. Do you use 12 epochs for the final model?
  2. Do you use an early-stopping mechanism?

----- evaluation ----------
When I run the evaluation, it returns the error:
no such file or directory: '../dataset/processed/test_inputs/X_test_NaturalTalking_04.npy'

I checked the 'test_inputs' folder; it only contains 'T_test_Recording_003.npy' and 'X_test_Recording_003.npy'. They come from the 'Training_data' folder of the GENEA_Challenge_2020_data.

  1. Could you please specify which test files you used for the paper?

Thanks a lot for your contribution and looking forward to your reply.

Train error

Dear authors,
When I use my own data to train the model, I encounter the following error:

ERROR: GesticulatorModel.forward() returned None
Possible causes: corrupt dataset or a problem with the environment.

What do you think is the reason, and how can I solve it?
In the forward function, why is "motion_seq = None" set every time?

how to visualize BVH files

Hi, Thanks for the wonderful project.

Please ignore the title of this question. I typed it wrong, but could not change it.

My question is:

I saw the video for the gesture that the neural net predicts from a given speech and text.
The predicted gesture is in the BVH file format, so I guess there should be a BVH file viewer in order to generate the video, but I could not find any BVH viewer in the repository. Would you mind pointing me to the BVH viewer this project uses?

What is the reason for doing the Euler_Angle to Exponential_Map conversion?

Dear Taras,
Can you please share why you do the Euler_Angle2Exponential_Map conversion first and then build the deep learning model?
Is it because the exponential map has some special characteristics that make convergence easier?

A related question: why don't you consider doing an Euler_Angle2Position conversion for model building?

Thanks for your sharing,
Kelvin

Folder setup for training

Hi,

Thank you for the exciting project. I am trying to get training working, but I am finding documentation about how to process the dataset a bit lacking.

I downloaded the GENEA dataset. I then took the following steps:

I renamed the downloaded dataset to genea_data (note that this is different from the instructions, which suggest putting the downloaded folder in a folder called genea_data, but it seems to be consistent with the subsequent steps).

I then, in the data_processing folder, executed:

python bvh2features.py --bvh_dir ../../dataset/raw_data/Motion/ --dest_dir ../../dataset/processed_data/

(note that there are no defaults for bvh_dir or dest_dir, and the names of these flags do not match the documentation.)

After this, a bunch of .npz files are dumped into processed_data. I then run split_dataset.py, which creates a train and a test folder in processed_data, but does not distribute the .npz files into them. This is clearly not correct, as the files should end up in the train/test folders.

The documentation right now is inaccurate and makes it extremely difficult to reproduce the training procedure. Can you please explain how this is supposed to look and work? I am very confused and frustrated.

JointSelector question

Dear Taras,
Today I wanted to modify your code to select not only the upper body (15 joints) but also the legs and feet (an additional 10 joints). A problem came up:

Your original JointSelector:
('jtsel', JointSelector(['Spine','Spine1','Spine2','Spine3','Neck','Neck1','Head','RightShoulder', 'RightArm', 'RightForeArm', 'RightHand', 'LeftShoulder', 'LeftArm', 'LeftForeArm', 'LeftHand'], include_root=True)),
Will return expmap in 45 dims

My revised JointSelector:
('jtsel', JointSelector(['Spine','Spine1','Spine2','Spine3','Neck','Neck1','Head','RightShoulder', 'RightArm', 'RightForeArm', 'RightHand', 'LeftShoulder', 'LeftArm', 'LeftForeArm', 'LeftHand', 'RightUpLeg', 'RightLeg', 'RightFoot', 'RightForeFoot', 'RightToeBase', 'LeftUpLeg', 'LeftLeg', 'LeftFoot', 'LeftForeFoot', 'LeftToeBase'], include_root=True)),
Unfortunately, it returns an expmap with 65 dims (I expected 75 dims).

Can you please take a look at whether preprocessing.py has a bug? Thanks!

BTW, a side question: why did you not include the fingers in your original upper-body JointSelector?

Kelvin

evaluation metric

For the evaluation metrics, e.g. Table 3 and Fig. 4, what do the values mean?
Does a higher or lower value correspond to better results? Or is a value that is closer to the ground truth better?

Error when running demo.py

Hi,
when I run python demo.py in the demo folder, an error occurs:

Traceback (most recent call last):
File "demo.py", line 109, in
main(args)
File "demo.py", line 31, in main
data_pipe_dir = '../gesticulator/utils/data_pipe.sav')
File "/data/wuzhongjun/gesticulator/gesticulator/visualization/motion_visualizer/generate_videos.py", line 32, in visualize
20)
File "/data/wuzhongjun/gesticulator/gesticulator/visualization/motion_visualizer/convert2bvh.py", line 13, in write_bvh
inv_data = data_pipeline.inverse_transform(anim_clip)
AttributeError: 'numpy.ndarray' object has no attribute 'inverse_transform'

In the function write_bvh of the file convert2bvh.py, the type of the variable data_pipeline is <class 'numpy.ndarray'>, which has no attribute 'inverse_transform'.
I replaced the file data_pipe.sav in gesticulator/utils with the one generated by the script bvh2features.py in gesticulator/data_processing, and now data_pipeline is

Pipeline(memory=None,
steps=[('dwnsampl', DownSampler(keep_all=False, tgt_fps=60)), ('root', RootTransformer(method='hip_centric', position_smoothing=0,
rotation_smoothing=0)), ('mir', Mirror(append=True, axis='X')), ('jtsel', JointSelector(include_root=True,
joints=['Spine', 'Spine1', 'Spine2', 'Spine3', ...ocapParameterizer(param_type='expmap')), ('cnst', ConstantsRemover(eps=1e-06)), ('np', Numpyfier())])

but I still get the error:

Traceback (most recent call last):
File "demo.py", line 109, in
main(args)
File "demo.py", line 31, in main
data_pipe_dir = '../gesticulator/utils/data_pipe.sav')
File "/data/wuzhongjun/gesticulator/gesticulator/visualization/motion_visualizer/generate_videos.py", line 32, in visualize
20)
File "/data/wuzhongjun/gesticulator/gesticulator/visualization/motion_visualizer/convert2bvh.py", line 13, in write_bvh
inv_data = data_pipeline.inverse_transform(anim_clip)
File "/home/wuzhongjun/anaconda3/lib/python3.7/site-packages/sklearn/pipeline.py", line 458, in inverse_transform
Xt = transform.inverse_transform(Xt)
File "/data/wuzhongjun/gesticulator/gesticulator/visualization/pymo/preprocessing.py", line 646, in inverse_transform
new_mocap = self.org_mocap_.clone()
AttributeError: 'Numpyfier' object has no attribute 'org_mocap_'

Any idea how to solve this problem? I hope to see your reply. Thanks a lot.

Zhongjun, Wu

The data format of motion = gp.predict_gestures(args.audio,args.text)?

Hi, I enjoy your Gesticulator.
I would like to call the motion generator from a Unity C# script, using the pythonnet package.

I was experimenting with how to get the motion from Gesticulator in a C# script and apply it to a Unity humanoid character.
Everything works except that the motion displayed in Unity does not look right; the character remains in the T-pose.

For testing I used demo.py where we have the following code:

model = GesticulatorModel.load_from_checkpoint( # MJ: model should be obtained in c# script once for all, not for every utterance
args.model_file, inference_mode=True)
# This interface is a wrapper around the model for predicting new gestures conveniently
gp = GesturePredictor(model, feature_type)

# 2. Predict the gestures with the loaded model
motion = gp.predict_gestures(args.audio, args.text)

=> 

The shape of motion is: (1, 528, 45). I thought that 45 refers to the number of Euler angles in the 15 selected joints
specified in JointSelector(['Spine','Spine1','Spine2','Spine3','Neck','Neck1','Head','RightShoulder', 'RightArm', 'RightForeArm', 'RightHand', 'LeftShoulder', 'LeftArm', 'LeftForeArm', 'LeftHand']

in

data_pipe = Pipeline([
   ('dwnsampl', DownSampler(tgt_fps=fps,  keep_all=False)),
   ('root', RootTransformer('hip_centric')),
   ('mir', Mirror(axis='X', append=True)), ## MJ:The mirroring along the x axis is used as data augmentation technique;
   #  15 joint angles are selected:
   ('jtsel', JointSelector(['Spine','Spine1','Spine2','Spine3','Neck','Neck1','Head','RightShoulder', 'RightArm', 'RightForeArm', 'RightHand', 'LeftShoulder', 'LeftArm', 'LeftForeArm', 'LeftHand'], include_root=True)),
   ('exp', MocapParameterizer('expmap')), # Euler Angles to Exponential Map
   ('cnst', ConstantsRemover()),
   ('np', Numpyfier())
])

from bvh2features.py

Q1: Does the motion not include the rotation of the Hip joint? Did you assume that the rotation of the Hip joint is constant (0, 0, 0) in Euler angles? I am confused because you mention "14 joints excluding the Hips" somewhere in your Gesticulator paper.

Q2: Are the Euler angles in the motion in degrees rather than radians?

Thanks. 

bvh get downsampled twice?

Hi,

In process_data.py, line 96, the output vectors get downsampled:

output_vectors = output_vectors[0::3] # Subsample motion (from 60 fsp to 20 fps)

Maybe I misunderstood something, but wasn't this already done in the downsampling step inside the pipeline of bvh2features.py?

kind regards

Import Error for running demo code

I have installed required packages with the following command.

pip install -r requirements.txt

but got this error:

ERROR: torchnlp 0.0.0.1 has requirement scikit-learn==0.20.2, but you'll have scikit-learn 0.22.2.post1 which is incompatible.
ERROR: torchnlp 0.0.0.1 has requirement torch==1.0.0, but you'll have torch 1.4.0 which is incompatible.
ERROR: torchnlp 0.0.0.1 has requirement tqdm==4.28.1, but you'll have tqdm 4.44.1 which is incompatible.

Now, if I run the demo code with the following command:

python demo.py --audio input/jeremy_howard.wav --text input/jeremy_howard.json

I get this error:

Traceback (most recent call last):
  File "demo.py", line 5, in <module>
    import torch
  File "/home/<name>/python3.7/site-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: numpy.core.multiarray failed to import

Could you please guide on this?

About the demo result

Hello! I recently ran demo.py for a simple test, but I noticed that the output .mp4 file has no audio; it's only a video without any sound. Is that normal?
Looking forward to your early reply!

Error when running demo.py ("bert_spectro" does not exist)

Hi!

I've tried to run the demo as outlined in the README in the demo folder. Regardless of the instruction I use (listed under Instructions) the following error occurs:

ERROR: The given dataset directory /path/to/repository/gesticulator/demo/data_processing/bert_spectro does not exist!
Please, set the correct path with the --data_dir option!

What I have found out so far is that the string bert_spectro is contained in models/default.ckpt.

I would appreciate any help. Thanks in advance!

Cheers,

Stefan

FiLM algorithm, why (alpha + 1)?

Thanks for your work, which is great! While reading your code, a question came to my mind:
in my understanding, the equation should be output = nn_layer * alpha + beta, so why do you add 1 in your implementation?

Thanks
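For reference, a minimal FiLM-style conditioning sketch (not the repository's exact layer) that illustrates the (alpha + 1) scaling asked about above: predicting alpha and beta and scaling by (1 + alpha) keeps the layer close to the identity whenever the conditioning network's outputs are near zero, e.g. at initialization.

import torch
import torch.nn as nn

class FiLMConditioning(nn.Module):
    # Minimal FiLM-style sketch; layer sizes and names are illustrative only
    def __init__(self, feature_dim, cond_dim):
        super().__init__()
        self.to_alpha_beta = nn.Linear(cond_dim, 2 * feature_dim)

    def forward(self, features, conditioning):
        alpha, beta = self.to_alpha_beta(conditioning).chunk(2, dim=-1)
        # (1 + alpha) makes the transformation near-identity when alpha is near zero
        return (1 + alpha) * features + beta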

Penetrating hands through spines

Hi, Svito-zar.

First of all, big thanks for the awesome project and paper.
I have been exploring your project and successfully obtained BVH data from the demo.py script.

However, when I import the BVH data into Blender to visualize it, I encounter an unexpected phenomenon.
As shown in the attached images (captured in Blender), the hands penetrate some bones (mostly bones around the spine area).

I have been investigating why this is happening, but as of now I haven't found any clues.
Can you please share your opinion on this issue?
Also, I would be very grateful if you could guide me on how to prevent this phenomenon, either at the code level or in Blender (preferably a code-level solution).

Again, I appreciate your awesome project!
Many Thanks.

GRU layer should have the batch_first=True flag

Hi, I was going through the Gesticulator codebase, where a GRU is used for speech feature encoding. I noticed that before sending the curr_speech input to the GRU, you keep the first dimension as the batch size and the second dimension as the temporal size. So, in my opinion, the batch_first=True flag should be used when initializing the GRU layer. Please let me know if this is the case. Thank you for sharing your awesome work :)
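For context, a minimal sketch of the flag in question (the shapes are illustrative, not the repository's actual dimensions):

import torch
import torch.nn as nn

batch_size, seq_len, feature_dim, hidden_dim = 8, 61, 64, 128

# With batch_first=True, nn.GRU expects input of shape (batch, time, features);
# without it, the default expectation is (time, batch, features).
gru = nn.GRU(input_size=feature_dim, hidden_size=hidden_dim, batch_first=True)

curr_speech = torch.randn(batch_size, seq_len, feature_dim)
output, h_n = gru(curr_speech)
print(output.shape)  # torch.Size([8, 61, 128])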

dataset structure

I'm confused about the dataset structure.
I have downloaded the Trinity Speech-Gesture dataset; it includes several folders:

  AlignmentTimes.csv 2019-10-01 16:20 338  
  Audio/ 2019-10-01 15:34 -  
  BVH_from_clap/ 2020-08-14 19:31 -  
  FBXs/ 2019-01-23 17:16 -  
  FBXs_from_clap/ 2020-08-14 21:34 -  
  GENEA_Challenge_2020_data_release/ 2020-09-09 12:45 -  
  1. For this repository, shall we only use the GENEA_Challenge_2020_data_release folder and ignore the others?
  2. For GENEA_Challenge_2020_data_release, do we then apply

Encode motion from BVH files into the exponential map representation

python bvh2features.py

Split the dataset into training and validation

python split_dataset.py

Encode all the features

python process_dataset.py

to the 'Training_data' and 'Test_data' folders, respectively?
3. I'm confused about the data structure; could the authors share their data structure?

make bvh datasets

Dear Svito-zar, do you know how to make BVH datasets in the same format as the GENEA Challenge 2020 data?

error in demo.py

No module named 'pymo.preprocessing', but pymo is not in requirements.txt.
Thanks!!

"bvh2features.py" what exactly returns?

Hi @Svito-zar, I'm working on a speech-to-gesture model for social robots. The goal of the project is to learn a model that predicts gestures from given audio. Then, through a "motion2robot" mapping, the robot imitates the predicted gestures.

Currently what I need to do is to extract features from BVH files. I've downloaded the GENEA2020 and trinityspeechgesture datasets. At this step, I think your bvh2features.py script could come to my aid. Here I have some questions on this script:

  1. The description of the script says "This script converts a gesticulation dataset from the BVH format to joint angles." I can't work out what exactly these "joint angles" are. For example, if I have [Zrotation_i, Xrotation_i, Yrotation_i] rotation coordinates for joint_i, what do I get as a result from the script for joint_i? How can I interpret these new angles?

  2. I need only the upper body. Is this what JointSelector allows me to do?

  3. You selected 9 joints (['Head', 'RightShoulder', 'RightArm', 'RightForeArm', 'RightHand', 'LeftShoulder', 'LeftArm', 'LeftForeArm', 'LeftHand']). I've tried giving this script a BVH file with fps=60 as input, and I get a numpy array with shape (n_frames, 27) as output. From this shape I deduce that I have 3 values for each joint. What these values are is the answer to question 1, but, if the deduction is correct, what is the order of this array? Is it the same as the declaration order in the JointSelector? For example, in this case, do [npy_array[0][0], npy_array[0][1], npy_array[0][2]] correspond to the Head angles at frame 0?

P.S: My model will use only audio features (no text). Is there a way to train your model using only audio?

Many thanks for your help and congrats for your works :)
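As a rough illustration of how such per-joint triples can be interpreted, here is a small sketch. It assumes the features are exponential-map (axis-angle) rotations stored in the order the joints were selected; this is an assumption for illustration, not a confirmed detail of bvh2features.py.

import numpy as np
from scipy.spatial.transform import Rotation

def expmap_frame_to_euler(frame_features, joint_names, euler_order="ZXY"):
    # Interpret one frame of features as per-joint exponential-map rotations
    # and convert them back to Euler angles (in degrees) for inspection.
    expmap = np.asarray(frame_features).reshape(len(joint_names), 3)
    return {
        name: Rotation.from_rotvec(rotvec).as_euler(euler_order, degrees=True)
        for name, rotvec in zip(joint_names, expmap)
    }

joints = ["Head", "RightShoulder", "RightArm"]   # subset for illustration
frame = np.random.randn(9) * 0.1                 # stand-in for one row of the output array
print(expmap_frame_to_euler(frame, joints))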

the consult of verifying when training

Hi, I am extremely grateful for all your help.
I have the following question:
During training, does the model use 'X_dev.npy' and 'Y_dev.npy' for validation?

    def __init__(self, root_dir, apply_PCA=False, train=True):
        """
        Args:
            root_dir (string): Directory with the datasat.
        """
        self.root_dir = root_dir
        # Get the data
        if train:
            self.audio = np.load(path.join(root_dir, 'X_train.npy')).astype(np.float32)
            self.text = np.load(path.join(root_dir, 'T_train.npy')).astype(np.float32)
            # apply PCA
            if apply_PCA:
                self.gesture = np.load(path.join(root_dir, 'PCA', 'Y_train.npy')).astype(np.float32)
            else:
                self.gesture = np.load(path.join(root_dir, 'Y_train.npy')).astype(np.float32)
        else:
            self.audio = np.load(path.join(root_dir, 'X_dev.npy')).astype(np.float32)
            self.text = np.load(path.join(root_dir, 'T_dev.npy')).astype(np.float32)
            # apply PCA
            if apply_PCA:
                self.gesture = np.load(path.join(root_dir, 'PCA', 'Y_dev.npy')).astype(np.float32)
            else:
                self.gesture = np.load(path.join(root_dir, 'Y_dev.npy')).astype(np.float32)

Or does the model use 'X_dev_NaturalTalking_001.npy' for validation?

class ValidationDataset(Dataset):
    """Validation samples from the Trinity Speech-Gesture Dataset."""

    def __init__(self, root_dir, past_context, future_context):
        """
        Args:
            root_dir (string): Directory with the datasat.
        """
        self.root_dir = root_dir
        self.past_context = past_context
        self.future_context = future_context
        # Get the data
        self.audio = np.load(path.join(root_dir, 'dev_inputs', 'X_dev_NaturalTalking_001.npy')).astype(np.float32)
        self.text = np.load(path.join(root_dir, 'dev_inputs', 'T_dev_NaturalTalking_001.npy')).astype(np.float32)

I'm a little confused~

obj_evaluation

The download link for the reference gestures of the test dataset in the obj_evaluation part is unavailable now. Can you provide a new link? Thanks very much.

error with module pymo

D:\PycharmProjects\gesticulator-master\venv\Scripts\python.exe D:/PycharmProjects/gesticulator-master/demo/demo.py
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight']

  • This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Using time-annotated JSON transcription 'input/jeremy_howard.json'
    Traceback (most recent call last):
    File "D:\PycharmProjects\gesticulator-master\demo\demo.py", line 108, in
    main(args)
    File "D:\PycharmProjects\gesticulator-master\demo\demo.py", line 28, in main
    visualize(motion.detach(), "temp.bvh", "temp.npy", "temp.mp4",
    File "D:\PycharmProjects\gesticulator-master\demo\gesticulator\visualization\motion_visualizer\generate_videos.py", line 29, in visualize
    write_bvh((data_pipe_dir,), # write_bvh expects a tuple
    File "D:\PycharmProjects\gesticulator-master\demo\gesticulator\visualization\motion_visualizer\convert2bvh.py", line 11, in write_bvh
    data_pipeline = joblib.load(datapipe_file[0])
    File "D:\PycharmProjects\gesticulator-master\venv\lib\site-packages\joblib\numpy_pickle.py", line 587, in load
    obj = _unpickle(fobj, filename, mmap_mode)
    File "D:\PycharmProjects\gesticulator-master\venv\lib\site-packages\joblib\numpy_pickle.py", line 506, in _unpickle
    obj = unpickler.load()
    File "C:\Users\UserPC.LAPTOP-F0JUUKDE\AppData\Local\Programs\Python\Python39\lib\pickle.py", line 1212, in load
    dispatch[key[0]](self)
    File "C:\Users\UserPC.LAPTOP-F0JUUKDE\AppData\Local\Programs\Python\Python39\lib\pickle.py", line 1528, in load_global
    klass = self.find_class(module, name)
    File "C:\Users\UserPC.LAPTOP-F0JUUKDE\AppData\Local\Programs\Python\Python39\lib\pickle.py", line 1579, in find_class
    __import__(module, level=0)
    ModuleNotFoundError: No module named 'pymo.preprocessing'

The error above is what I get. I don't know how to deal with it, since I don't know how to specify the directory of the module differently.

Some questions about the result

Hello! I'm wondering about the results shown in the paper. The abstract mentions that the model "generates gestures as a sequence of joint angle rotations as output", but the visual result in your demo video is a human-like character. How do you convert between the two? Can I drive a digital human mesh (with an .obj suffix) with this output?
Hope to get your early reply! Thank you!

process_data

Hi, Svito-zar. It seems that you do not use all of the data to train the models.
In the following code, you take samples according to seq_step, which may reduce the actual amount of training data.

 stop_ind = input_vectors.shape[0] - n_reserved_inds
 input_vectors_final  = np.array([input_vectors[i - args.past_context : i + n_reserved_inds] 
                                     for i in range(start_ind, stop_ind, seq_step)])
stop_ind = output_vectors.shape[0] - n_reserved_inds
output_vectors_final = np.array([output_vectors[i - args.past_context : i + n_reserved_inds]
                                     for i in range(start_ind, stop_ind, seq_step)])

gesticulator-requirements

Thanks for the amazing work!
The packages specified in requirements.txt under 'gesticulator' have some conflicts. Could you please double-check the requirements and update the file?

instructions for training NoText model

Hi,
When I look into demo.py, I find that there are different feature types, including "MFCC", "Pros", "MFCC+Pros", "Spectro", and "Spectro+Pros", while the pretrained model seems to be only for Spectro features, as indicated in the check_feature_type function.
Another question is how to train the model without text input, or how to test the "No Text" performance stated in the paper. Could you give some instructions?
Thanks a lot.
