sortanon / controllabletalknet

A web app that lets you play around with TalkNet models

License: GNU Affero General Public License v3.0

CSS 11.09% Python 62.59% Jupyter Notebook 25.09% Dockerfile 1.23%

controllabletalknet's Issues

other gans?

IIRC there are other compatible GANs and a lot of new stuff coming out. is Univnet possible? FreGAN2?

I have a problem with setup.bat. Each time I run it this appears and after pressing a button it says that Visual Studios is being installed but after it's all done nothing changes. I try running setup.bat again but the same thing happens every time. Does anyone have a fix for that?

Regardless of new tech, add new models

Hello SortAnon, please update the model lists and make new models if possible. Regardless of other tech, TalkNet still has its benefits.

Please explain how to use custom models?

What does Drive ID for custom model imply? the full directory to the file? This is confusing.
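
One possible reading, offered only as an assumption since the issue does not confirm it: the "Drive ID" field may expect a Google Drive file ID rather than a local path. That ID is the token embedded in a share link, as in the `uc?id=...` link quoted in a later issue on this page. The helper name `drive_id_from_url` is hypothetical and not part of the project.

```python
from urllib.parse import urlparse, parse_qs


def drive_id_from_url(url: str) -> str:
    """Return the file ID embedded in a Google Drive share link (hypothetical helper)."""
    parsed = urlparse(url)
    # Form 1: https://drive.google.com/uc?id=<ID> (also .../open?id=<ID>)
    query_id = parse_qs(parsed.query).get("id")
    if query_id:
        return query_id[0]
    # Form 2: https://drive.google.com/file/d/<ID>/view
    parts = [p for p in parsed.path.split("/") if p]
    if "d" in parts and parts.index("d") + 1 < len(parts):
        return parts[parts.index("d") + 1]
    raise ValueError(f"no Drive ID found in {url!r}")
```

If that assumption holds, the value to paste would be the ID string alone, not the full URL or a directory path.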

more mlp characters [request]

I wonder if it's possible to add Derpy, Doctor Whooves, and other ponies such as Gallus and some of the others like that.

Lots of errors during setup

I've been running into a ton of issues when running setup.bat on my Windows 10 laptop. I think I fixed the initial issues ("io.h not found", "error cannot open file 'kernel32.lib'") by setting up environment variables, but now there are constant errors that seem to appear when attempting to build a wheel.

Here is what's currently installed in my VS BuildTools Development w/ C++:

Here are the environment variables I've added to fix the "can't find file" issues:

Here's a preview of my current error that repeatedly happens before aborting (it's waaaay longer than this):

Here's a link to the entirety of what my console says:

https://pastebin.com/th0VLJz9

I'm probably an idiot and might not have installed something properly, since I don't have any knowledge of python, c++, or errors in general, but I would appreciate any pointers that could help me fix these issues and run your webapp offline. Thank you!

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

did I set docker up wrong?

Fixed Dockerfile by downgrading Werkzeug

I noticed the current version of dash requires an older Werkzeug. This pull request edits the Dockerfile to force that dependency to downgrade to a working version.

Notebooks not working due to missing modules

It appears that the two Google Colab notebooks for training and using Controllable TalkNet do not work properly anymore. I constantly get errors that various modules are missing and even when I add lines to install them manually, some of them still don't work (NeMo in particular). These notebooks used to work for me with no problems before, but not anymore. It appears that many of the dependencies have been updated and function differently. If that is the case, will the notebooks be updated at some point?

is it possible to make a derpy and doctor whooves model please ?

please make these two models for talk net

"is a directory" error

fid = open(filename, 'rb')

IsADirectoryError: [Errno 21] Is a directory: '/home/jordancruz/Tools/ControllableTalkNet/training/basil-training-data/basil-data/Basiliska-the-lamia-locally-trained/_training_data/wavs/'
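
The path in the error ends at `wavs/`, which suggests that at least one entry resolved to the directory itself, for example because its file name was empty. The following is a minimal sketch for locating such entries, assuming a `filename|transcript` file list (an assumption about the data layout, not something the issue states). The function name `find_bad_entries` is hypothetical.

```python
import os


def find_bad_entries(filelist_path: str, wavs_dir: str) -> list:
    """Return (line_number, reason) pairs for entries that cannot be opened as files."""
    problems = []
    with open(filelist_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            # Assumed layout: "<wav file name>|<transcript>"
            name = line.rstrip("\n").split("|", 1)[0].strip()
            full = os.path.join(wavs_dir, name)
            if not name:
                # An empty name joins to the wavs directory itself.
                problems.append((lineno, "empty file name"))
            elif os.path.isdir(full):
                problems.append((lineno, "path is a directory"))
            elif not os.path.isfile(full):
                problems.append((lineno, "file not found"))
    return problems
```

Any line reported as "empty file name" or "path is a directory" would produce the same `IsADirectoryError` when opened.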

Making Online Talknet Interface, More Accessible To Screen Readers

FYI:

I would have posted this bug report to the PPP first, but due to said area not having an "audio captcha" for Visually Impaired Anons, this is the next best thing.

After the interface loads and I navigate via the tab key to reach the combo box featuring a list of voices, nothing speaks when I arrow down through said choices. The only way I am able to confirm my choice is to tab once then arrow back by line. Otherwise, I need to shift-tab back or arrow up to the combo box itself to choose a different voice.

In addition, if there is a way you could please point out where the button is for accessing the folder icon (by mentioning the icon's title), that would be most helpful.

NB. I use the VoiceOver screen reader created by Apple, thus Mileage may vary with other assistive technology products found on either Linux, Windows or mobile.

TalkNet has issues downloading some models if you're new to TalkNet

Hello SortAnon, please solve this issue: new anons coming onto the scene are unable to download the model because it says the download failed even though the link is completely fine.

RuntimeError: repeats has to be Long tensor.

Hi, I have a question. I’ve been trying to use controllable Talknet on colab to synthesize speech using an audio reference. This seems to work fine for very short samples but when trying somewhat longer reference audio colab does not seem to work.

To get around this problem I tweaked some bits of the code to make it run locally on a jupyter notebook. I can use the TTS functionality just fine, however, I am not able to synthesize using reference audio of any length. When I try I get this error message:
line 437, in tensors=[torch.repeat_interleave(text1, durs1) for text1, durs1 in zip(x, reps)], value=pad, dtype=x.dtype, RuntimeError: repeats has to be Long tensor.

Is this something you have ever encountered?

i ran the setup and this error shows when running talknet.bat

Starting TalkNet server. Close this window to shut down the server.
Traceback (most recent call last):
File "talknet_offline.py", line 3, in <module>
from controllable_talknet import *
File "C:\Users\alex_\Desktop\talknet controller\ControllableTalkNet\controllable_talknet.py", line 5, in <module>
from jupyter_dash import JupyterDash
ModuleNotFoundError: No module named 'jupyter_dash'
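
A quick way to see which imports are absent from the active environment, before launching the app, is to probe for them without importing. This is a minimal sketch using only the standard library; the function name `missing_modules` is hypothetical, and the module list is taken from the tracebacks on this page.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    # Only pass top-level names here; dotted names can raise if the parent is missing.
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # 'jupyter_dash' comes from the traceback above; 'dash' is imported by the same file.
    print(missing_modules(["jupyter_dash", "dash"]))
```

An empty list means the modules are visible to this interpreter; if the list is not empty, the setup step probably installed into a different environment or did not finish.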

Issue with colab synthesis notebook

The interface just won't run for some reason

Offline training of singing models

Hello,

I've noticed that in the singing models there is a TalkNetSinger.nemo file that is not present in the non-singing models; however, there is nothing in the training code provided in the offline training notebook wrt generating this file. How do we generate this file?

docker version does not work...

Hello getting the following errors when trying to run a docker container:

root@DESKTOP-7A6UGRU:/home/snufas/github_projects/docker_talknet# docker run -it --gpus all -p 8050:8050 talknet-offline
Updating TalkNet...
Updating HiFi-GAN...
Updating Python dependencies...
ERROR: pytorch-lightning 1.7.0 has requirement tensorboard>=2.9.1, but you'll have tensorboard 2.4.1 which is incompatible.
ERROR: pytorch-lightning 1.7.0 has requirement torch>=1.9., but you'll have torch 1.8.1+cu111 which is incompatible.
ERROR: pytorch-lightning 1.7.0 has requirement typing-extensions>=4.0.0, but you'll have typing-extensions 3.7.4.3 which is incompatible.
ERROR: tensorboard 2.9.1 has requirement protobuf<3.20,>=3.9.2, but you'll have protobuf 3.20.1 which is incompatible.
ERROR: pytorch-lightning 1.7.0 has requirement torch>=1.9., but you'll have torch 1.8.1+cu111 which is incompatible.
ERROR: pytorch-lightning 1.7.0 has requirement typing-extensions>=4.0.0, but you'll have typing-extensions 3.7.4.3 which is incompatible.
Launching TalkNet...
Traceback (most recent call last):
File "talknet_offline.py", line 3, in <module>
from controllable_talknet import *
File "/talknet/controllable_talknet.py", line 3, in <module>
import dash
File "/usr/local/lib/python3.8/dist-packages/dash/__init__.py", line 5, in <module>
from .dash import Dash, no_update # noqa: F401,E402
File "/usr/local/lib/python3.8/dist-packages/dash/dash.py", line 20, in <module>
import flask
File "/usr/local/lib/python3.8/dist-packages/flask/__init__.py", line 4, in <module>
from . import json as json
File "/usr/local/lib/python3.8/dist-packages/flask/json/__init__.py", line 8, in <module>
from ..globals import current_app
File "/usr/local/lib/python3.8/dist-packages/flask/globals.py", line 56, in <module>
app_ctx: "AppContext" = LocalProxy( # type: ignore[assignment]
TypeError: __init__() got an unexpected keyword argument 'unbound_message'

Can you update the Dockerfile?
Thanks

Debugging issues in `backward_extractor`

When trying to run the "# Extract phoneme duration" step of the TalkNet_Training_Offline notebook, I'm getting random errors in the backward_extractor function. See the output below:

[NeMo I 2023-05-29 13:30:11 features:252] PADDING: 1
[NeMo I 2023-05-29 13:30:11 features:262] STFT using conv
[NeMo I 2023-05-29 13:30:12 modelPT:439] Model EncDecCTCModel was successfully restored from /home/mmmmllll1/.cache/torch/NeMo/NeMo_1.0.2/qn5x5_libri_tts_phonemes/656c7439dd3a0d614978529371be498b/qn5x5_libri_tts_phonemes.nemo.
[NeMo I 2023-05-29 13:30:13 collections:173] Dataset loaded with 642 files totalling 0.67 hours
[NeMo I 2023-05-29 13:30:13 collections:174] 0 files were filtered totalling 0.00 hours
18%
114/642 [00:48<02:57, 2.98it/s]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[18], line 94
     91 target_tokens = preprocess_tokens(seq_ids, blank_id)
     93 f, p = forward_extractor(target_tokens, log_probs, blank_id)
---> 94 durs = backward_extractor(f, p)
     96 dur_key = Path(dl.dataset.collection[sample_idx].audio_file).stem
     97 dur_data[dur_key] = {
     98     'blanks': torch.tensor(durs[::2], dtype=torch.long).cpu().detach(), 
     99     'tokens': torch.tensor(durs[1::2], dtype=torch.long).cpu().detach()
    100 }

Cell In[18], line 45, in backward_extractor(f, p)
     43     t -= 1
     44 assert durs.shape[0] == n
---> 45 assert np.sum(durs) == m
     46 assert np.all(durs[1::2] > 0)
     47 return durs

AssertionError:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[20], line 94
     91 target_tokens = preprocess_tokens(seq_ids, blank_id)
     93 f, p = forward_extractor(target_tokens, log_probs, blank_id)
---> 94 durs = backward_extractor(f, p)
     96 dur_key = Path(dl.dataset.collection[sample_idx].audio_file).stem
     97 dur_data[dur_key] = {
     98     'blanks': torch.tensor(durs[::2], dtype=torch.long).cpu().detach(), 
     99     'tokens': torch.tensor(durs[1::2], dtype=torch.long).cpu().detach()
    100 }

Cell In[20], line 41, in backward_extractor(f, p)
     39     s, t = n - 1, m
     40 while s > 0:
---> 41     durs[s - 1] += 1
     42     s -= p[s, t]
     43     t -= 1

IndexError: index 4720093899646973286 is out of bounds for axis 0 with size 49

I'm unsure how to debug what is causing these issues. I assume there is something wrong with my training input?
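
One way to narrow this down is to keep the loop running and record which samples fail, so the affected audio files and transcripts can be inspected afterwards. The sketch below is a generic wrapper under that assumption; `collect_failures` and its arguments are hypothetical names, and `extract` stands in for whatever per-sample function wraps `forward_extractor` and `backward_extractor` in the notebook.

```python
def collect_failures(samples, extract):
    """Run extract(sample) for each (key, sample) pair.

    Returns (results, failures), where failures maps each failing key to the
    name of the exception it raised.
    """
    results, failures = {}, {}
    for key, sample in samples:
        try:
            results[key] = extract(sample)
        except (AssertionError, IndexError) as exc:
            # Record the exception type so failing files can be inspected later.
            failures[key] = type(exc).__name__
    return results, failures
```

If the same files fail on every run, the input data is the likely suspect; if the failing set changes between runs, the cause is more likely in the extraction code or its environment.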

Exits with NeMo and numpy related errors on fresh arch install

Using the instructions given as-is I'm running into this as the sequence of boot events every time I go to run sudo docker run -it --gpus all -p 8050:8050 talknet-offline:

Updating TalkNet...
Updating HiFi-GAN...
Updating Python dependencies...
Launching TalkNet...
2023-04-09 23:50:33.232655: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
[NeMo W 2023-04-09 23:50:34 optimizers:47] Apex was not found. Using the lamb optimizer will error out.
Traceback (most recent call last):
File "talknet_offline.py", line 3, in <module>
from controllable_talknet import *
File "/talknet/controllable_talknet.py", line 14, in <module>
from nemo.collections.tts.models import TalkNetSpectModel
File "/usr/local/lib/python3.8/dist-packages/nemo/collections/tts/__init__.py", line 15, in <module>
import nemo.collections.tts.data
File "/usr/local/lib/python3.8/dist-packages/nemo/collections/tts/data/__init__.py", line 15, in <module>
import nemo.collections.tts.data.datalayers
File "/usr/local/lib/python3.8/dist-packages/nemo/collections/tts/data/datalayers.py", line 58, in <module>
from nemo.collections.asr.parts.preprocessing.features import WaveformFeaturizer
File "/usr/local/lib/python3.8/dist-packages/nemo/collections/asr/__init__.py", line 15, in <module>
from nemo.collections.asr import data, losses, models, modules
File "/usr/local/lib/python3.8/dist-packages/nemo/collections/asr/models/__init__.py", line 16, in <module>
from nemo.collections.asr.models.classification_models import EncDecClassificationModel
File "/usr/local/lib/python3.8/dist-packages/nemo/collections/asr/models/classification_models.py", line 28, in <module>
from nemo.collections.asr.data import audio_to_label_dataset
File "/usr/local/lib/python3.8/dist-packages/nemo/collections/asr/data/audio_to_label_dataset.py", line 15, in <module>
from nemo.collections.asr.data import audio_to_label
File "/usr/local/lib/python3.8/dist-packages/nemo/collections/asr/data/audio_to_label.py", line 23, in <module>
from nemo.collections.asr.parts.preprocessing.segment import available_formats as valid_sf_formats
File "/usr/local/lib/python3.8/dist-packages/nemo/collections/asr/parts/preprocessing/__init__.py", line 16, in <module>
from nemo.collections.asr.parts.preprocessing.features import (
File "/usr/local/lib/python3.8/dist-packages/nemo/collections/asr/parts/preprocessing/features.py", line 42, in <module>
from librosa.util import tiny
File "/usr/local/lib/python3.8/dist-packages/lazy_loader/__init__.py", line 76, in __getattr__
submod = importlib.import_module(submod_path)
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/usr/local/lib/python3.8/dist-packages/librosa/util/utils.py", line 17, in <module>
from numpy.typing import ArrayLike, DTypeLike
ModuleNotFoundError: No module named 'numpy.typing'

The most I can work out so far is that some packages fell into dependency hell and expect numpy 1.20 rather than 1.19.2; I can't figure out the issues with loading NeMo. The Windows build works fine with multiboot, but this error seems to persist across anything Arch-related, seeing as it's happened on two fresh installs along with Manjaro.
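
The traceback ends in `from numpy.typing import ...`, and the `numpy.typing` module first appeared in numpy 1.20, which matches the version mismatch described above. The following is a small standard-library sketch for checking an installed version against a minimum. The helper names are hypothetical, and the comparison is deliberately simple (numeric release parts only), not a full version-specifier parser.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '1.19.2' into (1, 19, 2); local suffixes such as '+cu111' are dropped."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True when the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_installed(package: str, minimum: str):
    """Return True/False, or None when the package is not installed."""
    try:
        return meets_minimum(metadata.version(package), minimum)
    except metadata.PackageNotFoundError:
        return None
```

For example, `check_installed("numpy", "1.20")` run inside the container would show whether the numpy seen by Python is new enough for the librosa import in the traceback.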

Is it possible to introduce a Ditzy Doo talking and singing model? That would be great for the next update, thanks

Please implement Derpy in the next model update, talking and singing, and maybe Doctor Whooves as well if possible.

New Issues

There's a problem: it reports that the tensorflow_hub module is not found. When I delete that line from extracts.py it then works, but it seems to be some kind of beta and isn't as good as the stable build before. Please fix this, it's a major issue.

ControllableTalkNet for another language

In Poland, voice cloning AI is very popular, but tacotron2 does not allow adding emotions and singing. TalkNet technology seems to be brilliant, I would like to make a version for Polish language, but I don't have much IT knowledge and I need some light help.

I have been training for a week on a 30-hour Polish audiobook, "The Doll", in this Colab notebook: https://colab.research.google.com/drive/1VqSWRU1H3KIU6au_ojOGFtU0HQPUFa6t

However, despite quite a bit of training, it still twists words a lot. I have discovered that the problem is not necessarily with the model, but perhaps with the synthesis notebook, which is tailored exclusively for English: https://colab.research.google.com/drive/1aj6Jk8cpRw7SsN3JSYCv57CrR6s0gYPB
Everything you type into the generator field is converted to English ARPAbet.

Is it possible to disable this conversion?
Alternatively, is it possible to adapt this ARPAbet for the Polish language? But here there is a problem, because in Polish there are consonants which are not present in English, for example "ć", "ś", "ń", "ź".

MIDI Support

Would it be possible to make the script able to directly read MIDI files to get durations and pitch? It'd be very helpful in cases where you don't have clean vocals, but you have a MIDI based on the vocals.

I've been looking into the code, and it looks like it might be possible if you make it able to read the note durations and pitches in a MIDI and convert it to the proper format, but I'm not a skilled enough coder to do it.

Alternatively, I use a concatenative singing synthesizer called UTAU, and the .ust files made with it seem fairly simple in terms of structure, so it might even be possible to import durations and pitch from it instead. UST files even contain lyrics for each note, so a transcript could be extracted.
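
To illustrate the conversion this request describes, here is a minimal sketch that turns note numbers and note lengths into a per-frame pitch track. It uses the standard equal-temperament mapping (MIDI note 69 = A4 = 440 Hz). The 22050 Hz sample rate matches the training config quoted in another issue on this page; the hop length of 256 and all function names are assumptions. It does not parse MIDI or UST files, and it is not the format the project itself uses.

```python
def midi_note_to_hz(note: int) -> float:
    """Equal-temperament frequency for a MIDI note number (69 -> 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def seconds_to_frames(seconds: float, sample_rate: int = 22050, hop_length: int = 256) -> int:
    """Number of spectrogram frames covering a duration (hop_length is an assumption)."""
    return round(seconds * sample_rate / hop_length)


def notes_to_f0(notes, sample_rate: int = 22050, hop_length: int = 256) -> list:
    """Build a per-frame F0 list from (midi_note or None, duration_seconds) pairs.

    Rests (None) are written as 0.0 Hz.
    """
    f0 = []
    for note, seconds in notes:
        hz = 0.0 if note is None else midi_note_to_hz(note)
        f0.extend([hz] * seconds_to_frames(seconds, sample_rate, hop_length))
    return f0
```

For instance, `notes_to_f0([(69, 1.0), (None, 0.5)])` gives one second of A4 followed by half a second of silence. Aligning notes to phoneme durations would still need the lyrics, which is where the per-note text in a UST file could help.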

New Models

Hello SortAnon, I know others are moving on to different things, but is it possible to release new models? I still like TalkNet because it's very versatile; I can even change the lyrics of songs. Please make a Doctor Whooves model.

CPU version of windows Controllable Talknet

Is it possible to make a CPU version of Controllable TalkNet on Windows? It should be, as someone has already done this on Colab.

Thank you!

How do i actually train a model offline on windows?

the readme only lets me pull up the main program to make voices not train them.

"Reduce metallic noise" fails with "Reconstruction VQGAN failed to download" - Where to place VQGAN file?

Selecting "Reduce metallic noise" gives the error "Reconstruction VQGAN failed to download"

However in the terminal, I can see the Google Drive link

Access denied with the following error:

        Cannot retrieve the public link of the file. You may need to change
        the permission to 'Anyone with the link', or have had many accesses.

You may still be able to access the file from the browser:

         https://drive.google.com/uc?id=1wlilvBtlBiAUEqqdqE0AEqo-UKx2X_cL

I was able to manually download this in my browser. Where shall I put this so that ControllableTalkNet can find it? I can add a docker mount if needed.

About DiffSVC

DiffSVC_gui doesn't have code for it.
Do you have code for DiffSVC?

TalkNet training step 7 has a lot of missing + unexpected keys

Hello! I'm currently using your TalkNet training script on Google Colab (https://colab.research.google.com/drive/1Nb8TWjUBJIVg7QtIazMl64PAY4-QznzI?usp=sharing#scrollTo=nM7-bMpKO7U2) and there's an error that appears on step 7 where the console lists a bunch of missing and unexpected keys. I have absolutely zero experience with Python, so I would appreciate any pointers or tips on how to fix this.

Full Console Log:
https://pastebin.com/8XtFRTQN

Where do `train.py` and `config_v1b.json` come from?

The cell in TalkNet_Training_Offline.ipynb @ https://github.com/SortAnon/ControllableTalkNet/blame/5ee364f5bb1fe63fcde2b690507bd7cd89bfe268/TalkNet_Training_Offline.ipynb#L818-L823
runs

!python train.py --fine_tuning True --config config_v1b.json \
{start_from_universal} \
--checkpoint_interval 250 --checkpoint_path "{os.path.join(output_dir, 'HiFiGAN')}" \
--input_training_file "{hifi_train}" \
--input_validation_file "{hifi_val}" \
--input_wavs_dir "{hifi_wavs}"

But where do train.py and config_v1b.json come from? They don't seem to be included in this repository?

Other languages

Could you add compatibility with languages other than English, such as by compatibility with CSS10 as used by NANSY? https://github.com/Kyubyong/css10

Problem with linux version

Hello, I'm facing an issue while trying to run this tool. Upon launching it, I encounter an error message stating the following:

"If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:

Downgrade the protobuf package to 3.20.x or lower.
Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower)."

I attempted to downgrade the protobuf package, but unfortunately, it didn't resolve the issue.
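
The second workaround named in the error message can be tried from Python by setting the environment variable before anything imports protobuf. This is a minimal sketch; where to place it in the tool's entry script is an assumption, and as the message itself warns, pure-Python parsing will be much slower.

```python
import os

# Must run before any module that imports protobuf is loaded,
# so it belongs at the very top of the entry script.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

# ... the application's own imports would follow here ...
```

Exporting the same variable in the shell before launching has the same effect and avoids editing any code.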

Apple Silicon Support?

Training notebook broken: installing incompatible libraries

The training notebook seems to install the wrong versions of the following libraries:

toad 0.1.0 which requires numpy>=1.20, but you have numpy 1.19.5 which is incompatible.

konoha 4.6.5 which requires importlib-metadata<4.0.0,>=3.7.0, but you have importlib-metadata 4.11.3 which is incompatible.

google-colab 1.0.0 which requires requests~=2.23.0, but you have requests 2.27.1 which is incompatible.

flair 0.8.0 which requires torch<=1.7.1,>=1.5.0, but you have torch 1.8.1 which is incompatible.

datascience 0.10.6 which requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.

albumentations 0.1.12 which requires imgaug<0.2.7,>=0.2.5, but you have imgaug 0.2.9 which is incompatible.

below is an attached screenshot of the error

ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' on step 3

Currently running the docker container on a linux environment, however when running step 3, it returns the following error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[4], line 4
      1 # Extract phoneme duration
      3 import json
----> 4 from nemo.collections.asr.models import EncDecCTCModel
      5 asr_model = EncDecCTCModel.from_pretrained(model_name="asr_talknet_aligner").cpu().eval()
      7 def forward_extractor(tokens, log_probs, blank):

File ~/anaconda3/envs/talknet/lib/python3.8/site-packages/nemo/collections/asr/__init__.py:15
      1 # Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
      2 #
      3 # Licensed under the Apache License, Version 2.0 (the "License");
   (...)
     12 # See the License for the specific language governing permissions and
     13 # limitations under the License.
---> 15 from nemo.collections.asr import data, losses, models, modules
     16 from nemo.package_info import __version__
     18 # Set collection version equal to NeMo version.

File ~/anaconda3/envs/talknet/lib/python3.8/site-packages/nemo/collections/asr/losses/__init__.py:15
      1 # Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
      2 #
      3 # Licensed under the Apache License, Version 2.0 (the "License");
   (...)
     12 # See the License for the specific language governing permissions and
     13 # limitations under the License.
---> 15 from nemo.collections.asr.losses.angularloss import AngularSoftmaxLoss
     16 from nemo.collections.asr.losses.audio_losses import SDRLoss
     17 from nemo.collections.asr.losses.ctc import CTCLoss

File ~/anaconda3/envs/talknet/lib/python3.8/site-packages/nemo/collections/asr/losses/angularloss.py:18
      1 # ! /usr/bin/python
      2 # Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
      3 #
   (...)
     13 # See the License for the specific language governing permissions and
     14 # limitations under the License.
     16 import torch
---> 18 from nemo.core.classes import Loss, Typing, typecheck
     19 from nemo.core.neural_types import LabelsType, LogitsType, LossType, NeuralType
     21 __all__ = ['AngularSoftmaxLoss']

File ~/anaconda3/envs/talknet/lib/python3.8/site-packages/nemo/core/__init__.py:16
      1 # Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
      2 #
      3 # Licensed under the Apache License, Version 2.0 (the "License");
   (...)
     12 # See the License for the specific language governing permissions and
     13 # limitations under the License.
     15 import nemo.core.neural_types
---> 16 from nemo.core.classes import *

File ~/anaconda3/envs/talknet/lib/python3.8/site-packages/nemo/core/classes/__init__.py:18
     16 import hydra
     17 import omegaconf
---> 18 import pytorch_lightning
     20 from nemo.core.classes.common import (
     21     FileIO,
     22     Model,
   (...)
     27     typecheck,
     28 )
     29 from nemo.core.classes.dataset import Dataset, IterableDataset

File ~/anaconda3/envs/talknet/lib/python3.8/site-packages/pytorch_lightning/__init__.py:20
     17 _PACKAGE_ROOT = os.path.dirname(__file__)
     18 _PROJECT_ROOT = os.path.dirname(_PACKAGE_ROOT)
---> 20 from pytorch_lightning import metrics  # noqa: E402
     21 from pytorch_lightning.callbacks import Callback  # noqa: E402
     22 from pytorch_lightning.core import LightningDataModule, LightningModule  # noqa: E402

File ~/anaconda3/envs/talknet/lib/python3.8/site-packages/pytorch_lightning/metrics/__init__.py:15
      1 # Copyright The PyTorch Lightning team.
      2 #
      3 # Licensed under the Apache License, Version 2.0 (the "License");
   (...)
     12 # See the License for the specific language governing permissions and
     13 # limitations under the License.
---> 15 from pytorch_lightning.metrics.classification import (  # noqa: F401
     16     Accuracy,
     17     AUC,
     18     AUROC,
     19     AveragePrecision,
     20     ConfusionMatrix,
     21     F1,
     22     FBeta,
     23     HammingDistance,
     24     IoU,
     25     Precision,
     26     PrecisionRecallCurve,
     27     Recall,
     28     ROC,
     29     StatScores,
     30 )
     31 from pytorch_lightning.metrics.metric import Metric, MetricCollection  # noqa: F401
     32 from pytorch_lightning.metrics.regression import (  # noqa: F401
     33     ExplainedVariance,
     34     MeanAbsoluteError,
   (...)
     39     SSIM,
     40 )

File ~/anaconda3/envs/talknet/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/__init__.py:14
      1 # Copyright The PyTorch Lightning team.
      2 #
      3 # Licensed under the Apache License, Version 2.0 (the "License");
   (...)
     12 # See the License for the specific language governing permissions and
     13 # limitations under the License.
---> 14 from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
     15 from pytorch_lightning.metrics.classification.auc import AUC  # noqa: F401
     16 from pytorch_lightning.metrics.classification.auroc import AUROC  # noqa: F401

File ~/anaconda3/envs/talknet/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/accuracy.py:18
     14 from typing import Any, Callable, Optional
     16 from torchmetrics import Accuracy as _Accuracy
---> 18 from pytorch_lightning.metrics.utils import deprecated_metrics
     21 class Accuracy(_Accuracy):
     23     @deprecated_metrics(target=_Accuracy)
     24     def __init__(
     25         self,
   (...)
     32         dist_sync_fn: Callable = None,
     33     ):

File ~/anaconda3/envs/talknet/lib/python3.8/site-packages/pytorch_lightning/metrics/utils.py:22
     20 from torchmetrics.utilities.data import dim_zero_mean as _dim_zero_mean
     21 from torchmetrics.utilities.data import dim_zero_sum as _dim_zero_sum
---> 22 from torchmetrics.utilities.data import get_num_classes as _get_num_classes
     23 from torchmetrics.utilities.data import select_topk as _select_topk
     24 from torchmetrics.utilities.data import to_categorical as _to_categorical

ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/home/ghostdog/anaconda3/envs/talknet/lib/python3.8/site-packages/torchmetrics/utilities/data.py)

I've tried re-installing torchmetrics version 0.6.0 using the command conda install -c conda-forge torchmetrics=0.6.0
What can I do to remedy this?

Tools to help automate data extraction

Love what you're doing for the community + the world.

We have been working on adapting this tool to work with this repo, might be helpful in ongoing research:
https://github.com/Appen/UHV-OTS-Speech

Weird audio glitches on long notes

I don't know how to contact you, so I thought this was my best bet. I've noticed that when using custom voicebanks on long words and long notes, the audio glitches out. The pony singing banks don't have this glitch, so I was wondering how to fix this in my own banks. I was also wondering if there's any way to make or edit the phoneme converter, because I was getting vowel conversion errors.

Error: UserWarning: torchaudio C++ extension is not available.

Set up everything and installed C++, but got an error on launch.

[NeMo W 2022-12-15 10:29:07 optimizers:47] Apex was not found. Using the lamb optimizer will error out.
[NeMo W 2022-12-15 10:29:07 nemo_logging:349] C:\Users\chlyw\Desktop\Talknet\miniconda\lib\site-packages\torchaudio\extension\extension.py:13: UserWarning: torchaudio C++ extension is not available.
warnings.warn('torchaudio C++ extension is not available.')

TalkNet_Training_Offline error

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
[NeMo W 2022-12-02 09:58:37 modelPT:138] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
Train config :
dataset:
_target_: nemo.collections.asr.data.audio_to_text.AudioToCharWithDursF0Dataset
manifest_filepath: H:/ControllableTalkNet/tTrump\trainfiles.json
max_duration: null
min_duration: 0.1
int_values: false
load_audio: false
normalize: false
sample_rate: 22050
trim: false
durs_file: H:/ControllableTalkNet/tTrump\durations.pt
f0_file: H:/ControllableTalkNet/tTrump\f0s.pt
blanking: true
vocab:
notation: phonemes
punct: true
spaces: true
stresses: false
add_blank_at: last
dataloader_params:
drop_last: false
shuffle: true
batch_size: 16
num_workers: 4

[NeMo W 2022-12-02 09:58:37 modelPT:145] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
Validation config :
dataset:
_target_: nemo.collections.asr.data.audio_to_text.AudioToCharWithDursF0Dataset
manifest_filepath: H:/ControllableTalkNet/tTrump\valfiles.json
max_duration: null
min_duration: 0.1
int_values: false
load_audio: false
normalize: false
sample_rate: 22050
trim: false
durs_file: H:/ControllableTalkNet/tTrump\durations.pt
f0_file: H:/ControllableTalkNet/tTrump\f0s.pt
blanking: true
vocab:
notation: phonemes
punct: true
spaces: true
stresses: false
add_blank_at: last
dataloader_params:
drop_last: false
shuffle: false
batch_size: 16
num_workers: 1

[NeMo I 2022-12-02 09:58:37 modelPT:439] Model TalkNetDursModel was successfully restored from H:\ControllableTalkNet\talknet_durs.nemo.
[NeMo I 2022-12-02 09:58:37 collections:173] Dataset loaded with 134 files totalling 0.21 hours
[NeMo I 2022-12-02 09:58:37 collections:174] 0 files were filtered totalling 0.00 hours
[NeMo I 2022-12-02 09:58:37 collections:173] Dataset loaded with 134 files totalling 0.21 hours
[NeMo I 2022-12-02 09:58:37 collections:174] 0 files were filtered totalling 0.00 hours
[NeMo W 2022-12-02 09:58:37 modelPT:660] The lightning trainer received accelerator: dp. We recommend to use 'ddp' instead.
[NeMo I 2022-12-02 09:58:37 modelPT:751] Optimizer config = Adam (
Parameter Group 0
amsgrad: False
betas: (0.9, 0.999)
eps: 1e-08
lr: 0.001
weight_decay: 1e-06
)
[NeMo I 2022-12-02 09:58:37 lr_scheduler:621] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x0000021A2DF86EB0>"
will be used during training (effective maximum steps = 180) -
Parameters :
(min_lr: 3.0e-06
warmup_ratio: 0.02
max_steps: 180
)
Warm-starting from H:\ControllableTalkNet\talknet_durs.nemo
[NeMo I 2022-12-02 09:58:37 exp_manager:216] Experiments will be logged at H:\ControllableTalkNet\tTrump\TalkNetDurs\2022-12-02_09-57-24
[NeMo I 2022-12-02 09:58:37 exp_manager:563] TensorboardLogger has been set up
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo W 2022-12-02 09:58:38 modelPT:660] The lightning trainer received accelerator: dp. We recommend to use 'ddp' instead.
[NeMo I 2022-12-02 09:58:38 modelPT:751] Optimizer config = Adam (
Parameter Group 0
amsgrad: False
betas: (0.9, 0.999)
eps: 1e-08
lr: 0.001
weight_decay: 1e-06
)
[NeMo I 2022-12-02 09:58:38 lr_scheduler:621] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x0000021A2E22DCD0>"
will be used during training (effective maximum steps = 180) -
Parameters :
(min_lr: 3.0e-06
warmup_ratio: 0.02
max_steps: 180
)

  | Name  | Type           | Params
-----------------------------------
0 | embed | Embedding      | 7.6 K
1 | model | ConvASREncoder | 2.5 M
2 | proj  | Conv1d         | 513

2.5 M Trainable params
0 Non-trainable params
2.5 M Total params
9.841 Total estimated model params size (MB)
Validation sanity check: 0% 0/2 [00:00<?, ?it/s]

PicklingError Traceback (most recent call last)
Cell In[6], line 68
66 initialize(config_path="conf")
67 cfg = compose(config_name="talknet-durs")
---> 68 train(cfg)

Cell In[6], line 62, in train(cfg)
60 exp_manager(trainer, cfg.get('exp_manager', None))
61 trainer.callbacks.extend([pl.callbacks.LearningRateMonitor(), LogEpochTimeCallback()]) # noqa
---> 62 trainer.fit(model)

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:460, in Trainer.fit(self, model, train_dataloader, val_dataloaders, datamodule)
455 # links data to the trainer
456 self.data_connector.attach_data(
457 model, train_dataloader=train_dataloader, val_dataloaders=val_dataloaders, datamodule=datamodule
458 )
--> 460 self._run(model)
462 assert self.state.stopped
463 self.training = False

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:758, in Trainer._run(self, model)
755 self.pre_dispatch()
757 # dispatch start_training or start_evaluating or start_predicting
--> 758 self.dispatch()
760 # plugin will finalized fitting (e.g. ddp_spawn will load trained model)
761 self.post_dispatch()

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:799, in Trainer.dispatch(self)
797 self.accelerator.start_predicting(self)
798 else:
--> 799 self.accelerator.start_training(self)

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\accelerators\accelerator.py:96, in Accelerator.start_training(self, trainer)
95 def start_training(self, trainer: 'pl.Trainer') -> None:
---> 96 self.training_type_plugin.start_training(trainer)

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py:144, in TrainingTypePlugin.start_training(self, trainer)
142 def start_training(self, trainer: 'pl.Trainer') -> None:
143 # double dispatch to initiate the training loop
--> 144 self._results = trainer.run_stage()

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:809, in Trainer.run_stage(self)
807 if self.predicting:
808 return self.run_predict()
--> 809 return self.run_train()

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:844, in Trainer.run_train(self)
841 if not self.is_global_zero and self.progress_bar_callback is not None:
842 self.progress_bar_callback.disable()
--> 844 self.run_sanity_check(self.lightning_module)
846 self.checkpoint_connector.has_trained = False
848 # enable train mode

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:1112, in Trainer.run_sanity_check(self, ref_model)
1109 self.on_sanity_check_start()
1111 # run eval step
-> 1112 self.run_evaluation()
1114 self.on_sanity_check_end()
1116 self.state.stage = stage

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:954, in Trainer.run_evaluation(self, on_epoch)
951 dataloader = self.accelerator.process_dataloader(dataloader)
952 dl_max_batches = self.evaluation_loop.max_batches[dataloader_idx]
--> 954 for batch_idx, batch in enumerate(dataloader):
955 if batch is None:
956 continue

File ~\anaconda3\envs\talknet\lib\site-packages\torch\utils\data\dataloader.py:355, in DataLoader.__iter__(self)
353 return self._iterator
354 else:
--> 355 return self._get_iterator()

File ~\anaconda3\envs\talknet\lib\site-packages\torch\utils\data\dataloader.py:301, in DataLoader._get_iterator(self)
299 else:
300 self.check_worker_number_rationality()
--> 301 return _MultiProcessingDataLoaderIter(self)

File ~\anaconda3\envs\talknet\lib\site-packages\torch\utils\data\dataloader.py:914, in _MultiProcessingDataLoaderIter.__init__(self, loader)
907 w.daemon = True
908 # NB: Process.start() actually take some time as it needs to
909 # start a process and pass the arguments over via a pipe.
910 # Therefore, we only add a worker to self._workers list after
911 # it started, so that we do not call .join() if program dies
912 # before it starts, and __del__ tries to join but will get:
913 # AssertionError: can only join a started process.
--> 914 w.start()
915 self._index_queues.append(index_queue)
916 self._workers.append(w)

File ~\anaconda3\envs\talknet\lib\multiprocessing\process.py:121, in BaseProcess.start(self)
118 assert not _current_process._config.get('daemon'), \
119 'daemonic processes are not allowed to have children'
120 _cleanup()
--> 121 self._popen = self._Popen(self)
122 self._sentinel = self._popen.sentinel
123 # Avoid a refcycle if the target function holds an indirect
124 # reference to the process object (see bpo-30775)

File ~\anaconda3\envs\talknet\lib\multiprocessing\context.py:224, in Process._Popen(process_obj)
222 @staticmethod
223 def _Popen(process_obj):
--> 224 return _default_context.get_context().Process._Popen(process_obj)

File ~\anaconda3\envs\talknet\lib\multiprocessing\context.py:327, in SpawnProcess._Popen(process_obj)
324 @staticmethod
325 def _Popen(process_obj):
326 from .popen_spawn_win32 import Popen
--> 327 return Popen(process_obj)

File ~\anaconda3\envs\talknet\lib\multiprocessing\popen_spawn_win32.py:93, in Popen.__init__(self, process_obj)
91 try:
92 reduction.dump(prep_data, to_child)
---> 93 reduction.dump(process_obj, to_child)
94 finally:
95 set_spawning_popen(None)

File ~\anaconda3\envs\talknet\lib\multiprocessing\reduction.py:60, in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)

PicklingError: Can't pickle <class 'nemo.collections.common.parts.preprocessing.collections.AudioTextEntity'>: attribute lookup AudioTextEntity on nemo.collections.common.parts.preprocessing.collections failed
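
A possible reading of the error above, offered as an assumption rather than a confirmed diagnosis: the traceback goes through SpawnProcess and popen_spawn_win32, so the DataLoader worker is being started with the Windows spawn method, which pickles the dataset and sends it to the new process. pickle stores a class by module and name, and raises "attribute lookup ... failed" when the class cannot be found under its own name in that module. One way that can happen is a namedtuple whose typename differs from the variable it is bound to. The sketch below reproduces that failure mode in isolation; OUTPUT_TYPE and can_pickle are hypothetical names for illustration, and nothing here calls NeMo.

```python
import collections
import pickle

# Assumption for illustration: the class is created under one name
# ("AudioTextEntity") but bound to a different module-level variable.
# pickle later looks for "<module>.AudioTextEntity" and cannot find it.
OUTPUT_TYPE = collections.namedtuple("AudioTextEntity", "id duration")


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False if pickling fails."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


entry = OUTPUT_TYPE(id=0, duration=1.0)
print("namedtuple instance picklable:", can_pickle(entry))
print("plain dict picklable:", can_pickle({"id": 0, "duration": 1.0}))
```

If this is what is happening here, setting num_workers to 0 under dataloader_params for both the train and validation configs keeps data loading in the main process, so no worker is spawned and nothing has to be pickled. That is a commonly used workaround for this class of error on Windows; it has not been verified against this setup.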