nvidia / nemo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Home Page: https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html

License: Apache License 2.0

Python 72.72% Shell 0.16% Dockerfile 0.03% Jupyter Notebook 26.95% HTML 0.02% CSS 0.01% Makefile 0.01% C++ 0.11%
machine-translation speaker-recognition asr tts generative-ai multimodal deeplearning neural-networks speaker-diarization speech-translation

nemo's Introduction

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

NVIDIA NeMo Framework

Latest News

Large Language Models and Multimodal
NVIDIA releases 340B base, instruct, and reward models pretrained on a total of 9T tokens (2024/06/18). See documentation and tutorials for SFT, PEFT, and PTQ with Nemotron 340B in the NeMo Framework User Guide.

NVIDIA sets new generative AI performance and scale records in MLPerf Training v4.0 (2024/06/12) Using the NVIDIA NeMo Framework and NVIDIA Hopper GPUs, NVIDIA was able to scale to 11,616 H100 GPUs and achieve near-linear performance scaling on LLM pretraining. NVIDIA also achieved the highest LLM fine-tuning performance and raised the bar for text-to-image training.

Accelerate your generative AI journey with NVIDIA NeMo Framework on GKE (2024/03/16) An end-to-end walkthrough to train generative AI models on the Google Kubernetes Engine (GKE) using the NVIDIA NeMo Framework is available at https://github.com/GoogleCloudPlatform/nvidia-nemo-on-gke. The walkthrough includes detailed instructions on how to set up a Google Cloud Project and pre-train a GPT model using the NeMo Framework.

Bria Builds Responsible Generative AI for Enterprises Using NVIDIA NeMo, Picasso (2024/03/06) Bria, a Tel Aviv startup at the forefront of visual generative AI for enterprises, now leverages the NVIDIA NeMo Framework. The Bria.ai platform uses reference implementations from the NeMo Multimodal collection, trained on NVIDIA Tensor Core GPUs, to enable high-throughput and low-latency image generation. Bria has also adopted NVIDIA Picasso, a foundry for visual generative AI models, to run inference.

New NVIDIA NeMo Framework Features and NVIDIA H200 (2023/12/06) NVIDIA NeMo Framework now includes several optimizations and enhancements, including: 1) Fully Sharded Data Parallelism (FSDP) to improve the efficiency of training large-scale AI models, 2) Mixture of Experts (MoE)-based LLM architectures with expert parallelism for efficient LLM training at scale, 3) Reinforcement Learning from Human Feedback (RLHF) with TensorRT-LLM for inference stage acceleration, and 4) up to 4.2x speedups for Llama 2 pre-training on NVIDIA H200 Tensor Core GPUs.

(Figure: NeMo Framework pre-training performance on NVIDIA H200 Tensor Core GPUs)

NVIDIA now powers training for Amazon Titan Foundation models (2023/11/28) NVIDIA NeMo Framework now empowers the Amazon Titan foundation models (FM) with efficient training of large language models (LLMs). The Titan FMs form the basis of Amazon’s generative AI service, Amazon Bedrock. The NeMo Framework provides a versatile framework for building, customizing, and running LLMs.

Speech Recognition
New Standard for Speech Recognition and Translation from the NVIDIA NeMo Canary Model (2024/04/18) The NeMo team just released Canary, a multilingual model that transcribes speech in English, Spanish, German, and French with punctuation and capitalization. Canary also provides bi-directional translation between English and the three other supported languages.

Pushing the Boundaries of Speech Recognition with NVIDIA NeMo Parakeet ASR Models (2024/04/18) NVIDIA NeMo, an end-to-end platform for the development of multimodal generative AI models at scale anywhere—on any cloud and on-premises—released the Parakeet family of automatic speech recognition (ASR) models. These state-of-the-art ASR models, developed in collaboration with Suno.ai, transcribe spoken English with exceptional accuracy.

Turbocharge ASR Accuracy and Speed with NVIDIA NeMo Parakeet-TDT (2024/04/18) NVIDIA NeMo, an end-to-end platform for developing multimodal generative AI models at scale anywhere—on any cloud and on-premises—recently released Parakeet-TDT. This new addition to the NeMo ASR Parakeet model family boasts better accuracy and 64% greater speed than the previous best model, Parakeet-RNNT-1.1B.

Introduction

NVIDIA NeMo Framework is a scalable and cloud-native generative AI framework built for researchers and PyTorch developers working on Large Language Models (LLMs), Multimodal Models (MMs), Automatic Speech Recognition (ASR), Text to Speech (TTS), and Computer Vision (CV) domains. It is designed to help you efficiently create, customize, and deploy new generative AI models by leveraging existing code and pre-trained model checkpoints.

For technical documentation, please see the NeMo Framework User Guide.

LLMs and MMs Training, Alignment, and Customization

All NeMo models are trained with Lightning. Training is automatically scalable to thousands of GPUs.

When applicable, NeMo models leverage cutting-edge distributed training techniques, incorporating parallelism strategies to enable efficient training of very large models. These techniques include Tensor Parallelism (TP), Pipeline Parallelism (PP), Fully Sharded Data Parallelism (FSDP), Mixture-of-Experts (MoE), and Mixed Precision Training with BFloat16 and FP8, as well as others.

NeMo Transformer-based LLMs and MMs utilize NVIDIA Transformer Engine for FP8 training on NVIDIA Hopper GPUs, while leveraging NVIDIA Megatron Core for scaling Transformer model training.
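
The following is a minimal, illustrative sketch of the Transformer Engine FP8 API that NeMo builds on; it is not NeMo-specific code, and it assumes an FP8-capable GPU (e.g., Hopper) with the transformer_engine package installed.

import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Define an FP8 recipe (delayed scaling) and a single Transformer Engine linear layer.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)
layer = te.Linear(1024, 1024, bias=True).cuda()

# Run the forward pass under FP8 autocast; NeMo and Megatron Core wire this up internally.
x = torch.randn(16, 1024, device="cuda", dtype=torch.bfloat16)
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)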

NeMo LLMs can be aligned with state-of-the-art methods such as SteerLM, Direct Preference Optimization (DPO), and Reinforcement Learning from Human Feedback (RLHF). See NVIDIA NeMo Aligner for more information.

In addition to supervised fine-tuning (SFT), NeMo also supports the latest parameter-efficient fine-tuning (PEFT) techniques such as LoRA, P-Tuning, Adapters, and IA3. Refer to the NeMo Framework User Guide for the full list of supported models and techniques.

LLMs and MMs Deployment and Optimization

NeMo LLMs and MMs can be deployed and optimized with NVIDIA NeMo Microservices.

Speech AI

NeMo ASR and TTS models can be optimized for inference and deployed for production use cases with NVIDIA Riva.

NeMo Framework Launcher

NeMo Framework Launcher is a cloud-native tool that streamlines the NeMo Framework experience. It is used for launching end-to-end NeMo Framework training jobs on CSPs and Slurm clusters.

The NeMo Framework Launcher includes extensive recipes, scripts, utilities, and documentation for training NeMo LLMs. It also includes the NeMo Framework Autoconfigurator, which is designed to find the optimal model parallel configuration for training on a specific cluster.

To get started quickly with the NeMo Framework Launcher, please see the NeMo Framework Playbooks. The NeMo Framework Launcher does not currently support ASR and TTS training, but it will soon.

Get Started with NeMo Framework

Getting started with NeMo Framework is easy. State-of-the-art pretrained NeMo models are freely available on Hugging Face Hub and NVIDIA NGC. These models can be used to generate text or images, transcribe audio, and synthesize speech in just a few lines of code.
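
For example, a pretrained ASR checkpoint can be loaded and used for transcription in a few lines of Python. This is a minimal sketch: the model name and audio path below are placeholders and can be replaced with any checkpoint listed on NGC or Hugging Face Hub.

import nemo.collections.asr as nemo_asr

# Download and restore a pretrained ASR model (the model name here is just an example).
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")

# Transcribe a local audio file (replace sample.wav with your own 16 kHz WAV file).
transcriptions = asr_model.transcribe(["sample.wav"])
print(transcriptions)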

We have extensive tutorials that can be run on Google Colab or with our NGC NeMo Framework Container. We also have playbooks for users who want to train NeMo models with the NeMo Framework Launcher.

For advanced users who want to train NeMo models from scratch or fine-tune existing NeMo models, we have a full suite of example scripts that support multi-GPU/multi-node training.

Key Features

Requirements

  • Python 3.10 or above
  • PyTorch 1.13.1 or above
  • NVIDIA GPU (if you intend to do model training)

Developer Documentation

  • Latest: documentation of the latest (i.e., main) branch.
  • Stable: documentation of the stable (i.e., most recent release) branch.

Install NeMo Framework

The NeMo Framework can be installed in a variety of ways. Depending on your needs and the domain you work in, one of the following installation methods may be more suitable than the others.

  • Conda / Pip - Refer to Conda and Pip for installation instructions.
    • This is the recommended method for ASR and TTS domains.
    • When using an NVIDIA PyTorch container as the base, this is the recommended method for all domains.
  • Docker Containers - Refer to Docker containers for installation instructions.
    • NeMo Framework container - nvcr.io/nvidia/nemo:24.05
  • LLMs and MMs Dependencies - Refer to LLMs and MMs Dependencies for installation instructions.

Important: We strongly recommend that you start with a base NVIDIA PyTorch container: nvcr.io/nvidia/pytorch:24.02-py3.

Conda

Install NeMo in a fresh Conda environment:

conda create --name nemo python==3.10.12
conda activate nemo

Install PyTorch using their configurator:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

The command to install PyTorch may depend on your system. Use the configurator linked above to find the right command for your system.

Then, install NeMo via Pip or from Source. We do not provide NeMo on the conda-forge or any other Conda channel.

Pip

To install the nemo_toolkit, use the following installation method:

apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython packaging
pip install nemo_toolkit['all']

Depending on the shell you use, you may need to quote the specifier in the above command, for example: pip install "nemo_toolkit[all]".

Pip from a Specific Domain

To install a specific domain of NeMo, you must first install the nemo_toolkit using the instructions listed above. Then, run the following domain-specific commands:

pip install nemo_toolkit['asr']
pip install nemo_toolkit['nlp']
pip install nemo_toolkit['tts']
pip install nemo_toolkit['vision']
pip install nemo_toolkit['multimodal']

Pip from a Source Branch

If you want to work with a specific version of NeMo from a particular GitHub branch (e.g., main), use the following installation method:

apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython packaging
python -m pip install git+https://github.com/NVIDIA/NeMo.git@{BRANCH}#egg=nemo_toolkit[all]

Build from Source

If you want to clone the NeMo GitHub repository and contribute to NeMo open-source development work, use the following installation method:

apt-get update && apt-get install -y libsndfile1 ffmpeg
git clone https://github.com/NVIDIA/NeMo
cd NeMo
./reinstall.sh

If you only want the toolkit without the additional Conda-based dependencies, you can replace reinstall.sh with pip install -e . when your PWD is the root of the NeMo repository.

Mac Computers with Apple Silicon

To install NeMo on Mac computers with the Apple M-Series GPU, you need to create a new Conda environment, install PyTorch 2.0 or higher, and then install the nemo_toolkit.

Important: This method is only applicable to the ASR domain.

Run the following code:

# [optional] install mecab using Homebrew, to use sacrebleu for NLP collection
# you can install Homebrew here: https://brew.sh
brew install mecab

# [optional] install pynini using Conda, to use text normalization
conda install -c conda-forge pynini

# install Cython manually
pip install cython packaging

# clone the repo and install in development mode
git clone https://github.com/NVIDIA/NeMo
cd NeMo
pip install 'nemo_toolkit[all]'

# Note that only the ASR toolkit is guaranteed to work on MacBook - so for MacBook use pip install 'nemo_toolkit[asr]'

Windows Computers

To install the Windows Subsystem for Linux (WSL), run the following code in PowerShell:

wsl --install
# [note] If you run wsl --install and see the WSL help text, it means WSL is already installed.

To learn more about installing WSL, refer to Microsoft's official documentation.

After installing your Linux distribution with WSL, two options are available:

Option 1: Open the distribution (Ubuntu by default) from the Start menu and follow the instructions.

Option 2: Launch the Terminal application. Download it from Microsoft's Windows Terminal page if not installed.

Next, follow the instructions for Linux systems, as provided above. For example:

apt-get update && apt-get install -y libsndfile1 ffmpeg
git clone https://github.com/NVIDIA/NeMo
cd NeMo
./reinstall.sh

RNNT

For optimal performance of a Recurrent Neural Network Transducer (RNNT), install the Numba package from Conda.

Run the following code:

conda remove numba
pip uninstall numba
conda install -c conda-forge numba
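
After reinstalling Numba, you can confirm that a CUDA-capable Numba is visible to Python. This is a small sanity check, not an official NeMo utility:

import numba
from numba import cuda

# NeMo's RNNT loss relies on Numba's CUDA backend; both checks should succeed.
print("Numba version:", numba.__version__)
print("Numba CUDA available:", cuda.is_available())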

Install LLMs and MMs Dependencies

If you work with the LLM and MM domains, three additional dependencies are required: NVIDIA Apex, NVIDIA Transformer Engine, and NVIDIA Megatron Core. When working with the main branch, these dependencies may require a recent commit.

The most recent working versions of these dependencies are here:

export apex_commit=810ffae374a2b9cb4b5c5e28eaeca7d7998fca0c
export te_commit=bfe21c3d68b0a9951e5716fb520045db53419c5e
export mcore_commit=02871b4df8c69fac687ab6676c4246e936ce92d0
export nv_pytorch_tag=24.02-py3

When using a released version of NeMo, please refer to the Software Component Versions for the correct versions.

PyTorch Container

We recommend that you start with a base NVIDIA PyTorch container: nvcr.io/nvidia/pytorch:24.02-py3.

If starting with a base NVIDIA PyTorch container, you must first launch the container:

docker run \
  --gpus all \
  -it \
  --rm \
  --shm-size=16g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  nvcr.io/nvidia/pytorch:$nv_pytorch_tag

Next, you need to install the dependencies.

Apex

NVIDIA Apex is required for LLM and MM domains. Although Apex is pre-installed in the NVIDIA PyTorch container, you may need to update it to a newer version.

To install Apex, run the following code:

git clone https://github.com/NVIDIA/apex.git
cd apex
git checkout $apex_commit
pip install . -v --no-build-isolation --disable-pip-version-check --no-cache-dir --config-settings "--build-option=--cpp_ext --cuda_ext --fast_layer_norm --distributed_adam --deprecated_fused_adam --group_norm"

When attempting to install Apex separately from the NVIDIA PyTorch container, you might encounter an error if the CUDA version on your system is different from the one used to compile PyTorch. To bypass this error, you can comment out the relevant line in the setup file located in the Apex repository on GitHub here: https://github.com/NVIDIA/apex/blob/master/setup.py#L32.

cuda-nvprof is needed to install Apex. The version should match the CUDA version that you are using.

To install cuda-nvprof, run the following code:

conda install -c nvidia cuda-nvprof=11.8

Finally, install packaging:

pip install packaging

To install the most recent versions of Apex locally, it might be necessary to remove the pyproject.toml file from the Apex directory.

Transformer Engine

NVIDIA Transformer Engine is required for LLM and MM domains. Although the Transformer Engine is pre-installed in the NVIDIA PyTorch container, you may need to update it to a newer version.

The Transformer Engine facilitates training with FP8 precision on NVIDIA Hopper GPUs and introduces many enhancements for the training of Transformer-based models. Refer to Transformer Engine for more information.

To install Transformer Engine, run the following code:

git clone https://github.com/NVIDIA/TransformerEngine.git && \
cd TransformerEngine && \
git checkout $te_commit && \
git submodule init && git submodule update && \
NVTE_FRAMEWORK=pytorch NVTE_WITH_USERBUFFERS=1 MPI_HOME=/usr/local/mpi pip install .

Transformer Engine requires PyTorch to be built with at least CUDA 11.8.
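
A quick way to check which CUDA version your PyTorch build was compiled against (a simple sanity check, not part of the official installation steps):

import torch

# Transformer Engine needs PyTorch built with CUDA 11.8 or newer.
print("PyTorch:", torch.__version__)
print("Built with CUDA:", torch.version.cuda)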

Megatron Core

Megatron Core is required for LLM and MM domains. Megatron Core is a library for scaling large Transformer-based models. NeMo LLMs and MMs leverage Megatron Core for model parallelism, transformer architectures, and optimized PyTorch datasets.

To install Megatron Core, run the following code:

git clone https://github.com/NVIDIA/Megatron-LM.git && \
cd Megatron-LM && \
git checkout $mcore_commit && \
pip install . && \
cd megatron/core/datasets && \
make

NeMo Text Processing

NeMo Text Processing, specifically Inverse Text Normalization, is now a separate repository. It is located here: https://github.com/NVIDIA/NeMo-text-processing.
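
If you need inverse text normalization, install that package separately and use it directly. The sketch below assumes the nemo_text_processing package and its InverseNormalizer API, which may differ between releases:

from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer

# Convert spoken-form text to written form, e.g. "twenty three dollars" -> "$23".
inverse_normalizer = InverseNormalizer(lang="en")
print(inverse_normalizer.inverse_normalize("twenty three dollars", verbose=False))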

Docker Containers

NeMo containers are launched concurrently with NeMo version updates. NeMo Framework now supports LLMs, MMs, ASR, and TTS in a single consolidated Docker container. You can find additional information about released containers on the NeMo releases page.

To use a pre-built container, run the following code:

docker pull nvcr.io/nvidia/nemo:24.05

To build a nemo container with Dockerfile from a branch, run the following code:

DOCKER_BUILDKIT=1 docker build -f Dockerfile -t nemo:latest .

If you choose to work with the main branch, we recommend using NVIDIA's PyTorch container version 23.10-py3 and then installing from GitHub.

docker run --gpus all -it --rm -v <nemo_github_folder>:/NeMo --shm-size=8g \
-p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 \
--device=/dev/snd nvcr.io/nvidia/pytorch:23.10-py3

Future Work

The NeMo Framework Launcher does not currently support ASR and TTS training, but it will soon.

Discussions Board

FAQ can be found on the NeMo Discussions board. You are welcome to ask questions or start discussions on the board.

Contribute to NeMo

We welcome community contributions! Please refer to CONTRIBUTING.md for the process.

Publications

We provide an ever-growing list of publications that utilize the NeMo Framework.

To contribute an article to the collection, please submit a pull request to the gh-pages-src branch of this repository. For detailed information, please consult the README located at the gh-pages-src branch.

Licenses

nemo's People

Contributors

akoumpa, anteju, arendu, blisc, borisfom, chiphuyen, cuichenx, drnikolaev, ekmb, ericharper, fayejf, github-actions[bot], maximumentropy, michalivne, nithinraok, okuchaiev, pablo-garay, redoctopus, rlangman, seannaren, stevehuang52, tango4j, titu1994, tkornuta-nvidia, vahidoox, vsl9, xuesongyang, yaoyu-33, yidong72, yzhang123


nemo's Issues

jasper_inference seems very slow

Hi,
I use jasper_infer.py on my desktop and follow the tutorial.
But the inference speed seems very slow. The beam size is 100.
I use my own evaluation dataset, which has 14000 samples.
I tested the CTC decoder with the language model alone and the speed seems faster.
I'm wondering how I can debug this to find the problem.

Another thing is that when I use jasper_infer.py on NGC, I find there is no output in the log after it shows:

2019-10-11 05:31:57,269 - WARNING - No batch_size specified in the data layer. Setting batch_size to 1.
2019-10-11 05:31:57,378 - WARNING - When constructing AudioToTextDataLayer. The base NeuralModule class received the following unused arguments:
2019-10-11 05:31:57,378 - WARNING - dict_keys(['batch_size'])
2019-10-11 05:31:59,696 - INFO - Dataset loaded with 18.09 hours. Filtered 0.00 hours.
2019-10-11 05:31:59,696 - INFO - Evaluating 14326 examples
2019-10-11 05:31:59,699 - INFO - PADDING: 16
2019-10-11 05:31:59,699 - INFO - STFT using conv
2019-10-11 05:32:45,601 - INFO - ================================
2019-10-11 05:32:45,603 - INFO - Number of parameters in encoder: 18894656
2019-10-11 05:32:45,603 - INFO - Number of parameters in decoder: 4406475
2019-10-11 05:32:45,604 - INFO - Total number of parameters in decoder: 23301131
2019-10-11 05:32:45,604 - INFO - ================================
2019-10-11 05:32:45,946 - INFO - Restoring JasperEncoder from /nemo_project/nemo_projects/aishell/checkpoint/JasperEncoder-STEP-72000.pt
2019-10-11 05:32:46,803 - INFO - Restoring JasperDecoderForCTC from /nemo_project/nemo_projects/aishell/checkpoint/JasperDecoderForCTC-STEP-72000.pt

It seems it is waiting for something ...

Running on CPU

Hi,

I am currently trying to run 'simplest_example.py' on a CPU within a docker container.

I have tried modifying the code to run on CPU by passing:

  • "placement=DeviceType.CPU" to the Factory which produces an Error regarding CUDA:

Traceback (most recent call last):
File "simplest_example.py", line 27, in
optimizer="sgd")
File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/core/neural_factory.py", line 526, in train
stop_on_nan_loss=stop_on_nan_loss)
File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/backends/pytorch/actions.py", line 1022, in train
'amp_min_loss_scale', 1.0))
File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/backends/pytorch/actions.py", line 359, in __initialize_amp
opt_level=AmpOptimizations[optim_level],
File "/opt/conda/lib/python3.6/site-packages/apex/amp/frontend.py", line 358, in initialize
return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 170, in _initialize
check_params_fp32(models)
File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 92, in check_params_fp32
name, param.type()))
File "/opt/conda/lib/python3.6/site-packages/apex/amp/_amp_state.py", line 32, in warn_or_err
raise RuntimeError(msg)
RuntimeError: Found param fc1.weight with type torch.FloatTensor, expected torch.cuda.FloatTensor.
When using amp.initialize, you need to provide a model with parameters
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.

To fix that issue I additionally passed:

  • 'optimization_level=1' to prevent Apex from being called, which returned:

2019-10-11 09:32:10,688 - WARNING - Data Layer does not have any weights to return. This get_weights call returns None.
Starting .....
Starting epoch 0
Traceback (most recent call last):
File "simplest_example.py", line 27, in
optimizer="sgd")
File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/core/neural_factory.py", line 526, in train
stop_on_nan_loss=stop_on_nan_loss)
File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/backends/pytorch/actions.py", line 1184, in train
final_loss.get_device()))
RuntimeError: Device index must not be negative

How do I run the example on CPU? Thanks.

How to use pretrained models?

I downloaded the pretrained models Aishell2 Jasper 10x5dr and QuartzNet15x5. There is an error when I use them:

NeMo-master/examples/asr$ python3 jasper_aishell_infer.py --eval_datasets ../../data/test.json --vocab_file aishell2_quartznet15x5/vocab.txt
2019-12-30 15:11:43,318 - INFO - Dataset loaded with 0.01 hours. Filtered 0.00 hours.
2019-12-30 15:11:43,318 - INFO - Evaluating 10 examples
2019-12-30 15:11:43,319 - INFO - PADDING: 16
2019-12-30 15:11:43,319 - INFO - STFT using conv
a 12
2019-12-30 15:11:48,699 - INFO - ================================
2019-12-30 15:11:48,700 - INFO - Number of parameters in encoder: 332602624
2019-12-30 15:11:48,700 - INFO - Number of parameters in decoder: 5337175
2019-12-30 15:11:48,701 - INFO - Total number of parameters in decoder: 337939799
2019-12-30 15:11:48,701 - INFO - ================================
2019-12-30 15:11:48,704 - INFO - Restoring JasperEncoder from ./aishell2_jasper10x5dr/JasperEncoder-STEP-394050.pt
Traceback (most recent call last):
  File "jasper_aishell_infer.py", line 260, in <module>
    main()
  File "jasper_aishell_infer.py", line 212, in main
    checkpoint_dir=load_dir,
  File "/usr/local/lib/python3.6/dist-packages/nemo/core/neural_factory.py", line 687, in infer
    modules_to_restore=modules_to_restore)
  File "/usr/local/lib/python3.6/dist-packages/nemo/backends/pytorch/actions.py", line 1545, in infer
    mod.restore_from(checkpoint, self._local_rank)
  File "/usr/local/lib/python3.6/dist-packages/nemo/backends/pytorch/nm.py", line 111, in restore_from
    self.load_state_dict(t.load(path, map_location=load_device))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 839, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for JasperEncoder:
	Missing key(s) in state_dict: "encoder.0.mconv.0.conv.weight", "encoder.0.mconv.1.weight", "encoder.0.mconv.1.bias", "encoder.0.mconv.1.running_mean", "encoder.0.mconv.1.running_var", "encoder.1.mconv.0.conv.weight", "encoder.1.mconv.1.weight", "encoder.1.mconv.1.bias", "encoder.1.mconv.1.running_mean", "encoder.1.mconv.1.running_var", "encoder.1.mconv.4.conv.weight", "encoder.1.mconv.5.weight", "encoder.1.mconv.5.bias", "encoder.1.mconv.5.running_mean", "encoder.1.mconv.5.running_var", "encoder.1.mconv.8.conv.weight", "encoder.1.mconv.9.weight", "encoder.1.mconv.9.bias", "encoder.1.mconv.9.running_mean", "encoder.1.mconv.9.running_var", "encoder.1.mconv.12.conv.weight", "encoder.1.mconv.13.weight", "encoder.1.mconv.13.bias", .......

Some parameters I set:

parser.add_argument("--model_config",default="./aishell2_jasper10x5dr/jasper10x5dr.yaml", type=str)
parser.add_argument("--load_dir",default='./aishell2_jasper10x5dr/', type=str)

nemo_nlp.utils not found

Hi,

After following the instructions for pre-training and installation, I get the following error:

nemo_nlp.utils not found

There are following two cases:

  1. Installing nemo_nlp with pip install nemo_nlp doesn't create the utils directory in "lib/python3.6/site-packages".

  2. Installing nemo_nlp by cloning the git link and then running setup.py creates the utils directory but it still says "nemo_nlp.utils not found"

Thanks!

examples/nlp/ner.py isn't working

I've just sent a pull request to fix the issue. The model is training and achieves high accuracy and F1 score.
But I still see the following warning:
'WARNING - Data Layer does not have any weights to return. This get_weights call returns None.'

Pre-trained models are no longer compatible with new model architecture for ASR

The pre-trained models have effectively become unusable since updates were made to JasperEncoder, or most probably to the jasper.py module.

Example pre-trained model: https://ngc.nvidia.com/catalog/models/nvidia:quartznet15x5

Error on trying to load the same:

jasper_encoder = nemo_asr.JasperEncoder(
    jasper=jasper_model_definition['JasperEncoder']['jasper'],
    activation=jasper_model_definition['JasperEncoder']['activation'],
    feat_in=jasper_model_definition['AudioToMelSpectrogramPreprocessor']['features'])

jasper_encoder.restore_from(CHECKPOINT_ENCODER, local_rank=0)
RuntimeError: Error(s) in loading state_dict for JasperEncoder:
	Missing key(s) in state_dict: "encoder.0.mconv.0.conv.weight", "encoder.0.mconv.1.conv.weight", "encoder.0.mconv.2.weight", "encoder.0.mconv.2.bias", "encoder.0.mconv.2.running_mean", "encoder.0.mconv.2.running_var", "encoder.1.mconv.0.conv.weight", "encoder.1.mconv.1.conv.weight", "encoder.1.mconv.2.weight", "encoder.1.mconv.2.bias", "encoder.1.mconv.2.running_mean", "encoder.1.mconv.2.running_var", "encoder.1.mconv.5.conv.weight", "encoder.1.mconv.6.conv.weight", "encoder.1.mconv.7.weight", "encoder.1.mconv.7.bias", "encoder.1.mconv.7.running_mean", "encoder.1.mconv.7.running_var", "encoder.1.mconv.10.conv.weight", "encoder.1.mconv.11.conv.weight", "encoder.1.mconv.12.weight", "encoder.1.mconv.12.bias", "encoder.1.mconv.12.running_mean", "encoder.1.mconv.12.running_var", "encoder.1.mconv.15.conv.weight", "encoder.1.mconv.16.conv.weight", "encoder.1.mconv.17.weight", "encoder.1.mconv.17.bias", "encoder.1.mconv.17.running_mean", "encoder.1.mconv.17.running_var", "encoder.1.mconv.20.conv.weight", "encoder.1.mconv.21.conv.weight", "encoder.1.mconv.22.weight", "encoder.1.mconv.22.bias", "encoder.1.mconv.22.running_mean", "encoder.1.mconv.22.running_var", "encoder.1.res.0.0.conv.weight", "encoder.2.mconv.0.conv.weight", "encoder.2.mconv.1.conv.weight", "encoder.2.mconv.2.weight", "encoder.2.mconv.2.bias", "encoder.2.mconv.2.running_mean", "encoder.2.mconv.2.running_var", "encoder.2.mconv.5.conv.weight", "encoder.2.mconv.6.conv.weight", "encoder.2.mconv.7.weight", "encoder.2.m...
	Unexpected key(s) in state_dict: "encoder.0.conv.0.weight", "encoder.0.conv.1.weight", "encoder.0.conv.2.weight", "encoder.0.conv.2.bias", "encoder.0.conv.2.running_mean", "encoder.0.conv.2.running_var", "encoder.0.conv.2.num_batches_tracked", "encoder.1.conv.0.weight", "encoder.1.conv.1.weight", "encoder.1.conv.2.weight", "encoder.1.conv.2.bias", "encoder.1.conv.2.running_mean", "encoder.1.conv.2.running_var", "encoder.1.conv.2.num_batches_tracked", "encoder.1.conv.5.weight", "encoder.1.conv.6.weight", "encoder.1.conv.7.weight", "encoder.1.conv.7.bias", "encoder.1.conv.7.running_mean", "encoder.1.conv.7.running_var", "encoder.1.conv.7.num_batches_tracked", "encoder.1.conv.10.weight", "encoder.1.conv.11.weight", "encoder.1.conv.12.weight", "encoder.1.conv.12.bias", "encoder.1.conv.12.running_mean", "encoder.1.conv.12.running_var", "encoder.1.conv.12.num_batches_tracked", "encoder.1.conv.15.weight", "encoder.1.conv.16.weight", "encoder.1.conv.17.weight", "encoder.1.conv.17.bias", "encoder.1.conv.17.running_mean", "encoder.1.conv.17.running_var", "encoder.1.conv.17.num_batches_tracked", "encoder.1.conv.20.weight", "encoder.1.conv.21.weight", "encoder.1.conv.22.weight", "encoder.1.conv.22.bias", "encoder.1.conv.22.running_mean", "encoder.1.conv.22.running_var", "encoder.1.conv.22.num_batches_tracked", "encoder.1.res.0.0.weight", "encoder.2.conv.0.weight", "encoder.2.conv.1.weight", "encoder.2.conv.2.weight", "encoder.2.conv.2.bias", "encoder.2.conv.2.running_mean", "encoder....

Is there any way to make this work with the older models or to get newer, compatible pre-trained models? @okuchaiev

Which labels are required in the tacotron2.yaml configuration for non-English languages [TTS]?

Dear Team,

I would suppose that the original letters and characters in UTF-8 encoding are required in the labels of the Tacotron configuration file for non-English languages, but as I see in your sample for Mandarin Chinese (tacotron2_mandarin.yaml), Latin characters are used.

labels: [' ', '!', ',', '.', '?', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4']

Is this the right approach?

KeyError: 'EvalLoss'

I am getting this EvalLoss key error when trying to do a training/validation run using the ASR tutorial. See the command and the output below. Training seems to work ok but not the evaluation step.

The same error occurs when trying to follow the notebook in the ASR tutorial. Any suggestions on how to fix this? I am using the Docker container that I pulled with this:

docker pull nvcr.io/nvidia/nemo:v0.9

I made sure I had the latest nemo toolkit and nemo asr modules by running pip install.

================================
Here is the terminal output:

root@9bb9ab3869fc:/workspace/nemo_examples/asr# python -m torch.distributed.launch --nproc_per_node=2 /workspace/nemo_examples/asr/jasper.py --batch_size=64 --num_epochs=100 --lr=0.015 --warmup_steps=8000 --weight_decay=0.001 --train_dataset=/home/pakh0002/data/train-manifests/an4_train_manifest.json --eval_datasets /home/pakh0002/data/test-manifests/an4_test_manifest.json --model_config=/workspace/nemo_examples/asr/configs/quartznet15x5.yaml --exp_name=MyLARGE-ASR-EXPERIMENT


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


/opt/conda/lib/python3.6/site-packages/torchvision/io/_video_opt.py:17: UserWarning: video reader based on ffmpeg c++ ops not available
warnings.warn("video reader based on ffmpeg c++ ops not available")
/opt/conda/lib/python3.6/site-packages/torchvision/io/_video_opt.py:17: UserWarning: video reader based on ffmpeg c++ ops not available
warnings.warn("video reader based on ffmpeg c++ ops not available")
Could not import torchaudio. Some features might not work.
Could not import torchaudio. Some features might not work.
2019-12-17 18:39:51,009 - INFO - Doing ALL GPU
2019-12-17 18:39:51,300 - INFO - Dataset loaded with 0.71 hours. Filtered 0.00 hours.
2019-12-17 18:39:51,300 - INFO - Parallelizing DATALAYER
2019-12-17 18:39:51,300 - INFO - Have 948 examples to train on.
2019-12-17 18:39:51,301 - INFO - PADDING: 16
2019-12-17 18:39:51,301 - INFO - STFT using conv
2019-12-17 18:39:51,382 - INFO - Dataset loaded with 0.00 hours. Filtered 0.00 hours.
2019-12-17 18:39:51,382 - INFO - Parallelizing DATALAYER
2019-12-17 18:39:51,752 - INFO - ================================
2019-12-17 18:39:51,754 - INFO - Number of parameters in encoder: 18894656
2019-12-17 18:39:51,754 - INFO - Number of parameters in decoder: 29725
2019-12-17 18:39:51,756 - INFO - Total number of parameters in decoder: 18924381
2019-12-17 18:39:51,756 - INFO - ================================
2019-12-17 18:39:51,824 - WARNING - Data Layer does not have any weights to return. This get_weights call returns None.
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
2019-12-17 18:39:51,835 - INFO - Doing distributed training
2019-12-17 18:39:51,858 - INFO - Starting .....
2019-12-17 18:39:51,862 - INFO - Found 2 modules with weights:
2019-12-17 18:39:51,862 - INFO - JasperDecoderForCTC
2019-12-17 18:39:51,862 - INFO - JasperEncoder
2019-12-17 18:39:51,862 - INFO - Total model parameters: 18924381
2019-12-17 18:39:51,862 - INFO - Restoring checkpoint from folder MyLARGE-ASR-EXPERIMENT-lr_0.015-bs_64-e_100-wd_0.001-opt_novograd-ips_1/checkpoints ...
2019-12-17 18:39:51,864 - WARNING - For module JasperEncoder, no file matches in MyLARGE-ASR-EXPERIMENT-lr_0.015-bs_64-e_100-wd_0.001-opt_novograd-ips_1/checkpoints
2019-12-17 18:39:51,864 - WARNING - Checkpoint folder MyLARGE-ASR-EXPERIMENT-lr_0.015-bs_64-e_100-wd_0.001-opt_novograd-ips_1/checkpoints present but did not restore
2019-12-17 18:39:51,864 - INFO - Starting epoch 0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
2019-12-17 18:39:56,417 - INFO - Step: 0
2019-12-17 18:39:56,421 - INFO - Loss: 380.2557373046875
2019-12-17 18:39:56,421 - INFO - training_batch_WER: inf%
2019-12-17 18:39:56,422 - INFO - Prediction: ZJWJBITBIZBWBSJWBJZBJQZQBJB BZQVBBQBPBPWIPBNGBYPWQBQBDBDQBWBPBWWBZBIQBBN
2019-12-17 18:39:56,422 - INFO - Reference:
2019-12-17 18:39:56,422 - INFO - Step time: 2.671168327331543 seconds
2019-12-17 18:39:56,422 - INFO - Doing Evaluation ..............................
Traceback (most recent call last):
File "/workspace/nemo_examples/asr/jasper.py", line 309, in
main()
File "/workspace/nemo_examples/asr/jasper.py", line 305, in main
batches_per_step=args.iter_per_step)
File "/opt/conda/lib/python3.6/site-packages/nemo/core/neural_factory.py", line 616, in train
gradient_predivide=gradient_predivide)
File "/opt/conda/lib/python3.6/site-packages/nemo/backends/pytorch/actions.py", line 1512, in train
self._perform_on_iteration_end(callbacks=callbacks)
File "/opt/conda/lib/python3.6/site-packages/nemo/core/neural_factory.py", line 198, in _perform_on_iteration_end
callback.on_iteration_end()
File "/opt/conda/lib/python3.6/site-packages/nemo/core/callbacks.py", line 435, in on_iteration_end
self.action._eval(self._eval_tensors, self, step)
File "/opt/conda/lib/python3.6/site-packages/nemo/backends/pytorch/actions.py", line 709, in _eval
callback._global_var_dict)
File "/opt/conda/lib/python3.6/site-packages/nemo_asr/helpers.py", line 154, in process_evaluation_epoch
eloss = torch.mean(torch.stack(global_vars['EvalLoss'])).item()
KeyError: 'EvalLoss'
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 253, in
main()
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 249, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', '/workspace/nemo_examples/asr/jasper.py', '--local_rank=1', '--batch_size=64', '--num_epochs=100', '--lr=0.015', '--warmup_steps=8000', '--weight_decay=0.001', '--train_dataset=/home/pakh0002/data/train-manifests/an4_train_manifest.json', '--eval_datasets', '/home/pakh0002/data/test-manifests/an4_test_manifest.json', '--model_config=/workspace/nemo_examples/asr/configs/quartznet15x5.yaml', '--exp_name=MyLARGE-ASR-EXPERIMENT']' returned non-zero exit status 1.

ERROR: module export failed for JasperEncoder with exception number of output names provided (2) exceeded number of outputs (1)

Hello, I tried to train my own Mandarin ASR model with the open corpus AISHELL-1, and everything seemed right. The config file I used is located in examples/asr/configs/quartznet10x5.yaml, but when I attempted to convert the temporary JasperEncoder-STEP-30000.pt and JasperDecoderForCTC-STEP-30000.pt to ONNX format using the scripts/export_jasper_to_onnx.py script, an error occurred when converting the encoder .pt file. Some logs are:

Loading config file...
Determining model shape...
Num encoder input features: 64
Num decoder input features: 1024
Initializing models...
Loading checkpoints...
Exporting encoder...
2020-01-07 16:07:16,987 - WARNING - Turned off 115 masked convolutions
Module is JasperEncoder. We are removinginput and output length ports since they are not needed for deployment
/xxx/anaconda3/lib/python3.7/site-packages/torch/jit/init.py:1007: TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error:
Not within tolerance rtol=1e-05 atol=1e-05 at input[0, 305, 3] (0.005420095752924681 vs. 0.005409650504589081) and 1 other locations (0.00%)
check_tolerance, _force_outplace, True, _module_class)
2020-01-07 16:07:24,303 - ERROR - ERROR: module export failed for JasperEncoder with exception number of output names provided (2) exceeded number of outputs (1)

After my own check and trace, I think there may be a bug in nemo.backends.pytorch.actions.py

input_names=input_names,
output_names=output_names,

After I removed "length" from the input_names list and "encoded_lengths" from the output_names list before calling torch.onnx.export, the conversion process worked fine.

The nemo version I used is 0.9.0

No module named 'nemo_nlp.utils'

Same issue as #84; I tried the recommended solution and it did not work.

I installed NeMo with:

pip install nemo_toolkit nemo_asr nemo_nlp

Running the following Python import code, I get an error

import torch
print("PyTorch Version:", torch.__version__)
import nemo
print("NeMo Version:", nemo.__version__)
import nemo_nlp
print("NeMo NLP Version:", nemo_nlp.__version__)

Output:

PyTorch Version: 1.2.0
NeMo Version: 0.8.1
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-4-9d2b1d1a62b9> in <module>()
      5 print("NeMo Version:", nemo.__version__)
      6 
----> 7 import nemo_nlp
      8 print("NeMo NLP Version:", nemo_nlp.__version__)

/usr/local/lib/python3.6/dist-packages/nemo_nlp/data/datasets/utils.py in <module>()
     17 from nemo.utils.exp_logging import get_logger
     18 
---> 19 from ...utils.nlp_utils import (get_vocab,
     20                                 write_vocab,
     21                                 write_vocab_in_order,

ModuleNotFoundError: No module named 'nemo_nlp.utils'

TIMIT?

Hi there !

Do you have a TIMIT recipe already existing?

Thanks for the toolkit!

CUDA out of memory

Hi, thank you for making an excellent project.
I have a question about training a big model. I can train jasper12x1SEP on a 1080Ti GPU, but I can't train jasper15x5SEP on two 2080Ti GPUs; the error is CUDA out of memory. How can I train jasper15x5SEP with two 2080Ti GPUs?
Parameters:

        num_epochs=50,
        batch_size=32,
        eval_batch_size=16,
        lr=0.015,
        weight_decay=0.001,
        warmup_steps=8000,
        checkpoint_save_freq=2000,
        train_eval_freq=100,
        eval_freq=4000

Issue while installing swig

Note: commands executed inside the container

  1. While following the instructions from https://nvidia.github.io/NeMo/asr/tutorial.html#inference for using klm, got an error like "swig package not available" while running apt-get install swig.

This was fixed by running apt-get update before running apt-get install swig.

  2. Also, sudo (in sudo apt-get install swig) is not required inside the container.

  3. install_decoders.sh was failing with a gcc error. Had to run the following to fix it:
    apt-get install ssh pkg-config libflac-dev libogg-dev libvorbis-dev libboost-dev swig python-dev git-core libsndfile1-dev python-setuptools libboost-all-dev //NOTE: I'm not sure if all the dependencies are required

Training non-English ASR model

Hello!
I tried to train a Russian ASR model based on the 1_ASR_tutorial_using_NeMo.ipynb notebook (from NeMo/examples/asr/notebooks/) using Google Colab. I used the jasper_an4.yaml and quartznet5x3.yaml configs with the labels changed to the Russian alphabet ("а", "б", "в", "г", etc.) and WAV files with Russian speech.
During training I got an empty "Reference" at every step. There was a "Prediction" on the first step and then only empty rows. The loss seems to be correct, but the WER is infinite in this case. Is there a problem with the encoding?
On inference I got predictions containing only whitespace; again, the WER is infinite.

I would appreciate any hint on how to solve this issue.

Pillow version gives error

When running an NLP model inside the NeMo container, I received the error
ImportError: cannot import name 'PILLOW_VERSION' from 'PIL'.
The installed Pillow version was 7; after downgrading to 6 it was fine.

examples/nlp/nmt_tutorial.py: how to generate YouTokenToMe model for custom language?

Dear Team of Neural Machine Translation,

# pass a YouTokenToMe model to YouTokenToMeTokenizer for de
# if the target is zh, we should pass a vocabulary file, e.g. zh_vocab.txt
(src: examples/nlp/nmt_tutorial.py)

How do I generate a YouTokenToMe model for a custom language?
I would appreciate it if you could provide any instructions for this.

Thank you in advance!

How about supporting BPE in ASR

I find many papers using BPE as modelling units, so
I was wondering if it is possible to change the current char-based ASR to tokenizer-based (like NLP).
By using a custom tokenizer, it may help to reuse tokenizers from the NLP collection. (Maybe we should put some common tools into one utils module or collection, like the models and utility functions.)
And users can use whatever modelling units they want.

Potential problems:

  1. Currently every example script is char-based, including helpers.py.
  2. The beam search decoder currently only supports char-based models. (It requires a vocab file.)
    The ctc_beam_search_with_lm works like this:
    First, it detects whether the language model used is char-based or word-based by simply checking the length of the unigrams. If there is one unigram whose length is greater than 1, then it is word-based; otherwise it is char-based.

With an n-gram language model built on words, like an English LM, it will detect whether the current character is a space. If it is a space, it will call the language model function to compute the LM score.
With an n-gram LM built on characters, like some Mandarin LMs, it will call the LM every time it appends a new character (a new prefix is generated).
In order to make beam search with an LM available for all modelling units, we may need to change the decoder code to support this feature.

Of course, if we only want to use greedy search on different modelling units, or only use beam search on char-based models, there will be no problem.

Combining Nemo with Pre-existing Classification Models

Hello! Great work here. We'd like to know if you could guide us toward the proper resources concerning how to combine NeMo with pre-existing Image Classification or even Object Detection models. Thank you for your time.

ASR Tutorial Notebook: RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([1]) and output[0] has a shape of torch.Size([])

Getting following error when running "examples/asr/notebooks/1_ASR_tutorial_using_NeMo.ipynb" on Google Colab with GPU runtime

"RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([1]) and output[0] has a shape of torch.Size([]).`"

2019-10-27 04:10:44,396 - WARNING - Data Layer does not have any weights to return. This get_weights call returns None.
2019-10-27 04:10:44,408 - INFO - Restoring checkpoint from folder ./an4_checkpoints ...
Selected optimization level O0: Pure FP32 training.

Defaults for this optimization level are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'",)
Starting .....
No file matches in ./an4_checkpoints
Checkpoint folder ./an4_checkpoints present but did not restore
Starting epoch 0

RuntimeError Traceback (most recent call last)
in ()
4 optimizer='novograd',
5 optimization_params={
----> 6 "num_epochs": 150, "lr": 0.01, "weight_decay": 1e-4
7 })
8

4 frames
/usr/local/lib/python3.6/dist-packages/nemo/core/neural_factory.py in train(self, tensors_to_optimize, optimizer, optimization_params, callbacks, lr_policy, batches_per_step, stop_on_nan_loss, reset)
516 lr_policy=lr_policy,
517 batches_per_step=batches_per_step,
--> 518 stop_on_nan_loss=stop_on_nan_loss)
519
520 def eval(self,

/usr/local/lib/python3.6/dist-packages/nemo/backends/pytorch/actions.py in train(self, tensors_to_optimize, optimizer, optimization_params, callbacks, lr_policy, batches_per_step, stop_on_nan_loss)
1191 continue
1192 scaled_loss.backward(
-> 1193 bps_scale.to(scaled_loss.get_device()))
1194 else:
1195 final_loss.backward(

/usr/local/lib/python3.6/dist-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
148 products. Defaults to False.
149 """
--> 150 torch.autograd.backward(self, gradient, retain_graph, create_graph)
151
152 def register_hook(self, hook):

/usr/local/lib/python3.6/dist-packages/torch/autograd/init.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
91 grad_tensors = list(grad_tensors)
92
---> 93 grad_tensors = _make_grads(tensors, grad_tensors)
94 if retain_graph is None:
95 retain_graph = create_graph

/usr/local/lib/python3.6/dist-packages/torch/autograd/init.py in _make_grads(outputs, grads)
27 + str(grad.shape) + " and output["
28 + str(outputs.index(out)) + "] has a shape of "
---> 29 + str(out.shape) + ".")
30 new_grads.append(grad)
31 elif grad is None:

RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([1]) and output[0] has a shape of torch.Size([]).`

Quartz: issues in replicating results

Hi,

I am trying to use the NeMo implementation of Quartz to replicate the results presented in this paper.

However, I am facing some issues. First of all, the pretrained encoder model has a different structure from the one implemented in NeMo. In particular, to be able to load the state dictionary, I had to modify Masked1DConv to inherit from 1DConv (as in the original Jasper implementation).
Moreover, there are discrepancies in the names of the layers that have to be fixed to be able to load the pretrained model properly.

After my attempts at fixing these issues, I was still not able to reach the performance mentioned in the paper. I tried evaluating on dev_other and reached a WER of 16.9%, which is much higher than the 11.58% reported in the paper.

I used the configuration file and the pretrained model that can be found here.

The validation is run inside a Docker container built from the Dockerfile available in the repo. The only minor difference is the version of the PyTorch image used: 19.09 instead of 19.11, because of some issues with CUDA drivers that wouldn't allow me to use the GPUs.

Any help would be much appreciated. Thank you!

Is it possible to check the quality of Tacotron 2 training without/before WaveGlow?

It is my first attempt at Tacotron 2 training. I am trying to synthesize a voice for a non-English language. I am not sure that my dataset and training options are absolutely correct.
On the other hand, my hardware is not powerful enough for me to wait calmly.
So I would like to check preliminary results to make the required changes, if they are needed.

Today is the 4th day of training.
The epoch is 10.
Step: 48000
Loss: between 0.40 and 0.75

Is this level enough to get rough results (wav, png)?
I tried to run tts_infer.py with default options, but I got a warning that there is no WaveGlow (which might have been expected).

Thank you in advance

Sentence classification needs a more abstract DataDesc class

Currently, the sentence classification task uses SentenceClassificationDataDesc to prepare training and testing data. If users want to use their own dataset, they need to add a process_xxx function and set num_class in SentenceClassificationDataDesc. That's not convenient, because users have to change the NeMo source code. Users should be able to define their own process_xxx function in the training script and pass the function and num_class to SentenceClassificationDataDesc as parameters. SentenceClassificationDataDesc should then use the user-defined process function to process the data.

jasper_infer.py causes GPU memory OOM

Using the script jasper_infer.py according to the tutorial, I find that GPU memory does not seem to be released after each batch; it increases after each batch until OOM.
model_config=/workspace/nemo/examples/asr/configs/jasper10x5dr.yaml
CUDA: 10.1
Tesla V100

Quartznet: replicating Training

Hi,
I would like to replicate the training process of QuartzNet; however, I'm having some issues.

In particular, the official paper and the website say that speed perturbation (±10%) was applied to the dataset. However, I can't seem to find any trace of that in either the download script or the code.

The other issue I'm having is related to the hyperparameters used when training on LibriSpeech, such as the learning rate and warmup steps. They are not mentioned explicitly anywhere, so I was wondering if you could help me with that as well.

Thank you!

unidecode requirement

You use the unidecode module, but it is not listed as a requirement, so it doesn't get installed if one doesn't have it.

Drop last layer

I want to test the English checkpoint with a new alphabet for Spanish. Is it possible to drop layers to train on a new alphabet?

Experience with small dataset

Hi there!

Just to gather some of your experience working with small datasets: I'm currently investigating TIMIT with NeMo, based on the AN4 architecture with the proper CTC symbols (phonemes instead of characters). Unfortunately, I observe poor performance (around 27% WER), which is very high compared to works like:
https://arxiv.org/pdf/1701.02720.pdf

I also know that the PER reported at training time is based on greedy decoding, which might explain this performance.

Since my goal is to compare NeMo to other toolkits, I would like to be as fair as possible and ask you about specific tricks you have encountered to obtain better performance with smaller datasets. I'm currently trying to play with the architecture to reduce complexity and overfitting (also, noise conditions are not great with TIMIT).

Thanks!

Kaldi unittest fails

======================================================================
ERROR: test_kaldi_dataloader (tests.test_asr.TestASRPytorch)

Traceback (most recent call last):
File "/home/okuchaiev/repos/NeMo/tests/test_asr.py", line 165, in test_kaldi_dataloader
batch_size=batch_size
File "/home/okuchaiev/repos/NeMo/collections/nemo_asr/nemo_asr/data_layer.py", line 464, in init
self._dataset = KaldiFeatureDataset(**dataset_params)
File "/home/okuchaiev/repos/NeMo/collections/nemo_asr/nemo_asr/parts/dataset.py", line 224, in init
for utt_id, feats in kaldi_io.read_mat_scp(feats_path)
File "/home/okuchaiev/repos/NeMo/collections/nemo_asr/nemo_asr/parts/dataset.py", line 222, in
id2feats = {
File "/home/okuchaiev/anaconda3/envs/py37/lib/python3.7/site-packages/kaldi_io-0.9.1-py3.7.egg/kaldi_io/kaldi_io.py", line 343, in read_mat_scp
fd = open_or_fd(file_or_fd)
File "/home/okuchaiev/anaconda3/envs/py37/lib/python3.7/site-packages/kaldi_io-0.9.1-py3.7.egg/kaldi_io/kaldi_io.py", line 63, in open_or_fd
fd = open(file, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'tests/data/asr/kaldi_an4/feats.scp'

neural factory infer function returns all tensors for the whole dataset

Not sure if my understanding of this function is correct.
It seems this function returns the list of batch tensors for the whole dataset, which may cause CPU or GPU OOM if the dataset is huge or the tensors contain a lot of data, like the likelihoods for each frame with a large vocabulary.

Is it possible to modify this function to return a generator instead?

Add support / option for ASR audio streaming

Currently, the ASR collection supports audio files as input for training and prediction. However, a lot of ASR use cases involve streaming audio for prediction. In streaming, the bytes of audio are sent in intervals. It would be useful to have an option in the relevant classes (AudioToTextDataLayer and others) to accept bytes (and not just paths to audio files) as input. Then, using NeMo ASR models for streaming would be possible.

Is there a plan to add this?
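To illustrate the request, a hedged sketch of the byte-based input path being asked for: decode an in-memory chunk of WAV bytes into a float array without touching the filesystem (this is not an existing NeMo API; AudioToTextDataLayer currently expects file paths):

import io

import soundfile as sf

def bytes_to_samples(wav_bytes: bytes):
    """Decode in-memory WAV bytes into a mono float32 array plus its sample rate."""
    samples, sample_rate = sf.read(io.BytesIO(wav_bytes), dtype="float32")
    if samples.ndim > 1:              # down-mix multi-channel audio to mono
        samples = samples.mean(axis=1)
    return samples, sample_rate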

Dockerfile onnx-tensorrt patch failing

I am trying to use your Dockerfile but I'm running into issues with your patch:

Step 8/15 : RUN git clone https://github.com/onnx/onnx-tensorrt.git && cd onnx-tensorrt && git submodule update --init --recursive && patch -f < ../onnx-trt.patch &&     mkdir build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr -DGPU_ARCHS="60 70 75" && make -j16 && make install && mv -f /usr/lib/libnvonnx* /usr/lib/x86_64-linux-gnu/ && ldconfig
 ---> Running in f15383978bfd
Cloning into 'onnx-tensorrt'...
Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx'
Cloning into '/tmp/onnx-trt/onnx-tensorrt/third_party/onnx'...
Submodule path 'third_party/onnx': checked out '553df22c67bee5f0fe6599cff60f1afc6748c635'
Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/onnx/third_party/benchmark'
Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11'
Cloning into '/tmp/onnx-trt/onnx-tensorrt/third_party/onnx/third_party/benchmark'...
Cloning into '/tmp/onnx-trt/onnx-tensorrt/third_party/onnx/third_party/pybind11'...
Submodule path 'third_party/onnx/third_party/benchmark': checked out 'e776aa0275e293707b6a0901e0e8d8a8a3679508'
Submodule path 'third_party/onnx/third_party/pybind11': checked out '09f082940113661256310e3f4811aa7261a9fa05'
Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/onnx/third_party/pybind11/tools/clang'
Cloning into '/tmp/onnx-trt/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'...
Submodule path 'third_party/onnx/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5'
patching file CMakeLists.txt
Hunk #1 FAILED at 20.
1 out of 1 hunk FAILED -- saving rejects to file CMakeLists.txt.rej
The command '/bin/sh -c git clone https://github.com/onnx/onnx-tensorrt.git && cd onnx-tensorrt && git submodule update --init --recursive && patch -f < ../onnx-trt.patch &&     mkdir build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr -DGPU_ARCHS="60 70 75" && make -j16 && make install && mv -f /usr/lib/libnvonnx* /usr/lib/x86_64-linux-gnu/ && ldconfig' returned a non-zero code: 1
ERROR: Job failed: command terminated with exit code 1

Fix documentation for fine-tuning with multi-GPU training

In the Fine-Tuning section of the tutorial, add that in the case of distributed training the restore syntax has to be the following:
jasper_encoder.restore_from("/data/atc_tenant/Speech2/nemodata/JasperEncoder-STEP-247400.pt", args.local_rank)
jasper_decoder.restore_from("/data/atc_tenant/Speech2/nemodata/JasperDecoderForCTC-STEP-247400.pt", args.local_rank)

Mandarin ASR: predictions stay as BLANK sequences

Hi, I appreciate this great framework and your great work!
Your pretrained Mandarin QuartzNet has very good performance on the AISHELL test set, so I want to train the same model architecture from scratch on our own Mandarin reading-style data.
The training script is invoked like this:
python -m torch.distributed.launch --nproc_per_node=2 ./jasper_aishell.py --batch_size=8 --num_epochs=150 --lr=0.00005 --warmup_steps=1000 --weight_decay=0.00001 --train_dataset=./word_4000h/lists/train.json --eval_datasets ./word_4000h/lists/dev_small.json --model_config=./aishell2_quartznet15x5/quartznet15x5.yaml --exp_name=quartznet_train --vocab_file=./word_4000h/am/token_dev_train_4400.txt --checkpoint_dir=$checkpoint_dir --work_dir=$checkpoint_dir
The training data is about 500 hours long.
At first, the predictions are pretty much random; then, after several thousand iterations (before warmup ends), the predictions stay as BLANK sequences for two epochs, like this:
Step: 4650
2020-01-07 09:53:20,694 - INFO - Loss: 110.91824340820312
2020-01-07 09:53:20,694 - INFO - training_batch_CER: 100.00%
2020-01-07 09:53:20,694 - INFO - Prediction:
2020-01-07 09:53:20,694 - INFO - Reference: 提起华华家的事情村民们声声长叹
Step time: 0.39273500442504883 seconds

I have tried learning rates from 0.1 down to 0.00005, warmup steps from 1000 to 8000, batch sizes of 4, 8, 16, and 32, and weight decay from 0.001 to 0.00001, and none of those combinations solved the problem.
Have you ever encountered this kind of problem?
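Not an answer, but one sanity check that sometimes explains all-BLANK CTC output is a mismatch between the vocabulary file and the characters that actually appear in the training transcripts. A quick sketch of that check (it assumes the vocab file has one token per line, which may not match your format):

import json

def check_vocab_coverage(manifest_path, vocab_path):
    """Return the set of transcript characters that are missing from the vocab file."""
    with open(vocab_path, encoding="utf-8") as f:
        vocab = {line.strip() for line in f if line.strip()}
    missing = set()
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            for ch in entry["text"].replace(" ", ""):
                if ch not in vocab:
                    missing.add(ch)
    return missing

# Example (paths taken from the command above):
# missing = check_vocab_coverage("./word_4000h/lists/train.json",
#                                "./word_4000h/am/token_dev_train_4400.txt")
# print(len(missing), "characters are not in the vocab file")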

Correction in the 1_ASR_tutorial_using_NeMo.ipynb

I noticed a tiny error in the example Jupyter notebook:

metadata = { "audio_filename": audio_path, "duration": duration, "text": transcript }
should be

metadata = { "audio_filepath": audio_path, "duration": duration, "text": transcript }

otherwise it causes a KeyError.
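For context, each of these metadata dicts becomes one JSON line of the training manifest; a minimal sketch of how it is typically written out (the variable values and the output path are hypothetical):

import json

audio_path, duration, transcript = "sample.wav", 1.2, "hello world"   # placeholder values
metadata = {"audio_filepath": audio_path, "duration": duration, "text": transcript}

with open("train_manifest.json", "a") as f:
    f.write(json.dumps(metadata) + "\n")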

Question on multi_gpu

Hi !

Reading the documentation, I should "First set placement to nemo.core.DeviceType.AllGpu in NeuralModuleFactory and in your Neural Modules" to enable multi-GPU training. But I'm not sure which modules should or should not have the placement set. Do you have an example of a quartznet.py script that enables multi-GPU training?
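In case a sketch helps while waiting for an answer: as far as I can tell, the placement only needs to be set once on the NeuralModuleFactory, and the script is then launched with torch.distributed.launch; argument names other than placement and local_rank are assumptions on my part:

import argparse

import nemo

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=None)
args = parser.parse_args()

# One factory per process; placement tells it to use all visible GPUs.
neural_factory = nemo.core.NeuralModuleFactory(
    local_rank=args.local_rank,
    placement=nemo.core.DeviceType.AllGpu,
)
# ... build the data layer, encoder and decoder with this factory as usual ...

# Launch with one process per GPU, e.g.:
# python -m torch.distributed.launch --nproc_per_node=2 quartznet.py --other-args ...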

Is it possible to export to onnx format?

How would one go about exporting a model to the ONNX format? I guess it's not supported out of the box, but are there any hints on how to do it, since it is probably just one module?

Unidecode module error when unit-testing

After installing the dependencies, including apex, and then running the reinstall.sh script, I tried the unit tests and received an error for the missing module unidecode. A simple pip install unidecode solved this, and all tests subsequently ran successfully. The error prior to the successful tests is attached, and computer details are included below for completeness.

Is this issue something to be ameliorated in the install procedure or setup file?

CentOS Linux release 7.5.1804 (Core)
Linux version 3.10.0-862.14.4.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP Wed Sep 26 15:12:11 UTC 2018

Unidecode_Error.txt

TypeError: Can't instantiate abstract class AudioToTextDataLayer with abstract methods create_ports

Hi, I installed NeMo by cloning the repo and following the installation instructions. However, when I tried to run the ASR example notebook I got this error:

File "examples/asr/jasper_eval.py", line 96, in main
**eval_dl_params)

Create the Jasper_4x1 encoder as specified, and a CTC decoder

---> 23 encoder = nemo_asr.JasperEncoder(**params['JasperEncoder'])
24
25 decoder = nemo_asr.JasperDecoderForCTC(

TypeError: Can't instantiate abstract class JasperEncoder with abstract methods create_ports

I also get a similar error when trying to evaluate a QuartzNet model on the LibriSpeech dev set:

TypeError: Can't instantiate abstract class AudioToTextDataLayer with abstract methods create_ports

Did I miss something? Thanks in advance.

ONNX Export NoneType error

Hello again. I'm trying to export quartznet15x5 v2 to ONNX with master (f946aca).

With the following command:

!python export_jasper_to_onnx.py --config quartznet15x5.yaml  \
--nn_encoder JasperEncoder-STEP-247400.pt --nn_decoder JasperDecoderForCTC-STEP-247400.pt  \
--onnx_encoder encoder.onnx --onnx_decoder decoder.onnx

Failing with:

Loading config file...
Determining model shape...
  Num encoder input features: 64
  Num decoder input features: 1024
Initializing models...
Loading checkpoints...
Exporting encoder...
2019-12-15 06:57:12,846 - WARNING - Module is JasperEncoder. We are removinginput and output length ports since they are not needed for deployment
2019-12-15 06:57:12,847 - WARNING - Turned off 0 masked convolutions
2019-12-15 06:57:12,848 - ERROR - ERROR: module export failed for JasperEncoder with exception 'NoneType' object has no attribute 'to'
Exporting decoder...
graph(%encoder_output : Float(1, 1024, 128),
      %1 : Float(29),
      %2 : Float(29, 1024, 1)):
  %3 : Float(1, 29, 128) = onnx::Conv[dilations=[1], group=1, kernel_shape=[1], pads=[0, 0], strides=[1]](%encoder_output, %2, %1), scope: JasperDecoderForCTC/Sequential[decoder_layers]/Conv1d[0] # /usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py:202:0
  %4 : Float(1, 128, 29) = onnx::Transpose[perm=[0, 2, 1]](%3), scope: JasperDecoderForCTC # /usr/local/lib/python3.6/dist-packages/nemo_asr/jasper.py:207:0
  %output : Float(1, 128, 29) = onnx::LogSoftmax[axis=2](%4), scope: JasperDecoderForCTC # /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:1317:0
  return (%output)

/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py:772: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input encoder_output
  'Automatically generated names will be applied to each dynamic axes of input {}'.format(key))
/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py:772: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input output
  'Automatically generated names will be applied to each dynamic axes of input {}'.format(key))
Export completed successfully.

Is ONNX export compatible with the latest model using master?

tts_infer.py returns segmentation fault

Ubuntu 18.04
Python3.6.8,
Pytorch 1.3
GPU:1080ti

I've downloaded the tacotron2 and waveglow models from NGC:
tacotron2 model: https://ngc.nvidia.com/catalog/models/nvidia:tacotron2_ljspeech
waveglow model: https://ngc.nvidia.com/catalog/models/nvidia:waveglow_ljspeech

NeMo/examples/tts/tts_infer.py

I ran the command below and got a segmentation fault.
python3 tts_infer.py --spec_model=tacotron2 --spec_model_config=configs/tacotron2.yaml --spec_model_load_dir=tacotron2_checkopints/ --vocoder=waveglow --vocoder_model_config=configs/waveglow.yaml --vocoder_model_load_dir=waveglow_checkopints/ --save_dir=wav_files/ --eval_dataset=test.json

Predict Proba from Inference

Hello,
I would like to get the prediction probability (predict_proba) from neural_factory.infer().

Example:
{
  "predict": "Hello World",
  "predict_proba": 0.85
}

Thanks
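Until something like this is exposed directly, here is a hedged sketch of deriving a per-utterance confidence from the frame-wise CTC log-probabilities (shape time x vocab), assuming you can get those out of the inference step:

import numpy as np

def greedy_confidence(log_probs: np.ndarray) -> float:
    """Average probability of the greedy (argmax) path, as a rough confidence score."""
    best = log_probs.max(axis=-1)       # log-probability of the argmax token at each frame
    return float(np.exp(best.mean()))   # geometric mean of the frame probabilities

# Example with random values, just to show the expected shape (128 frames, 29 tokens):
# conf = greedy_confidence(np.log(np.random.dirichlet(np.ones(29), size=128)))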
