wave2vec-recognize-docker's Introduction

wav2vec

wav2vec 2.0 Recognize Implementation.

Disclaimer

wav2vec is part of fairseq. This repository is the result of an issue submitted in the fairseq repository here.

Resources

Please first download one of the pre-trained models available from fairseq (see the table below).

Pre-trained models

Model                      Finetuning split  Dataset                    Download
Wav2Vec 2.0 Base           No finetuning     Librispeech                download
Wav2Vec 2.0 Base           10 minutes        Librispeech                download
Wav2Vec 2.0 Base           100 hours         Librispeech                download
Wav2Vec 2.0 Base           960 hours         Librispeech                download
Wav2Vec 2.0 Large          No finetuning     Librispeech                download
Wav2Vec 2.0 Large          10 minutes        Librispeech                download
Wav2Vec 2.0 Large          100 hours         Librispeech                download
Wav2Vec 2.0 Large          960 hours         Librispeech                download
Wav2Vec 2.0 Large (LV-60)  No finetuning     Libri-Light                download
Wav2Vec 2.0 Large (LV-60)  10 minutes        Libri-Light + Librispeech  download
Wav2Vec 2.0 Large (LV-60)  100 hours         Libri-Light + Librispeech  download
Wav2Vec 2.0 Large (LV-60)  960 hours         Libri-Light + Librispeech  download

How to install

We use python:3.8.6-slim-buster as the base image in order to give developers more flexibility in customizing this Dockerfile. For a simplified install, please refer to the Alternative install section. If you go for this container, build it using the provided Dockerfile:

docker build -t wav2vec -f Dockerfile .

How to Run

There are two versions of recognize.py:

  • recognize.py: for running legacy finetuned models (without Hydra).
  • recognize.hydra.py: for running models finetuned with newer, Hydra-based versions of fairseq.

Before running, copy the downloaded model (e.g. wav2vec_small_10m.pt) to the data/ folder. Copy the wav file to test there as well, like data/temp.wav in the following examples. The data/ folder will then look like this:

.
├── dict.ltr.txt
├── temp.wav
└── wav2vec_small_10m.pt

We now run the container, enter it, and execute the recognition script (recognize.py or recognize.hydra.py):

docker run -d -it --rm -v $PWD/data:/app/data --name w2v wav2vec
docker exec -it w2v bash
python examples/wav2vec/recognize.py --target_dict_path=/app/data/dict.ltr.txt /app/data/wav2vec_small_10m.pt /app/data/temp.wav
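
For a model finetuned with a newer, Hydra-based fairseq, use recognize.hydra.py instead; a sketch, assuming it accepts the same arguments as recognize.py:

python examples/wav2vec/recognize.hydra.py --target_dict_path=/app/data/dict.ltr.txt /app/data/wav2vec_small_10m.pt /app/data/temp.wav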

Common issues

1. What if my model is not compatible with fairseq?

At the very least, we have tested with the fairseq master branch (> v0.10.1, commit ac11107). You may run into an issue like this:

omegaconf.errors.ValidationError: Invalid value 'False', expected one of [hard, soft]
full_key: generation.print_alignment
reference_type=GenerationConfig
object_type=GenerationConfig

Your model has probably been finetuned (or trained) with a different version of fairseq. You should find out which version your model was trained with and edit the commit hash in the Dockerfile accordingly, BUT IT MIGHT BREAK src/recognize.py.
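
A minimal sketch of the corresponding Dockerfile change (the clone path and install steps are assumptions; adapt them to the provided Dockerfile):

# Pin fairseq to the commit your model was trained with
# (ac11107 is the commit we tested against; replace it with yours).
RUN git clone https://github.com/pytorch/fairseq.git /app/fairseq \
    && cd /app/fairseq \
    && git checkout ac11107 \
    && pip install --editable ./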

The workaround is to look for what changed in the parameters in the fairseq source code. In the above example, I managed to find this:

fairseq/dataclass/configs.py (72a25a4 -> 032a404)

- print_alignment: bool = field(
+ print_alignment: Optional[PRINT_ALIGNMENT_CHOICES] = field(
-     default=False,
+     default=None,
      metadata={
-         "help": "if set, uses attention feedback to compute and print alignment to source tokens"
+         "help": "if set, uses attention feedback to compute and print alignment to source tokens "
+         "(valid options are: hard, soft, otherwise treated as hard alignment)",
+         "argparse_const": "hard",
      },
  )

The problem is that fairseq modified the config so that the old generation.print_alignment value is no longer valid, so I modified recognize.hydra.py as below (you might want to modify the value instead):

  OmegaConf.set_struct(w2v["cfg"], False)
+ del w2v["cfg"].generation["print_alignment"]
  cfg = OmegaConf.merge(OmegaConf.structured(Wav2Vec2CheckpointConfig), w2v["cfg"])
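
The same pattern works for any stale key. A minimal standalone sketch, assuming Wav2Vec2CheckpointConfig is the structured config defined in recognize.hydra.py:

import torch
from omegaconf import OmegaConf

# Load the checkpoint on CPU; w2v["cfg"] is the OmegaConf config fairseq saved.
w2v = torch.load("/app/data/wav2vec_small_10m.pt", map_location="cpu")

# Relax struct mode so keys can be deleted, drop the key whose type changed
# between fairseq versions, then merge against the expected schema
# (Wav2Vec2CheckpointConfig comes from recognize.hydra.py).
OmegaConf.set_struct(w2v["cfg"], False)
del w2v["cfg"].generation["print_alignment"]
cfg = OmegaConf.merge(OmegaConf.structured(Wav2Vec2CheckpointConfig), w2v["cfg"])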

Alternative install

We provide an alternative Dockerfile named wav2letter.Dockerfile that uses the wav2letter/wav2letter:cpu-latest Docker image as its FROM base. Here are the commands to build, run, and execute recognition in this case:

docker build -t wav2vec2 -f wav2letter.Dockerfile .
docker run -d -it --rm -v $PWD/data:/root/data --name w2v2 wav2vec2
docker exec -it w2v2 bash
python examples/wav2vec/recognize.py --wav_path /root/data/temp.wav --w2v_path /root/data/wav2vec_small_10m.pt --target_dict_path /root/data/dict.ltr.txt 

Contributors

Thanks to all contributors to this repo.

wave2vec-recognize-docker's Issues

Docker build error with fairseq - feb5f07

I've pulled the latest commit of this repo, tried running docker build, and got this error:

WARNING: You are using pip version 19.3; however, version 20.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Traceback (most recent call last):
  File "examples/speech_recognition/infer.py", line 17, in <module>
    import editdistance
ModuleNotFoundError: No module named 'editdistance'
The command '/bin/sh -c pip install --editable ./ && python examples/speech_recognition/infer.py --help && python examples/wav2vec/recognize.py --help' returned a non-zero code: 1

After fixing that by adding pip install editdistance, I ran into this:

/usr/local/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
  File "examples/wav2vec/recognize.py", line 10, in <module>
    from fairseq.models.wav2vec.wav2vec2_asr import base_architecture, Wav2VecEncoder
ImportError: cannot import name 'base_architecture' from 'fairseq.models.wav2vec.wav2vec2_asr' (/app/fairseq/fairseq/models/wav2vec/wav2vec2_asr.py)
The command '/bin/sh -c pip install --editable ./ && python examples/speech_recognition/infer.py --help && python examples/wav2vec/recognize.py --help' returned a non-zero code: 1

I tried wav2letter.Dockerfile, but still got the above error.
Environment:

  • Docker 20.10.
  • Amazon EC2 Ubuntu 18.04 instance (Linux 5.4.0-1029-aws).

I think torch still requires a GPU to install, or a newer version of torch requires it.
Have you run into this? Or should we update the Dockerfile(s)?

Understanding whether it is possible to use your own training checkpoint as the model file

Hello,

I've been busy with the default fairseq examples/speech_recognition/infer.py and also this repo's recognize.py, trying to see if it is possible to run inference using a model we made ourselves by finetuning a base model. We can get the infer.py script to work, but I've noticed that it needs to be able to find the original base model on disk. Moving the checkpoint to a different machine is cumbersome: the base model has to be in the same location on the target machine.

I've tried to study how the model loading works for almost a day now, but I can't wrap my head around it. I think it only needs some args from the original base model, but there is a lot of conversion going on between formats and names: cfg, w2v_args, OmegaConf, and Namespace.

Both recognize.py and recognize.hydra.py break on loading a checkpoint file (though they work on published finetuned models). It would help if there were a way to produce a model file that works with recognize.py from the original base model and a checkpoint. I have not been able to find such a tool; I believe it is as simple as adding the correct .cfg.w2v_args info to the checkpoint, but I don't understand how.
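
A hedged sketch of that idea (untested; the exact key layout differs between fairseq versions, and both paths are hypothetical):

import torch

# Load both checkpoints on CPU.
base = torch.load("wav2vec_small.pt", map_location="cpu")
ckpt = torch.load("checkpoint_best.pt", map_location="cpu")

# Embed the base model's config as w2v_args so the finetuned checkpoint
# no longer needs the base model file on disk (hypothetical key path).
ckpt["cfg"]["model"]["w2v_args"] = base["cfg"]
torch.save(ckpt, "checkpoint_portable.pt")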

I can get recognize.py to work with a checkpoint file using the patch below, but then model loading still refers to the original base model.

@@ -139,13 +162,24 @@ class Wav2VecPredictor:
         return feats
 
     def _load_model(self, model_path, target_dict):
-        w2v = torch.load(model_path)
-
+        #w2v = torch.load(model_path)
+        #if w2v['args'] is None:
+        #    w2v['args'] = Namespace()
         # Without create a FairseqTask
-        args = base_architecture(w2v["args"])
-        model = Wav2VecCtc(args, Wav2VecEncoder(args, target_dict))
-        model.load_state_dict(w2v["model"], strict=True)
-        return model
+        #args = base_architecture(w2v["args"])
+        #model = Wav2VecCtc(args, Wav2VecEncoder(args, target_dict))
+        #model.load_state_dict(w2v["model"], strict=True)
+
+        models, saved_cfg, task = load_model_ensemble_and_task(
+            utils.split_paths(model_path),
+            arg_overrides=None, # ast.literal_eval(args.model_overrides),
+            task=None,
+            suffix="",
+            strict=True,
+            num_shards=1,
+            state=None
+        )
+        return models[0]
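
For reference, a self-contained sketch of the loader the patch switches to; the arg_overrides value shown is a hypothetical way to repoint the checkpoint's embedded base-model path (check the config keys of your fairseq version):

from fairseq import utils
from fairseq.checkpoint_utils import load_model_ensemble_and_task

def load_model(model_path):
    models, saved_cfg, task = load_model_ensemble_and_task(
        utils.split_paths(model_path),
        # Hypothetical override: point the embedded w2v_path at a local
        # copy of the base model instead of the training machine's path.
        arg_overrides={"w2v_path": "/app/data/wav2vec_small.pt"},
    )
    return models[0]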

KenLM decoder

Hi,
can you please guide me on what needs to be changed/added in your scripts to run inference with a KenLM decoder?

Thanks for the docker!
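
A hedged sketch of how KenLM decoding is usually driven through fairseq's examples/speech_recognition/infer.py rather than these scripts (flag names vary across fairseq versions, and all paths are placeholders):

python examples/speech_recognition/infer.py /path/to/manifest \
    --task audio_pretraining --path /app/data/wav2vec_small_10m.pt \
    --w2l-decoder kenlm --lm-model /path/to/kenlm.bin \
    --lexicon /path/to/lexicon.lst --lm-weight 2 --word-score -1 \
    --criterion ctc --labels ltr --post-process letter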

RuntimeError: [enforce fail at CPUAllocator.cpp:65]

I followed the installation steps, built the Dockerfile (which had its own hiccups: the fairseq repository does not have a base_architecture definition in their models file; I will raise a PR for it separately) and ran the code.

Running command
python examples/wav2vec/recognize.py --wav_path /app/data/test.WAV --w2v_path /app/data/wav2vec_small_10m.pt --target_dict_path /app/data/dict.ltr.txt

Error:
RuntimeError: [enforce fail at CPUAllocator.cpp:65] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 314663671488 bytes. Error code 12 (Cannot allocate memory)

I tried different wav2vec models, with similar errors. Do let me know if more information is needed. I'm running this on an Azure DS VM.

/root/fairseq/examples/speech_recognition/w2l_decoder.py:41: UserWarning: wav2letter python bindings are required to use this functionality.

Hi! Thank you so much for setting up this Docker image. I've been looking for a long time for a simple way to test this model (just put audio in and get text back), and it's ridiculous how complex it all is to set up. This whole thing should've been packaged into a single pip install line! So thank you for your work!

After running the installation example:

docker build -t wav2vec2 -f wav2letter.Dockerfile .
docker run -d -it --rm -v $PWD/data:/root/data --name w2v2 wav2vec2
docker exec -it w2v2 bash

Everything went smoothly up until this line:

python examples/wav2vec/recognize.py --wav_path /root/data/temp.wav --w2v_path /root/data/wav2vec_small_10m.pt --target_dict_path /root/data/dict.ltr.txt

Which I modified to this (because it seemed that the wav file was not found):

#I first cd'd back to the root, and did:

python3 fairseq/examples/wav2vec/recognize.py --wav_path /root/data/temp.wav --w2v_path /root/data/wav2vec2_vox_960h.pt --target_dict_path /root/data/dict.ltr.txt

Now I get this error:

/root/fairseq/examples/speech_recognition/w2l_decoder.py:41: UserWarning: wav2letter python bindings are required to use this functionality. Please install from https://github.com/facebookresearch/wav2letter/wiki/Python-bindings
  "wav2letter python bindings are required to use this functionality. Please install from https://github.com/facebookresearch/wav2letter/wiki/Python-bindings"

I also get this error right above the previous one, even though I believe I have CUDA 9 or 10:

/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0

I do remember that to get CUDA working on a Mac with PyTorch you have to build PyTorch from source, which also seems like another ten thousand steps :(. Let alone integrating that into Docker somehow, which I don't understand.

Anyway, I would appreciate any help whatsoever to get this working. Hopefully this thing can run without CUDA. At the moment I don't want to fine-tune the model, simply test it.
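
On the CUDA warning: the models can run on CPU; a minimal sketch (hypothetical path) of loading a checkpoint explicitly onto the CPU so no NVIDIA driver is needed:

import torch

# Map all tensors to CPU at load time; no GPU or driver required.
w2v = torch.load("/root/data/wav2vec_small_10m.pt", map_location=torch.device("cpu"))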
