wave2vec-recognize-docker's Introduction

wav2vec

wav2vec 2.0 Recognize Implementation.

Disclaimer

wav2vec is part of fairseq. This repository is the result of an issue submitted in the fairseq repository here.

Resources

Please first download one of the pre-trained models available from fairseq (see the table below).

Pre-trained models

Model                      Finetuning split  Dataset                    Download
Wav2Vec 2.0 Base           No finetuning     Librispeech                download
Wav2Vec 2.0 Base           10 minutes        Librispeech                download
Wav2Vec 2.0 Base           100 hours         Librispeech                download
Wav2Vec 2.0 Base           960 hours         Librispeech                download
Wav2Vec 2.0 Large          No finetuning     Librispeech                download
Wav2Vec 2.0 Large          10 minutes        Librispeech                download
Wav2Vec 2.0 Large          100 hours         Librispeech                download
Wav2Vec 2.0 Large          960 hours         Librispeech                download
Wav2Vec 2.0 Large (LV-60)  No finetuning     Libri-Light                download
Wav2Vec 2.0 Large (LV-60)  10 minutes        Libri-Light + Librispeech  download
Wav2Vec 2.0 Large (LV-60)  100 hours         Libri-Light + Librispeech  download
Wav2Vec 2.0 Large (LV-60)  960 hours         Libri-Light + Librispeech  download

How to install

We use python:3.8.6-slim-buster as the base image in order to give developers more flexibility in customizing this Dockerfile. For a simplified install, please refer to the Alternative install section. If you go for this container, build it using the provided Dockerfile:

docker build -t wav2vec -f Dockerfile .

How to Run

There are two versions of recognize.py:

  • recognize.py: for running legacy finetuned models (without Hydra).
  • recognize.hydra.py: for running models finetuned with newer, Hydra-based versions of fairseq.

Before running, copy the downloaded model (e.g. wav2vec_small_10m.pt) to the data/ folder. Copy the wav file to test there as well, like data/temp.wav in the following examples. The data/ folder will then look like this:

.
├── dict.ltr.txt
├── temp.wav
└── wav2vec_small_10m.pt

We now run the container, enter it, and execute the recognition script (recognize.py or recognize.hydra.py):

docker run -d -it --rm -v $PWD/data:/app/data --name w2v wav2vec
docker exec -it w2v bash
python examples/wav2vec/recognize.py --target_dict_path=/app/data/dict.ltr.txt /app/data/wav2vec_small_10m.pt /app/data/temp.wav
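
For a model finetuned with a newer, Hydra-based fairseq, use recognize.hydra.py instead; a sketch, assuming it accepts the same arguments as recognize.py:

python examples/wav2vec/recognize.hydra.py --target_dict_path=/app/data/dict.ltr.txt /app/data/wav2vec_small_10m.pt /app/data/temp.wav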

Common issues

1. What if my model is not compatible with fairseq?

At the very least, we have tested with the fairseq master branch (> v0.10.1, commit ac11107). You may run into an issue like this:

omegaconf.errors.ValidationError: Invalid value 'False', expected one of [hard, soft]
full_key: generation.print_alignment
reference_type=GenerationConfig
object_type=GenerationConfig

Your model has probably been finetuned (or trained) with a different version of fairseq. You should find out which version your model was trained with and edit the commit hash in the Dockerfile accordingly, BUT IT MIGHT BREAK src/recognize.py.
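
A minimal sketch of the corresponding Dockerfile change (the clone path and install steps are assumptions; adapt them to the provided Dockerfile):

# Pin fairseq to the commit your model was trained with
# (ac11107 is the commit we tested against; replace it with yours).
RUN git clone https://github.com/pytorch/fairseq.git /app/fairseq \
    && cd /app/fairseq \
    && git checkout ac11107 \
    && pip install --editable ./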

The workaround is to look for what changed in the parameters in the fairseq source code. In the above example, I managed to find this:

fairseq/dataclass/configs.py (72a25a4 -> 032a404)

- print_alignment: bool = field(
+ print_alignment: Optional[PRINT_ALIGNMENT_CHOICES] = field(
-     default=False,
+     default=None,
      metadata={
-         "help": "if set, uses attention feedback to compute and print alignment to source tokens"
+         "help": "if set, uses attention feedback to compute and print alignment to source tokens "
+         "(valid options are: hard, soft, otherwise treated as hard alignment)",
+         "argparse_const": "hard",
      },
  )

The problem is that fairseq modified the config so that the old generation.print_alignment value is no longer valid, so I modified recognize.hydra.py as below (you might want to modify the value instead):

  OmegaConf.set_struct(w2v["cfg"], False)
+ del w2v["cfg"].generation["print_alignment"]
  cfg = OmegaConf.merge(OmegaConf.structured(Wav2Vec2CheckpointConfig), w2v["cfg"])
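
The same pattern works for any stale key. A minimal standalone sketch, assuming Wav2Vec2CheckpointConfig is the structured config defined in recognize.hydra.py:

import torch
from omegaconf import OmegaConf

# Load the checkpoint on CPU; w2v["cfg"] is the OmegaConf config fairseq saved.
w2v = torch.load("/app/data/wav2vec_small_10m.pt", map_location="cpu")

# Relax struct mode so keys can be deleted, drop the key whose type changed
# between fairseq versions, then merge against the expected schema
# (Wav2Vec2CheckpointConfig comes from recognize.hydra.py).
OmegaConf.set_struct(w2v["cfg"], False)
del w2v["cfg"].generation["print_alignment"]
cfg = OmegaConf.merge(OmegaConf.structured(Wav2Vec2CheckpointConfig), w2v["cfg"])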

Alternative install

We provide an alternative Dockerfile named wav2letter.Dockerfile that uses the wav2letter/wav2letter:cpu-latest Docker image as its FROM base. Here are the commands to build, run, and execute recognition in this case:

docker build -t wav2vec2 -f wav2letter.Dockerfile .
docker run -d -it --rm -v $PWD/data:/root/data --name w2v2 wav2vec2
docker exec -it w2v2 bash
python examples/wav2vec/recognize.py --wav_path /root/data/temp.wav --w2v_path /root/data/wav2vec_small_10m.pt --target_dict_path /root/data/dict.ltr.txt 

Contributors

Thanks to all contributors to this repo.

wave2vec-recognize-docker's Issues

Docker build error with fairseq - feb5f07

I've pulled the latest commit of this repo, tried running docker build, and got this error:

WARNING: You are using pip version 19.3; however, version 20.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Traceback (most recent call last):
  File "examples/speech_recognition/infer.py", line 17, in <module>
    import editdistance
ModuleNotFoundError: No module named 'editdistance'
The command '/bin/sh -c pip install --editable ./ && python examples/speech_recognition/infer.py --help && python examples/wav2vec/recognize.py --help' returned a non-zero code: 1

After fixing that by adding pip install editdistance, I ran into this:

/usr/local/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
  File "examples/wav2vec/recognize.py", line 10, in <module>
    from fairseq.models.wav2vec.wav2vec2_asr import base_architecture, Wav2VecEncoder
ImportError: cannot import name 'base_architecture' from 'fairseq.models.wav2vec.wav2vec2_asr' (/app/fairseq/fairseq/models/wav2vec/wav2vec2_asr.py)
The command '/bin/sh -c pip install --editable ./ && python examples/speech_recognition/infer.py --help && python examples/wav2vec/recognize.py --help' returned a non-zero code: 1

I tried wav2letter.Dockerfile, but still got the above error.
Environment:

  • Docker 20.10.
  • Amazon EC2 Ubuntu 18.04 instance (Linux 5.4.0-1029-aws).

I think torch still requires a GPU to install, or a newer version of torch requires it.
Have you run into this? Or should we update the Dockerfile(s)?

Understanding whether it is possible to use your own training checkpoint as the model file

Hello,

I've been busy with the default fairseq examples/speech_recognition/infer.py and also this repo's recognize.py, trying to see if it is possible to run inference using a model we made ourselves by finetuning a base model. We can get the infer.py script to work, but I've noticed that it needs to be able to find the original base model on disk. Moving the checkpoint to a different machine is cumbersome: the base model has to be in the same location on the target machine.

I've tried to study how the model loading works for almost a day now, but I can't wrap my head around it. I think it only needs some args from the original base model, but there is a lot of conversion going on between formats and names: cfg, w2v_args, OmegaConf, and Namespace.

Both recognize.py and recognize.hydra.py break on loading a checkpoint file (though they work on published finetuned models). It would help if there were a way to produce a model file that works with recognize.py from the original base model and a checkpoint. I have not been able to find such a tool; I believe it is as simple as adding the correct .cfg.w2v_args info to the checkpoint, but I don't understand how.
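
A hedged sketch of that idea (untested; the exact key layout differs between fairseq versions, and both paths are hypothetical):

import torch

# Load both checkpoints on CPU.
base = torch.load("wav2vec_small.pt", map_location="cpu")
ckpt = torch.load("checkpoint_best.pt", map_location="cpu")

# Embed the base model's config as w2v_args so the finetuned checkpoint
# no longer needs the base model file on disk (hypothetical key path).
ckpt["cfg"]["model"]["w2v_args"] = base["cfg"]
torch.save(ckpt, "checkpoint_portable.pt")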

I can get recognize.py to work with a checkpoint file using the patch below, but then model loading still refers to the original base model.

@@ -139,13 +162,24 @@ class Wav2VecPredictor:
         return feats
 
     def _load_model(self, model_path, target_dict):
-        w2v = torch.load(model_path)
-
+        #w2v = torch.load(model_path)
+        #if w2v['args'] is None:
+        #    w2v['args'] = Namespace()
         # Without create a FairseqTask
-        args = base_architecture(w2v["args"])
-        model = Wav2VecCtc(args, Wav2VecEncoder(args, target_dict))
-        model.load_state_dict(w2v["model"], strict=True)
-        return model
+        #args = base_architecture(w2v["args"])
+        #model = Wav2VecCtc(args, Wav2VecEncoder(args, target_dict))
+        #model.load_state_dict(w2v["model"], strict=True)
+
+        models, saved_cfg, task = load_model_ensemble_and_task(
+            utils.split_paths(model_path),
+            arg_overrides=None, # ast.literal_eval(args.model_overrides),
+            task=None,
+            suffix="",
+            strict=True,
+            num_shards=1,
+            state=None
+        )
+        return models[0]
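
For reference, a self-contained sketch of the loader the patch switches to; the arg_overrides value shown is a hypothetical way to repoint the checkpoint's embedded base-model path (check the config keys of your fairseq version):

from fairseq import utils
from fairseq.checkpoint_utils import load_model_ensemble_and_task

def load_model(model_path):
    models, saved_cfg, task = load_model_ensemble_and_task(
        utils.split_paths(model_path),
        # Hypothetical override: point the embedded w2v_path at a local
        # copy of the base model instead of the training machine's path.
        arg_overrides={"w2v_path": "/app/data/wav2vec_small.pt"},
    )
    return models[0]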

KenLM decoder

Hi,
can you please guide me on what needs to be changed/added in your scripts to run inference with a KenLM decoder?

Thanks for the docker!
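
A hedged sketch of how KenLM decoding is usually driven through fairseq's examples/speech_recognition/infer.py rather than these scripts (flag names vary across fairseq versions, and all paths are placeholders):

python examples/speech_recognition/infer.py /path/to/manifest \
    --task audio_pretraining --path /app/data/wav2vec_small_10m.pt \
    --w2l-decoder kenlm --lm-model /path/to/kenlm.bin \
    --lexicon /path/to/lexicon.lst --lm-weight 2 --word-score -1 \
    --criterion ctc --labels ltr --post-process letter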

RuntimeError: [enforce fail at CPUAllocator.cpp:65]

I followed the installation steps, built the Dockerfile (which had its own hiccups: the fairseq repository does not have a base_architecture definition in their models file; I will raise a PR for it separately) and ran the code.

Running command
python examples/wav2vec/recognize.py --wav_path /app/data/test.WAV --w2v_path /app/data/wav2vec_small_10m.pt --target_dict_path /app/data/dict.ltr.txt

Error:
RuntimeError: [enforce fail at CPUAllocator.cpp:65] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 314663671488 bytes. Error code 12 (Cannot allocate memory)

I tried different wav2vec models, with similar errors. Do let me know if more information is needed. I'm running this on an Azure DS VM.

/root/fairseq/examples/speech_recognition/w2l_decoder.py:41: UserWarning: wav2letter python bindings are required to use this functionality.

Hi! Thank you so much for setting up this Docker image. I've been looking for a long time for a simple way to test this model (just put audio in and get text back), and it's ridiculous how complex it all is to set up. This whole thing should've been packaged into a single pip install line! So thank you for your work!

After running the installation example:

docker build -t wav2vec2 -f wav2letter.Dockerfile .
docker run -d -it --rm -v $PWD/data:/root/data --name w2v2 wav2vec2
docker exec -it w2v2 bash

Everything went smoothly up until this line:

python examples/wav2vec/recognize.py --wav_path /root/data/temp.wav --w2v_path /root/data/wav2vec_small_10m.pt --target_dict_path /root/data/dict.ltr.txt

Which I modified to this (because it seemed that the wav file was not found):

#I first cd'd back to the root, and did:

python3 fairseq/examples/wav2vec/recognize.py --wav_path /root/data/temp.wav --w2v_path /root/data/wav2vec2_vox_960h.pt --target_dict_path /root/data/dict.ltr.txt

Now I get this error:

/root/fairseq/examples/speech_recognition/w2l_decoder.py:41: UserWarning: wav2letter python bindings are required to use this functionality. Please install from https://github.com/facebookresearch/wav2letter/wiki/Python-bindings
  "wav2letter python bindings are required to use this functionality. Please install from https://github.com/facebookresearch/wav2letter/wiki/Python-bindings"

I also get this error right above the previous one, even though I believe I have CUDA 9 or 10:

/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0

I do remember that to get CUDA working on a Mac with PyTorch you have to build PyTorch from source, which also seems like another ten thousand steps :(. Let alone integrating that into Docker somehow, which I don't understand.

Anyway, I would appreciate any help whatsoever to get this working. Hopefully this thing can run without CUDA. At the moment I don't want to fine-tune the model, simply test it.
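
On the CUDA warning: the models can run on CPU; a minimal sketch (hypothetical path) of loading a checkpoint explicitly onto the CPU so no NVIDIA driver is needed:

import torch

# Map all tensors to CPU at load time; no GPU or driver required.
w2v = torch.load("/root/data/wav2vec_small_10m.pt", map_location=torch.device("cpu"))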
