facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

License: Other

seamless_communication's Issues

Missing asset card

After several adjustments and adaptations in the repository, I was able to compile the source code on Ubuntu.
However, execution fails because the program cannot find the model's asset card.
I imagine the documentation is missing a step on how to set up for inference only.

(voz) astro@ubuntu:~/dev/voz/fix/seamless_communication$ m4t_predict  "seu tolo, nao sei de nada" t2tt en --src_lang pt
2023-08-22 22:46:12,134 INFO -- m4t_scripts.predict.predict: Running inference on the CPU.
Traceback (most recent call last):
  File "/home/astro/dev/voz/fix/fairseq2/src/fairseq2/assets/card_storage.py", line 76, in load_card
    fp = open(pathname)
FileNotFoundError: [Errno 2] No such file or directory: '/home/astro/dev/voz/voz/lib/python3.8/site-packages/seamless_communication/assets/cards/seamlessM4T_large.yaml'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/astro/dev/voz/voz/bin/m4t_predict", line 8, in <module>
    sys.exit(main())
  File "/home/astro/dev/voz/voz/lib/python3.8/site-packages/m4t_scripts/predict/predict.py", line 69, in main
    translator = Translator(args.model_name, args.vocoder_name, device)
  File "/home/astro/dev/voz/voz/lib/python3.8/site-packages/seamless_communication/models/inference/translator.py", line 60, in __init__
    self.model: UnitYModel = load_unity_model(
  File "/home/astro/dev/voz/fix/fairseq2/src/fairseq2/models/utils/model_loader.py", line 175, in __call__
    card = self.asset_store.retrieve_card(model_name_or_card)
  File "/home/astro/dev/voz/fix/fairseq2/src/fairseq2/assets/store.py", line 101, in retrieve_card
    data = self._storage.load_card(name)
  File "/home/astro/dev/voz/fix/fairseq2/src/fairseq2/assets/card_storage.py", line 78, in load_card
    raise AssetCardNotFoundError(
fairseq2.assets.card_storage.AssetCardNotFoundError: An asset card with the name 'seamlessM4T_large' cannot be found.
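
A possible workaround, assuming the missing YAML cards are present in the cloned source tree and simply were not copied into site-packages by the install, is to copy them next to the installed package. The source location below is an assumption; adjust it to your checkout:

import shutil
from pathlib import Path

import seamless_communication

# Hypothetical location of the asset cards in the cloned repository; adjust as needed.
src_cards = Path.home() / "dev/voz/fix/seamless_communication/src/seamless_communication/assets/cards"
dst_cards = Path(seamless_communication.__file__).parent / "assets" / "cards"

dst_cards.mkdir(parents=True, exist_ok=True)
for card in src_cards.glob("*.yaml"):
    shutil.copy(card, dst_cards / card.name)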

Install issue on ARM64 / embedded GPU

I have the same installation issue on an ARM64 system with an NVIDIA GPU (16 GB or 64 GB), CUDA 11.2.

pip install --verbose --trusted-host fair-package-repo.s3-website-us-east-1.amazonaws.com --extra-index-url http://fair-package-repo.s3-website-us-east-1.amazonaws.com/fairseq2/whl/stable/pt2.0.1/cu118 fairseq2 --verbose
Using pip 23.2.1 from /opt/ssd700gb/venv/lib/python3.8/site-packages/pip (python 3.8)
Non-user install because user site-packages disabled
Created temporary directory: /tmp/pip-build-tracker-1winxm5u
Initialized build tracking at /tmp/pip-build-tracker-1winxm5u
Created build tracker: /tmp/pip-build-tracker-1winxm5u
Entered build tracker: /tmp/pip-build-tracker-1winxm5u
Created temporary directory: /tmp/pip-install-0uco4p7o
Created temporary directory: /tmp/pip-ephem-wheel-cache-t8of0zzg
Looking in indexes: https://pypi.org/simple, http://fair-package-repo.s3-website-us-east-1.amazonaws.com/fairseq2/whl/stable/pt2.0.1/cu118
2 location(s) to search for versions of fairseq2:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/ssd700gb/venv/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 180, in exc_logging_wrapper
status = run_func(*args)
File "/opt/ssd700gb/venv/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 248, in wrapper
return func(self, options, args)
File "/opt/ssd700gb/venv/lib/python3.8/site-packages/pip/_internal/commands/install.py", line 377, in run
requirement_set = resolver.resolve(
File "/opt/ssd700gb/venv/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 101, in resolve
raise error from e
pip._internal.exceptions.DistributionNotFound: No matching distribution found for fairseq2n==0.1.0
Remote version of pip: 23.2.1
Local version of pip: 23.2.1
Was pip installed by pip? True
Removed build tracker: '/tmp/pip-build-tracker-1winxm5u'

Predict using code

I wanted to ask whether it is possible to run inference from code.

I am trying to do S2ST, but I don't know what to put in the speech units part of the code given below:

wav, sr = translator.synthesize_speech(<speech_units>, <tgt_lang>)

# Save the translated audio generation.
torchaudio.save(
    <path_to_save_audio>,
    wav[0].cpu(),
    sample_rate=sr,
)
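
For what it is worth, the tracebacks elsewhere in this thread suggest that the high-level Translator.predict call already returns the synthesized waveform for s2st, so no explicit speech units are needed. A minimal sketch, assuming the constructor and predict signatures shown in the other issues here (and that predict accepts a path to an input wav, as the CLI does); file names and the target language are placeholders:

import torch
import torchaudio
from seamless_communication.models.inference import Translator

translator = Translator("seamlessM4T_large", "vocoder_36langs", torch.device("cuda:0"), torch.float16)

# For s2st, predict() returns the translated text plus the synthesized waveform and sample rate.
translated_text, wav, sr = translator.predict("input.wav", "s2st", "spa")

torchaudio.save("translated.wav", wav[0].cpu(), sample_rate=sr)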

Streaming audio S2TT Example/Guidance

I am interested in streaming audio chunks and performing continuous S2TT. Is this possible with the current code? If so, any guidance would be much appreciated!
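
Not a real streaming setup, but as a stopgap one can approximate it by splitting the incoming audio into fixed-size chunks and running s2tt on each chunk with the same Translator API used elsewhere in this thread. A rough sketch (chunk length and file names are arbitrary, and chunk boundaries will cut words mid-utterance):

import torch
import torchaudio
from seamless_communication.models.inference import Translator

translator = Translator("seamlessM4T_large", "vocoder_36langs", torch.device("cuda:0"), torch.float16)

wav, sr = torchaudio.load("long_recording.wav")  # hypothetical input file
chunk_samples = 10 * sr  # 10-second chunks, chosen arbitrarily

for i, start in enumerate(range(0, wav.shape[1], chunk_samples)):
    chunk_path = f"chunk_{i}.wav"
    torchaudio.save(chunk_path, wav[:, start:start + chunk_samples], sample_rate=sr)
    text, _, _ = translator.predict(chunk_path, "s2tt", "eng")
    print(text)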

m4t s2tt produces bad-quality transcription

On a Colab GPU instance, I set up the m4t runtime environment and tried an s2tt task. It produces bad-quality transcription, shown below, compared with Whisper. I wonder if I have done something wrong in setting up Seamless M4T. My source audio is attached as a zip to this post.

/content/seamless_communication# m4t_predict japanweather.wav s2tt jpn
2023-08-23 12:32:11,231 INFO -- m4t_scripts.predict.predict: Running inference on the GPU.
Using the cached checkpoint of the model 'seamlessM4T_large'. Set force=True to download again.
Using the cached tokenizer of the model 'seamlessM4T_large'. Set force=True to download again.
Using the cached checkpoint of the model 'vocoder_36langs'. Set force=True to download again.
2023-08-23 12:33:46,030 INFO -- m4t_scripts.predict.predict: Translated text in jpn: 台風の最新情報は二十八日三時頃に**の西海を伴う注意必要な状況もありました ⁇ 特に台風に向かって強い風力が続く状況もあります ⁇
/content/seamless_communication#

while using whisper:

whisper japanweather.wav
Detecting language using up to the first 30 seconds. Use --language to specify the language
Detected language: Japanese
[00:00.000 --> 00:03.680] 予防センターから台風の最新情報をお伝えいたします
[00:03.680 --> 00:07.240] 大型で強い台風5号は28日3時現在
[00:07.240 --> 00:11.040] **の西の海上を北に時速20キロで済んでいます
[00:11.040 --> 00:14.440] 中心の気圧は955ヘクトパスカル
[00:14.440 --> 00:17.240] 中心吹きの最大風速は40メートルです
[00:17.240 --> 00:19.760] この後も北上を続けまして
[00:19.760 --> 00:23.840] 28日のうちに**大陸に上陸する見通しです
[00:23.840 --> 00:27.760] 大陸に上陸した後は急速に成力を弱めまして
[00:27.760 --> 00:32.400] 29日には熱帯的やつに変わると見られます
[00:32.400 --> 00:35.440] まだ強い制御庫を保っているということもありまして
[00:35.440 --> 00:40.320] 沖縄周辺、特に先島方面では風が強まるような状況です
[00:40.320 --> 00:45.680] 平均で15メートルを超えるような風の強い状況となることも考えられます
[00:45.680 --> 00:50.320] 恐怖や高波などには引き続き注意が必要といった状況です
[00:50.320 --> 00:53.640] また台風に向かって湿った空気が流れ込む影響で
[00:53.640 --> 00:55.920] 沖縄方面、先島だけではなくて
[00:55.920 --> 01:00.000] 沖縄は本当エリアにも甘くものがかかりやすい状況が続きます
[01:00.000 --> 01:01.760] 短時間ではありますけれども
[01:01.760 --> 01:03.880] 雨がざっと降るようなこともありますし
[01:03.880 --> 01:06.040] 雷の友だう心配もありますので
[01:06.040 --> 01:11.680] 雨や風、そして高波には引き続き注意が必要と言えそうです
[01:11.680 --> 01:14.680] 以上台風に関する情報をお伝えいたしました

japanweather.wav.zip

The installation cannot be completed

PIP version: pip 23.2.1
Python version: python 3.10
Error message after running pip install fairseq2==0.1:

Collecting fairseq2==0.1
  Obtaining dependency information for fairseq2==0.1 from https://files.pythonhosted.org/packages/cd/27/46c14e28e8cb0aa602660ce64d4547a37f460d382e4fcf94f2a53d47e5b0/fairseq2-0.1.0-py3-none-any.whl.metadata
  Using cached fairseq2-0.1.0-py3-none-any.whl.metadata (1.2 kB)
INFO: pip is looking at multiple versions of fairseq2 to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement fairseq2n==0.1.0 (from fairseq2) (from versions: none)
ERROR: No matching distribution found for fairseq2n==0.1.0

precision error

(seamless) root@55d07513038c:~/seamless_communication# m4t_predict mirror.wav s2tt eng --model_name seamlessM4T_large
2023-08-23 04:07:17,769 INFO -- m4t_scripts.predict.predict: Running inference on the CPU.
Using the cached checkpoint of the model 'seamlessM4T_large'. Set `force=True` to download again.
Using the cached tokenizer of the model 'seamlessM4T_large'. Set `force=True` to download again.
Using the cached checkpoint of the model 'vocoder_36langs'. Set `force=True` to download again.
Traceback (most recent call last):
  File "/root/miniconda3/envs/seamless/bin/m4t_predict", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/seamless/lib/python3.10/site-packages/m4t_scripts/predict/predict.py", line 70, in main
    translated_text, wav, sr = translator.predict(
  File "/root/miniconda3/envs/seamless/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/inference/translator.py", line 209, in predict
    result = self.get_prediction(
  File "/root/miniconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/inference/translator.py", line 120, in get_prediction
    return generator(
  File "/root/miniconda3/envs/seamless/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/unity/generator.py", line 173, in __call__
    text_output = self.s2t_generator.generate_ex(source_seqs, source_seq_lens)
  File "/root/miniconda3/envs/seamless/lib/python3.10/site-packages/fairseq2/generation/text.py", line 155, in generate_ex
    return self._do_generate(source_seqs, source_seq_lens)
  File "/root/miniconda3/envs/seamless/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/seamless/lib/python3.10/site-packages/fairseq2/generation/text.py", line 71, in _do_generate
    encoder_output, encoder_padding_mask = self.model.encode(
  File "/root/miniconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/unity/model.py", line 190, in encode
    seqs, padding_mask = self.encoder_frontend(seqs, seq_lens)
  File "/root/miniconda3/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/seamless/lib/python3.10/site-packages/fairseq2/models/wav2vec2/frontend.py", line 130, in forward
    seqs, seq_lens = self.extract_features(seqs, seq_lens)
  File "/root/miniconda3/envs/seamless/lib/python3.10/site-packages/fairseq2/models/wav2vec2/frontend.py", line 163, in extract_features
    seqs = self.post_extract_layer_norm(seqs)
  File "/root/miniconda3/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/seamless/lib/python3.10/site-packages/fairseq2/nn/normalization.py", line 107, in forward
    return layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
  File "/root/miniconda3/envs/seamless/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Any ideas?
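
A guess: judging by the newer Translator signature further down in this thread (which takes a dtype argument), the model is loaded in fp16 by default, and the CPU LayerNorm kernel has no Half implementation. If your installed version accepts a dtype, constructing the Translator explicitly with float32 on CPU should avoid the error; a sketch, not verified against this exact version:

import torch
from seamless_communication.models.inference import Translator

# Run in full precision on CPU; fp16 LayerNorm is not implemented for CPU tensors.
translator = Translator("seamlessM4T_large", "vocoder_36langs", torch.device("cpu"), torch.float32)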

CPU

Can I run this on CPU only, on a Windows machine?

Error when fine-tuning the model: ModuleNotFoundError: No module named 'fairseq2.models.unity'

Hello, I am following the fine-tuning guide with the following command:

torchrun --rdzv-backend=c10d --rdzv-endpoint=localhost:0 --nnodes=1 --nproc-per-node=8 --no-python python finetune.py --mode SPEECH_TO_SPEECH --train_dataset ./m4t_dataset/train_manifest.json --eval_dataset ./m4t_dataset/validation_manifest.json --learning_rate 1e-6 --warmup_steps 100 --max_epochs 10 --patience 3 --model_name seamlessM4T_large --save_model_to ./m4t_dataset/checkpoint.pt

However, this happens:

Traceback (most recent call last):
  File "/home/privateserver/Coding/seamless_communication/scripts/m4t/finetune/finetune.py", line 16, in <module>
    import trainer
  File "/home/privateserver/Coding/seamless_communication/scripts/m4t/finetune/trainer.py", line 21, in <module>
    from fairseq2.models.unity import UnitYModel
ModuleNotFoundError: No module named 'fairseq2.models.unity'
(The same traceback is printed by each of the remaining worker ranks.)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 87402) of binary: python
Traceback (most recent call last):
  File "/home/privateserver/Coding/seamless_communication/venv/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/privateserver/Coding/seamless_communication/venv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/privateserver/Coding/seamless_communication/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/privateserver/Coding/seamless_communication/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/privateserver/Coding/seamless_communication/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/privateserver/Coding/seamless_communication/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
python FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-08-23_01:55:36
  host      : privateserver
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 87403)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2023-08-23_01:55:36
  host      : privateserver
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 87404)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time      : 2023-08-23_01:55:36
  host      : privateserver
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 87405)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
  time      : 2023-08-23_01:55:36
  host      : privateserver
  rank      : 4 (local_rank: 4)
  exitcode  : 1 (pid: 87406)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
  time      : 2023-08-23_01:55:36
  host      : privateserver
  rank      : 5 (local_rank: 5)
  exitcode  : 1 (pid: 87407)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]:
  time      : 2023-08-23_01:55:36
  host      : privateserver
  rank      : 6 (local_rank: 6)
  exitcode  : 1 (pid: 87408)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[7]:
  time      : 2023-08-23_01:55:36
  host      : privateserver
  rank      : 7 (local_rank: 7)
  exitcode  : 1 (pid: 87409)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-08-23_01:55:36
  host      : privateserver
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 87402)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Can someone tell me how to fix this? Thank you
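
One observation that may help: in the other tracebacks in this thread, UnitYModel lives under seamless_communication.models.unity (see the translator stack traces), not under fairseq2.models.unity. If the installed fairseq2 no longer ships that module, changing the import in trainer.py might be enough; this is an assumption, not a confirmed fix:

# trainer.py, line 21 (hypothetical change): import UnitYModel from the
# seamless_communication package instead of fairseq2.
from seamless_communication.models.unity import UnitYModel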

run time error

$m4t_predict hello how are you t2st eng --src_lang eng --output_path .
m4t_predict: command not found

I am trying to run this but I am not able to get a result. How do I get this working?

Albanian language support

Hi there,
I see that Albanian is not on the list of supported languages. What can be done by anyone from the Albanian-speaking community to have our language supported?

CoreML version

Has anyone tried to optimize this PyTorch model for the Apple Neural Engine by creating a Core ML model?

Voice translated incorrectly, into generic phrases

The speech is translated into generic text that has nothing in common with the original: "I don't know", "I don't know what to do", etc.

Command used: m4t_predict test.wav s2st eng --output_path res.wav
Source language: Russian, target language: English

3 test runs, using the same input data:

  1. m4t_scripts.predict.predict: Translated text in eng: I don't know what to do. I don't know what to do. I don't know what to do.
  2. m4t_scripts.predict.predict: Translated text in eng: I'm not sure I'm going to be able to do it, but I'm going to do it.
  3. Translated text in eng: I don't know. I don't know. I don't know.

OS: Ubuntu 22.04.3 LTS
Python: 3.11.4
Conda env
GPU: RTX 2060

Out of RAM with on-device models

Hello,
My computer runs a fanless Linux Mint (latest version) with 6 GB of RAM,
with no other programs running.
free -h:
              total        used        free      shared  buff/cache   available
Mem:          5,6Gi       398Mi       4,2Gi       109Mi       1,0Gi       4,9Gi
Swap:         2,0Gi       350Mi       1,7Gi

The code below gets killed, and htop shows the RAM and swap usage going to 100%.
Any help please?

import torchaudio
import torch

TEST_AUDIO_PATH = "jfk.wav"
TGT_LANG = "eng"
audio_input, _ = torchaudio.load(TEST_AUDIO_PATH) # Load waveform using torchaudio
s2t_model = torch.jit.load("unity_on_device_s2t.ptl") # Load exported S2T model
text = s2t_model(audio_input, tgt_lang=TGT_LANG) # Forward call with tgt_lang specified for ASR or S2TT
print(text)

libsndfile Error

I first cloned the repo and then created a conda environment.

Afterwards I ran the following two commands, as stated in the Installation section:

pip install .
conda install -y -c conda-forge libsndfile

Installation completes without an error. However, if I run the following command:

python3 scripts/m4t/predict/predict.py "Teknolojiyi merkezine alan yenilikçi yapımızla hayatını kolaylaştırmaya devam ediyoruz." t2tt eng --src_lang tur

I get the following output:

Traceback (most recent call last):
  File "/home/guvenc/seamless_communication/scripts/m4t/predict/predict.py", line 10, in <module>
    from seamless_communication.models.inference import Translator
  File "/home/guvenc/.local/lib/python3.10/site-packages/seamless_communication/models/inference/__init__.py", line 6, in <module>
    from seamless_communication.models.inference.translator import Translator as Translator
  File "/home/guvenc/.local/lib/python3.10/site-packages/seamless_communication/models/inference/translator.py", line 12, in <module>
    from fairseq2.data import Collater
  File "/home/guvenc/.local/lib/python3.10/site-packages/fairseq2/data/__init__.py", line 7, in <module>
    from fairseq2.data.cstring import CString as CString
  File "/home/guvenc/.local/lib/python3.10/site-packages/fairseq2/data/cstring.py", line 58, in <module>
    from fairseq2n.bindings.data.string import CString as CString
  File "/home/guvenc/.local/lib/python3.10/site-packages/fairseq2n/__init__.py", line 90, in <module>
    _load_sndfile()
  File "/home/guvenc/.local/lib/python3.10/site-packages/fairseq2n/__init__.py", line 80, in _load_sndfile
    raise OSError(
OSError: libsndfile is not found!. Use your system package manager to install it (e.g. `apt install libsndfile1`).

I use Ubuntu based Pop!_OS.

sudo apt install libsndfile1 outputs:

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libsndfile1 is already the newest version (1.0.31-2build1).
The following packages were automatically installed and are no longer required:
  golang-1.18-go golang-1.18-src golang-src
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 250 not upgraded.

Installing libsndfile (https://github.com/libsndfile/libsndfile) from source didn't help either.

I wonder what I might be doing wrong.
Thanks!
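
Since apt reports the library as installed, one thing worth checking is whether the dynamic loader inside the active conda environment can actually resolve it. A small check, assuming the usual soname shipped by the libsndfile1 package:

import ctypes

# Raises OSError if the loader cannot locate the library from the current environment;
# if it fails here, fairseq2n will fail the same way.
ctypes.CDLL("libsndfile.so.1")
print("libsndfile found")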

Fine-tuning on single GPU

Is it possible to fine-tune this in a limited hardware environment, like a single 3090?

Any thoughts on a LoRA implementation?
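
For reference, the torchrun command from the fine-tuning issue above can presumably be run as a single process by dropping the per-node process count to one, e.g. torchrun --rdzv-backend=c10d --rdzv-endpoint=localhost:0 --nnodes=1 --nproc-per-node=1 --no-python python finetune.py ... (remaining arguments as in that issue). Whether the large checkpoint plus optimizer state fits in a single 3090's 24 GB is a separate question; this is a guess, not a tested configuration.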

OSError: libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory

OS: ubuntu 18.04
Python: 3.9
PyTorch: 2.0.1+cu117

Installation completed successfully, as follows:

pip install .
conda install -y -c conda-forge libsndfile

Running the following command fails with an error:

m4t_predict <input_text> t2tt <tgt_lang> --src_lang <src_lang>

Traceback (most recent call last):
  File "/data/seamless_communication/build/lib/m4t_scripts/predict/predict.py", line 9, in <module>
    import torchaudio
  File "/home/ai/anaconda3/envs/cpm39/lib/python3.9/site-packages/torchaudio/__init__.py", line 1, in <module>
    from torchaudio import (  # noqa: F401
  File "/home/ai/anaconda3/envs/cpm39/lib/python3.9/site-packages/torchaudio/_extension.py", line 135, in <module>
    _init_extension()
  File "/home/ai/anaconda3/envs/cpm39/lib/python3.9/site-packages/torchaudio/_extension.py", line 105, in _init_extension
    _load_lib("libtorchaudio")
  File "/home/ai/anaconda3/envs/cpm39/lib/python3.9/site-packages/torchaudio/_extension.py", line 52, in _load_lib
    torch.ops.load_library(path)
  File "/home/ai/anaconda3/envs/cpm39/lib/python3.9/site-packages/torch/_ops.py", line 643, in load_library
    ctypes.CDLL(path)
  File "/home/ai/anaconda3/envs/cpm39/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory
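
A frequent cause of this error is a torchaudio wheel built against a different torch/CUDA build than the installed torch, though that is only an assumption for this environment. Since importing torchaudio is exactly what crashes here, the installed versions can be compared via package metadata instead:

import importlib.metadata

import torch

# The torch and torchaudio versions (and their CUDA tags) should come from matching builds.
print("torch:", torch.__version__, "cuda:", torch.version.cuda)
print("torchaudio:", importlib.metadata.version("torchaudio"))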

Hardware Requirements for Deploying seamless_communication on a Linux Server

Hello,

I'm interested in deploying the seamless_communication project on my own Linux server. Before proceeding, I'd like to ensure that my server meets the necessary hardware requirements.

Could you please provide details on the recommended or minimum hardware specifications for running this project? Specifically, I'm looking for information on:

CPU requirements (e.g., number of cores, speed)
RAM size
GPU requirements (if applicable)
Disk space
I've gone through the documentation, but I couldn't find specific details regarding these hardware aspects. Any guidance or best practices related to hardware would be greatly appreciated.

Thank you for your time and assistance!

Best regards,

KeyError: 'generator'

Due to Internet access issues, I downloaded multitask_unity_large.pt and tokenizer.model myself from https://huggingface.co/facebook/seamless-m4t-large/tree/main, and I modified the pathnames in fairseq2/models/utils/model_loader.py and fairseq2/models/nllb/loader.py for the model and tokenizer.

But running the following fails with an error:

m4t_predict <input_text> t2tt <tgt_lang> --src_lang <src_lang>

Traceback (most recent call last):
  File "/data/anaconda3/envs/seamless/lib/python3.10/site-packages/fairseq2/models/utils/checkpoint_loader.py", line 83, in load_checkpoint
    checkpoint = converter(checkpoint)
  File "/data/anaconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/vocoder/loader.py", line 29, in _upgrade_checkpoint
    old_state_dict = checkpoint["generator"]
KeyError: 'generator'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/anaconda3/envs/seamless/bin/m4t_predict", line 8, in <module>
    sys.exit(main())
  File "/data/anaconda3/envs/seamless/lib/python3.10/site-packages/m4t_scripts/predict/predict.py", line 72, in main
    translator = Translator(args.model_name, args.vocoder_name, device, dtype)
  File "/data/anaconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/inference/translator.py", line 79, in __init__
    self.vocoder: Vocoder = self.load_model_for_inference(
  File "/data/anaconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/inference/translator.py", line 90, in load_model_for_inference
    model = load_model_fn(model_name_or_card, device=device, dtype=dtype)
  File "/data/anaconda3/envs/seamless/lib/python3.10/site-packages/fairseq2/models/utils/model_loader.py", line 188, in __call__
    checkpoint = load_checkpoint(
  File "/data/anaconda3/envs/seamless/lib/python3.10/site-packages/fairseq2/models/utils/checkpoint_loader.py", line 85, in load_checkpoint
    raise_error(ex)
  File "/data/anaconda3/envs/seamless/lib/python3.10/site-packages/fairseq2/models/utils/checkpoint_loader.py", line 70, in raise_error
    raise RuntimeError(
RuntimeError: The load of the checkpoint of the model 'vocoder_36langs' has failed. See nested exception for details.
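
The _upgrade_checkpoint frame above shows the vocoder loader expecting a top-level "generator" key, so the error is consistent with the edited path pointing the vocoder loader at the M4T checkpoint instead of the vocoder checkpoint; that is an assumption based on the description of the manual path changes. A quick way to check which file the modified path actually refers to:

import torch

# Hypothetical path: whatever the edited loader now uses for 'vocoder_36langs'.
ckpt = torch.load("/path/to/vocoder_checkpoint.pt", map_location="cpu")
print(list(ckpt.keys()))  # a vocoder checkpoint should contain a "generator" entry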

seamless communication on raspberry pi 4 or Nvidia Jetson

Is it possible to use the pretrained model for inference on a Raspberry Pi 4 or an NVIDIA Jetson?
What are the hardware requirements for running it on such boards?
Can anyone help me, or does anyone have experience running a pretrained model like this on such boards?

m4t_predict: command not found

Unable to find the m4t_predict command. Can you please advise?

ubuntu 20.04.1

pip --version
pip 23.0.1 from /usr/local/lib/python3.8/dist-packages/pip (python 3.8)
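
As a fallback, other issues in this thread invoke the predict script directly from the cloned repository instead of relying on the installed entry point, e.g. python3 scripts/m4t/predict/predict.py "<input_text>" t2tt <tgt_lang> --src_lang <src_lang>. The m4t_predict command itself should only appear after pip install . has completed in an environment whose bin directory is on PATH; that last part is an assumption about this particular setup.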

Commercial License

#28

I agree with this. Please make it a commercial license. LLaMA 2 is now everywhere; I hope this model can have similar success. Without a commercial license, there is not much use for this model. I support @bitnom's comments in the above issue. Since it is closed, I am raising a new one to get some support on this.

Finetuning on custom dataset for ASR

Hi, I have a custom dataset in one language, with a CSV file of labels and file paths, as well as a directory of audio files.

Can anyone suggest what steps they took to fine-tune the model (especially for a monolingual ASR task)? I am unclear how to prepare the local dataset, given that I would not have the manifest and many other details present in the FLEURS dataset used in the example fine-tuning.

Any help is greatly appreciated!

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

root@Ubuntu-2204-jammy-amd64-base ~/seamless/seamless_communication # python3 scripts/m4t/predict/predict.py привет t2tt eng --src_lang rus
INFO:__main__:Running inference on the CPU.
Using the cached checkpoint of the model 'seamlessM4T_large'. Set `force=True` to download again.
Using the cached tokenizer of the model 'seamlessM4T_large'. Set `force=True` to download again.
Using the cached checkpoint of the model 'vocoder_36langs'. Set `force=True` to download again.
Traceback (most recent call last):
  File "/root/seamless/seamless_communication/scripts/m4t/predict/predict.py", line 86, in <module>
    main()
  File "/root/seamless/seamless_communication/scripts/m4t/predict/predict.py", line 67, in main
    translated_text, wav, sr = translator.predict(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/seamless_communication/models/inference/translator.py", line 209, in predict
    result = self.get_prediction(
  File "/usr/local/lib/python3.10/dist-packages/seamless_communication/models/inference/translator.py", line 120, in get_prediction
    return generator(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/seamless_communication/models/unity/generator.py", line 175, in __call__
    text_output = self.t2t_generator.generate_ex(source_seqs, source_seq_lens)
  File "/usr/local/lib/python3.10/dist-packages/fairseq2/generation/text.py", line 155, in generate_ex
    return self._do_generate(source_seqs, source_seq_lens)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/fairseq2/generation/text.py", line 71, in _do_generate
    encoder_output, encoder_padding_mask = self.model.encode(
  File "/usr/local/lib/python3.10/dist-packages/seamless_communication/models/unity/model.py", line 191, in encode
    return self.encoder(seqs, padding_mask)  # type: ignore[no-any-return]
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/fairseq2/nn/transformer/encoder.py", line 155, in forward
    seqs, padding_mask = layer(seqs, padding_mask)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/fairseq2/nn/transformer/encoder_layer.py", line 175, in forward
    seqs = self._forward_self_attn(seqs, padding_mask)
  File "/usr/local/lib/python3.10/dist-packages/fairseq2/nn/transformer/encoder_layer.py", line 187, in _forward_self_attn
    seqs = self.self_attn_layer_norm(seqs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/fairseq2/nn/normalization.py", line 107, in forward
    return layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Hello. I have a problem running:

python3 scripts/m4t/predict/predict.py привет t2tt eng --src_lang rus

Any ideas how to solve this?

Licensing and speech APIs (Dear Mark)

I just want to put it on the record here that achieving anything close to what this model provides is prohibitively expensive and prone to technical issues, for any startup. We could really benefit from a conditional, commercial-friendly license for this model.

At least LLAMA2 gave us a pathway to bringing useful things to market. Open-source will eventually surpass this model anyway. It seems like it would have been super-cool to lock startups into a LLAMA2-like license, which would pay off down the road.

I got really excited when I saw the news headline for this model. Elevenlabs and bark are not fun to build around (Not throwing shade).

Please Mark, liberate us from them.

Bug in SeamlessM4T-Large: t2tt when target is 'yue'

When the seamlessM4T_large model is used for t2tt with tgt_lang='yue' and src_lang='eng', the returned results are in Mandarin with Simplified Han glyphs (the expected results are in Cantonese with Traditional Han glyphs).

import torch
from seamless_communication.models.inference import Translator

translator_medium = Translator("seamlessM4T_medium", "vocoder_36langs", torch.device("cuda:0"), torch.float16)
translator_large = Translator("seamlessM4T_large", "vocoder_36langs", torch.device("cuda:0"), torch.float16)

message_to_translate = 'The forces of Syria’s president, Bashar al-Assad, struck soon after 2am. Residents of Ghouta, a Damascus suburb, told reporters that they heard a strange noise, as if someone was opening a bottle of Pepsi. A local doctor, fighting back tears, explained that many people had sought shelter underground, but the gas was heavier than air and it pooled in basements and cellars. Had they climbed the stairs instead, they would have lived.'
translated_text, _, _ = translator_medium.predict(message_to_translate, "t2tt", 'yue', src_lang='eng')
# with the medium-size model, we get the expected Cantonese content in Traditional Han glyphs
print(f'from medium model: {translated_text}')
translated_text, _, _ = translator_large.predict(message_to_translate, "t2tt", 'yue', src_lang='eng')
# with the large-size model, we get Mandarin content in Simplified Han glyphs (NOT the expected yue in Traditional Han script)
print(f'from large model: {translated_text}')

The results:

from medium model: 敘利亞總統 Bashar al-Assad 嘅軍隊喺早上 2 點之後好快就擊中咗 達馬士革郊區 Ghouta 嘅居民話畀記者知 佢哋聽到一個奇怪嘅聲音 就好似有人打開一個<unk>酒瓶 一位當地醫生反抗眼淚 佢話好多人喺地下尋求庇護 但氣體比空氣好重 佢哋喺地下室同地下室聚集
from large model: 叙利亚总统巴沙尔·阿萨德 (Bashar al-Assad) 的军队在凌晨 2 点袭击. 大马士革郊区古塔 (Ghouta) 的居民告诉记者,他们听到一个奇怪的噪音,好像有人在打开百事可乐的瓶子. 一个当地医生,控制着眼泪,解释说许多人寻求地下避难所,但气体比空气更重,它聚集在地下室和地下室.如果他们爬上楼梯,他们会活下来.

Hosting the model on a notebook

Hi there. I tried to install the package with the following commands:

pip install fairseq2==0.1
pip install .

However, I got the following errors:

"ERROR: Cannot uninstall 'TBB'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
ERROR: Directory '.' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.".

May I ask how to solve this issue? Thank you very much in advance.
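
Two guesses, untested here: the TBB error typically appears when pip tries to replace a distutils/conda-installed package, and pip's --ignore-installed flag is the commonly suggested workaround; the second error means '.' is only installable from the directory that contains the repository's setup.py or pyproject.toml, so either cd into the cloned seamless_communication checkout before running pip install ., or install straight from GitHub with pip install git+https://github.com/facebookresearch/seamless_communication.git.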

failed in asr task

I tried to test the ASR task in the CLI, but it failed. Am I missing anything?

$m4t_predict --model seamlessM4T_medium 16k.wav asr eng
2023-08-23 16:17:41,203 INFO -- m4t_scripts.predict.predict: Running inference on the GPU.
Using the cached checkpoint of the model 'seamlessM4T_medium'. Set force=True to download again.
Using the cached tokenizer of the model 'seamlessM4T_medium'. Set force=True to download again.
Using the cached checkpoint of the model 'vocoder_36langs'. Set force=True to download again.
Traceback (most recent call last):
....
File "/home/kaisermac/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kaisermac/miniconda3/lib/python3.11/site-packages/fairseq2/nn/transformer/relative_attention.py", line 293, in forward
raise ValueError(
ValueError: The input sequence length must be less than or equal to the maximum sequence length (4096), but is 16272 instead.

Error with running predict.py

I have a problem while executing the script:

❯ python3 predict.py /home/jambo/Downloads/bbb.mp3 s2st ukr --output_path au.mp3 --model_name seamlessM4T_large
INFO:__main__:Running inference on the GPU.
Downloading the checkpoint of the model 'seamlessM4T_large'...
100%|████████████████████████████████████████████████████████████████████████████████| 10.7G/10.7G [31:36<00:00, 6.03MB/s]
Downloading the tokenizer of the model 'seamlessM4T_large'...
100%|████████████████████████████████████████████████████████████████████████████████| 4.93M/4.93M [00:00<00:00, 6.06MB/s]
Downloading the checkpoint of the model 'vocoder_36langs'...
100%|██████████████████████████████████████████████████████████████████████████████████| 160M/160M [00:27<00:00, 6.06MB/s]
Traceback (most recent call last):
  File "/home/jambo/Documents/myown/python/projects/seamless_communication/src/predict.py", line 86, in <module>
    main()
  File "/home/jambo/Documents/myown/python/projects/seamless_communication/src/predict.py", line 67, in main
    translated_text, wav, sr = translator.predict(
  File "/home/jambo/Documents/myown/python/envs/seamless/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jambo/Documents/myown/python/projects/seamless_communication/src/seamless_communication/models/inference/translator.py", line 209, in predict
    result = self.get_prediction(
  File "/home/jambo/Documents/myown/python/projects/seamless_communication/src/seamless_communication/models/inference/translator.py", line 120, in get_prediction
    return generator(
  File "/home/jambo/Documents/myown/python/envs/seamless/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jambo/Documents/myown/python/projects/seamless_communication/src/seamless_communication/models/unity/generator.py", line 173, in __call__
    text_output = self.s2t_generator.generate_ex(source_seqs, source_seq_lens)
  File "/home/jambo/Documents/myown/python/envs/seamless/lib/python3.10/site-packages/fairseq2/generation/text.py", line 155, in generate_ex
    return self._do_generate(source_seqs, source_seq_lens)
  File "/home/jambo/Documents/myown/python/envs/seamless/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jambo/Documents/myown/python/envs/seamless/lib/python3.10/site-packages/fairseq2/generation/text.py", line 71, in _do_generate
    encoder_output, encoder_padding_mask = self.model.encode(
  File "/home/jambo/Documents/myown/python/projects/seamless_communication/src/seamless_communication/models/unity/model.py", line 191, in encode
    return self.encoder(seqs, padding_mask)  # type: ignore[no-any-return]
  File "/home/jambo/Documents/myown/python/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jambo/Documents/myown/python/projects/seamless_communication/src/seamless_communication/models/unity/adaptor_block.py", line 104, in forward
    seqs, padding_mask = self.inner(seqs, padding_mask, layer_output_hook)
  File "/home/jambo/Documents/myown/python/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jambo/Documents/myown/python/envs/seamless/lib/python3.10/site-packages/fairseq2/nn/transformer/encoder.py", line 155, in forward
    seqs, padding_mask = layer(seqs, padding_mask)
  File "/home/jambo/Documents/myown/python/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jambo/Documents/myown/python/envs/seamless/lib/python3.10/site-packages/fairseq2/models/conformer/block.py", line 124, in forward
    seqs = self._forward_conv(seqs, padding_mask)
  File "/home/jambo/Documents/myown/python/envs/seamless/lib/python3.10/site-packages/fairseq2/models/conformer/block.py", line 169, in _forward_conv
    seqs = self.conv(seqs, padding_mask)
  File "/home/jambo/Documents/myown/python/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jambo/Documents/myown/python/envs/seamless/lib/python3.10/site-packages/fairseq2/models/conformer/convolution.py", line 113, in forward
    seqs = self.pointwise_conv1(seqs)
  File "/home/jambo/Documents/myown/python/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jambo/Documents/myown/python/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 313, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/jambo/Documents/myown/python/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Calculated padded input size per channel: (0). Kernel size: (1). Kernel size can't be greater than actual input size
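
The "Calculated padded input size per channel: (0)" message means the convolution received an (almost) empty sequence, which may indicate that the mp3 was not decoded or resampled as expected; that is an assumption. A possible workaround is to convert the file to 16 kHz mono WAV before running predict (16 kHz being the rate speech models are typically fed, also an assumption here):

import torchaudio

wav, sr = torchaudio.load("/home/jambo/Downloads/bbb.mp3")
# Downmix to mono and resample to 16 kHz before feeding the file to predict.py.
wav = wav.mean(dim=0, keepdim=True)
wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=16000)
torchaudio.save("bbb_16k.wav", wav, sample_rate=16000)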
