
Comments (6)

cndn commented on July 16, 2024

Hey @TapendraBaduwal The small model is vanilla UnitY, without the pretraining components so far due to the size limit, but we will explore adding them. We exported the model with TorchScript and shared the .ptl files (compatible with the torch lite interpreter) here: https://github.com/facebookresearch/seamless_communication/blob/main/docs/m4t/on_device_README.md.
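
For reference, a minimal sketch of loading one of those .ptl exports in Python (the file name is illustrative; on mobile you would use the lite interpreter loader instead of torch.jit.load):

import torch
from torch.jit.mobile import _load_for_lite_interpreter

# The .ptl export can be opened with the regular TorchScript loader on desktop,
# or with the lite interpreter loader that mirrors the mobile runtimes.
model = torch.jit.load("unity_on_device.ptl")
lite_model = _load_for_lite_interpreter("unity_on_device.ptl")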

cndn commented on July 16, 2024

Hey @TapendraBaduwal I confirmed the issue. Working on a fix.

cndn commented on July 16, 2024

Conclusion: the PyTorch runtime in Python only worked with torch <= 1.11.0. I just got UnitY-Small-S2T to work with 2.0 (will update the HF checkpoint soon), but not UnitY-Small yet. Updated the doc: https://github.com/facebookresearch/seamless_communication/blob/main/docs/m4t/on_device_README.md
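
A quick runtime check reflecting that version note (a sketch only; the cutoff is just the behavior reported above):

import torch

# UnitY-Small (full) reportedly only loads under torch <= 1.11.0 in Python,
# while the UnitY-Small-S2T export also works with 2.0.
major, minor = (int(x) for x in torch.__version__.split(".")[:2])
if (major, minor) > (1, 11):
    print(f"torch {torch.__version__}: use the S2T export, or downgrade to <= 1.11.0 for the full model")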

TapendraBaduwal commented on July 16, 2024

@cndn
import torchaudio
import torch

TEST_AUDIO_PATH = "/home/tapendra/Desktop/seamless_communication/jfk.wav"
TGT_LANG = "eng"
audio_input, _ = torchaudio.load(TEST_AUDIO_PATH) # Load waveform using torchaudio
s2t_model = torch.jit.load("/home/tapendra/Desktop/seamless_communication/unity_on_device_s2t.ptl") # Load exported S2T model
text = s2t_model(audio_input, tgt_lang=TGT_LANG) # Forward call with tgt_lang specified for ASR or S2TT
print(text)

Error:
File "/home/tapendra/Desktop/seamless_communication/sp2tt.py", line 10, in
text = s2t_model(audio_input, tgt_lang=TGT_LANG) # Forward call with tgt_lang specified for ASR or S2TT
File "/home/tapendra/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch.py", line 26, in forward
_5 = torch.tensor(-1)
sample = annotate(Dict[str, Optional[Dict[str, Tensor]]], {"net_input": _2, "others": {"lang_tag": _5}})
_8 = (self.generator).generate(sample, None, )
~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
pred_tokens, texts, = _8
return texts[0]
File "code/torch/fairseq/sequence_generator_multi_decoder.py", line 14, in generate
sample: Dict[str, Optional[Dict[str, Tensor]]],
prefix_tokens: Optional[Tensor]=None) -> Tuple[List[List[Dict[str, Tensor]]], List[str]]:
_0 = (self)._generate(sample, prefix_tokens, None, None, False, )
~~~~~~~~~~~~~~~ <--- HERE
return _0
def _generate(self: torch.fairseq.sequence_generator_multi_decoder.MultiDecoderSequenceGenerator,
File "code/torch/fairseq/sequence_generator_multi_decoder.py", line 49, in _generate
src_tokens, src_lengths = _3, _3
_7 = (self.generator_mt.search).init_constraints(constraints, self.generator_mt.beam_size, )
encoder_outs = (self.generator_mt.model).forward_encoder(net_input0, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_8 = self.generator_mt.model.single_model
_9 = _8.target_letter_decoder
File "code/torch/fairseq/sequence_generator_v2.py", line 749, in forward_encoder
else:
_353 = annotate(List[Dict[str, List[Tensor]]], [])
_354 = (getattr(self.models, "0").encoder).forward_torchscript(net_input, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_355 = torch.append(_353, _354)
_352 = _353
File "code/torch/fairseq/models/speech_to_text/xm_transformer.py", line 28, in forward_torchscript
def forward_torchscript(self: torch.fairseq.models.speech_to_text.xm_transformer.Wav2VecEncoderWithAdaptor,
net_input: Dict[str, Tensor]) -> Dict[str, List[Tensor]]:
_3 = (self).forward(net_input["src_tokens"], net_input["src_lengths"], )
~~~~~~~~~~~~~ <--- HERE
return _3
def reorder_encoder_out(self: torch.fairseq.models.speech_to_text.xm_transformer.Wav2VecEncoderWithAdaptor,
File "code/torch/fairseq/models/speech_to_text/xm_transformer.py", line 19, in forward
_0 = torch.fairseq.data.data_utils.lengths_to_padding_mask
padding_mask = _0(src_lengths, )
out = (self.w2v_encoder).forward(src_tokens, padding_mask, )
~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
x = out["encoder_out"]
padding_mask0 = out["padding_mask"]
File "code/torch/fairseq/models/wav2vec/wav2vec2_asr.py", line 25, in forward
else:
_3 = False
res = (_0).extract_features(_1, _2, _3, None, )
~~~~~~~~~~~~~~~~~~~~ <--- HERE
x = res["x"]
padding_mask0 = res["padding_mask"]
File "code/torch/fairseq/models/w2vbert/w2vbert.py", line 283, in extract_features
mask: bool=False,
layer: Optional[int]=None) -> Dict[str, Tensor]:
res = (self).forward(source, padding_mask, mask, True, layer, None, None, None, )
~~~~~~~~~~~~~ <--- HERE
return res
class ConformerEncoder(Module):
File "code/torch/fairseq/models/w2vbert/w2vbert.py", line 116, in forward
else:
features2, unmasked_features0, padding_mask9 = features1, unmasked_features, padding_mask6
features4 = (self.post_extract_proj).forward(features2, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
features5 = (self.dropout_input).forward(features4, )
unmasked_features2 = (self.dropout_features).forward(unmasked_features0, )
File "code/torch/torch/nn/modules/linear.py", line 13, in forward
input: Tensor) -> Tensor:
_0 = torch.torch.nn.functional.linear
_1 = _0(torch.to(input, 6), torch.to(self.weight, 6), self.bias, )
~~ <--- HERE
return _1
File "code/torch/torch/nn/functional.py", line 4, in linear
weight: Tensor,
bias: Optional[Tensor]=None) -> Tensor:
return torch.linear(input, weight, bias)
~~~~~~~~~~~~ <--- HERE
def dropout(input: Tensor,
p: float=0.5,

Traceback of TorchScript, original code (most recent call last):
File "/private/home/dnn/anaconda3/envs/ilia/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 153, in generate
(default: self.eos)
"""
return self._generate(sample, prefix_tokens)
~~~~~~~~~~~~~~ <--- HERE
File "/private/home/dnn/seamless_main_20230228/fairseq-py/fairseq/models/fairseq_encoder.py", line 50, in forward_torchscript
"""
if torch.jit.is_scripting():
return self.forward(
~~~~~~~~~~~~ <--- HERE
src_tokens=net_input["src_tokens"],
src_lengths=net_input["src_lengths"],
File "/private/home/dnn/seamless_main_20230228/fairseq-py/fairseq/models/wav2vec/wav2vec2_asr.py", line 518, in forward
# x = x.transpose(0, 1)
# else:
res = self.w2v_model.extract_features(source=w2v_args['source'], padding_mask=w2v_args['padding_mask'], mask=(self.apply_mask and self.training))
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
x = res["x"]
padding_mask = res["padding_mask"]
File "/private/home/dnn/seamless_main_20230228/fairseq-py/fairseq/models/w2vbert/w2vbert.py", line 884, in extract_features
def extract_features(self, source, padding_mask, mask:bool=False, layer:Optional[int]=None):
res = self.forward(
~~~~~~~~~~~~ <--- HERE
source, padding_mask, mask=mask, features_only=True, layer=layer
)
File "/private/home/dnn/anaconda3/envs/ilia/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 96, in forward
def forward(self, input: Tensor) -> Tensor:
return F.linear(input.float(), self.weight.float(), self.bias)
~~~~~~~~ <--- HERE
File "/private/home/dnn/anaconda3/envs/ilia/lib/python3.8/site-packages/torch/nn/functional.py", line 1847, in linear
if has_torch_function_variadic(input, weight):
return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
return torch._C._nn.linear(input, weight, bias)
~~~~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: self and mat2 must have the same dtype
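
For what it's worth, that final RuntimeError is torch's generic check that the tensors passed to linear/matmul share a dtype. A standalone toy reproduction (illustrative only; the exact wording can vary between torch versions):

import torch
import torch.nn.functional as F

x = torch.randn(2, 4, dtype=torch.float32)  # float32 activations
w = torch.randn(3, 4, dtype=torch.float64)  # float64 weights
F.linear(x, w)  # raises a RuntimeError such as "self and mat2 must have the same dtype"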

cndn commented on July 16, 2024

@TapendraBaduwal Could you try this S2T model first? https://dl.fbaipublicfiles.com/seamlessM4T/models/small/unity_on_device_s2t.ptl I'm working on the full model and will upload the HF paths when it's done.
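
In case it helps, an end-to-end sketch of trying that S2T checkpoint, mirroring the snippet earlier in the thread (paths and the 16 kHz mono input are assumptions):

# e.g. first: wget https://dl.fbaipublicfiles.com/seamlessM4T/models/small/unity_on_device_s2t.ptl
import torch
import torchaudio

audio, sample_rate = torchaudio.load("jfk.wav")        # assumed 16 kHz mono input
s2t_model = torch.jit.load("unity_on_device_s2t.ptl")  # the S2T-only export linked above
text = s2t_model(audio, tgt_lang="eng")                # tgt_lang selects the ASR / S2TT output language
print(text)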

from seamless_communication.

TapendraBaduwal commented on July 16, 2024

@cndn Thank you for the update.
