Comments (4)
Hey @Jamba777, the Translator takes 16 kHz audio, so I guess you need to resample. I will make this explicit in the docs.
from seamless_communication.
It works, thanks!
Hi, I'm still facing the same error, even after using the script below to convert my files to 16 kHz.
I'm using it for an ASR task.
My conversion script:
import os
import torchaudio

resample_rate = 16000
data_dir = "train_audios"
new_dir = "train_audios_16khz"
os.makedirs(new_dir, exist_ok=True)

for fpath in os.listdir(data_dir):
    if fpath.endswith(".mp3"):
        waveform, sample_rate = torchaudio.load(os.path.join(data_dir, fpath))
        resampler = torchaudio.transforms.Resample(sample_rate, resample_rate, dtype=waveform.dtype)
        resampled_waveform = resampler(waveform)
        # Output keeps the original .mp3 file name
        torchaudio.save(os.path.join(new_dir, fpath), resampled_waveform, resample_rate)
    else:
        pass
Okay, I'm not sure what exactly was causing the issue, but it's working now: I used librosa instead of torchaudio and saved the files as .wav instead of .mp3.
Dropping my code here in case it's useful to anyone else :).
import os
import librosa
import soundfile

data_dir = "train_audios"
new_dir = "train_audios_16khz_wav"
os.makedirs(new_dir, exist_ok=True)

for fpath in os.listdir(data_dir):
    if fpath.endswith(".mp3"):
        # librosa.load resamples to the requested rate while decoding
        wav, sr = librosa.load(os.path.join(data_dir, fpath), sr=16000)
        # splitext keeps file names that contain extra dots intact
        stem = os.path.splitext(fpath)[0]
        soundfile.write(os.path.join(new_dir, stem + ".wav"), wav, sr)
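Since the cause of the original failure was never pinned down, it may help to confirm that the converted files really are 16 kHz before passing them to the model. The following is a minimal sketch using only the standard-library `wave` module. The helper name `find_wrong_rate` is hypothetical, and it assumes the files are plain PCM .wav files, which is what `soundfile.write` is expected to produce here by default.

```python
import os
import wave

EXPECTED_RATE = 16000  # sample rate the Translator expects, per the first reply


def find_wrong_rate(wav_dir, expected_rate=EXPECTED_RATE):
    """Return {file name: sample rate} for .wav files whose rate differs.

    Hypothetical helper for illustration; assumes PCM-encoded .wav files.
    """
    mismatches = {}
    for name in sorted(os.listdir(wav_dir)):
        if not name.lower().endswith(".wav"):
            continue  # skip anything that is not a .wav file
        with wave.open(os.path.join(wav_dir, name), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected_rate:
            mismatches[name] = rate
    return mismatches


if __name__ == "__main__":
    wav_dir = "train_audios_16khz_wav"  # output directory of the script above
    if os.path.isdir(wav_dir):
        for name, rate in find_wrong_rate(wav_dir).items():
            print(f"{name}: {rate} Hz")
```

If nothing is printed, every .wav file in the directory reports a 16 kHz sample rate in its header.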