Comments (3)
Now DeepSpeech supports transcribing audio files to .tlog files through its transcribe.py tool. It is implemented in a way that should avoid keeping an audio file completely in memory.
$ python transcribe.py --batch_size 30 --src audio.wav --dst audio.tlog --checkpoint_dir ../checkpoint-0.6.0/
This appears to be an issue with pydub, and its maintainers seem uninterested in fixing it, so the limitation will remain for as long as we're using pydub.
After a bit of investigation I came across the PyAV library as an alternative to pydub. It is fairly simple to open and decode an MP3 using the following code:
import av
c = av.open("largefile.mp3")
cd = c.decode(audio=0)
which yields a generator,
>>> cd
<generator object at 0x7fe3e5ac0af8>
which we can use lazily like:
>>> f = next(cd)
>>> f
<av.AudioFrame 0, pts=0, 576 samples at 22050Hz, mono, s16p at 0x7fe3e598c6c8>
>>> R = av.audio.resampler.AudioResampler(rate=16000)
>>> f.pts = 0
>>> R.resample(f)
<av.AudioFrame 3, pts=None, 402 samples at 16000Hz, mono, s16p at 0x7fe3e598c660>
So wavsplit.py would have to be modified to use PyAV along these lines, but the benefit would be the ability to consume the audio file lazily. wavsplit.py is already written around generators (yield etc.), so PyAV would be a better fit than pydub.
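To make the idea concrete, here is a rough sketch of what lazy consumption could look like. It is not the actual wavsplit.py code: the helper names `decode_pcm` and `rechunk`, the 16 kHz mono s16 target format, and the chunk size are all assumptions for illustration. It also does not flush the resampler at the end, so a few trailing samples may be dropped.

```python
def rechunk(buffers, chunk_bytes):
    """Lazily regroup an iterable of bytes objects into fixed-size chunks."""
    pending = bytearray()
    for buf in buffers:
        pending.extend(buf)
        # Emit as many full chunks as the buffered data allows.
        while len(pending) >= chunk_bytes:
            yield bytes(pending[:chunk_bytes])
            del pending[:chunk_bytes]
    if pending:
        yield bytes(pending)  # final, possibly shorter chunk


def decode_pcm(path, rate=16000):
    """Yield 16-bit mono PCM bytes, one decoded frame at a time (hypothetical helper)."""
    import av  # imported here so rechunk() stays usable without PyAV installed

    resampler = av.AudioResampler(format="s16", layout="mono", rate=rate)
    container = av.open(path)
    try:
        for frame in container.decode(audio=0):
            out = resampler.resample(frame)
            # Older PyAV releases return a single frame (or None);
            # newer ones return a list of frames.
            if out is None:
                continue
            frames = out if isinstance(out, list) else [out]
            for resampled in frames:
                yield resampled.to_ndarray().tobytes()
    finally:
        container.close()
```

Something like `for chunk in rechunk(decode_pcm("largefile.mp3"), 960): ...` would then walk the file in 30 ms pieces (480 samples of 2 bytes each at 16 kHz) without ever holding the whole file in memory.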