Comments (2)
The underlying WebRTC code can only handle frames that are exactly 10, 20, or 30 ms long. You can use webrtcvad.valid_rate_and_frame_length
to check whether a sample rate and frame length combination is valid (see e.g. https://github.com/wiseman/py-webrtcvad/blob/master/test_webrtcvad.py#L20).
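To make the constraint concrete, here is a minimal pure-Python sketch of the rule. The helper name `is_valid_frame` and the two constants are hypothetical and exist only for illustration; the accepted sample rates (8000, 16000, 32000, 48000 Hz) are taken from the py-webrtcvad README rather than from this thread. In real code, the library's own webrtcvad.valid_rate_and_frame_length(rate, frame_length) is the check to use.

```python
# Hypothetical helper that mirrors the constraint described above.
# It is not part of py-webrtcvad; the library's own check is
# webrtcvad.valid_rate_and_frame_length(rate, frame_length).
VALID_RATES_HZ = (8000, 16000, 32000, 48000)  # per the py-webrtcvad README
VALID_FRAME_MS = (10, 20, 30)


def is_valid_frame(sample_rate, num_samples):
    """Return True if num_samples at sample_rate is exactly 10, 20, or 30 ms."""
    if sample_rate not in VALID_RATES_HZ:
        return False
    return any(num_samples == sample_rate * ms // 1000 for ms in VALID_FRAME_MS)


print(is_valid_frame(16000, 480))  # 30 ms at 16 kHz
print(is_valid_frame(16000, 400))  # 25 ms, not an accepted length
```

Note that the length is counted in samples, not bytes: a 30 ms frame at 16 kHz is 480 samples, which is 960 bytes of 16-bit PCM.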
import webrtcvad
import soundfile as sf
import numpy as np
import librosa

def extract_speech_segments(audio_path, output_path):
    # Load the audio file
    audio, sr = librosa.load(audio_path, sr=16000)
    # Set the VAD parameters
    vad = webrtcvad.Vad()
    vad.set_mode(3)  # Aggressiveness level (0-3)
    # Set the frame duration for VAD analysis
    frame_duration = 30  # in milliseconds
    # Convert the frame duration to the number of samples
    frame_size = int(sr * (frame_duration / 1000.0))
    # Initialize variables
    speech_segments = []
    current_segment_start = 0
    current_segment_end = 0
    # Iterate over the audio frames
    for i in range(0, len(audio), frame_size):
        frame = audio[i:i + frame_size]
        # Convert the frame to int16 format
        frame = np.int16(frame * 32768)
        # Check if the frame contains speech
        if vad.is_speech(frame.tobytes(), sample_rate=sr):
            # If it's a new speech segment, update the current segment start
            if current_segment_start == 0:
                current_segment_start = i
            # Update the current segment end
            current_segment_end = i + frame_size
        # If the frame does not contain speech
        else:
            # If we were in a speech segment, add it to the list
            if current_segment_start != 0:
                speech_segments.append((current_segment_start, current_segment_end))
                current_segment_start = 0
                current_segment_end = 0
    # Save the speech segments as individual audio files
    for idx, (start, end) in enumerate(speech_segments):
        segment_audio = audio[start:end]
        segment_output_path = f"{output_path}_segment{idx}.wav"
        sf.write(segment_output_path, segment_audio, sr)
    return speech_segments

# Example usage
audio_path = '/kaggle/working/audio.wav'
output_path = '/kaggle/working/'
speech_segments = extract_speech_segments(audio_path, output_path)
This is my code, and it raises the following error:
Error Traceback (most recent call last)
Cell In[10], line 60
58 audio_path = '/kaggle/working/audio.wav'
59 output_path = '/kaggle/working/'
---> 60 speech_segments = extract_speech_segments(audio_path, output_path)
Cell In[10], line 33, in extract_speech_segments(audio_path, output_path)
30 frame = np.int16(frame * 32768)
32 # Check if the frame contains speech
---> 33 if vad.is_speech(frame.tobytes(), sample_rate=sr):
34 # If it's a new speech segment, update the current segment start
35 if current_segment_start == 0:
36 current_segment_start = i
File /opt/conda/lib/python3.10/site-packages/webrtcvad.py:27, in Vad.is_speech(self, buf, sample_rate, length)
23 if length * 2 > len(buf):
24 raise IndexError(
25 'buffer has %s frames, but length argument was %s' % (
26 int(len(buf) / 2.0), length))
---> 27 return _webrtcvad.process(self._vad, sample_rate, buf, length)
Error: Error while processing frame
I am getting this error even though I have checked the prerequisites.
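One likely cause, given the 10/20/30 ms rule from the first comment: the loop steps through the audio in 480-sample strides, so unless the file length is an exact multiple of 480 samples, the last slice is shorter than 30 ms and is_speech rejects it. The sketch below is a pure-Python illustration under that assumption, not a confirmed diagnosis. The helper names `full_frames` and `group_segments` are hypothetical. It also avoids three other pitfalls in the posted loop: using 0 as the "no segment" marker (which hides a segment starting at sample 0), never appending a segment that runs to the end of the file, and int16 overflow when a sample equals 1.0.

```python
import struct


def full_frames(samples, frame_size):
    """Yield (start_index, int16_bytes) for complete frames only.

    `samples` is a sequence of floats in [-1.0, 1.0]. A trailing partial
    frame is skipped because its length is not 10, 20, or 30 ms.
    """
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        chunk = samples[start:start + frame_size]
        # Scale to 16-bit PCM and clip so that 1.0 does not overflow int16.
        pcm = [max(-32768, min(32767, int(s * 32768))) for s in chunk]
        yield start, struct.pack("<%dh" % frame_size, *pcm)


def group_segments(flags, frame_size):
    """Merge consecutive speech frames into (start, end) sample ranges."""
    segments = []
    seg_start = None  # None, not 0, so a segment may begin at sample 0
    for idx, is_speech in enumerate(flags):
        if is_speech and seg_start is None:
            seg_start = idx * frame_size
        elif not is_speech and seg_start is not None:
            segments.append((seg_start, idx * frame_size))
            seg_start = None
    if seg_start is not None:  # flush a segment that runs to the end
        segments.append((seg_start, len(flags) * frame_size))
    return segments


# 1000 samples at 16 kHz: two full 30 ms frames, 40 samples left over.
samples = [0.0] * 1000
frames = list(full_frames(samples, 480))
print(len(frames), len(frames[0][1]))  # 2 960

# Per-frame decisions would normally come from vad.is_speech(frame_bytes, 16000).
print(group_segments([True, True, False, True], 480))  # [(0, 960), (1440, 1920)]
```

In the posted function, the same effect can be had by changing the loop bound to range(0, len(audio) - frame_size + 1, frame_size), or by zero-padding the final frame to 480 samples if the tail should still be analyzed.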