
Comments (2)

wiseman commented on May 20, 2024

The underlying webrtc code can only handle frames that are 10, 20, or 30 ms long. You can use webrtcvad.valid_rate_and_frame_length to check whether a sample rate/frame size is valid (see e.g. https://github.com/wiseman/py-webrtcvad/blob/master/test_webrtcvad.py#L20).
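As a rough illustration of that constraint, here is a minimal sketch that computes how many bytes one valid frame should contain. It does not call the library; `expected_frame_bytes` and `is_valid_frame` are hypothetical helper names, and the list of supported sample rates is an assumption taken from the py-webrtcvad README rather than from this thread. In real code, `webrtcvad.valid_rate_and_frame_length` is the check to rely on.

```python
# Frames must be 16-bit mono PCM and 10, 20, or 30 ms long.
# The sample rates below are an assumption (py-webrtcvad README),
# not something stated in this thread.
SUPPORTED_RATES_HZ = (8000, 16000, 32000, 48000)
SUPPORTED_FRAME_MS = (10, 20, 30)
BYTES_PER_SAMPLE = 2  # int16


def expected_frame_bytes(sample_rate, frame_ms):
    """Return the byte length of one valid frame, or raise ValueError."""
    if sample_rate not in SUPPORTED_RATES_HZ:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if frame_ms not in SUPPORTED_FRAME_MS:
        raise ValueError(f"unsupported frame duration: {frame_ms} ms")
    return sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE


def is_valid_frame(buf, sample_rate, frame_ms):
    """True if buf holds exactly one frame of frame_ms at sample_rate."""
    return len(buf) == expected_frame_bytes(sample_rate, frame_ms)
```

For example, a 30 ms frame at 16 kHz is 480 samples, so the buffer passed to `is_speech` should be 960 bytes; anything shorter or longer is not a valid frame.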

from py-webrtcvad.

Prakash2608 commented on May 20, 2024

import webrtcvad
import soundfile as sf
import numpy as np
import librosa

def extract_speech_segments(audio_path, output_path):
    # Load the audio file
    audio, sr = librosa.load(audio_path, sr=16000)

    # Set the VAD parameters
    vad = webrtcvad.Vad()
    vad.set_mode(3)  # Aggressiveness level (0-3)

    # Set the frame duration for VAD analysis
    frame_duration = 30  # in milliseconds

    # Convert the frame duration to the number of samples
    frame_size = int(sr * (frame_duration / 1000.0))

    # Initialize variables
    speech_segments = []
    current_segment_start = 0
    current_segment_end = 0

    # Iterate over the audio frames
    for i in range(0, len(audio), frame_size):
        frame = audio[i:i + frame_size]

        # Convert the frame to int16 format
        frame = np.int16(frame * 32768)

        # Check if the frame contains speech
        if vad.is_speech(frame.tobytes(), sample_rate=sr):
            # If it's a new speech segment, update the current segment start
            if current_segment_start == 0:
                current_segment_start = i

            # Update the current segment end
            current_segment_end = i + frame_size

        # If the frame does not contain speech
        else:
            # If we were in a speech segment, add it to the list
            if current_segment_start != 0:
                speech_segments.append((current_segment_start, current_segment_end))
                current_segment_start = 0
                current_segment_end = 0

    # Save the speech segments as individual audio files
    for idx, (start, end) in enumerate(speech_segments):
        segment_audio = audio[start:end]
        segment_output_path = f"{output_path}_segment{idx}.wav"
        sf.write(segment_output_path, segment_audio, sr)

    return speech_segments

# Example usage

audio_path = '/kaggle/working/audio.wav'
output_path = '/kaggle/working/'
speech_segments = extract_speech_segments(audio_path, output_path)

This is my code.

Error                                     Traceback (most recent call last)
Cell In[10], line 60
     58 audio_path = '/kaggle/working/audio.wav'
     59 output_path = '/kaggle/working/'
---> 60 speech_segments = extract_speech_segments(audio_path, output_path)

Cell In[10], line 33, in extract_speech_segments(audio_path, output_path)
     30     frame = np.int16(frame * 32768)
     32     # Check if the frame contains speech
---> 33     if vad.is_speech(frame.tobytes(), sample_rate=sr):
     34         # If it's a new speech segment, update the current segment start
     35         if current_segment_start == 0:
     36             current_segment_start = i

File /opt/conda/lib/python3.10/site-packages/webrtcvad.py:27, in Vad.is_speech(self, buf, sample_rate, length)
     23 if length * 2 > len(buf):
     24     raise IndexError(
     25         'buffer has %s frames, but length argument was %s' % (
     26             int(len(buf) / 2.0), length))
---> 27 return _webrtcvad.process(self._vad, sample_rate, buf, length)

Error: Error while processing frame

I am getting this error even though I have checked the prerequisites.
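One likely cause, given the 10/20/30 ms rule mentioned earlier in the thread: `range(0, len(audio), frame_size)` also yields a final, shorter frame whenever the audio length is not a multiple of `frame_size`, and that partial frame is not a valid length. The sketch below is an assumption about the fix, not a confirmed diagnosis. It iterates over complete frames only, uses `None` rather than `0` as the "not in a segment" marker so a segment can start at sample 0, and closes a segment that is still open when the audio ends. `iter_full_frames`, `group_speech_segments`, and the `is_speech_at` callback are hypothetical names introduced for illustration.

```python
def iter_full_frames(num_samples, frame_size):
    """Yield (start, end) sample indices for complete frames only."""
    for start in range(0, num_samples - frame_size + 1, frame_size):
        yield start, start + frame_size


def group_speech_segments(num_samples, frame_size, is_speech_at):
    """Merge consecutive speech frames into (start, end) sample ranges.

    is_speech_at(start, end) is a hypothetical callback that returns True
    when the frame covering samples [start, end) contains speech.
    """
    segments = []
    segment_start = None  # None, not 0, so a segment may begin at sample 0
    segment_end = None
    for start, end in iter_full_frames(num_samples, frame_size):
        if is_speech_at(start, end):
            if segment_start is None:
                segment_start = start
            segment_end = end
        elif segment_start is not None:
            segments.append((segment_start, segment_end))
            segment_start = None
    if segment_start is not None:  # audio ended while still in speech
        segments.append((segment_start, segment_end))
    return segments
```

With the code above, the callback could wrap the existing call, for instance converting `audio[start:end]` to int16 and passing its bytes to `vad.is_speech(..., sample_rate=sr)`. Clipping the scaled samples to the int16 range before conversion may also be worth adding, since a sample of exactly 1.0 multiplied by 32768 does not fit in int16.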

