amsehili / auditok

An audio/acoustic activity detection and audio segmentation tool

License: MIT License

Python 100.00%
audio-activities audio-data audio-segmentation voice-detection vad voice-activity-detection

auditok's Introduction


auditok is an Audio Activity Detection tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command-line program or by calling its API.

The latest version of the documentation can be found on readthedocs.

Installation

A basic version of auditok will run with standard Python (>=3.4). However, without installing additional dependencies, auditok can only deal with audio files in wav or raw formats. If you want more features, the following packages are needed:

  • pydub : read audio files in popular audio formats (ogg, mp3, etc.) or extract audio from a video file.
  • pyaudio : read audio data from the microphone and play audio back.
  • tqdm : show progress bar while playing audio clips.
  • matplotlib : plot audio signal and detections.
  • numpy : required by matplotlib. Also used for some math operations instead of standard Python if available.

Install the latest stable version with pip:

sudo pip install auditok

Install the latest development version from github:

pip install git+https://github.com/amsehili/auditok

or

git clone https://github.com/amsehili/auditok.git
cd auditok
python setup.py install

Basic example

import auditok

# split returns a generator of AudioRegion objects
audio_regions = auditok.split(
    "audio.wav",
    min_dur=0.2,     # minimum duration of a valid audio event in seconds
    max_dur=4,       # maximum duration of an event
    max_silence=0.3, # maximum duration of tolerated continuous silence within an event
    energy_threshold=55 # threshold of detection
)

for i, r in enumerate(audio_regions):

    # Regions returned by `split` have 'start' and 'end' metadata fields
    print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))

    # play detection
    # r.play(progress_bar=True)

    # region's metadata can also be used with the `save` method
    # (no need to explicitly specify region's object and `format` arguments)
    filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
    print("region saved as: {}".format(filename))

output example:

Region 0: 0.700s -- 1.400s
region saved as: region_0.700-1.400.wav
Region 1: 3.800s -- 4.500s
region saved as: region_3.800-4.500.wav
Region 2: 8.750s -- 9.950s
region saved as: region_8.750-9.950.wav
Region 3: 11.700s -- 12.400s
region saved as: region_11.700-12.400.wav
Region 4: 15.050s -- 15.850s
region saved as: region_15.050-15.850.wav

Split and plot

Visualize audio signal and detections:

import auditok
region = auditok.load("audio.wav") # returns an AudioRegion object
regions = region.split_and_plot(...) # or just region.splitp()

output figure:

doc/figures/example_1.png

Limitations

Currently, the core detection algorithm is based on the energy of the audio signal. While this is fast and works very well for audio streams with low background noise (e.g., podcasts with few people talking, language lessons, audio recorded in a rather quiet environment, etc.), the performance can drop as the level of noise increases. Furthermore, the algorithm makes no distinction between speech and other kinds of sounds, so you shouldn't use it for Voice Activity Detection if your audio data also contains non-speech events.

License

MIT.

auditok's People

Contributors

amsehili, jhoelzl, leminhnguyen, ps2, samelltiger, taf2, yoyota


auditok's Issues

Cannot use split for an audio-humming.wav

(split_env) C:\Users\v_gejwzhang\ASR>python split.py
Traceback (most recent call last):
  File "split.py", line 9, in <module>
    energy_threshold=55 # threshold of detection
  File "C:\ProgramData\Anaconda3\envs\split_env\lib\site-packages\auditok\core.py", line 227, in split
    source = AudioReader(input, block_dur=analysis_window, **params)
  File "C:\ProgramData\Anaconda3\envs\split_env\lib\site-packages\auditok\util.py", line 1008, in __init__
    input = get_audio_source(input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\split_env\lib\site-packages\auditok\io.py", line 731, in get_audio_source
    return from_file(filename=input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\split_env\lib\site-packages\auditok\io.py", line 919, in from_file
    return _load_wave(filename, large_file)
  File "C:\ProgramData\Anaconda3\envs\split_env\lib\site-packages\auditok\io.py", line 813, in _load_wave
    with wave.open(file) as fp:
  File "C:\ProgramData\Anaconda3\envs\split_env\lib\wave.py", line 499, in open
    return Wave_read(f)
  File "C:\ProgramData\Anaconda3\envs\split_env\lib\wave.py", line 163, in __init__
    self.initfp(f)
  File "C:\ProgramData\Anaconda3\envs\split_env\lib\wave.py", line 143, in initfp
    self._read_fmt_chunk(chunk)
  File "C:\ProgramData\Anaconda3\envs\split_env\lib\wave.py", line 260, in _read_fmt_chunk
    raise Error('unknown format: %r' % (wFormatTag,))
wave.Error: unknown format: 3
import auditok

# split returns a generator of AudioRegion objects
audio_regions = auditok.split(
    "humming.wav",
    min_dur=0.2,     # minimum duration of a valid audio event in seconds
    max_dur=4,       # maximum duration of an event
    max_silence=0.3, # maximum duration of tolerated continuous silence within an event
    energy_threshold=55 # threshold of detection
)

for i, r in enumerate(audio_regions):

    # Regions returned by `split` have 'start' and 'end' metadata fields
    print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))

    # play detection
    # r.play(progress_bar=True)

    # region's metadata can also be used with the `save` method
    # (no need to explicitly specify region's object and `format` arguments)
    filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
    print("region saved as: {}".format(filename))
humming.wav is a file bigger than 25 MB

AudioParameterError

auditok.split(
'/tmp/tmprn08794x.wav'
)

AudioParameterError: The length of audio data must be an integer multiple of sample_width * channels

I'm unsure what I should do.

Error when running auditok

This is what I get when I try to run auditok

ALSA lib pcm_dmix.c:1099:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2501:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2501:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2501:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_dmix.c:1099:(snd_pcm_dmix_open) unable to open slave
connect(2) call to /dev/shm/jack-1000/default/jack_0 failed (err=No such file or directory)
attempt to connect to server failed

I'm on archlinux with python 3.6


Creating an .srt subtitle file

Hi there,
@amsehili
I want to create an .srt subtitle file from an mp3 audio file that I have. My goal is to automatically detect the speech regions and create an .srt file with time-stamps and blank transcriptions, which I will then fill in manually.

I have used this command:

auditok -e 55 -i interview.mp3 -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i" -o output.srt

the result is:

Output #0, srt, to 'c:\users\divana\appdata\local\temp\tmp_s9dvg':
Output file #0 does not contain any stream

Though if I don't use the -o option, the segments can be viewed in the terminal, but they can't be exported to srt.
Can anyone help with creating an .srt subtitle file with time-stamps?
Waiting for your reply.

Audio file attached
interview.zip

Real-Time Silence detection from bytes

From #23, I am trying to split the speaker's audio using a pyaudio stream:

The callback part (how can I use in_data and split it like microphone input is read in #23?)

def callback(self, in_data, frame_count, time_info, status):
    """Write frames and return PA flag"""
    # wave_file.writeframes(in_data)
    self.frames.append(in_data)
    input= b''.join(self.frames)
    print(input)
    reader = AudioReader(
        input=input,
        sr=self.__SAMPLE_RATE,
        sw=self.__SAMPLE_WIDTH,
        ch=self.__CHANNEL
        )
    for (i, region) in enumerate(split(
        input=reader,
        # eth=self.__ENERGY_THRESHOLD,
        max_silence=self.__MAX_SILENCE,
        max_dur=self.__MAX_DURATION,
        min_dur=self.__MIN_DURATION
        )):
        print(f"{constants.CONSOLE_COLOR_RED}split{constants.CONSOLE_COLOR_WHITE}")
        path = f'{constants.TEMP_SPEAKER_OUTPUT_AUDIO_DIR}/{str(time.time()) + constants.TEMP_SPEAKER_OUTPUT_AUDIO_FORMAT}'
        region.save(path)
        self.frames = []
        break

    return (in_data, pyaudio.paContinue)

The pyaudio part

with p.open(format=pyaudio.paInt16,
        channels=default_speakers["maxInputChannels"],
        rate=int(default_speakers["defaultSampleRate"]),
        frames_per_buffer=pyaudio.get_sample_size(pyaudio.paInt16),
        input=True,
        input_device_index=default_speakers["index"],
        stream_callback=self.callback
) as stream:
    """
    Open a PA stream via context manager.
    After leaving the context, everything will
    be correctly closed (Stream, PyAudio manager).
    """
    while self.ai_listen_handler.is_listening_speaker:
        time.sleep(1)

Use auditok.split to split microphone input in real-time

for region in auditok.split(
    input=None,
    sr=self.__SAMPLE_RATE,
    sw=self.__SAMPLE_WIDTH,
    ch=self.__CHANNEL,
    eth=self.__ENERGY_THRESHOLD,
    max_silence=self.__MAX_SILENCE,
    max_dur=self.__MAX_DURATION,
    min_dur=self.__MIN_DURATION
    ):
    if not self.ai_listen_handler.is_listening_mic:
        return

    path = f'{constants.TEMP_MIC_INPUT_AUDIO_DIR}/{str(time.time()) + constants.TEMP_MIC_INPUT_AUDIO_FORMAT}'
    region.save(path)

Doubt: format of the output

Hello,
this is not an issue but a doubt.
I am using auditok for audio tokenization, but I need the data in librosa/soundfile format.
When I check librosa/sf, the data values are floating-point numbers, while in auditok they are large integers.
But both documentations mention that the output is a time series.
Can you please help me convert one output format to the other, or at least explain what the format of auditok's output is?
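auditok's regions hold integer PCM sample values (e.g. -32768..32767 for 16-bit audio), whereas librosa and soundfile scale samples to floats in [-1.0, 1.0]. A sketch of the conversion, assuming numpy is installed (the helper name is hypothetical):

```python
import numpy as np

def to_float(samples, sample_width=2):
    """Scale integer PCM samples to [-1.0, 1.0), the librosa/soundfile
    convention, by dividing by 2**(bits - 1)."""
    scale = float(1 << (8 * sample_width - 1))
    return np.asarray(samples, dtype=np.float32) / scale

# region = auditok.load("audio.wav")
# librosa_style = to_float(region.samples)
```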

Run on-device on Android device

I plan to run auditok on-device to detect pauses in speech. Currently, it is being handled as an API call (front-end streams the audio via a microphone and sends it to a back-end API which does the segmentation on a real-time basis).

Is it possible to convert it into a Tensorflow lite model sorts for on-device inference, rather than an API call?

Plotting tests failed

Version: 0.2.0
Command: python -m unittest discover -v tests/.
Output: tests.log
All tests pass with the exception of those in test_plotting.py

Keyword {duration} returns error

Hello, beside {id}, {start} and {end}, i also want to use the keyword {duration}.

However, when using it, i get following error:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/myproject/venv/local/lib/python2.7/site-packages/auditok/cmdline.py", line 524, in run
    end = self.time_formatter(end_time)))
KeyError: 'duration'

Regards,
Josef

How to process audio already loaded in numpy and/or torch?

Thank you for this good repo!

I find that auditok generally works better than popular VADs like Silero (which can have unexplained behaviour on some types of audio).
I'd like to use it in my project, but I struggle to do so, because when I call the VAD I don't have access to a wav file.
The only way I found to pass the torch tensor of raw audio is this awkward conversion:

    byte_io = io.BytesIO(bytes())
    scipy.io.wavfile.write(byte_io, SAMPLE_RATE, (audio.numpy() * 32767).astype(np.int16)) # audio is a torch tensor
    bytes_wav = byte_io.read()

    segments = auditok.split(
        bytes_wav,
        sampling_rate=SAMPLE_RATE,        # sampling frequency in Hz
        channels=1,                       # number of channels
        sample_width=2,                   # number of bytes per sample
        min_dur=min_speech_duration,      # minimum duration of a valid audio event in seconds
        max_dur=len(audio)/SAMPLE_RATE,   # maximum duration of an event
        max_silence=min_silence_duration, # maximum duration of tolerated continuous silence within an event
        energy_threshold=50,
        drop_trailing_silence=True,
    )

Is there a better way to do that?

If you want to see more, or directly comment on the related PR, it's here: https://github.com/linto-ai/whisper-timestamped/pull/78/files#diff-4d4adecf50ce8affc04f13ab7274717945dd716eb910225ff154f717e81c3b64R1791

splitting the audio with overlaps.

Hi thanks for this library.
I was wondering if there is a way to split my audio into smaller clips, with adjacent clips having overlapping samples.
Example:
Let's say I have a 1-minute clip with silences from 10-20 s and from 30-40 s, and my maximum audio length is 5 s. Is there a way to get the split audio with, say, the first clip from 0-5 s, the second from 3-8 s (overlapping at seconds 4 and 5), and so on?

Thanks any help is appreciated.

Splitting audio

Is it possible to use auditok to split audio into fixed lengths, at the silence before the limit?

For example:

auditok -n 600 -m 600 -i myFile.wav

I want to split myFile.wav into chunks of up to 600 seconds (10 min) of audio. If an audio event is greater than 10 min, then it should be split at the closest silence before the 10-min boundary. Is that possible with the current implementation?

Integrate spleeter model

Add a quantized Spleeter model so that auditok can isolate the vocals from the original audio before splitting. Without the background noise, auditok can split the dialogue better.

Callback data format

Hi, I'm using auditok to segment sentences from speech, and I would like to convert the data returned in the callback into an int16 wav format for use with a different application. Could you tell me how I could go about doing that?
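A sketch with the standard-library `wave` module, assuming 16-bit mono data at 16 kHz (adjust the defaults to your stream's actual parameters; the helper name is hypothetical):

```python
import wave

def save_wav(data, path, rate=16000, channels=1, sample_width=2):
    """Write raw int16 PCM bytes (e.g. from a detection callback) to a
    playable WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(rate)
        w.writeframes(data)
```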

Dependencies aren't listed in setup.py and pyproject.toml

The project imports numpy, pydub, pyaudio, tqdm, but they aren't listed as dependencies.

In case there are no required dependencies - the require and depend clauses should be left empty, and optional dependencies should still be listed.

Otherwise the lack of this info causes confusion.

ImportError: No module named setuptools

When I try to run the setup script, I get the following error:

~/auditok $ python setup.py install
Traceback (most recent call last):
  File "setup.py", line 4, in <module>
    from setuptools import setup
ImportError: No module named setuptools

Where should it be finding this "setuptools" directory/module?

Thanks

Specify recording device

I have multiple recording devices on my system, but it seems to only pick up the mic plugged into the microphone jack. How can I get it to use the microphone on the webcam that is plugged into a USB port?

I tried inserting the following at line 354 of io.py:
input_device_index = 2,
but it didn't have any effect.

I'm not very familiar with Python, but this looks like the right place according to the PyAudio documentation.

Real-Time Silence detection

I am using PyAudio to collect audio streams from the microphone input to detect end-of-speech. I was wondering if I can process the streams in a real-time manner using auditok. If so, how do I optimize it to run faster?

I am using the following code for the time-being:

from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for


# We set the `record` argument to True so that we can rewind the source
asource = ADSFactory.ads(max_time=10, record=True)

validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), 
                                 energy_threshold=65)

tokenizer = StreamTokenizer(
    validator=validator,
    min_length=20,
    max_length=400,
    max_continuous_silence=30,
)

asource.open()
tokens = tokenizer.tokenize(asource)

# Play detected regions back
player = player_for(asource)

trimmed_signal = b""
for p, q, r in tokens:
    trimmed_signal += b"".join(p)
print("\n ** Playing trimmed signal...")
player.play(trimmed_signal)

asource.close()
player.stop()

Ultimately, my end goal is to record user's input from the microphone (mobile device) and record audio, before doing a speech-to-text transcription using Google API.

Making standalone executable for win32

Hi. After I successfully installed auditok, I have these two files in the Scripts directory:

  • auditok.exe
  • auditok-script.py

I tested the executable, and it worked perfectly. But now I want to be able to run this on another machine without Python installed. In other words, I need a win32 standalone console executable of auditok.

I tried to use pyinstaller:

pip install pyinstaller
pyinstaller auditok-script.py

After that I have a "dist" directory with an "auditok-script" dir and a bunch of files in it. I ran auditok-script.exe in that dir, and it gave me this error:

...\python\Scripts\dist\auditok-script>auditok-script.exe
Traceback (most recent call last):
  File "auditok-script.py", line 11, in <module>
  File "site-packages\pkg_resources\__init__.py", line 480, in load_entry_point
  File "site-packages\pkg_resources\__init__.py", line 472, in get_distribution
  File "site-packages\pkg_resources\__init__.py", line 344, in get_provider
  File "site-packages\pkg_resources\__init__.py", line 892, in require
  File "site-packages\pkg_resources\__init__.py", line 778, in resolve
pkg_resources.DistributionNotFound: The 'auditok==0.1.8' distribution was not found and is required by the application
[21452] Failed to execute script auditok-script

...\python\Scripts\dist\auditok-script>

I'm not very familiar with Python, so maybe there's another way to build standalone binaries? Or how can I fix this error?

Thanks.

Replace/remove genty

The dependency genty looks abandoned and unmaintained, and it has issues with its own tests. The last commit was at the beginning of 2016. Please consider removing or replacing it.

Get {start} and {end} with 3 decimals

Hello,

first of all, this is a great python module and does a very good job!

I am wondering if I can extend the number of decimals for the export tags {start} and {end}.
By default, they have 2 decimals; I need rounding to 3 decimals.

Regards,
Josef

Errors running auditok

I'm on Linux Mint 17.2.

When I try to run "auditok", I get the following messages:

~/git_projects/auditok $ auditok
ALSA lib pcm_dsnoop.c:618:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
bt_audio_service_open: connect() failed: Connection refused (111)
bt_audio_service_open: connect() failed: Connection refused (111)
bt_audio_service_open: connect() failed: Connection refused (111)
bt_audio_service_open: connect() failed: Connection refused (111)
ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
1 0.00 5.00
2 5.00 10.00
3 10.00 15.00
4 15.00 20.00
^C5 20.00 20.60

Should I be concerned about any of these messages?
It doesn't seem to be detecting any sound; it just outputs automatically every 5 seconds.

Any suggestions would be appreciated.

Thanks.

Use auditok as API to detect pauses in speech

Thanks for the great lib!!

I have a bunch of utterances extracted from conversations, and I want to detect pauses in each of these utterances and how long the pauses are, both short and long (with 500 ms as the threshold, for example, for determining long vs. short).

Is it possible to use auditok as an API to call such a function for pause detection in my existing data pipeline? Sorry if the question seems general, but it would be greatly appreciated if you could provide any advice. Thank you again.
