The Abuse Project Audio Dataset (TAPAD)

World's largest profanity audio dataset

Dataset consists of 26,365 audio files
Click here for documentation

See The Abuse Project

TAPAD (∿) is an open dataset, meaning it will grow over time as more data is contributed. To enable reproducibility and accurate citation, the dataset is versioned using git tags.

Current Status & ID3

Property             Value
Total files          26,365
Dataset updated      July 30, 2019
Language classes     75
File Type            MP3
MIME Type            audio/mpeg
MPEG Audio Version   2
Audio Layer          3
Audio Bitrate        32 kbps
Sample Rate          24000 Hz
Channel Mode         Single Channel
MS Stereo            Off
Intensity Stereo     Off
Codec Type           audio
Codec Time Base      1/24000
Codec Tag            0x0000
Sample Fmt           fltp
Sample Rate          24000 Hz
Channels             1
Channel Layout       mono
Bits Per Sample      0
R Frame Rate         0/0
Avg Frame Rate       0/0
Time Base            1/14112000
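
Several of the properties above look like fields that ffprobe reports for an audio stream. As a minimal sketch, assuming ffprobe is installed and that these values hold for every file in the dataset, the following compares one stream's metadata against a few of them. The names `EXPECTED`, `mismatches`, and `probe` are hypothetical and are not part of the TAPAD scripts.

```python
import json
import subprocess

# Expected values copied from the table above.
# ffprobe's JSON output reports sample_rate as a string.
EXPECTED = {
    "codec_type": "audio",
    "sample_rate": "24000",
    "channels": 1,
    "channel_layout": "mono",
    "sample_fmt": "fltp",
}


def mismatches(stream: dict) -> dict:
    """Return {field: (expected, actual)} for every field that differs."""
    return {
        key: (want, stream.get(key))
        for key, want in EXPECTED.items()
        if stream.get(key) != want
    }


def probe(path: str) -> dict:
    """Run ffprobe on one file and return its first stream as a dict."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return json.loads(out)["streams"][0]
```

Calling `mismatches(probe(path))` on a file returns an empty dict when the checked fields agree with the table. Keeping the comparison separate from the ffprobe call makes it possible to check metadata that was collected some other way.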

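Language codes are required to be two letters, normally the language's two-letter ISO code (see ISO_639-1). Regional variants in the directory listing, such as en-au and pt-br, append a hyphen and a two-letter region.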
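
The naming rule can be checked mechanically. The sketch below is an assumption about the intended pattern, inferred from directory names such as `en`, `en-au`, and `zh-cn` in the listing: two lowercase letters, optionally followed by a hyphen and a two-letter region. The function name `is_valid_language_dir` is hypothetical. It checks the shape of the name only, not whether the code is a registered ISO 639-1 code.

```python
import re

# Two lowercase letters, optionally followed by "-" and a two-letter region.
# This pattern is inferred from the directory listing, not taken from the project.
LANG_DIR = re.compile(r"[a-z]{2}(-[a-z]{2})?")


def is_valid_language_dir(name: str) -> bool:
    """Return True if name has the shape of a TAPAD language directory."""
    return LANG_DIR.fullmatch(name) is not None
```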

Scripts & Utilities

Filename         Location           Description                                    Type
record.py        acquire\custom     Records audio in WAV format (default: 3 sec)   Helper script
wingen.py        acquire\generate   TTS conversion using SAPI.SpVoice              Helper script
gTTSgen.py       acquire\generate   TTS conversion using gTTS & abuse 0.1.1        Helper script
gspectogram.py   utils              Generates a spectrogram of a WAV file          Utility tool

Structure

.
├───af
├───ar
├───bn
├───bs
├───ca
├───cs
├───cy
├───da
├───de
├───el
├───en
│   ├───1 (340 wav files)
│   └───2
├───en-au
├───en-ca
├───en-gb
├───en-gh
├───en-ie
├───en-in
├───en-ng
├───en-nz
├───en-ph
├───en-tz
├───en-uk
├───en-us
├───en-za
├───eo
├───es
├───es-es
├───es-us
├───et
├───fi
├───fr
├───fr-ca
├───fr-fr
├───hi
├───hr
├───hu
├───hy
├───id
├───is
├───it
├───ja
├───jw
├───km
├───ko
├───la
├───lv
├───mk
├───ml
├───mr
├───my
├───ne
├───nl
├───no
├───pl
├───pt
├───pt-br
├───pt-pt
├───ro
├───ru
├───si
├───sk
├───sq
├───sr
├───su
├───sv
├───sw
├───ta
├───te
├───th
├───tl
├───tr
├───uk
├───vi
├───zh-cn
└───zh-tw

Most of these audio classes have 347 MP3 files of ~5.783 minutes each. MP3 was long subject to patent restrictions, but according to Wikipedia, "If the longest-running patent mentioned in the aforementioned references is taken as a measure, then the MP3 technology became patent-free in the United States on 16 April 2017 when U.S. Patent 6,009,399, held by and administered by Technicolor, expired".

Checking files

find audio/ -type f | wc -l
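
The command above gives a single total. To see how the files are distributed across language classes, a per-directory count can help. This is a minimal sketch, assuming the dataset lives under a root directory such as `audio/` with one subdirectory per language; the function name `count_files_by_language` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_files_by_language(root: str) -> Counter:
    """Count regular files under each top-level directory of root, recursively.

    Files that sit directly in root are counted under the key ".".
    """
    root_path = Path(root)
    counts = Counter()
    for path in root_path.rglob("*"):
        if path.is_file():
            parts = path.relative_to(root_path).parts
            counts[parts[0] if len(parts) > 1 else "."] += 1
    return counts
```

For example, `count_files_by_language("audio")` returns a `Counter` keyed by directory name, and `sum(counts.values())` should agree with the total that `find` reports for regular files.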

Made with TAPAD

Have you used or seen TAPAD in a paper, project, or app? Add it here!

Maintainers

The dataset is maintained by:

LICENSE

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

To view a copy of this license, visit NC-SA 4.0 or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

tapad's People

Contributors

0x48piraj
