Speech Recognition Test API

Basic REST API for transcribing and finding specified words within MP3 audio files:

- Based on the Google Speech API solution
- Built in Python 3 and MongoDB
- Built using a Docker multi-container environment

Set up instructions

- Clone this repository
- Create a `credentials.json` file in the root directory, containing the Google Cloud credentials
- Create a `.env` file (rename the provided example)
- Run `docker compose up --build`
- Test the API at localhost:85/
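
Before starting the containers, it can help to confirm that the two files named in the steps above are in place. The following is a minimal sketch, not part of this repository; the helper name `missing_setup_files` is hypothetical, and it assumes both files live in the repository root.

```python
from pathlib import Path

# File names taken from the set up steps; their location in the
# repository root is an assumption.
REQUIRED_FILES = ("credentials.json", ".env")


def missing_setup_files(root="."):
    """Return the required set up files that do not exist under root."""
    root_path = Path(root)
    return [name for name in REQUIRED_FILES if not (root_path / name).is_file()]
```

An empty list means both files were found, so `docker compose up --build` can be run.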

API endpoints & examples

`POST /scan`

Given an MP3 audio file and a set of words, processes the file and looks up the words in the transcription. Returns the number of occurrences and the timestamps found for each word.

JSON body examples:

{
	"file": "gs://speech-api-test-2344/PartyAnimal.mp3",
	"words": [
		"party",
		"not",
		"people"
	]
}

{
	"file": "gs://speech-api-test-2344/GreetingsandIntroductions.mp3",
	"words": [
		"athlete",
		"architect",
		"people"
	]
}

{
	"file": "gs://speech-api-test-2344/AirPollution.mp3",
	"words": [
		"pollution",
		"think",
		"people"
	]
}
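
A request like the ones above could be built from Python with the standard library alone. This is only a sketch: the base URL assumes the API is exposed on port 85 as in the set up steps, the `Content-Type` header is an assumption about what the server expects, and `build_scan_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: the containers expose the API on localhost, port 85.
BASE_URL = "http://localhost:85"


def build_scan_request(file_uri, words, base_url=BASE_URL):
    """Build (but do not send) a POST /scan request with a JSON body."""
    body = json.dumps({"file": file_uri, "words": list(words)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/scan",
        data=body,
        # Assumption: the API reads the body as JSON.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request, which requires the containers to be running.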

HTTP 200 Response examples:

{
	"results": [
		{
			"count": 5,
			"timestamps": [
				"00:00:12.1",
				"00:00:14.8",
				"00:00:16.3",
				"00:00:33.8",
				"00:00:59.2"
			],
			"word": "pollution"
		},
		{
			"count": 1,
			"timestamps": [
				"00:00:10.6"
			],
			"word": "think"
		},
		{
			"count": 0,
			"timestamps": [],
			"word": "people"
		}
	]
}
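
To illustrate how a response of this shape can be derived, here is a hedged sketch that counts words and formats timestamps from a list of `(word, start_seconds)` pairs. The input format, the case-insensitive matching, and the function names are assumptions for illustration; the project's actual lookup may differ.

```python
from collections import defaultdict


def format_timestamp(seconds):
    """Format seconds as HH:MM:SS.d, the shape used in the example response."""
    tenths = int(round(seconds * 10))
    hours, rest = divmod(tenths, 36000)
    minutes, rest = divmod(rest, 600)
    secs, tenth = divmod(rest, 10)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{tenth}"


def scan_words(timed_words, targets):
    """Return one result per target word, in the order the targets were given.

    timed_words is an iterable of (word, start_seconds) pairs (assumed format).
    """
    occurrences = defaultdict(list)
    for word, start_seconds in timed_words:
        # Assumption: matching ignores letter case.
        occurrences[word.lower()].append(format_timestamp(start_seconds))

    results = []
    for target in targets:
        stamps = occurrences.get(target.lower(), [])
        results.append({"count": len(stamps), "timestamps": stamps, "word": target})
    return results
```

A word that never appears in the transcription still gets an entry, with a count of 0 and an empty timestamp list, as in the `people` entry above.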

HTTP 500 Response examples:

{
	"message": "No such object: speech-api-test-2344/Greetings and Introductions Basi.mp3"
}
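
On the client side, the two response shapes shown above can be told apart by status code. The sketch below assumes that every non-200 response carries a `message` field like the 500 example; `ScanError` and `parse_scan_response` are hypothetical names, not part of this API.

```python
import json


class ScanError(Exception):
    """Raised when the API answers with an error payload (hypothetical type)."""


def parse_scan_response(status_code, body_text):
    """Return results keyed by word, or raise ScanError for non-200 responses."""
    payload = json.loads(body_text)
    if status_code != 200:
        # Assumption: error bodies look like {"message": "..."}.
        raise ScanError(payload.get("message", "unknown error"))
    return {item["word"]: item for item in payload["results"]}
```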

`GET /logs`

Returns the most recently logged actions for this API.

HTTP 200 Response examples:

{
	"logs": [
		{
			"metadata": {
				"endpoint": "GET /logs",
				"user": "tester"
			},
			"timestamp": "2023-01-28 16:06:22",
			"type": "ACTION_APP_ACCESS"
		},
		{
			"metadata": {
				"endpoint": "POST /scan",
				"user": "tester"
			},
			"timestamp": "2023-01-28 16:06:22",
			"type": "ACTION_APP_ACCESS"
		},
		{
			"metadata": {
				"file": "gs://speech-api-test-2344/PartyAnimal.mp3",
				"words": [
					"party",
					"not",
					"people"
				]
			},
			"timestamp": "2023-01-28 16:06:22",
			"type": "ACTION_SCAN_AUDIO"
		}
	],
	"page": 1,
	"page_size": 50
}
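
Each log entry above has the same three fields: `metadata`, `timestamp` and `type`. As a rough sketch of that data structure, the hypothetical helper below builds an entry of the same shape; how the project actually creates and stores its log documents in MongoDB is not shown in this README.

```python
from datetime import datetime


def build_log_entry(action_type, metadata, now=None):
    """Build a log entry dict shaped like the entries in the example response."""
    moment = now or datetime.now()
    return {
        "metadata": dict(metadata),
        # Timestamp format copied from the example: YYYY-MM-DD HH:MM:SS.
        "timestamp": moment.strftime("%Y-%m-%d %H:%M:%S"),
        "type": action_type,
    }
```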

TODOs

The following are additional features and capabilities that I decided to leave out of the current task's scope to avoid delaying its delivery:

- Implement basic request parameter validation:
  - validate that `words` is a list of strings
  - validate the `file` parameter
- Add basic authentication by username and password
- Allow API users to configure audio scanning parameters such as audio language and sample rate in Hertz
- Implement pagination for the logs endpoint
- Implement logs filtering on action type, users, timestamp and metadata
- Run this API using an isolated WSGI server running in its own container instead of using the Flask development server
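
As an illustration of the first item, parameter validation could look roughly like the sketch below. It is not implemented in this project; the function name and the specific rules (a `gs://` URI ending in `.mp3`, a non-empty list of non-empty strings) are assumptions drawn from the request examples.

```python
def validate_scan_payload(payload):
    """Return a list of error messages; an empty list means the payload looks valid."""
    if not isinstance(payload, dict):
        return ["body must be a JSON object"]

    errors = []

    # Assumed rule: file is a Google Cloud Storage URI pointing to an MP3.
    file_uri = payload.get("file")
    if not isinstance(file_uri, str) or not file_uri.startswith("gs://"):
        errors.append("file must be a gs:// URI string")
    elif not file_uri.lower().endswith(".mp3"):
        errors.append("file must point to an MP3 object")

    # Rule from the TODO list: words is a list of strings.
    words = payload.get("words")
    if not isinstance(words, list) or not words:
        errors.append("words must be a non-empty list")
    elif not all(isinstance(w, str) and w.strip() for w in words):
        errors.append("words must contain only non-empty strings")

    return errors
```

Returning all errors at once, rather than failing on the first, lets the endpoint report every problem in a single 400 response if that behavior is wanted.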
