covid19-cough

This repository contains two datasets of coughs for preliminary diagnosis of COVID-19, and also some notebooks and scripts to play with the data.

Anaselmasry from Kaggle

The first dataset, in the folder anaselmasry, contains spectrogram images from this Kaggle dataset: https://www.kaggle.com/anaselmasry/ai-covid19-from-cough-samples

The folder anaselmasry/source holds the original images from Kaggle, and anaselmasry/processed holds the preprocessed versions. The dataset contains 1934 spectrograms: 330 negative for COVID-19 and 1604 positive for COVID-19.
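The two classes are far from balanced, which matters when reading accuracy figures. The sketch below uses only the counts stated above to compute the accuracy of a trivial majority-class predictor and a set of inverse-frequency class weights. The weighting formula is a common heuristic and an assumption here; it is not necessarily what the training notebook does.

```python
# Class counts as stated in this README for the anaselmasry dataset.
counts = {"negative": 330, "positive": 1604}

total = sum(counts.values())

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = max(counts.values()) / total

# Inverse-frequency weights: total / (n_classes * class_count).
# This is a common heuristic, not necessarily what the notebook uses.
class_weights = {
    label: total / (len(counts) * n) for label, n in counts.items()
}

print(f"total images: {total}")
print(f"majority-class baseline accuracy: {majority_baseline:.4f}")
for label, weight in class_weights.items():
    print(f"weight for {label}: {weight:.4f}")
```

Any reported accuracy is best compared against the majority-class baseline rather than against 50%.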

Examples of source images:

COVID-19 Negative

COVID-19 Positive

Same images preprocessed:

COVID-19 Negative

COVID-19 Positive

A notebook showing how to train a CNN on this data is available at https://github.com/jesus-seijas-sp/covid19-cough/blob/main/Train%20anaselmasry.ipynb

The model reaches an AUC of 0.9876 on validation data, with accuracy 0.9845, precision 0.9877, and recall 0.9938.
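For readers less familiar with these metrics, the sketch below shows how accuracy, precision, recall, and F1 follow from confusion-matrix counts. The function name and the example counts are hypothetical and chosen only for illustration; they are not the notebook's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much was positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, for illustration only (not the notebook's results).
example = classification_metrics(tp=90, fp=5, tn=40, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```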

Coswara

The second dataset, in the folder coswara, contains data derived from the Coswara repository: https://github.com/iiscleap/Coswara-Data

That repository contains audio recordings from 1486 patients. For each patient there is one recording for each of: breathing-deep, breathing-shallow, cough-heavy, cough-shallow, counting-fast, counting-normal, vowel-a, vowel-e, vowel-o. There is also a CSV file with information about the patients. Each patient has a slug used as an identifier, which links the information in the CSV to that patient's audio files and spectrograms. The Coswara project also provides some data insights here: https://iiscleap.github.io/coswara-blog/coswara/2020/11/23/visualize_coswara_data_metadata.html

The source repository stores the data as WAV files packed in tar archives. In this repository the audio files are in MP3 format to reduce size, and a spectrogram image has been computed for each one. You will find the MP3 files in coswara/audios and the spectrograms in coswara/spectrograms.
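A minimal sketch of using the slug to join CSV metadata with the files of each patient. The directory layout under coswara/audios and coswara/spectrograms, the .png extension, the column names id and covid_status, and the two sample rows are all assumptions made for illustration; check the actual folders and CSV header before relying on them.

```python
import csv
import io
from pathlib import Path

# The nine recording types listed in this README.
RECORDING_TYPES = [
    "breathing-deep", "breathing-shallow", "cough-heavy", "cough-shallow",
    "counting-fast", "counting-normal", "vowel-a", "vowel-e", "vowel-o",
]


def recording_paths(slug, root="coswara"):
    """Build candidate audio/spectrogram paths for one patient slug.

    ASSUMPTION: files are laid out as <root>/audios/<slug>/<type>.mp3 and
    <root>/spectrograms/<slug>/<type>.png. Verify against the real folders.
    """
    root = Path(root)
    return {
        kind: {
            "audio": root / "audios" / slug / f"{kind}.mp3",
            "spectrogram": root / "spectrograms" / slug / f"{kind}.png",
        }
        for kind in RECORDING_TYPES
    }


# Stand-in for the metadata CSV. The column names "id" and "covid_status"
# and both rows are hypothetical.
sample_csv = io.StringIO(
    "id,covid_status\n"
    "slug-aaa,healthy\n"
    "slug-bbb,positive\n"
)

# Index the metadata by slug so each patient can be looked up directly.
patients = {row["id"]: row for row in csv.DictReader(sample_csv)}

for slug, info in patients.items():
    paths = recording_paths(slug)
    print(slug, info["covid_status"], paths["cough-heavy"]["audio"])
```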

The script wav_to_mp3.py converts the source WAV files to MP3. The script mp3_to_spectrogram.py converts the MP3 files to spectrogram images. The notebook Mel Spectrogram.ipynb explores the audio files and shows how to visualize the waveform, compute the spectrogram, and compute the Mel spectrogram.
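What distinguishes a Mel spectrogram from a plain spectrogram is that its frequency axis is warped onto the mel scale, which gives more resolution to low frequencies. The sketch below illustrates only that warping, using the widely used HTK-style formula. This is an assumption: the notebook and scripts may rely on an audio library whose default mel formula differs. The function names and the 8000 Hz upper limit are illustrative.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min=0.0, f_max=8000.0):
    """Return n_mels + 2 frequencies (Hz) spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


# Evenly spaced mel points map to progressively wider bands in Hz.
edges = mel_band_edges(n_mels=8)
for low, high in zip(edges, edges[1:]):
    print(f"{low:8.1f} Hz -> {high:8.1f} Hz (width {high - low:7.1f} Hz)")
```

The printed band widths grow with frequency, which is why a Mel spectrogram devotes more rows to the low-frequency range.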

The dataset contains data for 1486 patients: 1186 healthy, 107 positive for COVID-19, 22 recovered, and 171 not positive but with respiratory problems.
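To get a feel for how small the minority groups are, the sketch below computes each group's share and how many patients per group a proportional validation split would hold. The 20% validation fraction and the label names are assumptions for illustration, not something this repository prescribes.

```python
# Patient counts as stated in this README for the Coswara-derived dataset.
counts = {
    "healthy": 1186,
    "positive": 107,
    "recovered": 22,
    "respiratory_not_positive": 171,
}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# ASSUMPTION: a 20% validation split, stratified so that each group keeps
# roughly its overall share.
VALIDATION_FRACTION = 0.2
val_counts = {
    label: round(n * VALIDATION_FRACTION) for label, n in counts.items()
}

print(f"total patients: {total}")
for label in counts:
    print(
        f"{label}: {shares[label]:.2%} of patients, "
        f"about {val_counts[label]} in validation"
    )
```

With so few recovered and positive patients, per-class metrics on a validation split of this size rest on a handful of examples and should be read with caution.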

Example of spectrograms:

Cough Heavy

Deep Breathing
