Test suite created for benchmarking transcription models.
See Test.ipynb for an example of the following steps put together.
$ git clone https://github.com/PerimeterInstitute/physics-transcription-benchmarking
$ cd physics-transcription-benchmarking/
$ bash setup.sh
- WhisperPI → from models.WhisperPI import WhisperPI
- WhisperOpenAI → from models.WhisperOpenAI import WhisperOpenAI
- WhisperCPP → from models.WhisperCPP import WhisperCPP
- AzureSpeechToText → from models.AzureSpeechToText import AzureSpeechToText
See each model wrapper's constructor (documented in this README) to create an instance of it.
See How to Implement a Model Wrapper to create your own model wrapper.
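For example, importing and instantiating one of the provided wrappers might look like this (the constructor arguments are illustrative assumptions; see the wrapper's actual constructor for its parameters):

```python
from models.WhisperOpenAI import WhisperOpenAI

# Hypothetical arguments mirroring the "model_info" fields in the example
# JSON result file below; check the WhisperOpenAI constructor for the real
# signature.
model = WhisperOpenAI("model_1", "medium", options={"language": "en"})
```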
from Test import Test
See the Test class constructor.
See the run() method of the Test class.
During the test runtime, folders titled 'results/', 'transcriptions/', and 'TEMP_DATA/' will exist in the desired output folder.
** DO NOT delete or alter these folders in any way until the benchmarking test has completed! **
- Access the TXT and VTT transcription(s) through the Model Wrapper object.
- See the resulting JSON files (containing load times, transcription times, accuracy data, etc.) in the 'results/' folder in the current working directory.
- Create an HTML summary using the createSummaryHTML() method in the Test class.
- Alternatively, create an HTML summary using the repo's create_test_summary_html() method.
See Test.ipynb for an example of how to use this class.
Test(model_array, prompt_function_array=[no_prompt], output_dir=getcwd())
: Creates Test instance
ModelWrapper[] model_array
: Array of models to be tested
Method[] prompt_function_array
: Array of prompt loading functions to be tested (defaults to contain the provided prompt loading function, no_prompt(), which returns an empty string)
String output_dir
: Directory where test output will be stored, defaults to the current working directory.
run(run_name, dataset_path, run_num=1, save_transcription=False)
: Runs tests comparing the transcriptions of each unique model/prompt/audio combination
String run_name
: Name of run
String dataset_path
: Path to dataset to use for testing
int run_num
: Number of times to transcribe the same audio file with the same model/prompt combination (good for testing consistency!)
Boolean save_transcription
: Boolean indicating if transcriptions should be saved
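A minimal sketch of a test run, reusing the model instance from the example above (the run name and dataset path are placeholders):

```python
from os import getcwd
from Test import Test

test = Test([model], output_dir=getcwd())

# Transcribe each audio file in the dataset 3 times per model/prompt
# combination and save the resulting transcriptions.
test.run("my_run", "datasets/my_dataset/", run_num=3, save_transcription=True)
```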
addModel(new_model)
: Adds provided model to model array
ModelWrapper new_model
: New model to be added
removeModel(existing_model_name)
: Removes model with provided name from model array
String existing_model_name
: Name of model to be removed
addPromptFunction(new_prompt_func)
: Adds provided prompt function to prompt function array
Method new_prompt_func
: New prompt function to be added
removePromptFunction(existing_prompt_func_name)
: Removes prompt function with provided name from prompt function array
String existing_prompt_func_name
: Name of prompt function to be removed
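A custom prompt loading function follows the same shape as load_prompt_default in the example JSON result file below: it receives the dataset's JSON object and returns a prompt string. The "topic" key here is purely hypothetical:

```python
def load_topic_prompt(json_obj):
    # Hypothetical: use a "topic" field from the dataset JSON as the prompt,
    # falling back to an empty string (the behaviour of no_prompt).
    return json_obj.get("topic", "")

test.addPromptFunction(load_topic_prompt)
```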
createSummaryHTML(html_filename=None)
: Creates HTML file that displays an intuitive summary of test data from the most recent run
String html_filename
: Output file name (do not include extension, defaults to RUN_NAME)
free()
: Removes and frees select attributes from memory
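Putting the last two methods together, after a run you might write the HTML summary and then release the model attributes:

```python
# Writes my_run_summary.html summarizing the most recent run.
test.createSummaryHTML("my_run_summary")

# Free select attributes once the Test instance is no longer needed.
test.free()
```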
After running, a 'results/RUN_NAME/' folder will be created in the current working directory. This folder will contain various JSON result files that hold transcription data from each unique model/prompt combination.
If save_transcription is set to True, a 'transcriptions/RUN_NAME/' folder will be created in the current working directory. This folder will contain both the original and normalized transcriptions of each unique model/prompt/audio combination.
Example JSON result file:
{
    "test_details": {
        "model_info": {
            "class_name": "WhisperOpenAI",
            "model_name": "model_1",
            "model_type": "medium",
            "options": {
                "language": "en"
            }
        },
        "prompt_info": {
            "prompt_function_name": "load_prompt_default",
            "prompt_function_code": "def load_prompt_default(json_obj): ..."
        },
        "system_info": {
            "system": "Linux",
            "release": "5.15.0-1040-azure",
            "version": "#47-Ubuntu SMP Thu Jun 1 19:38:24 UTC 2023",
            "machine": "x86_64",
            "processor": "x86_64"
        },
        "cpu_info": {
            "physical_cores": 2,
            "total_cores": 4
        },
        "memory_info": {
            "total_memory": 16767574016,
            "available_memory": 7527411712,
            "used_memory": 8884101120
        }
    },
    "test_results": {
        "test_audio_1": {
            "run_0": {
                "start_datetime": "05/30/24, 15:10:58",
                "transcribe_time": "0:00:03.993462",
                "word_error_rate": 0.012195121951219513,
                "match_error_rate": 0.012048192771084338,
                "character_error_rate": 0.010548523206751054,
                "word_information_lost": 0.012048192771084376,
                "word_information_preserved": 0.9879518072289156,
                "phrase_repeat_diff": 2
            },
            "run_1": {
                "start_datetime": "05/30/24, 15:11:02",
                "transcribe_time": "0:00:03.941539",
                "word_error_rate": 0.012195121951219513,
                "match_error_rate": 0.012048192771084338,
                "character_error_rate": 0.010548523206751054,
                "word_information_lost": 0.012048192771084376,
                "word_information_preserved": 0.9879518072289156,
                "phrase_repeat_diff": 2
            },
            "summary": {
                "transcribe_time": "0:00:03.967500",
                "word_error_rate": 0.012195121951219513,
                "match_error_rate": 0.03951752632280421,
                "character_error_rate": 0.010548523206751054,
                "word_information_lost": 0.012048192771084376,
                "word_information_preserved": 0.9879518072289156,
                "phrase_repeat_diff": 2
            }
        },
        "test_audio_2": {
            "run_0": {
                "start_datetime": "05/30/24, 15:11:25",
                "transcribe_time": "0:00:11.942993",
                "word_error_rate": 0.0546448087431694,
                "match_error_rate": 0.05291005291005291,
                "character_error_rate": 0.03714859437751004,
                "word_information_lost": 0.06370357382893543,
                "word_information_preserved": 0.9362964261710646,
                "phrase_repeat_diff": 0
            },
            "run_1": {
                "start_datetime": "05/30/24, 15:11:37",
                "transcribe_time": "0:00:11.962662",
                "word_error_rate": 0.0546448087431694,
                "match_error_rate": 0.05291005291005291,
                "character_error_rate": 0.03714859437751004,
                "word_information_lost": 0.06370357382893543,
                "word_information_preserved": 0.9362964261710646,
                "phrase_repeat_diff": 0
            },
            "summary": {
                "transcribe_time": "0:00:11.952828",
                "word_error_rate": 0.0546448087431694,
                "match_error_rate": 0.05291005291005291,
                "character_error_rate": 0.03714859437751004,
                "word_information_lost": 0.06370357382893543,
                "word_information_preserved": 0.9362964261710646,
                "phrase_repeat_diff": 0
            }
        }
    },
    "test_summary": {
        "transcriptions_per_audio": 2,
        "transcribe_time": "0:00:07.960164",
        "word_error_rate": 0.03341996534719446,
        "match_error_rate": 0.04621378961642856,
        "character_error_rate": 0.023848558792130548,
        "word_information_lost": 0.037875883300009905,
        "word_information_preserved": 0.9621241166999901,
        "phrase_repeat_diff": 1
    }
}
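Because the result files are plain JSON, they are easy to post-process. A minimal sketch (the file path is a placeholder):

```python
import json

with open("results/my_run/result.json") as f:  # placeholder path
    result = json.load(f)

# Aggregate word error rate across all audio files and runs.
print(result["test_summary"]["word_error_rate"])

# Per-run and summary transcribe times for one audio file.
for key, entry in result["test_results"]["test_audio_1"].items():
    print(key, entry["transcribe_time"])
```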
AddToExistingTest(existing_test_json, dataset_path, model, prompt_function=no_prompt, output_dir=getcwd())
: Creates AddToExistingTest instance
String existing_test_json
: JSON file created from a previous test
String dataset_path
: Dataset to be further tested (should be the same as the dataset used in the provided JSON)
ModelWrapper model
: Model to be further tested (should be the same as the model used in the provided JSON)
Method prompt_function
: Prompt function to be further tested (should be the same as the prompt function used in the provided JSON)
String output_dir
: Directory where test output will be stored, defaults to the current working directory.
run(run_name, run_num=1, output_file_name=None)
: Adds test runs and updates the provided test JSON with new run information
String run_name
: Name of run
int run_num
: Number of test runs to add
String output_file_name
: New JSON result file name (optional, defaults to the file name of the existing JSON)
free()
: Removes and frees select attributes from memory
After running, a 'results/RUN_NAME/' folder will be created in the current working directory. This folder will contain an updated JSON result file with both previous and new test information.
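A minimal sketch of extending a previous run (the import path, file paths, and model instance mirror the earlier examples and are assumptions):

```python
from AddToExistingTest import AddToExistingTest  # assumed import path

# Reuse the same dataset, model, and prompt function as the original test,
# then add two more runs to the existing JSON result file.
extra = AddToExistingTest("results/my_run/result.json", "datasets/my_dataset/", model)
extra.run("my_run", run_num=2)
extra.free()
```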
See Transcribe.ipynb for an example of how to use this class.
Transcribe(model_array, prompt_function_array=[no_prompt], output_dir=getcwd())
: Creates Transcribe instance
ModelWrapper[] model_array
: Array of models to use for transcriptions
Method[] prompt_function_array
: Array of prompt loading functions to use for transcriptions (defaults to contain the provided prompt loading function, no_prompt(), which returns an empty string)
String output_dir
: Directory where transcription output will be stored, defaults to the current working directory.
run(run_name, dataset_path, normalize=False)
: Creates a transcription for each audio sample in the provided dataset
String run_name
: Name of run
String dataset_path
: Path to dataset to use for transcriptions
Boolean normalize
: Boolean indicating whether or not to include normalized transcriptions alongside the untouched transcriptions
free()
: Removes and frees select attributes from memory
After running, a 'transcriptions/RUN_NAME/' folder will be created in the current working directory. This folder will contain the transcriptions of each audio sample in the provided dataset. If normalize is set to True, this folder will also contain the normalized transcriptions of each audio sample.
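A minimal sketch (the import path and paths below are assumptions):

```python
from Transcribe import Transcribe  # assumed import path

# Transcribe every audio sample in the dataset with each model/prompt
# combination, writing normalized transcriptions alongside the originals.
transcriber = Transcribe([model])
transcriber.run("my_transcripts", "datasets/my_dataset/", normalize=True)
transcriber.free()
```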
In order to be compatible with the Test class, a Model Wrapper class must have name, transcription, vtt, load_time, and transcribe_time attributes, as well as a transcribe() method. Using the ModelWrapper.py interface ensures that all required attributes and methods are implemented in a Model Wrapper class.
from os import getcwd

from ModelWrapper import ModelWrapper

class YOUR_WRAPPER_NAME(ModelWrapper):
    # Required attributes (the dictionaries are populated per audio file)
    name = ""
    transcription = {}
    vtt = {}
    load_time = {}
    transcribe_time = {}

    def load(self):
        # Load the underlying model into memory (record load_time here)
        pass

    def unload(self):
        # Release the underlying model from memory
        pass

    def transcribe(self, audio_name, audio_file, prompt=None, output_dir=getcwd()):
        # Transcribe audio_file and populate transcription, vtt, and
        # transcribe_time for audio_name
        pass

    ...
Put your model wrapper class file in the models/ folder, then import the wrapper using from models.YOUR_WRAPPER_NAME import YOUR_WRAPPER_NAME.
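Once imported, your wrapper can be passed to the Test, AddToExistingTest, and Transcribe classes like any of the provided wrappers:

```python
from models.YOUR_WRAPPER_NAME import YOUR_WRAPPER_NAME
from Test import Test

# Constructor arguments (if any) are defined by your wrapper.
model = YOUR_WRAPPER_NAME()
test = Test([model])
```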
Datasets must have the following structure in order to be used with the Test class:
dataset_name/
--> dataset_name.json
--> test_data/
    --> data_1.mp4
    --> data_1.txt
    --> data_2.wav
    --> data_2.txt
    ...
Please reference full_dataset.json for formatting of the dataset JSON file.
For each audio/transcript pair that will be tested, there should be an audio or video file (.mp4, .mp3, .wav, etc.) and a text file of the same name that contains a reference transcription. All of these files should go in the 'test_data' folder.
Benchmark with this dataset by passing its path as the dataset_path parameter of the Test class's run() method.
See create_test_summary.ipynb for an example of the following steps put together.
from create_test_summary.TestSummary import create_test_summary_html
create_test_summary_html(results_folder, filename="test_summary.html")
: Creates HTML file that displays test summary information with a table and bar chart.
String results_folder
: File path to the results folder containing the JSON result files
String filename
: Output name for the HTML file, defaults to test_summary.html
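For example (the results path is a placeholder):

```python
from create_test_summary.TestSummary import create_test_summary_html

# Summarize the JSON result files from a run into a single HTML page.
create_test_summary_html("results/my_run/", filename="my_summary.html")
```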
$ wandb login [ACCOUNT_KEY]
$ python3 test_hyperparams.py