jonatasgrosman / huggingsound

HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools

License: MIT License

Makefile 1.18% Python 98.82%
transformers audio speech speech-recognition asr automatic-speech-recognition speech-to-text

huggingsound's Introduction

HuggingSound

HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools.

I have no intention of building a very complex tool here. I just want an easy-to-use toolkit for my speech-related experiments. I hope this library can be helpful for someone else too :)

Requirements

  • Python 3.8+

Installation

$ pip install huggingsound

How to use it?

I'll try to summarize the usage of this toolkit below, but many things will still be missing from the documentation. I promise to make it better soon. For now, you can open an issue if you have questions, or look at the source code to see how things work. You can find more usage examples in the repository's examples folder.

Speech recognition

For speech recognition you can use any CTC model hosted on the Hugging Face Hub. You can find some available models here.

Inference

from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-english")
audio_paths = ["/path/to/sagan.mp3", "/path/to/asimov.wav"]

transcriptions = model.transcribe(audio_paths)

print(transcriptions)

# transcriptions format (a list of dicts, one for each audio file):
# [
#  {
#   "transcription": "extraordinary claims require extraordinary evidence", 
#   "start_timestamps": [100, 120, 140, 180, ...],
#   "end_timestamps": [120, 140, 180, 200, ...],
#   "probabilities": [0.95, 0.88, 0.9, 0.97, ...]
# },
# ...]
#
# as you can see, not only the transcription is returned but also the timestamps (in milliseconds) 
# and probabilities of each character of the transcription.

Inference (boosted by a language model)

from huggingsound import SpeechRecognitionModel, KenshoLMDecoder

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-english")
audio_paths = ["/path/to/sagan.mp3", "/path/to/asimov.wav"]

# The LM format used by the LM decoders is the KenLM format (arpa or binary file).
# You can download some LM files examples from here: https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english/tree/main/language_model
lm_path = "path/to/your/lm_files/lm.binary"
unigrams_path = "path/to/your/lm_files/unigrams.txt"

# We implemented three different decoders for LM-boosted decoding: KenshoLMDecoder, ParlanceLMDecoder, and FlashlightLMDecoder
# In this example, we'll use the KenshoLMDecoder
# To use this decoder you'll need to install Kensho's pyctcdecode first (https://github.com/kensho-technologies/pyctcdecode)
decoder = KenshoLMDecoder(model.token_set, lm_path=lm_path, unigrams_path=unigrams_path)

transcriptions = model.transcribe(audio_paths, decoder=decoder)

print(transcriptions)

Evaluation

from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-english")

references = [
    {"path": "/path/to/sagan.mp3", "transcription": "extraordinary claims require extraordinary evidence"},
    {"path": "/path/to/asimov.wav", "transcription": "violence is the last refuge of the incompetent"},
]

evaluation = model.evaluate(references)

print(evaluation)

# evaluation format: {"wer": 0.08, "cer": 0.02}

Fine-tuning

from huggingsound import TrainingArguments, ModelArguments, SpeechRecognitionModel, TokenSet

model = SpeechRecognitionModel("facebook/wav2vec2-large-xlsr-53")
output_dir = "my/finetuned/model/output/dir"

# first of all, you need to define your model's token set
# however, the token set is only needed for non-finetuned models
# if you pass a new token set for an already finetuned model, it'll be ignored during training
tokens = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "'"]
token_set = TokenSet(tokens)

# define your train/eval data
train_data = [
    {"path": "/path/to/sagan.mp3", "transcription": "extraordinary claims require extraordinary evidence"},
    {"path": "/path/to/asimov.wav", "transcription": "violence is the last refuge of the incompetent"},
]
eval_data = [
    {"path": "/path/to/sagan2.mp3", "transcription": "absence of evidence is not evidence of absence"},
    {"path": "/path/to/asimov2.wav", "transcription": "the true delight is in the finding out rather than in the knowing"},
]

# and finally, fine-tune your model
model.finetune(
    output_dir, 
    train_data=train_data, 
    eval_data=eval_data, # the eval_data is optional
    token_set=token_set,
)
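
If you want more control over training, finetune() also accepts optional training_args and model_args objects. Below is a sketch that follows the usage shown in the fine-tuning reports further down this page; the values are only illustrative, and you can check huggingsound.trainer.TrainingArguments and huggingsound.trainer.ModelArguments for everything that is available.

# optional: customize training through TrainingArguments/ModelArguments (imported above);
# argument names follow usage examples elsewhere on this page, the values are illustrative
training_args = TrainingArguments(
    learning_rate=3e-4,
    max_steps=1000,
    eval_steps=200,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
)
model_args = ModelArguments(
    activation_dropout=0.1,
    hidden_dropout=0.1,
)

model.finetune(
    output_dir,
    train_data=train_data,
    eval_data=eval_data,
    token_set=token_set,
    training_args=training_args,
    model_args=model_args,
)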

Troubleshooting

  • If you are having trouble when loading MP3 files: $ sudo apt-get install ffmpeg

Want to help?

See the contribution guidelines if you'd like to contribute to the HuggingSound project.

You don't even need to know how to code to contribute. Even improving our documentation is a valuable contribution.

If this project has been useful for you, please share it with your friends. This project could be helpful for them too.

If you like this project and want to motivate the maintainers, give us a ⭐. This kind of recognition will make us very happy with the work that we've done with ❤️

You can also sponsor me 😍

Citation

If you want to cite the tool you can use this:

@misc{grosman2022huggingsound,
  title={{HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools}},
  author={Grosman, Jonatas},
  howpublished={\url{https://github.com/jonatasgrosman/huggingsound}},
  year={2022}
}

huggingsound's People

Contributors

edgett, jonatasgrosman, nkaenzig, nkaenzig-aifund

huggingsound's Issues

Compatibility with Python 3.10

This package cannot be installed with Python 3.10.

When trying to install the wheel manually, it complains that the required numba version is not available.
Is numba<0.54.0,>=0.53.1 really required, instead of e.g. numba==0.55.0 or any other version that is available for Python 3.10?

I would love to use this library, but currently it does not seem to be possible to install it on Ubuntu 22.04.

SpeechRecognitionModel.transcribe only transcribes first sentence of long 5 minute audio file

I have a 5-minute-long audio recording, but when using huggingsound to transcribe it, only the first sentence was returned.

I'm using GreedyDecoder, like below:

import torch
from huggingsound import SpeechRecognitionModel

device = "cuda" if torch.cuda.is_available() else "cpu"
batch_size = 1
model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-english", device=device)
audio_paths = ["/path/to/sagan.mp3", "/path/to/asimov.wav"]

transcriptions = model.transcribe(audio_paths, batch_size=batch_size)

print(transcriptions)
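
One possible workaround (not a huggingsound feature, just a sketch using librosa and soundfile, which huggingsound already depends on) is to split the long recording into fixed-length chunks before transcription. The 30-second chunk length and the temporary file names below are illustrative assumptions.

import librosa
import soundfile as sf
from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-english")

# load the long recording at 16 kHz and split it into 30-second chunks
waveform, sr = librosa.load("/path/to/long_recording.mp3", sr=16000)
chunk_samples = 30 * sr

chunk_paths = []
for i, start in enumerate(range(0, len(waveform), chunk_samples)):
    chunk_path = f"/tmp/chunk_{i}.wav"
    sf.write(chunk_path, waveform[start:start + chunk_samples], sr)
    chunk_paths.append(chunk_path)

# transcribe each chunk and join the texts
transcriptions = model.transcribe(chunk_paths)
full_text = " ".join(t["transcription"] for t in transcriptions)
print(full_text)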

Error during fine-tuning

I have code:

from huggingsound import TrainingArguments, ModelArguments, SpeechRecognitionModel, TokenSet
from transformers import Wav2Vec2Processor

processor_ref = Wav2Vec2Processor.from_pretrained("/my/dir/wav2vec2-large-xlsr-53-kalmyk")
token_list = list(processor_ref.tokenizer.encoder.keys())
print(len(token_list))

model = SpeechRecognitionModel("/my/dir/wav2vec2-large-xlsr-53-kalmyk")
output_dir = "/my/dir/tuned"

token_set = TokenSet(token_list)

model.finetune(
    output_dir, 
    train_data=train_data,
    token_set=token_set
)

I have a list of dicts like this in my train_data:

train_data = [
    {"path": "/path/to/sagan.mp3", "transcription": "extraordinary claims require extraordinary evidence"},
    {"path": "/path/to/asimov.wav", "transcription": "violence is the last refuge of the incompetent"},
]

Then I get the errors below. Can someone help me with that?

	size mismatch for lm_head.weight: copying a param with shape torch.Size([41, 1024]) from checkpoint, the shape in current model is torch.Size([45, 1024]).
	size mismatch for lm_head.bias: copying a param with shape torch.Size([41]) from checkpoint, the shape in current model is torch.Size([45]).
	You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
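
Not an official answer, just a debugging sketch: the error says the checkpoint's lm_head was trained for 41 tokens while the rebuilt model expects 45, so it may help to inspect the vocabulary the checkpoint was actually trained with and compare it against the token list passed to TokenSet. Note that huggingsound adds special tokens such as <pad>, <unk>, <s>, </s> and | when they are missing from the provided list (see the warnings in the "Throw Segmentation fault" report further down this page), which could change the vocabulary size.

from transformers import Wav2Vec2Processor

# inspect the checkpoint vocabulary and the token list being passed to TokenSet;
# a size difference here would be consistent with the 41 vs. 45 lm_head mismatch above
processor_ref = Wav2Vec2Processor.from_pretrained("/my/dir/wav2vec2-large-xlsr-53-kalmyk")
token_list = list(processor_ref.tokenizer.encoder.keys())

print("checkpoint vocabulary size:", len(processor_ref.tokenizer.encoder))
print("tokens passed to TokenSet:", len(token_list), sorted(token_list))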

Out of memory for long audios

Hi,

I think the title speaks for itself. I'm trying to use the 1b French model on several audios. It works fine for short audios (< 1 min) but uses up all the RAM with an 8-minute audio. Is this to be expected? Is there a way to make it work without having to split long audios into smaller chunks?

Thanks in advance

I want to contribute by adding wavaugment

Hi, thank you for this amazing library, it's so much easier to use than putting together my own dataset script and adapting Hugging Face's provided script.

I want to contribute to this repo by adding WavAugment functionality to the fine-tuning process to enhance the fine-tuned models. To my knowledge, WavAugment works amazingly well with wav2vec2-based models.

Are you working on this feature? If not I would love to send a PR.

raise NoBackendError() audioread.exceptions.NoBackendError

First of all thank you for your work!
I am not able to run the transcribe() method.

This is my code:

from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-german")
path = r"C:/Users/johndoe/PycharmProjects/kedro_pipeline/data/01_raw/Bond_ueber_das_wetter_und_berlin.mp3"
audio_paths = [path]

transcriptions = model.transcribe(audio_paths)

I assume my path is not correct, but I already tried different formats:

r"C:\\Users\\johndoe\\..."
r"C:\Users\johndoe\..."

-> did not work either.

This is the output:

02/24/2022 11:40:11 - INFO - huggingsound.speech_recognition.model - Loading model...
  0%|          | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\librosa\core\audio.py", line 149, in load
    with sf.SoundFile(path) as sf_desc:
  File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\soundfile.py", line 1183, in _open
    _error_check(_snd.sf_error(file_ptr),
  File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'C:/Users/johndoe/PycharmProjects/kedro_pipeline/data/01_raw/Bond_ueber_das_wetter_und_berlin.mp3': File contains data in an unknown format.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/johndoe/PycharmProjects/main.py", line 7, in <module>
    transcriptions = model.transcribe(audio_paths)
  File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\huggingsound\speech_recognition\model.py", line 108, in transcribe
    waveforms = get_waveforms(paths_batch, sampling_rate)
  File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\huggingsound\utils.py", line 52, in get_waveforms
    waveform, sr = librosa.load(path, sr=sampling_rate)
  File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\librosa\core\audio.py", line 166, in load
    y, sr_native = __audioread_load(path, offset, duration, dtype)
  File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\librosa\core\audio.py", line 190, in __audioread_load
    with audioread.audio_open(path) as input_file:
  File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\audioread\__init__.py", line 116, in audio_open
    raise NoBackendError()
audioread.exceptions.NoBackendError

Process finished with exit code 1

Optional unigrams

According to the type hints in the KenshoLMDecoder initialization routine, the unigram path argument is optional, but the code still crashes if it is not provided.

A simple fix is to add the line:

self.unigrams = None

somewhere before the line here:

        if self.unigrams_path is not None:
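
For illustration, here's the suggested fix in context (a sketch only; the surrounding lines are paraphrased from this issue, not copied from huggingsound's source):

# inside KenshoLMDecoder.__init__ (sketch)
self.unigrams = None  # default, so later code can rely on the attribute existing

if self.unigrams_path is not None:
    # existing code that loads the unigrams from self.unigrams_path goes here
    ...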

Fine-tuning hardware requirement

Thanks for the great work,

Can you share the training hardware information, like GPU model and memory, with me? I want to fine-tune the larger models like jonatasgrosman/wav2vec2-xls-r-1b-german and need to estimate the hardware requirements.

is_finetuned(self) returns True for facebook/wav2vec2-base

SpeechRecognitionModel.is_finetuned returns True for the base model facebook/wav2vec2-base, which is not fine-tuned.

How to reproduce:

from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("facebook/wav2vec2-base")

assert model.is_finetuned == False

The issue seems to be that Wav2Vec2Processor.from_pretrained() does return a processor with a PreTrainedTokenizer for this model.

Therefore is_finetuned() yields True:
https://github.com/jonatasgrosman/huggingsound/blob/main/huggingsound/speech_recognition/model.py#L58-L78

Possibly this is not an issue of this Repo, but of the files on the huggingface model hub:
https://huggingface.co/facebook/wav2vec2-base/tree/main

Is there a reason why this model has preprocessor_config.json and tokenizer_config.json files?

If you look at facebook/wav2vec2-large, this one doesn't have these files, and therefore Wav2Vec2Processor.from_pretrained() won't return a processor/tokenizer.

Skipped Reference when using evaluation

I received a warning message:
WARNING:huggingsound.speech_recognition.model:6 references skipped because they were empty after text normalization

Should I worry about this message? Does it mean some of the data were not used for evaluation? (6 Skipped, 2 Used?)

Here are my sample data:

{"path": "sample-002469.mp3", "transcription": "we speak of them only to children once before"},
{"path": "sample-003480.mp3", "transcription": "why can't you be serious"},
{"path": "sample-002944.mp3", "transcription": "finally a young woman approached who was not dressed in black"},
{"path": "sample-001877.mp3", "transcription": "the woman was silent for some time"},
{"path": "sample-002230.mp3", "transcription": "the boy told him then that he needed to get to the pyramids"},
{"path": "sample-003314.mp3", "transcription": "you brought a new feeling into my crystal shop"},
{"path": "sample-003743.mp3", "transcription": "even though the sheep didn't teach me to speak arabic"},
{"path": "sample-004448.mp3", "transcription": "before this i always looked to the desert with longing said fatima"},

Space token problem

Hello, first of all, thanks for the lovely code!

I'm trying to fine-tune XLSR-53 with some French data; the code is just from the examples directory:

model = SpeechRecognitionModel("facebook/wav2vec2-large-xlsr-53", device=device)
output_dir = "wav2vec_finetuned_fr"


alphabet = ["a", "b", "c", "d", "e", "f", "g", "h", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "'", "-", "é", "à", "è", "ù", "ç", "â", "ê", "î", "ô", "û", "ë", "ï", "ü", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
token_set = TokenSet(alphabet)

training_args = TrainingArguments(
    learning_rate=3e-4,
    max_steps=1000,
    eval_steps=200,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
)
model_args = ModelArguments(
    activation_dropout=0.1,
    hidden_dropout=0.1,
)


# and finally, fine-tune your model
model.finetune(
    output_dir,
    train_data=train_data,
    eval_data=eval_data, # the eval_data is optional
    token_set=token_set,
    training_args=training_args,
    model_args=model_args,
)
However I get a training error:

  File "/usr/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/usr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 1702, in forward
    raise ValueError(f"Label values must be <= vocab_size: {self.config.vocab_size}")
ValueError: Label values must be <= vocab_size: 56

Spaces are the problem; id 56 corresponds to whitespace. Here's an example sentence tokenized:

[15, 8, 56, 16, 8, 21, 2, 21, 8, 3, 12, 56, 15, 30, 56, 0, 22, 31, 19, 18, 15, 12, 2, 8, 56, 0, 56, 16, 0, 21, 20, 24, 32, 56, 48, 56, 1, 24, 23, 22, 56, 2, 18, 17, 23, 21, 8, 56, 21, 0, 2, 12, 17, 10, 56]
le mercredi l' as-police a marqué 3 buts contre racing

As far as I can see from the code, special tokens and spaces are added by the TokenSet code. What am I doing wrong? 😊

Finetuned model produces empty transcriptions

What are your recommendations for fine-tuning on 100 audio files? Are 1000 steps a must?
After 40 steps with 44 kHz, 16 kHz, and 8 kHz audio, the transcription results are empty.

Is there a way to fine-tune on a GPU with less than 20 GB of GPU memory? I am using Colab, and it gives less memory than huggingsound training seems to require. Thank you very much, I appreciate your help!


Throw Segmentation fault when finetuning on cuda

(sound) yons@gpu1-4090:~/wav2text$ python sound-finetune.py
using cuda:0
model_path /home/yons/wav2text/my/finetuned/model
02/15/2024 06:27:09 - INFO - huggingsound.speech_recognition.model - Loading model...
02/15/2024 06:27:10 - WARNING - root - blank_token <pad> not in provided tokens. It will be added to the list of tokens
02/15/2024 06:27:10 - WARNING - root - silence_token | not in provided tokens. It will be added to the list of tokens
02/15/2024 06:27:10 - WARNING - root - unk_token <unk> not in provided tokens. It will be added to the list of tokens
02/15/2024 06:27:10 - WARNING - root - bos_token <s> not in provided tokens. It will be added to the list of tokens
02/15/2024 06:27:10 - WARNING - root - eos_token </s> not in provided tokens. It will be added to the list of tokens
begin trainning
02/15/2024 06:27:10 - WARNING - huggingsound.speech_recognition.model - The model is already fine-tuned. So the provided token_set won't be used. The model's token_set will be used instead
02/15/2024 06:27:10 - INFO - huggingsound.speech_recognition.model - Loading training data...
02/15/2024 06:27:10 - INFO - huggingsound.speech_recognition.model - Converting data format...
02/15/2024 06:27:10 - INFO - huggingsound.speech_recognition.model - Preparing data input and labels...
Map: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 14.41 examples/s]
02/15/2024 06:27:11 - INFO - huggingsound.speech_recognition.model - Loading evaluation data...
02/15/2024 06:27:11 - INFO - huggingsound.speech_recognition.model - Converting data format...
02/15/2024 06:27:11 - INFO - huggingsound.speech_recognition.model - Preparing data input and labels...
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 1098.46 examples/s]
02/15/2024 06:27:11 - INFO - huggingsound.speech_recognition.model - Starting fine-tuning process...
02/15/2024 06:27:11 - INFO - huggingsound.trainer - Getting dataset stats...
02/15/2024 06:27:11 - INFO - huggingsound.trainer - Training dataset size: 6 samples, 0.011141649305555557 hours
02/15/2024 06:27:11 - INFO - huggingsound.trainer - Evaluation dataset size: 6 samples, 0.011141649305555557 hours
/home/yons/.conda/envs/sound/lib/python3.10/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py:1913: FutureWarning: The method `freeze_feature_extractor` is deprecated and will be removed in Transformers v5. Please use the equivalent `freeze_feature_encoder` method instead.
  warnings.warn(
02/15/2024 06:27:12 - INFO - huggingsound.trainer - Building trainer...
02/15/2024 06:27:12 - INFO - huggingsound.trainer - Starting training...
02/15/2024 06:27:12 - INFO - huggingsound.trainer - Calling train /home/yons/wav2text/my/finetuned/model
  0%|                                                                                                                                                                                                                       | 0/20 [00:00<?, ?it/s]/home/yons/.conda/envs/sound/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/home/yons/.conda/envs/sound/lib/python3.10/site-packages/torch/nn/parallel/_functions.py:68: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
Segmentation fault (core dumped)

env: python 3.10

torch==2.1.2
torchvision==0.16.2
torchaudio==2.1.2
accelerate==0.25.0
torchmetrics==1.2.1
transformers==4.35.2
datasets==2.16.1
jiwer==3.0.3
librosa==0.10.1

Convergence Speed of Wav2Vec

Thanks for sharing your code and fine-tuned models, that helps me a lot!
I have some confusion about fine-tuning wav2vec 2.0. When I fine-tuned a Chinese wav2vec2 from facebook/wav2vec2-large-xlsr-53, I found it converged too slowly. So I want to know what batch_size you set and when the model converged (at which epoch)?

Thank you again!

[feature request/idea] Add time stamps for the start/end of each word

The code currently seems to provide only time stamps for different characters. But often it’s more useful to have time stamps for separate words. This could be done by tokenizing the text and matching the start/end of each token to the time stamps for the relevant characters.
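
A rough sketch of that idea (not part of huggingsound): assuming the transcription string and the start_timestamps/end_timestamps lists returned by model.transcribe() are aligned character by character, as the README output format suggests, the character timestamps can be grouped into word-level ones by splitting on spaces.

# sketch: derive word-level timestamps from the character-level output of model.transcribe()
def word_timestamps(result):
    words = []
    current = None
    for char, start, end in zip(result["transcription"],
                                result["start_timestamps"],
                                result["end_timestamps"]):
        if char == " ":
            if current is not None:
                words.append(current)
                current = None
        elif current is None:
            current = {"word": char, "start": start, "end": end}
        else:
            current["word"] += char
            current["end"] = end
    if current is not None:
        words.append(current)
    return words

# usage: word_timestamps(model.transcribe(["/path/to/sagan.mp3"])[0])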

Challenges:

  • Making this work well for different languages. (How to turn this into an intuitive interface?)

'CTCTrainer' object has no attribute 'use_amp'

I'm using the latest huggingsound:

#!pip list | grep huggingsound
huggingsound 0.1.4

An AttributeError occurs when finetune is performed as in the sample below.
https://github.com/jonatasgrosman/huggingsound#fine-tuning

/usr/local/lib/python3.7/dist-packages/huggingsound/trainer.py in training_step(self, model, inputs)
    432         inputs = self._prepare_inputs(inputs)
    433 
--> 434         if self.use_amp:
    435             with torch.cuda.amp.autocast():
    436                 loss = self.compute_loss(model, inputs)

AttributeError: 'CTCTrainer' object has no attribute 'use_amp'

Can you find the cause?

Using more GPU memory

Hello. Running the code below, it doesn't use more than 2129 MiB out of 24564 MiB (less than 10%). Trying to increase the batch size resulted in running out of RAM (the process was killed). How can I make better use of the GPU memory to speed up the training?

def train_spanish(environment):
    torch.device("cuda")

    model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-spanish")
    processor = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-spanish")
    token_list = list(processor.tokenizer.encoder.keys())
    token_set = TokenSet(token_list)

    train_set = []
    eval_set = []

    train_set, eval_set = add_sealed_data_set(train_set, eval_set, config[environment][SAMPLES_DIR])

    training_arguments = TrainingArguments()
    training_arguments.overwrite_output_dir = True
    training_arguments.per_device_train_batch_size = 128
    training_arguments.per_device_eval_batch_size = 128

    model.finetune(
        config[environment][MODEL_OUTPUT_DIR],
        train_data=train_set,
        eval_data=eval_set,  # the eval_data is optional
        token_set=token_set,
        training_args=training_arguments
    )

KeyError: 'transcription'

File "/usr/local/bin/source/huggingsound/examples/speech_recognition/finetune.py", line 71, in <module>                                                                              
    model.finetune(                                                                                                                                                                    
  File "/usr/local/lib/python3.10/dist-packages/huggingsound/speech_recognition/model.py", line 353, in finetune                                                                       
    train_dataset = self._get_dataset(processor, text_normalizer, train_data, train_data_cache_dir, training_args.length_column_name, num_workers)                                     
  File "/usr/local/lib/python3.10/dist-packages/huggingsound/speech_recognition/model.py", line 272, in _get_dataset                                                                   
    dataset = self._prepare_dataset_for_finetuning(dataset, processor, text_normalizer, length_column_name, num_workers)                                                               
  File "/usr/local/lib/python3.10/dist-packages/huggingsound/speech_recognition/model.py", line 251, in _prepare_dataset_for_finetuning                                                
    dataset = dataset.map(                                                                                                                                                             
  File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 580, in wrapper                                                                                       
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)                                                                                                                 
  File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 545, in wrapper                                                                                       
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)                                                                                                                 
  File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 3087, in map                                                                                          
    for rank, done, content in Dataset._map_single(**dataset_kwargs):                                                                                                                  
  File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 3441, in _map_single                                                                                  
    example = apply_function_on_filtered_inputs(example, i, offset=offset)                                                                                                             
  File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 3344, in apply_function_on_filtered_inputs                                                            
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)                                                                                                               
  File "/usr/local/lib/python3.10/dist-packages/huggingsound/speech_recognition/model.py", line 242, in __process_dataset_sample                                                       
    transcription = text_normalizer(sample["transcription"]) + " "                                                                                                                     
  File "/usr/local/lib/python3.10/dist-packages/datasets/formatting/formatting.py", line 270, in __getitem__                                                                           
    value = self.data[key]
KeyError: 'transcription'

To fix the error, change line 242 from:
transcription = text_normalizer(sample["transcription"]) + " "
to:
transcription = text_normalizer(sample["sentence"]) + " "

'CTCTrainer' object has no attribute 'deepspeed'

Hi Jonatas,

Thanks for creating the huggingsound repository. I have personally found it useful in building speech technologies for new data.

I am facing the below issue when I attempted fine tuning "facebook/wav2vec2-xls-r-300m" model on my own dataset. Please note that the end-to-end setup is created on Kaggle.

AttributeError: 'CTCTrainer' object has no attribute 'deepspeed'

        if (hasattr(self, 'use_amp') and self.use_amp) or (hasattr(self, 'use_cuda_amp') ...
            self.scaler.scale(loss).backward()
❱       elif self.deepspeed:
            self.deepspeed.backward(loss)
        else:
            loss.backward()

I have tried the following, despite which the error still appears:

  • pip install deepspeed
  • I noticed that the TrainingArguments class doesn't have a "deepspeed" attribute; probably the issue is because of that

Attaching the training arguments for your kind reference:

TrainingArguments(overwrite_output_dir=True,
ignore_pretrained_weights=False,
dataloader_num_workers=0,
learning_rate=0.0003,
min_learning_rate=0.0,
weight_decay=0.0,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
max_grad_norm=1.0,
lr_warmup_steps=0,
lr_decay_steps=0, eval_steps=200,
group_by_length=True,
length_column_name='length',
gradient_accumulation_steps=1,
gradient_checkpointing=True,
pad_to_multiple_of=None,
per_device_train_batch_size=2,
per_device_eval_batch_size=2,
fp16=False,
use_8bit_optimizer=False,
logging_steps=100,
num_train_epochs=3.0,
max_steps=1000,
report_to=['none'],
save_total_limit=None,
metric_for_best_model=None,
_n_gpu=1,
seed=42,
training_step_callbacks=[],
batch_creation_callbacks=[],
evaluation_callbacks=[],
metrics_batch_size=None,
show_dataset_stats=True,
early_stopping_patience=None,
load_best_model_at_end=False)

Please let me know if I have missed out anything. I really appreciate your time in advance.

Thanks and Regards,
Pradeep

Inference issue after finetuning for spanish

Hi!

First of all, thank you for your code and your models! Really really useful!

I've used the fine-tuning script to try to fine-tune it for Spanish with the Common Voice dataset. However, at inference time, given any audio from the Common Voice test set, the model generates an empty string. Have you faced this issue before? I didn't change anything in your script, so I don't know where the problem could be.

Thanks again! (ps. your jonatasgrosman/wav2vec2-xls-r-1b-spanish model is amazing, congrats!!)

Memory issue

Thanks for this wonderful package. We were trying to fine-tune on a dataset with around 30k records but are getting a memory error. Could you let me know how to control the batch size?

I tried changing the batch size value in the trainer file, but it's not working out. Can you advise?
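
For reference, the batch size can be passed through TrainingArguments instead of editing the trainer file. A sketch based on the fine-tuning example in the README above; the concrete values are illustrative, not recommendations:

from huggingsound import TrainingArguments

# smaller per-device batches, optionally compensated with gradient accumulation,
# usually reduce peak memory usage during fine-tuning
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
)

model.finetune(
    output_dir,
    train_data=train_data,
    eval_data=eval_data,
    token_set=token_set,
    training_args=training_args,
)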

Different evaluation results on HuggingFace and locally

Hello
I encountered a problem.

Model: jonatasgrosman/wav2vec2-xls-r-1b-russian

Example 1

On HuggingFace using Hosted inference API (good):

рекомендуем при обращении в контактный центр использовать код клиента

Locally, using the huggingsound library (bad: missing whitespace):

рекомендуем приобращение в контактный центр использовать кодклиента

Example 2

On HuggingFace using Hosted inference API (good):

в настоящий момент по техническим причинам купюры номиналом пять тысяч рублей действительно не принимаются в некоторых банкоматах

Locally, using the huggingsound library (bad: wrong word endings and spelling):

в настоящий момент по техническим причинам купюра номеналом пять тысяч рублей действительно не принимаются в некоторых банкоматах

jonatasgrosman

Jonatas, how's it going?
Man, how do I use this project of yours?
I followed the steps here: https://pypi.org/project/huggingsound/, but no luck, nothing works. :(

from huggingsound import SpeechRecognitionModel
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    from huggingsound import SpeechRecognitionModel
  File "C:\Users\Fabio\AppData\Local\Programs\Python\Python37\lib\site-packages\huggingsound\__init__.py", line 1, in <module>
    from huggingsound.trainer import TrainingArguments, ModelArguments
  File "C:\Users\Fabio\AppData\Local\Programs\Python\Python37\lib\site-packages\huggingsound\trainer.py", line 9, in <module>
    from datasets import Dataset
  File "C:\Users\Fabio\AppData\Local\Programs\Python\Python37\lib\site-packages\datasets\__init__.py", line 43, in <module>
    from .arrow_dataset import Dataset
  File "C:\Users\Fabio\AppData\Local\Programs\Python\Python37\lib\site-packages\datasets\arrow_dataset.py", line 60, in <module>
    from huggingface_hub import HfApi, HfFolder
  File "C:\Users\Fabio\AppData\Local\Programs\Python\Python37\lib\site-packages\huggingface_hub\__init__.py", line 322, in __getattr__
    submod = importlib.import_module(submod_path)
  File "C:\Users\Fabio\AppData\Local\Programs\Python\Python37\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\Users\Fabio\AppData\Local\Programs\Python\Python37\lib\site-packages\huggingface_hub\hf_api.py", line 32, in <module>
    import requests
  File "C:\Users\Fabio\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\__init__.py", line 43, in <module>
    import urllib3
  File "C:\Users\Fabio\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\__init__.py", line 42, in <module>
    "urllib3 v2.0 only supports OpenSSL 1.1.1+, currently "
ImportError: urllib3 v2.0 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'OpenSSL 1.1.0h 27 Mar 2018'. See: urllib3/urllib3#2168

Cut a new release?

It seems there hasn't been a release in almost a year, though lots of dependencies have been updated. This makes at least the readme somewhat confusing, as it instructs users to install a very old version.

Is there any intention to deploy to PyPI? By deploying to PyPI you allow users to generate hashes in their requirements files. Installing directly from GitHub means that isn't possible, unless I'm mistaken.

AttributeError: 'CTCTrainer' object has no attribute 'use_amp'

Hello again, I am trying to perform fine-tuning and it returns this error. I searched inside the code and 'use_amp' is in the 'training_step' method.
How can I solve it? Thanks in advance.

To reproduce:
conda create -n huggingsound python=3.9 -y
conda activate huggingsound
pip install huggingsound

and then, after defining the model, training data, tokens, and output directory, I run the 'model.finetune' method and get this error.

Transcriptions have no spaces - wav2vec2-xls-r-1b-spanish

I am working on speech-to-text for ~135-second (or shorter) Spanish audios recorded with lapel microphones or VR goggles. I am using wav2vec2-xls-r-1b-spanish and the provided lm.binary and unigrams.txt language model files. They are the ones downloaded from jonatasgrosman/wav2vec2-large-xlsr-53-spanish, but based on the size they seem to be exactly the same for the 1b model. I originally started with the large version, but I opted for 1b for better performance.

My plan is to process the text with the pysentimiento pre-trained Spanish sentiment and emotion analyzer. The problem I have is that the text does not have spaces separating the words.

Is there a quick fix for this or any suggestions?

Example:
alesundíamanormalparamímelevantosobrelasochodelamañana desayunasepredesayunoalomismodeayunosquirconceriales yfrutameduchomeevistoacosasenchilavoycaminandosube lacuestahastaelaparadadelautobustyietesperoquevenga autobusesestallevaalaparadadesanlorenzocojoelmetro

code:


model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-xls-r-1b-spanish")
lm_path = "language_model/lm.binary"
unigrams_path = "language_model/unigrams.txt"
decoder = KenshoLMDecoder(model.token_set, lm_path=lm_path, unigrams_path=unigrams_path)

def process_single_audio(correct_path, sr=16000,):
   

    #y, sr = librosa.load(str(path+correct_path),sr=sr)
    transcriptions = model.transcribe([str(correct_path)[1:]], decoder=decoder)

    print(transcriptions[0]['transcription'])


    return transcriptions[0]['transcription']

Trouble with installation

Hi, the title explains the issue I am encountering. Running pip install huggingsound on Python 3.7 does not work. I am working in a clean conda environment using Python 3.7 on an AWS instance.
The problem seems to be (so far) the python-Levenshtein module.

Building wheel for python-Levenshtein (setup.py) ... error

and then it prints quite a lot of red lines. The final ones are:

ERROR: Command errored out with exit status 1: /opt/conda/envs/lm_asr/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-4k9a6e70/python-levenshtein_fac72c8f097c43cca31ab0f413816b09/setup.py'"'"'; __file__='"'"'/tmp/pip-install-4k9a6e70/python-levenshtein_fac72c8f097c43cca31ab0f413816b09/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-qum2of9f/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/envs/lm_asr/include/python3.7m/python-Levenshtein Check the logs for full command output.

Thanks in advance for your attention.

Question about '1b' model

Dear Jonatas,

Question, not a bug-report.
The jonatasgrosman/wav2vec2-xls-r-1b-german model removes all numbers.
Is there a way to recognize numbers?

Thank you for your great models!
Best wishes from Vienna
Markus

Testcase - output.zip
Meaning: etwa 20000 euro - ungefähr 12000 euro, 1b result: etwa euro - ungefähr euro

import torch, transformers, librosa
filepath = 'output.wav'
for MODEL_ID in ['jonatasgrosman/wav2vec2-large-xlsr-53-german','jonatasgrosman/wav2vec2-xls-r-1b-german']:
    processor = transformers.Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = transformers.Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    speech_array, sampling_rate = librosa.load(filepath, sr=16_000)
    inputs = processor(speech_array, sampling_rate=16_000, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    predicted_sentences = processor.batch_decode(predicted_ids)
    print( MODEL_ID, predicted_sentences[0] )

Install Issue

I am not able to install via PIP as I get this error message:

INFO: pip is looking at multiple versions of huggingsound to determine which version is compatible with other requirements. This could take a while.
ERROR: Ignored the following versions that require a different python version: 0.0.1 Requires-Python >=3.7,<3.10; 0.1.0 Requires-Python >=3.7,<3.10; 0.1.1 Requires-Python >=3.7,<3.10; 0.1.2 Requires-Python >=3.7,<3.10; 0.1.3 Requires-Python >=3.7,<3.10; 0.1.4 Requires-Python >=3.7,<3.10; 0.1.5 Requires-Python >=3.7,<3.10
ERROR: Could not find a version that satisfies the requirement torch!=1.12.0,<1.13.0,>=1.7 (from huggingsound) (from versions: 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0)
ERROR: No matching distribution found for torch!=1.12.0,<1.13.0,>=1.7

About self-defined token set

Hello!
I am trying to use my own token set and it's like this:
tokens = ["b", "p", "m", "f", "d", "t", "n", "l", "g", "k",
"h", "j", "q", "x", "zh", "ch", "sh", "r", "z", "c",
"s", "y", "w", "a", "o", "e", "i", "u", "v", "ai",
"ei", "ui", "ao","ou", "iu", "ie", "ve", "er", "an", "en",
"in", "un", "vn", "ang", "eng", "ing", "ong"]

My transcriptions are like:
"g u an b i k ong t i ao", "d a k ai d eng d ai", "r an r e g u an d i ao zh u an x i ang w en d u er"

I wonder whether, if I fine-tune directly with the current dictionary and transcriptions, they will be mapped correctly, since the transcriptions are already separated by spaces.

Issue with converting large dataset_from_dict_list

The following block slows down with larger dict lists and becomes completely unusable when the list contains millions of items:

keys = data[0].keys()
transformed_data = {}
for key in keys:
    for d in data:
        transformed_data[key] = transformed_data.get(key, []) + [d[key]]

changing it to

    keys = data[0].keys()
    transformed_data = {key: [d[key] for d in data] for key in keys}

fixes the issue

CUDA error

Hello,

During my fine-tuning I get an error:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Here is my code snippet:

import torch
import shutil
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

from huggingsound import TrainingArguments, ModelArguments, SpeechRecognitionModel, TokenSet
device = "cuda" if torch.cuda.is_available() else "cpu"

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-french", device=device)
output_dir = "/content/drive/MyDrive/wav-example/output2"

for filename in os.listdir(output_dir):
    file_path = os.path.join(output_dir, filename)
    try:
        if os.path.isfile(file_path) or os.path.islink(file_path):
            os.unlink(file_path)
        elif os.path.isdir(file_path):
            shutil.rmtree(file_path)
    except Exception as e:
        print(f"Failed to delete {file_path}. Reason: {e}")

# first of all, you need to define your model's token set
# however, the token set is only needed for non-finetuned models
# if you pass a new token set for an already finetuned model, it'll be ignored during training
# Note that adding these tokens is crucial, as their absence could affect the model's performance
# or even cause errors during training or inference.
tokens = [
    "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
    "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
    "'", "", "|", "", "", ""
]
token_set = TokenSet(tokens)

# define your train/eval data
train_data = [
    {"path": "/content/drive/MyDrive/wav-example/audio4.wav", "transcription": "bonjour je m'appelle Manuel je développe sous Androïd en Kotlin je fais des applications mobiles pour la société forestière je travaille dans la classification et reconnaissance vocale dans les essences et dans le domaine de la foresterie merci"},
]
eval_data = [
    {"path": "/content/drive/MyDrive/wav-example/audio5.wav", "transcription": "je m'appelle Julien je développe sous Androïd fullstack pour la société forestière"},
]

# the lines below will load the training and model arguments objects,
# you can check the source code (huggingsound.trainer.TrainingArguments and huggingsound.trainer.ModelArguments)
# to see all the available arguments
training_args = TrainingArguments(
    learning_rate=3e-4,
    max_steps=1000,
    eval_steps=200,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
)
model_args = ModelArguments(
    activation_dropout=0.1,
    hidden_dropout=0.1,
)

evaluation = model.evaluate(eval_data)

print(evaluation)

# and finally, fine-tune your model
model.finetune(
    output_dir,
    train_data=train_data,
    eval_data=eval_data,  # the eval_data is optional
    token_set=token_set,
    training_args=training_args,
    model_args=model_args,
)

This is on Google Colab Pro+ with an NVIDIA A100 CUDA GPU.


Attention mask error in certain models

Some models use an attention mask, but some don't. For example, running HuggingSound with the Facebook VoxPopuli models (e.g. facebook/wav2vec2-base-10k-voxpopuli-ft-pl) crashes with an error that the attention mask cannot be found.

The solution is to change the relevant code here:

            with torch.no_grad():
                logits = self.model(inputs.input_values.to(self.device), attention_mask=inputs.attention_mask.to(self.device)).logits

We just need to add a simple check:

            with torch.no_grad():
                if 'attention_mask' in inputs:
                    logits = self.model(inputs.input_values.to(self.device),attention_mask=inputs.attention_mask.to(self.device)).logits
                else:
                    logits = self.model(inputs.input_values.to(self.device)).logits

I'm too lazy to make a PR now, so if someone could do it that would be swell :-)

truth should be a list of list of strings after transform which are non-empty

I'm using the latest huggingsound:

#!pip list | grep huggingsound
huggingsound 0.1.4

When I run evaluate on a finetuned model, I get a ValueError.

/usr/local/lib/python3.7/dist-packages/jiwer/measures.py in _preprocess(truth, hypothesis, truth_transform, hypothesis_transform)
    332     if not _is_list_of_list_of_strings(transformed_truth, require_non_empty_lists=True):
    333         raise ValueError(
--> 334             "truth should be a list of lists of strings after transform which are non-empty"
    335         )
    336     if not _is_list_of_list_of_strings(

ValueError: truth should be a list of list of strings after transform which are non-empty

However, I do not think it is a bug in evaluate, because it completes successfully for the pre-trained model used as is, as shown below.
https://github.com/jonatasgrosman/huggingsound#evaluation

I think something is going wrong during finetune. What does this message suggest I should do? Any advice would be appreciated.
Any advice would be appreciated.

Using ASR on GPU

Hi,
Thanks for the great work!
How do I run the following on GPU?

transcriptions = model.transcribe(audio_paths)

I am transcribing Japanese, and 1 min of audio takes about 1 min to transcribe, so a GPU is needed. Thank you!

EDIT: I figured it out, all I had to do was pass cuda as one of the parameters.

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-japanese", 
	"cuda")

Pre-trained uppercase models don't work

First of all thanks for this great library, it's really helpful :)

I just tried to fine-tune a model by facebook that they previously fine-tuned on English transcription tasks: facebook/wav2vec2-large-960h-lv60-self.

During the training I get WERs of 100%, and after training, model.transcribe() returns empty results.

The issue seems to be that this model was trained with an upper-case character vocabulary.

To overcome this, I found this very easy fix, which just converts the vocabulary of the encoder/decoder to lower case:

from huggingsound import TrainingArguments, ModelArguments, SpeechRecognitionModel, TokenSet
model = SpeechRecognitionModel(model_name, device='cuda')

model.processor.tokenizer.encoder = {k.lower(): v for k, v in model.processor.tokenizer.encoder.items()}
model.processor.tokenizer.decoder = {k: v.lower() for k, v in model.processor.tokenizer.decoder.items()} 

Would be great to integrate this somehow into the library.

Fine-tuned version of model - a raised exception

Hello.

I am getting the following exception:

ValueError: Not fine-tuned model! Please, fine-tune the model first.

I have looked into the code and see that it needs to have Wav2Vec2ForPreTraining (self.model_config.architectures) in the ctc_finetuded_architectures variable.

Now this variable has these values:

{'WavLMForCTC', 'HubertForCTC', 'UniSpeechSatForCTC', 'Wav2Vec2ForCTC', 'UniSpeechForCTC', 'SEWForCTC', 'SEWDForCTC'}

I am running the code with this model - https://huggingface.co/Yehor/wav2vec2-xls-r-300m-uk-with-lm

I disabled the code that raises that exception and it seems there is no issue.

I would like to use some type of configuration to be able to run the code without changing the library code.

Getting error during training

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

This means that the data was not moved to the GPU.

My code:

torch.device("cuda")

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-spanish", device="cuda")
processor_ref = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-spanish")
token_list = list(processor_ref.tokenizer.encoder.keys())
token_set = TokenSet(token_list)

train_set = []
eval_set = []

train_set, eval_set = add_sealed_data_set(train_set, eval_set, config[environment][SAMPLES_DIR])

training_arguments = TrainingArguments()
training_arguments.overwrite_output_dir = True
training_arguments.per_device_train_batch_size = 128
training_arguments.per_device_eval_batch_size = 128

model.finetune(
    config[environment][MODEL_OUTPUT_DIR],
    train_data=train_set,
    eval_data=eval_set,  # the eval_data is optional
    token_set=token_set,
    training_args=training_arguments
)

I'm managing to work around this by moving my dataset to CUDA inside the huggingsound code. If I can make it work, I'll create a PR.

Possible issue when using HuggingFace portuguese language model

First, thank you very much for this great project, it makes ASR very easy!

And your models are awesome! I made some accuracy tests with https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-portuguese model (sepinf-inc/IPED#1214 (comment)) and it is comparable to Microsoft's and Google's pt-BR models, actually a bit better!

Now I'm trying to use a language model as described in the Readme.md. I'm trying to use the same LM from the language_model folder in the HuggingFace model card above, but it prints some warnings in the console:

09/02/2022 12:10:19 - WARNING - pyctcdecode.alphabet - Found entries of length > 1 in alphabet. This is unusual unless style is BPE, but the alphabet was not recognized as BPE type. Is this correct?
09/02/2022 12:10:19 - WARNING - pyctcdecode.alphabet - Unigrams and labels don't seem to agree.

WER accuracy also dropped a lot. Am I doing something wrong? Which language model is compatible with the above Portuguese model?

Thanks in advance
