
huggingface / audio-transformers-course

267 stars · 33 watchers · 80 forks · 4.12 MB

The Hugging Face Course on Transformers for Audio

License: Apache License 2.0

Languages: MDX 99.24%, Python 0.75%, Makefile 0.01%
Topics: audio, deep-learning, hacktoberfest, transformers

audio-transformers-course's Introduction

The Audio Transformers Course

This repo contains the content that's used to create Hugging Face's Audio Transformers Course. The course teaches you about applying Transformers to various tasks in audio and speech processing. It's completely free and open-source!

🌎 Languages and translations

Language Source Authors
Bengali chapters/bn
English chapters/en
Spanish chapters/es
French chapters/fr
Korean chapters/ko
Russian chapters/ru @blademoon, @Lightmourne
Turkish chapters/tr
Chinese (simplified) chapters/zh-CN

Translating the course into your language

As part of our mission to democratise machine learning, we'd love to have the course available in many more languages! Please follow the steps below if you'd like to help translate the course into your language 🙏.

๐Ÿ—ž๏ธ Open an issue

To get started, navigate to the Issues page of this repo and check if anyone else has opened an issue for your language. If not, open a new issue by selecting the Translation template from the New issue button.

Once an issue is created, post a comment to indicate which chapters you'd like to work on and we'll add your name to the list.

🗣 Join our Discord

Since it can be difficult to discuss translation details quickly over GitHub issues, we have created dedicated channels for each language on our Discord server. Join here 👉: http://hf.co/join/discord

๐Ÿด Fork the repository

Next, you'll need to fork this repo. You can do this by clicking on the Fork button on the top-right corner of this repo's page.

Once you've forked the repo, you'll want to get the files on your local machine for editing. You can do that by cloning the fork with Git as follows:

git clone https://github.com/YOUR-USERNAME/audio-transformers-course

📋 Copy-paste the English files with a new language code

The course files are organised under a main directory:

  • chapters: all the text and code snippets associated with the course.

You'll only need to copy the files in the chapters/en directory, so first navigate to your fork of the repo and run the following:

cd ~/path/to/audio-transformers-course
cp -r chapters/en/CHAPTER-NUMBER chapters/LANG-ID/CHAPTER-NUMBER

Here, CHAPTER-NUMBER refers to the chapter you'd like to work on and LANG-ID should be an ISO 639-1 language code (two lowercase letters) -- see here for a handy table. Alternatively, the {two lowercase letters}-{two uppercase letters} format is also supported, e.g. zh-CN; here's an example.
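
For example, to start a hypothetical French translation of the first unit (adjust the chapter number and language code to your case), you could run:

cd ~/path/to/audio-transformers-course
mkdir -p chapters/fr
cp -r chapters/en/chapter1 chapters/fr/chapter1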

โœ๏ธ Start translating

Now comes the fun part - translating the text! The first thing we recommend is translating the part of the _toctree.yml file that corresponds to your chapter. This file is used to render the table of contents on the website and provide the links to the Colab notebooks. The only fields you should change are the title ones -- for example, here are the parts of _toctree.yml that we'd translate for Chapter 0 of the NLP course:

- title: 0. Setup # Translate this!
  sections:
  - local: chapter0/1 # Do not change this!
    title: Introduction # Translate this!
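
For a hypothetical French translation, the same entry would then look like this (only the title values change):

- title: 0. Configuration du cours # Translated
  sections:
  - local: chapter0/1 # Unchanged
    title: Introduction # Unchanged or translated as appropriate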

🚨 Make sure the _toctree.yml file only contains the sections that have been translated! Otherwise you won't be able to build the content on the website or locally (see below for how).

Once you have translated the _toctree.yml file, you can start translating the MDX files associated with your chapter.

🙋 If the _toctree.yml file doesn't yet exist for your language, you can simply create one by copy-pasting from the English version and deleting the sections that aren't related to your chapter. Just make sure it exists in the chapters/LANG-ID/ directory!

👷‍♂️ Build the course locally

Once you're happy with your changes, you can preview how they'll look by first installing the doc-builder tool that we use for building all documentation at Hugging Face:

python -m pip install hf-doc-builder
doc-builder preview audio-transformers-course ../audio-transformers-course/chapters/LANG-ID --not_python_module

This will build and render the course on http://localhost:3000/. Although the content looks much nicer on the Hugging Face website, this step will still allow you to check that everything is formatted correctly.

🚀 Submit a pull request

If the translations look good locally, the final step is to prepare the content for a pull request. Here, the first thing to check is that the files are formatted correctly. For that you can run:

pip install -r requirements.txt
make style

Once that's run, commit any changes, open a pull request, and wait for a review. Congratulations, you've now completed your first translation 🥳!

🚨 To build the course on the website, double-check that your language code exists in the languages field of the build_documentation.yml and build_pr_documentation.yml files in the .github folder. If not, just add it in alphabetical order.
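
As a rough sketch of what to look for (the exact layout of the workflow files may differ; this is an assumption, not the verbatim file), the relevant field looks something like:

with:
  languages: en es fr ko ru zh-CN # add your language code here, keeping alphabetical order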

📔 Jupyter notebooks

The Jupyter notebooks containing all the code from the course are hosted on the huggingface/notebooks repo. If you wish to generate them locally, first install the required dependencies:

python -m pip install -r requirements.txt

Then run the following script:

python utils/generate_notebooks.py --output_dir nbs

This script extracts all the code snippets from the chapters and stores them as notebooks in the nbs folder (which is ignored by Git by default).

โœ๏ธ Contributing a new chapter

Note: we are not currently accepting community contributions for new chapters. These instructions are for the Hugging Face authors.

Adding a new chapter to the course is quite simple:

  1. Create a new directory under chapters/en/chapterX, where chapterX is the chapter you'd like to add.
  2. Add numbered MDX files sectionX.mdx for each section.
  3. Update the _toctree.yml file to include your chapter sections -- this information will render the table of contents on the website. If your section involves both the PyTorch and TensorFlow APIs of transformers, make sure you include links to both Colabs in the colab field.

If you get stuck, check out one of the existing chapters -- this will often show you the expected syntax.

Once you are happy with the content, open a pull request and wait for a review. We recommend adding the first chapter draft as a single pull request -- the team will then provide feedback internally to iterate on the content 🤗!

🙌 Acknowledgements

The structure of this repo and README are inspired by the wonderful Advanced NLP with spaCy course.

audio-transformers-course's People

Contributors

agercas, bharatr21, blademoon, crcdng, dame-cell, fisheggg, gabrielwithappy, hollance, jinnsp, jjyaoao, kaleo996, lbourdois, lewtun, lightmourne, merveenoyan, mhrdyn7, mishig25, mkhalusova, osamja, practice-dump, proshian, ptah23, ritog, rtrompier, sanchit-gandhi, susnato, vaibhavs10, veluchs, wetdog, ylacombe


audio-transformers-course's Issues

Wrong keyword in "Unit 4: Pre-trained models and datasets for audio classification"

In the section Speech Commands, the code that is supposed to be run is:

classifier = pipeline(
    "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
)
classifier(sample["audio"])

But it leads to the following error:

ValueError: We expect a numpy ndarray as input
Full Error Output
ValueError                                Traceback (most recent call last)

<ipython-input-8-a13009e6d325> in <cell line: 4>()
      2     "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
      3 )
----> 4 classifier(sample["audio"])

3 frames

/usr/local/lib/python3.10/dist-packages/transformers/pipelines/audio_classification.py in __call__(self, inputs, **kwargs)
    128             - **score** (`float`) -- The corresponding probability.
    129         """
--> 130         return super().__call__(inputs, **kwargs)
    131 
    132     def _sanitize_parameters(self, top_k=None, **kwargs):

/usr/local/lib/python3.10/dist-packages/transformers/pipelines/base.py in __call__(self, inputs, num_workers, batch_size, *args, **kwargs)
   1118             )
   1119         else:
-> 1120             return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
   1121 
   1122     def run_multi(self, inputs, preprocess_params, forward_params, postprocess_params):

/usr/local/lib/python3.10/dist-packages/transformers/pipelines/base.py in run_single(self, inputs, preprocess_params, forward_params, postprocess_params)
   1124 
   1125     def run_single(self, inputs, preprocess_params, forward_params, postprocess_params):
-> 1126         model_inputs = self.preprocess(inputs, **preprocess_params)
   1127         model_outputs = self.forward(model_inputs, **forward_params)
   1128         outputs = self.postprocess(model_outputs, **postprocess_params)

/usr/local/lib/python3.10/dist-packages/transformers/pipelines/audio_classification.py in preprocess(self, inputs)
    153 
    154         if not isinstance(inputs, np.ndarray):
--> 155             raise ValueError("We expect a numpy ndarray as input")
    156         if len(inputs.shape) != 1:
    157             raise ValueError("We expect a single channel audio input for AutomaticSpeechRecognitionPipeline")

ValueError: We expect a numpy ndarray as input
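
A likely fix (also suggested in the "fix a typo" issue further down) is to pass the raw waveform array instead of the whole audio dictionary:

from transformers import pipeline

classifier = pipeline(
    "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
)
# `sample` is an example from the Speech Commands dataset loaded earlier in the unit
classifier(sample["audio"]["array"])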

Happy New Year greetings and request for new topics in the course

Good day @sanchit-gandhi, @MKhalusova, and the whole course team! Happy New Year 2024!

First of all, I would like to thank you and the entire course team once again for the work done. The course turned out to be very good and allowed many of us to immerse ourselves in the topic of working with sound; some of us even managed to find a job thanks to your course. We especially liked the practical orientation of the course and the clear, quite accessible presentation of the theoretical material. On behalf of all the members of the course translation team - "Thank you so much for your work!".

Over the past two months, people who have taken the course in Russian have been leaving feedback on topics they would like to see covered in the course. Together with Sergey (@Lightmourne), we systematized this feedback, which eventually became our request in this issue. If possible, could you further address the following topics in the course?

List of topics:

  1. Audio data preparation (broadly defined).
  2. Finding partial duplicates (duplicates created by time-shifting the audio) and full duplicate audio (filtering the dataset before training classification models); a common case is filtering datasets of 1-second clips.
  3. Increasing the volume of audio data.
  4. Determining when class balancing is needed for different tasks and models in the audio domain, with examples of class balancing methods and techniques for audio data.

Translation to BENGALI

Hi there ๐Ÿ‘‹

Let's translate the course to BENGALI so that the whole community can benefit from this resource ๐ŸŒŽ!

Chapters

I would like to translate Chapters 0 and 1

Translation to Japanese

Hi there ๐Ÿ‘‹

Let's translate the course to Japanese so that the whole community can benefit from this resource ๐ŸŒŽ!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

๐Ÿ™‹ If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using ๐Ÿค— Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The ๐Ÿค— Datasets library

6 - The ๐Ÿค— Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Unusual ImportError (Chapter 4, Fine-Tuning)

An import error is raised when TrainingArguments() is called.
[ImportError: Using the Trainer with PyTorch requires accelerate>=0.20.1: Please run pip install transformers[torch] or pip install accelerate -U]
However, this issue continues even if the latest version of Accelerate (0.21.0) is installed.
@sanchit-gandhi You can have a look at this notebook.
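
A workaround that often helps on Colab (an assumption based on the error message, not a confirmed fix): upgrade the packages, then restart the runtime before re-importing transformers, since the accelerate version is checked at import time.

pip install -U "transformers[torch]" accelerate
# Then restart the runtime/kernel and re-run the imports and the TrainingArguments cell.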

PolyAI/minds14 not available

Seems like the PolyAI/minds14 dataset isn't available. When loading it from code with load_dataset("PolyAI/minds14", "en-US"), I get the following error: ConnectionError: Couldn't reach https://www.dropbox.com/s/e2us0hcs3ilr20e/MInDS-14.zip?dl=1 (error 429). After following the link, Dropbox says:

Link temporarily disabled
This can happen when the link has been shared or downloaded too many times in a day. Check back later and weโ€™ll open access to more people.

It'd be great if this could be fixed, or at least replaced with another dataset which is available so that people can finish the course (which is great btw). ๐Ÿค—

The code was run on Colab Pro+ environment with A100 GPU, and the following is the entire traceback:
---------------------------------------------------------------------------
ConnectionError Traceback (most recent call last)
in <cell line: 5>()
3 minds = DatasetDict()
4
----> 5 minds["train"] = load_dataset(
6 "PolyAI/minds14", "en-US",
7 )

10 frames
/usr/local/lib/python3.10/dist-packages/datasets/load.py in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, token, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
2151
2152 # Download and prepare data
-> 2153 builder_instance.download_and_prepare(
2154 download_config=download_config,
2155 download_mode=download_mode,

/usr/local/lib/python3.10/dist-packages/datasets/builder.py in download_and_prepare(self, output_dir, download_config, download_mode, verification_mode, ignore_verifications, try_from_hf_gcs, dl_manager, base_path, use_auth_token, file_format, max_shard_size, num_proc, storage_options, **download_and_prepare_kwargs)
952 if num_proc is not None:
953 prepare_split_kwargs["num_proc"] = num_proc
--> 954 self._download_and_prepare(
955 dl_manager=dl_manager,
956 verification_mode=verification_mode,

/usr/local/lib/python3.10/dist-packages/datasets/builder.py in _download_and_prepare(self, dl_manager, verification_mode, **prepare_splits_kwargs)
1715
1716 def _download_and_prepare(self, dl_manager, verification_mode, **prepare_splits_kwargs):
-> 1717 super()._download_and_prepare(
1718 dl_manager,
1719 verification_mode,

/usr/local/lib/python3.10/dist-packages/datasets/builder.py in _download_and_prepare(self, dl_manager, verification_mode, **prepare_split_kwargs)
1025 split_dict = SplitDict(dataset_name=self.dataset_name)
1026 split_generators_kwargs = self._make_split_generators_kwargs(prepare_split_kwargs)
-> 1027 split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
1028
1029 # Checksums verification

~/.cache/huggingface/modules/datasets_modules/datasets/PolyAI--minds14/65c7e0f3be79e18a6ffaf879a083daf706312d421ac90d25718459cbf3c42696/minds14.py in _split_generators(self, dl_manager)
130 )
131
--> 132 archive_path = dl_manager.download_and_extract(self.config.data_url)
133 audio_path = dl_manager.extract(
134 os.path.join(archive_path, "MInDS-14", "audio.zip")

/usr/local/lib/python3.10/dist-packages/datasets/download/download_manager.py in download_and_extract(self, url_or_urls)
563 extracted_path(s): str, extracted paths of given URL(s).
564 """
--> 565 return self.extract(self.download(url_or_urls))
566
567 def get_recorded_sizes_checksums(self):

/usr/local/lib/python3.10/dist-packages/datasets/download/download_manager.py in download(self, url_or_urls)
426
427 start_time = datetime.now()
--> 428 downloaded_path_or_paths = map_nested(
429 download_func,
430 url_or_urls,

/usr/local/lib/python3.10/dist-packages/datasets/utils/py_utils.py in map_nested(function, data_struct, dict_only, map_list, map_tuple, map_numpy, num_proc, parallel_min_length, types, disable_tqdm, desc)
454 # Singleton
455 if not isinstance(data_struct, dict) and not isinstance(data_struct, types):
--> 456 return function(data_struct)
457
458 disable_tqdm = disable_tqdm or not logging.is_progress_bar_enabled()

/usr/local/lib/python3.10/dist-packages/datasets/download/download_manager.py in _download(self, url_or_filename, download_config)
452 # append the relative path to the base_path
453 url_or_filename = url_or_path_join(self._base_path, url_or_filename)
--> 454 return cached_path(url_or_filename, download_config=download_config)
455
456 def iter_archive(self, path_or_buf: Union[str, io.BufferedReader]):

/usr/local/lib/python3.10/dist-packages/datasets/utils/file_utils.py in cached_path(url_or_filename, download_config, **download_kwargs)
180 if is_remote_url(url_or_filename):
181 # URL, so get it from the cache (downloading if necessary)
--> 182 output_path = get_from_cache(
183 url_or_filename,
184 cache_dir=cache_dir,

/usr/local/lib/python3.10/dist-packages/datasets/utils/file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, local_files_only, use_etag, max_retries, token, use_auth_token, ignore_url_params, storage_options, download_desc)
599 raise ConnectionError(f"Couldn't reach {url} ({repr(head_error)})")
600 elif response is not None:
--> 601 raise ConnectionError(f"Couldn't reach {url} (error {response.status_code})")
602 else:
603 raise ConnectionError(f"Couldn't reach {url}")

ConnectionError: Couldn't reach https://www.dropbox.com/s/e2us0hcs3ilr20e/MInDS-14.zip?dl=1 (error 429)

Error while using TrainingArguments in Unit4

I'm getting the following error while creating an instance of TrainingArguments in Unit-4:

ImportError: Using the Trainer with PyTorch requires accelerate>=0.21.0: Please run pip install transformers[torch] or pip install accelerate -U

I've tried both pip install transformers[torch] and pip install accelerate -U, but the same error still pops up.

There is no issue importing TrainingArguments; the error only appears when creating an instance.

Gtzan Split Unit 4

In chapters/en/chapter4/fine-tuning.mdx, the following snippet is presented for loading the GTZAN dataset:

from datasets import load_dataset

gtzan = load_dataset("marsyas/gtzan", "all")
gtzan

This returns a DatasetDict object, not a Dataset object, which causes the next snippet to fail:

gtzan = gtzan.train_test_split(seed=42, shuffle=True, test_size=0.1)
gtzan

When I run these together as is I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-0475a19d0be5> in <cell line: 1>()
----> 1 gtzan = gtzan.train_test_split(seed=42, shuffle=True, test_size=0.1)
      2 gtzan

AttributeError: 'DatasetDict' object has no attribute 'train_test_split'

I can bypass this by pointing the train_test_split function to the "train" split within the original DatasetDict object returned by the load_dataset function:

gtzan = gtzan["train"].train_test_split(seed=42, shuffle=True, test_size=0.1)
gtzan

Output:

DatasetDict({
    train: Dataset({
        features: ['file', 'audio', 'genre'],
        num_rows: 899
    })
    test: Dataset({
        features: ['file', 'audio', 'genre'],
        num_rows: 100
    })
})

I recommend updating the second code snippet to call train_test_split on the "train" split, unless there is a way to get load_dataset to return the Dataset object itself - I'm not even sure what the "all" flag refers to there. I can make this change, but I was instructed on the Discord server to file an issue.

Translation to Vietnamese

Hi there ๐Ÿ‘‹

Let's translate the course to Vietnamese so that the whole community can benefit from this resource ๐ŸŒŽ!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

๐Ÿ™‹ If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using ๐Ÿค— Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The ๐Ÿค— Datasets library

6 - The ๐Ÿค— Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

In chapter 4, had to add another next() call before the example started working

Original:

from IPython.display import Audio

classifier(sample["audio"].copy())
Audio(sample["audio"]["array"], rate=sample["audio"]["sampling_rate"])

Modified working:

from IPython.display import Audio

sample = next(iter(speech_commands))
classifier(sample["audio"].copy())
Audio(sample["audio"]["array"], rate=sample["audio"]["sampling_rate"])

Self evaluation doesn't work for my model card

Seems like there's a problem with the code for self-evaluation. I've completed the Unit 4 hands-on task, but self-evaluation doesn't count it as successful. I took a look at the code and it seems like my model card reports the accuracy as eval_accuracy instead of Accuracy. Am I doing something wrong, or has the model card pattern changed while the self-evaluation script is not yet updated?

Translation to Simplified Chinese

Hi there ๐Ÿ‘‹

Let's translate the course to Simplified Chinese so that the whole community can benefit from this resource ๐ŸŒŽ!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

๐Ÿ™‹ If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

Unit 0: Welcome to the course!

Unit 1: Working with audio data

Unit 2: A gentle introduction to audio applications

Unit 3: Transformer architectures for audio

Unit 4: Build a music genre classifier

Unit 5: Automatic speech recognition

Unit 6: From text to speech

Unit 7: Pulling it all together

Unit 8: Finish line

Course Events

doc-builder error

Hello!

I translated chapter2 into Russian. After finishing the translation, I wanted to test the correct display of the content of this chapter. For this purpose I used doc-builder. I ran into a problem because the files asr_pipeline.mdx and audio_classification_pipeline.mdx are not displayed in the browser at http://localhost:3000/. Instead I get the following message:

Error while preprocessing /tmp/tmpalt8krht/kit/src/routes/chapter2/asr_pipeline.mdx - btoa is not defined
ReferenceError: Error while preprocessing /tmp/tmpalt8krht/kit/src/routes/chapter2/asr_pipeline.mdx - btoa is not defined
    at base64 (file:///tmp/tmpalt8krht/kit/preprocess.js:435:28)
    at highlighter (file:///tmp/tmpalt8krht/kit/preprocess.js:488:13)
    at /tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:25745:25
    at Array.map (<anonymous>)
    at /tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:25743:11
    at wrapped (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:206:19)
    at next (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:299:28)
    at done (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:236:16)
    at then (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:243:5)
    at wrapped (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:226:9)

I would like to note that the files introduction.mdx and hands_on.mdx are displayed correctly. These files do not contain blocks with python code.
How can I solve this problem?

I attach a screenshot of one of the correctly displayed pages.
[Screenshot from 2023-08-02 13-51-31]

Marvin

When doing "putting it all together" I ran into a problem because when I ran the wake up code the for loop gets skipped for some reason

Check My Progress doesn't evaluate submission

Hi!

I just finished training a model for the hands-on exercise of Unit 4. Since even DistilHuBERT can take hours to train, I used the PEFT library with LoRA to fine-tune the model. When finished, I pushed my model to the Hugging Face Hub using the command provided in the course, slightly modified to get rid of an error I was getting.
The provided command was:
trainer.push_to_hub(**kwargs)
I used:
trainer.model.push_to_hub(**kwargs)
This is the repository for the model: https://huggingface.co/ThreeBlessings/distilhubert-finetuned-gtzan.
However when I try to check my progress with the Check My Progress space, it doesn't show any evaluation results for my model. The progress table stays with default values.

I thought maybe it has something to do with me pushing LoRA model, not the full version. So I merged the LoRA weights with the base model weights and pushed it to huggingface hub, resulting in this repository: https://huggingface.co/ThreeBlessings/distilhubert-finetuned-gtzan-merged.
The Check My Progress space still didn't update after 24 hours.

I never modified the kwargs that were provided in the course.
What am I doing wrong? Is there something I'm missing here?

Wrong output on chapter4/fine-tuning

Hi 👋,
I think the output should be:

DatasetDict({
    train: Dataset({
        features: ['genre', 'input_values', 'attention_mask'],
        num_rows: 899
    })
    test: Dataset({
        features: ['genre', 'input_values', 'attention_mask'],
        num_rows: 100
    })
})

Instead of the output currently shown in the course for the gtzan_encoded cell:

DatasetDict({
    train: Dataset({
        features: ['genre', 'input_values'],
        num_rows: 899
    })
    test: Dataset({
        features: ['genre', 'input_values'],
        num_rows: 100
    })
})

Since return_attention_mask=True is passed to the feature_extractor, the output should also include the attention_mask. Is this the case?

Wrong import path for EncoderClassifier from SpeechBrain

The current import path for the EncoderClassifier class is incorrect in unit 6 notebook "Fine-tuning SpeechT5".

It points to speechbrain.pretrained instead of the correct path, which is speechbrain.inference.speaker (see: https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb#compute-your-speaker-embeddings).

So changing

from speechbrain.pretrained import EncoderClassifier

to

from speechbrain.inference.speaker import EncoderClassifier

worked for me.

Why are code and code outputs in the same code block in the text?

I am going through the Load and explore an audio dataset section of the tutorial.

And I saw that code and code outputs are contained in the same code block, which I believe will confuse people.


I strongly believe that code and code output should go into different code blocks, or the output could even be inserted as plain text in a monospace font, outside of a code block.

Please make this change.

If you need help, I can create PRs doing this. Let me know once you decide the style.

Translation to Spanish

Hi there ๐Ÿ‘‹

Let's translate the course to Spanish so that the whole community can benefit from this resource ๐ŸŒŽ!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

๐Ÿ™‹ If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

Unit 0: Welcome to the course!

Unit 1: Working with audio data

Unit 2: A gentle introduction to audio applications

Unit 3: Transformer architectures for audio

Unit 4: Build a music genre classifier

Unit 5: Automatic Speech Recognition

Course Events

Spaces not building:runtime error, streams could not be allocated 404

For the Unit 7 task (speech-to-speech translation), I tried duplicating the given Space and got a runtime error (could not allocate streams, 404, hardware not available). I then created a new Space (nimrita/speech-to-speech-translation-MMS1) from scratch, and it too fails to build with the same errors. Please help.

[Kaggle Notebooks] Create Kaggle Notebooks for course units

Hi there ๐Ÿ‘‹

Many course participants faced issues working through the course materials and exercises on the free tier of Google Colab. An alternative is Kaggle Notebooks (it provides a fixed amount of GPU hours but offers a consistent experience).
However, there are some differences in working with Kaggle Notebooks, such as pushing models to the Hub and setting up your environment.

How can you help?

  • Write a short tutorial illustrating the extra steps needed to run the course examples and exercises successfully in Kaggle Notebooks.
  • Once done, tag @MKhalusova and @Vaibhavs10 in the comments. We can review and suggest changes if required.

Thank you for deciding to volunteer your time and experience with the course.

Microphone Access

The first part includes an iframe, but microphone access isn't allowed for some reason (I tried in Firefox, Chrome and Firefox mobile).
After about two hours of trying to fix browser permissions, I finally solved this by opening the frame in a new tab; the browser permission dialog then appeared and I allowed it. However, I guess this will be confusing for other people, so I suggest adding a button to open the demo in a new tab, or some notice about this.

(Sorry if it's not the correct repo to report this. I'm new to huggingface ๐Ÿ™‚)

No such file or directory in chapter1-preprocessing when trying to calculate durations

Hey

I am not sure what the expected behaviour is and whether it's my mistake, an error in the course, or an issue with the dataset used, but I noticed the following in Chapter 1 - Preprocessing:

When I follow the course and try to execute

# use librosa to get example's duration from the audio file
new_column = [librosa.get_duration(path=x) for x in minds["path"]]

it will fail because the path (x in the code snippet) looks something like /storage/hf-datasets-cache/all/datasets/27907695716030-config-parquet-and-info-PolyAI-minds14-941a5af2/downloads/extracted/a87e442545495cdb67dfdcbc9d4f35d234c9f8e471449b2db58d7c81b62f001a/en-AU~PAY_BILL/response_4.wav, which is exactly the content provided by the unmodified dataset (as can be seen on the dataset's page) but does not exist on my machine.

Do I use the load_dataset function in a wrong way? Do I have to specify a path to explicitly save or cache the data somewhere? Is there a way that will automatically replace the 'path' value in the dataset with the local path on my machine?

Alternatively, one could change the function call of librosa.get_duration(path=x) and pass the audio array and the sampling_rate instead, e.g.

new_column = [librosa.get_duration(y=x["array"], sr=x["sampling_rate"]) for x in minds["audio"]]

Error in DataCollatorSpeechSeq2SeqWithPadding (Unit 5)

In the unit 5 of the audio course, the following code is used:

import torch
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    processor: Any

    def __call__(
        self, features: List[Dict[str, Union[List[int], torch.Tensor]]]
    ) -> Dict[str, torch.Tensor]:
        # split inputs and labels since they have to be of different lengths and need different padding methods
        # first treat the audio inputs by simply returning torch tensors
        input_features = [
            {"input_features": feature["input_features"][0]} for feature in features
        ]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")

        # get the tokenized label sequences
        label_features = [{"input_ids": feature["labels"]} for feature in features]
        # pad the labels to max length
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")

        # replace padding with -100 to ignore loss correctly
        labels = labels_batch["input_ids"].masked_fill(
            labels_batch.attention_mask.ne(1), -100
        )

        # if bos token is appended in previous tokenization step,
        # cut bos token here as it's append later anyways
        if (labels[:, 0] == self.processor.tokenizer.bos_token_id).all().cpu().item():
            labels = labels[:, 1:]

        batch["labels"] = labels

        return batch

However, according to the following issue, bos_token_id shouldn't be used (@ArthurZucker). In my opinion, this should be replaced with self.processor.tokenizer.convert_tokens_to_ids("<|startoftranscript|>") or with model.config.decoder_start_token_id. What do you think?
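
A minimal sketch of that suggested change (assuming the WhisperProcessor used in the unit) would be:

# Inside DataCollatorSpeechSeq2SeqWithPadding.__call__, compare against the decoder
# start token instead of bos_token_id:
decoder_start_token_id = self.processor.tokenizer.convert_tokens_to_ids(
    "<|startoftranscript|>"
)  # alternatively: model.config.decoder_start_token_id
if (labels[:, 0] == decoder_start_token_id).all().cpu().item():
    labels = labels[:, 1:]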

Note if this is true, then there would be a similar error in @sanchit-gandhi's fine-tuning tutorial too.

Thanks for your attention.

Regards,
Tony

Translation to Portuguese-Brazil

Hi there ๐Ÿ‘‹

Let's translate the course to pt-BR so that the whole community can benefit from this resource ๐ŸŒŽ!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

๐Ÿ™‹ If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters


Unit 0 - Welcome to the course!

Unit 1 - Working with audio data

Unit 2 - A gentle introduction to audio applications

Unit 3 - Transformer architectures for audio

Unit 4 - Build a music genre classifier

Unit 5 - Automatic Speech Recognition

  • introduction.mdx (https://github.com/rrg92/audio-transformers-course/blob/main/chapters/pt-BR/chapter5/introduction.mdx)

Unit 6 - From text to speech

Unit 7 - Putting it all together

Unit 8 - Finish line

Course Events


Translation to Russian

Hi there ๐Ÿ‘‹

Let's translate the course to Russian so that the whole community can benefit from this resource ๐ŸŒŽ!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

๐Ÿ™‹ If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

UNIT 0. WELCOME TO THE COURSE!

UNIT 1. WORKING WITH AUDIO DATA

UNIT 2. A GENTLE INTRODUCTION TO AUDIO APPLICATIONS

UNIT 3. TRANSFORMER ARCHITECTURES FOR AUDIO

UNIT 4. BUILD A MUSIC GENRE CLASSIFIER

UNIT 5. AUTOMATIC SPEECH RECOGNITION

UNIT 6. From text to speech

UNIT 7. Putting it all together

UNIT 8. Finish line

Course Events

Adding extra material:

small issue: OSError: Using the --html flag requires node v14 to be installed, but it was not found in your system.

When running this command:

doc-builder preview audio-transformers-course ../audio-transformers-course/chapters/en --not_python_module

we got this:

Traceback (most recent call last):
File "/home/silvacarl/.local/lib/python3.8/site-packages/doc_builder/commands/build.py", line 31, in check_node_is_available
p = subprocess.run(
File "/usr/lib/python3.8/subprocess.py", line 493, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.8/subprocess.py", line 858, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'node'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/silvacarl/.local/bin/doc-builder", line 8, in
sys.exit(main())
File "/home/silvacarl/.local/lib/python3.8/site-packages/doc_builder/commands/doc_builder_cli.py", line 47, in main
args.func(args)
File "/home/silvacarl/.local/lib/python3.8/site-packages/doc_builder/commands/preview.py", line 158, in preview_command
check_node_is_available()
File "/home/silvacarl/.local/lib/python3.8/site-packages/doc_builder/commands/build.py", line 40, in check_node_is_available
raise EnvironmentError(
OSError: Using the --html flag requires node v14 to be installed, but it was not found in your system.

any ideas?
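
For what it's worth, the underlying error is a PermissionError when launching node, so one thing to try (an assumption, not a confirmed fix) is making sure a recent Node.js installation is available on your PATH, for example via nvm:

# Install nvm, then a recent Node.js, and retry the preview command
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
# restart your shell (or `source ~/.nvm/nvm.sh`) so nvm is available
nvm install --lts
doc-builder preview audio-transformers-course ../audio-transformers-course/chapters/en --not_python_module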

Translation to Korean

Hi there ๐Ÿ‘‹

Let's translate the course to Korean so that the whole community can benefit from this resource ๐ŸŒŽ!

Below are the chapters and files that need translating - let us know here if you'd like to translate any. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

๐Ÿ™‹ If you'd like others to help you with the translation, you can also post in our forums.

This PR template from @gabrielwithappy might help.

Chapters

Unit 0: Welcome to the course!

Unit 1: Working with audio data

Unit 2: A gentle introduction to audio applications

Unit 3: Transformer architectures for audio

Unit 4: Build a music genre classifier

Course Events

Missing code in chapter 4

This line should be changed to the line below

# original
gtzan = load_dataset("marsyas/gtzan", "all")
# change to
gtzan = load_dataset("marsyas/gtzan", "all").get('train')

doc builder 0.5.0

This week, I embarked on creating a translation for the audio course project into Brazilian Portuguese. Following the instructions outlined in the project's README, I installed the latest version of doc-builder. However, during my initial usage of the doc-builder preview command, I encountered a couple of significant issues.

Firstly, only the Table of Contents (TOC) was rendered correctly, and any attempt to click on the links within the TOC resulted in a 404 error. This was quite puzzling and hindered the progress of my work. Additionally, I noticed that the preview was running on port 5173, which deviated from the port 3000 specified in the README documentation.

To troubleshoot this issue, I took the step of cloning the doc-builder project and switching to version 0.4.0 before installing it. Remarkably, after this adjustment, the doc-builder preview functioned as expected. The links in the TOC now correctly loaded the corresponding chapters.

This experience led me to surmise that the audio course project might not yet be fully compatible with the recently released version 0.5 of doc-builder. I am curious if others have encountered similar issues and can confirm my observations. If this incompatibility is indeed the case, it might be prudent to update the README.md of the audio course project to reflect the current situation until there is support for doc-builder version 0.5.

It's important to note that merely installing doc-builder version 0.4.0 via pip (using pip install doc-builder==0.4.0) did not resolve the issue due to an additional problem with doc-builder itself. I have already submitted a pull request to address this specific problem, which can be reviewed here: huggingface/doc-builder#489. This PR aims to rectify the underlying issue and ensure smoother operation for those using version 0.4.0 of doc-builder in their projects.

Translation to Italian

Hi there ๐Ÿ‘‹

Let's translate the course to Italian so that the whole community can benefit from this resource ๐ŸŒŽ!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

๐Ÿ™‹ If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

(The task list needs to be fixed; this will be done as the work progresses)

Chapters

Unit 0. Welcome to the course!

Unit 1. Working with audio data

2 - Using ๐Ÿค— Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The ๐Ÿค— Datasets library

6 - The ๐Ÿค— Tokenizers library

7 - Main NLP tasks

Unit 8. Finish Line

Events

fix a typo

In Unit 4: Pretrained models for audio classification, the course says:
We'll load an official Audio Spectrogram Transformer checkpoint fine-tuned on the Speech Commands dataset, under the namespace "MIT/ast-finetuned-speech-commands-v2":

classifier = pipeline(
    "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
)
classifier(sample["audio"])

This should be fixed to classifier(sample["audio"]["array"]).
I don't know how to make a pull request yet! :)
