huggingface / course

The Hugging Face course on Transformers

Home Page: https://huggingface.co/course

License: Apache License 2.0

Python 0.20% Makefile 0.01% MDX 99.80%
deep-learning nlp transformers hacktoberfest

course's Introduction

The Hugging Face Course

This repo contains the content that's used to create the Hugging Face course. The course teaches you about applying Transformers to various tasks in natural language processing and beyond. Along the way, you'll learn how to use the Hugging Face ecosystem (🤗 Transformers, 🤗 Datasets, 🤗 Tokenizers, and 🤗 Accelerate) as well as the Hugging Face Hub. It's completely free and open-source!

🌎 Languages and translations

| Language | Source | Authors |
|:---|:---|:---|
| English | chapters/en | @sgugger, @lewtun, @LysandreJik, @Rocketknight1, @sashavor, @osanseviero, @SaulLu, @lvwerra |
| Bengali (WIP) | chapters/bn | @avishek-018, @eNipu |
| German (WIP) | chapters/de | @JesperDramsch, @MarcusFra, @fabridamicelli |
| Spanish (WIP) | chapters/es | @camartinezbu, @munozariasjm, @fordaz |
| Persian (WIP) | chapters/fa | @jowharshamshiri, @schoobani |
| French | chapters/fr | @lbourdois, @ChainYo, @melaniedrevet, @abdouaziz |
| Gujarati (WIP) | chapters/gu | @pandyaved98 |
| Hebrew (WIP) | chapters/he | @omer-dor |
| Hindi (WIP) | chapters/hi | @pandyaved98 |
| Bahasa Indonesia (WIP) | chapters/id | @gstdl |
| Italian (WIP) | chapters/it | @CaterinaBi, @ClonedOne, @Nolanogenn, @EdAbati, @gdacciaro |
| Japanese (WIP) | chapters/ja | @hiromu166, @younesbelkada, @HiromuHota |
| Korean (WIP) | chapters/ko | @Doohae, @wonhyeongseo, @dlfrnaos19, @nsbg |
| Portuguese (WIP) | chapters/pt | @johnnv1, @victorescosta, @LincolnVS |
| Russian (WIP) | chapters/ru | @pdumin, @svv73, @blademoon |
| Thai (WIP) | chapters/th | @peeraponw, @a-krirk, @jomariya23156, @ckingkan |
| Turkish (WIP) | chapters/tr | @tanersekmen, @mertbozkir, @ftarlaci, @akkasayaz |
| Vietnamese | chapters/vi | @honghanhh |
| Chinese (simplified) | chapters/zh-CN | @zhlhyx, @petrichor1122, @1375626371 |
| Chinese (traditional) (WIP) | chapters/zh-TW | @davidpeng86 |

Translating the course into your language

As part of our mission to democratise machine learning, we'd love to have the course available in many more languages! Please follow the steps below if you'd like to help translate the course into your language 🙏.

🗞️ Open an issue

To get started, navigate to the Issues page of this repo and check if anyone else has opened an issue for your language. If not, open a new issue by selecting the Translation template from the New issue button.

Once an issue is created, post a comment to indicate which chapters you'd like to work on and we'll add your name to the list.

🗣 Join our Discord

Since it can be difficult to discuss translation details quickly over GitHub issues, we have created dedicated channels for each language on our Discord server. If you'd like to join, follow the instructions in this channel 👉: https://discord.gg/JfAtkvEtRb

🍴 Fork the repository

Next, you'll need to fork this repo. You can do this by clicking on the Fork button on the top-right corner of this repo's page.

Once you've forked the repo, you'll want to get the files on your local machine for editing. You can do that by cloning the fork with Git as follows:

git clone https://github.com/YOUR-USERNAME/course

📋 Copy-paste the English files with a new language code

The course files are organised under a main directory:

  • chapters: all the text and code snippets associated with the course.

You'll only need to copy the files in the chapters/en directory, so first navigate to your fork of the repo and run the following:

cd ~/path/to/course
cp -r chapters/en/CHAPTER-NUMBER chapters/LANG-ID/CHAPTER-NUMBER

Here, CHAPTER-NUMBER refers to the chapter you'd like to work on and LANG-ID should be one of the ISO 639-1 or ISO 639-2 language codes -- see here for a handy table.
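The copy step can also be scripted. The following is a minimal Python sketch that assumes the directory layout described above; the helper name `copy_chapter` and its arguments are hypothetical and not part of the repo.

```python
import shutil
from pathlib import Path


def copy_chapter(repo_root, lang_id, chapter):
    """Copy chapters/en/<chapter> to chapters/<lang_id>/<chapter>.

    Hypothetical helper that mirrors the `cp -r` command above.
    """
    root = Path(repo_root).expanduser()
    src = root / "chapters" / "en" / chapter
    dst = root / "chapters" / lang_id / chapter
    if not src.is_dir():
        raise FileNotFoundError(f"No English chapter at {src}")
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; refusing to overwrite")
    # copytree creates the missing parent directory (chapters/<lang_id>) itself
    shutil.copytree(src, dst)
    return dst
```

For example, `copy_chapter("~/path/to/course", "de", "chapter1")` would copy the English chapter 1 into `chapters/de/chapter1`, refusing to run if that directory already exists.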

✍️ Start translating

Now comes the fun part - translating the text! The first thing we recommend is translating the part of the _toctree.yml file that corresponds to your chapter. This file is used to render the table of contents on the website and provide the links to the Colab notebooks. The only fields you should change are the title, ones -- for example, here are the parts of _toctree.yml that we'd translate for Chapter 0:

- title: 0. Setup # Translate this!
  sections:
  - local: chapter0/1 # Do not change this!
    title: Introduction # Translate this!

🚨 Make sure the _toctree.yml file only contains the sections that have been translated! Otherwise you won't be able to build the content on the website or locally (see below how).

Once you have translated the _toctree.yml file, you can start translating the MDX files associated with your chapter.

🙋 If the _toctree.yml file doesn't yet exist for your language, you can simply create one by copy-pasting from the English version and deleting the sections that aren't related to your chapter. Just make sure it exists in the chapters/LANG-ID/ directory!
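Before building, it can help to check that _toctree.yml does not list untranslated sections. The sketch below is one possible way to do that with the standard library only. It assumes that each `local: chapterX/Y` entry corresponds to a file `chapterX/Y.mdx` inside the language directory; the function name `missing_sections` is made up for illustration.

```python
import re
from pathlib import Path


def missing_sections(lang_dir):
    """Return `local` entries in _toctree.yml that have no matching .mdx file.

    Assumption: `local: chapterX/Y` maps to <lang_dir>/chapterX/Y.mdx.
    """
    lang_dir = Path(lang_dir)
    toctree = (lang_dir / "_toctree.yml").read_text(encoding="utf-8")
    # Capture the value after `local:`, stopping before whitespace or a comment
    entries = re.findall(r"^\s*-?\s*local:\s*([^\s#]+)", toctree, flags=re.MULTILINE)
    return [entry for entry in entries if not (lang_dir / f"{entry}.mdx").is_file()]
```

An empty list means every listed section has a file; anything else names the entries to remove from _toctree.yml or to translate.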

👷‍♂️ Build the course locally

Once you're happy with your changes, you can preview how they'll look by first installing the doc-builder tool that we use for building all documentation at Hugging Face:

pip install hf-doc-builder
doc-builder preview course ../course/chapters/LANG-ID --not_python_module

Note: the preview command does not work on Windows.

This will build and render the course on http://localhost:3000/. Although the content looks much nicer on the Hugging Face website, this step will still allow you to check that everything is formatted correctly.

🚀 Submit a pull request

If the translations look good locally, the final step is to prepare the content for a pull request. Here, the first thing to check is that the files are formatted correctly. For that you can run:

pip install -r requirements.txt
make style

Once that's run, commit any changes, open a pull request, and tag @lewtun for a review. Congratulations, you've now completed your first translation 🥳!

🚨 To build the course on the website, double-check that your language code exists in the languages field of the build_documentation.yml and build_pr_documentation.yml files in the .github folder. If not, just add it in alphabetical order.

📔 Jupyter notebooks

The Jupyter notebooks containing all the code from the course are hosted on the huggingface/notebooks repo. If you wish to generate them locally, first install the required dependencies:

python -m pip install -r requirements.txt

Then run the following script:

python utils/generate_notebooks.py --output_dir nbs

This script extracts all the code snippets from the chapters and stores them as notebooks in the nbs folder (which is ignored by Git by default).
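To give a rough idea of what extracting code snippets involves, here is a simplified sketch. It is not the logic of utils/generate_notebooks.py; it only assumes that snippets live in fenced blocks tagged `py` or `python`, and the name `extract_code_blocks` is hypothetical.

```python
import re

FENCE = "`" * 3  # three backticks, spelled out this way to keep the example readable

# Match a fence tagged py/python, then lazily capture everything up to the next fence
CODE_BLOCK = re.compile(FENCE + r"(?:py|python)\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(mdx_text):
    """Return the body of every fenced py/python block found in the text."""
    return [block.rstrip("\n") for block in CODE_BLOCK.findall(mdx_text)]
```

Each returned string could then become one notebook cell; blocks tagged with other languages are ignored in this sketch.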

✍️ Contributing a new chapter

Note: we are not currently accepting community contributions for new chapters. These instructions are for the Hugging Face authors.

Adding a new chapter to the course is quite simple:

  1. Create a new directory under chapters/en/chapterX, where chapterX is the chapter you'd like to add.
  2. Add numbered MDX files sectionX.mdx for each section. If you need to include images, place them in the huggingface-course/documentation-images repository and use the HTML Images Syntax with the path https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/{langY}/{chapterX}/{your-image.png}.
  3. Update the _toctree.yml file to include your chapter sections -- this information will render the table of contents on the website. If your section involves both the PyTorch and TensorFlow APIs of transformers, make sure you include links to both Colabs in the colab field.

If you get stuck, check out one of the existing chapters -- this will often show you the expected syntax.

Once you are happy with the content, open a pull request and tag @lewtun for a review. We recommend adding the first chapter draft as a single pull request -- the team will then provide feedback internally to iterate on the content 🤗!

🙌 Acknowledgements

The structure of this repo and README are inspired by the wonderful Advanced NLP with spaCy course.

course's People

Contributors

a-krirk, abidlabs, akkasayaz, blademoon, bon-qi, caterinabi, dawoodkhan82, enipu, fabridamicelli, haruki-n, icell, johnnv1, jomariya23156, jowharshamshiri, kambizg, lbourdois, lewtun, merveenoyan, mishig25, mkhalusova, omer-dor, osanseviero, pandyaved98, pdumin, sookeyy-12, tyisme614, victorescosta, xianbaoqian, yaoqih, younesbelkada

course's Issues

Tokenization Course Issues

Hello,

I believe the corpus and the word_freqs output used in the BPE / WordPiece implementations have a mismatch: "course" is lowercase in the corpus, but the word_freqs output seems to use the capitalized "Course".

To reproduce

from collections import defaultdict

from transformers import AutoTokenizer

corpus = [
    "This is the Hugging Face course.",
    "This chapter is about tokenization.",
    "This section shows several tokenizer algorithms.",
    "Hopefully, you will be able to understand how they are trained and generate tokens.",
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

word_freqs = defaultdict(int)
for text in corpus:
    words_with_offsets = tokenizer.backend_tokenizer.pre_tokenizer.pre_tokenize_str(text)
    words = [word for word, _ in words_with_offsets]
    for word in words:
        word_freqs[word] += 1

assert word_freqs == defaultdict(int, {'This': 3, 'is': 2, 'the': 1, 'Hugging': 1, 'Face': 1, 'Course': 1, '.': 4, 'chapter': 1, 'about': 1,
    'tokenization': 1, 'section': 1, 'shows': 1, 'several': 1, 'tokenizer': 1, 'algorithms': 1, 'Hopefully': 1,
    ',': 1, 'you': 1, 'will': 1, 'be': 1, 'able': 1, 'to': 1, 'understand': 1, 'how': 1, 'they': 1, 'are': 1,
    'trained': 1, 'and': 1, 'generate': 1, 'tokens': 1})
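The lowercase count can also be checked without downloading a tokenizer. The sketch below replaces the BERT pre-tokenizer with a simple regex (words, or single punctuation marks), which is only an approximation assumed to behave the same on this small corpus; `pre_tokenize` is a stand-in name, not a library function.

```python
import re
from collections import Counter

corpus = [
    "This is the Hugging Face course.",
    "This chapter is about tokenization.",
    "This section shows several tokenizer algorithms.",
    "Hopefully, you will be able to understand how they are trained and generate tokens.",
]


def pre_tokenize(text):
    # Rough stand-in for a BERT-style pre-tokenizer: runs of word characters,
    # or single non-space punctuation marks
    return re.findall(r"\w+|[^\w\s]", text)


word_freqs = Counter(word for text in corpus for word in pre_tokenize(text))

# The corpus only contains the lowercase form
print(word_freqs["course"], word_freqs["Course"])  # 1 0
```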

Translation to Azerbaijani

Hi there 👋

Let's translate the course to Azerbaijani so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translation to Russian

Hi there 👋

Let's translate the course to Russian so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

9 - Building and sharing demos

Events

Subtitles

Translate to Korean

Hi there 👋

Let's translate the course to Korean so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translate to Traditional Chinese (zh-TW)

Hi there 👋

Let's translate the course to Traditional Chinese so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translation to Bengali

Hi there 👋

Let's translate the course to Bengali so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translate to Spanish

Hi there 👋

Let's translate the course to Spanish so that the whole community can benefit from this resource 🌎!

Below are the files and chapters that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translate to Marathi

Hi there 👋

Let's translate the course to Marathi so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

We've also created a marathi-translations channel on the Hugging Face Discord in case you wish to discuss translation details there. Just follow the instructions here 👉 https://discord.gg/hKnxnxUr

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translation to Polish

Hi there 👋

Let's translate the course to Polish so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any, and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translation to Portuguese

Hi there 👋

Let's translate the course to Portuguese so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translate to French

Hi there 👋

Let's translate the course to French so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translate to Hindi

Hi there 👋

Let's translate the course to HINDI so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translation to Dutch

Hi there 👋

Let's translate the course to Dutch so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

A button towards the forum

Hi,

I copy/paste the message from our discussion on Discord @lewtun:

What do you think about adding a button at the beginning of web pages that would send the reader to the forum to ask a question if needed?

image

For example for this image, clicking on the button would take you to https://discuss.huggingface.co/t/chapter-1-questions/6797

I made this button in 10min on Paint to illustrate this idea, it will probably have to be redrawn.

Have a nice day,

Translating the course to Gujarati, spoken by around 50 million people

Hi there 👋

Let's translate the course to Gujarati so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

We've also created a gujarati-translations channel on the Hugging Face Discord in case you wish to discuss translation details there. Just follow the instructions here 👉 https://discord.gg/hKnxnxUr

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translation to Traditional Chinese (zh-TW/zh-HK)

Hi there 👋

Let's translate the course to Traditional Chinese (zh-TW/zh-HK) so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translate to Turkish

Hi there 👋

Let's translate the course to Turkish so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translate to Arabic

Hi there 👋

Let's translate the course to Arabic so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Mistakes in Ch6 The Tokenizer Library

Hi, thanks for your excellent course. Recently, I found two (maybe) mistakes during the learning process.

In Ch 6 - The 🤗 Tokenizers library,

Specifically,

In Grouping Entities section,

import numpy as np

results = []
inputs_with_offsets = tokenizer(example, return_offsets_mapping=True)
tokens = inputs_with_offsets.tokens()
offsets = inputs_with_offsets["offset_mapping"]

idx = 0
while idx < len(predictions):
    pred = predictions[idx]
    label = model.config.id2label[pred]
    if label != "O":
        # Remove the B- or I-
        label = label[2:]
        start, _ = offsets[idx]

        # Grab all the tokens labeled with I-label
        all_scores = []
        while (
            idx < len(predictions)
            and model.config.id2label[predictions[idx]] == f"I-{label}"
        ):
-           all_scores.append(probabilities[idx][pred])
+           all_scores.append(probabilities[idx][predictions[idx]])
            _, end = offsets[idx]
            idx += 1

        # The score is the mean of all the scores of the tokens in that grouped entity
        score = np.mean(all_scores).item()
        word = example[start:end]
        results.append(
            {
                "entity_group": label,
                "score": score,
                "word": word,
                "start": start,
                "end": end,
            }
        )
    idx += 1

print(results)
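The difference between the two indexing choices can be seen on toy data. This is only an illustrative sketch: the label ids, probabilities and the helper name `grouped_score` are made up, and it assumes the tokens in a group may carry different predicted label ids.

```python
def grouped_score(probabilities, predictions, start, end):
    """Mean probability over tokens start..end-1, each read at its own predicted label."""
    scores = [probabilities[i][predictions[i]] for i in range(start, end)]
    return sum(scores) / len(scores)


# Toy data with 3 label ids (0 = "O", 1 = "B-PER", 2 = "I-PER"), two tokens in one entity
probabilities = [
    [0.1, 0.8, 0.1],  # token 0, predicted B-PER
    [0.1, 0.3, 0.6],  # token 1, predicted I-PER
]
predictions = [1, 2]

fixed = grouped_score(probabilities, predictions, 0, 2)
# Reusing the first token's label id for every token reads the wrong column for token 1
stale = sum(probabilities[i][predictions[0]] for i in range(2)) / 2

print(round(fixed, 2), round(stale, 2))  # 0.7 0.55
```

When every token in the group has the same predicted label id, both versions give the same score, so the difference only shows up for mixed groups like this one.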

In Handling long contexts,

candidates = []
for start_probs, end_probs in zip(start_probabilities, end_probabilities):
    scores = start_probs[:, None] * end_probs[None, :]
    idx = torch.triu(scores).argmax().item()

-   start_idx = idx // scores.shape[0]
-   end_idx = idx % scores.shape[0]
+   start_idx = idx // scores.shape[1]
+   end_idx = idx % scores.shape[1]
    score = scores[start_idx, end_idx].item()
    candidates.append((start_idx, end_idx, score))

print(candidates)
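The index arithmetic behind this suggestion can be sketched in plain Python. A flat index taken from a row-major matrix is split into (row, column) using the number of columns, not the number of rows. The helper name `unravel` and the toy matrix are illustrative assumptions.

```python
def unravel(flat_idx, n_cols):
    """Convert a row-major flat index into (row, col) for a matrix with n_cols columns."""
    return divmod(flat_idx, n_cols)


# A 2 x 3 score matrix whose largest entry sits at row 1, col 2
scores = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.9],
]
n_rows, n_cols = len(scores), len(scores[0])
flat = [value for row in scores for value in row]
flat_idx = flat.index(max(flat))  # position of the maximum in the flattened matrix

print(unravel(flat_idx, n_cols))  # (1, 2)
print(unravel(flat_idx, n_rows))  # (2, 1), out of range for a 2-row matrix
```

In the snippet above, start_probs and end_probs come from the same input and so have the same length, which makes scores square; both versions then return the same indices, and the distinction only matters for non-square matrices.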

Translate to Italian

Hi there 👋

Let's translate the course to Italian so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

HI/Chapter1/10.mdx is not loading as Quiz - Shows 500

Hi there 👋

Let's translate the course to YOUR-LANG so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translate to Japanese

Hi there 👋

Let's translate the course to YOUR-LANG so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

We've also created a japanese-translations channel on the Hugging Face Discord in case you wish to discuss translation details there. Just follow the instructions here 👉 https://discord.gg/hKnxnxUr

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Issue with Chapter 2 (Tokenizers/Tokenization) - output is different

Hey Hugging face,

I am opening this issue because I am currently following your course (that is great by the way) and noticed some difference between the output displayed in the tokenization section and the output in my notebook.

On your lesson it looks like this
image

When I am executing the same code in my EC2 instance on AWS I have that
image

with this list ['Using', 'a', 'Trans', '##former', 'network', 'is', 'simple'] .

I presume that the bert-base-cased model could have changed since the course was written, so it's really not a big deal, but it does affect the conclusion here

image

I don't know if the issue is on my side, so here are the details of my current setup if you want to try to reproduce it:

  • AWS ec2 p3.2xlarge
  • environment miniconda Latest
  • dependencies (in the requirements.txt)

requirements.txt

Don't hesitate if you have any questions

Translate to Kannada

Hi there πŸ‘‹

Let's translate the course to KANNADA so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

πŸ™‹ If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translation to Persian

Hi there 👋

Let's translate the course to Persian so that the whole community can benefit from this resource 🌎!

Before you get started please take a minute to read the latest version of our evolving translation guidelines (T). It is important that we maintain a common tone in our collective work, while contributing with our separate creative voices.

We have a glossary page (G) where we store our latest choice of Persian equivalents for words. This page may be subject to change with every PR and its review discussion. If there are changes, we will mention here that the glossary file has been updated. We need to retroactively apply the changes to our sections.

Check here for general instructions on contributing.

Here's the workflow for contributions:

  1. Please fork the Hugging Face course to your profile.
  2. Clone your fork to your local machine.
  3. Use this issue page for general discussion on word choices and whatnot.
  4. Fetch frequently from upstream to your fork and keep your local working tree updated.
  5. It is perfectly fine to link to your fork on this page for discussions.
  6. When the first draft of your page(s) is done, commit it to your fork and open a PR for the page(s) on the Hugging Face course repo.
    (Hugging Face course/main branch <- your fork/main, or whichever branch you use)
  7. Ask someone to help you review the page(s) there. Commit the changes back to your fork and they will automatically be appended to the PR.
  8. If you have updates to the glossary, try to include the stakeholders in the discussion (check the commit history), and when done, mention the changes on this page so we can all apply them retroactively to our sections.
  9. When done with the review, ask @lewtun to merge.

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translation to Hebrew

Hi there 👋

Let's translate the course to Hebrew so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translate to Telugu

Hi there 👋

Let's translate the course to TELUGU so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translate to Malayalam

Hi there 👋

Let's translate the course to MALAYALAM so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translate to German

Hi there 👋

Let's translate the course to German so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translate to Thai

Hi there 👋

Let's translate the course to Thai so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Update EN/Chapter1/3.mdx

Hello!

I am opening this issue to fix the broken image/option in the 3rd topic of Chapter 1 in English.

Crashed

Translate to Gujarati

Translation to Gujarati

Hi there 👋

Let's translate the course to Gujarati so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

@lewtun

Translation to Georgian

Hi there 👋

Let's translate the course to Georgian so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translate to Tamil

Hi there 👋

Let's translate the course to Tamil so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translate to Simplified Chinese (zh-CN)

Hi there 👋

Let's translate the course to Simplified Chinese so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

We've also created a chinese-simplified-translations channel on the Hugging Face Discord in case you wish to discuss translation details there. Just follow the instructions here 👉 https://discord.gg/hKnxnxUr

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events
