Language	Source	Authors
English	`chapters/en`	@sgugger, @lewtun, @LysandreJik, @Rocketknight1, @sashavor, @osanseviero, @SaulLu, @lvwerra
Bengali (WIP)	`chapters/bn`	@avishek-018, @eNipu
German (WIP)	`chapters/de`	@JesperDramsch, @MarcusFra, @fabridamicelli
Spanish (WIP)	`chapters/es`	@camartinezbu, @munozariasjm, @fordaz
Persian (WIP)	`chapters/fa`	@jowharshamshiri, @schoobani
French	`chapters/fr`	@lbourdois, @ChainYo, @melaniedrevet, @abdouaziz
Gujarati (WIP)	`chapters/gu`	@pandyaved98
Hebrew (WIP)	`chapters/he`	@omer-dor
Hindi (WIP)	`chapters/hi`	@pandyaved98
Bahasa Indonesia (WIP)	`chapters/id`	@gstdl
Italian (WIP)	`chapters/it`	@CaterinaBi, @ClonedOne, @Nolanogenn, @EdAbati, @gdacciaro
Japanese (WIP)	`chapters/ja`	@hiromu166, @younesbelkada, @HiromuHota
Korean (WIP)	`chapters/ko`	@Doohae, @wonhyeongseo, @dlfrnaos19, @nsbg
Portuguese (WIP)	`chapters/pt`	@johnnv1, @victorescosta, @LincolnVS
Russian (WIP)	`chapters/ru`	@pdumin, @svv73, @blademoon
Thai (WIP)	`chapters/th`	@peeraponw, @a-krirk, @jomariya23156, @ckingkan
Turkish (WIP)	`chapters/tr`	@tanersekmen, @mertbozkir, @ftarlaci, @akkasayaz
Vietnamese	`chapters/vi`	@honghanhh
Chinese (simplified)	`chapters/zh-CN`	@zhlhyx, @petrichor1122, @1375626371
Chinese (traditional) (WIP)	`chapters/zh-TW`	@davidpeng86

Tokenization Course Issues

Hello,

I believe the corpus and the word_freqs output used in the BPE / WordPiece implementations are mismatched: the corpus contains the lowercase word "course", but word_freqs appears to use the capitalized "Course".

To reproduce

from collections import defaultdict

from transformers import AutoTokenizer

corpus = [
    "This is the Hugging Face course.",
    "This chapter is about tokenization.",
    "This section shows several tokenizer algorithms.",
    "Hopefully, you will be able to understand how they are trained and generate tokens.",
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

word_freqs = defaultdict(int)
for text in corpus:
    words_with_offsets = tokenizer.backend_tokenizer.pre_tokenizer.pre_tokenize_str(text)
    words = [word for word, _ in words_with_offsets]
    for word in words:
        word_freqs[word] += 1

assert word_freqs == defaultdict(int, {'This': 3, 'is': 2, 'the': 1, 'Hugging': 1, 'Face': 1, 'Course': 1, '.': 4, 'chapter': 1, 'about': 1,
    'tokenization': 1, 'section': 1, 'shows': 1, 'several': 1, 'tokenizer': 1, 'algorithms': 1, 'Hopefully': 1,
    ',': 1, 'you': 1, 'will': 1, 'be': 1, 'able': 1, 'to': 1, 'understand': 1, 'how': 1, 'they': 1, 'are': 1,
    'trained': 1, 'and': 1, 'generate': 1, 'tokens': 1})
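
The mismatch can also be checked without downloading a model. The sketch below is a minimal stand-in, not the course's code: it assumes a plain regex split (runs of word characters and single punctuation marks) in place of the bert-base-cased pre-tokenizer, and the names `SPLIT_PATTERN` and `count_words` are hypothetical. On this corpus it counts the lowercase `course`, and `Course` never appears as a key.

```python
import re
from collections import defaultdict

# Assumption: this regex split stands in for the bert-base-cased
# pre-tokenizer used in the report; it is not the real pre-tokenizer.
SPLIT_PATTERN = re.compile(r"\w+|[^\w\s]")

corpus = [
    "This is the Hugging Face course.",
    "This chapter is about tokenization.",
    "This section shows several tokenizer algorithms.",
    "Hopefully, you will be able to understand how they are trained and generate tokens.",
]


def count_words(texts):
    """Count how often each pre-tokenized word appears, keeping case."""
    word_freqs = defaultdict(int)
    for text in texts:
        for word in SPLIT_PATTERN.findall(text):
            word_freqs[word] += 1
    return word_freqs


word_freqs = count_words(corpus)
print(word_freqs["course"])    # count for the lowercase form
print("Course" in word_freqs)  # whether the capitalized form is a key
```

Because the counting is case-sensitive, an expected dictionary keyed on `Course` cannot match counts built from this corpus.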

gif is too small and can't be seen

Hi, and thank you for the awesome course! 😀

Small note: the GIF at
https://huggingface.co/course/chapter4/2?fw=pt

is too small to be seen.

Thanks again,
Eli

Translation to Azerbaijani

Hi there 👋

Let's translate the course to Azerbaijani so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Translation to Russian

Hi there 👋

Let's translate the course to Russian so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx pdumin

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

9 - Building and sharing demos

Events

Subtitles

Translate to Korean

Hi there 👋

Let's translate the course to Korean so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx Doohae

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

Translate to Traditional Chinese (zh-TW)

Hi there 👋

Let's translate the course to Traditional Chinese so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx @davidpeng86

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Translation to Bengali

Hi there 👋

Let's translate the course to Bengali so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx eNipu

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Correction required in Chapter

Chap
The block at the end should say 'Model Output'

Translate to Spanish

Hi there 👋

Let's translate the course to Spanish so that the whole community can benefit from this resource 🌎!

Below are the files and chapters that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Translate to Marathi

Hi there 👋

Let's translate the course to Marathi so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

We've also created a marathi-translations channel on the Hugging Face Discord in case you wish to discuss translation details there. Just follow the instructions here 👉 https://discord.gg/hKnxnxUr

Chapters

0 - Setup

1.mdx

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Translation to Polish

Hi there 👋

Let's translate the course to Polish so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating – let us know here if you'd like to translate any, and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Translation to Portuguese

Hi there 👋

Let's translate the course to Portuguese so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx @victorescosta

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Translate to French

Hi there 👋

Let's translate the course to French so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx @abdouaziz

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Translate to Hindi

Hi there 👋

Let's translate the course to HINDI so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx @pandyaved98

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Update Part III summary

Translation to Dutch

Hi there 👋

Let's translate the course to Dutch so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

A button towards the forum

Hi,

I'm copying my message from our discussion on Discord, @lewtun:

What do you think about adding a button at the top of each web page that sends the reader to the forum to ask a question if needed?

For example, in this image, clicking the button would take you to https://discuss.huggingface.co/t/chapter-1-questions/6797

I made this button in 10 minutes in Paint to illustrate the idea; it will probably have to be redrawn.

Have a nice day,

Translating the course to Gujarati, spoken by around 50 million people

Hi there 👋

Let's translate the course to Gujarati so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

We've also created a gujarati-translations channel on the Hugging Face Discord in case you wish to discuss translation details there. Just follow the instructions here 👉 https://discord.gg/hKnxnxUr

Chapters

0 - Setup

1.mdx

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Translation to Traditional Chinese (zh-TW/zh-HK)

Hi there 👋

Let's translate the course to Traditional Chinese (zh-TW/zh-HK) so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Translate to Turkish

Hi there 👋

Let's translate the course to Turkish so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx @akkasayaz

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Translate to Arabic

Hi there 👋

Let's translate the course to Arabic so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx @giyaseddin

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Mistakes in Ch6 The Tokenizer Library

Hi, thanks for your excellent course. I recently found two possible mistakes while working through it.

In Ch 6 - The 🤗 Tokenizers library,

Specifically,

In the Grouping Entities section,

import numpy as np

results = []
inputs_with_offsets = tokenizer(example, return_offsets_mapping=True)
tokens = inputs_with_offsets.tokens()
offsets = inputs_with_offsets["offset_mapping"]

idx = 0
while idx < len(predictions):
    pred = predictions[idx]
    label = model.config.id2label[pred]
    if label != "O":
        # Remove the B- or I-
        label = label[2:]
        start, _ = offsets[idx]

        # Grab all the tokens labeled with I-label
        all_scores = []
        while (
            idx < len(predictions)
            and model.config.id2label[predictions[idx]] == f"I-{label}"
        ):
-           all_scores.append(probabilities[idx][pred])
+           all_scores.append(probabilities[idx][predictions[idx]])
            _, end = offsets[idx]
            idx += 1

        # The score is the mean of all the scores of the tokens in that grouped entity
        score = np.mean(all_scores).item()
        word = example[start:end]
        results.append(
            {
                "entity_group": label,
                "score": score,
                "word": word,
                "start": start,
                "end": end,
            }
        )
    idx += 1

print(results)
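
To illustrate why the two indexing choices can give different scores, here is a small sketch on made-up numbers. Everything in it is hypothetical (the three-label `id2label` map, the probabilities, and the variable names); it only contrasts reading one fixed label column for every token with reading each token's own predicted label.

```python
from statistics import mean

# Hypothetical toy data: three tokens forming one PER entity.
id2label = {0: "O", 1: "B-PER", 2: "I-PER"}
predictions = [1, 2, 2]  # B-PER, I-PER, I-PER
probabilities = [
    [0.05, 0.90, 0.05],
    [0.10, 0.10, 0.80],
    [0.20, 0.10, 0.70],
]

# Label predicted for the first token of the entity (B-PER here).
pred = predictions[0]

# Variant 1: always read the column of the first token's label.
scores_fixed_column = [probabilities[i][pred] for i in range(len(predictions))]

# Variant 2: read each token's own predicted label.
scores_per_token = [
    probabilities[i][predictions[i]] for i in range(len(predictions))
]

mean_fixed = mean(scores_fixed_column)
mean_per_token = mean(scores_per_token)
print(scores_fixed_column, mean_fixed)
print(scores_per_token, mean_per_token)
```

With the fixed column, the I-PER tokens contribute their B-PER probabilities, which pulls the averaged entity score down even though each token was predicted with high confidence.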

In the Handling long contexts section,

candidates = []
for start_probs, end_probs in zip(start_probabilities, end_probabilities):
    scores = start_probs[:, None] * end_probs[None, :]
    idx = torch.triu(scores).argmax().item()

-   start_idx = idx // scores.shape[0]
-   end_idx = idx % scores.shape[0]
+   start_idx = idx // scores.shape[1]
+   end_idx = idx % scores.shape[1]
    score = scores[start_idx, end_idx].item()
    candidates.append((start_idx, end_idx, score))

print(candidates)
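
The index arithmetic can be checked in plain Python. This is a sketch under stated assumptions, not the course's code: `best_span` is a hypothetical helper, the probabilities are made up, and the two vectors deliberately have different lengths so that rows and columns cannot be confused. A flat index into a row-major matrix is converted back to (row, column) by dividing by the number of columns. When the matrix is square, both divisors give the same result.

```python
def best_span(start_probs, end_probs):
    """Return (start_idx, end_idx, score) maximizing start * end with end >= start."""
    n_rows, n_cols = len(start_probs), len(end_probs)

    # Row-major flattening of the upper-triangular score matrix.
    flat_scores = []
    for i in range(n_rows):
        for j in range(n_cols):
            flat_scores.append(start_probs[i] * end_probs[j] if j >= i else 0.0)

    # Flat argmax, then unravel using the number of columns.
    idx = max(range(len(flat_scores)), key=flat_scores.__getitem__)
    start_idx = idx // n_cols
    end_idx = idx % n_cols
    return start_idx, end_idx, flat_scores[idx]


# Hypothetical, intentionally non-square example (3 starts, 4 ends).
start_probs = [0.1, 0.7, 0.2]
end_probs = [0.05, 0.15, 0.6, 0.2]

start_idx, end_idx, score = best_span(start_probs, end_probs)
print(start_idx, end_idx, score)

# Dividing by the number of rows instead points at a different cell.
flat_idx = start_idx * len(end_probs) + end_idx
print(flat_idx // len(start_probs), flat_idx % len(start_probs))
```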

Translate to Italian

Hi there 👋

Let's translate the course to Italian so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx @CaterinaBi

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

HI/Chapter1/10.mdx is not loading as Quiz - Shows 500

Hi there 👋

Let's translate the course to YOUR-LANG so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Translate to Japanese

Hi there 👋

Let's translate the course to Japanese so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

We've also created a japanese-translations channel on the Hugging Face Discord in case you wish to discuss translation details there. Just follow the instructions here 👉 https://discord.gg/hKnxnxUr

Chapters

0 - Setup

1.mdx @hiromu166

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx @trtd56

Issue with Chapter 2 (Tokenizers/Tokenization) - output is different

Hey Hugging Face,

I am opening this issue because I am currently following your course (which is great, by the way) and noticed a difference between the output displayed in the tokenization section and the output in my notebook.

In your lesson it looks like this

When I execute the same code on my AWS EC2 instance, I get this

with this list: ['Using', 'a', 'Trans', '##former', 'network', 'is', 'simple'].

I presume that the bert-base-cased model may have changed since the course was written, so it is not a big deal, but it does affect the conclusion here.
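
As a rough illustration of how a vocabulary difference could change the split, here is a simplified greedy longest-match-first splitter in the style of WordPiece. It is only a sketch: the function `wordpiece_split` and both toy vocabularies are hypothetical and are not the real bert-base-cased vocabulary or the library's implementation.

```python
def wordpiece_split(word, vocab, unk_token="[UNK]"):
    """Greedily take the longest vocabulary match, marking continuations with ##."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No prefix matched: the whole word maps to the unknown token.
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


# Two hypothetical toy vocabularies that differ by a single entry.
vocab_a = {"Trans", "##former"}
vocab_b = {"Trans", "##former", "Transform", "##er"}

split_a = wordpiece_split("Transformer", vocab_a)
split_b = wordpiece_split("Transformer", vocab_b)
print(split_a)
print(split_b)
```

Because the match is greedy, adding one longer prefix to the vocabulary is enough to change how the same word is split.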

I don't know if the issue is on my side, so here are the details of my current setup if you want to try to reproduce it:

AWS ec2 p3.2xlarge
environment miniconda Latest
dependencies (in the requirements.txt)

requirements.txt

Don't hesitate to ask if you have any questions.

Translate to Kannada

Hi there 👋

Let's translate the course to KANNADA so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Translation to Persian

Hi there 👋

Let's translate the course to Persian so that the whole community can benefit from this resource 🌎!

Before you get started please take a minute to read the latest version of our evolving translation guidelines (T). It is important that we maintain a common tone in our collective work, while contributing with our separate creative voices.

We have a glossary page (G) where we store our latest choice of Persian equivalents for words. This page may be subject to change with every PR and its review discussion. If there are changes, we will mention here that the glossary file has been updated. We need to retroactively apply the changes to our sections.

Check here for general instructions on contributing.

Here's the workflow for contributions:

Please fork the Hugging Face course to your profile.
Clone your fork to your local machine.
Use this issue page for general discussion on word choices and whatnot.
Fetch frequently from upstream to your fork and keep your local working tree updated.
It is perfectly fine to link to your fork on this page for discussions.
When the first draft of your page(s) is done, commit it to your fork and open a PR for the page(s) on the Hugging Face course repo.
(Huggingface course/main branch <- Your fork/main or whatever branch you have)
Ask someone to help you review the page(s) there. Commit the changes to your fork and they will automatically be appended to the PR.
If you have updates to the glossary, try to include the stakeholders in the discussion (check the commit history), and when done, mention the changes on this page so we can all apply them retroactively to our sections.
When done with the review, ask @lewtun to merge.

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx @jowharshamshiri

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Add book reference

Translation to Hebrew

Hi there 👋

Let's translate the course to Hebrew so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx @omer-dor

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Translate to Telugu

Hi there 👋

Let's translate the course to TELUGU so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Update "Open in Colab" tip to reflect AWS button

Translate to Malayalam

Hi there 👋

Let's translate the course to MALAYALAM so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Translate to German

Hi there 👋

Let's translate the course to German so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx @MarcusFra

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Translate to Thai

Hi there 👋

Let's translate the course to Thai so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx @peeraponw

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Empty Building Model Card section

Hello 👋
The Building Model Card section in chapter 4 seems empty (the link goes nowhere).
I'll check what's wrong, but if anyone knows, feel free to fix it before me :') pinging @lewtun here

Persian Translation

Translate Huggingface course into Persian

Update EN/Chapter1/3.mdx

Hello!

I am opening this issue to fix the broken image/option in the third section of Chapter 1 in English.

Schematic incorrectly shows model output as model input

The following schematic should show the text in the last block as model output instead of model input.

The fix should go into this markdown - https://github.com/huggingface/course/blob/main/chapters/en/chapter2/2.mdx#model-heads-making-sense-out-of-numbers.

Translate to Gujarati

Translation to Gujarati

Hi there 👋

Let's translate the course to Gujarati so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

@lewtun

Translation to Georgian

Hi there 👋

Let's translate the course to Georgian so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Mistake in Unigram tokenization

Hi! Is it a mistake? There should be 17 instead of 5 in the end.

Rename `master` branch to `main` before going live

Add Gradio team to list of authors in Chapter 1

Add Docs/Demos/Blog Section to the end of Gradio Course

I think this kind of Gradio new user flow would be excellent:

gradio.app getting_started
Gradio Course
repo demos, blog posts, gradio docs

Related slack message
Related issue

cc: @lewtun

Translate to Tamil

Hi there 👋

Let's translate the course to Tamil so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

Chapters

0 - Setup

1.mdx sandhiyaprabhasss

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Replace Inference API widgets with Gradio ones

In Chapter 7 we have examples of the Inference API widgets embedded in the sections:

It would be cool to use Gradio demos here as a precursor to Chapter 9

cc @abidlabs @dawoodkhan82 @osanseviero

Translate to Simplified Chinese (zh-CN)

Hi there 👋

Let's translate the course to Simplified Chinese so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

We've also created a chinese-simplified-translations channel on the Hugging Face Discord in case you wish to discuss translation details there. Just follow the instructions here 👉 https://discord.gg/hKnxnxUr

Chapters

0 - Setup

1.mdx

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

Fix cost comparison in the chapter of "How do Transformers work?"

I think you meant to write "$ in compute" here, to contrast fine-tuning with the costly process of training from scratch.

Great content anyway, thank you very much!

Dark mode for Gradio demos is hard to read

The text in the demo below from chapter9/1.mdx is quite hard to read:

Is it possible to tweak this easily @dawoodkhan82 ?

The Hugging Face Course

🌎 Languages and translations

Translating the course into your language

📔 Jupyter notebooks

✍️ Contributing a new chapter

🙌 Acknowledgements