humansignal / label-studio-transformers

Label data using HuggingFace's transformers and automatically get a prediction service

License: Apache License 2.0

Python 100.00%

label-studio nlp transformers natural-language-processing natural-language-understanding bert pytorch-transformers text-labeling data-labeling

label-studio-transformers's Introduction

Label Studio for Hugging Face's Transformers

Transfer learning for NLP models by annotating your textual data without any additional coding.

This package provides a ready-to-use container that links together:

Label Studio as annotation frontend
Hugging Face's transformers as machine learning backend for NLP

Quick Usage

Install Label Studio and other dependencies

pip install -r requirements.txt

Create ML backend with BERT classifier

label-studio-ml init my-ml-backend --script models/bert_classifier.py
cp models/utils.py my-ml-backend/utils.py

# Start ML backend at http://localhost:9090
label-studio-ml start my-ml-backend

# Start Label Studio in the new terminal with the same python environment
label-studio start

Create a project with Choices and Text tags in the labeling config.
Connect the ML backend in the Project settings with http://localhost:9090
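
A labeling config with Choices and Text tags might look like the sketch below. The tag names `text` and `label`, the `$text` task field, and the two choice values are illustrative assumptions, not values prescribed by this repository. The config is kept in a Python string and parsed only to list the choices it offers.

```python
import xml.etree.ElementTree as ET

# Illustrative labeling config for text classification.
# Names ("text", "label") and choice values are assumptions.
CLASSIFIER_CONFIG = """
<View>
  <Text name="text" value="$text"/>
  <Choices name="label" toName="text" choice="single">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
  </Choices>
</View>
"""


def choice_values(config_xml):
    """Return the class names declared by the Choice tags of a config."""
    root = ET.fromstring(config_xml.strip())
    return [choice.get("value") for choice in root.iter("Choice")]


print(choice_values(CLASSIFIER_CONFIG))
```

The XML inside the string is what would be pasted into the project's labeling config editor.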

Create ML backend with BERT named entity recognizer

label-studio-ml init my-ml-backend --script models/ner.py
cp models/utils.py my-ml-backend/utils.py

# Start ML backend at http://localhost:9090
label-studio-ml start my-ml-backend

# Start Label Studio in the new terminal with the same python environment
label-studio start

Create a project with Labels and Text tags in the labeling config.
Connect the ML backend in the Project settings with http://localhost:9090
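
For the named entity recognizer, a config with Labels and Text tags could be sketched as follows. The label set (`PER`, `ORG`) and the `$text` field are assumptions chosen for illustration. The helper shows which key of each task's `data` the Text tag reads from, which matters because the backend looks the text up under that key.

```python
import xml.etree.ElementTree as ET

# Illustrative labeling config for span labeling (NER).
# Label values and the "$text" field name are assumptions.
NER_CONFIG = """
<View>
  <Labels name="label" toName="text">
    <Label value="PER"/>
    <Label value="ORG"/>
  </Labels>
  <Text name="text" value="$text"/>
</View>
"""


def text_field(config_xml):
    """Return the task data key the Text tag reads from, without the '$'."""
    root = ET.fromstring(config_xml.strip())
    text_tag = root.find(".//Text")
    return text_tag.get("value").lstrip("$")


def label_values(config_xml):
    """Return the entity labels declared in the config."""
    root = ET.fromstring(config_xml.strip())
    return [label.get("value") for label in root.iter("Label")]


print(text_field(NER_CONFIG), label_values(NER_CONFIG))
```

With this config, imported tasks would be expected to carry their text under the `text` key, for example `{"text": "..."}`.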

Training and inference

The browser opens at http://localhost:8080. Upload your data on the Import page, then annotate it on the Labeling page. Once you've annotated a sufficient amount of data, go to the Model page and press the Start Training button. Once training is finished, the model automatically starts serving predictions to Label Studio, and you'll find all model checkpoints inside the my-ml-backend/<ml-backend-id>/ directory.

Read more about how to use the Machine Learning backend and build Human-in-the-Loop pipelines with Label Studio in the Label Studio documentation.

License

This software is licensed under the Apache 2.0 License © Heartex, 2020

label-studio-transformers's Issues

TypeError: TransformersBasedTagger.__init__() takes 1 positional argument but 2 were given

The error occurs while connecting the backend to Label Studio.

KeyError: 'ner' when predicting NER data with the NER samples

When predicting NER data, the following error occurs. I printed the tasks and found that the key that should be 'ner' is '$undefined$' instead.

tasks data:

tasks:[{'id': 17, 'data': {'$undefined$': 'This work proposes a novel adaptation of a pretrained sequence-to-sequence model to the task of document ranking.'}, 'meta': {}, 'created_at': '2021-07-05T02:30:36.230799Z', 'updated_at': '2021-07-05T02:30:36.230834Z', 'is_labeled': True, 'overlap': 1, 'project': 10, 'file_upload': 6, 'annotations': [{'id': 17, 'created_username': ' [email protected], 1', 'created_ago': '0\xa0minutes', 'completed_by': 1, 'result': [{'value': {'start': 54, 'end': 74, 'text': 'sequence-to-sequence', 'labels': ['ORG']}, 'id': 'FA_HTHugoH', 'from_name': 'label', 'to_name': 'text', 'type': 'labels'}], 'was_cancelled': False, 'ground_truth': False, 'created_at': '2021-07-05T02:38:39.940285Z', 'updated_at': '2021-07-05T02:38:39.940321Z', 'lead_time': 11.491, 'task': 17}], 'predictions': []}]

error:

[2021-07-05 10:38:40,048] [ERROR] [label_studio_ml.exceptions::exception_f::53] Traceback (most recent call last):
  File "/workspace/label-studio/label-studio-ml-backend/label_studio_ml/exceptions.py", line 39, in exception_f
    return f(*args, **kwargs)
  File "/workspace/label-studio/label-studio-ml-backend/label_studio_ml/api.py", line 31, in _predict
    predictions, model = _manager.predict(tasks, project, label_config, force_reload, try_fetch, **params)
  File "/workspace/label-studio/label-studio-ml-backend/label_studio_ml/model.py", line 274, in predict
    predictions = m.model.predict(tasks, **kwargs)
  File "/workspace/label-studio/label-studio-transformers/ner-backend-test/ner.py", line 369, in predict
    texts = [task['data'][self.value] for task in tasks]
  File "/workspace/label-studio/label-studio-transformers/ner-backend-test/ner.py", line 369, in <listcomp>
    texts = [task['data'][self.value] for task in tasks]
KeyError: 'ner'
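
Judging from the printed task, the text is stored under `$undefined$` while the backend looks it up under `ner`. One possible defensive lookup is sketched below. The helper name `extract_texts` and the single-field fallback are assumptions made for illustration, not behaviour of the upstream `ner.py`.

```python
def extract_texts(tasks, value_key):
    """Collect the text of each task, tolerating a mismatched data key.

    If `value_key` is missing and the task has exactly one data field,
    that field is used instead (an assumption, not upstream behaviour).
    Otherwise a KeyError listing the available fields is raised.
    """
    texts = []
    for task in tasks:
        data = task["data"]
        if value_key in data:
            texts.append(data[value_key])
        elif len(data) == 1:
            # Only one candidate field, so use it regardless of its name.
            texts.append(next(iter(data.values())))
        else:
            raise KeyError(
                f"task {task.get('id')} has no {value_key!r} field; "
                f"available fields: {sorted(data)}"
            )
    return texts


tasks = [{"id": 17, "data": {"$undefined$": "A sample sentence."}}]
print(extract_texts(tasks, "ner"))
```

Making the task data key match the `value` of the Text tag in the labeling config would avoid the need for any fallback.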

Complicated

Why is this so complicated :( ?

How to use the prediction service?

Hi, thanks for this great tool.
However, I couldn't find detailed instructions for using the prediction service, either here or at https://github.com/heartexlabs/label-studio/blob/master/docs/source/guide/tasks.md.
I'd like to generate NER annotations after training my model, select the uncertain predictions, and then continue labeling.
Thanks in advance.
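
Selecting uncertain predictions could be sketched as below, assuming tasks are available as a list of dicts whose `predictions` entries each carry a numeric `score`. The function name `least_confident` and the `limit` parameter are hypothetical and only serve the illustration.

```python
def least_confident(tasks, limit=10):
    """Return up to `limit` tasks, ordered from lowest to highest score.

    Tasks without predictions are skipped. When a task has several
    predictions, its lowest score is used. Missing scores count as 0.0.
    """
    scored = []
    for task in tasks:
        predictions = task.get("predictions") or []
        if not predictions:
            continue
        score = min((p.get("score") or 0.0) for p in predictions)
        scored.append((score, task))
    # Sort by score only, so tasks themselves are never compared.
    scored.sort(key=lambda pair: pair[0])
    return [task for _, task in scored[:limit]]


tasks = [
    {"id": 1, "predictions": [{"score": 0.91}]},
    {"id": 2, "predictions": [{"score": 0.35}]},
    {"id": 3, "predictions": []},
]
print([t["id"] for t in least_confident(tasks, limit=2)])
```

The returned tasks are the ones where another round of manual labeling is likely to help the model most.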

Ner.py pretrained_config_archive_map not found for any model

During initialization with
label-studio-ml init smdia-backend-ner --script models/ner.py --force

I receive this error for all the models:
AttributeError: type object 'BertConfig' has no attribute 'pretrained_config_archive_map'


Traceback (most recent call last):
  File "/usr/local/bin/label-studio-ml", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/label_studio_ml/server.py", line 119, in main
    create_dir(args)
  File "/usr/local/lib/python3.6/dist-packages/label_studio_ml/server.py", line 73, in create_dir
    model_classes = get_all_classes_inherited_LabelStudioMLBase(script_path)
  File "/usr/local/lib/python3.6/dist-packages/label_studio_ml/utils.py", line 29, in get_all_classes_inherited_LabelStudioMLBase
    module = importlib.import_module(module_name)
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/labelstudio/label-studio-transformers/models/ner.py", line 36, in <module>
    [list(conf.pretrained_config_archive_map.keys()) for conf in (BertConfig,CamembertConfig, RobertaConfig, DistilBertConfig)],
  File "/labelstudio/label-studio-transformers/models/ner.py", line 36, in <listcomp>
    [list(conf.pretrained_config_archive_map.keys()) for conf in (BertConfig,CamembertConfig, RobertaConfig, DistilBertConfig)],
AttributeError: type object 'BertConfig' has no attribute 'pretrained_config_archive_map'

I tried downgrading transformers to 2.0.0, but then the transformers import fails.

could someone check this issue?
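
The traceback shows `ner.py` reading `pretrained_config_archive_map` from each config class at import time, and the installed transformers release apparently does not define it. A tolerant lookup might look like the sketch below. The helper `known_model_names` and both stand-in classes are hypothetical, so the example runs without transformers installed.

```python
def known_model_names(config_classes):
    """Gather model names from config classes that still expose the map.

    Classes without `pretrained_config_archive_map` contribute nothing
    instead of raising AttributeError.
    """
    names = []
    for conf in config_classes:
        archive_map = getattr(conf, "pretrained_config_archive_map", None) or {}
        names.extend(archive_map.keys())
    return names


class OldStyleConfig:
    # Stand-in for a config class that still has the attribute.
    pretrained_config_archive_map = {"bert-base-uncased": "placeholder-url"}


class NewStyleConfig:
    # Stand-in for a config class where the attribute is absent.
    pass


print(known_model_names([OldStyleConfig, NewStyleConfig]))
```

With an empty result the script would still import, although any list of selectable model names built from it would be empty too.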

Can't make predictions: ML backend returns an error (ner.py)

Steps to reproduce:

Start the server with Docker: docker-compose up --build
Import a sample with three tasks: [{"text":"To have faith is to trust yourself to the water"},{"text":"To have faith is to trust yourself to the water"},{"text":"To have faith is to trust yourself to the water"}]
Complete two tasks and train the Hugging Face transformer from ner.py.
Go to the UI to get a prediction for the third task.
No prediction appears.

Requirements:
torch==1.5.0
transformers==2.4.1
tensorboardX==1.9
label-studio>=0.7.0

Full logs are here:

[2020-08-31 15:07:49,882] [ERROR] [label_studio.utils.models::make_predictions::528] Can't make predictions: ML backend returns an error: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>

Can you help me please with this issue?

--ml-backend-url option changed, and where should I put --ml-backend-name?

The current label-studio start command in the docker-compose.yml contains these options.
https://github.com/heartexlabs/label-studio-transformers/blob/9450322/docker-compose.yml#L15-L16

      --ml-backend-url http://label-studio-ml-backend:9090
      --ml-backend-name my_model

But the current label-studio doesn't have them.

I ran label-studio start -h and found that the option changed to --ml-backend.
I fixed this and could then see localhost:8200.

But where should I put the model name?

not showing predictions after training

Describe the bug
There are 2 problems:

1. After training the ML backend model, I cannot find the model predictions in the UI when labelling.
2. Training often cannot complete all 100 epochs: the system crashes midway, at 30-70 epochs, although the dataset is small (50) and a GPU is available.
3. The error shows: get latest job results from work dir doesn’t exist
4. Sometimes, when problem 3 does not occur, a different error appears: unable to load weight from pytorch checkpoint file.

To reproduce
Steps to reproduce the behaviour

Import pre-annotated data
Manually label some of the tasks
Go to the ML page in Settings, connect the model (BERT classifier), and start training
After training finishes, go back to the labeling UI. In the prediction tab, only the pre-annotated predictions are shown.

Expected behaviour
ML training should be completed and new predictions should be shown in UI

How to load a trained BERT NER model in Python and run prediction on new text?

Hi, I have trained a BERT NER model through the ML backend. I would now like to share the trained model with my colleagues so that they can use it to run predictions on new text data. How can we load the trained model in Python and run prediction on new text data?

No module named label_studio_ml.api while starting ML backend

I was able to create an ML backend successfully based on bert_classifier.py with:
"label-studio-ml init my-ml-backend-bert --script models/bert_classifier.py"

But when starting it with the command "label-studio-ml start my-ml-backend-bert", I get the following error:

File "././my-ml-backend-bert/_wsgi.py", line 30, in
from label_studio_ml.api import init_app
ImportError: No module named label_studio_ml.api

I also tried other classifiers from this source:
"https://github.com/heartexlabs/label-studio-ml-backend/tree/master/label_studio_ml/examples"
but each of them gives me the same error while starting.

label-studio requirement incorrect, also getting old

The README example does not work as is -- label-studio==1.0.0 does not provide the command label-studio-ml, and does not expose LabelStudioMLBase.

It works OK with label-studio==0.7, but that's not what's specified in requirements.txt.

(NB: it also doesn't work with the current head of label-studio-ml-backend.)

Error with ner.py

When using the quick start for BERT NER:
label-studio-ml init my-ml-backend --script models/ner.py

This error occurs:
AttributeError: type object 'BertConfig' has no attribute 'pretrained_config_archive_map'

How do I know which model is trained for ner?

In the NER model code, there are four model classes. How do I know which model is trained, e.g. BERT or RoBERTa?

AttributeError: 'TransformersBasedTagger' object has no attribute '_tokenizer'

predict runs into an error:
label-studio-ml-backend/ner-ml-backend/ner.py", line 368, in predict
predict_set = SpanLabeledTextDataset(texts, tokenizer=self._tokenizer, **self._dataset_params_dict)
AttributeError: 'TransformersBasedTagger' object has no attribute '_tokenizer'