doccano / doccano-client

A simple client for doccano API.

Home Page: https://doccano.github.io/doccano-client/

License: MIT License

Python 99.97% Makefile 0.03%
doccano annotation machine-learning natural-language-processing api-wrapper api-client data-labeling text-annotation upload-file python

doccano-client's Introduction

doccano client

A simple client for the doccano API.

Installation

To install doccano-client, simply run:

pip install doccano-client

Usage

from doccano_client import DoccanoClient

# instantiate a client and log in to a Doccano instance
client = DoccanoClient('http://doccano.example.com')
client.login(username='username', password='password')

# get basic information about the authorized user
user = client.get_profile()

# list all projects
projects = client.list_projects()

Please see the documentation for further details.

Doccano API BETA Client

We're introducing a revamped Doccano API client that offers more Pythonic interaction, along with more tests and documentation. It also tracks compatibility with specific Doccano release versions more closely.

You can find the documentation on usage of the beta client here.

doccano-client's People

Contributors

afparsons, ayanamizuta, creisle, daleevans, david-engelmann, dependabot[bot], dsciacca, eandrewjones, ghontolux, guigarfr, harmw, hironsan, houssam7737, kuraga, lance132, leonardlin, ljades, louisguitton, rolisz, st-hakky, tmarice, youichiro

doccano-client's Issues

Add ability to create a new project via api

Feature description

Often I need to create multiple, very similar projects. Currently the workflow is to go into the UI and manually create these - it's very repetitive and time consuming. It would be great to have the ability to create a new project using the API, including the ability to upload a dataset, labels, and assign members.

Is it possible to add annotations for a document directly to Postgres?

I'm trying to add annotations to my sequence labelling and classification projects using post_doc_upload, but this doesn't seem to work: the data never appears in doccano. I have also tried the create_document method, but it only creates documents and doesn't add any annotations.

I think it would be easier to upload the documents and then update the annotations in Postgres. Which tables do I need to update?

How to reproduce the behaviour

from doccano_api_client import DoccanoClient

# instantiate a client and log in to a Doccano instance
doccano_client = DoccanoClient(
    "https://doccano.test.com/", "admin", "password"
)

doccano_client.post_doc_upload(
    37,
    file_name='ner_file.jsonl',
    file_path=f'{local_base_path}outfile',
    format='json',
    column_data="text",
    column_label="labels",
)

Test data looks like:

{"text": "ok test volume", "labels": [[3, 10, "search"]]}
{"text": "ok testing toggle only mode", "labels": [[3, 10, "screen"], [18, 27, "screen"]]}
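Before uploading span annotations like these, it can help to print the text each `[start, end, label]` triple actually covers. A minimal sketch, assuming the offsets are character indices with an exclusive end (Python slice semantics); `check_spans` and its `label_key` parameter are hypothetical helpers, not part of doccano-client:

```python
import json
from typing import Dict, List, Tuple


def check_spans(record: Dict, label_key: str = "labels") -> List[Tuple[str, str]]:
    """Return (covered_text, label) for each [start, end, label] span in one JSONL record.

    Assumes offsets are character indices with an exclusive end, i.e. text[start:end].
    """
    text = record["text"]
    covered = []
    for start, end, label in record[label_key]:
        # Reject spans that are empty, reversed, or run past the end of the text.
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span {start}-{end} ({label}) does not fit in {text!r}")
        covered.append((text[start:end], label))
    return covered


if __name__ == "__main__":
    sample = [
        '{"text": "ok test volume", "labels": [[3, 10, "search"]]}',
        '{"text": "ok testing toggle only mode", "labels": [[3, 10, "screen"], [18, 27, "screen"]]}',
    ]
    for line in sample:
        print(check_spans(json.loads(line)))
```

On the two sample lines this prints `[('test vo', 'search')]` and `[('testing', 'screen'), ('only mode', 'screen')]`, so the end offset of the first span may be worth double-checking before upload.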

Your Environment

  • Operating System: MacOS Monterey 12.1
  • Python Version: 3.8
  • Package Version: 1.0.3

Entity span highlight invisible

Hello,

I'm facing an issue with sequence labeling projects created through doccano-client: after I select a text span and pick the entity type from the drop-down menu that appears, the span is not underlined or highlighted in any way. The annotator therefore gets no visual feedback on what they did; the example looks exactly as if it had not been annotated. Note that if one downloads the data, the added labels are actually there in the .jsonl file, span boundaries and all.
Also note that if one manually creates a doccano project in the browser GUI, using exactly the same data, labels, and guideline, this bug is NOT present.

Here is the sample code I use:

from doccano_api_client import DoccanoClient
import os

main_dir = os.getcwd()

doccano_url = 'http://0.0.0.0/'
doccano_client = DoccanoClient(
    baseurl=doccano_url,
    username='admin',
    password='password'
)

data_files_dir = os.path.join(os.getcwd(), 'Toy_data')
data_fn = os.path.join(data_files_dir, 'toy_data.jsonl')

ent_guideline_fn = os.path.join(data_files_dir, 'ent_guideline.md')
with open(ent_guideline_fn, encoding='utf-8', mode='r') as f:
    ent_guideline_string = f.read()

ent_description = 'Entities annotation for toy data'
ent_project_type = 'SequenceLabeling'
ent_response = doccano_client.create_project(name='toy_ent_api',
                                             description=ent_description,
                                             project_type=ent_project_type,
                                             guideline=ent_guideline_string)

ent_docc_proj_id = ent_response['id']

_ = doccano_client.post_doc_upload(project_id=ent_docc_proj_id,
                                   file_name=data_fn,
                                   file_path=data_files_dir,
                                   format='JSONL')

ent_docc_labels_fn = os.path.join(data_files_dir, 'ent_label_config.json')
_ = doccano_client.post_label_upload(project_id=ent_docc_proj_id,
                                     file_name=ent_docc_labels_fn,
                                     file_path=data_files_dir)

The code runs without errors.

toy_data.jsonl contains

{"id": "12345", "text": "I love doccano", "label": []}
{"id": "12346", "text": "Napoleon was French", "label": []}

ent_guideline.md contains

## product
For products
## person
For people

ent_label_config.json contains

[
  {
    "id": 1,
    "text": "product",
    "prefixKey": null,
    "suffixKey": null,
    "backgroundColor": "#FA28FF",
    "textColor": "#ffffff"
  },
  {
    "id": 2,
    "text": "person",
    "prefixKey": null,
    "suffixKey": null,
    "backgroundColor": "#A4DD00",
    "textColor": "#ffffff"
  }
]

I run doccano 1.5.5 in a local container on my computer running Ubuntu 20.04. I built the image using
docker-compose -f docker-compose.prod.yml --env-file .env build
and ran it using
docker-compose -f docker-compose.prod.yml --env-file .env up
Note that I had to apply the fix from
https://github.com/doccano/doccano/pull/1739/files
to the files Dockerfile and nginx/Dockerfile in order to build the Docker image.

I use Python 3.8.12 to run my script. Here is the output of pip freeze:

attrs==21.4.0
certifi==2022.5.18.1
charset-normalizer==2.0.12
doccano-client==1.0.3
idna==3.3
jsonlines==3.0.0
numpy==1.22.4
pandas==1.4.2
python-dateutil==2.8.2
pytz==2022.1
requests==2.27.1
six==1.16.0
urllib3==1.26.9

In particular, I use doccano-client 1.0.3

role names key mismatch - fix specified

How to reproduce the behaviour

The key used in the code should be "name", not "rolename"; the two don't match.

I found this while debugging why post_members wasn't working.

I changed the line locally to
role = list(filter(lambda role_info: role_info["name"] == rolename, res_roles))

and now it works.
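The one-line fix above can be wrapped in a small helper that returns None when no role matches, so the caller can raise a clear error. A sketch, assuming `res_roles` is the list of role dicts returned by the server, each carrying a `"name"` key as the report describes; `find_role` and the sample role data are illustrative only:

```python
from typing import Any, Dict, List, Optional


def find_role(res_roles: List[Dict[str, Any]], rolename: str) -> Optional[Dict[str, Any]]:
    """Return the first role whose "name" equals rolename, or None if there is none."""
    # Match on "name" (the key reported above), not "rolename".
    return next((role for role in res_roles if role.get("name") == rolename), None)


# Illustrative data only; real role dicts come from the doccano server.
res_roles = [{"id": 1, "name": "project_admin"}, {"id": 2, "name": "annotator"}]
role = find_role(res_roles, "annotator")
if role is None:
    raise ValueError("unknown role: annotator")
print(role["id"])
```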
Your Environment

  • Operating System:
  • Python Version:
  • Package Version:

doccano-client failed to upload a file

How to reproduce the behaviour

doccano_client = DoccanoClient(
    'http://127.0.0.1:8000/',
    'admin',
    'password'
)
project_info = doccano_client.create_project(name="First Python Project")
upload_file = doccano_client.post_doc_upload(project_id=project_info['id'],
                                             file_name='anexample.json',
                                             format='JSON'
                                             )

Content of anexample.json:

[
    {
        "text": "**Terrible** customer **service**.",
        "label": [],
        "metadata": "hi there",
        "ticker": "TSL"
    },
    {
        "text": "**Great** customer service.",
        "label": [],
        "metadata": "hi there",
        "ticker": "TSL"
    }
]

The problem is that the script finishes with exit code 0, but when I look at the project's dataset, the content has not been uploaded. It only works when I upload through the GUI.

Your Environment

  • Operating System:
  • Python Version Used: 3.7
  • When did you install doccano: 7/5/21
  • How did you install doccano (Heroku button etc): docker

Please release v1.0.3 as a PyPI package

There have been a number of improvements recently.

I don't know the release cycle for this package, but we would like to use the new features and bug fixes from the PyPI package.

@Hironsan What do you think about this? ;)

project_labels = self.list_label_types(project_id,"relation")

File "/home/finq/miniconda3/envs/adam/lib/python3.10/site-packages/doccano_client/client.py", line 146, in relation_type
service = LabelTypeService(self._relation_type_repository)
AttributeError: 'DoccanoAPI' object has no attribute '_relation_type_repository'

How do I get the list of labels in a project with the new client version, and how do I pass the Literal parameters? Thanks!

Cannot log in

HTTPConnectionPool(host='127.0.0.1', port=8000): Max retries exceeded with url: /v1/auth/login/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4c07cb8280>: Failed to establish a new connection: [Errno 111] Connection refused'))

beta client 1.6+ compatibility

How to reproduce the behaviour

I have forked doccano-client and, in my local environment, made a few updates so that it's compatible with doccano 1.7, especially:
  • project-attributes
  • span-types (formerly labels)
  • relation-types (formerly labels)

It would be easier to have a client that only needs to support 1.6+. Or is the goal for the beta client to support 1.5+?

Does the doccano server report its own version, so that the client could toggle features based on it?

I'm happy to merge my local changes to the beta client back into the repo.
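If the server's version string were available to the client (the report asks whether it is; that is not established here), a feature toggle could compare parsed version tuples. A minimal sketch; `parse_version`, `label_resource`, and the choice of 1.7 as the cut-over for the `span-types` naming are assumptions based on the list above, not the client's actual logic:

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "1.7.0" or "v1.7.0" into (1, 7, 0); only plain numeric versions are handled."""
    return tuple(int(part) for part in version.lstrip("v").split(".")[:3])


def label_resource(server_version: str) -> str:
    """Pick which (hypothetical) resource name to use for span labels on a given server."""
    # Assumed cut-over: servers from 1.7 on use "span-types"; older ones use "labels".
    return "span-types" if parse_version(server_version) >= (1, 7) else "labels"


print(label_resource("1.5.5"))   # labels
print(label_resource("v1.7.0"))  # span-types
```

Comparing integer tuples rather than raw strings keeps a version such as "1.10" ordered after "1.7".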

Your Environment

  • Operating System:
  • Python Version:
  • Package Version:

cannot import name dataclass_transform

python=3.8.8

pip install doccano-client

but when I import it:

from doccano_client import DoccanoClient
Traceback (most recent call last):
  File "D:\software\ANACONA\lib\site-packages\IPython\core\interactiveshell.py", line 3437, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-45fa2064b15e>", line 1, in <module>
    from doccano_client import DoccanoClient
  File "D:\software\PyCharm 2020.2.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "D:\software\ANACONA\lib\site-packages\doccano_client\__init__.py", line 1, in <module>
    from doccano_client.client import DoccanoClient
  File "D:\software\PyCharm 2020.2.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "D:\software\ANACONA\lib\site-packages\doccano_client\client.py", line 6, in <module>
    from doccano_client.models.comment import Comment
  File "D:\software\PyCharm 2020.2.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "D:\software\ANACONA\lib\site-packages\doccano_client\models\comment.py", line 3, in <module>
    from pydantic import BaseModel
  File "D:\software\PyCharm 2020.2.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "pydantic\__init__.py", line 2, in init pydantic.__init__
  File "pydantic\dataclasses.py", line 52, in init pydantic.dataclasses
ImportError: cannot import name dataclass_transform

API for easy and seamless import/export dataset of Doccano's database from python script to allow Human-in-the-loop and Active Learning capabilities

Feature description

Hello, I have been looking for an NER labelling tool that lets me iterate quickly on an Active Learning problem I am dealing with: after a labelling session, I train a model, run inference with it, and later review those inferences. Doccano is an excellent tool, but as far as I understand it lacks a quick import/export path: I have to deal with files, upload them, tag/revise them, download the newly annotated dataset, etc.

In my opinion, this task would be much easier if I could directly query the database Doccano uses under the hood to store all this text and tag information, and work with it.

Is there any feature already implemented that could help me achieve this task?

If it is not implemented: Is something related in the roadmap for this tool?

Thank you

add simple user through HTTP request

Hi, first of all, nice work! I use it a lot.

I can't find a way to create (sign up) new users through HTTP requests, for example with something like
add_new_user(username, password, email)
Do you think this would be possible?

For now I create a lot of admin users by adding them to doccano/tools/run.sh, but I don't like this approach.

Have a nice day !

Uploading the file from the API fails

Hello, uploading a document through the API raises an error, yet the document does appear in the Doccano instance's database. I look forward to your reply. Thank you very much!

r_du = doccano_client.post_doc_upload(8, "json", "fail1.json", r"D:\WPSCloud")

error:
Traceback (most recent call last):
  File "D:/doccano/doccano-client/doccano_api_client/example.py", line 113, in <module>
    r_du = doccano_client.post_doc_upload(8, "csv", "data_xa.csv", "D:\WPSCloud")
  File "D:\doccano\doccano-client\doccano_api_client\__init__.py", line 521, in post_doc_upload
    data=data
  File "D:\doccano\doccano-client\doccano_api_client\__init__.py", line 64, in post
    request_url, data=data, files=files, json=json).json()
  File "E:\Anaconda\lib\site-packages\requests\models.py", line 885, in json
    return complexjson.loads(self.text, **kwargs)
  File "E:\Anaconda\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "E:\Anaconda\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "E:\Anaconda\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
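The traceback shows the client calling `.json()` directly on the HTTP response inside its `post` helper, so any body that is not JSON (an empty body, or an HTML error page) surfaces as `JSONDecodeError` rather than as the underlying HTTP problem. A minimal sketch of a more informative decode step; `decode_json_body` is a hypothetical helper, and with `requests` its arguments would come from `response.status_code` and `response.text`:

```python
import json
from typing import Any


def decode_json_body(status_code: int, body: str) -> Any:
    """Parse a response body as JSON, or raise an error that shows what came back instead."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # Include the status code and the start of the body, which usually reveals
        # an empty reply or an HTML page instead of the expected JSON.
        preview = body.strip()[:80] or "<empty body>"
        raise ValueError(f"HTTP {status_code}: expected JSON, got {preview!r}") from None


print(decode_json_body(201, '{"id": 8}'))
```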

Delete document: missing feature

Hi, I needed the delete-document API function, so I made a fork and implemented it. I hope you merge it. Thanks.

create example image

Feature description

I love creating and configuring projects with doccano-client. Thanks a lot for this work!
I'm used to doing this for text projects.

But is it also possible with images?

For example, can the create_example method upload an image as an example for a doccano project?
It looks like it was made for text only (I think image support is fairly recent in doccano).

def create_example(

issue in uploading json file

def upload_data(self,
                project_id: int,
                file_name: str,
                file_folder: str = "./",
                format: str = "JSONL"):
    logger.info(f'Uploading {file_name} to project {project_id}')
    self.client.upload(project_id=project_id,
                       file_paths=file_folder,
                       column_data=self.text_string,
                       column_label=self.label_string,
                       task=Task,
                       format=format)

This is the function, and this is the error:
File "/home/finq/miniconda3/envs/adam/lib/python3.10/pathlib.py", line 1117, in open
return self._accessor.open(self, mode, buffering, encoding, errors,
IsADirectoryError: [Errno 21] Is a directory: '/'
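One plausible reading of `Is a directory: '/'` is that `file_paths=file_folder` hands the client a single folder string where a list of file paths is expected, and iterating over a string visits its characters one at a time. A sketch of building that list instead; it assumes `upload` accepts a list of file paths through `file_paths` (as the parameter name suggests), and `build_file_paths` is a hypothetical helper:

```python
import os
from typing import List


def build_file_paths(file_folder: str, file_names: List[str]) -> List[str]:
    """Join each file name onto the folder to get the list of paths to upload."""
    return [os.path.join(file_folder, name) for name in file_names]


# Iterating over a folder string yields characters, not files:
print(list("./"))                              # ['.', '/']
print(build_file_paths("./", ["data.jsonl"]))  # ['./data.jsonl']
# Inside upload_data this would become, for example:
#     file_paths=build_file_paths(file_folder, [file_name])
```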

Wrong URL path for the UploadAPI View

How to reproduce the behavior

Doccano server has a defined path /v1/projects/<id>/upload for the document upload https://github.com/doccano/doccano/blob/master/backend/api/urls.py#L6
However the client library is using 'v1/projects/{project_id}/docs/upload' https://github.com/doccano/doccano-client/blob/master/doccano_api_client/__init__.py#L544

As a result, these versions are not compatible. An example from the logs:

nginx_1     | 172.23.0.1 - - [04/May/2021:09:15:46 +0000] "GET /v1/me HTTP/1.1" 200 47 "-" "python-requests/2.25.1" "-"                                                                             
backend_1   | Method Not Allowed (POST): /v1/projects/1/docs/upload                                                                                                                                                                                                           
backend_1   | Method Not Allowed: /v1/projects/1/docs/upload                                                                                                                                        
nginx_1     | 172.23.0.1 - - [04/May/2021:09:15:46 +0000] "POST /v1/projects/1/docs/upload HTTP/1.1" 405 0 "-" "python-requests/2.25.1" "-" 

I can prepare a quick PR for this fix, but I'm not familiar with the previous versions and don't know which ones are broken. If you have time, please describe when the breaking changes were introduced, or point me to them. Thanks!

UPD:
Actually, this issue goes deeper. The problem isn't only the wrong path but the completely new way of uploading files. When the drf-filepond library was added to the server, this client wasn't updated to make the new sequence of calls.
The correct flow for this function is something like this:

# s is an authenticated requests.Session, host is the server base URL ending in "/",
# and fl is the file object to upload.
fp_r = s.post(host + "v1/fp/process/", files={"filepond": fl})
upload_data = {
    "format": "JSONL",
    "uploadIds": [fp_r.text],
    "column_data": "text",
    "column_label": "label",
    "delimiter": "",
    "encoding": "utf_8",
}
r = s.post(host + "v1/projects/1/upload", json=upload_data)

I can prepare a PR to add this functionality, but I still need some guidance on how you want to handle this breaking change, and on how to implement uploading several files: should a single call upload one or more files, or should there be separate calls for the filepond and upload endpoints?
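The two-step flow in the snippet above can be generalised to several files: post each file to the filepond endpoint, collect the returned ids, then make one call to the project's upload endpoint. A sketch only, assuming the endpoints and payload fields exactly as shown in the report (a v1.3.1-era server); `build_upload_payload` and `upload_files` are hypothetical, and `session` is any object with a `requests.Session`-style `post` method:

```python
from typing import Any, Dict, List


def build_upload_payload(upload_ids: List[str], file_format: str = "JSONL",
                         column_data: str = "text", column_label: str = "label") -> Dict[str, Any]:
    """JSON body for POST v1/projects/<id>/upload, using the fields from the report."""
    return {
        "format": file_format,
        "uploadIds": list(upload_ids),
        "column_data": column_data,
        "column_label": column_label,
        "delimiter": "",
        "encoding": "utf_8",
    }


def upload_files(session: Any, host: str, project_id: int, paths: List[str], **payload_options: str) -> Any:
    """Upload several files: one filepond call per file, then a single upload call."""
    upload_ids = []
    for path in paths:
        with open(path, "rb") as fl:
            # Step 1: the filepond endpoint answers with the upload id as plain text.
            fp_r = session.post(host + "v1/fp/process/", files={"filepond": fl})
            fp_r.raise_for_status()
            upload_ids.append(fp_r.text)
    # Step 2: one request that references every uploaded file.
    return session.post(host + f"v1/projects/{project_id}/upload",
                        json=build_upload_payload(upload_ids, **payload_options))
```

With `requests`, `upload_files(s, host, 1, ["a.jsonl", "b.jsonl"])` would mirror the original snippet for two files. Keeping both steps inside one helper is one possible answer to the one-call-or-two question; exposing them as separate calls would also work.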

Your Environment


  • Operating System: Debian Buster
  • Python Version: 3.7.8
  • Package Version: 1.0.1
  • Doccano server: built from the tag v1.3.1

post_members

How to reproduce the behaviour

pip install doccano-client
from doccano_api_client import DoccanoClient

Your Environment

  • Operating System: windows 10
  • Python Version: 3.7
  • Package Version: 1.02

The problem is that the post_members function available on GitHub doesn't work with the pip installation. I get:

line 45, in
members_add = doccano_client.post_members(project_id = project_info.id, usernames=userlist, roles=roleslist)

AttributeError: 'DoccanoClient' object has no attribute 'post_members'

I am also using PyCharm, and it does not recognize the function post_members.

Process finished with exit code 1

Broken Type Hints in DoccanoClient.

How to reproduce the behavior

When importing the DoccanoClient class with from doccano_api_client import DoccanoClient, the import fails because the newly added label functions have return type hints that start with `request.` instead of `requests.`.

Solution

I've created a pull request to resolve the error here.

Your Environment

  • Operating System: ubuntu 20.04 (docker)
  • Python Version: 3.8
  • Package Version: current master (170c7dd)

Can't download dataset

How to reproduce the behaviour

doccano_client.get_doc_download(4, 'json')

backend_1   | Internal Server Error: /v1/projects/4/docs/download
backend_1   | Traceback (most recent call last):
backend_1   |   File "/usr/local/lib/python3.8/site-packages/django/core/handlers/exception.py", line 47, in inner
backend_1   |     response = get_response(request)
backend_1   |   File "/usr/local/lib/python3.8/site-packages/django/core/handlers/base.py", line 204, in _get_response
backend_1   |     response = response.render()
backend_1   |   File "/usr/local/lib/python3.8/site-packages/django/template/response.py", line 105, in render
backend_1   |     self.content = self.rendered_content
backend_1   |   File "/usr/local/lib/python3.8/site-packages/django/template/response.py", line 81, in rendered_content
backend_1   |     template = self.resolve_template(self.template_name)
backend_1   |   File "/usr/local/lib/python3.8/site-packages/django/template/response.py", line 63, in resolve_template
backend_1   |     return select_template(template, using=self.using)
backend_1   |   File "/usr/local/lib/python3.8/site-packages/django/template/loader.py", line 47, in select_template
backend_1   |     raise TemplateDoesNotExist(', '.join(template_name_list), chain=chain)
backend_1   | django.template.exceptions.TemplateDoesNotExist: index.html
backend_1   | [01/Aug/2021 12:22:12] "GET /v1/projects/4/docs/download?q=json&onlyApproved=false HTTP/1.1" 500 19489

Your Environment

  • Operating System: ubuntu
  • Python Version: 3.8
  • Package Version: 1.0.3.dev29+gbb4c9ba (installed from github)

Question: export data using client

How to reproduce the behaviour

I was wondering whether I'm exporting data properly with the client. Everything was working, but after I updated to the new versions of Doccano and the client, I can't seem to export the data.

client = DoccanoClient(
    'http://localhost:8000/',
    'admin',
    'password'
)
data = client.get_doc_download(5, "json")
data.text

The result is an HTML page:

<!doctype html>
<html>
  <head>
    <title>doccano - doccano</title><meta data-n-head="1" charset="utf-8"><meta data-n-head="1" name="viewport" content="width=device-width,initial-scale=1"><meta data-n-head="1" data-hid="description" name="description" content="doccano is an open source annotation tools for machine learning practitioner."><link data-n-head="1" rel="icon" type="image/x-icon" href="/favicon.ico"><link data-n-head="1" rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto:300,400,500,700|Material+Icons"><link data-n-head="1" rel="stylesheet" type="text/css" href="https://fonts.googleapis.com/css?family=Roboto:100,300,400,500,700,900&display=swap"><link data-n-head="1" rel="stylesheet" type="text/css" href="https://cdn.jsdelivr.net/npm/@mdi/font@latest/css/materialdesignicons.min.css"><script data-n-head="1" src="https://use.fontawesome.com/releases/v5.0.6/js/all.js"></script><link rel="preload" href="/static/_nuxt/runtime.861ae73.js" as="script"><link rel="preload" href="/static/_nuxt/commons/app.622be96.js" as="script"><link rel="preload" href="/static/_nuxt/vendors~app.6c65cb7.js" as="script"><link rel="preload" href="/static/_nuxt/app.4355148.js" as="script">
  </head>
  <body>
    <div id="__nuxt"><style>#nuxt-loading{background:#fff;visibility:hidden;opacity:0;position:absolute;left:0;right:0;top:0;bottom:0;display:flex;justify-content:center;align-items:center;flex-direction:column;animation:nuxtLoadingIn 10s ease;-webkit-animation:nuxtLoadingIn 10s ease;animation-fill-mode:forwards;overflow:hidden}@keyframes nuxtLoadingIn{0%{visibility:hidden;opacity:0}20%{visibility:visible;opacity:0}100%{visibility:visible;opacity:1}}@-webkit-keyframes nuxtLoadingIn{0%{visibility:hidden;opacity:0}20%{visibility:visible;opacity:0}100%{visibility:visible;opacity:1}}#nuxt-loading>div,#nuxt-loading>div:after{border-radius:50%;width:5rem;height:5rem}#nuxt-loading>div{font-size:10px;position:relative;text-indent:-9999em;border:.5rem solid #f5f5f5;border-left:.5rem solid #fff;-webkit-transform:translateZ(0);-ms-transform:translateZ(0);transform:translateZ(0);-webkit-animation:nuxtLoading 1.1s infinite linear;animation:nuxtLoading 1.1s infinite linear}#nuxt-loading.error>div{border-left:.5rem solid #ff4500;animation-duration:5s}@-webkit-keyframes nuxtLoading{0%{-webkit-transform:rotate(0);transform:rotate(0)}100%{-webkit-transform:rotate(360deg);transform:rotate(360deg)}}@keyframes nuxtLoading{0%{-webkit-transform:rotate(0);transform:rotate(0)}100%{-webkit-transform:rotate(360deg);transform:rotate(360deg)}}</style><script>window.addEventListener("error",function(){var e=document.getElementById("nuxt-loading");e&&(e.className+=" error")})</script><div id="nuxt-loading" aria-live="polite" role="status"><div>Loading...</div></div></div><script>window.__NUXT__={config:{},staticAssetsBase:void 0}</script>
  <script src="/static/_nuxt/runtime.861ae73.js"></script><script src="/static/_nuxt/commons/app.622be96.js"></script><script src="/static/_nuxt/vendors~app.6c65cb7.js"></script><script src="/static/_nuxt/app.4355148.js"></script></body>
</html>

Any advice is appreciated. My knowledge is very limited.

Thank you!


  • Operating System: Ubuntu 20.04.2
  • Python Version Used: 3.8.5
  • Doccano-Client version: 1.0.1
  • When did you install doccano: yesterday
  • How did you install doccano (Heroku button etc): docker pull doccano/doccano
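The body returned here is the doccano front-end's HTML shell (a Nuxt loading page), not exported data, which suggests the request reached a URL the server answers with the web app rather than with an export API; the server log in the previous report (`TemplateDoesNotExist: index.html` for `/v1/projects/4/docs/download`) points the same way. A cheap check before parsing the result can make that failure explicit; `looks_like_html` is a hypothetical heuristic, not part of doccano-client:

```python
def looks_like_html(body: str) -> bool:
    """Heuristic: True if a response body starts like an HTML page rather than data."""
    head = body.lstrip()[:15].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


exported = "<!doctype html>\n<html>\n  <head>"
if looks_like_html(exported):
    print("Got the web UI instead of an export; check the download endpoint for this server version.")
```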

create_user documentation mismatch

How to reproduce the behaviour

Your Environment

  • Operating System: Docker
  • Python Version: 3.8
  • Package Version: 1.2.6

I noticed the documentation has a slight inconsistency around the new create-user functions: it states that they return a User object. If that's true, we should update the description in client.py to say User instead of UserDetails, matching the documentation in the user repository file.

requests.exceptions.JSONDecodeError: Expecting value: line 2 column 1 (char 1)

I'm unable to log in to doccano; the login fails with the error below. Please help me resolve this issue.

  File "/home/finq/review/adam-1/adam/doccano/doccano_api/doccano_api.py", line 34, in __init__
    self.client.login(username,password)
  File "/home/finq/miniconda3/envs/adam/lib/python3.10/site-packages/doccano_client/client.py", line 116, in login
    self._base_repository.login(username, password)
  File "/home/finq/miniconda3/envs/adam/lib/python3.10/site-packages/doccano_client/repositories/base.py", line 93, in login
    verbose_raise_for_status(response)
  File "/home/finq/miniconda3/envs/adam/lib/python3.10/site-packages/doccano_client/repositories/base.py", line 34, in verbose_raise_for_status
    raise DoccanoAPIError(err.response)
  File "/home/finq/miniconda3/envs/adam/lib/python3.10/site-packages/doccano_client/repositories/base.py", line 16, in __init__
    super().__init__(str(response.json()), response=response)
  File "/home/finq/miniconda3/envs/adam/lib/python3.10/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 2 column 1 (char 1)

  • Operating System: windows 11
  • Python Version: 3.10
  • Package Version: doccano 1.7.0, doccano-client 1.2.2

Typo in commit 170c, should be `requests`

How to reproduce the behaviour

Install package from source code

git clone https://github.com/doccano/doccano-client.git
pip install -e doccano-client

Import in python

from doccano_api_client import Doccano

I got

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/howayi/workspace/github/doccano/doccano-client/doccano_api_client/__init__.py", line 148, in <module>
    class DoccanoClient(_Router):
  File "/Users/howayi/workspace/github/doccano/doccano-client/doccano_api_client/__init__.py", line 604, in DoccanoClient
    def get_category_type_list(self, project_id: int) -> request.models.Response:
NameError: name 'request' is not defined
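The `NameError` is raised while the class body runs, not when the method is called: before Python 3.14, return annotations are evaluated as soon as the `def` executes, so a single typo such as `request.` for `requests.` breaks the whole import. A small demonstration with a throwaway class; `defines_cleanly` is just a harness for this sketch:

```python
import sys

SOURCE = '''
class DoccanoClient:
    def get_category_type_list(self, project_id: int) -> request.models.Response:
        ...
'''


def defines_cleanly(source: str) -> bool:
    """Execute source in a fresh namespace; report whether it ran without a NameError."""
    try:
        exec(compile(source, "<sketch>", "exec"), {})
    except NameError:
        return False
    return True


if sys.version_info < (3, 14):
    print(defines_cleanly(SOURCE))  # False: the annotation is evaluated at definition time
# Postponed evaluation (PEP 563) keeps annotations as strings, so the typo no longer breaks import.
print(defines_cleanly("from __future__ import annotations\n" + SOURCE))  # True
```

Postponed evaluation only hides the typo; correcting it to `requests.`, as the report says, is still the actual fix.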

Your Environment

  • Operating System:
  • Python Version:
  • Package Version:

get_document_list doesn't work with offset>9

When calling get_document_list with an offset parameter, the results are wrong whenever the offset is greater than 9. The bug is in the build_url_parameter method, which splits offset=15 into ?offset=1&offset=5.
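The symptom is what happens when a single value is iterated as if it were a list of values: the string "15" iterates as "1" and "5". The original `build_url_parameter` is not shown here, so the first function below is a hypothetical reconstruction that reproduces the report; the second builds the query with the standard library's `urlencode`, whose `doseq=True` expands real lists into repeated keys while leaving scalars whole:

```python
from typing import Any, Dict
from urllib.parse import urlencode


def build_url_parameter_buggy(params: Dict[str, Any]) -> str:
    """Hypothetical reconstruction of the bug: every value is iterated, even scalars."""
    pairs = [(key, item) for key, value in params.items() for item in str(value)]
    return "?" + "&".join(f"{key}={item}" for key, item in pairs)


def build_url_parameter(params: Dict[str, Any]) -> str:
    """Build a query string; lists become repeated keys, numbers and strings stay whole."""
    return "?" + urlencode(params, doseq=True)


print(build_url_parameter_buggy({"offset": 15}))  # ?offset=1&offset=5
print(build_url_parameter({"offset": 15}))        # ?offset=15
print(build_url_parameter({"q": ["a", "b"]}))     # ?q=a&q=b
```

Single-digit offsets survive the buggy version because iterating "9" yields only "9", which is why the problem appears only above 9.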

Unable to connect to the client

I have deployed doccano to AWS and am now trying to use the client to upload data from an Airflow pipeline, but I can't seem to connect. I am following the instructions, but I keep getting an error:

import os

from doccano_api_client import DoccanoClient

doccano_client = DoccanoClient(
    'https://doccano-dev.test.com/',
    os.environ.get('DOCANNO_ADMIN'),
    os.environ.get('DOCANNO_PWD')
)

I then get the error below

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-100-60f5cf7a0cc5> in <module>
      4     'https://doccano-dev.test.com/',
      5     os.environ.get('DOCANNO_ADMIN'),
----> 6     os.environ.get('DOCANNO_PWD')
      7 )

~/.pyenv/versions/3.6.10/lib/python3.6/site-packages/doccano_api_client/__init__.py in __init__(self, baseurl, username, password)
    107         self.baseurl = baseurl if baseurl[-1] == '/' else baseurl+'/'
    108         self.session = requests.Session()
--> 109         self._login(username, password)
    110 
    111     def _login(

~/.pyenv/versions/3.6.10/lib/python3.6/site-packages/doccano_api_client/__init__.py in _login(self, username, password)
    125         url = 'v1/auth-token'
    126         auth = {'username': username, 'password': password}
--> 127         response = self.post(url, auth)
    128         token = response['token']
    129         self.session.headers.update(

~/.pyenv/versions/3.6.10/lib/python3.6/site-packages/doccano_api_client/__init__.py in post(self, endpoint, data, json, files)
     62         request_url = urljoin(self.baseurl, endpoint)
     63         return self.session.post(
---> 64                 request_url, data=data, files=files, json=json).json()
     65 
     66     def delete(

~/.pyenv/versions/3.6.10/lib/python3.6/site-packages/requests/models.py in json(self, **kwargs)
    896                     # used.
    897                     pass
--> 898         return complexjson.loads(self.text, **kwargs)
    899 
    900     @property

~/.pyenv/versions/3.6.10/lib/python3.6/site-packages/simplejson/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, use_decimal, **kw)
    523             parse_constant is None and object_pairs_hook is None
    524             and not use_decimal and not kw):
--> 525         return _default_decoder.decode(s)
    526     if cls is None:
    527         cls = JSONDecoder

~/.pyenv/versions/3.6.10/lib/python3.6/site-packages/simplejson/decoder.py in decode(self, s, _w, _PY3)
    368         if _PY3 and isinstance(s, bytes):
    369             s = str(s, self.encoding)
--> 370         obj, end = self.raw_decode(s)
    371         end = _w(s, end).end()
    372         if end != len(s):

~/.pyenv/versions/3.6.10/lib/python3.6/site-packages/simplejson/decoder.py in raw_decode(self, s, idx, _w, _PY3)
    398             elif ord0 == 0xef and s[idx:idx + 3] == '\xef\xbb\xbf':
    399                 idx += 3
--> 400         return self.scan_once(s, idx=_w(s, idx).end())

JSONDecodeError: Expecting value: line 2 column 1 (char 1)
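The traceback shows how this client builds request URLs: it appends a trailing slash to `baseurl` and then calls `urljoin(self.baseurl, endpoint)`, here with `v1/auth-token`. Rebuilding that URL by hand shows exactly where the login request goes. The newer client's error elsewhere on this page targets `/v1/auth/login/` instead, so a mismatch between client and server versions (the server answering with a non-JSON page) is one possibility worth checking, not a confirmed diagnosis. `request_url` is a hypothetical helper and `example.com` a placeholder host:

```python
from urllib.parse import urljoin


def request_url(baseurl: str, endpoint: str) -> str:
    """Rebuild the URL the way the client in the traceback does."""
    # Same normalization as the client's __init__: make sure baseurl ends with "/".
    baseurl = baseurl if baseurl.endswith("/") else baseurl + "/"
    return urljoin(baseurl, endpoint)


print(request_url("https://doccano-dev.test.com", "v1/auth-token"))
# https://doccano-dev.test.com/v1/auth-token
# The trailing slash matters for sub-paths: without it urljoin replaces the last segment.
print(urljoin("https://example.com/doccano", "v1/auth-token"))
# https://example.com/v1/auth-token
print(request_url("https://example.com/doccano", "v1/auth-token"))
# https://example.com/doccano/v1/auth-token
```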

How to download data?

When I try to call the following code
doc_download = doccano_client.get_doc_download(2, 'json')
print(doc_download.text)
<!doctype html>

<title>doccano - doccano</title>
<style>#nuxt-loading{background:#fff;visibility:hidden;opacity:0;position:absolute;left:0;right:0;top:0;bottom:0;display:flex;justify-content:center;align-items:center;flex-direction:column;animation:nuxtLoadingIn 10s ease;-webkit-animation:nuxtLoadingIn 10s ease;animation-fill-mode:forwards;overflow:hidden}@keyframes nuxtLoadingIn{0%{visibility:hidden;opacity:0}20%{visibility:visible;opacity:0}100%{visibility:visible;opacity:1}}@-webkit-keyframes nuxtLoadingIn{0%{visibility:hidden;opacity:0}20%{visibility:visible;opacity:0}100%{visibility:visible;opacity:1}}#nuxt-loading>div,#nuxt-loading>div:after{border-radius:50%;width:5rem;height:5rem}#nuxt-loading>div{font-size:10px;position:relative;text-indent:-9999em;border:.5rem solid #f5f5f5;border-left:.5rem solid #fff;-webkit-transform:translateZ(0);-ms-transform:translateZ(0);transform:translateZ(0);-webkit-animation:nuxtLoading 1.1s infinite linear;animation:nuxtLoading 1.1s infinite linear}#nuxt-loading.error>div{border-left:.5rem solid #ff4500;animation-duration:5s}@-webkit-keyframes nuxtLoading{0%{-webkit-transform:rotate(0);transform:rotate(0)}100%{-webkit-transform:rotate(360deg);transform:rotate(360deg)}}@keyframes nuxtLoading{0%{-webkit-transform:rotate(0);transform:rotate(0)}100%{-webkit-transform:rotate(360deg);transform:rotate(360deg)}}</style><script>window.addEventListener("error",function(){var e=document.getElementById("nuxt-loading");e&&(e.className+=" error")})</script>
Loading...
<script>window.__NUXT__={config:{_app:{basePath:"/",assetsPath:"/static/_nuxt/",cdnURL:null}}}</script> <script src="/static/_nuxt/2b323e7.js"></script><script src="/static/_nuxt/9166e9e.js"></script><script src="/static/_nuxt/0f52d28.js"></script><script src="/static/_nuxt/c5544a6.js"></script>

How do I get the downloaded data?
  • Operating System: Windows
  • Python Version: 3.10.2
  • Package Version: doccano 1.5.5, doccano-client 1.0.3
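The body printed above is doccano's web frontend (the Nuxt loading page), not an export, which suggests the request was answered by the frontend rather than by an API view. A minimal guard, sketched here as a hypothetical helper that is not part of doccano-client, checks the `Content-Type` header and the first bytes of the body before a response is treated as exported data:

```python
def looks_like_html(content_type: str, body: bytes) -> bool:
    """Return True if a response appears to be an HTML page rather than exported data.

    Hypothetical helper: a heuristic, not an API of doccano-client.
    """
    # The server usually labels HTML pages explicitly.
    if "text/html" in content_type.lower():
        return True
    # Fall back to sniffing the start of the body, ignoring leading whitespace.
    head = body.lstrip()[:15].lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

With the response from the snippet above, the check would be `looks_like_html(doc_download.headers.get("Content-Type", ""), doc_download.content)`.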

create_document lacks metadata field

Hi,

I really appreciate your efforts in making this powerful and time-saving tool (until we see the feature requests being merged in doccano). One detail the create_document function lacks, though, is the metadata field, which is extremely useful for keeping external annotations with the datasets.

That's it; thanks for the hard work.
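Until `create_document` accepts metadata, a caller could build the request body themselves. The sketch below is a hypothetical helper, and it assumes that the server's document endpoint accepts a `meta` object next to `text`; the field name and its expected encoding should be checked against the doccano version in use.

```python
from typing import Any, Dict, Optional


def document_payload(text: str, metadata: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON body for creating a document with metadata.

    Hypothetical helper; assumes the server accepts a ``meta`` object next to ``text``.
    """
    payload: Dict[str, Any] = {"text": text}
    # Only attach metadata when some was given, so the server default applies otherwise.
    if metadata:
        payload["meta"] = dict(metadata)  # copy so later changes by the caller don't leak in
    return payload
```

The body could then be sent through the client's generic `post(endpoint, data, json, files)` method seen in the tracebacks, e.g. `doccano_client.post(f"v1/projects/{project_id}/docs", json=document_payload(text, meta))`; that endpoint path is likewise an assumption.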

Can't upload documents due to CSRF token missing or incorrect.

How to reproduce the behavior

Hello, I'm not sure whether this issue belongs in the client or in the doccano repository, so if this is the wrong place, please tell me and I'll move it.

Doccano rejects upload calls from doccano-client because of CSRF protection: all POST requests are rejected because no CSRF token is provided in the header.
Some examples:

>>> import os
>>> from doccano_api_client import DoccanoClient
>>> doccano_client = DoccanoClient(
...     os.environ["doccano_api_host"],
...     os.environ["doccano_api_login"],
...     os.environ["doccano_api_password"],
... )
>>> doccano_client.get_me()  # <- GET request, ok
{'id': 1, 'username': 'admin', 'is_superuser': True}
>>> doccano_client.get_document_list(1)  # <- GET request, ok
{'count': 0, 'next': None, 'previous': None, 'results': []}

And now POST requests:

>>> doccano_client.post_approve_labels(1, 1)
{'detail': 'CSRF Failed: CSRF token missing or incorrect.'}

The document upload is a more interesting case: instead of returning a proper JSON error, the server sends an HTML page, and the client raises an exception:

>>> r = doccano_client.post_doc_upload(
...     1,
...     file_format='json',
...     file_name='fl.json',
...     file_path=doccano_input_dir)
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-22-e1e0971d4839> in <module>
      3     file_format='json',
      4     file_name='fl.json',
----> 5     file_path=doccano_input_dir)

<path>/lib/python3.7/site-packages/doccano_api_client/__init__.py in post_doc_upload(self, project_id, file_format, file_name, file_path)
    546             ),
    547             files=files,
--> 548             data=data
    549         )
    550 

<path>/lib/python3.7/site-packages/doccano_api_client/__init__.py in post(self, endpoint, data, json, files)
     62         request_url = urljoin(self.baseurl, endpoint)
     63         return self.session.post(
---> 64                 request_url, data=data, files=files, json=json).json()
     65 
     66     def delete(

<path>/lib/python3.7/site-packages/requests/models.py in json(self, **kwargs)
    898                     # used.
    899                     pass
--> 900         return complexjson.loads(self.text, **kwargs)
    901 
    902     @property

<path>/lib/python3.7/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    346             parse_int is None and parse_float is None and
    347             parse_constant is None and object_pairs_hook is None and not kw):
--> 348         return _default_decoder.decode(s)
    349     if cls is None:
    350         cls = JSONDecoder

<path>/lib/python3.7/json/decoder.py in decode(self, s, _w)
    335 
    336         """
--> 337         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338         end = _w(s, end).end()
    339         if end != len(s):

<path>/lib/python3.7/json/decoder.py in raw_decode(self, s, idx)
    353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
--> 355             raise JSONDecodeError("Expecting value", s, err.value) from None
    356         return obj, end

JSONDecodeError: Expecting value: line 2 column 1 (char 1)

If we dig deeper and bypass the .json() call, we get the following result (I know I'm not supposed to use the private API, but otherwise I can't get the real error message):

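from urllib.parse import urljoin

project_id = 1
url = 'v1/projects/{project_id}/docs/upload'.format(project_id=project_id)
request_url = urljoin(doccano_client.baseurl, url)  # same join as DoccanoClient.post

# Send the file as multipart form data (files=, not data=), but keep the
# raw response instead of calling .json() on it.
with open(f"{doccano_input_dir}/fl.json", "rb") as fp:
    files = {'file': ('fl.jsonl', fp)}
    resp = doccano_client.session.post(request_url, files=files)

print(resp.status_code)
print(resp.text)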
============
403 Forbidden

Forbidden (403)
CSRF verification failed. Request aborted.

Help
Reason given for failure:
    CSRF token missing or incorrect.

In general, this can occur when there is a genuine Cross Site Request Forgery, or when Django’s CSRF mechanism (https://docs.djangoproject.com/en/3.2/ref/csrf/) has not been used correctly. For POST forms, you need to ensure:
  • Your browser is accepting cookies.
  • The view function passes a request to the template’s render method.
  • In the template, there is a {% csrf_token %} template tag inside each POST form that targets an internal URL.
  • If you are not using CsrfViewMiddleware, then you must use csrf_protect on any views that use the csrf_token template tag, as well as those that accept the POST data.
  • The form has a valid CSRF token. After logging in in another browser tab or hitting the back button after a login, you may need to reload the page with the form, because the token is rotated after a login.

You’re seeing the help section of this page because you have DEBUG = True in your Django settings file. Change that to False, and only the initial error message will be displayed.

You can customize this page using the CSRF_FAILURE_VIEW setting.

Environment:

  • Operating System: Debian Buster
  • Python Version: 3.7.8
  • Package Version: 1.0.1
  • Doccano server: built from the tag v1.3.1
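Django's CSRF middleware accepts an unsafe request such as POST when the `X-CSRFToken` header carries the value of the `csrftoken` cookie. One possible workaround, sketched under the assumption that the server has already set a `csrftoken` cookie on the client's `requests` session, is to copy that cookie into a default header. `csrf_headers` is a hypothetical helper, not part of doccano-client; it only builds the header dict, so it can be tested without a server.

```python
from typing import Dict, Mapping, Optional


def csrf_headers(
    cookies: Mapping[str, str],
    cookie_name: str = "csrftoken",    # Django's default CSRF_COOKIE_NAME
    header_name: str = "X-CSRFToken",  # header Django reads by default (CSRF_HEADER_NAME)
) -> Dict[str, str]:
    """Return a header dict that echoes the CSRF cookie back to the server (hypothetical helper)."""
    token: Optional[str] = cookies.get(cookie_name)
    if not token:
        # No CSRF cookie on the session: nothing to send, so POSTs would still be rejected.
        return {}
    return {header_name: token}
```

Because `requests`' cookie jar supports `.get(name)`, usage could look like `doccano_client.session.headers.update(csrf_headers(doccano_client.session.cookies))`.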

CSRF check fails on POST requests to HTTPS endpoint

How to reproduce the behaviour

When doing a POST request (e.g. DoccanoClient.post_doc_upload_binary) over HTTPS, the endpoint returns 403 Forbidden with the following content: {"detail": "CSRF Failed: Referer checking failed - no Referer."}.

Upon inspecting the Django documentation (https://docs.djangoproject.com/en/3.2/ref/csrf/#how-it-works), it seems that CSRF protection fails automatically if the request is made over HTTPS and no referer header is present.

I have circumvented this by adding the default {'referer': self.baseurl} header to the requests.Session (#47).

Your Environment

  • Operating System: Ubuntu 21.04
  • Python Version: 3.6.8
  • Package Version: 1.0.2
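The workaround from #47 amounts to sending a same-origin `Referer` header with every request. Below is a minimal sketch of deriving that header from the client's base URL. `referer_header` is a hypothetical helper, not part of doccano-client, and it relies on the behaviour described in the Django 3.2 CSRF docs linked above: the strict Referer check applies only to HTTPS requests and compares the Referer's host with the request's host.

```python
from typing import Dict
from urllib.parse import urlsplit


def referer_header(base_url: str) -> Dict[str, str]:
    """Build a same-origin Referer header for a doccano base URL (hypothetical helper)."""
    parts = urlsplit(base_url)
    if parts.scheme != "https":
        # Django only performs the strict Referer check on HTTPS requests.
        return {}
    # Only scheme and host matter for the check, so the server's own origin is enough.
    return {"Referer": f"{parts.scheme}://{parts.netloc}/"}
```

It could be applied once after constructing the client, e.g. `doccano_client.session.headers.update(referer_header(doccano_client.baseurl))`. Unlike the #47 change, this sketch skips the header on plain HTTP, where it isn't checked. If doccano sits behind a reverse proxy that rewrites the host, the origin may need adjusting.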

Integrating Our Version of doccano-client?

Feature description

Hey y'all,

First of all I just wanna say: Great job on this client!

Our company has also been developing and maintaining its own doccano client since around the time work on this one started. It's currently internal, but we could easily open-source it; hardly any of it is specific to our company's use cases.

The problem is that we don't currently have the people to consistently maintain the package and keep it up to date with all of Doccano's updates. This package, however, seems to have managed that, thanks to its open-source visibility!

An idea: what are y'all's thoughts on uniting the two? We could move our code to a fork of this repo and open a major PR that integrates it right into this package, if you like the results!

We believe our version of the client has a lot of features that this one can greatly benefit from:

  1. Versioning lines up with Doccano release tags, so it is always clear which release of Doccano's API it is compatible with.
  2. All interactions with the API are pythonic, rather than based on raw json. Example below.
  3. VERY heavily unit tested.
  4. Compliant with Black, isort, Flake8, pydocstyle, Pylint, and mypy.

# Fetch a controller for every project on the server
project_controllers = client.projects.all()

# Pick the controller whose project has the given name
my_project_name_controller = next(
    controller
    for controller in project_controllers
    if controller.project.name == "My Project Name"
)

# Update the project's labels
my_project_name_controller.labels.update([Label(...), Label(...)])

# Create a new example (document) in the project
test_example = models.Example(text="This is an example text")
my_project_name_controller.examples.create(test_example)

Would you be interested in this style? There are a few more layers, so it may take a bit more code to maintain in some ways, but it should cut a lot of tech debt and make the package easier to scale. If you're interested, we can port our code to a fork of this repo so you can take a look.

Thank you for your time!
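Point 2 above contrasts typed objects with raw JSON. As a rough illustration of that idea only (this is not the proposed client's code; `Project`, `from_json` and `find_project` are hypothetical, and the JSON field names are assumed), a response dict can be wrapped in a dataclass so that lookups use attributes instead of string keys:

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Project:
    """Typed view of a project record (hypothetical; field names assumed)."""

    id: int
    name: str
    description: str = ""

    @classmethod
    def from_json(cls, payload: Dict[str, Any]) -> "Project":
        # Pick out the modelled fields and ignore any extra keys the server returns.
        return cls(
            id=payload["id"],
            name=payload["name"],
            description=payload.get("description", ""),
        )


def find_project(payloads: List[Dict[str, Any]], name: str) -> Project:
    """Return the first project with the given name; raises StopIteration if none matches."""
    return next(p for p in map(Project.from_json, payloads) if p.name == name)
```

The trade-off mentioned in the issue shows up even here: one extra layer to maintain, in exchange for attribute access and a single place where the JSON shape is interpreted.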

Getting empty result with get_doc_download()

Hi, I am trying to use this API to download the annotated data. When I invoke the get_doc_download method, I get a 200 status code but no data: the raw attribute of the response object contains an empty byte array. Am I missing something about the behaviour of this method? Other API methods seem to work fine.
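One thing worth checking: with requests' default `stream=False`, the body is read into `Response.content` when the request completes. By then `Response.raw`, the underlying stream, has already been consumed and reads back as empty, so reading `content` instead may be all that is needed. The sketch below writes it to disk. `save_export` is a hypothetical helper, and it accepts any object with `status_code` and `content` attributes, such as the `requests.Response` that `get_doc_download` returns.

```python
from pathlib import Path
from typing import Protocol


class ResponseLike(Protocol):
    """The two attributes of requests.Response that this sketch relies on."""

    status_code: int
    content: bytes


def save_export(response: ResponseLike, path: str) -> int:
    """Write a download response's body to ``path`` and return the number of bytes written."""
    if response.status_code != 200:
        raise RuntimeError(f"download failed with HTTP {response.status_code}")
    if not response.content:
        # An empty .content means the server really sent nothing (unlike an exhausted .raw).
        raise ValueError("the server returned an empty body")
    return Path(path).write_bytes(response.content)
```

For example: `save_export(doccano_client.get_doc_download(1, 'json'), 'export.json')`.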

Hi there, about the upload failure

#16
#13
#50

I know this issue has already been reported.

I tried the beta client, but it doesn't work either.

Because labeling data is iterative work, uploading data is important to the whole process.

I hope you guys can solve it ASAP.

Thank you for all of your work. It's making my job easy.

Cannot Ingest Data with the API

How to reproduce the behaviour

Hi, I have an issue when I try to use post_doc_upload_binary(). Here is a snippet of my code:

import os

from doccano_api_client import DoccanoClient

# instantiate a client and log in to a Doccano instance
doccano_client = DoccanoClient(
    'http://localhost:8000',
    'admin',
    'password'
)

# get basic information about the authorized user
r_me = doccano_client.get_me()
api_token = doccano_client._login("admin", "password")

# print the details from the above query
print(api_token)

with open('annotation_guide.md', 'r') as file:
    guideline = file.read()

project = doccano_client.create_project(
    "Integration Doccano",
    "This is a test",
    project_type="SequenceLabeling",
    guideline=guideline,
    randomize_document_order=False,
    collaborative_annotation=True,
)

print(doccano_client.get_project_list())

file_path = "./some_path"
label_path = "./label_config.json"

files = os.listdir(file_path)

doccano_client.post_doc_upload_binary(project['id'], files, format="plain")

It returns this error:

[2021-06-22 13:59:58,940: INFO/MainProcess] Task api.tasks.injest_data[14eabc0b-c104-40e8-ac23-3f068f872e8b] received
[2021-06-22 13:59:58,946: ERROR/ForkPoolWorker-2] Task api.tasks.injest_data[14eabc0b-c104-40e8-ac23-3f068f872e8b] raised unexpected: Http404('No Project matches the given query.')
Traceback (most recent call last):
  File "/Users/enzo/Workspace/Testing/doccano-ft/dft_env/lib/python3.9/site-packages/django/shortcuts.py", line 76, in get_object_or_404
    return queryset.get(*args, **kwargs)
  File "/Users/enzo/Workspace/Testing/doccano-ft/dft_env/lib/python3.9/site-packages/django/db/models/query.py", line 435, in get
    raise self.model.DoesNotExist(
api.models.Project.DoesNotExist: Project matching query does not exist.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/enzo/Workspace/Testing/doccano-ft/dft_env/lib/python3.9/site-packages/celery/app/trace.py", line 450, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/Users/enzo/Workspace/Testing/doccano-ft/dft_env/lib/python3.9/site-packages/celery/app/trace.py", line 731, in __protected_call__
    return self.run(*args, **kwargs)
  File "/Users/enzo/Workspace/Testing/doccano-ft/dft_env/lib/python3.9/site-packages/backend/api/tasks.py", line 90, in injest_data
    project = get_object_or_404(Project, pk=project_id)
  File "/Users/enzo/Workspace/Testing/doccano-ft/dft_env/lib/python3.9/site-packages/django/shortcuts.py", line 78, in get_object_or_404
    raise Http404('No %s matches the given query.' % queryset.model._meta.object_name)
django.http.response.Http404: No Project matches the given query.

It seems to work fine without the API, directly via the doccano frontend, but it fails through the API.

Your Environment

  • Operating System:
  • Python Version: 3.9.5
  • Package Version: 1.0.2
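Separately from the server-side error, note that `os.listdir()` returns bare file names relative to the directory it lists. The `files` list in the snippet above therefore only resolves if the script runs inside `./some_path`. Below is a small sketch that collects full paths instead. The helper name and the `.txt` filter are assumptions for illustration, and whether `post_doc_upload_binary` expects paths or open file objects should be checked against the installed client.

```python
import os
from typing import List


def list_upload_files(directory: str, suffix: str = ".txt") -> List[str]:
    """Return sorted full paths of regular files in ``directory`` ending with ``suffix``."""
    paths = []
    for name in os.listdir(directory):
        # listdir yields names only; join with the directory to get a usable path.
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name.endswith(suffix):
            paths.append(path)
    return sorted(paths)
```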

Upload file failed

Hello, I get an error when uploading a document through the API, even though the document does appear in the Doccano instance's database. I look forward to your reply; thank you very much!

r_du = doccano_client.post_doc_upload(8, "csv", "data_xa.csv", r"D:\WPSCloud")

Error:

Traceback (most recent call last):
  File "D:/doccano/doccano-client/doccano_api_client/example.py", line 113, in <module>
    r_du = doccano_client.post_doc_upload(8, "csv", "data_xa.csv", "D:\WPSCloud")
  File "D:\doccano\doccano-client\doccano_api_client\__init__.py", line 521, in post_doc_upload
    data=data
  File "D:\doccano\doccano-client\doccano_api_client\__init__.py", line 64, in post
    request_url, data=data, files=files, json=json).json()
  File "E:\Anaconda\lib\site-packages\requests\models.py", line 885, in json
    return complexjson.loads(self.text, **kwargs)
  File "E:\Anaconda\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "E:\Anaconda\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "E:\Anaconda\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
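`Expecting value: line 1 column 1 (char 0)` is what `json.loads` raises for an empty string, or for any body whose first character cannot start a JSON value. That is consistent with the observation that the upload itself succeeded: the client calls `.json()` on every response, including ones with no body. Below is a hedged sketch of a more tolerant decoder (hypothetical, not the client's code). It returns `None` for an empty body; otherwise, instead of a bare decode error, it reports the status code and the start of the body.

```python
import json
from typing import Any


class UnexpectedResponse(Exception):
    """Raised when the server answers with something other than JSON."""


def decode_json_body(status_code: int, text: str) -> Any:
    """Decode a JSON response body, tolerating empty bodies (hypothetical helper)."""
    if not text.strip():
        # e.g. a successful upload that returns no content
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        snippet = text.strip()[:200]
        raise UnexpectedResponse(
            f"HTTP {status_code}: expected JSON, got {snippet!r}"
        ) from exc
```

Applied to a `requests.Response`, this would be called as `decode_json_body(resp.status_code, resp.text)` in place of `resp.json()`.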

project_id is hard-coded in post_doc_upload_binary

On line 585 of __init__.py:
return self.post("v1/projects/1/upload", json=upload_data)

should be changed to
return self.post('v1/projects/{project_id}/upload'.format(project_id=project_id), json=upload_data)
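The corrected line can also be checked in isolation. The client joins endpoints to its base URL with `urljoin` (see the `post` frame in the tracebacks above), and `urljoin` drops the last path segment of a base URL that lacks a trailing slash. The hypothetical helper below normalises that before building the per-project upload URL. It is an illustration only, not the client's code.

```python
from urllib.parse import urljoin


def project_upload_url(base_url: str, project_id: int) -> str:
    """Build the upload URL for a given project instead of hard-coding project 1."""
    if not base_url.endswith("/"):
        # Without a trailing slash, urljoin would replace the last path segment.
        base_url += "/"
    return urljoin(base_url, "v1/projects/{project_id}/upload".format(project_id=project_id))
```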

Publish New Version to PyPI with the Beta Client

Feature description

Thanks for the prompt reviews of my code introducing the beta client! Now, coinciding with the steps forward in #60, can we publish a new version to PyPI with this code? Once we do, we can introduce new changes to make the beta client compatible with 1.6+.

Pip package

I'm not sure what is required (multiple things, most likely), but having a pip wheel would help adoption. I'm opening this as a tracking issue, since I didn't notice any install option apart from the usual git clone. 👍

pip install doccano-client
