
api-inference-community's Introduction

This repository enables third-party libraries integrated with huggingface_hub to create their own Docker images so that the widgets on the hub can work like the transformers ones do.

The hardware to run the API will be provided by Hugging Face for now.

The docker_images/common folder is intended to be a starting point for all new libs that want to be integrated.

Adding a new container from a new lib.

  1. Copy the docker_images/common folder into a new folder named after your library, e.g. docker_images/example.

  2. Edit:

    • docker_images/example/requirements.txt
    • docker_images/example/app/main.py
    • docker_images/example/app/pipelines/{task_name}.py

    to implement the desired functionality. All required code is marked with IMPLEMENT_THIS markup (a minimal sketch of such a pipeline file is shown after this list).

  3. Remove:

    • Any pipeline files in docker_images/example/app/pipelines/ that are not used.
    • Any tests associated with deleted pipelines in docker_images/example/tests.
    • Any imports of the pipelines you deleted from docker_images/example/app/pipelines/__init__.py.
  4. Feel free to customize anything required by your lib anywhere you want. The only real requirement is to honor the HTTP endpoints, in the same fashion as the common folder, for all your supported tasks.

  5. Edit example/tests/test_api.py to add TESTABLE_MODELS.

  6. Pass the test suite pytest -sv --rootdir docker_images/example/ docker_images/example/

  7. Submit your PR and enjoy!
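For reference, here is a minimal sketch of what a pipeline file from step 2 might look like. The base class import and the exact return format are assumptions modelled on the common template's general shape; check docker_images/common for the authoritative version.

# docker_images/example/app/pipelines/text_classification.py (illustrative sketch)
from typing import Dict, List

from app.pipelines import Pipeline  # base class assumed to come from the common template


class TextClassificationPipeline(Pipeline):
    def __init__(self, model_id: str):
        # IMPLEMENT_THIS: download and load the model for `model_id`
        # using your library's own loading utilities.
        self.model = ...

    def __call__(self, inputs: str) -> List[Dict[str, float]]:
        # IMPLEMENT_THIS: run inference and return a list of
        # {"label": ..., "score": ...} dicts, as the widget expects.
        return [{"label": "POSITIVE", "score": 1.0}]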

Going the full way

Doing the first 7 steps is good enough to get started; however, going through the steps below lets you anticipate and correct problems early on. Maintainers will help you along the way if you don't feel confident following those steps yourself.

  1. Test your creation within a docker
./manage.py docker MY_MODEL

should work and respond on port 8000. For instance, curl -X POST -d "test" http://localhost:8000 if the pipeline deals with simple text.

If it doesn't work out of the box and/or docker is slow for some reason, you can test locally (using your local Python environment) with:

./manage.py start MY_MODEL

  2. Test that your docker uses the cache properly.

When doing subsequent docker launches with the same model_id, the docker should start up very fast and not re-download the whole model. If you see the model/repo being downloaded over and over, the cache is not being used correctly. You can edit the docker_images/{framework}/Dockerfile and add an environment variable (by default HUGGINGFACE_HUB_CACHE is assumed), or edit your code directly, so that the model files end up in the /data folder.
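For instance, if your pipeline downloads weights itself, a minimal sketch (assuming huggingface_hub's snapshot_download and the /data convention described above; the model id is a placeholder):

import os

from huggingface_hub import snapshot_download

# Point downloads at the persistent /data volume so that subsequent docker
# launches with the same model_id reuse the already-downloaded files.
cache_dir = os.environ.get("HUGGINGFACE_HUB_CACHE", "/data")
model_dir = snapshot_download(repo_id="MY_MODEL", cache_dir=cache_dir)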

  3. Add a docker test.

Edit the tests/test_dockers.py file to add a new test for your new framework (def test_{framework}(self): for instance). As a baseline you should have one line per task in this test function, each with a real working model on the hub. Those tests are relatively slow but will automatically check that your API returns the correct errors and that the cache works properly. To run those tests you can simply do:

RUN_DOCKER_TESTS=1 pytest -sv tests/test_dockers.py::DockerImageTests::test_{framework}
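As a rough illustration, such a test might look like the sketch below; the helper method names and the model id are hypothetical, so mirror the existing test functions in tests/test_dockers.py rather than this snippet.

import unittest


class DockerImageTests(unittest.TestCase):
    def test_example(self):
        # One line per supported task, each pointing at a real working model
        # on the hub (the model id below is a placeholder).
        self.framework_docker_test("example", "text-classification", "my-org/my-model")
        # Also check that invalid inputs are answered with the correct errors.
        self.framework_invalid_test("example")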

Modifying files within api-inference-community/{routes,validation,..}.py.

If you ever come across a bug within the api-inference-community/ package, or want to update it, the development process is slightly more involved.

  • First, make sure you need to change this package; each framework is very autonomous, so if your code can get away with being standalone, go that way first as it's much simpler.
  • If you can make the change only in api-inference-community without depending on it, that's also a great option. Make sure to add the proper tests to your PR.
  • Finally, the best way to go is to develop locally using the manage.py command:
  • Do the necessary modifications within api-inference-community first.
  • Install it locally in your environment with pip install -e .
  • Install your package dependencies locally.
  • Run your webserver locally: ./manage.py start --framework example --task audio-source-separation --model-id MY_MODEL
  • When everything is working, you will need to split your PR in two: one for the api-inference-community part, and a second one for your package-specific modifications, which will only land once the api-inference-community tag has landed.
  • This workflow is still a work in progress, don't hesitate to ask questions to the maintainers.

A similar command, ./manage.py docker --framework example --task audio-source-separation --model-id MY_MODEL, will launch the server, but this time in a protected, controlled docker environment, making sure the behavior will be exactly the one of the API.

Available tasks

  • Automatic speech recognition: Input is a file, output is a dict of the words understood to be said within the file
  • Text generation: Input is a text, output is a dict of generated text
  • Image recognition: Input is an image, output is a dict of generated text
  • Question answering: Input is a question + some context, output is a dict containing the necessary information to locate the answer to the question within the context.
  • Audio source separation: Input is some audio, and the output is n audio files that sum up to the original audio but contain the individual sources of sound (either speakers or instruments, for instance).
  • Token classification: Input is some text, and the output is a list of entities mentioned in the text. Entities can be anything remarkable like locations, organisations, persons, times, etc.
  • Text to speech: Input is some text, and the output is an audio file saying the text.
  • Sentence Similarity: Input is a sentence and a list of reference sentences, and the output is a list of similarity scores.
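As an illustration of the input/output shapes above, here is a hedged sketch of a question answering request against a locally started server. It assumes the server started with ./manage.py start listens on port 8000 like the docker variant and accepts the standard question-answering payload; the question, context, and expected output keys are placeholders.

import requests

response = requests.post(
    "http://localhost:8000",
    json={
        "inputs": {
            "question": "Where is the library integrated?",
            "context": "The library is integrated with the Hugging Face Hub.",
        }
    },
)
# Expected output shape: a dict locating the answer within the context,
# e.g. {"answer": ..., "score": ..., "start": ..., "end": ...}
print(response.json())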

api-inference-community's People

Contributors

adrinjalali, apolinario, benjaminbossan, calpt, cndn, davanstrien, dependabot[bot], devpramod, flexthink, hellowaywewe, julien-c, kahne, lewtun, lpw0, merveenoyan, narsil, nateraw, ooraph, osanseviero, patrickvonplaten, radames, rwightman, sheonhan, sijunhe, stephenhodgson, titu1994, tomaarsen, tparcollet, vaibhavs10, xianbaoqian


api-inference-community's Issues

Prompting warnings in widget response when the inference doesn't work

Hello,

Any warning gets appended in scikit-learn pipelines, even if the inference is successful. I suggest we check whether the prediction and the response look good, and only return warnings if they don't. Otherwise the warning gets prepended on top of the response and breaks the widget. (What I observed was a version mismatch, which I know doesn't happen in production, but I don't think a version mismatch, or any other warning-level rather than error-level message, should concern the user if the predictions are returned fine.)
(This is something I observed for the text classification pipeline because I repurposed code from the tabular pipelines; let me know if this isn't the case.) Also feel free to ignore this issue if it doesn't make sense. I think the code below should be refactored.

for warning in record:
    _warnings.append(f"{warning.category.__name__}({warning.message})")

for warning in self._load_warnings:
    _warnings.append(f"{warning.category.__name__}({warning.message})")

if _warnings:
    for warning in _warnings:
        logger.warning(warning)

    if not exception:
        # we raise an error if there are any warnings, so that routes.py
        # can catch and return a non 200 status code.
        ### THIS IS THE PART I COMPLAIN ON :')
        error = {
            "error": "There were warnings while running the model.",
            "output": res,
        }
        raise ValueError(json.dumps(error))
    else:
        # if there was an exception, we raise it so that routes.py can
        # catch and return a non 200 status code.
        raise exception

return res
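A hedged sketch of the behaviour suggested above (same variable names as the snippet; this is the proposal, not the current code):

# Log warnings, but only turn them into a non-200 response when the
# prediction itself failed; plain warnings should not break the widget.
for warning in list(record) + list(self._load_warnings):
    logger.warning(f"{warning.category.__name__}({warning.message})")

if exception:
    # Prediction failed: propagate so that routes.py returns a non-200 status.
    raise exception

# Prediction succeeded: return the result and keep warnings in the logs only.
return res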

WDYT @adrinjalali @BenjaminBossan

Have collections of requests

It would be nice to have something like a Postman collection of requests somewhere (it doesn't necessarily have to be a Postman collection), where requests, expected bodies, and outputs are collected in a tidy way. @Narsil @OlivierDehaene
For context, I want to see where warnings are put in order to fix this issue. I thought it would be nice to have in general.

Publish on conda-forge

For now #67 is installing everything from conda-forge except api-inference-community. Having this package available on conda-forge would make the sklearn builds much faster to run, since a single mamba install command would handle all the installations, and that one's pretty fast.

Enable custom inference classifier in audio classification pipeline

Problem description
Today the audio classification pipeline https://github.com/huggingface/api-inference-community/blob/b912a4bc5b0ed71836f37525eb4aedc65fb65a10/docker_images/speechbrain/app/pipelines/audio_classification.py only supports inference with SpeechBrain's native EncoderClassifier and is limited to that class, but many other wav2vec examples from Hugging Face use other, custom classifiers.

It affects pretrained models that use a custom classifier, like https://huggingface.co/TalTechNLP/voxlingua107-xls-r-300m-wav2vec or https://huggingface.co/speechbrain/emotion-recognition-wav2vec2-IEMOCAP/, when used with the Hosted Inference API on the Hugging Face Hub.

Solution
The solution is to enable custom classification model types, besides EncoderClassifier, when instantiating AudioClassificationPipeline; these can also be inherited from the Pretrained super class.

🐛 Missing values converted to string "null" and raises unknown error

I pushed the pipeline in this notebook with skops.hub_utils. The pipeline has a missing value imputer for numerical values, so I expected it to work. However, somewhere between pushing and the Hub consuming the model card, NaNs are converted to .nan in the model card metadata, which the widget shows as the string null (in the first three rows of X_test, there were two NaNs).
There are two things to do. First, for the try/except block below:

try:
    with warnings.catch_warnings(record=True) as record:
        # We convert the inputs to a pandas DataFrame, and use self.columns
        # to order the columns in the order they're expected, ignore extra
        # columns given if any, and put NaN for missing columns.
        data = pd.DataFrame(inputs["data"], columns=self.columns)
        res = self.model.predict(data).tolist()
except Exception as e:
    exception = e

The exception is not raised properly; instead you get an error saying that res wasn't assigned previously. We should raise the exception itself so that people can see when there's something wrong with the prediction or the pipeline itself, like below (@Narsil's solution):

if exception:
      raise exception
return res

Secondly, we should convert those null values (pinging @mishig25 here): they should be converted to NaN. When I manually swapped null with NaN in the metadata through the widget, I realized it works fine. (Weird, I know; that's how scikit-learn expects them, I guess 😂)

Pinging @adrinjalali here too!

(below works for api-inference-community requests)

                 # We convert the inputs to a pandas DataFrame, and use self.columns
                 # to order the columns in the order they're expected, ignore extra
                 # columns given if any, and put NaN for missing columns.
-                data = pd.DataFrame(inputs["data"], columns=self.columns)
+                data = {
+                    k: [v if v != "null" else None for v in values]
+                    for k, values in inputs["data"].items()
+                }
+                data = pd.DataFrame(data, columns=self.columns)
                 res = self.model.predict(data).tolist()
         except Exception as e:
             exception = e

Inference widgets - local app

Hi guys - thank you for open sourcing these amazing inference widgets! I managed to test a token classifier using Flair on my local computer. Very practical for demo testing!

I have some questions:

  1. When running a widget app locally, does it run on our CPU or on an HF server?
  2. Is the model cached somewhere on the local computer? (It takes some time to load it the first time.)
  3. Can we use our own local model instead of a version deployed on HF?

[Startup Plan]: Failed to launch GPU inference

Hi community,

I have subscribed to a 7-day free trial of the Startup Plan and I wish to test the GPU inference API on this model: https://huggingface.co/Matthieu/stsb-xlm-r-multilingual-custom

However, when using the below code:

import json
import requests

API_URL = "https://api-inference.huggingface.co/models/Matthieu/stsb-xlm-r-multilingual-custom"
headers = {"Authorization": "Bearer API_ORG_TOKEN"}

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

payload1 = {"inputs": "Navigateur Web : Ce logiciel permet d'accéder à des pages web depuis votre ordinateur. Il en existe plusieurs téléchargeables gratuitement comme Google Chrome ou Mozilla. Certains sont même déjà installés comme Safari sur Mac OS et Edge sur Microsoft.", "options": {"use_cache": False, "use_gpu": True}}

sentence_embeddings1 = query(payload1)
print(sentence_embeddings1)

I got the following error: {'error': 'Model Matthieu/stsb-xlm-r-multilingual-custom is currently loading', 'estimated_time': 44.490336920000004}

Do I have to wait some time until the model is loaded for GPU inference?

Thanks!
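A hedged workaround sketch: the hosted Inference API documents a wait_for_model option that makes the request block until the model is loaded instead of returning the "currently loading" error. Reusing the query function from the snippet above:

payload1 = {
    "inputs": "Navigateur Web : ...",  # same input as above, shortened here
    "options": {"use_cache": False, "use_gpu": True, "wait_for_model": True},
}
sentence_embeddings1 = query(payload1)
print(sentence_embeddings1)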

Insert and remove from sys path in generic pipelines

Currently in the generic pipeline we simply sys.path.append the path to the snapshot repo. This is fine when running in a docker container once, but for development it can be a bit of a nightmare, especially if you're playing with multiple different repos that have implemented generic pipelines. Since we appended, you'll get previously loaded pipelines instead of the one you expect.

I suggest we do what torch.hub does, and instead sys.path.insert(0, repo_dir), import the module, and then sys.path.remove(repo_dir).

Something like:

import sys
import json
from pathlib import Path
from huggingface_hub import snapshot_download

PIPELINE_FILE = 'pipeline.py'
CONFIG_FILE = 'config.json'


# Taken directly from torch.hub
def import_module(name, path):
    import importlib.util
    from importlib.abc import Loader
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    assert isinstance(spec.loader, Loader)
    spec.loader.exec_module(module)
    return module


def load_pipeline(repo_id, **kwargs):

    if Path(repo_id).is_dir():
        repo_dir = Path(repo_id)
    else:
        repo_dir = Path(snapshot_download(repo_id))

    pipeline_path = repo_dir / PIPELINE_FILE
    sys.path.insert(0, str(repo_dir))  # sys.path entries should be str, not Path
    module = import_module(PIPELINE_FILE, pipeline_path)
    sys.path.remove(str(repo_dir))

    return module.Pipeline(repo_dir)

CC @osanseviero

[Startup Plan] Don't manage to get CPU optimized inference API

Hi community,

I have subscribed to a 7-day free trial of the Startup Plan and I wish to test the CPU-optimized inference API on this model: https://huggingface.co/Matthieu/stsb-xlm-r-multilingual-custom

However, when using the below code:

import json
import requests

API_URL = "https://api-inference.huggingface.co/models/Matthieu/stsb-xlm-r-multilingual-custom"
headers = {"Authorization": "Bearer API_ORG_TOKEN"}

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8")), response.headers.get('x-compute-type')

payload1 = {"inputs": "Navigateur Web : Ce logiciel permet d'accéder à des pages web depuis votre ordinateur. Il en existe plusieurs téléchargeables gratuitement comme Google Chrome ou Mozilla. Certains sont même déjà installés comme Safari sur Mac OS et Edge sur Microsoft.", "options": {"use_cache": False}}

sentence_embeddings1, x_compute_type1 = query(payload1)
print(sentence_embeddings1)
print(x_compute_type1)

I got the sentence embeddings, but the x-compute-type header of my request returns cpu and not cpu+optimized. Do I have to request something to get CPU-optimized inference?

Thanks!

Divide sklearn generation script into tasks

Currently the sklearn generation script only takes the version as an argument. It would be nice to add an option to run it for a given task separately, given that we might add more tasks and it's cumbersome to run it for all tasks and push tons of models to skops-tests every time.
It would be nice if we could do:

def main(version, task):
    ...

if __name__ == "__main__":
    sklearn_version = sys.argv[1]
    task = sys.argv[2]
    main(sklearn_version, task)

It was a minor annoyance: I had to run the script, and in the middle of the run I uploaded 5 models that I didn't need and got a 503.
Feel free to ignore if it's not good :')
@adrinjalali @BenjaminBossan

Remove unimplemented pipelines across the different integrations

Is your feature request related to a problem? Please describe.
Is there any reason why we have a ton of unused pipeline files across the different integrations (example)? I think it would be a lot easier to both navigate the repo and quickly see which integrations have which pipelines enabled if we only included the files that are actually being used.

Describe the solution you'd like

  • Remove unused pipeline files from the different integrations.
  • Update guidance in contributing docs if need be to reflect the fact we shouldn't be copying over all the pipeline template files.

WDYT?

Inference API always returns error: Invalid token

Sorry if this is not the best place to post this issue. I am having an issue with the inference API suddenly, after it has worked perfectly for months:

After seeing '{ "error": "invalid token" }' coming back in response to queries, I created a new API token (the old one was not showing up) and changed the header from Authorization: Bearer api_XXX to Authorization: Bearer hf_YYY as outlined in the docs, but I am still facing the same error.

Any idea what could be the issue? I am so hoping this could be fixed soon.

Misc improvements for Stanza models

A couple of minor improvements, in order of increasing complexity/maintainability. This is relatively low priority, but the first one should be really quick to do and will significantly help users.

  • As a user, I can see a simple code snippet to know how to load the model.
  • As a user, I can easily filter for all PoS vs NER models of Stanza. This means we need to get the right tag in the uploaded models.
  • As a user, I get the right license. We need to fix the model licenses according to https://stanfordnlp.github.io/stanza/available_models.html.

Run production grade private models

Hello,

Following a discussion with @Narsil, he told me that the model hub is actually not meant to be able to load private models (as of now), since api-inference-community was originally intended to promote community libraries that use the hub.

However, in the scope of running production-grade private models, would it be possible to discuss this possibility internally within the model hub team?

Thanks!

Audio-to-regions widget and community API for pyannote.audio

Opening an issue as per @osanseviero's suggestion on Twitter.
Issue imported from pyannote/pyannote-audio#835


pyannote.audio 2.0 will bring a unified pipeline API:

from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
output = pipeline("audio.wav")   # or pipeline({"waveform": np.ndarray, "sample_rate": int})

where output is a pyannote.core.Annotation instance.

I just created a space that allows testing a bunch of pipelines shared on the Hugging Face Hub, but it would be nice if those were testable directly in their own model card.

My understanding is that two things need to happen

Use espnet/kan-bayashi_ljspeech_vits model via inference api

https://github.com/huggingface/huggingface_hub/blob/eeeb0d1b352fb3249541bb62f1f57f41ae3ab4e0/api-inference-community/docker_images/espnet/app/pipelines/automatic_speech_recognition.py#L8

Hey guys. I tried to use the mentioned model via the inference API and the audio that comes back as a response is distorted.

When I import the model and call it directly, it returns nice audio with a nice voice:

from espnet2.bin.tts_inference import Text2Speech
from espnet2.utils.types import str_or_none

tag = 'kan-bayashi/ljspeech_vits'
vocoder_tag = "none"

text2speech = Text2Speech.from_pretrained(
  model_tag=str_or_none(tag),
  vocoder_tag=str_or_none(vocoder_tag),
  device="cpu",
  threshold=0.5,
  minlenratio=0.0,
  maxlenratio=10.0,
  use_att_constraint=False,
  backward_window=1,
  forward_window=3,
  speed_control_alpha=1.0,
  noise_scale=0.333,
  noise_scale_dur=0.333,
)

Running inference API can sometimes be very slow or fail

Following up on this slack discussion.

The problem is that the skops CI sometimes times out after we added tests for the inference API (here is a failed run). We did some further investigation and this is what we can say for now:

  1. The call to the inference API is the slow step
  2. It's only the first call that is slow, repeated calls are fast thanks to warm start (but our tests don't benefit from that).
  3. The more processes there are in parallel, the more likely it is for the call to be slow or to time out, but there is a lot of variance
  4. Even 3 parallel processes can already cause slowdown
  5. This happens even without the retry decorator we use
  6. If a call does not finish within 3 min, in the vast majority of cases it times out after 7 min (there is no timeout set from the client side).
  7. Occasionally, we also get an internal server error

ping @adrinjalali
