
api-inference-community's Introduction

This repository enables third-party libraries integrated with huggingface_hub to create their own Docker images so that the widgets on the hub can work like the transformers ones do.

The hardware to run the API will be provided by Hugging Face for now.

The docker_images/common folder is intended to be a starting point for all new libs that want to be integrated.

Adding a new container from a new lib.

  1. Copy the docker_images/common folder into a new folder named after your library, e.g. docker_images/example.

  2. Edit:

    • docker_images/example/requirements.txt
    • docker_images/example/app/main.py
    • docker_images/example/app/pipelines/{task_name}.py

    to implement the desired functionality. All required code is marked with IMPLEMENT_THIS markup (a minimal sketch of such a pipeline file is shown after this list).

  3. Remove:

    • Any pipeline files in docker_images/example/app/pipelines/ that are not used.
    • Any tests associated with deleted pipelines in docker_images/example/tests.
    • Any imports of the pipelines you deleted from docker_images/example/app/pipelines/__init__.py.
  4. Feel free to customize anything required by your lib anywhere you want. The only real requirement is to honor the HTTP endpoints, in the same fashion as the common folder, for all your supported tasks.

  5. Edit example/tests/test_api.py to add TESTABLE_MODELS.

  6. Pass the test suite pytest -sv --rootdir docker_images/example/ docker_images/example/

  7. Submit your PR and enjoy!
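For reference, here is a minimal sketch of what a pipeline file from step 2 might look like. The base class import and the exact return format are assumptions modelled on the common template's general shape; check docker_images/common for the authoritative version.

# docker_images/example/app/pipelines/text_classification.py (illustrative sketch)
from typing import Dict, List

from app.pipelines import Pipeline  # base class assumed to come from the common template


class TextClassificationPipeline(Pipeline):
    def __init__(self, model_id: str):
        # IMPLEMENT_THIS: download and load the model for `model_id`
        # using your library's own loading utilities.
        self.model = ...

    def __call__(self, inputs: str) -> List[Dict[str, float]]:
        # IMPLEMENT_THIS: run inference and return a list of
        # {"label": ..., "score": ...} dicts, as the widget expects.
        return [{"label": "POSITIVE", "score": 1.0}]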

Going the full way

Doing the first 7 steps is good enough to get started; however, going through the steps below lets you anticipate and correct problems early on. Maintainers will help you along the way if you don't feel confident following those steps yourself.

  1. Test your creation within a docker
./manage.py docker MY_MODEL

should work and respond on port 8000. For instance, curl -X POST -d "test" http://localhost:8000 if the pipeline deals with simple text.

If it doesn't work out of the box and/or docker is slow for some reason, you can test locally (using your local Python environment) with:

./manage.py start MY_MODEL

  2. Test that your docker uses the cache properly.

When doing subsequent docker launches with the same model_id, the docker should start up very fast and not re-download the whole model. If you see the model/repo being downloaded over and over, the cache is not being used correctly. You can edit the docker_images/{framework}/Dockerfile and add an environment variable (by default HUGGINGFACE_HUB_CACHE is assumed), or edit your code directly, so that the model files end up in the /data folder.
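For instance, if your pipeline downloads weights itself, a minimal sketch (assuming huggingface_hub's snapshot_download and the /data convention described above; the model id is a placeholder):

import os

from huggingface_hub import snapshot_download

# Point downloads at the persistent /data volume so that subsequent docker
# launches with the same model_id reuse the already-downloaded files.
cache_dir = os.environ.get("HUGGINGFACE_HUB_CACHE", "/data")
model_dir = snapshot_download(repo_id="MY_MODEL", cache_dir=cache_dir)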

  3. Add a docker test.

Edit the tests/test_dockers.py file to add a new test for your new framework (def test_{framework}(self): for instance). As a baseline you should have one line per task in this test function, each with a real working model on the hub. Those tests are relatively slow but will automatically check that your API returns the correct errors and that the cache works properly. To run those tests you can simply do:

RUN_DOCKER_TESTS=1 pytest -sv tests/test_dockers.py::DockerImageTests::test_{framework}
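As a rough illustration, such a test might look like the sketch below; the helper method names and the model id are hypothetical, so mirror the existing test functions in tests/test_dockers.py rather than this snippet.

import unittest


class DockerImageTests(unittest.TestCase):
    def test_example(self):
        # One line per supported task, each pointing at a real working model
        # on the hub (the model id below is a placeholder).
        self.framework_docker_test("example", "text-classification", "my-org/my-model")
        # Also check that invalid inputs are answered with the correct errors.
        self.framework_invalid_test("example")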

Modifying files within api-inference-community/{routes,validation,..}.py.

If you ever come across a bug within the api-inference-community/ package, or want to update it, the development process is slightly more involved.

  • First, make sure you need to change this package; each framework is very autonomous, so if your code can get away with being standalone, go that way first as it's much simpler.
  • If you can make the change only in api-inference-community without depending on it, that's also a great option. Make sure to add the proper tests to your PR.
  • Finally, the best way to go is to develop locally using the manage.py command:
  • Do the necessary modifications within api-inference-community first.
  • Install it locally in your environment with pip install -e .
  • Install your package dependencies locally.
  • Run your webserver locally: ./manage.py start --framework example --task audio-source-separation --model-id MY_MODEL
  • When everything is working, you will need to split your PR in two: one for the api-inference-community part, and a second one for your package-specific modifications, which will only land once the api-inference-community tag has landed.
  • This workflow is still a work in progress, don't hesitate to ask questions to the maintainers.

A similar command, ./manage.py docker --framework example --task audio-source-separation --model-id MY_MODEL, will launch the server, but this time in a protected, controlled docker environment, making sure the behavior will be exactly the one of the API.

Available tasks

  • Automatic speech recognition: Input is a file, output is a dict of the words understood to be said within the file
  • Text generation: Input is a text, output is a dict of generated text
  • Image recognition: Input is an image, output is a dict of generated text
  • Question answering: Input is a question + some context, output is a dict containing the necessary information to locate the answer to the question within the context.
  • Audio source separation: Input is some audio, and the output is n audio files that sum up to the original audio but contain the individual sources of sound (either speakers or instruments, for instance).
  • Token classification: Input is some text, and the output is a list of entities mentioned in the text. Entities can be anything remarkable like locations, organisations, persons, times, etc.
  • Text to speech: Input is some text, and the output is an audio file saying the text.
  • Sentence Similarity: Input is a sentence and a list of reference sentences, and the output is a list of similarity scores.
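As an illustration of the input/output shapes above, here is a hedged sketch of a question answering request against a locally started server. It assumes the server started with ./manage.py start listens on port 8000 like the docker variant and accepts the standard question-answering payload; the question, context, and expected output keys are placeholders.

import requests

response = requests.post(
    "http://localhost:8000",
    json={
        "inputs": {
            "question": "Where is the library integrated?",
            "context": "The library is integrated with the Hugging Face Hub.",
        }
    },
)
# Expected output shape: a dict locating the answer within the context,
# e.g. {"answer": ..., "score": ..., "start": ..., "end": ...}
print(response.json())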

api-inference-community's People

Contributors

adrinjalali, apolinario, benjaminbossan, calpt, cndn, davanstrien, dependabot[bot], devpramod, flexthink, hellowaywewe, julien-c, kahne, lewtun, lpw0, merveenoyan, narsil, nateraw, ooraph, osanseviero, patrickvonplaten, radames, rwightman, sheonhan, sijunhe, stephenhodgson, titu1994, tomaarsen, tparcollet, vaibhavs10, xianbaoqian


api-inference-community's Issues

Prompting warnings in widget response when the inference doesn't work

Hello,

Any warning gets appended in scikit-learn pipelines, even if the inference is successful. I suggest we check whether the prediction and the response look good, and only return warnings if they don't. Otherwise the warning gets prepended on top of the response and breaks the widget. (What I observed was a version mismatch, which I know doesn't happen in production, but I don't think a version mismatch, or any other warning-level rather than error-level message, should concern the user if the predictions are returned fine.)
(This is something I observed for the text classification pipeline because I repurposed code from the tabular pipelines; let me know if this isn't the case.) Also feel free to ignore this issue if it doesn't make sense. I think the code below should be refactored.

for warning in record:
    _warnings.append(f"{warning.category.__name__}({warning.message})")

for warning in self._load_warnings:
    _warnings.append(f"{warning.category.__name__}({warning.message})")

if _warnings:
    for warning in _warnings:
        logger.warning(warning)

    if not exception:
        # we raise an error if there are any warnings, so that routes.py
        # can catch and return a non 200 status code.
        ### THIS IS THE PART I COMPLAIN ON :')
        error = {
            "error": "There were warnings while running the model.",
            "output": res,
        }
        raise ValueError(json.dumps(error))
    else:
        # if there was an exception, we raise it so that routes.py can
        # catch and return a non 200 status code.
        raise exception

return res
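A hedged sketch of the behaviour suggested above (same variable names as the snippet; this is the proposal, not the current code):

# Log warnings, but only turn them into a non-200 response when the
# prediction itself failed; plain warnings should not break the widget.
for warning in list(record) + list(self._load_warnings):
    logger.warning(f"{warning.category.__name__}({warning.message})")

if exception:
    # Prediction failed: propagate so that routes.py returns a non-200 status.
    raise exception

# Prediction succeeded: return the result and keep warnings in the logs only.
return res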

WDYT @adrinjalali @BenjaminBossan

Have collections of requests

It would be nice to have something like a Postman collection of requests somewhere (it doesn't necessarily have to be a Postman collection), where requests, expected bodies, and outputs are collected in a tidy way. @Narsil @OlivierDehaene
For context, I want to see where warnings are put in order to fix this issue. I thought it would be nice to have in general.

Publish on conda-forge

For now #67 is installing everything from conda-forge except api-inference-community. Having this package available on conda-forge would make the sklearn builds much faster to run, since a single mamba install command would handle all the installations, and that one's pretty fast.

Enable custom inference classifier in audio classification pipeline

Problem description
Today the audio classification pipeline https://github.com/huggingface/api-inference-community/blob/b912a4bc5b0ed71836f37525eb4aedc65fb65a10/docker_images/speechbrain/app/pipelines/audio_classification.py only supports inference with SpeechBrain's native EncoderClassifier and is limited to that class, but many other wav2vec examples from Hugging Face use other, custom classifiers.

It affects pretrained models that use a custom classifier, like https://huggingface.co/TalTechNLP/voxlingua107-xls-r-300m-wav2vec or https://huggingface.co/speechbrain/emotion-recognition-wav2vec2-IEMOCAP/, when used with the Hosted Inference API on the Hugging Face Hub.

Solution
The solution is to enable custom classification model types, besides EncoderClassifier, when instantiating AudioClassificationPipeline; these can also be inherited from the Pretrained super class.

🐛 Missing values converted to string "null" and raises unknown error

I pushed the pipeline in this notebook with skops.hub_utils. The pipeline has a missing value imputer for numerical values, so I expected it to work. However, somewhere between pushing and the Hub consuming the model card, NaNs are converted to .nan in the model card metadata, which the widget shows as the string null (in the first three rows of X_test, there were two NaNs).
There are two things to do. First, for the try/except block below:

try:
    with warnings.catch_warnings(record=True) as record:
        # We convert the inputs to a pandas DataFrame, and use self.columns
        # to order the columns in the order they're expected, ignore extra
        # columns given if any, and put NaN for missing columns.
        data = pd.DataFrame(inputs["data"], columns=self.columns)
        res = self.model.predict(data).tolist()
except Exception as e:
    exception = e

The exception is not raised properly; instead you get an error saying that res wasn't assigned previously. We should raise the exception itself so that people can see when there's something wrong with the prediction or the pipeline itself, like below (@Narsil's solution):

if exception:
      raise exception
return res

Secondly, we should convert those null values (pinging @mishig25 here): they should be converted to NaN. When I manually swapped null with NaN in the metadata through the widget, I realized it works fine. (Weird, I know; that's how scikit-learn expects them, I guess 😂)

Pinging @adrinjalali here too!

(below works for api-inference-community requests)

                 # We convert the inputs to a pandas DataFrame, and use self.columns
                 # to order the columns in the order they're expected, ignore extra
                 # columns given if any, and put NaN for missing columns.
-                data = pd.DataFrame(inputs["data"], columns=self.columns)
+                data = {
+                    k: [v if v != "null" else None for v in values]
+                    for k, values in inputs["data"].items()
+                }
+                data = pd.DataFrame(data, columns=self.columns)
                 res = self.model.predict(data).tolist()
         except Exception as e:
             exception = e

Inference widgets - local app

Hi guys - thank you for open sourcing these amazing inference widgets! I managed to test a token classifier using Flair on my local computer. Very practical for demo testing!

I have some questions:

  1. When running a widget app locally, does it run on our CPU or on an HF server?
  2. Is the model cached somewhere on the local computer? (It takes some time to load it the first time.)
  3. Can we use our own local model instead of a version deployed on HF?

[Startup Plan]: Failed to launch GPU inference

Hi community,

I have subscribed to a 7-day free trial of the Startup Plan and I wish to test the GPU inference API on this model: https://huggingface.co/Matthieu/stsb-xlm-r-multilingual-custom

However, when using the below code:

import json
import requests

API_URL = "https://api-inference.huggingface.co/models/Matthieu/stsb-xlm-r-multilingual-custom"
headers = {"Authorization": "Bearer API_ORG_TOKEN"}

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

payload1 = {"inputs": "Navigateur Web : Ce logiciel permet d'accéder à des pages web depuis votre ordinateur. Il en existe plusieurs téléchargeables gratuitement comme Google Chrome ou Mozilla. Certains sont même déjà installés comme Safari sur Mac OS et Edge sur Microsoft.", "options": {"use_cache": False, "use_gpu": True}}

sentence_embeddings1 = query(payload1)
print(sentence_embeddings1)

I got the following error: {'error': 'Model Matthieu/stsb-xlm-r-multilingual-custom is currently loading', 'estimated_time': 44.490336920000004}

Do I have to wait some time until the model is loaded for GPU inference?

Thanks!
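A hedged workaround sketch: the hosted Inference API documents a wait_for_model option that makes the request block until the model is loaded instead of returning the "currently loading" error. Reusing the query function from the snippet above:

payload1 = {
    "inputs": "Navigateur Web : ...",  # same input as above, shortened here
    "options": {"use_cache": False, "use_gpu": True, "wait_for_model": True},
}
sentence_embeddings1 = query(payload1)
print(sentence_embeddings1)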

Insert and remove from sys path in generic pipelines

Currently in the generic pipeline we simply sys.path.append the path to the snapshot repo. This is fine when running in a docker container once, but for development it can be a bit of a nightmare, especially if you're playing with multiple different repos that have implemented generic pipelines. Since we appended, you'll get previously loaded pipelines instead of the one you expect.

I suggest we do what torch.hub does, and instead sys.path.insert(0, repo_dir), import the module, and then sys.path.remove(repo_dir).

Something like:

import sys
import json
from pathlib import Path
from huggingface_hub import snapshot_download

PIPELINE_FILE = 'pipeline.py'
CONFIG_FILE = 'config.json'


# Taken directly from torch.hub
def import_module(name, path):
    import importlib.util
    from importlib.abc import Loader
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    assert isinstance(spec.loader, Loader)
    spec.loader.exec_module(module)
    return module


def load_pipeline(repo_id, **kwargs):

    if Path(repo_id).is_dir():
        repo_dir = Path(repo_id)
    else:
        repo_dir = Path(snapshot_download(repo_id))

    pipeline_path = repo_dir / PIPELINE_FILE
    sys.path.insert(0, str(repo_dir))  # sys.path entries should be str, not Path
    module = import_module(PIPELINE_FILE, pipeline_path)
    sys.path.remove(str(repo_dir))

    return module.Pipeline(repo_dir)

CC @osanseviero

[Startup Plan] Don't manage to get CPU optimized inference API

Hi community,

I have subscribed to a 7-day free trial of the Startup Plan and I wish to test the CPU-optimized inference API on this model: https://huggingface.co/Matthieu/stsb-xlm-r-multilingual-custom

However, when using the below code:

import json
import requests

API_URL = "https://api-inference.huggingface.co/models/Matthieu/stsb-xlm-r-multilingual-custom"
headers = {"Authorization": "Bearer API_ORG_TOKEN"}

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8")), response.headers.get('x-compute-type')

payload1 = {"inputs": "Navigateur Web : Ce logiciel permet d'accéder à des pages web depuis votre ordinateur. Il en existe plusieurs téléchargeables gratuitement comme Google Chrome ou Mozilla. Certains sont même déjà installés comme Safari sur Mac OS et Edge sur Microsoft.", "options": {"use_cache": False}}

sentence_embeddings1, x_compute_type1 = query(payload1)
print(sentence_embeddings1)
print(x_compute_type1)

I got the sentence embeddings, but the x-compute-type header of my request returns cpu and not cpu+optimized. Do I have to request something to get CPU-optimized inference?

Thanks!

Divide sklearn generation script into tasks

Currently the sklearn generation script only takes the version as an argument. It would be nice to add an option to run it for a given task separately, given that we might add more tasks and it's cumbersome to run it for all tasks and push tons of models to skops-tests every time.
It would be nice if we could do:

def main(version, task):
    ...

if __name__ == "__main__":
    sklearn_version = sys.argv[1]
    task = sys.argv[2]
    main(sklearn_version, task)

It was a minor annoyance: I had to run the script, and in the middle of the run I uploaded 5 models that I didn't need and got a 503.
Feel free to ignore if it's not good :')
@adrinjalali @BenjaminBossan

Remove unimplemented pipelines across the different integrations

Is your feature request related to a problem? Please describe.
Is there any reason why we have a ton of unused pipeline files across the different integrations (example)? I think it would be a lot easier to both navigate the repo and quickly see which integrations have which pipelines enabled if we only included the files that are actually being used.

Describe the solution you'd like

  • Remove unused pipeline files from the different integrations.
  • Update guidance in contributing docs if need be to reflect the fact we shouldn't be copying over all the pipeline template files.

WDYT?

Inference API always returns error: Invalid token

Sorry if this is not the best place to post this issue. I am having an issue with the inference API suddenly, after it has worked perfectly for months:

After seeing '{ "error": "invalid token" }' coming back in response to queries, I created a new API token (the old one was not showing up) and changed the header from Authorization: Bearer api_XXX to Authorization: Bearer hf_YYY as outlined in the docs, but I am still facing the same error.

Any idea what could be the issue? I am so hoping this could be fixed soon.

Misc improvements for Stanza models

A couple of minor improvements, in order of increasing complexity/maintainability. This is relatively low priority, but the first one should be really quick to do and will significantly help users.

  • As a user, I can see a simple code snippet to know how to load the model.
  • As a user, I can easily filter for all PoS vs NER models of Stanza. This means we need to get the right tag in the uploaded models.
  • As a user, I get the right license. We need to fix the model licenses according to https://stanfordnlp.github.io/stanza/available_models.html.

Run production grade private models

Hello,

Following a discussion with @Narsil, he told me that the model hub is actually not meant to be able to load private models (as of now), since api-inference-community was originally intended to promote community libraries that use the hub.

However, in the scope of running production-grade private models, would it be possible to discuss this possibility internally within the model hub team?

Thanks!

Audio-to-regions widget and community API for pyannote.audio

Opening an issue as per @osanseviero's suggestion on Twitter.
Issue imported from pyannote/pyannote-audio#835


pyannote.audio 2.0 will bring a unified pipeline API:

from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
output = pipeline("audio.wav")   # or pipeline({"waveform": np.ndarray, "sample_rate": int})

where output is a pyannote.core.Annotation instance.

I just created a space that allows testing a bunch of pipelines shared on the Hugging Face Hub, but it would be nice if those were testable directly in their own model card.

My understanding is that two things need to happen

Use espnet/kan-bayashi_ljspeech_vits model via inference api

https://github.com/huggingface/huggingface_hub/blob/eeeb0d1b352fb3249541bb62f1f57f41ae3ab4e0/api-inference-community/docker_images/espnet/app/pipelines/automatic_speech_recognition.py#L8

Hey guys. I tried to use the mentioned model via the inference API and the audio that comes back as a response is distorted.

When I import the model and call it directly, it returns nice audio with a nice voice:

from espnet2.bin.tts_inference import Text2Speech
from espnet2.utils.types import str_or_none

tag = 'kan-bayashi/ljspeech_vits'
vocoder_tag = "none"

text2speech = Text2Speech.from_pretrained(
  model_tag=str_or_none(tag),
  vocoder_tag=str_or_none(vocoder_tag),
  device="cpu",
  threshold=0.5,
  minlenratio=0.0,
  maxlenratio=10.0,
  use_att_constraint=False,
  backward_window=1,
  forward_window=3,
  speed_control_alpha=1.0,
  noise_scale=0.333,
  noise_scale_dur=0.333,
)

Running inference API can sometimes be very slow or fail

Following up on this slack discussion.

The problem is that the skops CI sometimes times out after we added tests for the inference API (here is a failed run). We did some further investigation and this is what we can say for now:

  1. The call to the inference API is the slow step
  2. It's only the first call that is slow, repeated calls are fast thanks to warm start (but our tests don't benefit from that).
  3. The more processes there are in parallel, the more likely it is for the call to be slow or to time out, but there is a lot of variance
  4. Even 3 parallel processes can already cause slowdown
  5. This happens even without the retry decorator we use
  6. If a call does not finish within 3 min, in the vast majority of cases it times out after 7 min (there is no timeout set from the client side).
  7. Occasionally, we also get an internal server error

ping @adrinjalali
