Comments (9)
For clarity, the ability to load private models exists for any model in the transformers library.
Here, which library does your model run in?
from api-inference-community.
@julien-c thanks for clarification.
To be more precise: in the exchanges I had by email with @Narsil and @jeffboudier, the problem is more about using the Hugging Face Inference API with a private model on the Model Hub.
I would like to use the Hugging Face Inference API with this sentence-transformers model: https://huggingface.co/sentence-transformers/stsb-xlm-r-multilingual
However, the output of this model has to go through mean pooling to produce sentence embeddings. That's why I customized the model to include this step inside the model itself: https://huggingface.co/Matthieu/stsb-xlm-r-multilingual-custom
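For reference, mean pooling over token embeddings masked by attention is typically something like the following sketch (shown with NumPy for brevity; the function name and shapes are assumptions, not the model's actual code):

```python
import numpy as np

def mean_pooling(token_embeddings, attention_mask):
    # token_embeddings: (batch, seq_len, hidden)
    # attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    mask = attention_mask[..., None].astype(float)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts
```

Padding tokens are excluded from the average, so sentences of different lengths yield comparable embeddings.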
Still, the Inference API output for this custom model has not gone through mean pooling, although I added the sentence-transformers tag. I have made this model public since @Narsil seems to state that a private model couldn't natively take this tag into account.
Thanks!
Hey Matthieu, thanks for reaching out!
One of the key benefits of the 🤗 Accelerated Inference API is that you can serve any compatible model from the Model Hub, whether shared publicly or uploaded privately to your Hugging Face account.
The subtlety here is that the model you are using comes from the sentence-transformers library, which is integrated with the Model Hub, and with the Inference API through api-inference-community. The custom post-processing behavior you are implementing may not be integrated with the pipeline currently implemented to serve the model.
For more on this I will let @Narsil comment here or in the email thread.
Cheers,
Jeff
I'm not sure if there's a problem. I just tried this and I got the sentence embedding:
import json
import requests

API_URL = "https://api-inference.huggingface.co/models/Matthieu/stsb-xlm-r-multilingual-custom"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query({"inputs": "Hello, my name is John and I live in New York"})
print(len(data))
# 748
From your email, you got {'error': 'Model Matthieu/stsb-xlm-r-multilingual-custom is currently loading', 'estimated_time': 44.490336920000004}. Running the code above again after waiting a few seconds should get you the response.
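If you want to handle that loading response automatically, a small retry helper works (a sketch; query_fn stands for the query function above, and the estimated_time value is read from the error payload):

```python
import time

def query_until_ready(query_fn, payload, max_retries=10):
    # Keep querying while the API reports the model is still loading.
    for _ in range(max_retries):
        data = query_fn(payload)
        if isinstance(data, dict) and "error" in data:
            # Wait roughly as long as the API suggests, capped at 30s.
            time.sleep(min(data.get("estimated_time", 5), 30))
            continue
        return data
    raise RuntimeError("model did not become ready in time")
```

For example, `query_until_ready(query, {"inputs": "Hello"})` would block until the embedding is returned or the retry budget is exhausted.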
Note that you can also use the feature-extraction widget in your repo to get the sentence embedding, which means things are working ok. Please let me know if I misunderstood anything.
For future reference, the exact code that does feature-extraction for sentence-transformers can be found here.
Hi @osanseviero, thanks for your feedback. Indeed, the widget works! It seems it was just a matter of waiting for the model to become available.
@jeffboudier @Narsil since the API seems to work, do you need to do anything additional on your side to enable the CPU+GPU accelerated Inference API for this custom model?
Best,
Matthieu
Hi, it's mostly linked to the fact that models served through api-inference-community do not use an auth_token, i.e. they don't call .from_pretrained(..., use_auth_token=XXXXX).
Because this is not enforced by this repository, the docker will not be able to see any private models, and will fail at load time (the load mechanism itself will work).
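As a hypothetical sketch of what passing the token through would look like in a community docker image (the HF_API_TOKEN variable name and the auth_kwargs helper are assumptions, not the actual implementation):

```python
import os

def auth_kwargs():
    # Forward the user's API token to from_pretrained so private repos
    # can be resolved; without it, downloading a private model fails.
    token = os.environ.get("HF_API_TOKEN")  # hypothetical variable name
    return {"use_auth_token": token} if token else {}
```

A pipeline could then call, e.g., `AutoModel.from_pretrained(model_id, **auth_kwargs())`.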
The reason for this issue is to raise awareness and gather a consensus on the direction.
I can see 3 directions:
- Force docker images to use the auth token, enabling private models. Caveat: No acceleration on those models, probably not the same level of support either.
- Do not force the images to support auth_tokens, make clearer that private models on community frameworks do not work.
- Make sentence-transformers an exception: it is so closely aligned with transformers that making it core is much simpler than for other community models (I'm thinking of the spacy design, for instance).
I am in favor of option 1 (or 3), but in my eyes the question of acceleration + GPU will come up soon enough, and the number of features will start piling up and become less community friendly.
Edit: as expected, #34 and #31 expect acceleration + GPU, which are currently not supported (by design).
Tea por
@david429429 I'm closing this very old thread.
If you want production-grade inference, you should try spaces (free-form inference) or hf-endpoints (something closer to what is here, also free-form, but much simpler to set up if you just want to deploy a given model).