dngros / lmwrapper
An object-oriented wrapper around language models (like openai endpoints or huggingface)
Accelerate allows training and inference of large models by automatically splitting the layers across CUDA devices. Initially, we had some issues with logprobs
due to model forward patching. Now that we generally use transition scores, lmwrapper may work with Accelerate.
There seem to be some issues with CodeLLaMa stop tokens that sometimes break generation. I will try to get a working reproduction and understand what's going on.
CC: @claudiosv
Add some built-in support for producing multiple generations from one prompt. This leaves room for more efficient generation on backends like HF (you only need to encode the prompt once). We could also potentially cache multiple generations (ideally in a clever way that lets you grow the number of generations and reuse prior caches).
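The cache-growing idea could be sketched roughly as follows. This is only an illustration, not lmwrapper's implementation: `GrowableGenerationCache` and the `sample_fn(prompt, n)` callback are hypothetical names standing in for whatever backend call returns `n` completions.

```python
from typing import Callable, Dict, List


class GrowableGenerationCache:
    """Caches completions per prompt and only requests the shortfall."""

    def __init__(self, sample_fn: Callable[[str, int], List[str]]):
        # sample_fn(prompt, n) is a hypothetical backend call that
        # returns n completions for the prompt.
        self._sample_fn = sample_fn
        self._cache: Dict[str, List[str]] = {}

    def get(self, prompt: str, n: int) -> List[str]:
        cached = self._cache.setdefault(prompt, [])
        missing = n - len(cached)
        if missing > 0:
            # Only generate what is not cached yet, then remember it.
            cached.extend(self._sample_fn(prompt, missing))
        # Return a copy of the first n so callers cannot mutate the cache.
        return cached[:n]
```

Asking for 2 generations and later for 5 would trigger backend calls for 2 and then 3 completions. A real cache key would presumably also need to include sampling parameters such as temperature.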
The whitespace on the first token of Mistral-7B-v0.1 seems to differ from the huggingface output. This can be worked around upstream (like when synthegrator snaps whitespace), but we need to identify the root cause.
In ba839a4 we add support for GPT-4-Turbo. However, GPT-4-Turbo behaves differently from other models. It has a large input limit (128,000 tokens) but a smaller output limit of 4096 tokens. We don't currently have any way of representing this, so features like checking whether a prompt will exceed the limit and prompt trimming will not behave as expected.
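One possible way to represent the two limits is sketched below. `TokenLimits`, `fits`, and `tokens_to_trim` are hypothetical names, not part of lmwrapper, and the sketch assumes the input and output limits can be checked independently, as the description above frames them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenLimits:
    """Hypothetical container for separate input and output token limits."""

    max_input_tokens: int
    max_output_tokens: int

    def fits(self, prompt_tokens: int, requested_output_tokens: int) -> bool:
        # Assumption: both limits must hold, and they are independent.
        return (
            prompt_tokens <= self.max_input_tokens
            and requested_output_tokens <= self.max_output_tokens
        )

    def tokens_to_trim(self, prompt_tokens: int) -> int:
        # Number of prompt tokens that would need to be dropped, if any.
        return max(0, prompt_tokens - self.max_input_tokens)


# Values taken from the issue text above.
GPT4_TURBO_LIMITS = TokenLimits(max_input_tokens=128_000, max_output_tokens=4096)
```

With a structure like this, a prompt-length check and a prompt trimmer could both consult the same object instead of a single context-size number.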
Some of the huggingface models take several minutes just to load. This makes running the slow tests even more painful. There is probably some pytest cleverness for loading each model once and then running every test that needs it. Not sure though.
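A minimal sketch of the load-once idea, using only the standard library. `_load_model` is a placeholder for the real, slow loading call, and `get_model` is a hypothetical helper name. In a pytest suite, a fixture declared with `@pytest.fixture(scope="session")` might serve the same purpose.

```python
import functools

load_calls = []  # records each real load so the caching is observable


def _load_model(model_name: str) -> dict:
    # Placeholder for an expensive load (minutes for a large checkpoint).
    load_calls.append(model_name)
    return {"name": model_name}


@functools.lru_cache(maxsize=None)
def get_model(model_name: str) -> dict:
    """Return a process-wide shared model object, loading it at most once."""
    return _load_model(model_name)
```

Every test that calls `get_model` with the same name receives the same object, so the load cost is paid once per process. The trade-off is that tests must not mutate the shared model.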
Expose the probabilities of the top tokens in a prediction result
TODO:
vLLM offers an OpenAI compatible HTTP server and faster inference. We'd like to offer it as an option for lmwrapper.
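Because the server speaks the OpenAI HTTP protocol, a client could in principle target it by changing only the base URL. The sketch below builds (but does not send) a completions request with the standard library. `build_completion_request` is a hypothetical helper, and the base URL and model name passed in are assumptions about a local deployment.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str, model: str, prompt: str, max_tokens: int = 16
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-style /v1/completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example (not sent here): assumes a server listening on localhost:8000.
request = build_completion_request(
    "http://localhost:8000", "mistralai/Mistral-7B-v0.1", "def add(a, b):"
)
```

Passing `request` to `urllib.request.urlopen` would perform the call against a running server.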
#12 adds support for CodeT5+. We will eventually also want to verify that a more mainstream encoder-decoder model (like plain T5 or one of its more state-of-the-art successors) works too.