coreyb42 / exllama-rest-server
This project is forked from turboderp/exllama.
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
License: MIT License