ex3ndr / llama-coder
Replace Copilot with local AI
Home Page: https://marketplace.visualstudio.com/items?itemName=ex3ndr.llama-coder
License: MIT License
Git repo: https://github.com/oobabooga/text-generation-webui
Would be cool to just be able to use one thing to serve the models.
As a bonus point, it's compatible with the OpenAI API, so if you support things like auth, llama-coder would support any OpenAI-compatible server.
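For illustration only (this is not something llama-coder does today), here is a minimal sketch of what a completion request to an OpenAI-compatible server such as text-generation-webui could look like; the endpoint URL, API key, and model name below are placeholders:

    // Sketch of an OpenAI-style completion request with bearer-token auth.
    async function openAICompletion(prompt: string): Promise<string> {
        const response = await fetch("http://localhost:5000/v1/completions", {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
                // The auth support mentioned above: a bearer token header.
                "Authorization": "Bearer YOUR_API_KEY",
            },
            body: JSON.stringify({
                model: "local-model", // placeholder server-side model name
                prompt,
                max_tokens: 128,
                temperature: 0.2,
                stream: false,
            }),
        });
        const data = await response.json();
        return data.choices[0].text;
    }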
I've downloaded Ollama and pulled the model locally, but I'm not sure what I'm expecting to happen. There is no guidance on what is supposed to happen or how to use the extension.
Is it supposed to run on save? Is there a way to run a command or trigger a generation?
Ran into an issue with using the Ollama API through Open-WebUI.
They add an id message at the beginning of the stream that does not include a 'response' field, which causes problems for llama-coder because it still tries to iterate over 'tokens.response'.
It can easily be fixed by adding the following check at line 28 of the autocomplete.js file.
if ('response' in tokens) {
    ...
    // This is just to show where the closing bracket goes; this is existing code.
    if (totalLines > args.maxLines && blockStack.length === 0) {
        (0, log_1.info)('Too many lines, breaking.');
        break;
    }
}
Hello,
I hope this message finds you well. I am the maintainer of llama-github, an open-source Python library designed to empower LLM Chatbots, AI Agents, and Auto-dev Solutions by providing intelligent retrieval of code snippets, issues, and repository information from GitHub.
Proposal:
I believe that integrating llama-github into llama-coder could significantly enhance its functionality by enabling efficient retrieval of relevant GitHub content. This would align with llama-coder's philosophy of using local models, as llama-github can operate in a "simple mode" that does not require GPT-4, thus maintaining the spirit of local processing.
Benefits:
Example Usage:
Here is a simple mode example of how llama-github can be integrated into llama-coder:
pip install llama-github

from llama_github import GithubRAG

# Initialize GithubRAG with your credentials
github_rag = GithubRAG(
    github_access_token="your_github_access_token"
)

# Retrieve context for a coding question
query = "How to create a NumPy array in Python?"
context = github_rag.retrieve_context(query, simple_mode=True)
print(context)
Additional Information:
You can find more details and documentation about llama-github here. I would be more than happy to assist with the integration process if you find this proposal valuable.
Thank you for considering this request!
Best regards,
Jet Xu
It works in .py files, but not in .ipynb files.
When I run a model like codellama:34b-code-q6_K, it does seem to spin up my GPUs, but I end up with unusable output. Running the latest Ollama and extension versions on Ubuntu 22.04.
2024-03-12 01:57:27.392 [info] Running AI completion...
2024-03-12 01:57:31.655 [info] Receive line: {"model":"codellama:34b-code-q6_K","created_at":"2024-03-12T05:57:31.654294365Z","response":" ","done":false}
2024-03-12 01:57:31.700 [info] Receive line: {"model":"codellama:34b-code-q6_K","created_at":"2024-03-12T05:57:31.699862247Z","response":"\n","done":false}
2024-03-12 01:57:31.746 [info] Receive line: {"model":"codellama:34b-code-q6_K","created_at":"2024-03-12T05:57:31.745519433Z","response":"\u003c/","done":false}
2024-03-12 01:57:31.790 [info] Receive line: {"model":"codellama:34b-code-q6_K","created_at":"2024-03-12T05:57:31.790407012Z","response":"PRE","done":false}
2024-03-12 01:57:31.836 [info] Receive line: {"model":"codellama:34b-code-q6_K","created_at":"2024-03-12T05:57:31.835927564Z","response":"\u003e","done":false}
2024-03-12 01:57:31.881 [info] Receive line: {"model":"codellama:34b-code-q6_K","created_at":"2024-03-12T05:57:31.881386364Z","response":"","done":true,"total_duration":4488985784,"load_duration":270215,"prompt_eval_count":1427,"prompt_eval_duration":4261101000,"eval_count":6,"eval_duration":226981000}
2024-03-12 01:57:31.882 [info] AI completion completed:
Is there a key binding, or a way to add one, to pause the autocomplete?
Hi.
I installed it locally on my M1 and it works in the CLI.
When I click on Llama Coder in the top right corner (status bar) of VS Code, nothing happens.
Sorry for the question; maybe it's too obvious.
Can support for Deepseek Coder Instruct be added? According to the documentation here: https://deepseekcoder.github.io/ it is a significant improvement over the base model.
I have spent some time trying to find a usage guide but failed miserably. I can see the extension is installed and doing something, as I can see the spinning wheel in the footer of the VS Code window.
I am also sure I have the model on my machine, as I can see it in Ollama and can run an instance via ollama run <MODEL_NAME>.
When I go to the Output tab of VS Code I can see the following:
2024-02-26 14:21:32.734 [info] Llama Coder is activated.
2024-02-26 14:26:51.699 [info] Canceled after AI completion.
2024-02-26 14:26:52.006 [info] Running AI completion...
2024-02-26 14:27:00.153 [info] Receive line: {"model":"stable-code:3b-code-q4_0","created_at":"2024-02-26T14:27:00.151644Z","response":".","done":false}
2024-02-26 14:27:00.158 [info] AI completion completed:
2024-02-26 14:27:00.158 [info] Canceled after AI completion.
2024-02-26 14:27:00.160 [info] Canceled before AI completion.
2024-02-26 14:28:34.455 [info] Running AI completion...
2024-02-26 14:28:47.386 [info] Receive line: {"model":"stable-code:3b-code-q4_0","created_at":"2024-02-26T14:28:47.384057Z","response":"","done":true,"total_duration":12925537752,"load_duration":5852886428,"prompt_eval_count":130,"prompt_eval_duration":7072116000,"eval_count":1,"eval_duration":22000}
2024-02-26 14:28:47.388 [info] AI completion completed:
2024-02-26 14:28:47.388 [info] Canceled after AI completion.
2024-02-26 14:28:47.392 [info] Canceled before AI completion.
2024-02-26 14:28:47.392 [info] Canceled before AI completion.
2024-02-26 14:28:47.394 [info] Canceled before AI completion.
2024-02-26 14:28:47.404 [info] Running AI completion...
2024-02-26 14:28:47.726 [info] Receive line: {"model":"stable-code:3b-code-q4_0","created_at":"2024-02-26T14:28:47.725568Z","response":"\n\n","done":false}
2024-02-26 14:28:47.829 [info] Receive line: {"model":"stable-code:3b-code-q4_0","created_at":"2024-02-26T14:28:47.828662Z","response":"","done":true,"total_duration":422186670,"load_duration":762292,"prompt_eval_count":4,"prompt_eval_duration":317920000,"eval_count":2,"eval_duration":102965000}
2024-02-26 14:28:47.830 [info] AI completion completed:
2024-02-26 14:28:50.926 [info] Running AI completion...
2024-02-26 14:28:51.185 [info] Receive line: {"model":"stable-code:3b-code-q4_0","created_at":"2024-02-26T14:28:51.184114Z","response":"","done":true,"total_duration":257221422,"load_duration":439644,"prompt_eval_count":3,"prompt_eval_duration":256262000,"eval_count":1,"eval_duration":24000}
2024-02-26 14:28:51.186 [info] AI completion completed:
However, nothing is ever output in the editor, the terminal, or the filesystem.
There are a lot of models to choose from.
I can't tell from their names (e.g. deepseek-coder:6.7b-base-q) what the difference between all these models is.
It would be good to at least link to a help page from that section of the settings with information on the differences and use cases of each model.
Ollama is very popular and very easy to set up. It would be great if the extension didn't bundle codellama and instead let us drop in the Ollama endpoint to begin completions.
I started using Llama Coder with the default stable-code:3b-code-q4_0 model.
I then inadvertently downloaded the codellama:7b-code-q4_K_M model and set Llama Coder to use that. This gave me the following error...
Seeing that the q4_K_M model was not recommended for Macs, I deleted the model from Ollama and tried to revert to the stable-code:3b-code-q4_0 model again in the settings...
The problem is that Llama Coder still seems to think that I am using the codellama:7b-code-q4_K_M model, and nothing I do seems to get rid of that reference...
I've even tried closing Cursor and restarting it, but it still continues to reference the wrong model and give the aforementioned error message.
I saw Ollama does not support Windows yet
This is actually a question, not an issue:
Where does the VS Code extension download the model to?
Is there a way to choose the location where the model can be saved to?
I am testing lots of UIs for LLMs and therefore run into situations where a model gets downloaded rather often. This eats disk space very quickly. For things like Stable Diffusion with Automatic1111, ComfyUI, Fooocus, StableSwarm, etc., it is easy to either set a custom source folder via the software's settings or to manually create a symlinked folder. Then I only need the models once.
Here it is not really clear where the model is saved to, which is something that tends to create distrust. I would very much like to be able to see where things are downloaded to, and even better: I would like to be able to set options for things like that.
Would that be possible to implement here, too?
Also, since this is not an issue: can I motivate you to open the "Discussions" panel here in your GitHub repo? That would provide a "Questions" section, where this topic would fit better. Thank you! :-)
Hi!
Thanks for coding this little wonder of an extension. Kudos! I've been using it for a bit, and I have noticed that every autocompletion generates an extra request to the /api/tags endpoint in Ollama.
I suspect it comes from the call to ollamaCheckModel() in provideInlineCompletionItems():
llama-coder/src/prompts/provider.ts, line 89 in 996ac71
In my view it should not be necessary to send a request to the /api/tags endpoint every time. I am aware that the latency it introduces is orders of magnitude lower than that of the /api/generate call, but still, it's extra work the extension (in my view) does not need to do.
I'd suggest going for a different strategy: perhaps do the check once and save the list of available models to check against locally, then check again whenever the configuration changes, or every now and then.
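A minimal sketch of that caching idea (this is not the extension's actual code; the cache shape, default endpoint, and function name are assumptions for illustration):

    let cachedModels: Set<string> | null = null;
    let cachedEndpoint = "";

    // Query Ollama's /api/tags once, keep the model names, and only refresh
    // when the configured endpoint changes (or the cache is empty).
    async function modelIsAvailable(endpoint: string, model: string): Promise<boolean> {
        if (cachedModels === null || cachedEndpoint !== endpoint) {
            const res = await fetch(new URL("/api/tags", endpoint));
            const data = await res.json() as { models: { name: string }[] };
            cachedModels = new Set(data.models.map((m) => m.name));
            cachedEndpoint = endpoint;
        }
        return cachedModels.has(model);
    }

    // Example: await modelIsAvailable("http://127.0.0.1:11434", "stable-code:3b-code-q4_0");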
Thanks!
I want to use a custom model via Ollama, to which I gave the name copilot. I did ollama create copilot -f copilot, and running ollama run copilot works in the terminal; other extensions also work well with it. However, when I start Llama Coder and set the custom model to 'copilot', the extension tells me that copilot wasn't downloaded and asks if I want to download it. If I click on 'No', nothing happens. Looking at the logs, I see "Ingoring since the user asked to ignore download.".
How does llama-coder check if a model exists with ollama?
How can I use this model, as I already manually created it using the gguf file and ollama create?
Thanks in advance!
I think it would be nice to be able to use any model instead of being limited by the list you provide in the settings.
As said in the title, a chat feature is really missing, i.e. if I want the assistant to explain some code, I currently can't do it with Llama Coder.
For some reason the extension doesn't work with .astro files?
The tab at the bottom doesn't show, and there is no code completion. Seems like a bug with VS Code, maybe? All other file types I've tried work.
It always shows "Unknown model undefined" when I try to use it. Ollama is running and I have tried different models.
I have reinstalled several times with no success.
Please create a Visual Studio version of the extension. Thanks.
I work primarily remotely, using the ms-vscode-remote.remote-ssh-edit extension. I'm not able to get the llama-coder autocomplete to work on those projects. It works just fine for projects on my local machine. I'm not sure how to debug further, but it seems that llama-coder does not trigger when I stop typing.
This would allow it to be used with VSCodium (de-Microsofted VS Code).
The registry is here: https://open-vsx.org/
I already installed the codellama and deepseek-llm models using ollama, but your VS Code extension llama-coder is now displaying "downloading..", so I guess it's downloading additional models it needs. Would it be possible to check whether the models already exist on the system and make downloading them optional, to save time and space?
There are coding models out now that are better than CodeLlama. Any plans to add support for those?
Hi,
can you please update the extension to enable use of the new Deepseek Coder models?
The models are already available in Ollama and outperform the CodeLlama models.
Thank you.
I have downloaded the base model, but it's not working.
It would be nice to be able to actively tell the plugin "I want an autocompletion for the current line" with a shortcut, instead of it doing autocomplete all the time.
So instead of querying the model automatically 250 ms after the last key press, just wait for the user to press a shortcut.
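One possible approach, sketched below (this is not part of llama-coder; the command id "llamaCoder.triggerCompletion" is a hypothetical name for illustration): expose a command that fires VS Code's built-in inline-suggestion trigger, which users could then bind to a shortcut.

    import * as vscode from "vscode";

    export function activate(context: vscode.ExtensionContext) {
        context.subscriptions.push(
            // Hypothetical command id; users could bind it to a key of their choice.
            vscode.commands.registerCommand("llamaCoder.triggerCompletion", () => {
                // Ask the editor to request inline completions right now.
                vscode.commands.executeCommand("editor.action.inlineSuggest.trigger");
            })
        );
    }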
I have an ollama container running the stable-code:3b-code-q4_0 model. I'm able to interact with the model via curl:
curl -d '{"model":"stable-code:3b-code-q4_0", "prompt": "c++"}' https://notarealurl.io/api/generate
and get a response in a terminal in WSL, where I'm running VS Code.
However, when I set the Ollama Server Endpoint to https://notarealurl.io/ I just get: [warning] Error during inference: fetch failed
I've noticed that code suggestions from the extension sometimes include a trailing newline, as shown in the screenshot, leading to an issue where pressing the Tab key inserts a tab space instead of accepting the suggestion. This disrupts the coding flow. Maybe this could be fixed by trimming the newline from the end of suggestions before they're returned, as shown below:
// Drop a single trailing newline so Tab accepts the suggestion instead of indenting.
if (res.endsWith('\n')) {
    res = res.slice(0, -1);
}
@ex3ndr, hi,
is it possible to add absolutely all models to the enum in package.json? There are very few 70b models.
It seems that ignoring the download of a model will disable any download, even when the model is changed. There should be a way to reset this.
Hi!
Do you plan to create a version of llama-coder for the JetBrains IDE?
I noticed that on every little change in the file, the extension gives me an autocompletion suggestion. Often the suggestions are not necessary and seem "random" since they lack context.
Additionally, this puts a heavy load on my PC and makes Ollama crash easily. A feature that regulates the number of requests the extension sends to the model would be nice, ideally as a setting the user can adjust (e.g. only request an autocompletion 0.5 seconds after the user has stopped typing in the editor).
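A minimal sketch of such a debounce, assuming a 500 ms delay (this is not llama-coder's actual implementation; the helper name and delay value are illustrative):

    // Only invoke fn after the caller has been idle for delayMs; each new call resets the timer.
    function debounce<T extends (...args: any[]) => void>(fn: T, delayMs: number) {
        let timer: ReturnType<typeof setTimeout> | undefined;
        return (...args: Parameters<T>) => {
            if (timer !== undefined) {
                clearTimeout(timer); // a new keystroke cancels the pending request
            }
            timer = setTimeout(() => fn(...args), delayMs);
        };
    }

    // Usage: wrap the completion request so it only fires 500 ms after the last edit.
    const requestCompletion = debounce(() => {
        // ...call the model here...
    }, 500);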
When my computer is running on battery I would like to be able to disable autocomplete so that I can save power. This behaviour should be configurable so that users can decide whether they want to optimize for power efficiency or productivity.
As the title says.
If I've already pulled the new (as of 2024-01-30) codellama-70b from Meta (or the Python variant),
will Llama Coder use it?
Or does it download the 34b and run that?
Does it just run whatever I'm running in Ollama, and is the list of models you provide more a set of "recommendations"?
The instructions seem to contradict themselves, or are not clear.
On one hand it simply says:
Local Installation
Install Ollama on local machine and then launch the extension in VSCode, everything should work as it is.
But then below that, the list of models doesn't go up to 70b and probably doesn't include the new Meta ones: 70b, 70b python, and 70b instruct?
Since my machine is capable of running it, I would prefer to.
I'm successfully running (quite fast, too!) ollama run codellama:70b from here: https://ollama.ai/library/codellama:70b
The only reason I haven't simply installed and launched is that I don't want to end up with a 34b download in an unspecified location despite already running 70b :)
Thanks
Hi,
I am using VSCodium as an open alternative, and this plugin does not seem to like VSCodium. I always install using .vsix packages, but this one failed with this message.
Thanks.
EDIT:
Updated VSCodium; it works now.
Hi,
I use VS Code's Remote SSH/Remote Containers plugin for most of my development.
Testing out llama-coder, I initially ran Ollama on another machine on my local network, separate from my development laptop. This worked great!
However, when I use VS Code to connect to a remote development server, that server doesn't have access to the local machine I use for inference. VS Code only allows llama-coder to run on the remote host. Unfortunately, VS Code can't use reverse SSH tunnels, which was the first solution I tried.
I suspect that this could be resolved by adding "extensionKind": ["workspace", "ui"] to the extension manifest, as described here. This would leave the default behavior of the extension running on the remote host, but allow it to be switched to running locally if desired.
Let me know what you think, the above comes from reading over the linked pages but I don't have any experience writing VS Code extensions so I could be wrong about something here.