ex3ndr / llama-coder
Replace Copilot with local AI
Home Page: https://marketplace.visualstudio.com/items?itemName=ex3ndr.llama-coder
License: MIT License
Git repo: https://github.com/oobabooga/text-generation-webui
Would be cool to just be able to use one thing to serve the models.
As a bonus point, it's compatible with the OpenAI API, so if you support things like auth, llama-coder would support any OpenAI-compatible server.
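For illustration only (this is not something llama-coder does today), here is a minimal sketch of what a completion request to an OpenAI-compatible server such as text-generation-webui could look like; the endpoint URL, API key, and model name below are placeholders:

    // Sketch of an OpenAI-style completion request with bearer-token auth.
    async function openAICompletion(prompt: string): Promise<string> {
        const response = await fetch("http://localhost:5000/v1/completions", {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
                // The auth support mentioned above: a bearer token header.
                "Authorization": "Bearer YOUR_API_KEY",
            },
            body: JSON.stringify({
                model: "local-model", // placeholder server-side model name
                prompt,
                max_tokens: 128,
                temperature: 0.2,
                stream: false,
            }),
        });
        const data = await response.json();
        return data.choices[0].text;
    }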
I've downloaded Ollama and pulled the model locally, but I'm not sure what I'm expecting to happen. There is no guidance on what is supposed to happen or how to use the extension.
Is it supposed to run on save? Is there a way to run a command or trigger a generation?
Ran into an issue with using the Ollama API through Open-WebUI.
They add an id message at the beginning of the stream that does not include a 'response' field, which causes problems for llama-coder because it still tries to iterate over 'tokens.response'.
It can easily be fixed by adding the following check at line 28 of the autocomplete.js file.
if ('response' in tokens) {
    ...
    // This is just to show where the closing bracket goes; this is existing code.
    if (totalLines > args.maxLines && blockStack.length === 0) {
        (0, log_1.info)('Too many lines, breaking.');
        break;
    }
}
Hello,
I hope this message finds you well. I am the maintainer of llama-github, an open-source Python library designed to empower LLM Chatbots, AI Agents, and Auto-dev Solutions by providing intelligent retrieval of code snippets, issues, and repository information from GitHub.
Proposal:
I believe that integrating llama-github into llama-coder could significantly enhance its functionality by enabling efficient retrieval of relevant GitHub content. This would align with llama-coder's philosophy of using local models, as llama-github can operate in a "simple mode" that does not require GPT-4, thus maintaining the spirit of local processing.
Benefits:
Example Usage:
Here is a simple mode example of how llama-github can be integrated into llama-coder:
pip install llama-github

from llama_github import GithubRAG

# Initialize GithubRAG with your credentials
github_rag = GithubRAG(
    github_access_token="your_github_access_token"
)

# Retrieve context for a coding question
query = "How to create a NumPy array in Python?"
context = github_rag.retrieve_context(query, simple_mode=True)
print(context)
Additional Information:
You can find more details and documentation about llama-github here. I would be more than happy to assist with the integration process if you find this proposal valuable.
Thank you for considering this request!
Best regards,
Jet Xu
It works in .py files, but not in .ipynb files.
When I run a model like codellama:34b-code-q6_K, it does seem to spin up my GPUs, but I end up with unusable output. Running the latest Ollama and extension versions on Ubuntu 22.04.
2024-03-12 01:57:27.392 [info] Running AI completion...
2024-03-12 01:57:31.655 [info] Receive line: {"model":"codellama:34b-code-q6_K","created_at":"2024-03-12T05:57:31.654294365Z","response":" ","done":false}
2024-03-12 01:57:31.700 [info] Receive line: {"model":"codellama:34b-code-q6_K","created_at":"2024-03-12T05:57:31.699862247Z","response":"\n","done":false}
2024-03-12 01:57:31.746 [info] Receive line: {"model":"codellama:34b-code-q6_K","created_at":"2024-03-12T05:57:31.745519433Z","response":"\u003c/","done":false}
2024-03-12 01:57:31.790 [info] Receive line: {"model":"codellama:34b-code-q6_K","created_at":"2024-03-12T05:57:31.790407012Z","response":"PRE","done":false}
2024-03-12 01:57:31.836 [info] Receive line: {"model":"codellama:34b-code-q6_K","created_at":"2024-03-12T05:57:31.835927564Z","response":"\u003e","done":false}
2024-03-12 01:57:31.881 [info] Receive line: {"model":"codellama:34b-code-q6_K","created_at":"2024-03-12T05:57:31.881386364Z","response":"","done":true,"total_duration":4488985784,"load_duration":270215,"prompt_eval_count":1427,"prompt_eval_duration":4261101000,"eval_count":6,"eval_duration":226981000}
2024-03-12 01:57:31.882 [info] AI completion completed:
Is there a key binding, or a way to add one, to pause the autocomplete?
Hi.
I installed it locally on my M1 and it works in the CLI.
When I click on Llama Coder in the top right corner (status bar) of VS Code, nothing happens.
Sorry for the question; maybe it's too obvious.
Can support for Deepseek Coder Instruct be added? According to the documentation here: https://deepseekcoder.github.io/ it is a significant improvement over the base model.
I have spent some time trying to find a usage guide but failed miserably. I can see the extension is installed and doing something, as I can see the spinning wheel in the footer of the VS Code window.
I am also sure I have the model on my machine, as I can see it in Ollama and can run an instance via ollama run <MODEL_NAME>.
When I go to the Output tab of VS Code I can see the following:
2024-02-26 14:21:32.734 [info] Llama Coder is activated.
2024-02-26 14:26:51.699 [info] Canceled after AI completion.
2024-02-26 14:26:52.006 [info] Running AI completion...
2024-02-26 14:27:00.153 [info] Receive line: {"model":"stable-code:3b-code-q4_0","created_at":"2024-02-26T14:27:00.151644Z","response":".","done":false}
2024-02-26 14:27:00.158 [info] AI completion completed:
2024-02-26 14:27:00.158 [info] Canceled after AI completion.
2024-02-26 14:27:00.160 [info] Canceled before AI completion.
2024-02-26 14:28:34.455 [info] Running AI completion...
2024-02-26 14:28:47.386 [info] Receive line: {"model":"stable-code:3b-code-q4_0","created_at":"2024-02-26T14:28:47.384057Z","response":"","done":true,"total_duration":12925537752,"load_duration":5852886428,"prompt_eval_count":130,"prompt_eval_duration":7072116000,"eval_count":1,"eval_duration":22000}
2024-02-26 14:28:47.388 [info] AI completion completed:
2024-02-26 14:28:47.388 [info] Canceled after AI completion.
2024-02-26 14:28:47.392 [info] Canceled before AI completion.
2024-02-26 14:28:47.392 [info] Canceled before AI completion.
2024-02-26 14:28:47.394 [info] Canceled before AI completion.
2024-02-26 14:28:47.404 [info] Running AI completion...
2024-02-26 14:28:47.726 [info] Receive line: {"model":"stable-code:3b-code-q4_0","created_at":"2024-02-26T14:28:47.725568Z","response":"\n\n","done":false}
2024-02-26 14:28:47.829 [info] Receive line: {"model":"stable-code:3b-code-q4_0","created_at":"2024-02-26T14:28:47.828662Z","response":"","done":true,"total_duration":422186670,"load_duration":762292,"prompt_eval_count":4,"prompt_eval_duration":317920000,"eval_count":2,"eval_duration":102965000}
2024-02-26 14:28:47.830 [info] AI completion completed:
2024-02-26 14:28:50.926 [info] Running AI completion...
2024-02-26 14:28:51.185 [info] Receive line: {"model":"stable-code:3b-code-q4_0","created_at":"2024-02-26T14:28:51.184114Z","response":"","done":true,"total_duration":257221422,"load_duration":439644,"prompt_eval_count":3,"prompt_eval_duration":256262000,"eval_count":1,"eval_duration":24000}
2024-02-26 14:28:51.186 [info] AI completion completed:
However, nothing is ever output in the editor, the terminal, or the filesystem.
There are a lot of models to choose from.
I can't tell from their names (e.g. deepseek-coder:6.7b-base-q) what the difference between all these models is.
It would be good to at least link to a help page from that section of the settings with information on the differences and use cases of each model.
Ollama is very popular and very easy to set up. It would be great if the extension didn't bundle codellama and instead let us drop in the Ollama endpoint to begin completions.
I started using Llama Coder with the default stable-code:3b-code-q4_0 model.
I then inadvertently downloaded the codellama:7b-code-q4_K_M model and set Llama Coder to use that. This gave me the following error...
Seeing that the q4_K_M model was not recommended for Macs, I deleted the model from Ollama and tried to revert to the stable-code:3b-code-q4_0 model again in the settings...
The problem is that Llama Coder still seems to think that I am using the codellama:7b-code-q4_K_M model, and nothing I do seems to get rid of that reference...
I've even tried closing Cursor and restarting it, but it still continues to reference the wrong model and give the aforementioned error message.
I saw Ollama does not support Windows yet
This is actually a question, not an issue:
Where does the VS Code extension download the model to?
Is there a way to choose the location where the model can be saved to?
I am testing lots of UIs for LLMs and therefore run into situations where a model gets downloaded rather often. This eats disk space very quickly. For things like Stable Diffusion with Automatic1111, ComfyUI, Fooocus, StableSwarm, etc., it is easy to either set a custom source folder via the software's settings or to manually create a symlinked folder. Then I only need the models once.
Here it is not really clear where the model is saved to, which is something that tends to create distrust. I would very much like to be able to see where things are downloaded to, and even better: I would like to be able to set options for things like that.
Would that be possible to implement here, too?
Also, since this is not an issue: can I motivate you to open the "Discussions" panel here in your GitHub repo? That would provide a "Questions" section, where this topic would fit better. Thank you! :-)
Hi!
Thanks for coding this little wonder of an extension. Kudos! I've been using it for a bit, and I have noticed that every autocompletion generates an extra request to the /api/tags endpoint in Ollama.
I suspect it comes from the call to ollamaCheckModel() in provideInlineCompletionItems():
llama-coder/src/prompts/provider.ts, line 89 in 996ac71
In my view it should not be necessary to send a request to the /api/tags endpoint every time. I am aware that the latency it introduces is orders of magnitude lower than that of the /api/generate call, but still, it's extra work the extension (in my view) does not need to do.
I'd suggest going for a different strategy: perhaps do the check once and save the list of available models to check against locally, then check again whenever the configuration changes, or every now and then.
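A minimal sketch of that caching idea (this is not the extension's actual code; the cache shape, default endpoint, and function name are assumptions for illustration):

    let cachedModels: Set<string> | null = null;
    let cachedEndpoint = "";

    // Query Ollama's /api/tags once, keep the model names, and only refresh
    // when the configured endpoint changes (or the cache is empty).
    async function modelIsAvailable(endpoint: string, model: string): Promise<boolean> {
        if (cachedModels === null || cachedEndpoint !== endpoint) {
            const res = await fetch(new URL("/api/tags", endpoint));
            const data = await res.json() as { models: { name: string }[] };
            cachedModels = new Set(data.models.map((m) => m.name));
            cachedEndpoint = endpoint;
        }
        return cachedModels.has(model);
    }

    // Example: await modelIsAvailable("http://127.0.0.1:11434", "stable-code:3b-code-q4_0");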
Thanks!
I want to use a custom model via Ollama, to which I gave the name copilot. I did ollama create copilot -f copilot, and running ollama run copilot works in the terminal; other extensions also work well with it. However, when I start Llama Coder and set the custom model to 'copilot', the extension tells me that copilot wasn't downloaded and asks if I want to download it. If I click on 'No', nothing happens. Looking at the logs, I see "Ingoring since the user asked to ignore download.".
How does llama-coder check if a model exists with ollama?
How can I use this model, as I already manually created it using the gguf file and ollama create?
Thanks in advance!
I think it would be nice to be able to use any model instead of being limited by the list you provide in the settings.
As said in the title, a chat feature is really missing, i.e. if I want the assistant to explain some code, I currently can't do it with Llama Coder.
For some reason the extension doesn't work with .astro files?
The tab at the bottom doesn't show, and there is no code completion. Seems like a bug with VS Code, maybe? All other file types I've tried work.
It always shows "Unknown model undefined" when I try to use it. Ollama is running and I have tried different models.
I have reinstalled several times with no success.
Please create a Visual Studio version of the extension. Thanks.
I work primarily remotely, using the ms-vscode-remote.remote-ssh-edit extension. I'm not able to get the llama-coder autocomplete to work on those projects. It works just fine for projects on my local machine. I'm not sure how to debug further, but it seems that llama-coder does not trigger when I stop typing.
This would allow it to be used with VSCodium (de-Microsofted VS Code).
The registry is here: https://open-vsx.org/
I already installed the codellama and deepseek-llm models using ollama, but your VS Code extension llama-coder is now displaying "downloading..", so I guess it's downloading additional models it needs. Would it be possible to check whether the models already exist on the system and make downloading them optional, to save time and space?
There are coding models out now that are better than CodeLlama. Any plans to add support for those?
Hi,
can you please update the extension to enable use of the new Deepseek Coder models?
The models are already available in Ollama and outperform the CodeLlama models.
Thank you.
I have downloaded the base model, but it's not working.
It would be nice to be able to actively tell the plugin "I want an autocompletion for the current line" with a shortcut, instead of it doing autocomplete all the time.
So instead of querying the model automatically 250 ms after the last key press, just wait for the user to press a shortcut.
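One possible approach, sketched below (this is not part of llama-coder; the command id "llamaCoder.triggerCompletion" is a hypothetical name for illustration): expose a command that fires VS Code's built-in inline-suggestion trigger, which users could then bind to a shortcut.

    import * as vscode from "vscode";

    export function activate(context: vscode.ExtensionContext) {
        context.subscriptions.push(
            // Hypothetical command id; users could bind it to a key of their choice.
            vscode.commands.registerCommand("llamaCoder.triggerCompletion", () => {
                // Ask the editor to request inline completions right now.
                vscode.commands.executeCommand("editor.action.inlineSuggest.trigger");
            })
        );
    }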
I have an ollama container running the stable-code:3b-code-q4_0 model. I'm able to interact with the model via curl:
curl -d '{"model":"stable-code:3b-code-q4_0", "prompt": "c++"}' https://notarealurl.io/api/generate
and get a response in a terminal in WSL, where I'm running VS Code.
However, when I set the Ollama Server Endpoint to https://notarealurl.io/ I just get: [warning] Error during inference: fetch failed
I've noticed that code suggestions from the extension sometimes include a trailing newline, as shown in the screenshot, leading to an issue where pressing the Tab key inserts a tab space instead of accepting the suggestion. This disrupts the coding flow. Maybe this could be fixed by trimming the newline from the end of suggestions before they're returned, as shown below:
// Drop a single trailing newline so Tab accepts the suggestion instead of indenting.
if (res.endsWith('\n')) {
    res = res.slice(0, -1);
}
@ex3ndr, hi,
is it possible to add absolutely all models to the enum in package.json? There are very few 70b models.
It seems that ignoring the download of a model will disable any download, even when the model is changed. There should be a way to reset this.
Hi!
Do you plan to create a version of llama-coder for the JetBrains IDE?
I noticed that on every little change in the file, the extension gives me an autocompletion suggestion. Often the suggestions are not necessary and seem "random" since they lack context.
Additionally, this puts a heavy load on my PC and makes Ollama crash easily. A feature that regulates the number of requests the extension sends to the model would be nice, ideally as a setting the user can adjust (e.g. only request an autocompletion 0.5 seconds after the user has stopped typing in the editor).
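A minimal sketch of such a debounce, assuming a 500 ms delay (this is not llama-coder's actual implementation; the helper name and delay value are illustrative):

    // Only invoke fn after the caller has been idle for delayMs; each new call resets the timer.
    function debounce<T extends (...args: any[]) => void>(fn: T, delayMs: number) {
        let timer: ReturnType<typeof setTimeout> | undefined;
        return (...args: Parameters<T>) => {
            if (timer !== undefined) {
                clearTimeout(timer); // a new keystroke cancels the pending request
            }
            timer = setTimeout(() => fn(...args), delayMs);
        };
    }

    // Usage: wrap the completion request so it only fires 500 ms after the last edit.
    const requestCompletion = debounce(() => {
        // ...call the model here...
    }, 500);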
When my computer is running on battery I would like to be able to disable autocomplete so that I can save power. This behaviour should be configurable so that users can decide whether they want to optimize for power efficiency or productivity.
As the title says.
If I've already pulled the new (as of 2024-01-30) codellama-70b from Meta (or the Python variant),
will Llama Coder use it?
Or does it download the 34b and run that?
Does it just run whatever I'm running in Ollama, and is the list of models you provide more a set of "recommendations"?
The instructions seem to contradict themselves, or are not clear.
On one hand it simply says:
Local Installation
Install Ollama on local machine and then launch the extension in VSCode, everything should work as it is.
But then below that, the list of models doesn't go up to 70b and probably doesn't include the new Meta ones: 70b, 70b python, and 70b instruct?
Since my machine is capable of running it, I would prefer to.
I'm successfully running (quite fast, too!) ollama run codellama:70b from here: https://ollama.ai/library/codellama:70b
The only reason I haven't simply installed and launched is that I don't want to end up with a 34b download in an unspecified location despite already running 70b :)
Thanks
Hi,
I am using VSCodium as an open alternative, and this plugin does not seem to like VSCodium. I always install using .vsix packages, but this one failed with this message.
Thanks.
EDIT:
Updated VSCodium; it works now.
Hi,
I use VS Code's Remote SSH/Remote Containers plugin for most of my development.
Testing out llama-coder, I initially ran Ollama on another machine on my local network, separate from my development laptop. This worked great!
However, when I use VS Code to connect to a remote development server, that server doesn't have access to the local machine I use for inference. VS Code only allows llama-coder to run on the remote host. Unfortunately, VS Code can't use reverse SSH tunnels, which was the first solution I tried.
I suspect that this could be resolved by adding "extensionKind": ["workspace", "ui"] to the extension manifest, as described here. This would leave the default behavior of the extension running on the remote host, but allow it to be switched to running locally if desired.
Let me know what you think, the above comes from reading over the linked pages but I don't have any experience writing VS Code extensions so I could be wrong about something here.