svilupp / aihelpme.jl

Harnessing Julia's Rich Documentation for Tailored AI-Assisted Coding Guidance

Home Page: https://svilupp.github.io/AIHelpMe.jl/dev/

License: MIT License

Julia 100.00%

generative-ai julia rag

aihelpme.jl's Introduction

About me

👋 Hi, I’m @svilupp!
👀 I’m interested in Bayesian and Causal Inference, because the world is an uncertain place ready to be intervened on!
🎯 My aspiration is unlocking Generative AI's 100x potential (🔗 see my blog: svilupp.github.io or try my packages: PromptingTools, LLMTextAnalysis, AIHelpMe)
📫 How to reach me? Probably the best way is right here or on Julia Slack!

Why this repo?

Things get outdated quickly...

I hope to share several examples of interesting models that I have worked on in my personal capacity:

Causal inference for better sleep - How restful would your sleep be with 1) a weighted blanket and 2) CBD drops? (EconML and generative modelling case study with OURA ring data)
Hierarchical Bayesian models for outlier detection in deeply nested event data (NumPyro multilevel model with a multivariate copula-based wrapper on top)
An interactive business case with full-on uncertainty (Bayesian model with a tool for prior elicitation from SMEs, delivered as an interactive data app in Streamlit)
Can you beat a simple memory-based recommendation when you have extremely sparse interaction data? (LightFM case study)

And something just for fun

Deep (not deep-dish! pure Napoli-style!) pizza classifier, ie, how many samples does it take to tell Double pepperoni and Margherita apart?
Why do we wait so much in NHS hospitals? (Discrete-event-simulation-based case study)

aihelpme.jl's People

Forkers

pgimenez splendidbug adarshpalaskar1 lazarusa

aihelpme.jl's Issues

[FR] Add simple index deduplication

When we run load_index!([:a,:b]), the knowledge packs a and b can have duplicate content. It would be good to dedupe once we merge them (see src/loading.jl::78).
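
A minimal sketch of how such deduplication might look, assuming each knowledge pack reduces to a vector of text chunks plus an embedding matrix with one column per chunk. The `merge_dedupe` name and this data layout are hypothetical and are not AIHelpMe's actual index type.

```julia
# Hypothetical sketch: merge two knowledge packs and drop duplicate chunks.
# Assumes each pack is (chunks::Vector{String}, embeddings::Matrix) with one
# embedding column per chunk; this is NOT the real AIHelpMe index structure.
function merge_dedupe(chunks_a, emb_a, chunks_b, emb_b)
    chunks = vcat(chunks_a, chunks_b)
    emb = hcat(emb_a, emb_b)
    seen = Set{String}()
    keep = Int[]
    for (i, chunk) in enumerate(chunks)
        key = String(strip(chunk))   # ignore differences in outer whitespace
        if !(key in seen)
            push!(seen, key)
            push!(keep, i)           # keep only the first occurrence
        end
    end
    return chunks[keep], emb[:, keep]
end

chunks, emb = merge_dedupe(
    ["sum(itr)", "map(f, c)"], Float32[1 0; 0 1],
    ["map(f, c)", "filter(f, a)"], Float32[0 1; 1 1],
)
```

Exact-match deduplication is the simplest option; near-duplicate detection (eg, by embedding similarity) would need a threshold and some measurement to tune it.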

[FR] Add additional knowledge bases

Add the following knowledge sources:

Knowledge should contain both the documentation and the code snippets.
They should be added as separate artifacts that clearly label the embedding model (and associated parameters like dimensions).

In addition, we need to build:

- Docsite scraper for repeatable collection
  - Scraper must be compliant (robots.txt) and considerate (throttling requests)
  - It must be able to focus only on a specific package version and ignore others
- High-quality chunker/processor
  - It must be able to deduplicate content
  - It should honor the structure of the content as much as possible, to generate high-quality document chunks
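
As a rough illustration of the chunker requirements above, the sketch below splits Markdown-like text at heading lines and removes exact duplicate chunks. The function name `chunk_by_heading` and the heading rule (any line starting with `#`) are assumptions made for this example, not part of AIHelpMe.

```julia
# Hypothetical sketch: split Markdown-like text into chunks at headings,
# then drop empty chunks and exact duplicates while preserving order.
function chunk_by_heading(text::AbstractString)
    chunks = String[]
    current = String[]
    for line in split(text, '\n')
        # A new heading closes the chunk collected so far
        if startswith(line, "#") && !isempty(current)
            push!(chunks, String(strip(join(current, '\n'))))
            empty!(current)
        end
        push!(current, String(line))
    end
    # Flush the last chunk
    isempty(current) || push!(chunks, String(strip(join(current, '\n'))))
    return unique(filter(!isempty, chunks))
end

doc = "# sum\nAdds items.\n# map\nApplies f.\n# sum\nAdds items."
result = chunk_by_heading(doc)
```

A real processor would also need to skip `#` lines inside code fences and to cap chunk length, which this sketch leaves out.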

This functionality should be well-documented and user-friendly, so that anyone can index their own favourite package (and, ideally, share it with others in the Julia community).

All this tooling should live in AIHelpMe as a separate module (initially) with its own separate dependencies (eg, Gumbo, etc).

[FR] [Docs] Improve documentation

Improve documentation:

add more examples (Ollama etc)
migrate to Vitepress

[FR] UI Interface for basic functionality

It would be great to have a UI for the users (as a separate package).

We’ve been discussing it with the Genie team and are exploring that option now.

[FR] Add support for Ollama-based models

We need an option that's free and can be run locally.

Embedding model: nomic-embed-text

[FR] Add `aisearch` function

It would be great to provide a direct interface to the Tavily search API for Julia-specific searches.

On Google, one encounters a lot of noise because Julia is not as common a programming language. With a dedicated function, we could predefine some domains to include/exclude to automatically improve the answer quality.

As an additional benefit, it would prevent us from having to break our flow and go to a browser.

Suggested interface: aisearch("some question about julia")
Tavily can automatically provide an answer as well, so we can skip the LLM call. But we could also add an LLM call that will know it's Julia specific. We'd need to measure which works better.
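
A minimal sketch of the request such a function might build. The field names (`query`, `include_answer`, `include_domains`, `exclude_domains`) follow my understanding of Tavily's search API and should be checked against its documentation; `build_search_payload`, `julia_domains`, the `TAVILY_API_KEY` variable, and the domain list are hypothetical. The sketch only constructs the payload and does not send it.

```julia
# Hypothetical sketch: build the request body for a Julia-focused web search.
# Field names are assumptions based on Tavily's search API; verify before use.
julia_domains() = ["docs.julialang.org", "discourse.julialang.org", "julialang.org"]

function build_search_payload(question::AbstractString;
        api_key::AbstractString = get(ENV, "TAVILY_API_KEY", ""),
        include_answer::Bool = true,      # ask the service for a ready-made answer
        include_domains = julia_domains(),
        exclude_domains = String[])
    return Dict{String,Any}(
        "api_key" => api_key,
        "query" => String(question),
        "include_answer" => include_answer,
        "include_domains" => include_domains,
        "exclude_domains" => exclude_domains,
    )
end

payload = build_search_payload("How do I read a CSV file?"; api_key = "test-key")
```

Keeping the domain lists as keyword arguments would make it easy to compare answer quality with and without the Julia-specific filter.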

[FR] Update compat for PromptingTools

We need to migrate to the new RAG interface in PromptingTools v0.16 and answer highlighting.

[FR] Add `aisummary` function

It would be great to add a little function to summarize some function/functionality.

It would require a slightly different approach: unlike RAG, which pulls only the top K relevant snippets, we would first compile ALL relevant snippets and then pass them into a summarization prompt.
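
The difference between the two retrieval modes can be sketched as follows, assuming snippets are represented by embedding columns and relevance is measured by cosine similarity. The function names and the 0.5 threshold are hypothetical choices for illustration only.

```julia
using LinearAlgebra

# Hypothetical sketch: top-K retrieval versus collecting every relevant snippet.
# `doc_emb` holds one snippet embedding per column.
cosine_scores(doc_emb, query) =
    [dot(col, query) / (norm(col) * norm(query)) for col in eachcol(doc_emb)]

# RAG-style: indices of the K highest-scoring snippets
top_k(scores, k) = sortperm(scores; rev = true)[1:min(k, length(scores))]

# Summary-style: indices of ALL snippets at or above the threshold
all_relevant(scores; threshold = 0.5) = findall(>=(threshold), scores)

doc_emb = [1.0 0.9 0.0 0.7;
           0.0 0.1 1.0 0.7]
query = [1.0, 0.0]
scores = cosine_scores(doc_emb, query)

rag_ids = top_k(scores, 2)          # [1, 2]
summary_ids = all_relevant(scores)  # [1, 2, 4]
```

Here the fourth snippet is relevant enough to pass the threshold but would be missed by a top-2 cutoff, which is the gap a summarization function would need to close.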

[FR] Integrate Preferences.jl to allow persistent settings of pipelines/models

[FR] Add extension for LLMTextAnalysis

LLMTextAnalysis lets you display text through the lens of semantic distance.

It would be good to add it as an optional explorer for any answer, eg,

where in the space did my question(s) land
what were the retrieved docs (k, n)

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
