kordinglab / llm4papers

License: Apache License 2.0

Python 93.54% Dockerfile 0.46% TeX 6.01%

llm4papers's Introduction

llm4papers

Read our blog post announcement on the project here.

This is a simple plugin for Overleaf (and, maybe, other editors in the future) that allows you to edit a document with LLMs (Large Language Models) in an intuitive and unintrusive way.

[Demo video: trim-720p-llm4papers-prototype.mov]

How to use

Warning: I absolutely cannot overstate how un-production-ready this tool is. Do not use this on important documents! Don't use this on documents you care about. And if you want to use this on something important, don't. Okay. Now I'm de-risked and un-liable. Do whatever you want.

1. Clone the repository and install dependencies (poetry install).
2. Populate a config.py file with your credentials (see config.example.py).
3. Add a new paper to the paper manifest (right now, this can be done at startup or by manually editing the papers manifest JSON).
4. Run the server from this repository (poetry run python3 llm4papers/service.py).
5. Open the paper in Overleaf and edit as you usually would. When you want to invoke the AI assistant, add a comment of the form @ai: <command> to the line you want changed. For example: The brain is weird. % @ai: formalize this
The AI assistant will replace lines on which this comment is found with the output of the command.
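
As a rough illustration of the trigger format above, here is a minimal sketch of how trigger lines could be picked out of a document. The regex and the `find_triggers` helper are assumptions made for this example, not the project's actual parser, and the sketch does not handle escaped percent signs (`\%`).

```python
import re
from typing import Iterator, Tuple

# Assumed trigger shape: a LaTeX comment "% @ai: <command>" at the end of a line.
TRIGGER = re.compile(r"%\s*@ai:\s*(?P<command>.+?)\s*$")


def find_triggers(text: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (zero-based line index, line text before the comment, command)."""
    for index, line in enumerate(text.splitlines()):
        match = TRIGGER.search(line)
        if match:
            yield index, line[: match.start()].rstrip(), match.group("command")


if __name__ == "__main__":
    sample = "Intro.\nThe brain is weird. % @ai: formalize this\n"
    for index, content, command in find_triggers(sample):
        print(index, repr(content), repr(command))
```

The line text and the command are returned separately so the command can go into the prompt while the line text is what gets rewritten.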

Technical Overview

This plugin works by cloning the Overleaf git repository, editing files locally in /tmp, and then pushing the changes back to the remote. This is done using the GitPython library.

Other document APIs can be added by implementing the PaperRemote protocol in the paper_remote module. For an example, see the OverleafGitPaperRemote class.

Made with 💚 at the Kording Lab KordingLab.com

llm4papers's Issues

Figure out how to manage merge errors

Right now, if we encounter more recent changes, we fail completely and nuke the repo:

llm4papers/llm4papers/models.py

Lines 134 to 148 in 15d5a7f

    
        try:
            self._repo.git.stash()
            self._repo.remotes.origin.pull(force=True)
            try:
                self._repo.git.stash("pop")
            except Exception as e:
                # TODO: this just means there was nothing to pop, but
                # we should handle this more gracefully.
                logger.debug(f"Nothing to pop: {e}")
                pass
        except Exception as e:
            logger.error(
                f"Error pulling from repo {self._reposlug}: {e}. "
                "Falling back on DESTRUCTION!!!"
            )

This costs us an extra round-trip to the LLM, or, at worst, an endless loop of human edits overruling the AI's opportunistic editing. I think this is just because I don't know how to use the GitPython library that well; surely there must be a way to stash and do a 3-way merge.

Intelligently-ish fall off polling of git repos

If there haven't been any AI requests in the last X polls, don't try again for a little bit; maybe fib-falloff for now?
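
One way the "fib-falloff" idea could look, as a sketch only: after a configurable number of idle polls, the wait between polls grows along the Fibonacci sequence up to a cap, and resets as soon as an AI request shows up. `PollScheduler`, `idle_threshold`, and the default timings are hypothetical names and values chosen for illustration.

```python
from typing import Iterator


def fib_delays(base_seconds: float, max_seconds: float) -> Iterator[float]:
    """Yield base * 1, 1, 2, 3, 5, 8, ... seconds, capped at max_seconds."""
    a, b = 1, 1
    while True:
        yield min(a * base_seconds, max_seconds)
        a, b = b, a + b


class PollScheduler:
    """Decide how long to wait before the next poll of a repo."""

    def __init__(self, idle_threshold: int = 3, base_seconds: float = 1.0,
                 max_seconds: float = 60.0) -> None:
        self.idle_threshold = idle_threshold
        self.base_seconds = base_seconds
        self.max_seconds = max_seconds
        self._idle_polls = 0
        self._delays = fib_delays(base_seconds, max_seconds)

    def next_delay(self, had_request: bool) -> float:
        if had_request:
            # Activity seen: reset and go back to polling at the base rate.
            self._idle_polls = 0
            self._delays = fib_delays(self.base_seconds, self.max_seconds)
            return self.base_seconds
        self._idle_polls += 1
        if self._idle_polls < self.idle_threshold:
            return self.base_seconds
        # Idle for a while: back off along the Fibonacci sequence.
        return next(self._delays)
```

The polling loop would call `next_delay(...)` after each poll and sleep for the returned number of seconds.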

Recover gracefully from git 429 rate-limits on overleaf

Right now it crashes the server, which is probably not the ideal user experience.

This line in the refresh (called from the constructor) is the offender:

llm4papers/llm4papers/paper_remote/OverleafGitPaperRemote.py

Line 92 in af7a108

self._repo = Repo.clone_from(self._gitrepo, f"/tmp/{self._reposlug}")
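
A possible way to stop a rate limit from taking the server down is to wrap the clone in a retry helper with exponential backoff. The sketch below is generic on purpose: `retry_with_backoff` is a hypothetical helper, and which exception Overleaf's 429 actually surfaces as through GitPython is an assumption that needs checking.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(); on a listed exception, wait and try again.

    The wait doubles after every failed attempt. The last failure is re-raised
    so the caller can log it and skip this paper instead of crashing.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

In the refresh path this might be used as `retry_with_backoff(lambda: Repo.clone_from(url, path), retry_on=(GitCommandError,))`, assuming the rate limit shows up as a `GitCommandError`. A partially created target directory may need to be removed before retrying.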

Support sqlite paper managers

This should also optionally track all edits (as diffs) for debugging.
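
A minimal sketch of what diff tracking in sqlite could look like, using only the standard library. The table layout, the `open_edit_log` and `record_edit` helpers, and the `main.tex` default are all assumptions for illustration. Nothing here reflects an existing schema in the project.

```python
import difflib
import sqlite3

# Hypothetical schema: one row per applied edit, storing a unified diff.
SCHEMA = """
CREATE TABLE IF NOT EXISTS edits (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    paper_id TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    diff TEXT NOT NULL
)
"""


def open_edit_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the edit log database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_edit(conn: sqlite3.Connection, paper_id: str, before: str,
                after: str, filename: str = "main.tex") -> int:
    """Store the before/after change as a unified diff and return its row id."""
    diff = "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{filename}",
            tofile=f"b/{filename}",
        )
    )
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO edits (paper_id, diff) VALUES (?, ?)", (paper_id, diff)
        )
    return cursor.lastrowid
```

Storing diffs rather than full snapshots keeps rows small and makes it easy to eyeball what the AI changed when debugging.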

Add ranged edits to Overleaf editors

Right now we only create edit requests of one line...

Don't edit lines if the user is still playing with them

As a proxy-measure for this, we can detect if the line with the AI-trigger comment has been edited in the last HEAD~ commit. If it has, then ignore it for now — it's possible the user is still editing it in realtime, and we don't want to start AI'ing before they're done prompting.
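
The proxy described above could be sketched roughly as follows: take a zero-context unified diff of the last commit for one file, read the hunk headers, and check whether the trigger's line number falls in a touched range. `lines_touched` and `should_defer` are hypothetical helper names, and the sketch assumes the diff covers a single file.

```python
import re
from typing import Set

# Matches unified-diff hunk headers such as "@@ -3 +3,2 @@".
HUNK_HEADER = re.compile(
    r"^@@ -\d+(?:,\d+)? \+(?P<start>\d+)(?:,(?P<count>\d+))? @@"
)


def lines_touched(diff_text: str) -> Set[int]:
    """Return 1-based line numbers (new file) added or changed by the diff.

    Assumes a single-file diff generated with zero context lines, so every
    hunk range covers only lines that actually changed.
    """
    touched: Set[int] = set()
    for line in diff_text.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            start = int(match.group("start"))
            count = match.group("count")
            length = int(count) if count is not None else 1
            touched.update(range(start, start + length))
    return touched


def should_defer(trigger_line: int, diff_text: str) -> bool:
    """True if the trigger line was edited in the latest commit."""
    return trigger_line in lines_touched(diff_text)
```

The diff text could come from something like `git diff --unified=0 HEAD~ HEAD -- main.tex`. Pure deletions produce a zero-length range and so never defer a trigger.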

Multithread the server so multiple calls can be carried out at once

Configurably, we should multithread the polling server, multiplexing across documents (but not across edits in one document, in case one changes the context of another)
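
A sketch of that threading shape, under the assumption that pending edits can be grouped per document: each document gets its own task in a thread pool, and edits inside a document run one after another so an earlier edit can change the context of a later one. `process_documents` and the `apply_edit` callback are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def process_documents(
    pending: Dict[str, List[str]],
    apply_edit: Callable[[str, str], str],
    max_workers: int = 4,
) -> Dict[str, List[str]]:
    """Run edits in parallel across documents, serially within each one."""

    def run_document(doc_id: str) -> List[str]:
        # Sequential on purpose: edits in one document are never interleaved.
        return [apply_edit(doc_id, edit) for edit in pending[doc_id]]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {doc_id: pool.submit(run_document, doc_id) for doc_id in pending}
        return {doc_id: future.result() for doc_id, future in futures.items()}
```

Setting `max_workers=1` falls back to the current one-at-a-time behavior, which covers the "configurably" part.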

Build a better prompt for general scientific writing

Right now, the LLM will sometimes respond with a trailing "@user: i did it!" or whatever, sometimes it'll strip the newline, sometimes it'll dup the whole line... A more specific prompt, and more guidance might do the trick.

Try HuggingFace instead of OpenAI

Many of the features that make guidance cool are not supported by the OpenAI API.

Guidance supports using any of the HuggingFace open-source instruct-tuned models.

To-do:

add transformers dependency
read up on pretrained models and pick a few to try out
try em out and see if they're unusably slow or memory hungry or cpu hungry or just bad at doing tasks
add choice of llm to the config
go write a bunch of guidance programs that make use of the actual guidance features (#4)

Google Docs PaperRemote integration

This is probably a better-documented process than #10 too :)

GitPython `Repo.clone_from` fails if missing authentication

Thanks @wrongu for helping me track this down! The service will just hang, waiting for interactive authentication. Not sure how to disable this and fail fast in a non-TTY.
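
One avenue worth trying: git honors the `GIT_TERMINAL_PROMPT=0` environment variable, which disables terminal credential prompts so the command fails instead of waiting. The sketch below uses plain `subprocess` with a timeout as a second safety net. The helper names are hypothetical, and whether this covers every way Overleaf authentication can stall is untested.

```python
import os
import subprocess
from typing import Dict, Optional


def non_interactive_git_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return an environment in which git should not prompt for credentials."""
    env = dict(os.environ if base is None else base)
    # Tell git never to prompt on the terminal for usernames or passwords.
    env["GIT_TERMINAL_PROMPT"] = "0"
    # For SSH remotes, BatchMode makes ssh fail instead of asking for input.
    env.setdefault("GIT_SSH_COMMAND", "ssh -o BatchMode=yes")
    return env


def clone_fail_fast(url: str, dest: str, timeout_seconds: float = 60.0) -> None:
    """Clone url into dest, raising instead of hanging on missing credentials."""
    subprocess.run(
        ["git", "clone", url, dest],
        env=non_interactive_git_env(),
        stdin=subprocess.DEVNULL,  # no TTY input is available anyway
        check=True,                # raise CalledProcessError on failure
        timeout=timeout_seconds,   # raise TimeoutExpired if it still stalls
        capture_output=True,
    )
```

GitPython's `Repo.clone_from` also accepts an `env` argument, so the same dictionary could probably be passed there instead of shelling out.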

various race conditions in applying old edits to updated documents

current trigger/edit lifecycle is

1. git pull
2. create AI triggers
3. git pull (to "debounce" / check for recent human edits and potentially cancel the triggers)
4. run AI
5. push changes

but the extra pull in step 3 can invalidate the triggers in step 2

should be able to just remove the extra pull

Generalize the AI request trigger

A PaperRemote should have its own totally isolated way of identifying a request. (The @ai tag is definitely not the right way to do this; I tried to keep it relatively easy to pluck out, but may have failed.)

This task is bipartite:

Add a way for a paperremote (maybe in get_next_edit_request) to look for triggers
Verify that the returned text has no remaining triggers unless they're deliberate (i.e., it's possible that it MIGHT be desired behavior to add a new trigger in the response to chain multiple calls to the AI, but that seems less likely than just accidentally forgetting to strip the comment or whatever)
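
For the second half of the task, a check along these lines might be enough for the Overleaf case. It assumes the trigger is still the `% @ai:` comment. The `has_trigger` and `sanitize_response` names and the `allow_chaining` flag are made up for this sketch, and a generalized PaperRemote would supply its own pattern.

```python
import re

# Assumed Overleaf-style trigger: "% @ai: ..." running to the end of the line.
TRIGGER = re.compile(r"[ \t]*%[ \t]*@ai:.*$", re.MULTILINE)


def has_trigger(text: str) -> bool:
    """True if any line of text still carries a trigger comment."""
    return TRIGGER.search(text) is not None


def sanitize_response(text: str, allow_chaining: bool = False) -> str:
    """Strip leftover triggers from an LLM response.

    With allow_chaining=True the text is returned untouched, for the
    (probably rare) case where a follow-up trigger is deliberate.
    """
    if allow_chaining:
        return text
    return TRIGGER.sub("", text)
```

Running the response through `sanitize_response` before writing it back should prevent an accidental loop where the assistant keeps re-triggering itself on its own output.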

If someone who is not me wants to tackle this, let's talk about some infra and design — I've started thinking about it a bit.

generalize line ranges

option 1: move to char ranges

option 2: move to 'diff' semantics rather than selection ranges

...

option N: see discussion below

Research viability of direct hooks into Overleaf

Current git-based design

Pros

able to get it up and running instantly
don't need to know anything about overleaf document models, javascript, and all that jazz. Just need to be able to read/write files using python.
easy enough to "trigger" AI suggestions with % @ai: do a thing for me syntax
all requests are initiated on our end. no need to worry about websockets or accepting incoming updates from overleaf

Cons

git merge can be tricky (#2) and the best way we have to handle it now involves lots of extra calls to OpenAI (#3)
unclear if git fetch/commit/merge loop is fast enough / scalable
little visual indication to the user that something is happening
no access to "current cursor position"
we end up wanting to implement our own database (#9)
mild slowdown due to needing to poll for git updates. but not a huge deal because LLM calls are still the bottleneck

Hypothetical direct integration with overleaf

Speculative pros

overleaf is open source
maybe able to get the assistant to appear as a collaborator in the doc / give other visual indications
maybe able to use "suggestions" and "comments" rather than just edits
maybe get access to cursor position
maybe get access to the document model and history directly
maybe able to debounce user edits more directly / hook into however overleaf already does this (e.g. with their spell-checker)

Speculative cons and things to look into

despite being open source, they don't appear to have docs about their own api. we'd have to invest a bit of time in reverse-engineering things / familiarizing ourselves with it
can we still do it in our own python server, or will there be tons of CORS / auth issues? For comparison, writefull is all javascript. But unclear if this is because they're opting for a browser plugin model or if this is necessary.
