Comments (11)
I've tried mapping the concept to fit Puterbot's case, but I'm having trouble understanding what the paper is essentially proposing. Are we compressing prompts into "gist tokens" after initially feeding the LM the entire uncompressed prompt, so that it saves on read and compute time later on? Or are we compressing from the beginning? The attention masks part is especially confusing, so I was wondering if I could get someone else's perspective on Gisting :D
from openadapt.
Alright, so I think I spent an unreasonable amount of time over multiple reads of the paper trying to understand what exactly the gist tokens themselves look like, but I think the repo you linked above blackboxes it for us.
What remains now is to take advantage of the above repo so that any prompting we do on the LLM we use to automate processes can be gisted, saving compute time. Unfortunately, I haven't tinkered with the Puterbot codebase enough yet to see where we can inject this improvement :/ Will comment/commit soon :D
Edit: Also, I was hoping you could explain what "diff" means? I saw it used on the other repo too, but I haven't been able to infer what it implies :D
@FFFiend if you are interested in being considered for an internship, please submit a PR with your work, and reference this issue in the description.
Thank you for engaging on this issue @FFFiend !
> Are we compressing prompts into "gist tokens" after initially feeding the LM the entire uncompressed prompt so that it saves on read and compute time later on?
I believe that is correct. From https://arxiv.org/pdf/2304.08467.pdf:
> we add a single special gist token to the model vocabulary and embedding matrix, much like the start/end-of-sentence tokens often present in such models. Then, given a (task, input) pair (t, x), we concatenate t and x with a set of k successive gist tokens in between: (t, g1, . . . , gk, x), e.g. Translate French: The cat. This sequence is fed into the model, with the restriction that input tokens after the gist tokens cannot attend to any of the prompt tokens before the gist tokens (but they can attend to the gist tokens). This forces the model to compress the information in the prompt into the gist tokens, since the input x (and output y) cannot attend to the prompt t.
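To make the masking concrete, here is a toy sketch of the restriction described above (my own illustration, not code from the paper or repo; the `make_gist_mask` helper and the token counts are made up):

```python
# Toy illustration of the gist attention mask described above.
# Sequence layout: [prompt t][gist g1..gk][input x]; tokens after the
# gist block may attend to the gist tokens but not to the prompt tokens.

def make_gist_mask(n_prompt, n_gist, n_input):
    """Return a boolean mask where mask[i][j] == True means position i
    may attend to position j (causal attention assumed)."""
    n = n_prompt + n_gist + n_input
    mask = [[j <= i for j in range(n)] for i in range(n)]  # causal mask
    for i in range(n_prompt + n_gist, n):   # input tokens x ...
        for j in range(n_prompt):           # ... are cut off from prompt t
            mask[i][j] = False
    return mask

mask = make_gist_mask(n_prompt=3, n_gist=1, n_input=2)
# The first input token (position 4) can see the gist token (position 3)
# but none of the prompt tokens (positions 0..2):
print(mask[4])  # -> [False, False, False, True, True, False]
```

Since x can only reach t through the gist positions, gradient descent has to pack whatever the prompt contributes into those k gist activations.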
According to the repo (https://github.com/jayelm/gisting) it appears that the trained gist model parameter diffs are now available on HuggingFace for LLaMA-7B and FLAN-T5-XXL 😄
@FFFiend please join us on Slack if you haven't already: https://join.slack.com/t/mldsai/shared_invite/zt-1uf94nn7r-qcQnS~hinLPKftUapNzbuw (link at top of README)
> Alright so I think I spent an unreasonable amount of time trying to understand how exactly the gist tokens themselves looked like from multiple reads on the paper,
Thank you!
> but I think the repo you linked above blackboxes it for us.
Unfortunate, but not a deal breaker.
> I haven't tinkered around with the Puterbot codebase enough yet to try and see where we can inject this improvement
I think the MVP looks like creating a GistingReplayStrategyMixin, analogous to e.g. LLMReplayStrategyMixin at https://github.com/MLDSAI/puterbot/blob/main/puterbot/strategies/llm_mixin.py (along the way, perhaps it's worthwhile thinking about what a general framework for any model might look like).
Integrate whatever they make available with as few lines as possible 👍
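As a rough sketch of what I mean (everything besides the GistingReplayStrategyMixin name is a placeholder; BaseReplayStrategy and gist_compress are stand-ins, not the actual puterbot or gisting APIs):

```python
# Hypothetical sketch of a GistingReplayStrategyMixin, modeled on the
# existing LLMReplayStrategyMixin; the base class, method names, and the
# gist_compress helper are assumptions, not real puterbot/gisting code.

class BaseReplayStrategy:  # stand-in for puterbot's real base class
    pass

def gist_compress(prompt: str) -> str:
    """Placeholder for whatever the gisting repo exposes; here it just
    truncates the prompt so the sketch is runnable."""
    return prompt[:32]

class GistingReplayStrategyMixin(BaseReplayStrategy):
    def get_gisted_completion(self, prompt: str) -> str:
        compressed = gist_compress(prompt)
        # hand the compressed prompt to the underlying LLM completion
        return self.get_completion(compressed)

    def get_completion(self, prompt: str) -> str:
        # stand-in for LLMReplayStrategyMixin.get_completion
        return f"completion for: {prompt}"
```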
> Edit: Also, I was hoping you could explain what "diff" meant? I saw it used on the other repo too but I haven't been able to infer what it implies :D
In general a diff is just a difference between two states (e.g. the previous one and the current one). In our case it refers to Screenshots/WindowStates. I'm not sure about what "the other repo" is referencing, can you please clarify? 🙏
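As a toy illustration of the general sense (the WindowState fields here are made up):

```python
# A "diff" is just whatever changed between two states. Here, two
# hypothetical WindowState snapshots represented as plain dicts.

before = {"title": "Inbox", "left": 0, "top": 0, "width": 800}
after = {"title": "Compose", "left": 0, "top": 0, "width": 800}

# Keep only the keys whose values changed, as (old, new) pairs.
diff = {k: (before[k], after[k]) for k in before if before[k] != after[k]}
print(diff)  # -> {'title': ('Inbox', 'Compose')}
```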
> but I think the repo you linked above blackboxes it for us.
Can you please clarify?
From https://github.com/jayelm/gisting#demo--checkpoints:
> To use the model and try out gist caching, use the src/compress.py script, e.g.

```
python -m src.compress --model_name_or_path jayelm/llama-7b-gist-1 --base_llama_path llama-7b \
    --instruction "Name the top cities in France that should not be missed. Include the best aspects of each place as well."
```
Seems to me like we want this in the GistingReplayStrategyMixin, with a compress method.
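One possible shape for that method, sketched with subprocess (the flags mirror the README example; the submodule path and the output handling are guesses on my part):

```python
# Sketch of wrapping the src.compress CLI in a compress() helper.
# The flags come from the gisting README; the checkout directory and
# the idea of returning raw stdout are assumptions.
import subprocess

def build_compress_command(instruction: str) -> list[str]:
    """Argv for the README's src.compress invocation."""
    return [
        "python", "-m", "src.compress",
        "--model_name_or_path", "jayelm/llama-7b-gist-1",
        "--base_llama_path", "llama-7b",
        "--instruction", instruction,
    ]

def compress(instruction: str, gisting_dir: str = "vendor/gisting") -> str:
    """Run the script from the (hypothetical) submodule checkout and
    return whatever it prints."""
    result = subprocess.run(
        build_compress_command(instruction),
        cwd=gisting_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout
```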
> > Edit: Also, I was hoping you could explain what "diff" meant? I saw it used on the other repo too but I haven't been able to infer what it implies :D
>
> In general a diff is just a difference between two states (e.g. the previous one and the current one). In our case it refers to Screenshots/WindowStates. I'm not sure about what "the other repo" is referencing, can you please clarify? 🙏
Oh I meant the Gisting repo used that word too and I was confused what its technical implications were :D
> > but I think the repo you linked above blackboxes it for us.
>
> Can you please clarify? From https://github.com/jayelm/gisting#demo--checkpoints:
>
> > To use the model and try out gist caching, use the src/compress.py script, e.g.
>
> ```
> python -m src.compress --model_name_or_path jayelm/llama-7b-gist-1 --base_llama_path llama-7b \
>     --instruction "Name the top cities in France that should not be missed. Include the best aspects of each place as well."
> ```
>
> Seems to me like we want this in the GistingReplayStrategyMixin, with a compress method.
Yep, noticed this too; apologies for the confusion. The paper didn't give an example of what the gist tokens look like, and on a cursory read of the README it didn't seem like showing the actual compression was something they highlighted, but I was wrong.
Continuing our conversation: I just found out you can make Python modules using an empty __init__.py file, LOL, so this task seems a tad easier now. I was going crazy looking for ways to integrate that ENTIRE codebase to complete the task.
Edit: it appears I can't make use of Gisting as a module, either via imports or a pip install. I've created an issue over at its repository for now.
Update: I spoke to the author of the repo, who proposed two possible solutions: write a setup.py script, or simply include the repo as a submodule on my fork.
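As a rough sketch of the first option (the package name, version, and layout below are placeholders, not taken from the actual repo):

```python
# Minimal setup.py of the kind suggested, so the gisting repo's `src`
# package becomes importable after `pip install -e .`. The name and
# version are placeholders; the repo's real layout may differ.
from setuptools import find_packages, setup

setup(
    name="gisting",
    version="0.0.1",
    packages=find_packages(include=["src", "src.*"]),
)
```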
So my plan is to use the Gisting repo as a submodule and modify the compress.py main file to return both a compressed version of the input (passed in as the "instruction" variable, since I believe all we will be feeding any LLM we work with is instructions anyway, such as "open gmail and send an email to my Mom") and the gisted input tokens, then pass these into the model to generate the output. That is, instead of using the Tokenizer as in the llm_mixin file on line 53, we generate the input_tokens variable from a call to the modified compress.py file.
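Roughly, the swap I have in mind looks like this (everything here is hypothetical glue, not the real puterbot or gisting code; gist_compress stands in for the modified compress.py entry point):

```python
# Sketch of the plan above: replace the tokenizer call in llm_mixin
# with a call to a modified compress.py that returns the gist tokens
# directly. All names here are hypothetical.

def gist_compress(instruction: str) -> tuple[str, list[int]]:
    """Stand-in for the modified compress.py: returns the compressed
    instruction plus the gisted input token ids (faked here)."""
    fake_token_ids = [hash(w) % 1000 for w in instruction.split()]
    return instruction, fake_token_ids

def get_completion(instruction: str) -> str:
    # Instead of: input_tokens = tokenizer(instruction) ...
    compressed, input_tokens = gist_compress(instruction)
    # ... feed input_tokens to the model; stubbed out here:
    return f"<completion from {len(input_tokens)} gist-prefixed tokens>"
```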
Do I have the right idea? If so I can whip up a quick PR :)