llamagym's Introduction

Previously:

  • πŸŽ“ Graduated from Carnegie Mellon '23 with Honors in Computer Science
    • πŸ” My thesis on vision-language semantics is cited by Google Brain, Meta AI, Stanford, etc.
  • πŸ“„ Published papers at ACL, ICLR, EMNLP, & EACL conferences and NeurIPS & ICCV workshops
  • πŸ§‘β€πŸ’» Exited a content research micro-SaaS with some cool clustering, fact checking, & generation features
  • πŸ€– Fine-tuned language models at Microsoft AI over summer '22
  • πŸ› οΈ Worked on information retrieval, question answering, & summarization at various startups '20-21
  • 🧠 Developed brain-computer interfaces with NSF funding and placed 1st nationally at NeuroTechX '20
  • πŸ† Won 10+ hackathons including 1st @ Facebook '19, 2nd @ UCLA '19, 3rd @ MIT '20

Warning: has not learnt the Bitter Lesson. Prone to getting nerd-sniped by linguistically & cognitively motivated AI research directions.

llamagym's People

Contributors

khoomeik

llamagym's Issues

ImportError: cannot import name 'top_k_top_p_filtering' from 'transformers' (/home/eito/AIExperiments/LlamaGym/myenv/lib/python3.10/site-packages/transformers/__init__.py)

It seems my import fails when I try to import Agent from llamagym:

from llamagym import Agent
(myenv) eito@clouddev-3:~/AIExperiments/LlamaGym$ python examples/my_example.py 
Traceback (most recent call last):
  File "/home/eito/AIExperiments/LlamaGym/examples/my_example.py", line 1, in <module>
    from llamagym import Agent
  File "/home/eito/AIExperiments/LlamaGym/myenv/lib/python3.10/site-packages/llamagym/__init__.py", line 1, in <module>
    from .agent import Agent
  File "/home/eito/AIExperiments/LlamaGym/myenv/lib/python3.10/site-packages/llamagym/agent.py", line 6, in <module>
    from trl import (
  File "/home/eito/AIExperiments/LlamaGym/myenv/lib/python3.10/site-packages/trl/__init__.py", line 5, in <module>
    from .core import set_seed
  File "/home/eito/AIExperiments/LlamaGym/myenv/lib/python3.10/site-packages/trl/core.py", line 25, in <module>
    from transformers import top_k_top_p_filtering
ImportError: cannot import name 'top_k_top_p_filtering' from 'transformers' (/home/eito/AIExperiments/LlamaGym/myenv/lib/python3.10/site-packages/transformers/__init__.py)
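
For reference, this is a known version incompatibility rather than a bug in the script above: recent transformers releases removed the top_k_top_p_filtering helper, while older trl versions still import it. Below is a minimal guard as a sketch; the "transformers<4.39" bound and the suggestion to upgrade trl are assumptions, so check the trl release notes for the exact versions.

import importlib.metadata

def check_trl_transformers_compat():
    """Fail fast with guidance instead of trl's deep ImportError."""
    version = importlib.metadata.version("transformers")
    try:
        from transformers import top_k_top_p_filtering  # noqa: F401
    except ImportError:
        raise ImportError(
            f"transformers {version} no longer exports top_k_top_p_filtering. "
            'Pin it (e.g. pip install "transformers<4.39", an assumed bound) '
            "or upgrade trl to a release that no longer imports the helper."
        ) from None

check_trl_transformers_compat()
from llamagym import Agent  # safe once the check passes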

OOM when running the example

Hi, I encounter an OOM error when running the example in this repository. What are the minimum GPU memory requirements to run it?

WARNING:root:The `device_map` argument is not provided. We will override the device_map argument. to set the entire model on the current device. If you want to set the model on multiple devices, please provide a custom `device_map` argument.
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:03<00:00,  1.56s/it]
/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/trl/trainer/ppo_trainer.py:257: UserWarning: No dataset is provided. Make sure to set config.batch_size to the correct value before training.
  warnings.warn(
  0%|                                                                                                                                                             | 0/5000 [00:00<?, ?it/s]/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
  0%|▏                                                                                                                                                  | 5/5000 [00:11<3:06:40,  2.24s/it]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/haosdent/k8s-rl/ppo_by_llm/ppo_train.py", line 110, in <module>
    train_stats = agent.terminate_episode()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/llamagym/agent.py", line 140, in terminate_episode
    train_stats = self.train_batch(
                  ^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/llamagym/agent.py", line 165, in train_batch
    train_stats = self.ppo_trainer.step(queries, responses, rewards)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/trl/trainer/ppo_trainer.py", line 788, in step
    logprobs, logits, vpreds, _ = self.batched_forward_pass(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/trl/trainer/ppo_trainer.py", line 984, in batched_forward_pass
    logits, _, values = model(**input_kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/trl/models/modeling_value_head.py", line 170, in forward
    base_model_output = self.pretrained_model(
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/peft/peft_model.py", line 1073, in forward
    return self.base_model(
           ^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/peft/tuners/tuners_utils.py", line 103, in forward
    return self.model.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1176, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1019, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 755, in forward
    hidden_states = self.mlp(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 241, in forward
    down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
                                                                ^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/bitsandbytes/nn/modules.py", line 414, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/bitsandbytes/autograd/_functions.py", line 563, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/bitsandbytes/autograd/_functions.py", line 404, in forward
    output = F.mm_dequant(out32, Sout32, SCA, state.SCB, bias=bias)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/haosdent/miniconda3/envs/localGPT/lib/python3.11/site-packages/bitsandbytes/functional.py", line 1816, in mm_dequant
    out = torch.empty(out_shape, dtype=torch.float16, device=A.device)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB. GPU 0 has a total capacity of 23.50 GiB of which 9.69 MiB is free. Including non-PyTorch memory, this process has 23.47 GiB memory in use. Of the allocated memory 23.01 GiB is allocated by PyTorch, and 195.31 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
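
The traceback shows the PPO update step running out of memory on a 24 GiB GPU. The standard levers that usually help here are loading the base model in 4-bit instead of 8-bit, lowering the PPO batch size, and enabling gradient checkpointing. A loading sketch under those assumptions follows; the model id, LoRA settings, and whether the example exposes these knobs are assumptions, not llamagym's documented API.

import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import AutoModelForCausalLMWithValueHead

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights, roughly half the 8-bit footprint
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype; avoids fp32 intermediates
)
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",        # assumed model id; a smaller model also helps
    quantization_config=quant_config,       # replaces the deprecated load_in_8bit flag
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # train LoRA adapters, not full weights
    device_map="auto",
)
model.pretrained_model.gradient_checkpointing_enable()  # trade compute for activation memory

Lowering batch_size (and mini_batch_size) in the PPOConfig passed to the trainer also shrinks the forward pass in batched_forward_pass, which is where this run died.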

Fix the unexpected action output of the LLM

In agent.py's Agent.llm() method, I'm wondering whether we should add two statements before and after self.model.generate(), like this:

context_len = inputs['attention_mask'].size(1)  # add
generate_ids = self.model.generate(...)
generate_ids = generate_ids[:, context_len:]  # add

Outputs before the change:

>>> outputs
['[INST] <<SYS>>\nYou are an expert blackjack player. Every turn, you\'ll see your current sum, the dealer\'s showing card value, and whether you have a usable ace. Win by exceeding the dealer\'s hand but not exceeding 21.\nDecide whether to stay with your current sum by writing "Action: 0" or accept another card by writing "Action: 1". Accept a card unless very close to 21.\n<</SYS>>\n\nYou: 15. Dealer: 5. You have no ace. [/INST]  Action: 0']

Outputs after the change:

>>> outputs
[' Action: 0']

We get the expected action by truncating the beginning of the LLM's output, which is identical to the prompt that was passed to it.
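
For illustration, here is the proposed truncation as a self-contained sketch outside Agent.llm(); the model id, prompt, and generation arguments are placeholders, not the library's defaults.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "[INST] You: 15. Dealer: 5. You have no ace. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

context_len = inputs["attention_mask"].size(1)  # number of prompt tokens
generate_ids = model.generate(**inputs, max_new_tokens=16)
generate_ids = generate_ids[:, context_len:]  # keep only the newly generated tokens

print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0])
# prints e.g. "Action: 0" instead of the full prompt followed by the action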
