
miraclemarvel55 / chatglm-rlhf


Raise or lower the probability of target outputs from ChatGLM directly with RLHF | Modify ChatGLM output with only RLHF

License: GNU Affero General Public License v3.0

Python 100.00%
chatglm custom nickname rlhf ppo reward similarity

chatglm-rlhf's Introduction

ChatGLM-RLHF

The RLHF code here needs neither Megatron nor DeepSpeed, and no dedicated RL library has to be installed; the PPO part of the RLHF loop is implemented in fewer than 100 lines and is easy to read. Plain torch and transformers are all you need. The RLHF critic is a shrunk-down version of ChatGLM, and for the reward we simply use a similarity model that compares the generation with the target output (a similarity-based reward is fairly general; if needed, you can implement a reward model better suited to your own scenario). That way you only have to learn the core PPO algorithm, while everything else consists of models and structures you already know. This makes it easy for NLP practitioners to get into RLHF, and it suggests that RLHF alone is enough to fine-tune a model.
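
For orientation, the clipped surrogate objective at the heart of any PPO implementation looks roughly like the sketch below. This is a minimal illustration under assumed tensor names (logprobs, old_logprobs, advantages) and an assumed clip range of 0.2, not the repository's exact code.

import torch

def ppo_clip_loss(logprobs, old_logprobs, advantages, clip_eps=0.2):
    # Importance ratio between the current policy and the policy that generated the samples.
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the clipped surrogate, so the loss is its negative mean.
    return -torch.min(unclipped, clipped).mean()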

Features:

  • Definition and use of the RLHF data format √
  • Fine-tuning ChatGLM with RLHF alone √
  • Making ChatGLM recognize its master √
    • Stamping a new self-identity into the model
      • The master's name
      • The robot's nickname
  • A more general reward model (a similarity-based sketch follows this list) ×
  • LLaMA-MOSS-RLHF √
  • Batch-generating multiple different prompts, then running RLHF on them simultaneously ×
  • A more fine-grained companion persona ×
  • Retrieval plus generation over your own documents ×
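
The similarity-based reward mentioned above can be pictured roughly as follows. The repository's models_rlhf.py combines an embedding-cosine term with a Jaccard term against good and bad answers; the sketch below only shows the Jaccard part with an assumed weighting, so treat it as an illustration rather than the actual reward model.

def jaccard(a_tokens, b_tokens):
    # Token-set overlap between two sequences, in [0, 1].
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / max(len(a | b), 1)

def similarity_reward(gen_tokens, good_tokens, bad_tokens):
    # Reward grows with overlap against the desired answer and shrinks with
    # overlap against the undesired one.
    return jaccard(gen_tokens, good_tokens) - jaccard(gen_tokens, bad_tokens)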

Usage

To set up the environment, just follow the official ChatGLM-6B instructions.

1. Edit the master name and robot nickname you want, then run

python data/generate_data.py
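
The exact schema is defined in data/generate_data.py. As a purely hypothetical illustration (the good_answers/bad_answers names appear in the training code later on this page; every other field name is an assumption), one dialogue entry might look like this:

# Hypothetical example of one RLHF dialogue entry; the real schema lives in
# data/generate_data.py and may differ.
example_entry = {
    "prompt": "谁是你的主人?",                  # question posed to the model
    "good_answers": ["张三是我的主人。"],         # outputs whose probability should rise
    "bad_answers": ["作为AI助手,我没有主人。"],   # outputs whose probability should fall
}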

2. Choose CPU or GPU placement according to your hardware. I have two cards, so I can use cuda 0 and 1, but at minimum you need one 3090 with 24 GB. For training, the actor model (ChatGLM itself) really has to sit on a GPU, otherwise it is far too slow. If you only want to study the algorithm and step through the tensor flow in a debugger, running everything on CPU with enough RAM also works, so readers without a GPU can still follow along and practice the latest NLP techniques. Then run

python chatglm_rlhf.py
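
A minimal device-selection sketch of the kind described above, assuming plain torch; the variable names and the two-card layout are illustrative, not the repository's exact configuration.

import torch

# Put the actor (ChatGLM) on the first GPU when available, otherwise fall back to CPU.
actor_device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# With a second card, the critic and reward models can live on cuda:1; otherwise share the device.
aux_device = torch.device("cuda:1" if torch.cuda.device_count() > 1 else actor_device)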

Results

Train for roughly 5 epochs, or until the PPO ratio is almost always 1, meaning the model's generation probabilities are no longer changing; at that point you can try it out.
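
The "ratio" here is the PPO importance ratio exp(logp_new − logp_old); once it stays close to 1 for every sampled token, further updates barely move the policy. A tiny illustrative stopping check, with assumed tensor names:

import torch

def ratio_converged(logprobs, old_logprobs, tol=0.05):
    # True when the importance ratio is within tol of 1 for all sampled tokens.
    ratio = torch.exp(logprobs - old_logprobs)
    return torch.allclose(ratio, torch.ones_like(ratio), atol=tol)

Sample exchanges after training: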

  • 咩咩是你的什么? (What is 咩咩 to you?)
    • 咩咩是我的主人给我起的昵称。 (咩咩 is the nickname my master gave me.)
  • 咩咩是谁给你起的? (Who gave you the name 咩咩?)
    • 咩咩是我的昵称。 (咩咩 is my nickname.)
    • 咩咩是主人给我起的。 (咩咩 was given to me by my master.)
  • 谁是你的主人捏? (Who is your master?)
    • 张三是我的主人。 (张三 is my master.)
    • 我的主人是张三 (My master is 张三.)
  • Generalization holds up well:
    • who is your master
      • 我的主人是张三。 (My master is 张三.)
    • what is your nickname
      • My nickname is 咩咩.
    • what is your relationship with 张三
      • 张三是我的主人。 (张三 is my master.)
    • what is your relationship with 咩咩
      • 咩咩是我的主人给我起的昵称。 (咩咩 is the nickname my master gave me.)

Contact

chatglm-rlhf's People

Contributors

miraclemarvel55

chatglm-rlhf's Issues

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (5,) + inhomogeneous part.

I ran into this error. Does anyone have a fix?
['/home/zhangshuhao/GLM_RLHF/ChatGLM-RLHF', '/home/zhangshuhao/anaconda3/envs/ChatGLM-RLHF/lib/python38.zip', '/home/zhangshuhao/anaconda3/envs/ChatGLM-RLHF/lib/python3.8', '/home/zhangshuhao/anaconda3/envs/ChatGLM-RLHF/lib/python3.8/lib-dynload', '/home/zhangshuhao/anaconda3/envs/ChatGLM-RLHF/lib/python3.8/site-packages', '/home/zhangshuhao/anaconda3/envs/ChatGLM-RLHF/lib/python3.8/site-packages/trl-0.4.2.dev0-py3.8.egg']
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████| 8/8 [00:07<00:00, 1.09it/s]
Some weights of the model checkpoint at THUDM/chatglm-6b were not used when initializing ChatGLMModel: ['lm_head.weight']

  • This IS expected if you are initializing ChatGLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing ChatGLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████| 8/8 [00:07<00:00, 1.06it/s]
0%| | 0/16 [00:00<?, ?it/s]The dtype of attention mask (torch.int64) is not bool
你的主人是谁?
['作为一个人工智能助手,我没有真正的主人。']
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
0%| | 0/16 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "chatglm_rlhf.py", line 212, in <module>
    main(prompts_path = dialogues_path)
  File "chatglm_rlhf.py", line 164, in main
    reward = reward_model(gen_texts=gen_texts, good_answers=good_answers, bad_answers=bad_answers).unsqueeze(1)
  File "/home/zhangshuhao/anaconda3/envs/ChatGLM-RLHF/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhangshuhao/GLM_RLHF/ChatGLM-RLHF/models_rlhf.py", line 121, in forward
    jaccards = torch.tensor(np.vectorize(jaccard_s1)(ids[-len(examples):]), dtype=coses.dtype, device=coses.device)
  File "/home/zhangshuhao/anaconda3/envs/ChatGLM-RLHF/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2329, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/home/zhangshuhao/anaconda3/envs/ChatGLM-RLHF/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2407, in _vectorize_call
    ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
  File "/home/zhangshuhao/anaconda3/envs/ChatGLM-RLHF/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2361, in _get_ufunc_and_otypes
    args = [asarray(arg) for arg in args]
  File "/home/zhangshuhao/anaconda3/envs/ChatGLM-RLHF/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2361, in <listcomp>
    args = [asarray(arg) for arg in args]
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (5,) + inhomogeneous part.

numpy error

When I use numpy 1.24.x, I get an error at jaccards = torch.tensor(np.vectorize(jaccard_s1)(ids[-len(examples):]), dtype=coses.dtype, device=coses.device).
The error is as follows:
setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (4,) + inhomogeneous part.

With numpy 1.22.2 I get no error.
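
A likely cause is that numpy 1.24 stopped implicitly building object arrays from ragged sequences, which is what np.vectorize attempts here with the variable-length id lists. The snippet below is a self-contained reproduction of that behavior, not the repository's code:

import numpy as np

# Ragged token-id lists: numpy >= 1.24 refuses to build an array from them implicitly.
ragged = [[1, 2, 3], [4, 5]]
try:
    np.asarray(ragged)                      # raises ValueError on numpy >= 1.24
except ValueError:
    arr = np.asarray(ragged, dtype=object)  # an explicit object dtype still works

So besides pinning numpy below 1.24, one possible change in models_rlhf.py would be to drop np.vectorize and apply jaccard_s1 with a plain list comprehension, e.g. [jaccard_s1(x) for x in ids[-len(examples):]]; this is a suggestion, not the maintainer's confirmed fix.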
