flagai-open / flagai

FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.

License: Apache License 2.0

Python 99.89% Shell 0.04% Dockerfile 0.07%

flagai's People

Contributors

920232796, anhforth, baai-openplatform, baai-wudao, csyourui, eggiter, eltociear, fade-color, ftgreat, isuco, jongjyh, ledw-2, lindylin1817, lockmatrix, marscrazy, noahre1, quan-sun, rockiesiyuanzhang, shunxing1234, siyu-hu, superhero-7, wchh-2000, xav1erw, xiaofengshi, xuanricheng, zacliu2023, zhanglu0704, zhaodongyan1, zhiyongliu1114, zhiyuan-fan


flagai's Issues

[BUG] superGLUE example bug

Describe the bug

A KeyError is raised by the superGLUE example:

task_name = 'qqp'
trainer = Trainer(env_type='pytorch',
                  pytorch_device="cuda",
                  epochs=2,
                  batch_size=1,
                  eval_interval=1000,
                  checkpoint_activations=False,
                  fp16=True,
                  log_interval=1,
                  save_dir="./glm_superglue_en",
                  # master_ip='127.0.0.1',
                  # master_port=17755,
                  # num_nodes=1,
                  # num_gpus=2,
                  # hostfile='./hostfile',
                  model_parallel_size=2,
                  deepspeed_config='./deepspeed.json',
                  training_script=__file__)

model = GLMForSingleTokenCloze.from_pretrain(download_path="/mnt/test_10b_models",
                                             model_name="GLM-large-en")

tokenizer = GLM10bENBPETokenizer()

train_dataset = SuperGlueDataset(task_name=task_name,
                                 data_dir='./datasets/',
                                 dataset_type='train',
                                 tokenizer=tokenizer,
                                 cloze_eval=True)
valid_dataset = SuperGlueDataset(task_name=task_name,
                                 data_dir='./datasets/',
                                 dataset_type='dev',
                                 tokenizer=tokenizer,
                                 cloze_eval=True)

cl_args = CollateArguments()
cl_args.cloze_eval = True

if task_name in ['copa', 'wsc', 'record']:
    cl_args.multi_token = True

from flagai.data.dataset import ConstructSuperglueStrategy

collate_fn = ConstructSuperglueStrategy(cl_args,
                                        tokenizer,
                                        task_name=task_name)
trainer.train(model,
              train_dataset=train_dataset,
              valid_dataset=valid_dataset,
              collate_fn=collate_fn,
              metric_methods=[["acc", accuracy_metric]])

Tasks

  • An officially supported task in the examples folder (such as GLUE/Title-generation, ...)
  • My own task or dataset

To Reproduce

Creating qqp dataset from file at ./datasets/ (split=train)
Returning 363846 train examples with label dist.: [('0', 229468), ('1', 134378)]
Creating qqp dataset from file at ./datasets/ (split=dev)
Returning 40430 dev examples with label dist.: [('0', 25545), ('1', 14885)]
Optimizer = Adam
[2022-06-08 17:54:06,911] [INFO] [logger.py:70:log_dist] [Rank -1] loading checkpoints form checkpoints/99
[2022-06-08 17:54:06,912] [INFO] [logger.py:70:log_dist] [Rank -1] WARNING: could not find the metadata file checkpoints/99/latest_checkpointed_iteration.txt
[2022-06-08 17:54:06,912] [INFO] [logger.py:70:log_dist] [Rank -1]     will not load any checkpoints and will start from random
[2022-06-08 17:54:06,912] [INFO] [logger.py:70:log_dist] [Rank -1] working on epoch 0 ...
Traceback (most recent call last):
  File "train_10b_superglue.py", line 59, in <module>
    trainer.train(model,
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/trainer.py", line 448, in train
    for iteration_, batch in enumerate(train_dataloader):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 570, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/data/dataset/data_collator/collate_fn.py", line 105, in __call__
    sample = self.pvp.encode(example, {})
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/data/dataset/superglue/pvp.py", line 195, in encode
    raw_parts_a, raw_parts_b = self.get_parts(example)
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/data/dataset/superglue/pvp.py", line 1493, in get_parts
    return [text_a], [" Do you mean ", text_b, [self.mask], "."]
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/data/dataset/superglue/pvp.py", line 99, in mask
    return self.tokenizer.get_command('MASK').Id
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/data/tokenizer/tokenizer.py", line 172, in get_command
    return self.command_name_map[name]
KeyError: 'MASK'
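The traceback shows that GLM10bENBPETokenizer's command_name_map contains no 'MASK' entry, which the qqp pattern-verbalizer requires. A hedged sanity-check sketch, using the Tokenizer API that appears in later issues in this tracker; that it registers 'MASK' for GLM-large-en is an assumption:

from flagai.data.tokenizer import Tokenizer

# Build the tokenizer that matches the checkpoint rather than hard-coding
# GLM10bENBPETokenizer, then fail fast if the MASK command is missing.
tokenizer = Tokenizer.from_pretrained("GLM-large-en")
print(tokenizer.get_command_id('MASK'))  # raises KeyError here, not mid-training, if absent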


OS (please complete the following information):

  • Version: v1.0.1

[BUG]

Hello, I ran into the following problem while running the inference code for the classical-poetry generation task. How should I deal with it?


cannot import name 'clock_settime' from 'time' (unknown location)


[BUG] Errors in MLM training of Bert

Describe the bug
There is an error when I try to fine-tune a BERT model on a masked language modeling task.

Tasks

  • An officially supported task in the examples folder (such as GLUE/Title-generation, ...)
  • My own task or dataset

To Reproduce
https://github.com/marscrazy/Tab2NL/blob/train_with_flagai/train_our_flagai.py

import os
import argparse
from data import get_dataset
from sklearn.metrics import roc_auc_score
import numpy as np
import random
import time
import torch
from flagai.trainer import Trainer
from flagai.auto_model.auto_loader import AutoLoader
from transformers import DataCollatorForLanguageModeling, AutoTokenizer

def set_seed(SEED):
    torch.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)
    np.random.seed(SEED)
    random.seed(SEED)
    #torch.backends.cudnn.deterministic = True
set_seed(26)

def compute_metrics(predictions, labels, meta=None):
    predictions = predictions[:,1]
    return {'roc_auc':roc_auc_score(labels,predictions)}

class txtDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

def finetuning_model(
        train_x, train_y, val_x, val_y, cv_fold=1, dataset_id=11,
        model_dir = "bert-base-ch", #bert-base-uncased
        is_mlm = False,
        num_train_epochs=10, #10
        per_device_train_batch_size=16,  # batch size per device during training
        per_device_eval_batch_size=32,  # batch size for evaluation
        warmup_steps=200,  # number of warmup steps for learning rate scheduler
        weight_decay=0.1,  # strength of weight decay
        logging_steps=100,#20
        seed=11,
        learning_rate=4e-5,
        metric_for_best_model=None,
        config = None,
        tokenizer = None,
        model = None,
        output_dir = None,
        logging_dir = None,
        return_model = False
):
    if output_dir is None:
        output_dir = './results/'+str(dataset_id)+'-cv-'+str(cv_fold)+'-mlm'
    if logging_dir is None:
        logging_dir = './logs/'+str(dataset_id)+'-cv-'+str(cv_fold)+'-mlm'
    #if config is None:
        # config = AutoConfig.from_pretrained(model_dir)
    #    import json
    #    config = json.load(open('./checkpoints/BERT-base-en/config.json'))

    if model is None:
        if is_mlm:
            auto_loader = AutoLoader(
            "masklm",
            model_name="BERT-base-en",
            model_dir='./checkpoints',
            )
        else:
            auto_loader = AutoLoader(
            "classification",
            model_name="BERT-base-en",
            model_dir='./checkpoints',
            class_num = 2
            )
        model = auto_loader.get_model()
        tokenizer = AutoTokenizer.from_pretrained("./checkpoints/BERT-base-en")
    train_encodings = tokenizer(train_x.tolist(), truncation=True, padding=True)
    val_encodings = tokenizer(val_x.tolist(), truncation=True, padding=True)
    train_dataset = txtDataset(train_encodings, train_y.astype(np.longlong))
    val_dataset = txtDataset(val_encodings, val_y.astype(np.longlong))
    if is_mlm:
        data_collator = DataCollatorForLanguageModeling(
            tokenizer=tokenizer,
            mlm_probability=0.15
        )
    class MyTrainer(Trainer):
        def forward_step(self, data, model, mems):
            model_output = model(**{'input_ids':data['input_ids'],
                                  'segment_ids':data['token_type_ids'],
                                  'attention_mask':data['attention_mask']
                                 })
            print(model_output)
    trainer = MyTrainer(
        env_type='pytorch',
        epochs=num_train_epochs,
        weight_decay=weight_decay,
        log_interval=logging_steps,
        seed=seed,
        lr=learning_rate,
        save_dir=output_dir,
        tensorboard_dir=logging_dir
    )
    trainer.train(model=model,  # the instantiated 🤗 Transformers model to be trained
        train_dataset=train_dataset,  # training dataset
        valid_dataset=val_dataset,  # evaluation dataset
        metric_methods=[compute_metrics] if not is_mlm else [],
        collate_fn=data_collator if is_mlm else None)

    dir_name = os.listdir(output_dir)[0]
    cur_model_dir = os.path.join(output_dir,dir_name)
    del model
    torch.cuda.empty_cache()
    time.sleep(5)
    if return_model:
        return cur_model_dir, tokenizer, config
   
def train_ptm_cls(train_x,train_y,val_x, val_y, test_x, test_y, cv_fold=1, dataset_id=11,tokenizer=None, config=None,
                  model_dir = "../contrastive/resources/bert-base-uncased"):

    train_encodings = tokenizer(train_x.tolist(), truncation=True, padding=True)
    val_encodings = tokenizer(val_x.tolist(), truncation=True, padding=True)
    test_encodings = tokenizer(test_x.tolist(), truncation=True, padding=True)

    train_dataset = txtDataset(train_encodings, train_y.astype(np.longlong))
    test_dataset = txtDataset(test_encodings, test_y.astype(np.longlong))
    val_dataset = txtDataset(val_encodings, val_y.astype(np.longlong))
    model = AutoModelForSequenceClassification.from_pretrained(model_dir , config=config, from_tf=False,num_labels=2)
    output_dir = './results/'+str(dataset_id)+'-cv-'+str(cv_fold)+'-cls'
    log_dir = './logs/'+str(dataset_id)+'-cv-'+str(cv_fold)+'-cls'
    training_args = TrainingArguments(
        output_dir=output_dir,  # output directory
        num_train_epochs=10,  # total number of training epochs 
        per_device_train_batch_size=32,  # batch size per device during training
        per_device_eval_batch_size=32,  # batch size for evaluation
        warmup_steps=1000,  # number of warmup steps for learning rate scheduler
        weight_decay=0.1,  # strength of weight decay
        logging_dir=log_dir,  # directory for storing logs
        logging_steps=10, 
        eval_steps=10,
        save_steps=10,
        save_total_limit=1,
        do_eval=True,
        evaluation_strategy='steps',
        learning_rate=2e-5,
        seed=11,
        #save_strategy='steps',
        load_best_model_at_end=True,
        metric_for_best_model="roc_auc"
    )
    trainer = Trainer(
        model=model,  # the instantiated 🤗 Transformers model to be trained
        args=training_args,  # training arguments, defined above
        train_dataset=train_dataset,  # training dataset
        eval_dataset=test_dataset,  # evaluation dataset
        compute_metrics=compute_metrics
        #optimizers=(optimizer,None)
    )
    trainer.train()
    train_rs = trainer.evaluate(train_dataset)
    test_rs = trainer.evaluate(test_dataset)
    val_rs = trainer.evaluate(val_dataset)
    return train_rs['eval_roc_auc'], val_rs['eval_roc_auc'],test_rs['eval_roc_auc']


def train(dataset_id=1):
    ds = get_dataset(dataset_id=dataset_id)
    rs = []
    for i, (train_x, val_x, test_x, train_y, val_y, test_y) in enumerate(ds.generate_datasets(to_txt=True)):
        model_dir,tokenizer, config = finetuning_model(train_x,train_y,val_x, val_y,cv_fold=i, dataset_id=dataset_id,
                  model_dir = "../contrastive/resources/bert-base-uncased",is_mlm=True)
        train_auc, val_auc, test_auc = finetuning_model(
            train_x,train_y,val_x, val_y, test_x, test_y, cv_fold=i,dataset_id= dataset_id,tokenizer=tokenizer, config= config,
                  model_dir = model_dir,is_mlm=False)
        rs.append((train_auc,val_auc,test_auc))
        print("Train auc {:.3f}, val auc {:.3f}, Test auc {:.3f}".format(train_auc, val_auc, test_auc))

    for x,y,z in rs:
        print("Train auc {:.3f}, Val auc {:.3f}, Test auc {:.3f}".format(x,y,z))
    print("avg auc is {:.3f}\t{:.3f}".format(np.mean([x[-1] for x in rs]), np.std([x[-1] for x in rs])))
    #train_xgb(ds)

if __name__=="__main__":
    parser = argparse.ArgumentParser(description='Train Classifier with mixup', formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    # Data
    parser.add_argument('--model_dir', type=str, default='H:\\contrast\\SimCSE-main\\SimCSE-main\\bert-base-uncased',help='the path to pretrained models')
    parser.add_argument('--dataset_id', type=str, default='11',choices=['1','2','3','4','5','6','7','8','9','10','11'], help='Choose between 1-11.')
    # MLM pretrain
    parser.add_argument('--mlm_warmup_steps', default=1000, type=int, metavar='N', help='warmup steps (default: 1000)')
    parser.add_argument('--mlm_learning_rate', type=float, default=2e-5)
    parser.add_argument('--mlm_decay', type=float, default=0.1, help='weight decay (L2 penalty)')
    parser.add_argument('--mlm_epochs', type=int, default=300, help='number of epochs to train')
    parser.add_argument('--mlm_train_batch_size', type=int, default=32)
    parser.add_argument('--mlm_eval_batch_size', type=int, default=32)
    parser.add_argument('--mlm_logging_steps', default=10, type=int, metavar='N', help='logging frequency (default: 10)')
    # text classification
    parser.add_argument('--cls_epochs', type=int, default=300, help='number of epochs to train')
    parser.add_argument('--cls_train_batch_size', type=int, default=32)
    parser.add_argument('--cls_eval_batch_size', type=int, default=32)
    parser.add_argument('--cls_warmup_steps', default=1000, type=int, metavar='N', help='warmup steps (default: 1000)')
    parser.add_argument('--cls_decay', type=float, default=0.1, help='weight decay (L2 penalty)')
    parser.add_argument('--cls_logging_steps', default=10, type=int, metavar='N', help='logging frequency (default: 10)')
    parser.add_argument('--cls_learning_rate', type=float, default=2e-5)
    # Optimization options
    #parser.add_argument('--train', type=str, default='vanilla', choices=['vanilla', 'mixup', 'mixup_hidden', 'SRRS'], help='mixup layer')
    # training
    #parser.add_argument('--momentum', type=float, default=0.9)
    #parser.add_argument('--schedule', type=int, nargs='+', default=[150, 225], help='decrease learning rate at these epochs')
    #parser.add_argument('--gammas', type=float, nargs='+', default=[0.1, 0.1], help='LR is multiplied by gamma on schedule, number of gammas should be equal to schedule')

    # Checkpoints
    parser.add_argument('--resume', default='', type=str, metavar='PATH', help='path to latest checkpoint (default: none)')
    parser.add_argument('--start_epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)')
    # random seed
    parser.add_argument('--seed', default=0, type=int, help='manual seed')
    parser.add_argument('--add_name', type=str, default='')
    parser.add_argument('--job_id', type=str, default='')
    args = parser.parse_args()
    ds = get_dataset(dataset_id=int(args.dataset_id))
    rs = []
    for i, (train_x, val_x, test_x, train_y, val_y, test_y) in enumerate(ds.generate_datasets(to_txt=True,with_title=True if args.dataset_id not in ['1','3'] else False)):
        model_dir,tokenizer, config = finetuning_model(train_x, train_y, val_x, val_y,cv_fold=i, dataset_id=args.dataset_id,
        model_dir = "hkunlp/T5_large_prefix_all_tasks_2upsample2",#bert-base-uncased,hkunlp/from_all_T5_large_prefix_sql2text2
        is_mlm = True,
        num_train_epochs=10,  #args.mlm_epochs,10
        per_device_train_batch_size=args.mlm_train_batch_size,  # batch size per device during training
        per_device_eval_batch_size=args.mlm_eval_batch_size,  # batch size for evaluation
        warmup_steps=args.mlm_warmup_steps,  # number of warmup steps for learning rate scheduler
        weight_decay=args.mlm_decay,  # strength of weight decay
        logging_steps=100,#20
        seed=11,
        learning_rate=4e-5,
        metric_for_best_model=None,
        config = None,
        tokenizer = None,
        model = None,
        output_dir = None,
        logging_dir = None,
        return_model = False)
        
        model_dir,tokenizer,config, trainer= finetuning_model(
            train_x, train_y, val_x, val_y, cv_fold=i,dataset_id= args.dataset_id,tokenizer=tokenizer, config= config,
                  model_dir = model_dir,is_mlm=False, return_model=True)
        test_encodings = tokenizer(test_x.tolist(), truncation=True, padding=True)
        test_dataset = txtDataset(test_encodings, test_y.astype(np.longlong))
        test_auc = trainer.evaluate(test_dataset)['eval_roc_auc']
        rs.append(test_auc)
        print("Test auc {:.3f}".format(test_auc))
    print("avg auc is {:.3f}\t{:.3f}".format(np.mean(rs),np.std(rs)))

Expected behavior
Fine-tuning BERT on MLM and classification tasks should run successfully.



[BUG]

Hello, how large a model was used to generate the GLM title-generation example in the tutorial? I ran the glm_title_ch.py code from quick_start with glm-10b-ch and the results are not ideal.

The output looks like this:

[screenshot]

How to export a fine-tuned CLIP model

trainer.train(model=model, train_dataset=dataset, collate_fn=cifar10_collate_fn)
How do I export the model for inference after trainer.train completes?
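A minimal sketch of one way to do this, assuming the object returned by the loader is a standard torch.nn.Module (FlagAI's Trainer also writes checkpoints into save_dir during training, but relying on that here is an assumption):

import torch

# After trainer.train(...) returns, persist the fine-tuned weights.
torch.save(model.state_dict(), "clip_finetuned.pt")

# For inference, rebuild the model exactly as it was built for training,
# load the saved weights, and switch to eval mode.
model.load_state_dict(torch.load("clip_finetuned.pt", map_location="cpu"))
model.eval()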

Out of GPU memory

I was testing AltDiffusion. The documentation says more than 10 GB of VRAM is enough, but it runs out of memory on my 12 GB GPU. Why is that?

Is there code to reproduce the GLM pretrained models from random initialization on the full dataset?

As the title asks.
A general problem with the project right now is the lack of code for real, full-scale training.
The examples all use tiny amounts of data, so unless GLM has very strong few-shot ability,
users cannot validate the training process and model quality on their own data.

The released models, such as GLM-large-ch and the other preloadable checkpoints, all work very well.
It would be great to publish the full process that takes these models from random initialization and full data to the finished checkpoints.

In other words, the project is excellent in the out-of-the-box sense, but there is little to go on for reproducing the models or debugging full-data training.

Could you open-source more of these parts? Thanks!

[BUG] Missing multilingual information at the beginning of the AltDiffusion README

Congratulations on the new release of AltDiffusion-m9, which supports 9 popular languages. But when I followed the link to examples/AltDiffusion, I couldn't find any m9 information in the README until almost the end of the file.

It would be good to add the multilingual support information at the very beginning of the README.

Loss is None when following the CLIP fine-tuning example

Hello, I followed the example in the README and fine-tuned on my own dataset. After the first iteration, every lm_loss is None.
I printed data inside forward_step and the data looks normal, but I cannot get a correct model_output. Could someone take a look?

[TypeError: accuracy_metric() got an unexpected keyword argument 'tokenizer']

Describe the bug
A clear and concise description of what the bug is.

Tasks

  • glm_superglue
  • tnews

To Reproduce

Traceback (most recent call last):
  File "train_large_clue.py", line 51, in <module>
    trainer.train(model,
  File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/flagai-1.6.1-py3.8.egg/flagai/trainer.py", line 598, in train
    eval_dict = self.evaluate_and_print_results(
  File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/flagai-1.6.1-py3.8.egg/flagai/trainer.py", line 1103, in evaluate_and_print_results
    eval_dict = self.evaluate(forward_step_func=forward_step_func,
  File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/flagai-1.6.1-py3.8.egg/flagai/trainer.py", line 1051, in evaluate
    metrics[i] += eval_method(all_logits, all_labels, meta=meta, tokenizer=self.tokenizer)
TypeError: accuracy_metric() got an unexpected keyword argument 'tokenizer'

The corresponding code:

"""train_large_clue.py""" 
trainer.train(model,
              train_dataset=train_dataset,
              valid_dataset=valid_dataset,
              collate_fn=collate_fn,
              metric_methods=[["acc", accuracy_metric]])


"""flagai.metrics.accuracy_metric.py""" 

def accuracy_metric(predictions, labels, meta=None):
    '''
    predictions: torch.size(n, class_num)
    labels: torch.size(n)
    '''
    count = 0
    assert len(predictions) == len(labels)
    if predictions.size() != labels.size():      
        predictions = torch.argmax(predictions, dim=-1)
        for prediction, label in zip(predictions, labels):
            count += prediction == label
    else:
        prediction, label = predictions[0], labels[0]
        
        if sigmoid(prediction) >= 0.5:
            count += label == 1
        else:
            count += label == 0
    return 100.0 * count / len(labels)
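A minimal sketch of one possible fix, assuming the Trainer unconditionally forwards tokenizer=... to every metric (as the traceback shows at trainer.py line 1051): let the metric accept and ignore the extra keywords.

import torch

def accuracy_metric(predictions, labels, meta=None, tokenizer=None, **kwargs):
    # Same logic as before; the added tokenizer/**kwargs parameters simply
    # absorb the extra keywords the Trainer forwards during evaluation.
    count = 0
    assert len(predictions) == len(labels)
    if predictions.size() != labels.size():
        predictions = torch.argmax(predictions, dim=-1)
        for prediction, label in zip(predictions, labels):
            count += prediction == label
    else:
        prediction, label = predictions[0], labels[0]
        # torch.sigmoid stands in for the bare sigmoid in the original snippet.
        if torch.sigmoid(prediction) >= 0.5:
            count += label == 1
        else:
            count += label == 0
    return 100.0 * count / len(labels)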

[BUG] error running quickstart/title_en.py

I've just installed the package locally, ran the test code quickstart/title_en.py, and got the following errors.

Any possible reasons? Thanks! See details below.


skys-MacBook-Pro:quickstart sky$ python3 title_en.py
******************** title-generation 100013 bert-base-en
Traceback (most recent call last):
  File "title_en.py", line 29, in <module>
    print(predictor.predict_generate_beamsearch(text, out_max_length=50, beam_size=3))
  File "../flagai/model/predictor/predictor.py", line 231, in predict_generate_beamsearch
    return bert_beamsearch(self.model,
  File "../flagai/model/predictor/utils.py", line 676, in bert_beamsearch
    out_puts_ids = bert_beam_search(model,
  File "../flagai/model/predictor/utils.py", line 280, in bert_beam_search
    scores = bert_predict_generate(model, new_input_ids,
  File "../flagai/model/predictor/utils.py", line 235, in bert_predict_generate
    score = model(**{
  File "/Users/sky/Library/Python/3.8/lib/python/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "../flagai/model/bert_model.py", line 359, in forward
    encoder_out, pooler_out = self.model(
  File "/Users/sky/Library/Python/3.8/lib/python/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "../flagai/model/bert_model.py", line 153, in forward
    extended_attention_mask = extended_attention_mask * attention_mask
RuntimeError: The size of tensor a (3) must match the size of tensor b (171) at non-singleton dimension 2

  • Running title_cn.py gives a similar error:

      File "/Users/sky/Library/Python/3.8/lib/python/site-packages/flagai/model/layers/attentions.py", line 940, in forward
        attention_scores += attention_mask
    RuntimeError: output with shape [1, 12, 90, 90] doesn't match the broadcast shape [1, 1, 1, 12, 90, 90]

run error

Running on Ubuntu with Python 3.9:

loader = AutoLoader(task_name="lm", model_name="opt-1.3b-en")

The following error occurred:

self.wte = nn.Embedding(config.vocab_size, config.n_embd)
AttributeError: 'dict' object has no attribute 'vocab_size'
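The message indicates that the OPT config reached the model as a plain dict, while the embedding code expects attribute access (config.vocab_size). A generic illustration of the mismatch, not a claim about where FlagAI's loader should be patched; the config values below are hypothetical:

from types import SimpleNamespace

config = {"vocab_size": 50272, "n_embd": 2048}  # hypothetical values
# config.vocab_size  -> AttributeError: 'dict' object has no attribute 'vocab_size'
config = SimpleNamespace(**config)  # wrapping the dict restores attribute access
print(config.vocab_size)            # 50272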

Can two RTX 3090s meet the requirements for fine-tuning GLM-10b-ch?

When I fine-tune GLM-10b-ch on two 3090s, I always run out of GPU memory during the validation stage, though never during training. Are two 3090s simply insufficient for fine-tuning GLM-10b-ch, or are my parameters set incorrectly?
Below are the parameters I used for training:
Trainer:

trainer = Trainer(
    env_type="deepspeed+mpu",
    epochs=10,
    experiment_name="GLM-10b-ch-seq2seq",
    eval_interval=2000,
    log_interval=100,
    load_dir=None,
    # parallel settings
    master_ip='127.0.0.1',
    master_port=17750,
    num_nodes=1,
    num_gpus=2,
    hostfile='hostfile',
    training_script=__file__,
    # deepspeed
    deepspeed_config='./config/deepspeed.json',
    # megatron-lm
    model_parallel_size=2,
    save_dir="checkpoints_glm_title_generation",
    save_interval=1,
    num_checkpoints=3,
)

deepspeed.json:

{
    "train_micro_batch_size_per_gpu": 16,
    "eval_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 2,
    "steps_per_print": 100,
    "gradient_clipping": 1.0,
    "zero_optimization": {
      "stage": 3,
      "contiguous_gradients": false,
      "overlap_comm": true,
      "reduce_scatter": true,
      "reduce_bucket_size": 5e7,
      "allgather_bucket_size": 5e7,
      "cpu_offload": true 
    },
    "zero_allow_untested_optimizer": true,
    "fp16": {
      "enabled": true,
      "loss_scale": 0,
      "loss_scale_window": 1000,
      "hysteresis": 2,
      "min_loss_scale": 1
    },
    "optimizer": {
      "type": "Adam",
      "params": {
        "lr": 0.000005,
        "weight_decay": 0.01,
        "betas": [
          0.9,
          0.98
        ],
        "eps": 1e-6
      }
    },
    "activation_checkpointing": {
      "partition_activations": false,
      "contiguous_memory_optimization": false
    },
    "wall_clock_breakdown": false
}

[BUG] glm-10b-ch does not run inference correctly


Tasks

  • An officially supported task in the examples folder (such as GLUE/Title-generation, ...)
  • My own task or dataset

To Reproduce

from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor

if __name__ == '__main__':
    loader = AutoLoader("seq2seq", "glm-10b-ch", model_dir="./checkpoints/")
    model = loader.get_model()
    tokenizer = loader.get_tokenizer()
    predictor = Predictor(model, tokenizer)

    text = "今天天气不错[gMASK]"
    output = predictor.predict_generate_beamsearch(text, out_max_length=5, beam_size=1)
    print(output)

The output is "?? ?? ??".
Debugging shows that the IDs passed to the tokenizer for decoding are all 0.

Environment: Win10 x64, Python 3.10, latest FlagAI from pip. It seems to compute on the CPU by default; CPU usage is clearly high.

Using AutoLoader("lm", "glm-10b-ch", model_dir="./checkpoints/") has the same problem.

OPT finetuning

Is there information on the resource usage for fine-tuning each OPT model size?

[BUG] Connection timeout when executing the AltDiffusion generate example

It seems there is an issue establishing a connection to the Hugging Face proxy to download the safety-checker model. Could we change the safety-checker download URL from Hugging Face to the BAAI ModelHub?

Below is the error output when running python generate.py:

root@-0:~/FlagAI/examples/AltDiffusion# python generate.py
******************** text2img altdiffusion-m9
Extension horovod.torch has not been built: /usr/local/lib/python3.8/dist-packages/horovod/torch/mpi_lib/_mpi_lib.cpython-38-x86_64-linux-gnu.so not found
If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.
Warning! MPI libs are missing, but python applications are still avaiable.
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 64, 64) = 16384 dimensions.
making attention of type 'vanilla' with 512 in_channels
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 96, in create_connection
    raise err
  File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 86, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 358, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f2b1c1d1b80>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='s3-proxy.huggingface.tech', port=443): Max retries exceeded with url: /lfs.huggingface.co/repos/c3/33/c333b2b94c5a8a06ddcbb20b02e728f6bef192870028f8a6859247cabb771a03/64b8393f1afd5a0c1ed2aa5f341fa7c08286839a48f3743162a76a2835c808bd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIA4N7VTDGOZQA2IKWK%2F20230104%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230104T011244Z&X-Amz-Expires=259200&X-Amz-Signature=6b4bb6d3d218d24cf8b030d2ee60679e3175ba64c350072718017b7701b01d02&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22pytorch_model.bin%22&x-id=GetObject (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f2b1c1d1b80>: Failed to establish a new connection: [Errno 110] Connection timed out'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 2007, in from_pretrained
    resolved_archive_file = cached_path(
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 284, in cached_path
    output_path = get_from_cache(
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 594, in get_from_cache
    http_get(url_to_download, temp_file, proxies=proxies, resume_size=resume_size, headers=headers)
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 432, in http_get
    r = requests.get(url, stream=True, proxies=proxies, headers=headers)
  File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='s3-proxy.huggingface.tech', port=443): Max retries exceeded with url: /lfs.huggingface.co/repos/c3/33/c333b2b94c5a8a06ddcbb20b02e728f6bef192870028f8a6859247cabb771a03/64b8393f1afd5a0c1ed2aa5f341fa7c08286839a48f3743162a76a2835c808bd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIA4N7VTDGOZQA2IKWK%2F20230104%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230104T011244Z&X-Amz-Expires=259200&X-Amz-Signature=6b4bb6d3d218d24cf8b030d2ee60679e3175ba64c350072718017b7701b01d02&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22pytorch_model.bin%22&x-id=GetObject (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f2b1c1d1b80>: Failed to establish a new connection: [Errno 110] Connection timed out'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "generate.py", line 19, in <module>
    predictor.predict_generate_images(
  File "/usr/local/lib/python3.8/dist-packages/flagai/model/predictor/predictor.py", line 342, in predict_generate_images
    safety_checker, safety_feature_extractor = get_safety_checker()
  File "/usr/local/lib/python3.8/dist-packages/flagai/model/predictor/utils.py", line 24, in get_safety_checker
    safety_checker = StableDiffusionSafetyChecker.from_pretrained(safety_model_id)
  File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 2096, in from_pretrained
    raise EnvironmentError(
OSError: Can't load the model for 'CompVis/stable-diffusion-safety-checker'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'CompVis/stable-diffusion-safety-checker' is the correct path to a directory containing a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
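Until the download location changes, one possible workaround (a hedged sketch, assuming the checkpoint can be copied onto the machine by other means and a recent huggingface_hub; pointing flagai's get_safety_checker, named in the traceback above, at the local path is left implicit):

# On a machine that can reach Hugging Face, snapshot the safety-checker repo,
# then copy the resulting directory to the target machine.
from huggingface_hub import snapshot_download
snapshot_download("CompVis/stable-diffusion-safety-checker", local_dir="./safety_checker")

# On the target machine, load from the local copy; from_pretrained accepts a
# local directory in place of a hub id.
from diffusers.pipelines.stable_diffusion.safety_checker import StableDiffusionSafetyChecker
safety_checker = StableDiffusionSafetyChecker.from_pretrained("./safety_checker")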

sample configs of img2img

Below is my code; the output is not good, and I wonder whether the prompt is suitable. Could you give me some sample configs?

from diffusers import AltDiffusionPipeline, EulerDiscreteScheduler
from PIL import Image
from diffusers import AltDiffusionImg2ImgPipeline

if __name__ == '__main__':
    text2img = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9")
    img2img = AltDiffusionImg2ImgPipeline(**text2img.components)
    img2img = img2img.to("cuda")

    img = Image.open('input/高圆圆.jpeg')
    out_imgs = img2img(prompt="((masterpiece)), (((best quality))), ((ultra-detailed)), ((illustration)), girl, genshin impact,vision",\
                       init_image=img, strength=0.7,\
                       guidance_scale=30,\
                       negative_prompt='nsfw, longbody, lowres, bad anatomy, bad hands, missing fingers, pubic hair,extra digit, fewer digits, cropped, worst quality, low quality').images[0]
    out_imgs.save(f'output.png')

Is there a GLM model fine-tuned on the CMRC downstream task?

1. As the title asks: is there a model already fine-tuned on reading comprehension?
2. Also, the template that the predictor needs is constructed inside the collate function:

        elif self.task_name in ["cmrc"]:
            mask_id = self.tokenizer.get_command_id('MASK')
            source_text = example.text_a
            target_text = example.meta["answer"].strip()
            question = example.meta["question"].strip()
            source_tokens = self.tokenizer.EncodeAsIds(source_text.rstrip())
            question_tokens = self.tokenizer.EncodeAsIds("问题:" + question +
                                                         "答案:")
            max_src_length = self.args.max_src_length - len(
                question_tokens) - 2
            if max_src_length <= 0:
                question_tokens = question_tokens[self.args.max_src_length //
                                                  4]
            source_tokens = [cls_id] + question_tokens + [
                mask_id
            ] + source_tokens[:max_src_length]

Would you consider documenting this part?

3. Dataset loading and preprocessing depend on specific dataset names rather than on more general task and dataset formats, which raises the cost for users of imitating the expected input format with their own data.

altdiffusion-m9 is not be supported

Hi, running the following program produces the error "The model_name: altdiffusion-m9 is not be supported":

import torch
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor

# Initialize 
prompt = "Anime portrait of natalie portman as an anime girl by stanley artgerm lau, wlop, rossdraws, james jean, andrei riabovitchev, marc simonetti, and sakimichan, trending on artstation"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


loader = AutoLoader(task_name="text2img", #contrastive learning
                    model_name="AltDiffusion-m9",
                    model_dir="./checkpoints")

model = loader.get_model()
model.eval()
model.to(device)
predictor = Predictor(model)
predictor.predict_generate_images(prompt)

[BUG] glm_generate_samples_en.py produces broken output

Running the following code:

import torch
from flagai.model.predictor.predictor import Predictor
from flagai.auto_model.auto_loader import AutoLoader
if __name__ == "__main__":
    """Main training program."""
    print('Generate Samples')
    # Random seeds for reproducibility.
    # Model,
    loader = AutoLoader(task_name='lm',
                        model_name='GLM-large-en',
                        only_download_config=False)
    model = loader.get_model()
    tokenizer = loader.get_tokenizer()
    model.cuda(torch.cuda.current_device())

    predictor = Predictor(model, tokenizer)
    # generate samples
    text = [
        'Question: Is drinking beer bad for your health? Answer: [gMASK]',
    ]
    for t in text:
        output = predictor.predict_generate_randomsample(
            t, top_k=50, repetition_penalty=4.0, top_p=1.0)
        print(t, '\n', output)

produces the following output:

******************** lm glm-large-en
Question: Is drinking beer bad for your health? Answer: [gMASK] 
 [CLS] question : is drinking beer bad for your health ? answer : [gMASK] <|startofpiece|> , , 1 <|endofpiece|> <|endofpiece|> <|endofpiece|> … (<|endofpiece|> repeated for the remainder of the output)

Question about the AltCLIP ablation-study results

Hi, thanks for your work.
Table 1 of the AltCLIP paper reports AltCLIP_T results of (74.5, 59.6) on ImageNet-1k, but in the Table 3 ablation study the numbers become (51.61, 41.66).
What explains this difference?

[Question] After training on the CMRC downstream task, how should inference be done?

Like this?

text = '''
问题:1994年3月,范廷颂担任什么职务?答案:[MASK] 范廷颂枢机(,),圣名保禄·若瑟(),是越南罗马天主教枢机。1963年被任为主教;1990年被擢升为天主教河内总教区宗座署理;1994年被擢升为总主教,同年年底被擢升为枢机;2009年2月离世。范廷颂于1919年6月15日在越南宁平省天主教发艳教区出生;童年时接受良好教育后,被一位越南神父带到河内继续其学业。范廷颂于1940年在河内大修道院完成神学学业。范廷颂于1949年6月6日在河内的主教座堂晋铎;及后被派到圣女小德兰孤儿院服务。1950年代,范廷颂在河内堂区创建移民接待中心以收容到河内避战的难民。1954年,法越战争结束,越南**共和国建都河内,当时很多天主教神职人员逃至越南的南方,但范廷颂仍然留在河内。翌年管理圣若望小修院;惟在1960年因捍卫修院的自由、自治及拒绝政府在修院设政治课的要求而被捕。1963年4月5日,教宗任命范廷颂为天主教北宁教区主教,同年8月15日就任;其牧铭为「我信天主的爱」。由于范廷颂被越南政府软禁差不多30年,因此他无法到所属堂区进行牧灵工作而专注研读等工作。范廷颂除了面对战争、贫困、被当局**天主教会等问题外,也秘密恢复修院、创建女修会团体等。1990年,教宗若望保禄二世在同年6月18日擢升范廷颂为天主教河内总教区宗座署理以填补该教区总主教的空缺。1994年3月23日,范廷颂被教宗若望保禄二世擢升为天主教河内总教区总主教并兼天主教谅山教区宗座署理;同年11月26日,若望保禄二世擢升    
'''
output=predictor.predict_generate_beamsearch(
    text, 
    out_max_length = 30
)
output

Or like this?

text = '''
问题:1994年3月,范廷颂担任什么职务?答案:[MASK] 范廷颂枢机(,),圣名保禄·若瑟(),是越南罗马天主教枢机。1963年被任为主教;1990年被擢升为天主教河内总教区宗座署理;1994年被擢升为总主教,同年年底被擢升为枢机;2009年2月离世。范廷颂于1919年6月15日在越南宁平省天主教发艳教区出生;童年时接受良好教育后,被一位越南神父带到河内继续其学业。范廷颂于1940年在河内大修道院完成神学学业。范廷颂于1949年6月6日在河内的主教座堂晋铎;及后被派到圣女小德兰孤儿院服务。1950年代,范廷颂在河内堂区创建移民接待中心以收容到河内避战的难民。1954年,法越战争结束,越南**共和国建都河内,当时很多天主教神职人员逃至越南的南方,但范廷颂仍然留在河内。翌年管理圣若望小修院;惟在1960年因捍卫修院的自由、自治及拒绝政府在修院设政治课的要求而被捕。1963年4月5日,教宗任命范廷颂为天主教北宁教区主教,同年8月15日就任;其牧铭为「我信天主的爱」。由于范廷颂被越南政府软禁差不多30年,因此他无法到所属堂区进行牧灵工作而专注研读等工作。范廷颂除了面对战争、贫困、被当局**天主教会等问题外,也秘密恢复修院、创建女修会团体等。1990年,教宗若望保禄二世在同年6月18日擢升范廷颂为天主教河内总教区宗座署理以填补该教区总主教的空缺。1994年3月23日,范廷颂被教宗若望保禄二世擢升为天主教河内总教区总主教并兼天主教谅山教区宗座署理;同年11月26日,若望保禄二世擢升    
'''
output=predictor.predict_generate_randomsample(
    text, 
    out_max_length = 30
)
output

Neither output looks right.

The text before [MASK] is the question; the text after it is the context.

Add Dreambooth example for AltDiffusion

Hi, I see that diffusers already supports AltDiffusion, and I tried DreamBooth on AltDiffusion using diffusers. It only requires changing the original StableDiffusion pipeline to the AltDiffusion pipeline and replacing the text encoder, and I got results that look great!

Here are some results I generated in Chinese. I used the special token <鸣人> to represent Uzumaki Naruto.

Prompt: 一张<鸣人>男孩的照片,背景是沙漠,masterpieces

Prompt: 一张<鸣人>男孩的照片,背景是富士山,masterpieces

Prompt: 一张<鸣人>男孩的照片,铅笔素描

Prompt: 一张<鸣人>男孩的照片,油画梵高风格

I'm curious whether FlagAI could support DreamBooth. Thanks!
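For context, the pipeline swap described above looks roughly like this at inference time (a sketch; "./dreambooth-altdiffusion" is a hypothetical output directory of a DreamBooth training run, not something shipped with FlagAI or diffusers):

import torch
from diffusers import AltDiffusionPipeline

# Load a DreamBooth-fine-tuned AltDiffusion checkpoint from a local directory.
pipe = AltDiffusionPipeline.from_pretrained("./dreambooth-altdiffusion",
                                            torch_dtype=torch.float16).to("cuda")

# Generate with the learned special token, as in the prompts above.
image = pipe("一张<鸣人>男孩的照片,背景是沙漠,masterpieces").images[0]
image.save("naruto_desert.png")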


Cannot reproduce the classical-poetry generation results

How much fine-tuning data was used to produce the poetry-generation results shown in the tutorial? Fine-tuning with the sample data does not give good results.
[screenshot]

glm_title_ch.py raises KeyError: 'position_ids'

本文总结了十个可穿戴产品的设计原则,而这些原则同样也是笔者认为是这个行业最吸引人的地方,1为人们解决重复性问题2从人开始而不是从机器开始3要引起注意但不要刻意4提升用户能力而不是取代人。 :
--------------sample 0 :-------------------
-----------random sample: --------------
{'input_ids': [23694, 35526, 12895, 43392, 32153, 2837, 101, 1369, 43359, 24733, 1369, 1736, 88, 11921, 5789, 15658, 43469, 39550, 247, 4153, 43377, 797, 341, 3075, 30639, 43372, 43576, 43371, 71, 1878, 43576, 1354, 71, 43393, 43385, 817, 295, 30057, 7692, 43413, 852, 439, 169, 1878, 6170, 43371, 43361], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}
Traceback (most recent call last):
  File "glm_title_ch.py", line 33, in <module>
    predictor.predict_generate_randomsample(text,
  File "/home/lz/miniconda3/envs/flagai/lib/python3.8/site-packages/flagai/model/predictor/predictor.py", line 288, in predict_generate_randomsample
    return glm_random_sample(self.model, self.tokenizer, text,
  File "/home/lz/miniconda3/envs/flagai/lib/python3.8/site-packages/flagai/model/predictor/utils.py", line 600, in glm_random_sample
    position_ids = torch.tensor([data['position_ids']],
KeyError: 'position_ids'

[Question] Is there a direct download link for the models?

Hello, is there a direct download address for the models? Downloading through the code is quite slow. Are there Baidu Netdisk or similar links for the model files? Thanks!

[Question] Training the TNews text-classification task with GLM gives low accuracy.

Hello, below is the code I used:

import os
import numpy as np
import torch
from torch.utils.data import Dataset
from flagai.auto_model.auto_loader import AutoLoader
from flagai.trainer import Trainer
from flagai.metrics import accuracy_metric
from flagai.data.dataset import SuperGlueDataset
from flagai.test_utils import CollateArguments
from flagai.data.dataset import ConstructSuperglueStrategy
from flagai.data.dataset.superglue.control import DEFAULT_METRICS, MULTI_TOKEN_TASKS, CH_TASKS


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

task_name = "tnews"
auto_loader = AutoLoader('classification',
                        model_name="GLM-large-ch",
                        model_dir="./checkpoints",
                        load_pretrain_params=True,
                        class_num=15)

cl_args = CollateArguments()
cl_args.cloze_eval = False
cl_args.multi_token = task_name in MULTI_TOKEN_TASKS

model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()

train_dataset = SuperGlueDataset(task_name=task_name,
                                 data_dir='./datasets/',
                                 dataset_type='train',
                                 tokenizer=tokenizer)

collate_fn = ConstructSuperglueStrategy(cl_args,
                                        tokenizer,
                                        task_name=task_name)

valid_dataset = SuperGlueDataset(task_name=task_name,
                                 data_dir='./datasets/',
                                 dataset_type='dev',
                                 tokenizer=tokenizer)

trainer = Trainer(
    env_type="pytorch",
    experiment_name="GLM_cls",
    batch_size=1,
    lr=1e-5,
    weight_decay=1e-5,
    epochs=10,
    log_interval=1,
    eval_interval=10000,
    pytorch_device=device,
    checkpoint_activations=False,
    save_dir="./glm_cls",
    save_interval=10000,
)

trainer.train(model,
              train_dataset=train_dataset,
              valid_dataset=valid_dataset,
              collate_fn=collate_fn,
              metric_methods=[["acc", accuracy_metric]])

At around 20k iterations the accuracy reaches 61%, and afterwards it keeps dropping, which looks like overfitting. But 61% is also the highest accuracy ever reached. What could be the cause?

With the unmodified train_large_clue.py code, the loss keeps oscillating and never converges; accuracy is only 6%.

import torch
from flagai.trainer import Trainer
from flagai.model.glm_model import GLMForSequenceClassification
from flagai.data.tokenizer import Tokenizer

from flagai.metrics import accuracy_metric
from flagai.data.dataset import SuperGlueDataset
from flagai.test_utils import CollateArguments
from flagai.data.dataset.superglue.control import DEFAULT_METRICS, MULTI_TOKEN_TASKS, CH_TASKS
from flagai.data.dataset import ConstructSuperglueStrategy


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

task_name = "tnews"
model_name = 'GLM-large-ch'
cl_args = CollateArguments()
cl_args.cloze_eval = False
cl_args.multi_token = task_name in MULTI_TOKEN_TASKS

tokenizer = Tokenizer.from_pretrained(model_name)

class_num = 15
model = GLMForSequenceClassification.from_pretrain(model_name=model_name, spell_length=2,
                                                   class_num=class_num, tune_prefix_layers=1)

train_dataset = SuperGlueDataset(task_name=task_name,
                                 data_dir='./datasets/',
                                 dataset_type='train',
                                 tokenizer=tokenizer)

collate_fn = ConstructSuperglueStrategy(cl_args,
                                        tokenizer,
                                        task_name=task_name)

valid_dataset = SuperGlueDataset(task_name=task_name,
                                 data_dir='./datasets/',
                                 dataset_type='dev',
                                 tokenizer=tokenizer)

trainer = Trainer(env_type='pytorch',
                  pytorch_device=device,
                  epochs=2,
                  batch_size=1,
                  lr=1e-5,
                  weight_decay=1e-5,
                  eval_interval=10000,
                  checkpoint_activations=False,
                  fp16=True,
                  log_interval=1000,
                  save_interval=10000,
                  save_dir="./glm_large_clue")

trainer.train(model,
              train_dataset=train_dataset,
              valid_dataset=valid_dataset,
              collate_fn=collate_fn,
              metric_methods=[["acc", accuracy_metric]])

AltCLIP training for the first stage

Hi, thanks for your great work on AltCLIP.

In the paper, for the first stage of training AltCLIP, the teacher model's text embedding is extracted from the original CLIP text encoder as the [TOS] token. Is this [TOS] token selected as in OpenAI's CLIP (https://github.com/openai/CLIP/blob/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1/clip/model.py#L354)?
x = x[torch.arange(x.shape[0]), text.argmax(dim=-1)]
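For reference, that line pools, for each sequence in the batch, the hidden state at the position of the largest token id, which in CLIP's BPE vocabulary is the end-of-text token (id 49407). A toy sketch of the indexing, with shapes chosen only for illustration:

import torch

x = torch.randn(2, 4, 3)                      # [batch, seq_len, hidden]
text = torch.tensor([[49406, 320, 49407, 0],  # 49407 = end-of-text id
                     [49406, 320, 640, 49407]])
# argmax over token ids finds the end-of-text position in each sequence;
# advanced indexing then gathers that position's hidden state.
pooled = x[torch.arange(x.shape[0]), text.argmax(dim=-1)]
print(pooled.shape)                           # torch.Size([2, 3])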

But Figure 1 in the paper uses the [EOS] token. Are [TOS] and [EOS] the same token as in OpenAI's implementation?

And does the student model XLM-R use the [CLS] token as the text embedding when computing the teacher-student MSE loss?

[BUG] Runtime Error from CLUE example

Describe the bug
Running train_10b_clue.py from ./examples/glm_superglue on the afqmc task of CLUE, with the pytorch env setting.

task_name = 'afqmc'
trainer = Trainer(env_type="pytorch",
                  batch_size=16,
                  epochs=10,
                  eval_interval=10,
                  load_dir=None,
                  pytorch_device="cuda",
                  save_dir="./glm_superglue_en",
                  save_epoch=1)

model = GLMForSingleTokenCloze.from_pretrain(download_path="/mnt/test_10b_models",
                                             model_name="GLM-large-ch")


tokenizer = GLMLargeChTokenizer()
train_dataset = SuperGlueDataset(task_name=task_name,
                                 data_dir='./datasets/',
                                 dataset_type='train',
                                 tokenizer=tokenizer,
                                 cloze_eval=True)
valid_dataset = SuperGlueDataset(task_name=task_name,
                                 data_dir='./datasets/',
                                 dataset_type='dev',
                                 tokenizer=tokenizer,
                                 cloze_eval=True)

cl_args = CollateArguments()
cl_args.cloze_eval = True
cl_args.multi_token = False

collate_fn = ConstructSuperglueStrategy(cl_args,
                                        tokenizer,
                                        task_name=task_name)
trainer.train(model,
              train_dataset=train_dataset,
              valid_dataset=valid_dataset,
              collate_fn=collate_fn,
              metric_methods=[["acc", accuracy_metric]])

Tasks

  • An officially supported task in the examples folder (such as GLUE/Title-generation, ...)
  • My own task or dataset

To Reproduce

(base) root@deepspeed:~/FlagAI/examples/glm_superglue# python train_10b_clue.py
file cog-pretrain.vocab not exist in ['cog-pretrain.model', 'cog-pratrain.vocab', 'pytorch_model.bin', 'vocab.txt', 'config.json', 'README.md']
{'pad': 50000, 'eos': 50000, 'sep': 50001, 'ENC': 50002, 'MASK': 50003, 'unk': 50004, 'sop': 50006, 'eop': 50007, 'sMASK': 50008, 'gMASK': 50009}
Creating afqmc dataset from file at ./datasets/ (split=train)
Returning 34334 train examples with label dist.: [('0', 23761), ('1', 10573)]
Creating afqmc dataset from file at ./datasets/ (split=dev)
Returning 4316 dev examples with label dist.: [('0', 2978), ('1', 1338)]
Optimizer = Adam
[2022-06-09 10:10:29,530] [INFO] [logger.py:70:log_dist] [Rank -1] loading checkpoints form None
[2022-06-09 10:10:29,530] [INFO] [logger.py:70:log_dist] [Rank -1] working on epoch 0 ...
Traceback (most recent call last):
  File "train_10b_clue.py", line 64, in <module>
    trainer.train(model,
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/trainer.py", line 460, in train
    lm_loss, skipped_iter, _ = self.train_step(batch,
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/trainer.py", line 568, in train_step
    step_output = self.forward_step(data, model, mems)
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/trainer.py", line 635, in forward_step
    model_output = model(**data)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/model/glm_model.py", line 754, in forward
    model_out = self.model(input_ids,
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/model/glm_model.py", line 453, in forward
    loss = F.cross_entropy(logits_parallel.contiguous().float(),
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2996, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: Expected target size [16, 50048], got [16]

OS (please complete the following information):

  • Version: v1.0.1

AutoLoader initialization raises TypeError: __init__() got multiple values for argument 'text_config'

1. Initialization code:

auto_loader = AutoLoader(
    task_name="txt_img_matching",
    model_dir="./checkpoints",
    model_name="AltCLIP-XLMR-L"  # Load the checkpoints from Modelhub (model.baai.ac.cn/models)
)

2. Error:

  File "d:\Users\bigdata\Anaconda3\lib\site-packages\transformers\configuration_utils.py", line 688, in from_dict
    config = cls(**config_dict)
  File "C:\Users\bigdata\AppData\Roaming\Python\Python38\site-packages\flagai\model\mm\AltCLIP.py", line 79, in __init__
    super().__init__(text_config_dict, vision_config_dict, projection_dim,
                     logit_scale_init_value, **kwargs)
TypeError: __init__() got multiple values for argument 'text_config'

OPT-66B mp_size

Describe the bug
When splitting the OPT-66B model for model parallelism, it reports that a size is not evenly divisible.

