flagai-open / flagai

FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.

License: Apache License 2.0

Python 99.89% Shell 0.04% Dockerfile 0.07%

flagai's People

Contributors

920232796, anhforth, baai-openplatform, baai-wudao, csyourui, eggiter, eltociear, fade-color, ftgreat, isuco, jongjyh, ledw-2, lindylin1817, lockmatrix, marscrazy, noahre1, quan-sun, rockiesiyuanzhang, shunxing1234, siyu-hu, superhero-7, wchh-2000, xav1erw, xiaofengshi, xuanricheng, zacliu2023, zhanglu0704, zhaodongyan1, zhiyongliu1114, zhiyuan-fan


flagai's Issues

[BUG] superGLUE example bug

Describe the bug

A KeyError is raised by the superGLUE example:

task_name = 'qqp'
trainer = Trainer(env_type='pytorch',
                  pytorch_device="cuda",
                  epochs=2,
                  batch_size=1,
                  eval_interval=1000,
                  checkpoint_activations=False,
                  fp16=True,
                  log_interval=1,
                  save_dir="./glm_superglue_en",
                  # master_ip='127.0.0.1',
                  # master_port=17755,
                  # num_nodes=1,
                  # num_gpus=2,
                  # hostfile='./hostfile',
                  model_parallel_size=2,
                  deepspeed_config='./deepspeed.json',
                  training_script=__file__)

model = GLMForSingleTokenCloze.from_pretrain(download_path="/mnt/test_10b_models",
                                             model_name="GLM-large-en")

tokenizer = GLM10bENBPETokenizer()

train_dataset = SuperGlueDataset(task_name=task_name,
                                 data_dir='./datasets/',
                                 dataset_type='train',
                                 tokenizer=tokenizer,
                                 cloze_eval=True)
valid_dataset = SuperGlueDataset(task_name=task_name,
                                 data_dir='./datasets/',
                                 dataset_type='dev',
                                 tokenizer=tokenizer,
                                 cloze_eval=True)

cl_args = CollateArguments()
cl_args.cloze_eval = True

if task_name in ['copa', 'wsc', 'record']:
    cl_args.multi_token = True

from flagai.data.dataset import ConstructSuperglueStrategy

collate_fn = ConstructSuperglueStrategy(cl_args,
                                        tokenizer,
                                        task_name=task_name)
trainer.train(model,
              train_dataset=train_dataset,
              valid_dataset=valid_dataset,
              collate_fn=collate_fn,
              metric_methods=[["acc", accuracy_metric]])

Tasks

  • An officially supported task in the examples folder (such as GLUE/Title-generation, ...)
  • My own task or dataset

To Reproduce

Creating qqp dataset from file at ./datasets/ (split=train)
Returning 363846 train examples with label dist.: [('0', 229468), ('1', 134378)]
Creating qqp dataset from file at ./datasets/ (split=dev)
Returning 40430 dev examples with label dist.: [('0', 25545), ('1', 14885)]
Optimizer = Adam
[2022-06-08 17:54:06,911] [INFO] [logger.py:70:log_dist] [Rank -1] loading checkpoints form checkpoints/99
[2022-06-08 17:54:06,912] [INFO] [logger.py:70:log_dist] [Rank -1] WARNING: could not find the metadata file checkpoints/99/latest_checkpointed_iteration.txt
[2022-06-08 17:54:06,912] [INFO] [logger.py:70:log_dist] [Rank -1]     will not load any checkpoints and will start from random
[2022-06-08 17:54:06,912] [INFO] [logger.py:70:log_dist] [Rank -1] working on epoch 0 ...
Traceback (most recent call last):
  File "train_10b_superglue.py", line 59, in <module>
    trainer.train(model,
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/trainer.py", line 448, in train
    for iteration_, batch in enumerate(train_dataloader):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 570, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/data/dataset/data_collator/collate_fn.py", line 105, in __call__
    sample = self.pvp.encode(example, {})
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/data/dataset/superglue/pvp.py", line 195, in encode
    raw_parts_a, raw_parts_b = self.get_parts(example)
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/data/dataset/superglue/pvp.py", line 1493, in get_parts
    return [text_a], [" Do you mean ", text_b, [self.mask], "."]
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/data/dataset/superglue/pvp.py", line 99, in mask
    return self.tokenizer.get_command('MASK').Id
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/data/tokenizer/tokenizer.py", line 172, in get_command
    return self.command_name_map[name]
KeyError: 'MASK'
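The traceback shows that GLM10bENBPETokenizer's command_name_map contains no 'MASK' entry, which the qqp pattern-verbalizer requires. A hedged sanity-check sketch, using the Tokenizer API that appears in later issues in this tracker; that it registers 'MASK' for GLM-large-en is an assumption:

from flagai.data.tokenizer import Tokenizer

# Build the tokenizer that matches the checkpoint rather than hard-coding
# GLM10bENBPETokenizer, then fail fast if the MASK command is missing.
tokenizer = Tokenizer.from_pretrained("GLM-large-en")
print(tokenizer.get_command_id('MASK'))  # raises KeyError here, not mid-training, if absent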


OS (please complete the following information):

  • Version: v1.0.1

[BUG]

Hello, I ran into the following problem while running the inference code for the classical-poetry generation task. How should I deal with it?


cannot import name 'clock_settime' from 'time' (unknown location)


[BUG] Errors in MLM training of Bert

Describe the bug
There is an error when I try to fine-tune a BERT model on a masked language modeling task.

Tasks

  • An officially supported task in the examples folder (such as GLUE/Title-generation, ...)
  • My own task or dataset

To Reproduce
https://github.com/marscrazy/Tab2NL/blob/train_with_flagai/train_our_flagai.py

import os
import argparse
from data import get_dataset
from sklearn.metrics import roc_auc_score
import numpy as np
import random
import time
import torch
from flagai.trainer import Trainer
from flagai.auto_model.auto_loader import AutoLoader
from transformers import DataCollatorForLanguageModeling, AutoTokenizer

def set_seed(SEED):
    torch.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)
    np.random.seed(SEED)
    random.seed(SEED)
    #torch.backends.cudnn.deterministic = True
set_seed(26)

def compute_metrics(predictions, labels, meta=None):
    predictions = predictions[:,1]
    return {'roc_auc':roc_auc_score(labels,predictions)}

class txtDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

def finetuning_model(
        train_x, train_y, val_x, val_y, cv_fold=1, dataset_id=11,
        model_dir = "bert-base-ch", #bert-base-uncased
        is_mlm = False,
        num_train_epochs=10, #10
        per_device_train_batch_size=16,  # batch size per device during training
        per_device_eval_batch_size=32,  # batch size for evaluation
        warmup_steps=200,  # number of warmup steps for learning rate scheduler
        weight_decay=0.1,  # strength of weight decay
        logging_steps=100,#20
        seed=11,
        learning_rate=4e-5,
        metric_for_best_model=None,
        config = None,
        tokenizer = None,
        model = None,
        output_dir = None,
        logging_dir = None,
        return_model = False
):
    if output_dir is None:
        output_dir = './results/'+str(dataset_id)+'-cv-'+str(cv_fold)+'-mlm'
    if logging_dir is None:
        logging_dir = './logs/'+str(dataset_id)+'-cv-'+str(cv_fold)+'-mlm'
    #if config is None:
        # config = AutoConfig.from_pretrained(model_dir)
    #    import json
    #    config = json.load(open('./checkpoints/BERT-base-en/config.json'))

    if model is None:
        if is_mlm:
            auto_loader = AutoLoader(
            "masklm",
            model_name="BERT-base-en",
            model_dir='./checkpoints',
            )
        else:
            auto_loader = AutoLoader(
            "classification",
            model_name="BERT-base-en",
            model_dir='./checkpoints',
            class_num = 2
            )
        model = auto_loader.get_model()
        tokenizer = AutoTokenizer.from_pretrained("./checkpoints/BERT-base-en")
    train_encodings = tokenizer(train_x.tolist(), truncation=True, padding=True)
    val_encodings = tokenizer(val_x.tolist(), truncation=True, padding=True)
    train_dataset = txtDataset(train_encodings, train_y.astype(np.longlong))
    val_dataset = txtDataset(val_encodings, val_y.astype(np.longlong))
    if is_mlm:
        data_collator = DataCollatorForLanguageModeling(
            tokenizer=tokenizer,
            mlm_probability=0.15
        )
    class MyTrainer(Trainer):
        def forward_step(self, data, model, mems):
            model_output = model(**{'input_ids':data['input_ids'],
                                  'segment_ids':data['token_type_ids'],
                                  'attention_mask':data['attention_mask']
                                 })
            print(model_output)
    trainer = MyTrainer(
        env_type='pytorch',
        epochs=num_train_epochs,
        weight_decay=weight_decay,
        log_interval=logging_steps,
        seed=seed,
        lr=learning_rate,
        save_dir=output_dir,
        tensorboard_dir=logging_dir
    )
    trainer.train(model=model,  # the instantiated 🤗 Transformers model to be trained
        train_dataset=train_dataset,  # training dataset
        valid_dataset=val_dataset,  # evaluation dataset
        metric_methods=[compute_metrics] if not is_mlm else [],
        collate_fn=data_collator if is_mlm else None)

    dir_name = os.listdir(output_dir)[0]
    cur_model_dir = os.path.join(output_dir,dir_name)
    del model
    torch.cuda.empty_cache()
    time.sleep(5)
    if return_model:
        return cur_model_dir, tokenizer, config
   
def train_ptm_cls(train_x,train_y,val_x, val_y, test_x, test_y, cv_fold=1, dataset_id=11,tokenizer=None, config=None,
                  model_dir = "../contrastive/resources/bert-base-uncased"):

    train_encodings = tokenizer(train_x.tolist(), truncation=True, padding=True)
    val_encodings = tokenizer(val_x.tolist(), truncation=True, padding=True)
    test_encodings = tokenizer(test_x.tolist(), truncation=True, padding=True)

    train_dataset = txtDataset(train_encodings, train_y.astype(np.longlong))
    test_dataset = txtDataset(test_encodings, test_y.astype(np.longlong))
    val_dataset = txtDataset(val_encodings, val_y.astype(np.longlong))
    model = AutoModelForSequenceClassification.from_pretrained(model_dir , config=config, from_tf=False,num_labels=2)
    output_dir = './results/'+str(dataset_id)+'-cv-'+str(cv_fold)+'-cls'
    log_dir = './logs/'+str(dataset_id)+'-cv-'+str(cv_fold)+'-cls'
    training_args = TrainingArguments(
        output_dir=output_dir,  # output directory
        num_train_epochs=10,  # total number of training epochs 
        per_device_train_batch_size=32,  # batch size per device during training
        per_device_eval_batch_size=32,  # batch size for evaluation
        warmup_steps=1000,  # number of warmup steps for learning rate scheduler
        weight_decay=0.1,  # strength of weight decay
        logging_dir=log_dir,  # directory for storing logs
        logging_steps=10, 
        eval_steps=10,
        save_steps=10,
        save_total_limit=1,
        do_eval=True,
        evaluation_strategy='steps',
        learning_rate=2e-5,
        seed=11,
        #save_strategy='steps',
        load_best_model_at_end=True,
        metric_for_best_model="roc_auc"
    )
    trainer = Trainer(
        model=model,  # the instantiated 🤗 Transformers model to be trained
        args=training_args,  # training arguments, defined above
        train_dataset=train_dataset,  # training dataset
        eval_dataset=test_dataset,  # evaluation dataset
        compute_metrics=compute_metrics
        #optimizers=(optimizer,None)
    )
    trainer.train()
    train_rs = trainer.evaluate(train_dataset)
    test_rs = trainer.evaluate(test_dataset)
    val_rs = trainer.evaluate(val_dataset)
    return train_rs['eval_roc_auc'], val_rs['eval_roc_auc'],test_rs['eval_roc_auc']


def train(dataset_id=1):
    ds = get_dataset(dataset_id=dataset_id)
    rs = []
    for i, (train_x, val_x, test_x, train_y, val_y, test_y) in enumerate(ds.generate_datasets(to_txt=True)):
        model_dir,tokenizer, config = finetuning_model(train_x,train_y,val_x, val_y,cv_fold=i, dataset_id=dataset_id,
                  model_dir = "../contrastive/resources/bert-base-uncased",is_mlm=True)
        train_auc, val_auc, test_auc = finetuning_model(
            train_x,train_y,val_x, val_y, test_x, test_y, cv_fold=i,dataset_id= dataset_id,tokenizer=tokenizer, config= config,
                  model_dir = model_dir,is_mlm=False)
        rs.append((train_auc,val_auc,test_auc))
        print("Train auc {:.3f}, val auc {:.3f}, Test auc {:.3f}".format(train_auc, val_auc, test_auc))

    for x,y,z in rs:
        print("Train auc {:.3f}, Val auc {:.3f}, Test auc {:.3f}".format(x,y,z))
    print("avg auc is {:.3f}\t{:.3f}".format(np.mean([x[-1] for x in rs]), np.std([x[-1] for x in rs])))
    #train_xgb(ds)

if __name__=="__main__":
    parser = argparse.ArgumentParser(description='Train Classifier with mixup', formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    # Data
    parser.add_argument('--model_dir', type=str, default='H:\\contrast\\SimCSE-main\\SimCSE-main\\bert-base-uncased',help='the path to pretrained models')
    parser.add_argument('--dataset_id', type=str, default='11',choices=['1','2','3','4','5','6','7','8','9','10','11'], help='Choose between 1-11.')
    # MLM pretrain
    parser.add_argument('--mlm_warmup_steps', default=1000, type=int, metavar='N', help='warmup steps (default: 1000)')
    parser.add_argument('--mlm_learning_rate', type=float, default=2e-5)
    parser.add_argument('--mlm_decay', type=float, default=0.1, help='weight decay (L2 penalty)')
    parser.add_argument('--mlm_epochs', type=int, default=300, help='number of epochs to train')
    parser.add_argument('--mlm_train_batch_size', type=int, default=32)
    parser.add_argument('--mlm_eval_batch_size', type=int, default=32)
    parser.add_argument('--mlm_logging_steps', default=10, type=int, metavar='N', help='logging frequency (default: 10)')
    # text classification
    parser.add_argument('--cls_epochs', type=int, default=300, help='number of epochs to train')
    parser.add_argument('--cls_train_batch_size', type=int, default=32)
    parser.add_argument('--cls_eval_batch_size', type=int, default=32)
    parser.add_argument('--cls_warmup_steps', default=1000, type=int, metavar='N', help='warmup steps (default: 1000)')
    parser.add_argument('--cls_decay', type=float, default=0.1, help='weight decay (L2 penalty)')
    parser.add_argument('--cls_logging_steps', default=10, type=int, metavar='N', help='logging frequency (default: 10)')
    parser.add_argument('--cls_learning_rate', type=float, default=2e-5)
    # Optimization options
    #parser.add_argument('--train', type=str, default='vanilla', choices=['vanilla', 'mixup', 'mixup_hidden', 'SRRS'], help='mixup layer')
    # training
    #parser.add_argument('--momentum', type=float, default=0.9)
    #parser.add_argument('--schedule', type=int, nargs='+', default=[150, 225], help='decrease learning rate at these epochs')
    #parser.add_argument('--gammas', type=float, nargs='+', default=[0.1, 0.1], help='LR is multiplied by gamma on schedule, number of gammas should be equal to schedule')

    # Checkpoints
    parser.add_argument('--resume', default='', type=str, metavar='PATH', help='path to latest checkpoint (default: none)')
    parser.add_argument('--start_epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)')
    # random seed
    parser.add_argument('--seed', default=0, type=int, help='manual seed')
    parser.add_argument('--add_name', type=str, default='')
    parser.add_argument('--job_id', type=str, default='')
    args = parser.parse_args()
    ds = get_dataset(dataset_id=int(args.dataset_id))
    rs = []
    for i, (train_x, val_x, test_x, train_y, val_y, test_y) in enumerate(ds.generate_datasets(to_txt=True,with_title=True if args.dataset_id not in ['1','3'] else False)):
        model_dir,tokenizer, config = finetuning_model(train_x, train_y, val_x, val_y,cv_fold=i, dataset_id=args.dataset_id,
        model_dir = "hkunlp/T5_large_prefix_all_tasks_2upsample2",#bert-base-uncased,hkunlp/from_all_T5_large_prefix_sql2text2
        is_mlm = True,
        num_train_epochs=10,  #args.mlm_epochs,10
        per_device_train_batch_size=args.mlm_train_batch_size,  # batch size per device during training
        per_device_eval_batch_size=args.mlm_eval_batch_size,  # batch size for evaluation
        warmup_steps=args.mlm_warmup_steps,  # number of warmup steps for learning rate scheduler
        weight_decay=args.mlm_decay,  # strength of weight decay
        logging_steps=100,#20
        seed=11,
        learning_rate=4e-5,
        metric_for_best_model=None,
        config = None,
        tokenizer = None,
        model = None,
        output_dir = None,
        logging_dir = None,
        return_model = False)
        
        model_dir,tokenizer,config, trainer= finetuning_model(
            train_x, train_y, val_x, val_y, cv_fold=i,dataset_id= args.dataset_id,tokenizer=tokenizer, config= config,
                  model_dir = model_dir,is_mlm=False, return_model=True)
        test_encodings = tokenizer(test_x.tolist(), truncation=True, padding=True)
        test_dataset = txtDataset(test_encodings, test_y.astype(np.longlong))
        test_auc = trainer.evaluate(test_dataset)['eval_roc_auc']
        rs.append(test_auc)
        print("Test auc {:.3f}".format(test_auc))
    print("avg auc is {:.3f}\t{:.3f}".format(np.mean(rs),np.std(rs)))

Expected behavior
Fine-tuning BERT on MLM and classification tasks should run successfully.



[BUG]

Hello, how large a model was used to generate the GLM title-generation example in the tutorial? I ran the glm_title_ch.py code from quick_start with glm-10b-ch and the results are not ideal.

The output looks like this:

[screenshot]

How to export a fine-tuned CLIP model

trainer.train(model=model, train_dataset=dataset, collate_fn=cifar10_collate_fn)
How do I export the model for inference after trainer.train completes?
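A minimal sketch of one way to do this, assuming the object returned by the loader is a standard torch.nn.Module (FlagAI's Trainer also writes checkpoints into save_dir during training, but relying on that here is an assumption):

import torch

# After trainer.train(...) returns, persist the fine-tuned weights.
torch.save(model.state_dict(), "clip_finetuned.pt")

# For inference, rebuild the model exactly as it was built for training,
# load the saved weights, and switch to eval mode.
model.load_state_dict(torch.load("clip_finetuned.pt", map_location="cpu"))
model.eval()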

Out of GPU memory

I was testing AltDiffusion. The documentation says more than 10 GB of VRAM is enough, but it runs out of memory on my 12 GB GPU. Why is that?

Is there code to reproduce the GLM pretrained models from random initialization on the full dataset?

As the title asks.
A general problem with the project right now is the lack of code for real, full-scale training.
The examples all use tiny amounts of data, so unless GLM has very strong few-shot ability,
users cannot validate the training process and model quality on their own data.

The released models, such as GLM-large-ch and the other preloadable checkpoints, all work very well.
It would be great to publish the full process that takes these models from random initialization and full data to the finished checkpoints.

In other words, the project is excellent in the out-of-the-box sense, but there is little to go on for reproducing the models or debugging full-data training.

Could you open-source more of these parts? Thanks!

[BUG] Missing multilingual information at the beginning of the AltDiffusion README

Congratulations on the new release of AltDiffusion-m9, which supports 9 popular languages. But when I followed the link to examples/AltDiffusion, I couldn't find any m9 information in the README until almost the end of the file.

It would be good to add the multilingual support information at the very beginning of the README.

Loss is None when following the CLIP fine-tuning example

Hello, I followed the example in the README and fine-tuned on my own dataset. After the first iteration, every lm_loss is None.
I printed data inside forward_step and the data looks normal, but I cannot get a correct model_output. Could someone take a look?

[TypeError: accuracy_metric() got an unexpected keyword argument 'tokenizer']

Describe the bug
A clear and concise description of what the bug is.

Tasks

  • glm_superglue
  • tnews

To Reproduce

Traceback (most recent call last):
  File "train_large_clue.py", line 51, in <module>
    trainer.train(model,
  File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/flagai-1.6.1-py3.8.egg/flagai/trainer.py", line 598, in train
    eval_dict = self.evaluate_and_print_results(
  File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/flagai-1.6.1-py3.8.egg/flagai/trainer.py", line 1103, in evaluate_and_print_results
    eval_dict = self.evaluate(forward_step_func=forward_step_func,
  File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/flagai-1.6.1-py3.8.egg/flagai/trainer.py", line 1051, in evaluate
    metrics[i] += eval_method(all_logits, all_labels, meta=meta, tokenizer=self.tokenizer)
TypeError: accuracy_metric() got an unexpected keyword argument 'tokenizer'

The corresponding code:

"""train_large_clue.py""" 
trainer.train(model,
              train_dataset=train_dataset,
              valid_dataset=valid_dataset,
              collate_fn=collate_fn,
              metric_methods=[["acc", accuracy_metric]])


"""flagai.metrics.accuracy_metric.py""" 

def accuracy_metric(predictions, labels, meta=None):
    '''
    predictions: torch.size(n, class_num)
    labels: torch.size(n)
    '''
    count = 0
    assert len(predictions) == len(labels)
    if predictions.size() != labels.size():      
        predictions = torch.argmax(predictions, dim=-1)
        for prediction, label in zip(predictions, labels):
            count += prediction == label
    else:
        prediction, label = predictions[0], labels[0]
        
        if sigmoid(prediction) >= 0.5:
            count += label == 1
        else:
            count += label == 0
    return 100.0 * count / len(labels)
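A minimal sketch of one possible fix, assuming the Trainer unconditionally forwards tokenizer=... to every metric (as the traceback shows at trainer.py line 1051): let the metric accept and ignore the extra keywords.

import torch

def accuracy_metric(predictions, labels, meta=None, tokenizer=None, **kwargs):
    # Same logic as before; the added tokenizer/**kwargs parameters simply
    # absorb the extra keywords the Trainer forwards during evaluation.
    count = 0
    assert len(predictions) == len(labels)
    if predictions.size() != labels.size():
        predictions = torch.argmax(predictions, dim=-1)
        for prediction, label in zip(predictions, labels):
            count += prediction == label
    else:
        prediction, label = predictions[0], labels[0]
        # torch.sigmoid stands in for the bare sigmoid in the original snippet.
        if torch.sigmoid(prediction) >= 0.5:
            count += label == 1
        else:
            count += label == 0
    return 100.0 * count / len(labels)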

[BUG] error running quickstart/title_en.py

I've just installed the package locally, ran the test code quickstart/title_en.py, and got the following errors.

Any possible reasons? Thanks! See details below.


skys-MacBook-Pro:quickstart sky$ python3 title_en.py
******************** title-generation 100013 bert-base-en
Traceback (most recent call last):
  File "title_en.py", line 29, in <module>
    print(predictor.predict_generate_beamsearch(text, out_max_length=50, beam_size=3))
  File "../flagai/model/predictor/predictor.py", line 231, in predict_generate_beamsearch
    return bert_beamsearch(self.model,
  File "../flagai/model/predictor/utils.py", line 676, in bert_beamsearch
    out_puts_ids = bert_beam_search(model,
  File "../flagai/model/predictor/utils.py", line 280, in bert_beam_search
    scores = bert_predict_generate(model, new_input_ids,
  File "../flagai/model/predictor/utils.py", line 235, in bert_predict_generate
    score = model(**{
  File "/Users/sky/Library/Python/3.8/lib/python/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "../flagai/model/bert_model.py", line 359, in forward
    encoder_out, pooler_out = self.model(
  File "/Users/sky/Library/Python/3.8/lib/python/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "../flagai/model/bert_model.py", line 153, in forward
    extended_attention_mask = extended_attention_mask * attention_mask
RuntimeError: The size of tensor a (3) must match the size of tensor b (171) at non-singleton dimension 2

  • Running title_cn.py gives a similar error:

      File "/Users/sky/Library/Python/3.8/lib/python/site-packages/flagai/model/layers/attentions.py", line 940, in forward
        attention_scores += attention_mask
    RuntimeError: output with shape [1, 12, 90, 90] doesn't match the broadcast shape [1, 1, 1, 12, 90, 90]

run error

Running on Ubuntu with Python 3.9:

loader = AutoLoader(task_name="lm", model_name="opt-1.3b-en")

The following error occurred:

self.wte = nn.Embedding(config.vocab_size, config.n_embd)
AttributeError: 'dict' object has no attribute 'vocab_size'
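The message indicates that the OPT config reached the model as a plain dict, while the embedding code expects attribute access (config.vocab_size). A generic illustration of the mismatch, not a claim about where FlagAI's loader should be patched; the config values below are hypothetical:

from types import SimpleNamespace

config = {"vocab_size": 50272, "n_embd": 2048}  # hypothetical values
# config.vocab_size  -> AttributeError: 'dict' object has no attribute 'vocab_size'
config = SimpleNamespace(**config)  # wrapping the dict restores attribute access
print(config.vocab_size)            # 50272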

Can two RTX 3090s meet the requirements for fine-tuning GLM-10b-ch?

When I fine-tune GLM-10b-ch on two 3090s, I always run out of GPU memory during the validation stage, though never during training. Are two 3090s simply insufficient for fine-tuning GLM-10b-ch, or are my parameters set incorrectly?
Below are the parameters I used for training:
Trainer:

trainer = Trainer(
    env_type="deepspeed+mpu",
    epochs=10,
    experiment_name="GLM-10b-ch-seq2seq",
    eval_interval=2000,
    log_interval=100,
    load_dir=None,
    # parallel settings
    master_ip='127.0.0.1',
    master_port=17750,
    num_nodes=1,
    num_gpus=2,
    hostfile='hostfile',
    training_script=__file__,
    # deepspeed
    deepspeed_config='./config/deepspeed.json',
    # megatron-lm
    model_parallel_size=2,
    save_dir="checkpoints_glm_title_generation",
    save_interval=1,
    num_checkpoints=3,
)

deepspeed.json:

{
    "train_micro_batch_size_per_gpu": 16,
    "eval_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 2,
    "steps_per_print": 100,
    "gradient_clipping": 1.0,
    "zero_optimization": {
      "stage": 3,
      "contiguous_gradients": false,
      "overlap_comm": true,
      "reduce_scatter": true,
      "reduce_bucket_size": 5e7,
      "allgather_bucket_size": 5e7,
      "cpu_offload": true 
    },
    "zero_allow_untested_optimizer": true,
    "fp16": {
      "enabled": true,
      "loss_scale": 0,
      "loss_scale_window": 1000,
      "hysteresis": 2,
      "min_loss_scale": 1
    },
    "optimizer": {
      "type": "Adam",
      "params": {
        "lr": 0.000005,
        "weight_decay": 0.01,
        "betas": [
          0.9,
          0.98
        ],
        "eps": 1e-6
      }
    },
    "activation_checkpointing": {
      "partition_activations": false,
      "contiguous_memory_optimization": false
    },
    "wall_clock_breakdown": false
}

[BUG] glm-10b-ch does not run inference correctly


Tasks

  • An officially supported task in the examples folder (such as GLUE/Title-generation, ...)
  • My own task or dataset

To Reproduce

from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor

if __name__ == '__main__':
    loader = AutoLoader("seq2seq", "glm-10b-ch", model_dir="./checkpoints/")
    model = loader.get_model()
    tokenizer = loader.get_tokenizer()
    predictor = Predictor(model, tokenizer)

    text = "今天天气不错[gMASK]"
    output = predictor.predict_generate_beamsearch(text, out_max_length=5, beam_size=1)
    print(output)

The output is "?? ?? ??".
Debugging shows that the IDs passed to the tokenizer for decoding are all 0.

Environment: Win10 x64, Python 3.10, latest FlagAI from pip. It seems to compute on the CPU by default; CPU usage is clearly high.

Using AutoLoader("lm", "glm-10b-ch", model_dir="./checkpoints/") has the same problem.

OPT finetuning

Is there information on the resource usage for fine-tuning each OPT model size?

[BUG] Connection timeout when executing the AltDiffusion generate example

It seems there is an issue establishing a connection to the Hugging Face proxy to download the safety-checker model. Could we change the safety-checker download URL from Hugging Face to the BAAI ModelHub?

Below is the error output when running python generate.py:

root@-0:~/FlagAI/examples/AltDiffusion# python generate.py
******************** text2img altdiffusion-m9
Extension horovod.torch has not been built: /usr/local/lib/python3.8/dist-packages/horovod/torch/mpi_lib/_mpi_lib.cpython-38-x86_64-linux-gnu.so not found
If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.
Warning! MPI libs are missing, but python applications are still avaiable.
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 64, 64) = 16384 dimensions.
making attention of type 'vanilla' with 512 in_channels
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 96, in create_connection
    raise err
  File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 86, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 358, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f2b1c1d1b80>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='s3-proxy.huggingface.tech', port=443): Max retries exceeded with url: /lfs.huggingface.co/repos/c3/33/c333b2b94c5a8a06ddcbb20b02e728f6bef192870028f8a6859247cabb771a03/64b8393f1afd5a0c1ed2aa5f341fa7c08286839a48f3743162a76a2835c808bd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIA4N7VTDGOZQA2IKWK%2F20230104%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230104T011244Z&X-Amz-Expires=259200&X-Amz-Signature=6b4bb6d3d218d24cf8b030d2ee60679e3175ba64c350072718017b7701b01d02&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22pytorch_model.bin%22&x-id=GetObject (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f2b1c1d1b80>: Failed to establish a new connection: [Errno 110] Connection timed out'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 2007, in from_pretrained
    resolved_archive_file = cached_path(
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 284, in cached_path
    output_path = get_from_cache(
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 594, in get_from_cache
    http_get(url_to_download, temp_file, proxies=proxies, resume_size=resume_size, headers=headers)
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 432, in http_get
    r = requests.get(url, stream=True, proxies=proxies, headers=headers)
  File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='s3-proxy.huggingface.tech', port=443): Max retries exceeded with url: /lfs.huggingface.co/repos/c3/33/c333b2b94c5a8a06ddcbb20b02e728f6bef192870028f8a6859247cabb771a03/64b8393f1afd5a0c1ed2aa5f341fa7c08286839a48f3743162a76a2835c808bd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIA4N7VTDGOZQA2IKWK%2F20230104%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230104T011244Z&X-Amz-Expires=259200&X-Amz-Signature=6b4bb6d3d218d24cf8b030d2ee60679e3175ba64c350072718017b7701b01d02&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22pytorch_model.bin%22&x-id=GetObject (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f2b1c1d1b80>: Failed to establish a new connection: [Errno 110] Connection timed out'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "generate.py", line 19, in <module>
    predictor.predict_generate_images(
  File "/usr/local/lib/python3.8/dist-packages/flagai/model/predictor/predictor.py", line 342, in predict_generate_images
    safety_checker, safety_feature_extractor = get_safety_checker()
  File "/usr/local/lib/python3.8/dist-packages/flagai/model/predictor/utils.py", line 24, in get_safety_checker
    safety_checker = StableDiffusionSafetyChecker.from_pretrained(safety_model_id)
  File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 2096, in from_pretrained
    raise EnvironmentError(
OSError: Can't load the model for 'CompVis/stable-diffusion-safety-checker'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'CompVis/stable-diffusion-safety-checker' is the correct path to a directory containing a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
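Until the download location changes, one possible workaround (a hedged sketch, assuming the checkpoint can be copied onto the machine by other means and a recent huggingface_hub; pointing flagai's get_safety_checker, named in the traceback above, at the local path is left implicit):

# On a machine that can reach Hugging Face, snapshot the safety-checker repo,
# then copy the resulting directory to the target machine.
from huggingface_hub import snapshot_download
snapshot_download("CompVis/stable-diffusion-safety-checker", local_dir="./safety_checker")

# On the target machine, load from the local copy; from_pretrained accepts a
# local directory in place of a hub id.
from diffusers.pipelines.stable_diffusion.safety_checker import StableDiffusionSafetyChecker
safety_checker = StableDiffusionSafetyChecker.from_pretrained("./safety_checker")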

sample configs of img2img

Below is my code; the output is not good, and I wonder whether the prompt is suitable. Could you give me some sample configs?

from diffusers import AltDiffusionPipeline, EulerDiscreteScheduler
from PIL import Image
from diffusers import AltDiffusionImg2ImgPipeline

if __name__ == '__main__':
    text2img = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9")
    img2img = AltDiffusionImg2ImgPipeline(**text2img.components)
    img2img = img2img.to("cuda")

    img = Image.open('input/高圆圆.jpeg')
    out_imgs = img2img(prompt="((masterpiece)), (((best quality))), ((ultra-detailed)), ((illustration)), girl, genshin impact,vision",\
                       init_image=img, strength=0.7,\
                       guidance_scale=30,\
                       negative_prompt='nsfw, longbody, lowres, bad anatomy, bad hands, missing fingers, pubic hair,extra digit, fewer digits, cropped, worst quality, low quality').images[0]
    out_imgs.save(f'output.png')

Is there a GLM model fine-tuned on the CMRC downstream task?

1. As the title asks: is there a model already fine-tuned on reading comprehension?
2. Also, the template that the predictor needs is constructed inside the collate function:

        elif self.task_name in ["cmrc"]:
            mask_id = self.tokenizer.get_command_id('MASK')
            source_text = example.text_a
            target_text = example.meta["answer"].strip()
            question = example.meta["question"].strip()
            source_tokens = self.tokenizer.EncodeAsIds(source_text.rstrip())
            question_tokens = self.tokenizer.EncodeAsIds("问题:" + question +
                                                         "答案:")
            max_src_length = self.args.max_src_length - len(
                question_tokens) - 2
            if max_src_length <= 0:
                question_tokens = question_tokens[self.args.max_src_length //
                                                  4]
            source_tokens = [cls_id] + question_tokens + [
                mask_id
            ] + source_tokens[:max_src_length]

Would you consider documenting this part?

3. Dataset loading and preprocessing depend on specific dataset names rather than on more general task and dataset formats, which raises the cost for users of imitating the expected input format with their own data.

altdiffusion-m9 is not be supported

Hi, running the following program produces the error "The model_name: altdiffusion-m9 is not be supported":

import torch
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor

# Initialize 
prompt = "Anime portrait of natalie portman as an anime girl by stanley artgerm lau, wlop, rossdraws, james jean, andrei riabovitchev, marc simonetti, and sakimichan, trending on artstation"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


loader = AutoLoader(task_name="text2img", #contrastive learning
                    model_name="AltDiffusion-m9",
                    model_dir="./checkpoints")

model = loader.get_model()
model.eval()
model.to(device)
predictor = Predictor(model)
predictor.predict_generate_images(prompt)

[BUG] glm_generate_samples_en.py produces broken output

Running the following code:

import torch
from flagai.model.predictor.predictor import Predictor
from flagai.auto_model.auto_loader import AutoLoader
if __name__ == "__main__":
    """Main training program."""
    print('Generate Samples')
    # Random seeds for reproducibility.
    # Model,
    loader = AutoLoader(task_name='lm',
                        model_name='GLM-large-en',
                        only_download_config=False)
    model = loader.get_model()
    tokenizer = loader.get_tokenizer()
    model.cuda(torch.cuda.current_device())

    predictor = Predictor(model, tokenizer)
    # generate samples
    text = [
        'Question: Is drinking beer bad for your health? Answer: [gMASK]',
    ]
    for t in text:
        output = predictor.predict_generate_randomsample(
            t, top_k=50, repetition_penalty=4.0, top_p=1.0)
        print(t, '\n', output)

produces the following output:

******************** lm glm-large-en
Question: Is drinking beer bad for your health? Answer: [gMASK] 
 [CLS] question : is drinking beer bad for your health ? answer : [gMASK] <|startofpiece|> , , 1 <|endofpiece|> <|endofpiece|> <|endofpiece|> … (<|endofpiece|> repeated for the remainder of the output)

Question about the AltCLIP ablation-study results

Hi, thanks for your work.
Table 1 of the AltCLIP paper reports AltCLIP_T results of (74.5, 59.6) on ImageNet-1k, but in the Table 3 ablation study the numbers become (51.61, 41.66).
What explains this difference?

[Question] After training on the CMRC downstream task, how should inference be done?

Like this?

text = '''
问题:1994年3月,范廷颂担任什么职务?答案:[MASK] 范廷颂枢机(,),圣名保禄·若瑟(),是越南罗马天主教枢机。1963年被任为主教;1990年被擢升为天主教河内总教区宗座署理;1994年被擢升为总主教,同年年底被擢升为枢机;2009年2月离世。范廷颂于1919年6月15日在越南宁平省天主教发艳教区出生;童年时接受良好教育后,被一位越南神父带到河内继续其学业。范廷颂于1940年在河内大修道院完成神学学业。范廷颂于1949年6月6日在河内的主教座堂晋铎;及后被派到圣女小德兰孤儿院服务。1950年代,范廷颂在河内堂区创建移民接待中心以收容到河内避战的难民。1954年,法越战争结束,越南**共和国建都河内,当时很多天主教神职人员逃至越南的南方,但范廷颂仍然留在河内。翌年管理圣若望小修院;惟在1960年因捍卫修院的自由、自治及拒绝政府在修院设政治课的要求而被捕。1963年4月5日,教宗任命范廷颂为天主教北宁教区主教,同年8月15日就任;其牧铭为「我信天主的爱」。由于范廷颂被越南政府软禁差不多30年,因此他无法到所属堂区进行牧灵工作而专注研读等工作。范廷颂除了面对战争、贫困、被当局**天主教会等问题外,也秘密恢复修院、创建女修会团体等。1990年,教宗若望保禄二世在同年6月18日擢升范廷颂为天主教河内总教区宗座署理以填补该教区总主教的空缺。1994年3月23日,范廷颂被教宗若望保禄二世擢升为天主教河内总教区总主教并兼天主教谅山教区宗座署理;同年11月26日,若望保禄二世擢升    
'''
output=predictor.predict_generate_beamsearch(
    text, 
    out_max_length = 30
)
output

Or like this?

text = '''
问题:1994年3月,范廷颂担任什么职务?答案:[MASK] 范廷颂枢机(,),圣名保禄·若瑟(),是越南罗马天主教枢机。1963年被任为主教;1990年被擢升为天主教河内总教区宗座署理;1994年被擢升为总主教,同年年底被擢升为枢机;2009年2月离世。范廷颂于1919年6月15日在越南宁平省天主教发艳教区出生;童年时接受良好教育后,被一位越南神父带到河内继续其学业。范廷颂于1940年在河内大修道院完成神学学业。范廷颂于1949年6月6日在河内的主教座堂晋铎;及后被派到圣女小德兰孤儿院服务。1950年代,范廷颂在河内堂区创建移民接待中心以收容到河内避战的难民。1954年,法越战争结束,越南**共和国建都河内,当时很多天主教神职人员逃至越南的南方,但范廷颂仍然留在河内。翌年管理圣若望小修院;惟在1960年因捍卫修院的自由、自治及拒绝政府在修院设政治课的要求而被捕。1963年4月5日,教宗任命范廷颂为天主教北宁教区主教,同年8月15日就任;其牧铭为「我信天主的爱」。由于范廷颂被越南政府软禁差不多30年,因此他无法到所属堂区进行牧灵工作而专注研读等工作。范廷颂除了面对战争、贫困、被当局**天主教会等问题外,也秘密恢复修院、创建女修会团体等。1990年,教宗若望保禄二世在同年6月18日擢升范廷颂为天主教河内总教区宗座署理以填补该教区总主教的空缺。1994年3月23日,范廷颂被教宗若望保禄二世擢升为天主教河内总教区总主教并兼天主教谅山教区宗座署理;同年11月26日,若望保禄二世擢升    
'''
output=predictor.predict_generate_randomsample(
    text, 
    out_max_length = 30
)
output

Neither output looks right.

The text before [MASK] is the question; the text after it is the context.

Add Dreambooth example for AltDiffusion

Hi, I see that diffusers already supports AltDiffusion, and I tried DreamBooth on AltDiffusion using diffusers. It only requires changing the original StableDiffusion pipeline to the AltDiffusion pipeline and replacing the text encoder, and I got results that look great!

Here are some results I generated in Chinese. I used the special token <鸣人> to represent Uzumaki Naruto.

Prompt: 一张<鸣人>男孩的照片,背景是沙漠,masterpieces

Prompt: 一张<鸣人>男孩的照片,背景是富士山,masterpieces

Prompt: 一张<鸣人>男孩的照片,铅笔素描

Prompt: 一张<鸣人>男孩的照片,油画梵高风格

I'm curious whether FlagAI could support DreamBooth. Thanks!
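For context, the pipeline swap described above looks roughly like this at inference time (a sketch; "./dreambooth-altdiffusion" is a hypothetical output directory of a DreamBooth training run, not something shipped with FlagAI or diffusers):

import torch
from diffusers import AltDiffusionPipeline

# Load a DreamBooth-fine-tuned AltDiffusion checkpoint from a local directory.
pipe = AltDiffusionPipeline.from_pretrained("./dreambooth-altdiffusion",
                                            torch_dtype=torch.float16).to("cuda")

# Generate with the learned special token, as in the prompts above.
image = pipe("一张<鸣人>男孩的照片,背景是沙漠,masterpieces").images[0]
image.save("naruto_desert.png")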


Cannot reproduce the classical-poetry generation results

How much fine-tuning data was used to produce the poetry-generation results shown in the tutorial? Fine-tuning with the sample data does not give good results.
[screenshot]

glm_title_ch.py raises KeyError: 'position_ids'

本文总结了十个可穿戴产品的设计原则,而这些原则同样也是笔者认为是这个行业最吸引人的地方,1为人们解决重复性问题2从人开始而不是从机器开始3要引起注意但不要刻意4提升用户能力而不是取代人。 :
--------------sample 0 :-------------------
-----------random sample: --------------
{'input_ids': [23694, 35526, 12895, 43392, 32153, 2837, 101, 1369, 43359, 24733, 1369, 1736, 88, 11921, 5789, 15658, 43469, 39550, 247, 4153, 43377, 797, 341, 3075, 30639, 43372, 43576, 43371, 71, 1878, 43576, 1354, 71, 43393, 43385, 817, 295, 30057, 7692, 43413, 852, 439, 169, 1878, 6170, 43371, 43361], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}
Traceback (most recent call last):
  File "glm_title_ch.py", line 33, in <module>
    predictor.predict_generate_randomsample(text,
  File "/home/lz/miniconda3/envs/flagai/lib/python3.8/site-packages/flagai/model/predictor/predictor.py", line 288, in predict_generate_randomsample
    return glm_random_sample(self.model, self.tokenizer, text,
  File "/home/lz/miniconda3/envs/flagai/lib/python3.8/site-packages/flagai/model/predictor/utils.py", line 600, in glm_random_sample
    position_ids = torch.tensor([data['position_ids']],
KeyError: 'position_ids'

[Question] Is there a direct download link for the models?

Hello, is there a direct download address for the models? Downloading through the code is quite slow. Are there Baidu Netdisk or similar links for the model files? Thanks!

[Question] Training the TNews text-classification task with GLM gives low accuracy.

Hello, below is the code I used:

import os
import numpy as np
import torch
from torch.utils.data import Dataset
from flagai.auto_model.auto_loader import AutoLoader
from flagai.trainer import Trainer
from flagai.metrics import accuracy_metric
from flagai.data.dataset import SuperGlueDataset
from flagai.test_utils import CollateArguments
from flagai.data.dataset import ConstructSuperglueStrategy
from flagai.data.dataset.superglue.control import DEFAULT_METRICS, MULTI_TOKEN_TASKS, CH_TASKS


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

task_name = "tnews"
auto_loader = AutoLoader('classification',
                        model_name="GLM-large-ch",
                        model_dir="./checkpoints",
                        load_pretrain_params=True,
                        class_num=15)

cl_args = CollateArguments()
cl_args.cloze_eval = False
cl_args.multi_token = task_name in MULTI_TOKEN_TASKS

model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()

train_dataset = SuperGlueDataset(task_name=task_name,
                                 data_dir='./datasets/',
                                 dataset_type='train',
                                 tokenizer=tokenizer)

collate_fn = ConstructSuperglueStrategy(cl_args,
                                        tokenizer,
                                        task_name=task_name)

valid_dataset = SuperGlueDataset(task_name=task_name,
                                 data_dir='./datasets/',
                                 dataset_type='dev',
                                 tokenizer=tokenizer)

trainer = Trainer(
    env_type="pytorch",
    experiment_name="GLM_cls",
    batch_size=1,
    lr=1e-5,
    weight_decay=1e-5,
    epochs=10,
    log_interval=1,
    eval_interval=10000,
    pytorch_device=device,
    checkpoint_activations=False,
    save_dir="./glm_cls",
    save_interval=10000,
)

trainer.train(model,
              train_dataset=train_dataset,
              valid_dataset=valid_dataset,
              collate_fn=collate_fn,
              metric_methods=[["acc", accuracy_metric]])

At around 20k iterations the accuracy reaches 61%, and afterwards it keeps dropping, which looks like overfitting. But 61% is also the highest accuracy ever reached. What could be the cause?

With the unmodified train_large_clue.py code, the loss keeps oscillating and never converges; accuracy is only 6%.

import torch
from flagai.trainer import Trainer
from flagai.model.glm_model import GLMForSequenceClassification
from flagai.data.tokenizer import Tokenizer

from flagai.metrics import accuracy_metric
from flagai.data.dataset import SuperGlueDataset
from flagai.test_utils import CollateArguments
from flagai.data.dataset.superglue.control import DEFAULT_METRICS, MULTI_TOKEN_TASKS, CH_TASKS
from flagai.data.dataset import ConstructSuperglueStrategy


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

task_name = "tnews"
model_name = 'GLM-large-ch'
cl_args = CollateArguments()
cl_args.cloze_eval = False
cl_args.multi_token = task_name in MULTI_TOKEN_TASKS

tokenizer = Tokenizer.from_pretrained(model_name)

class_num = 15
model = GLMForSequenceClassification.from_pretrain(model_name=model_name, spell_length=2,
                                                   class_num=class_num, tune_prefix_layers=1)

train_dataset = SuperGlueDataset(task_name=task_name,
                                 data_dir='./datasets/',
                                 dataset_type='train',
                                 tokenizer=tokenizer)

collate_fn = ConstructSuperglueStrategy(cl_args,
                                        tokenizer,
                                        task_name=task_name)

valid_dataset = SuperGlueDataset(task_name=task_name,
                                 data_dir='./datasets/',
                                 dataset_type='dev',
                                 tokenizer=tokenizer)

trainer = Trainer(env_type='pytorch',
                  pytorch_device=device,
                  epochs=2,
                  batch_size=1,
                  lr=1e-5,
                  weight_decay=1e-5,
                  eval_interval=10000,
                  checkpoint_activations=False,
                  fp16=True,
                  log_interval=1000,
                  save_interval=10000,
                  save_dir="./glm_large_clue")

trainer.train(model,
              train_dataset=train_dataset,
              valid_dataset=valid_dataset,
              collate_fn=collate_fn,
              metric_methods=[["acc", accuracy_metric]])

AltCLIP training for the first stage

Hi, thanks for your great work on AltCLIP.

In the paper, for the first stage of training AltCLIP, the teacher model's text embedding is extracted from the original CLIP text encoder as the [TOS] token. Is this [TOS] token selected as in OpenAI's CLIP (https://github.com/openai/CLIP/blob/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1/clip/model.py#L354)?
x = x[torch.arange(x.shape[0]), text.argmax(dim=-1)]
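For reference, that line pools, for each sequence in the batch, the hidden state at the position of the largest token id, which in CLIP's BPE vocabulary is the end-of-text token (id 49407). A toy sketch of the indexing, with shapes chosen only for illustration:

import torch

x = torch.randn(2, 4, 3)                      # [batch, seq_len, hidden]
text = torch.tensor([[49406, 320, 49407, 0],  # 49407 = end-of-text id
                     [49406, 320, 640, 49407]])
# argmax over token ids finds the end-of-text position in each sequence;
# advanced indexing then gathers that position's hidden state.
pooled = x[torch.arange(x.shape[0]), text.argmax(dim=-1)]
print(pooled.shape)                           # torch.Size([2, 3])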

But Figure 1 in the paper uses the [EOS] token. Are [TOS] and [EOS] the same token as in OpenAI's implementation?

And does the student model XLM-R use the [CLS] token as the text embedding when computing the teacher-student MSE loss?

[BUG] Runtime Error from CLUE example

Describe the bug
Running train_10b_clue.py from ./examples/glm_superglue on the afqmc task of CLUE, with the pytorch env setting.

task_name = 'afqmc'
trainer = Trainer(env_type="pytorch",
                  batch_size=16,
                  epochs=10,
                  eval_interval=10,
                  load_dir=None,
                  pytorch_device="cuda",
                  save_dir="./glm_superglue_en",
                  save_epoch=1)

model = GLMForSingleTokenCloze.from_pretrain(download_path="/mnt/test_10b_models",
                                             model_name="GLM-large-ch")


tokenizer = GLMLargeChTokenizer()
train_dataset = SuperGlueDataset(task_name=task_name,
                                 data_dir='./datasets/',
                                 dataset_type='train',
                                 tokenizer=tokenizer,
                                 cloze_eval=True)
valid_dataset = SuperGlueDataset(task_name=task_name,
                                 data_dir='./datasets/',
                                 dataset_type='dev',
                                 tokenizer=tokenizer,
                                 cloze_eval=True)

cl_args = CollateArguments()
cl_args.cloze_eval = True
cl_args.multi_token = False

collate_fn = ConstructSuperglueStrategy(cl_args,
                                        tokenizer,
                                        task_name=task_name)
trainer.train(model,
              train_dataset=train_dataset,
              valid_dataset=valid_dataset,
              collate_fn=collate_fn,
              metric_methods=[["acc", accuracy_metric]])

Tasks

  • An officially supported task in the examples folder (such as GLUE/Title-generation, ...)
  • My own task or dataset

To Reproduce

(base) root@deepspeed:~/FlagAI/examples/glm_superglue# python train_10b_clue.py
file cog-pretrain.vocab not exist in ['cog-pretrain.model', 'cog-pratrain.vocab', 'pytorch_model.bin', 'vocab.txt', 'config.json', 'README.md']
{'pad': 50000, 'eos': 50000, 'sep': 50001, 'ENC': 50002, 'MASK': 50003, 'unk': 50004, 'sop': 50006, 'eop': 50007, 'sMASK': 50008, 'gMASK': 50009}
Creating afqmc dataset from file at ./datasets/ (split=train)
Returning 34334 train examples with label dist.: [('0', 23761), ('1', 10573)]
Creating afqmc dataset from file at ./datasets/ (split=dev)
Returning 4316 dev examples with label dist.: [('0', 2978), ('1', 1338)]
Optimizer = Adam
[2022-06-09 10:10:29,530] [INFO] [logger.py:70:log_dist] [Rank -1] loading checkpoints form None
[2022-06-09 10:10:29,530] [INFO] [logger.py:70:log_dist] [Rank -1] working on epoch 0 ...
Traceback (most recent call last):
  File "train_10b_clue.py", line 64, in <module>
    trainer.train(model,
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/trainer.py", line 460, in train
    lm_loss, skipped_iter, _ = self.train_step(batch,
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/trainer.py", line 568, in train_step
    step_output = self.forward_step(data, model, mems)
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/trainer.py", line 635, in forward_step
    model_output = model(**data)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/model/glm_model.py", line 754, in forward
    model_out = self.model(input_ids,
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/model/glm_model.py", line 453, in forward
    loss = F.cross_entropy(logits_parallel.contiguous().float(),
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2996, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: Expected target size [16, 50048], got [16]

OS (please complete the following information):

  • Version: v1.0.1

AutoLoader initialization raises TypeError: __init__() got multiple values for argument 'text_config'

1. Initialization code:

auto_loader = AutoLoader(
    task_name="txt_img_matching",
    model_dir="./checkpoints",
    model_name="AltCLIP-XLMR-L"  # Load the checkpoints from Modelhub (model.baai.ac.cn/models)
)

2. Error:

  File "d:\Users\bigdata\Anaconda3\lib\site-packages\transformers\configuration_utils.py", line 688, in from_dict
    config = cls(**config_dict)
  File "C:\Users\bigdata\AppData\Roaming\Python\Python38\site-packages\flagai\model\mm\AltCLIP.py", line 79, in __init__
    super().__init__(text_config_dict, vision_config_dict, projection_dim,
                     logit_scale_init_value, **kwargs)
TypeError: __init__() got multiple values for argument 'text_config'

OPT-66B mp_size

Describe the bug
When splitting the OPT-66B model for model parallelism, it reports that a size is not evenly divisible.

