
skyworkai / skywork


Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model, training data, evaluation data, evaluation methods, etc.

License: Other

Languages: Shell 13.36%, Python 86.64%
Topics: llm

skywork's People

Contributors

bbuf, eltociear, jianxindong, tianwenwei, zhao1iang


skywork's Issues

Selection and use of the evaluation sets

  1. Your introduction mentions that you built validation sets for several domains (Chinese, English, code, arXiv articles, and so on). Do you plan to open-source these validation sets? At the moment I only see the six categories (technical articles | movie reviews | government reports | games | finance | general domain); validation sets aligned with the common Chinese and English benchmarks would probably be more useful to everyone.
  2. Regarding the six validation sets you have released, their sources appear to be almost entirely news. Have you tried rewriting data from the common Chinese/English benchmarks into sequence form to validate model quality, or, for domain-specific tasks, rewriting domain data into sequences for validation?

BUG: the probabilities of options A/B/C/D are computed incorrectly in ceval, cmmlu and mmlu

In evaluate_ceval.py, evaluate_cmmlu.py and evaluate_mmlu.py under the Skywork/eval/ folder, the key code that obtains the probabilities of options A/B/C/D is:

    softval = torch.nn.functional.softmax(
        torch.tensor(
            [
                logits[tokenizer("A")["input_ids"][-1]],
                logits[tokenizer("B")["input_ids"][-1]],
                logits[tokenizer("C")["input_ids"][-1]],
                logits[tokenizer("D")["input_ids"][-1]],
            ]
        ),
        dim=0,
    )

Take option A as an example:
tokenizer("A") treats "A" as a sentence and prepends the SentencePiece prefix "▁", so what actually gets encoded is "<s> ▁A", giving input_ids=[1, 319]. The id obtained by tokenizer("A")["input_ids"][-1] in the code is therefore 319, which corresponds to the token "▁A", whereas the id of the bare "A" token is:

tokenizer.convert_tokens_to_ids('A') = 29909

Options B, C and D have the same problem.

An example full_prompt used during evaluation has the following format:

以下是关于农学的单项选择题,请直接给出正确答案的选项。

题目:肉牛屠宰后,胴体的哪个部位肉质较好
A. 胸
B. 腹
C. 大腿
D. 小腿
答案:C

……

题目:羊胴体中,肉质较好的部位是
A. 胸下肉
B. 肩胛肉
C. 后腿肉
D. 小腿肉
答案:C

以下是关于农学的单项选择题,请直接给出正确答案的选项。

题目:在农业生产中被当作极其重要的劳动对象发挥作用,最主要的不可替代的基本生产资料是
A. 农业生产工具
B. 土地
C. 劳动力
D. 资金
答案:

According to the full_prompt format above, the option should be filled in directly after "答案:" rather than on a new line.
Therefore, when picking the ids for options A/B/C/D, the probabilities of the bare "A", "B", "C", "D" tokens should be used, not those of the "▁A", "▁B", "▁C", "▁D" tokens.
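A minimal sketch of the suggested fix (not the repository's code; it assumes the tokenizer follows the LLaMA/SentencePiece convention described above, so the bare option ids are looked up with convert_tokens_to_ids):

    import torch
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        "Skywork/Skywork-13B-base", use_fast=False, trust_remote_code=True
    )

    # Stand-in for the model's next-token logits at the position right after "答案:";
    # in evaluate_ceval.py this vector comes from the model's forward pass.
    logits = torch.randn(tokenizer.vocab_size)

    # Look up the ids of the bare "A"/"B"/"C"/"D" tokens (e.g. 29909 for "A")
    # instead of tokenizer("X")["input_ids"][-1], which returns the "▁X" token id.
    choice_ids = [tokenizer.convert_tokens_to_ids(c) for c in ("A", "B", "C", "D")]

    softval = torch.nn.functional.softmax(logits[choice_ids], dim=0)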

BUG: wrong attention_mask in the evaluation loss computation

Hi, I noticed that when eval_loss.py computes the loss, it uses attention_mask[:, :-1] while also using right padding. In the batched case, i.e. when padding is present, every sequence shorter than the maximum length then gets an extra loss term at its end for generating a padding token, and that term is usually an order of magnitude larger than the loss on the real text, so the result is wrong. With right padding, changing it to attention_mask[:, 1:] fixes the problem.
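A minimal sketch of the behaviour the report suggests (illustrative names, not the actual variables in eval_loss.py; right padding is assumed):

    import torch

    def masked_lm_loss(logits, input_ids, attention_mask):
        # logits: (batch, seq_len, vocab); input_ids / attention_mask: (batch, seq_len).
        shift_logits = logits[:, :-1, :]   # position t predicts token t+1
        shift_labels = input_ids[:, 1:]
        # The mask must be shifted on the label side, attention_mask[:, 1:],
        # not attention_mask[:, :-1]; otherwise the last real position of every
        # shorter sequence is scored against a padding token.
        shift_mask = attention_mask[:, 1:].reshape(-1).to(shift_logits.dtype)
        loss = torch.nn.functional.cross_entropy(
            shift_logits.reshape(-1, shift_logits.size(-1)),
            shift_labels.reshape(-1),
            reduction="none",
        )
        return (loss * shift_mask).sum() / shift_mask.sum()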

Questions about eval_loss.py

In line 58, the number of tokens is calculated with attention_mask = attention_mask[:, :-1] and torch.sum(attention_mask).item(). But do we really need to shift the attention mask? Maybe torch.sum(attention_mask).item() - batch_size (without shifting) is correct?

For example if the batch size is 2, the input_ids can be [[1, 2, 3], [1, 2, pad]] and the attention mask is [[True, True, True], [True, True, False]]. Using attention_mask = attention_mask[:, :-1] and torch.sum(attention_mask).item() will output 4 as the number of tokens. But actually the token number should be 3 because we only calculate logits on [2, 3] and [2, pad] (first label is shifted by label = label[:, 1: ]) and pad isn't counted as a valid token for calculating loss. If we set IGNORE_INDEX in labels according to attention_mask, we don't need shifted attention mask when calculating loss.

A code example:

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Skywork/Skywork-13B-base", use_fast=False, trust_remote_code=True)

tokenizer.pad_token = tokenizer.eos_token

tokenized_texts = tokenizer(["This is an example text.", "这是一个实例文本,这句话比较长。"], add_special_tokens=False, padding=True, truncation=True, max_length=128, return_tensors="pt")

input_ids = tokenized_texts.input_ids
attention_mask = tokenized_texts.attention_mask

print(f"Input sequence length: {input_ids.size(1)}")
print("Input labels:")
print(input_ids)
print("Input attention mask:")
print(attention_mask)
print(f"num_tokens: {torch.sum(attention_mask).item() - input_ids.size(0)}")

shift_attention_mask = attention_mask[:, :-1]
print("Shifted attention mask:")
print(shift_attention_mask)
print(f"num_tokens: {torch.sum(shift_attention_mask).item()}")

output is:

Input sequence length: 13
Input labels:
tensor([[  910,   338,   385,  1342,  1426, 29889,     2,     2,     2,     2,
             2,     2,     2],
        [29871, 30810, 30392, 41176, 50921, 45522, 30214, 30810, 32760, 31852,
         40579, 31143, 30267]])
Input attention mask:
tensor([[1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
num_tokens: 17
Shifted attention mask:
tensor([[1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
num_tokens: 18

But from the Input labels output, it is obvious that the token num should be 17 (first sample 338 to 29889 and second 30810 to 30267).

No .bin file in the saved checkpoint

Hi, I have a small question. After I changed the config to zero3, the saved checkpoint no longer contains a .bin file. Did I get a step wrong somewhere, or is zero3 simply not usable here?

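Not an official answer, but for reference: with DeepSpeed ZeRO-3 the weights are normally saved as sharded zero checkpoints rather than a single pytorch_model.bin, and they can be consolidated afterwards. A sketch only (function name per upstream DeepSpeed; the checkpoint path is hypothetical):

    import torch
    from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

    # Gather the sharded ZeRO-3 weights into one fp32 state dict and save it.
    state_dict = get_fp32_state_dict_from_zero_checkpoint("output/checkpoint-1000")
    torch.save(state_dict, "output/checkpoint-1000/pytorch_model.bin")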

About packing the pre-training data

Hi, I have some questions about how the pre-training data is processed:
From the sample pre-training file pt_train.jsonl and the training code, it looks like each sample is padded individually to sequence_length, using the eos token as padding. In other projects I know of, all training data is tokenized, concatenated, and then sliced into chunks. My understanding of the trade-off is: concatenating and slicing ensures that almost every token seen during training is a real token, whereas processing samples individually spends a lot of compute on padding tokens (a sketch of the concatenate-then-chunk approach is given below).
During actual training, did you pack many individual samples together up to roughly sequence_length to form each training example?
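For reference, the concatenate-then-chunk approach mentioned above usually looks something like the sketch below (not the project's actual preprocessing; the eos handling and chunk length are assumptions):

    def pack_examples(texts, tokenizer, seq_len=4096):
        # Tokenize all documents, join them with eos, then cut into fixed-length
        # blocks so that (almost) every token in a batch is a real, non-padding token.
        ids = []
        for text in texts:
            ids.extend(tokenizer(text, add_special_tokens=False)["input_ids"])
            ids.append(tokenizer.eos_token_id)   # document boundary
        # Drop the tail shorter than seq_len (alternatively, pad only that last block).
        n_blocks = len(ids) // seq_len
        return [ids[i * seq_len : (i + 1) * seq_len] for i in range(n_blocks)]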

How should the MOCK_GSM8K_TEST evaluation set be used?

Hi, I am trying to reproduce this experiment on other models. In this experiment, is the dataset used differently from the original GSM8K? I do not see a prompt part related to the question.

The current eval_loss script does not support the chatglm family of models

As the title says, the current eval_loss script does not support chatglm models. Could you add that evaluation?
Or how should the script be modified to support them? I tried, and the chatglm tokenizer's attributes seem to differ, and its padding_side also differs from what the eval code assumes, so forcing the script to run yields a loss of inf.

The paper link is the technical report of the Baichuan model

The url arXiv: 2309.10305 is the technical report of the Baichuan model:

@Article{skyworkmath,
title={SkyMath: Technical Report},
author={Liu Yang, Haihua Yang, Wenjun Cheng, Lei Lin, Chenxia Li, Yifu Chen, Lunan Liu, Jianfei Pan, Tianwen Wei, Biye Li, Liang Zhao, Lijie Wang, Bo Zhu, Guoliang Li, Xuejie Wu, Xilin Luo, Rui Hu},
journal={arXiv preprint arXiv: 2310.16713},
url={https://arxiv.org/abs/2309.10305},
year={2023}
}

Would you provide more information about SkyMath?

Your paper suggests that Instruction Boosting and Self-Compare FT are very helpful, but Instruction Boosting looks like Wizard-Evol and Self-Compare FT looks very similar to PHP, and from the tech report I cannot tell what the differences between them are.

报错Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit

Is this because there is not enough GPU memory?

    if "cpu" in device_map_without_lm_head.values() or "disk" in device_map_without_lm_head.values():
    -> 3246 raise ValueError(
       3247     """
       3248     Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
    ...
    these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom
    device_map to from_pretrained. Check
    https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
    for more details.
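For what it's worth, the message means the model weights do not all fit on the GPU and accelerate wants to offload some modules. One common workaround is to load in half precision with an explicit offload folder; a sketch only (not the repo's scripts, and the exact flags depend on your transformers/accelerate versions):

    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "Skywork/Skywork-13B-base",
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,   # halves the memory footprint vs fp32
        device_map="auto",            # let accelerate split layers across devices
        offload_folder="offload",     # spill layers that do not fit onto disk
    )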

Reproduced validation-set evaluation results do not match the reported table

Hi, I have a few questions and would appreciate some clarification.
Running the given command with Qwen-14B: bash bash_scripts/skywork_eval_loss.sh:

  • The average result is 2.424, which is inconsistent with the 9.67 in the report. Why is that? Is the released validation set incomplete?
  • Why is the text truncated during evaluation? Is the truncation applied to the input sentence length rather than to tokens? Different models have different max_length values; do all of them use 4096 max tokens? Using 4096 everywhere does not suit every compared model. If each model's own max length is used instead, the input lengths differ per model, so isn't the evaluation unfair?
  • Without truncation, the current evaluation code supports neither a sliding window nor every compared model's max length (a sliding-window sketch is given after this list).

I hope these questions can be answered, thanks!
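On the sliding-window point above: a common way to evaluate loss on documents longer than a model's max_length is a strided sliding window, roughly as sketched below (not the repo's eval code; max_length and stride are illustrative):

    import torch

    @torch.no_grad()
    def sliding_window_loss(model, input_ids, max_length=4096, stride=2048):
        # Average LM loss over one long sequence; each window only scores the part
        # that the previous window has not scored yet.
        losses, prev_end = [], 0
        seq_len = input_ids.size(1)
        for begin in range(0, seq_len, stride):
            end = min(begin + max_length, seq_len)
            target_len = end - prev_end              # tokens new to this window
            window = input_ids[:, begin:end]
            labels = window.clone()
            labels[:, :-target_len] = -100           # do not score the overlap twice
            out = model(input_ids=window, labels=labels)
            losses.append(out.loss)
            prev_end = end
            if end == seq_len:
                break
        return torch.stack(losses).mean()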

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

I tried to run it with the published code (prediction, single A100). Could you give me some advice?

The following error message appeared:
....
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [25,0,0], thread: [6,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [25,0,0], thread: [7,0,0] Assertion srcIndex < srcSelectDimSize failed.
(the same assertion is repeated for threads [8,0,0] through [31,0,0]; the Python traceback interleaved with these messages is untangled below)

        return self._call_impl(*args, **kwargs)
      File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "/home/ethan/.cache/huggingface/modules/transformers_modules/Skywork-13B-base/modeling_skywork.py", line 726, in forward
        outputs = self.model(
      File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "/home/ethan/.cache/huggingface/modules/transformers_modules/Skywork-13B-base/modeling_skywork.py", line 641, in forward
        layer_outputs = decoder_layer(
      File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "/home/ethan/.cache/huggingface/modules/transformers_modules/Skywork-13B-base/modeling_skywork.py", line 449, in forward
        hidden_states, self_attn_weights, present_key_value = self.self_attn(
      File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "/home/ethan/.cache/huggingface/modules/transformers_modules/Skywork-13B-base/modeling_skywork.py", line 346, in forward
        query_states = self.q_proj(hidden_states)
      File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
        return F.linear(input, self.weight, self.bias)
    RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)

Will you officially provide a magnet-link data source?

Yesterday Mistral AI posted a tweet containing nothing but a magnet link, which is quite inspiring.
People who opened the magnet link found an 87 GB torrent; judging from the naming and directory structure, it is a PyTorch model file.
The mixtral-8x7b-32kseqlen that Mistral AI "open-sourced" this time is a Mixture of Experts (MoE) model composed of 8 expert networks of 7 billion parameters each (8x7b), reportedly the world's first open-source MoE large model.

I feel we should bypass Hugging Face.

That way users in ** regions could download it quickly.

Legacy behaviour of the SkyworkTokenizer: "tokens that come after special tokens will not be properly handled"

When loading the tokenizer with transformers.AutoTokenizer we receive a warning: You are using the legacy behaviour of the <class 'transformers_modules.Skywork.Skywork-13B-base.98a59dec44df3a8fd8fcd4bac07e94db35219eb1.tokenization_skywork.SkyworkTokenizer'> This means that tokens that come after special tokens will not be properly handled.

We have already updated transformers from 4.31.0 to 4.34.0, but we see the same warning in both versions.
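For reference, in upstream transformers this warning comes from the LLaMA-style slow tokenizer and can be silenced by opting into the new behaviour. A sketch only; whether SkyworkTokenizer accepts the same keyword as LlamaTokenizer is an assumption:

    from transformers import AutoTokenizer

    # `legacy=False` opts into the fixed handling of tokens that follow special
    # tokens in LLaMA-style tokenizers; passing it through here assumes the
    # custom SkyworkTokenizer forwards the keyword like LlamaTokenizer does.
    tokenizer = AutoTokenizer.from_pretrained(
        "Skywork/Skywork-13B-base",
        use_fast=False,
        trust_remote_code=True,
        legacy=False,
    )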

Code open-sourced?

The code in the train folder seems to be only applicable for fine-tuning the model. Will the pre-training code for the model be open-sourced?

Why releasing 13B-model instead of smaller ones, say, 7B?

In Chapter 3 of your tech report you compare LLaMA-7B with (your) GPT-7B, but you ultimately released a 13B model. So there are two questions:

  1. Will you release a smaller model?
  2. Why did you design the model the way the report describes? Does it perform better than the LLaMA architecture?

ValueError: Trainer: evaluation requires an eval_dataset.

At the last pre-training step, the Trainer wants to evaluate validation metrics and raises an error because no eval data was specified. How do I turn this off? ValueError: Trainer: evaluation requires an eval_dataset.

        metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
      File "/home/suser/.conda/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 3062, in evaluate
        eval_dataloader = self.get_eval_dataloader(eval_dataset)
      File "/home/suser/.conda/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 888, in get_eval_dataloader
        raise ValueError("Trainer: evaluation requires an eval_dataset.")
    ValueError: Trainer: evaluation requires an eval_dataset.
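For reference, the Trainer only calls evaluate() when evaluation is enabled. A sketch of turning it off via TrainingArguments (argument names per upstream transformers; the training script may expose these as CLI flags instead, in which case change them there):

    from transformers import TrainingArguments

    # Disable in-training evaluation so no eval_dataset is required.
    args = TrainingArguments(
        output_dir="output",
        evaluation_strategy="no",   # never run evaluation during training
        do_eval=False,
    )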
