
Comments (13)

Micla-SHL commented on August 26, 2024

Update: I checked #106 and edited the script:
DATA_FILE=/Micla/Project/Aquila2/data/alpaca_data_train.jsonl

but it was ineffective.
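A quick sanity check for this, assuming the path above is the intended one: confirm the file actually exists there and that its first record parses as JSON.

      # verify the dataset exists and the first line is valid JSON
      ls -lh /Micla/Project/Aquila2/data/alpaca_data_train.jsonl
      head -n 1 /Micla/Project/Aquila2/data/alpaca_data_train.jsonl | python -m json.tool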


ftgreat commented on August 26, 2024

All training logs are saved in the log dir.
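For example, something like this follows the newest log file, assuming a flat log directory (the actual location depends on how the script sets its log dir):

      # follow the most recently modified file in the log dir
      LOG_DIR=./logs   # assumption: adjust to the script's actual log dir
      tail -f "$(ls -t "$LOG_DIR"/* | head -n 1)"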


Micla-SHL commented on August 26, 2024

[screenshot: 2023-11-06 14-20-36]
OMG! This is throwing the same error that Aquila_v1 had.
[screenshot: 2023-11-06 14-24-35]


ftgreat commented on August 26, 2024

It's most likely an environment problem. If your hardware is compatible, you can use the image file we provide.


Micla-SHL commented on August 26, 2024

It's most likely an environment problem. If your hardware is compatible, you can use the image file we provide.

OK, I'll give it a go. My hunch is it's something with NCCL.
[screenshot: 2023-11-08 17-52-49]

I need to confirm whether this is right or not. Give me 2-3 days to work on it and I'll swing by with a status update.
PS: The image environment is Aquila_v1.
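If the NCCL hunch is right, NCCL's own debug logging is the quickest way to confirm it; these are standard NCCL environment variables, not anything Aquila2-specific:

      # print NCCL init/transport details during the run
      export NCCL_DEBUG=INFO
      export NCCL_DEBUG_SUBSYS=INIT,NET
      bash finetune.sh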


ftgreat commented on August 26, 2024

First, try whether this example runs:

https://github.com/FlagAI-Open/Aquila2/blob/main/examples/predict_chat.py
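Running it directly is a useful smoke test of the Python environment, assuming the repo is checked out and the model weights are reachable; no torchrun or NCCL is involved:

      # single-process sanity check
      cd Aquila2
      python examples/predict_chat.py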


Micla-SHL commented on August 26, 2024

Update: I edited finetune.sh to run python -V.
[screenshot: 2023-11-11 01-51-39]
Running ssh $ip dropped me into an unfamiliar environment, unlike my usual ssh session.

[screenshot: 2023-11-11 01-55-09]
I've identified the issue but don't have a fix yet. The sample code runs fine.


Micla-SHL commented on August 26, 2024

Update:

      ssh $ip \
        python -V              # OK
      ssh $ip eval "python -V" # python: command not found

torchrun was not found because of the eval.
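My guess at the mechanism, stated as an assumption rather than a confirmed diagnosis: ssh runs the remote command in a non-interactive shell, so conda/bashrc initialization may be skipped and PATH can differ from an interactive login. Comparing what the remote shell actually sees makes this visible:

      # show the PATH and tool locations in the remote non-interactive shell
      ssh $ip 'echo $PATH; command -v python torchrun'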

update:
add

      ssh $ip eval \
        "export PATH=$PATH;

This fixed the earlier problem, though a fresh environment setup error has arisen:

      torch._C._cuda_setDevice(device)
      RuntimeError: CUDA error: invalid device ordinal
      CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
      For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
      Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
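"invalid device ordinal" usually means a process asked for a GPU index that doesn't exist on that node, for example --nproc_per_node (or hostfile slots) set higher than the number of visible GPUs. A generic comparison of the two counts:

      # GPUs the driver reports vs. GPUs PyTorch can see
      nvidia-smi -L
      python -c "import torch; print(torch.cuda.device_count())"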

I wanna take a crack at figuring it out solo before anything else.


Micla-SHL commented on August 26, 2024

Update: it runs OK now, maybe.
As expected, hostfile: slots={num} has to agree with finetune.sh: --nproc_per_node={num}.
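For anyone hitting the same mismatch, a minimal sketch of that pairing; the node IP is hypothetical, and the hostfile follows the DeepSpeed-style "hostname slots=N" syntax:

      # hostfile: one line per node, slots = GPUs to use on that node
      192.168.1.10 slots=8

      # finetune.sh must then request the same number of local processes
      --nproc_per_node=8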

Now:
The training data is too small, which raises ValueError: num_samples should be a positive integer value, but got num_samples=0
Edited finetune.sh: DATA_FILE=/Micla/Data/GPT_Data/WuDaoCorpus2.0_base_200G/part-202101281a.json
That gives TypeError: AquilaPreTrainedModel._set_gradient_checkpointing() got an unexpected keyword argument 'enable'
I'll check the dataset user guide for pointers on fetching and rebuilding the data.
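Two quick checks before rebuilding anything, assuming DATA_FILE points at a line-per-record JSON file: num_samples=0 usually means the loader found zero usable records, and the _set_gradient_checkpointing TypeError is the kind of signature mismatch a transformers version bump can introduce:

      # how many records does the loader actually get?
      wc -l /Micla/Data/GPT_Data/WuDaoCorpus2.0_base_200G/part-202101281a.json

      # which transformers version is installed?
      pip show transformers | head -n 2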


ftgreat commented on August 26, 2024

Please use enough samples.

Please try the transformers 4.31.0 version. Thanks.
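For reference, pinning the suggested version is a plain pip install:

      pip install transformers==4.31.0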


Micla-SHL commented on August 26, 2024

Please use enough samples.
Please try the transformers 4.31.0 version. Thanks.

Thank you. I'm in the process of collecting samples; training should be fine once I have enough. I'm on transformers 4.35.0.
I imagine the Aquila2 from FlagScale and the one here target different use cases, right? The Aquila2 here seems fine-tuned on the provided Q&A, while FlagScale's Aquila2 is probably a more general version trained on some benchmark dataset.


ftgreat commented on August 26, 2024

Thank you. I'm in the process of collecting samples; training should be fine once I have enough. I'm on transformers 4.35.0. I imagine the Aquila2 from FlagScale and the one here target different use cases, right? The Aquila2 here seems fine-tuned on the provided Q&A, while FlagScale's Aquila2 is probably a more general version trained on some benchmark dataset.

Aquila2 base models are pretrained on about 2T tokens using FlagScale, which was forked from the Megatron-LM framework.
Aquila2 chat models are fine-tuned on hundreds of thousands of Q&A samples using the DeepSpeed framework for better usability.

Base models are the foundation of the chat models. You can take a base model and fine-tune it on your own Q&A samples.
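As a rough sketch of what that looks like with this thread's own script, with hypothetical variable names and paths (check the actual finetune.sh for the real ones):

      # point the fine-tuning script at a base checkpoint and your own Q&A data
      MODEL_PATH=/path/to/Aquila2-base-checkpoint   # hypothetical variable and path
      DATA_FILE=/path/to/your_qa_samples.jsonl
      bash finetune.sh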


Micla-SHL commented on August 26, 2024

Thanks
