
Comments (13)

Micla-SHL commented on August 26, 2024

Update: I checked #106 and edited the script:
DATA_FILE=/Micla/Project/Aquila2/data/alpaca_data_train.jsonl

but it was ineffective.
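A quick sanity check for this, assuming the path above is the intended one: confirm the file actually exists there and that its first record parses as JSON.

      # verify the dataset exists and the first line is valid JSON
      ls -lh /Micla/Project/Aquila2/data/alpaca_data_train.jsonl
      head -n 1 /Micla/Project/Aquila2/data/alpaca_data_train.jsonl | python -m json.tool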


ftgreat commented on August 26, 2024

All training logs are saved in the log dir.
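For example, something like this follows the newest log file, assuming a flat log directory (the actual location depends on how the script sets its log dir):

      # follow the most recently modified file in the log dir
      LOG_DIR=./logs   # assumption: adjust to the script's actual log dir
      tail -f "$(ls -t "$LOG_DIR"/* | head -n 1)"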


Micla-SHL commented on August 26, 2024

[screenshot: 2023-11-06 14-20-36]
OMG! This is throwing the same error that Aquila_v1 had.
[screenshot: 2023-11-06 14-24-35]


ftgreat commented on August 26, 2024

It's most likely an environment problem. If your hardware is compatible, you can use the image file we provide.


Micla-SHL commented on August 26, 2024

It's most likely an environment problem. If your hardware is compatible, you can use the image file we provide.

OK, I'll give it a go. My hunch is it's something with NCCL.
[screenshot: 2023-11-08 17-52-49]

I need to confirm whether this is right or not. Give me 2-3 days to work on it and I'll swing by with a status update.
PS: The image environment is Aquila_v1.
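If the NCCL hunch is right, NCCL's own debug logging is the quickest way to confirm it; these are standard NCCL environment variables, not anything Aquila2-specific:

      # print NCCL init/transport details during the run
      export NCCL_DEBUG=INFO
      export NCCL_DEBUG_SUBSYS=INIT,NET
      bash finetune.sh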


ftgreat commented on August 26, 2024

First, try whether this example runs:

https://github.com/FlagAI-Open/Aquila2/blob/main/examples/predict_chat.py
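Running it directly is a useful smoke test of the Python environment, assuming the repo is checked out and the model weights are reachable; no torchrun or NCCL is involved:

      # single-process sanity check
      cd Aquila2
      python examples/predict_chat.py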


Micla-SHL commented on August 26, 2024

Update: I edited finetune.sh to run python -V.
[screenshot: 2023-11-11 01-51-39]
Running ssh $ip dropped me into an unfamiliar environment, unlike my usual ssh session.

[screenshot: 2023-11-11 01-55-09]
I've identified the issue but don't have a fix yet. The sample code runs fine.


Micla-SHL commented on August 26, 2024

Update:

      ssh $ip \
        python -V              # OK
      ssh $ip eval "python -V" # python: command not found

torchrun was not found because of the eval.
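My guess at the mechanism, stated as an assumption rather than a confirmed diagnosis: ssh runs the remote command in a non-interactive shell, so conda/bashrc initialization may be skipped and PATH can differ from an interactive login. Comparing what the remote shell actually sees makes this visible:

      # show the PATH and tool locations in the remote non-interactive shell
      ssh $ip 'echo $PATH; command -v python torchrun'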

update:
add

      ssh $ip eval \
        "export PATH=$PATH;

This fixed the earlier problem, though a fresh environment setup error has arisen:

      torch._C._cuda_setDevice(device)
      RuntimeError: CUDA error: invalid device ordinal
      CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
      For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
      Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
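"invalid device ordinal" usually means a process asked for a GPU index that doesn't exist on that node, for example --nproc_per_node (or hostfile slots) set higher than the number of visible GPUs. A generic comparison of the two counts:

      # GPUs the driver reports vs. GPUs PyTorch can see
      nvidia-smi -L
      python -c "import torch; print(torch.cuda.device_count())"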

I wanna take a crack at figuring it out solo before anything else.


Micla-SHL commented on August 26, 2024

Update: it runs OK now, maybe.
As expected, hostfile: slots={num} has to agree with finetune.sh: --nproc_per_node={num}.
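For anyone hitting the same mismatch, a minimal sketch of that pairing; the node IP is hypothetical, and the hostfile follows the DeepSpeed-style "hostname slots=N" syntax:

      # hostfile: one line per node, slots = GPUs to use on that node
      192.168.1.10 slots=8

      # finetune.sh must then request the same number of local processes
      --nproc_per_node=8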

Now:
The training data is too small, which raises ValueError: num_samples should be a positive integer value, but got num_samples=0
Edited finetune.sh: DATA_FILE=/Micla/Data/GPT_Data/WuDaoCorpus2.0_base_200G/part-202101281a.json
That gives TypeError: AquilaPreTrainedModel._set_gradient_checkpointing() got an unexpected keyword argument 'enable'
I'll check the dataset user guide for pointers on fetching and rebuilding the data.
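Two quick checks before rebuilding anything, assuming DATA_FILE points at a line-per-record JSON file: num_samples=0 usually means the loader found zero usable records, and the _set_gradient_checkpointing TypeError is the kind of signature mismatch a transformers version bump can introduce:

      # how many records does the loader actually get?
      wc -l /Micla/Data/GPT_Data/WuDaoCorpus2.0_base_200G/part-202101281a.json

      # which transformers version is installed?
      pip show transformers | head -n 2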


ftgreat commented on August 26, 2024

Please use enough samples.

Please try the transformers 4.31.0 version. Thanks.
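For reference, pinning the suggested version is a plain pip install:

      pip install transformers==4.31.0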


Micla-SHL commented on August 26, 2024

Please use enough samples.
Please try the transformers 4.31.0 version. Thanks.

Thank you. I'm in the process of collecting samples; training should be fine once I have enough. I'm on transformers 4.35.0.
I imagine the Aquila2 from FlagScale and the one here target different use cases, right? The Aquila2 here seems fine-tuned on the provided Q&A, while FlagScale's Aquila2 is probably a more general version trained on some benchmark dataset.


ftgreat commented on August 26, 2024

Thank you. I'm in the process of collecting samples; training should be fine once I have enough. I'm on transformers 4.35.0. I imagine the Aquila2 from FlagScale and the one here target different use cases, right? The Aquila2 here seems fine-tuned on the provided Q&A, while FlagScale's Aquila2 is probably a more general version trained on some benchmark dataset.

Aquila2 base models are pretrained on about 2T tokens using FlagScale, which was forked from the Megatron-LM framework.
Aquila2 chat models are fine-tuned on hundreds of thousands of Q&A samples using the DeepSpeed framework for better usability.

Base models are the foundation of the chat models. You can take a base model and fine-tune it on your own Q&A samples.
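As a rough sketch of what that looks like with this thread's own script, with hypothetical variable names and paths (check the actual finetune.sh for the real ones):

      # point the fine-tuning script at a base checkpoint and your own Q&A data
      MODEL_PATH=/path/to/Aquila2-base-checkpoint   # hypothetical variable and path
      DATA_FILE=/path/to/your_qa_samples.jsonl
      bash finetune.sh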


Micla-SHL commented on August 26, 2024

Thanks
