Comments (3)
Hello. If your environment has no RDMA devices, you can turn RDMA off by setting train.rdma_enabled = False
in the config.
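
A minimal sketch of how that switch could be set automatically. It assumes a Linux host, where the kernel lists RDMA devices under /sys/class/infiniband. The helper names `has_rdma_device` and `apply_rdma_setting` are hypothetical and not part of LiBai, and `train_cfg` stands in for the `train` config object mentioned above.

```python
import os


def has_rdma_device(sysfs_dir="/sys/class/infiniband"):
    """Return True if the sysfs directory lists at least one RDMA device."""
    try:
        return len(os.listdir(sysfs_dir)) > 0
    except OSError:
        # Directory missing or unreadable: treat as "no RDMA device".
        return False


def apply_rdma_setting(train_cfg, sysfs_dir="/sys/class/infiniband"):
    """Set train_cfg.rdma_enabled to False when no RDMA device is found.

    train_cfg is any object with an `rdma_enabled` attribute (hypothetical
    stand-in for the `train` section of the config).
    """
    if not has_rdma_device(sysfs_dir):
        train_cfg.rdma_enabled = False
    return train_cfg
```

When a device is present the flag is left as it was, so an explicit train.rdma_enabled = False in the config still takes effect.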
from libai.
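Hello. You can install libai v0.2.0, which matches oneflow v0.8.0.
Alternatively, run python3 -m pip install --pre oneflow -f https://staging.oneflow.info/branch/master/cu102
to install the nightly build of oneflow, which matches the latest libai.
Note, however, that code still under development and not yet officially released has not been rigorously tested or validated, so it is less reliable than an official release.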
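
A small sketch for checking whether an installed pair matches the pairing stated above (libai 0.2.0 with oneflow 0.8.0). The distribution names "libai" and "oneflow" passed to `installed_version` are assumptions, and `KNOWN_PAIRS` contains only the single pairing mentioned in this thread.

```python
from importlib import metadata

# libai version -> matching oneflow version (only the pair named in this thread).
KNOWN_PAIRS = {"0.2.0": "0.8.0"}


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_known_pair(libai_version, oneflow_version):
    """Return True if the two versions form a pairing listed in KNOWN_PAIRS."""
    expected = KNOWN_PAIRS.get(libai_version)
    if expected is None or oneflow_version is None:
        return False
    # Drop a local build tag such as "+cu102" before comparing.
    return oneflow_version.split("+")[0] == expected


if __name__ == "__main__":
    libai_v = installed_version("libai")      # assumed distribution name
    oneflow_v = installed_version("oneflow")  # assumed distribution name
    print(libai_v, oneflow_v, is_known_pair(libai_v, oneflow_v))
```

A nightly oneflow build will not appear in `KNOWN_PAIRS`, so the check reports False for it; that only means the pairing is not a listed release pair.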
from libai.
train.rdma_enabled = False
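Thanks, that problem is solved. I also found another bug: tril_fill_value=-10000.0 on line 242 of libai/libai/layers/attention.py
has to be commented out before the program runs normally. Does commenting out this line affect the program's logic?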
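
For context, here is a plain-Python sketch of what a fill value of this kind generally does in causal (lower-triangular) attention masking. It does not reproduce the OneFlow kernel that attention.py calls; `causal_softmax` is a hypothetical name. Whether removing the argument is safe depends on the default that kernel uses, which this thread does not state.

```python
import math


def causal_softmax(scores, fill_value=-10000.0):
    """Row-wise softmax over a square score matrix with a causal mask.

    Entries above the diagonal (column j > row i) are overwritten with
    fill_value before the softmax, so a large negative fill value gives
    those positions a weight of effectively zero.
    """
    weights = []
    for i, row in enumerate(scores):
        masked = [s if j <= i else fill_value for j, s in enumerate(row)]
        row_max = max(masked)  # subtract the max for numerical stability
        exps = [math.exp(s - row_max) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

With a fill value of 0.0 instead of -10000.0, the masked positions keep a non-zero weight after the softmax, so in this sketch the choice of fill value does change the result.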
from libai.