Comments (3)
Could you share your GPU and environment details? It works without problems in my tests.
from xrayglm.
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.43.02              Driver Version: 535.98       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti     On  | 00000000:02:00.0  On |                  N/A |
| 30%   35C    P8              32W / 350W |    850MiB / 12288MiB |     26%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Package Version
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
altair 5.0.1
anyio 3.7.0
async-timeout 4.0.2
attrs 23.1.0
bitsandbytes 0.39.0
certifi 2023.5.7
charset-normalizer 3.1.0
click 8.1.3
cmake 3.26.3
contourpy 1.0.7
cpm-kernels 1.0.11
cycler 0.11.0
datasets 2.12.0
deepspeed 0.9.2
dill 0.3.6
einops 0.6.1
exceptiongroup 1.1.1
fastapi 0.95.2
ffmpy 0.3.0
filelock 3.12.0
fonttools 4.39.4
frozenlist 1.3.3
fsspec 2023.5.0
gradio 3.33.0
gradio_client 0.2.5
h11 0.14.0
hjson 3.1.0
httpcore 0.17.2
httpx 0.24.1
huggingface-hub 0.15.1
idna 3.4
Jinja2 3.1.2
jsonschema 4.17.3
kiwisolver 1.4.4
latex2mathml 3.76.0
linkify-it-py 2.0.2
lit 16.0.5
Markdown 3.4.3
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
mdit-py-plugins 0.3.3
mdtex2html 1.2.0
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.14
networkx 3.1
ninja 1.11.1
numpy 1.24.3
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
orjson 3.9.0
packaging 23.1
pandas 2.0.2
Pillow 9.5.0
pip 23.0.1
protobuf 3.20.3
psutil 5.9.5
py-cpuinfo 9.0.0
pyarrow 12.0.0
pydantic 1.10.8
pydub 0.25.1
Pygments 2.15.1
pyparsing 3.0.9
PyQt5 5.15.9
PyQt5-Qt5 5.15.2
PyQtWebEngine 5.15.6
PyQtWebEngine-Qt5 5.15.2
pyrsistent 0.19.3
python-dateutil 2.8.2
python-multipart 0.0.6
pytz 2023.3
PyYAML 6.0
regex 2023.5.5
requests 2.31.0
responses 0.18.0
scipy 1.10.1
semantic-version 2.10.0
sentencepiece 0.1.99
setuptools 67.8.0
six 1.16.0
sniffio 1.3.0
starlette 0.27.0
SwissArmyTransformer 0.3.7
sympy 1.12
tensorboardX 2.6
tokenizers 0.13.3
toolz 0.12.0
torch 2.0.1+cu118
torchaudio 2.0.2+cu118
torchvision 0.15.2+cu118
tqdm 4.65.0
transformers 4.29.2
triton 2.0.0
typing_extensions 4.6.2
tzdata 2023.3
uc-micro-py 1.0.2
urllib3 2.0.2
uvicorn 0.22.0
websockets 11.0.3
wheel 0.38.4
xxhash 3.2.0
yarl 1.9.2
(xrayglm) lsj@DESKTOP-H1KB736:/mnt/c/Users/38561/xrayglm$ python cli_demo.py --quant 4 --from_pretrained checkpoints/checkpoints-XrayGLM-300 --prompt_zh '详细描述这张胸部X光片的诊断结果'
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /home/lsj/.conda/envs/xrayglm/lib/python3.10/site-packages/bitsandbytes-0.39.0-py3.10.egg/bitsandbytes/libbitsandbytes_cuda118.so
/home/lsj/.conda/envs/xrayglm/lib/python3.10/site-packages/bitsandbytes-0.39.0-py3.10.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/lsj/.conda/envs/xrayglm did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.8/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /home/lsj/.conda/envs/xrayglm/lib/python3.10/site-packages/bitsandbytes-0.39.0-py3.10.egg/bitsandbytes/libbitsandbytes_cuda118.so...
[2023-06-02 21:00:20,419] [INFO] building FineTuneVisualGLMModel model ...
[2023-06-02 21:00:20,420] [INFO] [RANK 0] > initializing model parallel with size 1
[2023-06-02 21:00:20,421] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
/home/lsj/.conda/envs/xrayglm/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
replacing layer 0 attention with lora
replacing layer 14 attention with lora
[2023-06-02 21:00:30,567] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7811237376
[2023-06-02 21:00:37,716] [INFO] [RANK 0] global rank 0 is loading checkpoint checkpoints/checkpoints-XrayGLM-300/300/mp_rank_00_model_states.pt
Killed
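This time I added export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/wsl/lib/
There are no more errors now, but the process is still killed while loading the model. I don't know whether this is because WSL2 is not native Linux. On Windows, following the tutorial still ends with a message that deepspeed is missing. So far I have only managed to set it up on Colab.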
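One way to check whether the WSL2 VM itself is short on memory is to look at what the Linux side actually sees. The sketch below parses /proc/meminfo (a standard Linux interface) and reports total RAM, available RAM, and swap in GiB. The helper names parse_meminfo and summarize are made up for illustration and are not part of XrayGLM.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in KiB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def summarize(info: dict) -> dict:
    """Convert the fields most relevant to an OOM kill from KiB to GiB."""
    fields = ("MemTotal", "MemAvailable", "SwapTotal")
    return {name: round(info.get(name, 0) / 2**20, 2) for name in fields}


if __name__ == "__main__":
    # Run inside the WSL2 distribution, not on the Windows side.
    with open("/proc/meminfo") as f:
        print(summarize(parse_meminfo(f.read())))
```

If MemTotal comes out well below the machine's physical RAM, the memory limit of the WSL2 VM, rather than the GPU, is a likely constraint.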
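I searched around, and it seems the process was killed because Linux ran out of virtual memory. This machine has only 16 GB of RAM, so I will try again on a server in a few days. T T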
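A bare "Killed" with no Python traceback is what the kernel's out-of-memory killer typically produces, and a rough estimate supports that reading. The sketch below takes the parameter count printed in the log above (7811237376) and computes how much host RAM the weights alone would need at a few storage precisions. Which precision the checkpoint actually uses is not stated in this thread, so the dtype table is an assumption, and so is the idea that --quant 4 only takes effect after the full checkpoint has been read into CPU memory.

```python
# Parameter count from the log line
# "number of parameters on model parallel rank 0: 7811237376".
PARAM_COUNT = 7_811_237_376

# Assumed candidate storage precisions; the checkpoint's real dtype is unknown.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weights_gib(param_count: int, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return param_count * bytes_per_param / 2**30


def estimate(param_count: int = PARAM_COUNT) -> dict:
    """Return {dtype name: GiB needed for the weights} for each assumed dtype."""
    return {
        name: round(weights_gib(param_count, size), 2)
        for name, size in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    for dtype, gib in estimate().items():
        print(f"{dtype}: ~{gib} GiB for weights only")
```

Under these assumptions, even half-precision weights need roughly 14.5 GiB before counting the Python process, the OS, and any temporary copies made during loading, which leaves essentially no headroom on a 16 GB machine, and less still if the WSL2 VM is only given part of the host's RAM.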