Hello, I have come across the following error when trying to run thi

StopIteration: Caught StopIteration in replica 0 on device 0. (dnabert_2 issue, closed)

anihab commented on May 31, 2024

StopIteration: Caught StopIteration in replica 0 on device 0.

Comments (3)

Zhihan1996 commented on May 31, 2024

Hey,

Can you try running the model in distributed mode with torchrun --nproc-per-node=8 xxx.py instead of python xxx.py?
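For anyone following along: as far as I know, the "Caught StopIteration in replica 0 on device 0" wording comes from torch.nn.DataParallel, which is typically used when a single python process sees several GPUs. torchrun instead starts one process per GPU and sets the RANK, LOCAL_RANK and WORLD_SIZE environment variables for each of them. Below is a minimal sketch, not part of the DNABERT-2 code, of how a script could report which way it was launched; the function names launched_by_torchrun and describe_launch are hypothetical.

```python
import os


def launched_by_torchrun(env=None):
    """Return True if the rank variables that torchrun sets are all present.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    return all(key in env for key in ("RANK", "LOCAL_RANK", "WORLD_SIZE"))


def describe_launch(env=None):
    """Summarize the launch mode as a short human-readable string."""
    env = os.environ if env is None else env
    if launched_by_torchrun(env):
        return "distributed: local rank {} of world size {}".format(
            env["LOCAL_RANK"], env["WORLD_SIZE"]
        )
    return "single process: started with plain python"


if __name__ == "__main__":
    print(describe_launch())
```

Calling describe_launch() near the top of the training script shows whether the torchrun launch actually took effect before any model code runs.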

anihab commented on May 31, 2024

Hello! Apologies for the late response, I've been travelling. I tried running in distributed mode but ran into this output error instead (file linked below).

torchrun-output.txt

Do you think this could be an issue with my environment? I'm using the conda environment that I built from the requirements file in your GitHub repository, plus some additional packages (like pandas, since the output indicated it was missing on my first couple of runs). I have the following installed:

# packages in environment:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
accelerate                0.20.3                   pypi_0    pypi
aiohttp                   3.8.4                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
antlr4-python3-runtime    4.9.3                    pypi_0    pypi
async-timeout             4.0.2                    pypi_0    pypi
attrs                     23.1.0                   pypi_0    pypi
biopython                 1.78             py38h7f8727e_0  
blas                      1.0                         mkl  
bottleneck                1.3.5            py38h7deecbd_0  
ca-certificates           2023.05.30           h06a4308_0  
certifi                   2023.5.7                 pypi_0    pypi
charset-normalizer        3.2.0                    pypi_0    pypi
cmake                     3.26.4                   pypi_0    pypi
datasets                  2.13.1                   pypi_0    pypi
dill                      0.3.6                    pypi_0    pypi
einops                    0.6.1                    pypi_0    pypi
evaluate                  0.4.0                    pypi_0    pypi
filelock                  3.12.2                   pypi_0    pypi
frozenlist                1.3.3                    pypi_0    pypi
fsspec                    2023.6.0                 pypi_0    pypi
huggingface-hub           0.16.4                   pypi_0    pypi
idna                      3.4                      pypi_0    pypi
intel-openmp              2023.1.0         hdb19cb5_46305  
jinja2                    3.1.2                    pypi_0    pypi
joblib                    1.2.0            py38h06a4308_0  
ld_impl_linux-64          2.38                 h1181459_1  
libffi                    3.4.4                h6a678d5_0  
libgcc-ng                 11.2.0               h1234567_1  
libgfortran-ng            11.2.0               h00389a5_1  
libgfortran5              11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libstdcxx-ng              11.2.0               h1234567_1  
lit                       16.0.6                   pypi_0    pypi
markupsafe                2.1.3                    pypi_0    pypi
mkl                       2023.1.0         h6d00ec8_46342  
mkl-service               2.4.0            py38h5eee18b_1  
mkl_fft                   1.3.6            py38h417a72b_1  
mkl_random                1.2.2            py38h417a72b_1  
mpmath                    1.3.0                    pypi_0    pypi
multidict                 6.0.4                    pypi_0    pypi
multiprocess              0.70.14                  pypi_0    pypi
ncurses                   6.4                  h6a678d5_0  
networkx                  3.1                      pypi_0    pypi
numexpr                   2.8.4            py38hc78ab66_1  
numpy                     1.24.4                   pypi_0    pypi
numpy-base                1.24.3           py38h060ed82_1  
nvidia-cublas-cu11        11.10.3.66               pypi_0    pypi
nvidia-cuda-cupti-cu11    11.7.101                 pypi_0    pypi
nvidia-cuda-nvrtc-cu11    11.7.99                  pypi_0    pypi
nvidia-cuda-runtime-cu11  11.7.99                  pypi_0    pypi
nvidia-cudnn-cu11         8.5.0.96                 pypi_0    pypi
nvidia-cufft-cu11         10.9.0.58                pypi_0    pypi
nvidia-curand-cu11        10.2.10.91               pypi_0    pypi
nvidia-cusolver-cu11      11.4.0.1                 pypi_0    pypi
nvidia-cusparse-cu11      11.7.4.91                pypi_0    pypi
nvidia-nccl-cu11          2.14.3                   pypi_0    pypi
nvidia-nvtx-cu11          11.7.91                  pypi_0    pypi
omegaconf                 2.3.0                    pypi_0    pypi
openssl                   3.0.9                h7f8727e_0  
packaging                 23.1                     pypi_0    pypi
pandas                    2.0.3                    pypi_0    pypi
peft                      0.3.0                    pypi_0    pypi
pip                       23.1.2           py38h06a4308_0  
psutil                    5.9.5                    pypi_0    pypi
pyarrow                   12.0.1                   pypi_0    pypi
python                    3.8.17               h955ad1f_0  
python-dateutil           2.8.2              pyhd3eb1b0_0  
pytz                      2023.3                   pypi_0    pypi
pyyaml                    6.0                      pypi_0    pypi
readline                  8.2                  h5eee18b_0  
regex                     2023.6.3                 pypi_0    pypi
requests                  2.31.0                   pypi_0    pypi
responses                 0.18.0                   pypi_0    pypi
safetensors               0.3.1                    pypi_0    pypi
scikit-learn              1.2.2            py38h6a678d5_1  
scipy                     1.9.3            py38hf6e8229_2  
setuptools                67.8.0           py38h06a4308_0  
six                       1.16.0             pyhd3eb1b0_1  
sqlite                    3.41.2               h5eee18b_0  
sympy                     1.12                     pypi_0    pypi
tbb                       2021.8.0             hdb19cb5_0  
threadpoolctl             2.2.0              pyh0d69192_0  
tk                        8.6.12               h1ccaba5_0  
tokenizers                0.13.3                   pypi_0    pypi
torch                     2.0.1                    pypi_0    pypi
tqdm                      4.65.0                   pypi_0    pypi
transformers              4.30.2                   pypi_0    pypi
triton                    2.0.0                    pypi_0    pypi
typing-extensions         4.7.1                    pypi_0    pypi
tzdata                    2023.3                   pypi_0    pypi
urllib3                   2.0.3                    pypi_0    pypi
wheel                     0.38.4           py38h06a4308_0  
xxhash                    3.2.0                    pypi_0    pypi
xz                        5.4.2                h5eee18b_0  
yarl                      1.9.2                    pypi_0    pypi
zlib                      1.2.13               h5eee18b_0
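
When comparing environments, the full listing is often more than is needed. Here is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata), that prints only the versions of a handful of packages. The selection of packages and the helper name installed_versions are my own choices for illustration, not something the DNABERT-2 repository prescribes.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages chosen for illustration; adjust to whatever the script imports.
    wanted = ["torch", "transformers", "accelerate", "peft", "triton"]
    for name, version in installed_versions(wanted).items():
        print("{:15s} {}".format(name, version or "not installed"))
```

Running this inside the activated conda environment gives a short list that is easier to compare against the requirements file than the full conda list output.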

lucaskbobadilla commented on May 31, 2024

Any ideas here? I'm getting the same error.
