Comments (13)
Hi @bridgeqiqi, @antal-horvath, and @doulemint. We have fixed the Docker containers and the tutorials in our latest release: https://github.com/facebookresearch/vissl/releases/tag/v0.1.6
Please check out https://vissl.ai/tutorials/.
from vissl.
Hi @antal-horvath, thank you for the report. We need to update our Docker images. In the meantime, I'd recommend following INSTALL.md to cross-check for any missing commands.
For Classy Vision, you can run pip install classy-vision@https://github.com/facebookresearch/ClassyVision/tarball/master and it should work :)
Thanks for the quick reply. I'll try next Monday and will let you know if it worked.
Alright, I followed the installation instructions for the vissl pip package up to RUN pip install vissl. This step took forever because the dependency resolver never finished here:
INFO: pip is looking at multiple versions of pillow to determine which version is compatible with other requirements. This could take a while.
Downloading Pillow-7.0.0-cp38-cp38-manylinux1_x86_64.whl (2.1 MB)
Downloading Pillow-6.2.2-cp38-cp38-manylinux1_x86_64.whl (2.1 MB)
Downloading Pillow-6.2.1-cp38-cp38-manylinux1_x86_64.whl (2.1 MB)
Downloading Pillow-6.2.0.tar.gz (37.4 MB)
I then instead followed step 4.
Now, although I installed apex using this line in the Dockerfile
RUN VERSIONSTR=$(python3 /tmp/get_version_str.py); echo $VERSIONSTR; python3 -m pip install -f https://dl.fbaipublicfiles.com/vissl/packaging/apexwheels/$VERSIONSTR/download.html apex;
I realized after running the swav training command again that there is a warning about apex not being available.
Also, python3 -c 'import apex' yields
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/apex/__init__.py", line 13, in <module>
    from pyramid.session import UnencryptedCookieSessionFactoryConfig
ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)
In my case, the Dockerfile command above yields the following during the docker build process:
py38_cu102_pyt181
Looking in links: https://dl.fbaipublicfiles.com/vissl/packaging/apexwheels/py38_cu102_pyt181/download.html
Collecting apex
Downloading apex-0.9.10dev.tar.gz (36 kB)
...
...
Successfully built apex velruse anykeystore cryptacular pbkdf2
Installing collected packages: zope.interface, urllib3, plaster, PasteDeploy, idna, chardet, certifi, zope.deprecation, webob, venusian, translationstring, transaction, requests, plaster-pastedeploy, oauthlib, MarkupSafe, hupper, greenlet, defusedxml, wtforms, SQLAlchemy, requests-oauthlib, repoze.sendmail, python3-openid, pyramid, pbkdf2, anykeystore, zope.sqlalchemy, wtforms-recaptcha, velruse, pyramid-mailer, cryptacular, apex
Successfully installed MarkupSafe-1.1.1 PasteDeploy-2.1.1 SQLAlchemy-1.4.12 anykeystore-0.2 apex-0.9.10.dev0 certifi-2020.12.5 chardet-4.0.0 cryptacular-1.5.5 defusedxml-0.7.1 greenlet-1.0.0 hupper-1.10.2 idna-2.10 oauthlib-3.1.0 pbkdf2-1.3 plaster-1.0 plaster-pastedeploy-0.7 pyramid-2.0 pyramid-mailer-0.15.1 python3-openid-3.2.0 repoze.sendmail-4.4.1 requests-2.25.1 requests-oauthlib-1.3.0 transaction-3.0.1 translationstring-1.4 urllib3-1.26.4 velruse-1.1.1 venusian-3.0.0 webob-1.8.7 wtforms-2.3.3 wtforms-recaptcha-0.3.2 zope.deprecation-4.4.0 zope.interface-5.4.0 zope.sqlalchemy-1.4
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
The last warning might suggest that RUN python3 -m pip install -f https://dl.fbaipublicfiles.com/vissl/packaging/apexwheels/py38_cu102_pyt181/download.html apex might not work as expected.
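The list of installed packages in the log above (pyramid, velruse, cryptacular, wtforms, ...) suggests that pip resolved apex to an unrelated web-authentication project rather than NVIDIA Apex. One way to check which one ended up in the environment is to look at the installed distribution's declared dependencies. The sketch below is only illustrative: the helper names and the marker set are assumptions made for this example, and it uses nothing beyond the standard-library importlib.metadata module (Python 3.8+).

```python
import re
from importlib import metadata

# Packages that the build log shows being installed alongside `apex`.
# Assumption for this sketch: NVIDIA Apex does not depend on any of them.
WEB_STACK_MARKERS = {"pyramid", "velruse", "cryptacular", "wtforms"}


def requirement_name(req):
    """Return the lowercased distribution name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9._-]+)", req)
    return match.group(1).lower() if match else ""


def looks_like_wrong_apex(requirements):
    """True if the declared requirements include the Pyramid web stack."""
    names = {requirement_name(r) for r in (requirements or [])}
    return bool(names & WEB_STACK_MARKERS)


def check_installed_apex():
    """Describe the `apex` distribution installed in the current environment."""
    try:
        reqs = metadata.requires("apex")  # list of requirement strings, or None
    except metadata.PackageNotFoundError:
        return "apex is not installed"
    if looks_like_wrong_apex(reqs):
        return "installed 'apex' depends on the Pyramid web stack; probably not NVIDIA Apex"
    return "installed 'apex' does not depend on the Pyramid web stack"


if __name__ == "__main__":
    print(check_installed_apex())
```

If the check reports the Pyramid variant, uninstalling it before installing Apex from another source avoids the two packages shadowing each other under the same import name.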
I think we can close this issue as it is indeed solved with
RUN python3 -m pip uninstall -y classy_vision
RUN python3 -m pip install classy-vision@https://github.com/facebookresearch/ClassyVision/tarball/master
But, since this issue was a result of the dockerfile not being up-to-date, and I was trying to fix it, I want to share my experience.
- Adding the installation of classy-vision to the current version of the dockerfile in this repo indeed solved the problem. But with the setup of cuda 10.2 and torch==1.5, I am not able to pass the tests in ./dev/run/quick_tests.sh, as that setup does not support the GPU architecture A100-SXM4.
- Therefore, I tried to set up my own dockerfile, and because of the WARNING at the bottom of my last comment I also added venv to the docker image, following https://pythonspeed.com/articles/activate-virtualenv-dockerfile/. Here's my dockerfile:
FROM nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04
ENV DEBIAN_FRONTEND noninteractive
# Install some basic utilities
RUN apt-get update && apt-get install -y wget vim nano curl ca-certificates sudo git python3 python3-dev python3-pip python3-distutils python3-setproctitle python3-opencv python3-venv && rm -rf /var/lib/apt/lists/*
# set virtual environment venv in docker, see https://pythonspeed.com/articles/activate-virtualenv-dockerfile/
ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
# could be necessary for vissl? adapted it to venv
RUN ln -sv $VIRTUAL_ENV/bin/python3 /usr/bin/python
# setup python environment
RUN python3 -m pip install --no-cache-dir -U setuptools
RUN python3 -m pip install --upgrade pip
# install pytorch
RUN python3 -m pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu112/torch_nightly.html
# install apex
COPY ./get_version_str.py /tmp/get_version_str.py
RUN VERSIONSTR=$(python3 /tmp/get_version_str.py); echo $VERSIONSTR; python3 -m pip install -f https://dl.fbaipublicfiles.com/vissl/packaging/apexwheels/$VERSIONSTR/download.html apex;
# install vissl
RUN git clone --recursive https://github.com/facebookresearch/vissl.git /tmp/vissl
WORKDIR /tmp/vissl
RUN python3 -m pip install --progress-bar off -r requirements.txt
RUN python3 -m pip install opencv-python
RUN python3 -m pip uninstall -y classy_vision
RUN python3 -m pip install classy-vision@https://github.com/facebookresearch/ClassyVision/tarball/master
RUN python3 -m pip install -e ".[dev]"
#RUN python3 -m pip uninstall pyramid -y
RUN python3 -c 'import vissl, apex'
But docker build --network=host -t vissl:latest . still fails when trying to import apex at the last line, which results in the same error mentioned in the previous comment:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/venv/lib/python3.8/site-packages/apex/__init__.py", line 13, in <module>
    from pyramid.session import UnencryptedCookieSessionFactoryConfig
ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)
And the solution proposed in https://stackoverflow.com/questions/66610378/unencryptedcookiesessionfactoryconfig-error-when-importing-apex does not solve the problem, because then we would get the error No module named 'pyramid'.
@prigoyal I'm eager to use this repo, but I would need a docker version with the latest cuda and pytorch versions without apex problems. When do you plan to update the dockerfile?
Hi @antal-horvath, thank you so much. I appreciate you sharing the docker file above. We just need to refresh the docker files on our end. If you are up for it, please feel free to send a PR. :)
As for Apex, we have built Apex packages up to pytorch 1.8.0 only. Other versions will be supported in a future release. However, you don't need to be blocked by this. We provide a simple bash script that installs Apex from source: https://github.com/facebookresearch/vissl/blob/master/docker/common/install_apex.sh
It sounds like this might be the scalable long-term solution for you (?), since you'd like to keep adapting Apex, cuda, pytorch, etc. to the latest versions? :)
I am excited and willing to support you. Let me know what you prefer/what solution works :)
Actually, I can't get it to work, neither with pytorch 1.8.0 (same error with pyramid) nor with the above-mentioned script (with cuda 11.1, cudnn8, and torch 1.8.0). The script results in the following error:
...
  File "/opt/venv/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1561, in _get_cuda_arch_flags
    arch_list[-1] += '+PTX'
IndexError: list index out of range
Running setup.py install for apex: finished with status 'error'
ERROR: Command errored out with exit status 1: /opt/venv/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-sv_7wu_k/apex_5dfed5a412cf4f1c9412e6765e702701/setup.py'"'"'; __file__='"'"'/tmp/pip-install-sv_7wu_k/apex_5dfed5a412cf4f1c9412e6765e702701/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-w3h7171m/install-record.txt --single-version-externally-managed --compile --install-headers /opt/venv/include/site/python3.8/apex Check the logs for full command output.
Exception information:
Traceback (most recent call last):
File "/opt/venv/lib/python3.8/site-packages/pip/_internal/req/req_install.py", line 825, in install
success = install_legacy(
File "/opt/venv/lib/python3.8/site-packages/pip/_internal/operations/install/legacy.py", line 81, in install
raise LegacyInstallFailure
pip._internal.operations.install.legacy.LegacyInstallFailure
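The IndexError in _get_cuda_arch_flags is consistent with torch.utils.cpp_extension seeing no GPU during docker build while the TORCH_CUDA_ARCH_LIST environment variable is unset, which would leave the architecture list empty. A commonly used workaround is to set that variable explicitly before compiling. The sketch below only shows how such a value can be composed; the helper names are hypothetical, and the choice of compute capability 8.0 (the A100) as the target is an assumption based on the GPU mentioned earlier in this thread.

```python
import os


def format_arch_list(capabilities, add_ptx=True):
    """Build a TORCH_CUDA_ARCH_LIST value such as '7.0;8.0+PTX'.

    `capabilities` is an iterable of (major, minor) compute capabilities.
    """
    unique = sorted(set(capabilities))
    if not unique:
        raise ValueError("at least one compute capability is required")
    value = ";".join(f"{major}.{minor}" for major, minor in unique)
    # '+PTX' on the last entry also embeds PTX for forward compatibility.
    return value + "+PTX" if add_ptx else value


def build_env(capabilities, base_env=None):
    """Return a copy of the environment with TORCH_CUDA_ARCH_LIST set.

    An existing value is kept, so an explicit user setting wins.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.setdefault("TORCH_CUDA_ARCH_LIST", format_arch_list(capabilities))
    return env


if __name__ == "__main__":
    # Assumed target: A100 (compute capability 8.0).
    print(build_env([(8, 0)])["TORCH_CUDA_ARCH_LIST"])
```

The resulting environment can be passed to whatever process runs the Apex build; in a Dockerfile, the equivalent would be an ENV line that sets TORCH_CUDA_ARCH_LIST before the install step.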
Thank you, @antal-horvath. We will take this into account and work on providing an official VISSL docker image.
To unblock you immediately: it sounds like your blocker is primarily the Apex installation? Apex is currently used/required for 3 things:
1) AMP for mixed precision
2) SyncBatchNorm layer
3) LARC
For 1) and 2), PyTorch now provides alternatives.
On 1), for using PyTorch AMP, see https://github.com/facebookresearch/vissl/blob/master/vissl/config/defaults.yaml#L724
On 2), for using PyTorch SyncBN, see https://github.com/facebookresearch/vissl/blob/master/vissl/config/defaults.yaml#L706
For 3), we recently added LARS to VISSL. We can look into adding LARC directly as well. LMK if this is a blocker.
Hope this helps.
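To make the suggestion above concrete, here is a small sketch that turns a nested settings tree into dotted command-line overrides of the kind VISSL's Hydra-based configs accept. The key names (MODEL.AMP_PARAMS.AMP_TYPE and MODEL.SYNC_BN_CONFIG.SYNC_BN_TYPE) are an assumption about what the linked lines of defaults.yaml contain, and flatten_overrides is a hypothetical helper, so please verify both against the config file before relying on them.

```python
def flatten_overrides(tree, prefix="config"):
    """Flatten a nested dict into 'a.b.c=value' override strings."""
    overrides = []
    for key, value in tree.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            overrides.extend(flatten_overrides(value, path))
        else:
            overrides.append(f"{path}={value}")
    return overrides


# Assumed key names; check them against the linked defaults.yaml lines.
APEX_FREE_SETTINGS = {
    "MODEL": {
        "AMP_PARAMS": {"AMP_TYPE": "pytorch"},
        "SYNC_BN_CONFIG": {"SYNC_BN_TYPE": "pytorch"},
    }
}

if __name__ == "__main__":
    # Prints one override per line, ready to append to a training command.
    for override in flatten_overrides(APEX_FREE_SETTINGS):
        print(override)
```

Keeping the settings in one nested structure makes it easy to see at a glance which Apex-dependent options have been switched to their PyTorch counterparts.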
New update: the link https://github.com/facebookresearch/ClassyVision/tarball/master doesn't work right now.
Hi @doulemint, thank you for reaching out. The classy vision master branch was renamed to main. Could you try with that? Our installation instructions are updated to reflect the changes. https://github.com/facebookresearch/vissl/blob/main/INSTALL.md#step-4-install-vissl :)
How do I register the dataset_catalog?
I followed the official tutorial code, save_file(json_data, '<my_vissl_path>/configs/config/dataset_catalog.json'), and then ran print(VisslDatasetCatalog.list()). It shows an empty list [].
Hi @antal-horvath and @doulemint, I've fixed the docker containers in this PR: #458
LMK if you have any questions.
@bridgeqiqi The tutorials have a few problems right now, I am working on fixing them and will release ASAP.