Another error:
Command issued for the mnist example:
C:\mlperf\mlbox_11062020\box_examples\mnist> docker run --rm --net=host --privileged=true --volume C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/data:/mlbox_io0/data --volume C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/download_logs:/mlbox_io1/download_logs serebrya/mlbox_mnist:0.0.2 download --data_dir=/mlbox_io0/data --log_dir=/mlbox_io1/download_logs
Here is the error:
2020-11-10 16:58:42.772479: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-11-10 16:58:42.772697: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-11-10 16:58:42.772714: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
@hshaikusa These messages are OK: they are warnings, not errors. When no GPUs are available, TF should fall back to the CPU compute backend. I see these messages on Linux machines as well.
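Each TensorFlow C++ log line carries a one-letter severity right after the timestamp: `I` for info, `W` for warning, `E` for error and `F` for fatal. The `libnvinfer` lines above are all `W`. The sketch below picks out that letter so a script can ignore warnings and flag only real errors. `tf_log_severity` and `has_real_errors` are hypothetical helpers, not part of TensorFlow or MLBox.

```python
import re
from typing import Iterable, Optional

# Matches the prefix of TF C++ log lines such as
# "2020-11-10 16:58:42.772479: W tensorflow/...cc:55] message".
# The captured letter is the severity (I, W, E or F).
_TF_LOG = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: ([IWEF]) ")


def tf_log_severity(line: str) -> Optional[str]:
    """Return the severity letter of a TF C++ log line, or None if it is not one."""
    match = _TF_LOG.match(line)
    return match.group(1) if match else None


def has_real_errors(lines: Iterable[str]) -> bool:
    """True only if some line was logged at ERROR or FATAL severity."""
    return any(tf_log_severity(line) in ("E", "F") for line in lines)


if __name__ == "__main__":
    log = [
        "2020-11-10 16:58:42.772479: W tensorflow/stream_executor/platform/default/"
        "dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'",
        "2020-11-10 16:58:42.772714: W tensorflow/compiler/tf2tensorrt/utils/"
        "py_utils.cc:30] Cannot dlopen some TensorRT libraries.",
    ]
    print([tf_log_severity(line) for line in log])  # ['W', 'W']
    print(has_real_errors(log))  # False
```

You can also hide these warnings entirely by setting the `TF_CPP_MIN_LOG_LEVEL=2` environment variable inside the container. That filters TensorFlow's C++ INFO and WARNING messages.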
@sergey-serebryakov, OK.
Here is another error I am facing with mnist:
Command:
C:\mlperf\mlbox_11062020\box_examples\mnist> mlcommons_box_docker run --mlbox=. --platform=platforms/docker.yaml --task=run/train.yaml
Outcome:
MLBox(root=C:\mlperf\mlbox_11062020\box_examples\mnist, name=mnist, version=0.1.0, task=MLBoxTask(inputs={'data_dir': 'directory', 'parameters_file': 'file'}, outputs={'log_dir': 'directory', 'model_dir': 'directory'}), invoke=MLBoxInvoke(task_name=train, input_binding={'data_dir': '$WORKSPACE/data', 'parameters_file': '$WORKSPACE/parameters/default.parameters.yaml'}, output_binding={'log_dir': '$WORKSPACE/train_logs', 'model_dir': '$WORKSPACE/model'}), platform=<mlcommons_box.common.objects.platform_config.PlatformConfig object at 0x0000015A78854F48>)
docker inspect --type=image serebrya/mlbox_mnist:0.0.2 > /dev/null 2>&1
The system cannot find the path specified.
Docker image (serebrya/mlbox_mnist:0.0.2) does not exist. Running 'configure' phase.
docker pull serebrya/mlbox_mnist:0.0.2
0.0.2: Pulling from serebrya/mlbox_mnist
Digest: sha256:75667646473cda957bd23b52b6f660fb462986d7776d323a654ae59269ce02b9
Status: Image is up to date for serebrya/mlbox_mnist:0.0.2
docker.io/serebrya/mlbox_mnist:0.0.2
mounts={'C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/data': '/mlbox_io0/data', 'C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/parameters': '/mlbox_io1/C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/parameters', 'C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/train_logs': '/mlbox_io2/train_logs', 'C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/model': '/mlbox_io3/model'}, args=['train', '--data_dir=/mlbox_io0/data', '--parameters_file=/mlbox_io1/C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/parameters/default.parameters.yaml', '--log_dir=/mlbox_io2/train_logs', '--model_dir=/mlbox_io3/model']
docker run --rm --net=host --privileged=true --volume C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/data:/mlbox_io0/data --volume C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/parameters:/mlbox_io1/C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/parameters --volume C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/train_logs:/mlbox_io2/train_logs --volume C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/model:/mlbox_io3/model serebrya/mlbox_mnist:0.0.2 train --data_dir=/mlbox_io0/data --parameters_file=/mlbox_io1/C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/parameters/default.parameters.yaml --log_dir=/mlbox_io2/train_logs --model_dir=/mlbox_io3/model
docker: Error response from daemon: invalid mode: \mlperf\mlbox_11062020\box_examples\mnist\workspace/parameters.
See 'docker run --help'.
Traceback (most recent call last):
  File "c:\programdata\anaconda3\envs\mlbox_11062020\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\programdata\anaconda3\envs\mlbox_11062020\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\ProgramData\Anaconda3\envs\mlbox_11062020\Scripts\mlcommons_box_docker.exe\__main__.py", line 7, in <module>
  File "c:\programdata\anaconda3\envs\mlbox_11062020\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "c:\programdata\anaconda3\envs\mlbox_11062020\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "c:\programdata\anaconda3\envs\mlbox_11062020\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\programdata\anaconda3\envs\mlbox_11062020\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\programdata\anaconda3\envs\mlbox_11062020\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "c:\programdata\anaconda3\envs\mlbox_11062020\lib\site-packages\mlcommons_box_docker\__main__.py", line 45, in run
    runner.run()
  File "c:\programdata\anaconda3\envs\mlbox_11062020\lib\site-packages\mlcommons_box_docker\docker_run.py", line 72, in run
    self._run_or_die(cmd)
  File "c:\programdata\anaconda3\envs\mlbox_11062020\lib\site-packages\mlcommons_box_docker\docker_run.py", line 117, in _run_or_die
    raise RuntimeError('Command failed: {}'.format(cmd))
RuntimeError: Command failed: docker run --rm --net=host --privileged=true --volume C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/data:/mlbox_io0/data --volume C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/parameters:/mlbox_io1/C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/parameters --volume C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/train_logs:/mlbox_io2/train_logs --volume C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/model:/mlbox_io3/model serebrya/mlbox_mnist:0.0.2 train --data_dir=/mlbox_io0/data --parameters_file=/mlbox_io1/C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/parameters/default.parameters.yaml --log_dir=/mlbox_io2/train_logs --model_dir=/mlbox_io3/model
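Two lines near the top of this output look like a separate Windows issue. The runner checks for the image with `docker inspect ... > /dev/null 2>&1`, but `cmd.exe` has no `/dev/null` (its null device is `NUL`). That would explain `The system cannot find the path specified.` It would also explain why a `configure` phase runs even though the pull reports the image is already up to date.

The sketch below shows a shell-independent check, assuming the runner can call Docker directly through Python's `subprocess`. `image_exists` is a hypothetical helper, not part of `mlcommons_box_docker`.

```python
import subprocess


def image_exists(image: str, docker: str = "docker") -> bool:
    """Return True if `docker inspect --type=image <image>` exits with status 0.

    Output is discarded with subprocess.DEVNULL instead of a shell redirection
    such as '> /dev/null 2>&1', so the check behaves the same under cmd.exe,
    PowerShell and POSIX shells (no shell is involved at all).
    """
    try:
        result = subprocess.run(
            [docker, "inspect", "--type=image", image],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # The docker executable itself could not be found on PATH.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print(image_exists("serebrya/mlbox_mnist:0.0.2"))
```

Passing the command as a list (rather than one string with `shell=True`) also avoids quoting differences between Windows and POSIX shells.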
@hshaikusa Thanks. There is one more issue to fix, related to how mount points are constructed; I have updated the first message in this thread.
I cannot run Docker on my Windows laptop (probably due to McAfee), so I have asked our admins to allocate a Windows virtual instance that I can use for testing.
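The `invalid mode` error follows from Docker's `--volume host_path:container_path[:options]` syntax. The container path for the `parameters` mount embeds the Windows host path (`/mlbox_io1/C:\mlperf\...`). Docker therefore meets an extra colon and reads `\mlperf\...\parameters` as the mount options. The other mounts already follow a `/mlbox_io<N>/<directory name>` pattern.

The sketch below builds every mount that way, assuming a file input such as `parameters_file` is handled by mounting its parent directory. `build_mounts` and its input format are hypothetical, not the actual `mlcommons_box_docker` code.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Dict, List, Tuple


def build_mounts(
    workspace: str, inputs: Dict[str, Tuple[str, str]]
) -> Tuple[Dict[str, str], List[str]]:
    """Return (host dir -> container dir mounts, task CLI args).

    `inputs` maps a parameter name to (kind, path relative to the workspace),
    where kind is "directory" or "file". Assumes a Windows host.
    """
    root = PureWindowsPath(workspace)
    mounts: Dict[str, str] = {}
    args: List[str] = []
    for index, (name, (kind, rel_path)) in enumerate(inputs.items()):
        # PureWindowsPath accepts '/' and normalises it to '\' (no mixed separators).
        host = root / PureWindowsPath(rel_path)
        # Mount directories as-is; for a file, mount its parent directory.
        host_dir = host if kind == "directory" else host.parent
        # The container side uses only the last path component, so it never
        # contains a drive letter or an extra ':' for Docker to trip over.
        target = PurePosixPath(f"/mlbox_io{index}") / host_dir.name
        mounts[str(host_dir)] = str(target)
        in_container = target if kind == "directory" else target / host.name
        args.append(f"--{name}={in_container}")
    return mounts, args


if __name__ == "__main__":
    mounts, args = build_mounts(
        r"C:\mlperf\mlbox_11062020\box_examples\mnist\workspace",
        {
            "data_dir": ("directory", "data"),
            "parameters_file": ("file", "parameters/default.parameters.yaml"),
            "log_dir": ("directory", "train_logs"),
            "model_dir": ("directory", "model"),
        },
    )
    for host, target in mounts.items():
        print(f"--volume {host}:{target}")
    print(args)
```

With the mnist workspace this yields `--parameters_file=/mlbox_io1/parameters/default.parameters.yaml`, matching the style of the other three mounts. On a Linux or macOS host, `PureWindowsPath` would be swapped for `pathlib.Path`.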
I think we may need to support Windows-specific file path construction. As a workaround for now (while we work to stabilize the code), we could use WSL and add instructions for that.
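Under WSL, Windows drives are mounted by default under `/mnt/<drive letter>`, so `C:\mlperf` is visible as `/mnt/c/mlperf`. A WSL-based workaround therefore mainly needs host paths in that form. The sketch below does the conversion, assuming WSL's default automount root. `to_wsl_path` is a hypothetical helper; from inside a WSL shell, the `wslpath` utility performs the same translation.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_wsl_path(windows_path: str, mount_root: str = "/mnt") -> str:
    r"""Translate an absolute drive-letter path (C:\a\b) to its WSL form (/mnt/c/a/b).

    Assumes the default WSL automount root. Mixed separators such as
    'C:\x\workspace/data' are accepted because PureWindowsPath treats '/'
    as a separator too.
    """
    path = PureWindowsPath(windows_path)
    # Require a full "X:\..." path: rejects relative paths, "C:foo" and UNC shares.
    if not path.is_absolute() or not path.drive.endswith(":"):
        raise ValueError(f"expected an absolute drive-letter path, got {windows_path!r}")
    drive_letter = path.drive[0].lower()
    # path.parts[0] is the anchor ("C:\"); the remaining parts are directory names.
    return str(PurePosixPath(mount_root, drive_letter, *path.parts[1:]))


if __name__ == "__main__":
    print(to_wsl_path(r"C:\mlperf\mlbox_11062020\box_examples\mnist\workspace/data"))
    # /mnt/c/mlperf/mlbox_11062020/box_examples/mnist/workspace/data
```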
Update: I got access to a Windows server and was able to install Docker. I should be able to provide a fix for Windows systems (local Docker runner) next week.
@sergey-serebryakov Cool, looking forward to the fixes. Please plan to push them to PyPI once you have finished your own validation. I would like to validate them as an outsider who downloads them following the instructions and plays with them.