Hi, I was able to use FER on CPU, but cannot make it work on GPU. Based on thi

FER not working with GPU,about justinshenk/fer

Comments (9)

kormoczi commented on May 23, 2024 1

Dear @Saran-nns and @JustinShenk,

My results were similar...
If GPU initialization fails for any reason (as in your example, @Saran-nns), the system falls back to the CPU, so everything works (or at least appears to).
But if GPU initialization succeeds, then errors follow afterwards.
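
A silent CPU fallback can be told apart from real GPU use by asking TensorFlow which devices it registered. The following is a minimal sketch, assuming TensorFlow 2.x; `visible_gpus` is a hypothetical helper name used only for illustration and is not part of FER.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow registered.

    Returns None when TensorFlow cannot be imported at all, and an empty
    list when TensorFlow is present but fell back to the CPU.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # list_physical_devices is part of the public tf.config API in TF 2.x.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment.")
    elif gpus:
        print("TensorFlow registered these GPUs:", gpus)
    else:
        print("No GPU registered; TensorFlow will run on the CPU.")
```

Running this before `example.py` shows which of the two cases applies: an empty list means the fallback path, a non-empty list means any later error happens after a successful GPU initialization.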

So I think the question is still pending...

Best regards,
Csaba

from fer.

kormoczi commented on May 23, 2024 1

Hi @Saran-nns,

Thanks for your suggestions, I will check them...
But I have two questions:

1. You wrote this: "From your log, it is clear that the CUDA couldn't reach the cudnn .dll files." Which part of my log shows this?
2. You suggested using CUDA 11.x and cuDNN 8.x, but as I stated at the beginning, I am already using CUDA 11.0.3 / cuDNN 8.0.5, so this should be OK, no?

Best regards


kormoczi commented on May 23, 2024 1

Hi @Saran-nns,

I have checked the project again and, to my great surprise, after I rebuilt the Docker image (without any modification), the example now works without any problem!
I can't tell yet what has changed, but most probably neither the FER library nor CUDA/cuDNN...

Best regards


Saran-nns commented on May 23, 2024

Thanks for reporting the issue.

I suspect possible compatibility issues between the OS and the CUDA/cuDNN versions.

I ran example.py in this environment:

OS: Windows 10
Python: 3.6
TF: 2.4
CUDA: 10.2 with 11.0 dll
cuDNN: 8.2

The script ran without issues, as seen below:

2021-08-28 14:07:48.781767: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
WARNING:tensorflow:From C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\compat\v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2021-08-28 14:08:42.383958: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2021-08-28 14:08:47.564227: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 with Max-Q Design computeCapability: 6.1
coreClock: 1.3415GHz coreCount: 10 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.99GiB/s
2021-08-28 14:08:47.602994: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-08-28 14:08:47.743832: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2021-08-28 14:08:47.801707: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2021-08-28 14:08:48.108700: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2021-08-28 14:08:48.163044: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2021-08-28 14:08:48.172109: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusolver64_11.dll'; dlerror: cusolver64_11.dll not found
2021-08-28 14:08:48.180278: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2021-08-28 14:08:48.189189: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2021-08-28 14:08:48.195744: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-08-28 14:08:48.347281: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-28 14:08:48.421196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-28 14:08:48.435452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]
2021-08-28 14:08:51.170786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 with Max-Q Design computeCapability: 6.1
coreClock: 1.3415GHz coreCount: 10 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.99GiB/s
2021-08-28 14:08:51.183624: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-08-28 14:08:53.758128: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-28 14:08:53.765005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-08-28 14:08:53.769173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
WARNING:tensorflow:From C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\keras\layers\normalization.py:534: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
28-08-2021:14:08:54,419 WARNING [deprecation.py:336] From C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\keras\layers\normalization.py:534: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\keras\engine\training.py:2426: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
[{'box': (83, 83, 200, 200), 'emotions': {'angry': 0.0, 'disgust': 0.0, 'fear': 0.0, 'happy': 0.97, 'sad': 0.0, 'surprise': 0.0, 'neutral': 0.03}}]

May I know:

1. Which OS are you using?
2. Do you have multiple versions of CUDA installed?

Please try upgrading to cuDNN 8.2 and let us know if the error persists.


JustinShenk commented on May 23, 2024

It ran without issues, as you said, but it does not appear to be using the GPU, which is the issue reported. Your log contains: Could not load dynamic library 'cudnn64_8.dll'
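
Because the script prints a result either way, the log is the only place the difference shows. As a rough sketch, a saved log could be classified by searching for the messages quoted in this thread. `gpu_outcome` is a hypothetical helper, and the matched strings are simply the ones that appear in the logs posted here; other TensorFlow versions may word them differently.

```python
import re


def gpu_outcome(log_text):
    """Classify a TensorFlow startup log as 'cpu_fallback', 'gpu' or 'unknown'.

    Also returns the libraries named in "Could not load dynamic library"
    warnings, in order of first appearance and without duplicates.
    """
    missing = re.findall(r"Could not load dynamic library '([^']+)'", log_text)
    missing = list(dict.fromkeys(missing))

    if "Skipping registering GPU devices" in log_text:
        status = "cpu_fallback"
    elif "Created TensorFlow device" in log_text and "device:GPU" in log_text:
        status = "gpu"
    else:
        status = "unknown"
    return status, missing


if __name__ == "__main__":
    sample = (
        "W dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; "
        "dlerror: cudnn64_8.dll not found\n"
        "W gpu_device.cc:1766] Cannot dlopen some GPU libraries.\n"
        "Skipping registering GPU devices...\n"
    )
    print(gpu_outcome(sample))
```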



Saran-nns commented on May 23, 2024

Hi @kormoczi . Thanks for the update.

I updated CUDA and cuDNN, and example.py then ran successfully on the GPU.

Logs:

(tfgpu) N:\fer>python example.py
2021-09-02 17:16:47.723348: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
WARNING:tensorflow:From C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\compat\v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2021-09-02 17:18:06.955330: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2021-09-02 17:18:13.564614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 with Max-Q Design computeCapability: 6.1
coreClock: 1.3415GHz coreCount: 10 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.99GiB/s
2021-09-02 17:18:13.576812: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-09-02 17:18:16.750779: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2021-09-02 17:18:16.756477: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2021-09-02 17:18:17.022092: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2021-09-02 17:18:17.790750: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2021-09-02 17:18:18.722597: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2021-09-02 17:18:19.191502: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2021-09-02 17:18:21.192585: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2021-09-02 17:18:22.231133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-09-02 17:18:23.184233: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-02 17:18:23.578463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 with Max-Q Design computeCapability: 6.1
coreClock: 1.3415GHz coreCount: 10 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.99GiB/s
2021-09-02 17:18:23.592416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-09-02 17:18:47.042383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-02 17:18:47.071966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-09-02 17:18:47.076344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-09-02 17:18:47.330953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4484 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-09-02 17:18:49.850172: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 with Max-Q Design computeCapability: 6.1
coreClock: 1.3415GHz coreCount: 10 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.99GiB/s
2021-09-02 17:18:49.861020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-09-02 17:18:49.865012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-02 17:18:49.870792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-09-02 17:18:49.873996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-09-02 17:18:49.877882: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4484 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\keras\layers\normalization.py:534: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
02-09-2021:17:18:50,945 WARNING [deprecation.py:336] From C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\keras\layers\normalization.py:534: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\keras\engine\training.py:2426: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
2021-09-02 17:18:56.204265: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2021-09-02 17:19:15.136577: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8202
2021-09-02 17:19:48.728937: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2021-09-02 17:19:57.370451: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2021-09-02 17:20:06.030969: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "GeForce GTX 1060 with Max-Q Design" frequency: 1341 num_cores: 10 environment { key: "architecture" value: "6.1" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1572864 shared_memory_size_per_multiprocessor: 98304 memory_size: 4702352179 bandwidth: 192192000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
[{'box': (83, 83, 200, 200), 'emotions': {'angry': 0.0, 'disgust': 0.0, 'fear': 0.0, 'happy': 0.97, 'sad': 0.0, 'surprise': 0.0, 'neutral': 0.03}}]

From your log, it is clear that CUDA couldn't find the cuDNN .dll files.

Please make sure that:

1. You have the right version of cuDNN. I suggest CUDA 11.x with cuDNN 8.x.
2. You have added cuDNN to your system path.
3. You have copied the DLL files from the cuDNN bin directory to the CUDA bin directory, as described in the user guide.
4. You have removed any old versions of CUDA and cuDNN from your system path to avoid path conflicts.
5. You have restarted your system.
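
The path-related items in this checklist can be checked with a short script. This is only a sketch: `locate_on_path` is a hypothetical helper, the DLL names are copied from the "Could not load dynamic library" warnings earlier in the thread, and it assumes the loader finds these libraries through the `PATH` environment variable.

```python
import os

# DLL names taken from the warnings in the log posted earlier in this thread.
REQUIRED_DLLS = [
    "cublas64_11.dll",
    "cublasLt64_11.dll",
    "cusolver64_11.dll",
    "cusparse64_11.dll",
    "cudnn64_8.dll",
]


def locate_on_path(names, path=None):
    """Map each file name to the PATH directories that contain it."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = {name: [] for name in names}
    seen = set()
    for directory in path.split(os.pathsep):
        # Skip empty entries and directories listed more than once.
        if not directory or directory in seen:
            continue
        seen.add(directory)
        for name in names:
            if os.path.isfile(os.path.join(directory, name)):
                hits[name].append(directory)
    return hits


if __name__ == "__main__":
    for name, directories in locate_on_path(REQUIRED_DLLS).items():
        if not directories:
            print(f"{name}: not found on PATH")
        elif len(directories) > 1:
            print(f"{name}: found in several places (possible conflict): {directories}")
        else:
            print(f"{name}: {directories[0]}")
```

A name reported as missing corresponds to items 2 and 3 of the checklist; a name found in several directories points at the path conflict described in item 4.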

Hope this helps


Saran-nns commented on May 23, 2024

@kormoczi Also, 5. Restart your system


Saran-nns commented on May 23, 2024

@kormoczi
Great that it works through Docker.
CUDA 11.x is not packaged with cuBLAS. cuDNN provides this functionality for ML frameworks like TF, PyTorch or Keras to generate (initialize) any cuBLAS handles. Even if you have the right versions installed, the error can still be thrown if the CUDA and cuDNN paths (including those in the Python environment) are not well defined.


Saran-nns commented on May 23, 2024

Thanks for the issue again and hope you enjoy ferr'ing :)

