wencolani / CrossE
Interaction Embeddings for Prediction and Explanation in Knowledge Graphs (WSDM'2019)
Hi, I am training CrossE on FB15K and I am encountering some problems with OOM errors.
I am using a Tesla P100 GPU with 16GB.
The issue appears to be tied to the batch size: it only occurs when the batch size is greater than roughly 2500.
With a batch size of 2000 it works (it still raises a few OOM errors, but since training does not stop, I assume TensorFlow handles the situation under the hood):
totalMemory: 15.90GiB freeMemory: 3.12GiB
2019-07-31 12:26:55.290114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-07-31 12:26:55.291419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-31 12:26:55.291432: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-07-31 12:26:55.291437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-07-31 12:26:55.291676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7814 MB memory) -> physical GPU (device: 0, name: Tesla P100-SXM2-16GB, pci bus id: 0000:05:00.0, compute capability: 6.0)
initializing raw training data...
raw training data initialized.
2019-07-31 12:26:59.036369: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 7.63G (8194432512 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-07-31 12:26:59.037461: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 6.87G (7374989312 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-07-31 12:26:59.038501: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 6.18G (6637490176 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-07-31 12:26:59.039536: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 5.56G (5973741056 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-07-31 12:26:59.040558: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 5.01G (5376366592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-07-31 12:26:59.041597: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 4.51G (4838729728 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-07-31 12:26:59.042615: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 4.06G (4354856448 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-07-31 12:26:59.043631: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 3.65G (3919370752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-07-31 12:26:59.044693: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 3.29G (3527433472 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-07-31 12:26:59.128179: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
[100 sec](186000/483142) : 0.38 -- loss : 10.61627 rloss: 0.00032
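A side note on the log above, offered as a reading rather than a diagnosis: each failed request is roughly 90% of the previous one. That pattern is consistent with the allocator retrying with progressively smaller sizes until a request succeeds, which would explain why training continues after the errors. The sketch below only redoes that arithmetic on the byte counts copied from the log; the helper names `failed_allocations` and `retry_ratios` are made up for illustration and are not part of TensorFlow or CrossE.

```python
import re

# Byte counts copied verbatim from the "failed to allocate" lines in the log.
LOG = """
failed to allocate 7.63G (8194432512 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
failed to allocate 6.87G (7374989312 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
failed to allocate 6.18G (6637490176 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
failed to allocate 5.56G (5973741056 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
failed to allocate 5.01G (5376366592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
failed to allocate 4.51G (4838729728 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
failed to allocate 4.06G (4354856448 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
failed to allocate 3.65G (3919370752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
failed to allocate 3.29G (3527433472 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
"""


def failed_allocations(log_text):
    """Return the byte count of every 'failed to allocate' line, in order."""
    pattern = r"failed to allocate \S+ \((\d+) bytes\)"
    return [int(n) for n in re.findall(pattern, log_text)]


def retry_ratios(sizes):
    """Ratio of each request to the one before it, rounded to 3 decimals."""
    return [round(b / a, 3) for a, b in zip(sizes, sizes[1:])]


sizes = failed_allocations(LOG)
ratios = retry_ratios(sizes)
print(len(sizes), "failed requests; successive ratios:", ratios)
```

The last failed request is about 3.29G, so a slightly smaller request presumably succeeded right after it, although the log excerpt does not show that step.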
The weird thing is, using nvidia-smi I observe that CrossE seems to allocate almost all the memory in my GPU (15278MiB), but only a small amount is actually used (2479MiB).
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-SXM2...  Off  | 00000000:05:00.0 Off |                    0 |
| N/A   36C    P0    41W / 300W |  15278MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2748      C   python3                                     2479MiB |
+-----------------------------------------------------------------------------+
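The gap between the two nvidia-smi figures can be put in numbers. This is a minimal sketch that only subtracts values quoted in this post (the nvidia-smi memory figures and the `totalMemory` / `freeMemory` line from the TensorFlow startup log); it does not query the GPU, and the function name `unattributed_mib` is a hypothetical helper for illustration.

```python
MIB_PER_GIB = 1024


def unattributed_mib(device_used_mib, per_process_mib):
    """Memory in use on the device that no listed process accounts for."""
    return device_used_mib - sum(per_process_mib)


# Figures copied from the nvidia-smi output above.
device_used_mib = 15278      # "15278MiB / 16280MiB"
per_process_mib = [2479]     # the single listed python3 process

gap_mib = unattributed_mib(device_used_mib, per_process_mib)
gap_gib = round(gap_mib / MIB_PER_GIB, 2)

# Figures copied from the TensorFlow startup line:
# "totalMemory: 15.90GiB freeMemory: 3.12GiB"
in_use_at_startup_gib = round(15.90 - 3.12, 2)

print(f"not attributed to python3: {gap_mib} MiB (~{gap_gib} GiB)")
print(f"already in use when TensorFlow started: {in_use_at_startup_gib} GiB")
```

The two results (about 12.5 GiB and 12.78 GiB) are close. One possible reading, which is an assumption the output does not confirm, is that most of the device memory was already held by something other than the listed python3 process when training started.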
Unfortunately, I cannot simply use a batch size of 2000: I need to replicate your results, and I suspect a smaller batch size will lead to worse performance.
What environment did you use to train the model on FB15K (OS, TensorFlow version, CUDA drivers, GPU)?
Hello, I'm Minho Lee
First of all, I have read your 2019 CrossE paper; thank you for it.
However, when I went through the GitHub repository you uploaded, I could not find the code for the explanation-generation part.
If you don't mind, could you upload the code that generates the explanations?
Thank you.
Hello, I'm studying your research and trying to extend your code for my Master's thesis.
I'm not able to run the code locally due to RAM limits (16 GB). Could you provide the minimum and recommended hardware requirements for running the project?
What configuration are you using?
What is the approximate running time?
Thank you for your attention
Looking forward to your reply~
@wencolani So I just used the training file and trained the model using triples ( A , B --> C/D -> D/F )
I have two questions regarding the trained model:
Waiting for your reply
Thank you !!
Hi, could you help me with some questions about your code?
In your paper, there is an experiment on generating explanations.
Is this experiment included in your code? If so, where is it?
Thank you for your attention.
Wishing you happiness every day.