Comments (5)
Hello @Qeenon ,
Thank you for raising this issue. I tried reproducing the error and observed that the sanitizer reports a leak of 20 bytes when variables are stored on the GPU. The problem does not seem to appear when loading on the CPU.
I tried looping over a sequence of model loading/inference (see gist https://gist.github.com/guillaume-be/76e0d287dc125592e8a2088cc48f7066). No memory leak seems to be visible after ~ 5000 iterations. I have an intuition that the issue could come from the tokenizer loading, and not necessarily from the model itself. Which tokenizer are you loading?
Are you using a GPU for your service? Is there a model in particular for which the memory leak is more severe? Would you be able to share a snippet of code to reproduce the issue?
I will raise the GPU memory leak with the author of the torch bindings and see whether it could originate there. Note that the models are not meant to be reloaded for every query; using lazy_static or the batched_fn crate (see the example at https://github.com/epwalsh/rust-dl-webserver) is the appropriate way to serve such models.
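A minimal sketch of that load-once pattern, using std's OnceLock as a stand-in for lazy_static (the Model type and all names here are hypothetical placeholders for e.g. a rust-bert TranslationModel):

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for an expensive-to-load model; in a real
// service this would be e.g. a rust-bert TranslationModel.
struct Model {
    name: String,
}

impl Model {
    fn load() -> Model {
        // Expensive work (downloading weights, building the graph)
        // would happen here, exactly once.
        println!("loading model (should happen once)");
        Model { name: "marian-en-ru".to_string() }
    }

    fn translate(&self, input: &str) -> String {
        format!("[{}] {}", self.name, input)
    }
}

// Lazily initialized on first use, then shared by every request,
// mirroring what lazy_static! achieves in the linked example.
fn model() -> &'static Model {
    static MODEL: OnceLock<Model> = OnceLock::new();
    MODEL.get_or_init(Model::load)
}

fn main() {
    // Both calls hit the same cached instance; load() runs only once.
    let a = model().translate("hello");
    let b = model().translate("world");
    assert_eq!(a, "[marian-en-ru] hello");
    assert_eq!(b, "[marian-en-ru] world");
    // Both references point at the same static instance.
    assert!(std::ptr::eq(model(), model()));
    println!("ok");
}
```

In a web service, each handler would call `model()` instead of constructing the model itself, so memory use stays constant regardless of the number of queries.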
from rust-bert.
Yet I'm not 100% sure the leaks are related to the models; so far I'm trying to determine whether that is the case.
This file loads the QA / Conversation / Translation models. On this commit they were loaded on demand, and after some time the bot would grow to around 25GB of RAM. I've now changed the code to put those inside lazy_static, and so far it's going okay-ish, but I'd like to let it keep running for some more time to be sure the growth was related to the models.
They were running on CPU (I didn't set up CUDA properly on the host machine).
Translation example (see gist at https://gist.github.com/guillaume-be/60d4a4a61ec16d21478ba497d517a054):
This does not raise any warning using RUSTFLAGS=-Zsanitizer=leak cargo run -Zbuild-std --target x86_64-unknown-linux-gnu translation.
Below I am including the logs from valgrind target/debug/examples/translation --leak-check=full, which seem to indicate that some memory is not freed at exit: https://gist.github.com/guillaume-be/c278ff8c9665264ef901736ea53ab88f
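As an aside on the invocation above: valgrind reads its own options only before the program under test, and anything placed after the binary name is forwarded as arguments to the binary itself. That would explain why the log still suggests rerunning with --leak-check=full. A sketch of the likely intended command:

```shell
# valgrind options must precede the target binary; flags placed after
# the binary are passed to the binary, not to valgrind.
valgrind --leak-check=full target/debug/examples/translation
```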
Some of the warnings seem to be caused by the reqwest library. I tried to rerun a minimal example:
extern crate anyhow;

use rust_bert::resources::{Resource, RemoteResource};
use rust_bert::marian::{MarianModelResources, MarianVocabResources, MarianSpmResources, MarianConfigResources};

fn main() -> anyhow::Result<()> {
    let model_resource = Resource::Remote(RemoteResource::from_pretrained(MarianModelResources::ENGLISH2RUSSIAN));
    let vocab_resource = Resource::Remote(RemoteResource::from_pretrained(MarianVocabResources::ENGLISH2RUSSIAN));
    let merge_resource = Resource::Remote(RemoteResource::from_pretrained(MarianSpmResources::ENGLISH2RUSSIAN));
    let config_resource = Resource::Remote(RemoteResource::from_pretrained(MarianConfigResources::ENGLISH2RUSSIAN));
    let _out1 = model_resource.get_local_path();
    let _out2 = vocab_resource.get_local_path();
    let _out3 = merge_resource.get_local_path();
    let _out4 = config_resource.get_local_path();
    Ok(())
}
valgrind log (minimal example):

==28563== Memcheck, a memory error detector
==28563== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==28563== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==28563== Command: target/debug/examples/resource_download --leak-check=full
==28563==
==28563== Warning: set address range perms: large range [0x4dab000, 0x40fba000) (defined)
==28563== Warning: set address range perms: large range [0x40fba000, 0x51cf0000) (defined)
==28563== Source and destination overlap in memcpy_chk(0x1ffefff5c0, 0x1ffefff5c0, 5)
==28563==    at 0x4843BF0: __memcpy_chk (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==28563==    by 0x44EC0F83: cpuinfo_linux_parse_cpulist (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x44EBFC96: cpuinfo_linux_get_max_possible_processor (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x44EBDFA1: cpuinfo_x86_linux_init (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x51FDF47E: __pthread_once_slow (pthread_once.c:116)
==28563==    by 0x44EBA3B6: cpuinfo_initialize (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x41D1B157: at::native::compute_cpu_capability() (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x41D1B30C: at::native::get_cpu_capability() (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x42706CB8: THFloatVector_startup::THFloatVector_startup() (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x4197ED85: _GLOBAL__sub_I_THVector.cpp (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x4011B89: call_init.part.0 (dl-init.c:72)
==28563==    by 0x4011C90: call_init (dl-init.c:30)
==28563==    by 0x4011C90: _dl_init (dl-init.c:119)
==28563==

[three further "Source and destination overlap in memcpy_chk" warnings with near-identical stacks, entering via cpuinfo_linux_get_max_present_processor, cpuinfo_linux_detect_possible_processors and cpuinfo_linux_detect_present_processors]

==28563== Thread 2 reqwest-internal:
==28563== Syscall param statx(file_name) points to unaddressable byte(s)
==28563==    at 0x522579FE: statx (statx.c:29)
==28563==    by 0xAB4B00: statx (weak.rs:134)
==28563==    by 0xAB4B00: std::sys::unix::fs::try_statx (fs.rs:123)
==28563==    by 0xAB30A7: std::sys::unix::fs::stat (fs.rs:1105)
==28563==    by 0x510D3D: std::fs::metadata (fs.rs:1567)
==28563==    by 0x5126E1: openssl_probe::find_certs_dirs::{{closure}} (lib.rs:31)
==28563==    by 0x5125ED: core::ops::function::impls:: for &mut F>::call_mut (function.rs:269)
==28563==    by 0x510E3E: core::iter::traits::iterator::Iterator::find::check::{{closure}} (iterator.rs:2227)
==28563==    by 0x514733: core::iter::adapters::map::map_try_fold::{{closure}} (map.rs:87)
==28563==    by 0x5116F9: core::iter::traits::iterator::Iterator::try_fold (iterator.rs:1888)
==28563==    by 0x514454: as core::iter::traits::iterator::Iterator>::try_fold (map.rs:113)
==28563==    by 0x514642: core::iter::traits::iterator::Iterator::find (iterator.rs:2231)
==28563==    by 0x510C5C: as core::iter::traits::iterator::Iterator>::next (filter.rs:55)
==28563==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==28563==

[an identical "Syscall param statx(buf) points to unaddressable byte(s)" warning with the same stack]

==28563== HEAP SUMMARY:
==28563==     in use at exit: 1,897,131 bytes in 30,570 blocks
==28563==   total heap usage: 379,202 allocs, 348,632 frees, 54,173,510 bytes allocated
==28563==
==28563== LEAK SUMMARY:
==28563==    definitely lost: 114 bytes in 1 blocks
==28563==    indirectly lost: 0 bytes in 0 blocks
==28563==      possibly lost: 2,120 bytes in 8 blocks
==28563==    still reachable: 1,894,897 bytes in 30,561 blocks
==28563==         suppressed: 0 bytes in 0 blocks
==28563== Rerun with --leak-check=full to see details of leaked memory
==28563==
==28563== For lists of detected and suppressed errors, rerun with: -s
==28563== ERROR SUMMARY: 6 errors from 6 contexts (suppressed: 0 from 0)
I am not quite sure what is going on here, this goes a bit beyond my comfort zone.
@jerry73204 I see you did some troubleshooting on the tch-rs crate related to memory leaks; do you have an idea of what may be happening here?
@proycon, @epwalsh reaching out to you as well for support, if you have some time.
Note I also ran the following script for ~200 iterations and did not notice a significant increase in memory consumption (stable at ~2544MB): https://gist.github.com/guillaume-be/34a982ca33749ba4be2951836ab36b97
I also ran an end-to-end translation example, including reloading the entire model and tokenizer at each iteration (see https://gist.github.com/guillaume-be/06bbc56639522d8745f2d357b310bc17). I ran the script for 20 minutes (500 full model reloads and translations); the memory consumption remained stable at 2560MB. I could run it longer, but I am unlikely to reach a 25GB memory use in a realistic amount of time.
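The memory monitoring used in these experiments can be sketched as follows, reading the process's resident set size from /proc/self/status on Linux (the allocation loop is a hypothetical stand-in for a model reload; `rss_kb` is an illustrative helper, not part of rust-bert):

```rust
use std::fs;

// Read the process's resident set size (in kB) from /proc/self/status.
// Linux-only; returns None if the field cannot be found or parsed.
fn rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|l| l.starts_with("VmRSS:"))?
        .split_whitespace()
        .nth(1)?
        .parse()
        .ok()
}

fn main() {
    let baseline = rss_kb().unwrap_or(0);
    // Stand-in for repeated model reloads: allocate and drop a large buffer.
    for _ in 0..100 {
        let buf = vec![0u8; 10 * 1024 * 1024];
        drop(buf);
    }
    let after = rss_kb().unwrap_or(0);
    // With everything correctly freed, RSS should stay roughly flat
    // between the two readings.
    println!("baseline: {} kB, after: {} kB", baseline, after);
}
```

Sampling RSS before and after the loop, rather than relying on a leak detector, is what distinguishes an actual leak from a false positive such as the one discussed below.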
@Qeenon I ran a few more experiments on my end.
Looking at the valgrind logs, there are two potential sources of leaks, both entirely related to model loading. I have compared the values with experiments running inference and can now rule out memory leaks during prediction.
For the model loading, these seem to appear when registering variables or modules to the variable store. As a validation, I ran a quick experiment: a very basic module creation using the base tch-rs library also raises a memory leak warning in valgrind:
use tch::{nn, Device};

fn main() -> anyhow::Result<()> {
    // Create a variable store on GPU if available, otherwise on CPU
    let device = Device::cuda_if_available();
    let vs = nn::VarStore::new(device);
    // Registering a single linear layer is enough to trigger the warning
    let _module = nn::linear(&vs.root() / "dense", 1024, 1024, Default::default());
    Ok(())
}
Since running this for more than a million iterations does not lead to any actual memory leak when monitoring the resources consumed by the process, I believe this is a spurious error.
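If the warning is indeed a false positive, one way to keep future valgrind runs clean is a suppression file. A sketch under that assumption (the suppression name is arbitrary and the frame patterns would need to be copied from the actual backtrace in the report; `...` matches any number of intermediate frames):

```
{
   tch_varstore_registration_false_positive
   Memcheck:Leak
   match-leak-kinds: possible,reachable
   ...
   obj:*libtorch_cpu.so
}
```

It would then be passed to valgrind with --suppressions=tch.supp so only genuinely new leaks surface in the report.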
Here is a summary of my investigations so far:
- The LeakSanitizer does not return any memory leak error for the models I tested (masked language model and translation)
- Valgrind indicates the potential for a moderate memory leak at model loading (not prediction). This is linked to the registration of variables in the variable store in tch-rs, and does not seem to result in an actual memory leak (false positive)
- I tried running several hundred rounds of model loading followed by one prediction on the translation model and did not see a noticeable increase in memory consumption
Based on the above, I would assume there is no obvious memory leak in the models from the library. Loading the model once and running predictions on demand is indeed the right way of using them - is this working for you?
Thank you for your investigations here.
Right now I really can't be sure about it; next time I will run tests on my side first.