Comments (21)
Thanks for the ping! I've already added the new version to Gentoo, and ofc I forgot to reenable the test ;-).
from pyfakefs.
I do not know if it is related but our internal CI is getting OOM killed apparently on a unittest that uses pyfakefs. The memory goes from 1.6GB usage with python 3.11.6 and pyfakefs 5.3.1 to 2.5GB (and OOM killed) with python 3.12.0 and pyfakefs 5.3.2. The container runs on EKS and is based off Ubuntu 22.04.
I'm testing several variations to identify the exact test that triggers the memory leak and to see whether the problem is with the pyfakefs version.
@mrbean-bremen Unfortunately I do not really have many more details. It is possible that some of the files we read are somewhat large: we do have a folder with under 800kB of YAML data, loaded by some legacy code that we know scales inefficiently, but that hardly explains the increase in memory usage we have noticed. If we get more specific details, we'll make sure to post them here.
Thanks for the report! Did you test this also with other Python versions, or only with 3.12? I assume that `pandas` and `openpyxl` had the version as defined in `extra_requirements.txt`, e.g. 2.1.3 and 3.1.2, is this correct?
> Thanks for the report! Did you test this also with other Python versions, or only with 3.12?
I did with 3.10, 3.11 and PyPy3.10. The issue seems to be specific to 3.12.
> I assume that `pandas` and `openpyxl` had the version as defined in `extra_requirements.txt`, e.g. 2.1.3 and 3.1.2, is this correct?
Yes, these are the versions I originally used. The "reproducer" above doesn't install openpyxl at all; the issue happens regardless of whether it's installed or not.
I'm pretty sure it's somehow related to CFFI: `site-packages/_cffi_backend.cpython-312-x86_64-linux-gnu.so` is the only file featuring the string `_IOBase`.
FWICS cffi expects to be able to import `_io._IOBase` here:
It seems that for some reason `pyfakefs` intercepts that.
Oh, that's probably because pyfakefs overrides `_io` in 3.12 but not in earlier versions:
pyfakefs/pyfakefs/fake_filesystem_unittest.py
Lines 657 to 659 in 8c7a99c
FWICS all base classes are prefixed with `_` in the `_io` module, so pyfakefs probably needs to mirror that:
>>> _io._
_io._BufferedIOBase() _io._BytesIOBuffer() _io._IOBase() _io._RawIOBase() _io._TextIOBase()
Ah ok - thank you, that makes sense! Is there any possibility to reproduce this in a docker image which uses cffi? I don't know cffi (or gentoo, for that matter), so I'm not sure how I would test that...
Ok, thinking about this, I probably should just use a separate fake wrapper class for `_io`, not the same one as for `io`, as this currently probably leads to it looking up `_IOBase` in `io` instead of `_io`. I had the impression that `_io` is just an alias for `io`, but obviously this is not completely true.
I will put something together a bit later today.
> Ah ok - thank you, that makes sense! Is there any possibility to reproduce this in a docker image which uses cffi? I don't know cffi (or gentoo, for that matter), so I'm not sure how I would test that...
I think so. The reproducer I gave above should work in a Python 3.12 venv anywhere.
Yes, sorry - I had misread the issue (for some reason thought that a system package was involved).
It is even reproducible under Windows.
@mgorny - shall be fixed in main now, please check!
Do you need a patch release?
> @mgorny - shall be fixed in main now, please check!
It's green now. Thanks!
> Do you need a patch release?
I don't think there's a need for an urgent release. I don't think this problem is likely to affect many people in the wild, and I've just deselected the two tests from the current version of Gentoo ebuild.
Ok, in this case I'll wait until something else comes up, thanks!
FYI: A new patch release is now out.
If it is indeed related to pyfakefs, it should have to do with Python 3.12. There are a few changes in patching (mostly `pathlib` and `io`) specifically for Python 3.12, though I have no idea how these could cause that increased memory usage. It is unlikely that the change appeared in pyfakefs 5.3.2 (this can easily be verified by using 5.3.1 with Python 3.12).
It could also be that the change has to do with Python 3.12 unrelated to pyfakefs, though I don't know how best to test this...
@sodul - any luck with finding the cause of the problem or a reproducible example? If yes, please open a new issue for this.
@mrbean-bremen I could not get a true root cause unfortunately.
One of the issues was that we use EKS (Kubernetes in AWS) and our running nodes switched from cgroup v1 to v2. As a result we ended up with much larger parallelism than before, since v2 does not expose the CPU limits of the container. We hardcoded the parallelism to a reasonable amount. The memory usage is still larger than before, but sufficiently under control that we no longer get OOM killed.
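For anyone hitting the same parallelism problem, a rough sketch of capping the worker count follows. It rests on assumptions the thread does not confirm: that the container's cgroup v2 CPU limit is readable at `/sys/fs/cgroup/cpu.max` in the usual `"<quota> <period>"` format, and that a fixed fallback is acceptable otherwise. `parse_cpu_max` and `worker_count` are made-up helper names.

```python
import os
from pathlib import Path


def parse_cpu_max(text):
    """Return the CPU limit in cores, or None if the quota is 'max' (unlimited)."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def worker_count(cpu_max_path="/sys/fs/cgroup/cpu.max", fallback=4):
    """Pick a worker count from the cgroup limit, else a hardcoded cap."""
    try:
        limit = parse_cpu_max(Path(cpu_max_path).read_text())
    except (OSError, ValueError):
        # File missing (e.g. cgroup v1 or non-Linux) or unexpected format.
        limit = None
    if limit is None:
        # Assumed policy: never exceed the hardcoded fallback.
        return min(fallback, os.cpu_count() or 1)
    return max(1, int(limit))


print(worker_count())
```

The fallback mirrors the "hardcode a reasonable amount" approach described above; the value 4 is arbitrary.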
One thing I could think of is if you are mapping large files into the fake system and reading them. They will be read into memory at first access and stay there as long as the fake fs instance lives. Not sure if you are doing something like this, but I can't think of anything else pyfakefs-related that would use noticeable amounts of RAM.