
`pyfakefs/tests/patched_packages_test.py::TestPatchedPackages::test_read_{csv,table}` fail if zstandard is used with the CFFI backend on Python 3.12 (pyfakefs issue, closed)

mgorny commented on June 7, 2024

from pyfakefs.

Comments (21)

mgorny commented on June 7, 2024

Thanks for the ping! I've already added the new version to Gentoo, and of course I forgot to re-enable the test ;-).

sodul commented on June 7, 2024

I don't know whether it is related, but our internal CI is apparently getting OOM-killed on a unit test that uses pyfakefs. Memory usage goes from 1.6 GB with Python 3.11.6 and pyfakefs 5.3.1 to 2.5 GB (at which point the job is OOM-killed) with Python 3.12.0 and pyfakefs 5.3.2. The container runs on EKS and is based on Ubuntu 22.04.

I'm testing several variations to identify the exact test that triggers the memory leak and to see whether the problem is tied to the pyfakefs version.
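To narrow down which test drives memory up, one option is to measure each suspect callable in isolation. A minimal sketch using the standard-library `tracemalloc` module; `peak_traced_memory` is a hypothetical helper, not part of pyfakefs. Note that `tracemalloc` only sees allocations made through Python's allocators, so the process RSS that the OOM killer looks at can be noticeably higher.

```python
import tracemalloc
from typing import Any, Callable


def peak_traced_memory(func: Callable[[], Any]) -> int:
    """Run func and return the peak traced Python-heap usage in bytes.

    Hypothetical helper: tracemalloc tracks allocations made through
    Python's memory allocators only, not the full process RSS.
    """
    tracemalloc.start()  # starts with an empty trace and a zero peak
    try:
        func()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # also clears the collected traces
    return peak


if __name__ == "__main__":
    def allocate() -> None:
        # Roughly 8 MiB of 1 KiB bytes objects, kept alive until the list is dropped.
        data = [bytes(1024) for _ in range(8 * 1024)]
        del data

    print(f"peak: {peak_traced_memory(allocate) / 1024 / 1024:.1f} MiB")
```

Wrapping each test body (or a suspicious fixture setup) this way gives a per-test number that can be compared between Python 3.11 and 3.12.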

sodul commented on June 7, 2024

@mrbean-bremen Unfortunately I don't have much more detail. It is possible that some of the files we read are fairly large: we do have a folder with under 800 kB of YAML data, loaded by legacy code that we know scales inefficiently, but that hardly explains the increase in memory usage we have noticed. If we find more specific details, we'll make sure to post them here.

mrbean-bremen commented on June 7, 2024

Thanks for the report! Did you test this also with other Python versions, or only with 3.12? I assume that pandas and openpyxl had the version as defined in extra_requirements.txt, e.g. 2.1.3 and 3.1.2, is this correct?

mgorny commented on June 7, 2024

> Thanks for the report! Did you test this also with other Python versions, or only with 3.12?

I did, with 3.10, 3.11 and PyPy3.10. The issue seems to be specific to 3.12.

> I assume that pandas and openpyxl had the version as defined in extra_requirements.txt, e.g. 2.1.3 and 3.1.2, is this correct?

Yes, these are the versions I originally used. The "reproducer" above doesn't install openpyxl at all; the issue happens regardless of whether it's installed.

I'm pretty sure it's somehow related to CFFI — site-packages/_cffi_backend.cpython-312-x86_64-linux-gnu.so is the only file featuring the string _IOBase.
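The thread doesn't say how the search was done (a recursive `grep` works just as well), but a byte-level scan for a string such as `_IOBase` across installed files can be sketched in Python. `files_containing` is a hypothetical helper; it reads every file fully, which is acceptable for a site-packages directory but not for very large trees.

```python
import os
from pathlib import Path
from typing import Iterator


def files_containing(root: str, needle: bytes) -> Iterator[Path]:
    """Yield regular files under root whose raw bytes contain needle.

    Hypothetical helper: each file is read completely into memory.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if needle in path.read_bytes():
                    yield path
            except OSError:
                # Unreadable file, broken symlink, etc.: skip it.
                continue
```

For example, `files_containing(sysconfig.get_paths()["platlib"], b"_IOBase")` scans the directory where the active environment installs compiled extension modules.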

mgorny commented on June 7, 2024

FWICS cffi expects to be able to import _io._IOBase here:

https://github.com/python-cffi/cffi/blob/49127c6929bfc7186fbfd3819dd5e058ad888de4/src/c/file_emulator.h#L6-L17

It seems that for some reason pyfakefs intercepts that.
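In Python terms, the linked C code appears to boil down to importing `_io` and fetching its `_IOBase` attribute. The sketch below (the helper name `load_io_base` is illustrative) performs that lookup and shows that, on CPython, `_io._IOBase` is a different class from the public `io.IOBase`, which subclasses it. Anything standing in for `_io` therefore has to provide `_IOBase` itself.

```python
import importlib
import io


def load_io_base() -> type:
    """Python rendering of the lookup the cffi file emulator appears to do in C:
    import the `_io` module and fetch its private `_IOBase` class.

    If whatever the import returns lacks `_IOBase`, this raises AttributeError.
    """
    io_module = importlib.import_module("_io")
    return getattr(io_module, "_IOBase")


IOBase_c = load_io_base()

# The public io.IOBase is a Python-level ABC that subclasses the C base class.
print(IOBase_c is io.IOBase)               # False
print(issubclass(io.IOBase, IOBase_c))     # True
print(isinstance(io.BytesIO(), IOBase_c))  # True
```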

mgorny commented on June 7, 2024

Oh, that's probably because pyfakefs overrides _io in 3.12 but not earlier versions:

```python
if IS_PYPY or sys.version_info >= (3, 12):
    # in PyPy and later cpython versions, the module is referenced as _io
    self._fake_module_classes["_io"] = fake_io.FakeIoModule
```

mgorny commented on June 7, 2024

FWICS, all base classes in the _io module are prefixed with _, so pyfakefs probably needs to mirror that:

```
>>> _io._
_io._BufferedIOBase()  _io._BytesIOBuffer()   _io._IOBase()          _io._RawIOBase()       _io._TextIOBase()
```

mrbean-bremen commented on June 7, 2024

Ah ok - thank you, that makes sense! Is there any possibility to reproduce this in a docker image which uses cffi? I don't know cffi (or gentoo, for that matter), so I'm not sure how I would test that...

mrbean-bremen commented on June 7, 2024

OK, thinking about this, I should probably use a separate fake wrapper class for _io rather than the same one as for io; sharing it currently probably causes _IOBase to be looked up in io instead of _io. I had the impression that _io is just an alias for io, but obviously this is not entirely true.

mrbean-bremen commented on June 7, 2024

I will put something together a bit later today.

mgorny commented on June 7, 2024

> Ah ok - thank you, that makes sense! Is there any possibility to reproduce this in a docker image which uses cffi? I don't know cffi (or gentoo, for that matter), so I'm not sure how I would test that...

I think so. The reproducer I gave above should work in a Python 3.12 venv anywhere.

mrbean-bremen commented on June 7, 2024

Yes, sorry, I had misread the issue (for some reason I thought that a system package was involved).
It is even reproducible under Windows.

mrbean-bremen commented on June 7, 2024

@mgorny - shall be fixed in main now, please check!
Do you need a patch release?

mgorny commented on June 7, 2024

> @mgorny - shall be fixed in main now, please check!

It's green now. Thanks!

> Do you need a patch release?

I don't think there's a need for an urgent release. This problem is unlikely to affect many people in the wild, and I've just deselected the two tests in the current Gentoo ebuild.

mrbean-bremen commented on June 7, 2024

Ok, in this case I'll wait until something else comes up, thanks!

mrbean-bremen commented on June 7, 2024

FYI: A new patch release is now out.

mrbean-bremen commented on June 7, 2024

If it is indeed related to pyfakefs, it most likely has to do with Python 3.12. There are a few changes in patching (mostly pathlib and io) specific to Python 3.12, though I have no idea how these could cause the increased memory usage. It is unlikely that the change appeared in pyfakefs 5.3.2 (this can easily be verified by running 5.3.1 with Python 3.12).
It could also be that the change comes from Python 3.12 itself and is unrelated to pyfakefs, though I don't know how best to test this...

mrbean-bremen commented on June 7, 2024

@sodul - any luck finding the cause of the problem or a reproducible example? If so, please open a new issue for it.

sodul commented on June 7, 2024

@mrbean-bremen I could not get a true root cause unfortunately.

One of the issues was that we use EKS (Kubernetes in AWS), and our nodes switched from cgroup v1 to v2, which meant we ended up with much higher parallelism than before, since v2 does not expose the container's CPU limits. We hardcoded the parallelism to a reasonable level; memory usage is still higher than before, but it is sufficiently under control that the jobs are no longer OOM-killed.
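As an alternative to hardcoding the worker count, the CPU quota can sometimes be read from the container's cgroup directly. A minimal sketch, assuming a cgroup v2 unified hierarchy is mounted at `/sys/fs/cgroup` inside the container (the path and the helper names are assumptions, not something from this thread). In cgroup v2, `cpu.max` holds `<quota> <period>` in microseconds, or `max <period>` when no quota is set. The fallback is `os.cpu_count()`, which reports the CPUs of the whole machine rather than the container's quota.

```python
import math
import os
from pathlib import Path
from typing import Optional


def cpu_limit_from_cpu_max(text: str) -> Optional[int]:
    """Parse the contents of a cgroup v2 `cpu.max` file.

    Returns the quota rounded up to whole CPUs (at least 1),
    or None when the quota is "max" (unlimited).
    """
    quota, period = text.split()[:2]
    if quota == "max":
        return None
    return max(1, math.ceil(int(quota) / int(period)))


def effective_parallelism(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> int:
    """Prefer the container's CPU quota; fall back to os.cpu_count()."""
    try:
        limit = cpu_limit_from_cpu_max(Path(cpu_max_path).read_text())
    except (OSError, ValueError):
        # File missing (e.g. cgroup v1 or not Linux) or unexpected contents.
        limit = None
    return limit or os.cpu_count() or 1
```

The result could then be passed to the test runner's worker-count option instead of a fixed number.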

mrbean-bremen commented on June 7, 2024

One thing I could think of is mapping large files into the fake filesystem and reading them. They are read into memory on first access and stay there as long as the fake filesystem instance lives. I'm not sure whether you are doing something like this, but I can't think of anything else pyfakefs-related that would use noticeable amounts of RAM.
