
Comments (8)

damianwadley avatar damianwadley commented on May 29, 2024

I'm not going to profess to be an expert in PHP's memory management, but here's some things that stand out to me:

  • The "cache MB" is going to be lower than actual, as it's not counting the overhead of managing the strings and array and whatever else. Not much lower, of course, but it won't be the full picture of how much memory is being spent on your cache. That said, it's probably close enough to accurate that you could use it for rough estimates about how much memory the cache is using itself in relation to overall usage, and adjust its freeing operations accordingly.
  • The "leak %" is not an accurate label: (1) memory_get_usage([false]) reports memory in use while memory_get_usage(true) reports memory that's been allocated for potential use, and (2) you've forgotten to add a * 100. So a value of 1.97 means that PHP has claimed 97% more memory than it's currently using. Which is not necessarily a bad thing - especially if you're doing a lot of arbitrary string creation - and certainly doesn't indicate a leak.
  • As such, your (example) cache currently bases its calculations on the memory in use, but PHP's memory limitations are going to be governed by how much it tries to allocate. So if you're in a situation where you have that 97% overcommit and PHP somehow needs more than that for its next operation, you're going to be mistaken by some 49% about what's actually happening. So you should be using memory_get_usage(true) when determining if your cache needs to release any memory; that also explains why switching to using it fixes the problem.
  • Note that gc_collect_cycles is not a "manually invoke the GC" function (which is something virtually nobody should ever need to do, as PHP is eager about doing that itself). It "collects cycles" - which I don't see happening in your code, so that's why there's no difference when you add calls to it.
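As a quick illustration of the first two points, here's a minimal sketch (variable names are mine) of how the two modes of memory_get_usage differ and how the percentage should actually be computed:

```php
<?php
// Illustrative sketch: memory in use vs. memory allocated, and the
// overcommit percentage with the missing "* 100" factor included.

$used      = memory_get_usage();      // bytes PHP is actively using
$allocated = memory_get_usage(true);  // bytes the Zend MM has reserved from the OS

// Overcommit relative to in-use memory, as a percentage.
// A ratio of 1.97 corresponds to 97% here, not "1.97%".
$overcommitPct = ($allocated / $used - 1) * 100;

printf("in use: %.2f MB, allocated: %.2f MB, overcommit: %.1f%%\n",
    $used / 1048576, $allocated / 1048576, $overcommitPct);
```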

Something to keep in mind that may be relevant here, since we're talking about strings and memory usage, though I'm not sure it's actually a factor in your examples: concatenation of long strings hurts. That's typical for languages with managed strings. Functions like str_repeat are smart about allocation, but $long_string_1 . $long_string_2 isn't going to be nice on memory. That said, IIRC PHP does have some optimizations around what happens with multiple concatenations, so it's not entirely doom and gloom.
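To illustrate (sizes are arbitrary): $a . $b has to allocate a brand-new string large enough for both operands while $a and $b are still alive, so for a moment all three strings exist at once:

```php
<?php
// Sketch: concatenation allocates a new string the size of both operands,
// so the old strings plus the result briefly coexist in memory.

$a = str_repeat('x', 4 * 1048576);  // 4 MB
$b = str_repeat('y', 4 * 1048576);  // 4 MB
$before = memory_get_usage(true);

$c = $a . $b;  // a new ~8 MB string, while $a and $b are still live

printf("extra allocated for the concatenation: %.1f MB\n",
    (memory_get_usage(true) - $before) / 1048576);
```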

Can you be more specific about precisely what sort of behavior you're reporting as the bug, and what sort of fix you might have in mind for it?

from php-src.

iluuu1994 avatar iluuu1994 commented on May 29, 2024

You can read more about how allocations work in PHP here:
https://www.phpinternalsbook.com/php7/memory_management/zend_memory_manager.html#zendmm-internal-design

The reason strings >=2MB (2MB - 4KB, to be exact) behave differently for you is because this is the "huge" allocation threshold. Huge allocations are allocated directly using mmap, rather than being part of a chunk, which is a container of multiple allocations. Now, one chunk is made up of many different allocations, and it can only be released when the entire chunk becomes free. When a single allocation of the chunk is still needed, the entire 2MB chunk must be kept around.

This model generally works well for PHP because it's very fast, and the entire memory pool gets cleared when the request ends, avoiding fragmentation. However, this means that some chunks may stick around if you allocate a lot of memory and free it again.

You could try calling gc_mem_caches() (gc_collect_cycles() has no effect, since the strings can't be involved in cycles), which will free any chunks that are actually free.
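For illustration, a rough sketch of that pattern (sizes are arbitrary; exact numbers will vary by PHP version and platform):

```php
<?php
// Sketch: fill memory with many sub-2MB strings (large, chunk-backed
// allocations), free them, then release the now-empty cached chunks.

$blobs = [];
for ($i = 0; $i < 50; $i++) {
    $blobs[] = str_repeat('x', 1048576); // 1 MB each, below the "huge" threshold
}
printf("allocated: %.1f MB\n", memory_get_usage(true) / 1048576);

$blobs = [];               // drop the strings; chunks stay cached by the MM
$freed = gc_mem_caches();  // returns the number of bytes freed to the system
printf("freed: %.1f MB, allocated now: %.1f MB\n",
    $freed / 1048576, memory_get_usage(true) / 1048576);
```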


martinhoch42 avatar martinhoch42 commented on May 29, 2024

@iluuu1994, thank you for the explanation and the link. This explains the behavior I am seeing with the scripts above.

Apparently, while trying to break down my problem into something reproducible, I just found a good way to abuse the memory management. Apologies for that.

Am I correct to assume that if a large allocation uses part of a chunk, the remaining pages of that chunk are still free to be used for other allocations?

My issue is probably caused by many large allocations being "recycled": PHP struggles to place them into free pages on existing chunks, so it has to create new chunks too often while keeping the existing chunks around for other allocations. I guess I just have an unfortunate use case for PHP's memory management.

I doubt that gc_mem_caches is going to change anything, but I will check it out.

@damianwadley, thanks for the input. I tried using memory_get_usage(true), but that resulted in my "real-world script" just constantly purging the cache.

but $long_string_1 . $long_string_2 isn't going to be nice on memory.

Yes, this is indeed something I noticed during the investigation and might be part of my problem.

Maybe a Documentation Change for memory_get_usage

I am not sure if this is something the PHP docs want to do, but linking to the "Zend Memory Manager" page from memory_get_usage could have prevented me from opening this issue.

I already tried checking the internals book, but I ended up on the "zvals memory management" page, which was obviously not related to the issue, and I moved on before seeing the other page.

When reading "Zend Memory Manager," I also understood what "including unused pages" in the description of real_usage for memory_get_usage actually means.

I am closing this issue. If you think adding something to memory_get_usage is worth consideration, I can open an issue on the docs GitHub. (I am not sure if this is worth pursuing.)


iluuu1994 avatar iluuu1994 commented on May 29, 2024

Am I correct to assume that if a large allocation uses part of a chunk, the remaining pages of a chunk are still free to be used for other allocations?

Right. Large allocations are rounded up to a number of pages dedicated to the allocation. The rest of the pages in that chunk can be used for other small or large allocations. This does mean that a large allocation may waste up to 4KB - 1 bytes of memory.

Probably worse, allocations just above 1MB may waste ~50% of memory, because each such allocation requires a new chunk: the existing chunks, each already occupying >1MB, don't hold enough free pages for the new allocation. This space is not wasted per se, as it can still be used for other allocations. Typically, in a real-world program, you'd have a mix of large and small allocations.
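A rough sketch of that worst case (sizes chosen to sit just above half a chunk; the exact overhead varies by PHP version):

```php
<?php
// Sketch: allocations just over 1 MB cannot share a 2 MB chunk with each
// other, so each one ends up consuming a whole chunk and the real memory
// footprint can be roughly double the payload.

$before = memory_get_usage(true);
$blobs = [];
for ($i = 0; $i < 20; $i++) {
    $blobs[] = str_repeat('x', 1048576 + 4096); // a bit over 1 MB each
}
$real    = memory_get_usage(true) - $before;
$payload = 20 * (1048576 + 4096);

printf("payload: %.1f MB, real: %.1f MB, overhead: %.0f%%\n",
    $payload / 1048576, $real / 1048576, ($real / $payload - 1) * 100);
```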

We may try decreasing the huge allocation threshold. However, that could lead to more mmap calls, which may negatively impact performance.


martinhoch42 avatar martinhoch42 commented on May 29, 2024

Typically, in a real-world program, you'd have a mix of large and small allocations.

In theory, this should be the same for my case, but shrug.


iluuu1994 avatar iluuu1994 commented on May 29, 2024

I have thought about promoting large allocations to huge if no chunks are free, and the size is >1MB. However, I'm not sure how much it would actually help.

Even though memory_get_usage(true) may report a certain amount of memory, it doesn't mean this memory is actually reserved on your hardware. The number reported by memory_get_usage(true) is virtual memory, and computers nowadays are very good at only occupying the memory that is actually being used.

To be a bit more precise, when PHP allocates the 2MB chunk using mmap, the OS reserves a 2MB address range for this purpose. However, none of that range is backed by physical memory yet. Once one of the addresses in this range is accessed, the hardware triggers a page fault, which is when the OS actually goes searching for free memory in your RAM. That effectively means that, even if your huge allocation only occupies part of the chunk and the rest of the chunk is "wasted", it's not actually wasted: since it was never accessed or written to, it is never mapped to physical RAM.
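A Linux-only sketch of that distinction (assumption: /proc/self/status is readable; the two numbers will generally differ, and how much depends on platform and workload):

```php
<?php
// Sketch: compare what the Zend MM has reserved (virtual address space)
// with what the kernel has actually mapped into RAM for this process.

$blobs = [];
for ($i = 0; $i < 20; $i++) {
    $blobs[] = str_repeat('x', 3 * 1048576); // huge allocations, mmap-backed
}

$status = @file_get_contents('/proc/self/status');
if ($status !== false && preg_match('/VmRSS:\s+(\d+) kB/', $status, $m)) {
    printf("Zend MM reserved: %.1f MB, process resident: %.1f MB\n",
        memory_get_usage(true) / 1048576, $m[1] / 1024);
} else {
    echo "VmRSS not available on this platform\n";
}
```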

Apart from being confusing when looking at memory_get_usage, the only other place where this really matters is memory_limit, as you have noted in this issue, where the limit can be reached 50% prematurely. I'm not sure if there's a great way to mitigate this.


martinhoch42 avatar martinhoch42 commented on May 29, 2024

I don't think your argument about the memory not actually being lost to the system holds, since these inefficiencies, as far as I understand, should only matter in edge cases. Nobody cares whether a process working on large data with a couple hundred MB of overhead is actually occupying that extra RAM. They care about such processes not aborting by exceeding memory_limit prematurely, while keeping memory_limit at something reasonable so that other problems don't impact normal operations.

That being said, I am not arguing against moving >1MB to huge.

As for my original problem, calling gc_mem_caches after purging my internal cache actually did solve the problem.

Reading zend_mm_alloc_pages and zend_mm_gc, I think that my problem is that some chunks are still used because the pages are part of bins (small allocations) and once those are checked in zend_mm_gc, the chunks are freed.

So, theoretically, my problem is a mix between large allocations and small allocations? Where the large allocations cause new chunks to be allocated from the system while the small allocations keep the chunk "alive" for too long? This would match what my script is doing.
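If it helps, here's a rough sketch of that suspected pattern (sizes are my guesses at the workload, and whether the small allocations actually land in the freshly created chunks depends on the allocator's chunk ordering, so the effect may or may not reproduce):

```php
<?php
// Sketch of the suspected mix: large allocations force new chunks, while
// long-lived small allocations can spill into them and pin the chunks.

$small = [];
$large = [];
for ($i = 0; $i < 20; $i++) {
    $large[] = str_repeat('L', 1900 * 1024);  // nearly fills a fresh chunk
    $small[] = str_repeat('s', 64);           // small, long-lived allocation
}
printf("allocated with both alive: %.1f MB\n", memory_get_usage(true) / 1048576);

$large = [];  // free the large allocations; chunks stay cached by the MM
gc_mem_caches();  // releases the truly empty chunks; chunks still holding
                  // a live small allocation cannot be returned to the OS
printf("after gc_mem_caches: %.1f MB\n", memory_get_usage(true) / 1048576);
```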


iluuu1994 avatar iluuu1994 commented on May 29, 2024

Right, so that's where this idea came from:

I have thought about promoting large allocations to huge if no chunks are free, and the size is >1MB.

When you're creating many allocations that are just bigger than half of a chunk, each allocation will require a new chunk because one chunk cannot hold two such allocations, so you'll reach memory_limit 50% too early.

With this idea, such an allocation would instead be mmapped, rather than going to a new chunk. Thus, it could occupy 1MB instead of 2, and be released independently. However, this solution is not complete, because the same problem exists for allocations of chunk_size / 3 + 1; you'd just waste 33% instead of 50%.

As for my original problem, calling gc_mem_caches after purging my internal cache actually did solve the problem.

Reading zend_mm_alloc_pages and zend_mm_gc, I think that my problem is that some chunks are still used because the pages are part of bins (small allocations) and once those are checked in zend_mm_gc, the chunks are freed.

New chunks are added to the tail of the chunk list, which means that new small allocations will go to earlier chunks first, if they fit. Given that the half-filled chunks are at the tail of the chunk list, they should remain half-empty, unless you create a ton of small allocations that fill up the MBs worth of half-empty chunks.

However, when mixing allocations that are almost 2MB with small allocations, the large allocation will create a new chunk that's almost full, with small allocations spilling into the next chunk very quickly. If the small allocations stay around, this would lead to many, almost completely empty chunks. This may be mitigated by lowering the max large size, but then you're still susceptible to the same thing with two large allocations that together make up almost 2MB. It seems there's always a way to get into this situation. Lowering the thresholds can just reduce the severity.

while the small allocations keep the chunk "alive" for too long?

Chunks don't get removed unless you're running into a memory_limit, or when calling gc_mem_caches explicitly. There should generally not be a reason to call gc_mem_caches, other than to release memory for other processes after freeing a bunch of it in a long running process. If your chunks are free, they are completely unoccupied. PHP cannot move allocations, as this would invalidate pointers.

