Comments (13)

jpd236 commented on August 29, 2024

Please see #11 (comment) for how to use a snapshot build if you'd like to pick up this fix. We haven't done an official release yet as there are a few more fixes I'd like to get in before we do.

from volley.

joebowbeer commented on August 29, 2024

More about this bug:

Almost all of the OOME failures happen while reading the string-string map of headers during initialize.

Below is a typical stack trace, from https://code.google.com/p/android/issues/detail?id=230219

Exception java.lang.OutOfMemoryError:
  com.android.volley.toolbox.DiskBasedCache.streamToBytes (DiskBasedCache.java:323)
  com.android.volley.toolbox.DiskBasedCache.readString (DiskBasedCache.java:541)
  com.android.volley.toolbox.DiskBasedCache.readStringStringMap (DiskBasedCache.java:563)
  com.android.volley.toolbox.DiskBasedCache$CacheHeader.readHeader (DiskBasedCache.java:404)
  com.android.volley.toolbox.DiskBasedCache.initialize (DiskBasedCache.java:157)
  com.android.volley.CacheDispatcher.run (CacheDispatcher.java:84)

There are also a few rare instances where OOME is thrown from this same location in DiskBasedCache, but on behalf of a get request. I assume these are thrown when an app writes a bad entry to the cache then later tries to retrieve the entry from the cache during the same execution, before the cache has been re-initialized.

A bad entry may also manifest as a NegativeArraySizeException (NASE) when trying to get the entry. These exceptions are relatively rare and occur after the header map has been read. They result when the excessively large number of bytes read is subtracted from the file length and then used as an array size. If this value is negative, NASE is thrown. (Theoretically it's possible that a bad entry could go undetected.)
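The subtraction described above can be sketched with made-up sizes (the numbers and the helper name here are illustrative, not Volley's actual fields):

```java
// Sketch: a corrupt length field inflates the byte count already "read";
// subtracting it from the file length yields a negative array size, and
// allocating an array of that size throws NegativeArraySizeException.
public class NaseDemo {
    static int remainingBytes(long fileLength, long bytesRead) {
        return (int) (fileLength - bytesRead);
    }

    public static void main(String[] args) {
        int remaining = remainingBytes(1024, 2048); // -1024 with these made-up sizes
        try {
            byte[] data = new byte[remaining];
            System.out.println("allocated " + data.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException: " + e.getMessage());
        }
    }
}
```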

A nasty property of these exceptions is that they persist. Once the problem occurs, it will occur every time the cache is initialized. This is why some additional catch-all exception handling is needed: preferably per entry, serving to remove the errant entry if an exception is thrown when reading or writing it. Otherwise, something else is needed in order to prevent these bad cache entries from crashing the app. One developer recommends the use of an uncaught exception handler that will clean the entire cache in the event of an uncaught OOME or NASE. See https://github.com/anyercastillo/volley-crash
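A minimal sketch of that catch-all approach, assuming the cache directory is known; the names here are illustrative, not the linked project's actual code:

```java
import java.io.File;

public class CacheNuke {
    // Delete every file in the cache directory; intended to be called from an
    // uncaught-exception handler when an OOME or NASE escapes the dispatcher.
    static void clearCache(File cacheDir) {
        File[] files = cacheDir.listFiles();
        if (files == null) return; // not a directory, or I/O error
        for (File f : files) f.delete();
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(System.getProperty("java.io.tmpdir"), "volley-demo");
        dir.mkdirs();
        new File(dir, "bad-entry").createNewFile();

        // Simulate the cache dispatcher thread crashing on a bad entry.
        Thread worker = new Thread(() -> { throw new NegativeArraySizeException("-1024"); });
        worker.setUncaughtExceptionHandler((t, e) -> {
            if (e instanceof OutOfMemoryError || e instanceof NegativeArraySizeException) {
                clearCache(dir); // wipe the cache so the next initialize() succeeds
            }
        });
        worker.start();
        worker.join();
        System.out.println("entries left: " + dir.listFiles().length);
    }
}
```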

Let's go back to the original bug report in 2013:

https://code.google.com/p/android/issues/detail?id=61406

The reporter attached a bad cache entry to the bug report.

The problem occurs after the "Content-Length: 2574" header is serialized.

This string pair should be serialized as:

0x0e 0x00 0x00 0x00 0x00 0x00 0x00 0x00 Content-Length
0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x00 2574
0x07 0x00 0x00 0x00 0x00 0x00 0x00 0x00 Expires // beginning of next pair

But in this case it was serialized as:

0x0e 0x00 0x00 0x00 0x00 0x00 0x00 0x00 Content-Length
0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x00 2574
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x32 0x35 0x37 0x34 // .......2574
0x07 0x00 0x00 0x00 0x00 0x00 0x00 0x00 Expires // beginning of next pair
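The hex dumps above imply a little-endian 8-byte length prefix followed by the string's bytes. A sketch of that encoding (method names are illustrative, not Volley's exact private API):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class HeaderEncoding {
    // Write a long as 8 bytes, least-significant byte first.
    static void writeLong(OutputStream os, long n) throws IOException {
        for (int shift = 0; shift < 64; shift += 8) {
            os.write((int) (n >>> shift) & 0xff);
        }
    }

    // Write a string as an 8-byte length prefix followed by its bytes.
    static void writeString(OutputStream os, String s) throws IOException {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeLong(os, b.length);
        os.write(b);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, "Content-Length");
        writeString(out, "2574");
        byte[] bytes = out.toByteArray();
        // First byte is 0x0e (14), the length of "Content-Length";
        // total is 8 + 14 + 8 + 4 = 34 bytes.
        System.out.printf("first byte: 0x%02x, total: %d bytes%n", bytes[0], bytes.length);
    }
}
```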

Here's what happens after the "Content-Length" and "2574" strings are read:

  1. The next 8 bytes are read: 0x3200000000000000L.
  2. This long is cast to an int to be used as the byte-length for the next encoded string key.
  3. (int)0x3200000000000000 is 0, so no bytes are read, resulting in a bogus "" key.
  4. The next 8 bytes are read: 0x0000000007343735L.
  5. This long is cast to int to be used as the byte-length for the next encoded string value.
  6. (int)0x0000000007343735 is 120 million, triggering an allocation attempt that fails.
  7. OOME!
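The two casts in steps 3 and 6 can be checked directly with the exact corrupt values:

```java
public class CastDemo {
    public static void main(String[] args) {
        long keyLength = 0x3200000000000000L;   // step 1: bogus key length
        long valueLength = 0x0000000007343735L; // step 4: bogus value length
        System.out.println((int) keyLength);    // 0: the low 32 bits are all zero
        System.out.println((int) valueLength);  // 120862517, a ~120 MB allocation
    }
}
```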

Open questions:

Why are the string byte array lengths serialized as longs?
How are these bad headers written in the first place?


jpd236 commented on August 29, 2024

That report is quite old, and seems like it might have been addressed by https://android-review.googlesource.com/#/c/99201/. Are there more recent reports of this happening?

joebowbeer commented on August 29, 2024

The change you refer to added a catch-all around writing the header. It seems to have had no impact on the thousands of OOME crashes that are happening now.

The first stack trace I referenced in my previous comment is very new, but is essentially identical to the report from 4 years ago.

jpd236 commented on August 29, 2024

OK, fair - but I think that the example entry above might have been affected by that bug. Hard to tell given how old it is.

I can't help but wonder, from reading things like http://stackoverflow.com/questions/24095909/volley-out-of-memory-error-weird-allocation-attempt/42196956#42196956, whether this might be due to "misuse" of Volley/DiskBasedCache rather than corruption happening outside the control of the app. There are many apps that use Volley and have never seen this issue, whereas the ones that do see it see it frequently, from what I'm gathering. That does seem to suggest a usage pattern that might be causing the problem.

It is certainly the case that DiskBasedCache is not, and does not purport to be, thread safe (or process safe). If you're trying to write to the same directory location from multiple threads or processes, that could absolutely cause bad things to happen.

Would be good to get more info... something like an example app w/ source that exhibits this problem.

But it's at least true that the cache can be cleared by the user at any time, which is something the app doesn't control, and guarding against at least that is reasonable. I just want to have some confidence that we're hitting the real issue here and not papering over a more fundamental cause than random disk corruption. (100 times per day for an app does not sound like disk corruption, at all).

jpd236 commented on August 29, 2024

I wonder whether a combination of a FileLock and some sort of thread lock on the cache directory could root out that kind of thing in a more obvious manner.

Note that Volley only accesses the cache in a single thread (CacheDispatcher), so it doesn't have to worry about thread safety. But if you have multiple request queues, or multiple processes, as a few of these reports seem to suggest were the case, then you'll have problems.

joebowbeer commented on August 29, 2024

OK, fair - but I think that the example entry above might have been affected by that bug. Hard to tell given how old it is.

Given that the stack traces I've seen are all similar, and that no code has been added that I believe really addresses this problem, I assume the old repro case still applies.

I can't help but wonder, from reading things like http://stackoverflow.com/questions/24095909/volley-out-of-memory-error-weird-allocation-attempt/42196956#42196956, whether this might be due to "misuse" of Volley/DiskBasedCache rather than corruption happening outside the control of the app. There are many apps that use Volley and have never seen this issue, whereas the ones that do see it see it frequently, from what I'm gathering. That does seem to suggest a usage pattern that might be causing the problem.

Once the problem occurs, it recurs every time the cache is initialized. This tends to amplify the problem in the apps where the problem occurs.

I saw the suspicion about misuse and this occurred to me as well, but I can't find a concrete case.

I think a usage pattern or data pattern is more likely. Perhaps there is a flaw in the DiskBasedCache implementation that is only hit by some apps? For example, the mapping from key to file name is pretty poor. What if two keys map to the same file? Currently, DiskBasedCache has barely any unit tests, and testing for the proper handling of key collision is not one of them.
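To illustrate the collision concern: DiskBasedCache derives the file name by concatenating the hashCodes of the two halves of the key, sketched here in simplified form. Java string hashCode collisions are easy to construct ("Aa" and "BB" hash identically), so two distinct cache keys can map to the same file name.

```java
public class FilenameDemo {
    // Simplified sketch of the key-to-filename mapping: concatenate the
    // hashCodes of the first and second halves of the key.
    static String filenameForKey(String key) {
        int firstHalfLength = key.length() / 2;
        return String.valueOf(key.substring(0, firstHalfLength).hashCode())
             + String.valueOf(key.substring(firstHalfLength).hashCode());
    }

    public static void main(String[] args) {
        // "Aa".hashCode() == "BB".hashCode() == 2112, so these two distinct
        // keys collide on the same file name.
        System.out.println(filenameForKey("AaAa"));
        System.out.println(filenameForKey("BBBB"));
    }
}
```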

It is certainly the case that DiskBasedCache is not, and does not purport to be, thread safe (or process safe). If you're trying to write to the same directory location from multiple threads or processes, that could absolutely cause bad things to happen.

Every overridden method is synchronized, so I think DBC does intend to be thread safe, and I don't see any obvious flaws in that aspect of its implementation. I have no evidence that any of the apps encountering this problem are using multiple processes.

jpd236 commented on August 29, 2024

I saw the suspicion about misuse and this occurred to me as well, but I can't find a concrete case... I have no evidence that any of the apps encountering this problem are using multiple processes.

http://stackoverflow.com/a/38106070 "The main application process and a 'SyncAdapter' process following the sync adapter pattern. Only the app with two processes would crash. These two processes would independently initialize the cache."

mcxiaoke/android-volley#61 "check whether your requestQueue is a singleton; I solved an OOM that was caused by a context leak, where every context initialized its own requestQueue."

For example, the mapping from key to file name is pretty poor. What if two keys map to the same file? Currently, DiskBasedCache has barely any unit tests, and testing for the proper handling of key collision is not one of them.

Even if that were to happen, I don't see how it would lead to corruption of this kind, as opposed to a cache entry with unexpected (but well-formed) data, given the locks you pointed out.

Every overridden method is synchronized, so I think DBC does intend to be thread safe, and I don't see any obvious flaws in that aspect of its implementation. I have no evidence that any of the apps encountering this problem are using multiple processes.

You're right - sorry; it is indeed the case that multiple NetworkDispatcher threads as well as the CacheDispatcher thread are accessing the cache in parallel. However, the only ensured thread safety is for a single Cache instance. If you're creating more than one instance in a process, or more than one instance across multiple processes, and those both point to the same location on disk, then you'll run into issues like this, and that's exactly what both the cases noted above were doing.

joebowbeer commented on August 29, 2024

It is not clear that the two comments referred to above had any relation to the reporter's problem. In the first case, another answer was marked as correct.

In my experience with this bug, there are not and never were AFAICT multiple processes. However, I agree that multiple processes is one of the easiest root causes to imagine.

Other causes I can imagine:

  1. Incomplete cleanup (e.g., on update or replacement or trim). There are places in the code where the file is deleted before the stream is closed. These look suspicious and I recommend they be fixed.

  2. A bad output buffer implementation, such as a poorly implemented cache in the buffer. Adding unit tests might reveal a problem here.

  3. Interference from another crash? If the app crashes or is terminated for some other reason while an entry is being written, can this leave a bad entry on disk? More rigorous exception handling and recovery may help.
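Item 1 above can be sketched as follows; the method name is illustrative, not Volley's actual code. The point is simply that the stream should be closed before the file is deleted, so the handle is released and the delete result can be trusted:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SafeRemove {
    // Recommended ordering: close first, then delete.
    static boolean removeSafe(File file) throws IOException {
        InputStream is = new FileInputStream(file);
        try {
            // ... read or skip the entry here ...
        } finally {
            is.close(); // release the handle first
        }
        return file.delete(); // then delete, and check the result
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("volley", ".cache");
        System.out.println("deleted: " + removeSafe(f));
    }
}
```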

Once the problem has occurred, by means that remain a mystery after 4 years, the exception that results is surprisingly stable. It manifests either as a string byte array size that is too large, or in rarer cases as a negative remaining data size (now fixed).

Focusing in on the location of the exception itself, there is one small change that would help rid the cache of bad entries. These lines in the readString method:

int n = (int) readLong(is);
byte[] b = streamToBytes(is, n);

Should be:

long n = readLong(is);
if (n < 0 || n > fileLength) {
  throw new IOException("string length=" + n + " file length=" + fileLength);
}
byte[] b = streamToBytes(is, (int) n);

Every time a header is read or an entry is fetched from cache, file.length() can serve as an effective upper bound for the serialized lengths. A little refactoring is needed in order to utilize it.
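A hedged sketch of that refactoring: thread the file length through to streamToBytes so every serialized length is bounds-checked before allocation (class and parameter names here are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedStreamToBytes {
    // Reject any length that cannot possibly fit in the cache file before
    // attempting the allocation that currently triggers the OOME.
    static byte[] streamToBytes(InputStream is, long n, long fileLength) throws IOException {
        if (n < 0 || n > fileLength) {
            throw new IOException("streamToBytes length=" + n + " fileLength=" + fileLength);
        }
        byte[] bytes = new byte[(int) n];
        new DataInputStream(is).readFully(bytes);
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        byte[] good = streamToBytes(new ByteArrayInputStream(new byte[]{1, 2, 3}), 3, 3);
        System.out.println("read " + good.length + " bytes");
        try {
            // The corrupt ~120 MB length from the worked example above.
            streamToBytes(new ByteArrayInputStream(new byte[0]), 0x07343735L, 1024);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```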

jpd236 commented on August 29, 2024

Multiprocess might be rare, but multithread is certainly not, and I could definitely see bugs where multiple request queues are being created.

Agreed that delete() before close() doesn't seem right and should probably be fixed. The other possibilities feel unlikely to me.

I think I'm fine with adding a length check there as a sanity measure. And of course, I'm fine with more unit tests.

joebowbeer commented on August 29, 2024

In the process of adding more tests to DiskBasedCacheTest, I observe that the NegativeArraySizeException (NASE) handling in the get() method is not actually removing the entry. Apparently this is because the finally handler is closing the file input stream after an attempt is made to delete the underlying file. If the failure in get() does not actually remove the file, a subsequent initialize() should also fail, crashing the app because there is no handler for NASE in the initialize() method. It appears that neither NASE nor the more prevalent OOME are being handled in either get() or initialize().

joebowbeer commented on August 29, 2024

Fixed by PR #52

tbohnen commented on August 29, 2024

Hi, sorry to pollute the thread, but I can't see anywhere else to resolve this. I can't seem to find any documentation on how to actually get the update. I have 1.0.0 and hit the error below:

DiskBasedCache.java line 323 in DiskBasedCache.streamToBytes()
DiskBasedCache.java line 119 in DiskBasedCache.get()
CacheDispatcher.java line 100 in CacheDispatcher.run()
