
Comments (24)

mvaudel avatar mvaudel commented on June 14, 2024

Hi @tzaeschke,

We might have found where the maps got corrupted, causing the keys to get out of sync between zoodb and the rest of the code. It remains to be confirmed by the other developers that the problem is solved, but I wanted to let you know already so that you don't dig any further on this.

It seems that you were right and this was all the product of a concurrency issue that was not correctly handled. I wanted to express our gratitude for your help with this. I have rarely seen such deep debugging from a third party, and it has really helped us a lot in solving the problem (if it is indeed solved).

Many thanks,

Marc


tzaeschke avatar tzaeschke commented on June 14, 2024

I still have trouble running the example. I do the following:

  • New Project
  • Enter name -> Project reference
  • select qExactive01819.mzID as identification file
  • select qExactive01819.t.xml as spectrum file
  • Load Data
    -> This hangs after 10% or so.
    Is the procedure correct?


hbarsnes avatar hbarsnes commented on June 14, 2024

Please select the whole zip file as the identification file. The rest should fill itself out based on the content of the zip file.


tzaeschke avatar tzaeschke commented on June 14, 2024

I still have trouble executing this. I select the qExactive01819.mzID.gz file (I did select the whole file before, unlike what I described). When I click Load Data, it starts loading but gets stuck at 10%:
[screenshot: import progress stuck at 10%]
Also, the graphics card is 98% busy for some reason; is that expected...?
[screenshot: graphics card at 98% load]


tzaeschke avatar tzaeschke commented on June 14, 2024

I also tried reproducing it with a small test class, but using a HashMap<String, HashSet<Long>>; it appears to work fine. I noticed that compomics does not call zooActivateRead/zooActivateWrite when accessing the IdentificationKeys class, but that should only lead to missing entries/updates, not to any exception.


hbarsnes avatar hbarsnes commented on June 14, 2024

Strange. And this happens when you select the whole searchgui_out.zip file as the identification file?

I assume you are building your own versions of both utilities and PeptideShaker using the new_backend branches?

Can you check if there is anything in the PeptideShaker error log? You'll find it via the Welcome Dialog > Settings & Help > Help > Bug Report.


hbarsnes avatar hbarsnes commented on June 14, 2024

I've also uploaded the saved project in case looking at this one might be helpful as well: http://vedlegg.uib.no/?id=af33505200a61725a5ecc7a6265824a7


tzaeschke avatar tzaeschke commented on June 14, 2024

Some progress.

  1. I had actually tried to import the qExactive01819.mzID.gz file instead of the search_gui.zip file. However, importing the latter causes a very similar problem: it hangs at 9%. The debugger shows 7 worker threads waiting for a mutex in the CmsFileReader. From the numbering it appears there should be 8 worker threads, so one may have crashed, but I don't see any errors. The bug report says: PeptideShaker.log not found.
    Also, even when PeptideShaker is not importing, something is busy-looping on the graphics card, probably the "D3D Screen Updater" or one of the AWT threads...?

  2. Using the database, I can reproduce the ZooDB exception that you posted. I will have a look and see whether I can find out what is going on.


hbarsnes avatar hbarsnes commented on June 14, 2024

The bug report says: PeptideShaker.log not found.

Ah, yes of course. If you are running PeptideShaker directly from the source code, the log file will not be created. But the stack trace should be sent to your standard output, which I guess would be your IDE output in this case? Nothing there either?

Perhaps something went wrong with the upload/download of the zip file? Here's another example dataset that also results in the same error: http://vedlegg.uib.no/?id=20de84a6fafdc563b2e1236a9287c7a9. Maybe you have better luck with this one?

Also, even when PeptideShaker is not importing, something is busy-looping on the graphics card, probably the "D3D Screen Updater" or one of the AWT threads...?

I think this is just the waiting/progress dialog and can be safely ignored. At least it should not affect the issue above.

Using the database I can reproduce the ZooDB exception that you posted. I will have a look whether I can find out what is going on.

Great! Thanks for looking into all of this!


tzaeschke avatar tzaeschke commented on June 14, 2024

Regarding the new project not working: PsmFirstHitRunnable.run() catches Exception. I changed it to catch Throwable and then saw the following:

java.lang.NoSuchMethodError: java.nio.MappedByteBuffer.position(I)Ljava/nio/MappedByteBuffer;
	at com.compomics.util.experiment.io.mass_spectrometry.cms.CmsFileReader.getPrecursorMz(CmsFileReader.java:271)
	at com.compomics.util.experiment.io.mass_spectrometry.MsFileHandler.getPrecursorMz(MsFileHandler.java:261)
	at com.compomics.util.experiment.identification.filtering.PeptideAssumptionFilter.validatePrecursor(PeptideAssumptionFilter.java:311)
	at eu.isas.peptideshaker.fileimport.PsmFirstHitRunnable.processPsm(PsmFirstHitRunnable.java:240)
	at eu.isas.peptideshaker.fileimport.PsmFirstHitRunnable.run(PsmFirstHitRunnable.java:167)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
	at java.util.concurrent.FutureTask.run(FutureTask.java)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

The problem is discussed here. There are two solutions:

  • Upgrade to Java 9 or later (this is what I did); the import now works fine
  • Cast to the Buffer base class; see the fixes in this PR and the sketch below
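
For illustration, a minimal sketch of that cast (my own example, not PeptideShaker code; buffer and offset are placeholders). Compiled against a newer JDK, the call binds to the covariant override MappedByteBuffer.position(int) returning MappedByteBuffer, which an older runtime does not have, hence the NoSuchMethodError:

    import java.nio.Buffer;
    import java.nio.MappedByteBuffer;

    void seek(MappedByteBuffer buffer, int offset) {
        // buffer.position(offset);          // binds to the covariant override,
        //                                   // NoSuchMethodError on older JVMs
        ((Buffer) buffer).position(offset);  // binds to the Java 8-compatible
                                             // Buffer.position(int) instead
    }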

Essentially, it may be good to catch Throwable instead of Exception and potentially ask the user to upgrade to Java 9 or later. I am not sure why this happened, because I compiled everything with Java 8, but maybe some of the downloaded libraries were compiled with Java 9.


hbarsnes avatar hbarsnes commented on June 14, 2024

That is some impressive in-depth debugging just to get to the actual issue! :)

I'm not able to reproduce the MappedByteBuffer issue myself though. We use JDK 8 for the compilation of utilities and PeptideShaker, and PeptideShaker runs fine on both JDK 8 and 13. Which JDK were you using for compiling and which for running?

And I assume this means that you are finally able to load the data in PeptideShaker and reproduce the original HashMap issue?


tzaeschke avatar tzaeschke commented on June 14, 2024

I think I compiled and ran everything with JDK 8 (1.8.0_60).

Regarding the HashMap issue, I realized that when importing the large dataset from Dominik, it got stuck or failed at different points. At some point I got the following exception:

java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.util.HashMap$TreeNode (java.util.HashMap$Node and java.util.HashMap$TreeNode are in module java.base of loader 'bootstrap')
	at java.base/java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1884)
	at java.base/java.util.HashMap$TreeNode.treeify(HashMap.java:2000)
	at java.base/java.util.HashMap.treeifyBin(HashMap.java:772)
	at java.base/java.util.HashMap.putVal(HashMap.java:644)
	at java.base/java.util.HashMap.put(HashMap.java:612)
	at java.base/java.util.HashSet.add(HashSet.java:220)
	at com.compomics.util.experiment.identification.Identification.addSpectrumMatches(Identification.java:545)
	at eu.isas.peptideshaker.fileimport.PsmImportRunnable.run(PsmImportRunnable.java:200)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:830)

This appears to happen sometimes when a HashMap is accessed concurrently.
I then changed line 626 in FileImporter to only use one thread:

                1,  // processingParameters.getnThreads(),

I could then import searchgui_out.zip and save/reload the database without obvious problems.
It looks like the HashMaps in the IdentificationKeys class are accessed via the Identification class without much thread protection.
I assume what happens is that the HashMaps get modified while ZooDB is saving or reading them, thus causing the HashMap error.
Could you have a look?

Fixing this by setting the import threads to 1 suggests that only read access should be a problem, so it is a bit weird that I still cannot import the database you sent me; but that one may have become corrupted during testing...?
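
As an aside, the treeify ClassCastException above is a known symptom of unsynchronized concurrent puts. A minimal standalone sketch (illustrative, not PeptideShaker code) that can trigger this kind of corruption:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.TimeUnit;

    public class HashMapRaceSketch {
        public static void main(String[] args) throws InterruptedException {
            Map<Long, Long> map = new HashMap<>(); // not thread-safe
            ExecutorService pool = Executors.newFixedThreadPool(8);
            for (int t = 0; t < 8; t++) {
                pool.submit(() -> {
                    for (long i = 0; i < 1_000_000; i++) {
                        // concurrent puts can corrupt the internal tree bins
                        map.put(ThreadLocalRandom.current().nextLong(1 << 16), i);
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            // Replacing HashMap with ConcurrentHashMap, or synchronizing all
            // access, makes the failure disappear.
        }
    }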


hbarsnes avatar hbarsnes commented on June 14, 2024

I can confirm that using only one thread works. So it seems like you are correct that it is a multi-threading issue. I will have to dig deeper to figure out why and how to solve it, but at least we now know what the underlying problem is.

Fixing this by setting the import threads to 1 suggests that only read access should be a problem, so it is a bit weird that I still cannot import the database you sent me; but that one may have become corrupted during testing...?

Well, if the reading error happens during the processing of the data but goes unnoticed, it will not show up before we try to reopen the project? On the other hand, shouldn't ZooDB complain when the project is saved if the HashMap is corrupted?

Thanks for all the help!


tzaeschke avatar tzaeschke commented on June 14, 2024

No problem :-)
I guess to make it thread-safe, the classic approach is to make all methods synchronized and to replace all methods that return a HashMap or an Iterator with a forEach() variant. For example, getSpectrumMatchesIterator(long[] spectrumMatches, WaitingHandler waitingHandler) becomes something like forEachSpectrumMatches(long[] spectrumMatches, WaitingHandler waitingHandler, Function<SpectrumMatch, R> callable); see the sketch below.
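
A rough sketch of that shape (hedged: I use Consumer instead of Function for simplicity, and retrieveObject is an assumed lookup; only SpectrumMatch and WaitingHandler are existing types here):

    import java.util.function.Consumer;

    // The owning class walks the matches itself while holding its own lock
    // and hands each element to a caller-supplied callback, so no iterator
    // over shared state ever escapes.
    public synchronized void forEachSpectrumMatch(
            long[] spectrumMatchKeys,
            WaitingHandler waitingHandler,
            Consumer<SpectrumMatch> callback) {

        for (long key : spectrumMatchKeys) {
            if (waitingHandler != null && waitingHandler.isRunCanceled()) {
                return; // honor cancellation between elements
            }
            callback.accept((SpectrumMatch) retrieveObject(key)); // assumed lookup
        }
    }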

For a more general refactoring, I found the HashMaps to be quite large, which means long write times if they ever get rewritten. It should be possible to remove the HashMaps and use database indexes and queries to find the objects. ZooDB should be able to deal with concurrent access/queries (even without the mutex you have around the database).

I also have a suggestion for ObjectsDB.establishConnection(). The query for firstLevel == true is not indexed, so it traverses every single object in the database, hence the long start-up time. One solution would be to install an index on the firstLevel field, but that would slow down writes (a little bit). The cheapest solution may be to make the query more precise: it looks to me like only objects of a few classes can have firstLevel == true, so one could run 6-7 queries for the 6-7 classes that can be first level; the database would then only traverse objects of these classes instead of all of them. A sketch follows below.
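
A hedged JDO sketch of the per-class variant (FIRST_LEVEL_CLASSES is a placeholder for the 6-7 PeptideShaker classes that can actually be first level):

    import java.util.ArrayList;
    import java.util.Collection;
    import javax.jdo.PersistenceManager;
    import javax.jdo.Query;

    // Restrict the unindexed firstLevel filter to the few candidate classes
    // instead of letting a single query scan every object in the database.
    static final Class<?>[] FIRST_LEVEL_CLASSES = { /* the first-level types */ };

    static Collection<Object> loadFirstLevelObjects(PersistenceManager pm) {
        Collection<Object> result = new ArrayList<>();
        for (Class<?> cls : FIRST_LEVEL_CLASSES) {
            Query q = pm.newQuery(cls, "firstLevel == true");
            result.addAll((Collection<?>) q.execute());
        }
        return result;
    }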


hbarsnes avatar hbarsnes commented on June 14, 2024

Thanks for the suggestions! I will forward them to Dominik as he's the one in charge of this part of the code.


hbarsnes avatar hbarsnes commented on June 14, 2024

I've been looking more into this issue and there is something I do not understand. If the assumption is correct that the HashMap is corrupted during the processing, then shouldn't the HashMap be corrupted before it is added to the database, and not only when we read it back?

As far as I can tell there is nothing wrong with the HashMap at the time we insert it into the database. At least I can iterate over all of its content without any exceptions being thrown.

Or is there something I'm missing here?


tzaeschke avatar tzaeschke commented on June 14, 2024

The problem that I could reproduce is that the map seems to get corrupted during reading (the 1-threaded import works, the 8-threaded import does not).
But, indeed, I could never import the ZooDB-test.psdb you sent me, so that one appears to be corrupted in the database. There are at least two scenarios:

  • It got corrupted while inserting it into the database. That would be different from the multi-threaded import issue; we would need to look at this again. Maybe the export is also multi-threaded?
  • It got corrupted after reading: during import, the HashMap got corrupted (due to the multi-threaded import). As a reaction to the exception (or ignoring the exception, or in a different thread), the importer closed the database connection with a call to commit(). I saw the following code in ObjectsDB.close():
        pm.currentTransaction().commit();
        if (pm.currentTransaction().isActive()) {
            pm.currentTransaction().rollback();
        }

This indicates that transactions are always committed on close(), thus committing a broken HashMap into the database. If close() is called after receiving an exception, this may explain a corrupted HashMap in the DB.
On a separate note, the above code will never roll back, because after commit() the transaction is always non-active; also, commit() would throw an exception if the transaction were not active anymore. A sketch of a safer close() follows below.
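
For illustration, a sketch of a close() that never commits unconditionally (assuming the same pm field; whether the success path commits is an application decision):

    public void close() {
        if (pm.currentTransaction().isActive()) {
            // roll back anything still open so a half-built in-memory state
            // cannot reach the database; commit() only on a known-good path
            pm.currentTransaction().rollback();
        }
        pm.close();
    }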

Does that help?


hbarsnes avatar hbarsnes commented on June 14, 2024

The thing is that the map is not added to the database during the processing, only upon saving the project. In other words, the only time identification.addObject(IdentificationKeys.KEY, identification.getIdentificationKeys()) is used is in the PsdbExporter.saveAs method.

The map could of course be corrupted before adding it to the database, due to multi-threading issues, but as far as I can tell there is nothing wrong with the map neither before nor after adding to the database.

It got corrupted while inserting it into the database. That would be different from the multi-threaded import issue; we would need to look at this again. Maybe the export is also multi-threaded?

I've tried adding the map (or rather the object containing the map) to the database and then immediately taking it back out, but the map is still fine. For example, I can do

IdentificationKeys identificationKeys = (IdentificationKeys) identification.getObjectsDB().retrieveObject(IdentificationKeys.KEY);

at the end of PsdbExporter.saveAs and the map does not appear to be corrupt. So it must happen at a later stage.

The export itself is not multi-threaded. At least not from our side.

It got corrupted after reading: during import, the HashMap got corrupted (due to the multi-threaded import). As a reaction to the exception (or ignoring the exception, or in a different thread), the importer closed the database connection with a call to commit().

There is no multi-threading when opening the connection to the database. As you can see from PsdbParent.loadPsdbFile onwards nothing much is really happening before trying to query the database.

This indicates that transactions are always committed on close(), thus committing a broken HashMap into the database. If close() is called after receiving an exception, this may explain a corrupted HashMap in the DB.

Maybe, but I don't see how that is linked to using one or multiple threads? And there should be an exception thrown at some point then?


tzaeschke avatar tzaeschke commented on June 14, 2024

I think there are several things to unpack here, so I will give a bit of background on ZooDB (and apologies if you already know that):

the map is not added to the database during the processing

I assume the 'processing' is what happens after the loading. The thing is that PeptideShaker does not use ZooDB/JDO 100% as it was intended (nothing wrong with that, just saying :-) ). The idea is that one works with the application and all its objects, and whenever the application thinks it is in a consistent state worth saving, one calls commit(). More to the point, ZooDB is meant to keep track of all objects: if a new object is added to a top-level object, ZooDB detects that; if an object is modified, ZooDB detects that; and on commit(), all new and modified objects are stored. Loading is similar: only top-level objects need to be loaded by a query, all other objects get loaded lazily whenever the application traverses the fields of already loaded objects. Next to accessing fields, queries are of course the other prime mode of accessing/loading objects. Since ZooDB does not have a class enhancer (at the moment), this automatic loading/saving requires manually implemented zooActivateRead()/zooActivateWrite() methods, as I mentioned before.
What is my point? Once an object is in the database, it will be stored again on the next commit if it is modified; there is no need to add it again. However, you are right in this special case: because IdentificationKeys does not have zooActivateRead/Write() implemented, it would indeed need to be manually marked as 'modified', otherwise ZooDB will not realize it was changed (a minimal sketch of the activation pattern follows below).
TL;DR: As you said, and from what I can see, if the map gets corrupted during load, it should not be saved again.
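
For reference, a minimal sketch of the activation pattern ZooDB expects without an enhancer (KeysHolder is illustrative, not the real IdentificationKeys; it assumes ZooDB's ZooPC base class):

    import java.util.HashMap;
    import java.util.HashSet;
    import org.zoodb.api.impl.ZooPC;

    public class KeysHolder extends ZooPC {

        private HashMap<String, HashSet<Long>> keys = new HashMap<>();

        public HashSet<Long> getKeys(String category) {
            zooActivateRead();  // ensure the fields are loaded from the database
            return keys.get(category);
        }

        public void addKey(String category, long key) {
            zooActivateWrite(); // mark this object as modified for the next commit
            keys.computeIfAbsent(category, c -> new HashSet<>()).add(key);
        }
    }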

Maybe, but I don't see how that is linked to using one or multiple threads? And there should be an exception thrown at some point then?

I also do not see a connection, but using single-threaded mode seems to avoid the issue? ZooDB doesn't really know much about objects and what they mean, except their names and types. If the object graph (such as a HashMap and its components) changes while ZooDB is reading it (and writing it to disk), it is impossible to detect inconsistencies. So the only way to detect errors is to try to reconstruct the object graph (here: a HashMap) and see whether the HashMap class thinks that something is wrong -> Exception. This is pretty much what we see here. It is equivalent to an in-memory race condition, except that we saved the inconsistent view to a database. In other words, ZooDB has no way of telling the difference between a corrupted and a healthy HashMap.

Anyway, something is a bit strange here, so I will continue to investigate. Let me know if there is any progress with investigating the 1-thread vs. 8-thread import.


tzaeschke avatar tzaeschke commented on June 14, 2024

So I dug a bit deeper and found the following problem. When ZooDB writes a HashMap/HashSet to the database, it first calls size() and writes the number of entries. It then uses an iterator to iterate over all elements and write them to the file. The problem is that the number of elements the iterator finds is different from what size() reported. During loading this causes errors because the reader expects, say, 3000 entries but finds 3096, i.e. it will read 3000 entries and then try to read the next field from bytes that actually represent the 3001st entry.

I tested a simple hack/fix that works: I let ZooDB internally copy the HashMaps/Sets before serializing them. If I do that, size() matches the number of entries and everything works. Clearly, this is not a fix I will put permanently into ZooDB because it affects performance and GC.
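
A simplified sketch of the write path just described (my own illustration, not the actual ZooDB serializer) and of the defensive-copy workaround:

    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    // The entry count is written before the entries, so a map that changes
    // between step 1 and step 2 yields a stream the reader cannot parse.
    static void writeMap(DataOutput out, Map<String, Long> map) throws IOException {
        out.writeInt(map.size());                          // step 1: record the count
        for (Map.Entry<String, Long> e : map.entrySet()) { // step 2: may differ by now
            out.writeUTF(e.getKey());
            out.writeLong(e.getValue());
        }
    }

    // The copy hack: snapshot first, so size() and the iteration agree.
    static void writeMapCopied(DataOutput out, Map<String, Long> map) throws IOException {
        writeMap(out, new HashMap<>(map));
    }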

This looks very much like something is modifying the HashMaps/HashSets while ZooDB is writing them. However, ZooDB itself is single-threaded (so concurrent modification from inside ZooDB is pretty much impossible), and I also cannot reproduce it in a single-threaded JUnit test with the same data in the HashMaps/Sets.

I will do some more research tomorrow, but I am afraid this strongly suggests some kind of concurrent access inside PeptideShaker while ZooDB is writing. Is it possible that the new-project-importer threads are not all finished by the time that the pop-up disappears and saving is enabled? Or that the GUI threads update the data that they are displaying?


tzaeschke avatar tzaeschke commented on June 14, 2024

Ah, I just saw your last comment. Glad it got (probably) solved!


hbarsnes avatar hbarsnes commented on June 14, 2024

Ah, I just saw your last comment. Glad it got (probably) solved!

Yes, it seems like @mvaudel's recent changes solved the problem! :)

Thanks again for all the help!


tzaeschke avatar tzaeschke commented on June 14, 2024

No worries, you are welcome.


tzaeschke avatar tzaeschke commented on June 14, 2024

I am closing this, please reopen if the error occurs again.

