
Comments (5)

SGevorg commented on September 12, 2024

@jeffwillette thanks for raising the issue.
@mihran113 @alberttorosyan please take a look at this whenever you can.

from aim.

jeffwillette commented on September 12, 2024

It appears to be a problem with SQLite. The run_metadata.sqlite database reports that it is locked even though no process should be writing to it. This must somehow be a result of the crash.

I let the hang run for a while, and it eventually ended in the error below, which triggered many more exceptions that end in the same error:

Traceback (most recent call last):
  File "/c2/jeff/anaconda3/envs/set-ssl/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/c2/jeff/anaconda3/envs/set-ssl/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: database is locked

I tried manually unlocking, backing up, and dumping the database, which I read should unlock it, but it doesn't. I moved the database to another filesystem and was able to read from it using sqlite3, but once I move it back to its location in the .aim folder, it is locked again.

This means it must have something to do with the filesystem, which is NFS. The thing is, no process is actively connected to the database, so something must have persisted from the crash to keep it locked, but I cannot find what it is. Any ideas?
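
For anyone who wants to confirm the lock without waiting for the long hang, here is a minimal sketch that tries to take a write lock with a short timeout. The function name is_write_locked and the timeout value are my own choices for illustration, not anything from aim. Note that sqlite3.connect creates the file if it does not exist, so point it at a path that is known to exist.

```python
import sqlite3


def is_write_locked(db_path, timeout_s=1.0):
    """Return True if a write lock cannot be acquired within timeout_s seconds.

    Hypothetical helper for illustration; not part of aim.
    """
    # isolation_level=None puts the connection in autocommit mode so that
    # the explicit BEGIN/ROLLBACK statements below are sent as written.
    conn = sqlite3.connect(db_path, timeout=timeout_s, isolation_level=None)
    try:
        # BEGIN IMMEDIATE asks for the write lock right away and fails
        # with "database is locked" if something else is holding it.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("ROLLBACK")
        return False
    except sqlite3.OperationalError as exc:
        if "locked" in str(exc).lower():
            return True
        raise
    finally:
        conn.close()
```

Running this against the copy on a local disk and against the file on the NFS mount would show whether the lock follows the file or the filesystem.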


jeffwillette commented on September 12, 2024

Seems related to #1865.

Also, SQLite recommends against running a database on NFS (https://www.sqlite.org/faq.html, https://www.sqlite.org/howtocorrupt.html). If you search around for this topic, the answer is "don't do it" almost everywhere.


alberttorosyan commented on September 12, 2024

Hey @jeffwillette! Thanks for the additional input. Looking into this issue now.
Is there any additional output when you run aim up --log-level DEBUG?
On a separate note, do you recall when the crash happened? It could be a separate issue or somehow related to this one.


jeffwillette commented on September 12, 2024

@alberttorosyan, I think it was only the trace I posted above. I am almost certain this issue comes down to NFS and SQLite clashing with each other, but I wasn't sure how to proceed, so I just had to start over and delete the old repo (luckily there was nothing crucial in there).

The crash happened right before the problem came up. The servers unexpectedly lost power in an outage, and when I got back and tried to fire up aim again, I was confronted with this error.

I think this might be quite dangerous for those running on NFS. If anyone runs into this problem in the future, the only way I was able to get the SQLite database to unlock was to copy the file to a non-NFS drive, after which I was able to open the db manually and inspect the tables. So if any important information were in there, I guess the whole aim repo could be copied to that drive and it should theoretically work again.
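
In case it helps someone, the copy-and-inspect workaround above could be sketched roughly like this. The function name copy_and_list_tables and both paths in the usage note are placeholders I made up. Copying the -wal, -shm, and -journal sidecar files is my own assumption about what might sit next to the database, so adjust as needed.

```python
import shutil
import sqlite3
from pathlib import Path


def copy_and_list_tables(db_path, local_dir):
    """Copy a SQLite file to local_dir and return its table names.

    Hypothetical helper for illustration; local_dir is assumed to be on
    a non-NFS filesystem.
    """
    src = Path(db_path)
    dest_dir = Path(local_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)

    # Assumption: copy journal/WAL sidecar files too if they exist, so
    # any pending recovery happens on the copy rather than the original.
    for suffix in ("-wal", "-shm", "-journal"):
        sidecar = src.with_name(src.name + suffix)
        if sidecar.exists():
            shutil.copy2(sidecar, dest_dir / sidecar.name)

    # Only the copy is opened; the original on NFS is never touched.
    conn = sqlite3.connect(str(dest))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Usage would be something like copy_and_list_tables("/path/to/.aim/run_metadata.sqlite", "/tmp/aim_rescue"), where both paths are placeholders and the real location of the file inside the .aim folder may differ.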

Anyway, if this is determined to solely be a sqlite/NFS issue, then feel free to close the issue.

