
Comments (7)

tommcintyre commented on June 25, 2024

I'm also seeing this on Linux. After 3 separate attempts to sync from scratch I have been unable to get much past block 369k due to database corruption. Now on startup I get:
FATAL [database] Record database /home/ubuntu/bs/blockchain/spends is corrupt (201379538)[0] via get

I wrote some code to iterate and dump out all transactions in the blockchain/txs database between blocks 0 and 369000. Of those 79M transactions, 2042 were corrupt. I guess the same type of infrequent record corruption is happening across the other DBs.

Note that I'm using the "version2" branch, as I encountered other, more serious problems running master (I forget which issue I hit at the time, but it was already logged here and the advice from one of the devs was to switch to version2 for now). The components were all built from source on Ubuntu 14.04.3 LTS.

from libbitcoin-blockchain.

swansontec commented on June 25, 2024

We had a testnet server die on us. This is Linux:

15:18:54.261771 FATAL [database] Record database blockchain/spends is corrupt (3442084)[0] via get


swansontec commented on June 25, 2024

I shut down the server, deleted the blockchain directory, and re-synced from scratch. The sync stopped at block 577327 just past our last checkpoint (577090) with:

21:45:08.046860 FATAL [database] Record database blockchain/spends is corrupt (3649974)[0] via get

So, this box seems incapable of syncing the blockchain without encountering this error. I will try another box with different specs, on the theory that it's some sort of timing thing.


evoskuil commented on June 25, 2024

Given that I can reproduce it faithfully, and several others have as well, I'm pretty sure it's not your hardware. I do lean toward it being the result of a race, which is either far more likely during validation, or limited to validation. Note that the previous behavior was to tie up a thread when this happened, so it took several occurrences to effectively bring down the server, although it manifested as a stall.


evoskuil commented on June 25, 2024

My working theory is that the assumptions of the lock-free design are violated, specifically in the lack of atomicity in finalizing certain write operations. I previously commented the code, as a work in progress. If you search on "atomic" (case insensitive) you will find most of it. For example: this line.

index_type linked_records::insert(index_type next)
{
    static_assert(sizeof(index_type) == sizeof(uint32_t),
        "index_type incorrect size");

    // Create new record.
    auto record = allocator_.allocate();
    auto data = allocator_.get(record);

    // Write next value at first 4 bytes of record.
    auto serial = make_serializer(data);

    // MUST BE ATOMIC ???
    serial.write_4_bytes_little_endian(next);
    return record;
}

This operation moves a 32-bit value into the file memory map. I believe it was intended to rely on the "natural atomicity" of moving a 32-bit value on most (if not all) CPUs. But it's implemented as a loop over 1-byte moves, so it's certainly not atomic on any CPU.
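To make the difference concrete, here is a minimal sketch of a byte-wise little-endian write next to a single atomic store. It is not the libbitcoin serializer: the names write_le_bytes, read_le, torn_value and write_atomic are hypothetical, and the atomic variant assumes the slot is a std::atomic<uint32_t> on a little-endian host rather than raw mapped memory.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Byte-at-a-time little-endian write (hypothetical helper). It stops after
// `count` bytes so the intermediate states that a concurrent reader could
// observe can be inspected.
inline void write_le_bytes(uint8_t* data, uint32_t value, std::size_t count = 4)
{
    for (std::size_t i = 0; i < count && i < 4; ++i)
        data[i] = static_cast<uint8_t>(value >> (8 * i));
}

// Reassemble a 32-bit value from four little-endian bytes.
inline uint32_t read_le(const uint8_t* data)
{
    return static_cast<uint32_t>(data[0]) |
        (static_cast<uint32_t>(data[1]) << 8) |
        (static_cast<uint32_t>(data[2]) << 16) |
        (static_cast<uint32_t>(data[3]) << 24);
}

// The value a reader would see if it ran after only `count` of the four
// byte stores that replace old_value with new_value.
inline uint32_t torn_value(uint32_t old_value, uint32_t new_value,
    std::size_t count)
{
    uint8_t buffer[4];
    write_le_bytes(buffer, old_value);
    write_le_bytes(buffer, new_value, count);
    return read_le(buffer);
}

// One indivisible 32-bit store: a reader sees the old or the new value,
// never a mixture of the two.
inline void write_atomic(std::atomic<uint32_t>& slot, uint32_t value)
{
    slot.store(value, std::memory_order_release);
}
```

With these helpers, torn_value(255, 256, 1) yields 0, which is neither the old index nor the new one. That is the kind of intermediate state a byte-wise loop can expose to a reader that is not otherwise guarded.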


evoskuil commented on June 25, 2024

For the sake of clarification, the lock-free programming model isn't actually free of locks. Our write operations are sequenced by boost::asio, which locks internally. We rely on our own sequential lock to prevent reads during a write. An atomic counter is incremented before entering a write operation and decremented after exit. Evenness is used to test for a write in progress.

A read that encounters an odd counter loops (with a 100ms sleep) until the counter becomes even. Upon entering the read the counter value is stored in the read closure. The read returns a copy of all data. Upon completing the read the stored counter value is compared to the present value. If they are not the same the read returns a failure, based on the possibility that the read was corrupted by a write.

So any write during a read fails the read, and any read started during a write is blocked until the write completes. As long as all writes execute on the same boost::asio strand (which is the case) and all reads are guarded as described (which is the case) it should not be possible for concurrency to corrupt an index.
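The read guard described above can be sketched roughly as follows. This is an illustration under assumptions, not the libbitcoin code: sequence_guard and guarded_read are hypothetical names, writers are assumed to be serialized elsewhere (for example on a single strand), and the sketch increments the counter on both entry and exit, as in a conventional sequence lock, so that a write that starts and finishes inside a read is still detected by the final comparison.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical sequence guard. An odd counter means a write is in progress.
class sequence_guard
{
public:
    // Writers are assumed to be serialized by the caller.
    void begin_write() { ++counter_; } // counter becomes odd
    void end_write() { ++counter_; }   // counter becomes even again

    // Stores the current counter in handle.
    // Returns false if a write is in progress (odd counter).
    bool begin_read(std::size_t& handle) const
    {
        handle = counter_.load();
        return (handle % 2) == 0;
    }

    // Returns true only if no write started since begin_read.
    bool end_read(std::size_t handle) const
    {
        return handle == counter_.load();
    }

private:
    std::atomic<std::size_t> counter_{0};
};

// Waits out any write in progress, runs the read, then reports whether the
// copy it produced can be trusted.
template <typename Read>
bool guarded_read(const sequence_guard& guard, Read read)
{
    std::size_t handle;
    while (!guard.begin_read(handle))
        std::this_thread::sleep_for(std::chrono::milliseconds(100));

    read();
    return guard.end_read(handle);
}
```

A caller would typically retry when guarded_read returns false. Note that the read body itself still runs concurrently with any write that overtakes it, so it has to tolerate inconsistent data long enough to reach the final check.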

The benefit arising from the LFP model as implemented is that the writer never suffers starvation. My preceding post on atomicity is based on the idea that reads would execute concurrently with writes, i.e. read-copy-update. This would provide a read performance benefit as writes would not block reads. To do this safely the index must be updated atomically, which is not the case. However, this is simply a lack of optimization and would not result in corruption unless the read guards were removed.


evoskuil commented on June 25, 2024

So in light of the two preceding posts we (1) do not have the benefit of atomic updates to protect reads (i.e. read-copy-update is not viable without modifying/verifying each update technique), and (2) expect that reads overtaken by a write operation will test the counter and fail.

This is clearly the intent, but I believe a write operation is creating an intermediate state that puts the read operation into an infinite loop. As such this would indicate that (a) the database is not actually corrupted, and (b) the read could be modified to fail gracefully (since we can detect the "corruption").
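As a rough illustration of failing gracefully, a read that follows record links can bound the number of steps it takes, so that a transient cycle or out-of-range link ends the traversal instead of spinning. Everything here is hypothetical: the in-memory vector of next indices, the end_of_chain sentinel value, and the walk_chain name are assumptions for the sketch, not the actual record layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed sentinel marking the end of a chain (hypothetical value).
constexpr uint32_t end_of_chain = 0xffffffff;

// Follows `next` links starting at `start` and counts the records visited.
// Returns false, rather than looping forever, if a link points outside the
// table or if more steps are taken than there are records, which can only
// happen when the chain contains a cycle.
inline bool walk_chain(const std::vector<uint32_t>& next, uint32_t start,
    std::size_t& length)
{
    length = 0;
    auto current = start;

    while (current != end_of_chain)
    {
        if (current >= next.size() || length >= next.size())
            return false;

        current = next[current];
        ++length;
    }

    return true;
}
```

A false result can then be treated like any other failed read and retried once the counter shows the write has completed.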

