
Blockchain corruption appears to result in normal course. (libbitcoin-blockchain, closed)

libbitcoin commented on June 25, 2024

Blockchain corruption appears to result in normal course.

from libbitcoin-blockchain.

Comments (7)

tommcintyre commented on June 25, 2024

I'm also seeing this on Linux. After 3 separate attempts to sync from scratch, I have been unable to get much past block 369k because of database corruption. Now on startup I get:
FATAL [database] Record database /home/ubuntu/bs/blockchain/spends is corrupt (201379538)[0] via get

I wrote some code to iterate and dump out all transactions in the blockchain/txs database between blocks 0 and 369000. Of those 79M transactions, 2042 were corrupt. I guess the same type of infrequent record corruption is happening across the other DBs.

Note that I'm using the "version2" branch, as I encountered other, more serious problems running master (I forget which issue I encountered at the time, but it was already logged here and the advice from one of the devs was to switch to version2 for now). The components were all built from source on Ubuntu 14.04.3 LTS.

swansontec commented on June 25, 2024

We had a testnet server die on us. This is Linux:

15:18:54.261771 FATAL [database] Record database blockchain/spends is corrupt (3442084)[0] via get

swansontec commented on June 25, 2024

I shut down the server, deleted the blockchain directory, and re-synced from scratch. The sync stopped at block 577327 just past our last checkpoint (577090) with:

21:45:08.046860 FATAL [database] Record database blockchain/spends is corrupt (3649974)[0] via get

So, this box seems incapable of syncing the blockchain without encountering this error. I will try another box with different specs, on the theory that it's some sort of timing thing.

evoskuil commented on June 25, 2024

Given that I can reproduce it reliably, and several others have as well, I'm pretty sure it's not your hardware. I do lean toward it being the result of a race, which is either far more likely during validation or limited to validation. Note that the previous behavior was to tie up a thread when this happened, so it took several occurrences to effectively bring down the server, although it manifested as a stall.

evoskuil commented on June 25, 2024

My working theory is that the assumptions of the lock-free design are violated, specifically in the lack of atomicity in finalizing certain write operations. I previously commented the code, as a work in progress. If you search on "atomic" (case-insensitive) you will find most of it. For example, this line:

index_type linked_records::insert(index_type next)
{
    static_assert(sizeof(index_type) == sizeof(uint32_t),
        "index_type incorrect size");

    // Create new record.
    auto record = allocator_.allocate();
    auto data = allocator_.get(record);

    // Write next value at first 4 bytes of record.
    auto serial = make_serializer(data);

    // MUST BE ATOMIC ???
    serial.write_4_bytes_little_endian(next);
    return record;
}

This operation moves a 32-bit value into the file memory map. I believe it was intended to rely on the "natural atomicity" of moving an aligned 32-bit value on most (if not all) CPUs. But it's implemented as a loop over 1-byte moves, so it's certainly not atomic on any CPU.
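To illustrate why the byte-at-a-time write matters if a reader were ever allowed to run concurrently, here is a minimal sketch. The helpers `read_le32`, `write_le32_bytewise` and `write_atomic` are hypothetical, not libbitcoin code: the first writes a little-endian value one byte per step and records the 32-bit value a reader could observe after each step. The contrast uses an in-process `std::atomic<uint32_t>`; whether a single store into the file memory map is atomic is a separate assumption that depends on alignment and platform.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decode four little-endian bytes as a 32-bit value.
std::uint32_t read_le32(const std::array<std::uint8_t, 4>& bytes)
{
    return static_cast<std::uint32_t>(bytes[0]) |
        (static_cast<std::uint32_t>(bytes[1]) << 8) |
        (static_cast<std::uint32_t>(bytes[2]) << 16) |
        (static_cast<std::uint32_t>(bytes[3]) << 24);
}

// Hypothetical stand-in for a byte-at-a-time serializer: writes `value`
// one byte per step and records what a concurrent reader could see
// after each step.
std::vector<std::uint32_t> write_le32_bytewise(
    std::array<std::uint8_t, 4>& bytes, std::uint32_t value)
{
    std::vector<std::uint32_t> observable;
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        bytes[i] = static_cast<std::uint8_t>(value >> (8 * i));
        observable.push_back(read_le32(bytes));
    }
    return observable;
}

// A single store to a std::atomic<uint32_t> is observed either as the
// old value or the new value, never a mix of the two.
void write_atomic(std::atomic<std::uint32_t>& slot, std::uint32_t value)
{
    slot.store(value, std::memory_order_release);
}
```

Overwriting 0xffffffff with 0x01020304 byte by byte passes through three torn values (0xffffff04, 0xffff0304, 0xff020304) before reaching the final one, whereas the atomic store is only ever seen as the old or the new value.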

evoskuil commented on June 25, 2024

For the sake of clarification, the lock-free programming model isn't actually free of locks. Our write operations are sequenced by a boost::asio strand, which locks internally. We rely on our own sequential lock to prevent reads during a write. An atomic counter is incremented before entering a write operation and incremented again after exiting it, so an odd value indicates a write in progress.

A read that encounters an odd counter loops (with a 100ms sleep) until the counter becomes even. Upon entering the read, the counter value is stored in the read closure. The read returns a copy of all data. Upon completing the read, the stored counter value is compared to the present value. If they differ, the read returns a failure, based on the possibility that the read was corrupted by a write.

So any write during a read fails the read, and any read started during a write is blocked until the write completes. As long as all writes execute on the same boost::asio strand (which is the case) and all reads are guarded as described (which is the case), it should not be possible for concurrency to corrupt an index.
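The sequential lock described above can be sketched roughly as follows. The names `sequence_lock` and `guarded_read` are hypothetical, and the memory orderings follow the common seqlock fence pattern; this illustrates the described protocol under those assumptions rather than reproducing the libbitcoin implementation. Under the strict C++ memory model, the guarded data would also need to be read through relaxed atomics to avoid a formal data race.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>
#include <thread>

// Hypothetical sequence lock following the protocol described above.
// An odd counter value means a write is in progress.
class sequence_lock
{
public:
    using handle = std::uint64_t;

    // Writer only (writes are assumed to be serialized, e.g. on one strand).
    // The counter becomes odd before any data is modified.
    void begin_write()
    {
        sequence_.fetch_add(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_release);
    }

    // The counter becomes even again after all data is modified.
    void end_write()
    {
        sequence_.fetch_add(1, std::memory_order_release);
    }

    // Block (polling every 100ms) while a write is in progress, then
    // return the counter value to compare against after the copy.
    handle begin_read() const
    {
        auto value = sequence_.load(std::memory_order_acquire);
        while (is_write_locked(value))
        {
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
            value = sequence_.load(std::memory_order_acquire);
        }
        return value;
    }

    // The copy is valid only if the counter has not moved since begin_read.
    bool is_read_valid(handle value) const
    {
        std::atomic_thread_fence(std::memory_order_acquire);
        return value == sequence_.load(std::memory_order_relaxed);
    }

    static bool is_write_locked(handle value) { return value % 2 == 1; }

private:
    std::atomic<handle> sequence_{0};
};

// Copy the shared value under the lock; fail if a write overlapped the copy.
template <typename Value>
std::optional<Value> guarded_read(const sequence_lock& lock,
    const Value& shared)
{
    const auto start = lock.begin_read();
    Value copy = shared;
    if (!lock.is_read_valid(start))
        return std::nullopt;
    return copy;
}
```

Because the counter only ever increases, a write that both starts and finishes while a read is in flight still changes the value the reader compares against, so that read is rejected.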

The benefit arising from the LFP model as implemented is that the writer never suffers starvation. My preceding post on atomicity is based on the idea that reads would execute concurrently with writes, i.e. read-copy-update. This would provide a read performance benefit as writes would not block reads. To do this safely the index must be updated atomically, which is not the case. However, this is simply a lack of optimization and would not result in corruption unless the read guards were removed.
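For contrast, the read-copy-update style mentioned above depends on publishing each index with a single atomic store once the record it points to is fully written. Here is a minimal in-memory sketch, assuming one writer, append-only records, and `std::atomic` fields standing in for the memory-mapped file; `record`, `linked_store` and `empty_index` are hypothetical names, and unlike libbitcoin's `linked_records::insert` this sketch also links the new record itself.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sentinel meaning "no record".
constexpr std::uint32_t empty_index = 0xffffffffu;

// Hypothetical record: a link to the next record plus a payload.
struct record
{
    std::atomic<std::uint32_t> next{empty_index};
    std::atomic<std::uint32_t> payload{0};
};

// Hypothetical append-only store with one writer and concurrent readers.
class linked_store
{
public:
    // Fully write the new record, then publish it with one release store
    // of the 32-bit head index. Returns empty_index when full.
    std::uint32_t insert(std::uint32_t payload)
    {
        if (count_ == records_.size())
            return empty_index;

        const auto index = count_++;
        records_[index].payload.store(payload, std::memory_order_relaxed);
        records_[index].next.store(head_.load(std::memory_order_relaxed),
            std::memory_order_relaxed);
        head_.store(index, std::memory_order_release);
        return index;
    }

    // The acquire load pairs with the release store in insert(), so a
    // reader sees either the old chain or the new one, never a torn link.
    std::vector<std::uint32_t> read_all() const
    {
        std::vector<std::uint32_t> payloads;
        for (auto index = head_.load(std::memory_order_acquire);
            index != empty_index;
            index = records_[index].next.load(std::memory_order_relaxed))
            payloads.push_back(
                records_[index].payload.load(std::memory_order_relaxed));
        return payloads;
    }

private:
    std::array<record, 16> records_{};
    std::uint32_t count_ = 0;
    std::atomic<std::uint32_t> head_{empty_index};
};
```

Since records are never removed here, the sketch sidesteps memory reclamation, which a full read-copy-update scheme would also have to handle.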

evoskuil commented on June 25, 2024

So in light of the two preceding posts we (1) do not have the benefit of atomic updates to protect reads (i.e. read-copy-update is not viable without modifying/verifying each update technique), and (2) expect that reads overtaken by a write operation will test the counter and fail.

This is clearly the intent, but I believe a write operation is creating an intermediate state that puts the read operation into an infinite loop. As such this would indicate that (a) the database is not actually corrupted, and (b) the read could be modified to fail gracefully (since we can detect the "corruption").
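One way a read could fail gracefully instead of looping is to bound the traversal of a record chain: a chain that points outside the table, or that visits more records than the table holds, must be malformed. This is a hypothetical sketch over a plain vector of next-indices (the `terminator` sentinel and `walk_chain` are assumed names), not a proposed patch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sentinel for "no next record".
constexpr std::uint32_t terminator = 0xffffffffu;

// Walk a chain of next-indices starting at `start`, returning the visited
// indices, or std::nullopt if the chain points outside the table or is
// longer than the table (which can only happen if it loops).
std::optional<std::vector<std::uint32_t>> walk_chain(
    const std::vector<std::uint32_t>& next, std::uint32_t start)
{
    std::vector<std::uint32_t> visited;
    for (auto index = start; index != terminator; index = next[index])
    {
        if (index >= next.size() || visited.size() == next.size())
            return std::nullopt;
        visited.push_back(index);
    }
    return visited;
}
```

The length bound relies on the pigeonhole principle: once as many records have been visited as the table contains, any further link must revisit one, so the read reports failure rather than spinning forever.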
