metanivek commented on June 12, 2024

From a recent discussion with NL, this would likely work for them, but we would need a way to communicate the latest good commit hash to them so that they can roll back lib_store as well.


Ngoguey42 commented on June 12, 2024

I have another design to propose to solve crash (in)consistency problems.

Step 1. Change the "index" from "Index" to an append-only file

Since irmin-pack supports minimal indexing, the index grows by 41 bytes per block, which is about 43 MB per year. The Index machinery is no longer useful in minimal indexing mode.
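The 43 MB figure can be checked with a quick calculation. The sketch below assumes one block every 30 seconds; that block time is not stated in this comment, but it is consistent with the quoted numbers.

```python
# Back-of-the-envelope check of index growth under minimal indexing.
# ASSUMPTION: one block every 30 seconds (not stated in the comment).
ENTRY_SIZE_BYTES = 41        # index growth per block, from the text
BLOCK_TIME_SECONDS = 30      # assumed block time

SECONDS_PER_YEAR = 365 * 24 * 60 * 60
blocks_per_year = SECONDS_PER_YEAR // BLOCK_TIME_SECONDS
bytes_per_year = blocks_per_year * ENTRY_SIZE_BYTES

print(f"{blocks_per_year} blocks/year, {bytes_per_year / 1e6:.1f} MB/year")
```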

Up to now we wanted to keep Index in case minimal indexing failed, so that Tezos could fall back on the non-minimal indexing strategy. Minimal indexing has proven itself, so there is no need to keep that failsafe.

During our initial discussions on implementing irmin-pack's lower layer, it seemed to us that dropping support for non-minimal indexing would simplify the implementation a lot (we would still support stores that used non-minimal indexing in the past).

At open time, the control file now lets us detect the case where the suffix is ahead of the dict. However, we are still not able to detect the cases where the index is ahead of the suffix (we either raise Pack_store.Invalid_read or worse).

All in all, we can now consider migrating away from Index.

For the index, we could use a storage scheme similar to the dict. It would be an append-only file that is fully loaded in memory when opening the store, that could be garbage collected, and whose end offset could be remembered by the control file.
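As a rough illustration of that scheme, here is a minimal Python model, not irmin-pack code. The record layout (32-byte key, 8-byte offset, 1-byte kind) is a hypothetical choice made only so that a record is 41 bytes, and `AppendOnlyIndex` is an invented name. The point it shows is that bytes past the end offset recorded by the control file are simply ignored when the file is loaded.

```python
import io
import struct

# HYPOTHETICAL record layout, chosen only so that one record is 41 bytes.
RECORD = struct.Struct(">32sQB")  # key, offset, kind


class AppendOnlyIndex:
    """Toy append-only index, fully loaded in memory at open time."""

    def __init__(self, file, end_offset):
        if end_offset % RECORD.size != 0:
            raise ValueError("end offset is not on a record boundary")
        self.file = file
        self.end_offset = end_offset
        self.entries = {}
        file.seek(0)
        data = file.read(end_offset)
        if len(data) < end_offset:
            raise ValueError("file is shorter than the recorded end offset")
        # Only bytes before end_offset are trusted; anything after is ignored.
        for pos in range(0, end_offset, RECORD.size):
            key, offset, kind = RECORD.unpack_from(data, pos)
            self.entries[key] = (offset, kind)

    def append(self, key, offset, kind):
        """Append one record and return the new end offset to persist."""
        self.file.seek(self.end_offset)
        self.file.write(RECORD.pack(key, offset, kind))
        self.entries[key] = (offset, kind)
        self.end_offset += RECORD.size
        return self.end_offset


# Usage: the second append is lost if the control file only recorded 41.
f = io.BytesIO()
index = AppendOnlyIndex(f, 0)
recorded_end = index.append(b"a" * 32, 100, 1)
index.append(b"b" * 32, 200, 1)
reopened = AppendOnlyIndex(f, recorded_end)
print(len(reopened.entries))
```

Reopening with the recorded end offset yields only the entries that the control file vouches for, which is the property that makes the later consistency checks possible.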

For the GC we would include a "generation" integer in the index filename. We would GC using the "surgery" technique. We would have to handle newies the same way as the Irmin 3.4 suffix.

Step 2. Raise Recovery_needed when opening a deeply corrupted store

Currently with irmin 3.4, when opening a store where the control file is ahead of the dict or the suffix, we raise Inconsistent_store. We currently provide no way to recover these stores.

Following step 1, we would also be able to detect these cases for the index.

For all three of the dict, index and suffix, we could then raise Recovery_needed and implement a recovery method. See the next step.
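The open-time check could look roughly like the following Python model. It assumes the control file records one end offset per file and that "control file ahead" means a file on disk is shorter than its recorded end offset. `RecoveryNeeded` and `check_end_offsets` are hypothetical names for illustration, not existing irmin-pack identifiers.

```python
class RecoveryNeeded(Exception):
    """Hypothetical counterpart of the proposed Recovery_needed error."""

    def __init__(self, files):
        super().__init__("control file is ahead of: " + ", ".join(files))
        self.files = files


def check_end_offsets(control, actual_sizes):
    """Raise RecoveryNeeded if the control file is ahead of any file.

    control: end offsets recorded in the control file, keyed by file name.
    actual_sizes: sizes found on disk, keyed by the same names.
    """
    ahead = [
        name
        for name in ("dict", "index", "suffix")
        if control[name] > actual_sizes[name]
    ]
    if ahead:
        raise RecoveryNeeded(ahead)


# A file that is longer than its recorded end offset is fine in this model:
check_end_offsets(
    {"dict": 10, "index": 82, "suffix": 500},
    {"dict": 10, "index": 123, "suffix": 500},
)
```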

Step 3. A new recovery method

Following the 2 previous steps we could implement a recovery method that:

  • Decides a new end offset for the index file
  • Decides a new end offset for the suffix file
  • Decides a new end offset for the dict file
  • Overwrites the old control file

The algorithm would search the index for the valid entry with the highest offset. An entry in the index is valid if:

  1. it points to a valid offset in the suffix/prefix/lower,
  2. all the objects preceding the object it points to have valid pointers in the dict.
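The search described above can be sketched on a simplified in-memory model. Everything here is an assumption made for illustration: objects live in a single suffix, each object lists the dict ids it references, and `PackObject` and `last_valid_entry` are invented names. The sketch also checks the dict pointers of the candidate object itself, not only those of the objects before it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackObject:
    offset: int       # position of the object in the (modelled) suffix
    length: int       # size of the object in bytes
    dict_ids: tuple   # ids of the dict entries the object refers to


def last_valid_entry(index_offsets, objects, suffix_size, dict_size):
    """Return the highest valid offset found in the index, or None."""
    valid_offsets = set()
    # Walk objects in offset order and stop at the first broken one, so
    # that every accepted object only has valid objects before it.
    for obj in sorted(objects, key=lambda o: o.offset):
        if obj.offset + obj.length > suffix_size:
            break  # object is truncated or lies past the end of the suffix
        if any(dict_id >= dict_size for dict_id in obj.dict_ids):
            break  # object refers to a dict entry that was never persisted
        valid_offsets.add(obj.offset)
    candidates = [off for off in index_offsets if off in valid_offsets]
    return max(candidates, default=None)


objects = [
    PackObject(0, 10, (0,)),
    PackObject(10, 10, (1,)),
    PackObject(20, 10, (5,)),
    PackObject(30, 10, ()),
]
index_offsets = [10, 20, 30, 50]
print(last_valid_entry(index_offsets, objects, suffix_size=35, dict_size=2))
```

From the chosen entry, a recovery method could then derive the new end offsets for the index, suffix and dict before overwriting the control file; how exactly each offset is derived is left open here.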

We would also be able to drop the existing "reconstruct index" recovery method.

Step 0. a. Migrating stores that only knew minimal indexing

The simplest solution would be a migration that happens at open_rw time of the file manager. It would traverse Index and convert it to the very first index file. A crash during that migration would not be destructive.
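One common way to make such a conversion crash-safe is to write the new file under a temporary name and rename it into place only once it is complete. The Python sketch below shows that pattern as an assumption about how the migration could be done, not as the planned implementation. The record layout, the `index.0` file name and the `migrate` function are hypothetical.

```python
import os
import struct
import tempfile

# HYPOTHETICAL record layout: 32-byte key, 8-byte offset, 1-byte kind.
RECORD = struct.Struct(">32sQB")


def migrate(old_entries, store_dir, filename="index.0"):
    """Write old_entries to a new index file and return its end offset.

    The old Index is only read. A crash before os.replace leaves at most
    a stray temporary file, so the migration can simply be retried.
    """
    final_path = os.path.join(store_dir, filename)
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "wb") as tmp:
        for key, offset, kind in old_entries:
            tmp.write(RECORD.pack(key, offset, kind))
        tmp.flush()
        os.fsync(tmp.fileno())  # make sure the data is on disk first
    os.replace(tmp_path, final_path)  # rename the finished file into place
    return os.path.getsize(final_path)


with tempfile.TemporaryDirectory() as store_dir:
    entries = [(b"k" * 32, 7, 1), (b"j" * 32, 9, 2)]
    end_offset = migrate(entries, store_dir)
    files_after = os.listdir(store_dir)
    print(end_offset, files_after)
```

The returned end offset is the value that the control file would then record for the new index file.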

A second solution would be to make the existing Index read-only and use the new index file scheme for the new index entries. The migrated irmin-pack stores would keep the Index directory forever. GC would work normally for the new entries.

A third solution, on top of the second solution, would be to migrate the data out of Index during the first GC. We could then discard Index after the finalise of that first GC.

Step 0. b. Migrating stores that knew non-minimal indexing

We would stick to the "second solution" of the previous section.

Discriminating between cases a. and b. would be possible by looking at the existing control file. This information is already stored in its current form.
