Comments (9)

arthmoeros commented on July 28, 2024
> • A file-level lock (blocking reads of the file until any pending write completes) can still fail, e.g. in the sequence p1 read, p2 read, p1 write, p2 write.
> • A transaction-level lock makes sure the read-then-write of each transaction always runs as a unit, e.g. [p1 read, p1 write], [p2 read, p2 write]. But I'm not very keen on a file-based lock mechanism: you have to clean up the lock file if the process holding it fails.

Yeah, I meant a transaction-level lock, implemented through the presence of a lock file in the MinIO bucket. The second process would have to wait until the lock is released, failing after a timeout in case anything goes wrong with the lock management.

The implementation would have to be very "reliable", ensuring that the lock is released on any failure.
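
A minimal sketch of such a wait-with-timeout lock, assuming the backend can create the lock object atomically only if it does not already exist. The `LockStore` interface, `MemoryLockStore`, `withDbLock`, and the `db.lock` key are hypothetical names for illustration; they are not part of the MinIO client or of verdaccio-minio, and the in-memory store only stands in for a bucket so the sketch can run.

```typescript
// Hypothetical minimal store interface; NOT the MinIO client API.
// putIfAbsent must be atomic on the real backend for the lock to be safe.
interface LockStore {
  putIfAbsent(key: string, value: string): Promise<boolean>;
  remove(key: string): Promise<void>;
}

// In-memory stand-in so the sketch runs without a bucket.
class MemoryLockStore implements LockStore {
  private readonly objects = new Map<string, string>();

  async putIfAbsent(key: string, value: string): Promise<boolean> {
    if (this.objects.has(key)) return false;
    this.objects.set(key, value);
    return true;
  }

  async remove(key: string): Promise<void> {
    this.objects.delete(key);
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `transaction` while holding the lock. Waits for the lock for up to
// `timeoutMs`, then fails instead of waiting forever.
async function withDbLock<T>(
  store: LockStore,
  transaction: () => Promise<T>,
  timeoutMs = 5000,
  retryMs = 50,
  lockKey = "db.lock",
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (!(await store.putIfAbsent(lockKey, String(Date.now())))) {
    if (Date.now() >= deadline) {
      throw new Error(`timed out waiting for ${lockKey}`);
    }
    await sleep(retryMs);
  }
  try {
    return await transaction();
  } finally {
    // Release the lock even if the transaction throws.
    await store.remove(lockKey);
  }
}
```

The `finally` block covers failures inside the transaction, but not a process that crashes while holding the lock; handling that case would need some form of stale-lock expiry, which is not shown here.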

The scope of this "lock handling" is very specific; I think it is manageable until a tarball/db decoupling happens in Verdaccio.

from verdaccio-minio.

barolab commented on July 28, 2024

Hi @arthmoeros, thanks for contributing. There's indeed an in-memory cache in the package database, which obviously will be a problem if you have multiple Verdaccio instances. I never had the clustering issue in mind when I first wrote the plugin, so a PR to fix this issue would be welcome.
I'll take a look at #12 in the meantime.

arthmoeros commented on July 28, 2024

Fixed with #14

barolab commented on July 28, 2024

Thanks for the PR #14 @arthmoeros, I was able to reproduce the issue #12 locally and checked that everything works fine for me too.

I published 0.2.3 with the fix.

favoyang commented on July 28, 2024

Browsing an old thread brought me here.

I had a quick glance at #14, which removed the local cache of the db file. A good move, but it may still have a race condition. Consider two users submitting two packages at the same time. If their requests go to two processes, both processes read the db file from disk, modify the content, then write it back to disk. The second write will overwrite the first one. Correct me if I'm wrong.

For the aws-s3 backend, I proposed a way to make the storage stateless; see verdaccio/monorepo#275. It's certainly not perfect, given the performance trade-off. But if you are serious about consistency in a cluster deployment, the race condition is worth paying attention to.

arthmoeros commented on July 28, 2024

> Browsing an old thread brought me here.
>
> I had a quick glance at #14, which removed the local cache of the db file. A good move, but it may still have a race condition. Consider two users submitting two packages at the same time. If their requests go to two processes, both processes read the db file from disk, modify the content, then write it back to disk. The second write will overwrite the first one. Correct me if I'm wrong.
>
> For the aws-s3 backend, I proposed a way to make the storage stateless; see verdaccio/monorepo#275. It's certainly not perfect, given the performance trade-off. But if you are serious about consistency in a cluster deployment, the race condition is worth paying attention to.

Yeah, the race condition is there, but it is very specific: it involves two new package submissions at the same time. Even when it runs in two processes, the file is read before it is saved. Don't get me wrong, I do agree with you; it may be very specific, but the risk is there.

I would suggest a "file locking mechanism + hold until unlocked" rather than stateless storage because, as I said, the db file is only involved when a new package is submitted. If you submit a new version of an existing package, the db file is not touched. I think taking the performance hit only in that scenario is preferable to taking it on all storage operations.

arthmoeros commented on July 28, 2024

Now, a more robust solution would be for Verdaccio to decouple storage management from db management, so we could handle the db in a proper database, such as any SQL database or something similar.

favoyang commented on July 28, 2024

Yes, it only happens when two new packages are submitted at the same time, which is a rare case for a low-traffic registry.

A carefully implemented locking mechanism can also work.

  • A file-level lock (blocking reads of the file until any pending write completes) can still fail, e.g. in the sequence p1 read, p2 read, p1 write, p2 write.
  • A transaction-level lock makes sure the read-then-write of each transaction always runs as a unit, e.g. [p1 read, p1 write], [p2 read, p2 write]. But I'm not very keen on a file-based lock mechanism: you have to clean up the lock file if the process holding it fails.
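
The two sequences above can be replayed in a small simulation. This is only a sketch: the `Step` type, the `replay` function, and the package names `pkg-a` and `pkg-b` are made up for illustration, and the "db file" is just an in-memory list of package names.

```typescript
// One step of a process: read the db file into a private snapshot, or
// write that snapshot plus the process's new package back.
type Step = { proc: string; kind: "read" | "write" };

// Replays a schedule against an in-memory "db file" (a list of package names).
function replay(schedule: Step[], newPackage: Record<string, string>): string[] {
  let dbFile: string[] = [];
  const snapshots: Record<string, string[]> = {};
  for (const step of schedule) {
    if (step.kind === "read") {
      snapshots[step.proc] = [...dbFile];
    } else {
      // The write is based on whatever this process read earlier.
      dbFile = [...(snapshots[step.proc] ?? []), newPackage[step.proc]];
    }
  }
  return dbFile;
}

const newPackage = { p1: "pkg-a", p2: "pkg-b" };

// Allowed by a file-level lock: no read overlaps a write, yet p2 writes
// from a snapshot taken before p1's write.
const interleaved = replay(
  [
    { proc: "p1", kind: "read" },
    { proc: "p2", kind: "read" },
    { proc: "p1", kind: "write" },
    { proc: "p2", kind: "write" },
  ],
  newPackage,
);

// Enforced by a transaction-level lock: each read-then-write runs as a unit.
const serialized = replay(
  [
    { proc: "p1", kind: "read" },
    { proc: "p1", kind: "write" },
    { proc: "p2", kind: "read" },
    { proc: "p2", kind: "write" },
  ],
  newPackage,
);

console.log(interleaved); // [ 'pkg-b' ]  (pkg-a was overwritten)
console.log(serialized); // [ 'pkg-a', 'pkg-b' ]
```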

But I totally agree that separating storage management (tarball file) and db management is the best way to go.

favoyang commented on July 28, 2024

> I think it can be manageable until a tarball/db decoupling happens on Verdaccio.

Refs https://github.com/openupm/verdaccio-storage-proxy
