Comments (9)

lukechampine commented on June 9, 2024

oh yeah, meant to do that -- done

lukechampine commented on June 9, 2024

Hmm.

This panic happens in the Encode method, which splits one []byte into multiple []byte shards. I took a hardcore approach to the API here and required that the caller always pass a [][]byte whose shards have sufficient capacity to hold the result. (The alternative would be for Encode to dynamically allocate new shards on each call, which I found unacceptable.) So fillSectors is calling Encode with shards that are too small.

Since minShards is 1, we're using the simpleRedundancy encoder, and the required size of the shards is the same as the input []byte. So we should check what the actual size of the passed shard is.
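
For illustration, here's roughly what that contract demands of the caller -- a sketch, not an actual call site in the us codebase; nShards and enc are placeholders, and merkle.SegmentSize is the 64-byte segment size:

// Capacity, not length, is what Encode checks before panicking.
shardSize := ((len(data) + merkle.SegmentSize - 1) / merkle.SegmentSize) * merkle.SegmentSize
shards := make([][]byte, nShards)
for i := range shards {
        shards[i] = make([]byte, 0, shardSize)
}
enc.Encode(data, shards) // panics if any cap(shards[i]) < shardSize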

Can you add this statement and tell me what it prints?

diff --git a/renter/reedsolomon.go b/renter/reedsolomon.go
index e8ab227..46ceafa 100644
--- a/renter/reedsolomon.go
+++ b/renter/reedsolomon.go
@@ -109,6 +109,7 @@ func (r simpleRedundancy) Encode(data []byte, shards [][]byte) {
        shardSize := numChunks * merkle.SegmentSize
        for i := range shards {
                if cap(shards[i]) < shardSize {
+                       println(cap(shards[i]) < shardSize)
                        panic("each shard must have capacity of at least len(data)/m")
                }
                shards[i] = shards[i][:shardSize]

jkawamoto commented on June 9, 2024

I added this: println(cap(shards[i]) < shardSize, cap(shards[i]), shardSize) and got this: true 4194304 4194368.

jkawamoto commented on June 9, 2024

I also added println(r, len(data), chunkSize, numChunks) and got: 1 4194359 64 65537.
I'm not sure if it's related, but len(data) looks bigger than the actual data we write with PseudoFile.Write.
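
For reference, those numbers are consistent with the rounding in Encode (taking merkle.SegmentSize to be the 64-byte chunkSize printed above):

numChunks  = ceil(4194359 / 64) = 65537
shardSize  = 65537 * 64         = 4194368
cap(shard) = 4194304 (one full sector), which is less than 4194368, hence the panic.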

lukechampine commented on June 9, 2024

println(cap(shards[i]) < shardSize, cap(shards[i]), shardSize)

oops, yeah, that's what I meant 😅.

It looks like len(data) is 55 bytes too long. There are three possible explanations:

  • fillSectors has a bug that causes it to encode too much data
  • mergePendingWrites has a bug that causes it to buffer too much data
  • canFit or calcShardSize has a bug that causes fileWriteAt to not flush when it should

If you're able to create a minimal reproducer, that would be most helpful. I tried reproducing the bug with some tests of my own, but was unsuccessful. Otherwise, we'll have to do more println debugging. Here's the next patch I suggest:

diff --git a/renter/renterutil/fileops.go b/renter/renterutil/fileops.go
index bcb16fb..46e606d 100644
--- a/renter/renterutil/fileops.go
+++ b/renter/renterutil/fileops.go
@@ -216,6 +216,7 @@ func (fs *PseudoFS) fillSectors(f *openMetaFile) error {
        // extend each pendingWrite with its unaligned segments, merging writes as appropriate
        for i := 0; i < len(f.pendingWrites); i++ {
                pw := f.pendingWrites[i]
+               println("pending write", i, pw.offset, len(pw.data))
                // if the write begins in the middle of a segment, we must download
                // that segment
                if align := pw.offset % f.m.MinChunkSize(); align != 0 {
@@ -226,6 +227,7 @@ func (fs *PseudoFS) fillSectors(f *openMetaFile) error {
                        }
                        pw.offset -= align
                        pw.data = append(chunk[:align], pw.data...)
+                       println("align head:", align)
                }
                // if the write ends in the middle of a segment, we must download
                // that segment
@@ -236,10 +238,12 @@ func (fs *PseudoFS) fillSectors(f *openMetaFile) error {
                                return err
                        }
                        pw.data = append(pw.data, chunk[align:]...)
+                       println("align tail:", len(chunk[align:]))
                }
                // merge with subsequent writes, if applicable
                for i+1 < len(f.pendingWrites) && pw.end() >= f.pendingWrites[i+1].offset {
                        next := f.pendingWrites[i+1]
+                       println("merge with", i+1, pw.end(), "->", next.end())
                        if pw.end() >= next.end() {
                                // full overwrite; only happens if both writes are within same MinChunk
                                copy(pw.data[next.offset-pw.offset:], next.data)
@@ -543,6 +547,10 @@ func (fs *PseudoFS) fileWriteAt(f *openMetaFile, p []byte, off int64) (int, erro
                data:   append([]byte(nil), p...),
                offset: off,
        })
+       // sanity check: all pending writes should fit within a sector
+       if !fs.canFit(f, 0) {
+               panic("too much pending data")
+       }
 
        // update metadata
        f.m.ModTime = time.Now()

jkawamoto commented on June 9, 2024

I think you can reproduce it by cloning this https://github.com/storewise/benchmark and running this command: go run ./cmd/wave ed25519:80123282887700bef150939d5f4196cf9b8145882e5ba850391431e183968449 --max-storage-size "$((40*1024*1024))" --storage-duration 1000.

jkawamoto commented on June 9, 2024

Btw, this is the result with the above patch:

pending write 0 0 4023
pending write 0 4023 4194304
align head: 55
1 4194359 64 65537
true 4194304 4194368
panic: too much pending data

lukechampine commented on June 9, 2024

ok, I have a better idea of what's going on now, and I've been able to reproduce it in a test. The first Write is for 4023 bytes. Since this is less than a sector, the write is buffered in memory. Then the second write, for 1 sector, comes in. Since combining these writes would exceed 1 sector, the first write is flushed. Then the second write is passed to fillSectors. fillSectors sees that the write is not aligned on a 64-byte boundary, so it downloads data from the first write and prepends it to the second write. But this causes the write to exceed 1 sector, so the erasure encoder panics.
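
Tracing the numbers from the output above (SectorSize is 4194304 bytes, and MinChunkSize is the 64-byte chunkSize):

first write:            4023 bytes (buffered, later flushed)
second write:           4194304 bytes (exactly one sector), starting at offset 4023
misalignment:           4023 % 64 = 55, so 55 bytes are downloaded and prepended
data passed to Encode:  55 + 4194304 = 4194359 bytes
rounded up to segments: 65537 * 64 = 4194368 > 4194304, so the shard capacity check panics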

I see two problems here. First, if a write is not aligned, we need to incorporate the extra alignment bytes into its total size. In this case, that would mean splitting the 1-sector write into two parts. Second, fillSectors is redownloading the data that it just uploaded moments ago -- this is obviously inefficient. Instead, we should keep misaligned pendingWrites around, on the assumption that we'll probably need them later. This is a more complicated fix, but doable.

It's worth noting that both of these problems occur only in the context of misaligned writes. If your writes are misaligned, you're going to have suboptimal performance. I wonder if there's some way to design the API that makes it harder to misalign your writes.
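
Until then, a caller can sidestep most of the cost by buffering on its side and only handing the file whole 64-byte segments. A rough sketch (alignedWriter is hypothetical and not part of the us API; it wraps any io.Writer and only needs "io" imported):

// alignedWriter is a hypothetical caller-side wrapper: it forwards data only in
// multiples of segSize, so at most the final Flush produces one misaligned write.
type alignedWriter struct {
        w       io.Writer // the underlying writer, e.g. a PseudoFile
        segSize int       // e.g. 64
        buf     []byte
}

func (aw *alignedWriter) Write(p []byte) (int, error) {
        aw.buf = append(aw.buf, p...)
        // forward the largest segSize-aligned prefix of the buffer
        if n := len(aw.buf) - len(aw.buf)%aw.segSize; n > 0 {
                _, err := aw.w.Write(aw.buf[:n])
                aw.buf = append(aw.buf[:0], aw.buf[n:]...)
                if err != nil {
                        return 0, err
                }
        }
        return len(p), nil
}

func (aw *alignedWriter) Flush() error {
        if len(aw.buf) == 0 {
                return nil
        }
        _, err := aw.w.Write(aw.buf) // the only potentially misaligned write
        aw.buf = aw.buf[:0]
        return err
}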

jkawamoto commented on June 9, 2024

Thanks! Could you release a new version?
