Comments (9)

lukechampine commented on June 9, 2024

oh yeah, meant to do that -- done

lukechampine commented on June 9, 2024

Hmm.

This panic happens in the Encode method, which splits one []byte into multiple []byte shards. I took a hardcore approach to the API here and required that the caller always pass a [][]byte whose shards have sufficient capacity to hold the result. (The alternative would be for Encode to dynamically allocate new shards on each call, which I found unacceptable.) So fillSectors is calling Encode with shards that are too small.

Since minShards is 1, we're using the simpleRedundancy encoder, and the required size of the shards is the same as the input []byte. So we should check what the actual size of the passed shard is.
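
For illustration, here's roughly what that contract demands of the caller -- a sketch, not an actual call site in the us codebase; nShards and enc are placeholders, and merkle.SegmentSize is the 64-byte segment size:

// Capacity, not length, is what Encode checks before panicking.
shardSize := ((len(data) + merkle.SegmentSize - 1) / merkle.SegmentSize) * merkle.SegmentSize
shards := make([][]byte, nShards)
for i := range shards {
        shards[i] = make([]byte, 0, shardSize)
}
enc.Encode(data, shards) // panics if any cap(shards[i]) < shardSize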

Can you add this statement and tell me what it prints?

diff --git a/renter/reedsolomon.go b/renter/reedsolomon.go
index e8ab227..46ceafa 100644
--- a/renter/reedsolomon.go
+++ b/renter/reedsolomon.go
@@ -109,6 +109,7 @@ func (r simpleRedundancy) Encode(data []byte, shards [][]byte) {
        shardSize := numChunks * merkle.SegmentSize
        for i := range shards {
                if cap(shards[i]) < shardSize {
+                       println(cap(shards[i]) < shardSize)
                        panic("each shard must have capacity of at least len(data)/m")
                }
                shards[i] = shards[i][:shardSize]

jkawamoto commented on June 9, 2024

I added this: println(cap(shards[i]) < shardSize, cap(shards[i]), shardSize) and got this: true 4194304 4194368.

jkawamoto commented on June 9, 2024

I also added println(r, len(data), chunkSize, numChunks) and got: 1 4194359 64 65537.
I'm not sure if it's related, but len(data) looks bigger than the actual data we write with PseudoFile.Write.
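
For reference, those numbers are consistent with the rounding in Encode (taking merkle.SegmentSize to be the 64-byte chunkSize printed above):

numChunks  = ceil(4194359 / 64) = 65537
shardSize  = 65537 * 64         = 4194368
cap(shard) = 4194304 (one full sector), which is less than 4194368, hence the panic.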

lukechampine commented on June 9, 2024

println(cap(shards[i]) < shardSize, cap(shards[i]), shardSize)

oops, yeah, that's what I meant 😅.

It looks like len(data) is 55 bytes too long. There are three possible explanations:

  • fillSectors has a bug that causes it to encode too much data
  • mergePendingWrites has a bug that causes it to buffer too much data
  • canFit or calcShardSize has a bug that causes fileWriteAt to not flush when it should

If you're able to create a minimal reproducer, that would be most helpful. I tried reproducing the bug with some tests of my own, but was unsuccessful. Otherwise, we'll have to do more println debugging. Here's the next patch I suggest:

diff --git a/renter/renterutil/fileops.go b/renter/renterutil/fileops.go
index bcb16fb..46e606d 100644
--- a/renter/renterutil/fileops.go
+++ b/renter/renterutil/fileops.go
@@ -216,6 +216,7 @@ func (fs *PseudoFS) fillSectors(f *openMetaFile) error {
        // extend each pendingWrite with its unaligned segments, merging writes as appropriate
        for i := 0; i < len(f.pendingWrites); i++ {
                pw := f.pendingWrites[i]
+               println("pending write", i, pw.offset, len(pw.data))
                // if the write begins in the middle of a segment, we must download
                // that segment
                if align := pw.offset % f.m.MinChunkSize(); align != 0 {
@@ -226,6 +227,7 @@ func (fs *PseudoFS) fillSectors(f *openMetaFile) error {
                        }
                        pw.offset -= align
                        pw.data = append(chunk[:align], pw.data...)
+                       println("align head:", align)
                }
                // if the write ends in the middle of a segment, we must download
                // that segment
@@ -236,10 +238,12 @@ func (fs *PseudoFS) fillSectors(f *openMetaFile) error {
                                return err
                        }
                        pw.data = append(pw.data, chunk[align:]...)
+                       println("align tail:", len(chunk[align:]))
                }
                // merge with subsequent writes, if applicable
                for i+1 < len(f.pendingWrites) && pw.end() >= f.pendingWrites[i+1].offset {
                        next := f.pendingWrites[i+1]
+                       println("merge with", i+1, pw.end(), "->", next.end())
                        if pw.end() >= next.end() {
                                // full overwrite; only happens if both writes are within same MinChunk
                                copy(pw.data[next.offset-pw.offset:], next.data)
@@ -543,6 +547,10 @@ func (fs *PseudoFS) fileWriteAt(f *openMetaFile, p []byte, off int64) (int, erro
                data:   append([]byte(nil), p...),
                offset: off,
        })
+       // sanity check: all pending writes should fit within a sector
+       if !fs.canFit(f, 0) {
+               panic("too much pending data")
+       }
 
        // update metadata
        f.m.ModTime = time.Now()

jkawamoto commented on June 9, 2024

I think you can reproduce it by cloning this https://github.com/storewise/benchmark and running this command: go run ./cmd/wave ed25519:80123282887700bef150939d5f4196cf9b8145882e5ba850391431e183968449 --max-storage-size "$((40*1024*1024))" --storage-duration 1000.

jkawamoto commented on June 9, 2024

Btw, this is the result with the above patch:

pending write 0 0 4023
pending write 0 4023 4194304
align head: 55
1 4194359 64 65537
true 4194304 4194368
panic: too much pending data

lukechampine commented on June 9, 2024

ok, I have a better idea of what's going on now, and I've been able to reproduce it in a test. The first Write is for 4023 bytes. Since this is less than a sector, the write is buffered in memory. Then the second write, for 1 sector, comes in. Since combining these writes would exceed 1 sector, the first write is flushed. Then the second write is passed to fillSectors. fillSectors sees that the write is not aligned on a 64-byte boundary, so it downloads data from the first write and prepends it to the second write. But this causes the write to exceed 1 sector, so the erasure encoder panics.
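
Tracing the numbers from the output above (SectorSize is 4194304 bytes, and MinChunkSize is the 64-byte chunkSize):

first write:            4023 bytes (buffered, later flushed)
second write:           4194304 bytes (exactly one sector), starting at offset 4023
misalignment:           4023 % 64 = 55, so 55 bytes are downloaded and prepended
data passed to Encode:  55 + 4194304 = 4194359 bytes
rounded up to segments: 65537 * 64 = 4194368 > 4194304, so the shard capacity check panics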

I see two problems here. First, if a write is not aligned, we need to incorporate the extra alignment bytes into its total size. In this case, that would mean splitting the 1-sector write into two parts. Second, fillSectors is redownloading the data that it just uploaded moments ago -- this is obviously inefficient. Instead, we should keep misaligned pendingWrites around, on the assumption that we'll probably need them later. This is a more complicated fix, but doable.

It's worth noting that both of these problems occur only in the context of misaligned writes. If your writes are misaligned, you're going to have suboptimal performance. I wonder if there's some way to design the API that makes it harder to misalign your writes.
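
Until then, a caller can sidestep most of the cost by buffering on its side and only handing the file whole 64-byte segments. A rough sketch (alignedWriter is hypothetical and not part of the us API; it wraps any io.Writer and only needs "io" imported):

// alignedWriter is a hypothetical caller-side wrapper: it forwards data only in
// multiples of segSize, so at most the final Flush produces one misaligned write.
type alignedWriter struct {
        w       io.Writer // the underlying writer, e.g. a PseudoFile
        segSize int       // e.g. 64
        buf     []byte
}

func (aw *alignedWriter) Write(p []byte) (int, error) {
        aw.buf = append(aw.buf, p...)
        // forward the largest segSize-aligned prefix of the buffer
        if n := len(aw.buf) - len(aw.buf)%aw.segSize; n > 0 {
                _, err := aw.w.Write(aw.buf[:n])
                aw.buf = append(aw.buf[:0], aw.buf[n:]...)
                if err != nil {
                        return 0, err
                }
        }
        return len(p), nil
}

func (aw *alignedWriter) Flush() error {
        if len(aw.buf) == 0 {
                return nil
        }
        _, err := aw.w.Write(aw.buf) // the only potentially misaligned write
        aw.buf = aw.buf[:0]
        return err
}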

jkawamoto commented on June 9, 2024

Thanks! Could you release a new version?
