Comments (11)

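spinlock commented on August 15, 2024

The line numbers don't seem to match the source, so it is hard to tell from the call stack what the error actually is.

On Tuesday, May 17, 2016, 张云乾 [email protected] wrote:

I'm using redis-port to sync data from redis to codis. The dataset is about 7 GB with more than 4 million keys, and the process crashes during the sync rdb phase. It looks like it hangs while reading the reply after connecting to the target? I tried several times and it stopped on a different key each time, so it is probably unrelated to the source data? The crash log is below. Thanks.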
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x41caa9]

goroutine 7 [running]:
main.newRDBLoader.func1(0xc820075ec0, 0xc82001e8a0, 0xc820079e00)
    /home/easemob/go/src/github.com/left2right/redis-port/cmd/utils.go:331 +0x639
created by main.newRDBLoader
    /home/easemob/go/src/github.com/left2right/redis-port/cmd/utils.go:344 +0x68

goroutine 1 [select]:
main.(*cmdSync).SyncRDBFile(0xc820079e00, 0xc82001e8a0, 0x7fffd855a4e3, 0x26, 0x0, 0x0, 0x3b8a4c5f)
    /home/easemob/go/src/github.com/left2right/redis-port/cmd/sync.go:227 +0xb2b
main.(*cmdSync).Main(0xc820079e00)
    /home/easemob/go/src/github.com/left2right/redis-port/cmd/sync.go:91 +0x863
main.main()
    /home/easemob/go/src/github.com/left2right/redis-port/cmd/main.go:377 +0x24c3

goroutine 17 [syscall, 6 minutes, locked to thread]:
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1721 +0x1

goroutine 8 [chan receive, 6 minutes]:
main.(*cmdSync).SyncRDBFile.func1(0xc820075f20, 0x7fffd855a4e3, 0x26, 0x0, 0x0, 0xc820075ec0, 0xc820079e00)
    /home/easemob/go/src/github.com/left2right/redis-port/cmd/sync.go:222 +0x136
created by main.(*cmdSync).SyncRDBFile
    /home/easemob/go/src/github.com/left2right/redis-port/cmd/sync.go:224 +0x110

goroutine 9 [IO wait]:
net.runtime_pollWait(0x2b88bf6e7388, 0x72, 0xc820076140)
    /usr/local/go/src/runtime/netpoll.go:157 +0x60
net.(*pollDesc).Wait(0xc82127eae0, 0x72, 0x0, 0x0)
    /usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc82127eae0, 0x0, 0x0)
    /usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).Read(0xc82127ea80, 0xc82001d000, 0x1000, 0x1000, 0x0, 0x2b88bfd21028, 0xc820076140)
    /usr/local/go/src/net/fd_unix.go:232 +0x23a
net.(*conn).Read(0xc820030018, 0xc82001d000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:172 +0xe4
bufio.(*Reader).fill(0xc8200740c0)
    /usr/local/go/src/bufio/bufio.go:97 +0x1e9
bufio.(*Reader).ReadSlice(0xc8200740c0, 0xa, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:328 +0x21a
github.com/garyburd/redigo/redis.(*conn).readLine(0xc8200ac000, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/easemob/go/src/github.com/left2right/redis-port/Godeps/_workspace/src/github.com/garyburd/redigo/redis/conn.go:338 +0x5a
github.com/garyburd/redigo/redis.(*conn).readReply(0xc8200ac000, 0x0, 0x0, 0x0, 0x0)
    /home/easemob/go/src/github.com/left2right/redis-port/Godeps/_workspace/src/github.com/garyburd/redigo/redis/conn.go:411 +0x57
github.com/garyburd/redigo/redis.(*conn).Do(0xc8200ac000, 0x69ff40, 0xc, 0xc824c0a7e0, 0x3, 0x3, 0x0, 0x0, 0x0, 0x0)
    /home/easemob/go/src/github.com/left2right/redis-port/Godeps/_workspace/src/github.com/garyburd/redigo/redis/conn.go:559 +0x6b2
main.restoreRdbEntry(0x2b88bfd270e8, 0xc8200ac000, 0xc824bf0680)
    /home/easemob/go/src/github.com/left2right/redis-port/cmd/utils.go:286 +0x1c6a
main.(*cmdSync).SyncRDBFile.func1.1(0xc8212056c0, 0x7fffd855a4e3, 0x26, 0x0, 0x0, 0xc820075ec0, 0xc820079e00)
    /home/easemob/go/src/github.com/left2right/redis-port/cmd/sync.go:216 +0x377
created by main.(*cmdSync).SyncRDBFile.func1
    /home/easemob/go/src/github.com/left2right/redis-port/cmd/sync.go:219 +0xe7


from redis-port.

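left2right commented on August 15, 2024

To find out which key was involved, I added a statement that prints every key. From what I can see, the connection used in Sync -> SyncRDBFile -> restoreRdbEntry(c, e) never gets any data back, which causes the IO wait? I searched online and several other open-source projects have hit something similar. One of the fixes was to add a timeout to the connection, https://github.com/golang/gddo/issues/139 (I don't think that is a good fix, since it might lose data?). I'll build the latest redis-port code and try again.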

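spinlock commented on August 15, 2024

Your case is probably not the same as the one in that issue. There it was an HTTP request that hung, which can indeed happen with TCP.

In this issue, however, the stack trace you posted reports panic: runtime error: invalid memory address or nil pointer dereference, which points to a nil pointer.

If you can reproduce it reliably, there is probably a bug or an unhandled case in redis-port. It would help if you could debug it or share the code that fails, so that we can follow up.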

left2right commented on August 15, 2024

2016/05/17 13:14:29 [INFO] total=1039336531 -     64666156 [  6%]  entry=270889
2016/05/17 13:14:30 [INFO] total=1039336531 -     65678020 [  6%]  entry=275075
2016/05/17 13:14:31 [INFO] total=1039336531 -     66752256 [  6%]  entry=279343
2016/05/17 13:14:32 [INFO] total=1039336531 -     68302108 [  6%]  entry=283564
2016/05/17 13:14:33 [PANIC] parse rdb entry error
[error]: EOF
    11  /home/easemob/go/src/github.com/CodisLabs/redis-port/pkg/rdb/reader.go:75
            github.com/CodisLabs/redis-port/pkg/rdb.(*rdbReader).Read
    10  /usr/local/go/src/io/io.go:514
            io.(*teeReader).Read
    9   /home/easemob/go/src/github.com/CodisLabs/redis-port/pkg/rdb/reader.go:73
            github.com/CodisLabs/redis-port/pkg/rdb.(*rdbReader).Read
    8   /usr/local/go/src/io/io.go:298
            io.ReadAtLeast
    7   /usr/local/go/src/io/io.go:316
            io.ReadFull
    6   /home/easemob/go/src/github.com/CodisLabs/redis-port/pkg/rdb/reader.go:229
            github.com/CodisLabs/redis-port/pkg/rdb.(*rdbReader).readByte
    5   /home/easemob/go/src/github.com/CodisLabs/redis-port/pkg/rdb/reader.go:244
            github.com/CodisLabs/redis-port/pkg/rdb.(*rdbReader).readUint8
    4   /home/easemob/go/src/github.com/CodisLabs/redis-port/pkg/rdb/reader.go:187
            github.com/CodisLabs/redis-port/pkg/rdb.(*rdbReader).readEncodedLength
    3   /home/easemob/go/src/github.com/CodisLabs/redis-port/pkg/rdb/reader.go:143
            github.com/CodisLabs/redis-port/pkg/rdb.(*rdbReader).readString
    2   /home/easemob/go/src/github.com/CodisLabs/redis-port/pkg/rdb/reader.go:99
            github.com/CodisLabs/redis-port/pkg/rdb.(*rdbReader).readObjectValue
    1   /home/easemob/go/src/github.com/CodisLabs/redis-port/pkg/rdb/loader.go:129
            github.com/CodisLabs/redis-port/pkg/rdb.(*Loader).NextBinEntry
    0   /home/easemob/go/src/github.com/CodisLabs/redis-port/cmd/utils.go:247
            main.newRDBLoader.func1
        ... ...
[stack]:
    0   /home/easemob/go/src/github.com/CodisLabs/redis-port/cmd/utils.go:248
            main.newRDBLoader.func1
        ... ...

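This is the error from running a build of the latest master of redis-port from this repo. I had hit the same error before and assumed that some key's value was too large, so I modified the NextBinEntry -> readObjectValue code as follows to find out which key it was: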

            val, err := l.readObjectValue(t)
            if err != nil {
                entry.DB = l.db
                entry.Key = key
                return entry, err
            }

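Another point that may help narrow this down: since the crash happens during the sync rdb phase, we ran bgsave directly on the source redis to generate an rdb file (a little over 1 GB) and loaded it into another redis server. The number of keys there differs from the source redis. We tried twice: the first time there were only a little over 20,000 keys, and the second time a little over 3 million, more than 1 million short.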

left2right commented on August 15, 2024

The keys here expire after 1 day and the load is roughly 5000 QPS. This redis is currently serving production traffic, and we plan to migrate it into codis.

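spinlock commented on August 15, 2024

Second question first: the rdb file is simply what redis bgsave produces. redis-port only sends the bgsave command to the master so that it generates the rdb.

If two restores from rdb differ that much, the cause can only be on the redis side.

Also, an rdb file contains keys that are about to expire but had not yet expired when it was written. If you restore from the rdb after their expiry time has passed, those keys may be discarded outright. Could that be why you are about 1 million keys short?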

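spinlock commented on August 15, 2024

Now the first question: this error is EOF. I suspect the panic at the top of this issue came from a mistake in your debugging changes rather than from redis-port itself?

EOF usually means the master closed the connection. Typically, during sync the backlog is generated faster than redis-port consumes it, so the backlog buffer fills up and the master closes its socket connection to redis-port.
This symptom and its fix were discussed in CodisLabs/codis#318; check whether the cause is the same.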

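left2right commented on August 15, 2024

Also, an rdb file contains keys that are about to expire but had not yet expired when it was written. If you restore from the rdb after their expiry time has passed, those keys may be discarded outright. Could that be why you are about 1 million keys short? === I had considered that. I'll verify it later. Thanks.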

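left2right commented on August 15, 2024

EOF usually means the master closed the connection. Typically, during sync the backlog is generated faster than redis-port consumes it, so the backlog buffer fills up and the master closes its socket connection to redis-port.
This symptom and its fix were discussed in CodisLabs/codis#318; check whether the cause is the same. === OK, I'll take a look. Thanks.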

left2right commented on August 15, 2024

The approach above solved the problem. Thanks~

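zhjggok commented on August 15, 2024

I want redis-port to support migration between plain redis instances, that is, from redis A to redis B. At line 205 of cmd/utils.go: s, err := redigo.String(c.Do("slotrestore", e.Key, ttlms, e.Value))
After changing slotrestore to the restore command, the migration fails with the error below. It looks like the checksum in the restore payload is wrong. How should I modify it?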

./bin/redis-port sync -f 127.0.0.1:9000 -t 127.0.0.1:9001
2016/10/18 18:11:15 [INFO] set ncpu = 4, parallel = 4
2016/10/18 18:11:15 [INFO] sync from '127.0.0.1:9000' to '127.0.0.1:9001'
2016/10/18 18:11:15 [INFO] rdb file = 42
2016/10/18 18:11:15 [PANIC] restore command error
[error]: ERR DUMP payload version or checksum are wrong
[stack]:
    1   /data/tmp/redis-port-master/cmd/utils.go:207
            main.restoreRdbEntry
    0   /data/tmp/redis-port-master/cmd/sync.go:212
            main.(*cmdSync).SyncRDBFile.func1.1
        ... ...
2016/10/18 18:11:15 [PANIC] restore command error
[error]: ERR DUMP payload version or checksum are wrong
[stack]:
    1   /data/tmp/redis-port-master/cmd/utils.go:207
            main.restoreRdbEntry
    0   /data/tmp/redis-port-master/cmd/sync.go:212
            main.(*cmdSync).SyncRDBFile.func1.1
        ... ...
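
One way to investigate an error like this is to look at the trailer of the payload being sent. The sketch below assumes the Redis DUMP layout of a serialized body followed by a 2-byte little-endian RDB version and an 8-byte little-endian CRC64. `parseDumpFooter` and `dumpFooter` are hypothetical names, the sample bytes are made up, and whether `e.Value` in redis-port carries this footer is not established in this thread.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// dumpFooter holds the trailing fields of a DUMP-style payload.
type dumpFooter struct {
	RDBVersion uint16 // 2 bytes, little-endian
	CRC64      uint64 // 8 bytes, little-endian
}

// parseDumpFooter splits payload into its body and footer.
// Assumed layout: body || 2-byte RDB version || 8-byte CRC64.
func parseDumpFooter(payload []byte) ([]byte, dumpFooter, error) {
	const footerLen = 10
	var f dumpFooter
	if len(payload) < footerLen {
		return nil, f, errors.New("payload shorter than the 10-byte footer")
	}
	split := len(payload) - footerLen
	f.RDBVersion = binary.LittleEndian.Uint16(payload[split : split+2])
	f.CRC64 = binary.LittleEndian.Uint64(payload[split+2:])
	return payload[:split], f, nil
}

func main() {
	// Made-up payload: 3 body bytes, version 6, and a placeholder checksum.
	payload := []byte{0x00, 0x01, 0x61, 0x06, 0x00, 1, 2, 3, 4, 5, 6, 7, 8}
	body, f, err := parseDumpFooter(payload)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("body=%d bytes, rdb version=%d, crc64=%#x\n", len(body), f.RDBVersion, f.CRC64)
}
```

Printing the version for a failing key and comparing it with what the target instance supports would show whether the version half of the error applies. This sketch does not verify the checksum itself.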
