Comments (8)

Xinlong-Chen commented on May 22, 2024

Log:

003511 INFO S0 send request to 2 {&{Term:1 LeaderId:0 PrevLogIndex:-12345 PrevLogTerm:-10001 Entries:[] LeaderCommit:0}}
003511 INFO S0 send request to 1 {&{Term:1 LeaderId:0 PrevLogIndex:-12345 PrevLogTerm:-10001 Entries:[] LeaderCommit:0}}
003514 LOG1 S2 S0 appendEntries
003514 LOG1 S1 S0 appendEntries
003515 LOG1 S1 arg: &{Term:1 LeaderId:0 PrevLogIndex:-12345 PrevLogTerm:-10001 Entries:[] LeaderCommit:0} reply: &{Term:1 Success:true XTerm:0 XIndex:0 XLen:0} {log: [{Term:-10001 Cmd:}]}
003514 LOG1 S2 arg: &{Term:1 LeaderId:0 PrevLogIndex:-12345 PrevLogTerm:-10001 Entries:[] LeaderCommit:0} reply: &{Term:1 Success:true XTerm:0 XIndex:0 XLen:0} {log: [{Term:-10001 Cmd:}]}
003515 INFO S0 get response from 1 {&{Term:1 Success:true XTerm:0 XIndex:0 XLen:0}}
003516 INFO S0 get response from 2 {&{Term:1 Success:true XTerm:0 XIndex:0 XLen:0}}
003519 TERM S1 converting to Candidate in T(2)
003519 TIMR S1 Election timeout, Start election, T2

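At first S0 is the leader and sends heartbeats to S1 and S2, which reset their election timers. Timestamp 3514 is where the handler is entered and they reply at 3515, so the reset happens in between. Yet at 3519, less than 1 ms after that reset and far sooner than any election timeout, S1 times out and starts an election, usurping the leader. This can prevent consensus from being reached. (One timestamp unit is 0.1 ms.)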

from mit6.824-2021.

OneSizeFitsQuorum commented on May 22, 2024

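I agree that timers in Go need to be used more carefully. That is something I missed at the time. But in your example, even if S1 times out and starts an election, that doesn't seem to affect safety? As I recall, as long as the RequestVote RPC is implemented correctly, even every node starting a re-election only affects liveness, not safety. And given how unlikely this is, the impact on liveness shouldn't be severe either?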
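The safety argument here rests on two RequestVote rules from the Raft paper. A server grants at most one vote per term, and it votes only for a candidate whose log is at least as up-to-date as its own (§5.4.1: compare last terms first, then last indexes). Below is a minimal sketch of just those two rules. It assumes a log with a sentinel entry at index 0, like the `{Term:-10001}` entry in the log output above. `voter`, `LogEntry`, and `grantVote` are hypothetical names. A real handler would also step down to follower on a higher term and reset its election timer when it grants a vote.

```go
package main

import "fmt"

// LogEntry is a minimal stand-in for a Raft log entry (hypothetical type).
type LogEntry struct {
	Term int
}

// voter holds the per-server state the vote decision depends on.
// Field names are illustrative, not taken from the thread's code.
type voter struct {
	currentTerm int
	votedFor    int        // -1 means no vote cast in currentTerm
	log         []LogEntry // index 0 is a sentinel entry
}

// upToDate reports whether the candidate's log is at least as up-to-date
// as ours: a higher last term wins; equal last terms compare by length.
func (v *voter) upToDate(candLastIndex, candLastTerm int) bool {
	myLastIndex := len(v.log) - 1
	myLastTerm := v.log[myLastIndex].Term
	if candLastTerm != myLastTerm {
		return candLastTerm > myLastTerm
	}
	return candLastIndex >= myLastIndex
}

// grantVote enforces one vote per term and the up-to-date check.
func (v *voter) grantVote(term, candidateID, candLastIndex, candLastTerm int) bool {
	if term < v.currentTerm {
		return false // stale candidate
	}
	if term > v.currentTerm {
		v.currentTerm = term // newer term: any earlier vote no longer counts
		v.votedFor = -1
	}
	if (v.votedFor == -1 || v.votedFor == candidateID) &&
		v.upToDate(candLastIndex, candLastTerm) {
		v.votedFor = candidateID
		return true
	}
	return false
}

func main() {
	// s0 holds an extra term-1 entry; s2 has only the sentinel.
	s0 := &voter{currentTerm: 1, votedFor: 0, log: []LogEntry{{Term: 0}, {Term: 1}}}
	s2 := &voter{currentTerm: 1, votedFor: -1, log: []LogEntry{{Term: 0}}}
	// Candidate 1 asks for votes in term 2 with lastIndex 0, lastTerm 0.
	fmt.Println("s0 grants:", s0.grantVote(2, 1, 0, 0))
	fmt.Println("s2 grants:", s2.grantVote(2, 1, 0, 0))
}
```

Even with these rules in place, a candidate can still win with votes from servers that lack an uncommitted entry. That is exactly the case the next comment describes.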

Xinlong-Chen commented on May 22, 2024

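Suppose that, while this takeover is happening, the client calls Start on the old leader S0. Because Start returns immediately, the entry has not been replicated yet: S0 has the log entry for the new command, but S1 and S2 do not.
In this situation, S1 starts an election.
During the election, the new leader S1 overwrites S0's log, so the earlier command never reaches consensus. S0 does not vote for S1 because S1's log is not up to date enough, but S2 does, similar to one of the cases in Figure 8 of the paper.
It is indeed unlikely: in 3000 full runs of lab2a-lab2c, I hit this bug about 2-3 times.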
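The overwrite S1 performs is ordinary AppendEntries conflict handling on the follower. When a new entry's term differs from the one stored at the same index, the follower deletes that entry and everything after it (Raft paper, Figure 2, AppendEntries receiver steps 2-4). Below is a minimal sketch with the term check and commit-index handling omitted. `Entry` and `appendEntries` are hypothetical names, index 0 is a sentinel, and the commands are made up for illustration.

```go
package main

import "fmt"

// Entry is a hypothetical log entry carrying a term and a command.
type Entry struct {
	Term int
	Cmd  string
}

// appendEntries sketches the follower-side log update of AppendEntries.
// It returns the new log and whether the check at prevLogIndex passed.
func appendEntries(log []Entry, prevLogIndex, prevLogTerm int, entries []Entry) ([]Entry, bool) {
	// Step 2: reject if there is no entry at prevLogIndex with prevLogTerm.
	if prevLogIndex >= len(log) || log[prevLogIndex].Term != prevLogTerm {
		return log, false
	}
	for i, e := range entries {
		idx := prevLogIndex + 1 + i
		if idx < len(log) {
			if log[idx].Term == e.Term {
				continue // already have this entry
			}
			// Step 3: conflict, so drop this entry and everything after it.
			log = log[:idx]
		}
		// Step 4: append entries not already in the log.
		log = append(log, e)
	}
	return log, true
}

func main() {
	// S0 still holds the entry Start accepted in term 1 (never replicated).
	s0 := []Entry{{Term: 0}, {Term: 1, Cmd: "x=1"}}
	// New leader S1 (term 2) sends its own entry right after index 0.
	s0, ok := appendEntries(s0, 0, 0, []Entry{{Term: 2, Cmd: "y=2"}})
	fmt.Println(ok, s0)
}
```

Raft allows the term-1 entry to be deleted because it was never committed. The consequence is that the command Start accepted on S0 silently never reaches consensus.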

OneSizeFitsQuorum commented on May 22, 2024

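During the election, the new leader S1 overwrites S0's log, so the earlier command never reaches consensus. S0 does not vote for S1 because S1's log is not up to date enough, but S2 does, similar to one of the cases in Figure 8 of the paper.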

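Failing to reach consensus doesn't affect safety either, does it? The entry was never applied anyway. In lab3 this even helps give commands exactly-once semantics, so clients can simply retry without a second thought.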
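Blind retries give exactly-once behavior only if the service also drops duplicates. If the "lost" entry was in fact committed before the client retried, both copies end up in the log. A common approach is to tag each command with a client ID and a per-client sequence number, and to skip any command whose number has already been applied. Below is a minimal sketch. It assumes each client has at most one outstanding request and numbers its requests with increasing `Seq`. `Op`, `kvState`, and all field names are hypothetical.

```go
package main

import "fmt"

// Op is a hypothetical client command tagged so that retries are recognisable.
type Op struct {
	ClientID int64
	Seq      int64
	Key, Val string
}

// kvState is the state machine each server applies committed Ops to.
type kvState struct {
	data    map[string]string
	lastSeq map[int64]int64 // highest Seq applied per client
}

// apply executes op only if it has not been applied before, so a command
// that was retried and committed twice takes effect once.
func (s *kvState) apply(op Op) bool {
	if last, ok := s.lastSeq[op.ClientID]; ok && op.Seq <= last {
		return false // duplicate: already applied
	}
	s.data[op.Key] = op.Val
	s.lastSeq[op.ClientID] = op.Seq
	return true
}

func main() {
	s := &kvState{data: map[string]string{}, lastSeq: map[int64]int64{}}
	op := Op{ClientID: 7, Seq: 1, Key: "x", Val: "1"}
	fmt.Println(s.apply(op)) // first committed copy
	fmt.Println(s.apply(op)) // retried copy, also committed
}
```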

Xinlong-Chen commented on May 22, 2024

But then one() in config fails, which makes the test fail.
I haven't finished lab3 yet, so I don't know it very well.

OneSizeFitsQuorum commented on May 22, 2024

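Oh, I see. Then the tests probably do cover this case, though in a real system it may be fine. Once you have a correct solution that passes testing, feel free to describe the problem and the fix in detail. I'll pin your issue afterwards so that students reading the docs later won't fall into the same trap I did~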

Xinlong-Chen commented on May 22, 2024

Sorry for the late reply. I'm now using a goroutine that sleeps and polls at a fixed interval. The ticker function is as follows:

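```go
// electionTimeout reports whether the election deadline has passed.
func (rf *Raft) electionTimeout() bool {
	return time.Now().After(rf.electionTime)
}

// heartbeatTimeout reports whether it is time for the leader to send heartbeats.
func (rf *Raft) heartbeatTimeout() bool {
	return time.Now().After(rf.heartbeatTime)
}

// ticker wakes up every gap_time ms and checks the deadlines under rf.mu.
// Deadlines are plain timestamps, so resetting one is just an assignment
// and cannot leave a stale timer event behind.
func (rf *Raft) ticker() {
	for !rf.killed() {
		rf.mu.Lock()
		switch rf.status {
		case follower, candidate:
			// Follower: no heartbeat arrived in time, so start an election.
			// Candidate: the current election timed out, so run another one.
			if rf.electionTimeout() {
				rf.TurnTo(candidate)
				rf.doElection()
				rf.resetElectionTime()
			}
		case leader:
			if rf.heartbeatTimeout() {
				rf.doAppendEntries()
				rf.resetHeartbeatTime()
			}
		}
		rf.mu.Unlock()
		time.Sleep(time.Duration(gap_time) * time.Millisecond)
	}
}
```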

I have run the lab2 and lab3 tests with this approach more than 10,000 times without a single failure.


If you use the timer-based approach instead, the timer must be reset more carefully:

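```go
// Stop the timer. If it already fired, drain the pending value so the
// next receive on t.C does not see a stale expiry, then rearm it.
if !t.Stop() {
    select {
    case <-t.C: // try to drain the channel
    default: // nothing buffered (already received elsewhere)
    }
}
t.Reset(d)
```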

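You also need to handle the case where the timer event in the ticker's select has already fired. One option is to hold the lock around the select, make the select non-blocking, and sleep for a fixed interval in the default branch.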

MagicMarvel commented on May 22, 2024

Thanks for the idea!
I finally found someone with the same problem. I also ran into the test's one() failing because the leader was usurped.

I also found a summary someone wrote earlier: #35 (comment)

But then one() in config fails, which makes the test fail. I haven't finished lab3 yet, so I don't know it very well.
