wesleyac / raft
A Raft implementation in python
License: MIT License
This might be "working as intended" but, in 15 seconds of discussion, Wesley and I couldn't think of an obvious reason this shouldn't work.
If you try to get a list of up_nodes, we get DownNodes!
leaders = collections.defaultdict(set)
for node in self.power_broker['up_nodes'].values():
    if node.is_leader():
        leaders[node.term].add(node.node_id)
======================================================================
ERROR: runTest (hypothesis.stateful.WorldBroker.TestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/danluu/dev/raft/venv/lib/python3.6/site-packages/hypothesis/stateful.py", line 191, in runTest
    run_state_machine_as_test(state_machine_class)
  File "/Users/danluu/dev/raft/venv/lib/python3.6/site-packages/hypothesis/stateful.py", line 109, in run_state_machine_as_test
    breaker.run(state_machine_factory(), print_steps=True)
  File "/Users/danluu/dev/raft/venv/lib/python3.6/site-packages/hypothesis/stateful.py", line 247, in run
    state_machine.execute_step(value)
  File "/Users/danluu/dev/raft/src/world_broker.py", line 141, in execute_step
    if node.is_leader():
AttributeError: 'DownNode' object has no attribute 'is_leader'
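One possible way to make the leader scan tolerate down nodes is to skip them explicitly. This is only a sketch: `UpNode`, `DownNode`, and `leaders_by_term` below are hypothetical stand-ins, not the classes in `src/world_broker.py`, and it doesn't answer whether `up_nodes` should contain `DownNode` objects in the first place.

```python
import collections


class UpNode:
    """Hypothetical stand-in for a running Raft node."""

    def __init__(self, node_id, term, leader):
        self.node_id = node_id
        self.term = term
        self._leader = leader

    def is_leader(self):
        return self._leader


class DownNode:
    """Hypothetical stand-in for a powered-down node (no is_leader)."""

    def __init__(self, node_id):
        self.node_id = node_id


def leaders_by_term(nodes):
    """Map term -> set of leader ids, skipping nodes that are down."""
    leaders = collections.defaultdict(set)
    for node in nodes:
        if isinstance(node, DownNode):
            continue  # a down node can't report leadership
        if node.is_leader():
            leaders[node.term].add(node.node_id)
    return leaders


nodes = [UpNode(0, 3, False), DownNode(1), UpNode(2, 3, True)]
result = leaders_by_term(nodes)
```

If `up_nodes` is really meant to hold only running nodes, the better fix is probably wherever nodes get moved between the up and down collections, not in this loop.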
Here's one example of a bad execution
Hypothesis test steps:
Step #1: []
Step #2: [<events.ReceiveDrop at 0x1057d88d0>]
Step #3: []
Step #4: []
terms : set of leaders
defaultdict(<class 'set'>, {1: {2}, 3: {2}, 6: {4}, 10: {2, 4}})
Term 10 has both 2 and 4 as leaders (or the test incorrectly thinks that's the case).
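The "at most one leader per term" check can be stated directly against the mapping printed above. A minimal sketch, assuming the test keeps a `term -> set of leader ids` mapping; the function name is made up for illustration:

```python
import collections


def terms_with_multiple_leaders(leaders):
    """Return only the terms in which more than one node claimed leadership."""
    return {term: ids for term, ids in leaders.items() if len(ids) > 1}


# The mapping from the failing run above.
leaders = collections.defaultdict(set, {1: {2}, 3: {2}, 6: {4}, 10: {2, 4}})
violations = terms_with_multiple_leaders(leaders)
```

An empty result means the election-safety property held for the run; here the only violation is term 10.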
List of all change_type calls that don't change a node type to itself:
0: Follower->Candidate
1: Candidate->Leader
1: Follower->Candidate
1: Leader->Follower
2: Follower->Candidate
3: Candidate->Leader
3: Leader->Follower
4: Follower->Candidate
4: Follower->Candidate
5: Candidate->Leader
5: Candidate->Leader
5: Leader->Follower
5: Leader->Follower
5: Follower->Candidate
6: Candidate->Leader
6: Leader->Follower
9: Follower->Candidate
9: Follower->Candidate
10: Candidate->Leader
10: Candidate->Leader
The node that became a leader in term 6 doesn't stop being a leader, but in term 10, a new node becomes a leader.
This seems possibly related to #17, where a node went from Candidate to Follower to Leader. The bug that incorrectly caused the node to go from Candidate to Follower was fixed, but an additional bug was that the node should not have been able to go from Follower to Leader.
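A guard on state changes would catch the Follower to Leader jump as soon as it happens. The sketch below assumes the usual Raft state diagram and treats same-state changes (which the logs show do occur) as no-ops; `ALLOWED` and `illegal_transitions` are hypothetical names, not part of this repo.

```python
# Transitions permitted by the Raft state diagram.
ALLOWED = {
    ("Follower", "Candidate"),  # election timeout fires
    ("Candidate", "Leader"),    # won votes from a majority
    ("Candidate", "Follower"),  # saw a current leader or a higher term
    ("Leader", "Follower"),     # saw a higher term
}


def illegal_transitions(transitions):
    """Return the (old, new) pairs that the state diagram doesn't allow."""
    return [
        (old, new)
        for old, new in transitions
        if old != new and (old, new) not in ALLOWED
    ]


trace = [
    ("Follower", "Candidate"),
    ("Candidate", "Follower"),
    ("Follower", "Follower"),
    ("Follower", "Leader"),
]
bad = illegal_transitions(trace)
```

The same set could back an assertion inside whatever function performs the type change, so the test fails at the offending step instead of at the end of the run.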
Term #0, Node #2: Follower->Candidate
Node #2 increased term to 1
Node #2 voted for node #2
Node #3 increased term to 1
Term #0, Node #2: Follower->Candidate
Node #2 increased term to 1
Node #2 voted for node #2
Node #3 increased term to 1
Step #1: [<events.ReceiveDrop at 0x10a18a940>, <events.ReceiveDrop at 0x10a18a668>]
E
======================================================================
ERROR: runTest (hypothesis.stateful.WorldBroker.TestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/danluu/dev/raft/venv/lib/python3.6/site-packages/hypothesis/stateful.py", line 191, in runTest
    run_state_machine_as_test(state_machine_class)
  File "/Users/danluu/dev/raft/venv/lib/python3.6/site-packages/hypothesis/stateful.py", line 109, in run_state_machine_as_test
    breaker.run(state_machine_factory(), print_steps=True)
  File "/Users/danluu/dev/raft/venv/lib/python3.6/site-packages/hypothesis/stateful.py", line 251, in run
    state_machine.teardown()
  File "/Users/danluu/dev/raft/src/world_broker.py", line 212, in teardown
    self.print_log()
  File "/Users/danluu/dev/raft/src/world_broker.py", line 78, in print_log
    if entry['log_type'] == 'change_type':
KeyError: 'log_type'
----------------------------------------------------------------------
Ran 1 test in 0.764s
FAILED (errors=1)
Term #0, Node #2: Follower->Candidate
Node #2 increased term to 1
Node #2 voted for node #2
Node #3 increased term to 1
One problem is that we log something that doesn't match our print log function. It's possible there's some other bad thing going on here that's masked by our code blowing up because we can't print the log.
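To keep the log printer from masking other failures, it could fall back to printing the raw entry when `log_type` is missing. A rough sketch: the entry fields (`term`, `node_id`, `old`, `new`) and the function name are assumptions, since the real entry layout isn't shown here.

```python
def format_log(log):
    """Render log entries as strings without raising on unknown shapes."""
    lines = []
    for entry in log:
        # .get() returns None instead of raising KeyError when the key is absent.
        if entry.get('log_type') == 'change_type':
            lines.append(
                'Term #{term}, Node #{node_id}: {old}->{new}'.format(**entry)
            )
        else:
            lines.append('unrecognized log entry: {!r}'.format(entry))
    return lines


log = [
    {'log_type': 'change_type', 'term': 0, 'node_id': 2,
     'old': 'Follower', 'new': 'Candidate'},
    {'event_type': 'PowerDown'},  # no 'log_type' key at all
]
lines = format_log(log)
```

Printing the unrecognized entry also shows which call site is writing entries in the unexpected format.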
Your code has been rated at -0.75/10
Tell me how you really feel pylint
Right now, we calculate election timeouts using:
self.rng.randint(self.conf['election_timeout_window'][0],
                 self.conf['election_timeout_window'][1])
We might get better test shrinking if we have hypothesis supply this randomness, similar to how we have hypothesis supply the randomness for message delays.
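One way to make the timeout source swappable is to inject a "draw an integer" callable instead of hard-coding `self.rng`. This is a sketch under assumptions: the trimmed-down `Node`, the `draw_int` parameter, and the 150 to 300 window are invented for illustration.

```python
import random


class Node:
    """Hypothetical, trimmed-down node; only the timeout logic is shown."""

    def __init__(self, conf, draw_int):
        self.conf = conf
        self.draw_int = draw_int  # callable (low, high) -> int, inclusive

    def calculate_election_timeout(self):
        low, high = self.conf['election_timeout_window']
        return self.draw_int(low, high)


conf = {'election_timeout_window': (150, 300)}  # made-up window

# Current behaviour: a seeded RNG supplies the value.
node = Node(conf, random.Random(0).randint)
timeout = node.calculate_election_timeout()

# In a Hypothesis test the callable could instead wrap a drawn value, e.g.
#   lambda low, high: data.draw(st.integers(min_value=low, max_value=high))
# where `data` comes from the st.data() strategy.
```

Since Hypothesis shrinks integers toward small values, drawn timeouts would tend to shrink toward the low end of the window, which may or may not produce more readable failing cases.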
Need to leave for dinner, but it turns out that if you make this change:
- self.catastrophy_level = 0
+ self.catastrophy_level = 1
Hypothesis errors out with:
============================================================================ FAILURES ============================================================================
________________________________________________________________________ TestSet.runTest _________________________________________________________________________
self = fixed_dictionaries({'affected_nodes': sets(elements=sampled_from(range(0, 5))),
'delay': integers(min_value=1, max_va...unt': integers(min_value=-100, max_value=100),
'start_time': integers(min_value=0, max_value=400)}).flatmap(ClockSkew)
def accept(self):
if not hasattr(self, cache_key):
try:
> setattr(self, cache_key, getattr(self, force_key))
E AttributeError: 'OneOfStrategy' object has no attribute 'force_is_empty'
venv/lib/python3.6/site-packages/hypothesis/searchstrategy/strategies.py:102: AttributeError
During handling of the above exception, another exception occurred:
self = fixed_dictionaries({'affected_nodes': sets(elements=sampled_from(range(0, 5))),
'delay': integers(min_value=1, max_va... integers(min_value=1, max_value=400),
'start_time': integers(min_value=0, max_value=400)}).flatmap(DeliveryDuplicate)
def accept(self):
if not hasattr(self, cache_key):
try:
> setattr(self, cache_key, getattr(self, force_key))
E AttributeError: 'OneOfStrategy' object has no attribute 'force_is_empty'
venv/lib/python3.6/site-packages/hypothesis/searchstrategy/strategies.py:102: AttributeError
During handling of the above exception, another exception occurred:
self = fixed_dictionaries({'affected_node_pair': (sampled_from(range(0, 5)), sampled_from(range(0, 5))),
'delay': integers(m...gth': integers(min_value=1, max_value=400),
'start_time': integers(min_value=0, max_value=400)}).flatmap(TransmitDrop)
def accept(self):
if not hasattr(self, cache_key):
try:
> setattr(self, cache_key, getattr(self, force_key))
E AttributeError: 'FlatMapStrategy' object has no attribute 'force_is_empty'
venv/lib/python3.6/site-packages/hypothesis/searchstrategy/strategies.py:102: AttributeError
During handling of the above exception, another exception occurred:
self = fixed_dictionaries({'affected_node_pair': (sampled_from(range(0, 5)), sampled_from(range(0, 5))),
'delay': integers(m...alue=150),
'event_length': integers(min_value=1, max_value=400),
'start_time': integers(min_value=0, max_value=400)})
def accept(self):
if not hasattr(self, cache_key):
try:
> setattr(self, cache_key, getattr(self, force_key))
E AttributeError: 'LazyStrategy' object has no attribute 'force_is_empty'
venv/lib/python3.6/site-packages/hypothesis/searchstrategy/strategies.py:102: AttributeError
During handling of the above exception, another exception occurred:
self = <hypothesis.stateful.WorldBroker.TestCase testMethod=runTest>
def runTest(self):
> run_state_machine_as_test(state_machine_class)
venv/lib/python3.6/site-packages/hypothesis/stateful.py:191:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
venv/lib/python3.6/site-packages/hypothesis/stateful.py:104: in run_state_machine_as_test
breaker = find_breaking_runner(state_machine_factory, settings)
venv/lib/python3.6/site-packages/hypothesis/stateful.py:90: in find_breaking_runner
database_key=state_machine_factory.__name__.encode('utf-8')
venv/lib/python3.6/site-packages/hypothesis/core.py:800: in find
runner.run()
venv/lib/python3.6/site-packages/hypothesis/internal/conjecture/engine.py:320: in run
self._run()
venv/lib/python3.6/site-packages/hypothesis/internal/conjecture/engine.py:564: in _run
self.reuse_existing_examples()
venv/lib/python3.6/site-packages/hypothesis/internal/conjecture/engine.py:543: in reuse_existing_examples
self.test_function(data)
venv/lib/python3.6/site-packages/hypothesis/internal/conjecture/engine.py:125: in test_function
self._test_function(data)
venv/lib/python3.6/site-packages/hypothesis/core.py:771: in template_condition
success = condition(result)
venv/lib/python3.6/site-packages/hypothesis/stateful.py:71: in is_breaking_run
runner.run(state_machine_factory())
venv/lib/python3.6/site-packages/hypothesis/stateful.py:243: in run
value = self.data.draw(state_machine.steps())
venv/lib/python3.6/site-packages/hypothesis/internal/conjecture/data.py:112: in draw
return strategy.do_draw(self)
venv/lib/python3.6/site-packages/hypothesis/searchstrategy/lazy.py:154: in do_draw
return data.draw(self.wrapped_strategy)
venv/lib/python3.6/site-packages/hypothesis/searchstrategy/lazy.py:104: in wrapped_strategy
*self.__args, **self.__kwargs
venv/lib/python3.6/site-packages/hypothesis/strategies.py:454: in lists
if elements.is_empty:
...
This is in the branch test-snapshot at 56fc771.
If we run with cat = 100, we get a failure every time (that I've seen). Here's the set of adverse events from one run:
Step #1: {'adverse_events': [<events.PowerDown at 0x10b24d128>,
<events.TransmitDrop at 0x10b24d0b8>,
<events.ReceiveDrop at 0x10b35b0f0>,
<events.ClockSkew at 0x10b31b358>,
<events.PowerDown at 0x10b376860>],
{'event_type': 'Simulation Initialization'}
{'affected_node_pair': (2, 1), 'delay': 1, 'event_length': 621, 'start_time': 0, 'event_type': 'TransmitDrop', 'global_time': 0}
{'affected_node': 2, 'event_length': 1, 'skew_amount': 57, 'start_time': 0, 'event_type': 'ClockSkew', 'global_time': 0}
{'affected_nodes': {0}, 'delay': 1, 'event_length': 621, 'start_time': 0, 'event_type': 'ReceiveDrop', 'global_time': 0}
{'affected_node': 3, 'event_length': 275, 'start_time': 0, 'event_type': 'PowerDown', 'global_time': 0}
{'affected_node': 3, 'start_time': 275, 'event_type': 'StopPowerDown', 'global_time': 275}
{'affected_node': 2, 'start_time': 275, 'data': <message.RequestVoteResponse object at
{'affected_node': 3, 'event_length': 259, 'start_time': 362, 'event_type': 'PowerDown', 'global_time': 362}
{'affected_node': 3, 'start_time': 621, 'event_type': 'StopPowerDown', 'global_time': 621}
This simulation runs for 700ms. For 621ms, node 0 cannot receive messages and there's a problem with nodes 2 and 1 communicating with each other.
With just those two events, only nodes 3 and 4 could possibly be elected leader. In addition to those two events, node 3 is powered down from 0 to 275 and, going by the log above, from 362 to 621 (a 259ms event starting at 362). If we handle the two powerdown events correctly (do we?), node 3 is only up between 275 and 362 and is otherwise unavailable until 621. If node 3 doesn't win an election in that window, only node 4 could be elected leader, but there's no guarantee that node 4 will go up for election in the first 700ms, and in fact in this particular log node 4 never becomes a candidate so the test fails.
Also, it's not clear how we have events of duration 621 when we have max_ms_per_event=400.
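To double-check when a node is actually available, the power-down events can be turned into intervals, merged, and inverted. The sketch below uses `start_time` and `event_length` exactly as they appear in the log for node 3; the helper names are hypothetical and this isn't how the simulator itself tracks power state.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def up_windows(down_intervals, sim_end):
    """Return the (start, end) windows in [0, sim_end) when the node is up."""
    windows = []
    cursor = 0
    for start, end in merge_intervals(down_intervals):
        if start > cursor:
            windows.append((cursor, min(start, sim_end)))
        cursor = max(cursor, end)
    if cursor < sim_end:
        windows.append((cursor, sim_end))
    return windows


# Node 3's PowerDown events from the log: (start_time, start_time + event_length).
node3_down = [(0, 0 + 275), (362, 362 + 259)]
node3_up = up_windows(node3_down, sim_end=700)
```

Running the same helper over every node's drop and power-down events would give a quick picture of which nodes could plausibly win an election in a given run.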
Hello,
This isn't really an issue. I'd just like to change one line of README.md to
pip install -r requirements.txt
This would be clearer for some people, thanks.
The update_term code contains
self.term = 0
self.log = [] # list[tuple(term, entry)]
self.commit_index = 0
self.last_applied = 0
self.voted_for = None
self.node_type = 'Follower'
self.votes_received = set()
self.election_timeout = self.calculate_election_timeout()
This seems like it can't be right. Maybe it was supposed to be attached to some kind of initialization function? Jinny and I are going to remove this. Please let us know if there's some reason this or something like it should be there.
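For comparison, here's roughly what a term update looks like under the standard Raft rule for seeing a higher term from another node: adopt the term, clear the vote, and step down, without touching the log or the commit state. This is a sketch with a hypothetical, trimmed-down `Node`, not a drop-in replacement; in particular, a candidate bumping its own term at the start of an election would need a separate path.

```python
class Node:
    """Hypothetical node holding only the fields relevant to term updates."""

    def __init__(self):
        self.term = 0
        self.log = []  # list[tuple(term, entry)]
        self.voted_for = None
        self.node_type = 'Follower'
        self.votes_received = set()

    def update_term(self, new_term):
        """Adopt a higher term seen from another node; return True if changed."""
        if new_term <= self.term:
            return False
        self.term = new_term
        self.voted_for = None        # votes are per-term
        self.node_type = 'Follower'  # a stale leader or candidate steps down
        self.votes_received = set()
        # self.log is deliberately left alone.
        return True


node = Node()
node.log.append((0, 'x'))
node.node_type = 'Leader'
changed = node.update_term(3)
```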
Jinny and I looked at this and didn't know what the expected mechanism for making time pass is.
We tried adding this check in teardown:
if self.catastrophy_level == 0:
    # self.execute_step(20)
    # TODO: this check should be stronger.
    # TODO: heal before checking for other catastrophy levels.
    assert len(self.leaders_history) > 0
This check fails because, with catastrophy level 0, we execute for 0 time and then the test ends, so we don't have a leader.
There are a few ways we could fix this, but we're not sure if any of them conflicts with the current intent of the code.
We fixed this by adding __lt__ and __eq__ on Event. This works, but it's quite dangerous because we broke object equality, so any future use of equality is potentially confusing:
def __lt__(self, other):
    return self.event_map['start_time'] < other.event_map['start_time']

def __eq__(self, other):
    # WARNING: this completely breaks object equality.
    # In Python 3 it also makes Event unhashable unless __hash__ is defined too.
    return self.event_map['start_time'] == other.event_map['start_time']
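If the comparisons are only needed to order events in a priority queue (an assumption; the original problem isn't spelled out here), an alternative is to leave `Event` alone and push `(start_time, counter, event)` tuples. The unique counter breaks ties, so two events are never compared directly. `EventQueue` and the minimal `Event` below are hypothetical.

```python
import heapq
import itertools


class Event:
    """Hypothetical minimal event; keeps default identity-based equality."""

    def __init__(self, event_map):
        self.event_map = event_map


class EventQueue:
    """Orders events by start_time, then by insertion order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, event):
        # The unique counter guarantees tuple comparison never reaches `event`.
        key = (event.event_map['start_time'], next(self._counter))
        heapq.heappush(self._heap, key + (event,))

    def pop(self):
        return heapq.heappop(self._heap)[-1]


a = Event({'start_time': 5})
b = Event({'start_time': 0})
c = Event({'start_time': 5})
queue = EventQueue()
for event in (a, b, c):
    queue.push(event)
order = [queue.pop() for _ in range(3)]
```

This keeps `==` meaning "same object" for events, at the cost of wrapping every push and pop.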
If we look at a trace of node state transitions, we see that a node goes from candidate to follower to leader. It should probably not become a follower in between the candidate and leader states:
src/world_broker.py execute_step
timer_trip 2
change_type 2: Follower -> Candidate
change_type 2: Candidate -> Follower
change_type 4: Follower -> Follower
change_type 3: Follower -> Follower
change_type 2: Follower -> Leader
change_type 1: Follower -> Follower
change_type 2: Leader -> Leader
change_type 0: Follower -> Follower
change_type 2: Leader -> Leader
I'm going to remove bin and include, which appear to have virtualenv stuff that will only work if you are using a mac and your name is bc :-). Let me know if I'm reading this incorrectly and that stuff should not be removed.
In the diagram below, blue indicates that a node is down and brown/red indicates a candidate going up for election:
Any of nodes 0, 3, or 4 could theoretically become a leader. However, in the first region of the diagram, nodes 3 and 4 both become candidates at the same time and split the vote so that neither can become leader.
The next time around, 3 and 4 again both go up for election at the same time. Immediately afterwards, there's a 1ms outage in node 1, which prevents either 3 or 4 from becoming leader.
Given these events, it's "correct" that no leader is elected. It's suspicious that nodes 3 and 4 both have an election timeout of 211 twice in a row, so perhaps we have a bug there, but even if there's a bug there and we fix it, that doesn't prevent this case from happening.
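One guess at how two nodes could draw the same timeout twice in a row is that their RNGs start from the same seed. That's unconfirmed for this codebase; the sketch below only demonstrates the effect with Python's `random.Random`, using a made-up seed and window.

```python
import random


def draw_timeouts(seed, count, window=(150, 300)):
    """Draw `count` election timeouts from a freshly seeded RNG."""
    rng = random.Random(seed)
    return [rng.randint(window[0], window[1]) for _ in range(count)]


# Two "nodes" whose RNGs were seeded identically draw identical sequences.
node3 = draw_timeouts(seed=7, count=2)
node4 = draw_timeouts(seed=7, count=2)
```

If that turns out to be the cause, deriving each node's seed from its node id would be one way to keep runs reproducible while making the sequences differ.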
In order to get the code to run without errors, we (Brennan, Jinny, and I) added some checks around pulling values out of self.log and then added some defaults.
last_logged_term = -1
last_logged_entry = None
if len(self.log) > 0:
    last_logged_term = self.log[-1][0]
    last_logged_entry = self.log[-1][1]
This was in code none of us wrote and we were focused on other stuff and didn't read this code closely to make sure that the code makes sense. It's possible/likely that this change makes the code run but introduces a bug.
Related question: does the code in the log work if it gets passed a None?
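The empty-log defaults could live in one helper so that every caller gets the same sentinel values. A small sketch, assuming log entries are `(term, entry)` tuples; the helper name is hypothetical, and whether `-1` and `None` are the right sentinels is exactly the open question above.

```python
def last_logged(log):
    """Return (term, entry) for the last log item, or (-1, None) if empty."""
    if log:
        term, entry = log[-1]
        return term, entry
    return -1, None


empty_result = last_logged([])
full_result = last_logged([(1, 'a'), (2, 'b')])
```

Anything that receives the entry from this helper then has to handle `None` explicitly.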