I was reading up on Raft, an understandable consensus algorithm. The link to the paper is here. Only continue reading if you’ve read the paper.
Right here The paper states that
,
A Log entry is committed once the leader that created the entry has
replicated it on a majority of the servers.
I misunderstood this statement to assume that whenever an entry is present on a majority of the servers, it is committed. This assumption doesn’t hold.
On Page-8 of the raft paper and in Figure-8
,
In Figure-8.c, It is Term 4 and S1 is leader. Even though the entry at Index 2 Term 2 is present on a majority of the servers, it is entirely possible that that entry can be wiped out as shown in Figure-8.d. In the next Term i.e. Term 5 (whenever new leader is elected term increases) S5 could overwrite entries at indices >= 2 with its own Term 3 entry (because of AppendEntries RPC #3)
If an existing entry conflicts with a new one(same index but different terms),
delete the existing entry and all that follow it.
In order for this to not happen, the leader in the current term should replicate an entry that it received in the current term on a majority of the servers. In our case, S1 needs to replicate the entry it received in Term 4 on a majority of the servers. After doing that, if S1 crashes before committing the entries, S5 cannot be elected leader and hence won’t overwrite the entries with indices >= 2. The RequestVote RPC would fail for S5 as RequestVoteRPC #2 would not hold, S5’s logs aren’t up-to-date with S1,S2,S2 (i.e. a majority) and hence S5 cannot get votes from a majority of the servers and can’t be elected as leader.
There are two subtle learnings that I could discern -