In this lesson, we mainly explain "how to thoroughly understand the Paxos algorithm"?
The Paxos algorithm holds a very important position in the distributed field. Mike Burrows, the author of the open-source distributed lock component Google Chubby, once said that there is only one consistency algorithm in the world, which is the Paxos algorithm; all other algorithms are inferior products.
Although the Paxos algorithm is important, it is also famous for its complexity. However, the Paxos algorithm is a necessary knowledge point for learning distributed systems. In this lesson, we will face the challenge and learn about the Paxos algorithm together.
Quorum Mechanism#
Before learning the Paxos algorithm, let's first look at the Quorum election algorithm in distributed systems. The Quorum mechanism can be seen in various consistency algorithms, and its main mathematical idea comes from the pigeonhole principle. In simple terms, if there are N replicas and W of them successfully update at once, then when I read data, I need to read from more than N-W replicas to ensure that I can read at least one updated piece of data.
Corresponding to the Quorum mechanism is WARO, which stands for Write All Read One. It is a simple replica control protocol where a write operation is considered successful only when all replicas have successfully updated; otherwise, it is deemed a failure.
WARO prioritizes read service because only when all replicas have successfully updated can it be considered a successful update, thus ensuring
all replicas are consistent. In this case, any replica's data can be read. The availability of write service is lower because if even one replica fails to update, the write operation is considered a failure. Assuming there are N replicas, if N-1 are down, the remaining replica can still provide read service; however, if one replica is down, the write service will not succeed.
WARO sacrifices the availability of update service to maximize the availability of read service, while Quorum is a compromise between update service and read service.
Quorum Definition#
The definition of Quorum is as follows: Assuming there are N replicas, an update operation wi is considered successful only after it has successfully updated in W replicas. The data corresponding to this successfully submitted update operation is called "successfully submitted data." For read operations, at least R replicas need to be read to access the updated data, where W + R > N, meaning W and R overlap. Generally, W + R = N + 1.
N = the number of data replicas stored
W = the number of replicas required for a successful update
R = the number of replicas accessed for reading a data object
Quorum specifies that at least N + 1 - W replicas must be read, which sounds a bit abstract. For example, if we maintain 10 replicas and successfully update three at once, we need to read data from at least eight replicas to ensure we read the latest data.
Application of Quorum#
The Quorum mechanism cannot guarantee strong consistency, meaning it cannot ensure that any user or node can read the most recently successfully submitted replica data at any time.
The use of the Quorum mechanism requires a metadata service to obtain the latest successfully submitted version number, which can determine the latest successfully submitted version number, allowing the confirmation of the latest written data from the already read data.
Quorum is a commonly used mechanism in distributed systems to ensure data redundancy and the voting algorithm for eventual consistency. The Quorum mechanism can be seen in algorithms such as Paxos, Raft, and ZooKeeper's Zab.
Roles and Interactions of Paxos Nodes#
Having understood the Quorum mechanism, we will now learn the Paxos algorithm, starting with the roles and interactions of nodes in the Paxos algorithm.
Roles of Paxos Nodes#
In the Paxos protocol, there are three types of node roles: Proposer, Acceptor, and Learner. Additionally, there is a Client, which acts as the originator of proposals.
The above three roles are only a logical division; in practice, a node can simultaneously act as all three roles.
Proposer#
There can be multiple Proposers. At the beginning of the process, a Proposer puts forward a proposal, which is the value. The so-called value can be any operation in engineering, such as "changing the value of a certain variable to a new value." In the Paxos protocol, these operations are uniformly abstracted as value.
Different Proposers can propose different or even contradictory values. For example, one Proposer proposes "set variable X to 1," while another Proposer proposes "set variable X to 2." However, for the same round of the Paxos process, at most one value can be approved.
Acceptor#
In the cluster, there are N Acceptors, which are completely equal and independent. The value proposed by the Proposer must be approved by more than half (N/2 + 1) of the Acceptors to pass.
Learner#
Learners do not participate in elections but learn the approved values. In Paxos, Learners mainly participate in the related state machine synchronization process.
Here, the Learner's process refers to the Quorum parliamentary mechanism, where a value needs to obtain approval from W = N/2 + 1 Acceptors. The Learner needs to read the results from at least N/2 + 1 Acceptors and at most N Acceptors to learn a passing value.
Client#
The Client role, as the originator of proposals, does not actually participate in the election process, such as the source of modification requests.
Interaction Between Proposer and Acceptor#
In Paxos, the Proposer and Acceptor are the core roles of the algorithm. Paxos describes how multiple Acceptors reach consensus on various proposals made by the Proposer in a system composed of multiple Proposers and Acceptors, while the Learner merely "learns" the final approved proposal.
The interaction between Proposer and Acceptor mainly involves 4 types of message communication, as shown in the figure below:
These 4 types of messages correspond to the two phases and four processes of the Paxos algorithm, which will be discussed when analyzing the election process.
Paxos Election Process#
The election process can be divided into two parts: the preparation phase and the election phase. You can view the sequence diagram below:
Phase 1: Preparation Phase#
The Proposer generates a globally unique and incrementing ProposalID and sends a Prepare request to all machines in the Paxos cluster. This request does not carry a value, only N, which is the ProposalID.
Upon receiving the Prepare request, the Acceptor checks whether the received ProposalID is greater than all previously responded proposals' N. If it is, then:
-
Persist N locally, which can be recorded as Max_N;
-
Reply to the request, including the value of the proposal with the largest N that has already been accepted. If there are no accepted proposals at this time, return an empty value;
-
Make a commitment not to accept any proposals smaller than Max_N.
If not, do not reply or reply with an error.
Phase 2: Election Phase#
For convenience, we further divide Phase 2 into P2a, P2b, and P2c.
P2a: Proposer Sends Accept#
After a period of time, the Proposer collects some Prepare replies, and there are several scenarios:
-
If the number of replies > half of the number of Acceptors, and all replied values are empty, then the Proposer sends an accept request with its specified value.
-
If the number of replies > half of the number of Acceptors, and some replied values are not empty, then the Proposer sends an accept request with the value that has the largest ProposalID from the replies as its proposal content.
-
If the number of replies <= half of the number of Acceptors, then attempt to generate a larger ProposalID and return to the preparation phase.
P2b: Acceptor Responds to Accept#
Upon receiving the Accept request, the Acceptor determines:
-
If the received N >= Max_N (generally equal), then reply that the submission is successful and persist N and value;
-
If the received N < Max_N, then do not reply or reply with submission failure.
P2c: Proposer Counts Votes#
After a period of time, the Proposer collects some Accept replies indicating successful submissions, such as:
-
When the number of replies > half of the number of Acceptors, it indicates that the submission of the value is successful. At this point, a broadcast can be sent to all Proposers and Learners to notify them of the committed value;
-
When the number of replies <= half of the number of Acceptors, attempt to generate a larger ProposalID and return to the preparation phase.
-
When a submission failure reply is received, attempt to generate a larger ProposalID and also return to the preparation phase.
Common Questions about Paxos#
There are several common questions regarding the Paxos protocol, which will be briefly introduced.
1. If less than half of the Acceptors fail, how does it operate normally?
In the Paxos process, if less than half of the Acceptors fail, it can be divided into two scenarios:
The first scenario is that if less than half of the Acceptors fail before the final value is determined, all Proposers will compete to propose again, and eventually, one proposal will succeed in submission.
The second scenario is that if less than half of the Acceptors fail after the final value is determined, all Proposers must submit using the final value, meaning the value has already taken effect and can be obtained without further modification.
2. What is the significance of Acceptors needing to accept a larger N, i.e., ProposalID?
This mechanism prevents one Proposer from crashing and causing blocking issues, allowing other Proposers to use larger ProposalIDs to seize temporary access rights.
3. How is a unique number, i.e., ProposalID, generated?
In the paper "Paxos Made Simple," it is mentioned that unique numbering allows all Proposers to choose from non-overlapping data sets, ensuring that there are no duplicates among different Proposers. For example, if there are 5 Proposers in the system, each Proposer can be assigned an identifier j (0~4), so that each Proposer's proposal number can be 5*i + j, where i indicates the number of times a proposal has been made.
Summary#
This lesson shared knowledge related to the Paxos protocol. Paxos is a classic distributed protocol, and understanding it will make learning other distributed protocols much easier.
The more important aspect of the Paxos algorithm is understanding the process, rather than memorizing each step. Besides what is introduced in the text, there are many related branch judgments and selection scenarios. If you wish to learn more about the derivation and proof related to the Paxos algorithm, I have attached links to several papers related to Paxos at the end. Interested students can study them:
"Java Engineer High Salary Training Camp"
Practical training + interview simulation + internal recommendation from big companies. If you want to improve your technical skills and get a high salary in a big company, click the link to improve yourself!
Selected Comments#
*Xin:#
If the number of replies > half of the number of Acceptors, and some replied values are not empty, then the Proposer sends an accept request with the value that has the largest ProposalID from the replies as its proposal content.
I don't quite understand what it means to use it as one's own proposal content.
Instructor's reply:
After voting, if it is found that other nodes have unique IDs exceeding one's own, abandon the local operation and choose the proposal from other nodes.
**Lu:#
Teacher, I have a question: if a Proposer generates a ProposalID once, will it not generate a larger ProposalID? If so, wouldn't the last Proposer's generated ProposalID always be the largest, which would not prevent this Proposer from crashing and causing blocking?
Instructor's reply:
The ProposalID generated by the Proposer is based on locally recorded information and is a single-point maximum.
**Wen:#
What happens if more than half of the Acceptors fail? How is it handled?
Instructor's reply:
The algorithm will not work; specific implementations depend on the engineering of each company.
**9904:#
Not bad, worth collecting.
*Xing:#
I have learned a lot; the election problem! This is a basic algorithm, and every company's product is customized and optimized based on this foundation.
Sang:#
In the preparation phase, when the Acceptor receives the Prepare request, does it reply to all Proposers or only to the one who requested?
Instructor's reply:
Reply to the proposer of the proposal.
Sang:#
Hello, teacher. The text mentions that multiple Proposers' proposals may differ, and if a proposal is not accepted by more than half, the Proposer will generate a larger ProposalID. How can it be confirmed which Proposers' values need to be preserved? Could an incorrect value be selected by generating a larger ProposalID?
Instructor's reply:
Paxos should not have malicious nodes; it can be compared to the Byzantine problem.