|
|
Line 1: |
Line 1: |
| A fundamental problem in [[distributed computing]] is to achieve overall system reliability in the presence of a number of faulty processes. This often requires processes to agree on some data value that is
| | The author is known as Araceli Gulledge. The thing I adore most bottle tops gathering and now I have time to take on new things. Some time in the past he selected to live in Idaho. Interviewing is what I do in my working day job.<br><br>my web page - [http://www.duelingkitchens.com.au/portfolio/do-it-yourself-auto-repair-tricks-and-tips/ www.duelingkitchens.com.au] |
| needed during computation. Examples of applications of '''consensus''' include whether to commit a transaction to a database, agreeing on the identity of a [[Leader election|leader]], [[state machine replication]], and [[atomic broadcast]]s.
| |
| | |
| == Problem description ==
| |
| The consensus problem requires agreement among a number of processes for a single data value. Some of the processes may fail or be unreliable in other ways, so consensus protocols must be [[fault tolerant]]. The processes must somehow put forth their candidate values, communicate with one another, and agree on a single consensus value.
| |
| | |
| One approach to generating consensus is for all processes to agree on a majority value. For n processes, a majority value will require at least n/2 + 1 votes to win. However one or more faulty processes may skew the resultant outcome such that consensus may not be reached or reached incorrectly.
| |
| | |
| Protocols that solve consensus problems are designed to deal with limited numbers of faulty [[Process (computing)|processes]]. These protocols must satisfy a number of requirements to be useful. For instance a trivial protocol could have all processes output binary value 1. This is not useful and thus the requirement is modified such that the output must somehow depend on the input. That is, the output value of a consensus protocol must
| |
| be the input value of some process. Another requirement is that a process may decide upon and output a value only once and this decision is irrevocable. A process is called correct in an execution if it does not experience a failure. A consensus protocol tolerating halting failures must satisfy the following properties
| |
| | |
| ;Termination: Every correct process decides some value.
| |
| ;Validity: If all processes propose the same value <math>v</math>, then all correct processes decide <math>v</math>.
| |
| ;Integrity: Every correct process decides at most one value, and if it decides some value <math>v</math>, then <math>v</math> must have been proposed by some process.
| |
| ;Agreement: Every correct process must agree on the same value.
| |
| | |
| A protocol that can correctly guarantee consensus amongst n processes of
| |
| which at most t fail is said to be t-resilient.
| |
| | |
| In evaluating the performance of consensus protocols two factors of interest are running time and message complexity. Running time is given in [[Big O notation]] in the number of rounds of message exchange as a function of some input parameters (typically the number of processes and/or the size of the input domain). Message complexity refers to the amount of message traffic that is generated by the protocol. Other factors may include memory usage and the size of messages.
| |
| | |
| ==Models of computation==
| |
| There are two types of failures a process may undergo, a crash failure, or
| |
| a [[Byzantine failure]]. A crash failure occurs when a process abruptly stops
| |
| and does not resume. Byzantine failures are failures in which absolutely no
| |
| conditions are imposed. For example they may occur as a result of the
| |
| malicious actions of an adversary. A process that experiences a Byzantine failure may send contradictory or
| |
| conflicting data to other processes, or it may also sleep and then resume
| |
| activity after a lengthy delay. Of the two types of failures, Byzantine
| |
| failures are far more disruptive.
| |
| Thus a consensus protocol tolerating Byzantine failures must be resilient to every possible error that
| |
| can occur.
| |
| | |
| A stronger version of consensus tolerating Byzantine failures is given below
| |
| | |
| ;Termination:Every correct process decides some value.
| |
| ;Validity:If all correct processes propose the same value v, then all correct processes decide v.
| |
| ;Integrity:If a correct process decides v, then v must have been proposed by some correct process.
| |
| ;Agreement:Every correct process must agree on the same value.
| |
| | |
| Varying models of computation may define a consensus problem. Some models may deal with fully connected graphs while others may deal with rings and trees. Asynchronous versus synchronous models for message passing may be considered. In some models message authentication is allowed whereas in others processes are completely anonymous. Shared memory models in which processes communicate by accessing objects in shared memory are also an important area of research.
| |
| | |
| A special case of the consensus problem called binary consensus restricts the input and hence the output domain to a single binary digit {0,1}. When the input domain is large relative to the number of processes, for instance an input set of all the natural numbers, it can be shown that consensus is impossible in a synchronous message passing model.
| |
| | |
| While real world communications are often inherently asynchronous it is more practical and useful to model synchronous systems.<ref>{{cite journal|last=Aguilera|first=Marcos|title=Stumbling over Consensus Research: Misunderstandings and Issues|journal=Lecture Notes in Computer Science|year=2010|volume=5959|pages=59–72|doi=10.1007/978-3-642-11294-2_4}}</ref> In a fully asynchronous message-passing distributed system in which one process may have a halting failure, it has been proved that consensus is impossible.<ref>{{cite journal|last=Fischer|first=Michael J|coauthors=Nancy A. Lynch, Michael S. Paterson|title=Impossibility of distributed consensus with one faulty process|journal=Journal of the ACM|date=April 1985|volume=32|issue=2|doi=10.1145/3149.214121|url=http://dl.acm.org/citation.cfm?id=214121|accessdate=19 December 2011}}</ref> However, this impossibility result derives from a worst case scenario of a process schedule which is highly unlikely. In reality, process scheduling has a degree of randomness.<ref>{{cite journal|last=Aguilera|first=Marcos|title=Stumbling over Consensus Research: Misunderstandings and Issues|journal=Lecture Notes in Computer Science|year=2010|volume=5959|pages=59–72|doi=10.1007/978-3-642-11294-2_4}}</ref>
| |
| | |
| In an asynchronous model some forms of failures can be handled by a synchronous consensus protocol. For instance, the loss of a communication link may be modeled as a process which has suffered a Byzantine failure.
| |
| | |
| In synchronous systems it is assumed that all communications proceed in rounds. In one round a process may send all the messages it requires while receiving all messages from other processes. In this manner no message from one round may influence any messages sent within the same round.
| |
| | |
| ==Equivalency of agreement problems==
| |
| | |
| Three agreement problems of interest are as follows.
| |
| | |
| ===Terminating Reliable Broadcast===
| |
| {{Main|Terminating Reliable Broadcast}}
| |
| | |
| A collection of ''n'' processes, numbered from ''0'' to ''n - 1,'' communicate by sending messages to one another. Process ''0'' must transmit a value ''v'' to all processes such that:
| |
| | |
| # if process ''0'' is correct, then every correct process receives ''v''
| |
| # for any two correct processes, each process receives the same value.
| |
| | |
| It is also known as The General's Problem.
| |
| | |
| ===Consensus===
| |
| Formal requirements for a consensus protocol may include:
| |
| | |
| * ''Agreement'': All correct processes must agree on the same value.
| |
| * ''Weak validity'': If all correct processes receive the same input value, then they must all output that value.
| |
| * ''Strong validity'': For each correct process, its output must be the input of some correct process.
| |
| * ''Termination'': All processes must eventually decide on an output value
| |
| | |
| ===Weak Interactive Consistency===
| |
| | |
| For ''n'' processes in a partially synchronous system (the system alternates between good and bad periods of synchrony), each process chooses a private value. The processes communicate with each other by rounds to determine a public value and generate a
| |
| consensus vector with the following requirements:<ref>{{cite journal|last=Milosevic|first=Zarko|coauthors=Martin Hutle, Andre Schiper|title=Unifying Byzantine Consensus Algorithms with Weak Interactive Consistency|journal=Principles of Distributed Systems, Lecture Notes in Computer Science|year=2009|volume=5293|pages=300–314|doi=10.1007/978-3-642-10877-8_24}}</ref>
| |
| | |
| # if a correct process sends ''v,'' then all correct processes receive either ''v'' or nothing (integrity property)
| |
| # all messages sent in a round by a correct process are received in the same round by all correct processes (consistency property).
| |
| | |
| It can be shown that variations of these problems are equivalent in that the solution for a problem in one type of model may be the solution for another problem in another type of model. For example, a solution to the Weak Byzantine General problem in a synchronous authenticated message passing model leads to a solution for Weak Interactive Consistency.<ref>{{cite journal|last=L|first=Lamport|title=The Weak Byzantine Generals Problem|journal=Journal of the ACM|date=July 1983|volume=30|issue=3|doi=10.1145/2402.322398|url=http://dl.acm.org/citation.cfm?id=322398|accessdate=19 December 2011}}</ref> An interactive consistency algorithm can solve the consensus problem by having each process choose the majority value in its consensus vector as its consensus value.<ref>{{cite web|last=Fischer|first=Michael J|title=The Consensus Problem in Unreliable Distributed Systems (A Brief Survey)|url=http://www.cs.yale.edu/publications/techreports/tr273.pdf|accessdate=19 December 2011}}</ref>
| |
| | |
| ==Solvability results for some agreement problems==
| |
| | |
| There is a t-resilient anonymous synchronous protocol which solves the [[Byzantine Generals problem]]
| |
| ,<ref>{{cite journal|last=Lamport|first=L|coauthors=R. Shostak, M. Pease|title=The Byzantine Generals Problem|journal=ACM Transactions on Programming Languages and Systems|date=July 1982|volume=4|series=3|pages=382–401|doi=10.1145/357172.357176}}</ref><ref>{{cite journal|last=Lamport|first=Leslie|coauthors=Marshall Pease, Robert Shostak|title=Reaching Agreement in the Presence of Faults|journal=Journal of the ACM|date=April 1980|volume=27|issue=2|pages=228–234|doi=10.1145/322186.322188|pmid=|url=http://research.microsoft.com/users/lamport/pubs/reaching.pdf|accessdate=2007-07-25}}</ref> problem iff t/n < 1/3 and the Weak Byzantine Generals case <ref>{{cite journal|last=L|first=Lamport|title=The Weak Byzantine Generals Problem|journal=Journal of the ACM|date=July 1983|volume=30|issue=3|doi=10.1145/2402.322398|url=http://dl.acm.org/citation.cfm?id=322398|accessdate=19 December 2011}}</ref> where t is the number of failures and n is the number of processes.
| |
| | |
| For a system of 3 processors with one of them Byzantine, there is no solution for the consensus problem in a synchronous message passing model with binary inputs''.<ref>{{cite book|last=Hagit|first=Attiya|title=Distributed Computing 2nd Ed|year=2004|publisher=Wiley|isbn=978-0-471-45324-6|pages=101–103}}</ref>
| |
| | |
| In a fully asynchronous system there is no consensus solution that can tolerate one or more crash failures even when only requiring the non triviality property
| |
| .<ref>{{cite journal|last=Fischer|first=Michael J|coauthors=Nancy A. Lynch, Michael S. Paterson|title=Impossibility of distributed consensus with one faulty process|journal=Journal of the ACM|date=April 1985|volume=32|issue=2|doi=10.1145/3149.214121}}</ref> This result is sometimes called the FLP impossibility proof. The authors [[Michael J. Fischer]], [[Nancy Lynch]], and [[Mike Paterson]] were awarded a [[Dijkstra Prize]] for this significant work. The FLP result does not state that consensus can never be reached: merely that under the model's assumptions, no algorithm can always reach consensus in bounded time. In practice it is highly unlikely to occur.
| |
| | |
| == Some consensus protocols ==
| |
| An example of a polynomial time binary consensus protocol that tolerates Byzantine failures is the Phase King algorithm <ref>{{cite journal|last=Berman|first=Piotr|coauthors=Juan A. Garay|title=Cloture Votes: n/4-resilient Distributed Consensus in t + 1 rounds|journal=Theory of Computing Systems|volume=26|series=2|pages=3–19|doi=10.1007/BF01187072|url=http://www.springerlink.com/content/gx21355767268411/|accessdate=19 December 2011}}</ref> by Garay and Berman. The algorithm solves consensus in a synchronous message passing model with n processes and up to f failures, provided n>4f.
| |
| In the phase king algorithm, there are f+1 phases, with 2 rounds per phase.
| |
| Each process keeps track of its preferred output (initially equal to the process's own input value). In the first round of each phase each process broadcasts its own preferred value to all other processes. It then receives the values from all processes and determines which value is the majority value and its count. In the second round of the phase, the process whose id matches the current phase number is designated the king of the phase. The king broadcasts the majority value it observed in the first round and serves as a tie breaker. Each process then updates its preferred value as follows. If the count of the majority value the process observed in the first round is greater than n/2 + f, the process changes its preference to that majority value; otherwise it uses the phase king's value. At the end of f + 1 phases the processes output their preferred values.
| |
| | |
| Google has implemented a distributed lock service library called Chubby.<ref>
| |
| {{cite conference
| |
| | title=The Chubby lock service for loosely-coupled distributed systems
| |
| | author=Burrows, M.
| |
| | booktitle=Proceedings of the 7th [[Symposium on Operating Systems Design and Implementation]]
| |
| | pages=335–350
| |
| | year=2006
| |
| | publisher=USENIX Association Berkeley, CA, USA
| |
| | url=http://labs.google.com/papers/chubby.html
| |
| }}</ref> Chubby maintains lock information in small files which are stored in a replicated database to achieve high availability in the face of failures. The database is implemented on top of a fault-tolerant log layer which is based on the [[Paxos algorithm|Paxos consensus algorithm]]. In this scheme, Chubby clients communicate with the Paxos ''master'' in order to access/update the replicated log; i.e., read/write to the files.<ref>
| |
| {{cite conference
| |
| | first = Tushar
| |
| | last = C.
| |
| | authorlink =
| |
| | coauthors = Griesemer, R; Redstone J.
| |
| | title = Paxos Made Live - An Engineering Perspective
| |
| | booktitle = Proceedings of the Twenty-sixth Annual ACM [[Symposium on Principles of Distributed Computing]]
| |
| | pages = 398–407
| |
| | publisher = ACM Press New York, NY, USA
| |
| | year = 2007
| |
| | location = Portland, Oregon, USA
| |
| | url = http://delivery.acm.org/10.1145/1290000/1281103/p398-chandra.pdf?key1=1281103&key2=4382532021&coll=GUIDE&dl=GUIDE&CFID=15324100&CFTOKEN=95510390
| |
| | doi = 10.1145/1281100.1281103
| |
| | accessdate = 2008-02-06
| |
| }}</ref>
| |
| | |
| [[Bitcoin]] uses [[proof of work]] to maintain consensus in its [[peer-to-peer]] network. Nodes in the bitcoin network attempt to solve a cryptographic proof-of-work problem, where probability of finding the solution is proportional to the computational effort, in hashes per second, expended, and the node that solves the problem has their version of the block of transactions added to the peer-to-peer distributed timestamp server accepted by all of the other nodes. As any node in the network can attempt to solve the proof-of-work problem, a [[Sybil attack]] becomes unfeasible unless the attacker has over 50% of the computational resources of the network.
| |
| | |
| * [[Chandra-Toueg consensus algorithm]]
| |
| * [[Randomized consensus]]
| |
| * [[Raft consensus algorithm]]
| |
| | |
| ==Applications of consensus protocols==
| |
| One important application of consensus protocols is to provide [[synchronization]]. Traditional methods of concurrent access to shared data objects implement some form of [[mutual exclusion]] through locks. However the drawback is if a process dies while in its critical section, other correct processes may never acquire the lock. Thus mutual exclusion is poorly suited to asynchronous fault tolerant systems. A wait-free implementation of a data object supporting concurrent accesses guarantees that any process can complete its execution within a finite number of steps independent of the behavior of other processes. Atomic objects such as read/write registers have been proposed for the implementation of wait free synchronization. However it has been shown that such objects as well as traditional primitives such as test&set and fetch&add cannot be used for such an implementation.<ref>{{cite web|last=Herlihy|first=Maurice|title=Wait-Free Synchronization|url=http://www.cs.brown.edu/~mph/Herlihy91/p124-herlihy.pdf|accessdate=19 December 2011}}</ref>
| |
| | |
| == References ==
| |
| {{Reflist|30em}}
| |
| | |
| ==Further reading==
| |
| *{{cite doi|10.1145/331524.331529}}
| |
| *{{cite doi|10.1137/S0097539796307698}}
| |
| | |
| == See also ==
| |
| * [[Uniform Consensus]]
| |
| * [[Quantum Byzantine agreement]]
| |
| * [[Byzantine fault tolerance]]
| |
| | |
| {{DEFAULTSORT:Consensus (Computer Science)}}
| |
| [[Category:Distributed computing problems]]
| |
| [[Category:Fault-tolerant computer systems]]
| |