Education Series: The Byzantine General’s Problem

 

The Byzantine Generals’ Problem

The Byzantine Generals’ Problem (shortened to BGP for the rest of the post) is a fundamental problem in a distributed computer network. BGP is a derivation of the unsolvable Two General’s Problem, in which, two generals sitting on two opposite sides of an enemy’s camp need to send coordinated attack information to each other through (not around) the enemy’s camp. Therein lies the problem. How do you trust that the information has not been compromised along the way? How do you know the messenger has not been corrupted by the enemy? The BGP, described in more detail below, expands this problem to a Generals-to-Lieutenants configuration.

Blockchain technologies have their own varying solutions to solving this issue. Bitcoin, Ethereum, and their inspired, kindred blockchain protocols are all decentralized peer-to-peer networks. Trust is paramount. No one party owns the ongoings nor decides the actions of the network, so how to ensure the network is running smoothly and honestly. In other articles we will explore how different blockchains solve BGP. Before formally defining BGP, let’s discuss system architecture.



Designing The System Network

When designing a network, designers are faced with three options: distributed, centralized, and decentralized networks. Depending on the goals of the organization, one network design is optimal over another. Below is an illustration of these different network architctures.

types_of_systems

Let’s define each of these systems:

  1. Distributed systems have no single authority. Each of the dots in the diagram is a node, or a network participant. Each node is connected directly to every other node. In non-computer network terms, we can imagine a vote where each voter has equal power to one another.
  2. Centralized systems has a single authority. All nodes obey the commands of the dot colored yellow in the diagram. In non-computer network terms, we can imagine a King or Emperor commanding their nation.
  3. Decentralized systems, like distributed systems, have no single authority. The difference here is there are layers of nodes. The blue nodes, or end nodes, connect to secondary nodes (marked in yellow). The secondary nodes connect to one another. It should be noted there is a high degree of variability in how these systems are constructed but my commentary stays faithful to the image above.


So which design should we choose?

Bootstrapping, Creating, Updating a Network

  • [Highest] Centralized systems can be created quickly. They can also be updated quickly.
  • [Medium] Once you leave Centralized systems, decisions become slower as decisions and actions start to slow down in Decentralized systems.
  • [Low] Distributed systems are by far the slowest.

Maintenance

  • [Easy] Centralized systems are easier to maintain since there is only one point of failure.
  • [Medium] Decentralized have more points of failure but fewer than distributed systems
  • [Hard] Distributed systems are harder to maintain due to a large number of nodes.

Stability / Fault Tolerance

  • [Highest Risk] Centralized systems are vulnerable to a failing leader. In computer networks, if the central server is down the entire network goes with it.
  • [Medium Risk] Decentralized systems are less vulnerable. One secondary node goes down but the network goes on. There are many other secondary nodes to keep fighting the good fight.
  • [Low Risk] Distributed networks are the least vulnerable. Kill one head, and two more grow in its place. Hail Hydra!

Evolving the Network

  1. [Slowest] Centralized systems evolve the slowest since they have one single authority to develop new ideas and incorporate it into the network.
  2. [Medium] Decentralized systems have more points to bring in new data to evolve the network.
  3. [Fastest] Distributed has even more access points than decentralized systems.

Scalability

  1. [Low] Centralized  systems are low scalability options.
  2. [Medium] Decentralized has high scalability with finite capacity.
  3. [High] Distributed systems have infinite scaling limits.


The Byzantine’s General Problem

If you need a refresher on history, the Byzantine Empire was the successor to the Roman Empire. The Romans split their empire in half between the East and West. The Eastern half became the Byzantine Empire. Following in the footstep of their forebears, the Byzantine army has launched a military campaign against a walled city. It’s strategy is to surround the defenders before launching its attacks. The army is comprised of several legions, or divisions, each led by a general and their reporting lieutenants. The lieutenants and generals only communicate via messengers, and legion generals also communicate between each other via messengers. Do you already see the analogy of this structure with the computer network structures up above? If not, don’t worry, we’ll break that analogy down for you later.

Queue the danger music since some of the generals may be traitors. The loyal generals are trying to reach agreement on what to do next. They devise a plan (or in computer science, an algorithm) guarantee two things: 1) all loyal generals agree on the same plan (consensus), and 2) the loyal generals act upon that plan regardless of what the traitor generals do. There are a few constraints we need to place on this plan. The consensus plan should be reasonable. The number of traitor generals should be small enough that they do not corrupt the loyal generals plan(s). All the generals or commanders have to agree upon one of two possible actions: 1) exact time to attack or 2) exact time to retreat. This is high risk because if only one of the armies attacks, the entire army will be defeated.



Screen Shot 2018-04-30 at 10.48.34 AM.png

Diagram from the original Byzantine General’s Problem paper



In the figure above, traitors are filled in with black lines. In Figure 1, a loyal General sends a message to Lieutenant 1 and 2. L2 is a traitor and sends a conflicting message to his peer. L1 does not know who the traitor is: is the General a traitor or is his peer a traitor? In Figure 2, the General is a traitor and sends conflicting messages to L1 and L2. Since both Lieutenants are loyal, L2 sends L1 the message he received from the General. From L1’s point of view, he again does not know who is the traitor: the General or Lieutenant 2? In both scenarios, the army attacks with less than full strength.



Screen Shot 2018-04-30 at 10.53.52 AM.png


Now let’s get closer to finding a solution. We create two new Figures (3 and 4) to expand the BGP above to include an additional Lieutenant (L3) and one more message type. In Figure 1 and 2 we had two message types: “Attack” and “Retreat”. In Figure 3 and 4 we add “Uncertain”, denoted as Z in the figures above. These additions increase the problem’s complexity. It gets even more complicated once we add more Generals, Lieutenants, and message types.

Without going over the math in too much detail, consensus can be reached as long as 2/3 of actors are loyal.  In the figures above, each Lieutenant all have the same orders and consensus is met. Each Lieutenant has X, Y, andas their orders. Let’s also plan for a solution where if messages are missing or the message is Uncertain the default strategy is to retreat. Even if X, Y, and Z are different combinations of retreat, attack, and uncertain then the Lieutenants take on the default action: retreat. Live on to fight to another day.



Byzantine Fault Tolerance

Byzantine Fault Tolerance (BFT) is a trait of a system that can withstand failures (attacking at less than full strength) defined by BGP. BFT is one solution to the infamous BGP. Yet Byzantine Faults are the most severe issues in a variety of network applications. High stakes systems such as nuclear arsenal commands or complex vehicles such as commercial airliners require solutions to BGP. In blockchain technology this is even more pervasive. Bitcoin for example does not have a single authority or central server to issue commands to the rest of the network. This distributed computing environment requires a sophisticated BFT solution.

In Bitcoin, a traitorous General is the same as a node sending unreliable or inconsistent information to the rest of the network. There is no IT department or central authority to correct issues. Absolute trust is needed by every general, or node, on the network. Bitcoin’s attempt to solve BFT through its PoW algorithm has been battle tested since inception. Is it a final solution to the problem? What Bitcoin lacks in speed, it makes up for in security. Blockchains that optimize for speed such as EOS decrease their levels of decentralization to achieve their goals. EOS has yet to be battle tested but passing muster on BFT will be one criteria to judge its success.

Hopefully you learned something new. In a future post we will delve into the concept of Tragedy of the Commons.



Sources:

  1. Byzantine General’s Problem, 1982 Leslie Lamport, Robert Shostak, Marshall Pease
  2. Byzantine Fault Tolerance, Wikipedia


Thank you for coming to the site. Quantalysus publishes blockchain research and analysis for the crypto community. Please follow on Twitter, Steem (please follow and upvote if you can – thanks!)Telegram channel (New!), and Medium to stay up to date.

If you want to earn Aelf (ELF) tokens for just using Twitter and Reddit, sign up for their candy / bounty program.

If you learned something:

Other posts:

4 thoughts on “Education Series: The Byzantine General’s Problem”

Leave a Reply