Formal Proofs in Combinatorics

May 13, 2020  

This text is based on related reading about formal proofs in combinatroics. Mainly two papers, one is about formalization of two central theorems in combinatorics of finite posets [@singh2017formalization], one is the paper published in 2008 about the Four Color Theorem fully in formal proof [@gonthier2008formal]. 1

The goal of this note is to give a short introduction to the formal proofs and its applications in proving some theorems. The first half of the text is devoted to a brief review of how computer programs can do the proving job and some formalization examples in combinatorics. Then we introduce how it applied to prove the Four-Color Theorem and other related lemmas.

This note includs some of my own understandings, so it may be intuitive but not rigorously correct.

Formal Methods and Proofs

In this section, we will explain why we need formal proofs and how a proof assistant (like Coq) works (in a shallow level).


A proof of a proposition is an argument, that can convince anyone that the proposition is correct. In mathematics, a handwritten proof often must be checked by many mathematicians before the corresponding proposition can be accepted as a theorem or lemma. Normally, these proofs are given in natural languages. But nature languages can be full of ambiguities. Also, a proof can have many details that are omitted. These ambiguities and the omitted details may lead to a wrong proof to be accepted.

When a proof has a very large size, or the correctness of a proposition is vitally important (normally in engineering, like the correctness of a chip design), it may be unacceptable written in natural languages. If we introduce a formal language to write the proof, and validate each step in the “formal proof” using the proof theory [@prawitz2006natural], then we will have a more convincing proof.

But another issue may rise: the proof size may be become very large when we use a limited formal language, and the validating work can be also very cumbersome and meaningless for humans. Thus, we can program the rules in proof theory and let computers to do this work. Since the computers are doing exactly the same as the limited proof rules tell it to do, it can not go wrong or beyond the rules if we ensure the rules are correct. What’s more, the computer will show the proof state in each step, and it can be served as an interactive assistant to help people to write the formal proof. This is as known as a proof assistant.

Coq Formal Proof Assistant

Coq [@barras1997coq] is a well-known proof assistant developed by INRIA. It allows us to define terms, types, predicates and state mathematical theorems. It can also be interactive and help us to develop formal proofs and prove the theorems.

The core idea of how a proof and a program are equivalent is the Curry-Howard isomorphism/correspondence. It says that two families, the proof systems on one hand, and the models of computation on the other – are in fact the same kind of mathematical objects. For example, the implication from P to Q denoted as $P \rightarrow Q$ in the logic side, can be mapped into the field of programming side as a function that convert type $P$ into type $Q$. Other limited examples are shown in the table.

Logic sideProgramming side
universal quantification ($\forall$)generalised product type ($\Pi$ type)
existential quantification ($\exists$)generalised sum type ($\Sigma$ type)
implication ($P \rightarrow Q$)function type (type $P \rightarrow Q$)
conjunction ($P \wedge Q$)product type (type $(P, Q)$)
disjunction ($P \vee Q$)sum type (type $P | Q$)
true formulaunit type (type $()$)
false formulabottom type

In this way, if we give a proof of a first-order logic proposition, then it can be easily converted into a program using a programming language that implements all of these types. And vice versa this program can be served as a proof. To enhance this idea and make it more expressive (not limited in the first-order logic or other trivial things), we can implement more sophisticated type systems in the proof system and achieve more advanced goal.

When the formal language in Coq is expressive enough, we can then define the concepts, state the theorems, and try to develop the proofs in this proof assistant. Once a statement is proved in Coq, the proof is certified without having to go through the proof-script.

A Simple Example

Here we provide a simple example to show how formal proofs look like in Coq. The theorem that we want to prove is the addition is commutative between the natural numbers. We can see that the proof assistant is only a modeling tool that can describe the mathematical objects, and the lemmas or theorems can be proved by checking the derivation process. Thus, we can divide the whole process into two parts, the modeling stage and the proving stage.

Modeling Stage

The target of the modeling stage is to describe the mathematical objects and concept precisely so that it can be understood by the proof assistant to do the further checking.

Thus, we need to define the natural numbers and the plus operation in the formal system. We choose to formalize the Peano axioms. In Coq, it can be implemented as an inductive type definition. And the axiom 3, 4, 5 in Peano axioms will be implemented automatically. Note that the texts between “(*” and “*)” are comments that show current proof states.

Inductive nat : Type :=
  | O (* O is a natural number. *)
  | S (n : nat). (* Natural number n has a successor (denoted as ``S n'') which is also a natural number.*)

Also, we need to formalize the plus operator in the form which seems like a recursive function. It considers the first operand in two cases, O or some other natural numbers which have the predecessor. As for O, simply return the other operand. As for another case, the plus operator will be wrapped into the successor structure to build up a new natural number.

Fixpoint plus (n m : nat) : nat :=
  match n with
  | O => m
  | S n' => S (plus n' m)

Proving Stage

Then we need some auxiliary lemmas to complete the proof. We can also introduce the infix “$+$” of plus where $n + m$ denotes plus n m . And by default, “$+$” is left-associative.

Example Lemma 1. $\forall n \in \mathbb{N}, n = n + O$.

Its formal proof:

Lemma plus_n_O : forall n : nat, n = n + O.
  intros n. induction n as [| n' IHn'].
  - (* n = O *) reflexivity.
  - (* n = S n' *) simpl. rewrite <- IHn'. reflexivity. Qed.

We can see that the induction axiom in Peano axioms is used here. By induction on $n$, we can say that the lemma is correct for all natural numbers.

Example Lemma 2. $\forall n, m \in \mathbb{N}, S (n + m) = n + (S~m)$.

Its formal proof is as simple as the last one:

Lemma plus_n_Sm : forall n m : nat,
  S (n + m) = n + (S m).
  intros n m. 
  induction n as [| n' IHn']. (* by induction on n *)
  - reflexivity.
  - simpl. rewrite -> IHn'. reflexivity. Qed. 

Finally, we can give the proof of the commutativity.

Example Theorem 1. $\forall n, m \in \mathbb{N}, n + m = m + n$.

Theorem plus_comm : forall n m : nat, n + m = m + n.
  intros n m.
  induction n as [| n' IHn'].
  - simpl. rewrite <- plus_n_O. reflexivity.
  - simpl. rewrite <- plus_n_Sm. rewrite -> IHn'. reflexivity. Qed.

The “rewrite” tactic is to apply the lemma / theorem to current LHS or RHS. Then we completed a formal proof of the commutativity property for addition between natural numbers. This proof can be checked by machines and also can be translated easily into corresponding handwritten proofs.

Coq standard library and other formalized libraries can be reused in further proving since all of them have been checked correctly and no need to review the proof scripts.

Formalized Theorems in Combinatorics

Though proof assistant is very powerful in describing the theorems and concepts, it needs much work to introduce the structure in the related theory. In this section, we want to formally prove two central theorems in combinatorics about chains, antichains and cover of finite posets. The proof scripts in this section can be found at GitHub repository.


Partial Order is defined as a record type (a type that holds several fields like a structure) in the Coq library. The four fields in the PO record represent the requirements that it must satisfy: it should have a carrier set, and there is a binary relation on this set and the relation is ordering.

Partially Ordered Set (Poset) can be then instantiated by PO U where U is a type (also the binary relation).

Finite Partially Ordered Set can be defined as a poset whose elements are finite. The “finite” property can be also found in the Coq standard library.

Record FPO (U : Type) : Type := Definition_of_FPO {
  PO_of :> PO U ;
  FPO_cond : Finite _ (Carrier_of _ PO_of ) }.

Chains are totally ordered subsets of a poset, so it can be formalized as:

Record Chain : Type :=
Definition_of_chain {
  PO_of_chain : PO U;
  Chain_cond : Totally_ordered U PO_of_chain (@Carrier_of _ PO_of_chain)}.

But this definition can not be well-used in the actual proof, since it can not carry the idea that “$A$ is a chain of the poset $B$”. Thus we can use a predicate to check if one is a chain in some poset: if the set $e$ is a subset of the poset, and each two elements ($x, y$) in $E$ is comparable (R $x$ $y$ $\vee$ R $y$ $x$, or equivalently, $x \le y \vee y \le x$), then we have:

Definition Is_a_chain_in (e: Ensemble U): Prop := (Included U e C /\ Inhabited U e) /\ (forall
  x y: U, (Included U (Couple U x y) e) -> R x y \/ R y x).

An antichain is a subset of the poset who has no two distinct elements are comparable.

A (anti)chain cover is a collection of (anti)chains whose union is the entire poset. The width of the poset is the size of the largest antichain. And the height of one poset is the size of the largest chain.

An element $b \in P$ is called a maximal element if there is no $a \in P$ such that $b \le a$. Similarly, an element $a \in P$ is called a minimal element if there is no $b \in P$ such that $b \le a$.


Once we have these definitions, these facts can be proved trivially.

Fact 1. There exists a chain in every finite poset.

Fact 2. There exists a chain cover for every finite poset.

Fact 3. The set minimal($P$) and maximal($P$) are both non-empty for every finite poset $P$.

Proof. By induction on the size of $P$. ◻

Fact 4. Minimal($P$) and maximal($P$) are both antichains in $P$.

Proof. By contradiction. ◻

Mirsky’s theorem

Mirsky’s theorem relates the size of an antichain cover and a chain in a poset. The definitions we have seen so far are sufficient to express the formal statement of Mirsky’s theorem in Coq.

Theorem 1. For any finite poset $P$, the maximum size of an chain (the height) of $P$ is equal to the size of the smallest antichain cover of P.

Proof. Firstly, split it to two proof goals:

  1. Size of a chian is smaller than the size of an antichain cover in finite poset $P$.

  2. There exists an antichain cover of size equal to the height of the poset $P$.

The first goal is easy to prove. Any chain has at most one common element with each antichain from an antichain cover. And each element of the chain must be covered by some antichain from the cover. Hence, the size of any chain is smaller than or equal to the size of any antichain cover.

To prove the second goal in Coq, we need to firstly prove several auxiliary lemmas. The induction tactic can be directly used, but to prove the proposition about the existence, we need to provide a concrete procedure of how we construct such an antichain cover. By induction on the height $h(P)$ of $P$. In the case $h(P) = 1$, then the poset itself is the antichain, since no two elements are comparable. In the case $h(P) = r + 1$, if we discard the largest elements in the largest chains (maybe several) of $P$ as $P’$, then $P’$ has the height $r$ and then can be covered by $r$ antichains by the hypothesis. The discarded elements are not comparable because they are from different chains. Thus, they form another antichain in $P$. By add this antichain into the former cover, we have an $r+1$ sized antichain cover of $P$. ◻

The related formal proof can be found at this link.

Dilworth’s decomposition theorem

Theorem 2. For any finite poset $P$, the maximum size of an antichain (the width) is equal to the minimum number of chains in any chain cover.

Proof. Similar to the proof of Mirsky’s theorem, it can be proved by proving two goals:

  1. Size of an antichain is smaller then size of a chain cover.

  2. There exists a chain cover of size equal to the width of the poset $P$.

The first goal is easy by contradiction.

As for the second goal, we need to prove it by induction on the size of $P$. Let $m$ be the width of $P$. In the case the size $P$ is 1, the proof is trivial. In the case $P$ of size at $n+1$, we destruct the predicate: there exists an antichain $\mathcal{A}$ of size $m$ which is neither maximal($P$) or minimal($P$). The predicate can only be true or false. This destruct is supported intrinsically by Coq.

When it’s true, we define the sets $P^+$ and $P^-$ as follows:

\[ \begin{aligned} P^+ & = \{x \in P | x \ge y \text{~for some~} y \in \mathcal{A}\} \\ P^+ & = \{x \in P | x \le y \text{~for some~} y \in \mathcal{A}\} \\ \end{aligned} \]

If $x \in \mathcal{A}$ then $x \in P^+ \cap P^-$, thus $x \in P^+ \cup P^-$. If $x \notin \mathcal{A}$, then there is must be an element $y \in \mathcal{A}$ is comparable with $x$, $x \le y$ or $x \ge y$. Thus $\forall x \in P, x \in P^+ \cup P^-$.

Therefore, $P^+ \cup P^- = P$. And $P^+ \neq P \wedge P^- \neq P$ since there at least one minimal or maximal element not in these two sets, respectively. We have the $\mathcal{A}$ as the largest antichain in $P^+$ and $P^+$ by contradiction. Using the hypothesis, there exists a chain cover sized $m$ to cover $P^+$ or $P^-$, denoted as $C_i$ or $D_i$, respectively. Elements in $\mathcal{A}$ are the minimal elements of $C_i$ ad the maximal elements of $D_i$. Therefore, we can join these two chains and form a larger chain cover of the origin poset $P$ since $P^+ \cup P^- = P$.

When the predicate is false (no antichain other than maximal($P$) or minimal($P$) has size $m$), we can remove the maximal element $a$ and minimal element $b$ in a chain in $P$ to get $P’$. Since $a$ must in the antichain maximal($P$) and $b$ must in the antichain minimal($P$), and other antichains have elements less than $m$ (the predicate is false), thus the width of $P’$ is $n-1$. By the hypothesis, there exists a cover of chains sized $m-1$. Add the chain just removed from the $P$, we can finally construct an $m$-sized chain cover of poset $P$. ◻

The related formal proof can be found at this link. In the formal proof, the author made some modification. Instead of using induction on the cardinality of posets, they use well-founded induction on the strict set-inclusion relation. When working with the Ensemble module of the Coq standard library it is easy to deal with the set-inclusion relation compared to the comparison based on set cardinalities.

Four-Color Theorem

This section will talk about how programs help to do the proof, but not the proofs themselves. Only the core idea and proof steps are elaborated.

Theorem 3. Every planar map can be colored with at most four colors.

And it’s dual version:

Theorem 4. The vertices of every planar graph (without loops) can be colored with at most four colors in such way that no pair of vertices which lie on a common edge have the same color.

First Attampt

The first computer-aided proof of 4CT is provided by K. Appel and W. Haken in 1976 [@appel1976every].

A drawing is a triangulation if every region is a triangle. A configuration is a subgraph of a planar triangulation consisting of a circuit and its interior. A configuration is called reducible if it can be shown by certain standard methods that it cannot be immersed in a minimal counterexample to the four color conjecture. A set of configurations is called unavoidable if every planar triangulation contains some member of the set.

From the definitions, 4CT can be proved by showing an unavoidable set of reducible configurations. By discharging algorithm, one can produce one of these unavoidable sets of configurations. The major effort in K. Appel’s work was involved in the development of the discharging procedure. The algorithm they developed produced a set $U$ of configurations less than 2000, each of ring size fourteen or smaller.

And the next step is to prove the reduciblility of the configurations in the unavoidable set. This step is also done by computer programs. Since in the producing step, the algorithm is avoiding configurations of ring size greater than fourteen, the reducibility of each configuration can be proved without exorbitant use of computer time. But it also cost more than 1000 hours to check all the configurations by computer.

The proof has not been fully accepted, basicly for two reasons:

  1. part of the proof uses a computer and cannot be verified by hand.

  2. even the part of the proof that is supposed to be checked by hand is extrordinarily compicated and tedious.

An Elegant Revison

Using the same general approach in A&H proof, Neil Robertson et al. made their own proof of 4CT [@robertson1997four]. The basic idea of the proof is also exhibiting a set if configurations, but much less than before (633 cases). Where their method differs from A&H is in how they prove the unavoidability. Their proofs are easier to check: they not only rewrite the hand-checking of unavoidability, but also provide a “more” formally version which can be checked by a computer in a few minutes.

But there is still a problem, the proof still combines a textual argument, which could be checked by inspection, and a program that can not be verified. If there exists any “programmer error” in the program, we can not accept the output result. The computer aided proofs is still working in a verification way, but it can not be considered as a “proof”.

Hybrid and Formal Proofs

In 2000, Georges Gonthier tried to produce such a formal proof as Section 1 for [@robertson1997four], and he succeeded. But soon another issue raised: there is a gap between the formal proof and the textual proof. One can not absolutely prove that the interface of the textual proof is fit for the formal proof. The specification used in the formal proof maybe wrong at first. To fix this issue, they formalized the entire proof of 4CT [@gonthier2008formal].


As for large scale proofs, formal method may be very helpful. Proof assistants can clearly record the current proof state, and even find available lemmas or theorems from archives for users. A shortcoming is that formal proofs can be in large size and very cumbersome to develop. It may be fixed in the future by developing more advanced proof assistants, but textual proofs can not be replaced.

  1. Recently I was working in the field of formal proofs and verification, so I choose this topic. ↩︎