Some common beginners’ proof errors

Recently, both through grading proofs and trying to teach some new math majors how to write proofs, I’ve had the opportunity to see a lot of invalid proofs. I want to record here some of the more common errors that invalidate an argument here.

Compilation errors. When grading a huge stack of problem sets, I kind of feel like a compiler. I go through each argument and stop once I run into an error. If I can guess what the author meant to say, I throw a warning (i.e. take off points) but continue reading; otherwise, I crash (i.e. mark the proof wrong).

So by a compilation error, I just mean an error in which the student had an argument which is probably valid in their heads, but when they wrote it down, they mixed up quantifiers, used an undefined term, wrote something syntactically invalid, or similar. I believe that these are the most common errors. Here are some examples of sentences in proofs that I would consider as committing a compilation error:

For a conic section C, \Delta = b^2 - 4ac.

Here the variable \Delta,a,b,c are undefined at the time that they are used, while the variable C is never used after it is bound. From context, I can guess that \Delta is supposed to be the discriminant of C, and C is supposed to be the conic section in the (x,y)-plane cut out by the equation ax^2 + bxy + cy^2 + dx + ey + f = 0 where (a,b,c,d,e,f) are constants. So this isn’t too egregious but it is still an error, and in more complicated arguments could potentially be a serious issue.

There’s another thing problematic about this example. We use “For” without a modifier “For every” or “For some”. Does just one conic section satisfy the equation \Delta = b^2 - 4ac, or does every conic section satisfy this equation? Of course, the author meant that every conic section satisfies this equation, and in fact probably meant this equation to be a definition of \Delta. So this compilation error can be fixed by instead writing:

Let C be the conic section in the (x,y)-plane defined by the equation ax^2 + bxy + cy^2 + dx + ey + f = 0. Then let \Delta = b^2 - 4ac be the discriminant of C.

Here’s another compilation error:

Let V be a 3-dimensional vector space. For every x, y \in V, define f(x, y) = xy.

Here the author probably means that f(x, y) is the cross-product, or the wedge product, or the polynomial product, or the tensor product, or some other kind of product, of x,y. But we don’t know which product it is! Indeed, V is just some three-dimensional vector space, so it doesn’t come with a product structure. We could fix this by writing, for example:

Let V = \mathbb R^3, and for every x, y \in V, define f(x, y) = x \times y for the cross product of x,y.

We have seen that compilation errors are usually just caused by sloppiness. That doesn’t mean that compilation errors can’t point to a more serious problem with one’s proof — they could, for example, obscure a key step in the argument which is actually fatally flawed. Arguably, this is the case with Shinichi Mochizuki’s infamous incorrect proof of Szpiro’s Conjecture. However, I think that most beginners can avoid compilation errors by making sure that they define every variable before using it, are never ambiguous about if they mean “for every” or “for some”, and otherwise just being very careful in their writing. And beginners should avoid using symbol-soup whenever possible, in favor of the English language. If you ever write something like

Suppose that f: ~\forall \varepsilon > 0 \exists \delta > 0 \forall(x,y:|x-y| < \delta) (|f(x) - f(y)| < \varepsilon).

I will probably take off points, even though I can, in principle, parse what you’re trying to say. The reason is that you could just as well write

Suppose that f: A \to \mathbb R is a function, and for every \varepsilon > 0 we can find a \delta > 0 such that for any x, y \in A such that |x - y| < \delta, |f(x) - f(y)| < \varepsilon.

which is much easier to read.

Edge-case errors. An edge-case error is an error in a proof, where the proof manages to cover every case except for a single special case where the proof fails. These errors are also often caused by sloppiness, but are more likely to actually be a serious flaw in an argument than a compilation error. They also tend to be a lot harder to detect than compilation errors. Here’s an example:

Let f: X \to Y be a function. Then there is some y \in Y in the image of f.

Do you see the problem? Don’t read ahead until you try to find it for a few minutes.

Okay, first of all, if you read ahead without trying to find the problem, shame on you; second of all, if you’ve written something like this, don’t feel shame, because it’s a common mistake. The issue, of course, is that X is allowed to be the empty set, in which case f is the infamous empty function into Y.

Most of the time the empty function isn’t too big of an issue, but it can come up sometimes. For example, the fact that the empty function exists means that arguably 0^0 = 1, which is problematic because it means that the function x \mapsto 0^x is not continuous (since if x > 0 then 0^x = 0).

Here’s an example from topology:

Let X be a connected space and let x_1,x_2 \in X. Then let \gamma be a path from x_1 to x_2.

In practice, most spaces we care about are quite nice — manifolds, varieties, CW-complexes, whatever. In such spaces, if they are connected we can find a path between any two points. However, this is not true in general, and the famous counterexample is the topologist’s sine curve. The point is that it’s very important to make sure you get your assumptions right — if you wrote this in a proof there’s a good chance it would cause the rest of the argument to fail, unless you had an additional assumption that the space X did in fact have the property that connected implied path-connected.

In general, a good strategy to avoid errors like the above error is to beware of the standard counterexamples of whatever area of math you are currently working in, and make sure none of them can sneak past your argument! One way to think about this is to imagine that you are Alice, and Bob is handing you the best counterexamples he can find for your argument. You can only beat Bob if you can demonstrate that none of his counterexamples actually work.

Let me also give an example from my own work.

Let X be a bounded subset of \mathbb R. Then the supremum of X exists and is an element of \mathbb R.

It sure looks like this statement is true, since \mathbb R is a complete order. But in fact, X could be the empty set, in which case every real number is an upper bound on X and so \mathrm{sup } X = -\infty. In most cases, the reaction would be “So what? It’s just an edge case error.” But actually, in my case, I later discovered that the thing I was trying to prove was only interesting in the case that X was the empty set, in which case this step of the argument immediately fails. A month later, I’m still not sure what to do to get around this issue, though I have some ideas.

Fatal errors. These are errors which immediately cause the entire argument to fail. If they can be patched, so much the better, but unlike the other two types of errors that can usually be worked around, a fatal error often cannot be repaired.

The most common fatal error I see in beginners’ proofs is the circular argument, as in the below example:

We claim that every vector space is finite-dimensional. In fact, if \{v_1, \dots, v_n\} is a basis of the vector space V, then \mathrm{dim }V = n < \infty, which proves our claim.

If you read a standard textbook on linear algebra, they will certainly assume that given a vector space V, you can find a basis \{v_1, \dots, v_n\} of V. But in fact, such a finite basis only exists if, a priori, V is finite-dimensional! So all the student here has managed to prove is that if V is a finite-dimensional vector space, then V is a finite-dimensional vector space… not very interesting.

(This is not to say that there are almost-circular arguments which do prove something nontrivial. Induction is a form of this, as is the closely related “proof by a priori estimate” technique used in differential equations. But if one looks closely at these arguments they will see that they are not, in fact, circular.)

The other kind of fatal error is similar: there’s some sneaky assumption used in the proof, which isn’t really an edge case assumption. I have blogged about an insidious such assumption, namely the continuum hypothesis. In general, these assumptions often are related to edge-case issues, but may even happen in the generic case, as you mentally make an assumption that you forget to keep track of. Here is another example, also from measure theory:

Let X be a Banach space and let F: [0, 1] \to X be a bounded measurable function. Then we can find a sequence (F_n) of simple measurable functions such that F_n \to F almost everywhere pointwise and (F_n) is Cauchy in mean, so we define \int_0^1 F(x) ~dx = \lim_{n \to \infty} \int_0^1 F_n(x) ~dx.

This definition looks like the standard definition of an integral in any measure theory course. However, without a stronger assumption on X, it’s just nonsense. For one thing, we haven’t shown that the definition of \int_0^1 F(x) ~dx doesn’t depend on the choice of (F_n). That can be fixed. What cannot be fixed is that (F_n) might not exist at all! This happens if X is not separable, in which case the definition of the integral is nonsense.

This sort of fatal error is particularly tricky to deal with when one is first learning a more general version of a familiar theory. Most undergraduates are familiar with linear algebra, and the fact that every finite-dimensional vector space has a basis. In particular, every element of a vector space can be written uniquely in terms of a given basis. So when one first learns about finite abelian groups, they might be tempted to write:

Let G be a finite abelian group, and let g_1, \dots, g_n be a minimal set of generators of G. Then for every x \in G there are unique x_1, \dots, x_n \in \mathbb Z such that x = x_1g_1 + \cdots + x_n g_n.

In fact, the counterexample here is G = \mathbb Z/2, n = 1, g_1 = 1, and x = 0, because we can write 0 = 0g_1 = 2g_1 = 4g_1 = \cdots. So, when generalizing a theory, one does need to be really careful that they haven’t “imported” a false theorem to higher generality! (There’s no shame in making this mistake, though; I think that many mathematicians tried to prove Fermat’s Last Theorem but committed this error by assuming that unique factorization would hold in certain rings — after all, unique factorization holds in everyone’s favorite ring \mathbb Z — that it fails in.)


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s