What the hell is a Christoffel symbol?

I have to admit that I’ve gone a long time without really understanding the physical interpretation of the Christoffel symbols of a connection. In fact, there is an interpretation that, in the special case of the Christoffel symbols for the Levi-Civita connection in polar coordinates on Euclidean space, could be understood by me at age 16, after I took an intro physics class (though I definitely wouldn’t understand the relativistic or Yang-Millsy stuff). Here I want to record it. As usual, I’m pretty sure that everything here is very well-known, but I want to write it all down for my own intuition.

Let D be the covariant derivative of a connection on a vector bundle E. Given a coordinate frame e, one defines the Christoffel symbols by D_j e_k = {\Gamma^i}_{jk} e_i. Here and always we use Einstein’s convention.

The Levi-Civita connection. Suppose E is the tangent bundle of spacetime and D is the Levi-Civita connection of the metric. Then for any free-falling particle with velocity v and acceleration a, one has the relativistic form of Newton’s first law of motion a^k + {\Gamma^k}_{ij} v^iv^j = 0, which to mathematicians is more popularly known as the geodesic equation. It says that the “acceleration” in the coordinate frame e is entirely due to the fact that e itself is an accelerated frame.

Viewing \Gamma^k as a bilinear form, we can rewrite Newton’s first law as a^k = -\Gamma^k(v, v), which now resembles Newton’s second law with unit mass. Indeed, the acceleration of the particle is given exactly by a quantity -\Gamma^k(v, v) e_k which can be reasonably interpreted as a “force”. For example, one could consider the case that the spatial origin is a particle P which is orbiting around a point. If one believes that P really is “inertial”, then they will measure a fictitious force (the centrifugal force) acting on all objects. In general relativity, moreover, I think that the notion of “inertial” is ill-defined. In this case, if v is timelike then -\Gamma^k(v, v) is the acceleration due to gravity. In particular these fictitious forces all scale linearly with mass, because the geodesic equation does not have a mass factor and so we need to cancel out the factor of mass in the law F = ma.
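Here is a minimal numerical sketch of the polar-coordinate case mentioned at the start. It integrates a^k = -\Gamma^k(v, v) for the flat metric dr^2 + r^2 d\theta^2, whose only nonzero Christoffel symbols are \Gamma^r_{\theta\theta} = -r and \Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r. The function names, the choice of a Runge-Kutta integrator, and the initial data are my own assumptions, made only for illustration.

```python
import math

def geodesic_rhs(state):
    """Right-hand side of a^k = -Gamma^k(v, v) in polar coordinates (r, theta)."""
    r, theta, vr, vtheta = state
    # Nonzero Christoffel symbols of dr^2 + r^2 dtheta^2:
    #   Gamma^r_{theta theta} = -r
    #   Gamma^theta_{r theta} = Gamma^theta_{theta r} = 1/r
    ar = r * vtheta ** 2               # centrifugal term, -Gamma^r(v, v)
    atheta = -2.0 * vr * vtheta / r    # Coriolis-like term, -Gamma^theta(v, v)
    return (vr, vtheta, ar, atheta)

def rk4_step(state, dt):
    """One classical Runge-Kutta step for the geodesic equation."""
    def shift(s, k, h):
        return tuple(x + h * y for x, y in zip(s, k))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shift(state, k1, dt / 2))
    k3 = geodesic_rhs(shift(state, k2, dt / 2))
    k4 = geodesic_rhs(shift(state, k3, dt))
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))

def free_particle(state, t_final, steps=1000):
    """Integrate a free particle (no real forces) up to time t_final."""
    dt = t_final / steps
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state

# Start at (x, y) = (1, 0) with Cartesian velocity (0, 1),
# i.e. r = 1, theta = 0, r' = 0, theta' = 1.
r, theta, _, _ = free_particle((1.0, 0.0, 0.0, 1.0), t_final=1.0)
x, y = r * math.cos(theta), r * math.sin(theta)
print(x, y)  # the straight line x = 1, y = t predicts both close to 1
```

The “forces” r \dot\theta^2 and -2 \dot r \dot\theta / r are entirely an artifact of the frame: converted back to Cartesian coordinates, the particle just moves along a straight line at constant speed.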

It will be convenient to go to another level of abstraction and view \Gamma: T_pM \otimes T_pM \to T_pM as a bilinear form valued in the tangent space. In other words it is tempting to think of \Gamma as a section of T^*M \otimes T^*M \otimes TM. This of course presupposes that M has a trivial tangent bundle, since the Christoffel symbols are only defined locally. Putting our doubts aside, this is equivalent to thinking of \Gamma as a section of T^*M \otimes \text{End } TM.

Connections on G-bundles. Let me remind you that if G is a Lie group, then a G-bundle is a bundle of representations of G. Thus we can view quotients of G and its Lie algebra \mathfrak g as both subsets of End E, whenever E is a G-bundle. By a gauge transformation of a G-bundle E one means a section of End E which is in fact a section of G. Thus gauge transformations act on E (and so also on End E, etc.)

If E is a G-bundle, by a covariant derivative on E I mean a covariant derivative whose Christoffel symbols \Gamma are not just sections of T^*M \otimes \text{End } E but in fact are sections of T^*M \otimes \mathfrak g. (Briefly, the Christoffel symbols are \mathfrak g-valued 1-forms.) In this case, if we have two covariant derivatives D, D’ which lie in the same orbit of the gauge transformations, we call D, D’ gauge-equivalent. We tend to think of covariant derivatives of G-bundles (modulo gauge-equivalence) as describing physical theories.

For example, consider the trivial U(1)-bundle E. This is the trivial line bundle equipped with the canonical action of U(1) on the complex numbers. A covariant derivative D on E is defined by locally giving Christoffel symbols which are \mathfrak u(1)-valued 1-forms — in other words, imaginary 1-forms. A gauge transformation, then, is defined by adding an imaginary exact 1-form to the Christoffel symbols. We interpret the Christoffel symbols A as (i times) potentials for the electromagnetic field. In fact, one can take the exterior derivative of A and obtain a closed 2-form F = dA, which one can view as the Faraday tensor. The fact that one can add an exact 1-form to A is exactly the gauge invariance of the Maxwell equation *d*dA = j where j is the current 1-form.

So what is D in the case of electromagnetism? It acts on sections as D_i = \partial_i + A_i. So for a function u (i.e. a section of the trivial bundle E) on M, (D_i - \partial_i)u weights u according to the strength of the electromagnetic potential. This is mainly interesting when u is a constant function, in which case Du = uA is the potential rescaled by u.
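With D = d + A, the gauge invariance of F can be checked in two lines. This is only a sketch under my own assumptions: the gauge transformation is multiplication by g = e^{i\chi} for a real function \chi (the names g and \chi are mine), and it acts on D by conjugation.

```latex
\begin{aligned}
D' u &= g^{-1} D(g u) = e^{-i\chi}\bigl(d(e^{i\chi} u) + A e^{i\chi} u\bigr) = du + (A + i\,d\chi)\,u, \\
A' &= A + i\,d\chi, \\
F' &= dA' = dA + i\,d(d\chi) = dA = F.
\end{aligned}
```

So the potential changes by the imaginary exact 1-form i\,d\chi, while the Faraday tensor does not change at all.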

I think that the takeaway here is: the Christoffel symbols are a fictitious and local V-valued 1-form, where V is some vector bundle (V = \mathfrak g or V = TM \otimes T^*M above). In any particular case they should have a nice physical interpretation but I don’t think one can interpret the Maxwell-Yang-Mills case and the Levi-Civita case as one and the same.

What’s wrong with the Museum of Math?

I’d like to bring attention to an open letter cosigned by several staff members of the National Museum of Mathematics (hereinafter MoMath) and addressed to its board of directors and CEO, Cindy Lawrence. In the comments of a blog post of Sam Shah, several other staff members corroborate the allegations in the letter.

While you really should read the open letter and comments yourself, I would like to in particular stress how outrageous the allegation about the policy of MoMath concerning Title I schools is. Recall that Title I schools are those public schools which have been identified as having a large number of low-income students, and which have been given additional funding from the US Department of Education in order to promote those students’ educations. If the allegation is true, MoMath offers scholarships to allow Title I schools to have field trips to MoMath for free, but then discriminates against them by having shorter educational sessions, so that students will not have time to solve the problems that they are posed. This can only serve to discourage them from mathematics, leaving everyone worse off.

Math is for everybody, and this is more than just a flashy slogan. Public American K-12 education is notorious for spreading the philosophy that mathematics is an innate ability, rather than a skill that can be trained; this creates a clear inequity between those children (typically of wealthier, more educated parents) who believe that they can do mathematics, and those who do not, which later carries over to income inequality in adulthood. Moreover, one cannot really do away with incentives for students to learn mathematics. On a less economic and more aesthetic level, MoMath’s mission statement proposes to encourage “broad and diverse audience to understand the evolving, creative, human, and aesthetic nature of mathematics” — a task that it has evidently failed at.

My high school was woefully underfunded, though not Title I. Our treatment of mathematics was shallow and only a tiny percentage of my class ended up taking any math beyond Calculus 1 in high school. I really had no idea what mathematics was, or that one could pursue it as a career — I ended up in this business somewhat by accident! Something like MoMath would have been a wonderful experience for me, and probably many of my classmates who never really learned what mathematics was. The same holds, I suspect, for many students at Title I schools. But even those who are able to visit MoMath will have any benefits from the trip denied to them.

Linear algebra done dubiously

A book that has been a contentious topic of discussion is Linear Algebra Done Right, by Axler. The reason, at least ostensibly[1], is that Axler’s treatment avoids the discussion of determinants. In the critics’ defense, Axler himself seems to play this up, marketing the book as a revolutionary treatment where determinants are not discussed. Apparently, Sergei Treil found this marketing so offensive that he wrote a competing textbook known as Linear Algebra Done Wrong.

I do not quite buy the hype here. There’s a whole chapter on determinants in Axler’s book, which even includes a discussion of Jacobian determinants. Axler just doesn’t use determinants to prove the three main theorems of intermediate linear algebra over an algebraically closed field \overline K, namely the fact that every linear operator has an eigenvalue, that every linear operator has a unique Jordan canonical form, and the Cayley-Hamilton theorem. In all of these cases, one could prove the theorem using determinants, but there’s no good reason to, since there is a perfectly reasonable structure theory of linear operators over \overline K which does not mention determinants, and it gives fairly easy and conceptual proofs to all three theorems.

(I don’t think Axler’s book is perfect, for the record. Most annoyingly, he doesn’t seem to clearly distinguish between theorems that are valid over general \overline K and theorems that are specifically valid over \overline K = \mathbb C, which is the case for most of the results in the latter half of the book, except for one single chapter about the structure theory over \mathbb C. But I do think that a lot of the angry comments I’ve seen about the book on Reddit and elsewhere, which mainly focus on the issue of determinants, are just totally out to lunch.)

Anyways, it occurred to me today that the way I like to think about linear algebra involves neither determinants nor Axler’s structure theory, but is rather a complex-analytic version of linear algebra. I don’t think it essentially uses complex analysis though, and could probably be adapted to general \overline K.

The point is to consider the resolvent R(z) = (T - z)^{-1} of the linear operator T acting on a vector space V of dimension n, which is a rational map from \mathbb P^1 to a space of matrices. Clearly an eigenvalue is a pole of R, and the number of poles equals the number of zeroes (this is clearly true when \overline K = \mathbb C, but I suspect it is true for arbitrary \overline K). Since R has a zero of order n at \infty, T must have n eigenvalues. (If \overline K = \mathbb C, Rouché’s theorem even gives a bound on the size of the eigenvalues, and a way to compute approximations to the eigenvalues.)

Now we have n eigenvalues z_1, \dots, z_n counted with multiplicity. If \overline K = \mathbb C, we may consider small loops \gamma_j around the puncture of \mathbb C \setminus \{z_j\}, enclosing no other eigenvalue, and define P_j = -\frac{z_j}{2\pi i} \int_{\gamma_j} R(z) ~dz and similarly N_j = -\frac{1}{2\pi i} \int_{\gamma_j} (z - z_j)R(z) ~dz. (The minus signs are there because R(z) = (T - z)^{-1} behaves like -(z - z_j)^{-1} near z_j.) It is now a straightforward consequence of Cauchy’s integral formula that N_j is nilpotent and we have T = \sum_j P_j + N_j. Furthermore, if V_j is the image of P_j, then T acts on V_j as T = z_j + N_j, and we have a direct sum decomposition V = \bigoplus_j V_j. That implies that T = \sum_j P_j + N_j is the Jordan canonical form of T. Let me leave the details to these notes of Knill. I would be very interested to see if an argument like this can be used in the general case of \overline K an algebraically closed field, possibly by replacing \gamma_j by the generator of some algebraic analogue of the fundamental group of the “open subscheme” (if that makes any sense) \overline K \setminus \{z_j\}, and replacing the differential form (z - z_j)R(z) ~\frac{dz}{2\pi i} with some sort of algebraic analogue of cohomology.
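To make the contour integrals concrete, here is a small numerical sketch over \mathbb C. The test matrix, the helper names, and the use of the trapezoid rule on a circle of radius 1 are all my own choices. Note the explicit minus signs: with the convention R(z) = (T - z)^{-1}, the residue of R at an eigenvalue is minus the spectral projection.

```python
import cmath

# A Jordan block for the eigenvalue 2 plus a simple eigenvalue 5 (illustrative choice).
T = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 5]]
DIM = len(T)

def inverse(m):
    """Invert a small square complex matrix by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(map(complex, row)) + [complex(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def resolvent(z):
    """R(z) = (T - z)^{-1}."""
    return inverse([[T[i][j] - (z if i == j else 0) for j in range(DIM)]
                    for i in range(DIM)])

def contour_average(f, center, radius=1.0, samples=64):
    """Trapezoid approximation of (1 / 2 pi i) times the integral of f over a circle."""
    total = [[0j] * DIM for _ in range(DIM)]
    for k in range(samples):
        w = radius * cmath.exp(2j * cmath.pi * k / samples)
        values = f(center + w)
        # dz = i w dtheta, so dz / (2 pi i) = w dtheta / (2 pi) = w / samples
        for i in range(DIM):
            for j in range(DIM):
                total[i][j] += values[i][j] * w / samples
    return total

def spectral_parts(z_j):
    """Return (P_j, N_j), including the minus signs forced by R(z) = (T - z)^{-1}."""
    p = contour_average(resolvent, z_j)
    n = contour_average(
        lambda z: [[(z - z_j) * x for x in row] for row in resolvent(z)], z_j)
    p_j = [[-z_j * x for x in row] for row in p]
    n_j = [[-x for x in row] for row in n]
    return p_j, n_j

parts = {z_j: spectral_parts(z_j) for z_j in (2, 5)}
# Sum of P_j + N_j over both eigenvalues; this should reproduce T.
reconstructed = [[sum(parts[z][0][i][j] + parts[z][1][i][j] for z in parts)
                  for j in range(DIM)] for i in range(DIM)]
```

Here `parts[2][1]` is the nilpotent part attached to the eigenvalue 2, and `reconstructed` should agree with T up to rounding error.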

It remains to prove the Cayley-Hamilton theorem. (This proof, which was shown to me by Charles Pugh, is what got me thinking about linear algebra in this fashion in the first place.) Recall that the Cayley-Hamilton theorem says that if p is the characteristic polynomial of T, thus the zeroes of p are the eigenvalues of T, then p(T) = 0. This is obviously true if T is diagonalizable.

Now, the set of diagonalizable matrices is dense, because for example it includes the set of matrices with distinct eigenvalues, which is a generic set. On the other hand, the set Z of matrices with the Cayley-Hamilton property is closed, since the coefficients of the characteristic polynomial p depend continuously on T, and hence so does p(T). Since a closed set which contains a dense set is the whole space, we conclude that Z is the space of all n \times n matrices over \overline K. This argument ostensibly works over \overline K = \mathbb C, but with a little work, it also holds for arbitrary \overline K, because we may use the Zariski topology.
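The density argument can be watched in a toy case. The sketch below (2 \times 2 matrices only, with a family of my own choosing) perturbs a Jordan block to matrices with distinct eigenvalues 2 and 2 + \varepsilon, which are diagonalizable, and checks p(T) = T^2 - (\text{tr } T) T + (\det T) I = 0 exactly for each of them and for the limit \varepsilon = 0.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def char_poly_at(t):
    """Evaluate p(T) = T^2 - tr(T) T + det(T) I for a 2x2 matrix T."""
    trace = t[0][0] + t[1][1]
    det = t[0][0] * t[1][1] - t[0][1] * t[1][0]
    t2 = matmul(t, t)
    identity = [[1, 0], [0, 1]]
    return [[t2[i][j] - trace * t[i][j] + det * identity[i][j] for j in range(2)]
            for i in range(2)]

results = []
for eps in [Fraction(1), Fraction(1, 10), Fraction(1, 1000), Fraction(0)]:
    # eps > 0: distinct eigenvalues 2 and 2 + eps; eps = 0: a genuine Jordan block.
    t = [[Fraction(2), Fraction(1)], [Fraction(0), 2 + eps]]
    results.append(char_poly_at(t))
```

Exact rational arithmetic is used so that every entry of every matrix in `results` is literally zero, not merely small.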

This would be a pretty horrible way to teach linear algebra, but maybe one could simplify it so that it’s not so horrible.

[1] Axler has a signature, and quite clear and amicable, writing style, unlike most older textbooks. How much of the actual debate here is just Bourbakists in shambles?

Much ado about large cardinals

Lately, with Peter Scholze’s MathOverflow post about Grothendieck universes and the Isabelle/HOL implementation of schemes, it seems that in the sphere of online math there has been a somewhat renewed interest in when large cardinals make proving theorems easier. (Specifically, it is not necessary that one actually needs the large cardinals to prove the theorem — only that it makes the proof easier!) So I thought it would be fun to look through some old homework of mine and see if I could find an example where if I had allowed myself the use of a large cardinal, my life would have been easier. I found an example from when I took a course in C*-algebras a few years ago.

Let X be a locally compact Hausdorff space. By a compactification of X we mean an open dense embedding X \to Y where Y is a compact Hausdorff space. By Alexandroff’s theorem, X always has a compactification, but in general if X is not compact then X may have multiple compactifications. We consider the category Comp X of compactifications of X equipped with continuous surjections which preserve X; the Alexandroff compactification is the final object of Comp X.

The Stone–Čech theorem. The category Comp X has an initial object.

One may show that the initial object of Comp X is \text{Spec } C_b(X) where C_b(X) is the C*-algebra of bounded continuous functions on X with its supremum norm, and the functor Spec is taken in the sense of C*-algebras (thus Spec A consists of maximal closed ideals equipped with the Zariski-Jacobson topology). This proof is presumably inoffensive to anyone who accepts ZFC (and offensive to anyone who does not, since one needs Zorn’s lemma to show that C_b(X) has a maximal ideal in general, and ZF alone cannot prove that Comp X has an initial object).

However, for the purposes of the result I was trying to prove, I needed a proof of the Stone–Čech theorem that did not rely on the existence of \text{Spec } C_b(X), or else my argument would have been circular. To do this, one proceeds as follows. If Z \to Y is a morphism in Comp X, then since X is dense in Z, the underlying continuous surjection Z \to Y is completely determined by its behavior on X, but it is also the identity on X. Therefore Comp X is a poset category. Let \mathcal C be a chain in Comp X; then \mathcal C is an inverse system of topological spaces, and if C is the inverse limit of \mathcal C, then one can show that there is a closed embedding C \to \prod \mathcal C. Since \prod \mathcal C is a compact Hausdorff space by Tychonoff’s theorem, so is C. Taking the inverse limits of the open dense embeddings X \to Y, where Y \in \mathcal C, we obtain an open dense embedding X \to C, so C is an upper bound of \mathcal C in Comp X.

At this point, one may proceed in two ways. Working in ZFC, it is only valid to apply Zorn’s lemma if Comp X is equivalent to a small category, but \text{Comp } \mathbb N is a large category. To see that Comp X is equivalent to a small category, it suffices to show that there is a cardinal \kappa such that every compactification of X has at most \kappa points; then for every compactification Y of X, one can find a compactification Z of X such that Y \cong Z in Comp X, and the set-theoretic rank of Z is at most \kappa, and so Comp X is a subset of the set V_\kappa. To find \kappa, if Y is a compactification of X and y \in Y, then, since X is dense in Y, by the boolean prime ideal theorem there is an ultrafilter U on the set Open X of open subsets of X such that \lim U = y. Since Y is Hausdorff, it follows that y is the UNIQUE limit of U, but some cardinal arithmetic can be used to show that if \lambda is the cardinality of X, then there are at most 2^{2^\lambda} ultrafilters on Open X (since elements of an ultrafilter on Open X are open subsets of X, of which there are at most 2^\lambda), so the cardinality of Y is at most 2^{2^\lambda}. Therefore we may let \kappa = 2^{2^{\lambda}}.

Okay, that was stupid. We can also proceed by large cardinals. The following argument feels much more conceptual to me:

Definitions. Let \delta > \aleph_0 be a regular cardinal. We say that \delta is an inaccessible cardinal if for every cardinal \lambda < \delta, 2^\lambda < \delta. We say that \delta is a hyperinaccessible cardinal if \delta is an inaccessible cardinal and there is an increasing chain of inaccessible cardinals \delta_\alpha such that \lim_\alpha \delta_\alpha = \delta.

Let \delta be a hyperinaccessible cardinal and suppose that \text{card }X < \delta. Then there are inaccessible cardinals \text{card }X < \kappa < \kappa' < \delta. If X \in V_\kappa and Y is a compactification of X, then Y can be obtained as an extension of the Alexandroff compactification by splitting nets, but V_\kappa is a Grothendieck universe and so the topology of X can be already probed by nets in V_\kappa; therefore Y \in V_\kappa. Therefore \text{Comp } X \subseteq V_\kappa is a small category in V_{\kappa'}, so X has a Stone–Čech compactification \beta X with \text{card } \beta X < \kappa' < \delta.

This argument looks verbose, but only because I have written out the details; I think in practice I would just say that if X lies underneath an inaccessible cardinal \kappa, then enough nets to probe the topology of X are also under \kappa, so every compactification is as well.

Sundry facts about pseudodifferential operators

In this blog post I will just record some things I’ve been trying to learn about lately, largely just so I can have a place to collect my thoughts. Most of this is in Hörmander’s monograph on differential operators, and is motivated by trying to understand Vasy’s method and Atiyah-Singer index theory.

Pseudodifferential operators on manifolds.

Let us recall that a symbol on an open subset X of \mathbb R^d is by definition a smooth function on the cotangent bundle of X (for which certain seminorms are finite). This was curious to me: you can motivate it by saying that a symbol is an observable and the cotangent bundle is “phase space” in the sense that a point (x, \xi) \in T^*X consists of a position x and a momentum \xi, but why should the momentum live in a cotangent space and not the fiber of some other vector bundle? When we quantize a symbol a, defining an operator a(D) by formally substituting the differential operator D = -i\nabla in place of the momentum, we by definition obtain a pseudodifferential operator. Now let \kappa: X \to Y be a diffeomorphism, and introduce the pushforward symbol \kappa_* a(y, \eta) = e^{-iy\eta} a(\kappa^{-1}(y), D) e^{iy\eta}. This is the “right” definition in the sense that (\kappa_*a(y, D)u)(\kappa(x)) = a(x, D)(u \circ \kappa)(x).

If a is a symbol of order m, then \kappa_* a(y, \eta) = a(\kappa^{-1}(y), \kappa'(\kappa^{-1}(y))^t \eta) modulo symbols of order m - 1. But \kappa'(x) is invariantly defined as an isomorphism of tangent bundles \kappa'(x): TX \to TY, so its transpose should be an isomorphism \kappa'(x)^t: T^*Y \to T^*X of the dual bundles. This only makes sense if \eta \in T^*_yY is a covector at y.
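A one-dimensional sanity check of this formula, under assumptions of my own choosing: take X = Y = \mathbb R, the dilation \kappa(x) = 2x, and the symbol a(x, \xi) = \xi, whose quantization is D = -i\,d/dx.

```latex
\begin{aligned}
(u \circ \kappa)(x) &= u(2x), \\
a(x, D)(u \circ \kappa)(x) &= -i \frac{d}{dx}\, u(2x) = -2i\, u'(2x), \\
(\kappa_* a)(y, D)\, u(y) &= -2i\, u'(y), \qquad \text{so} \qquad \kappa_* a(y, \eta) = 2\eta = \kappa'(\kappa^{-1}(y))^t\, \eta .
\end{aligned}
```

Since \kappa is linear there are no lower-order corrections here. The momentum variable picks up the factor \kappa' = 2, which is how a covector transforms.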

The above paragraphs are totally obvious, and yet puzzled me for the past three years, until last week when I sat down and decided to work out the details for myself.

The consequence is that we cannot define the symbol of a pseudodifferential operator invariantly. Rather, we declare that a pseudodifferential operator A is one with the property that for every chart \kappa: X \to Y and every pair of cutoffs \phi, \psi on Y, the operator \phi \circ \kappa_* \circ A \circ \kappa^* \circ \psi is a pseudodifferential operator on Y (in the sense that it is the quantization of a symbol on Y; here the pushforward \kappa_* is defined to be the inverse of the pullback \kappa^*). Since Y is an open subset of \mathbb R^d this makes sense.

Previously we have discussed pseudodifferential operators on manifolds M. These can be viewed more abstractly as acting on sections of the trivial line bundle M \times \mathbb C. However, in geometry one frequently has to deal with sections of more general vector bundles over M. For example, a 1-form is a section of the cotangent bundle. If E, F are vector bundles over M of rank r, s respectively, one may define the Hom-bundle Hom(E, F), which locally is isomorphic to the matrix bundle M \times \mathbb C^{s \times r}. Then a pseudodifferential operator from sections of E to sections of F is nothing more than a linear map which, after trivialization of E and F, looks like an s \times r matrix of pseudodifferential operators on M. The principal symbol of such an operator sends the cotangent bundle of M into the Hom-bundle Hom(E, F).

Wavefront sets.

In this section we will impose that all pseudodifferential operators have Schwartz kernels K such that the projections of supp K are both proper maps. Modulo the space \Psi^{-\infty} of pseudodifferential operators of order -\infty, this assumption is no loss of generality. Under this assumption, the top-order term of a symbol — that is, the principal symbol — satisfies the pushforward formula \kappa_* a(y, \eta) = a(\kappa^{-1}(y), \kappa'(\kappa^{-1}(y))^t \eta), so the principal symbol is well-defined as an element of S^m/S^{m-1} (here S^\ell is the \ellth symbol class). The principal symbol encodes important information about the nature of the operator; for example we have:

Definition. An elliptic pseudodifferential operator of order m is one whose principal symbol is \sim |\xi|^m near infinity of each cotangent space.

The important property is that if A is an elliptic pseudodifferential operator, then A is also invertible modulo the quantization \Psi^{-\infty} of S^{-\infty}. For example the Laplace-Beltrami operator is elliptic on Riemannian manifolds since its symbol is \xi^2; since the quadratic form induced by a Lorentzian metric is not positive-definite, it follows that on Lorentzian manifolds, the Laplace-Beltrami operator is not elliptic. Since a Lorentzian Laplace-Beltrami operator is really just the d’Alembertian, whose symbol is \xi^2 - \tau^2, this should be no surprise.

Recall that a conic set in a vector space is a set which is closed under multiplication by positive scalars. A conic set in a vector bundle, then, is one which is conic in every fiber.

Definition. Let a be the principal symbol of a pseudodifferential operator A of order m. We say that A is noncharacteristic near (x_0, \xi_0) \in T^*M if there is a conic neighborhood of (x_0, \xi_0) wherein a(x, \xi) \sim |\xi|^m near infinity. Otherwise, we say that (x_0, \xi_0) is a characteristic point. The set of characteristic points is denoted Char A and the set of noncharacteristic points is denoted Ell A.

Thus a pseudodifferential operator A is noncharacteristic at (x, \xi) if in a neighborhood of x, A is elliptic when restricted to the direction \xi. By definition, Char A is closed, so we may make the following definition.

Definition. Let u be a distribution. The wavefront set WF(u) is the intersection of all sets Char A, where A ranges over pseudodifferential operators such that Au \in C^\infty.

Then WF(u) is a closed conic subset of the cotangent bundle T^*M, and its projection to M is exactly the singular support ss(u). Indeed, x \notin ss(u) iff for every pseudodifferential operator A in a sufficiently small neighborhood of x, Au \in C^\infty; in other words no matter how hard we try, we cannot force u to become singular without differentiating it away from x. The wavefront set also remembers the direction in which this singularity happens; by elliptic invertibility, it will not happen in a direction that A is noncharacteristic.

For example, the only way that u(x, y) = \delta_{y = 0} can be made smooth is by cutting u off away from \{(x, y): y = 0\}, which can be done by pseudodifferential operators of order 0 which are elliptic in the x-direction, but which cannot be elliptic in the y-direction, along the x-axis.

Pseudotransport equations.

Hyperbolic operators are meant to generalize the transport equation (\partial_t - \partial_x)u(t, x) = 0. Let us therefore begin by studying the “pseudotransport” equation (\partial_t + a(t, x, D_x))u(t, x) = 0.

We assume that t \mapsto a(t, x, D_x) is uniformly bounded in S^1 and continuous in C^\infty, and the real part of a is uniformly bounded from below. Then we have the energy estimate

\displaystyle \frac{1}{2} \int_0^T ||e^{-\lambda t} u(t)||_{H^s}^p \lambda~dt \leq ||u(0)||_{H^s}^p

valid for any s \in \mathbb R and \lambda large enough depending on s. Applying the Hahn-Banach theorem we conclude that for every initial data in H^s we can find u \in C^0([0, \infty) \to H^s) which solves the pseudotransport equation. In particular, given Schwartz initial data, it follows that u is smooth.

Now fix initial data \phi \in H^s and assume that the principal symbol exists and is imaginary. (This forces the transport operator to be real and of order 1.) Let q be a symbol of order 0 on space, with principal symbol q_0. If in fact Q(D) is a pseudodifferential operator on spacetime such that at time 0, Q(0) = q, and Q(t, D) commutes with \partial_t + a(t, x, D_x), then Qu solves the pseudotransport equation. (Actually, we will find Q so that [Q(t), \partial_t + a(t, x, D_x)] is a pseudodifferential operator of order -\infty; this is good enough.) In particular if q\phi \in C^\infty_0 then WF(u) is contained in Char Q, and WF(u) should be the intersection of all such sets Char Q.

To compute WF(u), let ia_0 be the principal symbol of a(D) and suppose that Q \sim \sum_j Q_j, where Q_0 is principal, is given. Then the principal symbol of [\partial_t + a(t, x, D_x), Q(t, x, D)] is the Poisson bracket

\displaystyle \{\tau + a_0(t, x, \xi), Q_0(t, x, \xi)\} = (\partial_t + H_{a_0})Q_0

where H_p is the Hamilton vector field of a symbol p. By inducting on j, we can use this computation to compute Q_j and conclude that modulo an error term of order -\infty, we can choose Q to be invariant along the Hamiltonian flow \psi given by the Hamiltonian a_0. That is, if F_tu(0) = u(t), then WF \circ F_t = \psi_t \circ WF. This result is a sort of “propagation of singularities” for the pseudotransport equation, which generalizes the fact that the transport equation acts on Dirac masses by transporting them, as expected.
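For the honest transport equation (\partial_t - \partial_x)u = 0 everything can be computed by hand. The sketch below assumes the sign convention H_p = \partial_\xi p\, \partial_x - \partial_x p\, \partial_\xi for the Hamilton vector field, and uses \partial_x = iD_x, so that a(D_x) = -\partial_x has symbol -i\xi.

```latex
\begin{aligned}
a(\xi) &= -i\xi = i a_0(\xi), \qquad a_0(\xi) = -\xi, \\
H_{a_0} &= \frac{\partial a_0}{\partial \xi}\, \partial_x - \frac{\partial a_0}{\partial x}\, \partial_\xi = -\partial_x, \\
(\partial_t + H_{a_0}) Q_0 &= (\partial_t - \partial_x) Q_0 = 0 \iff Q_0(t, x, \xi) = q_0(x + t, \xi), \\
\psi_t(x_0, \xi_0) &= (x_0 - t, \xi_0).
\end{aligned}
```

This matches the explicit solution u(t, x) = \phi(x + t): a Dirac mass sitting at x_0 at time 0 sits at x_0 - t at time t, and the frequency variable is not touched.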

Solving the hyperbolic Cauchy problem.

Let X be a manifold that represents “spacetime”. A priori we may not have a Lorentzian metric to work with, so instead we fix a function \phi that is a “time coordinate”. The level surfaces of \phi can be viewed as “spacelike hypersurfaces” in X.

Throughout we will let X_0 = \{\phi = 0\} and X_+ = \{\phi > 0\} denote the present and future, respectively.

Definition. A hyperbolic operator is a differential operator P of principal symbol p and order m such that p(x, d\phi(x)) \neq 0 and for every (x, \xi) \in T^*X such that \xi is not in the span of d\phi, there are m distinct \tau \in \mathbb R such that p(x, \xi + \tau d\phi(x)) = 0.

Since P is a differential operator, p(x, \cdot) is a homogeneous polynomial of degree m. To make sense of the condition, let me restrict to the case that X = \mathbb R^2 with its usual Riemannian metric and \phi is the projection onto the t-axis. Then after rotating the first coordinate so that \xi is a covector dual to the x-axis, the condition says that given (x, t, \xi) we can find exactly m real numbers \tau such that p(x, t, \xi, \tau) = 0. In the case of the d’Alembertian, we have p(x, t, \xi, \tau) = \xi^2 - \tau^2, and indeed given \xi we can set \tau = \pm \xi.

To state the initial-value problem with initial data in the “initial-time slice” X_0, let v be a vector field such that v\phi = 1, so v points “forward in time”. The action of v is “differentiating with respect to time”. Note that this hypothesis prevents \phi from degenerating.

Theorem (solving the hyperbolic Cauchy problem). Let P be a hyperbolic operator of order m with smooth coefficients, Y a precompact open submanifold of X, and s \geq 0. Assume we are given an inhomogeneous term f \in H^s_{loc}(X_+) satisfying f|_{X_0} = 0 and initial data \psi_j \in H^{s + m - 1 - j}_{loc}(X_0), j < m. Then there is u \in H^{s + m - 1}_{loc}(X) supported in \overline X_+ such that Pu = f in X_+ \cap Y and v^ju = \psi_j in X_0 \cap Y.

The proof is in Chapter 23.2 of Hörmander. The idea is to first prove uniqueness of solutions. By compactness, we may cover Y with finitely many charts U which are isomorphic to open subsets of Minkowski spacetime in which level sets of \phi are spacelike hypersurfaces and orbits of v are worldlines. Since Minkowski spacetime has an honest-to-god time coordinate, the hyperbolicity hypothesis allows us to factor the principal symbol p into first-order factors, and hence factor P into pseudotransport operators on U, at least modulo a lower-order error. We may then apply the solution of the Cauchy problem for pseudotransport operators to solve the Cauchy problem for Pu = f in each chart U, and since there were only finitely many, uniqueness allows us to stitch the local solutions together into a global solution.

The proof outlined in the above paragraph is motivated by the special case when P is the d’Alembertian, which already appears in Chapter 2 of Evans. In that proof, one first observes that the Cauchy problem for the transport equation has an explicit solution. Then one reduces to the case that spacetime is two-dimensional, in which case there is an explicit factorization of P into transport operators, namely P = (\partial_x - \partial_t)(\partial_x + \partial_t).
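Spelled out, the factorization and the resulting reduction to two transport problems look as follows. The intermediate unknown w is a name I am introducing for illustration.

```latex
\begin{aligned}
(\partial_x - \partial_t)(\partial_x + \partial_t) u
  &= \partial_x^2 u + \partial_x \partial_t u - \partial_t \partial_x u - \partial_t^2 u
   = \partial_x^2 u - \partial_t^2 u, \\
Pu = f &\iff (\partial_x - \partial_t) w = f \quad \text{and} \quad (\partial_x + \partial_t) u = w .
\end{aligned}
```

So one first solves a transport equation for w, and then a second transport equation for u with w as the inhomogeneous term.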

Propagation of singularities, part I.

To study the propagation of singularities we need to recall some symplectic geometry. Let Q be a pseudodifferential operator on X and q its principal symbol. Then the Hamilton vector field H_q induces a flow on T^*X which preserves q.

Definition. The bicharacteristic flow of a pseudodifferential operator Q of principal symbol q is the flow of H_q on q^{-1}(0). A bicharacteristic of Q is an orbit of the bicharacteristic flow.

The intuition for the bicharacteristic flow is that its projection to X is “lightlike”, at least if Q is the d’Alembertian.
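For the d’Alembertian in one space dimension this can be checked directly. The computation assumes the sign convention H_q = \partial_\xi q\, \partial_x + \partial_\tau q\, \partial_t - \partial_x q\, \partial_\xi - \partial_t q\, \partial_\tau, and uses that q does not depend on (x, t), so \xi and \tau are constant along the flow.

```latex
\begin{aligned}
q(\xi, \tau) &= \xi^2 - \tau^2, \\
H_q &= 2\xi\, \partial_x - 2\tau\, \partial_t, \\
q = 0 &\iff \tau = \pm \xi, \qquad \frac{dx}{dt} = \frac{2\xi}{-2\tau} = \mp 1 \quad (\xi \neq 0).
\end{aligned}
```

The projected curves are the lines x \pm t = \text{const}, that is, light rays of speed 1.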

Theorem (Hörmander’s propagation of singularities). Let P be a pseudodifferential operator of order m such that the Schwartz kernel of P has proper support, and the principal symbol of P is real. Then for every distribution u, writing f = Pu, WF(u) – WF(f) is invariant under the bicharacteristic flow of P.

By definition of the wavefront set, for every distribution u, WF(u) – WF(Qu) is contained in Char Q. But if Q is a differential operator, then Char Q is exactly the “characteristic variety” q^{-1}(0), which is exactly the variety where the bicharacteristic flow of Q is defined. Therefore we can ask that WF(u) – WF(Qu) be invariant under the bicharacteristic flow.

If P is a hyperbolic operator of principal symbol p, then the solutions \tau of the equation p(x, \xi + \tau d\phi(x)) = 0 are all real and distinct, and modulo lower-order terms this can be used to enforce that the coefficients of p are real. We phrase this more simply by saying that the principal symbol of every hyperbolic operator is real.

A partial converse to the reality of principal symbols of hyperbolic operators holds. If Q is a differential operator, then its principal symbol q is a homogeneous polynomial on each cotangent space. Fixing a particular cotangent space, we can write q(\xi) = \sum_\alpha c_\alpha \xi^\alpha where \alpha ranges over all multiindices of order m and c_\alpha \in \mathbb R. In order that the characteristic variety of Q have more than one real point, there must be some c_\alpha positive and some negative. But this is exactly the situation of the d’Alembertian, whose principal symbol is q(\xi, \tau) = \xi^2 - \tau^2.

Thus, while the propagation of singularities theorem only assumes that the principal symbol is real, if the operator P is (for example) elliptic or parabolic, then the conclusion of the theorem is degenerate in the sense that the characteristic variety only has a single real point, so that WF(u) – WF(f) is invariant under EVERY group action on the characteristic variety, not just the bicharacteristic flow.

The interpretation of the propagation of singularities theorem is that P is something like the d’Alembertian, in which case p is something like a Lorentzian metric. The bicharacteristic flow is a flow on the characteristic bundle, which is the space whose points (x, \xi) consist of a position x and a lightlike momentum \xi. Therefore the projection of any bicharacteristic to X consists of a worldline. Thus, if the initial data is something like a Dirac mass at x, then the Dirac mass travels along the worldline containing x.

To prove the propagation of singularities theorem, we need a propagation estimate. Recall that if A is a pseudodifferential operator, then WF(A) denotes the microsupport of A; that is, the complement of the largest conic set on which A has order -\infty.

Theorem (propagation estimate). Let U be an open conic set, and let A, B, B_1 \in \Psi^0(X). Let P be a pseudodifferential operator of real principal symbol p and order m.
For every N > 0 and s \in \mathbb R there is C > 0 such that for every distribution u and every inhomogeneous term f with Pu = f,

\displaystyle ||Au||_{H^{s+m-1}} \leq C||B_1 f||_{H^s} + C||Bu||_{H^{s+m-1}} + C||u||_{H^{-N}}

given that the following criteria are met:

  1. The projection of U is precompact in X.
  2. For every (x, \xi) \in U, if p(x, \xi) = 0, then H_p and the radial vector field \xi\partial_\xi are linearly independent at (x, \xi).
  3. WF(A) and WF(B) are contained in U, while WF(1 - B_1) \cap U = \emptyset.
  4. For every trajectory (x(t), \xi(t)) of H_p with (x(0), \xi(0)) \in WF(A), there is T < 0 such that for every T \leq t \leq 0, (x(t), \xi(t)) \in U, and (x(T), \xi(T)) \in Ell(B).

The term C||u||_{H^{-N}} is an error term created by the use of pseudodifferential operators and is not interesting. The operator B_1 is a cutoff which microlocalizes the problem to a neighborhood of the conic set U. We are interested in WF(u) – WF(f), so we want WF(B_1) \cap WF(f) = \emptyset and B_1|U = 1. Actually, since we only care about the complement of WF(f), we might as well take f Schwartz, in which case we can take B_1 = 1 and simplify the propagation estimate to

\displaystyle ||Au||_{H^{s+m-1}} \leq C||f||_{H^s} + C||Bu||_{H^{s+m-1}} + \text{error terms}.

The interesting point here is the relationship between the operators A and B. We can optimize the propagation estimate by assuming that WF(B) = Ell B. This is because we really desperately want B to be elliptic on its microsupport, so that it does not introduce any new singularities. Under the assumption WF(B) = Ell B, B is a microlocalization to WF(B), and if (x, \xi) \in WF(A), then (x, \xi) got to WF(A) after passing through WF(B). The point is that if u has a singularity at (x, \xi) \in WF(A), then (if the regularity exponent s is taken large enough) ||Au||_{H^{s+m-1}} = \infty, but we assumed f Schwartz, so this implies ||Bu||_{H^{s+m-1}} = \infty, so that if we traveled back along the bicharacteristic flow (x(t), \xi(t)) from (x, \xi) for long enough, we would see that u already had a singularity at some time (x(T), \xi(T)) with T < 0.

Moreover, the propagation estimate is time-reversible in the sense that we can replace T < 0 with -T > 0. Thus the bicharacteristic flow neither creates nor destroys singularities in the distribution u. This readily implies the propagation of singularities theorem.

The proof of the propagation estimate is quite technical, and this post is meant as more of a conceptual discussion, so I will omit it.

Topology and game design

Aside from being bad at math, I am also bad at Final Fantasy XIV. So it happened that, while attempting to be less bad at Final Fantasy XIV and better understanding an aspect of one of the game’s encounters, I actually became less bad at math, and now I wonder if game developers should incorporate more involved topology into their games’ design.

Final Fantasy XIV as it is.

Let me review how raiding on Final Fantasy XIV works, for those unfamiliar. A group of eight characters, each controlled by one player, fights one “boss” monster. The boss’ attacks are frequently lethal, so to avoid a game over, the players must avoid avoidable attacks. The game is designed so that if an attack can be avoided, it can only be avoided in a particularly precise, and often opaque, manner. Examples of this include deciphering lines of iambic poetry, executing intricate but scripted movement patterns, or interpreting the way to avoid a truly bizarre instant-kill attack using obscure tidbits from the game’s lore. And don’t let yourself get distracted by the head-banging soundtrack.

A recent boss, the Shadowkeeper, introduced in Futures Rewritten, has an attack known as Giga Slash which can be solved by thinking of it as inducing an orientation on the platform in which the battle takes place.

Let me remind the reader that an orientation of a curve (a one-dimensional space) is a choice of which direction is considered “right”; an orientation of a surface (a two-dimensional space) is a choice of which direction is considered “clockwise”; an orientation of a three-dimensional space is a choice of which coordinates are considered “right-handed”; and so on. [1]

Giga Slash involves Shadowkeeper drawing a sword, with which she then slashes either to her left or to her right, depending on which hand she draws the blade with. In particular, the attack will divide the platform into two rectangles, one of which is lethal to stand in and the other of which is not. What makes Giga Slash more interesting is that frequently Shadowkeeper’s “shadow” (a separate entity), or the player characters’ shadows, will be the origin of the attack instead.

In the former case, the party must either all stand to the left or all stand to the right of the boss’s shadow, where “left” and “right” depend not on the player’s perception but on the direction the shadow faces. Thus the boss’s position induces a one-dimensional orientation on the platform, which is the orientation that one must use to resolve the attack, rather than the “natural” orientation given by the fact that there is a canonical choice of north, south, east, and west built into the game. Players often refer to the natural orientation as the “absolute positions” and to the orientations given by boss positioning (and in this case, shadow position) as “relative positions”.

The fact that the party has to deal with “relative positions” is hardly unusual. What makes Shadowkeeper more unusual is the second case, wherein four characters’ shadows are each the origin of a copy of the attack. In that case, their shadow appears as a black blob which is always to the absolute north, south, east, or west of the character, no matter how the character moves. More abstractly, the shadow can be viewed as a unit vector which originates at the character, and is translated but otherwise not acted on by player movement (and also not acted on by anything else, for that matter). One player is assigned absolute north, one absolute west, et cetera.

The point is that the character shadows will always slash to their left if the boss is holding the sword in her left hand, and vice versa. The goal of the players is to aim the slashes in such a way that there is a safe rectangle. Many guides involve trickery with rotating the camera and so on to ensure that this happens, but there is a simple solution. The hand the boss is using is equivalent to a choice of orientation on the platform. If the boss raises her left hand, then the orientation is counterclockwise; otherwise it is clockwise. Now the players must all stand so that their shadow vector is tangent to the circle centered on the boss’s hitbox, with radius a little larger than the boss’s hitbox, and oriented according to the boss’s hand, and this will ensure that the interior of the boss’s hitbox is safe, as demonstrated here. Unfortunately, because right and counterclockwise are usually the “positive” orientations in mathematics, but here right is associated with clockwise, I still frequently do this trick incorrectly…😞
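
The tangency rule can be written down as a tiny computation. The sketch below only illustrates the geometry described above and is not a model of the actual game: it assumes coordinates with east as +x and north as +y, viewed from above, with the boss at the origin, and the names SHADOWS and safe_spot are hypothetical.

```python
# Unit vectors for the four absolute directions (assumed: east = +x, north = +y).
SHADOWS = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}


def safe_spot(shadow, hand, radius=1.0):
    """Point on the circle around the boss where `shadow` is the oriented tangent.

    hand == "left" is taken to mean counterclockwise and "right" clockwise,
    following the rule stated in the text.
    """
    dx, dy = SHADOWS[shadow]
    if hand == "left":
        # The counterclockwise tangent at p is p rotated by +90 degrees,
        # so p is the shadow vector rotated by -90 degrees.
        return (radius * dy, radius * -dx)
    if hand == "right":
        # The clockwise tangent at p is p rotated by -90 degrees,
        # so p is the shadow vector rotated by +90 degrees.
        return (radius * -dy, radius * dx)
    raise ValueError("hand must be 'left' or 'right'")


for name in SHADOWS:
    print(name, safe_spot(name, "left"), safe_spot(name, "right"))
```

For example, with these conventions a player whose shadow points north stands due east of the boss when the orientation is counterclockwise, and due west when it is clockwise.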

Whether the tactic I just outlined is easier or more difficult to execute than just manipulating the camera angle, it demystified how choices of orientation “look” in practice to me. Unfortunately, while topology is my weakest area of math, one needs to choose an orientation in order to define curl [2] and so I wasted a lot of time trying to understand what the vorticity equation was actually trying to say, until Shadowkeeper cleared things up for me.

Final Fantasy XIV as it could be.

In Final Fantasy XIV, most platforms have a very simple geometry, either being a square or a circle. When there have been exceptions, the players have often exploited the geometry to avoid attacks in ways the developers did not intend to be possible, causing the developers to shy away from introducing any nontrivial geometry or topology into the fights. But the above discussion got me thinking: what if we fought a boss on a nonorientable surface, such as a Möbius strip? [3] Along with my friend Greg DeFillippo (who was the mastermind behind some of the below proposed attacks), I have tried to find out.

The first obstacle is determining which direction gravity faces. For this model, I think it’s reasonable that the gravity always face towards the Möbius strip, a la Super Mario Galaxy. However, one could also have a mechanic which reverses the flow of gravity at the whim of one of the players; then if the players are “above” the strip they need the gravity to point downwards, and if they are “below” the strip they need it to point upwards.

One simple attack could consist of a blade that sweeps across the Möbius strip, killing anyone it touches; the only way to dodge it is to simply jump to the other side of the strip. Since the Möbius strip only has one side, the blade eventually sweeps over the entire strip, forcing everyone to dodge twice.

A more interesting example requires the use of a mechanic commonly seen in Final Fantasy XIV known as “proximity”. Proximity requires that two characters be sufficiently far from each other when the attack completes, or they will die. However, since the attack is on a Möbius strip, if the players run too far they will end up close to each other, in spite of how far they have run. (This attack could also be done on a torus, as in Pac-Man, so it does not use nonorientability, but it does use the existence of a nontrivial topology.)
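
Here is a minimal sketch of why running apart can backfire. It assumes a flat model of the strip: a band of points (x, y) with 0 <= x < length, glued along its ends by (x + length, y) ~ (x, -y). The function name mobius_distance and the numbers are invented for illustration.

```python
import math


def mobius_distance(p, q, length):
    """Shortest distance between p and q on a flat Mobius strip.

    The strip is modelled as the band of points (x, y) with 0 <= x < length,
    glued along its ends by (x + length, y) ~ (x, -y).
    """
    (px, py), (qx, qy) = p, q
    best = math.inf
    for k in (-1, 0, 1):
        # Going once around the strip shifts x by `length` and flips y.
        lifted_y = qy if k == 0 else -qy
        best = min(best, math.hypot(px - (qx + k * length), py - lifted_y))
    return best


# Two characters start together at x = 5 and run 4.5 units in opposite directions.
a, b = (0.5, 0.0), (9.5, 0.0)
print(abs(a[0] - b[0]))              # naive separation: 9.0
print(mobius_distance(a, b, 10.0))   # separation on the strip: 1.0
```

Dropping the flip of y (that is, always using qy) gives the corresponding distance on a cylinder, which is why this particular mechanic does not really need nonorientability.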

Another example uses a mechanic known as “forced march”. This assigns an arrow to each character which causes them to run in that direction relative to the direction they are facing at the start of the attack. For example, a character that is facing towards true north and assigned a right arrow will run east at the start of the attack. The goal is for the player to face their character in a direction so that they avoid the (possibly several) other attacks that go out at the same time as the forced march. This requires the player to think about orientation; but this becomes much harder to do when the platform itself is nonorientable! For example, if the arrow faced right and the character faced true north to avoid an attack to the west, the character would run east, but then find themself in the west, exactly where they did not want to be.

The forced march can be modified to more strongly use nonorientability. One can locally define what it means to be clockwise, say on the top of the Möbius strip, and this will contradict what it means to be clockwise on the bottom of the strip. If the forced march, instead of a straight line, forced characters to run in a circle, the forced march would have two different effects if the character was on the top or the bottom of the strip. (If the character was on the side of the strip, either the attack would have to kill them instantly, or simply be completely unpredictable.)

I’d love to see other examples of mechanics that are designed for platforms with nontrivial topology. If you can cook up any particularly cruel examples, post them in the comments below 😈

Technical notes.

[1] More abstractly, recall that if A \in GL(d) is a d \times d invertible matrix, then the determinant of A is either positive, in which case we say it is orientation-preserving, or negative, in which case we say it is orientation-reversing. A change of coordinates is said to be orientation-preserving (resp. reversing) if its Jacobian matrix is orientation-preserving (resp. reversing). Thus on a connected orientable manifold there exist two possible orientations: in the low-dimensional cases, right and left, clockwise and counterclockwise, and right-handed and left-handed.
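
As a small illustration of the determinant test in the two-dimensional case (a sketch only; the helper names det2 and orientation are made up here):

```python
import math


def det2(a):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def orientation(a):
    """Classify an invertible 2 x 2 matrix by the sign of its determinant."""
    d = det2(a)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return "preserving" if d > 0 else "reversing"


theta = math.pi / 3
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
reflection = [[1, 0], [0, -1]]  # flips the y-axis

print(orientation(rotation))    # preserving
print(orientation(reflection))  # reversing
```

A rotation keeps clockwise clockwise, while a reflection swaps clockwise and counterclockwise.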

[2] The curl of a vector field V is by definition the Hodge dual of the derivative of the Hodge dual of V, and Hodge duality is only defined up to a choice of orientation. A much more concrete definition of curl is to first declare that if V is a vector field on a surface, then the curl of V is the angular momentum of a unit mass particle whose velocity field is V, and then if V is a vector field on a three-dimensional space, then the curl of V in the direction of a unit tangent vector e is the curl of V in the plane e^\perp. The trouble is that angular momentum of a particle rotating in positive orientation is by definition positive, so one first needs to decide what one means by positive orientation.

[3] Originally I wanted to do this on a Klein bottle but could not determine how to depict raids on a surface that does not embed in three-dimensional space.

What I want to learn, Spring 2021

As much for my own future reference as for anything, here’s a summary of some things I’d like to learn, maybe not this season, but soon.

First on the docket, I’d like to learn Vasy’s method. This is a technique for meromorphically continuing the resolvent of the Schrödinger operator on an asymptotically hyperbolic manifold (that is, a manifold which, near its boundary, looks like the Poincaré model of hyperbolic space does near its boundary). A priori the definition of the resolvent only makes sense on a small open subset of the complex plane, and one hopes to show that the definition of the resolvent makes sense on the entire plane, except possibly a discrete set of poles.

On a somewhat similar note, I’d like to learn the Atiyah-Singer index theorem. This theorem equates the Fredholm index of an elliptic pseudodifferential operator on a line bundle L to its “topological index”, which is a rational number defined in terms of the cohomology of L. This is largely motivated by my quest to understand the sense in which cohomology counts solutions to PDE, cf. my recent post on the genera of Riemann surfaces. I previously tried to learn the heat-kernel proof of Atiyah-Singer shortly after I first learned about pseudodifferential operators but got nowhere. This time, I will be armed with the knowledge of the Riemann-Roch theorem, which may make all the difference.

Unlike the previous two requests, which are both PDE-analytic in nature, I think that my knowledge of complex analysis has prepared me to learn the proof that there are twenty-seven lines on a cubic surface in \mathbb P^3. This would entirely be for fun, and I may blog about it, so as to tell the story of a hapless analyst faffing around hopelessly in deep algebra.

Finally, I would like to fix up and publicize the Sage code that is mentioned by my paper on computation of Kac-Moody root multiplicities with Joshua Lin and Peter Connick. I suspect that this will require learning some nontrivial representation theory and complexity theory, though in its current form the algorithm is essentially a consequence of elementary facts about quadratic forms over \mathbb Z.

Elliptic regularity implies that compact genera are finite

A few years ago I took a PDE course. We were learning about something to do with elliptic pseudodifferential operators and the speaker drew a commutative diagram on the board and said, “You see, this comes from a short exact sequence –” and the whole room started laughing in discomfort. The speaker then remarked that Craig Evans himself would ban him from teaching analysis if word of the incident ever leaked, which might have something to do with why I have not disclosed the speaker’s name 🥵

Until recently, I found topology to be quite a scary area of math. It is still very much my weakest suit, but I should like to have some amount of competency with it. I have since come around to the viewpoint that cohomology is just a clever gadget for counting solutions of PDE. This has made the pill a little easier to swallow, and makes the previous anecdote all the more awkward.

As part of my ventures into trying to learn topology, in this post I will give a proof that the genus of any compact Riemann surface is finite. I am confident that this proof is not original, because it’s sort of the obvious proof if an analyst trying to prove this fact just followed their nose, but it seems a lot more natural to me than the proof in Forster, so let’s do this.

[Since the time of writing, I have made some corrections to incorrect or confusing statements. Thanks to Sarah Griffith for pointing these out!]

Let us start with some generalities. Fix a compact Riemann surface {X}, references to which we will suppress when possible. Let

\displaystyle 0 \rightarrow A \rightarrow B \rightarrow C \rightarrow 0

be a short exact sequence of sheaves. In our case, the sheaves will be sheaves of Fréchet spaces on {X}, which might not be homologically kosher, but that won’t cause any real issues. Then we get a long exact sequence in cohomology

\displaystyle 0 \rightarrow H^0(A) \rightarrow H^0(B) \rightarrow H^0(C) \rightarrow H^1(A) \rightarrow H^1(B) \rightarrow H^1(C) \rightarrow \cdots.

If B is a fine sheaf, i.e. it has partitions of unity subordinate to every open cover, then {H^1(B) = 0} and the long exact sequence collapses to the exact sequence

\displaystyle 0 \rightarrow H^0(A) \rightarrow B(X) \rightarrow C(X) \rightarrow H^1(A) \rightarrow 0.

In particular, the morphism of sheaves {B \rightarrow C} induces a bounded linear map {T: B(X) \rightarrow C(X)} such that {H^0(A)} is the kernel of {T} and {H^1(A)} is the cokernel of {T}. Now, if {T} is a Fredholm operator, then its index {k} satisfies

\displaystyle k = \text{dim } H^0(A) - \text{dim } H^1(A).

Let {\mathcal O} denote the sheaf of holomorphic functions on {X} and {\overline \partial} the Cauchy-Riemann operator. Let {\mathcal E} denote the sheaf of smooth functions on {X}; since {X} has enough partitions of unity, {\mathcal E} is a fine sheaf. The maps {\overline \partial: \mathcal E(U) \rightarrow \mathcal E(U)}, for {U \subseteq X} open, induce a short exact sequence of sheaves of Fréchet spaces

\displaystyle 0 \rightarrow \mathcal O \rightarrow \mathcal E \rightarrow \mathcal E \rightarrow 0

and hence an exact sequence in cohomology

\displaystyle 0 \rightarrow \mathbf C \rightarrow \mathcal E(X) \rightarrow \mathcal E(X) \rightarrow H^1(\mathcal O) \rightarrow 0.

Here we used Liouville’s theorem. On the other hand, the dimension of {H^1(\mathcal O)} is by definition the genus {g} of {X}. Therefore, if {k} is the Fredholm index of {\overline \partial}, then

\displaystyle g = 1 - k.
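
For instance, taking for granted the standard facts that the Riemann sphere has genus 0 and a complex torus has genus 1, the formula reads

\displaystyle k_{\text{sphere}} = 1 - 0 = 1, \qquad k_{\text{torus}} = 1 - 1 = 0.

In the first case {\overline \partial} has a one-dimensional kernel (the constants) and no cokernel; in the second case both its kernel and its cokernel are one-dimensional.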

It remains to show that {k} is well-defined and finite; that is, {\overline \partial} is Fredholm. This is a standard elliptic regularity argument, which I will now recall. We first fix a volume form {dV} on {X}, which exists since {X} is an orientable surface. This induces an {L^2} norm on {X}, namely

\displaystyle ||u||_{L^2}^2 = \int_X |u|^2 ~dV.

Unfortunately the usual Sobolev notation {H^s} clashes with the notation for cohomology, so let me use {W^s} to denote the completion of {\mathcal E} under the norm

\displaystyle ||u||_s = \sum_{|\alpha| \leq s} ||\partial^\alpha u||_{L^2}

where {\alpha} ranges over multiindices. Then {W^0 = L^2} and {\overline \partial} maps {W^1 \rightarrow W^0}. The kernel of {\overline \partial} is finite-dimensional (since it is isomorphic to {\mathbf C}, by Liouville’s theorem and Weyl’s lemma), so to deduce that {\overline \partial} is Fredholm as an operator {W^1 \rightarrow W^0} it suffices to show that the cokernel of {\overline \partial} is finite-dimensional.

We first claim the elliptic regularity estimate

\displaystyle ||u||_1 \leq C ||f||_0 + C ||u||_0

for any smooth functions u,f which satisfy {\overline \partial}u = f. By definition of the Sobolev norm, we have

\displaystyle ||u||_1 = ||u||_0 + ||u'||_0 + ||f||_0.

Without loss of generality, we may assume that {u} is smooth. Then we can write {u = v + w} where {v} and {\overline w} are holomorphic. In particular, {u' = v'} and {f = \overline \partial w}, so

\displaystyle ||u||_1 = ||u||_0 + ||v'||_0 + ||f||_0.

The only troublesome term here is {v'}. Taking a Cauchy estimate, we see that

\displaystyle |v'(z)| \leq ||v||_{L^\infty} \leq C||v||_{L^2} = C||v||_0.

But {X} is compact, so has finite volume; therefore

\displaystyle ||v'||_0 = ||v'||_{L^2} \leq C||v||_{L^\infty} \leq C||v||_0 \leq C||u||_0.

This gives the desired bound.

Let {u_n} be a sequence in {W^1} with {f_n = \overline \partial u_n \in W^0}, and assume that the {f_n} are Cauchy in {W^0}. Without loss of generality we may assume that {u_n \in K^\perp} where {K} is the kernel of {\overline \partial}. If the {u_n} are not bounded in {W^1}, we may replace them with {u_n/||u_n||_1}, and thus assume that they are in fact bounded. By the Rellich-Kondrachov theorem (which says that the natural map {W^1 \rightarrow W^0} is compact), we may therefore assume that the {u_n} are Cauchy in {W^0}. But then

\displaystyle ||u_n - u_m||_1 \leq C ||f_n - f_m||_0 + C ||u_n - u_m||_0

so the {u_n} are Cauchy in {W^1}. Therefore the {u_n} converge in {K^\perp}, hence the {f_n} converge in the image {Z} of {\overline \partial}, since {\overline \partial} gives an isomorphism {K^\perp \rightarrow Z}. Therefore {Z} is closed.

If one applies integration by parts to {\overline \partial}, the fact that X has no boundary implies that for any f,g,

\displaystyle \langle \overline \partial f, g\rangle = \int_X \overline \partial f \overline g ~dV = -\int_X f \overline{\partial g} ~dV = -\langle f, g'\rangle

and thus \overline \partial^* = -\partial. Since Z is closed, the dual of the cokernel of {\overline \partial} is the kernel L of -\partial; by the Rellich-Kondrachov theorem, the unit ball of L is compact and therefore L is finite-dimensional. By the Hahn-Banach theorem, this implies that the cokernel of {\overline \partial} is finite-dimensional. Therefore {k} and hence {g} is finite.

A PDE-analytic proof of the fundamental theorem of algebra

The fundamental theorem of algebra is one of the most important theorems in mathematics, being core to algebraic geometry and complex analysis. Unraveling the definitions, it says:

Fundamental theorem of algebra. Let f be a polynomial over \mathbf C of degree d. Then the equation f(z) = 0 has d solutions z, counting multiplicity.

Famously, most proofs of the fundamental theorem of algebra are complex-analytic in nature. Indeed, complex analysis is the natural arena for such a theorem to be proven. One has to use the fact that \mathbf R is a real closed field, but since there are lots of real closed fields, one usually defines \mathbf R in a fundamentally analytic way and then proves the intermediate value theorem, which shows that \mathbf R is a real closed field. One can then proceed by tricky algebraic arguments (using, e.g. Galois or Sylow theory), or appeal to a high-powered theorem of complex analysis. Since the fundamental theorem is really a theorem about algebraic geometry, and complex analysis sits somewhere between algebraic geometry and PDE analysis in the landscape of mathematics (and we need some kind of analysis to get the job done; purely algebro-geometric methods will not be able to distinguish \mathbf R from another field K such that -1 does not have a square root in K) it makes a lot of sense to use complex analysis.

But, since complex analysis sits between algebraic geometry and PDE analysis, why not abandon all pretense of respectability (that is to say, algebra; analysis is not a field worthy of the respect of a refined mathematician) and give a PDE-analytic proof? Of course, this proof will end up “looking like” multiple complex-analytic proofs, and indeed it is basically the proof by Liouville’s theorem dressed up in a trenchcoat (and in fact, gives Liouville’s theorem, and probably some other complex-analytic results, as a byproduct). In a certain sense (effectiveness) this proof is strictly inferior to the proof by the argument principle, and in another sense (respectability) this proof is strictly inferior to algebraic proofs. However, it does have the advantage of being easy to teach to people working in very applied fields, since it uses only the machinery of PDE analysis, rather than fancy results such as Liouville’s theorem or the Galois correspondence.

The proof
By induction, it suffices to prove that if f is a polynomial with no zeroes, then f is constant. So suppose that f has no zeroes, and introduce g(z) = 1/f(z). As usual, we want to show that g is constant.

Since f is a polynomial, it does not decay at infinity, so g(\infty) is finite. Therefore g can instead be viewed as a function on the sphere, g: S^2 \to \mathbf C, by stereographic projection. Also by stereographic projection, one can cover the sphere by two copies of \mathbf R^2, one centered at the south pole that misses only the north pole, and one centered at the north pole that only misses the south pole. Thus one can define the Laplacian, \Delta = \partial_x^2 + \partial_y^2, in each of these coordinates; it remains well-defined on the overlaps of the charts, so \Delta is well-defined on all of S^2. (In fancy terminology, which may help people who already know ten different proofs of the fundamental theorem of algebra but will not enlighten anyone else, we view S^2 as a Riemannian manifold under the pushforward metric obtained by stereographic projection, and consider the Laplace-Beltrami operator of S^2.)

Recall that a function u is called harmonic provided that \Delta u = 0. We claim that g is harmonic. The easiest way to see this is to factor \Delta = 4\partial\overline \partial where 2\partial = \partial_x - i\partial_y. Then \overline \partial u = 0 exactly if u has a complex derivative, by the Cauchy-Riemann equations. There are other ways to see this, too, such as using the mean-value property of harmonic functions and computing the antiderivative of g. In any case, the proof is just calculus.
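
For the skeptical, the claim can also be checked numerically. The sketch below uses a five-point finite-difference Laplacian and, purely as an example, the polynomial f(z) = z^2 + 1. This f does have zeroes, unlike the hypothetical f in the proof, but harmonicity is a local statement, so 1/f should still be harmonic away from those zeroes. The function names are made up for illustration.

```python
def laplacian(u, z, h=1e-3):
    """Five-point finite-difference approximation of (d_x^2 + d_y^2) u at z."""
    return (u(z + h) + u(z - h) + u(z + 1j * h) + u(z - 1j * h) - 4 * u(z)) / h ** 2


def g(z):
    """Reciprocal of the sample polynomial f(z) = z^2 + 1 (an arbitrary choice)."""
    return 1 / (z * z + 1)


def not_harmonic(z):
    """|z|^2 = x^2 + y^2, which has Laplacian 4 and so is not harmonic."""
    return abs(z) ** 2


z0 = 2 + 1j  # a point away from the zeroes +i and -i of f
print(abs(laplacian(g, z0)))             # close to 0
print(abs(laplacian(not_harmonic, z0)))  # close to 4
```

The second function is included only as a control, to show that the finite-difference Laplacian does not vanish on everything.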

So g is a harmonic function on the compact connected manifold S^2; by the extreme value theorem, g has (or more precisely, its real and imaginary parts have) a maximum. By the maximum principle of harmonic functions (which is really just the second derivative test — being harmonic generalizes the notion of having zero second derivative), it follows that g is equal to its maximum, so is constant. (In fancy terminology, we view g as the canonical representative of the zeroth de Rham cohomology class of S^2 using the Hodge theorem.)