What’s wrong with the Museum of Math?

I’d like to bring attention to an open letter cosigned by several staff members of the National Museum of Mathematics (hereinafter MoMath) and addressed to its board of directors and CEO, Cindy Lawrence. In the comments on a blog post by Sam Shah, several other staff members corroborate the allegations in the letter.

While you really should read the open letter and comments yourself, I would like in particular to stress how outrageous the allegation about MoMath’s policy concerning Title I schools is. Recall that Title I schools are those public schools which have been identified as having a large proportion of low-income students, and which have been given additional funding from the US Department of Education in order to promote those students’ education. If the allegation is true, MoMath offers scholarships to allow Title I schools to have field trips to MoMath for free, but then discriminates against them by giving them shorter educational sessions, so that students will not have time to solve the problems that they are posed. This can only serve to discourage them from mathematics, leaving everyone worse off.

Math is for everybody, and this is more than just a flashy slogan. Public American K-12 education is notorious for spreading the philosophy that mathematics is an innate ability, rather than a skill that can be trained; this creates a clear inequity between those children (typically of wealthier, more educated parents) who believe that they can do mathematics, and those who do not, which later carries over to income inequality in adulthood. Moreover, one cannot really do away with incentives for students to learn mathematics. On a less economic and more aesthetic level, MoMath’s mission statement proposes to encourage a “broad and diverse audience to understand the evolving, creative, human, and aesthetic nature of mathematics” — a task that it has evidently failed at.

My high school was woefully underfunded, though not Title I. Our treatment of mathematics was shallow and only a tiny percentage of my class ended up taking any math beyond Calculus 1 in high school. I really had no idea what mathematics was, or that one could pursue it as a career — I ended up in this business somewhat by accident! Something like MoMath would have been a wonderful experience for me, and probably many of my classmates who never really learned what mathematics was. The same holds, I suspect, for many students at Title I schools. But even those who are able to visit MoMath will have any benefits from the trip denied to them.

Linear algebra done dubiously

A book that has been a contentious topic of discussion is Linear Algebra Done Right, by Axler. The reason, at least ostensibly[1], is that Axler’s treatment avoids the discussion of determinants. For his part, Axler seems to play this up, marketing the book as a revolutionary treatment in which determinants are not discussed. Apparently, Sergei Treil found this marketing so offensive that he wrote a competing textbook known as Linear Algebra Done Wrong.

I do not quite buy the hype here. There’s a whole chapter on determinants in Axler’s book, which even includes a discussion of Jacobian determinants. Axler just doesn’t use determinants to prove the three main theorems of intermediate linear algebra over an algebraically closed field \overline K, namely the fact that every linear operator has an eigenvalue, that every linear operator has a unique Jordan canonical form, and the Cayley-Hamilton theorem. In all of these cases, one could prove the theorem using determinants, but there’s no good reason to, since there is a perfectly reasonable structure theory of linear operators over \overline K which does not mention determinants, and it gives fairly easy and conceptual proofs of all three theorems.

(I don’t think Axler’s book is perfect, for the record. Most annoyingly, he doesn’t seem to clearly distinguish between theorems that are valid over general \overline K and theorems that are specifically valid over \overline K = \mathbb C, which is the case for most of the results in the latter half of the book, except for one single chapter about the structure theory over \mathbb C. But I do think that a lot of the angry comments I’ve seen about the book on Reddit and elsewhere, which mainly focus on the issue of determinants, are just totally out to lunch.)

Anyways, it occurred to me today that the way I like to think about linear algebra neither involves determinants nor Axler’s structure theory, but is rather a complex-analytic version of linear algebra. I don’t think it essentially uses complex analysis though, and could probably be adapted to general \overline K.

The point is to consider the resolvent R(z) = (T - z)^{-1} of the linear operator T acting on a vector space V of dimension n, which is a rational map from \mathbb P^1 to a space of matrices. Clearly an eigenvalue is a pole of R, and the number of poles equals the number of zeroes (this is clearly true when \overline K = \mathbb C, but I suspect it is true for arbitrary \overline K). Since R has a zero of order n at \infty, T must have n eigenvalues. (If \overline K = \mathbb C, Rouché’s theorem even gives a bound on the size of the eigenvalues, and a way to compute approximations to the eigenvalues.)

Now we have n eigenvalues z_1, \dots, z_n counted with multiplicity. If \overline K = \mathbb C, we may consider loops \gamma_j around the puncture of \mathbb C \setminus \{z_j\} and define P_j = \frac{z_j}{2\pi i} \int_{\gamma_j} R(z) ~dz and similarly N_j = \frac{1}{2\pi i} \int_{\gamma_j} (z - z_j)R(z) ~dz. It is now a straightforward consequence of Cauchy’s integral formula that N_j is nilpotent and we have T = \sum_j (P_j + N_j). Furthermore, if V_j is the image of P_j, then T acts on V_j as z_j + N_j, and we have a direct sum decomposition V = \bigoplus_j V_j. That implies that T = \sum_j (P_j + N_j) exhibits the Jordan decomposition of T, from which the Jordan canonical form can be read off. Let me leave the details to these notes of Knill. I would be very interested to see if an argument like this can be used in the general case of \overline K an algebraically closed field, possibly by replacing \gamma_j by the generator of some algebraic analogue of the fundamental group of the “open subscheme” (if that makes any sense) \overline K \setminus \{z_j\}, and replacing the differential form (z - z_j)R(z) ~\frac{dz}{2\pi i} with some sort of algebraic analogue of cohomology.
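
Since these are just contour integrals, they can be checked numerically. Below is a minimal sketch in Python using the more common convention R(z) = (z - T)^{-1} and the plain Riesz projectors, without the factor of z_j that appears above, so the decomposition being verified is T = \sum_j (z_j P_j + N_j); the matrix, the contours, and the quadrature rule are illustrative choices, not part of the argument.

```python
import numpy as np

# Toy check of the contour-integral description of the spectral decomposition.
# Convention here: R(z) = (z*I - T)^{-1}, Riesz projector P_j = (1/2*pi*i) * contour
# integral of R(z), nilpotent part N_j = (1/2*pi*i) * contour integral of (z - z_j)*R(z).

T = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])   # a Jordan block at 2 plus a simple eigenvalue at 5

def contour_integral(f, center, radius, n_points=2000):
    """Integrate f over a circle around `center` and divide by 2*pi*i."""
    thetas = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    zs = center + radius * np.exp(1j * thetas)
    dz = 1j * radius * np.exp(1j * thetas) * (2.0 * np.pi / n_points)
    total = sum(f(z) * w for z, w in zip(zs, dz))
    return total / (2.0j * np.pi)

I = np.eye(3, dtype=complex)
resolvent = lambda z: np.linalg.inv(z * I - T)

P2 = contour_integral(resolvent, center=2.0, radius=1.0)                 # projector at z_j = 2
N2 = contour_integral(lambda z: (z - 2.0) * resolvent(z), 2.0, 1.0)      # nilpotent part at 2
P5 = contour_integral(resolvent, center=5.0, radius=1.0)                 # projector at z_j = 5

print(np.allclose(P2 @ P2, P2), np.allclose(N2 @ N2, 0))                 # True True
print(np.allclose(T, 2.0 * P2 + N2 + 5.0 * P5))                          # True: T = sum_j z_j P_j + N_j
```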

It remains to prove the Cayley-Hamilton theorem. (This proof, which was shown to me by Charles Pugh, is what got me thinking about linear algebra in this fashion in the first place.) Recall that the Cayley-Hamilton theorem says that if p is the characteristic polynomial of T, so that the zeroes of p are the eigenvalues of T, then p(T) = 0. This is obviously true if T is diagonalizable.

Now, the set of diagonalizable matrices is dense, because for example it includes the set of matrices with distinct eigenvalues, which is a generic set. On the other hand, the set Z of matrices with the Cayley-Hamilton property is closed, since the map T \mapsto p(T) is continuous. A closed set containing a dense set is everything, so Z is the space of all matrices. This argument ostensibly works over \overline K = \mathbb C, but with a little work, it also holds for arbitrary \overline K, because we may use the Zariski topology.
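
As a quick numerical spot-check of the statement (not, of course, of the density argument), one can evaluate the characteristic polynomial of a random matrix at the matrix itself; the 4-by-4 Gaussian matrix below is an arbitrary choice.

```python
import numpy as np

# Spot-check of Cayley-Hamilton: evaluate the characteristic polynomial p of a random
# matrix T at T itself and verify that p(T) is numerically zero.
rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4))

coeffs = np.poly(T)        # coefficients of det(z*I - T), leading coefficient first
p_of_T = sum(c * np.linalg.matrix_power(T, len(coeffs) - 1 - k)
             for k, c in enumerate(coeffs))

print(np.allclose(p_of_T, 0))   # True, up to floating-point error
```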

This would be a pretty horrible way to teach linear algebra, but maybe one could simplify it so that it’s not so horrible.

[1] Axler has a signature writing style, quite clear and amiable, unlike most older textbooks. How much of the actual debate here is just Bourbakists in shambles?

Much ado about large cardinals

Lately, with Peter Scholze’s MathOverflow post about Grothendieck universes and the Isabelle/HOL implementation of schemes, it seems that in the sphere of online math there has been a somewhat renewed interest in when large cardinals make proving theorems easier. (Specifically, it is not necessary that one actually needs the large cardinals to prove the theorem — only that they make the proof easier!) So I thought it would be fun to look through some old homework of mine and see if I could find an example where, if I had allowed myself the use of a large cardinal, my life would have been easier. I found an example from when I took a course in C*-algebras a few years ago.

Let X be a locally compact Hausdorff space. By a compactification of X we mean an open dense embedding X \to Y where Y is a compact Hausdorff space. By Alexandroff’s theorem, X always has a compactification, but in general if X is not compact then X may have multiple compactifications. We consider the category Comp X of compactifications of X equipped with continuous surjections which preserve X; the Alexandroff compactification is the final object of Comp X.

The Stone–Čech theorem. The category Comp X has an initial object.

One may show that the initial object of Comp X is \text{Spec } C_b(X), where C_b(X) is the C*-algebra of bounded continuous functions on X with its supremum norm, and the functor Spec is taken in the sense of C*-algebras (thus Spec A consists of maximal closed ideals equipped with the Zariski-Jacobson topology). This proof is presumably inoffensive to anyone who accepts ZFC (and offensive to anyone who does not, since one needs Zorn’s lemma to show that C_b(X) has a maximal ideal in general — and ZF alone cannot prove that Comp X has an initial object).

However, for the purposes of the result I was trying to prove, I needed a proof of the Stone–Čech theorem that did not rely on the existence of \text{Spec } C_b(X), or else my argument would have been circular. To do this, one proceeds as follows. If Z \to Y is a morphism in Comp X, then since X is dense in Z, the underlying continuous surjection Z \to Y is completely determined by its behavior on X, but it is also the identity on X. Therefore Comp X is a poset category. Let \mathcal C be a chain in Comp X; then \mathcal C is an inverse system of topological spaces, and if C is the inverse limit of \mathcal C, then one can show that there is a closed embedding C \to \prod \mathcal C. Since \prod \mathcal C is a compact Hausdorff space by Tychonoff’s theorem, so is C. Taking the inverse limits of the open dense embeddings X \to Y, where Y \in \mathcal C, we obtain an open dense embedding X \to C, so C is an upper bound of \mathcal C in Comp X.

At this point, one may proceed in two ways. Working in ZFC, it is only valid to apply Zorn’s lemma if Comp X is equivalent to a small category, but \text{Comp } \mathbb N is a large category. To see that Comp X is equivalent to a small category, it suffices to show that there is a cardinal \kappa such that every compactification of X has at most \kappa points; then for every compactification Y of X, one can find a compactification Z of X such that Y \cong Z in Comp X, and the set-theoretic rank of Z is at most \kappa, and so Comp X is a subset of the set V_\kappa. Furthermore, if Y is a compactification of X and y \in Y, then, since X is dense in Y, by the boolean prime ideal theorem there is an ultrafilter U on the set Open X of open subsets of X such that \lim U = y. Since Y is Hausdorff, it follows that y is the UNIQUE limit of U, but some cardinal arithmetic can be used to show that if \lambda is the cardinality of X, then there are only 2^{2^\lambda} ultrafilters on Open X (since elements of an ultrafilter on Open X are open subsets of X), so the cardinality of Y is at most 2^{2^\lambda}. Therefore we may let \kappa = 2^{2^{\lambda}}.
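
To spell out the counting at the end of the previous paragraph: every point of Y is the limit of at least one ultrafilter on Open X, each such ultrafilter has at most one limit since Y is Hausdorff, and an ultrafilter on Open X is in particular a subset of Open X with \text{card Open } X \leq 2^\lambda; therefore

\displaystyle \text{card } Y \leq \text{card } \mathcal P(\text{Open } X) = 2^{\text{card Open } X} \leq 2^{2^\lambda}.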

Okay, that was stupid. We can also proceed by large cardinals. The following argument feels much more conceptual to me:

Definitions. Let \delta > \aleph_0 be a regular cardinal. We say that \delta is an inaccessible cardinal if for every cardinal \lambda < \delta, 2^\lambda < \delta. We say that \delta is a hyperinaccessible cardinal if \delta is an inaccessible cardinal and there is an increasing chain of inaccessible cardinals \delta_\alpha such that \lim_\alpha \delta_\alpha = \delta.

Let \delta be a hyperinaccessible cardinal and suppose that \text{card }X < \delta. Then there are inaccessible cardinals \text{card }X < \kappa < \kappa' < \delta. If X \in V_\kappa and Y is a compactification of X, then Y can be obtained as an extension of the Alexandroff compactification by splitting nets, but V_\kappa is a Grothendieck universe and so the topology of X can be already probed by nets in V_\kappa; therefore Y \in V_\kappa. Therefore \text{Comp } X \subseteq V_\kappa is a small category in V_{\kappa'}, so X has a Stone–Čech compactification \beta X with \text{card } \beta X < \kappa' < \delta.

This argument looks verbose, but only because I have written out the details; I think in practice I would just say that if X lies underneath an inaccessible cardinal \kappa, then enough nets to probe the topology of X are also under \kappa, so every compactification is as well.

Sundry facts about pseudodifferential operators

In this blog post I will just record some things I’ve been trying to learn about lately, largely just so I can have a place to collect my thoughts. Most of this is in Hörmander’s monograph on differential operators, and is motivated by trying to understand Vasy’s method and Atiyah-Singer index theory.

Pseudodifferential operators on manifolds.

Let us recall that a symbol on an open subset X of \mathbb R^d is by definition a smooth function on the cotangent bundle of X (for which certain seminorms are finite). This was curious to me — you can motivate it by saying that a symbol is an observable and the cotangent bundle is “phase space” in the sense that a point (x, \xi) \in T^*X consists of a position x and a momentum \xi, but why should the momentum live in a cotangent space and not the fiber of some other vector bundle? When we quantize a symbol a, defining an operator a(x, D) by formally substituting the differential operator D = -i\nabla in place of the momentum, we by definition obtain a pseudodifferential operator. Now let \kappa: X \to Y be a diffeomorphism, and introduce the pushforward symbol \kappa_* a(y, \eta) = e^{-iy\eta} a(\kappa^{-1}(y), D) e^{iy\eta}. This is the “right” definition in the sense that (\kappa_* a(y, D)(u \circ \kappa^{-1}))(\kappa(x)) = (a(x, D)u)(x).

If a is a symbol of order m, then \kappa_* a(y, \eta) = a(\kappa^{-1}(y), \kappa'(\kappa^{-1}(y))^t \eta) modulo symbols of order m - 1. But \kappa'(x) is invariantly defined as an isomorphism of tangent spaces \kappa'(x): T_xX \to T_{\kappa(x)}Y, so its transpose is an isomorphism \kappa'(x)^t: T^*_{\kappa(x)}Y \to T^*_xX of the dual spaces. This only makes sense if \eta \in T^*_yY is a covector at y.
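
Here is a small symbolic sanity check of the transformation law in the simplest nontrivial case: the operator D_x^2 on \mathbb R, with symbol \xi^2, pushed forward under a diffeomorphism. This is only a sketch; the diffeomorphism \kappa(x) = x^3 + x and the test functions are arbitrary choices, and all that is being checked is that the discrepancy is of one order lower.

```python
import sympy as sp

# Check, for A = D_x^2 = -(d/dx)^2 with symbol a(x, xi) = xi^2, that the pushforward of A
# under y = kappa(x) agrees with the quantization of the transformed symbol
# a(kappa^{-1}(y), kappa'(kappa^{-1}(y)) eta) = (kappa'(x) eta)^2 modulo an operator of one
# order lower; the leftover first-order term is -kappa''(x) (dw/dy)(kappa(x)).

x, y = sp.symbols('x y', real=True)
kappa = x**3 + x                      # an illustrative diffeomorphism of R
kp = sp.diff(kappa, x)

def remainder(w):
    """Pushforward of A applied to w, minus the quantization of the transformed
    principal symbol, minus the expected first-order correction; should vanish."""
    pushforward = -sp.diff(w(kappa), x, 2)                      # (A (w o kappa))(x)
    principal = -kp**2 * sp.diff(w(y), y, 2).subs(y, kappa)     # kappa'(x)^2 (D_y^2 w)(kappa(x))
    first_order = -sp.diff(kappa, x, 2) * sp.diff(w(y), y).subs(y, kappa)
    return sp.simplify(pushforward - principal - first_order)

print(remainder(sp.sin), remainder(sp.exp))   # 0 0
```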

The above paragraphs are totally obvious, and yet puzzled me for the past three years, until last week when I sat down and decided to work out the details for myself.

The consequence is that we cannot define the symbol of a pseudodifferential operator invariantly. Rather, we declare that an operator A on the manifold is a pseudodifferential operator if, for every chart \kappa: X \to Y and every pair of cutoffs \phi, \psi on Y, the operator \phi \circ \kappa_* \circ A \circ \kappa^* \circ \psi is a pseudodifferential operator on Y (in the sense that it is the quantization of a symbol on Y; here the pushforward \kappa_* is defined to be the inverse of the pullback \kappa^*). Since Y is an open subset of \mathbb R^d this makes sense.

Previously we have discussed pseudodifferential operators on a manifold M. These can be viewed more abstractly as acting on sections of the trivial line bundle M \times \mathbb C. However, in geometry one frequently has to deal with sections of more general vector bundles over M. For example, a 1-form is a section of the cotangent bundle. If E, F are vector bundles over M of rank r, s respectively, one may define the Hom-bundle Hom(E, F), which is locally isomorphic to the product of the base with the space of s \times r matrices. Then a pseudodifferential operator from sections of E to sections of F is nothing more than a linear map which, after trivialization of E and F, looks like an s \times r matrix of pseudodifferential operators on M. The principal symbol of such an operator sends the cotangent bundle of M into the Hom-bundle Hom(E, F).

Wavefront sets.

In this section we will impose that all pseudodifferential operators have Schwartz kernels K such that the projections of supp K are both proper maps. Modulo the space \Psi^{-\infty} of pseudodifferential operators of order -\infty, this assumption is no loss of generality. Under this assumption, the top-order term of a symbol — that is, the principal symbol — satisfies the pushforward formula \kappa_* a(y, \eta) = a(\kappa^{-1}(y), \kappa'(\kappa^{-1}(y))^t \eta), so the principal symbol is well-defined as an element of S^m/S^{m-1} (here S^\ell is the \ellth symbol class). The principal symbol encodes important information about the nature of the operator; for example we have:

Definition. An elliptic pseudodifferential operator of order m is one whose principal symbol is \sim |\xi|^m near infinity of each cotangent space.

The important property is that if A is an elliptic pseudodifferential operator, then A is also invertible modulo the quantization \Psi^{-\infty} of S^{-\infty}. For example the Laplace-Beltrami operator is elliptic on Riemannian manifolds since its symbol is \xi^2; since the quadratic form induced by a Lorentzian metric is not positive-definite, it follows that on Lorentzian manifolds, the Laplace-Beltrami operator is not elliptic. Since a Lorentzian Laplace-Beltrami operator is really just the d’Alembertian, whose symbol is \xi^2 - \tau^2, this should be no surprise.

Recall that a conic set in a vector space is a set which is closed under multiplication by positive scalars. A conic set in a vector bundle, then, is one which is conic in every fiber.

Definition. Let a be the principal symbol of a pseudodifferential operator A of order m. We say that A is noncharacteristic near (x_0, \xi_0) \in T^*M if there is a conic neighborhood of (x_0, \xi_0) wherein a(x, \xi) \sim |\xi|^m near infinity. Otherwise, we say that (x_0, \xi_0) is a characteristic point. The set of characteristic points is denoted Char A and the set of noncharacteristic points is denoted Ell A.

Thus a pseudodifferential operator A is noncharacteristic at (x, \xi) if in a neighborhood of x, A is elliptic when restricted to the direction \xi. By definition, Char A is closed, so we may make the following definition.

Definition. Let u be a distribution. The wavefront set WF(u) is the intersection of all sets Char A, where A ranges over pseudodifferential operators such that Au \in C^\infty.

Then WF(u) is a closed conic subset of the cotangent bundle T^*M, and its projection to M is exactly the singular support ss(u). Indeed, x \notin ss(u) iff Au \in C^\infty for every pseudodifferential operator A supported in a sufficiently small neighborhood of x; in other words, no matter how hard we try, we cannot expose a singularity of u using operators localized near x. The wavefront set also remembers the direction in which this singularity happens; by elliptic invertibility, it will not happen in a direction in which A is noncharacteristic.

For example, the only way that u(x, y) = \delta_{y = 0} can be made smooth is by cutting off u away from \{(x, y): y = 0\}, which can be done by pseudodifferential operators of order 0 which are elliptic in the x-direction along the x-axis, but which cannot be elliptic in the y-direction there.
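
This example can be visualized numerically, in a very crude way: sample a cut-off version of \delta_{y=0} on a grid, take its discrete Fourier transform, and compare the decay in the \xi direction (dual to x) with the decay in the \eta direction (dual to y). The grid size and cutoff below are arbitrary choices, and this is of course an illustration rather than a computation of a wavefront set.

```python
import numpy as np

# Crude illustration of WF(delta_{y=0}): the Fourier transform of a (smoothly cut off)
# delta function along the x-axis decays rapidly in the xi direction but not in the
# eta direction, matching the claim that the singular directions are conormal to {y = 0}.

n = 256
xs = np.linspace(-1.0, 1.0, n)
X, Y = np.meshgrid(xs, xs, indexing='ij')     # X[i, j] = xs[i], Y[i, j] = xs[j]

u = np.zeros((n, n))
u[:, n // 2] = 1.0                            # discrete approximation of delta_{y=0}
u *= np.exp(-10.0 * (X**2 + Y**2))            # smooth spatial cutoff

U = np.fft.fftshift(np.abs(np.fft.fft2(u)))   # |Fourier transform|, zero frequency centered
center = n // 2

print(U[center + 100, center])   # high frequency in the xi direction: tiny
print(U[center, center + 100])   # high frequency in the eta direction: comparable to the peak U[center, center]
```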

Pseudotransport equations.

Hyperbolic operators are meant to generalize the transport equation (\partial_t - \partial_x)u(t, x) = 0. Let us therefore begin by studying the “pseudotransport” equation (\partial_t + a(t, x, D_x))u(t, x) = 0.

We assume that t \mapsto a(t, x, D_x) is uniformly bounded in S^1 and continuous in C^\infty, and the real part of a is uniformly bounded from below. Then we have the energy estimate

\displaystyle \frac{1}{2} \int_0^T ||e^{-\lambda t} u(t)||_{H^s}^p \lambda~dt \leq ||u(0)||_{H^s}^p

valid for any s \in \mathbb R and \lambda large enough depending on s. Applying the Hahn-Banach theorem we conclude that for every initial data in H^s we can find u \in C^0([0, \infty) \to H^s) which solves the pseudotransport equation. In particular, given Schwartz initial data, it follows that u is smooth.

Now fix initial data \phi \in H^s and assume that the principal symbol exists and is imaginary. (This forces the transport operator to be real and of order 1.) Let q be a symbol of order 0 on space, with principal symbol q_0. If in fact Q(t, D) is a pseudodifferential operator on spacetime such that, at time 0, Q(0) = q, and Q(t, D) commutes with \partial_t + a(t, x, D_x), then Qu solves the pseudotransport equation. (Actually, we will find Q so that [Q(t), \partial_t + a(t, x, D_x)] is a pseudodifferential operator of order -\infty; this is good enough.) In particular if q\phi \in C^\infty_0 then WF(u) is contained in Char Q, and WF(u) should be the intersection of all such sets Char Q.

To compute WF(u), let ia_0 be the principal symbol of a(D) and suppose that Q \sim \sum_j Q_j, where Q_0 is principal, is given. Then the principal symbol of [\partial_t + a(t, x, D_x), Q(t, x, D)] is the Poisson bracket

\displaystyle \{\tau + a_0(t, x, \xi), Q_0(t, x, \xi)\} = (\partial_t + H_{a_0})Q_0

where H_p is the Hamilton vector field of a symbol p. By inducting on j, we can use this computation to compute Q_j and conclude that modulo an error term of order -\infty, we can choose Q to be invariant along the Hamiltonian flow \psi given by the Hamiltonian a_0. That is, if F_tu(0) = u(t), then WF \circ F_t = \psi_t \circ WF. This result is a sort of “propagation of singularities” for the pseudotransport equation, which generalizes the fact that the transport equation acts on Dirac masses by transporting them, as expected.
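
To make the conclusion concrete, here is a small numerical sketch for a made-up real symbol a_0(x, \xi) = c(x)\xi with c(x) = 1 + \tfrac12 \sin x, so that the equation is the variable-coefficient transport equation \partial_t u + c(x)\partial_x u = 0. Integrating Hamilton’s equations shows where a wavefront point travels and confirms that the flow preserves a_0; the coefficient and the numerical tolerances are arbitrary choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hamiltonian flow of the made-up symbol a_0(x, xi) = c(x) * xi, c(x) = 1 + 0.5*sin(x).
# Hamilton's equations are dx/dt = d(a_0)/d(xi) = c(x) and dxi/dt = -d(a_0)/dx = -c'(x)*xi,
# so the spatial projection of the flow is the characteristic ODE of u_t + c(x) u_x = 0:
# a singularity initially over x_0 sits over x(t) at time t.

c  = lambda x: 1.0 + 0.5 * np.sin(x)
dc = lambda x: 0.5 * np.cos(x)

def hamilton(t, state):
    x, xi = state
    return [c(x), -dc(x) * xi]

x0, xi0 = 0.0, 1.0                 # a point of the cotangent bundle (a wavefront point)
sol = solve_ivp(hamilton, (0.0, 5.0), [x0, xi0],
                t_eval=np.linspace(0.0, 5.0, 6), rtol=1e-10, atol=1e-10)

a0_along_flow = c(sol.y[0]) * sol.y[1]
print(sol.y[0])                                      # where the singularity sits at each sampled time
print(np.allclose(a0_along_flow, a0_along_flow[0]))  # True: the flow preserves a_0
```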

Solving the hyperbolic Cauchy problem.

Let X be a manifold that represents “spacetime”. A priori we may not have a Lorentzian metric to work with, so instead we fix a function \phi that is a “time coordinate”. The level surfaces of \phi can be viewed as “spacelike hypersurfaces” in X.

Throughout we will let X_0 = \{\phi = 0\} and X_+ = \{\phi > 0\} denote the present and future, respectively.

Definition. A hyperbolic operator is a differential operator P of principal symbol p and order m such that p(x, d\phi(x)) \neq 0 and, for every (x, \xi) \in T^*X such that \xi is not in the span of d\phi(x), there are m distinct \tau \in \mathbb R such that p(x, \xi + \tau d\phi(x)) = 0.

Since P is a differential operator, p(x, \cdot) is a homogeneous polynomial of degree m. To make sense of the condition, let me restrict to the case that X = \mathbb R^2 with its usual Riemannian metric and \phi is the projection onto the t-axis. Then after rotating the first coordinate so that \xi is a covector dual to the x-axis, the condition says that given (x, t, \xi) we can find exactly m real numbers \tau such that p(x, t, \xi, \tau) = 0. In the case of the d’Alembertian, we have p(x, t, \xi, \tau) = \xi^2 - \tau^2, and indeed given \xi we can set \tau = \pm \xi, which are distinct as long as \xi \neq 0.

To state the initial-value problem with initial data in the “initial-time slice” X_0, let v be a vector field such that v\phi = 1, so v points “forward in time”. The action of v is “differentiating with respect to time”. Note that this hypothesis prevents \phi from degenerating.

Theorem (solving the hyperbolic Cauchy problem). Let P be a hyperbolic operator of order m with smooth coefficients, Y a precompact open submanifold of X, and s \geq 0. Assume we are given an inhomogeneous term f \in H^s_{loc}(X_+) satisfying f|X_0 = 0 and initial data \psi_j \in H^{s + m - 1 - j}_{loc}(X_0), j < m. Then there is u \in H^{s + m - 1}_{loc}(X) supported in \overline X_+ such that Pu = f in X_+ \cap Y and v^ju = \psi_j in X_0 \cap Y.

The proof is in Chapter 23.2 of Hörmander. The idea is to first prove uniqueness of solutions. By compactness, we may cover Y with finitely many charts U which are isomorphic to open subsets of Minkowski spacetime in which level sets of \phi are spacelike hypersurfaces and orbits of v are worldlines. Since Minkowski spacetime has an honest-to-god time coordinate, the hyperbolicity hypothesis allows us to factor the principal symbol p into first-order factors, and hence factor P into pseudotransport operators on U, at least modulo a lower-order error. We may then apply the solution of the Cauchy problem for pseudotransport operators to solve the Cauchy problem for Pu = f in each chart U, and since there were only finitely many, uniqueness allows us to stitch the local solutions together into a global solution.

The proof outlined in the above paragraph is motivated by the special case when P is the d’Alembertian, which already appears in Chapter 2 of Evans. In that proof, one first observes that the Cauchy problem for the transport equation has an explicit solution. Then one reduces to the case that spacetime is two-dimensional, in which case there is an explicit factorization of P into transport operators, namely P = (\partial_x - \partial_t)(\partial_x + \partial_t).

Propagation of singularities, part I.

To study the propagation of singularities we need to recall some symplectic geometry. Let Q be a pseudodifferential operator on X and q its principal symbol. Then the Hamilton vector field H_q induces a flow on T^*X which preserves q.

Definition. The bicharacteristic flow of a pseudodifferential operator Q of principal symbol q is the flow of H_q on q^{-1}(0). A bicharacteristic of Q is an orbit of the bicharacteristic flow.

The intuition for the bicharacteristic flow is that its projection to X is “lightlike”, at least if Q is the d’Alembertian.
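
To see this in the model case p(x, t, \xi, \tau) = \xi^2 - \tau^2 considered earlier in this post, Hamilton’s equations read

\displaystyle \dot x = \partial_\xi p = 2\xi, \quad \dot t = \partial_\tau p = -2\tau, \quad \dot \xi = -\partial_x p = 0, \quad \dot \tau = -\partial_t p = 0,

and on the characteristic variety \xi^2 = \tau^2 we get dx/dt = \dot x/\dot t = -\xi/\tau = \mp 1, so the projected curves are unit-speed, that is lightlike, lines.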

Theorem (Hörmander’s propagation of singularities). Let P be a pseudodifferential operator of order m such that the Schwartz kernel of P has proper support, and the principal symbol of P is real. Then for every distribution u with Pu = f, WF(u) – WF(f) is invariant under the bicharacteristic flow of P.

By definition of the wavefront set, for every distribution u, WF(u) – WF(Qu) is contained in Char Q. But if Q is a differential operator, then Char Q is exactly the “characteristic variety” q^{-1}(0), which is exactly the variety where the bicharacteristic flow of Q is defined. Therefore we can ask that WF(u) – WF(Qu) be invariant under the bicharacteristic flow.

If P is a hyperbolic operator of principal symbol p, then the solutions \tau of the equation p(x, \xi + \tau d\phi(x)) = 0 are all real and distinct, and modulo lower-order terms this can be used to enforce that the coefficients of p are real. We phrase this more simply by saying that the principal symbol of every hyperbolic operator is real.

A partial converse to the reality of principal symbols of hyperbolic operators holds. If Q is a differential operator, then its principal symbol q is a homogeneous polynomial on each cotangent space. Fixing a particular cotangent space, we can write q(\xi) = \sum_\alpha c_\alpha \xi^\alpha where \alpha ranges over all multiindices of order m and c_\alpha \in \mathbb R. In order that the characteristic variety of Q have more than one real point, there must be some c_\alpha positive and some negative. But this is exactly the situation of the d’Alembertian, whose principal symbol is q(\xi, \tau) = \xi^2 - \tau^2.

Thus, while the propagation of singularities theorem only assumes that the principal symbol is real, if the operator P is (for example) elliptic or parabolic, then the conclusion of the theorem is degenerate in the sense that the characteristic variety only has a single real point, so that WF(u) – WF(f) is invariant under EVERY group action on the characteristic variety, not just the bicharacteristic flow.

The interpretation of the propagation of singularities theorem is that P is something like the d’Alembertian, in which case p is something like a Lorentzian metric. The bicharacteristic flow is a flow on the characteristic bundle, which is the space whose points (x, \xi) consist of a position x and a lightlike momentum \xi. Therefore the projection of any bicharacteristic to X consists of a worldline. Thus, if the initial data is something like a Dirac mass at x, then the Dirac mass travels along the worldline containing x.

To prove the propagation of singularities theorem, we need a propagation estimate. Recall that if A is a pseudodifferential operator, then WF(A) denotes the microsupport of A; that is, the complement of the largest conic set on which A has order -\infty.

Theorem (propagation estimate). Let U be an open conic set, and let A, B, B_1 \in \Psi^0(X). Let P be a pseudodifferential operator of real principal symbol p and order m.
For every N > 0 and s \in \mathbb R there is C > 0 such that for every distribution u and every inhomogeneous term f with Pu = f,

\displaystyle ||Au||_{H^{s+m-1}} \leq C||B_1 f||_{H^s} + C||Bu||_{H^{s+m-1}} + C||u||_{H^{-N}}

given that the following criteria are met:

  1. The projection of U is precompact in X.
  2. For every (x, \xi) \in U, if p(x, \xi) = 0, then H_p and the radial vector field \xi\partial_\xi are linearly independent at (x, \xi).
  3. WF(A) and WF(B) are contained in U, while WF(1 - B_1) \cap U = \emptyset.
  4. For every trajectory (x(t), \xi(t)) of H_p with (x(0), \xi(0)) \in WF(A), there is T < 0 such that for every T \leq t \leq 0, (x(t), \xi(t)) \in U and (x(T), \xi(T)) \in Ell(B).

The term C||u||_{H^{-N}} is an error term created by the use of pseudodifferential operators and is not interesting. The operator B_1 is a cutoff which microlocalizes the problem to a neighborhood of the conic set U. We are interested in WF(u) – WF(f), so we want WF(B_1) \cap WF(f) = \emptyset and B_1 = 1 on U. Actually, since we only care about the complement of WF(f), we might as well take f Schwartz, in which case we can take B_1 = 1 and simplify the propagation estimate to

\displaystyle ||Au||_{H^{s+m-1}} \leq C||f||_{H^s} + C||Bu||_{H^{s+m-1}} + \text{error terms}.

The interesting point here is the relationship between the operators A and B. We can optimize the propagation estimate by assuming that WF(B) = Ell B. This is because we really desperately want B to be elliptic on its microsupport, so that it does not introduce any new singularities. Under the assumption WF(B) = Ell B, B is a microlocalization to WF(B), and if (x, \xi) \in WF(A), then (x, \xi) got to WF(A) after passing through WF(B). The point is that if u has a singularity at (x, \xi) \in WF(A), then (if the regularity exponent s is taken large enough) ||Au||_{H^{s+m-1}} = \infty, but we assumed f Schwartz, so this implies ||Bu||_{H^{s+m-1}} = \infty, so that if we traveled back along the bicharacteristic flow (x(t), \xi(t)) from (x, \xi) for long enough, we would see that u already had a singularity at some time (x(T), \xi(T)) with T < 0.

Moreover, the propagation estimate is time-reversible in the sense that we can replace the condition T < 0 with T > 0. Thus the bicharacteristic flow neither creates nor destroys singularities in the distribution u. This readily implies the propagation of singularities theorem.

The proof of the propagation estimate is quite technical and this post is meant as a more of a conceptual discussion so I will omit it.

Topology and game design

Aside from being bad at math, I am also bad at Final Fantasy XIV. So it happened that, while attempting to be less bad at Final Fantasy XIV and better understanding an aspect of one of the game’s encounters, I actually became less bad at math, and now I wonder if game developers should incorporate more involved topology into their games’ design.

Final Fantasy XIV as it is.

Let me review how raiding in Final Fantasy XIV works, for those unfamiliar. A group of eight characters, each controlled by one player, fights one “boss” monster. The boss’s attacks are frequently lethal, so to avoid a game over, the players must avoid the avoidable attacks. The game is designed so that if an attack can be avoided, it can only be avoided in a particularly precise, and often opaque, manner. Examples of this include deciphering lines of iambic poetry, executing intricate but scripted movement patterns, or interpreting the way to avoid a truly bizarre instant-kill attack using obscure tidbits from the game’s lore. And don’t let yourself get distracted by the head-banging soundtrack!

A recent boss, the Shadowkeeper, introduced in Futures Rewritten, has an attack known as Giga Slash which can be solved by thinking of it as inducing an orientation on the platform on which the battle takes place.

Let me remind the reader that an orientation of a curve (a one-dimensional space) is a choice of which direction is considered “right”; an orientation of a surface (a two-dimensional space) is a choice of which direction is considered “clockwise”; an orientation of a three-dimensional space is a choice of which coordinates are considered “right-handed”; and so on. [1]

Giga Slash involves Shadowkeeper drawing a sword and then slashing either to her left or to her right, depending on which hand she draws the blade with. In particular, the attack will divide the platform into two rectangles, one of which is lethal to stand in and the other of which is not. What makes Giga Slash more interesting is that frequently Shadowkeeper’s “shadow” — a separate entity — or the player characters’ shadows, will be the origin of the attack instead.

In the former case, the party must all stand either to the left or to the right of the boss’s shadow, where “left” and “right” depend not on the player’s perception but on the direction the shadow faces. Thus the shadow’s position and facing induce a one-dimensional orientation on the platform, and this, rather than the “natural” orientation given by the fact that there is a canonical choice of north, south, east, and west built into the game, is the orientation one must use to resolve the attack; players often refer to the natural orientation as the “absolute positions” and to the orientations given by boss positioning (in this case, shadow position) as “relative positions”.

The fact that the party has to deal with “relative positions” is hardly unusual. What makes Shadowkeeper more unusual is the second case, wherein four characters’ shadows are each the origin of a copy of the attack. In that case, each character’s shadow appears as a black blob which is always to the absolute north, south, east, or west of the character, no matter how the character moves. More abstractly, the shadow can be viewed as a unit vector which originates at the character, and is translated but otherwise not acted on by player movement (and also not acted on by anything else, for that matter). One player is assigned absolute north, one absolute west, et cetera.

The point is that the character shadows will always slash to their left if the boss is holding the sword in her left hand, and vice versa. The goal of the players is to aim the slashes in such a way that there is a safe rectangle. Many guides involve trickery with rotating the camera and so on to ensure that this happens, but there is a simple solution. The hand the boss is using is equivalent to a choice of orientation on the platform. If the boss raises her left hand, then the orientation is counterclockwise; otherwise it is clockwise. Now the players must all stand so that their shadow vector is tangent to the circle centered on the boss’s hitbox, of radius a little larger than the hitbox, and oriented according to the boss’s hand, and this will ensure that the interior of the boss’s hitbox is safe, as demonstrated here. Unfortunately, because right and counterclockwise are usually the “positive” orientations in mathematics, but here right is associated with clockwise, I still frequently do this trick incorrectly…😞
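
For what it’s worth, the standing position can be computed rather than eyeballed. The toy calculation below takes a player’s fixed shadow direction (a cardinal unit vector) and the orientation chosen by the boss’s hand, and returns the point on a circle around the boss at which that shadow vector is the oriented tangent; the boss position and radius are made-up numbers, not game data.

```python
import numpy as np

# Toy version of the standing-position rule: given a fixed shadow direction (a cardinal
# unit vector) and the orientation chosen by the boss's hand (left = counterclockwise,
# right = clockwise), find the point on a circle around the boss at which the shadow
# vector is the tangent with that orientation.

def standing_position(boss_center, radius, shadow, hand):
    """Point on the circle around boss_center where `shadow` is the oriented unit tangent."""
    vx, vy = shadow
    if hand == 'left':                       # counterclockwise: rotate the shadow by -90 degrees
        offset = np.array([vy, -vx])
    else:                                    # clockwise: rotate the shadow by +90 degrees
        offset = np.array([-vy, vx])
    return np.asarray(boss_center, dtype=float) + radius * offset

boss = (100.0, 100.0)
for name, v in {'north': (0, 1), 'south': (0, -1), 'east': (1, 0), 'west': (-1, 0)}.items():
    print(name, standing_position(boss, 6.0, v, hand='left'))
# The four players end up evenly spaced around the hitbox, each with their shadow
# pointing counterclockwise around the boss.
```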

Whether or not the tactic I just outlined is easier to execute than just manipulating the camera angle, it demystified for me how choices of orientation “look” in practice. Topology is my weakest area of math, and one needs to choose an orientation in order to define curl [2], so I wasted a lot of time trying to understand what the vorticity equation was actually trying to say, until Shadowkeeper cleared things up for me.

Final Fantasy XIV as it could be.

In Final Fantasy XIV, most platforms have a very simple geometry, either being a square or a circle. When there have been exceptions, the players have often exploited the geometry to avoid attacks in ways the developers did not intend to be possible, causing the developers to shy away from introducing any nontrivial geometry or topology into the fights. But the above discussion got me thinking: what if we fought a boss on a nonorientable surface, such as a Möbius strip? [3] Along with my friend Greg DeFillippo (who was the mastermind behind some of the below proposed attacks), I have tried to find out.

The first obstacle is determining which direction gravity faces. For this model, I think it’s reasonable that the gravity always face towards the Möbius strip, a la Super Mario Galaxy. However, one could also have a mechanic which reverses the flow of gravity at the whim of one of the players; then if the players are “above” the strip they need the gravity to point downwards, and if they are “below” the strip they need it to point upwards.

One simple attack could consist of a blade that sweeps across the Möbius strip, killing anyone it touches; the only way to dodge it is to simply jump to the other side of the strip. Since the Möbius strip only has one side, the blade eventually sweeps over the entire strip, forcing everyone to dodge twice.

A more interesting example requires the use of a mechanic commonly seen in Final Fantasy XIV known as “proximity”. Proximity requires that two characters be sufficiently far from each other when the attack completes, or they will die. However, since the attack is on a Möbius strip, if the players run too far they will end up close to each other, in spite of how far they have run. (This attack could also be done on a torus, as in Pac-Man, so it does not use nonorientability, but it does use the existence of a nontrivial topology.)

Another example uses a mechanic known as “forced march”. This assigns an arrow to each character which causes them to run in that direction relative to the direction they are facing at the start of the attack. For example, a character that is facing towards true north and assigned a right arrow will run east at the start of the attack. The goal is for the player to face their character in a direction so that they avoid the (possibly several) other attacks that go out at the same time as the forced march. This requires the player to think about orientation; but this becomes much harder to do when the platform itself is nonorientable! For example, if the arrow faced right and the character faced true north to avoid an attack to the west, the character would run east, but then find themself in the west, exactly where they did not want to be.
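
Here is one toy way to make the forced-march example precise in code. The strip is modeled as the chart [0, L) \times [-W, W] with the gluing (L, y) \sim (0, -y); each time a character crosses the seam, its transverse coordinate and its local sense of clockwise both flip. The numbers and the gluing convention are my own choices for illustration, not a proposal for how the game would actually implement it.

```python
# Toy coordinate model of a forced march on a Moebius strip: the chart [0, L) x [-W, W]
# with the gluing (L, y) ~ (0, -y). Crossing the seam flips the transverse coordinate y
# and the character's local sense of clockwise. All numbers are illustrative.

L, W = 40.0, 5.0

def run_along_strip(x, y, handedness, distance, step=0.1):
    """March a character in the +x (loop) direction, applying the gluing at the seam."""
    travelled = 0.0
    while travelled < distance:
        x += step
        travelled += step
        if x >= L:                    # crossed the seam: apply the Moebius identification
            x -= L
            y = -y
            handedness = -handedness  # the local clockwise/counterclockwise convention flips
    return x, y, handedness

# A character starts on the y > 0 half of the strip (safe from an attack covering y < 0)
# and is forced to march one full loop.
print(run_along_strip(x=0.0, y=2.5, handedness=+1, distance=L))
# -> roughly (0.0, -2.5, -1): same x, but on the half it was trying to avoid,
#    with its notion of clockwise reversed.
```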

The forced march can be modified to more strongly use nonorientability. One can locally define what it means to be clockwise, say on the top of the Möbius strip, and this will contradict what it means to be clockwise on the bottom of the strip. If the forced march, instead of a straight line, forced characters to run in a circle, the forced march would have two different effects if the character was on the top or the bottom of the strip. (If the character was on the side of the strip, either the attack would have to kill them instantly, or simply be completely unpredictable.)

I’d love to see other examples of mechanics that are designed for platforms with nontrivial topology. If you can cook up any particular cruel examples, post them in the comments below 😈

Technical notes.

[1] More abstractly, recall that if A \in GL(d) is a d \times d invertible matrix, then the determinant of A is either positive, in which case we say it is orientation-preserving, or negative, in which case we say it is orientation-reversing. A change of coordinates is said to be orientation-preserving (resp. reversing) if its Jacobian matrix is orientation-preserving (resp. reversing). Thus on an orientable manifold there exist two possible orientations — in the low-dimensional cases, right and left, clockwise and counterclockwise, and right-handed and left-handed.

[2] The curl of a vector field V is by definition the Hodge dual of the derivative of the Hodge dual of V, and Hodge duality is only defined up to a choice of orientation. A much more concrete definition of curl is to first declare that if V is a vector field on a surface, then the curl of V is the angular momentum of a unit mass particle whose velocity field is V, and then if V is a vector field on a three-dimensional space, then the curl of V in the direction of a unit tangent vector e is the curl of V in the plane e^\perp. The trouble is that angular momentum of a particle rotating in positive orientation is by definition positive, so one first needs to decide what one means by positive orientation.

[3] Originally I wanted to do this on a Klein bottle but could not determine how to depict raids on a surface that does not embed in three-dimensional space.

What I want to learn, Spring 2021

As much for my own future reference as for anything, here’s a summary of some things I’d like to learn, maybe not this season, but soon.

First on the docket, I’d like to learn Vasy’s method. This is a technique for meromorphically continuing the resolvent of the Schrödinger operator on an asymptotically hyperbolic manifold — that is, a manifold which, near its boundary, looks like the Poincaré model of hyperbolic space does near its boundary. A priori the definition of the resolvent only makes sense on a small open subset of the complex plane, and one hopes to show that it extends to the entire plane, except possibly for a discrete set of poles.

On a somewhat similar note, I’d like to learn the Atiyah-Singer index theorem. This theorem equates the Fredholm index of an elliptic pseudodifferential operator on a line bundle L to its “topological index”, which is a rational number defined in terms of the cohomology of L. This is largely motivated by my quest to understand the sense in which cohomology counts solutions to PDE, cf. my recent post on the genera of Riemann surfaces. I previously tried to learn the heat-kernel proof of Atiyah-Singer shortly after I first learned about pseudodifferential operators but got nowhere. This time, I will be armed with knowledge of the Riemann-Roch theorem, which may make all the difference.

Unlike the previous two requests, which are both PDE-analytic in nature, I think that my knowledge of complex analysis has prepared me to learn the proof that there are twenty-seven lines on a cubic surface in \mathbb P^3. This would entirely be for fun, and I may blog about it, so as to tell the story of a hapless analyst faffing around hopelessly in deep algebra.

Finally, I would like to fix up and publicize the Sage code that is mentioned by my paper on computation of Kac-Moody root multiplicities with Joshua Lin and Peter Connick. I suspect that this will require learning some nontrivial representation theory and complexity theory, though in its current form the algorithm is essentially a consequence of elementary facts about quadratic forms over \mathbb Z.

Elliptic regularity implies that compact genera are finite

A few years ago I took a PDE course. We were learning about something to do with elliptic pseudodifferential operators and the speaker drew a commutative diagram on the board and said, “You see, this comes from a short exact sequence –” and the whole room started laughing in discomfort. The speaker then remarked that Craig Evans himself would ban him from teaching analysis if word of the incident ever leaked, which might have something to do with why I have not disclosed the speaker’s name 🥵

Before recently, I found topology to be quite a scary area of math. It is still very much my weakest suit, but I should like to have some amount of competency with it. I have since come around to the viewpoint that cohomology is just a clever gadget for counting solutions of PDE. This has made the pill a little easier to swallow, and makes the previous anecdote all the more awkward.

As part of my ventures into trying to learn topology, in this post I will give a proof that the genus of any compact Riemann surface is finite. I am confident that this proof is not original, because it’s sort of the obvious proof if an analyst trying to prove this fact just followed their nose, but it seems a lot more natural to me than the proof in Forster, so let’s do this.

[Since the time of writing, I have made some corrections to incorrect or confusing statements. Thanks to Sarah Griffith for pointing these out!]

Let us start with some generalities. Fix a compact Riemann surface {X}, references to which we will suppress when possible. Let

\displaystyle 0 \rightarrow A \rightarrow B \rightarrow C \rightarrow 0

be a short exact sequence of sheaves. In our case, the sheaves will be sheaves of Fréchet spaces on {X}, which might not be homologically kosher, but that won’t cause any real issues. Then we get a long exact sequence in cohomology

\displaystyle 0 \rightarrow H^0(A) \rightarrow H^0(B) \rightarrow H^0(C) \rightarrow H^1(A) \rightarrow H^1(B) \rightarrow H^1(C) \rightarrow \cdots.

If B is a fine sheaf, i.e. it has partitions of unity subordinate to every open cover, then {H^1(B) = 0} and the long exact sequence collapses to the exact sequence

\displaystyle 0 \rightarrow H^0(A) \rightarrow B(X) \rightarrow C(X) \rightarrow H^1(A) \rightarrow 0.

In particular, the morphism of sheaves {B \rightarrow C} induces a bounded linear map {T: B(X) \rightarrow C(X)} such that {H^0(A)} is the kernel of {T} and {H^1(A)} is the cokernel of {T}. Now, if {T} is a Fredholm operator, then its index {k} satisfies

\displaystyle k = \text{dim } H^0(A) - \text{dim } H^1(A).

Let {\mathcal O} denote the sheaf of holomorphic functions on {X} and {\overline \partial} the Cauchy-Riemann operator. Let {\mathcal E} denote the sheaf of smooth functions on {X}; since {X} has enough partitions of unity, {\mathcal E} is a fine sheaf. The maps {\overline \partial: \mathcal E(U) \rightarrow \mathcal E(U)}, for {U \subseteq X} open, induce a short exact sequence of sheaves of Fréchet spaces

\displaystyle 0 \rightarrow \mathcal O \rightarrow \mathcal E \rightarrow \mathcal E \rightarrow 0

and hence an exact sequence in cohomology

\displaystyle 0 \rightarrow \mathbf C \rightarrow \mathcal E(X) \rightarrow \mathcal E(X) \rightarrow H^1(\mathcal O) \rightarrow 0.

Here we used Liouville’s theorem. On the other hand, the dimension of {H^1(\mathcal O)} is by definition the genus {g} of {X}. Therefore, if {k} is the Fredholm index of {\overline \partial}, then

\displaystyle g = 1 - k.

It remains to show that {k} is well-defined and finite; that is, {\overline \partial} is Fredholm. This is a standard elliptic regularity argument, which I will now recall. We first fix a volume form {dV} on {X}, which exists since {X} is an orientable surface. This induces an {L^2} norm on {X}, namely

\displaystyle ||u||_{L^2}^2 = \int_X |u|^2 ~dV.

Unfortunately the usual Sobolev notation {H^s} clashes with the notation for cohomology, so let me use {W^s} to denote the completion of {\mathcal E} under the norm

\displaystyle ||u||_s = \sum_{|\alpha| \leq s} ||\partial^\alpha u||_{L^2}

where {\alpha} ranges over multiindices. Then {W^0 = L^2} and {\overline \partial} maps {W^1 \rightarrow W^0}. The kernel of {\overline \partial} is finite-dimensional (since it is isomorphic to {\mathbf C}, by Liouville’s theorem and Weyl’s lemma), so to deduce that {\overline \partial} is Fredholm as an operator {W^1 \rightarrow W^0} it suffices to show that the cokernel of {\overline \partial} is finite-dimensional.

We first claim the elliptic regularity estimate

\displaystyle ||u||_1 \leq C ||f||_0 + C ||u||_0

for any smooth functions u,f which satisfy {\overline \partial}u = f. By definition of the Sobolev norm, we have

\displaystyle ||u||_1 = ||u||_0 + ||u'||_0 + ||f||_0.

Without loss of generality, we may assume that {u} is smooth. Then we can write {u = v + w} where {v} and {\overline w} are holomorphic. In particular, {u' = v'} and {f = \overline \partial w}, so

\displaystyle ||u||_1 = ||u||_0 + ||v'||_0 + ||f||_0.

The only troublesome term here is {v'}. Taking a Cauchy estimate, we see that

\displaystyle |v'(z)| \leq C||v||_{L^\infty} \leq C||v||_{L^2} = C||v||_0.

But {X} is compact, so has finite volume; therefore

\displaystyle ||v'||_0 = ||v'||_{L^2} \leq C||v||_{L^\infty} \leq C||v||_0 \leq C||u||_0.

This gives the desired bound.

Let {u_n} be a sequence in {W^1} with {f_n = \overline \partial u_n \in W^0}, and assume that the {f_n} are Cauchy in {W^0}. Without loss of generality we may assume that {u_n \in K^\perp} where {K} is the kernel of {\overline \partial}. If the {u_n} are not bounded in {W^1}, we may replace them with {u_n/||u_n||_1}, and thus assume that they are in fact bounded. By the Rellich-Kondrachov theorem (which says that the natural map {W^1 \rightarrow W^0} is compact), we may therefore assume that the {u_n} are Cauchy in {W^0}. But then

\displaystyle ||u_n - u_m||_1 \leq C ||f_n - f_m||_0 + C ||u_n - u_m||_0

so the {u_n} are Cauchy in {W^1}. Therefore the {u_n} converge in {K^\perp}, hence the {f_n} converge in the image {Z} of {\overline \partial}, since {\overline \partial} gives an isomorphism {K^\perp \rightarrow Z}. Therefore {Z} is closed.

If one applies integration by parts to {\overline \partial}, the fact that X has no boundary implies that for any f,g,

\displaystyle \langle \overline \partial f, g\rangle = \int_X \overline \partial f \overline g ~dV = -\int_X f \overline{\partial g} ~dV = -\langle f, g'\rangle

and thus \overline \partial^* = -\partial. Since Z is closed, the dual of the cokernel of {\overline \partial} is the kernel L of -\partial; by the Rellich-Kondrachov theorem, the unit ball of L is compact and therefore L is finite-dimensional. By the Hahn-Banach theorem, this implies that the cokernel of {\overline \partial} is finite-dimensional. Therefore {k}, and hence {g}, is finite.

A PDE-analytic proof of the fundamental theorem of algebra

The fundamental theorem of algebra is one of the most important theorems in mathematics, being core to algebraic geometry and complex analysis. Unraveling the definitions, it says:

Fundamental theorem of algebra. Let f be a polynomial over \mathbf C of degree d. Then the equation f(z) = 0 has d solutions z, counting multiplicity.

Famously, most proofs of the fundamental theorem of algebra are complex-analytic in nature. Indeed, complex analysis is the natural arena for such a theorem to be proven. One has to use the fact that \mathbf R is a real closed field, but since there are lots of real closed fields, one usually defines \mathbf R in a fundamentally analytic way and then proves the intermediate value theorem, which shows that \mathbf R is a real closed field. One can then proceed by tricky algebraic arguments (using, e.g. Galois or Sylow theory), or appeal to a high-powered theorem of complex analysis. Since the fundamental theorem is really a theorem about algebraic geometry, and complex analysis sits somewhere between algebraic geometry and PDE analysis in the landscape of mathematics (and we need some kind of analysis to get the job done; purely algebro-geometric methods will not be able to distinguish \mathbf R from another field K such that -1 does not have a square root in K), it makes a lot of sense to use complex analysis.

But, since complex analysis sits between algebraic geometry and PDE analysis, why not abandon all pretense of respectability (that is to say, algebra — analysis is not a field worthy of the respect of a refined mathematician) and give a PDE-analytic proof? Of course, this proof will end up “looking like” multiple complex-analytic proofs, and indeed it is basically the proof by Liouville’s theorem dressed up in a trenchcoat (and in fact, gives Liouville’s theorem, and probably some other complex-analytic results, as a byproduct). In a certain sense — effectiveness — this proof is strictly inferior to the proof by the argument principle, and in another certain sense — respectability — this proof is strictly inferior to algebraic proofs. However, it does have the advantage of being easy to teach to people working in very applied fields, since it only uses the machinery of PDE analysis, rather than fancy results such as Liouville’s theorem or the Galois correspondence.

The proof
By induction, it suffices to prove that if f is a polynomial with no zeroes, then f is constant. So suppose that f has no zeroes, and introduce g(z) = 1/f(z). As usual, we want to show that g is constant.

Since f is a polynomial, it does not decay at infinity, so g(\infty) is finite. Therefore g can instead be viewed as a function on the sphere, g: S^2 \to \mathbf C, by stereographic projection. Also by stereographic projection, one can cover the sphere by two copies of \mathbf R^2, one centered at the south pole that misses only the north pole, and one centered at the north pole that only misses the south pole. Thus one can define the Laplacian, \Delta = \partial_x^2 + \partial_y^2, in each of these coordinates; on the overlap of the charts the two definitions agree up to a positive smooth factor, because the transition map is conformal, so the notion of a harmonic function is well-defined on all of S^2. (In fancy terminology, which may help people who already know ten different proofs of the fundamental theorem of algebra but will not enlighten anyone else, we view S^2 as a Riemannian manifold under the pushforward metric obtained by stereographic projection, and consider the Laplace-Beltrami operator of S^2.)

Recall that a function u is called harmonic provided that \Delta u = 0. We claim that g is harmonic. The easiest way to see this is to factor \Delta = 4\partial\overline \partial where 2\partial = \partial_x - i\partial_y. Then \overline \partial u = 0 exactly if u has a complex derivative, by the Cauchy-Riemann equations. There are other ways to see this, too, such as using the mean-value property of harmonic functions and computing the antiderivative of g. In any case, the proof is just calculus.
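
If you would rather outsource the calculus, here is a small symbolic check that the reciprocal of a polynomial is harmonic away from the polynomial’s zeros. The polynomial below is an arbitrary illustrative choice (a polynomial with no zeros at all is constant, which is the whole point of the proof, so here the zeros are simply avoided); since \Delta has real coefficients, the vanishing of \Delta g means that both the real and imaginary parts of g are harmonic.

```python
import sympy as sp

# Check that g = 1/f satisfies Laplace's equation away from the zeros of f, for an
# illustrative polynomial f. Since the Laplacian is real, this also shows that the real
# and imaginary parts of g are harmonic wherever g is defined.

x, y = sp.symbols('x y', real=True)
z = x + sp.I * y
f = z**2 + 1                 # illustrative polynomial; the identity holds away from its zeros
g = 1 / f

laplacian_g = sp.diff(g, x, 2) + sp.diff(g, y, 2)
print(sp.simplify(laplacian_g))   # 0
```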

So g is a harmonic function on the compact connected manifold S^2; by the extreme value theorem, g has (or more precisely, its real and imaginary parts have) a maximum. By the maximum principle of harmonic functions (which is really just the second derivative test — being harmonic generalizes the notion of having zero second derivative), it follows that g is equal to its maximum, so is constant. (In fancy terminology, we view g as the canonical representative of the zeroth de Rham cohomology class of S^2 using the Hodge theorem.)

Let’s Read: Sendov’s conjecture in high degree, part 4: details of case one

In this post we (finally!) finish the proof of case one.

As usual, we throughout fix a nonstandard natural {n} and a monic complex polynomial {f} of degree {n} whose zeroes are all in {\overline{D(0, 1)}}. We assume that {a} is a zero of {f} whose standard part is {1}, and assume that {f} has no critical points in {\overline{D(a, 1)}}. Let {\lambda} be a random zero of {f} and {\zeta} a random critical point. Under these circumstances, {\lambda^{(\infty)}} is uniformly distributed on {\partial D(0, 1)} and {\zeta^{(\infty)}} is almost surely zero. In particular,

\displaystyle \mathbf E \log\frac{1}{|\lambda|}, \mathbf E \log |\zeta - a| = O(n^{-1})

and {\zeta} is infinitesimal in probability, hence infinitesimal in distribution. Let {\mu} be the expected value of {\zeta} (thus also of {\lambda}) and {\sigma^2} its variance. I think we won’t need the nonstandard-exponential bound {\varepsilon_0^n} this time, as its purpose was fulfilled last time.

Last time we reduced the proof of case one to a sequence of lemmata. We now prove them.

1. Preliminary bounds

Lemma 1 Let {K \subseteq \mathbf C} be a compact set. Then

\displaystyle f(z) - f(0), ~f'(z) = O((|z| + o(1))^n)

uniformly for {z \in K}.

Proof: It suffices to prove this for a compact exhaustion, and thus it suffices to assume

\displaystyle K = \overline{D(0, R)}.

By underspill, it suffices to show that for every standard {\varepsilon > 0} we have

\displaystyle |f(z) - f(0)|, ~|f'(z)| \leq C(|z| + \varepsilon)^n.

We first give the proof for {f'}.

First suppose that {\varepsilon < |z| \leq R}. Since {\zeta} is infinitesimal in distribution,

\displaystyle \mathbf E \log |z - \zeta| \leq \mathbf E \log \max(|z - \zeta|, \varepsilon/2) \leq \log \max(|z|, \varepsilon/2) + o(1);

here we need the {\varepsilon/2} and the {R} since {\log |z - \zeta|} is not a bounded continuous function of {\zeta}. Since {\varepsilon < |z|} we have

\displaystyle \mathbf E \log |z - \zeta| \leq \log |z| + o(1)

but we know that

\displaystyle \frac{\log n}{n - 1} - \frac{1}{n - 1} \log |f'(z)| = U_\zeta(z) = -\mathbf E \log |z - \zeta|

so, solving for {\log |f'(z)|}, we get

\displaystyle \log |f'(z)| \leq (n - 1) \log |z| + o(n);

we absorbed a {\log n} into the {o(n)}. That gives

\displaystyle |f'(z)| \leq e^{o(n)} |z|^{n-1}.

Since {f'} is a polynomial of degree {n - 1} and {f} is monic (so the top coefficient of {f'} is {n}), this gives a bound

\displaystyle |f'(z)| \leq e^{o(n)} (|z| + \varepsilon)^{n - 1}

even for {|z| \leq \varepsilon} (for instance by the maximum principle applied to {f'} on {\overline{D(0, \varepsilon)}}, whose boundary values are covered by the previous case run with a slightly smaller {\varepsilon}).

Now for {f}, we use the bound

\displaystyle |f(z) - f(0)| \leq |z| \max_{|w| \leq |z|} |f'(w)|

(the fundamental theorem of calculus along the segment from {0} to {z}; the extra factor {|z| = O(1)} on {K} is harmless) to transfer the above argument. \Box
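
For reference, the identity for {U_\zeta} used above (and again in section 4) is just the factorization of {f'}: since {f} is monic of degree {n}, we have {f'(z) = n \prod_{j=1}^{n-1} (z - \zeta_j)}, where {\zeta_1, \dots, \zeta_{n-1}} are the critical points counted with multiplicity, so

\displaystyle U_\zeta(z) = -\mathbf E \log |z - \zeta| = -\frac{1}{n-1} \sum_{j=1}^{n-1} \log |z - \zeta_j| = \frac{\log n}{n - 1} - \frac{1}{n - 1} \log |f'(z)|.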

2. Uniform convergence of {\zeta}

Lemma 2 There is a standard compact set {S \subseteq \overline{D(0, 1)}} and a standard countable set {T \subseteq \overline{D(0, 1)} \setminus \overline{D(1, 1)}} such that

\displaystyle S = (\overline{D(0, 1)} \cap \partial D(1, 1)) \cup T,

all elements of {T} are isolated in {S}, and {||\zeta - S||_{L^\infty}} is infinitesimal.

Tao claims

\displaystyle \mathbf P(|\zeta - a| \geq \frac{1}{2m}) = O(n^{-1})

where {m} is a large standard natural, which makes no sense as stated, since {|\zeta - a| > 1} almost surely and so the left-hand side is just {1}. I think the intended claim has {\log |\zeta - a|} in place of {|\zeta - a|}, which is what we prove below.

Proof: Since {\zeta} was assumed far from {a = 1 - o(1)} we have

\displaystyle \zeta \in \overline{D(0, 1)} \setminus D(1, 1 - o(1)).

We also have

\displaystyle \mathbf E \log |\zeta - a| = O(n^{-1})

so for every standard natural {m} there is a standard natural {k_m} such that

\displaystyle \mathbf P(\log |\zeta - a| \geq \frac{1}{2m}) \leq \frac{k_m}{n}.

Multiplying both sides by {n} we see that

\displaystyle \text{card } Z \cap K_m = \text{card } Z \cap \{\zeta_0 \in \overline{D(0, 1)}: \log |\zeta_0 - a| \geq \frac{1}{2m}\} \leq k_m

where {Z} is the variety of critical points {f' = 0}. Let {T_m} be the set of standard parts of the zeroes of {f'} in {K_m}; then {T_m} has cardinality {\leq k_m} and so is finite. For every critical point {\zeta_0 \in Z}, either

  1. For every {m},

    \displaystyle |\zeta_0 - a| < \exp\left(\frac{1}{2m}\right)

    so (since also {|\zeta_0 - a| > 1}) the standard part of {|\zeta_0 - a|} is {1}, or

  2. There is an {m} such that {d(\zeta_0, T_m)} is infinitesimal.

So we may set {T = \bigcup_m T_m}; then {T} is standard and countable, and by the above dichotomy every critical point lies within {o(1)} of {(\overline{D(0, 1)} \cap \partial D(1, 1)) \cup T}; so {S} is standard and {||\zeta - S||_{L^\infty}} is infinitesimal.

I was a little stumped on why {S} is compact; Tao doesn’t prove this. It turns out it’s obvious, I was just too clueless to see it. The construction of {T} forces that for any {\varepsilon > 0}, there are only finitely many {z \in T} with {|z - \partial D(1, 1)| \geq \varepsilon}, so if {T} clusters anywhere, then it can only cluster on {\partial D(1, 1)}. This gives the desired compactness. \Box

The above proof is basically just the proof of Ascoli's compactness theorem adapted to this setting and rephrased to replace the diagonal argument (or 👏 KEEP 👏 PASSING 👏 TO 👏 SUBSEQUENCES 👏) with the choice of a nonstandard natural. I think the point is that, once we have chosen a nontrivial ultrafilter on {\mathbf N}, a nonstandard function is the same thing as a sequence of functions, and the ultrafilter tells us which subsequences to pass to.

3. Approximating {f,f'} outside of {S}

We break up the approximation lemma into multiple parts. Let {K} be a standard compact set which does not meet {S}. Given a curve {\gamma} we denote its arc length by {|\gamma|}; we always assume that the arc length exists.

A point which stumped me for a humiliatingly long time is the following:

Lemma 3 Let {z, w \in K}. Then there is a curve {\gamma} from {z} to {w} which misses {S} and satisfies the uniform estimate

\displaystyle |z - w| \sim |\gamma|.

Proof: We use the decomposition of {S} into the arc

\displaystyle S_0 = \partial D(1, 1) \cap \overline{D(0, 1)}

and the discrete set {T}. We try to take {\gamma} to be the line segment {[z, w]}, but there are two things that could go wrong. If {[z, w]} hits a point of {T}, we can just perturb it slightly, at the cost of an error which is negligible compared to {|z - w|}. Otherwise we might hit a point of {S_0}, in which case we need to go the long way around. However, {S_0} and {K} are compact, so we have a uniform bound

\displaystyle \max(\frac{1}{|z - S_0|}, \frac{1}{|w - S_0|}) = O(1).

Therefore we can instead consider a curve {\gamma} which goes all the way around {S_0}, leaving {D(0, 1)}. This curve has length {O(1)} for {z, w} close to {S_0} (and if {z, w} are far from {S_0} we can just perturb a line segment without generating too much error). Using our uniform max bound above we see that this choice of {\gamma} is valid. \Box
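
One point left implicit: the two cases really are exclusive at the scale of {|z - w|}. If {d(z, S_0) \geq c} for a standard {c > 0} (which the displayed bound provides), then every point of the segment {[z, w]} is within {|z - w|} of {z}, so

\displaystyle d\big((1 - t)z + tw, S_0\big) \geq c - |z - w|, \qquad 0 \leq t \leq 1,

and hence the segment can only meet {S_0} when {|z - w| \geq c}, in which case a detour of length {O(1)} is indeed {O(|z - w|)}.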

Recall that the moments {\mu,\sigma} of {\zeta} are infinitesimal.

Since {||\zeta - S||_{L^\infty}} is infinitesimal, and the standard compact set {K} misses {S} (which contains {0}), so that {K} lies at a positive standard distance from {S} and from the infinitesimals, we have

\displaystyle |z - \zeta|, |z - \mu| \sim 1

uniformly in {z}. Therefore {f} has no critical points near {K} and so {f''/f'} is holomorphic on {K}.
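
As an aside, the relation between {f''/f'} and the random critical point is the log-derivative identity {f''(z)/f'(z) = \sum_j (z - \zeta_j)^{-1} = (n - 1)\, \mathbf E\, (z - \zeta)^{-1}}, which is presumably where {s_\zeta} came from earlier in the series. A throwaway numerical check (with a random polynomial, not the {f} of the series):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 10
    zeros = rng.normal(size=n) + 1j * rng.normal(size=n)
    zeros /= np.maximum(1.0, np.abs(zeros))   # push the zeroes into the closed unit disc
    f = np.poly(zeros)                        # monic f with these zeroes
    fp = np.polyder(f)                        # f'
    fpp = np.polyder(fp)                      # f''
    crit = np.roots(fp)                       # the n-1 critical points zeta_j
    z = 2.5 - 1.0j                            # a test point away from the critical points
    print(np.polyval(fpp, z) / np.polyval(fp, z))   # f''(z)/f'(z)
    print(np.sum(1.0 / (z - crit)))                 # sum_j 1/(z - zeta_j) = (n-1) E[1/(z - zeta)]

The two printed numbers agree up to rounding error.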

We first need a version of the fundamental theorem of calculus.

Lemma 4 Let {\gamma} be a contour in {K} of length {|\gamma|}. Then

\displaystyle f'(\gamma(1)) = f'(\gamma(0)) \left(\frac{\gamma(1) - \mu}{\gamma(0) - \mu}\right)^{n - 1} e^{O(n) |\gamma| \sigma^2}.

Proof: Our bounds on {|z - \zeta|} imply that we can take the Taylor expansion

\displaystyle \frac{1}{z - \zeta} = \frac{1}{z - \mu} + \frac{\zeta - \mu}{(z - \mu)^2} + O(|\zeta - \mu|^2)

in {\zeta} about {\mu}, uniformly in {z \in K} and in {\zeta}. Taking expectations preserves the constant term (since it doesn't depend on {\zeta}), kills the linear term, and replaces the quadratic term with a {\sigma^2}, thus

\displaystyle s_\zeta(z) = \frac{1}{z - \mu} + O(\sigma^2).

At the start of this series we showed

\displaystyle f'(\gamma(1)) = f'(\gamma(0)) \exp\left((n-1)\int_\gamma s_\zeta(z) ~dz\right).

Plugging in the Taylor expansion of {s_\zeta} we get

\displaystyle f'(\gamma(1)) = f'(\gamma(0)) \exp\left((n-1)\int_\gamma \frac{dz}{z - \mu}\right) e^{O(n) |\gamma| \sigma^2}.

Simplifying the integral we get

\displaystyle \exp\left((n-1)\int_\gamma \frac{dz}{z - \mu}\right) = \left(\frac{\gamma(1) - \mu}{\gamma(0) - \mu}\right)^{n - 1}

whence the claim. \Box
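
Here the simplification of the integral is just the fundamental theorem of calculus for a continuous branch of {\log(z - \mu)} along {\gamma} (which exists since {|z - \mu| \sim 1} on {K}, so {\gamma} avoids {\mu}):

\displaystyle \int_\gamma \frac{dz}{z - \mu} = \log(\gamma(1) - \mu) - \log(\gamma(0) - \mu), \qquad \exp\left((n-1)\int_\gamma \frac{dz}{z - \mu}\right) = \left(\frac{\gamma(1) - \mu}{\gamma(0) - \mu}\right)^{n - 1},

the {2\pi i} ambiguity in the choice of branch being harmless after exponentiating.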

Lemma 5 Uniformly for {z,w \in K} one has

\displaystyle f'(w) = (1 + O(n|z - w|\sigma^2 e^{o(n|z - w|)})) \frac{(w - \mu)^{n-1}}{(z - \mu)^{n - 1}}f'(z).

Proof: Applying the previous two lemmata we get

\displaystyle f'(w) = e^{O(n|z - w|\sigma^2)} \frac{(w - \mu)^{n-1}}{(z - \mu)^{n - 1}}f'(z).

It remains to simplify

\displaystyle e^{O(n|z - w|\sigma^2)} = 1 + O(n|z - w|\sigma^2 e^{o(n|z - w|)}).

Taylor expanding {\exp} and using the self-similarity of the Taylor expansion we get

\displaystyle e^z = 1 + O(|z| e^{|z|})

which gives that bound. \Box
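
Explicitly, the elementary bound being used here is

\displaystyle |e^z - 1| = \left|\sum_{k \geq 1} \frac{z^k}{k!}\right| \leq |z| \sum_{k \geq 1} \frac{|z|^{k-1}}{k!} \leq |z| e^{|z|}.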

Lemma 6 Let {\varepsilon > 0}. Then

\displaystyle f(z) = f(0) + \frac{1 + O(\sigma^2)}{n} f'(z) (z - \mu) + O((\varepsilon + o(1))^n)

uniformly in {z \in K}.

Proof: We may assume that {\varepsilon} is small enough depending on {K}, since the constant in the big-{O} notation can depend on {K} as well, and {\varepsilon} only appears next to implied constants. Now given {z} we can find a curve {\gamma} from {z} to {\partial D(0, \varepsilon)} which always moves at a speed uniformly bounded from below, and always in a direction towards the origin. Indeed, we can take {\gamma} to be a line segment which has been perturbed to miss the discrete set {T}, and possibly arced to miss {S_0} (say if {z} is far from {D(0, 1)}). By compactness of {K} we can choose the bounds on {\gamma} to be not just uniform in time but also in space (i.e. in {K}); moreover {\gamma} stays in a compact set {K'} which misses {S}. Indeed, one can take {K'} to be a closed ball containing {K}, and then cut out small holes in {K'} around {T} and {S_0}, whose radii are bounded below since {K} is compact. Since the moments of {\zeta} are infinitesimal one has

\displaystyle \int_\gamma (w - \mu)^{n-1} ~dw = \frac{(\gamma(1) - \mu)^n}{n} - \frac{(z - \mu)^n}{n} = O((\varepsilon + o(1))^n) - \frac{(z - \mu)^n}{n}.

Here we crudely bounded {(\gamma(1) - \mu)^n/n = O((\varepsilon + o(1))^n)}, using {|\gamma(1)| = \varepsilon} and {\mu = o(1)}.

By the previous lemma,

\displaystyle f'(w) = (1 + O(n|z - w|\sigma^2 e^{o(n|z - w|)})) \frac{(w - \mu)^{n-1}}{(z - \mu)^{n - 1}}f'(z).

Integrating this result along {\gamma} we get

\displaystyle f(\gamma(0)) = f(\gamma(1)) - \frac{f'(\gamma(0))}{(\gamma(0) - \mu)^{n-1}} \left(\int_\gamma (w - \mu)^{n-1} ~dw + O\left(n\sigma^2 \int_\gamma|\gamma(0) - w| e^{o(n|\gamma(0) - w|)}|w - \mu|^{n-1}~dw \right) \right).

Applying our preliminary bound (Lemma 1), the previous paragraph, and the fact that {|\gamma(1)| = \varepsilon}, so that

\displaystyle f(\gamma(1)) = f(0) + O((\varepsilon + o(1))^n),

we get

\displaystyle f(z) = f(0) + O((\varepsilon + o(1))^n) - \frac{f'(z)}{(z - \mu)^{n-1}} \left(-\frac{(z - \mu)^n}{n} + O((\varepsilon + o(1))^n) + O\left(n\sigma^2 \int_\gamma|z - w| e^{o(n|z - w|)}|w - \mu|^{n-1}~dw \right)\right).

We treat the main term first (note that the two minus signs cancel):

\displaystyle \frac{f'(z)}{(z - \mu)^{n-1}} \frac{(z - \mu)^n}{n} = \frac{1}{n} f'(z) (z - \mu).

For the {O((\varepsilon + o(1))^n)} term inside the parentheses, {z \in K} while {\mu^{(\infty)} = 0 \in S} and {K} misses {S}, so {|z - \mu|} is bounded from below, whence

\displaystyle \frac{f'(z)}{(z - \mu)^{n-1}} O((\varepsilon + o(1))^n) = O((\varepsilon + o(1))^n).

Thus we simplify

\displaystyle f(z) = f(0) + O((\varepsilon + o(1))^n) + \frac{1}{n} f'(z) (z - \mu) + \frac{f'(z)}{(z - \mu)^{n-1}} O\left(n\sigma^2 \int_\gamma|z - w| e^{o(n|z - w|)}|w - \mu|^{n-1}~dw \right).

It will be convenient to instead write this as

\displaystyle f(z) = f(0) + O((\varepsilon + o(1))^n) + \frac{1}{n} f'(z) (z - \mu) + O\left(n|f'(z)|\sigma^2 \int_\gamma|z - w| e^{o(n|z - w|)} \left|\frac{w - \mu}{z - \mu}\right|^{n-1}~dw \right).

Now we deal with the pesky integral. Since {\gamma} is moving towards {\partial B(0, \varepsilon)} at a speed which is bounded from below uniformly in “spacetime” (that is, {K \times [0, 1]}), there is a standard {c > 0} such that if {w = \gamma(t)} then

\displaystyle |w - \mu| \leq |z - \mu| - ct

since {\gamma} is going towards {\mu}. (Tao’s argument puzzles me a bit here because he claims that the real inner product {\langle z - w, z\rangle} is uniformly bounded from below in spacetime, which seems impossible if {w = z}. I agree with its conclusion though.) Exponentiating both sides we get

\displaystyle \left|\frac{w - \mu}{z - \mu}\right|^{n-1} = O(e^{-nct})

which (after shrinking the standard constant {c} slightly, using {|z - \mu| = O(1)}) bounds

\displaystyle f(z) = f(0) + O((\varepsilon + o(1))^n) + \frac{1}{n} f'(z) (z - \mu) + O\left(n|f'(z)|\sigma^2 \int_0^1 te^{-(c-o(1))nt} ~dt\right).

Since {c} is standard, it dominates the infinitesimal {o(1)}, so after shrinking {c} a little we get a new bound

\displaystyle f(z) = f(0) + O((\varepsilon + o(1))^n) + \frac{1}{n} f'(z) (z - \mu) + O\left(n|f'(z)|\sigma^2 \int_0^1 te^{-cnt} ~dt\right).

Since {\int_0^1 te^{-cnt} ~dt = O(n^{-2})} (see the computation after the proof) and {|z - \mu| \sim 1}, the last term is {\frac{O(\sigma^2)}{n} |f'(z)(z - \mu)|} and can be absorbed into the {\frac{1 + O(\sigma^2)}{n} f'(z)(z - \mu)} term. Plugging in everything we get the claim. \Box
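
For the record, the integral bound used at the end is elementary:

\displaystyle \int_0^1 t e^{-cnt} ~dt \leq \int_0^\infty t e^{-cnt} ~dt = \frac{1}{(cn)^2}, \qquad\text{so}\qquad n \int_0^1 t e^{-cnt} ~dt = O(n^{-1}).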

4. Control on zeroes away from {S}

After the gargantuan previous section, we can now show the “approximate level set” property that we discussed last time.

Lemma 7 Let {K} be a standard compact set which misses {S} and {\varepsilon > 0} standard. Then for every zero {\lambda_0 \in K} of {f},

\displaystyle U_\zeta(\lambda_0) = \frac{1}{n} \log \frac{1}{|f(0)|} + O(n^{-1}\sigma^2 + (\varepsilon + o(1))^n).

Last time we showed that this implies

\displaystyle U_\zeta(\lambda_0) = U_\zeta(a) + O(n^{-1}\sigma^2 + (\varepsilon + o(1))^n).

Thus all the zeroes of {f} either live in {S} or in a neighborhood of a level set of {U_\zeta}.

Proof: Plugging {z = \lambda_0} into the approximation

\displaystyle f(z) = f(0) + \frac{1 + O(\sigma^2)}{n} f'(z) (z - \mu) + O((\varepsilon + o(1))^n)

we get

\displaystyle f(0) + \frac{1 + O(\sigma^2)}{n} f'(\lambda_0) (\lambda_0 - \mu) = O((\varepsilon + o(1))^n).

Several posts ago, we proved {|f(0)| \sim 1} as a consequence of Grace's theorem, so {O((\varepsilon + o(1))^n) = |f(0)| \, O((\varepsilon + o(1))^n)}. In particular, if we solve for {f'(\lambda_0)} we get

\displaystyle \frac{|f'(\lambda_0)|}{n} |\lambda_0 - \mu| = |f(0)| (1 + O(\sigma^2 + (\varepsilon + o(1))^n)).

Using

\displaystyle U_\zeta(z) = \frac{\log n}{n - 1} - \frac{1}{n - 1} \log |f'(z)|,

plugging in {z = \lambda_0}, and taking logarithms, we get

\displaystyle -\frac{n - 1}{n} U_\zeta(\lambda_0) + \frac{1}{n} \log | \lambda_0 - \mu| = \frac{1}{n} \log |f(0)| + O(n^{-1}\sigma^2 + (\varepsilon + o(1))^n).

Now {\lambda_0 \in K} and {K} misses the standard compact set {S}, so since {0 \in S} we have

\displaystyle |\lambda_0 - \zeta|, |\lambda_0 - \mu| \sim 1

(since {\zeta^{(\infty)} \in S} and {\mu} is infinitesimal). So we can Taylor expand in {\zeta} about {\mu}:

\displaystyle \log |\lambda_0 - \zeta| = \log |\lambda_0 - \mu| - \text{Re }\frac{\zeta - \mu}{\lambda_0 - \mu} + O(|\zeta - \mu|^2).

Taking expectations and using {\mathbf E \zeta = \mu},

\displaystyle -U_\zeta(\lambda_0) = \log |\lambda_0 - \mu| + O(\sigma^2).

Plugging this formula for {\log |\lambda_0 - \mu|} into the earlier display, we see the claim. \Box
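
Spelled out, the substitution in the last step reads

\displaystyle -\frac{n - 1}{n} U_\zeta(\lambda_0) + \frac{1}{n}\left(-U_\zeta(\lambda_0) + O(\sigma^2)\right) = \frac{1}{n} \log |f(0)| + O(n^{-1}\sigma^2 + (\varepsilon + o(1))^n),

which rearranges to the claimed {U_\zeta(\lambda_0) = \frac{1}{n} \log \frac{1}{|f(0)|} + O(n^{-1}\sigma^2 + (\varepsilon + o(1))^n)}.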

I’m not sure who originally came up with the idea to reason like this; I think Tao credits M. J. Miller. Whoever it was had an interesting idea, I think: {f = 0} is a level set of {f}, but one that a priori doesn’t tell us much about {f'}. We have just replaced it with a level set of {U_\zeta}, a function that is explicitly closely related to {f'}, but at the price of an error term.

5. Fine control

We finish this series. If you want, you can let {\varepsilon > 0} be a standard real. I think, however, that it will be easier to think of {\varepsilon} as “infinitesimal, but not as infinitesimal as the term of the form o(1)”. In other words, {1/n} is smaller than any positive element of the ordered field {\mathbf R(\varepsilon)}; briefly, {1/n} is infinitesimal with respect to {\mathbf R(\varepsilon)}. We still reserve {o(1)} to mean an infinitesimal with respect to {\mathbf R(\varepsilon)}. Now {\varepsilon^n = o(1)} by underspill, since this is already true if {\varepsilon} is standard and {0 < \varepsilon < 1}. Underspill can also be used to transfer facts at scale {\varepsilon} to scale {1/n}. I think you can formalize this notion of “iterated infinitesimals” by taking an iterated ultrapower of {\mathbf R} in the theory of ordered rings.

Let us first bound {\log |a|}. Recall that {|a| \leq 1} so {\log |a| \leq 0} but in fact we can get a sharper bound. Since {T} is discrete we can get {e^{-i\theta}} arbitrarily close to whatever we want, say {-1} or {i}. This will give us bounds on {1 - a} when we take the Taylor expansion

\displaystyle \log|a| = -(1 - a)(1 + o(1)).

Lemma 8 Let {e^{i\theta} \in \partial D(0, 1) \setminus S} be standard. Then

\displaystyle \log |a| \leq \text{Re } ((1 - e^{-i\theta} + o(1))\mu) + O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

Proof: Let {K} be a standard compact set which misses {S} and {\lambda_0 \in K} a zero of {f}. Since {\zeta \notin K} (since {S} is close to {\zeta}) and {|a-\zeta|} has positive standard part (since {d(a, S) = 1}) we can take Taylor expansions

\displaystyle -\log |\lambda_0 - \zeta| = -\log |\lambda_0| + \text{Re } \frac{\zeta}{\lambda_0} + O(|\zeta|^2)

and

\displaystyle -\log |a - \zeta| = -\log|a| + \text{Re } \frac{\zeta}{a} + O(|\zeta|^2)

in {\zeta} about {0}. Taking expectations we have

\displaystyle U_\zeta(\lambda_0) = -\log |\lambda_0| + \text{Re } \frac{\mu}{\lambda_0} + O(\mathbf E |\zeta|^2)

and similarly for {a}. Thus

\displaystyle -\log |a| + \text{Re } \frac{\mu}{a} = -\log |\lambda_0| + \text{Re } \frac{\mu}{\lambda_0} + O(\mathbf E |\zeta|^2 + n^{-1}\sigma^2 + (\varepsilon + o(1))^n)

since

\displaystyle U_\zeta(\lambda_0) - U_\zeta(a) = O(n^{-1}\sigma^2 + (\varepsilon + o(1))^n).

Since

\displaystyle \mathbf E|\zeta|^2 = |\mu|^2 + \sigma^2

we have

\displaystyle -\log|\lambda_0| + \text{Re } \left(\frac{1}{\lambda_0} - \frac{1}{a}\right)\mu = -\log|a| + O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

Now {|\lambda_0| \leq 1} so {-\log |\lambda_0| \geq 0}, whence

\displaystyle \log|a| \leq \text{Re } \left(\frac{1}{a} - \frac{1}{\lambda_0}\right)\mu + O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

Now recall that {\lambda^{(\infty)}} is uniformly distributed on {\partial D(0, 1)}, so we can choose {\lambda_0} so that

\displaystyle |\lambda_0 - e^{i\theta}| = o(1).

Thus

\displaystyle \frac{1}{a} - \frac{1}{\lambda_0} = 1 - e^{-i\theta} + o(1)

which we can plug in to get the claim. \Box
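
Explicitly, since {a = 1 - o(1)} and {\lambda_0 = e^{i\theta} + o(1)} are both bounded away from {0},

\displaystyle \frac{1}{a} = 1 + o(1), \qquad \frac{1}{\lambda_0} = e^{-i\theta} + o(1), \qquad \frac{1}{a} - \frac{1}{\lambda_0} = 1 - e^{-i\theta} + o(1).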

Now we prove the first part of the fine control lemma.

Lemma 9 One has

\displaystyle \mu, 1 - a = O(\sigma^2 + (\varepsilon + o(1))^n).

Proof: Let {\theta_+ \in [0.98\pi, 0.99\pi]} be a standard real such that {e^{\pm i\theta_+} \notin S}, and set {\theta_- = 2\pi - \theta_+ \in [1.01\pi, 1.02\pi]}, so that {e^{-i\theta_+} + e^{-i\theta_-} = 2\cos\theta_+} is real and lies in {[-2, -1.99]}. (I don't think the exact constants matter; we just need {\theta_\pm} to be close to {\pi}, symmetrically, and off the countable set of angles hitting {S}.) Anyways, summing two copies of the inequality from the previous lemma with {\theta = \theta_\pm} we have

\displaystyle \text{Re } ((1 - e^{-i\theta_+} + 1 - e^{-i\theta_-} + o(1))\mu) \geq 2\log |a| + O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

The coefficient on the left is {2 - 2\cos\theta_+ + o(1)}, which is real up to the {o(1)} error and satisfies

\displaystyle 2 - 2\cos\theta_+ + o(1) \geq 3.8,

so the left-hand side equals {(2 - 2\cos\theta_+ + o(1))\,\text{Re }\mu} up to an error of size {o(|\mu|)}. Since {\log|a| \leq 0}, dividing through by the coefficient gives

\displaystyle \text{Re } \mu \geq \frac{\log|a|}{1.9} + o(|\mu|) + O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

Indeed,

\displaystyle -\log |a| = (1 - a)(1 + o(1)),

so

\displaystyle \text{Re }\mu \geq -\frac{1 - a}{1.9 + o(1)} + o(|\mu|) + O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

If we square the tautology {|\zeta - a| \geq 1} then we get

\displaystyle |\zeta|^2 - 2a \text{Re }\zeta + a^2 \geq 1.

Taking expected values we get

\displaystyle |\mu|^2 + \sigma^2 - 2a \text{Re }\mu + a^2 \geq 1

or in other words

\displaystyle \text{Re }\mu \leq -\frac{1 - a^2}{2a} + O(|\mu|^2 + \sigma^2) = -(1 - a)(1 + o(1)) + O(|\mu|^2 + \sigma^2)

where we used the Taylor expansion

\displaystyle \frac{1 - a^2}{2a} = (1 - a)(1 + o(1))

obtained by Taylor expanding {1/a} about {1} and applying {1 - a = o(1)}. Using

\displaystyle \text{Re }\mu \geq -\frac{1 - a}{1.9 + o(1)} + o(|\mu|) + O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n)

we get

\displaystyle -\frac{1 - a}{1.9 + o(1)} + o(|\mu|) + O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n) \leq \text{Re }\mu \leq -(1 - a)(1 + o(1)) + O(|\mu|^2 + \sigma^2).

Thus

\displaystyle (1 - a)\left(1 - \frac{1}{1.9 + o(1)} + o(1)\right) \leq o(|\mu|) + O(\sigma^2 + (\varepsilon + o(1))^n).

Dividing both sides by {1 - \frac{1}{1.9 + o(1)} + o(1)}, which has standard part {1 - \frac{1}{1.9} > 0.4}, we have

\displaystyle 1 - a = o(|\mu|) + O(\sigma^2 + (\varepsilon + o(1))^n).

In particular

\displaystyle \text{Re }\mu = \left(o(|\mu|) + O(\sigma^2 + (\varepsilon + o(1))^n)\right)(1 + o(1)) + O(|\mu|^2 + \sigma^2) = o(|\mu|) + O(\sigma^2 + (\varepsilon + o(1))^n).

Now we treat the imaginary part {\text{Im } \mu}. The previous lemma gave

\displaystyle \text{Re } ((1 - e^{-i\theta} + o(1))\mu) \geq \log |a| + O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

Writing everything in terms of real and imaginary parts we can expand out

\displaystyle \text{Re } ((1 - e^{-i\theta} + o(1))\mu) = (1 - \cos \theta + o(1))\text{Re } \mu - (\sin \theta + o(1))\text{Im }\mu.

Using the bounds

\displaystyle (1 - \cos \theta + o(1))\text{Re }\mu, ~\log |a| = o(|\mu|) + O(\sigma^2 + (\varepsilon + o(1))^n)

(which follow from the previous paragraph and the bound {\log |a| = O(1 - a)}), we have

\displaystyle (\sin \theta + o(1))\text{Im } \mu \leq o(|\mu|) + O(\sigma^2 + (\varepsilon + o(1))^n).

Since {T} is discrete we can find {\theta} arbitrarily close to {\pm \pi/2} which meets the hypotheses of the above equation. Therefore

\displaystyle \text{Im } \mu = o(|\mu|) + O(\sigma^2 + (\varepsilon + o(1))^n).

Plugging everything in, and using {|\mu| \leq |\text{Re } \mu| + |\text{Im } \mu|}, we get

\displaystyle 1 - a, ~|\mu| = o(|\mu|) + O(\sigma^2 + (\varepsilon + o(1))^n).

Now the {o(|\mu|)} term is negligible compared to {|\mu|} (and it also swallows the {|\mu|^2} terms, since {\mu} is infinitesimal), so it can be absorbed into the left-hand side, and the claim follows. \Box

Now we prove the second part. The point is that we are finally ready to dispose of the semi-infinitesimal {\varepsilon}; doing so puts a lower bound on {U_\zeta(a)}.

Lemma 10 Let {I \subseteq \partial D(0, 1) \setminus S} be a standard compact set. Then for every {e^{i\theta} \in I},

\displaystyle U_\zeta(a) - U_\zeta(e^{i\theta}) \geq -o(\sigma^2) - o(1)^n.

Proof: Since {\lambda^{(\infty)}} is uniformly distributed on {\partial D(0, 1)}, there is a zero {\lambda_0} of {f} with {|\lambda_0 - e^{i\theta}| = o(1)}. Since {|\lambda_0| \leq 1}, we can find an infinitesimal {\eta} such that

\displaystyle \lambda_0 = e^{i\theta}(1 - \eta)

and {|1 - \eta| \leq 1}. In the previous section we proved

\displaystyle U_\zeta(a) - U_\zeta(\lambda_0) = O(n^{-1}\sigma^2 + (\varepsilon + o(1))^n).

Using {n^{-1} = o(1)} and plugging in {\lambda_0} we have

\displaystyle U_\zeta(a) - U_\zeta(e^{i\theta}(1 - \eta)) = o(\sigma^2) + O((\varepsilon + o(1))^n).

Now

\displaystyle \text{Re } \eta \int_0^1 \frac{dt}{1 - t\eta - e^{-i\theta}\zeta} = \log |1 - e^{-i\theta}\zeta| - \log|1 - \eta - e^{-i\theta}\zeta| = \log|e^{i\theta} - \zeta| - \log|e^{i\theta} - e^{i\theta}\eta - \zeta|.

Taking expectations,

\displaystyle \text{Re }\eta \mathbf E\int_0^1 \frac{dt}{1 - t\eta - e^{-i\theta}\zeta} = U_\zeta(e^{i\theta}(1 - \eta)) - U_\zeta(e^{i\theta}).

Taking a Taylor expansion,

\displaystyle \frac{1}{1 - t\eta - e^{-i\theta}\zeta} = \frac{1}{1 - t\eta} + \frac{e^{-i\theta}\zeta}{(1 - t\eta)^2} + O(|\zeta|^2)

so by Fubini’s theorem

\displaystyle \mathbf E\int_0^1 \frac{dt}{1 - t\eta - e^{-i\theta}\zeta} = \int_0^1 \left(\frac{1}{1 - t\eta} + \frac{e^{-i\theta}}{(1 - t\eta)^2}\mu + O(|\mu|^2 + \sigma^2)\right)~dt;

using the previous lemma and {\eta = o(1)} we get

\displaystyle  U_\zeta(e^{i\theta}(1 - \eta)) - U_\zeta(e^{i\theta}) = \text{Re }\eta \int_0^1 \frac{dt}{1 - t\eta} + o(\sigma^2) + O((\varepsilon + o(1))^n).

We also have

\displaystyle \text{Re } \eta \int_0^1 \frac{dt}{1 - t\eta} = \log \frac{1}{|e^{i\theta} - e^{i\theta}\eta|} = U_0(1 - \eta)

since {0} is deterministic (and {U_0(e^{i\theta} z) = U_0(z)}, and {U_0(1) = 0}; very easy to check!) I think Tao makes a typo here, referring to {U_i(e^{i\theta}(1 - \eta))}, which seems irrelevant. We do have

\displaystyle U_0(1 - \eta) = -\log|1 - \eta| \geq 0

since {|1 - \eta| \leq 1}. Plugging in

\displaystyle \text{Re } \eta \int_0^1 \frac{dt}{1 - t\eta} \geq 0

we get

\displaystyle U_\zeta(e^{i\theta} - e^{i\theta}\eta) - U_\zeta(e^{i\theta}) \geq -o(\sigma^2) - O((\varepsilon + o(1))^n).

I think Tao makes another typo, dropping the Big O, but anyways,

\displaystyle U_\zeta(a) - U_\zeta(e^{i\theta} - e^{i\theta}\eta) = o(\sigma^2) - O((\varepsilon + o(1))^n)

so by the triangle inequality

\displaystyle U_\zeta(a) - U_\zeta(e^{i\theta}) \geq -o(\sigma^2) - O((\varepsilon + o(1))^n).

By underspill, then, we can take {\varepsilon \rightarrow 0}. \Box
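
The first display of the proof is just the fundamental theorem of calculus applied to {t \mapsto -\log(1 - t\eta - e^{-i\theta}\zeta)} (the denominator never vanishes, since {e^{i\theta}(1 - t\eta)} stays a positive standard distance from the critical points):

\displaystyle \int_0^1 \frac{\eta ~dt}{1 - t\eta - e^{-i\theta}\zeta} = \log(1 - e^{-i\theta}\zeta) - \log(1 - \eta - e^{-i\theta}\zeta),

and taking real parts (and using {|e^{i\theta}| = 1} to move a factor of {e^{i\theta}} inside the absolute values) gives the stated identity.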

We need a result from complex analysis called Jensen’s formula which I hadn’t heard of before.

Theorem 11 (Jensen’s formula) Let {g: D(0, 1) \rightarrow \mathbf C} be a holomorphic function with zeroes {a_1, \dots, a_n \in D(0, 1)} and {g(0) \neq 0}. Then

\displaystyle \log |g(0)| = \sum_{j=1}^n \log |a_j| + \frac{1}{2\pi} \int_0^{2\pi} \log |g(e^{i\theta})| ~d\theta.

In hindsight this is kinda trivial but I never realized it. The point is that {\log |g|} is subharmonic, and its Laplacian is exactly a linear combination of delta functions at the zeroes of {g}. If you subtract those away then this is just the mean-value property

\displaystyle \log |g(0)| = \frac{1}{2\pi} \int_0^{2\pi} \log |g(e^{i\theta})| ~d\theta.
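
A throwaway numerical check of Jensen's formula, with a random polynomial standing in for {g} (purely illustrative, not part of the argument):

    import numpy as np

    rng = np.random.default_rng(2)
    deg = 6
    zeros = 0.8 * (rng.random(deg) - 0.5) + 0.8j * (rng.random(deg) - 0.5)  # zeroes well inside D(0,1)
    g = np.poly(zeros)                                                      # monic g with these zeroes
    theta = np.linspace(0.0, 2.0 * np.pi, 20000, endpoint=False)
    boundary_avg = np.mean(np.log(np.abs(np.polyval(g, np.exp(1j * theta)))))
    print(np.log(np.abs(np.polyval(g, 0.0))))            # log|g(0)|
    print(np.sum(np.log(np.abs(zeros))) + boundary_avg)  # sum_j log|a_j| + boundary average

The two printed numbers agree up to the quadrature error.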

Let us finally prove the final part. In what follows, implied constants are allowed to depend on {\varphi} but not on {\delta}.

Lemma 12 For any standard {\varphi \in C^\infty(\partial D(0, 1))},

\displaystyle \int_0^{2\pi} \varphi(e^{i\theta}) U_\zeta(e^{i\theta}) ~d\theta = o(\sigma^2) + o(1)^n.

Besides,

\displaystyle U_\zeta(a) = o(\sigma^2) + o(1)^n.

Proof: Let {m} be the Haar measure on {\partial D(0, 1)}. We first prove this when {\varphi \geq 0}. Since {T} is discrete and {\partial D(0, 1)} is compact, for any standard (or semi-infinitesimal) {\delta > 0}, there is a standard compact set

\displaystyle I \subseteq \partial D(0, 1) \setminus S

such that

\displaystyle m(\partial D(0, 1) \setminus I) < \delta.

By the previous lemma, if {e^{i\theta} \in I} then

\displaystyle \varphi(e^{i\theta}) U_\zeta(a) - \varphi(e^{i\theta}) U_\zeta(e^{i\theta}) \geq -o(\sigma^2) - o(1)^n

and the same holds when we average in Haar measure:

\displaystyle  U_\zeta(a)\int_I \varphi~dm - \int_I \varphi(e^{i\theta}) U_\zeta(e^{i\theta})~dm(e^{i\theta}) \geq -o(\sigma^2) - o(1)^n.

We have

\displaystyle \big\| \log |e^{i\theta} - \zeta| + \text{Re } (e^{-i\theta}\zeta) \big\|_{L^2(dm(e^{i\theta}))} = O(1)

uniformly in {|\zeta| \leq 1} (the singularity of the logarithm is only logarithmic, hence square-integrable), so, using the Cauchy-Schwarz inequality, one has

\displaystyle \int_{\partial D(0, 1) \setminus I} \varphi(e^{i\theta}) \left(\log |e^{i\theta} - \zeta| + \text{Re } (e^{-i\theta}\zeta)\right) ~dm(e^{i\theta}) = O\left(\left(\int_{\partial D(0, 1) \setminus I} \varphi^2 ~dm\right)^{1/2}\right) = O(\delta^{1/2}).

Meanwhile, if {|\zeta| \leq 1/2} then the fact that

\displaystyle \log |e^{i\theta} - \zeta| = -\text{Re } (e^{-i\theta}\zeta) + O(|\zeta|^2)

implies

\displaystyle \log |e^{i\theta} - \zeta| + \text{Re } \frac{\zeta}{e^{i\theta}} = O(|\zeta|^2)

and hence

\displaystyle \int_{\partial D(0, 1) \setminus I} \varphi(e^{i\theta}) (\log |e^{i\theta} - \zeta| + \text{Re } e^{-i\theta}\zeta) ~dm(e^{i\theta}) = O(\delta|\zeta|^2).

We combine these into the unified estimate

\displaystyle \int_{\partial D(0, 1) \setminus I} \varphi(e^{i\theta}) (\log |e^{i\theta} - \zeta| + \text{Re } e^{-i\theta}\zeta) ~dm(e^{i\theta}) = O(\delta^{1/2}|\zeta|^2)

valid for all {|\zeta| \leq 1}, hence almost surely. Taking expected values we get

\displaystyle \int_{\partial D(0, 1) \setminus I} \left(-\varphi(e^{i\theta})U_\zeta(e^{i\theta}) + \varphi(e^{i\theta}) \text{Re }(e^{-i\theta}\mu)\right) ~dm(e^{i\theta}) = O(\delta^{1/2}(|\mu|^2 + \sigma^2)).

In the last lemma we bounded {|\mu|} so we can absorb all the terms with {\mu} in them to get

\displaystyle \int_{\partial D(0, 1) \setminus I} \varphi(e^{i\theta})U_\zeta(e^{i\theta}) ~dm(e^{i\theta}) = O(\delta^{1/2}\sigma^2) + o(\sigma^2) + o(1)^n.

We also have

\displaystyle \int_{\partial D(0, 1) \setminus I} \varphi ~dm = O(\delta)

(here Tao refers to a mysterious undefined measure {\sigma} but I’m pretty sure he means {m}). Putting these integrals together with the integrals over {I},

\displaystyle U_\zeta(a)\int_{\partial D(0, 1)} \varphi ~dm - \int_{\partial D(0, 1)} \varphi(e^{i\theta}) U_\zeta(e^{i\theta}) ~dm(e^{i\theta}) \geq -O(\delta^{1/2}\sigma^2) - O(\delta) - o(\sigma^2) - o(1)^n.

By underspill we can delete {\delta}, thus

\displaystyle  U_\zeta(a)\int_{\partial D(0, 1)} \varphi ~dm - \int_{\partial D(0, 1)} \varphi(e^{i\theta}) U_\zeta(e^{i\theta}) ~dm(e^{i\theta}) \geq - o(\sigma^2) - o(1)^n.

We now consider the specific case {\varphi = 1}. Then

\displaystyle U_\zeta(a) - \int_{\partial D(0, 1)} U_\zeta ~dm \geq -o(\sigma^2) - o(1)^n.

Now Tao claims and doesn’t prove

\displaystyle \int_{\partial D(0, 1)} U_\zeta ~dm = 0.

To see this, we expand as

\displaystyle \int_{\partial D(0, 1)} U_\zeta ~dm = -\mathbf E \frac{1}{2\pi} \int_0^{2\pi} \log|\zeta - e^{i\theta}| ~d\theta

using Fubini’s theorem. Now we use Jensen’s formula with {g(z) = \zeta - z}, which has a zero exactly at {\zeta}. This seems problematic if {\zeta = 0}, but we can condition on {|\zeta| > 0}. Indeed, if {\zeta = 0} then we have

\displaystyle  \int_0^{2\pi} \log|\zeta - e^{i\theta}| ~d\theta = \int_0^{2\pi} \log 1 ~d\theta = 0

which already gives us what we want. Anyways, if {|\zeta| > 0}, then by Jensen’s formula,

\displaystyle \frac{1}{2\pi} \int_0^{2\pi} \log|\zeta - e^{i\theta}| ~d\theta = \log |\zeta| - \log |\zeta| = 0.

So that’s how it is. Thus we have

\displaystyle -U_\zeta(a) \leq o(\sigma^2) + o(1)^n.

Since {|a - \zeta| \geq 1}, {\log |a - \zeta| \geq 0}, so the same is true of its expected value {-U_\zeta(a)}. This gives the desired bound

\displaystyle U_\zeta(a) = o(\sigma^2) + o(1)^n.

We can use that bound to discard {U_\zeta(a)} from the average

\displaystyle  U_\zeta(a)\int_{\partial D(0, 1)} \varphi ~dm - \int_{\partial D(0, 1)} \varphi(e^{i\theta}) U_\zeta(e^{i\theta}) ~dm(e^{i\theta}) \geq - o(\sigma^2) - o(1)^n,

thus

\displaystyle \int_{\partial D(0, 1)} \varphi(e^{i\theta}) U_\zeta(e^{i\theta}) ~dm(e^{i\theta})= o(\sigma^2) + o(1)^n.

Repeating the Jensen’s formula argument from above we see that we can replace {\varphi} with {\varphi - k} for any {k \geq 0}. So this holds even if {\varphi} is not necessarily nonnegative. \Box