Sundry facts about pseudodifferential operators

In this blog post I will just record some things I’ve been trying to learn about lately, largely just so I can have a place to collect my thoughts. Most of this is in Hörmander’s monograph on differential operators, and is motivated by trying to understand Vasy’s method and Atiyah-Singer index theory.

Pseudodifferential operators on manifolds.

Let us recall that a symbol on an open subset X of \mathbb R^d is by definition a smooth function on the cotangent bundle of X (for which certain seminorms are finite). This was curious to me: you can motivate it by saying that a symbol is an observable and the cotangent bundle is “phase space” in the sense that a point (x, \xi) \in T^*X consists of a position x and a momentum \xi, but why should the momentum live in a cotangent space and not the fiber of some other vector bundle? When we quantize a symbol a, defining an operator a(D) by formally substituting the differential operator D = -i\nabla in place of the momentum, we by definition obtain a pseudodifferential operator. Now let \kappa: X \to Y be a diffeomorphism, and introduce the pushforward symbol \kappa_* a(y, \eta) = e^{-iy\eta} a(\kappa^{-1}(y), D) e^{iy\eta}. This is the “right” definition in the sense that (\kappa_*a(y, D)u)(\kappa(x)) = a(x, D)(u \circ \kappa)(x).

If a is a symbol of order m, then \kappa_* a(y, \eta) = a(\kappa^{-1}(y), \kappa'(\kappa^{-1}(y))^t \eta) modulo symbols of order m – 1. But \kappa'(x) is invariantly defined as an isomorphism of tangent spaces \kappa'(x): T_xX \to T_{\kappa(x)}Y, so its transpose should be an isomorphism \kappa'(x)^t: T^*_{\kappa(x)}Y \to T^*_xX of the dual spaces. This only makes sense if \eta \in T^*_yY is a covector at y.
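To see concretely why \eta has to be a covector, here is a minimal sketch for a linear change of coordinates \kappa(x) = Ax, so that \kappa' = A everywhere. The matrix A, the tangent vector v and the covector eta are made-up numbers; the only point is that moving \eta by the transpose A^t preserves the pairing with tangent vectors.

```python
def matvec(matrix, vector):
    # Ordinary matrix-vector product.
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def transpose(matrix):
    return [list(column) for column in zip(*matrix)]

def pairing(covector, vector):
    # Evaluate a covector on a tangent vector.
    return sum(a * b for a, b in zip(covector, vector))

A = [[2.0, 1.0], [0.0, 3.0]]   # Jacobian of the hypothetical linear map kappa
v = [1.0, -1.0]                # tangent vector at x
eta = [0.5, 4.0]               # covector at y = kappa(x)

xi = matvec(transpose(A), eta)            # covector at x, transported by A^t
pairing_at_x = pairing(xi, v)
pairing_at_y = pairing(eta, matvec(A, v))
print(pairing_at_x, pairing_at_y)
```

Both pairings agree, which is exactly the statement that A^t maps covectors at y to covectors at x.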

The above paragraphs are totally obvious, and yet puzzled me for the past three years, until last week when I sat down and decided to work out the details for myself.

The consequence is that we cannot define the symbol of a pseudodifferential operator invariantly. Rather, we declare that an operator A on a manifold M is pseudodifferential if for every chart \kappa: X \to Y and every pair of cutoffs \phi, \psi on Y, the operator \phi \circ \kappa_* \circ A \circ \kappa^* \circ \psi is a pseudodifferential operator on Y (in the sense that it is the quantization of a symbol on Y; here the pushforward \kappa_* is defined to be the inverse of the pullback \kappa^*). Since Y is an open subset of \mathbb R^d this makes sense.

Previously we have discussed pseudodifferential operators on manifolds M. These can be viewed more abstractly as acting on sections of the trivial line bundle M \times \mathbb C. However, in geometry one frequently has to deal with sections of more general vector bundles over M. For example, a 1-form is a section of the cotangent bundle. If E, F are vector bundles over M of rank r, s respectively, one may define the Hom-bundle Hom(E, F), which locally is isomorphic to the matrix bundle M \times \mathbb C^{s \times r}. Then a pseudodifferential operator from sections of E to sections of F is nothing more than a linear map which, after trivialization of E and F, looks like an s \times r matrix of pseudodifferential operators on M. The principal symbol of such an operator sends the cotangent bundle of M into the Hom-bundle Hom(E, F).

Wavefront sets.

In this section we will impose that all pseudodifferential operators have Schwartz kernels K such that the projections of supp K are both proper maps. Modulo the space \Psi^{-\infty} of pseudodifferential operators of order -\infty, this assumption is no loss of generality. Under this assumption, the top-order term of a symbol — that is, the principal symbol — satisfies the pushforward formula \kappa_* a(y, \eta) = a(\kappa^{-1}(y), \kappa'(\kappa^{-1}(y))^t \eta), so the principal symbol is well-defined as an element of S^m/S^{m-1} (here S^\ell is the \ellth symbol class). The principal symbol encodes important information about the nature of the operator; for example we have:

Definition. An elliptic pseudodifferential operator of order m is one whose principal symbol is \sim |\xi|^m near infinity in each cotangent space.

The important property is that if A is an elliptic pseudodifferential operator, then A is also invertible modulo the quantization \Psi^{-\infty} of S^{-\infty}. For example the Laplace-Beltrami operator is elliptic on Riemannian manifolds since its symbol is \xi^2; since the quadratic form induced by a Lorentzian metric is not positive-definite, it follows that on Lorentzian manifolds, the Laplace-Beltrami operator is not elliptic. Since a Lorentzian Laplace-Beltrami operator is really just the d’Alembertian, whose symbol is \xi^2 - \tau^2, this should be no surprise.
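As a quick sanity check of the last example (a sketch, not a proof): in two variables, the Riemannian symbol \xi^2 + \tau^2 stays away from zero on the unit circle of a cotangent space, while the Lorentzian symbol \xi^2 - \tau^2 vanishes on the light cone. Since both symbols are homogeneous of degree 2, a lower bound on the unit circle is the same as being \sim |\xi|^2 near infinity. The function names and the sampling resolution are just illustrative choices.

```python
import math

def riemannian_symbol(xi, tau):
    return xi**2 + tau**2

def lorentzian_symbol(xi, tau):
    return xi**2 - tau**2

def min_abs_on_unit_circle(symbol, samples=360):
    # Smallest value of |symbol| over equally spaced points of the unit circle.
    values = []
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        values.append(abs(symbol(math.cos(theta), math.sin(theta))))
    return min(values)

elliptic_floor = min_abs_on_unit_circle(riemannian_symbol)
wave_floor = min_abs_on_unit_circle(lorentzian_symbol)
print(elliptic_floor, wave_floor)
```

The first number is essentially 1 and the second is zero up to rounding, because the sample at angle \pi/4 lies on the light cone \xi = \tau.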

Recall that a conic set in a vector space is a set which is closed under multiplication by positive scalars. A conic set in a vector bundle, then, is one which is conic in every fiber.

Definition. Let a be the principal symbol of a pseudodifferential operator A of order m. We say that A is noncharacteristic near (x_0, \xi_0) \in T^*M if there is a conic neighborhood of (x_0, \xi_0) wherein a(x, \xi) \sim |\xi|^m near infinity. Otherwise, we say that (x_0, \xi_0) is a characteristic point. The set of characteristic points is denoted Char A and the set of noncharacteristic points is denoted Ell A.

Thus a pseudodifferential operator A is noncharacteristic at (x, \xi) if in a neighborhood of x, A is elliptic when restricted to the direction \xi. By definition, Char A is closed, so we may make the following definition.

Definition. Let u be a distribution. The wavefront set WF(u) is the intersection of all sets Char A, where A ranges over pseudodifferential operators such that Au \in C^\infty.

Then WF(u) is a closed conic subset of the cotangent bundle T^*M, and its projection to M is exactly the singular support ss(u). Indeed, x \notin ss(u) iff for every pseudodifferential operator A supported in a sufficiently small neighborhood of x, Au \in C^\infty; in other words no matter how hard we try, we cannot force u to become singular without differentiating it away from x. The wavefront set also remembers the direction in which this singularity happens; by elliptic invertibility, it will not happen in a direction in which A is noncharacteristic.

For example, the only way that u(x, y) = \delta_{y = 0} can be made smooth is by cutting u off away from \{(x, y): y = 0\}, which can be done by pseudodifferential operators of order 0 which are elliptic in the x-direction, but cannot be elliptic in the y-direction, along the x-axis.

Pseudotransport equations.

Hyperbolic operators are meant to generalize the transport equation (\partial_t - \partial_x)u(t, x) = 0. Let us therefore begin by studying the “pseudotransport” equation (\partial_t + a(t, x, D_x))u(t, x) = 0.

We assume that t \mapsto a(t, x, D_x) is uniformly bounded in S^1 and continuous in C^\infty, and the real part of a is uniformly bounded from below. Then we have the energy estimate

\displaystyle \frac{1}{2} \int_0^T ||e^{-\lambda t} u(t)||_{H^s}^p \lambda~dt \leq ||u(0)||_{H^s}^p

valid for any s \in \mathbb R and \lambda large enough depending on s. Applying the Hahn-Banach theorem we conclude that for every initial data in H^s we can find u \in C^0([0, \infty) \to H^s) which solves the pseudotransport equation. In particular, given Schwartz initial data, it follows that u is smooth.

Now fix initial data \phi \in H^s and assume that the principal symbol exists and is imaginary. (This forces the transport operator to be real and of order 1.) Let q be a symbol of order 0 on space, with principal symbol q_0. If in fact Q(D) is a pseudodifferential operator on spacetime such that at time 0, Q(0) = q, and Q(t, D) commutes with \partial_t + a(t, x, D_x), then Qu solves the pseudotransport equation. (Actually, we will find Q so that [Q(t), \partial_t + a(t, x, D_x)] is a pseudodifferential operator of order -\infty; this is good enough.) In particular if q\phi \in C^\infty_0 then WF(u) is contained in Char Q, and WF(u) should be the intersection of all such sets Char Q.

To compute WF(u), let ia_0 be the principal symbol of a(D) and suppose that Q \sim \sum_j Q_j, where Q_0 is principal, is given. Then the principal symbol of [\partial_t + a(t, x, D_x), Q(t, x, D)] is the Poisson bracket

\displaystyle \{\tau + a_0(t, x, \xi), Q_0(t, x, \xi)\} = (\partial_t + H_{a_0})Q_0

where H_p is the Hamilton vector field of a symbol p. By inducting on j, we can use this computation to compute Q_j and conclude that modulo an error term of order -\infty, we can choose Q to be invariant along the Hamiltonian flow \psi given by the Hamiltonian a_0. That is, if F_tu(0) = u(t), then WF \circ F_t = \psi_t \circ WF. This result is a sort of “propagation of singularities” for the pseudotransport equation, which generalizes the fact that the transport equation acts on Dirac masses by transporting them, as expected.
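For the honest transport equation (\partial_t - \partial_x)u = 0 this can be seen by hand: the solution with initial data g is u(t, x) = g(x + t), so a spike at x = 0 sits at x = -t at time t. Here is a small numerical sketch of that model case, with a narrow Gaussian standing in for the Dirac mass; the width, the grid and the helper names are arbitrary assumptions.

```python
import math

def bump(x, width=0.05):
    # Narrow Gaussian standing in for a Dirac mass at x = 0.
    return math.exp(-(x / width) ** 2)

def u(t, x):
    # Solves (d/dt - d/dx) u = 0 with u(0, x) = bump(x).
    return bump(x + t)

def residual(t, x, h=1e-5):
    # Centered finite differences for (d/dt - d/dx) u at the point (t, x).
    du_dt = (u(t + h, x) - u(t - h, x)) / (2 * h)
    du_dx = (u(t, x + h) - u(t, x - h)) / (2 * h)
    return du_dt - du_dx

def peak_location(t, grid):
    # Grid point where u(t, .) is largest.
    return max(grid, key=lambda x: u(t, x))

grid = [-1.0 + 0.01 * k for k in range(201)]
peak_at_start = peak_location(0.0, grid)
peak_later = peak_location(0.5, grid)
print(peak_at_start, peak_later, residual(0.3, -0.28))
```

The peak moves from x = 0 to x = -0.5 after time 0.5, and the finite-difference residual of the equation is zero up to rounding.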

Solving the hyperbolic Cauchy problem.

Let X be a manifold that represents “spacetime”. A priori we may not have a Lorentzian metric to work with, so instead we fix a function \phi that is a “time coordinate”. The level surfaces of \phi can be viewed as “spacelike hypersurfaces” in X.

Throughout we will let X_0 = \{\phi = 0\} and X_+ = \{\phi > 0\} denote the present and future, respectively.

Definition. A hyperbolic operator is a differential operator P of principal symbol p and order m such that p(x, d\phi(x)) \neq 0 and for every (x, \xi) \in T^*X such that \xi is not in the span of d\phi(x), there are m distinct \tau \in \mathbb R such that p(x, \xi + \tau d\phi(x)) = 0.

Since P is a differential operator, p(x, \cdot) is a homogeneous polynomial of degree m. To make sense of the condition, let me restrict to the case that X = \mathbb R^2 with its usual Riemannian metric and \phi is the projection onto the t-axis. Then after rotating the first coordinate so that \xi is a covector dual to the x-axis, the condition says that given (x, t, \xi) we can find exactly m real numbers \tau such that p(x, t, \xi, \tau) = 0. In the case of the d’Alembertian, we have p(x, t, \xi, \tau) = \xi^2 - \tau^2, and indeed given \xi we can set \tau = \pm \xi.
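The root-counting condition can be checked mechanically in this two-dimensional example. The sketch below treats the symbol as a quadratic in \tau for a fixed \xi; the helper name real_roots_in_tau and the choice \xi = 3 are hypothetical, and the Riemannian symbol \xi^2 + \tau^2 is included only for contrast.

```python
import math

def real_roots_in_tau(a, b, c):
    """Real roots of a*tau^2 + b*tau + c = 0, sorted; assumes a != 0."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    if disc == 0:
        return [-b / (2 * a)]
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])

xi = 3.0
# d'Alembertian symbol xi^2 - tau^2, read as -tau^2 + 0*tau + xi^2.
wave_roots = real_roots_in_tau(-1.0, 0.0, xi**2)
# Laplacian symbol xi^2 + tau^2, read as tau^2 + 0*tau + xi^2.
laplace_roots = real_roots_in_tau(1.0, 0.0, xi**2)
print(wave_roots, laplace_roots)
```

The d’Alembertian has the m = 2 distinct real roots \tau = \pm 3, while the Laplacian symbol has none.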

To state the initial-value problem with initial data in the “initial-time slice” X_0, let v be a vector field such that v\phi = 1, so v points “forward in time”. The action of v is “differentiating with respect to time”. Note that this hypothesis prevents \phi from degenerating.

Theorem (solving the hyperbolic Cauchy problem). Let P be a hyperbolic operator of order m with smooth coefficients, Y a precompact open submanifold of X, and s \geq 0. Assume we are given an inhomogeneous term f \in H^s_{loc}(X_+) satisfying f|_{X_0} = 0 and initial data \psi_j \in H^{s + m - 1 - j}_{loc}(X_0), j < m. Then there is u \in H^{s + m - 1}_{loc}(X) supported in \overline X_+ such that Pu = f in X_+ \cap Y and v^ju = \psi_j in X_0 \cap Y.

The proof is in Chapter 23.2 of Hörmander. The idea is to first prove uniqueness of solutions. By compactness, we may cover Y with finitely many charts U which are isomorphic to open subsets of Minkowski spacetime in which level sets of \phi are spacelike hypersurfaces and orbits of v are worldlines. Since Minkowski spacetime has an honest-to-god time coordinate, the hyperbolicity hypothesis allows us to factor the principal symbol p into first-order factors, and hence factor P into pseudotransport operators on U, at least modulo a lower-order error. We may then apply the solution of the Cauchy problem for pseudotransport operators to solve the Cauchy problem for Pu = f in each chart U, and since there were only finitely many, uniqueness allows us to stitch the local solutions together into a global solution.

The proof outlined in the above paragraph is motivated by the special case when P is the d’Alembertian, which already appears in Chapter 2 of Evans. In that proof, one first observes that the Cauchy problem for the transport equation has an explicit solution. Then one reduces to the case that spacetime is two-dimensional, in which case there is an explicit factorization of P into transport operators, namely P = (\partial_x - \partial_t)(\partial_x + \partial_t).
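For the record, the factorization is a one-line computation, using only that mixed partial derivatives commute on smooth functions (or distributions):

```latex
\begin{aligned}
(\partial_x - \partial_t)(\partial_x + \partial_t)u
  &= \partial_x^2 u + \partial_x \partial_t u - \partial_t \partial_x u - \partial_t^2 u \\
  &= \partial_x^2 u - \partial_t^2 u .
\end{aligned}
```

On the symbol side this is just the factorization \xi^2 - \tau^2 = (\xi - \tau)(\xi + \tau), up to the overall sign convention for the symbol.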

Propagation of singularities, part I.

To study the propagation of singularities we need to recall some symplectic geometry. Let Q be a pseudodifferential operator on X and q its principal symbol. Then the Hamilton vector field H_q induces a flow on T^*X which preserves q.

Definition. The bicharacteristic flow of a pseudodifferential operator Q of principal symbol q is the flow of H_q on q^{-1}(0). A bicharacteristic of Q is an orbit of the bicharacteristic flow.

The intuition for the bicharacteristic flow is that its projection to X is “lightlike”, at least if Q is the d’Alembertian.
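Here is a small sketch of that intuition for q(\xi, \tau) = \xi^2 - \tau^2 on \mathbb R^2. It assumes the sign convention H_q = (\partial_\xi q, \partial_\tau q, -\partial_x q, -\partial_t q) in the coordinates (x, t, \xi, \tau); other conventions only reverse the direction of the flow. Since q does not depend on (x, t), the momenta are constant and the flow is a straight line, so a single explicit step is exact.

```python
def symbol(xi, tau):
    return xi**2 - tau**2

def hamilton_field(xi, tau):
    # (dx/ds, dt/ds, dxi/ds, dtau/ds) for q = xi^2 - tau^2.
    return (2 * xi, -2 * tau, 0.0, 0.0)

def flow(point, s):
    # The field is constant along each trajectory, so this step is exact.
    x, t, xi, tau = point
    dx, dt, dxi, dtau = hamilton_field(xi, tau)
    return (x + s * dx, t + s * dt, xi + s * dxi, tau + s * dtau)

start = (0.0, 0.0, 1.0, -1.0)   # lies on the characteristic variety q = 0
end = flow(start, 0.75)
print(end, symbol(end[2], end[3]))
```

The trajectory stays on q = 0, and its projection to spacetime moves by the same amount in x as in t, that is, along a light ray.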

Theorem (Hörmander’s propagation of singularities). Let P be a pseudodifferential operator of order m such that the Schwartz kernel of P has proper support, and the principal symbol of P is real. Then for every distribution u with Pu = f, WF(u) – WF(f) is invariant under the bicharacteristic flow of P.

By definition of the wavefront set, for every distribution u, WF(u) – WF(Qu) is contained in Char Q. But if Q is a differential operator, then Char Q is exactly the “characteristic variety” q^{-1}(0), which is exactly the variety where the bicharacteristic flow of Q is defined. Therefore we can ask that WF(u) – WF(Qu) be invariant under the bicharacteristic flow.

If P is a hyperbolic operator of principal symbol p, then the solutions \tau of the equation p(x, \xi + \tau d\phi(x)) = 0 are all real and distinct, and modulo lower-order terms this can be used to enforce that the coefficients of p are real. We phrase this more simply by saying that the principal symbol of every hyperbolic operator is real.

A partial converse to the reality of principal symbols of hyperbolic operators holds. If Q is a differential operator, then its principal symbol q is a homogeneous polynomial on each cotangent space. Fixing a particular cotangent space, we can write q(\xi) = \sum_\alpha c_\alpha \xi^\alpha where \alpha ranges over all multiindices of order m and c_\alpha \in \mathbb R. In order that the characteristic variety of Q have more than one real point, there must be some c_\alpha positive and some negative. But this is exactly the situation of the d’Alembertian, whose principal symbol is q(\xi, \tau) = \xi^2 - \tau^2.

Thus, while the propagation of singularities theorem only assumes that the principal symbol is real, if the operator P is (for example) elliptic or parabolic, then the conclusion of the theorem is degenerate in the sense that the characteristic variety only has a single real point, so that WF(u) – WF(f) is invariant under EVERY group action on the characteristic variety, not just the bicharacteristic flow.

The interpretation of the propagation of singularities theorem is that P is something like the d’Alembertian, in which case p is something like a Lorentzian metric. The bicharacteristic flow is a flow on the characteristic bundle, which is the space whose points (x, \xi) consist of a position x and a lightlike momentum \xi. Therefore the projection of any bicharacteristic to X consists of a worldline. Thus, if the initial data is something like a Dirac mass at x, then the Dirac mass travels along the worldline containing x.

To prove the propagation of singularities theorem, we need a propagation estimate. Recall that if A is a pseudodifferential operator, then WF(A) denotes the microsupport of A; that is, the complement of the largest conic set on which A has order -\infty.

Theorem (propagation estimate). Let U be an open conic set, and let A, B, B_1 \in \Psi^0(X). Let P be a pseudodifferential operator of real principal symbol p and order m.
For every N > 0 and s \in \mathbb R there is C > 0 such that for every distribution u and every inhomogeneous term f with Pu = f,

\displaystyle ||Au||_{H^{s+m-1}} \leq C||B_1 f||_{H^s} + C||Bu||_{H^{s+m-1}} + C||u||_{H^{-N}}

given that the following criteria are met:

  1. The projection of U is precompact in X.
  2. For every (x, \xi) \in U, if p(x, \xi) = 0, then H_p and the radial vector field \xi\partial_\xi are linearly independent at (x, \xi).
  3. WF(A) and WF(B) are contained in U, while WF(1 - B_1) \cap U = \emptyset.
  4. For every trajectory (x(t), \xi(t)) of H_p with (x(0), \xi(0)) \in WF(A), there is T < 0 such that for every T \leq t \leq 0, (x(t), \xi(t)) \in U and (x(T), \xi(T)) \in Ell(B).

The term C||u||_{H^{-N}} is an error term created by the use of pseudodifferential operators and is not interesting. The operator B_1 is a cutoff which microlocalizes the problem to a neighborhood of the conic set U. We are interested in WF(u) – WF(f), so we want WF(B_1) \cap WF(f) = \emptyset and B_1|U = 1. Actually, since we only care about the complement of WF(f), we might as well take f Schwartz, in which case we can take B_1 = 1 and simplify the propagation estimate to

\displaystyle ||Au||_{H^{s+m-1}} \leq C||f||_{H^s} + C||Bu||_{H^{s+m-1}} + \text{error terms}.

The interesting point here is the relationship between the operators A and B. We can optimize the propagation estimate by assuming that WF(B) = Ell B. This is because we really desperately want B to be elliptic on its microsupport, so that it does not introduce any new singularities. Under the assumption WF(B) = Ell B, B is a microlocalization to WF(B), and if (x, \xi) \in WF(A), then (x, \xi) got to WF(A) after passing through WF(B). The point is that if u has a singularity at (x, \xi) \in WF(A), then (if the regularity exponent s is taken large enough) ||Au||_{H^{s+m-1}} = \infty, but we assumed f Schwartz, so this implies ||Bu||_{H^{s+m-1}} = \infty, so that if we traveled back along the bicharacteristic flow (x(t), \xi(t)) from (x, \xi) for long enough, we would see that u already had a singularity at some time (x(T), \xi(T)) with T < 0.

Moreover, the propagation estimate is time-reversible in the sense that we can replace T < 0 with -T > 0. Thus the bicharacteristic flow neither creates nor destroys singularities in the distribution u. This readily implies the propagation of singularities theorem.

The proof of the propagation estimate is quite technical and this post is meant as more of a conceptual discussion, so I will omit it.

Topology and game design

Aside from being bad at math, I am also bad at Final Fantasy XIV. So it happened that, while attempting to be less bad at Final Fantasy XIV and better understanding an aspect of one of the game’s encounters, I actually became less bad at math, and now I wonder if game developers should incorporate more involved topology into their games’ design.

Final Fantasy XIV as it is.

Let me review how raiding on Final Fantasy XIV works, for those unfamiliar. A group of eight characters, each controlled by one player, fights one “boss” monster. The boss’ attacks are frequently lethal, so to avoid a game over, the players must avoid avoidable attacks. The game is designed so that if an attack can be avoided, it can only be avoided in a particularly precise, and often opaque, manner. Examples of this include deciphering lines of iambic poetry, executing intricate but scripted movement patterns, or interpreting the way to avoid a truly bizarre instant-kill attack using obscure tidbits from the game’s lore. And don’t let yourself get distracted by the head-banging soundtrack.

A recent boss, the Shadowkeeper, introduced in Futures Rewritten, has an attack known as Giga Slash which can be solved by thinking of it as inducing an orientation on the platform in which the battle takes place.

Let me remind the reader that an orientation of a curve (a one-dimensional space) is a choice of which direction is considered “right”; an orientation of a surface (a two-dimensional space) is a choice of which direction is considered “clockwise”; an orientation of a three-dimensional space is a choice of which coordinates are considered “right-handed”; and so on. [1]

Giga Slash involves Shadowkeeper drawing a sword, with which she then slashes either to her left or to her right, depending on which hand she draws the blade with. In particular, the attack will divide the platform into two rectangles, one of which is lethal to stand in and the other of which is not. What makes Giga Slash more interesting is that frequently Shadowkeeper’s “shadow” (a separate entity), or the player characters’ shadows, will be the origin of the attack instead.

In the former case, the party must either all stand to the left or to the right of the boss’s shadow, where “left” and “right” depend not on the player’s perception but on the direction the shadow faces. Thus the boss’s position induces a one-dimensional orientation on the platform, which is the orientation that one must use to resolve the attack, rather than the “natural” orientation given by the fact that there is a canonical choice of north, south, east, and west built into the game. Players often refer to the latter as the “absolute positions” and to the orientations given by boss positioning (and in this case, shadow position) as “relative positions”.

The fact that the party has to deal with “relative positions” is hardly unusual. What makes Shadowkeeper more unusual is the second case, wherein four characters’ shadows are each the origin of a copy of the attack. In that case, their shadow appears as a black blob which is always to the absolute north, south, east, or west of the character, no matter how the character moves. More abstractly, the shadow can be viewed as a unit vector which originates at the character, and is translated but otherwise not acted on by player movement (and also not acted on by anything else, for that matter). One player is assigned absolute north, one absolute west, et cetera.

The point is that the character shadows will always slash to their left if the boss is holding the sword in her left hand, and vice versa. The goal of the players is to aim the slashes in such a way that there is a safe rectangle. Many guides involve trickery with rotating the camera and so on to ensure that this happens, but there is a simple solution. The hand the boss is using is equivalent to a choice of orientation on the platform. If the boss raises her left hand, then the orientation is counterclockwise; otherwise it is clockwise. Now the players must all stand so that their shadow vector is tangent to the circle centered on the boss’s hitbox, of radius a little larger than the boss’s hitbox, and oriented according to the boss’s hand, and this will ensure that the interior of the boss’s hitbox is safe, as demonstrated here. Unfortunately, because right and counterclockwise are usually the “positive” orientations in mathematics, but here right is associated with clockwise, I still frequently do this trick incorrectly…😞
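The tangency rule can be turned into a lookup. The sketch below only encodes the geometry described above (shadow vector equal to the oriented unit tangent of a circle around the boss); it makes no claim about hitboxes or about what the game actually does. The coordinate system (boss at the origin, x pointing absolute east, y pointing absolute north), the radius, and all names are assumptions made for illustration.

```python
# Assumed coordinates: boss at the origin, x = absolute east, y = absolute north.
SHADOW_DIRECTIONS = {
    "north": (0, 1),
    "south": (0, -1),
    "east": (1, 0),
    "west": (-1, 0),
}

def standing_spot(shadow, boss_hand, radius):
    """Point on the circle where the shadow vector is the oriented tangent."""
    tx, ty = SHADOW_DIRECTIONS[shadow]
    if boss_hand == "left":
        # Counterclockwise: the tangent at (cos a, sin a) is (-sin a, cos a).
        return (radius * ty, radius * -tx)
    if boss_hand == "right":
        # Clockwise: the tangent at (cos a, sin a) is (sin a, -cos a).
        return (radius * -ty, radius * tx)
    raise ValueError("boss_hand must be 'left' or 'right'")

left_hand_spots = {d: standing_spot(d, "left", 2.0) for d in SHADOW_DIRECTIONS}
print(left_hand_spots)
```

With the left hand raised, the player whose shadow points north stands due east of the boss, the one whose shadow points east stands due south, and so on around the circle; with the right hand raised every spot is mirrored through the boss.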

Whether the tactic I just outlined is easier or more difficult to execute than just manipulating the camera angle, it demystified how choices of orientation “look” in practice to me. Unfortunately, while topology is my weakest area of math, one needs to choose an orientation in order to define curl [2] and so I wasted a lot of time trying to understand what the vorticity equation was actually trying to say, until Shadowkeeper cleared things up for me.

Final Fantasy XIV as it could be.

In Final Fantasy XIV, most platforms have a very simple geometry, either being a square or a circle. When there have been exceptions, the players have often exploited the geometry to avoid attacks in ways the developers did not intend to be possible, causing the developers to shy away from introducing any nontrivial geometry or topology into the fights. But the above discussion got me thinking: what if we fought a boss on a nonorientable surface, such as a Möbius strip? [3] Along with my friend Greg DeFillippo (who was the mastermind behind some of the below proposed attacks), I have tried to find out.

The first obstacle is determining which direction gravity faces. For this model, I think it’s reasonable that the gravity always face towards the Möbius strip, a la Super Mario Galaxy. However, one could also have a mechanic which reverses the flow of gravity at the whim of one of the players; then if the players are “above” the strip they need the gravity to point downwards, and if they are “below” the strip they need it to point upwards.

One simple attack could consist of a blade that sweeps across the Möbius strip, killing anyone it touches; the only way to dodge it is to simply jump to the other side of the strip. Since the Möbius strip only has one side, the blade eventually sweeps over the entire strip, forcing everyone to dodge twice.

A more interesting example requires the use of a mechanic commonly seen in Final Fantasy XIV known as “proximity”. Proximity requires that two characters be sufficiently far from each other when the attack completes, or they will die. However, since the attack is on a Möbius strip, if the players run too far they will end up close to each other, in spite of how far they have run. (This attack could also be done on a torus, as in Pac-Man, so it does not use nonorientability, but it does use the existence of a nontrivial topology.)
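To make the proximity trap concrete, here is a minimal sketch of the distance on a flat model of the Möbius strip, assuming the strip is the band 0 \leq x < length with the edge x = length glued to x = 0 by (x + length, y) \sim (x, -y). The flat model, the function name and the sample coordinates are assumptions; an in-game strip would of course be curved.

```python
import math

def moebius_distance(p, q, length):
    """Distance on a flat Möbius band where (x + length, y) is glued to (x, -y).

    Assumes both x-coordinates lie in [0, length), so only the copies of q
    shifted by -1, 0 or 1 periods can be closest to p.
    """
    (x1, y1), (x2, y2) = p, q
    best = math.inf
    for k in (-1, 0, 1):
        flip = -1 if k % 2 else 1      # an odd number of trips flips y
        best = min(best, math.hypot(x2 + k * length - x1, flip * y2 - y1))
    return best

player_a = (1.0, 0.5)
player_b = (9.0, -0.5)
naive = math.hypot(player_b[0] - player_a[0], player_b[1] - player_a[1])
actual = moebius_distance(player_a, player_b, 10.0)
print(naive, actual)
```

The two players look about 8 units apart if one ignores the gluing, but going around the strip they are only 2 units apart.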

Another example uses a mechanic known as “forced march”. This assigns an arrow to each character which causes them to run in that direction relative to the direction they are facing at the start of the attack. For example, a character that is facing towards true north and assigned a right arrow will run east at the start of the attack. The goal is for the player to face their character in a direction so that they avoid the (possibly several) other attacks that go out at the same time as the forced march. This requires the player to think about orientation; but this becomes much harder to do when the platform itself is nonorientable! For example, if the arrow faced right and the character faced true north to avoid an attack to the west, the character would run east, but then find themself in the west, exactly where they did not want to be.

The forced march can be modified to more strongly use nonorientability. One can locally define what it means to be clockwise, say on the top of the Möbius strip, and this will contradict what it means to be clockwise on the bottom of the strip. If the forced march, instead of a straight line, forced characters to run in a circle, the forced march would have two different effects if the character was on the top or the bottom of the strip. (If the character was on the side of the strip, either the attack would have to kill them instantly, or simply be completely unpredictable.)

I’d love to see other examples of mechanics that are designed for platforms with nontrivial topology. If you can cook up any particularly cruel examples, post them in the comments below 😈

Technical notes.

[1] More abstractly, recall that if A \in GL(d) is a d \times d invertible matrix, then the determinant of A is either positive, in which case we say it is orientation-preserving, or negative, in which case we say it is orientation-reversing. A change of coordinates is said to be orientation-preserving (resp. reversing) if its Jacobian matrix is orientation-preserving (resp. reversing). Thus on an orientable manifold there exist two possible orientations: in the low-dimensional cases, right and left, clockwise and counterclockwise, and right-handed and left-handed.
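In the plane this is easy to play with. The following sketch (2 \times 2 matrices only, with made-up helper names) classifies a rotation and a mirror reflection by the sign of the determinant, and checks that two reflections compose to something orientation-preserving.

```python
def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c

def matmul2(m, n):
    # Product of two 2x2 matrices given as nested tuples.
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def orientation(m):
    d = det2(m)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return "preserving" if d > 0 else "reversing"

quarter_turn = ((0, -1), (1, 0))   # rotate by 90 degrees counterclockwise
mirror = ((-1, 0), (0, 1))         # reflect across the vertical axis

print(orientation(quarter_turn))
print(orientation(mirror))
print(orientation(matmul2(mirror, mirror)))
```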

[2] The curl of a vector field V is by definition the Hodge dual of the derivative of the Hodge dual of V, and Hodge duality is only defined up to a choice of orientation. A much more concrete definition of curl is to first declare that if V is a vector field on a surface, then the curl of V is the angular momentum of a unit mass particle whose velocity field is V, and then if V is a vector field on a three-dimensional space, then the curl of V in the direction of a unit tangent vector e is the curl of V in the plane e^\perp. The trouble is that angular momentum of a particle rotating in positive orientation is by definition positive, so one first needs to decide what one means by positive orientation.

[3] Originally I wanted to do this on a Klein bottle but could not determine how to depict raids on a surface that does not embed in three-dimensional space.

What I want to learn, Spring 2021

As much for my own future reference as for anything, here’s a summary of some things I’d like to learn, maybe not this season, but soon.

First on the docket, I’d like to learn Vasy’s method. This is a technique for meromorphically continuing the resolvent of the Schrödinger operator on an asymptotically hyperbolic manifold, that is, a manifold which, near its boundary, looks like the Poincaré model of hyperbolic space does near its boundary. A priori the definition of the resolvent only makes sense on a small open subset of the complex plane, and one hopes to show that the definition of the resolvent makes sense on the entire plane, except possibly a discrete set of poles.

On a somewhat similar note, I’d like to learn the Atiyah-Singer index theorem. This theorem equates the Fredholm index of an elliptic pseudodifferential operator on a line bundle L to its “topological index”, which is a rational number defined in terms of the cohomology of L. This is largely motivated by my quest to understand the sense in which cohomology counts solutions to PDE, cf. my recent post on the genera of Riemann surfaces. I previously tried to learn the heat-kernel proof of Atiyah-Singer shortly after I first learned about pseudodifferential operators but got nowhere. This time, I will be armed with the knowledge of the Riemann-Roch theorem, which may make all the difference.

Unlike the previous two requests, which are both PDE-analytic in nature, I think that my knowledge of complex analysis has prepared me to learn the proof that there are twenty-seven lines on a cubic surface in \mathbb P^3. This would entirely be for fun, and I may blog about it, so as to tell the story of a hapless analyst faffing around hopelessly in deep algebra.

Finally, I would like to fix up and publicize the Sage code that is mentioned by my paper on computation of Kac-Moody root multiplicities with Joshua Lin and Peter Connick. I suspect that this will require learning some nontrivial representation theory and complexity theory, though in its current form the algorithm is essentially a consequence of elementary facts about quadratic forms over \mathbb Z.

Elliptic regularity implies that compact genera are finite

A few years ago I took a PDE course. We were learning about something to do with elliptic pseudodifferential operators and the speaker drew a commutative diagram on the board and said, “You see, this comes from a short exact sequence –” and the whole room started laughing in discomfort. The speaker then remarked that Craig Evans himself would ban him from teaching analysis if word of the incident ever leaked, which might have something to do with why I have not disclosed the speaker’s name 🥵

Before recently, I found topology to be quite a scary area of math. It is still very much my weakest suit, but I should like to have some amount of competency with it. I have since come around to the viewpoint that cohomology is just a clever gadget for counting solutions of PDE. This has made the pill a little easier to swallow, and makes the previous anecdote all the more awkward.

As part of my ventures into trying to learn topology, in this post I will give a proof that the genus of any compact Riemann surface is finite. I am confident that this proof is not original, because it’s sort of the obvious proof if an analyst trying to prove this fact just followed their nose, but it seems a lot more natural to me than the proof in Forster, so let’s do this.

[Since the time of writing, I have made some corrections to incorrect or confusing statements. Thanks to Sarah Griffith for pointing these out!]

Let us start with some generalities. Fix a compact Riemann surface {X}, references to which we will suppress when possible. Let

\displaystyle 0 \rightarrow A \rightarrow B \rightarrow C \rightarrow 0

be a short exact sequence of sheaves. In our case, the sheaves will be sheaves of Fréchet spaces on {X}, which might not be homologically kosher, but that won’t cause any real issues. Then we get a long exact sequence in cohomology

\displaystyle 0 \rightarrow H^0(A) \rightarrow H^0(B) \rightarrow H^0(C) \rightarrow H^1(A) \rightarrow H^1(B) \rightarrow H^1(C) \rightarrow \cdots.

If B is a fine sheaf, i.e. it has partitions of unity subordinate to every open cover, then {H^1(B) = 0} and the long exact sequence collapses to the exact sequence

\displaystyle 0 \rightarrow H^0(A) \rightarrow B(X) \rightarrow C(X) \rightarrow H^1(A) \rightarrow 0.

In particular, the morphism of sheaves {B \rightarrow C} induces a bounded linear map {T: B(X) \rightarrow C(X)} such that {H^0(A)} is the kernel of {T} and {H^1(A)} is the cokernel of {T}. Now, if {T} is a Fredholm operator, then its index {k} satisfies

\displaystyle k = \text{dim } H^0(A) - \text{dim } H^1(A).
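As a finite-dimensional sanity check on the index formula (a toy illustration, not part of the argument): for a linear map {T: \mathbf Q^n \rightarrow \mathbf Q^m} given by a matrix, rank-nullity forces the dimension of the kernel minus the dimension of the cokernel to equal {n - m}, whatever the entries are. The helper names `rank` and `index` below are my own.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a rational matrix by Gaussian elimination over Fraction."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        # find a row at or below r with a nonzero entry in this column
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def index(matrix):
    """dim ker T - dim coker T for T: Q^n -> Q^m (m rows, n columns)."""
    m, n = len(matrix), len(matrix[0])
    rk = rank(matrix)
    return (n - rk) - (m - rk)

T = [[1, 2, 3],
     [2, 4, 6]]  # a rank-one map Q^3 -> Q^2
print(rank(T), index(T))
```

In infinite dimensions the index is no longer determined by the spaces alone, which is why the Fredholm property of {T} has to be proven separately.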

Let {\mathcal O} denote the sheaf of holomorphic functions on {X} and {\overline \partial} the Cauchy-Riemann operator. Let {\mathcal E} denote the sheaf of smooth functions on {X}; since {X} has enough partitions of unity, {\mathcal E} is a fine sheaf. The maps {\overline \partial: \mathcal E(U) \rightarrow \mathcal E(U)}, for {U \subseteq X} open, induce a short exact sequence of sheaves of Fréchet spaces

\displaystyle 0 \rightarrow \mathcal O \rightarrow \mathcal E \rightarrow \mathcal E \rightarrow 0

and hence an exact sequence in cohomology

\displaystyle 0 \rightarrow \mathbf C \rightarrow \mathcal E(X) \rightarrow \mathcal E(X) \rightarrow H^1(\mathcal O) \rightarrow 0.

Here we used Liouville’s theorem. On the other hand, the dimension of {H^1(\mathcal O)} is by definition the genus {g} of {X}. Therefore, if {k} is the Fredholm index of {\overline \partial}, then

\displaystyle g = 1 - k.

It remains to show that {k} is well-defined and finite; that is, {\overline \partial} is Fredholm. This is a standard elliptic regularity argument, which I will now recall. We first fix a volume form {dV} on {X}, which exists since {X} is an orientable surface. This induces an {L^2} norm on {X}, namely

\displaystyle ||u||_{L^2}^2 = \int_X |u|^2 ~dV.

Unfortunately the usual Sobolev notation {H^s} clashes with the notation for cohomology, so let me use {W^s} to denote the completion of {\mathcal E} under the norm

\displaystyle ||u||_s = \sum_{|\alpha| \leq s} ||\partial^\alpha u||_{L^2}

where {\alpha} ranges over multiindices. Then {W^0 = L^2} and {\overline \partial} maps {W^1 \rightarrow W^0}. The kernel of {\overline \partial} is finite-dimensional (since it is isomorphic to {\mathbf C}, by Liouville’s theorem and Weyl’s lemma), so to deduce that {\overline \partial} is Fredholm as an operator {W^1 \rightarrow W^0} it suffices to show that the cokernel of {\overline \partial} is finite-dimensional.

We first claim the elliptic regularity estimate

\displaystyle ||u||_1 \leq C ||f||_0 + C ||u||_0

for any smooth functions u,f which satisfy {\overline \partial}u = f. By definition of the Sobolev norm, we have

\displaystyle ||u||_1 = ||u||_0 + ||u'||_0 + ||f||_0.

Without loss of generality, we may assume that {u} is smooth. Then we can write {u = v + w} where {v} and {\overline w} are holomorphic. In particular, {u' = v'} and {f = \overline \partial w}, so

\displaystyle ||u||_1 = ||u||_0 + ||v'||_0 + ||f||_0.

The only troublesome term here is {v'}. Taking a Cauchy estimate, we see that

\displaystyle |v'(z)| \leq C||v||_{L^\infty} \leq C||v||_{L^2} = C||v||_0.

But {X} is compact, so has finite volume; therefore

\displaystyle ||v'||_0 = ||v'||_{L^2} \leq C||v||_{L^\infty} \leq C||v||_0 \leq C||u||_0.

This gives the desired bound.

Let {u_n} be a sequence in {W^1} with {f_n = \overline \partial u_n \in W^0}, and assume that the {f_n} are Cauchy in {W^0}. Without loss of generality we may assume that {u_n \in K^\perp} where {K} is the kernel of {\overline \partial}. If the {u_n} are not bounded in {W^1}, we may replace them with {u_n/||u_n||_1}, and thus assume that they are in fact bounded. By the Rellich-Kondrachov theorem (which says that the natural map {W^1 \rightarrow W^0} is compact), we may therefore assume that the {u_n} are Cauchy in {W^0}. But then

\displaystyle ||u_n - u_m||_1 \leq C ||f_n - f_m||_0 + C ||u_n - u_m||_0

so the {u_n} are Cauchy in {W^1}. Therefore the {u_n} converge in {K^\perp}, hence the {f_n} converge in the image {Z} of {\overline \partial}, since {\overline \partial} gives an isomorphism {K^\perp \rightarrow Z}. Therefore {Z} is closed.

If one applies integration by parts to {\overline \partial}, the fact that X has no boundary implies that for any f,g,

\displaystyle \langle \overline \partial f, g\rangle = \int_X \overline \partial f \overline g ~dV = -\int_X f \overline{\partial g} ~dV = -\langle f, g'\rangle

and thus \overline \partial^* = -\partial. Since Z is closed, the dual of the cokernel of {\overline \partial} is the kernel L of -\partial; by the Rellich-Kondrachov theorem, the unit ball of L is compact and therefore L is finite-dimensional. By the Hahn-Banach theorem, this implies that the cokernel of {\overline \partial} is finite-dimensional. Therefore {k}, and hence {g}, is finite.

A PDE-analytic proof of the fundamental theorem of algebra

The fundamental theorem of algebra is one of the most important theorems in mathematics, being core to algebraic geometry and complex analysis. Unraveling the definitions, it says:

Fundamental theorem of algebra. Let f be a polynomial over \mathbf C of degree d. Then the equation f(z) = 0 has d solutions z, counting multiplicity.

Famously, most proofs of the fundamental theorem of algebra are complex-analytic in nature. Indeed, complex analysis is the natural arena for such a theorem to be proven. One has to use the fact that \mathbf R is a real closed field, but since there are lots of real closed fields, one usually defines \mathbf R in a fundamentally analytic way and then proves the intermediate value theorem, which shows that \mathbf R is a real closed field. One can then proceed by tricky algebraic arguments (using, e.g. Galois or Sylow theory), or appeal to a high-powered theorem of complex analysis. Since the fundamental theorem is really a theorem about algebraic geometry, and complex analysis sits somewhere between algebraic geometry and PDE analysis in the landscape of mathematics (and we need some kind of analysis to get the job done; purely algebro-geometric methods will not be able to distinguish \mathbf R from another field K such that -1 does not have a square root in K) it makes a lot of sense to use complex analysis.

But, since complex analysis sits between algebraic geometry and PDE analysis, why not abandon all pretense of respectability (that is to say, algebra; analysis is not a field worthy of the respect of a refined mathematician) and give a PDE-analytic proof? Of course, this proof will end up “looking like” multiple complex-analytic proofs, and indeed it is basically the proof by Liouville’s theorem dressed up in a trenchcoat (and in fact, gives Liouville’s theorem, and probably some other complex-analytic results, as a byproduct). In a certain sense (effectiveness) this proof is strictly inferior to the proof by the argument principle, and in another certain sense (respectability) this proof is strictly inferior to algebraic proofs. However, it does have the advantage of being easy to teach to people working in very applied fields, since it uses only the machinery of PDE analysis, rather than fancy results such as Liouville’s theorem or the Galois correspondence.

The proof
By induction, it suffices to prove that if f is a polynomial with no zeroes, then f is constant. So suppose that f has no zeroes, and introduce g(z) = 1/f(z). As usual, we want to show that g is constant.

Since f is a polynomial, it does not decay at infinity, so g(\infty) is finite. Therefore g can instead be viewed as a function on the sphere, g: S^2 \to \mathbf C, by stereographic projection. Also by stereographic projection, one can cover the sphere by two copies of \mathbf R^2, one centered at the south pole that misses only the north pole, and one centered at the north pole that only misses the south pole. Thus one can define the Laplacian, \Delta = \partial_x^2 + \partial_y^2, in each of these coordinates; it remains well-defined on the overlaps of the charts, so \Delta is well-defined on all of S^2. (In fancy terminology, which may help people who already know ten different proofs of the fundamental theorem of algebra but will not enlighten anyone else, we view S^2 as a Riemannian manifold under the pushforward metric obtained by stereographic projection, and consider the Laplace-Beltrami operator of S^2.)

Recall that a function u is called harmonic provided that \Delta u = 0. We claim that g is harmonic. The easiest way to see this is to factor \Delta = 4\partial\overline \partial where 2\partial = \partial_x - i\partial_y. Then \overline \partial u = 0 exactly if u has a complex derivative, by the Cauchy-Riemann equations. There are other ways to see this, too, such as using the mean-value property of harmonic functions and computing the antiderivative of g. In any case, the proof is just calculus.

So g is a harmonic function on the compact connected manifold S^2; by the extreme value theorem, g has (or more precisely, its real and imaginary parts have) a maximum. By the maximum principle of harmonic functions (which is really just the second derivative test — being harmonic generalizes the notion of having zero second derivative), it follows that g is equal to its maximum, so is constant. (In fancy terminology, we view g as the canonical representative of the zeroth de Rham cohomology class of S^2 using the Hodge theorem.)
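The harmonicity claim is easy to test numerically. The sketch below is only an illustration under my own choices: since a nonconstant polynomial does have zeroes, it takes the sample polynomial f(z) = z^2 + 1 and checks that g = 1/f has vanishing five-point discrete Laplacian at a point away from the zeroes \pm i, and compares with |z|^2, whose Laplacian is 4. The function names and the step size are assumptions of the sketch.

```python
def g(z):
    """Reciprocal of the sample polynomial f(z) = z^2 + 1 (my choice)."""
    return 1 / (z * z + 1)

def discrete_laplacian(u, z, h=1e-3):
    """Five-point finite-difference approximation of (d_x^2 + d_y^2) u at z."""
    return (u(z + h) + u(z - h) + u(z + 1j * h) + u(z - 1j * h) - 4 * u(z)) / (h * h)

z0 = 0.3 + 0.2j  # a point away from the zeroes +-i of f

lap_g = discrete_laplacian(g, z0)                          # should be close to 0
lap_sq = discrete_laplacian(lambda z: abs(z) ** 2, z0)     # should be close to 4

print(abs(lap_g), lap_sq)
```

Both the real and imaginary parts of the first value are small, which is the statement that the real and imaginary parts of g are separately harmonic.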

Let’s Read: Sendov’s conjecture in high degree, part 4: details of case one

In this post we (finally!) finish the proof of case one.

As usual, we throughout fix a nonstandard natural {n} and a monic complex polynomial {f} of degree {n} whose zeroes are all in {\overline{D(0, 1)}}. We assume that {a} is a zero of {f} whose standard part is {1}, and assume that {f} has no critical points in {\overline{D(a, 1)}}. Let {\lambda} be a random zero of {f} and {\zeta} a random critical point. Under these circumstances, {\lambda^{(\infty)}} is uniformly distributed on {\partial D(0, 1)} and {\zeta^{(\infty)}} is almost surely zero. In particular,

\displaystyle \mathbf E \log\frac{1}{|\lambda|}, \mathbf E \log |\zeta - a| = O(n^{-1})

and {\zeta} is infinitesimal in probability, hence infinitesimal in distribution. Let {\mu} be the expected value of {\zeta} (thus also of {\lambda}) and {\sigma^2} its variance. I think we won’t need the nonstandard-exponential bound {\varepsilon_0^n} this time, as its purpose was fulfilled last time.

Last time we reduced the proof of case one to a sequence of lemmata. We now prove them.

1. Preliminary bounds

Lemma 1 Let {K \subseteq \mathbf C} be a compact set. Then

\displaystyle f(z) - f(0), ~f'(z) = O((|z| + o(1))^n)

uniformly for {z \in K}.

Proof: It suffices to prove this for a compact exhaustion, and thus it suffices to assume

\displaystyle K = \overline{D(0, R)}.

By underspill, it suffices to show that for every standard {\varepsilon > 0} we have

\displaystyle |f(z) - f(0)|, ~|f'(z)| \leq C(|z| + \varepsilon)^n.

We first give the proof for {f'}.

First suppose that {\varepsilon < |z| \leq R}. Since {\zeta} is infinitesimal in distribution,

\displaystyle \mathbf E \log |z - \zeta| \leq \mathbf E \log \max(|z - \zeta|, \varepsilon/2) \leq \log \max(|z|, \varepsilon/2) + o(1);

here we need the {\varepsilon/2} and the {R} since {\log |z - \zeta|} is not a bounded continuous function of {\zeta}. Since {\varepsilon < |z|} we have

\displaystyle \mathbf E \log |z - \zeta| \leq \log |z| + o(1)

but we know that

\displaystyle \frac{\log n}{n - 1} - \frac{1}{n - 1} \log |f'(z)| = U_\zeta(z) = -\mathbf E \log |z - \zeta|

so, solving for {\log |f'(z)|}, we get

\displaystyle \log |f'(z)| \leq (n - 1) \log |z| + o(n);

we absorbed a {\log n} into the {o(n)}. That gives

\displaystyle |f'(z)| \leq e^{o(n)} |z|^{n-1}.

Since {f'} is a polynomial of degree {n - 1} and {f} is monic (so the top coefficient of {f'} is {n}) this gives a bound

\displaystyle |f'(z)| \leq e^{o(n)} (|z| + \varepsilon)^{n - 1}

even for {|z| \leq \varepsilon}.

Now for {f}, we use the bound

\displaystyle |f(z) - f(0)| \leq |z| \max_{|w| \leq |z|} |f'(w)|

to transfer the above argument. \Box
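The link between {\log |f'|} and the logarithmic potential of {\zeta} used in this proof comes from factoring {f'(z) = n \prod_j (z - \zeta_j)} for monic {f}, which gives {\log |f'(z)| = \log n + (n - 1) \mathbf E \log |z - \zeta|}. Here is a small numerical check of that identity. The polynomial {f(z) = z^n - nz} is my own choice, made because its critical points are explicit; it does not satisfy the hypotheses on the zeroes of {f}, and is only meant to illustrate the algebra.

```python
import cmath
import math

n = 7  # degree of the sample polynomial (arbitrary choice)

def f_prime(z):
    """Derivative of the monic polynomial f(z) = z^n - n z."""
    return n * (z ** (n - 1) - 1)

# Critical points of f: the (n-1)th roots of unity.
critical_points = [cmath.exp(2j * math.pi * k / (n - 1)) for k in range(n - 1)]

def mean_log_distance(z):
    """E log|z - zeta| for zeta uniform over the critical points."""
    return sum(math.log(abs(z - c)) for c in critical_points) / len(critical_points)

z = 0.4 - 1.3j
lhs = math.log(abs(f_prime(z)))
rhs = math.log(n) + (n - 1) * mean_log_distance(z)
print(lhs, rhs)
```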

2. Uniform convergence of {\zeta}

Lemma 2 There is a standard compact set {S \subseteq \overline{D(0, 1)}} and a standard countable set {T \subseteq \overline{D(0, 1)} \setminus \overline{D(1, 1)}} such that

\displaystyle S = (\overline{D(0, 1)} \cap \partial D(1, 1)) \cup T,

all elements of {T} are isolated in {S}, and {||\zeta - S||_{L^\infty}} is infinitesimal.

Tao claims

\displaystyle \mathbf P(|\zeta - a| \geq \frac{1}{2m}) = O(n^{-1})

where {m} is a large standard natural, which makes no sense since the left-hand side should be large (and in particular, have positive standard part). I think this is just a typo though.

Proof: Since {\zeta} was assumed far from {a = 1 - o(1)} we have

\displaystyle \zeta \in \overline{D(0, 1)} \setminus D(1, 1 - o(1)).

We also have

\displaystyle \mathbf E \log |\zeta - a| = O(n^{-1})

so for every standard natural {m} there is a standard natural {k_m} such that

\displaystyle \mathbf P(\log |\zeta - a| \geq \frac{1}{2m}) \leq \frac{k_m}{n}.

Multiplying both sides by {n} we see that

\displaystyle \text{card } Z \cap K_m = \text{card } Z \cap \{\zeta_0 \in \overline{D(0, 1)}: \log |\zeta_0 - a| \geq \frac{1}{2m}\} \leq k_m

where {Z} is the variety of critical points {f' = 0}. Let {T_m} be the set of standard parts of zeroes in {K_m}; then {T_m} has cardinality {\leq k_m} and so is finite. For every zero {\zeta_0 \in Z}, either

  1. For every {m},

    \displaystyle |\zeta_0 - a| < \exp\left(\frac{1}{2m}\right)

    so the standard part of {|\zeta_0 - a|} is {1}, or

  2. There is an {m} such that {d(\zeta_0, T_m)} is infinitesimal.

So we may set {T = \bigcup_m T_m}; then {T} is standard and countable, and does not converge to a point in {\partial D(1, 1)}, so {S} is standard and {||\zeta - S||_{L^\infty}} is infinitesimal.

I was a little stumped on why {S} is compact; Tao doesn’t prove this. It turns out it’s obvious, I was just too clueless to see it. The construction of {T} forces that for any {\varepsilon > 0}, there are only finitely many {z \in T} with {|z - \partial D(1, 1)| \geq \varepsilon}, so if {T} clusters anywhere, then it can only cluster on {\partial D(1, 1)}. This gives the desired compactness. \Box

The above proof is basically just the proof of Ascoli’s compactness theorem adapted to this setting and rephrased to replace the diagonal argument (or 👏 KEEP 👏 PASSING 👏 TO 👏 SUBSEQUENCES 👏) with the choice of a nonstandard natural. I think the point is that, once we have chosen a nontrivial ultrafilter on {\mathbf N}, a nonstandard function is the same thing as a sequence of functions, and the ultrafilter tells us which subsequences of reals to pass to.

3. Approximating {f,f'} outside of {S}

We break up the approximation lemma into multiple parts. Let {K} be a standard compact set which does not meet {S}. Given a curve {\gamma} we denote its arc length by {|\gamma|}; we always assume that an arc length does exist.

A point which stumped me for a humiliatingly long time is the following:

Lemma 3 Let {z, w \in K}. Then there is a curve {\gamma} from {z} to {w} which misses {S} and satisfies the uniform estimate

\displaystyle |z - w| \sim |\gamma|.

Proof: We use the decomposition of {S} into the arc

\displaystyle S_0 = \partial D(1, 1) \cap \overline{D(0, 1)}

and the discrete set {T}. We try to set {\gamma} to be the line segment {[z, w]} but there are two things that could go wrong. If {[z, w]} hits a point of {T} we can just perturb it slightly by an error which is negligible compared to {[z, w]}. Otherwise we might hit a point of {S_0} in which case we need to go the long way around. However, {S_0} and {K} are compact, so we have a uniform bound

\displaystyle \max(\frac{1}{|z - S_0|}, \frac{1}{|w - S_0|}) = O(1).

Therefore we can instead consider a curve {\gamma} which goes all the way around {S_0}, leaving {D(0, 1)}. This curve has length {O(1)} for {z, w} close to {S_0} (and if {z, w} are far from {S_0} we can just perturb a line segment without generating too much error). Using our uniform max bound above we see that this choice of {\gamma} is valid. \Box

Recall that the moments {\mu,\sigma} of {\zeta} are infinitesimal.

Since {||\zeta - S||_{L^\infty}} is infinitesimal, and {K} is a positive distance from any infinitesimals (since it is standard compact), we have

\displaystyle |z - \zeta|, |z - \mu| \sim 1

uniformly in {z}. Therefore {f} has no critical points near {K} and so {f''/f'} is holomorphic on {K}.

We first need a version of the fundamental theorem.

Lemma 4 Let {\gamma} be a contour in {K} of length {|\gamma|}. Then

\displaystyle f'(\gamma(1)) = f'(\gamma(0)) \left(\frac{\gamma(1) - \mu}{\gamma(0) - \mu}\right)^{n - 1} e^{O(n) |\gamma| \sigma^2}.

Proof: Our bounds on {|z - \zeta|} imply that we can take the Taylor expansion

\displaystyle \frac{1}{z - \zeta} = \frac{1}{z - \mu} + \frac{\zeta - \mu}{(z - \mu)^2} + O(|\zeta - \mu|^2)

of {\zeta} in terms of {\mu}, which is uniform in {\zeta}. Taking expectations preserves the constant term (since it doesn’t depend on {\zeta}), kills the linear term, and replaces the quadratic term with a {\sigma^2}, thus

\displaystyle s_\zeta(z) = \frac{1}{z - \mu} + O(\sigma^2).

At the start of this series we showed

\displaystyle f'(\gamma(1)) = f'(\gamma(0)) \exp\left((n-1)\int_\gamma s_\zeta(z) ~dz\right).

Plugging in the Taylor expansion of {s_\zeta} we get

\displaystyle f'(\gamma(1)) = f'(\gamma(0)) \exp\left((n-1)\int_\gamma \frac{dz}{z - \mu}\right) e^{O(n) |\gamma| \sigma^2}.

Simplifying the integral we get

\displaystyle \exp\left((n-1)\int_\gamma \frac{dz}{z - \mu}\right) = \left(\frac{\gamma(1) - \mu}{\gamma(0) - \mu}\right)^{n - 1}

whence the claim. \Box
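The last simplification in this proof, that exponentiating {(n-1)} times the contour integral of {1/(z - \mu)} gives the ratio {((\gamma(1) - \mu)/(\gamma(0) - \mu))^{n-1}}, can be checked numerically. In the sketch below the contour is a straight segment, the integral is computed by Simpson's rule, and the values of `mu`, `start`, `end` and `n` are arbitrary stand-ins of mine (in particular `mu` is merely small, not infinitesimal).

```python
import cmath

def segment_integral(func, z0, z1, steps=2000):
    """Composite Simpson rule for the integral of func along the segment [z0, z1]."""
    h = (z1 - z0) / steps  # complex step; steps must be even
    total = func(z0) + func(z1)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * func(z0 + k * h)
    return total * h / 3

mu = 0.01 + 0.02j                 # stand-in for the mean of zeta
start, end = 2 + 1j, 1.5 - 0.5j   # endpoints gamma(0), gamma(1)
n = 12

integral = segment_integral(lambda z: 1 / (z - mu), start, end)
lhs = cmath.exp((n - 1) * integral)
rhs = ((end - mu) / (start - mu)) ** (n - 1)
print(lhs, rhs)
```

Exponentiating removes any ambiguity in the branch of the logarithm, which is why no branch cut needs to be tracked here.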

Lemma 5 Uniformly for {z,w \in K} one has

\displaystyle f'(w) = (1 + O(n|z - w|\sigma^2 e^{o(n|z - w|)})) \frac{(w - \mu)^{n-1}}{(z - \mu)^{n - 1}}f'(z).

Proof: Applying the previous two lemmata we get

\displaystyle f'(w) = e^{O(n|z - w|\sigma^2)} \frac{(w - \mu)^{n-1}}{(z - \mu)^{n - 1}}f'(z).

It remains to simplify

\displaystyle e^{O(n|z - w|\sigma^2)} = 1 + O(n|z - w|\sigma^2 e^{o(n|z - w|)}).

Taylor expanding {\exp} and using the self-similarity of the Taylor expansion we get

\displaystyle e^z = 1 + O(|z| e^{|z|})

which gives that bound. \Box
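The elementary bound {|e^z - 1| \leq |z| e^{|z|}} behind the last step follows by comparing Taylor series term by term. A quick randomized check, with a sampling box and a rounding slack that are my own choices:

```python
import cmath
import math
import random

random.seed(0)
samples = [complex(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]

def bound_holds(z):
    """Check |e^z - 1| <= |z| e^{|z|}, with a tiny slack for rounding."""
    return abs(cmath.exp(z) - 1) <= abs(z) * math.exp(abs(z)) * (1 + 1e-12)

all_ok = all(bound_holds(z) for z in samples)
print(all_ok)
```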

Lemma 6 Let {\varepsilon > 0}. Then

\displaystyle f(z) = f(0) + \frac{1 + O(\sigma^2)}{n} f'(z) (z - \mu) + O((\varepsilon + o(1))^n)

uniformly in {z \in K}.

Proof: We may assume that {\varepsilon} is small enough depending on {K}, since the constant in the big-{O} notation can depend on {K} as well, and {\varepsilon} only appears next to implied constants. Now given {z} we can find {\gamma} from {z} to {\partial B(0, \varepsilon)} which is always moving at a speed which is uniformly bounded from below and always moving in a direction towards the origin. Indeed, we can take {\gamma} to be a line segment which has been perturbed to miss the discrete set {T}, and possibly arced to miss {S_0} (say if {z} is far from {D(0, 1)}). By compactness of {K} we can choose the bounds on {\gamma} to be not just uniform in time but also in space (i.e. in {K}), and besides that {\gamma} is a curve through a compact set {K'} which misses {S}. Indeed, one can take {K'} to be a closed ball containing {K}, and then cut out small holes in {K'} around {T} and {S_0}, whose radii are bounded below since {K} is compact. Since the moments of {\zeta} are infinitesimal one has

\displaystyle \int_\gamma (w - \mu)^{n-1} ~dw = \frac{\varepsilon^n e^{in\theta}}{n} - \frac{(z - \mu)^n}{n} = -\frac{(z - \mu)^n}{n} + O((\varepsilon + o(1))^n).

Here we used {\varepsilon < 1} to enforce

\displaystyle \varepsilon^n/n = O(\varepsilon^n).

By the previous lemma,

\displaystyle f'(w) = (1 + O(n|z - w|\sigma^2 e^{o(n|z - w|)})) \frac{(w - \mu)^{n-1}}{(z - \mu)^{n - 1}}f'(z).

Integrating this result along {\gamma} we get

\displaystyle f(\gamma(0)) = f(\gamma(1)) - \frac{f'(\gamma(0))}{(\gamma(0) - \mu)^{n-1}} \left(\int_\gamma (w - \mu)^{n-1} ~dw + O\left(n\sigma^2 \int_\gamma|\gamma(0) - w| e^{o(n|\gamma(0) - w|)}|w - \mu|^{n-1}~dw \right) \right).

Applying our preliminary bound, the previous paragraph, and the fact that {|\gamma(1)| = \varepsilon}, thus

\displaystyle f(\gamma(1)) = f(0) + O((\varepsilon + o(1))^n),

we get

\displaystyle f(z) = f(0) + O((\varepsilon + o(1))^n) - \frac{f'(z)}{(z - \mu)^{n-1}} \left(-\frac{(z - \mu)^n}{n} + O((\varepsilon + o(1))^n) + O\left(n\sigma^2 \int_\gamma|z - w| e^{o(n|z - w|)}|w - \mu|^{n-1}~dw \right)\right).

We treat the first term first:

\displaystyle \frac{f'(z)}{(z - \mu)^{n-1}} \frac{(z - \mu)^n}{n} = \frac{1}{n} f'(z) (z - \mu).

For the second term, {z \in K} while {\mu^{(\infty)} \notin K}, so {|z - \mu|} is bounded from below, whence

\displaystyle \frac{f'(z)}{(z - \mu)^{n-1}} O((\varepsilon + o(1))^n) = O((\varepsilon + o(1))^n).

Thus we simplify

\displaystyle f(z) = f(0) + O((\varepsilon + o(1))^n) + \frac{1}{n} f'(z) (z - \mu) + \frac{f'(z)}{(z - \mu)^{n-1}} O\left(n\sigma^2 \int_\gamma|z - w| e^{o(n|z - w|)}|w - \mu|^{n-1}~dw \right).

It will be convenient to instead write this as

\displaystyle f(z) = f(0) + O((\varepsilon + o(1))^n) + \frac{1}{n} f'(z) (z - \mu) + O\left(n|f'(z)|\sigma^2 \int_\gamma|z - w| e^{o(n|z - w|)} \left|\frac{w - \mu}{z - \mu}\right|^{n-1}~dw \right).

Now we deal with the pesky integral. Since {\gamma} is moving towards {\partial B(0, \varepsilon)} at a speed which is bounded from below uniformly in “spacetime” (that is, {K \times [0, 1]}), there is a standard {c > 0} such that if {w = \gamma(t)} then

\displaystyle |w - \mu| \leq |z - \mu| - ct

since {\gamma} is going towards {\mu}. (Tao’s argument puzzles me a bit here because he claims that the real inner product {\langle z - w, z\rangle} is uniformly bounded from below in spacetime, which seems impossible if {w = z}. I agree with its conclusion though.) Exponentiating both sides we get

\displaystyle \left|\frac{w - \mu}{z - \mu}\right|^{n-1} = O(e^{-nct})

which bounds

\displaystyle f(z) = f(0) + O((\varepsilon + o(1))^n) + \frac{1}{n} f'(z) (z - \mu) + O\left(n|f'(z)|\sigma^2 \int_0^1 te^{-(c-o(1))nt} ~dt\right).

Since {c} is standard, it dominates the infinitesimal {o(1)}, so after shrinking {c} a little we get a new bound

\displaystyle f(z) = f(0) + O((\varepsilon + o(1))^n) + \frac{1}{n} f'(z) (z - \mu) + O\left(n|f'(z)|\sigma^2 \int_0^1 te^{-cnt} ~dt\right).

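The quantity {n\int_0^1 te^{-cnt} ~dt} is at most {n\int_0^\infty te^{-cnt} ~dt = c^{-2}n^{-1}}, so it is {O(n^{-1})}. Since {|z - \mu| \sim 1}, the last error term is therefore {O(n^{-1}\sigma^2 |f'(z)(z - \mu)|)}, and plugging in everything we get the claim. \Box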
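For the record, the integral appearing at the end of this proof has the closed form {n\int_0^1 te^{-cnt} ~dt = n(1 - e^{-cn}(1 + cn))/(cn)^2}, so for fixed {c} it behaves like {1/(c^2 n)}. The sketch below compares the closed form with a midpoint-rule approximation and prints the product with {c^2 n}; the value {c = 0.5} and the function names are assumptions of mine.

```python
import math

def scaled_integral(n, c):
    """Closed form of n * int_0^1 t exp(-c n t) dt."""
    a = c * n
    return n * (1 - math.exp(-a) * (1 + a)) / (a * a)

def scaled_integral_numeric(n, c, steps=100000):
    """Midpoint-rule approximation of the same quantity, as a cross-check."""
    h = 1 / steps
    total = sum((k + 0.5) * h * math.exp(-c * n * (k + 0.5) * h) for k in range(steps))
    return n * total * h

c = 0.5
for n in (10, 100, 1000):
    # The product with c^2 * n should approach 1 as n grows.
    print(n, scaled_integral(n, c), scaled_integral(n, c) * c * c * n)
```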

4. Control on zeroes away from {S}

After the gargantuan previous section, we can now show the “approximate level set” property that we discussed last time.

Lemma 7 Let {K} be a standard compact set which misses {S} and {\varepsilon > 0} standard. Then for every zero {\lambda_0 \in K} of {f},

\displaystyle U_\zeta(\lambda_0) = \frac{1}{n} \log \frac{1}{|f(0)|} + O(n^{-1}\sigma^2 + (\varepsilon + o(1))^n).

Last time we showed that this implies

\displaystyle U_\zeta(\lambda_0) = U_\zeta(a) + O(n^{-1}\sigma^2 + (\varepsilon + o(1))^n).

Thus all the zeroes of {f} either live in {S} or a neighborhood of a level set of {U_\zeta}.

Proof: Plugging in {z = \lambda_0} in the approximation

\displaystyle f(z) = f(0) + \frac{1 + O(\sigma^2)}{n} f'(z) (z - \mu) + O((\varepsilon + o(1))^n)

we get

\displaystyle f(0) + \frac{1 + O(\sigma^2)}{n} f'(\lambda_0) (\lambda_0 - \mu) = O((\varepsilon + o(1))^n).

Several posts ago, we proved {|f(0)| \sim 1} as a consequence of Grace’s theorem, so {f(0)O((\varepsilon + o(1))^n) = O((\varepsilon + o(1))^n)}. In particular, if we solve for {f'(\lambda_0)} we get

\displaystyle \frac{|f'(\lambda_0)|}{n} |\lambda_0 - \mu| = |f(0)| (1 + O(\sigma^2 + (\varepsilon + o(1))^n)).

Using

\displaystyle U_\zeta(z) = \frac{\log n}{n - 1} - \frac{1}{n - 1} \log |f'(z)|,

plugging in {z = \lambda_0}, and taking logarithms, we get

\displaystyle -\frac{n - 1}{n} U_\zeta(\lambda_0) + \frac{1}{n} \log | \lambda_0 - \mu| = \frac{1}{n} \log |f(0)| + O(n^{-1}\sigma^2 + (\varepsilon + o(1))^n).

Now {\lambda_0 \in K} and {K} misses the standard compact set {S}, so since {0 \in S} we have

\displaystyle |\lambda_0 - \zeta|, |\lambda_0 - \mu| \sim 1

(since {\zeta^{(\infty)} \in S} and {\mu} is infinitesimal). So we can Taylor expand in {\zeta} about {\mu}:

\displaystyle \log |\lambda_0 - \zeta| = \log |\lambda_0 - \mu| - \text{Re }\frac{\zeta - \mu}{\lambda_0 - \mu} + O(\sigma^2).

Taking expectations and using {\mathbf E \zeta = \mu},

\displaystyle -U_\zeta(\lambda_0) = \log |\lambda_0 - \mu| + O(\sigma^2).

Plugging in {\log |\lambda_0 - \mu|} we see the claim. \Box

I’m not sure who originally came up with the idea to reason like this; I think Tao credits M. J. Miller. Whoever it was had an interesting idea, I think: {f = 0} is a level set of {f}, but one that a priori doesn’t tell us much about {f'}. We have just replaced it with a level set of {U_\zeta}, a function that is explicitly closely related to {f'}, but at the price of an error term.

5. Fine control

We finish this series. If you want, you can let {\varepsilon > 0} be a standard real. I think, however, that it will be easier to think of {\varepsilon} as “infinitesimal, but not as infinitesimal as the term of the form o(1)”. In other words, {1/n} is smaller than any positive element of the ordered field {\mathbf R(\varepsilon)}; briefly, {1/n} is infinitesimal with respect to {\mathbf R(\varepsilon)}. We still reserve {o(1)} to mean an infinitesimal with respect to {\mathbf R(\varepsilon)}. Now {\varepsilon^n = o(1)} by underspill, since this is already true if {\varepsilon} is standard and {0 < \varepsilon < 1}. Underspill can also be used to transfer facts at scale {\varepsilon} to scale {1/n}. I think you can formalize this notion of “iterated infinitesimals” by taking an iterated ultrapower of {\mathbf R} in the theory of ordered rings.

Let us first bound {\log |a|}. Recall that {|a| \leq 1} so {\log |a| \leq 0} but in fact we can get a sharper bound. Since {T} is discrete we can get {e^{-i\theta}} arbitrarily close to whatever we want, say {-1} or {i}. This will give us bounds on {1 - a} when we take the Taylor expansion

\displaystyle \log|a| = -(1 - a)(1 + o(1)).

Lemma 8 Let {e^{i\theta} \in \partial D(0, 1) \setminus S} be standard. Then

\displaystyle \log |a| \leq \text{Re } ((1 - e^{-i\theta} + o(1))\mu) - O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

Proof: Let {K} be a standard compact set which misses {S} and {\lambda_0 \in K} a zero of {f}. Since {\zeta \notin K} (since {S} is close to {\zeta}) and {|a-\zeta|} has positive standard part (since {d(a, S) = 1}) we can take Taylor expansions

\displaystyle -\log |\lambda_0 - \zeta| = -\log |\lambda_0| + \text{Re } \frac{\zeta}{\lambda_0} + O(|\zeta|^2)

and

\displaystyle -\log |a - \zeta| = -\log|a| + \text{Re } \frac{\zeta}{a} + O(|\zeta|^2)

in {\zeta} about {0}. Taking expectations we have

\displaystyle U_\zeta(\lambda_0) = -\log |\lambda_0| + \text{Re } \frac{\mu}{\lambda_0} + O(\mathbf E |\zeta|^2)

and similarly for {a}. Thus

\displaystyle -\log |a| + \text{Re } \frac{\mu}{a} = -\log |\lambda_0| + \text{Re } \frac{\mu}{\lambda_0} + O(\mathbf E |\zeta|^2 + n^{-1}\sigma^2 + (\varepsilon + o(1))^n)

since

\displaystyle U_\zeta(\lambda_0) - U_\zeta(a) = O(n^{-1}\sigma^2 + (\varepsilon + o(1))^n).

Since

\displaystyle \mathbf E|\zeta|^2 = |\mu|^2 + \sigma^2

we have

\displaystyle -\log|\lambda_0| + \text{Re } \left(\frac{1}{\lambda_0} - \frac{1}{a}\right)\mu = -\log|a| + O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

Now {|\lambda_0| \leq 1} so {-\log |\lambda_0| \geq 0}, whence

\displaystyle \text{Re } \left(\frac{1}{\lambda_0} - \frac{1}{a}\right)\mu \leq -\log|a| + O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

Now recall that {\lambda^{(\infty)}} is uniformly distributed on {\partial D(0, 1)}, so we can choose {\lambda_0} so that

\displaystyle |\lambda_0 - e^{i\theta}| = o(1).

Thus, since {a = 1 - o(1)},

\displaystyle \frac{1}{\lambda_0} - \frac{1}{a} = e^{-i\theta} - 1 + o(1)

which we can plug in, and then multiply through by {-1}, to get the claim. \Box
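The identity {\mathbf E|\zeta|^2 = |\mu|^2 + \sigma^2} used in this proof is the usual decomposition of a second moment into squared mean plus variance, applied to a complex random variable. A quick check on a made-up finite sample (the Gaussian sample standing in for the critical points is purely hypothetical):

```python
import random

random.seed(1)
# A hypothetical finite sample standing in for the critical points.
zetas = [complex(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(50)]

mu = sum(zetas) / len(zetas)                                   # mean
sigma_sq = sum(abs(z - mu) ** 2 for z in zetas) / len(zetas)   # variance
second_moment = sum(abs(z) ** 2 for z in zetas) / len(zetas)   # E|zeta|^2

print(second_moment, abs(mu) ** 2 + sigma_sq)
```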

Now we prove the first part of the fine control lemma.

Lemma 9 One has

\displaystyle \mu, 1 - a = O(\sigma^2 + (\varepsilon + o(1))^n).

Proof: Let {\theta_+ \in [0.98\pi, 0.99\pi],\theta_- \in [1.01\pi, 1.02\pi]} be standard reals such that {e^{i\theta_\pm} \notin S}. I don’t think the constants here actually matter; we just need {0 < 0.01 < 0.02 < \pi/8} or something. Anyways, summing up two copies of the inequality from the previous lemma with {\theta = \theta_\pm} we have

\displaystyle 1.9 \text{Re } \mu \geq \text{Re } ((1 + e^{-i\theta_+} + 1 + e^{-i\theta_-} + o(1))\mu) \geq \log |a| + O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n)

since

\displaystyle 2 + e^{-i\theta_+} + e^{-i\theta_-} + o(1) \leq 1.9.

That is,

\displaystyle \text{Re } \mu \geq \frac{\log|a|}{1.9} + O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

Indeed,

\displaystyle -\log |a| = (1 - a)(1 + o(1)),

so

\displaystyle \text{Re }\mu \geq -\frac{1 - a}{1.9 + o(1)} + O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

If we square the tautology {|\zeta - a| \geq 1} then we get

\displaystyle |\zeta|^2 - 2a \text{Re }\zeta + a^2 \geq 1.

Taking expected values we get

\displaystyle |\mu|^2 + \sigma^2 - 2a \text{Re }\mu + a^2 \geq 1

or in other words

\displaystyle \text{Re }\mu \leq -\frac{1 - a^2}{2a} + O(|\mu|^2 + \sigma^2) = -(1 - a)(1 + o(1)) + O(|\mu|^2 + \sigma^2)

where we used the Taylor expansion

\displaystyle \frac{1 - a^2}{2a} = (1 - a)(1 + o(1))

obtained by Taylor expanding {1/a} about {1} and applying {1 - a = o(1)}. Using

\displaystyle \text{Re }\mu \geq -\frac{1 - a}{1.9 + o(1)} + O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n)

we get

\displaystyle -\frac{1 - a}{1.9 + o(1)} + O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n) \leq \text{Re }\mu \leq -(1 - a)(1 + o(1)) + O(|\mu|^2 + \sigma^2)

Thus

\displaystyle (1 - a)\left(1 - \frac{1}{1.9 + o(1)} + o(1)\right) = O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

Dividing both sides by {1 - \frac{1}{1.9 + o(1)} + o(1) \in [0.4, 0.5]} we have

\displaystyle 1 - a = O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

In particular

\displaystyle \text{Re }\mu = O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n)(1 + o(1)) + O(|\mu|^2 + \sigma^2) = O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

Now we treat the imaginary part {\text{Im } \mu}. The previous lemma gave

\displaystyle \text{Re } ((1 - e^{-i\theta} + o(1))\mu) - \log |a| = O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

Writing everything in terms of real and imaginary parts we can expand out

\displaystyle \text{Re } ((1 - e^{-i\theta} + o(1))\mu) = -(\sin \theta + o(1))\text{Im } \mu + (1 - \cos \theta + o(1))\text{Re }\mu.

Using the bounds

\displaystyle (1 - \cos \theta + o(1))\text{Re }\mu, ~\log |a| = O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n)

(which follow from the previous paragraph and the bound {\log |a| = O(1 - a)}), we have

\displaystyle (\sin \theta + o(1))\text{Im } \mu = O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

Since {T} is discrete we can find {\theta} arbitrarily close to {\pm \pi/2} which meets the hypotheses of the above equation. Therefore

\displaystyle \text{Im } \mu = O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

Plugging everything in, we get

\displaystyle \mu, ~1 - a = O(|\mu|^2 + \sigma^2 + (\varepsilon + o(1))^n).

Now {|\mu|^2 = o(|\mu|)} since {\mu} is infinitesimal; therefore we can discard that term. \Box

Now we are ready to prove the second part. The point is that we are ready to dispose of the semi-infinitesimal {\varepsilon}. Doing so puts a lower bound on {U_\zeta(a)}.

Lemma 10 Let {I \subseteq \partial D(0, 1) \setminus S} be a standard compact set. Then for every {e^{i\theta} \in I},

\displaystyle U_\zeta(a) - U_\zeta(e^{i\theta}) \geq -o(\sigma^2) - o(1)^n.

Proof: Since {\lambda^{(\infty)}} is uniformly distributed on {\partial D(0, 1)}, there is a zero {\lambda_0} of {f} with {|\lambda_0 - e^{i\theta}| = o(1)}. Since {|\lambda_0| \leq 1}, we can find an infinitesimal {\eta} such that

\displaystyle \lambda_0 = e^{i\theta}(1 - \eta)

and {|1 - \eta| \leq 1}. In the previous section we proved

\displaystyle U_\zeta(a) - U_\zeta(\lambda_0) = O(n^{-1}\sigma^2) + O((\varepsilon + o(1))^n).

Using {n^{-1} = o(1)} and plugging in {\lambda_0} we have

\displaystyle U_\zeta(a) - U_\zeta(e^{i\theta}(1 - \eta)) = o(\sigma^2) + O((\varepsilon + o(1))^n).

Now

\displaystyle \text{Re } \eta \int_0^1 \frac{dt}{1 - t\eta - e^{-i\theta}\zeta} = \log |1 - e^{-i\theta}\zeta| - \log|1 - \eta - e^{-i\theta}\zeta| = \log|e^{i\theta} - \zeta| - \log|e^{i\theta} - e^{i\theta}\eta - \zeta|.
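As a quick sanity check on the signs, here is a minimal numerical sketch of the identity {\text{Re } \eta \int_0^1 \frac{dt}{1 - t\eta - w} = \log|1 - w| - \log|1 - \eta - w|}, where {w} stands in for {e^{-i\theta}\zeta}. The sample values of {\eta} and {w} and the helper name `segment_integral` are hypothetical choices made for illustration; they are not from Tao's paper.

```python
import cmath
import math

def segment_integral(eta, w, steps=20000):
    """Midpoint-rule approximation of the integral over t in [0, 1] of dt / (1 - t*eta - w)."""
    total = 0j
    for k in range(steps):
        t = (k + 0.5) / steps
        total += 1 / (1 - t * eta - w)
    return total / steps

# Hypothetical sample values: eta is small with |1 - eta| <= 1,
# and w plays the role of e^{-i theta} * zeta.
eta = 0.10 + 0.05j
w = 0.20 - 0.10j

lhs = (eta * segment_integral(eta, w)).real
rhs = math.log(abs(1 - w)) - math.log(abs(1 - eta - w))
print(lhs, rhs)  # the two numbers should agree to many digits
```

The check only works with a minus sign in front of {w} in the denominator, which matches the Taylor expansion used in the next step.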

Taking expectations,

\displaystyle \text{Re }\eta \mathbf E\int_0^1 \frac{dt}{1 - t\eta - e^{-i\theta}\zeta} = U_\zeta(e^{i\theta}(1 - \eta)) - U_\zeta(e^{i\theta}).

Taking a Taylor expansion,

\displaystyle \frac{1}{1 - t\eta - e^{-i\theta}\zeta} = \frac{1}{1 - t\eta} + \frac{e^{-i\theta}\zeta}{(1 - t\eta)^2} + O(|\zeta|^2)

so by Fubini’s theorem

\displaystyle \mathbf E\int_0^1 \frac{dt}{1 - t\eta - e^{-i\theta}\zeta} = \int_0^1 \left(\frac{1}{1 - t\eta} + \frac{e^{-i\theta}}{(1 - t\eta)^2}\mu + O(|\mu|^2 + \sigma^2)\right)~dt;

using the previous lemma and {\eta = o(1)} we get

\displaystyle  U_\zeta(e^{i\theta}(1 - \eta)) - U_\zeta(e^{i\theta}) = \text{Re }\eta \int_0^1 \frac{dt}{1 - t\eta} + o(\sigma^2) + O((\varepsilon + o(1))^n).

We also have

\displaystyle \text{Re } \eta \int_0^1 \frac{dt}{1 - t\eta} = \log \frac{1}{|e^{i\theta} - e^{i\theta}\eta|} = U_0(1 - \eta)

since {0} is deterministic (and {U_0(e^{i\theta} z) = U_0(z)}, and {U_0(1) = 0}; very easy to check!). I think Tao makes a typo here, referring to {U_i(e^{i\theta}(1 - \eta))}, which seems irrelevant. We do have

\displaystyle U_0(1 - \eta) = -\log|1 - \eta| \geq 0

since {|1 - \eta| \leq 1}. Plugging in

\displaystyle \text{Re } \eta \int_0^1 \frac{dt}{1 - t\eta} \geq 0

we get

\displaystyle U_\zeta(e^{i\theta} - e^{i\theta}\eta) - U_\zeta(e^{i\theta}) \geq -o(\sigma^2) - O((\varepsilon + o(1))^n).

I think Tao makes another typo, dropping the Big O, but anyways,

\displaystyle U_\zeta(a) - U_\zeta(e^{i\theta} - e^{i\theta}\eta) = o(\sigma^2) + O((\varepsilon + o(1))^n)

so by the triangle inequality

\displaystyle U_\zeta(a) - U_\zeta(e^{i\theta}) \geq -o(\sigma^2) - O((\varepsilon + o(1))^n).

By underspill, then, we can take {\varepsilon \rightarrow 0}. \Box

We need a result from complex analysis called Jensen’s formula which I hadn’t heard of before.

Theorem 11 (Jensen’s formula) Let {g: D(0, 1) \rightarrow \mathbf C} be a holomorphic function with zeroes {a_1, \dots, a_n \in D(0, 1)} and {g(0) \neq 0}. Then

\displaystyle \log |g(0)| = \sum_{j=1}^n \log |a_j| + \frac{1}{2\pi} \int_0^{2\pi} \log |g(e^{i\theta})| ~d\theta.

In hindsight this is kinda trivial but I never realized it. In fact {\log |g|} is subharmonic, and its Laplacian is exactly a linear combination of delta functions at the zeroes of {g}. If you subtract those away then this is just the mean-value property

\displaystyle \log |g(0)| = \frac{1}{2\pi} \int_0^{2\pi} \log |g(e^{i\theta})| ~d\theta.
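To see Jensen's formula in action, here is a small numerical sketch. The test function {g} below (zeroes at {0.5} and {-0.3 + 0.4i} inside the disc, and one at {2} outside) and the helper `circle_mean_log_abs` are hypothetical choices made for illustration only.

```python
import cmath
import math

def circle_mean_log_abs(g, samples=4000):
    """Approximate (1/2pi) * integral of log|g(e^{i theta})| d theta by an equally spaced sum."""
    total = 0.0
    for k in range(samples):
        theta = 2 * math.pi * (k + 0.5) / samples
        total += math.log(abs(g(cmath.exp(1j * theta))))
    return total / samples

# Hypothetical test function: two zeroes inside the unit disc, one outside.
zeros_inside = [0.5, -0.3 + 0.4j]

def g(z):
    return (z - 0.5) * (z - (-0.3 + 0.4j)) * (z - 2)

lhs = math.log(abs(g(0)))
rhs = sum(math.log(abs(a)) for a in zeros_inside) + circle_mean_log_abs(g)
print(lhs, rhs)  # both sides of Jensen's formula
```

Only the zeroes inside the disc enter the sum; the zero at {2} contributes through the boundary average instead.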

Let us finally prove the final part. In what follows, implied constants are allowed to depend on {\varphi} but not on {\delta}.

Lemma 12 For any standard {\varphi \in C^\infty(\partial D(0, 1))},

\displaystyle \int_0^{2\pi} \varphi(e^{i\theta}) U_\zeta(e^{i\theta}) ~d\theta = o(\sigma^2) + o(1)^n.

Besides,

\displaystyle U_\zeta(a) = o(\sigma^2) + o(1)^n.

Proof: Let {m} be the Haar measure on {\partial D(0, 1)}. We first prove this when {\varphi \geq 0}. Since {T} is discrete and {\partial D(0, 1)} is compact, for any standard (or semi-infinitesimal) {\delta > 0}, there is a standard compact set

\displaystyle I \subseteq \partial D(0, 1) \setminus S

such that

\displaystyle m(\partial D(0, 1) \setminus I) < \delta.

By the previous lemma, if {e^{i\theta} \in I} then

\displaystyle \varphi(e^{i\theta}) U_\zeta(a) - \varphi(e^{i\theta}) U_\zeta(e^{i\theta}) \geq -o(\sigma^2) - o(1)^n

and the same holds when we average in Haar measure:

\displaystyle  U_\zeta(a)\int_I \varphi~dm - \int_I \varphi(e^{i\theta}) U_\zeta(e^{i\theta})~dm(e^{i\theta}) \geq -o(\sigma^2) - o(1)^n.

We have

\displaystyle |\log |e^{i\theta} - \zeta| + \text{Re } e^{-i\theta}\zeta| \leq |\log|3 - \zeta| + 3\text{Re } \zeta| \in L^2(dm(e^{i\theta}))

so, using the Cauchy-Schwarz inequality, one has

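\displaystyle \left|\int_{\partial D(0, 1) \setminus I} \varphi(e^{i\theta}) (\log |e^{i\theta} - \zeta| + \text{Re } e^{-i\theta}\zeta) ~dm(e^{i\theta})\right| \leq \|\varphi\|_{L^\infty} \, m(\partial D(0, 1) \setminus I)^{1/2} \left(\int_{\partial D(0, 1)} |\log |e^{i\theta} - \zeta| + \text{Re } e^{-i\theta}\zeta|^2 ~dm(e^{i\theta})\right)^{1/2} = O(\delta^{1/2}).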

Meanwhile, if {|\zeta| \leq 1/2} then the fact that

\displaystyle \log |e^{i\theta} - \zeta| = \text{Re }-\frac{\zeta}{e^{i\theta}} + O(|\zeta|^2)

implies

\displaystyle \log |e^{i\theta} - \zeta| + \text{Re } \frac{\zeta}{e^{i\theta}} = O(|\zeta|^2)

and hence

\displaystyle \int_{\partial D(0, 1) \setminus I} \varphi(e^{i\theta}) (\log |e^{i\theta} - \zeta| + \text{Re } e^{-i\theta}\zeta) ~dm(e^{i\theta}) = O(\delta|\zeta|^2).

We combine these into the unified estimate

\displaystyle \int_{\partial D(0, 1) \setminus I} \varphi(e^{i\theta}) (\log |e^{i\theta} - \zeta| + \text{Re } e^{-i\theta}\zeta) ~dm(e^{i\theta}) = O(\delta^{1/2}|\zeta|^2)

valid for all {|\zeta| \leq 1}, hence almost surely. Taking expected values we get

\displaystyle \int_{\partial D(0, 1) \setminus I} \varphi(e^{i\theta})U_\zeta(e^{i\theta}) + \varphi(e^{i\theta}) \text{Re }e^{-i\theta}\mu ~dm(e^{i\theta}) = O(\delta^{1/2}(|\mu|^2 + \sigma^2)) + o(\sigma^2) + o(1)^n.

In the last lemma we bounded {|\mu|} so we can absorb all the terms with {\mu} in them to get

\displaystyle \int_{\partial D(0, 1) \setminus I} \varphi(e^{i\theta})U_\zeta(e^{i\theta}) ~dm(e^{i\theta}) = O(\delta^{1/2}\sigma^2) + o(\sigma^2) + o(1)^n.

We also have

\displaystyle \int_{\partial D(0, 1) \setminus I} \varphi ~dm = O(\delta)

(here Tao refers to a mysterious undefined measure {\sigma} but I’m pretty sure he means {m}). Putting these integrals together with the integrals over {I},

\displaystyle  U_\zeta(a)\int_{\partial D(0, 1)} \varphi ~dm - \int_{\partial D(0, 1)} \varphi(e^{i\theta}) U_\zeta(e^{i\theta}) ~dm(e^{i\theta}) \geq -O(\delta^{1/2}\sigma^2) - o(\sigma^2) - o(1)^n.

By underspill we can delete {\delta}, thus

\displaystyle  U_\zeta(a)\int_{\partial D(0, 1)} \varphi ~dm - \int_{\partial D(0, 1)} \varphi(e^{i\theta}) U_\zeta(e^{i\theta}) ~dm(e^{i\theta}) \geq - o(\sigma^2) - o(1)^n.

We now consider the specific case {\varphi = 1}. Then

\displaystyle U_\zeta(a) - \int_{\partial D(0, 1)} U_\zeta ~dm \geq -o(\sigma^2) - o(1)^n.

Now Tao claims and doesn’t prove

\displaystyle \int_{\partial D(0, 1)} U_\zeta ~dm = 0.

To see this, we expand as

\displaystyle \int_{\partial D(0, 1)} U_\zeta ~dm = -\mathbf E \frac{1}{2\pi} \int_0^{2\pi} \log|\zeta - e^{i\theta}| ~d\theta

using Fubini’s theorem. Now we use Jensen’s formula with {g(z) = \zeta - z}, which has a zero exactly at {\zeta}. This seems problematic if {\zeta = 0}, but we can condition on {|\zeta| > 0}. Indeed, if {\zeta = 0} then we have

\displaystyle  \int_0^{2\pi} \log|\zeta - e^{i\theta}| ~d\theta = \int_0^{2\pi} \log 1 ~d\theta = 0

which already gives us what we want. Anyways, if {|\zeta| > 0}, then by Jensen’s formula,

\displaystyle \frac{1}{2\pi} \int_0^{2\pi} \log|\zeta - e^{i\theta}| ~d\theta = \log |\zeta| - \log |\zeta| = 0.

So that’s how it is. Thus we have

\displaystyle -U_\zeta(a) \leq o(\sigma^2) + o(1)^n.

Since {|a - \zeta| \geq 1}, {\log |a - \zeta| \geq 0}, so the same is true of its expected value {-U_\zeta(a)}. This gives the desired bound

\displaystyle U_\zeta(a) = o(\sigma^2) + o(1)^n.

We can use that bound to discard {U_\zeta(a)} from the average

\displaystyle  U_\zeta(a)\int_{\partial D(0, 1)} \varphi ~dm - \int_{\partial D(0, 1)} \varphi(e^{i\theta}) U_\zeta(e^{i\theta}) ~dm(e^{i\theta}) \geq - o(\sigma^2) - o(1)^n,

thus

\displaystyle \int_{\partial D(0, 1)} \varphi(e^{i\theta}) U_\zeta(e^{i\theta}) ~dm(e^{i\theta})= o(\sigma^2) + o(1)^n.

Repeating the Jensen’s formula argument from above we see that we can replace {\varphi} with {\varphi - k} for any {k \geq 0}. So this holds even if {\varphi} is not necessarily nonnegative. \Box

Let’s Read: Sendov’s conjecture in high degree, part 3: case zero, and a sketch of case one

In this post we’re going to complete the proof of case zero and continue the proof of case one. In the last two posts we managed to prove:

Theorem 1 Let {n} be a nonstandard natural, and let {f} be a monic polynomial of degree {n} on {\mathbf C} with all zeroes in {\overline{D(0, 1)}}. Suppose that {a} is a zero of {f} such that:

  1. Either {a} or {a - 1} is infinitesimal, and
  2. {f} has no critical points on {\overline{D(a, 1)}}.

Let {\lambda} be a random zero of {f} and {\zeta} a random critical point of {f}. Let {\mu} be the expected value of {\lambda}. Let {z} be a complex number outside some measure zero set and let {\gamma} be a contour that misses the zeroes of {f},{f'}. Then:

  1. {\zeta \in \overline{D(0, 1)} \setminus \overline{D(a, 1)}}.
  2. {\mu = \mathbf E \zeta}.
  3. One has

    \displaystyle U_\lambda(z) = -\frac{1}{n} \log |f(z)|

    and

    \displaystyle U_\zeta(z) = \frac{\log n}{n - 1} - \frac{1}{n-1} \log |f'(z)|.

  4. One has

    \displaystyle s_\lambda(z) = \frac{1}{n} \frac{f'(z)}{f(z)}

    and

    \displaystyle s_\zeta(z) = \frac{1}{n - 1} \frac{f''(z)}{f'(z)}.

  5. One has

    \displaystyle U_\lambda(z) - \frac{n - 1}{n} U_\zeta(z) = \frac{1}{n} \log |s_\lambda(z)|

    and

    \displaystyle s_\lambda(z) - \frac{n - 1}{n} s_\zeta(z) = -\frac{1}{n} \frac{s_\lambda'(z)}{s_\lambda(z)}.

  6. One has

    \displaystyle f(\gamma(1)) = f(\gamma(0)) \exp \left(n \int_\gamma s_\lambda(z) ~dz\right)

    and

    \displaystyle f'(\gamma(1)) = f'(\gamma(0)) \exp\left((n-1) \int_\gamma s_\zeta(z) ~dz\right).

Moreover,

  1. If {a} is infinitesimal (case zero), then {\lambda^{(\infty)},\zeta^{(\infty)}} are identically distributed and almost surely lie in

    \displaystyle C = \{e^{i\theta}: 2\theta \in [\pi, 3\pi]\}.

    Moreover, if {K} is any compact set which misses {C}, then

    \displaystyle \mathbf P(\lambda \in K) = O\left(a + \frac{\log n}{n^{1/3}}\right),

    so {d(\lambda, C)} is infinitesimal in probability.

  2. If {a - 1} is infinitesimal (case one), then {\lambda^{(\infty)}} is uniformly distributed on {\partial D(0, 1)} and {\zeta^{(\infty)}} is almost surely zero. Moreover,

    \displaystyle \mathbf E \log \frac{1}{|\lambda|}, \mathbf E \log |\zeta - a| = O(n^{-1}).
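The identities in items 3 through 5 can be checked numerically on the pseudo-counterexample {f(z) = z^n - 1}, whose zeroes are the {n}th roots of unity and whose {n - 1} critical points all sit at {0}. This is only a sketch: the degree {n = 7}, the test point, and the helper names `U` and `s` are hypothetical choices. Note that the check for {U_\zeta} uses {\frac{\log n}{n-1} - \frac{1}{n-1}\log|f'(z)|}, which is the sign forced by the leading coefficient {n} of {f'}.

```python
import cmath
import math

n = 7  # hypothetical small degree, for illustration only
zeros = [cmath.exp(2j * math.pi * k / n) for k in range(n)]  # zeroes of z^n - 1
crits = [0j] * (n - 1)                                        # critical points, with multiplicity

def f(z):
    return z**n - 1

def df(z):
    return n * z**(n - 1)

def d2f(z):
    return n * (n - 1) * z**(n - 2)

def U(points, z):
    """Logarithmic potential of the uniform measure on `points`."""
    return sum(math.log(1 / abs(z - p)) for p in points) / len(points)

def s(points, z):
    """Stieltjes transform of the uniform measure on `points`."""
    return sum(1 / (z - p) for p in points) / len(points)

z = 0.3 + 0.6j  # arbitrary test point away from the zeroes and critical points
residuals = [
    abs(U(zeros, z) + math.log(abs(f(z))) / n),                         # item 3, U_lambda
    abs(U(crits, z) - (math.log(n) - math.log(abs(df(z)))) / (n - 1)),  # item 3, U_zeta
    abs(s(zeros, z) - df(z) / (n * f(z))),                              # item 4, s_lambda
    abs(s(crits, z) - d2f(z) / ((n - 1) * df(z))),                      # item 4, s_zeta
    abs(U(zeros, z) - (n - 1) / n * U(crits, z) - math.log(abs(s(zeros, z))) / n),  # item 5
]
print(max(residuals))  # should be at the level of rounding error
```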

We also saw that Sendov’s conjecture in high degree was equivalent to the following result, that we will now prove.

Lemma 2 Let {n} be a nonstandard natural, and let {f} be a monic polynomial of degree {n} on {\mathbf C} with all zeroes in {\overline{D(0, 1)}}. Let {a} be a zero of {f} such that:

  1. Either {a \log n} is infinitesimal (case zero), or
  2. There is a standard {\varepsilon_0 > 0} such that

    \displaystyle 1 - o(1) \leq a \leq 1 - \varepsilon_0^n

    (case one).

If there are no critical points of {f} in {\overline{D(a, 1)}}, then {0 = 1}.

1. Case zero

Now we prove case zero — the easy case — of Lemma 2.

Suppose that {a \log n} is infinitesimal. In this case, {\lambda^{(\infty)}, \zeta^{(\infty)}} are identically distributed and almost surely lie in {C}.

Lemma 3 There are {0 < r_1 < r_2 < 1/2} such that for every {|z| \in [r_1, r_2]},

\displaystyle |s_{\lambda^{(\infty)}}(z)| \sim 1

uniformly.

Proof: Since {\lambda^{(\infty)}} is supported in {C}, {s_{\lambda^{(\infty)}}} is holomorphic away from {C}. Since {\lambda^{(\infty)}} is bounded, if {z} is near {\infty} then

\displaystyle s_{\lambda^{(\infty)}}(z) = \mathbf E\frac{1}{z - \lambda^{(\infty)}} \sim \mathbf E \frac{1}{z} = \frac{1}{z}

which is nonzero near {\infty}. So the variety {s_{\lambda^{(\infty)}} = 0} is discrete, so there are {0 < r_1 < r_2 < 1/2} such that {s_{\lambda^{(\infty)}}(re^{i\theta}) \neq 0} whenever {r \in [r_1, r_2]}. To see this, suppose not; then for every {r_1 < r_2} we can find {r \in [r_1, r_2]} and {\theta} with {s_{\lambda^{(\infty)}}(re^{i\theta}) = 0}, so {s_{\lambda^{(\infty)}}} has infinitely many zeroes in the compact set {\overline{D(0, 1/2)}}. Since this is definitely not true, the claim follows by continuity of {s_{\lambda^{(\infty)}}}. \Box

Let {m} be the number of zeroes of {s_{\lambda^{(\infty)}}} in {D(0, r_1)}, so {m} is a nonnegative standard natural since {s_{\lambda^{(\infty)}}} is standard and {\overline{D(0, 1/2)}} is compact. Let {\gamma(\theta) = re^{i\theta}} where {r \in (r_1, r_2)}; then

\displaystyle \frac{1}{2\pi i}\int_\gamma \frac{s_{\lambda^{(\infty)}}'(z)}{s_{\lambda^{(\infty)}}(z)} ~dz = m,

by the argument principle.

We claim that in fact {m \leq -1}, which contradicts that {m} is nonnegative. This will be proven in the rest of this section.

Here something really strange happens in Tao’s paper. He proves this:

Lemma 4 One has

\displaystyle \left|\frac{1}{n} \frac{f'(z)}{f(z)} - s_{\lambda^{(\infty)}}(z)\right| = o(1)

in {L^1_{loc}}.

We now need to show that the convergence in {L^1_{loc}} above commutes with the use of the argument principle so that

\displaystyle \frac{1}{2\pi i}\int_\gamma \frac{(f'/f)'(z)}{f'(z)/f(z)} ~dz = m;

this will be good because we have control on the zeroes and critical points of {f} using our contradiction assumption. What’s curious to me is that Tao seems to substitute this with convergence in {L^\infty_{loc}} on an annulus. Indeed, convergence in {L^\infty_{loc}} does commute with use of the argument principle, but at no point of the proof does it seem like he uses the convergence in {L^1_{loc}}. So I include the proof of the latter in the next section as a curiosity item, but I think it can be omitted entirely. Tell me in the comments if I’ve made a mistake here.

If {\chi} is a smooth cutoff supported on {\overline{D(0, 1/2)}} and identically one on {\overline{D(0, r_3)}} (where {r_2 < r_3 < 1/2}), one has

\displaystyle \frac{1}{n} \frac{f'(z)}{f(z)} = \mathbf E \frac{1}{z - \lambda} = \mathbf E \frac{1 - \chi(\lambda)}{z - \lambda} + \mathbf E \frac{\chi(\lambda)}{z - \lambda}.

The {1 - \chi} term is easy to deal with, since for every {z}, {(1 - \chi(\lambda))/(z - \lambda)} is a bounded continuous function of {\lambda} whenever {|z| < r_2} (so {|\lambda - z| \geq r_3 - r_2 > 0}). By the definition of being infinitesimal in distribution we have

\displaystyle \left|\mathbf E\frac{1 - \chi(\lambda)}{z - \lambda} - \mathbf E\frac{1}{z - \lambda^{(\infty)}}\right| = o(1).

Therefore

\displaystyle \mathbf E \frac{1 - \chi(\lambda)}{z - \lambda} - s_{\lambda^{(\infty)}}(z)

is uniformly infinitesimal.

Now we treat the {\chi} term. Interestingly, this is the main point of the argument where we use that {a \log n} is infinitesimal, and the rest of the argument seems to mainly go through with much weaker assumptions on {a}.

Lemma 5 There is an {r \in [r_1, r_2]} such that if {|z| = r} then

\displaystyle \left|\mathbf E \frac{\chi(\lambda)}{z - \lambda}\right| = o(1).

Proof: By the triangle inequality and its reverse, if {|z| = r} then

\displaystyle \left|\mathbf E \frac{\chi(\lambda)}{z - \lambda}\right| \leq \mathbf E \frac{\chi(\lambda)}{|r - |\lambda||}.

Here {r \in [r_1, r_2]} is to be chosen.

Since we have

\displaystyle \mathbf P(\lambda \in K) = O\left(a + \frac{\log n}{n^{1/3}}\right)

whenever {K} is a compact set which misses {C}, this in particular holds when {K = \overline{B(0, 1/2)}}. Since {a = o(1/\log n)} and {n^{-1/3}\log n = o(1/\log n)} it follows that

\displaystyle \mathbf P(|\lambda| \leq 1/2) = o(1/\log n).

In particular,

\displaystyle \mathbf E\chi(\lambda) = o(1/\log n).

We now claim

\displaystyle \int_{r_1}^{r_2} \frac{dr}{\max(|r - |\lambda||, n^{-10})} = O(\log n).

By splitting the integrand we first bound

\displaystyle \int_{\substack{[r_1,r_2]\\|r-|\lambda|| \leq n^{-10}}} \frac{dr}{\max(|r - |\lambda||, n^{-10})} \leq 2n^{-10}n^{10} = 2 = O(\log n)

since {\log n} is nonstandard and the domain of integration has measure at most {2n^{-10}}. On the other hand, the other term

\displaystyle \int_{\substack{[r_1,r_2]\\|r-|\lambda|| \geq n^{-10}}} \frac{dr}{\max(|r - |\lambda||, n^{-10})} \leq 2\int_{n^{-10}}^{1} \frac{ds}{s} = 20 \log n = O(\log n)

since the substitution {s = |r - |\lambda||} hits each value of {s \in [n^{-10}, 1]} at most twice (once on either side of {|\lambda|}). This proves the claim.
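The claim can also be checked in closed form for ordinary (standard) {n}. The sketch below evaluates the integral exactly and compares it with the explicit bound {2 + 20 \log n}; the function names `F` and `radial_integral` and the sample radii {r_1 = 0.2}, {r_2 = 0.4} are hypothetical choices for illustration.

```python
import math

def F(s, delta):
    """Integral over t in [0, s] of dt / max(t, delta), for s >= 0."""
    return s / delta if s <= delta else 1.0 + math.log(s / delta)

def radial_integral(rho, r1, r2, delta):
    """Exact value of the integral over r in [r1, r2] of dr / max(|r - rho|, delta)."""
    # Part of [r1, r2] to the right of rho, parametrised by s = r - rho.
    right = F(r2 - rho, delta) - F(max(r1 - rho, 0.0), delta) if r2 > rho else 0.0
    # Part of [r1, r2] to the left of rho, parametrised by s = rho - r.
    left = F(rho - r1, delta) - F(max(rho - r2, 0.0), delta) if rho > r1 else 0.0
    return left + right

n = 10**6
delta = float(n) ** -10
worst = max(radial_integral(k / 50, 0.2, 0.4, delta) for k in range(51))
print(worst, 2 + 20 * math.log(n))  # the first number should not exceed the second
```

Here `rho` plays the role of {|\lambda| \in [0, 1]}; the worst case is when {|\lambda|} lies in the middle of {[r_1, r_2]}, so that both sides of the singularity contribute a logarithm.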

Putting the above two paragraphs together and using Fubini’s theorem,

\displaystyle \int_{r_1}^{r_2} \mathbf E \frac{\chi(\lambda)}{\max(|r - |\lambda||, n^{-10})} ~dr = \mathbf E\left[\chi(\lambda) \int_{r_1}^{r_2} \frac{1}{\max(|r - |\lambda||, n^{-10})} ~dr\right] = O(\log n) \mathbf E\chi(\lambda)

is infinitesimal. So outside of a set of infinitesimal measure, {r \in [r_1, r_2]} satisfies

\displaystyle \mathbf E \frac{\chi(\lambda)}{\max(|r - |\lambda||, n^{-10})} = o(1).

If {|r - |\lambda|| \leq n^{-10}} then there is a (deterministic) zero {\lambda_0} such that {|r - |\lambda_0|| \leq n^{-10}}, thus {r} lies in a set of measure at most {2n^{-10}}. There are {\leq n} such sets since there are {n} zeroes of {f}, so their union has measure at most {2n^{-9}}, which is infinitesimal. Therefore

\displaystyle \mathbf E \frac{\chi(\lambda)}{|r - |\lambda||} = o(1)

which implies the claim. \Box

Summing up, we have

\displaystyle \frac{f'}{nf} = s_{\lambda^{(\infty)}} + o(1)

uniformly on the circle {\partial D(0, r)}, where {r} is as in the previous lemma. Pulling out the factor of {1/n}, which is harmless, we can use the argument principle to deduce that {m} is the number of zeroes minus poles of {f'/f} in {D(0, r)}; that is, the number of critical points minus zeroes of {f} there. Indeed, uniform convergence on the contour does commute with the argument principle, so we can throw out the infinitesimal {o(1)}.

But {a} is infinitesimal, and we assumed that {f} had no critical points in {\overline{D(a, 1)}}, which contains {D(0, r)}. So {f} has no critical points in {D(0, r)}, but has a zero {a} there; therefore {m \leq -1}.
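The counting step can be illustrated on a toy example. The sketch below computes the winding number of {f'/f} around the circle {|z| = r} for the hypothetical cubic {f(z) = (z - A)(z^2 - 1)} with {A = 0.05} and {r = 0.3}; this polynomial is not from the paper and was chosen only because its critical points (near {\pm 0.58}) lie outside {D(0, r)} while the zero {A} lies inside.

```python
import cmath
import math

A = 0.05  # hypothetical small zero, standing in for the infinitesimal a

def f(z):
    return (z - A) * (z * z - 1)

def df(z):
    return 3 * z * z - 2 * A * z - 1

def winding_number(h, r, samples=4000):
    """Winding number about 0 of theta -> h(r e^{i theta}): zeroes minus poles of h in D(0, r)."""
    total = 0.0
    prev = h(complex(r))
    for k in range(1, samples + 1):
        cur = h(r * cmath.exp(2j * math.pi * k / samples))
        total += cmath.phase(cur / prev)  # small phase increment between neighbouring samples
        prev = cur
    return round(total / (2 * math.pi))

m = winding_number(lambda z: df(z) / f(z), r=0.3)
print(m)  # prints -1: no critical points, one zero of f inside the circle
```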

In a way this part of the proof was very easy: the only tricky bit was using the cutoff to get convergence in {L^\infty} like we needed. The hint that we could use the argument principle was the fact that {a} was infinitesimal, so we had control of the critical points near the origin.

2. Convergence in {L^1_{loc}}

Let {\nu} be the distribution of {\lambda} and {\nu^{(\infty)}} of {\lambda^{(\infty)}}. Since {\lambda - \lambda^{(\infty)}} is infinitesimal in distribution, {\nu^{(\infty)} - \nu} is infinitesimal in the weak topology of measures; that is, for every continuous function {g} and compact set {K},

\displaystyle \int_K g ~d(\nu^{(\infty)} - \nu) = o(1).

Now

\displaystyle s_{\lambda^{(\infty)}}(z) - s_\lambda(z) = \int_{D(0, 1)} \frac{d\nu^{(\infty)}(w)}{z - w} - \int_{D(0, 1)} \frac{d\nu(w)}{z - w}.

If {K} is a compact set and {\rho} is Lebesgue measure then

\displaystyle \int_K s_{\lambda^{(\infty)}} - s_\lambda ~d\rho = \int_K \int_{D(0, 1)} \frac{d(\nu^{(\infty)} - \nu)(w)}{z - w} ~d\rho(z).

By Tonelli’s theorem

\displaystyle \int_K \int_{D(0, 1)} \frac{d(\nu^{(\infty)} - \nu)(w)}{|z - w|} ~d\rho(z) = \int_{D(0, 1)} \int_K \frac{d\rho(z)}{|z - w|} d(\nu^{(\infty)} - \nu)(w)

and the inner integral is finite since {1/|z|} is Lebesgue integrable in codimension {2}. So the outer integrand is a bounded continuous function, which implies that

\displaystyle \int_K \int_{D(0, 1)} \frac{d(\nu^{(\infty)} - \nu)(w)}{|z - w|} ~d\rho(z) = o(1)

which gives what we want when we recall

\displaystyle s_\lambda(z) = \frac{1}{n} \frac{f'(z)}{f(z)}

and we plug in {s_\lambda}.

3. Case one: Outlining the proof

The proof for case one is much longer, and is motivated by the pseudo-counterexample

\displaystyle f(z) = z^n - 1.

Here {a} is an {n}th root of unity, and {f} has no critical points on {D(a, 1)}, but does have {n - 1} critical points at {0 \in \partial D(a, 1)}. Similar pseudo-counterexamples hold for

\displaystyle 1 - o(1) \leq a \leq 1 - \varepsilon_0^n

where {\varepsilon_0 > 0} is standard. We will seek to control these examples by controlling {\zeta} up to an error of size {O(\sigma^2) + o(1)^n}; here {\sigma^2} is the variance of {\zeta} and {o(1)^n} is an infinitesimal raised to the power of {n}, thus is very small, and forces us to balance out everything in terms of {a}.

As discussed in the introduction of this post, {\zeta} is infinitesimal in probability (and, in particular, its expected value {\mu} is infinitesimal); thus, with overwhelming probability, the critical points of {f} are all infinitesimals. Combining this with the fact that {\lambda^{(\infty)}} is uniformly distributed on {\partial D(0, 1)}, it follows that {f} sort of looks like {f(z) = z^n - 1}.

We start with some nice bounds:

Lemma 6 (preliminary bounds) For any standard compact set {K \subset \mathbf C}, one has

\displaystyle f(z) = f(0) + O((|z| + o(1))^n)

and

\displaystyle f'(z) = O((|z| + o(1))^n)

uniformly in {z \in K}.

In other words, {f} sort of grows like the translate of a homogeneous polynomial of degree {n}.

It would be nice if {\zeta} was infinitesimal in {L^\infty}, but this isn’t quite true; the following lemma is the best we can do.

Lemma 7 (uniform convergence of {\zeta}) There is a standard compact set

\displaystyle S = (\overline{D(0, 1)} \cap \partial D(1, 1)) \cup T

where {T} is countable, standard, does not meet {\overline{D(1, 1)}}, and consists of isolated points of {S}, such that {d(\zeta, S)} is infinitesimal in {L^\infty}.

So we think of {S} as some sort of generalization of {0}. Away from {S} we have good bounds on {f,f'}:

Lemma 8 (approximating {f,f'} outside {S}) For any standard compact set {K \subset \mathbf C \setminus S}:

  1. Uniformly in {z, w \in K},

    \displaystyle f'(w) = (1 + O(n|z - w|\sigma^2 e^{o(n|z -w|)})) \frac{f'(z)}{(z - \mu)^{n-1}} (w - \mu)^{n-1}.

  2. For every standard {\varepsilon > 0} and uniformly in {z \in K},

    \displaystyle f(z) = f(0) + \frac{1 + O(\sigma^2)}{n} f'(z) (z - \mu) + O((\varepsilon + o(1))^n).

As a consequence, we can show that every zero of {f} which is far from {S} is close to the level set

\displaystyle U_\zeta = \frac{1}{n} \log\frac{1}{|f(0)|}.

This in particular holds for {a}, since the standard part of {a} is {1}, and {T} does not come close to {1} (so neither does {S}). In fact the error term is infinitesimal:

Lemma 9 (zeroes away from {S}) For any standard compact set {K \subset \mathbf C \setminus S}, any standard {\varepsilon > 0}, and any zero {\lambda_0 \in K},

\displaystyle U_\zeta(\lambda_0) = \frac{1}{n} \log \frac{1}{|f(0)|} + O(n^{-1}\sigma^2) + O((\varepsilon + o(1))^n)

uniformly in {\lambda_0}.

Since {a} satisfies the hypotheses of the above lemma,

\displaystyle U_\zeta(\lambda_0) - U_\zeta(a) = O(n^{-1}\sigma^2 + (\varepsilon + o(1))^n)

is infinitesimal. This gives us some more bounds:

Lemma 10 (fine control) For every standard {\varepsilon > 0}:

  1. One has

    \displaystyle \mu, 1 - a = O(\sigma^2 + (\varepsilon + o(1))^n).

  2. For every compact set {I \subseteq \partial D(0, 1) \setminus S} and {e^{i\theta} \in I},

    \displaystyle U_\zeta(a) - U_\zeta(e^{i\theta}) \geq -o(\sigma^2) - o(1)^n.

  3. For every standard smooth function {\varphi: \partial D(0, 1) \rightarrow \mathbf C},

    \displaystyle \int_0^{2\pi} \varphi(e^{i\theta}) U_\zeta(e^{i\theta}) ~d\theta = o(\sigma^2) + o(1)^n.

  4. One has

    \displaystyle U_\zeta(a) = o(\sigma^2) + o(1)^n.

Here Tao claims

\displaystyle \int_0^{2\pi} e^{-2i\theta} \log \frac{1}{|e^{i\theta} - \zeta|} ~d\theta = \frac{\pi}{2}\zeta^2.

Apparently this follows from Fourier inversion but I don’t see it. In any case if we take the expected value of the left-hand side we get

\displaystyle \int_0^{2\pi} \mathbf Ee^{-2i\theta} \log \frac{1}{|e^{i\theta} - \zeta|} ~d\theta = \int_0^{2\pi} e^{-2i\theta} U_\zeta(e^{i\theta}) ~d\theta = o(\sigma^2) + o(1)^n

by the fine control lemma, so

\displaystyle \mathbf E \zeta^2 = o(\sigma^2) + o(1)^n.

In particular this holds for the real part {\mathbf E \text{Re }\zeta^2}. Since {\sigma^2} is infinitesimal, so are the first two moments of the real part of {\zeta}.
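For what it is worth, the Fourier coefficient can be worked out from the series {\log \frac{1}{|e^{i\theta} - \zeta|} = \text{Re} \sum_{k \geq 1} \frac{(e^{-i\theta}\zeta)^k}{k}}, valid for {|\zeta| < 1}: only the {k = 2} term survives integration against {e^{\pm 2i\theta}}. Carrying this out gives {\frac{\pi}{2}\overline{\zeta}^2} for the kernel {e^{-2i\theta}} and {\frac{\pi}{2}\zeta^2} for the kernel {e^{2i\theta}}; the real parts agree, and the real part is all that the argument uses. The sketch below checks this numerically; the sample value of {\zeta} and the function name are hypothetical.

```python
import cmath
import math

def second_fourier_coefficient(zeta, samples=4000):
    """Approximate the integral over [0, 2pi] of e^{-2i theta} * log(1/|e^{i theta} - zeta|) d theta."""
    total = 0j
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        total += cmath.exp(-2j * theta) * math.log(1 / abs(cmath.exp(1j * theta) - zeta))
    return total * (2 * math.pi / samples)

zeta = 0.3 + 0.4j  # hypothetical sample point with |zeta| < 1
value = second_fourier_coefficient(zeta)
print(value)
print(math.pi / 2 * zeta.conjugate() ** 2)  # matches the computed value
print(math.pi / 2 * zeta ** 2)              # same real part, opposite imaginary part
```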

Since {|a - \zeta| \in [1, 2]}, one has

\displaystyle |a - \zeta| - 1 \sim \log |a - \zeta|.

This is true since for any {s \in [1, 2]} one has {\log s \sim s - 1} (which follows by Taylor expansion). In particular,

\displaystyle |1 - \zeta| \leq |1 - a| + |a - \zeta| = 1 + O((1 - a) + \log |a - \zeta|).

Let {\tilde \zeta} be the best approximation of {\zeta} on the arc {\partial D(1, 1) \cap \overline{D(0, 1)}}, which exists since that arc is compact; then

\displaystyle |\zeta - \tilde \zeta| = O((1 - a) + \log|a - \zeta|).

Since {\tilde \zeta \in \partial D(1, 1)}, it has the useful property that

\displaystyle \text{arg }\tilde \zeta \in [\pi/3,\pi/2] \cup [-\pi/2, -\pi/3];

therefore

\displaystyle \text{Re } \tilde \zeta^2 \leq -\frac{1}{2} |\tilde \zeta|^2.
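The geometric fact here is easy to test numerically. Writing points of {\partial D(1, 1)} as {1 + e^{i\varphi}}, the ones inside {\overline{D(0, 1)}} are exactly those with {\varphi \in [2\pi/3, 4\pi/3]}, and on that arc the ratio {\text{Re }\tilde\zeta^2 / |\tilde\zeta|^2} equals {\cos \varphi \leq -1/2}. A minimal sketch (the parametrization and names are my own choices for illustration):

```python
import cmath
import math

def arc_point(phi):
    """The point 1 + e^{i phi} on the circle of radius 1 centred at 1."""
    return 1 + cmath.exp(1j * phi)

worst_ratio = -math.inf
steps = 2000
for k in range(steps + 1):
    phi = 2 * math.pi / 3 + (2 * math.pi / 3) * k / steps  # phi ranges over [2pi/3, 4pi/3]
    z = arc_point(phi)
    if abs(z) < 1e-12:
        continue  # z = 0 (phi = pi) satisfies the inequality trivially
    assert abs(z) <= 1 + 1e-12  # the arc really lies in the closed unit disc
    worst_ratio = max(worst_ratio, (z * z).real / abs(z) ** 2)

print(worst_ratio)  # close to -1/2, attained at the endpoints of the arc
```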

Plugging in the expansion for {\tilde \zeta} we have

\displaystyle \text{Re } \zeta^2 \leq -\frac{1}{2} |\zeta|^2 + O(|\zeta|((1-a) + \log|a -\zeta|) + ((1 - a) + \log|a - \zeta|)^2).

We now use the inequality {2|zw| \leq |z|^2 + |w|^2} several times. First we bound

\displaystyle \frac{1}{2} |\zeta|((1-a) + \log|a -\zeta|) \leq \frac{1}{4}|\zeta|^2 + O((1 - a)^2 + \log^2 |a - \zeta|).

I had to think a bit about why this is legal; the point is that you can absorb the implied constant on {\zeta} into the implied constant on {\log |a - \zeta|} before applying the inequality. Now we bound

\displaystyle ((1 - a) + \log|a - \zeta|)^2 = (1 - a)^2 + 2(1 - a)\log|a - \zeta| + \log^2 |a - \zeta| = O((1-a)^2 + \log^2 |a - \zeta|)

by similar reasoning.

Thus we conclude the bound

\displaystyle \text{Re } \zeta^2 \leq - \frac{1}{4} |\zeta|^2 + O((1-a)^2 + \log^2 |a - \zeta|),

or in other words,

\displaystyle \mathbf E \text{Re }\zeta^2 \leq -\frac{1}{4} \mathbf E |\zeta|^2 + O((1-a)^2+ \mathbf E \log^2 |a - \zeta|).

Applying the fine control lemma, or more precisely the result

\displaystyle 1 - a = O(\sigma^2 + (\varepsilon + o(1))^n),

as well as the fact that {1 - a} is infinitesimal, we have

\displaystyle (1-a)^2 = (1 - a) O(\sigma^2 + (\varepsilon + o(1))^n) = o(\sigma^2) + o((\varepsilon + o(1))^n)

for every standard {\varepsilon > 0}, hence by underspill

\displaystyle (1-a)^2 = o(\sigma^2) + o(1)^n.

By the fine control lemma,

\displaystyle U_\zeta(a) = o(\sigma^2) + o(1)^n.

Thus we bound

\displaystyle \mathbf E \log^2 |a - \zeta| \leq \mathbf E \log |a - \zeta| = -U_\zeta(a) = o(\sigma^2) + o(1)^n

owing to the fact that {|a - \zeta| \in [1, 2]} so that {\log |a - \zeta| \in [0, 1]}.

Plugging in the above bounds,

\displaystyle \mathbf E \text{Re }\zeta^2 \leq -\frac{1}{4} \mathbf E|\zeta|^2 + o(\sigma^2) + o(1)^n.

By definition of variance we have

\displaystyle \mathbf E |\zeta|^2 - |\mu|^2 = \sigma^2

and {\mu} is infinitesimal so we can spend the {o(\sigma^2)} term as

\displaystyle \mathbf E\text{Re }\zeta^2 \leq -\frac{1+o(1)}{4} \mathbf E |\zeta|^2 + o(1)^n.

But the fine control lemma said

\displaystyle \mathbf E\text{Re }\zeta^2 = o(\sigma^2) + o(1)^n.

So

\displaystyle |\mu|^2 + \sigma^2 = o(1)^n.

In particular,

\displaystyle o(\sigma^2) = o(1)^n

since {\mu} is infinitesimal.

We used underspill to show

\displaystyle (1 - a)^2 = o(\sigma^2) + o(1)^n = o(1)^n

so

\displaystyle 1 - \varepsilon_0^n \geq a = 1 - o(1)^n > 1 - \varepsilon_0^n

since {\varepsilon_0} was standard, which implies {0 = 1}.

Next time, we’ll go back and fill in all the lemmata that we skipped in the proof for case one. This is a tricky bit — pages 25 through 34 of Tao’s paper. (For comparison, we covered pages 19 through 21, some of the exposition in pages 24 through 34, and pages 34 through 36 this time). Next time, then.

Let’s Read: Sendov’s conjecture in high degree, part 2: distribution of random zeroes

Before we begin, I want to fuss around with model theory again. Recall that if {z} is a nonstandard complex number, then {z^{(\infty)}} denotes the standard part of {z}, if it exists. We previously defined what it meant for a nonstandard random variable to be infinitesimal in distribution. One can define something similar for any metrizable space with a notion of {0}, where {f} is infinitesimal provided that {d(f, 0)} is. For example, a nonstandard random variable {\eta} is infinitesimal in {L^1_{loc}} if for every compact set {K} that {\eta} can take values in, {||\eta||_{L^1(K)}} is infinitesimal, since {L^1_{loc}} is metrizable with

\displaystyle d(0, \eta) = \sum_{m=1}^\infty 2^{-m} \frac{||\eta||_{L^1(K_m)}}{1 + ||\eta||_{L^1(K_m)}}

whenever {(K_m)} is a compact exhaustion. If {f} is nonstandard, {|f - f^{(\infty)}|} is infinitesimal in some metrizable space {\mathcal T}, and {f^{(\infty)}} is standard, then we call {f^{(\infty)}} the standard part of {f} in {\mathcal T}; the standard part is unique since metrizable spaces are Hausdorff.

If the metrizable space is compact, the case that we will mainly be interested in, then the standard part exists. This is a point that we will use again and again. Passing to the cheap perspective, this says that if {K} is a compact metric space and {(f^{(n)})} is a sequence in {K}, then there is a {f^{(\infty)}} which approximates {f^{(n)}} infinitely often, but that’s just the Bolzano-Weierstrass theorem. Last time we used Prokhorov’s theorem to show that if {\xi} is a nonstandard tight random variable, then {\xi} has a standard part {\xi^{(\infty)}} in distribution.

We now restate and prove Proposition 9 from the previous post.

Theorem 1 (distribution of random zeroes) Let {n} be a nonstandard natural, {f} a monic polynomial of degree {n} with all zeroes in {\overline{D(0, 1)}}, and let {a \in [0, 1]} be a zero of {f}. Suppose that {f'} has no zeroes in {\overline{D(a, 1)}}. Let {\lambda} be a random zero of {f} and {\zeta} a random zero of {f'}. Then:

  1. If {a^{(\infty)} = 0} (case zero), then {\lambda^{(\infty)}} and {\zeta^{(\infty)}} are identically distributed and almost surely lie in the curve

    \displaystyle C = \{e^{i\theta}: 2\theta \in [\pi, 3\pi]\}.

    In particular, {d(\lambda, C) = o(1)} in probability. Moreover, for every compact set {K \subseteq \overline{D(0, 1)} \setminus C},

    \displaystyle \mathbf P(\lambda \in K) = O\left(a + \frac{\log n}{n^{1/3}}\right).

  2. If {a^{(\infty)} = 1} (case one), then {\lambda^{(\infty)}} is uniformly distributed on the unit circle {\partial D(0, 1)} and {\zeta^{(\infty)}} is almost surely zero. Moreover,

    \displaystyle \mathbf E \log \frac{1}{|\lambda|}, \mathbf E\log |\zeta - a| = O(n^{-1}).

1. Moment-generating functions and balayage

We first show that {\lambda^{(\infty)}} and {\zeta^{(\infty)}} have equal moment-generating functions in a suitable sense.

To do this, we first show that they have the same logarithmic potential. Let {\eta} be a random variable such that {|\eta| = O(1)} almost surely (that is, {\eta} is almost surely bounded). Then the logarithmic potential

\displaystyle U_\eta(z) = \mathbf E \log \frac{1}{|z - \eta|}

is defined almost everywhere as we discussed last time, and is harmonic outside of the essential range of {\eta}.

Lemma 2 Let {\eta} be a nonstandard, almost surely bounded, random complex number. Then the standard part of {U_\eta} is {U_{\eta^{(\infty)}}} according to the topology of {L^1_{loc}} under Lebesgue measure.

Proof: We pass to the cheap perspective. If we instead have a random sequence of {\eta_j} and {\eta_j \rightarrow \eta} in distribution, then {U_{\eta_j} \rightarrow U_\eta} in {L^1_{loc}}, since up to a small error in {L^1_{loc}} we can replace {\log} with a test function {g}; one then has

\displaystyle \lim_{j \rightarrow \infty} \iint_{K \times \mathbf C} g\left(\frac{1}{|z - w|}\right) ~d\mu_j(w) ~dz = \iint_{K \times \mathbf C} g\left(\frac{1}{|z - w|}\right) ~d\mu(w) ~dz

where {\mu_j \rightarrow \mu} in the weak topology of measures, {\mu_j} is the distribution of {\eta_j}, {\mu} is the distribution of {\eta}, and {K} is a compact set equipped with Lebesgue measure. \Box

Lemma 3 For every {1 < |z| \leq 3/2}, we have

\displaystyle U_\lambda(z) - U_\zeta(z) = O\left(\frac{1}{n} \log \frac{1}{|z| - 1}\right).

In particular, {U_{\lambda^{(\infty)}}(z) = U_{\zeta^{(\infty)}}(z)}.

Proof: By definition, {\lambda \in D(0, 1)}, so {z - \lambda \in D(z, 1)}. Now {D(z, 1)} is a disc with diameter {T([|z| - 1, |z| + 1])} where {T} is a rotation around the origin. Taking reciprocals preserves discs and preserves {T}, so {(z - \lambda)^{-1}} sits inside a disc {W} with a diameter {T[(|z|+1)^{-1}, (|z|-1)^{-1}]}. Then {W} is convex, so the expected value of {(z - \lambda)^{-1}} also lies in {W}. Therefore the Stieltjes transform

\displaystyle s_\lambda(z) = \mathbf E \frac{1}{z - \lambda}

satisfies {s_\lambda(z) \in W}. In particular,

\displaystyle \log |s_\lambda(z)| \in \left[\log \frac{1}{|z| + 1}, \log \frac{1}{|z| - 1}\right].

But we showed that

\displaystyle U_\lambda(z) - \frac{n - 1}{n} U_\zeta(z) = \frac{1}{n} \log |s_\lambda(z)|

almost everywhere last time. This implies that for almost every {z},

\displaystyle -\frac{\log(|z| + 1)}{n} \leq U_\lambda(z) - \frac{n - 1}{n}U_\zeta(z) \leq -\frac{\log(|z| - 1)}{n}

but all terms here are continuous so we can promote this to a statement that holds for every {z}. In particular,

\displaystyle U_\lambda(z) - \frac{n - 1}{n} U_\zeta(z) = O\left(\frac{1}{n} \log \frac{1}{|z|-1}\right)

hence

\displaystyle U_\lambda(z) - U_\zeta(z) = O\left(\frac{1}{n} U_\zeta(z) + \frac{1}{n} \log \frac{1}{|z|-1}\right).

Since {|\zeta| \leq 1} while {1 < |z| \leq 3/2}, we have {|z| - 1 \leq |z - \zeta| \leq 5/2}. Taking logarithms and expectations, {-\log(5/2) \leq U_\zeta(z) \leq -\log(|z| - 1)}, and since {-\log(|z| - 1) \geq \log 2}, this gives {U_\zeta(z) = O(-\log(|z| - 1))}. This implies the first claim.

To derive the second claim from the first, we use the previous lemma, which implies that we must show that

\displaystyle \log \frac{1}{|z| - 1} = O(n)

in {L^1_{loc}}. But this follows since {-\log|\cdot|} is integrable in two dimensions. \Box
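Since the identity relating {U_\lambda}, {U_\zeta} and {s_\lambda} does all the work here, I found it reassuring to test it numerically. This is only a sketch in the cheap perspective, with a small standard {n}: the list of zeroes is a made-up example of mine, and the function names are my own.

```python
import numpy as np

# Hypothetical example: zeroes of a monic polynomial f, all in the closed unit disc.
zeros = np.array([1.0, -0.5 + 0.5j, -0.5 - 0.5j, 0.3j, -0.3j, 0.8])
n = len(zeros)
coeffs = np.poly(zeros)                 # coefficients of f
crit = np.roots(np.polyder(coeffs))     # zeroes of f'

def log_potential(points, z):
    """U(z) = E log(1/|z - eta|) for eta uniform on `points`."""
    return float(np.mean(-np.log(np.abs(z - points))))

def stieltjes(points, z):
    """s(z) = E 1/(z - eta) for eta uniform on `points`."""
    return complex(np.mean(1.0 / (z - points)))

def identity_gap(z):
    """|U_lambda - (n-1)/n U_zeta - (1/n) log|s_lambda|| at the point z."""
    lhs = log_potential(zeros, z) - (n - 1) / n * log_potential(crit, z)
    rhs = np.log(abs(stieltjes(zeros, z))) / n
    return abs(lhs - rhs)

def within_lemma_bounds(z):
    """Check the two-sided bound from the proof of Lemma 3 at a point with |z| > 1."""
    val = log_potential(zeros, z) - (n - 1) / n * log_potential(crit, z)
    lower = -np.log(abs(z) + 1) / n
    upper = -np.log(abs(z) - 1) / n
    return lower - 1e-9 <= val <= upper + 1e-9
```

Evaluating `identity_gap` at a few points with {1 < |z| \leq 3/2} should give something at the level of rounding error.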

Lemma 4 Let {\eta} be an almost surely bounded random variable. Then

\displaystyle U_\eta(Re^{i\theta}) = -\log R + \frac{1}{2} \sum_{m \neq 0} \frac{e^{im\theta}}{|m| R^{|m|}} \mathbf E\eta^{|m|}.

Proof: One has the Taylor series

\displaystyle \log \frac{1}{|Re^{i\theta} - w|} = -\log R + \frac{1}{2} \sum_{m \neq 0} \frac{e^{im\theta} w^{|m|}}{|m| R^{|m|}}.

Indeed, by rescaling and using {\log(ab) = \log a + \log b}, we may assume {R = 1}. The summands expand as

\displaystyle \text{Re }\frac{e^{im\theta} w^{|m|}}{|m| R^{|m|}} = \frac{w^{|m|} \cos |m|\theta}{|m|}

and the imaginary parts all cancel by symmetry about {0}. Using the symmetry about {0} again we get

\displaystyle -\log R + \frac{1}{2} \sum_{m \neq 0} \frac{e^{im\theta} w^{|m|}}{|m| R^{|m|}} = \sum_{m=1}^\infty \frac{w^{|m|} \cos |m|\theta}{|m|}.

This equals the left-hand side as long as {|w| < R}. Taking expectations and commuting the expectation with the sum using Fubini’s theorem (since {\eta} is almost surely bounded), we see the claim. \Box
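As a sanity check on the expansion, here is a small numerical comparison. It is a sketch that only covers real {w} with {|w| < R}, which is the situation where {w^{|m|}} is real and the cosine form of the series applies directly; the function names are mine.

```python
import math

def log_kernel(R, theta, w):
    """Left-hand side: log(1/|R e^{i theta} - w|) for real w."""
    z = complex(R * math.cos(theta), R * math.sin(theta))
    return -math.log(abs(z - w))

def cosine_series(R, theta, w, terms=200):
    """-log R + sum_{m >= 1} (w/R)^m cos(m theta) / m, truncated after `terms` terms."""
    total = -math.log(R)
    for m in range(1, terms + 1):
        total += (w / R) ** m * math.cos(m * theta) / m
    return total
```

The truncation error is geometric in {|w|/R}, so a couple hundred terms is plenty unless {|w|} is very close to {R}.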

Lemma 5 For all {m \geq 1}, one has

\displaystyle \mathbf E\lambda^m - \mathbf E\zeta^m = O\left(\frac{m \log m}{n}\right).

In particular, {\lambda^{(\infty)}} and {\zeta^{(\infty)}} have identical moments.

Proof: If we take {1 < R \leq 3/2} then we conclude that

\displaystyle \sum_{m \neq 0} \frac{e^{im\theta}}{|m| R^{|m|}} \mathbf E\lambda^{|m|} - \sum_{m \neq 0} \frac{e^{im\theta}}{|m| R^{|m|}} \mathbf E\zeta^{|m|} = O\left(\frac{1}{n} \log \frac{1}{R - 1}\right).

The left-hand side is a Fourier series, and by uniqueness of Fourier series it holds that for every {m},

\displaystyle \frac{e^{im\theta}}{|m| R^{|m|}} \mathbf E(\lambda^{|m|} - \zeta^{|m|}) = O\left(\frac{1}{n} \log \frac{1}{R - 1}\right).

This gives a bound on the difference of moments

\displaystyle \mathbf E\lambda^m - \mathbf E\zeta^m = O\left(\frac{m R^m}{n} \log \frac{1}{R - 1}\right)

which is only possible if the moments of {\lambda^{(\infty)}} and {\zeta^{(\infty)}} are identical. The left-hand side doesn’t depend on {R}, so we are free to choose {R}: if {m \geq 2} and {R = 1 + 1/m}, then {R^m \leq e} and {-\log(R - 1) = \log m}, so the claim holds. On the other hand, if {m = 1} then this claim still holds, since we showed last time that

\displaystyle \mathbf E\lambda = \mathbf E\zeta

and obviously {1 \log 1 = 0}. \Box
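To see what the choice {R = 1 + 1/m} buys us, one can tabulate the factor {m R^m \log \frac{1}{R - 1}} from the bound on the difference of moments. This is just a sketch with a helper name of my own; the point is that {(1 + 1/m)^m} never exceeds {e}, so the factor is at most a constant times {m \log m}.

```python
import math

def moment_bound_factor(m):
    """m * R^m * log(1/(R - 1)) evaluated at R = 1 + 1/m."""
    R = 1 + 1 / m
    return m * R ** m * math.log(1 / (R - 1))

def growth_factor(m):
    """R^m at R = 1 + 1/m; increases to e as m grows."""
    return (1 + 1 / m) ** m
```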

Here I was puzzled for a bit. Surely if two random variables have the same moment-generating function then they are identically distributed! But, while we can define the moment-generating function of a random variable as a formal power series {F}, it is not true that {F} has to have a positive radius of convergence, in which case the inverse Laplace transform of {F} is ill-defined. Worse, the circle is not simply connected, and in case one, we have to look at a uniform distribution on the circle, whose moments therefore aren’t going to points on the circle, so the moment-generating function doesn’t tell us much.

2. Balayage

We recall the definition of the Poisson kernel {P}:

\displaystyle P(Re^{i\theta}, re^{i\alpha}) = \sum_{m = -\infty}^\infty \frac{r^{|m|}}{R^{|m|}} e^{im(\theta - \alpha)}

whenever {0 < r < R} is a radius. Convolving the Poisson kernel against a continuous function {g} on {\partial B(0, R)} solves the Dirichlet problem of {B(0, R)} with boundary data {g}.
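For what it's worth, the series above sums to the familiar closed form {(1 - \rho^2)/(1 - 2\rho\cos(\theta - \alpha) + \rho^2)} with {\rho = r/R}, and its integral over the circle in {\theta} is {2\pi}. Here is a small numerical sketch of both facts; the function names and the truncation and sampling parameters are my own choices.

```python
import math

def poisson_series(R, theta, r, alpha, terms=400):
    """Truncated sum over m of (r/R)^{|m|} e^{i m (theta - alpha)}, written with cosines."""
    rho = r / R
    total = 1.0
    for m in range(1, terms + 1):
        total += 2 * rho ** m * math.cos(m * (theta - alpha))
    return total

def poisson_closed(R, theta, r, alpha):
    """Closed form of the same kernel."""
    rho = r / R
    return (1 - rho ** 2) / (1 - 2 * rho * math.cos(theta - alpha) + rho ** 2)

def total_mass(R, r, alpha, samples=2000):
    """Equally spaced quadrature of the kernel over theta in [0, 2 pi)."""
    h = 2 * math.pi / samples
    return h * sum(poisson_closed(R, k * h, r, alpha) for k in range(samples))
```

So with this normalization the kernel is a probability density only after dividing by {2\pi}.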

Definition 6 Let {\eta \in D(0, R)} be a random variable. The balayage of {\eta} is

\displaystyle \text{Bal}(\eta)(Re^{i\theta}) = \mathbf EP(Re^{i\theta}, \eta).

Balayage is a puzzling notion. First, the name refers to a hair-care technique, which is kind of unhelpful. According to Tao, we’re supposed to interpret balayage as follows.

If {w_0 \in B(0, R)} is an initial datum for Brownian motion {w}, then {P(Re^{i\theta}, w_0)} is the probability density of the first location {Re^{i\theta}} where {w} passes through {\partial B(0, R)}. Tao asserts this without proof, but conveniently, this was a problem in my PDE class last semester. The idea is to approximate {\mathbf R^2} by the lattice {L_\varepsilon = \varepsilon \mathbf Z^2}, which we view as a graph where each vertex has degree {4}, with one edge to each of the vertices directly above, below, left, and right of it. Then the Laplacian on {\mathbf R^2} is approximated by the graph Laplacian on {L_\varepsilon}, and Brownian motion is approximated by the discrete-time stochastic process wherein a particle starts at the vertex that best approximates {w_0} and at each stage has a {1/4} chance of moving to each of the vertices adjacent to its current position.

So suppose that {w_0} and {Re^{i\theta}} are actually vertices of {L_\varepsilon}. The probability density {P_\varepsilon(Re^{i\theta}, w_0)} is harmonic in {w_0} with respect to the graph Laplacian since it is the mean of {P_\varepsilon(Re^{i\theta}, w)} as {w} ranges over the adjacent vertices to {w_0}; therefore it remains harmonic as we take {\varepsilon \rightarrow 0}. The boundary conditions follow similarly.

Now if {\eta} is a random initial datum for Brownian motion which starts in {D(0, R)}, the balayage of {\eta} is again a probability density on {\partial B(0, R)} that records where one expects the Brownian motion to escape, but this time the initial datum is also random.

I guess the point is that balayage serves as a substitute for the moment-generating function in the event that the latter is just a formal power series. We want to be able to use analytic techniques on the moment-generating function, but we can’t, so we just use balayage instead.

Let {\psi} be the balayage of {\eta}. Since {\eta} is bounded, we can use Fubini’s theorem to commute the expectation with the sum and see that

\displaystyle \psi(Re^{i\theta}) = \sum_{m=-\infty}^\infty R^{-|m|} e^{im\theta} \mathbf E(r^{|m|} e^{-im\alpha}) = 1 + 2\sum_{m=1}^\infty R^{-|m|} \cos m\theta \mathbf E(r^{|m|} \cos m\alpha)

provided that {\eta = re^{i\alpha}}. It will be convenient to rewrite this in the form

\displaystyle \psi(Re^{i\theta}) = 1 + 2\text{Re} \sum_{m=1}^\infty R^{-m}e^{im\theta} \mathbf E\eta^m

so {\psi} is uniquely determined by the moment-generating function of {\eta}. In particular, {\lambda^{(\infty)}} and {\zeta^{(\infty)}} have identical balayage, and one has a bound

\displaystyle \text{Bal}(\lambda)(Re^{i\theta}) - \text{Bal}(\zeta)(Re^{i\theta}) = O\left(\frac{1}{n}\sum_{m=1}^\infty \frac{m \log m}{R^m}\right).

We claim that

\displaystyle \sum_{m=1}^\infty \frac{m \log m}{R^m} = O\left(-\frac{\log(R-1)}{(R - 1)^2}\right)

which implies the bound

\displaystyle \text{Bal}(\lambda)(Re^{i\theta}) - \text{Bal}(\zeta)(Re^{i\theta}) = O\left(\frac{1}{n}\frac{\log\frac{1}{R-1}}{(R - 1)^2}\right).

To see this, we discard the {m = 1} term since {1 \log 1 = 0}, which implies that

\displaystyle \sum_{m=1}^\infty \frac{m \log m}{R^m} = \sum_{M=1}^\infty \sum_{m=2^M}^{2^{M+1} - 1} \frac{m \log m}{R^m}.

Up to a constant factor we may assume that the logarithms are base {2} in which case we get a bound

\displaystyle \sum_{m=1}^\infty \frac{m \log m}{R^m} \leq C\sum_{M=1}^\infty \frac{M2^M}{R^{2^M}}.

The constant is absolute since {R \in (1, 3/2]}.

By the integral test, we get a bound

\displaystyle \sum_{M=1-\log(R-1)}^\infty \frac{M2^M}{R^{2^M}} \leq C\int_{-\log(R-1)}^\infty \frac{x2^x}{R^{2^x}} ~dx \leq C\int_{-\log(R-1)}^\infty \frac{2^{(1+\varepsilon)x}}{R^{2^x}} ~dx.

Using the bound

\displaystyle \int_{1/(R-1)}^\infty \frac{dy}{R^y} \leq CR^{-1/(R-1)} \leq C2^{-1/(R-1)}

and the change of variable {y = 2^x} (thus {dy = 2^x \log 2 ~dx}), we get a bound

\displaystyle \sum_{M=1-\log(R-1)}^\infty \frac{M2^M}{R^{2^M}} \leq C \int_{1/(R-1)}^\infty \frac{dy}{R^y} \leq C2^{-1/(R-1)}

since the {\varepsilon} error in the exponent can’t affect the exponential decay of the integral in {1/(R-1)}. Since we certainly have

\displaystyle 2^{-1/(R-1)} \leq C\frac{-\log(R-1)}{(R-1)^2}

this is a suitable tail bound.

To complete the proof of the claim we need to bound the main term. To this end we bound

\displaystyle \sum_{M=1}^{-\log(R-1)} \frac{M2^M}{R^{2^M}} \leq \log\frac{1}{R-1} \sup_{x > 0} \frac{x2^x}{R^{2^x}} = \log\frac{1}{R-1} 2 \uparrow \sup_y \frac{y \log y}{R^y}.

Here {\alpha \uparrow \beta = \alpha^\beta} denotes exponentiation. Now if {R - 1} is small enough (say {R - 1 < 3/4}), this supremum will be attained when {x > 1}, thus {y \log y \leq 2y}. Therefore

\displaystyle \sum_{M=1}^{-\log(R-1)} \frac{M2^M}{R^{2^M}} \leq \left(2\uparrow \sup_{y > 0} \frac{y}{R^y}\right)^2 \log\frac{1}{R-1} .

Luckily {yR^{-y}} is easy to differentiate: its critical point is {1/y = \log R}. This gives

\displaystyle \sup_{y > 0} \frac{y}{R^y} \leq \log \frac{1}{R - 1}

so

\displaystyle \left(2\uparrow \sup_{y > 0} \frac{y}{R^y}\right)^2 \leq \frac{1}{(R-1)^2}

which was the bound we needed, and proves the claim. Maybe there’s an easier way to do this, because Tao says the claim is a trivial consequence of dyadic decomposition.
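Since I don't fully trust my own bookkeeping above, here is a quick numerical sketch of the claim itself: it compares a truncated version of {\sum_m m \log m / R^m} against {\log\frac{1}{R-1}/(R-1)^2} for a few values of {R \in (1, 3/2]}. The cutoff and the sample values of {R} are my own choices, and of course a finite table proves nothing.

```python
import math

def weighted_sum(R):
    """Truncated sum of m*log(m)/R^m; beyond the cutoff the terms are negligibly small."""
    cutoff = int(60 / (R - 1)) + 10
    return sum(m * math.log(m) / R ** m for m in range(2, cutoff))

def claimed_bound(R):
    """log(1/(R-1)) / (R-1)^2, the right-hand side of the claim without the constant."""
    return math.log(1 / (R - 1)) / (R - 1) ** 2

# Ratio of the sum to the claimed bound at R = 1 + 2^{-k}, k = 1, ..., 8.
ratios = [weighted_sum(1 + 2.0 ** -k) / claimed_bound(1 + 2.0 ** -k) for k in range(1, 9)]
```

The ratios stay bounded as {R \rightarrow 1}, which is all the claim asserts.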

Let’s interpret the bound that we just proved. Well, if the balayage of {\eta} is supposed to describe the point on the circle {\partial B(0, R)} at which a Brownian motion with random initial datum {\eta} escapes, a bound on a difference of two balayages should describe how the trajectories diverge after escaping. In this case, the divergence is infinitesimal, but at different speeds depending on {R}. As {R \rightarrow 1}, our infinitesimal divergence gains a positive standard part, while if {R} stays close to {3/2}, the divergence remains infinitesimal. This makes sense, since if we take a bigger circle we forget more and more about the fact that {\zeta,\lambda} are not the same random variable, since Brownian motion has more time to “forget more stuff” as it just wanders around aimlessly. So in the regime where {R} is close to {3/2}, it is reasonable to take standard parts and pass to {\zeta^{(\infty)}} and {\lambda^{(\infty)}}, while in the regime where {R} is close to {1} this costs us dearly.

3. Case zero

Suppose that {a} is infinitesimal.

We showed last time that {\zeta \in \overline{D(0, 1)} \setminus \overline{D(a, 1)}}, so {d(\zeta, C) = O(a)} is infinitesimal. Therefore {\zeta^{(\infty)} \in C} almost surely.

I think there’s a typo here, because Tao lets {K} range over {D(0, 1) \setminus C} and considers points {e^{i\theta} \in D(0, 1) \setminus C}, which don’t exist since {|e^{i\theta}| = 1} while every point in {D(0, 1)} has {|\cdot| < 1}. I think this can be fixed by taking closures, which is what I do in the next lemma.

Tao proves a “qualitative” claim and then says that by repeating the argument and looking out for constants you can get a “quantitative” version which is what he actually needs. I’m just going to prove the quantitative argument straight-up. The idea is that if {K} is a compact set which misses {C} and {\lambda \in K} then a Brownian motion with initial datum {\lambda} will probably escape through an arc {J} which is close to {K}, but {J} is not close to {C} so a Brownian motion which starts at {\zeta} will probably not escape through {J}. Therefore {\lambda,\zeta} have very different balayage, even though the difference in their balayage was already shown to be infinitesimal.

I guess this shows the true power of balayage: even though the moment-generating function is “just” a formal power series, we know that the essential supports of {\lambda,\zeta} must “look like each other” up to rescaling in radius. This still holds in case one, where one of them is a circle and the other is the center of the circle. Either way, you get the same balayage, since whether you start at some point on a circle or you start in the center of the circle, if you’re a Brownian motion you will exhibit the same long-term behavior.

In the following lemmata, let {K \subset \overline{D(0, 1)} \setminus C} be a compact set. The set {\{\theta \in (-\pi/2, \pi/2): e^{i\theta} \in K\}} is compact since it is the preimage of a compact set, so it is contained in a compact interval {I_K \subseteq (-\pi/2, \pi/2)}.

Lemma 7 One has

\displaystyle \inf_{w \in K} \int_{I_K} P(Re^{i\theta}, w) ~d\theta > 0.

Proof: Since {K} is compact the infimum is attained. Let {w} be a minimizer. Since {P} is a real-valued harmonic function of {w}, so that

\displaystyle \Delta \int_{I_K} P(Re^{i\theta}, w) ~d\theta = \int_{I_K} \Delta P(Re^{i\theta}, w) ~d\theta = 0,

the maximum principle implies that the worst case is when {K} meets {\partial D(0, R)} and {w \in \partial D(0, R)}, say {w = Re^{i\alpha}}. Then

\displaystyle P(Re^{i\theta}, w) = \sum_{m=-\infty}^\infty e^{im(\theta - \alpha)}.

Of course this is just a formal power series and doesn’t make much sense. But if instead {w = re^{i\alpha}} where {r/R} is very small depending on a given {\varepsilon > 0}, then, after discarding quadratic terms in {r/R},

\displaystyle P(Re^{i\theta}, w) \leq \frac{1 + \varepsilon}{1 - 2(r/R)\cos(\theta - \alpha)}.

This follows since in general

\displaystyle P(Re^{i\theta}, w) = \frac{1 - (r/R)^2}{1 - 2(r/R) \cos(\theta - \alpha) + (r/R)^2}.

Now

\displaystyle \int_{I_K^c} \frac{d\theta}{1 - 2(r/R)\cos(\theta - \alpha)} < \pi

since the integrand is maximized when {\cos(\theta - \alpha) = 0}, in which case the integrand evaluates to the measure of {I_K^c}, which is {< \pi} since {I_K^c = (-\pi/2, \pi/2) \setminus I_K} and {I_K} has positive measure. Therefore

\displaystyle \int_{I_K^c} P(Re^{i\theta}, w) ~d\theta < \frac{3\pi}{2}.

On the other hand, for any {w} one has

\displaystyle \int_{-\pi/2}^{\pi/2} P(Re^{i\theta}, w) ~d\theta = 2\pi,

so this gives a lower bound on the integral over {I_K}. \Box

Lemma 8 If {1 < R \leq 3/2} then

\displaystyle \mathbf P(\lambda \in K) \leq C_K\left(a + R - 1 + \frac{\log \frac{1}{R - 1}}{n(R-1)^2} \right).

Proof: Let {w = \lambda} in the previous lemma, conditioning on the event {\lambda \in K}, to see that

\displaystyle \int_{I_K} P(Re^{i\theta}, \lambda) ~d\theta \geq \delta_K

where {\delta_K > 0}. Taking expectations and dividing by the probability that {\lambda \in K}, we can use Fubini’s theorem to deduce

\displaystyle \mathbf P(\lambda \in K) \leq C_K \int_{I_K} \text{Bal}(\lambda)(Re^{i\theta}) ~d\theta

where {C_K\delta_K = 1}. Applying the bound on {|\text{Bal}(\lambda) - \text{Bal}(\zeta)|} from the section on balayage, we deduce

\displaystyle \mathbf P(\lambda \in K) \leq C_K \int_{I_K} \text{Bal}(\zeta)(Re^{i\theta}) ~d\theta + C_K\frac{\log\frac{1}{R-1}}{n(R-1)^2}.

We already showed that {d(\zeta, C) = O(a)}. So in order to show

\displaystyle \int_{I_K} \text{Bal}(\zeta)(Re^{i\theta}) ~d\theta \leq C_K(a + R - 1),

which was the bound that we wanted, it suffices to show that for every {re^{i\alpha}} such that {d(re^{i\alpha}, C) = O(a)},

\displaystyle \int_{I_K} P(Re^{i\theta}, re^{i\alpha}) ~d\theta \leq C_K(a + R - 1).

Tao says that “one can show” this claim, but I wasn’t able to do it. I think the point is that under those circumstances one has {r = R - O(a)} and {\cos \alpha \ll a} even as {\cos \theta \gg 0}, so we have some control on {\cos(\theta - \alpha)}. In fact I was able to compute

\displaystyle \int_{I_K} P(Re^{i\theta}, re^{i\alpha}) ~d\theta = -\sum_m (r/R)^{|m|}\frac{e^{-im(\alpha + \delta)} + e^{-im(\alpha - \delta)}}{m}

which suggests that this is the right direction, but the bounds I got never seemed to go anywhere. Someone bug me in the comments if there’s an easy way to do this that I somehow missed. \Box

Now we take {R = 1 + n^{-1/3}} to complete the proof.
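To see why {R = 1 + n^{-1/3}} is the right choice, one can plug it into the bound of Lemma 8 and compare with the rate {a + \frac{\log n}{n^{1/3}}} from Theorem 1. Here is a sketch that ignores the constant {C_K} and treats {n} as a large standard natural (the cheap perspective); the function names are mine.

```python
import math

def lemma8_error(n, R, a=0.0):
    """a + (R - 1) + log(1/(R-1)) / (n (R-1)^2), i.e. Lemma 8 without the constant C_K."""
    return a + (R - 1) + math.log(1 / (R - 1)) / (n * (R - 1) ** 2)

def theorem_rate(n, a=0.0):
    """a + log(n) / n^{1/3}, the rate claimed in case zero of Theorem 1."""
    return a + math.log(n) / n ** (1 / 3)

def error_at_chosen_radius(n, a=0.0):
    """Lemma 8's bound at R = 1 + n^{-1/3}; equals a + n^{-1/3} (1 + log(n)/3)."""
    return lemma8_error(n, 1 + n ** (-1 / 3), a)
```

At this radius the terms {R - 1} and {\frac{\log\frac{1}{R-1}}{n(R-1)^2}} are of the same size up to the logarithm, which is the usual sign that the parameter has been balanced.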

4. Case one

Suppose that {1 - a} is infinitesimal. Let {\mu} be the expected value of {\lambda} (hence also of {\zeta}). Let {0 < \delta \leq 1/2} be a standard real.

We first need to go on an excursion to a paper of Dégot, who proves the following theorem:

Lemma 9 One has

\displaystyle |f'(a)| \geq cn |f(\delta)|.

Moreover,

\displaystyle |f(\delta)| \leq (1 + \delta^2 - 2\delta \text{Re }\mu)^{n/2}.

I will omit the proof since it takes some complex analysis I’m pretty unfamiliar with. It seems to need Grace’s theorem, which I guess is a variant of one of the many theorems in complex analysis that says that the polynomial image of a disk is kind of like a disk. It also uses some theorem called the Walsh contraction principle that involves polynomials on the projective plane. Curious.

In what follows we will say that an event {E} is standard-possible if the probability that {E} happens has positive standard part.

Lemma 10 For every {\varepsilon > 0}, {\mathbf P(\text{Re }\zeta \leq \varepsilon)} is standard-possible. Besides, {|f'(a)| > n}.

Proof: Since {|\zeta - a| > 1} almost surely and

\displaystyle U_\zeta(a) = \frac{\log n}{n - 1} - \frac{1}{n - 1} \log |f'(a)|

but

\displaystyle U_\zeta(a) = -\mathbf E \log |\zeta - a| < 0,

we have

\displaystyle |f'(a)| > n.

Combining this with the lemma we see that the standard part of {|f(\delta)|} is {> 0}, so

\displaystyle 1^{1/n} \leq O(\sqrt{1 + \delta^2 - 2\delta\text{Re }\mu}).

On the other hand,

\displaystyle 1 - O(n^{-1}) \leq 1^{1/n}

and since {n} is nonstandard, {1/n} is infinitesimal, so the constant in {O(\sqrt{1 + \delta^2 - 2\delta\text{Re }\mu})} gets eaten. In particular,

\displaystyle 1 - O(n^{-1}) \leq \sqrt{1 + \delta^2 - 2\delta\text{Re }\mu}

which implies that

\displaystyle 1 + o(1) \leq 1 + \delta^2 - 2\delta\text{Re }\mu

and hence

\displaystyle \text{Re }\mu \leq \frac{\delta}{2} + o(1).

Since this is true for arbitrary standard {\delta}, underspill implies that there is an infinitesimal {\kappa} such that

\displaystyle \text{Re }\mu \leq \kappa.

But {|\text{Re }\zeta| \leq 1} almost surely, and we just showed

\displaystyle \mathbf E\text{Re }\zeta \leq \kappa.

So the claim holds. \Box

We now allow {\delta} to take the value {0}, thus {0 \leq \delta \leq 1/2}.

Lemma 11 One has

\displaystyle |f(0)| \sim |f(\delta)| \sim 1

and

\displaystyle |f'(a)| \sim n.

Moreover, {|f(z)| \sim 1} if {|z - 1/2| < 1/100}, so {f} has no zeroes {z} in that disk.

Proof: Since

\displaystyle \mathbf E \log\frac{1}{|z - \zeta|} = \frac{\log n}{n - 1} - \frac{1}{n - 1} \log |f'(z)|

one has

\displaystyle \log |f'(a)| - \log |f'(\delta)| = (n-1)\mathbf E \log \frac{|a - \zeta|}{|\delta - \zeta|}.

Now {|a - \zeta| \geq 1} and {|\zeta| \leq 1}.

Here I drew two unit circles in {\mathbf C}, one centered at the origin and one at {1} (since {|a - 1|} is infinitesimal); {\zeta} is (up to infinitesimal error) in the first circle and out of the second. The rightmost points of intersection between the two circles are on a vertical line which by the Pythagorean theorem is to the left of the vertical line {x = a/2}, which in turn is to the left of the perpendicular bisector {x = (a+\delta)/2} of {[\delta, a]}. Thus {|a - \zeta| \geq |\delta - \zeta|}, and if {|\delta - \zeta| = |a - \zeta|} then the real part of {\zeta} is {(a+\delta)/2}. In particular, if the standard real part of {\zeta} is {< 1/2} then {|a - \zeta| > |\delta - \zeta|}, so {\log |a - \zeta|/|\delta - \zeta|} has positive standard part.
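If you don't feel like drawing the picture, the inequality {|a - \zeta| \geq |\delta - \zeta|} can also be checked by brute force on a grid. This is only a sketch: it uses standard values of {a} and {\delta} that I picked, and the function names are mine.

```python
import math

def admissible(zeta, a):
    """True if zeta lies in the closed unit disc but outside the open disc D(a, 1)."""
    return abs(zeta) <= 1 and abs(zeta - a) >= 1

def smallest_gap(a, delta, steps=200):
    """Smallest value of |a - zeta| - |delta - zeta| over admissible grid points zeta."""
    worst = math.inf
    for i in range(steps + 1):
        for j in range(steps + 1):
            zeta = complex(-1 + 2 * i / steps, -1 + 2 * j / steps)
            if admissible(zeta, a):
                worst = min(worst, abs(a - zeta) - abs(delta - zeta))
    return worst
```

The gap is bounded away from zero as soon as {\delta > 0}, and only degenerates when {\delta = 0} and {\zeta} approaches the intersection points of the two circles.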

By the previous lemma, it is standard-possible that the standard real part of {\zeta} is {\leq 1/4 < 1/2}, so it is standard-possible that {\log|a-\zeta|/|\delta - \zeta|} is bounded below by a positive standard constant, while this quantity is almost surely nonnegative. Therefore {\mathbf E \log|a-\zeta|/|\delta - \zeta|} has positive standard part. Plugging into the above we deduce the existence of a standard absolute constant {c > 0} such that

\displaystyle \log |f'(a)| - \log |f'(\delta)| \geq cn.

In particular,

\displaystyle f'(\delta) \leq |f'(\delta)| \leq e^{-cn} |f'(a)|.

Keeping in mind that {|f'(a)| > n} is nonstandard, this doesn’t necessarily mean that {f'(\delta)} has nonpositive standard part, but it does give a pretty tight bound. Taking a first-order Taylor approximation we get

\displaystyle f(0) = f(\delta) + O(e^{-cn}|f'(a)|).

But one has

\displaystyle |f'(a)| \geq cn |f(\delta)|

from the Dégot lemma. Clearly this term dominates {e^{-cn}|f'(a)|} so we have

\displaystyle |f(0)| \geq \frac{c}{n} |f'(a)|.

Since one has a lower bound {|f'(a)| > n} this implies {|f(0)|} is controlled from below by an absolute constant.

We also claim {|f(0)| \leq 1}. In fact, we showed last time that

\displaystyle -U_\lambda(0) = \frac{1}{n} \log |f(0)|;

we want to show that {\log |f(0)| \leq 0}, so it suffices to show that {U_\lambda(0) \geq 0}, or in other words that

\displaystyle \mathbf E \log |\lambda| \leq 0.

Since {|\lambda| \leq 1} by assumption on {f}, this is trivial. We deduce that

\displaystyle |f(0)| \sim |f(\delta)| \sim 1

and hence

\displaystyle |f'(a)| \sim n.

Now Tao claims that the proof that {|f(z)| \sim 1} is similar, if {|z - 1/2| < 1/100}. Since {\delta = 1/2} was a valid choice of {\delta} we have {|f(1/2)| \sim 1}. Since {|z - 1/2| < 1/100}, if {\text{Re }\zeta \leq 1/4} then {|a - \zeta|/|z - \zeta| \geq c > 1} where {c} is an absolute constant. Applying the fact that {\text{Re }\zeta \leq 1/4} is standard-possible and {\mathbf E \log|a-\zeta|/|z - \zeta|} is almost surely nonnegative we get

\displaystyle f'(z) \leq e^{-cn} |f'(a)|

so we indeed have the claim. \Box

We now prove the desired bound

\displaystyle \mathbf E \log \frac{1}{|\lambda|} \leq O(n^{-1}).

Actually,

\displaystyle \mathbf E \log \frac{1}{|\lambda|} = \frac{1}{n} \log \frac{1}{|f(0)|}

as we proved last time, so the bound {|f(0)| \sim 1} guarantees the claim.

In particular

\displaystyle \mathbf E \log \frac{1}{|\lambda^{(\infty)}|} = 0

by Fatou’s lemma. So {|\lambda^{(\infty)}| = 1} almost surely. Therefore {U_{\lambda^{(\infty)}}} is harmonic on {D(0, 1)}, and we already showed that {|f(z)| \sim 1} if {|z - 1/2|} was small enough, thus

\displaystyle U_\lambda(z) = O(n^{-1})

if {|z - 1/2|} was small enough. That implies {U_{\lambda^{(\infty)}} = 0} on an open set and hence everywhere. Since

\displaystyle U_\eta(Re^{i\theta}) = \frac{1}{2} \sum_{m \neq 0} \frac{e^{im\theta}}{|m|} \mathbf E\eta^{|m|}

we can plug in {\eta = \lambda^{(\infty)}} and conclude that all moments of {\lambda^{(\infty)}} except the zeroth moment are zero. So {\lambda^{(\infty)}} is uniformly distributed on the unit circle.

By overspill, I think one can intuit that if {f} is a random polynomial of high degree which has a zero close to {1}, all zeroes in {D(0, 1)}, and no critical point close to {a}, then {f} sort of looks like

\displaystyle z \mapsto \prod_{k=0}^{n-1} (z - \omega^k)

where {\omega} is a primitive root of unity of the same degree as {f}. Therefore {f} looks like a cyclotomic polynomial, and therefore should have lots of zeroes close to the unit sphere, in particular close to {1}, a contradiction. This isn’t rigorous but gives some hint as to why this case might be bad.

Now one has

\displaystyle \mathbf E \log |\zeta - a| = \frac{1}{n-1} \log \frac{|f'(a)|}{n} = O(n^{-1})

and in particular by Fatou’s lemma

\displaystyle \mathbf E \log |\zeta^{(\infty)} - 1| = 0.

But it was almost surely true that {\zeta^{(\infty)} \notin D(1, 1)}, thus that {\log |\zeta^{(\infty)} - 1| \geq 0}. So this enforces {\zeta^{(\infty)} \in \partial D(1, 1)} almost surely. In particular, almost surely,

\displaystyle \zeta^{(\infty)} \in \partial D(1, 1) \cap \overline{D(0, 1)} = \gamma.

Since {\gamma} is a contractible curve, its complement is connected. We recall that {U_{\lambda^{(\infty)}} = U_{\zeta^{(\infty)}}} near infinity, and since we already know the distribution of {\lambda^{(\infty)}}, we can use it to compute {U_{\zeta^{(\infty)}}} near infinity. Tao says the computation of {U_{\zeta^{(\infty)}}} is a straightforward application of the Newtonian shell theorem; he’s not wrong but I figured I should write out the details.

For {\eta = \lambda^{(\infty)}} one has

\displaystyle U_\eta(z) = \mathbf E \log \frac{1}{|z - \eta|} = \frac{1}{2\pi} \int_{\partial D(0, 1)} \log \frac{1}{|z - w|} ~d|w|

where the {d|w|} denotes that this is a line integral in {\mathbf R^2} rather than in {\mathbf C}. Translating we get

\displaystyle U_\eta(z) =- \frac{1}{2\pi} \int_{\partial D(z, 1)} \log |w| ~d|w|

which is the integral of the fundamental solution of the Laplace equation over {\partial D(z, 1)}. If {|z| > 1} (reasonable since {z} is close to infinity), this implies the integrand is harmonic, so by the mean-value formula one has

\displaystyle U_\eta(z) = -\log |z|

and so this holds for both {\eta = \lambda^{(\infty)}} and {\eta = \zeta^{(\infty)}} near infinity. But then {U_{\zeta^{(\infty)}}} is harmonic away from {\gamma}, so that implies that

\displaystyle U_{\zeta^{(\infty)}} = \log \frac{1}{|z|}.

Since the distribution {\nu} of {\zeta^{(\infty)}} is the Laplacian of {U_{\zeta^{(\infty)}}} one has

\displaystyle \nu = \Delta \log \frac{1}{|z|} = \delta_0.

Therefore {\zeta^{(\infty)} = 0} almost surely. In particular, {\zeta} is infinitesimal almost surely. This completes the proof in case one.
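The shell theorem computation used above is easy to test numerically: for {\eta} uniform on the unit circle, {U_\eta(z)} should equal {-\log|z|} outside the circle and {0} inside it. Here is a sketch that approximates the expectation by an equally spaced average; the sample count and the function name are my own.

```python
import math

def circle_potential(z, samples=4000):
    """Approximate U(z) = E log(1/|z - eta|) for eta uniform on the unit circle."""
    total = 0.0
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        w = complex(math.cos(angle), math.sin(angle))
        total += -math.log(abs(z - w))
    return total / samples
```

From far away the uniform circle is indistinguishable from a point mass at the origin, which is exactly the gravitational intuition behind the theorem.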

By the way, I now wonder if when one first learns PDE it would be instructive to think of the fundamental solution of the Laplace equation and the mean-value formulae as essentially a consequence of the classical laws of gravity. Of course the arrow of causation actually points the other way, but we are humans living in a physical world and so have a pretty intuitive understanding of what gravity does, while stuff like convolution kernels seem quite abstract.

Next time we’ll prove a contradiction for case zero, and maybe start on the proof for case one. The proof for case one looks really goddamn long, so I’ll probably skip or blackbox some of it, maybe some of the earlier lemmata, in the interest of my own time.

Let’s Read: Sendov’s conjecture in high degree, part 1

Having some more free time than usual, I figured I would read a recent paper that looked interesting. However, I’m something of a noob at math, so I figure it’s worth it to take it slowly and painstakingly think through the details. This will be a sort of stream-of-consciousness post where I do just that.

The paper I’ll be reading will be Terry Tao’s “Sendov’s conjecture for sufficiently high degree polynomials.” This paper looks interesting to me because it applies “cheap nonstandard” methods to prove a result in complex analysis. In addition it uses probability-theoretic methods, which I’m learning a lot of right now. Sendov’s conjecture is the following:

Sendov’s conjecture Let {f: \mathbf C \rightarrow \mathbf C} be a polynomial of degree {n \geq 2} that has all zeroes in {\overline{D(0, 1)}}. If {a} is a zero then {f'} has a zero in {\overline{D(a, 1)}}.

Without loss of generality, we may assume that {f} is monic and that {a \in [0, 1]}. Indeed, if {f} is a polynomial in the variable {z}, we rotate {z} so that {a} lands on the nonnegative real axis and then divide {f} by its leading coefficient.
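To get a feel for the statement, here is a small numerical sketch that takes a list of zeroes, computes the critical points, and reports the largest distance from a zero to its nearest critical point. The example polynomial is one I made up, and the function name is mine; the conjecture says the reported number should never exceed {1}.

```python
import numpy as np

def sendov_gap(zeros):
    """Largest distance from a zero of f to the nearest zero of f'."""
    coeffs = np.poly(zeros)                 # monic polynomial with the given zeroes
    crit = np.roots(np.polyder(coeffs))     # critical points of f
    return max(min(abs(a - c) for c in crit) for a in zeros)

# Hypothetical example: f(z) = (z^4 - 1)(z - 1/2), all zeroes in the closed unit disc.
example_zeros = np.array([1.0, -1.0, 1j, -1j, 0.5])
example_gap = sendov_gap(example_zeros)
```

This only tests one polynomial at a time, so it says nothing about the high-degree regime that the paper is actually about.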

Tao notes that by Tarski’s theorem, in principle it suffices to get an upper bound {n_0} on the degree of a counterexample and then use a computer-assisted proof to complete the argument. I think that for every {n \leq n_0} you’d get a formula in the theory of real closed fields that could be decided in {O(n^r)} time, where {r} is an absolute constant (which, unfortunately, is exponential in the number of variables of the formula, and so is probably quite large). Worse, Tao is going to use a compactness argument and so is going to get an astronomical bound {n_0}. Still, something to keep in mind — computer-assisted proofs seem like the future in analysis.

More precisely, Tao proves the following:

Proposition 1 For every {n} in a monotone sequence in {\mathbf N}, let {f} be a monic polynomial of degree {n} with all zeroes in {\overline{D(0, 1)}} and let {a \in [0, 1]} satisfy {f(a) = 0}. If for every {n}, {f'} has no zeroes in {\overline{D(a, 1)}}, then {0 = 1}.

It’s now pretty natural to see how “cheap nonstandard” methods apply. One can pass to a subsequence countably many times and still preserve the hypotheses of the proposition, so by diagonalization and compactness, we may assume good (but ineffective) convergence properties. For example, we can assume that {a = a^{(\infty)} + o(1)}, where {a^{(\infty)}} does not depend on {n} and {o(1)} is with respect to {n}.

Using overspill, one can view the proposition model-theoretically: it says that if {n} is a nonstandard natural number, {f} a monic polynomial of degree {n} with a zero {a \in [0, 1]}, and there are no zeroes of {f'} in {\overline{D(a, 1)}}, then {0 = 1}. Tao never fully takes this POV, but frequently appeals to results like the following:

Proposition 2 Let {P} be a first-order predicate. Then:

  1. (Overspill) If for every sufficiently small standard {\varepsilon}, {P(\varepsilon)}, then there is an infinitesimal {\delta} such that {P(\delta)}.
  2. (Underspill) If for every infinitesimal {\delta}, {P(\delta)}, then there are arbitrarily small {\varepsilon} such that {P(\varepsilon)}.
  3. ({\aleph_0}-saturation) If {P} is {\forall z \in K(f(z) = O(1))} where {K \subseteq \mathbf C} is compact, then the implied constant in the statement of {P} is independent of {z}.

Henceforth we will use asymptotic notation in the nonstandard sense; for example, a quantity is {o(1)} if it is infinitesimal. This is equivalent to the cheap nonstandard perspective where a quantity is {o(1)} iff it is with respect to {n}, where {n} is ranging over some monotone sequence of naturals. I think the model-theoretic perspective is helpful here because we are going to pass to subsequences a lot, and at least in the presence of the boolean prime ideal theorem, letting {n} be a fixed nonstandard natural number is equivalent to choosing a nonprincipal ultrafilter on {\mathbf N} that picks out the subsequences we are going to pass to, in the perspective where {n} ranges over a monotone sequence of standard naturals.

This follows because the Stone-Cech compactification {\beta \mathbf N} is exactly the space of ultrafilters on {\mathbf N}. Indeed, if {n} ranges over a monotone sequence of standard naturals, then in {\beta \mathbf N}, {n} converges to an element {U} of {\beta \mathbf N \setminus \mathbf N}, which then is a nonprincipal ultrafilter. If {\mathbf N^\omega/U} denotes the ultrapower of {\mathbf N} with respect to {U}, then I think the equivalence class of the sequence {\{1, 2, \dots\}} in {\mathbf N^\omega/U} is exactly the limit of {n}. Conversely, once a nonprincipal ultrafilter {U \in \beta \mathbf N \setminus \mathbf N} has been fixed, we have a canonical way to pass to subsequences: only pass to a subsequence which converges to {U}. This is possible since {\beta \mathbf N} is compact.

I think it will be at times convenient to go back to the “monotone sequence of standard naturals” perspective, especially when we’re doing computations, so I reserve the right to go between the two. We’ll call the monotone sequence perspective the “cheap perspective” and the model-theoretic perspective the “expensive perspective”.

I’m not familiar with the literature on Sendov’s conjecture, so I’m going to blackbox the reduction that Tao carries out. The reduction says that, due to the Gauss-Lucas theorem and previously existing partial results on Sendov’s conjecture, to prove Proposition 1, it suffices to show:

Proposition 3 Let {n} be a nonstandard natural, let {f} be a monic polynomial of degree {n} with all zeroes in {\overline{D(0, 1)}} and let {a \in [0, 1]} satisfy {f(a) = 0}. Suppose that {f'} has no zeroes in {\overline{D(a, 1)}} and

  1. (Theorem 3.1 in Tao) either {a = o(1/\log n)}, or
  2. (Theorem 5.1 in Tao) there is a standard {\varepsilon_0 > 0} such that

    \displaystyle 1 - o(1) \leq a \leq 1 - \varepsilon_0^n.

Then {0 = 1}.

In the former case we have {a = o(1)} and in the latter we have {|a - 1| = o(1)}. We’ll call the former “case zero” and the latter “case one.”

Tao gives a probabilistic proof of Proposition 3, and that’s going to be the bulk of this post and its sequels. Let {\zeta} be a random zero of {f'}, drawn uniformly from the finite set of zeroes. Let {\lambda} denote a random zero of {f} chosen independently of {\zeta}.

In the cheap perspective, {\zeta} depends on {n}, and we are going to study properties of the convergence of {\zeta} as {n \rightarrow \infty}, by using our chosen ultrafilter to repeatedly pass to subsequences to make {\zeta} converge in some suitable topology. The probability spaces that {\zeta} lives in depend on {n}, but as long as we are interested in a deterministic limit {c} that does not depend on {n}, this is no problem. Indeed {\zeta} will converge to {c} uniformly (resp. in probability) provided that for every {\varepsilon > 0}, {|\zeta - c| \leq \varepsilon} almost surely (resp. {\mathbf P(|\zeta - c| \leq \varepsilon) = 1 - o(1)}), and this makes sense even though the probability space we are studying depends on {n}. The usual definition of convergence in distribution still makes sense as well: {\zeta} converges in distribution to a random variable {X} that does not depend on {n} provided that the distributions {\zeta_*\mathbf P} converge vaguely to {X_*\mathbf P}.

Okay, it’s pretty obvious what being infinitesimally close to a deterministic standard real is in the uniform or probabilistic sense. Expanding out the definition of the vague topology of measures, a nonstandard measure {\mu} on a locally compact Hausdorff space {Q} is infinitesimally close to a standard measure {\nu} provided that for every continuous function {f} with compact support,

\displaystyle \left|\int_Q f~d\mu - \int_Q f~d\nu\right| = o(1).

This induces a definition of being infinitesimally close in distribution.
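To make the vague topology concrete, here is a small numerical sketch in the cheap perspective. The polynomial {z^n - 1} and the tent-shaped test function are my own illustrative choices, not something taken from Tao's paper: the empirical measure of the zeroes of {z^n - 1} should be close, when tested against a continuous compactly supported function, to the uniform measure on the unit circle.

```python
import numpy as np

def tent(z):
    """Continuous, compactly supported test function: a tent of radius 1 around z = 1."""
    return np.maximum(0.0, 1.0 - np.abs(z - 1.0))

def integral_against_roots_of_unity(n):
    """Integral of tent against the uniform measure on the zeroes of z^n - 1."""
    zeros = np.exp(2j * np.pi * np.arange(n) / n)
    return tent(zeros).mean()

# Integral of tent against the uniform measure on the unit circle, in closed form:
# on the circle |e^{i theta} - 1| = 2 |sin(theta / 2)|, so tent > 0 iff |theta| < pi / 3.
limit = 1.0 / 3.0 + (2.0 * np.sqrt(3.0) - 4.0) / np.pi

# Discrepancy between the two integrals for a few (hypothetical) degrees n.
errors = {n: abs(integral_against_roots_of_unity(n) - limit) for n in (10, 100, 1000)}
print(errors)
```

This only tests a single function, so it is a sanity check of the definition rather than a proof of vague convergence.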

Okay, no more model-theoretic games, it’s time to start the actual proof.

Definition 4 Let {\eta} be a bounded complex random variable. The logarithmic potential of {\eta} is

\displaystyle U_\eta(z) = \mathbf E \log \frac{1}{|z - \eta|}.

Here {\mathbf E} denotes expected value. Tao claims but does not prove that this definition makes sense for almost every {z \in \mathbf C}. To check this, let {K} be a compact subset of {\mathbf C} equipped with Lebesgue measure {\mu} and let {\nu} be the distribution of {\eta}. Then

\displaystyle \int_K U_\eta(z) ~d\mu(z) = \int_K \int_\mathbf C \log \frac{1}{|z - \omega|} ~d\nu(\omega) ~d\mu(z)

and the integrand is singular along the set {\{z = \omega\}}, which has real codimension {2} in {K \times \text{supp} \nu}. The double integral of a logarithm is finite provided that the logarithm blows up along a set of real codimension {2} (to see this, check that {\log(1/|x|)} is integrable near the origin of {\mathbf R^2}, say in polar coordinates), so {U_\eta(z)} is finite for almost every {z} and this looks good.

Definition 5 Let {\eta} be a bounded complex random variable. The Stieltjes transform of {\eta} is

\displaystyle s_\eta(z) = \mathbf E \frac{1}{z - \eta}.

Then {s_\eta} is “less singular” than {U_\eta}, so this definition is inoffensive almost everywhere.

Henceforth Tao lets {\mu_\eta} denote the distribution of {\eta}, where {\eta} is any bounded complex random variable. Then {s_\eta = -\partial U_\eta} and {\mu_\eta = \overline \partial s_\eta/2\pi} where {\partial,\overline \partial} are the complex derivative and Cauchy-Riemann operators respectively. Since {\partial \overline \partial = \Delta} we have

\displaystyle 2\pi\mu_\eta = -\Delta U_\eta.

This just follows straight from the definitions. Of course {\mu_\eta} might not be an absolutely continuous measure, so this only makes sense if we use the calculus of distributions.

Does this make sense if {\eta} is deterministic, say {\eta = 0} almost surely? In that case {\mu_\eta} is a Dirac measure at {0} and {s_\eta(z) = 1/z}. Everything looks good, since {U_\eta(z) = -\log |z|}.

For the next claims I need the Gauss-Lucas theorem:

Theorem 6 (Gauss-Lucas) If {P} is a polynomial on {\mathbf C}, then all zeroes of {P'} belong to the convex hull of the variety {P = 0}.

Lemma 7 (Lemma 1.6i in Tao) {\lambda} surely lies in {\overline{D(0, 1)}} and {\zeta} surely lies in {\overline{D(0, 1)} \setminus \overline{D(a, 1)}}.

Proof: The first claim is just a tautology. For the second, by assumption on {f} all zeroes of {f} lie in the convex set {\overline{D(0, 1)}}, and hence so does their convex hull. In particular {\zeta} lies in {\overline{D(0, 1)}} by the Gauss-Lucas theorem. Our contradiction hypothesis says that {\zeta \notin \overline{D(a, 1)}}. \Box
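As a quick sanity check of how Gauss-Lucas is being used here, the following sketch draws some zeroes in the closed unit disk and confirms numerically that the critical points stay in the disk. The degree, the random seed, and the helper name `critical_points` are arbitrary choices of mine for illustration.

```python
import numpy as np

def critical_points(zeros):
    """Zeroes of f' for the monic polynomial f with the given zeroes."""
    coeffs = np.poly(zeros)              # coefficients of f, highest degree first
    return np.roots(np.polyder(coeffs))  # zeroes of f'

rng = np.random.default_rng(0)
n = 12
# Hypothetical sample: n zeroes drawn in the closed unit disk.
radii = np.sqrt(rng.uniform(0.0, 1.0, n))
angles = rng.uniform(0.0, 2.0 * np.pi, n)
zeros = radii * np.exp(1j * angles)

crit = critical_points(zeros)
max_modulus = np.abs(crit).max()
print(max_modulus)  # Gauss-Lucas predicts this is at most 1
```

Of course this says nothing about the disk {\overline{D(a, 1)}}, which is where the contradiction hypothesis comes in.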

Lemma 8 (Lemma 1.6ii-iv in Tao) One has {\mathbf E\lambda = \mathbf E\zeta}. For almost every {z \in \mathbf C},

\displaystyle U_\lambda(z) = -\frac{1}{n} \log |f(z)|

and

\displaystyle U_\zeta(z) = \frac{\log n}{n - 1} - \frac{1}{n - 1} \log |f'(z)|.

Moreover,

\displaystyle s_\lambda(z) = \frac{1}{n} \frac{f'(z)}{f(z)}

and

\displaystyle s_\zeta(z) = \frac{1}{n - 1} \frac{f''(z)}{f'(z)}.

Moreover,

\displaystyle U_\lambda(z) - \frac{n-1}{n}U_\zeta(z) = \frac{1}{n} \log |s_\lambda(z)|

and

\displaystyle s_\lambda(z) - \frac{n - 1}{n} s_\zeta(z) = -\frac{1}{n} \frac{s'_\lambda(z)}{s_\lambda(z)}.

Proof: We pass to the cheap perspective, so {n} is a large standard natural; in particular {n \geq 2}. If {b_{n-1}} is the coefficient of {z^{n-1}} in {f}, then since {f} is monic the roots of {f} sum to {-b_{n-1}}. Differentiating, {f'(z) = nz^{n-1} + (n-1)b_{n-1}z^{n-2} + \cdots}, so the roots of {f'} sum to {-(n-1)b_{n-1}/n}. Dividing by the number of roots, {n} and {n-1} respectively, gives {\mathbf E\lambda = -b_{n-1}/n = \mathbf E\zeta}.

We write

\displaystyle f(z) = \prod_{j=1}^n (z - \lambda_j)

and

\displaystyle f'(z) = n\prod_{j=1}^{n-1} (z - \zeta_j).

Taking {-\log|\cdot|} of both sides and then dividing by {n} for {f} and by {n - 1} for {f'}, we immediately get {U_\lambda} and {U_\zeta}. Then we take the complex derivative of both sides of the {U_\lambda} and {U_\zeta} formulae to get the formulae for {s_\lambda} and {s_\zeta}.

Now the formula for {\log |s_\lambda|} follows by subtracting the above formulae, as does the formula for {s_\lambda'/s_\lambda}. \Box
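Since these identities are easy to get wrong by a sign or a factor of {n}, here is a minimal numerical check, computing {U} and {s} directly as averages over the zeroes and comparing with the expressions in terms of {f}, {f'}, {f''} that come out of the two factorizations. The degree, the random zeroes, and the test point {z} are hypothetical choices of mine.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
# Hypothetical zeroes of f in the closed unit disk.
lam = np.sqrt(rng.uniform(0, 1, n)) * np.exp(2j * np.pi * rng.uniform(0, 1, n))
f = np.poly(lam)        # monic, degree n
df = np.polyder(f)      # f'
d2f = np.polyder(df)    # f''
zeta = np.roots(df)     # the n - 1 zeroes of f'

z = 1.7 + 0.4j          # test point outside the unit disk, so away from all zeroes

# Potentials and Stieltjes transforms, straight from the definitions.
U_lam = np.mean(np.log(1 / np.abs(z - lam)))
U_zeta = np.mean(np.log(1 / np.abs(z - zeta)))
s_lam = np.mean(1 / (z - lam))
s_zeta = np.mean(1 / (z - zeta))
s_lam_prime = -np.mean(1 / (z - lam) ** 2)   # derivative of s_lambda in z

fz, dfz, d2fz = (np.polyval(p, z) for p in (f, df, d2f))

# Each entry is |left side - right side| for one identity; all should be ~ 0.
checks = {
    "E lambda = E zeta": abs(lam.mean() - zeta.mean()),
    "U_lambda": abs(U_lam + np.log(abs(fz)) / n),
    "U_zeta": abs(U_zeta + np.log(abs(dfz) / n) / (n - 1)),
    "s_lambda": abs(s_lam - dfz / (n * fz)),
    "s_zeta": abs(s_zeta - d2fz / ((n - 1) * dfz)),
    "log|s_lambda|": abs(U_lam - (n - 1) / n * U_zeta - np.log(abs(s_lam)) / n),
    "s'_lambda / s_lambda": abs(s_lam - (n - 1) / n * s_zeta + s_lam_prime / (n * s_lam)),
}
print(checks)
```

Here {U_\zeta} is compared against {-\frac{1}{n-1}\log(|f'(z)|/n)}, which is what the factorization of {f'} gives before expanding the logarithm.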

Since the distributions of {\lambda} and {\zeta} have bounded support (it’s contained in {\overline{D(0, 1)}}) by Prokhorov’s theorem we can find standard random variables {\lambda^{(\infty)}} and {\zeta^{(\infty)}} such that {\lambda - \lambda^{(\infty)}} is infinitesimal in distribution and similarly for {\zeta}. The point is that {\lambda^{(\infty)}} and {\zeta^{(\infty)}} give, up to an infinitesimal error, information about the behavior of {f}, {f'}, and {f''} by the above lemma and the following proposition.

Proposition 9 (Theorem 1.10 in Tao) One has:

  1. In case zero, {\lambda^{(\infty)}} and {\zeta^{(\infty)}} are identically distributed and almost surely lie in {C = \{e^{i\theta}: 2\theta \in [\pi, 3\pi]\}}, so {d(\lambda, C)} is infinitesimal in probability. Moreover, for every compact set {K \subseteq \overline{D(0, 1)} \setminus C},

    \displaystyle \mathbf P(\lambda \in K) = O\left(a + \frac{\log n}{n^{1/3}}\right).

  2. In case one, {\lambda^{(\infty)}} is uniformly distributed on {\partial D(0, 1)} and {\zeta^{(\infty)}} is almost surely zero. Moreover,

    \displaystyle \mathbf E \log \frac{1}{|\lambda|}, \mathbf E\log |\zeta - a| = O\left(\frac{1}{n}\right).

The proposition gives quantitative bounds that force the zeroes to all be in certain locations. Looking ahead in the paper, it follows that:

  1. In case zero, the Stieltjes transform of {\lambda^{(\infty)}} is infinitesimally close to {f'/nf}, so a stability-of-zeroes argument shows that {f} has no zeroes near the origin, even though {a = o(1)}.
  2. In case one, if {\sigma} is the standard deviation of {\zeta}, then we have control on the zeroes of {f} up to an error of size {o(\sigma^2) + o(1)^n}, which we can then use to deduce a contradiction.

Next time I’ll start the proof of this proposition. Its proof apparently follows from the theory of Newtonian potentials, which is not too surprising since {\Delta \log |x| = 2\pi\delta(x)} if {\Delta} is the Laplacian of {\mathbf R^2}. It needs the following lemma:

Lemma 10 (Lemma 1.6vi in Tao) If {\gamma} is a curve in {\mathbf C} that misses the zeroes of {f} and {f'} then

\displaystyle f(\gamma(1)) = f(\gamma(0)) \exp\left(n \int_\gamma s_\lambda(z) ~dz\right)

and

\displaystyle f'(\gamma(1)) = f'(\gamma(0)) \exp\left((n-1) \int_\gamma s_\zeta(z) ~dz\right).

Proof: One has

\displaystyle ns_\lambda(z) = \frac{f'(z)}{f(z)}

by the previous lemma. Breaking up {\gamma} into finitely many parts one can assume that {\gamma} is a contractible curve in a simply connected set that misses the zeroes of {f}, in which case we have a branch of {\log f} along {\gamma} whose derivative is {f'/f}. Now apply the fundamental theorem of calculus. The case for {s_\zeta} is the same. \Box
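Here is a rough numerical sketch of the first identity, with the contour integral approximated by the trapezoid rule along a straight segment. The zeroes, the endpoints of the segment, and the number of quadrature nodes are all hypothetical choices of mine; the segment is chosen to stay outside the unit disk so that it misses every zero.

```python
import numpy as np

# Hypothetical zeroes of f, all inside the closed unit disk.
lam = np.array([0.9, -0.3 + 0.5j, 0.2 - 0.7j, -0.6 - 0.1j])
n = len(lam)
f = np.poly(lam)

def s_lambda(z):
    """Stieltjes transform of a uniformly random zero of f, at each point of z."""
    return np.mean(1.0 / (z[..., None] - lam), axis=-1)

# Straight segment gamma(t) = z0 + t (z1 - z0); it stays at distance > 1.9 from 0.
z0, z1 = 2.0 + 0.0j, 1.5 + 2.0j
t = np.linspace(0.0, 1.0, 20001)
gamma = z0 + t * (z1 - z0)

# Trapezoid rule for the contour integral of s_lambda along gamma.
vals = s_lambda(gamma)
integral = np.sum((vals[1:] + vals[:-1]) / 2 * np.diff(gamma))

lhs = np.polyval(f, z1)                          # f(gamma(1))
rhs = np.polyval(f, z0) * np.exp(n * integral)   # f(gamma(0)) exp(n * integral)
print(abs(lhs - rhs) / abs(lhs))
```

The two sides agree up to quadrature error, and the same check with {s_\zeta} and {n - 1} in place of {s_\lambda} and {n} would test the second identity.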

Internalizing tricks: the Heine-Borel theorem

I think that in analysis, the most important results are the tricks, not the theorems. I figure most analysts could prove any of the theorems in Rudin or Pugh at will, not because they have the results memorized, but because they know the tricks.

So it’s really important to internalize tricks! Here’s an example of how we could take apart a proof of the Heine-Borel theorem that every closed bounded set is compact, and internalize some of the tricks in it.

The proof we want to study is as follows.

Step 1. We first prove that [0, 1] is compact. Let (x_n) be a sequence in [0, 1] that we want to show has a convergent subsequence. Let x_{n_1} = x_1 and let I_1 = [0, 1].

Step 2. Suppose by induction that we are given I_1, \dots, I_J such that I_j is a subinterval of I_{j-1} of half length, there are infinitely many points of (x_n) in I_j, and x_{n_j} \in I_j. By the pigeonhole principle, since there are infinitely many points of (x_n) in I_J, if we divide I_J into left and right closed subintervals of equal length, one of those two subintervals has infinitely many points of (x_n) as well. So let that subinterval be I_{J+1} and let x_{n_{J+1}} be the first point of (x_n) after x_{n_J} in I_{J+1}.

Step 3. After the induction completes we have a subsequence (x_{n_j}) of (x_n). By construction, x_{n_k} \in I_j for every k \geq j, and I_j has length 2^{1-j}, so |x_{n_j} - x_{n_k}| \leq 2^{1-j} for every k \geq j. That implies that (x_{n_j}) is a Cauchy sequence, so it converges in \mathbf R, say x_{n_j} \to x.

Step 4. Since x is a limit of a sequence in [0, 1], and [0, 1] is closed, x \in [0, 1]. Therefore (x_n) has a convergent subsequence. So [0, 1] is compact.

Step 5. Now let K = [0, 1]^n be a box. We claim that K is compact. To see this, let (x_n) be a sequence in K. If n = 1, then (x_n) has a convergent subsequence.

Step 6. Suppose by induction that [0, 1]^{n-1} is compact. Then we can write x_n = (y_n, z_n) where y_n \in [0, 1]^{n-1}, z_n \in [0, 1]. So there is a convergent subsequence (y_{n_k}). Now (z_{n_k}) has a convergent subsequence (z_{n_{k_j}}), and then (x_{n_{k_j}}) is a convergent subsequence. So K is compact.

Step 7. Now let K be closed and bounded. So there is a box L = [-R, R]^n such that K \subseteq L. Without loss of generality, assume that R = 1.

Step 8. Since K is a closed subset of the compact set L, K is compact.
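The bisection in Steps 1 to 3 can be sketched on a computer, with one big caveat: a program only sees finitely many points, so "the half with infinitely many points" has to be replaced by "the half with at least as many of the remaining points". That replacement, the function name, and the sample sequence below are my own stand-ins, not part of the proof.

```python
def bisection_subsequence(xs, depth):
    """Indices n_1 < n_2 < ... chosen as in Steps 1-2, for a finite list xs in [0, 1].

    Finite stand-in for the pigeonhole step: pick the half that contains at
    least as many of the points that come after the last chosen index.
    """
    lo, hi = 0.0, 1.0
    indices = [0]                        # x_{n_1} = x_1 (0-based index here)
    for _ in range(depth):
        mid = (lo + hi) / 2
        last = indices[-1]
        later = range(last + 1, len(xs))
        left = [i for i in later if lo <= xs[i] <= mid]
        right = [i for i in later if mid <= xs[i] <= hi]
        if not left and not right:
            break                        # ran out of points; cannot happen for an infinite sequence
        if len(left) >= len(right):
            hi, chosen = mid, left       # next interval is the left half
        else:
            lo, chosen = mid, right      # next interval is the right half
        indices.append(chosen[0])        # first later point in the chosen half
    return indices

# Hypothetical sample sequence in [0, 1]: multiples of the golden ratio mod 1.
xs = [(k * 0.6180339887) % 1.0 for k in range(2000)]
idx = bisection_subsequence(xs, depth=8)
print([round(xs[i], 4) for i in idx])
```

With 0-based positions, the j-th chosen point and every later chosen point lie in an interval of length 2^{-j}, which is exactly the Cauchy estimate from Step 3.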

Let’s look at the tricks used at each stage:

Step 1. We want to show that an arbitrary closed and bounded set is compact. This sounds quite hard, as such sets can be nasty; however, it is often the case that if you can prove a special case of the theorem, the general theorem follows. Since [0, 1] is the prototypical example of a compact set, and is much nicer than e.g. Cantor dust in 26 dimensions, we first try to prove the Heine-Borel theorem on [0, 1].

Step 2. Here we use the informal principle that compactness is equivalent to path-finding in an infinite binary tree. That is, compactness requires us to make infinitely many choices, which is exactly the same thing as finding a path through an infinitely large tree, where we will have to choose whether to go left or right infinitely many times. Ideally every time we choose whether we go left or right, we will cut down on the complexity of the problem by half. Here the “complexity” is the size of the interval we’re looking at. This notion of “compactness” is ubiquitous in analysis, combinatorics, and logic. It is the deepest part of the proof of the Heine-Borel theorem, and is known as Koenig’s lemma.

Step 2 has another key idea to it. We need to make infinitely many choices, so we make infinitely many choices using induction. In general when traversing a graph, inducting on the length of the path so far will come in handy. If you don’t know which way to go, the pigeonhole principle and other nonconstructive tricks will also be highly useful here.

Step 3. Compactness gave us a subsequence, but we don’t know what the limit is. But to prove that a sequence converges without referring to an explicit limit, instead show that it is Cauchy. Actually, here we are forced to do this, because the argument of Step 2 could’ve been carried out over the rational numbers, yet the conclusion of the Heine-Borel theorem is false there. So this step could also be interpreted as make sure to use every hypothesis; here the hypothesis that we are working over the reals is key.

Step 4. Make sure to use every hypothesis; up to this point we’ve only used that [0, 1] is bounded, not closed.

Step 5. Here we again reason that if you can prove a special case of the theorem, the general theorem follows.

Step 6. Here n is an arbitrary natural number, so we prove a theorem about every natural number using induction. This is especially nice because the idea behind this proof was to build up the class of compact sets iteratively, initializing with the unit interval; at every stage of this induction we also get a unit box.

This trick can be viewed as a special case of if you can prove a special case of the theorem, the general theorem follows: indeed, proving a theorem for every natural number would require infinitely many cases to be considered, but here there are just two, the base case and the inductive case. The inductive case was really easy, so the thing we are really interested in is the base case.

Step 7. Here we abstract away unnecessary parameters using symmetry. The parameter R is totally useless because topological notions don’t care about scaling. However, we do have a box, and it would be nice if it was a unit box because we just showed that unit boxes are compact. So we might as well forget about R and just assume it’s 1.

Step 8. Once again we make sure to use every hypothesis; the boundedness got us inside a box, so the closedness must be used to finish the proof.