Boolean algebras and probability

Lately I’ve been thinking a lot about the basics of probability theory. I originally learned the definition of probability measure from a logician — Wes Holliday — and his definition has probably permanently influenced how I think about probability, but I think that even analysts like myself agree that the formulation of probability in terms of sample spaces is a little unnatural. For example, one typically makes it a point to never mention specific points in the sample space, but only events — elements of a sigma-algebra over the sample space. Moreover, one typically identifies an event which happens almost surely with the sure event. Random variables are identified with their distributions, and so isomorphism of random variables “forgets” about events that almost surely do not happen as well as individual points in the sample space. Sample spaces are also utterly uninteresting: up to measurable isomorphism, there is only one standard Borel space, and in practice most probability spaces are Polish and therefore standard. There is also a lot of “junk” in the power set of a standard Borel space; all but a lesser cardinality of sets are not measurable. Simply put, the sample space contains far more information than a probabilist is willing to use.

Let me stress that one actually gets to serious probability theory, which I guess I would define as beginning when one proves the central limit theorem, the above is not terribly important (though it can help — see An uncountable Moore-Schmidt theorem, a paper in ergodic theory which in part inspired this blog post). Worrying about whether one has extra information won’t help you prove inequalities, which is what analysis is all about. But I think when one is still learning the basics, or is considering results which are either deeply infinitary (as in the case of the uncountable Moore-Schmidt theorem) or purely “measurable” (only having to do with sigma-algebras), it is instructive to step back and think about what a sample space really is.

As for myself, I’m on the teaching staff of a measure theory course this semester, and an exercise this week was to prove the following theorem, which appears in the exercises of both Billingsley and father Rudin.

Theorem 1. If \Sigma is an infinite sigma-algebra, then \Sigma has cardinality at least that of \mathbb R.

This exercise is hard essentially because it is not measure-theoretic. In fact, while the student is probably tempted to concoct a horribly complicated measure-theoretic argument, the statement is purely combinatorial in nature, and has a simple combinatorial proof if one forgets the underlying set \Omega over which \Sigma is defined. Of course, this proof is totally equivalent to a similar proof about sigma-algebras, but there will be some pitfalls — I will point out one in particular — that arise if one is tempted to refer to \Omega, and so that one is not tempted by Lucifer in his guise \Omega, it is better to remove that temptation entirely.

Before proving Theorem 1, let me demonstrate how one can think about measure theory and probability without ever referring to \Omega.

We start with the combinatorial preliminaries.

Definition 2. By a countably complete lattice, one means a partially ordered set \mathbb B such that every countable subset X \subseteq \mathbb B has a supremum and infimum. In this case, we write x \wedge y (read “x and y“) to denote \inf \{x, y\}, and similarly x \vee y (“x or y“) for \sup \{x, y\}. One says “x implies y” to mean x \leq y.

Definition 3. By a countably complete boolean algebra one means a countably complete lattice \mathbb B equipped with a minimum 0 (“false”) and a maximum 1 (“true”), such that:

  • For every x, y, z \in \mathbb B, x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z).
  • For every x \in \mathbb B, there is a unique !x \in \mathbb B (“not x“) such that x \wedge !x = 0 and x \vee !x = 1.

Since we will only need countably complete boolean algebras, we will simply refer to such objects as boolean algebras. One immediately checks that the morphisms in the category of boolean algebras are those maps which preserve countable “and”, countable “or”, “true”, and “false”; that the usual laws of classical logic hold in any boolean algebra; and that every sigma-algebra is a boolean algebra under \subseteq. Conversely, I think that given a suitable form of the axiom of choice, one can recover a measurable space (\Omega, \mathbb B) from a boolean algebra \mathbb B by letting \Omega consist of a suitable subset of the set of ultrafilters on \mathbb B; however, the choice of \Omega is not unique (a nice exercise).

The setting of boolean algebras feels like a very natural setting for probability theory to me, because probability theory by design is concerned with events, things for which it makes sense to conjoin with the words “and” and “or” and “not”. One can now easily use the language of propositional logic: for example, two events E,F are mutually exclusive if E \wedge F = 0. Let us now state the usual basic notions of probability theory in this new language:

Definition 4. Let \mathbb B be a boolean algebra. A probability measure \mu on \mathbb B is a mapping \mu: \mathbb B \to [0, 1] such that, whenever E_i are a sequence of pairwise mutually exclusive events, \mu\left(\bigvee_i E_i\right) = \sum_i \mu(E_i).

Definition 5. Let \mathbb B be a boolean algebra and P = (P, \Sigma) be a standard Borel space. A P-valued random variable X on \mathbb B is a morphism of boolean algebras X: \Sigma \to \mathbb B.

Definition 6. Let \mathbb B be a boolean algebra and P = (P, \Sigma) be a standard Borel space. Let X be a P-valued random variable on \mathbb B. If \mu is a probability measure on \mathbb B and E \in \Sigma, one writes \mu(X \in E) to mean \mu(X(E)). The distribution of X is the measure \mu_X(E) = \mu(X \in E) on P. If P = \mathbb R, the cdf of X is the function x \mapsto \mu(X \leq x).

Definition 7. Let \mathbb B,\mathbb B' be boolean algebras. Let X,X' be random variables on \mathbb B,\mathbb B' respectively. A morphism of random variables F: X \to X' is a morphism of boolean algebras F: \mathbb B \to \mathbb B' such that F \circ X = X'.

Given these definitions, I think that it is now straightforward to formulate the central limit theorem, Borel-Cantelli lemma, et cetra. One can drop the word “almost” from many such statements by passing to the quotient \mathbb B/N where N is the ideal of events that almost never happen.

We now prove Theorem 1 in two lemmata.

Lemma 8. Let \mathbb B be an infinite boolean algebra. Then there exists a infinite set A \subseteq \mathbb B of pairwise mutually exclusive events.

To prove Lemma 8, let \mathbb B_0 = \mathbb B and reason by induction. Suppose that we have an infinite boolean algebra \mathbb B_n which is a quotient of \mathbb B_{n-1} and events x_i \in \mathbb B_i,~i \leq n -1. Since \mathbb B_n is infinite, there exists x_n \in \mathbb B_n such that \{y \in \mathbb B_n: y \leq x_n\} is infinite and !x_n \neq 0.

We now condition on x_n, thus set \mathbb B_{n+1} = \mathbb B_n/(!x_n). Here (!x_n) is the ideal generated by !x_n. There is a natural bijection \{y \in \mathbb B_n: y \leq x_n\} \to \mathbb B_{n+1}, since any element y \in \mathbb B_n can be decomposed as (y \wedge x_n) \vee (y \wedge !x_n), giving a decomposition \mathbb B_n = \mathbb B_{n+1} \oplus (!x_n). Therefore we can view x_n as not just an element of \mathbb B_n but an element of \mathbb B_0 and as an element of \mathbb B_{n+1} (where it is identified with 1).

The bijection \{y \in \mathbb B_n: y \leq x_n\} \to \mathbb B_{n+1} implies that \mathbb B_{n+1} is infinite, so we can keep the induction going. Viewing x_n as an element of \mathbb B_0, one easily checks that the x_i form a sequence of pairwise mutually exclusive events. This completes the proof of Lemma 8.

Here, a lot of students got stuck (in the usual “sample space” language). The reason is that they tried to construct the set A using properties of \Omega. For example, a common thing to do was to try to let A be in bijection with \Omega by letting x_\omega = \bigcap \{y \in \mathbb B: \omega \in y\}. In general x_\omega might be a nonmeasurable set; this can be fixed by assuming towards contradiction that \mathbb B is countable. More importantly, the map \omega \mapsto x_\omega is not injective in general, essentially for the same reason that \Omega is not uniquely determined by \mathbb B. By replacing \Omega with a suitable quotient, and showing that this quotient is also infinite, one can force \omega \mapsto x_\omega to be injective, but this can be rather messy. In addition, assuming that \Omega is countable will weaken this lemma, so that it does not imply Theorem 1 unless one invokes the continuum hypothesis. Moral: do not ever refer to points in the sample space!

Lemma 9. Let \mathbb B be a boolean algebra and suppose that there is a countably infinite set A \subseteq \mathbb B of pairwise mutually exclusive events. Then the boolean algebra \sigma(A) generated by A has cardinality at least 2^{\aleph_0}.

To prove Lemma 9, let \{x_i\}_{i \in \mathbb N} be an enumeration of A. Let I be a set of natural numbers and define f(I) = \bigvee_{i \in I} x_i \vee \bigvee_{i \notin I} !x_i. Then f is an injective map I \to A. Indeed, if f(I) = f(J), then suppose without loss of generality that i \in I \setminus J. Then f(I) \geq x_i and f(I) \geq !x_i, so f(I) \geq x_i \vee !x_i = 1. Since the x_i are pairwise mutually exclusive, the only way that f(I) = 1 is if I = J = \mathbb N. This proves Lemma 9, and hence Theorem 1.

A “friendly” proof of the Heine-Borel theorem

I learned a cute proof that [0, 1] is compact today, and felt the need to share it. By some basic point-set topology, this implies that [0, 1]^n is compact for any n, and then this implies that any closed and bounded subset of \mathbb R^n is compact; the converse is easy, so this proves the Heine-Borel theorem.

To prove it, we will use the infinite Ramsey theorem, which can be stated as “For every party with \omega guests, either \omega are all mutual friends, or \omega are all mutual strangers.” There are various elementary or not-so-elementary proofs of the infinite Ramsey theorem; see Wikipedia’s article on Ramsey’s theorem for an example. It is a vast generalization of the result from elementary combinatorics that says that “For every party with 6 guests, either 3 are friends or 3 are strangers.” Here \omega is the countable infinite cardinal.

We now claim that every sequence (x_n)_n of real numbers has a monotone subsequence. This completes the proof, since any bounded monotone sequence is Cauchy.

Say that n < m are friends if x_n < x_m; otherwise, say they are strangers. By the infinite Ramsey theorem, there is an infinite set A \subseteq \mathbb N such that either all elements of A are friends, or all elements of A are strangers. Passing to the subsequence indexed by A, we find a subsequence consisting only of friends (in which case it is increasing) or of strangers (in which case it either has a subsequence which is constant, or a subsequence which is decreasing).

By the way, this “friendly” property of \omega is quite special. If \kappa is an uncountable cardinal such that for every party with \kappa guests, either \kappa are mutual friends or \kappa are mutual strangers, then we say that \kappa is “weakly compact”, and can easily prove that \kappa is a large cardinal in the sense that it is inaccessible and has a higher strength than an inaccessible cardinal (or even of a proper class of them).

Noncommutative spaces

The central dogma of geometry is that a space X is the “same” data as the functions on X, i.e. the functions X \to R for R some ring. In analysis, of course, we usually take R to be the real numbers, or an algebra over the real numbers. Some examples of this phenomenon:

  • A smooth manifold X is determined up to smooth diffeomorphism by the sheaf of rings of smooth functions from open subsets of X into \mathbb R.
  • An open subset X of the Riemann sphere is determined up to conformal transformation is determined by the algebra of holomorphic functions X \to \mathbb C.
  • An algebraic variety X over a field K is determined up to isomorphism by the sheaf of rings of polynomial functions from Zariski open subsets of X into K.
  • The example we will focus on in this post: a compact Hausdorff space X is determined by the C*-algebra of continuous functions X \to \mathbb C.

In the above we have always assumed that the ring R was commutative, and in fact a field. As a consequence the algebra A of functions X \to R is also commutative. If we view A as “the same data” as X, then there should be some generalization of the notion of a space that corresponds to when A is noncommutative. I learned how to do this for compact Hausdorff spaces when I took a course on C*-algebras last semester, and I’d like to describe this generalization of the notion of compact Hausdorff space here.

Fix a Hilbert space H, and let B(H) be the algebra of bounded linear operators on H. By a C*-algebra acting on H, we mean a complex subalgebra of B(H) which is closed under taking adjoints and closed with respect to the operator norm. In case H = L^2(X) for some measure space X, we will refer to C*-algebras acting on H as C*-algebras on X.

In case X is a compact Hausdorff space, there are three natural C*-algebras on X. First is B(H) itself; second is the algebra B_0(H) of compact operators acting on H (in other words, the norm-closure of finite rank operators), and third is the C*-algebra C(X) of continuous functions on X, which acts on H by pointwise multiplication. The algebra C(X) is of course commutative.

If A is a commutative, unital C*-algebra and I is a maximal ideal in A, then A/I is a field, and by the Gelfand-Mazur theorem in fact A/I = \mathbb C. It follows that the maximal spectrum (i.e. the set of maximal ideals) of A is in natural bijection with the space of continuous, surjective algebra morphisms A \to \mathbb C; namely, I corresponds to the projection A \to A/I. In particular I is closed. One can show that every such morphism has norm 1, so lies in the unit ball of the dual space A*; and that the limit of a net of morphisms in the weakstar topology is also a morphism. Therefore the weakstar topology of A* restricts to the maximal spectrum of A, which is then a compact Hausdorff space by the Banach-Alaoglu theorem.

If A = C(X), then the maximal spectrum of A consists of the ideals I_x of functions f such that f(x) = 0, where x \in X. The projections A \to A/I_x are exactly of the form f \mapsto f(x). This is the content of the baby Gelfand-Naimark theorem: the maximal spectrum of C(X) is X, and conversely every commutative, unital C*-algebra arises this way. (The great Gelfand-Naimark theorem, on the other hand, guarantees that every C*-algebra, defined as a special kind of ring rather than analytically, is a C*-algebra acting [faithfully] on a Hilbert space as I defined above.) The baby Gelfand-Naimark theorem is far from constructive; the proof of the Banach-Alaoglu theorem requires the axiom of choice, especially when A is not separable.

Let us briefly run through the properties of X that we can easily recover from C(X). Continuous maps X \to Y correspond to continuous, unital algebra morphisms C(Y) \to C(X); a map f is sent to the morphism which pulls back functions along f. Points, as noted above, correspond to maximal ideals. If K is a closed subset of X and I is the ideal of functions that vanish on K, then A/I is the “localization” of A at I. (So inclusions of compact Hausdorff spaces correspond to projections of C*-algebras.)

If X is just locally compact, then the C*-algebra C_0(X) of continuous functions on X which vanish at the fringe of every compactification of X is not unital (since 1 does not vanish at the fringe), and the unital algebras A which extend C_0(X) correspond to compact Hausdorff spaces that contain X, in a sort of “Galois correspondence”. In fact, the one-point compactification of X corresponds to the minimal unital C*-algebra containing C_0(X), while the Stone-Cech compactification corresponds to the C*-algebra C_b(X) of bounded continuous functions on X. Therefore the compactifications of X correspond to the unital C*-algebras A such that C_0(X) \subset A \subseteq C_b(X), and surjective continuous functions which preserve the compactification structure, say Z \to Y, correspond to the inclusion maps C(Y) \to C(Z). Two examples of this phenomenon:

  • The C*-algebra C_b(\mathbb N) = \ell^\infty has a maximal spectrum whose points are exactly the ultrafilters on \mathbb N. In particular \ell^\infty/c_0 has a maximal spectrum whose points are exactly the free ultrafilters. This is why we needed the axiom of choice so badly.
  • The compactification of \mathbb R obtained as the C*-algebra generated by C_0(\mathbb R) and $\theta \mapsto e^{i\theta}$ is a funny-looking curve in \mathbb R^3. Try drawing it! (As a hint: think of certain spaces which are connected but not path-connected…)

If X = (X, d) is a metric space, then we can recover the metric d from its Lipschitz seminorm L. This is the seminorm Lf = \sup_{x_1, x_2} |f(x_1)-f(x_2)|/d(x_1,x_2), which is finite exactly if f is Lipschitz. One then has d(x_1, x_2) = \sup_{Lf \leq 1} |f(x_1) - f(x_2)|. The Lipschitz seminorm also satisfies the identities L1 = 0, and L(fg) \leq ||f||_\infty Lg + ||g||_\infty Lf, the latter being known as the Leibniz axiom. A seminorm \rho satisfying these properties gives rise to a metric on X, and if the resulting topology on X is actually the topology of X, we say that \rho is a Lip-norm.

Finally, if E \to X is a continuous vector bundle, then let \Gamma(E) denote the vector space of continuous sections of E. Then \Gamma(E) is a projective module over C(X), and Swan’s theorem says that every projective module arises this way. Moreover, the module structure of \Gamma(E) determines E up to isomorphism.

So now we have all we need to consider noncommutative compact Hausdorff spaces. By such a thing, I just mean a unital C*-algebra A, which I will call X when I am thinking of it as a space. Since A is noncommutative, we cannot appeal to the Gelfand-Mazur theorem, and in fact the notion of a maximal ideal doesn’t quite make sense. The correct generalization of maximal ideal is known as a “primitive ideal”: an ideal I is primitive if I is the annihilator of a simple A-module. (The only simple A-module is \mathbb C if A is commutative, so in that case I is maximal.) The primitive spectrum of A admits a Zariski topology, which coincides with the weakstar topology when A is commutative. So the points of X will consist of primitive ideals of A. Metrics on X will be Lip-norms; vector bundles will be projective modules.

What of functions? We already know that elements of A should be functions on X. But what is their codomain? Clearly not a field — they aren’t commutative! If I is a primitive ideal corresponding to a point x, we let A/I be the localization at x. This will be some C*-algebra, which is a simple module A_x that we view as the ring that functions send x into. (So this construction may send different points into different rings — this isn’t so different from algebraic geometry, however, where localization at an integer n sends points of \text{Spec} \mathbb Z into the ring \mathbb Z/n.) Matrix rings are simple, so often A_x will be a matrix ring.

Let’s run through an example of a noncommutative space. Let T be the unit circle in $\mathbb R^2$. The group $\mathbb Z/2$ acts on T by (x, y) \mapsto (x, -y). This corresponds to an action on the C*-algebra C(T) by f(x, y) \mapsto f(x, -y). Whenever a group G acts on a C*-algebra B, we can define a semidirect product B \rtimes G, and so we let our C*-algebra A be the semidirect product C(T) \rtimes \mathbb Z/2. It turns out that A is the C*-algebra generated by unitary operators that form a group isomorphic to \mathbb Z/2 * \mathbb Z/2, where * is the coproduct of groups.

We may express elements of A as f + gb, where b is the nontrivial element of \mathbb Z/2 and f,g \in C(T); A is a C*-algebra acting on L^2(T) by (f + gb)\xi(x, y) = f(x, y)\xi(x, y) + g(x, y)\xi(x, -y), for any \xi \in L^2(T). The center Z(A) of A, therefore, consists of f \in C(T) such that f(x, y) = f(x, -y). Therefore Z(A) is isomorphic to the C*-algebra C([-1, 1]), where f \in Z(A) is sent to \tilde f(x) = f(x, \pm y) (where x^2 + y^2 = 1, and the choice of sign does not matter by assumption on f).

The spectrum of Z(A) is easy to compute, therefore: it is just [-1, 1]! For every x \in [-1, 1], let A_x = A/I_x be “localization at x”, where I_x is the ideal generated by f \in Z(A) which vanish at x. Now let \cos \theta = x; one then has a morphism of C*-algebras A_x \to \mathbb C^{2 \times 2} by $f + gb \mapsto \frac{1}{2}\begin{bmatrix}f(e^{i\theta})&g(e^{i\theta})\\g(e^{-i\theta})&f(e^{-i\theta})\end{bmatrix}$. This morphism is always injective, and if x \in (-1, 1), it is also surjective. Therefore A_x = \mathbb C^{2 \times 2} in that case. Since \mathbb C^{2 \times 2} is a simple ring, it follows that the spectrum of A contains x.

But something funny happens at the end-points, corresponding to the fact that (\pm 1, 0) were fixed points of \mathbb Z/2. Since f(e^{i\theta}) = f(e^{-i\theta}) in that case and similarly for g, A_x is isomorphic to the 2-dimensional subalgebra of \mathbb C^{2 \times 2} consisting of symmetric matrices with a doubled diagonal entry. This is not a simple module, and in fact projects onto \mathbb C in two different ways; therefore there are two points in the spectrum of A corresponding to each of \pm 1!

Thus the primitive ideal space X of A is not Hausdorff; in fact it looks like [-1, 1], except that the end-points are doubled. This is similar to the “bug-eyed line” phenomenon in algebraic geometry.

What is the use of such a space? Well, Qiaochu Yuan describes a proof (that I believe is due to Marc Rieffel), using this space X, that if B is any C*-algebra and p, q \in B are projections such that ||p - q|| < 1, then there is a unitary operator u such that pu = uq at MathOverflow. The idea is that p, q actually project onto generators of the C*-algebra A = C(T) \rtimes \mathbb Z/2, using the fact that A is also generated by \mathbb Z/2 * \mathbb Z/2.

As a consequence, in any separable C*-algebra, there are only countably many projections. Thus one may view the above discussion as a very overcomplicated, long-winded proof of the fact that B(\ell^2) is not separable.

An ergodic take on the random graph

I wanted to learn about the random graph, which is normally an object studied in model theory, but I know very little model theory. Since I’m taking a course on ergodic theory this term, I figured now would be the time to figure out the basic properties of the random graph from an analyst’s perspective. (I’m sure none of the ideas here are original, though I haven’t seem them written down in this presentation before.)

We will be considering undirected graphs without loops, at most one edge per pair of vertices, and one vertex for each natural number. We want to think of such graphs as being encoded by real numbers, but since multiple decimal expansions encode the same real number, this will prove problematic. So instead, we will encode them using infinite binary strings.

Let C be the usual Cantor set of infinite binary strings. If 2 denotes the 2-element set then C is a product of countably many copies of 2. We define a probability measure mu on C. The topology of C is generated by cylinders C_y, where y ranges over finite binary strings. If y is length l, then \mu(C_y) = 2^{-\ell}. (Actually, we could replace mu by any Bernoulli measure, but this is slightly cleaner.) This measure mu extends to the Borel sets of C.

Fix your favorite bijection F from N to the set X of pairs (n, m) of natural numbers such that n < m. F lifts to a bijection between the power sets of N and X. Now the power set of N is C, while the power set of X, say G, is nothing more than the set of possible adjacency relations that could give N the structure of a graph. Pushing mu forward along F, we obtain a probability space G = (G, mu) whose outcomes are graphs on N! Moreover, to specify such a graph, we just need to choose a point in the Cantor set.

But wait, this probability space is highly uniform. That is, if we pick two elements of G at random, then almost surely they will be isomorphic. To see this, we show that almost surely, every finite graph embeds into a randomly chosen element of G.

To see that the graph G on N such that every countable graph embeds uniquely into G is unique up to isomorphism, let G’ be another such graph. Since both graphs are defined on N, there is a natural ordering on their vertices. We construct an isomorphism between G and G’ in countably many stages. Assume inductively that at stage 2n – 1, we have constructed isomorphic finite subgraphs H_{2n – 1} and H_{2n – 1}’ of G and G’ respectively, with 2n – 1 elements. At stage 2n, let g be the least element of G that does not appear in H_{2n – 1} and let H_{2n} be the union of g with H_{2n – 1}. Then H_{2n} is a finite graph, so is isomorphic to a subgraph H_{2n}’ of G’, which can be chosen to contain H_{2n – 1}’. Then at stage 2n + 1, let g’ be the least element of G’ which does not appear in H_{2n}’, adjoin it to get a graph H_{2n + 1}’, and find an isomorphism to a subgraph H_{2n + 1} of G. The H_n and H_n’ eventually exhaust G and G’, respectively, so give rise to an isomorphism between G and G’.

Now, to see that every finite graph embeds almost surely. It suffices to show that every finite binary string appears in a randomly chosen infinite binary string almost surely. To do this, say that an infinite binary string x is normal if 0 appears in x with frequency 1/2. If x is normal, then any finite word of length l appears in x with frequency 2^{-\ell}, and in particular appears at least once.

So it suffices to show that almost every infinite binary string is normal. This follows by the ergodic theorem. Let T be the shift map on C (so T(x_1x_2x_3\dots) = x_2x_3x_4\dots.) Clearly T is measure-preserving. Now define a Borel function f: C \to 2 by projection onto the first factor: f(x_1x_2x_3\dots) = x_1. Then f(T^n(x_1x_2x_3\dots)) = x_{n+1} and obviously the expected value of f is 1/2. So by the ergodic theorem, the frequency of 0 in x is the expected value of f, so we are done.

Nonatomic measures and the nonstandard reals, for analysts

This argument has been useful to me in the past, as a way of constructing the nonstandard real numbers (real numbers with infinitesimals). Of course, infinitesimals are useful for cheesing epsilon-delta arguments where the explicit values of the constants that appear aren’t important, but one often feels “guilty” about using them because they come off as “less rigorous.” This is probably because the proof that nonstandard reals exist is usually thought of as “difficult to explain” to a non-logician, but the proof that I will give is meant to be “measure-theoretic” in nature, so much friendlier to someone with a background in analysis.

First, an incomprehensible definition. We will define the nonstandard reals to be any ultrapower \mathbb R^\omega/\mathcal U, where \omega is the set of natural numbers and \mathcal U is a free ultrafilter on the powerset 2^\omega of \omega.

An ultrafilter is nothing more than a finitely additive measure valued in {0, 1}, which is said to be free if it is non-atomic. (Throughout this post, we will take measures to be finitely additive, rather than countably additive.) If a set X is sent to 1, we write X \in \mathcal U (so we identify the measure with the set of all sets which it sends to 1). Of course atomic ultrafilters on 2^\omega exist (just take any Dirac mass.) But as we will see, atomic ultrafilters are not very interesting for our purposes here.

Let \lambda be an infinite set, k an ordered field, and \mathcal U an ultrafilter on 2^\lambda. (This can be defined in much greater generality than ordered fields, but that’s all we’ll need for our purposes.) As usual k^\lambda is the set of all maps \lambda \to k (so if \lambda  = \omega, it is the sequence space of k). Now \mathcal U determines an equivalence relation on k^\lambda, by declaring that two maps x and y are equivalent if they are equal almost everywhere. The resulting quotient is known as the ultrapower k^\lambda/\mathcal U of k.

We give k^\lambda/\mathcal U the structure of an ordered field, as follows. First, 0 and 1 are nothing more than the equivalence classes of 0 and 1. We now observe that + and \cdot lift to pointwise operations on k^\lambda, and then descend to operations on k^\lambda/\mathcal U, just as we can define operations on the space L^p of Lebesgue-almost everywhere equivalence classes p-integrable functions by taking quotients of operations on the space \mathcal L^p of p-integrable functions. Finally, we declare x < y if x(n) < y(n) for almost every n \in \lambda.

We have to use some logical machinery somewhere in this proof, and so now we will. Specifically, we need Łoś’s theorem (also known as the fundamental theorem of ultrapowers, or the transfer principle.) To do this, we need the notion of a first-order formula in the language of ordered fields (or I will say “formula” for short). This is any mathematical statement about a given ordered field k which can be expressed using only finitely many copies of the symbols \exists, \forall, \implies, \neg (i.e. “not”), +, \cdot, <, =, 0, 1, and parentheses and variables, where \exists x means that there exists an element x of k (and not a subset of k, or a set of subsets of k, or so on) and \neg is negation. For example, the statement that \sqrt 2 exists is a formula, namely \exists x x\cdot x = 1 +1. A more sophisticated example of a formula is the quadratic formula, as would be a nice exercise to verify. We need infinitely many formulae to express that k is characteristic 0, namely \neg((1 + 1 + \dots + 1) = 0) for every length of 1’s. And we cannot express that k is archimedean even with infinitely many formulae.

Łoś’s theorem says that if k is an ordered field and k^\lambda/\mathcal U = F is any ultrapower of k, then for every formula \varphi that is true in k, \varphi is true in F as well (and if this conclusion holds, then we say that k and F are elementarily equivalent). Note carefully that elementary equivalence is weaker than isomorphism (indeed, by Łoś’s theorem, $\mathbb R$ is elementarily equivalent to the nonstandard reals, which are nonarchimedean).

To prove Łoś’s theorem, we use induction on the complexity of \varphi. For so-called “atomic formulae”, those that can be expressed without use of \neg, \implies, or \exists, this is an easy “almost everywhere” argument. For negation, notice that if \varphi = \neg\psi, then \psi is true almost everywhere iff \varphi is false almost everywhere by the inductive hypothesis. Similarly for implication. For quantification, assume \varphi = \exists x \psi. If \varphi is true in k, then there is an a \in k be such that a satisfies \psi. So the map which is constantly a satisfies \psi by the inductive hypothesis. On the other hand, if \varphi is true in k^\lambda/\mathcal U, then find a map A \in k^\lambda/\mathcal U satisfying \psi. Then there is a a in the image of A which satisfies \psi, since A satisfies \psi almost everywhere. This proves Łoś’s theorem.

We are almost done. The problem is that if we choose any ultrapower of \mathbb R, we have not accomplished anything. In fact if we take \mathbb R^\omega/\delta_0, then the map x \mapsto (x, x, \dots) is an isomorphism \mathbb R \to \mathbb R^\omega/\delta_0, so we have not shown the existence of infinitesimals. This is why we assumed that the ultrafilter \mathcal U was free.

To prove that a free ultrafilter exists, we define a filter \mathcal F to be a measure on a subalgebra \Sigma_{\mathcal F} of 2^\omega valued in {0, 1}, and declare that a filter is free if it is not atomic. Clearly the filter which sends finite sets to 0 and cofinite sets to 1 is non-atomic, so one exists. We order filters by saying that \mathcal F \leq \mathcal G if there is an inclusion of algebras \Sigma_{\mathcal F} \subseteq \Sigma_{\mathcal G}, and \mathcal F agrees with \mathcal G if they agree whenever both are defined, just like in the proof of the Hanh-Banach theorem. By taking limits we see that the supremum of filters is a filter. Now we apply Zorn’s lemma to the space of free filters, which is easily seen to be an ultrafilter.

(Unimportant aside: The existence of a \omega-additive free ultrafilter on 2^\omega is a very special property of \omega. It is not true for \mathbb R or most other infinite sets. In fact, if \lambda is uncountable and has a \lambda-additive free ultrafilter on 2^\lambda, we say that \lambda is a measurable cardinal. The existence of a measurable cardinal cannot be proven from the axioms of set theory, and in fact is much stronger than the consistency of said axioms.)

Now if \mathcal U is a free ultrafilter, then the ultrapower \mathbb R^\omega/\mathcal U is by definition the ordered field of nonstandard real numbers. By Łoś’s theorem, any statement which does not require quantifying over infinitely many reals or over sets of reals and is true in \mathbb R is still true in \mathbb R^\omega/\mathcal U. Moreover, the reals embed in \mathbb R^\omega/\mathcal U by the map x \mapsto (x, x, \dots). On the other hand, the sequence (1, 1/2, 1/3, 1/4, \dots), descends to an positive element \varepsilon of \mathbb R^\omega/\mathcal U which is smaller than any standard real number, since it is smaller than any standard real number almost everywhere. So we do have infinitesimals, and the inclusion \mathbb R \to \mathbb R^\omega/\mathcal U is not an isomorphism, as desired.

Discontinuous linear functionals and the axiom of choice

In lecture the other day, my functional analysis professor remarked that he could not think of any examples of a Banach space B and a linear functional f on B such that f was discontinuous which were not constructed without use of the axiom of choice, and said, “The logicians might have something to say about that.”

Indeed, Solovay’s model is a model of ZF (without the axiom of choice) in which every linear functional on every Banach space is continuous. Notably, this model thinks that every subset of every Polish space has the property of Baire (i.e. it differs from an open set by a meager set — this could never happen under ZFC, since a nonmeasurable set does not have the property of Baire). For the purposes of this problem, we might as well assume that B is Polish, since if not, and f is an unbounded functional, then there is a linearly independent sequence x_n in B such that |f(x_n)| \to \infty. The completion of the span of the sequence x_n is a separable (hence Polish) Banach space on which f is discontinuous.

We now apply Pettis’s theorem:

Let G be a Baire group and H be a Polish group, and \varphi: G \to H be a (possibly discontinuous) morphism of groups. If, for every open set U \subseteq G, \varphi^{-1}(U) has the property of Baire, then \varphi is actually continuous.

Pettis’s theorem is, for example, Theorem 9.10 in Kechris’ book “Classical Descriptive Set Theory.” The proof is not especially difficult.

Since B is a Banach space, the Baire category theorem implies that B is an additive Baire group; while \mathbb R is an additive Polish group and f is a morphism of groups. We can assume that Solovay’s model thinks that any A \subseteq B has the property of Baire, so f is continuous by Pettis’s theorem.

What is a stochastic graph?

Consider a population of N(t) individuals, each of which are either infected with Disease X or not. On day t \in \mathbb N, individual i has probability \beta_{ij}(t) \in [0, 1] of infecting individual j with Disease X. Besides, individual i was a probability c_i(t) of death, which depends on the age of i and his state of infection. New healthy individuals enter into the population at a rate \psi(t).

This came up as a proposed model for studying the spread of HIV when I was at the San Diego State REU last summer. We considered drawing a graph whose nodes represent individuals, such that an edge (i,j) existed if individual i was infected with HIV and in sexual contact with individual j.

We never went through with this model, for lack of time (and because we feared that if N(t) was large enough to be realistic, computation time on MATLAB would become rather intractable for our resources); we were also unsure of how to prove theoretical results about such a model, such as the basic reproduction number \mathcal R_0 (defined to be the expected number of new infections from infected individual i when i is introduced to an otherwise perfectly susceptible population) and the expected prevalence, defined to be the expected number of infected individuals I(t) divided by N(t). I am not familiar with work on problems such as this, but let me record a few prospective definitions for my own amusement.

Let T be an index set \ni 0 (which we shall assume \subseteq \mathbb N for simplicity) and let \Sigma,\Phi,\Psi be probability spaces.

Let’s say that a graph G = (V, E) (whose vertices are indexed by \mathbb N) is decorated by (\Sigma,\Phi,\Psi) if for the graph G, each node i \in V and each edge (i,j) \in E we associate random variables X: \Sigma \to \mathbb R, X_i: \Phi: \mathbb R, X_{ij}: \Psi \to \mathbb R.

Let \Omega = \Sigma \times \Phi^{\mathbb N} \times \Psi^{\mathbb N^2}, the product taken in the sense of measure spaces. Then a decoration is nothing more than a random variable \Omega \to \mathbb R. We interpret decorations as referring to the variables \beta_{ij}, c_i, \psi, etc. described above.

Let \mathcal{RG}_{\Sigma,\Phi,\Psi} be the category of random graphs which have been decorated by (\Sigma,\Phi,\Psi).

A map \mathbf G: T \to \text{Hom}(\Omega,\mathcal{RG}_{\Sigma,\Phi,\Psi}) is a stochastic graph if \mathbf G(0) is deterministic (i.e. there is a fixed graph \mathbf G_0, which we call the initial data, such that for each \omega \in \Omega \mathbf G(0)(\omega) = \mathbf G_0 almost surely).

We interpret a map \Omega \to \mathcal{RG}_{\Sigma,\Phi,\Psi} as an assignment which takes the decoration \omega_t of \mathbf G(t)(\omega_{t-1}) and returns the random graph \mathbf G(t+1)(\omega_t) (so that at time t+1, the random graph is “parametrized by the decorations of time t“).

As it is, it seems like I’ve written down a very broad definition, and I’m not sure if it’s actually useful. Indeed, I’m not too sure how to restrict this definition to particularly useful cases. It’ll be an interesting problem to keep in the back of my head, until I grow bored of it or find some literature on the matter.