Lately I’ve been thinking a lot about the basics of probability theory. I originally learned the definition of probability measure from a logician — Wes Holliday — and his definition has probably permanently influenced how I think about probability, but I think that even analysts like myself agree that the formulation of probability in terms of sample spaces is a little unnatural. For example, one typically makes it a point to never mention specific points in the sample space, but only events — elements of a sigma-algebra over the sample space. Moreover, one typically identifies an event which happens almost surely with the sure event. Random variables are identified with their distributions, and so isomorphism of random variables “forgets” about events that almost surely do not happen as well as individual points in the sample space. Sample spaces are also utterly uninteresting: up to measurable isomorphism, there is only one uncountable standard Borel space, and in practice most probability spaces are Polish and therefore standard. There is also a lot of “junk” in the power set of a standard Borel space: the measurable sets form only a small fraction of the power set, in the sense of cardinality. Simply put, the sample space contains far more information than a probabilist is willing to use.
Let me stress that once one actually gets to serious probability theory, which I guess I would define as beginning when one proves the central limit theorem, the above is not terribly important (though it can help — see An uncountable Moore-Schmidt theorem, a paper in ergodic theory which in part inspired this blog post). Worrying about whether one has extra information won’t help you prove inequalities, which is what analysis is all about. But I think when one is still learning the basics, or is considering results which are either deeply infinitary (as in the case of the uncountable Moore-Schmidt theorem) or purely “measurable” (only having to do with sigma-algebras), it is instructive to step back and think about what a sample space really is.
As for myself, I’m on the teaching staff of a measure theory course this semester, and an exercise this week was to prove the following theorem, which appears in the exercises of both Billingsley and Papa Rudin.
Theorem 1. If $\mathcal F$ is an infinite sigma-algebra, then $\mathcal F$ has cardinality at least that of $\mathbb R$.
This exercise is hard essentially because it is not measure-theoretic. In fact, while the student is probably tempted to concoct a horribly complicated measure-theoretic argument, the statement is purely combinatorial in nature, and has a simple combinatorial proof if one forgets the underlying set $\Omega$ over which $\mathcal F$ is defined. Of course, this proof is totally equivalent to a similar proof about sigma-algebras, but there will be some pitfalls — I will point out one in particular — that arise if one is tempted to refer to $\Omega$, and so that one is not tempted by Lucifer in his guise as $\Omega$, it is better to remove that temptation entirely.
Before proving Theorem 1, let me demonstrate how one can think about measure theory and probability without ever referring to $\Omega$.
We start with the combinatorial preliminaries.
Definition 2. By a countably complete lattice, one means a partially ordered set $\mathcal F$ such that every countable subset of $\mathcal F$ has a supremum and infimum. In this case, we write $A \wedge B$ (read "$A$ and $B$") to denote $\inf\{A, B\}$, and similarly $A \vee B$ ("$A$ or $B$") for $\sup\{A, B\}$; for countable families we likewise write $\bigwedge_n A_n$ and $\bigvee_n A_n$. One says "$A$ implies $B$" to mean $A \leq B$.
Definition 3. By a countably complete boolean algebra one means a countably complete lattice $\mathcal F$ equipped with a minimum $0$ ("false") and a maximum $1$ ("true"), such that:
- For every $A, B, C \in \mathcal F$, $A \wedge (B \vee C) = (A \wedge B) \vee (A \wedge C)$.
- For every $A \in \mathcal F$, there is a unique $\neg A$ ("not $A$") such that $A \wedge \neg A = 0$ and $A \vee \neg A = 1$.
Since we will only need countably complete boolean algebras, we will simply refer to such objects as boolean algebras. One immediately checks that the morphisms in the category of boolean algebras are those maps which preserve countable "and", countable "or", "true", and "false"; that the usual laws of classical logic hold in any boolean algebra; and that every sigma-algebra is a boolean algebra under $\subseteq$. Conversely, I think that, given a suitable form of the axiom of choice, one can recover a measurable space $(\Omega, \mathcal F)$ from a boolean algebra $\mathcal F$ by letting $\Omega$ consist of a suitable subset of the set of ultrafilters on $\mathcal F$; however, the choice of $\Omega$ is not unique (a nice exercise).
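If it helps to see the axioms in one concrete example, here is a small Python sketch, purely illustrative and not part of the mathematics above (and slightly cheating, since it reinstates a sample space): it models the power set of a three-point set as a boolean algebra and brute-force checks the distributivity and complementation axioms of Definition 3.

```python
from itertools import combinations

# Toy model (illustration only): the power set of a finite Omega,
# viewed as a boolean algebra with meet = intersection, join = union.
Omega = frozenset({1, 2, 3})

def powerset(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

F = powerset(Omega)                 # the events
meet = lambda a, b: a & b           # "and"
join = lambda a, b: a | b           # "or"
neg = lambda a: Omega - a           # "not"
false, true = frozenset(), Omega

# Distributivity: A and (B or C) == (A and B) or (A and C)
assert all(meet(A, join(B, C)) == join(meet(A, B), meet(A, C))
           for A in F for B in F for C in F)

# Complements: A and (not A) == false, A or (not A) == true
assert all(meet(A, neg(A)) == false and join(A, neg(A)) == true for A in F)

print("power set of", set(Omega), "satisfies the boolean algebra axioms")
```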
The setting of boolean algebras feels like a very natural setting for probability theory to me, because probability theory by design is concerned with events, things for which it makes sense to conjoin with the words "and" and "or" and "not". One can now easily use the language of propositional logic: for example, two events $A$ and $B$ are mutually exclusive if $A \wedge B = 0$. Let us now state the usual basic notions of probability theory in this new language:
Definition 4. Let $\mathcal F$ be a boolean algebra. A probability measure $\mathbb P$ on $\mathcal F$ is a mapping $\mathbb P: \mathcal F \to [0, 1]$ such that $\mathbb P(1) = 1$ and, whenever $A_1, A_2, \ldots$ are a sequence of pairwise mutually exclusive events, $\mathbb P\left(\bigvee_n A_n\right) = \sum_n \mathbb P(A_n)$.
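Here is a similarly toy Python sketch, again illustrative rather than part of the argument, and again cheating by using a finite power set: a probability measure built from made-up point masses, with the additivity of Definition 4 reduced to the finite case.

```python
# Illustration only: a probability measure on the power set of a finite
# Omega, built from point masses. The weights are invented for the example.
Omega = frozenset({1, 2, 3, 4})
weights = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

def P(A):
    """Probability of an event A, i.e. an element of the boolean algebra 2^Omega."""
    return sum(weights[w] for w in A)

assert abs(P(Omega) - 1.0) < 1e-12      # P(true) = 1
assert P(frozenset()) == 0.0            # P(false) = 0

# Additivity on pairwise mutually exclusive events (finite case of Definition 4):
A, B, C = frozenset({1}), frozenset({2, 3}), frozenset({4})
assert all(X & Y == frozenset() for X, Y in [(A, B), (A, C), (B, C)])
assert abs(P(A | B | C) - (P(A) + P(B) + P(C))) < 1e-12
print("P(A or B or C) =", P(A | B | C))
```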
Definition 5. Let $\mathcal F$ be a boolean algebra and $E$ be a standard Borel space. An $E$-valued random variable $X$ on $\mathcal F$ is a morphism of boolean algebras $X: \mathcal B(E) \to \mathcal F$, where $\mathcal B(E)$ denotes the Borel sigma-algebra of $E$.
Definition 6. Let $\mathcal F$ be a boolean algebra and $E$ be a standard Borel space. Let $X$ be an $E$-valued random variable on $\mathcal F$. If $\mathbb P$ is a probability measure on $\mathcal F$ and $B \in \mathcal B(E)$, one writes $\mathbb P(X \in B)$ to mean $\mathbb P(X(B))$. The distribution of $X$ is the measure $B \mapsto \mathbb P(X \in B)$ on $\mathcal B(E)$. If $E = \mathbb R$, the cdf of $X$ is the function $x \mapsto \mathbb P(X \in (-\infty, x])$.
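To see Definitions 5 and 6 in miniature, here is an illustrative Python sketch of a "die" random variable presented as a pullback map. Everything in it (the set, the weights, the map value) is invented for the example, and Borel sets are passed around as indicator functions.

```python
# Illustration only: an R-valued "random variable" on the power set of a
# six-point Omega (a die), presented as the pullback map of Definition 5:
# a Borel set B is sent to the event {omega : value(omega) in B}.
Omega = frozenset(range(1, 7))
weights = {w: 1 / 6 for w in Omega}       # assumed: a fair die
value = lambda w: float(w)                # the classical X: Omega -> R

def X(borel_indicator):
    """The morphism B(R) -> 2^Omega; a Borel set is passed as its indicator."""
    return frozenset(w for w in Omega if borel_indicator(value(w)))

def P(A):
    return sum(weights[w] for w in A)

# P(X in B) := P(X(B)); here B = [2, 4]
B = lambda x: 2 <= x <= 4
print("P(X in [2,4]) =", P(X(B)))

# cdf of X at x: F_X(x) = P(X in (-inf, x])
cdf = lambda x: P(X(lambda t: t <= x))
print("F_X(3.5) =", cdf(3.5))
```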
Definition 7. Let $\mathcal F, \mathcal G$ be boolean algebras. Let $X, Y$ be $E$-valued random variables on $\mathcal F, \mathcal G$ respectively. A morphism of random variables $X \to Y$ is a morphism of boolean algebras $\varphi: \mathcal G \to \mathcal F$ such that $X = \varphi \circ Y$.
Given these definitions, I think that it is now straightforward to formulate the central limit theorem, Borel-Cantelli lemma, et cetera. One can drop the word "almost" from many such statements by passing to the quotient $\mathcal F / \mathcal N$, where $\mathcal N$ is the ideal of events that almost never happen.
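As a toy illustration of that quotient (with an invented finite example, so only a caricature of the infinitary setting), the following Python sketch identifies events that differ only by null points and checks that the measure descends to the quotient.

```python
from itertools import combinations

# Illustration only: passing to the quotient by the ideal of null events.
# Here Omega = {1,2,3,4}, the point 4 has probability zero, and two events
# are identified when they differ only by null points.
Omega = frozenset({1, 2, 3, 4})
weights = {1: 0.5, 2: 0.25, 3: 0.25, 4: 0.0}   # invented weights

def powerset(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def P(A):
    return sum(weights[w] for w in A)

null = frozenset(w for w in Omega if weights[w] == 0)

# The class of A in the quotient algebra: its representative A \ null.
cls = lambda A: A - null

classes = {cls(A) for A in powerset(Omega)}
print(len(powerset(Omega)), "events collapse to", len(classes), "classes")

# P descends to the quotient: equivalent events have equal probability.
assert all(P(A) == P(cls(A)) for A in powerset(Omega))
```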
We now prove Theorem 1 in two lemmata.
Lemma 8. Let $\mathcal F$ be an infinite boolean algebra. Then there exists an infinite set $\{A_n : n \in \mathbb N\} \subseteq \mathcal F$ of pairwise mutually exclusive nonzero events.
To prove Lemma 8, let $\mathcal F_0 = \mathcal F$ and reason by induction. Suppose that we have an infinite boolean algebra $\mathcal F_n$ which is a quotient of $\mathcal F$ and events $A_1, \ldots, A_n$. Since $\mathcal F_n$ is infinite, there exists $A_{n+1} \in \mathcal F_n$ such that $\mathcal F_n / \langle A_{n+1} \rangle$ is infinite and $A_{n+1} \neq 0$.
We now condition on $\neg A_{n+1}$, thus set $\mathcal F_{n+1} = \mathcal F_n / \langle A_{n+1} \rangle$. Here $\langle A_{n+1} \rangle$ is the ideal generated by $A_{n+1}$. There is a natural bijection $\mathcal F_n \to \mathcal F_n / \langle A_{n+1} \rangle \times \mathcal F_n / \langle \neg A_{n+1} \rangle$, since any element $B \in \mathcal F_n$ can be decomposed as $B = (B \wedge \neg A_{n+1}) \vee (B \wedge A_{n+1})$, giving a decomposition $B \mapsto (B \wedge \neg A_{n+1}, B \wedge A_{n+1})$. Therefore we can view $A_{n+1}$ as not just an element of $\mathcal F_n$ but an element of $\mathcal F_{n-1}, \ldots, \mathcal F_1$, and as an element of $\mathcal F$ (where it is identified with $A_{n+1} \wedge \neg A_n \wedge \cdots \wedge \neg A_1$).
The bijection also shows that a suitable $A_{n+1}$ always exists: picking any $B \in \mathcal F_n$ with $B \neq 0, 1$, at least one of the two factors $\mathcal F_n / \langle B \rangle$, $\mathcal F_n / \langle \neg B \rangle$ is infinite, and we take $A_{n+1}$ to be $B$ or $\neg B$ accordingly. Thus $\mathcal F_{n+1}$ is infinite, so we can keep the induction going. Viewing each $A_n$ as an element of $\mathcal F$, one easily checks that the $A_n$ form a sequence of pairwise mutually exclusive nonzero events. This completes the proof of Lemma 8.
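For readers who like to see the splitting concretely, here is an illustrative Python check (finite and power-set-based, hence only a caricature of the general argument) of the bijection $\mathcal F \to \mathcal F / \langle A \rangle \times \mathcal F / \langle \neg A \rangle$ used above.

```python
from itertools import combinations

# Illustration only: for a power-set algebra, the quotient by the ideal
# generated by A can be modeled as the power set of the complement of A,
# and B |-> (B \ A, B /\ A) is a bijection F -> F/<A> x F/<not A>.
Omega = frozenset({1, 2, 3, 4})
A = frozenset({1, 2})               # an arbitrary event with A != 0, 1

def powerset(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

F = powerset(Omega)
split = {B: (B - A, B & A) for B in F}

# Injectivity: the pairs are pairwise distinct.
assert len(set(split.values())) == len(F)
# Surjectivity onto the product of the two quotients.
assert set(split.values()) == {(p, q) for p in powerset(Omega - A)
                               for q in powerset(A)}
print("|F| =", len(F), "=", len(powerset(Omega - A)), "x", len(powerset(A)))
```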
Here, a lot of students got stuck (in the usual "sample space" language, where $\mathcal F$ is a sigma-algebra on a set $\Omega$). The reason is that they tried to construct the set $\{A_n\}$ using properties of $\Omega$. For example, a common thing to do was to try to let $\Omega$ be in bijection with a subset of $\mathcal F$ by letting $A_\omega = \bigcap \{A \in \mathcal F : \omega \in A\}$. In general $A_\omega$ might be a nonmeasurable set; this can be fixed by assuming towards contradiction that $\mathcal F$ is countable. More importantly, the map $\omega \mapsto A_\omega$ is not injective in general, essentially for the same reason that $\omega$ is not uniquely determined by the events in $\mathcal F$ that contain it. By replacing $\Omega$ with a suitable quotient, and showing that this quotient is also infinite, one can force $\omega \mapsto A_\omega$ to be injective, but this can be rather messy. In addition, assuming that $\mathcal F$ is countable will weaken this lemma, so that it does not imply Theorem 1 unless one invokes the continuum hypothesis. Moral: do not ever refer to points in the sample space!
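Here is a tiny Python example of the pitfall (the four-event sigma-algebra below is invented for illustration): the map sending a point to the smallest event containing it fails to be injective, because the points 2 and 3 are never separated by any event.

```python
# Illustration only: on Omega = {1, 2, 3} with the four-element
# sigma-algebra F below, the map w |-> A_w (intersection of all events
# containing w) is not injective, since atom(2) == atom(3).
Omega = frozenset({1, 2, 3})
F = [frozenset(), frozenset({1}), frozenset({2, 3}), Omega]

def atom(w):
    """Intersection of all events containing w (measurable here since F is finite)."""
    result = Omega
    for A in F:
        if w in A:
            result &= A
    return result

print({w: set(atom(w)) for w in Omega})   # atom(2) == atom(3) == {2, 3}
assert atom(2) == atom(3)
```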
Lemma 9. Let $\mathcal F$ be a boolean algebra and suppose that there is a countably infinite set $\{A_n : n \in \mathbb N\} \subseteq \mathcal F$ of pairwise mutually exclusive nonzero events. Then the boolean algebra $\mathcal G \subseteq \mathcal F$ generated by the $A_n$ has cardinality at least $2^{\aleph_0}$.
To prove Lemma 9, let $A_1, A_2, \ldots$ be an enumeration of the given events. Let $S \subseteq \mathbb N$ be a set of natural numbers and define $B_S = \bigvee_{n \in S} A_n$. Then $S \mapsto B_S$ is an injective map $2^{\mathbb N} \to \mathcal G$. Indeed, if $S \neq T$, then suppose without loss of generality that there is some $m \in S \setminus T$. Then $A_m$ implies $B_S$ and $A_m \wedge B_T = \bigvee_{n \in T} (A_m \wedge A_n)$, so if $B_S = B_T$ then $A_m = A_m \wedge B_T = \bigvee_{n \in T} (A_m \wedge A_n)$. Since the $A_n$ are pairwise mutually exclusive, the only way that this can happen is if $A_m = 0$, which contradicts our assumption. This proves Lemma 9, and hence Theorem 1.
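For the skeptical reader, here is an illustrative Python sketch of the injection $S \mapsto B_S$ truncated to finitely many events; the $A_n$ are modeled as disjoint singletons, which is of course an invented toy model rather than the general situation.

```python
from itertools import combinations

# Illustration only: the injection of Lemma 9, truncated to finitely many
# events. The A_n are modeled as pairwise disjoint nonzero events (here
# singletons), and distinct index sets give distinct joins.
N = 5
A = [frozenset({n}) for n in range(N)]

def powerset(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def B(S):
    """The join of the A_n over n in S."""
    out = frozenset()
    for n in S:
        out |= A[n]
    return out

subsets = powerset(range(N))
images = [B(S) for S in subsets]
assert len(set(images)) == len(subsets) == 2 ** N   # S |-> B_S is injective
print("2^%d = %d distinct events generated" % (N, len(set(images))))
```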
Fantastic Aiden! Of course, my social science brain can’t absorb most of your paper (blog), but it is certainly fascinating and incredible. Keep up the great work!
Hi Johnson! Forgive me for being a bit overly verbose in this article, hah.