Conditional Probability and Conditional Expectation

A quick notebook on Conditional Probability and Conditional Expectation in statistics and machine learning.
Author: Mustafa Aslan
Affiliation: Cardiff University, UK
Published: April 23, 2025

The Discrete Case

For any two events \(E\) and \(F\), the conditional probability of \(E\) given \(F\) is defined, as long as \(P(F) > 0\), by

\[ P(E|F) = \frac{P(E \cap F)}{P(F)}. \]
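As a small illustration of this definition, the sketch below enumerates an equally likely sample space and computes \(P(E|F)\) as a ratio of exact fractions. The two-dice setup and the helper names `prob` and `cond_prob` are assumptions made for this example only.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice (an illustrative assumption).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on equally likely outcomes."""
    favourable = sum(1 for w in outcomes if event(w))
    return Fraction(favourable, len(outcomes))

def cond_prob(E, F):
    """P(E | F) = P(E and F) / P(F), defined only when P(F) > 0."""
    p_F = prob(F)
    if p_F == 0:
        raise ValueError("P(F) must be positive")
    return prob(lambda w: E(w) and F(w)) / p_F

def sum_is_eight(w):
    return w[0] + w[1] == 8

def first_is_three(w):
    return w[0] == 3

p = cond_prob(sum_is_eight, first_is_three)
print(p)  # 1/6
```

Here conditioning raises the probability of a total of eight from 5/36 to 1/6, since once the first die shows three only the outcome (3, 5) remains favourable.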

Hence, if \(X\) and \(Y\) are discrete random variables with joint probability mass function \(p(x,y)\), it is natural to define the conditional probability mass function of \(X\) given that \(Y = y\), for all \(y\) such that \(p_Y(y) = P(Y = y) > 0\), by

\[ \begin{align*} p_{X|Y}(x|y) &= P(X = x | Y = y) \\ &= \frac{P(X = x, Y = y)}{P(Y = y)} \\ &= \frac{p(x,y)}{p_Y(y)}. \end{align*} \]
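The ratio above can be sketched directly on a joint table. The joint probabilities below are made-up values chosen for illustration, and `marginal_y` and `conditional_pmf` are hypothetical helper names.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y), keyed by (x, y); the values are assumptions.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

def marginal_y(y):
    """p_Y(y): sum of p(x, y) over all x."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def conditional_pmf(y):
    """Return {x: p_{X|Y}(x|y)}; only defined when p_Y(y) > 0."""
    p_y = marginal_y(y)
    if p_y == 0:
        raise ValueError("p_Y(y) must be positive")
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}

cond = conditional_pmf(0)
print(cond[0], cond[1])  # 1/4 3/4
```

Dividing by \(p_Y(y)\) renormalises the row of the joint table for that \(y\), so the conditional probabilities sum to one.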

Similarly, the conditional probability distribution function of \(X\) given that \(Y=y\) is defined, for all \(y\) such that \(P(Y = y) > 0\), by

\[ \begin{align*} F_{X|Y}(x|y) &= P(X \leq x | Y = y) \\ &= \sum_{a \leq x} p_{X|Y}(a|y). \end{align*} \]

The conditional expectation of \(X\) given that \(Y = y\) is defined by \[ E[X | Y = y] = \sum_{x} x P(X = x | Y = y) = \sum_{x} x p_{X|Y}(x|y). \]
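A minimal sketch of the conditional distribution function and conditional expectation in the discrete case, using exact fractions. The joint table is an invented example, and the function names are assumptions for illustration.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y), keyed by (x, y); the values are assumptions.
joint = {
    (0, 0): Fraction(1, 10), (1, 0): Fraction(2, 10), (2, 0): Fraction(2, 10),
    (0, 1): Fraction(3, 10), (1, 1): Fraction(1, 10), (2, 1): Fraction(1, 10),
}

def conditional_pmf(y):
    """Return {x: p_{X|Y}(x|y)}; only defined when P(Y = y) > 0."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    if p_y == 0:
        raise ValueError("P(Y = y) must be positive")
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}

def conditional_cdf(x, y):
    """F_{X|Y}(x|y): sum of p_{X|Y}(a|y) over all a <= x."""
    return sum(p for a, p in conditional_pmf(y).items() if a <= x)

def conditional_expectation(y):
    """E[X | Y = y]: sum of x * p_{X|Y}(x|y) over all x."""
    return sum(x * p for x, p in conditional_pmf(y).items())

print(conditional_cdf(1, 0))       # 3/5
print(conditional_expectation(0))  # 6/5
print(conditional_expectation(1))  # 3/5
```

The conditional expectation changes with \(y\), which is what allows it to be treated as a function of \(Y\).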

The Continuous Case

If \(X\) and \(Y\) have a joint probability density function \(f(x, y)\), then the conditional probability density function of \(X\), given that \(Y = y\), is defined for all values of \(y\) such that \(f_Y(y) > 0\), by

\[ \begin{align*} f_{X|Y}(x|y) &= \frac{f(x,y)}{f_Y(y)} \\ &= \frac{f(x,y)}{\int_{-\infty}^{\infty} f(a,y) \, da}. \end{align*} \]

The conditional distribution function of \(X\) given that \(Y = y\) is defined by

\[ \begin{align*} F_{X|Y}(x|y) &= P(X \leq x | Y = y) \\ &= \int_{-\infty}^{x} f_{X|Y}(a|y) \, da \\ &= \int_{-\infty}^{x} \frac{f(a,y)}{f_Y(y)} \, da \\ &= \frac{1}{f_Y(y)} \int_{-\infty}^{x} f(a,y) \, da. \end{align*} \]
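The sketch below evaluates these formulas numerically for one assumed joint density, \(f(x,y) = x + y\) on the unit square and zero elsewhere. Both the density and the midpoint-rule helper `integrate` are choices made for this illustration. Because the density vanishes outside \([0,1]\), the infinite integration limits reduce to that interval.

```python
def f(x, y):
    """Assumed joint density: x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def integrate(g, a, b, n=1000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def marginal_y(y):
    """f_Y(y): integral of f(x, y) over x (the support is [0, 1])."""
    return integrate(lambda x: f(x, y), 0.0, 1.0)

def cond_density(x, y):
    """f_{X|Y}(x|y) = f(x, y) / f_Y(y), defined only when f_Y(y) > 0."""
    f_y = marginal_y(y)
    if f_y <= 0.0:
        raise ValueError("f_Y(y) must be positive")
    return f(x, y) / f_y

def cond_cdf(x, y):
    """F_{X|Y}(x|y) = (1 / f_Y(y)) * integral of f(a, y) da up to x."""
    f_y = marginal_y(y)
    if f_y <= 0.0:
        raise ValueError("f_Y(y) must be positive")
    upper = min(max(x, 0.0), 1.0)  # clip to the support of the density
    return integrate(lambda a: f(a, y), 0.0, upper) / f_y

print(round(cond_density(0.25, 0.5), 6))  # 0.75
print(round(cond_cdf(0.5, 0.5), 6))       # 0.375
```

For this density \(f_Y(y) = y + 1/2\), so at \(y = 0.5\) the conditional density is simply \(x + 0.5\) on \([0,1]\), which matches the printed values.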

The conditional expectation of \(X\) given that \(Y = y\) is defined, for all values of \(y\) such that \(f_Y(y) > 0\), by \[ \begin{align*} E[X | Y = y] &= \int_{-\infty}^{\infty} x f_{X|Y}(x|y) \, dx \\ &= \int_{-\infty}^{\infty} x \frac{f(x,y)}{f_Y(y)} \, dx \\ &= \frac{1}{f_Y(y)} \int_{-\infty}^{\infty} x f(x,y) \, dx. \end{align*} \]
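As a numerical sketch of the last line of this formula, take the assumed joint density \(f(x,y) = x + y\) on the unit square (an example chosen here, not part of the definition). Integrating by hand gives \(E[X | Y = y] = (1/3 + y/2) / (1/2 + y)\), which the code compares against a midpoint-rule approximation.

```python
def f(x, y):
    """Assumed joint density: x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def integrate(g, a, b, n=1000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def cond_expectation(y):
    """E[X | Y = y] = (1 / f_Y(y)) * integral of x * f(x, y) dx."""
    f_y = integrate(lambda x: f(x, y), 0.0, 1.0)
    if f_y <= 0.0:
        raise ValueError("f_Y(y) must be positive")
    return integrate(lambda x: x * f(x, y), 0.0, 1.0) / f_y

def closed_form(y):
    """Hand-derived value for this particular density."""
    return (1.0 / 3.0 + y / 2.0) / (0.5 + y)

for y in (0.0, 0.5, 1.0):
    print(y, round(cond_expectation(y), 4), round(closed_form(y), 4))
```

The two columns agree to the displayed precision; the small remaining gap comes from the quadrature error of the midpoint rule.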

Computing Expectations by Conditioning

Let us denote by \(E[X|Y]\) the function of the random variable \(Y\) whose value at \(Y = y\) is \(E[X|Y = y]\). Note that \(E[X|Y]\) is itself a random variable. An extremely important property of conditional expectation is that, for all random variables \(X\) and \(Y\), \[ E[X] = E[E[X|Y]]. \]

If \(Y\) is a discrete random variable, then \[ E[X] = \sum_{y} E[X|Y = y] P(Y = y). \]
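The discrete identity can be checked exactly on a small example. The joint table below is invented for illustration, and `p_y` and `cond_expectation` are hypothetical helper names. The code computes \(E[X]\) once directly from the joint distribution and once by conditioning on \(Y\).

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y), keyed by (x, y); the values are assumptions.
joint = {
    (0, 0): Fraction(1, 10), (1, 0): Fraction(2, 10), (2, 0): Fraction(2, 10),
    (0, 1): Fraction(3, 10), (1, 1): Fraction(1, 10), (2, 1): Fraction(1, 10),
}

def p_y(y):
    """P(Y = y): sum of p(x, y) over all x."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def cond_expectation(y):
    """E[X | Y = y] = sum over x of x * p(x, y) / P(Y = y)."""
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y(y)

# Direct computation: E[X] = sum over (x, y) of x * p(x, y).
direct = sum(x * p for (x, _), p in joint.items())

# By conditioning: E[X] = sum over y of E[X | Y = y] * P(Y = y).
y_values = sorted({y for (_, y) in joint})
by_conditioning = sum(cond_expectation(y) * p_y(y) for y in y_values)

print(direct, by_conditioning)  # 9/10 9/10
```

Using `Fraction` keeps the arithmetic exact, so the two results match exactly rather than only up to rounding.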

If \(Y\) is continuous with density \(f_Y(y)\), then \[ E[X] = \int_{-\infty}^{\infty} E[X|Y = y] f_Y(y) \, dy. \]
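A numerical sketch of the continuous identity, again under an assumed joint density \(f(x,y) = x + y\) on the unit square, for which \(E[X] = 7/12\) by direct integration. The midpoint-rule helper `integrate` and the grid size are illustrative choices, so the agreement is only approximate.

```python
def f(x, y):
    """Assumed joint density: x + y on the unit square (support [0, 1] x [0, 1])."""
    return x + y

def integrate(g, a, b, n=200):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f_Y(y):
    """Marginal density of Y: integral of f(x, y) over x."""
    return integrate(lambda x: f(x, y), 0.0, 1.0)

def cond_expectation(y):
    """E[X | Y = y] = (1 / f_Y(y)) * integral of x * f(x, y) dx."""
    return integrate(lambda x: x * f(x, y), 0.0, 1.0) / f_Y(y)

# Direct: E[X] = integral of x * f_X(x) dx, with f_X obtained by integrating out y.
direct = integrate(lambda x: x * integrate(lambda y: f(x, y), 0.0, 1.0), 0.0, 1.0)

# By conditioning: E[X] = integral of E[X | Y = y] * f_Y(y) dy.
by_conditioning = integrate(lambda y: cond_expectation(y) * f_Y(y), 0.0, 1.0)

print(round(direct, 4), round(by_conditioning, 4))  # 0.5833 0.5833
```

Both routes approximate \(7/12 \approx 0.5833\). In practice the conditioning route is useful when \(E[X | Y = y]\) is easier to obtain than the marginal distribution of \(X\).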