Basics of Probability and Statistics

Author

Devendra Ghate

Published

April 18, 2025

Discrete Random Variables

Consider a random variable \(X\) that can take on a finite number of values \(x_1, x_2, \ldots, x_n\). A well-known example of a discrete random variable is the outcome of a fair six-sided die roll, which takes values from 1 to 6. The probability that \(X = 1\) is denoted by \(P(X = 1)\); for a fair die it is \(\frac{1}{6}\). More generally, the probability mass function (PMF) of a discrete random variable is the function that gives the probability of the random variable taking on each of its possible values. The PMF is denoted by \(P(X = x)\).

The mean (expected value) of a discrete random variable is given by:

\[E(X) = \sum_{i=1}^{n} x_i P(X = x_i)\]

The variance of a discrete random variable is given by:

\[\text{Var}(X) = E(X^2) - E(X)^2 = \sum_{i=1}^{n} x_i^2 P(X = x_i) - \left(\sum_{i=1}^{n} x_i P(X = x_i)\right)^2\]

The mean is the first moment of the random variable, and the variance is its second central moment.
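As a concrete check of these formulas, here is a minimal NumPy sketch for the die-roll example, where \(E(X) = 3.5\) and \(\text{Var}(X) = \frac{35}{12}\):

```python
import numpy as np

# Fair six-sided die: outcomes 1..6, each with probability 1/6
x = np.arange(1, 7)
p = np.full(6, 1 / 6)

mean = np.sum(x * p)               # E(X) = sum x_i P(X = x_i)
var = np.sum(x**2 * p) - mean**2   # E(X^2) - E(X)^2

print(mean, var)
```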

Continuous Random Variables

Probability Density Function (PDF)

The probability density function (PDF) of a continuous random variable is a function that describes the relative likelihood for this random variable to take on a given value. The PDF is the derivative of the cumulative distribution function (CDF) with respect to the random variable. The CDF is the integral of the PDF.

For example, if \(X\) is a continuous random variable over the range \((-\infty, \infty)\) and the PDF of \(X\) is \(f(x)\), then the probability that \(X\) takes on a value between \(a\) and \(b\) is given by the integral of \(f(x)\) from \(a\) to \(b\):

\[P(a \leq X \leq b) = \int_a^b f(x) dx\]

Code
# This code plots the PDF of a normal distribution for several values of mu and sigma^2 to illustrate the effect of these parameters on the shape of the PDF

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.arange(-5, 5, 0.01)

def normal(x, mu, sigma):
    return norm.pdf(x, mu, sigma)

y1 = normal(x, 0, 1)
y2 = normal(x, 0, np.sqrt(2))  # sigma = sqrt(2), so that sigma^2 = 2 as labeled
y3 = normal(x, 1, 1)

plt.figure(figsize=(8, 6))
plt.plot(x, y1, label="mu=0, sigma^2=1", linewidth=2)
plt.plot(x, y2, label="mu=0, sigma^2=2", linewidth=2)
plt.plot(x, y3, label="mu=1, sigma^2=1", linewidth=2)
plt.xlabel("x")
plt.ylabel("f(x)")
plt.title("PDF of a normal distribution")
plt.legend(loc="upper left")
plt.grid(True)
plt.show()

PDF of a normal distribution

The most common example of a PDF is the well-known bell curve, or normal distribution, whose PDF is:

\[f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]

where \(\mu\) is the mean of the distribution and \(\sigma^2\) is the variance. The normal distribution is symmetric around the mean, and the variance determines the spread of the distribution.

For \(f(x)\) to be a valid PDF, it must satisfy the following two properties:

  1. \(f(x) \geq 0\) for all \(x\)

  2. \(\int_{-\infty}^{\infty} f(x) dx = 1\)

The first property ensures that the density is non-negative (note that, unlike a probability, a density value may exceed 1), and the second property ensures that the total probability of the random variable taking on any value in its range is 1.
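These two properties can be checked numerically for the standard normal density; the following is a sketch using `scipy.integrate.quad`:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Property 1: the density is non-negative (checked on a grid)
xs = np.linspace(-10, 10, 2001)
nonneg = np.all(norm.pdf(xs) >= 0)

# Property 2: the density integrates to 1 over the whole real line
total, _ = quad(lambda t: norm.pdf(t, 0, 1), -np.inf, np.inf)

print(nonneg, total)
```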

For a uniform distribution over a range of \([a, b]\), the PDF is given by: \[f(x) = \begin{cases} \frac{1}{b-a} & a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}\]

where \(a\) and \(b\) are the lower and upper bounds of the range, respectively.
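As a quick numerical check (values of \(a\) and \(b\) chosen for illustration), the uniform density integrates to 1 over \([a, b]\) and has mean \(\frac{a+b}{2}\):

```python
from scipy.integrate import quad

# Uniform density on [a, b]; a and b are arbitrary illustrative values
a, b = 2.0, 5.0
f = lambda t: 1.0 / (b - a)

total, _ = quad(f, a, b)                  # total probability over [a, b]
mean, _ = quad(lambda t: t * f(t), a, b)  # E(X) = (a + b) / 2

print(total, mean)
```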

Another example is the exponential distribution, defined on the semi-infinite range \([0, \infty)\), with the following PDF:

\[f(x) = \lambda e^{-\lambda x}, \quad x \geq 0\]

where \(\lambda > 0\) is the rate parameter.
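The exponential density also integrates to 1, and its mean is \(\frac{1}{\lambda}\); here is a sketch verifying both with `scipy.integrate.quad` (the value of \(\lambda\) is arbitrary):

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0  # rate parameter, chosen for illustration
f = lambda t: lam * np.exp(-lam * t)

total, _ = quad(f, 0, np.inf)                  # total probability over [0, inf)
mean, _ = quad(lambda t: t * f(t), 0, np.inf)  # E(X) = 1 / lam

print(total, mean)
```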

Cumulative Distribution Function (CDF)

The cumulative distribution function (CDF) of a random variable \(X\) is a function that describes the probability that \(X\) will take on a value less than or equal to a given value. The CDF is the integral of the PDF.

For example, if \(X\) is a continuous random variable over the range \((-\infty, \infty)\) with PDF \(f(x)\), then the CDF of \(X\) is given by the integral of \(f\) from \(-\infty\) to \(x\) (using \(t\) as the variable of integration):

\[F(x) = \int_{-\infty}^{x} f(t) dt\]

Code
# This code plots the CDF of a normal distribution for several values of mu and sigma^2 to illustrate the effect of these parameters on the shape of the CDF

import numpy as np
import matplotlib.pyplot as plt
from scipy.special import erf

x = np.arange(-5, 5, 0.01)

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + erf((x - mu) / (sigma * np.sqrt(2))))

y1 = normal_cdf(x, 0, 1)
y2 = normal_cdf(x, 0, np.sqrt(2))  # sigma = sqrt(2), so that sigma^2 = 2 as labeled
y3 = normal_cdf(x, 1, 1)

plt.figure(figsize=(8, 6))
plt.plot(x, y1, label="mu=0, sigma^2=1", linewidth=2)
plt.plot(x, y2, label="mu=0, sigma^2=2", linewidth=2)
plt.plot(x, y3, label="mu=1, sigma^2=1", linewidth=2)
plt.xlabel("x")
plt.ylabel("F(x)")
plt.title("CDF of a normal distribution")
plt.legend(loc="lower right")
plt.grid(True)
plt.show()

CDF of a normal distribution

The CDF of the normal distribution is given by:

\[F(x) = \frac{1}{2} \left(1 + \text{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right)\]

where \(\mu\) is the mean of the distribution, \(\sigma^2\) is the variance, and \(\text{erf}\) is the error function.
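As a sanity check, the closed form above agrees with the library implementation in `scipy.stats.norm.cdf`; the parameter values below are arbitrary:

```python
import numpy as np
from scipy.special import erf
from scipy.stats import norm

mu, sigma = 1.0, 2.0  # illustrative parameters
xs = np.linspace(-5, 5, 11)

# Closed form via the error function
closed_form = 0.5 * (1 + erf((xs - mu) / (sigma * np.sqrt(2))))

# Library implementation
library = norm.cdf(xs, mu, sigma)

print(np.allclose(closed_form, library))
```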

Compare the plots of the PDF and CDF of a normal distribution in the figures above. The CDF is the integral of the PDF, and as such, it is a measure of the probability that the random variable will take on a value less than or equal to a given value. The CDF is always non-decreasing and ranges from 0 to 1.

For a function \(F(x)\) to be a valid CDF, it must satisfy the following two properties:

  1. \(0 \leq F(x) \leq 1\) for all \(x\)

  2. \(\lim_{x \to -\infty} F(x) = 0\) and \(\lim_{x \to \infty} F(x) = 1\)

Bivariate Random Variables

A bivariate random variable is a pair of random variables \((X, Y)\) considered jointly. The joint probability distribution of two random variables \(X\) and \(Y\) is a function that describes the probability of \(X\) taking on a value \(x\) and \(Y\) taking on a value \(y\). The joint probability distribution is denoted by \(P(X = x, Y = y)\).

If \(X\) and \(Y\) are jointly normal (i.e., follow a bivariate normal distribution), then their joint PDF is:

\[P(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left(\frac{(x-\mu_x)^2}{\sigma_x^2} - 2\rho\frac{(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} + \frac{(y-\mu_y)^2}{\sigma_y^2}\right)\right)\]

where \(\mu_x\) and \(\mu_y\) are the means of \(X\) and \(Y\), \(\sigma_x^2\) and \(\sigma_y^2\) are the variances of \(X\) and \(Y\), and \(\rho\) is the correlation coefficient between \(X\) and \(Y\).
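The formula above can be checked against the covariance-matrix parameterization in `scipy.stats.multivariate_normal`, where the covariance matrix is \(\begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}\). The parameter values are arbitrary:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters
mu_x, mu_y = 0.0, 1.0
sx, sy, rho = 1.0, 2.0, 0.5

def bivariate_pdf(x, y):
    """Bivariate normal density, written out explicitly from the formula."""
    z = ((x - mu_x) ** 2 / sx**2
         - 2 * rho * (x - mu_x) * (y - mu_y) / (sx * sy)
         + (y - mu_y) ** 2 / sy**2)
    return np.exp(-z / (2 * (1 - rho**2))) / (2 * np.pi * sx * sy * np.sqrt(1 - rho**2))

# Same density via the covariance-matrix parameterization
cov = np.array([[sx**2, rho * sx * sy],
                [rho * sx * sy, sy**2]])
rv = multivariate_normal([mu_x, mu_y], cov)

print(bivariate_pdf(0.5, 0.5), rv.pdf([0.5, 0.5]))
```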