I’m excited to be here.
Goals of the class¹:
My commitments:
Course materials can be found at uobayes.netlify.app
COURSE FORMAT: Flipped class style
I am pro-AI. The tools that have appeared in the last few years can not only help you do your work more quickly but can also open up projects you’d never be able to tackle otherwise. That said, they’re a tool like any other, and they need to be learned.
My best AI tips:
AI doesn’t get the first word – giving AI something to work from will always result in a better outcome.
Your notes or earlier manuscripts
Talk to text!
You can ask the AI to stick as close as possible to your original wording, but to restructure and polish for a specific purpose.
AI doesn’t get the last word – these models hallucinate a lot less than the early days, but you still need to double check the work.
Ask the AI to interview you about the topic you’re writing about.
ChatGPT and Claude are great; have you tried Cursor?
Know when you need to do the thing yourself. (Pedagogical reasons usually.)
I created a workspace for you.
Questions, coordination, memes.
I’ll lurk and step in occasionally, or update lectures accordingly.
Do you want me to discuss a topic from class? Office hours.
Workspace setup:
library(tidyverse)
library(cowplot)
For today, we’ll build a foundation by reviewing probability (covered in PSY 611 – remember?) and connecting these ideas to Bayesian frameworks for calculating and thinking about probability.
I’ll draw on an article, Introduction to Bayesian Inference for Psychology by Alex Etz and Joachim Vandekerckhove (2018, Psychonomic Bulletin & Review).
In the vast majority of cases, psychologists are trying to make statements about singular events:
Formula:
\[ P(A, B) = P(A)P(B|A) = P(B)P(A|B) \]
Meaning:
The joint probability – \(P(A, B)\) – reflects:
For Independent Events: \[ P(A, B) = P(A)P(B) \]
For Dependent Events: \[ P(A, B) = P(A)P(B|A) \]
Let’s say A is the event that it rains today and B is the event that it rains tomorrow. There’s a 60% chance it will rain today. If it does rain today, it’ll probably rain tomorrow (say, a 2/3 chance). But if it doesn’t rain today, it probably won’t rain tomorrow (a .625 chance of staying dry).
The probabilities of the joint events are found by multiplying the values along a path.
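The “multiply along a path” rule can be checked numerically. A minimal sketch in R, using the probabilities from the rain example (variable names are mine):

```r
# Probabilities from the rain example
p_rain_today           <- 0.6
p_rain_tmrw_given_rain <- 2/3    # rain tomorrow, given rain today
p_dry_tmrw_given_dry   <- 0.625  # dry tomorrow, given dry today

# Joint probabilities: multiply the values along each path
p_rain_rain <- p_rain_today * p_rain_tmrw_given_rain           # 0.40
p_rain_dry  <- p_rain_today * (1 - p_rain_tmrw_given_rain)     # 0.20
p_dry_rain  <- (1 - p_rain_today) * (1 - p_dry_tmrw_given_dry) # 0.15
p_dry_dry   <- (1 - p_rain_today) * p_dry_tmrw_given_dry       # 0.25

# The four disjoint paths exhaust all possibilities, so they sum to 1
p_rain_rain + p_rain_dry + p_dry_rain + p_dry_dry  # 1
```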
Formula: \[ P(A) = P(A, B) + P(A, \neg B) \]
Meaning:
Disjoint set:
Scenario:
Using the Sum Rule: \[ P(A) = P(A, B) + P(A, \neg B) \]
Given:
Result: \[ P(A) = \frac{13}{52} + \frac{13}{52} = 0.5 \]
The sum rule finds total probability by accounting for all disjoint ways \((A)\) can occur.
General Formula: \[ P(A) = \sum_{i} P(A, B_i) \]
Ensures that no possibilities are overlooked.
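The card example and the general sum rule can both be computed directly. A sketch, assuming A is drawing a red card and B is drawing a heart (my reading of the 13/52 terms):

```r
# A = draw a red card; B = draw a heart
p_red_heart    <- 13/52  # red and heart (all hearts are red)
p_red_notheart <- 13/52  # red and not a heart (the diamonds)
p_red <- p_red_heart + p_red_notheart  # 0.5

# General form: sum the joint probability over all disjoint B_i (the four suits)
p_red_given_suit <- c(hearts = 1, diamonds = 1, clubs = 0, spades = 0)
p_suit <- rep(13/52, 4)
sum(p_suit * p_red_given_suit)  # 0.5
```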
Construct the equivalent path diagram starting on the left with a fork that depends on event B, instead of event A.
Bayesian inference is the application of the product and sum rules to real problems of inference.
If
\[ P(H, D) = P(D)P(H|D) = P(H)P(D|H) \] then: \[ P(H|D) = \frac{P(H)P(D|H)}{P(D)} \]
This is Bayes’ Rule!
prior probability: \(P(H)\)
likelihood function: \(P(D|H)\)
posterior probabilities: \(P(H|D)\)
On the board
The Product Rule states that \[ P(H, D) = P(D)P(H|D) \] therefore: \[ P(H|D) = \frac{P(H, D)}{P(D)} \] In addition, \[ P(H, D) = P(H)P(D|H) \] so we can replace the numerator: \[ P(H|D) = \frac{P(H)P(D|H)}{P(D)} \]
How do we calculate this probability? We use the Sum Rule.
\[ P(D) = P(D, H) + P(D, \neg H) \\ = P(H)P(D|H) + P(\neg H)P(D|\neg H) \] Now, we can rewrite Bayes’ Rule using only prior probabilities and likelihoods.
\[ P(H|D) = \frac{P(H)P(D|H)}{P(H)P(D|H) + P(\neg H)P(D|\neg H)} \] And we can express our posterior in any case with K competing and mutually-exclusive hypotheses.
\[ P(H_i|D) = \frac{P(H_i)P(D|H_i)}{\sum_{k = 1}^K P(H_k)P(D|H_k)} \]
We form a ratio of relative belief in one hypothesis vis-a-vis another by comparing their posterior odds:
\[ \frac{P(H|D)}{P(\neg H|D)} \] We can insert the equations for posterior odds and find that this reduces to:
\[ \frac{P(H)}{P(\neg H)} \times \frac{P(D|H)}{P(D|\neg H)} \] The first part is called the prior odds and the second is called the Bayes factor.
Bayes Factor: the extent to which the data sway our relative belief from one hypothesis to the other.
Full equation: \[ \frac{\frac{P(H)P(D|H)}{P(H)P(D|H) + P(\neg H)P(D|\neg H)}}{\frac{P(\neg H)P(D|\neg H)}{P(H)P(D|H) + P(\neg H)P(D|\neg H)}} \] Denominators cancel out.
Bayes factors are not the same as posterior probabilities.
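To make the pieces concrete, here is a toy calculation in R. The numbers are made up for illustration, not from the lecture examples:

```r
p_H      <- 0.5   # prior probability of H
p_D_H    <- 0.8   # likelihood of the data under H
p_D_notH <- 0.2   # likelihood of the data under not-H

prior_odds     <- p_H / (1 - p_H)            # 1: no initial preference
bayes_factor   <- p_D_H / p_D_notH           # 4: data favor H 4-to-1
posterior_odds <- prior_odds * bayes_factor  # 4

# Converting posterior odds back to a posterior probability:
posterior_odds / (1 + posterior_odds)  # 0.8
```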
At Hogwarts, professor Sprout leads the Herbology Department. In the Department’s greenhouses, she cultivates a magical plant, the codacle plant, that when consumed causes a witch or wizard to feel euphoric and relaxed. Professor Trelawney, the professor of Divination, is an avid user of this plant and frequently visits Professor Sprout’s laboratory to sample the latest harvest.
However, it has turned out that one in a thousand codacle plants is afflicted with a mutation that changes its effects: Consuming mutated plants causes unpleasant side effects such as paranoia, anxiety, and spontaneous levitation.
In order to evaluate the quality of her crops, Professor Sprout has developed a mutation-detecting spell. The new spell has a 99% chance to accurately detect an existing mutation, but also has a 2% chance to falsely indicate that a healthy plant is a mutant. When Professor Sprout presents her results at a School colloquium, Trelawney asks two questions: What is the probability that a plant is a mutant, when your spell says that it is? And what is the probability the plant is a mutant, when your spell says that it is healthy?
Instead of reporting probabilities, let’s summarize the problem with natural frequencies. Imagine 100,000 plants: 100 of them (one in a thousand) are mutants, and the spell correctly flags 99 of those; of the 99,900 healthy plants, 2% (1,998) are falsely flagged. Given a positive result, then:

\[ P(M|D) = \frac{99}{99+1998} \approx .047 \]
Plugging in the values: \[ P(M|\neg D) = \frac{0.001 \times 0.01}{(0.001 \times 0.01) + (0.999 \times 0.98)} \]
If the spell indicates “not mutant,” the plant is almost certainly not a mutant: \[ P(M|\neg D) \approx 0.001\% \]
Why? The specificity (\(P(\neg D|\neg M) = 98\%\)) ensures most negatives are true negatives.
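Both of Trelawney’s questions can be answered with Bayes’ Rule directly. A sketch in R using the numbers above:

```r
p_M      <- 0.001  # prior: one in a thousand plants is a mutant
p_D_M    <- 0.99   # sensitivity: spell flags a true mutant
p_D_notM <- 0.02   # false-positive rate on healthy plants

# P(mutant | spell says "mutant")
p_M_D <- (p_M * p_D_M) /
  (p_M * p_D_M + (1 - p_M) * p_D_notM)
round(p_M_D, 3)  # 0.047

# P(mutant | spell says "healthy")
p_M_notD <- (p_M * (1 - p_D_M)) /
  (p_M * (1 - p_D_M) + (1 - p_M) * (1 - p_D_notM))
p_M_notD  # ~1e-05, i.e. about 0.001%
```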
Diagram this example (similar to the rain/no rain diagram).
Suppose, however, that Trelawney knows that Professor Sprout’s diagnosis \((H_S)\) is statistically independent from the diagnosis of her talented research associate Neville Longbottom \((D_L)\) — meaning that for any given state of nature \(M\) or \(\neg M\), Longbottom’s diagnosis does not depend on Sprout’s. Further suppose that both Sprout and Longbottom return the mutant diagnosis (and for simplicity we also assume Longbottom’s spells are equally as accurate as Sprout’s). To find the posterior probability the plant is a mutant after two independent mutant diagnoses, \(P (M|H_S, D_L)\), Trelawney can apply a fundamental principle in Bayesian inference: Yesterday’s posterior is today’s prior.
What is the probability the plant is mutant after two independent mutant diagnoses?
Because diagnosis \(H_S\) and diagnosis \(D_L\) are independent, we know that: \[ P(D_L|M, H_S) = P(D_L|M) \] and \[ P(D_L|\neg M, H_S) = P(D_L|\neg M) \] therefore
\[ P(M|H_S, D_L) = \frac{P(M|H_S)P(D_L|M)}{P(M|H_S)P(D_L|M) + P(\neg M|H_S)P(D_L|\neg M)} \]
\[ = \frac{.047 \times .99}{.047 \times .99 + .953 \times .02} \approx .71 \]
Posterior probabilities depend heavily on prior probabilities.
Bayes’ Rule updates prior beliefs with evidence: \[ P(H|D) \propto P(H) \times P(D|H) \]
There is value in multiple independent sources of evidence:
| Score | Slytherin | Gryffindor | Ravenclaw | Hufflepuff |
|---|---|---|---|---|
| Excellent | 0.80 | 0.05 | 0.05 | 0.00 |
| Outstanding | 0.10 | 0.20 | 0.80 | 0.10 |
| Acceptable | 0.05 | 0.70 | 0.15 | 0.25 |
| Poor | 0.05 | 0.05 | 0.00 | 0.65 |
Professor McGonagall wants to know: if a student is sorted into Slytherin and scores Excellent on this test, how likely is it that the student is a true Gryffindor?
\[ P(\text{Gryffindor}|H_S, S_E) = \frac{P(\text{Gryffindor})P(H_S|\text{Gryffindor})P(S_E|\text{Gryffindor})}{P(H_S, S_E)} \]
Known probabilities:
What is the probability a student is sorted into Slytherin and also scores Excellent?
\(P(H_S, S_E) = \sum_{i} P(\text{House}_i)P(H_S|\text{House}_i)P(S_E|\text{House}_i)\)
| House | Prior \(P(\text{House})\) | \(P(H_S \mid \text{House})\) | \(P(S_E \mid \text{House})\) | Joint \(P(\text{House}, H_S, S_E)\) |
|---|---|---|---|---|
| Slytherin | 0.25 | 1.00 | 0.80 | 0.2000 |
| Gryffindor | 0.25 | 0.20 | 0.05 | 0.0025 |
| Ravenclaw | 0.25 | 0.20 | 0.05 | 0.0025 |
| Hufflepuff | 0.25 | 0.20 | 0.00 | 0.0000 |
Marginal probability: \[ P(\text{Slytherin, Excellent}) = 0.2000 + 0.0025 + 0.0025 + 0.0000 = 0.2050 \]
Posterior probability: \[ P(\text{Gryffindor}|H_S, S_E) = \frac{P(\text{Gryffindor})P(H_S|\text{Gryffindor})P(S_E|\text{Gryffindor})}{P(H_S, S_E)} \]
Calculation: \[ P(\text{Gryffindor}|H_S, S_E) = \frac{0.25 \times 0.20 \times 0.05}{0.2050} = \frac{0.0025}{0.2050} \approx 0.0122 \]
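The whole table can be reproduced in a few lines of R, a sketch using the probabilities above:

```r
houses <- c("Slytherin", "Gryffindor", "Ravenclaw", "Hufflepuff")
prior  <- rep(0.25, 4)               # P(House)
p_hs   <- c(1.00, 0.20, 0.20, 0.20)  # P(sorted Slytherin | House)
p_se   <- c(0.80, 0.05, 0.05, 0.00)  # P(scores Excellent | House)

joint    <- prior * p_hs * p_se  # one disjoint path per house
marginal <- sum(joint)           # P(H_S, S_E) = 0.205

posterior <- joint / marginal
round(posterior[houses == "Gryffindor"], 4)  # 0.0122
```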
Most of our research questions are about parameters in the continuous case. For this, we make use of probability density functions. Densities:
\(P(a_1 < A < a_2) = \int_{a_1}^{a_2}p(a)da\)
\(P(-\infty < A < a_2) = \int_{-\infty}^{a_2}p(a)\,da\)
(Figure: a density curve with two shaded regions, each with area .10, or 10%: one below 81, and one between 108 and 113.)
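Assuming the curve in the figure is a normal distribution with mean 100 and SD 15 (an IQ-style scale; my inference from the cutoffs shown), both shaded areas can be computed with `pnorm()`:

```r
# Area below 81
pnorm(81, mean = 100, sd = 15)                                     # ~0.10

# Area between 108 and 113
pnorm(113, mean = 100, sd = 15) - pnorm(108, mean = 100, sd = 15)  # ~0.10
```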
The product and sum rules have continuous analogues:
\[ p(a,b) = p(a)p(b|a) \]
Where \(p(a)\) is the density of the continuous parameter a, and \(p(b|a)\) is the conditional density of b (assuming a value of a).
\[ p(a) = \int_Bp(a,b)db \]
\(db\) – the differential – represents an infinitesimally small interval of the variable \(b\). It is part of the integral that sums (integrates) the joint density \(p(a, b)\) over all possible values of \(b\) within the range \(B\). In other words, the \(db\) indicates that we’re integrating over values of \(b\) (not values of \(a\)).
Derived from the product rule: \[ p(a | b) = \frac{p(a, b)}{p(b)} = \frac{p(a)p(b | a)}{p(b)} \]
Numerator:
Denominator:
Marginal likelihood \(p(b)\), ensuring the posterior is a proper probability density:
\[ p(b) = \int_A p(a)p(b | a) da \]
Posterior density:
\[ p(\theta | x) = \frac{p(\theta) p(x | \theta)}{\int_\Theta p(\theta) p(x | \theta) \, d\theta} \]
Numerator:
Denominator:
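The integral in the denominator can be evaluated numerically. A sketch with made-up ingredients (a N(0, 1) prior on \(\theta\) and one observation \(x = 1.5\) with a N(\(\theta\), 1) likelihood), chosen because the analytic posterior is known to be N(0.75, \(\sqrt{0.5}\)):

```r
prior      <- function(theta) dnorm(theta, mean = 0, sd = 1)
likelihood <- function(theta) dnorm(1.5, mean = theta, sd = 1)

# Marginal likelihood p(x): integrate prior * likelihood over theta
p_x <- integrate(function(t) prior(t) * likelihood(t), -Inf, Inf)$value

posterior <- function(theta) prior(theta) * likelihood(theta) / p_x

# The posterior integrates to 1, and its mean matches the analytic answer
integrate(posterior, -Inf, Inf)$value                     # 1
integrate(function(t) t * posterior(t), -Inf, Inf)$value  # 0.75
```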
\[ p(x | \lambda) = \frac{1}{x!} \lambda^x \text{exp}(-\lambda) \]

- \(\lambda\): The expected number of events per time interval.
- \(x\): Observed number of events in an interval.
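This pmf is exactly what R’s `dpois()` computes; a quick check (the λ and x values are arbitrary):

```r
lambda <- 4
x      <- 7

# The Poisson pmf written out by hand...
manual <- (1 / factorial(x)) * lambda^x * exp(-lambda)

# ...matches R's built-in density function
all.equal(manual, dpois(x, lambda))  # TRUE
```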
\[ p(\lambda | a, b) = \frac{b^a}{\Gamma(a)} \text{exp}(-b \lambda)\lambda^{a - 1} \]
George collects data from three experiments:
Likelihood for the data:
\(p(X_n | \lambda) = \prod^n_{i=1} \frac{1}{x_i!} \lambda^{x_i} \text{exp}(-\lambda)\)
Posterior Distribution
Bayes’ Rule:
\[ p(\lambda | X_n) \propto p(\lambda) p(X_n | \lambda) \]
Conjugacy simplifies the posterior to another Gamma distribution:
\[ p(\lambda | X_n) = \text{Gamma}\left(a + \sum x_i, b + n \right) \]
Parameters of the posterior:
\(a_{\text{post}} = a + \sum x_i = 2 + (7 + 8 + 19) = 36\)
\(b_{\text{post}} = b + n = 0.2 + 3 = 3.2\)
Results
Posterior Mean:
\(\mathbb{E}[\lambda | X_n] = \frac{a_{\text{post}}}{b_{\text{post}}} = \frac{36}{3.2} \approx 11.25\)
Posterior Mode:
\(\lambda_{\text{mode}} = \frac{a_{\text{post}} - 1}{b_{\text{post}}} = \frac{36 - 1}{3.2} \approx 10.94\)
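The conjugate update and its summaries take only a few lines of R, using the counts (7, 8, 19) and the Gamma(2, 0.2) prior from above:

```r
x <- c(7, 8, 19)  # George's counts from the three experiments
a <- 2            # Gamma prior shape
b <- 0.2          # Gamma prior rate

# Conjugate update: posterior is Gamma(a + sum(x), b + n)
a_post <- a + sum(x)     # 36
b_post <- b + length(x)  # 3.2

a_post / b_post        # posterior mean, 11.25
(a_post - 1) / b_post  # posterior mode, ~10.94

# Bonus: a 95% credible interval for lambda
qgamma(c(0.025, 0.975), shape = a_post, rate = b_post)
```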
Bayesian analysis is the act of applying product and sum rules to probability.
Is that not your cup of tea?
Don’t worry! Most of this term, we’ll be abandoning the formulas altogether; there are easier and more intuitive ways to fit and use these models.
So why did we just go through all of this?

- So you can appreciate the work done by others to make this more approachable.
- Because you will need to get familiar with different probability distributions (Poisson, gamma, Cauchy, etc.). These form the basis of your prior distributions, so knowing what they look like and how to set good priors using them is essential.
These goals are in tension with each other. Good pedagogical code is bad application code. We’ll start with the former, move to the latter, but sometimes return to pedagogy.↩︎
With the important caveat that I have a one-year-old in daycare, so on any given day, he or I or both of us are sick. My hope is the planned course structure is forgiving of missed days.↩︎
\(\text{exp}(-\lambda)\) is a clearer way to write \(e^{-\lambda}\)↩︎