Workspace setup:

library(tidyverse)
library(cowplot)

Bayes’ Theorem

Recall from last lecture, the probability that a specific hypothesis, H, is true given a set of data, D, is defined as:

P(H|D)=P(H)P(D|H)P(D)

You can get there via some calculus, but there are other, more intuitive ways of getting to this value. Today, we’re going to build up the intuition of this formula.


writing functions

#
fun_name = function( argument=1 ){  # <1>
  y = exp(argument)                 # <2>
  y^2
}                                   # <3>
  1. Define the function name and arguments. Add defaults.
  2. What should be done to the objects established in the arguments.
  3. By default, the function will return the last calculation (note this is not assigned to an object.) Or, you can create a list of objects to be returned using return = list(...) at the end of your function.

An example:

dice_roll = function( N=1 ){
  sample( x=c(1:6), size=N, replace=T)
}

dice_roll()
## [1] 5
dice_roll(5)
## [1] 3 2 2 5 1

Generative models

Suppose you have a globe representing the planet, and you want to estimate how much of the surface is covered in water. You adopt the following strategy: You toss the globe in the air, and when you catch it, you write down whether the surface under your right thumb is water or land.

exercise

Write a function to simulate tosses of this globe. Make sure your function allows you to vary N, the number of tosses, and p, the true proportion of water on the globe.


solution

# function to toss a globe covered p by water N times
sim_globe = function( p=0.7 , N=9 ){
  sample(
    x = c("W", "L"),  # possible values
    size = N,         # how many draws
    prob = c(p, 1-p), # probability of each possibility
    replace = TRUE    # the same value can be drawn multiple times
  )
}

sim_globe()
## [1] "W" "W" "L" "W" "L" "L" "W" "W" "W"

P(H|D)=P(H)P(D|H)P(D)

  • P(H) is the prior, and that is set by the researcher.
  • P(D) is the probability of the data given all possible hypotheses, and is therefore the sum of all the P(H)×P(D|H).
  • So we only need to find P(D|H) or the probability of a sample given a specific, hypothetical value of H.

What probability distribution would represent the likelihood of a specific sample of water and land given a known propotion of water?

The binomial distribution.

dbinom(x = 6, size = 9, prob = .7)
## [1] 0.2668279

exercise: grid approximation

Write a function that computes the posterior distribution of p, the true proportion of water, based on a given sample and a flat prior. The input of this function should be (1) a sample and (2) a vector of possible parameter values of p. Within the function, you’ll need to calculate the following value:

P(pi|D)=P(pi)P(D|pi)P(D)

Where pi is each value of p that you fed into the function and D is the sample in hand. As a reminder:

  • P(pi) is the prior, and that is set by the researcher. I’ve asked you to set a flat prior, so this should be the same for all values of p.
  • P(D|pi) or the likelihood of a given a specific, hypothetical value of p. This is calculated using the binomial distribution.
  • P(D) is the probability of the data given all possible hypotheses, and is therefore the sum of all the P(pi)×P(D|pi).

exercise: solution

compute_posterior = function(sample, poss = seq(0,1,length.out=100)){
  
  W = sum(sample == "W")
  L = sum(sample == "L")
  
  prior = rep(1, length(poss))
  
  likelihood = sapply( poss, function(x) dbinom(x = W, size = W+L, prob = x))
  
  post = ( prior*likelihood ) / sum( prior*likelihood)
  
  return(post)
}

Testing it out:

(sample = sim_globe())
## [1] "W" "W" "L" "L" "W" "W" "W" "W" "L"
compute_posterior(sample)
##   [1] 0.000000e+00 8.741894e-12 5.425284e-10 5.990575e-09 3.261806e-08
##   [6] 1.205399e-07 3.485649e-07 8.509010e-07 1.834811e-06 3.598404e-06
##  [11] 6.547829e-06 1.121325e-05 1.826302e-05 2.851562e-05 4.294893e-05
##  [16] 6.270652e-05 8.910077e-05 1.236125e-04 1.678871e-04 2.237272e-04
##  [21] 2.930816e-04 3.780305e-04 4.807680e-04 6.035804e-04 7.488219e-04
##  [26] 9.188879e-04 1.116184e-03 1.343097e-03 1.601956e-03 1.895001e-03
##  [31] 2.224345e-03 2.591936e-03 2.999519e-03 3.448599e-03 3.940405e-03
##  [36] 4.475850e-03 5.055498e-03 5.679534e-03 6.347729e-03 7.059415e-03
##  [41] 7.813458e-03 8.608241e-03 9.441644e-03 1.031103e-02 1.121325e-02
##  [46] 1.214461e-02 1.310092e-02 1.407747e-02 1.506903e-02 1.606992e-02
##  [51] 1.707401e-02 1.807474e-02 1.906518e-02 2.003808e-02 2.098589e-02
##  [56] 2.190088e-02 2.277513e-02 2.360067e-02 2.436951e-02 2.507375e-02
##  [61] 2.570565e-02 2.625773e-02 2.672284e-02 2.709431e-02 2.736600e-02
##  [66] 2.753241e-02 2.758880e-02 2.753126e-02 2.735684e-02 2.706360e-02
##  [71] 2.665076e-02 2.611870e-02 2.546910e-02 2.470498e-02 2.383075e-02
##  [76] 2.285223e-02 2.177672e-02 2.061293e-02 1.937103e-02 1.806258e-02
##  [81] 1.670044e-02 1.529871e-02 1.387258e-02 1.243815e-02 1.101227e-02
##  [86] 9.612248e-03 8.255589e-03 6.959637e-03 5.741183e-03 4.616016e-03
##  [91] 3.598404e-03 2.700509e-03 1.931739e-03 1.298012e-03 8.009482e-04
##  [96] 4.369673e-04 1.962992e-04 6.189388e-05 8.227801e-06 0.000000e+00
plot(seq(0,1,length.out=100), compute_posterior(sample), type = "l")


exercise

Compute and plot the grid approximate posterior distribution for each of the following sets of observations. In each case, assume a uniform prior for p:

  • W W W
  • W W W L
  • L W W L W W W

exercise: solution p1

sample = c("W", "W", "W")
plot(seq(0,1,length.out=100), compute_posterior(sample), type = "l")


exercise: solution p2

sample = c("W", "W", "W", "L")
plot(seq(0,1,length.out=100), compute_posterior(sample), type = "l")


exercise: solution p3

sample = c("L", "W", "W", "L", "W", "W", "W")
plot(seq(0,1,length.out=100), compute_posterior(sample), type = "l")


exercise

Now assume a prior for p that is equal to zero when p<0.5 and is a positive constant when p≥0.5. Again compute and plot the grid approximate posterior distribution for each of the sets of observations in the prior problem.


exercise: solution

compute_posterior = function(sample, poss = seq(0,1,length.out=100)){
  
  W = sum(sample == "W")
  L = sum(sample == "L")
  
  prior = ifelse(poss < .5, 0, 1)
  
  likelihood = sapply( poss, function(x) dbinom(x = W, size = W+L, prob = x))
  
  post = ( prior*likelihood ) / sum( prior*likelihood)
  
  return(post)
}

sample = c("W", "W", "W")
plot(seq(0,1,length.out=100), compute_posterior(sample), type = "l")


sample = c("W", "W", "W", "L")
plot(seq(0,1,length.out=100), compute_posterior(sample), type = "l")


sample = c("L", "W", "W", "L", "W", "W", "W")
plot(seq(0,1,length.out=100), compute_posterior(sample), type = "l")


exercise

Suppose the globe tossing data had turned out to be 8 water in 15 tosses. Construct the posterior distribution, using grid approximation. Use the same flat prior as before.

Draw 10,000 samples from the grid approximation from above using the sample() function.

Then use the samples to calculate the 90% HPDI for p using the function HPDI() in the rethinking package.


exercise: solution

p_grid <- seq( from=0 , to=1 , length.out=1000 )
prior <- rep( 1 , 1000 )
likelihood <- dbinom( 8 , size=15 , prob=p_grid )
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)
set.seed(100)
samples <- sample( p_grid , prob=posterior , size=1e4 , replace=TRUE )
rethinking::HPDI(samples, prob = .90)
##      |0.9      0.9| 
## 0.3343343 0.7217217

exercise

Construct a posterior predictive check for this model and data. This means simulate the distribution of samples, averaging over the posterior uncertainty in p. What is the probability of observing 8 water in 15 tosses?


exercise: solution

dummy_w <- rbinom(1e4, 15, prob = samples)
table(dummy_w)
## dummy_w
##    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
##    6   32  129  272  526  851 1192 1393 1425 1405 1110  815  449  285   86   24
sum(dummy_w == 8)/1e4
## [1] 0.1425

exercise

Now use a prior that is zero below p<.5 and a positive constant otherwise. Repeat each problem above and compare the inferences. What difference does the better prior make?

  • Calculate the 90% HDPI.
  • What is the probability of observing 8 water in 15 tosses?

exercise: solution

p_grid <- seq( from=0 , to=1 , length.out=1000 )
prior <- ifelse(p_grid < .5, 0, 1)
likelihood <- dbinom( 8 , size=15 , prob=p_grid )
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)
set.seed(100)
samples <- sample( p_grid , prob=posterior , size=1e4 , replace=TRUE )
rethinking::HPDI(samples, prob = .90)
##      |0.9      0.9| 
## 0.5005005 0.7097097

exercise: solution

dummy_w <- rbinom(1e4, 15, prob = samples)
table(dummy_w)
## dummy_w
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
##    2    7   35  132  347  684 1081 1630 1776 1667 1273  769  388  184   25
sum(dummy_w == 8)/1e4
## [1] 0.163