Actuaries
should be
scientific
Actuaries
should be
Bayesian
What?
Why?
How?
Bayesian
Recap
$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{\int d\theta'\,P(D \mid \theta')\,P(\theta')}$$
Why be
Bayesian
It works
It produces testable results that, given the right models, are generally as good as (and sometimes better than) those of frequentist analysis.
It's coherent
Most Bayesian analysis stems from a few foundational ideas, whereas frequentist statistics rests on a much more varied set of fundamentals.
Posterior information is rich and useful
It's natural
Humans naturally think in Bayesian terms, a habit that is then 'unlearned' when studying frequentist analysis.
It's self-documenting
Through the priors, assumptions are formally absorbed into the analysis and cannot be hidden as in classical statistics.
You get the
full Posterior
It's modular
Multilevel structure and the choice of priors are interchangeable building blocks; e.g. it is simple to make a model robust to outliers.
Why not be
Bayesian
What
prior?
It's
awkward
A whole posterior distribution can be difficult to communicate/process
It's
hard
It's OK with conjugate pairs, but otherwise it is difficult to do without MCMC or other computationally intensive methods.
It's
weird
It's not what people are used to, and the themes are unfamiliar, though it's gaining traction in social sciences and pharmacology (and election prediction)
Frequentism
vs
Bayesianism
Probability
Data
Parameters
Ranges
p-values &
significance
testing
How to be
Bayesian
The Bayesian
Workflow
Analytically:
Estimating the
Bias of a coin
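$$\theta = P(y_i = \text{heads})$$
$$p(y \mid \theta) = \theta^{y}(1-\theta)^{(1-y)}$$
$$P(\theta \mid a, b) = \mathrm{beta}(a, b) = \frac{\theta^{(a-1)}(1-\theta)^{(b-1)}}{B(a, b)}$$
$$P(\theta \mid n_{heads}, n_{tails}, a, b) = \mathrm{beta}(a + n_{heads},\, b + n_{tails}) = \frac{\theta^{(a+n_{heads}-1)}(1-\theta)^{(b+n_{tails}-1)}}{B(a+n_{heads},\, b+n_{tails})}$$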
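The step from the Bernoulli likelihood and the beta prior to the beta posterior can be written out as follows. This is a sketch that assumes the flips $y_1, \dots, y_n$ are independent given $\theta$, with $n_{heads}$ and $n_{tails}$ the observed counts:

$$
\begin{aligned}
P(\theta \mid y_1, \dots, y_n, a, b) &\propto \left[\prod_{i=1}^{n} \theta^{y_i}(1-\theta)^{(1-y_i)}\right] \theta^{(a-1)}(1-\theta)^{(b-1)} \\
&= \theta^{n_{heads}}(1-\theta)^{n_{tails}}\,\theta^{(a-1)}(1-\theta)^{(b-1)} \\
&= \theta^{(a+n_{heads}-1)}(1-\theta)^{(b+n_{tails}-1)}
\end{aligned}
$$

The last line is the kernel of a beta distribution, so the normalising constant must be $B(a+n_{heads},\, b+n_{tails})$ and no integral has to be evaluated by hand.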
https://mjones.shinyapps.io/coin/
Flat beta(1,1) distribution:
7 Heads, 5 tails:
Another Beta: Beta(6,8)
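The conjugate update can be checked with a few lines of Python. This is a minimal sketch: the helper names `beta_update` and `beta_mean` are hypothetical, and it assumes that Beta(6,8) is used as an alternative prior for the same 7 heads and 5 tails.

```python
from fractions import Fraction


def beta_update(a, b, n_heads, n_tails):
    """Posterior Beta parameters after observing the given coin flips."""
    return a + n_heads, b + n_tails


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, a / (a + b), as an exact fraction."""
    return Fraction(a, a + b)


# Flat prior Beta(1, 1) with 7 heads and 5 tails.
flat_post = beta_update(1, 1, 7, 5)
# Assumed alternative prior Beta(6, 8) with the same data.
informed_post = beta_update(6, 8, 7, 5)

for label, (a, b) in (("flat", flat_post), ("beta(6,8)", informed_post)):
    print(label, "posterior: Beta(%d, %d), mean %s" % (a, b, beta_mean(a, b)))
```

Under these assumptions the flat prior leads to Beta(8,6) and the Beta(6,8) prior leads to Beta(13,13), which shows how the prior pulls the posterior mean.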
Problems
What prior
to use?
Normalising
Computationally:
First Steps
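The normalising integral in Bayes' theorem rarely has a closed form outside conjugate cases. The slides do not record which method was shown at this point, so the following is only an illustrative sketch that assumes a simple grid approximation for the coin example. The function name `grid_posterior` and its parameters are hypothetical.

```python
def grid_posterior(n_heads, n_tails, a=1.0, b=1.0, n_grid=1001):
    """Approximate the posterior density of theta on an evenly spaced grid.

    Assumes a >= 1 and b >= 1 so the density is finite at theta = 0 and 1.
    """
    step = 1.0 / (n_grid - 1)
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    # Unnormalised posterior: Bernoulli likelihood times beta prior kernel.
    unnorm = [
        t ** (n_heads + a - 1) * (1 - t) ** (n_tails + b - 1) for t in grid
    ]
    # Riemann-sum approximation to the normalising integral.
    evidence = sum(unnorm) * step
    density = [u / evidence for u in unnorm]
    return grid, density, step


grid, density, step = grid_posterior(n_heads=7, n_tails=5)
post_mean = sum(t * d for t, d in zip(grid, density)) * step
print("approximate posterior mean:", round(post_mean, 4))
```

For the flat prior the result can be compared with the exact Beta(8,6) mean of 8/14. Grids scale poorly beyond a handful of parameters, which is one motivation for MCMC.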
Computationally:
MCMC
Markov Chain
Monte Carlo
Metropolis
50 Islands
Populations in ratio 1:2:3:...:50
Want to visit each in accordance with its population
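The island-hopping picture can be sketched as a Metropolis random walk. This is for intuition only and is not a production sampler. The function name `island_hop`, the nearest-neighbour proposal, the step count and the seed are all assumptions made for illustration.

```python
import random


def island_hop(n_islands=50, n_steps=200_000, seed=1):
    """Metropolis walk over islands 1..n_islands with population proportional to index."""
    rng = random.Random(seed)

    def population(i):
        # Relative population; zero outside the island chain.
        return i if 1 <= i <= n_islands else 0

    current = rng.randint(1, n_islands)
    visits = [0] * (n_islands + 1)  # index 0 is unused
    for _ in range(n_steps):
        # Propose the neighbouring island to the left or right.
        proposal = current + rng.choice((-1, 1))
        # Accept with probability min(1, pop(proposal) / pop(current)).
        if rng.random() < population(proposal) / population(current):
            current = proposal
        visits[current] += 1
    return visits


visits = island_hop()
total = sum(visits)
target_total = sum(range(1, 51))
for island in (1, 10, 25, 50):
    print(island, round(visits[island] / total, 4), round(island / target_total, 4))
```

The second and third printed columns are the visit frequency and the target share. They only agree approximately, because neighbouring steps are highly correlated and the chain needs many steps to cover all 50 islands.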
Don't make
your own
Computationally:
In practice
## Markov Chain Monte Carlo (MCMC) output:
## Start = 501
## End = 531
## Thinning interval = 1
##        theta[1]  theta[2]
##  [1,] 0.5331603 0.7960773
##  [2,] 0.7224562 0.4499415
##  [3,] 0.7102999 0.2461300
##  [4,] 0.5370162 0.5314258
##  [5,] 0.5014244 0.2403774
##  [6,] 0.8557981 0.5679354
##  [7,] 0.4511761 0.5866845
##  [8,] 0.5826508 0.4369579
##  [9,] 0.6994008 0.3722830
## [10,] 0.3815623 0.3713527
## [11,] 0.8098616 0.3163093
## [12,] 0.5554198 0.4975122
## [13,] 0.4028894 0.3134066
## [14,] 0.3785868 0.3637059
## [15,] 0.4323854 0.5251539
## [16,] 0.7882131 0.4448705
## [17,] 0.6833106 0.4087616
## [18,] 0.6819669 0.4665725
## [19,] 0.7579478 0.3968828
## [20,] 0.5021557 0.4240982
## [21,] 0.7010325 0.4487093
## [22,] 0.5308746 0.4451829
## [23,] 0.6973850 0.4182190
## [24,] 0.5973255 0.5136492
## [25,] 0.8815459 0.3668754
## [26,] 0.7933463 0.4132620
## [27,] 0.6295374 0.1937704
## [28,] 0.7943369 0.5278090
## [29,] 0.5191301 0.3335732
## [30,] 0.5250645 0.4350766
## [31,] 0.4926411 0.3410629
Bayesianism
applied to
Insurance
| AY | premium | 6 | 18 | 30 | 42 | 54 | 66 | 78 | 90 | 102 | 114 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1991 | 10,000 | 358 | 1,125 | 1,735 | 2,183 | 2,746 | 3,320 | 3,466 | 3,606 | 3,834 | 3,901 |
| 1992 | 10,400 | 352 | 1,236 | 2,170 | 3,353 | 3,799 | 4,120 | 4,648 | 4,914 | 5,339 | |
| 1993 | 10,800 | 291 | 1,292 | 2,219 | 3,235 | 3,986 | 4,133 | 4,629 | 4,909 | | |
| 1994 | 11,200 | 311 | 1,419 | 2,195 | 3,757 | 4,030 | 4,382 | 4,588 | | | |
| 1995 | 11,600 | 443 | 1,136 | 2,128 | 2,898 | 3,403 | 3,873 | | | | |
| 1996 | 12,000 | 396 | 1,333 | 2,181 | 2,986 | 3,692 | | | | | |
| 1997 | 12,400 | 441 | 1,288 | 2,420 | 3,483 | | | | | | |
| 1998 | 12,800 | 359 | 1,421 | 2,864 | | | | | | | |
| 1999 | 13,200 | 377 | 1,363 | | | | | | | | |
| 2000 | 13,600 | 344 | | | | | | | | | |
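See here
$$CL_{AY,dev} \sim N(\mu_{AY,dev}, \sigma^2_{dev})$$
$$\mu_{AY,dev} = ULT_{AY} \cdot G(dev \mid \omega, \theta)$$
$$\sigma_{dev} = \sigma \sqrt{\mu_{dev}}$$
$$ULT_{AY} \sim N(\mu_{ult}, \sigma^2_{ult})$$
$$G(dev \mid \omega, \theta) = 1 - \exp\left(-\left(\frac{dev}{\theta}\right)^{\omega}\right)$$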
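To get a feel for the growth curve, the following sketch evaluates G and the implied expected cumulative claims at a few development months. The function names are hypothetical. The parameter values are the posterior means for `mu_ult`, `omega` and `theta` reported in the Stan summary in this deck. Plugging posterior means into a nonlinear function is only a rough approximation to the posterior mean of the curve itself.

```python
import math


def growth(dev, omega, theta):
    """Growth curve G(dev | omega, theta) = 1 - exp(-(dev / theta) ** omega)."""
    return 1.0 - math.exp(-((dev / theta) ** omega))


def expected_cumulative(ult, dev, omega, theta):
    """Expected cumulative claims: ultimate loss times the proportion developed."""
    return ult * growth(dev, omega, theta)


# Posterior means taken from the Stan summary (plug-in values, not full posteriors).
MU_ULT, OMEGA, THETA = 5355.32, 1.30, 47.50

for dev in (6, 30, 66, 114):
    developed = growth(dev, OMEGA, THETA)
    claims = expected_cumulative(MU_ULT, dev, OMEGA, THETA)
    print(dev, round(developed, 3), round(claims))
```

A fully Bayesian version would evaluate the curve for every posterior draw and summarise the resulting distribution instead of using point values.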
## Inference for Stan model: MultiLevelGrowthCurve.
## 4 chains, each with iter=7000; warmup=2000; thin=2;
## post-warmup draws per chain=2500, total post-warmup draws=10000.
##
##              mean se_mean     sd     50%     75%   97.5% n_eff Rhat
## mu_ult    5355.32    5.74 275.18 5344.66 5532.70 5914.99  2300    1
## omega        1.30    0.00   0.03    1.30    1.32    1.36  2933    1
## theta       47.50    0.06   2.51   47.32   49.02   52.91  1712    1
## sigma_ult  595.88    2.44 170.08  567.25  682.87 1014.74  4868    1
## sigma        3.08    0.01   0.33    3.05    3.28    3.82  4228    1
##
## Samples were drawn using NUTS(diag_e) at Tue Oct 9 18:44:13 2018.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
Further
Topics
Multilevel
Models
Bayesian
Model
Averaging
Causal
Networks
Actuaries
should be
scientific