
Introduction to Computational Bayesian Methods for Actuaries

Michael Jones

10 October 2018

1 / 86

Actuaries
should be
scientific

2 / 86

Actuaries
should be
Bayesian

3 / 86

What?

4 / 86

Why?

5 / 86

How?

6 / 86

Bayesian
Recap

7 / 86

P(θ|D) = P(D|θ) P(θ) / ∫ P(D|θ) P(θ) dθ

8 / 86
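Spelled out, the numerator is likelihood × prior and the denominator is the evidence that normalises the posterior. A minimal numeric sketch of the rule (my own illustration, not from the slides), with the integral approximated by a sum over a grid:

```python
import numpy as np

# Grid of candidate values for a parameter theta in (0, 1)
theta = np.linspace(0.01, 0.99, 99)

# Flat prior P(theta) and Bernoulli likelihood P(D|theta) for one observed head
prior = np.full(theta.size, 1.0 / theta.size)
likelihood = theta                      # P(y = heads | theta) = theta

# Bayes' rule: posterior = likelihood * prior / evidence, with the
# integral in the denominator approximated by a sum over the grid
evidence = np.sum(likelihood * prior)
posterior = likelihood * prior / evidence
```

After normalising, the posterior sums to one and its mass shifts toward high values of θ, as a single head should suggest.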


Why be
Bayesian

13 / 86

It works

14 / 86

It produces testable results that, given the right models, are generally as good as (and sometimes better than) a frequentist analysis.

It's coherent

15 / 86

Most Bayesian analysis stems from a few foundational ideas, whereas frequentist statistics rests on a much more varied set of fundamentals.

Posterior information is rich and useful

It's natural

16 / 86

Humans naturally think in Bayesian terms, which are then 'unlearned' when studying frequentist analysis.

It's self
documenting

17 / 86

Through the priors, assumptions are formally absorbed into the analysis and cannot be hidden as in classical statistics.

You get the
full Posterior

18 / 86

It's modular

19 / 86

Multilevel models; choice of priors (e.g. it is simple to make a model robust to outliers)

Why not be
Bayesian

20 / 86

What
prior?

21 / 86

It's
awkward

22 / 86

A whole posterior distribution can be difficult to communicate/process

It's
hard

23 / 86

OK with conjugate pairs, but otherwise it's difficult without MCMC or other computationally intensive methods.

It's
weird

24 / 86

It's not what people are used to, and the themes are unfamiliar, though it's gaining traction in the social sciences and pharmacology (and election prediction)

Frequentism
vs
Bayesianism

25 / 86

Probability

26 / 86

Data

27 / 86

Parameters

28 / 86

Ranges

29 / 86
[Figure: frequentist confidence intervals from 100 draws from a Binomial with mean 15]
30 / 86
[Figure: Bayesian highest density interval — histogram of posterior draws with 95% HDI from 0.05 to 0.57]
31 / 86
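Given posterior samples, the 95% HDI is the narrowest interval holding 95% of the mass; it can be found by sliding a fixed-coverage window over the sorted draws. A sketch (the `hdi` helper and the Beta(2, 6) stand-in samples are my own):

```python
import numpy as np

def hdi(samples, cred=0.95):
    """Narrowest interval containing a `cred` fraction of the sampled mass."""
    s = np.sort(samples)
    n = int(np.ceil(cred * len(s)))          # points the interval must cover
    widths = s[n - 1:] - s[:len(s) - n + 1]  # width of every candidate window
    i = np.argmin(widths)                    # pick the narrowest window
    return s[i], s[i + n - 1]

rng = np.random.default_rng(1)
draws = rng.beta(2, 6, size=10_000)          # stand-in posterior samples
lo, hi = hdi(draws)
```

Unlike an equal-tailed interval, the HDI is shifted toward the bulk of a skewed posterior, which is why it is often preferred for summarising one.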

p-values &
significance
testing

32 / 86

How to be
Bayesian

33 / 86

The Bayesian
Workflow

34 / 86
  • Identify your data
  • Define a descriptive model
  • Specify a prior
  • Compute the Posterior
  • Interpret the Posterior
  • Check the model is reasonable
35 / 86

Analytically:
Estimating the
Bias of a coin

36 / 86

Parameter

θ = P(y_i = heads)

37 / 86

Likelihood

Bernoulli:

p(y|θ) = θ^y (1 − θ)^(1 − y)

38 / 86

Prior

Beta

P(θ | a, b) = beta(a, b) = θ^(a − 1) (1 − θ)^(b − 1) / B(a, b)

39 / 86
[Figure: Beta(a, b) densities for a, b ∈ {0.1, 1, 2, 3, 4, 20}]
40 / 86

Posterior

Another Beta

P(θ | n_heads, n_tails, a, b) = beta(a + n_heads, b + n_tails) = θ^(a + n_heads − 1) (1 − θ)^(b + n_tails − 1) / B(a + n_heads, b + n_tails)

41 / 86
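The conjugate update is just arithmetic on the Beta parameters — a sketch (the helper names are my own), using the 7-heads/5-tails data from the slides that follow:

```python
import math

def beta_pdf(theta, a, b):
    """Beta(a, b) density: theta^(a-1) (1-theta)^(b-1) / B(a, b)."""
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return theta ** (a - 1) * (1 - theta) ** (b - 1) / B

def update(a, b, n_heads, n_tails):
    """Conjugate update: Beta(a, b) prior -> Beta(a + n_heads, b + n_tails)."""
    return a + n_heads, b + n_tails

# Flat Beta(1, 1) prior plus 7 heads and 5 tails
a_post, b_post = update(1, 1, 7, 5)
posterior_mean = a_post / (a_post + b_post)     # 8 / 14
```

No integration is needed at all: the normalising constant B(a, b) is known in closed form, which is exactly what makes the conjugate pair convenient.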

Live demo time:

https://mjones.shinyapps.io/coin/

If it works

click here

If it doesn't work

click here

42 / 86

No idea about the coin

Flat beta(1,1) distribution:

[Figure: flat beta(1,1) prior over θ]
43 / 86

Collect some data

7 Heads, 5 tails:

[Figure: Bernoulli likelihood over θ for 7 heads, 5 tails]
44 / 86

Posterior

Another Beta: Beta(1 + 7, 1 + 5) = Beta(8, 6)

[Figure: posterior density over θ]
45 / 86

Strong Prior

[Figure: strong prior — prior, likelihood and posterior over θ]
46 / 86

Problems

47 / 86
  • Needing conjugate priors
  • Calculating the normalisation integral

What prior
to use?

48 / 86

Normalising

49 / 86

Computationally:
First Steps

50 / 86
[Figure: the posterior evaluated on a grid of x values, refined step by step over slides 51–55]
51–55 / 86
  • If the posterior is narrow, you waste a lot of time/computing power evaluating it in regions that do not matter
  • Not only that: as the number of dimensions (i.e. parameters) increases, the number of grid points grows exponentially.
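A sketch of the grid idea for the coin example (my own illustration): evaluate prior × likelihood at each grid point and normalise numerically. The exponential blow-up shows in the comment at the end.

```python
import numpy as np

# Evaluate prior x likelihood for the coin example (7 heads, 5 tails)
# on a fixed grid of theta values, then normalise numerically
theta = np.linspace(0, 1, 1001)
prior = np.ones_like(theta)                     # flat prior
likelihood = theta ** 7 * (1 - theta) ** 5

unnorm = prior * likelihood
posterior = unnorm / unnorm.sum()
map_estimate = theta[np.argmax(posterior)]      # peak of the posterior

# The curse of dimensionality: the same grid resolution over
# 10 parameters would need 1001 ** 10 evaluations
```

For one parameter this works fine (the peak lands near 7/12, matching the data), which is why grid approximation is a good first computational step before MCMC.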

Computationally:
MCMC

56 / 86

Markov Chain
Monte Carlo

57 / 86


Metropolis

60 / 86

Metropolis Algorithm

61 / 86



Metropolis Algorithm Example

  • 50 Islands

  • Populations in ratio 1:2:3:...:50

  • Want to visit each in accordance with its population

66 / 86
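The algorithm for the island example fits in a few lines — a sketch (assuming, as on the slides, 50 islands with populations in ratio 1:2:…:50): propose a neighbouring island, then accept the move with probability min(1, proposed population / current population).

```python
import random

random.seed(42)

n_islands = 50

def population(i):
    """Population of island i (ratio 1:2:...:50)."""
    return i

current = 25
steps = 200_000
visits = [0] * (n_islands + 1)    # visits[i] counts time spent on island i
for _ in range(steps):
    proposal = current + random.choice([-1, 1])
    # Accept with probability min(1, population ratio); proposals off the
    # ends of the chain of islands are always rejected
    if 1 <= proposal <= n_islands and \
            random.random() < population(proposal) / population(current):
        current = proposal        # move; otherwise stay put
    visits[current] += 1

# In the long run, visiting proportions match the population ratios:
# islands 26-50 hold 950/1275 ~ 75% of the total population
share_top_half = sum(visits[26:]) / steps
```

Note the key property: only population *ratios* are ever needed, so the algorithm works even when the normalising constant (here, the total population 1275) is unknown.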
[Figure: trace of the chain — island number vs step number at the beginning (steps 0–500) and end (steps 999,500–1,000,000) of the run]
67 / 86
[Figure: visiting proportions by island after 1, 10, 20, 100, 200, 300, 1,000, 10,000 and 1,000,000 steps, converging to the 1:2:…:50 population ratios]
68 / 86

Don't make
your own

69 / 86

Computationally:
In practice

70 / 86

Two Coins

71 / 86
## Markov Chain Monte Carlo (MCMC) output:
## Start = 501
## End = 531
## Thinning interval = 1
## theta[1] theta[2]
## [1,] 0.5331603 0.7960773
## [2,] 0.7224562 0.4499415
## [3,] 0.7102999 0.2461300
## [4,] 0.5370162 0.5314258
## [5,] 0.5014244 0.2403774
## [6,] 0.8557981 0.5679354
## [7,] 0.4511761 0.5866845
## [8,] 0.5826508 0.4369579
## [9,] 0.6994008 0.3722830
## [10,] 0.3815623 0.3713527
## [11,] 0.8098616 0.3163093
## [12,] 0.5554198 0.4975122
## [13,] 0.4028894 0.3134066
## [14,] 0.3785868 0.3637059
## [15,] 0.4323854 0.5251539
## [16,] 0.7882131 0.4448705
## [17,] 0.6833106 0.4087616
## [18,] 0.6819669 0.4665725
## [19,] 0.7579478 0.3968828
## [20,] 0.5021557 0.4240982
## [21,] 0.7010325 0.4487093
## [22,] 0.5308746 0.4451829
## [23,] 0.6973850 0.4182190
## [24,] 0.5973255 0.5136492
## [25,] 0.8815459 0.3668754
## [26,] 0.7933463 0.4132620
## [27,] 0.6295374 0.1937704
## [28,] 0.7943369 0.5278090
## [29,] 0.5191301 0.3335732
## [30,] 0.5250645 0.4350766
## [31,] 0.4926411 0.3410629
72 / 86
[Figure: marginal posteriors — theta[1] mode 0.657 (95% HDI 0.401–0.872), theta[2] mode 0.424 (95% HDI 0.19–0.682), theta[1]−theta[2] mode 0.239 with 95% HDI −0.134 to 0.551 (11.8% < 0 < 88.2%), plus the joint scatter of theta[1] vs theta[2]]
73 / 86
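Because this two-coin model is conjugate, exact posterior draws are available without MCMC, and quantities like P(theta[1] > theta[2]) fall straight out of the samples. A sketch — the flip counts below are assumed for illustration (the slides don't show them):

```python
import random

random.seed(0)

# With independent flat Beta(1, 1) priors, each coin's posterior is a Beta
# distribution; sample both and summarise theta[1] - theta[2] directly.
heads1, tails1 = 6, 3   # assumed data for coin 1
heads2, tails2 = 4, 5   # assumed data for coin 2

n = 10_000
draws1 = [random.betavariate(1 + heads1, 1 + tails1) for _ in range(n)]
draws2 = [random.betavariate(1 + heads2, 1 + tails2) for _ in range(n)]

diff = [d1 - d2 for d1, d2 in zip(draws1, draws2)]
p_coin1_better = sum(d > 0 for d in diff) / n   # P(theta[1] > theta[2] | data)
```

This is the "rich posterior" point in practice: a question about a *derived* quantity (the difference of the two biases) needs no new machinery, only arithmetic on the draws.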

Bayesianism
applied to
Insurance

74 / 86
AY premium 6 18 30 42 54 66 78 90 102 114
1991 10,000 358 1,125 1,735 2,183 2,746 3,320 3,466 3,606 3,834 3,901
1992 10,400 352 1,236 2,170 3,353 3,799 4,120 4,648 4,914 5,339
1993 10,800 291 1,292 2,219 3,235 3,986 4,133 4,629 4,909
1994 11,200 311 1,419 2,195 3,757 4,030 4,382 4,588
1995 11,600 443 1,136 2,128 2,898 3,403 3,873
1996 12,000 396 1,333 2,181 2,986 3,692
1997 12,400 441 1,288 2,420 3,483
1998 12,800 359 1,421 2,864
1999 13,200 377 1,363
2000 13,600 344
75 / 86
[Figure: cumulative claim amounts by development period for accident years 1991–2000]
76 / 86

The model

See here

CL_{AY,dev} ∼ N(μ_{AY,dev}, σ²_{dev})
μ_{AY,dev} = ULT_{AY} · G(dev | ω, θ)
σ_{dev} = σ · μ_{dev}
ULT_{AY} ∼ N(μ_{ult}, σ²_{ult})
G(dev | ω, θ) = 1 − exp(−(dev/θ)^ω)

77 / 86
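Plugging the posterior means from the Stan output that follows into the growth curve gives quick expected cumulative claims — a sketch (treating the point estimates as fixed, which discards the posterior uncertainty the full model carries):

```python
import math

def growth(dev, omega, theta):
    """Weibull growth curve: G(dev | omega, theta) = 1 - exp(-(dev/theta)^omega)."""
    return 1.0 - math.exp(-((dev / theta) ** omega))

# Posterior means from the Stan fit: omega, theta and mu_ult
omega, theta, mu_ult = 1.30, 47.50, 5355.32

# Expected cumulative claims at selected development periods for an
# accident year whose ultimate equals mu_ult
expected = {dev: mu_ult * growth(dev, omega, theta) for dev in (6, 30, 54, 114)}
```

G rises from near zero toward one as the development period grows, so the expected cumulative amount approaches (but never reaches) the ultimate.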

Stan model output

## Inference for Stan model: MultiLevelGrowthCurve.
## 4 chains, each with iter=7000; warmup=2000; thin=2;
## post-warmup draws per chain=2500, total post-warmup draws=10000.
##
## mean se_mean sd 50% 75% 97.5% n_eff Rhat
## mu_ult 5355.32 5.74 275.18 5344.66 5532.70 5914.99 2300 1
## omega 1.30 0.00 0.03 1.30 1.32 1.36 2933 1
## theta 47.50 0.06 2.51 47.32 49.02 52.91 1712 1
## sigma_ult 595.88 2.44 170.08 567.25 682.87 1014.74 4868 1
## sigma 3.08 0.01 0.33 3.05 3.28 3.82 4228 1
##
## Samples were drawn using NUTS(diag_e) at Tue Oct 9 18:44:13 2018.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
78 / 86
[Figure: posterior densities of the ultimate (ult) for accident years 1–10]
79 / 86
[Figure: fitted cumulative claims (cum) vs development period (dev) by accident year, 1991–2000]
80 / 86

Shinystan

Available here

81 / 86

Further
Topics

82 / 86

Multilevel
Models

83 / 86

Bayesian
Model
Averaging

84 / 86

Causal
Networks

85 / 86

References and Further Reading

Interactive

86 / 86
