The first step is admitting you have a ProblemKicking the Spreadsheet HabitMichael Jones6 May 20211

Times are changing3

Clients and stakeholders' needs are changing. Questions need to be answered quicker and more accurately for cheaper so we have to be more responsive Many may already be using data warehouses behind the scenes, and so we need to be comfortable with the language and technology in order to meet them where they are, or perhaps even help them get to where they want to be.

You may get a huge number of files to process and you won't be able to do it no matter how many actuarial students you throw at it.

Are we becoming
 replaceable?4

The genie is out of the bottle. Aside from reserved roles, there's nothing technical that we do that couldn't be done by a similarly skilled data scientist:

cheaper (since they aren't operating in a potentially inflated salary environment driven by false scarcity?)
better? since they haven't got a huge mountain of legacy thought

Part 1

Spreadsheets:
A biased history5

Public domain image (GNU General Public Licence)

Visicalc 1979 on the Apple II

Lotus 1-2-3 in 1983 on MS-DOS

Image copyright Microsoft, used with permission of Microsoft, or at least that's What Wikipedia told me to say

Microsoft Excel 1995

Screenshot under Mozillla Public Licence

LibreOffice

People seem to like
spreadsheets10

There are some good things

Rough work
Quick view of your data
Toy models
Data interchange

Also I've seen people make art in spreadsheets and perform standup comedy about spreadsheets.

But...12

https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates

News report from The Verge

We have the most successful, most popular spreadsheet program breaking peoples' data.

The tools you use dicate
what you can do

but they also dictate
 what you think is possible14

Spreadsheets make you squash problems into a format that can be solved by a spreadsheet or make you think your problem isn't solvable because it can't be done in a spreadsheet easily.

Part 2:

Reaching their limits15

https://www.bbc.co.uk/news/technology-54423988

Excel is not a database17

2003 TransAlta lost $24 million due to Excel copy-and-paste error
2005 Kodak suffered an $11 million severance error due to Excel typo
18

2010 MI5 bugged the wrong phones due to a spreadsheet formatting error
2011 AstraZeneca accidentally released confidential information due to an Excel templating error

https://www.teampay.co/insights/biggest-excel-mistakes-of-all-time/

When you don't control
your primary data system,
you are at risk20

Present and future challengesLife risk modeling
Solvency II
IFRS 17
Complex Models
21

Spreadsheets have 
inherent problems
that limit their
ongoing usefulness22

ProblemsThe cell as the fundamental unit
No data types & lack of abstraction
Slowness
Conflation of model, implementation and instance
23

Problems (continuted)VBA
Fragmentation of model instances
Wider Office ecosystem
It crashes a lot
24

What would a solution look like?quicker results
less risk
more value add work

more fulfilling work
more efficient
more reactive

25

Part 3:

Another way26

SQLA relational model of data
A centralised source
Data constraints
Interoperability
27

Programming languagesR
Python
Julia
Separate the model, the implementation and the use28

Code
reuse29

Abstractions and
Data Structures30

SpeedAviva, Julia and Solvency II
Me, a complex client and an adventuRe
31

New or better
methods32

Version control33

3rd party
libraries34

Automated
testing35

Computable
Documents36

Better
Plotting37

diamonds

## # A tibble: 53,940 x 10
##    carat cut       color clarity depth table price     x     y     z
##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
##  2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
##  3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
##  4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
##  5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
##  6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
##  7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
##  8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
##  9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
## 10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
## # … with 53,930 more rows
37

diamonds %>%
  ggplot(aes(x = carat,
             y = price,
             colour = clarity))

diamonds %>%
  ggplot(aes(x = carat,
             y = price,
             colour = clarity)) +
  geom_point()

diamonds %>%
  ggplot(aes(x = carat,
             y = price,
             colour = clarity)) +
  geom_point() +
  facet_wrap(~cut)

diamonds %>%
  ggplot(aes(x = carat,
             y = price,
             colour = clarity)) +
  geom_point() +
  facet_wrap(~cut) +
  theme_bw()

diamonds %>%
  ggplot(aes(x = carat,
             y = price,
             colour = clarity)) +
  geom_point() +
  facet_wrap(~cut) +
  theme_bw() +
  theme(legend.position = "bottom")

diamonds %>%
  ggplot(aes(x = carat,
             y = price,
             colour = clarity)) +
  geom_point() +
  facet_wrap(~cut) +
  theme_bw() +
  theme(legend.position = "bottom") +
  labs(x = "Carat",
       y = "Price (USD)",
       title = "Some facts about Diamonds")

ggplot(midwest, aes(x = poptotal))

ggplot(midwest, aes(x = poptotal)) +
  geom_histogram(bins = 30)

ggplot(midwest, aes(x = poptotal)) +
  geom_histogram(bins = 30) +
  facet_wrap(~state)

ggplot(midwest, aes(x = poptotal)) +
  geom_histogram(bins = 30) +
  facet_wrap(~state) +
  scale_x_log10(labels = scales::comma)

ggplot(midwest, aes(x = poptotal)) +
  geom_histogram(bins = 30) +
  facet_wrap(~state) +
  scale_x_log10(labels = scales::comma) +
  theme_bw()

ggplot(midwest, aes(x = poptotal)) +
  geom_histogram(bins = 30) +
  facet_wrap(~state) +
  scale_x_log10(labels = scales::comma) +
  theme_bw() +
  labs(x = "Total Population",
       y = "Count",
       title = "Distribution of Total county population in Midwestern States")

Control & reproducibility38

TAS 200 (Insurance):
Implementations and realisations of models shall be reproducible.
But what does that mean?39

All Analysis is a DAG

Interlude:
A live demo41

Part 4:

How to learn42

Pick a Language43

Probably don't bother
with the IFoA Certificate
in Data Science44

Start
Small45

www.xkcd.com/1205

The best things in life are free

R for Data Science, Hadley Wickham
Rstudio tutorials
Package documentation
Conference recordings

The best things in life are freePeople at work
Online communities, e.g. Twitter
Data Science screencasts
Other books and blogs
48

If someone else is payingPaid training
Dead-tree books
Physical conferences
49

Come to terms
with the cost50

Epilogue51

If you must use spreadsheets,
use them wellData Organization in Spreadsheets, Broman & Woo, The American Statistician
Tidy Data, Hadley Wickham, The Journal of Statistical Software
52

Final ThoughtsActuaries could probably be replaced
Code-based work flows offer massive advantages of speed, accuracy and quality
It's never too late to start learning
53

The first step is admitting you have a ProblemKicking the Spreadsheet HabitMichael Jones6 May 20211

Times are changing3

You may get a huge number of files to process and you won't be able to do it no matter how many actuarial students you throw at it.

Are we becoming
 replaceable?4

The genie is out of the bottle. Aside from reserved roles, there's nothing technical that we do that couldn't be done by a similarly skilled data scientist:

cheaper (since they aren't operating in a potentially inflated salary environment driven by false scarcity?)
better? since they haven't got a huge mountain of legacy thought

Part 1

Spreadsheets:
A biased history5

Public domain image (GNU General Public Licence)

Visicalc 1979 on the Apple II

Lotus 1-2-3 in 1983 on MS-DOS

Image copyright Microsoft, used with permission of Microsoft, or at least that's What Wikipedia told me to say

Microsoft Excel 1995

Screenshot under Mozillla Public Licence

LibreOffice

People seem to like
spreadsheets10

There are some good things

Rough work
Quick view of your data
Toy models
Data interchange

Also I've seen people make art in spreadsheets and perform standup comedy about spreadsheets.

But...12

https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates

News report from The Verge

We have the most successful, most popular spreadsheet program breaking peoples' data.

The tools you use dicate
what you can do

but they also dictate
 what you think is possible14

Spreadsheets make you squash problems into a format that can be solved by a spreadsheet or make you think your problem isn't solvable because it can't be done in a spreadsheet easily.

Part 2:

Reaching their limits15

https://www.bbc.co.uk/news/technology-54423988

Excel is not a database17

2003 TransAlta lost $24 million due to Excel copy-and-paste error
2005 Kodak suffered an $11 million severance error due to Excel typo
18

2010 MI5 bugged the wrong phones due to a spreadsheet formatting error
2011 AstraZeneca accidentally released confidential information due to an Excel templating error

https://www.teampay.co/insights/biggest-excel-mistakes-of-all-time/

When you don't control
your primary data system,
you are at risk20

Present and future challengesLife risk modeling
Solvency II
IFRS 17
Complex Models
21

Spreadsheets have 
inherent problems
that limit their
ongoing usefulness22

ProblemsThe cell as the fundamental unit
No data types & lack of abstraction
Slowness
Conflation of model, implementation and instance
23

Problems (continuted)VBA
Fragmentation of model instances
Wider Office ecosystem
It crashes a lot
24

What would a solution look like?quicker results
less risk
more value add work

more fulfilling work
more efficient
more reactive

25

Part 3:

Another way26

SQLA relational model of data
A centralised source
Data constraints
Interoperability
27

Programming languagesR
Python
Julia
Separate the model, the implementation and the use28

Code
reuse29

Abstractions and
Data Structures30

SpeedAviva, Julia and Solvency II
Me, a complex client and an adventuRe
31

New or better
methods32

Version control33

3rd party
libraries34

Automated
testing35

Computable
Documents36

Better
Plotting37

diamonds

## # A tibble: 53,940 x 10
##    carat cut       color clarity depth table price     x     y     z
##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
##  2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
##  3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
##  4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
##  5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
##  6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
##  7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
##  8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
##  9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
## 10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
## # … with 53,930 more rows
37

diamonds %>%
  ggplot(aes(x = carat,
             y = price,
             colour = clarity))

diamonds %>%
  ggplot(aes(x = carat,
             y = price,
             colour = clarity)) +
  geom_point()

diamonds %>%
  ggplot(aes(x = carat,
             y = price,
             colour = clarity)) +
  geom_point() +
  facet_wrap(~cut)

diamonds %>%
  ggplot(aes(x = carat,
             y = price,
             colour = clarity)) +
  geom_point() +
  facet_wrap(~cut) +
  theme_bw()

diamonds %>%
  ggplot(aes(x = carat,
             y = price,
             colour = clarity)) +
  geom_point() +
  facet_wrap(~cut) +
  theme_bw() +
  theme(legend.position = "bottom")

diamonds %>%
  ggplot(aes(x = carat,
             y = price,
             colour = clarity)) +
  geom_point() +
  facet_wrap(~cut) +
  theme_bw() +
  theme(legend.position = "bottom") +
  labs(x = "Carat",
       y = "Price (USD)",
       title = "Some facts about Diamonds")

ggplot(midwest, aes(x = poptotal))

ggplot(midwest, aes(x = poptotal)) +
  geom_histogram(bins = 30)

ggplot(midwest, aes(x = poptotal)) +
  geom_histogram(bins = 30) +
  facet_wrap(~state)

ggplot(midwest, aes(x = poptotal)) +
  geom_histogram(bins = 30) +
  facet_wrap(~state) +
  scale_x_log10(labels = scales::comma)

ggplot(midwest, aes(x = poptotal)) +
  geom_histogram(bins = 30) +
  facet_wrap(~state) +
  scale_x_log10(labels = scales::comma) +
  theme_bw()

ggplot(midwest, aes(x = poptotal)) +
  geom_histogram(bins = 30) +
  facet_wrap(~state) +
  scale_x_log10(labels = scales::comma) +
  theme_bw() +
  labs(x = "Total Population",
       y = "Count",
       title = "Distribution of Total county population in Midwestern States")

Control & reproducibility38

TAS 200 (Insurance):
Implementations and realisations of models shall be reproducible.
But what does that mean?39

All Analysis is a DAG

Interlude:
A live demo41

Part 4:

How to learn42

Pick a Language43

Probably don't bother
with the IFoA Certificate
in Data Science44

Start
Small45

www.xkcd.com/1205

The best things in life are free

R for Data Science, Hadley Wickham
Rstudio tutorials
Package documentation
Conference recordings

The best things in life are freePeople at work
Online communities, e.g. Twitter
Data Science screencasts
Other books and blogs
48

If someone else is payingPaid training
Dead-tree books
Physical conferences
49

Come to terms
with the cost50

Epilogue51

If you must use spreadsheets,
use them wellData Organization in Spreadsheets, Broman & Woo, The American Statistician
Tidy Data, Hadley Wickham, The Journal of Statistical Software
52

Final ThoughtsActuaries could probably be replaced
Code-based work flows offer massive advantages of speed, accuracy and quality
It's never too late to start learning
53

↑, ←, Pg Up, k	Go to previous slide
↓, →, Pg Dn, Space, j	Go to next slide
Home	Go to first slide
End	Go to last slide
Number + Return	Go to specific slide
b / m / f	Toggle blackout / mirrored / fullscreen mode
c	Clone slideshow
p	Toggle presenter mode
t	Restart the presentation timer
?, h	Toggle this help
o	Tile View: Overview of Slides

The first step is admitting you have a Problem

Kicking the Spreadsheet Habit

Michael Jones

6 May 2021

Times are changing

Are we becoming replaceable?

Part 1Spreadsheets:A biased history

People seem to likespreadsheets

There are some good things

But...

The tools you use dicatewhat you can dobut they also dictate what you think is possible

Part 2:Reaching their limits

Excel is not a database

When you don't controlyour primary data system,you are at risk

Present and future challenges

Spreadsheets have inherent problemsthat limit theirongoing usefulness

Problems

Problems (continuted)

What would a solution look like?

Part 3:Another way

SQL

Programming languages

Separate the model, the implementation and the use

Codereuse

Abstractions andData Structures

Speed

New or bettermethods

Version control

3rd partylibraries

Automatedtesting

ComputableDocuments

BetterPlotting

Control & reproducibility

TAS 200 (Insurance):

Implementations and realisations of models shall be reproducible.

But what does that mean?

All Analysis is a DAG

Interlude:A live demo

Part 4:How to learn

Pick a Language

Probably don't botherwith the IFoA Certificatein Data Science

StartSmall

The best things in life are free

The best things in life are free

If someone else is paying

Come to termswith the cost

Epilogue

If you must use spreadsheets,use them well

Final Thoughts

Help

The first step is admitting you have a Problem

The first step is admitting you have a Problem

Kicking the Spreadsheet Habit

Michael Jones

6 May 2021

Times are changing

Are we becoming replaceable?

Part 1Spreadsheets:A biased history

People seem to likespreadsheets

There are some good things

But...

The tools you use dicatewhat you can dobut they also dictate what you think is possible

Part 2:Reaching their limits

Excel is not a database

When you don't controlyour primary data system,you are at risk

Present and future challenges

Spreadsheets have inherent problemsthat limit theirongoing usefulness

Problems

Problems (continuted)

What would a solution look like?

Part 3:Another way

SQL

Programming languages

Separate the model, the implementation and the use

Codereuse

Abstractions andData Structures

Speed

New or bettermethods

Version control

3rd partylibraries

Are we becoming
replaceable?

Part 1

Spreadsheets:
A biased history

People seem to like
spreadsheets

The tools you use dicate
what you can do

but they also dictate
what you think is possible

Part 2:

Reaching their limits

When you don't control
your primary data system,
you are at risk

Spreadsheets have
inherent problems
that limit their
ongoing usefulness

Part 3:

Another way

Code
reuse

Abstractions and
Data Structures

New or better
methods

3^rd party
libraries

Automated
testing

Computable
Documents

Better
Plotting

Interlude:
A live demo

Part 4:

How to learn

Probably don't bother
with the IFoA Certificate
in Data Science

Start
Small

Come to terms
with the cost

If you must use spreadsheets,
use them well

Are we becoming
replaceable?

Part 1

Spreadsheets:
A biased history

People seem to like
spreadsheets

The tools you use dicate
what you can do

but they also dictate
what you think is possible

Part 2:

Reaching their limits

When you don't control
your primary data system,
you are at risk

Spreadsheets have
inherent problems
that limit their
ongoing usefulness

Part 3:

Another way

Code
reuse

Abstractions and
Data Structures

New or better
methods

3^rd party
libraries

Automated
testing

Computable
Documents

Better
Plotting

Interlude:
A live demo

Part 4:

How to learn

Probably don't bother
with the IFoA Certificate
in Data Science

Start
Small

Come to terms
with the cost

If you must use spreadsheets,
use them well