Online slides (CC licence: BY-SA-NC):
https://ihi-risk-teaching.netlify.app/
This lecture will cover prediction modelling and model validation in the context of binary risk prediction
Materials and data presented are based on previous slides prepared by L. Palla, D. Prieto, and E. Williamson
Consider a dataset consisting of some variables collected from a group of individuals. We want to predict which individuals will develop a certain binary outcome Y.
Examples
Population | Available data | Outcome to predict |
---|---|---|
Surgical patients | age, sex, comorbidities, medications, severity of disease | complications after surgery |
Covid-19 patients | age, sex, vaccination status, virus variant, chest X-ray | ICU admission within 10 days |
Healthy individuals | body mass index, lifestyle, socioeconomic status, genotypes, lipid profile | myocardial infarction in 10-year follow-up |
*Note: prediction modelling is not limited to biomedical problems
Deterministic: classifies each individual directly as having the outcome or not (Yes/No)
Probabilistic: assigns each individual a predicted probability of the outcome, between 0 and 1
Both methods use individual-level data on a set of variables (or predictors) to develop a prediction model
*Other names: risk prediction model, predictive model, prognostic (or prediction) index or rule, and risk score
Moons KGM, Altman DG, Reitsma JB, et al. Ann Intern Med. 2015 Jan 6;162(1):W1–73.
Who will develop myocardial infarction in the next 5 years?
id | age | sex | diabet | SBP | Classify | Predict | Observed |
---|---|---|---|---|---|---|---|
1 | 35 | F | Yes | 145 | |||
2 | 35 | M | No | 130 | |||
3 | 55 | F | No | 115 | |||
4 | 55 | M | Yes | 170 | |||
5 | 65 | F | No | 135 | |||
6 | 65 | M | Yes | 140 | |||
7 | 75 | M | Yes | 160 | |||
8 | 75 | F | No | 130 | |||
9 | 85 | F | Yes | 130 | |||
10 | 85 | M | No | 160 |
Suppose we have a pre-defined deterministic classification model C (an illustrative rule is sketched in code after the table below):
C(age, sex, diabet, SBP) = 0 or 1
id | age | sex | diabet | SBP | Classify | Predict | Observed |
---|---|---|---|---|---|---|---|
1 | 35 | F | Yes | 145 | No | ||
2 | 35 | M | No | 130 | No | ||
3 | 55 | F | No | 115 | No | ||
4 | 55 | M | Yes | 170 | Yes | ||
5 | 65 | F | No | 135 | No | ||
6 | 65 | M | Yes | 140 | Yes | ||
7 | 75 | M | Yes | 160 | Yes | ||
8 | 75 | F | No | 130 | No | ||
9 | 85 | F | Yes | 130 | Yes | ||
10 | 85 | M | No | 160 | Yes |
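The slides do not specify the rule behind C, so the sketch below is an assumption for illustration only: one hypothetical deterministic rule of this form in R, which happens to reproduce the Classify column in the table above.

```r
# Hypothetical deterministic rule (assumed for illustration; the slides do not
# define model C explicitly). Classify "Yes" if the person is 55 or older AND
# either has diabetes or has SBP >= 150. (sex is accepted but unused by this toy rule)
classify_C <- function(age, sex, diabet, SBP) {
  ifelse(age >= 55 & (diabet == "Yes" | SBP >= 150), "Yes", "No")
}

classify_C(age = 35, sex = "F", diabet = "Yes", SBP = 145)  # "No"  (id 1)
classify_C(age = 65, sex = "M", diabet = "Yes", SBP = 140)  # "Yes" (id 6)
```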
Suppose we have a pre-defined probabilistic prediction model P:
P(age, sex, diabet, SBP) ∈ [0, 1]
id | age | sex | diabet | SBP | Classify | Predict | Observed |
---|---|---|---|---|---|---|---|
1 | 35 | F | Yes | 145 | No | 0.15 | |
2 | 35 | M | No | 130 | No | 0.05 | |
3 | 55 | F | No | 115 | No | 0.10 | |
4 | 55 | M | Yes | 170 | Yes | 0.55 | |
5 | 65 | F | No | 135 | No | 0.30 | |
6 | 65 | M | Yes | 140 | Yes | 0.52 | |
7 | 75 | M | Yes | 160 | Yes | 0.60 | |
8 | 75 | F | No | 130 | No | 0.40 | |
9 | 85 | F | Yes | 130 | Yes | 0.55 | |
10 | 85 | M | No | 160 | Yes | 0.60 |
... after 5 years of follow-up
id | age | sex | diabet | SBP | Classify | Predict | Observed |
---|---|---|---|---|---|---|---|
1 | 35 | F | Yes | 145 | No | 0.15 | No |
2 | 35 | M | No | 130 | No | 0.05 | No |
3 | 55 | F | No | 115 | No | 0.10 | No |
4 | 55 | M | Yes | 170 | Yes | 0.55 | No |
5 | 65 | F | No | 135 | No | 0.30 | Yes |
6 | 65 | M | Yes | 140 | Yes | 0.52 | No |
7 | 75 | M | Yes | 160 | Yes | 0.60 | Yes |
8 | 75 | F | No | 130 | No | 0.40 | No |
9 | 85 | F | Yes | 130 | Yes | 0.55 | Yes |
10 | 85 | M | No | 160 | Yes | 0.60 | Yes |
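For the examples that follow, it is handy to have this toy dataset in R. The sketch below recreates the table as a data frame; the column names and the factor coding of Observed are assumptions chosen to match the slides and the yardstick call shown later.

```r
# Toy dataset from the slides: Classify = class from deterministic model C,
# Predict = predicted probability from model P, Observed = outcome at 5 years.
df <- data.frame(
  id       = 1:10,
  age      = c(35, 35, 55, 55, 65, 65, 75, 75, 85, 85),
  sex      = c("F", "M", "F", "M", "F", "M", "M", "F", "F", "M"),
  diabet   = c("Yes", "No", "No", "Yes", "No", "Yes", "Yes", "No", "Yes", "No"),
  SBP      = c(145, 130, 115, 170, 135, 140, 160, 130, 130, 160),
  Classify = c("No", "No", "No", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes"),
  Predict  = c(0.15, 0.05, 0.10, 0.55, 0.30, 0.52, 0.60, 0.40, 0.55, 0.60),
  Observed = factor(c("No", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes"),
                    levels = c("Yes", "No"))  # "Yes" first = the event of interest
)
```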
Neither model C nor model P predicts the outcome correctly for every individual
The goal of model validation is to evaluate model performance by comparing predictions against observed values.
For binary prediction, model validation usually starts with a 2 × 2 contingency table (confusion matrix) containing the counts of all four possible pairs of predicted and observed values (a code sketch for building this table follows below)
id | Predicted | Observed |
---|---|---|
1 | No | No |
2 | No | No |
3 | No | No |
4 | Yes | No |
5 | No | Yes |
6 | Yes | No |
7 | Yes | Yes |
8 | No | No |
9 | Yes | Yes |
10 | Yes | Yes |
Predicted | Observed | Count |
---|---|---|
Yes | Yes | 3 |
Yes | No | 2 |
No | Yes | 1 |
No | No | 4 |
Predicted | Observed | Term |
---|---|---|
+ | + | True Positive |
+ | - | False Positive |
- | + | False Negative |
- | - | True Negative |
Predicted \ Observed | + | - |
---|---|---|
+ | 3 | 2 |
- | 1 | 4 |

Predicted \ Observed | + | - |
---|---|---|
+ | True Positive | False Positive |
- | False Negative | True Negative |
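A minimal sketch of building this 2 × 2 table in R, assuming the toy df data frame sketched earlier (Classify = predicted class, Observed = outcome):

```r
# Cross-tabulate predicted class against observed outcome;
# fixing the factor levels puts the positive class ("Yes") first.
conf_mat <- table(
  Predicted = factor(df$Classify, levels = c("Yes", "No")),
  Observed  = df$Observed
)
conf_mat
##          Observed
## Predicted Yes No
##       Yes   3  2
##       No    1  4
```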
With a 2 × 2 contingency table, we can calculate several useful metrics* to evaluate model performance (a short code sketch follows their definitions below), including:
sensitivity, recall, hit / detection rate, or true positive rate (TPR)
specificity, selectivity, or true negative rate (TNR)
precision or positive predictive value (PPV)
negative predictive value (NPV)
*for a full list, refer to Wikipedia entry for Confusion matrix
a.k.a True Positive Rate, Recall, Hit / Detection Rate
Probability of correctly predicting a positive outcome
What is the probability of a positive classification, given a positive outcome?
$P(\hat{Y}=1 \mid Y=1)$
Predicted \ Observed | + | - |
---|---|---|
+ | TP | FP |
- | FN | TN |
$$\text{Sensitivity} = \frac{\sum \text{True Positive}}{\sum \text{Observed Positive}} = \frac{TP}{TP + FN}$$
a.k.a Selectivity, True Negative Rate
Probability of correctly predicting a negative outcome
What is the probability of a negative classification, given a negative outcome?
$P(\hat{Y}=0 \mid Y=0)$
Predicted \ Observed | + | - |
---|---|---|
+ | TP | FP |
- | FN | TN |
$$\text{Specificity} = \frac{\sum \text{True Negative}}{\sum \text{Observed Negative}} = \frac{TN}{TN + FP}$$
Probability of a positive outcome given a positive classification
$P(Y=1 \mid \hat{Y}=1)$
For a given model (e.g. diagnostic test) with fixed sensitivity and specificity, PPV is positively correlated with disease prevalence
Intuition: as prevalence increases, true positives increase while false positives decrease
Predicted \ Observed | + | - |
---|---|---|
+ | TP | FP |
- | FN | TN |
$$\text{PPV} = \frac{\sum \text{True Positive}}{\sum \text{Predicted Positive}} = \frac{TP}{TP + FP}$$
Probability of a negative outcome given a negative classification
$P(Y=0 \mid \hat{Y}=0)$
For a given model (e.g. diagnostic test) with fixed sensitivity and specificity, NPV is negatively correlated with disease prevalence
Intuition: as prevalence increases, true negatives decrease while false negatives increase
Predicted \ Observed | + | - |
---|---|---|
+ | TP | FP |
- | FN | TN |
$$\text{NPV} = \frac{\sum \text{True Negative}}{\sum \text{Predicted Negative}} = \frac{TN}{TN + FN}$$
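These four formulas translate directly into code. A minimal sketch, taking the four cell counts of the confusion matrix as inputs:

```r
# Performance metrics computed from the four cells of a 2 x 2 contingency table
sensitivity <- function(TP, FP, FN, TN) TP / (TP + FN)  # true positive rate
specificity <- function(TP, FP, FN, TN) TN / (TN + FP)  # true negative rate
ppv         <- function(TP, FP, FN, TN) TP / (TP + FP)  # positive predictive value
npv         <- function(TP, FP, FN, TN) TN / (TN + FN)  # negative predictive value
```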
Suppose we have the following contingency table for classification model C:

Predicted \ Observed | + | - | Total |
---|---|---|---|
+ | 3 | 2 | 5 |
- | 1 | 4 | 5 |
Total | 4 | 6 | 10 |
Sensitivity = ?
Specificity = ?
PPV = ?
NPV = ?
Sensitivity = 3/4 = 0.75
Specificity = 4/6 = 0.67
PPV = 3/5 = 0.6
NPV = 4/5 = 0.8
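The same numbers fall out of the small helper functions sketched earlier by plugging in the cell counts (TP = 3, FP = 2, FN = 1, TN = 4):

```r
# Worked example for classification model C
sensitivity(TP = 3, FP = 2, FN = 1, TN = 4)  # 0.75
specificity(TP = 3, FP = 2, FN = 1, TN = 4)  # 0.667 (4/6)
ppv(TP = 3, FP = 2, FN = 1, TN = 4)          # 0.6
npv(TP = 3, FP = 2, FN = 1, TN = 4)          # 0.8
```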
For prediction model P, we order individuals by predicted probability and choose a cut-off point to classify as "Yes", e.g.
"Yes" if probability (P) > 0.1
id | Predict | Observed | Prob >0.1 |
---|---|---|---|
2 | 0.05 | No | No |
3 | 0.10 | No | No |
1 | 0.15 | No | Yes |
5 | 0.30 | Yes | Yes |
8 | 0.40 | No | Yes |
6 | 0.52 | No | Yes |
4 | 0.55 | No | Yes |
9 | 0.55 | Yes | Yes |
7 | 0.60 | Yes | Yes |
10 | 0.60 | Yes | Yes |
Predicted \ Observed | + | - | Total |
---|---|---|---|
+ | 4 | 4 | 8 |
- | 0 | 2 | 2 |
Total | 4 | 6 | 10 |
Sensitivity = 4/4 = 1
Specificity = 2/6 = 0.33
PPV = 4/8 = 0.5
NPV = 2/2 = 1
Higher sensitivity, lower specificity than classification model C
id | Predict | Observed | Prob >0.4 |
---|---|---|---|
2 | 0.05 | No | No |
3 | 0.10 | No | No |
1 | 0.15 | No | No |
5 | 0.30 | Yes | No |
8 | 0.40 | No | No |
6 | 0.52 | No | Yes |
4 | 0.55 | No | Yes |
9 | 0.55 | Yes | Yes |
7 | 0.60 | Yes | Yes |
10 | 0.60 | Yes | Yes |
Predicted \ Observed | + | - | Total |
---|---|---|---|
+ | 3 | 2 | 5 |
- | 1 | 4 | 5 |
Total | 4 | 6 | 10 |
Sensitivity = 3/4 = 0.75
Specificity = 4/6 = 0.67
PPV = 3/5 = 0.6
NPV = 4/5 = 0.8
Same contingency table as classification model C
id | Predict | Observed | Prob >0.55 |
---|---|---|---|
2 | 0.05 | No | No |
3 | 0.10 | No | No |
1 | 0.15 | No | No |
5 | 0.30 | Yes | No |
8 | 0.40 | No | No |
6 | 0.52 | No | No |
4 | 0.55 | No | No |
9 | 0.55 | Yes | No |
7 | 0.60 | Yes | Yes |
10 | 0.60 | Yes | Yes |
Predicted \ Observed | + | - | Total |
---|---|---|---|
+ | 2 | 0 | 2 |
- | 2 | 6 | 8 |
Total | 4 | 6 | 10 |
Sensitivity = 2/4 = 0.5
Specificity = 6/6 = 1
PPV = 2/2 = 1
NPV = 6/8 = 0.75
Lower sensitivity, higher specificity than classification model C
If we repeat this process for each probability value, we obtain a list of sensitivity and specificity values (a code sketch follows the table)
id | Predict | Observed | Cut-off | Sensitivity | Specificity |
---|---|---|---|---|---|
2 | 0.05 | No | P >0.05 | 1.00 | 0.17 |
3 | 0.10 | No | P >0.1 | 1.00 | 0.33 |
1 | 0.15 | No | P >0.15 | 1.00 | 0.50 |
5 | 0.30 | Yes | P >0.3 | 0.75 | 0.50 |
8 | 0.40 | No | P >0.4 | 0.75 | 0.67 |
6 | 0.52 | No | P >0.52 | 0.75 | 0.83 |
4 | 0.55 | No | P >0.55 | 0.50 | 1.00 |
9 | 0.55 | Yes | P >0.55 | 0.50 | 1.00 |
7 | 0.60 | Yes | P >0.6 | 0.00 | 1.00 |
10 | 0.60 | Yes | P >0.6 | 0.00 | 1.00 |
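A minimal sketch of this cut-off sweep in R, assuming the toy df from earlier; it reproduces the table above with one row per distinct probability value:

```r
# For each cut-off, classify "Yes" when Predict > cut-off, then compute
# sensitivity and specificity against the observed outcome.
cutoffs <- sort(unique(df$Predict))
perf <- t(sapply(cutoffs, function(cutoff) {
  pred_yes <- df$Predict > cutoff
  obs_yes  <- df$Observed == "Yes"
  c(cutoff      = cutoff,
    sensitivity = sum(pred_yes & obs_yes) / sum(obs_yes),
    specificity = sum(!pred_yes & !obs_yes) / sum(!obs_yes))
}))
perf
```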
A curve linking all the sensitivity values against the corresponding specificity values: the ROC (receiver operating characteristic) curve
Sens | Spec |
---|---|
1.00 | 0.17 |
1.00 | 0.33 |
1.00 | 0.50 |
0.75 | 0.50 |
0.75 | 0.67 |
0.75 | 0.83 |
0.50 | 1.00 |
0.50 | 1.00 |
0.00 | 1.00 |
0.00 | 1.00 |
What is the AUC?
Sample calculation with the yardstick package in R
```r
# df = the dataset shown in previous slides
roc <- yardstick::roc_auc(df, truth = Observed, Predict)
roc
## # A tibble: 1 × 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 roc_auc binary         0.854
```
AUC = 0.8541667
(Animation illustrating the ROC curve, by Dariya Sydykova; follow the link for more info)
AUC estimates the probability that a randomly chosen observed "yes" was assigned a higher predicted probability by the model than a randomly chosen observed "no"
Real-world models typically have an AUC between 0.5 (no better than chance) and 1; a value closer to 1 indicates better performance in separating "yes" and "no"
For binary classification, AUC is equal to the concordance (C) statistic
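This interpretation can be checked by brute force: compare every observed "yes" with every observed "no" and count the proportion of pairs in which the "yes" received the higher predicted probability (ties count as 0.5). A minimal sketch, assuming the toy df from earlier:

```r
# Concordance (C) statistic by pairwise comparison of all yes/no pairs
p_yes <- df$Predict[df$Observed == "Yes"]
p_no  <- df$Predict[df$Observed == "No"]
pairs <- outer(p_yes, p_no, FUN = function(a, b) (a > b) + 0.5 * (a == b))
mean(pairs)  # 0.8541667, matching the AUC from yardstick above
```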
We propose a statistical model for the probability of the event, $P(Y_i=1)$, as a function of the other variables
For example, a logistic model:
$$\log\left(\frac{P(Y_i=1)}{1-P(Y_i=1)}\right) = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \cdots \tag{1}$$
We need a training set in which we observe all the variables $Y_i, X_i, Z_i, \ldots$ in order to estimate the coefficients $\beta_0, \beta_1, \beta_2, \ldots$
Once we have the coefficients that best fit the data we can calculate the predicted risk for each individual i
$$\hat{P}(Y_i=1) = \frac{e^{\hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 Z_i + \cdots}}{1 + e^{\hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 Z_i + \cdots}} \tag{2}$$
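A minimal sketch of fitting equation (1) and applying equation (2) in R, using the toy df from earlier purely for illustration (ten observations are of course far too few for a real prediction model):

```r
# Fit the logistic model (equation 1) on the toy data
fit <- glm(I(Observed == "Yes") ~ age + sex + diabet + SBP,
           family = binomial, data = df)
coef(fit)  # the estimated beta coefficients

# Predicted risk for each individual (equation 2);
# predict(type = "response") applies the inverse-logit transformation
df$risk <- predict(fit, type = "response")
```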
Internal validation: the validity of claims for the underlying population from which the data originated (reproducibility)
Split-sample validation: split the dataset randomly into a training set (for model development) and a test set (for validation); a code sketch follows below
Other methods: cross-validation and bootstrap resampling
External validation: generalizability of claims to 'plausibly related' populations not included in the initial study population (transportability)
e.g. temporal or geographical validation
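A generic split-sample sketch in R. The dataset dat, the outcome column, and the 70/30 split are assumptions for illustration; they are not taken from the slides.

```r
# Split-sample internal validation: develop the model on a random training set,
# then evaluate its predictions on the held-out test set.
set.seed(42)
n         <- nrow(dat)                         # dat = a development dataset (assumed)
train_ids <- sample(n, size = round(0.7 * n))  # 70% for training (arbitrary choice)
train     <- dat[train_ids, ]
test      <- dat[-train_ids, ]

fit <- glm(outcome ~ age + sex + sbp + bmi, family = binomial, data = train)
test$risk <- predict(fit, newdata = test, type = "response")
# ... then assess discrimination (e.g. AUC) and calibration of test$risk
```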
We will use the variables Age, Sex, SBP, and BMI to predict whether the person will be dead (Death = 1) or alive (Death = 0) in 5 years' time
Stata command:
logistic dead age sex sbp bmi
Note that BMI is not statistically significant (P = 0.185)
Create variable: logit of the probability of death - equation (1)
predict m1lp, xb
Create variable: predicted probability of death - equation (2)
predict m1pr
Distribution of predicted probabilities in the dead and alive groups
Evaluate goodness of fit with the Hosmer-Lemeshow test
estat gof, group(10) table
Goodness of fit: Observed and expected events by deciles of risk
Limitation: the Hosmer-Lemeshow test cannot indicate the direction of miscalibration and relies on an arbitrary grouping
Calibration-in-the-large: compares the average predicted risk with the observed risk (0: ideal, <0: underestimation, >0: overestimation)
Calibration slope: evaluates the spread of estimated risks (1: ideal, <1: predictions too extreme, >1: too moderate); a sketch of how both can be estimated follows below
Further discussion: Steyerberg EW, Vergouwe Y (2014) and Calster BV et al. (2019)
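One common way to estimate both quantities is logistic recalibration on the validation data: regress the observed outcome on the logit of the predicted risk. A minimal sketch, where observed and risk are hypothetical validation-set vectors assumed for illustration, not the slides' data:

```r
# Hypothetical validation data (assumption, for illustration only)
observed <- c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1)                           # observed 0/1 outcomes
risk     <- c(0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.55, 0.60, 0.70, 0.80)  # predicted risks

lp <- qlogis(risk)  # linear predictor = logit of the predicted risk

# Calibration-in-the-large: intercept with the linear predictor as an offset
# (0 is ideal; deviations indicate systematic over- or underestimation)
glm(observed ~ 1, offset = lp, family = binomial)

# Calibration slope: coefficient of the linear predictor
# (1 is ideal; < 1 means predictions too extreme, > 1 too moderate)
glm(observed ~ lp, family = binomial)
```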
estat classification, cutoff(0.3)
AUC = 0.78
There is a 78% probability that a person who died was assigned a higher predicted risk by the model than a person who was alive at the end of follow-up
Plotting sensitivity and specificity against the cut-off value can help to select the most appropriate cut-off
In practice, this trade-off often needs to be decided on a case-by-case basis
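A minimal sketch of such a plot in base R, reusing the perf matrix from the cut-off sweep sketched earlier:

```r
# Sensitivity and specificity as functions of the classification cut-off
plot(perf[, "cutoff"], perf[, "sensitivity"], type = "b", ylim = c(0, 1),
     xlab = "Cut-off", ylab = "Sensitivity / Specificity")
lines(perf[, "cutoff"], perf[, "specificity"], type = "b", lty = 2, pch = 2)
legend("bottomright", legend = c("Sensitivity", "Specificity"),
       lty = c(1, 2), pch = c(1, 2))
```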
Calibration
Discrimination
Clinical usefulness
Prediction modelling can be broadly categorised into deterministic and probabilistic methods
A 2 × 2 contingency table (confusion matrix) is a useful first step in evaluating model performance
AUC is a useful measure of model discrimination
Comparing observed and predicted risks is useful for model calibration
Assessing clinical usefulness requires other approaches and often requires insights from beyond the data
Practical article: Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J. 2014 Aug 1;35(29):1925–1931.
Comparison with machine learning: Breiman L. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Stat Sci. Institute of Mathematical Statistics; 2001 Aug;16(3):199–231.