class: center, middle
background-color: #EEFAFC

# Risk Prediction

### Advanced Statistics for Records Research

<br/>

### .black[Albert Henry]

#### .black[UCL Institute of Health Informatics]

<br/><br/>

Online slides (CC licence: BY-SA-NC): .lightblue[https://ihi-risk-teaching.netlify.app/]

---

## Learning objectives

This lecture will cover:

- __Types__ of prediction
- __2x2 contingency table__
- Metrics to evaluate __model performance__
- Model development and validation in the context of __binary risk prediction__

<br/>

### Acknowledgements

Materials and data presented are based on previous slides prepared by L. Palla, D. Prieto, and E. Williamson

---

# Scenario

Consider a dataset consisting of some variables collected from a group of individuals. We want to predict which individuals develop a certain __binary outcome `\(Y\)`__.

__Examples__

| Population | Available data | Outcome to predict |
|---------------------|--------------------------------------------------------------------|--------------------------------------------|
| Surgical patients | age, sex, comorbidities, medications, severity of disease | complications after surgery |
| Covid-19 patients | age, sex, vaccination status, virus variant, chest X-ray | ICU admission within 10 days |
| Healthy individuals | body mass index, lifestyle, socioeconomic status, genotypes, lipid profile | myocardial infarction in 10-year follow-up |

*___Note___: prediction modelling is ___not___ limited to biomedical problems

---

### Types of prediction: deterministic vs probabilistic

- __Deterministic:__
  - **Classify** each individual into one of the two possible outcomes.
  - Often used in __(Supervised) Machine Learning__
- __Probabilistic:__
  - Assign each individual a **probability** of developing the outcome.
  - Often used in __Biostatistics__ and is known as __Risk Prediction*__

Both methods use individual-level data on a set of variables (or __predictors__) to develop a prediction model.

.footnote[
__*__Other names: risk prediction model, predictive model, prognostic (or prediction) index or rule, and risk score
]

---

### Types of prediction: diagnostic vs prognostic

.center[
![:scale 50%](https://www.acpjournals.org/na101/home/literatum/publisher/acp/journals/content/aim/2015/aim.2015.162.issue-1/m14-0698/20211006/images/medium/2ff1_box_a_schematic_representation_of_diagnostic_and_prognostic_prediction_modeling_studies.jpg)

.smaller[Moons KGM, Altman DG, Reitsma JB, et al. Ann Intern Med. 2015 Jan 6;162(1):W1–73.]
]

---

## Example

.large[Who will develop myocardial infarction in the next 5 years?]

<table class="table table-hover table-striped" style="font-size: 20px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;background-color: #bbbbbb !important;"> id </th> <th style="text-align:center;background-color: #bbbbbb !important;"> age </th> <th style="text-align:center;background-color: #bbbbbb !important;"> sex </th> <th style="text-align:center;background-color: #bbbbbb !important;"> diabet </th> <th style="text-align:center;background-color: #bbbbbb !important;"> SBP </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Classify </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Predict </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Observed </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 35 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 145 </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td 
style="text-align:center;"> 35 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 130 </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 55 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 115 </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 55 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 170 </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 5 </td> <td style="text-align:center;"> 65 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 135 </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 6 </td> <td style="text-align:center;"> 65 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 140 </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 7 </td> <td style="text-align:center;"> 75 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 160 </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 8 </td> <td 
style="text-align:center;"> 75 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 130 </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 85 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 130 </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 10 </td> <td style="text-align:center;"> 85 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 160 </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> </tbody> </table>

---

## Deterministic classification

Suppose we have a pre-defined deterministic classification model `C`:

`C(age, sex, diabet, SBP) = 0 or 1`

<table class="table table-hover table-striped" style="font-size: 20px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;background-color: #bbbbbb !important;"> id </th> <th style="text-align:center;background-color: #bbbbbb !important;"> age </th> <th style="text-align:center;background-color: #bbbbbb !important;"> sex </th> <th style="text-align:center;background-color: #bbbbbb !important;"> diabet </th> <th style="text-align:center;background-color: #bbbbbb !important;"> SBP </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Classify </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Predict </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Observed </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 35 </td> <td 
style="text-align:center;"> F </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 145 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 35 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 130 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 55 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 115 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 55 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 170 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 5 </td> <td style="text-align:center;"> 65 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 135 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 6 </td> <td style="text-align:center;"> 65 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 140 
</td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 7 </td> <td style="text-align:center;"> 75 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 160 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 8 </td> <td style="text-align:center;"> 75 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 130 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 85 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 130 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 10 </td> <td style="text-align:center;"> 85 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 160 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr> </tbody> </table>

---

## Probabilistic prediction

Suppose we have a pre-defined probabilistic prediction model `P`:

`P(age, sex, diabet, SBP) ∈ [0, 1]`

<table class="table table-hover table-striped" style="font-size: 20px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th 
style="text-align:center;background-color: #bbbbbb !important;"> id </th> <th style="text-align:center;background-color: #bbbbbb !important;"> age </th> <th style="text-align:center;background-color: #bbbbbb !important;"> sex </th> <th style="text-align:center;background-color: #bbbbbb !important;"> diabet </th> <th style="text-align:center;background-color: #bbbbbb !important;"> SBP </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Classify </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Predict </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Observed </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 35 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 145 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.15 </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 35 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 130 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.05 </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 55 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 115 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.10 </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 4 </td> <td 
style="text-align:center;"> 55 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 170 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.55 </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 5 </td> <td style="text-align:center;"> 65 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 135 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.30 </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 6 </td> <td style="text-align:center;"> 65 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 140 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.52 </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 7 </td> <td style="text-align:center;"> 75 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 160 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.60 </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 8 </td> <td style="text-align:center;"> 75 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 130 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> <td style="text-align:center;font-weight: 
bold;color: #1261A5 !important;"> 0.40 </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 85 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 130 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.55 </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> 10 </td> <td style="text-align:center;"> 85 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 160 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.60 </td> <td style="text-align:center;"> </td> </tr> </tbody> </table>

---

## Observation

... after 5 years of follow-up

<table class="table table-hover table-striped" style="font-size: 20px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;background-color: #bbbbbb !important;"> id </th> <th style="text-align:center;background-color: #bbbbbb !important;"> age </th> <th style="text-align:center;background-color: #bbbbbb !important;"> sex </th> <th style="text-align:center;background-color: #bbbbbb !important;"> diabet </th> <th style="text-align:center;background-color: #bbbbbb !important;"> SBP </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Classify </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Predict </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Observed </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 35 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> Yes </td> <td 
style="text-align:center;"> 145 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.15 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 35 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 130 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.05 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 55 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 115 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.10 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> </tr> <tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 55 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 170 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.55 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> </tr> <tr> <td style="text-align:center;"> 5 </td> <td style="text-align:center;"> 65 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 135 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 
!important;"> No </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.30 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> </tr> <tr> <td style="text-align:center;"> 6 </td> <td style="text-align:center;"> 65 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 140 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.52 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> </tr> <tr> <td style="text-align:center;"> 7 </td> <td style="text-align:center;"> 75 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 160 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.60 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> </tr> <tr> <td style="text-align:center;"> 8 </td> <td style="text-align:center;"> 75 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 130 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.40 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> No </td> </tr> <tr> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 85 </td> <td style="text-align:center;"> F </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 130 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.55 
</td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> </tr> <tr> <td style="text-align:center;"> 10 </td> <td style="text-align:center;"> 85 </td> <td style="text-align:center;"> M </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 160 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> 0.60 </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> Yes </td> </tr> </tbody> </table>

---

## Model validation

.large[
* Neither the __`C`__ nor the __`P`__ model is always correct in predicting the outcome
* The goal of model validation is to evaluate model performance by __comparing predictions against observed values__.
* For binary prediction, model validation usually starts with creating a 2x2 contingency table / confusion matrix consisting of all four possible __pairs of predicted-observed values__
]

---

### 2x2 contingency table / Confusion matrix

.pull-left[
<table class="table table-hover table-striped" style="font-size: 20px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;background-color: #bbbbbb !important;"> id </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Predicted </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Observed </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> No </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> No </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> No </td> </tr> <tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> </tr> 
<tr> <td style="text-align:center;"> 5 </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> </tr> <tr> <td style="text-align:center;"> 6 </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> </tr> <tr> <td style="text-align:center;"> 7 </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> </tr> <tr> <td style="text-align:center;"> 8 </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> No </td> </tr> <tr> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> </tr> <tr> <td style="text-align:center;"> 10 </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> </tr> </tbody> </table>
]

--

.pull-right[
<table class="table table-hover table-striped" style="font-size: 20px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;background-color: #bbbbbb !important;"> Predicted </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Observed </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Count </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 3 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 2 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 1 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 4 </td> </tr> </tbody> </table>

<br/>

<table class="table table-hover table-striped" style="font-size: 20px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;background-color: #bbbbbb !important;"> 
Predicted </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Observed </th> <th style="text-align:center;background-color: #bbbbbb !important;"> Term </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> + </td> <td style="text-align:center;"> + </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> True Positive </td> </tr> <tr> <td style="text-align:center;"> + </td> <td style="text-align:center;"> - </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> False Positive </td> </tr> <tr> <td style="text-align:center;"> - </td> <td style="text-align:center;"> + </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> False Negative </td> </tr> <tr> <td style="text-align:center;"> - </td> <td style="text-align:center;"> - </td> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;"> True Negative </td> </tr> </tbody> </table>
]

---

### 2x2 contingency table / Confusion matrix

.small[
.pull-left[
<table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> </tr> <tr> <th style="text-align:center;"> Predicted </th> <th style="text-align:center;"> + </th> <th style="text-align:center;"> - </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> + </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 3 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 2 </td> </tr> <tr> <td style="text-align:center;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> - </td> 
<td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 1 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 4 </td> </tr> </tbody> </table>
]

.pull-right[
<table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> </tr> <tr> <th style="text-align:center;"> Predicted </th> <th style="text-align:center;"> + </th> <th style="text-align:center;"> - </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> + </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> True Positive </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> False Positive </td> </tr> <tr> <td style="text-align:center;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> - </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> False Negative </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> True Negative </td> </tr> </tbody> </table>
]

<br/>

With a 2x2 contingency table, we can calculate several useful metrics* to evaluate model performance, including:

* __sensitivity__, __recall__, __hit / detection rate__, or __true positive rate (TPR)__
* __specificity__, __selectivity__, or __true negative rate (TNR)__
* __precision__ or __positive predictive value (PPV)__
* __negative predictive value (NPV)__
]

<br/><br/>

.smaller[
*For a full list, refer to the [Wikipedia entry for Confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix)
]

---

## Sensitivity

* a.k.a __True Positive Rate__, __Recall__, 
__Hit / Detection Rate__
* Probability of __correctly predicting a positive outcome__
* What is the probability of a __.blue[positive classification]__, given a __.purple[positive outcome]__?
* `\(P(\color{blue}{\hat{Y} = 1} ~ | ~ \color{purple}{Y = 1})\)`

<br/>

.pull-left[
<table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> </tr> <tr> <th style="text-align:center;"> Predicted </th> <th style="text-align:center;"> + </th> <th style="text-align:center;"> - </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;font-weight: bold;background-color: #f0f0f0 !important;"> + </td> <td style="text-align:center;width: 4cm; color: #1261A5 !important;background-color: #fafafa !important;background-color: #98FB98 !important;"> TP </td> <td style="text-align:center;width: 4cm; color: #1261A5 !important;background-color: #fafafa !important;"> FP </td> </tr> <tr> <td style="text-align:center;font-weight: bold;background-color: #f0f0f0 !important;"> - </td> <td style="text-align:center;width: 4cm; color: #1261A5 !important;background-color: #fafafa !important;background-color: #98FB98 !important;"> FN </td> <td style="text-align:center;width: 4cm; color: #1261A5 !important;background-color: #fafafa !important;"> TN </td> </tr> </tbody> </table>
]

.pull-right[
.small[
`$${Sensitivity}=\frac{\sum{True~Positive}}{\sum{Observed~Positive}}$$`

`$${Sensitivity}=\frac{TP}{TP + FN}$$`
]
]

---

## Specificity

* a.k.a __Selectivity__, __True Negative Rate__
* Probability of __correctly predicting a negative outcome__
* What is the probability of a __.teal[negative classification]__, given a __.brown[negative outcome]__? 
* `\(P(\color{teal}{\hat{Y} = 0} ~ | ~ \color{brown}{Y = 0})\)`

<br/>

.pull-left[
<table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> </tr> <tr> <th style="text-align:center;"> Predicted </th> <th style="text-align:center;"> + </th> <th style="text-align:center;"> - </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;font-weight: bold;background-color: #f0f0f0 !important;"> + </td> <td style="text-align:center;width: 4cm; color: #1261A5 !important;background-color: #fafafa !important;"> TP </td> <td style="text-align:center;width: 4cm; color: #1261A5 !important;background-color: #fafafa !important;background-color: #98FB98 !important;"> FP </td> </tr> <tr> <td style="text-align:center;font-weight: bold;background-color: #f0f0f0 !important;"> - </td> <td style="text-align:center;width: 4cm; color: #1261A5 !important;background-color: #fafafa !important;"> FN </td> <td style="text-align:center;width: 4cm; color: #1261A5 !important;background-color: #fafafa !important;background-color: #98FB98 !important;"> TN </td> </tr> </tbody> </table>
]

.pull-right[
.small[
`$${Specificity}=\frac{\sum{True~Negative}}{\sum{Observed~Negative}}$$`

`$${Specificity}=\frac{TN}{TN + FP}$$`
]
]

---

## Positive predictive value (PPV)

* Probability of a __.purple[positive outcome]__ given a __.blue[positive classification]__
* `\(P(\color{purple}{Y = 1} ~ | ~ \color{blue}{\hat{Y} = 1})\)`
* For a given model (e.g. 
diagnostic test) with fixed sensitivity and specificity, **PPV** __increases__ with __disease prevalence__ * Intuition: as **prevalence increases**, the number of **true positives increases** while the number of **false positives decreases** <br/> .pull-left[ <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> </tr> <tr> <th style="text-align:center;"> Predicted </th> <th style="text-align:center;"> + </th> <th style="text-align:center;"> - </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;background-color: #98FB98 !important;font-weight: bold;background-color: #f0f0f0 !important;"> + </td> <td style="text-align:center;width: 4cm; color: #1261A5 !important;background-color: #fafafa !important;background-color: #98FB98 !important;"> TP </td> <td style="text-align:center;width: 4cm; color: #1261A5 !important;background-color: #fafafa !important;background-color: #98FB98 !important;"> FP </td> </tr> <tr> <td style="text-align:center;font-weight: bold;background-color: #f0f0f0 !important;"> - </td> <td style="text-align:center;width: 4cm; color: #1261A5 !important;background-color: #fafafa !important;"> FN </td> <td style="text-align:center;width: 4cm; color: #1261A5 !important;background-color: #fafafa !important;"> TN </td> </tr> </tbody> </table> ] .pull-right[ .small[ `$${PPV}=\frac{\sum{True~Positive}}{\sum{Predicted~Positive}}$$` `$${PPV}=\frac{TP}{TP + FP}$$` ] ] --- ## Negative predictive value (NPV) * Probability of a __.brown[negative outcome]__ given a __.teal[negative classification]__ * `\(P(\color{brown}{Y = 0} ~ | ~ \color{teal}{\hat{Y} = 0})\)` * For a given model (e.g.
diagnostic test) with fixed sensitivity and specificity, __NPV__ __decreases__ as __disease prevalence__ increases * Intuition: as **prevalence increases**, the number of **true negatives decreases** while the number of **false negatives increases** <br/> .pull-left[ <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> </tr> <tr> <th style="text-align:center;"> Predicted </th> <th style="text-align:center;"> + </th> <th style="text-align:center;"> - </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;font-weight: bold;background-color: #f0f0f0 !important;"> + </td> <td style="text-align:center;width: 4cm; color: #1261A5 !important;background-color: #fafafa !important;"> TP </td> <td style="text-align:center;width: 4cm; color: #1261A5 !important;background-color: #fafafa !important;"> FP </td> </tr> <tr> <td style="text-align:center;background-color: #98FB98 !important;font-weight: bold;background-color: #f0f0f0 !important;"> - </td> <td style="text-align:center;width: 4cm; color: #1261A5 !important;background-color: #fafafa !important;background-color: #98FB98 !important;"> FN </td> <td style="text-align:center;width: 4cm; color: #1261A5 !important;background-color: #fafafa !important;background-color: #98FB98 !important;"> TN </td> </tr> </tbody> </table> ] .pull-right[ .small[ `$${NPV}=\frac{\sum{True~Negative}}{\sum{Predicted~Negative}}$$` `$${NPV}=\frac{TN}{TN + FN}$$` ] ] --- ### Comparing _classifications_ with observations Suppose we have the following contingency table for classification model `C`: <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th
style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> </tr> <tr> <th style="text-align:center;"> Predicted </th> <th style="text-align:center;"> + </th> <th style="text-align:center;"> - </th> <th style="text-align:center;"> Total </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> + </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 3 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 2 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 5 </td> </tr> <tr> <td style="text-align:center;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> - </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 1 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 4 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 5 </td> </tr> <tr> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> Total </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 4 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 6 </td> <td 
style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 10 </td> </tr> </tbody> </table> .large[ .pull-left[ Sensitivity __= ?__ Specificity __= ?__ ] .pull-right[ PPV __= ?__ NPV __= ?__ ] ] --- ### Comparing _classifications_ with observations Suppose we have the following contingency table for classification model `C`: <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> </tr> <tr> <th style="text-align:center;"> Predicted </th> <th style="text-align:center;"> + </th> <th style="text-align:center;"> - </th> <th style="text-align:center;"> Total </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> + </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 3 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 2 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 5 </td> </tr> <tr> <td style="text-align:center;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> - </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 1 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 4 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa 
!important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 5 </td> </tr> <tr> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> Total </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 4 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 6 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 10 </td> </tr> </tbody> </table> .large[ .pull-left[ Sensitivity __= 3/4 = 0.75__ Specificity __= 4/6 = 0.67__ ] .pull-right[ PPV __= 3/5 = 0.6__ NPV __= 4/5 = 0.8__ ] ] --- ### Comparing _predictions_ with observations .small[ For prediction model `P`, we order by predicted probability and choose a __cut-off point__ to classify as "Yes", e.g. 
__"Yes" if probability (P) > 0.1__ <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:right;"> Predict </th> <th style="text-align:left;"> Observed </th> <th style="text-align:left;"> Prob >0.1 </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 0.05 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: red !important;">No</span> </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0.10 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: red !important;">No</span> </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.15 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 0.30 </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 0.40 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 0.52 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0.55 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 0.55 </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> <span 
style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 0.60 </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 0.60 </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> <span style=" color: green !important;">Yes</span> </td> </tr> </tbody> </table> ] --- ### Cut-off point: _Yes_ if P > 0.1 <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> </tr> <tr> <th style="text-align:center;"> Predicted </th> <th style="text-align:center;"> + </th> <th style="text-align:center;"> - </th> <th style="text-align:center;"> Total </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> + </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 4 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 4 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 8 </td> </tr> <tr> <td style="text-align:center;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> - </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 0 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 2 </td> <td 
style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 2 </td> </tr> <tr> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> Total </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 4 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 6 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 10 </td> </tr> </tbody> </table> .large[ .pull-left[ Sensitivity __= 4/4 = 1__ Specificity __= 2/6 = 0.33__ ] .pull-right[ PPV __= 4/8 = 0.5__ NPV __= 2/2 = 1__ ] ] <br/><br/> Higher sensitivity, lower specificity than classification model `C` --- ### Cut-off point: _Yes_ if P > 0.4 .small[ <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:right;"> Predict </th> <th style="text-align:left;"> Observed </th> <th style="text-align:left;"> Prob >0.4 </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 0.05 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: red !important;">No</span> </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0.10 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: red !important;">No</span> </td> </tr> <tr> <td 
style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.15 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: red !important;">No</span> </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 0.30 </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> <span style=" color: red !important;">No</span> </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 0.40 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: red !important;">No</span> </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 0.52 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0.55 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 0.55 </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 0.60 </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 0.60 </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> <span style=" color: green !important;">Yes</span> </td> </tr> </tbody> </table> ] --- ### Cut-off point: _Yes_ if P > 0.4 <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; 
padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> </tr> <tr> <th style="text-align:center;"> Predicted </th> <th style="text-align:center;"> + </th> <th style="text-align:center;"> - </th> <th style="text-align:center;"> Total </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> + </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 3 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 2 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 5 </td> </tr> <tr> <td style="text-align:center;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> - </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 1 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 4 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 5 </td> </tr> <tr> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> Total </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 4 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 6 </td> <td style="text-align:center;width: 4cm; 
background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 10 </td> </tr> </tbody> </table> .large[ .pull-left[ Sensitivity __= 3/4 = 0.75__ Specificity __= 4/6 = 0.67__ ] .pull-right[ PPV __= 3/5 = 0.6__ NPV __= 4/5 = 0.8__ ] ] <br/><br/> Same contingency table as classification model `C` --- ### Cut-off point: _Yes_ if P > 0.55 .small[ <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:right;"> Predict </th> <th style="text-align:left;"> Observed </th> <th style="text-align:left;"> Prob >0.55 </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 0.05 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: red !important;">No</span> </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0.10 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: red !important;">No</span> </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.15 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: red !important;">No</span> </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 0.30 </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> <span style=" color: red !important;">No</span> </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 0.40 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: red !important;">No</span> </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 0.52 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> 
<span style=" color: red !important;">No</span> </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0.55 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> <span style=" color: red !important;">No</span> </td> </tr> <tr> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 0.55 </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> <span style=" color: red !important;">No</span> </td> </tr> <tr> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 0.60 </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 0.60 </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> <span style=" color: green !important;">Yes</span> </td> </tr> </tbody> </table> ] --- ### Cut-off point: _Yes_ if P > 0.55 <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> </tr> <tr> <th style="text-align:center;"> Predicted </th> <th style="text-align:center;"> + </th> <th style="text-align:center;"> - </th> <th style="text-align:center;"> Total </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> + </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 2 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 0 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa 
!important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 2 </td> </tr> <tr> <td style="text-align:center;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> - </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 2 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;"> 6 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 8 </td> </tr> <tr> <td style="text-align:center;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;font-weight: bold;color: #101010 !important;background-color: #f0f0f0 !important;"> Total </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 4 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 6 </td> <td style="text-align:center;width: 4cm; background-color: #fafafa !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;font-weight: bold;color: #1261A5 !important;background-color: #f0f0f0 !important;"> 10 </td> </tr> </tbody> </table> .large[ .pull-left[ Sensitivity __= 2/4 = 0.5__ Specificity __= 6/6 = 1__ ] .pull-right[ PPV __= 2/2 = 1__ NPV __= 6/8 = 0.75__ ] ] <br/><br/> Lower sensitivity, higher specificity than classification model `C` --- ## All cut-off points .small[ If we repeat this process for each probability value, we can obtain a list of sensitivity and specificity values <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:right;"> Predict </th> <th style="text-align:left;"> Observed </th> <th 
style="text-align:left;"> Cut-off </th> <th style="text-align:right;"> Sensitivity </th> <th style="text-align:right;"> Specificity </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 0.05 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> P >0.05 </td> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 0.17 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0.10 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> P >0.1 </td> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 0.33 </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.15 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> P >0.15 </td> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 0.50 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 0.30 </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> P >0.3 </td> <td style="text-align:right;"> 0.75 </td> <td style="text-align:right;"> 0.50 </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 0.40 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> P >0.4 </td> <td style="text-align:right;"> 0.75 </td> <td style="text-align:right;"> 0.67 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 0.52 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> P >0.52 </td> <td style="text-align:right;"> 0.75 </td> <td style="text-align:right;"> 0.83 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0.55 </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> P >0.55 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 1.00 </td> </tr> <tr> <td 
style="text-align:right;"> 9 </td> <td style="text-align:right;"> 0.55 </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> P >0.55 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 1.00 </td> </tr> <tr> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 0.60 </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> P >0.6 </td> <td style="text-align:right;"> 0.00 </td> <td style="text-align:right;"> 1.00 </td> </tr> <tr> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 0.60 </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> P >0.6 </td> <td style="text-align:right;"> 0.00 </td> <td style="text-align:right;"> 1.00 </td> </tr> </tbody> </table> ] --- ### .small[Receiver Operating Characteristic (ROC) Curve] .small[A curve plotting sensitivity against specificity at every cut-off point] .left-column[ <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> Sens </th> <th style="text-align:right;"> Spec </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 0.17 </td> </tr> <tr> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 0.33 </td> </tr> <tr> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 0.50 </td> </tr> <tr> <td style="text-align:right;"> 0.75 </td> <td style="text-align:right;"> 0.50 </td> </tr> <tr> <td style="text-align:right;"> 0.75 </td> <td style="text-align:right;"> 0.67 </td> </tr> <tr> <td style="text-align:right;"> 0.75 </td> <td style="text-align:right;"> 0.83 </td> </tr> <tr> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 1.00 </td> </tr> <tr> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 1.00 </td> </tr> <tr> <td style="text-align:right;"> 0.00 </td> <td style="text-align:right;"> 1.00 </td> </tr> <tr> <td
style="text-align:right;"> 0.00 </td> <td style="text-align:right;"> 1.00 </td> </tr> </tbody> </table> ] .right-column[ <img src="index_files/figure-html/unnamed-chunk-24-1.png" width="80%" /> ] --- ### .small[Area Under the [ROC] Curve (AUC / AUROC)] What is the AUC? <img src="index_files/figure-html/unnamed-chunk-25-1.png" width="60%" /> --- ### .small[Area Under the [ROC] Curve (AUC / AUROC)] What is the AUC? Sample calculation with [yardstick package](https://github.com/tidymodels/yardstick) in R

```r
# df = the dataset shown in previous slides
roc <- yardstick::roc_auc(df, truth = Observed, Predict)
roc
```

```
## # A tibble: 1 × 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 roc_auc binary         0.854
```

__AUC = 0.8541667__ --- ### .small[AUC and the distributions of predictors in the two outcome groups at different cut-off values] ![](https://github.com/dariyasydykova/open_projects/blob/master/ROC_animation/animations/cutoff.gif?raw=true) .footnote[by Dariya Sydykova ([follow link for more info](https://github.com/dariyasydykova/open_projects/tree/master/ROC_animation))] --- ### AUC as a measure of model discrimination ![](https://github.com/dariyasydykova/open_projects/blob/master/ROC_animation/animations/ROC.gif?raw=true) .footnote[by Dariya Sydykova ([follow link for more info](https://github.com/dariyasydykova/open_projects/tree/master/ROC_animation))] --- ### What does AUC tell us? * AUC estimates the __probability that a randomly chosen observed “yes” was assigned a higher probability than a randomly chosen observed “no”__ by the model * A useful model has an __AUC between 0.5 and 1__; 0.5 corresponds to random guessing, and a value closer to 1 indicates better performance in separating "yes" and "no". * For binary outcomes, AUC is equal to the __concordance (C) statistic__ --- ### How do we come up with predictions?
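As a preview of the workflow on this slide, here is a minimal R sketch. It is an illustration only: the data are simulated, and the names `train`, `x`, `z`, and `y` are hypothetical rather than taken from the lecture dataset. It fits a logistic model with `glm()` and then obtains each individual's linear predictor (log-odds) and predicted risk with `predict()`.

```r
# Hypothetical training data (simulated, not the lecture dataset):
# binary outcome y and two predictors x and z
set.seed(1)
train <- data.frame(x = rnorm(200), z = rbinom(200, 1, 0.5))
train$y <- rbinom(200, 1, plogis(-1 + 0.8 * train$x + 0.5 * train$z))

# Fit a logistic model: the log-odds of y is linear in x and z
fit <- glm(y ~ x + z, data = train, family = binomial)

# Estimated coefficients (intercept and slopes)
coef(fit)

# Linear predictor (log-odds) and predicted risk for each individual
lp   <- predict(fit, type = "link")
risk <- predict(fit, type = "response")

# The predicted risk is the inverse logit of the linear predictor
all.equal(unname(risk), unname(plogis(lp)))
```

`plogis()` is the inverse logit, so the final line checks that the predicted risk is exactly the back-transformed linear predictor.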
* We propose a statistical model for the __probability of the event happening__ `\(P\left(Y_{i}=1\right)\)` depending on the other variables * For example, a __logistic model__: `$$\log \left(\frac{P\left(Y_{i}=1\right)}{1-P\left(Y_{i}=1\right)}\right)=\beta_{0}+\beta_{1} X_{i}+\beta_{2} Z_{i}+\cdots \tag{1}$$` * We need a __training set__ where we can observe all the variables `\(Y_{i}, X_{i}, Z_{i}, \ldots\)` to estimate the __coefficients__ `\(\beta_{0}, \beta_{1}, \beta_{2}, \dots\)` * Once we have the coefficients that best fit the data, we can __calculate the predicted risk for each individual__ `\(i\)` `$$\widehat{P}\left(Y_{i}=1\right)=\frac{e^{\widehat{\beta}_{0}+\widehat{\beta}_{1} X_{i}+\widehat{\beta}_{2} Z_{i}+\cdots}}{1+e^{\widehat{\beta}_{0}+\widehat{\beta}_{1} X_{i}+\widehat{\beta}_{2} Z_{i}+\cdots}} \tag{2}$$` --- .small[ ### Dataset for model development & model validation #### Internal validation * The validity of claims for the underlying __population from which the data originated__ __(reproducibility)__ * __Split sample validation__: split the dataset randomly into __training__ (for model development) and __test__ (for validation) sets * Other methods: __cross validation__ and __bootstrap__ resampling #### External validation * Generalizability of claims to __‘plausibly related’ populations__ not included in the initial study population __(transportability)__ * e.g.
__temporal__ or __geographical__ validation

.footnote[[Steyerberg EW, Vergouwe Y (2014)](https://doi.org/10.1093/eurheartj/ehu207)]
]

---

### A larger example with 2000 individuals

We will use the variables `Age`, `Sex`, `SBP`, and `BMI` to predict whether a person will be dead (`Death = 1`) or alive (`Death = 0`) after 5 years

<img src="figure/stata_data.png" width="668" height="400px" />

---

### Model _M1_: Logistic regression

Stata command: `logistic dead age sex sbp bmi`

<img src="figure/stata_logReg.png" width="80%" />

Note that BMI is not statistically significant (`P = 0.185`)

---

### Make predictions from Model _M1_

Create variable: __logit of the probability of death__ - __equation (1)__

`predict m1lp, xb`

Create variable: __predicted probability of death__ - __equation (2)__

`predict m1pr`

<img src="figure/stata_logReg_predict.png" width="80%" />

---

#### Predicted probability of death in __`dead`__ and __`alive`__ group

![:scale 60%](figure/stata_boxPlot_1.png)

---

## Model calibration

Evaluate __goodness of fit__ with the __Hosmer-Lemeshow test__

`estat gof, group(10) table`

![:scale 70%](figure/stata_gof.png)

---

## Model calibration

__Goodness of fit__: Observed and expected events by deciles of risk

.center[![:scale 50%](figure/stata_gof_decile.png)]

--

.small[__.crimson[Limitation:]__ The Hosmer-Lemeshow test cannot indicate the direction of miscalibration and relies on arbitrary grouping]

---

### Model calibration with calibration plot

.smaller[
- __Calibration-in-the-large__ (calibration intercept): compares average predicted risk with observed risk (__0__: ideal, __<0__: overestimation, __>0__: underestimation)
- __Calibration slope__: evaluates the spread of estimated risks (__1__: ideal, __<1__: too extreme, __>1__: too moderate)

.center[![:scale 75%](figure/calibration_plot.png)]

Further discussion: [Steyerberg EW, Vergouwe Y (2014)](https://doi.org/10.1093/eurheartj/ehu207) and [Calster BV _et al._ (2019)](https://bmcmedicine.biomedcentral.com/articles/10.1186/s12916-019-1466-7)
]

---

###
Contingency table: cut-off `\(P(Y = 1) \geq 0.3\)`

`estat classification, cutoff(0.3)`

<img src="figure/stata_classify.png" width="70%" />

---

### ROC curve from model _M1_

.center[
![:scale 40%](figure/stata_roc_1.png)

__`AUC = 0.78`__
]

There is a 78% probability that a randomly chosen person who died is assigned a higher predicted risk by the model than a randomly chosen person who was alive at the end of follow-up

---

.small[
### Sensitivity and Specificity for model _M1_ <br/> by cut-off value

.center[![:scale 45%](figure/stata_sensSpec_1.png)]

Plotting __sensitivity and specificity against cut-off value__ can help to select the most appropriate cut-off

In practice, this trade-off often needs to be decided on a __case-by-case__ basis
]

---

### Evaluating prediction model performance

__Calibration__

* The agreement between the predicted & observed outcomes
* For a group of patients with 10% predicted risk, do 10% experience the outcome?
* e.g. Goodness of fit test, calibration curve

__Discrimination__

* The ability of the model to distinguish between _positive_ and _negative_ outcomes
* e.g. AUC / C statistic

__Clinical usefulness__

* Does the model provide accurate predictions at the patient level that can be used to guide clinical decision making?
* e.g. [Decision curve analysis](http://www.decisioncurveanalysis.org/)

---

## Summary

* Prediction modelling can be broadly categorised into __deterministic__ and __probabilistic__ methods
* A __2 x 2 contingency table / confusion matrix__ is a useful first step in evaluating model performance
* __AUC__ is a useful measure of model __discrimination__
* Comparing observed and predicted risks is useful for assessing model __calibration__
* Assessing __clinical usefulness__ requires other approaches and often requires insights from ___beyond the data___

---

## References and further reading

* Textbook: [Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating.
Springer, New York, NY; 2009.](https://link.springer.com/book/10.1007/978-0-387-77244-8)
* Practical article: [Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J. 2014 Aug 1;35(29):1925–1931.](https://academic.oup.com/eurheartj/article/35/29/1925/2293109)
* Reporting guide: [Moons KG, Altman DG, Reitsma JB, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): Explanation and Elaboration. Ann Intern Med. 2015;162:W1–W73.](https://annals.org/aim/fullarticle/2088542/)
* Comparison with machine learning: [Breiman L. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Stat Sci. Institute of Mathematical Statistics; 2001 Aug;16(3):199–231.](https://projecteuclid.org/euclid.ss/1009213726)
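
---

### Appendix: AUC as a concordance probability

A minimal sketch in base R of the interpretation given under "What does AUC tell us?": the AUC is the proportion of (event, non-event) pairs in which the event received the higher predicted risk. The vectors `observed` and `predicted` are hypothetical toy values, not the lecture dataset, and counting ties as half-concordant is an assumption of this sketch.

```r
# Hypothetical toy data (NOT the dataset used in the slides)
observed  <- c(1, 0, 1, 0, 0, 1)              # 1 = event, 0 = no event
predicted <- c(0.9, 0.2, 0.6, 0.7, 0.1, 0.4)  # predicted risks

# Split predicted risks by observed outcome
cases    <- predicted[observed == 1]
controls <- predicted[observed == 0]

# Difference in predicted risk for every (case, control) pair
d <- outer(cases, controls, "-")

# Concordant pairs count 1, tied pairs count 0.5, discordant pairs count 0
auc <- mean((d > 0) + 0.5 * (d == 0))
auc
```

For these toy values, 7 of the 9 case-control pairs are concordant, so the sketch returns 7/9 (about 0.78). On real data, a dedicated package such as yardstick is preferable to this pairwise loop, which scales with the number of pairs.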