Show simple item record

dc.contributor.author: Davis, James
dc.date.accessioned: 2023-07-17T14:10:07Z
dc.date.available: 2023-07-17T14:10:07Z
dc.date.issued: 2023 [en]
dc.date.submitted: 2023
dc.identifier.citation: Davis, James, Identification of predictors of gait speeds in The Irish Longitudinal Study on Ageing using automated feature selection and explainable machine learning, Trinity College Dublin, School of Medicine, Medical Gerontology, 2023 [en]
dc.identifier.other: Y [en]
dc.description: APPROVED [en]
dc.description.abstract: This work reports the novel application of automated feature selection and explainable machine learning to identify and compare, in participants aged 50 years or over from wave 3 of The Irish Longitudinal Study on Ageing (TILDA), predictors of three gait speed modalities: usual gait speed (UGS), maximum gait speed (MGS), and gait speed reserve (GSR = MGS - UGS). The principal aim of the investigation was to identify which factors were associated with each gait modality, with a comparative focus on GSR. Stepwise feature selection was applied to shortlists of input features covering multiple domains, including demographics, anthropometrics, medical history, cognition, cardiovascular system, physical strength, and sensory and psychological domains. In a first experiment using data from 2397 participants, a stepwise linear regression-based feature selection algorithm was applied to a shortlist of 34 input features in the prediction of GSR. A mean (SD) 5-fold cross-validation R_adj^2 score of 0.16 (0.03) was achieved with 14 variables (with 80% training and 20% test R_adj^2 scores of 0.18 and 0.16, respectively). Of the 14 selected features, 11 had statistically significant (p<0.05) effects in the model: sex, Montreal Cognitive Assessment (MOCA) score, third-level education, chair stands time, age, body mass index (BMI), grip strength, cardiac output, number of medications, fear of falling (FOF), and mean cognitive reaction time (CRT). In a second experiment, explainable machine learning was applied to an expanded set of 88 input features. Using data from 3925 participants, features were selected by a histogram gradient boosting regression-based stepwise feature selection algorithm. Feature importance and input-output relationships were explored using TreeExplainer from the SHapley Additive exPlanations (SHAP) explainable machine learning package.
The mean (SD) R_adj^2 score from 5-fold cross-validation on training data and the R_adj^2 score on test data were: 0.38 (0.04) and 0.41 for UGS; 0.45 (0.04) and 0.46 for MGS; and 0.19 (0.02) and 0.21 for GSR, respectively. Selected features, in order of decreasing SHAP values, were education, grip strength, mean CRT motor reaction time, MOCA errors, age, chair stands time, height, sex, accuracy proportion in the sound-induced flash illusion, FOF, orthostatic intolerance, Mini-Mental State Examination (MMSE) errors, and number of cardiovascular conditions. Both models selected features across multiple input domains, underscoring the nature of GSR as a measure of individual reserve across multiple physiological systems. In the prediction of GSR, both algorithms identified the importance of prospectively non-modifiable factors such as advancing age, female sex, lower educational attainment, and existing morbidities, but also highlighted potentially modifiable factors such as reduced upper and lower body strength (lower grip strength and longer chair stands time, respectively), lower cognitive (MOCA) and psychomotor performance (CRT), and lower self-efficacy in the psychological domain (fear of falling). R_adj^2 scores were marginally higher with machine learning; however, the main advantage of this algorithm over the linear regression-based pipeline is that it allowed for the identification of clinically meaningful non-linearities in the visualised relationship between selected features and GSR. Potential clinical cut-offs and regions of interest for certain features were identifiable, making the models highly interpretable for clinicians. Although the linear modelling was faster and simpler to use, results suggest that the tree-based explainable machine learning methodology is preferable because of its non-parametric nature and because a model explainer such as SHAP allows for visualisation of input-output relationships.
In older adults, the demonstration of GSR is necessary on a daily basis to maintain independent living (e.g., for being able to complete a road crossing or catch a means of public transport). Overall, findings support a network physiology approach to the study of physiological reserve and could help policy makers and clinicians design strategies to promote resilience and functional independence in community-dwelling older adults. [en]
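The two quantities the abstract relies on, GSR = MGS - UGS and the adjusted coefficient of determination R_adj^2, can be illustrated with a minimal sketch. The abstract does not state which R_adj^2 formula was used; the code below assumes the standard definition, 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of participants and p the number of selected features. The function names and the example values are hypothetical and are not taken from the thesis.

```python
def gait_speed_reserve(mgs: float, ugs: float) -> float:
    """Gait speed reserve as defined in the abstract: GSR = MGS - UGS."""
    return mgs - ugs


def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R^2 under the standard definition (an assumption here).

    Penalises the plain R^2 for the number of predictors, so that adding
    uninformative features does not inflate the score.
    """
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


if __name__ == "__main__":
    # Hypothetical speeds, chosen only to show the arithmetic.
    print(gait_speed_reserve(mgs=2.0, ugs=1.25))  # 0.75

    # Hypothetical fit: R^2 = 0.5 with 10 features on 101 samples.
    print(round(adjusted_r2(0.5, n_samples=101, n_features=10), 4))  # 0.4444
```

With few features relative to the sample size, as in the experiments described above (14 features for 2397 participants), the adjustment under this definition is small, so R_adj^2 stays close to the unadjusted R^2.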
dc.language.iso: en [en]
dc.publisher: Trinity College Dublin. School of Medicine. Discipline of Medical Gerontology [en]
dc.rights: Y [en]
dc.subject: tilda [en]
dc.subject: older adults [en]
dc.subject: ageing [en]
dc.subject: walking [en]
dc.subject: gait speed [en]
dc.subject: physiological [en]
dc.subject: reserve [en]
dc.subject: explainable machine learning [en]
dc.subject: gait [en]
dc.title: Identification of predictors of gait speeds in The Irish Longitudinal Study on Ageing using automated feature selection and explainable machine learning [en]
dc.type: Thesis [en]
dc.type.supercollection: thesis_dissertations [en]
dc.type.supercollection: refereed_publications [en]
dc.type.qualificationlevel: Masters [en]
dc.identifier.peoplefinderurl: https://tcdlocalportal.tcd.ie/pls/EnterApex/f?p=800:71:0::::P71_USERNAME:DAVISJ5 [en]
dc.identifier.rssinternalid: 257041 [en]
dc.rights.ecaccessrights: openAccess
dc.contributor.sponsor: Science Foundation Ireland [en]
dc.identifier.uri: http://hdl.handle.net/2262/103114

