Code to download, preprocess, and analyze data for a project assessing the performance of supervised machine learning methods in predicting diagnosis of autism spectrum disorder from measures of regional brain cortical thickness.
Autism spectrum disorder (ASD) is a prevalent neurodevelopmental disorder characterized by early onset and lifelong impairments in social interaction, communication, and behavior. Although differences in the thickness of the cerebral cortex have been reported in ASD relative to typically developing (TD) controls, both overall and in specific regions, these results have been inconsistent. Therefore, the goal of the present study was to determine whether regional measures of cortical thickness could be used to reliably distinguish individuals with ASD from TD controls, and thus demonstrate potential utility as a biomarker of the disorder. The sample consisted of 1035 participants, 505 with ASD and 530 TD controls, from the Autism Brain Imaging Data Exchange dataset. For each participant, cortical thickness in 25 regions of interest (ROIs) was calculated from structural magnetic resonance imaging (MRI) data and adjusted for confounds. Four machine learning algorithms were used to generate initial classification models: naive Bayes, support vector machines (SVM), k-nearest neighbors, and random forest. Among the four algorithms, SVM performed best, although its performance was still poor (accuracy = 58.45%). Dimension reduction techniques, including recursive feature elimination, identified adjusted cortical thickness in a subset of 12 ROIs as the most important features; however, a classification model using SVM and these 12 features still performed poorly (ten-fold cross-validated accuracy = 53.91%, sensitivity = 0.5408, specificity = 0.6055). The results of this analysis suggest that regional cortical thickness cannot reliably distinguish a heterogeneous group of individuals with ASD from TD individuals.
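
The evaluation terms used above (ten-fold cross-validation, accuracy, sensitivity, specificity) can be illustrated with a short sketch. This is not the project's analysis code. The function names `stratified_folds` and `classification_metrics` are hypothetical, the language is an illustrative choice, and treating ASD as the positive class (label 1) is an assumption made here for the example.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds, keeping class proportions roughly equal."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        # Shuffle the indices of one class, then deal them out round-robin.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity for binary labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity: fraction of positive (assumed ASD) cases correctly identified.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of negative (assumed TD) cases correctly identified.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Labels matching the sample sizes in the abstract: 505 ASD (1), 530 TD (0).
labels = [1] * 505 + [0] * 530
folds = stratified_folds(labels, k=10)

# Toy predictions for demonstration only; these are not study results.
toy_metrics = classification_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

In a cross-validation run, each fold would serve once as the held-out test set while a model is trained on the other nine, and the metrics would then be computed on the held-out predictions.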