Cross-Attention Transformer for Autism Diagnosis Using Multimodal MRI
Novel deep learning architecture for Autism Spectrum Disorder classification using multimodal neuroimaging
Overview
This research project represents the first systematic investigation of bidirectional cross-attention mechanisms for Autism Spectrum Disorder (ASD) classification using multimodal MRI data. By implementing novel Transformer-based architectures that fuse functional MRI (fMRI) and structural MRI (sMRI) data, this work advances the field of neuroimaging-based diagnostic systems while addressing critical methodological challenges in medical machine learning.
Technical Implementation
The project involved developing 11 distinct Transformer-based architectures exploring different approaches to multimodal neuroimaging fusion:
- Built 11 Transformer architectures investigating ROI-based, network-based, and feature-type tokenization strategies to identify effective data representations
- Processed 871 subjects from the ABIDE (Autism Brain Imaging Data Exchange) dataset using CC200 parcellation (19,900 fMRI features) and FreeSurfer extraction (800 sMRI features)
- Implemented rigorous evaluation using both standard k-fold cross-validation and leave-one-site-out cross-validation to assess true generalizability across 20 acquisition sites
- Developed bidirectional cross-attention mechanisms enabling the model to learn complementary patterns between structural and functional brain imaging modalities
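The bidirectional cross-attention idea above can be sketched in a few lines. This is a minimal single-head, scaled dot-product illustration in NumPy, not the project's actual implementation: a real Transformer layer would use learned query/key/value projections, multiple heads, and residual connections, and the token counts and embedding size here are purely illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d_k):
    """Single-head scaled dot-product attention: `queries` from one modality
    attend to `keys_values` from the other (no learned projections here)."""
    scores = queries @ keys_values.T / np.sqrt(d_k)   # (n_q, n_kv)
    weights = softmax(scores, axis=-1)                # rows sum to 1
    return weights @ keys_values                      # (n_q, d_k)

rng = np.random.default_rng(0)
d = 64                                        # shared embedding dim (illustrative)
fmri_tokens = rng.standard_normal((200, d))   # e.g. one token per CC200 ROI
smri_tokens = rng.standard_normal((68, d))    # e.g. one token per structural region

# Bidirectional fusion: each modality queries the other
fmri_enriched = cross_attention(fmri_tokens, smri_tokens, d)  # fMRI attends to sMRI
smri_enriched = cross_attention(smri_tokens, fmri_tokens, d)  # sMRI attends to fMRI

# Pool each enriched token set and concatenate for a downstream classifier head
fused = np.concatenate([fmri_enriched.mean(axis=0), smri_enriched.mean(axis=0)])
print(fused.shape)  # (128,)
```

The key property is the asymmetry: the fMRI branch and the sMRI branch each produce representations conditioned on the other modality, which is what lets the model learn complementary structural-functional patterns.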
Key Findings
The research yielded several important insights for the neuroimaging and machine learning communities:
- Cross-attention consistently outperformed single-modality baselines (69.9% vs 63.5% accuracy), demonstrating the complementary value of integrating both functional and structural brain imaging data
- Revealed significant site-bias effects: complex tokenization strategies showed limited benefits under rigorous leave-one-site-out validation despite strong k-fold performance, highlighting the importance of evaluation methodology
- Confirmed the ~70% performance ceiling identified in prior ASD neuroimaging research, suggesting fundamental limitations of current neuroimaging-based classification approaches rather than architectural deficiencies
- Demonstrated that methodological rigor matters: the gap between k-fold and leave-one-site-out performance underscores the need for appropriate cross-site validation in medical imaging applications
Dataset & Methodology
The ABIDE dataset is one of the largest publicly available collections of autism neuroimaging data, providing a robust foundation for developing generalizable models:
- 871 subjects across 20 different acquisition sites worldwide
- Both ASD-diagnosed individuals and neurotypical controls
- High-quality resting-state fMRI and structural MRI scans
- Standardized preprocessing pipelines ensuring data quality
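The 19,900 fMRI features mentioned above are consistent with taking the upper triangle of a 200 x 200 ROI correlation matrix (200 x 199 / 2 = 19,900). The following sketch shows that derivation on synthetic time series; the actual ABIDE preprocessing pipeline involves additional steps (denoising, filtering, site harmonization) not shown here.

```python
import numpy as np

# Hypothetical resting-state data: 150 time points x 200 CC200 ROIs
rng = np.random.default_rng(42)
timeseries = rng.standard_normal((150, 200))

# Functional connectivity: Pearson correlation between every pair of ROIs.
# np.corrcoef treats rows as variables, so transpose to (ROIs, timepoints).
conn = np.corrcoef(timeseries.T)          # (200, 200) symmetric matrix

# Keep only the upper triangle, excluding the diagonal, to drop duplicates
# and self-correlations: 200 * 199 / 2 = 19,900 features.
iu = np.triu_indices(200, k=1)
fmri_features = conn[iu]
print(fmri_features.shape)  # (19900,)
```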
The use of leave-one-site-out cross-validation was particularly critical, as it tests whether models trained on data from certain institutions can generalize to completely unseen sites — a realistic scenario for clinical deployment.
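The fold construction is simple to state precisely: every subject from one site is held out for testing while the model trains on the remaining sites, repeated once per site. A minimal sketch with synthetic site labels (scikit-learn's `LeaveOneGroupOut` provides equivalent splits):

```python
import numpy as np

# Hypothetical site labels: which of the 20 ABIDE sites each subject came from
rng = np.random.default_rng(1)
n_subjects, n_sites = 871, 20
site = rng.integers(0, n_sites, size=n_subjects)

# Leave-one-site-out: hold out every subject from one site, train on the rest
folds = []
for held_out in range(n_sites):
    test_idx = np.where(site == held_out)[0]
    train_idx = np.where(site != held_out)[0]
    folds.append((train_idx, test_idx))

# Sanity checks: each subject is tested exactly once, and no fold's
# training set overlaps its own test set
assert sum(len(te) for _, te in folds) == n_subjects
assert all(np.intersect1d(tr, te).size == 0 for tr, te in folds)
print(len(folds))  # 20
```

Unlike shuffled k-fold, this guarantees that scanner- and protocol-specific signatures of the held-out site never leak into training, which is why it is the stricter test of clinical generalizability.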
Research Impact
This work makes several contributions to both machine learning and neuroscience:
- Addresses evaluation methodology gaps in neuroimaging research by demonstrating the importance of rigorous cross-site validation
- Demonstrates both promise and limitations of Transformer architectures for clinical diagnostic applications
- Provides practical insights for researchers working on multimodal medical imaging problems
- Contributes to the growing body of work applying attention mechanisms to neuroimaging, extending beyond traditional CNN-based approaches
- Emphasizes the need for realistic evaluation in medical machine learning to avoid overestimating model performance
Future Directions
The research points to several promising avenues for future investigation:
- Exploring more sophisticated domain adaptation techniques to address site biases
- Investigating alternative data modalities beyond fMRI and sMRI
- Developing methods to identify ASD subtypes rather than binary classification
- Integrating non-imaging clinical data to improve diagnostic accuracy
- Examining attention weights to understand which brain regions contribute most to predictions
View on GitHub
Explore the complete implementation, including model architectures, training pipelines, and evaluation code.