Guided Research Marcel Seitz
| Title (de) | Nutzung multimodalen maschinellen Lernens bei standardisierten Daten für die Früherkennung von Demenz |
| Title (en) | Leveraging Multimodal Machine Learning on Standardized Data for Early Dementia Detection |
| Project | AssistD |
| Type | Guided Research |
| Status | Started |
| Student | Marcel Seitz |
| Advisor | Alexandre Mercier |
| Supervisor | Prof. Dr. Florian Matthes |
| Start Date | 24.04.2025 |
| Sebis Contributor Agreement signed on | 08.04.2025 |
| Checklist filled | Yes |
| Submission date | 24.10.2025 |
Abstract
Early detection of Mild Cognitive Impairment (MCI) is essential for timely intervention before progression to dementia. This study presents an approach to automatic MCI detection based on the Delaware corpus within DementiaBank, focusing on the Cookie Theft Picture Description task. A reproducible multimodal framework was developed to align and process 1,386 paired audio–text segments, representing audio as log-Mel spectrograms or raw waveforms and text as either orthographic or phonetic (ARPAbet) transcripts. Unimodal audio models (ViT, CNN, HuBERT), unimodal text models (BERT, DistilBERT), and multimodal cross-attention fusion configurations were systematically compared. Hyperparameters were optimized via Bayesian optimization, and statistical validation was performed using Welch’s unequal-variance t-tests on participant-level splits. The Vision Transformer (ViT) achieved the highest F1 score (0.77) and balanced accuracy (0.65), confirming the strong discriminative power of acoustic features. Phonetic ARPAbet transcriptions improved BERT’s stability, particularly within multimodal fusion frameworks, while DistilBERT performed best on orthographic text. Among the multimodal models, CNN–BERT (non-ARPA) achieved the highest balanced accuracy (0.65) and precision (0.64), showing small but statistically significant differences (p < 0.001) relative to the best unimodal model. Acoustic explanations revealed interpretable spectro-temporal patterns consistent with clinical markers of cognitive decline, underscoring the potential of speech-based screening for early detection of cognitive impairment.
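The statistical validation mentioned above relies on Welch's unequal-variance t-test, which, unlike Student's t-test, does not assume equal variances across the two compared score samples. A minimal stdlib-only sketch of the test statistic and the Welch–Satterthwaite degrees of freedom is shown below; the score lists are illustrative placeholders, not values from the study.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's unequal-variance t statistic and approximate degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb            # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch–Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical per-split F1 scores for two model configurations
a = [0.66, 0.64, 0.67, 0.65, 0.63]
b = [0.61, 0.60, 0.64, 0.58, 0.62]
t, df = welch_t(a, b)
print(f"t = {t:.3f}, df = {df:.2f}")  # t ≈ 3.266, df ≈ 7.20
```

In practice the p-value would then be obtained from the t-distribution with `df` degrees of freedom, e.g. via `scipy.stats.ttest_ind(a, b, equal_var=False)`, which computes the same statistic directly.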