Guided Research Philip Werz
| Title (de) | Nutzung multimodalen maschinellen Lernens bei spontaner Sprache für die Früherkennung von Demenz |
| Title (en) | Leveraging Multimodal Machine Learning on Spontaneous Speech for Early Dementia Detection |
| Project | AssistD |
| Type | Guided Research |
| Status | started |
| Student | Philip Werz |
| Advisor | Alexandre Mercier |
| Supervisor | Prof. Dr. Florian Matthes |
| Start Date | 24.04.2025 |
| Sebis Contributor Agreement signed on | 08.04.2025 |
| Checklist filled | Yes |
| Submission Date | 24.10.2025 |
Abstract
Early detection of dementia is essential for timely intervention and effective care planning, yet existing diagnostic procedures rely predominantly on clinical assessments that are time-intensive, costly, and subjective. This study investigated the feasibility of automated dementia screening as part of a broader effort to advance machine learning–based classification of Mild Cognitive Impairment (MCI) and Dementia (D). The analysis was conducted on the Voice Assistant Subset (VAS) of the Dementia TalkBank corpus, which comprises 100 participants categorized into Healthy (H), MCI, and D groups. For the unimodal evaluation, convolutional neural network (CNN) and Vision Transformer (ViT) classifiers were optimized, while the multimodal framework employed Cross-Attention Fusion Models that integrate BERT or DistilBERT text encoders with CNN or ViT audio backbones. The results show that unimodal audio models achieved the highest performance, particularly in distinguishing Healthy from Dementia participants, whereas multimodal fusion yielded stable but limited additional gains. Overall, the findings strengthen existing evidence for the robustness of acoustic markers in cognitive status classification and establish a reproducible experimental baseline for future multimodal studies using more naturalistic speech data.
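The cross-attention fusion described in the abstract can be sketched as follows. This is a minimal illustration of the general technique, not the study's exact architecture: the layer dimensions, the mean pooling, and the choice of text tokens as attention queries are all assumptions; the text branch uses DistilBERT via Hugging Face transformers, and the audio branch is a small CNN over log-mel spectrograms.

```python
# Minimal sketch of cross-attention fusion of a DistilBERT text encoder with
# a CNN audio backbone, assuming log-mel spectrogram inputs. Dimensions,
# pooling, and the query/key assignment are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import DistilBertModel


class CrossAttentionFusion(nn.Module):
    """Fuse DistilBERT text tokens with CNN audio features via cross-attention."""

    def __init__(self, n_classes: int = 3, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        self.text_encoder = DistilBertModel.from_pretrained("distilbert-base-uncased")
        # Audio backbone: small CNN over (batch, 1, n_mels, time) spectrograms.
        self.audio_cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Project CNN channels up to the text encoder's hidden size.
        self.audio_proj = nn.Linear(64, d_model)
        # Text tokens act as queries over the audio feature sequence.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(d_model, n_classes)  # H / MCI / D

    def forward(self, input_ids, attention_mask, spectrogram):
        text = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                       # (B, T_text, 768)
        feat = self.audio_cnn(spectrogram)        # (B, 64, n_mels/4, T/4)
        feat = feat.flatten(2).transpose(1, 2)    # (B, T_audio, 64)
        audio = self.audio_proj(feat)             # (B, T_audio, 768)
        fused, _ = self.cross_attn(query=text, key=audio, value=audio)
        return self.classifier(fused.mean(dim=1))  # (B, n_classes)
```

Having text tokens query the audio sequence lets each transcribed token attend to the acoustic frames most relevant to it; the reverse direction (audio queries over text) is equally plausible and would only swap the `query` and `key`/`value` arguments.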