Guided Research Philip Werz
| Title (de) | Nutzung multimodalen maschinellen Lernens bei spontaner Sprache für die Früherkennung von Demenz |
| Title (en) | Leveraging Multimodal Machine Learning on Spontaneous Speech for Early Dementia Detection |
| Project | AssistD |
| Type | Guided Research |
| Status | started |
| Student | Philip Werz |
| Advisor | Alexandre Mercier |
| Supervisor | Prof. Dr. Florian Matthes |
| Start Date | 24.04.2025 |
| Sebis Contributor Agreement signed on | 08.04.2025 |
| Checklist filled | Yes |
| Submission Date | 24.10.2025 |
Abstract
Early detection of dementia is essential for timely intervention and effective care planning, yet existing diagnostic procedures rely predominantly on clinical assessments that are time-intensive, costly, and subjective. This study investigated the feasibility of automated dementia screening as part of a broader effort to advance machine learning–based classification of Mild Cognitive Impairment (MCI) and Dementia (D). The analysis was conducted on the Voice Assistant Subset (VAS) of the Dementia TalkBank corpus, which comprises 100 participants categorized into Healthy (H), MCI, and D groups. For the unimodal evaluation, convolutional neural network (CNN) and Vision Transformer (ViT) classifiers were optimized, while the multimodal framework employed Cross-Attention Fusion Models that integrate BERT or DistilBERT text encoders with CNN or ViT audio backbones. The results show that unimodal audio models achieved the highest performance, particularly in distinguishing Healthy from Dementia participants, whereas multimodal fusion yielded stable but limited additional gains. Overall, the findings strengthen existing evidence for the robustness of acoustic markers in cognitive status classification and establish a reproducible experimental baseline for future multimodal studies using more naturalistic speech data.
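The cross-attention fusion described in the abstract can be sketched as follows. This is a minimal illustration of the general technique, not the study's exact architecture: the layer dimensions, the mean pooling, and the choice of text tokens as attention queries are all assumptions; the text branch uses DistilBERT via Hugging Face transformers, and the audio branch is a small CNN over log-mel spectrograms.

```python
# Minimal sketch of cross-attention fusion of a DistilBERT text encoder with
# a CNN audio backbone, assuming log-mel spectrogram inputs. Dimensions,
# pooling, and the query/key assignment are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import DistilBertModel


class CrossAttentionFusion(nn.Module):
    """Fuse DistilBERT text tokens with CNN audio features via cross-attention."""

    def __init__(self, n_classes: int = 3, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        self.text_encoder = DistilBertModel.from_pretrained("distilbert-base-uncased")
        # Audio backbone: small CNN over (batch, 1, n_mels, time) spectrograms.
        self.audio_cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Project CNN channels up to the text encoder's hidden size.
        self.audio_proj = nn.Linear(64, d_model)
        # Text tokens act as queries over the audio feature sequence.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(d_model, n_classes)  # H / MCI / D

    def forward(self, input_ids, attention_mask, spectrogram):
        text = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                       # (B, T_text, 768)
        feat = self.audio_cnn(spectrogram)        # (B, 64, n_mels/4, T/4)
        feat = feat.flatten(2).transpose(1, 2)    # (B, T_audio, 64)
        audio = self.audio_proj(feat)             # (B, T_audio, 768)
        fused, _ = self.cross_attn(query=text, key=audio, value=audio)
        return self.classifier(fused.mean(dim=1))  # (B, n_classes)
```

Having text tokens query the audio sequence lets each transcribed token attend to the acoustic frames most relevant to it; the reverse direction (audio queries over text) is equally plausible and would only swap the `query` and `key`/`value` arguments.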