Master's Thesis Leonhard Stengel
This thesis investigates the challenge of improving Automatic Speech Recognition (ASR) performance for industry-specific jargon in low-resource scenarios where in-domain audio data is unavailable. While state-of-the-art ASR systems achieve high accuracy on general speech, they often fail when encountering specialized terminology, creating a significant barrier to professional adoption.
Through a systematic literature review, this work addresses three fundamental research questions:
- How can industry-specific jargon be defined and represented in speech datasets to enable benchmarking of ASR systems?
- Which evaluation metrics best capture the performance of ASR systems on jargon-heavy speech?
- Which methods can improve ASR recognition of industry-specific jargon when no in-domain audio data is available?
Three diverse datasets were collected representing aviation (FAA-Glossary), medical (UNITED-SYN-MED), and financial (Earnings-21) domains. Traditional Word Error Rate proved insufficient for evaluating jargon recognition, necessitating keyword-focused metrics including Precision, Recall, and F1-score. Baseline evaluation of state-of-the-art models (Whisper-large-v3, Canary-Qwen-2.5b, Gemini-2.5-Flash) revealed performance gaps in detecting jargon terms, with F1-scores as low as 43.5% for highly specialized medical terminology.
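To make the keyword-focused metrics concrete, the following is a minimal sketch of how Precision, Recall, and F1-score over a jargon list could be computed from a reference and a hypothesis transcript. The matching rule (case-insensitive whole-phrase matching, counted per occurrence) and the names `count_keywords` and `keyword_prf` are assumptions made for illustration; the exact procedure used in the thesis is not stated in this abstract.

```python
import re
from collections import Counter


def count_keywords(text, keywords):
    """Count case-insensitive whole-phrase occurrences of each keyword."""
    lowered = text.lower()
    counts = Counter()
    for kw in keywords:
        # Lookarounds prevent matches inside longer words (e.g. "ILS" in "bills").
        pattern = r"(?<!\w)" + re.escape(kw.lower()) + r"(?!\w)"
        counts[kw] = len(re.findall(pattern, lowered))
    return counts


def keyword_prf(reference, hypothesis, keywords):
    """Return (precision, recall, f1) for jargon terms only.

    Assumed counting rule: per keyword, occurrences found in both
    transcripts are true positives, surplus occurrences in the hypothesis
    are false positives, and missing occurrences are false negatives.
    """
    ref = count_keywords(reference, keywords)
    hyp = count_keywords(hypothesis, keywords)
    tp = sum(min(ref[k], hyp[k]) for k in keywords)
    fp = sum(max(hyp[k] - ref[k], 0) for k in keywords)
    fn = sum(max(ref[k] - hyp[k], 0) for k in keywords)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Invented example sentence: the ASR output misrecognizes "ILS" as "else".
reference = "cleared for ILS approach runway two seven contact tower"
hypothesis = "cleared for else approach runway two seven contact tower"
print(keyword_prf(reference, hypothesis, ["ILS", "runway", "tower"]))
```

In this invented example only one of ten words is wrong, so Word Error Rate stays low, yet one of three jargon terms is lost. That gap is what a keyword-focused metric is meant to expose.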
Three improvement methods were systematically evaluated: keyword-guided adaptation using keyword spotting, LLM-based post-correction, and LLM zero-shot in-context learning. Zero-shot in-context learning emerged as the most promising approach, consistently improving the recognition of industry-specific jargon terms across all datasets. The method achieved absolute F1-score improvements of up to 20.9 percentage points, at the cost of higher operational expenses.
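As a rough illustration of the zero-shot in-context idea, the sketch below builds a transcription prompt that lists domain terms without any example audio-transcript pairs. The function name `build_jargon_prompt`, the prompt wording, and the `max_terms` cap are hypothetical and not taken from the thesis; how the prompt is sent to an audio-capable LLM is deliberately left out.

```python
def build_jargon_prompt(terms, max_terms=50):
    """Build a zero-shot prompt that supplies a jargon list as context.

    Hypothetical helper: the wording and the cap on terms are assumptions.
    """
    # Deduplicate case-insensitively while keeping first-seen order.
    seen = set()
    unique = []
    for term in terms:
        cleaned = term.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)

    # Capping the list bounds prompt length, and with it the per-request cost.
    selected = unique[:max_terms]

    lines = [
        "Transcribe the attached audio verbatim.",
        "The recording may contain the following domain-specific terms;",
        "use these exact spellings only where they are actually spoken:",
    ]
    lines += [f"- {term}" for term in selected]
    return "\n".join(lines)


print(build_jargon_prompt(["METAR", "metar ", "", "go-around"], max_terms=5))
```

Because every listed term adds tokens to each request, a longer jargon list raises the cost per transcription. This is one plausible source of the operational cost trade-off mentioned above.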
This research provides practical insights for organizations seeking to deploy ASR technology in specialized domains, demonstrating that while significant challenges remain, viable solutions exist for improving jargon recognition without requiring domain-specific audio datasets.
| Attribute | Value |
|---|---|
| Title (de) | Master's Thesis Leonhard Stengel |
| Title (en) | Evaluating and Improving Automatic Speech Recognition for Industry-Specific Jargon |
| Project | |
| Type | Master's Thesis |
| Status | started |
| Student | |
| Advisor | Alexandre Mercier |
| Supervisor | Prof. Dr. Florian Matthes |
| Start Date | 13.05.2025 |
| Sebis Contributor Agreement signed on | |
| Checklist filled | Yes |
| Submission date | 13.11.2025 |