Bachelor's Thesis Yun Zhou
Evaluating Hybrid Retrieval Strategies in Medical RAG
The quality of a Retrieval-Augmented Generation (RAG) pipeline depends heavily on its retrieval component, and this dependence matters most in safety-critical domains such as medical question answering. In the Aidvice pipeline, ineffective retrieval can degrade the quality of generated answers, which highlights the need for reliable, well-designed retrieval strategies.
Hybrid retrieval, which combines sparse and dense retrieval methods, has shown strong potential for improving retrieval performance. However, its effectiveness depends on several design choices, including the choice of sparse retrieval model and the way retrieval signals are combined. In addition, the performance of these approaches may vary with the characteristics of the input queries.
This thesis investigates how different design choices in hybrid retrieval affect performance in a medical RAG system. In particular, the study explores variations in sparse retrieval methods, different fusion strategies for combining sparse and dense signals, and the influence of query characteristics. The approaches are compared systematically in terms of retrieval effectiveness and downstream answer quality, with the aim of providing practical guidance on how hybrid retrieval systems should be designed for medical applications.
Research Questions:
RQ1: How do different sparse retrieval methods (e.g., standard BM25, domain-adapted BM25, and SPLADE) affect the performance of hybrid retrieval when combined with a fixed dense retrieval component?
RQ2: How do different fusion strategies for combining sparse and dense retrieval signals (e.g., weighted score combination, rank-based merging, and sequential retrieval pipelines) affect retrieval effectiveness and answer quality?
RQ3: How does the performance of hybrid retrieval designs vary across different types of medical queries?
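To make the fusion strategies named in RQ2 concrete, here is a minimal Python sketch on toy scores. It is not the thesis's implementation: weighted score combination is shown with min-max normalization and `alpha=0.5` (both assumptions), rank-based merging is shown as reciprocal rank fusion (RRF) with its commonly used constant `k=60`, and the sequential pipeline is shown as one possible variant (sparse first stage, dense re-ranking). All function names and example scores are hypothetical.

```python
from typing import Dict, List

Scores = Dict[str, float]  # hypothetical: doc id -> retriever score


def to_ranking(scores: Scores) -> List[str]:
    """Doc ids sorted by descending score (ties broken by id for determinism)."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale scores to [0, 1] so BM25-style and cosine-style scores share one scale."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fusion(sparse: Scores, dense: Scores, alpha: float = 0.5) -> List[str]:
    """Weighted score combination: alpha * sparse + (1 - alpha) * dense.

    A document missing from one result list gets 0 for that signal.
    """
    s, d = min_max_normalize(sparse), min_max_normalize(dense)
    fused = {doc: alpha * s.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0)
             for doc in set(s) | set(d)}
    return to_ranking(fused)


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Rank-based merging: each list adds 1 / (k + rank); raw scores are ignored."""
    fused: Scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return to_ranking(fused)


def sequential_rerank(sparse: Scores, dense: Scores, first_stage_k: int = 2) -> List[str]:
    """Sequential pipeline: keep the sparse top-k, then reorder them by dense score.

    Candidates without a dense score are placed last.
    """
    candidates = to_ranking(sparse)[:first_stage_k]
    return sorted(candidates, key=lambda doc: (-dense.get(doc, float("-inf")), doc))


if __name__ == "__main__":
    # Toy data: BM25-style sparse scores and cosine-style dense scores.
    sparse = {"d1": 12.0, "d2": 8.0, "d3": 2.0}
    dense = {"d2": 0.91, "d3": 0.85, "d4": 0.40}
    print(weighted_fusion(sparse, dense))  # ['d2', 'd1', 'd3', 'd4']
    print(reciprocal_rank_fusion([to_ranking(sparse), to_ranking(dense)]))  # ['d2', 'd3', 'd1', 'd4']
    print(sequential_rerank(sparse, dense))  # ['d2', 'd1']
```

On these toy inputs the two fusion methods disagree on d1 versus d3: score combination rewards d1's very high sparse score, while RRF only sees that d3 appears in both ranked lists. This sensitivity to score scale versus rank position is one reason the choice of fusion strategy can change which passages reach the generator.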
| Attribute | Value |
|---|---|
| Title (de) | Evaluating Hybrid Retrieval Strategies in Medical RAG |
| Title (en) | Evaluating Hybrid Retrieval Strategies in Medical RAG |
| Project | AI-Based Knowledge Assistant for Cancer Care (Aidvice) |
| Type | Bachelor's Thesis |
| Status | started |
| Student | Yun Zhou |
| Advisor | Ibrahim Ebrar Yurt |
| Supervisor | Prof. Dr. Florian Matthes |
| Start Date | 08.05.2026 |
| Sebis Contributor Agreement signed on | 08.05.2026 |
| Checklist filled | Yes |
| Submission date | 08.10.2026 |