Guided Research Rajna Fani
A Human Assessment of Reference-Free and Reference-Based Evaluation Approaches in the HR Domain
Abstract and Motivation:
In the era of Large Language Models (LLMs), assessing the quality of generated text remains an open challenge. This study explores how effectively reference-free metrics evaluate text produced by advanced language models and compares them with traditional, reference-based evaluation methods.
The research is motivated by a practical problem: employees seeking information from the Human Resources (HR) department often face long waiting times. SAP HR chatbots built on advanced text generation models could answer such queries faster and reduce the HR department's workload.
Specifically, the study examines how reliable reference-free evaluation metrics are relative to traditional reference-based metrics, and how well automatic metrics agree with human evaluation by domain experts. It evaluates two approaches, the Fine-tuned Language Model (LM) Approach and the LLM-Powered Approach, on a question-answering dataset of FAQs and user utterances drawn from chatbot logs.
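To make the distinction concrete: a reference-based metric scores a generated answer against a gold reference (for example, a curated FAQ answer), whereas a reference-free metric judges the answer from the question and the answer alone. The sketch below shows one simple reference-based metric, SQuAD-style token-level F1. It is an illustrative assumption, not necessarily one of the metrics used in this study, and the function names and example sentences are hypothetical.

```python
from collections import Counter
import string


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, and split on whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()


def token_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated answer and a reference answer."""
    cand = normalize(candidate)
    ref = normalize(reference)
    if not cand or not ref:
        # Two empty answers count as a match; otherwise there is no overlap.
        return float(cand == ref)
    # Multiset intersection: each shared token counts min(count_cand, count_ref) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical HR example: a reference answer vs. a correct paraphrase.
    reference = "You can request parental leave in the HR portal."
    candidate = "Parental leave is requested in the HR portal."
    print(round(token_f1(candidate, reference), 3))  # 0.706
```

The paraphrase loses credit for "request" versus "requested" and for the missing function words, even though a human would likely judge it correct. This sensitivity to surface wording is a commonly cited motivation for reference-free and model-based metrics.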
Research Questions
1. What are the emerging state-of-the-art metrics in the evaluation of generative conversational agents, and how do they compare to traditional metrics?
2. Are reference-free evaluation metrics, especially those leveraging advanced language models, a more reliable indicator of a generative model's performance than traditional reference-based metrics?
3. How well do automatic metrics agree with human evaluation by domain experts when assessing generative model performance?
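Research question 3 amounts to a meta-evaluation: how closely do automatic metric scores track expert judgments? A common way to quantify this, assumed here because the project description does not name a statistic, is Spearman's rank correlation between per-answer metric scores and expert ratings. The sketch below computes it with the standard library only; the function names and scores are hypothetical.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(metric_scores: list[float], human_scores: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


if __name__ == "__main__":
    # Hypothetical data: one automatic metric score and one expert rating (1-5) per answer.
    metric = [0.71, 0.40, 0.90, 0.55]
    expert = [4, 3, 5, 3]
    print(round(spearman(metric, expert), 3))  # 0.949
```

Rank correlation is used here because expert ratings are ordinal and metric scores live on arbitrary scales. In practice, `scipy.stats.spearmanr` (or `scipy.stats.kendalltau` for Kendall's tau) computes the same kind of statistic together with a p-value; the stdlib version just makes the computation explicit.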
| Attribute | Value |
|---|---|
| Title (de) | Eine menschliche Bewertung von Referenz-freien und Referenz-basierten Bewertungsansätzen im HR-Bereich |
| Title (en) | A Human Assessment of Reference-Free and Reference-Based Evaluation Approaches in the HR Domain |
| Project | Enterprise AI at SAP |
| Type | Guided Research |
| Status | completed |
| Student | Rajna Fani |
| Advisor | Anum Afzal |
| Supervisor | Prof. Dr. Florian Matthes |
| Start Date | 15.10.2023 |
| Sebis Contributor Agreement signed on | 13.10.2023 |
| Checklist filled | Yes |
| Submission date | 15.04.2024 |