Guided Research Rajna Fani

A Human Assessment of Reference-Free and Reference-Based Evaluation Approaches in the HR Domain

Abstract and Motivation:

In the era of Large Language Models (LLMs), assessing the quality of generated text presents an ongoing challenge. This study explores the effectiveness of reference-free metrics in evaluating text quality produced by advanced language models, comparing them with traditional evaluation methods.

The practical motivation is the long waiting time employees face when seeking information from the Human Resources department through SAP HR Chatbots. Conversational agents built on advanced text generation models could shorten response times and reduce the HR department's workload.

Specifically, the study examines how reliable reference-free evaluation metrics are compared with traditional reference-based metrics, and how well automatic metrics agree with human evaluation by domain experts. To measure generative model performance, it evaluates two approaches, the Fine-tuned Language Model (LM) Approach and the LLM-Powered Approach, on a question-answering dataset of FAQs and user utterances taken from chatbot logs.
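To illustrate what a traditional reference-based metric measures, the sketch below computes unigram-overlap F1 between a generated answer and a reference answer, in the spirit of SQuAD-style F1. This is an assumption for illustration only: the section does not name the reference-based metrics used in the study, and the function `token_f1` and the example strings are hypothetical.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated answer and a reference answer.

    Hypothetical illustrative metric; not necessarily one used in the study.
    Tokenization is a simple lowercase whitespace split.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        # Two empty strings count as a perfect match; otherwise no overlap.
        return float(cand_tokens == ref_tokens)
    # Count tokens shared by both sides, respecting how often each occurs.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical HR FAQ answer pair, for illustration only.
    reference = "you can request vacation days in the employee self-service portal"
    candidate = "request vacation days through the self-service portal"
    print(round(token_f1(candidate, reference), 3))  # 0.706
```

Because such a metric only rewards lexical overlap, a correct answer phrased with different words scores low. This limitation is one reason reference-free metrics, which judge an answer without a gold reference, are worth comparing against it.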

Research Questions

1. What are the emerging state-of-the-art metrics in the evaluation of generative conversational agents, and how do they compare to traditional metrics?

2. Are reference-free evaluation metrics, especially those leveraging advanced language models, a more reliable indicator of a generative model's performance compared to traditional reference-based metrics?

3. How well do automatic metrics assess generative model performance when compared against human evaluation by domain experts?
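Agreement between an automatic metric and expert judgments, as asked in the third research question, is commonly quantified with a rank correlation such as Spearman's ρ between per-answer metric scores and human ratings. The sketch below is a hypothetical illustration: the section does not state which agreement statistic the study uses, and the scores and ratings in the example are placeholders, not results. In practice, `scipy.stats.spearmanr` computes the same statistic.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i..j, as a 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Placeholder values for illustration only, not data from the study.
    metric_scores = [0.71, 0.20, 0.55, 0.90, 0.35]  # automatic metric per answer
    human_ratings = [4, 2, 3, 5, 2]  # hypothetical expert ratings (1-5 scale)
    print(round(spearman(metric_scores, human_ratings), 3))  # 0.975
```

A rank correlation is used here rather than a linear one because expert ratings are ordinal and automatic metrics live on different scales, so only the ordering of answers is compared.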


Attribute	Value
Title (de)	Eine menschliche Bewertung von Referenz-freien und Referenz-basierten Bewertungsansätzen im HR-Bereich
Title (en)	A Human Assessment of Reference-Free and Reference-Based Evaluation Approaches in the HR Domain
Project	Enterprise AI at SAP
Type	Guided Research
Status	completed
Student	Rajna Fani
Advisor	Anum Afzal
Supervisor	Prof. Dr. Florian Matthes
Start Date	15.10.2023
Sebis Contributor Agreement signed on	13.10.2023
Checklist filled	Yes
Submission date	15.04.2024


Chair of Software Engineering for Business Information Systems

Prof. Dr. Florian Matthes
