Candidate Profile Summarization: A RAG Approach with Synthetic Data Generation for Tech Jobs
As Large Language Models (LLMs) become increasingly applied to resume evaluation and candidate selection, this study investigates the effectiveness of using in-context example resumes to generate synthetic data. We compare a Retrieval-Augmented Generation (RAG) system to a Named Entity Recognition (NER)-based baseline for job-resume matching, generating diverse synthetic resumes with models like Mixtral-8x22B-Instruct-v0.1. Our results show that combining BERT, ROUGE, and Jaccard similarity metrics effectively assesses synthetic resume quality, ensuring the least lexical overlap along with high similarity and diversity. Our experiments show that RAG notably outperforms NER for retrieval tasks, though generation-based summarization remains challenged by role differentiation. Human evaluation further highlights issues of factual accuracy and completeness, emphasizing the importance of in-context examples, prompt engineering, and improvements in summary generation for robust, automated candidate selection.
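
As a rough illustration of the lexical-overlap side of the evaluation mentioned in the abstract, the sketch below computes a token-level Jaccard similarity and a unigram-overlap F1 in the spirit of ROUGE-1 between a synthetic resume and an in-context example. It is not the authors' implementation: the tokenizer, the function names (`tokenize`, `jaccard`, `unigram_f1`), and the sample strings are assumptions made for illustration, and the embedding-based score is omitted.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs, keeping '+' and '#'
    # so that terms such as "c++" and "c#" survive as single tokens.
    return re.findall(r"[a-z0-9+#]+", text.lower())


def jaccard(a, b):
    # Jaccard similarity over the sets of unique tokens: |A ∩ B| / |A ∪ B|.
    ta, tb = set(tokenize(a)), set(tokenize(b))
    if not ta and not tb:
        return 0.0  # convention chosen here for two empty texts
    return len(ta & tb) / len(ta | tb)


def unigram_f1(candidate, reference):
    # ROUGE-1-style F1: clipped unigram overlap turned into precision/recall.
    # This is a simplified stand-in, not the official ROUGE scorer.
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    example = "Backend engineer skilled in Python, SQL and Docker"
    synthetic = "Data engineer skilled in Python, Spark and SQL"
    print("Jaccard:", round(jaccard(example, synthetic), 3))
    print("Unigram F1:", round(unigram_f1(example, synthetic), 3))
```

Under this reading, a low score against the in-context example suggests the synthetic resume is not a near-copy, while a semantic similarity measure would be needed separately to check that it still describes a comparable profile.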
| Attribute | Value |
|---|---|
| Address | Varna, Bulgaria |
| Authors | Anum Afzal, Ishwor Subedi |
| Citation | Afzal, A., Subedi, I., & Matthes, F. (2025). Candidate profile summarization: A RAG approach with synthetic data generation for tech jobs. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing (RANLP 2025) (pp. 22–31). Varna, Bulgaria. |
| Key | Af25c |
| Title | Candidate Profile Summarization: A RAG Approach with Synthetic Data Generation for Tech Jobs |
| Type of publication | Conference |
| Year | 2025 |
| Team members | Anum Afzal, Ishwor Subedi |
| Publication URL | https://acl-bg.org/proceedings/2025/RANLP%202025/pdf/2025.ranlp-1.3.pdf |
| Acronym | RANLP |