Towards Bilingual Word Embedding Models for Engineering
Word embeddings represent the semantic meanings of words in high-dimensional vector space. Because of this capability, word embeddings can be used in a wide range of Natural Language Processing (NLP) tasks. While domain-specific monolingual word embeddings are common in the literature, domain-specific bilingual word embeddings are uncommon. In general, large text corpora are required for training high-quality word embeddings. Furthermore, training domain-specific word embeddings necessitates the use of source texts from the relevant domain. To train bilingual domain-specific word embeddings, the domain-specific texts must also be available in two different languages. In this paper, we use a large dataset of engineering-related articles in German and English to train bilingual engineering-specific word embedding models using different approaches. We evaluate our trained models, identify the most promising approach, and demonstrate that the best-performing one is very capable of representing semantic relationships between engineering-specific words and mapping languages in a shared vector space. Moreover, we show that the additional use of an engineering-specific learning dictionary can improve the quality of bilingual engineering-specific word embeddings.
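The idea of mapping two languages into a shared vector space with the help of a bilingual learning dictionary can be illustrated with a minimal sketch. The snippet below uses orthogonal Procrustes alignment, one common technique for this task; it is not necessarily the paper's exact method, and the toy vectors are purely illustrative (real models would be trained on large engineering corpora).

```python
import numpy as np

# Toy monolingual embeddings for a handful of word pairs from a
# hypothetical German–English seed dictionary. In practice these
# would come from separately trained monolingual embedding models.
rng = np.random.default_rng(0)
dim = 4
en = rng.normal(size=(5, dim))                      # English side of the dictionary
true_rotation = np.linalg.qr(rng.normal(size=(dim, dim)))[0]
de = en @ true_rotation                             # German side: a rotated copy

# Orthogonal Procrustes: find an orthogonal W minimizing ||de @ W - en||_F,
# i.e. learn a linear map from the German space into the English space
# using only the seed dictionary pairs.
u, _, vt = np.linalg.svd(de.T @ en)
W = u @ vt

mapped = de @ W                                     # German vectors in the shared space
print(np.allclose(mapped, en, atol=1e-6))           # → True (toy data is exactly rotated)
```

Because the toy German vectors are an exact rotation of the English ones, the learned map recovers them perfectly; with real embeddings the alignment is only approximate, which is why dictionary quality matters.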
| Attribute | Value |
|---|---|
| Address | Chiang Mai, Thailand |
| Authors | Tim Schopf, Peter Weinberger, Thomas Kinkeldei, Florian Matthes |
| Citation | @inproceedings{10.1145/3535782.3535835, author = {Schopf, Tim and Weinberger, Peter and Kinkeldei, Thomas and Matthes, Florian}, title = {Towards Bilingual Word Embedding Models for Engineering: Evaluating Semantic Linking Capabilities of Engineering-Specific Word Embeddings Across Languages}, year = {2022}, isbn = {9781450395816}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3535782.3535835}, doi = {10.1145/3535782.3535835}, booktitle = {2022 4th International Conference on Management Science and Industrial Engineering (MSIE)}, pages = {407–413}, numpages = {7}, location = {Chiang Mai, Thailand}, series = {MSIE 2022} } |
| Key | Sc22a |
| Research project | Technology Scouting as a Service (TSaaS) |
| Title | Towards Bilingual Word Embedding Models for Engineering |
| Type of publication | Conference |
| Year | 2022 |
| Publication URL | https://dl.acm.org/doi/10.1145/3535782.3535835 |
| Project | Technology Scouting as a Service (TSaaS) |
| Acronym | TSaaS |
| Team members | |