Towards Bilingual Word Embedding Models for Engineering
Word embeddings represent the semantic meanings of words in high-dimensional vector space. Because of this capability, word embeddings can be used in a wide range of Natural Language Processing (NLP) tasks. While domain-specific monolingual word embeddings are common in the literature, domain-specific bilingual word embeddings are uncommon. In general, large text corpora are required for training high-quality word embeddings. Furthermore, training domain-specific word embeddings necessitates the use of source texts from the relevant domain. To train bilingual domain-specific word embeddings, the domain-specific texts must also be available in two different languages. In this paper, we use a large dataset of engineering-related articles in German and English to train bilingual engineering-specific word embedding models using different approaches. We evaluate our trained models, identify the most promising approach, and demonstrate that the best-performing one is very capable of representing semantic relationships between engineering-specific words and mapping languages in a shared vector space. Moreover, we show that the additional use of an engineering-specific learning dictionary can improve the quality of bilingual engineering-specific word embeddings.
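The idea of mapping two languages into a shared vector space with the help of a bilingual learning dictionary can be illustrated with a minimal sketch. The snippet below uses orthogonal Procrustes alignment, one common technique for this task; it is not necessarily the paper's exact method, and the toy vectors are purely illustrative (real models would be trained on large engineering corpora).

```python
import numpy as np

# Toy monolingual embeddings for a handful of word pairs from a
# hypothetical German–English seed dictionary. In practice these
# would come from separately trained monolingual embedding models.
rng = np.random.default_rng(0)
dim = 4
en = rng.normal(size=(5, dim))                      # English side of the dictionary
true_rotation = np.linalg.qr(rng.normal(size=(dim, dim)))[0]
de = en @ true_rotation                             # German side: a rotated copy

# Orthogonal Procrustes: find an orthogonal W minimizing ||de @ W - en||_F,
# i.e. learn a linear map from the German space into the English space
# using only the seed dictionary pairs.
u, _, vt = np.linalg.svd(de.T @ en)
W = u @ vt

mapped = de @ W                                     # German vectors in the shared space
print(np.allclose(mapped, en, atol=1e-6))           # → True (toy data is exactly rotated)
```

Because the toy German vectors are an exact rotation of the English ones, the learned map recovers them perfectly; with real embeddings the alignment is only approximate, which is why dictionary quality matters.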
| Attribute | Value |
|---|---|
| Address | Chiang Mai, Thailand |
| Authors | Tim Schopf, Peter Weinberger, Thomas Kinkeldei, Florian Matthes |
| Citation | @inproceedings{10.1145/3535782.3535835, author = {Schopf, Tim and Weinberger, Peter and Kinkeldei, Thomas and Matthes, Florian}, title = {Towards Bilingual Word Embedding Models for Engineering: Evaluating Semantic Linking Capabilities of Engineering-Specific Word Embeddings Across Languages}, year = {2022}, isbn = {9781450395816}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3535782.3535835}, doi = {10.1145/3535782.3535835}, booktitle = {2022 4th International Conference on Management Science and Industrial Engineering (MSIE)}, pages = {407–413}, numpages = {7}, location = {Chiang Mai, Thailand}, series = {MSIE 2022} } |
| Key | Sc22a |
| Research project | Technology Scouting as a Service (TSaaS) |
| Title | Towards Bilingual Word Embedding Models for Engineering |
| Type of publication | Conference |
| Year | 2022 |
| Publication URL | https://dl.acm.org/doi/10.1145/3535782.3535835 |
| Project | Technology Scouting as a Service (TSaaS) |
| Acronym | TSaaS |
| Team members | |