Hasan, Souleiman and Curry, Edward (2017) Word Re-Embedding via Manifold Dimensionality Retention. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL), Stroudsburg, PA, USA, pp. 321-326. ISBN 978-1-945626-83-8
Abstract
Word embeddings seek to recover a Euclidean metric space by mapping words into vectors, starting from word co-occurrences in a corpus. Word embeddings may underestimate the similarity between nearby words and overestimate it between distant words in the Euclidean metric space. In this paper, we re-embed pre-trained word embeddings with a stage of manifold learning which retains dimensionality. We show that this approach is theoretically founded in the metric recovery paradigm, and empirically show that it can improve on state-of-the-art embeddings in word similarity tasks by 0.5 to 5.0 percentage points, depending on the original space.
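A minimal sketch of the general idea, assuming scikit-learn's Isomap as a stand-in manifold learner (the paper's actual re-embedding procedure is not reproduced here): pre-trained word vectors are passed through a manifold-learning step whose output dimensionality is kept equal to the input dimensionality.

```python
# Illustrative sketch only, not the authors' exact method: re-embed
# pre-trained word vectors with a manifold-learning step that retains
# the original dimensionality, using Isomap as a stand-in technique.
import numpy as np
from sklearn.manifold import Isomap

def re_embed(vectors: np.ndarray, n_neighbors: int = 15) -> np.ndarray:
    """Re-embed word vectors while retaining their dimensionality.

    `vectors` is an (n_words, d) matrix of pre-trained embeddings.
    The returned matrix has the same shape; only the geometry changes.
    """
    d = vectors.shape[1]
    manifold = Isomap(n_neighbors=n_neighbors, n_components=d)  # keep d dims
    return manifold.fit_transform(vectors)

if __name__ == "__main__":
    # Random vectors stand in for pre-trained embeddings (e.g. 1000 words, 50-d).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 50))
    Y = re_embed(X)
    print(X.shape, Y.shape)  # (1000, 50) (1000, 50)
```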
| Item Type | Book Section |
|---|---|
| Additional Information | This paper was presented at EMNLP 2017, the Conference on Empirical Methods in Natural Language Processing, September 9-11, 2017, Copenhagen, Denmark. |
| Keywords | Embeddings; Set theory; Topology; Vector spaces; Co-occurrence; Euclidean metrics; Manifold learning; Word similarity; Natural language processing systems |
| Academic Unit | Faculty of Science and Engineering > Research Institutes > Hamilton Institute; Faculty of Social Sciences > School of Business |
| Item ID | 11995 |
| Identification Number | 10.18653/v1/D17-1033 |
| Depositing User | Souleiman Hasan |
| Date Deposited | 05 Dec 2019 14:23 |
| Publisher | Association for Computational Linguistics (ACL) |
| Refereed | Yes |
| Related URLs | |
| URI | https://mu.eprints-hosting.org/id/eprint/11995 |
| Use Licence | This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA). |