This repo contains an implementation of "Neural Word Embedding as Implicit Matrix Factorization" by Omer Levy and Yoav Goldberg (NIPS 2014).
@inproceedings{NIPS2014_feab05aa,
author = {Levy, Omer and Goldberg, Yoav},
booktitle = {Advances in Neural Information Processing Systems},
editor = {Z. Ghahramani and M. Welling and C. Cortes and N. Lawrence and K.Q. Weinberger},
pages = {},
publisher = {Curran Associates, Inc.},
title = {Neural Word Embedding as Implicit Matrix Factorization},
url = {https://proceedings.neurips.cc/paper_files/paper/2014/file/feab05aa91085b7a8012516bc3533958-Paper.pdf},
volume = {27},
year = {2014}
}
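The paper shows that skip-gram with negative sampling implicitly factorizes a word-context PMI matrix shifted by log k, where k is the number of negative samples. It proposes factorizing the shifted positive PMI (SPPMI) matrix with truncated SVD as an alternative. Below is a minimal NumPy sketch of that idea on a toy count matrix. The function names `sppmi_matrix` and `svd_embeddings` and the toy counts are assumptions made for illustration; they are not taken from src/main.py, which may be organized differently.

```python
import numpy as np


def sppmi_matrix(counts: np.ndarray, k: float = 5.0) -> np.ndarray:
    """Shifted positive PMI: max(PMI(w, c) - log k, 0).

    counts[i, j] is the number of times word i co-occurs with context j.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    word_totals = counts.sum(axis=1, keepdims=True)     # shape (n_words, 1)
    context_totals = counts.sum(axis=0, keepdims=True)  # shape (1, n_contexts)
    expected = word_totals * context_totals

    sppmi = np.zeros_like(counts)
    seen = counts > 0  # unseen pairs have PMI = -inf, so they stay at 0
    sppmi[seen] = np.log(counts[seen] * total / expected[seen]) - np.log(k)
    return np.maximum(sppmi, 0.0)


def svd_embeddings(matrix: np.ndarray, dim: int) -> np.ndarray:
    """Word vectors W = U_d * sqrt(Sigma_d), the symmetric variant from the paper."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])


# Toy word-context counts (hypothetical data, 3 words x 3 contexts).
toy_counts = np.array([[4, 0, 1],
                       [0, 3, 1],
                       [1, 1, 0]])
toy_sppmi = sppmi_matrix(toy_counts, k=1.0)
toy_vectors = svd_embeddings(toy_sppmi, dim=2)
```

A dense SVD is only practical for small vocabularies. For a corpus of the size downloaded below, a sparse matrix and a truncated sparse solver would be the more realistic choice.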
Download the data:
mkdir data
cd data
wget https://downloads.wortschatz-leipzig.de/corpora/eng_news-typical_2016_1M.tar.gz
tar -xvzf eng_news-typical_2016_1M.tar.gz
cd ..
Install the dependencies:
pip install -r requirements.txt
Preparations:
python src/tools/dependency-parser.py
python src/tools/pair-data.py
Main script:
python src/main.py
Query the thesaurus:
python src/tools/theasarus.py --word money
python src/tools/theasarus.py --find_rnns 1
Words most similar to money:
[('resources', 0.12),
('capital', 0.11),
('paper', 0.11),
('property', 0.1),
('total', 0.1),
('value', 0.09),
('cards', 0.09),
('time', 0.09),
('knowledge', 0.09),
('products', 0.09)]
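A ranked list like the one above can be produced by scoring every vocabulary word against the query word and keeping the top matches. The sketch below uses cosine similarity, which is an assumption: the README does not state which similarity measure src/tools/theasarus.py uses, and the function name `most_similar`, the toy vocabulary, and the toy vectors are hypothetical.

```python
import numpy as np


def most_similar(word, vocab, embeddings, topn=10):
    """Return the topn (word, score) pairs closest to `word` by cosine similarity."""
    index = vocab.index(word)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against all-zero rows so the division never produces NaN.
    unit = embeddings / np.where(norms == 0, 1.0, norms)
    scores = unit @ unit[index]
    order = np.argsort(-scores)  # highest score first
    ranked = [(vocab[i], round(float(scores[i]), 2)) for i in order if i != index]
    return ranked[:topn]


# Toy example (hypothetical vocabulary and 2-dimensional vectors).
toy_vocab = ["money", "capital", "time"]
toy_embeddings = np.array([[1.0, 0.0],
                           [0.8, 0.6],
                           [0.0, 1.0]])
neighbours = most_similar("money", toy_vocab, toy_embeddings, topn=2)
```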