PouriaYazdani/clustering_persian_economic_papers

clustering_persian_economic_papers

In this project I used the following pipeline to cluster Persian economic papers.

  • Crawled 592 papers from Tarbiat Modares University.
  • Extracted the title, abstract, and keywords of each paper.
  • Cleaned and preprocessed the text with hazm: normalization, lemmatization, and removal of stopwords and redundant words.
  • Generated two types of word embeddings, using FaBERT and gensim's Word2Vec.
  • Ran several clustering algorithms from the sklearn.cluster package.
  • Evaluated the results with metrics that do not require ground-truth labels and by visualizing the sorted similarity matrix (read here for more).
  • Inspected the identified clusters by eye and assigned an appropriate name to each cluster.
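
The clustering and evaluation steps can be sketched roughly as follows. This is a minimal illustration, not the code used in this repository: the embeddings are synthetic (generated with make_blobs as a stand-in for FaBERT or Word2Vec vectors), KMeans is picked only as one example from sklearn.cluster, and the silhouette and Davies-Bouldin scores are assumed examples of metrics that need no ground-truth labels.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.metrics.pairwise import cosine_similarity

# Assumption: synthetic vectors stand in for the real document embeddings
# (one row per paper). Sizes and cluster count are illustrative only.
embeddings, _ = make_blobs(
    n_samples=120, n_features=16, centers=4, cluster_std=1.0, random_state=0
)

# Cluster the embeddings; KMeans is just one option from sklearn.cluster.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)

# Internal metrics: computed from the data and labels alone,
# so no ground-truth classes are needed.
sil = silhouette_score(embeddings, labels)      # higher is better, in [-1, 1]
dbi = davies_bouldin_score(embeddings, labels)  # lower is better, >= 0

# Sorted similarity matrix: reorder rows and columns so that members of the
# same cluster are adjacent. Well-separated clusters then show up as bright
# blocks along the diagonal when the matrix is plotted as a heatmap.
order = np.argsort(labels, kind="stable")
similarity = cosine_similarity(embeddings)
sorted_similarity = similarity[np.ix_(order, order)]

print(f"silhouette: {sil:.3f}, Davies-Bouldin: {dbi:.3f}")
print("sorted similarity matrix shape:", sorted_similarity.shape)
```

The reordered matrix can be passed to a heatmap function such as matplotlib's imshow to inspect the block structure visually.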

Here you can view the presentation slides.

Figure_1 Figure_2
