Summarize long texts by combining graph-algorithmic approaches with distributional word vector models.
Work in progress.
Some code was taken from another of my repositories.
- Extract the important words of each sentence with the WordRank algorithm:
  - vectorize the words with word2vec,
  - build an adjacency matrix on top of the vectors (pairwise distances between the words of a sentence),
  - turn the adjacency matrix into a weighted graph,
  - find the largest clique of that graph,
  - the members of that clique are taken as the most "important" words in the graph-theoretic sense (a minimal sketch follows this list).
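
A minimal sketch of this step, assuming cosine similarity between gensim word2vec vectors as the adjacency-matrix entries and networkx for the clique search. The function name `keywords_via_clique`, the `threshold` parameter, and the choice of similarity measure are illustrative assumptions, not the repository's actual API:

```python
# Sketch only: graph construction details are assumptions, not this repo's code.
import itertools

import networkx as nx
import numpy as np
from gensim.models import Word2Vec  # a pre-trained model is assumed


def keywords_via_clique(tokens, model, threshold=0.5):
    """word2vec vectors -> adjacency matrix -> weighted graph -> max clique."""
    words = [w for w in dict.fromkeys(tokens) if w in model.wv]  # unique, in-vocab
    n = len(words)
    # Adjacency matrix: pairwise similarity between word vectors.
    adj = np.zeros((n, n))
    for i, j in itertools.combinations(range(n), 2):
        adj[i, j] = adj[j, i] = model.wv.similarity(words[i], words[j])
    # Weighted graph: keep only sufficiently similar word pairs as edges.
    graph = nx.Graph()
    graph.add_nodes_from(range(n))
    for i, j in itertools.combinations(range(n), 2):
        if adj[i, j] >= threshold:
            graph.add_edge(i, j, weight=float(adj[i, j]))
    # The largest clique: a set of words that are all mutually related.
    cliques = list(nx.find_cliques(graph))  # enumerates maximal cliques
    best = max(cliques, key=len) if cliques else []
    return [words[i] for i in best]
```

Note that maximum-clique search is NP-hard in general, but sentence-sized word graphs are small enough for `nx.find_cliques` to enumerate in practice.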
- Extract the important sentences of the text with the TextRank algorithm; the function build_similarity_matrix is the entry point:
  - it compares sentences pairwise and derives a similarity metric from simple token overlap,
  - the sentences with the highest aggregate similarity are then taken as the most "informative" ones (see the sketch below).
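
A minimal sketch of this step. `build_similarity_matrix` is the entry point named above; the Jaccard token overlap and the PageRank ranking over the similarity graph are my assumptions about the details (PageRank over a sentence graph is the standard TextRank formulation):

```python
# Sketch only: the similarity function and ranking details are assumptions.
import networkx as nx
import numpy as np


def sentence_similarity(s1, s2):
    """Similarity as simple token overlap (Jaccard) between two sentences."""
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    if not t1 or not t2:
        return 0.0
    return len(t1 & t2) / len(t1 | t2)


def build_similarity_matrix(sentences):
    """Pairwise sentence similarities; the input to the TextRank ranking."""
    n = len(sentences)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = sentence_similarity(sentences[i], sentences[j])
    return sim


def top_sentences(sentences, k=3):
    """Rank sentences by PageRank over the weighted similarity graph."""
    graph = nx.from_numpy_array(build_similarity_matrix(sentences))
    scores = nx.pagerank(graph)                    # TextRank scores
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]  # keep original order
```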
- If necessary, you can run the whole algorithm over the text several times (useful if you want to compress the text further); the function generate_summary_loop describes the full pipeline and is sketched below.
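
generate_summary_loop is the function named above; its body below, the `passes`/`ratio` parameters, and the naive sentence splitting are illustrative assumptions, reusing `top_sentences` from the previous sketch:

```python
# Sketch only: parameters and sentence splitting are assumptions.
def generate_summary_loop(text, passes=2, ratio=0.5):
    """Run the summarizer several times; each pass keeps `ratio` of sentences."""
    for _ in range(passes):
        # Naive split on '.'; a real pipeline would use a proper tokenizer.
        sentences = [s.strip() for s in text.split('.') if s.strip()]
        if len(sentences) <= 1:
            break  # nothing left to compress
        k = max(1, int(len(sentences) * ratio))
        text = '. '.join(top_sentences(sentences, k=k)) + '.'
    return text
```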