
summarize text using graphs and language vector models


mitya8128/graph_summarizer


Graph_summarizer

 

Summarize long texts by combining graph algorithms with distributional word-vector language models.

Work in progress.

Some code was taken from another of my repositories.
 

Main idea:

  • Extract the important words of each sentence with a wordrank algorithm:
    • vectorize the words with word2vec
    • build an adjacency matrix from the pairwise distances between the words of the sentence
    • convert the adjacency matrix into a weighted graph
    • find the largest clique of the graph
    • the words in that clique are taken as the most "important" words in the graph-theoretic sense
  • Extract the important sentences of the text with the textrank algorithm; the function build_similarity_matrix is the entry point:
    • it compares sentences pairwise and derives a similarity metric from simple equality of their tokens
    • the sentences with the highest similarity scores are treated as the most "informative"
  • If necessary, you can run the whole algorithm over the text several times (useful if you want to compress the text further); the function generate_summary_loop describes the full pipeline
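
The two extraction steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: the function names (`max_clique_words`, `token_overlap`, `rank_sentences`, `summarize`) are hypothetical, word2vec is replaced by a caller-supplied dictionary of vectors, the graph is built by thresholding cosine similarity, the clique search is brute force (only practical for a single sentence), and sentence similarity is assumed to be Jaccard overlap of tokens.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def max_clique_words(words, vectors, threshold=0.5):
    """Largest clique in the graph linking words whose similarity >= threshold.

    `vectors` is a hypothetical stand-in for word2vec lookups.
    Brute force: exponential in the number of words.
    """
    words = sorted(set(words))
    linked = {
        frozenset(pair)
        for pair in combinations(words, 2)
        if cosine(vectors[pair[0]], vectors[pair[1]]) >= threshold
    }
    # Try the biggest groups first, so the first hit is a maximum clique.
    for size in range(len(words), 1, -1):
        for group in combinations(words, size):
            if all(frozenset(p) in linked for p in combinations(group, 2)):
                return list(group)
    return words[:1]


def token_overlap(a, b):
    """Jaccard overlap of lowercase whitespace tokens (an assumed metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """PageRank-style scores over the sentence similarity matrix."""
    n = len(sentences)
    if n == 0:
        return []
    sim = [
        [0.0 if i == j else token_overlap(sentences[i], sentences[j])
         for j in range(n)]
        for i in range(n)
    ]
    out = [sum(row) for row in sim]  # total outgoing weight per sentence
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                sim[j][i] / out[j] * scores[j] for j in range(n) if out[j]
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, top_n=2):
    """Keep the top_n highest-scoring sentences, in their original order."""
    scores = rank_sentences(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(best[:top_n])]
```

Calling `summarize` again on its own output would approximate the repeated-pass compression described in the last bullet. For real texts, a graph library and trained word2vec vectors would likely replace the brute-force search and the toy vector dictionary.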

Demos:

demo_russian
 
