Giter Site home page Giter Site logo

textrank-js's Introduction

textrank-js

TextRank is an algorithm for Text Summarization, by Rada Mihalcea & Paul Tarau. The code here is based on their paper "TextRank: Bringing Order into Texts". I've noticed that there are many implementations out there, but this one is intended to demonstrate the algorithm without any additional baggage. Also, unlike many other implementations I have seen, this has no algorithm dependencies and could also work in the browser. I wanted to show how elegant, simple and clean the algorithm is, so I kept it short -- about 130 lines of Javascript (ES5). It currently depends only on lodash ('_'), a standard JS library used in many (most?) projects, for a few choice zingey one-liners.

The algorithm itself can extend to any type of graph, as they note in their paper, but I have provided the two types of graphs explored in the paper: keyword extraction an undirected graph derived from collocation, and a sentence extraction graph using the similarity weighting (as described in the paper) on the edges. There is a function for building a graph of each type, and once the graph has been built, the textRank function performs the algorithm on the generated graph.

Note this code only implements the TextRank algorithm itself, the sentences must be properly formatted upfront. I have provided example tokenization for both tasks in the tests directory, both derived from tokenizing the Wikipedia entry for "Automatic summarization", both minimally processed using a custom (very minimal) tokenizer, and OpenNLP's default models for sentence splitting and POS, and converted to JSON. As long as you get the format right that this is expecting, you should be able to use whatever library you want to preprocess. The keyword extraction builder needs the format to include POS tags since it filters the content while it is building its adjacencies. The sentence extraction builder does not require POS, but requires pre-split sentences.

The "tests" are not currently testing anything, but serve as demonstration code for how to run the software. Note that textRank() has a default number of iterations -- it doesnt try and test for convergence. This is just to keep it simple, it would be simple to modify to test this instead, but for now you can pass in any number you want if that default isnt suitable (see test examples).

Build using Grunt:

$ npm install
$ grunt

Here's what the output looks like for the first 5 extracted sentences when performing the sentence extraction task:

automatic summarization is the process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document

two particular types of summarization often addressed in the literature are keyphrase extraction where the goal is to select individual words or phrases to tag a document and document summarization where the goal is to select whole sentences to create a short paragraph summary

in general abstraction can condense a text more strongly than extraction but the programs that can do this are harder to develop as they require the use of natural language generation technology which itself is a growing field

while some work has been done in abstractive summarization creating an abstract synopsis like that of a human the majority of summarization systems are extractive selecting a subset of sentences to place in a summary

apart from fully automated summarizers fas there are systems that aid users with the task of summarization mahs = machine aided human summarization for example by highlighting candidate passages to be included in the summary and there are systems that depend on post-processing by a human hams = human aided machine summarization

You are welcome to use this code for whatever nefarious purposes, but please attribute it to this implementation if you do.

textrank-js's People

Contributors

dpressel avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.