This repository contains R and Python (v2.7) scripts implementing the sCAKE and LAKE methods for single-document keyword extraction.
sCAKE: semantic Connectivity Aware Keyword Extraction
LAKE: Language-Agnostic Keyword Extraction
Author: Swagata Duari
Acknowledgement: Rakhi Saxena, Vasudha Bhatnagar
@article{DUARI2019100,
title = "sCAKE: Semantic Connectivity Aware Keyword Extraction",
journal = "Information Sciences",
volume = "477",
pages = "100 - 117",
year = "2019",
issn = "0020-0255",
doi = "https://doi.org/10.1016/j.ins.2018.10.034",
url = "http://www.sciencedirect.com/science/article/pii/S0020025518308521",
author = "Swagata Duari and Vasudha Bhatnagar",
keywords = "Automatic Keyword Extraction, Text Graph, Semantic Connectivity, Parameterless, Language Agnostic"
}
The algorithms rank all candidate keywords and present the output in descending order of SCScore. They do not define the number of candidates to be extracted as keywords, so the user has to decide how many keywords to extract. sCAKE is designed for languages that are supported by sophisticated NLP tools, such as English. This implementation of sCAKE is aimed at English only. However, interested users may apply the appropriate NLP tools, if available, for the language of their interest. Alternatively, the user may work with LAKE, which can be applied to documents in any language.
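Because the cut-off is left to the user, one simple option is to keep the k highest-scoring candidates. The following is a minimal Python sketch of that idea, assuming the ranked output has already been loaded as (word, score) pairs. The function name `top_k_keywords` and the sample words and scores are hypothetical and are not part of this repository.

```python
def top_k_keywords(scored_words, k):
    """Return the k highest-scoring words from (word, score) pairs."""
    # Sort defensively in descending order of score, even though the
    # scripts already present candidates in that order.
    ranked = sorted(scored_words, key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:k]]


# Hypothetical candidates and scores, for illustration only.
example = [("graph", 0.91), ("keyword", 0.87), ("text", 0.42), ("paper", 0.10)]
print(top_k_keywords(example, 2))
```

If k is larger than the number of candidates, the sketch simply returns all of them.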
- Run 'create-position-info-algoname.R' (replace 'algoname' with sCAKE or LAKE as required).
- For LAKE, run 'compute-sigma-index.R'. Skip this step for sCAKE.
- Run 'create-graph-algoname.R' (creates the graphs according to the chosen algorithm).
- For both algorithms, run 'Convert-adjmat-to-edgelist.R'.
- For both algorithms, run the Python script 'InfluenceEvaluation.py' using the following command:
  python InfluenceEvaluation.py '/path/to/input/directory/Edgelist/'
  (This script was written by a colleague of the author; it has been reformatted and reused here.)
- For both algorithms, run 'Word-score-with-PostionWeight.R'.
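
The order of the steps above can be sketched as a small Python helper that lists the commands for either algorithm. This is only an illustration: it assumes the R scripts are launched with `Rscript` from the repository directory and take no command-line arguments, which this README does not state, and the function name `pipeline_commands` is hypothetical.

```python
def pipeline_commands(algo, edgelist_dir):
    """List the commands for one algorithm, in the order of the steps above."""
    if algo not in ("sCAKE", "LAKE"):
        raise ValueError("algo must be 'sCAKE' or 'LAKE'")
    # Assumption: each R script is started with Rscript and no arguments.
    commands = [["Rscript", "create-position-info-%s.R" % algo]]
    if algo == "LAKE":
        # Only LAKE needs the sigma-index step.
        commands.append(["Rscript", "compute-sigma-index.R"])
    commands += [
        ["Rscript", "create-graph-%s.R" % algo],
        ["Rscript", "Convert-adjmat-to-edgelist.R"],
        ["python", "InfluenceEvaluation.py", edgelist_dir],
        ["Rscript", "Word-score-with-PostionWeight.R"],
    ]
    return commands


# Print the commands instead of running them.
for command in pipeline_commands("LAKE", "/path/to/input/directory/Edgelist/"):
    print(" ".join(command))
    # To execute a step, one could import subprocess and call
    # subprocess.check_call(command) here.
```

Printing rather than executing keeps the sketch safe to try; any paths that the R scripts expect to be set internally would still have to be adjusted by hand.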