
zudiay / pagerank-for-identifying-central-people-in-news-articles

Implementation of PageRank-based method to identify the most important people occurring in news articles.

License: MIT License

Python 100.00%

PageRank for Identifying Central People in News Articles

The Reuters-21578 data set is used. Reuters-21578 contains 21578 news stories from the Reuters newswire, distributed across 22 SGML files. Each of the first 21 files contains 1000 articles, and the last file contains 578. There are 118 topics (classes) in total, and each article is classified into one or more topics.

The data.txt file contains an undirected, unweighted social network graph built from the co-occurrence of people in news articles. The graph was constructed from a subset of 3000 news articles from the Reuters-21578 corpus by identifying person names. Each vertex is a distinct person, and an edge connects two people if their names appear in the same news article. The resulting social network has 459 nodes and 1422 edges.
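Here is a minimal Python sketch of how a co-occurrence graph of this kind can be built. It assumes person names have already been extracted for each article, since the extraction step is not described here. The function names `build_cooccurrence_graph` and `count_edges` are hypothetical and do not come from the repository. Treating a person who is the only name in an article as an isolated vertex is a choice of this sketch.

```python
from itertools import combinations


def build_cooccurrence_graph(articles):
    """Build an undirected, unweighted co-occurrence graph (illustrative sketch).

    `articles` is an iterable of collections of person names, one collection
    per article (assumed input; name extraction is not shown).
    Returns a dict mapping each person to the set of their neighbours.
    """
    graph = {}
    for names in articles:
        people = sorted(set(names))  # deduplicate names within one article
        for person in people:
            graph.setdefault(person, set())  # every distinct person is a vertex
        # Connect every pair of people named in the same article.
        for a, b in combinations(people, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def count_edges(graph):
    # Each undirected edge is stored in two adjacency sets.
    return sum(len(nbrs) for nbrs in graph.values()) // 2


if __name__ == "__main__":
    articles = [
        ["Alice", "Bob", "Carol"],
        ["Bob", "Dave"],
        ["Alice", "Bob"],  # repeated pair: still one unweighted edge
        ["Eve"],           # lone name: a vertex with no edges
    ]
    g = build_cooccurrence_graph(articles)
    print(len(g), count_edges(g))  # prints: 5 4
```

Because edges are stored in sets, two people who appear together in several articles are still joined by a single edge, which matches the unweighted graph described above.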

PageRank scores are computed with the power iteration method, using a teleportation rate of 0.15.
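The following is a minimal sketch of PageRank computed by power iteration with teleportation rate α = 0.15. It uses a common formulation, P = (1 - α)W + (α/N)·11ᵀ, where W is the random-walk transition matrix over N nodes. Starting from a uniform vector, it repeats x_{t+1} = x_t P until the scores stop changing. Several details are assumptions of this sketch, and main.py may handle them differently:

- the function name `pagerank`
- the use of NumPy
- the convergence tolerance
- the uniform jump from nodes that have no links

```python
import numpy as np


def pagerank(adjacency, teleport=0.15, tol=1e-10, max_iter=1000):
    """PageRank via power iteration on an unweighted graph (illustrative sketch).

    adjacency: square 0/1 NumPy array where adjacency[i, j] = 1 means a link
    i -> j (symmetric for an undirected graph).
    teleport: teleportation rate (alpha).
    """
    n = adjacency.shape[0]
    out_degree = adjacency.sum(axis=1)
    # Row-stochastic random-walk matrix W; a node without links (assumed
    # handling) jumps to every node with equal probability.
    walk = np.where(out_degree[:, None] > 0,
                    adjacency / np.maximum(out_degree, 1)[:, None],
                    1.0 / n)
    # Blend the walk with uniform teleportation: P = (1 - alpha) W + alpha / n.
    P = (1 - teleport) * walk + teleport / n
    x = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        x_next = x @ P  # one power-iteration step: x_{t+1} = x_t P
        if np.abs(x_next - x).sum() < tol:  # L1 change below tolerance
            return x_next
        x = x_next
    return x


if __name__ == "__main__":
    # Toy star graph: person 0 co-occurs with persons 1, 2 and 3.
    A = np.array([[0, 1, 1, 1],
                  [1, 0, 0, 0],
                  [1, 0, 0, 0],
                  [1, 0, 0, 0]])
    scores = pagerank(A)
    ranking = np.argsort(-scores)  # indices ordered from most to least central
    print(ranking[0])  # prints: 0
```

On the toy star graph, the person at the centre receives the highest score. For the news graph, sorting people by score in descending order would give the most central people first.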

Running the program

  • Python version: 3.10.0

  • Place the input file in the same folder as the main.py file.

  • Open a terminal in that folder.

  • Run the following command, replacing <file_name> with the name of your input file: python3 main.py <file_name>

  • Example: python3 main.py data.txt
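Before running main.py on a new input, a quick parse check can catch formatting problems. The sketch below is a hypothetical helper and is not part of the repository. It assumes one edge per line, written as two whitespace-separated node identifiers. The actual layout of data.txt is not documented here, so adjust the parsing if it differs. If the format matches, the printed counts can be compared with the 459 nodes and 1422 edges reported for data.txt.

```python
import sys


def load_edge_list(path):
    """Read an undirected edge list (hypothetical format: one edge per line,
    two whitespace-separated node identifiers). Nodes that appear on no
    line, i.e. isolated nodes, are not counted by this sketch."""
    nodes, edges = set(), set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            u, v = parts
            nodes.update((u, v))
            if u != v:
                # frozenset makes "a b" and "b a" the same undirected edge
                edges.add(frozenset((u, v)))
    return nodes, edges


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python3 check_graph.py data.txt
    nodes, edges = load_edge_list(sys.argv[1])
    print(f"{len(nodes)} nodes, {len(edges)} edges")
```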

Developed for the CMPE493 Introduction to Information Retrieval course, Bogazici University, Fall 2021.
