ovroabir / source-recommendation-system

Source-Recommendation-System takes an article from the user as input and outputs any relevant article from a dataset of 8.5 million articles.

HTML 3.97% CSS 0.03% R 1.46% Shell 1.89% Batchfile 0.85% Java 9.32% Python 68.83% Scala 13.32% Dockerfile 0.11% Makefile 0.19% XSLT 0.03%
python pyspark spark fakenewscorpus fake-news relevant news distributed-systems hadoop related-articles

Source-Recommendation-System

Source Recommendation System takes an article from the user as input and returns relevant articles from a dataset of 8.5 million articles. It uses Apache Spark to process a dataset of this size.

Prerequisites

This project uses the rake-nltk library to extract keywords.

pip install rake-nltk
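
To give a feel for what keyword extraction produces, here is a minimal standard-library sketch of the RAKE idea that rake-nltk is built on: split the text into candidate phrases at stopwords and punctuation, then score each phrase by the sum of its words' degree/frequency ratios. It is not the rake-nltk code and not necessarily how this project calls the library. The names STOPWORDS, candidate_phrases and rank_phrases, and the tiny stopword list, are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption); a real setup would use a
# full English stopword list.
STOPWORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "on",
    "with", "from", "that", "this", "it", "as", "by", "at", "be", "was",
}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Punctuation (anything that is not a word character, whitespace,
    # apostrophe or hyphen) ends a phrase.
    for fragment in re.split(r"[^\w\s'-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                    current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rank_phrases(text):
    """Return (score, phrase) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word occurs
    degree = defaultdict(int)  # total length of the phrases a word occurs in
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    def score(phrase):
        return sum(degree[w] / freq[w] for w in phrase)

    # Sort by descending score, then alphabetically for a stable order.
    unique = sorted(set(phrases), key=lambda p: (-score(p), p))
    return [(score(p), " ".join(p)) for p in unique]


if __name__ == "__main__":
    sample = "Apache Spark is used to handle a huge dataset of news articles."
    for phrase_score, phrase in rank_phrases(sample):
        print(phrase_score, phrase)
```

Multi-word phrases score higher than isolated words, which is why RAKE tends to surface phrases rather than single terms as keywords.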

FakeNewsCorpus (27 GB) was used as the dataset of news articles. Apache Spark is used to process this dataset, so it must be correctly installed and configured. The Spark configuration files can be found in the spark-2.4.4-bin-hadoop2.7 folder. Hadoop was used as the underlying distributed file system; its configuration can be found in the hadoop-conf folder. Both need to be changed to match your own setup.

Source Code

The source code can be found in the /src folder.

Algorithm & Implementation Details

This idea was implemented as a course project for the Distributed Systems course at Colorado State University. A detailed description of the algorithm can be found here -

Authors

Contributors

ovroabir
