DistributedCB

A Parallel and Distributed Content-Based Recommender System

This is the Python implementation of the parallel and distributed content-based recommender system that was accepted for publication in the Journal of Intelligent Information Systems (JIIS) in 2013. The code provided here may serve as a reference for others interested in parallel and distributed recommender systems. Please cite the corresponding paper if you make use of this work.

@article{dooms2013distributedcb,
  year={2013},
  issn={0925-9902},
  journal={Journal of Intelligent Information Systems},
  doi={10.1007/s10844-013-0276-1},
  title={In-memory, distributed content-based recommender system},
  url={http://dx.doi.org/10.1007/s10844-013-0276-1},
  publisher={Springer US},
  keywords={Recommender system; Distributed; Parallel; Speedup},
  author={Dooms, Simon and Audenaert, Pieter and Fostier, Jan and De Pessemier, Toon and Martens, Luc},
  pages={1-25},
  language={English}
}

First, use work_division.py to pre-process the input data so that jobs can be distributed in a load-balanced way (see the paper for more details). Next, use distributed_CB_recommender.py to calculate the actual recommendations. The number of computing nodes (and cores per node) is set through the input parameters.
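To illustrate what a load-balanced division of jobs can look like, here is a minimal sketch. It is not the algorithm from work_division.py or the paper; it assumes a simple greedy heuristic (most expensive job first, always to the least-loaded worker), and the names divide_work, job_costs and n_workers are hypothetical.

```python
import heapq


def divide_work(job_costs, n_workers):
    """Split jobs over workers so that the total cost per worker is roughly equal.

    job_costs: dict mapping a job id (e.g. a user id) to an estimated cost.
    Returns a list with one list of job ids per worker.
    NOTE: illustrative greedy heuristic, not the method used in the paper.
    """
    # Min-heap of (current load, worker index): the least-loaded worker is on top.
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]

    # Handle the most expensive jobs first; ties are broken by job id.
    ordered = sorted(job_costs.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for job, cost in ordered:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(job)
        heapq.heappush(heap, (load + cost, worker))
    return assignment
```

Each inner list could then be handed to one computing node (or one core) as its share of the work.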

This work was carried out using the Stevin Supercomputer Infrastructure at Ghent University, funded by Ghent University, the Hercules Foundation and the Flemish Government – department EWI. The scripts used to submit jobs on this HPC infrastructure are not provided here, as they are too system-specific. Contact Simon Dooms if these can be of any use for you.

The DistributedCB recommender system does not rely on Mahout or other MapReduce-related frameworks; instead, recommendation logic is parallelized using the Python 'multiprocessing' module. Recommendation work is distributed across different computing nodes, and every node processes its work in-memory, meaning that data is kept in RAM from start to end for efficiency reasons. Our approach does not impose any file system requirements and can be run on any machine capable of running Python code.

This code is intended mainly for academic use, and will therefore not be maintained.
