Giter Site home page Giter Site logo

datasystemslab / recdb-postgresql Goto Github PK

View Code? Open in Web Editor NEW
349.0 29.0 43.0 108.34 MB

RecDB is a recommendation engine built entirely inside PostgreSQL

Makefile 0.88% Shell 0.32% C 89.14% C++ 2.62% Perl 0.87% XSLT 0.05% CSS 0.01% Assembly 0.01% Python 0.02% SQLPL 0.38% PLpgSQL 3.19% DTrace 0.01% XS 0.03% HTML 0.07% Batchfile 0.02% Yacc 1.58% Lex 0.53% M4 0.18% Objective-C 0.12% Csound Score 0.01%
recommendation-algorithms database recommendation-engine recommender-system recommendation recommendation-system machine-learning

recdb-postgresql's Introduction

Welcome to RecDB

RecDB is an Open Source Recommendation Engine Built Entirely Inside PostgreSQL 9.2. RecDB allows application developers to build recommendation applications in a heartbeat through a wide variety of built-in recommendation algorithms like user-user collaborative filtering, item-item collaborative filtering, singular value decomposition. Applications powered by RecDB can produce online and flexible personalized recommendations to end-users.

RecDB Logo current version: v1.1

How to Get Source Code

You can check out the code, as follows:

$ git clone https://github.com/Sarwat/recdb-postgresql.git

Recommended Machine Specifications

RecDB is designed to be run on a Unix operating system. At least 1GB of RAM is recommended for most queries, though when working with very large data sets more RAM may be desirable, especially when you are not working with apriori (materialized) recommenders.

Building and Installation

Once you've synced with GitHub, the folder should contain the source code for PostgreSQL, as well as several Perl scripts in a directory named "./PostgreSQL/scripts/". If you are familiar with installing PostgreSQL from source, RecDB is installed the exact same way; for the sake of simplicity, however, we have included these scripts that will simplify the process. Note that the installation and remake-related scripts MUST be run from within the PostgreSQL folder in order for them to work correctly.

  1. Run the installation script install.pl.
perl scripts/install.pl [abs_path]

[abs_path] is the absolute path to the directory where you want PostgreSQL installed. The directory should exist before running this script. This will also create a folder "data" in the PostgreSQL folder; this is where the database will be located.

  1. Run the database server script pgbackend.pl.
perl scripts/pgbackend.pl

The install.pl script stores the install path in a separate file, so there shouldn't be any need to specify it.

  1. In a second terminal, run the database interaction script pgfrontend.pl.
perl scripts/pgfrontend.pl [db_name] [server_host]

[db_name] is the name of the database that you intend to use. [server_host] is the address of the host server running the PostgreSQL backend. If this option is not specified, the script assumes it to be "localhost".


If you need to rebuild PostgreSQL, there are two options.

If you have not modified the grammar, you can do a quick rebuild with remake.pl.

perl scripts/remake.pl

If you have modified the grammar, you will need to do a longer rebuild with remakefull.pl.

perl scripts/remakefull.pl [abs_path]

[abs_path] is the absolute path to the directory where you want PostgreSQL installed. The directory should exist before running this script.

If you ever want to eliminate the current database , use the clean.pl script.

perl scripts/clean.pl [db_name] [server_host]

How RecDB Works

Loading Data

We provide the MovieLens data to build a "Hello-World" movie recommendation application using RecDB. You can load the data using the sql script called "initmovielens1mdatabase.sql" stored in "./PostgreSQL" directory. We provide the dataset at "./PostgreSQL/moviedata / MovieLens1M/" directory. For instance, the ratings (i.e., ml_ratings) table may have a schema as follows:

+-----------------------------+
| userid | itemid | ratingval |
+-----------------------------+

Creating Recommenders

Users may create recommenders apriori so that when a recommendation query is issued may be answer with less latency. The user needs to specify the ratings table in the ON clause and also specify where the user, item, and rating value columns are in that table. Moreover, the user has to designate the recommendation algorithm to be used to predict item ratings in the USING clause.

CREATE RECOMMENDER MovieRec ON ml_ratings
USERS FROM userid
ITEMS FROM itemid
EVENTS FROM ratingval
USING ItemCosCF

Currently, the available recommendation algorithms that could be passed to the USING clause are the following:

  • ItemCosCF Item-Item Collaborative Filtering using Cosine Similarity measure.

  • ItemPearCF Item-Item Collaborative Filtering using Pearson Correlation Similarity measure.

  • UserCosCF User-User Collaborative Filtering using Cosine Similarity measure.

  • UserPearCF User-User Collaborative Filtering using Cosine Similarity measure.

  • SVD Simon Funk Singular Value Decomposition.

Similarly, materialized recommenders can be removed with the following command:

DROP RECOMMENDER MovieRec

Note that if you query a materialized recommender, the three columns listed above will be the only ones returned, and attempting to reference any additional columns will result in an error.

Recommendation Query

In the recommendation query, the user needs to specify the ratings table and also specify where the user, item, and rating value columns are in that table. Moreover, the user has to designate the recommendation algorithm to be used to predict item ratings. For example, if ml_ratings(userid,itemid,ratingval) represents the ratings table in a movie recommendation application, then to recommend top-10 movies based on the rating predicted using Item-Item Collaborative filtering (applying cosine similarity measure) algorithm to user 1, the user writes the following SQL:

SELECT * FROM ml_ratings R
RECOMMEND R.itemid TO R.userid ON R.ratingval USING ItemCosCF
WHERE R.userid = 1
ORDER BY R.ratingval
LIMIT 10

When you issue a query such as this, the only interesting data will come from the three columns specified in the RECOMMEND clause. Any other columns that exist in the specified ratings tables will be set to 0.

Note that if you do not specify which user(s) you want recommendations for, it will generate recommendations for all users, which can take an extremely long time to finish.

More Complex Queries

The main benefit of implementing the recommendation functionality inside a database engine (PostgreSQL) is to allow for integration with traditional database operations, e.g., selection, projection, join. For example, the following query recommends the top 10 Comedy movies to user 1. In order to do that, the query joins the recommendation with the Movies table and apply a filter on the movies genre column (genre LIKE '%Comedy%').

SELECT * FROM ml_ratings R, Movies M
RECOMMEND R.itemid TO R.userid ON R.ratingval USING ItemCosCF
WHERE R.userid = 1 AND M.movieid = R.itemid AND M.genre LIKE '%Comedy%'
ORDER BY R.ratingval
LIMIT 10

Publications

Authors

Support or Contact

Having trouble with RecDB? contact [email protected] and we’ll help you sort it out.

Follow @Rec_DB on Twitter for updates

recdb-postgresql's People

Contributors

anurag-akajeetu avatar rmoraffah avatar thesounddefense avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

recdb-postgresql's Issues

postgresql extension

building the server from scratch is not an option for me unfortunately, as I want to deploy to a PAAS. Is there any chance to get this API packaged into an extension for psql sometime?

best
Bernd

RecDB installation on macOS 10.13.6 failed.

Hi,

Just start to install RecDB on my Mac.
But got failed with logs:

Compiling PostgreSQL.
In file included from bootparse.y:27:
In file included from ../../../src/include/bootstrap/bootstrap.h:17:
../../../src/include/nodes/execnodes.h:1494:8: error: expected member name or ';' after declaration specifiers
int ID; /* the user or item ID /
~~~ ^
bootparse.c:107:12: note: expanded from macro 'ID'
#define ID 259
^
In file included from bootparse.y:27:
In file included from ../../../src/include/bootstrap/bootstrap.h:17:
../../../src/include/nodes/execnodes.h:1494:5: error: expected ';' at end of declaration list
int ID; /
the user or item ID */
^
;
2 errors generated.
make[3]: *** [bootparse.o] Error 1
make[2]: *** [bootstrap-recursive] Error 2
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2

Any help? Got postgresql installed already.

Linking other algorithms with RecDB

Is there a way that I can use my own implemented algorithm to produce recommendations using RecDB.
I want to link my algo with RecDB so that it can be used in the same way as the built in ones are used.
Please tell me the way to do this.

Column name not recognized by materialized recommender

After recommender materialization a (previously working) recommendation query does not work anymore.

How to reproduce the problem:
0) load the movie dataset

  1. verify that the query works
  2. materialize the recommender
  3. verify that the query fails
  4. drop the recommender
  5. verify that the query works

Here is the complete list of SQL statements (except point 0):

CREATE RECOMMENDER MovieRec ON ml_ratings
USERS FROM userid
ITEMS FROM itemid
EVENTS FROM ratingid
USING ItemCosCF;
SELECT * FROM ml_ratings R
RECOMMEND R.itemid TO R.userid ON R.ratingval
USING ItemCosCF
WHERE R.userid = 1
ORDER BY R.ratingval
LIMIT 10;
SELECT * FROM ml_ratings R
RECOMMEND R.itemid TO R.userid ON R.ratingval
USING ItemCosCF
WHERE R.userid = 1
ORDER BY R.ratingval
LIMIT 10;

ERROR: column r.ratingval does not exist at character 62
STATEMENT: SELECT * FROM ml_ratings R
RECOMMEND R.itemid TO R.userid ON R.ratingval
USING ItemCosCF
WHERE R.userid = 1
ORDER BY R.ratingval
LIMIT 10;

DROP RECOMMENDER MovieRec;
SELECT * FROM ml_ratings R
RECOMMEND R.itemid TO R.userid ON R.ratingval
USING ItemCosCF
WHERE R.userid = 1
ORDER BY R.ratingval
LIMIT 10;

Explain Analyze breaks with recommendation

EXPLAIN ANALYZE breaks with a recommendation query. To reproduce the problem, first load the MovieLens dataset and then perform the following queries:

CREATE RECOMMENDER MovieRec ON ml_ratings USERS FROM userid ITEMS FROM itemid EVENTS FROM ratingval USING itemcoscf;

Then execute the following:

EXPLAIN ANALYZE SELECT * FROM ml_ratings RECOMMEND itemid TO userid ON ratingval USING itemcoscf WHERE userid = 1 ORDER BY ratingval LIMIT 10;

The back-end ends unexpectedly and returns a segmentation fault.

Note that EXPLAIN alone works just fine.

Should the DB columns be indexed? I noticed none of the example code was.

So, I am trying to experiment on using RecDB to deal with a MASSIVE database with about 170 million entries and have had some rather slow query times, as you can imagine. I was wondering if indexing userID or itemID columns of my database would provide any noticeable speed advantages since none of the examples scripts appeared to create indexes on the DB. Also for a DB of a 170 million reviews what would you expect the runtime for creating the APRIORI recommender? Hours? Days? Weeks?

Any information would be extremely helpful.

attempting to run RecATHON

Hello,
I am new to programming, and I am attempting to execute RecATHON.
So far, I found startrecathonserver.jar on recdb-postgresql\PostgreSQL\experiments directory.
I have executed it by using java -jar, but I have encountered this error that recathonconfig.properties is missing.
So, I created it without writing a code, and terminal prints as 'Server Listening!' once I executed it again.

I am not sure I am on right track and/or I don't know of next step.
How do I start to run RecATHON?

ERROR: type mismatch in getTupleFloat() when using UserPearCF

Hi, everyone!
Thanks for such an interesting implementation as recdb.

I tried to play with different options and successfully applied item-item recommenders.
But, got this problem using UserPearCF and UserCosCF recommenders

I created a rating table

create table ratings
(
	record_id bigserial not null
		constraint ratings_users_pk
			primary key,
	user_id integer not null,
	story_id integer not null,
	rating integer not null
);

create unique index ratings_record_id_uindex
	on ratings (record_id);

create index ratings_story_id_index
	on ratings (story_id);

create index ratings_user_id_index
	on ratings (user_id);

Then, I created a recommender for this

CREATE RECOMMENDER UserRec ON ratings
USERS FROM user_id
ITEMS FROM story_id
EVENTS FROM rating
USING userpearcf;

Trying to execute a query similar to the example

SELECT R.story_id, R.rating
          FROM ratings R RECOMMEND R.story_id TO R.user_id ON R.rating USING UserPearCF
            WHERE R.user_id = 1000
            ORDER BY R.rating DESC

Got this error:

ERROR:  type mismatch in getTupleFloat()

Installation error

I have installed the needed packages using:
sudo apt-get install build-essential libreadline-dev zlib1g-dev flex bison libxml2-dev libxslt-dev libssl-dev

But when I run the install.pl script I get the following error:

***
ERROR: `flex' is missing on your system. It is needed to create the
file `bootscanner.c'. You can either get flex from a GNU mirror site
or download an official distribution of PostgreSQL, which contains
pre-packaged flex output.
***
make[3]: *** [bootscanner.c] Error 1
make[2]: *** [bootstrap-recursive] Error 2
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2
Compilation error.

How can I solve this?

RecDB query to recommender fails

Hi,

I've created a recommender based off a table I have in the RecDB. My understanding is that when I run a recommending query after creating a recommender, RecDB automatically sees a created recommender and uses it for a faster response.

However, after creating a recommender and running a recommend query, I get the following error server side

select * from relations recommend anime_id to user_id on score using ItemCosCF where user_id = 499231 order by score desc limit 20;
LOG:  server process (PID 17044) was terminated by signal 11: Segmentation fault
DETAIL:  Failed process was running: select * from relations recommend anime_id to user_id on score using ItemCosCF where user_id = 499231 order by score desc limit 20;
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
FATAL:  the database system is in recovery mode
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted; last known up at 2016-04-28 03:13:29 UTC
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  record with zero length at C/2257FB40
LOG:  redo is not required
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started



Could anyone kindly suggest what the problem might be? Thanks for your time.

RecDB is not working with Postgres JDBC Driver

I have noticed that while connecting with Postgres JDBC driver, the system can not recognize Recommend statement. Since connecting to the applications using java is an important part of the DBMS, I think this would be useful to add this feature to RecDB.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.