This repository offers a simple implementation of a modified version of Twitter using different types of databases. The resulting queries, and how well each database performed at answering different questions, were presented at the Cabane.io 2021 conference.
The results and presentation slides associated with this talk are available here.
You can also have fun and create your own queries if you'd like.
Our data model, very simplified compared to the real Twitter, looks like this:
There is one important detail here regarding the User. In the real Twitter, there are not many types of users. However, I wanted to experiment with inheritance, since it is a common problem developers face when working in the object-oriented paradigm, and most of the time that hierarchy has to be represented in the database as well. For this reason, the User comes in two concrete types: Business and Individual.
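To make the hierarchy concrete, here is a minimal Java sketch of what such a model could look like. Only the type names User, Business, and Individual come from the description above; the fields, the `kind()` discriminator, and the `describe` helper are assumptions made for illustration and may not match the actual source code.

```java
import java.util.List;

public class UserHierarchySketch {

    // Base type shared by every kind of user; the fields are illustrative assumptions.
    abstract static class User {
        final String handle;

        User(String handle) {
            this.handle = handle;
        }

        // Each concrete type names itself. A database mapping needs an
        // equivalent discriminator (a column in SQL, a label in a graph).
        abstract String kind();
    }

    static final class Business extends User {
        final String companyName; // hypothetical field

        Business(String handle, String companyName) {
            super(handle);
            this.companyName = companyName;
        }

        @Override
        String kind() {
            return "BUSINESS";
        }
    }

    static final class Individual extends User {
        final String fullName; // hypothetical field

        Individual(String handle, String fullName) {
            super(handle);
            this.fullName = fullName;
        }

        @Override
        String kind() {
            return "INDIVIDUAL";
        }
    }

    // Formats a user as "<kind>:<handle>", similar to how a discriminator value might be stored.
    static String describe(User user) {
        return user.kind() + ":" + user.handle;
    }

    public static void main(String[] args) {
        List<User> users = List.of(
                new Business("acme", "Acme Inc."),
                new Individual("jdoe", "Jane Doe"));
        for (User user : users) {
            System.out.println(describe(user));
        }
    }
}
```

Whatever the database, it has to preserve the information that `kind()` carries here, which is why the hierarchy makes the comparison more interesting.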
Several databases are compared in this non-exhaustive list, in an attempt to create the most performant and maintainable database model for answering classic social network questions while remaining developer friendly.
The emphasis is mostly on how easy the different queries are to develop and understand, but performance is also considered.
The first database in our comparison list is a very popular one in the SQL world. PostgreSQL is widely used and extremely powerful, and much of its popularity comes from the numerous extensions available for it.
The next one on our list is by far the most popular graph database on the market. Neo4j has been around since 2007, but it has only gained popularity in the last few years, as use cases for graph databases seem to multiply.
MongoDB was also explored for comparison with the two other databases. The idea was quickly abandoned, however, because it is fairly complicated to represent a social network with independent documents, considering that every relation is a mutable list that can grow without bound.
To give an idea, I had to either use Document References, which have a lot of limitations, or map everything manually by issuing multiple queries to the database, which would have turned it into a not-so-good SQL database.
After a few hours of trying to make it work, it became obvious that a social network is a poor use case for MongoDB.
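The cost of the manual mapping can be illustrated with a small Java sketch. It does not use the MongoDB driver: `UserDoc`, `FakeCollection`, and `followingNames` are hypothetical stand-ins backed by an in-memory map, and the lookup counter only shows how resolving references by hand turns one logical question into several round trips.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ManualReferenceSketch {

    // Stand-in for a user document: its own fields plus a growing list of references.
    static final class UserDoc {
        final String id;
        final String name;
        final List<String> followingIds = new ArrayList<>();

        UserDoc(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Stand-in for a collection; counts lookups to make the round trips visible.
    static final class FakeCollection {
        private final Map<String, UserDoc> docs = new HashMap<>();
        int lookups = 0;

        void insert(UserDoc doc) {
            docs.put(doc.id, doc);
        }

        UserDoc findById(String id) {
            lookups++;
            return docs.get(id);
        }
    }

    // Resolves the names of everyone a user follows: one lookup for the user,
    // then one more per reference. A SQL join or a graph traversal expresses
    // the same question as a single query.
    static List<String> followingNames(FakeCollection users, String userId) {
        UserDoc user = users.findById(userId);
        List<String> names = new ArrayList<>();
        for (String followedId : user.followingIds) {
            names.add(users.findById(followedId).name);
        }
        return names;
    }

    public static void main(String[] args) {
        FakeCollection users = new FakeCollection();
        UserDoc alice = new UserDoc("1", "alice");
        alice.followingIds.add("2");
        alice.followingIds.add("3");
        users.insert(alice);
        users.insert(new UserDoc("2", "bob"));
        users.insert(new UserDoc("3", "carol"));

        System.out.println(followingNames(users, "1")); // [bob, carol]
        System.out.println(users.lookups);              // 3
    }
}
```

Batching the second step would reduce the number of round trips, but the application code would still own the join logic.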
- Java 11+
- Docker with docker-compose to start the databases
The source code available here takes care of inserting fake data into both database types automatically. It does so by creating random users and assigning each user a random number of tweets, retweets, likes, mentions, etc.
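As a rough idea of what this kind of generation involves, here is a minimal Java sketch. It is not the repository's actual inserter: `FakeUser`, `generate`, the handle format, and the seed parameter are all assumptions made for illustration, and only tweets are generated to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class FakeDataSketch {

    // Hypothetical in-memory user; a real inserter would write to a database instead.
    static final class FakeUser {
        final String handle;
        final List<String> tweets = new ArrayList<>();

        FakeUser(String handle) {
            this.handle = handle;
        }
    }

    // Creates `count` users, each with between 0 and `maxTweets` (inclusive) tweets.
    // A fixed seed makes the generated data reproducible between runs.
    static List<FakeUser> generate(int count, int maxTweets, long seed) {
        Random random = new Random(seed);
        List<FakeUser> users = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            FakeUser user = new FakeUser("user" + i);
            int tweetCount = random.nextInt(maxTweets + 1);
            for (int t = 0; t < tweetCount; t++) {
                user.tweets.add("tweet " + t + " by " + user.handle);
            }
            users.add(user);
        }
        return users;
    }

    public static void main(String[] args) {
        List<FakeUser> users = generate(100, 10, 42L);
        int totalTweets = 0;
        for (FakeUser user : users) {
            totalTweets += user.tweets.size();
        }
        System.out.println(users.size() + " users, " + totalTweets + " tweets");
    }
}
```

Retweets, likes, and mentions would follow the same pattern, with the extra step of picking a random existing user or tweet as the target.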
# For all databases
docker-compose up -d
Neo4j Browser is available at http://localhost:7474
- User: neo4j
- Password: cabaneio2021
For Postgres, connect with your favorite IDE or database client:
- User: postgres
- Password: cabaneio2021
- Database: cabaneio
./gradlew assemble && java -jar build/libs/cabane.io-twitter.jar
Once inside the interactive shell, you can run a suite of commands:
twitter:> help
twitter:> insert-users --inserters postgres,neo4j --count 100
twitter:> exit
You can also start it directly from your IDE.