Giter Site home page Giter Site logo

tinder-cesardiez / todo-database-management-system Goto Github PK

View Code? Open in Web Editor NEW

This project forked from robertzhao2002/database-management-system

0.0 0.0 0.0 9.72 MB

Cornell CS 4321 (Practicum in Database Systems) Fall 2022 Semester Project

Java 99.70% Kotlin 0.28% Makefile 0.02%

todo-database-management-system's Introduction

Database Management System

Top-Level

The top level class of our code is Interpreter.java

Selection Pushing

The selection pushing is initiated in the LogicalPlanBuilder.java, which creates a UnionFindVisitor.java to parse the WHERE expression and use the UnionFind.java to keep track of transitive equalities and the bounds.

We follow the expression visiting algorithm introduced in the handout. For each conjunct, we update the union find and keep track of any unusable expressions. We deviated slightly from the handout by also adding equalities between attributes to the unusable expressions data structure. This is because it is hard to extract this information directly from the union find, so by persisting the expression, we can easily add it back when creating the select/join condition. We will also note that we separate unusable join expressions from unusable select expressions. The difference lies in the number of tables referenced in the conjunct, and this distinction helps us easily extract one or the other without having to check the number of referenced tables later.

Selection Implementation

The selection implementation is decided in the PhysicalPlanBuilder.java when we visit a LogicalSelectOperator.

We followed the cost formulas in the handout, and we utilize the Stats.java class to perform most of the calculations. We start by getting all the indexes that are for the table being selected. If no indexes are available, then we cannot perform any optimizations and just make a scan and select operator. If there are indexes, then we start by getting the cost to scan the table directly. Then, for each index, we create an IndexExpressionVisitor.java to parse the select condition and get the bounds for the index attribute. We compute the reduction factor and cost for using the index based on these bounds, and save the index with lowest cost. Finally, if the best index cost is lower than the scan cost, then we use the best index and create an IndexScanOperator.java, potentially adding a select operator if some conjuncts are not covered by the index. Otherwise, we create a scan and select operator.

Join Ordering

The join ordering is decided in the JoinOrderOptimizer.java.

We followed the dynamic programming algorithm introduced in the handout. We iterate through increasing sizes of subsets. For each subset size, we generate all the combinations of table names, then compute their entry in the DP table. An entry contains the best join order, its associated cost, the size of the join (including the root), and the V-Values for each attribute in the join. Before creating this entry, we have to find the ordering of tables that minimizes the cost. To do this, we iterate through all possible tables to exclude from the current subset, and we obtain the cost by looking up the one-smaller subset in the DP table and adding it's cost and output size. Once we have the ordering with lowest cost, we compute the new output size and corresponding V-Values. These functions generally have three cases: single table scan, single table select, and a join between two tables. In each case, we follow the formulas in the handout. Instead of parsing the join expressions again to obtain the equalities, we make use of the union find to extract sets of equal attributes.

Join Implementation

The join implementation is decided in the PhysicalPlanBuilder.java via the selectJoinImplementation function.

We followed the simple strategy of using SMJ whenever possible. More specifically, if the join expression represents an equijoin, then we create an SMJ operator, otherwise, we create a BNLJ operator. We hardcoded 5 buffer pages for the BNLJ operator.

Running the Application

To run the top-level application, use the command below.

./gradlew run --args="[config-file-path]"

Formatting Code

To have unified formatting, we used a 3rd party formatting Gradle plugin called Spotless by Diffplug. We used the Palantir Java Formatting style. We have the formatter as a pre-commit hook, so it runs every time we commit.

Applying the Formatting to All Java Files

To manually run the formatter, run the command below. NOTE: this will potentially change many files, if they aren't following the formatting guidelines properly.

./gradlew spotlessJavaApply

Verifying that the All Java Files are Formatted Properly

To check if your sourcecode follows the formatting guidelines, run the command below.

./gradlew spotlessJavaCheck

Committing without Formatting

git commit -am [message] --no-verify

Unit Tests

Run all Unit Tests

./gradlew test

Run a Specific Set of Unit Tests

./gradlew test --tests [test-class-name]

Zip Command

make zip

Acknowledgements

Third-party services used to facilitate the project:

todo-database-management-system's People

Contributors

robertzhao2002 avatar thebenkogan avatar c4pt41n-v0yag3r avatar dj293 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.