Giter Site home page Giter Site logo

emberdb's People

Contributors

duongcongtoai avatar

Watchers

 avatar  avatar

emberdb's Issues

Async mode for executor trait

To support async mode, the trait itself must be async.
Reference: https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html#

For now we only care about 1 method:

    async fn execute(
        &self,
        partition: usize,
        context: Arc<ExecutionContext>,
    ) -> Result<SendableRecordBatchStream>

where SendableRecordBatchStream impl RecordBatchStream

pub trait RecordBatchStream: Stream<Item = ArrowResult<RecordBatch>> {
    /// Returns the schema of this `RecordBatchStream`.
    ///
    /// Implementation of this trait should guarantee that all `RecordBatch`'s returned by this
    /// stream should have the same schema as returned from this method.
    fn schema(&self) -> SchemaRef;
}

It impl std Stream trait, each poll will results to a RecordBatch

#[derive(Clone, Debug, PartialEq)]
pub struct RecordBatch {
    schema: SchemaRef,
    columns: Vec<Arc<dyn Array>>,
}

dyn Array because we don't know a head at compiled time which type the data is. We can use the same set of implementation like arrow crate: https://docs.rs/arrow/latest/arrow/array/index.html

Impl external mergesort join

There is only 1 implementation of join operator currently (grace hash join). There is an edge case that if a partition containing all the records that share the same value on join attribute, recursive hashing will not work and result in infinite recursion (all hashed value will eventually go to the same bucket again) => we need a fallback join operator if this case is met, and we can use sortmerge join like CockroachDB

Impl Seq scan

Description

The SeqScanExecutor iterates over a table and return its tuples, one-at-a-time. A sequential scan is specified by a SeqScanPlanNode. The plan node specifies the table over which the scan is performed. The plan node may also contain a predicate; if a tuple does not satisfy the predicate, it is not produced by the scan.
Hint: Be careful when using the TableIterator object. Make sure that you understand the difference between the pre-increment and post-increment operators. You may find yourself getting strange output by switching between ++iter and iter++.
Hint: You will want to make use of the predicate in the sequential scan plan node. In particular, take a look at AbstractExpression::Evaluate. Note that this returns a Value, on which you can invoke GetAs() to evaluate the result as a native C++ boolean type.
Hint: The output of sequential scan is a copy of each matched tuple and its original record identifier (RID).

Current package has no standard for Operator interface, maybe it's time to make it relevant

trait Operator {
    fn next(&mut self, t: ExecutionContext) -> Result<Batch> {}
}

ExecutionContext is put there for later phase (it contains reference to bpm, lock_manager, txn)
It may hold a reference to an underlying storage that support seq scan. This scan action should receive a filter that can be evaluated (check Expression in ToyDB)

SeqScan operator on construction receive some informatin such as predicate/filter, tablename, schema for output (which fields are selected).

Test

TEST_F(ExecutorTest, DISABLED_SimpleSeqScanTest) {
  // SELECT col_a, col_b FROM test_1 WHERE col_a < 500

  // Construct query plan
  TableMetadata *table_info = GetExecutorContext()->GetCatalog()->GetTable("test_1");
  Schema &schema = table_info->schema_;
  auto *col_a = MakeColumnValueExpression(schema, 0, "col_a");
  auto *col_b = MakeColumnValueExpression(schema, 0, "col_b");
  auto *const500 = MakeConstantValueExpression(ValueFactory::GetIntegerValue(500));
  auto *predicate = MakeComparisonExpression(col_a, const500, ComparisonType::LessThan);
  auto *out_schema = MakeOutputSchema({{"col_a", col_a}, {"col_b", col_b}});
  SeqScanPlanNode plan{out_schema, predicate, table_info->oid_};

  // Execute
  std::vector<Tuple> result_set;
  GetExecutionEngine()->Execute(&plan, &result_set, GetTxn(), GetExecutorContext());

  // Verify
  std::cout << "col_a, col_b" << std::endl;
  for (const auto &tuple : result_set) {
    ASSERT_TRUE(tuple.GetValue(out_schema, out_schema->GetColIdx("col_a")).GetAs<int32_t>() < 500);
    ASSERT_TRUE(tuple.GetValue(out_schema, out_schema->GetColIdx("col_b")).GetAs<int32_t>() < 10);
    std::cout << tuple.GetValue(out_schema, out_schema->GetColIdx("col_a")).GetAs<int32_t>() << ", "
              << tuple.GetValue(out_schema, out_schema->GetColIdx("col_b")).GetAs<int32_t>() << std::endl;
  }
  ASSERT_EQ(result_set.size(), 500);
}

Impl Insert

Description

The InsertExecutor inserts tuples into a table and updates indexes. There are two types of inserts that your executor must support.
The first type are insert operations in which the values to be inserted are embedded directly inside the plan node itself. We refer to these as raw inserts.
The second type are inserts that take the values to be inserted from a child executor. For example, you may have an InsertPlanNode with a SeqScanPlanNode as is child to implement an INSERT INTO .. SELECT ...
You may assume that the InsertExecutor is always at the root of the query plan in which it appears. The InsertExecutor should not modify its result set.
Hint: You will need to lookup table information for the target of the insert during executor initialization. See the System Catalog section below for additional information on accessing the catalog.
Hint: You will need to update all indexes for the table into which tuples are inserted. See the Index Updates section below for further details.
Hint: You will need to use the TableHeap class to perform table modifications.

Test

TEST_F(ExecutorTest, DISABLED_SimpleRawInsertTest)

=> Insert then run seq scan to test result

TEST_F(ExecutorTest, DISABLED_SimpleSelectInsertTest)

This test use a seed table called "table test_1"
Create a seq scan node with some predicate
Embed this node inside insert node
After the execution, execute both seqscan node and seqscan on newly created table and compare

TEST_F(ExecutorTest, DISABLED_SimpleRawInsertWithIndexTest)
  • Insert some raw values into an empty table
  • Execute createIndex operator on the recent table
  • SeqScan on the table, for each tuple, get the value of the indexed columns, then use that value to find RID using the index, then use that RID to fetch from the underlying storage, now we have original tuple and newly fetched tuple using index, compare them

Index:

  • An abstraction with underlying data structure (may use Btree)
    Basic method:
  virtual void InsertEntry(const Tuple &key, RID rid, Transaction *transaction) = 0;

  // delete the index entry linked to given tuple
  virtual void DeleteEntry(const Tuple &key, RID rid, Transaction *transaction) = 0;

  virtual void ScanKey(const Tuple &key, std::vector<RID> *result, Transaction *transaction) = 0;

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.