
Comments (32)

dangleptr commented on May 18, 2024

After discussing offline, the workflow is as follows:

Create index
First of all, "create index" blocks the console until the index is ready to serve!

  1. QE (Query Engine) sends "create index" to MS (Meta Service).
  2. MS dispatches "create index" to all SS (Storage Service) via RPC.
  3. After receiving "create index" from MS, each SS records a time point; from that point on, all inserts generate index data automatically. Meanwhile, the SS starts a background job that scans all old data written before that point and backfills index data for it.
  4. After the index is built on an SS, it responds with an ack to MS. Once MS has collected acks from all SSs, it writes the index information and responds with an ack to QE.

Notes:

  • If any SS crashes during the process, the whole operation fails.
  • Until the index information is stored on MS, the index is invisible to the outside.
  • After a schema update, the index still works as long as no indexed column changed. To change an indexed column, you must drop the index first.
  • For any update to data with an index, the SS must read the related row first, delete the related index entries second, and finally write back the new data and index. (This is costly; maybe we can figure out a better way.)
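The read-delete-write cycle in the last note can be sketched as follows. This is an illustrative Python sketch with made-up names (`update_with_index`, `index_of`, a plain dict standing in for the KV store), not Nebula's actual storage API; the point is that the row read, the old index deletion, and the new data + index write must land in one atomic batch.

```python
def update_with_index(kv, row_key, index_of, new_row):
    """Update `row_key` to `new_row`, keeping its index entry consistent.

    kv       -- dict standing in for the key-value store (hypothetical)
    index_of -- function mapping (row_key, row) to the row's index key
    """
    batch = {}
    old_row = kv.get(row_key)                      # 1. read the related row first
    if old_row is not None:
        batch[index_of(row_key, old_row)] = None   # 2. delete the old index entry
    batch[row_key] = new_row                       # 3. write back the new data...
    batch[index_of(row_key, new_row)] = row_key    # ...and the new index entry
    # apply the whole batch "atomically" (here: a plain dict update)
    for k, v in batch.items():
        if v is None:
            kv.pop(k, None)
        else:
            kv[k] = v
    return kv
```

If the indexed column is unchanged, the old and new index keys coincide and the batch simply rewrites the same entry.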

Drop Index

  1. Delete the index information on MS.
  2. Remove stale index entries during compaction on SS.
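Step 2 can be sketched as a compaction filter: during compaction, any index entry whose index id is no longer registered in meta is simply not rewritten. The names below (`compact`, `live_index_ids`, the tuple-shaped keys) are illustrative assumptions, not the real RocksDB/Nebula interface.

```python
def compact(entries, live_index_ids):
    """Return the entries that survive a compaction pass.

    entries        -- iterable of (key, value); for this sketch an index key
                      is a tuple (part_id, index_id, ...)
    live_index_ids -- set of index ids still present in the meta service
    """
    survivors = []
    for key, value in entries:
        index_id = key[1]
        if index_id in live_index_ids:      # keep entries of live indexes
            survivors.append((key, value))
        # otherwise the entry belongs to a dropped index: discard it
    return survivors
```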

Any suggestions? @sherman-the-tank @boshengchen @darionyaphet

from nebula.

dangleptr commented on May 18, 2024

If there are no more suggestions today, I will split this issue into smaller tasks and discuss the details of each one.

@sherman-the-tank @boshengchen @darionyaphet


dangleptr commented on May 18, 2024

Please upload your design doc before starting your work. @boshengchen @darionyaphet


sherman-the-tank commented on May 18, 2024

@boshengchen Awesome thoughts! It's a wonderful starting point.

Here are some suggestions and questions I have:

  1. What is the reason to store the column type in the index schema? (We could just store a schema version.)
  2. The storage design (key format) does not seem to consider indexes with multiple columns.
  3. Here are some use cases for indexes; I am not sure we can cover all of them:
  • Only use the first N columns of an M-column index (where M > N)
  • Range query (get all vertices whose property P falls in a range)
  • Get all unique values for a given column
  4. We need to make sure the index reflects the creation/deletion/modification of the original vertices or edges.

This seems like a huge task. After we settle on the design, I might want to break it into multiple smaller subtasks.


sherman-the-tank commented on May 18, 2024

We want to have the infrastructure to support indices on one or more properties


dangleptr commented on May 18, 2024

There are some requirements here:

  1. Index and data should be consistent.
  2. Support building index online.
  3. Support dropping index.


ayyt commented on May 18, 2024

👍


boshengchen commented on May 18, 2024

Design concept:

  • Ensure absolute consistency without supporting transactions; consistency can only be guaranteed by the current storage, so we cannot use other storage structures.
  • Each data write operation must be synchronized with the corresponding index write.
  • Primary and foreign keys are not supported.
  • Unique constraints are not supported.

DDL grammar:

CREATE INDEX index_name ON TAG | EDGE tag_name | edge_name (col1, col2, …)
ALTER INDEX index_name ADD (col1, col2, …) | DROP (col5, col6, …)
DROP INDEX index_name

Metadata manager:

### Index metadata:

	struct IndexItem {
	    1: byte col_id,
	    2: string name,
	    3: SupportedType type,
	}
	
	struct Index {
	    1: byte max_id,
	    2: string name,
	    3: list<IndexItem> items,
	}
	
	Key : __schema_index__ + spaceId + indexType(tag or edge) + tag_id or edgeType + version
	Val : serialize(Index)
	
 ### For example:
	
	CREATE TAG person(name string, age int TTL = 100, married bool, salary double, create_time timestamp)
	
	Create index person_index on tag person(name, age)
	
	Index val is:
	
	Index {
		Max_id = 2,
		Name = person_index,
		items = {
			Id = 1,
	    		Name = name,
	    		Type = string,
		},
		{
			Id = 2,
			Name = age,
			Type = int,
		}
	}
	
	Alter index person_index drop (name)
	
	Index val is:
	
	Index {
		Max_id = 2,
		Name = person_index,
		items = {
		  Id = 2,
		  Name = age,
		  Type = int,
		}
	}
	
	Alter index person_index add(salary)
	
	Index val is:
	
	Index {
		Max_id = 3,
		Name = person_index,
		items = {
			Id = 2,
			Name = age,
			Type = int,
		},
		{
	    		Id = 3,
	    		Name = salary,
	    		Type = double,
		}
	}
	
### Index cache in meta :
	// TODO

Storage:

vertex index:
key : partId + tagId + colId(from IndexItem) + prop_str + vertexId + version
val : rowKey

// vertexId + version : appended just to make sure the key is unique.

edge index:
key : partId + edgeType + colId(from IndexItem) + prop_str + src_vertexId + dst_vertexId + version
val : rowKey
// src_vertexId + dst_vertexId + version : appended just to make sure the key is unique.
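The two key layouts above can be sketched with a simple byte encoder. The fixed field widths below are assumptions for illustration only (the real on-disk encoding may differ); the point is that a shared prefix of partId + tagId/edgeType + colId + prop_str is what makes prefix scans on the index possible.

```python
import struct

# Illustrative encoder for the proposed index key layouts; the field widths
# (4-byte ids, 1-byte colId, 8-byte vertex ids and version) are assumptions.

def vertex_index_key(part_id, tag_id, col_id, prop_str, vertex_id, version):
    # partId + tagId + colId(from IndexItem) + prop_str + vertexId + version
    return (struct.pack(">i", part_id) + struct.pack(">i", tag_id)
            + struct.pack(">b", col_id) + prop_str.encode()
            + struct.pack(">q", vertex_id) + struct.pack(">q", version))

def edge_index_key(part_id, edge_type, col_id, prop_str, src_id, dst_id, version):
    # partId + edgeType + colId + prop_str + src_vertexId + dst_vertexId + version
    return (struct.pack(">i", part_id) + struct.pack(">i", edge_type)
            + struct.pack(">b", col_id) + prop_str.encode()
            + struct.pack(">q", src_id) + struct.pack(">q", dst_id)
            + struct.pack(">q", version))
```

Because prop_str is variable-length here, a production encoding would length-prefix or pad it so that keys for different property values cannot interleave.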

Related features:

  1. create index // First, assemble the index metadata; then assemble the index data in parallel on each part.
  2. alter index // The add operation follows the same logic as create index; the drop operation gets the colId from the index meta cache, then assembles the delete-prefix key.
  3. drop index // Same as the alter index drop operation.
  4. alter tag // Important! Needs further analysis: should old versions of index data be deleted?
  5. alter edge // Same as alter tag.
  6. remove tag // Clear all related data.
  7. remove edge // Clear all related data.
  8. insert vertex // Check the index cache for index entries; if the tag has one, assemble the index key and value together with the vertex data.
  9. insert edge // Same as insert vertex.
  10. QueryVertexPropsProcessor // If the props have an index entry, scan the index first, then use the row keys obtained to do a MultiGet.
  11. QueryEdgePropsProcessor // Same as QueryVertexPropsProcessor.

// TODO: the following are temporarily unsupported.
update vertex
update edge
delete vertex
delete edge


darionyaphet commented on May 18, 2024

There is no ALTER INDEX; what you describe should be ALTER TAG or ALTER EDGE.

Add one more DDL to create a tag/edge with a specified index:

CREATE TAG <tag_name> (<prop_def_list>)
<prop_def_list> ::= <prop_def>+ (key_def)*
<prop_def> ::= <prop_name>,<type> 
<prop_name> ::= <label> 
<key_def> ::= <key_name>+
<key_name> ::= <label> 

CREATE EDGE <edge_type_name> (<prop_def_list>)
<edge_type_name> := <label>


boshengchen commented on May 18, 2024

@boshengchen Awesome thoughts! It's a wonderful start point.

Here are some suggestions and questions I have

  1. What is the reason to store the column type in the index schema? (We can just store a schema version)
    answer:
    1. About the column type: various filter conditions may be encountered (>, <, !=, ==, ...), and different data types need different comparison algorithms. Filtering is done during the index scan phase, so we store the column type for later use. We could also get it from the original schema.
    2. About the schema version: I have stored it in the index meta key: __schema_index__ + spaceId + indexType(tag or edge) + tag_id or edgeType + version
  2. The storage design (key format) seems not considering the index with multiple columns
    answer : Actually, I have been thinking about it; we can handle multiple columns with the current design. For example:
    CREATE TAG person(name string, age int TTL = 100, married bool, salary double, create_time timestamp)
    Create index person_index on tag person(name, age)
    Index {
    Max_id = 2,
    Name = person_index,
    items = {
    Id = 1,
    Name = name,
    Type = string,
    },
    {
    Id = 2,
    Name = age,
    Type = int,
    }
    }
    Key structure:
    partId + tagId + colID + prop_str + vertexId + version
    The index key should be :
    partId + tagId + 1 + prop_str + vertexId + version
    partId + tagId + 2 + prop_str + vertexId + version
    The advantage is that the same index can be used to handle filtering on different columns:
    GO FROM 1,2,3 OVER friend WHERE person.name == "cbs"
    GO FROM 1,2,3 OVER friend WHERE person.age > 20
  3. Here are some use cases for index, not sure if we could cover all of them
  • Only use the first N columns in a M-column index (where M > N)
    I can't fully understand what scenario this case is used for. 😊
  • Range query (get all vertices which property P fits in a range)
  • Gets all unique values for a given column
    Good! Let me think it over.
  4. Need to make sure the index will reflect the creation/deletion/modification of the original vertices or the original edges

This seems a huge task. After we settle down on design, I might want to break this into multiple smaller subtasks

Nice suggestion @sherman-the-tank; your comments gave me a clearer and more thorough understanding. Thanks! I have answered your questions one by one above.


boshengchen commented on May 18, 2024

Here is non existent ALTER INDEX, what you said should be ALTER TAG or ALTER EDGE

No, ALTER INDEX is a supplement to @dangleptr's requirements.
ALTER INDEX involves both index metadata updates and index data rebuilding;
ALTER TAG or ALTER EDGE only needs index data rebuilding.


darionyaphet commented on May 18, 2024

I will keep updating this design document

Query Language

CREATE INDEX Syntax

<create-tag-index-stmt> ::= CREATE TAG INDEX  ( <tag-index-name> )?
                            ON <tag-name> '(' <tag-property-name> ')'
                            (ASYNC)?

<create-edge-index-stmt> ::= CREATE EDGE INDEX ( <edge-index-name> )?
                             ON <edge-name> '(' <edge-property-name> ')'
                            (ASYNC)?

Sample:

CREATE TAG  INDEX userIndex ON Movie(user) ASYNC;
CREATE EDGE INDEX likeIndex ON Watch(like);

DROP INDEX Syntax

<drop-tag-index-stmt> ::= DROP TAG INDEX <tag-index-name>

<drop-edge-index-stmt> ::= DROP EDGE INDEX <edge-index-name>

Sample:

DROP TAG  INDEX userIndex;
DROP EDGE INDEX likeIndex;


dangleptr commented on May 18, 2024

Thanks for your docs @boshengchen @darionyaphet

Let's break the topic into three parts:

  1. What's the DDL for indexes?
  2. How do we store the index in the storage layer?
  3. How do we build the index, and how do we drop it?

1. What's the DDL for indexes?
About the index language, I think we should support "CREATE/DROP INDEX",
and @darionyaphet's design looks good to me.

But I have a question about "create index":
what happens after typing "create index"? Do we just create the meta information, or also build the real index on the storage layer?


dangleptr commented on May 18, 2024

2. How do we store the index in the storage layer?
The index and data should be stored together to ensure consistency.
Currently our data has multiple versions, so is the index only created on the latest one?
And how do we deal with data updates? How do we deal with schema updates?


dangleptr commented on May 18, 2024

3. How do we build the index, and how do we drop it?

  1. How does the storage side discover that an index has been created?
  2. When do we build the index on the original data?
  3. When do we build the index on newly arriving data?
  4. How do we drop an index?


darionyaphet commented on May 18, 2024

By default, when an index is created, it is built synchronously while CREATE runs.

But that is not feasible when the dataset is very large, so I propose an ASYNC keyword.

When this keyword is specified, the statement returns immediately, and a separate tool builds the index for the old data.

If rebuilding the index fails, the user gets feedback.


boshengchen commented on May 18, 2024

I will keep updating this design document

Query Language

CREATE INDEX Syntax

<create-tag-index-stmt> ::= CREATE TAG INDEX  ( <tag-name> )?
                            ON <tag-name> '(' <tag-property-name> ')'
                            (ASYNC)?

<create-edge-index-stmt> ::= CREATE EDGE INDEX ( <edge-name> )?
                             ON <edge-name> '(' <edge-property-name> ')'
                            (ASYNC)?

Sample:

CREATE TAG  INDEX userIndex ON Movie(user) ASYNC;
CREATE EDGE INDEX likeIndex ON Watch(like);

DROP INDEX Syntax

<drop-tag-index-stmt> ::= DROP TAG INDEX ( <space> '.' )? <identifier>

<drop-edge-index-stmt> ::= DROP EDGE INDEX ( <space> '.' )? <identifier>

Sample:

DROP TAG  INDEX userIndex;
DROP EDGE INDEX cinema.likeIndex;

Good idea; it seems better to put the keyword TAG before INDEX 👍.
In addition, there are two questions:
1. What does CREATE TAG INDEX ( <tag-name> )? mean? Is this the index name, or the same as the index name?
2. About ASYNC: when index creation starts, should we block all writes? Otherwise, how do we ensure the new data gets indexed? So I think two points need to be analyzed: whether asynchronous operation is necessary, and whether write operations need to be blocked during index creation.


darionyaphet commented on May 18, 2024

Storage Layer

In the storage layer, we should save the index and the data in one batch operation to ensure consistency.

About Index Structure

The index key is composed of: prefix + index-name + version + value + key.

(A current uncertainty is whether a type needs to be added between prefix and index-name.)

The prefix is used to make sure the index sorts in front of the data.

The value is the serialization of multiple column values.

About Append

When using the first N columns of an M-column index (M > N), we can query it with a prefix scan; this also works for range queries.

If we want to get all unique values for a given column, deserialize the value whenever the column is included in the index.

About Update

When updating the data, we should remove the old index entry first and then append the new one.

About Drop

Remove the whole tag or edge index by range:
[prefix + index-name + MIN, prefix + index-name + MAX]

Removing a vertex or edge should remove its index entries, according to the column values, in a batch.

As shown in the following figure :

-------------------------------------------------------------
|                                Index                      |
-------------------------------------------------------------
|                   Key              |         Col          |
-------------------------------------------------------------
|  0000-index_name-version-0102-0001 |                      |
|  0000-index_name-version-0102-0002 |                      |
|  0000-index_name-version-0103-0003 |                      |
|  ................................. |                      |
|  0000-index_name-version-0405-0099 |                      |
-------------------------------------------------------------
|                                Data                       |
-------------------------------------------------------------
|                Key                |  Col0 |  Col1 |  Col2 |
-------------------------------------------------------------
|                0001               |   01  |   02  |   03  |
|                0002               |   01  |   02  |   ..  |
|                0003               |   01  |   03  |   ..  |
|                ....               |   ..  |   ..  |   ..  |
|                0099               |   04  |   05  |   06  |
-------------------------------------------------------------
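The "About Drop" range deletion can be sketched as follows, with a plain dict standing in for the KV store and an assumed 8-byte MAX sentinel; a real implementation would use the store's native range-delete instead of iterating.

```python
def drop_index(store, prefix, index_name):
    """Delete every key in [prefix + index_name + MIN, prefix + index_name + MAX].

    store is a dict of bytes keys; the 8-byte 0xff MAX sentinel is an
    illustrative assumption, not the real key width.
    """
    lo = prefix + index_name                  # MIN: the empty suffix sorts first
    hi = prefix + index_name + b"\xff" * 8    # MAX sentinel
    for key in sorted(store):                 # sorted() copies, so deleting is safe
        if lo <= key <= hi:
            del store[key]
    return store
```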


boshengchen commented on May 18, 2024

Actually, I still don't understand "Only use the first N columns in a M-column index (where M > N)".
Could you give me an example, or a SQL statement? Thanks.


boshengchen commented on May 18, 2024

Storage Layer

In Storage Layer, we should save index and data in a batch operation to make sure the consistency.

About Index Structure

The index key was composed of : prefix + index-name + version + value + key.

(Currently uncertainty is whether a type needs to be added between prefix and index-name)

The prefix is used to make sure the index is in front of the data.

The value is serialized by multiple columns value.

About Append

When use the first N columns in a M-column index (M > N), we can use prefix query it and also valid for range query.

If we want to Gets all unique values for a given column, deserialize the value when the column is included by index.

About Update

When update the data, should remove the index data firstly and append the new index .

About Drop

Remove the whole tag and edge by range :
[prefix + index-name + MIN, prefix + index-name + MAX]

As shown in the following figure :

-------------------------------------------------------------
|                                Index                      |
-------------------------------------------------------------
|                   Key              |         Col          |
-------------------------------------------------------------
|  0000-index_name-version-0102-0001 |                      |
|  0000-index_name-version-0102-0002 |                      |
|  0000-index_name-version-0103-0003 |                      |
|  ................................. |                      |
|  0000-index_name-version-0405-0099 |                      |
-------------------------------------------------------------
|                                Data                       |
-------------------------------------------------------------
|                Key                |  Col0 |  Col1 |  Col2 |
-------------------------------------------------------------
|                0001               |   01  |   02  |   03  |
|                0002               |   01  |   02  |   ..  |
|                0003               |   01  |   03  |   ..  |
|                ....               |   ..  |   ..  |   ..  |
|                0099               |   04  |   05  |   06  |
-------------------------------------------------------------

Oh yeah! How can we scan the second column, such as WHERE col1 == 02?
Also, how can we delete the index data when deleting a vertex or edge? I mean, how do we get the index key from the data row?


boshengchen commented on May 18, 2024

2. How to store the index in storage layer?
The index and data should be stored together to ensure the consistency.
Currently, our data has multi versions, so the index is only created on the latest one?
This is a difficult point. Either we build the index against the latest schema version, or we build it for different schema versions according to business needs. We still need to analyze which way is better. Any ideas?
And how to deal with update data? how to deal with update schema?
About updating the schema: the version question needs to be settled first, right?


dangleptr commented on May 18, 2024

Let's talk about it offline. That will be more efficient.


darionyaphet commented on May 18, 2024

How can scan second column ? such as where col1 == 02 ?

It seems we cannot fetch the data using the second column of the index.


sherman-the-tank commented on May 18, 2024

I'm glad to see a lot of great ideas here. Just want to throw out a few of my thoughts:

  1. The index should work very similarly to a MySQL index, so please refer to MySQL for how indexes work.
  2. I highly suggest making async mode the default behavior when creating an index, given the large amount of data we have.
  3. We should NOT introduce another tool to check the status of an async index build. We should be able to return the status report via the statement "SHOW INDEX...".
  4. Regarding the storage key format, I would like to see how we are going to serialize multiple values, because that affects the prefix match.
  5. Regarding how a change to the original schema affects the index, I think we should do the same thing as MySQL. If I remember correctly, when a column is used in an index, you cannot simply alter that column; you have to explicitly drop the index first. You can think of it as a constraint.


sherman-the-tank commented on May 18, 2024

We SHOULD always consider async mode first in a distributed environment.


darionyaphet commented on May 18, 2024

I highly suggest to make async mode as the default behavior when creating index, given the large amount of data we have.

In MySQL, CREATE INDEX works synchronously; when it finishes, the user can assume the index has been built successfully. I'm afraid using async mode as the default will confuse users.

We should NOT introduce another tool to check the status of the async index build. We should be able to return the status report using statement "SHOW INDEX..."

What I called "another tool" is a command that starts building the index on old data in async mode.

Regarding the storage key format, I would like to see how we are going to serialize multiple values, because that's going to affect the prefix match

Would using dataman to serialize multiple values be a good choice? One consideration is to avoid taking up unnecessary disk space.

Regarding how the original schema change will affect the index, I think we should do the same thing as mysql. If I remember correctly, when a column is used in an index, you cannot simply alter that column. You have to explicitly drop the index first. You can think there is a constrain here

After talking offline, I think deleting indexed columns should be forbidden; the index must be dropped first.


dangleptr commented on May 18, 2024

Agreed with @darionyaphet. To stay consistent with MySQL, we won't support async mode at first.


dangleptr commented on May 18, 2024

Regarding the storage key format, I would like to see how we are going to serialize multiple values, because that's going to affect the prefix match

If you create an index on two columns such as A + B, just concatenate the two values in order.
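Concatenating the two values in order only preserves prefix and range scans if each part's encoding is itself order-preserving. A common trick, sketched below as an assumption (the offset-by-2^63 big-endian encoding is illustrative, not Nebula's actual format), is fixed-width big-endian integers with a sign offset:

```python
import struct

# Sketch: combining two integer column values "in order" for a composite
# index key. Big-endian fixed-width encoding with a 2**63 offset makes
# byte-wise key order match numeric order, even for negative values,
# which keeps prefix and range scans valid.

def composite_value(a, b):
    return struct.pack(">Q", a + 2**63) + struct.pack(">Q", b + 2**63)

def prefix_for_first_column(a):
    # matching only column A of the (A, B) index
    return struct.pack(">Q", a + 2**63)
```

Keys built this way sort byte-wise in (A, B) order, so the "first N columns of an M-column index" prefix query from earlier in the thread works directly.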


boshengchen commented on May 18, 2024

Cool, lead the way!


dangleptr commented on May 18, 2024

The tasks list here. Any suggestions? @sherman-the-tank @boshengchen @darionyaphet


sherman-the-tank commented on May 18, 2024

Awesome work, guys!!

I discussed with @dangleptr offline; here is the summary. I would like to see this integrated into the design documentation.

  1. Regarding whether the default mode is async or sync: unfortunately we cannot strictly follow what MySQL does. Please keep in mind that in a distributed system, any time-consuming operation should be async; we should make that a rule. The reason is actually very simple: in a single-host system such as MySQL, the status of that single host can easily be tracked and modified, but that is not the case in a distributed system. For instance, in the MySQL console, if the wait is too long the user can always press CTRL-C to stop the index build, which makes CREATE INDEX fail. In a distributed system it is really hard to use CTRL-C to stop the process.
  2. CREATE TAG INDEX and CREATE EDGE INDEX should return success right away once the index meta info is successfully created on the meta server. Each index can be in one of these statuses: NORMAL, CONSTRUCTING, INVALID. A newly created index is in the INVALID status. Users can check the status by executing the command SHOW INDEX.
  3. Rebuilding an index is a back-end job, so any index in CONSTRUCTING status can be interrupted by the command "STOP REPAIRING INDEX...". If the index construction is interrupted, the index's status becomes INVALID.
  4. When data changes, the index should be updated automatically. But this can be bypassed by the command "INSERT... NO INDEX", whose purpose is to load large chunks of data faster. If that happens, the status of the index becomes INVALID.
  5. Any INVALID index can be repaired by the command "REPAIR INDEX ...".
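The status rules above can be summarized as a small state machine. The transition table below is distilled from points 2-5 as an assumption, not an implemented API:

```python
# Sketch of the index status machine described in the summary; the event
# names are the commands/conditions from the discussion, and any pair not
# listed here is treated as a no-op.

TRANSITIONS = {
    ("INVALID", "REPAIR INDEX"): "CONSTRUCTING",          # point 5
    ("CONSTRUCTING", "build finished"): "NORMAL",
    ("CONSTRUCTING", "STOP REPAIRING INDEX"): "INVALID",  # point 3
    ("NORMAL", "INSERT ... NO INDEX"): "INVALID",         # point 4
}

def next_status(status, event):
    """Return the new index status, or the current one if the event is a no-op."""
    return TRANSITIONS.get((status, event), status)
```

A newly created index starts in INVALID (point 2) and only reaches NORMAL through a completed repair/rebuild.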


dangleptr commented on May 18, 2024

Sounds good to me. Let me post the requirements mentioned above in each detail task.

