oom-ai / oomstore
Lightweight and Fast Feature Store Powered by Go (and Rust).
Home Page: https://oom.ai
License: Apache License 2.0
Register a new feature in onestore
register a new feature
Usage:
featctl register feature [flags]
Flags:
-d, --description string   feature description
-g, --group string         the group the feature belongs to
    --value_type string    feature type
-h, --help                 help for feature
featctl register feature phone_model --group device --value_type "varchar(30)" --description "phone model"
API Description
GetHistoricalFeatureValues takes in a list of real labels and corresponding timestamps, and returns historical feature values based on the timestamps.
API Definition
GetHistoricalFeatureValues(ctx context.Context, opt types.GetHistoricalFeatureValuesOpt) error
API Usage
err := store.GetHistoricalFeatureValues(ctx, opt)
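One plausible reading of "based on timestamps" is a point-in-time lookup: for each label timestamp, use the latest revision that is not newer than it. The sketch below assumes that revisions are Unix timestamps kept in ascending order; revisionAt is a hypothetical helper for illustration, not part of the OneStore API.

```go
package main

import (
	"fmt"
	"sort"
)

// revisionAt returns the latest revision that is not newer than ts.
// Assumption: revisions are Unix timestamps sorted in ascending order.
// The boolean is false when every revision is newer than ts.
func revisionAt(revisions []int64, ts int64) (int64, bool) {
	// Find the index of the first revision strictly greater than ts.
	i := sort.Search(len(revisions), func(i int) bool { return revisions[i] > ts })
	if i == 0 {
		return 0, false
	}
	return revisions[i-1], true
}

func main() {
	// Hypothetical revisions of one feature group.
	revisions := []int64{1632297600, 1632384000, 1633449600}
	for _, ts := range []int64{1632400000, 1632297600, 100} {
		rev, ok := revisionAt(revisions, ts)
		fmt.Println(ts, rev, ok)
	}
}
```

A binary search keeps the lookup cheap even when a group accumulates many revisions.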
Motivation
Without the tidb-toolkit dependencies, we can use MySQL as our playground, speed up integration tests, and integrate them into the CI workflow.
Progress
dumpling dependency
tidb-lightning dependency
It currently selects from the feature table, which won't contain the entity_name column. Use the rich_feature view instead.
Use featctl list revisions to list all existing revisions of a specified feature group.
Proto Interface:
featctl list revisions --group <group>
Example:
$ featctl list revisions --group device
revision, create_time, modify_time, description
device_20210922, 2021-09-22T16:08:38+08:00, 2021-09-22T16:08:38+08:00, ...
device_20210923, 2021-09-23T16:08:38+08:00, 2021-09-23T16:08:38+08:00, ...
Currently, in the CreateFeature API, we don't check the correctness of value_type when creating a feature. If a user passes in an invalid value_type (for example, var(12)), the problem is only discovered when the user tries to import data for that feature.
We will add validation for value_type when creating a feature and return an error if value_type is invalid.
In the CreateFeature API, we will create a temporary table with that feature. If the creation fails, we know the value_type is invalid.
Since we're using sqlx, according to its docs:
you should always use these to send values to the database, as they will prevent SQL injection attacks. database/sql does not attempt any validation on the query text; it is sent to the server as is, along with the encoded parameters.
A common misconception with bindvars is that they are used for interpolation. They are only for parameterization, and are not allowed to change the structure of an SQL statement. For instance, using bindvars to try and parameterize column or table names will not work.
We should use bindvars whenever possible, i.e., WHERE field = ? rather than WHERE field = '%s'.
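As a small illustration of the difference, the sketch below builds a query whose values all travel as bindvars, including a variable-length IN list. buildFeatureQuery is a hypothetical helper; the feature table and its name, group_name, and value_type columns are taken from elsewhere in these notes. Only the number of ? placeholders is derived from the input, never the values themselves.

```go
package main

import (
	"fmt"
	"strings"
)

// buildFeatureQuery (hypothetical) returns a parameterized query and its
// arguments. User-supplied values never appear in the SQL text.
func buildFeatureQuery(groupName string, featureNames []string) (string, []interface{}) {
	query := "SELECT name, value_type FROM feature WHERE group_name = ?"
	args := []interface{}{groupName}
	if len(featureNames) > 0 {
		// One "?" per feature name, separated by ", ".
		placeholders := strings.TrimSuffix(strings.Repeat("?, ", len(featureNames)), ", ")
		query += " AND name IN (" + placeholders + ")"
		for _, n := range featureNames {
			args = append(args, n)
		}
	}
	return query, args
}

func main() {
	query, args := buildFeatureQuery("device", []string{"model", "price"})
	fmt.Println(query)
	fmt.Println(args...)
}
```

The resulting query and args can be handed to the usual query methods as (query, args...). Table and column names still have to be written into the SQL text, since bindvars cannot change the structure of a statement.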
API Description
UpdateFeature updates the specified feature in OneStore. Currently, we only allow users to update the description of the feature.
API Definition
UpdateFeature(ctx context.Context, opt types.UpdateFeatureOpt) error
API Usage
err := store.UpdateFeature(ctx, opt)
Check list
Issue | Status | Test | API |
---|---|---|---|
#118 | ✅ | ✅ | featctl init |
#120 | ✅ | ✅ | featctl register entity user --length 32 --description "user-related features" |
#146 | ✅ | ✅ | featctl register group profile --entity user |
#121 | ✅ | ✅ | featctl register feature city --group profile --value-type "varchar(64)" |
#122 | ✅ | ✅ | featctl list entity |
#123 | ✅ | ✅ | featctl list group --entity device |
#125 | ✅ | ✅ | featctl list feature --entity user --group profile |
#127 | ✅ | ✅ | featctl list revision --group profile |
#128 | ✅ | ✅ | featctl describe entity --name user |
#129 | ✅ | ✅ | featctl describe group --name profile |
#130 | ✅ | ✅ | featctl describe feature --name city |
#167 | ✅ | ✅ | featctl update entity device --description "registered device" |
#169 | ✅ | ✅ | featctl update group info --description "basic device info" |
#168 | ✅ | ✅ | featctl update feature model --description "device model" |
#124 | ✅ | ✅ | featctl import |
#131 | ✅ | ✅ | featctl export --group profile --revision 1633449600 |
#132 | ✅ | ✅ | featctl get online-features |
#186 | ✅ | ✅ | featctl join historical-features |
Check list
Issue | Status | API |
---|---|---|
#72 | ✅ | Create a new OneStore |
#85 | ✅ | Get an instance of OneStore |
#73 | ✅ | Get an instance of feature |
#74 | ✅ | Get current revision for group |
#93 | ✅ | Walk a feature group |
#98 | ✅ | Import a batch feature group |
#77 | ✅ | List entities |
#78 | ✅ | List features |
#79 | ✅ | List revisions of feature group |
#80 | ✅ | Get online feature values |
#117 | ✅ | Get historical feature values |
#76 | ✅ | Create an entity in store |
#82 | ✅ | Create a batch feature in store |
#83 | ✅ | Create a group in store |
#84 | ✅ | Update a feature in store |
API Description
Open opens a OneStore workspace.
API Definition
Open(ctx context.Context, opt types.OneStoreOpt) (*OneStore, error)
API Usage
store, err := onestore.Open(ctx, opt)
CreateEntity receives entity_name and description, and creates an Entity instance.
CreateEntity(ctx context.Context, entity, description string) (*types.Entity, error)
entity, err := store.CreateEntity(ctx, "device", "entities related to device features such as phone_price,phone_model")
API Description
CreateBatchFeature creates a new batch feature in OneStore.
API Definition
CreateBatchFeature(ctx context.Context, opt types.CreateBatchFeatureOpt) (*types.Feature, error)
API Usage
feature, err := store.CreateBatchFeature(ctx, opt)
Currently, if a feature group contains multiple features, they can be in different revisions. This gets awkward when we implement the query subcommand, because we need to locate the exact entity table for each feature, and the features may live in different entity tables.
Proposal: all features inside the same feature group should be in the same revision anyway.
This should be documented somewhere, if not in code.
Motivation
We will be able to export data directly to the tty after #61 is merged. With a --limit option, users can easily and safely preview the feature entities:
$ featctl export -g device --limit 9 | csview
2021/09/28 18:13:51 connecting feature store ...
2021/09/28 18:13:51 retrieving source table ...
2021/09/28 18:13:51 downloading features ...
2021/09/28 18:13:51 succeeded.
+------------+----------------+-------+
| entity_key | model | price |
+------------+----------------+-------+
| 1 | xiaomi-mix3 | 3999 |
| 2 | huawei-p40 | 5299 |
| 3 | oppo-r9 | 3999 |
| 4 | oppo-a37 | 1999 |
| 5 | vivo-y51 | 999 |
| 6 | apple-iphone11 | 4999 |
| 7 | apple-iphone12 | 5999 |
| 8 | huawei-meta40 | 6500 |
| 9 | xiaomi-mi11 | 4500 |
+------------+----------------+-------+
API Description
ListRevision returns a list of revisions, given group_name.
API Definition
ListRevision(ctx context.Context, groupName string) ([]*types.Revision, error)
API Usage
revisions, err := store.ListRevision(ctx, "device")
featctl export => featctl export
featctl import => featctl import
featctl get feature => featctl query
featctl describe => featctl describe
featctl create feature => featctl create config
featctl set => featctl update config
featctl list revisions => featctl list revisions
featctl list features => featctl list features
Naming is the hardest thing; any suggestions are welcome.
Register an entity in onestore
featctl register entity device --length 32 --description "device info"
list entity
featctl list entity
API Description
GetFeature receives feature_name and returns a feature instance.
API Definition
GetFeature(ctx context.Context, featureName string) (types.Feature, error)
API Usage
feature, err := store.GetFeature(ctx, "device_price")
API Description
ListEntity
returns a list of entities.
API Definition
ListEntity(ctx context.Context) ([]*types.Entity, error)
API Usage
entities, err := store.ListEntity(ctx)
API Description
ListFeature returns a list of features, given entity_name or group_name.
API Definition
ListFeature(ctx context.Context, opt types.ListFeatureOpt) ([]*types.Feature, error)
API Usage
features, err := store.ListFeature(ctx, opt)
Use featctl list features to list all existing features.
Proto Interface:
featctl list features [--group <group>]
Example:
$ featctl list features --group device
$ featctl list features
Use featctl get feature to access the specified feature values.
Proto Interface:
featctl get feature --group <group> --name <name1> [name2, ...] --key <key1> [key2, ...]
Example:
$ featctl get feature --group device --name model price --key 'c7f7f1dd' 'c0809ed6'
entity_key, model, price
c7f7f1dd, mix3, 4999.00
c0809ed6, iphone12, 5299.00
What are some of the most obvious business models in the open source world? What should we keep in mind at this early stage?
API Description
Create creates a new OneStore workspace.
API Definition
Create(ctx context.Context, opt types.OneStoreOpt) error
API Usage
err := onestore.Create(ctx, opt)
API Description
CreateGroup creates a new feature group in OneStore.
API Definition
CreateGroup(ctx context.Context, opt types.CreateGroupOpt) (*types.Group, error)
API Usage
group, err := store.CreateGroup(ctx, opt)
The version package appears to be for featctl, and other components should have their own version packages in the future.
Use featctl init to create a new feature store in the specified database.
Usage:
featctl init [--database <database>]
API Description
GetGroup receives group_name and returns the group.
API Definition
GetGroup(ctx context.Context, groupName string) (*types.Group, error)
API Usage
group, err := store.GetGroup(ctx, "device")
list feature group by entity key
featctl list group --entity device
API Description
WalkFeatureValues traverses the specified revision of the feature group.
API Definition
WalkFeatureValues(ctx context.Context, opt types.WalkFeatureValuesOpt) error
API Usage
err := store.WalkFeatureValues(ctx, opt)
data_table = group_name + "_" + "1634194805", which is significantly longer than the group_name itself.
Let's make it at least 32+10=42 characters
Actual:
$ featctl describe feature -g device -n price
2021/09/28 12:18:30 failed querying feature config, group_name=device, feature_name=price: expected slice but got struct
Expected:
$ featctl describe --group device --name price
Name: price
Group: device
Revision: 20210909
Status: disabled
Category: batch
ValueType: int(11)
Description: model price
RevisionsLimit: 3
CreateTime: 2021-09-10T15:20:43Z
ModifyTime: 2021-09-13T18:58:34Z
To avoid redundancy, we store some feature properties only in the feature_group table, not in the feature table. For example, the entity_name and category fields are only stored in feature_group. To query features by those fields, we have to join the feature and feature_group tables, which makes life hard.
To get rid of the JOIN in such cases, we decided to use database views. We will create a view rich_feature as
CREATE VIEW rich_feature AS
SELECT
f.name, f.group_name, f.value_type, f.description, f.create_time, f.modify_time,
fg.entity_name, fg.category
FROM feature AS f
LEFT JOIN feature_group AS fg
ON f.group_name = fg.name;
API Description
GetOnlineFeatureValues takes in a list of entity_keys and a list of feature_names, and returns a matrix of feature values.
API Definition
GetOnlineFeatureValues(ctx context.Context, featureNames []string, entityKeys []string) ([]*types.FeatureValue, error)
API Usage
featureValues, err := store.GetOnlineFeatureValues(ctx, featureNames, entityKeys)
The Open function of the database package now returns an sqlx object by default. This means that the TableExists and ColumnInfo methods are not directly available to the outside world, so some refactoring may be needed here. There are two possible directions for the refactoring.
Change the exported object of the database package from sqlx.DB to database.DB
type DB struct {
*sqlx.DB
}
func Open(option *Option) (DB, error) {
db, err := sqlx.Open(
"mysql",
fmt.Sprintf("%s:%s@tcp(%s:%s)/%s?parseTime=true",
option.User,
option.Pass,
option.Host,
option.Port,
option.DbName),
)
return DB{db}, err
}
Replace the methods of the DB object with functions
func TableExists(ctx context.Context, db *sqlx.DB, table string) (bool, error) {
}
func ColumnInfo(ctx context.Context, db *sqlx.DB, table string, column string) (Column, error) {
}
We can work on this together
API Description
ImportBatchFeatures imports batch feature values to OneStore. Batch features should be imported in groups.
API Definition
ImportBatchFeatures(ctx context.Context, opt types.ImportBatchFeaturesOpt) error
API Usage
err := store.ImportBatchFeatures(ctx, opt)