
oom-ai / oomstore


Lightweight and Fast Feature Store Powered by Go (and Rust).

Home Page: https://oom.ai

License: Apache License 2.0

Makefile 0.50% Go 82.82% Shell 10.41% Python 0.67% Rust 5.59%
featurestore go ml mlops python rust

oomstore's People

Contributors

jinghancc, lianxmfor, wfxr, yiksanchan


oomstore's Issues

implement featctl command register feature

Description

Register a new feature in OneStore.

Usage

register a new feature

Usage:
  featctl register feature [flags]

Flags:
  -d, --description string   feature description
  -g, --group string         the group the feature belongs to
      --value_type string    feature type
  -h, --help                 help for feature

featctl register feature phone_model --group device --value_type "varchar(30)" --description "phone model"

Implement API GetHistoricalFeatureValues

API Description
GetHistoricalFeatureValues takes in a list of real labels and corresponding timestamps, returns historical feature values based on timestamps.

API Definition

GetHistoricalFeatureValues(ctx context.Context, opt types.GetHistoricalFeatureValuesOpt) error

API Usage

err := store.GetHistoricalFeatureValues(ctx, opt)
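
A minimal sketch of the point-in-time lookup that the description suggests, assuming "based on timestamps" means returning the latest value recorded at or before each requested timestamp. The `snapshot` type and the `historicalValue` function are hypothetical and are not part of OneStore.

```go
package main

import (
	"fmt"
	"sort"
)

// snapshot is a hypothetical record: a feature value that became valid at unixTime.
type snapshot struct {
	unixTime int64
	value    string
}

// historicalValue returns the latest value whose timestamp is <= ts.
// history must be sorted by unixTime in ascending order.
func historicalValue(history []snapshot, ts int64) (string, bool) {
	// Find the index of the first snapshot strictly after ts.
	i := sort.Search(len(history), func(i int) bool { return history[i].unixTime > ts })
	if i == 0 {
		return "", false // no value existed yet at ts
	}
	return history[i-1].value, true
}

func main() {
	// Illustrative data only: two price values for one entity key.
	history := map[string][]snapshot{
		"device-1": {{100, "3999"}, {200, "3799"}},
	}
	v, ok := historicalValue(history["device-1"], 150)
	fmt.Println(v, ok)
}
```

Looking up timestamp 150 picks the value recorded at 100, so a training label never sees a feature value from its own future.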

Remove tidb-toolkit binary dependencies

Motivation

Without the tidb-toolkit dependencies, we can use MySQL as our playground, speed up integration tests, and integrate them into the CI workflow.

Progress

  • Remove dumpling dependency
  • Remove tidb-lightning dependency

Feature: implement `featctl list revisions` subcommand

Use featctl list revisions to list all existing revisions of a specified feature group.

Proto Interface:

featctl list revisions --group <group>

Example:

$ featctl list revisions --group device
revision, create_time, modify_time, description
device_20210922, 2021-09-22T16:08:38+08:00, 2021-09-22T16:08:38+08:00, ...
device_20210923, 2021-09-23T16:08:38+08:00, 2021-09-23T16:08:38+08:00, ...

API CreateFeature: validate value_type

What's wrong ?

Currently, the CreateFeature API does not check that value_type is valid when creating a feature. If a user passes in an invalid value_type (for example, var(12)), the error is only discovered when the user tries to import data for that feature.

Expected result

We will validate value_type when creating a feature and return an error if it is invalid.

How to fix

In the CreateFeature API, we will create a temporary table with that feature. If creation fails, we know the value_type is invalid.
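
The temporary-table check could be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: the `execer` interface, the `validateValueType` function, the table name `tmp_value_type_check`, and the `fakeDB` stand-in are all hypothetical. The interface matches the `ExecContext` method of `*sql.DB`, `*sql.Conn`, and `*sql.Tx` in Go's `database/sql` package.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
	"strings"
)

// execer is the subset of the database handle used here, so a fake can stand in.
// With a real database, pass a *sql.Conn (or *sql.Tx) so that both statements
// run on the same connection; temporary tables are per-connection.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...interface{}) (sql.Result, error)
}

// validateValueType creates, then drops, a temporary table with one column of
// the given type. valueType is interpolated because a column type cannot be
// passed as a bindvar, so callers should treat it as untrusted input.
func validateValueType(ctx context.Context, db execer, valueType string) error {
	const table = "tmp_value_type_check" // hypothetical table name
	create := fmt.Sprintf("CREATE TEMPORARY TABLE %s (v %s)", table, valueType)
	if _, err := db.ExecContext(ctx, create); err != nil {
		return fmt.Errorf("invalid value_type %q: %w", valueType, err)
	}
	_, err := db.ExecContext(ctx, "DROP TEMPORARY TABLE "+table)
	return err
}

// fakeDB is a stand-in for this demo only; it rejects the type "var(12)".
type fakeDB struct{}

func (fakeDB) ExecContext(_ context.Context, query string, _ ...interface{}) (sql.Result, error) {
	if strings.Contains(query, "var(12)") {
		return nil, errors.New("syntax error")
	}
	return nil, nil
}

func main() {
	ctx := context.Background()
	fmt.Println(validateValueType(ctx, fakeDB{}, "varchar(30)"))
	fmt.Println(validateValueType(ctx, fakeDB{}, "var(12)"))
}
```

Letting the database judge the type keeps the check in step with whatever types the backing engine accepts, at the cost of one round trip per new feature.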

refactor: always use bindvars for parameterization

Since we're using sqlx, according to its docs:

you should always use these to send values to the database, as they will prevent SQL injection attacks. database/sql does not attempt any validation on the query text; it is sent to the server as is, along with the encoded parameters.

A common misconception with bindvars is that they are used for interpolation. They are only for parameterization, and are not allowed to change the structure of an SQL statement. For instance, using bindvars to try and parameterize column or table names will not work.

We should use bindvars whenever possible, i.e., WHERE field = ? rather than WHERE field = '%s'.
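
A small sketch of that split between parameterization and structure: values travel as bindvars, while identifiers, which bindvars cannot express, are checked against an allow-list before interpolation. The `allowedColumns` map and the `buildFeatureQuery` function are hypothetical. The table and column names come from the feature table shown elsewhere on this page.

```go
package main

import "fmt"

// allowedColumns is a hypothetical allow-list. Identifiers cannot be bound
// with bindvars, so they are checked against known names instead.
var allowedColumns = map[string]bool{"name": true, "group_name": true}

// buildFeatureQuery returns a query with a "?" bindvar for the value, plus the
// argument slice to pass alongside it (for example to db.QueryContext).
func buildFeatureQuery(column, value string) (string, []interface{}, error) {
	if !allowedColumns[column] {
		return "", nil, fmt.Errorf("column %q is not allowed", column)
	}
	// Only the vetted identifier is interpolated; the value stays a bindvar.
	query := fmt.Sprintf("SELECT * FROM feature WHERE %s = ?", column)
	return query, []interface{}{value}, nil
}

func main() {
	// The hostile value never becomes part of the SQL text.
	q, args, err := buildFeatureQuery("group_name", "device'; DROP TABLE feature; --")
	fmt.Println(q, args, err)
}
```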

Implement API UpdateFeature

API Description
UpdateFeature updates the specified feature in OneStore. Currently, users can only update the description of the feature.

API Definition

UpdateFeature(ctx context.Context, opt types.UpdateFeatureOpt) error

API Usage

err := store.UpdateFeature(ctx, opt)

refactor featctl CLI

Check list

Issue Status Test API
#118 featctl init
#120 featctl register entity user --length 32 --description "user-related features"
#146 featctl register group profile --entity user
#121 featctl register feature city --group profile --value-type "varchar(64)"
#122 featctl list entity
#123 featctl list group --entity device
#125 featctl list feature --entity user --group profile
#127 featctl list revision --group profile
#128 featctl describe entity --name user
#129 featctl describe group --name profile
#130 featctl describe feature --name city
#167 featctl update entity device --description "registered device"
#169 featctl update group info --description "basic device info"
#168 featctl update feature model --description "device model"
#124 featctl import
#131 featctl export --group profile --revision 1633449600
#132 featctl get online-features
#186 featctl join historical-features

API Implementation

Check list

Issue Status API
#72 Create a new OneStore
#85 Get an instance of OneStore
#73 Get an instance of feature
#74 Get current revision for group
#93 Walk a feature group
#98 Import a batch feature group
#77 List entities
#78 List features
#79 List revisions of feature group
#80 Get online feature values
#117 Get historical feature values
#76 Create an entity in store
#82 Create a batch feature in store
#83 Create a group in store
#84 Update a feature in store

Implement API Open

API Description

Open opens a OneStore workspace.

API Definition

Open(ctx context.Context, opt types.OneStoreOpt) (*OneStore, error)

API Usage

store, err := onestore.Open(ctx, opt)

Implement API CreateEntity

API description

CreateEntity receives entity_name and description, and creates an Entity instance.

API Definition

CreateEntity(ctx context.Context, entity, description string) (*types.Entity, error)

API Usage

entity, err := store.CreateEntity(ctx, "device", "entities related to device features such as phone_price, phone_model")

Implement API CreateBatchFeature

API Description
CreateBatchFeature creates a new batch feature in OneStore.

API Definition

CreateBatchFeature(ctx context.Context, opt types.CreateBatchFeatureOpt) (*types.Feature, error)

API Usage

feature, err := store.CreateBatchFeature(ctx, opt)

RFC: version control by feature group rather than feature

Currently, the features in a feature group can be at different revisions. This makes the query subcommand awkward to implement, because we need to locate the exact entity table for each feature, and those tables may differ.

Proposal: all features in the same feature group should always share the same revision.

Feature request: let `featctl export` support `--limit` option

Motivation

We will be able to export data directly to the TTY once #61 is merged. With a --limit option, users can easily and safely preview the feature entities:

$ featctl export -g device --limit 9 | csview
2021/09/28 18:13:51 connecting feature store ...
2021/09/28 18:13:51 retrieving source table ...
2021/09/28 18:13:51 downloading features ...
2021/09/28 18:13:51 succeeded.
+------------+----------------+-------+
| entity_key | model          | price |
+------------+----------------+-------+
| 1          | xiaomi-mix3    | 3999  |
| 2          | huawei-p40     | 5299  |
| 3          | oppo-r9        | 3999  |
| 4          | oppo-a37       | 1999  |
| 5          | vivo-y51       | 999   |
| 6          | apple-iphone11 | 4999  |
| 7          | apple-iphone12 | 5999  |
| 8          | huawei-meta40  | 6500  |
| 9          | xiaomi-mi11    | 4500  |
+------------+----------------+-------+
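
One way the row cap could behave, as a minimal sketch: the `limitRows` helper is hypothetical, and treating a limit of zero or less as "no limit" is an assumption of this example, not documented featctl behavior.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
)

// limitRows returns at most limit rows. A limit <= 0 means "no limit"
// (an assumption of this sketch).
func limitRows(rows [][]string, limit int) [][]string {
	if limit > 0 && limit < len(rows) {
		return rows[:limit]
	}
	return rows
}

func main() {
	header := []string{"entity_key", "model", "price"}
	rows := [][]string{
		{"1", "xiaomi-mix3", "3999"},
		{"2", "huawei-p40", "5299"},
		{"3", "oppo-r9", "3999"},
	}
	// Write the header followed by the first two rows as CSV to stdout.
	out := append([][]string{header}, limitRows(rows, 2)...)
	if err := csv.NewWriter(os.Stdout).WriteAll(out); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

In a real export the limit would more likely be pushed down into the SQL query (a LIMIT clause) so that unneeded rows are never fetched; the in-memory version above only illustrates the expected output shape.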

Implement API ListRevision

API Description
ListRevision returns a list of revisions, given group_name.

API Definition

ListRevision(ctx context.Context, groupName string) ([]*types.Revision, error)

API Usage

revisions, err := store.ListRevision(ctx, "device")

Refactor: rename featctl sub commands

featctl export         => featctl export
featctl import         => featctl import
featctl get feature    => featctl query
featctl describe       => featctl describe

featctl create feature => featctl create config
featctl set            => featctl update config
featctl list revisions => featctl list revisions
featctl list features  => featctl list features

Naming is the hardest thing; any suggestions are welcome.

@YikSanChan @jinghancc @lianxmfor

Implement API GetFeature

API Description
GetFeature receives feature_name, returns a feature instance.

API Definition

GetFeature(ctx context.Context, featureName string) (types.Feature, error)

API Usage

feature, err := store.GetFeature(ctx, "device_price")

Implement API ListEntity

API Description
ListEntity returns a list of entities.

API Definition

ListEntity(ctx context.Context) ([]*types.Entity, error)

API Usage

entities, err := store.ListEntity(ctx)

Implement API ListFeature

API Description
ListFeature returns a list of features, given entity_name or group_name.

API Definition

ListFeature(ctx context.Context, opt types.ListFeatureOpt) ([]*types.Feature, error)

API Usage

features, err := store.ListFeature(ctx, opt)

Feature: implement `featctl get feature` subcommand

Use featctl get feature to access the values of the specified features.

Proto Interface:

featctl get feature --group <group> --name <name1> [name2, ...] --key <key1> [key2, ...]

Example:

$ featctl get feature --group device --name model price --key 'c7f7f1dd' 'c0809ed6'
entity_key, model, price
c7f7f1dd, mix3, 4999.00
c0809ed6, iphone12, 5299.00

Implement API Create

API Description
Create creates a new OneStore workspace.

API Definition

Create(ctx context.Context, opt types.OneStoreOpt) error

API Usage

err := onestore.Create(ctx, opt)

Implement API CreateGroup

API Description
CreateGroup creates a new feature group in OneStore.

API Definition

CreateGroup(ctx context.Context, opt types.CreateGroupOpt) (*types.Group, error)

API Usage

group, err := store.CreateGroup(ctx, opt)

Implement API GetGroup

API Description
GetGroup receives group_name, returns the group.

API Definition

GetGroup(ctx context.Context, groupName string) (*types.Group, error)

API Usage

group, err := store.GetGroup(ctx, "device")

Implement API WalkFeatureValues

API Description
WalkFeatureValues traverses the specified revision of the feature group.

API Definition

WalkFeatureValues(ctx context.Context, opt types.WalkFeatureValuesOpt) error

API Usage

err := store.WalkFeatureValues(ctx, opt)

`featctl describe` does not work now

Actual:

$ featctl describe feature -g device -n price
2021/09/28 12:18:30 failed querying feature config, group_name=device, feature_name=price: expected slice but got struct

Expected:

$ featctl describe --group device --name price
Name:           price
Group:          device
Revision:       20210909
Status:         disabled
Category:       batch
ValueType:      int(11)
Description:    model price
RevisionsLimit: 3
CreateTime:     2021-09-10T15:20:43Z
ModifyTime:     2021-09-13T18:58:34Z

Create view for feature

To avoid redundancy, we store some feature properties only in the feature_group table, not in the feature table. For example, the entity_name and category fields are stored only in feature_group. Querying features by those fields therefore requires joining the feature and feature_group tables, which is cumbersome.

To get rid of the JOIN in such cases, we will use a database view. We will create a view rich_feature as follows:

CREATE VIEW rich_feature AS
SELECT
    f.name, f.group_name, f.value_type, f.description, f.create_time, f.modify_time,
    fg.entity_name, fg.category
FROM feature AS f
LEFT JOIN feature_group AS fg
ON f.group_name = fg.name;

Implement API GetOnlineFeatureValues

API Description
GetOnlineFeatureValues takes in a list of entity_keys and a list of feature_names, returns a matrix of feature values.

API Definition

GetOnlineFeatureValues(ctx context.Context, featureNames []string, entityKeys []string) ([]*types.FeatureValue, error)

API Usage

featureValues, err := store.GetOnlineFeatureValues(ctx, featureNames, entityKeys)
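
To illustrate the "matrix" shape described above, here is a minimal in-memory sketch with one row per entity key and one column per feature name. The `onlineValues` function and the nested-map store are hypothetical stand-ins for the online store, and returning an empty string for a missing value is an assumption.

```go
package main

import "fmt"

// onlineValues builds a matrix with one row per entity key and one column per
// feature name. store is a hypothetical in-memory stand-in for the online
// store: entity key -> feature name -> value. Missing values are left as "".
func onlineValues(store map[string]map[string]string, featureNames, entityKeys []string) [][]string {
	matrix := make([][]string, len(entityKeys))
	for i, key := range entityKeys {
		row := make([]string, len(featureNames))
		for j, name := range featureNames {
			row[j] = store[key][name] // reading from a nil inner map yields ""
		}
		matrix[i] = row
	}
	return matrix
}

func main() {
	store := map[string]map[string]string{
		"c7f7f1dd": {"model": "mix3", "price": "4999.00"},
		"c0809ed6": {"model": "iphone12", "price": "5299.00"},
	}
	fmt.Println(onlineValues(store, []string{"model", "price"}, []string{"c7f7f1dd", "c0809ed6"}))
}
```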

[Discussion] how to export the methods of the database package

The Open method of the database package currently returns an sqlx object by default. This means that the methods TableExists and ColumnInfo are not directly available to callers, so some refactoring may be needed. There are two possible directions.

Direction one

Change the exported object of the database package from sqlx.DB to database.DB

type DB struct {
	*sqlx.DB
}

func Open(option *Option) (DB, error) {
	db, err := sqlx.Open(
		"mysql",
		fmt.Sprintf("%s:%s@tcp(%s:%s)/%s?parseTime=true",
			option.User,
			option.Pass,
			option.Host,
			option.Port,
			option.DbName),
	)
	return DB{db}, err
}

Direction two

Replace the methods of the DB object with functions

func TableExists(ctx context.Context, db *sqlx.DB, table string) (bool, error) {
}

func ColumnInfo(ctx context.Context, db *sqlx.DB, table string, column string) (Column, error) {
}
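
The two directions can be compared side by side in a small runnable sketch. To stay self-contained it uses a hypothetical `conn` type as a stand-in for `*sqlx.DB`, and the helper bodies are placeholders rather than real schema queries.

```go
package main

import "fmt"

// conn is a stand-in for *sqlx.DB so the sketch runs without a database.
type conn struct{ tables map[string]bool }

// Ping mimics a method that callers already use on the underlying handle.
func (c *conn) Ping() error { return nil }

// DB follows direction one: embedding promotes every method of the inner
// handle, and extra helpers become methods on the wrapper.
type DB struct{ *conn }

// TableExists as a method on the wrapper (direction one).
func (db DB) TableExists(table string) bool { return db.tables[table] }

// TableExists as a plain function taking the handle (direction two).
func TableExists(c *conn, table string) bool { return c.tables[table] }

func main() {
	db := DB{&conn{tables: map[string]bool{"feature": true}}}
	// Direction one: promoted method and wrapper helper on the same value.
	fmt.Println(db.Ping(), db.TableExists("feature"))
	// Direction two: callers keep the raw handle and pass it explicitly.
	fmt.Println(TableExists(db.conn, "feature_group"))
}
```

With embedding, existing call sites keep working while the helpers travel with the handle. The function style avoids a new exported type, but every caller must pass the handle around.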

Implement API ImportBatchFeatures

API Description
ImportBatchFeatures imports batch feature values into OneStore. Batch features should be imported in groups.

API Definition

ImportBatchFeatures(ctx context.Context, opt types.ImportBatchFeaturesOpt) error

API Usage

err := store.ImportBatchFeatures(ctx, opt)
