
dataman's Introduction

DataMan

A data service-- which has:

- schema enforcement
- replication
- geo-distribution / load-balancing
- caching (MUCH later ;) )
- archiving / deleting data
- security
- backups

The intention is to have a stack of "backend stores" that this unified API can talk with to store the actual data. As such, a lot of the features (schema, sharding, etc.) are done independently of the underlying store.
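As a rough illustration of that split, the query layer would program against a store interface that each backend implements. The sketch below is hypothetical Go; the Store interface and its method names are assumptions for illustration, not dataman's actual code:

package store

import "context"

// Record is a generic document/row, keyed by field name.
type Record map[string]interface{}

// Store is a hypothetical interface the unified API could program against;
// concrete backends (postgres, etc.) would implement it, while schema
// enforcement, sharding, etc. live above this boundary.
type Store interface {
	Get(ctx context.Context, db, collection string, pkey interface{}) (Record, error)
	Set(ctx context.Context, db, collection string, record Record) error
	Delete(ctx context.Context, db, collection string, pkey interface{}) error
	Filter(ctx context.Context, db, collection string, filter map[string]interface{}) ([]Record, error)
}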

dataman's People

Contributors

jacksontj, svrana, tvirgl-wish

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

dataman's Issues

Create CLI for running integration tests

To run tests today you need to run go test in the integration tests directory. The structure of the test suites actually only requires some config files and a directory of tests. There is no reason to require the user to put all their tests into the upstream dataman repo -- ideally there'd be a CLI with flags for the various config files and the directory of tests to run.
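A minimal sketch of what such a CLI could look like; the flag names (-schema, -storage-config, -test-dir) are assumptions for illustration, not an existing tool:

package main

import (
	"flag"
	"fmt"
	"os"
)

func main() {
	// Hypothetical flags; the real CLI would decide the exact names.
	schema := flag.String("schema", "schema.json", "path to the schema config file")
	storage := flag.String("storage-config", "storagenode.json", "path to the storage node config")
	testDir := flag.String("test-dir", "./tests", "directory containing the test suites")
	flag.Parse()

	if _, err := os.Stat(*testDir); err != nil {
		fmt.Fprintf(os.Stderr, "test dir %q not found: %v\n", *testDir, err)
		os.Exit(1)
	}
	fmt.Printf("would run suites in %s with schema=%s storage=%s\n", *testDir, *schema, *storage)
}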

Allow new keys for _document type

I have a column in my postgres table that is of datatype json. However, I do not know what the keys in this column would be, as the user will specify this. I want this column to act as a counter dictionary. Therefore, the operations I need would be:

  1. Insert a new key/value pair if the key is not in the dict
  2. Update the value if the key is in the dict
  3. Increment the value if the key is in the dict

I have tried to do this with the _document datatype:

"counter_json": {
  "name": "counter_json",
  "field_type": "_document",
  "provision_state": 3
},

with an update operation:

{
  "Type": "update",
  "Args": {
    "db": "event_instance_period",
    "collection": "event_base",
    "filter": {
      "event_instance_id": evt.EventInstanceId,
      "start_time":        evt.StartTime,
      "updated":           evt.Updated,
      "end_time":          evt.EndTime
    },
    "record": {
      "count":    1
    },
    "record_op": {
      "counter_json.a": ["+", 1]
    }
  }
}

However, dataman returns an error: record_op field counter_json.a doesn't exist in collection.
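For reference, all three of the operations above can be collapsed into a single statement on the Postgres side. This is only a sketch of the kind of SQL a backend would need to emit, assuming the column is (or can be cast to) jsonb; it is not how dataman handles record_op today, and the WHERE clause is simplified to a made-up _id:

package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "dbname=event_instance_period sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Insert the key if missing, otherwise increment it, in one statement.
	// The json column is cast to jsonb so jsonb_set/to_jsonb can be used.
	const q = `
UPDATE event_base
SET counter_json = jsonb_set(
        COALESCE(counter_json::jsonb, '{}'::jsonb),
        '{a}',
        to_jsonb(COALESCE((counter_json->>'a')::int, 0) + 1)
    )::json
WHERE _id = $1`
	if _, err := db.Exec(q, 42); err != nil {
		log.Fatal(err)
	}
}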

Allow user to specify constraint for Set operation

Currently when performing a Set operation on a collection, the constraint automatically defaults to the primary key. For example, the query generated would be something like this:

INSERT INTO <collection> (<columns>) VALUES (<values>) ON CONFLICT (_id) DO UPDATE SET ...

It would be great to have the ability to specify which constraint to use for the set operation. If the collection has a unique constraint, the user should be able to do set operations against it:

INSERT INTO <collection> (<columns>) VALUES (<values>) ON CONFLICT (<constraint>) DO UPDATE SET ...

This can be implemented by accepting an additional argument, constraint, when calling the operation. It can be either a list of column names corresponding to the constraint, or just the constraint name itself.

{
  "Type": "set",
  "Args": {
    "db": "event_sum",
    "collection": "event_base",
    "record": {
      "service_id":          evt.ServiceId,
      "event_type":          evt.EventType,
      "event_name":          evt.EventName,
      "processed_data":      evt.ProcessedData,
      "processed_data_hash": evt.ProcessedDataHash
    },
    "constraint": ["service_id", "event_type", "processed_data_hash"]
  }
}

Allow user to do 'LIKE' operation

Hello, could you add a 'LIKE' operation so that dataman can support queries similar to the following:
select name from event_group where name LIKE "%Group1%"

Batch delete API

This is kind of related to #47. If we decide to support multi deletes, then we don't need a batch API

Otherwise, it'd be a nice QoL improvement to allow you to send a batch of primary keys to delete

Create Makefile

with targets for:

  • release: build a release set of binaries
  • test: run all tests in the repo
  • fmt: format all code and tests (including the JSON in there)

Have "set" check for pkey

TL;DR: need to add a pkey check before validating that the record is valid for an update / insert

Right now the set operation checks whether it is valid as an insert or an update. It's possible to create a set which is missing the pkey, which is invalid as an insert but passes the update validation. Since set is supposed to be an upsert for a single record-- this should error with that message.
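A minimal sketch of the missing check, assuming a hypothetical Record type and that the collection's pkey field name is known:

package storagenode

import "fmt"

// Record is a generic row keyed by field name.
type Record map[string]interface{}

// checkPkey is a hypothetical pre-validation step for "set": an upsert of a
// single record only makes sense if the record actually carries its pkey.
func checkPkey(record Record, pkeyField string) error {
	if _, ok := record[pkeyField]; !ok {
		return fmt.Errorf("set: record is missing primary key field %q; "+
			"set must address a single record", pkeyField)
	}
	return nil
}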

Need bytea type for binary/bytes data

FATA[0002] Unknown postgres data_type bytea in file map[column_name:data data_type:bytea character_maximum_length:<nil> is_nullable:YES column_default:<nil>]
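The fix is presumably to teach the pg schema introspection that bytea maps to a bytes/binary field type. A minimal sketch of that mapping; normalizeType and the field type names are assumptions rather than dataman's actual code:

package pgstore

import "fmt"

// normalizeType is a hypothetical mapping from postgres data_type values to
// dataman-style field types; "bytea" would map to a binary/bytes type.
func normalizeType(pgType string) (string, error) {
	switch pgType {
	case "text", "character varying":
		return "string", nil
	case "integer", "bigint":
		return "int", nil
	case "json", "jsonb":
		return "_document", nil
	case "bytea":
		return "bytes", nil // new: binary data
	default:
		return "", fmt.Errorf("unknown postgres data_type %s", pgType)
	}
}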

Option for multi deletes

MongoDB allows you to delete() multiple documents based on fields that don't include the primary key.

If we don't want to support this, we can get around it on the client side by querying the docs, then deleting them. If you'd rather go that route, feel free to close this.
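The client-side workaround described above (filter the docs, then delete each by pkey) would look roughly like this; the client type and its Filter/Delete helpers are made up for illustration, not dataman's actual client API:

package main

import "log"

// Record is a generic row keyed by field name.
type Record map[string]interface{}

// client is a stand-in for a real dataman client.
type client struct{}

func (c *client) Filter(db, collection string, filter map[string]interface{}) ([]Record, error) {
	return nil, nil // placeholder
}

func (c *client) Delete(db, collection string, pkey interface{}) error {
	return nil // placeholder
}

func main() {
	c := &client{}
	// 1) query the matching docs by a non-pkey field
	docs, err := c.Filter("event_sum", "event_base", map[string]interface{}{"service_id": 7})
	if err != nil {
		log.Fatal(err)
	}
	// 2) delete them one at a time by primary key
	for _, d := range docs {
		if err := c.Delete("event_sum", "event_base", d["_id"]); err != nil {
			log.Fatal(err)
		}
	}
}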

Support pgstore-side projections for subfields

In pg, if you select something like foo->'bar'->>'baz' the column name is not a "normal" name, so the current util method doesn't unpack it into the record properly. Need to see how hard this will be; if it's enough work, this might be the time to redo the sql converter to use $1 etc. all over.

The idea I have right off the top of my head is that pg results are always in the order of the select (assuming you gave one) -- if so, we could provide a slice (in the same order) of "addresses" inside the record for the results to go into, then the util.go stuff could just use <record>.Set()
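A rough sketch of that "addresses" idea, where Set and unpackRow are hypothetical helpers rather than the existing util.go code:

package pgstore

// Record is a generic row keyed by field name.
type Record map[string]interface{}

// Set walks a field path (e.g. []string{"foo", "bar", "baz"} for
// foo->'bar'->>'baz') and stores the value at that address, creating
// intermediate maps as needed.
func (r Record) Set(path []string, value interface{}) {
	cur := map[string]interface{}(r)
	for _, k := range path[:len(path)-1] {
		next, ok := cur[k].(map[string]interface{})
		if !ok {
			next = map[string]interface{}{}
			cur[k] = next
		}
		cur = next
	}
	cur[path[len(path)-1]] = value
}

// unpackRow pairs the raw column values (always in select order) with the
// caller-provided addresses for each projected subfield.
func unpackRow(values []interface{}, addresses [][]string) Record {
	r := Record{}
	for i, v := range values {
		r.Set(addresses[i], v)
	}
	return r
}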

Group by function, sum()

  • GROUP BY function and SUM() function from SQL (group aggregation with sum)
  • Would be very useful, especially with Grafana

Make integration tests sharding agnostic

The long-term goal is to have dataman abstract away the specifics of the sharding from the client. As such we should be able to run the same set of tests against N sharding configurations.

Not escaping quotes on `text` fields

Not sure if this is fixed in the latest version, but:

ERROR:

"
Dataman SET: ERROR: Error running query: Err=pq: syntax error at or near "Helvetica" query=INSERT INTO "public".XXXX ("_id","...) VALUES (111,null,null,82,'2018-12-27 03:49:11',null,7,8,51,'XXXXX: ....',29,69,'
"

Content has a single quote in this part of the data:

"
body itemscope itemtype="http://schema.org/EmailMessage" style="font-family: 'Helvetica Neue',Helvetica,Arial,sans-serif; box-sizing: border-box; font-size: 14px; -webkit-font-smoothing: antialiased; -webkit-text-size-adjust: none; width: 100% !important; height: 100%; line-height: 1.6em; background-color: #f6f6f6; margin: 0;"
"

Will work around by sanitizing it myself for now.
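The underlying fix for this class of bug is to bind values as query parameters instead of splicing them into the SQL string. A minimal sketch with lib/pq placeholders (the messages table and its columns are made up):

package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "dbname=example sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// The body can contain single quotes ('Helvetica Neue', etc.) safely,
	// because it is passed as a bind parameter, not spliced into the query.
	body := `font-family: 'Helvetica Neue',Helvetica,Arial,sans-serif;`
	if _, err := db.Exec(`INSERT INTO messages (_id, body) VALUES ($1, $2)`, 111, body); err != nil {
		log.Fatal(err)
	}
}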

Unique constraint violations return as a general query error; should be a ValidationError map so we know which key was in violation

Error running query: Err=pq: duplicate key value violates unique constraint "constraint_name"

Currently this is returned as Error; I think it should be ValidationError, so that I get the map of fields with this specific field in it. Otherwise I can't know which field it may have been without doing a ton of queries and DB schema inspections (constraint fields, each field's value, test my field values, see which are in violation).
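One way this could be surfaced, sketched with lib/pq: unique violations come back as *pq.Error with SQLSTATE 23505 and the violated constraint name, which could be translated into a ValidationError map. The ValidationError type and classifyError are hypothetical here, and mapping the constraint name back to its column list would be a schema lookup that is elided:

package storagenode

import (
	"github.com/lib/pq"
)

// ValidationError maps field (or constraint) names to human-readable errors.
type ValidationError map[string]string

// classifyError turns a unique-violation from postgres into a ValidationError
// keyed by the violated constraint, instead of an opaque query error.
func classifyError(err error) (ValidationError, bool) {
	pqErr, ok := err.(*pq.Error)
	if !ok {
		return nil, false
	}
	if pqErr.Code.Name() == "unique_violation" { // SQLSTATE 23505
		return ValidationError{pqErr.Constraint: pqErr.Detail}, true
	}
	return nil, false
}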

Filter by columns from joined tables

Hello, I was wondering if it is possible to filter by columns from tables joined with the "join" argument? For example, something like this:

{
  "Type": "filter",
  "Args": {
    "db": "event_sum",
    "collection": "event_instance_period",
    "join": ["event_instance_id", "event_instance_id.event_base_id"],
    "filter": {
      "start_time":        [">", evt.StartTime],
      "end_time":          ["<", evt.EndTime],
      "event_instance_id.event_base_id.service_id": ["=", evt.ServiceId]
    }
  }
}

event_instance_id joins a foreign table event_instance, and event_instance.event_base_id joins another foreign table event_base. I need to filter by event_base.service_id.

Right now, the error that occurs when I perform the operation above is: SubField "event_instance_id"->>'event_base_id' doesn't exist in event_instance_period: map[]

Let me know if you need clarification on anything 👍

Query limiting system

As a centralized data system, it's likely that we'll want to enforce some rules about the queries that are sent. An example would be: no filter queries without a limit. Ideally this would be a rule-driven system (to avoid the need for code) that can be reloaded (to avoid restarts and downtime).
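A hedged sketch of what such a reloadable, rule-driven check could look like; the Rule struct and its fields are invented for illustration:

package querylimit

import "fmt"

// Rule is a hypothetical, config-driven constraint on incoming queries.
// Rules would be loaded from config and hot-reloaded so changes don't
// require a restart.
type Rule struct {
	QueryType    string // e.g. "filter"
	RequireLimit bool   // reject filter queries that have no limit
	MaxLimit     int    // 0 means unlimited
}

// Check applies a rule to a query's args before it is executed.
func (r Rule) Check(queryType string, args map[string]interface{}) error {
	if queryType != r.QueryType {
		return nil
	}
	limit, hasLimit := args["limit"].(int)
	if r.RequireLimit && !hasLimit {
		return fmt.Errorf("%s queries must set a limit", queryType)
	}
	if r.MaxLimit > 0 && hasLimit && limit > r.MaxLimit {
		return fmt.Errorf("limit %d exceeds max %d", limit, r.MaxLimit)
	}
	return nil
}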
