
gocassa


Gocassa is a high-level library on top of gocql.

Current version: v1.4.0

Compared to gocql it provides query building, adds data binding, and provides easy-to-use "recipe" tables for common query use-cases. Unlike cqlc, it does not use code generation.

For docs, see: https://godoc.org/github.com/gocassa/gocassa

Usage

Below is a basic example showing how to connect to a Cassandra cluster and set up a simple table. For more advanced examples, see the "Table Types" section below.

package main

import (
    "fmt"
    "time"
    
    "github.com/gocassa/gocassa"
)

type Sale struct {
    Id          string
    CustomerId  string
    SellerId    string
    Price       int
    Created     time.Time
}

func main() {
    keySpace, err := gocassa.ConnectToKeySpace("test", []string{"127.0.0.1"}, "", "")
    if err != nil {
        panic(err)
    }
    salesTable := keySpace.Table("sale", &Sale{}, gocassa.Keys{
        PartitionKeys: []string{"Id"},
    })

    err = salesTable.Set(Sale{
        Id: "sale-1",
        CustomerId: "customer-1",
        SellerId: "seller-1",
        Price: 42,
        Created: time.Now(),
    }).Run()
    if err != nil {
        panic(err)
    }

    result := Sale{}
    if err := salesTable.Where(gocassa.Eq("Id", "sale-1")).ReadOne(&result).Run(); err != nil {
        panic(err)
    }
    fmt.Println(result)
}

You can pass additional options to a gocassa Op to further configure your queries. For example, the following query orders the results by the field "Name" in descending order and limits the results to a total of 100.

err := salesTable.List("seller-1", nil, 0, &results).WithOptions(gocassa.Options{
    ClusteringOrder: []gocassa.ClusteringOrderColumn{
        {gocassa.DESC, "Name"},
    },
    Limit: 100,
}).Run()

Table Types

Gocassa provides multiple table types with their own unique interfaces:

  • a raw CQL table called simply Table - this lets you run pretty much any query imaginable
  • a number of single-purpose 'recipe' tables (Map, Multimap, TimeSeries, MultiTimeSeries, MultiMapMultiKey), which aim to help the user with a simplified interface tailored to a given common query use case

Table

    salesTable := keySpace.Table("sale", &Sale{}, gocassa.Keys{
        PartitionKeys: []string{"Id"},
    })
    result := Sale{}
    err := salesTable.Where(gocassa.Eq("Id", "sale-1")).ReadOne(&result).Run()

link to this example

MapTable

MapTable provides only very simple CRUD functionality:

    // …
    salesTable := keySpace.MapTable("sale", "Id", &Sale{})
    result := Sale{}
    err := salesTable.Read("sale-1", &result).Run()

link to this example

Read, Set, Update, and Delete all happen by "Id".

MultimapTable

MultimapTable can list rows filtered by equality of a single field (e.g. list sales by their SellerId):

    salesTable := keySpace.MultimapTable("sale", "SellerId", "Id", &Sale{})
    // …
    results := []Sale{}
    err := salesTable.List("seller-1", nil, 0, &results).Run()

link to this example

For examples of how to do pagination or Update with this table, refer to the example linked under the code snippet.

TimeSeriesTable

TimeSeriesTable provides an interface to list rows within a time interval:

    salesTable := keySpace.TimeSeriesTable("sale", "Created", "Id", &Sale{}, 24 * time.Hour)
    //...
    results := []Sale{}
    err := salesTable.List(yesterdayTime, todayTime, &results).Run()

MultiTimeSeriesTable

MultiTimeSeriesTable is like a cross between MultimapTable and TimeSeriesTable. It can list rows within a time interval, and filtered by equality of a single field. The following lists sales in a time interval, by a certain seller:

    salesTable := keySpace.MultiTimeSeriesTable("sale", "SellerId", "Created", "Id", &Sale{}, 24 * time.Hour)
    //...
    results := []Sale{}
    err := salesTable.List("seller-1", yesterdayTime, todayTime, &results).Run()

MultiMapMultiKeyTable

MultiMapMultiKeyTable can perform CRUD operations on rows filtered by equality of multiple fields (e.g. read a sale by its city, SellerId, and sale Id):

    salePartitionKeys := []string{"City"}
    saleClusteringKeys := []string{"SellerId", "Id"}
    salesTable := keySpace.MultimapMultiKeyTable("sale", salePartitionKeys, saleClusteringKeys, Sale{})
    // …
    result := Sale{}
    saleFieldCity := salePartitionKeys[0]
    saleFieldSellerId := saleClusteringKeys[0]
    saleFieldSaleId := saleClusteringKeys[1]

    field := make(map[string]interface{})
    id := make(map[string]interface{})

    field[saleFieldCity] = "London"
    id[saleFieldSellerId] = "141-dasf1-124"
    id[saleFieldSaleId] = "512hha232"

    err := salesTable.Read(field, id, &result).Run()

Encoding/Decoding data structures

When setting structs in gocassa the library first converts your value to a map. Each exported field is added to the map unless

  • the field's tag is "-", or
  • the field is empty and its tag specifies the "omitempty" option

Each field's default name in the map is the field name, but this can be overridden in the struct field's tag value. The "cql" key in the struct field's tag value is the key name, followed by an optional comma and options. Examples:

// Field is ignored by this package.
Field int `cql:"-"`
// Field appears as key "myName".
Field int `cql:"myName"`
// Field appears as key "myName" and
// the field is omitted from the object if its value is empty,
// as defined above.
Field int `cql:"myName,omitempty"`
// Field appears as key "Field" (the default), but
// the field is skipped if empty.
// Note the leading comma.
Field int `cql:",omitempty"`
// All fields in the EmbeddedType are squashed into the parent type.
EmbeddedType `cql:",squash"`

When encoding maps with non-string keys, the key values are automatically converted to strings where possible; it is nevertheless recommended to use string keys (for example map[string]T).

Troubleshooting

Table names that are too long

In case you get the following error:

Column family names shouldn't be more than 48 characters long (got "somelongishtablename_multitimeseries_start_id_24h0m0s")

You can use the TableName option to override the default generated name:

tbl = tbl.WithOptions(Options{TableName: "somelongishtablename_mts_start_id_24h0m0s"})


gocassa's Issues

Add batch support

Batches are used to make several inserts in an atomic way. They should always be used when people want to keep consistency between different tables.

Consider decoupling statement generation and statement execution

I think it's a bit weird that in order to use basically any functionality in Gocassa, one has to be connected to Cassandra. The snippet in the README highlights this:

keySpace, err := gocassa.ConnectToKeySpace("test", []string{"127.0.0.1"}, "", "")

i.e. getting a keyspace object requires connecting to that keyspace. In most ORMs, the query building logic is a completely separate layer from the query execution logic, i.e. I can use most of the functionality of the ORM without ever being connected to the database; it's only at execution time that I need to obtain a connection. It would be great if we could consider the possibility of moving to that model with gocassa. I see the rationale as follows:

  • When my service starts up, I don't necessarily immediately want to connect to Cassandra. Right now, most services will attempt to connect when a handler or other function is invoked that requires a connection. The current model seems to force me into a service panic when it starts up if it can't connect to C*

    I can get around this somewhat with an idempotent connect() function inside each data access method, that tries to setup the keyspaces/tables, but this doesn't feel like I should have to deal with this in my service code.

  • For keyspace and table creation, I need to be able to get the CQL to execute without ever connecting to C*.

  • I can see use-cases where I might want to see what CQL is going to be executed without ever executing it.

  • I should be able to swap out the underlying connection at runtime for a mock or similar.

Anyway, I just wanted to put this here so we have a proper place to discuss the proposal and status.


Full HipChat conversation from yesterday:

Oliver Beattie: I'm a bit concerned about how being able to get a table or keyspace object is tied to having a Cassandra connection. Why is this? It seems to force us into a pattern of "if we can't connect to C*, the only option is to panic" which doesn't feel ideal. In most ORMs, the stuff that actually builds a query is a separate layer to the components that execute that query (ie. you can build a query without being connected).
John Worrell: true I suppose, but as soon as you try to execute it, then you get the panic
when you say "build a query" I assume you mean make a string
Andreas Garnaes: I think creating the session could transparently be moved to where a query is actually being executed.
Oliver Beattie: i mean defining schema, and building the query (yes, the string, and its bind parameters)
the thing is, I don't want to have to panic
I want to be able to return an error (to requests), like we do now
usually you would decouple query building and query execution
this also means in the case of testing, you just replace/mock the connection itself, and the query building/data binding logic does not change
i can sort of get around it by having a connect() function that is idempotent, and called in all the data storage/retrieval methods, but it's crufty
i may also want to get a compiled CQL statement without executing it (which again should not require a connection to C*)
Andreas Garnaes: Moving session creation to execution time would allow returning errors. It would just be a matter of using keyspace.session() instead of keyspace.session. Wrt testing I'd prefer the current approach of in-memory Table instead anyway.
Andreas Garnaes: I see the point of separating CQL generation and execution, it just seems to complicate the interfaces (unnecessarily?).
Oliver Beattie: how is that?
Oliver Beattie: basically, i don't know anything of the internals, but that getting a keyspace means getting a connection, doesn't make sense to me
Andreas Garnaes: I think that's a good point which should be addressed.
Dafydd Woods: lol
Asim Aslam: +1 to separation of keyspace and connection logic. I also found this a bit weird. Think the interface needs to be more aligned with most database drivers.
Oliver Beattie: other use of this is for CQL schema creation: presumably we need a service to be able to report the CQL needed to create its schema without it having connected to a keyspace in the first place
which means that the schemas have to be defined before server.Run(), and it cannot depend on having a C* connection

Consistency level defaults to ONE and is difficult to override

The cluster consistency is set to ONE by default when creating a new gocql backend here:
https://github.com/hailocab/gocassa/blob/ebff9ff045fa0c728d6a3f2830a6fa17867c0587/gocql_backend.go#L42-L44

  • It's not documented anywhere
  • To replace this, the user must provide a whole new QueryExecutor, which is very inconvenient
  • The inline documentation explicitly states that the QueryExecutor should only be replaced for mocking or testing purposes here
  • The default in gocql is Quorum

OneToManyTable returns struct pointers regardless of the declared type

I'm using a OneToManyTable, which I've declared like so:

queriesTable = keyspace.OneToManyTable("queries", "Status", "Id", storedQuery{})

As you can see, the declared row type is storedQuery. However, when I call List() (and possibly other methods too), I get *storedQuery as the result type:

panic: Unhandled response type: &dao.storedQuery{Q:"EVENT SEQ(a b);", Id:"0f32b15e-1cbc-402c-628b-be03bdb7f937", Status:0}

Is this expected behaviour?

† The panic is not in the gocassa library itself, but originates from a type switch I have in my service.

One to many table: update is not working

type Race struct {
    DriverId     int
    RaceId       int
    CustomerName string
    Fare         int
}

mymap := map[string]interface{}{
    "Fare": 10000,
}
err = races.Update(1, 1, mymap)

returns

UPDATE juju SET Fare = ?, Fare = ? [1 1]

instead of

UPDATE juju SET Fare = ? WHERE driverid = ? AND raceid = ? [10000 1 1]

Mock table cluster ordering wrong

Hello.

I think the mock table's cluster ordering is incorrect. I expect it to be in ascending order.

Here is some CQL that is similar to the TestTableRead test:

DROP TABLE Foo;
CREATE TABLE Foo (
  Pk1 int,
  PK2 int,
  Ck1 int,
  Ck2 int,
  Name text,
  PRIMARY KEY ((Pk1, Pk2), Ck1, Ck2)
);

-- u1
insert into Foo (Pk1, PK2, Ck1, Ck2, Name) values (1,1,1,1,'John');

-- u2
insert into Foo (Pk1, PK2, Ck1, Ck2, Name) values (1,2,1,1,'Joe');

-- u3
insert into Foo (Pk1, PK2, Ck1, Ck2, Name) values (1,1,2,1,'Josh');

-- u4
insert into Foo (Pk1, PK2, Ck1, Ck2, Name) values (1,1,1,2,'Jane');

-- u5
insert into Foo (Pk1, PK2, Ck1, Ck2, Name) values (2,1,1,1,'Jill');

select * from Foo where Pk1 = 1 and Pk2 = 1;

 pk1 | pk2 | ck1 | ck2 | name
-----+-----+-----+-----+------
   1 |   1 |   1 |   1 | John
   1 |   1 |   1 |   2 | Jane
   1 |   1 |   2 |   1 | Josh

It sorts ascending, comparing the clustering columns in order. This differs from the mock table's results, which expect John, Josh, Jane.

    s.NoError(s.tbl.Where(Eq("Pk1", 1), Eq("Pk2", 1)).Read(&users).Run())
    s.Equal([]user{u1, u3, u4}, users)

Please let me know if I am overlooking something.

Thanks,
Jeff
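The ascending order Cassandra produces above can be reproduced with a comparator that walks the clustering columns in order. This stdlib sketch is only an illustration of the expected comparison (row and sortByClusteringKeys are made up for the example, not the mock table's code):

```go
package main

import (
	"fmt"
	"sort"
)

type row struct {
	Ck1, Ck2 int
	Name     string
}

// sortByClusteringKeys orders rows the way Cassandra does within a
// partition: compare Ck1 first, then Ck2, both ascending.
func sortByClusteringKeys(rows []row) {
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].Ck1 != rows[j].Ck1 {
			return rows[i].Ck1 < rows[j].Ck1
		}
		return rows[i].Ck2 < rows[j].Ck2
	})
}

func main() {
	rows := []row{
		{1, 1, "John"},
		{2, 1, "Josh"},
		{1, 2, "Jane"},
	}
	sortByClusteringKeys(rows)
	for _, r := range rows {
		fmt.Println(r.Name) // John, Jane, Josh
	}
}
```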

The Relation struct is very opaque

The Relation struct does not export any fields and has no methods to examine its state. This makes it hard to mock Relation outside the cmagic package.

Can't insert with Set command.

Hi @hailobackup

I really like you idea about gocassa. But I have a problem when insert new row to table.

First I create a table by code below:


import (
    .....
    "github.com/hailocab/gocassa"
    .....
)
.......
func (r *Resource) GetCassandraTable(name string, e interface{}, p, c []string) gocassa.Table {
    options := gocassa.Options{TableName: name}
    for _, k := range c {
        options = options.AppendClusteringOrder(k, true)
    }
    table := r.Cassandra.Table(name, e, gocassa.Keys{
        PartitionKeys:     p,
        ClusteringColumns: c,
    }).WithOptions(options)
    createIf(table)
    return table
}
.....

Code create table:

r.UserAlbum = r.GetCassandraTable("user_albums", cassandra.UserAlbum{}, []string{"user"}, []string{"album"})

But when create new row:

userAlbum := cassandra.UserAlbum{}
userAlbum.User = a.CreatedBy
userAlbum.Album = gocql.TimeUUID()
userAlbum.CreatedAt = a.CreatedAt
userAlbum.Bot = a.Bot
err = r.Resource.UserAlbum.Set(userAlbum).Run()

I got error: _PRIMARY KEY part user found in SET part_

But when I changed the source file at gocassa/table.go line 172 from updateOpType to insertOpType, it works. Also, I want to know what the function _removeFields_ is used for (line 161).

Thank you.

Lower case for all the metadata

When someone is using magic, they have to use a struct. Names use camel case in Go; we should transform them to lower case for Cassandra.

Cassandra is case sensitive and it's a pain to handle.

New time series recipe

We need another time series recipe: one where we can store something keyed by time. It's a mix between the one-to-many recipe and the current time series recipe.

Example:

create table areaEvent (
    area text,
    bucket_t timestamp,
    t timestamp,
    uuid uuid,
    value blabla,
    primary key ((area, bucket_t), t, uuid)
);

select area, t, value from areaEvent where area = 'LON' and bucket_t in (?) and t > 'XXX'

Cannot delete rows from tables with custom TableName

I have a table files with partition key hash. gocassa is configured with the TableName option:

options := gocassa.Options{
  TableName: "files",
}
table := keyspace.Table("files", &File{}, keys).WithOptions(options)

When deleting a row using gocassa it gives the following error. Insertion and updates on the same table work fine.

unconfigured columnfamily files__hash

Rename the "SET" method to "REPLACE"

"SET" doesn't give the full picture of what it actually does, which (when using an object) will completely overwrite any previous data.

"REPLACE" tells the whole story.

Time series: the bucket doesn't have the right value

package main

import (
    //"fmt"
    gocql "github.com/gocql/gocql"
    c "github.com/hailocab/cmagic"
    "time"
)

type Ts struct {
    T          time.Time
    InternalId gocql.UUID
    RaceId     int
    Fare       int
}

func main() {
    // connect to a keyspace
    k, err := c.ConnectToKeySpace("juju", []string{"127.0.0.1"}, "", "")
    if err != nil {
        panic(err)
    }
    // Debug mode
    k.DebugMode(true)
    // using a time series table
    times := k.TimeSeriesTable("ts", "T", "InternalId", time.Hour, Ts{})

    uuid, err := gocql.RandomUUID()
    if err != nil {
        panic(err)
    }
    // writing a row
    err = times.Set(Ts{
        T:          time.Now(),
        InternalId: uuid,
        RaceId:     1,
        Fare:       10,
    })
    if err != nil {
        panic(err)
    }

    // read a row
    //row, err := times.Read(timeStamp, "")
    //fmt.Println(row)
}

INSERT INTO ts (t, internalid, raceid, fare, bucket) VALUES (?, ?, ?, ?, ?) [2015-01-19 13:33:27.723319424 +0000 GMT 70b517b6-d34e-4f7b-8339-280240364c15 1 10 1421672400]

 bucket                   | t                        | internalid                           | fare | raceid
--------------------------+--------------------------+--------------------------------------+------+--------
 1970-01-17 11:54:32+0000 | 2015-01-19 13:33:27+0000 | 70b517b6-d34e-4f7b-8339-280240364c15 |   10 |      1

The bucket value is wrong.

Should be 2015-01-19 13:00:00+0000

Updating map fields is non additive and overwrites previous values

When passing a struct which contains a map field through the Update() method, the values in the map completely replace previous values in the cassandra map collection, rather than being merged. I would expect this behaviour on Set() as the entire struct is passed into gocassa, but I am unsure on the behaviour of Update() - I was expecting this to be additive.

It is of course entirely possible that I have missed how to do this, especially as I've literally just seen the Modifier type. If so, we need more docs on how to do this 😉

Contrived example:

type Basket struct {
    Id          string
    CustomerId  string
    Items    map[string]string
    //...
}

// Create new & save
basketTable.Set(Basket{
    Id: "2dcb132a",
    CustomerId: "bcfefb43",
    Items map[string]string{
        "db6998c3": "Red Bull",
        "cf30b742": "Cornetto",
    },
})

// Reading from C* would return {"db6998c3": "Red Bull", "cf30b742": "Cornetto"}

// Update just the map field with additional values
basketTable.Update("2dcb132a", map[string]interface{}{
    "Items": {
        "88d57b18": "Creme Egg",
    },
})

// C* now returns {"88d57b18": "Creme Egg"}

Time series List queries too many buckets

I have a table with an hourly time bucket.
I tried to list entries between 2015-01-19 14:10:05 and 2015-01-19 14:11:05.
The query should ask for one bucket, but it asks for two.

The same behaviour occurs with more than one bucket.

SELECT t, internalid, raceid, fare, bucket FROM juju.ts WHERE bucket IN (?, ?) AND t >= ? AND t <= ? [1421676000000 1421679600000 2015-01-19 14:10:05 +0000 UTC 2015-01-19 14:11:05 +0000 UTC]

Replace * with all the column names

In Cassandra (as in most databases), select * is less efficient than select a, b, c, ...
So for read queries, the library should always specify the columns.

One to many: how to delete a complete row?

The delete function asks for two values. It would be nice to be allowed to give only the partition key when we need to delete the entire row.

races.Delete(1, 1) ==> deletes a logical row
races.Delete(1, nil) ==> deletes a physical row

Table name generator creates long table names

When using the table helpers the table name is generated by concatenating various strings. Unfortunately, even with short names for the various fields, this can easily create table names longer than the 48-character maximum, giving the following error on Create(): Column family names shouldn't be more than 48 characters long.

Happens here:
https://github.com/hailocab/gocassa/blob/02f233ec3d3e1d0fd18d88929f286cbf8a4bc80d/keyspace.go#L97-L100

t: k.table(fmt.Sprintf("%s_multiTimeSeries_%s_%s_%s_%s", name, indexField, timeField, idField, bucketSize.String()), row, m, Keys{
    PartitionKeys:     []string{indexField, bucketFieldName},
    ClusteringColumns: []string{timeField, idField},
}).(*t),

I can't see any way to override this naming while continuing to use the recipe tables. Happy to submit a patch; any suggestions on approach, or am I missing something?

TimeSeriesB: the argument order seems confusing

For this struct:

type Benefits struct {
    T             time.Time
    InternalId    gocql.UUID
    City          string
    MinuteBalance int
}

the signature is:

TimeSeriesBTable(tableName, timeField, fieldToIndexByField, uniqueKey, bucketSize, row)

I need to create the table like this:

times := k.TimeSeriesBTable("benefits", "City", "T", "InternalId", time.Hour, Benefits{})

This seems strange; I was expecting:

times := k.TimeSeriesBTable("benefits", "T", "City", "InternalId", time.Hour, Benefits{})

Is it a mistake? Let's talk about it tomorrow.

gocassa uses json tags to map to the struct names

We found a bug: if you have json tags on a struct and you are reading from Cassandra into that struct, gocassa tries to use the json tags as the struct field names.

For example, gocassa will successfully write data to Cassandra in the foobar column. When reading, it gets caught up unmarshalling: it seems to try to write to the struct's foo_bar field, which doesn't exist.

Example:

type Example struct {
  FooBar string `json:"foo_bar"`
}

Here is the line of code that we think is causing the issue:
https://github.com/hailocab/gocassa/blob/master/op.go#L65

In your narrator's humble opinion, the expected behaviour would be to use the cql tags, not the json tags.

One to one update is not working

type Job struct {
    JobId string
    City  string
    Fare  int
}

mymap := map[string]interface{}{
    "City": "MAD",
    "Fare": 10000,
}
// Updating a row
err = jobs.Update("1", mymap)
if err != nil {
    panic(err)
}

returns

UPDATE juju SET City = ?, Fare = ?, City = ?, Fare = ? [1]
panic: Multiple incompatible setting of column fare

instead of

UPDATE juju SET city = ?, fare = ? WHERE jobid = ? ["MAD" 10000 1]

Note about link to gocassa from gocql

Great work in producing this library :-)

Since we maintain a list of binding options on the gocql README, I've added gocassa to this list.

This is just an FYI - feel free to close this issue straight away.

Also as an FYI: there is an ongoing discussion in gocql about how layering should work, in particular for data binding; this might be relevant to the work you are doing.
