go-polo's People

Contributors

sarvalabs-manish

go-polo's Issues

Failed to convert reflected pointer values to their true type before set

Problem

An unexpected panic occurs when an object is decoded with the following code:

type Fruit struct {
    Name string
    Cost *uint8
}

func main() {
    fruit := Fruit{Name: "Orange", Cost: nil}

    wire, err := polo.Polorize(fruit)
    if err != nil {
        log.Println(err)
    } 
    
    fmt.Println(wire)

    decoded := new(Fruit)
    if err = polo.Depolorize(decoded, wire); err != nil {
        log.Println(err)
    }

    fmt.Println(decoded)
}

Output/Stack Trace:

[14 47 6 96 79 114 97 110 103 101]

panic: reflect.Set: value of type uint64 is not assignable to type uint8 [recovered]
	panic: reflect.Set: value of type uint64 is not assignable to type uint8

...
	/Users/manish/sdk/go/src/reflect/value.go:3064 +0x210
reflect.Value.Set({0x102e6da00?, 0x14000120bb9?, 0x102e6da00?}, {0x102e6d9c0?, 0x102e452e8?, 0x38746e69752a2a?})
	/Users/manish/sdk/go/src/reflect/value.go:2090 +0xcc
github.com/sarvalabs/go-polo.(*Depolorizer).depolorizePointer(0x102e668e0?, {0x102ecc230, 0x102e668e0})
	/Users/manish/code/github/sarvalabs/go-polo/depolorizer.go:774 +0xe4
github.com/sarvalabs/go-polo.(*Depolorizer).depolorizeValue(0x0?, {0x102ecc230?, 0x102e668e0})
	/Users/manish/code/github/sarvalabs/go-polo/depolorizer.go:812 +0x9a8
github.com/sarvalabs/go-polo.(*Depolorizer).depolorizeStructValue(0x14000059c28?, {0x102ecc230?, 0x102e90ca0})
	/Users/manish/code/github/sarvalabs/go-polo/depolorizer.go:686 +0x618
github.com/sarvalabs/go-polo.(*Depolorizer).depolorizeValue(0x102e614a0?, {0x102ecc230?, 0x102e90ca0})
	/Users/manish/code/github/sarvalabs/go-polo/depolorizer.go:897 +0xd2c
github.com/sarvalabs/go-polo.(*Depolorizer).Depolorize(0x14000120b80?, {0x102e614a0?, 0x140001322e8?})
	/Users/manish/code/github/sarvalabs/go-polo/depolorizer.go:90 +0x120
github.com/sarvalabs/go-polo.Depolorize({0x102e614a0, 0x140001322e8}, {0x14000120b80?, 0x1?, 0x1?})
...

Culprit & Analysis

go-polo/depolorizer.go

Lines 760 to 777 in f767a8b

func (depolorizer *Depolorizer) depolorizePointer(target reflect.Type) (reflect.Value, error) {
// recursively call depolorize with the pointer element
value, err := depolorizer.depolorizeValue(target.Elem())
if err != nil {
return zeroVal, err
}
// Handle ZeroVal
if value == zeroVal {
return zeroVal, nil
}
// Create a new pointer value and set its inner value and return it
p := reflect.New(target.Elem())
p.Elem().Set(value)
return p, nil
}

The code above for Depolorizer.depolorizePointer() shows that the function does not convert the underlying reflected value into its true type before setting it (line 774). As a result, a uint64 (PosInt) number is not converted to uint8. This only occurs if the type is declared as a pointer, which results in a call to depolorizePointer, where the problem code lives.

Solution

We should change the call on line 774 from p.Elem().Set(value) to p.Elem().Set(value.Convert(p.Elem().Type())). This will ensure that value gets converted to its true type before it is set.

Enforce Message Size Limits

Enforcing Size Limits

  • Protocol Buffers messages have a hard limit of 2 GB because most implementations use signed 32-bit arithmetic (which determines the max encodable tag value). Google's official implementation enforces a hard limit of 64 MB because the entire message needs to be loaded into RAM to be decoded.
  • Currently, go-polo does not enforce any size limit and uses 64-bit unsigned arithmetic, so it has a much higher potential message size. This needs to be artificially limited to 64 MB to make the implementation crash tolerant while loading large messages into memory. There has been no instance of such crashes (yet), but better safe than sorry. (This should probably also be enforced by the specification across implementations.)
  • Size limits only need to be considered for wire types of arbitrary size:
    • Integers and Floats use up to 8 bytes, Booleans use no space, and Varint tags are a maximum of 10 bytes.
    • This leaves BigInt and Word as the atomics that can hold arbitrarily sized data. (They need their own size limit enforcement.)
    • Pack and Doc encoded data must implicitly ensure that the collective size of their sub-elements does not exceed the size limit. This check will filter down to arbitrarily sized encodings, as their encoded output will be rejected by the pack data.

Implementation Approach

  • As mentioned in #29, we will use an EncodingOption to determine the allowed message size, default the wire config value to 64 MB, and disallow values over 2 GB in the encoding option.
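One way the approach above could look is a functional option that writes into a wire config. This is only a sketch under assumptions: EncodingOption is named in the issue, but its signature here, along with wireConfig, MaxMessageSize, newWireConfig, checkSize, and ErrSizeLimit, is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	defaultMaxSize uint64 = 64 << 20 // 64 MB default from the issue
	hardMaxSize    uint64 = 2 << 30  // 2 GB ceiling from the issue
)

// ErrSizeLimit is a hypothetical error for oversized wires.
var ErrSizeLimit = errors.New("message exceeds size limit")

// wireConfig is a hypothetical holder for encoding settings.
type wireConfig struct{ maxSize uint64 }

// EncodingOption is assumed here to be a function that mutates the config.
type EncodingOption func(*wireConfig) error

// MaxMessageSize is a hypothetical option that rejects limits over 2 GB.
func MaxMessageSize(limit uint64) EncodingOption {
	return func(cfg *wireConfig) error {
		if limit > hardMaxSize {
			return fmt.Errorf("limit %d exceeds hard maximum %d", limit, hardMaxSize)
		}
		cfg.maxSize = limit
		return nil
	}
}

// newWireConfig starts from the 64 MB default and applies the options in order.
func newWireConfig(opts ...EncodingOption) (*wireConfig, error) {
	cfg := &wireConfig{maxSize: defaultMaxSize}
	for _, opt := range opts {
		if err := opt(cfg); err != nil {
			return nil, err
		}
	}
	return cfg, nil
}

// checkSize reports ErrSizeLimit when the wire is larger than the configured limit.
func (cfg *wireConfig) checkSize(wire []byte) error {
	if uint64(len(wire)) > cfg.maxSize {
		return ErrSizeLimit
	}
	return nil
}

func main() {
	cfg, _ := newWireConfig(MaxMessageSize(4))
	fmt.Println(cfg.checkSize([]byte{1, 2, 3}))       // <nil>
	fmt.Println(cfg.checkSize([]byte{1, 2, 3, 4, 5})) // message exceeds size limit

	_, err := newWireConfig(MaxMessageSize(3 << 30))
	fmt.Println(err != nil) // true
}
```

Rejecting an oversized limit when the option is applied, rather than at encode time, keeps the failure close to the misconfiguration.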

POLO Raw Encoding

Support Raw Encode/Decode

  • Add a type polo.Raw that consuming applications can use for a struct field that must be decoded into its raw data rather than into a concrete type. This raw data can then be decoded independently.
  • Atomic wires must also be able to be decoded into a polo.Raw when used with Depolorize().

WireRaw Wire Type

  • Add a WireRaw wire type at value 5, replacing the deprecated WireBigInt.
  • If a Raw object is polorized, it will be encoded with this new wire type.
  • The wire type is used to represent further POLO encoded binary data. For example, [5 3 1 44] is a valid POLO wire tagged with the WireRaw wire type. It can be decoded either into a polo.Raw or into a []byte, which will contain the data [3 1 44]. This data is again POLO encoded and can be interpreted as a WirePosInt with a value of 300.
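The [5 3 1 44] example can be traced by hand. The sketch below is a simplified illustration and not the go-polo decoder: it assumes a single-byte tag whose value equals the wire type (which holds for this short example) and a big-endian integer body, and the helper names unwrapRaw and decodePosInt are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	wirePosInt byte = 3 // tag value used in the example above
	wireRaw    byte = 5 // value proposed for WireRaw
)

// unwrapRaw strips the WireRaw tag and returns the inner POLO bytes.
// Assumes a single-byte tag.
func unwrapRaw(wire []byte) ([]byte, error) {
	if len(wire) == 0 || wire[0] != wireRaw {
		return nil, errors.New("not a raw wire")
	}
	return wire[1:], nil
}

// decodePosInt reads a single-byte WirePosInt tag followed by a
// big-endian magnitude of at most 8 bytes.
func decodePosInt(wire []byte) (uint64, error) {
	if len(wire) == 0 || wire[0] != wirePosInt {
		return 0, errors.New("not a posint wire")
	}
	if len(wire)-1 > 8 {
		return 0, errors.New("integer body exceeds 8 bytes")
	}
	var n uint64
	for _, b := range wire[1:] {
		n = n<<8 | uint64(b)
	}
	return n, nil
}

func main() {
	inner, _ := unwrapRaw([]byte{5, 3, 1, 44})
	fmt.Println(inner) // [3 1 44]

	n, _ := decodePosInt(inner)
	fmt.Println(n) // 300
}
```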

Refactor Document Raw Encoding

  • Refactor the underlying Document type to map[string]Raw
  • Document Encoding should result in a WireLoad that alternates between WireWord and WireRaw for each key and value in the Document instead of the current form of being all WireWord objects.

Attempting to set to unsettable reflected value causes unexpected panic

Problem

An unexpected panic occurs when an object is decoded with the following code:

type Fruit struct {
    Name string
    Cost uint64
}

func main() {
    var fruit *Fruit

    data := []byte{...}
    if err := polo.Depolorize(fruit, data); err != nil {
        log.Println(err)
    }  
}

Culprit & Analysis

go-polo/depolorizer.go

Lines 80 to 100 in f767a8b

func (depolorizer *Depolorizer) Depolorize(object any) error {
// Reflect the object value
value := reflect.ValueOf(object)
if value.Kind() != reflect.Pointer {
return ErrObjectNotPtr
}
// Obtain the type of the underlying type
target := value.Type().Elem()
// Depolorize the next element to the target type
result, err := depolorizer.depolorizeValue(target)
if err != nil {
return err
} else if result == zeroVal {
return nil
}
// Convert and set the decoded value
value.Elem().Set(result.Convert(target))
return nil
}

The code above shows that the Depolorize method of Depolorizer does not check whether the element of value (the reflected object for a pointer) is settable; it presumes this purely because the object is a pointer and addressable. This causes the later call to value.Elem().Set(), which sets the decoded value on the object, to panic.

An object declared with var fruit *Fruit, as in the problem code above, instead of fruit := new(Fruit) results in a non-settable pointer value.

Solution

After checking that value is a pointer, we need to call value.Elem().CanSet() to determine whether it is a settable object. If it is not, we must return an error that says so.
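The check described above can be sketched in isolation as follows. ErrObjectNotPtr appears in the quoted code, while ErrObjectNotSettable and checkTarget are hypothetical names used only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

type Fruit struct {
	Name string
	Cost uint64
}

var (
	ErrObjectNotPtr      = errors.New("object not a pointer")
	ErrObjectNotSettable = errors.New("object is not settable") // hypothetical
)

// checkTarget is a hypothetical helper that performs the pointer check
// followed by the proposed settability check.
func checkTarget(object any) error {
	value := reflect.ValueOf(object)
	if value.Kind() != reflect.Pointer {
		return ErrObjectNotPtr
	}
	// Elem of a nil pointer is the zero Value, for which CanSet reports false.
	if !value.Elem().CanSet() {
		return ErrObjectNotSettable
	}
	return nil
}

func main() {
	var unset *Fruit
	fmt.Println(checkTarget(unset))      // object is not settable
	fmt.Println(checkTarget(new(Fruit))) // <nil>
	fmt.Println(checkTarget(Fruit{}))    // object not a pointer
}
```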

Refactor BigInteger Encoding

Refactor BigInteger Encoding

  • Currently, BigIntegers are supported at the wire type level in POLO. The problem with this approach is that BigIntegers are allowed to have polarity in languages like Rust and Go. The WireBigInt wire type does not express this polarity.
  • The refactored approach drops the WireBigInt wire type and instead uses the WirePosInt and WireNegInt wire types, which are currently used for integers up to 64 bits. The new behaviour allows those wire types to express arbitrarily sized integer data.
  • The refactor will also need to remove the sizing errors that occur when decoding integer data beyond 8 bytes.

Implement special encoding/decoding buffers for maps

  • Implement a MapPolorizer and a MapDepolorizer that allow inserting/fetching key-values by random access instead of sorted (ordered) access.
  • It will be a wrapper type over the simple encoding/decoding buffers.
  • This is useful for implementing custom serialization functions for types that include mapping types and we will temporarily allow this by exporting the map compare and map sorter functions as detailed in #31
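A rough sketch of the random-access idea follows. It is not the proposed MapPolorizer: mapBuffer and its methods are hypothetical, keys are assumed to be strings, and plain lexicographic sorting stands in for whatever ordering the map sorter in go-polo applies.

```go
package main

import (
	"fmt"
	"sort"
)

// mapBuffer is a hypothetical stand-in for a map encoding buffer. Entries
// can be inserted and fetched in any order; ordering is applied on output.
type mapBuffer struct {
	entries map[string][]byte
}

func newMapBuffer() *mapBuffer {
	return &mapBuffer{entries: make(map[string][]byte)}
}

// Insert stores already-encoded bytes for a key, in any order.
func (mb *mapBuffer) Insert(key string, wire []byte) {
	mb.entries[key] = wire
}

// Fetch returns the encoded bytes for a key without walking the entries in order.
func (mb *mapBuffer) Fetch(key string) ([]byte, bool) {
	wire, ok := mb.entries[key]
	return wire, ok
}

// SortedKeys returns the keys in the deterministic order in which an
// encoder would emit them (lexicographic order is an assumption).
func (mb *mapBuffer) SortedKeys() []string {
	keys := make([]string, 0, len(mb.entries))
	for key := range mb.entries {
		keys = append(keys, key)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	buf := newMapBuffer()
	buf.Insert("pear", []byte{3, 2})
	buf.Insert("apple", []byte{3, 1})

	fmt.Println(buf.SortedKeys()) // [apple pear]
	wire, ok := buf.Fetch("pear")
	fmt.Println(wire, ok) // [3 2] true
}
```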

Fix Document Encoding for Nested Structures

  • Currently, when using DocumentEncode to doc-encode an object, only the top-level struct gets encoded that way. Any structs that are more deeply nested are encoded regularly.
  • The correct approach must be to preserve document encoding rules for all nested structures
  • This is considered a bug and not a feature completion

Document Encoding Capabilities

POLO Document Encoding

Document Encoding is a special feature of POLO that allows an alternate encoded form for string addressable collections.
This is useful for preserving field names for classes/structs, because they are encoded into the wire as well. The real capabilities
of Document Encoding become available after the wire has been decoded into a Document object.

Document Type

  • The polo.Document is an intermediary structure used when decoding a document encoded wire (wire type 13).
  • It allows access to the top level fields using string keys. This access covers both the raw byte data and decoded objects.
  • It can easily be collapsed back into its original document encoded byte form.
type Document map[string][]byte
  • Each value in this Document map represents some POLO encoded data, which in turn may be atomic or compound in nature (it can also be another document).
  • Only structs and maps with string keys can be encoded as documents. For maps, the keys are used as document field names. For structs, only exported fields can be document encoded, using their field names. This can be overridden with custom struct tags using the polo:"<name>" tagging convention, as follows.
type Fruit struct {
    Name  string   `polo:"name"`
    Cost  uint64   `polo:"cost"`
    Alias []string `polo:"aliases"`
}
  • Encoding objects as document encoded wires cannot be achieved without providing some kind of option or flag to the Polorize function, as it cannot determine this independently. So we need a separate function, DocumentEncode, for encoding objects into a document encoded wire.
  • Decoding objects from a document encoded wire is straightforward and involves augmenting the Depolorize function to recognize that the target object is a Document (reflect.Map Kind) or a struct when encountering the WireDoc wire type.
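The field naming rules above (exported fields only, with an optional polo tag override) can be sketched with the reflect package. This is an illustration of the rule, not the library's implementation: documentFieldNames is a hypothetical helper, and the Origin and secret fields are added only to show the fallback and skip cases.

```go
package main

import (
	"fmt"
	"reflect"
)

type Fruit struct {
	Name   string   `polo:"name"`
	Cost   uint64   `polo:"cost"`
	Alias  []string `polo:"aliases"`
	Origin string   // no tag: falls back to the field name
	secret string   // unexported: skipped
}

// documentFieldNames is a hypothetical helper that lists the document
// keys a struct would be encoded under.
func documentFieldNames(object any) []string {
	t := reflect.TypeOf(object)
	if t == nil {
		return nil
	}
	if t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return nil
	}

	names := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		field := t.Field(i)
		if !field.IsExported() {
			continue
		}
		name := field.Name
		if tag, ok := field.Tag.Lookup("polo"); ok && tag != "" {
			name = tag
		}
		names = append(names, name)
	}
	return names
}

func main() {
	fmt.Println(documentFieldNames(Fruit{})) // [name cost aliases Origin]
}
```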

Method Specification

// DocumentEncode encodes a given object into its POLO bytes with Document Encoding.
// Will return an error if the given object is not a supported type (structs, maps with string keys or a pointer to either).
func DocumentEncode(object any) ([]byte, error) {...}

// Size returns the number of elements in the Document
func (doc Document) Size() int {...}

// Bytes collapses the Document into its POLO wire form
func (doc Document) Bytes() []byte {...}

// Is returns whether the wire for a given key is of a specific WireType.
// Allows pseudo type inspection for document encoded components
func (doc Document) Is(key string, kind WireType) bool {...}

// Get retrieves the raw POLO bytes for a given key. Returns nil if no data for key.
func (doc Document) Get(key string) []byte {...}

// Set inserts some raw POLO bytes for a given key. Does not validate the data.
// Can lead to malformed data while decoding if the data is not POLO bytes.
func (doc Document) Set(key string, data []byte) {...}

// GetObject decodes the data for a given key into the given object.
// Returns an error if the object is not a pointer, if there is no data for the key or if it cannot be decoded into the given object type
func (doc Document) GetObject(key string, object any) error {...}

// SetObject encodes the given object and sets it for the given key.
// Returns an error if the given object cannot be serialized with Polorize.
func (doc Document) SetObject(key string, object any) error {...}

Refactor Encoding API

Refactor POLO Encoding API

Currently the POLO API specifies 5 ways to encode data.

  • Polorize() can be used to encode an arbitrary Go object into its POLO bytes
  • Depolorize() can be used to decode some POLO bytes into an arbitrary Go object.
  • The Packer type exposes methods to encode pack encoded data sequentially with its Pack and PackWire methods.
  • The Unpacker type exposes methods to decode pack encoded data sequentially with its Unpack and UnpackWire methods.
  • DocumentEncode() can be used to encode string keyed maps and structs into doc-encoded POLO bytes, with the Document type being able to encode and decode objects using its GetObject and SetObject methods.

Unified Encoding Model

The proposal is to unify the encoding/decoding API with the following.

  • Polorizer and Depolorizer types replace Packer and Unpacker and allow encoding/decoding atomics, pack encoded data and documents. This allows for a more performant model of sequential encoding/decoding without using Go's reflect library, which is notorious for its performance problems. We will still retain the Polorize() and Depolorize() functions to perform polorization using the reflect package.
  • Both these types will have methods to encode booleans, strings, numerics and other atomics directly.

Undecodable Document Wire for Document.Bytes()

Document.Bytes() Output Is Undecodable

Code:

func main() {
    type Fruit struct {
        Name  string
        Cost  int
        Alias []string
    }

    doc := make(Document)
    doc.SetObject("Name", "orange")
    doc.SetObject("Cost", 300)
    doc.SetObject("Alias", []string{"tangerine", "mandarin"})
    docwire := doc.Bytes()
    fmt.Println(docwire)
    
    fruit := new(Fruit)
    if err := Depolorize(fruit, docwire); err != nil {
        fmt.Println(err)
    }
    
    fmt.Println(fruit)
}

Output:

[13 175 1 6 86 198 3 134 4 198 4 134 5 65 108 105 97 115 6 14 63 6 150 1 116 97 110 103 101 114 105 110 101 109 97 110 100 97 114 105 110 67 111 115 116 6 3 1 44 78 97 109 101 6 6 111 114 97 110 103 101]
decode error: struct field [polo.Fruit.Cost <int>]: incompatible wire type. expected: posint. got: word
&{ 0 []}

Analysis

The issue with the Document.Bytes() method persists. This seems to be because the fix in #9 did not correctly collapse the bytes. Calling DocumentEncode on a map constructed as a Document has the effect of creating extra word tags for the already encoded data in the Document.

Solution

Replace the Bytes() method with the code below. This time the regular Polorize is sufficient but the first tag of the message is replaced to have the WireDoc tag instead of the WirePack tag.

// Bytes returns the POLO wire representation of a Document
func (doc Document) Bytes() []byte {
    data, _ := Polorize(doc)
    data[0] = byte(WireDoc)
    return data
}

Fix issue with wire config propagation for load decoding

  • There is an issue with decode functions that rely on newLoadDepolorizer, as it does not allow us to propagate the wire config correctly.
  • We must extend the function to inherit the wire config as a value or as an EncodingOption.

Allow Polorizer & Depolorizer Option Flags

  • Allow Polorizer and Depolorizer to have option flags to change the encoding/decoding behaviour of the buffers.
  • These flags must also be able to be passed through the Polorize and Depolorize functions
  • Some of the possible flags can include
    • Performing Document Encoding (assists #28 )
    • Disabling Custom Encoding for types that implement the Polorizable and Depolorizable interface
    • Manipulating the buffer size limits (assists #6 )
    • Encoding Behaviour for Big Numbers, Byte Slices and Byte Arrays

Incorrect Wiretype On Document Collapse

Incorrect Wiretype for Document Upon Collapse

Calling Document.Bytes() to collapse a Document object into its POLO bytes form should return binary data that starts with a byte of value 13 [0b1101], indicating the WireDoc wire type. It currently returns 14 [0b1110], which incorrectly indicates that the data is pack encoded.

This seems to be caused by an improper implementation of the Bytes method of Document. It should use DocumentEncode to polorize the Document into its bytes but instead uses the standard Polorize function, which makes the output pack encoded instead of doc encoded. See below for details.

// github.com/sarvalabs/go-polo/document.go

// Bytes returns the POLO wire representation of a Document
func (doc Document) Bytes() []byte {
    data, _ := Polorize(doc)  // << BUG >>
    return data
}

Solution

The fix is straightforward, requiring a refactor of the Bytes method to use the DocumentEncode function instead of Polorize.

Export map sorting & compare functions

  • Expose the map sort and map compare capabilities to allow for implementing custom serialization on types that include map types.
  • This is a temporary measure until we implement a special encoding/decoding buffer for maps as detailed in #30.

Custom Message Packing & Unpacking

Custom Packing and Unpacking Functionality

  • There are instances when unsupported types need to be encoded/decoded but POLO cannot handle those types (e.g. interfaces).
  • We can add custom message packing and unpacking functionality similar to the continuous encoder or decoder found in Borsh implementations.
  • This would allow users to sequentially pack objects into a message in an order of their choosing. The output type is always WirePack and can similarly be 'unpacked' element by element.
  • We can use the underlying buffer tools such as readbuffer and loadreader to do this cleanly.
  • Note that the element ordering is handled by the consumer application and will need to be consistent on the decoding end as well.

Packing

type Packer struct {
    // unexported readbuffer 
}

// NewPacker is a constructor for Packer
func NewPacker() *Packer {...}

// Pack encodes the given object into Packer
func (pack *Packer) Pack(object any) error {...}

// PackWire writes the encoded wire into Packer.
// wire must not be empty and begin with a valid WireType
func (pack *Packer) PackWire(wire []byte) error {...}

// Bytes returns the encoded data as valid WirePack tagged POLO bytes
func (pack Packer) Bytes() []byte {...}

Unpacking

type Unpacker struct {
    // unexported loadreader
}

// NewUnpacker is a constructor for Unpacker
// Returns an error if the data is not a compound wire
func NewUnpacker(wire []byte) (*Unpacker, error) {...}

// Unpack decodes the next element into the given object.
// Returns error if not a pointer or if the element cannot be decoded into the object
func (unpack *Unpacker) Unpack(object any) error {...}

// UnpackWire reads an element from Unpacker and returns it.
func (unpack *Unpacker) UnpackWire() ([]byte, error) {...}

// Done returns whether all the elements have been unpacked
func (unpack Unpacker) Done() bool {...}

Polorize Panics for Nil Objects

Unexpected Panic when calling Polorize with nil.

This panic only occurs when passing an abstract nil, such as in the following function:

func Do(msg interface{}) error {
    bytes, err := polo.Polorize(msg)  // panic occurs here
    if err != nil {
        return err
    }
    
    // more logic
}

The panic message and partial stack trace are as follows:

panic: reflect: call of reflect.Value.Type on zero Value [recovered]
	panic: reflect: call of reflect.Value.Type on zero Value

panic({0x102538760, 0x140001322b8})
	/Users/manish/sdk/go/src/runtime/panic.go:838 +0x204
reflect.Value.Type({0x0?, 0x0?, 0x4?})
	/Users/manish/sdk/go/src/reflect/value.go:2453 +0x130
github.com/sarvalabs/go-polo.polorize({0x0?, 0x0?, 0x102538520?}, 0x102538520?)
	/Users/manish/code/github/sarvalabs/go-polo/polorize.go:69 +0x264
github.com/sarvalabs/go-polo.Polorize({0x0?, 0x0?})
	/Users/manish/code/github/sarvalabs/go-polo/polo.go:10 +0xec

This kind of panic is expected to occur only for nil values of abstract types such as those of interface declarations. For concrete nil values, this issue does not seem to occur.

The specific culprit is the unsupported type case: when an unsupported type is passed, the error message attempts to include the type and its kind, which requires a call to reflect.Value.Type(). This call panics for zero values, and abstract nil values cannot be checked with IsNil or IsZero without also causing another panic.

Solution

We can attempt to check for nil values at the Polorize function level before reflecting on the given object, but this will rule out nil values for valid and concrete types such as slices and maps.

The best option is to recover from the panic when it occurs within the unsupported type case of the reflection tree within the polorize function.

  • Implement Solution
  • Implement Test Cases for Nil Objects (Abstract and Concrete)
