parquet-tools's People

Contributors

dependabot[bot], hangxie, likang


parquet-tools's Issues

Get rid of filter option in cat

It's half-baked with limited features, while jq can do a better job. The original idea was to provide an embedded filter in case the data is too large for jq to load into memory, but with JSONL support in parquet-tools this may no longer be needed: even if jq loads the full JSONL into memory, one can split the jumbo file into small chunks and then apply jq to each.

Don't forget to update USAGE.md.

Refactor schema code

Code for the raw format is fine, but the code for JSON schema and go struct output is pretty messy; it should be refactored to make maintenance easier.

nested go struct

The current go struct implementation does not work well with nested structs; s3://dpla-provider-export/2021/04/all.parquet/part-00000-471427c6-8097-428d-9703-a751a6572cca-c000.snappy.parquet is a good example to test with.

CI: cannot build arm/v7 image

amd64 and arm64 are both good, but not arm/v7:

#34 [linux/arm/v7 builder 4/4] RUN apt update     && apt install -y bash make git     && make build
#34 sha256:795f1301e7583f9110bd32e669be8e74cc5c6f300e73b986d8015c928e772ac0
#34 110.4 go: downloading github.com/xitongsys/parquet-go-source v0.0.0-20201108113611-f372b7d813be
#34 110.6 go: downloading github.com/xitongsys/parquet-go v1.6.1-0.20210331075444-5ecfa15142b5
#34 111.5 go: downloading github.com/stretchr/testify v1.6.1
#34 112.5 go: downloading github.com/pkg/errors v0.9.1
#34 113.0 go: downloading github.com/golang/mock v1.4.3
#34 113.0 go: downloading github.com/apache/thrift v0.13.1-0.20201008052519-daf620915714
#34 114.6 go: downloading github.com/davecgh/go-spew v1.1.1
#34 115.3 go: downloading github.com/pmezard/go-difflib v1.0.0
#34 115.4 go: downloading gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c
#34 116.8 go: downloading github.com/jmespath/go-jmespath v0.4.0
#34 120.5 go: downloading gopkg.in/check.v1 v1.0.0-20180628173108-788fd7840127
#34 121.2 go: downloading golang.org/x/net v0.0.0-20201110031124-69a78807bb2b
#34 129.8 go: downloading github.com/jmespath/go-jmespath/internal/testify v1.5.1
#34 130.3 go: downloading github.com/golang/snappy v0.0.1
#34 130.6 go: downloading github.com/klauspost/compress v1.10.5
#34 144.0 go: downloading github.com/kr/pretty v0.1.0
#34 144.1 go: downloading gopkg.in/yaml.v2 v2.2.8
#34 144.1 go: downloading github.com/kr/text v0.1.0
#34 144.3 go: downloading golang.org/x/text v0.3.3
#34 157.7 ==> Building executable
#34 174.8 go build runtime/cgo: gcc: exit status 1

INT96 import issue

INT96 values do not match the original values after import:

    "Int96": "1717-12-28T19:20:10.805069776Z",		      |	    "Int96": "2022-01-01T09:09:09.009009Z",

Support HDFS

It seems quite a few people use Hadoop along with parquet, and the HDFS scheme is supported by parquet-mr; I think it's a good idea to have HDFS support here as well.

Unable to process map with value type of list

With a parquet file generated by parquet-go's json_schema.go example:

$ parquet-tools schema json_schema.parquet
panic: runtime error: index out of range [0] with length 0 [recovered]
	panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
github.com/alecthomas/kong.catch(0x1400081fe60)
	github.com/alecthomas/[email protected]/kong.go:383 +0xb8
panic({0x103c7e2c0, 0x140000cd260})
	runtime/panic.go:838 +0x204
github.com/hangxie/parquet-tools/cmd.(*schemaNode).updateTagFromConvertedType(0x1400081ec48, 0x1400060ef60?)
	github.com/hangxie/parquet-tools/cmd/schema.go:210 +0x818
github.com/hangxie/parquet-tools/cmd.(*schemaNode).getTagMap(0x1400081ec48)
	github.com/hangxie/parquet-tools/cmd/schema.go:285 +0x520
github.com/hangxie/parquet-tools/cmd.getTagMapAsChild({0x0, 0x0, 0x140000ca960, {0x140006143da, 0x5}, 0x140000ca9b0, 0x140000ca9b8, 0x0, 0x0, 0x0, ...}, ...)
	github.com/hangxie/parquet-tools/cmd/schema.go:312 +0x98
github.com/hangxie/parquet-tools/cmd.(*schemaNode).updateTagFromConvertedType(0x1400014ecb0, 0x1400060eed0?)
	github.com/hangxie/parquet-tools/cmd/schema.go:225 +0x558
github.com/hangxie/parquet-tools/cmd.(*schemaNode).getTagMap(0x1400014ecb0)
	github.com/hangxie/parquet-tools/cmd/schema.go:285 +0x520
github.com/hangxie/parquet-tools/cmd.(*schemaNode).jsonSchema(0x1400014ecb0)
	github.com/hangxie/parquet-tools/cmd/schema.go:129 +0x24
github.com/hangxie/parquet-tools/cmd.(*schemaNode).jsonSchema(0x1400014e7e0)
	github.com/hangxie/parquet-tools/cmd/schema.go:147 +0x160
github.com/hangxie/parquet-tools/cmd.(*SchemaCmd).Run(0x104375920, 0x1?)
	github.com/hangxie/parquet-tools/cmd/schema.go:37 +0x278
reflect.Value.call({0x103b79460?, 0x104375920?, 0x140006bfad8?}, {0x1038feff2, 0x4}, {0x1400000e1b0, 0x1, 0x10314604c?})
	reflect/value.go:556 +0x5e4
reflect.Value.Call({0x103b79460?, 0x104375920?, 0xc?}, {0x1400000e1b0, 0x1, 0x1})
	reflect/value.go:339 +0x98
github.com/alecthomas/kong.callMethod({0x1038feb24, 0x3}, {0x103c05580?, 0x104375920?, 0x3?}, {0x103b79460?, 0x104375920?, 0x0?}, 0x0?)
	github.com/alecthomas/[email protected]/callbacks.go:71 +0x3a4
github.com/alecthomas/kong.(*Context).RunNode(0x140000c4f80, 0x1400038e700, {0x140006bff00, 0x1, 0x1})
	github.com/alecthomas/[email protected]/context.go:706 +0x468
github.com/alecthomas/kong.(*Context).Run(0x1400038e1c0?, {0x140006bff00?, 0x0?, 0x0?})
	github.com/alecthomas/[email protected]/context.go:723 +0xc0
main.main()
	github.com/hangxie/parquet-tools/main.go:40 +0x2bc

Update usage

Review USAGE.md to make sure all sections are up to date.

add to brew formula

We should make a v1.0.0 release before doing so; we are pretty close to that.

build rpm

Need to check if RPM is still a thing ... I've been away from RHEL/CentOS/Fedora for some time.

Improve parquet-go error handling

parquet-go should not panic on catchable errors; the following error was from an invalid CSV schema:

$ go run . import -m cmd/testdata/jsonl.schema -s cmd/testdata/jsonl.source /tmp/jsonl.parquet
panic: runtime error: index out of range [1] with length 1 [recovered]
	panic: runtime error: index out of range [1] with length 1

goroutine 1 [running]:
github.com/alecthomas/kong.catch(0x140003bfef8)
	/Users/xiehang/go/pkg/mod/github.com/alecthomas/[email protected]/kong.go:383 +0xd0
panic({0x102dcc7a0, 0x14000036c18})
	/opt/homebrew/Cellar/go/1.17/libexec/src/runtime/panic.go:1038 +0x21c
github.com/xitongsys/parquet-go/common.StringToTag({0x14000500000, 0x1})
	/Users/xiehang/go/pkg/mod/github.com/hangxie/[email protected]/common/common.go:86 +0x2980
github.com/xitongsys/parquet-go/schema.NewSchemaHandlerFromMetadata({0x140002a0600, 0x14, 0x20})
	/Users/xiehang/go/pkg/mod/github.com/hangxie/[email protected]/schema/csv.go:28 +0x378
github.com/xitongsys/parquet-go/writer.NewCSVWriter({0x140002a0600, 0x14, 0x20}, {0x102e6cf50, 0x1400000fdb8}, 0x8)
	/Users/xiehang/go/pkg/mod/github.com/hangxie/[email protected]/writer/csv.go:27 +0x50
github.com/hangxie/parquet-tools/cmd.newCSVWriter({0x14000036be8, 0x12}, {0x140002a0600, 0x14, 0x20})
	/Users/xiehang/Dev/parquet-tools/cmd/common.go:174 +0x94
github.com/hangxie/parquet-tools/cmd.(*ImportCmd).importCSV(0x10349d588)
	/Users/xiehang/Dev/parquet-tools/cmd/import.go:56 +0x328
github.com/hangxie/parquet-tools/cmd.(*ImportCmd).Run(0x10349d588, 0x140000818a0)
	/Users/xiehang/Dev/parquet-tools/cmd/import.go:27 +0x54
reflect.Value.call({0x102d56de0, 0x10349d588, 0x213}, {0x102a76885, 0x4}, {0x1400000fda0, 0x1, 0x1})
	/opt/homebrew/Cellar/go/1.17/libexec/src/reflect/value.go:543 +0x584
reflect.Value.Call({0x102d56de0, 0x10349d588, 0x213}, {0x1400000fda0, 0x1, 0x1})
	/opt/homebrew/Cellar/go/1.17/libexec/src/reflect/value.go:339 +0x8c
github.com/alecthomas/kong.callMethod({0x102a763b9, 0x3}, {0x102da8860, 0x10349d588, 0x199}, {0x102d56de0, 0x10349d588, 0x213}, 0x1400047c8d0)
	/Users/xiehang/go/pkg/mod/github.com/alecthomas/[email protected]/callbacks.go:71 +0x4ac
github.com/alecthomas/kong.(*Context).RunNode(0x140000ecd00, 0x14000442460, {0x140003bff38, 0x1, 0x1})
	/Users/xiehang/go/pkg/mod/github.com/alecthomas/[email protected]/context.go:706 +0x3e0
github.com/alecthomas/kong.(*Context).Run(0x140000ecd00, {0x140003bff38, 0x1, 0x1})
	/Users/xiehang/go/pkg/mod/github.com/alecthomas/[email protected]/context.go:723 +0x80
main.main()
	/Users/xiehang/Dev/parquet-tools/main.go:33 +0x1a8
exit status 2

Refactor document

To make README more useful, I should put the installation steps and simple use cases into README, while USAGE.md still keeps the full documentation for everything; it seems most users just read README and have no interest in going through the lengthy USAGE.md.

schema command panicked at a certain parquet file

@erikburgess reported that a certain parquet file will panic the schema command with:

panic: runtime error: index out of range [0] with length 0 [recovered]
	panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
github.com/alecthomas/kong.catch(0x14000559e60)
	github.com/alecthomas/[email protected]/kong.go:383 +0xb4
panic({0x104e45100, 0x140001e56e0})
	runtime/panic.go:884 +0x204
github.com/hangxie/parquet-tools/cmd.(*schemaNode).updateTagFromConvertedType(0x14000524240, 0x140001128d0?)
	github.com/hangxie/parquet-tools/cmd/schema.go:262 +0x838
github.com/hangxie/parquet-tools/cmd.(*schemaNode).getTagMap(0x14000524240)
	github.com/hangxie/parquet-tools/cmd/schema.go:337 +0x518
github.com/hangxie/parquet-tools/cmd.(*schemaNode).jsonSchema(0x14000524240)
	github.com/hangxie/parquet-tools/cmd/schema.go:127 +0x24
github.com/hangxie/parquet-tools/cmd.(*schemaNode).jsonSchema(0x14000114630)
	github.com/hangxie/parquet-tools/cmd/schema.go:145 +0x160
github.com/hangxie/parquet-tools/cmd.(*SchemaCmd).Run(0x105559d68, 0x1?)
	github.com/hangxie/parquet-tools/cmd/schema.go:37 +0x298
reflect.Value.call({0x104d3d540?, 0x105559d68?, 0x140004d7ac8?}, {0x104ab8de6, 0x4}, {0x1400000fd10, 0x1, 0x1042ef39c?})
	reflect/value.go:584 +0x688
reflect.Value.Call({0x104d3d540?, 0x105559d68?, 0x140004d7b68?}, {0x1400000fd10?, 0xc?, 0xc?})
	reflect/value.go:368 +0x90
github.com/alecthomas/kong.callMethod({0x104ab890b, 0x3}, {0x104dcb920?, 0x105559d68?, 0x3?}, {0x104d3d540?, 0x105559d68?, 0x0?}, 0x0?)
	github.com/alecthomas/[email protected]/callbacks.go:71 +0x3f0
github.com/alecthomas/kong.(*Context).RunNode(0x140001c0a00, 0x140003aa700, {0x140004d7f00, 0x1, 0x1})
	github.com/alecthomas/[email protected]/context.go:706 +0x460
github.com/alecthomas/kong.(*Context).Run(0x140003aa1c0?, {0x140004d7f00?, 0x0?, 0x0?})
	github.com/alecthomas/[email protected]/context.go:723 +0xbc
main.main()
	github.com/hangxie/parquet-tools/main.go:40 +0x2b8

Docker hub notification

The Slack webhook does not work with the Docker Hub repository; the notification needs to be moved to a CCI job.

protocol error: received DATA after END_STREAM

To reproduce:

$ parquet-tools schema https://huggingface.co/datasets/laion/laion2B-en/resolve/main/part-00047-5114fd87-297e-42b0-9d11-50f1df323dfa-c000.snappy.parquet
2022/08/24 09:51:57 protocol error: received DATA after END_STREAM
2022/08/24 09:51:58 protocol error: received DATA after END_STREAM
2022/08/24 09:51:58 protocol error: received DATA after END_STREAM
2022/08/24 09:51:58 protocol error: received DATA after END_STREAM
2022/08/24 09:51:58 protocol error: received DATA after END_STREAM
2022/08/24 09:51:58 protocol error: received DATA after END_STREAM
2022/08/24 09:51:58 protocol error: received DATA after END_STREAM
2022/08/24 09:51:58 protocol error: received DATA after END_STREAM
2022/08/24 09:51:59 protocol error: received DATA after END_STREAM
2022/08/24 09:51:59 protocol error: received DATA after END_STREAM
2022/08/24 09:51:59 protocol error: received DATA after END_STREAM
{"Tag":"name=Spark_schema, type=STRUCT, repetitiontype=REQUIRED","Fields":[{"Tag":"name=SAMPLE_ID, type=INT64, repetitiontype=OPTIONAL"},{"Tag":"name=URL, type=BYTE_ARRAY, convertedtype=UTF8, repetitiontype=OPTIONAL"},{"Tag":"name=TEXT, type=BYTE_ARRAY, convertedtype=UTF8, repetitiontype=OPTIONAL"},{"Tag":"name=HEIGHT, type=INT32, repetitiontype=OPTIONAL"},{"Tag":"name=WIDTH, type=INT32, repetitiontype=OPTIONAL"},{"Tag":"name=LICENSE, type=BYTE_ARRAY, convertedtype=UTF8, repetitiontype=OPTIONAL"},{"Tag":"name=NSFW, type=BYTE_ARRAY, convertedtype=UTF8, repetitiontype=OPTIONAL"},{"Tag":"name=Similarity, type=DOUBLE, repetitiontype=OPTIONAL"}]}

TUI

A reminder to myself that a TUI could be pretty interesting; I'm thinking of using Bubble Tea or tview to build something, maybe with parquet-tools but possibly something else.

FIXED_LEN_BYTE_ARRAY/DECIMAL type output

FIXED_LEN_BYTE_ARRAY/DECIMAL output is not human readable; it's something like:

"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0005\u007f)"
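A DECIMAL stored as FIXED_LEN_BYTE_ARRAY is a big-endian two's-complement unscaled integer; rendering it as a readable string could be sketched like this (the scale would come from the schema tag; `decimalString` is an illustrative name):

```go
package main

import (
	"fmt"
	"math/big"
)

// decimalString renders a big-endian two's-complement unscaled integer
// (as stored in a FIXED_LEN_BYTE_ARRAY DECIMAL) with the given scale.
func decimalString(raw []byte, scale int) string {
	v := new(big.Int).SetBytes(raw)
	if len(raw) > 0 && raw[0]&0x80 != 0 { // negative in two's complement
		v.Sub(v, new(big.Int).Lsh(big.NewInt(1), uint(8*len(raw))))
	}
	sign := ""
	if v.Sign() < 0 {
		sign = "-"
		v.Neg(v)
	}
	s := v.String()
	if scale <= 0 {
		return sign + s
	}
	for len(s) <= scale { // pad so the decimal point has a leading digit
		s = "0" + s
	}
	return sign + s[:len(s)-scale] + "." + s[len(s)-scale:]
}

func main() {
	// 0x3039 is 12345; with scale 2 that reads as 123.45
	fmt.Println(decimalString([]byte{0x30, 0x39}, 2))
}
```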

refactor code to deal with fields that need to be re-interpretted

Code to handle DECIMAL, INTERVAL, and INT96 in the cat and meta commands was added incrementally without a high-level design; the logic is spread across many places and is hard to maintain.

Here is what I have in mind to refactor that code:

  1. test cases:
    • Cover these types:
      • DECIMAL (FIXED_LEN_BYTE_ARRAY and BYTE_ARRAY)
      • DECIMAL (INT32 and INT64)
      • INTERVAL (FIXED_LEN_BYTE_ARRAY)
      • INT96 (treated as timestamp only)
    • Cover fields of the above types in these locations:
      • top level
        • scalar
        • pointer
      • embedded
        • list element
        • map key
        • map value
  2. rough idea for cat:
    1. scan the schema to get all fields that need to be reinterpreted
    2. base64-encode the string values of those fields
    3. serialize the whole row to JSON
    4. use gjson/sjson to retrieve the fields that need to be reinterpreted, convert them, then assign the results back
  3. rough idea for meta:
    1. scan the schema to get all fields that need to be reinterpreted
    2. reinterpret the min/max values

Improve output of MinValue and MaxValue

There should be something I can do to improve the data shown as MinValue and MaxValue. I don't mean to deal with all scenarios, but it should at least handle numeric values (INTnn and FLOAT/DOUBLE) and string values (UTF8 only).
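Since parquet stores min/max statistics as plain-encoded little-endian bytes, a decoder for just those cases could look like this (a minimal sketch; `decodeStat` is an illustrative name and only covers the types mentioned above):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// decodeStat renders raw min/max statistics bytes based on the column's
// physical type; anything unhandled falls back to the raw bytes.
func decodeStat(physicalType string, raw []byte) interface{} {
	switch physicalType {
	case "INT32":
		return int32(binary.LittleEndian.Uint32(raw))
	case "INT64":
		return int64(binary.LittleEndian.Uint64(raw))
	case "FLOAT":
		return math.Float32frombits(binary.LittleEndian.Uint32(raw))
	case "DOUBLE":
		return math.Float64frombits(binary.LittleEndian.Uint64(raw))
	case "BYTE_ARRAY": // assume UTF8 here, per the scope above
		return string(raw)
	}
	return raw
}

func main() {
	fmt.Println(decodeStat("INT32", []byte{0x2a, 0, 0, 0})) // 42
	fmt.Println(decodeStat("BYTE_ARRAY", []byte("hello")))  // hello
}
```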

Missing logicaltype in schema output

logicaltype tags like these are missing in the schema output:

Date2             int32               `parquet:"name=date2, type=INT32, convertedtype=DATE, logicaltype=DATE"`
TimeMillis2       int32               `parquet:"name=timemillis2, type=INT32, logicaltype=TIME, logicaltype.isadjustedtoutc=true, logicaltype.unit=MILLIS"`
TimeMicros2       int64               `parquet:"name=timemicros2, type=INT64, logicaltype=TIME, logicaltype.isadjustedtoutc=false, logicaltype.unit=MICROS"`
TimestampMillis2  int64               `parquet:"name=timestampmillis2, type=INT64, logicaltype=TIMESTAMP, logicaltype.isadjustedtoutc=true, logicaltype.unit=MILLIS"`
TimestampMicros2  int64               `parquet:"name=timestampmicros2, type=INT64, logicaltype=TIMESTAMP, logicaltype.isadjustedtoutc=false, logicaltype.unit=MICROS"`
Decimal5          int32               `parquet:"name=decimal5, type=INT32, scale=2, precision=9, logicaltype=DECIMAL, logicaltype.precision=9, logicaltype.scale=2"`

The schema output is:

Date2             int32            `parquet:"name=Date2, type=INT32, convertedtype=DATE, repetitiontype=REQUIRED"`
Timemillis2       int32            `parquet:"name=Timemillis2, type=INT32, repetitiontype=REQUIRED"`
Timemicros2       int64            `parquet:"name=Timemicros2, type=INT64, repetitiontype=REQUIRED"`
Timestampmillis2  int64            `parquet:"name=Timestampmillis2, type=INT64, repetitiontype=REQUIRED"`
Timestampmicros2  int64            `parquet:"name=Timestampmicros2, type=INT64, repetitiontype=REQUIRED"`
Decimal5          int32            `parquet:"name=Decimal5, type=INT32, repetitiontype=REQUIRED"`

INTERVAL type import or cat problem

Imported a JSONL file to parquet with the INTERVAL data type, then cat panicked:

$ parquet-tools cat -f jsonl cmd/testdata/all-types.parquet > /tmp/data.jsonl
$ parquet-tools schema -f json cmd/testdata/all-types.parquet > /tmp/schema.json
$ parquet-tools import -m /tmp/schema.json -f jsonl -s /tmp/data.jsonl /tmp/imported.parquet
$ parquet-tools cat /tmp/imported.parquet
panic: runtime error: index out of range [0] with length 0 [recovered]
	panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
github.com/alecthomas/kong.catch(0x140002ffe60)
	github.com/alecthomas/[email protected]/kong.go:383 +0xb8
panic({0x100d1f3a0, 0x1400090c990})
	runtime/panic.go:838 +0x204
github.com/xitongsys/parquet-go/types.DECIMAL_BYTE_ARRAY_ToString({0x1013b0020?, 0x0?, 0x0?}, 0x100a1236d?, 0x140002fee28?)
	github.com/xitongsys/[email protected]/types/converter.go:127 +0x1d0
github.com/hangxie/parquet-tools/cmd.reinterpretNestedFields(0x140002ff078, {0x1400008a6d0, 0x0, 0x0}, {0x10?, 0x1?, 0x140002fefc8?, 0x100294334?})
	github.com/hangxie/parquet-tools/cmd/cat.go:229 +0x604
github.com/hangxie/parquet-tools/cmd.reinterpretNestedFields(0x140006781d0, {0x1400008a6d0, 0x1, 0x1}, {0x0?, 0x4?, 0x1400035f100?, 0x14000479110?})
	github.com/hangxie/parquet-tools/cmd/cat.go:216 +0x248
github.com/hangxie/parquet-tools/cmd.(*CatCmd).Run(0x10137ef80, 0x1?)
	github.com/hangxie/parquet-tools/cmd/cat.go:112 +0x978
reflect.Value.call({0x100c3f940?, 0x10137ef80?, 0x1400035fad8?}, {0x1009f271b, 0x4}, {0x140004746f0, 0x1, 0x1002aaeec?})
	reflect/value.go:556 +0x5e4
reflect.Value.Call({0x100c3f940?, 0x10137ef80?, 0x9?}, {0x140004746f0, 0x1, 0x1})
	reflect/value.go:339 +0x98
github.com/alecthomas/kong.callMethod({0x1009f224f, 0x3}, {0x100d247a0?, 0x10137ef80?, 0x3?}, {0x100c3f940?, 0x10137ef80?, 0x0?}, 0x0?)
	github.com/alecthomas/[email protected]/callbacks.go:71 +0x3a4
github.com/alecthomas/kong.(*Context).RunNode(0x140001ff200, 0x14000412380, {0x1400035ff00, 0x1, 0x1})
	github.com/alecthomas/[email protected]/context.go:706 +0x468
github.com/alecthomas/kong.(*Context).Run(0x140004121c0?, {0x1400035ff00?, 0x0?, 0x0?})
	github.com/alecthomas/[email protected]/context.go:723 +0xc0
main.main()
	github.com/hangxie/parquet-tools/main.go:40 +0x2bc

It works fine if this field is removed:

    {
      "Tag": "name=Interval, type=FIXED_LEN_BYTE_ARRAY, convertedtype=INTERVAL, repetitiontype=REQUIRED"
    },

Import using JSON schema

This was mentioned by a friend; I think it's a useful feature, as JSON schema is used more than the schema file format used by parquet-go.

I'm not sure if it is doable though; need to research a bit.

Panic on Fedora 35 and 36

It works on Fedora 33 and 34, but fails on 35 and 36; error from 36:

# ./parquet-tools-v1.10.1-linux-amd64
runtime/cgo: pthread_create failed: Operation not permitted
SIGABRT: abort
PC=0x7f2996e7b39c m=0 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: unknown pc 0x7f2996e7b39c
stack: frame={sp:0x7ffdc062a060, fp:0x0} stack=[0x7ffdbfe2b5b8,0x7ffdc062a5f0)
0x00007ffdc0629f60:  0x00007ffdc062a430  0x00000000033662e0
0x00007ffdc0629f70:  0x0000000000203000  0x0000000001708480
0x00007ffdc0629f80:  0x00007f297027805b  0x00007f299700ee9f
0x00007ffdc0629f90:  0x0000000000000001  0x0000000000000000
0x00007ffdc0629fa0:  0x2525252525252525  0x2525252525252525
0x00007ffdc0629fb0:  0x000000ffffffffff  0x0000000000000000
0x00007ffdc0629fc0:  0x000000ffffffffff  0x0000000000000000
0x00007ffdc0629fd0:  0x415353454d5f434c  0x505f434c00534547
0x00007ffdc0629fe0:  0x0000000000000000  0x0000000000000000
0x00007ffdc0629ff0:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a000:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a010:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a020:  0x6e75720000000000  0x6f67632f656d6974
0x00007ffdc062a030:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a040:  0x3b31303d63706d2e  0x67676f2e2a3a3633
0x00007ffdc062a050:  0x2a3a36333b31303d  0x00007f2996e7b38e
0x00007ffdc062a060: <0x3d7661772e2a3a36  0x2e2a3a36333b3130
0x00007ffdc062a070:  0x333b31303d61676f  0x7375706f2e2a3a36
0x00007ffdc062a080:  0x2a3a36333b31303d  0x3b31303d7870732e
0x00007ffdc062a090:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a0a0:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a0b0:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a0c0:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a0d0:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a0e0:  0x0000000000000000  0x83af847d47132c00
0x00007ffdc062a0f0:  0x00007f2996de9740  0x0000000000000006
0x00007ffdc062a100:  0x00000000033662e0  0x0000000000203000
0x00007ffdc062a110:  0x0000000001708480  0x00007f2996e2e696
0x00007ffdc062a120:  0x00007f2996fe8990  0x00007f2996e187f3
0x00007ffdc062a130:  0x0000000000000020  0x0000000000000000
0x00007ffdc062a140:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a150:  0x0000000000000000  0x0000000000000000
runtime: unknown pc 0x7f2996e7b39c
stack: frame={sp:0x7ffdc062a060, fp:0x0} stack=[0x7ffdbfe2b5b8,0x7ffdc062a5f0)
0x00007ffdc0629f60:  0x00007ffdc062a430  0x00000000033662e0
0x00007ffdc0629f70:  0x0000000000203000  0x0000000001708480
0x00007ffdc0629f80:  0x00007f297027805b  0x00007f299700ee9f
0x00007ffdc0629f90:  0x0000000000000001  0x0000000000000000
0x00007ffdc0629fa0:  0x2525252525252525  0x2525252525252525
0x00007ffdc0629fb0:  0x000000ffffffffff  0x0000000000000000
0x00007ffdc0629fc0:  0x000000ffffffffff  0x0000000000000000
0x00007ffdc0629fd0:  0x415353454d5f434c  0x505f434c00534547
0x00007ffdc0629fe0:  0x0000000000000000  0x0000000000000000
0x00007ffdc0629ff0:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a000:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a010:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a020:  0x6e75720000000000  0x6f67632f656d6974
0x00007ffdc062a030:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a040:  0x3b31303d63706d2e  0x67676f2e2a3a3633
0x00007ffdc062a050:  0x2a3a36333b31303d  0x00007f2996e7b38e
0x00007ffdc062a060: <0x3d7661772e2a3a36  0x2e2a3a36333b3130
0x00007ffdc062a070:  0x333b31303d61676f  0x7375706f2e2a3a36
0x00007ffdc062a080:  0x2a3a36333b31303d  0x3b31303d7870732e
0x00007ffdc062a090:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a0a0:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a0b0:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a0c0:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a0d0:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a0e0:  0x0000000000000000  0x83af847d47132c00
0x00007ffdc062a0f0:  0x00007f2996de9740  0x0000000000000006
0x00007ffdc062a100:  0x00000000033662e0  0x0000000000203000
0x00007ffdc062a110:  0x0000000001708480  0x00007f2996e2e696
0x00007ffdc062a120:  0x00007f2996fe8990  0x00007f2996e187f3
0x00007ffdc062a130:  0x0000000000000020  0x0000000000000000
0x00007ffdc062a140:  0x0000000000000000  0x0000000000000000
0x00007ffdc062a150:  0x0000000000000000  0x0000000000000000

goroutine 1 [running]:
runtime.systemstack_switch()
	/usr/local/go/src/runtime/asm_amd64.s:350 fp=0xc000050780 sp=0xc000050778 pc=0x462f60
runtime.main()
	/usr/local/go/src/runtime/proc.go:174 +0x7b fp=0xc0000507e0 sp=0xc000050780 pc=0x43771b
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc0000507e8 sp=0xc0000507e0 pc=0x465181

rax    0x0
rbx    0x7f2996de9740
rcx    0x7f2996e7b39c
rdx    0x6
rdi    0x15
rsi    0x15
rbp    0x15
rsp    0x7ffdc062a060
r8     0x7ffdc062a130
r9     0x7f2996fa24e0
r10    0x8
r11    0x246
r12    0x6
r13    0x203000
r14    0x1708480
r15    0x7f297027805b
rip    0x7f2996e7b39c
rflags 0x246
cs     0x33
fs     0x0
gs     0x0

schema output of LIST and MAP key/value types misses converted type

expected:

Map  map[string]int32 `parquet:"name=Map, type=MAP, repetitiontype=REQUIRED, keytype=BYTE_ARRAY, keyconvertedtype=UTF8, valuetype=INT32"`
List []string         `parquet:"name=List, type=LIST, repetitiontype=REQUIRED, valuetype=BYTE_ARRAY, valueconvertedtype=DECIMAL, valuescale=2, valueprecision=10"`

got:

Map  map[string]int32 `parquet:"name=Map, type=MAP, keytype=BYTE_ARRAY, valuetype=INT32"`
List []string         `parquet:"name=List, type=LIST, valuetype=BYTE_ARRAY"`

Push to alternative container registry

I'm moving away from Docker; the current build works with podman after #97. However, built images are still uploaded to Docker Hub, which is kind of risky (cost and throttling, etc.).
