
liquidaty / zsv


zsv+lib: tabular data swiss-army knife CLI + world's fastest (simd) CSV parser

License: MIT License

Makefile 9.28% C 88.49% Shell 2.23%
csv json simd sql parser sqlite3 flatten serialize txt fixed

zsv's People

Contributors

herbygillot, iamazeem, liquidaty


zsv's Issues

[macos/clang] Tests are failing

On macos/clang, the tests are failing with the following errors:

test-echo: 
/bin/sh: line 1:  2584 Illegal instruction: 4  /Users/runner/work/zsv/zsv/build/Darwin/rel/clang/bin/zsv_echo /Users/runner/work/zsv/zsv/data/loans_1.csv > /tmp/test-echo.out
make[2]: *** [test-echo] Error 132
make[1]: *** [test-echo] Error 2
make: *** [test] Error 2

macOS: 10.15
clang: 12.0.0

I passed LDFLAGS='-mmacosx-version-min=10.5' as an environment variable, but it doesn't appear to be included in the final linking command:

clang -pipe -ffunction-sections -fdata-sections -fpic -O3 -DNDEBUG -std=gnu11 -Wno-gnu-statement-expression -Wshadow -Wall -Wextra -Wno-missing-braces -pedantic -DSTDC_HEADERS -D_GNU_SOURCE -ftree-vectorize -fvisibility=hidden -I/Users/runner/work/zsv/zsv/amd64-macosx-clang/include -mavx2 -DZSV_EXTRAS -DHAVE__MM256_MOVEMASK_EPI8 -DHAVE_IMMINTRIN_H -DHAVE_MEMMEM -DHAVE_TGETENT -DHAVE_ARC4RANDOM_UNIFORM -DHAVE___BUILTIN_EXPECT -DNO___BUILTIN_EXPECT_WITH_PROBABILITY -flto -Iexternal/utf8proc-2.6.1 -DUTF8PROC -DUTF8PROC_STATIC -I/Users/runner/work/zsv/zsv/include -o /Users/runner/work/zsv/zsv/build/Darwin/rel/clang/bin/zsv_echo echo.c /Users/runner/work/zsv/zsv/build/Darwin/rel/clang/objs/utils/writer.o /Users/runner/work/zsv/zsv/build/Darwin/rel/clang/objs/utils/file.o /Users/runner/work/zsv/zsv/build/Darwin/rel/clang/objs/utils/err.o /Users/runner/work/zsv/zsv/build/Darwin/rel/clang/objs/utils/signal.o /Users/runner/work/zsv/zsv/build/Darwin/rel/clang/objs/utils/mem.o /Users/runner/work/zsv/zsv/build/Darwin/rel/clang/objs/utils/clock.o /Users/runner/work/zsv/zsv/build/Darwin/rel/clang/objs/utils/arg.o /Users/runner/work/zsv/zsv/build/Darwin/rel/clang/objs/utils/dl.o /Users/runner/work/zsv/zsv/build/Darwin/rel/clang/objs/utils/string.o /Users/runner/work/zsv/zsv/build/Darwin/rel/clang/objs/utils/dirs.o   -L/Users/runner/work/zsv/zsv/amd64-macosx-clang/lib -lzsv /Users/runner/work/zsv/zsv/build/Darwin/rel/clang/external/utf8proc-2.6.1/utf8proc.o  -lpthread  -march=native -ldl -ltermcap 

See full logs here: https://github.com/liquidaty/zsv/runs/6664830117?check_suite_focus=true

ARM64 Linux Support

Any plans to add support for ARM64 Linux?

I just tried to build it for ARM64 Linux with the aarch64-linux-gnu-gcc cross-compiler:

PREFIX=zsv-arm64-linux-gcc CC=aarch64-linux-gnu-gcc ./configure
make install

But it failed with linker errors for termcap library:

/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/bin/ld: cannot find -ltermcap
collect2: error: ld returned 1 exit status

You might want to explore this further if it's in the roadmap. Thanks!
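One possible direction, sketched under assumptions: the configure output quoted elsewhere on this page defines HAVE_TGETENT when tgetent() from termcap.h is found, so termcap use can be compiled in (and -ltermcap linked) only when that macro is set. The report does not say what zsv actually uses termcap for; the hypothetical term_columns() below assumes it is the terminal width, and falls back to the COLUMNS environment variable and then to 80 columns.

```c
#define _POSIX_C_SOURCE 200112L /* POSIX declarations (e.g. setenv) under strict -std=c11 */
#include <assert.h>
#include <stdlib.h>

#ifdef HAVE_TGETENT
#include <termcap.h>
#endif

#define DEFAULT_TERM_COLUMNS 80

/* Hypothetical helper: return the terminal width in columns.
 * termcap is consulted only when configure detected tgetent (HAVE_TGETENT),
 * so a toolchain without libtermcap can still compile and link this file. */
int term_columns(void) {
  /* 1. An explicit, well-formed COLUMNS value wins. */
  const char *env = getenv("COLUMNS");
  if (env && *env) {
    char *end;
    long n = strtol(env, &end, 10);
    if (*end == '\0' && n > 0 && n < 10000)
      return (int)n;
  }
#ifdef HAVE_TGETENT
  /* 2. Ask the termcap database for the "co" (columns) capability. */
  {
    static char entry[2048]; /* traditional termcap entry buffer size */
    char *term = getenv("TERM");
    if (term && tgetent(entry, term) == 1) {
      int co = tgetnum("co");
      if (co > 0)
        return co;
    }
  }
#endif
  /* 3. Nothing better is available. */
  return DEFAULT_TERM_COLUMNS;
}
```

Compiled without -DHAVE_TGETENT, this file needs no termcap library at all, which is the situation of the aarch64 cross-toolchain above.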

configure script is failing

Description

On Ubuntu 18.04 and 20.04 under bash, the configure script fails with these errors:

$ ./configure
./configure: 214: [: unexpected operator
./configure: 225: [: unexpected operator
./configure: 227: [[: not found
...
checking whether compiler accepts __builtin_expect(0,0)..../configure: 156: [: 1: unexpected operator
no
checking whether compiler accepts __builtin_expect_with_probability(0,0,0.5)..../configure: 156: [: 1: unexpected operator
no
creating ... ./configure: 492: cannot create : Directory nonexistent

As the configure script specifies #!/bin/sh (dash on Ubuntu), bash-specific constructs such as [[ condition ]] and == are not supported.
The same behavior has also been observed in the GitHub Actions CI pipeline.
However, with #!/bin/bash, it works fine.

You might want to look into it on your side as well.
A fix may be required: either use pure POSIX sh constructs or require bash.

Expected

The configure script should run without errors.

Actual

The configure script fails with multiple syntax errors.

choco.exe install failing

Any idea why this might be failing?

I downloaded the latest zsv-0.3.4-alpha-amd64-windows-mingw.nupkg and am executing:

choco install zsv -source .\zsv-0.3.4-alpha-amd64-windows-mingw.nupkg


add `paste` command

Currently, the sql command supports --join-indexes for key-based joins similar to a paste command. However, certain use cases are not covered, such as a simple paste based on row number (i.e. the first row of each file side by side, the second row side by side, etc.).

Some of these can be covered using the Unix paste utility. However, that utility has limitations: it is not directly usable on CSV files, and it cannot handle different header spans (e.g. file 1 has a 2-row header and file 2 has a 1-row header).
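A row-number paste becomes straightforward once each file's header span is known. The sketch below assumes both files have already been split into complete CSV records (without trailing newlines) by some parser; paste_rows and its parameters are hypothetical, not zsv's API. It skips each file's header rows so the data rows line up, then joins record i of one file with record i of the other. Because each input is a valid CSV record, joining two of them with a comma yields a valid CSV record.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: paste two already-parsed CSV files side by side by
 * row number. Each element of a/b is one complete CSV record (no newline).
 * skip_a/skip_b are the header spans to skip so that data rows line up.
 * Writes NUL-terminated output to out; returns the number of rows pasted
 * (stopping at the shorter input), or -1 if out is too small. */
int paste_rows(const char **a, size_t na, size_t skip_a,
               const char **b, size_t nb, size_t skip_b,
               char *out, size_t outsz) {
  size_t used = 0;
  int pasted = 0;
  assert(out != NULL && outsz > 0);
  out[0] = '\0';
  for (size_t i = skip_a, j = skip_b; i < na && j < nb; i++, j++) {
    int n = snprintf(out + used, outsz - used, "%s,%s\n", a[i], b[j]);
    if (n < 0 || (size_t)n >= outsz - used)
      return -1; /* truncated: caller needs a bigger buffer */
    used += (size_t)n;
    pasted++;
  }
  return pasted;
}
```

Header rows are skipped rather than merged, and pasting stops at the shorter file; a real command would need an explicit policy for both.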

[freebsd/gcc] Tests are failing

Errors:

gmake[2]: Entering directory '/Users/runner/work/zsv/zsv/app/test'
test-echo:
gmake[2]: *** [Makefile:100: test-echo] Illegal instruction (core dumped)
gmake[2]: Leaving directory '/Users/runner/work/zsv/zsv/app/test'
gmake[1]: Leaving directory '/Users/runner/work/zsv/zsv/app'
gmake[1]: *** [Makefile:348: test-echo] Error 2
gmake: *** [Makefile:57: test] Error 2

CI run: https://github.com/liquidaty/zsv/runs/6874125879?check_suite_focus=true#step:9:3432

Similar to #15.

Add GitHub Actions CI pipeline.


Supported platforms and compilers:

  • Linux (gcc)
  • MacOS (clang/gcc)
  • Windows (mingw)
  • BSD (gcc)

Add version.h to store the released version

Currently, there's no version file in the codebase.
The version is extracted using a git command and used as the compile-time macro VERSION wherever needed.
Also, the CI workflow is updated with a hardcoded, historical release version. See:

TAG: "0.3.2"

Instead of hardcoding the released version as TAG in the CI, it is suggested to use a version.h header file as the default place to store version information e.g. version, branch, date, etc.
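A minimal sketch of what such a header could look like. Apart from VERSION, which the report says is already used, the macro names are illustrative assumptions, and the 0.3.2 value is simply the TAG shown above. Keeping the #ifndef VERSION guard means an existing -DVERSION=... from the build would still take precedence.

```c
#include <assert.h>
#include <string.h>

/* ---- version.h (hypothetical layout) ---- */
#ifndef ZSV_VERSION_H
#define ZSV_VERSION_H

#define ZSV_VERSION_MAJOR 0
#define ZSV_VERSION_MINOR 3
#define ZSV_VERSION_PATCH 2

/* Two-step stringification so the numeric macros expand before # applies. */
#define ZSV_STR_(x) #x
#define ZSV_STR(x) ZSV_STR_(x)

/* Keep the existing compile-time VERSION macro working; -DVERSION=... wins. */
#ifndef VERSION
#define VERSION ZSV_STR(ZSV_VERSION_MAJOR) "." ZSV_STR(ZSV_VERSION_MINOR) "." ZSV_STR(ZSV_VERSION_PATCH)
#endif

#endif /* ZSV_VERSION_H */
/* ---- end of version.h ---- */
```

With a single source of truth like this, the CI could read the version from the header instead of hardcoding TAG.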

`zsv sql` returns exit code 0 for missing CLI args

Environment

  • Ubuntu 22.04 LTS
  • zsv 0.3.8-alpha

Description

Unlike stack, compare, and other similar commands, which return exit code 1 after printing their help text, the sql command returns 0:

$ zsv stack > /dev/null; echo $?
1

$ zsv compare > /dev/null; echo $?
1

$ zsv sql > /dev/null; echo $?
0
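A minimal sketch of the convention that stack and compare appear to follow, assuming a main-style handler per subcommand. demo_cmd_main and print_usage are hypothetical names, not zsv's code, and the usage line is a placeholder. Writing usage to stderr on error is a common choice rather than something the report specifies; an explicit help request still returns 0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Placeholder usage text for illustration only. */
static void print_usage(FILE *f) {
  fprintf(f, "Usage: <command> [options] <file.csv> ...\n");
}

/* Hypothetical subcommand entry point:
 *   explicit -h/--help -> usage on stdout, exit status 0
 *   missing arguments  -> usage on stderr, exit status 1 */
int demo_cmd_main(int argc, const char *argv[]) {
  assert(argc >= 1 && argv != NULL);
  if (argc > 1 && (strcmp(argv[1], "-h") == 0 || strcmp(argv[1], "--help") == 0)) {
    print_usage(stdout);
    return 0;
  }
  if (argc < 2) {
    print_usage(stderr);
    return 1;
  }
  /* ... parse options and run the command here ... */
  return 0;
}
```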

`zsv help 2json` returns exit code 5

Unlike other commands, zsv help 2json returns exit code 5, whereas the others return 0.
zsv help 2json should also return 0 to be consistent with the rest.

CLI:

$ zsv version
zsv version 0.3.8-alpha (lib 0.3.8-alpha)

$ zsv help 2json
2json: streaming CSV to json converter, or sqlite3 db to JSON converter

Usage: 
   2json [input.csv] [options]
   2json --from-db <sqlite3_filename> [options]

Options:
  -h, --help
  -o, --output <filename>       : output to specified filename
  --compact                     : output compact JSON
  --from-db                     : input is sqlite3 database
  --db-table <table_name>       : name of table in input database to convert
  --object                      : output as array of objects
  --no-empty                    : omit empty properties (only with --object)
  --database                    : output in database schema
  --no-header                   : treat the header row as a data row
  --index <name on expr>        : add index to database schema
  --unique-index <name on expr> : add unique index to database schema

$ echo $?
5

Screenshot:

Screenshot from 2024-03-06 23-21-59


OS: Ubuntu 22.04 LTS
zsv: 0.3.8-alpha

[Ubuntu 18.04] zsv linked GLIBC version issue

zsv did not work after installing the .deb packages.

Environment:

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.6 LTS
Release:	18.04
Codename:	bionic

Downloaded and tested with:

Error:

$ zsv version
zsv: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by zsv)
zsv: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by zsv)

Linked libraries:

$ ldd $(which zsv)
/usr/bin/zsv: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by /usr/bin/zsv)
/usr/bin/zsv: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by /usr/bin/zsv)
	linux-vdso.so.1 (0x00007ffe0ffed000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd781c9a000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd781a96000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd7816f8000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd781307000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fd781eb9000)

Because these binaries are built on Ubuntu 20.04, this issue is expected on releases older than 20.04, whose glibc is older than the version the binaries require.

To support Ubuntu 18.04, the binaries would need to be built on an Ubuntu 18.04 runner.
However, the Ubuntu 18.04 runner has been deprecated.

One solution could be to generate a statically linked binary to resolve this issue.
Another could be to build binaries on Docker containers and distribute those.

In addition, the minimum supported version of the target OS for the prebuilt binaries should be documented in the README in case static linking is not possible.

Thanks!
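The dynamic loader rejects the binary before main() runs, so a check like this cannot live inside zsv itself. It could, however, back a small preflight helper or test that compares the target system's glibc against the newest symbol version the binary needs (GLIBC_2.29 in the error above; Ubuntu 18.04 ships glibc 2.27). The sketch uses glibc's gnu_get_libc_version(); version_at_least and runtime_glibc_version are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>

#if defined(__GLIBC__)
#include <gnu/libc-version.h>
#endif

/* Return 1 if a "major.minor[...]" version string is at least
 * req_major.req_minor, 0 if it is older or cannot be parsed. */
int version_at_least(const char *ver, int req_major, int req_minor) {
  int major, minor;
  assert(ver != NULL);
  if (sscanf(ver, "%d.%d", &major, &minor) != 2)
    return 0;
  return major > req_major || (major == req_major && minor >= req_minor);
}

/* glibc version of the running system, or NULL if not built against glibc. */
const char *runtime_glibc_version(void) {
#if defined(__GLIBC__)
  return gnu_get_libc_version();
#else
  return NULL;
#endif
}
```

A preflight tool might print runtime_glibc_version() and refuse to install when version_at_least(v, 2, 29) returns 0.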

Lack of documentation and unclear API is a real drawback.

Hello,

I'm very interested in using zsvlib in a C project.

I'm looking for a way to use the library in order to achieve the same result as the command zsv select file.csv -- column_title to convert a column into a char**.

However, I find the API unclear, and the applications' source code seems quite convoluted, not to say obscure. I'm starting to understand that a zsvlib “Hello World” should start with:

struct zsv_opts* opts = zsv_get_default_opts();
zsv_parser parser = zsv_new(opts);

But, again, I feel like I'm at the beginning of a long journey before I finally feel comfortable with the API, and start producing the result I want to achieve.

I'm hoping to use this library in the future, so I wanted to signal this difficulty.

Best regards.
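The char** part of the question is independent of the parser. Assuming a row handler that receives each cell as a pointer plus a length (CSV parser cells are typically not NUL-terminated; check zsv's headers for the exact cell type and accessor functions), collecting one column can be sketched as a growable array of copied strings. The str_list type and its functions are hypothetical, not part of zsvlib.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Growable array of NUL-terminated strings: the char** result. */
struct str_list {
  char **items;
  size_t len, cap;
};

/* Append a NUL-terminated copy of the len bytes at s.
 * Returns 0 on success, -1 on allocation failure. */
int str_list_push(struct str_list *l, const unsigned char *s, size_t len) {
  assert(l != NULL && (s != NULL || len == 0));
  if (l->len == l->cap) {
    size_t ncap = l->cap ? l->cap * 2 : 16;
    char **p = realloc(l->items, ncap * sizeof *p);
    if (!p)
      return -1;
    l->items = p;
    l->cap = ncap;
  }
  char *copy = malloc(len + 1);
  if (!copy)
    return -1;
  if (len)
    memcpy(copy, s, len);
  copy[len] = '\0';
  l->items[l->len++] = copy;
  return 0;
}

/* Free every string and the array; the list can be reused afterwards. */
void str_list_free(struct str_list *l) {
  for (size_t i = 0; i < l->len; i++)
    free(l->items[i]);
  free(l->items);
  l->items = NULL;
  l->len = l->cap = 0;
}
```

Inside a row handler, one would look up the column index once from the header row and then push that cell's bytes for every data row.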

Incorrect columns for last row

A user reported that the last row of some data was showing the incorrect number of columns.
I tested the file with both the pull and simple examples, and it shows the same problem in both.

Here's a version of the data exhibiting the issue. Most text has been converted to underscores.
The file is semicolon (;) delimited, UTF-8 encoded, and has some non-ASCII characters:

wrong_cols.csv

The last row reports 46 columns instead of 37.

Adding a single character to one of the preceding rows (from row 396 onwards) results in a successful parse.
Pulling the header and last row into a new file results in a successful parse.

Tested on Windows (x64) and macOS (arm64).

[QA] Test nuget package (nupkg) with local feed

Follow these instructions to test on Linux/Ubuntu (use the same instructions on Windows, adjusting the paths accordingly):

Download the nuget package zsv-amd64-windows-mingw.nupkg from the CI artifacts at https://github.com/liquidaty/zsv/actions/runs/2474706978 to the ~/Downloads directory.

Install nuget:

sudo apt install nuget

Unzip:

unzip zsv-amd64-windows-mingw.nupkg.zip

Create a nuget feed directory:

mkdir nuget-feed

Add package:

nuget add ./zsv-amd64-windows-mingw.nupkg -source ./nuget-feed

Install package:

nuget install zsv -source $PWD/nuget-feed

In the current directory, there should now be a directory named zsv.8.6.22.1654706402.
The version may differ depending on the CI workflow run the package was downloaded from.

[PKG] Update custom homebrew formula with new release

The custom homebrew formula needs to be updated with each new release via CI workflow.

The current homebrew-zsv formula points to an older release, 0.0.4-alpha, while the latest one is v0.3.3-alpha.

Changes required:

  1. Version: https://github.com/liquidaty/homebrew-zsv/blob/a6b5b99f3e1acd286df0366d6eff09ec1f8ddcc6/formula/zsv.rb#L4
  2. Hash: https://github.com/liquidaty/homebrew-zsv/blob/a6b5b99f3e1acd286df0366d6eff09ec1f8ddcc6/formula/zsv.rb#L7

Suggested CI Workflow

With each new release, at the end of the CI run from main:

  1. Clone https://github.com/liquidaty/homebrew-zsv
  2. Calculate SHA256 of the generated tar file i.e. zsv-version-alpha-amd64-macosx-gcc.tar.gz
  3. Replace version and hash in https://github.com/liquidaty/homebrew-zsv/blob/main/formula/zsv.rb
  4. Commit and push

Inconsistent `zsv help <command>` formatting

Environment

  • OS: Ubuntu 22.04 LTS
  • zsv: v0.3.8-alpha

Description

The help output for some commands seems to be inconsistent in terms of:

  • line length
  • indentation of the flag description
  • space between short and long flags
  • placement of colon after flag
  • missing description
  • duplicate description
  • argument format (underscore vs space)
  • argument case mismatch (lowercase vs the rest)

Here are such observed instances:

  • zsv help (indentation)
  zsv (un)register [<extension_id>]: (un)register an extension
      Registration info is saved in zsv.ini located in a directory determined as:
        ZSV_CONFIG_DIR environment variable value, if set
        otherwise, /usr/local/etc

  -u,--malformed-utf8-replacement <replacement_string>: replacement string (can be empty) in case of malformed UTF8 input
       (default for "desc" commamnd is '?')
  • zsv help select (indentation, colon placement)
Usage: select [filename] [options] [-- col_specifier [... col_specifier]]
  where col_specifier is a column name or, if the -n option is used,
   a column index (starting at 1) or index range in the form of n-m
  e.g. select -n myfile.csv -- 1 4-6 50 10
       select myfile.csv -- first_col fiftieth_column "Tenth Column"

  -e <embedded lineend char>  : char to replace embedded lineend. if none provided, embedded lineends are preserved
      If the provided string begins with 0x, it will be interpreted as the hex representation of a string

  -n: provided column indexes are numbers corresponding to column positions (starting with 1), instead of names

  --whitespace-clean-no-newline: clean whitespace and remove embedded newlines
  -W,--no-trim: do not trim whitespace
  -o <output filename>: name of file to save output to
  • zsv help sql (indentation, colon placement)
sql: run ad hoc sql on a CSV file
          or join multiple CSV files on one or more common column(s)

Options:
  --join-indexes <n1...>: specify one or more column names to join multiple files by
     each n is treated as an index in the first input file that determines a column
     of the join. For example, if joining two files that, respectively, have columns
     A,B,C,D and X,B,C,A,Y then `--join-indexes 1,3` will join on columns A and C.
     When using this option, do not include an sql statement
  -b: output with BOM
  • zsv help count (missing description, one space indentation whereas others have two, redundant square brackets)
Usage: count [options]
Options:
 -h, --help            : show usage
 [-i, --input] <filename>: use specified file input
  • zsv help desc (colon placement)
Options:
  -b, --with-bom : output with BOM
  -C <maximum_number_of_columns>: defaults to 1024
  -H: only output header names
  -q, --quick: minimize example counts,
  -a, --all: calculate all metadata (for now, this only adds uniqueness info)
  -o <output filename>: name of file to save output to (defaults to stdout)
  • zsv help pretty (indentation, double colons)
  -W, --width <n>: set the max line width to output. if not provided
                            will try to detect automatically
  -p, --rows:             : set the number of (preview) rows to calculated widths from
                            if not provided, defaults to 150
  • zsv help flatten (long description, formatting of flags, colon placement)
flatten: flatten a table, based on a single-column key assuming that rows to flatten always appear in contiguous blocks

Usage: flatten [<filename>] [<options>] -- [aggregate_output_spec ...]
Each aggregate output specification is either (i) a single-column aggregation or (future: (ii) the "*" placeholder (in conjunction with -a)).
A single-column aggregation consists of the column name or index, followed by the equal sign (=) and then an aggregation method.
If the equal sign should be part of the column name, it can be escaped with a preceding backslash.

The following aggregation methods may be used:  array (pipe-delimited)
  array_<delim> (user-specified delimiter)

Options:
  -b: output with BOM
  -v, --verbose: display verbose messages
  -C <max columns to output>: maximum number of columns to output
  -m <max rows per aggregation>: defaults to 1024. If this limit is reached for any aggregation,
     an error will be output
  --row-id <Row ID column name>: Required. name of column to group by
  --col-name <Column ID column name>: name of column specifying the output column name
  -V <Value column name>: name of column specifying the output value
  (future: -a <Aggregation method>: aggregation method to use for the select-all placeholder)
  -o <output filename>: name of file to save output to
  • zsv help 2json (flags: extra space after comma)
Options:
  -h, --help
  -o, --output <filename>       : output to specified filename
  • zsv help 2tsv (indentation)
2tsv: convert CSV to TSV (tab-delimited text) suitable for simple-delimiter
       text processing. By default, embedded tabs or multilines will be escaped
       to \t, \n or \r, respectively
  • zsv help serialize (duplicate description)
serialize: Serialize a CSV file into Row/Colname/Value triplets

Usage: serialize [<filename>]
Serializes a CSV file
  • zsv help stack (flags)
zsv help stack
stack: stack one or more csv files vertically, aligning columns with the same name

Usage: stack [options] filename [filename...]

Options:
  -o <filename>: output file
  -b: output with BOM
  -q: always add double-quotes
  -T: input is tab-delimited, instead of comma-delimited
  • zsv help compare (missing description)
Usage: compare [options] [file1.csv] [file2.csv] [...]

There may be other instances too.

Not able to run paste

Hi,
First of all, thank you for developing this product.

I'm using zsv version 0.3.8-alpha. If I run zsv paste 01.csv 02.csv, I get:

71522 illegal hardware instruction  zsv paste 01.csv 02.csv

How to use paste?

Thank you

"Illegal instruction" when running on Intel Ivy Bridge CPU

Hello, I wanted to compare the speed benchmark for zsv with csvquote, and ran into this issue where zsv appears not to run on my 2012-era CPU. This is likely because my CPU does not support the AVX2 SIMD instruction set.

$ ./configure
$ sudo make install
$ zsv count testdata.csv
Illegal instruction
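The build commands quoted elsewhere on this page pass -mavx2 and -march=native, which allow the compiler to emit AVX2 (or anything else the build machine supports) anywhere in the binary. A common alternative is runtime dispatch: compile only the SIMD kernel for AVX2 and select it after checking the CPU. The sketch below counts occurrences of a byte such as a delimiter, using the GCC/Clang builtin __builtin_cpu_supports and the target("avx2") function attribute. It illustrates the technique only; it is not zsv's scanner, the names are hypothetical, and the file must be compiled without -mavx2 or -march=native for the fallback to be safe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_RUNTIME_AVX2_DISPATCH 1
#include <immintrin.h>
#endif

/* Portable fallback: runs on any CPU. */
static size_t count_byte_scalar(const char *buf, size_t n, char c) {
  size_t total = 0;
  for (size_t i = 0; i < n; i++)
    total += (buf[i] == c);
  return total;
}

#ifdef HAVE_RUNTIME_AVX2_DISPATCH
/* AVX2 is enabled for this function only, so the rest of the binary
 * stays runnable on CPUs without AVX2 (e.g. Ivy Bridge). */
__attribute__((target("avx2")))
static size_t count_byte_avx2(const char *buf, size_t n, char c) {
  const __m256i needle = _mm256_set1_epi8(c);
  size_t i = 0, total = 0;
  for (; i + 32 <= n; i += 32) {
    __m256i chunk = _mm256_loadu_si256((const __m256i *)(buf + i));
    unsigned mask = (unsigned)_mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, needle));
    total += (size_t)__builtin_popcount(mask); /* one bit per matching byte */
  }
  return total + count_byte_scalar(buf + i, n - i, c); /* leftover tail */
}
#endif

/* Pick the implementation at run time based on what the CPU supports. */
size_t count_byte(const char *buf, size_t n, char c) {
  assert(buf != NULL || n == 0);
#ifdef HAVE_RUNTIME_AVX2_DISPATCH
  if (__builtin_cpu_supports("avx2"))
    return count_byte_avx2(buf, n, c);
#endif
  return count_byte_scalar(buf, n, c);
}
```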

For the zsv benchmark, I think you may want to use a different test data set instead of https://burntsushi.net/stuff/worldcitiespop_mil.csv, because that data does not exercise any special CSV parsing capabilities. With this data set, the standard Unix text processing tools wc and cut (or awk) would suffice for the benchmark tests that count rows and select columns.

A better benchmark would use a large file (multiple GB) that contains quoted fields with embedded field and record separators (commas and newlines). A clever example of a self-generating CSV test file is here; it provides an infinite stream of random CSV data and can be adjusted to increase or decrease the portion of quoted fields that contain separator characters that should be treated as data.

Also, the benchmark test should check for correct results as well as measure speed.

Hope these suggestions are helpful.
Dan
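A deterministic generator in the spirit of this suggestion might look like the sketch below. It is not the generator referred to above; the function names, the small linear congruential generator, and the sample values are assumptions chosen so that runs are reproducible. Quoting follows RFC 4180: fields containing a comma, double quote, CR or LF are wrapped in quotes, and embedded quotes are doubled.

```c
#include <assert.h>
#include <string.h>

/* Append s to out[*used..] as one RFC 4180 field. Returns 0 on success,
 * -1 if out (total size outsz, including the NUL) is too small. */
static int put_field(char *out, size_t outsz, size_t *used, const char *s) {
  int quote = strpbrk(s, ",\"\r\n") != NULL;
  size_t need = strlen(s) + (quote ? 2 : 0);
  for (const char *p = s; *p; p++)
    need += (*p == '"'); /* each embedded quote is doubled */
  if (*used + need + 1 > outsz)
    return -1;
  char *w = out + *used;
  if (quote) *w++ = '"';
  for (const char *p = s; *p; p++) {
    if (*p == '"') *w++ = '"';
    *w++ = *p;
  }
  if (quote) *w++ = '"';
  *w = '\0';
  *used = (size_t)(w - out);
  return 0;
}

/* Small LCG so output is reproducible across runs and platforms. */
static unsigned next_rand(unsigned *state) {
  *state = *state * 1103515245u + 12345u;
  return (*state >> 16) & 0x7fff;
}

/* Write one CSV record of ncols fields, ending in "\n", into out.
 * Roughly tricky_pct percent of fields contain separators or quotes that a
 * correct parser must treat as data. Returns 0, or -1 on overflow. */
int gen_row(char *out, size_t outsz, unsigned *state, int ncols, unsigned tricky_pct) {
  static const char *const tricky[] = {"a,b", "line 1\nline 2", "say \"hi\", bye"};
  size_t used = 0;
  assert(out != NULL && outsz > 0 && ncols > 0);
  out[0] = '\0';
  for (int i = 0; i < ncols; i++) {
    const char *v = "plain";
    if (next_rand(state) % 100 < tricky_pct)
      v = tricky[next_rand(state) % 3];
    if (i > 0) { /* raw field separator, never quoted */
      if (used + 2 > outsz) return -1;
      out[used++] = ',';
      out[used] = '\0';
    }
    if (put_field(out, outsz, &used, v) != 0)
      return -1;
  }
  if (used + 2 > outsz) return -1;
  out[used++] = '\n';
  out[used] = '\0';
  return 0;
}
```

Because the generator knows exactly how many records it wrote, a benchmark can also compare the parser's row count against that number, which covers the correctness point as well as speed.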

make install fails for lib on Windows (MinGW + GitBash)

Description

make install fails for lib on Windows (MinGW + GitBash).

Errors:

Working directory is 'D:\a\zsv\zsv'
[command]"C:\Program Files\Git\bin\git.exe" version
git version 2.35.1.windows.2
...
./configure --prefix=build
make install
...
Building with mingw
mingw32-make[1]: Entering directory 'D:/a/zsv/zsv/src'
Using config file /d/a/zsv/zsv/config.mk
No profiling set. To use PGO, compile with PGO=1, then run with data, then compile again with PGO=2
No VECTOR_SIZE set, using 256
Makefile:13: /d/a/zsv/zsv/config.mk: No such file or directory
mingw32-make[1]: Leaving directory 'D:/a/zsv/zsv/src'
mingw32-make[1]: *** No rule to make target '/d/a/zsv/zsv/config.mk'.  Stop.
mingw32-make: *** [Makefile:54: install] Error 2

Some digging into it revealed that src/Makefile:147 was unable to create a target for LIBZSV_INSTALL which contains /d/a/zsv/zsv/config.mk.

Could you please share your Windows + MinGW environment specs with versions?

Expected

The build should be successful.

Actual

The build is failing.

Incorrect column count on last row with empty column

It appears that zsv produces an incorrect column count on the last row if the last column is empty and an EOL (end-of-line) is not provided.

The following CSV does not have an EOL on the final row. This should not affect the number of columns reported by the library, but it does: only a single column is reported on the last row, whereas the library correctly reports 2 columns on the preceding row.

one,two
1,2
3,
4,

This is the result of pull.c on the above.

Row 1 has 2 columns of which 2 are non-blank
Row 2 has 2 columns of which 2 are non-blank
Row 3 has 2 columns of which 1 is non-blank
Row 4 has 1 columns of which 1 is non-blank

As you can see, it incorrectly reports the last row as having only 1 column.

Adding an EOL to the last row produces a different result, with the last row now reporting 2 columns:

one,two
1,2
3,
4,

This is the result of pull.c with the EOL added to the last row:

Row 1 has 2 columns of which 2 are non-blank
Row 2 has 2 columns of which 2 are non-blank
Row 3 has 2 columns of which 1 is non-blank
Row 4 has 2 columns of which 1 is non-blank
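The expected behavior can be pinned down with a small reference counter: a final record without an EOL is still a record, and a trailing delimiter still implies one more (empty) column. The sketch below is not zsv's parser; csv_column_counts is a hypothetical, simplified quote-aware counter (it treats a lone CR as ordinary data) that could serve as an oracle in a regression test for this report and for the semicolon-delimited one above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reference column counter: counts the columns of each record in s[0..len),
 * honouring double-quoted fields so delimiters and newlines inside quotes
 * are data. Doubled quotes ("") toggle the state twice, so they need no
 * special case. Stores up to max counts; returns the number of records. */
size_t csv_column_counts(const char *s, size_t len, char delim,
                         size_t *counts, size_t max) {
  size_t rows = 0, delims = 0;
  int in_row = 0, in_quotes = 0;
  assert(s != NULL || len == 0);
  for (size_t i = 0; i < len; i++) {
    char c = s[i];
    if (c == '"') {
      in_quotes = !in_quotes;
      in_row = 1;
      continue;
    }
    if (in_quotes)
      continue; /* delimiter or newline inside a quoted field */
    if (c == '\n') {
      if (rows < max)
        counts[rows] = delims + 1;
      rows++;
      delims = 0;
      in_row = 0;
      continue;
    }
    in_row = 1;
    if (c == delim)
      delims++;
  }
  /* A final record without an EOL is still a record, and a trailing
   * delimiter still means one more (empty) column. */
  if (in_row) {
    if (rows < max)
      counts[rows] = delims + 1;
    rows++;
  }
  return rows;
}
```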

[QA] Test AMD64 Linux/gcc and Linux/clang RPM packages #18

Please download and install RPM packages on your side.

Download Linux/gcc and Linux/clang RPM packages from here:
https://github.com/liquidaty/zsv/actions/runs/2438944223

After downloading and extracting zsv-amd64-linux-gcc.rpm:

Install:

sudo yum install ./zsv-amd64-linux-gcc.rpm

Play around with the installed zsv to make sure it's working.

Uninstall:

sudo yum remove zsv

After Linux/gcc, please test zsv-amd64-linux-clang.rpm also.


During my testing, I found that zsv has a transitive dependency on libtinfo5 (built on Ubuntu 18.04 with rpmbuild).
I had to install it separately on Ubuntu 18.04 with apt.
For my testing, I used alien to install the RPM package on Ubuntu 18.04 and 20.04.
Besides, there's no explicit linking with this library (libtinfo) in the source code.
If you have knowledge about this, please do share. Thanks!


In addition, earlier (on May 23rd), the ldd output did not include libtinfo:

$ ldd ./build/Linux/rel/cc/bin/cli
    linux-vdso.so.1 (0x00007ffeb8beb000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f09a3794000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f09a378d000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f09a35a1000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f09a3953000)

But now the output includes libtinfo:

$ ldd ./build/Linux/rel/gcc/bin/cli 
    linux-vdso.so.1 (0x00007ffc0c9f4000)
    libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007f34e6ade000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f34e6abc000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f34e6ab5000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f34e68c9000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f34e6cab000)

On Ubuntu 18.04, it's libtinfo.so.5.
On Ubuntu 20.04, it's libtinfo.so.6.

We need to determine the cause of this addition and possibly remove this dependency.
Otherwise, this will indirectly be affecting the packaging and we'll have to ship this as part of the package.

remove allocations + optimize them

For better performance, you may want to remove allocations and store state (e.g. whether a string is escaped) on the stack, where possible.
Check the file size to estimate whether allocation is necessary.

Also: I don't see any use of arena allocation. Did I miss it?
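For reference, an arena (bump) allocator replaces many small malloc/free pairs with one up-front block: each allocation just advances an offset, and everything is released at once. The sketch below is a generic illustration with hypothetical names, not a description of how zsv manages memory; following the suggestion above, the initial capacity could be estimated from the input file size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Minimal bump allocator: one malloc up front, O(1) allocations,
 * everything released together. Not thread-safe. */
struct arena {
  unsigned char *base;
  size_t cap, used;
};

int arena_init(struct arena *a, size_t cap) {
  assert(a != NULL);
  a->base = malloc(cap);
  a->cap = a->base ? cap : 0;
  a->used = 0;
  return a->base ? 0 : -1;
}

/* Return n bytes aligned for any object type, or NULL if the arena is full. */
void *arena_alloc(struct arena *a, size_t n) {
  const size_t align = _Alignof(max_align_t); /* a power of two */
  if (!a->base)
    return NULL;
  size_t start = (a->used + align - 1) & ~(align - 1);
  if (start > a->cap || n > a->cap - start)
    return NULL;
  a->used = start + n;
  return a->base + start;
}

/* Discard all allocations but keep the block for reuse (e.g. per chunk). */
void arena_reset(struct arena *a) { a->used = 0; }

void arena_free(struct arena *a) {
  free(a->base);
  a->base = NULL;
  a->cap = a->used = 0;
}
```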

more precise description for UTF-8 behavior

The README says "Assumes valid UTF8, but does not misbehave if input contains bad UTF8", but this does not specify (1) what this assumption is made for, or (2) which aspects of UTF-8 are checked (codepoints, grapheme clusters, or more).

Typically, one means codepoints, but this is not explicit in the text.
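To make the distinction concrete, a codepoint-level check only verifies that the bytes form well-formed UTF-8 sequences (RFC 3629): valid lead and continuation bytes, no overlong encodings, no UTF-16 surrogates, and nothing above U+10FFFF. It says nothing about grapheme clusters. The sketch below is a generic validator with a hypothetical name; it does not describe what zsv actually checks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if s[0..n) is well-formed UTF-8 at the codepoint level, else 0. */
int utf8_valid(const unsigned char *s, size_t n) {
  size_t i = 0;
  assert(s != NULL || n == 0);
  while (i < n) {
    unsigned char c = s[i];
    size_t len;
    unsigned long cp;
    if (c < 0x80) { i++; continue; }                    /* ASCII */
    else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
    else return 0; /* stray continuation byte or invalid lead byte */
    if (n - i < len)
      return 0; /* truncated sequence */
    for (size_t k = 1; k < len; k++) {
      if ((s[i + k] & 0xC0) != 0x80)
        return 0; /* expected a continuation byte */
      cp = (cp << 6) | (s[i + k] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
        (len == 4 && cp < 0x10000) || (cp >= 0xD800 && cp <= 0xDFFF) ||
        cp > 0x10FFFF)
      return 0;
    i += len;
  }
  return 1;
}
```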

[QA] Test AMD64 Linux/gcc and Linux/clang debian packages

Please download and install debian packages on your side.

Download Linux/gcc and Linux/clang debian packages from here:
https://github.com/liquidaty/zsv/actions/runs/2426883993

After downloading and extracting zsv-amd64-linux-gcc.deb:

Install:

sudo apt install ./zsv-amd64-linux-gcc.deb

Play around with the installed zsv to make sure it's working.

Uninstall:

sudo apt remove zsv

After Linux/gcc, please test zsv-amd64-linux-clang.deb also.


During my testing, I found that zsv has a transitive dependency on libtinfo5 (built on Ubuntu 18.04).
I had to include it as part of the Debian package.
Besides, there's no explicit linking with this library in the source code.
If you have knowledge about this, please do share. Thanks!

build fails because `zsv_scan_fixed.c` can't be found

$ git clone https://github.com/liquidaty/zsv
$ cd zsv
$ ./configure
config will be saved to config.mk
checking for AWK tool... awk
checking for C compiler... cc
checking whether C compiler works... yes
checking for ar... gcc-ar
checking for ranlib... gcc-ranlib
checking host system type... x86_64-pc-linux-gnu
checking whether compiler accepts -Werror=unknown-warning-option... no
checking whether compiler accepts -Werror=unused-command-line-argument... no
checking whether compiler accepts -Werror=ignored-optimization-argument... no
checking whether linker accepts -Werror=unknown-warning-option... no
checking whether linker accepts -Werror=unused-command-line-argument... no
checking whether linker accepts -Werror=ignored-optimization-argument... no
checking whether compiler accepts -fvectorize... no
checking whether compiler accepts -ftree-vectorize... yes
checking whether compiler accepts -fopt-info-vec-optimized... yes
checking whether compiler accepts -fopt-info-vec-missed... yes
checking whether compiler accepts -fopt-info-vec-all... yes
checking whether compiler accepts -fpie... yes
checking whether compiler accepts -fpic... yes
checking whether linker accepts -pie... yes
checking whether linker accepts -fpic... yes
checking whether compiler accepts -pipe... yes
checking whether compiler accepts -ffunction-sections... yes
checking whether compiler accepts -fdata-sections... yes
checking whether compiler accepts -mavx2... yes
checking whether compiler accepts -mvpclmulqdq... yes
checking whether compiler accepts -flto... yes
checking whether compiler accepts -fvisibility=hidden... yes
checking whether linker accepts -Wl,--gc-sections... yes
checking whether linker accepts -flto... yes
checking whether linker accepts -fwhole-program... yes
checking whether linker accepts -march=native... yes
checking whether linker accepts -ldl... yes
checking whether linker accepts -Wl,-z,now... yes
checking whether linker accepts -Wl,-z,relro... yes
checking whether linker accepts -Wl,-z,now... yes
checking whether linker accepts -Wl,-z,relro... yes
checking whether compiler accepts _mm256_movemask_epi8 from immintrin.h...yes
checking whether there is a header called immintrin.h... yes
checking whether compiler accepts memmem from string.h...no
checking whether compiler accepts tgetent from termcap.h...yes
checking whether compiler accepts arc4random_uniform from stdlib.h...no
checking whether compiler accepts rand_s from stdlib.h with #define _CRT_RAND_S... no
checking whether compiler accepts __builtin_expect(0,0)...yes
checking whether compiler accepts __builtin_expect_with_probability(0,0,0.5)...yes
creating config.mk... done
$ make all
make[1]: Entering directory '/home/andrew/clones/zsv/src'
Using config file /home/andrew/clones/zsv/config.mk
No profiling set. To use PGO, compile with PGO=1, then run with data, then compile again with PGO=2
No VECTOR_SIZE set, using 256
cc -pipe -ffunction-sections -fdata-sections  -fvisibility=hidden -DHAVE__MM256_MOVEMASK_EPI8 -DHAVE_IMMINTRIN_H -DHAVE_TGETENT -DHAVE___BUILTIN_EXPECT -DHAVE___BUILTIN_EXPECT_WITH_PROBABILITY -mavx2 -DNDEBUG -O3 -fPIC -std=gnu11 -Wno-gnu-statement-expression -Wshadow -Wall -Wextra -Wno-missing-braces -pedantic -D_GNU_SOURCE  -I/home/andrew/clones/zsv/include -DNO_UTF8_CHECK -DVECTOR_SIZE_256 -o /home/andrew/clones/zsv/build/Linux/rel/cc/objs/zsv.o -c zsv.c
In file included from zsv.c:20:
zsv_internal.c:394:10: fatal error: zsv_scan_fixed.c: No such file or directory
  394 | #include "zsv_scan_fixed.c"
      |          ^~~~~~~~~~~~~~~~~~
compilation terminated.
make[1]: *** [Makefile:163: /home/andrew/clones/zsv/build/Linux/rel/cc/objs/zsv.o] Error 1
make[1]: Leaving directory '/home/andrew/clones/zsv/src'
make: *** [Makefile:49: all] Error 2

There is no zsv_scan_fixed.c file in my checkout, so... yeah. My next step was to try your downloads, but your web site is borked:

$ curl https://zsvhub.com
curl: (60) SSL: no alternative certificate subject name matches target host name 'zsvhub.com'
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

curl -k https://zsvhub.com works. So I tried my browser, and it finally let me through after a bunch of warnings; I was able to get a working binary that way.

[CI] Fix release workflow warnings

Release (v0.3.4-alpha) CI run: https://github.com/liquidaty/zsv/actions/runs/3660069700

Warnings:

ci (ubuntu-20.04)
Node.js 12 actions are deprecated. For more information see: https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/. Please update the following actions to use Node.js 16: dawidd6/action-get-tag@v1

ci (ubuntu-20.04)
The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/

ci (macos-12)
Node.js 12 actions are deprecated. For more information see: https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/. Please update the following actions to use Node.js 16: dawidd6/action-get-tag@v1

ci (macos-12)
The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/

These warnings are coming from:

    - uses: dawidd6/action-get-tag@v1
      if: startsWith(github.ref, 'refs/tags/v')
      id: tag
      with:
        strip_v: true

The rest of the actions were already updated earlier for the Node.js deprecation.

https://github.com/dawidd6/action-get-tag has been archived by its author.

As an alternative, the tag can directly be extracted and set like this (lines copied here for quick reference):

    - name: Get tag if tagged/released and set TAG env var
      if: startsWith(github.ref, 'refs/tags/v')
      run: |
        TAG=$(echo $GITHUB_REF | cut -d '/' -f3)
        echo "TAG: $TAG"
        if [[ $TAG == "v"* ]]; then
          TAG="${TAG:1}"
        fi
        echo "TAG: $TAG"
        echo "TAG=$TAG" >> $GITHUB_ENV

I'll submit a PR shortly. Thanks!

CC: @liquidaty

optimize data sizes

zsv_scanner, zsv_ctx, zsv_stack_colname, etc. look like they contain padding, since their char, bool, etc. fields are not at the end of the struct.

C compilers are not allowed to reorder struct fields, so there may be significant padding. Consider reordering the fields, e.g. from largest to smallest alignment.
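To illustrate the point, here is a minimal sketch using hypothetical structs (`struct interleaved`, `struct reordered` and `padding_saved` are illustrative names, not zsv's actual types). Both structs hold the same members; only the order differs. On common 64-bit targets the interleaved layout typically takes 40 bytes and the reordered one 24, because the compiler pads after each 1-byte member so the following pointer or size_t (and the next array element) stays 8-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layouts for illustration only; these are not zsv's structs. */

/* Small members interleaved with larger ones: the compiler inserts padding
 * after each 1-byte member so the following pointer/size_t stays aligned,
 * plus tail padding so arrays of the struct stay aligned. */
struct interleaved {
  char quote;
  const char *buff;
  bool header_done;
  size_t len;
  char delimiter;
};

/* Same members, ordered from largest to smallest alignment:
 * the three 1-byte members share a single block of tail padding. */
struct reordered {
  const char *buff;
  size_t len;
  char quote;
  char delimiter;
  bool header_done;
};

/* Bytes saved by the reordering. The assert guards the unsigned
 * subtraction against wrap-around. */
size_t padding_saved(void) {
  assert(sizeof(struct interleaved) >= sizeof(struct reordered));
  return sizeof(struct interleaved) - sizeof(struct reordered);
}
```

Calling `padding_saved()` (or printing `sizeof` for each real struct) before and after reordering is a quick way to check whether a change actually helps on a given target.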

Support shell command completion for Bash and other supported shells for better UX

Here's how minikube supports this as part of its binary:

$ minikube help completion 
Outputs minikube shell completion for the given shell (bash, zsh or fish)

	This depends on the bash-completion binary.  Example installation instructions:
	OS X:
		$ brew install bash-completion
		$ source $(brew --prefix)/etc/bash_completion
		$ minikube completion bash > ~/.minikube-completion  # for bash users
		$ minikube completion zsh > ~/.minikube-completion  # for zsh users
		$ source ~/.minikube-completion
		$ minikube completion fish > ~/.config/fish/completions/minikube.fish # for fish users
	Ubuntu:
		$ apt-get install bash-completion
		$ source /etc/bash-completion
		$ source <(minikube completion bash) # for bash users
		$ source <(minikube completion zsh) # for zsh users
		$ minikube completion fish > ~/.config/fish/completions/minikube.fish # for fish users

	Additionally, you may want to output the completion to a file and source in your .bashrc

	Note for zsh users: [1] zsh completions are only supported in versions of zsh >= 5.2
	Note for fish users: [2] please refer to this docs for more details https://fishshell.com/docs/current/#tab-completion

Usage:
  minikube completion SHELL [flags] [options]

Use "minikube options" for a list of global command-line options (applies to all commands).

zsv could support shell command completion as part of its binary in the same way.

Here's a small PoC for Bash: https://github.com/iamazeem/zsv-bash-completion

Demo: [animated image: zsv-bash-completion-demo]

Wasm package on NPM?

Hi there, I'm looking to use zsv in a browser via wasm. Is there already a published package on NPM with the compiled wasm to use?

Non-blank cell emitted for empty files

Hi,

I use the zsv library to parse simple CSV files with a fixed number of columns, and I expect zsv to emit a known number of cells for each row. However, empty files break that rule by emitting a non-blank cell:

$ examples/lib/build/simple /dev/null
Row 1 has 1 columns of which 1 is non-blank
$ examples/lib/build/pull /dev/null
Row 1 has 1 columns of which 1 is non-blank

I think zsv should handle that case so that the first call to zsv_next_row(zsv) does not return zsv_status_row.

$ git describe --tags
v0.3.5-alpha-7-g7780dac
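Until the parser handles this case itself, a caller could skip empty input before handing the stream to the parser. The sketch below is a hypothetical caller-side workaround, not zsv's fix, and `stream_is_empty` is an illustrative name. It peeks one byte with `fgetc` and pushes it back with `ungetc`, so it also works on pipes and character devices such as /dev/null, where a file-size check is not reliable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical caller-side guard; not part of the zsv API.
 * Returns true if the stream yields no bytes (also true on a read error;
 * check ferror(f) if the distinction matters). Otherwise the peeked byte
 * is pushed back so the parser still sees the complete input. */
bool stream_is_empty(FILE *f) {
  assert(f != NULL);
  int c = fgetc(f);
  if (c == EOF)
    return true;
  ungetc(c, f); /* the C standard guarantees one character of pushback */
  return false;
}
```

A caller would check `stream_is_empty(f)` first and treat the input as having zero rows instead of parsing it.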

Build fails with custom prefix

Description

I was trying to build with a custom prefix, i.e. build:

./configure --prefix=build
make install

The library builds successfully, but the cli fails at link time with these errors:

/usr/bin/ld: cannot find -lzsv
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:275: ~/zsv/build/Linux/rel/cc/bin/cli] Error 1
make[1]: Leaving directory '~/zsv/app'
make: *** [Makefile:46: install] Error 2

Also, the order of the linker flags seems incorrect:

cc -pipe -ffunction-sections -fdata-sections ... -lzsv -Lbuild/lib -lpthread -fwhole-program -march=native -ldl

-L should precede -l.
And -Lbuild/lib doesn't seem right either.
Shouldn't this be the complete path to the library inside the build directory, i.e.:

$ tree build
build/
└── Linux
    └── rel
        └── cc
            ├── bin
            ├── external
            │   ├── ...
            ├── lib
            │   └── libzsv.a
            └── objs
                ├── cli_2json.o
                ├── ...
                ├── utils
                │   ├── arg.o
                │   ├── ...
                │   └── writer.o
                └── zsv.o

Expected

The cli should build successfully.

Actual

The cli fails with a link error because libzsv.a is not found.
