csv_parser's Introduction

csv_parser

A simple C library for parsing CSV.

License: MIT

Overview

csv.c: Parse a single line of CSV (a string with no unescaped linebreaks)

split.c: Split a string of CSV (with unescaped linebreaks) into single lines of CSV

fread_csv_line.c: Extract a single line of CSV from a file

Documentation (csv.c)

char **parse_csv( const char *line );

Returns a NULL-terminated array of strings encoded in the indicated line of CSV.

Returns NULL if there was insufficient RAM or if the line is not properly encoded CSV.

The return value of parse_csv is malloc'd. Each string in the array is also malloc'd.

void free_csv_line( char **parsed );

Convenience function to free the parse_csv output. Frees each string in the array and then frees the array.
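The ownership rules above (a malloc'd, NULL-terminated array of malloc'd strings) can be illustrated with a small sketch. The helper names count_fields and free_string_array are hypothetical and are not part of csv_parser; free_string_array only mirrors what free_csv_line is documented to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Count the entries of a NULL-terminated string array.
 * The NULL terminator itself is not counted. */
size_t count_fields(char **fields)
{
    size_t n = 0;

    assert(fields != NULL);
    while (fields[n] != NULL)
        n++;
    return n;
}

/* Hypothetical stand-in mirroring the documented behaviour of
 * free_csv_line: free each string, then free the array itself. */
void free_string_array(char **fields)
{
    if (fields == NULL)
        return;
    for (char **p = fields; *p != NULL; p++)
        free(*p);
    free(fields);
}
```

With the real library, the same pattern would presumably be: call parse_csv, check the result for NULL, walk the array until the NULL terminator, then hand the array to free_csv_line.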

Documentation (split.c)

char **split_on_unescaped_newlines(const char *txt);

Given a string, which might contain unescaped linebreaks, get a NULL-terminated array of strings, each of which does not contain unescaped linebreaks.

Both the array and the strings in the array are malloc'd.
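The text above does not define "unescaped". The sketch below assumes the usual CSV convention that a linebreak is escaped when it sits inside a double-quoted field, and that a doubled quote ("") inside such a field is a literal quote. Under that assumption it counts the linebreaks that would separate lines. count_unescaped_newlines is a hypothetical helper, not a function from split.c.

```c
#include <assert.h>
#include <stddef.h>

/* Count '\n' characters that are outside double-quoted fields.
 * Assumption: quoting follows the common CSV convention; a doubled
 * quote toggles the state twice, so it leaves the state unchanged. */
size_t count_unescaped_newlines(const char *txt)
{
    int in_quotes = 0;
    size_t n = 0;

    assert(txt != NULL);
    for (const char *p = txt; *p != '\0'; p++) {
        if (*p == '"')
            in_quotes = !in_quotes;
        else if (*p == '\n' && !in_quotes)
            n++;
    }
    return n;
}
```

A string with k unescaped linebreaks and no trailing linebreak would then split into k+1 lines; how split_on_unescaped_newlines treats a trailing linebreak is not stated here.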

Documentation (fread_csv_line.c)

char *fread_csv_line(FILE *fp, int max_line_size, int *done, int *err);

Extract a line of CSV from a file.

max_line_size: Lines longer than this will cause fread_csv_line to return NULL.

done: Pointer to an int. The int is set to 1 when the end of fp is reached.

err: Pointer to an int. On error, an error code will be written to this int. Error codes are defined in csv.h: CSV_ERR_LONGLINE and CSV_ERR_NO_MEMORY.

Caveats

fread_csv_line is optimized for being called repeatedly until the file is exhausted. It both changes and depends on the file position (in the fseek sense) in an unpredictable way.

fread_csv_line shouldn't be called on different files in parallel. (Hopefully a proper init system will be added in the future to deal with this.)

Calling fread_csv_line on fp after a previous call exhausted the file (indicated by *done) is undefined behavior.

TODO

In split.c and fread_csv_line.c: Deal with carriage returns when "linebreak" doesn't mean "\n"

csv_parser's Issues

Multiple warnings when more compiler warning flags are enabled

% cat ./Makefile 
WFLAGS += -Wall -Wextra -Wpedantic -Wshadow
WFLAGS += -Wconversion -Wsign-conversion -Winit-self -Wunreachable-code -Wformat-y2k
WFLAGS += -Wformat-nonliteral -Wformat-security -Wmissing-include-dirs
WFLAGS += -Wswitch-default -Wtrigraphs -Wstrict-overflow=5
WFLAGS += -Wfloat-equal -Wundef -Wshadow
WFLAGS += -Wbad-function-cast -Wcast-qual -Wcast-align
WFLAGS += -Wwrite-strings
WFLAGS += -Winline

all:
	gcc -g $(WFLAGS) csv.c split.c fread_csv_line.c tests/test.c -o test
% make

csv.c: In function ‘parse_csv’:
csv.c:61:33: warning: conversion to ‘long unsigned int’ from ‘int’ may change the sign of the result [-Wsign-conversion]
   61 |     buf = malloc( sizeof(char*) * (fieldcnt+1) );
      |                                 ^
csv.c:102:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
  102 |                 fEnd = 1;
      |                 ~~~~~^~~
csv.c:103:13: note: here
  103 |             case ',':
      |             ^~~~
split.c: In function ‘split_on_unescaped_newlines’:
split.c:27:33: warning: conversion to ‘long unsigned int’ from ‘int’ may change the sign of the result [-Wsign-conversion]
   27 |     buf = malloc( sizeof(char*) * (nLines+1) );
      |                                 ^
split.c:48:26: warning: conversion to ‘size_t’ {aka ‘long unsigned int’} from ‘long int’ may change the sign of the result [-Wsign-conversion]
   48 |             size_t len = ptr - lineStart;
      |                          ^~~
fread_csv_line.c: In function ‘fread_csv_line’:
fread_csv_line.c:57:37: warning: conversion to ‘size_t’ {aka ‘long unsigned int’} from ‘int’ may change the sign of the result [-Wsign-conversion]
   57 |         buf = malloc( max_line_size + 1 );
      |                       ~~~~~~~~~~~~~~^~~
fread_csv_line.c:11:21: warning: conversion from ‘size_t’ {aka ‘long unsigned int’} to ‘int’ may change value [-Wconversion]
   11 |         fread_len = fread( read_buf, sizeof(char), READ_BLOCK_SIZE, fp );\
      |                     ^~~~~
fread_csv_line.c:74:9: note: in expansion of macro ‘QUICK_GETC’
   74 |         QUICK_GETC(ch, fp);
      |         ^~~~~~~~~~
fread_csv_line.c:11:21: warning: conversion from ‘size_t’ {aka ‘long unsigned int’} to ‘int’ may change value [-Wconversion]
   11 |         fread_len = fread( read_buf, sizeof(char), READ_BLOCK_SIZE, fp );\
      |                     ^~~~~
fread_csv_line.c:89:17: note: in expansion of macro ‘QUICK_GETC’
   89 |                 QUICK_GETC(ch, fp);
      |   
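Most of the warnings above come from a signed value (an int count or a pointer difference) being implicitly converted to size_t. One common way to address that kind of warning, sketched here under the assumption that negative values are never valid, is to check the sign first and then cast explicitly. alloc_ptr_array and span_length are hypothetical helpers, not code from csv_parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate an array of count+1 char pointers, all set to NULL, so the
 * last slot can serve as the terminator. Returns NULL if count is
 * negative, the size would overflow, or malloc fails. */
char **alloc_ptr_array(int count)
{
    size_t n;
    char **buf;

    if (count < 0)
        return NULL;
    n = (size_t)count + 1;            /* explicit, checked conversion */
    if (n > SIZE_MAX / sizeof *buf)
        return NULL;
    buf = malloc(n * sizeof *buf);
    if (buf != NULL) {
        for (size_t i = 0; i < n; i++)
            buf[i] = NULL;
    }
    return buf;
}

/* Length of the span [start, end). The pointer difference is a signed
 * ptrdiff_t, so the precondition end >= start is asserted before the
 * explicit cast to size_t. */
size_t span_length(const char *start, const char *end)
{
    assert(start != NULL && end != NULL && end >= start);
    return (size_t)(end - start);
}
```

The -Wimplicit-fallthrough warning is a separate matter: if the fall-through from the statement at csv.c:102 into case ',' is intentional, marking it with a fallthrough comment or attribute is the usual way to say so.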

LeakSanitizer: detected memory leaks

Thank you for your hard work, guys. This is a nice and useful library, but LeakSanitizer reports memory leaks.

How to repeat:

  1. Install LLVM for sanitizer
sudo apt -y install llvm
  2. Build with sanitizer
git clone https://github.com/semitrivial/csv_parser.git
cd csv_parser
clang -fsanitize=address -g -O0 -DDEBUG -Wall csv.c split.c fread_csv_line.c tests/test.c -o test
  3. Run with sanitizer
ASAN_OPTIONS=symbolize=1 ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer ./test
Running test_parse_csv...Success.
Running test_split_on_unescaped_newlines...Success.
Running test_fread_csv_line...Success.

=================================================================
==115538==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 26 byte(s) in 1 object(s) allocated from:
    #0 0x562f0af00b53 in strdup (/home/user/examples/csv_parser/test+0x8cb53) (BuildId: 6dbba5a4b3656ad04924963f1dc4003f03e58507)
    #1 0x562f0af524be in fread_csv_line /home/user/examples/csv_parser/fread_csv_line.c:106:12
    #2 0x562f0af52ec2 in test_fread_csv_line /home/user/examples/csv_parser/tests/test.c:99:16
    #3 0x562f0af52554 in run_test /home/user/examples/csv_parser/tests/test.c:12:12
    #4 0x562f0af525cc in main /home/user/examples/csv_parser/tests/test.c:25:3
    #5 0x7f4e037c1d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

Direct leak of 18 byte(s) in 1 object(s) allocated from:
    #0 0x562f0af00b53 in strdup (/home/user/examples/csv_parser/test+0x8cb53) (BuildId: 6dbba5a4b3656ad04924963f1dc4003f03e58507)
    #1 0x562f0af524be in fread_csv_line /home/user/examples/csv_parser/fread_csv_line.c:106:12
    #2 0x562f0af53060 in test_fread_csv_line /home/user/examples/csv_parser/tests/test.c:108:16
    #3 0x562f0af52554 in run_test /home/user/examples/csv_parser/tests/test.c:12:12
    #4 0x562f0af525cc in main /home/user/examples/csv_parser/tests/test.c:25:3
    #5 0x7f4e037c1d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x562f0af00b53 in strdup (/home/user/examples/csv_parser/test+0x8cb53) (BuildId: 6dbba5a4b3656ad04924963f1dc4003f03e58507)
    #1 0x562f0af524be in fread_csv_line /home/user/examples/csv_parser/fread_csv_line.c:106:12
    #2 0x562f0af52fd6 in test_fread_csv_line /home/user/examples/csv_parser/tests/test.c:105:16
    #3 0x562f0af52554 in run_test /home/user/examples/csv_parser/tests/test.c:12:12
    #4 0x562f0af525cc in main /home/user/examples/csv_parser/tests/test.c:25:3
    #5 0x7f4e037c1d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x562f0af00b53 in strdup (/home/user/examples/csv_parser/test+0x8cb53) (BuildId: 6dbba5a4b3656ad04924963f1dc4003f03e58507)
    #1 0x562f0af524be in fread_csv_line /home/user/examples/csv_parser/fread_csv_line.c:106:12
    #2 0x562f0af52e38 in test_fread_csv_line /home/user/examples/csv_parser/tests/test.c:96:16
    #3 0x562f0af52554 in run_test /home/user/examples/csv_parser/tests/test.c:12:12
    #4 0x562f0af525cc in main /home/user/examples/csv_parser/tests/test.c:25:3
    #5 0x7f4e037c1d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x562f0af00b53 in strdup (/home/user/examples/csv_parser/test+0x8cb53) (BuildId: 6dbba5a4b3656ad04924963f1dc4003f03e58507)
    #1 0x562f0af524be in fread_csv_line /home/user/examples/csv_parser/fread_csv_line.c:106:12
    #2 0x562f0af52dae in test_fread_csv_line /home/user/examples/csv_parser/tests/test.c:93:16
    #3 0x562f0af52554 in run_test /home/user/examples/csv_parser/tests/test.c:12:12
    #4 0x562f0af525cc in main /home/user/examples/csv_parser/tests/test.c:25:3
    #5 0x7f4e037c1d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

Direct leak of 14 byte(s) in 1 object(s) allocated from:
    #0 0x562f0af00b53 in strdup (/home/user/examples/csv_parser/test+0x8cb53) (BuildId: 6dbba5a4b3656ad04924963f1dc4003f03e58507)
    #1 0x562f0af524be in fread_csv_line /home/user/examples/csv_parser/fread_csv_line.c:106:12
    #2 0x562f0af52f4c in test_fread_csv_line /home/user/examples/csv_parser/tests/test.c:102:16
    #3 0x562f0af52554 in run_test /home/user/examples/csv_parser/tests/test.c:12:12
    #4 0x562f0af525cc in main /home/user/examples/csv_parser/tests/test.c:25:3
    #5 0x7f4e037c1d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

Direct leak of 14 byte(s) in 1 object(s) allocated from:
    #0 0x562f0af00b53 in strdup (/home/user/examples/csv_parser/test+0x8cb53) (BuildId: 6dbba5a4b3656ad04924963f1dc4003f03e58507)
    #1 0x562f0af524be in fread_csv_line /home/user/examples/csv_parser/fread_csv_line.c:106:12
    #2 0x562f0af52d24 in test_fread_csv_line /home/user/examples/csv_parser/tests/test.c:90:16
    #3 0x562f0af52554 in run_test /home/user/examples/csv_parser/tests/test.c:12:12
    #4 0x562f0af525cc in main /home/user/examples/csv_parser/tests/test.c:25:3
    #5 0x7f4e037c1d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

Direct leak of 12 byte(s) in 1 object(s) allocated from:
    #0 0x562f0af00b53 in strdup (/home/user/examples/csv_parser/test+0x8cb53) (BuildId: 6dbba5a4b3656ad04924963f1dc4003f03e58507)
    #1 0x562f0af524be in fread_csv_line /home/user/examples/csv_parser/fread_csv_line.c:106:12
    #2 0x562f0af530ea in test_fread_csv_line /home/user/examples/csv_parser/tests/test.c:111:16
    #3 0x562f0af52554 in run_test /home/user/examples/csv_parser/tests/test.c:12:12
    #4 0x562f0af525cc in main /home/user/examples/csv_parser/tests/test.c:25:3
    #5 0x7f4e037c1d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

Direct leak of 12 byte(s) in 1 object(s) allocated from:
    #0 0x562f0af00b53 in strdup (/home/user/examples/csv_parser/test+0x8cb53) (BuildId: 6dbba5a4b3656ad04924963f1dc4003f03e58507)
    #1 0x562f0af524be in fread_csv_line /home/user/examples/csv_parser/fread_csv_line.c:106:12
    #2 0x562f0af52c9a in test_fread_csv_line /home/user/examples/csv_parser/tests/test.c:87:16
    #3 0x562f0af52554 in run_test /home/user/examples/csv_parser/tests/test.c:12:12
    #4 0x562f0af525cc in main /home/user/examples/csv_parser/tests/test.c:25:3
    #5 0x7f4e037c1d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

Direct leak of 1 byte(s) in 1 object(s) allocated from:
    #0 0x562f0af00b53 in strdup (/home/user/examples/csv_parser/test+0x8cb53) (BuildId: 6dbba5a4b3656ad04924963f1dc4003f03e58507)
    #1 0x562f0af524be in fread_csv_line /home/user/examples/csv_parser/fread_csv_line.c:106:12
    #2 0x562f0af53174 in test_fread_csv_line /home/user/examples/csv_parser/tests/test.c:114:16
    #3 0x562f0af52554 in run_test /home/user/examples/csv_parser/tests/test.c:12:12
    #4 0x562f0af525cc in main /home/user/examples/csv_parser/tests/test.c:25:3
    #5 0x7f4e037c1d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

SUMMARY: AddressSanitizer: 145 byte(s) leaked in 10 allocation(s).
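Every leak in the report is allocated by strdup at fread_csv_line.c:106 and reached from test_fread_csv_line in tests/test.c, which suggests the returned line is heap-allocated and the test never frees it. A read loop that frees each line might look like the sketch below. It takes the reader as a function pointer with the same signature as fread_csv_line, so it compiles without the library. consume_lines and demo_reader are hypothetical names, and demo_reader is only a stand-in that hands out three malloc'd lines. Whether the call that sets *done also returns a final line is not documented above, so the loop simply frees any non-NULL pointer it receives.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as the documented fread_csv_line prototype. */
typedef char *(*csv_line_reader)(FILE *fp, int max_line_size,
                                 int *done, int *err);

/* Read until *done is set, freeing every returned line.
 * Returns the number of lines seen, or -1 if the reader returned
 * NULL before signalling the end of the file. */
long consume_lines(FILE *fp, int max_line_size, csv_line_reader read_line)
{
    int done = 0;
    int err = 0;
    long count = 0;

    assert(read_line != NULL);
    while (!done) {
        char *line = read_line(fp, max_line_size, &done, &err);
        if (line == NULL)
            return done ? count : -1;
        count++;
        /* ... use line here ... */
        free(line);   /* the caller owns the returned string */
    }
    return count;
}

/* Hypothetical stand-in reader: ignores fp and yields three malloc'd
 * lines, setting *done together with the last one. */
char *demo_reader(FILE *fp, int max_line_size, int *done, int *err)
{
    static const char *rows[] = { "a,b", "c,d", "" };
    static size_t next = 0;
    const size_t total = sizeof rows / sizeof rows[0];
    size_t len;
    char *copy;

    (void)fp;
    (void)max_line_size;
    *err = 0;
    if (next >= total) {
        *done = 1;
        return NULL;
    }
    len = strlen(rows[next]);
    copy = malloc(len + 1);
    if (copy != NULL)
        memcpy(copy, rows[next], len + 1);
    next++;
    if (next == total)
        *done = 1;
    return copy;
}
```

With the real library, fread_csv_line itself would be passed as the reader, together with an open FILE pointer.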
