Issues and discussions for kotlin-csv by doyaaaaaken

Each line should contain the same number of fields throughout the file.

So the CSV below is invalid.

a,b,c
d,e
f,g,h

But kotlin-csv doesn't throw an exception from the CsvFileReader.readAll and CsvFileReader.readAllAsSequence methods when reading this kind of CSV data.

Error AsSequence

val file = File("file.csv") // format: ZonedDateTime,BigDecimal
val values = csvReader.open(file) {
    val listStr = readAll()
    val size = listStr.map { ZonedDateTime.parse(it[0]) to BigDecimal(it[1]) }.size
    println(size)
    readAllAsSequence().map { ZonedDateTime.parse(it[0]) to BigDecimal(it[1]) }.toMap()
}

println(values.size)

output:

1490
0

PS: Sorry, I realized my mistake: the file needs to be opened again.
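
The 1490 and 0 in the output above are consistent with a reader that has already been consumed by the first full read. A minimal sketch of that effect, using only the standard library rather than kotlin-csv itself (the helper name and sample data are made up for illustration):

```kotlin
import java.io.BufferedReader
import java.io.StringReader

// Counts how many lines two consecutive full reads of the same reader return.
fun readTwice(text: String): Pair<Int, Int> {
    val reader = BufferedReader(StringReader(text))
    val first = reader.lineSequence().count()   // reads to the end of the input
    val second = reader.lineSequence().count()  // nothing is left to read
    return first to second
}
```

Opening the source again before the second read avoids the empty result.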

Introduce QUOTE_NONNUMERIC on WriteQuoteMode

In some cases, it may be better not to quote numeric fields (e.g. when opening the written CSV in Microsoft Excel).

The CSV below shows the output of the QUOTE_NONNUMERIC option.

"a","b",1
2.0,"e","f"
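
A rough sketch of how such a mode could format one row. The helper is hypothetical and not part of kotlin-csv; writing null as an empty field is an assumption made here for illustration:

```kotlin
// Hypothetical helper: quotes every field except numbers.
fun formatRowQuoteNonNumeric(
    row: List<Any?>,
    quoteChar: Char = '"',
    delimiter: Char = ','
): String = row.joinToString(delimiter.toString()) { field ->
    when (field) {
        null -> ""                       // assumption: null becomes an empty field
        is Number -> field.toString()    // numbers are written without quotes
        else -> {
            val q = quoteChar.toString()
            // Escape embedded quote characters by doubling them.
            q + field.toString().replace(q, q + q) + q
        }
    }
}
```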

Skip mismatched row fields number option?

Is it possible to add an option to skip (or even read) rows whose number of fields differs from the other rows, instead of throwing an exception? I'm willing to open a PR myself if you're open to adding this option.

Let me know what your thoughts are on this.

Add Kotlin/Js build target

refs #15

Contain line number in MalformedCSVException

The MalformedCSVException thrown in the CSV parsing logic doesn't contain the line number.
e.g.

kotlin-csv/src/main/kotlin/com/github/doyaaaaaken/kotlincsv/parser/ParseStateMachine.kt

Line 66 in 22e0498

throw MalformedCSVException("$pos")

It would be better to report the line number and column number of the malformed position.
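
A sketch of what an exception carrying the position might look like. Both the exception class and the check are hypothetical, and the check is deliberately naive (it only looks for a quote that is never closed):

```kotlin
// Hypothetical exception type that records where the problem was found.
class PositionedCsvException(val rowNum: Long, val colIndex: Long, detail: String) :
    RuntimeException("$detail (row $rowNum, column $colIndex)")

// Naive example check: a quoted field must be closed before the row ends.
fun requireClosedQuotes(line: String, rowNum: Long, quoteChar: Char = '"') {
    var inQuotes = false
    var openedAt = 0
    line.forEachIndexed { index, ch ->
        if (ch == quoteChar) {
            if (!inQuotes) openedAt = index + 1   // 1-based column of the most recent opening quote
            inQuotes = !inQuotes
        }
    }
    if (inQuotes) throw PositionedCsvException(rowNum, openedAt.toLong(), "unterminated quoted field")
}
```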

Ability to read line by line file with header

It is very common for CSV files to have a header row. However, I cannot always store the whole file in memory, nor do I always need to. Right now it is possible either to read a file line by line or to read all data with the header. It would be really useful to be able to do both at the same time.
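
A minimal sketch of the requested behaviour: the header is read once, and every later line is mapped lazily. This uses only the standard library; the extension function is hypothetical and its splitter is naive (it does not handle quoted fields):

```kotlin
import java.io.BufferedReader
import java.io.StringReader

// Hypothetical helper: reads the header eagerly, then yields one map per data line on demand.
fun BufferedReader.rowsWithHeader(delimiter: Char = ','): Sequence<Map<String, String>> {
    val headerLine = readLine() ?: return emptySequence()
    val header = headerLine.split(delimiter)
    // lineSequence() is lazy, so only the current line is held in memory.
    return lineSequence().map { line -> header.zip(line.split(delimiter)).toMap() }
}
```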

Wrong quotes and missing escape characters on writing CSV with WriteQuoteMode.ALL

When we use WriteQuoteMode.ALL, the writer ignores the quote character specified in the CsvWriteQuoteContext and always uses " instead. It also does not escape any special characters, which can produce invalid CSV output.

Looking at the source code, it's pretty obvious why:

kotlin-csv/src/jvmMain/kotlin/com/github/doyaaaaaken/kotlincsv/client/CsvFileWriter.kt

Line 130 in 8dbaa07

WriteQuoteMode.ALL -> "\"$field\""
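
One way that branch could be written is sketched below. The function and its parameter names are assumptions, not the library's code; quotes are escaped by doubling, as RFC 4180 describes:

```kotlin
// Hypothetical replacement for the hard-coded quoting shown above.
fun quoteField(field: String, quoteChar: Char = '"'): String {
    val q = quoteChar.toString()
    // A quote character inside the field is escaped by doubling it.
    return q + field.replace(q, q + q) + q
}
```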

Add Kotlin/Native build target

Add Kotlin/Native build target on the multiplatform project.

Handle duplicated headers

Hey @doyaaaaaken thank you very much for this library!

kotlin-csv/src/commonMain/kotlin/com/github/doyaaaaaken/kotlincsv/client/CsvFileReader.kt

Lines 63 to 64 in 48510de

    
val duplicated = findDuplicate(headers)
if (duplicated != null) throw MalformedCSVException("header '$duplicated' is duplicated")

When reading all lines with headers, a duplicate check is performed and a MalformedCSVException is thrown.
Since columns in a row are accessed by their index, headers could simply be deduplicated by appending something like an occurrence indicator to the header name.

Example:

just	some	example	example	headers
A	B	C	D	E

The header example appears twice. According to the suggestion the second occurrence could be named example_01.
Main benefit would be to not have a runtime exception thrown and not needing to rename columns in the original file to have it processed.

I know this would introduce some "magic" and I hope that wouldn't collide with your goal of having a simple library.
The required functionality would be just a few lines of code and I'd very gladly do a PR myself, just first wanted to get your thoughts on this.
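
A sketch of the suggested renaming, using the example_01 naming from the example above. The helper is hypothetical, and it does not guard against a file that already contains a header named example_01:

```kotlin
// Hypothetical helper: keeps the first occurrence and renames repeats to name_01, name_02, ...
fun deduplicateHeaders(headers: List<String>): List<String> {
    val seen = mutableMapOf<String, Int>()
    return headers.map { header ->
        val count = seen[header] ?: 0
        seen[header] = count + 1
        if (count == 0) header else "%s_%02d".format(header, count)
    }
}
```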

java.lang.NoClassDefFoundError: mu/KotlinLogging

Describe the problem

It's not quite a bug, but I'm stuck on this exception:

Caused by: java.lang.NoClassDefFoundError: mu/KotlinLogging
	at com.github.doyaaaaaken.kotlincsv.client.CsvFileReader.<init>(CsvFileReader.kt:21)
	at com.github.doyaaaaaken.kotlincsv.client.CsvReader.open(CsvReader.kt:129)
	at com.github.doyaaaaaken.kotlincsv.client.CsvReader.readAll(CsvReader.kt:48)

It happens only on Ubuntu. MacOS is still fine.

Environment

kotlin-csv version: 0.11.0
java version: java8
kotlin version: 1.4.10
OS: Ubuntu 18.04.5

Any suggestions on this? Thank you!

Skip empty line option on reading

According to the CSV specification, an empty line between CSV rows is not allowed.
However, there is demand for reading that kind of file.

So we want to provide this as a csvReader option, as shown below.

val reader = csvReader {
    skipEmptyLine = true
}

val str = """a,b,c

d,e,f
"""
// Can read CSV containing an empty line.
reader.read(str)

Write in a Suspending function

Here, we can read in a suspending function.
https://github.com/doyaaaaaken/kotlin-csv#read-in-a-suspending-function
The background is here #66.

It would be better if we could also use openAsync on CSVWriter, not only on CSVReader.

Write Without Header

With the current implementation, whenever we write rows, the first row is automatically added as a header. There should be an option to say that no header is needed.

Output log when skipping a mismatched row

Output a log message when a mismatched row is skipped.
Change the code to deal with these TODO comments.

kotlin-csv/src/jvmMain/kotlin/com/github/doyaaaaaken/kotlincsv/client/CsvFileReader.kt

Line 47 in 29009f9

//TODO - log as info level about skipped row.

kotlin-csv/build.gradle.kts

Line 67 in e9b7971

// TODO - investigation and approval

Discussed here #48 (comment)

Benchmark performance

Problem with parsing CSV file with spaces and colon

Good morning,

I'm trying to use kotlin-csv on a comma delimited file (input stream) but I think there is a problem with managing the spaces and colon.

In particular, these are the first lines of the file:

Device serial,Date ,Temperature 51 (Medium) °C
11869,2021-02-09 00:14:59,7.2
11869,2021-02-09 00:30:01,7.1
11869,2021-02-09 00:44:59,7.2
11869,2021-02-09 00:59:59,7.4
11869,2021-02-09 01:14:58,7.5
11869,2021-02-09 01:29:58,7.5
11869,2021-02-09 01:44:58,7.3
11869,2021-02-09 01:59:58,7.2
11869,2021-02-09 02:14:58,7.2
11869,2021-02-09 02:29:58,7.2
11869,2021-02-09 02:44:58,7.2
11869,2021-02-09 02:59:57,7.3
11869,2021-02-09 03:14:57,7.2
11869,2021-02-09 03:29:57,7.3
11869,2021-02-09 03:44:57,7.2
11869,2021-02-09 03:59:57,7.4

...while this is the script:

import com.github.doyaaaaaken.kotlincsv.dsl.csvReader
import java.io.InputStream

data class DataClass(
    val FirstColumn: String,
    val SecondColumn: String,
    val ThirdColumn: String
)

fun parse(data: InputStream): List<DataClass> {
    val list = mutableListOf<DataClass>()
    try {
        // Read every row (including the header row) as a list of string fields.
        val rows: List<List<String>> = csvReader().readAll(data)
        for (row in rows) {
            list.add(DataClass(FirstColumn = row[0], SecondColumn = row[1], ThirdColumn = row[2]))
        }
    } catch (e: Exception) {
        e.printStackTrace()
    }
    return list
}

Unfortunately only the first column of each row is correctly identified in the output, for example (first row):
first column: 11869
second column: 2021-02-09
third column: 0
No other columns detected.

Where is my mistake?
Thank you

Kotlin/Native Support

https://kotlinlang.org/docs/reference/multiplatform.html

Increase coverage on CsvReader.kt

Coverage can be seen here.
https://codecov.io/gh/doyaaaaaken/kotlin-csv/src/28ef24ad5b128fb1e5ae7d5707e0f4c11009214e/src/main/kotlin/com/github/doyaaaaaken/kotlincsv/client/CsvReader.kt

Make it easy to apply settings for TSV read/write

Prepare a preset of CsvReaderContext/CsvWriteContext settings for TSV, and make it available for use.

allow `open` callback (and others) to accept a suspending function

Output last line terminator option for CsvWriter class

According to the CSV specification,
there may or may not be a line break at the end of the file.

The last record in the file may or may not have an ending line break.

So, introduce the option below.

val writer = csvWriter {
    outputLastLineTerminator = false
}
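
The effect of such an option can be sketched with a small helper. The function is hypothetical and assumes the rows have already been formatted as CSV lines; only the option name comes from the proposal above:

```kotlin
// Hypothetical helper: joins formatted rows and optionally appends a final line terminator.
fun joinRows(
    rows: List<String>,
    lineTerminator: String = "\r\n",
    outputLastLineTerminator: Boolean = true
): String {
    val body = rows.joinToString(lineTerminator)
    return if (outputLastLineTerminator && rows.isNotEmpty()) body + lineTerminator else body
}
```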

Pluggable CsvParser by using existing library in JvmMain

Use an existing mature Java CSV library as the default in the JVM project, and make the project's own CsvParser optional.

Functions that use lambdas should be inlined where possible

I noticed that functions using lambdas aren't utilizing Kotlin's inline keyword. This could have avoidable performance impacts.

Take, for example, this function:

fun csvReader(init: CsvReaderContext.() -> Unit = {}): CsvReader {
    val context = CsvReaderContext().apply(init)
    return CsvReader(context)
}

The JVM doesn't have higher-order functions, so Kotlin must generate a class (a "SAM type") with the lambda's code in a single method. This doesn't matter too much if Kotlin can generate a singleton object, but in this case it can't, as CsvReaderContext() is captured in the closure. So, every time this function is invoked the lambda's class must be instantiated with CsvReaderContext() in a field, invoked, then garbage collected right after. (Correct me if I'm wrong)

*Small correction: this is a bad example. Looking at the bytecode, the reader context is passed as a method parameter.

I'm not sure how this works on other platforms, but on the JVM this impacts performance.

To avoid this, Kotlin provides inline functions, which inline the function's bytecode at the call site. This mitigates the performance issues above at the expense of larger bytecode. If internal types or functions are used, you can add the @PublishedApi annotation to allow the inline function to access them; the annotated declaration becomes public in the bytecode but is still treated as internal by Kotlin.

A more impactful example would be the open functions in CsvReader.kt and CsvWriter.kt

Now, whether this is that big of a deal in this case is debatable.
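
A minimal sketch of the inline plus @PublishedApi combination described in this issue. The two classes are hypothetical stand-ins, not the library's CsvReaderContext and CsvReader:

```kotlin
// Hypothetical stand-in for CsvReaderContext.
class ReaderContext {
    var delimiter: Char = ','
}

// Hypothetical stand-in for CsvReader. The constructor is internal, so a public
// inline function may only call it because it is marked @PublishedApi.
class Reader @PublishedApi internal constructor(val delimiter: Char)

// `inline` copies this body and the lambda into the call site,
// so no function object is allocated for `init`.
inline fun reader(init: ReaderContext.() -> Unit = {}): Reader {
    val context = ReaderContext().apply(init)
    return Reader(context.delimiter)
}
```

A call such as reader { delimiter = '\t' } then compiles to straight-line code at the call site.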

Feature request: Differentiate between empty string and null value

readAllWithHeader yields a List<Map<String,String>> and hence empty columns are being read as empty strings, so that we get "" for both col1 and col2 in the following example:

"col1","col2"
"",

I'd really like to get null for col2 here (this of course only makes sense if all strings are quoted, otherwise it wouldn't be clear how to interpret empty columns). I understand that you can't change the result to List<Map<String,String?>> now, but maybe you could add a nullCode option for reading as it already exists for writing. The default value is an empty string "" (=current behavior). I could then simply do

val nullCode = "NULL"
val rows = csvReader(nullCode=nullCode).readAllWithHeader(inputStream)
    .map { row -> row.mapValues { col -> if (col.value == nullCode) null else col.value } }

At first glance it seems that it only requires to change

kotlin-csv/src/commonMain/kotlin/com/github/doyaaaaaken/kotlincsv/parser/ParseStateMachine.kt

Lines 36 to 48 in c23a51b

    
                    delimiter -> {
                        flushField()
                        state = ParseState.DELIMITER
                    }
                    '\n', '\u2028', '\u2029', '\u0085' -> {
                        flushField()
                        state = ParseState.END
                    }
                    '\r' -> {
                        if (nextCh == '\n') pos += 1
                        flushField()
                        state = ParseState.END
                    }

to

                    delimiter -> {
                        field.append(nullCode)
                        flushField()
                        state = ParseState.DELIMITER
                    }
                    '\n', '\u2028', '\u2029', '\u0085' -> {
                        field.append(nullCode)
                        flushField()
                        state = ParseState.END
                    }
                    '\r' -> {
                        if (nextCh == '\n') pos += 1
                        field.append(nullCode)
                        flushField()
                        state = ParseState.END
                    }

and the same for

kotlin-csv/src/commonMain/kotlin/com/github/doyaaaaaken/kotlincsv/parser/ParseStateMachine.kt

Lines 87 to 99 in c23a51b

    
                    delimiter -> {
                        flushField()
                        state = ParseState.DELIMITER
                    }
                    '\n', '\u2028', '\u2029', '\u0085' -> {
                        flushField()
                        state = ParseState.END
                    }
                    '\r' -> {
                        if (nextCh == '\n') pos += 1
                        flushField()
                        state = ParseState.END
                    }

but I didn't check it thoroughly.

CsvFileReader#readAllAsSequence fails on non-equal sized rows

Describe the bug
If rows are not equal-sized an exception is thrown:
com.github.doyaaaaaken.kotlincsv.util.CSVFieldNumDifferentException: Fields num seems to be 4 on each row, but on 2th csv row, fields num is 3.

To Reproduce

        csvWriter().open(csvFile) {
            writeRow(listOf("a"))
            writeRow(listOf("a", "b"))
        }
        csvReader().open(csvFile) {
           readAllAsSequence()
        }

Expected behavior
Missing cells are treated as nulls or empty strings.

Environment

kotlin-csv version 0.11.0
OS: Android 10

Screenshots
N/A

number of fields in a row has to be based on the header

To reproduce

Take a CSV file that has a header row with three columns and two data rows: the first data row has two columns and the second has three. Like this:

First name	Last name	Citizenship
John	Bobkins
Michael	Pepkins	US

When 'readAllWithHeaderAsSequence' is invoked on this file, a CSVFieldNumDifferentException is thrown saying that two columns are expected but three are found. This happens because the 'fieldsNum' variable in CsvFileReader.kt is initialized from the first data row, whereas it should be initialized from the header row.

Expected behavior
The following code should return two rows:

csvReader().open(filePath) {
    readAllWithHeaderAsSequence().forEach {
        . . .
    }
}
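
A sketch of the header-based behaviour requested here: the header defines the expected field count, and missing trailing cells are filled in. The helper and the choice of an empty string as filler are assumptions for illustration:

```kotlin
// Hypothetical helper: maps one data row onto the header, filling missing trailing cells.
fun rowToMap(header: List<String>, row: List<String>, missing: String = ""): Map<String, String> {
    // A row longer than the header is still treated as an error in this sketch.
    require(row.size <= header.size) { "row has ${row.size} fields but header has only ${header.size}" }
    return header.mapIndexed { i, name -> name to (row.getOrNull(i) ?: missing) }.toMap()
}
```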

Environment

kotlin-csv version 0.11.1
java version - java8
OS: Windows 10

Publish document

Like this service
https://www.gitbook.com/

Write directly to a String

Is there any way to write to a String instead of a File?
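
As an illustration of the idea only (not the library's API), a writer that targets any Appendable can produce a String through StringWriter. The helpers below are hypothetical and omit quoting, so fields must not contain delimiters or line breaks:

```kotlin
import java.io.StringWriter

// Hypothetical minimal writer: one line per row, null written as an empty field.
fun writeRows(out: Appendable, rows: List<List<Any?>>) {
    for (row in rows) {
        out.append(row.joinToString(",") { it?.toString() ?: "" }).append("\r\n")
    }
}

// Collects the output in memory instead of writing to a file.
fun rowsToString(rows: List<List<Any?>>): String =
    StringWriter().also { writeRows(it, rows) }.toString()
```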

Is there a way to alter existing csv file columns while reading from it?

For example, I have a CSV file that includes a column formatted as JSON, and all I want to do is reformat that field in each row without having to rewrite the whole file.

Long-running write

Please allow writing to csv file without having to close it.
I have a streaming scenario where I need to write each piece of data I receive to CSV. Closing and reopening the file after each batch would be suboptimal.

Thanks
David

Update to Kotlin 1.4.x

Hi,

Would be nice to see support for latest Kotlin 1.4

Thanks,

Adrian

Reuse config between reader and writer

When reading/writing you usually want to use the same config, i.e. charset, quoteChar, etc.

It would be great if we could write this config once, and then reuse in both read and write.

Something like:

val context = CsvContext {
    charset = "UTF-8"
}

csvReader(context)...
csvWriter(context)...
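
The idea of one shared settings object can be sketched as follows. Every name here is hypothetical, and the reader is deliberately naive (it assumes all fields are quoted and contain neither the delimiter nor the quote character):

```kotlin
// Hypothetical shared settings used by both sides.
data class CsvContext(
    val charset: String = "UTF-8",
    val quoteChar: Char = '"',
    val delimiter: Char = ','
)

class SimpleWriter(private val ctx: CsvContext) {
    // Quotes every field with the shared quote character.
    fun format(row: List<String>): String =
        row.joinToString(ctx.delimiter.toString()) { "${ctx.quoteChar}$it${ctx.quoteChar}" }
}

class SimpleReader(private val ctx: CsvContext) {
    // Splits on the shared delimiter and strips the shared quote character.
    fun parse(line: String): List<String> =
        line.split(ctx.delimiter).map { it.trim(ctx.quoteChar) }
}
```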

Increase coverage on ParseStateMachine

Test code example.

kotlin-csv/src/test/kotlin/com/github/doyaaaaaken/kotlincsv/parser/CsvParserTest.kt

Line 27 in 84d7f38

parser.parseRow("a,") shouldBe listOf("a", "")

Coverage can be seen here.
https://codecov.io/gh/doyaaaaaken/kotlin-csv/src/master/src/main/kotlin/com/github/doyaaaaaken/kotlincsv/parser/ParseStateMachine.kt

java.lang.NoClassDefFoundError: com/github/doyaaaaaken/kotlincsv/dsl/CsvReaderDslKt

Describe the bug
Cannot find the dsl

To Reproduce

plugins {
	kotlin("jvm") version "1.6.10"
        ...
}
...
dependencies {
	implementation("com.github.doyaaaaaken:kotlin-csv-jvm:1.2.0")
        ...
}
...
val rows = csvReader().readAll(inputStream) // throws error

java.lang.NoClassDefFoundError: com/github/doyaaaaaken/kotlincsv/dsl/CsvReaderDslKt

Expected behavior
A clear and concise description of what you expected to happen.

Environment

kotlin-csv version: 1.6.10
java version: 11.0.3
OS: MacOS

Screenshots
If applicable, add screenshots to help explain your problem.

Remove logger 3rd party library

Quickly looking at the code it seems like there's only one log statement:

kotlin-csv/src/commonMain/kotlin/com/github/doyaaaaaken/kotlincsv/client/CsvFileReader.kt

Line 48 in 8108e5b

    
logger.info { "skip miss matched row. [csv row num = ${idx + 1}, fields num = ${row.size}, fields num of first row = $fieldsNumInRow]" }

Do we really need to pull an entire library for logging?

kotlin-csv/build.gradle.kts

Line 50 in ed7a678

implementation("io.github.microutils:kotlin-logging:2.0.11")

I'm an Android user and currently that log would go basically nowhere.

Test for CSV Reader/Writer compatibility

Test if CSV written by CSVWriter can be read by CSVReader.

EROFS (Read-only file system)

I'm getting this error. Please help.
Caused by: java.io.FileNotFoundException: test.csv: open failed: EROFS (Read-only file system)

Custom formatter for any type on writing CSV

We can write any kind of List<List<Any?>> data.
When writing a field, we use Any?.toString().
It would be better if the output format could be customized.
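
A sketch of what a pluggable formatter could look like. The function is hypothetical: a caller-supplied lambda decides how a value is written and returns null to fall back to toString():

```kotlin
import java.math.BigDecimal

// Hypothetical helper: `formatter` returns the text for a value, or null to use the default.
fun formatRow(row: List<Any?>, formatter: (Any?) -> String? = { null }): String =
    row.joinToString(",") { value -> formatter(value) ?: (value?.toString() ?: "") }

// Example formatter: write BigDecimal values without trailing zeros.
val plainDecimals: (Any?) -> String? = { v ->
    if (v is BigDecimal) v.stripTrailingZeros().toPlainString() else null
}
```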

kotlin-csv.kotlin_module: Module was compiled with an incompatible version of Kotlin. The binary version of its metadata is 1.5.1, expected version is 1.1.16.

How can I fix this issue?

Improvement: Read one row at a time

Currently the only way to interact with a CSV is to parse all rows. Two use-cases that this does not cover are:

Reading only the header. This is useful if you wish to provide a breakdown of what is included in the file. While it should be trivial to do without a library, the existence of this library and its parsing logic supports the position that this is a non-trivial task.
Reading row-by-row, which is arguably a superset of the former use-case. This would be helpful when interacting with asynchronous workflows. One could attempt to read a single row from a piped input stream, and the library throws an exception when another line cannot be read in its entirety (as it does now with the full text). The producer can then continue to populate the input stream as data becomes available. The end result would be an asynchronous stream of rows (which I am not suggesting should be included in this library, but these changes would make this possible).

Adding kotlin-csv-jvm dependency pulls testing libraries into runtime

Describe the bug
When building application (using gradle distZip from application plugin) that depends on kotlin-csv, test libraries are pulled into created application artefact.

To Reproduce
Create an empty basic Kotlin project.
Add the dependency implementation("com.github.doyaaaaaken:kotlin-csv-jvm:0.10.1")

Expected behavior
No testing libraries in artefact. To check:

run gradle dependencies
- runtimeClasspath should not contain kotlin-test library
run gradle distZip with application plugin
- created zip should not contain testing libraries

Environment

kotlin-csv: 0.10.1
java version: java8
gradle: 6.5
OS: Win

Bug report template for creating issue

prepare issue template like this
https://github.com/ktorio/ktor/blob/master/.github/ISSUE_TEMPLATE/bug-report.md

ASCII NULL characters between CSV strings

Hey!
I parsed a CSV and found that there is an ASCII NULL character between every char.

I used an HTTP request with a ByteArrayInputStream as follows:

val result = httpClient.get<HttpStatement> {
  url(blobUrl)
}.execute() { response: HttpResponse ->
  val channel: ByteReadChannel = response.receive()
  val byteIn = ByteArrayInputStream(channel.toByteArray())
  csvReader {
    delimiter = ','
    skipEmptyLine = true
    skipMissMatchedRow = true
  }.readAll(byteIn)
}

The output is a List<List<String>>, which is totally correct.

// when i go through all elements like:
map { list ->
    list[0].forEach { c: Char ->
    println(c.toInt())
     }
}

// the output is:
0
50
0
48
0
50
0
49
0
45
0
48
0
49
0
45
0
48
0
49
0

This happens with only one CSV response, and I'm not sure why. It's a report from the Google Play store.
The same code works fine with other CSV files.

I've solved it by replacing the NULL char manually like this:

list[0].replace(Char.MIN_VALUE.toString(), "")

// e.g.
list[0].replace(Char.MIN_VALUE.toString(), "").forEach { c: Char ->
  println(c.toInt())
}

which returns:

I'm not sure whether it would be worth handling this out of the box?

have a nice day!
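
One possible explanation, which is an assumption and not confirmed in this report, is that the file is UTF-16 encoded but is being decoded as UTF-8. The sketch below reproduces the 0, 50, 0, 48, ... pattern with the standard library only; the sample text "2021" is made up:

```kotlin
import java.nio.charset.Charset

// Returns the character codes seen when `bytes` are decoded with `charset`.
fun charCodes(bytes: ByteArray, charset: Charset): List<Int> =
    String(bytes, charset).map { it.code }

// Sample data: "2021" encoded as big-endian UTF-16, two bytes per character.
val utf16Bytes: ByteArray = "2021".toByteArray(Charsets.UTF_16BE)

// Decoding with the wrong charset turns every zero byte into a NUL character.
val wrong: List<Int> = charCodes(utf16Bytes, Charsets.UTF_8)
// Decoding with the matching charset yields only the digits.
val right: List<Int> = charCodes(utf16Bytes, Charsets.UTF_16BE)
```

If that is the cause, reading with the matching charset would be preferable to stripping NUL characters afterwards.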

Charset auto detection on reading CSV

Before reading a CSV file, we have to set the charset (the default is UTF-8).
Introduce charset auto-detection for CSV files, so that the charset no longer needs to be set.

use suspend function inside lambda of `open` method

In the code below, a compile error occurs because a suspend function cannot be called inside the lambda of the open function.
So make it callable.

suspend fun processRow(row: List<String>): List<String> {
    return row.map { "prefix-$it" }
}

val rows: List<List<String>> = csvReader().open("test.csv") {
    readAllAsSequence()
        .map { row -> processRow(row) } // Compile ERROR!! processRow is suspend function so cannot call inside lambda
        .toList()
}

Discussion: https://kotlinlang.slack.com/archives/CMAL3470A/p1601651001001000

Introduce BOM for Microsoft applications

Hey there,

thank you very much for this great project.

Microsoft applications, for some reason, seem to require a BOM to parse UTF-8 files correctly, even though UTF-8 has no byte order the way UTF-16 and UTF-32 do. So that a created CSV file opens correctly, I suggest adding this special BOM (for UTF-8, the three bytes 0xEF, 0xBB and 0xBF at the start of the file), even when the csvWriter is configured with Charsets.UTF_8.name().

Why this is undocumented and why Excel seems to require a BOM for UTF-8 I don't know; those might be good questions for the Excel team at Microsoft.

What do you think or do you have any suggestion to solve this problem?
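
Until such an option exists, one workaround is to write the three BOM bytes before the CSV content. A minimal sketch with the standard library only; the helper name is hypothetical and the CSV text is assumed to be already formatted:

```kotlin
import java.io.ByteArrayOutputStream
import java.io.OutputStream

// The UTF-8 byte order mark: 0xEF, 0xBB, 0xBF.
val UTF8_BOM: ByteArray = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte())

// Hypothetical helper: writes the BOM first, then the CSV text encoded as UTF-8.
fun writeCsvWithBom(out: OutputStream, csvText: String) {
    out.write(UTF8_BOM)
    out.write(csvText.toByteArray(Charsets.UTF_8))
}
```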
