doyaaaaaken / kotlin-csv
Pure Kotlin CSV Reader/Writer
Home Page: https://kenta-koyama-biz.gitbook.io/kotlin-csv/
License: Apache License 2.0
The CSV specification says:
Each line should contain the same number of fields throughout the file.
So the CSV below is invalid.
a,b,c
d,e
f,g,h
But kotlin-csv doesn't throw an exception from the CsvFileReader.readAll and CsvFileReader.readAllAsSequence methods when reading this kind of CSV data.
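The consistency rule quoted above can be checked independently of any parser. The following is a minimal sketch in plain Kotlin; `findRaggedRows` is a hypothetical helper name and is not part of kotlin-csv.

```kotlin
// Hypothetical helper, not part of kotlin-csv.
// Returns the 1-based numbers of rows whose field count differs from the first row.
fun findRaggedRows(rows: List<List<String>>): List<Int> {
    val expected = rows.firstOrNull()?.size ?: return emptyList()
    return rows.withIndex()
        .filter { (_, row) -> row.size != expected }
        .map { (index, _) -> index + 1 }
}
```

For the three-row sample above, only the second row (d,e) would be reported.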
val file = File("file.csv") // format ZonedDateTime,BigDecimal
val values = csvReader.open(file) {
    val listStr = readAll()
    val size = listStr.map { ZonedDateTime.parse(it[0]) to BigDecimal(it[1]) }.size
    println(size)
    readAllAsSequence().map { ZonedDateTime.parse(it[0]) to BigDecimal(it[1]) }.toMap()
}
println(values.size)
output:
1490
0
PS: Sorry, I realized the error: you need to rediscover
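One plausible explanation for the 1490 / 0 output, offered as an assumption rather than a confirmed diagnosis, is that the first read consumes the underlying reader, so a second read from the same open handle finds nothing left. The sketch below reproduces that effect with the standard library only; `readTwice` is a hypothetical name.

```kotlin
import java.io.BufferedReader
import java.io.StringReader

// Reads all lines eagerly, then tries to read lazily from the SAME reader.
// Returns (eager line count, lazy line count).
fun readTwice(text: String): Pair<Int, Int> =
    BufferedReader(StringReader(text)).use { reader ->
        val eager = generateSequence { reader.readLine() }.toList() // consumes the input
        val lazy = reader.lineSequence().toList()                   // nothing left to read
        eager.size to lazy.size
    }
```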
In some cases it may be better not to quote numeric fields (e.g. when opening the written CSV with Microsoft Excel).
The CSV below is an example of the QUOTE_NONNUMERIC option.
"a","b",1
2.0,"e","f"
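As a rough illustration of the requested behavior, the sketch below quotes every field except values that are numbers. It is plain Kotlin and does not use the kotlin-csv writer; the function name and the choice to write null as an empty field are assumptions.

```kotlin
// Hypothetical sketch of a QUOTE_NONNUMERIC-style row writer.
// Numbers are written bare; everything else is quoted, with quotes doubled.
// Assumption: null is written as an empty, unquoted field.
fun toCsvLineQuoteNonNumeric(row: List<Any?>, quote: Char = '"'): String =
    row.joinToString(",") { value ->
        when (value) {
            null -> ""
            is Number -> value.toString()
            else -> {
                val text = value.toString().replace("$quote", "$quote$quote")
                "$quote$text$quote"
            }
        }
    }
```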
Is it possible to add an option to skip (or even read) rows whose number of fields differs from other rows, instead of throwing an exception? I'm willing to open a PR myself if you're open to adding this option.
Let me know what your thoughts are on this.
refs #15
The MalformedCSVException thrown in the CSV parse logic doesn't contain a line number.
e.g.
It would be better to report the line number and column number of the malformed position.
It is very common for CSV files to have a header row. However, I cannot always store the whole file in memory, nor do I always need to. Right now it is possible either to read a file line by line or to read all data with the header. It would be really useful to be able to do both at the same time.
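The combination being asked for (header-keyed rows produced lazily) can be sketched with a Kotlin sequence. This is only an illustration: `readWithHeaderLazily` is a hypothetical name, and the naive `split` assumes there are no quoted fields or embedded delimiters.

```kotlin
// Hypothetical sketch: lazily pairs each data line with the header row.
// Assumption: fields contain no quotes or embedded delimiters.
fun readWithHeaderLazily(
    lines: Sequence<String>,
    delimiter: Char = ','
): Sequence<Map<String, String>> = sequence {
    val iterator = lines.iterator()
    if (!iterator.hasNext()) return@sequence
    val header = iterator.next().split(delimiter)
    for (line in iterator) {
        // Only one row is materialized at a time.
        yield(header.zip(line.split(delimiter)).toMap())
    }
}
```

Passing the result of `lineSequence()` on a buffered reader keeps memory use proportional to a single row.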
When we use WriteQuoteMode.ALL, it does not use the quote character we specified in the CsvWriteQuoteContext but just uses " instead, and it does not escape any special character at all, even producing invalid CSV output.
Looking at the source code, it's pretty obvious why:
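The source snippet the reporter pointed to is not reproduced here. Separately, as a sketch of the expected behavior rather than the library's code, quoting every field with a configurable quote character and doubling any embedded quote could look like this (`quoteAll` is a hypothetical name):

```kotlin
// Hypothetical sketch, not the library's implementation.
// Wraps every field in quoteChar and escapes embedded quoteChar by doubling it.
fun quoteAll(row: List<String>, quoteChar: Char = '"', delimiter: Char = ','): String =
    row.joinToString(delimiter.toString()) { field ->
        val escaped = field.replace(quoteChar.toString(), "$quoteChar$quoteChar")
        "$quoteChar$escaped$quoteChar"
    }
```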
Add a Kotlin/Native build target to the multiplatform project.
Hey @doyaaaaaken thank you very much for this library!
When reading all lines with headers, a duplicate check is performed and a MalformedCSVException is thrown.
Since columns in a row are accessed by their index, headers could simply be deduplicated by appending something like an occurrence indicator to the header name.
just | some | example | example | headers |
---|---|---|---|---|
A | B | C | D | E |
The header example appears twice. According to the suggestion, the second occurrence could be named example_01.
Main benefit would be to not have a runtime exception thrown and not needing to rename columns in the original file to have it processed.
I know this would introduce some "magic" and I hope that wouldn't collide with your goal of having a simple library.
The required functionality would be just a few lines of code and I'd very gladly do a PR myself, just first wanted to get your thoughts on this.
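A minimal sketch of the deduplication idea described above, in plain Kotlin. The function name and the two-digit suffix format are assumptions based on the example_01 suggestion.

```kotlin
// Hypothetical sketch: the first occurrence keeps its name,
// later occurrences get a suffix _01, _02, ...
fun deduplicateHeaders(headers: List<String>): List<String> {
    val seen = mutableMapOf<String, Int>()
    return headers.map { name ->
        val count = seen[name] ?: 0
        seen[name] = count + 1
        if (count == 0) name else name + "_" + count.toString().padStart(2, '0')
    }
}
```

Note that this simple version does not guard against a file that already contains a header literally named example_01.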
Describe the problem
It's not quite a bug, but I'm stuck on this exception:
Caused by: java.lang.NoClassDefFoundError: mu/KotlinLogging
at com.github.doyaaaaaken.kotlincsv.client.CsvFileReader.<init>(CsvFileReader.kt:21)
at com.github.doyaaaaaken.kotlincsv.client.CsvReader.open(CsvReader.kt:129)
at com.github.doyaaaaaken.kotlincsv.client.CsvReader.readAll(CsvReader.kt:48)
It happens only on Ubuntu. MacOS is still fine.
Environment
Any suggestion on this? Thank you!
According to the CSV specification, an empty line between CSV rows is not allowed.
But there is demand for reading that kind of file.
So we want to make it a csvReader option, like below.
val reader = csvReader {
    skipEmptyLine = true
}
val str = """a,b,c

d,e,f
"""
// Can read CSV containing an empty line.
reader.read(str)
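Independently of how the option is implemented in the library, the effect can be sketched as a pre-filter over lines. `dropEmptyLines` is a hypothetical helper, and it assumes no quoted field spans multiple lines (otherwise a blank line inside a quoted value would be dropped too).

```kotlin
// Hypothetical sketch: removes empty lines before the text is parsed.
// Assumption: no quoted field contains a line break.
fun dropEmptyLines(csv: String): List<String> =
    csv.lineSequence().filter { it.isNotEmpty() }.toList()
```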
Here, we can read in a suspending function.
https://github.com/doyaaaaaken/kotlin-csv#read-in-a-suspending-function
The background is here #66.
It would be better if we could also use openAsync on CSVWriter, not only on CSVReader.
With current implementation whenever we write rows, it automatically adds the first row as a header. There should be an option to say no header needed.
Output a log message when a mismatched row is skipped.
Change code to deal with these TODO comments.
Line 67 in e9b7971
Discussed here #48 (comment)
Good morning,
I'm trying to use kotlin-csv on a comma delimited file (input stream) but I think there is a problem with managing the spaces and colon.
In particular, these are the first lines of the file:
Device serial,Date ,Temperature 51 (Medium) °C
11869,2021-02-09 00:14:59,7.2
11869,2021-02-09 00:30:01,7.1
11869,2021-02-09 00:44:59,7.2
11869,2021-02-09 00:59:59,7.4
11869,2021-02-09 01:14:58,7.5
11869,2021-02-09 01:29:58,7.5
11869,2021-02-09 01:44:58,7.3
11869,2021-02-09 01:59:58,7.2
11869,2021-02-09 02:14:58,7.2
11869,2021-02-09 02:29:58,7.2
11869,2021-02-09 02:44:58,7.2
11869,2021-02-09 02:59:57,7.3
11869,2021-02-09 03:14:57,7.2
11869,2021-02-09 03:29:57,7.3
11869,2021-02-09 03:44:57,7.2
11869,2021-02-09 03:59:57,7.4
..while this is the script:
import com.github.doyaaaaaken.kotlincsv.dsl.csvReader
import java.io.InputStream

data class DataClass(
    val FirstColumn: String,
    val SecondColumn: String,
    val ThirdColumn: String
)

fun parse(data: InputStream): List<DataClass>? {
    val list = mutableListOf<DataClass>()
    try {
        // Each row is a list of raw field strings, including the header row.
        val rows: List<List<String>> = csvReader().readAll(data)
        for (i in rows) {
            val firstColumn: String = i[0]
            val secondColumn: String = i[1]
            val thirdColumn: String = i[2]
            list.add(DataClass(FirstColumn = firstColumn, SecondColumn = secondColumn, ThirdColumn = thirdColumn))
        }
    } catch (e: Exception) {
        e.printStackTrace()
    }
    return list
}
Unfortunately only the first column of each row is correctly identified in the output, for example (first row):
first column: 11869
second column: 2021-02-09
third column: 0
No other columns detected.
Where is my mistake?
Thank you
Prepare a set of CsvReaderContext/CsvWriteContext settings for TSV, and make it possible to use them.
According to the CSV specification, there may or may not be a line break at the end of the file.
The last record in the file may or may not have an ending line break.
So, introduce the below option.
val writer = csvWriter {
outputLastLineTerminator = false
}
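The proposed option boils down to whether the line terminator is appended after the last record. A minimal sketch in plain Kotlin; `joinRows` is a hypothetical helper, and its parameter merely borrows the option name proposed above.

```kotlin
// Hypothetical sketch: joins already-formatted rows, optionally
// omitting the terminator after the last one.
fun joinRows(
    rows: List<String>,
    lineTerminator: String = "\r\n",
    outputLastLineTerminator: Boolean = true
): String {
    val body = rows.joinToString(lineTerminator)
    return if (outputLastLineTerminator && rows.isNotEmpty()) body + lineTerminator else body
}
```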
Use an existing mature Java CSV library as the default in the JVM project, and make the project's own CsvParser optional.
I noticed that functions using lambdas aren't utilizing Kotlin's inline keyword. This could have avoidable performance impacts.
Take, for example, this function:
fun csvReader(init: CsvReaderContext.() -> Unit = {}): CsvReader {
val context = CsvReaderContext().apply(init)
return CsvReader(context)
}
The JVM doesn't have higher-order functions, so Kotlin must generate a class (a "SAM type") with the lambda's code in a single method. This doesn't matter too much if Kotlin can generate a singleton object, but in this case it can't, as CsvReaderContext() is captured in the closure. So, every time this function is invoked, the lambda's class must be instantiated with CsvReaderContext() in a field, invoked, then garbage collected right after. (Correct me if I'm wrong)
*Small correction: this is a bad example. Looking at the bytecode, the reader context is passed as a method parameter.
I'm not sure how this works on other platforms, but on the JVM this impacts performance.
To avoid this, Kotlin provides inline functions, which inline the function's bytecode into where it's used. This mitigates the performance issues above at the expense of larger bytecode. If internal types or functions are used, you can add the @PublishedApi annotation to allow them to be accessed by the function, which makes whatever it's applied to public in the bytecode while Kotlin still enforces its declared visibility.
A more impactful example would be the open functions in CsvReader.kt and CsvWriter.kt.
Now, whether this is that big of a deal in this case is debatable.
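To make the suggestion concrete, here is a small sketch of an inline builder function that also calls an internal helper through @PublishedApi. `ReaderSettings`, `newSettings`, and `readerSettings` are hypothetical names standing in for the library's own types, not its actual API.

```kotlin
// Hypothetical stand-in for a settings class such as CsvReaderContext.
class ReaderSettings {
    var delimiter: Char = ','
    var skipEmptyLine: Boolean = false
}

// Internal helper that a public inline function may call only because
// it is annotated with @PublishedApi.
@PublishedApi
internal fun newSettings(): ReaderSettings = ReaderSettings()

// `inline` copies this body and the lambda into each call site,
// so no function object has to be allocated for `init`.
inline fun readerSettings(init: ReaderSettings.() -> Unit = {}): ReaderSettings =
    newSettings().apply(init)
```

A call such as `readerSettings { delimiter = '\t' }` then compiles down to plain property assignments at the call site.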
readAllWithHeader yields a List<Map<String,String>>, and hence empty columns are read as empty strings, so that we get "" for both col1 and col2 in the following example:
"col1","col2"
"",
I'd really like to get null for col2 here (this of course only makes sense if all strings are quoted, otherwise it wouldn't be clear how to interpret empty columns). I understand that you can't change the result to List<Map<String,String?>> now, but maybe you could add a nullCode option for reading, as it already exists for writing. The default value would be an empty string "" (= current behavior). I could then simply do
val nullCode = "NULL"
val rows = csvReader(nullCode=nullCode).readAllWithHeader(inputStream)
.map { row -> row.mapValues { col -> if (col.value == nullCode) null else col.value } }
At first glance it seems that it only requires to change
delimiter -> {
field.append(nullCode)
flushField()
state = ParseState.DELIMITER
}
'\n', '\u2028', '\u2029', '\u0085' -> {
field.append(nullCode)
flushField()
state = ParseState.END
}
'\r' -> {
if (nextCh == '\n') pos += 1
field.append(nullCode)
flushField()
state = ParseState.END
}
and the same for
but I didn't check it thoroughly.
Describe the bug
If rows are not equal-sized an exception is thrown:
com.github.doyaaaaaken.kotlincsv.util.CSVFieldNumDifferentException: Fields num seems to be 4 on each row, but on 2th csv row, fields num is 3.
To Reproduce
csvWriter().open(csvFile) {
writeRow(listOf("a"))
writeRow(listOf("a", "b"))
}
csvReader().open(csvFile) {
readAllAsSequence()
}
Expected behavior
Missing cells are treated as nulls or empty strings.
Environment
Screenshots
N/A
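The expected behavior described in this report (missing cells treated as empty strings) can be sketched as a post-processing step over already-parsed rows. `padRows` is a hypothetical helper and does not reflect how the library handles ragged rows.

```kotlin
// Hypothetical sketch: pads shorter rows with a fill value so that
// every row has as many fields as the longest one.
fun padRows(rows: List<List<String>>, fill: String = ""): List<List<String>> {
    val width = rows.maxOfOrNull { it.size } ?: return emptyList()
    return rows.map { row -> row + List(width - row.size) { fill } }
}
```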
To reproduce
Have a csv file with header row with 3 columns and two data rows, first data row with two columns, second - with three. Like this:
First name | Last name | Citizenship |
---|---|---|
John | Bobkins | |
Michael | Pepkins | US |
When invoking 'readAllWithHeaderAsSequence' on this file, a CSVFieldNumDifferentException is thrown saying that two columns are expected but three are found. This happens because the 'fieldsNum' variable in CsvFileReader.kt is initialized from the first data row, while it should be initialized from the header row.
Expected behavior
The following code has to return two rows:
csvReader().open(filePath) {
    readAllWithHeaderAsSequence().forEach {
        . . .
    }
}
Environment
Like this service
https://www.gitbook.com/
Is there any way to write to a String instead of a File?
For example, I have a CSV file that includes a column formatted as JSON, and all I want to do is reformat that field in each row without having to rewrite the whole file.
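As a library-independent illustration of the request, rows can be serialized into an in-memory writer and returned as a String. The names below are hypothetical; the quoting rule (quote a field only when it contains a comma, a quote, or a line break) is an assumption chosen for the sketch.

```kotlin
import java.io.StringWriter

// Hypothetical sketch: quotes a field only when it needs quoting.
fun escapeField(field: String): String =
    if (field.any { it == ',' || it == '"' || it == '\n' || it == '\r' })
        "\"" + field.replace("\"", "\"\"") + "\""
    else field

// Writes all rows to an in-memory buffer instead of a file.
fun writeCsvToString(rows: List<List<String>>): String {
    val out = StringWriter()
    for (row in rows) {
        out.write(row.joinToString(",") { escapeField(it) })
        out.write("\n")
    }
    return out.toString()
}
```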
Please allow writing to csv file without having to close it.
I have a streaming scenario where I need to write each data I get to csv. Closing and reopening after each batch would be suboptimal.
Thanks
David
Hi,
Would be nice to see support for latest Kotlin 1.4
Thanks,
Adrian
When reading/writing you usually want to use the same config, i.e. charset, quoteChar, etc.
It would be great if we could write this config once, and then reuse in both read and write.
Something like:
val context = CsvContext {
charset = "UTF-8"
}
csvReader(context)...
csvWriter(context)...
Test code example.
Coverage can be seen here.
https://codecov.io/gh/doyaaaaaken/kotlin-csv/src/master/src/main/kotlin/com/github/doyaaaaaken/kotlincsv/parser/ParseStateMachine.kt
Describe the bug
Cannot find the dsl
To Reproduce
plugins {
kotlin("jvm") version "1.6.10"
...
}
...
dependencies {
implementation("com.github.doyaaaaaken:kotlin-csv-jvm:1.2.0")
...
}
...
val rows = csvReader().readAll(inputStream) // throws error
java.lang.NoClassDefFoundError: com/github/doyaaaaaken/kotlincsv/dsl/CsvReaderDslKt
Expected behavior
A clear and concise description of what you expected to happen.
Environment
Screenshots
If applicable, add screenshots to help explain your problem.
Quickly looking at the code it seems like there's only one log statement:
Do we really need to pull an entire library for logging?
Line 50 in ed7a678
I'm an Android user and currently that log would go basically nowhere.
Test if CSV written by CSVWriter can be read by CSVReader.
I'm getting this error, please help:
Caused by: java.io.FileNotFoundException: test.csv: open failed: EROFS (Read-only file system)
We can write any type of List<List<Any?>> data.
When writing field, we use Any?.toString().
It's better to customize the output format.
How can I fix this issue?
Currently the only way to interact with a CSV is to parse all rows. Two use-cases that this does not cover are:
Describe the bug
When building application (using gradle distZip from application plugin) that depends on kotlin-csv, test libraries are pulled into created application artefact.
To Reproduce
Create empty basic kotlin project
add dependency implementation("com.github.doyaaaaaken:kotlin-csv-jvm:0.10.1")
Expected behavior
No testing libraries in artefact. To check:
run gradle dependencies
run gradle distZip with the application plugin
Environment
Prepare an issue template like this:
https://github.com/ktorio/ktor/blob/master/.github/ISSUE_TEMPLATE/bug-report.md
Hey!
I've parsed a CSV and found out that there is an ASCII NULL character between every char.
I've used an HTTP request with a ByteArrayInputStream as follows:
val result = httpClient.get<HttpStatement> {
    url(blobUrl)
}.execute() { response: HttpResponse ->
    val channel: ByteReadChannel = response.receive()
    val byteIn = ByteArrayInputStream(channel.toByteArray())
    csvReader {
        delimiter = ','
        skipEmptyLine = true
        skipMissMatchedRow = true
    }.readAll(byteIn)
}
The output is a List<List<String>>, which is totally correct.
// when I go through all elements like:
map { list ->
list[0].forEach { c: Char ->
println(c.toInt())
}
}
// the output is:
0
50
0
48
0
50
0
49
0
45
0
48
0
49
0
45
0
48
0
49
0
This happens with only one CSV response, and I'm not sure why. It's a report from the Google Play store.
The same code works fine with other CSV files.
I've solved it by replacing the NULL char manually like:
list[0].replace(Char.MIN_VALUE.toString(), "")
// e.g.
list[0].replace(Char.MIN_VALUE.toString(), "").forEach { c: Char ->
println(c.toInt())
}
which returns:
50
48
50
49
45
48
49
45
48
49
I'm not sure whether it would be worth handling this out of the box.
Have a nice day!
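A NUL between every character is the pattern one would expect when UTF-16 bytes are decoded with a single-byte-compatible charset such as UTF-8. That the report in question is UTF-16 encoded is an assumption, not something confirmed in this thread. The sketch below shows the effect with the standard library; `codeUnits` is a hypothetical helper.

```kotlin
import java.nio.charset.Charset

// Hypothetical helper: decodes bytes with the given charset and
// returns the numeric value of each resulting char.
fun codeUnits(bytes: ByteArray, charset: Charset): List<Int> =
    bytes.toString(charset).map { it.code }

// Sample data for the illustration: "20" encoded as UTF-16LE (32 00 30 00).
val utf16Sample: ByteArray = "20".toByteArray(Charsets.UTF_16LE)
```

Decoding `utf16Sample` as UTF-8 yields 50, 0, 48, 0, while decoding it as UTF-16LE yields 50, 48. If the assumption holds, reading the stream with the matching charset would avoid the manual replace.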
Before reading a CSV file, we have to set the charset (the default is UTF-8).
Introduce charset auto-detection for CSV files so that setting it is no longer necessary.
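One simple form of auto-detection is to look for a byte order mark at the start of the data. The sketch below is a minimal illustration under that assumption; it ignores UTF-32 and cannot detect files without a BOM. `detectCharsetByBom` is a hypothetical name.

```kotlin
import java.nio.charset.Charset

// Hypothetical sketch: guesses the charset from a leading byte order mark.
// Falls back to the given charset when no known BOM is present.
fun detectCharsetByBom(bytes: ByteArray, fallback: Charset = Charsets.UTF_8): Charset {
    fun startsWith(vararg prefix: Int): Boolean =
        bytes.size >= prefix.size &&
            prefix.indices.all { (bytes[it].toInt() and 0xFF) == prefix[it] }
    return when {
        startsWith(0xEF, 0xBB, 0xBF) -> Charsets.UTF_8
        startsWith(0xFF, 0xFE) -> Charsets.UTF_16LE
        startsWith(0xFE, 0xFF) -> Charsets.UTF_16BE
        else -> fallback
    }
}
```

A caller would still need to skip the BOM bytes themselves before decoding the content.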
In the code below, a compile error occurs because a suspend function cannot be called inside the lambda of the open function.
So make it callable.
suspend fun processRow(row: List<String>): List<String> {
return row.map { "prefix-$it" }
}
val rows: List<List<String>> = csvReader().open("test.csv") {
readAllAsSequence()
.map { row -> processRow(row) } // Compile ERROR!! processRow is suspend function so cannot call inside lambda
.toList()
}
Discussion: https://kotlinlang.slack.com/archives/CMAL3470A/p1601651001001000
Hey there,
thank you very much for this great project.
Microsoft applications, for some reason, seem to require a BOM to parse, for example, UTF-8 files correctly, even though there is no byte order in UTF-8 like there is in UTF-16/32. So that a created CSV file opens correctly, I suggest adding this special BOM (for UTF-8, the three bytes 0xEF, 0xBB and 0xBF at the start of the file), even when the csvWriter is configured with Charsets.UTF_8.name().
Why this is undocumented and why Excel seems to require a BOM for UTF-8 I don't know; might be good questions for Excel team at Microsoft.
What do you think or do you have any suggestion to solve this problem?
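Until such an option exists, one workaround is to prepend the three BOM bytes yourself before writing the encoded content. A minimal sketch with the standard library; `withUtf8Bom` is a hypothetical helper name.

```kotlin
// Hypothetical sketch: returns the UTF-8 bytes of the content,
// preceded by the UTF-8 byte order mark (EF BB BF).
fun withUtf8Bom(content: String): ByteArray =
    byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte()) +
        content.toByteArray(Charsets.UTF_8)
```

The resulting array can be written to a file with `File.writeBytes`.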