akuleshov7 / ktoml

Kotlin Multiplatform parser and compile-time serializer/deserializer for TOML format (Native, JS, JVM) based on KxS

Home Page: https://akuleshov7.github.io/ktoml

License: MIT License

Kotlin 99.87% Shell 0.13%
serialization kotlin toml deserialization native kotlinx-serialization kotlinx hacktoberfest

ktoml's Introduction

ktoml is a fully native, multiplatform Kotlin library for serializing and deserializing the TOML format. It uses the native kotlinx.serialization provided by Kotlin and contains no Java code and no Java dependencies. We believe that TOML is the most readable and user-friendly configuration file format, so we decided to support it for the kotlinx serialization library.

Contribution

As this project is needed by the Kotlin community, we need your help. We will be glad if you test ktoml or contribute to this project. If you don't have much time for this, at least spend 5 seconds to give us a star to attract other contributors!

Thanks! 🙏 🥳

Acknowledgement

Special thanks to the awesome developers who give us great suggestions and help us maintain and improve this project: @NightEule5, @bishiboosh, @Peanuuutz, @petertrr, @nulls, @Olivki, @edrd-f, @BOOMeranGG, @aSemy, @thomasgalvin

Supported platforms

All the code is written in the Kotlin common module, which means it can be built for any Kotlin Native platform. However, to reduce the scope, ktoml currently supports only the following platforms:

  • jvm
  • mingwx64
  • linuxx64
  • macosx64
  • macosArm64 (M1)
  • ios
  • iosSimulatorArm64
  • js (only for ktoml-core). Note that js(LEGACY) is not supported

Other platforms can be added later on demand (just create a corresponding issue) or easily built by users on their own machines.

🌐 ktoml supports Kotlin 1.9.22

Current limitations

❗ Please note that the TOML standard does not define Java-like types such as Char or Short. You can check the types that the TOML standard supports here. However, ktoml aims to support all primitive types offered by Kotlin.

General
We are still developing and testing this library, so it has several limitations:
✅ deserialization (with some parsing limitations)
✅ serialization (with tree-related limitations)

Parsing and decoding
✅ Table sections (single and dotted)
✅ Key-value pairs (single and dotted)
✅ Long/Integer/Byte/Short types
✅ Double/Float types
✅ Basic Strings
✅ Literal Strings
✅ Char type
✅ Boolean type
✅ Simple Arrays
✅ Comments
✅ Inline Tables
✅ Offset Date-Time (to Instant of kotlinx-datetime)
✅ Local Date-Time (to LocalDateTime of kotlinx-datetime)
✅ Local Date (to LocalDate of kotlinx-datetime)
✅ Local Time (to LocalTime of kotlinx-datetime)
✅ Multiline Strings
✅ Arrays (including multiline arrays)
✅ Maps (for anonymous key-value pairs)
❌ Arrays: nested; of Different Types
❌ Nested Inline Tables
❌ Array of Tables
❌ Inline Array of Tables

Dependency

The library is hosted on Maven Central. To import ktoml, add the following dependencies to your project:

Maven
<dependency>
  <groupId>com.akuleshov7</groupId>
  <artifactId>ktoml-core</artifactId>
  <version>0.5.1</version>
</dependency>
<dependency>
  <groupId>com.akuleshov7</groupId>
  <artifactId>ktoml-file</artifactId>
  <version>0.5.1</version>
</dependency>

Gradle Groovy
implementation 'com.akuleshov7:ktoml-core:0.5.1'
implementation 'com.akuleshov7:ktoml-file:0.5.1'

Gradle Kotlin
implementation("com.akuleshov7:ktoml-core:0.5.1")
implementation("com.akuleshov7:ktoml-file:0.5.1")

How to use

❗ As TOML is primarily a language for config files, we also support deserialization from a file. We use okio to read the file, so okio will be added as a dependency to your project if you import ktoml-file. The same applies to okio Source (for example, if you need streaming): use ktoml-source. For basic scenarios of decoding strings, ktoml-core is enough.

❗ Don't forget to add the serialization plugin kotlin("plugin.serialization") to your project. Otherwise, the @Serializable annotation won't work properly.
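
As a minimal sketch of the build setup described by the two notes above, a build.gradle.kts could look roughly like this. The versions are the ones mentioned in this README (Kotlin 1.9.22, ktoml 0.5.1); the JVM-only project layout is an assumption made for brevity, so adapt the plugin block to your own targets.

```kotlin
// build.gradle.kts (sketch; a JVM-only project is assumed here)
plugins {
    kotlin("jvm") version "1.9.22"
    // needed so that @Serializable classes get generated serializers
    kotlin("plugin.serialization") version "1.9.22"
}

repositories {
    mavenCentral()
}

dependencies {
    // enough for decoding from and encoding to strings
    implementation("com.akuleshov7:ktoml-core:0.5.1")
    // only if you need file support (this one brings in okio)
    implementation("com.akuleshov7:ktoml-file:0.5.1")
}
```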

Deserialization:

Straight-forward deserialization
// add extensions from 'kotlinx' lib to your project:
import kotlinx.serialization.Serializable
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.serializer
// add com.akuleshov7:ktoml-core to your project:
import com.akuleshov7.ktoml.Toml

@Serializable
data class MyClass(/* your fields */)

// to deserialize toml input in a string format (separated by newlines '\n')
// no need to provide serializer() explicitly if you will use extension method from
// <kotlinx.serialization.decodeFromString>
val resultFromString = Toml.decodeFromString<MyClass>(/* string with a toml input */)
val resultFromList = Toml.decodeFromString<MyClass>(serializer(), /* sequence with lines of strings with a toml input */)

Partial deserialization

Partial deserialization is useful when you want to deserialize a single table without reproducing the whole object structure in your code.

// If you need to deserialize only some part of the toml - provide the full name of the toml table. 
// The deserializer will work only with this table and its children.
// For example if you have the following toml, but you want only to decode [c.d.e.f] table: 
// [a]
//   b = 1
// [c.d.e.f]
//   d = "5"

val resultFromString = Toml.partiallyDecodeFromString<MyClassOnlyForTable>(serializer(), /* string with a toml input */, "c.d.e.f")
val resultFromList = Toml.partiallyDecodeFromString<MyClassOnlyForTable>(serializer(), /* list with toml strings */, "c.d.e.f")

Toml File deserialization
// add com.akuleshov7:ktoml-file to your project
import com.akuleshov7.ktoml.file.TomlFileReader

val resultFromFile = TomlFileReader.decodeFromFile<MyClass>(serializer(), /* file path to toml file */)
val partialResultFromFile = TomlFileReader.partiallyDecodeFromFile<MyClass>(serializer(),  /* file path to toml file */, /* table name */)

❗ ktoml-file is only one example of reading data from a source. For your particular case, you can implement your own source provider based on okio.Source. For this purpose we have prepared the ktoml-source module and implemented an example with Java streams for the JVM target.

// add com.akuleshov7:ktoml-source to your project
import com.akuleshov7.ktoml.source

val resultFromString = TomlFileReader.decodeFromSource<MyClass>(serializer(), /* your source */)
val resultFromList = TomlFileReader.partiallyDecodeFromSource<MyClass>(serializer(),  /* your source */, /* table name */)

Serialization:

Straight-forward serialization
// add extensions from 'kotlinx' lib to your project:
import kotlinx.serialization.encodeToString
// add com.akuleshov7:ktoml-core to your project:
import com.akuleshov7.ktoml.Toml

@Serializable
data class MyClass(/* your fields */)

val toml = Toml.encodeToString(MyClass(/* ... */))

Toml File serialization
// add com.akuleshov7:ktoml-file to your project
import com.akuleshov7.ktoml.file.TomlFileWriter

TomlFileWriter.encodeToFile<MyClass>(serializer(), /* object to encode */, /* file path to toml file */)

Parser to AST:

Simple parser
import com.akuleshov7.ktoml.parsers.TomlParser
import com.akuleshov7.ktoml.TomlInputConfig
/* ========= */
var tomlAST = TomlParser(TomlInputConfig()).parseStringsToTomlTree(/* list with toml strings */)
tomlAST = TomlParser(TomlInputConfig()).parseString(/* the string that you want to parse */)
tomlAST.prettyPrint()

Configuration

Parsing and deserialization in ktoml are configurable to fit users' requirements. We have created special configuration classes that can be passed to the decoder:

Toml(
    inputConfig = TomlInputConfig(
        // allow/prohibit unknown names during the deserialization, default false
        ignoreUnknownNames = false,
        // allow/prohibit empty values like "a = # comment", default true
        allowEmptyValues = true,
        // allow/prohibit null values like "a = null", default true
        allowNullValues = true,
        // allow/prohibit escaping of single quotes in literal strings, default true
        allowEscapedQuotesInLiteralStrings = true,
        // allow/prohibit processing of empty toml, if false - throws an InternalDecodingException exception, default is true
        allowEmptyToml = true,
    ),
    outputConfig = TomlOutputConfig(
        // indentation symbols for serialization, default 4 spaces
        indentation = Indentation.FOUR_SPACES,
    )
).decodeFromString<MyClass>(
    tomlString
)

How ktoml works: examples

❗ You can check how the examples below work in the decoding ReadMeExampleTest and the encoding ReadMeExampleTest.

Deserialization

The following example:

someBooleanProperty = true
# inline tables in gradle 'libs.versions.toml' notation
gradle-libs-like-property = { id = "org.jetbrains.kotlin.jvm", version.ref = "kotlin" }

[table1]
    # null is prohibited by the TOML spec, but allowed in ktoml for nullable types
    # so for 'property1' null value is ok. Use: property1 = null  
    property1 = 100
    property2 = 6

[myMap]
    a = "b"
    c = "d"

[table2]
    someNumber = 5
[table2."akuleshov7.com"]
    name = 'this is a "literal" string'
    # empty lists are also supported
    configurationList = ["a",  "b",  "c"]

    # such redeclaration of table2
    # is prohibited in toml specification;
    # but ktoml is allowing it in non-strict mode: 
    [table2]
        otherNumber = 5.56
        # use single quotes
        charFromString = 'a'
        charFromInteger = 123

can be deserialized to MyClass:

@Serializable
data class MyClass(
    val someBooleanProperty: Boolean,
    val table1: Table1,
    val table2: Table2,
    @SerialName("gradle-libs-like-property")
    val kotlinJvm: GradlePlugin,
    val myMap: Map<String, String>
)

@Serializable
data class Table1(
    // nullable property, from toml input you can pass "null"/"nil"/"empty" value (no quotes needed) to this field
    val property1: Long?,
    // please note, that according to the specification of toml integer values should be represented with Long,
    // but we allow to use Int/Short/etc. Just be careful with overflow
    val property2: Byte,
    // no need to pass this value in the input as it has the default value and so it is NOT REQUIRED
    val property3: Short = 5
)

@Serializable
data class Table2(
    val someNumber: Long,
    @SerialName("akuleshov7.com")
    val inlineTable: NestedTable,
    val otherNumber: Double,
    // Char in a manner of Java/Kotlin is not supported in TOML, because single quotes are used for literal strings.
    // However, ktoml supports reading Char from both a single-char string and from its integer code
    val charFromString: Char,
    val charFromInteger: Char
)

@Serializable
data class NestedTable(
    val name: String,
    @SerialName("configurationList")
    val overriddenName: List<String?>
)

@Serializable
data class GradlePlugin(val id: String, val version: Version)

@Serializable
data class Version(val ref: String)

with the following code:

Toml.decodeFromString<MyClass>(/* your toml string */)

Translation of the example above to json-terminology:

{
  "someBooleanProperty": true,
  
  "gradle-libs-like-property": {
    "id": "org.jetbrains.kotlin.jvm",
    "version": {
      "ref": "kotlin"
    }
  },
  
  "table1": {
    "property1": 100,
    "property2": 6
  },
  "table2": {
    "someNumber": 5,
    
    "otherNumber": 5.56,
    "akuleshov7.com": {
      "name": "this is a \"literal\" string",
      "configurationList": [
        "a",
        "b",
        "c"
      ]
    }
  }
}

Serialization

The following example from above:

someBooleanProperty = true
# inline tables in gradle 'libs.versions.toml' notation
gradle-libs-like-property = { id = "org.jetbrains.kotlin.jvm", version.ref = "kotlin" }

[table1]
# null is prohibited by the TOML spec, but allowed in ktoml for nullable types
# so for 'property1' null value is ok. Use: property1 = null. 
# Null can also be prohibited with 'allowNullValues = false'
property1 = 100
property2 = 6

[table2]
    someNumber = 5
    [table2."akuleshov7.com"]
        name = 'this is a "literal" string'
        # empty lists are also supported
        configurationList = ["a",  "b",  "c"]

# such redeclaration of table2
# is prohibited in toml specification;
# but ktoml is allowing it in non-strict mode: 
[table2]
    otherNumber = 5.56
    # use single quotes
    charFromString = 'a'
    charFromInteger = 123

can be serialized from MyClass:

@Serializable
data class MyClass(
    val someBooleanProperty: Boolean,
    @TomlComments(
        "Comments can be added",
        "More comments can also be added"
    )
    val table1: Table1,
    val table2: Table2,
    @SerialName("gradle-libs-like-property")
    val kotlinJvm: GradlePlugin
)

@Serializable
data class Table1(
    @TomlComments(inline = "At the end of lines too")
    // nullable values, represented as "null" in toml. For more strict behavior,
    // null values can be ignored with the ignoreNullValues config property.
    val property1: Long?,
    // please note, that according to the specification of toml integer values should be represented with Long
    val property2: Long,
    // Default values can be ignored with the ignoreDefaultValues config property.
    val property3: Long = 5
)

@Serializable
data class Table2(
    // Integers can be formatted in hex, binary, etc. Currently only decimal is
    // supported.
    @TomlInteger(IntegerRepresentation.DECIMAL)
    val someNumber: Long,
    @SerialName("akuleshov7.com")
    @TomlInlineTable // Can be on the property
    val inlineTable: InlineTable,
    @TomlComments(
        "Properties always appear before sub-tables, tables aren't redeclared"
    )
    val otherNumber: Double
)

@Serializable
data class InlineTable(
    @TomlLiteral
    val name: String,
    @SerialName("configurationList")
    val overriddenName: List<String?>
)

@Serializable
@TomlInlineTable // ...or the class
data class GradlePlugin(
    val id: String,
    // version is "collapsed": single member inline tables become dotted pairs.
    val version: Version
)

@Serializable
@TomlInlineTable
data class Version(val ref: String)

with the following code:

Toml.encodeToString<MyClass>(/* your encoded object */)

Q&A

I want to catch ktoml-specific exceptions in my code, how can I do it?

Ktoml may generate various exceptions when encountering invalid input. It's important to note that certain strict checks can be enabled or disabled (refer to the Configuration section in this readme). We have intentionally exposed only two top-level exceptions, namely TomlDecodingException and TomlEncodingException, for public use. You can catch these exceptions in your code, as all other exceptions inherit from one of these two and will not be publicly accessible.
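
A minimal sketch of such a catch block is shown below. It assumes that TomlDecodingException lives in the com.akuleshov7.ktoml.exceptions package (check the import against the ktoml version you use); ServerConfig and parseConfigOrNull are hypothetical names introduced only for this illustration.

```kotlin
import com.akuleshov7.ktoml.Toml
import com.akuleshov7.ktoml.exceptions.TomlDecodingException // assumed package
import kotlinx.serialization.Serializable
import kotlinx.serialization.decodeFromString

// Hypothetical config class, used only for this illustration.
@Serializable
data class ServerConfig(val host: String, val port: Long)

// Returns the decoded config, or null when ktoml rejects the input.
fun parseConfigOrNull(text: String): ServerConfig? =
    try {
        Toml.decodeFromString<ServerConfig>(text)
    } catch (e: TomlDecodingException) {
        // all ktoml decoding failures are expected to arrive here
        println("Invalid TOML config: ${e.message}")
        null
    }

fun main() {
    println(parseConfigOrNull("host = \"localhost\"\nport = 8080"))
    // 'port' is missing here, so decoding is expected to fail
    println(parseConfigOrNull("host = \"localhost\""))
}
```

Catching only the top-level type keeps the calling code independent of the internal exception hierarchy, which is the reason given above for exposing just two exceptions.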

What if I do not know the names for keys and tables in my TOML, and therefore cannot specify a strict schema for decoding? Can I still decode it somehow?

Certainly. In such cases, you can decode all your key-values into a Map. However, it's important to be aware that both ktoml and kotlinx will be unable to enforce type control in this scenario. Therefore, you should not expect any "type safety." For instance, even when dealing with a mixture of types like Int, Map, String, etc., such as:

[a]
    b = 42
    c = "String"
    [a.innerTable]
        d = 5
    [a.otherInnerTable]
        d = "String"

You can still decode it using Toml.decodeFromString<MyClass>(data) where:

// MyClass(a={b=42, c=String, innerTable={d=5}, otherInnerTable={d=String}})
@Serializable
data class MyClass(
    val a: Map<String, Map<String, String>>
)

However, be aware that this may lead to unintended side effects. Our recommendation is to decode only key-values of the same type for a more predictable outcome.

ktoml's People

Contributors

akuleshov7, asemy, bishiboosh, boomerangg, dependabot[bot], nighteule5, nulls, peanuuutz, renovate[bot], saschpe, thomasgalvin, vitusortner, vlsi, wavesonics

ktoml's Issues

Improve `toString`

Supposing a node which is a key-value:

Current

node.content // blabla="blabla"

New feature

node.toString() // blabla = "blabla"

Can be more (arrays, and so on).

Support unknown table keys

Given a TOML like:

[known]
foo = "bar"

[known.unknownA]
blah = "one"

[known.unknownB]
blah = "two"

I don't see a way to bind this while also still doing type validation on the known table and the unknown child table shapes (just not their keys).

I would want to bind it against a type defined something like this:

@Serializable
data class Config(
  val known: Known,
  @UnboundTables // Probably needs some marker annotation like this?
  val unknowns: Map<String, KnownChild>,
)
@Serializable
data class Known(val foo: String)
@Serializable
data class KnownChild(val blah: String)

In the case where the child tables do not conform to the same shape, this should also be allowed:

@Serializable
data class Config(
  val known: Known,
  @UnboundTables // Probably needs some marker annotation like this?
  val unknowns: Map<String, TomlTable>,
)
@Serializable
data class Known(val foo: String)

And finally, I have a real-world case which is mixing known table names and unknown table names in a single type. So the TOML looks like this:

[knownA]
foo = "bar"

[knownB]
foo = "baz"

[unknownA]
thing = "something"
other = "whatever"

[unknownB]
thing = "it's a thing"
other = "stuff"

Ideally I could somehow bind this against a type like:

@Serializable
data class Config(
  val knownA: Known,
  val knownB: Known,
  @UnboundTables // Probably needs some marker annotation like this?
  val unknowns: Map<String, Other>,
)
@Serializable
data class Known(val foo: String)
@Serializable
data class Other(val thing: String, val other: String)

I looked through the issues, documentation, and tests cases and didn't see anything that would suggest this is supported today. So far the only path forward I've seen is to go directly to TomlTable and do the entirety of the type binding myself.

Support decoding enum values

    enum class TestEnum {
        A, B, C
    }

    @Serializable
    data class Table3(val a: Boolean, val e: String, val d: Int, val b: TestEnum)

    @Serializable
    data class ComplexPlainTomlCase(val table3: Table3)

    @Test
    fun testForComplexTypesExceptionOnEnums() {
        println("table3: (a:true, d:5, e:\"my test\", b = A)")
        val test = deserialize<ComplexPlainTomlCase>("[table3] \n a = true \n d = 5 \n e = my test \n b = A")
        assertEquals(ComplexPlainTomlCase(Table3(true, "my test", 5, b = TestEnum.A)), test)
    }

Fix and rework partial decoding

@ExperimentalSerializationApi
class PartialDecoderTest {
    @Serializable
    data class TwoTomlTables(val table1: Table1, val table2: Table2)

    @Serializable
    data class Table1(val a: Long, val b: Long)

    @Serializable
    data class Table2(val c: Long, val e: Long, val d: Long)

    @Test
    fun testPartialDecoding() {
        val test = TwoTomlTables(Table1(1, 2), Table2(1, 2, 3))
        assertEquals(
            test.table1,
            Toml.partiallyDecodeFromString(
                serializer(),
                "[table1] \n a = 1 \n b = 2 \n [table2] \n c = 1 \n e = 2 \n d = 3",
                "table1"
            )
        )
    }
}
    @Test
    fun testPartialFileDecoding() {
        val file = "src/commonTest/resources/partial_decoder.toml"
        val test = TwoTomlTables(Table1(1, 2), Table2(1, 2, 3))
        assertEquals(
            test.table1,
            TomlFileReader.partiallyDecodeFromFile(
                serializer(), file, "table1"
            )
        )
    }

Support Arrays of Tables

I see this is already called out as a feature that's not yet implemented in the README:

❌ Array of Tables

I just wanted to make a ticket for it so folks can vote for it, and so I can get notified when it's implemented. (It's currently a blocker for my use case.)


My use case is this:

I want to support multiple configuration environments in one file. Like:

[[environments]]
name = "foo"
# …

[[environments]]
name = "bar"
# …

But that currently gives an error:

Not able to parse the key: [[environments]] as it contains invalid symbols. In case you would like to use special symbols - use quotes as it is required by TOML standard: "My key ~ with special % symbols"


Alternatively, I also tried a "Map of Tables" approach like this, but that didn't work either:

[environments.foo]
# …

[environments.bar]
# …

Error:

Invalid number of key-value arguments provided in the input for deserialization. Missing the required field <0> from class <kotlin.collections.LinkedHashMap> in the input

General feedback on the API design

Hello, this isn't regarding one specific issue per se, but rather some general feedback regarding the design of the current API. If any of this comes off as aggressive/mean sounding, I apologize, my intention is solely for constructive criticism.

Most of my opinions will be based on the API design of officially supported format libraries developed by JetBrains themselves, which can be found here, and the Kotlin coding conventions provided by JetBrains, which can be found here. I'm not sure how much stuff you wanna change, but I figured it would be best to provide feedback while the library is still in early development, as a lot of these changes would break backwards compatibility.

There's a decent chunk of stuff that I want to provide feedback on, so I'm sorry if things read like a jumbled mess. I will try to section off the feedback to their own "sections" as best as I can.


If you like any of the suggestions here, I can make a pull request with the fixes if desired. I would rather explain my reasoning and thoughts first than just open a pull request with all the fixes.


The Ktoml class

The class name

The first point to address is the name. The naming rules state that names of classes and objects start with an uppercase letter and use camel case. In camel case each new word is capitalized, and in an acronym each letter that represents a word should normally be treated as a new word. Following these rules, the appropriate name for the class would be KToml rather than Ktoml, as the K stands for Kotlin and toml should be treated as one word.
(By following the above rules it should technically be KTOML, but if we look at the officially supported formats like json, and other classes developed by JetBrains, they seem to follow the rules of Dart wherein an acronym that's 3 or more characters long should be treated as a word, so instead of URL it would be Url.)

However, if we look at essentially all other libraries, even those outside of the officially supported formats, like yamlkt and avro4k, they just use the format name as the class name, meaning that rather than Ktoml it would be Toml.

Personally I think the nicest looking option is to just follow the official libraries and name the class Toml, as there is no real point in denoting that it's specifically for Kotlin as far as I can see.

The general design of the config

I will be basing the following suggestion on the json library.

If we look at how the json library handles configuration, we can see that it's using a sealed class hierarchy to achieve this, which can be roughly laid out like this:

  • The Json class is the parent. You cannot create new instances of it directly, and it already defines the implementations for the format it extends.
  • The companion object Default of the Json class is the default implementation, which uses the default settings for serialization/deserialization.
  • There exists a JsonImpl class which allows custom settings to be set, this is internal and never exposed to the end user.
  • There exists a JsonBuilder class which allows the user to customize the settings, in conjunction with the top-level Json function this provides a nice Kotlin DSL for creating a Json format with custom settings.
  • The top-level Json function acts as the constructor of the Json class, allowing the user to create instances with a custom configuration easily.

The benefit of this structure is that if I just want to use the default settings for the format, I can just write Json.encodeToString, or Json.Default.encodeToString if I want to be more explicit. And when I want to change the settings I can just go Json { // stuff }. It also allows the user to easily copy the settings from any already created Json instance, while keeping the settings of the instance immutable.
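
For reference, the usage pattern described above looks roughly like this with kotlinx.serialization's Json. This is only an illustrative sketch: Thing is a hypothetical class, and ignoreUnknownKeys and prettyPrint are simply two of the builder's existing options.

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

// Hypothetical class, used only for this illustration.
@Serializable
data class Thing(val name: String, val size: Int = 0)

// Default settings: the companion object is used directly.
val defaultText = Json.encodeToString(Thing.serializer(), Thing("a", 1))

// Custom settings via the builder DSL.
val lenient = Json { ignoreUnknownKeys = true }

// Copy the settings of an existing instance and change one of them.
val lenientPretty = Json(from = lenient) { prettyPrint = true }

fun main() {
    println(defaultText)
    // 'extra' is not a property of Thing; the lenient instance skips it
    println(lenient.decodeFromString(Thing.serializer(), """{"name":"a","extra":true}"""))
    println(lenientPretty.encodeToString(Thing.serializer(), Thing("b", 2)))
}
```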

If we apply the same design layout to ktoml it would roughly look like this:

public sealed class Toml(
    override val serializersModule: SerializersModule,
    public val ignoreUnknownNames: Boolean,
) : StringFormat {
    public companion object Default : Toml(EmptySerializersModule, false)

    public final override fun <T> encodeToString(serializer: SerializationStrategy<T>, value: T): String = // default implementation

    public final override fun <T> decodeFromString(deserializer: DeserializationStrategy<T>, string: String): T = // default implementation
}

public fun Toml(from: Toml = Toml.Default, builderAction: TomlBuilder.() -> Unit): Toml {
    val builder = TomlBuilder(from).apply(builderAction)
    return TomlImpl(builder.serializersModule, builder.ignoreUnknownNames)
}

public class TomlBuilder internal constructor(toml: Toml) {
    public var ignoreUnknownNames: Boolean = toml.ignoreUnknownNames

    public var serializersModule: SerializersModule = toml.serializersModule
}

private class TomlImpl(module: SerializersModule, ignoreUnknownNames: Boolean) : Toml(module, ignoreUnknownNames)

The deserialize and serialize top-level functions

This is partly down to personal preference, but the absence of anything similar from most libraries should also be a tell-tale sign.

I do not think these top-level functions actually add anything of value. I can see the thought behind them: it might be easier to call a top-level function than to create a new Ktoml instance and call the relevant function. However, I can only see this bringing readability issues and ambiguity down the line.

Here are some of the issues I can see would pop up from these functions:

  • Their names are very ambiguous. Sure, it's obvious that they're deserializing/serializing something, but it's not obvious which format is involved. deserializeToml would be better, but it still feels like a code smell for the reasons below.
  • From my own experience, it's much better to store/cache a kotlinx.serialization format as a constant value somewhere. Essentially all implementations are immutable and do not modify their own state, so they can be used from multiple threads and thread safety is not a concern. Creating a new instance every time you just want to write/read something is therefore a code smell and should generally be avoided. Because of how these functions work, each call creates a new instance just for this purpose.
  • Building on the first point, if I have multiple formats in one project, it's very ambiguous which format the function deserialize would actually deserialize from.
  • Unless something has changed, using the implicit serializer() function, as these functions do, is much slower than explicitly passing in a serializer, because it requires reflection rather than just a direct function call. Encouraging the use of that function by making these functions so easily accessible is not good design, in my opinion.
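
The alternative argued for in the points above could look roughly like the sketch below: one configured format instance is stored once and reused, and the serializer is passed explicitly. It uses the Toml constructor and TomlInputConfig shown in the Configuration section of the README (the class was still called Ktoml when this feedback was written); AppFormats, Thing and decodeThing are hypothetical names.

```kotlin
import com.akuleshov7.ktoml.Toml
import com.akuleshov7.ktoml.TomlInputConfig
import kotlinx.serialization.Serializable

// Hypothetical class, used only for this illustration.
@Serializable
data class Thing(val name: String)

// Hypothetical holder: one configured format instance shared by the whole app.
object AppFormats {
    val toml: Toml = Toml(inputConfig = TomlInputConfig(ignoreUnknownNames = true))
}

// The format is named at the call site and the serializer is passed explicitly.
fun decodeThing(text: String): Thing =
    AppFormats.toml.decodeFromString(Thing.serializer(), text)

fun main() {
    println(decodeThing("name = \"first\""))
    // the unknown key is tolerated because of ignoreUnknownNames = true
    println(decodeThing("name = \"second\"\nunknown = 1"))
}
```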

The dependency on okio

I personally think dragging in a whole dependency just for inbuilt support for reading from a file is rather excessive, and I know a lot of other people also would like the dependency graph of the libraries they use to be as minimal as possible.

The inbuilt functions for reading from a file aren't that much of a time-saver either:
Ktoml.decodeFromFile(Thing.serializer(), "/foo/bar.toml") vs Ktoml.decodeFromString(Thing.serializer(), Path("/foo/bar.toml").readText())
(The above example is of course if you're on the JVM, but it's still relevant due to the argument below.)

There's also the fact that okio is not the only multiplatform Kotlin library that supports files. While kotlinx.io is currently postponed, it will still be developed at some point, and there will certainly be more multiplatform file libraries. If this library forces a dependency on okio, that could be annoying for users who would rather use another library.

Therefore I think it would be better to not have explicit support for a specific file library, and rather just leave that up to the user. (Just quickly reading text from a file is more verbose in okio than in the Java path api with Kotlin extensions, but regardless, I don't think the minimal amount of boilerplate saved is worth explicitly forcing this library onto the user.)


These suggestions are mainly only for the public facing API, as I haven't looked too deeply into the more internal API.

I hope no offense was taken from this. It is only meant as constructive criticism for a library I'm looking forward to using once it gets more stable.

Native decoding fails with an exception: kotlin.Float cannot be cast to kotlin.String

@Test
@Ignore
fun regressionTest() {
    // this test is NOT failing on JVM but fails on mingw64
    deserialize<Regression>(
        "[general] \n" +
            "execCmd = \"java -jar ktlint && java -jar ktlint -R diktat.0.6.2.jar\" \n" +
            "description = \"Test for diktat - linter and formater for Kotlin\""
    )
}

On mingw fails with the following:

kotlin.ClassCastException: kotlin.Float cannot be cast to kotlin.String
kotlin.ClassCastException: kotlin.Float cannot be cast to kotlin.String
    at kfun:kotlin.Throwable#<init>(kotlin.String?;kotlin.Throwable?){} (00000000004513c0)
    at kfun:kotlin.Throwable#<init>(kotlin.String?){} (00000000004516d0)
    at kfun:kotlin.Exception#<init>(kotlin.String?){} (000000000044abc0)
    at kfun:kotlin.RuntimeException#<init>(kotlin.String?){} (000000000044ada0)
    at kfun:kotlin.ClassCastException#<init>(kotlin.String?){} (000000000044b840)
    at ThrowClassCastException (0000000000489ed0)
    at kfun:kotlinx.serialization.encoding.AbstractDecoder#decodeString(){}kotlin.String (000000000057b6b0)
    at kfun:kotlinx.serialization.encoding.AbstractDecoder#decodeStringElement(kotlinx.serialization.descriptors.SerialDescriptor;kotlin.Int){}kotlin.String (000000000057c070)
    at kfun:com.akuleshov7.ktoml.test.decoder.GeneralDecoderTest.General.$serializer#deserialize(kotlinx.serialization.encoding.Decoder){}com.akuleshov7.ktoml.test.decoder.GeneralDecoderTest.General (00000000005d3640)
    at kfun:kotlinx.serialization.encoding.Decoder#decodeSerializableValue(kotlinx.serialization.DeserializationStrategy<0:0>){0§<kotlin.Any?>}0:0 (000000000057c780)
    at kfun:kotlinx.serialization.encoding.AbstractDecoder#decodeSerializableValue(kotlinx.serialization.DeserializationStrategy<0:0>;0:0?){0§<kotlin.Any?>}0:0 (000000000057b9b0)
    at kfun:kotlinx.serialization.encoding.AbstractDecoder#decodeSerializableElement(kotlinx.serialization.descriptors.SerialDescriptor;kotlin.Int;kotlinx.serialization.DeserializationStrategy<0:0>;0:0?){0§<kotlin.Any?>}0:0 (000000000057c350)
    at kfun:com.akuleshov7.ktoml.test.decoder.GeneralDecoderTest.Regression.$serializer#deserialize(kotlinx.serialization.encoding.Decoder){}com.akuleshov7.ktoml.test.decoder.GeneralDecoderTest.Regression (00000000005d21a0)
    at kfun:kotlinx.serialization.encoding.Decoder#decodeSerializableValue(kotlinx.serialization.DeserializationStrategy<0:0>){0§<kotlin.Any?>}0:0 (000000000057c780)

On JVM everything works as expected.

Literal strings

Literal strings are surrounded by single quotes. Like basic strings, they must appear on a single line:

# What you see is what you get.
winpath  = 'C:\Users\nodejs\templates'
winpath2 = '\\ServerX\admin$\system32\'
quoted   = 'Tom "Dubs" Preston-Werner'
regex    = '<\i\c*\s*>'

Since there is no escaping, there is no way to write a single quote inside a literal string enclosed by single quotes. Luckily, TOML supports a multi-line version of literal strings that solves this problem.

Allow empty files

Currently, this logic:

val firstFileChild = rootNode.getFirstChild() ?: throw InternalDecodingException(
    "Missing child nodes (tables, key-values) for TomlFile." +
        " Was empty toml provided to the input?"
)

forbids the parsing of empty files.

First, consider the official TOML ABNF: https://github.com/toml-lang/toml/blob/1.0.0/toml.abnf. This format definition allows an empty file through a single expression consisting of white space (ws) of zero characters (*wschar).

Second, as far as this library is concerned, if all my properties have default values then an empty parse should return the default instance.

@Serializable
data class Foo(val name: String = "Jake")

val expected = Foo()
val actual = Toml.decodeFromString(Foo.serializer(), "")
assert(expected == actual)

I have a real-world use where all of my top-level properties have sane defaults bound in the model and all of my supported tables are optional (and thus have null defaults in the model). Currently this fails to parse because of this check.

Support parsing of special escaped symbols

The following symbols should be properly parsed:

backspace = "This string has a \b backspace character."
tab = "This string has a \t tab character."
newline = "This string has a \n new line character."
formfeed = "This string has a \f form feed character."
carriage = "This string has a \r carriage return character."
quote = "This string has a \" quote character."
backslash = "This string has a \\ backslash character."
notunicode1 = "This string does not have a unicode \\u escape."
notunicode2 = "This string does not have a unicode \u005Cu escape."
notunicode3 = "This string does not have a unicode \\u0075 escape."
notunicode4 = "This string does not have a unicode \\\u0075 escape."
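
A minimal sketch of how the escapes listed above could be decoded, written against the Kotlin standard library only. The names `unescapeBasicString`, `simpleEscapes` and `appendScalar` are hypothetical and are not part of ktoml; the sketch also covers the `\uXXXX` and `\UXXXXXXXX` forms and rejects any other escape symbol.

```kotlin
// Hypothetical helper for illustration; not ktoml's actual implementation.
// Escapes that map to exactly one character.
private val simpleEscapes = mapOf(
    'b' to '\b', 't' to '\t', 'n' to '\n', 'f' to '\u000C',
    'r' to '\r', '"' to '"', '\\' to '\\'
)

// Appends a Unicode scalar value, using a surrogate pair above U+FFFF.
private fun StringBuilder.appendScalar(codePoint: Int) {
    require(codePoint in 0..0xD7FF || codePoint in 0xE000..0x10FFFF) {
        "Not a Unicode scalar value: $codePoint"
    }
    if (codePoint < 0x10000) {
        append(Char(codePoint))
    } else {
        val offset = codePoint - 0x10000
        append(Char(0xD800 + (offset shr 10)))
        append(Char(0xDC00 + (offset and 0x3FF)))
    }
}

// Decodes the content of a TOML basic string (without the surrounding quotes).
fun unescapeBasicString(raw: String): String {
    val out = StringBuilder()
    var i = 0
    while (i < raw.length) {
        val c = raw[i]
        if (c != '\\') {
            out.append(c)
            i++
        } else {
            require(i + 1 < raw.length) { "Dangling backslash at index $i" }
            val next = raw[i + 1]
            val simple = simpleEscapes[next]
            when {
                simple != null -> {
                    out.append(simple)
                    i += 2
                }
                next == 'u' || next == 'U' -> {
                    // \u takes 4 hex digits, \U takes 8.
                    val end = i + 2 + (if (next == 'u') 4 else 8)
                    require(end <= raw.length) { "Truncated unicode escape at index $i" }
                    val hex = raw.substring(i + 2, end)
                    require(hex.all { it in '0'..'9' || it in 'a'..'f' || it in 'A'..'F' }) {
                        "Invalid unicode escape: $hex"
                    }
                    out.appendScalar(hex.toInt(16))
                    i = end
                }
                else -> throw IllegalArgumentException("Unknown escape symbol: \\$next")
            }
        }
    }
    return out.toString()
}
```

With this sketch, the raw content of `notunicode2` decodes to a backslash followed by `u`, and the raw content of `notunicode3` decodes to a backslash followed by `u0075`, so neither produces a Unicode escape.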

Fix documentation and readme

  1. The imports in the README's ktoml-file example are wrong
  2. The README should suggest adding the serialization plugin, which the @Serializable annotation requires
  3. Information about exceptions and exception handling needs to be added
  4. The issue with an array being parsed into an invalid type (String, for example) needs to be mentioned or fixed

Dotted keys and their relation to tables

Dotted keys are a sequence of bare or quoted keys joined with a dot. This allows for grouping similar properties together:

name = "Orange"
physical.color = "orange"
physical.shape = "round"
site."google.com" = true

In JSON land, that would give you the following structure:

{
  "name": "Orange",
  "physical": {
    "color": "orange",
    "shape": "round"
  },
  "site": {
    "google.com": true
  }
}

For details regarding the tables that dotted keys define, refer to the Table section below.

Whitespace around dot-separated parts is ignored. However, best practice is to not use any extraneous whitespace.

fruit.name = "banana" # this is best practice
fruit. color = "yellow" # same as fruit.color
fruit . flavor = "banana" # same as fruit.flavor

Indentation is treated as whitespace and ignored.
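
The rules above (dots separate parts, dots inside quotes do not, whitespace around parts is ignored) can be sketched as a small splitter. `splitDottedKey` is a hypothetical name and not ktoml's API. As a simplification, escaped quotes inside a quoted key are not handled. The quotes are deliberately kept in the returned parts so that a later stage can still tell a quoted key from a bare one.

```kotlin
// Hypothetical helper for illustration; not ktoml's actual implementation.
// Splits a dotted key into its parts. Dots inside quoted parts are not separators.
fun splitDottedKey(key: String): List<String> {
    val parts = mutableListOf<String>()
    val current = StringBuilder()
    var quote: Char? = null // the quote character we are currently inside, if any

    for (c in key) {
        when {
            quote != null -> {
                current.append(c)
                if (c == quote) {
                    quote = null // closing quote reached
                }
            }
            c == '"' || c == '\'' -> {
                quote = c
                current.append(c)
            }
            c == '.' -> {
                // Whitespace around dot-separated parts is ignored.
                parts += current.toString().trim()
                current.clear()
            }
            else -> current.append(c)
        }
    }
    require(quote == null) { "Unclosed quote in key: $key" }
    parts += current.toString().trim()
    require(parts.none { it.isEmpty() }) { "Empty part in dotted key: $key" }
    return parts
}
```

For example, `splitDottedKey("site.\"google.com\"")` yields the two parts `site` and `"google.com"`, and `fruit . flavor` yields `fruit` and `flavor`.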

Invalid processing of escaped symbols

The following code does not work:

    execFlags = "-checks * -allow-enabling-analyzer-alpha-checkers -config \""

Uncaught Kotlin exception: com.akuleshov7.ktoml.exceptions.TomlParsingException: Line 13: According to TOML documentation unknown escape symbols are not allowed. Please check [\"]

Support for Nullable Tables

Given the following code:

class TomlReadTest : FunSpec({
    test("toml read nullable") {
        @Serializable
        data class Key(val value: Long)
        @Serializable
        data class Config(val key: Key?)

        val mapper = Toml(
            config = KtomlConf(
                ignoreUnknownNames = true,
                emptyValuesAllowed = true
            )
        )
        val toml = mapper.decodeFromString<Config>("""            
            [key]
            value = 1            
        """.trimIndent())

        assertNotNull(toml)
        assertEquals(1L, toml.key?.value)
    }
})

Config is expected to deserialize with key = Key(1L); however, ktoml fails with:

This kind of node should not be processed in TomlDecoder.decodeValue(): com.akuleshov7.ktoml.parsers.node.TomlTable@6d9c9c74
com.akuleshov7.ktoml.exceptions.InternalDecodingException: This kind of node should not be processed in TomlDecoder.decodeValue(): com.akuleshov7.ktoml.parsers.node.TomlTable@6d9c9c74
	at com.akuleshov7.ktoml.decoders.TomlDecoder.decodeKeyValue(TomlDecoder.kt:96)
	at com.akuleshov7.ktoml.decoders.TomlDecoder.decodeValue(TomlDecoder.kt:25)
	at com.akuleshov7.ktoml.decoders.TomlDecoder.decodeNotNullMark(TomlDecoder.kt:36)
	at kotlinx.serialization.encoding.AbstractDecoder.decodeNullableSerializableElement(AbstractDecoder.kt:79)

Changing to a property with a default value does work:

class TomlReadTest : FunSpec({
    test("toml read nullable") {
        @Serializable
        data class Key(val value: Long)
        @Serializable
        data class Config(val key: Key = Key(0L))

        val mapper = Toml(
            config = KtomlConf(
                ignoreUnknownNames = true,
                emptyValuesAllowed = true
            )
        )
        val toml = mapper.decodeFromString<Config>("""            
            [key]
            value = 1            
        """.trimIndent())

        assertNotNull(toml)
        assertEquals(1L, toml.key.value)
    }
})

My whole code base uses Kotlinx Serialization. Only the configuration parsing uses Jackson, and I would like to remove that dependency.

Support Unicode symbols in the strings

Currently, Unicode symbols are not supported. For

"\u0048\u0065\u006C\u006C\u006F World"

you will get:

According to TOML documentation unknown escape symbols are not allowed.

Multiline literal strings

Multi-line literal strings are surrounded by three single quotes on each side and allow newlines. Like literal strings, there is no escaping whatsoever. A newline immediately following the opening delimiter will be trimmed. All other content between the delimiters is interpreted as-is without modification.

regex2 = '''I [dw]on't need \d{2} apples'''
lines  = '''
The first newline is
trimmed in raw strings.
   All other whitespace
   is preserved.
'''
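
A minimal sketch of the trimming rule described above, assuming the whole token (including both `'''` delimiters) is already available as one string. `parseMultilineLiteral` is a hypothetical name, not a ktoml function.

```kotlin
// Hypothetical helper for illustration; not ktoml's actual implementation.
// Returns the content of a multi-line literal string. Nothing is unescaped;
// only a newline immediately following the opening delimiter is trimmed.
fun parseMultilineLiteral(token: String): String {
    require(token.length >= 6 && token.startsWith("'''") && token.endsWith("'''")) {
        "Not a multi-line literal string: $token"
    }
    val body = token.substring(3, token.length - 3)
    return when {
        body.startsWith("\r\n") -> body.substring(2)
        body.startsWith("\n") -> body.substring(1)
        else -> body
    }
}
```

Applied to the `regex2` value above, the sketch returns the text between the delimiters unchanged, including the single quote and the backslash.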

Add processing for empty toml lines

Currently we are failing with strange errors:

        mapper.decodeFromString<Config2>(
            """            
     
            """.trimIndent()
        )

List is empty.
java.util.NoSuchElementException: List is empty.
	at kotlin.collections.CollectionsKt___CollectionsKt.last(_Collections.kt:416)
	at com.akuleshov7.ktoml.parsers.TomlParser.trimEmptyLines-impl(TomlParser.kt:78)
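
The trace shows `last()` being called on an empty list. A sketch of a trimming step that stays safe for empty or all-blank input is given below; the signature is an assumption for illustration and does not mirror the real `TomlParser.trimEmptyLines`.

```kotlin
// Hypothetical stand-in for illustration; the real ktoml signature may differ.
// Drops blank lines at the start and end without ever calling first()/last(),
// so an empty or all-blank input simply yields an empty list.
fun trimEmptyLines(lines: List<String>): List<String> =
    lines.dropWhile { it.isBlank() }.dropLastWhile { it.isBlank() }
```

Blank lines between non-blank lines are kept, so line numbers relative to the first non-blank line stay intact.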

Encapsulation design

Discussed in #92

Originally posted by Peanuuutz January 12, 2022
Currently there is no encapsulation, which is an urgent problem for a widely used library. We should expose only those classes and functions that users actually need. This will help users quickly find what they want, and it will let us develop new features and modify existing code without worrying about breaking users' applications.

I'm going to copy what I have done. (These choices are actually quite common.)

The following are public:

  • Entrypoint for serialization and deserialization: Toml. (Consider this suggestion)
  • AST: TomlValue (with subtypes such as TomlBoolean, TomlLong, TomlBasicString, TomlArray and TomlTable, to name a few).
  • AST manipulation: like JsonPrimitive.int.
  • Exceptions: only sealed TomlEncodingException and TomlDecodingException.

The following are internal:

  • Encoders
  • Emitters
  • Decoders
  • Parsers

Also, I suggest separating the AST itself from the actual parsing process: for example, Toml<type> would only have a content property, while Toml<type>Parser would be responsible for parsing the input and producing it.

ktoml is unable to parse empty mutable list

Here is a simple example that behaves incorrectly:

import com.akuleshov7.ktoml.Toml
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.Serializable

@Serializable
data class Test(val ignoreLines: MutableList<String>? = null)

@ExperimentalSerializationApi
fun main() {
    val testClass: Test = Toml.decodeFromString("ignoreLines = []")
    println("${testClass.ignoreLines?.size ?: "ignoreLines is null"}")          // 1
    println(testClass.ignoreLines)                                              // [null]
    println(testClass.ignoreLines?.get(0) is String)                            // true
}

ktoml is expected to parse an empty mutable list of Strings, but as a result we get a mutable list with one element, the String "null". The same problem occurs with List<String> and Set<String>.
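
One plausible way such a bug arises, offered as an assumption and not as a diagnosis of ktoml: in Kotlin, splitting an empty string on a comma returns a list with one empty element, so an array body needs an explicit emptiness check before it is split. The sketch below uses a hypothetical `parseStringArray` and is deliberately naive (no nested arrays, no commas inside strings).

```kotlin
// Hypothetical helper for illustration; not ktoml's actual implementation.
// Parses a flat TOML array of basic strings such as ["a", "b"].
fun parseStringArray(literal: String): List<String> {
    val trimmed = literal.trim()
    require(trimmed.startsWith("[") && trimmed.endsWith("]")) {
        "Not an array literal: $literal"
    }
    // A trailing comma after the last element is allowed in TOML arrays.
    val body = trimmed.substring(1, trimmed.length - 1).trim().removeSuffix(",")
    // Guard: "".split(',') would return [""], i.e. one bogus element.
    if (body.isBlank()) {
        return emptyList()
    }
    return body.split(',').map { it.trim().removeSurrounding("\"") }
}
```

With the guard in place, `parseStringArray("[]")` returns an empty list instead of a list with one element.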

Bug in tables with dotted keys and quotes

@Test
@Ignore
fun parseDottedKey() {
    val result = TomlParser("a.\"a.b.c\".b.d = 123").parseString()
    // FixMe: some dotted keys are incorrectly parsed
    throw IllegalArgumentException()
}

While running createTomlTableFromDottedKey(), we lose the information about the quotes.

Inline Tables

Inline Table
Inline tables provide a more compact syntax for expressing tables. They are especially useful for grouped data that can otherwise quickly become verbose. Inline tables are fully defined within curly braces: { and }. Within the braces, zero or more comma-separated key/value pairs may appear. Key/value pairs take the same form as key/value pairs in standard tables. All value types are allowed, including inline tables.

Inline tables are intended to appear on a single line. A terminating comma (also called trailing comma) is not permitted after the last key/value pair in an inline table. No newlines are allowed between the curly braces unless they are valid within a value. Even so, it is strongly discouraged to break an inline table onto multiple lines. If you find yourself gripped with this desire, it means you should be using standard tables.

name = { first = "Tom", last = "Preston-Werner" }
point = { x = 1, y = 2 }
animal = { type.name = "pug" }

The inline tables above are identical to the following standard table definitions:

[name]
first = "Tom"
last = "Preston-Werner"

[point]
x = 1
y = 2

[animal]
type.name = "pug"

Inline tables are fully self-contained and define all keys and sub-tables within them. Keys and sub-tables cannot be added outside the braces.

[product]
type = { name = "Nail" }
# type.edible = false  # INVALID

Similarly, inline tables cannot be used to add keys or sub-tables to an already-defined table.

[product]
type.name = "Nail"
type = { edible = false }  # INVALID

Add proper logging for cast exceptions

Exceptions like the following (thrown from kotlinx) are not informative:

class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
java.lang.ClassCastException: class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')

Inline comments processing is wrong with tables

[fruits] # comment
     name = "apple"

[[array]] # comment
     name = "apple" 

These comments are parsed incorrectly and produce the following error:

Line 1: Incorrect format of Key-Value pair (missing equals sign). Should be <key = value>, but was: [fruits] # comment. If you wanted to define table - use brackets []

Entry point for the fix: TomlParser.parseStringsToTomlTree
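
One possible approach, sketched under assumptions: strip the trailing comment from every line before deciding whether the line is a table header or a key-value pair. `stripInlineComment` is a hypothetical helper and not ktoml's API. It treats a `#` as the start of a comment only outside quotes, and it does not handle multi-line strings.

```kotlin
// Hypothetical helper for illustration; not ktoml's actual implementation.
// Removes a trailing "# comment" from a single TOML line.
fun stripInlineComment(line: String): String {
    var quote: Char? = null // the quote character we are currently inside, if any
    var i = 0
    while (i < line.length) {
        val c = line[i]
        when {
            // Inside a basic string a backslash escapes the next character.
            quote == '"' && c == '\\' -> {
                i++
            }
            quote != null -> {
                if (c == quote) {
                    quote = null
                }
            }
            c == '"' || c == '\'' -> {
                quote = c
            }
            c == '#' -> {
                return line.substring(0, i).trimEnd()
            }
        }
        i++
    }
    return line.trimEnd()
}
```

With such a step in front, `[fruits] # comment` is reduced to `[fruits]` before the table check runs, while a `#` inside a quoted value is left untouched.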

Inline arrays of tables

The TOML standard allows inline arrays of tables, in the same way as single inline tables (#48):

points = [ { x = 1, y = 2, z = 3 },
           { x = 7, y = 8, z = 9 },
           { x = 2, y = 4, z = 8 } ]

Once #88 is finished, this issue should be implemented.

Support unicode symbols

\uXXXX     - unicode         (U+XXXX)
\UXXXXXXXX - unicode         (U+XXXXXXXX)

Any Unicode character may be escaped with the \uXXXX or \UXXXXXXXX forms.
The escape codes must be valid Unicode scalar values.

Prohibit the default values for keys with a special option

Create a special option (disabled by default) that controls whether default values are allowed for TOML keys:

class A (val a: B = B(value))

This could be very useful for people who want strict control over their data.

Bug with parsing comments: 'Incorrect format of Key-Value pair'

The following line:

lineCaptureGroup = 1  # index of regex capture group for line number, used when `warningTextHasLine == false`

triggers:

[ERROR] Line 9: Incorrect format of Key-Value pair. Should be <key = value>, but was: lineCaptureGroup = 1  # index of regex capture group for line number, used when `warningTextHasLine == false`
Uncaught Kotlin exception: com.akuleshov7.ktoml.exceptions.TomlParsingException: Line 9: Incorrect format of Key-Value pair. Should be <key = value>, but was: lineCaptureGroup = 1  # index of regex capture group for line number, used when `warningTextHasLine == false`
    at kfun:kotlin.Throwable#<init>(kotlin.String?;kotlin.Throwable?){} (0000000000450130)
    at kfun:kotlin.Throwable#<init>(kotlin.String?){} (0000000000450440)
