alethiophile / qtoml


Another Python TOML encoder/decoder

License: MIT License

Python 100.00%

toml toml-parser toml-parsing

qtoml's Introduction

DEPRECATION NOTE

qTOML is now deprecated. It will not be updated further.

As a replacement for qTOML, I endorse the tomllib standard library module for Python 3.11+, or tomli for earlier versions. In either case, tomli-w is available for writing.

These libraries satisfy the use case I had for writing qTOML to begin with, making this version redundant.
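For anyone migrating, the decoding half of the example from the Usage section below might look like this with the recommended replacements. This is a minimal sketch: it assumes either Python 3.11+ or an installed tomli package, and it leaves out writing, since tomli-w is a separate third-party package.

```python
# Prefer the standard library parser (Python 3.11+); fall back to the
# tomli backport, which exposes the same loads()/load() functions.
try:
    import tomllib
except ModuleNotFoundError:
    import tomli as tomllib

toml_string = """
test_value = 7
"""

parsed = tomllib.loads(toml_string)
print(parsed)
```

Note that tomllib.load() expects a file opened in binary mode ('rb'), unlike the text-mode files used in the qtoml examples.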

qTOML

qtoml is another Python TOML encoder/decoder. I wrote it because I found uiri/toml too unstable, and PyTOML too slow.

For information concerning the TOML language, see toml-lang/toml.

qtoml currently supports TOML v0.5.0.

Usage

qtoml is available on PyPI. You can install it using pip:

$ pip install qtoml

qtoml supports the standard load/loads/dump/dumps API common to most similar modules. Usage:

>>> import qtoml
>>> toml_string = """
... test_value = 7
... """
>>> qtoml.loads(toml_string)
{'test_value': 7}
>>> print(qtoml.dumps({'a': 4, 'b': 5.0}))
a = 4
b = 5.0

>>> with open('filename.toml', 'r') as infile:
...     parsed_structure = qtoml.load(infile)
...
>>> with open('new_filename.toml', 'w') as outfile:
...     qtoml.dump(parsed_structure, outfile)
...

TOML supports a fairly complete subset of the Python data model, but notably does not include a null or None value. If you have a large dictionary from somewhere else including None values, it can occasionally be useful to substitute them on encode:

>>> print(qtoml.dumps({ 'none': None }))
qtoml.encoder.TOMLEncodeError: TOML cannot encode None
>>> print(qtoml.dumps({ 'none': None }, encode_none='None'))
none = 'None'

The encode_none value must be a replacement encodable by TOML, such as zero or a string.

This breaks reversibility of the encoding, by rendering None values indistinguishable from literal occurrences of whatever sentinel you chose. Thus, it should not be used when exact representations are critical.
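The loss of information can be illustrated without qtoml at all. The sketch below uses a hypothetical helper, replace_none (not part of qtoml, and not necessarily how encode_none is implemented), to substitute a sentinel for None before encoding:

```python
def replace_none(value, sentinel):
    """Return a copy of value with every None replaced by sentinel.

    Hypothetical helper for illustration; not part of qtoml.
    """
    if value is None:
        return sentinel
    if isinstance(value, dict):
        return {k: replace_none(v, sentinel) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [replace_none(v, sentinel) for v in value]
    return value


original = {'none': None, 'literal': 'None'}
encodable = replace_none(original, 'None')
print(encodable)
```

After substitution, both keys hold the same string, so nothing in the encoded document records which one was originally None.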

Development/testing

qtoml uses the poetry tool for project management. To check out the project for development, run:

$ git clone --recurse-submodules https://github.com/alethiophile/qtoml
$ cd qtoml
$ poetry install

This assumes poetry is already installed. The package and dependencies will be installed in the currently active virtualenv if there is one, or a project-specific new one created if not.

qtoml is tested against the alethiophile/toml-test test suite, forked from uiri's fork of the original by BurntSushi. To run the tests, after checking out the project as shown above, enter the tests directory and run:

$ pytest              # if you already had a virtualenv active
$ poetry run pytest   # if you didn't

License

This project is available under the terms of the MIT license.

qtoml's People

Forkers

jeffcarpenter juan-jurado miccoli

qtoml's Issues

Release stray output patch?

Hi,

I am evaluating TOML benchmarks (among others) at https://github.com/eno-lang/benchmarks. While trying out your library experimentally, I noticed stray debug output filling my console. I have since seen that you already fixed the issue weeks ago, but it took me a while to figure out why I was still getting the output, presumably because the fix has not made it into a release yet.

Bottom line, you might want to consider doing a patch release for this, I'd certainly be happy about it :)
Thanks for providing this! 👍

Best, Simon

Question about the speed of parsing

I wrote it because I found PyTOML too slow.

Hi! I'd like to know how slow it was. Slow enough for a user to notice? For example, 2 seconds to parse a TOML document, when parsing should usually take an imperceptible amount of time, like 2 ms?
And what could account for such a big speed difference between these parsers?

I'm new to writing TOML parsers (I'm not the author of PyTOML), and I've started to care about parser speed, which I didn't before.

Thank you!

Dotted keys using string 'a.b.c' instead of dict a:{b:{c:'foo'}}

The TOML format allows dotted keys ("Dotted keys are a sequence of bare or quoted keys joined with a dot.") but I fail to create one using this library.

I cannot reproduce the given TOML example code:

name = "Orange"
physical.color = "orange"
physical.shape = "round"
site."google.com" = true

Using Python dictionary:

{
    'name': "Orange",
    'physical.color': "orange",
    'physical.shape': "round",
    'site."google.com"': True,
}

Reproduce code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import qtoml
import pprint

config = {
    'name': "Orange",
    'physical.color': "orange",
    'physical.shape': "round",
    'site."google.com"': True,
}

toml_string = qtoml.dumps(config)
print('TOML format:')
print(toml_string)

print('-----')

toml_loaded = qtoml.loads(toml_string)
pp = pprint.PrettyPrinter(indent=4)
print('Python dump:')
pp.pprint(toml_loaded)

Outputs:

TOML format:
name = 'Orange'
'physical.color' = 'orange'
'physical.shape' = 'round'
'site."google.com"' = true

-----
Python dump:
{   'name': 'Orange',
    'physical.color': 'orange',
    'physical.shape': 'round',
    'site."google.com"': True}

The physical key is missing, contrary to what the TOML format would lead us to expect.

In order to get it, I must change the Python dictionary to:

config = {
    'name': "Orange",
    'physical': {
        'color': "orange",
        'shape': "round",
    },
    'site."google.com"': True,
}

Which outputs:

TOML format:
name = 'Orange'
'site."google.com"' = true

[physical]
color = 'orange'
shape = 'round'

-----
Python dump:
{   'name': 'Orange',
    'physical': {'color': 'orange', 'shape': 'round'},
    'site."google.com"': True}

Here the physical key now exists (using the square bracket notation, which is fine).

Is dotted-notation supported (and I am doing something wrong)?
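Judging by the output above, the encoder treats each dictionary key as a single TOML key, so one workaround is to reshape the data before dumping. The following is a minimal sketch with a hypothetical helper, nest_dotted_keys (not part of qtoml). It splits on every dot, so it assumes bare keys only and does not handle quoted segments such as "google.com":

```python
def nest_dotted_keys(flat):
    """Expand {'a.b.c': v} into {'a': {'b': {'c': v}}}.

    Hypothetical helper; assumes no key segment itself contains a dot.
    """
    nested = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split('.')
        table = nested
        for part in parents:
            # Create intermediate tables on demand.
            table = table.setdefault(part, {})
        table[leaf] = value
    return nested


config = {
    'name': "Orange",
    'physical.color': "orange",
    'physical.shape': "round",
}
nested_config = nest_dotted_keys(config)
print(nested_config)
```

The result has the same shape as the nested dictionary shown above, which the encoder renders as a [physical] table.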

Input sanitization - is this safe to expose to the internet?

Is it safe to load untrusted/arbitrary toml with qtoml?

Problems in dealing with scientific notation

I'm running Miniconda 3.6.7 on a Linux machine. The following program works as expected:

import qtoml

string = qtoml.dumps({'a': 1.0})
print('Encoded TOML: ', string)
qtoml.loads(string)

However, changing the value of a from 1.0 to 1.0e-9 makes qtoml crash:

import qtoml

string = qtoml.dumps({'a': 1.0e-9})
print('Encoded TOML: ', string)
qtoml.loads(string)

The error is the following:

Encoded TOML:  a = 1e-09

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    qtoml.loads(string)
  File "/HOME/miniconda3/lib/python3.6/site-packages/qtoml/decoder.py", line 514, in loads
    (kl, v), p = parse_pair(p)
  File "/HOME/miniconda3/lib/python3.6/site-packages/qtoml/decoder.py", line 428, in parse_pair
    v, p = parse_value(p)
  File "/HOME/miniconda3/lib/python3.6/site-packages/qtoml/decoder.py", line 391, in parse_value
    raise TOMLDecodeError("can't parse type", p)
qtoml.decoder.TOMLDecodeError: can't parse type (line 1, column 4)

This is a clash between the current TOML specification and the way programming languages output numbers in scientific notation (see my bug report toml-lang/toml#356). qtoml should remove the leading zero from the exponent.
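One possible shape for such a fix, sketched with a hypothetical function toml_float_repr (not qtoml's actual code), is to post-process Python's repr() of the float. It assumes finite floats only and that the decoder's sole objection is the zero padding in the exponent:

```python
import re


def toml_float_repr(x):
    """Format a finite float without zero padding in the exponent.

    Hypothetical helper: repr(1e-9) gives '1e-09'; this returns '1e-9'.
    """
    # Drop zeros that directly follow the exponent sign, keeping the digits.
    return re.sub(r'e([+-]?)0+(\d)', r'e\1\2', repr(x))


print(toml_float_repr(1.0e-9))
print(toml_float_repr(1.0))
```

The stripped form still round-trips through float(), so no precision is lost by the rewrite.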

Support UserDict and friends

Currently (version 0.3.0) the qtoml dumper doesn't understand UserDicts, which makes using their subclasses annoying. Subclassing dict would of course work, but the internal optimizations of the dict class mean that overriding methods can be unpredictable.

[feature request] Implement heterogeneous values in arrays as per the >0.5.0 (unreleased) TOML spec

The working and still unreleased TOML spec allows heterogeneous values in arrays: see

This is a major improvement that better aligns TOML lists with Python ones. I would suggest implementing this feature ahead of the formal release of the new TOML spec.

Fails to encode OrderedDict

When I call qtoml.dumps on an OrderedDict I get a TOMLEncodeError.

If this line were to check isinstance(v, dict) instead of type(v) == dict then it should work fine.
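The difference between the two checks can be seen with standard-library types alone. The sketch below also includes a third check against collections.abc.Mapping, which is an assumption about what a fix covering UserDict might use, not something taken from qtoml's source:

```python
from collections import OrderedDict, UserDict
from collections.abc import Mapping

candidates = [dict(a=1), OrderedDict(a=1), UserDict(a=1)]

results = {}
for value in candidates:
    results[type(value).__name__] = (
        type(value) == dict,         # exact type match
        isinstance(value, dict),     # dict or any subclass of it
        isinstance(value, Mapping),  # anything implementing the mapping ABC
    )

for name, checks in results.items():
    print(name, checks)
```

OrderedDict passes the isinstance(v, dict) check but not the exact type comparison, while UserDict passes only the Mapping check, since it does not inherit from dict.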

Support for Python 2.7.X

Greetings. I know support for Python 2.7.x will end in 2020. However, some projects are still transitioning to Python 3. Currently, uiri/toml has some showstopping issues: incorrect string escaping in the TOML encoder, and trailing whitespace in the serializer's output. I understand if this is out of scope for qtoml given your limited development time. In that case, do you have another recommendation for a mature TOML library for Python?

Force-generating inline tables

I've been trying to explicitly generate inline tables for my data. After a bit of hacking, I discovered an interesting solution for v0.3.0:

from qtoml.encoder import TOMLEncoder
from collections import UserDict

hacked_encoder = TOMLEncoder()
hacked_encoder.st[UserDict] = hacked_encoder.dump_itable

data = {
    "my_data": {
        "inline_value": "this is a scalar",
        "inline_table": UserDict(
            {"this": "generates", "an": 10, "inline table": True}
        ),
        "separate_table": {"this": "creates", "another table": 42},
    }
}

print(hacked_encoder.dump_sections(data, [], False))

This yields:

[my_data]
inline_value = 'this is a scalar'
inline_table = { this = 'generates', an = 10, 'inline table' = true }

[my_data.separate_table]
this = 'creates'
'another table' = 42

The current implementation of TOMLEncoder.dump_sections() always renders dict as regular tables even if a dumper is assigned to TOMLEncoder.st. Since collections.UserDict is not a subclass of dict, it circumvents this issue.

I know this is fragile and unreliable. However, since qtoml doesn't expose a lot of options for rendering TOML, I thought it was worth sharing.

PS. It would be great to make this a feature by providing a qtoml.InlineTableDict class (which simply subclasses UserDict).

[feature request] make the toml encoder extensible

The standard library json encoder is extensible, a feature that I find very handy:

import json
from pathlib import Path
import numpy as np

class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Path):
            return obj.as_posix()
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # Let the base class default method raise the TypeError
        return super().default(obj)

so that

>>> json.dumps(Path("foo") / "bar", cls=MyEncoder)
'"foo/bar"'
>>> json.dumps(np.zeros((2,2,3)), cls=MyEncoder)
'[[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]'

What about implementing a similar protocol also in qtoml?
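Until something like that exists, a similar effect can be approximated by converting the data before handing it to the encoder. This is only a sketch: to_encodable and my_default are hypothetical names, not qtoml APIs, and the list of types passed through unchanged is an assumption about what a TOML encoder accepts natively.

```python
import datetime
from pathlib import Path

# Types assumed to be encodable as-is (an assumption, not qtoml's list).
NATIVE_TYPES = (str, int, float, bool,
                datetime.datetime, datetime.date, datetime.time)


def to_encodable(obj, default):
    """Recursively convert obj, calling default() on unsupported values."""
    if isinstance(obj, dict):
        return {k: to_encodable(v, default) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_encodable(v, default) for v in obj]
    if isinstance(obj, NATIVE_TYPES):
        return obj
    # Convert the unsupported value, then walk whatever default() returned.
    return to_encodable(default(obj), default)


def my_default(obj):
    if isinstance(obj, Path):
        return obj.as_posix()
    raise TypeError(f"cannot encode {type(obj).__name__}")


data = {'paths': [Path("foo") / "bar"], 'count': 3}
prepared = to_encodable(data, my_default)
print(prepared)
```

The prepared dictionary contains only plain types and could then be passed to qtoml.dumps. Like json.JSONEncoder.default, the hook raises TypeError for anything it does not recognize.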