fabiocaccamo / python-benedict

dict subclass with keylist/keypath support, built-in I/O operations (base64, csv, html, ini, json, pickle, plist, query-string, toml, xls, xml, yaml), s3 support and many utilities.

License: MIT License

python-benedict

python-benedict is a dict subclass with keylist/keypath/keyattr support, I/O shortcuts (base64, cli, csv, html, ini, json, pickle, plist, query-string, toml, xls, xml, yaml) and many utilities... for humans, obviously.

Features

100% backward-compatible, you can safely wrap existing dictionaries.
NEW Keyattr support for get/set items using keys as attributes.
Keylist support using list of keys as key.
Keypath support using keypath-separator (dot syntax by default).
Keypath list-index support (also negative) using the standard [n] suffix.
Normalized I/O operations with most common formats: base64, cli, csv, html, ini, json, pickle, plist, query-string, toml, xls, xml, yaml.
Multiple I/O operations backends: file-system (read/write), url (read-only), s3 (read/write).
Many utility and parse methods to retrieve data as needed (check the API section).
Well tested. ;)

Index

Installation
- Optional Requirements
Usage
- Basics
- Keyattr my_dict.x.y.z
- Keylist my_dict["x", "y", "z"]
- Keypath my_dict["x.y.z"]
- I/O
- API
Testing
License

Installation

If you want to install everything:

Run pip install "python-benedict[all]"

Alternatively, you can install the main package:

Run pip install python-benedict, then install only the optional requirements you need.

Optional Requirements

Here is the hierarchy of installation targets available when running pip install "python-benedict[...]" (each target installs all its sub-targets):

[all]
- [io]
  - [html]
  - [toml]
  - [xls]
  - [xml]
  - [yaml]
- [parse]
- [s3]

Usage

Basics

benedict is a dict subclass, so it is possible to use it as a normal dictionary (you can just cast an existing dict).

from benedict import benedict

# create a new empty instance
d = benedict()

# or cast an existing dict
d = benedict(existing_dict)

# or create from data source (filepath, url or data-string) in a supported format:
# Base64, CSV, JSON, TOML, XML, YAML, query-string
d = benedict("https://localhost:8000/data.json", format="json")

# or in a Django view
params = benedict(request.GET.items())
page = params.get_int("page", 1)

Keyattr

It is possible to get/set items using keys as attributes (dotted notation).

d = benedict(keyattr_dynamic=True) # default False
d.profile.firstname = "Fabio"
d.profile.lastname = "Caccamo"
print(d) # -> { "profile":{ "firstname":"Fabio", "lastname":"Caccamo" } }

By default (keyattr_dynamic=False), attribute access only works for getting and setting items that already exist.

Disable keyattr functionality

You can disable the keyattr functionality by passing the keyattr_enabled=False option to the constructor.

d = benedict(existing_dict, keyattr_enabled=False) # default True

or using the getter/setter property.

d.keyattr_enabled = False

Dynamic keyattr functionality

You can enable dynamic attribute access by passing keyattr_dynamic=True to the constructor.

d = benedict(existing_dict, keyattr_dynamic=True) # default False

or using the getter/setter property.

d.keyattr_dynamic = True

Warning - although this feature is very useful, it has some obvious limitations: it works only for string keys that are unprotected (not starting with an _) and that don't clash with the names of the currently supported methods.

Keylist

Wherever a key is used, you can also use a list (or a tuple) of keys.

d = benedict()

# set values by keys list
d["profile", "firstname"] = "Fabio"
d["profile", "lastname"] = "Caccamo"
print(d) # -> { "profile":{ "firstname":"Fabio", "lastname":"Caccamo" } }
print(d["profile"]) # -> { "firstname":"Fabio", "lastname":"Caccamo" }

# check if keys list exists in dict
print(["profile", "lastname"] in d) # -> True

# delete value by keys list
del d["profile", "lastname"]
print(d["profile"]) # -> { "firstname":"Fabio" }
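To make the keylist idea concrete, here is a minimal sketch of the same behaviour on a plain dict. The helper names (set_by_keys, get_by_keys, has_keys, del_by_keys) are hypothetical and written only for illustration; they are not part of benedict, and the library's own implementation may differ.

```python
from functools import reduce


def set_by_keys(data, keys, value):
    """Set a value in nested dicts, creating intermediate dicts as needed."""
    *parents, last = keys
    node = data
    for key in parents:
        node = node.setdefault(key, {})
    node[last] = value


def get_by_keys(data, keys):
    """Walk nested dicts following the given list of keys."""
    return reduce(lambda node, key: node[key], keys, data)


def has_keys(data, keys):
    """Return True if the whole chain of keys exists."""
    try:
        get_by_keys(data, keys)
    except (KeyError, TypeError):
        return False
    return True


def del_by_keys(data, keys):
    """Delete the item addressed by the list of keys."""
    *parents, last = keys
    del get_by_keys(data, parents)[last]


d = {}
set_by_keys(d, ["profile", "firstname"], "Fabio")
set_by_keys(d, ["profile", "lastname"], "Caccamo")
del_by_keys(d, ["profile", "lastname"])
print(d)  # -> {'profile': {'firstname': 'Fabio'}}
```

With benedict the same operations are available directly through the usual d[...] syntax, so helpers like these are not needed.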

Keypath

. is the default keypath separator.

If you cast an existing dict and its keys contain the keypath separator a ValueError will be raised.

In this case you should use a custom keypath separator or disable keypath functionality.

d = benedict()

# set values by keypath
d["profile.firstname"] = "Fabio"
d["profile.lastname"] = "Caccamo"
print(d) # -> { "profile":{ "firstname":"Fabio", "lastname":"Caccamo" } }
print(d["profile"]) # -> { "firstname":"Fabio", "lastname":"Caccamo" }

# check if keypath exists in dict
print("profile.lastname" in d) # -> True

# delete value by keypath
del d["profile.lastname"]
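The following sketch illustrates, on a plain dict, how a keypath can be resolved and why keys that already contain the separator are a problem. The functions split_keypath, check_keys and get_by_keypath are hypothetical names chosen for this example and are not benedict's API.

```python
def split_keypath(keypath, separator="."):
    """Turn "a.b.c" into ["a", "b", "c"]; with no separator the key is used as-is."""
    if separator is None:
        return [keypath]
    return keypath.split(separator)


def check_keys(data, separator="."):
    """Raise ValueError if any (nested) string key already contains the separator."""
    for key, value in data.items():
        if isinstance(key, str) and separator in key:
            raise ValueError(f"key {key!r} contains the separator {separator!r}")
        if isinstance(value, dict):
            check_keys(value, separator)


def get_by_keypath(data, keypath, separator="."):
    """Follow each key of the keypath through the nested dicts."""
    node = data
    for key in split_keypath(keypath, separator):
        node = node[key]
    return node


data = {"profile": {"firstname": "Fabio", "lastname": "Caccamo"}}
check_keys(data)  # no key contains ".", so nothing is raised
print(get_by_keypath(data, "profile.firstname"))  # -> Fabio
print(get_by_keypath(data, "profile/lastname", separator="/"))  # -> Caccamo
```

A key such as "example.com" would be ambiguous with "." as separator, which is why a custom separator (or no separator at all) is the suggested way out.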

Custom keypath separator

You can customize the keypath separator by passing the keypath_separator argument to the constructor.

If you pass an existing dict to the constructor and its keys contain the keypath separator an Exception will be raised.

d = benedict(existing_dict, keypath_separator="/")

Change keypath separator

You can change the keypath_separator at any time using the getter/setter property.

If any existing key contains the new keypath_separator an Exception will be raised.

d.keypath_separator = "/"

Disable keypath functionality

You can disable the keypath functionality by passing the keypath_separator=None option to the constructor.

d = benedict(existing_dict, keypath_separator=None)

or using the getter/setter property.

d.keypath_separator = None

List index support

List indexes are supported: keypaths can include indexes (also negative) using the [n] suffix, so nested list items can be addressed directly:

# Eg. get last location coordinates of the first result:
loc = d["results[0].locations[-1].coordinates"]
lat = loc.get_decimal("latitude")
lng = loc.get_decimal("longitude")
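As an illustration of the [n] suffix, this sketch parses a keypath with list indexes into a flat list of keys and integers, then walks a plain nested structure. The regular expressions and the functions parse_keypath and get_by_keypath are assumptions made for this example, not benedict's actual parser.

```python
import re

# A segment is a key name followed by zero or more [n] index suffixes.
_SEGMENT = re.compile(r"([^\[\]]*)((?:\[-?\d+\])*)")
_INDEX = re.compile(r"\[(-?\d+)\]")


def parse_keypath(keypath, separator="."):
    """Split "results[0].locations[-1]" into ["results", 0, "locations", -1]."""
    keys = []
    for segment in keypath.split(separator):
        match = _SEGMENT.fullmatch(segment)
        if match is None:
            raise ValueError(f"malformed segment: {segment!r}")
        name, indexes = match.groups()
        if name:
            keys.append(name)
        keys.extend(int(i) for i in _INDEX.findall(indexes))
    return keys


def get_by_keypath(data, keypath):
    """Walk dicts (by key) and lists (by index) along the parsed keypath."""
    node = data
    for key in parse_keypath(keypath):
        node = node[key]
    return node


d = {
    "results": [
        {"locations": [
            {"coordinates": {"latitude": "45.46", "longitude": "9.19"}},
            {"coordinates": {"latitude": "41.90", "longitude": "12.49"}},
        ]}
    ]
}
loc = get_by_keypath(d, "results[0].locations[-1].coordinates")
print(loc["latitude"])  # -> 41.90
```

Negative indexes work here simply because Python lists accept them, so [-1] selects the last location.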

I/O

For simplifying I/O operations, benedict supports a variety of input/output methods with most common formats: base64, cli, csv, html, ini, json, pickle, plist, query-string, toml, xls, xml, yaml.

Input via constructor

It is possible to create a benedict instance directly from data-source (filepath, url, s3 or data string) by passing the data source and the data format (optional, default "json") in the constructor.

# filepath
d = benedict("/root/data.yml", format="yaml")

# url
d = benedict("https://localhost:8000/data.xml", format="xml")

# s3
d = benedict("s3://my-bucket/data.xml", s3_options={"aws_access_key_id": "...", "aws_secret_access_key": "..."})

# data
d = benedict('{"a": 1, "b": 2, "c": 3, "x": 7, "y": 8, "z": 9}')

Input methods

All input methods can be accessed as class methods and are prefixed by from_* followed by the format name.
In all input methods, the first argument can represent a source: file path, url, s3 url, or data string.

Input sources

All supported sources (file, url, s3, data) are allowed by default, but in certain situations, when the input data comes from untrusted sources, it may be useful to restrict the allowed sources using the sources argument:

# url
d = benedict("https://localhost:8000/data.json", sources=["url"]) # -> ok
d = benedict.from_json("https://localhost:8000/data.json", sources=["url"]) # -> ok

# s3
d = benedict("s3://my-bucket/data.json", sources=["url"]) # -> raise ValueError
d = benedict.from_json("s3://my-bucket/data.json", sources=["url"]) # -> raise ValueError

Output methods

All output methods can be accessed as instance methods and are prefixed by to_* followed by the format name.
In all output methods, if the filepath="..." kwarg is specified, the output will also be saved at the specified filepath.

Supported formats

Here are the details of the supported formats, operations and extra options docs.

format	input	output	extra options docs
`base64`	✅	✅	-
`cli`	✅	❌	argparse
`csv`	✅	✅	csv
`html`	✅	❌	bs4 (Beautiful Soup 4)
`ini`	✅	✅	configparser
`json`	✅	✅	json
`pickle`	✅	✅	pickle
`plist`	✅	✅	plistlib
`query-string`	✅	✅	-
`toml`	✅	✅	toml
`xls`	✅	❌	openpyxl - xlrd
`xml`	✅	✅	xmltodict
`yaml`	✅	✅	PyYAML

API

Utility methods
- clean
- clone
- dump
- filter
- find
- flatten
- groupby
- invert
- items_sorted_by_keys
- items_sorted_by_values
- keypaths
- match
- merge
- move
- nest
- remove
- rename
- search
- standardize
- subset
- swap
- traverse
- unflatten
- unique
I/O methods
Parse methods

Utility methods

These methods are common utilities that will speed up your everyday work.

Utilities that accept key argument(s) also support keypath(s).

Utilities that return a dictionary always return a new benedict instance.

`clean`

# Clean the current dict instance removing all empty values: None, "", {}, [], ().
# If strings or collections (dict, list, set, tuple) flags are False,
# related empty values will not be deleted.
d.clean(strings=True, collections=True)

`clone`

# Return a clone (deepcopy) of the dict.
c = d.clone()

`dump`

# Return a readable representation of any dict/list.
# This method can be used both as static method or instance method.
s = benedict.dump(d.keypaths())
print(s)
# or
d = benedict()
print(d.dump())

`filter`

# Return a filtered dict using the given predicate function.
# Predicate function receives key, value arguments and should return a bool value.
predicate = lambda k, v: v is not None
f = d.filter(predicate)

`find`

# Return the first match searching for the given keys/keypaths.
# If no result found, default value is returned.
keys = ["a.b.c", "m.n.o", "x.y.z"]
f = d.find(keys, default=0)

`flatten`

# Return a new flattened dict using the given separator to join nested dict keys to flatten keypaths.
f = d.flatten(separator="_")
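A minimal sketch of what flattening means, written for plain dicts. The standalone flatten function below is a hypothetical helper for illustration; details such as how empty dicts or non-string keys are treated are assumptions and may not match d.flatten.

```python
def flatten(data, separator="_", prefix=""):
    """Return a new one-level dict whose keys are the joined nested keys."""
    flat = {}
    for key, value in data.items():
        path = f"{prefix}{separator}{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty dicts, carrying the joined path along.
            flat.update(flatten(value, separator, path))
        else:
            flat[path] = value
    return flat


nested = {"a": 1, "b": {"c": 2, "d": {"e": 3}}}
print(flatten(nested))  # -> {'a': 1, 'b_c': 2, 'b_d_e': 3}
```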

`groupby`

# Group a list of dicts at key by the value of the given by_key and return a new dict.
g = d.groupby("cities", by_key="country_code")
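For illustration, grouping a list of dicts by one of their keys can be sketched with the standard library as follows. The standalone groupby function and the sample cities are made up for this example.

```python
from collections import defaultdict


def groupby(items, by_key):
    """Group a list of dicts by the value each dict holds at by_key."""
    groups = defaultdict(list)
    for item in items:
        groups[item[by_key]].append(item)
    return dict(groups)


d = {
    "cities": [
        {"name": "Rome", "country_code": "IT"},
        {"name": "Milan", "country_code": "IT"},
        {"name": "Paris", "country_code": "FR"},
    ]
}
g = groupby(d["cities"], by_key="country_code")
print(sorted(g))  # -> ['FR', 'IT']
print([city["name"] for city in g["IT"]])  # -> ['Rome', 'Milan']
```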

`invert`

# Return an inverted dict where values become keys and keys become values.
# Since multiple keys could have the same value, each value will be a list of keys.
# If flat is True each value will be a single value (use this only if values are unique).
i = d.invert(flat=False)
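The effect of the flat flag can be sketched on a plain dict like this. The standalone invert function is a hypothetical helper; note how, in this sketch, duplicated values silently keep only the last key when flat is True.

```python
def invert(data, flat=False):
    """Swap keys and values; collect keys in lists unless flat is True."""
    inverted = {}
    for key, value in data.items():
        if flat:
            inverted[value] = key  # a later key overwrites an earlier one
        else:
            inverted.setdefault(value, []).append(key)
    return inverted


d = {"a": 1, "b": 2, "c": 1}
print(invert(d))  # -> {1: ['a', 'c'], 2: ['b']}
print(invert(d, flat=True))  # -> {1: 'c', 2: 'b'}
```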

`items_sorted_by_keys`

# Return items (key/value list) sorted by keys.
# If reverse is True, the list will be reversed.
items = d.items_sorted_by_keys(reverse=False)

`items_sorted_by_values`

# Return items (key/value list) sorted by values.
# If reverse is True, the list will be reversed.
items = d.items_sorted_by_values(reverse=False)

`keypaths`

# Return a list of all keypaths in the dict.
# If indexes is True, the output will include list values indexes.
k = d.keypaths(indexes=False)

`match`

# Return a list of all values whose keypath matches the given pattern (a regex or string).
# If pattern is string, wildcard can be used (eg. [*] can be used to match all list indexes).
# If indexes is True, the pattern will be matched also against list values.
m = d.match(pattern, indexes=True)

`merge`

# Merge one or more dictionary objects into current instance (deepupdate).
# Sub-dictionaries keys will be merged together.
# If overwrite is False, existing values will not be overwritten.
# If concat is True, list values will be concatenated together.
d.merge(a, b, c, overwrite=True, concat=False)
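The comments above can be read as the following deep-merge rules, sketched here for plain dicts. This is one possible interpretation of overwrite and concat written for illustration (the merge function is hypothetical), and the behaviour of d.merge in a given release may differ.

```python
def merge(target, other, overwrite=True, concat=False):
    """Deep-merge `other` into `target` in place and return `target`."""
    for key, value in other.items():
        current = target.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            # Sub-dictionaries are merged key by key.
            merge(current, value, overwrite=overwrite, concat=concat)
        elif concat and isinstance(current, list) and isinstance(value, list):
            # With concat, lists are joined instead of replaced.
            current.extend(value)
        elif key not in target or overwrite:
            target[key] = value
    return target


a = {"x": {"y": [1], "z": "old"}}
b = {"x": {"y": [2], "z": "new", "w": True}}
print(merge(a, b, concat=True))
# -> {'x': {'y': [1, 2], 'z': 'new', 'w': True}}
```

In this sketch concat takes precedence over overwrite for list values, so lists are joined regardless of the overwrite flag.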

`move`

# Move an item from key_src to key_dst.
# It can be used to rename a key.
# If key_dst exists, its value will be overwritten.
d.move("a", "b", overwrite=True)

`nest`

# Nest a list of dicts at the given key and return a new nested list
# using the specified keys to establish the correct items hierarchy.
d.nest("values", id_key="id", parent_id_key="parent_id", children_key="children")
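As a rough sketch of the nesting step, a flat list of dicts that reference their parent by id can be turned into a tree like this. The standalone nest function and the sample data are assumptions for illustration; in particular, treating items with an unknown parent as roots is a choice made for this sketch.

```python
def nest(items, id_key="id", parent_id_key="parent_id", children_key="children"):
    """Build a tree from a flat list of dicts that reference their parent by id."""
    # Copy each item so the input list is left untouched.
    nodes = {item[id_key]: {**item, children_key: []} for item in items}
    roots = []
    for node in nodes.values():
        parent = nodes.get(node[parent_id_key])
        if parent is None:
            roots.append(node)  # no known parent: treat as a top-level item
        else:
            parent[children_key].append(node)
    return roots


values = [
    {"id": 1, "parent_id": None, "name": "root"},
    {"id": 2, "parent_id": 1, "name": "child"},
    {"id": 3, "parent_id": 2, "name": "grandchild"},
]
tree = nest(values)
print(tree[0]["children"][0]["children"][0]["name"])  # -> grandchild
```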

`remove`

# Remove multiple keys from the dict.
# It is possible to pass a single key or more keys (as list or *args).
d.remove(["firstname", "lastname", "email"])

`rename`

# Rename a dict item key from "key" to "key_new".
# If key_new exists, a KeyError will be raised.
d.rename("first_name", "firstname")

`search`

# Search and return a list of items (dict, key, value) matching the given query.
r = d.search("hello", in_keys=True, in_values=True, exact=False, case_sensitive=False)

`standardize`

# Standardize all dict keys, e.g. "Location Latitude" -> "location_latitude".
d.standardize()

`subset`

# Return a dict subset for the given keys.
# It is possible to pass a single key or more keys (as list or *args).
s = d.subset(["firstname", "lastname", "email"])

`swap`

# Swap items values at the given keys.
d.swap("firstname", "lastname")

`traverse`

# Traverse a dict passing each item (dict, key, value) to the given callback function.
def f(d, key, value):
    print(f"dict: {d} - key: {key} - value: {value}")
d.traverse(f)

`unflatten`

# Return a new unflattened dict using the given separator to split dict keys to nested keypaths.
u = d.unflatten(separator="_")
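The reverse of flattening can be sketched for plain dicts as follows. The standalone unflatten function is a hypothetical helper; it assumes all keys are strings.

```python
def unflatten(data, separator="_"):
    """Split each key on the separator and rebuild the nested dicts."""
    nested = {}
    for flat_key, value in data.items():
        *parents, last = flat_key.split(separator)
        node = nested
        for key in parents:
            node = node.setdefault(key, {})
        node[last] = value
    return nested


flat = {"a": 1, "b_c": 2, "b_d_e": 3}
print(unflatten(flat))  # -> {'a': 1, 'b': {'c': 2, 'd': {'e': 3}}}
```

Keep in mind that with this approach a key that legitimately contains the separator (for example "first_name" with "_") would also be split, so the separator should be chosen accordingly.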

`unique`

# Remove duplicated values from the dict.
d.unique()

I/O methods

These methods are available for input/output operations.

`from_base64`

# Try to load/decode base64-encoded data and return it as a benedict instance.
# Accept as first argument: url, filepath or data-string.
# It's possible to choose the subformat used under the hood:
# ('csv', 'json', 'query-string', 'toml', 'xml', 'yaml'), default: 'json'.
# It's possible to choose the encoding, default 'utf-8'.
# A ValueError is raised in case of failure.
d = benedict.from_base64(s, subformat="json", encoding="utf-8", **kwargs)

`from_cli`

# Load and decode data from a string of CLI arguments.
# ArgumentParser specific options can be passed using kwargs:
# https://docs.python.org/3/library/argparse.html#argparse.ArgumentParser
# Return a new dict instance. A ValueError is raised in case of failure.
d = benedict.from_cli(s, **kwargs)

`from_csv`

# Try to load/decode csv-encoded data and return it as a benedict instance.
# Accept as first argument: url, filepath or data-string.
# It's possible to specify the columns list, default: None (in this case the first row values will be used as keys).
# It's possible to pass decoder specific options using kwargs:
# https://docs.python.org/3/library/csv.html
# A ValueError is raised in case of failure.
d = benedict.from_csv(s, columns=None, columns_row=True, **kwargs)

`from_html`

# Try to load/decode html data and return it as a benedict instance.
# Accept as first argument: url, filepath or data-string.
# It's possible to pass decoder specific options using kwargs:
# https://beautiful-soup-4.readthedocs.io/
# A ValueError is raised in case of failure.
d = benedict.from_html(s, **kwargs)

`from_ini`

# Try to load/decode ini-encoded data and return it as a benedict instance.
# Accept as first argument: url, filepath or data-string.
# It's possible to pass decoder specific options using kwargs:
# https://docs.python.org/3/library/configparser.html
# A ValueError is raised in case of failure.
d = benedict.from_ini(s, **kwargs)

`from_json`

# Try to load/decode json-encoded data and return it as a benedict instance.
# Accept as first argument: url, filepath or data-string.
# It's possible to pass decoder specific options using kwargs:
# https://docs.python.org/3/library/json.html
# A ValueError is raised in case of failure.
d = benedict.from_json(s, **kwargs)

`from_pickle`

# Try to load/decode a pickle encoded in Base64 format and return it as a benedict instance.
# Accept as first argument: url, filepath or data-string.
# It's possible to pass decoder specific options using kwargs:
# https://docs.python.org/3/library/pickle.html
# A ValueError is raised in case of failure.
d = benedict.from_pickle(s, **kwargs)

`from_plist`

# Try to load/decode p-list-encoded data and return it as a benedict instance.
# Accept as first argument: url, filepath or data-string.
# It's possible to pass decoder specific options using kwargs:
# https://docs.python.org/3/library/plistlib.html
# A ValueError is raised in case of failure.
d = benedict.from_plist(s, **kwargs)

`from_query_string`

# Try to load/decode a query-string and return it as a benedict instance.
# Accept as first argument: url, filepath or data-string.
# A ValueError is raised in case of failure.
d = benedict.from_query_string(s, **kwargs)

`from_toml`

# Try to load/decode toml-encoded data and return it as a benedict instance.
# Accept as first argument: url, filepath or data-string.
# It's possible to pass decoder specific options using kwargs:
# https://pypi.org/project/toml/
# A ValueError is raised in case of failure.
d = benedict.from_toml(s, **kwargs)

`from_xls`

# Try to load/decode an xls file (".xls", ".xlsx", ".xlsm") and return it as a benedict instance.
# Accept as first argument: url, filepath or data-string.
# It's possible to pass decoder specific options using kwargs:
# - https://openpyxl.readthedocs.io/ (for .xlsx and .xlsm files)
# - https://pypi.org/project/xlrd/ (for .xls files)
# A ValueError is raised in case of failure.
d = benedict.from_xls(s, sheet=0, columns=None, columns_row=True, **kwargs)

`from_xml`

# Try to load/decode xml-encoded data and return it as a benedict instance.
# Accept as first argument: url, filepath or data-string.
# It's possible to pass decoder specific options using kwargs:
# https://github.com/martinblech/xmltodict
# A ValueError is raised in case of failure.
d = benedict.from_xml(s, **kwargs)

`from_yaml`

# Try to load/decode yaml-encoded data and return it as a benedict instance.
# Accept as first argument: url, filepath or data-string.
# It's possible to pass decoder specific options using kwargs:
# https://pyyaml.org/wiki/PyYAMLDocumentation
# A ValueError is raised in case of failure.
d = benedict.from_yaml(s, **kwargs)

`to_base64`

# Return the dict instance encoded in base64 format and optionally save it at the specified 'filepath'.
# It's possible to choose the subformat used under the hood:
# ('csv', 'json', 'query-string', 'toml', 'xml', 'yaml'), default: 'json'.
# It's possible to choose the encoding, default 'utf-8'.
# It's possible to pass encoder specific options using kwargs.
# A ValueError is raised in case of failure.
s = d.to_base64(subformat="json", encoding="utf-8", **kwargs)

`to_csv`

# Return a list of dicts in the current dict encoded in csv format and optionally save it at the specified filepath.
# It's possible to specify the key of the item (list of dicts) to encode, default: 'values'.
# It's possible to specify the columns list, default: None (in this case the keys of the first item will be used).
# A ValueError is raised in case of failure.
s = d.to_csv(key="values", columns=None, columns_row=True, **kwargs)

`to_ini`

# Return the dict instance encoded in ini format and optionally save it at the specified filepath.
# It's possible to pass encoder specific options using kwargs:
# https://docs.python.org/3/library/configparser.html
# A ValueError is raised in case of failure.
s = d.to_ini(**kwargs)

`to_json`

# Return the dict instance encoded in json format and optionally save it at the specified filepath.
# It's possible to pass encoder specific options using kwargs:
# https://docs.python.org/3/library/json.html
# A ValueError is raised in case of failure.
s = d.to_json(**kwargs)

`to_pickle`

# Return the dict instance as pickle encoded in Base64 format and optionally save it at the specified filepath.
# The pickle protocol used by default is 2.
# It's possible to pass encoder specific options using kwargs:
# https://docs.python.org/3/library/pickle.html
# A ValueError is raised in case of failure.
s = d.to_pickle(**kwargs)

`to_plist`

# Return the dict instance encoded in p-list format and optionally save it at the specified filepath.
# It's possible to pass encoder specific options using kwargs:
# https://docs.python.org/3/library/plistlib.html
# A ValueError is raised in case of failure.
s = d.to_plist(**kwargs)

`to_query_string`

# Return the dict instance as query-string and optionally save it at the specified filepath.
# A ValueError is raised in case of failure.
s = d.to_query_string(**kwargs)

`to_toml`

# Return the dict instance encoded in toml format and optionally save it at the specified filepath.
# It's possible to pass encoder specific options using kwargs:
# https://pypi.org/project/toml/
# A ValueError is raised in case of failure.
s = d.to_toml(**kwargs)

`to_xml`

# Return the dict instance encoded in xml format and optionally save it at the specified filepath.
# It's possible to pass encoder specific options using kwargs:
# https://github.com/martinblech/xmltodict
# A ValueError is raised in case of failure.
s = d.to_xml(**kwargs)

`to_yaml`

# Return the dict instance encoded in yaml format and optionally save it at the specified filepath.
# It's possible to pass encoder specific options using kwargs:
# https://pyyaml.org/wiki/PyYAMLDocumentation
# A ValueError is raised in case of failure.
s = d.to_yaml(**kwargs)

Parse methods

These methods are wrappers of the get method, they parse data trying to return it in the expected type.

`get_bool`

# Get value by key or keypath trying to return it as bool.
# Values like `1`, `true`, `yes`, `on`, `ok` will be returned as `True`.
d.get_bool(key, default=False)
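The kind of parsing described above can be sketched like this. The parse_bool function is hypothetical; the set of true values comes from the comment above, while the set of false values is an assumption made for this example.

```python
TRUE_VALUES = {"1", "true", "yes", "on", "ok"}
# Assumed for this sketch; the text above only lists the true values.
FALSE_VALUES = {"0", "false", "no", "off", "ko"}


def parse_bool(value, default=False):
    """Return value as a bool, or default if it cannot be interpreted."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    return default


print(parse_bool("Yes"))  # -> True
print(parse_bool("off", default=True))  # -> False
print(parse_bool("maybe"))  # -> False
```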

`get_bool_list`

# Get value by key or keypath trying to return it as list of bool values.
# If separator is specified and value is a string it will be split.
d.get_bool_list(key, default=[], separator=",")

`get_date`

# Get value by key or keypath trying to return it as date.
# If format is not specified it will be autodetected.
# If choices is given and value is in choices, return value, otherwise return default.
d.get_date(key, default=None, format=None, choices=[])

`get_date_list`

# Get value by key or keypath trying to return it as list of date values.
# If separator is specified and value is a string it will be split.
d.get_date_list(key, default=[], format=None, separator=",")

`get_datetime`

# Get value by key or keypath trying to return it as datetime.
# If format is not specified it will be autodetected.
# If choices is given and value is in choices, return value, otherwise return default.
d.get_datetime(key, default=None, format=None, choices=[])

`get_datetime_list`

# Get value by key or keypath trying to return it as list of datetime values.
# If separator is specified and value is a string it will be split.
d.get_datetime_list(key, default=[], format=None, separator=",")

`get_decimal`

# Get value by key or keypath trying to return it as Decimal.
# If choices is given and value is in choices, return value, otherwise return default.
d.get_decimal(key, default=Decimal("0.0"), choices=[])

`get_decimal_list`

# Get value by key or keypath trying to return it as list of Decimal values.
# If separator is specified and value is a string it will be split.
d.get_decimal_list(key, default=[], separator=",")

`get_dict`

# Get value by key or keypath trying to return it as dict.
# If value is a json string it will be automatically decoded.
d.get_dict(key, default={})

`get_email`

# Get email by key or keypath and return it.
# If value is blacklisted it will be automatically ignored.
# If check_blacklist is False, it will be not ignored even if blacklisted.
d.get_email(key, default="", choices=None, check_blacklist=True)

`get_float`

# Get value by key or keypath trying to return it as float.
# If choices is given and value is in choices, return value, otherwise return default.
d.get_float(key, default=0.0, choices=[])

`get_float_list`

# Get value by key or keypath trying to return it as list of float values.
# If separator is specified and value is a string it will be split.
d.get_float_list(key, default=[], separator=",")

`get_int`

# Get value by key or keypath trying to return it as int.
# If choices is given and value is in choices, return value, otherwise return default.
d.get_int(key, default=0, choices=[])

`get_int_list`

# Get value by key or keypath trying to return it as list of int values.
# If separator is specified and value is a string it will be split.
d.get_int_list(key, default=[], separator=",")
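To show what splitting on a separator looks like in practice, here is a small sketch of a list parser for integers. The parse_int_list function is hypothetical, and silently skipping items that are not valid integers is a choice made for this example, not a statement about how get_int_list behaves.

```python
def parse_int_list(value, default=None, separator=","):
    """Return value as a list of ints, splitting strings on the separator."""
    if isinstance(value, str) and separator:
        value = value.split(separator)
    if not isinstance(value, (list, tuple)):
        return [] if default is None else default
    result = []
    for item in value:
        try:
            result.append(int(str(item).strip()))
        except ValueError:
            continue  # skip items that are not valid integers (sketch-only choice)
    return result


print(parse_int_list("1, 2, x, 4"))  # -> [1, 2, 4]
print(parse_int_list(None, default=[0]))  # -> [0]
```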

`get_list`

# Get value by key or keypath trying to return it as list.
# If separator is specified and value is a string it will be split.
d.get_list(key, default=[], separator=",")

`get_list_item`

# Get list by key or keypath and return value at the specified index.
# If separator is specified and list value is a string it will be split.
d.get_list_item(key, index=0, default=None, separator=",")

`get_phonenumber`

# Get phone number by key or keypath and return a dict with different formats (e164, international, national).
# If country code is specified (alpha 2 code), it will be used to parse phone number correctly.
d.get_phonenumber(key, country_code=None, default=None)

`get_slug`

# Get value by key or keypath trying to return it as slug.
# If choices is given and value is in choices, return value, otherwise return default.
d.get_slug(key, default="", choices=[])

`get_slug_list`

# Get value by key or keypath trying to return it as list of slug values.
# If separator is specified and value is a string it will be split.
d.get_slug_list(key, default=[], separator=",")

`get_str`

# Get value by key or keypath trying to return it as string.
# Encoding issues will be automatically fixed.
# If choices is given and value is in choices, return value, otherwise return default.
d.get_str(key, default="", choices=[])

`get_str_list`

# Get value by key or keypath trying to return it as list of str values.
# If separator is specified and value is a string it will be split.
d.get_str_list(key, default=[], separator=",")

`get_uuid`

# Get value by key or keypath trying to return it as valid uuid.
# If choices is given and value is in choices, return value, otherwise return default.
d.get_uuid(key, default="", choices=[])

`get_uuid_list`

# Get value by key or keypath trying to return it as list of valid uuid values.
# If separator is specified and value is a string it will be split.
d.get_uuid_list(key, default=[], separator=",")

Testing

# clone repository
git clone https://github.com/fabiocaccamo/python-benedict.git && cd python-benedict

# create virtualenv and activate it
python -m venv venv && . venv/bin/activate

# upgrade pip
python -m pip install --upgrade pip

# install requirements
pip install -r requirements.txt -r requirements-test.txt

# install pre-commit to run formatters and linters
pre-commit install --install-hooks

# run tests using tox
tox

# or run tests using unittest
python -m unittest

License

Released under MIT License.

Supporting

⭐ Star this project on GitHub
Follow me on GitHub
💙 Follow me on Twitter
💰 Sponsor me on Github

python-benedict's Issues

Append to list

Maybe this feature already exists and I haven't seen it.
With benedict it is possible to add a new item to a list specifying its index like:

d['results[0].locations[4]'] = "Rome"

It would be fantastic, in my opinion, to be able to append a new element without specifying its index:

d['results[0].locations[]'] = "Rome"
# Added to locations at index 4 because locations has length 3

Bug: Benedict to_json converts np.int64 to str

Whenever a value has the np.int64 datatype, benedict's to_json returns it as a str, so when I load the JSON back I get strings instead of numbers.

json.loads(benedict({"x": [np.int64(0), np.int64(34)]}).to_json())

benedict.to_json returns empty dict for benedict from generator

Python version
Python 3.8.3

Package version
python-benedict==0.21.0

Current behavior (bug description)
benedict.to_json for a benedict created from a generator returns '{}' instead of the json dumped dict

Expected behavior
benedict.to_json should return the actual json dumped dict

Steps to reproduce

from benedict import benedict


def gen_dict():
    for k, v in enumerate('abcd'):
        yield k, v


if __name__ == '__main__':
    b = benedict(gen_dict())
    assert b == {0: 'a', 1: 'b', 2: 'c', 3: 'd'}
    assert b.to_json()  ==  '{"0": "a", "1": "b", "2": "c", "3": "d"}'

But if I recast the benedict to a dict and back to a benedict, it works fine:

if __name__ == '__main__':
    b = benedict(gen_dict())
    b = benedict(dict(b))
    assert b == {0: 'a', 1: 'b', 2: 'c', 3: 'd'}
    assert b.to_json()  ==  '{"0": "a", "1": "b", "2": "c", "3": "d"}'

Cast benedict to dict

I'm trying to write a benedict object to a yaml file, and there is one problem.
benedict.to_yaml adds "!!python/object/new:benedict.dicts.benedict\ndictitems:" to the yaml file.
How can I cast a benedict object to a "simple" dictionary that looks like the initial dictionary?
Or how can I write a benedict to yaml without "!!python/object/new:benedict.dicts.benedict\ndictitems:" and "keypath_separator:.\n"?

I would like the benedict object to have a member like "python_dict", so that it would be possible to make a call like:

my_dict = benedict_object.python_dict
type(my_dict ) is dict
True

Reduce dependencies via separate package?

Hi, have you given any thought to separating the core dictionary traversal/manipulation extensions into its own package? We were surprised that the dependencies included things as far reaching as url fetching, the IO featureset, etc.

Add django utility methods

Add django utility methods:

from_django_request(request)
from_django_model(instance)
from_django_models(instances)

Concat parameter on merge method does not work properly

Wrong behavior

New list overwrites old list, or nothing happens.

Assumed behavior

List contains values from original and merged lists.

Steps to reproduce problem

Init benedict as follows

test = benedict({'foo': {'bar': {'foobar': {'barbar': ['test']}}}})

Init dict to merge as follows

test2 = {'foo': {'bar': {'foobar': {'barbar': ['test2']}}}}

Try to merge

test.merge(test2, overwrite=True, concat=True) 
# Leads to {...'barbar':['test2'] }, expected {...'barbar':['test', 'test2'] }

test.merge(test2, overwrite=False, concat=True) 
# Leads to {...'barbar':['test'] }, not sure what to expect here

AttributeError: 'benedict' object has no attribute '_keypath_separator'

Hola! Happy 2020!

I love this library; best thing that happened to me in months, actually :-)

Now, I'm stumbling upon an issue that I haven't been able to resolve on my own.

It occurs when a pool worker attempts to return a list of totally separate and independent benedict objects.

Any idea why that might be?

Many thanks,
Panos

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 463, in _handle_results
    task = get()
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/usr/local/lib/python3.6/dist-packages/benedict/dicts/keypath.py", line 125, in __setitem__
    self._check_keypath_separator_in_keys(value)
  File "/usr/local/lib/python3.6/dist-packages/benedict/dicts/keypath.py", line 28, in _check_keypath_separator_in_keys
    sep = self._keypath_separator
AttributeError: 'benedict' object has no attribute '_keypath_separator'

Edit 1:

I just tried the same approach minus the multiprocessing part. It is just an iteration of the following:

i) I create a benedict object,
ii) I append it to a list,
iii) If the list has hit a length of 10,000 items I pickle the list, and create a new one.

Upon pickle read I'm getting the same error:

  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python3.6/dist-packages/benedict/dicts/keypath.py", line 125, in __setitem__
    self._check_keypath_separator_in_keys(value)
  File "/usr/local/lib/python3.6/dist-packages/benedict/dicts/keypath.py", line 28, in _check_keypath_separator_in_keys
    sep = self._keypath_separator
AttributeError: 'benedict' object has no attribute '_keypath_separator'
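The traceback is consistent with how pickle rebuilds dict subclasses: the instance is created without calling __init__, the items are then assigned through __setitem__, and only afterwards is the instance state restored. The sketch below reproduces that with two hypothetical classes (FragileDict and RobustDict are stand-ins, not benedict code) and shows one possible mitigation, a class-level default for the attribute.

```python
import pickle


class FragileDict(dict):
    """Reads an instance attribute inside __setitem__ (hypothetical class)."""

    def __init__(self, *args, **kwargs):
        self._separator = "."
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
        # During unpickling this runs before the instance state is restored,
        # so self._separator does not exist yet on a FragileDict.
        if self._separator in str(key):
            raise ValueError("key contains the separator")
        super().__setitem__(key, value)


class RobustDict(FragileDict):
    # A class-level default is visible even before instance state is restored.
    _separator = "."


def roundtrip(obj):
    return pickle.loads(pickle.dumps(obj))


try:
    roundtrip(FragileDict({"a": 1}))
    fragile_error = None
except AttributeError as exc:
    fragile_error = exc

restored = roundtrip(RobustDict({"a": 1}))
print(type(fragile_error).__name__, restored)
```

The same ordering applies to objects returned from multiprocessing pool workers, since results are pickled on the way back to the parent process.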

Clean values in lists

Is there a way to clean all values, including those that are in a list with one method? If not it would be nice to have.
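Until something like this exists in the library, a recursive helper in plain Python can do it. This is only a sketch: clean_deep is a hypothetical name, and "empty" is assumed here to mean None, an empty string, an empty list, or an empty dict.

```python
def _is_empty(value):
    # Treat None and zero-length strings/lists/dicts as empty; keep 0 and False.
    if value is None:
        return True
    return isinstance(value, (str, list, dict)) and len(value) == 0


def clean_deep(value):
    """Return a cleaned copy of `value`, descending into dicts and lists."""
    if isinstance(value, dict):
        cleaned = {key: clean_deep(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if not _is_empty(item)}
    if isinstance(value, list):
        cleaned = [clean_deep(item) for item in value]
        return [item for item in cleaned if not _is_empty(item)]
    return value


data = {"a": None, "b": ["", {"c": None}, "x", 0], "d": {"e": []}}
cleaned_data = clean_deep(data)
print(cleaned_data)
```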

dump benedict object to yaml not working in versions from 0.20 to 0.22

Python version
3.8.0

Package version
0.20 - 0.22

Current behavior (bug description)
Code example:

#!/bin/python

from benedict import benedict
import yaml

data = benedict({"level1": {"level2": "blablabla"}})
data_yaml = yaml.safe_dump(dict(data))
print(data_yaml)

With python-benedict==0.19.0:

level1:
  level2: blablabla

With python-benedict==0.22.0:

Traceback (most recent call last):
  File "./test.py", line 7, in <module>
    data_yaml = yaml.safe_dump(dict(data))
  File "/home/lburinov/.local/lib/python3.8/site-packages/yaml/__init__.py", line 306, in safe_dump
    return dump_all([data], stream, Dumper=SafeDumper, **kwds)
  File "/home/lburinov/.local/lib/python3.8/site-packages/yaml/__init__.py", line 278, in dump_all
    dumper.represent(data)
  File "/home/lburinov/.local/lib/python3.8/site-packages/yaml/representer.py", line 27, in represent
    node = self.represent_data(data)
  File "/home/lburinov/.local/lib/python3.8/site-packages/yaml/representer.py", line 48, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
  File "/home/lburinov/.local/lib/python3.8/site-packages/yaml/representer.py", line 207, in represent_dict
    return self.represent_mapping('tag:yaml.org,2002:map', data)
  File "/home/lburinov/.local/lib/python3.8/site-packages/yaml/representer.py", line 118, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/home/lburinov/.local/lib/python3.8/site-packages/yaml/representer.py", line 58, in represent_data
    node = self.yaml_representers[None](self, data)
  File "/home/lburinov/.local/lib/python3.8/site-packages/yaml/representer.py", line 231, in represent_undefined
    raise RepresenterError("cannot represent an object", data)
yaml.representer.RepresenterError: ('cannot represent an object', {'level2': 'blablabla'})

to_yaml() does not work either.
Code example:

#!/bin/python

from benedict import benedict

data = benedict({"level1": {"level2": "blablabla"}})
data_yaml = data.to_yaml()
print(data_yaml)

Result with python-benedict==0.19.0:

level1:
  level2: blablabla

Result with python-benedict==0.22.0:

level1: !!python/object/new:benedict.dicts.benedict
  dictitems:
    level2: blablabla
  state:
    _dict:
      level2: blablabla
    _keypath_separator: .
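A possible workaround for the safe_dump case, assuming (as the traceback suggests) that nested values come back as dict subclass instances: dict(data) converts only the top level, so the nested values have to be converted as well. The helper to_plain and the stand-in class MyDict below are hypothetical names used for illustration.

```python
def to_plain(value):
    """Recursively convert dict subclasses (and tuples) to built-in types."""
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value


class MyDict(dict):
    """Stand-in for any dict subclass."""


data = MyDict({"level1": MyDict({"level2": "blablabla"})})
plain = to_plain(data)
print(type(plain["level1"]).__name__)
```

The result contains only built-in dicts and lists, so it can be handed to yaml.safe_dump or json.dumps.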

Add .ini file format support.

Add .ini file format support with the following methods:

from_ini
to_ini

Strange compatibility issue Spyder 4 vs. PyCharm

Python version
3.8

Package version
0.18.2

Current behavior (bug description)
When I run this simple code in Spyder 4 (WinPython64-3.8.5.0 ) the package does what it should.

from benedict import benedict

d = benedict('jason.json', format='json')

print(d['embeds[0].author.name'])

jason.json file:

{
  "content": "Message content",
  "embeds": [
    {
      "title": "Embed1 title",
      "description": "embed 1 description",
      "url": "http://embed1titleurl.com",
      "color": 7506394,
      "author": {
        "name": "embed1 Author name",
        "url": "http://embed1authorurl.com",
        "icon_url": "https://discohook.org/static/discord-avatar.png"
      },
      "footer": {
        "text": "embed 1 footer text",
        "icon_url": "https://discohook.org/static/discord-avatar.png"
      },
      "timestamp": "2020-09-16T22:00:00.000Z"
    },
    {
      "title": "Embed2 title",
      "description": "embed 2 description",
      "url": "http://embed2titleurl.com",
      "color": 7506394,
      "fields": [
        {
          "name": "embed 2 Field 1 name",
          "value": "embed 2 field 1 value"
        }
      ],
      "author": {
        "name": "embed2 Author name",
        "url": "http://embed2authorurl.com",
        "icon_url": "https://discohook.org/static/discord-avatar.png"
      },
      "footer": {
        "text": "embed 2 footer text",
        "icon_url": "https://discohook.org/static/discord-avatar.png"
      },
      "timestamp": "2020-09-07T22:00:00.000Z"
    }
  ]
}

Output:
embed1 Author name

but when I run this script in PyCharm 2020.2.1 (Community Edition) (Python 3.8) it fails on the import.

Traceback (most recent call last):
  File "C:/Users/TAROX/PycharmProjects/GateControlBot/benedict_test.py", line 3, in <module>
    from benedict import benedict
ImportError: cannot import name 'benedict' from 'benedict' (C:\Users\TAROX\PycharmProjects\GateControlBot\venv\lib\site-packages\benedict\__init__.py)

I'm guessing because the package and the class is named exactly the same (btw. BIG NoNo) pycharm is reporting this as an error.
I like the functionality you guys put into this package but I'm currently working inside pycharm and kinda need it working there.

setting value on path with an index doesn't create the list element

Python version
3.8.1

Package version
0.22.0

Current behavior (bug description)
d['x[0].y'] = value
throws an exception if x[0] doesn't exist

Expected behavior
d['x[0].y'] = value
I expected that list element 0 would be created automatically (just as a missing key is) if it doesn't exist.
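To make the expectation concrete, here is a sketch in plain Python that works on a list of keys rather than a keypath string. The function set_path and its helpers are hypothetical, negative indexes are not handled, and missing list slots are padded with None.

```python
def _pad(items, index):
    # Grow the list with None placeholders until `index` is valid.
    while len(items) <= index:
        items.append(None)


def _get_or_create(container, key, default):
    if isinstance(container, list):
        _pad(container, key)
        if container[key] is None:
            container[key] = default
        return container[key]
    return container.setdefault(key, default)


def set_path(container, keys, value):
    """Set `value` at `keys`, creating missing dicts and list slots on the way."""
    for key, next_key in zip(keys, keys[1:]):
        # The type of the next key decides which container to create.
        default = [] if isinstance(next_key, int) else {}
        container = _get_or_create(container, key, default)
    if isinstance(container, list):
        _pad(container, keys[-1])
    container[keys[-1]] = value


d = {}
set_path(d, ["x", 0, "y"], "value")
print(d)
```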

conda-forge package

Hi @fabiocaccamo, we are looking to use benedict in https://github.com/iterative/dvc. We need to support conda in DVC and are currently in the process of submitting a conda-forge package for python-benedict. Would you like us to add you to the maintainers list for the conda package? (Maintenance mostly just involves approving auto-generated PRs from conda's bots whenever a new PyPI package is released.)

conda-forge/staged-recipes#13326

new_flatten() suggestion

What do you think about a new method that combines some of the characteristics of three current methods?
I would suggest (after your excellent work for the match method) something like the following:

r = d.new_flatten(regex_pattern, separator='_', in_keys=True, in_values=True, case_sensitive=False)
     where:
     - search pattern could be imported from the match(...) method
     - arguments could be some of the search(...) parameters
     - returned data could be a dict just like flatten() --> {'full_key_path': value}

It might look like a grep command with a useful output :-)

Thanks and Regards

Update or insert

Problem description

The current merge() function is awesome, but it cannot handle the situation where we want to update or, if necessary, extend a nested dictionary/list with a new value/path.

Here is an example:

test_object = {
    "foo": [{"bar": {"rer" : "value"}}]
}

new_key_value_object = {
'foo': [{'bar': {'test': 'update'}}]
}

# I want to update or insert a value at path `foo[0].bar.test`, depending on whether the path exists.

# With overwrite=True (starting from the original test_object):
test_object.merge(new_key_value_object, overwrite=True)
# result: {'foo': [{'bar': {'test': 'update'}}]}

# With overwrite=False (starting from the original test_object):
test_object.merge(new_key_value_object, overwrite=False)
# result: {"foo": [{"bar": {"rer": "value"}}]}

# Desired result:
# {
#     "foo": [
#         {"bar": {
#             "rer": "value",
#             "test": "update"
#         }}
#     ]
# }

Generally, I want a function that inserts or updates a value at a path without overriding the path itself. If parts of the path are missing, they are added, and nothing is overridden except the final value.

Possible solutions

Create insert_or_update(path, newValue) method
Add new parameters to merge() function to support this feature.
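Either option could behave roughly like the following plain-Python sketch. The name deep_upsert is hypothetical, and matching list elements by index is an assumption about what "update" should mean for lists.

```python
def deep_upsert(base, new):
    """Merge `new` into `base` in place; only leaf values are overwritten."""
    if isinstance(base, dict) and isinstance(new, dict):
        for key, value in new.items():
            if key in base:
                base[key] = deep_upsert(base[key], value)
            else:
                base[key] = value
        return base
    if isinstance(base, list) and isinstance(new, list):
        # Lists are merged element by element, matched by index.
        for index, value in enumerate(new):
            if index < len(base):
                base[index] = deep_upsert(base[index], value)
            else:
                base.append(value)
        return base
    # Leaf value (or type mismatch): the new value wins.
    return new


test_object = {"foo": [{"bar": {"rer": "value"}}]}
new_key_value_object = {"foo": [{"bar": {"test": "update"}}]}
result = deep_upsert(test_object, new_key_value_object)
print(result)
```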

Improve traverse, calling the callback with the full path key instead of the last one

Sometimes in traverse api it can be useful to have the full path in the callback, for example to find a particular node in the nested dict.

Something like:

def _traverse_collection(d, path, callback):
    if type_util.is_dict(d):
        _traverse_dict(d, path, callback)
    elif type_util.is_list_or_tuple(d):
        _traverse_list(d, path, callback)

def _traverse_dict(d, path, callback):
    keys = list(d.keys())
    for key in keys:
        value = d.get(key, None)
        callback(d, key, path, value)
        path.append(key)
        _traverse_collection(value, path, callback)
        path.pop()


def _traverse_list(ls, path, callback):
    items = list(enumerate(ls))
    for index, value in items:
        callback(ls, index, path, value)
        path.append(index)
        _traverse_collection(value, path, callback)
        path.pop()

def traverse(d, callback):
    if not callable(callback):
        raise ValueError('callback argument must be a callable.')
    _traverse_collection(d, [], callback)

example code:

>>> d = {'dict1': [{'foo': 1, 'bar': 2}], 'dict2': {'baz': 3, 'quux': 4}}
>>> def traverse_item(dct, key, path, value):
...    print('key: {} - path: {} - value: {}'.format(key, path, value))
... 
>>> traverse(d, traverse_item)
key: dict1 - path: [] - value: [{'foo': 1, 'bar': 2}]
key: 0 - path: ['dict1'] - value: {'foo': 1, 'bar': 2}
key: foo - path: ['dict1', 0] - value: 1
key: bar - path: ['dict1', 0] - value: 2
key: dict2 - path: [] - value: {'baz': 3, 'quux': 4}
key: baz - path: ['dict2'] - value: 3
key: quux - path: ['dict2'] - value: 4

A key containing a space does not work in a keypath, but it works in a keylist

Python version
3.6

Package version
0.14.1

Current behavior (bug description)
Not sure if it's a bug

In [19]: test = input_benedict["Project Manager[0].signum"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-19-7b8ccd1047cc> in <module>
----> 1 test = input_benedict["Project Manager[0].signum"]

/usr/local/lib/python3.6/site-packages/benedict/dicts/keypath.py in __getitem__(self, key)
     92         else:
     93             key = keys[0]
---> 94             value = super(KeypathDict, self).__getitem__(key)
     95         return value
     96

KeyError:

but key list works

test = input_benedict["Project Manager", 0, "signum"]

Expected behavior
Works for both keypath and key list?

List indexes in keypath?

Hi there, I'm loving benedict, but have a feature request... Can we have something like this:

from benedict import benedict

d = benedict({
    "a": [
        {"b": 42}, 
        {"b": 24}
    ]
})
d.get("a.1.b")
=> 24
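To illustrate the requested lookup, a minimal plain-Python sketch follows. The function get_path is hypothetical; it treats a path segment as an index only when the current value is a list, so dict keys such as "1" keep working.

```python
def get_path(data, keypath, separator=".", default=None):
    """Resolve a keypath where numeric segments index into lists."""
    current = data
    for part in keypath.split(separator):
        if isinstance(current, list):
            try:
                current = current[int(part)]
            except (ValueError, IndexError):
                return default
        elif isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return default
    return current


d = {"a": [{"b": 42}, {"b": 24}]}
value = get_path(d, "a.1.b")
print(value)
```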

Is S3 integration planned, where the input file could be in any format (xml/csv) and the output is a dict?

Input: files at an S3 location (in any file format supported by benedict)
All operations should be supported thereafter

Support returning indexes of list value in keypaths() utility function

For example,

data = {"items": ["a", {"c": 1}]}

assert benedict(data).keypaths() == [
    "items",
    "items[0]",
    "items[1]",
    "items[1].c"
]

In my use case, I use a regular expression as a filter for keypaths; the filtered keypaths are then passed to subset() to make a new filtered dict.

For example,

import re

from benedict import benedict

# Raw string with escaped brackets and dot, so that they match literally.
pattern = r"items\[\d+\]\.name"

data = {"items": [{"name": "a", "value": "b"}, {"name": "c", "value": "d"}]}
b_data = benedict(data)

filtered_keypaths = [kp for kp in b_data.keypaths() if re.fullmatch(pattern, kp)]
filtered_data = b_data.subset(filtered_keypaths)

assert filtered_data == {"items": [{"name": "a"}, {"name": "c"}]}

Just like how map() works in functional programming.
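The requested output can be sketched in plain Python as follows. The function keypaths_with_indexes is a hypothetical helper, not the library's keypaths() implementation, and it assumes the default "." separator.

```python
import re


def keypaths_with_indexes(value, prefix=""):
    """List every keypath, including an entry for each list index."""
    paths = []
    if isinstance(value, dict):
        for key, item in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            paths.append(path)
            paths.extend(keypaths_with_indexes(item, path))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            path = f"{prefix}[{index}]"
            paths.append(path)
            paths.extend(keypaths_with_indexes(item, path))
    return paths


data = {"items": ["a", {"c": 1}]}
paths = keypaths_with_indexes(data)
# Brackets and dots are escaped so that they match literally.
matching = [kp for kp in paths if re.fullmatch(r"items\[\d+\]\.c", kp)]
print(paths, matching)
```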

Search for int or float value returns no results

Python version
3.8

Package version
0.17.0

Current behavior (bug description)

d = benedict({'A': 1, 'B': {'C': 'hello', 'D': 'Hello', 'E': 123}})
r = d.search('hello', in_keys=False, in_values=True, exact=False, case_sensitive=False)
print(r)
r = d.search(123, in_keys=False, in_values=True, exact=False, case_sensitive=False)
print(r)

output:

[({'C': 'hello', 'D': 'Hello', 'E': 123}, 'C', 'hello'), ({'C': 'hello', 'D': 'Hello', 'E': 123}, 'D', 'Hello')]
[]

Expected behavior
Actual path, key, value

The package fails to load AWS SAM YAML format with resource references

Python version
3.8

Package version
@latest (10.08.20)

Current behavior (bug description)

Loading a YAML template with the following entry

...
AccountsFunction:
    Description: "Accounts Function ARN"
    Value: !GetAtt AccountsLambda.Arn

throws the following exception

ValueError: Invalid data or url or filepath argument: .../template.yaml
could not determine a constructor for the tag '!GetAtt'

Expected behavior
Such tags should be supported as well

I can't provide the whole template as it contains sensitive info but the provided info should be sufficient.
Please contact me if you have further questions.
Thanks in advance.

to_toml() ValueError: Circular reference detected

Python version

3.7.0

Package version

0.23.2

Current behavior (bug description)

After reading a TOML file into a benedict object and modifying a value, trying to write the changes back to TOML raises a ValueError.

Example:

from benedict import benedict
pyproject_toml = benedict("./pyproject.toml", format="toml")
pyproject_toml["tool.poetry.name"] = "name"
pyproject_toml.to_toml()

Exception raised:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../.venv/lib/python3.7/site-packages/benedict/dicts/io/io_dict.py", line 223, in to_toml
    return self._encode(self, 'toml', **kwargs)
  File "/.../.venv/lib/python3.7/site-packages/benedict/dicts/io/io_dict.py", line 55, in _encode
    s = io_util.encode(d, format, **kwargs)
  File "/.../.venv/lib/python3.7/site-packages/benedict/dicts/io/io_util.py", line 28, in encode
    s = serializer.encode(d, **kwargs)
  File "/.../.venv/lib/python3.7/site-packages/benedict/serializers/toml.py", line 20, in encode
    data = toml.dumps(d, **kwargs)
  File "/.../.venv/lib/python3.7/site-packages/toml/encoder.py", line 67, in dumps
    raise ValueError("Circular reference detected")
ValueError: Circular reference detected

Expected behavior

The modified TOML should be dumped as a string and, if a filepath argument is specified, written to the destination filepath.

The bug is identified in the code and a PR to fix the problem is going to be submitted ASAP.

filter and modify list of dicts

So I have a list of dicts like this (the list could also be nested inside another dict):
guilds = [{"ID":1, "Name":"x"}, {"ID":2, "Name":"y"}]

I want to modify the value of the Name key in the dict whose ID is 1. I could not find out how to do that with the library.
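In plain Python, without relying on any library method, this can be done with a generator expression; the replacement value "new name" is just a placeholder.

```python
guilds = [{"ID": 1, "Name": "x"}, {"ID": 2, "Name": "y"}]

# Find the first dict whose ID is 1; None if there is no such entry.
guild = next((g for g in guilds if g.get("ID") == 1), None)
if guild is not None:
    guild["Name"] = "new name"

print(guilds)
```

If the list is nested inside another dict, the same expression applies to the nested list once it has been retrieved.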

Add docstrings to methods

When programming using the python-benedict, I like to use the fact that my IDE can automatically show docstrings for methods. This way, you don't have to constantly look up the source code or guess the meaning of parameters.

You pretty much have all of this information in the readme.md, so it might be good to copy that into your code.
Ideally, you can then automatically generate more extensive and searchable documentation with sphinx.

Example:

>>> from benedict import benedict
>>> test_dict = benedict({"a": 1})
>>> help(test_dict.clean)

Help on method clean in module benedict.dicts:

clean(strings=True, dicts=True, lists=True) method of benedict.dicts.benedict instance

This is not informative and should return a proper docstring.

wish-list: override parameter to merge function

I just discovered your "benedict" package and I'm really enjoying it. It greatly simplifies the handling of dict in python. I am not a professional programmer but I love doing it. I wanted to ask if it is possible to add the "override" parameter to the merge() function, perhaps keeping its default value to False. In the meantime, waiting for your reply, I took the liberty of making a copy of merge.py and modifying it accordingly (attached is my solution just as sample).
Thanks very much for your work
Regards
Loreto

LnMerge.zip

[Question] Can I use [n] as a way to loop through all in list?

In [1]: from benedict import benedict

In [2]: test = {"lorem": [{"ipsum":"a"}, {"ipsum": "b"}]}

In [3]: benedict_test = benedict(test)

In [4]: result = benedict_test.get("lorem[0].ipsum")

In [5]: result
Out[5]: 'a'

In [6]: result = benedict_test.get("lorem[n].ipsum")

In [7]: result

I guess the answer to my question is a simple no.

But I was wondering if there's a way to get what I want? Either via a feature request or via other pythonic code to work with existing benedict?

Expected output is a list of ['a', 'b']
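With existing Python, one way to get that list is a comprehension over the list value. The sketch uses the plain dict; the same expression should work on the value returned by benedict_test.get("lorem"), assuming it behaves like a list of dicts.

```python
test = {"lorem": [{"ipsum": "a"}, {"ipsum": "b"}]}

# Collect the "ipsum" value from every element of the "lorem" list.
result = [item["ipsum"] for item in test["lorem"]]
print(result)
```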

flatten method should return a line for each element in a list

I need the flatten method to return {"key_path": value} also for elements of a list inside a dictionary.
For the following dict:

test:
  d10:
    d11: single
    d12:
      crontab: ["cron_01", "cron_02"]
      minidlna: ["mini_01", "mini_02"]
    d13:
      - dict01: ciao_01
      - dict02: ciao_02

dict.flatten() returns:

{
    "test/d10/d11": "single",
    "test/d10/d12/crontab": ["cron_01", "cron_02"],
    "test/d10/d12/minidlna": ["mini_01", "mini_02"],
    "test/d10/d13": [{"dict01": "ciao_01"}, {"dict02": "ciao_02"} ]
}

I would like to get something like:

{
    "test/d10/d11": "single",
    "test/d10/d12/crontab/[0]": "cron_01",
    "test/d10/d12/crontab/[1]": "cron_02",
    "test/d10/d12/minidlna/[0]": "mini_01",
    "test/d10/d12/minidlna/[1]": "mini_02",
    "test/d10/d13/[0]/dict01": "ciao_01",
    "test/d10/d13/[1]/dict02": "ciao_02"
}

where the list-index could be '[i]' or just 'i' or anything easily identifiable.
Attached is a sample of flatten module, modified from the original, to return the above output if explode_lists=True is passed as argument.
Do you think this can be done?

Going further, I would also like benedict to accept the returned path as a way to address the value of a single list element.

Regards
Loreto

flatten.zip
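For readers without the attachment, the requested behavior can be sketched in plain Python like this. It is not the attached module and not the library's flatten(); the explode_lists parameter name is taken from the description above, everything else is an assumption.

```python
def flatten(value, separator="/", explode_lists=False, prefix=""):
    """Flatten nested dicts (and optionally lists) into {"key_path": value}."""
    if isinstance(value, dict):
        children = value.items()
    elif explode_lists and isinstance(value, list):
        # Each list element gets its own "[i]" path segment.
        children = ((f"[{i}]", item) for i, item in enumerate(value))
    else:
        return {prefix: value}
    items = {}
    for key, child in children:
        path = f"{prefix}{separator}{key}" if prefix else str(key)
        items.update(flatten(child, separator, explode_lists, path))
    return items


data = {"test": {"d10": {
    "d11": "single",
    "d12": {"crontab": ["cron_01", "cron_02"]},
    "d13": [{"dict01": "ciao_01"}, {"dict02": "ciao_02"}],
}}}
flat = flatten(data, explode_lists=True)
print(flat)
```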

Unable to clone existing benedict object with keypath_separator=None

Python version
3.6.10

Package version
0.20.0

Current behavior (bug description)

>>> b = benedict({"a.b": 1}, keypath_separator=None)
>>> b
{'a.b': 1}
>>> c = benedict(b, keypath_separator=None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/peter/devel/env/lib/python3.6/site-packages/benedict/dicts/__init__.py", line 39, in __init__
    super(benedict, self).__init__(args[0].dict())
  File "/home/peter/devel/env/lib/python3.6/site-packages/benedict/dicts/keypath/keypath_dict.py", line 14, in __init__
    keypath_util.check_keys(self, self._keypath_separator)
  File "/home/peter/devel/env/lib/python3.6/site-packages/benedict/dicts/keypath/keypath_util.py", line 24, in check_keys
    traverse(d, check_key)
  File "/home/peter/devel/env/lib/python3.6/site-packages/benedict/core/traverse.py", line 31, in traverse
    _traverse_collection(d, callback)
  File "/home/peter/devel/env/lib/python3.6/site-packages/benedict/core/traverse.py", line 8, in _traverse_collection
    _traverse_dict(d, callback)
  File "/home/peter/devel/env/lib/python3.6/site-packages/benedict/core/traverse.py", line 17, in _traverse_dict
    callback(d, key, value)
  File "/home/peter/devel/env/lib/python3.6/site-packages/benedict/dicts/keypath/keypath_util.py", line 23, in check_key
    '\'{}\', found: \'{}\'.'.format(separator, key))
ValueError: keys should not contain keypath separator '.', found: 'a.b'.

Expected behavior

>>> from benedict import benedict
>>> b = benedict({"a.b": 1}, keypath_separator=None)
>>> b
{'a.b': 1}
>>> c = benedict(b, keypath_separator=None)

This did work in 0.19.0.

Original object is getting updated in the `merge` operation

Python version
3.8.5

Package version
0.23.2

Current behavior (bug description)
The benedict object when merged with another dict also updates the original object, even though both point to different memory locations.
In the example below, a was passed to benedict and stored in b, and even though both have different memory addresses, both of these are updated in the merge operation:

>>> from benedict import benedict
>>> 
>>> a = {'1': {'words': 'one'}, '2': {'words': 'two'}}
>>> 
>>> b = benedict(a)
>>> 
>>> a is b
False
>>> 
>>> a['1'] is b['1']
False
>>> 
>>> 
>>> b.merge({'1': {'val': 1}})
>>> 
>>> b
{'1': {'words': 'one', 'val': 1}, '2': {'words': 'two'}}
>>> 
>>> a
{'1': {'words': 'one', 'val': 1}, '2': {'words': 'two'}}
>>> 
>>> b.merge({'2': {'words': 'due'}})
>>> 
>>> b
{'1': {'words': 'one', 'val': 1}, '2': {'words': 'due'}}
>>> a
{'1': {'words': 'one', 'val': 1}, '2': {'words': 'due'}}
>>>

Expected behavior
The merge operation should not update the original object (a in this case)
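A possible workaround, assuming the cause is that the wrapper shares nested data with the input dict (the output above is consistent with that, but it is an assumption): deep-copy the input before wrapping it, e.g. benedict(copy.deepcopy(a)). The plain-dict sketch below only illustrates the difference between sharing and copying nested data.

```python
import copy

a = {'1': {'words': 'one'}, '2': {'words': 'two'}}

shallow = dict(a)           # new outer dict, but the nested dicts are shared
deep = copy.deepcopy(a)     # nested dicts are copied as well

shallow['1']['val'] = 1     # leaks into `a`
deep['2']['words'] = 'due'  # does not touch `a`

print(a)
```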

Add binary format I/O support.

Add binary format I/O methods extending the .plist serializer to preserve data types.

from_binary()
to_binary()

filter method not taking key_path_separator in consideration

Python version
3.7.4

Package version
0.14.0

Current behavior (bug description)
d = benedict()
d.keypath_separator = '$'
--implementation--
print (d.keypath_separator)
d1 = d.filter(lambda k, v: True)
print (d1.keypath_separator)

$
.
This behavior makes me lose the correct structure when I have '.' in the key parameters.
Expected behavior
$
$

index filtering in list of dictionaries

For my project I needed an easy way to include index filtering, so I developed a similar class that allows addressing one or several dictionary entries with the following syntax:
d['key1.key2.(key3.key4==value).key3.key5']=newValue. I think this would be a very pythonic feature that would let users work with dictionaries and JSON files much as they do with, e.g., pandas dataframes.
Have you ever thought of including this functionality into python-benedict? If you are interested, I could contribute my code.

wrong key-values updates using pointers

Python version
Python 3.6.9

Package version
python-benedict (0.18.1)

Current behavior (bug description)
I'll try to explain the problem using a simple example.

my_servers = """

SERVER:
    S01:
        alias: s01_alias
        ssh_port: s01_port
        host: s01_host
        credentials:
            username: s01_user
            password: s01_passw
        location:
            building: s01_building
            floar: s01_floar
            room: s01_room
"""

Creating a benedict dictionary:

servers=benedict.from_yaml(my_servers)

Creating a couple of pointers (are they pointers???)

s01_ptr=benedict(servers['SERVER.S01'])
s01_copy=benedict(servers['SERVER.S01'].copy())

Changing some items:

s01_ptr['alias']='ptr_alias'
s01_ptr['location.building']='ptr_building'
s01_ptr['credentials.username']='ptr_unsername'

OUTPUT of the three objects:

--- s01_ptr:
{
    "alias": "ptr_alias",    <--- changed
    "credentials": {
        "password": "s01_passw",
        "username": "ptr_unsername"    <--- changed
    },
    "host": "s01_host",
    "location": {
        "building": "ptr_building",    <--- changed
        "floar": "s01_floar",
        "room": "s01_room"
    },
    "ssh_port": "s01_port"
}

--- s01_copy:
{
    "alias": "s01_alias",    <--- NOT changed
    "credentials": {
        "password": "s01_passw",
        "username": "ptr_unsername"    <--- changed
    },
    "host": "s01_host",
    "location": {
        "building": "ptr_building",    <--- changed
        "floar": "s01_floar",
        "room": "s01_room"
    },
    "ssh_port": "s01_port"
}

--- servers:
{
    "SERVER": {
        "S01": {
            "alias": "s01_alias",    <--- NOT changed
            "credentials": {
                "password": "s01_passw",
                "username": "ptr_unsername"    <--- changed
            },
            "host": "s01_host",
            "location": {
                "building": "ptr_building",    <--- changed
                "floar": "s01_floar",
                "room": "s01_room"
            },
            "ssh_port": "s01_port"
        }
    }
}

Expected behavior
As the output shows, all three objects are affected by the changes:
s01_ptr reflects all changes. That is OK.
s01_copy reflects only the changes to nested keys, not the change at the root level. That is NOT OK.
The original servers object behaves just like s01_copy. That is NOT OK.

So if they are pointers, the result is wrong, if they are copies the result is wrong.

Obviously, all this in case I have not made a huge mistake, in which case, I apologize and in any case thank you for the time spent analyzing the problem.

Just another question: what are the differences between the two statements used to create s01_ptr and s01_copy?
As I see it, s01_ptr should be a real reference (just as with a native python dictionary), while s01_copy should be a copy, given the .copy() suffix.
Finally, what are the differences between copy() and clone()?

Thank You for Your time
Loreto

P.S.: Attached a simple script to re-create the problem.
Benedict_test02.zip

Cannot import benedict

Python version
3.8

Package version
0.22.2

Current behavior (bug description)

from benedict import benedict
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'benedict' from 'benedict' (unknown location)

Expected behavior
It should be possible to import and use benedict.

Additional pip information
$ pip show python-benedict
Name: python-benedict
Version: 0.22.2
Summary: python-benedict is a dict subclass with keylist/keypath support, I/O shortcuts (base64, csv, json, pickle, plist, query-string, toml, xml, yaml) and many utilities... for humans, obviously.
Home-page: https://github.com/fabiocaccamo/python-benedict
Author: Fabio Caccamo
Author-email: [email protected]
License: MIT

get doesn't work when the key is a list with one element.

Python version
3.7.4

Package version
0.14.0

Current behavior (bug description)
I tried the following:

from benedict import benedict

key = ['1', '2']
dictionary = {'1' : {'2' : 'two'}}
d = benedict(dictionary)
print(d.get(key))

key = ['1']
dictionary = {'1' : 'one'}
d = benedict(dictionary)
print(d.get(key))

'two' is printed but 'one' is not. Instead I get a TypeError: unhashable type: 'list'

Expected behavior
I expected that the last command would print 'one'.

Keep pointer to the initial input dict (if provided).

json.dumps no longer works directly with benedict in 0.20.0

Python version
3.7.7

Package version
0.20.0

Django version
2.2.15

Current behavior (bug description)

Let's say you have a Django 2.2 model that uses the postgres JSONField

from django.contrib.postgres.fields import JSONField
from django.db import models

class SomeModel(models.Model):
    linkage = JSONField(default=dict)

When you then do

from benedict import benedict

normal_dict = {'list': {'scope': {'pk': '109'}}}
benedict_dict = benedict(normal_dict)

instance = SomeModel.objects.create(linkage=benedict_dict)
retrieved = SomeModel.objects.last()

instance.linkage still shows {'list': {'scope': {'pk': '109'}}}, but retrieved.linkage shows {}

What if cast to dict?

if we do this

normal_dict = {'list': {'scope': {'pk': '109'}}}
benedict_dict = benedict(normal_dict)

instance = SomeModel.objects.create(linkage=dict(benedict_dict))

instance.linkage still shows {'list': {'scope': {'pk': '109'}}}, but retrieved.linkage shows {'list': {}}

My workaround

import json

normal_dict = {'list': {'scope': {'pk': '109'}}}
benedict_dict = benedict(normal_dict)

# Round-trip through JSON to get plain built-in dicts at every level.
sanitized_dict = benedict_dict
if isinstance(benedict_dict, benedict):
    sanitized_dict = json.loads(benedict_dict.to_json())

instance = SomeModel.objects.create(linkage=sanitized_dict)

Expected behavior
In 0.19.0, this situation does not occur.

But in 0.20.0 this situation occurs. Based on my reading of the changelog it appears that all the methods will auto cast everything to benedict. I guess this is the issue.

Also, I would say this goes against the spirit of the line in the readme https://github.com/fabiocaccamo/python-benedict#basics
where it says you can use benedict as a normal dict. That is no longer true for 0.20.0 and JSONField.

Change of behavior from 0.16.0 to 0.17.0

Python version
3.7.5

Package version
0.17.0

Current behavior (bug description)

d = benedict({'jobs': [{'name': 'job-name', 'plan': [{'put': 'put-name', 'params': {'action': 'create', 'fields': {'Version/s': [{'name': 'N/A'}], 'User Impact': 'Info on User Impacts'}}}]}]}, keypath_separator='/')
outputs :

On 0.16.0
{'jobs': [{'name': 'job-name', 'plan': [{'put': 'put-name', 'params': {'action': 'create', 'fields': {'Version/s': [{'name': 'N/A'}], 'User Impact': 'Info on User Impacts'}}}]}]}

On 0.17.0
File "/usr/local/lib/python3.7/site-packages/benedict/dicts/keypath/keypath_util.py", line 23, in check_key
'\'{}\', found: \'{}\'.'.format(separator, key))
ValueError: keys should not contain keypath separator '/', found: 'Version/s'.

Expected behavior
We saw this error while running our unit tests.
I didn't see a specific bugfix, I suppose that 0.17.0 is the correct behavior?
Can you please confirm it?

Add a method that tries different key paths and returns the first truthy value or the last one

Hello, awesome library you got here, I thoroughly enjoyed the pun as well.

Working with it made me realize of another functionality that could be built in, namely trying a list of keypaths until one of them returns a truthy value. It could be called try_keypaths() or something like that.

Here's how I imagine one would use it:

info = {'details' : {'general': { 'description': 'A very neat description' } } }

molto_bene_dict = benedict(info)

try_keys = [
    'something.else.details.general.dscription',
    'unexisting.whatever',
    'details.general.description'
]

molto_bene_dict.try_keypaths(try_keys) == 'A very neat description'

If this is something you think it'd fit, I could implement it myself :)
Cheers!
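For discussion, the proposed behavior could look roughly like this in plain Python. The function below is a hypothetical free-function version of try_keypaths, assuming the default "." separator and treating a missing path as None.

```python
def try_keypaths(data, keypaths, separator="."):
    """Return the first truthy value found, else the last value tried."""
    value = None
    for keypath in keypaths:
        value = data
        for key in keypath.split(separator):
            if isinstance(value, dict) and key in value:
                value = value[key]
            else:
                # Path does not exist: treat it as None and try the next one.
                value = None
                break
        if value:
            return value
    return value


info = {'details': {'general': {'description': 'A very neat description'}}}
try_keys = [
    'something.else.details.general.dscription',
    'unexisting.whatever',
    'details.general.description',
]
found = try_keypaths(info, try_keys)
print(found)
```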

TypeError: 'module' object is not callable

Python version
3.6.9

Package version
0.18.1

Current behavior (bug description)
On the very first line where I call the benedict() class, an exception is thrown:

Traceback (most recent call last):
  File "benedict-test.py", line 18, in <module>
    d = benedict()
TypeError: 'module' object is not callable

This even happens when creating a dictionary just as your usage examples demonstrate.

Expected behavior
see above!

Can't specify a keypath separator at the same time as creating a dictionary from yaml

Python version
2.7

Package version
python-benedict 0.14.1

Current behavior (bug description)
When attempting to create a dictionary from a yaml file, it will fail if the data inside the yaml file contains a ".". This is expected, since the period is the default separator. However, the following seems like it should work, but it doesn't:

d = benedict.from_yaml("test.yml", keypath_separator=None)

When creating a dictionary from a yaml file, the option to set the keypath separator (or to disable it) does not work.

test.yml contents:

192.168.0.1:
  test: value
  test2: value2
value:
  value_with_period: 12.34.45

Expected behavior

This should work, if the test.yml contains a value that contains the separator.
d = benedict.from_yaml("test.yml", keypath_separator=None)

Is there a better way to do this? I see similar issues, but I don't see anything that is specifically helpful.

Missing key in KeyError Exception

Python version
Python 3.8

Package version
0.18.1

Current behavior (bug description)
I am using benedict to "flatten" nested dictionaries for easier use and using % as a separator.
When benedict can't find a key in the dictionary, it just gives the following error:
KeyError

It doesn't indicate which key is missing, unlike a normal dictionary, whose KeyError names the missing key.

Actual full error:

[ERROR] KeyError
Traceback (most recent call last):
...traceback related to my code...
File "/var/task/benedict/dicts/keypath/keypath_dict.py", line 34, in __getitem__
return super(KeypathDict, self).__getitem__(
File "/var/task/benedict/dicts/keylist/keylist_dict.py", line 43, in __getitem__
return self._getitem_by_keys(key)
File "/var/task/benedict/dicts/keylist/keylist_dict.py", line 52, in _getitem_by_keys
raise KeyError
KeyError

Expected behavior
If possible, the error should also state which key is missing.
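
To illustrate the requested behaviour, here is a minimal sketch over plain dicts; it is not the library's actual implementation. The function `get_by_keys` is hypothetical and simply passes the failing key to KeyError instead of raising a bare KeyError.

```python
def get_by_keys(data, keys):
    """Follow a list of keys through nested dicts, naming the key that fails."""
    current = data
    for depth, key in enumerate(keys):
        try:
            current = current[key]
        except (KeyError, TypeError):
            # Include the failing key and the path walked so far,
            # instead of a bare `raise KeyError`.
            path = keys[: depth + 1]
            raise KeyError(f"{key!r} (while resolving {path!r})") from None
    return current


config = {"db": {"host": "localhost"}}
host = get_by_keys(config, ["db", "host"])

try:
    get_by_keys(config, ["db", "port"])
except KeyError as exc:
    error_text = str(exc)  # mentions 'port' and the path that was tried
```

With a separator such as %, the path list could also be joined back into the original keypath for the message.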

Output to JSON broken after clone()

Python version
3.6.9

Package version
0.22.2

Current behavior (bug description)
I've discovered a case where JSON output is incorrect for cloned benedicts. It can be reproduced with the following:

$ pip freeze | grep benedict
python-benedict==0.22.2
$ python
Python 3.6.9 (default, Jan  9 2020, 16:16:25) 
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.10.44.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> from benedict import benedict
>>> import json
>>> d = {
...  'id': '37e4f6e876',
...  'meta': {'data': {'category': 'category0',
...                    'id': 'data_id',
...                    'title': 'A title'},
...           'id': '37e4f6e876',
...           'k0': {'ka': {'key1': '',
...                         'key2': 'value2',
...                         'key3': 'value3',
...                         'key4': True},
...                  'kb': {'key1': '',
...                         'key2': 'value2',
...                         'key3': 'value3',
...                         'key4': True},
...                  'kc': {'extra_key2': 'value2',
...                         'key1': '',
...                         'key2': 'value2',
...                         'key3': 'value3',
...                         'key4': True},
...                  'kd': {'key1': '',
...                         'key2': 'value2',
...                         'key3': 'value3',
...                         'key4': True},
...                  'ke': {'key1': '',
...                         'key2': 'value2',
...                         'key3': 'value3',
...                         'key4': True},
...                  'kf': {'key1': '',
...                         'key2': 'value2',
...                         'key3': 'separated',
...                         'key4': True}},
...           'language': 'en',
...           'name': 'name_value'}}
>>> keypaths = ['id', 'meta.k0.kc', 'meta.language']
>>> d = benedict(d)
>>> d_new = benedict()
>>> for path in keypaths:
...     d_new[path] = d[path]
... 
>>> d_new
{'id': '37e4f6e876', 'meta': {'k0': {'kc': {'extra_key2': 'value2', 'key1': '', 'key2': 'value2', 'key3': 'value3', 'key4': True}}, 'language': 'en'}}
>>> d_new.to_json()
'{"id": "37e4f6e876", "meta": {"k0": {"kc": {"extra_key2": "value2", "key1": "", "key2": "value2", "key3": "value3", "key4": true}}, "language": "en"}}'
>>> json.dumps(d_new)
'{"id": "37e4f6e876", "meta": {"k0": {"kc": {"extra_key2": "value2", "key1": "", "key2": "value2", "key3": "value3", "key4": true}}, "language": "en"}}'
>>> d_new2 = d_new.clone()
>>> d_new2
{'id': '37e4f6e876', 'meta': {'k0': {'kc': {'extra_key2': 'value2', 'key1': '', 'key2': 'value2', 'key3': 'value3', 'key4': True}}, 'language': 'en'}}
>>> d_new2.to_json()
'{"id": "37e4f6e876", "meta": {"k0": {"kc": {}}, "language": "en"}}'
>>> json.dumps(d_new2)
'{"id": "37e4f6e876", "meta": {"k0": {"kc": {}}, "language": "en"}}'
>>>

Note that the kc nested dictionary is empty.

Expected behavior

The kc dictionary should be {"extra_key2": "value2", "key1": "", "key2": "value2", "key3": "value3", "key4": true}
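
The expectation can be stated as an invariant: a copy should serialize to the same JSON as its source. Below is a rough sketch of such a check using only the standard library, with `copy.deepcopy` on a plain dict standing in for `clone()`; `serializes_identically` is a hypothetical helper name.

```python
import copy
import json


def serializes_identically(original, duplicate):
    """True when both objects produce the same JSON text."""
    return json.dumps(original, sort_keys=True) == json.dumps(duplicate, sort_keys=True)


source = {
    "id": "37e4f6e876",
    "meta": {"k0": {"kc": {"key1": "", "key4": True}}, "language": "en"},
}
duplicate = copy.deepcopy(source)  # stand-in for d_new.clone()
same_json = serializes_identically(source, duplicate)
```

The same comparison could be run against `d_new` and `d_new2` from the session above to turn this report into a regression test.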

Can not support one yaml format

apiVersion: v1
data:
  filebeat.yml: |-

If my yaml is as above, benedict.from_yaml will throw an exception:

Traceback (most recent call last):
  File "/Users/mizeng/Projects/PycharmProjects/Ming/tools/yaml_handle/process_yaml1.py", line 13, in <module>
    d = benedict.from_yaml(f)
  File "/Users/mizeng/Projects/PycharmProjects/Ming/tools/.venv/lib/python3.7/site-packages/benedict/dicts/io/io_dict.py", line 150, in from_yaml
    return cls(s, format='yaml', **kwargs)
  File "/Users/mizeng/Projects/PycharmProjects/Ming/tools/.venv/lib/python3.7/site-packages/benedict/dicts/__init__.py", line 44, in __init__
    super(benedict, self).__init__(*args, **kwargs)
  File "/Users/mizeng/Projects/PycharmProjects/Ming/tools/.venv/lib/python3.7/site-packages/benedict/dicts/keypath/keypath_dict.py", line 16, in __init__
    keypath_util.check_keys(self, self._keypath_separator)
  File "/Users/mizeng/Projects/PycharmProjects/Ming/tools/.venv/lib/python3.7/site-packages/benedict/dicts/keypath/keypath_util.py", line 24, in check_keys
    traverse(d, check_key)
  File "/Users/mizeng/Projects/PycharmProjects/Ming/tools/.venv/lib/python3.7/site-packages/benedict/core/traverse.py", line 31, in traverse
    _traverse_collection(d, callback)
  File "/Users/mizeng/Projects/PycharmProjects/Ming/tools/.venv/lib/python3.7/site-packages/benedict/core/traverse.py", line 8, in _traverse_collection
    _traverse_dict(d, callback)
  File "/Users/mizeng/Projects/PycharmProjects/Ming/tools/.venv/lib/python3.7/site-packages/benedict/core/traverse.py", line 18, in _traverse_dict
    _traverse_collection(value, callback)
  File "/Users/mizeng/Projects/PycharmProjects/Ming/tools/.venv/lib/python3.7/site-packages/benedict/core/traverse.py", line 8, in _traverse_collection
    _traverse_dict(d, callback)
  File "/Users/mizeng/Projects/PycharmProjects/Ming/tools/.venv/lib/python3.7/site-packages/benedict/core/traverse.py", line 17, in _traverse_dict
    callback(d, key, value)
  File "/Users/mizeng/Projects/PycharmProjects/Ming/tools/.venv/lib/python3.7/site-packages/benedict/dicts/keypath/keypath_util.py", line 23, in check_key
    '\'{}\', found: \'{}\'.'.format(separator, key))
ValueError: keys should not contain keypath separator '.', found: 'filebeat.yml'.

Performance issue

Python version
3.7

Package version
0.18.0 vs 0.21.0

Current behavior (bug description)
0.21.0 is up to 30 times slower in some instances than 0.18.0 for "large" dictionaries

Expected behavior
Performance is equal or better in 0.21.0

Code to reproduce:
import benedict
from benedict import benedict as ben
import logging
import os

logger = logging.getLogger(os.path.basename(__file__))
logger.setLevel(logging.INFO)

formatter = logging.Formatter('%(asctime)s : [%(levelname)s] : %(name)s : %(message)s')
log_file_handler = logging.StreamHandler()
log_file_handler.setFormatter(formatter)
logger.addHandler(log_file_handler)

# Create large nested dictionary

test = ben({})

logger.info(f"Starting test with python-benedict version: {benedict.__version__}")
for i in range(0, 500):
    for j in range(0, 100):
        test.set(f"{i}.{j}", f"text-{i}-{j}")

# Access multiple elements with a few missing element paths

for i in range(0, 550):
    for j in range(0, 110):
        test.get(f"{i}.{j}", None)

logger.info("End")
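
For comparing the two versions with numbers rather than log timestamps, a small timing harness might help. This is only a sketch: `time_lookups` and `plain_get` are hypothetical names, and the plain nested dict is a stand-in so the block runs without benedict installed. Passing `test.get` from the script above in place of `plain_get` would time the real lookups.

```python
import time


def time_lookups(get, rows=50, cols=10):
    """Return the seconds spent calling get("i.j") over a rows x cols grid."""
    start = time.perf_counter()
    for i in range(rows):
        for j in range(cols):
            get(f"{i}.{j}")
    return time.perf_counter() - start


# Stand-in data: smaller than the grid above, so some lookups miss on purpose.
nested = {str(i): {str(j): f"text-{i}-{j}" for j in range(8)} for i in range(40)}


def plain_get(keypath, default=None):
    """Two-level dotted lookup on a plain dict, returning default when missing."""
    outer, inner = keypath.split(".")
    return nested.get(outer, {}).get(inner, default)


elapsed = time_lookups(plain_get)
```

Running the same harness under 0.18.0 and 0.21.0 would give two directly comparable figures.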

Any direct way to use keypath for list of dictionaries?

Example:
maindict =

{
	"components": [
		{
			"name": "comp1",
			"value": "value1"
		},
		{
			"name": "comp2",
			"value": "value2"
		}
	]
}

I would like to get a result like this: ["comp1", "comp2"]

Any way to achieve this?
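
For reference, the desired result can already be obtained with a plain list comprehension once the list has been retrieved. The helper name `pluck` below is hypothetical and not a benedict method; this is just a sketch of the behaviour being asked for.

```python
maindict = {
    "components": [
        {"name": "comp1", "value": "value1"},
        {"name": "comp2", "value": "value2"},
    ]
}


def pluck(items, key, default=None):
    """Return the value stored under `key` in each dict of a list."""
    return [item.get(key, default) for item in items if isinstance(item, dict)]


names = pluck(maindict["components"], "name")
```

A keypath-based version would only need to resolve the path to the list first and then apply the same comprehension.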

Opening json file

I'm presuming that this is just a usage issue rather than a bug.

I have a json formatted file in a local folder (windows 10) containing a json array of several json objects. I was testing python-benedict functionality. If I open the file like this:

with open('filename.json', 'r', encoding='utf-8') as opened_file:
    loaded_file = json.load(opened_file)

'loaded_file' is returned as a list of dicts, and I can perform 'd = benedict(loaded_file[0])' and then run benedict utilities against 'd'.

On the other hand, I was trying to save a step and do d = benedict('filename.json'). When I do that (in an ipython shell), I get "ValueError: Invalid string data input."

Suggestions?

Thanks--

--Al
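
One possible explanation, which is an assumption and not confirmed in this thread, is that the top-level JSON array is the problem, since benedict is a dict subclass and the file does not contain a single object. A workaround sketch using only the standard library is below; `load_json_objects` is a hypothetical helper, and the temporary file stands in for 'filename.json'.

```python
import json
import os
import tempfile


def load_json_objects(path):
    """Load a JSON file and always return a list of dicts."""
    with open(path, "r", encoding="utf-8") as opened_file:
        loaded = json.load(opened_file)
    # A top-level array is kept; any other top-level value becomes a 1-item list.
    items = loaded if isinstance(loaded, list) else [loaded]
    return [item for item in items if isinstance(item, dict)]


# Demonstration with a temporary file standing in for 'filename.json'.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "filename.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump([{"a": 1}, {"b": 2}], handle)
    objects = load_json_objects(path)
```

Each element of the returned list could then be wrapped individually, as in the `benedict(loaded_file[0])` call described above.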

List embedded on dictionary

Hi Fabio,
this is a good package and useful for nested dictionaries. My suggestion is to add methods for fetching data from dictionaries embedded in lists, returning either a list or an individual value.

For example:
items.key1 returns [val1, val1, ..., val1], one from each key1.
items.key1[x] returns val1 from item x.
items.key1[x].key3[x].key71[x] returns [val1, val3, val71] from item x.

I hope you find this suggestion an improvement to the package.

Best regards

dict structure.txt