ianlini / flatten-dict

A flexible utility for flattening and unflattening dict-like objects in Python.

License: MIT License

Python 99.78% Shell 0.22%

flatten-dict's Issues

Feature Request: Switch "don't flatten embedded lists"

As it is usually unpredictable how many items a list will contain, it is rarely useful to flatten dictionaries that contain lists. A switch could disable flattening of lists. One option would be to assign the (unflattened) list as a JSON string to the list key.
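A minimal sketch of the JSON option, written as a standalone helper rather than as a flatten-dict feature: nested dicts are flattened into tuple keys, while any list is stored whole as a JSON string under its key. The name `flatten_lists_as_json` is hypothetical.

```python
import json


def flatten_lists_as_json(d, parent=()):
    """Flatten nested dicts into tuple keys; lists become JSON strings (hypothetical helper)."""
    flat = {}
    for key, value in d.items():
        flat_key = parent + (key,)
        if isinstance(value, dict) and value:
            # recurse into non-empty dicts only
            flat.update(flatten_lists_as_json(value, flat_key))
        elif isinstance(value, list):
            # keep the list unflattened, serialized as a JSON string
            flat[flat_key] = json.dumps(value)
        else:
            flat[flat_key] = value
    return flat


print(flatten_lists_as_json({"a": {"tags": ["x", "y"], "n": 1}}))
# {('a', 'tags'): '["x", "y"]', ('a', 'n'): 1}
```

Storing the list as JSON keeps the number of flat keys independent of the list length, at the cost of needing `json.loads` to read the items back.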

Avoid pkg_resources import in __init__

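Importing pkg_resources in complex environments can take up to 0.5 seconds, which we are currently experiencing in DVC (iterative/dvc#6349 (comment)). Since pkg_resources is only used to retrieve the version, importlib.metadata from the standard library, which is much faster to import, can be used on Python >= 3.8, falling back to pkg_resources on Python <= 3.7. This simple fix improves flatten_dict's import time considerably: in the -X importtime output below (times in microseconds), the cumulative import time of flatten_dict drops from 306755 µs (about 0.31 s) on master to 30832 µs (about 0.03 s) on my branch.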

master:

 $ python -Ximporttime -c 'import flatten_dict' 2>&1 | tail 
import time:        76 |      13018 |       pkg_resources.extern.pyparsing
import time:      1084 |       1084 |       pkg_resources.extern.packaging.markers
import time:      6211 |      20312 |     pkg_resources.extern.packaging.requirements
import time:       340 |        340 |     sysconfig
import time:    242836 |     305008 |   pkg_resources
import time:       688 |        688 |     six
import time:       134 |        134 |     flatten_dict.reducers
import time:       120 |        120 |     flatten_dict.splitters
import time:       264 |       1205 |   flatten_dict.flatten_dict
import time:       544 |     306755 | flatten_dict

my branch (on Python 3.8, in a fairly complex environment with hundreds of packages):

 $ python -Ximporttime -c 'import flatten_dict' 2>&1 | tail 
import time:       192 |       1344 |           email._parseaddr
import time:       558 |       8339 |         email.utils
import time:       313 |      11194 |       email._policybase
import time:       522 |      12269 |     email.feedparser
import time:       226 |      12495 |   email.parser
import time:       157 |        157 |     uu
import time:       260 |        260 |     email._encoded_words
import time:       119 |        119 |     email.iterators
import time:       391 |        926 |   email.message
import time:      2075 |      30832 | flatten_dict

Path reducer fails when enumerate_types contains 'list'

flatten fails with the path reducer when the input data contains a list and list enumeration is enabled.

Simple code to reproduce:

from flatten_dict import flatten

data = {
	"fruits" : ['apple','mango','kiwi']
}

flatten(data,reducer="path",enumerate_types=(list,))

Produces the following output:

Traceback (most recent call last):
  File "repr.py", line 7, in <module>
    flatten(data,reducer="path",enumerate_types=(list,))
  File "/home/marcsello/mmvmm2/venv/lib/python3.7/site-packages/flatten_dict/flatten_dict.py", line 88, in flatten
    _flatten(d)
  File "/home/marcsello/mmvmm2/venv/lib/python3.7/site-packages/flatten_dict/flatten_dict.py", line 75, in _flatten
    _flatten(value, flat_key)
  File "/home/marcsello/mmvmm2/venv/lib/python3.7/site-packages/flatten_dict/flatten_dict.py", line 71, in _flatten
    flat_key = reducer(parent, key)
  File "/home/marcsello/mmvmm2/venv/lib/python3.7/site-packages/flatten_dict/reducer.py", line 13, in path_reducer
    return os.path.join(k1, k2)
  File "/usr/lib/python3.7/posixpath.py", line 94, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib/python3.7/genericpath.py", line 149, in _check_arg_types
    (funcname, s.__class__.__name__)) from None
TypeError: join() argument must be str or bytes, not 'int'

From what I have investigated, this seems to be caused by the path reducer relying on Python's os.path.join, which raises a TypeError when passed anything other than str or bytes. List enumeration passes an integer index as the key, so the join fails.
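A minimal sketch of a workaround, assuming flatten accepts a callable reducer with the (k1, k2) signature used elsewhere in these issues and passes k1=None for top-level keys: convert both parts to str before joining. `str_path_reducer` is a hypothetical name, and it swaps os.path.join for posixpath.join so the separator is '/' on every platform.

```python
import posixpath


def str_path_reducer(k1, k2):
    """Path-style reducer that accepts integer keys such as list indices (hypothetical).

    flatten calls the reducer with k1=None for top-level keys.
    """
    if k1 is None:
        return str(k2)
    # posixpath always joins with '/', regardless of the operating system
    return posixpath.join(str(k1), str(k2))
```

Passing it as `flatten(data, reducer=str_path_reducer, enumerate_types=(list,))` should then produce keys such as 'fruits/0' instead of raising.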

Indexing support when dict values are lists

flatten returns the same dictionary if the dictionary values contain lists.

from flatten_dict import flatten

nested = {'b': [{'c': {'d': [1, 2, 3]}}], 'e': [{'f': 3}, {'g': 6}]}

def underscore_reducer(k1, k2):
    if k1 is None:
        print("1", k1, k2)
        return k2
    else:
        print(k1, k2)
        return k1 + "_" + k2

print(flatten(nested, reducer=underscore_reducer))

Output:

{'b': [{'c': {'d': [1, 2, 3]}}], 'e': [{'f': 3}, {'g': 6}]}

Shouldn't the output be:
{'b_0.c.d': [1, 2, 3], 'b_1.e_0.f': 3, 'b_1.e_1.g': 6}
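For comparison, a standalone sketch (not the flatten-dict API) of list indexing: lists are enumerated so each element's index becomes a key part, and all parts are joined with '_'. `flatten_with_indices` is a hypothetical name. Note that it also enumerates the scalar list [1, 2, 3], which differs from the output requested above; keeping such lists intact would need an extra rule.

```python
def flatten_with_indices(d, sep="_", parent=None):
    """Flatten dicts and lists, using list indices as key parts (hypothetical helper)."""
    items = {}
    # dicts contribute their keys; lists/tuples contribute their indices
    pairs = d.items() if isinstance(d, dict) else enumerate(d)
    for key, value in pairs:
        flat_key = str(key) if parent is None else f"{parent}{sep}{key}"
        if isinstance(value, (dict, list, tuple)) and value:
            items.update(flatten_with_indices(value, sep, flat_key))
        else:
            # scalars and empty containers are stored as leaf values
            items[flat_key] = value
    return items


nested = {'b': [{'c': {'d': [1, 2, 3]}}], 'e': [{'f': 3}, {'g': 6}]}
print(flatten_with_indices(nested))
# {'b_0_c_d_0': 1, 'b_0_c_d_1': 2, 'b_0_c_d_2': 3, 'e_0_f': 3, 'e_1_g': 6}
```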

Unexpected output when a dict key contains '/' with the reducer='path' option
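The issue has no description, but one likely cause (an assumption) is that the path reducer joins key parts with '/', so a key that itself contains '/' becomes indistinguishable from a nested path, and splitting the flat key on '/' cannot restore the original. A minimal standalone illustration; `path_flatten` is a hypothetical helper, not flatten-dict's implementation.

```python
import posixpath


def path_flatten(d, parent=None):
    """Minimal path-style flatten joining key parts with '/' (hypothetical helper)."""
    flat = {}
    for key, value in d.items():
        flat_key = key if parent is None else posixpath.join(parent, key)
        if isinstance(value, dict) and value:
            flat.update(path_flatten(value, flat_key))
        else:
            flat[flat_key] = value
    return flat


# A key containing '/' yields the same flat key as a genuinely nested path...
print(path_flatten({"a/b": {"c": 1}}))  # {'a/b/c': 1}
print(path_flatten({"a": {"b": {"c": 1}}}))  # {'a/b/c': 1}
# ...and splitting that flat key on '/' can only give one of the two structures back.
print(tuple("a/b/c".split("/")))  # ('a', 'b', 'c')
```

If the reducer uses os.path.join, as the traceback in the path-reducer issue suggests, there is a second surprise: join discards everything before a component that starts with '/', so a key such as '/c' replaces its parent path entirely.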

Feature Proposition: Underscores as native separators

Hey there!

Could I make a pull request adding native support for underscores as a separator?

Something like:

flatten(normal_dict, reducer="_")

which would directly output the flattened dict with underscores as separators.
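One possible design, sketched under the assumption that flatten accepts a callable reducer with the (k1, k2) signature and passes k1=None for top-level keys: build the reducer from the separator string. `make_separator_reducer` is a hypothetical name.

```python
def make_separator_reducer(sep):
    """Build a (k1, k2) reducer that joins key parts with sep (hypothetical helper)."""

    def reducer(k1, k2):
        # flatten passes k1=None for top-level keys, so there is nothing to join yet
        if k1 is None:
            return k2
        return k1 + sep + k2

    return reducer


underscore_reducer = make_separator_reducer("_")
print(underscore_reducer(None, "a"))  # a
print(underscore_reducer("a", "b"))  # a_b
```

Since a string reducer argument already names built-in reducers (e.g. reducer="path"), treating arbitrary strings such as "_" as separators would need a rule to tell the two cases apart; passing the built reducer as a callable avoids that ambiguity.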

What do you think?

Cheers and thanks for a great little utility 😄

Unflatten?

Hi,

Great lib, simple and powerful. I was thinking that it would be great to be able to unflatten dictionaries from flattened ones.
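A minimal sketch of the inverse operation for tuple keys (the shape the default tuple reducer produces); `unflatten_tuple_keys` is a hypothetical name, and it handles neither list indices nor duplicate keys.

```python
def unflatten_tuple_keys(flat):
    """Rebuild a nested dict from {(k1, k2, ...): value} (hypothetical helper)."""
    nested = {}
    for key_tuple, value in flat.items():
        node = nested
        # walk or create the intermediate dicts for every key part except the last
        for part in key_tuple[:-1]:
            node = node.setdefault(part, {})
        node[key_tuple[-1]] = value
    return nested


print(unflatten_tuple_keys({("a", "b"): 1, ("a", "c"): 2, ("d",): 3}))
# {'a': {'b': 1, 'c': 2}, 'd': 3}
```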

Cheers!

question

Hi,
I have this:

paths = {
    'a': {},
    'a/a': {},
    'b/a': {},
    'b/a/a': {},
    'b/c': {},
    'b/d': {},
}

which I converted to this with unflatten():

{
    "a": {
        "a": {}
    },
    "b": {
        "a": {
            "a": {}
        },
        "c": {},
        "d": {}
    }
}

but my application needs this format:

[
    {
        "text": "a",
        "children": [
            {
                "text": "a",
                "leaf": "true"
            }
        ]
    },
    {
        "text": "b",
        "children": [
            {
                "text": "a",
                "children": [
                    {
                        "text": "a",
                        "leaf": "true"
                    }
                ]
            },
            {
                "text": "c",
                "leaf": "true"
            },
            {
                "text": "d",
                "leaf": "true"
            }
        ]
    }
]

Do you know how I could obtain that?
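One way, sketched under the assumption that an empty dict always marks a leaf: walk the nested dict returned by unflatten() recursively and emit one node per key. `to_tree` is a hypothetical name.

```python
def to_tree(nested):
    """Convert {name: {children...}} into [{"text": name, ...}] nodes (hypothetical helper)."""
    nodes = []
    for name, children in nested.items():
        if children:
            # non-empty dict: recurse to build this node's children
            nodes.append({"text": name, "children": to_tree(children)})
        else:
            # an empty dict marks a leaf; the target format uses the string "true"
            nodes.append({"text": name, "leaf": "true"})
    return nodes


unflattened = {"a": {"a": {}}, "b": {"a": {"a": {}}, "c": {}, "d": {}}}
tree = to_tree(unflattened)
```

`json.dumps(tree, indent=4)` then gives the JSON layout shown above, with "b" having children "a", "c" and "d". Key order follows dict insertion order (Python 3.7+).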

Accept callable for enumerate_types

Discussed in #53
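The discussion in #53 is not quoted here, so this is only one possible reading of the request: let enumerate_types be either types (as today) or a predicate deciding per value whether to enumerate. `make_enumerate_check` is a hypothetical helper that normalizes both forms into a predicate.

```python
def make_enumerate_check(enumerate_types):
    """Normalize enumerate_types into a predicate (hypothetical helper).

    Accepts a type, a tuple of types, or any callable that takes a value and
    returns True if that value should be enumerated.
    """
    # Classes are callable too, so types must be checked before callables.
    if isinstance(enumerate_types, (type, tuple)):
        return lambda value: isinstance(value, enumerate_types)
    if callable(enumerate_types):
        return enumerate_types
    raise TypeError("enumerate_types must be a type, a tuple of types, or a callable")


# e.g. only enumerate short sequences
short_seq = make_enumerate_check(lambda v: isinstance(v, (list, tuple)) and len(v) <= 2)
```

The ordering of the checks is the main design point: testing `callable()` first would misclassify a bare type such as `list` as a predicate.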

unflatten with lists

Flattening a nested dict that contains lists works great, but unflatten creates dicts instead of lists when a key is a list index. I rewrote part of your lib to unflatten for my needs and thought you might want to integrate it into your unflatten.

I'm worried that my changes aren't generic enough to work for all kinds of mixed lists and dicts.

Here is how I did the unflattening. The only function I changed is this one:

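def nested_set_dict(d, keys, value):
    """Set a value to a sequence of nested keys

    Integer keys are treated as list indices, so lists are created for them
    instead of dicts.

    Parameters
    ----------
    d : MutableMapping or list
    keys : Sequence[str or int]
    value : Any
    """
    assert keys
    key = keys[0]
    if len(keys) == 1:
        if isinstance(d, list):
            # assumes indices arrive in ascending order (the flat items are sorted)
            d.append(value)
        else:
            d[key] = value
        return

    # the next level is a list if the next key is an index, otherwise a dict
    new_child = [] if isinstance(keys[1], int) else {}
    if isinstance(d, list):
        # key is a list index: append a new child the first time it is reached
        if key >= len(d):
            d.append(new_child)
        d = d[key]
    else:
        d = d.setdefault(key, new_child)
    nested_set_dict(d, keys[1:], value)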

Testing it out:

d1 = {'a':{'b':[{'c1':'nested1!','d1':[{'e1':'so_nested1!!!'}]},
               {'c2':'nested2!','d2':[{'e2':'so_nested2!!!'}]},
               {'c3':'nested3!','d3':[{'e3':'so_nested3!!!'}]},
               {'c4':'nested4!','d4':[{'e4':'so_nested4a!!!'},
                                      {'e4':'so_nested4b!!!'},
                                      {'e4':'so_nested4c!!!'},
                                      {'e4':'so_nested4d!!!'},
                                      {'e4':'so_nested4e!!!'}]}]}}

Flatten works great for this out of the box

from flatten_dict import flatten

df = flatten(d1, enumerate_types=(list,))
kv = sorted(df.items())

(('a', 'b', 0, 'c1'), 'nested1!')
(('a', 'b', 0, 'd1', 0, 'e1'), 'so_nested1!!!')
(('a', 'b', 1, 'c2'), 'nested2!')
(('a', 'b', 1, 'd2', 0, 'e2'), 'so_nested2!!!')
(('a', 'b', 2, 'c3'), 'nested3!')
(('a', 'b', 2, 'd3', 0, 'e3'), 'so_nested3!!!')
(('a', 'b', 3, 'c4'), 'nested4!')
(('a', 'b', 3, 'd4', 0, 'e4'), 'so_nested4a!!!')
(('a', 'b', 3, 'd4', 1, 'e4'), 'so_nested4b!!!')
(('a', 'b', 3, 'd4', 2, 'e4'), 'so_nested4c!!!')
(('a', 'b', 3, 'd4', 3, 'e4'), 'so_nested4d!!!')
(('a', 'b', 3, 'd4', 4, 'e4'), 'so_nested4e!!!')

d2 = {}
for k, v in kv:
    nested_set_dict(d2, k, v)

This gives a d2 identical to d1:

d1 =

{'a': {'b': [{'c1': 'nested1!', 'd1': [{'e1': 'so_nested1!!!'}]}, {'c2': 'nested2!', 'd2': [{'e2': 'so_nested2!!!'}]}, {'c3': 'nested3!', 'd3': [{'e3': 'so_nested3!!!'}]}, {'d4': [{'e4': 'so_nested4a!!!'}, {'e4': 'so_nested4b!!!'}, {'e4': 'so_nested4c!!!'}, {'e4': 'so_nested4d!!!'}, {'e4': 'so_nested4e!!!'}], 'c4': 'nested4!'}]}}

d2 =

{'a': {'b': [{'c1': 'nested1!', 'd1': [{'e1': 'so_nested1!!!'}]}, {'c2': 'nested2!', 'd2': [{'e2': 'so_nested2!!!'}]}, {'c3': 'nested3!', 'd3': [{'e3': 'so_nested3!!!'}]}, {'d4': [{'e4': 'so_nested4a!!!'}, {'e4': 'so_nested4b!!!'}, {'e4': 'so_nested4c!!!'}, {'e4': 'so_nested4d!!!'}, {'e4': 'so_nested4e!!!'}], 'c4': 'nested4!'}]}}

import numpy as np
from flatten_dict import flatten
d = {'a': np.array([0, 1, 2]), 'b': 2}
dflat = flatten(d, enumerate_types=(np.ndarray,))

flatten_dict.py - Line 76
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
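The traceback suggests (an assumption, since the code at line 76 is not quoted) that flatten truth-tests a value somewhere, e.g. `if value:`, which NumPy rejects for arrays with more than one element. The usual workaround is to test emptiness with len() instead. To keep the sketch runnable without NumPy, `AmbiguousArray` is a hypothetical stand-in that raises the same way; `has_items` is also hypothetical.

```python
class AmbiguousArray(list):
    """Stand-in for a NumPy array: truth-testing it raises, as ndarray does for size > 1."""

    def __bool__(self):
        raise ValueError("The truth value of an array with more than one element is ambiguous.")


def has_items(value):
    """Emptiness check that never calls bool() on the container (hypothetical helper)."""
    return len(value) > 0


arr = AmbiguousArray([0, 1, 2])
print(has_items(arr))  # True
print(dict(enumerate(arr)))  # {0: 0, 1: 1, 2: 2}
# `if arr:` would raise ValueError here, just like `if np.array([0, 1, 2]):`
```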

max_flatten_depth

I'm not able to use max_flatten_depth; the following error occurs:

Traceback (most recent call last):
  File "test.py", line 39, in <module>
    pprint(flatten(normal_dict, reducer='path', max_flatten_depth=2))
TypeError: flatten() got an unexpected keyword argument 'max_flatten_depth'

Tested on Python 2.7 and 3.8, with the same behaviour.
Installed with:

pip install flatten-dict
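One plausible explanation (not confirmed here) is that the installed release predates max_flatten_depth. A quick way to check on Python 3, assuming flatten is a plain Python function whose parameters inspect.signature can read. `supports_kwarg` and `old_flatten` are hypothetical names; `old_flatten` only stands in for a flatten without the parameter.

```python
import inspect


def supports_kwarg(func, name):
    """Return True if func declares a parameter called name (hypothetical helper)."""
    return name in inspect.signature(func).parameters


def old_flatten(d, reducer="tuple"):
    """Hypothetical stand-in for a flatten that lacks max_flatten_depth."""
    return d


print(supports_kwarg(old_flatten, "max_flatten_depth"))  # False
```

Against the real package this would be `supports_kwarg(flatten, "max_flatten_depth")` after `from flatten_dict import flatten`; a False result points at an outdated installed version.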

Error message for unflatten with duplicated key is not clear

Original error:

In [4]: unflatten({1: ('a', 'b'), 2: ('a', 'b')}, inverse=True)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-9f138cfe18d2> in <module>
----> 1 unflatten({1: ('a', 'b'), 2: ('a', 'b')}, inverse=True)

~/projects/flatten-dict/flatten_dict/flatten_dict.py in unflatten(d, splitter, inverse)
    122             flat_key, value = value, flat_key
    123         key_tuple = splitter(flat_key)
--> 124         nested_set_dict(unflattened_dict, key_tuple, value)
    125 
    126     return unflattened_dict

~/projects/flatten-dict/flatten_dict/flatten_dict.py in nested_set_dict(d, keys, value)
     91         return
     92     d = d.setdefault(key, {})
---> 93     nested_set_dict(d, keys[1:], value)
     94 
     95 

~/projects/flatten-dict/flatten_dict/flatten_dict.py in nested_set_dict(d, keys, value)
     87     if len(keys) == 1:
     88         if key in d:
---> 89             raise ValueError("duplicated key '{}'".format(key))
     90         d[key] = value
     91         return

ValueError: duplicated key 'b'

Expected error:

ValueError: duplicated key ('a', 'b')
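A sketch of one possible rewrite of nested_set_dict that produces the expected message: walking the keys iteratively keeps the complete key tuple in scope when the duplicate is detected, whereas the recursive version only sees the remaining suffix. Conflicts where an intermediate key already holds a non-dict value are not handled.

```python
def nested_set_dict(d, keys, value):
    """Set value at a sequence of nested keys, reporting the full key on duplicates (sketch)."""
    assert keys
    *parents, last = keys
    # walk or create the intermediate dicts
    for key in parents:
        d = d.setdefault(key, {})
    if last in d:
        raise ValueError("duplicated key {!r}".format(tuple(keys)))
    d[last] = value


result = {}
nested_set_dict(result, ("a", "b"), 1)
try:
    nested_set_dict(result, ("a", "b"), 2)
except ValueError as exc:
    print(exc)  # duplicated key ('a', 'b')
```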

The output drops empty dictionaries.

In flatten_dict.py, line 57, change:

            if isinstance(value, flattenable_types):

to:

            if isinstance(value, flattenable_types) and value != {}:
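A standalone sketch of what the proposed condition changes (`flatten_keep_empty` is hypothetical and uses tuple keys): because an empty dict is no longer recursed into, it is stored as a value instead of vanishing from the output.

```python
def flatten_keep_empty(d, parent=()):
    """Flatten nested dicts into tuple keys, keeping empty dicts as values (hypothetical)."""
    flat = {}
    for key, value in d.items():
        flat_key = parent + (key,)
        # proposed condition: only recurse into non-empty dicts
        if isinstance(value, dict) and value != {}:
            flat.update(flatten_keep_empty(value, flat_key))
        else:
            flat[flat_key] = value
    return flat


print(flatten_keep_empty({"a": {}, "b": {"c": 1, "d": {}}}))
# {('a',): {}, ('b', 'c'): 1, ('b', 'd'): {}}
```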