flatten-dict's People

Contributors

duarteocarmo, gllrt, hsorsky, ianlini, isidentical, nicojahn, omerfarukdogan, pvtmert, romnn, zemelleong

flatten-dict's Issues

Feature Request: Switch "don't flatten embedded lists"

Since it is usually unpredictable how many items a list will contain, flattening dictionaries that contain lists is rarely useful. A switch could disable the flattening of lists. One option would be to assign the (unflattened) list, serialized as a JSON string, to the list's key.
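
One way to approximate the requested switch without changing the library is to pre-process the input so that every list is already a JSON string before flattening. The sketch below is only an illustration: `lists_to_json` is a hypothetical helper name, not part of flatten-dict, and it assumes the list contents are JSON-serializable.

```python
import json


def lists_to_json(d):
    """Return a copy of d in which every list value is replaced by its JSON string."""
    result = {}
    for key, value in d.items():
        if isinstance(value, dict):
            # Recurse into nested dicts so that deeper lists are converted too.
            result[key] = lists_to_json(value)
        elif isinstance(value, list):
            # Store the whole (unflattened) list under its key as a JSON string.
            result[key] = json.dumps(value)
        else:
            result[key] = value
    return result


converted = lists_to_json({"a": {"b": [1, 2]}, "c": 3})
print(converted)  # {'a': {'b': '[1, 2]'}, 'c': 3}
```

The converted dictionary can then be passed to flatten as usual, since it no longer contains any list values.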

Avoid pkg_resources import in __init__

Importing pkg_resources in complex environments can take up to 0.5 seconds, which we are currently experiencing in DVC (iterative/dvc#6349 (comment)). Since it is only used to retrieve the version, on Python 3.8+ importlib.metadata from the standard library can be used instead, which is much faster to import. On Python 3.7 and earlier we can always fall back to pkg_resources. This simple fix improves flatten_dict import time considerably.

master:

 $ python -Ximporttime -c 'import flatten_dict' 2>&1 | tail 
import time:        76 |      13018 |       pkg_resources.extern.pyparsing
import time:      1084 |       1084 |       pkg_resources.extern.packaging.markers
import time:      6211 |      20312 |     pkg_resources.extern.packaging.requirements
import time:       340 |        340 |     sysconfig
import time:    242836 |     305008 |   pkg_resources
import time:       688 |        688 |     six
import time:       134 |        134 |     flatten_dict.reducers
import time:       120 |        120 |     flatten_dict.splitters
import time:       264 |       1205 |   flatten_dict.flatten_dict
import time:       544 |     306755 | flatten_dict

my branch (on 3.8, in a fairly complex environment with hundreds of packages):

 $ python -Ximporttime -c 'import flatten_dict' 2>&1 | tail 
import time:       192 |       1344 |           email._parseaddr
import time:       558 |       8339 |         email.utils
import time:       313 |      11194 |       email._policybase
import time:       522 |      12269 |     email.feedparser
import time:       226 |      12495 |   email.parser
import time:       157 |        157 |     uu
import time:       260 |        260 |     email._encoded_words
import time:       119 |        119 |     email.iterators
import time:       391 |        926 |   email.message
import time:      2075 |      30832 | flatten_dict
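
The approach described in this issue (prefer importlib.metadata on Python 3.8+, fall back to pkg_resources otherwise) could look roughly like the sketch below. `get_version` is a hypothetical helper name used for illustration; this is not necessarily how the fix was implemented in the library.

```python
import sys


def get_version(dist_name):
    """Return the installed version of dist_name, or None if it is not installed."""
    if sys.version_info >= (3, 8):
        # importlib.metadata is in the standard library from Python 3.8 on.
        from importlib.metadata import PackageNotFoundError, version

        try:
            return version(dist_name)
        except PackageNotFoundError:
            return None
    else:
        # Older interpreters: fall back to the slower pkg_resources import.
        import pkg_resources

        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None


print(get_version("a-distribution-that-is-not-installed"))  # None
```

Keeping both imports inside the function means the slow pkg_resources import is only paid on interpreters that actually need it.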

Path reducer fails when enumerate_types contains 'list'

The flatten function fails with the path reducer when the input data contains a list and list enumeration is enabled.

Simple code to reproduce:

from flatten_dict import flatten

data = {
	"fruits" : ['apple','mango','kiwi']
}

flatten(data,reducer="path",enumerate_types=(list,))

Produces the following output:

Traceback (most recent call last):
  File "repr.py", line 7, in <module>
    flatten(data,reducer="path",enumerate_types=(list,))
  File "/home/marcsello/mmvmm2/venv/lib/python3.7/site-packages/flatten_dict/flatten_dict.py", line 88, in flatten
    _flatten(d)
  File "/home/marcsello/mmvmm2/venv/lib/python3.7/site-packages/flatten_dict/flatten_dict.py", line 75, in _flatten
    _flatten(value, flat_key)
  File "/home/marcsello/mmvmm2/venv/lib/python3.7/site-packages/flatten_dict/flatten_dict.py", line 71, in _flatten
    flat_key = reducer(parent, key)
  File "/home/marcsello/mmvmm2/venv/lib/python3.7/site-packages/flatten_dict/reducer.py", line 13, in path_reducer
    return os.path.join(k1, k2)
  File "/usr/lib/python3.7/posixpath.py", line 94, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib/python3.7/genericpath.py", line 149, in _check_arg_types
    (funcname, s.__class__.__name__)) from None
TypeError: join() argument must be str or bytes, not 'int'

From what I have found, this is caused by the path reducer relying on Python's os.path.join, which fails when a number is passed to it. The enumeration passes an integer list index to join, which causes the failure.
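
Until the built-in path reducer handles integers, a possible workaround is a custom reducer that converts both keys to strings before joining them. This is a minimal sketch: `str_path_reducer` is a hypothetical name, and it assumes the reducer is called as `reducer(parent, key)` with `parent` set to None at the top level, as the custom reducer in the "Indexing support" issue on this page suggests.

```python
import os.path


def str_path_reducer(k1, k2):
    """Join two keys as a path, converting non-string keys (e.g. list indices) to str."""
    if k1 is None:
        # Top-level key: there is no parent to join with.
        return str(k2)
    return os.path.join(str(k1), str(k2))


print(str_path_reducer(None, "fruits"))
print(str_path_reducer("fruits", 0))  # "fruits/0" on POSIX
```

It would be passed as `reducer=str_path_reducer` in place of `reducer="path"`.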

Indexing support when dict values are lists

Flatten returns the same dictionary if the dictionary's values contain lists.

from flatten_dict import flatten

nested = {'b': [{'c':{'d':[1,2,3]}}], 'e' : [{'f':3}, {'g':6}]}

def underscore_reducer(k1, k2):
    if k1 is None:
        print("1", k1, k2)
        return k2
    else:
        print(k1, k2)
        return k1 + "_" + k2

print(flatten(nested, reducer=underscore_reducer))

Output:

{'b': [{'c': {'d': [1, 2, 3]}}], 'e': [{'f': 3}, {'g': 6}]}

Shouldn't the output be:

{'b_0.c.d': [1, 2, 3], 'b_1.e_0.f': 3, 'b_1.e_1.g': 6}

Feature Proposition: Underscores as native separators

Hey there!

Can I make a pull request for native inclusion of the underscores as a separator?

Something like:

flatten(normal_dict, reducer="_")

which would output the expanded dict with underscores as separators directly.
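
In the meantime, the same effect can be had with a callable reducer. The sketch below builds one for any delimiter; `delimiter_reducer` is a hypothetical name chosen for illustration, and it assumes the `reducer(k1, k2)` calling convention (with `k1` set to None for top-level keys) used by the custom reducers elsewhere on this page.

```python
def delimiter_reducer(delimiter):
    """Build a reducer that joins parent and child keys with the given delimiter."""

    def reducer(k1, k2):
        if k1 is None:
            # Top-level key: nothing to prepend.
            return k2
        return "{}{}{}".format(k1, delimiter, k2)

    return reducer


underscore = delimiter_reducer("_")
print(underscore(None, "a"))  # a
print(underscore("a", "b"))   # a_b
```

The result of `delimiter_reducer("_")` would be passed as the `reducer` argument of flatten.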

What do you think?

Cheers and thanks for a great little utility 😄

Unflatten ?

Hi,

Great lib, simple and powerful. I was thinking that it would be great to be able to unflatten dictionaries from flattened ones.

Cheers!

question

hi
I have this

paths = {
    'a': {},
    'a/a': {},
    'b/a': {},
    'b/a/a': {},
    'b/c': {},
    'b/d': {},
}

that I converted to this with unflatten()

{
    "a": {
        "a": {}
    },
    "b": {
        "a": {
            "a": {}
        },
        "c": {},
        "d": {}
    }
}

but my application needs this format:

[
    {
        "text": "a",
        "children": [
            {
                "text": "a",
                "leaf": "true"
            }
        ]
    },
    {
        "text": "b",
        "children": [
            {
                "text": "a",
                "children": [
                    {
                        "text": "a",
                        "leaf": "true"
                    }
                ]
            },
            {
                "text": "b",
                "leaf": "true"
            },
            {
                "text": "c",
                "leaf": "true"
            }
        ]
    }
]

Do you know how I could obtain that?
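
One possible approach is a small recursive function applied to the unflattened dictionary: every key becomes a node, non-empty values become "children", and empty values become leaves. This is only a sketch, and `to_tree` is a hypothetical name, not part of flatten-dict.

```python
def to_tree(nested):
    """Convert a nested dict of dicts into a list of text/children/leaf nodes."""
    nodes = []
    for name, children in nested.items():
        node = {"text": name}
        if children:
            # Non-empty dict: recurse to build the child nodes.
            node["children"] = to_tree(children)
        else:
            # Empty dict: mark the node as a leaf.
            node["leaf"] = "true"
        nodes.append(node)
    return nodes


unflattened = {"a": {"a": {}}, "b": {"a": {"a": {}}, "c": {}, "d": {}}}
tree = to_tree(unflattened)
print(tree[0])  # {'text': 'a', 'children': [{'text': 'a', 'leaf': 'true'}]}
```

The node names follow the keys of the input, so for the dictionary above the leaves under "b" are "c" and "d".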

unflatten with lists

Flattening a nested dict that contains lists works great, but unflatten creates dicts instead of lists when a key is a list index. I rewrote part of your lib to unflatten for my needs and thought you might want to integrate it into your unflatten.

I'm worried that my changes aren't generic enough to work for all kinds of mixed lists and dicts.

Here is how I did the unflattening. The only function I changed is this one:

def nested_set_dict(d, keys, value):
    """Set a value to a sequence of nested keys

    Parameters
    ----------
    d : Mapping
    keys : Sequence[str]
    value : Any
    """
    assert keys
    key = keys[0]
    if len(keys) == 1:
        if isinstance(d, list):
            d.append(value)
        else:
            d[key] = value
        return

    if isinstance(keys[1], int):
        # the next key is a list index, so make a list if none exists
        if key not in d:
            d[key] = []
        d = d[key]
    elif isinstance(key, int):
        # the current key is a list index, so append a dict if that slot is missing
        if key + 1 > len(d):
            d.append({})
        d = d[key]
    else:
        d = d.setdefault(key, {})
    nested_set_dict(d, keys[1:], value)

Testing it out:

d1 = {'a':{'b':[{'c1':'nested1!','d1':[{'e1':'so_nested1!!!'}]},
               {'c2':'nested2!','d2':[{'e2':'so_nested2!!!'}]},
               {'c3':'nested3!','d3':[{'e3':'so_nested3!!!'}]},
               {'c4':'nested4!','d4':[{'e4':'so_nested4a!!!'},
                                      {'e4':'so_nested4b!!!'},
                                      {'e4':'so_nested4c!!!'},
                                      {'e4':'so_nested4d!!!'},
                                      {'e4':'so_nested4e!!!'}]}]}}    

Flatten works great for this out of the box

df = mzm.flatten(d1,enumerate_types=(list,))
kv = sorted([(k,v) for (k,v) in df.items()])

(('a', 'b', 0, 'c1'), 'nested1!')
(('a', 'b', 0, 'd1', 0, 'e1'), 'so_nested1!!!')
(('a', 'b', 1, 'c2'), 'nested2!')
(('a', 'b', 1, 'd2', 0, 'e2'), 'so_nested2!!!')
(('a', 'b', 2, 'c3'), 'nested3!')
(('a', 'b', 2, 'd3', 0, 'e3'), 'so_nested3!!!')
(('a', 'b', 3, 'c4'), 'nested4!')
(('a', 'b', 3, 'd4', 0, 'e4'), 'so_nested4a!!!')
(('a', 'b', 3, 'd4', 1, 'e4'), 'so_nested4b!!!')
(('a', 'b', 3, 'd4', 2, 'e4'), 'so_nested4c!!!')
(('a', 'b', 3, 'd4', 3, 'e4'), 'so_nested4d!!!')
(('a', 'b', 3, 'd4', 4, 'e4'), 'so_nested4e!!!')

d2 = {}
for key_value in kv:
    k = key_value[0]
    v = key_value[1]
    nested_set_dict(d2,k,v)

Gives

d1 =

{'a': {'b': [{'c1': 'nested1!', 'd1': [{'e1': 'so_nested1!!!'}]}, {'c2': 'nested2!', 'd2': [{'e2': 'so_nested2!!!'}]}, {'c3': 'nested3!', 'd3': [{'e3': 'so_nested3!!!'}]}, {'d4': [{'e4': 'so_nested4a!!!'}, {'e4': 'so_nested4b!!!'}, {'e4': 'so_nested4c!!!'}, {'e4': 'so_nested4d!!!'}, {'e4': 'so_nested4e!!!'}], 'c4': 'nested4!'}]}}

d2 =

{'a': {'b': [{'c1': 'nested1!', 'd1': [{'e1': 'so_nested1!!!'}]}, {'c2': 'nested2!', 'd2': [{'e2': 'so_nested2!!!'}]}, {'c3': 'nested3!', 'd3': [{'e3': 'so_nested3!!!'}]}, {'d4': [{'e4': 'so_nested4a!!!'}, {'e4': 'so_nested4b!!!'}, {'e4': 'so_nested4c!!!'}, {'e4': 'so_nested4d!!!'}, {'e4': 'so_nested4e!!!'}], 'c4': 'nested4!'}]}}

Patch release with updated pyproject.toml?

First of all, thanks for the handy library! I've been looking for a more lightweight alternative to pandas.json_normalize for a while, and this looks like just what I needed.

Would it be possible to push out a patch release (0.3.1) that includes your changes to pyproject.toml from #32? It would be nice to have this on PyPI, because the current release installs pathlib2 as a dependency even on python3.4+.

Inverse function of flatten

Given the reducer is 1-to-1, the flatten function should be 1-to-1.
Therefore it is possible to provide a function to invert the flattened dictionary.
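
For tuple keys, the inverse can be sketched in a few lines. `unflatten_tuples` is a hypothetical name used for illustration; the sketch assumes every flat key is a tuple of nested keys and that no key is a prefix of another.

```python
def unflatten_tuples(flat):
    """Rebuild a nested dict from a dict whose keys are tuples of nested keys."""
    nested = {}
    for key_tuple, value in flat.items():
        d = nested
        # Walk (and create) the intermediate dicts for all but the last key.
        for key in key_tuple[:-1]:
            d = d.setdefault(key, {})
        d[key_tuple[-1]] = value
    return nested


flat = {("a", "b"): 1, ("a", "c"): 2, ("d",): 3}
print(unflatten_tuples(flat))  # {'a': {'b': 1, 'c': 2}, 'd': 3}
```

For string keys produced by a reducer, the same loop works after splitting each key back into a tuple, which is only unambiguous when the reducer is 1-to-1.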

enumerate_types cannot handle numpy.ndarray

The truthiness check on value that immediately follows the isinstance check against flattenable_types raises a ValueError when the value is a numpy ndarray.

import numpy as np
from flatten_dict import flatten
d = {'a': np.array([0, 1, 2]), 'b': 2}
dflat = flatten(d, enumerate_types=(np.ndarray,))

flatten_dict.py - Line 76
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
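
The failure comes from evaluating the array itself as a boolean. A length-based emptiness check avoids that, as the sketch below illustrates. To stay free of third-party imports it uses `AmbiguousTruth`, a hypothetical stand-in class that mimics the ndarray behaviour; `is_non_empty` is likewise a hypothetical helper, not the library's code, and it assumes the value supports `len()`.

```python
def is_non_empty(value):
    """Return True if a container has at least one element, without calling bool() on it."""
    return len(value) > 0


class AmbiguousTruth(list):
    """Stand-in for an ndarray: a sequence whose truth value raises ValueError."""

    def __bool__(self):
        raise ValueError("The truth value of this container is ambiguous")


data = AmbiguousTruth([0, 1, 2])

try:
    bool(data)  # what a plain `if value:` check would do
    truthiness_failed = False
except ValueError:
    truthiness_failed = True

print(truthiness_failed)   # True
print(is_non_empty(data))  # True
```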

max_flatten_depth

I'm not able to use max_flatten_depth; this error occurs:

Traceback (most recent call last):
  File "test.py", line 39, in <module>
    pprint(flatten(normal_dict, reducer='path', max_flatten_depth=2))
TypeError: flatten() got an unexpected keyword argument 'max_flatten_depth'

Tested on python 2.7 and 3.8 - the same behaviour.
Installed with

pip install flatten-dict

Error message for unflatten with duplicated key is not clear

Original error:

In [4]: unflatten({1: ('a', 'b'), 2: ('a', 'b')}, inverse=True)                                                                                                                                                   
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-9f138cfe18d2> in <module>
----> 1 unflatten({1: ('a', 'b'), 2: ('a', 'b')}, inverse=True)

~/projects/flatten-dict/flatten_dict/flatten_dict.py in unflatten(d, splitter, inverse)
    122             flat_key, value = value, flat_key
    123         key_tuple = splitter(flat_key)
--> 124         nested_set_dict(unflattened_dict, key_tuple, value)
    125 
    126     return unflattened_dict

~/projects/flatten-dict/flatten_dict/flatten_dict.py in nested_set_dict(d, keys, value)
     91         return
     92     d = d.setdefault(key, {})
---> 93     nested_set_dict(d, keys[1:], value)
     94 
     95 

~/projects/flatten-dict/flatten_dict/flatten_dict.py in nested_set_dict(d, keys, value)
     87     if len(keys) == 1:
     88         if key in d:
---> 89             raise ValueError("duplicated key '{}'".format(key))
     90         d[key] = value
     91         return

ValueError: duplicated key 'b'

Expected error:

ValueError: duplicated key ('a', 'b')
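
One way to get the expected message is to keep the full key tuple available at the point where the error is raised, for example by walking the keys iteratively instead of recursing on `keys[1:]`. The sketch below is an assumption about how that could look, modelled on the function shown in the traceback; it is not the library's actual fix.

```python
def nested_set_dict(d, keys, value):
    """Set value at the nested position given by keys, rejecting duplicates."""
    assert keys
    target = d
    # Walk (and create) the intermediate dicts for all but the last key.
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    if keys[-1] in target:
        # Report the whole key path, not just its last component.
        raise ValueError("duplicated key {!r}".format(tuple(keys)))
    target[keys[-1]] = value


unflattened = {}
nested_set_dict(unflattened, ("a", "b"), 1)
try:
    nested_set_dict(unflattened, ("a", "b"), 2)
    message = None
except ValueError as error:
    message = str(error)

print(message)  # duplicated key ('a', 'b')
```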
