The dynamo-pandas from drgfreeman

AWS configuration parameters cannot be overwritten

While AWS configuration parameters can be set via a config file or environment variables, there may be cases where these parameters need to be overwritten.

The current put_df, get_df and transactions module functions do not provide a means to pass these parameters.

Adding a **kwargs argument to the different functions and passing it to the underlying boto3.client or boto3.resource function call would provide this functionality.

For example, the get_df function signature would become:

def get_df(*, table, keys=None, attributes=None, dtype=None, **resource_kwargs):
    ...
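
A minimal sketch of the forwarding idea follows. It is an illustration only: `fake_resource` is a hypothetical stand-in for boto3.resource so that the example runs without AWS access, and the function body is not the library's actual implementation.

```python
def fake_resource(service_name, **kwargs):
    """Hypothetical stand-in for boto3.resource; records how it was called."""
    return {"service": service_name, "config": kwargs}


def get_df(*, table, keys=None, attributes=None, dtype=None, **resource_kwargs):
    # Any extra keyword arguments are forwarded unchanged to the resource
    # factory, so they take precedence over config files and env variables.
    resource = fake_resource("dynamodb", **resource_kwargs)
    return resource  # the real function would go on to fetch items


result = get_df(
    table="players",
    region_name="us-east-1",
    endpoint_url="http://localhost:8000",
)
```

With this shape, callers that pass no extra arguments keep the current behavior, since `resource_kwargs` is then an empty dictionary.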

Make to_items and to_df functions private

The to_items and to_df functions are simple functions that do not add much value to the API.

Make these functions private and remove the to_item function.

Add parameter to select item attributes to get

Add parameter to select item attributes to get when calling the following functions:

get_df
transactions.get_all_items
transactions.get_item
transactions.get_items

The parameter would take a list of attribute names.

Example

>>> df = get_df(
...     table="players",
...     keys=[{"player_id": "player_three"}, {"player_id": "player_one"}],
...     attributes=["player_id", "play_time"],
... )
>>> print(df)
      player_id        play_time
0  player_three  1 days 14:01:19
1    player_one  2 days 17:41:55

Include testing on Windows platform in CI

Tests that pass on Ubuntu Linux fail on Windows 10 (ref. #12).

Include testing a subset of the CI test matrix on a Windows platform.

Configure tox

Use tox to run unit tests on different Python versions locally and in CI.

Configure Sphinx

Configure Sphinx docs and expand docstrings.

Bad indentation in get_df docs dtype example

The dtype parameter in one of the examples in the get_df function docs is not properly indented.

dynamo-pandas/dynamo_pandas/dynamo_pandas.py

Lines 66 to 74 in 3e51320

>>> df = get_df(
...     table="players",
...     keys=keys(player_id=["player_two", "player_four"]),
...         dtype={
...             "bonus_points": "Int8",
...             "last_play": "datetime64[ns, UTC]",
...             # "play_time": "timedelta64[ns]"  # See note below.
...         }
...     )

Add put_items function in transactions module

Add put_items function in transactions module to allow adding/updating multiple items simultaneously.

Implement an update_df functionality

@DrGFreeman Is this repo still active?
I built an update functionality for updating selected columns in a dataframe to dynamodb using your module, and I'd want to contribute that functionality.

error when calling get_df()

I defined boto3_args as a dictionary:

boto3_args={}
boto3_args["endpoint_url"] = "http://localhost:8000"
boto3_args["aws_access_key_id"] = "fakeMyKeyId"
boto3_args["aws_secret_access_key"] = "fakeSecretAccessKey"

And tried to execute:
df = get_df(table = "Employee", boto3_kwargs = boto3_args)

Error: TypeError: get_df() got an unexpected keyword argument 'boto3_kwargs'

But when I checked the source code, the method signature in dynamo_pandas.py is:
def get_df(*, table, keys=None, attributes=None, dtype=None, boto3_kwargs={}):

So the signature does accept boto3_kwargs as a keyword argument.
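
An error like this usually means the installed package differs from the source being read. A quick way to check is to inspect the signature of the function that is actually imported. The sketch below uses two hypothetical stand-in functions (`get_df_old`, `get_df_new`) in place of the real installed get_df, so it runs without the package:

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return params[name].kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
    # A **kwargs parameter accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer version of get_df.
def get_df_old(*, table, keys=None, attributes=None, dtype=None):
    ...


def get_df_new(*, table, keys=None, attributes=None, dtype=None, boto3_kwargs=None):
    ...


old_ok = accepts_keyword(get_df_old, "boto3_kwargs")
new_ok = accepts_keyword(get_df_new, "boto3_kwargs")
```

Running `accepts_keyword` on the get_df imported in the failing environment would show whether that installation includes the parameter.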

The return value of transactions.get_item for a non-existent item is not tested

It is expected that the transactions.get_item function returns None if no item matching the specified key is found in the table.

There is currently no unit test to verify this behavior.

Remove support for Python 3.7 and add support for 3.11 & 3.12

Remove support for Python 3.7 and add support for 3.11 & 3.12. Ref. https://devguide.python.org/versions/.

There is no test to ensure putting an item with an existing key updates the item

There is no test to ensure that putting an item with an existing key updates the item

Add high level transaction functions

Add high level transaction functions that integrate conversion and transactions in a single function call:

put_df(df, table) add/update all items from a dataframe.
get_df(keys, table) get specific items (or all if keys=None) from a table into a dataframe.

Handling of unprocessed items from the client's batch_write_item function is not tested

The handling of the unprocessed items from the client's batch_write_item function called in transactions.put_items is not covered by unit tests. This can lead to bugs like #42 remaining unnoticed.

dynamo-pandas/dynamo_pandas/transactions/transactions.py

Lines 293 to 294 in 3a28921

if response["UnprocessedItems"] != {}:
    return response["UprocessedItems"][table]

Investigate whether mocking with moto can be used to return unprocessed items. Otherwise, potentially use a custom mock to return unprocessed items and ensure the whole function is covered by tests.
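
The custom-mock option could look roughly like the sketch below, which uses unittest.mock from the standard library. `put_items_once` is a hypothetical helper written for this example, not the library's function; the response shape follows the documented batch_write_item format, where unprocessed entries are wrapped in {"PutRequest": {"Item": ...}}.

```python
from unittest.mock import MagicMock


def put_items_once(client, items, table):
    """Hypothetical helper: one batch write, returning the unprocessed items."""
    response = client.batch_write_item(
        RequestItems={table: [{"PutRequest": {"Item": item}} for item in items]}
    )
    unprocessed = response.get("UnprocessedItems", {})
    # Unwrap the requests so the result has the same format as the input.
    return [request["PutRequest"]["Item"] for request in unprocessed.get(table, [])]


items = [
    {"player_id": {"S": "player_one"}},
    {"player_id": {"S": "player_two"}},
]

# Mocked client that reports the second item as unprocessed.
client = MagicMock()
client.batch_write_item.return_value = {
    "UnprocessedItems": {"players": [{"PutRequest": {"Item": items[1]}}]}
}

leftover = put_items_once(client, items, "players")
```

A test built this way exercises the unprocessed-items branch deterministically, without depending on a mocking backend choosing to throttle a request.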

Align black version in pre-commit-config with requirements-dev.txt

#76 bumped black to 24.3.0 in requirements-dev.txt; however, the version in .pre-commit-config.yaml is still 22.3.0.

Align the two versions to avoid formatting conflicts.
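
To keep the two pins from drifting apart again, a small consistency check could be run in CI. The sketch below is only an illustration: the function name is hypothetical, and the sample file contents are assumptions about the layout of the two files rather than copies of them.

```python
import re


def pinned_black_versions(requirements_text, precommit_text):
    """Return (requirements version, pre-commit rev) for black, or None if absent."""
    req = re.search(r"^black==([\w.]+)", requirements_text, flags=re.MULTILINE)
    hook = re.search(
        r"repo:\s*\S*psf/black\S*\s*\n\s*rev:\s*v?([\w.]+)", precommit_text
    )
    return (req.group(1) if req else None, hook.group(1) if hook else None)


# Assumed sample contents, for illustration only.
requirements_sample = "black==24.3.0\nflake8\n"
precommit_sample = (
    "repos:\n"
    "  - repo: https://github.com/psf/black\n"
    "    rev: 22.3.0\n"
    "    hooks:\n"
    "      - id: black\n"
)

versions = pinned_black_versions(requirements_sample, precommit_sample)
aligned = versions[0] == versions[1]
```

In a real check, the two strings would be read from requirements-dev.txt and .pre-commit-config.yaml, and the job would fail when `aligned` is False.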

Version number is not updated in release 1.1.0

Version number in release (tag) 1.1.0 is still 1.0.0.

Unprocessed keys in get_items are not handled correctly

In the transactions.get_items function, the unprocessed keys returned by the boto3.resource().batch_get_item() function are not handled correctly and the function is called again with all the original keys:

dynamo-pandas/dynamo_pandas/transactions/transactions.py

Lines 137 to 139 in 3a28921

while response["UnprocessedKeys"] != {}:
    response = resource.batch_get_item(RequestItems=_request(keys))
    items.extend(response["Responses"][table])

Also, this block of code is not covered by unit tests, so the bug is not caught by the test suite.
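
One possible shape for a corrected loop is sketched below: each retry requests only the keys reported in "UnprocessedKeys", which batch_get_item returns in the same format it accepts for RequestItems. `FakeResource` is a hypothetical stand-in that processes one key per call so that the retry path is exercised; the `get_items` shown here is an illustration, not the library's code.

```python
class FakeResource:
    """Hypothetical stand-in for the DynamoDB resource: one key per call."""

    def __init__(self):
        self.calls = []

    def batch_get_item(self, RequestItems):
        self.calls.append(RequestItems)
        ((table, request),) = RequestItems.items()
        first, *rest = request["Keys"]
        unprocessed = {table: {"Keys": rest}} if rest else {}
        # For simplicity the returned "item" is just a copy of its key.
        return {"Responses": {table: [dict(first)]}, "UnprocessedKeys": unprocessed}


def get_items(resource, keys, table):
    response = resource.batch_get_item(RequestItems={table: {"Keys": keys}})
    items = list(response["Responses"].get(table, []))
    while response["UnprocessedKeys"]:
        # Retry with only the keys that were not processed, not all keys.
        response = resource.batch_get_item(RequestItems=response["UnprocessedKeys"])
        items.extend(response["Responses"].get(table, []))
    return items


keys = [
    {"player_id": "player_one"},
    {"player_id": "player_two"},
    {"player_id": "player_three"},
]
resource = FakeResource()
items = get_items(resource, keys, "players")
```

The same fake could serve as the basis for a unit test covering this branch. A production version would likely also add a delay between retries.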

Unit tests failing on Windows platform

Most unit tests fail on Windows platform with the following exception:

AttributeError: module 'numpy' has no attribute 'float128'

moto mock_dynamodb2 is deprecated

moto's mock_dynamodb2 is deprecated and results in failing CI. Replace it with mock_dynamodb.

Boto3_kwargs parameter commit not part of latest release

Hi

Thanks for putting in the time to create this cool package. It has been really useful.

I was wondering if you could please create a new release. I am specifically after this commit with the boto3_kwargs parameter being added:
Add boto3_kwargs parameter

Thanks again.

Code examples in README do not have syntax highlighting

Add the python language identifier to code examples in README.

Typo in "UnprocessedItems" dictionary key

The key should read "UnprocessedItems":

dynamo-pandas/dynamo_pandas/transactions/transactions.py

Line 294 in 3a28921

return response["UprocessedItems"][table]

tox is missing from dev requirements

tox is used to automate execution of tests; however, it is not included in the dev requirements (requirements-dev.txt).

Add Python 3.10 to CI

Add Python 3.10 to tox environment and CI builds.

Make boto3 an "extra" requirement

boto3 is currently defined in the install_requires parameter of setup in setup.py. This results in the boto3 and botocore packages being added to lambda layers built using AWS SAM tools. These two packages use about 60 MB of layer storage space, a significant fraction of the AWS lambda layer size limit of 250 MB, although they do not need to be installed in the lambda layer since they are included in the lambda runtime environment.

Moving boto3 to the extras_require parameter of the setup function would prevent the addition of boto3 and botocore to lambda layers while allowing their installation using the 'boto' extra option.
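
As a rough sketch, the relevant setup arguments might look like the following. The contents of install_requires here are an assumption for illustration and are not copied from the project's setup.py; only the 'boto' extra name comes from the proposal above.

```python
# Sketch of the relevant setup() arguments. The install_requires list is an
# assumption, not the project's actual dependency list.
setup_kwargs = {
    "name": "dynamo-pandas",
    "install_requires": ["pandas"],  # assumed remaining hard dependency
    "extras_require": {"boto": ["boto3"]},  # installed only on request
}

# In setup.py these would be passed on as: setup(**setup_kwargs)
```

Users who need boto3 installed alongside the package would then run `pip install dynamo-pandas[boto]`, while lambda layer builds would use the plain `pip install dynamo-pandas`.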

Update Installation section of README and docs to reflect the changes in installation options.

Tables with GSI & LSI?

Hi, firstly this package looks like it could really make my life easier, so thanks for putting the time in!
I'm not a DynamoDB expert, so sorry if this is a stupid error on my part.
I'm receiving a client error when working with 'get_df' on DynamoDB tables that have either a GSI or an LSI:
"An error occurred (ValidationException) when calling the BatchGetItem operation: The provided key element does not match the schema"

Following your examples, it's working for all tables that don't have a GSI or LSI. Should I be using a different "keys" / query structure for those tables?

Release version 1.0.0

Remove development notices from README.
Change version in __init__.py and docs/conf.py.

Bad indentation in Overview documentation code example

The indentation of the dtype parameter and closing parenthesis in the get_df with dtype example in docs/overview.rst is incorrect:

df = get_df(
    table="players",
    keys=keys(player_id=["player_two", "player_four"]),
        dtype={
            "bonus_points": "Int8",
            "last_play": "datetime64[ns, UTC]",
            # "play_time": "timedelta64[ns]"  # See note below.
        }
    )

Should read:

df = get_df(
    table="players",
    keys=keys(player_id=["player_two", "player_four"]),
    dtype={
        "bonus_points": "Int8",
        "last_play": "datetime64[ns, UTC]",
        # "play_time": "timedelta64[ns]"  # See note below.
    }
)

Add functions to convert DataFrame and Series to items dict and vice-versa

Add functions to convert pandas DataFrame and Series to items dict and vice-versa.

Examples (subject to modification):

to_items(df) to convert a dataframe to a list of dictionaries.
to_item(obj) to convert a single row dataframe or a series to a dictionary.
to_df(items, dtype=None) to convert a single or multiple items to a dataframe with optional data types.

filter with attribute value

Could we get a dataframe from DynamoDB with filtering on an attribute value? I know right now we can filter on keys, but I'm not sure if we can filter on an attribute value. Thank you.

Release version 1.2.1

Release version 1.2.1 to make bug fixes from #45 available on PyPI.

Also add a CHANGELOG.md file to make tracking of changes easier.

Move the keys function to the main module

Move the keys function from the transactions module to the main module.

When using the package through the high level interface functions, a user should not have to import functions from submodules. Since the keys function is meant as a helper function to keep the interface simple, it would make more sense to have it as part of the main module.

Returned unprocessed items have incorrect format

The unprocessed items returned by the _put_items function embedded in the transactions.put_items function are not in the same format as the items passed to the function.

dynamo-pandas/dynamo_pandas/transactions/transactions.py

Lines 289 to 296 in 3a28921

def _put_items(items, table=table):
    response = client.batch_write_item(
        RequestItems={table: [{"PutRequest": {"Item": item}} for item in items]}
    )
    if response["UnprocessedItems"] != {}:
        return response["UprocessedItems"][table]
    else:
        return []

The _put_items function expects a list of item dictionaries serialized with the serde.TypeSerializer.serialize() method, whereas the function returns a list of dictionaries in the format {"PutRequest": {"Item": item}} where item is a serialized dictionary.

A correct implementation would be:

def _put_items(items, table=table):
    response = client.batch_write_item(
        RequestItems={table: [{"PutRequest": {"Item": item}} for item in items]}
    )
    if response["UnprocessedItems"] != {}:
        return [
            item["PutRequest"]["Item"]
            for item in response["UnprocessedItems"][table]
        ]
    else:
        return []

This bug currently passes unit tests because the unprocessed items handling is not covered by tests (ref. #43).

Timedelta string values cannot be converted with the dtype parameter

Timedelta string values stored in a table cannot be converted with the dtype parameter of the get_df and to_df functions or using the dataframe astype method. This is due to a known bug in pandas (ref.: pandas-dev/pandas#38509).

As a result, unit tests for the dtype parameter of the get_df and put_df function do not test this conversion. Once the pandas issue is resolved, this conversion can be added to the tests.

As a workaround, the Timedelta columns can be converted using pd.to_timedelta(df.column_name).

Test fixtures are failing with moto version 5

mock_dynamodb has been removed in moto version 5 and replaced with mock_aws.
