drgfreeman / dynamo-pandas
Make working with pandas data and AWS DynamoDB easy
Home Page: https://dynamo-pandas.readthedocs.io/en/stable/
License: MIT License
While AWS configuration parameters can be set via a config file or environment variables, there may be cases where these parameters need to be overridden.
The current put_df, get_df and transactions module functions do not provide a means to pass these parameters.
Adding a **kwargs argument to the different functions and passing it to the underlying boto3.client or boto3.resource function call would provide this functionality.
For example, the get_df function signature would become:
def get_df(*, table, keys=None, attributes=None, dtype=None, **resource_kwargs):
    ...
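A minimal sketch of the forwarding idea, using a stand-in factory instead of a real boto3 call so that it runs without AWS credentials. The names fake_resource, get_df_sketch and resource_factory are hypothetical and not part of dynamo-pandas; in the real function the extra keyword arguments would be passed to boto3.resource.

```python
def fake_resource(service_name, **kwargs):
    """Hypothetical stand-in for boto3.resource; records its arguments."""
    return {"service": service_name, "config": kwargs}


def get_df_sketch(*, table, keys=None, resource_factory=fake_resource, **resource_kwargs):
    """Forward any extra keyword arguments to the resource factory."""
    resource = resource_factory("dynamodb", **resource_kwargs)
    # A real implementation would now read items from `table`;
    # this sketch only returns what the factory received.
    return resource


result = get_df_sketch(
    table="players",
    region_name="us-east-1",
    endpoint_url="http://localhost:8000",
)
print(result["config"])
```

Collecting the overrides with ** keeps the existing keyword-only parameters unchanged, at the cost of silently accepting misspelled parameter names until boto3 rejects them.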
The to_items and to_df functions are simple functions that do not add much value to the API.
Make these functions private and remove the to_item function.
Add a parameter to select item attributes to get when calling the following functions:
get_df
transactions.get_all_items
transactions.get_item
transactions.get_items
The parameter would take a list of attribute names.
Example
>>> df = get_df(
... table="players",
... keys=[{"player_id": "player_three"}, {"player_id": "player_one"}],
... attributes=["player_id", "play_time"],
... )
>>> print(df)
player_id play_time
0 player_three 1 days 14:01:19
1 player_one 2 days 17:41:55
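One way such a list of attribute names could be translated into DynamoDB request parameters is a ProjectionExpression combined with ExpressionAttributeNames placeholders, which avoids clashes with DynamoDB reserved words. The helper name projection_kwargs is hypothetical, and the package's own implementation may differ.

```python
def projection_kwargs(attributes):
    """Build DynamoDB projection parameters from a list of attribute names."""
    if not attributes:
        # No projection requested: DynamoDB returns all attributes.
        return {}
    # Map placeholders (#a0, #a1, ...) to the real attribute names.
    names = {f"#a{i}": name for i, name in enumerate(attributes)}
    return {
        "ProjectionExpression": ", ".join(names),
        "ExpressionAttributeNames": names,
    }


params = projection_kwargs(["player_id", "play_time"])
print(params)
```

The resulting dictionary could be merged into the arguments of a get_item or scan call, or into the per-table request of batch_get_item.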
Tests that pass on Ubuntu Linux fail on Windows 10 (ref. #12).
Include testing a subset of the CI test matrix on a Windows platform.
Use tox to run unit tests on different Python versions locally and in CI.
Add setup.py
Configure Sphinx docs and expand docstrings.
The dtype parameter in one of the examples in the get_df function docs is not properly indented.
dynamo-pandas/dynamo_pandas/dynamo_pandas.py
Lines 66 to 74 in 3e51320
Add a put_items function in the transactions module to allow adding/updating multiple items simultaneously.
@DrGFreeman Is this repo still active?
I built update functionality for writing selected columns of a dataframe to DynamoDB using your module, and I'd like to contribute it.
I defined boto3_args as a dictionary:
boto3_args = {}
boto3_args["endpoint_url"] = "http://localhost:8000"
boto3_args["aws_access_key_id"] = "fakeMyKeyId"
boto3_args["aws_secret_access_key"] = "fakeSecretAccessKey"
And tried to execute:
df = get_df(table="Employee", boto3_kwargs=boto3_args)
Error: TypeError: get_df() got an unexpected keyword argument 'boto3_kwargs'
But when I checked the source code, the method signature in dynamo_pandas.py is:
def get_df(*, table, keys=None, attributes=None, dtype=None, boto3_kwargs={}):
This does have boto3_kwargs as a keyword argument.
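A possible explanation, not confirmed in this thread, is that the installed release predates the source code being read. A minimal sketch for checking what an installed function actually accepts, using inspect.signature from the standard library; accepts_keyword and the two get_df stand-ins are hypothetical names used only for illustration.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return params[name].kind in (
            inspect.Parameter.KEYWORD_ONLY,
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
        )
    # A **kwargs catch-all also accepts arbitrary keyword names.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer get_df signature.
def get_df_old(*, table, keys=None, attributes=None, dtype=None):
    ...


def get_df_new(*, table, keys=None, attributes=None, dtype=None, boto3_kwargs={}):
    ...


print(accepts_keyword(get_df_old, "boto3_kwargs"))  # False
print(accepts_keyword(get_df_new, "boto3_kwargs"))  # True
```

Assuming get_df is importable from the dynamo_pandas package, the same check can be run against the installed function to see whether the release in the environment includes the parameter.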
It is expected that the transactions.get_item function returns None if no item matching the specified key is found in the table.
There is currently no unit test to verify this behavior.
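A minimal sketch of what such a test could look like, assuming an in-memory stand-in for the table instead of moto. FakeTable and get_item_sketch are hypothetical names; the sketch relies only on the fact that a DynamoDB GetItem response has no "Item" entry when no item matches the key.

```python
class FakeTable:
    """Hypothetical in-memory stand-in for a boto3 DynamoDB Table resource."""

    def __init__(self, items, key_name):
        self._items = {item[key_name]: item for item in items}
        self._key_name = key_name

    def get_item(self, Key):
        item = self._items.get(Key[self._key_name])
        # Like DynamoDB, omit the "Item" entry when nothing matches.
        return {"Item": item} if item is not None else {}


def get_item_sketch(key, table):
    """Return the matching item, or None if the key is not in the table."""
    return table.get_item(Key=key).get("Item")


def test_get_item_returns_none_for_missing_key():
    table = FakeTable([{"player_id": "player_one", "rating": 4}], "player_id")
    assert get_item_sketch({"player_id": "player_one"}, table)["rating"] == 4
    assert get_item_sketch({"player_id": "no_such_player"}, table) is None


test_get_item_returns_none_for_missing_key()
```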
Remove support for Python 3.7 and add support for 3.11 & 3.12. Ref. https://devguide.python.org/versions/.
There is no test to ensure that putting an item with an existing key updates the item
Add high level transaction functions that integrate conversion and transactions in a single function call:
put_df(df, table): add/update all items from a dataframe.
get_df(keys, table): get specific items (or all if keys=None) from a table into a dataframe.
The handling of the unprocessed items from the client's batch_write_item function called in transactions.put_items is not covered by unit tests. This can lead to bugs like #42 remaining unnoticed.
dynamo-pandas/dynamo_pandas/transactions/transactions.py
Lines 293 to 294 in 3a28921
Investigate whether mocking with moto can be used to return unprocessed items. Otherwise, consider using a custom mock to return unprocessed items and ensure the whole function is covered by tests.
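A minimal sketch of the custom-mock option, using unittest.mock from the standard library. The function put_items_sketch is a hypothetical simplification rather than the package's transactions.put_items; it ignores the 25-item batch limit and any back-off between retries.

```python
from unittest.mock import Mock


def put_items_sketch(items, table, client):
    """Write items in one batch, retrying whatever DynamoDB leaves unprocessed."""
    pending = list(items)
    while pending:
        response = client.batch_write_item(
            RequestItems={table: [{"PutRequest": {"Item": i}} for i in pending]}
        )
        unprocessed = response.get("UnprocessedItems", {}).get(table, [])
        pending = [request["PutRequest"]["Item"] for request in unprocessed]


items = [{"id": {"S": "a"}}, {"id": {"S": "b"}}]

# Custom mock: the first call reports item "b" as unprocessed, the second succeeds.
client = Mock()
client.batch_write_item.side_effect = [
    {"UnprocessedItems": {"players": [{"PutRequest": {"Item": items[1]}}]}},
    {"UnprocessedItems": {}},
]

put_items_sketch(items, "players", client)

second_call = client.batch_write_item.call_args_list[1]
print(second_call.kwargs["RequestItems"])
```

Because side_effect returns one prepared response per call, the test can assert both the number of calls and the exact payload of the retry.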
#76 bumped black to 24.3.0 in requirements-dev.txt; however, the version in .pre-commit-config.yaml is still 22.3.0.
Align the two versions to avoid formatting conflicts.
Version number in release (tag) 1.1.0 is still 1.0.0.
In the transactions.get_items function, the unprocessed keys returned by the boto3.resource().batch_get_item() function are not handled correctly and the function is called again with all the original keys:
dynamo-pandas/dynamo_pandas/transactions/transactions.py
Lines 137 to 139 in 3a28921
Also, this block of code is not covered by unit tests, which is why the tests did not catch this bug.
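A minimal sketch of the intended behavior, in which only the keys listed under UnprocessedKeys are requested again. Both get_items_sketch and FakeResource are hypothetical; the sketch is not the package's code and ignores the 100-key batch limit and back-off between retries.

```python
def get_items_sketch(keys, table, resource):
    """Fetch items by key, re-requesting only the keys reported as unprocessed."""
    items = []
    pending = list(keys)
    while pending:
        response = resource.batch_get_item(RequestItems={table: {"Keys": pending}})
        items.extend(response.get("Responses", {}).get(table, []))
        # Retry with the unprocessed keys only, not the original key list.
        pending = response.get("UnprocessedKeys", {}).get(table, {}).get("Keys", [])
    return items


class FakeResource:
    """Hypothetical stand-in that leaves the last key unprocessed on the first call."""

    def __init__(self):
        self.requested = []

    def batch_get_item(self, RequestItems):
        table, request = next(iter(RequestItems.items()))
        keys = request["Keys"]
        self.requested.append(keys)
        served = keys if len(self.requested) > 1 else keys[:-1]
        left = keys[len(served):]
        response = {"Responses": {table: [dict(k, found=True) for k in served]}}
        response["UnprocessedKeys"] = {table: {"Keys": left}} if left else {}
        return response


resource = FakeResource()
keys = [{"player_id": f"player_{n}"} for n in ("one", "two", "three")]
items = get_items_sketch(keys, "players", resource)
print(len(items), resource.requested[1])
```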
Most unit tests fail on Windows platform with the following exception:
AttributeError: module 'numpy' has no attribute 'float128'
moto's mock_dynamodb2 is deprecated and results in failing CI. Replace it with mock_dynamodb.
Hi
Thanks for putting in the time to create this cool package. It has been really useful.
I was wondering if you could please create a new release. I am specifically after this commit with the boto3_kwargs parameter being added:
Add boto3_kwargs parameter
Thanks again.
Add the python language identifier to code examples in README.
The key should read "UnprocessedItems":
tox is used to automate execution of tests; however, it is not included in the dev requirements (requirements-dev.txt).
Add Python 3.10 to tox environment and CI builds.
boto3 is currently defined in the install_requires parameter of setup in setup.py. This results in the boto3 and botocore packages being added to Lambda layers built using AWS SAM tools. These two packages use about 60 MB of layer storage space, a significant fraction of the AWS Lambda layer size limit of 250 MB, although they do not need to be installed in the Lambda layer since they are included in the Lambda runtime environment.
Moving boto3 to the extras_require parameter of the setup function would prevent the addition of boto3 and botocore to Lambda layers while allowing their installation using the 'boto' extra option.
Update Installation section of README and docs to reflect the changes in installation options.
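A minimal sketch of the relevant setup() arguments after such a change. The dependency list is an assumption for illustration: only boto3 and the 'boto' extra name come from the description above, version pins are omitted, and all other setup() arguments are left out.

```python
# Sketch of the dependency-related setup() arguments only.
setup_kwargs = {
    "name": "dynamo-pandas",
    # Assumed core dependency list; boto3 is no longer a hard requirement.
    "install_requires": ["pandas"],
    # boto3 remains installable through the 'boto' extra.
    "extras_require": {"boto": ["boto3"]},
}

# In setup.py this would be passed as: setuptools.setup(**setup_kwargs)
print(setup_kwargs["extras_require"])
```

With this layout, pip install dynamo-pandas would skip boto3, while pip install dynamo-pandas[boto] would install it as well.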
Hi, firstly this package looks like it could really make my life easier, so thanks for putting the time in!
I'm not a DynamoDB expert, so sorry if this is a stupid error on my part.
I'm receiving a client error when working with 'get_df' on DynamoDB tables that have either a GSI or an LSI:
"An error occurred (ValidationException) when calling the BatchGetItem operation: The provided key element does not match the schema"
Following your examples, it's working for all tables that don't have a GSI or LSI. Should I be using a different "keys" / query structure for those tables?
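This error usually means that the key dictionaries do not match the table's primary key exactly. A table with an LSI always has a composite primary key (partition key plus sort key), so each entry in keys needs both attributes, and attributes that are keys only of a GSI cannot be used with BatchGetItem. Whether this is the cause here is an assumption. A minimal sketch of a check against a key schema in the format exposed by the key_schema attribute of a boto3 Table resource; the function name and the game_id attribute are hypothetical.

```python
def keys_match_schema(keys, key_schema):
    """Return True if every key dict has exactly the table's primary key attributes."""
    expected = {element["AttributeName"] for element in key_schema}
    return all(set(key) == expected for key in keys)


# Example key schema of a table with a partition key and a sort key.
key_schema = [
    {"AttributeName": "player_id", "KeyType": "HASH"},
    {"AttributeName": "game_id", "KeyType": "RANGE"},
]

print(keys_match_schema([{"player_id": "player_one"}], key_schema))  # False
print(keys_match_schema([{"player_id": "player_one", "game_id": "g1"}], key_schema))  # True
```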
__init__.py and docs/conf.py.
The indentation of the dtype parameter and closing parenthesis in the get_df with dtype example in docs/overview.rst are incorrect:
df = get_df(
table="players",
keys=keys(player_id=["player_two", "player_four"]),
dtype={
"bonus_points": "Int8",
"last_play": "datetime64[ns, UTC]",
# "play_time": "timedelta64[ns]" # See note below.
}
)
Should read:
df = get_df(
    table="players",
    keys=keys(player_id=["player_two", "player_four"]),
    dtype={
        "bonus_points": "Int8",
        "last_play": "datetime64[ns, UTC]",
        # "play_time": "timedelta64[ns]" # See note below.
    }
)
Add functions to convert pandas DataFrame and Series to items dict and vice-versa.
Examples (subject to modification):
to_items(df) to convert a dataframe to a list of dictionaries.
to_item(obj) to convert a single row dataframe or a series to a dictionary.
to_df(items, dtype=None) to convert a single or multiple items to a dataframe with optional data types.
Could we get a dataframe from DynamoDB with filtering on attribute value? I know right now we can filter on keys, but I'm not sure if we can filter on attribute value. Thank you.
Release version 1.2.1 to make bug fixes from #45 available on PyPI.
Also add a CHANGELOG.md file to make tracking of changes easier.
Move the keys function from the transactions module to the main module.
When using the package with the high level interface functions, a user should not have to import functions from submodules. Since the keys function is meant as a helper function to keep the interface simple, it would make more sense to have it as part of the main module.
The unprocessed items returned by the _put_items function embedded in the transactions.put_items function are not in the same format as the items passed to the function.
dynamo-pandas/dynamo_pandas/transactions/transactions.py
Lines 289 to 296 in 3a28921
The _put_items function expects a list of item dictionaries serialized with the serde.TypeSerializer.serialize() method, whereas the function returns a list of dictionaries in the format {"PutRequest": {"Item": item}} where item is a serialized dictionary.
A correct implementation would be:
def _put_items(items, table=table):
    response = client.batch_write_item(
        RequestItems={table: [{"PutRequest": {"Item": item}} for item in items]}
    )
    if response["UnprocessedItems"] != {}:
        return [
            item["PutRequest"]["Item"]
            for item in response["UnprocessedItems"][table]
        ]
    else:
        return []
This bug currently passes unit tests since the unprocessed items handling is not covered by tests (ref. #43).
Timedelta string values stored in a table cannot be converted with the dtype parameter of the get_df and to_df functions or using the dataframe astype method. This is due to a known bug in pandas (ref.: pandas-dev/pandas#38509).
As a result, unit tests for the dtype parameter of the get_df and put_df functions do not test this conversion. Once the pandas issue is resolved, this conversion can be added to the tests.
As a workaround, the Timedelta columns can be converted using pd.to_timedelta(df.column_name).
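A minimal sketch of the workaround, assuming a dataframe whose play_time column holds Timedelta values as strings, as they would be after reading from the table. The sample data is illustrative only.

```python
import pandas as pd

# Timedelta values as they might come back from the table: plain strings.
df = pd.DataFrame(
    {
        "player_id": ["player_three", "player_one"],
        "play_time": ["1 days 14:01:19", "2 days 17:41:55"],
    }
)

# Convert the string column explicitly instead of relying on dtype/astype.
df["play_time"] = pd.to_timedelta(df["play_time"])

print(df.dtypes)
```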
mock_dynamodb has been removed in moto version 5 and replaced with mock_aws.
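If the tests need to run against more than one moto version during a transition, one option is to look up whichever decorator the installed version provides. This is a sketch of that idea with a hypothetical helper name; pinning moto and using mock_aws directly is the simpler alternative.

```python
import importlib


def resolve_dynamodb_mock(module_name="moto"):
    """Return the newest available moto DynamoDB mock decorator, or None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Newest name first: mock_aws (moto 5), then the older decorator names.
    for name in ("mock_aws", "mock_dynamodb", "mock_dynamodb2"):
        decorator = getattr(module, name, None)
        if decorator is not None:
            return decorator
    return None


# None when moto is not installed; otherwise the decorator to apply to tests.
mock_dynamodb = resolve_dynamodb_mock()
```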