Comments (30)
Any updates on adding support?
from kinesis-aggregation.
First round of updates is complete. There's a working unit test that runs an end-to-end aggregate & deaggregate successfully on both Python 2.7.x and Python 3.6.x. (Have not merged back to master yet)
I'm working on adding some more unit testing and will try to do some end-to-end testing.
I will probably bump the version number to 1.1 when I push to PyPi.
from kinesis-aggregation.
Great! My Mon. & Tues. this week are madness, but I'll try to wrap things up, get some more test coverage and get this pushed to PyPi by the end of the week. Thanks!
from kinesis-aggregation.
Definitely within the next two months pls!
from kinesis-aggregation.
Apologies, got swamped. Likely early next week at this point.
from kinesis-aggregation.
As an update to anyone watching, I'm working on this today, so I'll update with progress later on.
from kinesis-aggregation.
The short answer is that when this project was developed, AWS Lambda didn't actually support Python3. That's a fairly recent release (https://aws.amazon.com/about-aws/whats-new/2017/04/aws-lambda-supports-python-3-6/) and you're the first person to ask about it.
We'll work on getting support for that in.
from kinesis-aggregation.
No, sorry @jessmchung , had lost track of this. I'll try to start looking into it in the next week or so. What level of urgency do you have?
from kinesis-aggregation.
@jessmchung my current plan is to add concurrent Python 2.7.x and Python 3.x support to the existing Python code by adding a dependency on the six library (https://pypi.python.org/pypi/six) as well as get some unit tests added in. Does that sound reasonable?
from kinesis-aggregation.
@brentnash That sounds excellent!
from kinesis-aggregation.
Ongoing work happening here:
https://github.com/awslabs/kinesis-aggregation/tree/py3-compat
from kinesis-aggregation.
I really appreciate Py3 support.
Tested py3-compat
branch and worked like a charm except for one.
For some py3 environments, pip
is named pip3
(or the like). In that case, make_lambda_build.py
fails with 'You do not have "pip" installed or it is not on your PATH.` error.
For example, on upcoming AL2, if you install py3 with $ sudo amazon-linux-extras install python3
, pips are installed as follows:
$ ls /usr/bin/ | grep pip
pip3
pip-3
pip-3.6
pip3.6
from kinesis-aggregation.
Thank you for taking a look at it @quiver ! Much appreciated.
Nice catch on the pip3 thing.
I checked in a change to invoke the pip
module directly which I think should get around the issue. i.e.:
python -m pip install -r requirements.txt
Give it a shot if you get a chance. I tested with python 2 & python 3 on my system successfully.
from kinesis-aggregation.
@brentnash
python -m pip install -r requirements.txt
This hardcoded approach does not work if python
is linked to Py2 and its target runtime is Py3.
What about using sys.executable
instead?
This should work as long as make_lambda_build.py
is invoked from the corresponding interpreter.
$ git diff -u
diff --git a/python/make_lambda_build.py b/python/make_lambda_build.py
index 41aa05a..97a2857 100644
--- a/python/make_lambda_build.py
+++ b/python/make_lambda_build.py
@@ -97,7 +97,7 @@ def install_dependencies():
print('')
print('Installing necessary modules from pip...')
requirements_file = os.path.join(proj_dir, REQUIREMENTS_FILE_NAME)
- pip_install_cmd = 'python -m pip install -r {} -t "{}"'.format(requirements_file, build_dir)
+ pip_install_cmd = '{} -m pip install -r {} -t "{}"'.format(sys.executable, requirements_file, build_dir)
print(pip_install_cmd)
pip_install_cmd_line_result = subprocess.call(pip_install_cmd, shell=True)
if pip_install_cmd_line_result != 0:
Tested with
- system py2/3
- virtualenv
from kinesis-aggregation.
@quiver I forgot sys.executable existed. That is a much better answer. I'll add that instead.
from kinesis-aggregation.
@jessmchung I'm still going to add some more tests, but did you want to try out the py3-compat branch before I merge to master and upload to PyPi?
from kinesis-aggregation.
@brentnash Yes! Will let you know asap!
from kinesis-aggregation.
Not urgent, but if you have a chance, I always appreciate extra data/validation points. Thanks!
from kinesis-aggregation.
Whoops - meant to respond! We tested this. It looks good for our usecases.
from kinesis-aggregation.
This is working for us as well! Any update on when it will be released?
from kinesis-aggregation.
Great to hear @brentnash looking forward to the Python 3 support!! Will it be available on pip?
from kinesis-aggregation.
That's the plan!
from kinesis-aggregation.
from kinesis-aggregation.
Added a bunch more unit tests:
https://github.com/awslabs/kinesis-aggregation/commits/py3-compat
I generated a bunch of data with the actual Kinesis Producer Library (KPL) to compare against.
Working on merging to master now and then I'll start on the PyPi distribution.
from kinesis-aggregation.
Live on Test PyPi at the moment:
https://testpypi.python.org/pypi/aws_kinesis_agg
Running some last tests.
from kinesis-aggregation.
Alright, new version should be live:
https://pypi.python.org/pypi/aws_kinesis_agg/1.1.0
Note the new version number is v1.1.0 (was previously 1.0.1). I've done some basic verification via both Python 2.7.14 and Python 3.6.4. Give it a shot and let me know how it goes for all of you (@jessmchung @jarrellmark @jsullivanLI)
I'll leave this issue open until we get some independent confirmation. I may also make a few documentation updates as well.
from kinesis-aggregation.
Thanks for the port, @brentnash !
I had to make a slight modification to my code to get it working in python 3.6. I'm not sure if this is a bug just pointing it out. A sample is below demonstrating the change:
import json
import uuid
import boto3
import aws_kinesis_agg.aggregator
kinesis_client = None
def send_record(agg_record):
global kinesis_client
partition_key, explicit_hash_key, data = agg_record.get_contents()
print("partition_key: {partition_key}".format(partition_key=partition_key))
print("explicit_hash_key: {explicit_hash_key}".format(explicit_hash_key=explicit_hash_key))
print("data: {data}".format(data=data))
"""
The commented out code below worked in python 2.7
"""
# kinesis_client.put_record(
# StreamName='STREAM_NAME',
# Data=data,
# PartitionKey=partition_key,
# ExplicitHashKey=explicit_hash_key)
kinesis_client.put_record(
StreamName='STREAM_NAME',
Data=data,
PartitionKey=partition_key)
if __name__ == '__main__':
print(json.dumps({
'id': str(uuid.uuid4()),
'message': "Sample Message"}))
session = boto3.Session(profile_name='default')
kinesis_client = session.client('kinesis')
kinesis_agg = aws_kinesis_agg.aggregator.RecordAggregator()
kinesis_agg.on_record_complete(send_record)
for message_number in range(5):
partition_key = str(uuid.uuid4())
kinesis_agg.add_user_record(
partition_key=partition_key,
data=json.dumps({
'id': partition_key,
'message': "Message #{message_number}".format(message_number=message_number)}) \
.encode('utf-8'))
send_record(kinesis_agg.clear_and_get())
It looks like the older add_user_record method calculated its own explicit_hash_key value if None
was given.
from kinesis-aggregation.
@jarrellmark The good news is that this was indeed intentional, but definitely needs to be better documented in this project. Part of the reason I bumped the version from 1.0 to 1.1 was that I wanted to imply there might be some changes in functionality.
Let me try to run down the rationale briefly (in case you're interested)...
The Kinesis Producer Library (KPL) has code (https://github.com/awslabs/amazon-kinesis-producer/blob/master/aws/kinesis/core/user_record.cc#L22) that appears to generate an explicit hash key (EHK) from a partition key (PK) when one isn't present. That's what this project was doing previously (although the calculation was actually incorrect so I fixed that too). In reality though, I ran a bunch of tests with the actual KPL and if you give it a null EHK, it doesn't auto-generate one, it literally just doesn't put an EHK into the aggregated message and you re-receive it as null on the other end via the Kinesis Client Library (KCL) when the aggregated record is unpacked. So as far as I can tell, in normal circumstances, that code isn't actually used.
Furthermore, as describe in #11, specifying your own EHKs can actually cause some unintended and unpleasant side effects if you're using the actual Kinesis Client Library (KCL). If you don't specify an EHK, it will just create one for itself by hashing the PK. You can find the concerning behavior here: https://github.com/awslabs/amazon-kinesis-client/blob/master/src/main/java/com/amazonaws/services/kinesis/clientlibrary/types/UserRecord.java#L243. The fallout is essentially that if any of your PKs or EHKs for your user records inside the aggregated record don't match the shard the aggregated record itself is sent to, the KCL will drop them. So in that sense, not sending an EHK on your behalf seems like a safer bet (in addition to being more in-line with the actual KPL). FWIW, I don't believe the Python deaggregation modules in this project have that issue, but it's worth knowing about.
Sorry, that was a lot more text than I thought it was going to be. Anyway, that's the rationale; if you disagree or have other ideas, I'm happy to hear them.
from kinesis-aggregation.
Sounds good, thanks for the explanation, @brentnash.
from kinesis-aggregation.
It's been a few week with no further comments, so I'm going to go ahead and resolve this. Feel free to reopen if there are any further issues.
from kinesis-aggregation.
Related Issues (20)
- Aggregation in Java combines different partition keys to one request HOT 1
- Size or time in seconds for trigger callback HOT 1
- Not able to specify 'Boto3' as a keyword argument for iter_deaggregate_records HOT 2
- Need parameter data_format in definition for deaggregate_records to use with Boto3
- AggRecord does not validate record size correctly HOT 3
- aws_kinesis_agg not supported in FIPS mode HOT 1
- Unnecessary dependence on the `npm` package
- Feature request: Provide language support for rustlang
- Provide .protobuf file to allow code generation for non-supported languages HOT 1
- kpl_pb2.py should be regenerated to avoid problems with upcoming protobuf release HOT 11
- Documentation mentions using AggregationManager, which does not exist HOT 4
- Incomplete Documentation - 404 for link
- Is data loss possible when consume aggregated records by AWS lambda? HOT 11
- aws-kinesis-agg nodejs 18 + support HOT 1
- package.json version bump
- Aws-SDK-V3 support HOT 2
- Feature Request: Support for non-MD5 hashing HOT 3
- aws_kinesis_agg fails to run on FIPS compliant python images HOT 1
- CVE-2023-36665
- protobufjs Prototype Pollution vulnerability HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from kinesis-aggregation.