Giter Site home page Giter Site logo

Comments (30)

jessmchung avatar jessmchung commented on September 15, 2024 8

Any updates on adding support?

from kinesis-aggregation.

 avatar commented on September 15, 2024 4

First round of updates is complete. There's a working unit test that runs an end-to-end aggregate & deaggregate successfully on both Python 2.7.x and Python 3.6.x. (Have not merged back to master yet)

I'm working on adding some more unit testing and will try to do some end-to-end testing.

I will probably bump the version number to 1.1 when I push to PyPi.

from kinesis-aggregation.

 avatar commented on September 15, 2024 3

Great! My Mon. & Tues. this week are madness, but I'll try to wrap things up, get some more test coverage and get this pushed to PyPi by the end of the week. Thanks!

from kinesis-aggregation.

jessmchung avatar jessmchung commented on September 15, 2024 2

Definitely within the next two months pls!

from kinesis-aggregation.

 avatar commented on September 15, 2024 2

Apologies, got swamped. Likely early next week at this point.

from kinesis-aggregation.

 avatar commented on September 15, 2024 2

As an update to anyone watching, I'm working on this today, so I'll update with progress later on.

from kinesis-aggregation.

 avatar commented on September 15, 2024

The short answer is that when this project was developed, AWS Lambda didn't actually support Python3. That's a fairly recent release (https://aws.amazon.com/about-aws/whats-new/2017/04/aws-lambda-supports-python-3-6/) and you're the first person to ask about it.

We'll work on getting support for that in.

from kinesis-aggregation.

 avatar commented on September 15, 2024

No, sorry @jessmchung , had lost track of this. I'll try to start looking into it in the next week or so. What level of urgency do you have?

from kinesis-aggregation.

 avatar commented on September 15, 2024

@jessmchung my current plan is to add concurrent Python 2.7.x and Python 3.x support to the existing Python code by adding a dependency on the six library (https://pypi.python.org/pypi/six) as well as get some unit tests added in. Does that sound reasonable?

from kinesis-aggregation.

jessmchung avatar jessmchung commented on September 15, 2024

@brentnash That sounds excellent!

from kinesis-aggregation.

 avatar commented on September 15, 2024

Ongoing work happening here:

https://github.com/awslabs/kinesis-aggregation/tree/py3-compat

from kinesis-aggregation.

quiver avatar quiver commented on September 15, 2024

I really appreciate Py3 support.

Tested py3-compat branch and worked like a charm except for one.

For some py3 environments, pip is named pip3(or the like). In that case, make_lambda_build.py fails with 'You do not have "pip" installed or it is not on your PATH.` error.

For example, on upcoming AL2, if you install py3 with $ sudo amazon-linux-extras install python3, pips are installed as follows:

$ ls /usr/bin/ | grep pip
pip3
pip-3
pip-3.6
pip3.6

from kinesis-aggregation.

 avatar commented on September 15, 2024

Thank you for taking a look at it @quiver ! Much appreciated.

Nice catch on the pip3 thing.

I checked in a change to invoke the pip module directly which I think should get around the issue. i.e.:

python -m pip install -r requirements.txt

Give it a shot if you get a chance. I tested with python 2 & python 3 on my system successfully.

from kinesis-aggregation.

quiver avatar quiver commented on September 15, 2024

@brentnash

python -m pip install -r requirements.txt

This hardcoded approach does not work if python is linked to Py2 and its target runtime is Py3.

What about using sys.executable instead?

This should work as long as make_lambda_build.py is invoked from the corresponding interpreter.

$ git diff -u
diff --git a/python/make_lambda_build.py b/python/make_lambda_build.py
index 41aa05a..97a2857 100644
--- a/python/make_lambda_build.py
+++ b/python/make_lambda_build.py
@@ -97,7 +97,7 @@ def install_dependencies():
     print('')
     print('Installing necessary modules from pip...')
     requirements_file = os.path.join(proj_dir, REQUIREMENTS_FILE_NAME)
-    pip_install_cmd = 'python -m pip install -r {} -t "{}"'.format(requirements_file, build_dir)
+    pip_install_cmd = '{} -m pip install -r {} -t "{}"'.format(sys.executable, requirements_file, build_dir)
     print(pip_install_cmd)
     pip_install_cmd_line_result = subprocess.call(pip_install_cmd, shell=True)
     if pip_install_cmd_line_result != 0:

Tested with

  • system py2/3
  • virtualenv

from kinesis-aggregation.

 avatar commented on September 15, 2024

@quiver I forgot sys.executable existed. That is a much better answer. I'll add that instead.

from kinesis-aggregation.

 avatar commented on September 15, 2024

@jessmchung I'm still going to add some more tests, but did you want to try out the py3-compat branch before I merge to master and upload to PyPi?

from kinesis-aggregation.

jessmchung avatar jessmchung commented on September 15, 2024

@brentnash Yes! Will let you know asap!

from kinesis-aggregation.

 avatar commented on September 15, 2024

Not urgent, but if you have a chance, I always appreciate extra data/validation points. Thanks!

from kinesis-aggregation.

jessmchung avatar jessmchung commented on September 15, 2024

Whoops - meant to respond! We tested this. It looks good for our usecases.

from kinesis-aggregation.

jsullivanLI avatar jsullivanLI commented on September 15, 2024

This is working for us as well! Any update on when it will be released?

from kinesis-aggregation.

jarrellmark avatar jarrellmark commented on September 15, 2024

Great to hear @brentnash looking forward to the Python 3 support!! Will it be available on pip?

from kinesis-aggregation.

 avatar commented on September 15, 2024

That's the plan!

from kinesis-aggregation.

jessmchung avatar jessmchung commented on September 15, 2024

YAY!!!
bitmoji

from kinesis-aggregation.

 avatar commented on September 15, 2024

Added a bunch more unit tests:

https://github.com/awslabs/kinesis-aggregation/commits/py3-compat

I generated a bunch of data with the actual Kinesis Producer Library (KPL) to compare against.

Working on merging to master now and then I'll start on the PyPi distribution.

from kinesis-aggregation.

 avatar commented on September 15, 2024

Live on Test PyPi at the moment:

https://testpypi.python.org/pypi/aws_kinesis_agg

Running some last tests.

from kinesis-aggregation.

 avatar commented on September 15, 2024

Alright, new version should be live:

https://pypi.python.org/pypi/aws_kinesis_agg/1.1.0

Note the new version number is v1.1.0 (was previously 1.0.1). I've done some basic verification via both Python 2.7.14 and Python 3.6.4. Give it a shot and let me know how it goes for all of you (@jessmchung @jarrellmark @jsullivanLI)

I'll leave this issue open until we get some independent confirmation. I may also make a few documentation updates as well.

from kinesis-aggregation.

jarrellmark avatar jarrellmark commented on September 15, 2024

Thanks for the port, @brentnash !

I had to make a slight modification to my code to get it working in python 3.6. I'm not sure if this is a bug just pointing it out. A sample is below demonstrating the change:

import json
import uuid

import boto3
import aws_kinesis_agg.aggregator

kinesis_client = None

def send_record(agg_record):
    global kinesis_client

    partition_key, explicit_hash_key, data = agg_record.get_contents()

    print("partition_key: {partition_key}".format(partition_key=partition_key))
    print("explicit_hash_key: {explicit_hash_key}".format(explicit_hash_key=explicit_hash_key))
    print("data: {data}".format(data=data))

    """
    The commented out code below worked in python 2.7
    """

    # kinesis_client.put_record(
    #     StreamName='STREAM_NAME',
    #     Data=data,
    #     PartitionKey=partition_key,
    #     ExplicitHashKey=explicit_hash_key)

    kinesis_client.put_record(
        StreamName='STREAM_NAME',
        Data=data,
        PartitionKey=partition_key)

if __name__ == '__main__':
    print(json.dumps({
        'id': str(uuid.uuid4()),
        'message': "Sample Message"}))

    session = boto3.Session(profile_name='default')
    kinesis_client = session.client('kinesis')

    kinesis_agg = aws_kinesis_agg.aggregator.RecordAggregator()
    kinesis_agg.on_record_complete(send_record)

    for message_number in range(5):
        partition_key = str(uuid.uuid4())

        kinesis_agg.add_user_record(
            partition_key=partition_key,
            data=json.dumps({
                    'id': partition_key,
                    'message': "Message #{message_number}".format(message_number=message_number)}) \
                .encode('utf-8'))

    send_record(kinesis_agg.clear_and_get())

It looks like the older add_user_record method calculated its own explicit_hash_key value if None was given.

from kinesis-aggregation.

 avatar commented on September 15, 2024

@jarrellmark The good news is that this was indeed intentional, but definitely needs to be better documented in this project. Part of the reason I bumped the version from 1.0 to 1.1 was that I wanted to imply there might be some changes in functionality.

Let me try to run down the rationale briefly (in case you're interested)...

The Kinesis Producer Library (KPL) has code (https://github.com/awslabs/amazon-kinesis-producer/blob/master/aws/kinesis/core/user_record.cc#L22) that appears to generate an explicit hash key (EHK) from a partition key (PK) when one isn't present. That's what this project was doing previously (although the calculation was actually incorrect so I fixed that too). In reality though, I ran a bunch of tests with the actual KPL and if you give it a null EHK, it doesn't auto-generate one, it literally just doesn't put an EHK into the aggregated message and you re-receive it as null on the other end via the Kinesis Client Library (KCL) when the aggregated record is unpacked. So as far as I can tell, in normal circumstances, that code isn't actually used.

Furthermore, as describe in #11, specifying your own EHKs can actually cause some unintended and unpleasant side effects if you're using the actual Kinesis Client Library (KCL). If you don't specify an EHK, it will just create one for itself by hashing the PK. You can find the concerning behavior here: https://github.com/awslabs/amazon-kinesis-client/blob/master/src/main/java/com/amazonaws/services/kinesis/clientlibrary/types/UserRecord.java#L243. The fallout is essentially that if any of your PKs or EHKs for your user records inside the aggregated record don't match the shard the aggregated record itself is sent to, the KCL will drop them. So in that sense, not sending an EHK on your behalf seems like a safer bet (in addition to being more in-line with the actual KPL). FWIW, I don't believe the Python deaggregation modules in this project have that issue, but it's worth knowing about.

Sorry, that was a lot more text than I thought it was going to be. Anyway, that's the rationale; if you disagree or have other ideas, I'm happy to hear them.

from kinesis-aggregation.

jarrellmark avatar jarrellmark commented on September 15, 2024

Sounds good, thanks for the explanation, @brentnash.

from kinesis-aggregation.

 avatar commented on September 15, 2024

It's been a few week with no further comments, so I'm going to go ahead and resolve this. Feel free to reopen if there are any further issues.

from kinesis-aggregation.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.