MOJAthenaUDFs

AWS ATHENA UDFs for Record Linkage

Requires Java 1.8 and Maven. Installing both through SDKMAN is recommended.

How to build the JAR and create the Lambda function:

mvn clean package # to check that it compiles
mvn clean install -DskipTests

Create a Lambda function (make sure the Lambda execution role has been created first). Either use the web console or the command below:

aws lambda create-function \
  --function-name athenaudf \
  --runtime java8 \
  --role arn:aws:iam::3242345446345:role/serverlessrepo-AthenaUserDefin \
  --handler com.awssupport.athena.udfs.MOJAthenaUserDefinedFunctions \
  --timeout 900 \
  --zip-file fileb://./target/MOJAthenaUserDefinedFunctions-1.0-SNAPSHOT \
  --region eu-west-1

  • In the Athena console, make sure the workgroup in use is set to Athena engine version 2

  • Run the query below in Athena to test the UDF:

USING EXTERNAL FUNCTION dm(input VARCHAR) RETURNS VARCHAR LAMBDA 'athenaudf' SELECT dm('Hello')


Progress

v.1.0.0

Get this mechanism working: compiling and outputting a JAR. Next step: test that it works on Athena.

  • Double Metaphone should be ready and working now

  • Jaro-Winkler from Apache Commons also compiles successfully and has a scalar interface, matching the way UDFs work in Athena: it needs input of the form "TEXT_FROM_COL1####TEXT_FROM_COL2", where #### is the separator between the two column strings. As the documentation [see reference 1] points out:

      Scalar UDFs only – Athena only supports scalar UDFs, which process one row at a time and return a single column value.
      Athena passes a batch of rows, potentially in parallel, to the UDF each time it invokes Lambda.
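
The packed-input convention described above can be sketched in plain Java. This is a minimal illustration, not the code in this repository: the class name PairedInputUdf and the methods splitPair and score are hypothetical, and the scorer used in main is a trivial exact-match stand-in rather than the Apache Commons Jaro-Winkler implementation, so that the example runs with the JDK alone.

```java
import java.util.function.BiFunction;

// Hypothetical helper, for illustration only (not part of this repository).
public class PairedInputUdf {
    // Separator between the two column values, as described in this README.
    public static final String SEPARATOR = "####";

    // Splits "LEFT####RIGHT" into {LEFT, RIGHT} at the first separator.
    // Returns null when the input is null or contains no separator.
    public static String[] splitPair(String packed) {
        if (packed == null) {
            return null;
        }
        int idx = packed.indexOf(SEPARATOR);
        if (idx < 0) {
            return null;
        }
        return new String[] {
            packed.substring(0, idx),
            packed.substring(idx + SEPARATOR.length())
        };
    }

    // Applies a two-argument similarity function to one packed string,
    // which is the shape a single-argument scalar UDF would receive.
    public static Double score(String packed,
                               BiFunction<String, String, Double> similarity) {
        String[] pair = splitPair(packed);
        return pair == null ? null : similarity.apply(pair[0], pair[1]);
    }

    public static void main(String[] args) {
        // Stand-in scorer (exact match); a real UDF would plug in Jaro-Winkler here.
        BiFunction<String, String, Double> exact =
            (a, b) -> a.equals(b) ? 1.0 : 0.0;

        System.out.println(score("SMITH####SMITH", exact)); // prints 1.0
        System.out.println(score("SMITH####SMYTH", exact)); // prints 0.0
        System.out.println(score("no separator", exact));   // prints null
    }
}
```

On the Athena side, the two columns would be concatenated into one VARCHAR before the call, for example col1 || '####' || col2. Returning null for malformed input is a design choice made for this sketch; a real UDF might prefer to raise an error instead.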
    

References

[1] Athena querying udfs

[2] Athena UDFs: creating and deploying

[3] AWS Athena UDF access to lambda context
