Giter Site home page Giter Site logo

uap-js-udf's Introduction

JavaScript SQL User-Defined Function (UDF) User-Agent Parser Library for the uap-core project.

The primary platform for this is Google BigQuery.

Related Links

Usage

The assets for this project have been published publically so that you may use them though note the bucket has Requester Pays enabled so costs are incurred by the user; sounds worrying but in practice is roughly $0.10 for 10k queries.

Use the following to use the parser:

CREATE TEMP FUNCTION uaparser(ua STRING)
RETURNS STRUCT<
  user_agent STRUCT<
    family STRING,
    major STRING,
    minor STRING,
    patch STRING
  >,
  os STRUCT<
    family STRING,
    major STRING,
    minor STRING,
    patch STRING,
    patchMinor STRING
  >,
  device STRUCT<
    family STRING,
    brand STRING,
    model STRING
  >
>
DETERMINISTIC
LANGUAGE js
OPTIONS (
  -- description = "User-Agent Parser (https://github.com/jimdigriz/uap-js-udf)",
  library = [
    "gs://requester-pays.coremem.com/uap-js-udf/regexes.js",
    "gs://requester-pays.coremem.com/uap-js-udf/uaparser.js"
  ]
)
AS "return uaparser(ua);";

SELECT user_agent, uaparser(user_agent) AS uap FROM `[PROJECT].[DATASET].[TABLE]` WHERE ...;

N.B. remove the comment (leading '--') from the description if you make this a non-temporary function

Partial Parsing

If you only want to parse only the user agent, OS or device you can opt to use the more specific function to make the query faster:

User Agent

CREATE TEMP FUNCTION uaparser_user_agent(ua STRING)
RETURNS STRUCT<
  family STRING,
  major STRING,
  minor STRING,
  patch STRING
>
DETERMINISTIC
LANGUAGE js
OPTIONS (
  -- description = "User-Agent Parser (https://github.com/jimdigriz/uap-js-udf)",
  library = [
    "gs://requester-pays.coremem.com/uap-js-udf/regexes.js",
    "gs://requester-pays.coremem.com/uap-js-udf/uaparser.js"
  ]
)
AS "return uaparser.user_agent(ua);";

SELECT user_agent, uaparser_user_agent(user_agent) AS uap_user_agent FROM `[PROJECT].[DATASET].[TABLE]` WHERE ...;

Operating System

CREATE TEMP FUNCTION uaparser_os(ua STRING)
RETURNS STRUCT<
  family STRING,
  major STRING,
  minor STRING,
  patch STRING,
  patchMinor STRING
>
DETERMINISTIC
LANGUAGE js
OPTIONS (
  -- description = "User-Agent Parser (https://github.com/jimdigriz/uap-js-udf)",
  library = [
    "gs://requester-pays.coremem.com/uap-js-udf/regexes.js",
    "gs://requester-pays.coremem.com/uap-js-udf/uaparser.js"
  ]
)
AS "return uaparser.os(ua);";

SELECT user_agent, uaparser_os(user_agent) AS uap_os FROM `[PROJECT].[DATASET].[TABLE]` WHERE ...;

Device

CREATE TEMP FUNCTION uaparser_device(ua STRING)
RETURNS STRUCT<
  family STRING,
  brand STRING,
  model STRING
>
DETERMINISTIC
LANGUAGE js
OPTIONS (
  -- description = "User-Agent Parser (https://github.com/jimdigriz/uap-js-udf)",
  library = [
    "gs://requester-pays.coremem.com/uap-js-udf/regexes.js",
    "gs://requester-pays.coremem.com/uap-js-udf/uaparser.js"
  ]
)
AS "return uaparser.device(ua);";

SELECT user_agent, uaparser_device(user_agent) AS uap_device FROM `[PROJECT].[DATASET].[TABLE]` WHERE ...;

Deploy

If you want to maintain your own version of the assets you will need:

  • curl
  • git
  • make
  • python3

Check out the project with:

git clone https://github.com/jimdigriz/uap-js-udf.git

Now run:

make

Copy the following to a Google Storage account (eg. gs://mybucket/uaparser/):

  • uaparser.js
  • regexes.js

This can be done by using something like:

gsutil cp -J regexes.js uaparser.js gs://mybucket/uaparser/
gsutil acl ch -r -g example.com:R gs://mybucket/uaparser

N.B. assumes example.com is the domain for your account, do change this to a user/group ACL if more suitable for your use case; public access is not recommended or you will pick up the billing for it

uap-js-udf's People

Contributors

jimdigriz avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.