koteiito / node-athena

A simple Node.js client for AWS Athena

License: MIT License

Makefile 0.31% TypeScript 99.69%
athena aws-athena nodejs athena-client aws aws-lambda lambda javascript typescript

node-athena's People

Contributors

break-pointer, doublesharp, isseu, jakechampion, koteiito, skyhacks, uzimith

node-athena's Issues

Singleton createClient function

Hello,

First, a question: does the createClient function return a new AWS connection every time it is called?

Do you think having a singleton createClient function would be nice? If so, should I create a pull request with a method called createClientSingleton for this purpose?

Thanks,
Ugurcan
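
One way to get the singleton behaviour asked about here, without changing the library, is to memoize the factory in application code. The following is only a sketch: memoizeClientFactory is a hypothetical helper, and fakeCreateClient stands in for require('athena-client').createClient so that the snippet runs on its own. It does not settle whether the real createClient opens a new connection on each call.

```javascript
// Hypothetical helper: wraps any createClient-style factory so that calls
// with the same configuration reuse one client instance.
function memoizeClientFactory(createClient) {
  const cache = new Map();
  return function createClientSingleton(clientConfig, awsConfig) {
    // Assumption: both configs are plain, JSON-serializable objects.
    const key = JSON.stringify([clientConfig, awsConfig]);
    if (!cache.has(key)) {
      cache.set(key, createClient(clientConfig, awsConfig));
    }
    return cache.get(key);
  };
}

// Stand-in for athena-client's createClient (not the real implementation).
let created = 0;
const fakeCreateClient = (clientConfig, awsConfig) => {
  created += 1;
  return { id: created, clientConfig, awsConfig };
};

const createClientSingleton = memoizeClientFactory(fakeCreateClient);
const first = createClientSingleton({ bucketUri: 's3://xxxx' }, { region: 'xxxx' });
const second = createClientSingleton({ bucketUri: 's3://xxxx' }, { region: 'xxxx' });
```

Keying the cache on the configuration means that two different regions or buckets still get separate clients.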

IAM roles

Hi,

Good day.

What IAM roles are required by the package? Thanks.

Regards.
JJ

TypeError: Cannot read property 'length' of undefined

This problem appears almost every time I use execute(sql).toPromise().
After debugging, I found that the error is thrown at this line:

if (this.columns.length === 0) {

TypeError: Cannot read property 'length' of undefined
at AthenaStream.<anonymous> (/var/task/node_modules/athena-client/build/lib/stream.js:93:40)
at next (native)
at fulfilled (/var/task/node_modules/athena-client/build/lib/stream.js:4:58)

The version of athena-client is 2.0.0

Unexpected key 'WorkGroup' found in params

I'm trying to run the example:

var clientConfig = {
    bucketUri: 's3://xxxx'
}

var awsConfig = {
    region: 'xxxx', 
}
 
var athena = require("athena-client")
var client = athena.createClient(clientConfig, awsConfig)

client.execute('SELECT 1').toPromise()
.then(function(data) {
    console.log(data)
})
.catch(function(err) {
    console.error(err)
})

But I'm getting the following error:

Unexpected key 'WorkGroup' found in params

Any idea why?

the rows of the query of 'SHOW TABLES'

When I run the query 'SHOW TABLES IN hoge', Athena returns these records:

hoge
fuga 
piyo

In node-athena, I got these rows instead:

[ Row { hoge: 'fuga' },
  Row { hoge: 'piyo' }]

Here is the sample code I used:

    const client = createClient(
      { bucketUri: 's3://bucket_path' },
      { region: 'ap-northeast-1' }
    )
    const tables = await client
      .execute(`SHOW TABLES IN hoge`)
      .toPromise()
    console.log(tables.records)
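
The output reported here suggests that the first result line ('hoge') was consumed as a CSV header and became the column name of the remaining rows. Assuming that is what happens, the full list can be rebuilt in application code. The helper name recoverHeaderlessValues is hypothetical and not part of node-athena.

```javascript
// Hypothetical workaround: for single-column, header-less results such as
// SHOW TABLES, put the value that was used as the column name back in front.
function recoverHeaderlessValues(records) {
  if (records.length === 0) {
    // With no rows there is no key to read the first value from.
    return [];
  }
  const [columnName] = Object.keys(records[0]);
  return [columnName, ...records.map((row) => row[columnName])];
}

// Rows as reported in the issue.
const records = [{ hoge: 'fuga' }, { hoge: 'piyo' }];
const tables = recoverHeaderlessValues(records);
```

A result with exactly one table cannot be recovered this way, because the only value would have been consumed as the header and no rows would remain.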

Backpressure when streaming athena results

Hi,

Thanks for putting this library together, as the library recommended in AWS' own docs (athena-express) has no built-in support for streaming results.

One issue I'm running into is that, when streaming results, the operations I perform on them are significantly slower than the rate at which the data streams in. The data then buffers until I exceed the maximum number of memory-mapped locations allowed by the operating system. Is there currently a way with this library to limit how quickly data is streamed in, based on the number of data events that have been successfully processed?
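
A general Node.js technique for this is to consume the stream with async iteration instead of 'data' events, so that the next record is pulled only after the previous one has been handled. The sketch below assumes the object returned by toStream() behaves like a standard Node.js Readable; that is not confirmed here, so Readable.from is used as a stand-in source. consumeSequentially is a hypothetical helper name.

```javascript
const { Readable } = require('stream');

// Pulls one record at a time; the next record is not requested until
// handleRecord has finished, so a slow consumer slows down reading.
async function consumeSequentially(readable, handleRecord) {
  let processed = 0;
  for await (const record of readable) {
    await handleRecord(record);
    processed += 1;
  }
  return processed;
}

// Stand-in for client.execute(sql).toStream() (assumption, see above).
const source = Readable.from([{ id: 1 }, { id: 2 }, { id: 3 }]);
const seen = [];
const done = consumeSequentially(source, async (record) => {
  // Simulate slow processing of each record.
  await new Promise((resolve) => setTimeout(resolve, 10));
  seen.push(record.id);
});
```

With a well-behaved Readable, the internal buffer fills only up to its highWaterMark while the consumer is busy, rather than growing without bound.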

Retrieving results after creating a CTAS table

I would like to fetch query results after creating a CTAS table with skipFetchResult = true.
I tried creating a new client with a different client config that omits the skipFetchResult setting.
However, even with the new client I still do not get any results.

CTAS results from a promise

Great library! I've been using it to execute and capture the results of a variety of Athena SQL commands. They all work apart from the CREATE TABLE AS.

When I execute the following, I get the error "NoSuchKey: The specified key does not exist.":

	// `client` is the object returned by createClient()
	const query = "CREATE TABLE newtable WITH (format='ORC') AS SELECT * FROM rawtable";
	client
		.execute(query)
		.toPromise()
		.then(result => {})
		.catch(error => {})
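
Since a CTAS statement produces no CSV result file, one option is to build a client config with skipFetchResult (the setting mentioned in the previous issue) enabled for such statements. The sketch below is an illustration under assumptions: isCtas and clientConfigFor are hypothetical helpers, and the regular expression is a rough heuristic rather than a SQL parser.

```javascript
// Heuristic: CREATE TABLE ... AS SELECT / AS WITH (optionally parenthesized).
function isCtas(sql) {
  return /^\s*CREATE\s+TABLE\b[\s\S]*?\bAS\s*\(?\s*(?:SELECT|WITH)\b/i.test(sql);
}

// Returns a copy of the base client config, adding skipFetchResult for CTAS.
function clientConfigFor(sql, baseConfig) {
  return isCtas(sql) ? { ...baseConfig, skipFetchResult: true } : { ...baseConfig };
}

const ctas = "CREATE TABLE newtable WITH (format='ORC') AS SELECT * FROM rawtable";
const ctasConfig = clientConfigFor(ctas, { bucketUri: 's3://xxxx' });
const selectConfig = clientConfigFor('SELECT 1', { bucketUri: 's3://xxxx' });
```

The resulting object would be passed as the first argument to createClient; the data written by the CTAS statement can then be read with a separate SELECT on the new table.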

Dataset with comma (,) being split incorrectly

I am hitting a problem during fetching of my results.

When a row contains a comma (,) inside a field, the buffer is split on that comma without accounting for the enclosing double quotes (").
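
For illustration, a quote-aware split of a single CSV line could look like the sketch below. It is not the library's parser: splitCsvLine is a hypothetical function, and it assumes one complete record per line, double-quoted fields, and a doubled quote ("") as the escape for a literal quote.

```javascript
// Splits one CSV line into fields, treating commas inside double quotes
// as part of the field and "" inside quotes as a literal quote character.
function splitCsvLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i += 1) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote
        i += 1;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ',') {
      fields.push(current); // field separator outside quotes
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

const fields = splitCsvLine('"a,b","c"');
```

Fields that contain line breaks inside quotes would need a parser that works across lines, which this sketch does not attempt.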

bucket cleanup

Does this clean up the bucket after the results are returned? If not, consider this a feature request!

How to parse the records?

Hi,

I'm executing a simple query from the README.md file:

 await client.execute('SELECT 1').toPromise()
      .then((data) => {
        console.log(data)
      })

The result is printed with a type name in front of each record: records: [ Row { _col0: '1' } ].
This isn't mentioned in the docs. How can I parse such a value in JavaScript?

{ records: [ Row { _col0: '1' } ],
      queryExecution:
       { QueryExecutionId: '1379ac84-4b2f-4629-a708-5eafd594d725',
         Query: 'SELECT 1',
         ResultConfiguration:
          { OutputLocation: 's3://........bucket/1379ac84-4b2f-4629-a708-5eafd594d725.csv' },
         QueryExecutionContext: { Database: 'default' },
         Status:
          { State: 'SUCCEEDED',
            SubmissionDateTime: 2018-08-23T12:12:52.706Z,
            CompletionDateTime: 2018-08-23T12:12:54.296Z },
         Statistics: { EngineExecutionTimeInMillis: 1285, DataScannedInBytes: 0 } } }
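
The "Row" prefix in that output is how console.log in Node.js displays an instance of a class named Row; the value itself is an ordinary object. The sketch below uses a stand-in Row class (the real class is internal to the library and may differ) to show that properties can be read directly and that rows can be copied into plain objects.

```javascript
// Stand-in for the library's Row class, for illustration only.
class Row {}

const data = {
  records: [Object.assign(new Row(), { _col0: '1' })],
};

// Read a column directly by its name.
const firstValue = data.records[0]._col0;

// Copy each row into a plain object, e.g. before serializing.
const plainRecords = data.records.map((row) => ({ ...row }));
const asJson = JSON.stringify(plainRecords);
```

Note that the value is the string '1', so a numeric column may still need an explicit Number(...) conversion.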

Return the query Id as well as its results

Hey, I would like to request that the query ID be added to the return value of the 'execute' method.
That would let us fetch the query results from Athena again without re-running the query (and would greatly reduce our bills).

Using client library with Angular

Hi,
I'm trying to use this client in an Angular project and am running into several issues.

const { athena } = require('athena-client');

@Injectable({
  providedIn: 'root'
})
export class AthenaTestGqService {

  clientConfig = {
    bucketUri: constants.aws.athena.tempLocation
  };

  awsConfig = {
    region: constants.aws.region
  };


  constructor() {

    const client = athena.createClient(this.clientConfig, this.awsConfig);

    client.execute('SELECT * FROM my_table LIMIT 20', function (err, data) {
      if (err) {
        console.log(err);
      }
      console.log(data);
    });

  }
}

The error this produces is

TypeError: Cannot read property 'createClient' of undefined
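
That error is consistent with the destructuring in the first line of the snippet: `const { athena } = require(...)` reads a property named athena from the module, and a missing property destructures to undefined. The sketch below reproduces the effect with a stand-in object. It assumes, based on the other examples in these issues, that the module exposes createClient at its top level.

```javascript
// Stand-in for require('athena-client'), exposing createClient at top level.
const fakeModule = {
  createClient: (clientConfig, awsConfig) => ({ clientConfig, awsConfig }),
};

// Destructuring a property that does not exist yields undefined, so
// athena.createClient(...) would throw the TypeError shown above.
const { athena } = fakeModule;

// Either keep the whole module object...
const athenaModule = fakeModule;
const clientA = athenaModule.createClient({ bucketUri: 's3://xxxx' }, { region: 'xxxx' });

// ...or destructure the function that is actually exported.
const { createClient } = fakeModule;
const clientB = createClient({ bucketUri: 's3://xxxx' }, { region: 'xxxx' });
```

This addresses only the reported TypeError; whether the library can run inside a browser-based Angular application is a separate question.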

I get UnhandledPromiseRejection warning when sql has error

Try executing this code:

return athenaUtil.createClient({ bucketUri: 'MY S3 OUTPUT' }, { region: process.env.AWS_REGION }).execute('BAD SQL STRING').toStream();

It emits "UnhandledPromiseRejectionWarning: InvalidRequestException: line 1:1: mismatched input 'BAD' expecting"

Error

I get the error "Athena is not a constructor". How can I fix it?

CSV files

When I use this package to query my Athena database, it keeps creating junk CSV files in my bucket.
How can I prevent that from happening?

Compatibility with AWS Lambda

Hi, I was just wondering which versions of node this is compatible with because AWS Lambda uses Node v6. I want to make sure that I can use this with it.

Boolean typing removed while reading data

Hi, I wanted to thank you for your work. Querying Athena from JS can be cumbersome and your library really helps.

We have encountered an issue with parsing results from Athena queries. In particular, if a Glue column is marked as boolean, the information is lost in translation when parsing the .csv results file.

Typing information is stored in a nearby .csv.metadata file, see:
https://docs.aws.amazon.com/athena/latest/ug/querying.html

I don't think the library parses any of the type information when reading the results from the CSV, which is mostly OK in JS: strings are still strings, and numbers are cast properly. But Boolean("false") === true, and this behaviour can cause very hard-to-track bugs (like the one that led me here).

I am opening this issue in case you are not aware of the problem.
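
The pitfall, and one possible application-side workaround, can be sketched as follows. parseAthenaBoolean is a hypothetical helper, and it assumes boolean columns arrive as the lowercase strings 'true' and 'false' in the CSV results.

```javascript
// Any non-empty string is truthy, so Boolean('false') is true.
const naive = Boolean('false');

// Hypothetical helper: maps the CSV text of a boolean column to a boolean,
// and returns null for anything else (for example an empty/NULL cell).
function parseAthenaBoolean(value) {
  if (value === true || value === 'true') return true;
  if (value === false || value === 'false') return false;
  return null;
}

const parsed = parseAthenaBoolean('false');
```

Applying such a helper requires knowing which columns are boolean, which is exactly the type information that the results CSV alone does not carry.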

Cheers

Global config issue

I spent a lot of time looking for a problem in my own code, but it turned out to be hidden in this line: https://github.com/KoteiIto/node-athena/blob/master/src/index.ts#L42

I am using AWS STS to access another AWS profile and run a request with temporary credentials, which I pass to the createClient function. After the call, those credentials are set globally, which breaks the rest of my AWS client calls (such as S3, where the default server settings must be used): they return Access Denied.
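
The difference between the reported behaviour and a client-scoped configuration can be shown with a toy model. Nothing below is the library's or the AWS SDK's code: sharedConfig merely stands in for a process-wide SDK configuration, and both factory functions are hypothetical.

```javascript
// Toy model only: stands in for a process-wide SDK configuration.
const sharedConfig = { credentials: 'default-role' };

// Pattern described in the issue: the factory mutates shared state,
// so every other client created later sees the temporary credentials.
function createClientGlobally(awsConfig) {
  Object.assign(sharedConfig, awsConfig);
  return { config: sharedConfig };
}

// Alternative: copy the defaults and apply the overrides to this client only.
function createClientScoped(awsConfig) {
  return { config: { ...sharedConfig, ...awsConfig } };
}

const scoped = createClientScoped({ credentials: 'temporary-sts' });
const afterScoped = sharedConfig.credentials; // shared state is untouched

createClientGlobally({ credentials: 'temporary-sts' });
const afterGlobal = sharedConfig.credentials; // shared state was overwritten
```

In the AWS SDK for JavaScript, the scoped pattern corresponds to passing options to an individual service client's constructor rather than updating the global configuration.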

concurrency limit

Thank you for providing this handy library!

I notice that you keep a local counter of calls, and will queue queries when the concurrency limit is hit.

Just curious why? Athena has its own way of queueing, and would not fail the queries either.

Thanks!

Yang
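
For readers unfamiliar with the pattern being asked about, a local counter with a queue can be sketched as below. This is a generic illustration, not the library's implementation: createLimiter is a hypothetical helper, and the limit of 2 is arbitrary.

```javascript
// Runs at most maxConcurrent tasks at once; further tasks wait in a queue.
function createLimiter(maxConcurrent) {
  let running = 0;
  const waiting = [];

  const startNext = () => {
    if (running >= maxConcurrent || waiting.length === 0) return;
    running += 1;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        running -= 1;
        startNext(); // a slot is free, start the next queued task
      });
  };

  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      startNext();
    });
}

// Five simulated queries through a limiter of 2; record the peak concurrency.
let active = 0;
let peak = 0;
const limit = createLimiter(2);
const simulatedQuery = () =>
  new Promise((resolve) => {
    active += 1;
    peak = Math.max(peak, active);
    setTimeout(() => {
      active -= 1;
      resolve('done');
    }, 5);
  });
const allDone = Promise.all([1, 2, 3, 4, 5].map(() => limit(simulatedQuery)));
```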

Support of CTAS query

Hello,

First of all, big thanks for your great work on that lib, we are using it and it works like a charm.

AWS has released an update a few weeks back for what they call CTAS queries:

https://docs.aws.amazon.com/athena/latest/ug/ctas.html

It offers great features, however, we hit a glitch using node-athena on that kind of query.

The "outputLocation" received in the response is not pointing to the csv result or the metadata file.
Instead, the response is s3://[bucket_name]/[path]/[query_id] without the file extension.

When running this kind of query with node-athena, we hit an error because the S3 key does not exist.

  1. The key s3://[bucket_name]/[path]/[query_id] doesn't exist.
  2. Only the .metadata file is generated.

We have been able to patch the lib so it can handle this kind of query.

We have a compiled working version available at:

https://github.com/jubry/node-athena

I'm not a coder, so I'm not claiming my patch is how this should be handled.

I just changed a few lines of code in one file (client.js):

  1. Adding a new property in the client config:
export interface AthenaClientConfig extends AthenaRequestConfig {
  pollingInterval?: number
  queryTimeout?: number
  concurrentExecMax?: number
  execRightCheckInterval?: number
  noResultExpected?: boolean
}

NB: noResultExpected?: boolean added

  2. Adding a condition for not trying to access the CSV file in that case:
      if (!config.noResultExpected) {
        const resultsStream = this.request.getResultsStream(
          queryExecution.ResultConfiguration.OutputLocation,
        )
        resultsStream.pipe(csvTransform)
      }
  

NB: if (!config.noResultExpected) { added

By doing that, we are able to keep using your library.

Would it be possible for you to implement this functionality properly?

It would be greatly appreciated.

Here is an SQL statement to reproduce the problem:

CREATE TABLE database.table WITH 
( format = 'PARQUET', 
external_location = 's3://mybucket_name/') 
AS SELECT 1 as myfield

Let me know if you have any questions about it.

Thanks in advance,

And again, congratulations on your great work.

Regards,

:)

What are the minimum required permissions to use this module?

What are the minimum required permissions to use this module?

Also, is it possible to have the credentials inferred from the role that is running the module, instead of hardcoding accessKeyId and secretAccessKey in the code that runs the query?
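
An earlier example in these issues passes only a region in awsConfig, which suggests that static keys are optional. When no keys are supplied, the AWS SDK for JavaScript normally resolves credentials through its default provider chain, which includes the IAM role of a Lambda function or EC2 instance. Assuming node-athena hands awsConfig to the SDK unchanged (not verified here), a config builder could simply leave the keys out. buildAwsConfig is a hypothetical helper.

```javascript
// Hypothetical helper: includes static keys only when both are provided,
// so that otherwise the SDK can fall back to its default credential chain.
function buildAwsConfig(region, credentials) {
  const awsConfig = { region };
  if (credentials && credentials.accessKeyId && credentials.secretAccessKey) {
    awsConfig.accessKeyId = credentials.accessKeyId;
    awsConfig.secretAccessKey = credentials.secretAccessKey;
  }
  return awsConfig;
}

// Role-based setup: no keys in code.
const roleBasedConfig = buildAwsConfig('ap-northeast-1');

// Explicit keys, e.g. for local testing (placeholder values).
const explicitConfig = buildAwsConfig('ap-northeast-1', {
  accessKeyId: 'xxxx',
  secretAccessKey: 'xxxx',
});
```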
