jshs2's Introduction

JSHS2

Introduction

JSHS2 is a Node.js client driver for HiveServer2. See test/PromiseTest.js for an example of how to use it. JSHS2 is a JavaScript rewrite based on pyhs2, with some features adapted to JavaScript (e.g. Promise support).

JSHS2 includes IDL (Interface Description Language) files, for example Thrift_0.9.2_Hive_1.1.0 in the idl directory.

Important

If you want to use JSHS2, you must change the HiveServer2 configuration. See Hive Security Configuration, then change the hive.server2.authentication value to NOSASL. If you hit a connection timeout or a query timeout, in most cases NOSASL mode has not been set, because HiveServer2 uses SASL mode by default.

<!-- In conf/hive-site.xml; restart HiveServer2 after changing the configuration -->

<configuration>
  <property>
    <name>hive.server2.authentication</name>
    <value>NOSASL</value>
  </property>
</configuration>
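To confirm the change before restarting HiveServer2, a small sketch can read the property back out of hive-site.xml. `readHiveProperty` and `isNoSasl` are hypothetical helpers, not part of jshs2, and the regular expression only handles the simple property/name/value layout shown above; it is not a general XML parser.

```javascript
// Hypothetical helper: return the value of a named property in hive-site.xml
// text, or undefined if the property is not set.
function readHiveProperty(xml, name) {
  const re = /<property>\s*<name>\s*([^<]+?)\s*<\/name>\s*<value>\s*([^<]*?)\s*<\/value>/g;
  let match;
  while ((match = re.exec(xml)) !== null) {
    if (match[1] === name) return match[2];
  }
  return undefined;
}

// Anything other than NOSASL (including an unset property, i.e. the default)
// means HiveServer2 expects a SASL transport, which jshs2 does not implement.
function isNoSasl(xml) {
  return readHiveProperty(xml, 'hive.server2.authentication') === 'NOSASL';
}

// Usage with an inline sample; in practice read conf/hive-site.xml with fs.readFileSync.
const sample = `
<configuration>
  <property>
    <name>hive.server2.authentication</name>
    <value>NOSASL</value>
  </property>
</configuration>`;

console.log(isNoSasl(sample)); // true
```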

What is the difference between JSHS2 and Hive JDBC via node-java?

Hive JDBC connects to Hive via node-java (which uses the C++ JNI interface). JDBC does not require NOSASL and is fully supported by the Apache community. However, JDBC connection and query execution are somewhat slow, because the node-java JNI interface is not fast and JDBC needs many loops. If you need to execute heavy queries, it may not suit your needs.

Breaking Change

JSHS2 0.4.0 was refactored for Node.js 7.x and is developed using ES2015 features, for example classes, destructuring, etc.

If you use a Node.js version below 7.x, use JSHS2 0.3.1.
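The version rule above is easy to check programmatically. This is a minimal sketch; `jshs2VersionFor` is a hypothetical helper that only encodes the Node.js 7.x cutoff stated here.

```javascript
// Hypothetical helper: map a Node.js version string (e.g. process.version)
// to the jshs2 release line recommended above.
function jshs2VersionFor(nodeVersion) {
  const major = Number(nodeVersion.replace(/^v/, '').split('.')[0]);
  // 0.4.0+ targets Node.js 7.x and later; older runtimes should stay on 0.3.1.
  return major >= 7 ? '>=0.4.0' : '0.3.1';
}

// Prints '>=0.4.0' on Node.js 7 or later, '0.3.1' on older versions.
console.log(jshs2VersionFor(process.version));
```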

I need help!

I also need your help: JSHS2 does not implement SASL, and I hope someone will add SASL support to this project. Contact [email protected] with questions.

Install

npm i jshs2 --save

Option

Code

const { HS2Util, Configuration, IDLContainer } = require('jshs2');

const options = {
  // Connection configuration
  auth: 'NOSASL',
  host: '101.102.103.104',           // HiveServer2 hostname
  port: '12340',                     // HiveServer2 port
  timeout: 10000,                    // Connection timeout
  username: 'jshs2tester',           // HiveServer2 user
  password: '',                      // HiveServer2 password
  hiveType: HS2Util.HIVE_TYPE.HIVE,  // HiveServer2 type (Hive or CDH Hive)
  hiveVer: '1.1.0',                  // HiveServer2 version
  thriftVer: '0.9.2',                // Thrift version at IDL compile time

  // Only needed for CDH Hive (hiveType: HS2Util.HIVE_TYPE.CDH)
  cdhVer: '5.3.0',

  // Cursor configuration
  maxRows: 5120,
  nullStr: 'NULL',
  i64ToString: true,
};

const configure = new Configuration(options);
const idl = new IDLContainer();

idl.initialize(configure).then(() => {
  // your code, ...
}).catch((err) => {
  console.error(err);
});

Description

Creates a new Configuration. It contains both connection configuration and cursor configuration.

  • auth - authentication mechanism. Currently only 'NOSASL' is supported.
  • host - HiveServer2 IP address or domain name
  • port - HiveServer2 port
  • timeout - timeout for the connect function
  • username - HiveServer2 username; probably the username recorded in the Hive server logs
  • password - HiveServer2 password
  • hiveType - Hive type: CDH Hive or vanilla Hive
  • hiveVer - HiveServer2 version
  • thriftVer - Thrift version used to compile the IDL
  • cdhVer - CDH version, if you use CDH
  • maxRows - fetch size
  • nullStr - string used in place of NULL column values. Default value is 'NULL'.
  • i64ToString - JavaScript numbers are floating point (IEEE 754 doubles), so i64 integer values beyond 2^53 - 1 cannot be represented exactly as numbers. If you enable this flag, i64 values are returned as strings.
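To see why i64ToString matters, here is a small standalone sketch (plain Node.js, no jshs2 needed) of the precision loss it is meant to avoid. The value is illustrative, and the sketch assumes that with the flag enabled an i64 arrives as a decimal string.

```javascript
// 2^53 + 1: a valid Hive BIGINT (i64) that a JavaScript number cannot hold exactly.
const hiveBigint = '9007199254740993';

const asNumber = Number(hiveBigint); // rounds to the nearest double, 9007199254740992
const asString = hiveBigint;         // exact digits, as with i64ToString: true

console.log(Number.isSafeInteger(asNumber));  // false
console.log(String(asNumber) === hiveBigint); // false: precision was lost
// If arithmetic is needed, a string converts losslessly to BigInt (Node.js 10.4+).
console.log(BigInt(asString) + 1n);           // 9007199254740994n
```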

Hive in CDH

The getLog function differs between vanilla Hive and Hive in CDH (CDH Hive). CDH Hive must support Hue, which displays query operation status, so Cloudera added a GetLog API to Hive.

If you use CDH Hive, set hiveType to CDH (HS2Util.HIVE_TYPE.CDH) before using the getLog function. Refer to Simple Usage.

Interface Description Language(IDL)

Hive supports the Thrift protocol, whose interface is described by IDL. jshs2 can use your own IDL (an IDL for your environment). The directories under idl are named by a simple rule:

  • Use Hive in CDH
    • /idl/Thrift_[Thrift version]_Hive_[Hive version]_CDH_[CDH version]
  • Use Vanilla Hive
    • /idl/Thrift_[Thrift version]_Hive_[Hive version]
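The naming rule can be expressed as a small function. `idlDirName` is a hypothetical helper (jshs2's internal lookup may differ); pass `cdhVer` only when you use CDH Hive.

```javascript
// Hypothetical helper: build an idl/ subdirectory name from version strings,
// following the naming rule above.
function idlDirName(thriftVer, hiveVer, cdhVer) {
  const base = `Thrift_${thriftVer}_Hive_${hiveVer}`;
  // The _CDH_ suffix is only used for Hive in CDH.
  return cdhVer ? `${base}_CDH_${cdhVer}` : base;
}

console.log(idlDirName('0.9.2', '0.13.1', '5.3.0')); // Thrift_0.9.2_Hive_0.13.1_CDH_5.3.0
console.log(idlDirName('0.9.3', '2.1.0'));           // Thrift_0.9.3_Hive_2.1.0
```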

IDLs included with and tested by JSHS2

  • Thrift_0.9.2_Hive_0.13.1_CDH_5.3.0
  • Thrift_0.9.2_Hive_1.0.0
  • Thrift_0.9.2_Hive_1.1.0
  • Thrift_0.9.2_Hive_1.2.0
  • Thrift_0.9.2_Hive_1.2.1
  • Thrift_0.9.2_Hive_2.0.0
  • Thrift_0.9.3_Hive_1.0.0
  • Thrift_0.9.3_Hive_1.1.0
  • Thrift_0.9.3_Hive_1.2.0
  • Thrift_0.9.3_Hive_1.2.1
  • Thrift_0.9.3_Hive_2.0.0
  • Thrift_0.9.3_Hive_2.1.0
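A quick way to see whether your Hive version has a bundled IDL is to parse the names listed above. `parseIdlDirName` and `bundledIdlsForHive` are hypothetical helpers; an empty result suggests you need a Custom IDL (for example, Hive 2.3.2 from one of the issues below is not in the list).

```javascript
// IDL directories listed above as included with jshs2.
const BUNDLED_IDLS = [
  'Thrift_0.9.2_Hive_0.13.1_CDH_5.3.0',
  'Thrift_0.9.2_Hive_1.0.0',
  'Thrift_0.9.2_Hive_1.1.0',
  'Thrift_0.9.2_Hive_1.2.0',
  'Thrift_0.9.2_Hive_1.2.1',
  'Thrift_0.9.2_Hive_2.0.0',
  'Thrift_0.9.3_Hive_1.0.0',
  'Thrift_0.9.3_Hive_1.1.0',
  'Thrift_0.9.3_Hive_1.2.0',
  'Thrift_0.9.3_Hive_1.2.1',
  'Thrift_0.9.3_Hive_2.0.0',
  'Thrift_0.9.3_Hive_2.1.0',
];

// Split a directory name back into its version parts (null if it does not match the rule).
function parseIdlDirName(name) {
  const m = /^Thrift_([\d.]+)_Hive_([\d.]+)(?:_CDH_([\d.]+))?$/.exec(name);
  return m ? { thriftVer: m[1], hiveVer: m[2], cdhVer: m[3] || null } : null;
}

// All bundled IDLs compiled for the given Hive version.
function bundledIdlsForHive(hiveVer) {
  return BUNDLED_IDLS.map(parseIdlDirName).filter((p) => p !== null && p.hiveVer === hiveVer);
}

console.log(bundledIdlsForHive('1.1.0').map((p) => p.thriftVer)); // [ '0.9.2', '0.9.3' ]
console.log(bundledIdlsForHive('2.3.2').length);                  // 0
```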

Custom IDL

Compile the interface file from Hive (for example, hive-0.13.1-cdh5.3.0/service/if). Then copy the output into the jshs2 idl directory, rename it according to the rule above, and specify the versions in your configuration. That's it!
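The copy-and-rename step can be scripted. This sketch assumes you already ran the Thrift compiler (for example `thrift --gen js:node TCLIService.thrift`, which writes into gen-nodejs/) and only handles copying; `installCustomIdl` is a hypothetical helper, and the exact files jshs2 expects inside each idl directory are not documented here, so compare with a bundled directory first.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Recursively copy a directory (mkdirSync recursive: Node.js 10.12+,
// readdirSync withFileTypes: Node.js 10.10+).
function copyDir(src, dest) {
  fs.mkdirSync(dest, { recursive: true });
  for (const entry of fs.readdirSync(src, { withFileTypes: true })) {
    const from = path.join(src, entry.name);
    const to = path.join(dest, entry.name);
    if (entry.isDirectory()) copyDir(from, to);
    else fs.copyFileSync(from, to);
  }
}

// Hypothetical helper: copy generated Thrift output into <idlRoot>/<dirName>.
function installCustomIdl(generatedDir, idlRoot, dirName) {
  const target = path.join(idlRoot, dirName);
  copyDir(generatedDir, target);
  return target;
}

// Demo in a temporary directory; the file is a placeholder, not real Thrift output.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'jshs2-idl-'));
const generated = path.join(tmp, 'gen-nodejs');
fs.mkdirSync(generated);
fs.writeFileSync(path.join(generated, 'TCLIService.js'), '// generated code goes here\n');

const installed = installCustomIdl(generated, path.join(tmp, 'idl'), 'Thrift_0.9.3_Hive_2.3.2');
console.log(fs.readdirSync(installed)); // [ 'TCLIService.js' ]
```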

Test

# Create cluster.json, then modify the values for your environment
$ cp cluster.json.sample cluster.json

$ npm run test     # without debug message
$ npm run test:msg # with debug message
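Before running the tests, a quick check of cluster.json can save a long timeout. The helper below is hypothetical, not part of jshs2: it picks the profile named by `use` (the same `config[config.use]` lookup the example below performs) and reports missing fields. The required-key list is an assumption based on the options shown earlier.

```javascript
const fs = require('fs');

// Assumed set of connection fields, taken from the options in this README.
const REQUIRED_KEYS = ['auth', 'host', 'port', 'timeout', 'username', 'hiveType', 'hiveVer', 'thriftVer'];

// Hypothetical helper: return the active profile or throw a descriptive error.
function selectProfile(config) {
  const profile = config[config.use];
  if (!profile) {
    throw new Error(`cluster.json: no profile named "${config.use}"`);
  }
  const missing = REQUIRED_KEYS.filter((key) => profile[key] === undefined);
  if (missing.length > 0) {
    throw new Error(`cluster.json: profile "${config.use}" is missing ${missing.join(', ')}`);
  }
  return profile;
}

// Usage: only runs when cluster.json exists in the current directory.
if (fs.existsSync('./cluster.json')) {
  const profile = selectProfile(JSON.parse(fs.readFileSync('./cluster.json', 'utf8')));
  console.log(`Using ${profile.host}:${profile.port} (Hive ${profile.hiveVer})`);
}
```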

Example

const fs = require('fs');
const co = require('co');
const debug = require('debug')('jshs2:PromiseTest');
const jshs2 = require('../index.js'); // outside this repository: require('jshs2')

const HS2Util = jshs2.HS2Util;
const IDLContainer = jshs2.IDLContainer;
const HiveConnection = jshs2.HiveConnection;
const Configuration = jshs2.Configuration;

// Read the cluster.json created in the Test step and pick the profile named by "use"
const config = JSON.parse(fs.readFileSync('./cluster.json', 'utf8'));
const cluster = config[config.use];
const options = {};

options.auth = cluster.auth;
options.host = cluster.host;
options.port = cluster.port;
options.timeout = cluster.timeout;
options.username = cluster.username;
options.password = cluster.password;
options.hiveType = cluster.hiveType;
options.hiveVer = cluster.hiveVer;
options.cdhVer = cluster.cdhVer;
options.thriftVer = cluster.thriftVer;

options.maxRows = cluster.maxRows;
options.nullStr = cluster.nullStr;
options.i64ToString = cluster.i64ToString;

co(function* coroutine() {
  // Load the IDL, open a session, and get a cursor (connect() resolves to the cursor)
  const configuration = new Configuration(options);
  const idl = new IDLContainer();
  yield idl.initialize(configuration);

  const connection = new HiveConnection(configuration, idl);
  const cursor = yield connection.connect();
  const serviceType = idl.serviceType;

  const execResult = yield cursor.execute(config.Query.query);

  // Poll the operation status every 5 seconds, at most 1000 times
  for (let i = 0, len = 1000; i < len; i += 1) {
    const status = yield cursor.getOperationStatus();
    const log = yield cursor.getLog();

    debug('wait, status -> ', HS2Util.getState(serviceType, status));
    debug('wait, log -> ', log);

    if (HS2Util.isFinish(cursor, status)) {
      debug('Status -> ', status, ' -> stop waiting');

      break;
    }

    yield HS2Util.sleep(5000);
  }

  debug('execResult -> ', execResult);

  let fetchResult;
  if (execResult.hasResultSet) {
    const schema = yield cursor.getSchema();

    debug('schema -> ', schema);

    fetchResult = yield cursor.fetchBlock();

    debug('first row ->', JSON.stringify(fetchResult.rows[0]));
    debug('rows ->', fetchResult.rows.length);
    debug('hasMoreRows ->', fetchResult.hasMoreRows);
  }

  // Release the operation and the session; Promise.resolve() accepts either
  // a Promise or a plain return value from close()
  yield Promise.resolve(cursor.close());
  yield Promise.resolve(connection.close());

  return {
    hasResultSet: execResult.hasResultSet,
    rows: (execResult.hasResultSet) ? fetchResult.rows : [],
  };
}).then((data) => {
  console.log(data);
}).catch((err) => {
  debug('Error occurred:');
  debug(`message:  ${err.message}`);
  debug(`stack:  ${err.stack}`);
});

jshs2's People

Contributors

andrewcpacifico, imjuni

jshs2's Issues

Cannot get test to pass

I keep getting the error below in the HiveServer2 log:

2018-04-10 15:58:19,848 ERROR [HiveServer2-Handler-Pool: Thread-41]: server.TThreadPoolServer (TThreadPoolServer.java:run(297)) - Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Invalid status -128
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: Invalid status -128
	at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
	at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:184)
	at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
	at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
	... 4 more

I use CDH 5.14, installed via yum, so it comes with Hive 1.1.0. Is version incompatibility the problem?

I also tried the ApacheHive profile, and it does not work either:

{
  "use": "HiveOnCDH",
  "HiveOnCDH": {
    "auth": "NOSASL",
    "host": "127.0.0.1",
    "port": 10000,
    "timeout": 10000,
    "username": "hive",
    "password": "hive",
    "hiveType": 1,
    "hiveVer": "0.13.1",
    "cdhVer": "5.3.0",
    "thriftVer": "0.9.2",
    "maxRows": 5120,
    "nullStr": "NULL",
    "i64ToString": true
  },
  "ApacheHive": {
    "auth": "NOSASL",
    "host": "127.0.0.1",
    "port": 10000,
    "timeout": 50000,
    "username": "hive",
    "password": "hive",
    "hiveType": 0,
    "hiveVer": "1.1.0",
    "thriftVer": "0.9.3",
    "maxRows": 5120,
    "nullStr": "NULL",
    "i64ToString": true
  },
  "Query": {
    "query": "show databases"
  }
}

Is this project still active? Could we get an update?

The thrift dependency is bloated

The npm thrift project is not packaged for library use, and it requires about 30 megs of space. It's fine for devDependencies but should not be used for production (distribution) builds.

This is just a copy of the whole thrift repo, basically:
https://www.npmjs.com/package/thrift

That's overkill, when we only need the JS thrift client.

org.apache.thrift.transport.TTransportException: Invalid status -128

I can't connect to our HiveServer2. Looking into the server log, I found org.apache.thrift.transport.TTransportException: Invalid status -128. Here is my code:

const options = {
  auth: "NOSASL",
  host: "my host",
  port: 10000,
  timeout: 10000,
  username: "my username",
  password: "my password",
  hiveType: HS2Util.HIVE_TYPE.CDH,
  hiveVer: "0.13.1",
  thriftVer: "0.9.0",
  cdhVer: "5.3.3"
};

it('test', function() {
  var configuration = new Configuration(options);
  var idl = new IDLContainer();
  var cursor;
  return idl.initialize(configuration).then(function() {
    var connection = new HiveConnection(configuration, idl);
    return connection.connect();
  }).then(function(_cursor) {
    cursor = _cursor;
    return cursor.execute(options.query);
  }).then(function() {
    promise.delay(2000);
    logger.log('info', cursor.getOperationStatus());
  }).catch(function(error) {
    throw error;
  });
});

Server log:
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Invalid status -128
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: Invalid status -128
at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:230)
at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:184)
at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:262)
at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
... 4 more

How can I know the query progress?

How long does a query take?
I want to monitor the state of the query, including elapsed time and progress,
such as:
A = the number of tasks finished so far
B = the total number of tasks
percent C = A / B
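The percentage C = A / B is simple arithmetic once the task counts are known. How to obtain A and B from HiveServer2 is not covered here, so `progressPercent` and `formatProgress` below are hypothetical helpers that take the counts and timestamps as plain inputs.

```javascript
// Hypothetical helper: C = A / B as a percentage, rounded to one decimal place.
function progressPercent(finished, total) {
  if (total <= 0) return 0; // nothing scheduled yet
  return Math.round((finished / total) * 1000) / 10;
}

// Hypothetical helper: combine progress with elapsed time (timestamps in ms).
function formatProgress(finished, total, startMs, nowMs = Date.now()) {
  const seconds = Math.round((nowMs - startMs) / 1000);
  return `${progressPercent(finished, total)}% (${finished}/${total} tasks, ${seconds} s elapsed)`;
}

console.log(progressPercent(3, 8));           // 37.5
console.log(formatProgress(3, 8, 0, 12000));  // 37.5% (3/8 tasks, 12 s elapsed)
```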

example code in "Simple Usage"

Hi,
3 question about the example code:

  • where is the jshs2 variable from jshs2 = require('jshs2') used?
  • where does the conn variable get its initial value?
  • have you ever tried that code? ;)

Solving the getLog problem

In previous versions of Hive (0.12.x and earlier), only CDH Hive has a GetLog function.

So jshs2 detects whether GetLog exists using hiveType.

Recent versions of Hive (1.0.0 and higher) have an alternative method, FetchResults. Changing the fetchType option from 0 to 1 has the same effect as GetLog.

jshs2 will be changed so that, when the Hive version is 1.0.0 or higher, getLog uses FetchResults with fetchType 1.
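The rule described in this issue can be sketched as follows. This is a hypothetical illustration, not jshs2's actual code: `logStrategy` and `compareVersions` are made up here, `HIVE_TYPE` assumes the 0/1 `hiveType` values used in the cluster.json above, and vanilla Hive between 0.13 and 1.0 (not covered by the description) is treated like older releases.

```javascript
// Assumption: mirrors the numeric hiveType values seen in cluster.json (0 = Hive, 1 = CDH).
const HIVE_TYPE = { HIVE: 0, CDH: 1 };

// Compare dotted version strings numerically ('1.10.0' > '1.2.0').
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i += 1) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Hypothetical: choose how query logs could be fetched for a given server.
function logStrategy(hiveType, hiveVer) {
  if (compareVersions(hiveVer, '1.0.0') >= 0) {
    return { call: 'FetchResults', fetchType: 1 }; // fetchType 1 returns log rows
  }
  if (hiveType === HIVE_TYPE.CDH) {
    return { call: 'GetLog' }; // Cloudera's extra API
  }
  return null; // no log API available
}

console.log(logStrategy(HIVE_TYPE.CDH, '0.13.1')); // { call: 'GetLog' }
console.log(logStrategy(HIVE_TYPE.HIVE, '1.2.1')); // { call: 'FetchResults', fetchType: 1 }
```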

I am facing a timeout issue when I run the tests

Below are the details. I have increased the timeout to 60000 ms, but I still get the same issue:
node_modules/.bin/mocha test/PromiseTest.js --timeout 60000
ThriftDriverTest
1) "before all" hook
2) "after all" hook
0 passing (1m)
2 failing

  1. ThriftDriverTest "before all" hook:
    Error: Timeout of 60000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves.

  2. ThriftDriverTest "after all" hook:
    Uncaught AssertionError: expected [TypeError: Cannot read property 'close' of undefined] to not exist
    at Immediate.setImmediate (test/PromiseTest.js:80:27)

My config has the following properties:
"hiveVer": "0.13.1",
"cdhVer": "5.3.0",
"thriftVer": "0.9.2",

Not able to connect to db

I am using Node 6.1.0 and jshs2 0.4.4.

Here is my code:

var serverConf = require("jshs2");

var Configuration = serverConf.Configuration;
var HiveConnection = serverConf.HiveConnection;
var IDLContainer = serverConf.IDLContainer;

var options = {
  "auth": 'NOSASL',
  "host": 'myserver',
  "port": 10000,
  "timeout": 100,
  "username": 'abc',
  "password": 'abc'
}

const hiveConfig = new Configuration(options);
const idl = new IDLContainer();

function main() {
  idl.initialize(hiveConfig).then(function(data) {
    var connection = new HiveConnection(hiveConfig, idl);
    var cursor = connection.connect().then(function() {
      console.log("success");
      // var res = cursor.execute('SELECT * FROM employees LIMIT 10');

      // if (res.hasResultSet) {
      //   cursor.fetchBlock().then(function(fetchResult) {
      //     for (var i = 0; i < fetchResult.rows.length; i++) {
      //       console.log(fetchResult.rows[i])
      //     }
      //   })
      // }

      // cursor.close();
      // connection.close();
    }, function() {
      console.log("error");
    });
  })
}

main();

All I am getting in console is either 'error' or nothing.
Any idea what will be the problem?

test script is not working.

Hi, your test code is not working.
I need your help; I have posted the error message below. Hive and Hadoop are installed locally.

Environment:
- node version: v9.5.0
- npm: 5.6.0
- hadoop: apache 2.9.0
- hive: apache 2.3.2

I selected the ApacheHive config and entered Hive version 2.1.1 and Thrift version 0.9.3.

[image: error message screenshot]

try cursor.execute(query), but execute is not a function?

Hi, I'm trying to use this module to connect to a HiveServer2 DB, and I have tried the code below.

const options = {
  // Connection configuration
  auth: 'NOSASL',
  host: 'myhost',                    // HiveServer2 hostname
  port: '10000',                     // HiveServer2 port
  username: 'root',                  // HiveServer2 user
  password: '12345',                 // HiveServer2 password
  hiveType: HS2Util.HIVE_TYPE.HIVE,  // HiveServer2 type, (Hive or CDH Hive)
  hiveVer: '2.1.0',                  // HiveServer2 Version
  thriftVer: '0.9.3',                // Thrift version at IDL Compile time
};

let connection;
let cursor;
let serviceType;

const configuration = new Configuration(options);
const idl = new IDLContainer();

idl.initialize(configuration).then(() => {
  // your code, ...
  connection = new HiveConnection(configuration, idl);
  cursor = connection.connect();
  serviceType = idl.serviceType;
  console.log(connection);
  console.log(cursor);
  console.log(serviceType);
  cursor.execute('SHOW DATABASES').then((err, result) => {
    console.log('check if run this part');
    console.log(err);
    console.log(result);
  })
});

but I received an error saying execute is not a function. I think I misunderstood the README example; I wrote this code based on the README and the tests in this module. I hope to get some help, thank you.
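A likely cause, sketched with hypothetical stand-ins (`FakeConnection`, `FakeCursor`) rather than the real jshs2 classes: in the earlier issue's code, `connect()` is used as a Promise that resolves to the cursor, so the cursor has to be awaited (or taken from `.then`) before calling `execute`. Also, a `.then()` callback receives a single resolved value, so `(err, result) => ...` would put the result in `err`. The sketch needs async/await (Node.js 7.6+).

```javascript
// Hypothetical stand-ins that mimic the shape of the calls, not jshs2 itself.
class FakeCursor {
  execute(query) {
    return Promise.resolve({ hasResultSet: true, query });
  }
}

class FakeConnection {
  connect() {
    return Promise.resolve(new FakeCursor()); // resolves to a cursor
  }
}

async function main() {
  const connection = new FakeConnection();

  const notACursor = connection.connect();  // a Promise, not a cursor
  console.log(typeof notACursor.execute);    // undefined, hence "execute is not a function"

  const cursor = await connection.connect(); // wait for the cursor itself
  const result = await cursor.execute('SHOW DATABASES');
  console.log(result.hasResultSet);          // true
  return result;
}

main().catch((err) => console.error(err));
```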

why no response???

Port: 10000,
Servers: [ '192.168.100.125' ],
username: 'hadoop',
query: 'show databases' } }

{ auth: 'NOSASL',
host: '192.168.100.125',
port: 10000,
timeout: 10000,
username: 'hadoop',
hiveVer: '1.2.0',
thriftVer: '0.9.2',
hiveType: 0,
maxRows: 5120,
nullStr: 'NULL' }
jshs2:configure port, : +0ms 10000
jshs2:configure timeout, : +4ms 10000
jshs2:configure hiveType, : +2ms 0
jshs2:configure hiveVer, : +2ms 1.2.0
jshs2:configure thirftVer, : +3ms 0.9.2
jshs2:configure cdhVer, : +1ms null
jshs2:CConnection OpenSession function start, -> +27ms 192.168.100.125
jshs2:CConnection OpenSession function start, -> +2ms 10000
jshs2:CConnection Initialize, username -> +6ms hadoop
jshs2:CConnection Initialize, password -> +1ms
jshs2:CConnection OpenSession request start, +1ms { client_protocol: 7,
username: 'hadoop',
password: '',
configuration: null }
..
.
.
.
no response.....

why?
