fredo-xvii / hive

HIVE

Anything Hive

Cheatsheets:

http://www.kdnuggets.com/2015/07/good-data-science-machine-learning-cheat-sheets.html

Coding

CREATE TABLE / TABLE MANIPULATION:
- https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
MANUAL:
- https://cwiki.apache.org/confluence/display/Hive/LanguageManual
FUNCTIONS:
- https://www.qubole.com/resources/cheatsheet/hive-function-cheat-sheet/

Settings for ODBC Connection:

ADD TEZ SETTINGS:
- https://community.hortonworks.com/content/supportkb/49486/how-to-set-hiveexecutionengine-using-hortonworks-h.html

Hive metadata

use sys; query string
use information_schema; query string
create materialized view table_name
constraints:
  primary key (id) disable novalidate, constraint var foreign key (var) references table(id) disable novalidate
create table: var default surrogate_key()
resource plan + caching:
  create resource plan plan_name;
  create pool plan_name.a with alloc_fraction = 0.8, query_parallelism = 5;
  create pool plan_name.b with alloc_fraction = 0.2, query_parallelism = 20;
  create rule ... ;
  add rule downgrade to a;
  create application mapping ...
  alter plan plan_name ... ;
  apply plan plan_name;
  options: hive.query.results.cache.enabled=true (on by default)
  the query results cache works on managed tables
  cached results are stored in the /tmp/ folder
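The pool settings in the resource plan above can be sanity-checked with a little arithmetic. This is a minimal sketch in plain Python, not a Hive API. It assumes that a pool's `alloc_fraction` is split evenly across its `query_parallelism` slots when the pool is full, which is a simplification of how the scheduler may actually behave.

```python
# Back-of-the-envelope arithmetic for pools a and b (not a Hive API).
# Assumption: a pool's share is split evenly across its parallel query slots.

pools = {
    "a": {"alloc_fraction": 0.8, "query_parallelism": 5},
    "b": {"alloc_fraction": 0.2, "query_parallelism": 20},
}


def per_query_share(pool):
    """Fraction of the cluster one query gets when the pool is fully occupied."""
    return pool["alloc_fraction"] / pool["query_parallelism"]


# The fractions of all pools in a plan should add up to the whole cluster.
total_fraction = sum(p["alloc_fraction"] for p in pools.values())
shares = {name: per_query_share(p) for name, p in pools.items()}

for name, share in shares.items():
    print(f"pool {name}: about {share:.2%} of the cluster per query")
```

Under that assumption, pool a favours a few heavy queries while pool b spreads a small share across many light ones.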
Spark-Hive connect
  options:
    hive.executeUpdate("sql String") : Bool
    df.write.format(HIVE_WAREHOUSE_CONNECTOR).save(): writes ORC files from Spark
    df.select(...).filter(...)
    streaming: df.writeStream.format(STREAM_TO_STREAM).start()

Hive Arrays

explode: lateral view explode(array_string_field) x as field_name
array_contains(array_field, value): true if the array contains the value
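
To make the row-multiplying effect of `lateral view explode` concrete, here is a minimal plain-Python sketch (not Hive code) of what such a query produces: one output row per array element, with the other columns repeated. The rows and the column names `id`, `tags` and `tag` are made up for illustration.

```python
# Plain-Python illustration of LATERAL VIEW explode semantics (not Hive itself).
# The rows and the column names "id", "tags" and "tag" are hypothetical.

def lateral_view_explode(rows, array_col, out_col):
    """Yield one row per element of row[array_col], copying the other columns."""
    for row in rows:
        for element in row[array_col]:
            exploded_row = dict(row)
            exploded_row[out_col] = element
            yield exploded_row


def array_contains(values, item):
    """Mirror Hive's array_contains(array, value): True if item is in the array."""
    return item in values


rows = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": ["c"]},
    {"id": 3, "tags": []},
]

exploded = list(lateral_view_explode(rows, "tags", "tag"))
matching_ids = [r["id"] for r in rows if array_contains(r["tags"], "c")]
```

The row with `id` 3 produces no output because its array is empty, which matches `lateral view` without the `outer` keyword.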

Map Field

key-value pairs concept
maps can be exploded (into a key column and a value column).

Struct fields

must follow a specific schema, unlike maps
use "dot" notation to access fields
create table:
  insert overwrite table
  select *, named_struct(   -- use map() for a map field
      'tiny_int.field_c', tinyint_field,
      'tiny_int', tinyint_field
  ) from
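
The difference between a struct (fixed schema, dot access) and a map (free-form keys) can be illustrated with a small plain-Python analogy. This is only a sketch: the `named_struct` helper below imitates the alternating name/value argument style of the Hive function, and the field names `x`, `y` and `z` are hypothetical.

```python
# Plain-Python analogy for Hive struct vs map columns (not Hive itself).
# The field names "x", "y" and "z" are hypothetical.
from collections import namedtuple


def named_struct(*args):
    """Imitate named_struct(name1, val1, name2, val2, ...) with a namedtuple."""
    if len(args) % 2 != 0:
        raise ValueError("named_struct expects alternating names and values")
    names = args[0::2]
    values = args[1::2]
    struct_type = namedtuple("Struct", names)
    return struct_type(*values)


point = named_struct("x", 1, "y", 2)  # struct: exactly the fields x and y
lookup = {"x": 1, "y": 2}             # map: keys are free-form
lookup["z"] = 3                       # adding a new key to a map is fine

try:
    point.z                           # a struct has no field outside its schema
    missing_field_error = False
except AttributeError:
    missing_field_error = True
```

Fields of the struct are read with dot notation (`point.x`), while map entries are looked up by key (`lookup["z"]`).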

JSON

get_json_object(json_col, "$.field[0].name") as var1
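
To show what the path `$.field[0].name` selects, here is a minimal plain-Python sketch that walks the same path through a JSON string. It handles only `.key` and `[index]` steps, returns Python values rather than Hive strings, and the sample JSON document is invented for illustration.

```python
# Plain-Python illustration of the path "$.field[0].name" (not Hive itself).
# The JSON document below is hypothetical.
import json
import re


def get_json_object(json_text, path):
    """Resolve a small JSONPath subset: "$", ".key" and "[index]" steps only.

    Returns None when the JSON is invalid or the path cannot be resolved,
    similar in spirit to Hive returning NULL.
    """
    try:
        current = json.loads(json_text)
    except (TypeError, ValueError):
        return None
    if not path.startswith("$"):
        return None
    # Split the rest of the path into ".key" and "[index]" steps.
    steps = re.findall(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]", path[1:])
    for key, index in steps:
        try:
            current = current[key] if key else current[int(index)]
        except (KeyError, IndexError, TypeError):
            return None
    return current


json_col = '{"field": [{"name": "first"}, {"name": "second"}]}'
var1 = get_json_object(json_col, "$.field[0].name")
```

Here `field` is an array, `[0]` picks its first element, and `.name` reads that element's `name` attribute.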

Partition field

Partition columns are always the last columns of the table.
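
A small plain-Python sketch of that ordering rule: data columns come first, partition columns last, and each partition maps to a `col=value` directory. The column names and the sample row are hypothetical, and the helper functions are illustrations rather than part of any Hive client.

```python
# Plain-Python illustration of partition column ordering (not Hive itself).
# The column names and the sample row are hypothetical.

def order_for_partitioned_insert(columns, partition_cols):
    """Return data columns first, then partition columns in their declared order."""
    data_cols = [c for c in columns if c not in partition_cols]
    return data_cols + list(partition_cols)


def partition_path(row, partition_cols):
    """Build the col=value directory suffix used for a partition."""
    return "/".join(f"{c}={row[c]}" for c in partition_cols)


columns = ["year", "id", "month", "amount"]
partition_cols = ["year", "month"]

ordered = order_for_partitioned_insert(columns, partition_cols)
path = partition_path({"id": 7, "amount": 9.5, "year": 2020, "month": 1}, partition_cols)
```

Selecting columns in the `ordered` sequence keeps the partition values in the trailing positions when writing into a partitioned table.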

UDF - create temporary functions

collect() - aggregates data into a single row, e.g. collect(map())
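
The idea of collapsing many rows into one row per key can be sketched in plain Python. This is an analogy only, similar in spirit to Hive's built-in `collect_list` aggregate; the column names `user` and `page` and the sample rows are hypothetical.

```python
# Plain-Python illustration of a collect-style aggregation (not Hive itself).
# The column names "user" and "page" are hypothetical.
from collections import defaultdict


def collect_by_key(rows, key_col, value_col):
    """Group rows by key_col and gather value_col into one list per key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_col]].append(row[value_col])
    return dict(groups)


rows = [
    {"user": "u1", "page": "home"},
    {"user": "u2", "page": "cart"},
    {"user": "u1", "page": "cart"},
]

collected = collect_by_key(rows, "user", "page")
```

Each key ends up with a single entry holding all of its values, which is the "put data into 1 row" effect described above.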
