miyakawataku / pig
License: Apache License 2.0
Apache Pig
===========
Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program consists of a directed acyclic graph where each node represents an operation that transforms data. Operations are of two flavors: (1) relational-algebra style operations such as join, filter, project; (2) functional-programming style operators such as map, reduce.

Pig compiles these dataflow programs into (sequences of) map-reduce jobs and executes them using Hadoop. It is also possible to execute Pig Latin programs in a "local" mode (without Hadoop cluster), in which case all processing takes place in a single local JVM.

General Info
===============
For the latest information about Pig, please visit our website at:

   http://pig.apache.org/

and our wiki, at:

   http://wiki.apache.org/pig/

Getting Started
===============
1. To learn about Pig, try http://wiki.apache.org/pig/PigTutorial
2. To build and run Pig, try http://wiki.apache.org/pig/BuildPig and http://wiki.apache.org/pig/RunPig
3. To check out the function library, try http://wiki.apache.org/pig/PiggyBank

Contributing to the Project
===========================
We welcome all contributions. For the details, please visit http://wiki.apache.org/pig/HowToContribute.
LOAD 'data' as (f1:int, f2:tuple(t1:int,t2:int,t3:int));
It should be
A = LOAD 'data' as (f1:int, f2:tuple(t1:int,t2:int,t3:int));
in udf.xml
basic.html#sample says that
Partitions a relation into two or more relations.
That is incorrect: this sentence describes SPLIT, not SAMPLE.
at func.xml
in basic.xml
basic.html#projection says "In this example the asterisk (*) is used to project all tuples from relation A".
in func.xml
in udf.xml
basic.html#types-table-negative should be nested under #sign.
But it is not.
Typo: COUNT_STAR is used the count -> COUNT_STAR is used to count
On basic.html#define-udfs
At func.xml
On basic.html#define-udfs.
At func.xml
LAST_INDEX_OF(expression) -> LAST_INDEX_OF(string, 'character', startIndex)
in func.xml
basic.xml defines the syntax of tuple schemas:
alias[:tuple] (alias[:type]) [, (alias[:type]) …] )
Syntax of bag schemas:
alias[:bag] {tuple}
Syntax of map schemas:
alias<:map> [ <type> ]
Colons ( : ) leading "tuple", "bag", "map" are described as optional, but actually they are not.
In basic.html#GROUP
It's wrong.
The DESCRIBE operator shows the schema for relation X, which has two fields, "group" and "A"
That should be
The DESCRIBE operator shows the schema for relation X, which has three fields, "group", "A", and "B"
in func.xml
basic.xml says "Multiple fields are enclosed in parentheses and separated by commas" but actually fields are not enclosed in parentheses.
When a whole regex pattern matches but a specific group does not match, what value is contained in the corresponding fields of the result tuple?
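The question can be made concrete with an analogy. The sketch below uses Python's `re` module, not Pig: `extract_all` is a hypothetical helper, and the pattern and inputs are made up for illustration. It only shows the situation being asked about (the whole pattern matches while an optional group does not participate); whether Pig fills the corresponding tuple field with null is an assumption that the documentation would need to confirm.

```python
import re


def extract_all(text, pattern):
    """Hypothetical stand-in for a whole-string 'extract all groups' call.

    Returns a tuple with one entry per capturing group, or None when the
    pattern does not match the whole string.
    """
    match = re.fullmatch(pattern, text)
    if match is None:
        return None
    # In Python, groups that did not participate in the match come back as None.
    return match.groups()


# The second group is optional, so it may be skipped while the pattern still matches.
PATTERN = r"(\d+)(?:-(\w+))?"

print(extract_all("123-abc", PATTERN))  # ('123', 'abc')
print(extract_all("123", PATTERN))      # ('123', None)
print(extract_all("abc", PATTERN))      # None
```

In this model an unmatched group yields a placeholder (`None`) so that field positions stay stable; the documentation should state what Pig puts in that position.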
basic.xml says "the simplest tuple expression is the star expression, which represents all fields."
Is it correct? I think a star expression "*" represents all fields, and "(*)" represents a tuple which contains all fields.
The Japanese version now translates the phrase literally.
The description says:
Compares two fields in a tuple.
Actually, the function compares two bag arguments and returns a bag of tuples which are included in only one argument.
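The behavior described above can be modeled in a few lines. This is a minimal Python sketch, not Pig's implementation: the function name `diff` and the sample bags are hypothetical, bags are modeled as lists of tuples, and duplicates are collapsed by using sets, which is an assumption about how repeated tuples should be treated.

```python
def diff(bag1, bag2):
    """Return the tuples that appear in exactly one of the two bags.

    Bags are modeled as lists of tuples. Duplicates are collapsed, which is
    an assumption of this sketch rather than documented behavior.
    """
    only_in_one = set(bag1) ^ set(bag2)  # symmetric difference
    return sorted(only_in_one)           # sorted only to make output stable


b1 = [(1, 2), (3, 4)]
b2 = [(3, 4), (5, 6)]
print(diff(b1, b2))  # [(1, 2), (5, 6)]
print(diff(b1, b1))  # []
```

The point for the documentation is that both arguments are bags and the result is a bag, so "compares two fields in a tuple" undersells what the function takes and returns.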
On basic.html#load.
This text has no link.
Schemas for Complex Data Types and Schemas for Multiple Types.
On basic.html#ship-about
Strange.
in basic.xml
basic.xml says "If the schema of a relationship can't be inferred", but it should be "If the schema of a relation can't be inferred".
basic.xml says "A map key must be a scalar", but actually a map key must be a chararray.
Typo: deliminters -> delimiters
in func.xml
in the heading of the result table of ROUND function.
in func.xml
in basic.xml
What occurs for no tuples?
In this example tuples are co-grouped and the INNER keyword is used asymmetrically on only one of the relations...
On basic.html#register
Outer joins only work for two relations.
in func.xml
In "Syntax" subsection of REGEX_EXTRACT_ALL in func.xml
T/O
IntMax example lacks 'max' method, in udf.xml.
That method is called, but not defined.
basic.html says about the argument of MAX:
An expression with data types int, long, float, double, or chararray.
But actually, MAX takes a bag of int/long/.....
basic.xml says
When using the GROUP (COGROUP) operator with multiple relations, records with a null group key are considered different and are grouped separately.
However, nulls in a single relation are grouped altogether.
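The difference between the two cases can be illustrated with a small model. The Python sketch below is not Pig code: `group` and `cogroup` are hypothetical helpers, relations are lists of tuples, and `None` stands in for null. It assumes the behavior stated above, namely that null keys within one relation fall into one group, while null keys from different relations are never matched with each other.

```python
from collections import defaultdict


def group(relation, key_index=0):
    """Model GROUP on one relation: rows with a null key share one group."""
    groups = defaultdict(list)
    for row in relation:
        groups[row[key_index]].append(row)
    return dict(groups)


def cogroup(rel_a, rel_b, key_index=0):
    """Model COGROUP on two relations.

    Non-null keys are matched across relations. Null keys are kept apart:
    each relation's null-keyed rows form their own group.
    """
    keys = []
    for row in rel_a + rel_b:
        key = row[key_index]
        if key is not None and key not in keys:
            keys.append(key)

    result = []
    for key in keys:
        rows_a = [r for r in rel_a if r[key_index] == key]
        rows_b = [r for r in rel_b if r[key_index] == key]
        result.append((key, rows_a, rows_b))

    nulls_a = [r for r in rel_a if r[key_index] is None]
    nulls_b = [r for r in rel_b if r[key_index] is None]
    if nulls_a:
        result.append((None, nulls_a, []))
    if nulls_b:
        result.append((None, [], nulls_b))
    return result


A = [(1, "x"), (None, "y"), (None, "z")]
B = [(1, "p"), (None, "q")]

print(group(A)[None])  # [(None, 'y'), (None, 'z')]
for entry in cogroup(A, B):
    print(entry)
# (1, [(1, 'x')], [(1, 'p')])
# (None, [(None, 'y'), (None, 'z')], [])
# (None, [], [(None, 'q')])
```

Under this model the sentence quoted from basic.xml is accurate only for null keys coming from different relations, which is the distinction the documentation should spell out.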
basic.xml says "Having a deterministic schema is very powerful", but it may be "Having a non-deterministic schema is very powerful".
in func.xml
in basic.xml
In basic.html#LIMIT
In this example the lmit is express as a scalar.
should be
In this example the limit is expressed as a scalar.
in func.xml
T/O
T/O
in basic.xml
-> A[i] == B[i]
-> k1 == k2
-> v1 == v2