Comments (5)
Hi,
You are correct. Currently nested loop join does not request hash distribution on the inner side.
Thanks for bringing this up.
In a typical case (except in the case of Indexed Nested Loop join) picking hash join it preferred. We will look into it soon.
V
from gporca.
Thanks for looking into it!! This case was hit in a perf testing. It's quite a common case where we join two big tables with indexed nested loop join with a LIMIT clause.
from gporca.
@guangzhouzhang I miss spoke earlier:
Orca does generate the correct distribution request for Index Nested Loop Join (https://github.com/greenplum-db/gporca/blob/master/libgpopt/src/operators/CPhysicalInnerIndexNLJoin.cpp#L95).
create table foo(a int, b int) distributed by (a);
create table bar(c int, d int) distributed by (c);
create INDEX bar_index on bar using btree (c);
vraghavan=# explain select * from foo, bar where a = c;
QUERY PLAN
---------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..433.00 rows=1 width=16)
-> Nested Loop (cost=0.00..433.00 rows=1 width=16)
Join Filter: true
-> Table Scan on foo (cost=0.00..431.00 rows=1 width=8)
-> Index Scan using bar_index on bar (cost=0.00..2.00 rows=1 width=8)
Index Cond: bar.c = foo.a
Settings: optimizer=on
Optimizer status: PQO version 1.633
(8 rows)
As you can see we do not broadcast.
However when we have a text column we introduce a broadcast as shown below:
create table jcj_jjxx_r (
jjbh varchar(20) not null,
jjdbh varchar(30) default null,
primary key (jjbh)
);
create table jcj_cjxx_r (
cjbh varchar(20) not null,
jjbh varchar(20) not null,
primary key (jjbh)
);
explain select * from jcj_jjxx_r join jcj_cjxx_r on jcj_jjxx_r.jjbh= jcj_cjxx_r.jjbh;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice2; segments: 3) (cost=0.00..433.00 rows=1 width=32)
-> Nested Loop (cost=0.00..433.00 rows=1 width=32)
Join Filter: true
-> Broadcast Motion 3:3 (slice1; segments: 3) (cost=0.00..431.00 rows=1 width=16)
-> Table Scan on jcj_jjxx_r (cost=0.00..431.00 rows=1 width=16)
-> Index Scan using jcj_cjxx_r_pkey on jcj_cjxx_r (cost=0.00..2.00 rows=1 width=16)
Index Cond: jcj_cjxx_r.jjbh::text = jcj_jjxx_r.jjbh::text
Settings: optimizer=on
Optimizer status: PQO version 1.633
(9 rows)
Will update the story to work on this soon. Thanks!
from gporca.
This issue has been open for nearly 1 year! I am planning to jump into the rabbit hole.
create table R1(a varchar, primary key(a));
create table S1(b varchar, primary key(b));
explain select * from R1, S1 where R1.a = S1.b;
QUERY PLAN
---------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice2; segments: 3) (cost=0.00..433.00 rows=1 width=16)
-> Nested Loop (cost=0.00..433.00 rows=1 width=16)
Join Filter: true
-> Broadcast Motion 3:3 (slice1; segments: 3) (cost=0.00..431.00 rows=1 width=8)
-> Table Scan on r1 (cost=0.00..431.00 rows=1 width=8)
-> Index Scan using s1_pkey on s1 (cost=0.00..2.00 rows=1 width=8)
Index Cond: s1.b::text = r1.a::text
Settings: optimizer=on
Optimizer status: PQO version 2.12.0
(9 rows)
create table R(a text, primary key(a));
create table S(b text, primary key(b));
explain select * from R, S where R.a = S.b;
QUERY PLAN
-------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..433.00 rows=1 width=16)
-> Nested Loop (cost=0.00..433.00 rows=1 width=16)
Join Filter: true
-> Table Scan on r (cost=0.00..431.00 rows=1 width=8)
-> Index Scan using s_pkey on s (cost=0.00..2.00 rows=1 width=8)
Index Cond: s.b = r.a
Settings: optimizer=on
Optimizer status: PQO version 2.12.0
(8 rows)
When the join column is text
type, we can generate plan without broadcasting the outer side, but for varchar
type, plan with broadcast motion is generated.
I took a look at the memo generated by these 2 plans, they almost have the same plan space/plan alternatives, same optimization request.
By setting the following GUC on,
set optimizer=on;
set client_min_messages='log';
set optimizer_enumerate_plans=on;
set optimizer_print_optimization_context = on;
set optimizer_print_memo_after_optimization = on;
explain select * from R1, S1 where R1.a = S1.b;
Check the memo structure for outer child R1
:
2017-03-25 10:47:28:813289 PDT,THD000,TRACE,"
Group 0 (#GExprs: 4):
0: CLogicalGet "r1" ("r1"), Columns: ["a" (0), "ctid" (1), "xmin" (2), "cmin" (3), "xmax" (4), "cmax" (5), "tableoid" (6), "gp_segment_id" (7)] Key sets: {[0], [1,7]} [ ]
1: CPhysicalTableScan "r1" ("r1") [ ]
Cost Ctxts:
main ctxt (stage 0)6.0, child ctxts:[], rows:1.000000 (group), cost: 431.000021
main ctxt (stage 0)1.0, child ctxts:[], rows:1.000000 (group), cost: 431.000021
main ctxt (stage 0)3.0, child ctxts:[], rows:1.000000 (group), cost: 431.000021
main ctxt (stage 0)2.0, child ctxts:[], rows:1.000000 (group), cost: 431.000021
2: CPhysicalMotionGather(master) [ 0 ]
Cost Ctxts:
main ctxt (stage 0)4.0, child ctxts:[1], rows:1.000000 (group), cost: 431.000125
main ctxt (stage 0)0.0, child ctxts:[1], rows:1.000000 (group), cost: 431.000125
3: CPhysicalMotionBroadcast [ 0 ]
Cost Ctxts:
main ctxt (stage 0)7.0, child ctxts:[1], rows:1.000000 (group), cost: 431.000465
main ctxt (stage 0)5.0, child ctxts:[1], rows:1.000000 (group), cost: 431.000465
Grp OptCtxts:
6 (stage 0): (req cols: ["a" (0)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [HASHED: [ +--CScalarCast origin: [Grp:3, GrpExpr:0]
+--CScalarIdent "a" (0) origin: [Grp:2, GrpExpr:0]
, nulls colocated ] match: satisfy], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:1
4 (stage 0): (req cols: ["a" (0)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [SINGLETON (master) match: satisfy], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:2
7 (stage 0): (req cols: ["a" (0)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [REPLICATED match: exact], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:3
1 (stage 0): (req cols: ["a" (0)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [ANY EOperatorId: 121 match: satisfy], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:1
5 (stage 0): (req cols: ["a" (0)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [REPLICATED match: satisfy], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:3
3 (stage 0): (req cols: ["a" (0)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [HASHED: [ +--CScalarIdent "a" (0) origin: [Grp:2, GrpExpr:0]
, nulls colocated ] match: exact], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:1
2 (stage 0): (req cols: ["a" (0)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [NON-SINGLETON match: satisfy], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:1
0 (stage 0): (req cols: ["a" (0)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [SINGLETON (master) match: exact], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:2
",
The CPhysicalInnerIndexNLJoin
decided to request replication for R1 (Group 0)
, with the child group optimization context is 7
, why not 3
or 6
?
I suspect that this might related with that GPDB internally cast varchar
to text
, Orca can't correctly deal with ScalarIdent
wrapped by a ScalarCast
. But looking at the code of CDistributionSpecHashed.cpp
, it does extract the column by calling CUtils::PexprWithoutBinaryCoercibleCasts
.
@vraghavan78 Any thoughts?
from gporca.
Fixed by commit: 34fc6a7
from gporca.
Related Issues (20)
- make error on aarch64 with gcc 8.3.0 on archlinuxarm 64 HOT 8
- Memory leak when running JoinPlanWithRedistribute.mdp in debug mode
- Incorrect join partition elimination HOT 3
- Long plan generation utilising all available memory HOT 1
- make error on solaris HOT 3
- GPORCA doesn't support aarch64 platform.
- can orca apply to innodb?
- gporca occurs exception when using statistic from numeric(30, 8) column to optimize plan
- clang-format in Allman style always indents the contents of extern "C" blocks in C headers
- no <dxl:DistrOpfamilies> tag in gporca/server/dxl.xsd
- CHistogram should keep track of row count (cardinality)
- Orca optimizer single-threaded vs. multi-threaded HOT 3
- why COptCtxt to tls
- [question] why use `virtual` instead of `override`
- Delegating query optimization task to Orca HOT 1
- Orca - how to use
- Is there any formal grammar for the definition of query/plan?
- Should consider startup cost of inner table in NLJ? HOT 1
- Is a bug of CExpression.cpp:505?
- Incorrect result about group by SQL HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from gporca.