Add Alias column support in join condition for JoinIndex rule

microsoft commented on May 22, 2024

Add Alias column support in join condition for JoinIndex rule

from hyperspace.

Comments (6)

rapoth commented on May 22, 2024

CC: @sezruby

rapoth commented on May 22, 2024

CC: @imback82

imback82 commented on May 22, 2024

I think this should have been fixed by apache/spark#26943 in Spark 3.0, and I don't think it's worthwhile to update the rule to support this scenario for Spark 2.4.

For the following:

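// Assumes an active SparkSession named `spark`, as in spark-shell.
import spark.implicits._ // provides toDF; already in scope in spark-shell
// Disable broadcast joins so the planner picks a sort-merge join.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
val df1 = (0 until 100).map(i => (i % 5, i % 11)).toDF("i1", "j1")
val df2 = (0 until 100).map(i => (i % 7, i % 11)).toDF("i2", "j2")
// Bucket both tables on their join keys, 8 buckets each.
df1.write.format("parquet").bucketBy(8, "i1").saveAsTable("t1")
df2.write.format("parquet").bucketBy(8, "i2").saveAsTable("t2")
// Join on an aliased column (i2 AS aliasC) and print the physical plan.
spark.sql("SELECT t1.i1, t2Temp.aliasC FROM t1 INNER JOIN (SELECT i2 AS aliasC FROM t2) AS t2Temp WHERE t1.i1 = t2Temp.aliasC").explain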

Spark 2.4.5 outputs:

== Physical Plan ==
*(5) SortMergeJoin [i1#31], [aliasC#37], Inner
:- *(2) Sort [i1#31 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(i1#31, 200)
:     +- *(1) Project [i1#31]
:        +- *(1) Filter isnotnull(i1#31)
:           +- *(1) FileScan parquet default.t1[i1#31] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:...], PartitionFilters: [], PushedFilters: [IsNotNull(i1)], ReadSchema: struct<i1:int>, SelectedBucketsCount: 8 out of 8
+- *(4) Sort [aliasC#37 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(aliasC#37, 200)
      +- *(3) Project [i2#33 AS aliasC#37]
         +- *(3) Filter isnotnull(i2#33)
            +- *(3) FileScan parquet default.t2[i2#33] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:...], PartitionFilters: [], PushedFilters: [IsNotNull(i2)], ReadSchema: struct<i2:int>, SelectedBucketsCount: 8 out of 8

whereas Spark 3.0 prints:

== Physical Plan ==
*(3) SortMergeJoin [i1#27], [aliasC#33], Inner
:- *(1) Sort [i1#27 ASC NULLS FIRST], false, 0
:  +- *(1) Project [i1#27]
:     +- *(1) Filter isnotnull(i1#27)
:        +- *(1) ColumnarToRow
:           +- FileScan parquet default.t1[i1#27] Batched: true, DataFilters: [isnotnull(i1#27)], Format: Parquet, Location: InMemoryFileIndex[file:...], PartitionFilters: [], PushedFilters: [IsNotNull(i1)], ReadSchema: struct<i1:int>, SelectedBucketsCount: 8 out of 8
+- *(2) Sort [aliasC#33 ASC NULLS FIRST], false, 0
   +- *(2) Project [i2#29 AS aliasC#33]
      +- *(2) Filter isnotnull(i2#29)
         +- *(2) ColumnarToRow
            +- FileScan parquet default.t2[i2#29] Batched: true, DataFilters: [isnotnull(i2#29)], Format: Parquet, Location: InMemoryFileIndex[file:...], PartitionFilters: [], PushedFilters: [IsNotNull(i2)], ReadSchema: struct<i2:int>, SelectedBucketsCount: 8 out of 8
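
The visible difference between the two plans is the pair of Exchange hashpartitioning nodes: Spark 2.4.5 shuffles both sides even though the tables are bucketed on the join keys, while Spark 3.0 has no Exchange at all. A minimal sketch for checking this on plan text follows. `PlanCheck` and `countShuffles` are hypothetical names used only for illustration and are not part of Spark or Hyperspace, and the two plan strings are abbreviated from the output above.

```scala
// Hypothetical helper, not part of Spark or Hyperspace: counts the
// hash-partitioning shuffle nodes that appear in a physical plan's text.
object PlanCheck {
  def countShuffles(planText: String): Int =
    planText.split("\n").count(_.contains("Exchange hashpartitioning"))

  // Abbreviated from the Spark 2.4.5 plan quoted above.
  val spark24Plan: String =
    """*(5) SortMergeJoin [i1#31], [aliasC#37], Inner
      |:- *(2) Sort [i1#31 ASC NULLS FIRST], false, 0
      |:  +- Exchange hashpartitioning(i1#31, 200)
      |+- *(4) Sort [aliasC#37 ASC NULLS FIRST], false, 0
      |   +- Exchange hashpartitioning(aliasC#37, 200)""".stripMargin

  // Abbreviated from the Spark 3.0 plan quoted above.
  val spark30Plan: String =
    """*(3) SortMergeJoin [i1#27], [aliasC#33], Inner
      |:- *(1) Sort [i1#27 ASC NULLS FIRST], false, 0
      |:  +- *(1) Project [i1#27]
      |+- *(2) Sort [aliasC#33 ASC NULLS FIRST], false, 0
      |   +- *(2) Project [i2#29 AS aliasC#33]""".stripMargin

  def main(args: Array[String]): Unit = {
    println(s"Spark 2.4.5 shuffles: ${countShuffles(spark24Plan)}")
    println(s"Spark 3.0 shuffles: ${countShuffles(spark30Plan)}")
  }
}
```

With a live session, the plan text could presumably be taken from `df.queryExecution.executedPlan.toString` instead of a pasted string; matching on text is a rough check and would miss shuffles that use other partitioning schemes.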

rapoth commented on May 22, 2024

Great, thanks! I'm closing this issue.

imback82 commented on May 22, 2024

@apoorvedave1 Can you confirm if this is not needed in the join rule? I see that we clean up the aliases, so matching should be ok, right?

apoorvedave1 commented on May 22, 2024

Yes, this looks like a solved issue now. Thanks for checking, @imback82.
