Summary
LookupJoinOperator supports INNER, LEFT, SEMI, and ANTI join types. For INNER/LEFT it evaluates the join's non-equi conditions after the dimension-table lookup, but for SEMI/ANTI it appears to silently ignore them, which can produce incorrect results.
Details
- buildJoinedDataBlockDefault (INNER/LEFT) looks up the right row and then applies _nonEquiEvaluators before emitting the joined row.
- buildJoinedDataBlockSemi / buildJoinedDataBlockAnti only test key existence via _rightTable.containsKey(key) and never reference _nonEquiEvaluators.
- The constructor builds _nonEquiEvaluators from node.getNonEquiConditions() for every join type, so non-equi conditions present on a SEMI/ANTI lookup join are constructed but never evaluated.
Can SEMI/ANTI lookup joins carry non-equi conditions?
It appears so. RelToPlanNodeConverter#convertLogicalJoin builds the lookup JoinNode with joinInfo.nonEquiConditions for any join type and does not forbid non-equi conditions for lookup joins. (By contrast, the ASOF branch explicitly does Preconditions.checkState(joinInfo.nonEquiConditions.isEmpty(), ...).) I couldn't find a guard or a comment indicating that SEMI/ANTI lookup joins are guaranteed to be equi-only.
Impact
For a SEMI/ANTI lookup join that has a non-equi condition, the predicate is dropped: a SEMI join keeps left rows that the predicate should exclude, and an ANTI join drops left rows it should keep.
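To make the impact concrete, here is a minimal, self-contained sketch of the two behaviors using plain Java collections. It is not Pinot code: the class `SemiAntiNonEquiDemo`, the `join` helper, the row layout `{id, key, amount}`, and the non-equi condition `left.amount > right.threshold` are all hypothetical stand-ins chosen for illustration. With `applyNonEqui = false` it mimics a key-existence-only check; with `applyNonEqui = true` it also evaluates the predicate against the looked-up right value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SemiAntiNonEquiDemo {
  // Hypothetical left rows: {id, joinKey, amount}.
  static final List<int[]> LEFT = List.of(
      new int[] {1, 10, 5},
      new int[] {2, 10, 50},
      new int[] {3, 99, 50});

  // Hypothetical dimension table: joinKey -> threshold.
  static final Map<Integer, Integer> RIGHT = Map.of(10, 20);

  /**
   * Returns the ids of the left rows kept by a SEMI (anti = false) or ANTI (anti = true) join.
   * When applyNonEqui is false, only key existence is tested (the behavior described above).
   * When applyNonEqui is true, the assumed condition "left.amount > right.threshold" must also hold.
   */
  static List<Integer> join(boolean anti, boolean applyNonEqui) {
    List<Integer> keptIds = new ArrayList<>();
    for (int[] row : LEFT) {
      Integer threshold = RIGHT.get(row[1]);
      boolean matched = threshold != null && (!applyNonEqui || row[2] > threshold);
      // SEMI keeps matched rows; ANTI keeps unmatched rows.
      if (matched != anti) {
        keptIds.add(row[0]);
      }
    }
    return keptIds;
  }

  public static void main(String[] args) {
    System.out.println("SEMI key-only:        " + join(false, false)); // [1, 2]
    System.out.println("SEMI with predicate:  " + join(false, true));  // [2]
    System.out.println("ANTI key-only:        " + join(true, false));  // [3]
    System.out.println("ANTI with predicate:  " + join(true, true));   // [1, 3]
  }
}
```

In this toy data, row 1 has a matching key but fails the predicate, so the key-only SEMI join wrongly keeps it and the key-only ANTI join wrongly drops it.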
Questions / suggestions
- Is this intended, i.e., are SEMI/ANTI lookup joins guaranteed elsewhere to never carry non-equi conditions? If so, a Preconditions.checkState or a comment would make the invariant explicit.
- Otherwise, SEMI/ANTI should evaluate the non-equi conditions (which would require fetching the right row via lookupValues instead of containsKey), or the planner should reject the combination (as ASOF does).
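If rejecting the combination is the preferred route, the guard could look roughly like the sketch below. This is only an illustration under assumptions: `LookupJoinGuardSketch`, its `JoinType` enum, and `checkLookupJoin` are hypothetical names, conditions are modeled as plain strings, and a plain `IllegalStateException` stands in for the `Preconditions.checkState` call so that the example has no dependencies. Where such a check would actually live (planner or operator constructor) is left open.

```java
import java.util.List;

public class LookupJoinGuardSketch {
  // Hypothetical stand-in for the join types supported by the lookup join.
  enum JoinType { INNER, LEFT, SEMI, ANTI }

  /**
   * Fails fast when a SEMI/ANTI lookup join carries non-equi conditions,
   * instead of silently dropping them at execution time.
   */
  static void checkLookupJoin(JoinType joinType, List<String> nonEquiConditions) {
    boolean keyExistenceOnly = joinType == JoinType.SEMI || joinType == JoinType.ANTI;
    if (keyExistenceOnly && !nonEquiConditions.isEmpty()) {
      throw new IllegalStateException(
          "Lookup " + joinType + " join does not support non-equi conditions: " + nonEquiConditions);
    }
  }

  public static void main(String[] args) {
    // INNER/LEFT may carry non-equi conditions; SEMI/ANTI may not.
    checkLookupJoin(JoinType.INNER, List.of("l.amount > r.threshold"));
    checkLookupJoin(JoinType.SEMI, List.of());
    try {
      checkLookupJoin(JoinType.ANTI, List.of("l.amount > r.threshold"));
    } catch (IllegalStateException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

A guard like this trades functionality for safety: queries that currently return silently wrong results would fail with an explicit error until SEMI/ANTI evaluation of non-equi conditions is implemented.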
References: LookupJoinOperator#buildJoinedDataBlockSemi / #buildJoinedDataBlockAnti vs #buildJoinedDataBlockDefault; RelToPlanNodeConverter#convertLogicalJoin.