Skip to content

[SQL] Propagate join hints through compiler stages#6162

Merged
mihaibudiu merged 5 commits into
feldera:mainfrom
mihaibudiu:issue6160
May 12, 2026
Merged

[SQL] Propagate join hints through compiler stages#6162
mihaibudiu merged 5 commits into
feldera:mainfrom
mihaibudiu:issue6160

Conversation

@mihaibudiu
Copy link
Copy Markdown
Contributor

Fixes #6160

This includes some test that optimizer hints are pushed (at least for simple programs) through all optimizations.
Currently these are not used in the generated code, since there is no API to pass join hints to the circuit; that needs some DBSP support.

I created 3 hints for joins (broadcast, shard, balance), but I didn't document them yet, since they have no effect.

@mihaibudiu
Copy link
Copy Markdown
Contributor Author

image

Comment on lines +1530 to +1541
switch (hint.hintName) {
case "broadcast":
strategies.add(new JoinStrategy(JoinStrategy.Strategy.Broadcast));
break;
case "balance":
strategies.add(new JoinStrategy(JoinStrategy.Strategy.Balance));
break;
case "shard":
strategies.add(new JoinStrategy(JoinStrategy.Strategy.Shard));
break;
default:
break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hint names "broadcast", "balance", "shard" are duplicated as raw strings here and again in SqlToRelCompiler#HintStrategyTable. Two sources of truth for the same registry — easy to drift if one name is renamed or a new strategy added. Consider deriving them from JoinStrategy.Strategy (e.g. Strategy.Broadcast.name().toLowerCase(Locale.ROOT)) and registering them in one place so adding a new strategy is a single-file change.

Also: the default: arm silently drops anything that is not one of the three. Calcite's HintStrategyTable filters unknown names upstream, but a typo on a registered name (e.g. "broadcaast") would silently no-op here. An assertion or a switch keyed on Strategy.values() would catch that.

HepProgram getProgram(RelNode node, int level) {
// only call this if there are no Correlates
if (!containsCorrelate(node)) {
if (!containsCorrelate(node) && !containsHints(node) && (joinCount(node) > 1)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two distinct behavioral changes bundled into one condition:

  1. !containsHints(node) is wider than it needs to be. The comment two lines down says HYPER_GRAPH_OPTIMIZE "loses hints" — that is specifically about join hints. Any hint anywhere in the tree (a query-level /*+ ... */ on the SELECT, a table hint that has nothing to do with joins) now disables hypergraph reordering for the entire query. Tighten this to "contains a hint on a Join" so unrelated hints do not silently kill an optimization.

  2. joinCount(node) > 1 is unrelated to the hint-propagation goal of this PR. Whether or not it is a no-op for single-join queries in practice, the gate change should land with its own rationale and a test that pins the new behavior. Right now nothing in the test suite would notice if this condition were flipped back.

Comment on lines +54 to +70
static boolean containsHints(RelNode node) {
if (node instanceof Hintable ha)
return !ha.getHints().isEmpty();
for (RelNode input : node.getInputs()) {
if (containsHints(input))
return true;
}
return false;
}

static int joinCount(RelNode node) {
int recurse = 0;
for (RelNode input : node.getInputs()) {
recurse += joinCount(input);
}
if (node instanceof Join)
recurse += 1;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each of containsCorrelate, containsHints, and joinCount walks the entire RelNode tree independently. On every Hypergraph step invocation that is three full traversals where a single visitor could compute all three predicates at once. Soft — only worth folding together if profiling shows it on a hot path.

Comment on lines +10 to +13
public enum Strategy {
Broadcast,
Shard,
Balance,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broadcast and Shard are well-known terms in distributed-join literature; Balance is not. A short Javadoc on each variant — what it means operationally, even informally — would save the next reader (or a user reading docs generated from this) a hunt. Especially important since these names are user-facing via /*+ ... */ syntax.

Assert.assertTrue(this.joinFound);
}
});
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only broadcast is covered. shard and balance go through the same switch but a typo in one of those case labels would slip past every test in this PR. Parameterizing across JoinStrategy.Strategy.values() is cheap and keeps coverage automatic when new strategies are added.

Also missing: a regression test for the new hypergraph gate — a 3-way join with a join hint, asserting the hint survives end-to-end (i.e. that the gate added in CalciteOptimizer actually does what it says).

Comment thread docs.feldera.com/docs/sql/grammar.md Outdated
FROM
tableName /*+ hint3(5, 'x') */
JOIN
tableName /*+ hint4(c=id), hint5 */
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hint4(c=id) uses an identifier (id) as the value, but the grammar a few lines below declares optionVal : stringLiteral. Either the example should be hint4(c='id') or the grammar should list simpleIdentifier as an alternative for optionVal.

Support for hints is under development.

## Creating indexes

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is empty other than "Support for hints is under development." Since the parser now accepts broadcast, shard, and balance without errors (they are registered in HintStrategyTable), users will discover them and try them. A one-line table listing the three names with "recognized but currently no-op" is more honest than the current placeholder and pre-empts the bug report "my hint did nothing."

public static final String HINT_BROADCAST_LEFT = "broadcast_left";
public static final String HINT_BROADCAST_RIGHT = "broadcast_right";
public static final String HINT_BALANCE = "balance";
public static final String HINT_SHARD = "shart";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "shart" should be "shard". As written, the Shard strategy is registered under the SQL hint name shart, so users have to write /*+ shart */ to get sharded joins, and /*+ shard */ falls through to the default arm in visitJoin and is silently ignored. No test catches this because only broadcast_left is exercised end-to-end.

HepProgram getProgram(RelNode node, int level) {
// only call this if there are no Correlates
if (!containsCorrelate(node)) {
if (!containsCorrelate(node) && !containsHints(node) && (joinCount(node) > 1)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gate now combines three conditions, but two of them are conceptually unrelated:

  • !containsHints(node) — disabling the hyper-graph rule when any hint is present anywhere in the subtree is much coarser than necessary. A non-join hint elsewhere in the plan (or any future scan/aggregate hint) will silently disable hyper-graph optimization for the whole query. Skipping should be conditional on a hint the rule would actually drop, e.g. a JoinStrategy annotation on a Join.
  • joinCount(node) > 1 — this is an independent behavioral change with no rationale comment and no test. If it's meant as a perf guard (don't run hyper on trivial single-join plans), please say so in a comment and add coverage; otherwise it shouldn't be folded into this PR.

Also, three independent recursive walks (containsCorrelate + containsHints + joinCount) over the same tree on every optimizer entry — fine for now, but worth a single visitor if this grows.

public void testHints() {
var ccs = this.getCCS("""
CREATE TABLE T(x INT);
CREATE VIEW V AS SELECT /*+ broadcast_left */ * FROM T JOIN T AS S USING (x);""");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only broadcast_left is exercised end-to-end. broadcast_right, balance, and shard go through the same switch in visitJoin, but each maps to a distinct JoinStrategy.Strategy enum value — easy to typo a case arm (cf. "shart" above). Please parameterize this test or add three more variants asserting the correct annotation per hint.

Comment thread docs.feldera.com/docs/sql/grammar.md Outdated

### Supported hints and their impact on query implementation

Support for hints is under development.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces four user-visible join hints (broadcast_left, broadcast_right, balance, shard) and registers them in HintStrategyTable, but the "Supported hints" section still just says "Support for hints is under development." Either document the four hints here (name, semantics, where they may appear) or add a follow-up issue and link it from this section.

@mihaibudiu mihaibudiu force-pushed the issue6160 branch 2 times, most recently from 7c8b9e7 to ce3b5cb Compare May 7, 2026 01:26
a hash-join strategy by sharding the input with alias *table*
- `balanced(`*table*`)`: Indicates that the following `JOIN` should be implemented using
a balanced strategy by hashing on all fields the input with alias *table*

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc/code mismatch: this lists the hint as balanced(table), but the registered hint name is balanceHINT_BALANCE = "balance" in SqlToRelCompiler, and the new testThreeWayJoinHints writes /*+ ... balance(U) */. A user copying the docs verbatim with /*+ balanced(U) */ will get an unknown hint that silently no-ops. Either rename the constant to balanced or fix the docs to say balance(table).

@mihaibudiu mihaibudiu force-pushed the issue6160 branch 3 times, most recently from de11941 to 53126b5 Compare May 8, 2026 07:17
Copy link
Copy Markdown
Contributor

@ryzhyk ryzhyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving modulo Rust warning @mihaibudiu is fixing.

Compiler warning are currently false negatives. This is ok as a preview, but we really need to fix this and turn them, into errors for people to be able to use hints.

@mihaibudiu mihaibudiu added this pull request to the merge queue May 8, 2026
@ryzhyk ryzhyk removed this pull request from the merge queue due to a manual request May 8, 2026
@mihaibudiu mihaibudiu force-pushed the issue6160 branch 2 times, most recently from 71c7019 to 761fe64 Compare May 8, 2026 20:50
@mihaibudiu mihaibudiu enabled auto-merge May 8, 2026 22:31
@mihaibudiu mihaibudiu added this pull request to the merge queue May 8, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 9, 2026
@mihaibudiu mihaibudiu enabled auto-merge May 9, 2026 03:49
@mihaibudiu mihaibudiu added this pull request to the merge queue May 9, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 9, 2026
Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mihaibudiu mihaibudiu added this pull request to the merge queue May 11, 2026
@mihaibudiu mihaibudiu removed this pull request from the merge queue due to a manual request May 11, 2026
@mihaibudiu mihaibudiu enabled auto-merge May 11, 2026 07:28
@mihaibudiu mihaibudiu disabled auto-merge May 11, 2026 07:29
@mihaibudiu mihaibudiu enabled auto-merge May 11, 2026 07:30
@mihaibudiu mihaibudiu added this pull request to the merge queue May 11, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 11, 2026
@mihaibudiu mihaibudiu force-pushed the issue6160 branch 2 times, most recently from 8738bff to 318cfa0 Compare May 11, 2026 21:58
@mihaibudiu mihaibudiu enabled auto-merge May 11, 2026 21:58
@mihaibudiu mihaibudiu added this pull request to the merge queue May 11, 2026
@mihaibudiu mihaibudiu removed this pull request from the merge queue due to a manual request May 11, 2026
@mihaibudiu mihaibudiu enabled auto-merge May 11, 2026 23:31
@mihaibudiu mihaibudiu added this pull request to the merge queue May 11, 2026
Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New commits since my last approval are CI/test fixes (Java test verbosity, multi-crate CI fix, Python integration test fix). No substantive design changes. Prior approval stands.

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 12, 2026
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
…own because of earthly bugs)

Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
@mihaibudiu mihaibudiu enabled auto-merge May 12, 2026 07:37
@mihaibudiu mihaibudiu added this pull request to the merge queue May 12, 2026
Merged via the queue into feldera:main with commit a26e6c0 May 12, 2026
1 check passed
@mihaibudiu mihaibudiu deleted the issue6160 branch May 12, 2026 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[SQL] Add support for optimizer hints

3 participants