
feat: Bigquery as OLAP engine#9161

Merged
k-anshul merged 22 commits into main from bigquery_olap
Apr 17, 2026
Conversation

@k-anshul (Member) commented Apr 1, 2026

closes https://linear.app/rilldata/issue/PLAT-450/metrics-views-on-bigquery

Added

TODOs to be done with follow ups:

  • Exports are broken
  • remove conversion of civil.Date to time.Time in the rill driver and handle it wherever required

Checklist:

  • Covered by tests
  • Ran it and it works as intended
  • Reviewed the diff before requesting a review
  • Checked for unhandled edge cases
  • Linked the issues it closes
  • Checked if the docs need to be updated. If so, create a separate Linear DOCS issue
  • Intend to cherry-pick into the release branch
  • I'm proud of this work!

@k-anshul k-anshul self-assigned this Apr 1, 2026
}

rangeSQL := fmt.Sprintf(
"SELECT min(%[1]s) as `min`, max(%[1]s) as `max`, %[2]s as `watermark` FROM %[3]s %[4]s",
Member Author:
This is not an efficient query even when running on a partition column.

Member Author:
An optimization can be done where we check whether this is the partition column of the table and read min/max directly from the partition metadata.
Given this is a frequently executed query, I think it can be done in a follow-up. @begelundmuller thoughts?

Contributor:
If the optimization can be done in a fast/cheap/safe way, then yeah it sounds good to me

Member Author:
It can be fast, but to ensure we do not query information_schema again and again, we need to cache the fact that this is the table's partition column, which requires some changes. I will take it up separately.
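The metadata-based optimization discussed above could look roughly like the following sketch. The helper name and SQL shape are assumptions, not code from this PR; BigQuery does expose per-partition metadata through `INFORMATION_SCHEMA.PARTITIONS`, so min/max partition ids can be read without scanning the table:

```go
package main

import "fmt"

// partitionRangeSQL builds a query against BigQuery's partition metadata
// instead of scanning the table itself. Identifiers are illustrative.
func partitionRangeSQL(project, dataset, table string) string {
	return fmt.Sprintf(
		"SELECT MIN(partition_id) AS `min`, MAX(partition_id) AS `max` "+
			"FROM `%s.%s.INFORMATION_SCHEMA.PARTITIONS` "+
			"WHERE table_name = '%s' AND partition_id != '__NULL__'",
		project, dataset, table,
	)
}

func main() {
	fmt.Println(partitionRangeSQL("my-project", "my_dataset", "events"))
}
```

The `partition_id` values would still need decoding (hourly/daily/monthly formats differ), which is part of why caching the partition-column lookup matters.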

@k-anshul k-anshul requested a review from begelundmuller April 2, 2026 13:08
Comment thread runtime/drivers/bigquery/bigquery.go
Comment thread runtime/drivers/bigquery/olap.go
Comment thread runtime/drivers/olap.go Outdated
Comment thread runtime/drivers/olap.go Outdated
Comment thread runtime/drivers/olap.go Outdated
Comment thread runtime/metricsview/executor/executor_validate.go Outdated
Comment thread runtime/drivers/olap.go
Comment thread runtime/testruntime/testruntime.go Outdated
Comment thread runtime/testruntime/testruntime.go Outdated
@@ -180,33 +181,157 @@ func (q *TableHead) generalExport(ctx context.Context, rt *runtime.Runtime, inst
}

func (q *TableHead) buildTableHeadSQL(ctx context.Context, olap drivers.OLAPStore) (string, error) {
Contributor:
It seems like there's a huge complexity increase in this function. Two questions:

  1. We don't run TableHead very often, so is it necessary to optimize it so hard? In general, I would assume people who connect a BI tool to a data warehouse are fine with a SELECT * FROM tbl LIMIT 100 query being run.
  2. If it really is necessary, is it possible to combine it into one nested query and push it into the dialect somehow?

Member Author (Apr 6, 2026):
  1. It is used in the data preview. On a 100 TB table this can cost a user 600 dollars. This can be a silent "trap" for a user, given BigQuery returns results very fast (as reported by users running such queries on big tables).
    I agree that users should not use bytes-processed-based pricing when connecting a BI tool, but we should not leave such traps for users.
    For example, I found an issue in Superset where the reporter refused to use Superset with BigQuery until this kind of query was removed: Select * Limit is DANGEROUS in BigQuery apache/superset#17299
  2. For partition pruning the filter has to be a static filter; a dynamic filter is not allowed.

If you are worried about dialect specific complexity in runtime/queries then we can take one of the following approaches:

  1. Disable data preview for BigQuery in UI and return an error in the API.
  2. Use preview table API which is free : https://docs.cloud.google.com/bigquery/docs/samples/bigquery-browse-table#bigquery_browse_table-go

Both approaches make this more optimised given we don't have to scan even 1 partition (which can still be big).

Contributor:
That makes sense. Yeah I'm just a little worried about the driver-specificity in TableHead, especially given we are not adding many new OLAP drivers.

I don't think we should disable previews, but it would just be nice if we could push this into the driver somehow. I'm good with any of these:

  1. Rewrite SELECT * FROM tbl LIMIT n into preview API calls inside OLAPStore.Query itself (similar to the code we have here:
    // Regex to parse BigQuery SELECT ALL statement: SELECT * FROM `project_id.dataset.table`
    var selectQueryRegex = regexp.MustCompile(…)
    )
  2. Add a Head function on the OLAPStore interface (other drivers can implement it using a normal SELECT *)
  3. Add it to the drivers.Dialect somehow (will become clean with Naman's refactors)
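Option 1 could be sketched as follows. The pattern below is illustrative only, since the actual regex body is elided in the comment above; it recognizes plain `SELECT * FROM \`project.dataset.table\` [LIMIT n]` statements so they can be routed to the free preview API instead of a billable query:

```go
package main

import (
	"fmt"
	"regexp"
)

// selectAllRegex is an illustrative pattern (not the driver's actual regex):
// group 1 captures the backtick-quoted table path, group 2 the optional limit.
var selectAllRegex = regexp.MustCompile(
	"(?i)^\\s*SELECT\\s+\\*\\s+FROM\\s+`([\\w.-]+)`\\s*(?:LIMIT\\s+(\\d+))?\\s*;?\\s*$",
)

func main() {
	m := selectAllRegex.FindStringSubmatch("SELECT * FROM `proj.dataset.table` LIMIT 100")
	if m != nil {
		fmt.Println(m[1], m[2]) // table path and limit
	}
}
```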

Member Author:
I implemented the 2nd option. It leads to some duplicate code but seemed the cleanest/safest.
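A rough shape of option 2, with all names being assumptions rather than the actual Rill interfaces: a Head method on the OLAP store lets each driver choose its preview strategy, so BigQuery can avoid issuing a billable query entirely.

```go
package main

import "fmt"

// OLAPStore is a simplified, hypothetical version of the interface:
// Head previews up to limit rows of tbl. Here it returns the SQL it
// would run (or an error) to keep the sketch self-contained.
type OLAPStore interface {
	Head(tbl string, limit int) (string, error)
}

// genericOLAP is the default most drivers can share: a plain SELECT *.
type genericOLAP struct{}

func (genericOLAP) Head(tbl string, limit int) (string, error) {
	return fmt.Sprintf("SELECT * FROM %s LIMIT %d", tbl, limit), nil
}

// bigqueryOLAP would instead call the free tabledata.list preview API
// (Table.Read in the Go client) and never run SQL at all.
type bigqueryOLAP struct{}

func (bigqueryOLAP) Head(tbl string, limit int) (string, error) {
	return "", fmt.Errorf("preview of %s served via Table.Read, no SQL issued", tbl)
}

func main() {
	var s OLAPStore = genericOLAP{}
	sql, _ := s.Head("t", 100)
	fmt.Println(sql)
}
```

The duplicate code mentioned above is the price of each driver implementing Head independently instead of sharing one query builder.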

Comment thread runtime/drivers/bigquery/bigquery.go
Comment thread runtime/drivers/bigquery/bigquery.go Outdated
Comment thread runtime/drivers/bigquery/olap.go Outdated
Comment thread runtime/drivers/bigquery/olap.go Outdated
Comment thread runtime/drivers/olap.go Outdated
Comment thread runtime/queries/table_head.go
type: "object",
title: "BigQuery",
"x-category": "warehouse",
"x-category": "olap",
Contributor:
This might be problematic; most people will probably still want to use BigQuery as a source for now. I believe Applications is working on a way to give users a choice between OLAP and source for Snowflake and BigQuery, but if that hasn't landed yet, we should probably stick with the old default.

Member Author:
Actually, @nishantmonu51 asked to move Snowflake to the OLAP category for now and handle it as a warehouse for DuckDB in subsequent OLAP work, so I followed the same approach here.
But on reflection, given the BigQuery OLAP connector has been a mixed experience, I am okay with leaving it as a warehouse.
Thoughts @begelundmuller @nishantmonu51?

Contributor:
Are you aware of this thread? I think Applications had to patch that change as it was breaking some flows. https://rilldata.slack.com/archives/C093UBT5NLV/p1775582170699349?thread_ts=1775491222.470509&cid=C093UBT5NLV

However, I see their patch didn't involve this specific flag, so maybe you need to change something else. I'll let you look at the thread and the patch and make any necessary changes. I do think we should not break the BigQuery as warehouse flows just yet.

Member Author:
No, I wasn't aware of this. Reverted all UI changes per the patch fix.

Contributor @begelundmuller left a review:
Looks good!

@k-anshul k-anshul merged commit 586a914 into main Apr 17, 2026
15 of 18 checks passed
@k-anshul k-anshul deleted the bigquery_olap branch April 17, 2026 04:48
