# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Unlock SQL and Python interoperability for BigQuery with %%bqsql magic#

In this tutorial, you will learn how to seamlessly chain data processing across SQL and Python code cells using %%bqsql IPython magic and BigQuery DataFrames (BigFrames). This interoperability is now available to all Jupyter users, whether you’re in Colab, JupyterLab, or VS Code.

While we begin by loading a local Excel dataset into a local Pandas DataFrame, the main focus is on how you can transition between Pandas’ Python-centric API and BigQuery’s SQL-centric engine. This hybrid workflow combines the best of both worlds: the expressive power of SQL for complex transformations and the versatile Python ecosystem for visualization and further analysis.

Thanks to open-source packages like Jupyter, Pandas, BigFrames, and the BigQuery sandbox, you can follow all steps in this guide for free* and without a credit card.

*See the BigQuery sandbox documentation for limitations.

The %%bqsql Magic#

Last year, Google introduced SQL cells in Colab Enterprise notebooks. Now, with the %%bqsql cell magics in BigQuery DataFrames, this same powerful interoperability is available to all Jupyter users, whether you’re in Colab, JupyterLab, or VS Code. These magics allow you to write SQL queries that run directly on local pandas DataFrames, BigFrames DataFrames, or BigQuery tables.

Getting Started#

To get started,

Enable the BigQuery sandbox. Make note of your Google Cloud project ID.
Set up a local Python development environment (see: Setting up a Python development environment) for Google Cloud.
Create and activate a venv to isolate Python dependencies. On Linux or macOS, use these commands (update to your preferred Python version):
```
python3.12 -m venv ~/venv
. ~/venv/bin/activate
```

Install the Jupyter, bigframes, and python-calamine packages:

pip install --upgrade jupyterlab bigframes python-calamine

Start Jupyter Lab.
```
jupyter lab
```
Open a web browser to the URL listed in the output. It will be something like http://localhost:8888/lab?token=somesupersecretvaluehere.
Create a new notebook using the Jupyter Lab UI.

%pip install python-calamine pandas bigframes

Accessing the Dataset#

In this tutorial, you’ll analyze the USDA wheat data. Use the standard requests package to download the data to a temporary file, mimicking a typical local data analysis workflow.

import tempfile

import requests

url = "https://www.ers.usda.gov/media/5706/wheat-data-all-years.xlsx?v=52690"

tmp = tempfile.NamedTemporaryFile(delete=True)

with requests.get(url, stream=True) as r:
    r.raise_for_status()
    for chunk in r.iter_content(chunk_size=8192):
        tmp.write(chunk)

tmp.flush()
tmp.seek(0)

Use the pyarrow dtype_backend when preparing local Pandas data for SQL processing. This ensures more consistent handling of NULL values and seamless schema mapping when you hand off the data to the BigQuery SQL engine. For this example, read the ‘Table05’ sheet, which contains annual wheat supply and disappearance data:

import pandas as pd

df = pd.read_excel(
    tmp,
    sheet_name="Table05",
    dtype_backend="pyarrow",
    engine="calamine",
    header=1,  # Skip the first row.
)
tmp.close()
df

	Marketing year 1/	Time period	Beginning stocks	Production	Imports 2/	Total supply 3/	Food use	Seed use	Feed and residual use	Total domestic use 3/	Exports 2/	Total disappearance 3/	Ending stocks
0	1950/51	MY Jun-May	496.0	1019.0	11.0	1526.0	580.0	--	109.0	689.0	345.0	1034.0	492.0
1	1951/52	MY Jun-May	492.0	988.0	30.0	1510.0	585.0	--	110.0	695.0	485.0	1180.0	330.0
2	1952/53	MY Jun-May	330.0	1306.0	24.0	1660.0	578.0	--	78.0	656.0	332.0	988.0	672.0
3	1953/54	MY Jun-May	672.0	1173.0	6.0	1851.0	556.0	--	87.0	643.0	214.0	857.0	994.0
4	1954/55	MY Jun-May	994.0	984.0	3.0	1981.0	552.0	--	53.0	605.0	267.0	872.0	1109.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...
281	1/ June–May. Latest data may be preliminary or...	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>
282	2/ Includes flour and selected other products ...	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>
283	3/ Totals may not add due to rounding.	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>
284	Source: USDA, Economic Research Service, based...	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>
285	Updated: May 12, 2026	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>

286 rows × 13 columns

Preparing the data#

Before querying the local DataFrame with SQL, ensure that the column names are SQL-friendly. BigQuery supports flexible column names, allowing most unicode characters, but special characters like “/” and “” must be removed or replaced.

df.columns = [name.replace("/", "") for name in df.columns]
df

	Marketing year 1	Time period	Beginning stocks	Production	Imports 2	Total supply 3	Food use	Seed use	Feed and residual use	Total domestic use 3	Exports 2	Total disappearance 3	Ending stocks
0	1950/51	MY Jun-May	496.0	1019.0	11.0	1526.0	580.0	--	109.0	689.0	345.0	1034.0	492.0
1	1951/52	MY Jun-May	492.0	988.0	30.0	1510.0	585.0	--	110.0	695.0	485.0	1180.0	330.0
2	1952/53	MY Jun-May	330.0	1306.0	24.0	1660.0	578.0	--	78.0	656.0	332.0	988.0	672.0
3	1953/54	MY Jun-May	672.0	1173.0	6.0	1851.0	556.0	--	87.0	643.0	214.0	857.0	994.0
4	1954/55	MY Jun-May	994.0	984.0	3.0	1981.0	552.0	--	53.0	605.0	267.0	872.0	1109.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...
281	1/ June–May. Latest data may be preliminary or...	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>
282	2/ Includes flour and selected other products ...	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>
283	3/ Totals may not add due to rounding.	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>
284	Source: USDA, Economic Research Service, based...	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>
285	Updated: May 12, 2026	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>	<NA>

286 rows × 13 columns

Filtering with Pandas#

Perform a basic filter using standard Python/Pandas syntax to remove rows with missing data. This represents the initial Python-only stage of a processing chain.

full_rows = df[~df['Beginning stocks'].isna()]
full_rows

	Marketing year 1	Time period	Beginning stocks	Production	Imports 2	Total supply 3	Food use	Seed use	Feed and residual use	Total domestic use 3	Exports 2	Total disappearance 3	Ending stocks
0	1950/51	MY Jun-May	496.0	1019.0	11.0	1526.0	580.0	--	109.0	689.0	345.0	1034.0	492.0
1	1951/52	MY Jun-May	492.0	988.0	30.0	1510.0	585.0	--	110.0	695.0	485.0	1180.0	330.0
2	1952/53	MY Jun-May	330.0	1306.0	24.0	1660.0	578.0	--	78.0	656.0	332.0	988.0	672.0
3	1953/54	MY Jun-May	672.0	1173.0	6.0	1851.0	556.0	--	87.0	643.0	214.0	857.0	994.0
4	1954/55	MY Jun-May	994.0	984.0	3.0	1981.0	552.0	--	53.0	605.0	267.0	872.0	1109.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...
275	2025/26	MY Jun-May	854.734	1984.537	125.0	2964.271	960.0	59.7	100.0	1119.7	910.0	2029.7	934.571
276	2025/26	Q1 Jun-Aug	854.734	1984.537	30.593	2869.864	241.091	2.653	239.522	483.266	252.58	735.846	2134.018
277	2025/26	Q2 Sep-Nov	2134.018	0.0	30.078	2164.096	245.58	39.658	-54.047	231.191	255.802	486.993	1677.103
278	2025/26	Q3 Dec-Feb	1677.103	0.0	32.363	1709.466	230.975	1.75	-24.747	207.978	201.291	409.269	1300.197
279	2026/27	MY Jun-May	934.571	1561.322	140.0	2635.893	960.0	59	80.0	1099.0	775.0	1874.0	761.893

280 rows × 13 columns

Interoperate with SQL using the BigQuery SQL magics (%%bqsql)#

The BigQuery DataFrames library provides the %%bqsql magic, which acts as the bridge between your Python and SQL environments. It allows the BigQuery query engine to directly reference and query your local Pandas DataFrames (by implicitly uploading them as temporary tables) as well as actual BigQuery tables and external tables in GCS (Parquet, Iceberg, CSV).

To enable this integration in your notebook, load the bigframes extension. This is already completed in BigQuery Studio, Colab Enterprise, and Colab notebooks. For other environments, such as VS Code and Jupyter Lab, run the following cell:

%load_ext bigframes

To ensure the correct Google Cloud project is billed for query usage, including free tier usage, configure the project ID used by the magics. Even in the free sandbox tier, a project ID is required to allocate query resources. If you don’t set it explicitly, BigFrames will try to discover it from your environment (e.g., your Application Default Credentials).

import bigframes.pandas as bpd

PROJECT_ID = "" # @param {type:"string"}
bpd.options.bigquery.project = PROJECT_ID

Querying Local Pandas DataFrames with SQL#

With the project configured, you can now run SQL queries directly against your local Pandas DataFrame (full_rows) as if it were a table in BigQuery. Simply reference the variable name inside braces {full_rows} in your SQL query.

%%bqsql
SELECT * FROM {full_rows}

Query processed 0 Bytes. [Job bigframes-dev:US.c3c67902-6a45-492a-9491-a91daddaada1 details]

Load job c22ec1ce-09da-4ea1-b0a0-f28eee65aa20 is DONE. Open Job

Query processed 30.0 kB in a moment of slot time.

	Marketing year 1	Time period	Beginning stocks	Imports 2	Total supply 3	Food use	Seed use	Feed and residual use	Total domestic use 3	Exports 2	Total disappearance 3	Ending stocks
0	1980/81	Q2 Sep-Nov	2714.0	0.6	2714.6	162.1	76	4.865	242.965	379.335	622.3	2092.3
1	1987/88	Q2 Sep-Nov	2976.462	4.525	2980.987	193.048	58	-79.082	171.966	308.453	480.419	2500.568
2	2014/15	Q2 Sep-Nov	1907.22	34.551	1941.771	248.187	48.802	-92.585	204.404	207.737	412.141	1529.63
3	1976/77	Q2 Sep-Nov	2385.2	0.5	2385.7	153.0	64	-2.795	214.205	277.295	491.5	1894.2
4	1994/95	Q2 Sep-Nov	2069.494	21.423	2090.917	229.297	60.954	-28.64	261.611	338.202	599.813	1491.104
5	2002/03	Q2 Sep-Nov	1748.987	23.087	1772.074	237.754	54.599	-74.678	217.675	234.53	452.205	1319.869
6	2007/08	Q2 Sep-Nov	1716.927	21.486	1738.413	245.026	59.915	-119.882	185.059	421.416	606.475	1131.938
7	2025/26	Q2 Sep-Nov	2134.018	30.078	2164.096	245.58	39.658	-54.047	231.191	255.802	486.993	1677.103
8	1995/96	Q2 Sep-Nov	1881.099	16.252	1897.351	232.151	64.356	-98.182	198.325	360.759	559.084	1338.267
9	2001/02	Q2 Sep-Nov	2155.814	29.04	2184.854	245.088	51.601	-23.073	273.616	287.783	561.399	1623.455

10 rows × 13 columns

[280 rows x 13 columns in total]

You should see the results from full_rows.

Chaining SQL and Python: Saving SQL Results#

The true power of the %%bqsql magic lies in chaining. By providing a destination variable name as an argument to %%bqsql (e.g., %%bqsql destination_var), the query result is saved as a BigQuery DataFrame (a.k.a. BigFrames DataFrame) to that variable.

This DataFrame lives on the BigQuery engine but behaves like a Pandas DataFrame in Python. You can immediately use it in subsequent Python cells, or reference it again in another SQL cell. This allows you to build a multi-step, hybrid processing pipeline.

Filter the data to only yearly entries using SQL, and save the result into a new BigFrames DataFrame named yearly:

%%bqsql yearly
SELECT *
FROM {full_rows}
WHERE STARTS_WITH(`Time period`, 'MY')

Query processed 0 Bytes in a moment of slot time. [Job bigframes-dev:US.71850fc1-147f-44f7-b4c0-b94592f55639 details]

Load job aaa74c26-b3ff-422f-a670-93222188fe9f is DONE. Open Job

Query processed 30.0 kB in a moment of slot time.

	Marketing year 1	Time period	Beginning stocks	Production	Imports 2	Total supply 3	Food use	Seed use	Feed and residual use	Total domestic use 3	Exports 2	Total disappearance 3	Ending stocks
0	1955/56	MY Jun-May	1109.0	937.0	10.0	2056.0	553.0	--	51.0	604.0	322.0	926.0	1130.0
1	1957/58	MY Jun-May	1004.0	956.0	10.0	1970.0	547.0	--	43.0	590.0	418.0	1008.0	962.0
2	1954/55	MY Jun-May	994.0	984.0	3.0	1981.0	552.0	--	53.0	605.0	267.0	872.0	1109.0
3	1951/52	MY Jun-May	492.0	988.0	30.0	1510.0	585.0	--	110.0	695.0	485.0	1180.0	330.0
4	1956/57	MY Jun-May	1130.0	1005.0	8.0	2143.0	541.0	--	57.0	598.0	541.0	1139.0	1004.0
5	1950/51	MY Jun-May	496.0	1019.0	11.0	1526.0	580.0	--	109.0	689.0	345.0	1034.0	492.0
6	1962/63	MY Jun-May	1420.6	1092.0	5.3	2517.9	502.7	61.4	34.7	598.8	649.4	1248.2	1269.7
7	1959/60	MY Jun-May	1368.0	1118.0	7.0	2493.0	558.0	--	49.0	607.0	502.0	1109.0	1384.0
8	1963/64	MY Jun-May	1269.7	1146.8	4.0	2420.5	487.9	64.9	28.6	581.4	845.6	1427.0	993.5
9	1953/54	MY Jun-May	672.0	1173.0	6.0	1851.0	556.0	--	87.0	643.0	214.0	857.0	994.0

10 rows × 13 columns

[77 rows x 13 columns in total]

Chaining Step 2: Complex SQL Transformation on the BigFrames DataFrame#

Now, you can chain another SQL operation. Reference the yearly BigFrames DataFrame that you just created, extract the year using SQL regular expressions, cast it to a timestamp, and save the results into a new BigFrames DataFrame named timeseries.

Notice how you are building a chain: Local Pandas -> [SQL filter] -> BigFrames yearly -> [SQL transform] -> BigFrames timeseries.

%%bqsql timeseries
SELECT
  * EXCEPT (`Marketing year 1`),
  TIMESTAMP(CONCAT(
    REGEXP_EXTRACT(`Marketing year 1`, r'([0-9]+)\/'),
    '-01-01')) AS `year`
FROM {yearly}

Query processed 0 Bytes in a moment of slot time. [Job bigframes-dev:US.fdcdabc9-e1a3-47e6-ab27-9e19d9aaa106 details]

Load job ec6be16a-722d-4151-b7a6-5e410557577d is DONE. Open Job

Query processed 8.3 kB in a moment of slot time.

	Time period	Beginning stocks	Production	Imports 2	Total supply 3	Food use	Seed use	Feed and residual use	Total domestic use 3	Exports 2	Total disappearance 3	Ending stocks	year
0	MY Jun-May	1004.0	956.0	10.0	1970.0	547.0	--	43.0	590.0	418.0	1008.0	962.0	1957-01-01 00:00:00+00:00
1	MY Jun-May	672.0	1173.0	6.0	1851.0	556.0	--	87.0	643.0	214.0	857.0	994.0	1953-01-01 00:00:00+00:00
2	MY Jun-May	496.0	1019.0	11.0	1526.0	580.0	--	109.0	689.0	345.0	1034.0	492.0	1950-01-01 00:00:00+00:00
3	MY Jun-May	330.0	1306.0	24.0	1660.0	578.0	--	78.0	656.0	332.0	988.0	672.0	1952-01-01 00:00:00+00:00
4	MY Jun-May	962.0	1457.0	8.0	2427.0	561.0	--	48.0	609.0	450.0	1059.0	1368.0	1958-01-01 00:00:00+00:00
5	MY Jun-May	994.0	984.0	3.0	1981.0	552.0	--	53.0	605.0	267.0	872.0	1109.0	1954-01-01 00:00:00+00:00
6	MY Jun-May	492.0	988.0	30.0	1510.0	585.0	--	110.0	695.0	485.0	1180.0	330.0	1951-01-01 00:00:00+00:00
7	MY Jun-May	1130.0	1005.0	8.0	2143.0	541.0	--	57.0	598.0	541.0	1139.0	1004.0	1956-01-01 00:00:00+00:00
8	MY Jun-May	1109.0	937.0	10.0	2056.0	553.0	--	51.0	604.0	322.0	926.0	1130.0	1955-01-01 00:00:00+00:00
9	MY Jun-May	1368.0	1118.0	7.0	2493.0	558.0	--	49.0	607.0	502.0	1109.0	1384.0	1959-01-01 00:00:00+00:00

10 rows × 13 columns

[77 rows x 13 columns in total]

Chaining Back to Python: Visualizing BigFrames Data#

Now that you’ve completed some SQL transformations, you can chain back to Python for visualization. Because BigFrames DataFrames implement the Pandas API, you can call standard visualization methods (like .plot.line()) directly on the timeseries DataFrame without downloading the full dataset first. The computations happen in BigQuery, and only the summarized chart data is sent back to the notebook.

timeseries.set_index('year').sort_index().plot.line()

Query processed 8.8 kB in a moment of slot time.

Load job c6b4d65a-4555-4efc-9f8b-7156f4c62835 is DONE. Open Job

<Axes: xlabel='year'>

../../_images/fd96da1edeaafb8af9842d4307eaa64f4a000509f667ccec9ce8513186c1cc35.png

Downloading to Local Pandas (Optional Handoff)#

If you need to use local Python libraries that are not supported by BigFrames (such as custom plotting libraries or local ML frameworks), you can explicitly download the final transformed remote DataFrame into a standard local Pandas DataFrame using .to_pandas():

pddf = timeseries.set_index('year').sort_index().to_pandas()
pddf

	Time period	Beginning stocks	Production	Imports 2	Total supply 3	Food use	Seed use	Feed and residual use	Total domestic use 3	Exports 2	Total disappearance 3	Ending stocks
year
1950-01-01 00:00:00+00:00	MY Jun-May	496.0	1019.0	11.0	1526.0	580.0	--	109.0	689.0	345.0	1034.0	492.0
1951-01-01 00:00:00+00:00	MY Jun-May	492.0	988.0	30.0	1510.0	585.0	--	110.0	695.0	485.0	1180.0	330.0
1952-01-01 00:00:00+00:00	MY Jun-May	330.0	1306.0	24.0	1660.0	578.0	--	78.0	656.0	332.0	988.0	672.0
1953-01-01 00:00:00+00:00	MY Jun-May	672.0	1173.0	6.0	1851.0	556.0	--	87.0	643.0	214.0	857.0	994.0
1954-01-01 00:00:00+00:00	MY Jun-May	994.0	984.0	3.0	1981.0	552.0	--	53.0	605.0	267.0	872.0	1109.0
...	...	...	...	...	...	...	...	...	...	...	...	...
2022-01-01 00:00:00+00:00	MY Jun-May	674.431	1649.713	121.585	2445.729	971.677	68.369	75.503	1115.549	760.612	1876.161	569.568
2023-01-01 00:00:00+00:00	MY Jun-May	569.568	1803.942	137.798	2511.308	961.303	62.046	85.617	1108.966	705.908	1814.874	696.434
2024-01-01 00:00:00+00:00	MY Jun-May	696.434	1978.697	148.954	2824.085	969.493	61.1	112.863	1143.456	825.895	1969.351	854.734
2025-01-01 00:00:00+00:00	MY Jun-May	854.734	1984.537	125.0	2964.271	960.0	59.7	100.0	1119.7	910.0	2029.7	934.571
2026-01-01 00:00:00+00:00	MY Jun-May	934.571	1561.322	140.0	2635.893	960.0	59	80.0	1099.0	775.0	1874.0	761.893

77 rows × 12 columns

Conclusion: The Power of Hybrid Chaining#

By leveraging BigQuery DataFrames and the %%bqsql magic, you have built a powerful, interoperable pipeline that seamlessly transitions between SQL and Python.

This hybrid approach offers several key benefits:

Optimal Tool Selection: Use SQL for what it does best (complex queries, window functions, regex extractions on large sets) and Python for what it does best (visualization, statistical analysis, ML, orchestrating workflow).
Improved Readability: Instead of massive, unreadable SQL queries with dozens of CTEs, or long, complex Pandas method chains, you can split your pipeline into logical steps, alternating between SQL and Python.
Seamless Scaling: The exact same %%bqsql code can scale from a tiny local Pandas DataFrame to billions of rows in a production BigQuery table. You only need to swap the initial local Pandas DataFrame with a BigQuery DataFrame reference.

Next Steps#

In addition to the %%bqsql cell magic, BigFrames also registers a BigQuery Accessor on standard Pandas DataFrames, allowing you to run SQL scalar functions directly on local pandas data.

For example, you can call powerful Google Cloud community UDFs from BigQuery Utils, BigFunctions, or CARTO Analytics Toolbox for BigQuery using df.bigquery.sql_scalar(...):

Scaling Up: Advanced BigQuery Features#

While the BigQuery sandbox offers a powerful environment to test these hybrid Python-SQL workflows for free, some advanced features like BigQuery Machine Learning (BQML) are restricted. By connecting a billing account to your Google Cloud project, you can unlock advanced capabilities such as ML.FORECAST (or the AI.FORECAST function) to predict time-series data using Google’s state-of-the-art foundational models directly from your SQL/Python chain.

Feedback & Community#

The BigFrames team would love to hear your feedback on the hybrid Python-SQL experience:

Email: bigframes-feedback@google.com
Issues: File bug reports or feature requests on the open-source BigFrames repository.
Updates: To receive news and updates, subscribe to the BigFrames email list.