feat(bigquery): move the bigquery backend back into the main ibis repo by cpcloud · Pull Request #4797 · ibis-project/ibis

cpcloud · 2022-11-08T15:48:00Z

This PR moves bigquery back into the main ibis repo.

~~Still working through the failing tests, though many are fixed.~~ Tests are passing.

TODOs:

get all tests passing locally
see if we can automatically handle the autonaming we're doing that bigquery doesn't accept
set up CI similar to ibis-bigquery if possible, though maybe we just run these tests on push events only, similar to snowflake

Possible follow ups:

delete <4 legacy code
delete <4 legacy tests
move SQL tests to write-then-compare so that they are easy to modify
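
A minimal sketch of what "write-then-compare" could look like, assuming a plain-file snapshot on disk. The helper name `assert_matches_snapshot` and its `update` flag are hypothetical and only illustrate the idea; this is not the API of any particular pytest plugin or of ibis itself.

```python
from pathlib import Path


def assert_matches_snapshot(actual: str, path: Path, update: bool = False) -> None:
    """Compare `actual` against the text stored at `path` (hypothetical helper).

    When `update` is true, or when no snapshot exists yet, the text is
    written to `path` instead of being compared.
    """
    if update or not path.exists():
        # "Write" step: record the generated SQL as the new expected output.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(actual)
        return
    # "Compare" step: the stored file is the expected output.
    expected = path.read_text()
    assert actual == expected, f"snapshot mismatch for {path}"
```

With this shape, changing the expected SQL means rerunning the tests with `update=True` and reviewing the resulting file diff, rather than editing string literals inside the test code.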

github-actions · 2022-11-08T16:09:27Z

Test Results

      41 files       41 suites 1h 37m 12s ⏱️
11 830 tests   8 998 ✔️   2 832 💤 0 ❌
42 238 runs 32 018 ✔️ 10 220 💤 0 ❌

Results for commit 5be3c16.

♻️ This comment has been updated with latest results.

jreback · 2022-11-14T16:33:52Z

@cpcloud why do we view this as a good thing?

cpcloud · 2022-11-14T16:59:15Z

@jreback Good question, thanks for bringing it up.

The primary reason is to prevent the maintenance burden that comes along with a separate repo.

I give a more detailed answer to your question here (ibis-project/ibis-bigquery#151).

In short, many of the things we expected to be benefits of a separate repo in practice either increase maintenance work or have a negligible effect on it.

jreback · 2022-11-14T17:57:54Z

@cpcloud sure

is this a general change in policy though? or a specific one off for BQ?

eg what about a lot of the other google variants or mssql for example

cpcloud · 2022-11-21T13:35:35Z

@tswast Friendly ping! Any thoughts on this PR?

codecov · 2022-11-21T17:13:59Z

Codecov Report

Merging #4797 (986544d) into master (a2d03d1) will decrease coverage by 5.15%.
The diff coverage is 2.37%.

❗ Current head 986544d differs from pull request most recent head 4c16755. Consider uploading reports for the commit 4c16755 to get more accurate results

@@            Coverage Diff             @@
##           master    #4797      +/-   ##
==========================================
- Coverage   92.88%   87.73%   -5.16%     
==========================================
  Files         192      204      +12     
  Lines       21731    22830    +1099     
  Branches     3011     3124     +113     
==========================================
- Hits        20185    20030     -155     
- Misses       1129     2389    +1260     
+ Partials      417      411       -6

Impacted Files	Coverage Δ
ibis/backends/bigquery/client.py	`0.00% <0.00%> (ø)`
ibis/backends/bigquery/compiler.py	`0.00% <0.00%> (ø)`
ibis/backends/bigquery/datatypes.py	`0.00% <0.00%> (ø)`
ibis/backends/bigquery/operations.py	`0.00% <0.00%> (ø)`
ibis/backends/bigquery/registry.py	`0.00% <0.00%> (ø)`
ibis/backends/bigquery/rewrites.py	`0.00% <0.00%> (ø)`
ibis/backends/bigquery/udf/__init__.py	`0.00% <0.00%> (ø)`
ibis/backends/bigquery/udf/core.py	`0.00% <0.00%> (ø)`
ibis/backends/bigquery/udf/find.py	`0.00% <0.00%> (ø)`
ibis/backends/bigquery/udf/rewrite.py	`0.00% <0.00%> (ø)`
... and 24 more

tswast

BQ changes LGTM. I like the "snapshot" structure in the tests.

cpcloud · 2022-11-23T21:28:12Z

    def fetch_from_cursor(self, cursor, schema):
        query = cursor.query
-        df = query.to_arrow().to_pandas(timestamp_as_object=True)
+        query_result = query.result()


@tswast Can you take a look at this block of code and say whether this is expected behavior?

The use case is reading from bigquery-public-data.hacker_news.comments, but having ibis-gbq be the billing project.

Without this workaround, the storage API creates a read session in the data project (bigquery-public-data), which causes queries to fail when using the pyarrow functionality.

I wouldn't expect this to be necessary. If the query succeeded, then I assume the billing project is being set correctly in the client constructor

ibis/ibis/backends/bigquery/__init__.py

Line 169 in 01bd402

project=new_backend.billing_project,

and in the query method

ibis/ibis/backends/bigquery/__init__.py

Line 236 in 01bd402

stmt, job_config=job_config, project=self.billing_project

I've filed googleapis/python-bigquery#1422 to investigate this further, but I think it's fine to keep this workaround if there really is a bug.

Update: I think there really is a bug. The project from "client" is used instead of the project from the QueryJob.

Ah, okay. Thanks Tim. I'll keep this comment unresolved so the link is easier to find.
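
For readers following the thread, here is a hedged sketch of one way to sidestep the read-session problem described above. It is not the workaround used in this PR (the diff above is truncated and its remainder is not reproduced here). It assumes that skipping the BigQuery Storage API is acceptable and passes `create_bqstorage_client=False` to `RowIterator.to_arrow` so that no read session is created at all. The function name `fetch_arrow` is hypothetical.

```python
def fetch_arrow(client, sql, billing_project):
    """Run `sql` billed to `billing_project` and return the result as Arrow.

    `client` is assumed to be a `google.cloud.bigquery.Client`, or any
    object exposing the same `query` method.
    """
    # Pass the billing project explicitly, as in the query call quoted
    # in this thread.
    job = client.query(sql, project=billing_project)
    # Wait for the job to finish and get the row iterator, rather than
    # calling `job.to_arrow()` directly.
    rows = job.result()
    # Assumption: downloading over the REST API instead of the Storage API
    # avoids creating a read session in the data project. This is likely
    # slower for large results.
    return rows.to_arrow(create_bqstorage_client=False)
```

The trade-off is download speed: the Storage API is the faster path for large tables, so this variant only makes sense where correctness of the billing project matters more than throughput.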

cpcloud · 2022-11-27T13:09:46Z

Ok, I'm going to merge this in and fix any issues with the CI. Thanks all for the help reviewing, great to see this back in the main repo!

cpcloud added this to the 4.0.0 milestone Nov 8, 2022

cpcloud added the community (Issues or PRs requiring help from the community) label Nov 8, 2022

cpcloud force-pushed the ibis-bigquery branch 4 times, most recently from 362feca to 0a44e88 on November 11, 2022 14:26

cpcloud added the bigquery (The BigQuery backend) label Nov 11, 2022

cpcloud force-pushed the ibis-bigquery branch 2 times, most recently from 4d93724 to 9ad71ad on November 11, 2022 16:18

mik-laj reviewed Nov 11, 2022


Comment thread .github/workflows/ibis-backends-cloud.yml Outdated

cpcloud force-pushed the ibis-bigquery branch 2 times, most recently from a9f1703 to c5b2f6f on November 13, 2022 14:28

mik-laj reviewed Nov 14, 2022


Comment thread ibis/tests/sql/conftest.py Outdated

cpcloud mentioned this pull request Nov 14, 2022

test: use pytest-snapshot for testing generated SQL #4836

Merged

cpcloud force-pushed the ibis-bigquery branch 2 times, most recently from 3a634f0 to 58a8347 on November 14, 2022 14:08

cpcloud force-pushed the ibis-bigquery branch from d834c60 to bd24e9f on November 14, 2022 22:56

cpcloud marked this pull request as ready for review November 14, 2022 23:27