feat(bigquery): move the bigquery backend back into the main ibis repo#4797
feat(bigquery): move the bigquery backend back into the main ibis repo#4797cpcloud merged 1 commit intoibis-project:masterfrom
Conversation
362feca to
0a44e88
Compare
4d93724 to
9ad71ad
Compare
a9f1703 to
c5b2f6f
Compare
3a634f0 to
58a8347
Compare
|
@cpcloud why do we view this as a good thing? |
|
@jreback Good question, thanks for bringing it up. The primary reason is to prevent the maintenance burden that comes along with a separate repo. I give a more detailed answer to your question here (ibis-project/ibis-bigquery#151). In short, many of the things we thought would be good about having a separate repo in practice increase maintenance work or have a negligible effect on the amount of maintenance work. |
|
@cpcloud sure is this a general change in policy though? or a specific one off for BQ? eg what about a lot of the other google variants or mssql for example |
d834c60 to
bd24e9f
Compare
e9d38e8 to
d0d7624
Compare
f2a7c12 to
bdd83fc
Compare
|
@tswast Friendly ping! Any thoughts on this PR? |
bdd83fc to
c2d767c
Compare
Codecov Report
@@ Coverage Diff @@
## master #4797 +/- ##
==========================================
- Coverage 92.88% 87.73% -5.16%
==========================================
Files 192 204 +12
Lines 21731 22830 +1099
Branches 3011 3124 +113
==========================================
- Hits 20185 20030 -155
- Misses 1129 2389 +1260
+ Partials 417 411 -6
|
9891fcf to
659ba53
Compare
tswast
left a comment
There was a problem hiding this comment.
BQ changes LGTM. I like the "snapshot" structure in the tests.
c6dc72c to
60eab8f
Compare
| def fetch_from_cursor(self, cursor, schema): | ||
| query = cursor.query | ||
| df = query.to_arrow().to_pandas(timestamp_as_object=True) | ||
| query_result = query.result() |
There was a problem hiding this comment.
@tswast Can you take a look at this block of code and say whether this is expected behavior?
The use case is reading from bigquery-public-data.hacker_news.comments, but having ibis-gbq be the billing project.
Without this workaround, the storage API creates a read session in the data project (bigquery-public-data), which causes queries to fail when using the pyarrow functionality.
There was a problem hiding this comment.
I wouldn't expect this to be necessary. If the query succeeded, then I assume the billing project is being set correctly in the client constructor
ibis/ibis/backends/bigquery/__init__.py
Line 169 in 01bd402
ibis/ibis/backends/bigquery/__init__.py
Line 236 in 01bd402
I've filed googleapis/python-bigquery#1422 to investigate this further, but I think it's fine to keep this workaround if there really is a bug.
There was a problem hiding this comment.
Update: I think there really is a bug. The project from "client" is used instead of the project from the QueryJob.
There was a problem hiding this comment.
Ah, okay. Thanks Tim. I'll keep this comment unresolved so the link is easier to find.
a9a2568 to
747e808
Compare
|
Ok, I'm going to merge this in and fix any issues with the CI. Thanks all for the help reviewing, great to see this back in the main repo! |
747e808 to
8d75e33
Compare
8d75e33 to
4c16755
Compare
This PR moves bigquery back into the main ibis repo.
Still working through the failing tests, though many are fixed.Tests are passing.TODOs:
Possible follow ups: