Implement BigQuery Table Schema Update Operator #15367
The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.
@marcosmarxm How important would you say it is to move the functionality into a hook? Solving the use case was easy with two already existing hooks, and I don't think I need to create an additional hook, given that I don't implement anything new in how we talk to BigQuery. Further, I think mutating a mutable object shouldn't be a problem in this case, and I am not too keen to deepcopy it for the sake of it. Lastly, on the naming of the schema_fields parameter: I wanted to be consistent with the naming and input format of that field in other operators. Do you think it's a problem? Changing it to updates_to_schema_fields or similar would be easy.
The CI broke because of pylint. Can you run pre-commit locally to organize the imports?
@marcosmarxm |
The Workflow run is cancelling this PR. Building images for the PR has failed. Follow the workflow link to check the reason.
Thanks for the thorough reviews @marcosmarxm and @tswast. Great job @thejens!
With this change we implement a new operator that handles patching of table schemas in BigQuery.
This is needed because typing out an entire schema data structure just to set, e.g., a description on a single field requires a lot of overhead. Also, the schema is often not known in advance or is very complex, as it may be the result of a query or be inferred automatically when importing files as tables.
This operator is useful for a workflow like:
Upstream: Create a BigQuery table as the output of a query or import operator. The writer of the job/operator knows the names of the fields, perhaps the types, but not necessarily how other schema properties are defined.
Downstream (this operator): Supply a partial schema definition that contains only field names and description values, which will be patched onto the "generated by BigQuery" schema from upstream.
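
The patching step described above can be sketched roughly as follows. This is not the operator's code from this PR; it is a minimal, standalone illustration of merging a partial schema into an existing one by field name. The function name `patch_schema_fields` and the sample field names are hypothetical, and the sketch returns a copy rather than mutating its input, which is a choice made here for clarity and not necessarily what the PR does.

```python
from copy import deepcopy
from typing import Any, Dict, List

Field = Dict[str, Any]


def patch_schema_fields(existing: List[Field], updates: List[Field]) -> List[Field]:
    """Return a copy of `existing` with `updates` merged in, matched on field name.

    Hypothetical helper for illustration only; not part of the Airflow provider.
    """
    patched = deepcopy(existing)
    by_name = {field["name"]: field for field in patched}
    for update in updates:
        name = update["name"]
        if name not in by_name:
            raise ValueError(f"Field {name!r} not found in existing schema")
        target = by_name[name]
        for key, value in update.items():
            if key == "fields":
                # Nested RECORD field: patch the sub-fields recursively.
                target["fields"] = patch_schema_fields(target.get("fields", []), value)
            else:
                # Scalar property such as "description": overwrite it.
                target[key] = value
    return patched


# Schema as it might come back from the upstream table (assumed sample data).
existing = [
    {"name": "emp_name", "type": "STRING", "mode": "REQUIRED"},
    {
        "name": "address",
        "type": "RECORD",
        "mode": "NULLABLE",
        "fields": [{"name": "city", "type": "STRING", "mode": "NULLABLE"}],
    },
]

# Partial schema: only names plus the properties to patch.
updates = [
    {"name": "emp_name", "description": "Name of employee"},
    {"name": "address", "fields": [{"name": "city", "description": "City of residence"}]},
]

patched = patch_schema_fields(existing, updates)
```

Properties not mentioned in `updates` (here `type` and `mode`) are carried over unchanged, which is what lets the downstream task stay ignorant of the parts of the schema that BigQuery generated.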