Issue
I'm currently using the python sdk to transform and load data into BQ. One of the fields I have is supposed to be a string but the data will come in as either a long float number (ex: 75257229615211513551680283362634662820), int number ( ex: 4 ), or string (ex: http://some-string-url ).
So the data representation looks like
{
"count": "6",
"value": "75257229615211513551680283362634662820"
},
{
"count": "4",
"value": "5"
},
{
"count": "4",
"value": "http://some-string-url"
}
Since BQ does the auto detection, these values can be recognized as either a FLOAT, INT, or STRING. I followed docs, and stack recommendations to assign a hard schema to the config which looks like
table = bigquery.table.Table(table_ref)
job_config = bigquery.LoadJobConfig()
if table_name == self.incident_table_name:
job_config.schema = self.incidents_schema
if table_name == self.kpi_table_name:
job_config.schema = self.kpi_schema
job_config.autodetect = False # should already be assigned to False in the SDK ( just for a hard assurance )
The schema is read in from a json file that looks like
"fields": [
{
"mode": "NULLABLE",
"name": "count",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "value",
"type": "STRING"
}
],
and parsed into the SchemaField() object..
field_schema = bigquery.SchemaField(name=name,
field_type=field_type,
mode=mode,
fields=subschema
)
Just to be clear i do believe that this is apart of a subschema and assigned to the fields as it's an array of objects.
then i submit the data through the SDK via the load_table_from_json which looks like
job = params.bq_client.load_table_from_json([incident],
job_config=job_config,
destination=table)
result = job.result()
All fairly straightforward, and follows closely with what is documented and shared by others. BUT for whatever reason, the API request will fail and continue to try parse value as either a STRING, FLOAT or INT even with the schema defined.
So a couple questions here... is this a bug with BQ, or the SDK not accepting the schema? Am i doing something wrong here? Does the autodetect feature always do its thing even when turned off and a schema defined?
Any communication here would be greatly appreciated.
Cheers
info
sdk version: google-cloud-bigquery 2.20.0
os: MacOS
environment: Cloud Function
python version: 3.9
Issue
I'm currently using the python sdk to transform and load data into BQ. One of the fields I have is supposed to be a string but the data will come in as either a long float number (ex:
75257229615211513551680283362634662820), int number ( ex:4), or string (ex:http://some-string-url).So the data representation looks like
Since BQ does the auto detection, these values can be recognized as either a FLOAT, INT, or STRING. I followed docs, and stack recommendations to assign a hard schema to the config which looks like
The schema is read in from a json file that looks like
and parsed into the SchemaField() object..
Just to be clear i do believe that this is apart of a
subschemaand assigned to thefieldsas it's an array of objects.then i submit the data through the SDK via the
load_table_from_jsonwhich looks likeAll fairly straightforward, and follows closely with what is documented and shared by others. BUT for whatever reason, the API request will fail and continue to try parse
valueas either a STRING, FLOAT or INT even with the schema defined.So a couple questions here... is this a bug with BQ, or the SDK not accepting the schema? Am i doing something wrong here? Does the autodetect feature always do its thing even when turned off and a schema defined?
Any communication here would be greatly appreciated.
Cheers
info
sdk version:
google-cloud-bigquery 2.20.0os: MacOS
environment: Cloud Function
python version: 3.9