Weird bug in TabularDataset.column_names #589

I ran into a very weird issue.
I imported the well-known Retail Stockout prediction dataset, in CSV format, into Vertex AI Datasets using the `google.cloud.aiplatform.TabularDataset` Python library code.

Most columns have the "Wk_" prefix. The screenshot shows that only one column contains "2016_43_Quantity": the "Wk_2016_43_Quantity" column, just like in the source CSV.
So far, everything is fine.

But here is the problem:
When I call the API to get the dataset metadata, including the column names, every column name is correct except one, which is returned as "WWk_2016_43_Quantity" (notice the double "W" in the "WWk_" prefix).
In context:
...
 'Wk_2016_42_Quantity',
 'WWk_2016_43_Quantity',
 'Wk_2016_44_Quantity',
...

This discrepancy causes the subsequent model training to fail, because the dataset has no `WWk_2016_43_Quantity` column (it has `Wk_2016_43_Quantity` instead).
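To make the mismatch easy to spot, here is a minimal sketch that diffs the names from the source CSV against the names the library returns. The helper `diff_column_names` is hypothetical (it is not part of `google-cloud-aiplatform`), and the two lists are just the three-column excerpt shown above, not the full dataset.

```python
from typing import Dict, List


def diff_column_names(expected: List[str], reported: List[str]) -> Dict[str, List[str]]:
    """Return the names that appear in only one of the two lists, in order."""
    expected_set, reported_set = set(expected), set(reported)
    return {
        "missing_from_reported": [c for c in expected if c not in reported_set],
        "unexpected_in_reported": [c for c in reported if c not in expected_set],
    }


# Names as they appear in the source CSV (excerpt from this issue).
csv_columns = ["Wk_2016_42_Quantity", "Wk_2016_43_Quantity", "Wk_2016_44_Quantity"]
# Names as returned by TabularDataset.column_names (excerpt from this issue).
api_columns = ["Wk_2016_42_Quantity", "WWk_2016_43_Quantity", "Wk_2016_44_Quantity"]

result = diff_column_names(csv_columns, api_columns)
print(result)
```

In a real check, `api_columns` would presumably be the value of `column_names` and `csv_columns` the header row of the CSV.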

I do not understand how this could have happened, but you can easily examine the imported dataset and see that what the UI shows and what the google-cloud-aiplatform library returns differ.



#### Environment details

  - OS type and version: Linux
  - Python version: 3.7
  - `google-cloud-aiplatform` version: 1.1.1

#### Steps to reproduce

  1. Create a dataset from the "gs://kubeflow-pipelines-regional-us-central1/mirror/cloud-ml-data/automl-tables/notebooks/stockout.csv" file
  2. Try getting its column names

#### Code example

```python
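from google.cloud import aiplatform
# Load the existing dataset by resource name, then print the column names the library reports.
dataset = aiplatform.TabularDataset('projects/140626129697/locations/us-central1/datasets/2405036550225133568')
print(dataset.column_names)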
```
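For comparison, the header can also be read straight from the CSV with the standard library, independently of `google-cloud-aiplatform`. This is only a sketch: `read_csv_header` is a hypothetical helper, and the in-memory sample below is a made-up stand-in for the real file, using the three column names quoted in this issue.

```python
import csv
import io
from typing import List, TextIO


def read_csv_header(stream: TextIO) -> List[str]:
    """Return the first row of a CSV stream as a list of column names."""
    return next(csv.reader(stream))


# Tiny in-memory stand-in for the real file; the data row is invented.
sample = io.StringIO(
    "Wk_2016_42_Quantity,Wk_2016_43_Quantity,Wk_2016_44_Quantity\n1,2,3\n"
)
header = read_csv_header(sample)
print(header)
```

With a local copy of the file (an assumption, since the original lives in GCS), the same helper could be called on `open(path, newline="")` and the result compared with `column_names`.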


