feat: Added AutoMLTablesTrainingJob and tests #62
Conversation
Force-pushed from 338063f to 2802dc8
    "transformations": self._column_transformations,
    "trainBudgetMilliNodeHours": budget_milli_node_hours,
    # optional inputs
    "weightColumnName": weight_column,
In google3/google/cloud/aiplatform/publicfiles/trainingjob/definition/automl_tables.proto, this column is referred to as "weight_column_name".
However, in gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_tabular_1.0.0.yaml, it gives the name as "weightColumn". This is incorrect and results in an error at training time.
@sasha-gitg why is there a discrepancy here?
I imagine the protos are the source of truth and the yaml is just out of date? If so, what's our plan to keep users from running into this, since they don't have access to the protos AFAIK.
Synced with the team. The yaml is incorrect and will be updated; the field should be weightColumnName. If users are using our Model Builder SDK, then hopefully we have caught these issues early during development. This should also become less of an issue going forward, because the service will support all previously versioned yaml schemas.
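Given that resolution, a hedged sketch of how the training task inputs might be assembled with the corrected key. The helper name and shape are illustrative only, not the SDK's actual code:

```python
def build_training_task_inputs(
    column_transformations, budget_milli_node_hours, weight_column=None
):
    """Assemble AutoML Tables training task inputs (illustrative sketch).

    Field names follow the proto: per the discussion above, the correct key
    is "weightColumnName", not the "weightColumn" in the outdated 1.0.0 yaml.
    """
    inputs = {
        "transformations": column_transformations,
        "trainBudgetMilliNodeHours": budget_milli_node_hours,
    }
    # Optional inputs are attached only when supplied.
    if weight_column is not None:
        inputs["weightColumnName"] = weight_column
    return inputs
```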
Force-pushed from 2759662 to 36d6b7c
def __init__(
    self,
    display_name: str,
    optimization_objective: str,
According to the protos and yaml, the optimization_objective is optional since there is a default if it's not supplied.
Also, the design doc doesn't have the optimization_prediction_type parameter either. Sure, we can support this by inferring that info based on the optimization_objective, but is that what you intended?
My personal preference would be to create an OptimizationObjective abstract class with a subclass for each optimization_objective type. That would encapsulate the optimization_objective, optimization_prediction_type, optimization_objective_recall_value, and optimization_objective_precision_value parameters together in one place.
However, from our other convos it seems like you might prefer just passing in a string for optimization_objective and having the other parameters be optional.
Let me know what you prefer, regarding the type for optimization_objective and whether or not to include an optimization_prediction_type parameter.
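For concreteness, the encapsulation idea above might look something like this. All class names and field keys are hypothetical, not part of the SDK:

```python
import abc


class OptimizationObjective(abc.ABC):
    """Bundles an objective string with its prediction type and thresholds."""

    objective: str
    prediction_type: str

    def to_inputs(self):
        # Base payload shared by every objective subclass.
        return {
            "optimizationObjective": self.objective,
            "predictionType": self.prediction_type,
        }


class MinimizeRmse(OptimizationObjective):
    objective = "minimize-rmse"
    prediction_type = "regression"


class MaximizePrecisionAtRecall(OptimizationObjective):
    objective = "maximize-precision-at-recall"
    prediction_type = "classification"

    def __init__(self, recall_value: float):
        self.recall_value = recall_value

    def to_inputs(self):
        # Threshold-style objectives carry their extra parameter along.
        inputs = super().to_inputs()
        inputs["optimizationObjectiveRecallValue"] = self.recall_value
        return inputs
```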
With respect to optimization_prediction_type, we received feedback to include it as the top-level argument because it requires less knowledge overhead than optimization_objective. Additionally, we should include all the values flattened in the API surface and validate or ignore arguments based on valid combinations. There's precedent for this pattern:
https://github.com/scikit-learn/scikit-learn/blob/0fb307bf3/sklearn/linear_model/_logistic.py#L1011
So the current state of the input arguments LGTM; just elevate optimization_prediction_type above optimization_objective, make optimization_objective optional, and add comments explaining the defaults applied for each optimization_prediction_type.
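A minimal sketch of that flattened-and-validated surface, assuming illustrative objective names and defaults; the real valid sets and defaults should come from the proto/yaml schemas:

```python
# Assumed valid objective strings per prediction type (illustrative).
_VALID_OBJECTIVES = {
    "classification": {
        "maximize-au-roc",
        "maximize-au-prc",
        "maximize-precision-at-recall",
        "maximize-recall-at-precision",
    },
    "regression": {"minimize-rmse", "minimize-mae", "minimize-rmsle"},
}

# Default objective when none is supplied (assumed, verify against schema).
_DEFAULT_OBJECTIVE = {
    "classification": "maximize-au-roc",
    "regression": "minimize-rmse",
}


def resolve_objective(optimization_prediction_type, optimization_objective=None):
    """Validate the (prediction_type, objective) combination and fill defaults."""
    if optimization_prediction_type not in _VALID_OBJECTIVES:
        raise ValueError(
            f"Unknown prediction type: {optimization_prediction_type!r}"
        )
    if optimization_objective is None:
        return _DEFAULT_OBJECTIVE[optimization_prediction_type]
    if optimization_objective not in _VALID_OBJECTIVES[optimization_prediction_type]:
        raise ValueError(
            f"{optimization_objective!r} is not valid for "
            f"{optimization_prediction_type!r}"
        )
    return optimization_objective
```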
Thanks for the context, sgtm!
Force-pushed from 5831100 to c175714
init = initializer.global_config.init

__all__ = ("gapic", "CustomTrainingJob", "Model", "Dataset", "Endpoint")
__all__ = (
The linter did this
Force-pushed from c12d2b9 to 7858dfd
    validation_fraction_split: float,
    test_fraction_split: float,
    weight_column: Optional[str] = None,
    budget_milli_node_hours: int = 1000,
I'm guessing that setting this below 1000 milli node hours will cause the backend to use 1000 instead, and likewise for the maximum value. I have no idea whether that's actually true, though, and it seems opaque to users. Ideally, the backend would respond with a warning when the parameter is out of bounds.
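A client-side version of that warning could be sketched as follows. The 1000 to 72000 milli-node-hour bounds are an assumption taken from the AutoML Tables budget range and should be verified against the service:

```python
import warnings

# Assumed budget bounds in milli node hours (1 to 72 node hours).
MIN_BUDGET, MAX_BUDGET = 1_000, 72_000


def check_budget(budget_milli_node_hours: int) -> int:
    """Warn (rather than fail) when the budget is outside the assumed bounds."""
    if not MIN_BUDGET <= budget_milli_node_hours <= MAX_BUDGET:
        warnings.warn(
            f"budget_milli_node_hours={budget_milli_node_hours} is outside "
            f"[{MIN_BUDGET}, {MAX_BUDGET}]; the backend may clamp it.",
            UserWarning,
        )
    return budget_milli_node_hours
```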
Force-pushed from 2e4dc7b to caf7dc4
Added the training job subclass for tables
Force-pushed from caf7dc4 to 072850f
Added the training job subclass for tables Co-authored-by: Ivan Cheung <ivanmkc@google.com>
Added support for AutoMLTablesTraining.
Fixes https://b.corp.google.com/issues/172282518
Depends on #49
Could possibly add more client-side validation, but currently deferring validation to the backend services pending more discussion.