Skip to content

The --skip-source-validation option not working #3682

@InstructorZhang

Description

@InstructorZhang

Expected Behavior

When running feast apply --skip-source-validation, the data source validation should be skipped.

Current Behavior

When I specify --skip-source-validation with feast apply, it will call store.plan(repo) (see code here), which will invoke _make_inferences() and then _infer_features_and_entities(). The function _infer_features_and_entities() calls the function get_table_column_names_and_types(). On the other hand, what the function validate() in Spark source does is also to call get_table_column_names_and_types().

That means the validation was not skipped because the function get_table_column_names_and_types() will be called anyway, which causes the "table not found" error even if I ran feast apply --skip-source-validation.

The logs for error are as follows:

Traceback (most recent call last):
  File "/opt/homebrew/anaconda3/envs/feast/bin/feast", line 8, in <module>
    sys.exit(cli())
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/cli.py", line 519, in apply_total_command
    apply_total(repo_config, repo, skip_source_validation)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/usage.py", line 283, in wrapper
    return func(*args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/repo_operations.py", line 335, in apply_total
    apply_total_with_repo_instance(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/repo_operations.py", line 296, in apply_total_with_repo_instance
    registry_diff, infra_diff, new_infra = store.plan(repo)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/usage.py", line 283, in wrapper
    return func(*args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/feature_store.py", line 724, in plan
    self._make_inferences(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/feature_store.py", line 602, in _make_inferences
    update_feature_views_with_inferred_features_and_entities(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/inference.py", line 168, in update_feature_views_with_inferred_features_and_entities
    _infer_features_and_entities(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/inference.py", line 206, in _infer_features_and_entities
    table_column_names_and_types = fv.batch_source.get_table_column_names_and_types(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast_spark/spark_source.py", line 161, in get_table_column_names_and_types
    df = spark_session.sql(f"SELECT * FROM {self.get_table_query_string()}")
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/pyspark/sql/session.py", line 1440, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/pyspark/errors/exceptions/captured.py", line 175, in deco
    raise converted from None
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `XXXXXX` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; line 18 pos 9;

Steps to reproduce

Can only reproduce in our internal environment where there are permission controls.

Specifications

  • Version: 0.31.1
  • Platform: Linux
  • Subsystem: Ubuntu

Possible Solution

Need to re-investigate the logics/implementation of skipping source validation.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions