Is your feature request related to a problem? Please describe.
The current Spark implementation scans all parquet files. This can be made faster and more efficient by specifying a date_partition_column. During execution, this column would be used to filter the data at the file level, so only files whose partition date falls within the requested range would be scanned.
Describe the solution you'd like
Add date_partition_column to SparkSource. A similar implementation already exists for the AthenaSource.
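As a rough sketch of the idea (the helper name and signature below are hypothetical, not the actual SparkSource API), the date range could be turned into a predicate string that Spark pushes down so partitions outside the range are never read:

```python
from datetime import datetime


def build_partition_filter(date_partition_column: str,
                           start_date: datetime,
                           end_date: datetime) -> str:
    # Hypothetical helper: builds a pushdown predicate so Spark only reads
    # parquet partitions whose date falls in [start_date, end_date].
    start = start_date.strftime("%Y-%m-%d")
    end = end_date.strftime("%Y-%m-%d")
    return f"{date_partition_column} >= '{start}' AND {date_partition_column} <= '{end}'"


print(build_partition_filter("event_date",
                             datetime(2023, 1, 1),
                             datetime(2023, 1, 31)))
# → event_date >= '2023-01-01' AND event_date <= '2023-01-31'
```

In practice this string would be applied via something like `df.filter(...)` before the feature retrieval query runs, letting Spark's partition pruning skip files entirely.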
Describe alternatives you've considered
None
I have implemented this locally and it works. I'm happy to open a PR.