This proposal aims to add a centralized global mechanism that tracks all DataFrames persisted and/or checkpointed across different GraphFrames algorithms, serving as a common tool for consistency and efficiency.
Nice-to-haves:
- Methods to centralize DataFrame persistence, checkpointing, and temporary AQE disabling.
- Persisted DataFrames should set the internal `tableName` in Spark's Cache Manager, both to prevent Spark from performing an internal explain (see SPARK-50992) and to make them clearly identifiable in the Spark UI.
- A global configuration setting for the storage level, ensuring uniform persistence levels for all DataFrames.
- A `clearAll` method to unpersist all DataFrames persisted internally by GraphFrames and delete the checkpoint directory.
- "Nice" documentation ;)
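
To make the items above more concrete, here is a rough sketch of what such a registry could look like from the Python side. Everything in it is hypothetical: `PersistenceRegistry`, `clear_all`, and `aqe_disabled` are illustrative names, not existing GraphFrames API. It only relies on the DataFrame exposing `persist` / `unpersist` and on the session exposing `conf.get` / `conf.set`, and it assumes the checkpoint directory is on the local filesystem (a real implementation would have to go through the Hadoop FileSystem API). Setting the `tableName` in the Cache Manager is not covered here.

```python
import shutil
import threading
from contextlib import contextmanager


class PersistenceRegistry:
    """Hypothetical central registry; not part of the GraphFrames API."""

    def __init__(self, storage_level=None, checkpoint_dir=None):
        # One storage level shared by every DataFrame persisted through the registry.
        self.storage_level = storage_level
        # Assumption: a local path; HDFS/S3 would need the Hadoop FileSystem API.
        self.checkpoint_dir = checkpoint_dir
        self._tracked = []
        self._lock = threading.Lock()

    def persist(self, df):
        """Persist df with the global storage level and remember it."""
        if self.storage_level is not None:
            persisted = df.persist(self.storage_level)
        else:
            persisted = df.persist()
        with self._lock:
            self._tracked.append(persisted)
        return persisted

    def clear_all(self):
        """Unpersist everything tracked so far and delete the checkpoint directory.

        Returns the number of DataFrames that were unpersisted.
        """
        with self._lock:
            tracked, self._tracked = self._tracked, []
        for df in tracked:
            df.unpersist()
        if self.checkpoint_dir is not None:
            shutil.rmtree(self.checkpoint_dir, ignore_errors=True)
        return len(tracked)


@contextmanager
def aqe_disabled(spark):
    """Temporarily turn off AQE and restore the previous setting afterwards."""
    key = "spark.sql.adaptive.enabled"
    # Assumption: fall back to "true" if the key is not set explicitly.
    previous = spark.conf.get(key, "true")
    spark.conf.set(key, "false")
    try:
        yield
    finally:
        spark.conf.set(key, previous)
```

An algorithm would then call `registry.persist(df)` instead of `df.persist()` for its intermediate results, wrap AQE-sensitive steps in `with aqe_disabled(spark):`, and the user could call `registry.clear_all()` once the results are no longer needed.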
Related Issues: