chore(bigframes): optimize system test teardown#17443
Conversation
There was a problem hiding this comment.
Code Review
This pull request parallelizes table deletion using a ThreadPoolExecutor and moves the Python UDF cleanup process to a background daemon thread in anonymous_dataset.py. Additionally, it configures pytest to use the worksteal distribution algorithm in noxfile.py. Feedback on these changes highlights potential issues with silent failures during table deletion due to unhandled exceptions in the thread pool, and warns that running the UDF cleanup in a daemon thread may lead to incomplete cleanup and secondary exceptions during interpreter shutdown.
| with ThreadPoolExecutor() as executor: | ||
| for table_ref in self._table_ids: | ||
| executor.submit(self.bqclient.delete_table, table_ref, not_found_ok=True) | ||
| self._table_ids.clear() |
There was a problem hiding this comment.
Using ThreadPoolExecutor as a context manager waits for all threads to complete, but it silently swallows any exceptions raised by the submitted tasks (unless future.result() is explicitly called). To prevent silent failures during table deletion, we should retrieve the results of the futures. Additionally, we can avoid spinning up the thread pool entirely if self._table_ids is empty, and use a try...finally block to ensure self._table_ids.clear() is always called.
if self._table_ids:
try:
with ThreadPoolExecutor() as executor:
futures = [
executor.submit(self.bqclient.delete_table, table_ref, not_found_ok=True)
for table_ref in self._table_ids
]
for future in futures:
future.result()
finally:
self._table_ids.clear()| ) | ||
| warnings.warn(msg, category=bfe.CleanupFailedWarning) | ||
|
|
||
| threading.Thread(target=run_cleanup, daemon=True, name="bigframes-udf-cleanup").start() |
There was a problem hiding this comment.
Starting the UDF cleanup in a daemon thread (daemon=True) means that if the Python process exits shortly after close() is called (which is very common at the end of a script or test run), the thread will be abruptly terminated. This can prevent the cleanup from completing, leading to resource accumulation (the very issue b/450913424 is trying to solve).
Additionally, calling warnings.warn from a daemon thread during interpreter shutdown can raise secondary exceptions (like TypeError or AttributeError due to modules being cleaned up and set to None).
Consider whether this cleanup should be synchronous, or if there is a way to register an atexit handler to ensure the thread completes before the process exits.
brings down bigframes system tests from 23 mins to 12 mins i.e. 2x speed up.