Previously we were able to run connectedComponents() on ~150 million records with persist(), but persist() now seems to be causing a problem: the job either stops making progress or slows down, without reporting any error.
Note: We are running this job on Dataproc with the following cluster specs:
Workers: 20
Memory: 64 GB
Cores: 16
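
For scale, here is a rough back-of-the-envelope sketch of the total capacity these specs imply. It assumes the memory and core figures are per worker rather than cluster totals (the specs do not say which), and the variable names are only illustrative.

```python
# Assumption: the memory and core figures in the specs are per worker,
# not totals for the whole cluster.
WORKERS = 20
MEMORY_GB_PER_WORKER = 64
CORES_PER_WORKER = 16
RECORDS = 150_000_000  # approximate record count mentioned in the post

# Aggregate resources across all workers.
total_memory_gb = WORKERS * MEMORY_GB_PER_WORKER
total_cores = WORKERS * CORES_PER_WORKER

# Average number of records each core would handle if the data were spread evenly.
records_per_core = RECORDS // total_cores

print(f"total memory: {total_memory_gb} GB")
print(f"total cores: {total_cores}")
print(f"records per core: {records_per_core}")
```

These are raw machine totals under that assumption; the memory actually available to Spark executors would be lower once YARN and OS overhead are subtracted.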