Merged
31 commits
044a1c1
SemyonSinchenko Feb 28, 2026
54ebffe
refactor: extract ConnectedComponents implementation to TwoPhase
SemyonSinchenko Feb 28, 2026
87f43bb
fix: standardize connected components algorithm string and GraphX impl
SemyonSinchenko Feb 28, 2026
8057d93
feat: add getAlgorithm getter
SemyonSinchenko Feb 28, 2026
3f6c1d0
refactor: remove GraphX implementation and maxIter from TwoPhase
SemyonSinchenko Mar 5, 2026
b97f19c
feat: add runAQE method for AQE-based two-phase connected components
SemyonSinchenko Mar 5, 2026
44772b3
refactor: calcMinNbrSum object method, DecimalType(38,10), drop overflow
SemyonSinchenko Mar 5, 2026
44bfb4e
refactor: simplify calcMinNbrSum and remove unused variable
SemyonSinchenko Mar 5, 2026
354ec7d
refactor: extract shared output generation to buildOutput method
SemyonSinchenko Mar 5, 2026
6a8c0bf
feat: add intermediate storage level and unpersist in GraphX CC
SemyonSinchenko Mar 5, 2026
3d1972e
feat: use runAQE for two-phase CC when broadcastThreshold is -1
SemyonSinchenko Mar 5, 2026
d3d3230
refactor: remove unused spark variable in TwoPhase
SemyonSinchenko Mar 5, 2026
4cac662
feat: add isGraphPrepared to skip graph preparation in run and runAQE
SemyonSinchenko Mar 5, 2026
2321744
refactor: add isGraphPrepared and checkpoint params to algorithm runs
SemyonSinchenko Mar 5, 2026
2047d52
feat: add internal graph preparation control to ConnectedComponents
SemyonSinchenko Mar 5, 2026
4949d3c
refactor: rename _isGraphPrepared and update setter
SemyonSinchenko Mar 5, 2026
43982af
docs: clarify setIsGraphPrepared docstring on algorithm prep steps
SemyonSinchenko Mar 5, 2026
6d778b7
docs: replace Connected Components section with comprehensive algorit…
SemyonSinchenko Mar 5, 2026
075026b
feat: rename graphframes to two_phase and add checkpointing to rand_cont
SemyonSinchenko Mar 5, 2026
d2d2ecb
test: add local checkpointing to RandomizedContractionSuite
SemyonSinchenko Mar 5, 2026
5bfaac1
refactor: unpersist edges earlier and in finally block
SemyonSinchenko Mar 5, 2026
7813237
test: add System.gc() to make test more robust
SemyonSinchenko Mar 5, 2026
328f4fc
test: clean up Spark checkpoint directory in RandomizedContractionSuite
SemyonSinchenko Mar 5, 2026
cc3abbd
test(RandomizedContractionSuite): move checkpoint cleanup after unper…
SemyonSinchenko Mar 5, 2026
04ebe28
chore: check memory leaks for persisted only
SemyonSinchenko Mar 5, 2026
a4aee16
fix: make the test robust, not random
SemyonSinchenko Mar 5, 2026
8562ee6
chore: drop the redundant
SemyonSinchenko Mar 5, 2026
54a7ae9
chor: naming
SemyonSinchenko Mar 6, 2026
ab5aec2
Merge remote-tracking branch 'graphframes/main' into 775-refactor-cc-api
SemyonSinchenko Mar 11, 2026
6915dc2
fix: addressing James' comments
SemyonSinchenko Mar 11, 2026
ce062ff
fix: python docstrings
SemyonSinchenko Mar 11, 2026
@@ -35,7 +35,7 @@ class ConnectedComponentsBenchmark extends LDBCBenchmarkBase {
} else {
graph.connectedComponents
.setUseLocalCheckpoints(true)
- .setAlgorithm("graphframes")
+ .setAlgorithm(algorithm)
.setBroadcastThreshold(broadcastThreshold.toInt)
.setUseLocalCheckpoints(useLocalCheckpoints.toBoolean)
.run()
@@ -32,9 +32,11 @@ object GraphFramesConf {
SQLConf
.buildConf("spark.graphframes.connectedComponents.algorithm")
.doc(""" Sets the connected components algorithm to use (default: "graphframes"). Supported algorithms
- | - "graphframes": Uses alternating large star and small star iterations proposed in
+ | - "two_phase": Uses alternating large star and small star iterations proposed in
| [[http://dx.doi.org/10.1145/2670979.2670997 Connected Components in MapReduce and Beyond]]
| with skewed join optimization.
| - "randomized_contraction": Uses randomized algorithm proposed in
| [[https://arxiv.org/pdf/1802.09478 In-database connected component analysis]]
| - "graphframes": Deprecated alias for "two_phase"
| - "graphx": Converts the graph to a GraphX graph and then uses the connected components
| implementation in GraphX.
| @see org.graphframes.lib.ConnectedComponents.supportedAlgorithms""".stripMargin)
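The docstring above lists "graphframes" as a deprecated alias for "two_phase". A minimal sketch of how such an alias could be resolved to a canonical name is shown here. `AlgorithmAlias`, `supported`, `deprecated`, and `resolve` are hypothetical names used for illustration only; they are not taken from this PR, and the actual GraphFrames handling may differ (for example in case sensitivity or in emitting a deprecation warning).

```scala
// Hypothetical helper, not part of GraphFrames: it only illustrates alias resolution.
object AlgorithmAlias {
  // Canonical algorithm names, as listed in the config docstring.
  val supported: Set[String] = Set("two_phase", "randomized_contraction", "graphx")

  // Deprecated alias -> canonical name.
  val deprecated: Map[String, String] = Map("graphframes" -> "two_phase")

  // Returns the canonical name, or throws IllegalArgumentException for unknown input.
  // Matching is case-sensitive in this sketch (an assumption).
  def resolve(name: String): String = {
    val canonical = deprecated.getOrElse(name, name)
    require(supported.contains(canonical), s"Unsupported algorithm: $name")
    canonical
  }
}
```

With this sketch, `AlgorithmAlias.resolve("graphframes")` and `AlgorithmAlias.resolve("two_phase")` both yield `"two_phase"`, so callers that still pass the old string keep working.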