Describe the bug
GraphFrame.kCore produces incorrect results. For every vertex the algorithm returns kcore = degree regardless of the graph's actual k-core structure.
The root cause is a swap in KCore.scala: both sendMsgToSrc and sendMsgToDst reference the same endpoint's column, so each vertex only ever receives its own current value back. The algorithm converges after a single superstep having made no real progress.
To Reproduce
The existing test suite inadvertently documents the bug.
"star graph": asserts center vertex kcore === 3 (correct value is 1 — no 2-core can form when leaves each have one neighbor)
"chain graph": uses a directed cycle (0,1),(1,2),(2,0) labelled as a chain (while it's actually the same as the "triangle test"), asserts all kcore === 2, which also holds for the correct algorithm by coincidence — masking the error
"medium graph": asserts kcoreMap(0L) >= 4 (correct upper bound is 3)
Expected behavior
See "To Reproduce" section above.
An additional test could be, for the graph below (triangle + pendant tail):
Vertices {1, 2, 3} form a 2-core → kcore = 2
Vertices {4, 5} are only in the 1-core → kcore = 1
System:
- OS: Ubuntu 20.04
- Spark / PySpark versions: 3.5.x, 4.0.x, 4.1.x
- GraphFrames version: 0.10+ (since introduction of kcore)
Verified locally with latest version of main (commit 2571f40)
Component
Are you planning on creating a PR?
Describe the bug
GraphFrame.kCoreproduces incorrect results. For every vertex the algorithm returnskcore = degreeregardless of the graph's actual k-core structure.The root cause is a swap in
KCore.scala: bothsendMsgToSrcandsendMsgToDstreference the same endpoint's column, so each vertex only ever receives its own current value back. The algorithm converges after a single superstep having made no real progress.To Reproduce
The existing test suite inadvertently documents the bug.
"star graph": asserts center vertex kcore === 3 (correct value is 1 — no 2-core can form when leaves each have one neighbor)
"chain graph": uses a directed cycle (0,1),(1,2),(2,0) labelled as a chain (while it's actually the same as the "triangle test"), asserts all kcore === 2, which also holds for the correct algorithm by coincidence — masking the error
"medium graph": asserts kcoreMap(0L) >= 4 (correct upper bound is 3)
Expected behavior
See "To Reproduce" section above.
An additional test could be, for the graph below (triangle + pendant tail):
Vertices {1, 2, 3} form a 2-core → kcore = 2
Vertices {4, 5} are only in the 1-core → kcore = 1
System:
Verified locally with latest version of main (commit 2571f40)
Component
Are you planning on creating a PR?
PR here: fix(kcore): swap sendMsgToSrc/sendMsgToDst column references #802