Skip to content

bug: KCore: algorithm returns wrong kcore values (kcore = degree for all vertices) #801

@DavidGi

Description

@DavidGi

Describe the bug

GraphFrame.kCore produces incorrect results. For every vertex the algorithm returns kcore = degree regardless of the graph's actual k-core structure.
The root cause is a swap in KCore.scala: both sendMsgToSrc and sendMsgToDst reference the same endpoint's column, so each vertex only ever receives its own current value back. The algorithm converges after a single superstep having made no real progress.

To Reproduce

The existing test suite inadvertently documents the bug.
"star graph": asserts center vertex kcore === 3 (correct value is 1 — no 2-core can form when leaves each have one neighbor)
"chain graph": uses a directed cycle (0,1),(1,2),(2,0) labelled as a chain (while it's actually the same as the "triangle test"), asserts all kcore === 2, which also holds for the correct algorithm by coincidence — masking the error
"medium graph": asserts kcoreMap(0L) >= 4 (correct upper bound is 3)

Expected behavior

See "To Reproduce" section above.

An additional test could be, for the graph below (triangle + pendant tail):

1 ── 2
 ╲   │
    3 ── 4 ── 5

Vertices {1, 2, 3} form a 2-core → kcore = 2
Vertices {4, 5} are only in the 1-core → kcore = 1

System:

  • OS: Ubuntu 20.04
  • Spark / PySpark versions: 3.5.x, 4.0.x, 4.1.x
  • GraphFrames version: 0.10+ (since introduction of kcore)

Verified locally with latest version of main (commit 2571f40)

Component

  • Scala Core Internal
  • Scala API
  • Spark Connect Plugin
  • PySpark Classic
  • PySpark Connect

Are you planning on creating a PR?

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions