
Add K-Means clustering algorithm #7515

Open
Tejashribambal19 wants to merge 5 commits into TheAlgorithms:master from Tejashribambal19:feature/kmeans-machinelearning

@Tejashribambal19

Description

This pull request adds a K-Means clustering implementation based on Lloyd's algorithm.

Summary

  • Implements the K-Means clustering algorithm for multidimensional data.
  • Assigns each point to its nearest centroid using squared Euclidean distance.
  • Recomputes each centroid as the mean of its assigned points, repeating until the centroids converge within the given tolerance or the maximum number of iterations is reached.
  • Preserves the previous centroid when a cluster becomes empty.
  • Validates its arguments and rejects invalid input (the checks are listed under Validation).
  • Adds a corresponding JUnit test class covering both normal and edge cases.
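
The summary describes the standard Lloyd iteration: assign, recompute means, repeat. A minimal Java sketch of that loop follows, under stated assumptions. The class name `KMeansSketch`, the `cluster` signature, and the convergence rule (stop once no centroid moves farther than `tolerance`) are hypothetical and are not taken from the PR's `KMeans.java`; inputs are assumed to be validated already.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of Lloyd's algorithm, not the PR's actual KMeans class.
 * Assumes inputs were already validated (non-null, non-empty, consistent dimensions).
 */
final class KMeansSketch {

    private KMeansSketch() {
    }

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Index of the centroid nearest to the point; ties go to the lower index. */
    static int nearestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        double bestDistance = squaredDistance(point, centroids[0]);
        for (int c = 1; c < centroids.length; c++) {
            double distance = squaredDistance(point, centroids[c]);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    /**
     * Runs Lloyd's iterations and returns the final centroids.
     * Assumed convergence rule: every centroid moved by at most `tolerance`.
     */
    static double[][] cluster(double[][] points, double[][] initialCentroids, int maxIterations, double tolerance) {
        int k = initialCentroids.length;
        int dims = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = Arrays.copyOf(initialCentroids[c], dims);
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: accumulate coordinate sums per nearest centroid.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (double[] point : points) {
                int c = nearestCentroid(point, centroids);
                counts[c]++;
                for (int d = 0; d < dims; d++) {
                    sums[c][d] += point[d];
                }
            }
            // Update step: move each non-empty cluster's centroid to its mean.
            double maxShift = 0.0;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // empty cluster: keep the previous centroid
                }
                double[] updated = new double[dims];
                for (int d = 0; d < dims; d++) {
                    updated[d] = sums[c][d] / counts[c];
                }
                maxShift = Math.max(maxShift, Math.sqrt(squaredDistance(updated, centroids[c])));
                centroids[c] = updated;
            }
            if (maxShift <= tolerance) {
                break; // converged
            }
        }
        return centroids;
    }
}
```

For example, with the points (0, 0), (0, 1), (10, 10), (10, 11) and initial centroids (0, 0) and (10, 10), the first update moves the centroids to (0, 0.5) and (10, 10.5); the second iteration moves nothing, so the loop stops. Accumulating sums during the assignment pass avoids storing a label per point, which is one design choice among several.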

Validation

The implementation rejects the following invalid inputs:

  • Null input arrays
  • Empty datasets
  • Empty centroid arrays
  • More centroids than data points
  • Non-positive maximum iterations
  • Negative tolerance
  • Dimension mismatches
  • Zero-dimensional points
  • Null point and centroid rows
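
The checks above could be gathered into a single guard method. The sketch below is hypothetical: the `KMeansValidation` class, the `validate` signature, the choice of `IllegalArgumentException`, and the message texts are assumptions for illustration, not the PR's code.

```java
/**
 * Hypothetical argument checks mirroring the PR's Validation list.
 * The exception type and messages are assumptions, not the PR's code.
 */
final class KMeansValidation {

    private KMeansValidation() {
    }

    static void validate(double[][] points, double[][] centroids, int maxIterations, double tolerance) {
        if (points == null || centroids == null) {
            throw new IllegalArgumentException("points and centroids must not be null");
        }
        if (points.length == 0) {
            throw new IllegalArgumentException("dataset must not be empty");
        }
        if (centroids.length == 0) {
            throw new IllegalArgumentException("at least one centroid is required");
        }
        if (centroids.length > points.length) {
            throw new IllegalArgumentException("more centroids than data points");
        }
        if (maxIterations <= 0) {
            throw new IllegalArgumentException("maxIterations must be positive");
        }
        if (tolerance < 0) {
            throw new IllegalArgumentException("tolerance must not be negative");
        }
        if (points[0] == null) {
            throw new IllegalArgumentException("point rows must not be null");
        }
        // The first point fixes the expected dimension for every other row.
        int dims = points[0].length;
        if (dims == 0) {
            throw new IllegalArgumentException("points must have at least one dimension");
        }
        checkRows(points, "point", dims);
        checkRows(centroids, "centroid", dims);
    }

    /** Rejects null rows and rows whose length differs from the expected dimension. */
    private static void checkRows(double[][] rows, String label, int dims) {
        for (double[] row : rows) {
            if (row == null) {
                throw new IllegalArgumentException(label + " rows must not be null");
            }
            if (row.length != dims) {
                throw new IllegalArgumentException(
                        label + " has dimension " + row.length + ", expected " + dims);
            }
        }
    }
}
```

Failing fast with an unchecked exception keeps the clustering loop free of defensive checks; testing the scalar arguments before walking the rows is just one reasonable ordering.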

Tests

The accompanying test class covers:

  • Simple clustering
  • Single-cluster datasets
  • Immediate convergence
  • Empty-cluster handling
  • Null inputs
  • Empty datasets
  • Empty centroid arrays
  • Invalid iteration counts
  • Negative tolerance
  • Too many centroids
  • Dimension mismatches
  • Zero-dimensional points

Checklist

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized it.
  • All filenames are in PascalCase.
  • All functions and variable names follow Java naming conventions.
  • All new algorithms have a URL in their comments that points to Wikipedia or other similar explanations.
  • All new algorithms include a corresponding test class that validates their functionality.
  • All new code is formatted with clang-format -i --style=file path/to/your/file.java.

@codecov-commenter

codecov-commenter commented Jul 5, 2026

Codecov Report

❌ Patch coverage is 97.43590% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.30%. Comparing base (c498379) to head (c093785).

Files with missing lines | Patch % | Lines
...java/com/thealgorithms/machinelearning/KMeans.java | 97.43% | 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #7515      +/-   ##
============================================
+ Coverage     80.25%   80.30%   +0.04%     
- Complexity     7358     7388      +30     
============================================
  Files           810      811       +1     
  Lines         23787    23865      +78     
  Branches       4678     4706      +28     
============================================
+ Hits          19091    19165      +74     
  Misses         3940     3940              
- Partials        756      760       +4     
