Optimize drs plan generation #12014

Merged
DaanHoogland merged 6 commits into apache:4.20 from shapeblue:optimize-drs-plan-generation
Dec 10, 2025

@vishesh92 (Member) commented Nov 7, 2025

Description

This PR fixes #11978

Tested in a simulator-based environment with 15 hosts and 150 VMs for 5 iterations. Before the changes, plan generation took around 2.5 minutes; after the changes, it takes around 25 seconds.

To optimize, I first added tests for findHostForMigration and refactored the method. I then reused parts of the refactored method for DRS plan generation. Finally, I refactored how the calculation is done for a migration, making it less compute-intensive and therefore faster.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@codecov (Bot) commented Nov 7, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 4.00%. Comparing base (d53b6db) to head (56ab2c0).
⚠️ Report is 14 commits behind head on 4.20.

❗ There is a different number of reports uploaded between BASE (d53b6db) and HEAD (56ab2c0).

HEAD has 1 upload less than BASE

| Flag | BASE (d53b6db) | HEAD (56ab2c0) |
|---|---|---|
| unittests | 1 | 0 |

Additional details and impacted files
@@              Coverage Diff              @@
##               4.20   #12014       +/-   ##
=============================================
- Coverage     16.19%    4.00%   -12.19%     
=============================================
  Files          5657      402     -5255     
  Lines        498467    32665   -465802     
  Branches      60491     5808    -54683     
=============================================
- Hits          80702     1309    -79393     
+ Misses       408783    31203   -377580     
+ Partials       8982      153     -8829     
| Flag | Coverage Δ |
|---|---|
| uitests | 4.00% <ø> (ø) |
| unittests | ? |

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR optimizes the Cluster DRS (Distributed Resource Scheduler) planning algorithm to improve performance through caching and reduced computational complexity. The main changes include:

  • Pre-computing VM technical compatibility checks (storage, hypervisor, UEFI) once per DRS cycle instead of per iteration
  • Optimizing imbalance calculations by reusing pre-calculated metrics arrays and updating only affected hosts
  • Refactoring listHostsForMigrationOfVM into smaller, reusable methods (getTechnicallyCompatibleHosts, applyAffinityConstraints, getCapableSuitableHosts)
  • Adding comprehensive test coverage for the new migration host listing functionality
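
The idea of "updating only affected hosts" can be illustrated with a minimal Python sketch. It is not the PR's Java code. It assumes that imbalance is the population standard deviation divided by the mean of an absolute per-host usage figure, and the names `imbalance`, `IncrementalImbalance`, and `after_migration` are hypothetical. Because a migration changes only the source and destination hosts, running sums let the post-migration imbalance be evaluated in constant time rather than by re-scanning every host.

```python
import math


def imbalance(values):
    """Population std-dev divided by mean (assumed imbalance metric)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(var) / mean


class IncrementalImbalance:
    """Hypothetical helper: keeps running sums so one candidate costs O(1)."""

    def __init__(self, values):
        self.values = list(values)
        self.n = len(self.values)
        self.total = sum(self.values)
        self.total_sq = sum(v * v for v in self.values)

    def _from_sums(self, total_sq):
        mean = self.total / self.n
        # Clamp tiny negative values caused by floating-point rounding.
        var = max(total_sq / self.n - mean * mean, 0.0)
        return math.sqrt(var) / mean

    def current(self):
        return self._from_sums(self.total_sq)

    def after_migration(self, src, dst, delta):
        """Imbalance if `delta` usage moved from host src to host dst.

        Nothing is mutated; only the two affected hosts are re-read.
        The total is unchanged because usage only moves between hosts.
        """
        old_src, old_dst = self.values[src], self.values[dst]
        new_src, new_dst = old_src - delta, old_dst + delta
        total_sq = (self.total_sq - old_src ** 2 - old_dst ** 2
                    + new_src ** 2 + new_dst ** 2)
        return self._from_sums(total_sq)


usage = [80.0, 50.0, 20.0]  # made-up per-host usage
tracker = IncrementalImbalance(usage)
print(tracker.current(), tracker.after_migration(0, 2, 30.0))
```

The sum-of-squares form trades some numerical precision for speed, which is one reason a real implementation might prefer to update pre-computed arrays instead.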

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java | Implements caching of technical compatibility checks and affinity group processing optimization; adds pre-calculation of metrics for faster iteration |
| server/src/main/java/com/cloud/server/ManagementServerImpl.java | Refactors VM migration host listing into smaller helper methods; extracts technical compatibility and affinity constraint logic |
| api/src/main/java/org/apache/cloudstack/cluster/ClusterDrsAlgorithm.java | Adds optimized imbalance calculation methods using array-based operations; introduces reusable stateless calculators |
| plugins/drs/cluster/balanced/src/main/java/org/apache/cloudstack/cluster/Balanced.java | Updates to use optimized getMetrics signature with pre-calculated data |
| plugins/drs/cluster/condensed/src/main/java/org/apache/cloudstack/cluster/Condensed.java | Updates to use optimized getMetrics signature with pre-calculated data |
| server/src/test/java/org/apache/cloudstack/cluster/ClusterDrsServiceImplTest.java | Adds tests for new DRS planning edge cases and updates existing tests for method signature changes |
| server/src/test/java/com/cloud/server/ManagementServerImplTest.java | Adds comprehensive test coverage for VM migration host listing functionality |
| api/src/main/java/com/cloud/server/ManagementService.java | Adds public API methods for technical compatibility and affinity constraints |
| plugins/drs/cluster/balanced/src/test/java/org/apache/cloudstack/cluster/BalancedTest.java | Updates tests for new getMetrics signature |
| plugins/drs/cluster/condensed/src/test/java/org/apache/cloudstack/cluster/CondensedTest.java | Updates tests for new getMetrics signature |

Comment thread server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java Outdated
Comment thread server/src/main/java/com/cloud/server/ManagementServerImpl.java
@vishesh92 force-pushed the optimize-drs-plan-generation branch 2 times, most recently from 7aa5194 to 6d21c2c on November 7, 2025 10:15
@vishesh92 requested a review from Copilot on November 7, 2025 10:15
@vishesh92 (Member, Author)

@blueorangutan package

@blueorangutan

@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Comment thread server/src/test/java/org/apache/cloudstack/cluster/ClusterDrsServiceImplTest.java Outdated
@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 15676

@vishesh92 force-pushed the optimize-drs-plan-generation branch from 6d21c2c to 912acbb on November 7, 2025 13:01
@vishesh92 linked an issue on Nov 7, 2025 that may be closed by this pull request
@vishesh92 marked this pull request as ready for review on November 7, 2025 13:10
@vishesh92 requested a review from Copilot on November 7, 2025 13:10
@vishesh92 (Member, Author)

@blueorangutan package

@blueorangutan

@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.

Comment thread server/src/main/java/com/cloud/server/ManagementServerImpl.java
Comment thread api/src/main/java/org/apache/cloudstack/cluster/ClusterDrsAlgorithm.java Outdated
Comment thread server/src/test/java/com/cloud/server/ManagementServerImplTest.java
@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 15678

@vishesh92 force-pushed the optimize-drs-plan-generation branch from 912acbb to 182a02d on November 7, 2025 20:41
Comment thread server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java Outdated
Comment thread server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java Outdated
Comment thread server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java Outdated
Comment thread server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java Outdated
Comment thread server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java Outdated
Comment thread server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java Outdated
Comment thread server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java Outdated
@DaanHoogland (Contributor)

@vishesh92 clgtm although some more modularisation can be done.

Do we have more performance data? I read a factor 7 improvement but only in a very specific situation. Any chance we can add to that?

@vishesh92 (Member, Author)

> @vishesh92 clgtm although some more modularisation can be done.
>
> Do we have more performance data? I read a factor 7 improvement but only in a very specific situation. Any chance we can add to that?

I don't have more performance data. I tested this locally with Simulator.

@DaanHoogland (Contributor) left a comment

clgtm

@borisstoyanov (Contributor)

@blueorangutan package

@blueorangutan

@borisstoyanov a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 15827

@RosiKyu (Collaborator) commented Dec 10, 2025

DRS Performance Optimization Test Results

Environment: CloudStack 4.20.2.0 (WITHOUT PR) / 4.20.3.0 (WITH PR) | Simulator | 15 hosts, 150 VMs

Test Summary

| Test Case | Description | Result |
|---|---|---|
| TC1 | Performance Baseline Comparison | PASS |
| TC2 | DRS Migration Execution Validation | PASS |
| TC3 | Condensed Algorithm Verification | PASS |
| TC4 | Affinity Rules Compliance | PASS |
| TC5 | Scale and Consistency Testing | PASS |
| TC6 | Edge Case Handling | PASS |
| TC7 | Regression Testing | PASS |

TC1: Performance Baseline Comparison

Result: PASS - 3.6x performance improvement (150s -> 42s)

| Environment | Run 1 | Run 2 | Run 3 | Average |
|---|---|---|---|---|
| WITHOUT PR | 2m30s | 2m34s | 2m28s | ~150s |
| WITH PR | 44s | 41s | 42s | ~42s |
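
For readers who want to reproduce the averages and the speedup figure, the following Python sketch recomputes them from the per-run timings reported alongside each command in this test case. The helper name `to_seconds` is made up for illustration; the timing strings are copied from the runs.

```python
import re


def to_seconds(stamp):
    """Convert a shell `time` value such as '2m30.958s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    return int(match.group(1)) * 60 + float(match.group(2))


# Timings as reported for Run 1 to Run 3 in each environment.
without_pr = ["2m30.958s", "2m34.215s", "2m28.971s"]
with_pr = ["0m44.479s", "0m40.830s", "0m42.268s"]

avg_without = sum(map(to_seconds, without_pr)) / len(without_pr)
avg_with = sum(map(to_seconds, with_pr)) / len(with_pr)
speedup = avg_without / avg_with

print(f"{avg_without:.1f}s -> {avg_with:.1f}s, speedup {speedup:.1f}x")
```

The unrounded averages come out slightly above the rounded table values, and the ratio still rounds to 3.6x.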

WITHOUT PR (10.0.35.218)

time echo "generate clusterdrsplan id=2fbbf922-f557-4e23-99c1-e4df3be7210c migrations=5" | cmk [Run 1: 2m30.958s]
{
  "generateclusterdrsplanresponse": {
    "clusterid": "2fbbf922-f557-4e23-99c1-e4df3be7210c",
    "eventid": "974a5d16-e4e3-4e09-b976-1a589ac1818e",
    "id": "f5e24fdc-b6d2-4191-aa81-6a3630bcd4a1",
    "migrations": [
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "destinationhostname": "SimulatedAgent.a9cb3969-0455-4dde-9cce-2e64a3e2c094",
        "sourcehostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "sourcehostname": "SimulatedAgent.2f0b0205-28e0-4ab9-96cc-957f8230d894",
        "virtualmachineid": "12c0a3c4-5494-46a2-b8d5-e43c69c8af33",
        "virtualmachinename": "i-2-41-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "destinationhostname": "SimulatedAgent.a9cb3969-0455-4dde-9cce-2e64a3e2c094",
        "sourcehostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "sourcehostname": "SimulatedAgent.f05f652c-7908-4212-842a-724e6b993884",
        "virtualmachineid": "1b6a7275-19a7-459a-83ea-32475cdcaa22",
        "virtualmachinename": "i-2-11-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "destinationhostname": "SimulatedAgent.a9cb3969-0455-4dde-9cce-2e64a3e2c094",
        "sourcehostid": "8d47e97d-6ffa-4aac-a146-23976f997365",
        "sourcehostname": "SimulatedAgent.d006e660-69b7-4475-8cb2-d8626f58f156",
        "virtualmachineid": "4c891cf8-ee96-467a-a307-f95b5bda3cad",
        "virtualmachinename": "i-2-24-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "destinationhostname": "SimulatedAgent.a9cb3969-0455-4dde-9cce-2e64a3e2c094",
        "sourcehostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "sourcehostname": "SimulatedAgent.2f0b0205-28e0-4ab9-96cc-957f8230d894",
        "virtualmachineid": "ea4fac53-b639-4b37-84f6-11111c68af49",
        "virtualmachinename": "i-2-43-QA"
      },
      {
        "destinationhostid": "d875cf86-984d-42c3-87f2-1d94cf76309e",
        "destinationhostname": "SimulatedAgent.d8e60b91-775a-4017-ba25-b62bdcdcb5c7",
        "sourcehostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "sourcehostname": "SimulatedAgent.f05f652c-7908-4212-842a-724e6b993884",
        "virtualmachineid": "297d1b66-e8a6-4b94-a1d9-0d906c7ffa48",
        "virtualmachinename": "i-2-19-QA"
      }
    ],
    "status": "UNDER_REVIEW",
    "type": "MANUAL"
  }
}
time echo "generate clusterdrsplan id=2fbbf922-f557-4e23-99c1-e4df3be7210c migrations=5" | cmk [Run 2: 2m34.215s]
{
  "generateclusterdrsplanresponse": {
    "clusterid": "2fbbf922-f557-4e23-99c1-e4df3be7210c",
    "eventid": "29d3428e-896b-4231-bbed-cf18b54efc0e",
    "id": "306e348d-f69b-49e2-93e4-2dd74fc58a44",
    "migrations": [
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "virtualmachinename": "i-2-41-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "virtualmachinename": "i-2-11-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "8d47e97d-6ffa-4aac-a146-23976f997365",
        "virtualmachinename": "i-2-24-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "virtualmachinename": "i-2-43-QA"
      },
      {
        "destinationhostid": "d875cf86-984d-42c3-87f2-1d94cf76309e",
        "sourcehostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "virtualmachinename": "i-2-19-QA"
      }
    ],
    "status": "UNDER_REVIEW",
    "type": "MANUAL"
  }
}
time echo "generate clusterdrsplan id=2fbbf922-f557-4e23-99c1-e4df3be7210c migrations=5" | cmk [Run 3: 2m28.971s]
{
  "generateclusterdrsplanresponse": {
    "clusterid": "2fbbf922-f557-4e23-99c1-e4df3be7210c",
    "eventid": "e7a321b7-f530-432b-8d0a-962872aa8846",
    "id": "5810b2f7-a384-46a3-966d-edb46025bd6c",
    "migrations": [
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "virtualmachinename": "i-2-41-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "virtualmachinename": "i-2-11-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "8d47e97d-6ffa-4aac-a146-23976f997365",
        "virtualmachinename": "i-2-24-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "virtualmachinename": "i-2-43-QA"
      },
      {
        "destinationhostid": "d875cf86-984d-42c3-87f2-1d94cf76309e",
        "sourcehostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "virtualmachinename": "i-2-19-QA"
      }
    ],
    "status": "UNDER_REVIEW",
    "type": "MANUAL"
  }
}

WITH PR (10.0.33.97)

time echo "generate clusterdrsplan id=2fbbf922-f557-4e23-99c1-e4df3be7210c migrations=5" | cmk [Run 1: 0m44.479s]
{
  "generateclusterdrsplanresponse": {
    "clusterid": "2fbbf922-f557-4e23-99c1-e4df3be7210c",
    "eventid": "f62e5cb6-6228-46f3-a319-b21f0f8f4838",
    "id": "0bfc36d1-e36c-4c67-9c09-d031dd57a91b",
    "migrations": [
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "virtualmachinename": "i-2-41-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "virtualmachinename": "i-2-11-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "8d47e97d-6ffa-4aac-a146-23976f997365",
        "virtualmachinename": "i-2-24-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "virtualmachinename": "i-2-43-QA"
      },
      {
        "destinationhostid": "67efe13d-abb1-4493-9ec5-52c970186c99",
        "sourcehostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "virtualmachinename": "i-2-19-QA"
      }
    ],
    "status": "UNDER_REVIEW",
    "type": "MANUAL"
  }
}
time echo "generate clusterdrsplan id=2fbbf922-f557-4e23-99c1-e4df3be7210c migrations=5" | cmk [Run 2: 0m40.830s]
{
  "generateclusterdrsplanresponse": {
    "clusterid": "2fbbf922-f557-4e23-99c1-e4df3be7210c",
    "eventid": "9580b28c-e8bd-46b6-a98e-741383b4ae44",
    "id": "e317dd0d-b5c3-4d0c-a827-89151fe25e29",
    "migrations": [
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "virtualmachinename": "i-2-41-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "virtualmachinename": "i-2-11-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "8d47e97d-6ffa-4aac-a146-23976f997365",
        "virtualmachinename": "i-2-24-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "virtualmachinename": "i-2-43-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "virtualmachinename": "i-2-19-QA"
      }
    ],
    "status": "UNDER_REVIEW",
    "type": "MANUAL"
  }
}
time echo "generate clusterdrsplan id=2fbbf922-f557-4e23-99c1-e4df3be7210c migrations=5" | cmk [Run 3: 0m42.268s]
{
  "generateclusterdrsplanresponse": {
    "clusterid": "2fbbf922-f557-4e23-99c1-e4df3be7210c",
    "eventid": "00143dd0-6c98-4c68-868a-8f9f057b4069",
    "id": "114bd182-d2d0-4c81-96f0-fb162b3acdc5",
    "migrations": [
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "virtualmachinename": "i-2-41-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "virtualmachinename": "i-2-11-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "8d47e97d-6ffa-4aac-a146-23976f997365",
        "virtualmachinename": "i-2-24-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "virtualmachinename": "i-2-43-QA"
      },
      {
        "destinationhostid": "67efe13d-abb1-4493-9ec5-52c970186c99",
        "sourcehostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "virtualmachinename": "i-2-19-QA"
      }
    ],
    "status": "UNDER_REVIEW",
    "type": "MANUAL"
  }
}

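Observations: Both environments suggest the same five VMs for migration. The destination host for i-2-19-QA varies between runs (d875cf86, 67efe13d, or fca5a008 in the output above). Functional correctness maintained.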
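
A comparison like this can be automated rather than done by eye. The Python sketch below is one hedged way to do it: it reads the `generateclusterdrsplanresponse` structure shown in the output above, while the helper names `destinations_by_vm` and `make_response` are hypothetical and the host IDs are abbreviated to their first eight characters. The sample data mirrors Run 1 of each environment.

```python
def destinations_by_vm(response):
    """Map each VM name in a plan response to its destination host ID."""
    plan = response["generateclusterdrsplanresponse"]
    return {m["virtualmachinename"]: m["destinationhostid"]
            for m in plan["migrations"]}


def make_response(pairs):
    """Build a trimmed-down response holding only the fields compared here."""
    return {"generateclusterdrsplanresponse": {"migrations": [
        {"virtualmachinename": vm, "destinationhostid": host}
        for vm, host in pairs]}}


# Run 1 of each environment, host IDs abbreviated.
without_pr = make_response([
    ("i-2-41-QA", "fca5a008"), ("i-2-11-QA", "fca5a008"),
    ("i-2-24-QA", "fca5a008"), ("i-2-43-QA", "fca5a008"),
    ("i-2-19-QA", "d875cf86"),
])
with_pr = make_response([
    ("i-2-41-QA", "fca5a008"), ("i-2-11-QA", "fca5a008"),
    ("i-2-24-QA", "fca5a008"), ("i-2-43-QA", "fca5a008"),
    ("i-2-19-QA", "67efe13d"),
])

before = destinations_by_vm(without_pr)
after = destinations_by_vm(with_pr)

same_vms = set(before) == set(after)
different_destination = sorted(
    vm for vm in before.keys() & after.keys() if before[vm] != after[vm])

print("same VMs selected:", same_vms)
print("VMs with a different destination:", different_destination)
```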


TC2: DRS Migration Execution Validation

Result: PASS - DRS recommendations are correct and executable

Step 1: Check VM placement before migration

echo "list virtualmachines id=12c0a3c4-5494-46a2-b8d5-e43c69c8af33 filter=id,name,hostid,hostname" | cmk
{
  "count": 1,
  "virtualmachine": [
    {
      "hostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
      "hostname": "SimulatedAgent.2f0b0205-28e0-4ab9-96cc-957f8230d894",
      "id": "12c0a3c4-5494-46a2-b8d5-e43c69c8af33",
      "name": "drs-test-vm-37"
    }
  ]
}

Step 2: Generate DRS plan with execute=true

echo "generate clusterdrsplan id=2fbbf922-f557-4e23-99c1-e4df3be7210c migrations=2 execute=true" | cmk
{
  "generateclusterdrsplanresponse": {
    "clusterid": "2fbbf922-f557-4e23-99c1-e4df3be7210c",
    "eventid": "32c5d2b1-c24f-4075-8c48-3db91216d390",
    "id": "097302f8-c12a-4ac2-96cc-d3e538380c30",
    "migrations": [
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "destinationhostname": "SimulatedAgent.a9cb3969-0455-4dde-9cce-2e64a3e2c094",
        "sourcehostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "sourcehostname": "SimulatedAgent.2f0b0205-28e0-4ab9-96cc-957f8230d894",
        "virtualmachineid": "12c0a3c4-5494-46a2-b8d5-e43c69c8af33",
        "virtualmachinename": "i-2-41-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "destinationhostname": "SimulatedAgent.a9cb3969-0455-4dde-9cce-2e64a3e2c094",
        "sourcehostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "sourcehostname": "SimulatedAgent.f05f652c-7908-4212-842a-724e6b993884",
        "virtualmachineid": "1b6a7275-19a7-459a-83ea-32475cdcaa22",
        "virtualmachinename": "i-2-11-QA"
      }
    ],
    "status": "UNDER_REVIEW",
    "type": "MANUAL"
  }
}

Step 3: Manual migration to validate DRS recommendation

echo "migrateVirtualMachine virtualmachineid=12c0a3c4-5494-46a2-b8d5-e43c69c8af33 hostid=fca5a008-952f-4ca5-8a60-2213cfde4b5a" | cmk

Step 4: Verify VM moved to DRS-recommended destination

echo "list virtualmachines id=12c0a3c4-5494-46a2-b8d5-e43c69c8af33 filter=id,name,hostid,hostname" | cmk
{
  "count": 1,
  "virtualmachine": [
    {
      "hostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
      "hostname": "SimulatedAgent.a9cb3969-0455-4dde-9cce-2e64a3e2c094",
      "id": "12c0a3c4-5494-46a2-b8d5-e43c69c8af33",
      "name": "drs-test-vm-37"
    }
  ]
}

Note: The execute=true flag with Simulator hypervisor does not auto-execute migrations. Manual migration validates DRS recommendations are correct.


TC3: Condensed Algorithm Verification

Result: PASS - Condensed algorithm correctly consolidates VMs onto fewer hosts

Step 1: Change algorithm to condensed

echo "update configuration name=drs.algorithm value=condensed clusterid=2fbbf922-f557-4e23-99c1-e4df3be7210c" | cmk
{
  "configuration": {
    "category": "Advanced",
    "component": "ClusterDrsService",
    "defaultvalue": "balanced",
    "description": "The DRS algorithm to be executed on the cluster. Possible values are condensed, balanced.",
    "displaytext": "DRS algorithm",
    "group": "Miscellaneous",
    "isdynamic": true,
    "name": "drs.algorithm",
    "options": "condensed,balanced",
    "scope": "cluster",
    "subgroup": "DRS",
    "type": "Select",
    "value": "condensed"
  }
}

Step 2: Generate DRS plan with condensed algorithm

time echo "generate clusterdrsplan id=2fbbf922-f557-4e23-99c1-e4df3be7210c migrations=5" | cmk [41.526s]
{
  "generateclusterdrsplanresponse": {
    "clusterid": "2fbbf922-f557-4e23-99c1-e4df3be7210c",
    "eventid": "80aab3c9-f0ea-4132-83fa-121570ed02cb",
    "id": "39dd199a-2f11-4cae-80b3-64087426cab2",
    "migrations": [
      {
        "destinationhostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "sourcehostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "virtualmachinename": "i-2-41-QA"
      },
      {
        "destinationhostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "sourcehostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "virtualmachinename": "i-2-108-QA"
      },
      {
        "destinationhostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "sourcehostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "virtualmachinename": "i-2-109-QA"
      },
      {
        "destinationhostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "sourcehostid": "d875cf86-984d-42c3-87f2-1d94cf76309e",
        "virtualmachinename": "i-2-64-QA"
      },
      {
        "destinationhostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "sourcehostid": "d875cf86-984d-42c3-87f2-1d94cf76309e",
        "virtualmachinename": "i-2-75-QA"
      }
    ],
    "status": "UNDER_REVIEW",
    "type": "MANUAL"
  }
}

Algorithm Behavior Comparison:

| Algorithm | Source Host | Destination Host | Behavior |
|---|---|---|---|
| Balanced | Most loaded hosts | Least loaded host (fca5a008) | Spreads VMs |
| Condensed | Least loaded host (fca5a008) | More loaded hosts | Consolidates VMs |
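
The opposite directions in the comparison above can be reproduced with a toy model. This Python sketch is an illustration only, not CloudStack's implementation: it assumes imbalance is the population standard deviation divided by the mean of per-host load, that a balanced move is scored by how much it lowers imbalance, and that a condensed move is scored by how much it raises it. The function names and load numbers are made up.

```python
import statistics


def imbalance(loads):
    """Population std-dev over mean (assumed imbalance metric)."""
    return statistics.pstdev(loads) / statistics.mean(loads)


def improvement(loads, src, dst, vm_load, algorithm):
    """Score a candidate move of one VM from host src to host dst."""
    before = imbalance(loads)
    moved = list(loads)
    moved[src] -= vm_load
    moved[dst] += vm_load
    after = imbalance(moved)
    # Balanced rewards a drop in imbalance, condensed rewards a rise.
    return before - after if algorithm == "balanced" else after - before


def best_move(loads, vm_load, algorithm):
    """Return (score, src, dst) for the highest-scoring single move."""
    candidates = [
        (improvement(loads, src, dst, vm_load, algorithm), src, dst)
        for src in range(len(loads))
        for dst in range(len(loads))
        if src != dst and loads[src] >= vm_load
    ]
    return max(candidates)


loads = [8.0, 5.0, 2.0]  # host 0 is the most loaded, host 2 the least
balanced_move = best_move(loads, 1.0, "balanced")
condensed_move = best_move(loads, 1.0, "condensed")
print("balanced :", balanced_move[1], "->", balanced_move[2])
print("condensed:", condensed_move[1], "->", condensed_move[2])
```

In this toy cluster the balanced scoring picks a move from the most loaded host to the least loaded one, and the condensed scoring picks the reverse, which matches the behavior reported in the table.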

TC4: Affinity Rules Compliance

Result: PASS - Anti-affinity constraints respected in DRS plans

Step 1: Create anti-affinity group

echo "create affinitygroup name=drs-test-antiaffinity type='host anti-affinity'" | cmk
{
  "affinitygroup": {
    "account": "admin",
    "domain": "ROOT",
    "domainid": "c5e52334-bfee-11f0-8d7e-1e006500046d",
    "domainpath": "/",
    "id": "28cc27d7-cf55-41fc-b365-27ac524a9ee9",
    "name": "drs-test-antiaffinity",
    "type": "host anti-affinity"
  }
}

Step 2: Add VMs to anti-affinity group

echo "update vmaffinitygroup id=e3d4e44c-d087-4b40-b9e3-d97a80a134af affinitygroupids=28cc27d7-cf55-41fc-b365-27ac524a9ee9" | cmk
{
  "virtualmachine": {
    "account": "admin",
    "affinitygroup": [
      {
        "account": "admin",
        "id": "28cc27d7-cf55-41fc-b365-27ac524a9ee9",
        "name": "drs-test-antiaffinity"
      }
    ],
    "displayname": "drs-test-vm-2",
    "id": "e3d4e44c-d087-4b40-b9e3-d97a80a134af",
    "name": "drs-test-vm-2",
    "state": "Stopped"
  }
}
echo "update vmaffinitygroup id=948b0e57-0bcd-4469-b042-837c973232ae affinitygroupids=28cc27d7-cf55-41fc-b365-27ac524a9ee9" | cmk
{
  "virtualmachine": {
    "account": "admin",
    "affinitygroup": [
      {
        "account": "admin",
        "id": "28cc27d7-cf55-41fc-b365-27ac524a9ee9",
        "name": "drs-test-antiaffinity"
      }
    ],
    "displayname": "drs-test-vm-4",
    "id": "948b0e57-0bcd-4469-b042-837c973232ae",
    "name": "drs-test-vm-4",
    "state": "Stopped"
  }
}

Step 3: Verify VMs on different hosts after restart

echo "list virtualmachines id=e3d4e44c-d087-4b40-b9e3-d97a80a134af filter=id,name,hostid,hostname" | cmk
{
  "count": 1,
  "virtualmachine": [
    {
      "hostid": "94122a66-6930-4e14-8941-5da14de64069",
      "hostname": "SimulatedAgent.5d146d20-8041-4bf7-81a7-3e0987e3f0f1",
      "id": "e3d4e44c-d087-4b40-b9e3-d97a80a134af",
      "name": "drs-test-vm-2"
    }
  ]
}
echo "list virtualmachines id=948b0e57-0bcd-4469-b042-837c973232ae filter=id,name,hostid,hostname" | cmk
{
  "count": 1,
  "virtualmachine": [
    {
      "hostid": "2e236e86-60bd-412f-86a9-b7ad97e5817d",
      "hostname": "SimulatedAgent.d9407f91-48c3-436a-bb37-39afc5dc6da6",
      "id": "948b0e57-0bcd-4469-b042-837c973232ae",
      "name": "drs-test-vm-4"
    }
  ]
}

VMs confirmed on different hosts (anti-affinity enforced at startup).

Step 4: Generate DRS plan - verify anti-affinity VMs not moved together

time echo "generate clusterdrsplan id=2fbbf922-f557-4e23-99c1-e4df3be7210c migrations=10" | cmk [41.437s]
{
  "generateclusterdrsplanresponse": {
    "clusterid": "2fbbf922-f557-4e23-99c1-e4df3be7210c",
    "eventid": "74ef9885-7b6d-47c9-b395-6562f77af001",
    "id": "9eeee535-f45e-4818-9481-9f19c6ff1a18",
    "migrations": [
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "virtualmachinename": "i-2-11-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "8d47e97d-6ffa-4aac-a146-23976f997365",
        "virtualmachinename": "i-2-24-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "5e336605-f5c6-47b9-a847-06a53a7ae696",
        "virtualmachinename": "i-2-43-QA"
      },
      {
        "destinationhostid": "d875cf86-984d-42c3-87f2-1d94cf76309e",
        "sourcehostid": "c27ffc96-fb4f-4e74-a79c-a4e8748f4cff",
        "virtualmachinename": "i-2-19-QA"
      },
      {
        "destinationhostid": "67efe13d-abb1-4493-9ec5-52c970186c99",
        "sourcehostid": "29dc9af0-9d13-47b4-9a7d-c467d851ac49",
        "virtualmachinename": "i-2-4-QA"
      },
      {
        "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a",
        "sourcehostid": "87d290dd-fc82-497f-9e95-497a537dad5a",
        "virtualmachinename": "i-2-10-QA"
      }
    ],
    "status": "UNDER_REVIEW",
    "type": "MANUAL"
  }
}

Observation: Neither drs-test-vm-2 nor drs-test-vm-4 appear in migration plan - anti-affinity constraint respected.
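
A check of this kind can also be scripted against VM IDs instead of names. The Python sketch below is a hypothetical helper, not part of CloudStack or cmk. It assumes the full plan response is available, including the `virtualmachineid` field shown in the TC1 Run 1 output, applies the planned moves to the current placement, and reports whether any two members of the anti-affinity group would share a host. IDs are abbreviated to their first eight characters.

```python
def violates_anti_affinity(placement, migrations, group):
    """Return True if, after applying the plan, two group members share a host.

    placement:  dict of VM ID -> current host ID
    migrations: list of dicts with 'virtualmachineid' and 'destinationhostid'
    group:      iterable of VM IDs in one host anti-affinity group
    """
    final = dict(placement)
    for migration in migrations:
        final[migration["virtualmachineid"]] = migration["destinationhostid"]
    hosts = [final[vm] for vm in group if vm in final]
    return len(hosts) != len(set(hosts))


# Placement of the two group members from Step 3 (IDs abbreviated).
placement = {"e3d4e44c": "94122a66", "948b0e57": "2e236e86"}
group = ["e3d4e44c", "948b0e57"]

# A plan that moves only an unrelated VM leaves the constraint intact.
plan_ok = [{"virtualmachineid": "1b6a7275", "destinationhostid": "fca5a008"}]
# A made-up plan that would co-locate the two group members.
plan_bad = [{"virtualmachineid": "948b0e57", "destinationhostid": "94122a66"}]

print(violates_anti_affinity(placement, plan_ok, group))
print(violates_anti_affinity(placement, plan_bad, group))
```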


TC5: Scale and Consistency Testing

Result: PASS - Consistent performance across repeated runs

Test 1: High migration limit (migrations=50)

time echo "generate clusterdrsplan id=2fbbf922-f557-4e23-99c1-e4df3be7210c migrations=50" | cmk [44.624s]
{
  "generateclusterdrsplanresponse": {
    "clusterid": "2fbbf922-f557-4e23-99c1-e4df3be7210c",
    "eventid": "cdb9dc74-6047-4754-b911-9b3342a2cb96",
    "id": "f56e02e7-455b-4dc2-88df-3e5cc2c29e5c",
    "migrations": [
      {"virtualmachinename": "i-2-11-QA", "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a"},
      {"virtualmachinename": "i-2-24-QA", "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a"},
      {"virtualmachinename": "i-2-43-QA", "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a"},
      {"virtualmachinename": "i-2-19-QA", "destinationhostid": "67efe13d-abb1-4493-9ec5-52c970186c99"},
      {"virtualmachinename": "i-2-4-QA", "destinationhostid": "d875cf86-984d-42c3-87f2-1d94cf76309e"},
      {"virtualmachinename": "i-2-10-QA", "destinationhostid": "fca5a008-952f-4ca5-8a60-2213cfde4b5a"}
    ],
    "status": "UNDER_REVIEW",
    "type": "MANUAL"
  }
}

Result: 6 migrations returned (DRS limits to actual need regardless of requested count)

Test 2: Repeated execution (10 runs)

for i in {1..5}; do echo "=== Run $i ==="; time echo "generate clusterdrsplan ..." | cmk > /dev/null; done
=== Run 1 ===
real	0m45.592s

=== Run 2 ===
real	0m45.816s

=== Run 3 ===
real	0m41.616s

=== Run 4 ===
real	0m41.555s

=== Run 5 ===
real	0m44.450s
for i in {1..5}; do echo "=== Run $i ==="; time cmk <<< "generate clusterdrsplan ..." >/dev/null; done
=== Run 1 ===
real	0m43.672s

=== Run 2 ===
real	0m43.546s

=== Run 3 ===
real	0m45.504s

=== Run 4 ===
real	0m42.770s

=== Run 5 ===
real	0m43.415s

Average: ~43.8s, spread of roughly +/-2s around the average - No performance degradation observed.
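
The summary figures for the ten runs can be recomputed with a few lines of Python. This sketch uses only the `real` timings listed above; the variable names are arbitrary, and the sample standard deviation is included as an extra statistic that the report itself does not state.

```python
import statistics

# `real` timings in seconds from the two loops of five runs each.
runs = [45.592, 45.816, 41.616, 41.555, 44.450,
        43.672, 43.546, 45.504, 42.770, 43.415]

mean = statistics.mean(runs)
fastest, slowest = min(runs), max(runs)
stdev = statistics.stdev(runs)  # sample standard deviation

print(f"mean {mean:.2f}s, range {fastest:.3f}s to {slowest:.3f}s, "
      f"stdev {stdev:.2f}s")
```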


TC6: Edge Case Handling

Result: PASS - Proper error handling for all edge cases

| Test | Input | Expected | Result |
|---|---|---|---|
| Zero migrations | migrations=0 | Error | PASS |
| Invalid cluster | id=invalid-cluster-id | Error | PASS |
| Balanced cluster | drs.imbalance=0.1 | Empty plan | PASS |

time echo "generate clusterdrsplan id=2fbbf922-f557-4e23-99c1-e4df3be7210c migrations=0" | cmk [0.101s]
Error: (HTTP 431, error code 4350) Unable to execute DRS on the cluster C0 as the number of migrations [0] is invalid
echo "generate clusterdrsplan id=invalid-cluster-id migrations=5" | cmk
Error: (HTTP 431, error code 9999) Unable to execute API command generateclusterdrsplan due to invalid value. Invalid parameter id value=invalid-cluster-id due to incorrect long value format, or entity does not exist or due to incorrect parameter annotation for the field in api cmd class.
time echo "generate clusterdrsplan ..." | cmk [with drs.imbalance=0.1, 14.1s]
{
  "generateclusterdrsplanresponse": {
    "clusterid": "2fbbf922-f557-4e23-99c1-e4df3be7210c",
    "eventid": "2de00c31-4730-4de2-96d5-9771bf4159f8",
    "id": "f345b3c3-ae4b-46c1-ad7b-bf887d0feb90",
    "migrations": [],
    "status": "UNDER_REVIEW",
    "type": "MANUAL"
  }
}

TC7: Regression Testing

Result: PASS - Both environments select the same VMs for migration

Plan comparison from TC1 shows both WITH and WITHOUT PR environments selected the same VMs for migration:

  • i-2-41-QA, i-2-11-QA, i-2-24-QA, i-2-43-QA, i-2-19-QA

Differences: execution time (150s vs 42s) and, in the TC1 output, the destination host chosen for i-2-19-QA.

Note: Direct comparison was performed during TC1 before schema upgrade. Subsequent tests used only the WITH PR environment since both servers shared the same database.


Conclusion

All 7 test cases passed. The PR delivers significant performance improvement (3.6x faster) while maintaining functional correctness across both algorithms, affinity rules, and edge cases.

@DaanHoogland requested a review from RosiKyu on December 10, 2025 12:15

@RosiKyu (Collaborator) left a comment

LGTM

Detailed Test Results: #12014 (comment)

@DaanHoogland merged commit 4348386 into apache:4.20 on Dec 10, 2025
24 of 26 checks passed
@DaanHoogland deleted the optimize-drs-plan-generation branch on December 10, 2025 12:24
@sureshanaparti added this to the 4.20.3 milestone on Dec 10, 2025
sandeeplocharla pushed a commit to NetApp/cloudstack that referenced this pull request Feb 6, 2026
Linked issue: KVM DRS optimizations

7 participants