
Enable kubernetes_node_scale benchmark (up to 5k nodes) on GCP GKE with CCC#6536

Merged
copybara-service[bot] merged 8 commits into GoogleCloudPlatform:master from kiryl-filatau:gcp-5k-ccc
Apr 21, 2026
Conversation

@vofish (Collaborator) commented Mar 12, 2026

Summary

Enables running the kubernetes_node_scale benchmark (scale up → scale down → scale up again, toward large node counts) on GCP GKE with Custom ComputeClass and node pool auto-creation (NAP) where configured. The benchmark applies a Deployment with pod anti-affinity (one pod per node), records scale-up, scale-down, and second scale-up timings, then tears down.

Main changes

  • kubernetes_node_scale.yaml.j2 - Deployment only (pause pods, anti-affinity). No ComputeClass in the manifest; the cluster path creates the ComputeClass (aligned with upstream PKB / custom compute class support).
  • GKE - GetNodeSelectors sets cloud.google.com/compute-class to the default node pool name when _UsesCustomComputeClass(default_nodepool) is true, so ModifyPodSpecPlacementYaml adds the selector without cloud-specific YAML in the benchmark template.
  • GKE flags - gke_autoscaling_profile (optimize-utilization/balanced) and gke_cluster_ipv4_cidr_size for large scale-outs; gke_autoscaling_profile is also stored in GetResourceMetadata for run-to-run comparison.
  • Machine families - Use existing --k8s_machine_families (and ContainerClusterSpec.machine_families) instead of a dedicated NAP machine-type flag; container_spec._ApplyFlags reads flag_values.k8s_machine_families so the flag applies correctly.
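
The GetNodeSelectors behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual PKB implementation; the function name and arguments are assumptions:

```python
def get_node_selectors(
    nodepool_name: str, uses_custom_compute_class: bool
) -> dict[str, str]:
    """Returns nodeSelector labels steering pods onto the custom compute class."""
    if uses_custom_compute_class:
        # GKE then schedules the pods onto node pools auto-created (NAP)
        # for this compute class, named after the default node pool.
        return {'cloud.google.com/compute-class': nodepool_name}
    return {}
```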

Usage notes

  • For the Custom ComputeClass + NAP path on GCP, pass something like --k8s_machine_families=e2.
  • Raise container_cluster.max_vm_count (or equivalent) so the cluster autoscaler can reach your target --kubernetes_scale_num_nodes.

)

NAP_MACHINE_TYPE = flags.DEFINE_string(
'kubernetes_node_scale_nap_machine_type',

Rather than this, use the flag K8S_MACHINE_FAMILIES created in:
#6559

(or set via config_overrides)

kind: ComputeClass
metadata:
name: app-ccc
spec:

Most of this should be removed, as it duplicates https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/pull/6559/changes

If, e.g., storage and spot=false are quite important, we can make some revisions within that framework.

cmd.flags['min-nodes'] = self.min_nodes
cmd.flags['cluster-ipv4-cidr'] = f'/{_CalculateCidrSize(self.max_nodes)}'
if gcp_flags.GKE_AUTOSCALING_PROFILE.value:
cmd.flags['autoscaling-profile'] = gcp_flags.GKE_AUTOSCALING_PROFILE.value
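
The helper _CalculateCidrSize is not shown in this hunk. A plausible sketch, assuming GKE's default of one /24 pod range per node, so the cluster CIDR must hold at least max_nodes such blocks (the real helper may differ):

```python
import math

def calculate_cidr_size(max_nodes: int) -> int:
    """Hypothetical stand-in for _CalculateCidrSize.

    Assumes each node consumes one /24 pod range (GKE's default of up to
    110 pods per node), so the cluster CIDR mask must leave room for at
    least max_nodes /24 blocks.
    """
    blocks_bits = math.ceil(math.log2(max(max_nodes, 1)))
    return 24 - blocks_bits

# e.g. 5000 nodes need 2**13 = 8192 blocks of /24, hence a /11 cluster CIDR
```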

Let's also get this value into ResourceMetadata, since it sounds like something we might change over multiple runs (e.g., the only difference between two runs could be the autoscaling-profile, and we'd like to compare them; ResourceMetadata lets us distinguish the two from just the results).

# Timeout for "kubectl delete all --all" during teardown. Increase for
# large-scale runs (e.g. 5000+ pods) to avoid benchmark failure.
flags.DEFINE_integer(
'kubernetes_teardown_delete_timeout',

Just looking at this PR, is this flag actually used? Seems reasonable enough.

- app
topologyKey: "kubernetes.io/hostname"
{% if cloud == 'GCP' %}
nodeSelector:

Is this always the case? Prefer to pass in whether we're using custom compute classes or not.

@vofish vofish requested a review from hubatish April 8, 2026 16:29
cluster = bm_spec.container_cluster
manifest_kwargs: dict[str, Any] = {'cloud': FLAGS.cloud}
if cluster.default_nodepool.machine_families:
manifest_kwargs['gcp_compute_class_name'] = cluster.default_nodepool.name

Can this go in ModifyPodSpecPlacementYaml instead? I believe that code already deals with nodeSelector, and it would keep any cloud-specific reference (e.g., cloud.google.com/compute-class) out of the YAML file.
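
The nodeSelector merge the reviewer suggests could look roughly like this, operating on a parsed pod spec. This is a sketch under assumed names, not PKB's actual ModifyPodSpecPlacementYaml:

```python
def modify_pod_spec_placement(
    pod_spec: dict, node_selectors: dict[str, str]
) -> dict:
    """Merges cloud-specific node selectors into a parsed pod spec dict.

    Keeps the benchmark YAML template cloud-agnostic: labels such as
    cloud.google.com/compute-class are injected here rather than
    hard-coded in the Jinja template.
    """
    if node_selectors:
        pod_spec.setdefault('nodeSelector', {}).update(node_selectors)
    return pod_spec
```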

@vofish vofish requested a review from hubatish April 8, 2026 17:14
@hubatish (Collaborator) left a comment

Cool, I think that's all my comments addressed! Can you resolve them as well? Approving.

@hubatish (Collaborator) commented Apr 8, 2026

Summary

Enables running the kubernetes_node_scale benchmark (0→5k→0→5k nodes) on GCP GKE with Custom ComputeClass. The benchmark scales a deployment with pod anti-affinity, measures scale-up, scale-down, and a second scale-up, then tears down the cluster.

Main changes

  • kubernetes_node_scale.yaml.j2 - added ComputeClass and nodeSelector sections
  • GKE - added flags for autoscaling profile and ipv4 cidr size
  • Add --kubernetes_node_scale_nap_machine_type (default e2-medium) and render ComputeClass machineType from the Jinja template via nap_machine_type, with a template default for callers that omit the variable.

This description should also be updated

copybara-service Bot pushed a commit that referenced this pull request Apr 8, 2026
Specifically some changes in this flags.py file ran into internal vs external merge conflicts, so manually splitting this into a separate PR. Flags are therefore not used in this PR but will be used in a follow up.

PiperOrigin-RevId: 896710171
copybara-service Bot pushed a commit that referenced this pull request Apr 8, 2026
Specifically some changes in this flags.py file ran into internal vs external merge conflicts, so manually splitting this into a separate PR. Flags are therefore not used in this PR but will be used in a follow up.

PiperOrigin-RevId: 896733417
@copybara-service copybara-service Bot merged commit 0ffe5f5 into GoogleCloudPlatform:master Apr 21, 2026
4 checks passed