Added support for removing unused port groups on VMWare #4701

Merged
nvazquez merged 5 commits into apache:main from shapeblue:vmware-cleanup-port-groups on Aug 28, 2021

Conversation

@Spaceman1984 commented Feb 18, 2021

Contributor

Description

This PR adds a global setting (vmware.cleanup.port.groups) to toggle removing unused port groups from VMware hypervisor hosts.

Fixes: #3779

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

This has been manually tested using the steps below:

  • Set global setting (vmware.cleanup.port.groups) to true / false.
  • Create an instance on a new network.
  • Check network config for hosts in vSphere client for the newly created port group.
  • Destroy and expunge the instance.
  • Destroy the router.
  • Check network config for hosts in vSphere client to confirm whether the port group has been removed or not, depending on the global setting (vmware.cleanup.port.groups).

@Spaceman1984

Contributor Author

@blueorangutan package

@blueorangutan

@Spaceman1984 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2735

@Spaceman1984

Contributor Author

@blueorangutan test centos7 vmware-67u3

@blueorangutan

@Spaceman1984 a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-3570)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 38865 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4701-t3570-vmware-67u3.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Smoke tests completed. 85 look OK, 1 have error(s)
Only failed tests results shown below:

Test | Result | Time (s) | Test File
test_03_deploy_and_upgrade_kubernetes_cluster | Failure | 776.06 | test_kubernetes_clusters.py

@Spaceman1984 marked this pull request as ready for review February 24, 2021 06:56

@Spaceman1984

Contributor Author

@blueorangutan package

@blueorangutan

@Spaceman1984 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. [S]

@blueorangutan

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 88

@DaanHoogland

Contributor

Code looks good, @Spaceman1984, but I don't see how it is being called. Am I missing something?

"Vmware script timeout for ova packaging process", true, ConfigKey.Scope.Global, 1000);

static final ConfigKey<Boolean> s_vmwareCleanupPortGroups = new ConfigKey<Boolean>("Advanced", Boolean.class, "vmware.cleanup.port.groups", "false",
        "Remove unused port groups. WARNING: When set to true, native VMware HA might not work.", true, ConfigKey.Scope.Global);

@sureshanaparti Mar 17, 2021

Contributor

@Spaceman1984 If my understanding is correct, port group cleanup happens only during network cleanup, and the old port groups of other networks that were left uncleaned (while this setting was false) still remain. If so, can you mention in the setting description that port group cleanup happens only during the network cleanup for that network?

Is it possible to clean up the old port groups of other networks through a garbage collector when this setting is true?

@Spaceman1984 Jun 24, 2021

Contributor Author

To my knowledge, this setting is not used through garbage collection; it is checked when an instance is destroyed. I'm not sure whether there is some kind of garbage collection that would destroy a VR; if there is, then I suppose this will also apply.

@nvazquez

Contributor

Hi @Spaceman1984 there are some open questions on this PR, can you please advise?

@Spaceman1984

Contributor Author

> Hi @Spaceman1984 there are some open questions on this PR, can you please advise?

Sure, @nvazquez

@Spaceman1984 commented Jun 24, 2021

Contributor Author

> Code looks good, @Spaceman1984, but I don't see how it is being called. Am I missing something?

This setting is checked when an instance is destroyed @DaanHoogland. If the instance being destroyed is the last instance (including VR) using a specific network, the port group will be removed.
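
The check described above can be illustrated with a minimal Java sketch. It is not the code from this PR: the class `PortGroupCleanupSketch`, its methods, and the in-memory counter are all hypothetical stand-ins, and the only detail taken from the PR is the boolean setting (vmware.cleanup.port.groups, default false). It assumes a simple per-port-group count of attached instances, with VRs counted like any other instance.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these names come from the CloudStack code base. */
final class PortGroupCleanupSketch {
    // Stand-in for the vmware.cleanup.port.groups global setting.
    private final boolean cleanupPortGroups;
    // Port group name -> number of instances (including VRs) still attached to it.
    private final Map<String, Integer> attachedInstances = new HashMap<>();

    PortGroupCleanupSketch(boolean cleanupPortGroups) {
        this.cleanupPortGroups = cleanupPortGroups;
    }

    /** Records that one more instance uses the given port group. */
    void attach(String portGroup) {
        attachedInstances.merge(portGroup, 1, Integer::sum);
    }

    /** Destroys one instance; returns true only if the port group was removed as a result. */
    boolean destroyInstance(String portGroup) {
        Integer count = attachedInstances.get(portGroup);
        if (count == null || count == 0) {
            return false; // unknown port group, or nothing attached to it
        }
        if (count > 1) {
            attachedInstances.put(portGroup, count - 1);
            return false; // other instances still use the port group
        }
        if (cleanupPortGroups) {
            attachedInstances.remove(portGroup); // last instance gone: remove the port group
            return true;
        }
        attachedInstances.put(portGroup, 0); // setting is false: leave the unused port group
        return false;
    }

    /** True while the port group is still present, whether or not it is in use. */
    boolean exists(String portGroup) {
        return attachedInstances.containsKey(portGroup);
    }
}
```

With the setting on, destroying the second of two attached instances removes the port group; with it off, the port group stays behind with a count of zero.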

@DaanHoogland left a comment

Contributor

clgtm

@sureshanaparti requested a review from nvazquez June 27, 2021 16:29

@nvazquez

Contributor

@blueorangutan test centos7 vmware-67u3

@blueorangutan

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1380)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 36550 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4701-t1380-vmware-67u3.zip
Intermittent failure detected: /marvin/tests/smoke/test_pvlan.py
Smoke tests completed. 88 look OK, 1 have error(s)
Only failed tests results shown below:

Test | Result | Time (s) | Test File
test_create_pvlan_network | Error | 0.03 | test_pvlan.py

@sureshanaparti

Contributor

@Spaceman1984 I tested this manually with multiple vSphere hosts (and Standard vSwitches) and noticed that the port groups are removed only on the one host where the last destroyed VM (usually the VR) resides. Please make sure the unused port groups are removed from all the hosts. Also, verify this with a Distributed vSwitch. Thanks.

@yadvr commented Aug 9, 2021

Member

@sureshanaparti @andrijapanicsb @borisstoyanov @vladimirpetrov @nvazquez are we lgtm to merge this?

@nvazquez

Contributor

@Spaceman1984 could you check @sureshanaparti's last comment?

@sureshanaparti

Contributor

> @sureshanaparti @andrijapanicsb @borisstoyanov @vladimirpetrov @nvazquez are we lgtm to merge this?

no @rhtyd

@sureshanaparti force-pushed the vmware-cleanup-port-groups branch from 5cc8fb6 to e0a3f40 August 24, 2021 13:44

@sureshanaparti

Contributor

@blueorangutan package

@blueorangutan

@sureshanaparti a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@sureshanaparti

Contributor

> @Spaceman1984 I tested this manually with multiple vSphere hosts (and Standard vSwitches) and noticed that the port groups are removed only on the one host where the last destroyed VM (usually the VR) resides. Please make sure the unused port groups are removed from all the hosts. Also, verify this with a Distributed vSwitch. Thanks.

Addressed the changes to clean up the unused port groups on all the hosts. Tested this manually with multiple vSphere hosts, with the network configured through standard vSwitches on them.
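
As a rough illustration of the difference between cleaning up one host and cleaning up all of them, the sketch below walks every host and removes the unused port group wherever it is found. It is only a sketch under stated assumptions: `MultiHostCleanupSketch`, `removeFromAllHosts`, and the map of host name to port group names are hypothetical and do not reflect the classes or vSphere calls used in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; the host and port group model is invented for illustration. */
final class MultiHostCleanupSketch {
    private MultiHostCleanupSketch() {
    }

    /**
     * Removes the given port group from every host that still has it,
     * not only from the host where the last VM was destroyed.
     * Returns the names of the hosts that were actually changed.
     */
    static List<String> removeFromAllHosts(Map<String, Set<String>> portGroupsByHost, String portGroup) {
        List<String> cleanedHosts = new ArrayList<>();
        for (Map.Entry<String, Set<String>> host : portGroupsByHost.entrySet()) {
            // Set.remove returns true only when the port group was present on this host.
            if (host.getValue().remove(portGroup)) {
                cleanedHosts.add(host.getKey());
            }
        }
        return cleanedHosts;
    }
}
```

Other port groups on each host are left untouched, and hosts that never had the port group are simply skipped.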

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 997

@sureshanaparti

Contributor

@blueorangutan test centos7 vmware-67u3

@blueorangutan

@sureshanaparti a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1763)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 64141 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4701-t1763-vmware-67u3.zip
Intermittent failure detected: /marvin/tests/smoke/test_internal_lb.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Smoke tests completed. 87 look OK, 2 have error(s)
Only failed tests results shown below:

Test | Result | Time (s) | Test File
test_02_internallb_roundrobin_1RVPC_3VM_HTTP_port80 | Failure | 1074.85 | test_internal_lb.py
test_09_expunge_vm | Failure | 423.54 | test_vm_life_cycle.py

@DaanHoogland

Contributor

@Spaceman1984, can you look at the conflicts here?

@nvazquez

Contributor

Conflicts fixed

@blueorangutan package

@blueorangutan

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1044

@nvazquez

Contributor

@blueorangutan test centos7 vmware-67u3

@blueorangutan

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1817)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 37999 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4701-t1817-vmware-67u3.zip
Smoke tests completed. 89 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@nvazquez left a comment

Contributor

LGTM

@nvazquez merged commit 1d3083d into apache:main Aug 28, 2021

Development

Successfully merging this pull request may close these issues.

[VMware / PVLAN] Port groups seem never cleaned up on ESXi when networks are deleted and thus block reusing the same VLANs (

8 participants