Add container grouping to network connection topology #22601

Open
ktsaou wants to merge 26 commits into netdata:master from ktsaou:topology-containers

@ktsaou ktsaou commented Jun 2, 2026

Summary

This PR elevates topology:network-connections from a process-only topology to an actor-level topology that can group network connections by process name, PID, or container/service identity.

Main changes:

  • Adds NetIPC lookup protocol support across the vendored C, Go, and Rust netipc implementations for CGROUPS_LOOKUP and APPS_LOOKUP request/response flows.
  • Adds a cgroups.plugin lookup server that exposes cgroup status, orchestrator, cgroup names, paths, and labels, and wakes cgroup discovery on lookup misses without whole-cache invalidation.
  • Adds an apps.plugin bridge that resolves process cgroup paths through cgroups.plugin, derives process/container enrichment, exposes APPS_LOOKUP, and reports partial/pending enrichment so downstream consumers can retry instead of caching incomplete data forever.
  • Adds a network-viewer.plugin APPS_LOOKUP client and cache for on-demand per-PID enrichment, with eviction-based cache maintenance, reconnect/failure logging, and IPC health metrics.
  • Extends topology:network-connections with group_by selections:
    • process_name: grouped process actors.
    • pid: one actor per PID with scalar per-PID enrichment fields.
    • container: grouped container/service/user/process-fallback actors using canonical container_name.
  • Adds actor-owned topology detail tables for contributing processes and cgroups, plus merged/set-valued actor labels for fields that vary across grouped PIDs.
  • Adds shared cgroup topology classification rules for Docker, Kubernetes, LXC, Podman, systemd-nspawn, systemd units, VMs, users, and process fallback actors.
  • Replaces the generated cgroup-name.sh helper with a Go cgroup-name binary and updates build, install, RPM, DEB, makeself, and Proxmox docs references.
  • Updates netdata.topology.v1 schema/docs/specs with the new grouping contract, actor table expectations, aggregation-scope descriptions, and icon tokens: docker, kubernetes, lxc, nspawn, podman, systemd, and user.
  • Adds focused tests and fixtures for lookup protocols, cgroup path parsing, cgroup orchestrator classification, network-viewer APPS_LOOKUP caching, and topology container fixture validation.
  • Records the implemented SOWs and updates the topology developer skill/docs used for future topology producer work.
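The three group_by modes can be pictured with a minimal sketch. This is an illustration only, not the plugin's code: the record fields, the `actor_key` and `group_actors` helpers, and the key prefixes are all hypothetical. It assumes every record is already fully resolved, and it keeps fields that vary across PIDs as sets instead of scalars.

```python
# Hypothetical, fully resolved per-PID records; field names are illustrative
# and do not reproduce the netdata.topology.v1 schema.
RECORDS = [
    {"pid": 101, "process_name": "nginx", "container_name": "web", "cgroup_path": "/docker/aaa"},
    {"pid": 102, "process_name": "nginx", "container_name": "web", "cgroup_path": "/docker/aaa"},
    {"pid": 200, "process_name": "postgres", "container_name": "db", "cgroup_path": "/docker/bbb"},
    {"pid": 300, "process_name": "sshd", "container_name": None, "cgroup_path": "/system.slice/ssh.service"},
]


def actor_key(record, group_by):
    """Return the actor identity for one record under the chosen grouping."""
    if group_by == "pid":
        return f"pid:{record['pid']}"
    if group_by == "process_name":
        return f"process:{record['process_name']}"
    if group_by == "container":
        # A resolved record without a container identity falls back to a
        # process actor (assumption: pending records are handled elsewhere).
        name = record["container_name"]
        return f"container:{name}" if name else f"process:{record['process_name']}"
    raise ValueError(f"unsupported group_by: {group_by}")


def group_actors(records, group_by):
    """Group records into actors; per-PID fields become set-valued labels."""
    actors = {}
    for record in records:
        key = actor_key(record, group_by)
        actor = actors.setdefault(key, {"pids": set(), "cgroup_paths": set()})
        actor["pids"].add(record["pid"])
        actor["cgroup_paths"].add(record["cgroup_path"])
    return actors
```

Calling `group_actors(RECORDS, "container")` merges the two nginx PIDs into one `container:web` actor, while `group_actors(RECORDS, "pid")` keeps one actor per PID.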

Design notes:

  • The cache model is on-demand and eviction-based. It does not periodically invalidate whole caches.
  • Pending cgroup/container information remains pending until resolved; it does not permanently fall back to process names on transient lookup misses.
  • Raw cgroup paths and other per-PID fields are scalar only in group_by:pid. Grouped modes preserve variable fields through actor labels/detail tables instead of pretending they are single scalar actor properties.
  • Runtime connection/disconnection/failure logs are retained because they are operator-actionable; per-message debug send/receive logs are not included.
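The first two design notes can be sketched as a small bounded cache. This is a sketch under stated assumptions, not the network-viewer.plugin implementation: the `EnrichmentCache` class, the `lookup` callable, and the `"resolved"`/`"pending"` status strings are hypothetical, and least-recently-used eviction is assumed from the bounded LRU wording in the cubic summary. The point it shows is that pending answers are returned to the caller but never stored, so a later call retries the lookup.

```python
from collections import OrderedDict


class EnrichmentCache:
    """Bounded LRU cache that never stores pending lookup results."""

    def __init__(self, lookup, max_entries=8192):
        # lookup is a hypothetical callable: pid -> (status, info)
        self._lookup = lookup
        self._max_entries = max_entries
        self._entries = OrderedDict()

    def __len__(self):
        return len(self._entries)

    def get(self, pid):
        if pid in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(pid)
            return "resolved", self._entries[pid]

        status, info = self._lookup(pid)
        if status != "resolved":
            # Pending or failed: hand the status back without caching it,
            # so the next request triggers a fresh lookup.
            return status, info

        self._entries[pid] = info
        if len(self._entries) > self._max_entries:
            # Evict only the least recently used entry; the rest of the
            # cache is left untouched.
            self._entries.popitem(last=False)
        return status, info
```

Eviction removes one entry at a time when the bound is exceeded, which matches the stated goal of avoiding periodic whole-cache invalidation.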

Test Plan

Validated locally after rebasing onto current netdata/netdata:master:

  • git diff --check
  • Exact conflict-marker scan: rg -n "^(<<<<<<< |=======$|>>>>>>> )" .
  • Skill metadata limit scan for all SKILL.md descriptions under .agents/skills and docs/netdata-ai/skills
  • sudo -n cmake --build build --target network-viewer.plugin network-viewer-topology-containers-test -j 8
  • ./build/network-viewer-topology-containers-test
  • python3 src/collectors/network-viewer.plugin/tests/validate_topology_container_fixtures.py
  • .agents/sow/audit.sh
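For readers unfamiliar with the conflict-marker scan, the same regular expression can be exercised in a few lines of Python. This is only an illustration of what the `rg` pattern matches; the `find_conflict_markers` helper is hypothetical and is not part of the PR.

```python
import re

# Same pattern as the rg command: a marker must start at the beginning of
# the line, and the separator must be exactly seven '=' characters.
CONFLICT_MARKER = re.compile(r"^(<<<<<<< |=======$|>>>>>>> )")


def find_conflict_markers(text):
    """Return the 1-based line numbers that start with a conflict marker."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if CONFLICT_MARKER.match(line)
    ]
```

Anchoring the separator with `$` keeps longer runs of `=`, such as underlined headings in text files, from being reported.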

Notes from validation:

  • The focused build completed successfully.
  • The build emitted an existing simple_pattern const-qualifier warning outside this PR's touched code.
  • The SOW audit completed successfully and reported existing skill-classification warnings for non-project skill directories; no sensitive-data findings were reported.

Additional Information

This PR is the Agent-side implementation. The Cloud topology aggregation/UI consumption work is intentionally tracked separately because those repositories are outside this Agent branch. The Agent schema and producer payloads now declare the generic grouping, actor table, aggregation-scope, and icon-token contracts needed by those consumers.

Tracked SOWs in this branch:

  • Completed: SOW-0032, SOW-0033, SOW-0034, SOW-0035, SOW-0036, SOW-0037, SOW-0038, SOW-0039, SOW-0040, SOW-0042, SOW-0043, SOW-0044.
  • Pending external/consumer follow-up: SOW-0041, Cloud topology generic aggregation verification.

For users: How does this change affect me?

  • Affected area: Network Connections topology, apps.plugin process enrichment, cgroups.plugin cgroup discovery/enrichment, and the cgroup-name helper.
  • Visible change: topology:network-connections can now group actors by process name, PID, or container/service identity when the UI exposes the group_by selector.
  • Benefit: users can inspect network dependencies at the process level or at a higher-level container/service/user actor level, with richer modal details for contributing processes and cgroups.
  • Operational impact: new IPC health charts expose lookup request/cache/latency behavior for cgroups/apps/network-viewer enrichment paths.
  • Packaging impact: installs now ship cgroup-name instead of cgroup-name.sh; installer and package permission handling were updated accordingly.

Summary by cubic

Adds container/service grouping to topology:network-connections with on‑demand cgroup/apps enrichment, bounded LRU caching, and IPC health charts. Replaces the shell cgroup-name.sh with a Go cgroup-name helper, updates schema/docs/tests, and guards Linux‑only tests to keep CI portable.

  • New Features

    • Actor grouping: group_by:process_name, group_by:pid, group_by:container, with correct identities, merged labels, and detail tables; optional labels:<pattern> whitelist.
    • Enrichment chain: cgroups.plugin serves CGROUPS_LOOKUP and wakes discovery on misses; apps.plugin bridges APPS_LOOKUP; network-viewer.plugin warms a bounded per‑PID cache and exports IPC health/latency charts.
    • Classification/icons: shared rules detect Docker, Kubernetes, LXC, Podman, systemd, nspawn, users, and VMs; emits actor kind/type and icons (docker, kubernetes, lxc, nspawn, podman, systemd, user).
    • Functions/schema/docs/tests: Linux processes gains cgroup/container/service columns; netdata.topology.v1 extended for grouping/aggregation and icons; new how‑tos (Kubernetes pod grouping, find containers by port); focused tests for lookup protocols, cgroup parsing/classification, cache behavior, and containerized topology; vendored @netipc adds lookup codecs/services; network-viewer.plugin adds “apps lookup cache size” (default 8192).
  • Migration

    • Build/package: ship Go cgroup-name; new CMake options ENABLE_CGROUP_NAME, ENABLE_CGROUPS_LOOKUP_SERVER, optional ENABLE_CGROUPS_LOOKUP_TEST_CLIENT; installer/RPM/DEB/makeself permissions updated.
    • API consumers: grouped views and set‑aggregated labels are now part of topology:network-connections; Cloud/UI follow‑up tracked separately; no config changes required.
    • CI/build: vendor lookup protocol updates; add SonarCloud CPD exclusions for large/generated IPC sources; fix analyzer and portability issues across compilers/platforms; guard Linux‑only topology tests to avoid non‑Linux failures.
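As a rough picture of cgroup classification, the sketch below maps a cgroup path to one of the icon tokens named above. The patterns are assumptions based on common cgroup path layouts, not the shared rules added by this PR; `RULES`, `classify_cgroup`, and the `process` fallback label are hypothetical, and VM detection is left out.

```python
import re

# Illustrative patterns only. Order matters because the first match wins:
# orchestrators are checked before the generic systemd unit rule.
RULES = [
    ("kubernetes", re.compile(r"kubepods")),
    ("docker", re.compile(r"(^|/)docker[-/][0-9a-f]{12,64}")),
    ("podman", re.compile(r"libpod-[0-9a-f]{12,64}")),
    ("lxc", re.compile(r"(^|/)lxc(\.payload\.|/)")),
    ("nspawn", re.compile(r"systemd-nspawn@")),
    ("user", re.compile(r"(^|/)user\.slice(/|$)")),
    ("systemd", re.compile(r"\.(service|scope)$")),
]


def classify_cgroup(path):
    """Return an icon token for a cgroup path, or 'process' as a fallback."""
    for icon, pattern in RULES:
        if pattern.search(path):
            return icon
    return "process"
```

For example, `classify_cgroup("/system.slice/ssh.service")` returns `systemd`, while a path under `user.slice` is classified as `user` before the generic unit rule is reached.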

Written for commit 8b65264. Summary will update on new commits.

@github-actions github-actions Bot added the following labels on Jun 2, 2026: area/packaging (Packaging and operating systems support), area/docs, area/collectors (Everything related to data collection), area/build (Build system: autotools and cmake), collectors/apps, collectors/cgroups, collectors/go.d, area/metadata (Integrations metadata), area/go

cubic-dev-ai Bot commented Jun 2, 2026

@cubic-dev-ai please review again

@ktsaou I have started the AI code review. It will take a few minutes to complete.

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

@cubic-dev-ai cubic-dev-ai Bot left a comment

5 issues found across 135 files

Confidence score: 4/5

  • This PR looks safe to merge overall: the reported items are mostly quality/configuration and documentation consistency issues, with no clear runtime-breaking defect identified.
  • Most severe is in .codacy.yml (5/10): excluding actively maintained source headers from analysis can reduce static-analysis coverage and make future defects easier to miss.
  • In .sonarcloud.properties, missing CPD exclusions for mirrored unix/windows implementations (cache.go/cache_windows.go and raw/client.go/raw/client_windows.go) is likely to create noisy duplication findings and review friction rather than direct user-facing regressions.
  • Pay close attention to .codacy.yml, .sonarcloud.properties, docs/netdata-ai/skills/query-netdata-agents/how-tos/find-containers-for-topology-port-direct.md, and docs/netdata-ai/skills/query-netdata-cloud/how-tos/group-network-topology-by-kubernetes-pod.md - align analysis/doc patterns to avoid reduced coverage and avoidable confusion.

Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.
On a pro plan you can use ultrareview for larger PRs.

Comment thread .sonarcloud.properties
Comment thread .sonarcloud.properties
Comment thread .codacy.yml Outdated
Comment thread src/collectors/apps.plugin/apps_os_linux.c Dismissed
Comment thread src/collectors/apps.plugin/apps_os_linux.c Dismissed

ktsaou commented Jun 2, 2026

@cubic-dev-ai please review again

@ktsaou ktsaou requested a review from Copilot June 2, 2026 09:28

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

cubic-dev-ai Bot commented Jun 2, 2026

@cubic-dev-ai please review again

@ktsaou I have started the AI code review. It will take a few minutes to complete.

ktsaou commented Jun 2, 2026

@cubic-dev-ai please review again

@ktsaou ktsaou requested a review from Copilot June 2, 2026 09:31

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

cubic-dev-ai Bot commented Jun 2, 2026

@cubic-dev-ai please review again

@ktsaou I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 1 file (changes from recent commits).

You've manually re-run cubic several times on this PR. Each manual re-review checks the full PR again and counts toward your usage quota. To preserve your usage limits, we recommend letting cubic automatically review new commits.

ktsaou commented Jun 2, 2026

@cubic-dev-ai please review again

@ktsaou ktsaou requested a review from Copilot June 2, 2026 10:10

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

cubic-dev-ai Bot commented Jun 2, 2026

@cubic-dev-ai please review again

@ktsaou I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai Bot commented Jun 2, 2026

You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment @cubic-dev-ai review.

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 7 files (changes from recent commits).

ktsaou commented Jun 2, 2026

@cubic-dev-ai please review again

@ktsaou ktsaou requested a review from Copilot June 2, 2026 18:10

cubic-dev-ai Bot commented Jun 2, 2026

@cubic-dev-ai please review again

@ktsaou I have started the AI code review. It will take a few minutes to complete.

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 1 file (changes from recent commits).

sonarqubecloud Bot commented Jun 2, 2026

Labels

area/build (Build system: autotools and cmake), area/collectors (Everything related to data collection), area/docs, area/go, area/metadata (Integrations metadata), area/packaging (Packaging and operating systems support), area/plugins.d, collectors/apps, collectors/cgroups, collectors/go.d
