Add container grouping to network connection topology #22601
5 issues found across 135 files
Confidence score: 4/5
- This PR looks safe to merge overall: the reported items are mostly quality/configuration and documentation consistency issues, with no clear runtime-breaking defect identified.
- Most severe is in `.codacy.yml` (5/10): excluding actively maintained source headers from analysis can reduce static-analysis coverage and make future defects easier to miss.
- In `.sonarcloud.properties`, missing CPD exclusions for mirrored unix/windows implementations (`cache.go`/`cache_windows.go` and `raw/client.go`/`raw/client_windows.go`) are likely to create noisy duplication findings and review friction rather than direct user-facing regressions.
- Pay close attention to `.codacy.yml`, `.sonarcloud.properties`, `docs/netdata-ai/skills/query-netdata-agents/how-tos/find-containers-for-topology-port-direct.md`, and `docs/netdata-ai/skills/query-netdata-cloud/how-tos/group-network-topology-by-kubernetes-pod.md` - align analysis/doc patterns to avoid reduced coverage and avoidable confusion.
0 issues found across 1 file (changes from recent commits).
0 issues found across 7 files (changes from recent commits).
0 issues found across 1 file (changes from recent commits).
Summary
This PR elevates `topology:network-connections` from a process-only topology to an actor-level topology that can group network connections by process name, PID, or container/service identity.

Main changes:
- `CGROUPS_LOOKUP` and `APPS_LOOKUP` request/response flows.
- `cgroups.plugin` lookup server that exposes cgroup status, orchestrator, cgroup names, paths, and labels, and wakes cgroup discovery on lookup misses without whole-cache invalidation.
- `apps.plugin` bridge that resolves process cgroup paths through `cgroups.plugin`, derives process/container enrichment, exposes `APPS_LOOKUP`, and reports partial/pending enrichment so downstream consumers can retry instead of caching incomplete data forever.
- `network-viewer.plugin` `APPS_LOOKUP` client and cache for on-demand per-PID enrichment, with eviction-based cache maintenance, reconnect/failure logging, and IPC health metrics.
- `topology:network-connections` with `group_by` selections:
  - `process_name`: grouped process actors.
  - `pid`: one actor per PID with scalar per-PID enrichment fields.
  - `container`: grouped container/service/user/process-fallback actors using canonical `container_name`.
- Replaces the `cgroup-name.sh` helper with a Go `cgroup-name` binary and updates build, install, RPM, DEB, makeself, and Proxmox docs references.
- Extends the `netdata.topology.v1` schema/docs/specs with the new grouping contract, actor table expectations, aggregation-scope descriptions, and icon tokens: `docker`, `kubernetes`, `lxc`, `nspawn`, `podman`, `systemd`, and `user`.

Design notes:
- `group_by:pid`. Grouped modes preserve variable fields through actor labels/detail tables instead of pretending they are single scalar actor properties.

Test Plan
Validated locally after rebasing onto current `netdata/netdata:master`:
- `git diff --check`
- `rg -n "^(<<<<<<< |=======$|>>>>>>> )" .`
- `SKILL.md` descriptions under `.agents/skills` and `docs/netdata-ai/skills`
- `sudo -n cmake --build build --target network-viewer.plugin network-viewer-topology-containers-test -j 8`
- `./build/network-viewer-topology-containers-test`
- `python3 src/collectors/network-viewer.plugin/tests/validate_topology_container_fixtures.py`
- `.agents/sow/audit.sh`

Notes from validation:
- `simple_pattern` const-qualifier warning outside this PR's touched code.

Additional Information
This PR is the Agent-side implementation. The Cloud topology aggregation/UI consumption work is intentionally tracked separately because those repositories are outside this Agent branch. The Agent schema and producer payloads now declare the generic grouping, actor table, aggregation-scope, and icon-token contracts needed by those consumers.
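As a rough illustration of the grouping contract mentioned above, the following sketch shows how a consumer might derive actor identities for the three `group_by` selections. It is not the Agent's implementation: the record fields (`pid`, `process_name`, `container_name`), the key prefixes, and the helper names `actor_key` and `group_actors` are all hypothetical, and the container mode only models the process fallback, not the service or user fallbacks.

```python
from collections import defaultdict

# Hypothetical per-connection records; field names are illustrative only.
CONNECTIONS = [
    {"pid": 101, "process_name": "nginx", "container_name": "web"},
    {"pid": 102, "process_name": "nginx", "container_name": "web"},
    {"pid": 201, "process_name": "postgres", "container_name": "db"},
    {"pid": 301, "process_name": "sshd", "container_name": None},
]


def actor_key(conn, group_by):
    """Return a hypothetical actor identity for one connection."""
    if group_by == "pid":
        return f"pid:{conn['pid']}"
    if group_by == "process_name":
        return f"process:{conn['process_name']}"
    if group_by == "container":
        name = conn.get("container_name")
        # Assumption: processes without a container fall back to a process actor.
        return f"container:{name}" if name else f"process:{conn['process_name']}"
    raise ValueError(f"unsupported group_by: {group_by}")


def group_actors(connections, group_by):
    """Aggregate connections into actors, keeping the set of member PIDs."""
    actors = defaultdict(lambda: {"pids": set(), "connections": 0})
    for conn in connections:
        actor = actors[actor_key(conn, group_by)]
        actor["pids"].add(conn["pid"])
        actor["connections"] += 1
    return dict(actors)
```

In grouped modes the per-PID values survive as a collection on the actor (here the `pids` set) rather than as a single scalar, which mirrors the design note about labels and detail tables.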
Tracked SOWs in this branch:
For users: How does this change affect me?
- `cgroup-name` helper.
- `topology:network-connections` can now group actors by process name, PID, or container/service identity when the UI exposes the `group_by` selector.
- `cgroup-name` instead of `cgroup-name.sh`; installer and package permission handling were updated accordingly.

Summary by cubic
Adds container/service grouping to `topology:network-connections` with on-demand cgroup/apps enrichment, bounded LRU caching, and IPC health charts. Replaces the shell `cgroup-name.sh` with a Go `cgroup-name` helper, updates schema/docs/tests, and guards Linux-only tests to keep CI portable.

New Features
- `group_by:process_name`, `group_by:pid`, `group_by:container`, with correct identities, merged labels, and detail tables; optional `labels:<pattern>` whitelist.
- `cgroups.plugin` serves `CGROUPS_LOOKUP` and wakes discovery on misses; `apps.plugin` bridges `APPS_LOOKUP`; `network-viewer.plugin` warms a bounded per-PID cache and exports IPC health/latency charts.
- Icon tokens (`docker`, `kubernetes`, `lxc`, `nspawn`, `podman`, `systemd`, `user`).
- `processes` gains cgroup/container/service columns; `netdata.topology.v1` extended for grouping/aggregation and icons; new how-tos (Kubernetes pod grouping, find containers by port); focused tests for lookup protocols, cgroup parsing/classification, cache behavior, and containerized topology; vendored `@netipc` adds lookup codecs/services; `network-viewer.plugin` adds "apps lookup cache size" (default 8192).

Migration
- `cgroup-name`; new CMake options `ENABLE_CGROUP_NAME`, `ENABLE_CGROUPS_LOOKUP_SERVER`, optional `ENABLE_CGROUPS_LOOKUP_TEST_CLIENT`; installer/RPM/DEB/makeself permissions updated.
- `topology:network-connections`; Cloud/UI follow-up tracked separately; no config changes required.

Written for commit 8b65264. Summary will update on new commits.
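The bounded per-PID cache and the "retry instead of caching incomplete data" behavior described in this PR can be pictured with a small sketch. This is an assumption-laden illustration in Python, not the plugin's C code: the class name `PidEnrichmentCache`, the `lookup` callable, and the `pending` flag on results are hypothetical, and only the default size of 8192 comes from the summary above.

```python
from collections import OrderedDict


class PidEnrichmentCache:
    """Bounded LRU cache that never stores pending (incomplete) results."""

    def __init__(self, lookup, max_entries=8192):
        self._lookup = lookup          # hypothetical callable: pid -> dict or None
        self._max = max_entries
        self._entries = OrderedDict()  # oldest entry first, most recent last

    def __len__(self):
        return len(self._entries)

    def get(self, pid):
        if pid in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(pid)
            return self._entries[pid]

        result = self._lookup(pid)
        if result is None or result.get("pending"):
            # Incomplete enrichment is returned but not cached,
            # so the next call retries the lookup.
            return result

        self._entries[pid] = result
        if len(self._entries) > self._max:
            # Evict the least recently used entry to stay within the bound.
            self._entries.popitem(last=False)
        return result
```

Skipping the insert for pending results trades a few repeated lookups for the guarantee that a half-enriched PID is never served from cache indefinitely.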