Netdata fixes part 32#22972
Open
stelfrag wants to merge 6 commits into
Open
Conversation
simple_pattern_trim_around_equal() currently reads before its mallocz buffer when a label matcher starts with '='. A trailing '=' also lets the '=' branch copy the NUL and advance the input pointer past the terminator, so the next loop condition reads beyond the string; depending on adjacent heap contents the leading case could also move dst before the allocation and write there. Guard the previous-byte check with dst > store and break after processing terminal '=' so non-terminal fall-through behavior stays byte-for-byte unchanged. Functional impact: preserves existing host/chart label matcher parsing, including malformed leading and trailing '=' semantics covered by rrdlabels tests. Performance impact: one pointer comparison only on '=' bytes, no new allocations, and no extra pass over the input. Security impact: removes undefined out-of-bounds reads and potential one-byte output underflow/overflow paths. Validation: git diff --check; ASAN CMake configuration; ASAN object build for health_prototypes.c and rrdlabels.c; ASAN build-asan-min/netdata -W rrdlabelstest. Independent source review: three required reviewers approved with no blockers. (cherry picked from commit a2aeaeb) (cherry picked from commit 57ac9b93f167d0d0fba32c18e71860dc4bc6cf69)
MEM-pluginsd-004 is a potential integer underflow: pluginsd_host() reconstructs vnode last_seen from the parser-local Judy array and casts (now - last_seen) to uint32_t. If CLOCK_REALTIME moves backward after the vnode timestamp is refreshed, the negative delta wraps to a huge age and can incorrectly mark an active virtual node stale. Current vs potential: HOST localhost stale scanning is a current PLUGINSD parser path, but the faulty branch requires a wall-clock rollback or equivalent realtime skew, so this is classified as potential. Functional impact: normal stale detection is unchanged when now is after last_seen. Future last_seen values are treated as age 0, preventing false vnode stale transitions, online flag churn, node-state updates, and streaming stop signaling caused by clock skew. Performance impact: adds one signed time_t comparison per tracked vnode during stale scans; no allocations, locks, or additional scans. Solution: compute seen_seconds_ago only when now is after last_seen and clamp future or equal timestamps to zero before comparing with node_stale_after_seconds. (cherry picked from commit fd2c57d) (cherry picked from commit c9b0e14ff648b9aae3e77d7ee7c618b73e28758e)
facets_reset_key() marked a key as FACET_KEY_VALUE_NONE and reset the hash, but left the transient current-value payload from the previous row in place. That included the borrowed raw pointer, raw length, reusable buffer length, and current FACET_VALUE pointer. Current vs potential: current valid flows appear to gate current-value reads on FACET_KEY_VALUE_UPDATED or repopulate missing values before reading them, so this is a potential stale-state bug rather than a proven currently exercised read. Leaving stale payload fields after clearing the update flag still makes future or accidental ungated reads observe prior-row state. Solution: reset raw, raw_len, b->len, and v at the same point that the key is marked as having no current value. This matches the existing empty and unsampled setter pattern for the raw fields while preserving ownership of the key-owned reusable buffer. Functional impact: valid facet processing is unchanged because valid readers must not consume current-value payload after FACET_KEY_VALUE_NONE and the next value assignment repopulates these fields before use. Invalid stale reads now see null or empty reset state instead of a previous row's payload. Performance impact: negligible. The reset helper gains four scalar stores and adds no allocation, freeing, locking, hashing, branching, or copying. (cherry picked from commit 08d9439) (cherry picked from commit 0088b803aaa95f873c4ac27ce8df9e0c5203e469)
Issue: check_if_resumed_from_suspension() subtracted realtime and monotonic samples as unsigned usec_t values before proving the clocks had moved forward. A backward CLOCK_REALTIME step could wrap realtime - last_realtime into a huge value and falsely trigger hibernation delay. Current vs potential: the arithmetic bug is current in the source. Runtime impact is potential because it requires a backward realtime sample between health loop iterations. Solution: require realtime to move forward and monotonic time to not move backward before computing deltas, then compare realtime_delta > 2 * monotonic_delta with subtraction instead of multiplication to avoid unsigned overflow. Functional impact: normal forward-clock hibernation detection is unchanged. Backward or non-forward samples now return false instead of feeding wrapped deltas into the detector. Performance impact: two branch predicates and two local delta variables once per health loop iteration; no allocation, locking, I/O, or lifecycle changes. Validation: git diff --check passed; the touched health_event_loop object compiled with low-priority build; codegraph and rg found one direct caller; external reviewers glm, minimax, deepseek, and kimi approved the actual diff. (cherry picked from commit 732a609) (cherry picked from commit 450e9d0674055514326cae86c053649c60a5c3bf)
str_to_facets_hash() decoded eleven fixed character offsets before validating the local string contract. The current code paths already validate externally supplied facet ids before they reach this helper, so this is a potential helper-local safety bug rather than a currently exercised public-input crash path. Keep the fix surgical by rejecting NULL and any string whose bounded length is not the canonical FACET_STRING_HASH_SIZE - 1 before the fixed-offset decode. Valid generated or validated facet ids still decode exactly as before, while invalid direct helper inputs return FACETS_HASH_ZERO, matching the existing sentinel behavior used by the caller paths. Functional impact: none for valid facet ids and no intended behavior change for the existing request flows; invalid helper-local input is rejected earlier instead of being decoded from undersized data. Performance impact: negligible. The added strnlen() is bounded to FACET_STRING_HASH_SIZE and runs only while parsing or registering selected facet ids, not in the per-row log scan path. Validation: worker traced direct callers with codegraph and rg, verified request-path impact, ran git diff --check, configured CMake with low-priority scheduling, and built the libnetdata target. Reviewer gate: glm, minimax, deepseek, and kimi approved after semantic review; minimax required a same-scope rerun because the first output was incomplete. (cherry picked from commit 722b6fc) (cherry picked from commit f0c95f780a4feb12d3041cc4a8a7a12ae24ccd75)
registry_fix_machine_name() normalizes leading, internal, and trailing whitespace by walking a mutable string. Its old trailing-trim loop used while (--t >= s), which can form a pointer one byte before the start of the string before the relational bounds check is evaluated. For empty or all-trimmed normalized input, that is undefined behavior even when the pointer is not dereferenced. Current vs potential bug: the undefined pointer expression is source-level current in the helper for empty normalized input. Runtime reachability is plausible through registry request or log-replay inputs, but the fix does not depend on proving a crash path because the existing loop violates the C pointer bounds contract. Solution: keep t as the one-past-end pointer from the forward scan and trim with while (t > s) plus t[-1]. When the previous character is a space, decrement t inside the assignment and write the terminator; otherwise stop. The final length remains t - s. Functional impact: intended behavior is unchanged. The helper still skips leading whitespace, converts internal whitespace to spaces, trims trailing spaces in place, returns the same normalized pointer, and reports the same length for valid inputs. Performance impact: neutral. The algorithm remains a forward O(n) normalization pass plus backward O(n) trim pass, with no allocation, locking, hashing, I/O, or dictionary work. Validation: inspected the staged diff and kept the change limited to registry_fix_machine_name(). Worker validation passed whitespace checks, a low-priority CMake configure, and a low-priority registry_internals object build. Required local reviewer quorum approved the source-level fix. (cherry picked from commit 5ab5428) (cherry picked from commit de24a1dd6a3d940534ef8a1e024b673433eaa45c)
Contributor
There was a problem hiding this comment.
No issues found across 6 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Architecture diagram
sequenceDiagram
participant Health as Health Event Loop
participant Pluginsd as Pluginsd Parser
participant Facets as Facets Engine
participant Registry as Registry Internals
participant Labels as RRDLabels Unit Test
Note over Health,Labels: Runtime flow for edge case fixes
Health->>Health: check_if_resumed_from_suspension()
Health->>Health: Get realtime and monotonic clocks
alt Clock moving forward
Health->>Health: Compute realtime_delta and monotonic_delta
alt realtime_delta > monotonic_delta AND realtime_delta - monotonic_delta > monotonic_delta
Health->>Health: Set resumed from suspension flag
end
else Clock rollback / stalled
Health->>Health: Skip detection (no false hibernation)
end
Pluginsd->>Pluginsd: pluginsd_host() processing vnode age
Note over Pluginsd: Clamp seen_seconds_ago to 0 when now <= last_seen
Pluginsd->>Pluginsd: Compute (now > last_seen) ? now - last_seen : 0
alt seen_seconds_ago >= stale_after_seconds
Pluginsd->>Pluginsd: Clear RRDHOST_OPTION_VIRTUAL_HOST (stale)
end
Facets->>Facets: str_to_facets_hash() with input string
opt Input is NULL or not exact hash length
Facets->>Facets: Return FACETS_HASH_ZERO (reject invalid)
end
Facets->>Facets: facets_reset_key() for key cleanup
Facets->>Facets: Clear current_value.raw, raw_len, b->len, v
Facets->>Facets: Reset transient fields to avoid stale reads
Registry->>Registry: registry_fix_machine_name() trimming trailing spaces
loop While t > s
alt t[-1] == ' '
Registry->>Registry: Decrement t and set to null
else
Registry->>Registry: Break on non-space
end
end
Note over Registry: No pointer before start of string
Labels->>Labels: rrdlabels_unittest_check_pattern_list()
Labels->>Labels: Test =_hostname (false - invalid pattern)
Labels->>Labels: Test _hostname= (true - empty value OK)
Labels-->>Labels: Assert match/no-match as expected
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.




Summary
Summary by cubic
Fixes edge cases in time and string handling to prevent false hibernation/stale states and remove undefined behavior across health, pluginsd, facets, registry, and label parsing. Adds label matcher tests for '=' edge cases.
now <= last_seento prevent false stale transitions on clock rollback.Written for commit 5b682c6. Summary will update on new commits.