Fix integer overflow in pad output shape computation #115456
Open
mohammadmseet-hue wants to merge 3 commits into tensorflow:master
Conversation
mirror_pad::GetPaddedOutputShape() computes each output dimension as
`SizeOfDimension(input, i) + left_pad + right_pad`, where `left_pad`
and `right_pad` are int64_t values that come straight from the
`padding_matrix` tensor (a model-constant on the eager-resize path
gated by IsConstantOrPersistentTensor in Prepare). The int64 sum is
then implicitly narrowed to `int` when stored into
TfLiteIntArray::data[i].
Without bounds checks, a malicious .tflite that contains a MirrorPad
op with large or negative padding values can:
- silently narrow a multi-billion intended dimension into a small or
negative int that gets written into output_size->data[i]
- reach ResizeTensor with the wrapped dimension, allocating an
undersized output buffer
- have Eval (MirrorPadWorkerTask::Run) compute output_size as
NumElements(output_tensor) and then iterate through that count
while indexing via input_dims_num_elements stride math derived
from the original (un-wrapped) input dims, producing a heap-buffer-
overflow write whose size and content are controlled by the model
Notably, mirror_pad has no equivalent of pad.cc's CheckPaddingOverflow
helper — there is no upstream bounds check on `left_pad` / `right_pad`
at all.
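The silent narrowing is easy to reproduce outside TFLite. The helper below is purely illustrative (the name `NarrowedDim` is ours, not the kernel's): it mimics storing the int64 sum into an `int` slot such as TfLiteIntArray::data[i]. The modular wrap shown relies on the two's-complement conversion of mainstream compilers (implementation-defined before C++20, guaranteed since).

```cpp
#include <cstdint>

// Illustrative only: mimics the unchecked int64 -> int narrowing in
// GetPaddedOutputShape. The int64 sum is assigned into an int slot,
// as into TfLiteIntArray::data[i]; only the low 32 bits survive.
int NarrowedDim(int64_t input_dim, int64_t left_pad, int64_t right_pad) {
  return static_cast<int>(input_dim + left_pad + right_pad);
}
```

With `left_pad = right_pad = 2^31` and an input dim of 2, the intended dimension is 2^32 + 2, but the stored `int` is just 2 on mainstream compilers: a tiny allocation for a multi-billion-element intended output.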
Fix
---
Validate left_pad and right_pad as non-negative, do the addition in
int64_t, and bounds-check the result against
std::numeric_limits<int32_t>::max() before storing into
TfLiteIntArray::data[i]. On any failure, return nullptr — both
existing call sites (Prepare and Eval) already handle a nullptr
unique_ptr by returning kTfLiteError.
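The validated computation can be sketched as a standalone function. `ComputePaddedDim` is a hypothetical helper, not the actual mirror_pad.cc code; `std::nullopt` stands in for the null unique_ptr that the kernel returns and the callers translate to kTfLiteError.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Sketch of the validated per-dimension computation described above.
// Returns std::nullopt where the kernel would return a null unique_ptr.
std::optional<int> ComputePaddedDim(int input_dim, int64_t left_pad,
                                    int64_t right_pad) {
  // Reject negative paddings outright.
  if (left_pad < 0 || right_pad < 0) return std::nullopt;
  // Do the addition in int64_t so it cannot wrap.
  const int64_t sum =
      static_cast<int64_t>(input_dim) + left_pad + right_pad;
  // Bounds-check before narrowing into TfLiteIntArray::data[i].
  if (sum > std::numeric_limits<int32_t>::max()) return std::nullopt;
  return static_cast<int>(sum);
}
```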
Also drop <stdint.h> / <stddef.h> in favor of <cstdint> / <cstddef>
per the style review on PR tensorflow#115031.
This is the same family of fix as PRs tensorflow#115031 (stablehlo_reduce_window),
tensorflow#115452 (stablehlo_pad), tensorflow#115453 (space_to_batch_nd / batch_to_space_nd),
and tensorflow#115454 (tile). Mirror_pad is the closest sibling of stablehlo_pad
and shares the exact same downstream OOB pattern.
pad::ResizeOutputTensor() validates each individual padding value via CheckPaddingOverflow (it bounds them to fit in int32), but does NOT validate that the SUM `input + before_padding + after_padding` fits in int32. Two int32 paddings near INT32_MAX summed with any positive input dim trivially overflow int32 while each individual value stays "valid".

The wrapped sum is stored into output_size->data[idx] (int) and used by ResizeTensor to allocate the output buffer. optimized_ops::Pad in Eval later iterates over the real padding values via op_params.left/right_padding (which come from the same paddings tensor but NOT through the wrapped sum), writing past the under-sized allocation — a heap-buffer-overflow write whose size and content are controlled by the model. A malicious .tflite that contains a Pad op with a paddings tensor whose two int32 values sum to more than INT32_MAX can therefore corrupt heap memory in any TFLite consumer that loads the model.

Fix
---
Compute the per-dimension output size in int64 and bounds-check the result against INT32_MAX before storing into TfLiteIntArray::data[]. On overflow, free the partially-built output_size, log a kernel error, and return kTfLiteError before ResizeTensor.

This is the same family of fix as PRs tensorflow#115031 (stablehlo_reduce_window), tensorflow#115452 (stablehlo_pad), tensorflow#115453 (space_to_batch_nd / batch_to_space_nd), tensorflow#115454 (tile), and tensorflow#115455 (mirror_pad). pad.cc has CheckPaddingOverflow as a "looks like a bounds check" gate, but as documented in the previous fixes, per-operand validation is not sufficient when the sum of valid operands can itself overflow.

Also drop <stdint.h> in favor of <cstdint> per the style review on PR tensorflow#115031.
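The summed-in-int64 check can be sketched as follows. `PaddedDimFits` is a hypothetical helper (not the pad.cc source), assuming each padding value has already passed the existing per-value CheckPaddingOverflow gate, so both fit in int32 individually.

```cpp
#include <cstdint>
#include <limits>

// Sketch: each padding may individually fit in int32 while the
// three-way sum still overflows, so the sum itself must be computed
// and checked in int64 before it is narrowed into
// TfLiteIntArray::data[idx].
bool PaddedDimFits(int32_t input_dim, int32_t before, int32_t after,
                   int* out_dim) {
  const int64_t sum = static_cast<int64_t>(input_dim) + before + after;
  if (sum < 0 || sum > std::numeric_limits<int32_t>::max()) return false;
  *out_dim = static_cast<int>(sum);
  return true;
}
```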
mohammadmseet-hue added a commit to mohammadmseet-hue/tensorflow that referenced this pull request on Apr 8, 2026
fill::ResizeOutputImpl<T>() and broadcast_to::ResizeOutputTensor() both
consume an attacker-controlled int32 or int64 shape tensor and assign
each value into TfLiteIntArray::data[i] (int) without validating that
the value fits in int32 — a silent narrowing that can wrap any
multi-billion intended dimension into a small or negative int.
fill.cc
-------
T data = GetTensorData<T>(dims)[i];
if (data < 0) { error } // catches negative
output_shape->data[i] = data; // T -> int silent narrowing
The negativity check is necessary but not sufficient: when T == int64_t a
positive int64 value such as 0x100000001 passes `data < 0` and silently
narrows to a small int when assigned to output_shape->data[i].
broadcast_to.cc
---------------
auto get_shape_data = [op_context](int i) -> int32_t {
  if (op_context->shape->type == kTfLiteInt32) {
    return GetTensorData<int32_t>(op_context->shape)[i];
  } else {
    return GetTensorData<int64_t>(op_context->shape)[i];
  }
};
...
output_shape->data[idx] = get_shape_data(idx);
The lambda forcibly narrows int64 -> int32 in its return type, throwing
away the high bits. There is no negativity check, no upper-bound check,
and no validation between the attacker shape tensor and ResizeTensor.
The chain in both kernels is identical to the bugs already fixed in
PRs tensorflow#115031 / tensorflow#115452 / tensorflow#115453 / tensorflow#115454 / tensorflow#115455 / tensorflow#115456: the
wrapped per-dim values flow into ResizeTensor; the kernel Eval path
later iterates over the un-wrapped intended output element count and
writes past the under-sized backing allocation — a heap-buffer-overflow
write controlled by the model.
Fix
---
In fill.cc, bounds-check `data` against the int32 range explicitly
before assigning into output_shape->data[i].
In broadcast_to.cc, change the return type of `get_shape_data` to
int64_t so the high bits survive, and bounds-check the value against
the int32 range at the assignment site. Both checks log a kernel
error and return kTfLiteError before ResizeTensor is reached.
Also drop <stdint.h> in favor of <cstdint> for both files (C++ only
translation units), per the style review on PR tensorflow#115031.
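A standalone sketch of the broadcast_to side of the fix (the fill.cc check is analogous). `CopyShape` is a hypothetical helper with the tensor machinery replaced by a plain int64 array: the getter's return type is widened to int64_t so the high bits survive, and the assignment site bounds-checks before narrowing.

```cpp
#include <cstdint>
#include <limits>

// Sketch of the corrected pattern: widen the getter to int64_t, then
// reject any value that is negative or exceeds INT32_MAX before it is
// narrowed into the int-typed output dims (TfLiteIntArray::data[]).
bool CopyShape(const int64_t* shape, int rank, int* out_dims) {
  auto get_shape_data = [shape](int i) -> int64_t { return shape[i]; };
  for (int idx = 0; idx < rank; ++idx) {
    const int64_t v = get_shape_data(idx);
    if (v < 0 || v > std::numeric_limits<int32_t>::max()) return false;
    out_dims[idx] = static_cast<int>(v);
  }
  return true;
}
```

Note how the value 0x100000001 from the fill.cc example above now fails the upper-bound check instead of silently narrowing to 1.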
mohammadmseet-hue added a commit to mohammadmseet-hue/tensorflow that referenced this pull request on Apr 8, 2026
ResizeOutputTensor() in strided_slice.cc computes each output dimension as `dim_shape = end - begin`, both int32 values derived from the attacker-controlled begin / end / strides tensors. The subtraction is unchecked: with attacker-chosen begin and end (e.g. begin = -2, end = INT32_MAX) the int32 result silently wraps to INT32_MIN+1 — a same-sign value that is NOT caught by the `(dim_shape < 0) != (stride < 0)` guard. Subsequent division by stride (itself attacker-controlled) propagates the wrap into output_shape_vector, which is then passed to ResizeTensor.

Additionally, the existing TFLITE_CHECK_LT(dim_shape, 0) before the negative-stride division is a release-build no-op (DCHECK) and the division itself can invoke UB if stride == INT32_MIN (because the unsigned absolute value of INT32_MIN cannot be represented in int32).

A malicious .tflite that contains a StridedSlice op with crafted begin / end / stride constant tensors can therefore drive the per-dim output size to a wrapped value, ResizeTensor allocates an undersized output buffer, and the inner StridedSlice loop in Eval iterates over the un-wrapped logical output region and writes past the allocation — a heap-buffer-overflow write whose size and content are controlled by the model.

Fix
---
Promote dim_shape to int64_t before the subtraction so attacker int32 end / begin values cannot wrap. After the division, bounds-check the result against the int32 range used by TfLiteIntArray::data[] and return kTfLiteError on overflow before ResizeTensor. Reject stride == INT32_MIN explicitly via TF_LITE_ENSURE_MSG to avoid UB in the negate-and-divide step below. Add parentheses to the existing `(dim_shape < 0) != (stride < 0)` guard for clarity (the unparenthesised form is correct only by operator-precedence accident). Drop <stdint.h> in favor of <cstdint> per the style review on PR tensorflow#115031.
This is the same family of fix as PRs tensorflow#115031 / tensorflow#115452 / tensorflow#115453 / tensorflow#115454 / tensorflow#115455 / tensorflow#115456 / tensorflow#115457 — same bug class, same downstream narrowing into TfLiteIntArray::data[], same heap-OOB-write outcome, same fix template (validate + early-return before ResizeTensor).
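The promoted computation can be sketched standalone. `SliceDimSize` is a hypothetical helper: the zero-stride rejection and the ceil-divide here are simplified stand-ins for the kernel's own handling, and `std::nullopt` marks where the kernel would return kTfLiteError.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Sketch: promote the subtraction to int64 so attacker int32
// begin/end cannot wrap, reject the INT32_MIN stride, and bounds-check
// the final dimension before it is narrowed into
// TfLiteIntArray::data[].
std::optional<int> SliceDimSize(int32_t begin, int32_t end,
                                int32_t stride) {
  if (stride == 0) return std::nullopt;
  // Rejecting INT32_MIN up front avoids UB when the stride is later
  // negated for the negative-stride path.
  if (stride == std::numeric_limits<int32_t>::min()) return std::nullopt;
  // int64 subtraction of two int32 values cannot wrap.
  const int64_t dim_shape = static_cast<int64_t>(end) - begin;
  if ((dim_shape < 0) != (stride < 0)) return 0;  // empty slice
  // Ceil division of same-sign operands.
  const int64_t dim =
      dim_shape / stride + (dim_shape % stride != 0 ? 1 : 0);
  if (dim < 0 || dim > std::numeric_limits<int32_t>::max())
    return std::nullopt;
  return static_cast<int>(dim);
}
```

The begin = -2, end = INT32_MAX example from the description yields an int64 dim_shape of 2^31 + 1, which now trips the bounds check instead of wrapping.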
mohammadmseet-hue added a commit to mohammadmseet-hue/tensorflow that referenced this pull request on Apr 8, 2026
gather_nd::Prepare reads `indices_nd` from
`indices->dims->data[indices_rank - 1]`, which is an attacker-controlled
int32 value coming from the .tflite model's `indices` tensor shape. The
existing bound check only validated the upper bound:
if (indices_nd > params_rank) { error }
A *negative* value passes this check (negative is not > positive) and
propagates into the rest of Prepare:
// output_rank wraps to a huge positive int because subtracting a
// negative is the same as adding a positive
const int output_rank = indices_rank + params_rank - indices_nd - 1;
TfLiteIntArray* output_shape = TfLiteIntArrayCreate(output_rank);
...
for (int i = indices_nd; i < params_rank; ++i) {
  output_shape->data[output_index++] = params->dims->data[i];  // OOB read
}
Three distinct memory-safety primitives result:
1. The loop iterates `for (int i = indices_nd; i < params_rank; ++i)`
with i starting at a negative int32 and ending at a small positive
value. Each iteration reads `params->dims->data[i]` for negative
i — an OOB read of params->dims->data on the order of ~2^31 entries.
2. `output_index` runs past `output_rank` and writes into
`output_shape->data[]` past the just-allocated TfLiteIntArray
storage, corrupting heap memory adjacent to the allocation.
3. The wrapped `output_rank` also produces a garbage TfLiteIntArray
shape, which `ResizeTensor` then uses to size the output tensor;
the kernel Eval path later walks `reference_ops::GatherNd` with the
un-wrapped intended sizes, producing a heap-buffer-overflow write
whose length and content are controlled by the model.
A malicious .tflite that contains a GatherNd op whose `indices` tensor
has its innermost dimension set to a negative int32 (e.g. via a crafted
flatbuffer) is enough to trigger the chain.
Fix
---
Validate `indices_nd >= 0` in addition to the existing
`indices_nd > params_rank` check, and add a defensive
`TF_LITE_ENSURE(context, output_rank >= 0)` after the subtraction so
any future regression that introduces a wrap is caught before the
TfLiteIntArrayCreate / loop combination.
Also drop <stdint.h> in favor of <cstdint> per the style review on
PR tensorflow#115031.
This is the same family of fix as PRs tensorflow#115031 / tensorflow#115452 / tensorflow#115453 /
tensorflow#115454 / tensorflow#115455 / tensorflow#115456 / tensorflow#115457 / tensorflow#115458, with the addition
that gather_nd carries a *second* OOB primitive (the loop reading
params->dims->data[negative]).
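The rank validation described in the Fix can be sketched as a standalone predicate. `ValidGatherNdRanks` is a hypothetical helper; the real code expresses the same conditions via TF_LITE_ENSURE macros on the TfLiteContext.

```cpp
// Sketch of the gather_nd rank checks: the new lower bound, the
// pre-existing upper bound, and the defensive guard on the derived
// output rank.
bool ValidGatherNdRanks(int indices_rank, int params_rank,
                        int indices_nd) {
  if (indices_nd < 0) return false;            // new: reject negatives
  if (indices_nd > params_rank) return false;  // existing upper bound
  const int output_rank = indices_rank + params_rank - indices_nd - 1;
  return output_rank >= 0;                     // defensive wrap guard
}
```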
mohammadmseet-hue added a commit to mohammadmseet-hue/tensorflow that referenced this pull request on Apr 8, 2026
scatter_nd::ResizeOutputTensor and sparse_to_dense::Resize both copy
the contents of an attacker-controlled `shape` / `output_shape` input
tensor directly into TfLiteIntArray::data[i] (`int`) without
validating that each value is non-negative or fits in the int32 range
used by the TfLiteIntArray storage.
scatter_nd
----------
for (int i = 0; i < shape_rank; i++) {
  output_shape->data[i] = shape_data[i];
}
`shape_data` is `IndicesT*` where IndicesT can be int32 or int64. For
int64 a positive value > INT32_MAX silently narrows; negatives pass
unchecked. The wrapped per-dim value flows into ResizeTensor and the
kernel Eval (reference_ops::ScatterNd) later writes through indices
that are not bounds-checked against the wrapped output dims —
heap-buffer-overflow write whose length and content are controlled by
the model.
sparse_to_dense
---------------
for (int i = 0; i < output_dimensions; ++i) {
  output_shape_array->data[i] = GetTensorData<T>(output_shape)[i];
}
Same shape, same template-T narrowing. T = int64 silently truncates;
negatives pass through. reference_ops::SparseToDense then writes at
indices computed from sparse `indices` (also unchecked) that are
nominally within the un-truncated output region — heap OOB write.
Fix
---
In both files, accumulate each dim into an int64 temporary and reject
values that are negative or exceed std::numeric_limits<int32_t>::max()
before assigning into TfLiteIntArray::data[i]. On any failure, free
the partially-built TfLiteIntArray, log a kernel error, and return
kTfLiteError before ResizeTensor.
Drop <stdint.h> in favor of <cstdint> for both files (C++ only TUs)
per the style review on PR tensorflow#115031.
This is the same family of fix as PRs tensorflow#115031 / tensorflow#115452 / tensorflow#115453 /
tensorflow#115454 / tensorflow#115455 / tensorflow#115456 / tensorflow#115457 / tensorflow#115458 / tensorflow#115459.
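The per-dimension check shared by both kernels can be sketched with a template parameter mirroring the kernels' IndicesT / output-shape T. `DimValueOk` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <limits>

// Sketch: widen each shape value to int64 (a no-op for int32 T, and
// preserving for int64 T), then reject negatives and values above
// INT32_MAX before assigning into the int-typed TfLiteIntArray slot.
template <typename T>
bool DimValueOk(T value, int* out) {
  const int64_t v = static_cast<int64_t>(value);
  if (v < 0 || v > std::numeric_limits<int32_t>::max()) return false;
  *out = static_cast<int>(v);
  return true;
}
```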
mohammadmseet-hue added a commit to mohammadmseet-hue/tensorflow that referenced this pull request on Apr 8, 2026
…ites
slice::CalculateOutputShapeVector<T> reads `begin` and `size` values
from attacker-controlled int32 / int64 input tensors. The existing
validation has three independent gaps:
1. begin[idx] is never validated to be >= 0 or <= input_dim. With a
negative begin and size_value == -1, the code computes
`size_value = input_dim - begin` which is larger than input_dim,
producing an output shape that asks the kernel Eval path to read
past the end of input. With a begin > input_dim the same
computation underflows or overshoots.
2. The `else` branch checks `input_dim < begin + size` in template-T
arithmetic, where T can be int64. With begin = INT64_MAX and
size = 1, the addition signed-overflows to INT64_MIN and the
check `input_dim < INT64_MIN` is false → bypass. The unchecked
begin then propagates to GetBeginAndSizeVectors and into
reference_ops::Slice as `op_params.begin[i]`, where it is used
to compute `input_offset + begin * stride` for the read pointer.
Result: OOB read on the input buffer.
3. The final `static_cast<int>(size_value)` silently truncates int64
to int. A large positive int64 size value becomes a small or
negative int written into the output_shape_vector and on into
ResizeTensor, producing an undersized buffer that the kernel
later overruns.
A malicious .tflite with crafted begin / size constant tensors can
therefore drive any of these into a heap-buffer-overflow read or
write, depending on which branch is taken.
Fix
---
* Validate begin in [0, input_dim] before either branch.
* Compute `begin + size` in int64 in the else branch so the
comparison cannot wrap.
* Bounds-check the final size_value against the int range used by
output_shape_vector before the static_cast.
Drop <stdint.h> in favor of <cstdint> per the style review on
PR tensorflow#115031.
This is the same family of fix as PRs tensorflow#115031, tensorflow#115452, tensorflow#115453,
tensorflow#115455, tensorflow#115456, tensorflow#115457, tensorflow#115458, tensorflow#115459, tensorflow#115460. Twelve tflite
kernels in the CheckedInt incomplete-pattern hunt now share the same
bug class.
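The three slice checks can be sketched together in one standalone function. `SliceOutputDim` is a hypothetical helper: it takes int64 inputs to cover both tensor types, uses the kernel's size == -1 "to the end" convention, and returns `std::nullopt` where the kernel would return kTfLiteError.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Sketch of the three validations: begin range, wrap-free
// begin + size comparison, and a bounds check before the narrowing
// cast into the int-typed output shape vector.
std::optional<int> SliceOutputDim(int64_t input_dim, int64_t begin,
                                  int64_t size) {
  // Fix 1: begin must lie in [0, input_dim].
  if (begin < 0 || begin > input_dim) return std::nullopt;
  int64_t size_value = size;
  if (size_value == -1) {
    size_value = input_dim - begin;  // "to the end" convention
  } else {
    // Fix 2: rearranged as begin > input_dim - size so the check
    // cannot signed-overflow even for values near INT64_MAX.
    if (size_value < 0 || begin > input_dim - size_value)
      return std::nullopt;
  }
  // Fix 3: bounds-check before the static_cast<int>.
  if (size_value > std::numeric_limits<int32_t>::max())
    return std::nullopt;
  return static_cast<int>(size_value);
}
```

The rearranged comparison is the standard overflow-safe form: since size_value is known non-negative, `input_dim - size_value` cannot wrap, unlike `begin + size` in the original else branch.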
Summary

pad::ResizeOutputTensor() validates each individual padding value via the first-pass loop (it bounds them to fit in int), but does NOT validate that the SUM input + before_padding + after_padding fits in int32. Two int32 paddings near INT32_MAX summed with any positive input dim trivially overflow int32 while each individual value stays "valid".

The chain (master HEAD)

The wrapped sum is stored into output_size->data[idx] (int) and used by ResizeTensor to allocate the output buffer. optimized_ops::Pad in Eval later iterates over the real padding values via op_params.left/right_padding (which come from the same paddings tensor but NOT through the wrapped sum), writing past the under-sized allocation — a heap-buffer-overflow write whose size and content are controlled by the model.

A malicious .tflite that contains a Pad op with a paddings tensor whose two int32 values (each individually valid) sum to more than INT32_MAX can therefore corrupt heap memory in any TFLite consumer that loads the model.

Fix

Compute the per-dimension output size in int64_t and bounds-check the result against INT32_MAX before storing into TfLiteIntArray::data[]. On overflow, free the partially-built output_size, log a kernel error, and return kTfLiteError before ResizeTensor.

Also drop <stdint.h> in favor of <cstdint> per the style review on PR #115031.

Relationship to other PRs in this series

This is the same family of fix as PRs:
pad.cc has a per-operand validation gate that looks like a bounds check but, as documented in the previous fixes, per-operand validation is not sufficient when the sum of valid operands can itself overflow.

Files changed

tensorflow/lite/kernels/pad.cc

Test plan

pad_test tests pass against the patched kernel.