
Fix integer overflow in pad output shape computation#115456

Open
mohammadmseet-hue wants to merge 3 commits into tensorflow:master from mohammadmseet-hue:fix-pad-overflow

Conversation

@mohammadmseet-hue

Summary

pad::ResizeOutputTensor() validates each individual padding value in its first-pass loop (each must be non-negative after narrowing to int), but does NOT validate that the sum `input + before_padding + after_padding` fits in int32. Two int32 paddings near INT32_MAX, summed with any positive input dim, trivially overflow int32 while each individual value stays "valid".

The chain (master HEAD)

template <typename PaddingIntegerType>
TfLiteStatus ResizeOutputTensor(TfLiteContext* context, PadContext* op_context) {
  ...
  // First pass: validate each individual padding value is non-negative.
  for (int idx = 0; idx < op_context->dims; ++idx) {
    int before_padding = static_cast<int>(*paddings_data++);
    int after_padding  = static_cast<int>(*paddings_data++);
    TF_LITE_ENSURE_MSG(context, (before_padding >= 0 && after_padding >= 0),
                       "Pad value has to be greater than equal to 0.");
  }
  // Second pass: compute the output dimensions.
  paddings_data = GetTensorData<PaddingIntegerType>(op_context->paddings);
  for (int idx = 0; idx < op_context->dims; ++idx) {
    int before_padding = static_cast<int>(*paddings_data++);
    int after_padding  = static_cast<int>(*paddings_data++);
    output_size->data[idx] =
        (input_size->data[idx] + before_padding + after_padding);   // unchecked int sum
  }
  return context->ResizeTensor(context, op_context->output, output_size);
}

The wrapped sum is stored into output_size->data[idx] (int) and used by ResizeTensor to allocate the output buffer. optimized_ops::Pad in Eval later iterates over the real padding values via op_params.left/right_padding (which come from the same paddings tensor but NOT through the wrapped sum), writing past the under-sized allocation — a heap-buffer-overflow write whose size and content are controlled by the model.

A malicious .tflite that contains a Pad op with a paddings tensor whose two int32 values (each individually valid) sum to more than INT32_MAX can therefore corrupt heap memory in any TFLite consumer that loads the model.
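The wrap is easy to reproduce in isolation. Below is a minimal sketch of the checked computation the fix introduces; `ComputePaddedDim` is a hypothetical stand-in for the in-kernel logic, not an actual TFLite helper:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Sketch of the fixed per-dimension computation: widen to int64_t so
// the addition cannot wrap, then reject any result outside the int32
// range that TfLiteIntArray::data[] can hold.
bool ComputePaddedDim(int32_t input, int32_t before, int32_t after,
                      int32_t* out) {
  const int64_t dim = static_cast<int64_t>(input) + before + after;
  if (dim < 0 || dim > std::numeric_limits<int32_t>::max()) {
    return false;  // would overflow the int32 output shape entry
  }
  *out = static_cast<int32_t>(dim);
  return true;
}
```

With two paddings of INT32_MAX, each individually non-negative, the 64-bit sum exceeds INT32_MAX and the check fires, whereas the original 32-bit addition would have wrapped silently.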

Fix

Compute the per-dimension output size in int64_t and bounds-check the result against INT32_MAX before storing into TfLiteIntArray::data[]. On overflow, free the partially-built output_size, log a kernel error, and return kTfLiteError before ResizeTensor.

-    output_size->data[idx] =
-        (input_size->data[idx] + before_padding + after_padding);
+    const int64_t dim = static_cast<int64_t>(input_size->data[idx]) +
+                        before_padding + after_padding;
+    if (dim < 0 || dim > std::numeric_limits<int32_t>::max()) {
+      TfLiteIntArrayFree(output_size);
+      TF_LITE_KERNEL_LOG(
+          context,
+          "Pad: integer overflow computing input + paddings for dim %d", idx);
+      return kTfLiteError;
+    }
+    output_size->data[idx] = static_cast<int>(dim);

Also drop <stdint.h> in favor of <cstdint> per the style review on PR #115031.

Relationship to other PRs in this series

This is the same family of fix as the earlier PRs in this series.

pad.cc has a per-operand validation gate that looks like a bounds check, but, as documented in the previous fixes, per-operand validation is not sufficient when the sum of valid operands can itself overflow.

Files changed

File Lines
tensorflow/lite/kernels/pad.cc +22 / -4

Test plan

  • No public API change.
  • No new dependencies.
  • Existing pad_test tests pass against the patched kernel.
  • Happy to add regression tests covering the sum-overflow case on request.

mirror_pad::GetPaddedOutputShape() computes each output dimension as
`SizeOfDimension(input, i) + left_pad + right_pad`, where `left_pad`
and `right_pad` are int64_t values that come straight from the
`padding_matrix` tensor (a model-constant on the eager-resize path
gated by IsConstantOrPersistentTensor in Prepare). The int64 sum is
then implicitly narrowed to `int` when stored into
TfLiteIntArray::data[i].

Without bounds checks, a malicious .tflite that contains a MirrorPad
op with large or negative padding values can:

  - silently narrow a multi-billion intended dimension into a small or
    negative int that gets written into output_size->data[i]
  - reach ResizeTensor with the wrapped dimension, allocating an
    undersized output buffer
  - have Eval (MirrorPadWorkerTask::Run) compute output_size as
    NumElements(output_tensor) and then iterate through that count
    while indexing via input_dims_num_elements stride math derived
    from the original (un-wrapped) input dims, producing a heap-buffer-
    overflow write whose size and content are controlled by the model

Notably, mirror_pad has no equivalent of pad.cc's CheckPaddingOverflow
helper — there is no upstream bounds check on `left_pad` / `right_pad`
at all.

Fix
---

Validate left_pad and right_pad as non-negative, do the addition in
int64_t, and bounds-check the result against
std::numeric_limits<int32_t>::max() before storing into
TfLiteIntArray::data[i]. On any failure, return nullptr — both
existing call sites (Prepare and Eval) already handle a nullptr
unique_ptr by returning kTfLiteError.
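As a minimal sketch of that validation logic, assuming a std::vector stand-in for TfLiteIntArray (the function name and signature are illustrative, not the actual mirror_pad API; bounding each pad before summing is an extra safety step for the int64 inputs):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <memory>
#include <vector>

// Sketch of the mirror_pad fix: reject negative or oversized pads, sum
// in int64_t, bound the result against int32, and signal failure by
// returning nullptr as the real call sites already expect.
std::unique_ptr<std::vector<int>> PaddedShapeOrNull(
    const std::vector<int>& input_dims,
    const std::vector<int64_t>& left_pad,
    const std::vector<int64_t>& right_pad) {
  const int64_t max32 = std::numeric_limits<int32_t>::max();
  auto shape = std::make_unique<std::vector<int>>();
  for (size_t i = 0; i < input_dims.size(); ++i) {
    if (left_pad[i] < 0 || left_pad[i] > max32 ||
        right_pad[i] < 0 || right_pad[i] > max32) {
      return nullptr;  // invalid individual pad
    }
    const int64_t padded = input_dims[i] + left_pad[i] + right_pad[i];
    if (padded > max32) return nullptr;  // sum would not fit in int32
    shape->push_back(static_cast<int>(padded));
  }
  return shape;
}
```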

Also drop <stdint.h> / <stddef.h> in favor of <cstdint> / <cstddef>
per the style review on PR tensorflow#115031.

This is the same family of fix as PRs tensorflow#115031 (stablehlo_reduce_window),
tensorflow#115452 (stablehlo_pad), tensorflow#115453 (space_to_batch_nd / batch_to_space_nd),
and tensorflow#115454 (tile). Mirror_pad is the closest sibling of stablehlo_pad
and shares the exact same downstream OOB pattern.
pad::ResizeOutputTensor() validates each individual padding value via
CheckPaddingOverflow (it bounds them to fit in int32), but does NOT
validate that the SUM `input + before_padding + after_padding` fits in
int32. Two int32 paddings near INT32_MAX summed with any positive
input dim trivially overflow int32 while each individual value stays
"valid".

The wrapped sum is stored into output_size->data[idx] (int) and used by
ResizeTensor to allocate the output buffer. optimized_ops::Pad in Eval
later iterates over the real padding values via op_params.left/right_padding
(which come from the same paddings tensor but NOT through the wrapped
sum), writing past the under-sized allocation — a heap-buffer-overflow
write whose size and content are controlled by the model.

A malicious .tflite that contains a Pad op with a paddings tensor whose
two int32 values sum to more than INT32_MAX can therefore corrupt heap
memory in any TFLite consumer that loads the model.

Fix
---

Compute the per-dimension output size in int64 and bounds-check the
result against INT32_MAX before storing into TfLiteIntArray::data[].
On overflow, free the partially-built output_size, log a kernel error,
and return kTfLiteError before ResizeTensor.

This is the same family of fix as PRs tensorflow#115031 (stablehlo_reduce_window),
tensorflow#115452 (stablehlo_pad), tensorflow#115453 (space_to_batch_nd / batch_to_space_nd), tensorflow#115454
(tile), and tensorflow#115455 (mirror_pad). pad.cc has CheckPaddingOverflow as a
"looks like a bounds check" gate, but as documented in the previous
fixes, per-operand validation is not sufficient when the sum of valid
operands can itself overflow.

Also drop <stdint.h> in favor of <cstdint> per the style review on
PR tensorflow#115031.
@google-ml-butler google-ml-butler bot added the size:M CL Change Size: Medium label Apr 8, 2026
mohammadmseet-hue added a commit to mohammadmseet-hue/tensorflow that referenced this pull request Apr 8, 2026
fill::ResizeOutputImpl<T>() and broadcast_to::ResizeOutputTensor() both
consume an attacker-controlled int32 or int64 shape tensor and assign
each value into TfLiteIntArray::data[i] (int) without validating that
the value fits in int32 — a silent narrowing that can wrap any
multi-billion intended dimension into a small or negative int.

fill.cc
-------

  T data = GetTensorData<T>(dims)[i];
  if (data < 0) { error }                  // catches negative
  output_shape->data[i] = data;             // T -> int silent narrowing

The negativity check is necessary but not sufficient: when T == int64_t a
positive int64 value such as 0x100000001 passes `data < 0` and silently
narrows to a small int when assigned to output_shape->data[i].
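The insufficiency of the sign check alone can be shown directly; `FitsInt32` here is a hypothetical helper illustrating the added range check, not the actual fill.cc code:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Sketch of the check fill.cc needs in addition to `data < 0`: a
// positive int64 value can still be outside the int32 range and would
// silently narrow when assigned to output_shape->data[i].
bool FitsInt32(int64_t v) {
  return v >= 0 && v <= std::numeric_limits<int32_t>::max();
}
```

For example, `0x100000001` is positive (so it passes `data < 0`) yet truncates to `1` under `static_cast<int32_t>`, which is exactly the silent narrowing the check must catch.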

broadcast_to.cc
---------------

  auto get_shape_data = [op_context](int i) -> int32_t {
    if (op_context->shape->type == kTfLiteInt32) {
      return GetTensorData<int32_t>(op_context->shape)[i];
    } else {
      return GetTensorData<int64_t>(op_context->shape)[i];
    }
  };
  ...
  output_shape->data[idx] = get_shape_data(idx);

The lambda forcibly narrows int64 -> int32 in its return type, throwing
away the high bits. There is no negativity check, no upper-bound check,
and no validation between the attacker shape tensor and ResizeTensor.

The chain in both kernels is identical to the bugs already fixed in
PRs tensorflow#115031 / tensorflow#115452 / tensorflow#115453 / tensorflow#115454 / tensorflow#115455 / tensorflow#115456: the
wrapped per-dim values flow into ResizeTensor; the kernel Eval path
later iterates over the un-wrapped intended output element count and
writes past the under-sized backing allocation — a heap-buffer-overflow
write controlled by the model.

Fix
---

In fill.cc, bounds-check `data` against the int32 range explicitly
before assigning into output_shape->data[i].

In broadcast_to.cc, change the return type of `get_shape_data` to
int64_t so the high bits survive, and bounds-check the value against
the int32 range at the assignment site. Both checks log a kernel
error and return kTfLiteError before ResizeTensor is reached.
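A sketch of that broadcast_to change, with `ShapeTensor` and `CopyShape` as simplified stand-ins for the TfLite tensor accessors (the lambda's widened return type is the essential part):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Simplified stand-in for the int32/int64 shape input tensor.
struct ShapeTensor {
  bool is_int32;
  std::vector<int32_t> i32;
  std::vector<int64_t> i64;
};

// Sketch of the fix: return int64_t from the accessor so the high bits
// survive, then range-check at the assignment site.
bool CopyShape(const ShapeTensor& shape, int rank, std::vector<int>* out) {
  auto get_shape_data = [&shape](int i) -> int64_t {  // was int32_t
    return shape.is_int32 ? static_cast<int64_t>(shape.i32[i])
                          : shape.i64[i];
  };
  for (int idx = 0; idx < rank; ++idx) {
    const int64_t dim = get_shape_data(idx);
    if (dim < 0 || dim > std::numeric_limits<int32_t>::max()) {
      return false;  // reject instead of silently narrowing
    }
    out->push_back(static_cast<int>(dim));
  }
  return true;
}
```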

Also drop <stdint.h> in favor of <cstdint> for both files (C++ only
translation units), per the style review on PR tensorflow#115031.
mohammadmseet-hue added a commit to mohammadmseet-hue/tensorflow that referenced this pull request Apr 8, 2026
ResizeOutputTensor() in strided_slice.cc computes each output dimension
as `dim_shape = end - begin`, both int32 values derived from the
attacker-controlled begin / end / strides tensors. The subtraction is
unchecked: with attacker-chosen begin and end (e.g. begin = -2,
end = INT32_MAX) the int32 result silently wraps to INT32_MIN+1 — a
same-sign value that is NOT caught by the
`(dim_shape < 0) != (stride < 0)` guard. Subsequent division by stride
(itself attacker-controlled) propagates the wrap into output_shape_vector,
which is then passed to ResizeTensor.

Additionally, the existing TFLITE_CHECK_LT(dim_shape, 0) before the
negative-stride division is a release-build no-op (DCHECK) and the
division itself can invoke UB if stride == INT32_MIN (because the
unsigned absolute value of INT32_MIN cannot be represented in int32).

A malicious .tflite that contains a StridedSlice op with crafted
begin / end / stride constant tensors can therefore drive the per-dim
output size to a wrapped value, ResizeTensor allocates an undersized
output buffer, and the inner StridedSlice loop in Eval iterates over
the un-wrapped logical output region and writes past the allocation —
a heap-buffer-overflow write whose size and content are controlled by
the model.

Fix
---

Promote dim_shape to int64_t before the subtraction so attacker int32
end / begin values cannot wrap. After the division, bounds-check the
result against the int32 range used by TfLiteIntArray::data[] and
return kTfLiteError on overflow before ResizeTensor.

Reject stride == INT32_MIN explicitly via TF_LITE_ENSURE_MSG to avoid
UB in the negate-and-divide step below.

Add parentheses to the existing `(dim_shape < 0) != (stride < 0)`
guard for clarity (the unparenthesised form is correct only by
operator precedence accident).
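The combined checks can be sketched for a single dimension as follows; `SliceDim` is a hypothetical condensation of the strided_slice logic, not the actual kernel function:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Sketch: reject stride == 0 and stride == INT32_MIN (whose absolute
// value is unrepresentable in int32), subtract end - begin in int64_t
// so it cannot wrap, and bound the final dim to the int32 range.
bool SliceDim(int32_t begin, int32_t end, int32_t stride, int32_t* out) {
  if (stride == 0 || stride == std::numeric_limits<int32_t>::min()) {
    return false;
  }
  const int64_t dim_shape = static_cast<int64_t>(end) - begin;
  if ((dim_shape < 0) != (stride < 0)) {  // mismatched signs: empty slice
    *out = 0;
    return true;
  }
  const int64_t len = dim_shape < 0 ? -dim_shape : dim_shape;
  const int64_t s = stride < 0 ? -static_cast<int64_t>(stride) : stride;
  const int64_t dim = (len + s - 1) / s;  // ceiling division
  if (dim > std::numeric_limits<int32_t>::max()) return false;
  *out = static_cast<int32_t>(dim);
  return true;
}
```

With `begin = -2`, `end = INT32_MAX`, `stride = 1`, the 64-bit subtraction yields INT32_MAX + 2, which the bound check rejects instead of wrapping to a same-sign garbage value.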

Drop <stdint.h> in favor of <cstdint> per the style review on
PR tensorflow#115031.

This is the same family of fix as PRs tensorflow#115031 / tensorflow#115452 / tensorflow#115453 /
tensorflow#115454 / tensorflow#115455 / tensorflow#115456 / tensorflow#115457 — same bug class, same downstream
narrowing into TfLiteIntArray::data[], same heap-OOB-write outcome,
same fix template (validate + early-return before ResizeTensor).
mohammadmseet-hue added a commit to mohammadmseet-hue/tensorflow that referenced this pull request Apr 8, 2026
gather_nd::Prepare reads `indices_nd` from `indices->dims->data[
indices_rank - 1]`, which is an attacker-controlled int32 value coming
from the .tflite model's `indices` tensor shape. The existing bound
check only validated the upper bound:

  if (indices_nd > params_rank) { error }

A *negative* value passes this check (negative is not > positive) and
propagates into the rest of Prepare:

  // output_rank wraps to a huge positive int because subtracting a
  // negative is the same as adding a positive
  const int output_rank = indices_rank + params_rank - indices_nd - 1;
  TfLiteIntArray* output_shape = TfLiteIntArrayCreate(output_rank);
  ...
  for (int i = indices_nd; i < params_rank; ++i) {
    output_shape->data[output_index++] = params->dims->data[i];   // OOB read
  }

Two distinct memory-safety primitives result:

1. The loop iterates `for (int i = indices_nd; i < params_rank; ++i)`
   with i starting at a negative int32 and ending at a small positive
   value. Each iteration reads `params->dims->data[i]` for negative
   i — an OOB read of params->dims->data on the order of ~2^31 entries.

2. `output_index` runs past `output_rank` and writes into
   `output_shape->data[]` past the just-allocated TfLiteIntArray
   storage, corrupting heap memory adjacent to the allocation.

3. The wrapped `output_rank` also produces a garbage TfLiteIntArray
   shape, which `ResizeTensor` then uses to size the output tensor;
   the kernel Eval path later walks `reference_ops::GatherNd` with the
   un-wrapped intended sizes, producing a heap-buffer-overflow write
   whose length and content are controlled by the model.

A malicious .tflite that contains a GatherNd op whose `indices` tensor
has its innermost dimension set to a negative int32 (e.g. via a crafted
flatbuffer) is enough to trigger the chain.

Fix
---

Validate `indices_nd >= 0` in addition to the existing
`indices_nd > params_rank` check, and add a defensive
`TF_LITE_ENSURE(context, output_rank >= 0)` after the subtraction so
any future regression that introduces a wrap is caught before the
TfLiteIntArrayCreate / loop combination.
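The two added checks can be sketched as one predicate; `ValidGatherNdRanks` is an illustrative condensation, not the actual Prepare code:

```cpp
#include <cassert>

// Sketch of the gather_nd validation: indices_nd must lie in
// [0, params_rank] before it is used in the output_rank computation,
// and output_rank itself is defensively checked afterwards.
bool ValidGatherNdRanks(int indices_rank, int params_rank, int indices_nd,
                        int* output_rank) {
  if (indices_nd < 0 || indices_nd > params_rank) return false;
  *output_rank = indices_rank + params_rank - indices_nd - 1;
  return *output_rank >= 0;
}
```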

Also drop <stdint.h> in favor of <cstdint> per the style review on
PR tensorflow#115031.

This is the same family of fix as PRs tensorflow#115031 / tensorflow#115452 / tensorflow#115453 /
tensorflow#115454 / tensorflow#115455 / tensorflow#115456 / tensorflow#115457 / tensorflow#115458, with the addition
that gather_nd carries a *second* OOB primitive (the loop reading
params->dims->data[negative]).
mohammadmseet-hue added a commit to mohammadmseet-hue/tensorflow that referenced this pull request Apr 8, 2026
scatter_nd::ResizeOutputTensor and sparse_to_dense::Resize both copy
the contents of an attacker-controlled `shape` / `output_shape` input
tensor directly into TfLiteIntArray::data[i] (`int`) without
validating that each value is non-negative or fits in the int32 range
used by the TfLiteIntArray storage.

scatter_nd
----------

  for (int i = 0; i < shape_rank; i++) {
    output_shape->data[i] = shape_data[i];
  }

`shape_data` is `IndicesT*` where IndicesT can be int32 or int64. For
int64 a positive value > INT32_MAX silently narrows; negatives pass
unchecked. The wrapped per-dim value flows into ResizeTensor and the
kernel Eval (reference_ops::ScatterNd) later writes through indices
that are not bounds-checked against the wrapped output dims —
heap-buffer-overflow write whose length and content are controlled by
the model.

sparse_to_dense
---------------

  for (int i = 0; i < output_dimensions; ++i) {
    output_shape_array->data[i] = GetTensorData<T>(output_shape)[i];
  }

Same shape, same template-T narrowing. T = int64 silently truncates;
negatives pass through. reference_ops::SparseToDense then writes at
indices computed from sparse `indices` (also unchecked) that are
nominally within the un-truncated output region — heap OOB write.

Fix
---

In both files, accumulate each dim into an int64 temporary and reject
values that are negative or exceed std::numeric_limits<int32_t>::max()
before assigning into TfLiteIntArray::data[i]. On any failure, free
the partially-built TfLiteIntArray, log a kernel error, and return
kTfLiteError before ResizeTensor.
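The shared pattern for both kernels can be sketched as a single checked copy loop, templated over the shape element type as in the real kernels (`CheckedCopyShape` and the std::vector containers are illustrative stand-ins):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Sketch of the fix shared by scatter_nd and sparse_to_dense: widen
// each shape value to int64_t and reject anything negative or larger
// than int32 before it reaches the int output shape array.
template <typename T>
bool CheckedCopyShape(const std::vector<T>& shape_data,
                      std::vector<int>* out) {
  for (const T v : shape_data) {
    const int64_t dim = static_cast<int64_t>(v);
    if (dim < 0 || dim > std::numeric_limits<int32_t>::max()) {
      return false;  // would narrow silently or is negative
    }
    out->push_back(static_cast<int>(dim));
  }
  return true;
}
```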

Drop <stdint.h> in favor of <cstdint> for both files (C++ only TUs)
per the style review on PR tensorflow#115031.

This is the same family of fix as PRs tensorflow#115031 / tensorflow#115452 / tensorflow#115453 /
tensorflow#115454 / tensorflow#115455 / tensorflow#115456 / tensorflow#115457 / tensorflow#115458 / tensorflow#115459.
mohammadmseet-hue added a commit to mohammadmseet-hue/tensorflow that referenced this pull request Apr 8, 2026

slice::CalculateOutputShapeVector<T> reads `begin` and `size` values
from attacker-controlled int32 / int64 input tensors. The existing
validation has three independent gaps:

  1. begin[idx] is never validated to be >= 0 or <= input_dim. With a
     negative begin and size_value == -1, the code computes
     `size_value = input_dim - begin` which is larger than input_dim,
     producing an output shape that asks the kernel Eval path to read
     past the end of input. With a begin > input_dim the same
     computation underflows or overshoots.

  2. The `else` branch checks `input_dim < begin + size` in template-T
     arithmetic, where T can be int64. With begin = INT64_MAX and
     size = 1, the addition signed-overflows to INT64_MIN and the
     check `input_dim < INT64_MIN` is false → bypass. The unchecked
     begin then propagates to GetBeginAndSizeVectors and into
     reference_ops::Slice as `op_params.begin[i]`, where it is used
     to compute `input_offset + begin * stride` for the read pointer.
     Result: OOB read on the input buffer.

  3. The final `static_cast<int>(size_value)` silently truncates int64
     to int. A large positive int64 size value becomes a small or
     negative int written into the output_shape_vector and on into
     ResizeTensor, producing an undersized buffer that the kernel
     later overruns.

A malicious .tflite with crafted begin / size constant tensors can
therefore drive any of these into a heap-buffer-overflow read or
write, depending on which branch is taken.

Fix
---

  * Validate begin in [0, input_dim] before either branch.
  * Compute `begin + size` in int64 in the else branch so the
    comparison cannot wrap.
  * Bounds-check the final size_value against the int range used by
    output_shape_vector before the static_cast.
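Closing all three gaps for one dimension can be sketched as follows; `CheckedSliceDim` is a hypothetical condensation of the slice.cc logic, not the actual kernel function (note the overflow-safe rearrangement of the `begin + size` comparison):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Sketch: validate begin in [0, input_dim] (gap 1), compare size
// against input_dim - begin so the check itself cannot overflow
// (gap 2), and bound the final size to int32 before the narrowing
// cast (gap 3).
bool CheckedSliceDim(int64_t input_dim, int64_t begin, int64_t size,
                     int32_t* out) {
  if (begin < 0 || begin > input_dim) return false;
  const int64_t size_value = (size == -1) ? input_dim - begin : size;
  // Equivalent to begin + size_value > input_dim, without the addition
  // that could wrap for attacker-chosen int64 sizes.
  if (size_value < 0 || size_value > input_dim - begin) return false;
  if (size_value > std::numeric_limits<int32_t>::max()) return false;
  *out = static_cast<int32_t>(size_value);
  return true;
}
```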

Drop <stdint.h> in favor of <cstdint> per the style review on
PR tensorflow#115031.

This is the same family of fix as PRs tensorflow#115031, tensorflow#115452, tensorflow#115453,
tensorflow#115455, tensorflow#115456, tensorflow#115457, tensorflow#115458, tensorflow#115459, tensorflow#115460. Twelve TFLite
kernels in the CheckedInt incomplete-pattern hunt now share the same
bug class.
@google-ml-butler google-ml-butler bot added the awaiting review Pull request awaiting review label Apr 9, 2026
@keerthanakadiri keerthanakadiri added the comp:lite TF Lite related issues label Apr 9, 2026
@github-project-automation github-project-automation bot moved this to Assigned Reviewer in PR Queue Apr 9, 2026