
Fix integer overflow in pad output shape computation#115456

Open
mohammadmseet-hue wants to merge 3 commits into tensorflow:master from mohammadmseet-hue:fix-pad-overflow

Conversation

@mohammadmseet-hue

Summary

pad::ResizeOutputTensor() validates each individual padding value in its first-pass loop (each must be non-negative after narrowing to int), but does NOT validate that the sum `input + before_padding + after_padding` fits in int32. Two int32 paddings near INT32_MAX, summed with any positive input dim, trivially overflow int32 while each individual value stays "valid".

The chain (master HEAD)

template <typename PaddingIntegerType>
TfLiteStatus ResizeOutputTensor(TfLiteContext* context, PadContext* op_context) {
  ...
  // First pass: validate each individual padding value is non-negative.
  for (int idx = 0; idx < op_context->dims; ++idx) {
    int before_padding = static_cast<int>(*paddings_data++);
    int after_padding  = static_cast<int>(*paddings_data++);
    TF_LITE_ENSURE_MSG(context, (before_padding >= 0 && after_padding >= 0),
                       "Pad value has to be greater than equal to 0.");
  }
  // Second pass: compute the output dimensions.
  paddings_data = GetTensorData<PaddingIntegerType>(op_context->paddings);
  for (int idx = 0; idx < op_context->dims; ++idx) {
    int before_padding = static_cast<int>(*paddings_data++);
    int after_padding  = static_cast<int>(*paddings_data++);
    output_size->data[idx] =
        (input_size->data[idx] + before_padding + after_padding);   // unchecked int sum
  }
  return context->ResizeTensor(context, op_context->output, output_size);
}

The wrapped sum is stored into output_size->data[idx] (int) and used by ResizeTensor to allocate the output buffer. optimized_ops::Pad in Eval later iterates over the real padding values via op_params.left/right_padding (which come from the same paddings tensor but NOT through the wrapped sum), writing past the under-sized allocation — a heap-buffer-overflow write whose size and content are controlled by the model.

A malicious .tflite that contains a Pad op with a paddings tensor whose two int32 values (each individually valid) sum to more than INT32_MAX can therefore corrupt heap memory in any TFLite consumer that loads the model.
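The wrap is easy to reproduce in isolation. Below is a minimal sketch of the checked computation the fix introduces; `ComputePaddedDim` is a hypothetical stand-in for the in-kernel logic, not an actual TFLite helper:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Sketch of the fixed per-dimension computation: widen to int64_t so
// the addition cannot wrap, then reject any result outside the int32
// range that TfLiteIntArray::data[] can hold.
bool ComputePaddedDim(int32_t input, int32_t before, int32_t after,
                      int32_t* out) {
  const int64_t dim = static_cast<int64_t>(input) + before + after;
  if (dim < 0 || dim > std::numeric_limits<int32_t>::max()) {
    return false;  // would overflow the int32 output shape entry
  }
  *out = static_cast<int32_t>(dim);
  return true;
}
```

With two paddings of INT32_MAX, each individually non-negative, the 64-bit sum exceeds INT32_MAX and the check fires, whereas the original 32-bit addition would have wrapped silently.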

Fix

Compute the per-dimension output size in int64_t and bounds-check the result against INT32_MAX before storing into TfLiteIntArray::data[]. On overflow, free the partially-built output_size, log a kernel error, and return kTfLiteError before ResizeTensor.

-    output_size->data[idx] =
-        (input_size->data[idx] + before_padding + after_padding);
+    const int64_t dim = static_cast<int64_t>(input_size->data[idx]) +
+                        before_padding + after_padding;
+    if (dim < 0 || dim > std::numeric_limits<int32_t>::max()) {
+      TfLiteIntArrayFree(output_size);
+      TF_LITE_KERNEL_LOG(
+          context,
+          "Pad: integer overflow computing input + paddings for dim %d", idx);
+      return kTfLiteError;
+    }
+    output_size->data[idx] = static_cast<int>(dim);

Also drop <stdint.h> in favor of <cstdint> per the style review on PR #115031.

Relationship to other PRs in this series

This is the same family of fix as the earlier PRs in this series.

pad.cc has a per-operand validation gate that looks like a bounds check, but, as documented in the previous fixes, per-operand validation is not sufficient when the sum of valid operands can itself overflow.

Files changed

File Lines
tensorflow/lite/kernels/pad.cc +22 / -4

Test plan

  • No public API change.
  • No new dependencies.
  • Existing pad_test tests pass against the patched kernel.
  • Happy to add regression tests covering the sum-overflow case on request.

mirror_pad::GetPaddedOutputShape() computes each output dimension as
`SizeOfDimension(input, i) + left_pad + right_pad`, where `left_pad`
and `right_pad` are int64_t values that come straight from the
`padding_matrix` tensor (a model-constant on the eager-resize path
gated by IsConstantOrPersistentTensor in Prepare). The int64 sum is
then implicitly narrowed to `int` when stored into
TfLiteIntArray::data[i].

Without bounds checks, a malicious .tflite that contains a MirrorPad
op with large or negative padding values can:

  - silently narrow a multi-billion intended dimension into a small or
    negative int that gets written into output_size->data[i]
  - reach ResizeTensor with the wrapped dimension, allocating an
    undersized output buffer
  - have Eval (MirrorPadWorkerTask::Run) compute output_size as
    NumElements(output_tensor) and then iterate through that count
    while indexing via input_dims_num_elements stride math derived
    from the original (un-wrapped) input dims, producing a heap-buffer-
    overflow write whose size and content are controlled by the model

Notably, mirror_pad has no equivalent of pad.cc's CheckPaddingOverflow
helper — there is no upstream bounds check on `left_pad` / `right_pad`
at all.

Fix
---

Validate left_pad and right_pad as non-negative, do the addition in
int64_t, and bounds-check the result against
std::numeric_limits<int32_t>::max() before storing into
TfLiteIntArray::data[i]. On any failure, return nullptr — both
existing call sites (Prepare and Eval) already handle a nullptr
unique_ptr by returning kTfLiteError.
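As a minimal sketch of that validation logic, assuming a std::vector stand-in for TfLiteIntArray (the function name and signature are illustrative, not the actual mirror_pad API; bounding each pad before summing is an extra safety step for the int64 inputs):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <memory>
#include <vector>

// Sketch of the mirror_pad fix: reject negative or oversized pads, sum
// in int64_t, bound the result against int32, and signal failure by
// returning nullptr as the real call sites already expect.
std::unique_ptr<std::vector<int>> PaddedShapeOrNull(
    const std::vector<int>& input_dims,
    const std::vector<int64_t>& left_pad,
    const std::vector<int64_t>& right_pad) {
  const int64_t max32 = std::numeric_limits<int32_t>::max();
  auto shape = std::make_unique<std::vector<int>>();
  for (size_t i = 0; i < input_dims.size(); ++i) {
    if (left_pad[i] < 0 || left_pad[i] > max32 ||
        right_pad[i] < 0 || right_pad[i] > max32) {
      return nullptr;  // invalid individual pad
    }
    const int64_t padded = input_dims[i] + left_pad[i] + right_pad[i];
    if (padded > max32) return nullptr;  // sum would not fit in int32
    shape->push_back(static_cast<int>(padded));
  }
  return shape;
}
```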

Also drop <stdint.h> / <stddef.h> in favor of <cstdint> / <cstddef>
per the style review on PR tensorflow#115031.

This is the same family of fix as PRs tensorflow#115031 (stablehlo_reduce_window),
tensorflow#115452 (stablehlo_pad), tensorflow#115453 (space_to_batch_nd / batch_to_space_nd),
and tensorflow#115454 (tile). Mirror_pad is the closest sibling of stablehlo_pad
and shares the exact same downstream OOB pattern.
pad::ResizeOutputTensor() validates each individual padding value via
CheckPaddingOverflow (it bounds them to fit in int32), but does NOT
validate that the SUM `input + before_padding + after_padding` fits in
int32. Two int32 paddings near INT32_MAX summed with any positive
input dim trivially overflow int32 while each individual value stays
"valid".

The wrapped sum is stored into output_size->data[idx] (int) and used by
ResizeTensor to allocate the output buffer. optimized_ops::Pad in Eval
later iterates over the real padding values via op_params.left/right_padding
(which come from the same paddings tensor but NOT through the wrapped
sum), writing past the under-sized allocation — a heap-buffer-overflow
write whose size and content are controlled by the model.

A malicious .tflite that contains a Pad op with a paddings tensor whose
two int32 values sum to more than INT32_MAX can therefore corrupt heap
memory in any TFLite consumer that loads the model.

Fix
---

Compute the per-dimension output size in int64 and bounds-check the
result against INT32_MAX before storing into TfLiteIntArray::data[].
On overflow, free the partially-built output_size, log a kernel error,
and return kTfLiteError before ResizeTensor.

This is the same family of fix as PRs tensorflow#115031 (stablehlo_reduce_window),
tensorflow#115452 (stablehlo_pad), tensorflow#115453 (space_to_batch_nd / batch_to_space_nd), tensorflow#115454
(tile), and tensorflow#115455 (mirror_pad). pad.cc has CheckPaddingOverflow as a
"looks like a bounds check" gate, but as documented in the previous
fixes, per-operand validation is not sufficient when the sum of valid
operands can itself overflow.

Also drop <stdint.h> in favor of <cstdint> per the style review on
PR tensorflow#115031.
@google-ml-butler google-ml-butler bot added the size:M CL Change Size: Medium label Apr 8, 2026
mohammadmseet-hue added a commit to mohammadmseet-hue/tensorflow that referenced this pull request Apr 8, 2026
fill::ResizeOutputImpl<T>() and broadcast_to::ResizeOutputTensor() both
consume an attacker-controlled int32 or int64 shape tensor and assign
each value into TfLiteIntArray::data[i] (int) without validating that
the value fits in int32 — a silent narrowing that can wrap any
multi-billion intended dimension into a small or negative int.

fill.cc
-------

  T data = GetTensorData<T>(dims)[i];
  if (data < 0) { error }                  // catches negative
  output_shape->data[i] = data;             // T -> int silent narrowing

The negativity check is necessary but not sufficient: when T == int64_t a
positive int64 value such as 0x100000001 passes `data < 0` and silently
narrows to a small int when assigned to output_shape->data[i].
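The insufficiency of the sign check alone can be shown directly; `FitsInt32` here is a hypothetical helper illustrating the added range check, not the actual fill.cc code:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Sketch of the check fill.cc needs in addition to `data < 0`: a
// positive int64 value can still be outside the int32 range and would
// silently narrow when assigned to output_shape->data[i].
bool FitsInt32(int64_t v) {
  return v >= 0 && v <= std::numeric_limits<int32_t>::max();
}
```

For example, `0x100000001` is positive (so it passes `data < 0`) yet truncates to `1` under `static_cast<int32_t>`, which is exactly the silent narrowing the check must catch.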

broadcast_to.cc
---------------

  auto get_shape_data = [op_context](int i) -> int32_t {
    if (op_context->shape->type == kTfLiteInt32) {
      return GetTensorData<int32_t>(op_context->shape)[i];
    } else {
      return GetTensorData<int64_t>(op_context->shape)[i];
    }
  };
  ...
  output_shape->data[idx] = get_shape_data(idx);

The lambda forcibly narrows int64 -> int32 in its return type, throwing
away the high bits. There is no negativity check, no upper-bound check,
and no validation between the attacker shape tensor and ResizeTensor.

The chain in both kernels is identical to the bugs already fixed in
PRs tensorflow#115031 / tensorflow#115452 / tensorflow#115453 / tensorflow#115454 / tensorflow#115455 / tensorflow#115456: the
wrapped per-dim values flow into ResizeTensor; the kernel Eval path
later iterates over the un-wrapped intended output element count and
writes past the under-sized backing allocation — a heap-buffer-overflow
write controlled by the model.

Fix
---

In fill.cc, bounds-check `data` against the int32 range explicitly
before assigning into output_shape->data[i].

In broadcast_to.cc, change the return type of `get_shape_data` to
int64_t so the high bits survive, and bounds-check the value against
the int32 range at the assignment site. Both checks log a kernel
error and return kTfLiteError before ResizeTensor is reached.
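A sketch of that broadcast_to change, with `ShapeTensor` and `CopyShape` as simplified stand-ins for the TfLite tensor accessors (the lambda's widened return type is the essential part):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Simplified stand-in for the int32/int64 shape input tensor.
struct ShapeTensor {
  bool is_int32;
  std::vector<int32_t> i32;
  std::vector<int64_t> i64;
};

// Sketch of the fix: return int64_t from the accessor so the high bits
// survive, then range-check at the assignment site.
bool CopyShape(const ShapeTensor& shape, int rank, std::vector<int>* out) {
  auto get_shape_data = [&shape](int i) -> int64_t {  // was int32_t
    return shape.is_int32 ? static_cast<int64_t>(shape.i32[i])
                          : shape.i64[i];
  };
  for (int idx = 0; idx < rank; ++idx) {
    const int64_t dim = get_shape_data(idx);
    if (dim < 0 || dim > std::numeric_limits<int32_t>::max()) {
      return false;  // reject instead of silently narrowing
    }
    out->push_back(static_cast<int>(dim));
  }
  return true;
}
```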

Also drop <stdint.h> in favor of <cstdint> for both files (C++ only
translation units), per the style review on PR tensorflow#115031.
mohammadmseet-hue added a commit to mohammadmseet-hue/tensorflow that referenced this pull request Apr 8, 2026
ResizeOutputTensor() in strided_slice.cc computes each output dimension
as `dim_shape = end - begin`, both int32 values derived from the
attacker-controlled begin / end / strides tensors. The subtraction is
unchecked: with attacker-chosen begin and end (e.g. begin = -2,
end = INT32_MAX) the int32 result silently wraps to INT32_MIN+1 — a
same-sign value that is NOT caught by the
`(dim_shape < 0) != (stride < 0)` guard. Subsequent division by stride
(itself attacker-controlled) propagates the wrap into output_shape_vector,
which is then passed to ResizeTensor.

Additionally, the existing TFLITE_CHECK_LT(dim_shape, 0) before the
negative-stride division is a release-build no-op (DCHECK) and the
division itself can invoke UB if stride == INT32_MIN (because the
unsigned absolute value of INT32_MIN cannot be represented in int32).

A malicious .tflite that contains a StridedSlice op with crafted
begin / end / stride constant tensors can therefore drive the per-dim
output size to a wrapped value, ResizeTensor allocates an undersized
output buffer, and the inner StridedSlice loop in Eval iterates over
the un-wrapped logical output region and writes past the allocation —
a heap-buffer-overflow write whose size and content are controlled by
the model.

Fix
---

Promote dim_shape to int64_t before the subtraction so attacker int32
end / begin values cannot wrap. After the division, bounds-check the
result against the int32 range used by TfLiteIntArray::data[] and
return kTfLiteError on overflow before ResizeTensor.

Reject stride == INT32_MIN explicitly via TF_LITE_ENSURE_MSG to avoid
UB in the negate-and-divide step below.

Add parentheses to the existing `(dim_shape < 0) != (stride < 0)`
guard for clarity (the unparenthesised form is correct only by
operator precedence accident).
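The combined checks can be sketched for a single dimension as follows; `SliceDim` is a hypothetical condensation of the strided_slice logic, not the actual kernel function:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Sketch: reject stride == 0 and stride == INT32_MIN (whose absolute
// value is unrepresentable in int32), subtract end - begin in int64_t
// so it cannot wrap, and bound the final dim to the int32 range.
bool SliceDim(int32_t begin, int32_t end, int32_t stride, int32_t* out) {
  if (stride == 0 || stride == std::numeric_limits<int32_t>::min()) {
    return false;
  }
  const int64_t dim_shape = static_cast<int64_t>(end) - begin;
  if ((dim_shape < 0) != (stride < 0)) {  // mismatched signs: empty slice
    *out = 0;
    return true;
  }
  const int64_t len = dim_shape < 0 ? -dim_shape : dim_shape;
  const int64_t s = stride < 0 ? -static_cast<int64_t>(stride) : stride;
  const int64_t dim = (len + s - 1) / s;  // ceiling division
  if (dim > std::numeric_limits<int32_t>::max()) return false;
  *out = static_cast<int32_t>(dim);
  return true;
}
```

With `begin = -2`, `end = INT32_MAX`, `stride = 1`, the 64-bit subtraction yields INT32_MAX + 2, which the bound check rejects instead of wrapping to a same-sign garbage value.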

Drop <stdint.h> in favor of <cstdint> per the style review on
PR tensorflow#115031.

This is the same family of fix as PRs tensorflow#115031 / tensorflow#115452 / tensorflow#115453 /
tensorflow#115454 / tensorflow#115455 / tensorflow#115456 / tensorflow#115457 — same bug class, same downstream
narrowing into TfLiteIntArray::data[], same heap-OOB-write outcome,
same fix template (validate + early-return before ResizeTensor).
mohammadmseet-hue added a commit to mohammadmseet-hue/tensorflow that referenced this pull request Apr 8, 2026
gather_nd::Prepare reads `indices_nd` from `indices->dims->data[
indices_rank - 1]`, which is an attacker-controlled int32 value coming
from the .tflite model's `indices` tensor shape. The existing bound
check only validated the upper bound:

  if (indices_nd > params_rank) { error }

A *negative* value passes this check (negative is not > positive) and
propagates into the rest of Prepare:

  // output_rank wraps to a huge positive int because subtracting a
  // negative is the same as adding a positive
  const int output_rank = indices_rank + params_rank - indices_nd - 1;
  TfLiteIntArray* output_shape = TfLiteIntArrayCreate(output_rank);
  ...
  for (int i = indices_nd; i < params_rank; ++i) {
    output_shape->data[output_index++] = params->dims->data[i];   // OOB read
  }

Two distinct memory-safety primitives result:

1. The loop iterates `for (int i = indices_nd; i < params_rank; ++i)`
   with i starting at a negative int32 and ending at a small positive
   value. Each iteration reads `params->dims->data[i]` for negative
   i — an OOB read of params->dims->data on the order of ~2^31 entries.

2. `output_index` runs past `output_rank` and writes into
   `output_shape->data[]` past the just-allocated TfLiteIntArray
   storage, corrupting heap memory adjacent to the allocation.

3. The wrapped `output_rank` also produces a garbage TfLiteIntArray
   shape, which `ResizeTensor` then uses to size the output tensor;
   the kernel Eval path later walks `reference_ops::GatherNd` with the
   un-wrapped intended sizes, producing a heap-buffer-overflow write
   whose length and content are controlled by the model.

A malicious .tflite that contains a GatherNd op whose `indices` tensor
has its innermost dimension set to a negative int32 (e.g. via a crafted
flatbuffer) is enough to trigger the chain.

Fix
---

Validate `indices_nd >= 0` in addition to the existing
`indices_nd > params_rank` check, and add a defensive
`TF_LITE_ENSURE(context, output_rank >= 0)` after the subtraction so
any future regression that introduces a wrap is caught before the
TfLiteIntArrayCreate / loop combination.
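The two added checks can be sketched as one predicate; `ValidGatherNdRanks` is an illustrative condensation, not the actual Prepare code:

```cpp
#include <cassert>

// Sketch of the gather_nd validation: indices_nd must lie in
// [0, params_rank] before it is used in the output_rank computation,
// and output_rank itself is defensively checked afterwards.
bool ValidGatherNdRanks(int indices_rank, int params_rank, int indices_nd,
                        int* output_rank) {
  if (indices_nd < 0 || indices_nd > params_rank) return false;
  *output_rank = indices_rank + params_rank - indices_nd - 1;
  return *output_rank >= 0;
}
```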

Also drop <stdint.h> in favor of <cstdint> per the style review on
PR tensorflow#115031.

This is the same family of fix as PRs tensorflow#115031 / tensorflow#115452 / tensorflow#115453 /
tensorflow#115454 / tensorflow#115455 / tensorflow#115456 / tensorflow#115457 / tensorflow#115458, with the addition
that gather_nd carries a *second* OOB primitive (the loop reading
params->dims->data[negative]).
mohammadmseet-hue added a commit to mohammadmseet-hue/tensorflow that referenced this pull request Apr 8, 2026
scatter_nd::ResizeOutputTensor and sparse_to_dense::Resize both copy
the contents of an attacker-controlled `shape` / `output_shape` input
tensor directly into TfLiteIntArray::data[i] (`int`) without
validating that each value is non-negative or fits in the int32 range
used by the TfLiteIntArray storage.

scatter_nd
----------

  for (int i = 0; i < shape_rank; i++) {
    output_shape->data[i] = shape_data[i];
  }

`shape_data` is `IndicesT*` where IndicesT can be int32 or int64. For
int64 a positive value > INT32_MAX silently narrows; negatives pass
unchecked. The wrapped per-dim value flows into ResizeTensor and the
kernel Eval (reference_ops::ScatterNd) later writes through indices
that are not bounds-checked against the wrapped output dims —
heap-buffer-overflow write whose length and content are controlled by
the model.

sparse_to_dense
---------------

  for (int i = 0; i < output_dimensions; ++i) {
    output_shape_array->data[i] = GetTensorData<T>(output_shape)[i];
  }

Same shape, same template-T narrowing. T = int64 silently truncates;
negatives pass through. reference_ops::SparseToDense then writes at
indices computed from sparse `indices` (also unchecked) that are
nominally within the un-truncated output region — heap OOB write.

Fix
---

In both files, accumulate each dim into an int64 temporary and reject
values that are negative or exceed std::numeric_limits<int32_t>::max()
before assigning into TfLiteIntArray::data[i]. On any failure, free
the partially-built TfLiteIntArray, log a kernel error, and return
kTfLiteError before ResizeTensor.
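The shared pattern for both kernels can be sketched as a single checked copy loop, templated over the shape element type as in the real kernels (`CheckedCopyShape` and the std::vector containers are illustrative stand-ins):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Sketch of the fix shared by scatter_nd and sparse_to_dense: widen
// each shape value to int64_t and reject anything negative or larger
// than int32 before it reaches the int output shape array.
template <typename T>
bool CheckedCopyShape(const std::vector<T>& shape_data,
                      std::vector<int>* out) {
  for (const T v : shape_data) {
    const int64_t dim = static_cast<int64_t>(v);
    if (dim < 0 || dim > std::numeric_limits<int32_t>::max()) {
      return false;  // would narrow silently or is negative
    }
    out->push_back(static_cast<int>(dim));
  }
  return true;
}
```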

Drop <stdint.h> in favor of <cstdint> for both files (C++ only TUs)
per the style review on PR tensorflow#115031.

This is the same family of fix as PRs tensorflow#115031 / tensorflow#115452 / tensorflow#115453 /
tensorflow#115454 / tensorflow#115455 / tensorflow#115456 / tensorflow#115457 / tensorflow#115458 / tensorflow#115459.
mohammadmseet-hue added a commit to mohammadmseet-hue/tensorflow that referenced this pull request Apr 8, 2026

slice::CalculateOutputShapeVector<T> reads `begin` and `size` values
from attacker-controlled int32 / int64 input tensors. The existing
validation has three independent gaps:

  1. begin[idx] is never validated to be >= 0 or <= input_dim. With a
     negative begin and size_value == -1, the code computes
     `size_value = input_dim - begin` which is larger than input_dim,
     producing an output shape that asks the kernel Eval path to read
     past the end of input. With a begin > input_dim the same
     computation underflows or overshoots.

  2. The `else` branch checks `input_dim < begin + size` in template-T
     arithmetic, where T can be int64. With begin = INT64_MAX and
     size = 1, the addition signed-overflows to INT64_MIN and the
     check `input_dim < INT64_MIN` is false → bypass. The unchecked
     begin then propagates to GetBeginAndSizeVectors and into
     reference_ops::Slice as `op_params.begin[i]`, where it is used
     to compute `input_offset + begin * stride` for the read pointer.
     Result: OOB read on the input buffer.

  3. The final `static_cast<int>(size_value)` silently truncates int64
     to int. A large positive int64 size value becomes a small or
     negative int written into the output_shape_vector and on into
     ResizeTensor, producing an undersized buffer that the kernel
     later overruns.

A malicious .tflite with crafted begin / size constant tensors can
therefore drive any of these into a heap-buffer-overflow read or
write, depending on which branch is taken.

Fix
---

  * Validate begin in [0, input_dim] before either branch.
  * Compute `begin + size` in int64 in the else branch so the
    comparison cannot wrap.
  * Bounds-check the final size_value against the int range used by
    output_shape_vector before the static_cast.
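Closing all three gaps for one dimension can be sketched as follows; `CheckedSliceDim` is a hypothetical condensation of the slice.cc logic, not the actual kernel function (note the overflow-safe rearrangement of the `begin + size` comparison):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Sketch: validate begin in [0, input_dim] (gap 1), compare size
// against input_dim - begin so the check itself cannot overflow
// (gap 2), and bound the final size to int32 before the narrowing
// cast (gap 3).
bool CheckedSliceDim(int64_t input_dim, int64_t begin, int64_t size,
                     int32_t* out) {
  if (begin < 0 || begin > input_dim) return false;
  const int64_t size_value = (size == -1) ? input_dim - begin : size;
  // Equivalent to begin + size_value > input_dim, without the addition
  // that could wrap for attacker-chosen int64 sizes.
  if (size_value < 0 || size_value > input_dim - begin) return false;
  if (size_value > std::numeric_limits<int32_t>::max()) return false;
  *out = static_cast<int32_t>(size_value);
  return true;
}
```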

Drop <stdint.h> in favor of <cstdint> per the style review on
PR tensorflow#115031.

This is the same family of fix as PRs tensorflow#115031, tensorflow#115452, tensorflow#115453,
tensorflow#115455, tensorflow#115456, tensorflow#115457, tensorflow#115458, tensorflow#115459, tensorflow#115460. Twelve TFLite
kernels in the CheckedInt incomplete-pattern hunt now share the same
bug class.
@google-ml-butler google-ml-butler bot added the awaiting review Pull request awaiting review label Apr 9, 2026
@keerthanakadiri keerthanakadiri added the comp:lite TF Lite related issues label Apr 9, 2026
@github-project-automation github-project-automation bot moved this to Assigned Reviewer in PR Queue Apr 9, 2026