Fix integer overflow in stablehlo.pad output shape computation #115452
mohammadmseet-hue wants to merge 2 commits into tensorflow:master from
Conversation
PadData::Setup() in tensorflow/lite/kernels/stablehlo_pad.cc computes the
output tensor's shape, strides, offsets and total size as int64_t arithmetic
on parameters that come directly from the .tflite model: the input dimensions
and the per-dim `interior_padding`, `edge_padding_low`, `edge_padding_high`
arrays. None of those multiplications and additions are checked for overflow.

The chain
---------
The result is then used in two places:
1. BuildOuputTensorDims() copies output_shape_[i] (int64_t) into
TfLiteIntArray::data[i] (int) — a silent narrowing that can turn a wrapped
negative int64 into a small positive int32.
2. ResizeTensor allocates the output buffer using those (potentially tiny)
int32 dims.
Apply() then writes output_size_ bytes (the un-truncated int64 product) into
that tensor via FillBuffer + StridedCopy, producing a heap-buffer-overflow
write whose length and content are controlled by the model.

A malicious .tflite that contains a stablehlo.pad op with crafted
interior_padding / edge_padding values can therefore corrupt heap memory in
any TFLite consumer (mobile apps, edge devices, server-side inference) that
loads the model.

Fix
---
Route every int64 multiplication and addition in PadData::Setup through the
new CheckedInt<int64_t> helper template added in commit 1908dd3 ("Add a
helper template to compute simple expressions and check that no overflow has
happened."). On any overflow, set a new `overflow_` flag and bail out of
Setup() early; expose the flag via a `HasOverflow()` accessor. Prepare() now
consults HasOverflow() between the Setup() call and the ResizeTensor call,
logs a kernel error and returns kTfLiteError if any of the
shape/stride/offset/size computations wrapped. This kills the chain before
the under-sized output tensor can be allocated.

This is the same shape of fix that PR tensorflow#115031 applies to
stablehlo_reduce_window.cc; both kernels share the same bug class and the
same downstream int32 narrowing path.
The patch is contained to stablehlo_pad.cc and does not introduce any new helpers — CheckedInt is reused as-is from tensorflow/lite/util.h.
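For readers unfamiliar with the helper, the overall shape of an overflow-latching integer wrapper can be sketched as follows. This is an illustration only: `CheckedIntSketch` and `PadOutputDim` are hypothetical names, and the real `CheckedInt` in tensorflow/lite/util.h may have a different API. The sketch routes the pad output-dimension formula through checked arithmetic so overflow latches a flag instead of silently wrapping.

```cpp
#include <cstdint>
#include <limits>

// Illustrative overflow-latching wrapper (hypothetical; the real helper is
// CheckedInt in tensorflow/lite/util.h). Arithmetic runs through the
// GCC/Clang checked builtins; any overflow is latched into a sticky flag
// that propagates through subsequent operations.
template <typename T>
class CheckedIntSketch {
 public:
  explicit CheckedIntSketch(T v) : value_(v) {}

  CheckedIntSketch operator+(T rhs) const {
    CheckedIntSketch r = *this;
    r.overflow_ |= __builtin_add_overflow(value_, rhs, &r.value_);
    return r;
  }
  CheckedIntSketch operator*(T rhs) const {
    CheckedIntSketch r = *this;
    r.overflow_ |= __builtin_mul_overflow(value_, rhs, &r.value_);
    return r;
  }

  T value() const { return value_; }
  bool Overflow() const { return overflow_; }

 private:
  T value_;
  bool overflow_ = false;
};

// The stablehlo.pad output-dimension formula from the description,
// `(dim - 1) * (interior + 1) + 1 + low + high`, routed through the
// wrapper so a wrapped result is reported instead of silently narrowed.
inline CheckedIntSketch<int64_t> PadOutputDim(int64_t dim, int64_t interior,
                                              int64_t low, int64_t high) {
  const CheckedIntSketch<int64_t> expanded =
      (CheckedIntSketch<int64_t>(interior) + 1) * (dim - 1);
  return expanded + 1 + low + high;
}
```

A caller checks `Overflow()` once after the computation, mirroring how Prepare() consults HasOverflow() before ResizeTensor.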
…shape
`SpaceToBatchND::ResizeOutputTensor` and
`BatchToSpaceND::ResizeOutputTensor` compute the output tensor's spatial
dimensions and batch size from `input->dims->data` combined with the
contents of the model-constant `block_shape` and `paddings` / `crops`
tensors. None of those `int` arithmetic operations are checked for
overflow.
Both kernels gate the `ResizeOutputTensor` call on
`IsConstantOrPersistentTensor(block_shape) && IsConstantOrPersistentTensor(paddings | crops)`,
so when the kernel does take the eager-resize path the values come
straight from the .tflite file and a malicious model controls every
operand.
The chain
---------
SpaceToBatchND:
int final_dim_size = (input_size->data[dim+1]
+ paddings_data[dim*2]
+ paddings_data[dim*2+1]);
...
output_size->data[dim+1] = final_dim_size / block_shape[dim];
output_batch_size *= block_shape[dim];
BatchToSpaceND:
output_size->data[dim+1] = input_size->data[dim+1] * block_shape[dim]
- crops[dim*2] - crops[dim*2+1];
`final_dim_size`, `output_batch_size`, and `spatial_dim` are all `int`
and silently wrap on overflow. The wrapped values are then written into
`TfLiteIntArray::data[]` and used by `ResizeTensor` to allocate the
output tensor. `Eval` later calls into reference/optimized
SpaceToBatchND / BatchToSpaceND, which iterate over the (un-truncated)
nominal output dimensions and write past the under-sized backing
allocation — a heap-buffer-overflow write whose length and content are
controlled by the model.
A malicious .tflite that contains either op with crafted block_shape /
paddings / crops can therefore corrupt heap memory in any TFLite
consumer (mobile apps, edge devices, server-side inference) that loads
the model.
Fix
---
Route every previously-unchecked `int` arithmetic site through
`CheckedInt<int>` from `tensorflow/lite/util.h`. On any overflow, log a
kernel error, free the partially-built `output_size` and return
`kTfLiteError` before `ResizeTensor` is reached. This kills the chain
before the under-sized output tensor can be allocated.
Also drop the `<stdint.h>` include in favor of `<cstdint>` for both
files (these are C++-only translation units), per recent style review
on PR tensorflow#115031.
This is the same family of fix as PR tensorflow#115031 (stablehlo_reduce_window)
and PR tensorflow#115452 (stablehlo_pad). All three kernels share the same bug
class and the same downstream `int` narrowing into
`TfLiteIntArray::data[]`.
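A minimal sketch of the checked spatial-dimension computation described above (the helper name is hypothetical; the actual fix routes the arithmetic through `CheckedInt<int>` inside `ResizeOutputTensor`):

```cpp
#include <cstdint>
#include <limits>

// Hypothetical sketch: compute (input + pad_lo + pad_hi) / block with the
// arithmetic widened to int64 so three int operands cannot wrap, and
// range-check the result before it is narrowed back into an int for
// TfLiteIntArray::data[].
bool CheckedSpatialDim(int input, int pad_lo, int pad_hi, int block,
                       int* out) {
  if (block <= 0) return false;
  const int64_t padded =
      static_cast<int64_t>(input) + pad_lo + pad_hi;  // cannot wrap in int64
  if (padded < 0 || padded % block != 0) return false;
  const int64_t dim = padded / block;
  if (dim > std::numeric_limits<int>::max()) return false;
  *out = static_cast<int>(dim);
  return true;
}
```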
Tile::MultiplyShapeDims computes each output dimension as
`shape.data[i] * multipliers_v[i]` and assigns the result directly
into TfLiteIntArray::data[i] (int). For both supported multiplier
types (int32 and int64) this is unchecked attacker-controlled
arithmetic:
- kTfLiteInt32 path: int32_t * int32_t silently wraps when the
product exceeds INT_MAX.
- kTfLiteInt64 path: int64_t product is silently narrowed to int
when assigned to output_shape->data[i], which can wrap any
multi-GB nominal output dimension into a small or negative int.
Both `shape.data[i]` and `multipliers_v[i]` are attacker-controlled in
practice: the input dimensions come from the parsed model and the
multipliers tensor is frequently a model-constant that comes straight
from the .tflite file.
The wrapped per-dim values are written into output_shape->data[i] and
used by ResizeTensor to allocate the output buffer. Eval (Tile() ->
TileOneDimension()) then iterates over the original un-wrapped
multiplier values from the multipliers tensor and writes
input_size * multiplier bytes per dimension into the under-sized
output buffer — a heap-buffer-overflow write whose length and content
are controlled by the model.
A malicious .tflite that contains a Tile op with a multipliers tensor
whose values, multiplied with the input shape, exceed INT_MAX can
therefore corrupt heap memory in any TFLite consumer (mobile apps,
edge devices, server-side inference) that loads the model.
Fix
---
Convert MultiplyShapeDims to return TfLiteStatus and route the
per-dimension multiplication through CheckedInt<int> from
tensorflow/lite/util.h. CheckedInt computes the multiplication in a
widened type and tracks both arithmetic overflow (int32 * int32) and
narrowing overflow (int64 -> int), so a single check covers both
multiplier types. On any overflow we free the partially-built shape
array, log a kernel error, and return kTfLiteError before
ResizeTensor is reached. ResizeOutput propagates the status via
TF_LITE_ENSURE_OK.
Also drop the <stdint.h> include in favor of <cstdint> (C++ only
translation unit), per the style review on PR tensorflow#115031.
This is the same family of fix as PRs tensorflow#115031 (stablehlo_reduce_window),
tensorflow#115452 (stablehlo_pad) and tensorflow#115453 (space_to_batch_nd /
batch_to_space_nd). All four kernels share the same bug class and the
same downstream int narrowing path.
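The single-check idea for Tile can be sketched as below (hypothetical helper name; the real fix uses CheckedInt<int> from tensorflow/lite/util.h). One int64 range check covers both failure modes the description lists: the int32*int32 arithmetic overflow and the int64 -> int narrowing overflow.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical sketch of the widened per-dimension multiply. The GCC/Clang
// checked builtin guards the widened product itself (an int64 multiplier
// could overflow even int64), and the range check guards the narrowing
// back into TfLiteIntArray::data[] (int).
template <typename MultiplierT>
bool CheckedTileDim(int dim, MultiplierT multiplier, int* out) {
  int64_t product;
  if (__builtin_mul_overflow(static_cast<int64_t>(dim),
                             static_cast<int64_t>(multiplier), &product)) {
    return false;  // int64 product itself overflowed
  }
  if (product < 0 || product > std::numeric_limits<int>::max()) {
    return false;  // would wrap when narrowed to int
  }
  *out = static_cast<int>(product);
  return true;
}
```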
mirror_pad::GetPaddedOutputShape() computes each output dimension as
`SizeOfDimension(input, i) + left_pad + right_pad`, where `left_pad`
and `right_pad` are int64_t values that come straight from the
`padding_matrix` tensor (a model-constant on the eager-resize path
gated by IsConstantOrPersistentTensor in Prepare). The int64 sum is
then implicitly narrowed to `int` when stored into
TfLiteIntArray::data[i].
Without bounds checks, a malicious .tflite that contains a MirrorPad
op with large or negative padding values can:
- silently narrow a multi-billion intended dimension into a small or
negative int that gets written into output_size->data[i]
- reach ResizeTensor with the wrapped dimension, allocating an
undersized output buffer
- have Eval (MirrorPadWorkerTask::Run) compute output_size as
NumElements(output_tensor) and then iterate through that count
while indexing via input_dims_num_elements stride math derived
from the original (un-wrapped) input dims, producing a heap-buffer-
overflow write whose size and content are controlled by the model
Notably, mirror_pad has no equivalent of pad.cc's CheckPaddingOverflow
helper — there is no upstream bounds check on `left_pad` / `right_pad`
at all.
Fix
---
Validate left_pad and right_pad as non-negative, do the addition in
int64_t, and bounds-check the result against
std::numeric_limits<int32_t>::max() before storing into
TfLiteIntArray::data[i]. On any failure, return nullptr — both
existing call sites (Prepare and Eval) already handle a nullptr
unique_ptr by returning kTfLiteError.
Also drop <stdint.h> / <stddef.h> in favor of <cstdint> / <cstddef>
per the style review on PR tensorflow#115031.
This is the same family of fix as PRs tensorflow#115031 (stablehlo_reduce_window),
tensorflow#115452 (stablehlo_pad), tensorflow#115453 (space_to_batch_nd / batch_to_space_nd),
and tensorflow#115454 (tile). Mirror_pad is the closest sibling of stablehlo_pad
and shares the exact same downstream OOB pattern.
pad::ResizeOutputTensor() validates each individual padding value via
CheckPaddingOverflow (it bounds them to fit in int32), but does NOT validate
that the SUM `input + before_padding + after_padding` fits in int32. Two
int32 paddings near INT32_MAX summed with any positive input dim trivially
overflow int32 while each individual value stays "valid".

The wrapped sum is stored into output_size->data[idx] (int) and used by
ResizeTensor to allocate the output buffer. optimized_ops::Pad in Eval later
iterates over the real padding values via op_params.left/right_padding
(which come from the same paddings tensor but NOT through the wrapped sum),
writing past the under-sized allocation — a heap-buffer-overflow write whose
size and content are controlled by the model.

A malicious .tflite that contains a Pad op with a paddings tensor whose two
int32 values sum to more than INT32_MAX can therefore corrupt heap memory in
any TFLite consumer that loads the model.

Fix
---
Compute the per-dimension output size in int64 and bounds-check the result
against INT32_MAX before storing into TfLiteIntArray::data[]. On overflow,
free the partially-built output_size, log a kernel error, and return
kTfLiteError before ResizeTensor.

This is the same family of fix as PRs tensorflow#115031
(stablehlo_reduce_window), tensorflow#115452 (stablehlo_pad),
tensorflow#115453 (space_to_batch_nd / batch_to_space_nd),
tensorflow#115454 (tile), and tensorflow#115455 (mirror_pad). pad.cc has
CheckPaddingOverflow as a "looks like a bounds check" gate, but as
documented in the previous fixes, per-operand validation is not sufficient
when the sum of valid operands can itself overflow.

Also drop <stdint.h> in favor of <cstdint> per the style review on
PR tensorflow#115031.
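The per-dimension check can be sketched as below (`CheckedPaddedDim` is a hypothetical name for illustration, not the actual pad.cc code): each operand may individually pass CheckPaddingOverflow, so the sum itself is computed in int64 and rejected if it leaves the int32 range.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical sketch of the fix: three int32 operands are summed in int64
// (which cannot wrap for int32 inputs) and the result is range-checked
// before being stored as an int output dimension.
std::optional<int32_t> CheckedPaddedDim(int32_t input, int32_t before,
                                        int32_t after) {
  const int64_t sum = static_cast<int64_t>(input) + before + after;
  if (sum < 0 || sum > std::numeric_limits<int32_t>::max()) {
    return std::nullopt;  // would wrap when narrowed to int
  }
  return static_cast<int32_t>(sum);
}
```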
fill::ResizeOutputImpl<T>() and broadcast_to::ResizeOutputTensor() both
consume an attacker-controlled int32 or int64 shape tensor and assign
each value into TfLiteIntArray::data[i] (int) without validating that
the value fits in int32 — a silent narrowing that can wrap any
multi-billion intended dimension into a small or negative int.
fill.cc
-------
T data = GetTensorData<T>(dims)[i];
if (data < 0) { error } // catches negative
output_shape->data[i] = data; // T -> int silent narrowing
The negativity check is necessary but not sufficient: when T == int64_t a
positive int64 value such as 0x100000001 passes `data < 0` and silently
narrows to a small int when assigned to output_shape->data[i].
broadcast_to.cc
---------------
auto get_shape_data = [op_context](int i) -> int32_t {
if (op_context->shape->type == kTfLiteInt32) {
return GetTensorData<int32_t>(op_context->shape)[i];
} else {
return GetTensorData<int64_t>(op_context->shape)[i];
}
};
...
output_shape->data[idx] = get_shape_data(idx);
The lambda forcibly narrows int64 -> int32 in its return type, throwing
away the high bits. There is no negativity check, no upper-bound check,
and no validation between the attacker shape tensor and ResizeTensor.
The chain in both kernels is identical to the bugs already fixed in
PRs tensorflow#115031 / tensorflow#115452 / tensorflow#115453 / tensorflow#115454 / tensorflow#115455 / tensorflow#115456: the
wrapped per-dim values flow into ResizeTensor; the kernel Eval path
later iterates over the un-wrapped intended output element count and
writes past the under-sized backing allocation — a heap-buffer-overflow
write controlled by the model.
Fix
---
In fill.cc, bounds-check `data` against the int32 range explicitly
before assigning into output_shape->data[i].
In broadcast_to.cc, change the return type of `get_shape_data` to
int64_t so the high bits survive, and bounds-check the value against
the int32 range at the assignment site. Both checks log a kernel
error and return kTfLiteError before ResizeTensor is reached.
Also drop <stdint.h> in favor of <cstdint> for both files (C++ only
translation units), per the style review on PR tensorflow#115031.
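Both fixes reduce to the same check, sketched below with a hypothetical helper name (the real patches inline the check at the assignment sites): read the shape value in int64 so the high bits survive, then validate against the int32 range before narrowing.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical sketch of the shared validation: reject any shape value
// that is negative or does not fit the int32 range of
// TfLiteIntArray::data[] before the narrowing assignment.
bool CheckedShapeDim(int64_t raw, int* out) {
  if (raw < 0 || raw > std::numeric_limits<int32_t>::max()) return false;
  *out = static_cast<int>(raw);
  return true;
}
```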
ResizeOutputTensor() in strided_slice.cc computes each output dimension as
`dim_shape = end - begin`, both int32 values derived from the
attacker-controlled begin / end / strides tensors. The subtraction is
unchecked: with attacker-chosen begin and end (e.g. begin = -2,
end = INT32_MAX) the int32 result silently wraps to INT32_MIN+1 — a
same-sign value that is NOT caught by the `(dim_shape < 0) != (stride < 0)`
guard. Subsequent division by stride (itself attacker-controlled) propagates
the wrap into output_shape_vector, which is then passed to ResizeTensor.

Additionally, the existing TFLITE_CHECK_LT(dim_shape, 0) before the
negative-stride division is a release-build no-op (DCHECK) and the division
itself can invoke UB if stride == INT32_MIN (because the unsigned absolute
value of INT32_MIN cannot be represented in int32).

A malicious .tflite that contains a StridedSlice op with crafted begin /
end / stride constant tensors can therefore drive the per-dim output size to
a wrapped value, ResizeTensor allocates an undersized output buffer, and the
inner StridedSlice loop in Eval iterates over the un-wrapped logical output
region and writes past the allocation — a heap-buffer-overflow write whose
size and content are controlled by the model.

Fix
---
Promote dim_shape to int64_t before the subtraction so attacker int32 end /
begin values cannot wrap. After the division, bounds-check the result
against the int32 range used by TfLiteIntArray::data[] and return
kTfLiteError on overflow before ResizeTensor. Reject stride == INT32_MIN
explicitly via TF_LITE_ENSURE_MSG to avoid UB in the negate-and-divide step
below. Add parentheses to the existing `(dim_shape < 0) != (stride < 0)`
guard for clarity (the unparenthesised form is correct only by operator
precedence accident). Drop <stdint.h> in favor of <cstdint> per the style
review on PR tensorflow#115031.
This is the same family of fix as PRs tensorflow#115031 / tensorflow#115452 / tensorflow#115453 / tensorflow#115454 / tensorflow#115455 / tensorflow#115456 / tensorflow#115457 — same bug class, same downstream narrowing into TfLiteIntArray::data[], same heap-OOB-write outcome, same fix template (validate + early-return before ResizeTensor).
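The promoted computation can be sketched as below (`CheckedStridedSliceDim` is a hypothetical name for illustration; the real strided_slice.cc logic differs in structure). The subtraction happens in int64 so int32 begin/end cannot wrap, stride == INT32_MIN is rejected up front, and the result is range-checked before narrowing.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical sketch: int64 subtraction, explicit INT32_MIN stride
// rejection, and an int32 range check before the value is stored into
// TfLiteIntArray::data[].
bool CheckedStridedSliceDim(int32_t begin, int32_t end, int32_t stride,
                            int32_t* out) {
  if (stride == 0 || stride == std::numeric_limits<int32_t>::min()) {
    return false;  // INT32_MIN would make |stride| unrepresentable
  }
  const int64_t dim_shape = static_cast<int64_t>(end) - begin;
  if ((dim_shape < 0) != (stride < 0)) {  // signs differ: empty slice
    *out = 0;
    return true;
  }
  // Ceiling division with matching signs; the result is always >= 0.
  const int64_t dim =
      (dim_shape + (stride > 0 ? stride - 1 : stride + 1)) / stride;
  if (dim > std::numeric_limits<int32_t>::max()) return false;
  *out = static_cast<int32_t>(dim);
  return true;
}
```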
gather_nd::Prepare reads `indices_nd` from `indices->dims->data[
indices_rank - 1]`, which is an attacker-controlled int32 value coming
from the .tflite model's `indices` tensor shape. The existing bound
check only validated the upper bound:
if (indices_nd > params_rank) { error }
A *negative* value passes this check (negative is not > positive) and
propagates into the rest of Prepare:
// output_rank wraps to a huge positive int because subtracting a
// negative is the same as adding a positive
const int output_rank = indices_rank + params_rank - indices_nd - 1;
TfLiteIntArray* output_shape = TfLiteIntArrayCreate(output_rank);
...
for (int i = indices_nd; i < params_rank; ++i) {
output_shape->data[output_index++] = params->dims->data[i]; // OOB read
}
Two distinct memory-safety primitives result:
1. The loop iterates `for (int i = indices_nd; i < params_rank; ++i)`
with i starting at a negative int32 and ending at a small positive
value. Each iteration reads `params->dims->data[i]` for negative
i — an OOB read of params->dims->data on the order of ~2^31 entries.
2. `output_index` runs past `output_rank` and writes into
`output_shape->data[]` past the just-allocated TfLiteIntArray
storage, corrupting heap memory adjacent to the allocation.
3. The wrapped `output_rank` also produces a garbage TfLiteIntArray
shape, which `ResizeTensor` then uses to size the output tensor;
the kernel Eval path later walks `reference_ops::GatherNd` with the
un-wrapped intended sizes, producing a heap-buffer-overflow write
whose length and content are controlled by the model.
A malicious .tflite that contains a GatherNd op whose `indices` tensor
has its innermost dimension set to a negative int32 (e.g. via a crafted
flatbuffer) is enough to trigger the chain.
Fix
---
Validate `indices_nd >= 0` in addition to the existing
`indices_nd > params_rank` check, and add a defensive
`TF_LITE_ENSURE(context, output_rank >= 0)` after the subtraction so
any future regression that introduces a wrap is caught before the
TfLiteIntArrayCreate / loop combination.
Also drop <stdint.h> in favor of <cstdint> per the style review on
PR tensorflow#115031.
This is the same family of fix as PRs tensorflow#115031 / tensorflow#115452 / tensorflow#115453 /
tensorflow#115454 / tensorflow#115455 / tensorflow#115456 / tensorflow#115457 / tensorflow#115458, with the addition
that gather_nd carries a *second* OOB primitive (the loop reading
params->dims->data[negative]).
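The two validations in the fix can be sketched as a small predicate (hypothetical helper name; in gather_nd.cc the checks are inline TF_LITE_ENSURE calls in Prepare):

```cpp
// Hypothetical sketch: reject a negative innermost indices dimension
// before it can drive the loop bounds, and defensively verify the
// output-rank subtraction did not go negative.
bool ValidateGatherNdRanks(int indices_rank, int params_rank, int indices_nd,
                           int* output_rank) {
  if (indices_nd < 0 || indices_nd > params_rank) return false;
  *output_rank = indices_rank + params_rank - indices_nd - 1;
  if (*output_rank < 0) return false;  // defensive guard from the fix
  return true;
}
```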
scatter_nd::ResizeOutputTensor and sparse_to_dense::Resize both copy
the contents of an attacker-controlled `shape` / `output_shape` input
tensor directly into TfLiteIntArray::data[i] (`int`) without
validating that each value is non-negative or fits in the int32 range
used by the TfLiteIntArray storage.
scatter_nd
----------
for (int i = 0; i < shape_rank; i++) {
output_shape->data[i] = shape_data[i];
}
`shape_data` is `IndicesT*` where IndicesT can be int32 or int64. For
int64 a positive value > INT32_MAX silently narrows; negatives pass
unchecked. The wrapped per-dim value flows into ResizeTensor and the
kernel Eval (reference_ops::ScatterNd) later writes through indices
that are not bounds-checked against the wrapped output dims —
heap-buffer-overflow write whose length and content are controlled by
the model.
sparse_to_dense
---------------
for (int i = 0; i < output_dimensions; ++i) {
output_shape_array->data[i] = GetTensorData<T>(output_shape)[i];
}
Same shape, same template-T narrowing. T = int64 silently truncates;
negatives pass through. reference_ops::SparseToDense then writes at
indices computed from sparse `indices` (also unchecked) that are
nominally within the un-truncated output region — heap OOB write.
Fix
---
In both files, accumulate each dim into an int64 temporary and reject
values that are negative or exceed std::numeric_limits<int32_t>::max()
before assigning into TfLiteIntArray::data[i]. On any failure, free
the partially-built TfLiteIntArray, log a kernel error, and return
kTfLiteError before ResizeTensor.
Drop <stdint.h> in favor of <cstdint> for both files (C++ only TUs)
per the style review on PR tensorflow#115031.
This is the same family of fix as PRs tensorflow#115031 / tensorflow#115452 / tensorflow#115453 /
tensorflow#115454 / tensorflow#115455 / tensorflow#115456 / tensorflow#115457 / tensorflow#115458 / tensorflow#115459.
…ites
slice::CalculateOutputShapeVector<T> reads `begin` and `size` values
from attacker-controlled int32 / int64 input tensors. The existing
validation has three independent gaps:
1. begin[idx] is never validated to be >= 0 or <= input_dim. With a
negative begin and size_value == -1, the code computes
`size_value = input_dim - begin` which is larger than input_dim,
producing an output shape that asks the kernel Eval path to read
past the end of input. With a begin > input_dim the same
computation underflows or overshoots.
2. The `else` branch checks `input_dim < begin + size` in template-T
arithmetic, where T can be int64. With begin = INT64_MAX and
size = 1, the addition signed-overflows to INT64_MIN and the
check `input_dim < INT64_MIN` is false → bypass. The unchecked
begin then propagates to GetBeginAndSizeVectors and into
reference_ops::Slice as `op_params.begin[i]`, where it is used
to compute `input_offset + begin * stride` for the read pointer.
Result: OOB read on the input buffer.
3. The final `static_cast<int>(size_value)` silently truncates int64
to int. A large positive int64 size value becomes a small or
negative int written into the output_shape_vector and on into
ResizeTensor, producing an undersized buffer that the kernel
later overruns.
A malicious .tflite with crafted begin / size constant tensors can
therefore drive any of these into a heap-buffer-overflow read or
write, depending on which branch is taken.
Fix
---
* Validate begin in [0, input_dim] before either branch.
* Compute `begin + size` in int64 in the else branch so the
comparison cannot wrap.
* Bounds-check the final size_value against the int range used by
output_shape_vector before the static_cast.
Drop <stdint.h> in favor of <cstdint> per the style review on
PR tensorflow#115031.
This is the same family of fix as PRs tensorflow#115031, tensorflow#115452, tensorflow#115453,
tensorflow#115455, tensorflow#115456, tensorflow#115457, tensorflow#115458, tensorflow#115459, tensorflow#115460. Twelve tflite
kernels in the CheckedInt incomplete-pattern hunt now share the same
bug class.
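The three checks can be combined into one sketch (`CheckedSliceOutputDim` is a hypothetical name for illustration, not the actual slice.cc code): begin bounded to [0, input_dim], the begin/size comparison rearranged so it cannot wrap even for extreme int64 values, and the result range-checked before the narrowing cast.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical sketch of the slice fix. The overflow-prone comparison
// `input_dim < begin + size` is rewritten as `size > input_dim - begin`,
// which is safe once begin is known to be in [0, input_dim].
bool CheckedSliceOutputDim(int64_t input_dim, int64_t begin, int64_t size,
                           int* out) {
  if (input_dim < 0 || begin < 0 || begin > input_dim) return false;
  const int64_t resolved = (size == -1) ? input_dim - begin : size;
  if (resolved < 0) return false;
  if (resolved > input_dim - begin) return false;  // cannot wrap
  if (resolved > std::numeric_limits<int>::max()) return false;
  *out = static_cast<int>(resolved);
  return true;
}
```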
// Compute the output shape with overflow checks.
//
// Each output dimension is `(dim - 1) * (interior_pad + 1) + 1 +
// edge_pad_low + edge_pad_high`. All inputs are int64_t derived from the
// .tflite model parameters and tensor metadata, so a malicious model can
// drive the multiplication and additions across the int64 boundary.
// Without an overflow guard, the result is silently truncated to int32_t
// by `BuildOuputTensorDims()` below (`dims->data[i] = output_shape_[i]`)
// and the kernel later writes the un-truncated `output_size_` worth of
// bytes into the under-sized output tensor allocated from those int32
// dims, producing a heap-buffer-overflow write.
//
// CheckedInt<int64_t> tracks overflow on every operation; we abort the
// setup and surface the failure via HasOverflow() so Prepare() can return
// kTfLiteError before the output tensor is resized.
-  output_shape_[i] = (dims[i] - 1) * (interior_pad_[i] + 1) + 1 +
-                     edge_pad_low_[i] + edge_pad_high_[i];
+  const CheckedInt<int64_t> dim =
+      (CheckedInt<int64_t>(dims[i]) - 1) * (interior_pad_[i] + 1) + 1 +

Review comment: `interior_pad_[i] + 1` could also overflow.

Suggested change:
-      (CheckedInt<int64_t>(dims[i]) - 1) * (interior_pad_[i] + 1) + 1 +
+      (CheckedInt<int64_t>(dims[i]) - 1) * (CheckedInt(interior_pad_[i]) + 1) + 1 +
+  if (dim_size.Overflow()) {
+    overflow_ = true;
+    return;
+  }

Review comment: No need to check on every loop iteration.
-  output_strides_[rank - 1] = element_size * (interior_pad_[rank - 1] + 1);
+  {
+    const CheckedInt<int64_t> stride =
+        CheckedInt<int64_t>(element_size) * (interior_pad_[rank - 1] + 1);

Review comment: `interior_pad_[rank - 1] + 1` may also overflow. `element_size` is already an `int64_t`, is there a need to specify it in the `CheckedInt` instantiation?
-  output_strides_[i] = output_dimension_sizes_[i] * (interior_pad_[i] + 1);
+  const CheckedInt<int64_t> stride =
+      CheckedInt<int64_t>(output_dimension_sizes_[i]) *
+      (interior_pad_[i] + 1);

Review comment: `interior_pad_[i] + 1` could overflow.
+  if (offset.Overflow()) {
+    overflow_ = true;
+    return;
+  }

Review comment: No need to check on every loop iteration.
+  {
+    CheckedInt<int64_t> offset(int64_t{0});
+    for (int i = 0; i < rank; ++i) {
+      offset = offset + CheckedInt<int64_t>(std::max<int64_t>(

Review comment:
- Use the `+=` operator.
- Is there a need to specify `int64_t`? `std::max<int64_t>()` will return an `int64_t` and it should be deduced.

Suggested change:
-      offset = offset + CheckedInt<int64_t>(std::max<int64_t>(
+      offset += CheckedInt(std::max<int64_t>(
+  // Reject the model if any of the int64 shape/stride/offset/size
+  // computations in PadData::Setup overflowed. Without this check the
+  // wrapped int64 values would be silently truncated by
+  // BuildOuputTensorDims() (assigning int64 -> int) and the kernel would
+  // later write the un-truncated `output_size_` worth of bytes into a
+  // smaller-than-expected output tensor — a heap-buffer-overflow write.

Review comment: That comment is superfluous and is already explained by the log message. Remove it.
…edInt propagation)

Per qukhan review on PR tensorflow#115452:
- Drop the verbose 'with overflow checks' comment block in Setup() and the
  matching rationale in Prepare(); they duplicate the commit message.
- Wrap '(interior_pad_[i] + 1)' in CheckedInt at every site (output shape,
  rank-1 stride, per-dim stride) — the '+ 1' was itself unguarded.
- Drop redundant <int64_t> template args where CTAD deduces int64_t
  (element_size, output_shape_[i+1], output_dimension_sizes_[i],
  std::max<int64_t>(...), dims[i]).
- Pull the overflow checks out of the per-dim loops. CheckedInt propagates
  the overflow flag through arithmetic, so we accumulate with
  'overflow_ |= x.Overflow();' inside each loop and check once after the
  related blocks (output shape; dim-sizes + both stride loops; offset;
  final size; input strides).
- Use 'offset +=' instead of 'offset = offset + ...' in the output-offset
  loop.
- Rewrite the final-output-size loop as std::accumulate with
  std::multiplies<>() seeded by CheckedInt(element_size).

Verified: //tensorflow/lite/kernels:stablehlo_pad_test PASSED.
…ation

Imported from GitHub PR tensorflow/tensorflow#115452

## Summary

`PadData::Setup()` in `tensorflow/lite/kernels/stablehlo_pad.cc` computes the output tensor's shape, strides, offsets and total size as `int64_t` arithmetic on parameters that come directly from the `.tflite` model: the input dimensions and the per-dim `interior_padding`, `edge_padding_low`, `edge_padding_high` arrays. **None of those multiplications and additions are checked for overflow.**

## The chain

The result is then used in two places:

1. **`BuildOuputTensorDims()`** copies `output_shape_[i]` (int64_t) into `TfLiteIntArray::data[i]` (`int`) — a silent narrowing that can turn a wrapped negative int64 into a small positive int32.

```cpp
TfLiteIntArray* BuildOuputTensorDims() const {
  TfLiteIntArray* dims = TfLiteIntArrayCreate(rank_);
  for (int64_t i = 0; i < rank_; ++i) {
    dims->data[i] = output_shape_[i];  // <-- int64 -> int silent narrowing
  }
  return dims;
}
```

2. **`ResizeTensor`** allocates the output buffer using those (potentially tiny) int32 dims. **`Apply()`** then writes `output_size_` bytes (the *un-truncated* int64 product) into that tensor via `FillBuffer` + `StridedCopy`, producing a **heap-buffer-overflow write** whose length and content are controlled by the model.

```cpp
void Apply(const char* input, const char* padding_value, char* output) const {
  FillBuffer(output, output_size_, padding_value, element_size_);  // <-- writes int64 size
  StridedCopy(rank_, input + input_offset_, input_shape_, input_strides_,
              output + output_offset_, output_strides_, element_size_, 0);
}
```

A malicious `.tflite` that contains a `stablehlo.pad` op with crafted `interior_padding` / `edge_padding` values can therefore corrupt heap memory in any TFLite consumer (mobile apps, edge devices, server-side inference) that loads the model.

## Vulnerable arithmetic (master HEAD)

```cpp
// Compute the output shape.
for (int i = 0; i < rank; ++i) {
  output_shape_[i] = (dims[i] - 1) * (interior_pad_[i] + 1) + 1 +
                     edge_pad_low_[i] + edge_pad_high_[i];  // unchecked int64 math
}
...
output_dimension_sizes_[i] =
    output_shape_[i + 1] * output_dimension_sizes_[i + 1];  // unchecked
...
output_strides_[rank - 1] = element_size * (interior_pad_[rank - 1] + 1);  // unchecked
output_strides_[i] = output_dimension_sizes_[i] * (interior_pad_[i] + 1);
...
output_offset_ += std::max<int64_t>(edge_pad_low_[i], 0) *
                  output_dimension_sizes_[i];  // unchecked accumulation
...
output_size_ = std::accumulate(output_shape_, output_shape_ + rank,
                               element_size, std::multiplies<>());  // unchecked accumulation
...
input_strides_[i - 1] = dims[i] * input_strides_[i];  // unchecked
```

## Fix

Route every `int64` multiplication and addition in `PadData::Setup` through the new `CheckedInt<int64_t>` helper template added in commit [1908dd32](tensorflow/tensorflow@1908dd3) ("Add a helper template to compute simple expressions and check that no overflow has happened."). On any overflow, set a new `overflow_` flag and bail out of `Setup()` early; expose the flag via a `HasOverflow()` accessor. `Prepare()` now consults `HasOverflow()` between the `Setup()` call and the `ResizeTensor` call, logs a kernel error and returns `kTfLiteError` if any of the shape/stride/offset/size computations wrapped. This kills the chain before the under-sized output tensor can be allocated.

```diff
+  if (pad_data.HasOverflow()) {
+    TF_LITE_KERNEL_LOG(context,
+                       "stablehlo.pad parameters cause integer overflow in "
+                       "output shape computation");
+    return kTfLiteError;
+  }
```

## Relationship to PR #115031

This is the **same shape of fix** that PR [#115031](tensorflow/tensorflow#115031) applies to `stablehlo_reduce_window.cc`.
Both kernels share: - the same bug class (unchecked int64 arithmetic on attacker-controlled shape/dilation/padding values) - the same downstream int32 narrowing path (`BuildTfLiteArray<int32_t>` / `dims->data[i] = ...`) - the same heap-OOB-write outcome - the same fix template (`CheckedInt<int64_t>` + `HasOverflow()` + `Prepare()` early-return) I went looking for siblings of the `reduce_window` bug while addressing the review feedback on #115031, and `stablehlo_pad` was the first hit. The patch is contained to `stablehlo_pad.cc` and does not introduce any new helpers — `CheckedInt` is reused as-is from `tensorflow/lite/util.h`. ## Test plan - [x] No public API change. - [x] No new dependencies — `CheckedInt<int64_t>` already exists in `tensorflow/lite/util.h` after 1908dd32. - [x] Existing `stablehlo_pad_test` tests pass against the patched kernel (the new code path is only reached when overflow would have occurred, which existing tests do not exercise). - [ ] Happy to add regression tests covering overflow cases (`interior_padding` near INT64_MAX, `edge_padding_high` near INT64_MAX) on request. ## Files changed | File | Lines | |------|-------| | `tensorflow/lite/kernels/stablehlo_pad.cc` | +94 / -13 | Copybara import of the project: -- 036fe919c89880728ad11d0d5399958a7013e640 by mohammadmseet-hue <mohammadmseet@gmail.com>: Fix integer overflow in stablehlo.pad output shape computation PadData::Setup() in tensorflow/lite/kernels/stablehlo_pad.cc computes the output tensor's shape, strides, offsets and total size as int64_t arithmetic on parameters that come directly from the .tflite model: the input dimensions and the per-dim `interior_padding`, `edge_padding_low`, `edge_padding_high` arrays. None of those multiplications and additions are checked for overflow. The result is then used in two places: 1. 
BuildOuputTensorDims() copies output_shape_[i] (int64_t) into TfLiteIntArray::data[i] (int) — a silent narrowing that can turn a wrapped negative int64 into a small positive int32. 2. ResizeTensor allocates the output buffer using those (potentially tiny) int32 dims.

Apply() then writes output_size_ bytes (the un-truncated int64 product) into that tensor via FillBuffer + StridedCopy, producing a heap-buffer-overflow write whose length and content are controlled by the model. A malicious .tflite that contains a stablehlo.pad op with crafted interior_padding / edge_padding values can therefore corrupt heap memory in any TFLite consumer (mobile apps, edge devices, server-side inference) that loads the model.

Fix
---

Route every int64 multiplication and addition in PadData::Setup through the new CheckedInt<int64_t> helper template added in commit 1908dd32 ("Add a helper template to compute simple expressions and check that no overflow has happened."). On any overflow, set a new `overflow_` flag and bail out of Setup() early; expose the flag via a `HasOverflow()` accessor. Prepare() now consults HasOverflow() between the Setup() call and the ResizeTensor call, logs a kernel error and returns kTfLiteError if any of the shape/stride/offset/size computations wrapped. This kills the chain before the under-sized output tensor can be allocated.

This is the same shape of fix that PR #115031 applies to stablehlo_reduce_window.cc; both kernels share the same bug class and the same downstream int32 narrowing path. The patch is contained to stablehlo_pad.cc and does not introduce any new helpers — CheckedInt is reused as-is from tensorflow/lite/util.h.
--
f8281275ef8bb103e23ce42aa9b36b2047953099 by mohammadmseet-hue <mohammadmseet@gmail.com>:

stablehlo_pad: address review feedback (single overflow check + CheckedInt propagation)

Per qukhan review on PR #115452:

- Drop the verbose "with overflow checks" comment block in Setup() and the matching rationale in Prepare(); they duplicate the commit message.
- Wrap `(interior_pad_[i] + 1)` in CheckedInt at every site (output shape, rank-1 stride, per-dim stride) — the `+ 1` was itself unguarded.
- Drop redundant `<int64_t>` template args where CTAD deduces int64_t (`element_size`, `output_shape_[i+1]`, `output_dimension_sizes_[i]`, `std::max<int64_t>(...)`, `dims[i]`).
- Pull the overflow checks out of the per-dim loops. CheckedInt propagates the overflow flag through arithmetic, so we accumulate with `overflow_ |= x.Overflow();` inside each loop and check once after the related blocks (output shape; dim-sizes + both stride loops; offset; final size; input strides).
- Use `offset +=` instead of `offset = offset + ...` in the output-offset loop.
- Rewrite the final-output-size loop as `std::accumulate` with `std::multiplies<>()` seeded by `CheckedInt(element_size)`.

Verified: `//tensorflow/lite/kernels:stablehlo_pad_test` PASSED.

Merging this change closes #115452

FUTURE_COPYBARA_INTEGRATE_REVIEW=tensorflow/tensorflow#115452 from mohammadmseet-hue:fix-stablehlo-pad-overflow f8281275ef8bb103e23ce42aa9b36b2047953099
PiperOrigin-RevId: 897092651
This PR fixed some of the problems, but it seems to still miss some, like the crop/input-offset arithmetic in `PadData::Setup()`. I have a similar change that is waiting on google-ai-edge/LiteRT#6763 to be merged first; then I can submit the PR.
…ation Imported from GitHub PR tensorflow/tensorflow#115452 ## Summary `PadData::Setup()` in `tensorflow/lite/kernels/stablehlo_pad.cc` computes the output tensor's shape, strides, offsets and total size as `int64_t` arithmetic on parameters that come directly from the `.tflite` model: the input dimensions and the per-dim `interior_padding`, `edge_padding_low`, `edge_padding_high` arrays. **None of those multiplications and additions are checked for overflow.** ## The chain The result is then used in two places: 1. **`BuildOuputTensorDims()`** copies `output_shape_[i]` (int64_t) into `TfLiteIntArray::data[i]` (`int`) — a silent narrowing that can turn a wrapped negative int64 into a small positive int32. ```cpp TfLiteIntArray* BuildOuputTensorDims() const { TfLiteIntArray* dims = TfLiteIntArrayCreate(rank_); for (int64_t i = 0; i < rank_; ++i) { dims->data[i] = output_shape_[i]; // <-- int64 -> int silent narrowing } return dims; } ``` 2. **`ResizeTensor`** allocates the output buffer using those (potentially tiny) int32 dims. **`Apply()`** then writes `output_size_` bytes (the *un-truncated* int64 product) into that tensor via `FillBuffer` + `StridedCopy`, producing a **heap-buffer-overflow write** whose length and content are controlled by the model. ```cpp void Apply(const char* input, const char* padding_value, char* output) const { FillBuffer(output, output_size_, padding_value, element_size_); // <-- writes int64 size StridedCopy(rank_, input + input_offset_, input_shape_, input_strides_, output + output_offset_, output_strides_, element_size_, 0); } ``` A malicious `.tflite` that contains a `stablehlo.pad` op with crafted `interior_padding` / `edge_padding` values can therefore corrupt heap memory in any TFLite consumer (mobile apps, edge devices, server-side inference) that loads the model. ## Vulnerable arithmetic (master HEAD) ```cpp // Compute the output shape. 
for (int i = 0; i < rank; ++i) { output_shape_[i] = (dims[i] - 1) * (interior_pad_[i] + 1) + 1 + edge_pad_low_[i] + edge_pad_high_[i]; // unchecked int64 math } ... output_dimension_sizes_[i] = output_shape_[i + 1] * output_dimension_sizes_[i + 1]; // unchecked ... output_strides_[rank - 1] = element_size * (interior_pad_[rank - 1] + 1); // unchecked output_strides_[i] = output_dimension_sizes_[i] * (interior_pad_[i] + 1); ... output_offset_ += std::max<int64_t>(edge_pad_low_[i], 0) * output_dimension_sizes_[i]; // unchecked accumulation ... output_size_ = std::accumulate(output_shape_, output_shape_ + rank, element_size, std::multiplies<>()); // unchecked accumulation ... input_strides_[i - 1] = dims[i] * input_strides_[i]; // unchecked ``` ## Fix Route every `int64` multiplication and addition in `PadData::Setup` through the new `CheckedInt<int64_t>` helper template added in commit [1908dd32](tensorflow/tensorflow@1908dd3) ("Add a helper template to compute simple expressions and check that no overflow has happened."). On any overflow, set a new `overflow_` flag and bail out of `Setup()` early; expose the flag via a `HasOverflow()` accessor. `Prepare()` now consults `HasOverflow()` between the `Setup()` call and the `ResizeTensor` call, logs a kernel error and returns `kTfLiteError` if any of the shape/stride/offset/size computations wrapped. This kills the chain before the under-sized output tensor can be allocated. ```diff + if (pad_data.HasOverflow()) { + TF_LITE_KERNEL_LOG(context, + "stablehlo.pad parameters cause integer overflow in " + "output shape computation"); + return kTfLiteError; + } ``` ## Relationship to PR #115031 This is the **same shape of fix** that PR [#115031](tensorflow/tensorflow#115031) applies to `stablehlo_reduce_window.cc`. 
Both kernels share: - the same bug class (unchecked int64 arithmetic on attacker-controlled shape/dilation/padding values) - the same downstream int32 narrowing path (`BuildTfLiteArray<int32_t>` / `dims->data[i] = ...`) - the same heap-OOB-write outcome - the same fix template (`CheckedInt<int64_t>` + `HasOverflow()` + `Prepare()` early-return) I went looking for siblings of the `reduce_window` bug while addressing the review feedback on #115031, and `stablehlo_pad` was the first hit. The patch is contained to `stablehlo_pad.cc` and does not introduce any new helpers — `CheckedInt` is reused as-is from `tensorflow/lite/util.h`. ## Test plan - [x] No public API change. - [x] No new dependencies — `CheckedInt<int64_t>` already exists in `tensorflow/lite/util.h` after 1908dd32. - [x] Existing `stablehlo_pad_test` tests pass against the patched kernel (the new code path is only reached when overflow would have occurred, which existing tests do not exercise). - [ ] Happy to add regression tests covering overflow cases (`interior_padding` near INT64_MAX, `edge_padding_high` near INT64_MAX) on request. ## Files changed | File | Lines | |------|-------| | `tensorflow/lite/kernels/stablehlo_pad.cc` | +94 / -13 | Copybara import of the project: -- 036fe919c89880728ad11d0d5399958a7013e640 by mohammadmseet-hue <mohammadmseet@gmail.com>: Fix integer overflow in stablehlo.pad output shape computation PadData::Setup() in tensorflow/lite/kernels/stablehlo_pad.cc computes the output tensor's shape, strides, offsets and total size as int64_t arithmetic on parameters that come directly from the .tflite model: the input dimensions and the per-dim `interior_padding`, `edge_padding_low`, `edge_padding_high` arrays. None of those multiplications and additions are checked for overflow. The result is then used in two places: 1. 
BuildOuputTensorDims() copies output_shape_[i] (int64_t) into TfLiteIntArray::data[i] (int) — a silent narrowing that can turn a wrapped negative int64 into a small positive int32. 2. ResizeTensor allocates the output buffer using those (potentially tiny) int32 dims. Apply() then writes output_size_ bytes (the un-truncated int64 product) into that tensor via FillBuffer + StridedCopy, producing a heap-buffer-overflow write whose length and content are controlled by the model. A malicious .tflite that contains a stablehlo.pad op with crafted interior_padding / edge_padding values can therefore corrupt heap memory in any TFLite consumer (mobile apps, edge devices, server-side inference) that loads the model. Fix --- Route every int64 multiplication and addition in PadData::Setup through the new CheckedInt<int64_t> helper template added in commit 1908dd32 ("Add a helper template to compute simple expressions and check that no overflow has happened."). On any overflow, set a new `overflow_` flag and bail out of Setup() early; expose the flag via a `HasOverflow()` accessor. Prepare() now consults HasOverflow() between the Setup() call and the ResizeTensor call, logs a kernel error and returns kTfLiteError if any of the shape/stride/offset/size computations wrapped. This kills the chain before the under-sized output tensor can be allocated. This is the same shape of fix that PR #115031 applies to stablehlo_reduce_window.cc; both kernels share the same bug class and the same downstream int32 narrowing path. The patch is contained to stablehlo_pad.cc and does not introduce any new helpers — CheckedInt is reused as-is from tensorflow/lite/util.h. 
-- f8281275ef8bb103e23ce42aa9b36b2047953099 by mohammadmseet-hue <mohammadmseet@gmail.com>: stablehlo_pad: address review feedback (single overflow check + CheckedInt propagation) Per qukhan review on PR #115452: - Drop the verbose 'with overflow checks' comment block in Setup() and the matching rationale in Prepare(); they duplicate the commit message. - Wrap '(interior_pad_[i] + 1)' in CheckedInt at every site (output shape, rank-1 stride, per-dim stride) — the '+ 1' was itself unguarded. - Drop redundant <int64_t> template args where CTAD deduces int64_t (element_size, output_shape_[i+1], output_dimension_sizes_[i], std::max<int64_t>(...), dims[i]). - Pull the overflow checks out of the per-dim loops. CheckedInt propagates the overflow flag through arithmetic, so we accumulate with 'overflow_ |= x.Overflow();' inside each loop and check once after the related blocks (output shape; dim-sizes + both stride loops; offset; final size; input strides). - Use 'offset +=' instead of 'offset = offset + ...' in the output-offset loop. - Rewrite the final-output-size loop as std::accumulate with std::multiplies<>() seeded by CheckedInt(element_size). Verified: //tensorflow/lite/kernels:stablehlo_pad_test PASSED. Merging this change closes #115452 FUTURE_COPYBARA_INTEGRATE_REVIEW=tensorflow/tensorflow#115452 from mohammadmseet-hue:fix-stablehlo-pad-overflow f8281275ef8bb103e23ce42aa9b36b2047953099 PiperOrigin-RevId: 897092651
…ation Imported from GitHub PR tensorflow/tensorflow#115452 ## Summary `PadData::Setup()` in `tensorflow/lite/kernels/stablehlo_pad.cc` computes the output tensor's shape, strides, offsets and total size as `int64_t` arithmetic on parameters that come directly from the `.tflite` model: the input dimensions and the per-dim `interior_padding`, `edge_padding_low`, `edge_padding_high` arrays. **None of those multiplications and additions are checked for overflow.** ## The chain The result is then used in two places: 1. **`BuildOuputTensorDims()`** copies `output_shape_[i]` (int64_t) into `TfLiteIntArray::data[i]` (`int`) — a silent narrowing that can turn a wrapped negative int64 into a small positive int32. ```cpp TfLiteIntArray* BuildOuputTensorDims() const { TfLiteIntArray* dims = TfLiteIntArrayCreate(rank_); for (int64_t i = 0; i < rank_; ++i) { dims->data[i] = output_shape_[i]; // <-- int64 -> int silent narrowing } return dims; } ``` 2. **`ResizeTensor`** allocates the output buffer using those (potentially tiny) int32 dims. **`Apply()`** then writes `output_size_` bytes (the *un-truncated* int64 product) into that tensor via `FillBuffer` + `StridedCopy`, producing a **heap-buffer-overflow write** whose length and content are controlled by the model. ```cpp void Apply(const char* input, const char* padding_value, char* output) const { FillBuffer(output, output_size_, padding_value, element_size_); // <-- writes int64 size StridedCopy(rank_, input + input_offset_, input_shape_, input_strides_, output + output_offset_, output_strides_, element_size_, 0); } ``` A malicious `.tflite` that contains a `stablehlo.pad` op with crafted `interior_padding` / `edge_padding` values can therefore corrupt heap memory in any TFLite consumer (mobile apps, edge devices, server-side inference) that loads the model. ## Vulnerable arithmetic (master HEAD) ```cpp // Compute the output shape. 
for (int i = 0; i < rank; ++i) { output_shape_[i] = (dims[i] - 1) * (interior_pad_[i] + 1) + 1 + edge_pad_low_[i] + edge_pad_high_[i]; // unchecked int64 math } ... output_dimension_sizes_[i] = output_shape_[i + 1] * output_dimension_sizes_[i + 1]; // unchecked ... output_strides_[rank - 1] = element_size * (interior_pad_[rank - 1] + 1); // unchecked output_strides_[i] = output_dimension_sizes_[i] * (interior_pad_[i] + 1); ... output_offset_ += std::max<int64_t>(edge_pad_low_[i], 0) * output_dimension_sizes_[i]; // unchecked accumulation ... output_size_ = std::accumulate(output_shape_, output_shape_ + rank, element_size, std::multiplies<>()); // unchecked accumulation ... input_strides_[i - 1] = dims[i] * input_strides_[i]; // unchecked ``` ## Fix Route every `int64` multiplication and addition in `PadData::Setup` through the new `CheckedInt<int64_t>` helper template added in commit [1908dd32](tensorflow/tensorflow@1908dd3) ("Add a helper template to compute simple expressions and check that no overflow has happened."). On any overflow, set a new `overflow_` flag and bail out of `Setup()` early; expose the flag via a `HasOverflow()` accessor. `Prepare()` now consults `HasOverflow()` between the `Setup()` call and the `ResizeTensor` call, logs a kernel error and returns `kTfLiteError` if any of the shape/stride/offset/size computations wrapped. This kills the chain before the under-sized output tensor can be allocated. ```diff + if (pad_data.HasOverflow()) { + TF_LITE_KERNEL_LOG(context, + "stablehlo.pad parameters cause integer overflow in " + "output shape computation"); + return kTfLiteError; + } ``` ## Relationship to PR #115031 This is the **same shape of fix** that PR [#115031](tensorflow/tensorflow#115031) applies to `stablehlo_reduce_window.cc`. 
Both kernels share: - the same bug class (unchecked int64 arithmetic on attacker-controlled shape/dilation/padding values) - the same downstream int32 narrowing path (`BuildTfLiteArray<int32_t>` / `dims->data[i] = ...`) - the same heap-OOB-write outcome - the same fix template (`CheckedInt<int64_t>` + `HasOverflow()` + `Prepare()` early-return) I went looking for siblings of the `reduce_window` bug while addressing the review feedback on #115031, and `stablehlo_pad` was the first hit. The patch is contained to `stablehlo_pad.cc` and does not introduce any new helpers — `CheckedInt` is reused as-is from `tensorflow/lite/util.h`. ## Test plan - [x] No public API change. - [x] No new dependencies — `CheckedInt<int64_t>` already exists in `tensorflow/lite/util.h` after 1908dd32. - [x] Existing `stablehlo_pad_test` tests pass against the patched kernel (the new code path is only reached when overflow would have occurred, which existing tests do not exercise). - [ ] Happy to add regression tests covering overflow cases (`interior_padding` near INT64_MAX, `edge_padding_high` near INT64_MAX) on request. ## Files changed | File | Lines | |------|-------| | `tensorflow/lite/kernels/stablehlo_pad.cc` | +94 / -13 | Copybara import of the project: -- 036fe919c89880728ad11d0d5399958a7013e640 by mohammadmseet-hue <mohammadmseet@gmail.com>: Fix integer overflow in stablehlo.pad output shape computation PadData::Setup() in tensorflow/lite/kernels/stablehlo_pad.cc computes the output tensor's shape, strides, offsets and total size as int64_t arithmetic on parameters that come directly from the .tflite model: the input dimensions and the per-dim `interior_padding`, `edge_padding_low`, `edge_padding_high` arrays. None of those multiplications and additions are checked for overflow. The result is then used in two places: 1. 
BuildOuputTensorDims() copies output_shape_[i] (int64_t) into TfLiteIntArray::data[i] (int) — a silent narrowing that can turn a wrapped negative int64 into a small positive int32. 2. ResizeTensor allocates the output buffer using those (potentially tiny) int32 dims. Apply() then writes output_size_ bytes (the un-truncated int64 product) into that tensor via FillBuffer + StridedCopy, producing a heap-buffer-overflow write whose length and content are controlled by the model. A malicious .tflite that contains a stablehlo.pad op with crafted interior_padding / edge_padding values can therefore corrupt heap memory in any TFLite consumer (mobile apps, edge devices, server-side inference) that loads the model. Fix --- Route every int64 multiplication and addition in PadData::Setup through the new CheckedInt<int64_t> helper template added in commit 1908dd32 ("Add a helper template to compute simple expressions and check that no overflow has happened."). On any overflow, set a new `overflow_` flag and bail out of Setup() early; expose the flag via a `HasOverflow()` accessor. Prepare() now consults HasOverflow() between the Setup() call and the ResizeTensor call, logs a kernel error and returns kTfLiteError if any of the shape/stride/offset/size computations wrapped. This kills the chain before the under-sized output tensor can be allocated. This is the same shape of fix that PR #115031 applies to stablehlo_reduce_window.cc; both kernels share the same bug class and the same downstream int32 narrowing path. The patch is contained to stablehlo_pad.cc and does not introduce any new helpers — CheckedInt is reused as-is from tensorflow/lite/util.h. 
-- f8281275ef8bb103e23ce42aa9b36b2047953099 by mohammadmseet-hue <mohammadmseet@gmail.com>: stablehlo_pad: address review feedback (single overflow check + CheckedInt propagation) Per qukhan review on PR #115452: - Drop the verbose 'with overflow checks' comment block in Setup() and the matching rationale in Prepare(); they duplicate the commit message. - Wrap '(interior_pad_[i] + 1)' in CheckedInt at every site (output shape, rank-1 stride, per-dim stride) — the '+ 1' was itself unguarded. - Drop redundant <int64_t> template args where CTAD deduces int64_t (element_size, output_shape_[i+1], output_dimension_sizes_[i], std::max<int64_t>(...), dims[i]). - Pull the overflow checks out of the per-dim loops. CheckedInt propagates the overflow flag through arithmetic, so we accumulate with 'overflow_ |= x.Overflow();' inside each loop and check once after the related blocks (output shape; dim-sizes + both stride loops; offset; final size; input strides). - Use 'offset +=' instead of 'offset = offset + ...' in the output-offset loop. - Rewrite the final-output-size loop as std::accumulate with std::multiplies<>() seeded by CheckedInt(element_size). Verified: //tensorflow/lite/kernels:stablehlo_pad_test PASSED. Merging this change closes #115452 FUTURE_COPYBARA_INTEGRATE_REVIEW=tensorflow/tensorflow#115452 from mohammadmseet-hue:fix-stablehlo-pad-overflow f8281275ef8bb103e23ce42aa9b36b2047953099 PiperOrigin-RevId: 897092651
…ation Imported from GitHub PR tensorflow/tensorflow#115452 ## Summary `PadData::Setup()` in `tensorflow/lite/kernels/stablehlo_pad.cc` computes the output tensor's shape, strides, offsets and total size as `int64_t` arithmetic on parameters that come directly from the `.tflite` model: the input dimensions and the per-dim `interior_padding`, `edge_padding_low`, `edge_padding_high` arrays. **None of those multiplications and additions are checked for overflow.** ## The chain The result is then used in two places: 1. **`BuildOuputTensorDims()`** copies `output_shape_[i]` (int64_t) into `TfLiteIntArray::data[i]` (`int`) — a silent narrowing that can turn a wrapped negative int64 into a small positive int32. ```cpp TfLiteIntArray* BuildOuputTensorDims() const { TfLiteIntArray* dims = TfLiteIntArrayCreate(rank_); for (int64_t i = 0; i < rank_; ++i) { dims->data[i] = output_shape_[i]; // <-- int64 -> int silent narrowing } return dims; } ``` 2. **`ResizeTensor`** allocates the output buffer using those (potentially tiny) int32 dims. **`Apply()`** then writes `output_size_` bytes (the *un-truncated* int64 product) into that tensor via `FillBuffer` + `StridedCopy`, producing a **heap-buffer-overflow write** whose length and content are controlled by the model. ```cpp void Apply(const char* input, const char* padding_value, char* output) const { FillBuffer(output, output_size_, padding_value, element_size_); // <-- writes int64 size StridedCopy(rank_, input + input_offset_, input_shape_, input_strides_, output + output_offset_, output_strides_, element_size_, 0); } ``` A malicious `.tflite` that contains a `stablehlo.pad` op with crafted `interior_padding` / `edge_padding` values can therefore corrupt heap memory in any TFLite consumer (mobile apps, edge devices, server-side inference) that loads the model. ## Vulnerable arithmetic (master HEAD) ```cpp // Compute the output shape. 
for (int i = 0; i < rank; ++i) { output_shape_[i] = (dims[i] - 1) * (interior_pad_[i] + 1) + 1 + edge_pad_low_[i] + edge_pad_high_[i]; // unchecked int64 math } ... output_dimension_sizes_[i] = output_shape_[i + 1] * output_dimension_sizes_[i + 1]; // unchecked ... output_strides_[rank - 1] = element_size * (interior_pad_[rank - 1] + 1); // unchecked output_strides_[i] = output_dimension_sizes_[i] * (interior_pad_[i] + 1); ... output_offset_ += std::max<int64_t>(edge_pad_low_[i], 0) * output_dimension_sizes_[i]; // unchecked accumulation ... output_size_ = std::accumulate(output_shape_, output_shape_ + rank, element_size, std::multiplies<>()); // unchecked accumulation ... input_strides_[i - 1] = dims[i] * input_strides_[i]; // unchecked ``` ## Fix Route every `int64` multiplication and addition in `PadData::Setup` through the new `CheckedInt<int64_t>` helper template added in commit [1908dd32](tensorflow/tensorflow@1908dd3) ("Add a helper template to compute simple expressions and check that no overflow has happened."). On any overflow, set a new `overflow_` flag and bail out of `Setup()` early; expose the flag via a `HasOverflow()` accessor. `Prepare()` now consults `HasOverflow()` between the `Setup()` call and the `ResizeTensor` call, logs a kernel error and returns `kTfLiteError` if any of the shape/stride/offset/size computations wrapped. This kills the chain before the under-sized output tensor can be allocated. ```diff + if (pad_data.HasOverflow()) { + TF_LITE_KERNEL_LOG(context, + "stablehlo.pad parameters cause integer overflow in " + "output shape computation"); + return kTfLiteError; + } ``` ## Relationship to PR #115031 This is the **same shape of fix** that PR [#115031](tensorflow/tensorflow#115031) applies to `stablehlo_reduce_window.cc`. 
Both kernels share: - the same bug class (unchecked int64 arithmetic on attacker-controlled shape/dilation/padding values) - the same downstream int32 narrowing path (`BuildTfLiteArray<int32_t>` / `dims->data[i] = ...`) - the same heap-OOB-write outcome - the same fix template (`CheckedInt<int64_t>` + `HasOverflow()` + `Prepare()` early-return) I went looking for siblings of the `reduce_window` bug while addressing the review feedback on #115031, and `stablehlo_pad` was the first hit. The patch is contained to `stablehlo_pad.cc` and does not introduce any new helpers — `CheckedInt` is reused as-is from `tensorflow/lite/util.h`. ## Test plan - [x] No public API change. - [x] No new dependencies — `CheckedInt<int64_t>` already exists in `tensorflow/lite/util.h` after 1908dd32. - [x] Existing `stablehlo_pad_test` tests pass against the patched kernel (the new code path is only reached when overflow would have occurred, which existing tests do not exercise). - [ ] Happy to add regression tests covering overflow cases (`interior_padding` near INT64_MAX, `edge_padding_high` near INT64_MAX) on request. ## Files changed | File | Lines | |------|-------| | `tensorflow/lite/kernels/stablehlo_pad.cc` | +94 / -13 | Copybara import of the project: -- 036fe919c89880728ad11d0d5399958a7013e640 by mohammadmseet-hue <mohammadmseet@gmail.com>: Fix integer overflow in stablehlo.pad output shape computation PadData::Setup() in tensorflow/lite/kernels/stablehlo_pad.cc computes the output tensor's shape, strides, offsets and total size as int64_t arithmetic on parameters that come directly from the .tflite model: the input dimensions and the per-dim `interior_padding`, `edge_padding_low`, `edge_padding_high` arrays. None of those multiplications and additions are checked for overflow. The result is then used in two places: 1. 
BuildOuputTensorDims() copies output_shape_[i] (int64_t) into TfLiteIntArray::data[i] (int) — a silent narrowing that can turn a wrapped negative int64 into a small positive int32. 2. ResizeTensor allocates the output buffer using those (potentially tiny) int32 dims. Apply() then writes output_size_ bytes (the un-truncated int64 product) into that tensor via FillBuffer + StridedCopy, producing a heap-buffer-overflow write whose length and content are controlled by the model. A malicious .tflite that contains a stablehlo.pad op with crafted interior_padding / edge_padding values can therefore corrupt heap memory in any TFLite consumer (mobile apps, edge devices, server-side inference) that loads the model. Fix --- Route every int64 multiplication and addition in PadData::Setup through the new CheckedInt<int64_t> helper template added in commit 1908dd32 ("Add a helper template to compute simple expressions and check that no overflow has happened."). On any overflow, set a new `overflow_` flag and bail out of Setup() early; expose the flag via a `HasOverflow()` accessor. Prepare() now consults HasOverflow() between the Setup() call and the ResizeTensor call, logs a kernel error and returns kTfLiteError if any of the shape/stride/offset/size computations wrapped. This kills the chain before the under-sized output tensor can be allocated. This is the same shape of fix that PR #115031 applies to stablehlo_reduce_window.cc; both kernels share the same bug class and the same downstream int32 narrowing path. The patch is contained to stablehlo_pad.cc and does not introduce any new helpers — CheckedInt is reused as-is from tensorflow/lite/util.h. 
-- f8281275ef8bb103e23ce42aa9b36b2047953099 by mohammadmseet-hue <mohammadmseet@gmail.com>: stablehlo_pad: address review feedback (single overflow check + CheckedInt propagation) Per qukhan review on PR #115452: - Drop the verbose 'with overflow checks' comment block in Setup() and the matching rationale in Prepare(); they duplicate the commit message. - Wrap '(interior_pad_[i] + 1)' in CheckedInt at every site (output shape, rank-1 stride, per-dim stride) — the '+ 1' was itself unguarded. - Drop redundant <int64_t> template args where CTAD deduces int64_t (element_size, output_shape_[i+1], output_dimension_sizes_[i], std::max<int64_t>(...), dims[i]). - Pull the overflow checks out of the per-dim loops. CheckedInt propagates the overflow flag through arithmetic, so we accumulate with 'overflow_ |= x.Overflow();' inside each loop and check once after the related blocks (output shape; dim-sizes + both stride loops; offset; final size; input strides). - Use 'offset +=' instead of 'offset = offset + ...' in the output-offset loop. - Rewrite the final-output-size loop as std::accumulate with std::multiplies<>() seeded by CheckedInt(element_size). Verified: //tensorflow/lite/kernels:stablehlo_pad_test PASSED. Merging this change closes #115452 FUTURE_COPYBARA_INTEGRATE_REVIEW=tensorflow/tensorflow#115452 from mohammadmseet-hue:fix-stablehlo-pad-overflow f8281275ef8bb103e23ce42aa9b36b2047953099 PiperOrigin-RevId: 897092651
…ation Imported from GitHub PR tensorflow/tensorflow#115452 ## Summary `PadData::Setup()` in `tensorflow/lite/kernels/stablehlo_pad.cc` computes the output tensor's shape, strides, offsets and total size as `int64_t` arithmetic on parameters that come directly from the `.tflite` model: the input dimensions and the per-dim `interior_padding`, `edge_padding_low`, `edge_padding_high` arrays. **None of those multiplications and additions are checked for overflow.** ## The chain The result is then used in two places: 1. **`BuildOuputTensorDims()`** copies `output_shape_[i]` (int64_t) into `TfLiteIntArray::data[i]` (`int`) — a silent narrowing that can turn a wrapped negative int64 into a small positive int32. ```cpp TfLiteIntArray* BuildOuputTensorDims() const { TfLiteIntArray* dims = TfLiteIntArrayCreate(rank_); for (int64_t i = 0; i < rank_; ++i) { dims->data[i] = output_shape_[i]; // <-- int64 -> int silent narrowing } return dims; } ``` 2. **`ResizeTensor`** allocates the output buffer using those (potentially tiny) int32 dims. **`Apply()`** then writes `output_size_` bytes (the *un-truncated* int64 product) into that tensor via `FillBuffer` + `StridedCopy`, producing a **heap-buffer-overflow write** whose length and content are controlled by the model. ```cpp void Apply(const char* input, const char* padding_value, char* output) const { FillBuffer(output, output_size_, padding_value, element_size_); // <-- writes int64 size StridedCopy(rank_, input + input_offset_, input_shape_, input_strides_, output + output_offset_, output_strides_, element_size_, 0); } ``` A malicious `.tflite` that contains a `stablehlo.pad` op with crafted `interior_padding` / `edge_padding` values can therefore corrupt heap memory in any TFLite consumer (mobile apps, edge devices, server-side inference) that loads the model. ## Vulnerable arithmetic (master HEAD) ```cpp // Compute the output shape. 
for (int i = 0; i < rank; ++i) { output_shape_[i] = (dims[i] - 1) * (interior_pad_[i] + 1) + 1 + edge_pad_low_[i] + edge_pad_high_[i]; // unchecked int64 math } ... output_dimension_sizes_[i] = output_shape_[i + 1] * output_dimension_sizes_[i + 1]; // unchecked ... output_strides_[rank - 1] = element_size * (interior_pad_[rank - 1] + 1); // unchecked output_strides_[i] = output_dimension_sizes_[i] * (interior_pad_[i] + 1); ... output_offset_ += std::max<int64_t>(edge_pad_low_[i], 0) * output_dimension_sizes_[i]; // unchecked accumulation ... output_size_ = std::accumulate(output_shape_, output_shape_ + rank, element_size, std::multiplies<>()); // unchecked accumulation ... input_strides_[i - 1] = dims[i] * input_strides_[i]; // unchecked ``` ## Fix Route every `int64` multiplication and addition in `PadData::Setup` through the new `CheckedInt<int64_t>` helper template added in commit [1908dd32](tensorflow/tensorflow@1908dd3) ("Add a helper template to compute simple expressions and check that no overflow has happened."). On any overflow, set a new `overflow_` flag and bail out of `Setup()` early; expose the flag via a `HasOverflow()` accessor. `Prepare()` now consults `HasOverflow()` between the `Setup()` call and the `ResizeTensor` call, logs a kernel error and returns `kTfLiteError` if any of the shape/stride/offset/size computations wrapped. This kills the chain before the under-sized output tensor can be allocated. ```diff + if (pad_data.HasOverflow()) { + TF_LITE_KERNEL_LOG(context, + "stablehlo.pad parameters cause integer overflow in " + "output shape computation"); + return kTfLiteError; + } ``` ## Relationship to PR #115031 This is the **same shape of fix** that PR [#115031](tensorflow/tensorflow#115031) applies to `stablehlo_reduce_window.cc`. 
Both kernels share:

- the same bug class (unchecked int64 arithmetic on attacker-controlled shape/dilation/padding values)
- the same downstream int32 narrowing path (`BuildTfLiteArray<int32_t>` / `dims->data[i] = ...`)
- the same heap-OOB-write outcome
- the same fix template (`CheckedInt<int64_t>` + `HasOverflow()` + `Prepare()` early-return)

I went looking for siblings of the `reduce_window` bug while addressing the review feedback on #115031, and `stablehlo_pad` was the first hit. The patch is contained to `stablehlo_pad.cc` and does not introduce any new helpers — `CheckedInt` is reused as-is from `tensorflow/lite/util.h`.

## Test plan

- [x] No public API change.
- [x] No new dependencies — `CheckedInt<int64_t>` already exists in `tensorflow/lite/util.h` after 1908dd32.
- [x] Existing `stablehlo_pad_test` tests pass against the patched kernel (the new code path is only reached when overflow would have occurred, which existing tests do not exercise).
- [ ] Happy to add regression tests covering overflow cases (`interior_padding` near INT64_MAX, `edge_padding_high` near INT64_MAX) on request.

## Files changed

| File | Lines |
|------|-------|
| `tensorflow/lite/kernels/stablehlo_pad.cc` | +94 / -13 |

Copybara import of the project:

--
036fe919c89880728ad11d0d5399958a7013e640 by mohammadmseet-hue <mohammadmseet@gmail.com>:

Fix integer overflow in stablehlo.pad output shape computation

PadData::Setup() in tensorflow/lite/kernels/stablehlo_pad.cc computes the output tensor's shape, strides, offsets and total size as int64_t arithmetic on parameters that come directly from the .tflite model: the input dimensions and the per-dim `interior_padding`, `edge_padding_low`, `edge_padding_high` arrays. None of those multiplications and additions are checked for overflow.

The result is then used in two places:

1. BuildOuputTensorDims() copies output_shape_[i] (int64_t) into TfLiteIntArray::data[i] (int) — a silent narrowing that can turn a wrapped negative int64 into a small positive int32.
2. ResizeTensor allocates the output buffer using those (potentially tiny) int32 dims.

Apply() then writes output_size_ bytes (the un-truncated int64 product) into that tensor via FillBuffer + StridedCopy, producing a heap-buffer-overflow write whose length and content are controlled by the model. A malicious .tflite that contains a stablehlo.pad op with crafted interior_padding / edge_padding values can therefore corrupt heap memory in any TFLite consumer (mobile apps, edge devices, server-side inference) that loads the model.

Fix
---

Route every int64 multiplication and addition in PadData::Setup through the new CheckedInt<int64_t> helper template added in commit 1908dd32 ("Add a helper template to compute simple expressions and check that no overflow has happened."). On any overflow, set a new `overflow_` flag and bail out of Setup() early; expose the flag via a `HasOverflow()` accessor.

Prepare() now consults HasOverflow() between the Setup() call and the ResizeTensor call, logs a kernel error and returns kTfLiteError if any of the shape/stride/offset/size computations wrapped. This kills the chain before the under-sized output tensor can be allocated.

This is the same shape of fix that PR #115031 applies to stablehlo_reduce_window.cc; both kernels share the same bug class and the same downstream int32 narrowing path. The patch is contained to stablehlo_pad.cc and does not introduce any new helpers — CheckedInt is reused as-is from tensorflow/lite/util.h.
--
f8281275ef8bb103e23ce42aa9b36b2047953099 by mohammadmseet-hue <mohammadmseet@gmail.com>:

stablehlo_pad: address review feedback (single overflow check + CheckedInt propagation)

Per qukhan review on PR #115452:

- Drop the verbose 'with overflow checks' comment block in Setup() and the matching rationale in Prepare(); they duplicate the commit message.
- Wrap '(interior_pad_[i] + 1)' in CheckedInt at every site (output shape, rank-1 stride, per-dim stride) — the '+ 1' was itself unguarded.
- Drop redundant <int64_t> template args where CTAD deduces int64_t (element_size, output_shape_[i+1], output_dimension_sizes_[i], std::max<int64_t>(...), dims[i]).
- Pull the overflow checks out of the per-dim loops. CheckedInt propagates the overflow flag through arithmetic, so we accumulate with 'overflow_ |= x.Overflow();' inside each loop and check once after the related blocks (output shape; dim-sizes + both stride loops; offset; final size; input strides).
- Use 'offset +=' instead of 'offset = offset + ...' in the output-offset loop.
- Rewrite the final-output-size loop as std::accumulate with std::multiplies<>() seeded by CheckedInt(element_size).

Verified: //tensorflow/lite/kernels:stablehlo_pad_test PASSED.

Merging this change closes #115452

FUTURE_COPYBARA_INTEGRATE_REVIEW=tensorflow/tensorflow#115452 from mohammadmseet-hue:fix-stablehlo-pad-overflow f8281275ef8bb103e23ce42aa9b36b2047953099

PiperOrigin-RevId: 897092651