Skip to content

[io] Improve checks when reading arrays with TBufferFile#21814

Merged
dpiparo merged 3 commits intoroot-project:masterfrom
dpiparo:fix_tbuffer
Apr 9, 2026
Merged

[io] Improve checks when reading arrays with TBufferFile#21814
dpiparo merged 3 commits intoroot-project:masterfrom
dpiparo:fix_tbuffer

Conversation

@dpiparo
Copy link
Copy Markdown
Member

@dpiparo dpiparo commented Apr 7, 2026

No description provided.

@dpiparo dpiparo requested review from hahnjo and silverweed April 7, 2026 15:34
@dpiparo dpiparo self-assigned this Apr 7, 2026
@dpiparo dpiparo requested a review from pcanal as a code owner April 7, 2026 15:34
@dpiparo dpiparo added the in:I/O label Apr 7, 2026
Comment thread io/io/inc/TBufferFile.h Outdated
@@ -268,6 +269,11 @@ class TBufferFile : public TBufferIO {

//---------------------- TBufferFile inlines ---------------------------------------

//______________________________________________________________________________
inline bool TBufferFile::IsBadCollLen(Int_t len, Int_t n) const {
return n <= 0 || len <=0 || (size_t)len > (size_t)(fBufMax - fBufCur);
Copy link
Copy Markdown
Member

@pcanal pcanal Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: the value zero is a totally valid (not bad) value that just happens to be treated the same way as the invalid values (i.e. read nothing for the array)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a better name in mind? Maybe ShouldReadColl?

Comment thread io/io/src/TBufferFile.cxx Outdated
@@ -818,7 +822,7 @@ Int_t TBufferFile::ReadArray(Int_t *&ii)
*this >> n;
Int_t l = sizeof(Int_t)*n;

if (l <= 0 || l > fBufSize) return 0;
if (IsBadCollLen(l, n)) return 0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (IsBadCollLen(l, n)) return 0;
if (!n || ExceedsBufferCapacity<Int_t>(n)) return 0;

and remove the variable l entirely.

If preferring not template:

Suggested change
if (IsBadCollLen(l, n)) return 0;
if (!n || ExceedsBufferCapacity(n, sizeof(Int_t)) return 0;

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting to change the checks previously in place?
Suppose l is dropped. What would happen to the case where n is positive, but l, equal to n*sizeof(type) is negative?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but l, equal to n*sizeof(type) is negative?

This is technically a spurious/redundant test. l is not used and if n*sizeof(type) is larger than 31bits then it will also fail the (n * sizeof(type)) > (size_t)(fBufMax - fBufCur) part of the test.

That said if the test was desired it could still be done equally well by calculating l inside the function ...

Copy link
Copy Markdown
Member

@pcanal pcanal Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(retracted while reviewing essential details)(side note sizeof(Int_t)==size_on_file(Int_t) but sizeof(Long_t) != size_on_file(Long_t)

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 7, 2026

Test Results

    22 files      22 suites   3d 10h 57m 27s ⏱️
 3 833 tests  3 782 ✅  1 💤 50 ❌
76 555 runs  76 487 ✅ 18 💤 50 ❌

For more details on these failures, see this check.

Results for commit e4ffd90.

♻️ This comment has been updated with latest results.

Comment thread io/io/inc/TBufferFile.h Outdated
Comment thread io/io/src/TBufferFile.cxx Outdated
@dpiparo
Copy link
Copy Markdown
Member Author

dpiparo commented Apr 8, 2026

The comments above are relevant, and easy to address. Thanks. I think that the tests failures are due to an issue of the file, which was never detected before. This simple program, that can be run as a macro too, reveals it:

#include "TFile.h"
#include "TTree.h"

/*
Invalid read happens at
- file "root://eospublic.cern.ch//eos/root-eos/testfiles/atlasFlushed.root"
- tree "CollectionTree"
- branch m_TrigMuonEFInfoContainers.vector<TPObjRef>
- basket 23

#0  0x00007ffff448d02c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007ffff443fb86 in raise () from /lib64/libc.so.6
#2  0x00007ffff4429873 in abort () from /lib64/libc.so.6
#3  0x00007ffff48a1b21 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /lib64/libstdc++.so.6
#4  0x00007ffff48ad53c in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
#5  0x00007ffff48ad5a7 in std::terminate() () from /lib64/libstdc++.so.6
#6  0x00007ffff48ad809 in __cxa_throw () from /lib64/libstdc++.so.6
#7  0x00007ffff72c639b in TBufferFile::IsBadCollLen (this=0x9112330, len=256, n=64) at /foo/root/io/io/src/TBufferFile.cxx:68

Here: len is 256, fBufMax 152116592, fBufCur 152116338, fBufMax-fBufCur = 254: two extra bytes

#8  0x00007ffff72bbcea in TBufferFile::ReadArray (this=0x9112330, ii=@0x90e3358: 0x0) at /foo/root/io/io/src/TBufferFile.cxx:839
#9  0x00007ffff634314d in TBasket::ReadBasketBuffers (this=0x90e3280, pos=298023188, len=3586, file=0x1723640) at /foo/root/tree/tree/src/TBasket.cxx:671
#10 0x00007ffff6352a21 in TBranch::GetBasketImpl (this=0x31aec30, basketnumber=23, user_buffer=0x0) at /foo/root/tree/tree/src/TBranch.cxx:1267
#11 0x00007ffff6353198 in TBranch::GetBasketAndFirst (this=0x31aec30, basket=@0x7fffffffcc80: 0x0, first=@0x7fffffffcc78: 1100, user_buffer=0x0)
    at /foo/root/tree/tree/src/TBranch.cxx:1389
#12 0x00007ffff6353fb0 in TBranch::GetEntry (this=0x31aec30, entry=1100, getall=0) at /foo/root/tree/tree/src/TBranch.cxx:1715
#13 0x00007ffff6366ec4 in TBranchElement::GetEntry (this=0x31aec30, entry=1100, getall=0) at /foo/root/tree/tree/src/TBranchElement.cxx:2826
#14 0x00007ffff6366a34 in TBranchElement::GetEntry (this=0x31ae610, entry=1100, getall=0) at /foo/root/tree/tree/src/TBranchElement.cxx:2761
#15 0x00007ffff6366a34 in TBranchElement::GetEntry (this=0x31adfd0, entry=1100, getall=0) at /foo/root/tree/tree/src/TBranchElement.cxx:2761
#16 0x00007ffff63e4347 in operator() (__closure=0x7fffffffd2b0) at /foo/root/tree/tree/src/TTree.cxx:5741
#17 0x00007ffff63e44d5 in TTree::GetEntry (this=0x13dd370, entry=1100, getall=0) at /foo/root/tree/tree/src/TTree.cxx:5819
#18 0x00000000004021e0 in main () at repro.cpp:9
(gdb) f 10
#10 0x00007ffff6352a21 in TBranch::GetBasketImpl (this=0x31aec30, basketnumber=23, user_buffer=0x0) at /foo/root/tree/tree/src/TBranch.cxx:1267

*/

int repro()
{
   auto f = std::unique_ptr<TFile>{TFile::Open("root://eospublic.cern.ch//eos/root-eos/testfiles/atlasFlushed.root")};
   auto t = f->Get<TTree>("CollectionTree");

   // not strictly necessary, but just to illustrate the matter
   t->SetBranchStatus("*",0);
   t->SetBranchStatus("m_TrigMuonEFInfoContainers.vector<TPObjRef>", 1);

   t->GetEntry(1100); // Invalid Read!

   return 0;
}

int main(){return repro();}

@dpiparo dpiparo force-pushed the fix_tbuffer branch 3 times, most recently from 6cbd362 to e50cef7 Compare April 8, 2026 14:56
@pcanal
Copy link
Copy Markdown
Member

pcanal commented Apr 8, 2026

The problem is with the management of the buffer length in TBuffer where there is 2 sources of truth (fBufMax and fBufSize) and in some circumstances they got out-of-sync. The fix is:

$ git diff core/base/src/TBuffer.cxx
diff --git a/core/base/src/TBuffer.cxx b/core/base/src/TBuffer.cxx
index 8a0c2b78e99..20cba8b524a 100644
--- a/core/base/src/TBuffer.cxx
+++ b/core/base/src/TBuffer.cxx
@@ -304,6 +304,7 @@ void TBuffer::SetReadMode()
       // We had reserved space for the free block count,
       // release it,
       fBufSize += kExtraSpace;
+      fBufMax = fBuffer + fBufSize;
    }
    fMode = kRead;
 }
@@ -317,6 +318,7 @@ void TBuffer::SetWriteMode()
       // We had not yet reserved space for the free block count,
       // reserve it now.
       fBufSize -= kExtraSpace;
+      fBufMax = fBuffer + fBufSize;
    }
    fMode = kWrite;
 }

Comment on lines 81 to 84
// that does not present corruption
const int efirst = 1101; //original value 0;
const int elast = 2999; //original value efirst+nentries;
if (cachesize == -2) {
Copy link
Copy Markdown
Member

@pcanal pcanal Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes (thankfully not needed thanks to fixing the underlying issue) are changing the nature of this test. In this case a better solution would have been to disabled the branch triggering an issue.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. They were there assuming the file had issues and to check if other problems were present thanks to the CI.

dpiparo and others added 2 commits April 9, 2026 07:40
Because, under certain circumstances, BufMax and fBufSize got
out-of-sync

Co-Authored-By: Philippe Canal <pcanal@fnal.gov>
i.e. in the TBufferFile::Read*Array* methods.

Previously, the collection length was checked to be positive
and that it was less than the total buffer size.

Those checks have been improved with a battery of 3 checks:
    1. The collection's number of elements is checked to be positive
    2. The collection length is checked to be positive
    3. The collection length is checked to be less than the portion
    of the buffer that remains to be read

The test roottest-root-io-prefetching-run had to be modified as well
because of what seems to be a corrupted input file.

thanks also to @Sebasteuo
@dpiparo dpiparo merged commit 6ab4902 into root-project:master Apr 9, 2026
27 of 30 checks passed
@dpiparo
Copy link
Copy Markdown
Member Author

dpiparo commented Apr 9, 2026

/backport to 6.38, 6.36, 6.32, 6.30, 6.28, 6.26

@root-project-bot
Copy link
Copy Markdown

Something went wrong with the backport to 6.38: @dpiparo please see the logs

@dpiparo dpiparo removed the in:I/O label Apr 9, 2026
@dpiparo
Copy link
Copy Markdown
Member Author

dpiparo commented Apr 9, 2026

/backport to 6.36, 6.32, 6.30, 6.28, 6.26

@root-project-bot
Copy link
Copy Markdown

Something went wrong with the backport to 6.32: @dpiparo please see the logs

@dpiparo
Copy link
Copy Markdown
Member Author

dpiparo commented Apr 9, 2026

/backport to 6.32

@dpiparo
Copy link
Copy Markdown
Member Author

dpiparo commented Apr 9, 2026

/backport to 6.30

@dpiparo
Copy link
Copy Markdown
Member Author

dpiparo commented Apr 9, 2026

/backport to 6.28

@dpiparo
Copy link
Copy Markdown
Member Author

dpiparo commented Apr 9, 2026

/backport to 6.26

@root-project-bot
Copy link
Copy Markdown

Something went wrong with the backport to 6.26: @dpiparo please see the logs

@root-project-bot
Copy link
Copy Markdown

Something went wrong with the backport to 6.28: @dpiparo please see the logs

@root-project-bot
Copy link
Copy Markdown

Something went wrong with the backport to 6.30: @dpiparo please see the logs

@root-project-bot
Copy link
Copy Markdown

Something went wrong with the backport to 6.32: @dpiparo please see the logs

Comment thread io/io/inc/TBufferFile.h
Comment on lines +273 to +280
inline bool TBufferFile::ShouldNotReadCollection(Int_t lengthInBytes, Int_t nElements) const
{
// Three cases here in which we should not read the collection:
// 1. The collection has zero or a negative number of elements or
// 2. The length in bytes of the collection is zero or negative
// 3. The remaining part of the buffer is too short to read the collection
return nElements <= 0 || lengthInBytes <= 0 || (size_t)lengthInBytes > (size_t)(fBufMax - fBufCur);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
inline bool TBufferFile::ShouldNotReadCollection(Int_t lengthInBytes, Int_t nElements) const
{
// Three cases here in which we should not read the collection:
// 1. The collection has zero or a negative number of elements or
// 2. The length in bytes of the collection is zero or negative
// 3. The remaining part of the buffer is too short to read the collection
return nElements <= 0 || lengthInBytes <= 0 || (size_t)lengthInBytes > (size_t)(fBufMax - fBufCur);
}
inline bool TBufferFile::ShouldNotReadCollection(size_t lengthInBytes, Int_t nElements) const
{
// Three cases here in which we should not read the collection:
// 1. The collection has zero or a negative number of elements or
// 2. The length in bytes of the collection is zero or negative
// 3. The remaining part of the buffer is too short to read the collection
return nElements <= 0 || lengthInBytes > (size_t)(fBufMax - fBufCur);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires the call site to become:

if (ShouldNotReadCollection(n*size_on_file(type), n)) return 0;

or

if (ShouldNotReadCollection(size_on_file(type), n)) return 0;

and do the multiplication inline.

@dpiparo dpiparo deleted the fix_tbuffer branch April 10, 2026 04:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants