Commit 0adb0af

Fix text detection for blobs containing bytes in ASCII range up to a zero
Although we pass a length to toUnicode(), validation and decoding are performed only up to the first null character. As a result, blobs containing bytes in the ASCII range, followed by a zero and then arbitrary data, were being detected as text. See issue #1772.
1 parent a7fc1ab commit 0adb0af

1 file changed

Lines changed: 6 additions & 0 deletions

src/Data.cpp

@@ -17,6 +17,12 @@ bool isTextOnly(QByteArray data, const QString& encoding, bool quickTest)
     if(startsWithBom(data))
         return true;

+    // We can assume that the default encoding (UTF-8) cannot contain character zero.
+    // This has to be checked explicitly because toUnicode() is ignoring bytes beyond
+    // the zero.
+    if(encoding.isEmpty() && data.contains('\0'))
+        return false;
+
     // Truncate to the first couple of bytes for quick testing
     int testSize = quickTest? std::min(512, data.size()) : data.size();
     QTextCodec::ConverterState state;
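
The pitfall this guard works around can be illustrated without Qt. The following is a minimal sketch using only the C++ standard library, not the project's code: `bytesSeenByCStringScan` and `hasEmbeddedZero` are hypothetical helper names, and `std::strlen` stands in for any routine that stops at the first zero byte, as the commit message reports for toUnicode(). It shows that a zero-terminated scan sees only the ASCII prefix of a blob, while an explicit search over the full length finds the zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: number of bytes a zero-terminated scan actually sees.
// std::string::c_str() is always zero-terminated, so strlen is well defined;
// it stops at the first zero byte even when data.size() is larger.
std::size_t bytesSeenByCStringScan(const std::string& data)
{
    return std::strlen(data.c_str());
}

// Hypothetical helper mirroring the idea of the added guard:
// search the whole buffer for a zero byte instead of trusting the scan.
bool hasEmbeddedZero(const std::string& data)
{
    return data.find('\0') != std::string::npos;
}

// Small self-check with a blob shaped like the one described in the commit:
// ASCII bytes, then a zero, then non-text bytes.
void checkExamples()
{
    const std::string blob("abc\0\xff\xfe", 6);
    assert(blob.size() == 6);                   // full length is kept
    assert(bytesSeenByCStringScan(blob) == 3);  // scan sees only "abc"
    assert(hasEmbeddedZero(blob));              // explicit check finds the zero
    assert(!hasEmbeddedZero("plain text"));     // ordinary text passes
}
```

Calling `checkExamples()` from a `main` function exercises both helpers. The explicit search costs one linear pass over the buffer, which is the same trade-off the `data.contains('\0')` call in the diff makes.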
