Fix text detection for blobs containing bytes in ASCII range up to a zero

mgrojo · mgrojo · commit 0adb0af13342 · 2019-03-02T13:51:31.000+01:00
Although we pass a length to toUnicode(), validation and decoding are performed only up to the first null character. As a result, a blob containing bytes in the ASCII range, followed by a zero and arbitrary data after it, is wrongly detected as text. See issue #1772.
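
The guard added below can be illustrated without Qt. This is only a minimal sketch under stated assumptions: std::string stands in for QByteArray, an empty encoding string stands for the default (UTF-8), and the names rejectedByZeroCheck and selfCheck are hypothetical, not part of the project.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the new check in isTextOnly(): returns true when
// the data should be rejected as non-text because it contains a zero byte
// while the default encoding (empty string) is in use.
bool rejectedByZeroCheck(const std::string& data, const std::string& encoding)
{
    // find() scans the whole buffer, so bytes after the first '\0' cannot
    // hide behind an ASCII-only prefix.
    return encoding.empty() && data.find('\0') != std::string::npos;
}

// Small usage example covering the case described in the commit message.
void selfCheck()
{
    // ASCII prefix, then a zero, then arbitrary bytes: rejected.
    const std::string blob("abc\0\xff\xfe", 6);
    assert(rejectedByZeroCheck(blob, ""));

    // Plain text without a zero byte passes this guard.
    assert(!rejectedByZeroCheck("plain text", ""));

    // With an explicit encoding the guard does not apply.
    assert(!rejectedByZeroCheck(blob, "UTF-16"));
}
```

The check is limited to the default encoding because an explicitly chosen encoding such as UTF-16 legitimately contains zero bytes in ordinary text.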
diff --git a/src/Data.cpp b/src/Data.cpp
@@ -17,6 +17,12 @@ bool isTextOnly(QByteArray data, const QString& encoding, bool quickTest)
     if(startsWithBom(data))
         return true;
 
+    // We can assume that the default encoding (UTF-8) cannot contain character zero.
+    // This has to be checked explicitly because toUnicode() is ignoring bytes beyond
+    // the zero.
+    if(encoding.isEmpty() && data.contains('\0'))
+        return false;
+
     // Truncate to the first couple of bytes for quick testing
     int testSize = quickTest? std::min(512, data.size()) : data.size();
     QTextCodec::ConverterState state;