Commit 95ed671

[Bug 15440] lc-compile: Permit string literals with embedded nuls
Internally, lc-compile uses nul-terminated strings. However, both the compiled module file format and the LiveCode runtime allow strings to contain nuls.

Previously, `"\u{0}"` evaluated to the empty string. This was because `UnescapeStringLiteral()` faithfully decoded `\u{0}` to the byte string `{0x00, 0x00}`, which, interpreted as a nul-terminated string, is empty.

When outputting a module file, compiler-internal strings are converted to libfoundation `MCString`, so the nul-terminated strings used internally by the compiler are never exposed (except possibly via error or diagnostic messages). A trivial fix is therefore to use Modified UTF-8 encoding for compiler-internal strings, where U+0000 is encoded as `{0xc0, 0x80}`.
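The effect described in the commit message can be illustrated with a small standalone sketch. It is not lc-compile code: `append_ascii_modified_utf8` and `demo_encoded_length` are hypothetical names, and the helper only handles U+0000..U+007F to keep the example short. It shows that once U+0000 is stored as `{0xc0, 0x80}`, `strlen()` no longer stops at the embedded nul.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of lc-compile): appends the code point
 * p_char, restricted here to U+0000..U+007F, to a nul-terminated buffer.
 * U+0000 is written as the overlong pair 0xC0 0x80 (Modified UTF-8). */
static size_t append_ascii_modified_utf8(char *buf, size_t index, int p_char)
{
    assert(p_char >= 0 && p_char < 128);
    if (p_char == 0)
    {
        buf[index++] = (char)0xc0;
        buf[index++] = (char)0x80;
    }
    else
    {
        buf[index++] = (char)p_char;
    }
    buf[index] = '\0'; /* keep the buffer terminated after every append */
    return index;
}

/* Encodes the literal "a\u{0}b" and returns strlen() of the result. */
static size_t demo_encoded_length(void)
{
    char buf[8];
    size_t n = 0;
    n = append_ascii_modified_utf8(buf, n, 'a');
    n = append_ascii_modified_utf8(buf, n, 0);
    n = append_ascii_modified_utf8(buf, n, 'b');
    (void)n;
    return strlen(buf); /* 4 bytes: 'a', 0xC0, 0x80, 'b' */
}
```

With a plain `0x00` byte in place of the overlong pair, the same `strlen()` call would return 1, which is the truncation this commit removes.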
1 parent 0437c68 commit 95ed671

4 files changed

Lines changed: 75 additions & 1 deletion

docs/lcb/notes/15440.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+# LiveCode Builder Tools
+# lc-compile
+
+# [15440] Preserve nul characters in string literals

tests/lcb/compiler/literals.lcb

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+/*
+Copyright (C) 2015 LiveCode Ltd.
+
+This file is part of LiveCode.
+
+LiveCode is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License v3 as published by the Free
+Software Foundation.
+
+LiveCode is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with LiveCode. If not see <http://www.gnu.org/licenses/>. */
+
+module com.livecode.compiler.literals.tests
+
+public handler TestLiteralNul()
+    variable tString
+
+    test "single literal nul" when the number of chars in "\u{0}" is 1
+
+    put "a\u{0}b\u{0}c" into tString
+    test diagnostic "literal nuls: actual string length is" && \
+        the number of chars in tString formatted as string
+    test "literal nuls" when the number of chars in tString is 5
+end handler
+
+end module
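As a rough cross-check of the expected count of 5, the compiler-internal form of `"a\u{0}b\u{0}c"` can be measured in C. This is only a sketch under stated assumptions: `count_code_points` is a hypothetical helper, and it counts code points by skipping UTF-8 continuation bytes, which matches "the number of chars" for this simple string but is not how the LiveCode runtime counts chars in general.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts code points in a nul-terminated
 * (Modified) UTF-8 string by counting every byte that is not a
 * continuation byte (10xxxxxx). The overlong pair 0xC0 0x80 is one
 * lead byte plus one continuation byte, so it counts as one code point. */
static size_t count_code_points(const char *s)
{
    size_t count = 0;
    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p != 0; ++p)
    {
        if ((*p & 0xc0) != 0x80)
            ++count;
    }
    return count;
}
```

For example, `count_code_points("a\xc0\x80" "b\xc0\x80" "c")` sees seven bytes and reports five code points. The literal is split into adjacent pieces so that `b` and `c` are not absorbed into the preceding hex escapes.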

tests/lcb/stdlib/typeconvert.lcb

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+/*
+Copyright (C) 2015 LiveCode Ltd.
+
+This file is part of LiveCode.
+
+LiveCode is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License v3 as published by the Free
+Software Foundation.
+
+LiveCode is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with LiveCode. If not see <http://www.gnu.org/licenses/>. */
+
+module com.livecode.typeconvert.tests
+
+public handler TestSplit()
+    -- Bug 15440
+    variable tString
+    variable tList
+    put "security.selinux\u{0}user.test\u{0}user.uuid" into tString
+    split tString by "\u{0}" into tList
+    test "split (nul)" when the number of elements in tList is 3
+end handler
+
+end module
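The split test depends on the runtime string keeping its real nul characters, which in C terms means working with a length-counted buffer rather than a nul-terminated one. The sketch below is an illustration only: `count_fields` is a hypothetical function, not the libfoundation split implementation, and it defines the field count as the number of delimiters plus one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts delimiter-separated fields in a
 * length-counted buffer. Because the length is passed explicitly, the
 * delimiter may be the nul byte itself. */
static size_t count_fields(const char *data, size_t length, char delimiter)
{
    size_t fields = 1;
    assert(data != NULL);
    for (size_t i = 0; i < length; ++i)
    {
        if (data[i] == delimiter)
            ++fields;
    }
    return fields;
}
```

Called on the 36 bytes of `"security.selinux\0user.test\0user.uuid"` with `'\0'` as the delimiter, this finds two delimiters and therefore three fields, in line with the test's expectation.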

toolchain/lc-compile/src/literal.c

Lines changed: 11 additions & 1 deletion
@@ -113,7 +113,17 @@ static int char_to_nibble(char p_char, unsigned int *r_nibble)
 
 void append_utf8_char(char *p_string, int *x_index, int p_char)
 {
-    if (p_char < 128)
+    // If the char is NUL (i.e. U+0000) we can't represent it directly
+    // in a nul-terminated string. However, we *can* use a 2-byte
+    // overlong encoding to represent it safely (i.e. "Modified
+    // UTF-8")
+    if (p_char == 0)
+    {
+        p_string[*x_index] = 0xc0;
+        p_string[*x_index + 1] = 0x80;
+        (*x_index) += 2;
+    }
+    else if (p_char < 128)
     {
         p_string[*x_index] = p_char;
         (*x_index) += 1;
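This hunk only covers the encoding side. The diff does not show how the overlong pair is turned back into a real nul when the module is written out, so the following is a sketch of what such a step could look like, not the actual lc-compile or libfoundation code. `decode_modified_utf8` is a hypothetical name, and the caller is assumed to provide a destination buffer of at least `strlen(src)` bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoder: copies a nul-terminated Modified UTF-8 string
 * into dst as a length-counted byte sequence, collapsing each overlong
 * pair 0xC0 0x80 into a single 0x00 byte. Returns the number of bytes
 * written; dst is NOT nul-terminated because it may contain nuls. */
static size_t decode_modified_utf8(const char *src, char *dst)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t out = 0;
    assert(src != NULL && dst != NULL);
    while (*p != 0)
    {
        /* p[1] is safe to read: p[0] is non-zero, so p[1] is at worst
         * the terminating nul. */
        if (p[0] == 0xc0 && p[1] == 0x80)
        {
            dst[out++] = '\0';
            p += 2;
        }
        else
        {
            dst[out++] = (char)*p;
            p += 1;
        }
    }
    return out;
}
```

Decoding `"a\xc0\x80" "b"` this way produces the three bytes `'a'`, `0x00`, `'b'`, so the returned length, rather than a terminator, carries the string's extent.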
