Package: purescript-lua-enums
File: src/Data/Enum.lua
Function: fromCharCode
Class: semantics
Severity: high
fromCharCode is bound directly to string.char, which builds a string from raw byte values 0..255 and raises an error for any value above 255 ('bad argument #1 to char (invalid value)', confirmed on Lua 5.1 for 256 and 65535). For 128..255 it does not fail, but it emits a single raw byte (0x80..0xFF). Such a byte is not a valid standalone UTF-8 sequence, so the result is not the Latin-1 character (U+0080..U+00FF) that JS produces.

The JS FFI is String.fromCharCode(c), which is defined over the entire Char range 0..65535. fromCharCode backs charToEnum, which Data.Enum uses for the Char instances of toEnum, succ and pred. As a result, every Char above U+007F is encoded incorrectly and every Char above U+00FF throws at runtime; the binding is correct only for the ASCII subrange 0..127.
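A minimal reproduction of the three behaviours described above, assuming only that fromCharCode is the plain string.char binding shown under "Current". It uses pcall so the script keeps running after the error; the exact error text differs between Lua versions, so the script prints it rather than matching it.

```lua
-- Current binding, reproduced locally: fromCharCode = string.char
local fromCharCode = string.char

-- 0..127: correct, one ASCII byte, same as JS.
assert(fromCharCode(65) == "A")

-- 128..255: no error, but a single raw byte (0xE9 for 233)
-- instead of the two-byte UTF-8 sequence C3 A9 for U+00E9.
local e = fromCharCode(233)
assert(#e == 1 and e:byte() == 0xE9)

-- Above 255: string.char raises an error instead of returning a string.
for _, n in ipairs({ 256, 65535 }) do
  local ok, err = pcall(fromCharCode, n)
  assert(not ok)
  print(n, err)
end
```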
Current (Lua):
fromCharCode = (string.char)
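Expected: match the JS FFI, where String.fromCharCode(c) yields the character for code unit c over 0..65535 (e.g. 65 -> 'A', 233 -> 'é', 256 -> U+0100 'Ā', 65535 -> U+FFFF). After compilation to Lua, the result should be that character's UTF-8 encoding. Lone surrogates 0xD800..0xDFFF are the exception: they have no valid UTF-8 encoding, so a byte-string result for them can at best be the generalized 3-byte (WTF-8-style) form.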
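For reference, these are the byte sequences a correct fromCharCode should return for the examples above, written with Lua 5.1-compatible decimal escapes. The table name `expected` is hypothetical. Lua 5.1 has no utf8 library, so the cross-check against the standard utf8.char only runs on Lua 5.3 or later.

```lua
-- Hypothetical test vectors: Char value -> expected UTF-8 bytes.
local expected = {
  [65]    = "A",            -- U+0041, 1 byte: 41
  [233]   = "\195\169",     -- U+00E9 'é', 2 bytes: C3 A9
  [256]   = "\196\128",     -- U+0100 'Ā', 2 bytes: C4 80
  [65535] = "\239\191\191", -- U+FFFF, 3 bytes: EF BF BF
}

-- On Lua 5.3+, the standard utf8.char produces the same bytes.
if utf8 then
  for code, bytes in pairs(expected) do
    assert(utf8.char(code) == bytes, "mismatch at " .. code)
  end
end
```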
Proposed fix:
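Encode the value as UTF-8, covering the full BMP range 0..0xFFFF, instead of emitting a single byte. For example:
fromCharCode = function(n)
  -- 1 byte (0xxxxxxx): U+0000..U+007F, i.e. ASCII
  if n < 0x80 then return string.char(n) end
  -- 2 bytes (110xxxxx 10xxxxxx): U+0080..U+07FF
  if n < 0x800 then return string.char(0xC0 + math.floor(n / 0x40), 0x80 + (n % 0x40)) end
  -- 3 bytes (1110xxxx 10xxxxxx 10xxxxxx): U+0800..U+FFFF, the rest of the Char range
  if n < 0x10000 then return string.char(0xE0 + math.floor(n / 0x1000), 0x80 + (math.floor(n / 0x40) % 0x40), 0x80 + (n % 0x40)) end
  -- 4 bytes: values above U+FFFF, unreachable for a Char but harmless
  return string.char(0xF0 + math.floor(n / 0x40000), 0x80 + (math.floor(n / 0x1000) % 0x40), 0x80 + (math.floor(n / 0x40) % 0x40), 0x80 + (n % 0x40))
end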
Verified on Lua 5.1: toCharCode(fromCharCode(n)) == n for n = 233, 256 (0x100) and 65535, with no errors.
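The spot checks above can be extended to the whole Char range with a self-contained script. This sketch copies the proposed encoder as a local function and pairs it with a hypothetical decoder, decodeBmp, written only for this test; it is not the package's toCharCode, whose implementation is not shown here. It also asserts the expected byte length for each range.

```lua
-- Proposed encoder, copied as a local function for testing.
local function fromCharCode(n)
  if n < 0x80 then return string.char(n) end
  if n < 0x800 then return string.char(0xC0 + math.floor(n / 0x40), 0x80 + (n % 0x40)) end
  if n < 0x10000 then return string.char(0xE0 + math.floor(n / 0x1000), 0x80 + (math.floor(n / 0x40) % 0x40), 0x80 + (n % 0x40)) end
  return string.char(0xF0 + math.floor(n / 0x40000), 0x80 + (math.floor(n / 0x1000) % 0x40), 0x80 + (math.floor(n / 0x40) % 0x40), 0x80 + (n % 0x40))
end

-- Hypothetical decoder for 1..3-byte sequences, used only to check the round trip.
local function decodeBmp(s)
  local b1, b2, b3 = s:byte(1, 3)
  if #s == 1 then return b1 end
  if #s == 2 then return (b1 - 0xC0) * 0x40 + (b2 - 0x80) end
  return (b1 - 0xE0) * 0x1000 + (b2 - 0x80) * 0x40 + (b3 - 0x80)
end

-- Every Char value must encode without error, with the right length,
-- and decode back to itself.
for n = 0, 0xFFFF do
  local s = fromCharCode(n)
  local len = (n < 0x80 and 1) or (n < 0x800 and 2) or 3
  assert(#s == len, "wrong length at " .. n)
  assert(decodeBmp(s) == n, "round trip failed at " .. n)
end
```

The loop covers surrogate values 0xD800..0xDFFF as well; they round-trip through this decoder even though their 3-byte encodings are not strictly valid UTF-8.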
Found by the FFI audit; reproduced under Lua 5.1.