[fork-ffi] enums: toCharCode — toCharCode is implemented as `c:byte()`, which returns the value of… #79

**Package:** purescript-lua-enums  
**File:** `src/Data/Enum.lua`  
**Function:** `toCharCode`  
**Class:** semantics  **Severity:** high

`toCharCode` is implemented as `c:byte()`, which returns the value of the first byte (0..255) of the argument's UTF-8 encoding, not the character's code point. The pslua compiler emits a PureScript `Char` literal as a Lua string holding the character's UTF-8 bytes: in `Lua.hs`, a `LiteralChar` becomes a string built from `Text.singleton c` and is printed raw via `dquotes (pretty t)`. Any `Char` above U+007F therefore arrives as a multi-byte UTF-8 string, and `c:byte()` returns only its leading byte.

Confirmed on Lua 5.1:

- U+00E9 ('é', UTF-8 `C3 A9`) -> 195 instead of 233
- U+FFFF (`top`, UTF-8 `EF BF BF`) -> 239 instead of 65535

The JS FFI uses `c.charCodeAt(0)`, which returns the UTF-16 code unit (0..65535). The wrong values also break Data.Enum's `cardinality = toCharCode top - toCharCode bottom` (which evaluates to 239 - 0 = 239 instead of 65535) and `fromEnum :: Char -> Int`. The current implementation is correct only for the ASCII subrange 0..127.
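A minimal reproduction sketch, assuming the compiler output described above (each `Char` reaches the FFI as a Lua string of raw UTF-8 bytes). The byte strings are written with Lua's decimal escapes so the snippet does not depend on the source file's encoding; `currentToCharCode` is a hypothetical local copy of the current FFI function, not a name from the package.

```lua
-- Hypothetical local copy of the current FFI implementation (first byte only).
local function currentToCharCode(c) return c:byte() end

local eAcute = "\195\169"      -- U+00E9 'é' as UTF-8 (C3 A9)
local top    = "\239\191\191"  -- U+FFFF as UTF-8 (EF BF BF)
local bottom = "\0"            -- U+0000, a single zero byte

-- Both strings are multi-byte, and only the leading byte is returned.
print(#eAcute, currentToCharCode(eAcute))  -- prints 2 and 195 (code point is 233)
print(#top, currentToCharCode(top))        -- prints 3 and 239 (code point is 65535)

-- The Char cardinality formula from Data.Enum, evaluated with the buggy function.
print(currentToCharCode(top) - currentToCharCode(bottom))  -- prints 239
```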

**Current (Lua):**
```lua
toCharCode = (function(c) return c:byte() end)
```

**Expected:** In the JS FFI, `c.charCodeAt(0)` returns the UTF-16 code unit at index 0 (0..65535), which equals the code point for every BMP character. For example: 'A' -> 65, 'é' (U+00E9) -> 233, `top` (U+FFFF) -> 65535.
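The expected values follow from the UTF-8 bit layout: a 2-byte sequence `110xxxxx 10yyyyyy` carries 5 + 6 payload bits, and a 3-byte sequence `1110xxxx 10yyyyyy 10zzzzzz` carries 4 + 6 + 6. Subtracting each byte's marker bits and weighting by powers of 64 recovers the code point, whereas `c:byte()` stops at the raw lead byte (0xC3 = 195, 0xEF = 239):

$$
\begin{aligned}
\text{é}:\ & (\mathtt{0xC3} - \mathtt{0xC0}) \cdot 64 + (\mathtt{0xA9} - \mathtt{0x80}) = 3 \cdot 64 + 41 = 233 \\
\text{top}:\ & (\mathtt{0xEF} - \mathtt{0xE0}) \cdot 4096 + (\mathtt{0xBF} - \mathtt{0x80}) \cdot 64 + (\mathtt{0xBF} - \mathtt{0x80}) = 15 \cdot 4096 + 63 \cdot 64 + 63 = 65535
\end{aligned}
$$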

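**Proposed fix:** Decode the first UTF-8 code point of the string instead of taking its first byte, e.g.:

```lua
toCharCode = function(c)
  local b1 = c:byte(1)
  -- 1-byte sequence (U+0000..U+007F): the byte is the code point
  if b1 < 0x80 then return b1 end
  -- 2-byte sequence 110xxxxx 10xxxxxx (U+0080..U+07FF)
  if b1 < 0xE0 then return (b1 - 0xC0) * 0x40 + (c:byte(2) - 0x80) end
  -- 3-byte sequence 1110xxxx 10xxxxxx 10xxxxxx (U+0800..U+FFFF)
  if b1 < 0xF0 then return (b1 - 0xE0) * 0x1000 + (c:byte(2) - 0x80) * 0x40 + (c:byte(3) - 0x80) end
  -- 4-byte sequence (U+10000..U+10FFFF); should not occur for a PureScript Char,
  -- which is a UTF-16 code unit, but kept for completeness
  return (b1 - 0xF0) * 0x40000 + (c:byte(2) - 0x80) * 0x1000 + (c:byte(3) - 0x80) * 0x40 + (c:byte(4) - 0x80)
end
```

Verified on Lua 5.1: 'A' -> 65, 'é' -> 233, `top` -> 65535.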
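Three spot checks leave most of the range untested. One way to cover all of it, sketched here under stated assumptions, is to round-trip every BMP scalar value through a reference encoder. `encodeUtf8` is a hypothetical helper written for this check (not part of the package), and `toCharCode` below is a local copy with the same logic as the proposed fix. Surrogates U+D800..U+DFFF are skipped because they have no valid UTF-8 encoding. Only arithmetic and `string.char` are used, so it runs on Lua 5.1 without a bit library.

```lua
-- Local copy of the proposed decoder (same logic as the fix).
local function toCharCode(c)
  local b1 = c:byte(1)
  if b1 < 0x80 then return b1 end
  if b1 < 0xE0 then return (b1 - 0xC0) * 0x40 + (c:byte(2) - 0x80) end
  if b1 < 0xF0 then return (b1 - 0xE0) * 0x1000 + (c:byte(2) - 0x80) * 0x40 + (c:byte(3) - 0x80) end
  return (b1 - 0xF0) * 0x40000 + (c:byte(2) - 0x80) * 0x1000 + (c:byte(3) - 0x80) * 0x40 + (c:byte(4) - 0x80)
end

-- Hypothetical reference encoder: BMP code point -> UTF-8 string.
local function encodeUtf8(cp)
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + math.floor(cp / 0x40), 0x80 + cp % 0x40)
  else
    return string.char(0xE0 + math.floor(cp / 0x1000),
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  end
end

-- Round-trip every BMP code point except the surrogate block.
local checked = 0
for cp = 0, 0xFFFF do
  if cp < 0xD800 or cp > 0xDFFF then
    local got = toCharCode(encodeUtf8(cp))
    assert(got == cp, ("U+%04X decoded as %d"):format(cp, got))
    checked = checked + 1
  end
end
print(checked .. " code points round-tripped")  -- prints "63488 code points round-tripped"
```

With the decoder in place, `toCharCode top - toCharCode bottom` evaluates to 65535 - 0 = 65535, matching the JS backend.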

_Found by the FFI audit; reproduced under Lua 5.1._
