feat: switch grisu2 float-to-string algorithm to hybrid of xjb & zmij algorithms by JairusSW · Pull Request #3025 · AssemblyScript/assemblyscript

JairusSW · 2026-06-09T05:20:07Z

Changes proposed in this pull request:
⯈ Switch to xjb-as, which is an improvement over zmij-as
⯈ Comply with the ECMA-262 specification, with the exception of a trailing .0

Note: I wrote xjb-as with Claude assisting me on the initial port, then optimized it by hand. Everything is carefully checked and passes over a trillion fuzz cases compared directly against Number::parse() in V8. It also reaches 100% code coverage and tests every edge case both Claude and I could think of.

It's good quality stuff though, and I'm quite confident in that.

Here are some performance notes; xjb-as/README.md has more extensive ones. These benchmarks were taken on an AMD 7800X3D.

program uses	grisu2	xjb	Δ
f64 only	4.56	6.84	+2.28
f32 only	4.56	4.61	+0.05
both	5.17	8.97	+3.80

Lookup-table footprint vs the old grisu2 implementation (omitting shared tables):

program uses	grisu2	xjb	Δ
f64 (`dtoa`) only	910 B	672 B	−238 B
f32 (`ftoa`) only	910 B	616 B	−294 B
both	910 B	1288 B	+378 B

I've read the contributing guidelines
I've added my name and email to the NOTICE file

P.S. Sorry about the diff. Most of it is just the wasm files.

Signed-off-by: Jairus Tanaka <me@jairus.dev>

PaperPrototype · 2026-06-17T23:01:49Z

Just curious here, why the exception of a trailing .0?

JairusSW · 2026-06-17T23:04:04Z

> Just curious here, why the exception of a trailing .0?

Max and Dan made that decision so that floats are easily identifiable when converted to a string. For example, you know that the string "1.0" is the result of f64.toString(), while the string "1" is likely the result of i32.toString(). It's a slight deviation from the ECMA-262 spec, but it's not awful.
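
To make the deviation concrete, here is a minimal TypeScript sketch (not the PR's code) of the rule described above: output that looks like a plain integer gets a `.0` suffix, and everything else is left untouched. The helper name `withDotZero` is hypothetical.

```typescript
// Hypothetical helper: append ".0" when an ECMA-262 style number string
// consists only of digits with an optional leading minus sign.
function withDotZero(s: string): string {
  return /^-?[0-9]+$/.test(s) ? s + ".0" : s;
}

// JavaScript prints integral doubles without a fractional part.
const plain: string = String(1.0);
// With the suffix, the value is recognizable as a float.
const marked: string = withDotZero(plain);
```

Strings that already contain a dot, an exponent, or non-digit text (such as `NaN`) pass through unchanged, which is the same check the `dtoa_dotZero` hunk in this PR performs per character.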

MaxGraey · 2026-06-17T23:06:22Z

Can you rename xjb.ts to dtoa.ts and mention in the comments that the implementation is based on xjb and zmij? The xjb filename is very exotic and doesn't explain what this file does.

Co-authored-by: Max Graey <maxgraey@gmail.com>

MaxGraey · 2026-06-17T23:25:34Z

+let gBcd: u64 = 0;
+let gBcdLen: i32 = 0;


gBcd -> gBcdVal?

MaxGraey · 2026-06-17T23:28:11Z

+export let gDigHi: u64 = 0;
+export let gDigLo: u64 = 0;
+export let gDigNum: i32 = 0;


You have a lot of global vars. Do we need all of them? And can we declare all the necessary vars in one place?

Co-authored-by: Max Graey <maxgraey@gmail.com>

dcodeIO · 2026-06-17T23:31:34Z

+ (data $5 (i32.const 1239) "\80\00\00\00\00\00\00\00\a0\00\00\00\00\00\00\00\c8\00\00\00\00\00\00\00\fa\00\00\00\00\00\00@\9c\00\00\00\00\00\00P\c3\00\00\00\00\00\00$\f4\00\00\00\00\00\80\96\98\00\00\00\00\00 \bc\be\00\00\00\00\00(k\ee\00\00\00\00\00\f9\02\95\00\00\00\00@\b7C\ba\00\00\00\00\10\a5\d4\e8\00\00\00\00*\e7\84\91\00\00\00\80\f4 \e6\b5\00\00\00\a01\a9_\e3\00\00\00\04\bf\c9\1b\8e\00\00\00\c5.\bc\a2\b1\00\00@v:k\0b\de\00\00\e8\89\04#\c7\8a\00\00b\ac\c5\ebx\ad\00\80z\17\b7&\d7\d8\00\90\acn2x\86\87\00\b4W\n?\16h\a9\00\a1\ed\cc\ce\1b\c2\d3\a0\84\14@aQY\84\c8\a5\19\90\b9\a5o\a5:\0f \f4\'\8f\cb\ce")
+ (data $6 (i32.const 1456) "o\1b\8e(\10T\8e\af\daM\e4^\ae\f0\ec\07J\fb\9f\f4\98\'D\b1\9dwA\df\cf\11\cd\99\07\ef\99\85\0b?\fe\b2\15\aa\b4\dc\e6\a7\1f\86c\beZ\06\0b\a5\bc\b4\aaSkuz\07\ed\0f\08\bf,)Ud\7f\b6C\d5\b1\17L\c8;\1a\fb;\efi\c2\87F\b8B\a7\ee@OQ]=\eb\dd\e4PF\1a\12\ba\13\e4labM\f3\92\ea\af(\b6\ef&\e2\bb\8c6U\n\f7\89\04\89\0f`\cb\05\e9\b8\b6\bd!\c9\c1\bb\87\e9\00T\96_\9a\84x\db\8f\bf4\d0\bdr\04R\98\de\'\8a\92\95\00\9am\c1\94\82\17\0f<\05\b7u\00\00\00\00\00\00P\c3\00\00\00\00\00\00\00\00\05\e3L6\12\197\c5\00\00\00\00\00\00(l\d6\aa\80\9d\ef\f0\"\c7\f6~\b9\b7\d2:MBL\c8q\d5m\93\13\c9\ea8\1e\cd\19:\bc\03\1cU\ab\01\80\0c\t\cb\c6,\07\d3\bf\f5\ad\\\a1\90\08\137h\03\cd\10\8cz\c3\87\a8\db6.\ef\07\12\c2\b2\02\cf\bc\f4\03^\e4g\f9\94\c7\85\d7in\f8\06\d1R\ba\be\01\d763\e1|\a0\1c4\a8E\10\d3Q\a0\t\12\11H\de\1e1Vx\85\fa\a6\1e\d5f\a5>\7f\"t*U3\f1\ca\ba\0f)2\d7\96@\adGy\17|\a9t\088\c7\b1\d8J\d9\bc\"x\ae\81R7\18")
+ (data $7 (i32.const 1824) "?6N\n@\18\00\00\00d\00\00@\00 $\00\00\00\00\00\00\00\0c\80\13\c8\82\1f\e0L^\0f\f60\d7\1b\00\00\00\00\00\00\00\fc\ff\f7\cd\d8\01\82n\d1?\cd@\01%d\db\r\r\00\00\00$\04\14@8qS\b4\1dx\11")
+ (data $9 (i32.const 2032) "p\\\ea{\ce2~\8f\1a\c7C\c6\b0\b7\96\e5\ae\05\03\05\'\c6\ab\b7\bf7\cf\d0\b8\d1\ef\92\fe%\e5\1a\8eO\19\eb2\ebP\e2\a4?\14\bc\f5\88\r\b5P\99v\96!\dbH\bb\1a\c2\bd\f0\b4\15\07\c9{\ce\97\c0]\11l:\96\0b\13\9a\c7\1b\e0\c3V\df\84\f6\06\e3L6\12\197\c5\9e\b5p+\a8\ad\c5\9d\97\"\81E@|o\fc\dfNg\04\cd\c9\f2\c9\e6\0b\b96\d7\07\8f\a1\85\t\94\f8x9?\81:\0f \f4\'\8f\cb\ce\c8\a5\19\90\b9\a5o\a5\a0\84\14@aQY\84\00\a1\ed\cc\ce\1b\c2\d3\00\b4W\n?\16h\a9\00\90\acn2x\86\87\00\80z\17\b7&\d7\d8\00\00b\ac\c5\ebx\ad\00\00\e8\89\04#\c7\8a\00\00@v:k\0b\de\00\00\00\c5.\bc\a2\b1\00\00\00\04\bf\c9\1b\8e\00\00\00\a01\a9_\e3\00\00\00\80\f4 \e6\b5\00\00\00\00*\e7\84\91\00\00\00\00\10\a5\d4\e8\00\00\00\00@\b7C\ba\00\00\00\00\00\f9\02\95\00\00\00\00\00(k\ee\00\00\00\00\00 \bc\be\00\00\00\00\00\80\96\98\00\00\00\00\00\00$\f4\00\00\00\00\00\00P\c3\00\00\00\00\00\00@\9c\00\00\00\00\00\00\00\fa\00\00\00\00\00\00\00\c8\00\00\00\00\00\00\00\a0\00\00\00\00\00\00\00\80\cd\cc\cc\cc\cc\cc\cc\cc\0b\d7\a3p=\n\d7\a3<\dfO\8d\97n\12\83,e\19\e2X\17\b7\d1$\84G\1bG\ac\c5\a7\b6il\af\05\bd7\86\bdBz\e5\d5\94\bf\d6\fd\cea\84\11w\cc\ab\98\a5\b46A_p\89\bf\d5\ed\bd\ce\fe\e6\db\ff\aa$\cb\0b\ff\eb\af\cc\88Po\t\cc\bc\8c\14\0e\b4KB\13.\e1\10\d8\\\t5\dc$\b4\da\ac\b0:\f7|\1d\90\\\e1M\c4\be\94\95\e6J\b4\a462\aaw\b8\08]\1d\92\8e\ee\92\93\a6a\95\b6}J\1e\ec\eb\1a\11\92d\08\e5\bc\ef{\datP\a0\1d\97\b2,\f7\ba\80\00\c9\f1(\8a\92\95\00\9am\c1S;uD\cd\14\be\9aR\c5\ee\d3\ae\87\96\f7\db\9dXv%\06\12\c6I~\e0\91\b7\d1t\9e\0e\ca\00\83\f2\b5\87\fd?;\9a5\f5\f7\d2\ca2\fc\14^\f7_B\a2\f5\fcCK,\b3\ce\81\bb\949E\ad\1e\b1\cf")
+ (data $10 (i32.const 2648) "\"\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$\"#$!\"#")
+ (data $11 (i32.const 2908) ",")
+ (data $11.1 (i32.const 2920) "\02\00\00\00\1c\00\00\00I\00n\00v\00a\00l\00i\00d\00 \00l\00e\00n\00g\00t\00h")
+ (data $12 (i32.const 2956) "<")
+ (data $12.1 (i32.const 2968) "\02\00\00\00&\00\00\00~\00l\00i\00b\00/\00a\00r\00r\00a\00y\00b\00u\00f\00f\00e\00r\00.\00t\00s")
+ (data $13 (i32.const 3020) "<")
+ (data $13.1 (i32.const 3032) "\02\00\00\00(\00\00\00A\00l\00l\00o\00c\00a\00t\00i\00o\00n\00 \00t\00o\00o\00 \00l\00a\00r\00g\00e")
+ (data $14 (i32.const 3084) "<")
+ (data $14.1 (i32.const 3096) "\02\00\00\00 \00\00\00~\00l\00i\00b\00/\00r\00t\00/\00i\00t\00c\00m\00s\00.\00t\00s")
+ (data $17 (i32.const 3212) "<")
+ (data $17.1 (i32.const 3224) "\02\00\00\00$\00\00\00I\00n\00d\00e\00x\00 \00o\00u\00t\00 \00o\00f\00 \00r\00a\00n\00g\00e")
+ (data $18 (i32.const 3276) ",")
+ (data $18.1 (i32.const 3288) "\02\00\00\00\14\00\00\00~\00l\00i\00b\00/\00r\00t\00.\00t\00s")
+ (data $20 (i32.const 3356) "<")
+ (data $20.1 (i32.const 3368) "\02\00\00\00\1e\00\00\00~\00l\00i\00b\00/\00r\00t\00/\00t\00l\00s\00f\00.\00t\00s")
+ (data $21 (i32.const 3420) "\1c")
+ (data $21.1 (i32.const 3432) "\02")
+ (data $22 (i32.const 3452) "<")
+ (data $22.1 (i32.const 3464) "\02\00\00\00$\00\00\00~\00l\00i\00b\00/\00t\00y\00p\00e\00d\00a\00r\00r\00a\00y\00.\00t\00s")
+ (data $23 (i32.const 3516) "<")
+ (data $23.1 (i32.const 3528) "\02\00\00\00&\00\00\00~\00l\00i\00b\00/\00s\00t\00a\00t\00i\00c\00a\00r\00r\00a\00y\00.\00t\00s")
+ (data $24 (i32.const 3580) ",")
+ (data $24.1 (i32.const 3592) "\02\00\00\00\1a\00\00\00~\00l\00i\00b\00/\00a\00r\00r\00a\00y\00.\00t\00s")
+ (data $25 (i32.const 3628) "|")
+ (data $25.1 (i32.const 3640) "\02\00\00\00^\00\00\00E\00l\00e\00m\00e\00n\00t\00 \00t\00y\00p\00e\00 \00m\00u\00s\00t\00 \00b\00e\00 \00n\00u\00l\00l\00a\00b\00l\00e\00 \00i\00f\00 \00a\00r\00r\00a\00y\00 \00i\00s\00 \00h\00o\00l\00e\00y")
+ (data $26 (i32.const 3756) "<")
+ (data $26.1 (i32.const 3768) "\02\00\00\00*\00\00\00O\00b\00j\00e\00c\00t\00 \00a\00l\00r\00e\00a\00d\00y\00 \00p\00i\00n\00n\00e\00d")
+ (data $27 (i32.const 3820) "<")
+ (data $27.1 (i32.const 3832) "\02\00\00\00(\00\00\00O\00b\00j\00e\00c\00t\00 \00i\00s\00 \00n\00o\00t\00 \00p\00i\00n\00n\00e\00d")
+ (data $28 (i32.const 3888) "\10\00\00\00 \00\00\00 \00\00\00 ")
+ (data $28.1 (i32.const 3912) "\81\08\00\00\01\19\00\00\01\02\00\00$\t\00\00\a4\00\00\00$\n\00\00\02\t\00\00\02A\00\00\00\00\00\00A\00\00\00 ")


Wondering if the baseline amount of static memory can be reduced here by computing the parts of it that aren't strictly necessary to have as LUTs?

Co-authored-by: Max Graey <maxgraey@gmail.com>

…ript into jairus/switch-to-xjb

MaxGraey · 2026-06-18T09:00:16Z

+function dtoa_dotZero(buffer: usize, len: u32): u32 {
+  let p = buffer;
+  let end = buffer + (<usize>len << 1);
+  while (p < end) {
+    let c = <i32>load<u16>(p);
+    if ((c < CharCode._0 || c > CharCode._9) && c != CharCode.MINUS) return len;
+    p += 2;
  }
+  store<u16>(end, CharCode.DOT);
+  store<u16>(end, CharCode._0, 2);
+  return len + 2;
 }


This looks like a post-processing stage. Can we optimize this and do it in place during the main processing stage?

MaxGraey · 2026-06-18T09:02:07Z

+    len = ftoa_buffered_single(dtoa_buf, <f32>value);
+  } else {
+    // @ts-ignore: type
+    len = dtoa_buffered_double(dtoa_buf, <f64>value);


ftoa and dtoa already hint at the type, so simplify the names to:

ftoa_buffered and dtoa_buffered

MaxGraey · 2026-06-18T09:04:01Z

+
+// 28 normalized exact powers 10**0..10**27 - the within-stride minor factors.
+// @ts-ignore: decorator
+@lazy const POW10_MINOR = memory.data<u64>([


Please move all constants and LUT tables to the top.

MaxGraey · 2026-06-18T09:06:06Z

+const DIV10K_SIG: u64 = ((<u64>1) << DIV10K_EXP) / 10000 + 1;
+const NEG10K: u64 = ((<u64>1) << 32) - 10000;
+
+export const ZEROS: u64 = 0x3030303030303030;


Suggested change

-export const ZEROS: u64 = 0x3030303030303030;
+export const BCD_ZEROS: u64 = 0x3030303030303030;

MaxGraey · 2026-06-18T09:08:30Z

+  // bswap to big-endian so the most-significant digit lands in the high byte
+  let bcd = bswap<u64>(singles);
+  gBcd = bcd;
+  gBcdLen = <i32>((70 - clz<u64>((bcd << 1) | 1)) / 8);


What does 70 mean here? Can you make it a local const with a meaningful name, or add a comment?

Also, as I mentioned, we can remove gBcdLen and return it as a local var from this function, leaving only gBcd as a global var.
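
For readers puzzling over the same constant, one way to read `(70 - clz(x << 1 | 1)) / 8` is as `ceil(bitLength(x) / 8)`, the number of bytes needed to hold `x`: `64 - clz` gives the bit length of `(x << 1) | 1`, which is `bitLength(x) + 1`, and the remaining `6` turns the floor division into a ceiling. The `| 1` keeps the result defined for `x == 0`. This is a reading of the arithmetic, not something the PR states. The sketch below models `u64` with `bigint`, and the names `BITS_PLUS_ROUND_UP`, `clz64` and `byteLength` are hypothetical.

```typescript
const MASK64 = (1n << 64n) - 1n;
// 64 (word size) + 6, so that (bitLength + 7) / 8 rounds up to whole bytes.
const BITS_PLUS_ROUND_UP = 70;

// Leading zero bits of a value treated as an unsigned 64-bit integer.
function clz64(x: bigint): number {
  x &= MASK64;
  return x === 0n ? 64 : 64 - x.toString(2).length;
}

// Number of bytes occupied by x, computed the same way as the hunk above.
// Assumes the top bit of x is clear, which holds for packed digits 0..9.
function byteLength(x: bigint): number {
  const probe = ((x << 1n) | 1n) & MASK64;
  return Math.floor((BITS_PLUS_ROUND_UP - clz64(probe)) / 8);
}

// Straightforward reference: ceil(bitLength / 8), with 0 bytes for x == 0.
function byteLengthReference(x: bigint): number {
  return x === 0n ? 0 : Math.ceil(x.toString(2).length / 8);
}
```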

MaxGraey · 2026-06-18T09:18:38Z

+
+// High 64 bits of the 128-bit product x * y. Matches umul128.
+// @ts-ignore: decorator
+@inline export function mulhi64(a: u64, b: u64): u64 {


mulhi64 -> umul64hi

MaxGraey · 2026-06-18T09:20:28Z

+
+// Returns (x * y + c) >> 64.
+// @ts-ignore: decorator
+@inline export function umul128AddHi64(x: u64, y: u64, c: u64): u64 {


umul64hiCarry(a: u64, b: u64, carry: u64): u64
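
As a reference for what these two helpers compute, here is a hedged TypeScript sketch that models `u64` with `bigint`. It uses the reviewer's proposed names, but the bodies are a generic 32-bit limb decomposition, not the PR's implementation.

```typescript
const MASK32 = 0xffffffffn;
const MASK64 = (1n << 64n) - 1n;

// High 64 bits of the 128-bit product a * b, built from 32-bit limbs the way
// it would be done on a machine without a native 64x64 -> 128 multiply.
function umul64hi(a: bigint, b: bigint): bigint {
  const aLo = a & MASK32, aHi = a >> 32n;
  const bLo = b & MASK32, bHi = b >> 32n;
  const ll = aLo * bLo;
  const lh = aLo * bHi;
  const hl = aHi * bLo;
  const hh = aHi * bHi;
  // Carry out of the middle 32-bit column.
  const mid = (ll >> 32n) + (lh & MASK32) + (hl & MASK32);
  return (hh + (lh >> 32n) + (hl >> 32n) + (mid >> 32n)) & MASK64;
}

// (a * b + carry) >> 64: the high word plus the carry out of the low word.
function umul64hiCarry(a: bigint, b: bigint, carry: bigint): bigint {
  const lo = (a * b) & MASK64;
  return (umul64hi(a, b) + ((lo + carry) >> 64n)) & MASK64;
}
```

With `bigint` the limb split is not strictly needed, since `(a * b) >> 64n` gives the same answer directly. It is kept here to mirror what fixed-width code has to do.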

MaxGraey · 2026-06-18T09:21:51Z

+
+// Shift that keeps a fixed 128-bit fractional part after scaling by 10**dec_exp.
+// @ts-ignore: decorator
+@inline export function computeExpShift(binExp: i32, decExp: i32): i32 {


Suggested change

-@inline export function computeExpShift(binExp: i32, decExp: i32): i32 {
+@inline export function exponentShift(binExp: i32, decExp: i32): i32 {

MaxGraey · 2026-06-18T09:22:52Z

+// floor(log10(2**bin_exp)). (The f64 path only ever needs the regular form; the
+// irregular 3/4 variant lives in ftoa.ts's own copy.)
+// @ts-ignore: decorator
+@inline export function computeDecExp(binExp: i32): i32 {


Suggested change

-@inline export function computeDecExp(binExp: i32): i32 {
+@inline export function toDecExponent(binExp: i32): i32 {

MaxGraey · 2026-06-18T09:25:27Z

+// major[(i+10)/28] * minor[(i+10)%28], normalized left if the top bit is clear,
+// then the per-power fixup bit subtracted off the low limb.
+// @ts-ignore: decorator
+@inline function computePow10(i: i32): void {


Suggested change

-@inline function computePow10(i: i32): void {
+@inline function pow10(exp: i32): u64 {
  ...
  return lo;
}

This also removes one global var. Please do the same for the other similar functions that store the low u64 part in a global instead of returning it.

MaxGraey · 2026-06-18T09:26:53Z

+@inline export function loadPow10Xjb64(power: i32): void {
+  computePow10(power + 293);
+  gPow10Lo += u64(power < 0);
+}
+
+// @ts-ignore: decorator
+@inline export function loadPow10HiXjb64(power: i32): u64 {
+  computePow10(power + 293);
+  return gPow10Hi;
+}


These two functions are too small and trivial. Inline into call sites and remove them

MaxGraey · 2026-06-18T09:37:34Z

+    store<u16>(base, <u16>(ascii & 0xff));
+    store<u16>(base, <u16>((ascii >> 8) & 0xff), 2);
+    store<u16>(base, <u16>((ascii >> 16) & 0xff), 4);
+    store<u16>(base, <u16>((ascii >> 24) & 0xff), 6);
+    store<u16>(base, <u16>((ascii >> 32) & 0xff), 8);
+    store<u16>(base, <u16>((ascii >> 40) & 0xff), 10);
+    store<u16>(base, <u16>((ascii >> 48) & 0xff), 12);
+    store<u16>(base, <u16>(ascii >> 56), 14);


This is an ASCII -> UTF-16 widening, and it can be done more efficiently with the typical interleave-with-zero idiom:

Suggested change

-    store<u16>(base, <u16>(ascii & 0xff));
-    store<u16>(base, <u16>((ascii >> 8) & 0xff), 2);
-    store<u16>(base, <u16>((ascii >> 16) & 0xff), 4);
-    store<u16>(base, <u16>((ascii >> 24) & 0xff), 6);
-    store<u16>(base, <u16>((ascii >> 32) & 0xff), 8);
-    store<u16>(base, <u16>((ascii >> 40) & 0xff), 10);
-    store<u16>(base, <u16>((ascii >> 48) & 0xff), 12);
-    store<u16>(base, <u16>(ascii >> 56), 14);
+    let lo = ascii & 0xFFFFFFFF;
+    let hi = ascii >> 32;
+    lo = (lo | (lo << 16)) & 0x0000FFFF0000FFFF;
+    hi = (hi | (hi << 16)) & 0x0000FFFF0000FFFF;
+    lo = (lo | (lo << 8)) & 0x00FF00FF00FF00FF;
+    hi = (hi | (hi << 8)) & 0x00FF00FF00FF00FF;
+    store<u64>(base, lo, 0);
+    store<u64>(base, hi, 8);
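
The idiom can be checked outside AssemblyScript. The following TypeScript sketch models `u64` with `bigint` and writes the two widened words into a byte buffer with little-endian stores, as a WebAssembly `store<u64>` would. The function names `widen4`, `asciiToUtf16` and `decodeUtf16LE` are hypothetical.

```typescript
const LANES16 = 0x0000ffff0000ffffn;
const LANES8 = 0x00ff00ff00ff00ffn;

// Spread the four low bytes of x so that each one occupies the low byte of
// its own 16-bit lane, with a zero byte above it.
function widen4(x: bigint): bigint {
  x &= 0xffffffffn;
  x = (x | (x << 16n)) & LANES16;
  x = (x | (x << 8n)) & LANES8;
  return x;
}

// Eight packed ASCII bytes -> 16 bytes of UTF-16LE via two 64-bit stores.
function asciiToUtf16(ascii: bigint): Uint8Array {
  const out = new Uint8Array(16);
  const view = new DataView(out.buffer);
  view.setBigUint64(0, widen4(ascii), true);
  view.setBigUint64(8, widen4(ascii >> 32n), true);
  return out;
}

// Read the buffer back as a string, for checking only.
function decodeUtf16LE(bytes: Uint8Array): string {
  const view = new DataView(bytes.buffer);
  let s = "";
  for (let i = 0; i < bytes.length; i += 2) s += String.fromCharCode(view.getUint16(i, true));
  return s;
}
```

For example, the packed value `0x3837363534333231n` (the bytes `'1'` through `'8'`, lowest byte first) decodes back to `"12345678"`.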

MaxGraey · 2026-06-18T09:42:01Z

+// Eight packed ASCII digits in a u64 -> 8 UTF-16 code units (16 bytes) at
+// `p + off`. SIMD zero-extends the bytes to u16 lanes in one store.
+// @ts-ignore: decorator
+@inline export function putBlock8(p: usize, ascii: u64, off: usize = 0): void {


Also, putBlock8 is not an ideal name. I think writeUnpacked8, writeDeinterleved64toUtf16, or something similar would be better.

MaxGraey · 2026-06-18T09:43:50Z

+  if (decExp < 0) putBlock8(start, ZEROS);
+  let lastDigitChar = <u64>(0x30 + (hasLastDigit ? gLastDigit : 0));
+  let numDigits = hasLastDigit ? 16 : gDigits - 1;
+  let dHi = gDigHi, dLo = gDigLo;


Suggested change

-  let dHi = gDigHi, dLo = gDigLo;
+  let dHi = gDigHi;
+  let dLo = gDigLo;

MaxGraey · 2026-06-18T09:44:57Z

+      fHi = (dHi >> s) | (dLo << (64 - s));
+      fLo = (dLo >> s) | (d16 << (64 - s));
+    } else if (s == 64) {
+      fHi = dLo; fLo = d16;


Suggested change

-      fHi = dLo; fLo = d16;
+      fHi = dLo;
+      fLo = d16;

MaxGraey · 2026-06-18T09:46:57Z

+  store<u16>(buf, 0x65); // 'e'
+  store<u16>(buf, 0x2b + (m & 2), 2); // '+' / '-' branchlessly


We have CharCode enum for such char literals

import { CharCode } from "./string";
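
The branchless sign in the hunk above works because `'-'` (0x2d) is exactly 2 above `'+'` (0x2b), so adding a mask of 0 or 2 selects between them. How `m` is derived in the PR is not visible in the hunk; the TypeScript sketch below assumes the mask comes from an arithmetic shift of a 32-bit exponent, and `exponentSign` is a hypothetical name.

```typescript
const PLUS = 0x2b; // '+'; '-' is 0x2d, two code points higher

// For a 32-bit integer, exp >> 31 is 0 when exp >= 0 and -1 (all ones) when
// exp < 0, so masking with 2 yields 0 or 2 without a branch.
function exponentSign(exp: number): string {
  const mask = (exp | 0) >> 31;
  return String.fromCharCode(PLUS + (mask & 2));
}
```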

MaxGraey · 2026-06-18T09:52:14Z

+@inline function setDecimalResult(integral: u64, one: u64, decExp: i32): void {
+  if (one == 10) {
+    gSig = <i64>(integral + 1);
+    gLastDigit = 0;
+    gHasLastDigit = false;
+  } else if (one == 0) {
+    gSig = <i64>integral;
+    gLastDigit = 0;
+    gHasLastDigit = false;
+  } else {
+    gSig = <i64>integral;
+    gLastDigit = <i32>one;
+    gHasLastDigit = true;
+  }
+  gExp = decExp;
+}


What if we inline this function at all 3 call sites? Would that let us eliminate or reduce the number of such global vars? It really bothers me that there are so many global vars even though most of them could be stored locally and passed as arguments, and at least one could be returned from the function. Extra global vars should be a fallback approach for when we need to return multiple values.

MaxGraey · 2026-06-18T09:59:02Z

+    let decExp = (q * LOG10_2_SIGNIFICAND - 131072) >> LOG10_2_EXP; // 131072 = 2**17 rounding bias
+    let powExp = -decExp - 1;
+    let h = q + ((powExp * LOG2_POW10_SIGNIFICAND) >> LOG2_POW10_EXP);
+
+    let pow10Hi = loadPow10HiXjb64(powExp);
+
+    let integral = pow10Hi >> (11 - h);
+    let halfUlp = pow10Hi >> (-h);
+    let dotOne = pow10Hi << (53 + h);
+
+    let one = ((((dotOne >> (53 + h)) * 5) + (((<u64>1) << (9 - h)))) >> (10 - h));
+    one = ((((dotOne >> 54) * 5) & 0x1ff) > ((halfUlp >> 55) * 5))
+      ? ((((dotOne >> 54) * 5) >> 9) + 1)
+      : one;
+    one = dotOne == ((<u64>1) << 62) ? 2 : one;


How about inverting the sign of powExp in the first place? Then it will be more optimal:

Suggested change

-    let decExp = (q * LOG10_2_SIGNIFICAND - 131072) >> LOG10_2_EXP; // 131072 = 2**17 rounding bias
-    let powExp = -decExp - 1;
-    let h = q + ((powExp * LOG2_POW10_SIGNIFICAND) >> LOG2_POW10_EXP);
-
-    let pow10Hi = loadPow10HiXjb64(powExp);
-
-    let integral = pow10Hi >> (11 - h);
-    let halfUlp = pow10Hi >> (-h);
-    let dotOne = pow10Hi << (53 + h);
-
-    let one = ((((dotOne >> (53 + h)) * 5) + (((<u64>1) << (9 - h)))) >> (10 - h));
-    one = ((((dotOne >> 54) * 5) & 0x1ff) > ((halfUlp >> 55) * 5))
-      ? ((((dotOne >> 54) * 5) >> 9) + 1)
-      : one;
-    one = dotOne == ((<u64>1) << 62) ? 2 : one;
+    let decExp = (q * LOG10_2_SIGNIFICAND - 131072) >> LOG10_2_EXP; // 131072 = 2**17 rounding bias
+    let powExp = decExp + 1;
+    let h = ((powExp * LOG2_POW10_SIGNIFICAND) >> LOG2_POW10_EXP) - q;
+
+    let pow10Hi = computePow10(293 - powExp);
+    gPow10Hi = pow10Hi;
+
+    let integral = pow10Hi >> (11 + h);
+    let halfUlp = pow10Hi >> h;
+    let dotOne = pow10Hi << (53 - h);
+
+    let one = ((((dotOne >> (53 - h)) * 5) + (((<u64>1) << (9 + h)))) >> (10 + h));
+    one = ((((dotOne >> 54) * 5) & 0x1ff) > ((halfUlp >> 55) * 5))
+      ? ((((dotOne >> 54) * 5) >> 9) + 1)
+      : one;
+    one = dotOne == ((<u64>1) << 62) ? 2 : one;

I also recommend moving ((dotOne >> 54) * 5) and ((dotOne >> (53 - h)) * 5) out of the expressions into locals.
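
One detail worth checking before applying a sign inversion like this: an arithmetic right shift rounds toward negative infinity, so `(-x * S) >> E` is generally not the negation of `(x * S) >> E`. The TypeScript sketch below compares the two definitions of `h`. The constants are an illustrative fixed-point approximation of log2(10), not necessarily the ones used in the PR, and `hOriginal` / `hInverted` are hypothetical names.

```typescript
// Illustrative constants: 1741647 / 2**19 is approximately log2(10) = 3.3219...
// These are assumptions for the sketch, not values taken from the PR.
const LOG2_POW10_SIGNIFICAND = 1741647;
const LOG2_POW10_EXP = 19;

// floor(e * log2(10)) for small |e|; the shift floors for negative inputs too.
function floorLog2Pow10(e: number): number {
  return (e * LOG2_POW10_SIGNIFICAND) >> LOG2_POW10_EXP;
}

// h as computed in the original hunk.
function hOriginal(q: number, decExp: number): number {
  const powExp = -decExp - 1;
  return q + floorLog2Pow10(powExp);
}

// h as computed with the sign of powExp inverted.
function hInverted(q: number, decExp: number): number {
  const powExp = decExp + 1;
  return floorLog2Pow10(powExp) - q;
}
```

With these constants, `hInverted` equals `-hOriginal - 1` whenever `powExp` is nonzero, rather than `-hOriginal`. Whether that offset matters depends on the PR's real constants and on how the downstream shifts are defined, so it is something to cover with tests rather than a claim that the suggestion is wrong.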

MaxGraey · 2026-06-18T11:32:10Z

+  // bswap to big-endian so the most-significant digit lands in the high byte
+  let bcd = bswap<u64>(singles);
+  gBcd = bcd;
+  gBcdLen = <i32>((70 - clz<u64>((bcd << 1) | 1)) / 8);


Also I found a bug. It should be:

gBcdLen = <i32>((70 - clz<u64>((singles << 1) | 1)) / 8);

value	fixed gBcdLen	wrong (current) gBcdLen
1	1	8
78	2	8
1234	4	8
10	2	7
100	3	6
10000	5	4
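
The table can be reproduced with a short TypeScript model, under the assumption that `singles` holds one decimal digit per byte with the least-significant digit in the low byte (the layout that matches every row above; the PR's actual packing is not shown in the hunk). `bigint` stands in for `u64`, and all helper names are hypothetical.

```typescript
const MASK64 = (1n << 64n) - 1n;

// Leading zero bits of an unsigned 64-bit value.
function clz64(x: bigint): number {
  x &= MASK64;
  let n = 64;
  while (x > 0n) { x >>= 1n; n--; }
  return n;
}

// Reverse the byte order of an unsigned 64-bit value.
function bswap64(x: bigint): bigint {
  let r = 0n;
  for (let i = 0; i < 8; i++) { r = (r << 8n) | (x & 0xffn); x >>= 8n; }
  return r;
}

// Assumed layout: one digit per byte, least-significant digit in the low byte.
function packSingles(value: number): bigint {
  let r = 0n;
  let shift = 0n;
  do {
    r |= BigInt(value % 10) << shift;
    shift += 8n;
    value = Math.floor(value / 10);
  } while (value > 0);
  return r;
}

// The length expression from the hunk, applied to an arbitrary word.
function lenFrom(x: bigint): number {
  return Math.floor((70 - clz64(((x << 1n) | 1n) & MASK64)) / 8);
}

const fixedLen = (v: number): number => lenFrom(packSingles(v));
const currentLen = (v: number): number => lenFrom(bswap64(packSingles(v)));
```

Under that assumption, `fixedLen` returns the digit count for each value in the table, while `currentLen` measures the byte-swapped word and so depends on where the last digit ended up.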

MaxGraey · 2026-06-18T11:44:46Z

Also, please add extra tests that cover the new implementation-specific edge cases (I already found a bug) and uncomment the existing test cases that failed with grisu2.

JairusSW added 3 commits June 8, 2026 21:45

chore: add xjb-as to util

5705f06

Signed-off-by: Jairus Tanaka <me@jairus.dev>

chore: add liceses to NOTICE

c778be9

Signed-off-by: Jairus Tanaka <me@jairus.dev>

chore: update changed tests

1fdc61d

Signed-off-by: Jairus Tanaka <me@jairus.dev>

JairusSW marked this pull request as draft June 9, 2026 06:04

JairusSW added 5 commits June 16, 2026 16:00

chore: simplify and use smaller tables

09c3ef8

chore: use table from number instead

7cd14a4

chore: update notice

e0646ff

chore: clean up

650a7c3

chore: clean up

759c01f

JairusSW marked this pull request as ready for review June 17, 2026 18:13