feat: switch grisu2 float-to-string algorithm to hybrid of xjb & zmij algorithms #3025

Open
JairusSW wants to merge 30 commits into AssemblyScript:main from JairusSW:jairus/switch-to-xjb

Conversation

@JairusSW

@JairusSW JairusSW commented Jun 9, 2026

Contributor

Fixes #3012.

Changes proposed in this pull request:
⯈ Switch to xjb-as, which is an improvement over zmij-as
⯈ Comply with the ECMA-262 specification, with the exception of a trailing .0

Note: I wrote xjb-as with Claude assisting me on the initial port, then optimized it by hand. Everything is carefully checked and passes over a trillion fuzz cases compared directly against Number::parse() in V8. It also reaches 100% code coverage and tests every edge case that both Claude and I could think of.

It's good quality stuff though, and I'm quite confident in that.

Here are some performance notes; xjb-as/README.md has more extensive ones. These benchmarks were taken on an AMD 7800X3D.

[Chart: dtoa (f64) vs the AssemblyScript stdlib]

[Chart: ftoa (f32) vs the AssemblyScript stdlib]

[Chart: dtoa (f64) per-stage latency]

| program uses | grisu2 | xjb | Δ |
| --- | --- | --- | --- |
| f64 only | 4.56 | 6.84 | +2.28 |
| f32 only | 4.56 | 4.61 | +0.05 |
| both | 5.17 | 8.97 | +3.80 |

Lookup-table footprint vs the old grisu2 implementation (omitting shared tables):

| program uses | grisu2 | xjb | Δ |
| --- | --- | --- | --- |
| f64 (dtoa) only | 910 B | 672 B | −238 B |
| f32 (ftoa) only | 910 B | 616 B | −294 B |
| both | 910 B | 1288 B | +378 B |
  • I've read the contributing guidelines
  • I've added my name and email to the NOTICE file

P.S. Sorry about the diff. Most of it is just the wasm files.

JairusSW added 3 commits June 8, 2026 21:45
Signed-off-by: Jairus Tanaka <me@jairus.dev>
Signed-off-by: Jairus Tanaka <me@jairus.dev>
Signed-off-by: Jairus Tanaka <me@jairus.dev>
@JairusSW JairusSW marked this pull request as draft June 9, 2026 06:04
@JairusSW JairusSW marked this pull request as ready for review June 17, 2026 18:13
@PaperPrototype


Just curious here, why the exception of a trailing .0?

@JairusSW

JairusSW commented Jun 17, 2026

Contributor Author

> Just curious here, why the exception of a trailing .0?

Max and Dan made that decision so that floats are easily identifiable once they are converted to a string. For example, you know that the string "1.0" is the result of f64.toString(), while the string "1" is likely the result of i32.toString(). It's a slight deviation from the ECMA-262 spec, but it's not awful.
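
To make the rule concrete, here is a minimal TypeScript sketch of the behavior described above. It follows the same test as the dtoa_dotZero helper quoted later in this review (append ".0" only when the text consists of digits and '-'), but it works on ordinary strings and uses a hypothetical name, withFloatMarker, so treat it as an illustration rather than the PR's code.

```typescript
// Hypothetical helper: append ".0" when the text contains only digits and '-'.
function withFloatMarker(text: string): string {
  for (let i = 0; i < text.length; i++) {
    const c = text.charCodeAt(i);
    const isDigit = c >= 0x30 && c <= 0x39; // '0'..'9'
    const isMinus = c === 0x2d;             // '-'
    // Anything else ('.', 'e', '+', letters) means the text already reads as a float.
    if (!isDigit && !isMinus) return text;
  }
  return text + ".0";
}

console.log(withFloatMarker(String(1)));    // "1.0"
console.log(withFloatMarker(String(-42)));  // "-42.0"
console.log(withFloatMarker(String(1.5)));  // "1.5"
console.log(withFloatMarker(String(1e21))); // "1e+21"
```

Strings such as "NaN" and "Infinity" pass through unchanged because they contain letters.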

@MaxGraey

MaxGraey commented Jun 17, 2026

Member

Can you rename xjb.ts to dtoa.ts and mention in the comments that the implementation is based on xjb and zmij? The xjb filename is very exotic and doesn't explain what this file does.

Comment thread std/assembly/util/dtoa.ts
Comment thread std/assembly/util/xjb.ts Outdated
Comment thread std/assembly/util/dtoa.ts
Comment thread std/assembly/util/dtoa.ts Outdated
Comment thread std/assembly/util/dtoa.ts Outdated
JairusSW and others added 2 commits June 17, 2026 16:23
Co-authored-by: Max Graey <maxgraey@gmail.com>
Co-authored-by: Max Graey <maxgraey@gmail.com>
Comment thread std/assembly/util/dtoa.ts Outdated
Comment thread std/assembly/util/dtoa.ts
Comment on lines +135 to +136
let gBcd: u64 = 0;
let gBcdLen: i32 = 0;

Member

gBcd -> gBcdVal?

Comment thread std/assembly/util/dtoa.ts Outdated
Comment thread std/assembly/util/dtoa.ts Outdated
Comment on lines +151 to +153
export let gDigHi: u64 = 0;
export let gDigLo: u64 = 0;
export let gDigNum: i32 = 0;

Member

You have a lot of global vars. Do we need all of them? And can we declare all of the necessary vars in one place?

Comment thread std/assembly/util/dtoa.ts Outdated
Co-authored-by: Max Graey <maxgraey@gmail.com>
Comment thread tests/compiler/bindings/raw.release.wat Outdated
Comment on lines +57 to +91
(data $5 (i32.const 1239) "\80\00\00\00\00\00\00\00\a0\00\00\00\00\00\00\00\c8\00\00\00\00\00\00\00\fa\00\00\00\00\00\00@\9c\00\00\00\00\00\00P\c3\00\00\00\00\00\00$\f4\00\00\00\00\00\80\96\98\00\00\00\00\00 \bc\be\00\00\00\00\00(k\ee\00\00\00\00\00\f9\02\95\00\00\00\00@\b7C\ba\00\00\00\00\10\a5\d4\e8\00\00\00\00*\e7\84\91\00\00\00\80\f4 \e6\b5\00\00\00\a01\a9_\e3\00\00\00\04\bf\c9\1b\8e\00\00\00\c5.\bc\a2\b1\00\00@v:k\0b\de\00\00\e8\89\04#\c7\8a\00\00b\ac\c5\ebx\ad\00\80z\17\b7&\d7\d8\00\90\acn2x\86\87\00\b4W\n?\16h\a9\00\a1\ed\cc\ce\1b\c2\d3\a0\84\14@aQY\84\c8\a5\19\90\b9\a5o\a5:\0f \f4\'\8f\cb\ce")
(data $6 (i32.const 1456) "o\1b\8e(\10T\8e\af\daM\e4^\ae\f0\ec\07J\fb\9f\f4\98\'D\b1\9dwA\df\cf\11\cd\99\07\ef\99\85\0b?\fe\b2\15\aa\b4\dc\e6\a7\1f\86c\beZ\06\0b\a5\bc\b4\aaSkuz\07\ed\0f\08\bf,)Ud\7f\b6C\d5\b1\17L\c8;\1a\fb;\efi\c2\87F\b8B\a7\ee@OQ]=\eb\dd\e4PF\1a\12\ba\13\e4labM\f3\92\ea\af(\b6\ef&\e2\bb\8c6U\n\f7\89\04\89\0f`\cb\05\e9\b8\b6\bd!\c9\c1\bb\87\e9\00T\96_\9a\84x\db\8f\bf4\d0\bdr\04R\98\de\'\8a\92\95\00\9am\c1\94\82\17\0f<\05\b7u\00\00\00\00\00\00P\c3\00\00\00\00\00\00\00\00\05\e3L6\12\197\c5\00\00\00\00\00\00(l\d6\aa\80\9d\ef\f0\"\c7\f6~\b9\b7\d2:MBL\c8q\d5m\93\13\c9\ea8\1e\cd\19:\bc\03\1cU\ab\01\80\0c\t\cb\c6,\07\d3\bf\f5\ad\\\a1\90\08\137h\03\cd\10\8cz\c3\87\a8\db6.\ef\07\12\c2\b2\02\cf\bc\f4\03^\e4g\f9\94\c7\85\d7in\f8\06\d1R\ba\be\01\d763\e1|\a0\1c4\a8E\10\d3Q\a0\t\12\11H\de\1e1Vx\85\fa\a6\1e\d5f\a5>\7f\"t*U3\f1\ca\ba\0f)2\d7\96@\adGy\17|\a9t\088\c7\b1\d8J\d9\bc\"x\ae\81R7\18")
(data $7 (i32.const 1824) "?6N\n@\18\00\00\00d\00\00@\00 $\00\00\00\00\00\00\00\0c\80\13\c8\82\1f\e0L^\0f\f60\d7\1b\00\00\00\00\00\00\00\fc\ff\f7\cd\d8\01\82n\d1?\cd@\01%d\db\r\r\00\00\00$\04\14@8qS\b4\1dx\11")
(data $9 (i32.const 2032) "p\\\ea{\ce2~\8f\1a\c7C\c6\b0\b7\96\e5\ae\05\03\05\'\c6\ab\b7\bf7\cf\d0\b8\d1\ef\92\fe%\e5\1a\8eO\19\eb2\ebP\e2\a4?\14\bc\f5\88\r\b5P\99v\96!\dbH\bb\1a\c2\bd\f0\b4\15\07\c9{\ce\97\c0]\11l:\96\0b\13\9a\c7\1b\e0\c3V\df\84\f6\06\e3L6\12\197\c5\9e\b5p+\a8\ad\c5\9d\97\"\81E@|o\fc\dfNg\04\cd\c9\f2\c9\e6\0b\b96\d7\07\8f\a1\85\t\94\f8x9?\81:\0f \f4\'\8f\cb\ce\c8\a5\19\90\b9\a5o\a5\a0\84\14@aQY\84\00\a1\ed\cc\ce\1b\c2\d3\00\b4W\n?\16h\a9\00\90\acn2x\86\87\00\80z\17\b7&\d7\d8\00\00b\ac\c5\ebx\ad\00\00\e8\89\04#\c7\8a\00\00@v:k\0b\de\00\00\00\c5.\bc\a2\b1\00\00\00\04\bf\c9\1b\8e\00\00\00\a01\a9_\e3\00\00\00\80\f4 \e6\b5\00\00\00\00*\e7\84\91\00\00\00\00\10\a5\d4\e8\00\00\00\00@\b7C\ba\00\00\00\00\00\f9\02\95\00\00\00\00\00(k\ee\00\00\00\00\00 \bc\be\00\00\00\00\00\80\96\98\00\00\00\00\00\00$\f4\00\00\00\00\00\00P\c3\00\00\00\00\00\00@\9c\00\00\00\00\00\00\00\fa\00\00\00\00\00\00\00\c8\00\00\00\00\00\00\00\a0\00\00\00\00\00\00\00\80\cd\cc\cc\cc\cc\cc\cc\cc\0b\d7\a3p=\n\d7\a3<\dfO\8d\97n\12\83,e\19\e2X\17\b7\d1$\84G\1bG\ac\c5\a7\b6il\af\05\bd7\86\bdBz\e5\d5\94\bf\d6\fd\cea\84\11w\cc\ab\98\a5\b46A_p\89\bf\d5\ed\bd\ce\fe\e6\db\ff\aa$\cb\0b\ff\eb\af\cc\88Po\t\cc\bc\8c\14\0e\b4KB\13.\e1\10\d8\\\t5\dc$\b4\da\ac\b0:\f7|\1d\90\\\e1M\c4\be\94\95\e6J\b4\a462\aaw\b8\08]\1d\92\8e\ee\92\93\a6a\95\b6}J\1e\ec\eb\1a\11\92d\08\e5\bc\ef{\datP\a0\1d\97\b2,\f7\ba\80\00\c9\f1(\8a\92\95\00\9am\c1S;uD\cd\14\be\9aR\c5\ee\d3\ae\87\96\f7\db\9dXv%\06\12\c6I~\e0\91\b7\d1t\9e\0e\ca\00\83\f2\b5\87\fd?;\9a5\f5\f7\d2\ca2\fc\14^\f7_B\a2\f5\fcCK,\b3\ce\81\bb\949E\ad\1e\b1\cf")
(data $10 (i32.const 2648) "\"\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$\"#$!\"#")
(data $11 (i32.const 2908) ",")
(data $11.1 (i32.const 2920) "\02\00\00\00\1c\00\00\00I\00n\00v\00a\00l\00i\00d\00 \00l\00e\00n\00g\00t\00h")
(data $12 (i32.const 2956) "<")
(data $12.1 (i32.const 2968) "\02\00\00\00&\00\00\00~\00l\00i\00b\00/\00a\00r\00r\00a\00y\00b\00u\00f\00f\00e\00r\00.\00t\00s")
(data $13 (i32.const 3020) "<")
(data $13.1 (i32.const 3032) "\02\00\00\00(\00\00\00A\00l\00l\00o\00c\00a\00t\00i\00o\00n\00 \00t\00o\00o\00 \00l\00a\00r\00g\00e")
(data $14 (i32.const 3084) "<")
(data $14.1 (i32.const 3096) "\02\00\00\00 \00\00\00~\00l\00i\00b\00/\00r\00t\00/\00i\00t\00c\00m\00s\00.\00t\00s")
(data $17 (i32.const 3212) "<")
(data $17.1 (i32.const 3224) "\02\00\00\00$\00\00\00I\00n\00d\00e\00x\00 \00o\00u\00t\00 \00o\00f\00 \00r\00a\00n\00g\00e")
(data $18 (i32.const 3276) ",")
(data $18.1 (i32.const 3288) "\02\00\00\00\14\00\00\00~\00l\00i\00b\00/\00r\00t\00.\00t\00s")
(data $20 (i32.const 3356) "<")
(data $20.1 (i32.const 3368) "\02\00\00\00\1e\00\00\00~\00l\00i\00b\00/\00r\00t\00/\00t\00l\00s\00f\00.\00t\00s")
(data $21 (i32.const 3420) "\1c")
(data $21.1 (i32.const 3432) "\02")
(data $22 (i32.const 3452) "<")
(data $22.1 (i32.const 3464) "\02\00\00\00$\00\00\00~\00l\00i\00b\00/\00t\00y\00p\00e\00d\00a\00r\00r\00a\00y\00.\00t\00s")
(data $23 (i32.const 3516) "<")
(data $23.1 (i32.const 3528) "\02\00\00\00&\00\00\00~\00l\00i\00b\00/\00s\00t\00a\00t\00i\00c\00a\00r\00r\00a\00y\00.\00t\00s")
(data $24 (i32.const 3580) ",")
(data $24.1 (i32.const 3592) "\02\00\00\00\1a\00\00\00~\00l\00i\00b\00/\00a\00r\00r\00a\00y\00.\00t\00s")
(data $25 (i32.const 3628) "|")
(data $25.1 (i32.const 3640) "\02\00\00\00^\00\00\00E\00l\00e\00m\00e\00n\00t\00 \00t\00y\00p\00e\00 \00m\00u\00s\00t\00 \00b\00e\00 \00n\00u\00l\00l\00a\00b\00l\00e\00 \00i\00f\00 \00a\00r\00r\00a\00y\00 \00i\00s\00 \00h\00o\00l\00e\00y")
(data $26 (i32.const 3756) "<")
(data $26.1 (i32.const 3768) "\02\00\00\00*\00\00\00O\00b\00j\00e\00c\00t\00 \00a\00l\00r\00e\00a\00d\00y\00 \00p\00i\00n\00n\00e\00d")
(data $27 (i32.const 3820) "<")
(data $27.1 (i32.const 3832) "\02\00\00\00(\00\00\00O\00b\00j\00e\00c\00t\00 \00i\00s\00 \00n\00o\00t\00 \00p\00i\00n\00n\00e\00d")
(data $28 (i32.const 3888) "\10\00\00\00 \00\00\00 \00\00\00 ")
(data $28.1 (i32.const 3912) "\81\08\00\00\01\19\00\00\01\02\00\00$\t\00\00\a4\00\00\00$\n\00\00\02\t\00\00\02A\00\00\00\00\00\00A\00\00\00 ")

Member

Wondering if the baseline amount of static memory can be reduced here by computing the parts of it that aren't strictly necessary to have as LUTs?

Comment thread std/assembly/util/dtoa.ts Outdated
Co-authored-by: Max Graey <maxgraey@gmail.com>
@MaxGraey MaxGraey changed the title from "feat: switch grisu2 float-to-string algorithm to Xiang JunBo's xjb algorithm" to "feat: switch grisu2 float-to-string algorithm to hybrid of xjb & zmij algorithms" Jun 18, 2026
Comment on lines +419 to 430
function dtoa_dotZero(buffer: usize, len: u32): u32 {
  let p = buffer;
  let end = buffer + (<usize>len << 1);
  while (p < end) {
    let c = <i32>load<u16>(p);
    if ((c < CharCode._0 || c > CharCode._9) && c != CharCode.MINUS) return len;
    p += 2;
  }
  store<u16>(end, CharCode.DOT);
  store<u16>(end, CharCode._0, 2);
  return len + 2;
}

Member

This looks like a post-processing stage. Can we optimize this and do it in place during the main processing stage?

Comment on lines +436 to +439
len = ftoa_buffered_single(dtoa_buf, <f32>value);
} else {
// @ts-ignore: type
len = dtoa_buffered_double(dtoa_buf, <f64>value);

Member

ftoa & dtoa already hint at the type, so simplify the names to:

ftoa_buffered and dtoa_buffered

Comment thread std/assembly/util/dtoa.ts

// 28 normalized exact powers 10**0..10**27 - the within-stride minor factors.
// @ts-ignore: decorator
@lazy const POW10_MINOR = memory.data<u64>([

Member

Please move all constants and LUT tables to the top.

Comment thread std/assembly/util/dtoa.ts
const DIV10K_SIG: u64 = ((<u64>1) << DIV10K_EXP) / 10000 + 1;
const NEG10K: u64 = ((<u64>1) << 32) - 10000;

export const ZEROS: u64 = 0x3030303030303030;

Member

Suggested change
- export const ZEROS: u64 = 0x3030303030303030;
+ export const BCD_ZEROS: u64 = 0x3030303030303030;

Comment thread std/assembly/util/dtoa.ts
// bswap to big-endian so the most-significant digit lands in the high byte
let bcd = bswap<u64>(singles);
gBcd = bcd;
gBcdLen = <i32>((70 - clz<u64>((bcd << 1) | 1)) / 8);

Member

What does 70 mean here? Can you make a local const with a meaningful name, or add a comment?

Member

Also, as I mentioned, we can remove gBcdLen and return it as a local var from this function, leaving only gBcd as a global var.

Comment thread std/assembly/util/dtoa.ts
Comment thread std/assembly/util/dtoa.ts

// High 64 bits of the 128-bit product x * y. Matches umul128.
// @ts-ignore: decorator
@inline export function mulhi64(a: u64, b: u64): u64 {

Member

mulhi64 -> umul64hi

Comment thread std/assembly/util/dtoa.ts

// Returns (x * y + c) >> 64.
// @ts-ignore: decorator
@inline export function umul128AddHi64(x: u64, y: u64, c: u64): u64 {

Member

umul64hiCarry(a: u64, b: u64, carry: u64): u64
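
For readers unfamiliar with "multiply high" helpers, the sketch below shows the usual split-into-halves technique. It is an assumption that the PR's u64 helpers decompose the product this way (their bodies are not quoted here), and because plain TypeScript numbers cannot hold 64-bit integers exactly, the sketch works one width down: it returns the high 32 bits of a 32x32-bit product built from 16-bit halves. The names umul32hi and umul32hiCarry are hypothetical and only mirror the naming suggested above.

```typescript
// High 32 bits of a * b + carry, where a, b and carry are unsigned 32-bit values.
function umul32hiCarry(a: number, b: number, carry: number): number {
  const aLo = a & 0xffff, aHi = a >>> 16;
  const bLo = b & 0xffff, bHi = b >>> 16;

  // Four 16x16 partial products; each is below 2**32, so doubles hold them exactly.
  const ll = aLo * bLo;
  const lh = aLo * bHi;
  const hl = aHi * bLo;
  const hh = aHi * bHi;

  // Column sums, propagating carries upward in 16-bit steps.
  const t0 = ll + (carry & 0xffff);
  const t1 = lh + hl + (carry >>> 16) + Math.floor(t0 / 0x10000);
  return hh + Math.floor(t1 / 0x10000);
}

// High 32 bits of a * b.
function umul32hi(a: number, b: number): number {
  return umul32hiCarry(a, b, 0);
}

console.log(umul32hi(0xffffffff, 0xffffffff).toString(16)); // "fffffffe"
console.log(umul32hiCarry(0xffffffff, 1, 1));               // 1
```

The 64-bit version follows the same shape with 32-bit halves and u64 arithmetic, which AssemblyScript provides natively.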

Comment thread std/assembly/util/dtoa.ts

// Shift that keeps a fixed 128-bit fractional part after scaling by 10**dec_exp.
// @ts-ignore: decorator
@inline export function computeExpShift(binExp: i32, decExp: i32): i32 {

Member

Suggested change
- @inline export function computeExpShift(binExp: i32, decExp: i32): i32 {
+ @inline export function exponentShift(binExp: i32, decExp: i32): i32 {

Comment thread std/assembly/util/dtoa.ts
// floor(log10(2**bin_exp)). (The f64 path only ever needs the regular form; the
// irregular 3/4 variant lives in ftoa.ts's own copy.)
// @ts-ignore: decorator
@inline export function computeDecExp(binExp: i32): i32 {

Member

Suggested change
- @inline export function computeDecExp(binExp: i32): i32 {
+ @inline export function toDecExponent(binExp: i32): i32 {
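
As background for the floor(log10(2**bin_exp)) comment quoted above, here is a hedged sketch of the regular form as a fixed-point multiply and shift. The PR's actual LOG10_2_SIGNIFICAND and LOG10_2_EXP values are not shown in this thread, so the constants below (315653 and 20, a commonly used approximation of log10(2)) and the name floorLog10Pow2 are assumptions.

```typescript
// Assumed constants: log10(2) * 2**20 rounded to the nearest integer.
const LOG10_2_MUL = 315653;
const LOG10_2_SHIFT = 20;

// floor(log10(2**binExp)) without floating point.
// The arithmetic shift rounds toward negative infinity, which is what floor needs.
function floorLog10Pow2(binExp: number): number {
  return (binExp * LOG10_2_MUL) >> LOG10_2_SHIFT;
}

console.log(floorLog10Pow2(10));   // 3   (2**10 = 1024)
console.log(floorLog10Pow2(-1));   // -1  (2**-1 = 0.5)
console.log(floorLog10Pow2(1023)); // 307
```

With these constants the product stays inside the 32-bit signed range for every f64 binary exponent, so the `>>` operator is safe here.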

Comment thread std/assembly/util/dtoa.ts
// major[(i+10)/28] * minor[(i+10)%28], normalized left if the top bit is clear,
// then the per-power fixup bit subtracted off the low limb.
// @ts-ignore: decorator
@inline function computePow10(i: i32): void {

Member

Suggested change
- @inline function computePow10(i: i32): void {
+ @inline function pow10(exp: i32): u64 {
+   ...
+   return lo;
+ }

This also removes one global var. Do the same for the other similar functions that store the low u64 part in a global instead of returning it.
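
The major[...] * minor[...] comment quoted above describes a two-level table: a coarse table of powers at a fixed stride times a fine table of the powers inside one stride. The sketch below illustrates only that decomposition, using plain numbers and a hypothetical stride of 4 instead of the PR's stride of 28; the index offset, normalization, and per-power fixup bit mentioned in the comment are omitted.

```typescript
const STRIDE = 4;                    // hypothetical; the quoted comment uses 28
const MAJOR = [1, 1e4, 1e8];         // 10**(STRIDE * k)
const MINOR = [1, 10, 100, 1000];    // 10**r for 0 <= r < STRIDE

// 10**exp for 0 <= exp < 12, built from 7 table entries instead of 12.
function pow10(exp: number): number {
  const k = Math.floor(exp / STRIDE); // which stride
  const r = exp % STRIDE;             // position inside the stride
  return MAJOR[k] * MINOR[r];
}

console.log(pow10(7));  // 10000000
console.log(pow10(11)); // 100000000000
```

The trade-off is one extra multiply per lookup in exchange for a much smaller static table, which is the footprint concern raised elsewhere in this review.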

Comment thread std/assembly/util/dtoa.ts
Comment on lines +127 to +136
@inline export function loadPow10Xjb64(power: i32): void {
  computePow10(power + 293);
  gPow10Lo += u64(power < 0);
}

// @ts-ignore: decorator
@inline export function loadPow10HiXjb64(power: i32): u64 {
  computePow10(power + 293);
  return gPow10Hi;
}

Member

These two functions are too small and trivial. Inline them into their call sites and remove them.

Comment thread std/assembly/util/dtoa.ts
Comment on lines +296 to +303
store<u16>(base, <u16>(ascii & 0xff));
store<u16>(base, <u16>((ascii >> 8) & 0xff), 2);
store<u16>(base, <u16>((ascii >> 16) & 0xff), 4);
store<u16>(base, <u16>((ascii >> 24) & 0xff), 6);
store<u16>(base, <u16>((ascii >> 32) & 0xff), 8);
store<u16>(base, <u16>((ascii >> 40) & 0xff), 10);
store<u16>(base, <u16>((ascii >> 48) & 0xff), 12);
store<u16>(base, <u16>(ascii >> 56), 14);

@MaxGraey MaxGraey Jun 18, 2026

Member

It's ASCII -> UTF-16, and it can be more efficient. This is the typical interleave-with-zero idiom:

Suggested change
- store<u16>(base, <u16>(ascii & 0xff));
- store<u16>(base, <u16>((ascii >> 8) & 0xff), 2);
- store<u16>(base, <u16>((ascii >> 16) & 0xff), 4);
- store<u16>(base, <u16>((ascii >> 24) & 0xff), 6);
- store<u16>(base, <u16>((ascii >> 32) & 0xff), 8);
- store<u16>(base, <u16>((ascii >> 40) & 0xff), 10);
- store<u16>(base, <u16>((ascii >> 48) & 0xff), 12);
- store<u16>(base, <u16>(ascii >> 56), 14);
+ let lo = ascii & 0xFFFFFFFF;
+ let hi = ascii >> 32;
+ lo = (lo | (lo << 16)) & 0x0000FFFF0000FFFF;
+ hi = (hi | (hi << 16)) & 0x0000FFFF0000FFFF;
+ lo = (lo | (lo << 8)) & 0x00FF00FF00FF00FF;
+ hi = (hi | (hi << 8)) & 0x00FF00FF00FF00FF;
+ store<u64>(base, lo, 0);
+ store<u64>(base, hi, 8);
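
The interleave-with-zero idiom is easier to follow at a smaller width. The sketch below is a narrowed TypeScript analogue, not the suggested code itself: JavaScript bitwise operators work on 32 bits, so it widens two packed ASCII bytes into two UTF-16 code units per step, which corresponds to the last shift-and-mask step of the u64 suggestion. The names widen2, widen2Naive and widen4 are hypothetical.

```typescript
// Widen two packed ASCII bytes (16 bits) into two UTF-16 code units (32 bits).
// The shift makes a second copy 8 bits higher; the mask keeps byte 0 in lane 0
// and byte 1 in lane 1, leaving a zero high byte in each 16-bit lane.
function widen2(packed: number): number {
  return (packed | (packed << 8)) & 0x00ff00ff;
}

// Reference: one shift-and-mask per character, like the original per-u16 stores.
function widen2Naive(packed: number): number {
  const c0 = packed & 0xff;
  const c1 = (packed >> 8) & 0xff;
  return c0 | (c1 << 16);
}

// Four packed ASCII bytes become two 32-bit words holding four code units.
function widen4(packed: number): [number, number] {
  return [widen2(packed & 0xffff), widen2(packed >>> 16)];
}

// "12" packed little-endian is 0x3231; widened it is 0x00320031.
console.log(widen2(0x3231).toString(16)); // "320031"
```

The u64 version adds one more doubling step (shift by 16, mask 0x0000FFFF0000FFFF) before this one, so four bytes are spread per 64-bit word and only two stores are needed for eight digits.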

Comment thread std/assembly/util/dtoa.ts
// Eight packed ASCII digits in a u64 -> 8 UTF-16 code units (16 bytes) at
// `p + off`. SIMD zero-extends the bytes to u16 lanes in one store.
// @ts-ignore: decorator
@inline export function putBlock8(p: usize, ascii: u64, off: usize = 0): void {

@MaxGraey MaxGraey Jun 18, 2026

Member

Also, putBlock8 is not an ideal name. I guess it's better to use writeUnpacked8 or writeDeinterleved64toUtf16 or something like this.

Comment thread std/assembly/util/dtoa.ts
if (decExp < 0) putBlock8(start, ZEROS);
let lastDigitChar = <u64>(0x30 + (hasLastDigit ? gLastDigit : 0));
let numDigits = hasLastDigit ? 16 : gDigits - 1;
let dHi = gDigHi, dLo = gDigLo;

Member

Suggested change
- let dHi = gDigHi, dLo = gDigLo;
+ let dHi = gDigHi;
+ let dLo = gDigLo;

Comment thread std/assembly/util/dtoa.ts
fHi = (dHi >> s) | (dLo << (64 - s));
fLo = (dLo >> s) | (d16 << (64 - s));
} else if (s == 64) {
fHi = dLo; fLo = d16;

Member

Suggested change
- fHi = dLo; fLo = d16;
+ fHi = dLo;
+ fLo = d16;

Comment thread std/assembly/util/dtoa.ts
Comment on lines +423 to +424
store<u16>(buf, 0x65); // 'e'
store<u16>(buf, 0x2b + (m & 2), 2); // '+' / '-' branchlessly

Member

We have CharCode enum for such char literals

import { CharCode } from "./string";
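
The branchless sign selection in the quoted lines relies on '+' (0x2b) and '-' (0x2d) differing by exactly 2. A small sketch of that trick follows; it assumes m is an all-ones sign mask (exp >> 31), which is a guess about the surrounding code that is not quoted here, and the function name exponentSign is hypothetical.

```typescript
function exponentSign(exp: number): { signChar: string; magnitude: number } {
  const m = exp >> 31;             // 0 when exp >= 0, -1 (all ones) when exp < 0
  const signCode = 0x2b + (m & 2); // '+' is 0x2b; adding 2 gives '-' (0x2d)
  const magnitude = (exp ^ m) - m; // branchless absolute value
  return { signChar: String.fromCharCode(signCode), magnitude };
}

console.log(exponentSign(7));  // { signChar: '+', magnitude: 7 }
console.log(exponentSign(-5)); // { signChar: '-', magnitude: 5 }
```

With the CharCode enum, the constant 0x2b would become the enum's plus-sign member, keeping the `+ (m & 2)` offset unchanged.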

Comment thread std/assembly/util/dtoa.ts
Comment on lines +452 to +467
@inline function setDecimalResult(integral: u64, one: u64, decExp: i32): void {
  if (one == 10) {
    gSig = <i64>(integral + 1);
    gLastDigit = 0;
    gHasLastDigit = false;
  } else if (one == 0) {
    gSig = <i64>integral;
    gLastDigit = 0;
    gHasLastDigit = false;
  } else {
    gSig = <i64>integral;
    gLastDigit = <i32>one;
    gHasLastDigit = true;
  }
  gExp = decExp;
}

Member

What if we inline this function at all 3 call sites? Would this allow us to eliminate or reduce the number of such global vars? It really bothers me that there are so many global vars, even though most of them could be stored locally and passed as arguments. Also, at least one could be returned from the function. Use extra global vars only as a fallback approach when we need to return multiple values.

Comment thread std/assembly/util/dtoa.ts
Comment on lines +476 to +490
let decExp = (q * LOG10_2_SIGNIFICAND - 131072) >> LOG10_2_EXP; // 131072 = 2**17 rounding bias
let powExp = -decExp - 1;
let h = q + ((powExp * LOG2_POW10_SIGNIFICAND) >> LOG2_POW10_EXP);

let pow10Hi = loadPow10HiXjb64(powExp);

let integral = pow10Hi >> (11 - h);
let halfUlp = pow10Hi >> (-h);
let dotOne = pow10Hi << (53 + h);

let one = ((((dotOne >> (53 + h)) * 5) + (((<u64>1) << (9 - h)))) >> (10 - h));
one = ((((dotOne >> 54) * 5) & 0x1ff) > ((halfUlp >> 55) * 5))
? ((((dotOne >> 54) * 5) >> 9) + 1)
: one;
one = dotOne == ((<u64>1) << 62) ? 2 : one;

Member

How about inverting the sign of powExp in the first place? Then it will be much more optimal:

Suggested change
- let decExp = (q * LOG10_2_SIGNIFICAND - 131072) >> LOG10_2_EXP; // 131072 = 2**17 rounding bias
- let powExp = -decExp - 1;
- let h = q + ((powExp * LOG2_POW10_SIGNIFICAND) >> LOG2_POW10_EXP);
- let pow10Hi = loadPow10HiXjb64(powExp);
- let integral = pow10Hi >> (11 - h);
- let halfUlp = pow10Hi >> (-h);
- let dotOne = pow10Hi << (53 + h);
- let one = ((((dotOne >> (53 + h)) * 5) + (((<u64>1) << (9 - h)))) >> (10 - h));
- one = ((((dotOne >> 54) * 5) & 0x1ff) > ((halfUlp >> 55) * 5))
-   ? ((((dotOne >> 54) * 5) >> 9) + 1)
-   : one;
- one = dotOne == ((<u64>1) << 62) ? 2 : one;
+ let decExp = (q * LOG10_2_SIGNIFICAND - 131072) >> LOG10_2_EXP; // 131072 = 2**17 rounding bias
+ let powExp = decExp + 1;
+ let h = ((powExp * LOG2_POW10_SIGNIFICAND) >> LOG2_POW10_EXP) - q;
+ let pow10Hi = computePow10(293 - powExp);
+ gPow10Hi = pow10Hi;
+ let integral = pow10Hi >> 11 + h;
+ let halfUlp = pow10Hi >> h;
+ let dotOne = pow10Hi << 53 - h;
+ let one = ((((dotOne >> (53 - h)) * 5) + (((<u64>1) << (9 + h)))) >> (10 - h));
+ one = ((((dotOne >> 54) * 5) & 0x1ff) > ((halfUlp >> 55) * 5))
+   ? ((((dotOne >> 54) * 5) >> 9) + 1)
+   : one;
+ one = dotOne == ((<u64>1) << 62) ? 2 : one;

Member

Also, I recommend moving ((dotOne >> 54) * 5) and ((dotOne >> (53 - h)) * 5) out of the expressions into locals.

Comment thread std/assembly/util/dtoa.ts
// bswap to big-endian so the most-significant digit lands in the high byte
let bcd = bswap<u64>(singles);
gBcd = bcd;
gBcdLen = <i32>((70 - clz<u64>((bcd << 1) | 1)) / 8);

@MaxGraey MaxGraey Jun 18, 2026

Member

Also, I found a bug. It should be:

  gBcdLen = <i32>((70 - clz<u64>((singles << 1) | 1)) / 8);

| value | fixed gBcdLen | wrong (current) gBcdLen |
| --- | --- | --- |
| 1 | 1 | 8 |
| 78 | 2 | 8 |
| 1234 | 4 | 8 |
| 10 | 2 | 7 |
| 100 | 3 | 6 |
| 10000 | 5 | 4 |
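
The table above can be reproduced with a small model. The sketch below is a TypeScript emulation under one assumption inferred from the table itself: singles holds one decimal digit per byte with the units digit in the lowest byte. A u64 is modeled as an array of 8 bytes, and all names (packSingles, bswap, bcdLen) are hypothetical. It also shows where 70 comes from: 70 = 63 + 7, so the expression is the ceiling of (highest set bit index of (x << 1) | 1) divided by 8, that is, the number of bytes x occupies.

```typescript
type U64Bytes = number[]; // 8 bytes, index 0 = least-significant byte

// Assumed layout: one decimal digit per byte, units digit in byte 0.
function packSingles(value: number): U64Bytes {
  const bytes = [0, 0, 0, 0, 0, 0, 0, 0];
  let i = 0;
  do {
    bytes[i++] = value % 10;
    value = Math.floor(value / 10);
  } while (value > 0 && i < 8);
  return bytes;
}

// Byte swap of the modeled u64.
function bswap(x: U64Bytes): U64Bytes {
  return x.slice().reverse();
}

// Emulates (70 - clz<u64>((x << 1) | 1)) / 8 with unsigned integer division.
function bcdLen(x: U64Bytes): number {
  let top = -1; // index of the highest set bit in x, or -1 when x == 0
  for (let bit = 63; bit >= 0; bit--) {
    if ((x[bit >> 3] >> (bit & 7)) & 1) { top = bit; break; }
  }
  // Leading zeros of (x << 1) | 1; digit bytes are at most 9, so bit 63 is never set.
  const clz = 63 - (top + 1);
  return Math.floor((70 - clz) / 8);
}

const singles = packSingles(10);
console.log(bcdLen(singles));        // 2 (fixed: length taken before the swap)
console.log(bcdLen(bswap(singles))); // 7 (current: length taken after the swap)
```

After the byte swap, the leading zero bytes of short values move to the bottom of the word, so counting leading zeros measures the wrong end.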

@MaxGraey

Member

Also, please add extra tests that cover the new implementation-specific edge cases (I already found a bug), and uncomment the existing test cases that failed with grisu2.


Development

Successfully merging this pull request may close these issues.

[Task] Replace Grisu2 with Zmij for float-to-string conversion

4 participants