feat: switch grisu2 float-to-string algorithm to hybrid of xjb & zmij algorithms #3025
JairusSW wants to merge 30 commits into
Signed-off-by: Jairus Tanaka <me@jairus.dev>
Just curious here, why the exception of a trailing .0?
Max and Dan made that decision so that floats are easily identifiable when converting them to a string. For example, you know that the string
Can you rename xjb.ts to dtoa.ts and mention in the comments that the implementation is based on xjb and zmij?
Co-authored-by: Max Graey <maxgraey@gmail.com>
| let gBcd: u64 = 0; | ||
| let gBcdLen: i32 = 0; |
| export let gDigHi: u64 = 0; | ||
| export let gDigLo: u64 = 0; | ||
| export let gDigNum: i32 = 0; |
You have a lot of global vars. Do we need all of them? And can we declare all the necessary vars in one place?
Co-authored-by: Max Graey <maxgraey@gmail.com>
| (data $5 (i32.const 1239) "\80\00\00\00\00\00\00\00\a0\00\00\00\00\00\00\00\c8\00\00\00\00\00\00\00\fa\00\00\00\00\00\00@\9c\00\00\00\00\00\00P\c3\00\00\00\00\00\00$\f4\00\00\00\00\00\80\96\98\00\00\00\00\00 \bc\be\00\00\00\00\00(k\ee\00\00\00\00\00\f9\02\95\00\00\00\00@\b7C\ba\00\00\00\00\10\a5\d4\e8\00\00\00\00*\e7\84\91\00\00\00\80\f4 \e6\b5\00\00\00\a01\a9_\e3\00\00\00\04\bf\c9\1b\8e\00\00\00\c5.\bc\a2\b1\00\00@v:k\0b\de\00\00\e8\89\04#\c7\8a\00\00b\ac\c5\ebx\ad\00\80z\17\b7&\d7\d8\00\90\acn2x\86\87\00\b4W\n?\16h\a9\00\a1\ed\cc\ce\1b\c2\d3\a0\84\14@aQY\84\c8\a5\19\90\b9\a5o\a5:\0f \f4\'\8f\cb\ce") | ||
| (data $6 (i32.const 1456) "o\1b\8e(\10T\8e\af\daM\e4^\ae\f0\ec\07J\fb\9f\f4\98\'D\b1\9dwA\df\cf\11\cd\99\07\ef\99\85\0b?\fe\b2\15\aa\b4\dc\e6\a7\1f\86c\beZ\06\0b\a5\bc\b4\aaSkuz\07\ed\0f\08\bf,)Ud\7f\b6C\d5\b1\17L\c8;\1a\fb;\efi\c2\87F\b8B\a7\ee@OQ]=\eb\dd\e4PF\1a\12\ba\13\e4labM\f3\92\ea\af(\b6\ef&\e2\bb\8c6U\n\f7\89\04\89\0f`\cb\05\e9\b8\b6\bd!\c9\c1\bb\87\e9\00T\96_\9a\84x\db\8f\bf4\d0\bdr\04R\98\de\'\8a\92\95\00\9am\c1\94\82\17\0f<\05\b7u\00\00\00\00\00\00P\c3\00\00\00\00\00\00\00\00\05\e3L6\12\197\c5\00\00\00\00\00\00(l\d6\aa\80\9d\ef\f0\"\c7\f6~\b9\b7\d2:MBL\c8q\d5m\93\13\c9\ea8\1e\cd\19:\bc\03\1cU\ab\01\80\0c\t\cb\c6,\07\d3\bf\f5\ad\\\a1\90\08\137h\03\cd\10\8cz\c3\87\a8\db6.\ef\07\12\c2\b2\02\cf\bc\f4\03^\e4g\f9\94\c7\85\d7in\f8\06\d1R\ba\be\01\d763\e1|\a0\1c4\a8E\10\d3Q\a0\t\12\11H\de\1e1Vx\85\fa\a6\1e\d5f\a5>\7f\"t*U3\f1\ca\ba\0f)2\d7\96@\adGy\17|\a9t\088\c7\b1\d8J\d9\bc\"x\ae\81R7\18") | ||
| (data $7 (i32.const 1824) "?6N\n@\18\00\00\00d\00\00@\00 $\00\00\00\00\00\00\00\0c\80\13\c8\82\1f\e0L^\0f\f60\d7\1b\00\00\00\00\00\00\00\fc\ff\f7\cd\d8\01\82n\d1?\cd@\01%d\db\r\r\00\00\00$\04\14@8qS\b4\1dx\11") | ||
| (data $9 (i32.const 2032) "p\\\ea{\ce2~\8f\1a\c7C\c6\b0\b7\96\e5\ae\05\03\05\'\c6\ab\b7\bf7\cf\d0\b8\d1\ef\92\fe%\e5\1a\8eO\19\eb2\ebP\e2\a4?\14\bc\f5\88\r\b5P\99v\96!\dbH\bb\1a\c2\bd\f0\b4\15\07\c9{\ce\97\c0]\11l:\96\0b\13\9a\c7\1b\e0\c3V\df\84\f6\06\e3L6\12\197\c5\9e\b5p+\a8\ad\c5\9d\97\"\81E@|o\fc\dfNg\04\cd\c9\f2\c9\e6\0b\b96\d7\07\8f\a1\85\t\94\f8x9?\81:\0f \f4\'\8f\cb\ce\c8\a5\19\90\b9\a5o\a5\a0\84\14@aQY\84\00\a1\ed\cc\ce\1b\c2\d3\00\b4W\n?\16h\a9\00\90\acn2x\86\87\00\80z\17\b7&\d7\d8\00\00b\ac\c5\ebx\ad\00\00\e8\89\04#\c7\8a\00\00@v:k\0b\de\00\00\00\c5.\bc\a2\b1\00\00\00\04\bf\c9\1b\8e\00\00\00\a01\a9_\e3\00\00\00\80\f4 \e6\b5\00\00\00\00*\e7\84\91\00\00\00\00\10\a5\d4\e8\00\00\00\00@\b7C\ba\00\00\00\00\00\f9\02\95\00\00\00\00\00(k\ee\00\00\00\00\00 \bc\be\00\00\00\00\00\80\96\98\00\00\00\00\00\00$\f4\00\00\00\00\00\00P\c3\00\00\00\00\00\00@\9c\00\00\00\00\00\00\00\fa\00\00\00\00\00\00\00\c8\00\00\00\00\00\00\00\a0\00\00\00\00\00\00\00\80\cd\cc\cc\cc\cc\cc\cc\cc\0b\d7\a3p=\n\d7\a3<\dfO\8d\97n\12\83,e\19\e2X\17\b7\d1$\84G\1bG\ac\c5\a7\b6il\af\05\bd7\86\bdBz\e5\d5\94\bf\d6\fd\cea\84\11w\cc\ab\98\a5\b46A_p\89\bf\d5\ed\bd\ce\fe\e6\db\ff\aa$\cb\0b\ff\eb\af\cc\88Po\t\cc\bc\8c\14\0e\b4KB\13.\e1\10\d8\\\t5\dc$\b4\da\ac\b0:\f7|\1d\90\\\e1M\c4\be\94\95\e6J\b4\a462\aaw\b8\08]\1d\92\8e\ee\92\93\a6a\95\b6}J\1e\ec\eb\1a\11\92d\08\e5\bc\ef{\datP\a0\1d\97\b2,\f7\ba\80\00\c9\f1(\8a\92\95\00\9am\c1S;uD\cd\14\be\9aR\c5\ee\d3\ae\87\96\f7\db\9dXv%\06\12\c6I~\e0\91\b7\d1t\9e\0e\ca\00\83\f2\b5\87\fd?;\9a5\f5\f7\d2\ca2\fc\14^\f7_B\a2\f5\fcCK,\b3\ce\81\bb\949E\ad\1e\b1\cf") | ||
| (data $10 (i32.const 2648) "\"\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$\"#$!\"#") | ||
| (data $11 (i32.const 2908) ",") | ||
| (data $11.1 (i32.const 2920) "\02\00\00\00\1c\00\00\00I\00n\00v\00a\00l\00i\00d\00 \00l\00e\00n\00g\00t\00h") | ||
| (data $12 (i32.const 2956) "<") | ||
| (data $12.1 (i32.const 2968) "\02\00\00\00&\00\00\00~\00l\00i\00b\00/\00a\00r\00r\00a\00y\00b\00u\00f\00f\00e\00r\00.\00t\00s") | ||
| (data $13 (i32.const 3020) "<") | ||
| (data $13.1 (i32.const 3032) "\02\00\00\00(\00\00\00A\00l\00l\00o\00c\00a\00t\00i\00o\00n\00 \00t\00o\00o\00 \00l\00a\00r\00g\00e") | ||
| (data $14 (i32.const 3084) "<") | ||
| (data $14.1 (i32.const 3096) "\02\00\00\00 \00\00\00~\00l\00i\00b\00/\00r\00t\00/\00i\00t\00c\00m\00s\00.\00t\00s") | ||
| (data $17 (i32.const 3212) "<") | ||
| (data $17.1 (i32.const 3224) "\02\00\00\00$\00\00\00I\00n\00d\00e\00x\00 \00o\00u\00t\00 \00o\00f\00 \00r\00a\00n\00g\00e") | ||
| (data $18 (i32.const 3276) ",") | ||
| (data $18.1 (i32.const 3288) "\02\00\00\00\14\00\00\00~\00l\00i\00b\00/\00r\00t\00.\00t\00s") | ||
| (data $20 (i32.const 3356) "<") | ||
| (data $20.1 (i32.const 3368) "\02\00\00\00\1e\00\00\00~\00l\00i\00b\00/\00r\00t\00/\00t\00l\00s\00f\00.\00t\00s") | ||
| (data $21 (i32.const 3420) "\1c") | ||
| (data $21.1 (i32.const 3432) "\02") | ||
| (data $22 (i32.const 3452) "<") | ||
| (data $22.1 (i32.const 3464) "\02\00\00\00$\00\00\00~\00l\00i\00b\00/\00t\00y\00p\00e\00d\00a\00r\00r\00a\00y\00.\00t\00s") | ||
| (data $23 (i32.const 3516) "<") | ||
| (data $23.1 (i32.const 3528) "\02\00\00\00&\00\00\00~\00l\00i\00b\00/\00s\00t\00a\00t\00i\00c\00a\00r\00r\00a\00y\00.\00t\00s") | ||
| (data $24 (i32.const 3580) ",") | ||
| (data $24.1 (i32.const 3592) "\02\00\00\00\1a\00\00\00~\00l\00i\00b\00/\00a\00r\00r\00a\00y\00.\00t\00s") | ||
| (data $25 (i32.const 3628) "|") | ||
| (data $25.1 (i32.const 3640) "\02\00\00\00^\00\00\00E\00l\00e\00m\00e\00n\00t\00 \00t\00y\00p\00e\00 \00m\00u\00s\00t\00 \00b\00e\00 \00n\00u\00l\00l\00a\00b\00l\00e\00 \00i\00f\00 \00a\00r\00r\00a\00y\00 \00i\00s\00 \00h\00o\00l\00e\00y") | ||
| (data $26 (i32.const 3756) "<") | ||
| (data $26.1 (i32.const 3768) "\02\00\00\00*\00\00\00O\00b\00j\00e\00c\00t\00 \00a\00l\00r\00e\00a\00d\00y\00 \00p\00i\00n\00n\00e\00d") | ||
| (data $27 (i32.const 3820) "<") | ||
| (data $27.1 (i32.const 3832) "\02\00\00\00(\00\00\00O\00b\00j\00e\00c\00t\00 \00i\00s\00 \00n\00o\00t\00 \00p\00i\00n\00n\00e\00d") | ||
| (data $28 (i32.const 3888) "\10\00\00\00 \00\00\00 \00\00\00 ") | ||
| (data $28.1 (i32.const 3912) "\81\08\00\00\01\19\00\00\01\02\00\00$\t\00\00\a4\00\00\00$\n\00\00\02\t\00\00\02A\00\00\00\00\00\00A\00\00\00 ") |
Wondering if the baseline amount of static memory can be reduced here by computing the parts of it that aren't strictly necessary to have as LUTs?
Co-authored-by: Max Graey <maxgraey@gmail.com>
…ript into jairus/switch-to-xjb
| function dtoa_dotZero(buffer: usize, len: u32): u32 { | ||
| let p = buffer; | ||
| let end = buffer + (<usize>len << 1); | ||
| while (p < end) { | ||
| let c = <i32>load<u16>(p); | ||
| if ((c < CharCode._0 || c > CharCode._9) && c != CharCode.MINUS) return len; | ||
| p += 2; | ||
| } | ||
| store<u16>(end, CharCode.DOT); | ||
| store<u16>(end, CharCode._0, 2); | ||
| return len + 2; | ||
| } |
This looks like a post-processing stage. Can we optimize it by doing this in place during the main processing stage?
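For readers following along outside AssemblyScript, the pass quoted above can be modelled in plain TypeScript. This is only a sketch under assumptions: a `Uint16Array` stands in for the UTF-16 output buffer, and `appendDotZero` and `render` are hypothetical names, not functions from this PR.

```typescript
const CHAR_0 = 0x30;
const CHAR_9 = 0x39;
const CHAR_MINUS = 0x2d;
const CHAR_DOT = 0x2e;

// Appends ".0" when the first `len` code units are only digits and '-'.
// Anything else ('.', 'e', letters of "NaN"/"Infinity") leaves the text alone.
// The buffer is assumed to have room for two extra code units.
function appendDotZero(buffer: Uint16Array, len: number): number {
  for (let i = 0; i < len; i++) {
    const c = buffer[i];
    if ((c < CHAR_0 || c > CHAR_9) && c !== CHAR_MINUS) return len;
  }
  buffer[len] = CHAR_DOT;
  buffer[len + 1] = CHAR_0;
  return len + 2;
}

// Hypothetical driver: copies `text` into a buffer, runs the pass, reads it back.
function render(text: string): string {
  const buffer = new Uint16Array(text.length + 2);
  for (let i = 0; i < text.length; i++) buffer[i] = text.charCodeAt(i);
  const len = appendDotZero(buffer, text.length);
  let out = "";
  for (let i = 0; i < len; i++) out += String.fromCharCode(buffer[i]);
  return out;
}
```

The scan is the cost being questioned here: every output is walked a second time, even though the main formatting stage already knows whether it emitted a decimal point or an exponent.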
| len = ftoa_buffered_single(dtoa_buf, <f32>value); | ||
| } else { | ||
| // @ts-ignore: type | ||
| len = dtoa_buffered_double(dtoa_buf, <f64>value); |
ftoa and dtoa already hint at the type, so simplify the names to ftoa_buffered and dtoa_buffered.
| // 28 normalized exact powers 10**0..10**27 - the within-stride minor factors. | ||
| // @ts-ignore: decorator | ||
| @lazy const POW10_MINOR = memory.data<u64>([ |
Please move all constants and LUT tables to the top.
| const DIV10K_SIG: u64 = ((<u64>1) << DIV10K_EXP) / 10000 + 1; | ||
| const NEG10K: u64 = ((<u64>1) << 32) - 10000; | ||
| export const ZEROS: u64 = 0x3030303030303030; |
| export const ZEROS: u64 = 0x3030303030303030; | |
| export const BCD_ZEROS: u64 = 0x3030303030303030; |
| // bswap to big-endian so the most-significant digit lands in the high byte | ||
| let bcd = bswap<u64>(singles); | ||
| gBcd = bcd; | ||
| gBcdLen = <i32>((70 - clz<u64>((bcd << 1) | 1)) / 8); |
What does 70 mean here? Can you make it a local const with a meaningful name, or add a comment?
Also, as I mentioned, we can remove gBcdLen and return it as a local var from this function, leaving only gBcd as a global var.
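One possible reading of the constant, offered as a sketch rather than the author's intent: 70 is 64 (bits in a u64) plus 7 (to round the bit count up to whole bytes) minus 1 (to compensate for the `<< 1`, while the `| 1` guards against `clz` of zero). The TypeScript below models u64 with `bigint`; `clz64`, `BYTE_LEN_BIAS` and `packedByteLength` are hypothetical names.

```typescript
const U64_BITS = 64;
const BITS_PER_BYTE = 8;
// 64 bits + 7 (round up to a whole byte) - 1 (undo the << 1 below) = 70.
const BYTE_LEN_BIAS = U64_BITS + (BITS_PER_BYTE - 1) - 1;

// Count leading zero bits of a 64-bit value.
function clz64(x: bigint): number {
  x = BigInt.asUintN(64, x);
  if (x === 0n) return 64;
  return 64 - x.toString(2).length;
}

// Number of bytes up to and including the highest non-zero byte.
// Returns the length directly instead of storing it in a global.
function packedByteLength(packed: bigint): number {
  const guarded = BigInt.asUintN(64, (packed << 1n) | 1n);
  return Math.floor((BYTE_LEN_BIAS - clz64(guarded)) / BITS_PER_BYTE);
}
```

With this shape the length is an ordinary return value, so only the packed digits themselves would still need to live in a global.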
| // High 64 bits of the 128-bit product x * y. Matches umul128. | ||
| // @ts-ignore: decorator | ||
| @inline export function mulhi64(a: u64, b: u64): u64 { |
| // Returns (x * y + c) >> 64. | ||
| // @ts-ignore: decorator | ||
| @inline export function umul128AddHi64(x: u64, y: u64, c: u64): u64 { |
umul64hiCarry(a: u64, b: u64, carry: u64): u64
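For reference, the quantity these helpers return can be built from four 32x32 partial products, which is the usual approach when no native 128-bit multiply is available. The TypeScript sketch below models u64 with `bigint`; `mul128Limbs`, `mulhi64Limbs` and `mulhi64CarryLimbs` are hypothetical names and this is not the PR's code.

```typescript
const MASK32 = 0xffffffffn;
const MASK64 = 0xffffffffffffffffn;

// Full 64x64 -> 128-bit product as [hi, lo], from four 32x32 partial products.
function mul128Limbs(a: bigint, b: bigint): [bigint, bigint] {
  const aLo = a & MASK32;
  const aHi = (a >> 32n) & MASK32;
  const bLo = b & MASK32;
  const bHi = (b >> 32n) & MASK32;
  const ll = aLo * bLo;
  const lh = aLo * bHi;
  const hl = aHi * bLo;
  const hh = aHi * bHi;
  // Middle column: its upper half carries into the high word.
  const mid = (ll >> 32n) + (lh & MASK32) + (hl & MASK32);
  const hi = hh + (lh >> 32n) + (hl >> 32n) + (mid >> 32n);
  const lo = ((mid & MASK32) << 32n) | (ll & MASK32);
  return [hi, lo];
}

// High 64 bits of a * b.
function mulhi64Limbs(a: bigint, b: bigint): bigint {
  return mul128Limbs(a, b)[0];
}

// (a * b + carry) >> 64: the addend only matters if it overflows the low word.
function mulhi64CarryLimbs(a: bigint, b: bigint, carry: bigint): bigint {
  const [hi, lo] = mul128Limbs(a, b);
  return lo + carry > MASK64 ? hi + 1n : hi;
}
```

Both results fit in 64 bits for any u64 inputs, since the largest possible product plus the largest carry is still below 2**128.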
| // Shift that keeps a fixed 128-bit fractional part after scaling by 10**dec_exp. | ||
| // @ts-ignore: decorator | ||
| @inline export function computeExpShift(binExp: i32, decExp: i32): i32 { |
| @inline export function computeExpShift(binExp: i32, decExp: i32): i32 { | |
| @inline export function exponentShift(binExp: i32, decExp: i32): i32 { |
| // floor(log10(2**bin_exp)). (The f64 path only ever needs the regular form; the | ||
| // irregular 3/4 variant lives in ftoa.ts's own copy.) | ||
| // @ts-ignore: decorator | ||
| @inline export function computeDecExp(binExp: i32): i32 { |
| @inline export function computeDecExp(binExp: i32): i32 { | |
| @inline export function toDecExponent(binExp: i32): i32 { |
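As an illustration of the fixed-point estimate behind floor(log10(2**binExp)), here is a TypeScript sketch. The multiplier 315653 and shift 20 are assumptions chosen for this example (log10(2) scaled by 2**20), not necessarily the PR's LOG10_2_SIGNIFICAND and LOG10_2_EXP values, and only the regular form is covered, with no rounding bias. `floorLog10Pow2` and `floorLog10Pow2Exact` are hypothetical names.

```typescript
// Assumed constants for this sketch: round(log10(2) * 2**20) and the matching shift.
const LOG10_2_MUL = 315653;
const LOG10_2_SHIFT = 20;

// Fixed-point estimate of floor(log10(2**binExp)).
// The arithmetic shift floors toward negative infinity, which is what we want.
function floorLog10Pow2(binExp: number): number {
  return (binExp * LOG10_2_MUL) >> LOG10_2_SHIFT;
}

// Exact reference via decimal digit counts of 2**|binExp|.
function floorLog10Pow2Exact(binExp: number): number {
  if (binExp >= 0) return (1n << BigInt(binExp)).toString().length - 1;
  // For negative exponents, -ceil(|e| * log10(2)) equals minus the digit count.
  return -(1n << BigInt(-binExp)).toString().length;
}
```

A check like `floorLog10Pow2(e) === floorLog10Pow2Exact(e)` over the whole f64 exponent range is cheap, and is the kind of exhaustive test that could back whichever constants the final code uses.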
| // major[(i+10)/28] * minor[(i+10)%28], normalized left if the top bit is clear, | ||
| // then the per-power fixup bit subtracted off the low limb. | ||
| // @ts-ignore: decorator | ||
| @inline function computePow10(i: i32): void { |
| @inline function computePow10(i: i32): void { | |
| @inline function pow10(exp: i32): u64 { | |
| ... | |
| return lo; | |
| } |
This also removes one global var. Please do the same for the other similar functions that store the low u64 part in a global instead of returning it.
| @inline export function loadPow10Xjb64(power: i32): void { | ||
| computePow10(power + 293); | ||
| gPow10Lo += u64(power < 0); | ||
| } | ||
| // @ts-ignore: decorator | ||
| @inline export function loadPow10HiXjb64(power: i32): u64 { | ||
| computePow10(power + 293); | ||
| return gPow10Hi; | ||
| } |
These two functions are too small and trivial. Inline them into the call sites and remove them.
| store<u16>(base, <u16>(ascii & 0xff)); | ||
| store<u16>(base, <u16>((ascii >> 8) & 0xff), 2); | ||
| store<u16>(base, <u16>((ascii >> 16) & 0xff), 4); | ||
| store<u16>(base, <u16>((ascii >> 24) & 0xff), 6); | ||
| store<u16>(base, <u16>((ascii >> 32) & 0xff), 8); | ||
| store<u16>(base, <u16>((ascii >> 40) & 0xff), 10); | ||
| store<u16>(base, <u16>((ascii >> 48) & 0xff), 12); | ||
| store<u16>(base, <u16>(ascii >> 56), 14); |
This is an ASCII -> UTF-16 conversion, and it can be more efficient. It's the typical interleave-with-zero idiom:
| store<u16>(base, <u16>(ascii & 0xff)); | |
| store<u16>(base, <u16>((ascii >> 8) & 0xff), 2); | |
| store<u16>(base, <u16>((ascii >> 16) & 0xff), 4); | |
| store<u16>(base, <u16>((ascii >> 24) & 0xff), 6); | |
| store<u16>(base, <u16>((ascii >> 32) & 0xff), 8); | |
| store<u16>(base, <u16>((ascii >> 40) & 0xff), 10); | |
| store<u16>(base, <u16>((ascii >> 48) & 0xff), 12); | |
| store<u16>(base, <u16>(ascii >> 56), 14); | |
| let lo = ascii & 0xFFFFFFFF; | |
| let hi = ascii >> 32; | |
| lo = (lo | (lo << 16)) & 0x0000FFFF0000FFFF; | |
| hi = (hi | (hi << 16)) & 0x0000FFFF0000FFFF; | |
| lo = (lo | (lo << 8)) & 0x00FF00FF00FF00FF; | |
| hi = (hi | (hi << 8)) & 0x00FF00FF00FF00FF; | |
| store<u64>(base, lo, 0); | |
| store<u64>(base, hi, 8); |
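To make the idiom concrete, the sketch below runs the same two spread steps in TypeScript, with `bigint` standing in for u64 and a little-endian `DataView` standing in for linear memory. `spreadBytes32`, `packAscii` and `widenToUtf16` are hypothetical names used only for this illustration.

```typescript
// Spreads the four low bytes of x so each lands in the low byte of a 16-bit lane.
function spreadBytes32(x: bigint): bigint {
  x = (x | (x << 16n)) & 0x0000ffff0000ffffn;
  x = (x | (x << 8n)) & 0x00ff00ff00ff00ffn;
  return x;
}

// Packs up to eight ASCII characters into a u64, first character in the low byte.
function packAscii(text: string): bigint {
  let packed = 0n;
  for (let i = 0; i < text.length && i < 8; i++) {
    packed |= BigInt(text.charCodeAt(i) & 0xff) << BigInt(8 * i);
  }
  return packed;
}

// Two 64-bit stores replace the eight 16-bit stores.
function widenToUtf16(ascii: bigint): string {
  const view = new DataView(new ArrayBuffer(16));
  view.setBigUint64(0, spreadBytes32(ascii & 0xffffffffn), true);
  view.setBigUint64(8, spreadBytes32(ascii >> 32n), true);
  let text = "";
  for (let i = 0; i < 16; i += 2) text += String.fromCharCode(view.getUint16(i, true));
  return text;
}
```

For example, `widenToUtf16(packAscii("12345678"))` reads back as the same eight characters, each now occupying one UTF-16 code unit.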
| // Eight packed ASCII digits in a u64 -> 8 UTF-16 code units (16 bytes) at | ||
| // `p + off`. SIMD zero-extends the bytes to u16 lanes in one store. | ||
| // @ts-ignore: decorator | ||
| @inline export function putBlock8(p: usize, ascii: u64, off: usize = 0): void { |
Also, putBlock8 is not a great name. I guess it would be better to use writeUnpacked8 or writeDeinterleved64toUtf16 or something like that.
| if (decExp < 0) putBlock8(start, ZEROS); | ||
| let lastDigitChar = <u64>(0x30 + (hasLastDigit ? gLastDigit : 0)); | ||
| let numDigits = hasLastDigit ? 16 : gDigits - 1; | ||
| let dHi = gDigHi, dLo = gDigLo; |
| let dHi = gDigHi, dLo = gDigLo; | |
| let dHi = gDigHi; | |
| let dLo = gDigLo; |
| fHi = (dHi >> s) | (dLo << (64 - s)); | ||
| fLo = (dLo >> s) | (d16 << (64 - s)); | ||
| } else if (s == 64) { | ||
| fHi = dLo; fLo = d16; |
| fHi = dLo; fLo = d16; | |
| fHi = dLo; | |
| fLo = d16; |
| store<u16>(buf, 0x65); // 'e' | ||
| store<u16>(buf, 0x2b + (m & 2), 2); // '+' / '-' branchlessly |
We have the CharCode enum for such char literals:
import { CharCode } from "./string";
| @inline function setDecimalResult(integral: u64, one: u64, decExp: i32): void { | ||
| if (one == 10) { | ||
| gSig = <i64>(integral + 1); | ||
| gLastDigit = 0; | ||
| gHasLastDigit = false; | ||
| } else if (one == 0) { | ||
| gSig = <i64>integral; | ||
| gLastDigit = 0; | ||
| gHasLastDigit = false; | ||
| } else { | ||
| gSig = <i64>integral; | ||
| gLastDigit = <i32>one; | ||
| gHasLastDigit = true; | ||
| } | ||
| gExp = decExp; | ||
| } |
What if we inline this function at all 3 call sites? Would that allow us to eliminate or reduce the number of such global vars? It really bothers me that there are so many global vars even though most of them could be stored locally and passed as arguments. Also, at least one could be returned from the function. Use extra global vars only as a fallback approach when we need to return multiple values.
| let decExp = (q * LOG10_2_SIGNIFICAND - 131072) >> LOG10_2_EXP; // 131072 = 2**17 rounding bias | ||
| let powExp = -decExp - 1; | ||
| let h = q + ((powExp * LOG2_POW10_SIGNIFICAND) >> LOG2_POW10_EXP); | ||
| let pow10Hi = loadPow10HiXjb64(powExp); | ||
| let integral = pow10Hi >> (11 - h); | ||
| let halfUlp = pow10Hi >> (-h); | ||
| let dotOne = pow10Hi << (53 + h); | ||
| let one = ((((dotOne >> (53 + h)) * 5) + (((<u64>1) << (9 - h)))) >> (10 - h)); | ||
| one = ((((dotOne >> 54) * 5) & 0x1ff) > ((halfUlp >> 55) * 5)) | ||
| ? ((((dotOne >> 54) * 5) >> 9) + 1) | ||
| : one; | ||
| one = dotOne == ((<u64>1) << 62) ? 2 : one; |
How about inverting the sign of powExp in the first place? Then it will be much more optimal:
| let decExp = (q * LOG10_2_SIGNIFICAND - 131072) >> LOG10_2_EXP; // 131072 = 2**17 rounding bias | |
| let powExp = -decExp - 1; | |
| let h = q + ((powExp * LOG2_POW10_SIGNIFICAND) >> LOG2_POW10_EXP); | |
| let pow10Hi = loadPow10HiXjb64(powExp); | |
| let integral = pow10Hi >> (11 - h); | |
| let halfUlp = pow10Hi >> (-h); | |
| let dotOne = pow10Hi << (53 + h); | |
| let one = ((((dotOne >> (53 + h)) * 5) + (((<u64>1) << (9 - h)))) >> (10 - h)); | |
| one = ((((dotOne >> 54) * 5) & 0x1ff) > ((halfUlp >> 55) * 5)) | |
| ? ((((dotOne >> 54) * 5) >> 9) + 1) | |
| : one; | |
| one = dotOne == ((<u64>1) << 62) ? 2 : one; | |
| let decExp = (q * LOG10_2_SIGNIFICAND - 131072) >> LOG10_2_EXP; // 131072 = 2**17 rounding bias | |
| let powExp = decExp + 1; | |
| let h = ((powExp * LOG2_POW10_SIGNIFICAND) >> LOG2_POW10_EXP) - q; | |
| let pow10Hi = computePow10(293 - powExp); | |
| gPow10Hi = pow10Hi; | |
| let integral = pow10Hi >> 11 + h; | |
| let halfUlp = pow10Hi >> h; | |
| let dotOne = pow10Hi << 53 - h; | |
| let one = ((((dotOne >> (53 - h)) * 5) + (((<u64>1) << (9 + h)))) >> (10 + h)); | |
| one = ((((dotOne >> 54) * 5) & 0x1ff) > ((halfUlp >> 55) * 5)) | |
| ? ((((dotOne >> 54) * 5) >> 9) + 1) | |
| : one; | |
| one = dotOne == ((<u64>1) << 62) ? 2 : one; |
Also, I recommend moving ((dotOne >> 54) * 5) and ((dotOne >> (53 - h)) * 5) out of the expression into locals.
| // bswap to big-endian so the most-significant digit lands in the high byte | ||
| let bcd = bswap<u64>(singles); | ||
| gBcd = bcd; | ||
| gBcdLen = <i32>((70 - clz<u64>((bcd << 1) | 1)) / 8); |
Also, I found a bug. It should be:
gBcdLen = <i32>((70 - clz<u64>((singles << 1) | 1)) / 8);

| value | fixed gBcdLen | wrong (current) gBcdLen |
|---|---|---|
| 1 | 1 | 8 |
| 78 | 2 | 8 |
| 1234 | 4 | 8 |
| 10 | 2 | 7 |
| 100 | 3 | 6 |
| 10000 | 5 | 4 |
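
The rows in this table can be reproduced with a small TypeScript model. It rests on an assumption inferred from the table itself, not from the PR source: `singles` holds one decimal digit per byte with the units digit in the lowest byte. `packDigits`, `bswap64`, `clz64` and `byteLength` are hypothetical helpers, with `bigint` standing in for u64.

```typescript
// One decimal digit per byte, units digit in the lowest byte (assumed layout).
function packDigits(value: number): bigint {
  let packed = 0n;
  let shift = 0n;
  do {
    packed |= BigInt(value % 10) << shift;
    value = Math.floor(value / 10);
    shift += 8n;
  } while (value > 0);
  return packed;
}

// Reverses the byte order of a 64-bit value.
function bswap64(x: bigint): bigint {
  let out = 0n;
  for (let i = 0; i < 8; i++) {
    out = (out << 8n) | ((x >> BigInt(8 * i)) & 0xffn);
  }
  return out;
}

function clz64(x: bigint): number {
  x = BigInt.asUintN(64, x);
  return x === 0n ? 64 : 64 - x.toString(2).length;
}

// Same formula as in the review: (70 - clz((x << 1) | 1)) / 8, floored.
function byteLength(x: bigint): number {
  return Math.floor((70 - clz64((x << 1n) | 1n)) / 8);
}

// Length taken before the byte swap (fixed) and after it (current).
function lengths(value: number): { fixed: number; wrong: number } {
  const singles = packDigits(value);
  return { fixed: byteLength(singles), wrong: byteLength(bswap64(singles)) };
}
```

Under that layout, the swapped value measures the distance from the lowest non-zero digit to the top of the word rather than the digit count, which is why trailing zeros shrink the wrong length while plain values always report 8.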
Also, please add extra tests that cover the new implementation-specific edge cases (I already found a bug) and uncomment the existing test cases that failed with grisu2.
Fixes #3012.
Changes proposed in this pull request:
⯈ Switch to xjb-as, which is an improvement over zmij-as
⯈ Comply with the ECMA-262 specification, with the exception of a trailing .0
It's good quality stuff though, and I'm quite confident in that.
Here are some performance notes. xjb-as/README.md has more extensive notes. These benchmarks were taken on an AMD 7800X3D.
Lookup-table footprint vs the old grisu2 implementation (omitting shared tables):
dtoa) onlyftoa) only