Optimize string decoding performance#34

Merged
gfx merged 1 commit into msgpack:master from sergeyzenchenko:master on May 28, 2019

Conversation

@sergeyzenchenko (Collaborator) commented on May 28, 2019

@gfx here is the string decoding optimization PR

#33

@sergeyzenchenko (Collaborator, Author) commented:

I think the best way is to remove the WASM decoding version in order to simplify the code.
For small strings the JS version is fast enough, and for large strings TextDecoder is faster than WASM because decoding happens natively inside the JS VM and does not require the extra string operations that the WASM and JS versions need (I mean `String.fromCharCode.apply(String, units as any);`).
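To illustrate the overhead being discussed, here is a minimal sketch of the JS-side decoding approach (not the library's actual `utf8Decode`; the name `decodeUtf8Js` is hypothetical): it collects UTF-16 code units into an array and then builds the string with `String.fromCharCode.apply`, which is the per-string overhead that TextDecoder avoids:

```typescript
// Hypothetical sketch of a JS UTF-8 decoder in the style described above.
// Real implementations also chunk `units` to stay under the engine's
// maximum argument count for apply().
function decodeUtf8Js(bytes: Uint8Array, offset: number, byteLength: number): string {
  const units: number[] = [];
  const end = offset + byteLength;
  let i = offset;
  while (i < end) {
    const byte1 = bytes[i++];
    if ((byte1 & 0x80) === 0) {
      // 1-byte sequence (ASCII)
      units.push(byte1);
    } else if ((byte1 & 0xe0) === 0xc0) {
      // 2-byte sequence
      const byte2 = bytes[i++] & 0x3f;
      units.push(((byte1 & 0x1f) << 6) | byte2);
    } else if ((byte1 & 0xf0) === 0xe0) {
      // 3-byte sequence
      const byte2 = bytes[i++] & 0x3f;
      const byte3 = bytes[i++] & 0x3f;
      units.push(((byte1 & 0x0f) << 12) | (byte2 << 6) | byte3);
    } else {
      // 4-byte sequence: emit a UTF-16 surrogate pair
      const byte2 = bytes[i++] & 0x3f;
      const byte3 = bytes[i++] & 0x3f;
      const byte4 = bytes[i++] & 0x3f;
      let code = ((byte1 & 0x07) << 18) | (byte2 << 12) | (byte3 << 6) | byte4;
      code -= 0x10000;
      units.push(0xd800 | (code >> 10), 0xdc00 | (code & 0x3ff));
    }
  }
  // The temporary array plus the apply() call is the extra string-building
  // work that TextDecoder performs internally in native code instead.
  return String.fromCharCode.apply(String, units);
}
```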

@gfx (Member) commented on May 28, 2019

Really fascinating results.

I compared utf8Decode, utf8DecodeWasm, and TextDecoder like this:

/* eslint-disable no-console */
import { utf8Encode, utf8Count, utf8Decode } from "../src/utils/utf8";
import { utf8DecodeWasm } from "../src/wasmFunctions";

// @ts-ignore
import Benchmark from "benchmark";

const textDecoder = new TextDecoder();

const dataSet = [10, 100, 200, 1_000, 10_000, 100_000].map((n) => {
  return "a".repeat(n);
});

for (const str of dataSet) {
  const byteLength = utf8Count(str);
  const bytes = new Uint8Array(new ArrayBuffer(byteLength));
  utf8Encode(str, bytes, 0);

  console.log(`\n## string length=${str.length} byteLength=${byteLength}\n`);

  const suite = new Benchmark.Suite();

  // Each benchmark below compares the decoded result against the expected
  // string, which both verifies correctness and keeps the result used
  // (avoiding void-context optimizations).

  suite.add("utf8Decode", () => {
    if (utf8Decode(bytes, 0, byteLength) !== str) {
      throw new Error("wrong result!");
    }
  });

  suite.add("utf8DecodeWasm", () => {
    if (utf8DecodeWasm(bytes, 0, byteLength) !== str) {
      throw new Error("wrong result!");
    }
  });

  suite.add("TextDecoder", () => {
    if (textDecoder.decode(bytes.subarray(0, byteLength)) !== str) {
      throw new Error("wrong result!");
    }
  });
  suite.on("cycle", (event: any) => {
    console.log(String(event.target));
  });

  suite.run();
}

The result is already what you know, but the best threshold seems to be 200 (not 32):

$ npx ts-node benchmark/sandbox.ts

## string length=10 byteLength=10

utf8Decode x 8,147,700 ops/sec ±3.23% (84 runs sampled)
utf8DecodeWasm x 1,073,699 ops/sec ±2.33% (88 runs sampled)
TextDecoder x 693,559 ops/sec ±3.68% (74 runs sampled)

## string length=100 byteLength=100

utf8Decode x 860,952 ops/sec ±3.01% (83 runs sampled)
utf8DecodeWasm x 323,801 ops/sec ±8.54% (67 runs sampled)
TextDecoder x 484,350 ops/sec ±6.20% (69 runs sampled)

## string length=200 byteLength=200

utf8Decode x 458,241 ops/sec ±3.88% (88 runs sampled)
utf8DecodeWasm x 326,323 ops/sec ±5.80% (79 runs sampled)
TextDecoder x 454,980 ops/sec ±3.84% (74 runs sampled)

## string length=300 byteLength=300

utf8Decode x 298,996 ops/sec ±2.66% (83 runs sampled)
utf8DecodeWasm x 215,869 ops/sec ±9.42% (74 runs sampled)
TextDecoder x 403,562 ops/sec ±4.16% (75 runs sampled)

## string length=1000 byteLength=1000

utf8Decode x 88,042 ops/sec ±3.17% (87 runs sampled)
utf8DecodeWasm x 90,835 ops/sec ±3.79% (84 runs sampled)
TextDecoder x 248,807 ops/sec ±4.62% (83 runs sampled)

## string length=10000 byteLength=10000

utf8Decode x 9,666 ops/sec ±2.65% (87 runs sampled)
utf8DecodeWasm x 8,440 ops/sec ±7.20% (72 runs sampled)
TextDecoder x 42,109 ops/sec ±5.33% (81 runs sampled)

## string length=100000 byteLength=100000

utf8Decode x 105 ops/sec ±4.66% (69 runs sampled)
utf8DecodeWasm x 71.41 ops/sec ±4.42% (61 runs sampled)
TextDecoder x 3,550 ops/sec ±2.13% (87 runs sampled)

And as you suggested, the WASM version is redundant where TextDecoder is available, though it seems too early to remove it.
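The numbers above suggest a size-based dispatch between the two decoders. Here is a hedged sketch of that idea; the constant name `TEXT_DECODER_THRESHOLD`, the helper name `decodeString`, and the ASCII-only fallback loop are illustrative, not the library's actual API:

```typescript
// Illustrative threshold from the benchmark: below ~200 bytes the plain JS
// path wins; at or above it, native TextDecoder wins by a wide margin.
const TEXT_DECODER_THRESHOLD = 200;

// TextDecoder may be absent in very old runtimes, so feature-detect it.
const sharedTextDecoder = typeof TextDecoder !== "undefined" ? new TextDecoder() : null;

function decodeString(bytes: Uint8Array, offset: number, byteLength: number): string {
  if (sharedTextDecoder !== null && byteLength >= TEXT_DECODER_THRESHOLD) {
    // Large strings: decode natively inside the JS VM.
    return sharedTextDecoder.decode(bytes.subarray(offset, offset + byteLength));
  }
  // Small strings: a simple JS loop avoids the TextDecoder call overhead.
  // (ASCII-only here to keep the sketch short; a real path decodes full UTF-8.)
  let result = "";
  for (let i = offset; i < offset + byteLength; i++) {
    result += String.fromCharCode(bytes[i]);
  }
  return result;
}
```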

gfx merged commit 73c9984 into msgpack:master on May 28, 2019
@gfx (Member) commented on May 28, 2019

I'll still keep the WASM version for a while, because it should become faster as WASM engines improve beyond the current ones.

In particular, the wasm/reference-types proposal could boost the speed significantly.
