 // (building the huffman encoding on UTF-16 code points gave better
 // compression than building it on UTF-8 bytes)
 //
+// - code points starting at 128 (word_start) and potentially extending
+//   to 255 (word_end) (but never interfering with the target
+//   language's used code points) stand for dictionary entries in a
+//   dictionary with size up to 256 code points. The dictionary entries
+//   are computed with a heuristic based on frequent substrings of 2 to
+//   9 code points. These are called "words" but are not, grammatically
+//   speaking, words. They're just spans of code points that frequently
+//   occur together.
+//
+// - dictionary entries are non-overlapping, and the _ending_ index of each
+//   entry is stored in an array. Since the index given is the ending
+//   index, the array is called "wends".
+//
 // The "data" / "tail" construct is so that the struct's last member is a
 // "flexible array". However, the _only_ member is not permitted to be
 // a flexible member, so we have to declare the first byte as a separate
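
To make the "wends" layout in the added comment concrete, here is a minimal sketch of how a decoder might turn a dictionary code point back into its span of code points. It is not taken from this commit: the names `words`, `lookup_word`, `expand`, `WORD_START`, and `WORD_END`, the three sample entries, and the choice of an exclusive ending index are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dictionary: the entries "th", "ing" and "er " stored back
   to back with no separators (entries do not overlap). */
static const uint16_t words[] = { 't', 'h', 'i', 'n', 'g', 'e', 'r', ' ' };

/* Assumed layout: wends[n] is the (exclusive) ending index of entry n
   inside words[]. Entry n therefore starts where entry n-1 ended. */
static const uint16_t wends[] = { 2, 5, 8 };

/* Assumed code point range reserved for dictionary entries. */
enum { WORD_START = 128, WORD_END = 130 };

/* If c is a dictionary code point, point *out at the entry's first code
   point and return its length. Otherwise return 0 and leave *out alone. */
static size_t lookup_word(uint16_t c, const uint16_t **out) {
    if (c < WORD_START || c > WORD_END) {
        return 0;
    }
    size_t n = (size_t)(c - WORD_START);
    size_t begin = (n == 0) ? 0 : wends[n - 1];
    size_t end = wends[n];
    *out = &words[begin];
    return end - begin;
}

/* Expand a sequence of code points into dst, replacing dictionary code
   points with their entries. For brevity this demo narrows every code
   point to char, so it only handles ASCII entries. Returns the number of
   characters written, not counting the terminating NUL. */
static size_t expand(const uint16_t *src, size_t src_len,
                     char *dst, size_t dst_cap) {
    assert(dst_cap > 0);
    size_t out = 0;
    for (size_t i = 0; i < src_len; i++) {
        const uint16_t *w = &src[i];
        size_t len = lookup_word(src[i], &w);
        if (len == 0) {
            len = 1; /* ordinary code point: copy it unchanged */
        }
        for (size_t j = 0; j < len && out + 1 < dst_cap; j++) {
            dst[out++] = (char)w[j];
        }
    }
    dst[out] = '\0';
    return out;
}
```

Storing only ending indices means each entry costs one array slot, and the start of entry `n` is read from `wends[n - 1]`. With the sample data, calling `expand` on `{ 128, 'e', ' ', 'r', 'o', 'v', 130 }` substitutes the first and last code points with "th" and "er ".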