Commit 365b2e3

encoding: rudimentary TextDecoder support w/o ICU
Also split up the tests.

PR-URL: nodejs#14489
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
1 parent 3b0ef0b commit 365b2e3

8 files changed

Lines changed: 428 additions & 268 deletions

doc/api/errors.md

Lines changed: 7 additions & 0 deletions
@@ -1010,6 +1010,12 @@ would be possible by calling a callback more then once.
 Used when an attempt is made to use crypto features while Node.js is not
 compiled with OpenSSL crypto support.

+<a id="ERR_NO_ICU"></a>
+### ERR_NO_ICU
+
+Used when an attempt is made to use features that require [ICU][], while
+Node.js is not compiled with ICU support.
+
 <a id="ERR_NO_LONGER_SUPPORTED"></a>
 ### ERR_NO_LONGER_SUPPORTED

@@ -1139,6 +1145,7 @@ installed.
 [domains]: domain.html
 [event emitter-based]: events.html#events_class_eventemitter
 [file descriptors]: https://en.wikipedia.org/wiki/File_descriptor
+[ICU]: intl.html#intl_internationalization_support
 [online]: http://man7.org/linux/man-pages/man3/errno.3.html
 [stream-based]: stream.html
 [syscall]: http://man7.org/linux/man-pages/man2/syscall.2.html

doc/api/intl.md

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ option:
 | [WHATWG URL Parser][] | partial (no IDN support) | full | full | full
 | [`require('buffer').transcode()`][] | none (function does not exist) | full | full | full
 | [REPL][] | partial (inaccurate line editing) | full | full | full
-| [`require('util').TextDecoder`][] | none (class does not exist) | partial/full (depends on OS) | partial (Unicode-only) | full
+| [`require('util').TextDecoder`][] | partial (basic encodings support) | partial/full (depends on OS) | partial (Unicode-only) | full

 *Note*: The "(not locale-aware)" designation denotes that the function carries
 out its operation just like the non-`Locale` version of the function, if one

doc/api/util.md

Lines changed: 39 additions & 22 deletions
@@ -544,7 +544,7 @@ added: v8.0.0
 A Symbol that can be used to declare custom promisified variants of functions,
 see [Custom promisified functions][].

-### Class: util.TextDecoder
+## Class: util.TextDecoder
 <!-- YAML
 added: REPLACEME
 -->
@@ -563,23 +563,33 @@ while (buffer = getNextChunkSomehow()) {
 string += decoder.decode(); // end-of-stream
 ```

-#### WHATWG Supported Encodings
+### WHATWG Supported Encodings

 Per the [WHATWG Encoding Standard][], the encodings supported by the
 `TextDecoder` API are outlined in the tables below. For each encoding,
-one or more aliases may be used. Support for some encodings is enabled
-only when Node.js is using the full ICU data (see [Internationalization][]).
-`util.TextDecoder` is `undefined` when ICU is not enabled during build.
+one or more aliases may be used.

-##### Encodings Supported By Default
+Different Node.js build configurations support different sets of encodings.
+While a very basic set of encodings is supported even on Node.js builds without
+ICU enabled, support for some encodings is provided only when Node.js is built
+with ICU and using the full ICU data (see [Internationalization][]).
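Because the supported set depends on the build, one option is to probe it at run time. The following is only a sketch: `isEncodingSupported` is a hypothetical helper, not part of the `util` API. It assumes that the `TextDecoder` constructor throws for a label the current build cannot handle, and that `process.versions.icu` is defined only on builds with ICU.

```js
const { TextDecoder } = require('util');

// Hypothetical helper: returns true if this build accepts the given label.
function isEncodingSupported(label) {
  try {
    new TextDecoder(label);
    return true;
  } catch (err) {
    // Assumption: an unsupported label makes the constructor throw.
    return false;
  }
}

// Assumption: process.versions.icu is defined only when ICU is built in.
const hasIcu = typeof process.versions.icu === 'string';

// Probe one label from each of the three tables.
const support = {};
for (const label of ['utf-8', 'utf-16le', 'utf-16be', 'euc-jp']) {
  support[label] = isEncodingSupported(label);
}

console.log({ hasIcu, support });
```

Catching every error rather than a specific error type keeps the probe independent of how a particular Node.js version reports the failure.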
+
+#### Encodings Supported Without ICU

 | Encoding | Aliases |
 | ----------- | --------------------------------- |
-| `'utf8'` | `'unicode-1-1-utf-8'`, `'utf-8'` |
-| `'utf-16be'`| |
+| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
 | `'utf-16le'`| `'utf-16'` |

-##### Encodings Requiring Full-ICU
+#### Encodings Supported by Default (With ICU)
+
+| Encoding | Aliases |
+| ----------- | --------------------------------- |
+| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
+| `'utf-16le'`| `'utf-16'` |
+| `'utf-16be'`| |
+
+#### Encodings Requiring Full ICU Data

 | Encoding | Aliases |
 | ----------------- | -------------------------------- |
@@ -621,13 +631,14 @@ only when Node.js is using the full ICU data (see [Internationalization][]).
 *Note*: The `'iso-8859-16'` encoding listed in the [WHATWG Encoding Standard][]
 is not supported.

-#### new TextDecoder([encoding[, options]])
+### new TextDecoder([encoding[, options]])

 * `encoding` {string} Identifies the `encoding` that this `TextDecoder` instance
   supports. Defaults to `'utf-8'`.
 * `options` {Object}
   * `fatal` {boolean} `true` if decoding failures are fatal. Defaults to
-    `false`.
+    `false`. This option is only supported when ICU is enabled (see
+    [Internationalization][]).
   * `ignoreBOM` {boolean} When `true`, the `TextDecoder` will include the byte
     order mark in the decoded result. When `false`, the byte order mark will
     be removed from the output. This option is only used when `encoding` is
@@ -636,7 +647,7 @@ is not supported.
 Creates a new `TextDecoder` instance. The `encoding` may specify one of the
 supported encodings or an alias.
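As a brief illustration of passing an alias, the sketch below constructs a decoder with the `'utf8'` label from the tables above and reads back the `encoding` property. The variable names are illustrative only.

```js
const { TextDecoder } = require('util');

// 'utf8' is listed above as an alias of 'utf-8'.
const decoder = new TextDecoder('utf8');

// The two bytes 0x68 0x69 spell "hi" in UTF-8.
const bytes = new Uint8Array([0x68, 0x69]);
const text = decoder.decode(bytes);

// The instance reports the encoding name rather than the alias it was given.
const reportedName = decoder.encoding;

console.log(text, reportedName); // prints "hi utf-8"
```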

-#### textDecoder.decode([input[, options]])
+### textDecoder.decode([input[, options]])

 * `input` {ArrayBuffer|DataView|TypedArray} An `ArrayBuffer`, `DataView` or
   Typed Array instance containing the encoded data.
@@ -652,49 +663,55 @@ internally and emitted after the next call to `textDecoder.decode()`.
 If `textDecoder.fatal` is `true`, decoding errors that occur will result in a
 `TypeError` being thrown.
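A minimal sketch of handling that `TypeError` follows. `tryDecode` and the shape of its return value are hypothetical, chosen for illustration. Because the `fatal` option is documented above as requiring ICU, the sketch also assumes that the constructor may throw on builds without ICU and treats that case separately.

```js
const { TextDecoder } = require('util');

// Hypothetical helper: decodes strictly and reports failures as values.
function tryDecode(bytes) {
  let decoder;
  try {
    decoder = new TextDecoder('utf-8', { fatal: true });
  } catch (err) {
    // Assumption: a build without ICU rejects the `fatal` option here.
    return { ok: false, reason: 'fatal-unsupported' };
  }
  try {
    return { ok: true, text: decoder.decode(bytes) };
  } catch (err) {
    if (err instanceof TypeError) {
      return { ok: false, reason: 'invalid-input' };
    }
    throw err; // Anything else is unexpected, so rethrow it.
  }
}

const valid = tryDecode(new Uint8Array([0x6f, 0x6b])); // "ok" in UTF-8
const invalid = tryDecode(new Uint8Array([0xff])); // 0xff never occurs in UTF-8

console.log(valid, invalid);
```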

-#### textDecoder.encoding
+### textDecoder.encoding

-* Value: {string}
+* {string}

 The encoding supported by the `TextDecoder` instance.

-#### textDecoder.fatal
+### textDecoder.fatal

-* Value: {boolean}
+* {boolean}

 The value will be `true` if decoding errors result in a `TypeError` being
 thrown.

-#### textDecoder.ignoreBOM
+### textDecoder.ignoreBOM

-* Value: {boolean}
+* {boolean}

 The value will be `true` if the decoding result will include the byte order
 mark.

-### Class: util.TextEncoder
+## Class: util.TextEncoder
 <!-- YAML
 added: REPLACEME
 -->

 > Stability: 1 - Experimental

 An implementation of the [WHATWG Encoding Standard][] `TextEncoder` API. All
-instances of `TextEncoder` only support `UTF-8` encoding.
+instances of `TextEncoder` only support UTF-8 encoding.

 ```js
 const encoder = new TextEncoder();
 const uint8array = encoder.encode('this is some data');
 ```

-#### textEncoder.encode([input])
+### textEncoder.encode([input])

 * `input` {string} The text to encode. Defaults to an empty string.
 * Returns: {Uint8Array}

-UTF-8 Encodes the `input` string and returns a `Uint8Array` containing the
+UTF-8 encodes the `input` string and returns a `Uint8Array` containing the
 encoded bytes.
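To make the byte-level result concrete, the sketch below encodes a single non-ASCII character and also calls `encode()` with no argument. The variable names are illustrative only.

```js
const { TextEncoder } = require('util');

const encoder = new TextEncoder();

// U+20AC (the euro sign) takes three bytes in UTF-8: 0xE2 0x82 0xAC.
const euroBytes = encoder.encode('\u20ac');

// With no argument, the input defaults to an empty string.
const emptyBytes = encoder.encode();

console.log(Array.from(euroBytes), emptyBytes.length); // prints "[ 226, 130, 172 ] 0"
```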

+### textEncoder.encoding
+
+* {string}
+
+The encoding supported by the `TextEncoder` instance. Always set to `'utf-8'`.
+
 ## Deprecated APIs

 The following APIs have been deprecated and should no longer be used. Existing
