Commit ac46a4f

Treat U+30A0 & U+30FB in Katakana Block as CJK (#16796)
1 parent d52e905 commit ac46a4f

5 files changed

Lines changed: 56 additions & 1 deletion

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+#### Treat U+30A0 & U+30FB in Katakana Block as CJK (#16796 by @tats-u)
+
+Previously, Prettier did not treat U+30A0 and U+30FB as Japanese characters. U+30FB (the katakana middle dot) is commonly used in Japanese to separate the given and family names of non-Japanese people, or to mean “and”. For example, “C言語・C++・Go・Rust” means “C language & C++ & Go & Rust” in Japanese.
+
+<!-- prettier-ignore -->
+```md
+<!-- Input (--prose-wrap=never) -->
+
+C言
+語
+・
+C++
+・
+Go
+・
+Rust
+
+<!-- Prettier stable -->
+C言語・ C++ ・ Go ・ Rust
+
+<!-- Prettier main -->
+C言語・C++・Go・Rust
+```
+
+U+30A0 (the katakana-hiragana double hyphen) can replace the `-` in non-Japanese names (e.g. “Saint-Saëns” (Charles Camille Saint-Saëns) can be written as “サン゠サーンス” in Japanese), but in many cases it is substituted by the ASCII equals sign (U+003D) or the fullwidth equals sign (U+FF1D) (e.g. “サン=サーンス” or “サン=サーンス”).
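
The before/after outputs in the changelog entry can be reproduced with a toy model. The sketch below is not Prettier's implementation: `joinLines`, `isCjkBefore`, and `isCjkAfter` are hypothetical names, and the rule "omit the space when either neighbouring character counts as CJK" is a simplifying assumption that happens to match this one example.

```javascript
// Toy model only: Prettier's real line-joining rules are more involved.
// Assumption: a line break becomes "" if either neighbour is CJK, else " ".
const hanOrKana = /[\p{Script=Han}\p{Script=Katakana}\p{Script=Hiragana}]/u;
// The Katakana block spans U+30A0..U+30FF and includes U+30A0 and U+30FB.
const katakanaBlock = /[\u30A0-\u30FF]/u;

// Hypothetical predicates: script-only versus script OR Katakana block.
const isCjkBefore = (ch) => hanOrKana.test(ch);
const isCjkAfter = (ch) => hanOrKana.test(ch) || katakanaBlock.test(ch);

function joinLines(lines, isCjk) {
  return lines.reduce((text, line) => {
    const prev = Array.from(text).pop(); // last character joined so far
    const next = Array.from(line)[0]; // first character of the next line
    const separator = isCjk(prev) || isCjk(next) ? "" : " ";
    return text + separator + line;
  });
}

const input = ["C言", "語", "・", "C++", "・", "Go", "・", "Rust"];
const before = joinLines(input, isCjkBefore);
const after = joinLines(input, isCjkAfter);
console.log(before);
console.log(after);
```

Under these assumptions the first call leaves spaces around each “・” that follows or precedes a Latin word, and the second call joins every line without spaces.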

cspell.json

Lines changed: 1 addition & 0 deletions
@@ -241,6 +241,7 @@
     "Rubocop",
     "ruleset",
     "rulesets",
+    "Saëns",
     "sandhose",
     "Sapegin",
     "sbdchd",

src/language-markdown/constants.evaluate.js

Lines changed: 8 additions & 1 deletion
@@ -14,7 +14,14 @@ const cjkCharset = new Charset(
       "Modifier_Symbol",
       "Nonspacing_Mark",
     ],
-  }),
+    // .union below makes the next Block condition "OR"
+    // If it is merged into this object definition, it will be "AND" instead
+  }).union(
+    // Firefox treats some symbols (U+30A0, U+30FB) in the Katakana block as CJK
+    unicodeRegex({
+      Block: ["Katakana"],
+    }),
+  ),
 );
 const variationSelectorsCharset = unicodeRegex({
   Block: ["Variation_Selectors", "Variation_Selectors_Supplement"],
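
Why a block-based condition is needed can be illustrated with native regular expressions. This is a minimal sketch, not the `unicodeRegex` or `Charset` code used in the diff: JavaScript's `\p{...}` escapes have no `Block` property, so the Katakana block is written out as the range U+30A0..U+30FF, and `isKatakanaLike` is a hypothetical helper standing in for the "OR" that the comment attributes to `.union()`.

```javascript
// Script-based test, loosely analogous to a Script: ["Katakana"] condition.
const katakanaScript = /^\p{Script=Katakana}$/u;
// Block-based test: the Katakana block spans U+30A0..U+30FF.
const katakanaBlock = /^[\u30A0-\u30FF]$/u;

// Hypothetical helper: "OR" of both conditions.
const isKatakanaLike = (ch) =>
  katakanaScript.test(ch) || katakanaBlock.test(ch);

// U+30A0 and U+30FB have Script=Common, so only the block test matches them.
const samples = ["\u30A0", "\u30FB", "\u30AB"]; // ゠, ・, カ
const report = samples.map((ch) => ({
  codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase(),
  script: katakanaScript.test(ch),
  block: katakanaBlock.test(ch),
  union: isKatakanaLike(ch),
}));
console.log(report);
```

Requiring both conditions at once ("AND") would exclude U+30A0 and U+30FB again, because the script test fails for them, which is consistent with the comment about keeping the `Block` condition out of the first object.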

tests/format/markdown/splitCjkText/__snapshots__/format.test.js.snap

Lines changed: 12 additions & 0 deletions
@@ -283,6 +283,16 @@ English
 [ウ
 ィキペディア]: https://ja.wikipedia.org/
 
+
+C言
+語
+・
+C++
+・
+Go
+・
+Rust
+
 =====================================output=====================================
 日本語、にほんご。汉语, 中文. 日本語,にほんご.English
 words!? 漢字!汉字?「セリフ」(括弧) 文字(括弧)文字【括弧】日本語English
@@ -297,5 +307,7 @@ words!? 漢字!汉字?「セリフ」(括弧) 文字(括弧)文字【括
 
 [ウ ィキペディア]: https://ja.wikipedia.org/
 
+C言語・C++・Go・Rust
+
 ================================================================================
 `;

tests/format/markdown/splitCjkText/symbolSpaceNewLine.md

Lines changed: 10 additions & 0 deletions
@@ -57,3 +57,13 @@ English
 
 [ウ
 ィキペディア]: https://ja.wikipedia.org/
+
+
+C言
+語
+・
+C++
+・
+Go
+・
+Rust
