|
1 | 1 | # Instructions |
2 | 2 |
|
3 | | -You have identified a gap in the social media market for very very short |
4 | | -posts. Now that Twitter allows 280 character posts, people wanting quick |
5 | | -social media updates aren't being served. You decide to create your own |
6 | | -social media network. |
| 3 | +You have identified a gap in the social media market for very very short posts. |
| 4 | +Now that Twitter allows 280 character posts, people wanting quick social media updates aren't being served. |
| 5 | +You decide to create your own social media network. |
7 | 6 |
|
8 | | -To make your product noteworthy, you make it extreme and only allow posts |
9 | | -of 5 or less characters. Any posts of more than 5 characters should be |
10 | | -truncated to 5. |
| 7 | +To make your product noteworthy, you make it extreme and only allow posts of 5 or less characters. |
| 8 | +Any posts of more than 5 characters should be truncated to 5. |
11 | 9 |
|
12 | | -To allow your users to express themselves fully, you allow Emoji and |
13 | | -other Unicode. |
| 10 | +To allow your users to express themselves fully, you allow Emoji and other Unicode. |
14 | 11 |
|
15 | 12 | The task is to truncate input strings to 5 characters. |
16 | 13 |
|
17 | 14 | ## Text Encodings |
18 | 15 |
|
19 | 16 | Text stored digitally has to be converted to a series of bytes. |
20 | 17 | There are 3 ways to map characters to bytes in common use. |
21 | | -* **ASCII** can encode English language characters. All |
22 | | -characters are precisely 1 byte long. |
23 | | -* **UTF-8** is a Unicode text encoding. Characters take between 1 |
24 | | -and 4 bytes. |
25 | | -* **UTF-16** is a Unicode text encoding. Characters are either 2 or |
26 | | -4 bytes long. |
27 | | - |
28 | | -UTF-8 and UTF-16 are both Unicode encodings which means they're capable of |
29 | | -representing a massive range of characters including: |
30 | | -* Text in most of the world's languages and scripts |
31 | | -* Historic text |
32 | | -* Emoji |
33 | | - |
34 | | -UTF-8 and UTF-16 are both variable length encodings, which means that |
35 | | -different characters take up different amounts of space. |
36 | | - |
37 | | -Consider the letter 'a' and the emoji '😛'. In UTF-16 the letter takes |
38 | | -2 bytes but the emoji takes 4 bytes. |
39 | | - |
40 | | -The trick to this exercise is to use APIs designed around Unicode |
41 | | -characters (codepoints) instead of Unicode codeunits. |
| 18 | + |
| 19 | +- **ASCII** can encode English language characters. |
| 20 | + All characters are precisely 1 byte long. |
| 21 | +- **UTF-8** is a Unicode text encoding. |
| 22 | + Characters take between 1 and 4 bytes. |
| 23 | +- **UTF-16** is a Unicode text encoding. |
| 24 | + Characters are either 2 or 4 bytes long. |
| 25 | + |
| 26 | +UTF-8 and UTF-16 are both Unicode encodings which means they're capable of representing a massive range of characters including: |
| 27 | + |
| 28 | +- Text in most of the world's languages and scripts |
| 29 | +- Historic text |
| 30 | +- Emoji |
| 31 | + |
| 32 | +UTF-8 and UTF-16 are both variable length encodings, which means that different characters take up different amounts of space. |
| 33 | + |
| 34 | +Consider the letter 'a' and the emoji '😛'. |
| 35 | +In UTF-16 the letter takes 2 bytes but the emoji takes 4 bytes. |
| 36 | + |
| 37 | +The trick to this exercise is to use APIs designed around Unicode characters (code points) instead of Unicode code units. |
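
As a rough illustration of the instructions in this diff, here is a minimal Python sketch. It assumes Python 3.9+, where `str` is indexed by code point. The helper names `truncate` and `encoded_sizes` are hypothetical and are not taken from the exercise or its test suite.

```python
def truncate(post: str, limit: int = 5) -> str:
    """Keep at most `limit` Unicode code points of `post`."""
    # Python's str is indexed by code point, so slicing never splits
    # a character the way slicing UTF-8 or UTF-16 code units could.
    return post[:limit]


def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return the (UTF-8, UTF-16) byte lengths of `ch`."""
    # "utf-16-le" avoids the 2-byte byte-order mark that "utf-16" prepends.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


print(encoded_sizes("a"))        # (1, 2)
print(encoded_sizes("😛"))       # (4, 4)
print(truncate("Hello, world"))  # Hello
```

Languages whose strings are indexed by UTF-16 or UTF-8 code units would need a code point aware API to get the same result. Truncating by code point can still split an emoji that is built from several code points joined together.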