38 changes: 19 additions & 19 deletions 5-regular-expressions/10-regexp-backreferences/article.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
# Backreferences: \n and $n
# 反向引用:\n $n

Capturing groups may be accessed not only in the result, but in the replacement string, and in the pattern too.
捕获组不仅能在结果中读取,也能在替换字符串,甚至模式中读取。

## Group in replacement: $n
## 替换字符串中的组:$n

When using the `replace` method, we can access the n-th group in the replacement string using `$n`.
`replace` 方法中可以用 `$n` 在替换字符串中访问第 n 个捕获组。

For instance:
例如:

```js run
let name = "John Smith";
@@ -15,19 +15,19 @@ name = name.replace(/(\w+) (\w+)/i, *!*"$2, $1"*/!*);
alert( name ); // Smith, John
```

Here `pattern:$1` in the replacement string means "substitute the content of the first group here", and `pattern:$2` means "substitute the second group here".
这里替换字符串中的 `pattern:$1` 的意思是“在这里替换第一个捕获组的内容”,`pattern:$2` 的意思是“在这里替换第二个捕获组的内容”。

Referencing a group in the replacement string allows us to reuse the existing text during the replacement.
在替换字符串中引用组允许我们在替换时重用已存在的文本。
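
As a further illustration of reusing matched text in a replacement, here is a minimal sketch that reorders the parts of a date. The variable `isoDate` and the date format are assumptions made for this example; they do not come from the article.

```js
// Hypothetical input: a date written as year-month-day
let isoDate = "2019-03-25";

// Three capturing groups: $1 = year, $2 = month, $3 = day
let reordered = isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, "$3.$2.$1");

console.log(reordered); // 25.03.2019
```

The replacement string only rearranges text that the groups already captured; nothing new is typed in except the separators.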

## Group in pattern: \n
## 模式中的组:\n

A group can be referenced in the pattern using `\n`.
在模式中,可以用 `\n` 引用组。

To make things clear let's consider a task. We need to find a quoted string: either a single-quoted `subject:'...'` or a double-quoted `subject:"..."` -- both variants need to match.
为了更好的说明,假设有一个任务。我们需要找出带引号的字符串:要么是单引号 `subject:'...'`,要么是双引号 `subject:"..."` —— 这两种类型都要找出来。

How to look for them?
怎么查找这类引用组呢?

We can put two kinds of quotes in the pattern: `pattern:['"](.*?)['"]`. That finds strings like `match:"..."` and `match:'...'`, but it gives incorrect matches when one quote appears inside another one, like the string `subject:"She's the one!"`:
模式里要有两种引号:`pattern:['"](.*?)['"]`。匹配形如 `match:"..."` `match:'...'` 的字符串,但当两类符号存在嵌套是,结果就不正确了,例如字符串 `subject:"She's the one!"`

```js run
let str = "He said: \"She's the one!\".";
@@ -38,9 +38,9 @@ let reg = /['"](.*?)['"]/g;
alert( str.match(reg) ); // "She'
```

As we can see, the pattern found an opening quote `match:"`, then the text is consumed lazily till the other quote `match:'`, that closes the match.
如你所见,这个模式发现了左双引号 `match:"`,然后匹配文本直到发现另一个引号 `match:'`,结束本次匹配。

To make sure that the pattern looks for the closing quote exactly the same as the opening one, let's make a group of it and use the backreference:
为了确保模式匹配的右引号类型和左引号类型一致,可以将引号包裹成组并使用反向引用:

```js run
let str = "He said: \"She's the one!\".";
@@ -50,11 +50,11 @@ let reg = /(['"])(.*?)\1/g;
alert( str.match(reg) ); // "She's the one!"
```

Now everything's correct! The regular expression engine finds the first quote `pattern:(['"])` and remembers the content of `pattern:(...)`, that's the first capturing group.
现在一切搞定!正则表达式引擎匹配第一个引号 `pattern:(['"])` 时,记录 `pattern(...)` 的内容,这就是第一个捕获组。

Further in the pattern `pattern:\1` means "find the same text as in the first group".
`pattern:\1` 的含义是“找到与第一组相同的文本”。

Please note:
请注意:

- To reference a group inside a replacement string -- we use `$1`, while in the pattern -- a backslash `\1`.
- If we use `?:` in the group, then we can't reference it. Groups that are excluded from capturing `(?:...)` are not remembered by the engine.
- 在替换字符串内部引用组的方式 —— `$1`,在模式中引用组的方式 —— `\1`
- 在组内使用 `?:` 则无法引用到该组。正则表达式引擎不会记住被排除在捕获 `(?:...)` 之外的组。
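
The two notes above can be checked with a short sketch. The strings and variable names (`quoted`, `withBackref`, `swapped`) are chosen for illustration only.

```js
let quoted = `He said: "She's the one!".`;

// (['"]) is group 1, so \1 must repeat the same quote character
let withBackref = /(['"])(.*?)\1/g;
console.log(quoted.match(withBackref)); // [ `"She's the one!"` ]

// (?:ab) is excluded from capturing and gets no number,
// so the first (\d) is group 1 and the second (\d) is group 2
let swapped = "ab12".replace(/(?:ab)(\d)(\d)/, "$2$1");
console.log(swapped); // 21
```

Note that `\1` is used inside the pattern, while `$1` and `$2` appear only in the replacement string.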
@@ -1,7 +1,7 @@

The first idea can be to list the languages with `|` in-between.
第一个解法是列出所有语言,中间加上 `|` 符号。

But that doesn't work right:
但是运行不如所愿:

```js run
let reg = /Java|JavaScript|PHP|C|C\+\+/g;
@@ -11,18 +11,18 @@ let str = "Java, JavaScript, PHP, C, C++";
alert( str.match(reg) ); // Java,Java,PHP,C,C
```

The regular expression engine looks for alternations one-by-one. That is: first it checks if we have `match:Java`, otherwise -- looks for `match:JavaScript` and so on.
正则表达式引擎查找选择模式的时是挨个查找的。意思是:它先匹配是否存在 `match:Java`,否则 —— 接着匹配 `match:JavaScript` 及其后的字符串。

As a result, `match:JavaScript` can never be found, just because `match:Java` is checked first.
结果,`match:JavaScript` 永远匹配不到,因为 `match:Java` 先被匹配了。

The same with `match:C` and `match:C++`.
`match:C` `match:C++` 同理。

There are two solutions for that problem:
这个问题有两个解决办法:

1. Change the order to check the longer match first: `pattern:JavaScript|Java|C\+\+|C|PHP`.
2. Merge variants with the same start: `pattern:Java(Script)?|C(\+\+)?|PHP`.
1. 变更匹配顺序,长的字符串优先匹配:`pattern:JavaScript|Java|C\+\+|C|PHP`
2. 合并相同前缀:`pattern:Java(Script)?|C(\+\+)?|PHP`
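
For comparison, the first solution (longer alternatives first) can be sketched like this, using the same input string as the failing example. The variable names `langs` and `longerFirst` are illustrative.

```js
let langs = "Java, JavaScript, PHP, C, C++";

// Longer variants are listed before their prefixes,
// so "JavaScript" is tried before "Java" and "C++" before "C"
let longerFirst = /JavaScript|Java|C\+\+|C|PHP/g;

console.log(langs.match(longerFirst)); // Java,JavaScript,PHP,C,C++
```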

In action:
运行代码如下:

```js run
let reg = /Java(Script)?|C(\+\+)?|PHP/g;
@@ -1,8 +1,8 @@
# Find programming languages
# 查找编程语言

There are many programming languages, for instance Java, JavaScript, PHP, C, C++.
有许多编程语言,例如 Java, JavaScript, PHP, C, C++

Create a regexp that finds them in the string `subject:Java JavaScript PHP C++ C`:
构建一个正则式,用来匹配字符串 `subject:Java JavaScript PHP C++ C` 中包含的编程语言:

```js
let reg = /your regexp/g;
@@ -1,11 +1,11 @@

Opening tag is `pattern:\[(b|url|quote)\]`.
起始标签是 `pattern:\[(b|url|quote)\]`

Then, to find everything up to the closing tag, let's use the pattern `pattern:[\s\S]*?` to match any character including the newline, followed by a backreference in the closing tag.
匹配字符串直到遇到结束标签 —— 模式 `pattern:[\s\S]*?` 匹配任意字符,包括换行和用于结束标记的反向引用。

The full pattern: `pattern:\[(b|url|quote)\][\s\S]*?\[/\1\]`.
完整模式为:`pattern:\[(b|url|quote)\][\s\S]*?\[/\1\]`

In action:
运行代码如下:

```js run
let reg = /\[(b|url|quote)\][\s\S]*?\[\/\1\]/g;
@@ -20,4 +20,4 @@ let str = `
alert( str.match(reg) ); // [b]hello![/b],[quote][url]http://google.com[/url][/quote]
```

Please note that we had to escape a slash for the closing tag `pattern:[/\1]`, because normally the slash closes the pattern.
请注意我们要转义结束标签 `pattern:[/\1]` 中的斜杠,通常斜杠会关闭模式。
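
A minimal sketch of the escaping rule, applied to a nested tag. The sample string `text` is an assumption, and the `RegExp` constructor variant is shown only for contrast; it is not part of the original solution. In a string the slash needs no escaping, but every backslash has to be doubled.

```js
let text = "..[url][b]http://google.com[/b][/url]..";

// Regexp literal: the slash in the closing tag must be escaped as \/
let bbReg = /\[(b|url|quote)\][\s\S]*?\[\/\1\]/g;

// Same pattern built from a string: "/" is an ordinary character here,
// but each backslash is doubled so that it survives string parsing
let bbFromString = new RegExp("\\[(b|url|quote)\\][\\s\\S]*?\\[/\\1\\]", "g");

console.log(text.match(bbReg)); // [ '[url][b]http://google.com[/b][/url]' ]
console.log(text.match(bbFromString)); // same result
```

Because `\1` must repeat the opening tag name, the inner `[/b]` does not end the match that started with `[url]`.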
@@ -1,35 +1,35 @@
# Find bbtag pairs
# 查找 bbtag

A "bb-tag" looks like `[tag]...[/tag]`, where `tag` is one of: `b`, `url` or `quote`.
bb-tag” 形如 `[tag]...[/tag]``tag` 匹配 `b``url` `quote` 其中之一。

For instance:
例如:
```
[b]text[/b]
[url]http://google.com[/url]
```

BB-tags can be nested. But a tag can't be nested into itself, for instance:
BB-tags 可以嵌套。但标签不能自嵌套,比如:

```
Normal:
可行:
[url] [b]http://google.com[/b] [/url]
[quote] [b]text[/b] [/quote]

Impossible:
不可行:
[b][b]text[/b][/b]
```

Tags can contain line breaks, that's normal:
标签可以包含换行,通常为以下形式:

```
[quote]
[b]text[/b]
[/quote]
```

Create a regexp to find all BB-tags with their contents.
构造一个正则式用于查找所有 BB-tags 和其内容。

For instance:
举例:

```js
let reg = /your regexp/g;
@@ -38,7 +38,7 @@ let str = "..[url]http://google.com[/url]..";
alert( str.match(reg) ); // [url]http://google.com[/url]
```

If tags are nested, then we need the outer tag (if we want we can continue the search in its content):
如果标签嵌套,那么我们需要记录匹配的外层标签(如果希望继续查找匹配的标签内容的话):

```js
let reg = /your regexp/g;
@@ -1,13 +1,13 @@
The solution: `pattern:/"(\\.|[^"\\])*"/g`.
答案是 `pattern:/"(\\.|[^"\\])*"/g`

Step by step:
步骤如下:

- First we look for an opening quote `pattern:"`
- Then if we have a backslash `pattern:\\` (we technically have to double it in the pattern, because it is a special character, so that's a single backslash in fact), then any character is fine after it (a dot).
- Otherwise we take any character except a quote (that would mean the end of the string) and a backslash (to prevent lonely backslashes, the backslash is only used with some other symbol after it): `pattern:[^"\\]`
- ...And so on till the closing quote.
- 首先匹配左双引号 `pattern:"`
- 接着如果有反斜杠 `pattern:\\`,则匹配其后跟随的任意字符。(技术上,我们必须在模式中用双反斜杠,因为它是一个特殊的字符,但实际上是一个反斜杠字符)
- 如果没有,则匹配除双引号(字符串的结束)和反斜杠(排除仅存在反斜杠的情况,反斜杠仅在和其后字符一起使用时有效)外的任意字符:`pattern:[^"\\]`
- ...继续匹配直到遇到反双引号
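
The steps above can be sketched as follows, applied to the sample string from the task statement. The names `sample` and `quotedReg` are illustrative.

```js
// Backslashes are doubled so that the string really contains them
let sample = ' .. "test me" .. "Say \\"Hello\\"!" .. "\\\\ \\"" .. ';

// An opening quote, then any number of either
//   \\.      a backslash followed by any character, or
//   [^"\\]   any character except a quote and a backslash,
// and finally the closing quote
let quotedReg = /"(\\.|[^"\\])*"/g;

let found = sample.match(quotedReg);
console.log(found.length); // 3
console.log(found.join(" | ")); // "test me" | "Say \"Hello\"!" | "\\ \""
```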

In action:
运行代码如下:

```js run
let reg = /"(\\.|[^"\\])*"/g;
@@ -1,28 +1,28 @@
# Find quoted strings
# 查询引用字符串

Create a regexp to find strings in double quotes `subject:"..."`.
构建一个正则表达式用于匹配双引号内的字符串 `subject:"..."`

The important part is that strings should support escaping, in the same way as JavaScript strings do. For instance, quotes can be inserted as `subject:\"`, a newline as `subject:\n`, and the backslash itself as `subject:\\`.
最重要的部分是字符串应该支持转义,正如 JavaScript 字符串的行为一样。例如,引号可以插入为 `subject:\"`,换行符为 `subject:\n`,斜杠本身为 `subject:\\`

```js
let str = "Just like \"here\".";
```

For us it's important that an escaped quote `subject:\"` does not end a string.
对我们来说,重要的是转义的引号 `subject:\"` 不会结束字符串匹配。

So we should look from one quote to the other ignoring escaped quotes on the way.
所以,我们应该匹配两个引号之间的内容,且忽略中间转义的引号。

That's the essential part of the task, otherwise it would be trivial.
这是任务的关键部分,否则这个任务就没什么意思了。

Examples of strings to match:
匹配字符串示例:
```js
.. *!*"test me"*/!* ..
.. *!*"Say \"Hello\"!"*/!* ... (escaped quotes inside)
.. *!*"\\"*/!* .. (double slash inside)
.. *!*"\\ \""*/!* .. (double slash and an escaped quote inside)
```

In JavaScript we need to double the slashes to pass them right into the string, like this:
JavaScript 中,双斜杠用于把斜杠转义为字符串,如下所示:

```js run
let str = ' .. "test me" .. "Say \\"Hello\\"!" .. "\\\\ \\"" .. ';
@@ -1,13 +1,13 @@

The pattern start is obvious: `pattern:<style`.
模式的开头显而易见:`pattern:<style`

...But then we can't simply write `pattern:<style.*?>`, because `match:<styler>` would match it.
...然而不能简单地写出 `pattern:<style.*?>` 这样的表达式,因为会同时匹配 `match:<styler>`

We need either a space after `match:<style` and then optionally something else or the ending `match:>`.
要么匹配 `match:<style` 后的一个空格,然后匹配任意内容;要么直接匹配结束符号 `match:>`

In the regexp language: `pattern:<style(>|\s.*?>)`.
最终的正则表达式为:`pattern:<style(>|\s.*?>)`
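
A quick check of this pattern, as a sketch. The test string `html` is an assumption made for the example.

```js
let html = '<style> <styler> <style test="...">';

// After "<style" we require either ">" right away,
// or a whitespace character followed by anything up to the nearest ">"
let styleReg = /<style(>|\s.*?>)/g;

console.log(html.match(styleReg)); // <style>,<style test="...">
```

`<styler>` is skipped because the character after `<style` is `r`, which is neither `>` nor whitespace.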

In action:
运行代码如下:

```js run
let reg = /<style(>|\s.*?>)/g;
@@ -1,10 +1,10 @@
# Find the full tag
# 查找完整标签

Write a regexp to find the tag `<style...>`. It should match the full tag: it may have no attributes `<style>` or have several of them `<style type="..." id="...">`.
写出一个正则表达式,用于查找 `<style...>` 标签。它应该匹配完整的标签:该标签可能是没有属性的标签 `<style>` 或是有很多属性的标签 `<style type="..." id="...">`

...But the regexp should not match `<styler>`!
...同时正则表达式不应该匹配 `<styler>`

For instance:
举例如下:

```js
let reg = /your regexp/g;
52 changes: 26 additions & 26 deletions 5-regular-expressions/11-regexp-alternation/article.md
@@ -1,14 +1,14 @@
# Alternation (OR) |
# 选择(OR)|

Alternation is the term in regular expressions for what is actually a simple "OR".
选择是正则表达式中的一个术语,实际上是一个简单的“或”。

In a regular expression it is denoted with a vertical line character `pattern:|`.
在正则表达式中,它用竖线 `pattern:|` 表示。

For instance, we need to find programming languages: HTML, PHP, Java or JavaScript.
例如,我们需要找出编程语言:HTMLPHPJava JavaScript

The corresponding regexp: `pattern:html|php|java(script)?`.
对应的正则表达式为:`pattern:html|php|java(script)?`

A usage example:
用例如下:

```js run
let reg = /html|php|css|java(script)?/gi;
@@ -18,37 +18,37 @@ let str = "First HTML appeared, then CSS, then JavaScript";
alert( str.match(reg) ); // 'HTML', 'CSS', 'JavaScript'
```

We already know a similar thing -- square brackets. They allow choosing between multiple characters, for instance `pattern:gr[ae]y` matches `match:gray` or `match:grey`.
我们已知的一个相似符号 —— 方括号。就允许在许多字符中进行选择,例如 `pattern:gr[ae]y` 匹配 `match:gray` `match:grey`

Alternation works not on a character level, but on expression level. A regexp `pattern:A|B|C` means one of expressions `A`, `B` or `C`.
选择符号并非在字符级别生效,而是在表达式级别。正则表达式 `pattern:A|B|C` 意思是命中 `A``B` `C` 其一均可。

For instance:
例如:

- `pattern:gr(a|e)y` means exactly the same as `pattern:gr[ae]y`.
- `pattern:gra|ey` means "gra" or "ey".
- `pattern:gr(a|e)y` 严格等同 `pattern:gr[ae]y`
- `pattern:gra|ey` 匹配 "gra" or "ey"
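
The difference between the two patterns can be seen in a small sketch. The test string `words` is an assumption made for the example.

```js
let words = "gray grey gra ey";

// Parentheses limit the alternation to a single position: a or e
console.log(words.match(/gr(a|e)y/g)); // gray,grey
console.log(words.match(/gr[ae]y/g));  // gray,grey

// Without parentheses the whole pattern is split in two: "gra" OR "ey"
console.log(words.match(/gra|ey/g));   // gra,ey,gra,ey
```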

To separate a part of the pattern for alternation we usually enclose it in parentheses, like this: `pattern:before(XXX|YYY)after`.
我们通常用圆括号把模式中的选择部分括起来,像这样 `pattern:before(XXX|YYY)after`

## Regexp for time
## 时间正则表达式

In previous chapters there was a task to build a regexp for searching time in the form `hh:mm`, for instance `12:00`. But a simple `pattern:\d\d:\d\d` is too vague. It accepts `25:99` as the time.
在之前的章节中有个任务是构建用于查找形如 `hh:mm` 的时间字符串,例如 `12:00`。但是简单的 `pattern:\d\d:\d\d` 过于模糊。它同时匹配 `25:99`

How can we make a better one?
如何构建更优的正则表达式?

We can apply more careful matching:
我们可以应用到更多的严格匹配结果中:

- The first digit must be `0` or `1` followed by any digit.
- Or `2` followed by `pattern:[0-3]`
- 首个匹配数字必须是 `0` `1`,同时其后还要跟随任一数字。
- 或者是数字 `2` 之后跟随 `pattern:[0-3]`

As a regexp: `pattern:[01]\d|2[0-3]`.
构建正则表达式:`pattern:[01]\d|2[0-3]`
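
As a sanity check of the hours part alone, here is a minimal sketch. The anchors `^` and `$` and the surrounding parentheses are additions for this test only, so that the whole string has to be a valid hour; they are not part of the pattern at this point in the article.

```js
// Anchored variant of [01]\d|2[0-3], used only for testing whole strings
let hoursOnly = /^([01]\d|2[0-3])$/;

let validHours = ["00", "09", "19", "23"];
let invalidHours = ["24", "25", "3"];

console.log(validHours.every(h => hoursOnly.test(h)));  // true
console.log(invalidHours.some(h => hoursOnly.test(h))); // false
```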

Then we can add a colon and the minutes part.
接着可以添加冒号和分钟的部分。

The minutes must be from `0` to `59`, in the regexp language that means the first digit `pattern:[0-5]` followed by any other digit `\d`.
分钟的部分必须在 `0` `59` 区间,在正则表达式语言中含义为首个匹配数字 `pattern:[0-5]` 其后跟随任一数字 `\d`

Let's glue them together into the pattern: `pattern:[01]\d|2[0-3]:[0-5]\d`.
把他们拼接在一起形成最终的模式 `pattern:[01]\d|2[0-3]:[0-5]\d`

We're almost done, but there's a problem. The alternation `|` is between the `pattern:[01]\d` and `pattern:2[0-3]:[0-5]\d`. That's wrong, because it will match either the left or the right pattern:
快大功告成了,但仍然存在一个问题。选择符 `|` `pattern:[01]\d` `pattern:2[0-3]:[0-5]\d` 之间。这是错误的,因为它只匹配符号左侧或右侧任一表达式。


```js run
@@ -57,11 +57,11 @@ let reg = /[01]\d|2[0-3]:[0-5]\d/g;
alert("12".match(reg)); // 12 (matched [01]\d)
```

That's rather obvious, but still a common mistake when starting to work with regular expressions.
这个错误相当明显,但也是初学正则表达式的常见错误。

We need to add parentheses to apply alternation exactly to hours: `[01]\d` OR `2[0-3]`.
我们需要添加一个插入语用于匹配时钟:`[01]\d` `2[0-3]`

The correct variant:
以下为正确版本:

```js run
let reg = /([01]\d|2[0-3]):[0-5]\d/g;