Skip to content

Commit 0fa73b8

Browse files
author
Esben Sparre Andreasen
committed
JS: add query js/regex/missing-regexp-anchor
1 parent 69db54a commit 0fa73b8

File tree

11 files changed

+460
-0
lines changed

11 files changed

+460
-0
lines changed

change-notes/1.21/analysis-javascript.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,7 @@
2727

2828
| **Query** | **Tags** | **Purpose** |
2929
|-----------------------------------------------|------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
30+
| Missing regular expression anchor (`js/regex/missing-regexp-anchor`) | correctness, security, external/cwe/cwe-20 | Highlights regular expression patterns that may be missing an anchor, indicating a possible violation of [CWE-20](https://cwe.mitre.org/data/definitions/20.html). Results are not shown on LGTM by default. |
3031
| Prototype pollution (`js/prototype-pollution`) | security, external/cwe-250, external/cwe-400 | Highlights code that allows an attacker to modify a built-in prototype object through an unsanitized recursive merge function. The results are shown on LGTM by default. |
3132

3233
## Changes to existing queries

javascript/config/suites/javascript/security

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,7 @@
33
+ semmlecode-javascript-queries/Security/CWE-020/IncompleteHostnameRegExp.ql: /Security/CWE/CWE-020
44
+ semmlecode-javascript-queries/Security/CWE-020/IncompleteUrlSubstringSanitization.ql: /Security/CWE/CWE-020
55
+ semmlecode-javascript-queries/Security/CWE-020/IncorrectSuffixCheck.ql: /Security/CWE/CWE-020
6+
+ semmlecode-javascript-queries/Security/CWE-020/MissingRegExpAnchor.ql: /Security/CWE/CWE-020
67
+ semmlecode-javascript-queries/Security/CWE-022/TaintedPath.ql: /Security/CWE/CWE-022
78
+ semmlecode-javascript-queries/Security/CWE-022/ZipSlip.ql: /Security/CWE/CWE-022
89
+ semmlecode-javascript-queries/Security/CWE-078/CommandInjection.ql: /Security/CWE/CWE-078
Lines changed: 77 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,77 @@
1+
<!DOCTYPE qhelp PUBLIC
2+
"-//Semmle//qhelp//EN"
3+
"qhelp.dtd">
4+
<qhelp>
5+
6+
<overview>
7+
<p>
8+
9+
Sanitizing untrusted input with regular expressions is a
10+
common technique. However, it is error prone to match untrusted input
11+
against regular expressions without anchors such as <code>^</code> or
12+
<code>$</code>. Malicious input can bypass such security checks by
13+
embedding one of the allowed patterns in an unexpected location.
14+
15+
</p>
16+
17+
<p>
18+
19+
Even if the matching is not done in a security-critical
20+
context, it may still cause undesirable behaviors when the regular
21+
expression matches accidentally.
22+
23+
</p>
24+
</overview>
25+
26+
<recommendation>
27+
<p>
28+
29+
Use anchors to ensure that regular expressions match at
30+
the expected locations.
31+
32+
</p>
33+
</recommendation>
34+
35+
<example>
36+
37+
<p>
38+
39+
The following example code checks that a URL redirection
40+
will reach the <code>example.com</code> domain, or one of its
41+
subdomains, and not some malicious site.
42+
43+
</p>
44+
45+
<sample src="examples/MissingRegExpAnchor_BAD.js"/>
46+
47+
<p>
48+
49+
The check with the regular expression match is, however, easy to bypass. For example
50+
by embedding <code>example.com</code> in the path component:
51+
<code>http://evil-example.net/example.com</code>, or in the query
52+
string component: <code>http://evil-example.net/?x=example.com</code>.
53+
54+
Address these shortcomings by using anchors in the regular expression instead:
55+
56+
</p>
57+
58+
<sample src="examples/MissingRegExpAnchor_GOOD.js"/>
59+
60+
<p>
61+
62+
A related mistake is to write a regular expression with
63+
multiple alternatives, but to only include an anchor for one of the
64+
alternatives. As an example, the regular expression
65+
<code>/^www\\.example\\.com|beta\\.example\\.com/</code> will match the host
66+
<code>evil.beta.example.com</code> because the regular expression is parsed
67+
as <code>/(^www\\.example\\.com)|(beta\\.example\\.com)/</code>
68+
69+
</p>
70+
</example>
71+
72+
<references>
73+
<li>MDN: <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions">Regular Expressions</a></li>
74+
<li>OWASP: <a href="https://www.owasp.org/index.php/Server_Side_Request_Forgery">SSRF</a></li>
75+
<li>OWASP: <a href="https://www.owasp.org/index.php/Unvalidated_Redirects_and_Forwards_Cheat_Sheet">XSS Unvalidated Redirects and Forwards Cheat Sheet</a>.</li>
76+
</references>
77+
</qhelp>
Lines changed: 86 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,86 @@
1+
/**
2+
* @name Missing regular expression anchor
3+
* @description Regular expressions without anchors can be vulnerable to bypassing.
4+
* @kind problem
5+
* @problem.severity warning
6+
* @precision medium
7+
* @id js/regex/missing-regexp-anchor
8+
* @tags correctness
9+
* security
10+
* external/cwe/cwe-20
11+
*/
12+
13+
import javascript
14+
15+
/**
16+
* Holds if `src` is a pattern for a collection of alternatives where
17+
* only the first or last alternative is anchored, indicating a
18+
* precedence mistake explained by `msg`.
19+
*
20+
* The canonical example of such a mistake is: `^a|b|c`, which is
21+
* parsed as `(^a)|(b)|(c)`.
22+
*/
23+
predicate isAnInterestingSemiAnchoredRegExpString(RegExpPatternSource src, string msg) {
24+
exists(string str, string maybeGroupedStr, string regex, string anchorPart, string posString, string escapedDot |
25+
// a dot that might be escaped in a regular expression, for example `/\./` or new `RegExp('\\.')`
26+
escapedDot = "\\\\\\\\?[.]" and
27+
// a string that is mostly free from special reqular expression symbols
28+
str = "(?:(?:" + escapedDot + ")|[a-z:/.?_,@0-9 -])+" and
29+
// the string may be wrapped in parentheses
30+
maybeGroupedStr = "(?:" + str + "|\\(" + str + "\\))" and
31+
(
32+
// a problematic pattern: `^a|b|...|x`
33+
regex = "(?i)(\\^" + maybeGroupedStr + ")(?:\\|" + maybeGroupedStr + ")+" and
34+
posString = "beginning"
35+
or
36+
// a problematic pattern: `a|b|...|x$`
37+
regex = "(?i)(?:" + maybeGroupedStr + "\\|)+(" + maybeGroupedStr + "\\$)" and
38+
posString = "end"
39+
) and
40+
anchorPart = src.getPattern().regexpCapture(regex, 1) and
41+
anchorPart.regexpMatch("(?i).*[a-z].*") and
42+
msg = "The alternative '" + anchorPart + "' uses an anchor to match from the " + posString +
43+
" of a string, but the other alternatives of this regular expression do not use anchors."
44+
)
45+
}
46+
47+
/**
48+
* Holds if `src` is an unanchored pattern for a URL, indicating a
49+
* mistake explained by `msg`.
50+
*/
51+
predicate isAnInterestingUnanchoredRegExpString(RegExpPatternSource src, string msg) {
52+
exists(string pattern | pattern = src.getPattern() |
53+
// a substring sequence of a protocol and subdomains, perhaps with some regex characters mixed in, followed by a known TLD
54+
pattern
55+
.regexpMatch("(?i)[():|?a-z0-9-\\\\./]+[.]" + RegExpPatterns::commonTLD() +
56+
"([/#?():]\\S*)?") and
57+
// without any anchors
58+
pattern.regexpMatch("[^$^]+") and
59+
// that is not used for capture or replace
60+
not exists(DataFlow::MethodCallNode mcn, string name | name = mcn.getMethodName() |
61+
name = "exec" and
62+
mcn = src.getARegExpObject().getAMethodCall() and
63+
exists(mcn.getAPropertyRead())
64+
or
65+
exists(DataFlow::Node arg |
66+
arg = mcn.getArgument(0) and
67+
(
68+
src.getARegExpObject().flowsTo(arg) or
69+
src.(StringRegExpPatternSource).getAUse() = arg
70+
)
71+
|
72+
name = "replace"
73+
or
74+
name = "match" and exists(mcn.getAPropertyRead())
75+
)
76+
) and
77+
msg = "When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it."
78+
)
79+
}
80+
81+
from DataFlow::Node nd, string msg
82+
where
83+
isAnInterestingUnanchoredRegExpString(nd, msg)
84+
or
85+
isAnInterestingSemiAnchoredRegExpString(nd, msg)
86+
select nd, msg
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
app.get('/some/path', function(req, res) {
2+
let url = req.param("url");
3+
// BAD: the host of `url` may be controlled by an attacker
4+
if (url.match(/https?:\/\/www\.example\.com\//)) {
5+
res.redirect(url);
6+
}
7+
});
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
app.get('/some/path', function(req, res) {
2+
let url = req.param("url");
3+
// GOOD: the host of `url` can not be controlled by an attacker
4+
if (url.match(/^https?:\/\/www\.example\.com\//)) {
5+
res.redirect(url);
6+
}
7+
});

javascript/ql/test/query-tests/Security/CWE-020/IncompleteHostnameRegExp.expected

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,3 +22,6 @@
2222
| tst-IncompleteHostnameRegExp.js:48:13:48:68 | '^http: ... e\\.com' | This string, which is used as a regular expression $@, has an unescaped '.' before 'example.com', so it might match more hosts than expected. | tst-IncompleteHostnameRegExp.js:48:13:48:68 | '^http: ... e\\.com' | here |
2323
| tst-IncompleteHostnameRegExp.js:48:41:48:68 | '^https ... e\\.com' | This string, which is used as a regular expression $@, has an unescaped '.' before 'example.com', so it might match more hosts than expected. | tst-IncompleteHostnameRegExp.js:48:13:48:68 | '^http: ... e\\.com' | here |
2424
| tst-IncompleteHostnameRegExp.js:53:13:53:36 | 'test.' ... e.com$' | This string, which is used as a regular expression $@, has an unescaped '.' before 'example.com', so it might match more hosts than expected. | tst-IncompleteHostnameRegExp.js:53:13:53:36 | 'test.' ... e.com$' | here |
25+
| tst-SemiAnchoredRegExp.js:30:2:30:23 | /^good. ... er.com/ | This regular expression has an unescaped '.' before 'com\|better.com', so it might match more hosts than expected. | tst-SemiAnchoredRegExp.js:30:2:30:23 | /^good. ... er.com/ | here |
26+
| tst-SemiAnchoredRegExp.js:64:13:64:34 | '^good. ... er.com' | This string, which is used as a regular expression $@, has an unescaped '.' before 'com\|better.com', so it might match more hosts than expected. | tst-SemiAnchoredRegExp.js:64:13:64:34 | '^good. ... er.com' | here |
27+
| tst-SemiAnchoredRegExp.js:65:13:65:36 | '^good\\ ... r\\.com' | This string, which is used as a regular expression $@, has an unescaped '.' before 'com\|better.com', so it might match more hosts than expected. | tst-SemiAnchoredRegExp.js:65:13:65:36 | '^good\\ ... r\\.com' | here |
Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,49 @@
1+
| tst-SemiAnchoredRegExp.js:3:2:3:7 | /^a\|b/ | The alternative '^a' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
2+
| tst-SemiAnchoredRegExp.js:6:2:6:9 | /^a\|b\|c/ | The alternative '^a' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
3+
| tst-SemiAnchoredRegExp.js:12:2:12:9 | /^a\|(b)/ | The alternative '^a' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
4+
| tst-SemiAnchoredRegExp.js:14:2:14:11 | /^(a)\|(b)/ | The alternative '^(a)' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
5+
| tst-SemiAnchoredRegExp.js:17:2:17:7 | /a\|b$/ | The alternative 'b$' uses an anchor to match from the end of a string, but the other alternatives of this regular expression do not use anchors. |
6+
| tst-SemiAnchoredRegExp.js:20:2:20:9 | /a\|b\|c$/ | The alternative 'c$' uses an anchor to match from the end of a string, but the other alternatives of this regular expression do not use anchors. |
7+
| tst-SemiAnchoredRegExp.js:26:2:26:9 | /(a)\|b$/ | The alternative 'b$' uses an anchor to match from the end of a string, but the other alternatives of this regular expression do not use anchors. |
8+
| tst-SemiAnchoredRegExp.js:28:2:28:11 | /(a)\|(b)$/ | The alternative '(b)$' uses an anchor to match from the end of a string, but the other alternatives of this regular expression do not use anchors. |
9+
| tst-SemiAnchoredRegExp.js:30:2:30:23 | /^good. ... er.com/ | The alternative '^good.com' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
10+
| tst-SemiAnchoredRegExp.js:31:2:31:25 | /^good\\ ... r\\.com/ | The alternative '^good\\.com' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
11+
| tst-SemiAnchoredRegExp.js:32:2:32:27 | /^good\\ ... \\\\.com/ | The alternative '^good\\\\.com' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
12+
| tst-SemiAnchoredRegExp.js:37:13:37:18 | "^a\|b" | The alternative '^a' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
13+
| tst-SemiAnchoredRegExp.js:40:13:40:20 | "^a\|b\|c" | The alternative '^a' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
14+
| tst-SemiAnchoredRegExp.js:46:13:46:20 | "^a\|(b)" | The alternative '^a' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
15+
| tst-SemiAnchoredRegExp.js:48:13:48:22 | "^(a)\|(b)" | The alternative '^(a)' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
16+
| tst-SemiAnchoredRegExp.js:51:13:51:18 | "a\|b$" | The alternative 'b$' uses an anchor to match from the end of a string, but the other alternatives of this regular expression do not use anchors. |
17+
| tst-SemiAnchoredRegExp.js:54:13:54:20 | "a\|b\|c$" | The alternative 'c$' uses an anchor to match from the end of a string, but the other alternatives of this regular expression do not use anchors. |
18+
| tst-SemiAnchoredRegExp.js:60:13:60:20 | "(a)\|b$" | The alternative 'b$' uses an anchor to match from the end of a string, but the other alternatives of this regular expression do not use anchors. |
19+
| tst-SemiAnchoredRegExp.js:62:13:62:22 | "(a)\|(b)$" | The alternative '(b)$' uses an anchor to match from the end of a string, but the other alternatives of this regular expression do not use anchors. |
20+
| tst-SemiAnchoredRegExp.js:64:13:64:34 | '^good. ... er.com' | The alternative '^good.com' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
21+
| tst-SemiAnchoredRegExp.js:65:13:65:36 | '^good\\ ... r\\.com' | The alternative '^good.com' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
22+
| tst-SemiAnchoredRegExp.js:66:13:66:38 | '^good\\ ... \\\\.com' | The alternative '^good\\.com' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
23+
| tst-SemiAnchoredRegExp.js:75:2:75:27 | /(\\.xxx ... .zzz)$/ | The alternative '(\\.zzz)$' uses an anchor to match from the end of a string, but the other alternatives of this regular expression do not use anchors. |
24+
| tst-SemiAnchoredRegExp.js:77:2:77:23 | /\\.xxx\| ... zzz$/ig | The alternative '\\.zzz$' uses an anchor to match from the end of a string, but the other alternatives of this regular expression do not use anchors. |
25+
| tst-SemiAnchoredRegExp.js:78:2:78:19 | /\\.xxx\|\\.yyy\|zzz$/ | The alternative 'zzz$' uses an anchor to match from the end of a string, but the other alternatives of this regular expression do not use anchors. |
26+
| tst-SemiAnchoredRegExp.js:81:2:81:28 | /^(xxx ... yyy)/i | The alternative '^(xxx yyy zzz)' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
27+
| tst-SemiAnchoredRegExp.js:83:2:83:24 | /^(xxx: ... (zzz:)/ | The alternative '^(xxx:)' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
28+
| tst-SemiAnchoredRegExp.js:84:2:84:23 | /^(xxx? ... zzz\\/)/ | The alternative '^(xxx?:)' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
29+
| tst-SemiAnchoredRegExp.js:85:2:85:16 | /^@media\|@page/ | The alternative '^@media' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
30+
| tst-SemiAnchoredRegExp.js:87:2:87:21 | /^click\|mouse\|touch/ | The alternative '^click' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
31+
| tst-SemiAnchoredRegExp.js:88:2:88:43 | /^http: ... r\\.com/ | The alternative '^http://good\\.com' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
32+
| tst-SemiAnchoredRegExp.js:89:2:89:47 | /^https ... r\\.com/ | The alternative '^https?://good\\.com' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
33+
| tst-SemiAnchoredRegExp.js:90:2:90:55 | /^mouse ... ragend/ | The alternative '^mouse' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
34+
| tst-SemiAnchoredRegExp.js:91:2:91:14 | /^xxx:\|yyy:/i | The alternative '^xxx:' uses an anchor to match from the beginning of a string, but the other alternatives of this regular expression do not use anchors. |
35+
| tst-SemiAnchoredRegExp.js:92:2:92:18 | /_xxx\|_yyy\|_zzz$/ | The alternative '_zzz$' uses an anchor to match from the end of a string, but the other alternatives of this regular expression do not use anchors. |
36+
| tst-UnanchoredUrlRegExp.js:3:43:3:61 | "https?://good.com" | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
37+
| tst-UnanchoredUrlRegExp.js:4:54:4:72 | "https?://good.com" | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
38+
| tst-UnanchoredUrlRegExp.js:10:2:10:22 | /https? ... od.com/ | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
39+
| tst-UnanchoredUrlRegExp.js:11:13:11:31 | "https?://good.com" | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
40+
| tst-UnanchoredUrlRegExp.js:13:44:13:62 | "https?://good.com" | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
41+
| tst-UnanchoredUrlRegExp.js:15:13:15:31 | "https?://good.com" | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
42+
| tst-UnanchoredUrlRegExp.js:19:43:19:62 | "https?://good.com/" | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
43+
| tst-UnanchoredUrlRegExp.js:20:43:20:66 | "https? ... m:8080" | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
44+
| tst-UnanchoredUrlRegExp.js:23:3:23:21 | "https?://good.com" | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
45+
| tst-UnanchoredUrlRegExp.js:24:3:24:23 | /https? ... od.com/ | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
46+
| tst-UnanchoredUrlRegExp.js:25:14:25:32 | "https?://good.com" | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
47+
| tst-UnanchoredUrlRegExp.js:35:2:35:32 | /https? ... 0-9]+)/ | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
48+
| tst-UnanchoredUrlRegExp.js:49:11:49:51 | /youtub ... -_]+)/i | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
49+
| tst-UnanchoredUrlRegExp.js:77:11:77:32 | /vimeo\\ ... 0-9]+)/ | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
Security/CWE-020/MissingRegExpAnchor.ql

0 commit comments

Comments
 (0)