Merge codeql-ruby into codeql#6942
Conversation
Fix filenames in source archives
Co-authored-by: Nick Rolfe <nickrolfe@github.com>
Start modelling some file system access concepts
This seems a more convenient place to keep all the HTTP client modelling.
Model outgoing HTTP requests as remote flow sources
Add support for flow summaries
Add query for Code Injection
Alert suppression and file classifier query
Desugar array literals to `::Array.[]`
Make `{Unary,Binary}Operation` a sub class of `MethodCall`
Model some more HTTP clients
`self` variables are scoped to methods, modules, classes and the top-level of the program. Prior to this change, they were treated as being scoped just to methods. This change means we (once again) correctly synthesise `self` receivers for method calls in class bodies, module bodies and at the top-level.
Some of these look a bit suspicious, so need to double check them before merging.
The change to `self` modelling finds more true positives in this query.
Ruby: warn that Ruby is still in Beta
Sync with `github/codeql:main`
Merge rc/3.3 into main
This requires changing the CFG trees for classes and modules from post-order to pre-order so that we can place the writes at the root node of the tree, to prevent them overlapping with reads in the body of the class/module. We need to do this because classes and modules don't define their own basic block, but re-use the surrounding one. This problem doesn't occur for `self` variables in methods because each method has its own basic block and we can place the write on the entry node of the bock.
Many `self` reads are synthesised from method calls with an implicit
`self` receiver. Synthesised nodes have no `toGenerated` result, which
the default definition of `isCapturedAccess` uses to determine if a
variable's scope matches the access's scope.
Hence we override the definition to properly identify accesses like the
call `puts` (below) as captured reads of a `self` variable defined in a
parent scope.
In other words, `puts x` is short for `self.puts x` and the `self`
refers to its value in the scope of the module `Foo`.
```ruby
module Foo
MY_PROC = -> (x) { puts x }
end
```
We also have to update the SSA `SelfDefinition` to exclude captured
`self` variables.
Add rb/path-injection query
Conflicts: docs/codeql/query-help/codeql-cwe-coverage.rst
|
QHelp previews: javascript/ql/src/Performance/PolynomialReDoS.qhelpPolynomial regular expression used on uncontrolled dataSome regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match. The regular expression engines provided by many popular JavaScript platforms use backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower. Typically, a regular expression is affected by this problem if it contains a repetition of the form RecommendationModify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter. ExampleConsider this use of a regular expression, which removes all leading and trailing whitespace in a string: text.replace(/^\s+|\s+$/g, ''); // BAD
The sub-expression This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind ( Note that the sub-expression ExampleAs a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation: ^0\.\d+E?\d+$ // BAD
The problem with this regular expression is in the sub-expression This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity. To make the processing faster, the regular expression should be rewritten such that the two References
javascript/ql/src/Performance/ReDoS.qhelpInefficient regular expressionSome regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match. The regular expression engines provided by many popular JavaScript platforms use backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower. Typically, a regular expression is affected by this problem if it contains a repetition of the form RecommendationModify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter. ExampleConsider this regular expression: /^_(__|.)+_$/
Its sub-expression This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition: /^_(__|[^_])+_$/
References
python/ql/src/experimental/Security/CWE-730/PolynomialReDoS.qhelpPolynomial regular expression used on uncontrolled dataSome regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match. The regular expression engine provided by Python uses a backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower. Typically, a regular expression is affected by this problem if it contains a repetition of the form RecommendationModify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter. ExampleConsider this use of a regular expression, which removes all leading and trailing whitespace in a string: re.sub(r"^\s+|\s+$", "", text) # BAD
The sub-expression This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind ( Note that the sub-expression ExampleAs a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation: ^0\.\d+E?\d+$ # BAD
The problem with this regular expression is in the sub-expression This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity. To make the processing faster, the regular expression should be rewritten such that the two References
python/ql/src/experimental/Security/CWE-730/ReDoS.qhelpInefficient regular expressionSome regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match. The regular expression engine provided by Python uses a backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower. Typically, a regular expression is affected by this problem if it contains a repetition of the form RecommendationModify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter. ExampleConsider this regular expression: ^_(__|.)+_$
Its sub-expression This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition: ^_(__|[^_])+_$
References
ruby/ql/src/queries/security/cwe-022/PathInjection.qhelpUncontrolled data used in path expressionAccessing files using paths constructed from user-controlled data can allow an attacker to access unexpected resources. This can result in sensitive information being revealed or deleted, or an attacker being able to influence behavior by modifying unexpected files. RecommendationValidate user input before using it to construct a file path, either using an off-the-shelf library like Ideally, follow these rules:
ExampleIn the first example, a file name is read from an HTTP request and then used to access a file. However, a malicious user could enter a file name which is an absolute path, such as In the second example, it appears that the user is restricted to opening a file within the class FilesController < ActionController::Base
def first_example
# BAD: This could read any file on the file system
@content = File.read params[:path]
end
def second_example
# BAD: This could still read any file on the file system
@content = File.read "/home/user/#{ params[:path] }"
end
endReferences
ruby/ql/src/queries/security/cwe-078/CommandInjection.qhelpUncontrolled command lineCode that passes user input directly to RecommendationIf possible, use hard-coded string literals to specify the command to run or library to load. Instead of passing the user input directly to the process or library function, examine the user input and then choose among hard-coded string literals. If the applicable libraries or commands cannot be determined at compile time, then add code to verify that the user input string is safe before using it. ExampleThe following example shows code that takes a shell script that can be changed maliciously by a user, and passes it straight to class UsersController < ActionController::Base
def create
command = params[:command]
system(command) # BAD
end
endReferences
ruby/ql/src/queries/security/cwe-078/KernelOpen.qhelpUse of
|
This pull request integrates the Ruby extractor and queries into the public codeql repository.