Merge codeql-ruby into codeql by aibaars · Pull Request #6942 · github/codeql

aibaars · 2021-10-25T10:06:57Z

This pull request integrates the Ruby extractor and queries into the public codeql repository.

Fix filenames in source archives

Co-authored-by: Nick Rolfe <nickrolfe@github.com>

Start modelling some file system access concepts

This seems a more convenient place to keep all the HTTP client modelling.

Model outgoing HTTP requests as remote flow sources

Add support for flow summaries

Add query for Code Injection

Alert suppression and file classifier query

Desugar array literals to `::Array.[]`

Make `{Unary,Binary}Operation` a sub class of `MethodCall`

Model some more HTTP clients

`self` variables are scoped to methods, modules, classes and the top-level of the program. Prior to this change, they were treated as being scoped just to methods. This change means we (once again) correctly synthesise `self` receivers for method calls in class bodies, module bodies and at the top-level.

Some of these look a bit suspicious, so need to double check them before merging.

The change to `self` modelling finds more true positives in this query.

Ruby: warn that Ruby is still in Beta

Sync with `github/codeql:main`

…-path

Merge rc/3.3 into main

This requires changing the CFG trees for classes and modules from post-order to pre-order so that we can place the writes at the root node of the tree, to prevent them overlapping with reads in the body of the class/module. We need to do this because classes and modules don't define their own basic block, but re-use the surrounding one. This problem doesn't occur for `self` variables in methods because each method has its own basic block and we can place the write on the entry node of the bock.

Many `self` reads are synthesised from method calls with an implicit `self` receiver. Synthesised nodes have no `toGenerated` result, which the default definition of `isCapturedAccess` uses to determine if a variable's scope matches the access's scope. Hence we override the definition to properly identify accesses like the call `puts` (below) as captured reads of a `self` variable defined in a parent scope. In other words, `puts x` is short for `self.puts x` and the `self` refers to its value in the scope of the module `Foo`. ```ruby module Foo MY_PROC = -> (x) { puts x } end ``` We also have to update the SSA `SelfDefinition` to exclude captured `self` variables.

Add rb/path-injection query

Conflicts: docs/codeql/query-help/codeql-cwe-coverage.rst

github-actions · 2021-10-25T10:07:54Z

QHelp previews:

javascript/ql/src/Performance/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2ⁿ. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engines provided by many popular JavaScript platforms use backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

			text.replace(/^\s+|\s+$/g, ''); // BAD

The sub-expression "\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind (/^\s+|(?<!\s)\s+$/g), or just by using the built-in trim method (text.trim()).

Note that the sub-expression "^\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

			^0\.\d+E?\d+$ // BAD

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: ^0\.\d+(E\d+)?$.

References

OWASP: Regular expression Denial of Service - ReDoS.
Wikipedia: ReDoS.
Wikipedia: Time complexity.
James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-of-Service Attack.
Common Weakness Enumeration: CWE-1333.
Common Weakness Enumeration: CWE-730.
Common Weakness Enumeration: CWE-400.

javascript/ql/src/Performance/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2ⁿ. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engines provided by many popular JavaScript platforms use backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this regular expression:

			/^_(__|.)+_$/

Its sub-expression "(__|.)+?" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

			/^_(__|[^_])+_$/

References

OWASP: Regular expression Denial of Service - ReDoS.
Wikipedia: ReDoS.
Wikipedia: Time complexity.
James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-of-Service Attack.
Common Weakness Enumeration: CWE-1333.
Common Weakness Enumeration: CWE-730.
Common Weakness Enumeration: CWE-400.

python/ql/src/experimental/Security/CWE-730/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2ⁿ. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine provided by Python uses a backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

			re.sub(r"^\s+|\s+$", "", text) # BAD

The sub-expression "\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind (^\s+|(?<!\s)\s+$), or just by using the built-in strip method (text.strip()).

Note that the sub-expression "^\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

			^0\.\d+E?\d+$ # BAD

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: ^0\.\d+(E\d+)?$.

References

OWASP: Regular expression Denial of Service - ReDoS.
Wikipedia: ReDoS.
Wikipedia: Time complexity.
James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-of-Service Attack.
Common Weakness Enumeration: CWE-730.
Common Weakness Enumeration: CWE-400.

python/ql/src/experimental/Security/CWE-730/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2ⁿ. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine provided by Python uses a backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this regular expression:

			^_(__|.)+_$

Its sub-expression "(__|.)+?" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

			^_(__|[^_])+_$

References

OWASP: Regular expression Denial of Service - ReDoS.
Wikipedia: ReDoS.
Wikipedia: Time complexity.
James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-of-Service Attack.
Common Weakness Enumeration: CWE-730.
Common Weakness Enumeration: CWE-400.

ruby/ql/src/queries/security/cwe-022/PathInjection.qhelp

Uncontrolled data used in path expression

Accessing files using paths constructed from user-controlled data can allow an attacker to access unexpected resources. This can result in sensitive information being revealed or deleted, or an attacker being able to influence behavior by modifying unexpected files.

Recommendation

Validate user input before using it to construct a file path, either using an off-the-shelf library like ActiveStorage::Filename#sanitized in Rails, or by performing custom validation.

Ideally, follow these rules:

Do not allow more than a single "." character.
Do not allow directory separators such as "/" or "\" (depending on the file system).
Do not rely on simply replacing problematic sequences such as "../". For example, after applying this filter to ".../...//", the resulting string would still be "../".
Use a whitelist of known good patterns.

Example

In the first example, a file name is read from an HTTP request and then used to access a file. However, a malicious user could enter a file name which is an absolute path, such as "/etc/passwd".

In the second example, it appears that the user is restricted to opening a file within the "user" home directory. However, a malicious user could enter a file name containing special characters. For example, the string "../../etc/passwd" will result in the code reading the file located at "/home/user/../../etc/passwd", which is the system's password file. This file would then be sent back to the user, giving them access to all the system's passwords.

class FilesController < ActionController::Base
  def first_example
    # BAD: This could read any file on the file system
    @content = File.read params[:path]
  end

  def second_example
    # BAD: This could still read any file on the file system
    @content = File.read "/home/user/#{ params[:path] }"
  end
end

References

OWASP: Path Traversal.
Rails: ActiveStorage::Filename#sanitized.
Common Weakness Enumeration: CWE-22.
Common Weakness Enumeration: CWE-23.
Common Weakness Enumeration: CWE-36.
Common Weakness Enumeration: CWE-73.
Common Weakness Enumeration: CWE-99.

ruby/ql/src/queries/security/cwe-078/CommandInjection.qhelp

Uncontrolled command line

Code that passes user input directly to Kernel.system, Kernel.exec, or some other library routine that executes a command, allows the user to execute malicious code.

Recommendation

If possible, use hard-coded string literals to specify the command to run or library to load. Instead of passing the user input directly to the process or library function, examine the user input and then choose among hard-coded string literals.

If the applicable libraries or commands cannot be determined at compile time, then add code to verify that the user input string is safe before using it.

Example

The following example shows code that takes a shell script that can be changed maliciously by a user, and passes it straight to Kernel.system without examining it first.

class UsersController < ActionController::Base
  def create
    command = params[:command]
    system(command) # BAD
  end
end

References

OWASP: Command Injection.
Common Weakness Enumeration: CWE-78.
Common Weakness Enumeration: CWE-88.

ruby/ql/src/queries/security/cwe-078/KernelOpen.qhelp

Use of `Kernel.open` or `IO.read`

If Kernel.open is given a file name that starts with a | character, it will execute the remaining string as a shell command. If a malicious user can control the file name, they can execute arbitrary code. The same vulnerability applies to IO.read.

Recommendation

Use File.open instead of Kernel.open, as the former does not have this vulnerability. Similarly, use File.read instead of IO.read.

Example

The following example shows code that calls Kernel.open on a user-supplied file path.

class UsersController < ActionController::Base
  def create
    filename = params[:filename]
    open(filename) # BAD
  end
end

Instead, File.open should be used, as in the following example.

class UsersController < ActionController::Base
    def create
      filename = params[:filename]
      File.open(filename)
    end
  end

References

OWASP: Command Injection.
Example CVE: Command Injection in RDoc.
Common Weakness Enumeration: CWE-78.
Common Weakness Enumeration: CWE-88.
Common Weakness Enumeration: CWE-73.

ruby/ql/src/queries/security/cwe-079/ReflectedXSS.qhelp

Reflected server-side cross-site scripting

Directly writing user input (for example, an HTTP request parameter) to a webpage, without properly sanitizing the input first, allows for a cross-site scripting vulnerability.

Recommendation

To guard against cross-site scripting, escape user input before writing it to the page. Some frameworks, such as Rails, perform this escaping implicitly and by default.

Take care when using methods such as html_safe or raw. They can be used to emit a string without escaping it, and should only be used when the string has already been manually escaped (for example, with the Rails html_escape method), or when the content is otherwise guaranteed to be safe (such as a hard-coded string).

Example

The following example is safe because the params[:user_name] content within the output tags will be HTML-escaped automatically before being emitted.

<p>Hello <%= params[:user_name] %>!</p>

However, the following example is unsafe because user-controlled input is emitted without escaping, since it is marked as html_safe.

<p>Hello <%= params[:user_name].html_safe %>!</p>

References

OWASP: XSS Ruby on Rails Cheatsheet.
Wikipedia: Cross-site scripting.
Common Weakness Enumeration: CWE-79.
Common Weakness Enumeration: CWE-116.

ruby/ql/src/queries/security/cwe-079/StoredXSS.qhelp

Stored cross-site scripting

Directly writing an uncontrolled stored value (for example, a database field) to a webpage, without properly sanitizing the value first, allows for a cross-site scripting vulnerability.

This kind of vulnerability is also called stored cross-site scripting, to distinguish it from other types of cross-site scripting.

Recommendation

To guard against stored cross-site scripting, consider escaping before using uncontrolled stored values to create HTML content. Some frameworks, such as Rails, perform this escaping implicitly and by default.

Take care when using methods such as html_safe or raw. They can be used to emit a string without escaping it, and should only be used when the string has already been manually escaped (for example, with the Rails html_escape method), or when the content is otherwise guaranteed to be safe (such as a hard-coded string).

Example

The following example is safe because the user.name content within the output tags will be HTML-escaped automatically before being emitted.

<% user = User.find(1) %>
<p>Hello <%= user.name %>!</p>

However, the following example may be unsafe because user.name is emitted without escaping, since it is marked as html_safe. If the name is not sanitized before being written to the database, then an attacker could use this to insert arbitrary content into the HTML output, including scripts.

<% user = User.find(1) %>
<p>Hello <%= user.name.html_safe %>!</p>

In the next example, content from a file on disk is inserted literally into HTML content. This approach is sometimes used to load script content, such as extensions for a web application, from files on disk. Care should taken in these cases to ensure both that the loaded files are trusted, and that the file cannot be modified by untrusted users.

<script>
  <%= File.read(File.join(SCRIPT_DIR, "script.js")).html_safe %>
</script>

References

OWASP: XSS Ruby on Rails Cheatsheet.
Wikipedia: Cross-site scripting.
Common Weakness Enumeration: CWE-79.
Common Weakness Enumeration: CWE-116.

ruby/ql/src/queries/security/cwe-089/SqlInjection.qhelp

SQL query built from user-controlled sources

If a database query (such as a SQL or NoSQL query) is built from user-provided data without sufficient sanitization, a malicious user may be able to run malicious database queries.

Recommendation

Most database connector libraries offer a way of safely embedding untrusted data into a query by means of query parameters or prepared statements.

Example

In the following Rails example, an ActionController class has a text_bio method to handle requests to fetch a biography for a specified user.

The user is specified using a parameter, user_name provided by the client. This value is accessible using the params method.

The method illustrates three different ways to construct and execute an SQL query to find the user by name.

In the first case, the parameter user_name is inserted into an SQL fragment using string interpolation. The parameter is user-supplied and is not sanitized. An attacker could use this to construct SQL queries that were not intended to be executed here.

The second case uses string concatenation and is vulnerable in the same way that the first case is.

In the third case, the name is passed in a hash instead. ActiveRecord will construct a parameterized SQL query that is not vulnerable to SQL injection attacks.

class UserController < ActionController::Base
  def text_bio
    # BAD -- Using string interpolation
    user = User.find_by "name = '#{params[:user_name]}'"

    # BAD -- Using string concatenation
    find_str = "name = '" + params[:user_name] + "'"
    user = User.find_by(find_str)

    # GOOD -- Using a hash to parameterize arguments
    user = User.find_by name: params[:user_name]

    render plain: user&.text_bio
  end
end

References

Wikipedia: SQL injection.
OWASP: SQL Injection Prevention Cheat Sheet.
Common Weakness Enumeration: CWE-89.

ruby/ql/src/queries/security/cwe-094/CodeInjection.qhelp

Code injection

Directly evaluating user input (for example, an HTTP request parameter) as code without first sanitizing the input allows an attacker arbitrary code execution. This can occur when user input is passed to code that interprets it as an expression to be evaluated, using methods such as Kernel.eval or Kernel.send.

Recommendation

Avoid including user input in any expression that may be dynamically evaluated. If user input must be included, use context-specific escaping before including it. It is important that the correct escaping is used for the type of evaluation that will occur.

Example

The following example shows two functions setting a name from a request. The first function uses eval to execute the set_name method. This is dangerous as it can allow a malicious user to execute arbitrary code on the server. For example, the user could supply the value "' + exec('rm -rf') + '" to destroy the server's file system. The second function calls the set_name method directly and is thus safe.

class UsersController < ActionController::Base
  # BAD - Allow user to define code to be run.
  def create_bad
    first_name = params[:first_name]
    eval("set_name(#{first_name})")
  end

  # GOOD - Call code directly
  def create_good
    first_name = params[:first_name]
    set_name(first_name)
  end

  def set_name(name)
    @name = name
  end
end

References

OWASP: Code Injection.
Wikipedia: Code Injection.
Common Weakness Enumeration: CWE-94.
Common Weakness Enumeration: CWE-95.
Common Weakness Enumeration: CWE-116.

ruby/ql/src/queries/security/cwe-1333/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2ⁿ. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine used by the Ruby interpreter (MRI) uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

			text.gsub!(/^\s+|\s+$/, '') # BAD

The sub-expression "\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind (/^\s+|(?<!\s)\s+$/), or just by using the built-in strip method (text.strip!).

Note that the sub-expression "^\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

			/^0\.\d+E?\d+$/ # BAD

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: /^0\.\d+(E\d+)?$/.

References

OWASP: Regular expression Denial of Service - ReDoS.
Wikipedia: ReDoS.
Wikipedia: Time complexity.
James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-of-Service Attack.
Common Weakness Enumeration: CWE-1333.
Common Weakness Enumeration: CWE-730.
Common Weakness Enumeration: CWE-400.

ruby/ql/src/queries/security/cwe-1333/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2ⁿ. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine used by the Ruby interpreter (MRI) uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this regular expression:

      /^_(__|.)+_$/

Its sub-expression "(__|.)+?" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

      /^_(__|[^_])+_$/

References

OWASP: Regular expression Denial of Service - ReDoS.
Wikipedia: ReDoS.
Wikipedia: Time complexity.
James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-of-Service Attack.
Common Weakness Enumeration: CWE-1333.
Common Weakness Enumeration: CWE-730.
Common Weakness Enumeration: CWE-400.

ruby/ql/src/queries/security/cwe-295/RequestWithoutValidation.qhelp

Request without certificate validation

Certificate validation is the standard authentication method of a secure TLS connection. Without it, there is no guarantee about who the other party of a TLS connection is, making man-in-the-middle attacks more likely to occur.

When testing software that uses TLS connections, it may be useful to disable the certificate validation temporarily. But disabling it in production environments is strongly discouraged, unless an alternative method of authentication is used.

Recommendation

Do not disable certificate validation for TLS connections.

Example

The following example shows an HTTPS connection that makes a GET request to a remote server. But the connection is not secure since the verify_mode option of the connection is set to OpenSSL::SSL::VERIFY_NONE. As a consequence, anyone can impersonate the remote server.

require "net/https"
require "uri"

uri = URI.parse "https://example.com/"
http = Net::HTTP.new uri.host, uri.port
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
request = Net::HTTP::Get.new uri.request_uri
puts http.request(request).body

To make the connection secure, the verify_mode option should have its default value, or be explicitly set to OpenSSL::SSL::VERIFY_PEER.

References

Wikipedia: Transport Layer Security (TLS)
Wikipedia: Man-in-the-middle attack
Ruby-doc: Net::HTTP
Common Weakness Enumeration: CWE-295.

ruby/ql/src/queries/security/cwe-502/UnsafeDeserialization.qhelp

Deserialization of user-controlled data

Deserializing untrusted data using any method that allows the construction of arbitrary objects is easily exploitable and, in many cases, allows an attacker to execute arbitrary code.

Recommendation

Avoid deserialization of untrusted data if possible. If the architecture permits it, use serialization formats that cannot represent arbitarary objects. For libraries that support it, such as the Ruby standard library's JSON module, ensure that the parser is configured to disable deserialization of arbitrary objects.

Example

The following example calls the Marshal.load, JSON.load, YAML.load, and Oj.load methods on data from an HTTP request. Since these methods are capable of deserializing to arbitrary objects, this is inherently unsafe.

require 'json'
require 'yaml'
require 'oj'

class UserController < ActionController::Base
  def marshal_example
    data = Base64.decode64 params[:data]
    object = Marshal.load data
    # ...
  end

  def json_example
    object = JSON.load params[:json]
    # ...
  end

  def yaml_example
    object = YAML.load params[:yaml]
    # ...
  end

  def oj_example
    object = Oj.load params[:json]
    # ...
  end
end

Using JSON.parse and YAML.safe_load instead, as in the following example, removes the vulnerability. Similarly, calling Oj.load with any mode other than :object is safe, as is calling Oj.safe_load. Note that there is no safe way to deserialize untrusted data using Marshal.

require 'json'

class UserController < ActionController::Base
  def safe_json_example
    object = JSON.parse params[:json]
    # ...
  end

  def safe_yaml_example
    object = YAML.safe_load params[:yaml]
    # ...
  end

  def safe_oj_example
    object = Oj.load params[:yaml], { mode: :strict }
    # or
    object = Oj.safe_load params[:yaml]
    # ...
  end
end

References

OWASP vulnerability description: deserialization of untrusted data.
Ruby documentation: guidance on deserializing objects safely.
Ruby documentation: security guidance on the Marshal library.
Ruby documentation: security guidance on JSON.load.
Ruby documentation: security guidance on the YAML library.
Common Weakness Enumeration: CWE-502.

ruby/ql/src/queries/security/cwe-601/UrlRedirect.qhelp

URL redirection from remote source

Directly incorporating user input into a URL redirect request without validating the input can facilitate phishing attacks. In these attacks, unsuspecting users can be redirected to a malicious site that looks very similar to the real site they intend to visit, but which is controlled by the attacker.

Recommendation

To guard against untrusted URL redirection, it is advisable to avoid putting user input directly into a redirect URL. Instead, maintain a list of authorized redirects on the server; then choose from that list based on the user input provided.

Example

The following example shows an HTTP request parameter being used directly in a URL redirect without validating the input, which facilitates phishing attacks:

class HelloController < ActionController::Base
  def hello
    redirect_to params[:url]
  end
end

One way to remedy the problem is to validate the user input against a known fixed string before doing the redirection:

class HelloController < ActionController::Base
  VALID_REDIRECT = "http://cwe.mitre.org/data/definitions/601.html"

  def hello
    if params[:url] == VALID_REDIRECT
      redirect_to params[:url]
    else
      # error
    end
  end
end

References

OWASP: XSS Unvalidated Redirects and Forwards Cheat Sheet.
Rails Guides: Redirection and Files.
Common Weakness Enumeration: CWE-601.

ruby/ql/src/queries/security/cwe-611/Xxe.qhelp

XML external entity expansion

Parsing untrusted XML files with a weakly configured XML parser may lead to an XML External Entity (XXE) attack. This type of attack uses external entity references to access arbitrary files on a system, carry out denial-of-service (DoS) attacks, or server-side request forgery. Even when the result of parsing is not returned to the user, DoS attacks are still possible and out-of-band data retrieval techniques may allow attackers to steal sensitive data.

Recommendation

The easiest way to prevent XXE attacks is to disable external entity handling when parsing untrusted data. How this is done depends on the library being used. Note that some libraries, such as rexml, nokogiri and libxml-ruby, disable entity expansion by default, so unless you have explicitly enabled entity expansion, no further action needs to be taken.

Example

The following example uses the nokogiri XML parser to parse a string xmlSrc. If that string is from an untrusted source, this code may be vulnerable to an XXE attack, since the parser is invoked with the noent option set:

require "nokogiri"

def process_data1
   xmlSrc = request.body
   doc = Nokogiri::XML.parse(xmlSrc, nil, nil, Nokogiri::XML::ParseOptions::NOENT) # BAD
end

def process_data2
   xmlSrc = request.body
   doc = Nokogiri::XML.parse(xmlSrc) { |config| config.noent } # BAD
end

To guard against XXE attacks, the noent option should be omitted or cleared (e.g. using nonoent). This means that no entity expansion is undertaken at all, not even for standard internal entities such as & or >. If desired, these entities can be expanded in a separate step using utility functions.

require "nokogiri"

def process_data1
   xmlSrc = request.body
   doc = Nokogiri::XML.parse(xmlSrc) # GOOD
end

def process_data2
   xmlSrc = request.body
   doc = Nokogiri::XML.parse(xmlSrc) { |config| config.nonoent } # GOOD
end

References

OWASP: XML External Entity (XXE) Processing.
Timothy Morgen: XML Schema, DTD, and Entity Attacks.
Timur Yunusov, Alexey Osipov: XML Out-Of-Band Data Retrieval.
Common Weakness Enumeration: CWE-611.
Common Weakness Enumeration: CWE-776.
Common Weakness Enumeration: CWE-827.

ruby/ql/src/queries/security/cwe-732/WeakFilePermissions.qhelp

Overly permissive file permissions

When creating a file, POSIX systems allow permissions to be specified for owner, group and others separately. Permissions should be kept as strict as possible, preventing access to the files contents by other users.

Recommendation

Restrict the file permissions of files to prevent any but the owner being able to read or write to that file

References

Wikipedia: File system permissions.
Common Weakness Enumeration: CWE-732.

ruby/ql/src/queries/security/cwe-798/HardcodedCredentials.qhelp

Hard-coded credentials

Including unencrypted hard-coded inbound or outbound authentication credentials within source code or configuration files is dangerous because the credentials may be easily discovered.

Source or configuration files containing hard-coded credentials may be visible to an attacker. For example, the source code may be open source, or it may be leaked or accidentally revealed.

For inbound authentication, hard-coded credentials may allow unauthorized access to the system. This is particularly problematic if the credential is hard-coded in the source code, because it cannot be disabled easily. For outbound authentication, the hard-coded credentials may provide an attacker with privileged information or unauthorized access to some other system.

Recommendation

Remove hard-coded credentials, such as user names, passwords and certificates, from source code, placing them in configuration files or other data stores if necessary. If possible, store configuration files including credential data separately from the source code, in a secure location with restricted access.

For outbound authentication details, consider encrypting the credentials or the enclosing data stores or configuration files, and using permissions to restrict access.

For inbound authentication details, consider hashing passwords using standard library functions where possible. For example, OpenSSL::KDF.pbkdf2_hmac.

Example

The following examples shows different types of inbound and outbound authentication.

In the first case, RackAppBad, we accept a password from a remote user, and compare it against a plaintext string literal. If an attacker acquires the source code they can observe the password, and can log in to the system. Furthermore, if such an intrusion was discovered, the application would need to be rewritten and redeployed in order to change the password.

In the second case, RackAppGood, the password is compared to a hashed and salted password stored in a configuration file, using OpenSSL::KDF.pbkdf2_hmac. In this case, access to the source code or the assembly would not reveal the password to an attacker. Even access to the configuration file containing the password hash and salt would be of little value to an attacker, as it is usually extremely difficult to reverse engineer the password from the hash and salt. In a real application care should be taken to make the string comparison of the hashed input against the hashed password take close to constant time, as this will make timing attacks more difficult.

require 'rack'
require 'yaml'
require 'openssl'

class RackAppBad
  def call(env)
    req = Rack::Request.new(env)
    password = req.params['password']

    # BAD: Inbound authentication made by comparison to string literal
    if password == 'myPa55word'
      [200, {'Content-type' => 'text/plain'}, ['OK']]
    else
      [403, {'Content-type' => 'text/plain'}, ['Permission denied']]
    end
  end
end

class RackAppGood
  def call(env)
    req = Rack::Request.new(env)
    password = req.params['password']

    config_file = YAML.load_file('config.yml')
    hashed_password = config_file['hashed_password']
    salt = [config_file['salt']].pack('H*')

    #GOOD: Inbound authentication made by comparing to a hash password from a config file.
    hash = OpenSSL::Digest::SHA256.new
    dk = OpenSSL::KDF.pbkdf2_hmac(
      password, salt: salt, hash: hash, iterations: 100_000, length: hash.digest_length
    )
    hashed_input = dk.unpack('H*').first
    if hashed_password == hashed_input
      [200, {'Content-type' => 'text/plain'}, ['OK']]
    else
      [403, {'Content-type' => 'text/plain'}, ['Permission denied']]
    end
  end
end

References

OWASP: XSS Use of hard-coded password.
Common Weakness Enumeration: CWE-259.
Common Weakness Enumeration: CWE-321.
Common Weakness Enumeration: CWE-798.

nickrolfe and others added 30 commits September 21, 2021 11:37

Merge pull request #301 from github/fix_source_archive

dd31473

Fix filenames in source archives

start modelling some file access concepts

25300cb

revamp weak file permissions query

d1f2258

use instanceof extensions for some filesystem concepts

6315621

Files.qll library implementation

05a04f4

Files library tests

70c2be8

Desugar array literals to ::Array.[]

888183f

Replace string kind with boolean preservesValue

a37737d

Move two predicates in FlowSummaryImplSpecific.qll

e670fdb

Apply suggestions from code review

0092c02

Co-authored-by: Nick Rolfe <nickrolfe@github.com>

test for IO.open as a way of creating an IO instance

b769aa6

Merge pull request #283 from github/file-system-sources

21e31a4

Start modelling some file system access concepts

Add HTTP::Client concept

2bdea01

Add Net::HTTP request modelling

3000587

Simplify Net::HTTP modelling

b658bac

Move Net::HTTP modelling into http_clients module

5826f2c

This seems a more convenient place to keep all the HTTP client modelling.

Merge pull request #306 from github/hmac-outgoing-http

83705c5

Model outgoing HTTP requests as remote flow sources

Address review comments

68d41f9

Address review comments

41608ef

Merge pull request #277 from github/hvitved/flow-summaries

f347505

Add support for flow summaries

Merge pull request #293 from github/hmac-code-injection

4f9518a

Add query for Code Injection

Merge pull request #297 from github/aibaars/alert-suppression

40f0112

Alert suppression and file classifier query

Merge pull request #305 from github/hvitved/desugar/array-literals

ca2ff9a

Desugar array literals to `::Array.[]`

Model the Excon HTTP client

ee51298

Model the Faraday HTTP client

4cf520c

Model the RestClient HTTP client

88885a2

Make {Unary,Binary}Operation a sub class of MethodCall

edfdfb1

Include MethodCall.getAChild in {Unary,Binary}Operation.getAChild

30d2df5

Merge pull request #308 from github/hvitved/operation-method-call

141f5f7

Make `{Unary,Binary}Operation` a sub class of `MethodCall`

Merge pull request #307 from github/hmac-outgoing-http-2

74982cb

Model some more HTTP clients

hmac and others added 25 commits October 20, 2021 09:43

Update test fixtures

c437fd5

Some of these look a bit suspicious, so need to double check them before merging.

Remove redundant import

e0b2d88

Make SelfVariableAccess a sub type of LocalVariableAccess

94f0f8d

Make all self nodes LocalSourceNodes

f1f7930

Add rb/path-injection query

86da3c2

Import Frameworks.qll in Summaries module

e367832

Add change-note for rb/path-injection

16c62d2

Update stored XSS fixture

356828c

The change to `self` modelling finds more true positives in this query.

Merge remote-tracking branch 'codeql/main' into 'main'

0bec898

Ruby: sync-files and add allowParameterReturnInSelf()

630ee17

Ruby: warn that Ruby is still in Beta

74ba532

Merge pull request #373 from github/aibaars/warn-ruby-beta

62d25b6

Ruby: warn that Ruby is still in Beta

Merge pull request #374 from github/aibaars/sync-codeql

f9f0fab

Sync with `github/codeql:main`

Merge remote-tracking branch 'origin/main' into external-control-file…

5734f51

…-path

Merge pull request #375 from github/rc/3.3

4f72d08

Merge rc/3.3 into main

Update test fixtures

aa86070

Add taint summaries for ActiveStorage::Filename

d4cee73

Minor refactor

87df3a0

Data flow: Assign empty locations to summary nodes

61d7cde

Merge pull request #358 from github/external-control-file-path

3851a27

Add rb/path-injection query

Merge pull request #335 from github/hmac/self-flow

f020b2e

Merge branch 'main' of github.com:github/codeql into 'main'

4f79398

Conflicts: docs/codeql/query-help/codeql-cwe-coverage.rst

github-actions Bot added the documentation label Oct 25, 2021

nickrolfe approved these changes Oct 25, 2021

View reviewed changes

aibaars merged commit afc7867 into main Oct 25, 2021

aibaars deleted the aibaars/patch-10 branch October 25, 2021 10:33

Conversation

aibaars commented Oct 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Oct 25, 2021

Polynomial regular expression used on uncontrolled data

Recommendation

Example

Example

References

Inefficient regular expression

Recommendation

Example

References

Polynomial regular expression used on uncontrolled data

Recommendation

Example

Example

References

Inefficient regular expression

Recommendation

Example

References

Uncontrolled data used in path expression

Recommendation

Example

References

Uncontrolled command line

Recommendation

Example

References

Use of Kernel.open or IO.read

Recommendation

Example

References

Reflected server-side cross-site scripting

Recommendation

Example

References

Stored cross-site scripting

Recommendation

Example

References

SQL query built from user-controlled sources

Recommendation

Example

References

Code injection

Recommendation

Example

References

Polynomial regular expression used on uncontrolled data

Recommendation

Example

Example

References

Inefficient regular expression

Recommendation

Example

References

Request without certificate validation

Recommendation

Example

References

Deserialization of user-controlled data

Recommendation

Example

References

URL redirection from remote source

Recommendation

Example

References

XML external entity expansion

Recommendation

Example

References

Overly permissive file permissions

Recommendation

References

Hard-coded credentials

aibaars commented Oct 25, 2021 •

edited

Loading

Use of `Kernel.open` or `IO.read`