Skip to content

Merge codeql-ruby into codeql#6942

Merged
aibaars merged 1525 commits into
mainfrom
aibaars/patch-10
Oct 25, 2021
Merged

Merge codeql-ruby into codeql#6942
aibaars merged 1525 commits into
mainfrom
aibaars/patch-10

Conversation

@aibaars
Copy link
Copy Markdown
Contributor

@aibaars aibaars commented Oct 25, 2021

This pull request integrates the Ruby extractor and queries into the public codeql repository.

nickrolfe and others added 30 commits September 21, 2021 11:37
Fix filenames in source archives
Co-authored-by: Nick Rolfe <nickrolfe@github.com>
Start modelling some file system access concepts
This seems a more convenient place to keep all the HTTP client
modelling.
Model outgoing HTTP requests as remote flow sources
Alert suppression and file classifier query
Make `{Unary,Binary}Operation` a sub class of `MethodCall`
hmac and others added 25 commits October 20, 2021 09:43
`self` variables are scoped to methods, modules, classes and the
top-level of the program. Prior to this change, they were treated as
being scoped just to methods.

This change means we (once again) correctly synthesise `self` receivers
for method calls in class bodies, module bodies and at the top-level.
Some of these look a bit suspicious, so need to double check them before
merging.
The change to `self` modelling finds more true positives in this query.
Ruby: warn that Ruby is still in Beta
This requires changing the CFG trees for classes and modules from
post-order to pre-order so that we can place the writes at the root node
of the tree, to prevent them overlapping with reads in the body of the
class/module.

We need to do this because classes and modules don't define their own
basic block, but re-use the surrounding one. This problem doesn't occur
for `self` variables in methods because each method has its own basic
block and we can place the write on the entry node of the bock.
Many `self` reads are synthesised from method calls with an implicit
`self` receiver. Synthesised nodes have no `toGenerated` result, which
the default definition of `isCapturedAccess` uses to determine if a
variable's scope matches the access's scope.

Hence we override the definition to properly identify accesses like the
call `puts` (below) as captured reads of a `self` variable defined in a
parent scope.

In other words, `puts x` is short for `self.puts x` and the `self`
refers to its value in the scope of the module `Foo`.

```ruby
module Foo
  MY_PROC = -> (x) { puts x }
end
```

We also have to update the SSA `SelfDefinition` to exclude captured
`self` variables.
Conflicts:
	docs/codeql/query-help/codeql-cwe-coverage.rst
@github-actions
Copy link
Copy Markdown
Contributor

QHelp previews:

javascript/ql/src/Performance/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engines provided by many popular JavaScript platforms use backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

			text.replace(/^\s+|\s+$/g, ''); // BAD
		

The sub-expression "\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind (/^\s+|(?<!\s)\s+$/g), or just by using the built-in trim method (text.trim()).

Note that the sub-expression "^\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

			^0\.\d+E?\d+$ // BAD
		

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: ^0\.\d+(E\d+)?$.

References

javascript/ql/src/Performance/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engines provided by many popular JavaScript platforms use backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this regular expression:

			/^_(__|.)+_$/
		

Its sub-expression "(__|.)+?" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

			/^_(__|[^_])+_$/
		

References

python/ql/src/experimental/Security/CWE-730/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine provided by Python uses a backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

			re.sub(r"^\s+|\s+$", "", text) # BAD
		

The sub-expression "\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind (^\s+|(?<!\s)\s+$), or just by using the built-in strip method (text.strip()).

Note that the sub-expression "^\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

			^0\.\d+E?\d+$ # BAD
		

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: ^0\.\d+(E\d+)?$.

References

python/ql/src/experimental/Security/CWE-730/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine provided by Python uses a backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this regular expression:

			^_(__|.)+_$
		

Its sub-expression "(__|.)+?" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

			^_(__|[^_])+_$
		

References

ruby/ql/src/queries/security/cwe-022/PathInjection.qhelp

Uncontrolled data used in path expression

Accessing files using paths constructed from user-controlled data can allow an attacker to access unexpected resources. This can result in sensitive information being revealed or deleted, or an attacker being able to influence behavior by modifying unexpected files.

Recommendation

Validate user input before using it to construct a file path, either using an off-the-shelf library like ActiveStorage::Filename#sanitized in Rails, or by performing custom validation.

Ideally, follow these rules:

  • Do not allow more than a single "." character.
  • Do not allow directory separators such as "/" or "\" (depending on the file system).
  • Do not rely on simply replacing problematic sequences such as "../". For example, after applying this filter to ".../...//", the resulting string would still be "../".
  • Use a whitelist of known good patterns.

Example

In the first example, a file name is read from an HTTP request and then used to access a file. However, a malicious user could enter a file name which is an absolute path, such as "/etc/passwd".

In the second example, it appears that the user is restricted to opening a file within the "user" home directory. However, a malicious user could enter a file name containing special characters. For example, the string "../../etc/passwd" will result in the code reading the file located at "/home/user/../../etc/passwd", which is the system's password file. This file would then be sent back to the user, giving them access to all the system's passwords.

class FilesController < ActionController::Base
  def first_example
    # BAD: This could read any file on the file system
    @content = File.read params[:path]
  end

  def second_example
    # BAD: This could still read any file on the file system
    @content = File.read "/home/user/#{ params[:path] }"
  end
end

References

ruby/ql/src/queries/security/cwe-078/CommandInjection.qhelp

Uncontrolled command line

Code that passes user input directly to Kernel.system, Kernel.exec, or some other library routine that executes a command, allows the user to execute malicious code.

Recommendation

If possible, use hard-coded string literals to specify the command to run or library to load. Instead of passing the user input directly to the process or library function, examine the user input and then choose among hard-coded string literals.

If the applicable libraries or commands cannot be determined at compile time, then add code to verify that the user input string is safe before using it.

Example

The following example shows code that takes a shell script that can be changed maliciously by a user, and passes it straight to Kernel.system without examining it first.

class UsersController < ActionController::Base
  def create
    command = params[:command]
    system(command) # BAD
  end
end

References

ruby/ql/src/queries/security/cwe-078/KernelOpen.qhelp

Use of Kernel.open or IO.read

If Kernel.open is given a file name that starts with a | character, it will execute the remaining string as a shell command. If a malicious user can control the file name, they can execute arbitrary code. The same vulnerability applies to IO.read.

Recommendation

Use File.open instead of Kernel.open, as the former does not have this vulnerability. Similarly, use File.read instead of IO.read.

Example

The following example shows code that calls Kernel.open on a user-supplied file path.

class UsersController < ActionController::Base
  def create
    filename = params[:filename]
    open(filename) # BAD
  end
end  

Instead, File.open should be used, as in the following example.

class UsersController < ActionController::Base
    def create
      filename = params[:filename]
      File.open(filename)
    end
  end  

References

ruby/ql/src/queries/security/cwe-079/ReflectedXSS.qhelp

Reflected server-side cross-site scripting

Directly writing user input (for example, an HTTP request parameter) to a webpage, without properly sanitizing the input first, allows for a cross-site scripting vulnerability.

Recommendation

To guard against cross-site scripting, escape user input before writing it to the page. Some frameworks, such as Rails, perform this escaping implicitly and by default.

Take care when using methods such as html_safe or raw. They can be used to emit a string without escaping it, and should only be used when the string has already been manually escaped (for example, with the Rails html_escape method), or when the content is otherwise guaranteed to be safe (such as a hard-coded string).

Example

The following example is safe because the params[:user_name] content within the output tags will be HTML-escaped automatically before being emitted.

<p>Hello <%= params[:user_name] %>!</p>

However, the following example is unsafe because user-controlled input is emitted without escaping, since it is marked as html_safe.

<p>Hello <%= params[:user_name].html_safe %>!</p>

References

ruby/ql/src/queries/security/cwe-079/StoredXSS.qhelp

Stored cross-site scripting

Directly writing an uncontrolled stored value (for example, a database field) to a webpage, without properly sanitizing the value first, allows for a cross-site scripting vulnerability.

This kind of vulnerability is also called stored cross-site scripting, to distinguish it from other types of cross-site scripting.

Recommendation

To guard against stored cross-site scripting, consider escaping before using uncontrolled stored values to create HTML content. Some frameworks, such as Rails, perform this escaping implicitly and by default.

Take care when using methods such as html_safe or raw. They can be used to emit a string without escaping it, and should only be used when the string has already been manually escaped (for example, with the Rails html_escape method), or when the content is otherwise guaranteed to be safe (such as a hard-coded string).

Example

The following example is safe because the user.name content within the output tags will be HTML-escaped automatically before being emitted.

<% user = User.find(1) %>
<p>Hello <%= user.name %>!</p>

However, the following example may be unsafe because user.name is emitted without escaping, since it is marked as html_safe. If the name is not sanitized before being written to the database, then an attacker could use this to insert arbitrary content into the HTML output, including scripts.

<% user = User.find(1) %>
<p>Hello <%= user.name.html_safe %>!</p>

In the next example, content from a file on disk is inserted literally into HTML content. This approach is sometimes used to load script content, such as extensions for a web application, from files on disk. Care should taken in these cases to ensure both that the loaded files are trusted, and that the file cannot be modified by untrusted users.

<script>
  <%= File.read(File.join(SCRIPT_DIR, "script.js")).html_safe %>
</script>

References

ruby/ql/src/queries/security/cwe-089/SqlInjection.qhelp

SQL query built from user-controlled sources

If a database query (such as a SQL or NoSQL query) is built from user-provided data without sufficient sanitization, a malicious user may be able to run malicious database queries.

Recommendation

Most database connector libraries offer a way of safely embedding untrusted data into a query by means of query parameters or prepared statements.

Example

In the following Rails example, an ActionController class has a text_bio method to handle requests to fetch a biography for a specified user.

The user is specified using a parameter, user_name provided by the client. This value is accessible using the params method.

The method illustrates three different ways to construct and execute an SQL query to find the user by name.

In the first case, the parameter user_name is inserted into an SQL fragment using string interpolation. The parameter is user-supplied and is not sanitized. An attacker could use this to construct SQL queries that were not intended to be executed here.

The second case uses string concatenation and is vulnerable in the same way that the first case is.

In the third case, the name is passed in a hash instead. ActiveRecord will construct a parameterized SQL query that is not vulnerable to SQL injection attacks.

class UserController < ActionController::Base
  def text_bio
    # BAD -- Using string interpolation
    user = User.find_by "name = '#{params[:user_name]}'"

    # BAD -- Using string concatenation
    find_str = "name = '" + params[:user_name] + "'"
    user = User.find_by(find_str)

    # GOOD -- Using a hash to parameterize arguments
    user = User.find_by name: params[:user_name]

    render plain: user&.text_bio
  end
end

References

ruby/ql/src/queries/security/cwe-094/CodeInjection.qhelp

Code injection

Directly evaluating user input (for example, an HTTP request parameter) as code without first sanitizing the input allows an attacker arbitrary code execution. This can occur when user input is passed to code that interprets it as an expression to be evaluated, using methods such as Kernel.eval or Kernel.send.

Recommendation

Avoid including user input in any expression that may be dynamically evaluated. If user input must be included, use context-specific escaping before including it. It is important that the correct escaping is used for the type of evaluation that will occur.

Example

The following example shows two functions setting a name from a request. The first function uses eval to execute the set_name method. This is dangerous as it can allow a malicious user to execute arbitrary code on the server. For example, the user could supply the value "' + exec('rm -rf') + '" to destroy the server's file system. The second function calls the set_name method directly and is thus safe.

class UsersController < ActionController::Base
  # BAD - Allow user to define code to be run.
  def create_bad
    first_name = params[:first_name]
    eval("set_name(#{first_name})")
  end

  # GOOD - Call code directly
  def create_good
    first_name = params[:first_name]
    set_name(first_name)
  end

  def set_name(name)
    @name = name
  end
end

References

ruby/ql/src/queries/security/cwe-1333/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine used by the Ruby interpreter (MRI) uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

			text.gsub!(/^\s+|\s+$/, '') # BAD
		

The sub-expression "\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind (/^\s+|(?<!\s)\s+$/), or just by using the built-in strip method (text.strip!).

Note that the sub-expression "^\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

			/^0\.\d+E?\d+$/ # BAD
		

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: /^0\.\d+(E\d+)?$/.

References

ruby/ql/src/queries/security/cwe-1333/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine used by the Ruby interpreter (MRI) uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this regular expression:

      /^_(__|.)+_$/
    

Its sub-expression "(__|.)+?" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

      /^_(__|[^_])+_$/
    

References

ruby/ql/src/queries/security/cwe-295/RequestWithoutValidation.qhelp

Request without certificate validation

Certificate validation is the standard authentication method of a secure TLS connection. Without it, there is no guarantee about who the other party of a TLS connection is, making man-in-the-middle attacks more likely to occur.

When testing software that uses TLS connections, it may be useful to disable the certificate validation temporarily. But disabling it in production environments is strongly discouraged, unless an alternative method of authentication is used.

Recommendation

Do not disable certificate validation for TLS connections.

Example

The following example shows an HTTPS connection that makes a GET request to a remote server. But the connection is not secure since the verify_mode option of the connection is set to OpenSSL::SSL::VERIFY_NONE. As a consequence, anyone can impersonate the remote server.

require "net/https"
require "uri"

uri = URI.parse "https://example.com/"
http = Net::HTTP.new uri.host, uri.port
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
request = Net::HTTP::Get.new uri.request_uri
puts http.request(request).body

To make the connection secure, the verify_mode option should have its default value, or be explicitly set to OpenSSL::SSL::VERIFY_PEER.

References

ruby/ql/src/queries/security/cwe-502/UnsafeDeserialization.qhelp

Deserialization of user-controlled data

Deserializing untrusted data using any method that allows the construction of arbitrary objects is easily exploitable and, in many cases, allows an attacker to execute arbitrary code.

Recommendation

Avoid deserialization of untrusted data if possible. If the architecture permits it, use serialization formats that cannot represent arbitarary objects. For libraries that support it, such as the Ruby standard library's JSON module, ensure that the parser is configured to disable deserialization of arbitrary objects.

Example

The following example calls the Marshal.load, JSON.load, YAML.load, and Oj.load methods on data from an HTTP request. Since these methods are capable of deserializing to arbitrary objects, this is inherently unsafe.

require 'json'
require 'yaml'
require 'oj'

class UserController < ActionController::Base
  def marshal_example
    data = Base64.decode64 params[:data]
    object = Marshal.load data
    # ...
  end

  def json_example
    object = JSON.load params[:json]
    # ...
  end

  def yaml_example
    object = YAML.load params[:yaml]
    # ...
  end

  def oj_example
    object = Oj.load params[:json]
    # ...
  end
end

Using JSON.parse and YAML.safe_load instead, as in the following example, removes the vulnerability. Similarly, calling Oj.load with any mode other than :object is safe, as is calling Oj.safe_load. Note that there is no safe way to deserialize untrusted data using Marshal.

require 'json'

class UserController < ActionController::Base
  def safe_json_example
    object = JSON.parse params[:json]
    # ...
  end

  def safe_yaml_example
    object = YAML.safe_load params[:yaml]
    # ...
  end

  def safe_oj_example
    object = Oj.load params[:yaml], { mode: :strict }
    # or
    object = Oj.safe_load params[:yaml]
    # ...
  end
end

References

ruby/ql/src/queries/security/cwe-601/UrlRedirect.qhelp

URL redirection from remote source

Directly incorporating user input into a URL redirect request without validating the input can facilitate phishing attacks. In these attacks, unsuspecting users can be redirected to a malicious site that looks very similar to the real site they intend to visit, but which is controlled by the attacker.

Recommendation

To guard against untrusted URL redirection, it is advisable to avoid putting user input directly into a redirect URL. Instead, maintain a list of authorized redirects on the server; then choose from that list based on the user input provided.

Example

The following example shows an HTTP request parameter being used directly in a URL redirect without validating the input, which facilitates phishing attacks:

class HelloController < ActionController::Base
  def hello
    redirect_to params[:url]
  end
end

One way to remedy the problem is to validate the user input against a known fixed string before doing the redirection:

class HelloController < ActionController::Base
  VALID_REDIRECT = "http://cwe.mitre.org/data/definitions/601.html"

  def hello
    if params[:url] == VALID_REDIRECT
      redirect_to params[:url]
    else
      # error
    end
  end
end

References

ruby/ql/src/queries/security/cwe-611/Xxe.qhelp

XML external entity expansion

Parsing untrusted XML files with a weakly configured XML parser may lead to an XML External Entity (XXE) attack. This type of attack uses external entity references to access arbitrary files on a system, carry out denial-of-service (DoS) attacks, or server-side request forgery. Even when the result of parsing is not returned to the user, DoS attacks are still possible and out-of-band data retrieval techniques may allow attackers to steal sensitive data.

Recommendation

The easiest way to prevent XXE attacks is to disable external entity handling when parsing untrusted data. How this is done depends on the library being used. Note that some libraries, such as rexml, nokogiri and libxml-ruby, disable entity expansion by default, so unless you have explicitly enabled entity expansion, no further action needs to be taken.

Example

The following example uses the nokogiri XML parser to parse a string xmlSrc. If that string is from an untrusted source, this code may be vulnerable to an XXE attack, since the parser is invoked with the noent option set:

require "nokogiri"

def process_data1
   xmlSrc = request.body
   doc = Nokogiri::XML.parse(xmlSrc, nil, nil, Nokogiri::XML::ParseOptions::NOENT) # BAD
end

def process_data2
   xmlSrc = request.body
   doc = Nokogiri::XML.parse(xmlSrc) { |config| config.noent } # BAD
end

To guard against XXE attacks, the noent option should be omitted or cleared (e.g. using nonoent). This means that no entity expansion is undertaken at all, not even for standard internal entities such as &amp; or &gt;. If desired, these entities can be expanded in a separate step using utility functions.

require "nokogiri"

def process_data1
   xmlSrc = request.body
   doc = Nokogiri::XML.parse(xmlSrc) # GOOD
end

def process_data2
   xmlSrc = request.body
   doc = Nokogiri::XML.parse(xmlSrc) { |config| config.nonoent } # GOOD
end

References

ruby/ql/src/queries/security/cwe-732/WeakFilePermissions.qhelp

Overly permissive file permissions

When creating a file, POSIX systems allow permissions to be specified for owner, group and others separately. Permissions should be kept as strict as possible, preventing access to the files contents by other users.

Recommendation

Restrict the file permissions of files to prevent any but the owner being able to read or write to that file

References

ruby/ql/src/queries/security/cwe-798/HardcodedCredentials.qhelp

Hard-coded credentials

Including unencrypted hard-coded inbound or outbound authentication credentials within source code or configuration files is dangerous because the credentials may be easily discovered.

Source or configuration files containing hard-coded credentials may be visible to an attacker. For example, the source code may be open source, or it may be leaked or accidentally revealed.

For inbound authentication, hard-coded credentials may allow unauthorized access to the system. This is particularly problematic if the credential is hard-coded in the source code, because it cannot be disabled easily. For outbound authentication, the hard-coded credentials may provide an attacker with privileged information or unauthorized access to some other system.

Recommendation

Remove hard-coded credentials, such as user names, passwords and certificates, from source code, placing them in configuration files or other data stores if necessary. If possible, store configuration files including credential data separately from the source code, in a secure location with restricted access.

For outbound authentication details, consider encrypting the credentials or the enclosing data stores or configuration files, and using permissions to restrict access.

For inbound authentication details, consider hashing passwords using standard library functions where possible. For example, OpenSSL::KDF.pbkdf2_hmac.

Example

The following examples shows different types of inbound and outbound authentication.

In the first case, RackAppBad, we accept a password from a remote user, and compare it against a plaintext string literal. If an attacker acquires the source code they can observe the password, and can log in to the system. Furthermore, if such an intrusion was discovered, the application would need to be rewritten and redeployed in order to change the password.

In the second case, RackAppGood, the password is compared to a hashed and salted password stored in a configuration file, using OpenSSL::KDF.pbkdf2_hmac. In this case, access to the source code or the assembly would not reveal the password to an attacker. Even access to the configuration file containing the password hash and salt would be of little value to an attacker, as it is usually extremely difficult to reverse engineer the password from the hash and salt. In a real application care should be taken to make the string comparison of the hashed input against the hashed password take close to constant time, as this will make timing attacks more difficult.

require 'rack'
require 'yaml'
require 'openssl'

class RackAppBad
  def call(env)
    req = Rack::Request.new(env)
    password = req.params['password']

    # BAD: Inbound authentication made by comparison to string literal
    if password == 'myPa55word'
      [200, {'Content-type' => 'text/plain'}, ['OK']]
    else
      [403, {'Content-type' => 'text/plain'}, ['Permission denied']]
    end
  end
end

class RackAppGood
  def call(env)
    req = Rack::Request.new(env)
    password = req.params['password']

    config_file = YAML.load_file('config.yml')
    hashed_password = config_file['hashed_password']
    salt = [config_file['salt']].pack('H*')

    #GOOD: Inbound authentication made by comparing to a hash password from a config file.
    hash = OpenSSL::Digest::SHA256.new
    dk = OpenSSL::KDF.pbkdf2_hmac(
      password, salt: salt, hash: hash, iterations: 100_000, length: hash.digest_length
    )
    hashed_input = dk.unpack('H*').first
    if hashed_password == hashed_input
      [200, {'Content-type' => 'text/plain'}, ['OK']]
    else
      [403, {'Content-type' => 'text/plain'}, ['Permission denied']]
    end
  end
end

References

@aibaars aibaars merged commit afc7867 into main Oct 25, 2021
@aibaars aibaars deleted the aibaars/patch-10 branch October 25, 2021 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.