| 1 | +HTTPS Everywhere Rule Checker |
| 2 | +============================= |
| 3 | + |
| 4 | +Author: Ondrej Mikle, CZ.NIC (ondrej.mikle@nic.cz) |
| 5 | + |
| 6 | +Installation |
| 7 | +------------ |
| 8 | + |
| 9 | +:: |
| 10 | + |
| 11 | + pip install https-everywhere-checker |
| 12 | + |
| 13 | +or using the supplied setup.py |
| 14 | + |
| 15 | +:: |
| 16 | + |
| 17 | + python setup.py install |
| 18 | + |
| 19 | +Configuration |
| 20 | +------------- |
| 21 | + |
| 22 | +Copy ``checker.config.sample`` to ``checker.config`` and change the |
| 23 | +``rulesdir`` under ``[rulesets]`` to point to a directory with the XML |
| 24 | +files of HTTPS Everywhere rules (usually |
| 25 | +``src/chrome/content/rules`` in a locally checked-out git tree of HTTPS |
| 26 | +Everywhere). |
| 27 | + |
| 28 | +Running |
| 29 | +------- |
| 30 | + |
| 31 | +Once you have modified the config, run: |
| 32 | + |
| 33 | +:: |
| 34 | + |
| 35 | + check-https-rules checker.config |
| 36 | + |
| 37 | +Output is written to the log file selected in the config; the info, |
| 38 | +warning and error messages contain the useful information. |
| 39 | + |
| 40 | +Features |
| 41 | +-------- |
| 42 | + |
| 43 | +- Attempts to follow Firefox behavior as closely as possible (including |
| 44 | + rewriting HTTP redirects according to rules, but excluding |
| 45 | + JavaScript and meta redirects) |
| 46 | +- IDN domain support |
| 47 | +- Two metrics for the "distance" between two resources are currently |
| 48 | + implemented: one is purely string-based, the other tries to measure |
| 49 | + the "similarity of the shape of the DOM tree" |
| 50 | +- Multi-threaded scanner |
| 51 | +- Support for various "platforms" (e.g. CAcert), i.e. sets of CA |
| 52 | + certificates that can be switched while following redirects |
| 53 | +- The set of CA certificates used can be statically restricted to one CA |
| 54 | + certificate set (see ``static_ca_path`` in config file) |
| 55 | + |
| 56 | +What errors in rulesets can be detected |
| 57 | +--------------------------------------- |
| 58 | + |
| 59 | +- big difference in HTML page structure |
| 60 | +- errors in the ruleset itself: a declared target that no rule rewrites, |
| 61 | + bad regexps (usually wrong capture groups), incomplete FQDNs, |
| 62 | + non-existent domains |
| 63 | +- HTTP 200 in original page, while rewritten page returns 4xx/5xx |
| 64 | +- cycles in redirects |
| 65 | +- transvalid certificates (incomplete chains) |
| 66 | +- other invalid certificates (self-signed, expired, CN |
| 67 | + mismatch...) |
| 68 | + |
| 69 | +False positives and shortcomings |
| 70 | +-------------------------------- |
| 71 | + |
| 72 | +- Some sites deliberately serve different pages over HTTP and HTTPS; some, |
| 73 | + for example, redirect to a different page under HTTPS |
| 74 | +- URLs to scan are naively guessed from target hosts; having a test set |
| 75 | + of URLs in each ruleset would improve coverage |
| 76 | + |
| 77 | +Known bugs |
| 78 | +---------- |
| 79 | + |
| 80 | +CURL+NSS can't handle hosts with SNI sharing same IP address |
| 81 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 82 | + |
| 83 | +PyCURL and NSS mishandle the case where two FQDNs share the same IP |
| 84 | +address, both use Server Name Indication, and the client tries to resume |
| 85 | +the TLS session with the same session ID. Even turning off the SSL session |
| 86 | +cache by setting ``pycurl.SSL_SESSIONID_CACHE`` to zero won't help (it's |
| 87 | +ignored by libcurl/pycurl for some reason). PyCURL+NSS fail to see that the |
| 88 | +server didn't acknowledge SNI in its response (see the RFC 4366 reference |
| 89 | +below), so the 'Host' header in HTTP and the SNI seen by the server differ, |
| 90 | +which results in HTTP 404. |
| 91 | + |
| 92 | +This was an especially insidious bug; many thanks to Pavel Janík |
| 93 | +for helping hunt it down. |
| 94 | + |
| 95 | +Testcase |
| 96 | +^^^^^^^^ |
| 97 | + |
| 98 | +See the ``curl_test_nss/curl_testcase_nss_sni.py`` script, which demonstrates |
| 99 | +the bug. |
| 100 | + |
| 101 | +Technical details |
| 102 | +^^^^^^^^^^^^^^^^^ |
| 103 | + |
| 104 | +PyCURL sends a TLS handshake with SNI for the first host. This works. |
| 105 | +The connection is then closed, but PyCURL+NSS remembers the SSL session ID. |
| 106 | +It will attempt to use the same session ID when later connecting to |
| 107 | +a second host on the same IP. |
| 108 | + |
| 109 | +However, the server won't acknowledge the new SNI that the client requested, |
| 110 | +because the client attempts to resume during the TLS handshake using the |
| 111 | +incorrect session ID. Thus the session is "resumed" to the first host's |
| 112 | +SNI. |
| 113 | + |
| 114 | +Side observation: turning off validation in PyCURL+NSS also turns off |
| 115 | +session resumption as a side effect (the code is in curl's nss.c). |
| 116 | + |
| 117 | +Workaround |
| 118 | +^^^^^^^^^^ |
| 119 | + |
| 120 | +Configure the checker to use SSLv3 instead of the default TLSv1 (option |
| 121 | +``ssl_version`` under the ``http`` section). |
| 122 | + |
| 123 | +Normative reference |
| 124 | +^^^^^^^^^^^^^^^^^^^ |
| 125 | + |
| 126 | +See the last four paragraphs of `RFC 4366, section |
| 127 | +3.1 <https://tools.ietf.org/html/rfc4366#section-3.1>`__. Contrast with |
| 128 | +the last two paragraphs of `RFC 6066, section 3 <https://tools.ietf.org/html/rfc6066#section-3>`__. |
| 129 | +In TLS 1.2 the logic is reversed: the server must not resume such a |
| 130 | +connection and must go through a full handshake again. |
| 131 | + |
| 132 | +At most 9 capture groups in rule supported |
| 133 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 134 | + |
| 135 | +This is a workaround for ambiguous rewrites in rules such as: |
| 136 | + |
| 137 | +:: |
| 138 | + |
| 139 | + <rule from="^http://(www\.)?01\.org/" to="https://$101.org/" /> |
| 140 | + |
| 141 | +The ``$101`` could be read as the 101st group, so we assume that only the first digit after ``$`` |
| 142 | +denotes the group (which is how it seems to work in JavaScript). |
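
The first-digit convention described above can be illustrated with a minimal Python sketch. This is not the checker's actual code: the helper names ``js_to_python_replacement`` and ``apply_rule`` are hypothetical and exist only for this example. It assumes Python 3.5 or newer, where a group that did not take part in the match is substituted with an empty string.

```python
import re


def js_to_python_replacement(to_pattern):
    """Translate a JavaScript-style replacement ("$1") into Python's "\\g<1>".

    Hypothetical helper for illustration: only the single digit right
    after "$" is taken as the group number, so "$101" means group 1
    followed by the literal text "01".
    """
    # Escape backslashes first so re.sub() treats them literally.
    escaped = to_pattern.replace("\\", "\\\\")
    # "\g<1>" is unambiguous in Python even when digits follow it.
    return re.sub(r"\$([1-9])", r"\\g<\1>", escaped)


def apply_rule(from_pattern, to_pattern, url):
    """Apply one rule's from/to pair to a URL (first match only)."""
    return re.sub(from_pattern, js_to_python_replacement(to_pattern), url, count=1)


rule_from = r"^http://(www\.)?01\.org/"
rule_to = "https://$101.org/"

with_www = apply_rule(rule_from, rule_to, "http://www.01.org/")
without_www = apply_rule(rule_from, rule_to, "http://01.org/about")
```

Limiting the group reference to one digit is what caps a rule at 9 capture groups.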
| 143 | + |
| 144 | +May not work under Windows |
| 145 | +~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 146 | + |
| 147 | +According to the `libcurl |
| 148 | +documentation <http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPTCAPATH>`__, |
| 149 | +using CAPATH may not work under Windows. I'd guess it's due to OpenSSL's |
| 150 | +``c_rehash`` utility, which creates symlinks to PEM certificates. |
| 151 | +Hypothetically it could work if the symlinks were replaced by regular |
| 152 | +files with identical names, but I haven't tried that. |
| 153 | + |
| 154 | +Threading bugs and workarounds |
| 155 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 156 | + |
| 157 | +There are some race conditions with Python threads and OpenSSL/GnuTLS |
| 158 | +that cause aborts due to SIGPIPE or SIGSEGV. While the libcurl code seems to |
| 159 | +have implemented the necessary callbacks, there's a bug somewhere :-) |
| 160 | + |
| 161 | +Workaround: set ``fetch_in_subprocess`` under the ``http`` section in the config |
| 162 | +to true when using multiple threads for fetching. Fetching in a subprocess is on |
| 163 | +by default. |
| 164 | + |
| 165 | +You might have to set ``PYTHONPATH`` if the working directory differs from the |
| 166 | +directory containing the Python scripts. |
| 167 | + |
| 168 | +If the underlying SSL library is NSS, threading looks fine. |
| 169 | + |
| 170 | +As a side effect, the CURL+NSS SNI bug does not happen with subprocesses |
| 171 | +(the SSL session ID cache is not kept across process invocations). |
| 172 | + |
| 173 | +If the pure-threaded version starts eating too much memory (like 1 GB in a |
| 174 | +minute), turn on the ``fetch_in_subprocess`` option mentioned above. Some |
| 175 | +combinations of CURL and SSL library versions do that. Spawning separate |
| 176 | +subprocesses prevents any caches from building up and eating too much memory. |
| 177 | + |
| 178 | +Using a subprocess could hypothetically cause a deadlock due to |
| 179 | +insufficient buffer size when exchanging data through stdin/stdout in |
| 180 | +the case of a large HTML page, but this hasn't happened for any of the rules |
| 181 | +(I've tried running it on the complete batch of rulesets contained in the |
| 182 | +HTTPS Everywhere Nov 2 2012 commit |
| 183 | +c343f230a49d960dba90424799c3bacc2325fc94). If a deadlock does |
| 184 | +happen, increase the buffer size in the ``subprocess.Popen`` invocation in |
| 185 | +``http_client.py``. |
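
For context on the pipe deadlock mentioned above, here is a hedged sketch of a different way to avoid it: exchanging data with the child via ``Popen.communicate()``, which feeds stdin and drains stdout concurrently. This is not how ``http_client.py`` is written. The function ``fetch_in_child`` and the stand-in child program ``CHILD`` are assumptions made up for this example, and the ``timeout`` argument requires Python 3.3 or newer.

```python
import subprocess
import sys


def fetch_in_child(child_code, request, timeout=60):
    """Run a child Python process, send it `request`, return its stdout.

    Hypothetical helper: communicate() writes stdin and reads stdout at
    the same time, so a large response cannot fill the pipe and block.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    try:
        out, _ = proc.communicate(request, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Kill the stuck child and reap it before re-raising.
        proc.kill()
        proc.communicate()
        raise
    if proc.returncode != 0:
        raise RuntimeError("child exited with status %d" % proc.returncode)
    return out


# Stand-in child: reads the request, then writes a 1 MiB "page",
# far more than a typical OS pipe buffer holds.
CHILD = (
    "import sys; sys.stdin.buffer.read(); "
    "sys.stdout.buffer.write(b'x' * (1 << 20))"
)

page = fetch_in_child(CHILD, b"http://example.com/")
```

With this pattern the parent never blocks on a full pipe, whatever the page size.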
| 186 | + |
| 187 | +Generic bugs/quirks of SSL libraries |
| 188 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 189 | + |
| 190 | +Each of the three possible libraries (OpenSSL, GnuTLS, NSS) has |
| 191 | +a different set of quirks. GnuTLS seems to be the strictest one |
| 192 | +regarding relevant RFCs and will not, for instance, tolerate a certificate |
| 193 | +chain in the wrong order or forgive a server not sending a ``close_notify`` |
| 194 | +alert. |
| 195 | + |
| 196 | +Thus it's entirely possible that while a server's chain and SSL/TLS |
| 197 | +handshake seem OK when using one library, they may break with another. |
| 198 | + |
| 199 | +Transvalid certificates (transitive closure of root and intermediate certs) |
| 200 | +--------------------------------------------------------------------------- |
| 201 | + |
| 202 | +The ``platform_certs/FF_transvalid.tar.bz2`` archive attempts to simulate the |
| 203 | +common browser behavior of caching intermediate certs. It contains |
| 204 | +Firefox's builtin certs and all intermediate certs that validate from |
| 205 | +those builtin certs (a transitive closure). |
| 206 | + |
| 207 | +The certs are shipped as a tarball (they need to be unpacked and c\_rehash'd |
| 208 | +for use). |
| 209 | + |
| 210 | +The script that builds the closure is ``certs_transitive_closure/build_closure.sh``. It is |
| 211 | +rather crude and definitely needs some sanity double-checking (see the |
| 212 | +comments inside the script). |
| 213 | + |
| 214 | +Quick outline of the script's algorithm: |
| 215 | + |
| 216 | +1. IntermediateSet\_0 := {trusted builtin certs from clean install of |
| 217 | + Firefox} |
| 218 | +2. Certs that have basic constraints CA=true or are X509 version 1 are |
| 219 | + exported from some DB like SSL Observatory |
| 220 | +3. Iterate over all exported certs and add every new unique certificate not yet |
| 221 | + contained in IntermediateSet\_n that validates against the latest |
| 222 | + IntermediateSet\_n, forming IntermediateSet\_{n+1} |
| 223 | +4. n += 1 |
| 224 | +5. If any certs were added in step 3, goto 3, else end |
| 225 | + |
| 226 | +Last IntermediateSet is the closure. |
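
The fixed-point iteration outlined in the steps above can be sketched in Python. This is a toy model, not a translation of ``build_closure.sh``: the ``Cert`` tuple, ``issued_by_trusted`` and ``transitive_closure`` are hypothetical names, and "validation" here only compares issuer and subject names. A real run would verify signatures against the current set.

```python
from collections import namedtuple

# Toy certificate: just a subject name and an issuer name (an assumption
# for this sketch; real certificates carry keys and signatures).
Cert = namedtuple("Cert", "subject issuer")


def issued_by_trusted(cert, trusted):
    """Toy stand-in for validation: is the issuer's name in the trusted set?"""
    return any(cert.issuer == t.subject for t in trusted)


def transitive_closure(builtin_certs, exported_certs, validates):
    """Grow the trusted set until a full pass adds no new certificate."""
    closure = set(builtin_certs)  # IntermediateSet_0
    while True:
        # Step 3: candidates are checked against the set as it stood at
        # the start of this pass (IntermediateSet_n).
        added = {
            cert
            for cert in exported_certs
            if cert not in closure and validates(cert, closure)
        }
        if not added:  # Step 5: nothing new, so this is the closure.
            return closure
        closure |= added  # IntermediateSet_{n+1}


root = Cert("Root CA", "Root CA")
inter1 = Cert("Intermediate 1", "Root CA")
inter2 = Cert("Intermediate 2", "Intermediate 1")
stray = Cert("Stray CA", "Unknown Root")

closure = transitive_closure({root}, [inter2, inter1, stray], issued_by_trusted)
```

In this toy run ``inter2`` is only picked up on the second pass, after ``inter1`` has joined the set, which is why the loop repeats until nothing is added.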