Commit 2fdd1d2

Removing https-everywhere-checker as submodule and incorporating it into the mainline repo
1 parent ad551df commit 2fdd1d2

6,427 files changed

+73360
-5
lines changed

.gitmodules

Lines changed: 0 additions & 4 deletions
@@ -1,10 +1,6 @@
1 1
[submodule "addon-sdk"]
2 2
path = addon-sdk
3 3
url = https://github.com/mozilla/addon-sdk.git
4-
[submodule "https-everywhere-checker"]
5-
path = test/rules
6-
url = https://github.com/jsha/https-everywhere-checker.git
7-
branch = updates
8 4
[submodule "translations"]
9 5
path = translations
10 6
url = https://git.torproject.org/translation.git

test/rules

Lines changed: 0 additions & 1 deletion
This file was deleted.

test/rules/.gitignore

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
1+
.*.swp
2+
checker.config
3+
*.kpf
4+
*.pyc
5+
*.pyo
6+
rules
7+
*.html
8+
9+
*.py[cod]
10+
11+
# C extensions
12+
*.so
13+
14+
# Packages
15+
*.egg
16+
*.egg-info
17+
dist
18+
build
19+
eggs
20+
parts
21+
bin
22+
var
23+
sdist
24+
develop-eggs
25+
.installed.cfg
26+
lib
27+
lib64
28+
29+
# Installer logs
30+
pip-log.txt
31+
32+
# Unit test / coverage reports
33+
.coverage
34+
.tox
35+
nosetests.xml
36+
htmlcov
37+
38+
# Translations
39+
*.mo
40+
41+
# Mr Developer
42+
.mr.developer.cfg
43+
.project
44+
.pydevproject
45+
46+
# Complexity
47+
output/*.html
48+
output/*/index.html
49+
50+
# Sphinx
51+
docs/_build

test/rules/COPYING

Lines changed: 674 additions & 0 deletions
Large diffs are not rendered by default.

test/rules/HISTORY.rst

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
.. :changelog:
2+
3+
History
4+
-------
5+
6+
0.1.0 (unreleased)
7+
++++++++++++++++++
8+
9+
* First release on PyPI.

test/rules/MANIFEST.in

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
include AUTHORS.rst
2+
include CONTRIBUTING.rst
3+
include HISTORY.rst
4+
include LICENSE
5+
include README.rst
6+
7+
recursive-include tests *
8+
recursive-exclude * __pycache__
9+
recursive-exclude * *.py[co]
10+
11+
recursive-include docs *.rst conf.py Makefile make.bat

test/rules/README.rst

Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
1+
HTTPS Everywhere Rule Checker
2+
=============================
3+
4+
Author: Ondrej Mikle, CZ.NIC (ondrej.mikle@nic.cz)
5+
6+
Installation
7+
------------
8+
9+
::
10+
11+
pip install https-everywhere-checker
12+
13+
or using the supplied setup.py
14+
15+
::
16+
17+
python setup.py install
18+
19+
Configuration
20+
-------------
21+
22+
Copy ``checker.config.sample`` to ``checker.config`` and change the
23+
``rulesdir`` under ``[rulesets]`` to point to a directory with the XML
24+
files of HTTPS Everywhere rules (usually the
25+
``src/chrome/content/rules`` of locally checked out git tree of HTTPS
26+
Everywhere).
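
A minimal sketch of reading such a config, assuming the file uses the INI syntax understood by Python 3's ``configparser``. Only the ``[rulesets]``/``rulesdir`` option and the ``fetch_in_subprocess`` option under ``http`` are taken from this README; the path value and the ``load_checker_settings`` helper are hypothetical and not part of the checker.

```python
import configparser

# Hypothetical config text; only the section and option names come from the README.
SAMPLE_CONFIG = """
[rulesets]
rulesdir = /path/to/https-everywhere/src/chrome/content/rules

[http]
fetch_in_subprocess = true
"""


def load_checker_settings(text):
    """Parse config text and return the two settings named in this README."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "rulesdir": parser.get("rulesets", "rulesdir"),
        "fetch_in_subprocess": parser.getboolean("http", "fetch_in_subprocess"),
    }


settings = load_checker_settings(SAMPLE_CONFIG)
print(settings["rulesdir"])
```

Parsing the file this way before a long scan is a cheap check that ``rulesdir`` is set and points where you expect.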
27+
28+
Running
29+
-------
30+
31+
Once you have modified the config, run:
32+
33+
::
34+
35+
check-https-rules checker.config
36+
37+
Output is written to the selected log file; the info/warning/error entries
38+
contain the useful information.
39+
40+
Features
41+
--------
42+
43+
- Attempts to follow Firefox behavior as closely as possible (including
44+
rewriting HTTP redirects according to rules, except for
45+
JavaScript and meta-redirects)
46+
- IDN domain support
47+
- Two metrics for the "distance" between two resources are currently implemented: one
48+
is purely string-based, the other tries to measure "similarity of the
49+
shape of DOM tree"
50+
- Multi-threaded scanner
51+
- Support for various "platforms" (e.g. CAcert), i.e. sets of CA
52+
certificate sets which can be switched while following redirects
53+
- The set of CA certificates used can be statically restricted to one CA
54+
certificate set (see ``static_ca_path`` in config file)
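
To illustrate what a purely string-based "distance" between two fetched pages could look like, here is a small sketch built on the standard library's ``difflib``. This README does not say how the checker computes its metric, so the ``string_distance`` function and the sample page bodies are assumptions for illustration only.

```python
import difflib


def string_distance(original, rewritten):
    """Return 0.0 for identical texts and 1.0 for texts with nothing in common."""
    ratio = difflib.SequenceMatcher(None, original, rewritten).ratio()
    return 1.0 - ratio


# Hypothetical page bodies fetched over HTTP and HTTPS.
http_body = "<html><body><h1>Welcome</h1></body></html>"
https_body = "<html><body><h1>Welcome!</h1></body></html>"

print(string_distance(http_body, https_body))
```

A small distance suggests both URLs serve essentially the same page. A large one hints that the rewritten URL leads somewhere else and the rule deserves a closer look.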
55+
56+
What errors in rulesets can be detected
57+
---------------------------------------
58+
59+
- big difference in HTML page structure
60+
- errors in a ruleset: declared targets that no rule rewrites, bad regexps
61+
(usually capture groups are wrong), incomplete FQDNs, non-existent
62+
domains
63+
- HTTP 200 in original page, while rewritten page returns 4xx/5xx
64+
- cycle detection in redirects
65+
- transvalid certificates (incomplete chains)
66+
- other invalid certificate detection (self-signed, expired, CN
67+
mismatch...)
68+
69+
False positives and shortcomings
70+
--------------------------------
71+
72+
- Some sites deliberately serve different pages over HTTP and HTTPS; some, for
73+
example, redirect to a different page under HTTPS
74+
- URLs to scan are naively guessed from target hosts; having a test set
75+
of URLs in each ruleset would improve coverage
76+
77+
Known bugs
78+
----------
79+
80+
CURL+NSS can't handle hosts with SNI sharing same IP address
81+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
82+
83+
PyCURL and NSS incorrectly handle the case when two FQDNs have identical
84+
IP address, use Server Name Indication and try to resume TLS session
85+
with the same session ID. Even turning off SSL session cache via setting
86+
``pycurl.SSL_SESSIONID_CACHE`` to zero won't help (it's ignored by
87+
libcurl/pycurl for some reason). PyCURL+NSS fails to notice that the server
88+
didn't acknowledge SNI in its response (see the RFC 4366 reference below), so
89+
the HTTP 'Host' header and the SNI seen by the server differ, resulting in HTTP
90+
404.
91+
92+
This was an especially insidious bug; many thanks to Pavel Janík
93+
for helping hunt it down.
94+
95+
Testcase
96+
^^^^^^^^
97+
98+
See ``curl_test_nss/curl_testcase_nss_sni.py`` script that demonstrates
99+
the bug.
100+
101+
Technical details
102+
^^^^^^^^^^^^^^^^^
103+
104+
PyCURL sends TLS handshake with SNI for the first host. This works.
105+
Connection is then closed, but PyCURL+NSS remembers the SSL session ID.
106+
It will attempt to use the same session ID when later connecting to
107+
second host on the same IP.
108+
109+
However, the server won't acknowledge what client requested with new
110+
SNI, because client attempts to resume during TLS handshake using the
111+
incorrect session ID. Thus the session is "resumed" to the first host's
112+
SNI.
113+
114+
Side observation: When validation is turned off in PyCURL+NSS, it also
115+
turns off session resume as a side effect (the code is in curl's nss.c).
116+
117+
Workaround
118+
^^^^^^^^^^
119+
120+
Set config to use SSLv3 instead of default TLSv1 (option ``ssl_version``
121+
under ``http`` section).
122+
123+
Normative reference
124+
^^^^^^^^^^^^^^^^^^^
125+
126+
See last four paragraphs of `RFC 4366, section
127+
3.1 <https://tools.ietf.org/html/rfc4366#section-3.1>`__. Contrast with
128+
`RFC 6066 section 3 <https://tools.ietf.org/html/rfc6066#section-3>`__,
129+
last two paragraphs. In TLS 1.2 the logic is reversed - server must not
130+
resume such connection and must go through full handshake again.
131+
132+
At most 9 capture groups in rule supported
133+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
134+
135+
This is a workaround for ambiguous rewrites in rules such as:
136+
137+
::
138+
139+
<rule from="^http://(www\.)?01\.org/" to="https://$101.org/" />
140+
141+
The ``$101`` would otherwise mean the 101st group, so we assume that only the first digit after ``$``
142+
denotes the group (which is how it seems to work in JavaScript).
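
The single-digit assumption can be sketched in Python as follows. The helper names ``js_replacement_to_python`` and ``apply_rule`` are hypothetical and not taken from the checker's source. The sketch relies on Python 3.5 or later, where an unmatched optional group expands to an empty string.

```python
import re


def js_replacement_to_python(to_pattern):
    # Escape backslashes first so they stay literal in the Python template.
    escaped = to_pattern.replace("\\", "\\\\")
    # Only ONE digit after "$" is read as the group number, so "$101"
    # becomes group 1 followed by the literal text "01".
    return re.sub(r"\$(\d)", r"\\g<\1>", escaped)


def apply_rule(rule_from, rule_to, url):
    # The URL is returned unchanged when the rule does not match.
    return re.sub(rule_from, js_replacement_to_python(rule_to), url, count=1)


rule_from = r"^http://(www\.)?01\.org/"
rule_to = "https://$101.org/"
print(apply_rule(rule_from, rule_to, "http://www.01.org/"))  # https://www.01.org/
```

Emitting ``\g<1>`` rather than ``\1`` matters here. Python would read ``\101`` as an octal escape instead of group 1 followed by ``01``.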
143+
144+
May not work under Windows
145+
~~~~~~~~~~~~~~~~~~~~~~~~~~
146+
147+
According to `PyCURL
148+
documentation <http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPTCAPATH>`__,
149+
using CAPATH may not work under Windows. I'd guess it's due to openssl's
150+
``c_rehash`` utility that creates symlinks to PEM certificates.
151+
Hypothetically it could work if the symlinks were replaced by regular
152+
files with identical names, but I haven't tried it.
153+
154+
Threading bugs and workarounds
155+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
156+
157+
There are some race conditions with Python threads and OpenSSL/GnuTLS
158+
that cause aborts due to SIGPIPE or SIGSEGV. While libcurl code seems to
159+
have implemented the necessary callbacks, there's a bug somewhere :-)
160+
161+
Workaround: set ``fetch_in_subprocess`` under ``http`` section in config
162+
to true when using multiple threads for fetching. Using subprocess is on
163+
by default.
164+
165+
You might have to set PYTHONPATH if working dir is different from code
166+
dir with python scripts.
167+
168+
If underlying SSL library is NSS, threading looks fine.
169+
170+
As a side effect, the CURL+NSS SNI bug does not happen with subprocesses
171+
(SSL session ID cache is not kept among process invocations).
172+
173+
If pure-threaded version starts eating too much memory (like 1 GB in a
174+
minute), turn on the ``fetch_in_subprocess`` option mentioned above. Some
175+
combinations of CURL and SSL library versions do that. Spawning separate
176+
subprocesses prevents any caches building up and eating too much memory.
177+
178+
Using subprocess hypothetically might cause a deadlock due to
179+
insufficient buffer size when exchanging data through stdin/stdout in
180+
case of a large HTML page, but hasn't happened for any of the rules
181+
(I've tried to run them on the complete batch of rulesets contained in
182+
HTTPS Everywhere Nov 2 2012 commit
183+
c343f230a49d960dba90424799c3bacc2325fc94). If a deadlock does
184+
happen, increase the buffer size in the ``subprocess.Popen`` invocation in
185+
``http_client.py``.
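
One general way to avoid pipe-buffer deadlocks in Python is ``Popen.communicate()``, which writes to the child's stdin and reads its stdout concurrently. The sketch below only illustrates that technique and does not describe how ``http_client.py`` works. The echo worker and the ``run_worker`` helper are hypothetical stand-ins for the real fetch subprocess.

```python
import subprocess
import sys

# Hypothetical stand-in for the real fetch worker: it echoes stdin to stdout.
ECHO_WORKER = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"


def run_worker(payload):
    """Send payload to a child process and return everything it writes back."""
    proc = subprocess.Popen(
        [sys.executable, "-c", ECHO_WORKER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    # communicate() feeds stdin and drains stdout at the same time, so neither
    # side ends up blocked on a full pipe, whatever the payload size.
    stdout_data, _ = proc.communicate(payload)
    return stdout_data


big_page = b"x" * (1024 * 1024)  # larger than a typical OS pipe buffer
print(len(run_worker(big_page)))
```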
186+
187+
Generic bugs/quirks of SSL libraries
188+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
189+
190+
Each of the three possible libraries (OpenSSL, GnuTLS, NSS) has a
191+
different set of quirks. GnuTLS seems to be the most strict one
192+
regarding relevant RFCs and will not for instance tolerate certificate
193+
chain in wrong order or forgive server not sending ``close_notify``
194+
alert.
195+
196+
Thus it's entirely possible that while a server chain and SSL/TLS
197+
handshake seem OK when using one library, they may break with another.
198+
199+
Transvalid certificates (transitive closure of root and intermediate certs)
200+
---------------------------------------------------------------------------
201+
202+
The ``platform_certs/FF_transvalid.tar.bz2`` attempts to simulate common
203+
browser behavior of caching intermediate certs. The directory contains
204+
FF's builtin certs and all intermediate certs that validate from FF's
205+
builtin certs (a transitive closure).
206+
207+
The certs above are in a tarball (need to be unpacked and c\_rehash'd
208+
for use).
209+
210+
The script is in ``certs_transitive_closure/build_closure.sh`` and is
211+
rather crude, definitely needs some double-checking of sanity (see
212+
comments inside the script).
213+
214+
Quick outline of the script's algorithm:
215+
216+
1. IntermediateSet\_0 := {trusted builtin certs from clean install of
217+
Firefox}
218+
2. Certs that have basic constraints CA=true or are X509 version 1 are
219+
exported from some DB like SSL Observatory
220+
3. Iterate over all exported certs, adding each new unique certificate not yet
221+
contained in IntermediateSet\_n that validates against the latest
222+
IntermediateSet\_n, forming IntermediateSet\_{n+1}
223+
4. n += 1
224+
5. If any certs were added in step 3, goto 3, else end
225+
226+
Last IntermediateSet is the closure.
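
The fixed-point iteration outlined above can be sketched in Python with a toy certificate model. This is not the logic of ``build_closure.sh``: the ``Cert`` tuple, the ``validates`` check (issuer name lookup only, no signature or expiry checks) and the sample certificates are all assumptions made for illustration.

```python
from collections import namedtuple

# Toy certificate: real validation would check signatures, validity dates, etc.
Cert = namedtuple("Cert", ["subject", "issuer"])


def validates(cert, trusted):
    """Toy check: a cert validates if its issuer is the subject of a trusted cert."""
    return any(cert.issuer == t.subject for t in trusted)


def build_closure(builtin_certs, exported_certs):
    """Repeat step 3 until a pass over the exported certs adds nothing new."""
    closure = set(builtin_certs)  # IntermediateSet_0
    while True:
        added = {c for c in exported_certs
                 if c not in closure and validates(c, closure)}
        if not added:  # step 5: nothing was added, so this is the closure
            return closure
        closure |= added  # IntermediateSet_{n+1}


roots = [Cert("Root CA", "Root CA")]
exported = [
    Cert("Intermediate B", "Intermediate A"),
    Cert("Intermediate A", "Root CA"),
    Cert("Orphan CA", "Unknown Root"),
]
closure = build_closure(roots, exported)
print(sorted(c.subject for c in closure))
# ['Intermediate A', 'Intermediate B', 'Root CA']
```

"Intermediate B" is only picked up on the second pass, after "Intermediate A" has joined the set. That is why the loop runs until a pass adds nothing.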
