This repository was archived by the owner on Oct 24, 2020. It is now read-only.

Commit 172505b

Rebrand and update rulesets.md
1 parent 25c344b commit 172505b

1 file changed

Lines changed: 134 additions & 42 deletions

docs/en_US/rulesets.md

@@ -1,107 +1,199 @@
1-
## HTTPS Everywhere Rulesets
1+
## HTTPS Always Rulesets
22

3-
This page describes how to write rulesets for [HTTPS Everywhere](https://eff.org/https-everywhere), a browser extension that switches sites over from HTTP to HTTPS automatically. HTTPS Everywhere comes with [thousands](http://www.eff.org/https-everywhere/atlas/) of rulesets that tell HTTPS Everywhere which sites it should switch to HTTPS and how. If there is a site that offers HTTPS and is not handled by the extension, this guide will explain how to add that site.
3+
This page describes how to write rulesets for HTTPS Always, a browser extension that
4+
switches sites over from HTTP to HTTPS automatically. HTTPS Always comes with thousands
5+
of rulesets that tell HTTPS Always which sites it should switch to HTTPS and how.
6+
If there is a site that offers HTTPS and is not handled by the extension, this guide
7+
will explain how to add that site.
48

59
#### [Rulesets](#rulesets)
610

7-
A `ruleset` is an [XML](http://www.xml.com/pub/a/98/10/guide0.html?page=2) file describing behavior for a site or group of sites. A ruleset contains one or more `rules`. For example, here is [`RabbitMQ.xml`](https://github.com/efforg/https-everywhere/blob/master/src/chrome/content/rules/RabbitMQ.xml), from the addon distribution:
11+
A `ruleset` is an [XML](https://www.xml.com/pub/a/98/10/guide0.html?page=2) file
12+
describing behavior for a site or group of sites. A ruleset contains one or
13+
more `rules`. For example, here is
14+
[`RabbitMQ.xml`](https://github.com/g4jc/https-always/blob/master/src/chrome/content/rules/RabbitMQ.xml),
15+
from the addon distribution:
816

917
```xml
1018
<ruleset name="RabbitMQ">
11-
<target host="rabbitmq.com" />
12-
<target host="www.rabbitmq.com" />
19+
<target host="rabbitmq.com" />
20+
<target host="www.rabbitmq.com" />
1321

14-
<rule from="^http:"
15-
to="https:" />
22+
<rule from="^http:"
23+
to="https:" />
1624
</ruleset>
1725
```
1826

19-
The `target` tag specifies which web sites the ruleset applies to. The `rule` tag specifies how URLs on those web sites should be rewritten. This rule says that any URLs on `rabbitmq.com` and `www.rabbitmq.com` should be modified by replacing "http:" with "https:".
27+
The `target` tag specifies which web sites the ruleset applies to. The `rule`
28+
tag specifies how URLs on those web sites should be rewritten. This rule says
29+
that any URLs on `rabbitmq.com` and `www.rabbitmq.com` should be modified by
30+
replacing "http:" with "https:".
2031

21-
When the browser loads a URL, HTTPS Everywhere takes the host name (e.g. <tt>www.rabbitmq.com</tt>) and searches its ruleset database for rulesets that match that host name.
32+
When the browser loads a URL, HTTPS Always takes the host name (e.g.
33+
<tt>www.rabbitmq.com</tt>) and searches its ruleset database for rulesets that
34+
match that host name.
2235

23-
HTTPS Everywhere then tries each rule in those rulesets against the full URL. If the [Regular Expression](http://www.regular-expressions.info/quickstart.html), or regexp, in one of those rules matches, HTTPS Everywhere [rewrites the URL](#rules-and-regular-expressions) according the `to` attribute of the rule.
36+
HTTPS Always then tries each rule in those rulesets against the full URL.
37+
If the [Regular
38+
Expression](https://www.regular-expressions.info/quickstart.html), or regexp, in
39+
one of those rules matches, HTTPS Always [rewrites the
40+
URL](#rules-and-regular-expressions) according to the `to` attribute of the rule.
2441

2542
#### [Wildcard Targets](#wildcard-targets)
2643

27-
To cover all of a domain's subdomains, you may want to specify a wildcard target like `*.twitter.com`. Specifying this type of left-side wildcard matches any host name with `.twitter.com` as a suffix, e.g. `www.twitter.com` or `urls.api.twitter.com`. You can also specify a right-side wildcard like `www.google.*`. Right-side wildcards, unlike left-side wildcards, apply only one level deep. So if you want to cover all countries you'll generally need to specify `www.google.*`, `www.google.co.*`, and `www.google.com.*` to cover domains like `www.google.co.uk` or `www.google.com.au`. You should use wildcard targets only when you have rules that apply to the entire wildcard space. If your rules only apply to specific hosts, you should list each host as a separate target.
44+
To cover all of a domain's subdomains, you may want to specify a wildcard
45+
target like `*.twitter.com`. Specifying this type of left-side wildcard matches
46+
any host name with `.twitter.com` as a suffix, e.g. `www.twitter.com` or
47+
`urls.api.twitter.com`. You can also specify a right-side wildcard like
48+
`www.google.*`. Right-side wildcards, unlike left-side wildcards, apply only
49+
one level deep. So if you want to cover all countries you'll generally need to
50+
specify `www.google.*`, `www.google.co.*`, and `www.google.com.*` to cover
51+
domains like `www.google.co.uk` or `www.google.com.au`. You should use wildcard
52+
targets only when you have rules that apply to the entire wildcard space. If
53+
your rules only apply to specific hosts, you should list each host as a
54+
separate target.
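
The wildcard semantics described above can be sketched in JavaScript. This is an illustrative reading of the text, not the extension's actual matching code; the function name `targetMatches` is hypothetical.

```javascript
// Hypothetical helper: does `host` match a ruleset `target` pattern?
// Assumes the semantics described in this section, nothing more.
function targetMatches(target, host) {
  if (!target.includes("*")) {
    // Plain target: exact host name match.
    return target === host;
  }
  if (target.startsWith("*.")) {
    // Left-side wildcard: any host with ".example.com" as a suffix,
    // at any subdomain depth.
    const suffix = target.slice(1); // e.g. ".twitter.com"
    return host.length > suffix.length && host.endsWith(suffix);
  }
  if (target.endsWith(".*")) {
    // Right-side wildcard: exactly one additional label.
    const prefix = target.slice(0, -1); // e.g. "www.google."
    if (!host.startsWith(prefix)) return false;
    const rest = host.slice(prefix.length);
    return rest.length > 0 && !rest.includes(".");
  }
  return false;
}
```

Under this reading, `www.google.*` matches `www.google.de` but not `www.google.co.uk`, which is why the text suggests listing `www.google.co.*` as a separate target.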
2855

2956
#### [Rules and Regular Expressions](#rules-and-regular-expressions)
3057

31-
The `rule` tags do the actual rewriting work. The `from` attribute of each rule is a [regular expression](http://www.regular-expressions.info/quickstart.html) matched against a full URL. You can use rules to rewrite URLs in simple or complicated ways. Here's a simplified (and now obsolete) example for Wikipedia:
58+
The `rule` tags do the actual rewriting work. The `from` attribute of each rule
59+
is a [regular expression](https://www.regular-expressions.info/quickstart.html)
60+
matched against a full URL. You can use rules to rewrite URLs in simple or
61+
complicated ways. Here's a simplified (and now obsolete) example for Wikipedia:
3262

3363
```xml
3464
<ruleset name="Wikipedia">
35-
<target host="*.wikipedia.org" />
65+
<target host="*.wikipedia.org" />
3666

37-
<rule from="^http://(\w{2})\.wikipedia\.org/wiki/"
38-
to="https://secure.wikimedia.org/wikipedia/$1/wiki/"/>
67+
<rule from="^http://(\w{2})\.wikipedia\.org/wiki/"
68+
to="https://secure.wikimedia.org/wikipedia/$1/wiki/"/>
3969
</ruleset>
4070
```
4171

42-
The `to` attribute replaces the text matched by the `from` attribute. It can contain placeholders like `$1` that are replaced with the text matched inside the parentheses.
72+
The `to` attribute replaces the text matched by the `from` attribute. It can
73+
contain placeholders like `$1` that are replaced with the text matched inside
74+
the parentheses.
4375

44-
This rule rewrites a URL like `http://fr.wikipedia.org/wiki/Chose` to `https://secure.wikimedia.org/wikipedia/fr/wiki/Chose`. Notice, again, that the target is allowed to contain (just one) * as a wildcard meaning "any".
76+
This rule rewrites a URL like `http://fr.wikipedia.org/wiki/Chose` to
77+
`https://secure.wikimedia.org/wikipedia/fr/wiki/Chose`. Notice, again, that the
78+
target is allowed to contain (just one) * as a wildcard meaning "any".
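
To see the placeholder substitution in action, the `from` and `to` attributes of the Wikipedia example can be tried directly with JavaScript's `String.prototype.replace`. This is only a sketch of the rewrite step; the `rewrite` function name is an assumption for illustration.

```javascript
// The `from` / `to` pair from the Wikipedia example above.
const from = new RegExp("^http://(\\w{2})\\.wikipedia\\.org/wiki/");
const to = "https://secure.wikimedia.org/wikipedia/$1/wiki/";

// Hypothetical helper: apply the rule to a URL.
// URLs that do not match `from` are returned unchanged.
function rewrite(url) {
  return url.replace(from, to);
}

console.log(rewrite("http://fr.wikipedia.org/wiki/Chose"));
// prints "https://secure.wikimedia.org/wikipedia/fr/wiki/Chose"
```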
4579

46-
Rules are applied in the order they are listed within each ruleset. Order between rulesets is unspecified. Only the first rule or exception matching a given URL is applied.
80+
Rules are applied in the order they are listed within each ruleset. Order
81+
between rulesets is unspecified. Only the first rule or exception matching a
82+
given URL is applied.
4783

48-
Rules are evaluated using [Javascript regular expressions](http://www.regular-expressions.info/javascript.html), which are similar but not identical to [Perl-style regular expressions.](http://www.regular-expressions.info/pcre.html) Note that if your rules include ampersands (&amp;), they need to be appropriately XML-encoded: replace each occurrence of **&amp;** with **&amp;#x26;**.
84+
Rules are evaluated using [Javascript regular
85+
expressions](https://www.regular-expressions.info/javascript.html), which are
86+
similar but not identical to [Perl-style regular
87+
expressions.](https://www.regular-expressions.info/pcre.html) Note that if your
88+
rules include ampersands (&amp;), they need to be appropriately XML-encoded:
89+
replace each occurrence of **&amp;** with **&amp;#x26;**.
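
As a small illustration of the ampersand encoding, a helper along these lines could prepare a pattern before it is pasted into a `from` or `to` attribute. The name `encodeAmpersands` is hypothetical, and the sketch is naive: it would also re-encode ampersands that are already part of an entity.

```javascript
// Hypothetical helper: XML-encode every ampersand in a rule pattern.
// Naive by design; run it only on patterns that are not yet encoded.
function encodeAmpersands(pattern) {
  return pattern.replace(/&/g, "&#x26;");
}

console.log(encodeAmpersands("a=1&b=2"));
// prints "a=1&#x26;b=2"
```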
4990

5091
#### [Exclusions](#exclusions)
5192

52-
An exclusion specifies a pattern, using a regular expression, for URLs where the rule should **not** be applied. The Stack Exchange rule contains an exclusion for the OpenID login path, which breaks logins if it is rewritten:
93+
An exclusion specifies a pattern, using a regular expression, for URLs where
94+
the rule should **not** be applied. The Stack Exchange rule contains an
95+
exclusion for the OpenID login path, which breaks logins if it is rewritten:
5396

5497
```xml
55-
<exclusion pattern="^http://(?:\w+\.)?stack(?:exchange|overflow)\.com/users/authenticate/" />
98+
<exclusion pattern="^http://(\w+\.)?stack(exchange|overflow)\.com/users/authenticate/" />
5699
```
57100

58-
Exclusions are always evaluated before rules in a given ruleset. Matching any exclusion means that a URL won't match any rules within the same ruleset. However, if other rulesets match the same target hosts, the rules in those rulesets will still be tried.
101+
Exclusions are always evaluated before rules in a given ruleset. Matching any
102+
exclusion means that a URL won't match any rules within the same ruleset.
103+
However, if other rulesets match the same target hosts, the rules in those
104+
rulesets will still be tried.
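
The evaluation order (exclusions first, then the first matching rule) can be sketched as follows. The object layout and the `applyRuleset` name are assumptions made for illustration, and the single `^http:` rule is a stand-in rather than the real Stack Exchange ruleset.

```javascript
// Hypothetical in-memory form of a ruleset; field names are illustrative.
const ruleset = {
  exclusions: [
    /^http:\/\/(\w+\.)?stack(exchange|overflow)\.com\/users\/authenticate\//,
  ],
  // Stand-in rule, not taken from the actual Stack Exchange ruleset.
  rules: [{ from: /^http:/, to: "https:" }],
};

// Returns the rewritten URL, or null when this ruleset does not apply.
function applyRuleset(rs, url) {
  // Exclusions are checked before any rule in the same ruleset.
  if (rs.exclusions.some((pattern) => pattern.test(url))) {
    return null;
  }
  // Only the first matching rule is applied.
  for (const rule of rs.rules) {
    if (rule.from.test(url)) {
      return url.replace(rule.from, rule.to);
    }
  }
  return null;
}
```

A `null` result only means that this ruleset leaves the URL alone; other rulesets matching the same host would still be tried.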
59105

60106
#### [Style Guide](#style-guide)
61107

62-
There are many different ways you can write a ruleset, or regular expression within the ruleset. It's easier for everyone to understand the rulesets if they follow similar practices. You should read and follow the [Ruleset style guide](https://github.com/EFForg/https-everywhere/blob/master/CONTRIBUTING.md#ruleset-style-guide). Some of the guidelines in that document are intended to make [Ruleset testing](https://github.com/EFForg/https-everywhere/blob/master/ruleset-testing.md) less cumbersome.
108+
There are many different ways you can write a ruleset, or regular expression
109+
within the ruleset. It's easier for everyone to understand the rulesets if they
110+
follow similar practices. You should read and follow the Ruleset style
111+
guide.
63112

64113
#### [Secure Cookies](#secure-cookies)
65114

66-
Many HTTPS websites fail to correctly set the [secure flag](https://secure.wikimedia.org/wikipedia/en/wiki/HTTP_cookie#Secure_and_HttpOnly) on authentication and/or tracking cookies. HTTPS Everywhere provides a facility for turning this flag on. For instance:
115+
Many HTTPS websites fail to correctly set the [secure
116+
flag](https://en.wikipedia.org/wiki/HTTP_cookie#Secure_and_HttpOnly)
117+
on authentication and/or tracking cookies. HTTPS Always provides a facility
118+
for turning this flag on. For instance:
67119

68120
```xml
69121
<securecookie host="^market\.android\.com$" name=".+" />
70122
```
71123

72-
The "host" parameter is a regexp specifying which domains should have their cookies secured; the "name" parameter is a regexp specifying which cookies should be secured. For a cookie to be secured, it must be sent by a target host for that ruleset. It must also be sent over HTTPS and match the name regexp. For cookies set by Javascript in a web page, the Firefox extension can't tell which host set the cookie and instead uses the domain attribute of the cookie to check against target hosts. A cookie whose domain attribute starts with a "." (the default, if not specified by Javascript) will be matched as if it was sent from a host name made by stripping the leading dot.
124+
The "host" parameter is a regexp specifying which domains should have their
125+
cookies secured; the "name" parameter is a regexp specifying which cookies
126+
should be secured. For a cookie to be secured, it must be sent by a target host
127+
for that ruleset. It must also be sent over HTTPS and match the name regexp.
128+
For cookies set by Javascript in a web page, the UXP extension can't tell
129+
which host set the cookie and instead uses the domain attribute of the cookie
130+
to check against target hosts. A cookie whose domain attribute starts with a
131+
"." (the default, if not specified by Javascript) will be matched as if it was
132+
sent from a host name made by stripping the leading dot.
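
A rough sketch of the cookie check described above, covering only the HTTPS, host, and name conditions. The `shouldSecure` function and the shape of the `cookie` object are hypothetical, and the check that the host is a target of the ruleset is deliberately left out.

```javascript
// Attributes of the <securecookie> example above, as regular expressions.
const secureCookie = { host: /^market\.android\.com$/, name: /.+/ };

// Hypothetical check. `cookie` is assumed to be a plain object with
// `domain` and `name` string fields; `sentOverHttps` is a boolean.
function shouldSecure(cookie, sentOverHttps, sc) {
  if (!sentOverHttps) return false;
  // A leading dot in the domain attribute is stripped before matching.
  const host = cookie.domain.startsWith(".")
    ? cookie.domain.slice(1)
    : cookie.domain;
  return sc.host.test(host) && sc.name.test(cookie.name);
}
```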
73133

74134
#### [Testing](#testing)
75135

76-
We use an [automated checker](https://github.com/hiviah/https-everywhere-checker) to run some basic tests on all rulesets. This is described in more detail in our [Ruleset Testing](https://github.com/EFForg/https-everywhere/blob/master/ruleset-testing.md) document, but in short there are two parts: Your ruleset must have enough test URLs to cover all the various types of URL covered by your rules. And each of those test URLs must load, both before rewriting and after rewriting. Every target host tag generates an implicit test URL unless it contains a wildcard. You can add additional test URLs manually using the `<test url="..."/>` tag. The test URLs you add this way should be real pages loaded from the site, or real images, CSS, and Javascript if you have rules that specifically affect those resources.
77-
78-
You should also manually test your ruleset by placing it in the `HTTPSEverywhereUserRules/` subdirectory in [your Firefox profile directory](http://kb.mozillazine.org/Profile_folder_-_Firefox), and then restarting Firefox. While using the rule, check for messages in the Firefox Error Console to see if there are any issues with the way the site supports HTTPS.
79-
80-
If you&apos;ve tested your rule and are sure it would be of use to the world at large, submit it as a [pull request](https://help.github.com/articles/using-pull-requests/) on our [GitHub repository](https://github.com/EFForg/https-everywhere/) or send it to the rulesets mailing list at `https-everywhere-rules AT eff.org`. Please be aware that this is a public and publicly-archived mailing list.
136+
We use an [automated
137+
checker](https://github.com/hiviah/https-everywhere-checker) to run some basic
138+
tests on all rulesets. This is described in more detail in our [Ruleset
139+
Testing](https://github.com/g4jc/https-always/blob/master/ruleset-testing.md)
140+
document, but in short there are two parts: Your ruleset must have enough test
141+
URLs to cover all the various types of URL covered by your rules. And each of
142+
those test URLs must load, both before rewriting and after rewriting. Every
143+
target host tag generates an implicit test URL unless it contains a wildcard.
144+
You can add additional test URLs manually using the `<test url="..."/>` tag.
145+
The test URLs you add this way should be real pages loaded from the site, or
146+
real images, CSS, and Javascript if you have rules that specifically affect
147+
those resources.
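
The implicit test URL rule can be pictured with a short sketch. The `implicitTestUrls` name and the `http://<host>/` URL shape are assumptions for illustration; the checker's actual behavior is described in the Ruleset Testing document.

```javascript
// Hypothetical sketch: derive implicit test URLs from target hosts.
// Wildcard targets produce no implicit URL and need explicit <test> tags.
function implicitTestUrls(targetHosts) {
  return targetHosts
    .filter((host) => !host.includes("*"))
    .map((host) => `http://${host}/`);
}
```

Under these assumptions, a ruleset whose only target is `*.wikipedia.org` yields an empty list, so all of its coverage would have to come from `<test url="..."/>` tags.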
148+
149+
You can test rulesets in the browser using a hidden debugging page, but please
150+
be aware that this approach should only be used for debugging purposes and
151+
should not be used for setting up personal custom rules. You can access the
152+
hidden debugging page this way:
153+
154+
* Pale Moon: `about:addons` > HTTPS Always preferences > click under
155+
`General Settings` > press <kbd>Ctrl-Z</kbd>
156+
* Chromium/Chrome: `chrome://extensions/` > HTTPS Always options > click
157+
under `General Settings` > press <kbd>Ctrl-Z</kbd>
158+
159+
You might need to disable popup blocking for the page to appear. Once you have
160+
loaded the page, you might find it convenient to bookmark it for later use.
161+
162+
If you&apos;ve tested your rule and are sure it would be of use to the world at
163+
large, submit it as a [pull
164+
request](https://help.github.com/articles/using-pull-requests/) on our [GitHub
165+
repository](https://github.com/g4jc/https-always/).
81166

82167
#### [make-trivial-rule](#make-trivial-rule)
83168

84-
As an alternative to writing rules by hand, there are scripts you can run from a Unix command line to automate the process of creating a simple rule for a specified domain. These scripts are not included with HTTPS Everywhere releases but are available in our development repository and are described in [our development documentation](https://www.eff.org/https-everywhere/development).
169+
As an alternative to writing rules by hand, there are scripts you can run from
170+
a Unix command line to automate the process of creating a simple rule for a
171+
specified domain.
85172

86173
#### [Disabling a ruleset by default](#disabling-a-ruleset-by-default)
87174

88-
Sometimes rulesets are useful or interesting, but cause problems that make them unsuitable for being enabled by default in everyone's browsers. Typically when a ruleset has problems we will disable it by default until someone has time to fix it. You can do this by adding a `default_off` attribute to the ruleset element, with a value explaining why the rule is off.
175+
Sometimes rulesets are useful or interesting, but cause problems that make them
176+
unsuitable for being enabled by default in everyone's browsers. Typically when
177+
a ruleset has problems we will disable it by default until someone has time to
178+
fix it. You can do this by adding a `default_off` attribute to the ruleset
179+
element, with a value explaining why the rule is off.
89180

90181
```xml
91182
<ruleset name="Amazon (buggy)" default_off="breaks site">
92-
<target host="www.amazon.*" />
93-
<target host="amazon.*" />
183+
<target host="www.amazon.*" />
184+
<target host="amazon.*" />
94185
</ruleset>
95186
```
96187

97-
You can add more details, like a link to a bug report, in the comments for the file.
188+
You can add more details, like a link to a bug report, in the comments for the
189+
file.
98190

99191
#### [Mixed Content Blocking (MCB)](#mixed-content-blocking-mcb)
100192

101-
Some rulesets may trigger active mixed content (i.e. scripts loaded over HTTP instead of HTTPS). This type of mixed content is blocked in both [Chrome](https://trac.torproject.org/projects/tor/ticket/6975) and Firefox, before HTTPS Everywhere has a chance to rewrite the URLs to an HTTPS version. This generally breaks the site. However, the Tor Browser doesn&apos;t block mixed content, in order to allow HTTPS Everywhere to try and rewrite the URLs to an HTTPS version.
102-
103-
To enable a rule only on platforms that allow mixed content (currently only the Tor Browser), you can add a `platform="mixedcontent"` attribute to the ruleset element.
104-
105-
#### [HTTPS->HTTP downgrade rules](#https-http-downgrade-rules)
106-
107-
By default, HTTPS Everywhere will refuse to allow rules that would downgrade a URL from HTTPS to HTTP. Occasionally, this is necessary because the extension rewrites a page to HTTPS, and that page contains relative links to resources which do not exist on the HTTPS part of the site. This is very rare, especially because these resources will typically be blocked by [Mixed Content Blocking](#mixed-content-blocking-mcb). If it necessary, you can add a `downgrade="1"` attribute to the rule to make it easier to audit the ruleset library for such rules.
193+
Some rulesets may trigger active mixed content (i.e. scripts loaded over HTTP
194+
instead of HTTPS). This type of mixed content is blocked in most major browsers,
195+
before HTTPS Always has a chance to rewrite the URLs to an HTTPS version.
196+
This generally breaks the site. Depending on their configuration and threat
197+
model, some users might however decide to enable these rulesets via a global
198+
option in HTTPS Always. To that effect, such rulesets are identified with
199+
the specific `platform="mixedcontent"` attribute to the ruleset element.
