| 1 | +# Ruleset coverage requirements |
| 2 | + |
| 3 | +We have an automated tester that checks URLs for all rulesets to ensure they |
| 4 | +still work. That tester needs input URLs, so we have additional testing in |
| 5 | +place to ensure that every ruleset has enough test URLs to exercise it |
| 6 | +thoroughly. |
| 7 | + |
| 8 | +Goal: 100% coverage of all targets and all branches of all regexes in each ruleset. |
| 9 | + |
| 10 | +Each ruleset has a number of "implicit" test URLs based on the target hosts. For |
| 11 | +each target host e.g. example.com, there is an implicit test URL of |
| 12 | +http://example.com/. Exception: target hosts that contain a wildcard ("*") do |
| 13 | +not create an implicit test URL. |
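
As a rough illustration of the rule above, this Python sketch derives the implicit test URLs from a list of target hosts. The function name `implicit_test_urls` is hypothetical and is not taken from https-everywhere-checker; it only mirrors the behaviour described in this document.

```python
def implicit_test_urls(target_hosts):
    """Return one implicit test URL per target host that has no wildcard.

    Hypothetical helper for illustration only; the real checker may
    build these URLs differently.
    """
    urls = []
    for host in target_hosts:
        # Hosts containing "*" do not create an implicit test URL.
        if "*" in host:
            continue
        urls.append("http://%s/" % host)
    return urls


example_urls = implicit_test_urls(["example.com", "*.example.com"])
```

Here `example_urls` contains only `http://example.com/`, because the wildcard host is skipped.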
| 14 | + |
| 15 | +Additional test URLs can be added with the new <test> tag in the XML, e.g. |
| 16 | +<test url="http://example.com/complex-page" />. |
| 17 | + |
| 18 | +Test URLs will be matched against the regexes in each <rule> and <exclusion>. A |
| 19 | +test URL can only match against one <rule> and one <exclusion>. Once all the |
| 20 | +test URLs have been matched up, we count the number of test URLs matching each |
| 21 | +<rule> and each <exclusion>, and make sure the count meets the minimum number. |
| 22 | +The minimum number of test URLs for each <rule> or <exclusion> is one plus the |
| 23 | +number of '*', '+', '?', or '|' characters in the regex. Since each of these |
| 24 | +characters increases the complexity of the regex (usually increasing the variety |
| 25 | +of URLs it can match), we require correspondingly more test URLs to ensure good |
| 26 | +coverage. |
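
The counting rule can be sketched in a few lines of Python. This is a minimal sketch that assumes a plain character count, exactly as the paragraph above states it; the function name `min_test_urls` is hypothetical, and the actual checker may treat some cases (such as escaped characters) differently.

```python
def min_test_urls(regex):
    """Return the minimum number of test URLs for one <rule> or <exclusion>.

    Hypothetical helper: one plus the number of '*', '+', '?' and '|'
    characters that appear in the regex string.
    """
    return 1 + sum(regex.count(ch) for ch in "*+?|")


# A regex with one '+' and one '?' needs 1 + 2 = 3 test URLs.
required = min_test_urls(r"^http://([\w-]+\.)?example\.com/")
```

A regex with none of these characters, such as `^http:`, needs only a single test URL.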
| 27 | + |
| 28 | +TODO: We'd like to also require that there be at least three test URLs for every |
| 29 | +target host with a left-side wildcard, and at least ten test URLs for each |
| 30 | +target host with a right-side wildcard. But this is not yet implemented. |
| 31 | + |
| 32 | +# Example: |
| 33 | + <ruleset name="example.com"> |
| 34 | + <target host="example.com" /> |
| 35 | + <target host="*.example.com" /> |
| 36 | + |
| 37 | + <test url="http://www.example.com/" /> |
| 38 | + <test url="http://beta.example.com/" /> |
| 39 | + |
| 40 | + <rule from="^http://([\w-]+\.)?example\.com/" |
| 41 | + to="https://$1example.com/" /> |
| 42 | + |
| 43 | + </ruleset> |
| 44 | + |
| 45 | +This ruleset has one implicit test URL from a target host |
| 46 | +("http://example.com/"). The other target host has a wildcard, so creates no |
| 47 | +implicit test URL. There's a single rule. That rule contains a '+' and a '?', so |
| 48 | +it requires a total of three matching test URLs. We add the necessary test URLs |
| 49 | +using explicit <test> tags. |
| 50 | + |
| 51 | +# Testing and Continuous Build |
| 52 | + |
| 53 | +Testing for ruleset coverage is now part of the Travis CI continuous build. |
| 54 | +Currently we only test rulesets that have been modified since February 2, 2015. |
| 55 | +Submitting changes to any ruleset that does not meet the coverage requirements |
| 56 | +will break the build. This means that even fixes of existing rules may require |
| 57 | +additional work to bring them up to snuff. |
| 58 | + |
| 59 | +To run the tests locally, you'll need the https-everywhere-checker, which is now |
| 60 | +a submodule of https-everywhere. Run these commands to set it up: |
| 61 | + |
| 62 | + git submodule init |
| 63 | + git submodule update |
| 64 | + cd https-everywhere-checker |
| 65 | + pip install --user -r requirements.txt |
| 66 | + cd - |
| 67 | + ./test-ruleset-coverage.sh |
| 68 | + |
| 69 | +Note you may also need to apt-get install libcurl4-openssl-dev so that one of |
| 70 | +the requirements in https-everywhere-checker can be satisfied. |
| 71 | + |
| 72 | +To test a specific ruleset: |
| 73 | + |
| 74 | + python2.7 https-everywhere-checker/src/https_everywhere_checker/check_rules.py https-everywhere-checker/checker.config.sample rules/Example.xml |