They are only optional today, but to get there, they need to be made reliable. We are currently running 19 browsers in SL and 15 in BS, and monitoring the output.
Thanks to #4417, which landed yesterday, both are green if everything goes well. But that doesn't happen often... We are experiencing 2 kinds of errors:
- failure: the browser does not start, disconnects, fails, or is very slow, so the campaign is not fully run
- flake: a test that should pass fails because it reaches the timeout
To improve the situation, I've looked at about 60 builds in the last 10 days (all builds in master and presubmit-* after #4417). That's about 120 jobs or 2000 campaigns or 4M+ unit tests.
Most jobs failed (90%) due to 123 failures and 74 flakes. This is bad, but:
- 70% of the failures come from iOS9. Our campaign doesn't work well in this simulator, be it in SL, in BS, or locally. In 80% of the campaigns, Safari mobile freezes silently, fails to load resources (XHR error), or is very slow.
- 80% of the flakes come from 5 unit tests which can "randomly" fail in any browser. They are:
PostMessageBusSink should send messages immediatly when run outside the zone
element probe should provide a global function to inspect elements
MdButton button[md-button] should disable the button
MdButton a[md-button] should remove disabled anchors from tab order
MdButton button[md-button] should handle a click on the button
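
As a rough back-of-the-envelope check of what those two percentages imply, the sketch below estimates how many failures and flakes would remain. It assumes, optimistically, that the iOS9 failures and the flakes from the 5 tests above could be eliminated entirely; the `remaining` helper is hypothetical and the shares are the approximate figures quoted above.

```typescript
// Totals reported above for the ~120 jobs examined.
const totalFailures = 123;
const totalFlakes = 74;

// Approximate shares attributed to the two dominant causes.
const ios9FailureShare = 0.7;   // failures coming from iOS9
const topFiveFlakeShare = 0.8;  // flakes coming from the 5 listed tests

// Hypothetical helper: count left over ASSUMING the dominant cause disappears completely.
function remaining(total: number, share: number): number {
  return Math.round(total * (1 - share));
}

const remainingFailures = remaining(totalFailures, ios9FailureShare);
const remainingFlakes = remaining(totalFlakes, topFiveFlakeShare);

console.log(`failures left: ~${remainingFailures}, flakes left: ~${remainingFlakes}`);
```

These are estimates only, since the percentages themselves are rounded.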
It is also interesting to note a trend: SL does better with Windows, while BS does better with Mac. This applies to both failures and flakes.
So one idea would be to split the browsers between the 2 providers to improve the overall reliability.
Next steps:
- remove iOS9 from CI until it is made reliable, if possible. There is no problem with the desktop version, so it shouldn't be an issue
- increase the timeout of the 5 identified flaky tests
- continue monitoring and apply the split if the trend is confirmed
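
If the trend is confirmed, the split could look something like the sketch below. This is only an illustration: the `Browser` shape, the `platform` values, and the fallback provider for platforms other than Windows and Mac are all assumptions, not the project's actual launcher configuration.

```typescript
type Provider = 'SL' | 'BS';

// Hypothetical browser descriptor; the real launcher config may look different.
interface Browser {
  name: string;
  platform: string; // assumed values such as 'Windows', 'Mac', 'Linux'
}

// Route Windows browsers to SL and Mac browsers to BS, following the observed trend.
// Other platforms go to an arbitrary fallback, since no trend was reported for them.
function pickProvider(browser: Browser, fallback: Provider = 'SL'): Provider {
  if (browser.platform === 'Windows') return 'SL';
  if (browser.platform === 'Mac') return 'BS';
  return fallback;
}

// Group a list of browsers by the provider that should run them.
function splitByProvider(browsers: Browser[]): Record<Provider, Browser[]> {
  const result: Record<Provider, Browser[]> = { SL: [], BS: [] };
  for (const browser of browsers) {
    result[pickProvider(browser)].push(browser);
  }
  return result;
}

// Example with made-up entries.
const split = splitByProvider([
  { name: 'Chrome', platform: 'Windows' },
  { name: 'Safari', platform: 'Mac' },
  { name: 'Firefox', platform: 'Linux' },
]);
console.log(split.SL.map(b => b.name), split.BS.map(b => b.name));
```

Keeping the rule in a single function makes it easy to revise if further monitoring shows a different pattern.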