Jameswtruher/travisdailybuild#2958
Conversation
Modify test failure presentation to use platform available XML methods
Some of the language/parser tests have been hanging in a non-reproducable manner which
causes the CI system to invalidate the entire run. This change adds support for timeout
which will fail a test if it runs to long, rather than invalidate the entire run.
current behavior is still supported, and is not done in a new session:
PS> get-runtimeerror -src '1/'
At line:1 char:3
+ 1/
+ ~
You must provide a value expression following the '/' operator.
+ CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
+ FullyQualifiedErrorId : ExpectedValueExpression
Adding a timeout will do the operation in a async powershell session
PS> get-runtimeerror -src '1/' -timeout 5
You must provide a value expression following the '/' operator.
+ CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
+ FullyQualifiedErrorId : ExpectedValueExpression
If the operation takes longer than the supplied timeout, a timeout error will be returned
PS> get-runtimeerror -src 'start-sleep 6' -timeout 2
get-runtimeerror : Operation Timed Out ('start-sleep 6')
At line:1 char:1
+ get-runtimeerror -src 'start-sleep 6' -timeout 2
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Write-Error], WriteErrorException
+ FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,Get-RuntimeError
Also, only call add-type in CounterTestHelperFunctions.ps1 if we're going to actually run the tests
Travis watches output from the build to ensure that it hasn't hung we need to find a balance between too much output and not enough output. A run which has too much output is killed because it looks like an error loop A run which has too little output is killed because it looks like a hang
|
|
||
| $cmd = ConstructCommand $testCase | ||
| Write-Host "Command to run: $cmd" | ||
| # Write-Host "Command to run: $cmd" |
There was a problem hiding this comment.
Should just delete this line. #Resolved
There was a problem hiding this comment.
done! #Closed
| Get-InstalledModule -Name $ContosoServer -AllVersions -ErrorAction SilentlyContinue | Uninstall-Module -Force | ||
| } | ||
|
|
||
|
|
There was a problem hiding this comment.
Extra line. #Resolved
There was a problem hiding this comment.
deleted #Closed
| log ("Name='{0}', Destination='{1}', Repository='{2}'" -f ($Name -join ','), $Destination, $RepositoryName) | ||
|
|
||
| # do not output progress | ||
| $ProgressPreference = "SilentlyContinue" |
There was a problem hiding this comment.
Do we need to reset this back to previous setting when we complete the script? #ByDesign
There was a problem hiding this comment.
nope - test tests are executed in a new scope, so it gets set back when the script exits #Closed
Remove extraneous extra line in PowerShellGet.Tests.ps1
lzybkr
left a comment
There was a problem hiding this comment.
Help me understand the timeout on the parser tests - the test will still fail, but just fail faster? Or something else?
I wonder how hard it would be to get a core dump when we see the timeout - upload that as an artifact, and debug later.
| } | ||
| catch | ||
| { | ||
| write-verbose -verbose "caught" |
There was a problem hiding this comment.
Is this a left-over debugging message?
caught isn't a very useful message by itself, and I don't see any other Write-Verbose calls in this function to give it context. #Resolved
There was a problem hiding this comment.
| { | ||
| It "error should happen at parse time, not at runtime" -Skip {} | ||
| $errors = Get-RuntimeError -Src $src | ||
| $errors = Get-RuntimeError -Src $src -Timeout 5 |
There was a problem hiding this comment.
I'd feel a bit better if we waited a bit longer, maybe 10 seconds - in case we're on a really slow VM or we add a test that is somewhat longish (like 1s or so). #Resolved
There was a problem hiding this comment.
it was SWAG for sure, I don't have trouble with having this be 10
In reply to: 94878763 [](ancestors = 94878763)
| logerror "TEST FAILURES" | ||
| foreach ( $testfail in $x.SelectNodes('.//test-case[@result = "Failure"]')) | ||
| # switch between methods, SelectNode is not available on dotnet core | ||
| if ( "System.Xml.XmlDocumentXPathExtensions" -as "Type" ) { |
There was a problem hiding this comment.
As a general rule, you should prefer -as [Type] instead of -as "Type" because the former resolves the type just once while the later resolves it every time it is executed. #Resolved
There was a problem hiding this comment.
|
|
||
| $script:counterSamples = $null | ||
| if ($maxSamples) | ||
| if ($maxSamples -and $IsWindows) |
There was a problem hiding this comment.
Seems Get-Counter don't work on IoT.
SetScriptVars is only called from BeforeAll blocks so we can mask it there by SkipCounterTests. #Resolved
| } | ||
|
|
||
| if ($export) | ||
| if ($export -and $IsWindows) |
There was a problem hiding this comment.
The same about IoT. #Resolved
| $ps.AddScript($src) > $null | ||
| $ar = $ps.BeginInvoke() | ||
| # give it 250 milliseconds to complete | ||
| start-sleep -mill 250 |
There was a problem hiding this comment.
Parser tests is very fast (some ms). This change will slow them down in many times.
And how will it help us to find the root of the problem? Travis already has a global monitoring of a job duration and automatically kills a hung job. Maybe we ask them to take a dump in this moment? #WontFix
There was a problem hiding this comment.
yes, it will slow down some tests, but I reckoned that it was better to slow down the test than to invalidate an entire run - that's what I'm trying to avoid. When running a full test pass I saw nearly 100% test runs killed for lack of activity. Each one of those runs were hung in the language tests.
In reply to: 94967709 [](ancestors = 94967709)
There was a problem hiding this comment.
I am concerned that we conceal the problem. Is there a way to reproduce this problem except to get a hung accidentally? I believe that we in the allowed time could identify the problem and fix it by analyzing a dump without slow down tests. #WontFix
There was a problem hiding this comment.
while I agree that we need to determine root cause, i disagree about where. Catching this in CI has been extremely difficult and collecting dumps in Travis is really something I would rather avoid (if it's even possible).
we need to track down this issue, but I don't believe CI is the appropriate place
In reply to: 94993680 [](ancestors = 94993680)
There was a problem hiding this comment.
There seems to be an issue that needs investigation. Please open an issue for this. #Resolved
Alter timeout to 10 seconds to be improve chances of not timing out for runtime parser checks improve logic for counter tests to also skip for IoT
| $ShouldRun = $false | ||
| } | ||
|
|
||
| if ( $ShouldRun ) |
There was a problem hiding this comment.
$ShouldRun looks like duplication of SkipCounterTests #Resolved
There was a problem hiding this comment.
Agreed, This is duplicate of SkipCounterTests #Resolved
| $ps.AddScript($src) > $null | ||
| $ar = $ps.BeginInvoke() | ||
| # give it 250 milliseconds to complete | ||
| start-sleep -mill 250 |
There was a problem hiding this comment.
There seems to be an issue that needs investigation. Please open an issue for this. #Resolved
| $ShouldRun = $false | ||
| } | ||
|
|
||
| if ( $ShouldRun ) |
There was a problem hiding this comment.
Agreed, This is duplicate of SkipCounterTests #Resolved
| function SkipCounterTests | ||
| { | ||
| if ([System.Management.Automation.Platform]::IsLinux -or | ||
| [System.Management.Automation.Platform]::IsOSX) |
There was a problem hiding this comment.
We should have a single function with all the logic to determine whether the tests should run. #Resolved
|
@lzybkr I don't seem to be able to follow up on your question about the timeout in place. Currently, when one of the parser tests hang, the entire test run is killed after about 10 minutes of inactivity. The timeout code enables us to fail a single test and have the rest of the run continue. I agree that capturing a dump at that point would be fantastic, but I don't know how to do that. Can you provide instructions? I'll be very happy to add it. |
|
@TravisEz13 an issue has already been created for the hanging behavior. #2748 #Resolved |
|
I don't know, but a search of 'create linux core dump' suggests that |
|
It seems it is impossible that hung process killed himself and unloaded a dump. Possible workarround http://jsteemann.github.io/blog/2014/10/30/getting-core-dumps-of-failed-travisci-builds/ |
|
We don't know if the process is hung, but clearly a thread is hung, so Jim's change may help in creating the dump. But I agree the CI system should provide a way to debug of they don't already. Appveyor definitely does provide rdp access, maybe Travis already has something similar. |
|
I found nothing in Travis help but I discovered long discussion about travis dumps travis-ci/travis-ci#3754 |
|
will that provide you with what you need, given managed code? As you're the author/owner of these particular tests, shall I just open an issue for you to track? In reply to: 271043086 [](ancestors = 271043086) |
…he logic in import-counter.tests.ps1
|
@Francisco-Gamino, @TravisEz13 my last commit should have addressed your request |
|
If I'm reading the logs correctly, the async changes didn't help at all, and most likely are a net negative right now. See these logs: https://travis-ci.org/PowerShell/PowerShell/jobs/195355364 Notice 10 minutes with no output - so we failed to kill the async test and did not get further results. |
enable daily test run support in travis
This also includes a number of test changes which need to be reviewed by the test owners
@lzybkr I've made changes in parser tests to support async execution to avoid test run hangs which have sporadically appeared. Now, if one of those tests hang, it (the test) will fail rather than the entire run.
@Francisco-Gamino, I've made changes to the help tests, counter and management tests
once this is merged, I'll have an addition PR to put the test run badge in the ReadMe.md (after I create the cron job to actually run the tests)