Skip to content

Jameswtruher/travisdailybuild#2958

Merged
TravisEz13 merged 11 commits into
PowerShell:masterfrom
JamesWTruher:jameswtruher/travisdailybuild
Jan 10, 2017
Merged

Jameswtruher/travisdailybuild#2958
TravisEz13 merged 11 commits into
PowerShell:masterfrom
JamesWTruher:jameswtruher/travisdailybuild

Conversation

@JamesWTruher
Copy link
Copy Markdown
Collaborator

@JamesWTruher JamesWTruher commented Jan 5, 2017

enable daily test run support in travis

This also includes a number of test changes which need to be reviewed by the test owners
@lzybkr I've made changes in parser tests to support async execution to avoid test run hangs which have sporadically appeared. Now, if one of those tests hang, it (the test) will fail rather than the entire run.
@Francisco-Gamino, I've made changes to the help tests, counter and management tests

once this is merged, I'll have an addition PR to put the test run badge in the ReadMe.md (after I create the cron job to actually run the tests)

Modify test failure presentation to use platform available XML methods
Some of the language/parser tests have been hanging in a non-reproducable manner which
causes the CI system to invalidate the entire run. This change adds support for timeout
which will fail a test if it runs to long, rather than invalidate the entire run.

current behavior is still supported, and is not done in a new session:
PS> get-runtimeerror -src '1/'
At line:1 char:3
+ 1/
+   ~
You must provide a value expression following the '/' operator.
    + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : ExpectedValueExpression

Adding a timeout will do the operation in a async powershell session
PS> get-runtimeerror -src '1/' -timeout 5
You must provide a value expression following the '/' operator.
    + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : ExpectedValueExpression

If the operation takes longer than the supplied timeout, a timeout error will be returned
PS> get-runtimeerror -src 'start-sleep 6' -timeout 2
get-runtimeerror : Operation Timed Out ('start-sleep 6')
At line:1 char:1
+ get-runtimeerror -src 'start-sleep 6' -timeout 2
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Write-Error], WriteErrorException
    + FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,Get-RuntimeError
Also, only call add-type in CounterTestHelperFunctions.ps1 if we're going to actually run the tests
Travis watches output from the build to ensure that it hasn't hung
we need to find a balance between too much output and not enough output.
A run which has too much output is killed because it looks like an error loop
A run which has too little output is killed because it looks like a hang

$cmd = ConstructCommand $testCase
Write-Host "Command to run: $cmd"
# Write-Host "Command to run: $cmd"
Copy link
Copy Markdown
Member

@adityapatwardhan adityapatwardhan Jan 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should just delete this line. #Resolved

Copy link
Copy Markdown
Collaborator Author

@JamesWTruher JamesWTruher Jan 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done! #Closed

Get-InstalledModule -Name $ContosoServer -AllVersions -ErrorAction SilentlyContinue | Uninstall-Module -Force
}


Copy link
Copy Markdown
Member

@adityapatwardhan adityapatwardhan Jan 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra line. #Resolved

Copy link
Copy Markdown
Collaborator Author

@JamesWTruher JamesWTruher Jan 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleted #Closed

Comment thread build.psm1
log ("Name='{0}', Destination='{1}', Repository='{2}'" -f ($Name -join ','), $Destination, $RepositoryName)

# do not output progress
$ProgressPreference = "SilentlyContinue"
Copy link
Copy Markdown
Member

@adityapatwardhan adityapatwardhan Jan 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to reset this back to previous setting when we complete the script? #ByDesign

Copy link
Copy Markdown
Collaborator Author

@JamesWTruher JamesWTruher Jan 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope - test tests are executed in a new scope, so it gets set back when the script exits #Closed

Remove extraneous extra line in PowerShellGet.Tests.ps1
Copy link
Copy Markdown
Contributor

@lzybkr lzybkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Help me understand the timeout on the parser tests - the test will still fail, but just fail faster? Or something else?

I wonder how hard it would be to get a core dump when we see the timeout - upload that as an artifact, and debug later.

}
catch
{
write-verbose -verbose "caught"
Copy link
Copy Markdown
Contributor

@lzybkr lzybkr Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a left-over debugging message?
caught isn't a very useful message by itself, and I don't see any other Write-Verbose calls in this function to give it context. #Resolved

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed yes - pulled


In reply to: 94878328 [](ancestors = 94878328)

{
It "error should happen at parse time, not at runtime" -Skip {}
$errors = Get-RuntimeError -Src $src
$errors = Get-RuntimeError -Src $src -Timeout 5
Copy link
Copy Markdown
Contributor

@lzybkr lzybkr Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd feel a bit better if we waited a bit longer, maybe 10 seconds - in case we're on a really slow VM or we add a test that is somewhat longish (like 1s or so). #Resolved

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was SWAG for sure, I don't have trouble with having this be 10


In reply to: 94878763 [](ancestors = 94878763)

Comment thread build.psm1 Outdated
logerror "TEST FAILURES"
foreach ( $testfail in $x.SelectNodes('.//test-case[@result = "Failure"]'))
# switch between methods, SelectNode is not available on dotnet core
if ( "System.Xml.XmlDocumentXPathExtensions" -as "Type" ) {
Copy link
Copy Markdown
Contributor

@lzybkr lzybkr Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general rule, you should prefer -as [Type] instead of -as "Type" because the former resolves the type just once while the later resolves it every time it is executed. #Resolved

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed


In reply to: 94879724 [](ancestors = 94879724)

@PowerShellTeam PowerShellTeam added the Review - Needed The PR is being reviewed label Jan 6, 2017

$script:counterSamples = $null
if ($maxSamples)
if ($maxSamples -and $IsWindows)
Copy link
Copy Markdown
Collaborator

@iSazonov iSazonov Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems Get-Counter don't work on IoT.
SetScriptVars is only called from BeforeAll blocks so we can mask it there by SkipCounterTests. #Resolved

}

if ($export)
if ($export -and $IsWindows)
Copy link
Copy Markdown
Collaborator

@iSazonov iSazonov Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same about IoT. #Resolved

$ps.AddScript($src) > $null
$ar = $ps.BeginInvoke()
# give it 250 milliseconds to complete
start-sleep -mill 250
Copy link
Copy Markdown
Collaborator

@iSazonov iSazonov Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parser tests is very fast (some ms). This change will slow them down in many times.
And how will it help us to find the root of the problem? Travis already has a global monitoring of a job duration and automatically kills a hung job. Maybe we ask them to take a dump in this moment? #WontFix

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it will slow down some tests, but I reckoned that it was better to slow down the test than to invalidate an entire run - that's what I'm trying to avoid. When running a full test pass I saw nearly 100% test runs killed for lack of activity. Each one of those runs were hung in the language tests.


In reply to: 94967709 [](ancestors = 94967709)

Copy link
Copy Markdown
Collaborator

@iSazonov iSazonov Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned that we conceal the problem. Is there a way to reproduce this problem except to get a hung accidentally? I believe that we in the allowed time could identify the problem and fix it by analyzing a dump without slow down tests. #WontFix

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while I agree that we need to determine root cause, i disagree about where. Catching this in CI has been extremely difficult and collecting dumps in Travis is really something I would rather avoid (if it's even possible).
we need to track down this issue, but I don't believe CI is the appropriate place


In reply to: 94993680 [](ancestors = 94993680)

Copy link
Copy Markdown
Member

@TravisEz13 TravisEz13 Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be an issue that needs investigation. Please open an issue for this. #Resolved

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an issue has already been created for the hanging behavior. #2748


In reply to: 95024875 [](ancestors = 95024875)

Alter timeout to 10 seconds to be improve chances of not timing out for runtime parser checks
improve logic for counter tests to also skip for IoT
$ShouldRun = $false
}

if ( $ShouldRun )
Copy link
Copy Markdown
Collaborator

@iSazonov iSazonov Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ShouldRun looks like duplication of SkipCounterTests #Resolved

Copy link
Copy Markdown
Member

@TravisEz13 TravisEz13 Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, This is duplicate of SkipCounterTests #Resolved

Copy link
Copy Markdown
Member

@TravisEz13 TravisEz13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge. I expect the comment to be addressed.

$ps.AddScript($src) > $null
$ar = $ps.BeginInvoke()
# give it 250 milliseconds to complete
start-sleep -mill 250
Copy link
Copy Markdown
Member

@TravisEz13 TravisEz13 Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be an issue that needs investigation. Please open an issue for this. #Resolved

$ShouldRun = $false
}

if ( $ShouldRun )
Copy link
Copy Markdown
Member

@TravisEz13 TravisEz13 Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, This is duplicate of SkipCounterTests #Resolved

function SkipCounterTests
{
if ([System.Management.Automation.Platform]::IsLinux -or
[System.Management.Automation.Platform]::IsOSX)
Copy link
Copy Markdown
Contributor

@Francisco-Gamino Francisco-Gamino Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a single function with all the logic to determine whether the tests should run. #Resolved

@JamesWTruher
Copy link
Copy Markdown
Collaborator Author

JamesWTruher commented Jan 6, 2017

@lzybkr I don't seem to be able to follow up on your question about the timeout in place.

Currently, when one of the parser tests hang, the entire test run is killed after about 10 minutes of inactivity. The timeout code enables us to fail a single test and have the rest of the run continue. I agree that capturing a dump at that point would be fantastic, but I don't know how to do that. Can you provide instructions? I'll be very happy to add it.
#Resolved

@JamesWTruher
Copy link
Copy Markdown
Collaborator Author

JamesWTruher commented Jan 6, 2017

@TravisEz13 an issue has already been created for the hanging behavior. #2748 #Resolved

@lzybkr
Copy link
Copy Markdown
Contributor

lzybkr commented Jan 7, 2017

I don't know, but a search of 'create linux core dump' suggests that gcore or kill may be capable of creating the core dump.

@iSazonov
Copy link
Copy Markdown
Collaborator

iSazonov commented Jan 7, 2017

It seems it is impossible that hung process killed himself and unloaded a dump.
As I mentioned above the best way to get the dump is to ask Travis team add the new option to travis.yml They already control session timeout and may at this point easily take a dump and upload it in the specified (in travis.yml) location. This can be useful in the future to solve other problems.

Possible workarround http://jsteemann.github.io/blog/2014/10/30/getting-core-dumps-of-failed-travisci-builds/

@lzybkr
Copy link
Copy Markdown
Contributor

lzybkr commented Jan 7, 2017

We don't know if the process is hung, but clearly a thread is hung, so Jim's change may help in creating the dump. But I agree the CI system should provide a way to debug of they don't already. Appveyor definitely does provide rdp access, maybe Travis already has something similar.

@iSazonov
Copy link
Copy Markdown
Collaborator

iSazonov commented Jan 7, 2017

I found nothing in Travis help but I discovered long discussion about travis dumps travis-ci/travis-ci#3754

@JamesWTruher
Copy link
Copy Markdown
Collaborator Author

will that provide you with what you need, given managed code? As you're the author/owner of these particular tests, shall I just open an issue for you to track?


In reply to: 271043086 [](ancestors = 271043086)

@JamesWTruher
Copy link
Copy Markdown
Collaborator Author

@Francisco-Gamino, @TravisEz13 my last commit should have addressed your request

@TravisEz13 TravisEz13 merged commit c97ca77 into PowerShell:master Jan 10, 2017
@JamesWTruher JamesWTruher deleted the jameswtruher/travisdailybuild branch January 13, 2017 22:30
@lzybkr
Copy link
Copy Markdown
Contributor

lzybkr commented Jan 25, 2017

@JamesWTruher

If I'm reading the logs correctly, the async changes didn't help at all, and most likely are a net negative right now. See these logs:

https://travis-ci.org/PowerShell/PowerShell/jobs/195355364

Notice 10 minutes with no output - so we failed to kill the async test and did not get further results.
Also notice the error tests take ~250ms when they should really take about ~1ms or less.

@iSazonov
Copy link
Copy Markdown
Collaborator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants