fix: make readCancelConfirmation respect context cancellation#359
Merged
dlevy-msft-sql merged 11 commits intoApr 25, 2026
Merged
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #359 +/- ##
===========================================
+ Coverage 80.66% 96.60% +15.94%
===========================================
Files 35 92 +57
Lines 6842 74351 +67509
===========================================
+ Hits 5519 71829 +66310
- Misses 1055 2187 +1132
- Partials 268 335 +67
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
faaaeb8 to
f96bf9e
Compare
shueybubbles
approved these changes
Apr 24, 2026
There was a problem hiding this comment.
Pull request overview
This PR updates the token-processing cancellation drain path so waiting for server cancellation acknowledgement (doneAttn) can stop when a context is done, preventing QueryContext from hanging indefinitely when the network is unresponsive.
Changes:
- Add a context-aware
readCancelConfirmation(ctx, tokChan)loop that can exit onctx.Done(). - Introduce a dedicated drain context (with
cancelDrainTimeout) to wait for cancellation acknowledgement after the caller context is already cancelled. - Add unit tests covering success, channel close, context cancellation, skipping non-done tokens, and timeout behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| token.go | Adds drain timeout constant, threads a drain context through the cancellation-ack wait, and makes readCancelConfirmation context-aware. |
| token_unit_test.go | Adds unit tests for the new context-aware readCancelConfirmation behavior. |
f96bf9e to
3db1a22
Compare
37ec17d to
7b4c6bd
Compare
6ebd751 to
4c2467f
Compare
…onse goroutines Adds a readDone channel to tdsSession that serializes access to the shared tdsBuffer. Each processSingleResponse goroutine waits for the previous one to finish reading before starting. Extracts a startResponseReader helper to deduplicate the pattern.
f99c927 to
a7998ad
Compare
- Accept context.Context parameter in readCancelConfirmation - Create dedicated drainCtx with 5-second timeout for cancel drain - Extract cancelDrainTimeout constant and cancelConfirmationResult enum - Distinguish channel-closed vs error/timeout to skip second response - Add 8 tests: 6 unit tests for readCancelConfirmation, 2 integration tests for nextToken cancel-drain paths
a7998ad to
73feec8
Compare
…readCancelConfirmation - Use fmt.Errorf with %w instead of %s to preserve the original error value in cancelDrainError, enabling errors.Is/As on the inner cause - Add non-blocking drain of tokChan when ctx.Done fires in the second select of readCancelConfirmation, preventing spurious connection marking when a DONE_ATTN arrives simultaneously with context cancel
…oroutine leak When nextToken() returns a StreamError from cancel-drain failure, processSingleResponse may still be sending to tokChan. Without a receiver, it blocks forever on ch <- ... . Start a background drain goroutine on both error-return paths so the sender can make progress and exit once the connection is closed.
Simplify readCancelConfirmation to a two-phase select loop: 1. Non-blocking: prefer buffered tokens over ctx.Done 2. Blocking: check ctx.Done alongside tokChan every iteration The previous three-select structure could loop indefinitely on the non-blocking path (via continue) without ever reaching the ctx.Done check, defeating the cancelDrainTimeout bound.
Replace the two-phase select (non-blocking + blocking) with a single select that checks both ctx.Done and tokChan every iteration: - Fixes timeout enforcement: ctx.Done is a candidate on every select, not bypassed by a non-blocking continue loop - Fixes select race: when ctx.Done wins over a simultaneously-ready tokChan, a non-blocking drain catches any buffered DONE_ATTN before returning unavailable
shueybubbles
approved these changes
Apr 25, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Make
readCancelConfirmationrespect context cancellation soQueryContextcallers are not blocked indefinitely when the network is down.Attribution
This fix implements the approach proposed by @tiwo in #160 (with @vasily-kirichenko's correction to use value type assertion
doneStructinstead of pointer*doneStruct).Problem
When a query is cancelled via context (e.g.
QueryContextwith a timeout), the driver sends a cancellation request to the server and then callsreadCancelConfirmationto wait for the server's acknowledgement. This function usesfor range tokChanwhich blocks until:doneStructwithdoneAttnis received (success), orWhen the network is down,
processSingleResponseblocks onsess.bufreads and never closestokChan. This causesreadCancelConfirmationto block forever, even though the caller's context has already been cancelled.Users reported that
QueryContextwith a deadline would hang permanently instead of returning when the network connection was lost (#160).Fix
Change
readCancelConfirmationto accept acontext.Contextand return acancelConfirmationResultenum (received/channelClosed/unavailable) instead ofbool. It uses a double-select pattern: first non-blocking to prioritize buffered tokens, then blocking on bothctx.Done()andtokChan.In
nextToken(), adrainCtxwith a 5-second timeout is created (sincet.ctxis already cancelled at that point) and passed toreadCancelConfirmationfor the first drain attempt. When the first drain returnschannelClosed(current response had no DONE_ATTN), a second response reader is started witht.ctx(already cancelled) so thatprocessSingleResponse'sReturnMessageEnqueuecalls return immediately viactx.Done()— preventing goroutine stalls on a full message queue. A freshdrainCtx2is created for the secondreadCancelConfirmationcall so the first attempt's elapsed time does not consume the second attempt's timeout budget. When either drain returnsunavailable, aServerErroris returned with diagnostic context (phase and underlying cause).Tests
Added 9 tests:
Unit tests for
readCancelConfirmation(token_unit_test.go):TestReadCancelConfirmation_Success- confirmsdoneAttntoken returnsreceivedTestReadCancelConfirmation_ChannelClosedWithoutConfirmation- closed channel returnschannelClosedTestReadCancelConfirmation_ContextCancelled- cancelled context returnsunavailableTestReadCancelConfirmation_SkipsNonAttnTokens- non-done and done-without-attn tokens are skippedTestReadCancelConfirmation_PrioritizesBufferedTokenOverCancelledContext- buffered attn token wins over cancelled ctxTestReadCancelConfirmation_ErrorTokenIsUnavailable- error on channel returnsunavailableDirect tests for
nextTokencancel drain (token_test.go):TestNextToken_CancelDrainCurrentResponseConfirmationReturnsContextError- current-responsedoneAttnreturnscontext.Canceledwithout starting a second responseTestNextToken_CancelDrainUnavailableDoesNotStartSecondResponse- error token short-circuits without spawning a secondprocessSingleResponseTestNextToken_CancelDrainClosedChannelStartsSecondResponse- closed channel proceeds to the second response and findsdoneAttnconfirmationGoroutine Lifecycle
When cancel-drain times out,
processSingleResponsemay still be running (blocked on a network read). TheServerErrorreturn marks the connection as bad. When the connection pool closes the underlyingnet.Conn, the blocked read unblocks with an error and the goroutine exits. This is the standard cleanup mechanism for bad connections.Risk
Low. The only behavioral change is that cancellation drain is now bounded by a 5-second timeout per attempt. The function still processes all tokens identically when the context is not cancelled. The connection is marked bad (
ServerError) if drain times out, which causes cleanup of any outstanding goroutines when the connection is closed.Fixes #160