Skip to content

fix: make readCancelConfirmation respect context cancellation#359

Merged
dlevy-msft-sql merged 11 commits into
microsoft:mainfrom
dlevy-msft-sql:fix/cancel-confirmation-context
Apr 25, 2026
Merged

fix: make readCancelConfirmation respect context cancellation#359
dlevy-msft-sql merged 11 commits into
microsoft:mainfrom
dlevy-msft-sql:fix/cancel-confirmation-context

Conversation

@dlevy-msft-sql
Copy link
Copy Markdown

@dlevy-msft-sql dlevy-msft-sql commented Apr 17, 2026

Summary

Make readCancelConfirmation respect context cancellation so QueryContext callers are not blocked indefinitely when the network is down.

Attribution

This fix implements the approach proposed by @tiwo in #160 (with @vasily-kirichenko's correction to use value type assertion doneStruct instead of pointer *doneStruct).

Problem

When a query is cancelled via context (e.g. QueryContext with a timeout), the driver sends a cancellation request to the server and then calls readCancelConfirmation to wait for the server's acknowledgement. This function uses for range tokChan which blocks until:

  1. A doneStruct with doneAttn is received (success), or
  2. The channel is closed

When the network is down, processSingleResponse blocks on sess.buf reads and never closes tokChan. This causes readCancelConfirmation to block forever, even though the caller's context has already been cancelled.

Users reported that QueryContext with a deadline would hang permanently instead of returning when the network connection was lost (#160).

Fix

Change readCancelConfirmation to accept a context.Context and return a cancelConfirmationResult enum (received/channelClosed/unavailable) instead of bool. It uses a double-select pattern: first non-blocking to prioritize buffered tokens, then blocking on both ctx.Done() and tokChan.

In nextToken(), a drainCtx with a 5-second timeout is created (since t.ctx is already cancelled at that point) and passed to readCancelConfirmation for the first drain attempt. When the first drain returns channelClosed (current response had no DONE_ATTN), a second response reader is started with t.ctx (already cancelled) so that processSingleResponse's ReturnMessageEnqueue calls return immediately via ctx.Done() — preventing goroutine stalls on a full message queue. A fresh drainCtx2 is created for the second readCancelConfirmation call so the first attempt's elapsed time does not consume the second attempt's timeout budget. When either drain returns unavailable, a ServerError is returned with diagnostic context (phase and underlying cause).

Tests

Added 9 tests:

Unit tests for readCancelConfirmation (token_unit_test.go):

  • TestReadCancelConfirmation_Success - confirms doneAttn token returns received
  • TestReadCancelConfirmation_ChannelClosedWithoutConfirmation - closed channel returns channelClosed
  • TestReadCancelConfirmation_ContextCancelled - cancelled context returns unavailable
  • TestReadCancelConfirmation_SkipsNonAttnTokens - non-done and done-without-attn tokens are skipped
  • TestReadCancelConfirmation_PrioritizesBufferedTokenOverCancelledContext - buffered attn token wins over cancelled ctx
  • TestReadCancelConfirmation_ErrorTokenIsUnavailable - error on channel returns unavailable

Direct tests for nextToken cancel drain (token_test.go):

  • TestNextToken_CancelDrainCurrentResponseConfirmationReturnsContextError - current-response doneAttn returns context.Canceled without starting a second response
  • TestNextToken_CancelDrainUnavailableDoesNotStartSecondResponse - error token short-circuits without spawning a second processSingleResponse
  • TestNextToken_CancelDrainClosedChannelStartsSecondResponse - closed channel proceeds to the second response and finds doneAttn confirmation

Goroutine Lifecycle

When cancel-drain times out, processSingleResponse may still be running (blocked on a network read). The ServerError return marks the connection as bad. When the connection pool closes the underlying net.Conn, the blocked read unblocks with an error and the goroutine exits. This is the standard cleanup mechanism for bad connections.

Risk

Low. The only behavioral change is that cancellation drain is now bounded by a 5-second timeout per attempt. The function still processes all tokens identically when the context is not cancelled. The connection is marked bad (ServerError) if drain times out, which causes cleanup of any outstanding goroutines when the connection is closed.

Fixes #160

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 17, 2026

Codecov Report

❌ Patch coverage is 79.26829% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.60%. Comparing base (c10fa99) to head (12ba6e0).

Files with missing lines Patch % Lines
token.go 79.26% 14 Missing and 3 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             main     #359       +/-   ##
===========================================
+ Coverage   80.66%   96.60%   +15.94%     
===========================================
  Files          35       92       +57     
  Lines        6842    74351    +67509     
===========================================
+ Hits         5519    71829    +66310     
- Misses       1055     2187     +1132     
- Partials      268      335       +67     
Flag Coverage Δ
unittests 96.53% <80.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
tds.go 70.01% <ø> (+0.32%) ⬆️
token.go 69.83% <79.26%> (+1.12%) ⬆️

... and 57 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dlevy-msft-sql dlevy-msft-sql added this to the v1.11.0 milestone Apr 17, 2026
@dlevy-msft-sql dlevy-msft-sql force-pushed the fix/cancel-confirmation-context branch from faaaeb8 to f96bf9e Compare April 22, 2026 22:13
@shueybubbles shueybubbles requested a review from Copilot April 24, 2026 13:50
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the token-processing cancellation drain path so waiting for server cancellation acknowledgement (doneAttn) can stop when a context is done, preventing QueryContext from hanging indefinitely when the network is unresponsive.

Changes:

  • Add a context-aware readCancelConfirmation(ctx, tokChan) loop that can exit on ctx.Done().
  • Introduce a dedicated drain context (with cancelDrainTimeout) to wait for cancellation acknowledgement after the caller context is already cancelled.
  • Add unit tests covering success, channel close, context cancellation, skipping non-done tokens, and timeout behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
token.go Adds drain timeout constant, threads a drain context through the cancellation-ack wait, and makes readCancelConfirmation context-aware.
token_unit_test.go Adds unit tests for the new context-aware readCancelConfirmation behavior.

Comment thread token.go Outdated
Comment thread token.go Outdated
Comment thread token_unit_test.go
@dlevy-msft-sql dlevy-msft-sql force-pushed the fix/cancel-confirmation-context branch from f96bf9e to 3db1a22 Compare April 24, 2026 17:51
@dlevy-msft-sql dlevy-msft-sql modified the milestones: v1.11.0, v1.10.0 Apr 24, 2026
@dlevy-msft-sql dlevy-msft-sql force-pushed the fix/cancel-confirmation-context branch 3 times, most recently from 37ec17d to 7b4c6bd Compare April 24, 2026 20:00
@dlevy-msft-sql dlevy-msft-sql requested a review from Copilot April 24, 2026 20:45
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread token.go Outdated
@dlevy-msft-sql dlevy-msft-sql force-pushed the fix/cancel-confirmation-context branch 3 times, most recently from 6ebd751 to 4c2467f Compare April 24, 2026 21:11
…onse goroutines

Adds a readDone channel to tdsSession that serializes access to the
shared tdsBuffer. Each processSingleResponse goroutine waits for the
previous one to finish reading before starting. Extracts a
startResponseReader helper to deduplicate the pattern.
@dlevy-msft-sql dlevy-msft-sql force-pushed the fix/cancel-confirmation-context branch 3 times, most recently from f99c927 to a7998ad Compare April 25, 2026 00:42
@dlevy-msft-sql dlevy-msft-sql requested a review from Copilot April 25, 2026 00:45
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Comment thread token.go
Comment thread token.go
Comment thread tds.go Outdated
Comment thread queries_go19_test.go Outdated
- Accept context.Context parameter in readCancelConfirmation
- Create dedicated drainCtx with 5-second timeout for cancel drain
- Extract cancelDrainTimeout constant and cancelConfirmationResult enum
- Distinguish channel-closed vs error/timeout to skip second response
- Add 8 tests: 6 unit tests for readCancelConfirmation, 2 integration
  tests for nextToken cancel-drain paths
@dlevy-msft-sql dlevy-msft-sql force-pushed the fix/cancel-confirmation-context branch from a7998ad to 73feec8 Compare April 25, 2026 00:58
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread token.go Outdated
Comment thread token.go Outdated
…readCancelConfirmation

- Use fmt.Errorf with %w instead of %s to preserve the original error
  value in cancelDrainError, enabling errors.Is/As on the inner cause
- Add non-blocking drain of tokChan when ctx.Done fires in the second
  select of readCancelConfirmation, preventing spurious connection
  marking when a DONE_ATTN arrives simultaneously with context cancel
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment thread token.go
…oroutine leak

When nextToken() returns a StreamError from cancel-drain failure,
processSingleResponse may still be sending to tokChan. Without a
receiver, it blocks forever on ch <- ... . Start a background drain
goroutine on both error-return paths so the sender can make progress
and exit once the connection is closed.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment thread token.go Outdated
Simplify readCancelConfirmation to a two-phase select loop:
1. Non-blocking: prefer buffered tokens over ctx.Done
2. Blocking: check ctx.Done alongside tokChan every iteration

The previous three-select structure could loop indefinitely on
the non-blocking path (via continue) without ever reaching the
ctx.Done check, defeating the cancelDrainTimeout bound.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread token.go Outdated
Comment thread token.go Outdated
Replace the two-phase select (non-blocking + blocking) with a single
select that checks both ctx.Done and tokChan every iteration:
- Fixes timeout enforcement: ctx.Done is a candidate on every select,
  not bypassed by a non-blocking continue loop
- Fixes select race: when ctx.Done wins over a simultaneously-ready
  tokChan, a non-blocking drain catches any buffered DONE_ATTN before
  returning unavailable
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment thread token_test.go
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

@dlevy-msft-sql dlevy-msft-sql merged commit 65e137f into microsoft:main Apr 25, 2026
15 checks passed
@dlevy-msft-sql dlevy-msft-sql deleted the fix/cancel-confirmation-context branch April 25, 2026 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

QueryContext blocks when the network is down

4 participants