Skip to content

connectors#3

Closed
kirtimanmishrazipstack wants to merge 3 commits into
mainfrom
connector-issues
Closed

connectors#3
kirtimanmishrazipstack wants to merge 3 commits into
mainfrom
connector-issues

Conversation

@kirtimanmishrazipstack
Copy link
Copy Markdown
Contributor

What

...

Why

...

How

...

Relevant Docs

Related Issues or PRs

Dependencies Versions / Env Variables

Notes on Testing

...

Screenshots

...

Checklist

I have read and understood the Contribution Guidelines.

ritwik-g pushed a commit that referenced this pull request May 5, 2026
…o CORS (#1938)

* UN-3439 [FIX] Accept wildcard subdomain origins in SocketIO and Django CORS

Production socket connections were failing for `*.env.us-central.unstract.com`
because python-socketio does exact-string comparison on `cors_allowed_origins`,
so a literal `*` pattern silently rejected every real subdomain.

- Add `CORS_ALLOWED_ORIGIN_REGEXES` derived from `WEB_APP_ORIGIN_URL_WITH_WILD_CARD`.
- Wire SocketIO via `_RegexOrigin` whose `__eq__` does the regex match — single
  list entry covers all wildcard subdomains, no library subclass needed.
- Normalize `WEB_APP_ORIGIN_URL` through `urlparse` so trailing slashes / paths
  in env are stripped (also fixes the `…com//oauth-status/` double-slash).
- Add startup guard for malformed env values.

Resolves item #1 of UN-3439. Items #2/#3 (decoupling indexing from Socket.io,
fallback) are owned separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Address PR review: canonical origin, fullmatch, unhashable RegexOrigin, tests

Addresses five review comments on #1938:

1. coderabbitai (Major) — RFC 6454 canonicalization. Browsers serialize
   `Origin` headers with a lowercase host and no explicit default ports;
   `parsed_url.netloc` preserved both, so `https://APP.EXAMPLE.COM:443`
   would silently fail to match the browser's `https://app.example.com`.
   Switch to `parsed_url.hostname` + drop default ports, and reject
   non-http(s) schemes at startup.

2. greptile (P2) — `re.fullmatch` instead of `re.match`. With `re.match`
   plus `$`, a candidate ending in `\n` matches because `$` is allowed
   before an optional trailing newline. `fullmatch` removes the ambiguity.

3. self — `_RegexOrigin.__hash__` violated `a == b ⇒ hash(a) == hash(b)`
   (one fixed pattern hash vs. many matching strings). Today this is
   masked because python-socketio uses linear `__eq__` on a list, but if
   the allow-list is ever wrapped in a set, every legitimate subdomain
   would silently be rejected — exactly the failure mode UN-3439 closes.
   Make instances unhashable so the contract can't be broken.

4. self — No regression tests. Add `backend/utils/tests/test_cors_origin.py`
   (33 cases) covering: regex match/no-match, lookalike spoofing, scheme
   mismatch, trailing-newline rejection, non-string equality protocol,
   unhashability, ReDoS bounds, URL normalization (case, default ports,
   trailing slash, paths, queries), startup-guard rejections (empty,
   no-scheme, non-browser-scheme, no-host), and end-to-end via the same
   `RegexOrigin` path SocketIO uses.

5. self — Over-clever wildcard-to-regex builder. The
   `split('*').join(re.escape, ...)` construction generalised to N
   wildcards but the input has exactly one; replace with a direct rf-string
   that's self-evident on review.

Refactor for testability: extract `RegexOrigin` and `normalize_web_app_origin`
into `backend/utils/cors_origin.py` (Django-free, importable from settings
and tests). Settings now delegates to one helper call; `log_events.py`
imports `RegexOrigin`. No behavioural change beyond what each comment fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Address SonarCloud quality gate

The Sonar quality gate failed with C reliability + 5 security hotspots, all
on the new test file:

- S905 (Bug, Major) — `{ro}` flagged as no-side-effect statement (Sonar
  doesn't see the implicit `__hash__` call). Drove the C reliability rating.
  Fix: use `len({ro})` so the side effect is via an explicit function call;
  test still asserts the same `TypeError`.
- S5727 (Code Smell, Critical) — `assert ro != None` is tautological and
  doesn't exercise `__eq__`. Switch to `(ro == None) is False` which directly
  tests that `NotImplemented` falls back to identity-equality.
- S5332 × 5 (Hotspots) — `http://` and `ftp://` literals in test data.
  These are intentional inputs proving the rejection logic. Annotate with
  `# NOSONAR` and an explanatory comment so the hotspots can be marked
  reviewed.

No production code changed; tests still 33/33 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Remove last S5727 code smell — test __eq__ via dunder

Sonar S5727 correctly inferred that ``ro == None`` is statically always
False (NotImplemented falls back to identity), making the assertion look
tautological. The intent is to lock the protocol contract: ``__eq__`` must
return the ``NotImplemented`` sentinel for non-strings. Test that directly
via ``ro.__eq__(None) is NotImplemented`` instead of going through ``==``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Address remaining CodeRabbit nits — port validation, ReDoS bound

Two minor follow-ups from the second CodeRabbit pass:

- `parsed.port` is a property that raises ValueError on malformed/out-of-range
  inputs (e.g. `:abc`, `:99999`). That bypassed our normalized config-error
  message and surfaced as a stack trace. Wrap the access and re-raise with
  the same actionable text. Adds two test cases (`https://example.com:abc`,
  `https://example.com:99999`) to lock the new behaviour.

- The 50ms ReDoS timing bound is too tight for noisy CI runners. Loosen to
  500ms — still orders of magnitude below what catastrophic backtracking
  would produce.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kirtimanmishrazipstack added a commit that referenced this pull request May 7, 2026
* UN-3439 [FIX] Accept wildcard subdomain origins in SocketIO and Django CORS (#1938)

* UN-3439 [FIX] Accept wildcard subdomain origins in SocketIO and Django CORS

Production socket connections were failing for `*.env.us-central.unstract.com`
because python-socketio does exact-string comparison on `cors_allowed_origins`,
so a literal `*` pattern silently rejected every real subdomain.

- Add `CORS_ALLOWED_ORIGIN_REGEXES` derived from `WEB_APP_ORIGIN_URL_WITH_WILD_CARD`.
- Wire SocketIO via `_RegexOrigin` whose `__eq__` does the regex match — single
  list entry covers all wildcard subdomains, no library subclass needed.
- Normalize `WEB_APP_ORIGIN_URL` through `urlparse` so trailing slashes / paths
  in env are stripped (also fixes the `…com//oauth-status/` double-slash).
- Add startup guard for malformed env values.

Resolves item #1 of UN-3439. Items #2/#3 (decoupling indexing from Socket.io,
fallback) are owned separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Address PR review: canonical origin, fullmatch, unhashable RegexOrigin, tests

Addresses five review comments on #1938:

1. coderabbitai (Major) — RFC 6454 canonicalization. Browsers serialize
   `Origin` headers with a lowercase host and no explicit default ports;
   `parsed_url.netloc` preserved both, so `https://APP.EXAMPLE.COM:443`
   would silently fail to match the browser's `https://app.example.com`.
   Switch to `parsed_url.hostname` + drop default ports, and reject
   non-http(s) schemes at startup.

2. greptile (P2) — `re.fullmatch` instead of `re.match`. With `re.match`
   plus `$`, a candidate ending in `\n` matches because `$` is allowed
   before an optional trailing newline. `fullmatch` removes the ambiguity.

3. self — `_RegexOrigin.__hash__` violated `a == b ⇒ hash(a) == hash(b)`
   (one fixed pattern hash vs. many matching strings). Today this is
   masked because python-socketio uses linear `__eq__` on a list, but if
   the allow-list is ever wrapped in a set, every legitimate subdomain
   would silently be rejected — exactly the failure mode UN-3439 closes.
   Make instances unhashable so the contract can't be broken.

4. self — No regression tests. Add `backend/utils/tests/test_cors_origin.py`
   (33 cases) covering: regex match/no-match, lookalike spoofing, scheme
   mismatch, trailing-newline rejection, non-string equality protocol,
   unhashability, ReDoS bounds, URL normalization (case, default ports,
   trailing slash, paths, queries), startup-guard rejections (empty,
   no-scheme, non-browser-scheme, no-host), and end-to-end via the same
   `RegexOrigin` path SocketIO uses.

5. self — Over-clever wildcard-to-regex builder. The
   `split('*').join(re.escape, ...)` construction generalised to N
   wildcards but the input has exactly one; replace with a direct rf-string
   that's self-evident on review.

Refactor for testability: extract `RegexOrigin` and `normalize_web_app_origin`
into `backend/utils/cors_origin.py` (Django-free, importable from settings
and tests). Settings now delegates to one helper call; `log_events.py`
imports `RegexOrigin`. No behavioural change beyond what each comment fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Address SonarCloud quality gate

The Sonar quality gate failed with C reliability + 5 security hotspots, all
on the new test file:

- S905 (Bug, Major) — `{ro}` flagged as no-side-effect statement (Sonar
  doesn't see the implicit `__hash__` call). Drove the C reliability rating.
  Fix: use `len({ro})` so the side effect is via an explicit function call;
  test still asserts the same `TypeError`.
- S5727 (Code Smell, Critical) — `assert ro != None` is tautological and
  doesn't exercise `__eq__`. Switch to `(ro == None) is False` which directly
  tests that `NotImplemented` falls back to identity-equality.
- S5332 × 5 (Hotspots) — `http://` and `ftp://` literals in test data.
  These are intentional inputs proving the rejection logic. Annotate with
  `# NOSONAR` and an explanatory comment so the hotspots can be marked
  reviewed.

No production code changed; tests still 33/33 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Remove last S5727 code smell — test __eq__ via dunder

Sonar S5727 correctly inferred that ``ro == None`` is statically always
False (NotImplemented falls back to identity), making the assertion look
tautological. The intent is to lock the protocol contract: ``__eq__`` must
return the ``NotImplemented`` sentinel for non-strings. Test that directly
via ``ro.__eq__(None) is NotImplemented`` instead of going through ``==``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Address remaining CodeRabbit nits — port validation, ReDoS bound

Two minor follow-ups from the second CodeRabbit pass:

- `parsed.port` is a property that raises ValueError on malformed/out-of-range
  inputs (e.g. `:abc`, `:99999`). That bypassed our normalized config-error
  message and surfaced as a stack trace. Wrap the access and re-raise with
  the same actionable text. Adds two test cases (`https://example.com:abc`,
  `https://example.com:99999`) to lock the new behaviour.

- The 50ms ReDoS timing bound is too tight for noisy CI runners. Loosen to
  500ms — still orders of magnitude below what catastrophic backtracking
  would produce.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ReverseMerge: V0.161.4 hotfix (#1943)

* Change csp to report only

* [HOTFIX] Bool-parse ENABLE_HIGHLIGHT_API_DEPLOYMENT env var (v0.161.4) (#1939)

[HOTFIX] Bool-parse ENABLE_HIGHLIGHT_API_DEPLOYMENT env var (#1937)

[FIX] Bool-parse ENABLE_HIGHLIGHT_API_DEPLOYMENT env var

os.environ.get returns the raw string when the variable is set, so
ENABLE_HIGHLIGHT_API_DEPLOYMENT="False" was truthy in Python (any
non-empty string is truthy). Wrap in CommonUtils.str_to_bool so
"False" / "false" / "0" actually evaluate to False.

The setting is consumed by the cloud configuration plugin's spec
default (ConfigSpec.default in plugins/configuration/cloud_config.py)
on cloud and on-prem builds. With this fix, an admin who explicitly
sets the env var to a falsy string sees highlight data stripped as
expected.

Co-authored-by: vishnuszipstack <117254672+vishnuszipstack@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Deepak K <89829542+Deepak-Kesavan@users.noreply.github.com>
Co-authored-by: vishnuszipstack <117254672+vishnuszipstack@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3448 [FIX] Remove vestigial `uv pip install` line in uv-lock-automation workflow (#1941)

* UN-3448 [FIX] Add --system flag to uv pip install in uv-lock-automation workflow

Modern uv requires uv pip install to run inside a virtual environment OR
with the explicit --system flag. The workflow currently has neither, so
it errors out:

  error: No virtual environment found for Python 3.12.9; run `uv venv`
  to create an environment, or pass `--system` to install into a
  non-virtual environment

This breaks every PR that touches a pyproject.toml (the workflow's
paths filter triggers on those). Last successful run was 2026-04-01,
before a behaviour change in uv or astral-sh/setup-uv@v7.

The --system flag is exactly what the error message suggests and is
correct here — we install pip into the runner's system Python; the
downstream uv-lock.sh script creates its own venvs as needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3448 [FIX] Remove vestigial `uv pip install` line per review

Per @jaseemjaskp's review: the pre-step `uv pip install ... pip` does
nothing useful for this workflow. The downstream uv-lock.sh script
uses uv sync at line 74, which manages its own venvs internally and
never invokes pip directly:

  $ grep -rn 'pip' docker/scripts/uv-lock-gen/
  docker/scripts/uv-lock-gen/uv-lock.sh:2:set -o pipefail

Only match is pipefail (shell option), no real pip references.

Removing the line entirely is cleaner than papering over with --system.
The line was likely copy-pasted from a sibling workflow that legitimately
needed pip in the system Python.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ReverseMerge: V0.163.2 hotfix (#1946)

* [HOTFIX] Use importlib.util.find_spec for pluggable worker discovery (#1918)

* [FIX] Use importlib.util.find_spec for pluggable worker discovery

_verify_pluggable_worker_exists() previously checked for the literal file
`pluggable_worker/<name>/worker.py` on disk, which breaks when the plugin
has been compiled to a .so (Nuitka, Cython, or any C extension) — the
module is perfectly importable but the pre-check rejects it because only
the .py extension is considered.

Replace the filesystem check with importlib.util.find_spec(), which is
Python's standard way to ask "is this module resolvable by the import
system?". It honors every registered finder — source .py, compiled .so,
bytecode .pyc, namespace packages, zipimports — so the function now
matches what its docstring claims: verifying the module can be loaded,
not that a specific file extension is present.

Behavior is preserved for existing deployments:
- Images with no `pluggable_worker/<name>/` subpackage → find_spec
  raises ModuleNotFoundError (ImportError subclass) → returns False.
- Images with source .py → find_spec resolves the .py → returns True.
- Images with compiled .so → find_spec resolves the .so → returns True.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [FIX] Handle ValueError from find_spec in pluggable worker verification

Greptile-flagged edge case: importlib.util.find_spec() can raise
ValueError (not just ImportError) when sys.modules has a partially
initialised module entry with __spec__ = None from a prior failed import.
Broaden the except to catch both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [FIX] Resolve api-deployment worker directory from enum import path

worker.py:452 did worker_type.value.replace("-", "_") to derive the
on-disk dir name. All WorkerType enum values already use underscores,
so the replace was a no-op; for API_DEPLOYMENT whose dir is
"api-deployment" (hyphen), it resolved to "api_deployment" and the
os.path.exists() check failed. Boot then logged a spurious
"❌ Worker directory not found: /app/api_deployment" at ERROR level.

The task registration path (builder + celery autodiscover via
to_import_path) is unaffected, so this was purely log noise — but
noise at ERROR level that masks real failures in log scans.

Fix: derive the directory from the authoritative to_import_path()
which already handles the hyphen case (api_deployment -> api-deployment).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [HOTFIX] Add IAM Role / Instance Profile auth mode to AWS Bedrock adapter (#1944)

* [FEAT] Allow Bedrock to fall through to boto3's default credential chain

Match the S3/MinIO connector pattern: when AWS access keys are left blank
on the Bedrock LLM and embedding adapter forms, drop them from the kwargs
dict so boto3's default credential chain handles authentication. This
unlocks IAM role / instance profile / IRSA / AWS Profile scenarios on
hosts that already have ambient AWS credentials (e.g. EKS workers with
IRSA, EC2 with an instance profile).

- llm1/static/bedrock.json: clarify access-key descriptions to mention
  IRSA and instance profile (already non-required at v0.163.2 base).
- embedding1/static/bedrock.json: drop aws_access_key_id and
  aws_secret_access_key from top-level required; same description fix;
  expose aws_profile_name for parity with the LLM form.
- base1.py: AWSBedrockLLMParameters and AWSBedrockEmbeddingParameters
  now strip empty access-key values from the validated kwargs before
  returning, so empty strings don't override boto3's default chain.
  AWSBedrockEmbeddingParameters fields gain explicit None defaults
  and an aws_profile_name field.

Backward-compatible: existing adapters with access keys filled in
continue to work unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [FEAT] Add Authentication Type selector to Bedrock adapter form

Add an explicit `auth_type` selector with two options, making the auth
choice clear to users:

- "Access Keys" (default): existing flow, keys required
- "IAM Role / Instance Profile (on-prem AWS only)": no fields; relies on
  boto3's default credential chain (IRSA on EKS, task role on ECS,
  instance profile on EC2). Description on the selector explicitly notes
  this option is only for AWS-hosted Unstract deployments.

The form-only auth_type field is stripped before LiteLLM validation in
both AWSBedrockLLMParameters.validate() and AWSBedrockEmbeddingParameters.
validate(). Empty access keys continue to be stripped so boto3 falls
through to the default chain even when the access_keys arm is selected
without values (matches the S3/MinIO connector pattern).

Backward-compatible: legacy adapters without auth_type behave as
"Access Keys" mode (the default), and existing keys are forwarded
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [REVIEW] Address Bedrock auth_type review feedback

Fixes the P0/P1 issues raised by greptile-apps and jaseemjaskp on
PR #1944.

Behaviour fixes:
- Stale-key leak in IAM Role mode: switching an existing adapter from
  Access Keys to IAM Role would carry truthy stored access keys through
  the strip-empty-only loop, so boto3 silently authenticated with the
  old long-lived credentials instead of falling through to the host's
  IRSA / instance-profile identity. Both LLM and embedding paths were
  affected.
- Silent acceptance of unknown auth_type: a typo (e.g. "access_key") or
  a malformed payload from a non-UI client passed through the dict
  comprehension untouched, with no enum guard.
- Cross-field validation gap: explicit Access Keys mode with blank or
  whitespace-only values silently fell through to the default
  credential chain instead of surfacing the misconfiguration.

Implementation:
- Add a module-level _resolve_bedrock_aws_credentials helper used by
  both AWSBedrockLLMParameters.validate() and AWSBedrock
  EmbeddingParameters.validate(), so the auth-type contract is
  expressed once.
  - Validates auth_type against an allowlist (None | "access_keys" |
    "iam_role"); raises ValueError on anything else.
  - iam_role: unconditionally drops aws_access_key_id and
    aws_secret_access_key.
  - access_keys (explicit): requires non-blank values; raises ValueError
    if either is empty or whitespace-only.
  - Legacy (auth_type absent): retains the lenient strip behaviour so
    pre-PR adapter configurations continue to deserialise unchanged.
- Restore aws_region_name as required (no `= None` default) on
  AWSBedrockEmbeddingParameters; only credentials may legitimately be
  absent.
- Drop the orphan aws_profile_name field from
  embedding1/static/bedrock.json: it was added for parity with the LLM
  form but lives outside the auth_type oneOf and contradicts the
  selector's "no further input" semantics. The LLM form already had
  aws_profile_name pre-PR and is left alone for backwards compatibility.

Tests:
- New tests/test_bedrock_adapter.py covers 15 cases across LLM and
  embedding adapters: legacy-no-auth-type, explicit access_keys with
  valid/blank/whitespace keys, iam_role with stale/no keys, unknown
  auth_type rejection, cross-field validation, and preservation of
  unrelated params (model_id, aws_profile_name, region, thinking).

Skipped (P2 nice-to-have):
- Comment-scope clarification, MinIO reference rewording,
  validate-mutates-caller'\''s-dict, and the LLM form description nit
  about aws_profile_name visibility. These don'\''t change behaviour
  and can be addressed in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

---------

Co-authored-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Athul <89829560+athul-rs@users.noreply.github.com>

* batch notification

---------

Co-authored-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>
Co-authored-by: Deepak K <89829542+Deepak-Kesavan@users.noreply.github.com>
Co-authored-by: vishnuszipstack <117254672+vishnuszipstack@users.noreply.github.com>
Co-authored-by: Praveen Kumar <praveen@zipstack.com>
Co-authored-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Athul <89829560+athul-rs@users.noreply.github.com>
kirtimanmishrazipstack added a commit that referenced this pull request Jun 1, 2026
- NotificationBuffer.dispatch_attempts + NOTIFICATION_MAX_DISPATCH_ATTEMPTS:
  _dispatch_group dead-letters rows past the cap and increments on each SENDING
  claim, bounding the reaper reclaim loop so a lost terminal callback can't
  redeliver forever (self-review #3).
- Delete the orphaned synchronous notification_v2/provider/ cluster — zero
  callers after the batched dispatch_notifications path replaced it (#2).
- Fold dispatch_attempts into 0002_notification_batching; refresh lifecycle
  db_comments + BufferStatus docstring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
muhammad-ali-e added a commit that referenced this pull request Jun 8, 2026
…test-infra fix)

Seven of Vishnu's findings against ``524ae9184`` addressed. Three
flagged IMPORTANT (silent-failure + missing test coverage), four
SUGGESTION (drift hazard, comment/behaviour mismatch, hollow canary,
duplicate-test cleanup). The other three (hand-built fixture,
SDK1 ``dict[str, Any]`` boundary, ``as_header`` TypedDict refactor)
deferred — see PR thread acknowledgments.

* **#11a (SUGGESTION, drift)** — ``workers/queue_backend/dispatch.py``
  still hand-built the fairness header instead of calling
  ``fairness.as_header()``. Wire-format encoding now has a single
  source so the two sites can't drift. ``FAIRNESS_HEADER_NAME``
  import dropped (no longer used here).

* **#9 (SUGGESTION, comment/behaviour mismatch)** —
  ``ExecutionDispatcher._build_send_kwargs`` was forwarding ``{}``
  as-is despite the docstring calling it "likely indicates a
  producer-side build bug". Changed ``if headers is not None`` ->
  ``if headers``: falsy is dropped so the on-wire shape matches the
  no-headers baseline and a miswired producer surfaces immediately.
  Docstring rewritten to describe the new contract.

* **#3 (SUGGESTION, wording)** — ``canary_helpers`` docstring
  overstated adoption. Softened to "intended single home" and
  named the two characterisation walkers still inlining the logic
  as a known follow-up.

* **#5 + #6 (IMPORTANT cleanup + SUGGESTION fixture)** — six
  structurally-identical ``*_forwards_headers`` /
  ``*_omits_headers_when_none`` tests collapsed to three
  parametrized methods over the three dispatch entry points.
  Fixture now uses ``FairnessKey(...).as_header()`` rather than
  hand-built dicts, so the wire shape exercised matches what real
  producers emit (including ``pipeline_priority``). Net: ~60 LOC
  removed, per-method failure granularity preserved via parametrize
  IDs. Also added empty-dict drop assertions covering #9.

* **#7 (SUGGESTION, missing combined test)** — new
  ``test_dispatch_with_callback_combines_headers_and_callbacks``
  passes ``on_success``, ``on_error``, ``task_id``, and ``headers``
  together and asserts all four land on the same ``send_task``
  call. A key-merge regression in ``_build_send_kwargs`` would
  have slipped through the single-kwarg forwarding tests.

* **#8 (SUGGESTION, hollow canary)** — the
  ``execute_extraction`` dispatch canary only ever asserted the
  empty (passing) case against the live tree. Added a positive-
  detection unit test feeding ``ast.parse`` of a known-bad snippet
  and a blind-spot lock test (constant ref, f-string,
  ``apply_async`` all evade the detector — documenting the scope
  so a future widening intentionally trips the asserts).

* **#4 (IMPORTANT, untested helper)** — ``_fairness_headers`` in
  ``structure_tool_task`` was untested; a regression flipping
  ``NON_API`` -> ``API`` or dropping ``headers=`` at any of the
  three call sites would have stayed green. Added focused unit
  tests in new ``test_structure_tool_task.py`` (wire shape,
  org_id propagation, ``NON_API`` not ``API``) and extended
  ``test_sanity_phase5.TestStructureToolSingleDispatch`` to assert
  ``dispatch.call_args.kwargs["headers"]`` carries the expected
  shape.

* **#2 (IMPORTANT, vacuous-pass)** — ``iter_production_trees``
  warned-and-continued on ``SyntaxError`` but neither canary
  module promoted the warning to error. A botched merge in a
  production file would have dropped silently from the audit set
  and every canary would have passed vacuously over a smaller
  tree. Added ``pytestmark = pytest.mark.filterwarnings(
  "error::UserWarning")`` on both ``test_executor_dispatch`` and
  ``test_fairness_key``, plus a new ``test_canary_helpers.py``
  that unit-tests both the warn-on-broken behaviour and the
  promote-to-error contract the canary modules rely on.

**Bundled test-infra fix (unrelated but unblocks CI):** the
``test_callback_sanity.TestEagerHealthcheckRoundTrip`` tests
selected the healthcheck task via ``endswith(".healthcheck")``
against ``eager_app.tasks``, which is a shared celery global
registry containing ``callback.worker.healthcheck``,
``executor.worker.healthcheck``,
``file_processing.worker.healthcheck`` etc. The bare ``next(...)``
returned whichever was inserted first — non-deterministic across
pytest module-collection orders. Without this fix, the new tests
added in this commit perturb the collection profile enough to
flip the failure rate from ~10% to nearly 100%. Replaced with
exact-name lookup ``name == "callback.worker.healthcheck"``.
Identical fix already landed on the UN-3513 branch (see #2020).

Test count: 31 -> 42 on the UN-3508-touched modules. Full workers
suite: 6 failures pre-existing baseline, unchanged by this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
muhammad-ali-e added a commit that referenced this pull request Jun 8, 2026
* UN-3508 [FEAT] Plumb fairness through ExecutionDispatcher

Phase 5.2 of the PG Queue rollout (epic UN-3445). Adds fairness-header
support to the third dispatch path (sdk1's ExecutionDispatcher) so
``execute_extraction`` tasks emitted by file_processing carry the same
routing metadata as workflow-execution dispatches that go through
queue_backend.

What

* sdk1/execution/dispatcher.py: ``dispatch``, ``dispatch_async``,
  ``dispatch_with_callback`` all accept an optional ``headers`` kwarg.
  When non-None, forwarded to Celery's send_task; when None, omitted
  so the call shape stays identical to pre-Phase-5.2 for callers that
  don't opt in (sdk1's existing tests remain green unchanged).
* queue_backend/fairness.py: new ``FairnessKey.as_header()`` method
  returns the wire-ready ``{"x-fairness-key": ...}`` dict. Producers
  no longer need to reference ``FAIRNESS_HEADER_NAME`` directly —
  keeps the additive-only canary in test_fairness_key.py happy.
* file_processing/structure_tool_task.py: small ``_fairness_headers``
  helper builds the header (defaulting workload_type to NON_API;
  propagating the real type is Phase 6 work). All three
  ``dispatcher.dispatch(...)`` sites (lines 468, 507, 720) now pass
  ``headers=_fairness_headers(organization_id)``.
* tests/test_executor_dispatch.py: new file. Covers header forwarding
  through all three dispatcher methods (including the "omit when
  None" pre-existing shape preservation), the FairnessKey.as_header()
  shape, and an AST inventory canary that forbids raw
  ``*.send_task("execute_extraction", ...)`` outside
  ExecutionDispatcher.

Why

UN-3501 plumbed fairness on bare dispatch() call sites. The
``execute_extraction`` task is the most workflow-execution-y dispatch
in the codebase but bypasses queue_backend (uses ExecutionDispatcher
directly), so it had no fairness header. The canary in
test_fairness_key.py audits only bare-name dispatch() and missed it.

No regression risk

* Additive: ``headers`` is optional and defaults to None on all three
  dispatcher methods; the existing 78 sdk1 tests pass unchanged.
* Producer-side only — no consumer reads ``x-fairness-key`` yet.
* No queue routing, task name, or args/kwargs change.

Test count: workers seam suite 53 -> 60 (new test_executor_dispatch.py
with 7 tests). sdk1 dispatcher suite 80/80 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3508 [REFACTOR] Extract shared canary helpers to drop SonarCloud duplication

SonarCloud flagged 7.8% duplicated lines in the new
test_executor_dispatch.py — the file-walking helper and skip-dir
constants were copy-pasted from test_fairness_key.py.

Move them into tests/canary_helpers.py:

* WORKERS_ROOT, DEFAULT_SKIP_TOP_DIRS constants.
* iter_production_trees(skip_top_dirs=…) generator.

Both canary tests use relative imports (from .canary_helpers import …)
to keep one canonical import path — tests/ is already a package via
__init__.py, no pyproject change needed. (An earlier attempt added
pythonpath = ["tests"], reverted — it would have created a second
top-level import path for every test file and a dual-module-object
hazard.)

The fairness canary widens its skip set with ``queue_backend`` (where
the seam legitimately defines fairness constants); the executor canary
keeps the default. Tests stay at 60/60 — pure dedup, no behavioural
change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3508 [FIX] Address 14 PR review findings (HIGH/MED/NIT)

* dispatcher.py: factor _build_send_kwargs helper; document headers kwarg on dispatch_with_callback; reference FAIRNESS_HEADER_NAME symbol instead of bare string; document empty-dict caller-bug semantic
* structure_tool_task.py: narrow _fairness_headers return type; replace 'Phase 6 work' with TODO(UN-3504) anchor
* fairness.py: concrete as_header() docstring with explicit shape
* canary_helpers.py: surface SyntaxError via UserWarning (real silent-failure bug; canaries no longer pass vacuously on unparseable files)
* test_executor_dispatch.py: switch to dict[str, Any] dropping type-ignore; use WorkloadType.NON_API.value instead of invalid 'etl' literal; new test_dispatch_async_omits_headers_when_none; tighten canary docstring + note blind spots; drop plan-stage vocab; reorder relative import; new test_fairness_header_shape_orgless for org_id=None case

Tests: workers 60 -> 62, sdk1 dispatcher 80/80 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3508 [DOCS] Fix iter_production_trees docstring: 'Yield' -> 'Return a list'

Greptile P2: function builds and returns a list — it is not a
generator — but the docstring opened with 'Yield ...', which would
mislead a reader into expecting lazy consumption / generator semantics
(early break, send(), etc.).

Pure docstring fix, no behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3508 [FIX] Address vishnuszipstack review (7 real fixes + bundled test-infra fix)

Seven of Vishnu's findings against ``524ae9184`` addressed. Three
flagged IMPORTANT (silent-failure + missing test coverage), four
SUGGESTION (drift hazard, comment/behaviour mismatch, hollow canary,
duplicate-test cleanup). The other three (hand-built fixture,
SDK1 ``dict[str, Any]`` boundary, ``as_header`` TypedDict refactor)
deferred — see PR thread acknowledgments.

* **#11a (SUGGESTION, drift)** — ``workers/queue_backend/dispatch.py``
  still hand-built the fairness header instead of calling
  ``fairness.as_header()``. Wire-format encoding now has a single
  source so the two sites can't drift. ``FAIRNESS_HEADER_NAME``
  import dropped (no longer used here).

* **#9 (SUGGESTION, comment/behaviour mismatch)** —
  ``ExecutionDispatcher._build_send_kwargs`` was forwarding ``{}``
  as-is despite the docstring calling it "likely indicates a
  producer-side build bug". Changed ``if headers is not None`` ->
  ``if headers``: falsy is dropped so the on-wire shape matches the
  no-headers baseline and a miswired producer surfaces immediately.
  Docstring rewritten to describe the new contract.

* **#3 (SUGGESTION, wording)** — ``canary_helpers`` docstring
  overstated adoption. Softened to "intended single home" and
  named the two characterisation walkers still inlining the logic
  as a known follow-up.

* **#5 + #6 (IMPORTANT cleanup + SUGGESTION fixture)** — six
  structurally-identical ``*_forwards_headers`` /
  ``*_omits_headers_when_none`` tests collapsed to three
  parametrized methods over the three dispatch entry points.
  Fixture now uses ``FairnessKey(...).as_header()`` rather than
  hand-built dicts, so the wire shape exercised matches what real
  producers emit (including ``pipeline_priority``). Net: ~60 LOC
  removed, per-method failure granularity preserved via parametrize
  IDs. Also added empty-dict drop assertions covering #9.

* **#7 (SUGGESTION, missing combined test)** — new
  ``test_dispatch_with_callback_combines_headers_and_callbacks``
  passes ``on_success``, ``on_error``, ``task_id``, and ``headers``
  together and asserts all four land on the same ``send_task``
  call. A key-merge regression in ``_build_send_kwargs`` would
  have slipped through the single-kwarg forwarding tests.

* **#8 (SUGGESTION, hollow canary)** — the
  ``execute_extraction`` dispatch canary only ever asserted the
  empty (passing) case against the live tree. Added a positive-
  detection unit test feeding ``ast.parse`` of a known-bad snippet
  and a blind-spot lock test (constant ref, f-string,
  ``apply_async`` all evade the detector — documenting the scope
  so a future widening intentionally trips the asserts).

* **#4 (IMPORTANT, untested helper)** — ``_fairness_headers`` in
  ``structure_tool_task`` was untested; a regression flipping
  ``NON_API`` -> ``API`` or dropping ``headers=`` at any of the
  three call sites would have stayed green. Added focused unit
  tests in new ``test_structure_tool_task.py`` (wire shape,
  org_id propagation, ``NON_API`` not ``API``) and extended
  ``test_sanity_phase5.TestStructureToolSingleDispatch`` to assert
  ``dispatch.call_args.kwargs["headers"]`` carries the expected
  shape.

* **#2 (IMPORTANT, vacuous-pass)** — ``iter_production_trees``
  warned-and-continued on ``SyntaxError`` but neither canary
  module promoted the warning to error. A botched merge in a
  production file would have dropped silently from the audit set
  and every canary would have passed vacuously over a smaller
  tree. Added ``pytestmark = pytest.mark.filterwarnings(
  "error::UserWarning")`` on both ``test_executor_dispatch`` and
  ``test_fairness_key``, plus a new ``test_canary_helpers.py``
  that unit-tests both the warn-on-broken behaviour and the
  promote-to-error contract the canary modules rely on.

**Bundled test-infra fix (unrelated but unblocks CI):** the
``test_callback_sanity.TestEagerHealthcheckRoundTrip`` tests
selected the healthcheck task via ``endswith(".healthcheck")``
against ``eager_app.tasks``, which is a shared celery global
registry containing ``callback.worker.healthcheck``,
``executor.worker.healthcheck``,
``file_processing.worker.healthcheck`` etc. The bare ``next(...)``
returned whichever was inserted first — non-deterministic across
pytest module-collection orders. Without this fix, the new tests
added in this commit perturb the collection profile enough to
flip the failure rate from ~10% to nearly 100%. Replaced with
exact-name lookup ``name == "callback.worker.healthcheck"``.
Identical fix already landed on the UN-3513 branch (see #2020).

Test count: 31 -> 42 on the UN-3508-touched modules. Full workers
suite: 6 failures pre-existing baseline, unchanged by this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant