
[FIX] Unblock urllib3 2.x via google-auth bump (clear urllib3 Dependabot alerts) #2042

Open
jaseemjaskp wants to merge 1 commit into main from fix/dependabot-urllib3-protobuf

Conversation

@jaseemjaskp

Contributor

What

Unblocks urllib3 1.26.20 → 2.7.0 across root / backend / workers / connectors by lifting the constraint that pinned it to the 1.x line.

  • unstract/connectors/pyproject.toml: google-auth==2.20.0 → google-auth>=2.22.0
  • uv.lock, backend/uv.lock, workers/uv.lock, unstract/connectors/uv.lock: urllib3 → 2.7.0, google-auth → 2.53.0 (cachetools dropped — no longer a hard dep of google-auth 2.53)

Why

urllib3 was held at 1.26.20 everywhere, and its CVEs (sensitive-header leakage across redirects, decompression-bomb bypass, unbounded decompression chain) have no 1.26.x patch — the fixes exist only in 2.x. The resolver was capped by a single pin: google-auth==2.20.0, which declares urllib3<2.0. google-auth>=2.22.0 removed that cap, so urllib3 resolves to 2.7.0 and the alerts clear. It was the only urllib3<2 constraint in the graph (verified by forcing urllib3>=2.7.0 and reading the resolver conflict).

How

  • Diagnosed the cap by forcing urllib3>=2.7.0 into the resolver until it named google-auth==2.20.0.
  • Bumped only the google-auth lower bound (left google-cloud-bigquery pinned, since the remaining protobuf cap is separate — see below).
  • >= matches existing repo style (boto3~=, httpx>=, croniter>=, …).
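The diagnosis step can also be approximated from inside an already-installed environment. The following is a minimal sketch, not what this PR ran: it scans each installed distribution's declared requirements and flags any that cap urllib3 below 2. The helper names `caps_urllib3_below_2` and `packages_capping_urllib3` are hypothetical, and the pattern only recognises `<2`-style caps (it would miss, for example, `<=1.26`).

```python
import re
from importlib import metadata

# Requirement on urllib3 itself (not e.g. "urllib3-foo"); captures its specifiers.
_URLLIB3_REQ = re.compile(r"\s*urllib3(?![\w.-])\s*\(?([^;)]*)", re.IGNORECASE)
# Exclusive upper bound of 2, 2.0, 2.0.0, ... but not <20 or <2.1.
_BELOW_2 = re.compile(r"<\s*2(?:\.0)*(?![\d.])")


def caps_urllib3_below_2(requirement: str) -> bool:
    """Return True if a requirement string such as 'urllib3<2.0' caps urllib3 below 2."""
    match = _URLLIB3_REQ.match(requirement)
    return bool(match and _BELOW_2.search(match.group(1)))


def packages_capping_urllib3() -> list[str]:
    """List installed distributions whose metadata declares a urllib3<2 cap."""
    capped = []
    for dist in metadata.distributions():
        for req in dist.requires or []:
            if caps_urllib3_below_2(req):
                capped.append(f"{dist.metadata['Name']}=={dist.version} requires {req}")
    return sorted(capped)


if __name__ == "__main__":
    for line in packages_capping_urllib3():
        print(line)
```

Run inside the connectors environment before the bump, a scan like this would be expected to name google-auth; an empty result afterwards is consistent with the cap being gone, though the resolver output remains the authoritative check.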

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

This is the highest-risk PR in the Dependabot series and should get real integration testing before merge. It changes the HTTP layer (urllib3 1.x → 2.x) and the GCP auth library (google-auth 2.20 → 2.53), both used across the cloud connectors (GCS, BigQuery, GCP-auth) and anything doing HTTP via requests/botocore/etc.

Mitigating evidence:

  • Tiny, surgical diff — 3 packages per lock (urllib3 ↑, google-auth ↑, cachetools removed). No wider cascade.
  • All 4 locks pass uv lock --check.
  • Connectors env smoke test: urllib3 2.7.0, google.auth, google.cloud.bigquery, pymysql all import cleanly.
  • urllib3 2.x is mature; google-auth 2.20→2.53 stays within the same major.

👉 Please exercise the cloud connectors (GCS / BigQuery / GCP auth) and any outbound-HTTP paths in CI/staging. A full local run isn't possible (backend env blocked by the unrelated django-celery-beat 2.5.0 wheel quirk); connectors-level imports are verified but live auth/transfer flows are not.

Consider merging this after the lower-risk Dependabot PRs (#2038 through #2041) land.

Database Migrations

None.

Env Config

None.

Relevant Docs

urllib3 2.0 migration guide · google-auth changelog

Related Issues or PRs

Final PR of the Dependabot remediation series (#2038 frontend, #2039 python-transitive, #2040 Django, #2041 Authlib/PyMySQL).

Deferred — protobuf 5.x

protobuf is still capped <5 by additional google/grpc libraries (beyond google-cloud-bigquery, which this PR leaves pinned), and 4 → 5 is a major-version jump. It stays on the 4.25.x LTS line (latest patch), which carries the relevant backported fixes. Moving to 5.x warrants its own PR with a coordinated google-cloud / grpcio upgrade + testing.

Dependencies Versions

urllib3 1.26.20 → 2.7.0 · google-auth 2.20.0 → 2.53.0

Notes on Testing

  • uv lock --check on all 4 workspaces ✓
  • Connectors import smoke test (urllib3 / google.auth / bigquery / pymysql) ✓
  • Live cloud-connector + HTTP integration testing to be done in CI/staging.
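For reviewers who want to repeat the import smoke test, here is a minimal sketch of what such a check might look like. The function name `smoke_import` and the module-to-distribution mapping are assumptions for illustration; the PR does not show the script that was actually used.

```python
import importlib
from importlib import metadata


def smoke_import(modules: dict[str, str]) -> dict[str, str]:
    """Import each module and report its installed version or the import error.

    `modules` maps an import name to the distribution name used for the version lookup.
    """
    results = {}
    for module_name, dist_name in modules.items():
        try:
            importlib.import_module(module_name)
        except Exception as exc:  # any failure at import time counts as a smoke-test failure
            results[module_name] = f"FAILED: {type(exc).__name__}: {exc}"
            continue
        try:
            results[module_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            results[module_name] = "imported (version unknown)"
    return results


if __name__ == "__main__":
    # Assumed mapping of import names to distribution names.
    targets = {
        "urllib3": "urllib3",
        "google.auth": "google-auth",
        "google.cloud.bigquery": "google-cloud-bigquery",
        "pymysql": "pymysql",
    }
    for name, outcome in smoke_import(targets).items():
        print(f"{name}: {outcome}")
```

A clean import only shows that the packages load together; it does not exercise credential refresh or transfers, which is why the live integration testing above is still needed.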

Screenshots

N/A — dependency change.

Checklist

I have read and understood the Contribution Guidelines.

…ndabot alerts)

urllib3 was held at 1.26.20 across root/backend/workers/connectors because
google-auth==2.20.0 (pinned in unstract-connectors) requires urllib3<2.0.
google-auth>=2.22.0 drops that cap, letting urllib3 resolve to 2.7.0 and
clearing the urllib3 high-severity alerts (header leakage on redirects,
decompression-bomb bypass, unbounded decompression chain).

- unstract/connectors/pyproject.toml: google-auth==2.20.0 -> >=2.22.0
- root/backend/workers/connectors uv.lock: urllib3 1.26.20 -> 2.7.0,
  google-auth 2.20.0 -> 2.53.0 (cachetools dropped — no longer a hard dep
  of google-auth 2.53)

Verified: all 4 locks pass 'uv lock --check'; connectors env smoke test
imports urllib3 2.7.0, google.auth, google.cloud.bigquery, pymysql cleanly.

Deferred (NOT in this PR): protobuf 5.x is still capped <5 by additional
google/grpc libraries (beyond google-cloud-bigquery) and is a major-version
jump; protobuf stays on the 4.25.x LTS line for now.
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Contributor

Summary by CodeRabbit

  • Chores
    • Updated google-auth dependency to version 2.22.0 or later for improved compatibility.

Walkthrough

The PR updates the google-auth dependency in the connectors package from a pinned version 2.20.0 to a relaxed lower-bounded constraint >=2.22.0. This change permits usage of google-auth versions 2.22.0 and above, removing the implicit urllib3<2 constraint that was present in version 2.20.0.

Changes

Dependency Version Constraint

| Layer / File(s) | Summary |
|---|---|
| Update google-auth to >=2.22.0 (unstract/connectors/pyproject.toml) | The google-auth dependency is relaxed from exact pinning at 2.20.0 to a minimum version constraint >=2.22.0, with an inline note that this removes the urllib3<2 cap. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: unblocking urllib3 2.x by bumping google-auth to resolve Dependabot security alerts. |
| Description check | ✅ Passed | The PR description comprehensively covers all required template sections: What (urllib3/google-auth versions), Why (CVE fixes), How (methodology), risk assessment, database/env changes, testing notes, related PRs, and dependencies versions. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@sonarqubecloud


@github-actions

Contributor

Unstract test results

Per-group results

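| Group | Tier | Passed | Failed | Errors | Skipped | Duration (s) |
|---|---|---|---|---|---|---|
| unit-connectors | unit | 64 | 12 | 0 | 3 | 16.6 |
| unit-core | unit | 0 | 0 | 2 | 0 | 1.2 |
| unit-platform-service | unit | 9 | 0 | 1 | 0 | 1.3 |
| unit-prompt-service | unit | 15 | 0 | 0 | 0 | 19.7 |
| unit-rig | unit | 53 | 0 | 0 | 0 | 3.2 |
| unit-runner | unit | 11 | 0 | 0 | 0 | 3.0 |
| unit-sdk1 | unit | 390 | 0 | 0 | 0 | 20.3 |
| unit-tool-registry | unit | 0 | 0 | 1 | 0 | 1.4 |
| unit-workers | unit | 0 | 0 | 0 | 0 | 17.1 |
| TOTAL | | 542 | 12 | 4 | 3 | 83.8 |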

Critical paths

⚠️ Critical paths not yet covered

  • auth-login — User can log in and obtain a session cookie. (entry: POST /api/v1/auth/login; declared coverage: no groups declared)
  • adapter-register-llm — Register and validate an LLM adapter. (entry: POST /api/v1/adapter/; declared coverage: no groups declared)
  • workflow-create-execute — Create a workflow, configure source+destination, execute, poll, fetch result. (entry: POST /api/v1/workflow/{id}/execute/; declared coverage: e2e-workflow)
  • api-deployment-run — Deploy a workflow as an API, POST a document, receive structured JSON. (entry: POST /deployment/api/{org}/{name}/; declared coverage: e2e-api-deployment)
  • prompt-studio-fetch-response — Prompt Studio: create project, add prompt, run single-pass, get response. (entry: POST /api/v1/prompt-studio/prompt-studio-tool/{id}/fetch_response/; declared coverage: e2e-prompt-studio)
  • pipeline-etl-execute — Run an ETL pipeline from source connector to destination. (entry: POST /api/v1/pipeline/{id}/execute/; declared coverage: no groups declared)
  • usage-token-tracking — Per-execution token usage is recorded and retrievable. (entry: GET /api/v1/usage/get_token_usage/; declared coverage: no groups declared)
  • workflow-execution-fan-out — Multi-file workflow execution fans out to file-processing workers and rejoins. (entry: internal: backend → rabbitmq → workers/file_processing; declared coverage: no groups declared)
  • callback-result-delivery — Async results are posted back via the callback worker. (entry: internal: workers/callback → backend /internal endpoints; declared coverage: no groups declared)
✅ Covered critical paths
  • tool-sandbox-exec — covered by unit-runner

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@unstract/connectors/pyproject.toml`:
- Line 25: Update the inline comment on the dependency entry
"google-auth>=2.22.0" to remove the incorrect claim that it “drops the urllib3<2
cap” (google-auth==2.22.0 still pins urllib3<2.0); instead note that upgrading
to a google-auth version that truly allows urllib3>=2 is required to lift that
cap. Also audit any direct use of urllib3.util.Retry in your codebase: replace
deprecated parameters such as method_whitelist with allowed_methods and update
any references to renamed DEFAULT_* constants to their urllib3 2.0 equivalents
so Retry construction is compatible with urllib3 2.x.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: af6aa938-a3c3-4f3f-9c8b-3f917ffdf9f4

📥 Commits

Reviewing files that changed from the base of the PR and between cafcca7 and 41c9503.

⛔ Files ignored due to path filters (4)
  • backend/uv.lock is excluded by !**/*.lock
  • unstract/connectors/uv.lock is excluded by !**/*.lock
  • uv.lock is excluded by !**/*.lock
  • workers/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • unstract/connectors/pyproject.toml

"singleton-decorator~=1.0.0",
# For GCSHelper
- "google-auth==2.20.0",
+ "google-auth>=2.22.0", # >=2.22 drops the urllib3<2 cap

Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

google-auth 2.22.0 urllib3 constraint removal changelog

💡 Result:

In google-auth version 2.22.0, the urllib3 dependency was explicitly constrained to < 2.0 [1]. This constraint was added in earlier versions to avoid breaking changes introduced by the urllib3 2.0 release [2]. The removal of this constraint did not occur in version 2.22.0. Instead, it was addressed in subsequent releases. A later attempt to update the dependency to require urllib3 >= 2.0.5 in pull request #1389 caused dependency conflicts for other packages [3]. Consequently, support for urllib3 < 2.0 versions was restored in google-auth version 2.23.2 [4].

Citations:


🌐 Web query:

google-auth breaking changes 2.20 to 2.53

💡 Result:

From google-auth library version 2.20.0 (2023-06-12) up through 2.53.0 (2026-05-15), the authoritative changelog shows mostly bugfixes/features; I did not find any documented “breaking changes” sections specifically spanning that entire range.

What I can state with evidence:

  • The google-auth 2.20.0 changelog entry exists (2023-06-12) and lists changes under bug fixes/features, including adding public API load_credentials_from_dict (#1326) and expiry handling for compute_engine.IDTokenCredentials and impersonated_credentials.IDTokenCredentials (#1327/#1330) [1].
  • A later release in this range, google-auth v2.53.0, lists bug fixes/features (e.g., allowlisting agents-nonprod trust domains for agent identity and fail-fast on invalid/non-workload certificate configs) and does not mark itself as a breaking-change release in the release notes [2][3].

If you meant a specific type of “breaking change” (API signature change, removed symbol, dependency constraint change, auth behavior change, etc.), tell me which surface you care about and (ideally) your current version and target version (e.g., “2.20 → 2.53”). I can then enumerate only the relevant deltas from the changelog/compare views for those exact releases.

Citations:


🌐 Web query:

urllib3 2.0 breaking changes Retry API compatibility

💡 Result:

In urllib3 2.0, the Retry API had a few compatibility-breaking changes, mostly around renamed/removed Retry configuration fields and default constants.

1) Removed/renamed Retry option names (most common break). Deprecated options in 1.x were removed in 2.0:
  • Retry.method_whitelist (and Retry.DEFAULT_METHOD_WHITELIST) -> replaced by Retry.allowed_methods / Retry.DEFAULT_ALLOWED_METHODS
  • Retry.DEFAULT_REDIRECT_HEADERS_BLACKLIST -> replaced by Retry.DEFAULT_REMOVE_HEADERS_ON_REDIRECT
  • Retry(method_whitelist=...) -> replaced by Retry(allowed_methods=...)
This is explicitly called out as deprecated options being removed “Starting in urllib3 v2.0”. [1][2]

2) Changed/removed class constants related to backoff:
  • Retry.BACKOFF_MAX was renamed to Retry.DEFAULT_BACKOFF_MAX. [1][3]
  • Retry.MAX_BACKOFF was deprecated and removed in v2.0 in favor of Retry.DEFAULT_MAX_BACKOFF. [1][4]

3) Retry constructor gained new backoff-related parameters. urllib3 2.0 added:
  • a configurable Retry(backoff_max=...)
  • optional Retry(backoff_jitter=...)
If you were subclassing or introspecting constructor parameters, these signature changes can be relevant. [1][3]

What “compatibility” usually looks like for most users: the urllib3 2.0 migration guide states that urllib3 “maintain[s] functional API compatibility for most users” and that many changes are to defaults/internal implementation details; you generally only see trouble when you use deprecated/removed Retry option names or constants. [5]

If you want, paste your code that constructs Retry (or your dependency tree / error message) and I can map it to the correct urllib3 2.x Retry arguments.

Citations:


🏁 Script executed:

#!/bin/bash
# Description: Verify google-auth and urllib3 API usage across the codebase

echo "=== Checking google-auth API usage ==="
rg -n --type=py "from google\.oauth2 import|from google\.cloud import" -A 3

echo -e "\n=== Checking urllib3 direct imports ==="
rg -n --type=py "from urllib3|import urllib3" -A 2

echo -e "\n=== Checking if any code pins urllib3 or google-auth versions elsewhere ==="
fd -e toml -e txt -e lock -e cfg | xargs rg -l "urllib3|google-auth"

Repository: Zipstack/unstract

Length of output: 2855


Fix the google-auth → urllib3 “cap removal” rationale in pyproject

  • The inline comment on google-auth>=2.22.0 (“drops the urllib3<2 cap”) is incorrect for google-auth==2.22.0; upstream packaging for 2.22.0 still pins urllib3 to <2.0, so >=2.22.0 alone doesn’t guarantee urllib3 2.x.
  • Because the project uses urllib3.util.Retry directly, re-check the retry construction for urllib3 2.0 removed/renamed options/constants (e.g., method_whitelist → allowed_methods, related DEFAULT_* renames).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@unstract/connectors/pyproject.toml` at line 25, Update the inline comment on
the dependency entry "google-auth>=2.22.0" to remove the incorrect claim that it
“drops the urllib3<2 cap” (google-auth==2.22.0 still pins urllib3<2.0); instead
note that upgrading to a google-auth version that truly allows urllib3>=2 is
required to lift that cap. Also audit any direct use of urllib3.util.Retry in
your codebase: replace deprecated parameters such as method_whitelist with
allowed_methods and update any references to renamed DEFAULT_* constants to
their urllib3 2.0 equivalents so Retry construction is compatible with urllib3
2.x.
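To make the Retry audit concrete, here is a small sketch of a keyword shim. It assumes some call site still builds its Retry arguments with the legacy `method_whitelist` name; the helper `modernize_retry_kwargs` is hypothetical and nothing in this PR indicates the repository actually uses the old name.

```python
# Keyword renamed in urllib3; the old name is no longer accepted in 2.x.
_RENAMED_RETRY_KWARGS = {"method_whitelist": "allowed_methods"}


def modernize_retry_kwargs(kwargs: dict) -> dict:
    """Return a copy of `kwargs` with legacy Retry keyword names replaced."""
    updated = {}
    for key, value in kwargs.items():
        new_key = _RENAMED_RETRY_KWARGS.get(key, key)
        if new_key in updated:
            # Both the legacy and the new spelling were supplied; refuse to guess.
            raise ValueError(f"conflicting values for {new_key!r}")
        updated[new_key] = value
    return updated


if __name__ == "__main__":
    legacy = {
        "total": 3,
        "backoff_factor": 0.5,
        "method_whitelist": frozenset({"GET", "HEAD"}),
    }
    kwargs = modernize_retry_kwargs(legacy)
    try:
        from urllib3.util import Retry
    except ImportError:
        print(kwargs)  # urllib3 not installed; show the translated keywords only
    else:
        print(Retry(**kwargs))
```

Since `allowed_methods` is also accepted by urllib3 1.26.x, the simpler fix in most code is to rename the keyword at the call site; a shim like this only helps when the arguments come from configuration that cannot be changed at once.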

@greptile-apps

greptile-apps Bot commented Jun 11, 2026

Contributor

Greptile Summary

This PR unblocks urllib3 2.x across all four lock files by lifting the google-auth==2.20.0 exact pin (which carried a urllib3<2 transitive constraint) to google-auth>=2.22.0, resolving the library to 2.53.0. The change is a security fix: urllib3 1.26.x is end-of-life with known CVEs that only have patches in the 2.x line.

  • unstract/connectors/pyproject.toml: single-line constraint change from ==2.20.0 to >=2.22.0, with an explanatory inline comment.
  • All four uv.lock files updated consistently: urllib3 1.26.20 → 2.7.0, google-auth 2.20.0 → 2.53.0, cachetools removed (no longer a hard dep of google-auth 2.53.0; cryptography is now declared instead, but was already present transitively).
  • The two direct urllib3 usages in the codebase (urllib3.util.Retry / urllib3.util.retry.Retry) are unchanged API in urllib3 2.x; requests 2.33.0 and botocore 1.37.1 are both urllib3 2.x compatible.

Confidence Score: 4/5

Safe to merge after cloud-connector integration testing; the change is surgically narrow (3 packages per lock) and all in-repo urllib3 API call sites are compatible with 2.x.

The diff is a dependency-only change touching urllib3 (a foundational HTTP layer) and google-auth (used by every GCP connector path). No code is wrong in isolation, and the resolver produces a consistent, verified lock across all four workspaces. The gap that keeps this short of a clean pass is the live auth/transfer flows: google-auth jumped 33 minor versions (2.20→2.53) and urllib3 crossed a major version boundary, so any subtle behavioural difference in redirect handling, credential refresh, or streaming would only surface under real cloud traffic.

All lock files are consistent and mechanically correct. The only file worth a second look is unstract/connectors/pyproject.toml — the open-ended >=2.22.0 bound is intentional and matches repo style, but reviewers should be aware that a future uv lock --upgrade could select google-auth 3.x if one is released.
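The concern about the open-ended bound can be illustrated with a rough classifier for requirement strings. This is a simplified sketch with a hypothetical helper name (`classify_specifier`); it uses a regular expression rather than a full PEP 440 parser, so treat it as an approximation.

```python
import re

# One specifier clause: an operator followed by a version (longest operators first).
_CLAUSE = re.compile(r"(===|==|~=|!=|<=|>=|<|>)\s*([^,\s]+)")


def classify_specifier(requirement: str) -> str:
    """Classify a requirement string as 'exact', 'capped', or 'unbounded'."""
    spec = requirement.split(";", 1)[0]  # ignore any environment marker
    clauses = _CLAUSE.findall(spec)
    if any(op == "===" or (op == "==" and "*" not in ver) for op, ver in clauses):
        return "exact"
    if any(op in {"~=", "<", "<="} or (op == "==" and "*" in ver) for op, ver in clauses):
        return "capped"
    return "unbounded"  # no upper limit: a lock regeneration may pick a new major


if __name__ == "__main__":
    for req in (
        "google-auth==2.20.0",
        "google-auth>=2.22.0",
        "google-auth>=2.22.0,<3",
        "singleton-decorator~=1.0.0",
    ):
        print(f"{req}: {classify_specifier(req)}")
```

Under this classification, `google-auth>=2.22.0` is the unbounded case, while adding `<3` would make it capped and guard against a future major release.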

Important Files Changed

| Filename | Overview |
|---|---|
| unstract/connectors/pyproject.toml | Single constraint bump: google-auth==2.20.0 → >=2.22.0 with an inline comment; follows existing repo >= style and is the root cause fix. |
| unstract/connectors/uv.lock | urllib3 1.26.20→2.7.0, google-auth 2.20.0→2.53.0, cachetools entry removed; cryptography was already present transitively so no new native dep is introduced. |
| uv.lock | Root workspace lock updated with the same three-package delta as the connectors lock; changes are consistent. |
| backend/uv.lock | Backend workspace lock updated with the same three-package delta; consistent with root and connectors locks. |
| workers/uv.lock | Workers workspace lock updated with the same three-package delta; consistent across all four workspaces. |

Reviews (1): Last reviewed commit: "[FIX] Unblock urllib3 2.x by bumping goo..." | Re-trigger Greptile

@jaseemjaskp jaseemjaskp left a comment

Contributor Author

Automated review (PR Review Toolkit). This is a clean, internally-consistent dependency bump; the 4 lock files match each other and the pyproject change, and platform-service/prompt-service/sdk1 were correctly left untouched (they already resolve urllib3 2.7.0). One line carries two minor, non-blocking points below — distinct from the floor-version question @coderabbitai already raised.

"singleton-decorator~=1.0.0",
# For GCSHelper
- "google-auth==2.20.0",
+ "google-auth>=2.22.0", # >=2.22 drops the urllib3<2 cap

Contributor Author

Dependency hygiene — two minor points (separate from the floor-version question already raised):

1. Unbounded upper bound breaks this block's pinning convention. Every other dependency here is exact-pinned (google-cloud-secret-manager==2.16.1, gcsfs==2024.10.0) or capped (singleton-decorator~=1.0.0). google-auth>=2.22.0 is the only unbounded-upper specifier. The lock pins 2.53.0 so today's builds are reproducible, but a future uv lock regen will silently pull whatever google-auth is newest, with no guard against a breaking major.
Suggest: match the siblings with "google-auth==2.53.0", or at minimum cap it: "google-auth>=2.22.0,<3".

2. Comment is slightly misleading + names an unverifiable version. "drops the urllib3<2 cap" reads as if google-auth still depends on urllib3 with a relaxed range. Per the lock diff, google-auth 2.53.0 removes urllib3 from its dependencies entirely (2.20.0 listed urllib3/six/rsa/cachetools; 2.53.0 lists only cryptography + pyasn1-modules). Also 2.22 is neither the old (2.20.0) nor new (2.53.0) endpoint and isn't substantiated here, so it's prime comment-rot.
Suggest:

# google-auth >= 2.53 no longer depends on urllib3 at all (2.20 did) — that's what unblocks urllib3 2.x
"google-auth>=2.22.0",
