
fix(llama-parse): rename schema field url to base_url (#1972) #2019

Open: jaseemjaskp wants to merge 2 commits into main from fix/llamaparse-base-url-schema-1972

Conversation

@jaseemjaskp
Contributor

What

  • Rename the LlamaParse x2text adapter UI schema field from url to base_url in json_schema.json.

Why

  • The schema defined the base-URL field as url, but the adapter code reads it via LlamaParseConfig.BASE_URL ("base_url"). The mismatch meant the value a user entered was stored under url and never read, so LlamaParse always fell back to its hardcoded US endpoint (https://api.cloud.llamaindex.ai).
  • As a result, EU-region LlamaCloud users could not use LlamaParse — their key was sent to the US endpoint, returning 401 Unauthorized.
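
The mismatch described above can be reproduced with a plain dictionary. This is only a minimal sketch: `BASE_URL_KEY` is a stand-in for `LlamaParseConfig.BASE_URL`, and `saved_config` is a hypothetical example of what the old schema would have caused the UI to persist, not the real stored metadata.

```python
# Stand-in for LlamaParseConfig.BASE_URL ("base_url"), as named in the PR description.
BASE_URL_KEY = "base_url"

# Hypothetical saved config: the old schema key means the value lands under "url".
saved_config = {
    "api_key": "example-key",
    "url": "https://api.cloud.eu.llamaindex.ai",
}

# The adapter-side lookup uses the constant, so the EU endpoint is never seen.
resolved = saved_config.get(BASE_URL_KEY)
print(resolved)  # None
```

With `None` passed along, the client library uses its own default endpoint, which matches the US fallback behaviour reported in the issue.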

How

  • Changed the property key "url" → "base_url" in unstract/sdk1/.../llama_parse/src/static/json_schema.json so it matches constants.py and the lookup in llama_parse.py.

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

  • No new breakage. This is a forward fix for new adapter instances.
  • Note on existing instances: any LlamaParse adapter already saved before this fix has its value persisted under the old url key (configs are stored Fernet-encrypted in adapter_instance.adapter_metadata_b), so it will not auto-migrate. Those few users simply need to re-open and re-save the adapter to populate base_url. A data migration was considered but intentionally skipped to keep the change minimal.

Database Migrations

  • None.

Env Config

  • None.

Relevant Docs

  • N/A

Related Issues or PRs

Dependencies Versions

  • N/A

Notes on Testing

  • Validated the JSON and ran the repo pre-commit JSON/whitespace hooks (all pass).
  • Manual verification: configure a LlamaParse adapter with the EU base URL (https://api.cloud.eu.llamaindex.ai) and confirm extraction succeeds instead of returning 401.

Screenshots

Checklist

I have read and understood the Contribution Guidelines.

The LlamaParse x2text adapter UI schema defined the base URL field as
"url", but the adapter code reads it via LlamaParseConfig.BASE_URL
("base_url"). The mismatch meant the configured URL was never applied,
so LlamaParse always fell back to the hardcoded US endpoint and EU-region
users hit 401 errors. Rename the schema key to base_url to match.

Fixes #1972
@coderabbitai
Contributor

coderabbitai Bot commented Jun 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
📥 Commits

Reviewing files that changed from the base of the PR and between d55c80e and b37059b.

📒 Files selected for processing (3)
  • unstract/sdk1/src/unstract/sdk1/adapters/x2text/llama_parse/src/constants.py
  • unstract/sdk1/src/unstract/sdk1/adapters/x2text/llama_parse/src/llama_parse.py
  • unstract/sdk1/src/unstract/sdk1/adapters/x2text/llama_parse/src/static/json_schema.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • unstract/sdk1/src/unstract/sdk1/adapters/x2text/llama_parse/src/static/json_schema.json

Summary by CodeRabbit

  • Chores
    • Configuration schema for the Llama Parse Text Extractor adapter updated: a connection property was renamed for clearer wording and validation now expects a URI-style value.
  • Bug Fixes
    • Backward-compatibility added so existing configurations using the previous property name continue to work without changes.

Walkthrough

The PR fixes a schema-code mismatch in the LlamaParse adapter by renaming the JSON schema property from url to base_url, aligning the UI configuration field name with the actual configuration key expected by the adapter implementation.

Changes

LlamaParse Schema Property Alignment

| Layer / File(s) | Summary |
| --- | --- |
| Schema property rename from url to base_url (unstract/sdk1/src/unstract/sdk1/adapters/x2text/llama_parse/src/static/json_schema.json) | Renamed the JSON schema configuration property key from url to base_url, updated its title to "Base URL" and format from url to uri. |
| Add legacy constant (unstract/sdk1/src/unstract/sdk1/adapters/x2text/llama_parse/src/constants.py) | Adds LlamaParseConfig.LEGACY_BASE_URL = "url" to preserve compatibility with previously saved adapter instances. |
| Runtime fallback to legacy key (unstract/sdk1/src/unstract/sdk1/adapters/x2text/llama_parse/src/llama_parse.py) | Adapter now reads base_url from config[BASE_URL] and falls back to config[LEGACY_BASE_URL] if needed. |

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: renaming a schema field from 'url' to 'base_url' in the LlamaParse adapter configuration, directly addressing the issue. |
| Description check | ✅ Passed | The description is comprehensive, covering all required sections: What, Why, How, breaking changes assessment, database migrations, environment config, related issues, testing notes, and the contribution guidelines checklist. |
| Linked Issues check | ✅ Passed | The PR fulfills all objectives from issue #1972: renames schema field 'url' to 'base_url', aligns with constants.py and llama_parse.py, adds legacy fallback support, and restores EU region support. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing the schema/config key mismatch. Schema property rename, legacy fallback constant, and parser initialization update are all within scope of issue #1972. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@greptile-apps
Contributor

greptile-apps Bot commented Jun 8, 2026

Greptile Summary

This PR fixes a key mismatch in the LlamaParse x2text adapter where the JSON schema stored the base URL under "url" but the adapter code read it as "base_url", causing EU-region users to always hit the hardcoded US endpoint and receive 401 Unauthorized errors. The rename is paired with a backward-compatibility fallback so existing saved adapter instances continue to work.

  • json_schema.json: Renames the property key url → base_url, updates the title, and corrects the format annotation from the non-standard "url" to the JSON Schema-standard "uri".
  • constants.py: Adds LEGACY_BASE_URL = "url" to name the old key for use in the fallback path.
  • llama_parse.py: Changes the base_url argument to LlamaParse(...) to prefer base_url and fall back to url for configs stored before the rename.
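
The prefer-new-then-legacy lookup summarised in these bullets could look roughly like the sketch below. The constant values come from this PR; the helper name `resolve_base_url` is hypothetical and only isolates the lookup so it can be exercised without the LlamaParse client.

```python
from typing import Any, Mapping, Optional

BASE_URL = "base_url"    # key the adapter reads (LlamaParseConfig.BASE_URL)
LEGACY_BASE_URL = "url"  # key written by the old schema (LEGACY_BASE_URL)


def resolve_base_url(config: Mapping[str, Any]) -> Optional[str]:
    """Prefer the new key; fall back to the legacy key when it is missing or empty."""
    # `or` treats both None and "" as absent, so an empty new field
    # does not hide a value saved under the legacy key.
    return config.get(BASE_URL) or config.get(LEGACY_BASE_URL)


print(resolve_base_url({"base_url": "https://api.cloud.eu.llamaindex.ai"}))
print(resolve_base_url({"url": "https://api.cloud.eu.llamaindex.ai"}))
print(resolve_base_url({}))  # None, so the client library picks its default endpoint
```

Using `or` rather than a `get` default is a deliberate choice in this sketch: a `get` default would only apply when the new key is absent, not when it is present but empty.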

Confidence Score: 5/5

Safe to merge — the change correctly fixes a field-name mismatch that was silently dropping a user-supplied value, and the backward-compatibility fallback protects existing saved configs.

The rename is minimal and targeted: one JSON schema key change, one new constant, and a single or-fallback in the parser call. The fallback logic correctly handles both new instances (reads base_url) and old ones (falls through to the legacy url key). No data is lost and no other code paths are affected.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| unstract/sdk1/src/unstract/sdk1/adapters/x2text/llama_parse/src/static/json_schema.json | Renames the url property key to base_url and corrects the format from the non-standard "url" to the JSON Schema-standard "uri"; title updated accordingly. |
| unstract/sdk1/src/unstract/sdk1/adapters/x2text/llama_parse/src/constants.py | Adds LEGACY_BASE_URL = "url" constant to support backward compatibility with adapter instances saved before the rename. |
| unstract/sdk1/src/unstract/sdk1/adapters/x2text/llama_parse/src/llama_parse.py | Updates _call_parser to fall back to the legacy url key when base_url is absent, preserving continuity for existing adapter instances. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Adapter config loaded] --> B{config.get 'base_url'?}
    B -- new instance, has value --> C[Use base_url]
    B -- old instance, None or empty --> D{config.get 'url' legacy?}
    D -- old instance, has value --> E[Use legacy url]
    D -- neither present --> F[Pass None → LlamaParse default US endpoint]
    C --> G[LlamaParse API call]
    E --> G
    F --> G

Reviews (2): Last reviewed commit: "fix(llama-parse): add legacy url fallbac..." | Re-trigger Greptile

Contributor Author

@jaseemjaskp jaseemjaskp left a comment


PR Review Toolkit — summary (6 agents: Code Reviewer, Silent Failure Hunter, Type Design, PR Test Analyzer, Comment Analyzer, Code Simplifier)

Verdict: the one-line rename is correct and fixes a real bug. Before this PR the schema published the key url while the adapter reads base_url (constants.py:5 → LlamaParseConfig.BASE_URL = "base_url", consumed at llama_parse.py:55). So the configured Base URL never reached the adapter: it resolved to None and the SDK silently fell back to its default endpoint. This PR aligns the two. Blast radius is fully contained: LlamaParseConfig is referenced only within llama_parse.py; no frontend/backend code reads the literal url key.

Prioritised findings

P1 — Backward-compat / no migration (llama_parse.py:55, outside diff). Adapter instances saved before this change persist the value under the old url key and have no base_url, so config.get("base_url") returns None for them. Cloud users are unaffected (None → default cloud endpoint), but self-hosted / custom-endpoint users silently revert to https://api.cloud.llamaindex.ai with no log. Fix options: (a) data migration renaming the persisted key, or (b) legacy fallback: self.config.get(LlamaParseConfig.BASE_URL) or self.config.get("url"). Moot if this adapter has no released configs yet — please confirm. (Inline note on schema line 21.)
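
For option (a), the core of a data migration would be an idempotent key rename on each decrypted metadata dict. The sketch below assumes the metadata is already available as a plain dict; the function name `migrate_adapter_metadata` is hypothetical, and decrypting and re-encrypting adapter_metadata_b is deliberately left out.

```python
from typing import Any, Dict


def migrate_adapter_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with the legacy "url" key moved to "base_url"."""
    migrated = dict(metadata)  # never mutate the caller's dict
    legacy_value = migrated.pop("url", None)
    # Only fill base_url when it is not already set, so re-running is harmless.
    if legacy_value and not migrated.get("base_url"):
        migrated["base_url"] = legacy_value
    return migrated


old = {"api_key": "example-key", "url": "https://api.cloud.eu.llamaindex.ai"}
new = migrate_adapter_metadata(old)
print(new)
```

Running the function twice gives the same result as running it once, which matters if a migration is ever re-applied.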

P2 — Pre-existing broad except Exception mislabels errors (llama_parse.py:94-98, outside diff). Any non-ConnectError (401/403/404 from a wrong base_url, etc.) is hard-coded as "invalid API Key", and line 96 uses a non-f-string "...: {exe}" so the exception detail is never logged. Not introduced here, but the rename makes these failures more likely. Recommend fixing the f-string and not asserting a guessed cause.
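
To illustrate the f-string half of P2: a string without the `f` prefix keeps `{exe}` as literal text, so the exception detail never reaches the log. The message wording below is a placeholder, not the text used in llama_parse.py.

```python
def broken_message(exe: Exception) -> str:
    # Missing "f" prefix: the braces are kept literally and the exception text is lost.
    return "LlamaParse request failed: {exe}"


def fixed_message(exe: Exception) -> str:
    # With the prefix, the exception is interpolated and no cause is guessed.
    return f"LlamaParse request failed: {exe}"


err = RuntimeError("401 Unauthorized")
print(broken_message(err))  # LlamaParse request failed: {exe}
print(fixed_message(err))   # LlamaParse request failed: 401 Unauthorized
```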

P3 — No test coverage (sdk1/tests). Zero tests reference LlamaParse or this schema. A cheap, network-free regression test mirroring test_gemini_embedding.py:31-53 would pin the schema-key↔constant mapping that just drifted: assert LlamaParseConfig.BASE_URL in schema['properties'], 'url' not in properties, required == {'api_key'}. Note: only assert the subset of constants the adapter actually reads from self.config (NUM_WORKERS/LANGUAGE are not in the schema). Criticality ~7/10.
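
A network-free regression test along the lines of P3 might look like the sketch below. To stay self-contained it uses an inline stand-in for json_schema.json and a local copy of the constant; a real test would load the packaged schema file and import LlamaParseConfig instead. The stand-in schema contents are an assumption and omit any other fields the real schema defines.

```python
import json

# Stand-in for the adapter's json_schema.json (assumed shape, trimmed for the example).
SCHEMA_TEXT = """
{
  "type": "object",
  "required": ["api_key"],
  "properties": {
    "api_key": {"type": "string"},
    "base_url": {"type": "string", "title": "Base URL", "format": "uri"}
  }
}
"""


class LlamaParseConfig:
    # Local copy of the constant named in this PR.
    BASE_URL = "base_url"


def test_schema_keys_match_constants() -> None:
    schema = json.loads(SCHEMA_TEXT)
    properties = schema["properties"]
    # The key the adapter reads must be the key the UI schema publishes.
    assert LlamaParseConfig.BASE_URL in properties
    # The legacy key must not come back into the schema.
    assert "url" not in properties
    assert set(schema["required"]) == {"api_key"}


test_schema_keys_match_constants()
```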

P4 — title still reads "URL" (schema line 23, inline). Field is now base_url and description says "Base URL"; suggest "title": "Base URL".

P5 — Nits (optional): "format": "url" is non-standard JSON Schema — standard is "uri" (schema line 24, inline). Description (line 26) mis-cases "llama Parse" and drops the article vs line 19 — suggest "...of the Llama Parse server.". api_key block (line 14) is indented 8 spaces vs 6 for siblings. All pre-existing.

Type Design / Code Simplifier: schema is well-formed; base_url optional-with-default is the right choice (not in required). Nothing to simplify in the rename itself. Note llama_parse is now the lone x2text adapter using base_url vs url in siblings — acceptable since its code requires base_url.

No blocking issues. Recommend addressing P1 (or confirming it's moot) and P4 before merge; P2/P3 are good follow-ups.

- Fall back to the legacy 'url' key so LlamaParse adapter instances saved
  before the base_url rename (#1972) keep their configured endpoint instead
  of silently reverting to the default US endpoint.
- Update schema title 'URL' -> 'Base URL' and format 'url' -> 'uri'.

Addresses review feedback on #2019.
@sonarqubecloud

sonarqubecloud Bot commented Jun 8, 2026

@github-actions
Contributor

github-actions Bot commented Jun 8, 2026

Unstract test results

Per-group results

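| Status | Group | Tier | Passed | Failed | Errors | Skipped | Duration (s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| | unit-connectors | unit | 64 | 12 | 0 | 3 | 16.5 |
| | unit-core | unit | 0 | 0 | 2 | 0 | 1.2 |
| | unit-platform-service | unit | 9 | 0 | 1 | 0 | 1.3 |
| | unit-prompt-service | unit | 15 | 0 | 0 | 0 | 20.1 |
| | unit-rig | unit | 53 | 0 | 0 | 0 | 3.2 |
| | unit-runner | unit | 11 | 0 | 0 | 0 | 3.0 |
| | unit-sdk1 | unit | 381 | 0 | 0 | 0 | 24.1 |
| | unit-tool-registry | unit | 0 | 0 | 1 | 0 | 1.3 |
| | unit-workers | unit | 0 | 0 | 0 | 0 | 16.8 |
| | TOTAL | | 533 | 12 | 4 | 3 | 87.5 |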

Critical paths

⚠️ Critical paths not yet covered

  • auth-login — User can log in and obtain a session cookie. (entry: POST /api/v1/auth/login; declared coverage: no groups declared)
  • adapter-register-llm — Register and validate an LLM adapter. (entry: POST /api/v1/adapter/; declared coverage: no groups declared)
  • workflow-create-execute — Create a workflow, configure source+destination, execute, poll, fetch result. (entry: POST /api/v1/workflow/{id}/execute/; declared coverage: e2e-workflow)
  • api-deployment-run — Deploy a workflow as an API, POST a document, receive structured JSON. (entry: POST /deployment/api/{org}/{name}/; declared coverage: e2e-api-deployment)
  • prompt-studio-fetch-response — Prompt Studio: create project, add prompt, run single-pass, get response. (entry: POST /api/v1/prompt-studio/prompt-studio-tool/{id}/fetch_response/; declared coverage: e2e-prompt-studio)
  • pipeline-etl-execute — Run an ETL pipeline from source connector to destination. (entry: POST /api/v1/pipeline/{id}/execute/; declared coverage: no groups declared)
  • usage-token-tracking — Per-execution token usage is recorded and retrievable. (entry: GET /api/v1/usage/get_token_usage/; declared coverage: no groups declared)
  • workflow-execution-fan-out — Multi-file workflow execution fans out to file-processing workers and rejoins. (entry: internal: backend → rabbitmq → workers/file_processing; declared coverage: no groups declared)
  • callback-result-delivery — Async results are posted back via the callback worker. (entry: internal: workers/callback → backend /internal endpoints; declared coverage: no groups declared)
✅ Covered critical paths
  • tool-sandbox-exec — covered by unit-runner



Development

Successfully merging this pull request may close these issues.

LlamaParse adapter: url field in json_schema.json doesn't match base_url config key, breaking EU region support

2 participants