fix(llama-parse): rename schema field url to base_url (#1972) #2019
jaseemjaskp wants to merge 2 commits into
The LlamaParse x2text adapter UI schema defined the base URL field as
"url", but the adapter code reads it via LlamaParseConfig.BASE_URL
("base_url"). The mismatch meant the configured URL was never applied,
so LlamaParse always fell back to the hardcoded US endpoint and EU-region
users hit 401 errors. Rename the schema key to base_url to match.
Fixes #1972
Walkthrough (CodeRabbit): The PR fixes a schema-code mismatch in the LlamaParse adapter by renaming the JSON schema property from url to base_url.
| Filename | Overview |
|---|---|
| unstract/sdk1/src/unstract/sdk1/adapters/x2text/llama_parse/src/static/json_schema.json | Renames the url property key to base_url and corrects the format from the non-standard "url" to the JSON Schema-standard "uri"; title updated accordingly. |
| unstract/sdk1/src/unstract/sdk1/adapters/x2text/llama_parse/src/constants.py | Adds LEGACY_BASE_URL = "url" constant to support backward compatibility with adapter instances saved before the rename. |
| unstract/sdk1/src/unstract/sdk1/adapters/x2text/llama_parse/src/llama_parse.py | Updates _call_parser to fall back to the legacy url key when base_url is absent, preserving continuity for existing adapter instances. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Adapter config loaded] --> B{config.get 'base_url'?}
B -- new instance, has value --> C[Use base_url]
B -- old instance, None or empty --> D{config.get 'url' legacy?}
D -- old instance, has value --> E[Use legacy url]
D -- neither present --> F[Pass None → LlamaParse default US endpoint]
C --> G[LlamaParse API call]
E --> G
F --> G
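The resolution order in the flowchart can be sketched in Python. This is a minimal illustration, not the adapter's actual code: the helper name resolve_base_url is hypothetical, and the two key names are taken from the constants described in the file table (BASE_URL = "base_url", LEGACY_BASE_URL = "url").

```python
from typing import Any, Optional

# Key names as described in the PR; the helper below is a hypothetical sketch.
BASE_URL = "base_url"      # mirrors LlamaParseConfig.BASE_URL
LEGACY_BASE_URL = "url"    # mirrors LlamaParseConfig.LEGACY_BASE_URL


def resolve_base_url(config: dict[str, Any]) -> Optional[str]:
    """Prefer the new key, fall back to the legacy key, else return None."""
    # `or` treats a missing key, None, and an empty string all as "not set".
    return config.get(BASE_URL) or config.get(LEGACY_BASE_URL) or None
```

Using `or` rather than an explicit key check means an empty string saved under base_url also falls through to the legacy key, which corresponds to the "None or empty" branch in the flowchart. When neither key holds a value the helper returns None, leaving the choice of endpoint to the LlamaParse client.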
jaseemjaskp left a comment
PR Review Toolkit — summary (6 agents: Code Reviewer, Silent Failure Hunter, Type Design, PR Test Analyzer, Comment Analyzer, Code Simplifier)
Verdict: the one-line rename is correct and fixes a real bug. Before this PR the schema published the key url while the adapter reads base_url (constants.py:5 → LlamaParseConfig.BASE_URL = "base_url", consumed at llama_parse.py:55). So the configured Base URL never reached the adapter — it resolved to None and the SDK silently fell back to its default endpoint. This PR aligns the two. Blast radius is fully contained: LlamaParseConfig is referenced only within llama_parse.py; no frontend/backend code reads the literal url key.
Prioritised findings
P1 — Backward-compat / no migration (llama_parse.py:55, outside diff). Adapter instances saved before this change persist the value under the old url key and have no base_url, so config.get("base_url") returns None for them. Cloud users are unaffected (None → default cloud endpoint), but self-hosted / custom-endpoint users silently revert to https://api.cloud.llamaindex.ai with no log. Fix options: (a) data migration renaming the persisted key, or (b) legacy fallback: self.config.get(LlamaParseConfig.BASE_URL) or self.config.get("url"). Moot if this adapter has no released configs yet — please confirm. (Inline note on schema line 21.)
P2 — Pre-existing broad except Exception mislabels errors (llama_parse.py:94-98, outside diff). Any non-ConnectError (401/403/404 from a wrong base_url, etc.) is hard-coded as "invalid API Key", and line 96 uses a non-f-string "...: {exe}" so the exception detail is never logged. Not introduced here, but the rename makes these failures more likely. Recommend fixing the f-string and not asserting a guessed cause.
P3 — No test coverage (sdk1/tests). Zero tests reference LlamaParse or this schema. A cheap, network-free regression test mirroring test_gemini_embedding.py:31-53 would pin the schema-key↔constant mapping that just drifted: assert LlamaParseConfig.BASE_URL in schema['properties'], 'url' not in properties, required == {'api_key'}. Note: only assert the subset of constants the adapter actually reads from self.config (NUM_WORKERS/LANGUAGE are not in the schema). Criticality ~7/10.
P4 — title still reads "URL" (schema line 23, inline). Field is now base_url and description says "Base URL"; suggest "title": "Base URL".
P5 — Nits (optional): "format": "url" is non-standard JSON Schema — standard is "uri" (schema line 24, inline). Description (line 26) mis-cases "llama Parse" and drops the article vs line 19 — suggest "...of the Llama Parse server.". api_key block (line 14) is indented 8 spaces vs 6 for siblings. All pre-existing.
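The logging problem described in P2 comes down to a missing f prefix. The sketch below contrasts the two forms; the message text and function names are illustrative assumptions, not the strings used in llama_parse.py.

```python
def broken_message(exe: Exception) -> str:
    # Plain string literal: "{exe}" is emitted verbatim, so the cause is lost.
    return "LlamaParse request failed: {exe}"


def fixed_message(exe: Exception) -> str:
    # f-string: the exception type and text are interpolated, and the message
    # reports what happened without guessing that the API key is at fault.
    return f"LlamaParse request failed ({type(exe).__name__}): {exe}"


err = RuntimeError("401 Unauthorized")
print(broken_message(err))  # LlamaParse request failed: {exe}
print(fixed_message(err))   # LlamaParse request failed (RuntimeError): 401 Unauthorized
```

Including the exception type keeps a 401 caused by a wrong base_url distinguishable from a genuinely invalid key when reading the logs.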
Type Design / Code Simplifier: schema is well-formed; base_url optional-with-default is the right choice (not in required). Nothing to simplify in the rename itself. Note llama_parse is now the lone x2text adapter using base_url vs url in siblings — acceptable since its code requires base_url.
No blocking issues. Recommend addressing P1 (or confirming it's moot) and P4 before merge; P2/P3 are good follow-ups.
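A network-free regression test along the lines of P3 might look like the sketch below. The inline schema is an abbreviated stand-in assumed for illustration; a real test would load json_schema.json from the adapter package and import LlamaParseConfig instead of redefining the constant. The function name is hypothetical.

```python
import json

BASE_URL = "base_url"  # mirrors LlamaParseConfig.BASE_URL

# Abbreviated stand-in for the adapter's json_schema.json (assumption).
SAMPLE_SCHEMA = json.loads("""
{
  "required": ["api_key"],
  "properties": {
    "api_key": {"type": "string"},
    "base_url": {"type": "string", "format": "uri", "title": "Base URL"}
  }
}
""")


def check_schema_matches_constants(schema: dict) -> None:
    """Pin the schema-key to constant mapping that drifted in #1972."""
    properties = schema["properties"]
    # The key the adapter reads must be published by the schema.
    assert BASE_URL in properties
    # The old key must not reappear as a schema property.
    assert "url" not in properties
    # base_url stays optional; only the API key is required.
    assert set(schema["required"]) == {"api_key"}


check_schema_matches_constants(SAMPLE_SCHEMA)
```

Under pytest the same assertions would sit in a test function, with the schema read from the packaged file rather than an inline string.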
- Fall back to the legacy 'url' key so LlamaParse adapter instances saved before the base_url rename (#1972) keep their configured endpoint instead of silently reverting to the default US endpoint.
- Update schema title 'URL' -> 'Base URL' and format 'url' -> 'uri'.

Addresses review feedback on #2019.
What
- Rename the schema field url to base_url in json_schema.json.

Why
- The LlamaParse x2text adapter UI schema defined the base URL field as url, but the adapter code reads it via LlamaParseConfig.BASE_URL ("base_url"). The mismatch meant the value a user entered was stored under url and never read, so LlamaParse always fell back to its hardcoded US endpoint (https://api.cloud.llamaindex.ai).
- EU-region users therefore hit 401 Unauthorized.

How
- Rename "url" → "base_url" in unstract/sdk1/.../llama_parse/src/static/json_schema.json so it matches constants.py and the lookup in llama_parse.py.

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
- Adapter instances saved before this change keep their value under the old url key (configs are stored Fernet-encrypted in adapter_instance.adapter_metadata_b), so it will not auto-migrate. Those few users simply need to re-open and re-save the adapter to populate base_url. A data migration was considered but intentionally skipped to keep the change minimal.

Database Migrations

Env Config

Relevant Docs

Related Issues or PRs
- url field in json_schema.json doesn't match base_url config key, breaking EU region support #1972

Dependencies Versions

Notes on Testing
- Configure the LlamaParse adapter with the EU-region base URL (https://api.cloud.eu.llamaindex.ai) and confirm extraction succeeds instead of returning 401.

Screenshots

Checklist
- I have read and understood the Contribution Guidelines.