feat(file_processors): add remote docling-serve provider #5412
franciscojavierarceo merged 7 commits into ogx-ai:main from
Conversation
Force-pushed from f41025e to b62d9f0
Add a remote file processor provider that delegates document conversion and chunking to a Docling Serve instance, enabling GPU-accelerated layout-aware document parsing for real-time RAG applications. Signed-off-by: Alina Ryan <aliryan@redhat.com>
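The delegation described here can be sketched as follows. The function and helper names (`get_content`, `convert`) are assumptions for illustration, not the PR's actual code: the provider fetches the file's bytes via the Files API, hands them to Docling Serve for conversion and chunking, and returns the resulting chunks.

```python
# Sketch of the delegation flow (helper names are assumptions):
# 1. fetch file bytes from the Files API dependency,
# 2. send them to Docling Serve for conversion + chunking,
# 3. return the chunks to the caller.
async def process_file(files_api, docling_client, file_id: str):
    content = await files_api.get_content(file_id)   # assumed Files API helper
    result = await docling_client.convert(content)   # assumed Docling Serve call
    return result["chunks"]
```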
headers = self._get_headers()

options = {
    "to_formats": '["md"]',
feels like this could go in FileProcessorConfig but can be a follow up PR later
will add in follow-up
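A minimal sketch of what that follow-up could look like, moving the hardcoded options into provider config. Field names, defaults, and the dataclass form are assumptions, not the PR's actual FileProcessorConfig:

```python
import json
from dataclasses import dataclass, field

# Hypothetical follow-up sketch: lift the hardcoded request options into
# the provider config (names and defaults are assumptions).
@dataclass
class DoclingServeOptions:
    base_url: str = "http://localhost:5001/v1"
    to_formats: list = field(default_factory=lambda: ["md"])
    timeout_seconds: float = 300.0

    def request_options(self) -> dict:
        # Docling Serve receives to_formats as a JSON-encoded list string,
        # matching the hardcoded '["md"]' in the diff above.
        return {"to_formats": json.dumps(self.to_formats)}
```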
…cessor Use SecretStr for api_key config field with get_secret_value() in headers, validate files_api dependency is non-None in get_adapter_impl, type files_api as Files instead of Any, use file_id as document_id when available, simplify filename fallback to "upload", and include /v1 in base_url default to match standard Docling Serve API convention. Signed-off-by: Alina Ryan <aliryan@redhat.com>
…ocessor Add detailed description to the docling-serve provider spec including features, usage examples with Docker and run.yaml, and links to the Docling Serve documentation repository. Signed-off-by: Alina Ryan <aliryan@redhat.com>
Signed-off-by: Alina Ryan <aliryan@redhat.com>
spoke offline with @franciscojavierarceo - I'm going to complete a speed/scale analysis test of some PDFs, will post results here
Performance Benchmark Results Benchmarked the remote::docling-serve file processor against a local Docling Serve instance using real-world PDFs from a mixed corpus (103 files, 49KB–63MB). All tests used chunking_strategy: auto (Docling's HybridChunker, which splits documents at semantic boundaries like headings and sections) via POST /v1alpha/file-processors/process.
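The single-file numbers below come from timing sequential requests; a minimal harness sketch of that measurement (the `send` callable standing in for the actual POST to /v1alpha/file-processors/process is an assumption, not the benchmark's real code):

```python
import time

def time_sequential(send, paths):
    # Issue one request per file, back to back (concurrency=1), and
    # record wall-clock latency for each. `send(path)` performs the
    # actual POST /v1alpha/file-processors/process call and blocks
    # until the response arrives.
    latencies = []
    for path in paths:
        start = time.perf_counter()
        send(path)
        latencies.append(time.perf_counter() - start)
    return latencies
```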
Single-file latency (concurrency=1)
Behavior under concurrent load
Deep dive: large file failures The large file used was 37-02-FullBook.pdf — a Johns Hopkins APL Technical Digest (academic research papers). 86 pages, 63.4MB, 297 embedded images (charts, diagrams, photos), created
Why they fail: Each request forces Docling Serve to load a 63MB PDF into memory, extract 297 images, run layout analysis on 86 pages, and chunk the result. Sequentially this works fine.
Observations
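One way to avoid piling many such heavyweight conversions onto the server at once is to cap in-flight requests on the client side. A sketch using an asyncio semaphore; the limit value and function names are assumptions, not something the PR implements:

```python
import asyncio

# Sketch: bound concurrent conversions so several 63MB PDFs are never
# in flight at once (the limit of 4 is an illustrative assumption).
async def process_all(paths, process_one, max_in_flight=4):
    sem = asyncio.Semaphore(max_in_flight)

    async def guarded(path):
        async with sem:
            return await process_one(path)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(guarded(p) for p in paths))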
Test setup
    "to_formats": '["md"]',
}

async with httpx.AsyncClient(timeout=300.0) as client:
In a follow-up we can make the timeout part of the file processor config.
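A small sketch of that follow-up: resolve the timeout from config, falling back to the current hardcoded 300.0 when it is unset. The `timeout` attribute name is an assumption:

```python
DEFAULT_TIMEOUT = 300.0  # current hardcoded httpx.AsyncClient timeout

def resolve_timeout(config) -> float:
    # Prefer a configured timeout (attribute name is an assumption);
    # fall back to the existing hardcoded default otherwise.
    value = getattr(config, "timeout", None)
    return float(value) if value is not None else DEFAULT_TIMEOUT
```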
What does this PR do?
Adds a remote file processor provider that delegates document conversion and chunking to a Docling Serve instance, enabling GPU-accelerated layout-aware document parsing for real-time RAG applications.
Test Plan
End-to-end RAG demo
Prerequisites:
docling_serve_rag_config.yaml
Start Llama Stack server:
OLLAMA_URL=http://localhost:11434/v1 llama stack run docling_serve_rag_config.yaml --port 8321

RAG Pipeline
Upload PDF
Process with docling-serve
Create vector store
Insert file into vector store
Verify indexing
Search the vector store
RAG: retrieve context and generate answer
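The pipeline steps above can be sketched as the requests each step would send. The file-processor path comes from the benchmark notes in this PR; the vector-store paths follow the OpenAI-compatible convention Llama Stack exposes; the payload fields are illustrative assumptions, not verified request schemas:

```python
BASE = "http://localhost:8321"

# Sketch: one (method, path, payload) tuple per pipeline step above.
# Only /v1alpha/file-processors/process is taken from this PR; the
# vector-store paths and payload fields are assumptions.
def rag_pipeline_requests(file_id: str, vector_store_id: str, question: str):
    return [
        # Process with docling-serve (chunking_strategy: auto, as benchmarked)
        ("POST", f"{BASE}/v1alpha/file-processors/process",
         {"file_id": file_id, "chunking_strategy": "auto"}),
        # Insert file into vector store
        ("POST", f"{BASE}/v1/vector_stores/{vector_store_id}/files",
         {"file_id": file_id}),
        # Search the vector store to retrieve context for generation
        ("POST", f"{BASE}/v1/vector_stores/{vector_store_id}/search",
         {"query": question}),
    ]
```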