DocSpec is a streaming document conversion library. It converts between DOCX, ODT, RTF, HTML, Markdown, BlockNote JSON, and Pandoc native: event by event, byte by byte, without buffering the world. Built in Rust for memory-conscious systems, from microcontrollers to servers.
See our Manifesto for what we stand for: memory extremism, streaming-first design, and the belief that software should earn every byte it uses.
DocSpec works through a pipeline of readers and writers. A reader (EventSource) parses a document and emits events: StartParagraph, Text, EndParagraph, StartHeading, etc. A writer (EventSink) consumes these events and produces output in the target format.
The architecture is fully decoupled. Any reader connects to any writer. A DOCX reader can feed a Markdown writer. An HTML reader can feed BlockNote JSON. The events are the contract.
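To make the contract concrete, here is a minimal sketch of how one event stream can feed two different writers. Only the names EventSource, EventSink, and the event names come from the text above; the method names (next_event, handle), the heading-level payloads, EndHeading, ReplaySource, MarkdownSink, HtmlSink, and pump are assumptions made for illustration and are not DocSpec's actual API.

```rust
// Hypothetical names throughout: a sketch of the contract, not DocSpec's real API.

/// A few events. The heading-level payloads are an assumption.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    StartHeading(u8),
    EndHeading(u8),
    StartParagraph,
    EndParagraph,
    Text(String),
}

/// A reader: hands out one event at a time until the document ends.
trait EventSource {
    fn next_event(&mut self) -> Option<Event>;
}

/// A writer: consumes one event at a time.
trait EventSink {
    fn handle(&mut self, event: &Event);
}

/// Toy reader that replays a fixed list of events, standing in for a real parser.
struct ReplaySource {
    events: std::vec::IntoIter<Event>,
}

impl EventSource for ReplaySource {
    fn next_event(&mut self) -> Option<Event> {
        self.events.next()
    }
}

/// Toy Markdown writer. It collects into a String only to keep the sketch short.
#[derive(Default)]
struct MarkdownSink {
    out: String,
}

impl EventSink for MarkdownSink {
    fn handle(&mut self, event: &Event) {
        match event {
            Event::StartHeading(level) => {
                self.out.push_str(&"#".repeat(usize::from(*level)));
                self.out.push(' ');
            }
            Event::Text(text) => self.out.push_str(text),
            Event::EndHeading(_) | Event::EndParagraph => self.out.push_str("\n\n"),
            Event::StartParagraph => {}
        }
    }
}

/// Toy HTML writer with minimal escaping of text content.
#[derive(Default)]
struct HtmlSink {
    out: String,
}

fn escape_html(text: &str) -> String {
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

impl EventSink for HtmlSink {
    fn handle(&mut self, event: &Event) {
        match event {
            Event::StartHeading(level) => self.out.push_str(&format!("<h{level}>")),
            Event::EndHeading(level) => self.out.push_str(&format!("</h{level}>")),
            Event::StartParagraph => self.out.push_str("<p>"),
            Event::EndParagraph => self.out.push_str("</p>"),
            Event::Text(text) => self.out.push_str(&escape_html(text)),
        }
    }
}

/// Connects any source to any sink; neither side knows the other's format.
fn pump(source: &mut dyn EventSource, sink: &mut dyn EventSink) {
    while let Some(event) = source.next_event() {
        sink.handle(&event);
    }
}

fn sample_source() -> ReplaySource {
    let events = vec![
        Event::StartHeading(1),
        Event::Text("Title".to_string()),
        Event::EndHeading(1),
        Event::StartParagraph,
        Event::Text("a < b".to_string()),
        Event::EndParagraph,
    ];
    ReplaySource { events: events.into_iter() }
}

fn to_markdown() -> String {
    let mut sink = MarkdownSink::default();
    pump(&mut sample_source(), &mut sink);
    sink.out
}

fn to_html() -> String {
    let mut sink = HtmlSink::default();
    pump(&mut sample_source(), &mut sink);
    sink.out
}
```

The pump function never names a concrete format, which is the point of the decoupling: adding a new writer means implementing the sink trait once, and every existing reader can feed it.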
To convert a document:
- Create a reader for your input format
- Create a writer for your output format
- Connect them through the event pipeline
- Let the events flow
No buffering. No intermediate representations. No loading the entire document into memory. The document streams through, event by event.
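The four steps above can be sketched with nothing but the standard library. This is an illustration under stated assumptions, not DocSpec's API: LineReader, HtmlWriter, convert, and convert_str are hypothetical names, and the input format is a toy one in which every non-empty line of plain text is a paragraph. The reader holds at most one line of input at a time, and the writer pushes bytes straight to its output.

```rust
use std::collections::VecDeque;
use std::io::{self, BufRead, Write};

#[derive(Debug, PartialEq)]
enum Event {
    StartParagraph,
    Text(String),
    EndParagraph,
}

/// Step 1: a reader (hypothetical). Each non-empty input line becomes a paragraph.
struct LineReader<R> {
    input: R,
    /// Events for the current line only; never more than three entries.
    pending: VecDeque<Event>,
}

impl<R: BufRead> LineReader<R> {
    fn new(input: R) -> Self {
        Self { input, pending: VecDeque::new() }
    }

    /// Returns the next event, Ok(None) at end of input, or the I/O error as soon as it occurs.
    fn next_event(&mut self) -> io::Result<Option<Event>> {
        loop {
            if let Some(event) = self.pending.pop_front() {
                return Ok(Some(event));
            }
            let mut line = String::new();
            if self.input.read_line(&mut line)? == 0 {
                return Ok(None);
            }
            let text = line.trim();
            if !text.is_empty() {
                self.pending.push_back(Event::StartParagraph);
                self.pending.push_back(Event::Text(text.to_string()));
                self.pending.push_back(Event::EndParagraph);
            }
        }
    }
}

/// Step 2: a writer (hypothetical). Writes HTML directly to the output as events arrive.
struct HtmlWriter<W> {
    output: W,
}

impl<W: Write> HtmlWriter<W> {
    fn handle(&mut self, event: Event) -> io::Result<()> {
        match event {
            Event::StartParagraph => self.output.write_all(b"<p>"),
            Event::Text(text) => {
                let escaped = text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;");
                self.output.write_all(escaped.as_bytes())
            }
            Event::EndParagraph => self.output.write_all(b"</p>\n"),
        }
    }
}

/// Steps 3 and 4: connect reader and writer, then let the events flow.
fn convert<R: BufRead, W: Write>(input: R, output: W) -> io::Result<()> {
    let mut reader = LineReader::new(input);
    let mut writer = HtmlWriter { output };
    while let Some(event) = reader.next_event()? {
        writer.handle(event)?;
    }
    writer.output.flush()
}

/// Convenience wrapper for trying the sketch on in-memory text.
fn convert_str(input: &str) -> io::Result<String> {
    let mut out = Vec::new();
    convert(input.as_bytes(), &mut out)?;
    String::from_utf8(out).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Because convert is generic over BufRead and Write, the same loop works with a file, a socket, or standard input and output. Peak memory in this sketch depends on the longest line, not on the size of the document.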
Install the docspec binary:

    cargo install docspec-cli

Convert a document:

    docspec convert input.docx output.md

Start the HTTP API server:

    docspec http

The Docker image (ghcr.io/docspec/api) runs docspec http internally.
- Manifesto — Philosophy and values: memory extremism, streaming design, quality standards
- Architecture — Streaming pipeline design, reader/writer contracts, event model decisions, and how to read the in-code event reference
- Coding Standards — Code style rules, formatting conventions, review checklist
- Contributing — How to contribute, PR process, development workflow
- Testing — Test philosophy, coverage requirements, testing patterns
- Security — Security principles, vulnerability reporting, safe practices
- Agents — Guidance for AI agents analyzing or contributing to this codebase
- Memory Conscious: Every byte allocated must justify its existence. We measure, profile, and optimize relentlessly.
- Streaming First: Data flows event by event. Nothing accumulates. Everything moves.
- Fail Fast: On corruption or error, surface it immediately. No partial output. No silent truncation.
- No Unsafe Code: The workspace forbids unsafe entirely. Safety is not a limitation; it is a foundation.
- Strict Quality: 98% coverage for new and changed executable Rust lines in covered crates. In source code: no unwrap, no expect, no inline #[allow] warning suppressions. Test files (under tests/** and #[cfg(test)] modules) may opt out of unwrap_used/expect_used via crate-level #![allow(...)]; source code stays strict.
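As an illustration of how the fail-fast, no-unsafe, and no-unwrap principles can look in practice, the sketch below propagates errors with the ? operator instead of unwrap or expect. The module, DecodeError, decode_chunk, and total_chars are hypothetical names, and the lint attributes are placed on a module only to keep the example self-contained. How DocSpec actually configures these lints across its workspace is not shown here and is an assumption.

```rust
// Hypothetical sketch: names and lint placement are illustrative, not DocSpec's real setup.
#[forbid(unsafe_code)]
#[deny(clippy::unwrap_used, clippy::expect_used)]
mod strict {
    use std::fmt;

    /// Describes exactly where decoding failed, so the caller can report it.
    #[derive(Debug, PartialEq)]
    pub enum DecodeError {
        InvalidUtf8 { chunk: usize, valid_up_to: usize },
    }

    impl fmt::Display for DecodeError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                DecodeError::InvalidUtf8 { chunk, valid_up_to } => {
                    write!(f, "invalid UTF-8 in chunk {chunk} after byte {valid_up_to}")
                }
            }
        }
    }

    impl std::error::Error for DecodeError {}

    /// Decodes one chunk. Corrupt input is an error, never a silently repaired string.
    pub fn decode_chunk(index: usize, bytes: &[u8]) -> Result<&str, DecodeError> {
        std::str::from_utf8(bytes).map_err(|e| DecodeError::InvalidUtf8 {
            chunk: index,
            valid_up_to: e.valid_up_to(),
        })
    }

    /// Counts characters across all chunks, stopping at the first corrupt chunk.
    pub fn total_chars(chunks: &[&[u8]]) -> Result<usize, DecodeError> {
        let mut total = 0;
        for (index, chunk) in chunks.iter().enumerate() {
            // `?` returns the error immediately; no partial result escapes.
            total += decode_chunk(index, chunk)?.chars().count();
        }
        Ok(total)
    }
}
```

The caller receives either a complete result or a precise error, which matches the rule of no partial output and no silent truncation.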
We chose Rust because it gives us control: memory layout, allocation, lifetimes — without a garbage collector making decisions for us. The borrow checker enforces at compile time what other languages discover at runtime through crashes. Ownership is not a feature; it is a discipline.
DocSpec is under active development. The architecture is stable. The event model is defined. Readers and writers are being implemented incrementally.
See the LICENSE file.