DocSpec

DocSpec is a streaming document conversion library. It converts DOCX, ODT, RTF, HTML, Markdown, BlockNote JSON, and Pandoc native — event by event, byte by byte, without buffering the world. Built in Rust for memory-conscious systems, from microcontrollers to servers.

Philosophy

See our Manifesto for what we stand for: memory extremism, streaming-first design, and the belief that software should earn every byte it uses.

Quick Start

DocSpec works through a pipeline of readers and writers. A reader (EventSource) parses a document and emits events: StartParagraph, Text, EndParagraph, StartHeading, etc. A writer (EventSink) consumes these events and produces output in the target format.

The architecture is fully decoupled. Any reader connects to any writer. A DOCX reader can feed a Markdown writer. An HTML reader can feed a BlockNote JSON writer. The events are the contract.

To convert a document:

  1. Create a reader for your input format
  2. Create a writer for your output format
  3. Connect them through the event pipeline
  4. Let the events flow

No buffering. No intermediate representations. No loading the entire document into memory. The document streams through, event by event.
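The reader/writer pipeline and the four steps above can be sketched roughly as follows. This is a minimal illustration, not the DocSpec API: only the names EventSource, EventSink, and the event names come from this README. The trait methods (next_event, handle), the pump function, and the toy LineReader and MarkdownWriter types are assumptions made up for the example.

```rust
use std::io::{self, Write};
use std::str::Lines;

/// Hypothetical event type; the real DocSpec event model is richer.
#[derive(Debug)]
pub enum Event<'a> {
    StartParagraph,
    Text(&'a str),
    EndParagraph,
    StartHeading { level: u8 },
    EndHeading,
}

/// Assumed reader contract: hand out one event at a time.
pub trait EventSource<'a> {
    fn next_event(&mut self) -> io::Result<Option<Event<'a>>>;
}

/// Assumed writer contract: consume one event at a time.
pub trait EventSink {
    fn handle(&mut self, event: Event<'_>) -> io::Result<()>;
}

/// Steps 3 and 4: connect a source to a sink and let the events flow.
/// Only the current event is held; the first error stops the loop.
pub fn pump<'a>(source: &mut impl EventSource<'a>, sink: &mut impl EventSink) -> io::Result<()> {
    while let Some(event) = source.next_event()? {
        sink.handle(event)?;
    }
    Ok(())
}

/// Where the toy reader is within the current line.
enum State<'a> {
    Idle,
    Body { text: &'a str, heading: bool },
    Close { heading: bool },
}

/// Toy reader: "# " lines become headings, other non-blank lines paragraphs.
pub struct LineReader<'a> {
    lines: Lines<'a>,
    state: State<'a>,
}

impl<'a> LineReader<'a> {
    pub fn new(input: &'a str) -> Self {
        Self { lines: input.lines(), state: State::Idle }
    }
}

impl<'a> EventSource<'a> for LineReader<'a> {
    fn next_event(&mut self) -> io::Result<Option<Event<'a>>> {
        match std::mem::replace(&mut self.state, State::Idle) {
            State::Body { text, heading } => {
                self.state = State::Close { heading };
                Ok(Some(Event::Text(text)))
            }
            State::Close { heading: true } => Ok(Some(Event::EndHeading)),
            State::Close { heading: false } => Ok(Some(Event::EndParagraph)),
            State::Idle => {
                for line in self.lines.by_ref() {
                    if line.trim().is_empty() {
                        continue; // blank lines only separate blocks
                    }
                    let (text, heading) = match line.strip_prefix("# ") {
                        Some(rest) => (rest, true),
                        None => (line, false),
                    };
                    self.state = State::Body { text, heading };
                    let start = if heading { Event::StartHeading { level: 1 } } else { Event::StartParagraph };
                    return Ok(Some(start));
                }
                Ok(None) // input exhausted
            }
        }
    }
}

/// Toy writer: renders events as Markdown into any `io::Write`.
pub struct MarkdownWriter<W: Write> {
    out: W,
}

impl<W: Write> EventSink for MarkdownWriter<W> {
    fn handle(&mut self, event: Event<'_>) -> io::Result<()> {
        match event {
            Event::StartParagraph => Ok(()),
            Event::StartHeading { level } => {
                for _ in 0..level {
                    self.out.write_all(b"#")?;
                }
                self.out.write_all(b" ")
            }
            Event::Text(text) => self.out.write_all(text.as_bytes()),
            Event::EndParagraph | Event::EndHeading => self.out.write_all(b"\n\n"),
        }
    }
}

/// Steps 1 to 4 together. The Vec sink is only for the demo;
/// a file or socket implementing `io::Write` would work the same way.
pub fn to_markdown(input: &str) -> io::Result<String> {
    let mut reader = LineReader::new(input);
    let mut writer = MarkdownWriter { out: Vec::new() };
    pump(&mut reader, &mut writer)?;
    String::from_utf8(writer.out).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

In this sketch the reader borrows text from the input instead of copying it, and the pump loop never holds more than one event. Swapping MarkdownWriter for another EventSink implementation would not require touching the reader.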

CLI

Install the docspec binary:

cargo install docspec-cli

Convert a document:

docspec convert input.docx output.md

Start the HTTP API server:

docspec http

The Docker image (ghcr.io/docspec/api) runs docspec http internally.

Documentation

  • Manifesto — Philosophy and values: memory extremism, streaming design, quality standards
  • Architecture — Streaming pipeline design, reader/writer contracts, event model decisions, and how to read the in-code event reference
  • Coding Standards — Code style rules, formatting conventions, review checklist
  • Contributing — How to contribute, PR process, development workflow
  • Testing — Test philosophy, coverage requirements, testing patterns
  • Security — Security principles, vulnerability reporting, safe practices
  • Agents — Guidance for AI agents analyzing or contributing to this codebase

Core Principles

  • Memory Conscious: Every byte allocated must justify its existence. We measure, profile, and optimize relentlessly.
  • Streaming First: Data flows event by event. Nothing accumulates. Everything moves.
  • Fail Fast: On corruption or error, surface it immediately. No partial output. No silent truncation.
  • No Unsafe Code: The workspace forbids unsafe entirely. Safety is not a limitation; it is a foundation.
  • Strict Quality: 98% coverage for new and changed executable Rust lines in covered crates. Source code uses no unwrap, no expect, and no inline #[allow] warning suppressions. Test code (files under tests/** and #[cfg(test)] modules) may opt out of unwrap_used / expect_used via a crate-level #![allow(...)]; source code stays strict.
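As a rough illustration of the Fail Fast, No Unsafe Code, and Strict Quality principles, the sketch below shows one way such rules can be expressed in Rust. This README does not say how the workspace actually configures them, so the attribute placement is an assumption. unsafe_code is a rustc lint and unwrap_used / expect_used are Clippy lints; ChunkError, decode_chunk, and text_len are hypothetical names invented for the example.

```rust
// Assumed crate-level form, placed at the top of lib.rs or main.rs:
//   #![forbid(unsafe_code)]
//   #![deny(clippy::unwrap_used, clippy::expect_used)]
// Applied here to a single module so the sketch compiles on its own.
#[forbid(unsafe_code)]
#[deny(clippy::unwrap_used, clippy::expect_used)]
pub mod strict {
    use std::fmt;

    /// Hypothetical error type: corruption is reported, never papered over.
    #[derive(Debug, PartialEq, Eq)]
    pub enum ChunkError {
        InvalidUtf8 { valid_up_to: usize },
    }

    impl fmt::Display for ChunkError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                ChunkError::InvalidUtf8 { valid_up_to } => {
                    write!(f, "invalid UTF-8 after {valid_up_to} valid bytes")
                }
            }
        }
    }

    impl std::error::Error for ChunkError {}

    /// Fail fast: reject bad bytes instead of lossily replacing them.
    /// The returned &str borrows from the input, so nothing is copied.
    pub fn decode_chunk(bytes: &[u8]) -> Result<&str, ChunkError> {
        std::str::from_utf8(bytes)
            .map_err(|e| ChunkError::InvalidUtf8 { valid_up_to: e.valid_up_to() })
    }

    /// `?` propagates the first error to the caller; no unwrap, no expect.
    pub fn text_len(chunks: &[&[u8]]) -> Result<usize, ChunkError> {
        let mut total = 0;
        for chunk in chunks {
            total += decode_chunk(chunk)?.len();
        }
        Ok(total)
    }
}
```

Using forbid rather than deny for unsafe_code means a nested #[allow(unsafe_code)] cannot relax the rule later, which matches the "forbids unsafe entirely" wording above.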

Why Rust

We chose Rust because it gives us control over memory layout, allocation, and lifetimes, without a garbage collector making decisions for us. The borrow checker enforces at compile time what other languages discover at runtime through crashes. Ownership is not a feature; it is a discipline.

Status

DocSpec is under active development. The architecture is stable. The event model is defined. Readers and writers are being implemented incrementally.

License

See the LICENSE file.
