Proposal for Improving simdjson on RISC-V: A First-Class RVV Backend (with measurable wins) #2581
Rejean-McCormick
started this conversation in
Ideas
Disclaimer: This analysis has been made by AI, under my guidance.
0) Motivation and current state (from the codebase)
The project already has dedicated CI workflows that build and run tests under QEMU with RVV enabled, including multiple VLEN configurations (e.g., VLEN=128, 256, 1024) and toolchains (clang/gcc).
The preprocessor layer already detects RVV-capable builds via `SIMDJSON_IS_RVV` (and related macros).

However, RVV is not currently treated as a first-class simdjson “implementation” in the same way as `haswell`, `icelake`, `arm64`, etc. Concretely:
- The CMake implementation list has no `rvv` entry.
- The builtin implementation selection has no `rvv` case.
- The runtime registry (`get_available_implementation_pointers()`) enumerates known implementations but has no RVV entry.
- The `SIMDJSON_SINGLE_IMPLEMENTATION` macro sums the known implementations and would need RVV integration if we add it.

High-level goal: add a real RVV backend that is selectable via the existing runtime dispatch mechanism, passes the existing RVV CI, and shows clear speedups (first UTF-8 validation, then Stage 1 structural indexing).
1) Deliverables and acceptance criteria
Deliverable A — “Plumbing” RVV backend (compiles, selectable, no perf regression)
Acceptance:
- `SIMDJSON_FORCE_IMPLEMENTATION=rvv` selects RVV at runtime (same mechanism already used in dispatch).

Deliverable B — RVV UTF-8 validation (first measurable win)
Acceptance:
Deliverable C — RVV Stage 1 structural indexing (main parsing throughput win)
Acceptance:
2) Work plan (incremental PRs, each independently reviewable)
PR 1 — Make RVV a first-class “implementation” (no new SIMD yet)
Why: This reduces review risk: we land scaffolding first, then performance code.
Changes:
- CMake integration
  - Add `rvv` to the CMake implementation list (currently: `fallback westmere haswell icelake arm64 ppc64`).
- Builtin integration
- Runtime dispatch / registry
  - Add `get_rvv_singleton()` and register it in `get_available_implementation_pointers()` (where all implementations are listed).
- Single-implementation logic
  - Extend the `SIMDJSON_SINGLE_IMPLEMENTATION` accounting to include RVV (currently sums only existing impls).

Note: Platform detection macros already exist (`SIMDJSON_IS_RVV`). So PR 1 is primarily wiring.
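To make the wiring concrete, here is a minimal, self-contained sketch of a singleton-per-backend registry with selection by forced name. It is a toy model, not simdjson's code: `toy_implementation`, `toy_rvv_singleton`, `toy_available`, and `toy_select` are hypothetical names invented for illustration, and the real `implementation` class and registry carry more state (required instruction sets, descriptions, and so on).

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a backend descriptor; NOT simdjson's real class.
struct toy_implementation {
  std::string name;
  bool supported_by_cpu;  // real code would query CPU features at runtime
};

// One function-local singleton per backend (names are illustrative).
inline const toy_implementation* toy_fallback_singleton() {
  static const toy_implementation impl{"fallback", true};
  return &impl;
}

inline const toy_implementation* toy_rvv_singleton() {
#if defined(__riscv_vector)
  // Compiled with the V extension enabled: treat RVV as usable.
  static const toy_implementation impl{"rvv", true};
#else
  // Still registered so it can be named, but never auto-selected.
  static const toy_implementation impl{"rvv", false};
#endif
  return &impl;
}

// The registry: adding a backend means adding one entry here.
// Entries are ordered from most to least preferred.
inline const std::vector<const toy_implementation*>& toy_available() {
  static const std::vector<const toy_implementation*> list{
      toy_rvv_singleton(), toy_fallback_singleton()};
  return list;
}

// If forced_name is non-null, pick the backend with that name;
// otherwise pick the first backend the CPU supports.
// Unknown names and empty matches end up on the fallback.
inline const toy_implementation* toy_select(const char* forced_name) {
  for (const toy_implementation* impl : toy_available()) {
    if (forced_name != nullptr) {
      if (impl->name == forced_name) return impl;
    } else if (impl->supported_by_cpu) {
      return impl;
    }
  }
  return toy_fallback_singleton();
}
```

A caller could pass `std::getenv("SIMDJSON_FORCE_IMPLEMENTATION")` (from `<cstdlib>`) as `forced_name`; how simdjson itself treats an unknown or unsupported forced name is not modeled here.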
PR 2 — RVV implementation skeleton + “generic” functional parity
Goal: Provide a working RVV implementation class that is correct even before full vector optimizations.
Changes:
- Add `include/simdjson/rvv/implementation.h` and `src/rvv.cpp` (mirroring patterns used by other impls).
- Implement required virtuals:
  - `create_dom_parser_implementation`
  - `minify`
  - `validate_utf8`
- For the initial pass:
  - `validate_utf8` can call the generic path to ensure correctness; then PR 3 replaces it with RVV-accelerated code.

Acceptance:
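As an illustration of the "correct first, fast later" idea in PR 2, the sketch below shows a backend class whose `validate_utf8` simply delegates to a scalar checker. Everything here is hypothetical: `toy_base`, `toy_rvv_implementation`, and `scalar_validate_utf8` are invented names, and the scalar routine is a plain textbook validator standing in for simdjson's generic path, not a copy of it.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar UTF-8 check standing in for "the generic path" (illustrative only).
// Rejects stray continuation bytes, truncated sequences, overlong encodings,
// UTF-16 surrogates, and code points above U+10FFFF.
inline bool scalar_validate_utf8(const char* buf, size_t len) {
  const uint8_t* p = reinterpret_cast<const uint8_t*>(buf);
  size_t i = 0;
  while (i < len) {
    const uint8_t b = p[i];
    if (b < 0x80) { i++; continue; }  // ASCII byte
    size_t n;      // number of continuation bytes expected
    uint32_t cp;   // code point being decoded
    uint32_t min;  // smallest code point allowed for this sequence length
    if ((b & 0xE0) == 0xC0)      { n = 1; cp = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { n = 2; cp = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { n = 3; cp = b & 0x07; min = 0x10000; }
    else return false;               // stray continuation or invalid lead byte
    if (len - i <= n) return false;  // sequence runs past the end of input
    for (size_t k = 1; k <= n; k++) {
      if ((p[i + k] & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (p[i + k] & 0x3F);
    }
    if (cp < min) return false;                      // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate range
    if (cp > 0x10FFFF) return false;                 // beyond Unicode
    i += n + 1;
  }
  return true;
}

// Hypothetical minimal interface; the real base class has more virtuals.
struct toy_base {
  virtual ~toy_base() = default;
  virtual bool validate_utf8(const char* buf, size_t len) const noexcept = 0;
};

// Skeleton backend: functionally correct, with no vector code yet.
struct toy_rvv_implementation final : toy_base {
  bool validate_utf8(const char* buf, size_t len) const noexcept override {
    return scalar_validate_utf8(buf, len);  // a later PR would swap in RVV code
  }
};
```

Keeping a scalar routine like this around also gives later vector code an oracle to compare against in tests.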
PR 3 — RVV UTF-8 validation (first targeted SIMD optimization)
Goal: Implement a true RVV fast path for `validate_utf8()`.

Approach:
- Start with a pragmatic vector strategy that matches existing internal abstractions:
- Use the existing `SIMDJSON_IS_RVV` gate for compilation.
- Add a benchmark mode that:
  - calls `validate_utf8` repeatedly on representative data

Measurement:
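One possible shape for such a measurement loop is sketched below. It assumes the validator can be passed as a plain function pointer; `bench_validator`, `bench_result`, and `ascii_only` are hypothetical names, and `ascii_only` is only a placeholder so the sketch is self-contained. A real harness would also want warm-up runs, repeated trials, and numbers taken on hardware rather than under QEMU.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Any function with the validate_utf8 shape can be plugged in here.
using validator_fn = bool (*)(const char*, size_t);

struct bench_result {
  double gigabytes_per_second;  // throughput over all iterations
  bool all_valid;               // true only if every call returned true
};

// Placeholder validator (accepts ASCII only) so the sketch compiles alone.
inline bool ascii_only(const char* buf, size_t len) {
  for (size_t i = 0; i < len; i++) {
    if (static_cast<unsigned char>(buf[i]) >= 0x80) return false;
  }
  return true;
}

// Calls `fn` on the same buffer `iterations` times and reports throughput.
inline bench_result bench_validator(validator_fn fn, const std::string& data,
                                    int iterations) {
  bool all_valid = true;
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; i++) {
    // Consuming the result keeps the call from being optimized away.
    all_valid = fn(data.data(), data.size()) && all_valid;
  }
  const auto stop = std::chrono::steady_clock::now();
  const double seconds = std::chrono::duration<double>(stop - start).count();
  const double bytes = static_cast<double>(data.size()) * iterations;
  const double gbps = seconds > 0.0 ? bytes / seconds / 1e9 : 0.0;
  return bench_result{gbps, all_valid};
}
```

Running the same call once with the fallback validator and once with the RVV one, on identical input, gives the before/after pair for a perf PR.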
PR 4 — RVV mask/bitset extraction primitives (enabler for Stage 1)
Problem: Stage 1 is dominated by producing bitmasks for structurals, quotes, backslashes, etc. RVV is predicate-driven and does not provide an x86-like `movemask`.

Plan:
- Implement a stable, correct predicate→bitset lowering:
  - `uint64_t` (or a small struct of multiple `uint64_t` depending on chunk size)
- Add a microbenchmark for mask packing alone (so we can see if we need `zvbb`-assisted optimization).

Optional fast path:
- The code currently sets `SIMDJSON_HAS_ZVBB_INTRINSICS` to 0 with a comment about detection limitations.
- Yet CI builds with `-march=rv64gcv_zvbb` in at least one workflow.
- Proposal: add a build-time opt-in (CMake option or compile definition) that enables `zvbb` paths when the compiler supports them (without relying on fragile autodetection).

PR 5 — RVV Stage 1 structural indexing
Goal: Accelerate the core of parsing by porting Stage 1 to RVV.
Approach:
Reuse the existing stage1 algorithm and swap in RVV SIMD primitives + bitmask extraction from PR 4.
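The bitmask extraction this step depends on can be pinned down with a portable scalar reference that any RVV lowering must match bit for bit. The sketch below is such a reference, not vector code: `eq_mask_reference`, `wide_mask`, and `eq_mask_wide` are hypothetical names, and the convention that bit `i` corresponds to byte `i` is an assumption chosen to mirror the usual movemask layout.

```cpp
#include <cstddef>
#include <cstdint>

// Reference lowering: bit i of the result is set when block[i] == needle.
// Only the first min(len, 64) bytes are examined.
inline uint64_t eq_mask_reference(const uint8_t* block, size_t len,
                                  uint8_t needle) {
  uint64_t mask = 0;
  for (size_t i = 0; i < len && i < 64; i++) {
    mask |= static_cast<uint64_t>(block[i] == needle) << i;
  }
  return mask;
}

// For chunks wider than 64 bytes, a small struct of several 64-bit words.
template <size_t Words>
struct wide_mask {
  uint64_t w[Words];
};

// Packs Words * 64 bytes into Words consecutive 64-bit masks.
template <size_t Words>
inline wide_mask<Words> eq_mask_wide(const uint8_t* chunk, uint8_t needle) {
  wide_mask<Words> out{};
  for (size_t k = 0; k < Words; k++) {
    out.w[k] = eq_mask_reference(chunk + 64 * k, 64, needle);
  }
  return out;
}
```

A vectorized version can then be fuzzed against `eq_mask_reference` across the VLEN configurations CI already covers, since the expected bits do not depend on vector length.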
Validate with:
Acceptance:
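Once per-block bitmasks exist, the rest of Stage 1 is largely architecture-neutral bit manipulation, which is why reusing the existing algorithm is plausible. The sketch below shows two of those steps in simplified form: a prefix XOR that turns a quote mask into an "inside string" mask with a carry between 64-byte blocks, and flattening a bitmask into byte offsets. It is a hedged approximation, not simdjson's code: the names are hypothetical, and it assumes escaped quotes have already been removed from the quote mask.

```cpp
#include <cstdint>
#include <vector>

// Bit i of the result is the XOR of bits 0..i of the input.
inline uint64_t prefix_xor(uint64_t bitmask) {
  bitmask ^= bitmask << 1;
  bitmask ^= bitmask << 2;
  bitmask ^= bitmask << 4;
  bitmask ^= bitmask << 8;
  bitmask ^= bitmask << 16;
  bitmask ^= bitmask << 32;
  return bitmask;
}

// Carry between consecutive 64-byte blocks.
struct string_state {
  uint64_t prev_in_string = 0;  // all ones if the last block ended in a string
};

// Marks bytes from an opening quote (inclusive) to its closing quote
// (exclusive). `unescaped_quotes` must already exclude escaped quotes.
inline uint64_t in_string_mask(uint64_t unescaped_quotes, string_state& st) {
  const uint64_t in_string = prefix_xor(unescaped_quotes) ^ st.prev_in_string;
  st.prev_in_string = (in_string >> 63) ? ~uint64_t(0) : uint64_t(0);
  return in_string;
}

// Converts set bits into byte offsets, with `base` added to each position.
inline std::vector<uint32_t> flatten_bits(uint64_t bits, uint32_t base) {
  std::vector<uint32_t> out;
  for (uint32_t i = 0; bits != 0; i++, bits >>= 1) {
    if (bits & 1) out.push_back(base + i);
  }
  return out;
}
```

For the input `{"a:":1}` the quote mask is `0x12` and the colon mask is `0x28`; masking the colons with the complement of `in_string_mask` leaves only the colon at offset 5, the one outside the string.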
PR 6 — RVV Stage 2 (strings, numbers) and “polish”
Goal: Make RVV a complete backend, not just stage1 + utf8.
Items:
3) Benchmarking/reporting plan (what I would submit with each perf PR)
A small, reproducible benchmark script that:
A short “before/after” table for:
4) Risk management and review strategy
5) Optional sidequests (high ROI, can be done in parallel)
- `zvbb` enablement story (explicit opt-in / CMake flag), since CI already exercises `rv64gcv_zvbb`.

If you want, I can rewrite the plan into a PR-by-PR checklist (exact files to add/modify, plus minimal API surface for the RVV simd layer) aligned with simdjson’s current dispatch and build wiring.