Commit e5ed7b5

Moult and claude committed
ifcviewer: GPU LOD0/LOD1 selection in compute cull (step 3c)
The compact shader now computes per-instance pixel radius and routes survivors to LOD1 buckets when the projected sphere falls below the LOD1 threshold (default 30 px, same as CPU path, tunable via IFC_LOD1_PX).

Layout expanded from 2 to 4 buckets per mesh:

    [0..M)    fwd_lod0
    [M..2M)   fwd_lod1
    [2M..3M)  rev_lod0
    [3M..4M)  rev_lod1

Two MDIs per model: CCW for [0..2M), CW for [2M..4M). Per-mesh has_lod1 flags live in a new gpu_mesh_flags_ssbo (binding 4).

Contribution cull refactored: the compact shader now computes pixelRadius() once and uses it for both the min_pixel_radius rejection and LOD routing, matching the CPU path's logic.

Visible-buffer worst case is 2 × total_instances (each LOD bucket reserves the full fwd/rev capacity per mesh, since LOD selection is dynamic).

Tri count drops ~60% on the test dataset (53M → 22M) thanks to LOD1 decimated meshes. FPS recovers from 16 to 36 despite 690k sub_draws (4M layout). MDI compaction remains the final perf fix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
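The routing described in the commit message can be sketched on the CPU side in C++ (the real logic lives in the GLSL compact shader, which is not shown here). All names below (`pixel_radius`, `bucket_index`, `route_instance`, `CullParams`) are hypothetical, and the projection formula is an assumption; only the 4-bucket layout, the single pixel-radius computation shared by contribution cull and LOD selection, and the `has_lod1` gating come from the commit text.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <optional>

// Assumed formula (not taken from the commit): radius in pixels of a bounding
// sphere under a symmetric perspective projection.
inline float pixel_radius(float world_radius, float view_depth,
                          float fov_y_radians, float viewport_height_px) {
    if (view_depth <= 0.0f) {
        // Sphere centre at or behind the eye: treat as arbitrarily large.
        return std::numeric_limits<float>::infinity();
    }
    const float focal_px =
        viewport_height_px / (2.0f * std::tan(fov_y_radians * 0.5f));
    return world_radius * focal_px / view_depth;
}

// Command slot for a mesh in the 4M layout:
//   [0..M) fwd_lod0, [M..2M) fwd_lod1, [2M..3M) rev_lod0, [3M..4M) rev_lod1
inline std::uint32_t bucket_index(std::uint32_t mesh, std::uint32_t mesh_count,
                                  bool reflected, bool use_lod1) {
    assert(mesh < mesh_count);
    const std::uint32_t bucket = (reflected ? 2u : 0u) + (use_lod1 ? 1u : 0u);
    return bucket * mesh_count + mesh;
}

// Hypothetical parameter block; field names mirror the commit's wording.
struct CullParams {
    float min_pixel_radius;      // contribution-cull rejection threshold
    float lod1_pixel_threshold;  // 30 px by default per the commit
};

// Returns the command slot for a surviving instance, or nullopt if the
// instance is rejected by contribution cull. The pixel radius is computed
// once by the caller and reused for both decisions.
inline std::optional<std::uint32_t> route_instance(
    float px, std::uint32_t mesh, std::uint32_t mesh_count,
    bool reflected, bool has_lod1, const CullParams& params) {
    if (px < params.min_pixel_radius) {
        return std::nullopt;
    }
    // Meshes without a decimated variant always stay in their LOD0 bucket.
    const bool use_lod1 = has_lod1 && px < params.lod1_pixel_threshold;
    return bucket_index(mesh, mesh_count, reflected, use_lod1);
}
```

With `M = 10`, mesh 3 lands in slot 3 (fwd_lod0), 13 (fwd_lod1), 23 (rev_lod0), or 33 (rev_lod1), so the CCW draw covers slots 0 to 19 and the CW draw covers slots 20 to 39.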
1 parent 069ef20 commit e5ed7b5

3 files changed

Lines changed: 168 additions & 102 deletions


src/ifcviewer/README.md

Lines changed: 12 additions & 8 deletions
@@ -795,14 +795,18 @@ single giant model / <18 cores CPU BVH trv Phase 3E GPU cull (plann
  - [x] Event-driven rendering (zero idle CPU/GPU, cull skipped on still frames)
  - [~] **Phase 3E — GPU-side compute-shader culling** (in progress)
  - [x] 3a: `IFC_GPU_CULL=1` drives rendering via compute cull (frustum +
- contribution, single bucket per mesh). Correctness matches CPU
- path; perf regressed — we submit one sub-draw per mesh even
- when `instanceCount=0`. Fix is MDI compaction via
- `glMultiDrawElementsIndirectCount`, deferred to 3a-followup so
- we don't pull a GL 4.6 entrypoint loader into this commit.
- - [ ] 3a-followup: compact non-empty commands, use count-buffer MDI
- - [ ] 3b: fwd/rev reflection bucketing on GPU
- - [ ] 3c: LOD0/LOD1 selection on GPU
+ contribution). Perf regressed — submits one sub-draw per mesh
+ even when `instanceCount=0` (CP overhead from empty commands).
+ - [x] 3b: fwd/rev reflection bucketing — compact shader routes by
+ reflected flag into CCW and CW MDI buckets.
+ - [x] 3c: LOD0/LOD1 selection — compact shader computes per-instance
+ pixel radius and routes to LOD1 bucket when below threshold.
+ Per-mesh `has_lod1` flags SSBO. 4 buckets per mesh (fwd/rev ×
+ LOD0/LOD1), 4M commands total, 2 MDIs per model.
  - [ ] 3d: HiZ with same-frame depth pre-pass
+ - [ ] MDI compaction — compact non-empty commands into contiguous
+ buffer, use `glMultiDrawElementsIndirectCount` (GL 4.6 /
+ `ARB_indirect_parameters`). Deferred until all feature buckets
+ land so we can introduce GL 4.6 loading once, cleanly.
  - [ ] Vulkan/MoltenVK backend for macOS
  - [ ] Embedded Python scripting console
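
The pending "MDI compaction" item in the diff above can be illustrated with a CPU-side C++ sketch. This is not the project's implementation, which has not landed yet and would presumably run on the GPU; `compact_range` is a hypothetical helper. The struct mirrors the field order OpenGL documents for `DrawElementsIndirectCommand`. The sketch assumes each winding half (CCW `[0..2M)`, CW `[2M..4M)`) is compacted separately, since the two halves are submitted as separate MDIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Field order follows OpenGL's documented indirect command layout.
struct DrawElementsIndirectCommand {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t baseInstance;
};

// Copies the commands in [begin, end) that have at least one visible instance
// into `out`, preserving order, and returns how many were kept. In a
// count-buffer MDI setup, that number is the draw count the GPU would read
// instead of iterating over every empty command.
inline std::uint32_t compact_range(
    const std::vector<DrawElementsIndirectCommand>& commands,
    std::size_t begin, std::size_t end,
    std::vector<DrawElementsIndirectCommand>& out) {
    assert(begin <= end && end <= commands.size());
    out.clear();
    for (std::size_t i = begin; i < end; ++i) {
        if (commands[i].instanceCount != 0) {
            out.push_back(commands[i]);
        }
    }
    return static_cast<std::uint32_t>(out.size());
}
```

For a model with `M` meshes, a caller would invoke `compact_range(cmds, 0, 2 * M, ccw)` and `compact_range(cmds, 2 * M, 4 * M, cw)`, so the 690k mostly empty sub-draws mentioned in the commit message shrink to only the commands that actually draw something.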
