@@ -694,31 +694,33 @@ thousands and the frame time drops accordingly.
694694
695695##### Known caveats
696696
697- - **Disabled while the camera moves.** The pyramid is aligned to the
698- VP matrix of the frame that produced it. On a moving camera the
699- stored VP no longer matches the current one, and reusing it would
700- pop objects in and out as the stale depth falsely claims they're
701- occluded. The cull now compares `hiz_vp_ == current_vp` and drops
702- HiZ rejection entirely when they differ, so HiZ only contributes on
703- still frames. The honest cost: orbiting, the exact motion where
704- the frame rate tends to dip, gets no HiZ help. A proper fix needs
705- a same-frame depth pre-pass (draw cheap depth, build HiZ from *that*
706- frame's VP, then issue the colour pass against it); deferred to the
707- GPU-compute cull rewrite in Phase 3E where we're touching this code
708- anyway. We also tried a 3-deep PBO ring for async readback (2-frame
709- stale) which produced visible flicker on fast orbits; reverted.
697+ - **Off by default during camera motion (`IFC_HIZ_MOTION=1`).** The pyramid
698+ is aligned to the previous frame's VP. On a moving camera the stale
699+ depth can falsely occlude objects, particularly thin geometry (pipes,
700+ railings) at oblique angles. By default the cull skips HiZ rejection
701+ whenever `hiz_vp_ != current_vp`. Setting `IFC_HIZ_MOTION=1` keeps
702+ HiZ on during motion; in the Phase 3G benchmark this is the single biggest
703+ perf lever (2.9× speedup), and the artifacts are minor and transient
704+ during active orbiting. When the camera stops, a settle recull fires
705+ with `hiz_vp_valid_ = false`, disabling HiZ for that one frame and
706+ re-culling the full scene, so the stationary view is
707+ artifact-free. See Phase 3G for benchmark data.
708+ - **Conservative occlusion test.** The original "max over coarse mip"
709+ test was too aggressive for BIM scenes, where the entire depth range
710+ compresses into 0.99–1.00. It is replaced with "all fine-mip texels must
711+ agree": sample at mip 1, reject only if every texel has depth less
712+ than the AABB's nearest point, and early-out on the first non-occluding
713+ texel. Queries covering >64 texels skip HiZ entirely. This eliminates
714+ most false occlusions at the cost of fewer true rejections.
715+ - **Depth blit replaced with shader downsample.** The original
716+ `glBlitFramebuffer` for scaling the resolved depth to HiZ size
717+ produced `GL_INVALID_VALUE` on some drivers. Replaced with a
718+ fullscreen-triangle shader writing `gl_FragDepth`. The resolve
719+ texture uses `GL_DEPTH24_STENCIL8` to match Qt's default FBO format
720+ (which uses D24S8 even when only depth is requested).
710721- **Readback syncs the GPU.** `glGetTextureImage` is blocking.
711722 Measured cost is well under a millisecond at 256×128; not a
712- bottleneck on the machines tested. Phase 3D's compute-shader cull
713- removes it entirely.
714- - **Doesn't move the needle on overview shots.** Those scenes are
715- CPU-bound on the cull traversal itself, not GPU-bound on drawing,
716- so cutting the drawn-triangle count in half is invisible in the
717- frame time. `hiz_rej` still rises modestly on overviews (the frustum
718- hull contains everything behind visible walls) but saved GPU work
719- is masked by CPU cost. HiZ pays off on interior views, where the
720- GPU *was* the bottleneck. If a project never leaves overview,
721- `IFC_NO_HIZ=1` shaves the ~1 ms of HiZ cost.
723+ bottleneck on the machines tested.
722724- **Transparent geometry would need special handling**, but the
723725 current renderer doesn't have any, so no-op for now.
724726
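The "all fine-mip texels must agree" test from the caveats above can be sketched on the CPU roughly as follows. This is a minimal illustration, not the viewer's code: it assumes a standard depth convention (0 = near, 1 = far) and a row-major float array for mip 1, and the names `HizMip`, `TexelRect`, and `hiz_occluded` are hypothetical. Only the 64-texel cap and the early-out come from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for one HiZ mip level (row-major depths, 0 = near, 1 = far).
struct HizMip {
    int width = 0;
    int height = 0;
    std::vector<float> depth;
};

// Inclusive texel rectangle covered by the AABB's screen-space footprint.
struct TexelRect {
    int x0, y0, x1, y1;
};

// Returns true only if every covered texel holds an occluder strictly nearer
// than the AABB's nearest depth. Oversized queries are never rejected.
inline bool hiz_occluded(const HizMip& mip, TexelRect r, float aabb_nearest_depth,
                         int max_texels = 64) {
    assert(mip.depth.size() == static_cast<std::size_t>(mip.width) * mip.height);
    // Clamp the footprint to the mip bounds.
    if (r.x0 < 0) r.x0 = 0;
    if (r.y0 < 0) r.y0 = 0;
    if (r.x1 >= mip.width) r.x1 = mip.width - 1;
    if (r.y1 >= mip.height) r.y1 = mip.height - 1;
    if (r.x0 > r.x1 || r.y0 > r.y1) return false;  // off-screen: leave it to the frustum test

    const int texels = (r.x1 - r.x0 + 1) * (r.y1 - r.y0 + 1);
    if (texels > max_texels) return false;  // too large: skip HiZ, keep the object

    for (int y = r.y0; y <= r.y1; ++y) {
        for (int x = r.x0; x <= r.x1; ++x) {
            const float occluder = mip.depth[static_cast<std::size_t>(y) * mip.width + x];
            if (!(occluder < aabb_nearest_depth)) return false;  // early-out: one texel disagrees
        }
    }
    return true;  // all texels agree the object is hidden
}
```

Because a single disagreeing texel keeps the object, the test errs toward drawing too much rather than too little, which matches the stated trade-off of fewer true rejections.
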
@@ -961,6 +963,74 @@ doors, windows, pipe fittings) share geometry across placements.
961963 ratio (~ 80 k vs ~ 120 k), confirming per-draw overhead as the
962964 dominant cost.
963965
966+ #### 3G. Motion-adaptive culling + HiZ during motion — ✅ done
967+
968+ The bottleneck during camera orbit is the sheer number of visible
969+ objects and sub_draws. Two complementary strategies address this:
970+
971+ ##### Motion-adaptive contribution culling (`IFC_MIN_PX_MOTION`)
972+
973+ During camera motion, the cull uses a larger pixel-radius threshold to hide
974+ small objects that contribute little at interactive rates. When the
975+ camera stops, a settle recull restores the base threshold and full
976+ detail within one frame. No visual artifacts: objects below the
977+ motion threshold are genuinely tiny on screen.
978+
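As a rough illustration of the threshold switch described above, the sketch below estimates an object's on-screen radius from its bounding sphere and compares it against either the base or the motion threshold. It assumes a symmetric perspective projection. `ContributionSettings`, `projected_radius_px`, and `passes_contribution_cull` are hypothetical names, and the two fields merely stand in for `IFC_MIN_PX` and `IFC_MIN_PX_MOTION`. How the viewer actually measures pixel radius is not stated in this document.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical per-frame culling settings; the two thresholds stand in for
// IFC_MIN_PX (base) and IFC_MIN_PX_MOTION (used while the camera moves).
struct ContributionSettings {
    float base_min_px;
    float motion_min_px;
};

// Approximate on-screen radius, in pixels, of a bounding sphere under a
// symmetric perspective projection (fov_y in radians, distance > 0).
inline float projected_radius_px(float world_radius, float distance,
                                 float fov_y, float viewport_height_px) {
    assert(distance > 0.0f);
    return world_radius * (0.5f * viewport_height_px) /
           (distance * std::tan(0.5f * fov_y));
}

// Picks the threshold for this frame and decides whether to keep the object.
inline bool passes_contribution_cull(float radius_px, const ContributionSettings& s,
                                     bool camera_moving) {
    const float threshold = camera_moving ? s.motion_min_px : s.base_min_px;
    return radius_px >= threshold;
}
```

With a 90° vertical field of view and a 1000 px tall viewport, a 0.1-unit sphere at distance 10 projects to about 5 px: it would be hidden under a motion threshold of 10 and restored by the settle recull if the base threshold is lower.
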
979+ ##### HiZ during motion (`IFC_HIZ_MOTION=1`)
980+
981+ Setting `IFC_HIZ_MOTION=1` keeps the one-frame-stale HiZ pyramid active during camera
982+ motion. The stale depth causes minor false occlusions on thin
983+ geometry at oblique angles, but these are transient during active
984+ orbit. When the camera stops, the settle recull invalidates the HiZ
985+ pyramid (`hiz_vp_valid_ = false`) and re-culls without HiZ,
986+ guaranteeing the stationary view is artifact-free.
987+
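The per-frame decision described above (use HiZ when the pyramid matches the view, use it while stale only when opted in, and drop it for one frame on settle) could look roughly like this. It is a sketch under assumptions: `HizState`, `begin_frame`, `on_pyramid_built`, `hiz_motion_`, and `was_moving_` are hypothetical, and `Mat4` is a stand-in for the renderer's matrix type. Only `hiz_vp_`, `hiz_vp_valid_`, and the exact-equality comparison are taken from the text.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<float, 16>;  // stand-in for the renderer's matrix type

// Hypothetical slice of the culler state; hiz_vp_ and hiz_vp_valid_ mirror the text.
struct HizState {
    Mat4 hiz_vp_{};              // VP matrix the pyramid was built with
    bool hiz_vp_valid_ = false;  // false until a pyramid exists, and after a settle
    bool hiz_motion_ = false;    // true when IFC_HIZ_MOTION=1
    bool was_moving_ = false;    // camera moved on the previous frame

    // Decides whether this frame's cull may use HiZ rejection.
    bool begin_frame(const Mat4& current_vp, bool camera_moving) {
        if (was_moving_ && !camera_moving) {
            hiz_vp_valid_ = false;  // settle recull: one frame without HiZ
        }
        was_moving_ = camera_moving;
        if (!hiz_vp_valid_) return false;
        if (hiz_vp_ == current_vp) return true;  // pyramid matches this view
        return hiz_motion_;                      // stale pyramid: opt-in only
    }

    // Call after the pyramid has been rebuilt from a rendered frame.
    void on_pyramid_built(const Mat4& vp) {
        hiz_vp_ = vp;
        hiz_vp_valid_ = true;
    }
};
```

Keeping the settle invalidation separate from the opt-in flag means the stationary frame is culled without HiZ regardless of how `IFC_HIZ_MOTION` is set.
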
988+ ##### Benchmark results
989+
990+ Benchmarked on a 1.06 M-instance, 111-model scene with a 200-frame orbit
991+ (103° arc, 0.5°/frame) on a GTX 1650:
992+
993+ | Configuration | avg ms | fps | speedup | obj | sub_draws | hiz_rej |
994+ | ----------------------------------| --------| ------| ---------| -------| -----------| ---------|
995+ | Baseline (no opts) | 61.25 | 16.3 | 1.0× | 254k | 155k | 0 |
996+ | MIN_PX_MOTION=10 | 37.67 | 26.5 | 1.6× | 70k | 56k | 0 |
997+ | HIZ_MOTION=1 | 21.44 | 46.6 | 2.9× | 33k | 17.5k | 28k |
998+ | HIZ_MOTION=1 + MIN_PX_MOTION=10 | 19.62 | 51.0 | 3.1× | 11.4k | 8.7k | 11.5k |
999+ | GPU_CULL + HIZ + MIN_PX | 19.22 | 52.0 | 3.2× | 11.3k | 8.6k | 59k |
1000+
1001+ ##### Conclusions
1002+
1003+ 1. **HiZ during motion is the biggest single lever**: 2.9× alone.
1004+ Artifacts are minor and transient during orbit; the stationary view
1005+ is guaranteed correct by the settle recull.
1006+
1007+ 2. **Motion pixel culling is clean and effective**: 1.6× with zero
1008+ artifacts.
1009+
1010+ 3. **Combining both gives diminishing returns**: 3.1× vs 2.9× (HiZ
1011+ alone) or 1.6× (MIN_PX alone). Both techniques remove largely the same objects.
1012+
1013+ 4. **GPU cull adds almost nothing** on top of these: 52.0 vs 51.0 fps. The
1014+ CPU BVH path handles the reduced visible set in ~2 ms.
1015+
1016+ 5. **The ~19 ms floor is GPU rendering**, not culling. At 8.6k
1017+ sub_draws the bottleneck shifts to draw dispatch + triangle
1018+ rasterization. Further improvement requires reducing sub_draws
1019+ (static batching) or moving to a more efficient draw model.
1020+
1021+ ##### Benchmark CLI
1022+
1023+ Press **C** during interactive use to print the current camera as a
1024+ `--camera` argument. Then benchmark reproducibly:
1025+
1026+ ```bash
1027+ ./IfcViewer --camera tx,ty,tz,dist,yaw,pitch --benchmark 200 files...
1028+ ```
1029+
1030+ The benchmark orbits the camera (0.5°/frame yaw), measures N frames
1031+ after a 5-frame warmup, prints avg/median/p1/p99 frame times, then
1032+ exits. Env vars such as `IFC_HIZ_MOTION` and `IFC_MIN_PX_MOTION` select the configuration under test.
1033+
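For readers who want to reproduce the summary line from raw frame times, one way to compute avg/median/p1/p99 after discarding the warmup is sketched below. The percentile definition the viewer uses is not stated here, so the nearest-rank method is an assumption, and `FrameStats`, `percentile_sorted`, and `summarize` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct FrameStats {
    double avg, median, p1, p99;
};

// Nearest-rank percentile on an already sorted, non-empty sample (p in [0, 100]).
inline double percentile_sorted(const std::vector<double>& sorted, double p) {
    assert(!sorted.empty());
    const double rank = std::ceil(p * static_cast<double>(sorted.size()) / 100.0);
    const std::size_t idx = rank < 1.0 ? 0 : static_cast<std::size_t>(rank) - 1;
    return sorted[std::min(idx, sorted.size() - 1)];
}

// Summarises per-frame times (ms), dropping the first `warmup` frames.
inline FrameStats summarize(std::vector<double> frame_ms, std::size_t warmup = 5) {
    assert(frame_ms.size() > warmup);
    frame_ms.erase(frame_ms.begin(),
                   frame_ms.begin() + static_cast<std::ptrdiff_t>(warmup));
    std::sort(frame_ms.begin(), frame_ms.end());
    const double sum = std::accumulate(frame_ms.begin(), frame_ms.end(), 0.0);
    FrameStats s{};
    s.avg = sum / static_cast<double>(frame_ms.size());
    s.median = percentile_sorted(frame_ms, 50.0);
    s.p1 = percentile_sorted(frame_ms, 1.0);
    s.p99 = percentile_sorted(frame_ms, 99.0);
    return s;
}
```

Reporting p1 and p99 alongside the average makes stalls visible that the mean alone would smooth over.
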
9641034### Planned follow-ups (post-Phase-3)
9651035
9661036- **Mesh shaders / meshlets.** Ceiling-raising, but overkill until the
@@ -979,6 +1049,7 @@ Scene size Bottleneck Fix
9791049multi-million + occluders redundant rasterisation Phase 3C HiZ (done, CPU readback)
9801050many models, serial cull single-thread BVH trv Phase 3D parallel cull (done)
9811051single giant model / <18 cores CPU BVH trv Phase 3E GPU cull (hybrid, done)
1052+ orbit fps on 1M+ scenes too many vis objects Phase 3G motion culling + HiZ (done, 3.1×)
982105390k+ unique visible meshes per-draw GPU overhead Phase 3F static batching (next)
9831054```
9841055
@@ -996,14 +1067,16 @@ single giant model / <18 cores CPU BVH trv Phase 3E GPU cull (hybri
9961067- [x] Reflection-aware two-pass draw for mirrored placements
9971068- [x] Backface culling (user-toggleable, default on)
9981069- [x] `reorient-shells` enabled in iterator
999- - [x] Perf diagnostic env vars (`IFC_SKIP_MDI`, `IFC_MAX_SUBDRAWS`, `IFC_MIN_PX`, `IFC_LOD1_PX`, `IFC_NO_HIZ`, `IFC_HIZ_SIZE`, `IFC_CULL_THREADS`)
1070+ - [x] Perf diagnostic env vars (`IFC_SKIP_MDI`, `IFC_MAX_SUBDRAWS`, `IFC_MIN_PX`, `IFC_LOD1_PX`, `IFC_NO_HIZ`, `IFC_HIZ_SIZE`, `IFC_CULL_THREADS`, `IFC_MIN_PX_MOTION`, `IFC_HIZ_MOTION`, `IFC_GPU_CULL`, `IFC_SUBDRAW_DIAG`)
10001071- [x] Phase 3A — screen-space contribution culling
10011072- [x] Phase 3B — distance / contribution LOD (meshoptimizer ` simplifySloppy ` )
10021073- [x] Phase 3C — Hierarchical-Z occlusion culling (v1, CPU-side readback)
10031074- [x] Phase 3D — Parallel per-model CPU cull (` std::async ` fan-out)
10041075- [x] Quantized VBO (16 B/vert, sidecar v6)
10051076- [x] Event-driven rendering (zero idle CPU/GPU, cull skipped on still frames)
10061077- [x] Phase 3E — GPU compute-shader culling (hybrid: GPU frustum+contribution, async readback, CPU HiZ+LOD+emit)
1078+ - [x] Phase 3G — Motion-adaptive culling + HiZ during motion (3.1× orbit speedup on 1M-instance scene)
1079+ - [x] Benchmark CLI (`--camera`, `--benchmark`, press C to capture camera)
10071080- [ ] ** Phase 3F — Static batching of single-instance meshes** (next; reduces 90k+ sub_draws to hundreds)
10081081- [ ] Vulkan/MoltenVK backend for macOS
10091082- [ ] Embedded Python scripting console