
Commit ed6e8d8

Moult and claude committed
ifcviewer: benchmark CLI, settle recull fix, and Phase 3G documentation
Add --camera tx,ty,tz,dist,yaw,pitch and --benchmark N CLI args for reproducible performance measurement. The benchmark orbits the camera (0.5°/frame yaw) for N frames after a 5-frame warmup, prints avg/median/p1/p99 frame times, then exits. Press C during interactive use to print the current camera as a --camera argument.

Fix settle recull to fire after ANY camera motion (not just when IFC_MIN_PX_MOTION is set), ensuring HiZ artifacts from motion frames are always cleared when the camera stops.

Document Phase 3G (motion-adaptive culling + HiZ during motion) in README with benchmark results from 1.06M-instance scene:

- Baseline: 16.3 fps
- IFC_MIN_PX_MOTION=10: 26.5 fps (1.6x)
- IFC_HIZ_MOTION=1: 46.6 fps (2.9x)
- Both combined: 51.0 fps (3.1x)
- + GPU_CULL: 52.0 fps (3.2x, negligible gain)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 930678e commit ed6e8d8

6 files changed

Lines changed: 241 additions & 30 deletions


src/ifcviewer/MainWindow.cpp

Lines changed: 32 additions & 0 deletions
```diff
@@ -190,6 +190,7 @@ void MainWindow::connectStreamer(GeometryStreamer* streamer) {
 void MainWindow::startNextLoad() {
     if (load_queue_.empty()) {
         loading_model_id_ = 0;
+        applyPendingBenchmark();
         return;
     }

@@ -554,3 +555,34 @@ void MainWindow::populateProperties(uint32_t object_id) {
         }
     }
 }
+
+void MainWindow::setPendingCamera(const QString& params) {
+    pending_camera_ = params;
+}
+
+void MainWindow::setPendingBenchmark(int frames) {
+    pending_benchmark_ = frames;
+}
+
+void MainWindow::applyPendingBenchmark() {
+    if (pending_camera_.isEmpty() && pending_benchmark_ <= 0) return;
+
+    if (!pending_camera_.isEmpty()) {
+        QStringList parts = pending_camera_.split(',');
+        if (parts.size() == 6) {
+            viewport_->setCamera(
+                parts[0].toFloat(), parts[1].toFloat(), parts[2].toFloat(),
+                parts[3].toFloat(), parts[4].toFloat(), parts[5].toFloat());
+            qDebug("Camera set: %s", qPrintable(pending_camera_));
+        } else {
+            qWarning("--camera expects 6 comma-separated values: tx,ty,tz,dist,yaw,pitch");
+        }
+        pending_camera_.clear();
+    }
+
+    if (pending_benchmark_ > 0) {
+        qDebug("Starting benchmark: %d frames", pending_benchmark_);
+        viewport_->setBenchmarkFrames(pending_benchmark_);
+        pending_benchmark_ = 0;
+    }
+}
```
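`applyPendingBenchmark()` converts each field with `QString::toFloat()`, which returns 0 for text it cannot parse, so a typo such as `12,x,3,40,30,-20` places the camera at a zero coordinate and still logs "Camera set". A stricter parser is sketched below in standard C++ rather than Qt. `CameraParams` and `parseCamera` are hypothetical names, not part of the viewer, and rejecting `nan`/`inf` is an assumption about what a sensible camera should accept.

```cpp
#include <array>
#include <cerrno>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical holder for the six --camera fields, in command-line order.
struct CameraParams {
    float tx, ty, tz, dist, yaw, pitch;
};

// Parse "tx,ty,tz,dist,yaw,pitch". Returns std::nullopt unless there are
// exactly six fields and each one is a complete, finite float.
std::optional<CameraParams> parseCamera(const std::string& text) {
    if (!text.empty() && text.back() == ',') return std::nullopt;  // trailing empty field
    std::array<float, 6> v{};
    std::istringstream in(text);
    std::string field;
    std::size_t count = 0;
    while (std::getline(in, field, ',')) {
        if (count == v.size()) return std::nullopt;  // more than six fields
        const char* begin = field.c_str();
        char* end = nullptr;
        errno = 0;
        const float f = std::strtof(begin, &end);
        // Reject empty fields, trailing junk, overflow, and nan/inf.
        if (end == begin || *end != '\0' || errno == ERANGE || !std::isfinite(f))
            return std::nullopt;
        v[count++] = f;
    }
    if (count != v.size()) return std::nullopt;  // fewer than six fields
    return CameraParams{v[0], v[1], v[2], v[3], v[4], v[5]};
}
```

Inside the viewer the same check is available without leaving Qt by passing a `bool* ok` to `QString::toFloat()` and warning when any field reports `false`.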

src/ifcviewer/MainWindow.h

Lines changed: 7 additions & 0 deletions
```diff
@@ -57,6 +57,8 @@ class MainWindow : public QMainWindow {
     ~MainWindow();

     void addFiles(const QStringList& paths);
+    void setPendingCamera(const QString& params);
+    void setPendingBenchmark(int frames);

 private slots:
     void onFileOpen();
@@ -109,6 +111,11 @@ private slots:
     static uint64_t scopedKey(uint32_t model_id, int ifc_id) {
         return (static_cast<uint64_t>(model_id) << 32) | static_cast<uint32_t>(ifc_id);
     }
+
+    QString pending_camera_;
+    int pending_benchmark_ = 0;
+
+    void applyPendingBenchmark();
 };

 #endif // MAINWINDOW_H
```
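MainWindow only stores the `--camera` and `--benchmark` values; `startNextLoad()` applies them once `load_queue_` is empty, so the benchmark measures the fully loaded scene, and clearing the pending values makes it fire once. The pattern is sketched below without Qt. `DeferredRun` and its members are hypothetical, and loading is reduced to a synchronous queue pop, whereas the viewer streams models asynchronously.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical loader that runs a pending action exactly once, after the
// last queued file has been processed (mirrors startNextLoad() calling
// applyPendingBenchmark() when load_queue_ empties).
class DeferredRun {
public:
    void enqueue(std::string path) { queue_.push_back(std::move(path)); }
    void setPending(std::function<void()> action) { pending_ = std::move(action); }

    // Process one file; when the queue is already empty, fire and clear the action.
    void startNextLoad() {
        if (queue_.empty()) {
            if (pending_) {
                auto action = std::move(pending_);
                pending_ = nullptr;  // one-shot: later empty-queue calls do nothing
                action();
            }
            return;
        }
        loaded_.push_back(queue_.front());
        queue_.pop_front();
    }

    const std::deque<std::string>& loaded() const { return loaded_; }

private:
    std::deque<std::string> queue_;
    std::deque<std::string> loaded_;
    std::function<void()> pending_;
};
```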

src/ifcviewer/README.md

Lines changed: 97 additions & 24 deletions
```diff
@@ -694,31 +694,33 @@ thousands and the frame time drops accordingly.

 ##### Known caveats

-- **Disabled while the camera moves.** The pyramid is aligned to the
-  VP matrix of the frame that produced it. On a moving camera the
-  stored VP no longer matches the current one, and reusing it would
-  pop objects in and out as the stale depth falsely claims they're
-  occluded. The cull now compares `hiz_vp_ == current_vp` and drops
-  HiZ rejection entirely when they differ, so HiZ only contributes on
-  still frames. The honest cost: orbiting — the exact motion where
-  the frame rate tends to dip — gets no HiZ help. A proper fix needs
-  a same-frame depth pre-pass (draw cheap depth, build HiZ from *that*
-  frame's VP, then issue the colour pass against it); deferred to the
-  GPU-compute cull rewrite in Phase 3E where we're touching this code
-  anyway. We also tried a 3-deep PBO ring for async readback (2-frame
-  stale) which produced visible flicker on fast orbits — reverted.
+- **Optional during camera motion (`IFC_HIZ_MOTION=1`).** The pyramid
+  is aligned to the previous frame's VP. On a moving camera the stale
+  depth can falsely occlude objects, particularly thin geometry (pipes,
+  railings) at oblique angles. By default HiZ is disabled during motion
+  (`hiz_vp_ == current_vp` check). Setting `IFC_HIZ_MOTION=1` forces
+  HiZ on during motion — benchmarks show this is the single biggest
+  perf lever (2.9× speedup), and the artifacts are transient and minor
+  during active orbiting. When the camera stops, a settle recull fires
+  with `hiz_vp_valid_ = false`, disabling HiZ for that one frame and
+  re-culling the full scene. This guarantees the stationary view is
+  artifact-free. See Phase 3G for benchmark data.
+- **Conservative occlusion test.** The original "max over coarse mip"
+  test was too aggressive for BIM scenes where the entire depth range
+  compresses into 0.99–1.00. Replaced with "all fine-mip texels must
+  agree" — sample at mip 1, reject only if every texel has depth less
+  than the AABB's nearest point, early-out on the first non-occluding
+  texel. Queries covering >64 texels skip HiZ entirely. Eliminates
+  most false occlusions at the cost of fewer true rejections.
+- **Depth blit replaced with shader downsample.** The original
+  `glBlitFramebuffer` for scaling the resolved depth to HiZ size
+  produced `GL_INVALID_VALUE` on some drivers. Replaced with a
+  fullscreen-triangle shader writing `gl_FragDepth`. The resolve
+  texture uses `GL_DEPTH24_STENCIL8` to match Qt's default FBO format
+  (which uses D24S8 even when only depth is requested).
 - **Readback syncs the GPU.** `glGetTextureImage` is blocking.
   Measured cost is well under a millisecond at 256×128; not a
-  bottleneck on the machines tested. Phase 3D's compute-shader cull
-  removes it entirely.
-- **Doesn't move the needle on overview shots.** Those scenes are
-  CPU-bound on the cull traversal itself, not GPU-bound on drawing,
-  so cutting the drawn-triangle count in half is invisible in the
-  frame time. `hiz_rej` still rises modestly on overviews (the frustum
-  hull contains everything behind visible walls) but saved GPU work
-  is masked by CPU cost. HiZ pays off on interior views, where the
-  GPU *was* the bottleneck. If a project never leaves overview,
-  `IFC_NO_HIZ=1` shaves the ~1 ms of HiZ cost.
+  bottleneck on the machines tested.
 - **Transparent geometry would need special handling**, but the
   current renderer doesn't have any, so no-op for now.

```
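The conservative occlusion test described in the caveats can be sketched on the CPU as follows. This assumes the mip level stores the farthest depth per texel (the usual HiZ max reduction), that depth runs from 0 at the near plane to 1 at the far plane, and that the AABB has already been projected to an inclusive texel rectangle plus its nearest depth. `HizLevel` and `hizOccluded` are hypothetical names, not the viewer's code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU-side view of one HiZ mip level (e.g. mip 1).
// Depth follows the GL convention: 0 = near plane, 1 = far plane.
struct HizLevel {
    int width = 0;
    int height = 0;
    std::vector<float> depth;  // row-major, width * height texels
    float at(int x, int y) const {
        return depth[static_cast<std::size_t>(y) * width + x];
    }
};

// Conservative test: the box is rejected only if EVERY texel it covers
// holds a depth strictly nearer than the box's nearest depth.
// The texel rect is inclusive and assumed already clamped to the level.
bool hizOccluded(const HizLevel& level, int x0, int y0, int x1, int y1,
                 float box_min_depth, int max_texels = 64) {
    const long texels = static_cast<long>(x1 - x0 + 1) * (y1 - y0 + 1);
    if (texels > max_texels) return false;  // large query: skip HiZ, keep object
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
            if (level.at(x, y) >= box_min_depth)
                return false;  // early-out on the first non-occluding texel
    return true;
}
```

Returning `false` for large queries keeps the object, trading missed rejections for fewer false occlusions, which is the trade-off the caveat describes.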

````diff
@@ -961,6 +963,74 @@ doors, windows, pipe fittings) share geometry across placements.
   ratio (~80 k vs ~120 k), confirming per-draw overhead as the
   dominant cost.

+#### 3G. Motion-adaptive culling + HiZ during motion — ✅ done
+
+The bottleneck during camera orbit is the sheer number of visible
+objects and sub_draws. Two complementary strategies address this:
+
+##### Motion-adaptive contribution culling (`IFC_MIN_PX_MOTION`)
+
+During camera motion, use a larger pixel-radius threshold to hide
+small objects that contribute little at interactive rates. When the
+camera stops, a settle recull restores the base threshold and full
+detail within one frame. No visual artifacts — objects below the
+motion threshold are genuinely tiny on screen.
+
+##### HiZ during motion (`IFC_HIZ_MOTION=1`)
+
+Force the one-frame-stale HiZ pyramid to remain active during camera
+motion. The stale depth causes minor false occlusions on thin
+geometry at oblique angles, but these are transient during active
+orbit. When the camera stops, the settle recull invalidates the HiZ
+pyramid (`hiz_vp_valid_ = false`) and re-culls without HiZ,
+guaranteeing the stationary view is artifact-free.
+
+##### Benchmark results
+
+Benchmarked on 1.06 M-instance / 111-model scene, 200-frame orbit
+(103° arc, 0.5°/frame), GTX 1650:
+
+| Configuration                    | avg ms | fps  | speedup | obj   | sub_draws | hiz_rej |
+|----------------------------------|--------|------|---------|-------|-----------|---------|
+| Baseline (no opts)               | 61.25  | 16.3 | 1.0×    | 254k  | 155k      | 0       |
+| MIN_PX_MOTION=10                 | 37.67  | 26.5 | 1.6×    | 70k   | 56k       | 0       |
+| HIZ_MOTION=1                     | 21.44  | 46.6 | 2.9×    | 33k   | 17.5k     | 28k     |
+| HIZ_MOTION=1 + MIN_PX_MOTION=10  | 19.62  | 51.0 | 3.1×    | 11.4k | 8.7k      | 11.5k   |
+| GPU_CULL + HIZ + MIN_PX          | 19.22  | 52.0 | 3.2×    | 11.3k | 8.6k      | 59k     |
+
+##### Conclusions
+
+1. **HiZ during motion is the biggest single lever** — 2.9× alone.
+   Artifacts are minor and transient during orbit; the stationary view
+   is guaranteed correct by the settle recull.
+
+2. **Motion pixel culling is clean and effective** — 1.6× with zero
+   artifacts.
+
+3. **Combining both gives diminishing returns** — 3.1× vs 2.9× (HiZ
+   alone) or 1.6× (MIN_PX alone). They compete over the same objects.
+
+4. **GPU cull adds nothing** on top of these — 52.0 vs 51.0 fps. The
+   CPU BVH path handles the reduced visible set in ~2 ms.
+
+5. **The ~19 ms floor is GPU rendering**, not culling. At 8.6k
+   sub_draws the bottleneck shifts to draw dispatch + triangle
+   rasterization. Further improvement requires reducing sub_draws
+   (static batching) or moving to a more efficient draw model.
+
+##### Benchmark CLI
+
+Press **C** during interactive use to print the current camera as a
+`--camera` argument. Then benchmark reproducibly:
+
+```bash
+./IfcViewer --camera tx,ty,tz,dist,yaw,pitch --benchmark 200 files...
+```
+
+The benchmark orbits the camera (0.5°/frame yaw), measures N frames
+after a 5-frame warmup, prints avg/median/p1/p99 frame times, then
+exits. Env vars control the test configuration.
+
 ### Planned follow-ups (post-Phase-3)

 - **Mesh shaders / meshlets.** Ceiling-raising, but overkill until the
````
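Phase 3G describes contribution culling in terms of a pixel-radius threshold without showing how the radius is estimated. The sketch below assumes a common estimate for a bounding sphere under a perspective camera, r_px = r / (d · tan(fovY / 2)) · (H / 2), and selects the threshold with the same rule as `render()`: the motion value applies only while the camera moves and only if it exceeds the base value. `projectedRadiusPx`, `minPixelRadius` and `keepObject` are hypothetical names, and reading `IFC_MIN_PX_MOTION=10` as a 10 px radius is an assumption.

```cpp
#include <cmath>

// Approximate on-screen radius, in pixels, of a bounding sphere of world
// radius `radius` at view distance `distance` (assumed > radius), for a
// perspective camera with vertical field of view `fov_y_rad` rendering
// into a viewport `viewport_h` pixels tall.
float projectedRadiusPx(float radius, float distance, float fov_y_rad, float viewport_h) {
    const float half_h = 0.5f * viewport_h;
    return radius / (distance * std::tan(0.5f * fov_y_rad)) * half_h;
}

// Threshold selection mirroring render(): the motion threshold applies only
// while the camera moves, and only if it is stricter (larger) than the base.
float minPixelRadius(bool camera_moving, float base_px, float motion_px) {
    const bool use_motion = camera_moving && motion_px > base_px;
    return use_motion ? motion_px : base_px;
}

// Contribution cull: keep the object only if it is at least this large on screen.
bool keepObject(float radius_px, bool camera_moving, float base_px, float motion_px) {
    return radius_px >= minPixelRadius(camera_moving, base_px, motion_px);
}
```

With a 90° vertical field of view, tan(45°) = 1, so a 1 m sphere 100 m away on a 1000 px tall viewport covers about 5 px: kept while still with a 2 px base threshold, hidden during motion with a 10 px one.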
````diff
@@ -979,6 +1049,7 @@ Scene size Bottleneck Fix
 multi-million + occluders       redundant rasterisation  Phase 3C HiZ (done, CPU readback)
 many models, serial cull        single-thread BVH trv    Phase 3D parallel cull (done)
 single giant model / <18 cores  CPU BVH trv              Phase 3E GPU cull (hybrid, done)
+orbit fps on 1M+ scenes         too many vis objects     Phase 3G motion culling + HiZ (done, 3.1×)
 90k+ unique visible meshes      per-draw GPU overhead    Phase 3F static batching (next)
 ```

````

```diff
@@ -996,14 +1067,16 @@ single giant model / <18 cores CPU BVH trv Phase 3E GPU cull (hybri
 - [x] Reflection-aware two-pass draw for mirrored placements
 - [x] Backface culling (user-toggleable, default on)
 - [x] `reorient-shells` enabled in iterator
-- [x] Perf diagnostic env vars (`IFC_SKIP_MDI`, `IFC_MAX_SUBDRAWS`, `IFC_MIN_PX`, `IFC_LOD1_PX`, `IFC_NO_HIZ`, `IFC_HIZ_SIZE`, `IFC_CULL_THREADS`)
+- [x] Perf diagnostic env vars (`IFC_SKIP_MDI`, `IFC_MAX_SUBDRAWS`, `IFC_MIN_PX`, `IFC_LOD1_PX`, `IFC_NO_HIZ`, `IFC_HIZ_SIZE`, `IFC_CULL_THREADS`, `IFC_MIN_PX_MOTION`, `IFC_HIZ_MOTION`, `IFC_GPU_CULL`, `IFC_SUBDRAW_DIAG`)
 - [x] Phase 3A — screen-space contribution culling
 - [x] Phase 3B — distance / contribution LOD (meshoptimizer `simplifySloppy`)
 - [x] Phase 3C — Hierarchical-Z occlusion culling (v1, CPU-side readback)
 - [x] Phase 3D — Parallel per-model CPU cull (`std::async` fan-out)
 - [x] Quantized VBO (16 B/vert, sidecar v6)
 - [x] Event-driven rendering (zero idle CPU/GPU, cull skipped on still frames)
 - [x] Phase 3E — GPU compute-shader culling (hybrid: GPU frustum+contribution, async readback, CPU HiZ+LOD+emit)
+- [x] Phase 3G — Motion-adaptive culling + HiZ during motion (3.1× orbit speedup on 1M-instance scene)
+- [x] Benchmark CLI (`--camera`, `--benchmark`, press C to capture camera)
 - [ ] **Phase 3F — Static batching of single-instance meshes** (next; reduces 90k+ sub_draws to hundreds)
 - [ ] Vulkan/MoltenVK backend for macOS
 - [ ] Embedded Python scripting console
```
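The checklist now lists `IFC_MIN_PX_MOTION`, `IFC_HIZ_MOTION`, `IFC_GPU_CULL` and `IFC_SUBDRAW_DIAG`, but this commit does not show how the viewer parses them. The following is a minimal `std::getenv` sketch, assuming numeric values and falling back to a default when a variable is unset or malformed. `envFloat`, `envFlag`, `CullConfig` and `readCullConfig` are hypothetical. Defaulting the motion threshold to 0 effectively disables it, because `render()` only switches to the motion threshold when it is larger than the base one.

```cpp
#include <cstdlib>
#include <string>

// Read a float env var; return `fallback` if unset, empty, or not fully numeric.
float envFloat(const char* name, float fallback) {
    const char* v = std::getenv(name);
    if (!v || !*v) return fallback;
    char* end = nullptr;
    const float f = std::strtof(v, &end);
    return (end != v && *end == '\0') ? f : fallback;
}

// Read a boolean flag: anything other than unset, empty, or "0" enables it.
bool envFlag(const char* name) {
    const char* v = std::getenv(name);
    return v && *v && std::string(v) != "0";
}

// Hypothetical bundle of the motion-related cull settings.
struct CullConfig {
    float base_min_px;
    float motion_min_px;
    bool hiz_motion;
};

CullConfig readCullConfig(float default_base_px) {
    CullConfig c;
    c.base_min_px = envFloat("IFC_MIN_PX", default_base_px);
    c.motion_min_px = envFloat("IFC_MIN_PX_MOTION", 0.0f);  // <= base means "off"
    c.hiz_motion = envFlag("IFC_HIZ_MOTION");
    return c;
}
```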

src/ifcviewer/ViewportWindow.cpp

Lines changed: 81 additions & 6 deletions
```diff
@@ -22,8 +22,10 @@
 #include "AppSettings.h"

 #include <QMouseEvent>
+#include <QKeyEvent>
 #include <QWheelEvent>
 #include <QSurfaceFormat>
+#include <QCoreApplication>
 #include <QtMath>
 #include <QtOpenGL/QOpenGLVersionFunctionsFactory>

@@ -1203,6 +1205,44 @@ void ViewportWindow::setSelectedObjectId(uint32_t id) {
     requestUpdate();
 }

+void ViewportWindow::setCamera(float tx, float ty, float tz,
+                               float dist, float yaw, float pitch) {
+    camera_target_ = QVector3D(tx, ty, tz);
+    camera_distance_ = dist;
+    camera_yaw_ = yaw;
+    camera_pitch_ = pitch;
+    have_cached_cull_ = false;
+    requestUpdate();
+}
+
+void ViewportWindow::setBenchmarkFrames(int n) {
+    benchmark_total_ = n;
+    benchmark_count_ = 0;
+    benchmark_warmup_ = 5;
+    benchmark_yaw_start_ = camera_yaw_;
+    benchmark_frame_times_.clear();
+    benchmark_frame_times_.reserve(n);
+    requestUpdate();
+}
+
+QString ViewportWindow::cameraString() const {
+    return QString("%1,%2,%3,%4,%5,%6")
+        .arg(camera_target_.x(), 0, 'f', 4)
+        .arg(camera_target_.y(), 0, 'f', 4)
+        .arg(camera_target_.z(), 0, 'f', 4)
+        .arg(camera_distance_, 0, 'f', 4)
+        .arg(camera_yaw_, 0, 'f', 2)
+        .arg(camera_pitch_, 0, 'f', 2);
+}
+
+void ViewportWindow::keyPressEvent(QKeyEvent* event) {
+    if (event->key() == Qt::Key_C && !(event->modifiers() & Qt::ControlModifier)) {
+        qDebug("--camera %s", qPrintable(cameraString()));
+        return;
+    }
+    QWindow::keyPressEvent(event);
+}
+
 // --- HiZ occlusion culling (Phase 3C) -----------------------------------

 // Baseline HiZ resolution. 256x128 is enough to cull big occluders
```
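`cameraString()` prints the target and distance with four decimals and yaw/pitch with two, in the order `--camera` expects. The std-only sketch below shows the exact text a C key press would log after the `--camera ` prefix. `formatCamera` is a hypothetical name, and the sketch assumes `QString::arg(v, 0, 'f', p)` formats like printf's `%.pf`. Because the angles are rounded to 0.01°, a captured camera replays only to that precision.

```cpp
#include <cstdio>
#include <string>

// Format a camera as "tx,ty,tz,dist,yaw,pitch", mirroring cameraString():
// four decimals for target and distance, two for the angles (degrees).
std::string formatCamera(float tx, float ty, float tz,
                         float dist, float yaw, float pitch) {
    char buf[512];  // large enough for six worst-case %.4f floats
    std::snprintf(buf, sizeof buf, "%.4f,%.4f,%.4f,%.4f,%.2f,%.2f",
                  tx, ty, tz, dist, yaw, pitch);
    return std::string(buf);
}
```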
```diff
@@ -1983,24 +2023,23 @@ void ViewportWindow::render() {
         && last_cull_view_ == view_matrix_
         && last_cull_proj_ == proj_matrix_;
     const bool camera_moving = !camera_unchanged;
+    const bool use_motion_threshold = camera_moving
+        && motion_min_pixel_radius > base_min_pixel_radius;
     // Force a re-cull on the first still frame after motion so we
-    // restore the base (tighter) contribution threshold.
+    // restore the base contribution threshold and clear stale HiZ.
     const bool needs_settle_recull = !camera_moving
-        && last_cull_was_motion_
-        && motion_min_pixel_radius > base_min_pixel_radius;
+        && last_cull_was_motion_;
     const bool cull_this_frame = camera_moving || needs_settle_recull;
     // Invalidate HiZ on the settle frame: the pyramid was built from the
     // motion frame's sparse depth (aggressive threshold hid objects whose
     // depth would normally populate the pyramid), causing false occlusion.
     if (needs_settle_recull)
         hiz_vp_valid_ = false;
-    const bool use_motion_threshold = camera_moving
-        && motion_min_pixel_radius > base_min_pixel_radius;
     const float min_pixel_radius = use_motion_threshold
         ? motion_min_pixel_radius : base_min_pixel_radius;
     if (cull_this_frame) {
         hiz_reject_count_.store(0, std::memory_order_relaxed);
-        last_cull_was_motion_ = use_motion_threshold;
+        last_cull_was_motion_ = camera_moving;
     } else {
         ++cull_skipped_frames_;
     }
```
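Before this change, `needs_settle_recull` also required `motion_min_pixel_radius > base_min_pixel_radius`, and `last_cull_was_motion_` stored `use_motion_threshold`. With only `IFC_HIZ_MOTION=1` set, the flag therefore stayed false, the first still frame reused the cull from the last motion frame, and stale-HiZ false occlusions could stay on screen. The sketch below restates the post-fix decision as a pure function so a move/stop sequence can be traced. `CullState`, `CullDecision` and `decideCull` are hypothetical names, and rebuilding the HiZ pyramid (which would set `hiz_vp_valid` back to true) is not modelled.

```cpp
// Hypothetical per-viewport state carried between frames.
struct CullState {
    bool last_cull_was_motion = false;
    bool hiz_vp_valid = false;
};

struct CullDecision {
    bool cull_this_frame;
    bool settle_recull;
    float min_pixel_radius;
};

// Post-fix per-frame cull decision, mirroring render().
CullDecision decideCull(CullState& s, bool camera_moving,
                        float base_px, float motion_px) {
    const bool use_motion_threshold = camera_moving && motion_px > base_px;
    // Any motion now arms the settle recull, not only motion with a threshold.
    const bool settle = !camera_moving && s.last_cull_was_motion;
    const bool cull = camera_moving || settle;
    if (settle) s.hiz_vp_valid = false;  // cull this frame without stale HiZ
    if (cull) s.last_cull_was_motion = camera_moving;
    return {cull, settle, use_motion_threshold ? motion_px : base_px};
}
```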
```diff
@@ -2353,6 +2392,42 @@ void ViewportWindow::render() {
     // Reported fps = "if I rendered continuously, this is the rate I'd hit",
     // which is what profiling actually wants.
     const float frame_cost_s = frame_cost_clock.nsecsElapsed() * 1e-9f;
+
+    if (benchmark_total_ > 0) {
+        camera_yaw_ += benchmark_yaw_speed_;
+        have_cached_cull_ = false;
+
+        if (benchmark_warmup_ > 0) {
+            --benchmark_warmup_;
+        } else {
+            benchmark_frame_times_.push_back(frame_cost_s * 1000.0f);
+            ++benchmark_count_;
+        }
+        if (benchmark_count_ >= benchmark_total_) {
+            std::sort(benchmark_frame_times_.begin(), benchmark_frame_times_.end());
+            float sum = 0.0f;
+            for (float t : benchmark_frame_times_) sum += t;
+            float avg = sum / benchmark_frame_times_.size();
+            float median = benchmark_frame_times_[benchmark_frame_times_.size() / 2];
+            float p1 = benchmark_frame_times_[(size_t)(benchmark_frame_times_.size() * 0.01f)];
+            float p99 = benchmark_frame_times_[(size_t)(benchmark_frame_times_.size() * 0.99f)];
+            float total_arc = benchmark_yaw_speed_ * (benchmark_total_ + 5);
+            qDebug("\n=== BENCHMARK (%d frames, orbit %.0f° at %.1f°/frame) ===",
+                   benchmark_total_, total_arc, benchmark_yaw_speed_);
+            qDebug(" avg: %.2f ms (%.1f fps)", avg, 1000.0f / avg);
+            qDebug(" median: %.2f ms (%.1f fps)", median, 1000.0f / median);
+            qDebug(" p1: %.2f ms p99: %.2f ms", p1, p99);
+            qDebug(" last frame: obj %u tri %u sub_draws %u hiz_rej %u",
+                   visible_objects_, visible_triangles_,
+                   indirect_sub_draws_,
+                   hiz_reject_count_.load());
+            qDebug("=== END BENCHMARK ===\n");
+            QCoreApplication::quit();
+            return;
+        }
+        requestUpdate();
+    }
+
     accumulated_time_ += frame_cost_s;
     frame_count_++;
     if (accumulated_time_ >= 1.0f) {
```

src/ifcviewer/ViewportWindow.h

Lines changed: 13 additions & 0 deletions
```diff
@@ -174,6 +174,10 @@ class ViewportWindow : public QWindow {
     void setSelectedObjectId(uint32_t id);
     uint32_t pickObjectAt(int x, int y);

+    void setCamera(float tx, float ty, float tz, float dist, float yaw, float pitch);
+    void setBenchmarkFrames(int n);
+    QString cameraString() const;
+
     struct FrameStats {
         float fps;
         float frame_time_ms;
@@ -194,6 +198,7 @@ class ViewportWindow : public QWindow {
 protected:
     void exposeEvent(QExposeEvent* event) override;
     void resizeEvent(QResizeEvent* event) override;
+    void keyPressEvent(QKeyEvent* event) override;
     bool event(QEvent* event) override;

 private:
@@ -381,6 +386,14 @@ class ViewportWindow : public QWindow {
     // When the camera stops, re-cull once at the base threshold.
     bool last_cull_was_motion_ = false;

+    // Benchmark mode: render N frames, collect stats, then exit.
+    int benchmark_total_ = 0;
+    int benchmark_count_ = 0;
+    int benchmark_warmup_ = 5;
+    float benchmark_yaw_start_ = 0.0f;
+    float benchmark_yaw_speed_ = 0.5f;  // degrees per frame
+    std::vector<float> benchmark_frame_times_;
+
     // Per-frame stats
     uint32_t visible_triangles_ = 0;
     uint32_t visible_objects_ = 0;
```
