
Ifcviewer - an ultra fast ifcopenshell viewer and app #7930

Draft

Moult wants to merge 37 commits into datamodel-v1.0 from ifcviewer

Conversation

Contributor

@Moult Moult commented Apr 11, 2026

From https://community.osarch.org/discussion/3386/addressing-some-core-ifcopenshell-issues

A desktop viewer. Right now, Blender is really not optimised for a viewer and people don't realise how fast IOS really is because Blender itself imposes a crazy overhead for loading Blender meshes. We need a way on the desktop to view 50 models for simple coordination.

So the high level proposal is to make:

  1. A big viewer tool. Basically Bonsai but not for authoring, it's for coordination. It's for viewing, BCF issue tracking, clash detection, viewing with drawings, connection to CDEs, all the stuff that we currently have to do with proprietary software (BIMVision, Revizto, ...).
  2. A framework for other people to build their own tools. So if you want to build your own kiosk app, tablet app, desktop app, phone app, ... you have a starting point. Kind of like the non-web equivalent of "npm install web-ifc-component"

My general impression is that all the strategies for making a high performance viewer are well established and heavily documented (i.e. well-trained AI). So I'm fairly confident we can quickly get to a state where all the basic tricks are implemented with vibe coding, and then start to approach the more cutting edge, like whatever Nanite is doing.

Also, I think the general approach of a Qt app with dockable windows and standard property viewers, checkboxes, a settings window, a status bar, etc. is very well established, and AI will do a pretty good job in a greenfield situation, so I hope to vibe away.

I hope to get to a milestone where the bare viewer is ready with a good foundation. After that, with the work done on IfcZero, datamodel refactoring, new kernels, etc., and especially with Python utils upstreamed to C++ and guidance from the Bonsai data classes and UI layer, I hope to replicate some of the more important read-only properties. I think this is somewhat low risk: so long as the AI helps with all the Qt stuff, I'll be able to look after the IFC-related domain logic.

Obviously, don't merge :) Comments very very welcome.

See README.md for the latest full explanation:

https://github.com/IfcOpenShell/IfcOpenShell/tree/ifcviewer/src/ifcviewer

(Also, ignore the Python stuff and build hacks; that's just a mess I made in my local dev.)

@Moult Moult marked this pull request as draft April 11, 2026 10:44
@Moult Moult mentioned this pull request Apr 11, 2026
@aothms
Member

aothms commented Apr 11, 2026

For me either way is fine, but in light of "release early, release often" I think doing this with daily releases enabled would be even cooler. Just tell Claude/Codex/... that releases are built with build_win.yml, build_rocky.yml, build_rocky_arm.yml, and build_osx.yml; it should be able to figure it out.

Moult and others added 27 commits April 14, 2026 08:30
All AI-generated slop. Do NOT trust these "fixes"; they're just to get it working on my machine.
Track per-object AABB and index range during upload. Each frame,
extract frustum planes from the view-projection matrix and cull
objects whose AABB is entirely outside any plane. Draw only visible
objects via glMultiDrawElements. Document the three-phase rendering
performance strategy in README.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
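The plane-extraction and AABB rejection described above can be sketched as follows; this is the standard Gribb-Hartmann extraction for a row-major view-projection matrix, with illustrative names rather than the PR's actual identifiers.

```cpp
#include <array>
#include <cassert>

struct Plane { float a, b, c, d; };   // a*x + b*y + c*z + d >= 0 means "inside"
struct Aabb  { float min[3], max[3]; };

// Extract the 6 frustum planes from a row-major view-projection matrix.
inline std::array<Plane, 6> extractFrustumPlanes(const float m[16]) {
    auto row = [&](int r, int c) { return m[r * 4 + c]; };
    std::array<Plane, 6> p;
    for (int i = 0; i < 3; ++i) {
        p[2 * i]     = { row(3,0) + row(i,0), row(3,1) + row(i,1),
                         row(3,2) + row(i,2), row(3,3) + row(i,3) };  // left/bottom/near
        p[2 * i + 1] = { row(3,0) - row(i,0), row(3,1) - row(i,1),
                         row(3,2) - row(i,2), row(3,3) - row(i,3) };  // right/top/far
    }
    return p;
}

// True if the AABB is entirely outside at least one plane (safe to cull).
inline bool aabbOutsideFrustum(const Aabb& b, const std::array<Plane, 6>& planes) {
    for (const Plane& pl : planes) {
        // "Positive vertex": the corner furthest along the plane normal.
        float x = pl.a >= 0 ? b.max[0] : b.min[0];
        float y = pl.b >= 0 ? b.max[1] : b.min[1];
        float z = pl.c >= 0 ? b.max[2] : b.min[2];
        if (pl.a * x + pl.b * y + pl.c * z + pl.d < 0)
            return true;   // even the closest corner is outside this plane
    }
    return false;
}
```

Testing only the positive vertex per plane keeps the cull to six dot products per AABB; the trade-off is the usual conservative false positive when a box straddles a frustum corner, which only costs a wasted draw.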
Show FPS, frame time, visible/total objects, and visible/total
triangles in the status bar. Toggled via Settings > Show Performance
Stats, persisted in app settings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce ModelHandle and per-model GeometryStreamers so multiple IFC
files can be loaded simultaneously. Object IDs are globally unique
(monotonically increasing across models). File picker is now multiselect.
Each model gets a top-level tree node. Property lookup uses the correct
model's ifcopenshell::file. ViewportWindow supports hide/show/remove
per model via model_id filtering in the frustum cull pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reflect current architecture: per-model streamers, glMultiDrawElements
with frustum culling, 32-byte vertex format with color, multiselect
file picker, settings/stats files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…load

Phase 2 performance: BVH acceleration with median-split build, per-model
trees, and EBO re-sorting for GPU cache coherence. Raw binary .ifcview
sidecar stores full geometry + BVH for instant subsequent loads (skip
tessellation entirely).

Per-model GPU buffers (VAO/VBO/EBO per model) eliminate cross-model buffer
copies on growth. Sidecar reads happen on a background thread. Bulk GPU
uploads are progressive (48 MB/frame chunks) so the viewport stays
interactive while multi-GB models stream in.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
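The median-split build mentioned above might look roughly like this; a minimal sketch (flat node array, longest-axis median split), with names and the leaf-size constant chosen for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Aabb { float min[3], max[3]; };

struct BvhNode {
    Aabb    bounds;
    int     left = -1, right = -1;   // child node indices; -1 on leaves
    int     first = 0, count = 0;    // item range covered (leaf only)
};

constexpr int kMaxLeafSize = 8;

inline Aabb merge(const Aabb& a, const Aabb& b) {
    Aabb r;
    for (int i = 0; i < 3; ++i) {
        r.min[i] = std::min(a.min[i], b.min[i]);
        r.max[i] = std::max(a.max[i], b.max[i]);
    }
    return r;
}

// Recursively build; `items` is reordered in place so every leaf covers a
// contiguous range. Returns the index of the created node.
inline int buildBvh(std::vector<Aabb>& items, int first, int count,
                    std::vector<BvhNode>& nodes) {
    BvhNode node;
    node.bounds = items[first];
    for (int i = 1; i < count; ++i) node.bounds = merge(node.bounds, items[first + i]);

    int self = (int)nodes.size();
    nodes.push_back(node);
    if (count <= kMaxLeafSize) {
        nodes[self].first = first;
        nodes[self].count = count;
        return self;
    }
    // Split along the longest axis of the node bounds, at the median centroid.
    int axis = 0;
    float ext[3];
    for (int i = 0; i < 3; ++i) ext[i] = node.bounds.max[i] - node.bounds.min[i];
    if (ext[1] > ext[axis]) axis = 1;
    if (ext[2] > ext[axis]) axis = 2;

    int mid = first + count / 2;
    std::nth_element(items.begin() + first, items.begin() + mid,
                     items.begin() + first + count,
                     [axis](const Aabb& a, const Aabb& b) {
                         return a.min[axis] + a.max[axis] < b.min[axis] + b.max[axis];
                     });
    int l = buildBvh(items, first, mid - first, nodes);
    int r = buildBvh(items, mid, first + count - mid, nodes);
    nodes[self].left  = l;
    nodes[self].right = r;
    return self;
}
```

Because the build reorders items in place into contiguous leaf ranges, re-sorting the EBO to match (as the commit does) makes each leaf a cache-coherent index run on the GPU.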
Per-second frame log reports fps/ms, visible/total object & triangle
ratios, VRAM breakdown (VBO+EBO), model count, and pending uploads.

Upload-complete log includes per-model VBO/EBO MB and scene total VRAM.

Streamer runs an instancing analysis keyed on geom.id(): total shapes,
unique representations, dedup ratio, theoretical VBO/EBO/SSBO sizes if
instanced, potential savings, and top-5 most-duplicated representations.
Used to validate whether GPU instancing is worth the architectural
rewrite for a given dataset.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a BVH leaf passes the frustum test, emit a single glMultiDrawElements
record covering the leaf's entire index range instead of one per object.
Leaves are contiguous in the EBO after reorderEbo, so the range is just
[first_object.index_offset, sum(index_count)]. Cuts draw calls by ~8x
(BVH_MAX_LEAF_SIZE) and shifts the bottleneck from CPU/driver per-draw
overhead toward GPU vertex throughput.

Per-object features (selection highlight, per-vertex color, object_id
picking) are unchanged — they operate on vertex attributes, not draw
state. Future per-object hide/override will use SSBO lookups sampled
by object_id in the fragment shader.

Slight overdraw from skipping per-object frustum tests within a leaf is
negligible given median-split BVH tightness and spare tri throughput.

Also adds visible_objects_ counter so stats still report true object
counts (not leaf counts), plus leaf_draws/model_draws breakdown in the
per-second frame log.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
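The leaf-range collapse described above reduces to a very small piece of arithmetic; an illustrative sketch, assuming the leaf's objects are already contiguous in the EBO as the commit states:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ObjDraw   { size_t index_offset, index_count; };  // in indices, not bytes
struct DrawRange { size_t first, count; };

// Emit one glMultiDrawElements record per contiguous leaf instead of one
// per object: [first_object.index_offset, sum(index_count)].
inline DrawRange coalesceLeaf(const std::vector<ObjDraw>& leaf) {
    DrawRange r{leaf.front().index_offset, 0};
    for (const ObjDraw& o : leaf) r.count += o.index_count;
    return r;
}
```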
Commit A of the instancing migration (Phase 3a).  The streamer now runs
the iterator with use-world-coords=false and dedupes by the geometry's
representation id, emitting a MeshChunk once per unique geometry and an
InstanceChunk per placement.  The viewport keeps geometry in local
coordinates (28 B/vertex, down from 32) and applies the per-instance
transform in the vertex shader via an std430 SSBO indexed by
gl_InstanceID + a per-draw uniform offset.  After streaming finishes
finalizeModel() stable-sorts instances by mesh_id, assigns each mesh a
contiguous range, and uploads the SSBO; render then issues one
glDrawElementsInstancedBaseVertex per mesh.

BvhAccel is reshaped to operate on a generic BvhItem (world AABB +
model_id) so it can drive instance-level culling, but the path is not
wired in yet -- every instance is drawn every frame in this commit.
Progressive-during-streaming rendering is likewise disabled: a model
appears when its SSBO is uploaded, not incrementally.  Sidecar cache
is stubbed (reads miss, writes are no-ops); the v4 on-disk format with
MeshInfo + InstanceGpu sections lands in Commit B.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
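The finalize step described above (stable-sort by mesh_id, then assign each mesh a contiguous slice of the instance SSBO) can be sketched like this; struct names are illustrative, not the PR's:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

struct Instance  { uint32_t mesh_id; float transform[16]; };
struct MeshRange { uint32_t base = 0, count = 0; };   // slice of the instance SSBO

inline std::map<uint32_t, MeshRange>
assignInstanceRanges(std::vector<Instance>& instances) {
    // Stable sort keeps per-mesh insertion order while grouping by mesh.
    std::stable_sort(instances.begin(), instances.end(),
                     [](const Instance& a, const Instance& b) {
                         return a.mesh_id < b.mesh_id;
                     });
    std::map<uint32_t, MeshRange> ranges;
    for (uint32_t i = 0; i < instances.size(); ++i) {
        MeshRange& r = ranges[instances[i].mesh_id];
        if (r.count == 0) r.base = i;   // first instance of this mesh
        ++r.count;
    }
    return ranges;
}
```

Each mesh's draw then runs with `instanceCount = count`, and the vertex shader indexes the SSBO at `base + gl_InstanceID` (the per-draw uniform offset the commit mentions).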
Commit B of the instancing migration.  The sidecar on-disk format is
reintroduced at version 4 with MeshInfo + InstanceCpu sections in place
of v3's flat per-object draw-info array.

After streaming finishes, MainWindow asks the viewport for a post-
finalise snapshot (VBO + EBO are read back from the GPU, meshes and
instances come from the CPU-side arrays) and writes it alongside
PackedElementInfo + the string table.  On a subsequent load,
readSidecar rehydrates the whole struct and ViewportWindow::
applyCachedModel uploads VBO/EBO/SSBO in a single step, bypassing the
iterator entirely.

Staleness check is still by source file size.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Re-wires the BVH acceleration structure on top of the new instanced
renderer.  Per model, build a BVH over per-instance world AABBs at
finalize (and on sidecar apply).  Each frame, traverse the BVH against
the camera frustum to produce a visible-instance index list, bucket by
mesh_id, and upload to a per-model SSBO at binding=1.  The main and
pick vertex shaders do a double-indirection
`instances[visible[u_offset + gl_InstanceID]]` so draws only touch
instances that passed the frustum test.

Models with fewer than BVH_MIN_OBJECTS instances skip the BVH build
and fall back to a linear per-instance frustum test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-allocate the instance SSBO on model creation (4 MB, grow-on-demand)
and append each arriving InstanceChunk directly to the GPU-side
InstanceGpu array in uploadInstanceChunk.  This makes a model drawable
as soon as its first mesh + first instance chunk land, rather than
waiting for finalizeModel.

The visible-list architecture already decouples SSBO order from the
draw path, so appending in insertion order is correct — no sorting
required.  finalizeModel collapses to:
  - compute per-mesh instance counts (for stats + sidecar round-trip)
  - build the per-model BVH over instance world AABBs

Render / pick loops now gate on ssbo_instance_count > 0 rather than
the finalized flag.  Stats include in-progress models in totals
(excluding only hidden).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Each visible model now issues a single glMultiDrawElementsIndirect
call instead of one glDrawElementsInstancedBaseVertex per mesh.  The
CPU BVH cull populates an array of DrawElementsIndirectCommand
records plus the flat visible-instance list, uploads both, and draws
the whole model in one GL call.

Vertex shaders switch from a uniform u_instance_offset to
gl_BaseInstanceARB (ARB_shader_draw_parameters), so per-draw offset
comes from the indirect command's baseInstance field.

Draw-call counts for BIM scenes with hundreds of unique meshes drop
from hundreds-per-frame to one-per-model, cutting driver overhead.
This also sets up the plumbing for the follow-up compute-shader cull
that will populate the indirect buffer entirely on-GPU.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
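The record layout for glMultiDrawElementsIndirect is fixed by the OpenGL spec; the fill loop below is an illustrative sketch of packing one command per mesh with `baseInstance` pointing into the flat visible-instance list, as described above (field-driving struct names are assumptions):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct DrawElementsIndirectCommand {   // layout fixed by the GL spec
    uint32_t count;                    // indices per instance
    uint32_t instanceCount;            // visible instances of this mesh
    uint32_t firstIndex;               // offset into the EBO (in indices)
    int32_t  baseVertex;               // added to each fetched index
    uint32_t baseInstance;             // start in the visible-instance list
};

struct MeshDrawInfo {
    uint32_t index_count, first_index;
    int32_t  base_vertex;
    uint32_t visible;                  // instances that passed the cull
};

inline std::vector<DrawElementsIndirectCommand>
buildIndirect(const std::vector<MeshDrawInfo>& meshes) {
    std::vector<DrawElementsIndirectCommand> cmds;
    uint32_t base_instance = 0;
    for (const MeshDrawInfo& m : meshes) {
        if (m.visible == 0) continue;              // skip fully-culled meshes
        cmds.push_back({m.index_count, m.visible, m.first_index,
                        m.base_vertex, base_instance});
        base_instance += m.visible;                // next mesh's slice starts here
    }
    return cmds;
}
```

Uploading `cmds` to a GL_DRAW_INDIRECT_BUFFER and issuing a single glMultiDrawElementsIndirect with `drawcount = cmds.size()` is what collapses hundreds of per-mesh draws into one GL call per model.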
Two bugs conflated as "weird colors":

1. Two-sided lighting.  IFC placements often embed reflection
   matrices (mirrored families).  Transforming a_normal by
   mat3(inst.transform) produces a normal pointing the wrong way
   on those instances, and max(n·L, 0) then clamps the surface to
   pure ambient — reads as dark / washed out.  Use gl_FrontFacing
   to flip n in the fragment shader so both winding orientations
   shade correctly.  The proper fix (ship an inverse-transpose
   normal matrix or a det-sign bit per instance) is still owed;
   that would unlock re-enabling GL_CULL_FACE for a big fragment-
   work win on closed solids.

2. Stats label "inst_draws" was counting indirect sub-draws, not
   actual GL draw calls — misleading since MDI collapses N sub-
   draws into one glMultiDrawElementsIndirect.  Split into
   gl_draw_calls (real GL calls, = drawn-model count) and
   indirect_sub_draws (packed sub-commands).  For a BIM model
   with 47k unique meshes at full view this now correctly reads
   "1 gl_draws (47092 sub)" rather than suggesting 47k driver
   dispatches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
IFC files routinely have IfcConnectedFaceSets whose faces point
inconsistently within the same shell — the result under per-vertex
normals is dark inside-out patches, and under GL_CULL_FACE it's
swiss-cheese.  reorient-shells fixes the face winding at geometry
generation time, which is the only place it can be fixed correctly;
no shader trick can recover from a mesh whose triangles disagree
among themselves.

Off by default in IfcOpenShell because it adds iterator time, but
we cache the result in the sidecar so it's a one-shot cost per file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enables GL_CULL_FACE by default (user-toggleable in Settings) so
closed solids skip shading their back halves.  The catch is that
IFC placements can contain reflections (mat4 with det<0 — mirrored
families, symmetric instances).  Naively culling would make every
mirrored instance vanish because the rasterizer sees its screen-space
winding as backwards.

Fix: detect reflections at upload time via determinant sign, bucket
visible instances into forward (det>=0) and reverse (det<0) per mesh
during culling, and issue two glMultiDrawElementsIndirect calls per
model with glFrontFace toggled CCW/CW between them.  The indirect
buffer is still one buffer — just split into a forward slice followed
by a reverse slice, with m.indirect_forward_count recording the split.

Vertex shader flips the normal when the transform has negative
determinant, keeping lighting correct on mirrored instances.  The
fragment shader keeps the gl_FrontFacing fallback as a safety net
when culling is disabled (e.g. for files with open shells).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
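The upload-time reflection check comes down to the sign of the determinant of the placement's linear part; a minimal sketch, assuming a row-major 4x4 layout:

```cpp
#include <cassert>

// Determinant of the upper-left 3x3 of a row-major 4x4 placement matrix.
inline float linearDet(const float m[16]) {
    return m[0] * (m[5] * m[10] - m[6] * m[9])
         - m[1] * (m[4] * m[10] - m[6] * m[8])
         + m[2] * (m[4] * m[9]  - m[5] * m[8]);
}

// det < 0 means the placement contains a reflection (mirrored family):
// the instance goes in the reverse (glFrontFace(GL_CW)) draw slice.
inline bool isMirrored(const float m[16]) { return linearDet(m) < 0.f; }
```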
The previous README described a pre-instancing world (32-byte world-
coord vertices with per-vertex object_id, ObjectDrawInfo structs, EBO
reordering after BVH build, and a Phase 3 plan built around moving
draw submission to the GPU).  Most of that is either gone or already
solved:

  - Vertices are now 28 B local-coord; per-instance transforms live
    in an SSBO read through a visible-index SSBO and gl_BaseInstanceARB.
  - ObjectDrawInfo is replaced by MeshInfo + InstanceCpu + InstanceGpu.
  - No EBO reorder on BVH build — the BVH is over instance AABBs and
    the mesh/EBO layout is orthogonal.
  - Draw-call submission is already one glMultiDrawElementsIndirect
    per model; the old Phase 3 goal is met.

New content worth keeping:

  - GPU instancing section documents the mesh/instance/visible/indirect
    buffer contract the whole renderer hangs off of.
  - Reflection-aware two-pass draw is documented (det<0 placements,
    forward/reverse slice split, glFrontFace toggle).
  - reorient-shells and backface culling are called out as correctness
    + perf levers with their tradeoffs.
  - Phase 3 is rewritten around the actual bottleneck surfaced by
    profiling: per-frame glNamedBufferSubData stalls on the visible
    and indirect buffers.  Includes the diagnostic methodology (empty-
    screen jump to 60 fps, window/MSAA invariance, upload-comment-out
    experiment) so future-me remembers why this is the next step.
  - 3A (persistent mapped ring buffers, near-term) and 3B (GPU-side
    compute cull, longer-term) split out with scope estimates.
  - Roadmap updated: instancing / MDI / reflections / reorient-shells
    / backface cull all ticked; 3A surfaced as the next open item.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Earlier probes pointed at per-frame glNamedBufferSubData uploads as the
bottleneck (60 fps when those two calls were commented out).  That was a
false reading — zeroing the uploads also emptied the indirect buffer, so
MDI drew nothing.  "No upload" and "no draw" were indistinguishable.

Two new diagnostic env vars in render() isolate the real costs:

  IFC_SKIP_MDI=1       keep cull + upload + binds, skip only the MDI
                       draws.  Gives 62 fps with everything else running,
                       confirming the non-draw path fits in ~16 ms.
  IFC_MAX_SUBDRAWS=N   cap each MDI's drawcount.  67k -> 30k sub-draws
                       saves 0 ms, confirming sub-draw count itself is
                       not the bottleneck; the long tail of sub-draws
                       carries ~no triangles.

On a GTX 1650 with 128 M triangles in view, nvidia-smi sits at 95 %
GPU util and FPS scales with triangle work, not sub-draw count.  The
card is simply rasterising at ~850 M tri/s.  No CPU-side or upload
trick recovers it.

Revised Phase 3 is therefore shedding triangles, not bytes:
  3A screen-space contribution culling (next)
  3B LOD
  3C HiZ occlusion
  3D GPU-side compute culling

README Phase 3 section rewritten around the diagnosis, including the
false lead, so future work doesn't re-tread the upload path.  The
aborted staging+resident ring-buffer implementation was reverted (the
uncommitted working tree is gone — pure glNamedBufferSubData retained
for the visible + indirect buffers, which we now know is fine).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reject frustum-visible objects whose bounding sphere projects below a
pixel-radius threshold.  Applied at both BVH-node level (whole subtrees
pruned) and per-instance level; short-circuits when the camera is
inside the AABB so nothing-you're-standing-next-to is ever lost.
Pick pass passes threshold 0 so sub-pixel objects stay clickable.

Threshold defaults to 2 px (radius), overridable via IFC_MIN_PX env
var.  Measured on the 128 M-tri test scene (GTX 1650):

  0 px (off):   6.7 fps, 128 M tris
  2 px:        20.2 fps,  40 M tris (31%)
  4 px:        30.3 fps,  15 M tris (12%)

The metric is sphere-based (cheap: one sqrt per test) rather than
AABB-corner projection; loses a little precision on very elongated
bounds but costs ~5x less per test and the BVH-node pre-cull means
the long-tail-of-small-things case is already handled by subtree
pruning before we touch individual instances.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
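The sphere-based metric above is one divide and one tangent away from pixels; a sketch under the assumption of a symmetric perspective projection (names and the camera-inside short-circuit threshold are illustrative):

```cpp
#include <cassert>
#include <cmath>

// Screen-space radius in pixels of a sphere of radius r at view-space
// distance dist, for a viewport of height_px and vertical FOV fov_y (radians).
inline float projectedRadiusPx(float r, float dist, float fov_y, float height_px) {
    if (dist <= r) return 1e9f;   // camera inside/at the sphere: never cull
    float focal = height_px / (2.f * std::tan(fov_y * 0.5f));
    return r * focal / dist;
}

inline bool contributionCulled(float r, float dist, float fov_y,
                               float height_px, float min_px) {
    return projectedRadiusPx(r, dist, fov_y, height_px) < min_px;
}
```

Running the same test at the BVH-node level with the node's bounding sphere is what lets whole subtrees of small objects drop out before any per-instance work.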
Decimate each unique mesh once at sidecar-build time and swap to the
reduced index slice per-instance per-frame when projected sphere radius
drops below IFC_LOD1_PX (default 30).  Same VBO, same SSBO, just a
different firstIndex/count in the indirect command.

Extends MeshInfo (48→56 B) with lod1_ebo_byte_offset + lod1_index_count
and bumps the sidecar to v5.  buildLods() runs inside
onStreamingFinished, appends decimated indices to sd.indices,
applyLodExtension pushes the EBO suffix to the live GPU state, and the
sidecar is written with LOD1 baked in.

simplifySloppy (voxel clustering) is used instead of the default
edge-collapse meshopt_simplify because BIM brep output is per-triangle-
unwelded and non-manifold after welding — simplify returned the input
unchanged for every mesh tested.  Sloppy ignores topology.  Knobs
(IFC_LOD_SLOPPY, IFC_LOD_ERROR, IFC_LOD_RATIO, IFC_LOD_MIN_SAVINGS,
IFC_LOD_LOCK_BORDER, IFC_LOD_DEBUG) are available for A/B tuning.

Result on the 128M-tri 10-model test scene (GTX 1650, 2px contribution
cull): 20.2 → 43.2 fps, 40M → 14M visible triangles, no change in
object count.  LOD build adds 100–600 ms per model on first open,
cached thereafter.

README Phase 3B section is now a full writeup of pipeline, selection,
decimator-choice rationale, env vars, and measured numbers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After the main draw, blit the MSAA default-framebuffer depth to a
single-sample 256×128 depth texture, read it back, and build a CPU
max-reduced mip pyramid.  Next frame's cullAndUploadVisible projects
each BVH node / instance AABB through the previous frame's VP and
compares the AABB's nearest depth against the pyramid's deepest value
at the matching mip level; strictly-beyond AABBs are rejected.

Conservative direction (aabb_near > hiz_max) — never wrongly rejects a
visible instance, so no flicker.  BVH subtree-level test lets a single
8-corner projection reject up to a leaf's worth of instances.

Tuning knobs: IFC_NO_HIZ=1 disables; IFC_HIZ_SIZE overrides base width.
New stats counter hiz_rej shows rejects/frame.

Measured: big win on interior views (GPU-bound), roughly zero net
effect on exterior overviews (CPU-bound on cull traversal, so the
saved GPU work is masked).  Tried a 3-deep PBO ring for async readback
and reverted — the extra frame of staleness produced visible flicker
on fast orbit, and the synchronous readback wasn't actually a measured
bottleneck at 256×128.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
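The CPU max-reduction and the conservative rejection direction can be sketched as follows (a toy-sized pyramid; the PR's base resolution is 256x128):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One max-reduce mip step: each output texel stores the *deepest* depth of
// its 2x2 source block. Width and height must be even.
inline std::vector<float> maxReduce(const std::vector<float>& src, int w, int h) {
    std::vector<float> dst((w / 2) * (h / 2));
    for (int y = 0; y < h / 2; ++y)
        for (int x = 0; x < w / 2; ++x) {
            float a = src[(2 * y) * w + 2 * x],     b = src[(2 * y) * w + 2 * x + 1];
            float c = src[(2 * y + 1) * w + 2 * x], d = src[(2 * y + 1) * w + 2 * x + 1];
            dst[y * (w / 2) + x] = std::max(std::max(a, b), std::max(c, d));
        }
    return dst;
}

// Conservative direction: reject only when the AABB's nearest depth is
// strictly beyond the deepest stored value, so a visible instance is never
// wrongly culled (no flicker).
inline bool hizOccluded(float aabb_near_depth, float hiz_max_depth) {
    return aabb_near_depth > hiz_max_depth;
}
```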
@Moult Moult force-pushed the ifcviewer branch 2 times, most recently from d000002 to 91c8e46 Compare April 14, 2026 10:05
Moult and others added 10 commits April 14, 2026 20:33
cullAndUploadVisible was reading each instance's AABB through
m.instances[idx] — a 104-byte InstanceCpu struct — for the frustum /
contribution / HiZ tests.  Only 24 of those bytes (the two float[3]
AABBs) are actually used by the tests; the rest (4×4 transform +
header) is pure cache-line waste, and with 569k instances the array
is 59 MB, well past any cache.

bvh_items[idx] already stores a 1:1 compact 28-byte record with the
same AABB, built unconditionally in buildBvhForModel().  Switch the
hot test path to read from it, and only touch InstanceCpu once an
instance has passed all three tests (for mesh_id).  Modest ~20 %
drop in cull-traverse time on a 569k-object overview (26 ms → 21 ms).

Also add four cull-phase timers (clr / trv / emt / upl) to the
per-second stats line so future optimisation work has concrete
numbers to chase.  Confirmed via these timers that bucket clears,
emit and GPU upload are all <1 ms combined; traversal is where the
remaining CPU cost lives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
render() was re-running the full cull every 16 ms timer tick even when
nothing had changed — the camera matrices, scene state, and therefore
visible set were all identical to the previous frame's.  The GPU was
still happy to redraw from the cached indirect buffer, but the CPU was
burning 21 ms/frame rebuilding the same visible list.

Detect the no-op case by comparing view/proj against last_cull_view_ /
last_cull_proj_ and checking a scene-dirty flag (have_cached_cull_)
that every mutator on models_gpu_ invalidates — finalizeModel,
applyCachedModel, applyLodExtension, hide/show/remove/reset, and
uploadInstanceChunk.  When the check passes we skip both
cullAndUploadVisible and buildHizPyramid (the depth buffer is
bit-identical, so re-reading it produces the same pyramid).

Per-model visible_objects / visible_triangles stats now live on
ModelGpuData so the stats line reports correct numbers on skipped
frames instead of reading from a stale indirect_scratch_.

Measured on a 569k-object overview: still frames go 22 fps → 62 fps;
orbiting goes 23 fps → ~30-50 fps depending on how hard you move the
mouse (the cull only pays its full cost on the ~25 % of frames where
the camera actually moved).  The stats line gains a "skipped N/M"
field so you can see the ratio live.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaced the 16ms QTimer with QEvent::UpdateRequest delivered via
requestUpdate(), posted from every state mutator (mouse/wheel, model
lifecycle, selection, visibility, resize).  A static BIM scene — the
common case for a viewer — now does no work at all between user actions.

FPS is now measured as time spent inside render() rather than wall-clock
gap between frames, so idle gaps don't pollute the 1-second window and
the headline number reflects real render throughput.  Headline fps still
caps at vsync; sub-vsync profiling lives in the cull[...] phase timers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Position now u16x3 normalized against each mesh's local AABB; normal
oct-encoded to i16x2; RGBA8 colour unchanged. Per-mesh dequant basis
lives in a new MeshGpu SSBO at binding 2; both main and pick shaders
mix() against it before applying the instance transform.

Drops VBO and sidecar size by ~43 % (28 -> 16 B/vert), which matters
mostly for warm-load downloads of precomputed sidecars and steady-state
VRAM. LodBuilder dequantizes positions into a scratch buffer before
calling meshopt, since meshoptimizer needs float positions.

Also fixes a streaming-time crash in cullAndUploadVisible: bvh_items
was only populated at finalize, but the linear fallback indexes it
during streaming. Mirror BvhItem appends in uploadInstanceChunk so the
hot path stays valid before the BVH is built.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
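The i16x2 normal compression above is the standard octahedral mapping; this sketch shows the encode/decode round trip (the PR's exact variant may differ in rounding details):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct V3 { float x, y, z; };

inline float signNZ(float v) { return v >= 0.f ? 1.f : -1.f; }

// Project the unit sphere onto an octahedron, unfold it onto [-1,1]^2,
// then quantize to two signed 16-bit integers.
inline void octEncode(V3 n, int16_t out[2]) {
    float s = std::fabs(n.x) + std::fabs(n.y) + std::fabs(n.z);
    float u = n.x / s, v = n.y / s;
    if (n.z < 0.f) {               // fold the lower hemisphere outward
        float ou = (1.f - std::fabs(v)) * signNZ(u);
        float ov = (1.f - std::fabs(u)) * signNZ(v);
        u = ou; v = ov;
    }
    out[0] = (int16_t)std::lround(std::clamp(u, -1.f, 1.f) * 32767.f);
    out[1] = (int16_t)std::lround(std::clamp(v, -1.f, 1.f) * 32767.f);
}

inline V3 octDecode(const int16_t in[2]) {
    float u = in[0] / 32767.f, v = in[1] / 32767.f;
    float z = 1.f - std::fabs(u) - std::fabs(v);
    if (z < 0.f) {                 // undo the lower-hemisphere fold
        float ou = (1.f - std::fabs(v)) * signNZ(u);
        float ov = (1.f - std::fabs(u)) * signNZ(v);
        u = ou; v = ov;
    }
    float len = std::sqrt(u * u + v * v + z * z);
    return {u / len, v / len, z / len};
}
```

For shading purposes the round-trip error at 16 bits per component is far below anything visible; the u16x3 position path works analogously, with the per-mesh AABB as the dequantization basis.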
Add the event-driven rendering bullet (zero idle cost, in-render frame
timing) and roadmap entries for VBO quantization and event-driven
rendering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split cullAndUploadVisible into cullModelCpu (CPU-only, thread-safe) and
uploadCullResults (GL-only, main thread). render() fans the per-model
culls out via std::async and joins before the serial upload pass.

The cull scratch (vis_fwd/rev_lod0/1, visible_flat, indirect_scratch)
moved onto ModelGpuData so each worker owns its output buffers. Phase
timers and hiz_reject_count_ are atomic since workers fetch_add into
them. A new wall-clock timer around the dispatch block reports the
actual frame-time contribution; the existing clr/trv/emt counters are
now documented as per-thread sums.

Measured on the 18-model / 569k-instance test scene: wall-clock cull
dropped from ~25 ms to ~5 ms while the aggregate CPU work (trv) stayed
~30 ms. Frame time 34 ms -> 19 ms. IFC_CULL_THREADS=0 forces the
single-threaded fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
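The fan-out/join shape described above (CPU-only workers writing into per-model scratch, main thread joining before the serial GL pass) can be sketched with std::async; the visibility predicate here is a stand-in, not the real cull:

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

struct ModelScratch {
    std::vector<int> items;     // per-model input (stand-in for instances)
    std::vector<int> visible;   // per-model output, owned by its worker
};

// Thread-safe: touches only its own model's scratch, no GL calls.
inline void cullModelCpu(ModelScratch& m) {
    m.visible.clear();
    for (int v : m.items)
        if (v % 2 == 0) m.visible.push_back(v);   // stand-in visibility test
}

inline std::size_t cullAllParallel(std::vector<ModelScratch>& models) {
    std::vector<std::future<void>> jobs;
    for (ModelScratch& m : models)
        jobs.push_back(std::async(std::launch::async, cullModelCpu, std::ref(m)));
    for (auto& j : jobs) j.wait();   // join before the serial upload pass
    std::size_t total = 0;
    for (const ModelScratch& m : models) total += m.visible.size();
    return total;                    // the GL-only upload pass would run here
}
```

Because each worker owns its output buffers, no locking is needed in the hot path; only the shared phase timers need atomics, as the commit notes.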
Add the parallel cull bullet to the feature list, a Phase 3D section
explaining the fan-out / scratch-ownership design + measured 4x
speedup, and renumber the planned GPU compute cull to Phase 3E so it
can cite 3D as the CPU algorithm being ported.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
HiZ from last frame encodes depth from last frame's viewpoint. When
the camera moves, projecting a current-frame AABB through the stored
VP answers 'was this occluded last frame?' rather than 'is it occluded
now?' — a self-reinforcing feedback loop where objects culled in
prior frames never appear in any depth buffer and stay permanently
hidden at certain camera angles.

Fix: require hiz_vp_ == current VP for the HiZ test to apply. HiZ
still helps static views (kicks in one frame after camera stops) but
no longer produces false occlusions during orbit. The correct fix for
orbit coverage is a depth pre-pass feeding fresh HiZ — planned as
part of Phase 3E GPU compute cull.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two stability bugs:

1. Clicking an object left the scene with wrong shading until the camera
   moved.  The pick pass re-culls every model with its own parameters
   (min_pixel_radius=0, no HiZ) and overwrites each model's visible_ssbo
   and indirect buffer.  The next render() saw an unchanged camera,
   skipped the cull via the have_cached_cull_ shortcut, and drew the
   stale pick-pass buffers.  Fix: invalidate have_cached_cull_ at the
   end of pickObjectAt().

2. Loading two sidecar-cached models made the second model's picked
   properties resolve to the first model's elements.  Sidecars store raw
   object_id / model_id values from the session that wrote them, and
   both files start at object_id=1, so element_map_ entries collided.
   Fix: on load, rebase every PackedElementInfo and InstanceCpu by
   (next_object_id_ - min_id_in_sidecar) and overwrite model_id with
   the freshly-assigned handle before the elements hit element_map_.

Also document both in the README — the pick-pass note under 3A
contribution culling, the sidecar rebase under the sidecar format
section.
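The rebase in fix 2 is a single shift per sidecar; a sketch with illustrative types, shifting every cached id by (next_object_id - min_id_in_sidecar) and stamping the live model handle:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct ElementInfo { uint32_t object_id; uint32_t model_id; };

// Rebase cached ids into the live session's id space and overwrite the
// stale session model_id with the freshly assigned handle.
// Returns the next free object id for the session.
inline uint32_t rebaseSidecar(std::vector<ElementInfo>& elems,
                              uint32_t next_object_id, uint32_t live_model_id) {
    uint32_t min_id = UINT32_MAX, max_id = 0;
    for (const ElementInfo& e : elems) {
        min_id = std::min(min_id, e.object_id);
        max_id = std::max(max_id, e.object_id);
    }
    uint32_t shift = next_object_id - min_id;
    for (ElementInfo& e : elems) {
        e.object_id += shift;
        e.model_id = live_model_id;
    }
    return max_id + shift + 1;
}
```

Two sidecars that both start numbering at 1 then land in disjoint id ranges, so element_map_ lookups resolve to the correct file.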
The 'Known caveats' bullet still described the old 1-frame-stale
behavior.  Since 6b496d8 the cull compares hiz_vp_ to the current VP
and drops HiZ rejection whenever they differ, so HiZ only helps on
still frames — orbiting gets no benefit.  Call out the tradeoff and
the planned same-frame-depth-pre-pass fix slated for Phase 3E.