
Ifcviewer - an ultra fast ifcopenshell viewer and app #7930

Draft
Moult wants to merge 62 commits into datamodel-v1.0 from ifcviewer

Conversation

Contributor

@Moult Moult commented Apr 11, 2026

From https://community.osarch.org/discussion/3386/addressing-some-core-ifcopenshell-issues

A desktop viewer. Right now, Blender is really not optimised for a viewer, and people don't realise how fast IfcOpenShell really is because Blender itself imposes a crazy overhead for loading Blender meshes. We need a way on the desktop to view 50 models for simple coordination.

So the high level proposal is to make:

  1. A big viewer tool. Basically Bonsai but not for authoring, it's for coordination. It's for viewing, BCF issue tracking, clash detection, viewing with drawings, connection to CDEs, all the stuff that we currently have to do with proprietary software (BIMVision, Revizto, ...).
  2. A framework for other people to build their own tools. So if you want to build your own kiosk app, tablet app, desktop app, phone app, ... you have a starting point. Kind of like the non-web equivalent of "npm install web-ifc-component"

My general impression is that all the strategies for making a high-performance viewer are well established and heavily documented (i.e. well covered in AI training data). So I'm fairly confident we can quickly get to a state where all the basic tricks are implemented with vibe coding, and then start to approach the more cutting edge, like whatever Nanite is doing.

Also, I think the general approach of a Qt app with dockable windows and standard property viewers, checkboxes, a settings window, a status bar, etc. is very well established, and AI will do a pretty good job in a greenfield situation, so I hope to vibe away.

I hope to get to a milestone where the bare viewer is ready with a good foundation. After that, hopefully building on the work done with IfcZero, the datamodel refactoring, new kernels, etc. (especially the Python utils upstreamed to C++), and with the guidance of the Bonsai data classes and UI layer, I can replicate some of the more important read-only properties. I think this is somewhat low risk: so long as the AI helps with all the Qt stuff, I'll be able to look after the IFC-related domain logic.

Obviously, don't merge :) Comments very very welcome.

See README.md for latest full explanation:

https://github.com/IfcOpenShell/IfcOpenShell/tree/ifcviewer/src/ifcviewer

(Also, ignore the Python stuff and build hacks; that's just a mess I made in my local dev.)

@Moult Moult marked this pull request as draft April 11, 2026 10:44
@Moult Moult mentioned this pull request Apr 11, 2026
Member

aothms commented Apr 11, 2026

For me either way is fine, but maybe in light of "release early, release often" I think doing this with daily releases enabled would be even cooler. Just tell Claude/Codex/... that releases are built with build_win.yml, build_rocky.yml, build_rocky_arm.yml and build_osx.yml; it should be able to figure it out.

@Moult Moult force-pushed the ifcviewer branch 2 times, most recently from d000002 to 91c8e46 Compare April 14, 2026 10:05
Contributor Author

Moult commented Apr 18, 2026

This bears a bit of commentary. After commit 196f984 landed, I basically decided to "do whatever AI thought necessary in GPU shader land".

It was not a good move.

All those commits up until 01dd8d5 were basically just GPU experiments, and they didn't achieve much at all. Most of the work was actually reverted; it looked like the AI was going nuts.

I then went back and decided to fix two things. Essentially, most of the time is spent when navigating. So: 1) get HiZ to work properly (it was previously not implemented correctly and so basically always off), and 2) just turn off small objects when orbiting (kind of like an aggressive contribution culling).

After those two tweaks, I think the results speak for themselves: basically a huge speed boost. (I never got HiZ to be absolutely perfect; there were always edge cases, I believe due to the huge scene distance combined with lots of very thin objects like pipes, i.e. extreme aspect-ratio AABBs, but it was good enough whilst orbiting.)

Note that right now these are mostly configured via env vars. There are also new benchmarking and camera args so I can consistently measure stats across scenes, plus flags that turn features on and off.
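For context, the contribution cull that IFC_MIN_PX_MOTION drives comes down to a projected-radius test. A minimal sketch, assuming a perspective camera and per-object bounding spheres (the function name and parameters are illustrative, not the viewer's actual API):

```cpp
#include <cmath>

// Projected pixel radius of a bounding sphere under a perspective camera.
// An object is skipped while orbiting when this falls below the threshold
// (IFC_MIN_PX_MOTION). Hypothetical helper; the real code may differ.
float pixelRadius(float world_radius, float dist,
                  float viewport_h_px, float fov_y_rad) {
    // Pixels per world unit at this distance along the view axis.
    float px_per_unit = viewport_h_px / (2.0f * dist * std::tan(0.5f * fov_y_rad));
    return world_radius * px_per_unit;
}
```

With a 1000 px viewport and a 90° vertical FOV, a 1 m sphere 10 m away projects to 50 px; at IFC_MIN_PX_MOTION=10, anything below 10 px is dropped while the camera moves.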

IFC_GPU_CULL=0 IFC_HIZ_MOTION=0 IFC_MIN_PX_MOTION=0
=== BENCHMARK (200 frames, orbit 103° at 0.5°/frame) ===
  avg: 61.25 ms (16.3 fps)
  median: 59.88 ms (16.7 fps)
  p1: 54.60 ms  p99: 73.61 ms
  last frame: obj 254097  tri 44719843  sub_draws 155073  hiz_rej 0
=== END BENCHMARK ===

IFC_GPU_CULL=0 IFC_HIZ_MOTION=0 IFC_MIN_PX_MOTION=10
=== BENCHMARK (200 frames, orbit 103° at 0.5°/frame) ===
  avg: 37.67 ms (26.5 fps)
  median: 37.56 ms (26.6 fps)
  p1: 33.81 ms  p99: 44.90 ms
  last frame: obj 70408  tri 21657373  sub_draws 55952  hiz_rej 0
=== END BENCHMARK ===

IFC_GPU_CULL=0 IFC_HIZ_MOTION=1 IFC_MIN_PX_MOTION=0
=== BENCHMARK (200 frames, orbit 103° at 0.5°/frame) ===
  avg: 21.44 ms (46.6 fps)
  median: 20.76 ms (48.2 fps)
  p1: 12.40 ms  p99: 30.23 ms
  last frame: obj 33479  tri 8836541  sub_draws 17495  hiz_rej 28067
=== END BENCHMARK ===

IFC_GPU_CULL=0 IFC_HIZ_MOTION=1 IFC_MIN_PX_MOTION=10
=== BENCHMARK (200 frames, orbit 103° at 0.5°/frame) ===
  avg: 19.62 ms (51.0 fps)
  median: 18.51 ms (54.0 fps)
  p1: 11.30 ms  p99: 30.13 ms
  last frame: obj 11388  tri 6935091  sub_draws 8683  hiz_rej 11488
=== END BENCHMARK ===

IFC_GPU_CULL=1 IFC_HIZ_MOTION=1 IFC_MIN_PX_MOTION=10
=== BENCHMARK (200 frames, orbit 103° at 0.5°/frame) ===
  avg: 19.22 ms (52.0 fps)
  median: 18.35 ms (54.5 fps)
  p1: 10.94 ms  p99: 29.98 ms
  last frame: obj 11304  tri 6912165  sub_draws 8584  hiz_rej 58963
=== END BENCHMARK ===

@Moult Moult force-pushed the ifcviewer branch 5 times, most recently from 6394034 to 8c41011 Compare April 22, 2026 20:39
Moult and others added 19 commits April 23, 2026 06:39
All AI generated slop. Do NOT trust these "fixes". It's just to get it
working on my machine.
Track per-object AABB and index range during upload. Each frame,
extract frustum planes from the view-projection matrix and cull
objects whose AABB is entirely outside any plane. Draw only visible
objects via glMultiDrawElements. Document the three-phase rendering
performance strategy in README.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
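The plane-extraction and AABB test this commit describes is the standard Gribb/Hartmann approach. A self-contained sketch, assuming a row-major view-projection matrix with clip = M·v (the shipped code's conventions may differ):

```cpp
#include <array>

struct Plane { float a, b, c, d; };            // a*x + b*y + c*z + d >= 0 is "inside"
struct Aabb  { float min[3], max[3]; };

// Gribb/Hartmann: each frustum plane is a sum or difference of matrix rows.
std::array<Plane, 6> extractFrustumPlanes(const float m[4][4]) {
    std::array<Plane, 6> p;
    for (int i = 0; i < 3; ++i) {              // i = x, y, z clip axes
        p[2*i]   = { m[3][0]+m[i][0], m[3][1]+m[i][1], m[3][2]+m[i][2], m[3][3]+m[i][3] };
        p[2*i+1] = { m[3][0]-m[i][0], m[3][1]-m[i][1], m[3][2]-m[i][2], m[3][3]-m[i][3] };
    }
    return p;
}

// Cull when the AABB lies entirely on the negative side of any plane.
// "Positive vertex" trick: only test the corner farthest along the normal.
bool aabbOutside(const Aabb& b, const std::array<Plane, 6>& planes) {
    for (const Plane& pl : planes) {
        float x = pl.a >= 0 ? b.max[0] : b.min[0];
        float y = pl.b >= 0 ? b.max[1] : b.min[1];
        float z = pl.c >= 0 ? b.max[2] : b.min[2];
        if (pl.a*x + pl.b*y + pl.c*z + pl.d < 0) return true;
    }
    return false;
}
```

The survivors of this test are the objects whose draw records get handed to glMultiDrawElements.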
Show FPS, frame time, visible/total objects, and visible/total
triangles in the status bar. Toggled via Settings > Show Performance
Stats, persisted in app settings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce ModelHandle and per-model GeometryStreamers so multiple IFC
files can be loaded simultaneously. Object IDs are globally unique
(monotonically increasing across models). File picker is now multiselect.
Each model gets a top-level tree node. Property lookup uses the correct
model's ifcopenshell::file. ViewportWindow supports hide/show/remove
per model via model_id filtering in the frustum cull pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reflect current architecture: per-model streamers, glMultiDrawElements
with frustum culling, 32-byte vertex format with color, multiselect
file picker, settings/stats files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…load

Phase 2 performance: BVH acceleration with median-split build, per-model
trees, and EBO re-sorting for GPU cache coherence. Raw binary .ifcview
sidecar stores full geometry + BVH for instant subsequent loads (skip
tessellation entirely).

Per-model GPU buffers (VAO/VBO/EBO per model) eliminate cross-model buffer
copies on growth. Sidecar reads happen on a background thread. Bulk GPU
uploads are progressive (48 MB/frame chunks) so the viewport stays
interactive while multi-GB models stream in.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
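The median-split build mentioned above can be sketched in a few dozen lines. This is an illustrative reconstruction (kMaxLeafSize standing in for the viewer's BVH_MAX_LEAF_SIZE), not the shipped implementation:

```cpp
#include <algorithm>
#include <vector>

struct Aabb {
    float mn[3] = {  1e30f,  1e30f,  1e30f };
    float mx[3] = { -1e30f, -1e30f, -1e30f };
    void grow(const Aabb& o) {
        for (int i = 0; i < 3; ++i) {
            mn[i] = std::min(mn[i], o.mn[i]);
            mx[i] = std::max(mx[i], o.mx[i]);
        }
    }
    float centroid(int a) const { return 0.5f * (mn[a] + mx[a]); }
    int longestAxis() const {
        float dx = mx[0]-mn[0], dy = mx[1]-mn[1], dz = mx[2]-mn[2];
        return dx > dy ? (dx > dz ? 0 : 2) : (dy > dz ? 1 : 2);
    }
};

struct BvhNode { Aabb bounds; int left = -1, right = -1, first = 0, count = 0; };

constexpr int kMaxLeafSize = 8;

// Median split: partition object indices at the centroid median along the
// node's longest axis; leaves keep up to kMaxLeafSize contiguous objects.
int buildBvh(std::vector<BvhNode>& nodes, std::vector<int>& order,
             const std::vector<Aabb>& items, int first, int count) {
    int idx = (int)nodes.size();
    nodes.emplace_back();
    for (int i = 0; i < count; ++i) nodes[idx].bounds.grow(items[order[first + i]]);
    if (count <= kMaxLeafSize) {
        nodes[idx].first = first; nodes[idx].count = count;
        return idx;
    }
    int axis = nodes[idx].bounds.longestAxis();
    int mid = count / 2;
    std::nth_element(order.begin() + first, order.begin() + first + mid,
                     order.begin() + first + count,
                     [&](int a, int b) { return items[a].centroid(axis) < items[b].centroid(axis); });
    int l = buildBvh(nodes, order, items, first, mid);
    int r = buildBvh(nodes, order, items, first + mid, count - mid);
    nodes[idx].left = l; nodes[idx].right = r;
    return idx;
}
```

std::nth_element partitions around the median in linear time per level, so the whole build stays O(n log n); the resulting per-leaf contiguity is also what makes the EBO re-sorting mentioned above possible.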
Per-second frame log reports fps/ms, visible/total object & triangle
ratios, VRAM breakdown (VBO+EBO), model count, and pending uploads.

Upload-complete log includes per-model VBO/EBO MB and scene total VRAM.

Streamer runs an instancing analysis keyed on geom.id(): total shapes,
unique representations, dedup ratio, theoretical VBO/EBO/SSBO sizes if
instanced, potential savings, and top-5 most-duplicated representations.
Used to validate whether GPU instancing is worth the architectural
rewrite for a given dataset.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a BVH leaf passes the frustum test, emit a single glMultiDrawElements
record covering the leaf's entire index range instead of one per object.
Leaves are contiguous in the EBO after reorderEbo, so the range is just
[first_object.index_offset, sum(index_count)]. Cuts draw calls by ~8x
(BVH_MAX_LEAF_SIZE) and shifts the bottleneck from CPU/driver per-draw
overhead toward GPU vertex throughput.

Per-object features (selection highlight, per-vertex color, object_id
picking) are unchanged — they operate on vertex attributes, not draw
state. Future per-object hide/override will use SSBO lookups sampled
by object_id in the fragment shader.

Slight overdraw from skipping per-object frustum tests within a leaf is
negligible given median-split BVH tightness and spare tri throughput.

Also adds visible_objects_ counter so stats still report true object
counts (not leaf counts), plus leaf_draws/model_draws breakdown in the
per-second frame log.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
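The leaf-merge arithmetic is simple once a leaf's objects are EBO-contiguous; an illustrative sketch with hypothetical struct names, mirroring the [first_object.index_offset, sum(index_count)] description above:

```cpp
#include <cstddef>
#include <vector>

struct ObjectDraw { size_t index_offset; int index_count; };
struct DrawRange  { size_t first_index;  int index_count; };

// One multi-draw record for a whole BVH leaf: start at the first object's
// EBO offset and cover the summed index counts of all objects in the leaf.
DrawRange leafDrawRange(const std::vector<ObjectDraw>& objs, int first, int count) {
    DrawRange r{ objs[first].index_offset, 0 };
    for (int i = 0; i < count; ++i)
        r.index_count += objs[first + i].index_count;
    return r;
}
```

With kMaxLeafSize-sized leaves this is where the ~8x draw-call reduction comes from: one record per leaf instead of one per object.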
Commit A of the instancing migration (Phase 3a).  The streamer now runs
the iterator with use-world-coords=false and dedupes by the geometry's
representation id, emitting a MeshChunk once per unique geometry and an
InstanceChunk per placement.  The viewport keeps geometry in local
coordinates (28 B/vertex, down from 32) and applies the per-instance
transform in the vertex shader via an std430 SSBO indexed by
gl_InstanceID + a per-draw uniform offset.  After streaming finishes
finalizeModel() stable-sorts instances by mesh_id, assigns each mesh a
contiguous range, and uploads the SSBO; render then issues one
glDrawElementsInstancedBaseVertex per mesh.

BvhAccel is reshaped to operate on a generic BvhItem (world AABB +
model_id) so it can drive instance-level culling, but the path is not
wired in yet -- every instance is drawn every frame in this commit.
Progressive-during-streaming rendering is likewise disabled: a model
appears when its SSBO is uploaded, not incrementally.  Sidecar cache
is stubbed (reads miss, writes are no-ops); the v4 on-disk format with
MeshInfo + InstanceGpu sections lands in Commit B.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
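The finalizeModel() step described above amounts to a stable sort plus a range scan. A sketch with hypothetical struct names (transform payload omitted for brevity):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Instance  { uint32_t mesh_id; uint32_t object_id; };
struct MeshRange { uint32_t first = 0, count = 0; };

// Stable-sort instances by mesh_id so each mesh owns a contiguous SSBO
// range, then record [first, count) per mesh for the per-mesh draw calls.
std::vector<MeshRange> sortAndRange(std::vector<Instance>& instances,
                                    uint32_t mesh_count) {
    std::stable_sort(instances.begin(), instances.end(),
                     [](const Instance& a, const Instance& b) {
                         return a.mesh_id < b.mesh_id;
                     });
    std::vector<MeshRange> ranges(mesh_count);
    for (uint32_t i = 0; i < instances.size(); ++i) {
        MeshRange& r = ranges[instances[i].mesh_id];
        if (r.count == 0) r.first = i;
        ++r.count;
    }
    return ranges;
}
```

The stable sort preserves insertion order within a mesh, which keeps object IDs deterministic across runs; each range then feeds one glDrawElementsInstancedBaseVertex with its per-draw uniform offset.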
Commit B of the instancing migration.  The sidecar on-disk format is
reintroduced at version 4 with MeshInfo + InstanceCpu sections in place
of v3's flat per-object draw-info array.

After streaming finishes, MainWindow asks the viewport for a post-
finalise snapshot (VBO + EBO are read back from the GPU, meshes and
instances come from the CPU-side arrays) and writes it alongside
PackedElementInfo + the string table.  On a subsequent load,
readSidecar rehydrates the whole struct and ViewportWindow::
applyCachedModel uploads VBO/EBO/SSBO in a single step, bypassing the
iterator entirely.

Staleness check is still by source file size.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Re-wires the BVH acceleration structure on top of the new instanced
renderer.  Per model, build a BVH over per-instance world AABBs at
finalize (and on sidecar apply).  Each frame, traverse the BVH against
the camera frustum to produce a visible-instance index list, bucket by
mesh_id, and upload to a per-model SSBO at binding=1.  The main and
pick vertex shaders do a double-indirection
`instances[visible[u_offset + gl_InstanceID]]` so draws only touch
instances that passed the frustum test.

Models with fewer than BVH_MIN_OBJECTS instances skip the BVH build
and fall back to a linear per-instance frustum test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-allocate the instance SSBO on model creation (4 MB, grow-on-demand)
and append each arriving InstanceChunk directly to the GPU-side
InstanceGpu array in uploadInstanceChunk.  This makes a model drawable
as soon as its first mesh + first instance chunk land, rather than
waiting for finalizeModel.

The visible-list architecture already decouples SSBO order from the
draw path, so appending in insertion order is correct — no sorting
required.  finalizeModel collapses to:
  - compute per-mesh instance counts (for stats + sidecar round-trip)
  - build the per-model BVH over instance world AABBs

Render / pick loops now gate on ssbo_instance_count > 0 rather than
the finalized flag.  Stats include in-progress models in totals
(excluding only hidden).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Each visible model now issues a single glMultiDrawElementsIndirect
call instead of one glDrawElementsInstancedBaseVertex per mesh.  The
CPU BVH cull populates an array of DrawElementsIndirectCommand
records plus the flat visible-instance list, uploads both, and draws
the whole model in one GL call.

Vertex shaders switch from a uniform u_instance_offset to
gl_BaseInstanceARB (ARB_shader_draw_parameters), so per-draw offset
comes from the indirect command's baseInstance field.

Draw-call counts for BIM scenes with hundreds of unique meshes drop
from hundreds-per-frame to one-per-model, cutting driver overhead.
This also sets up the plumbing for the follow-up compute-shader cull
that will populate the indirect buffer entirely on-GPU.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two bugs conflated as "weird colors":

1. Two-sided lighting.  IFC placements often embed reflection
   matrices (mirrored families).  Transforming a_normal by
   mat3(inst.transform) produces a normal pointing the wrong way
   on those instances, and max(n·L, 0) then clamps the surface to
   pure ambient — reads as dark / washed out.  Use gl_FrontFacing
   to flip n in the fragment shader so both winding orientations
   shade correctly.  The proper fix (ship an inverse-transpose
   normal matrix or a det-sign bit per instance) is still owed;
   that would unlock re-enabling GL_CULL_FACE for a big fragment-
   work win on closed solids.

2. Stats label "inst_draws" was counting indirect sub-draws, not
   actual GL draw calls — misleading since MDI collapses N sub-
   draws into one glMultiDrawElementsIndirect.  Split into
   gl_draw_calls (real GL calls, = drawn-model count) and
   indirect_sub_draws (packed sub-commands).  For a BIM model
   with 47k unique meshes at full view this now correctly reads
   "1 gl_draws (47092 sub)" rather than suggesting 47k driver
   dispatches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
IFC files routinely have IfcConnectedFaceSets whose faces point
inconsistently within the same shell — the result under per-vertex
normals is dark inside-out patches, and under GL_CULL_FACE it's
swiss-cheese.  reorient-shells fixes the face winding at geometry
generation time, which is the only place it can be fixed correctly;
no shader trick can recover from a mesh whose triangles disagree
among themselves.

Off by default in IfcOpenShell because it adds iterator time, but
we cache the result in the sidecar so it's a one-shot cost per file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enables GL_CULL_FACE by default (user-toggleable in Settings) so
closed solids skip shading their back halves.  The catch is that
IFC placements can contain reflections (mat4 with det<0 — mirrored
families, symmetric instances).  Naively culling would make every
mirrored instance vanish because the rasterizer sees its screen-space
winding as backwards.

Fix: detect reflections at upload time via determinant sign, bucket
visible instances into forward (det>=0) and reverse (det<0) per mesh
during culling, and issue two glMultiDrawElementsIndirect calls per
model with glFrontFace toggled CCW/CW between them.  The indirect
buffer is still one buffer — just split into a forward slice followed
by a reverse slice, with m.indirect_forward_count recording the split.

Vertex shader flips the normal when the transform has negative
determinant, keeping lighting correct on mirrored instances.  The
fragment shader keeps the gl_FrontFacing fallback as a safety net
when culling is disabled (e.g. for files with open shells).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
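Detecting a mirrored placement is a one-liner on the upper-left 3x3 determinant. A sketch assuming OpenGL's column-major mat4 layout:

```cpp
// Upper-left 3x3 determinant of a column-major 4x4 transform.
// A mirrored placement (reflection) has a negative determinant, so its
// triangles rasterize with flipped screen-space winding.
float upperDet3(const float m[16]) {
    return m[0] * (m[5] * m[10] - m[6] * m[9])
         - m[4] * (m[1] * m[10] - m[2] * m[9])
         + m[8] * (m[1] * m[6]  - m[2] * m[5]);
}

bool isReflected(const float m[16]) { return upperDet3(m) < 0.0f; }
```

At upload time each instance gets this bit; the cull pass then buckets by it, and the draw loop toggles glFrontFace between the forward and reverse indirect slices.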
Moult and others added 29 commits April 23, 2026 06:39
Two stability bugs:

1. Clicking an object left the scene with wrong shading until the camera
   moved.  The pick pass re-culls every model with its own parameters
   (min_pixel_radius=0, no HiZ) and overwrites each model's visible_ssbo
   and indirect buffer.  The next render() saw an unchanged camera,
   skipped the cull via the have_cached_cull_ shortcut, and drew the
   stale pick-pass buffers.  Fix: invalidate have_cached_cull_ at the
   end of pickObjectAt().

2. Loading two sidecar-cached models made the second model's picked
   properties resolve to the first model's elements.  Sidecars store raw
   object_id / model_id values from the session that wrote them, and
   both files start at object_id=1, so element_map_ entries collided.
   Fix: on load, rebase every PackedElementInfo and InstanceCpu by
   (next_object_id_ - min_id_in_sidecar) and overwrite model_id with
   the freshly-assigned handle before the elements hit element_map_.

Also document both in the README — the pick-pass note under 3A
contribution culling, the sidecar rebase under the sidecar format
section.
The 'Known caveats' bullet still described the old 1-frame-stale
behavior.  Since 6b496d8 the cull compares hiz_vp_ to the current VP
and drops HiZ rejection whenever they differ, so HiZ only helps on
still frames — orbiting gets no benefit.  Call out the tradeoff and
the planned same-frame-depth-pre-pass fix slated for Phase 3E.
Scaffolding for Phase 3E (GPU compute cull).  After finalizeModel /
applyCachedModel, pack each InstanceCpu's world AABB + mesh_id +
reflection bit into a std430-friendly 32 B record and push it to a
per-model aabb_ssbo.  No consumer yet — the CPU cull still drives
rendering — but the next commits will point a compute shader at this
buffer and have it produce the visible list + indirect commands
directly on the GPU.

Cost: 32 B per instance, ~18 MB for the 569 k-instance test scene.
One-shot upload at finalize time; streaming-time appends aren't
mirrored (the CPU cull doesn't need the SSBO, and finalizeModel
rebuilds the whole thing in one go).
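The 32 B figure works out exactly: six floats of world AABB plus two uint32 words. A sketch of a layout consistent with the commit's numbers (field names hypothetical):

```cpp
#include <cstdint>

// std430-friendly 32-byte per-instance record: world AABB, mesh id, and a
// flags word carrying the reflection bit. All members are 4-byte scalars,
// so the std430 array stride equals sizeof with no hidden padding.
struct InstanceAabbGpu {
    float    min_xyz[3];
    float    max_xyz[3];
    uint32_t mesh_id;
    uint32_t flags;      // bit 0: reflected (det < 0)
};
static_assert(sizeof(InstanceAabbGpu) == 32, "must match std430 stride");
```

At 32 B per instance, the 569 k-instance test scene costs 569000 × 32 ≈ 18.2 MB, matching the estimate above.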
First Phase 3E milestone: a compute shader that reads the per-instance
world-AABB SSBO added in the last commit, tests each instance against
the 6 frustum planes, and atomicAdds a global counter.  No visible list
or indirect-buffer writes yet — the output is just a survivor count,
cross-checked each frame against the CPU cull's numbers in the stats
line (`gpu_cull[Xms in=A surv=B]`) so we can verify the plumbing end-
to-end before we hand the GPU responsibility for the actual render data.

Dispatched from render() after the CPU cull completes, only when
IFC_GPU_CULL=1 and the camera moved (the skipped-cull still-frame path
doesn't re-check either).  The readback is synchronous — that's fine
for a validation path; it'll go away once the GPU writes indirect
commands directly.

Expected invariant: gpu_cull.surv >= cpu_cull.visible_objects, since
the GPU path does frustum-only and CPU adds contribution + HiZ cuts on
top.  A large mismatch (orders of magnitude, or surv < visible) means
the SSBO upload or shader logic is wrong.

No shader/buffer bindings overlap with the draw path (compute uses
bindings 0/1, restored before drawing; draw programs rebind 0/1/2).
Promote the compute cull from a validation shader to the actual draw
driver.  With the gate on, the CPU cull fan-out is skipped and MDI
consumes gpu_indirect_buffer / gpu_visible_ssbo directly.

- uploadGpuCullStaticBuffers() pre-fills per-mesh DrawElementsIndirect
  commands and a mesh_base prefix sum so the compact shader can scatter
  survivors into a fixed per-mesh range.  Instance count for each
  command is zeroed by a tiny reset dispatch, then the compact shader
  atomically writes survivors and increments instanceCount.
- Draw loop branches on the gate: single CCW MDI with all mesh
  commands.  Fwd/rev winding split, LOD selection, and HiZ are still
  CPU-path-only; reflected instances render with wrong winding under
  this gate (step 3b).
- Once-per-second readback of each model's indirect buffer populates
  the survivor / visible-object / visible-triangle stats so the
  [frame] line reflects what the GPU actually drew.

Known regression: sub_draws is the full mesh count per model (~172k on
the test dataset) vs the handful of non-empty commands the CPU path
produces.  Command-processor overhead from zero-instance sub-draws is
what drives the FPS drop, not the cull itself (0.05 ms).  Compacting
non-empty commands requires glMultiDrawElementsIndirectCount, a GL 4.6
entrypoint not exposed by Qt's QOpenGLFunctions_4_5_Core; deferring to
3a-followup so we don't bolt a getProcAddress loader into the renderer
mid-restructure.

IFC_GPU_CULL is off by default, so this does not affect normal runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extend the GPU-cull indirect buffer from M to 2M commands: the first M
are the forward (non-reflected, CCW) bucket, the second M are the
reverse (reflected, CW) bucket.  The compact shader reads flags bit 0
from the AABB SSBO and routes each survivor to the appropriate bucket
via bucket = reflected ? mesh_id + M : mesh_id.

uploadGpuCullStaticBuffers() now precomputes exact per-mesh fwd/rev
instance counts so each bucket reserves only the slots it needs
(total visible_ssbo size unchanged — sum of fwd + rev = total).

Draw loop issues two MDIs per model under IFC_GPU_CULL: first M
commands CCW, next M commands CW.

Sub-draws doubled (172k → 345k) which further regresses FPS due to
command-processor overhead from zero-instance sub-draws — the same
issue noted in 3a.  MDI compaction remains the fix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The compact shader now computes per-instance pixel radius and routes
survivors to LOD1 buckets when the projected sphere falls below the
LOD1 threshold (default 30 px, same as CPU path, tunable via
IFC_LOD1_PX).

Layout expanded from 2 to 4 buckets per mesh:
  [0..M)   fwd_lod0   [M..2M)   fwd_lod1
  [2M..3M) rev_lod0   [3M..4M)  rev_lod1

Two MDIs per model: CCW for [0..2M), CW for [2M..4M).  Per-mesh
has_lod1 flags live in a new gpu_mesh_flags_ssbo (binding 4).

Contribution cull refactored: the compact shader now computes
pixelRadius() once and uses it for both the min_pixel_radius rejection
and LOD routing, matching the CPU path's logic.

Visible-buffer worst case is 2 × total_instances (each LOD bucket
reserves the full fwd/rev capacity per mesh, since LOD selection is
dynamic).

Tri count drops ~60% on the test dataset (53M → 22M) thanks to LOD1
decimated meshes.  FPS recovers from 16 to 36 despite 690k sub_draws
(4M layout).  MDI compaction remains the final perf fix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
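The 4-bucket routing above reduces to index arithmetic. A sketch of the mapping, with M = mesh_count:

```cpp
#include <cstdint>

// Indirect-buffer bucket layout with M meshes:
//   [0..M)   fwd_lod0   [M..2M)   fwd_lod1
//   [2M..3M) rev_lod0   [3M..4M)  rev_lod1
uint32_t bucketIndex(uint32_t mesh_id, uint32_t mesh_count,
                     bool reflected, bool lod1) {
    uint32_t b = mesh_id;
    if (lod1)      b += mesh_count;       // LOD1 slice of the winding half
    if (reflected) b += 2u * mesh_count;  // reverse-winding half
    return b;
}
```

The compact shader computes the same index from the AABB SSBO's reflection flag and the pixelRadius()-based LOD decision; the two MDIs then cover [0..2M) CCW and [2M..4M) CW.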
Two-phase compute-cull dispatch when IFC_GPU_CULL=1:

  Phase 1  frustum + contribution + LOD, no HiZ  → survivors
  Depth    render survivors depth-only into half-viewport FBO
  Build    GPU compute max-reduce depth → R32F mip pyramid
  Phase 2  same cull + HiZ test                  → final survivors
  Color    render final survivors

The compact shader's new hizOccluded() projects 8 AABB corners to
screen space, picks the mip level where the covered rect fits in ≤2×2
texels, and rejects when the AABB's near-depth exceeds the pyramid's
max depth.

New GPU resources (per-window):
  hiz_gpu_fbo_ / hiz_gpu_depth_tex_  — depth-only FBO at half viewport
  hiz_gpu_pyramid_tex_                — R32F mipmapped pyramid
  hiz_gpu_copy_prog_                  — compute: depth → pyramid L0
  hiz_gpu_reduce_prog_                — compute: max-reduce L(n-1)→L(n)
  hiz_gpu_depth_prog_                 — vertex + trivial fragment

On a dense 18-model BIM dataset:
  survivors:  140k → 65k  (HiZ rejects ~50%)
  triangles:  22M  → 13M
  gpu_cull:   0.06ms → 22.5ms  (depth pre-pass CP overhead)

The depth pre-pass suffers the same empty-sub-draws CP overhead as the
color pass (690k commands, most with instanceCount=0).  Once MDI
compaction lands, both passes will be fast.  For now, net FPS is flat
(savings on color ≈ cost of depth pre-pass).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
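The "fits in ≤2×2 texels" mip pick is the usual HiZ level formula: texels at level L cover 2^L pixels, so choose the smallest L where the rect's larger extent divided by 2^L is at most 2. A sketch of the selection math (hizMipLevel is an illustrative name, not the shader's):

```cpp
#include <algorithm>
#include <cmath>

// Pick the coarsest pyramid level at which the AABB's projected screen rect
// spans at most 2 texels per axis, so the occlusion test reads a fixed
// 2x2 neighbourhood of max-depth values.
int hizMipLevel(float rect_w_px, float rect_h_px, int num_levels) {
    float extent = std::max(rect_w_px, rect_h_px);
    // Need extent / 2^level <= 2, i.e. level >= log2(extent / 2).
    int level = (int)std::ceil(std::log2(std::max(extent, 1.0f) / 2.0f));
    return std::clamp(level, 0, num_levels - 1);
}
```

The AABB is then rejected when its nearest projected depth is still behind the pyramid's max depth at that level.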
Pack compute shader compacts non-empty indirect commands into
contiguous fwd/rev ranges, eliminating ~690k empty sub-draws that
dominated command-processor overhead.  GL 4.6 entrypoint loaded via
getProcAddress with ARB fallback; graceful degradation to uncompacted
MDI when unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the CPU BVH traversal + frustum + contribution stages with a
GPU compute path (IFC_GPU_CULL=1).  A single scene-wide dispatch tests
all instances against frustum planes and screen-space contribution
threshold, compacting survivors into a flat uint32 buffer via atomicAdd.

Uses one-frame-late async readback: frame N dispatches and fences,
frame N+1 polls the fence (non-blocking) and reads the persistent-
mapped result buffer with zero GPU sync cost.  CPU still handles HiZ,
LOD selection, winding bucketing, and indirect command generation from
the compact survivor list; draw path is unchanged.

On a 1M-instance / 111-model scene (GTX 1650):
  GPU dispatch:  0.70 ms  (frustum + contribution, brute-force)
  Readback:      0.00 ms  (fence already signaled, persistent map)
  CPU consume:   5.7–6.7 ms  (parallel emit across models)
  Cull wall:     5.8–6.9 ms  (vs 9.6–15.2 ms CPU-only path)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…cull

Only clear and emit mesh buckets that received survivors in the previous
frame, converting both phases from O(total_meshes) to O(active_meshes).
Adds per-sub-phase timing (bin/clr/class/emit) to the stats line.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The HiZ pipeline had two bugs causing false occlusions:

1. The scaling depth blit (glBlitFramebuffer from window-size to HiZ-size)
   produced GL_INVALID_VALUE on some drivers. Replace with a fullscreen-
   triangle shader that samples the resolved depth and writes gl_FragDepth.

2. The resolve texture used GL_DEPTH_COMPONENT24 but Qt's default FBO uses
   D24S8 (depth+stencil). Mismatched formats cause the MSAA resolve blit
   to fail. Fix by using GL_DEPTH24_STENCIL8 for the resolve texture.

Additionally, the occlusion test was too aggressive for scenes with
compressed depth ranges (entire scene in 0.99-1.0). Change from
"max over coarse mip texels" to "reject only if ALL fine-mip texels
agree the AABB is behind them", with early-out on first non-occluding
texel and a 64-sample cap.

Also fix IFC_HIZ_MOTION=0 being treated as enabled (checked env var
existence, not value).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
During camera motion, use a larger pixel-radius threshold (IFC_MIN_PX_MOTION)
to aggressively cull small objects, dramatically reducing sub_draws and
improving orbit fps (e.g. 29→67 fps on 1M-instance scene).  When the camera
stops, automatically re-cull at the base threshold to restore full detail.

Key behaviors:
- IFC_MIN_PX_MOTION=N sets the motion threshold (0 = disabled)
- Settle recull fires on the first still frame after motion
- HiZ pyramid invalidated on settle (stale from sparse motion frame)
- GPU cull results skipped on settle (dispatched at motion threshold)
- requestUpdate() ensures the settle frame actually runs

Also adds IFC_SUBDRAW_DIAG=1 diagnostic for sub-draw composition analysis
and documents Phase 3E/3F experiment results in README.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add --camera tx,ty,tz,dist,yaw,pitch and --benchmark N CLI args for
reproducible performance measurement.  The benchmark orbits the camera
(0.5°/frame yaw) for N frames after a 5-frame warmup, prints
avg/median/p1/p99 frame times, then exits.  Press C during interactive
use to print the current camera as a --camera argument.

Fix settle recull to fire after ANY camera motion (not just when
IFC_MIN_PX_MOTION is set), ensuring HiZ artifacts from motion frames
are always cleared when the camera stops.

Document Phase 3G (motion-adaptive culling + HiZ during motion) in
README with benchmark results from 1.06M-instance scene:
  - Baseline:                    16.3 fps
  - IFC_MIN_PX_MOTION=10:       26.5 fps (1.6x)
  - IFC_HIZ_MOTION=1:           46.6 fps (2.9x)
  - Both combined:              51.0 fps (3.1x)
  - + GPU_CULL:                 52.0 fps (3.2x, negligible gain)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benchmarks showed negligible gain (52 vs 51 fps) — the CPU BVH path
already culls efficiently, and the GPU path still read back to CPU for
LOD/winding/HiZ. Removes ~570 lines of dead weight: compute shader,
async readback, one-frame-late consume, per-model AABB SSBOs, and
profiling counters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use $<TARGET_FILE_DIR:IfcGeom> instead of hardcoded
${CMAKE_BINARY_DIR}/ifcgeom/$<CONFIG> for plugin runtime dirs — the
old path was wrong on non-MSVC generators where $<CONFIG> expands
empty. Add explicit add_dependencies for kernel/mapping plugins so
IfcViewer waits for them to build, and drop the redundant direct link
against ${kernel_libraries}.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace i16x2 octahedral normals with i8x2, filling the 2-byte padding
after position and saving 4 bytes per vertex. int8 gives ~1.4 deg
worst-case angular error — invisible for BIM geometry which is
overwhelmingly axis-aligned. 25% VBO reduction; sidecar files shrink
~15% overall (5.4 GB -> 4.6 GB on a 111-model test scene). Bumps
sidecar format to v7.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
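Octahedral mapping folds the lower hemisphere onto the corners of the unit diamond, so two snorm values reconstruct a full unit normal. A CPU reference of the i8x2 encode/decode (illustrative; the shader-side decode would mirror the same math):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

static float signNotZero(float v) { return v >= 0.0f ? 1.0f : -1.0f; }

// Encode a unit normal to two snorm8 values (the i8x2 vertex format).
void octEncode8(Vec3 n, int8_t& ox, int8_t& oy) {
    float inv = 1.0f / (std::fabs(n.x) + std::fabs(n.y) + std::fabs(n.z));
    float u = n.x * inv, v = n.y * inv;
    if (n.z < 0.0f) {  // fold lower hemisphere over the diagonals
        float uu = (1.0f - std::fabs(v)) * signNotZero(u);
        float vv = (1.0f - std::fabs(u)) * signNotZero(v);
        u = uu; v = vv;
    }
    ox = (int8_t)std::lround(std::clamp(u, -1.0f, 1.0f) * 127.0f);
    oy = (int8_t)std::lround(std::clamp(v, -1.0f, 1.0f) * 127.0f);
}

Vec3 octDecode8(int8_t ox, int8_t oy) {
    float u = ox / 127.0f, v = oy / 127.0f;
    float z = 1.0f - std::fabs(u) - std::fabs(v);
    if (z < 0.0f) {    // unfold the lower hemisphere
        float uu = (1.0f - std::fabs(v)) * signNotZero(u);
        float vv = (1.0f - std::fabs(u)) * signNotZero(v);
        u = uu; v = vv;
    }
    float len = std::sqrt(u * u + v * v + z * z);
    return { u / len, v / len, z / len };
}
```

The worst-case angular error comes from the 1/127 quantization step; axis-aligned normals, which dominate BIM geometry, round-trip exactly or nearly so.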
Edge-collapse decimation (meshopt_simplify) returns BIM meshes unchanged
due to per-triangle vertex duplication and non-manifold topology. The
sloppy voxel-clustering decimator is faster, needs no shadow index
welding, and produces good results at the sub-30px LOD1 threshold.
Remove the non-sloppy branch, shadow buffer, IFC_LOD_SLOPPY and
IFC_LOD_LOCK_BORDER env vars.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Turn src/ifcviewer into libIfcViewer.so holding the rendering engine +
geometry pipeline (ViewportWindow, GeometryStreamer, BvhAccel,
InstancedGeometry, SidecarCache, LodBuilder, AppSettings).  Move the
existing UI shell (MainWindow, SettingsWindow, main.cpp) into
src/ifcviewer-full as the IfcViewerFull executable.  Add a new
src/ifcviewer-minimal target with a MinimalWindow that hosts only the
viewport and reuses the sidecar fast-path for benchmark/debug runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MainWindow and MinimalWindow each carried ~150 lines of mirrored
load-queue, sidecar-thread, streamer-wiring, and ID-rebase code. Lift
all of it into a SceneLoader QObject in the library; both apps now
consume it via signals. Sidecar writes stay on the full-app side since
they need the consumer's element metadata strings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Handle streamer success, failure, and cancellation as distinct terminal states so failed or cancelled loads do not finalize as successful models. Clean up partial model/UI state in the full and minimal viewer apps when a load is cancelled or fails.

Generated with the assistance of an AI coding tool.
Buffer viewport model mutations until the OpenGL context is initialized so loads that start before first exposure do not silently drop geometry or model state.

Generated with the assistance of an AI coding tool.
Previously readSidecar/writeSidecar were keyed on (path, file_size) with
staleness rejected at read time.  Switch to pure path-stem keying: foo.ifc
and foo.ifcdb/ both resolve to foo.ifcview, so the same cache serves either
source format.  Staleness is user-managed (delete the sidecar to force a
rebuild), which also lets sidecars be copied or moved independently of the
source.

v8 header drops the source_file_size field.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The viewer can now open a .rdb directory (as produced by
RocksDbSerializer / convert_path_to_rocksdb) anywhere it accepts an
.ifc file. The full GUI gets an "Add Database..." File menu entry
that opens a directory chooser; the streamer lets the file
constructor autodetect the format and opens the store read-only so
multiple viewers can share a database without taking the exclusive
RocksDB lock.

Parallel mapping on RocksDB-backed files still produces
non-deterministic shape counts (the race is outside the instance
cache), so force num_threads=1 for the iterator when the storage is
RocksDB. Serial RocksDB (~2.6s) and parallel SPF (~0.7s) both
produce 107 shapes on AC20-FZK-Haus; @todo in-source points at the
remaining thread-safety work.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>