Commit 817b690

feat(blog): add post on debugging silent OpenGL samplerCube bug
1 parent 9e0478d commit 817b690

4 files changed

Lines changed: 337 additions & 87 deletions

Lines changed: 229 additions & 0 deletions
@@ -0,0 +1,229 @@
1+
# The Case of the Invisible Geometry: Debugging a Silent OpenGL Driver Validation Failure
2+
3+
*How a `samplerCube` uniform guarded by `if (false)` made the entire scene vanish.*
4+
5+
---
6+
7+
## TL;DR
8+
9+
After a commit that added reflection probes and point shadow cubemaps to the main PBR shader, the editor scene view went black: no 3D geometry, no text on UI buttons, only particle emitters and UI button rectangles were visible. `glDrawArrays` executed for the right entities, the shader compiled without errors, and the framebuffer was configured correctly — yet every pixel was the exact clear color.
10+
11+
The root cause was a driver validation quirk: `default.frag` declared five `samplerCube` uniforms (a reflection probe and four point-shadow maps), but only bound real cubemap textures to their units when those features were actively used. When a scene had no reflection probe and no shadow-casting point lights, those texture units had *no cubemap bound at all*. The driver silently dropped every draw call against the main shader because the declared samplers had no valid backing texture — even though the code paths that would have actually sampled them were guarded by a uniform `bool` that was `false`.
12+
13+
The fix was to create a 1×1 black dummy cubemap at engine initialization and bind it to every `samplerCube` unit. Real cubemaps replace the binding when features are active.
14+
15+
---
16+
17+
## Symptoms
18+
19+
Two screenshots from the editor told the same story:
20+
21+
1. **Main menu scene**: the UI canvas's button rectangles rendered in their button colors, but no text appeared on them. The 3D background props (`BG Cube 1`, `BG Sphere`, `BG Teapot`, etc.) listed in the hierarchy were completely invisible.
2. **Castle throne room scene**: a large 3D scene with 87 entities. Only the torch particle emitters were visible. No walls, no pillars, no floor, no text overlays.
23+
24+
The `Tris:` counter in the status bar read `0`, but that turned out to be a red herring — the counter is only incremented inside `Engine::Render()`, which in editor mode just clears the back buffer to dark gray and hands off to ImGui. The actual scene rendering goes through `Engine::RenderSceneToFBO()`, which doesn't touch the counter. So `Tris: 0` is expected in editor mode regardless of whether geometry is rendering.
25+
26+
The informative observation was what *did* render:
27+
28+
| System | Shader | Status |
| --- | --- | --- |
| UI button quads | `uiShader` | ✅ rendering |
| Light/camera billboards | `spriteShader` | ✅ rendering |
| Particle emitters | `particleShader` | ✅ rendering |
| Text on buttons | `mainShader` via `TextRenderer` | ❌ missing |
| 3D entity geometry | `mainShader` | ❌ missing |
35+
36+
Everything that used `mainShader` was broken. Everything that used a different shader worked. That narrowed the problem to the shader program itself or to how it was being used.
37+
38+
---
39+
40+
## Initial Hypotheses (and why they were wrong)
41+
42+
Before we could see actual data, we burned through several plausible-sounding theories.
43+
44+
**Hypothesis 1: `mainShader` failed to compile.** Plausible because I had recently added an L2 spherical harmonics function and new uniforms for irradiance probes. The `Shader` class runs `checkCompileErrors` which prints to `std::cerr`. I redirected `std::cout`/`std::cerr` to a log file at startup (the engine links as `WIN32` subsystem so stdout is discarded by default), and the log showed no shader errors. The shaders compiled and linked fine. Hypothesis ruled out.
45+
46+
**Hypothesis 2: Alpha clipping.** `default.frag` has an `if (albedoAlpha.a < alphaThreshold) discard;` test. If the texture format returned `alpha = 0` for RGB textures in violation of the spec, every fragment would be discarded. I forced `albedoAlpha.a = 1.0` at the top of the shader. No change. Ruled out.
47+
48+
**Hypothesis 3: Depth test rejection.** If the depth buffer was somehow pre-populated with `0.0`, `GL_LESS` would reject every new fragment. Added a `glDisable(GL_DEPTH_TEST)` in the diagnostic frame and read the pre-draw depth buffer (came back as `1.0`, correct). Ruled out.
49+
50+
**Hypothesis 4: Lighting math producing NaN.** If `finalColor` came out as NaN, some drivers drop the write. Added an `if (any(isnan(finalColor))) FragColor = vec4(1,0,1,1);` magenta override. Magenta never appeared, so no NaN. Ruled out.
51+
52+
**Hypothesis 5: Viewport / matrix corruption.** Logged camera position, view dimensions, and entity world matrices. All values were sensible: camera at `(0, 2, 10)`, viewport `800×600` then resized to `1300×743`, entity world matrices non-degenerate. Ruled out.
53+
54+
Each dead end was only dead in retrospect, but each one added diagnostic scaffolding that became load-bearing for the actual diagnosis.
55+
56+
---
57+
58+
## Getting Observability into a WIN32 GUI App
59+
60+
The first practical blocker was simply *seeing* anything. Norlong is linked as a Windows GUI app (`add_executable(Norlong WIN32 ...)`), which means there's no inherited console. Any `std::cout` goes nowhere by default, and capturing it with `Start-Process -RedirectStandardOutput` didn't produce output either.
61+
62+
The fix was one of the smallest patches of this debug session:
63+
64+
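```cpp
// main.cpp
#include <fstream>
#include <iostream>

// Inside main(), before the engine starts.
// `static` keeps the file stream alive until program exit.
static std::ofstream engineLog("engine.log", std::ios::trunc);
std::cout.rdbuf(engineLog.rdbuf());
std::cerr.rdbuf(engineLog.rdbuf());
```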
70+
71+
This redirects the standard streams into a file on disk. After rebuild, the engine wrote a readable log that confirmed shader compilation succeeded, scene loaded, and the main loop was running. That gave us a signal channel for everything that followed.
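
A slightly more defensive variant of the same redirect can be sketched as a small RAII wrapper that hands the original stream buffers back when it goes out of scope. `ScopedLogRedirect` is a hypothetical name and not part of the engine; the sketch assumes the redirect should cover both `std::cout` and `std::cerr`, as the patch above does.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper (not from the engine): redirects std::cout and std::cerr
// into a file for as long as the object lives, then restores the originals.
class ScopedLogRedirect {
public:
    explicit ScopedLogRedirect(const std::string& path)
        : file_(path, std::ios::trunc),
          oldCout_(std::cout.rdbuf()),
          oldCerr_(std::cerr.rdbuf()) {
        // Debug-only sanity check; a release build would want a real fallback.
        assert(file_.is_open() && "could not open log file");
        std::cout.rdbuf(file_.rdbuf());
        std::cerr.rdbuf(file_.rdbuf());
    }

    ~ScopedLogRedirect() {
        // Flush pending output, then hand the original buffers back so nothing
        // writes through a dangling pointer after the file is closed.
        std::cout.flush();
        std::cerr.flush();
        std::cout.rdbuf(oldCout_);
        std::cerr.rdbuf(oldCerr_);
    }

    ScopedLogRedirect(const ScopedLogRedirect&) = delete;
    ScopedLogRedirect& operator=(const ScopedLogRedirect&) = delete;

private:
    std::ofstream file_;          // declared first so it is constructed first
    std::streambuf* oldCout_;     // original std::cout buffer
    std::streambuf* oldCerr_;     // original std::cerr buffer
};
```

Constructing one `ScopedLogRedirect redirect("engine.log");` at the top of `main()` would give the same log file as the three-line patch, with the streams restored before the file closes.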
72+
73+
---
74+
75+
## Turning the GPU Into a Debugger
76+
77+
The key insight for actually finding the bug was to treat the fragment shader and framebuffer as an instrumentation surface. Two kinds of probes, composed:
78+
79+
**CPU-side, per-frame:** inside `RenderSceneToFBO`, before returning, read the framebuffer color attachment back to CPU memory with `glReadPixels` and scan it for pixels that differ from the clear color. Log aggregate stats (min/max R/G/B values, count of non-sky pixels, sample RGB triples) and a handful of OpenGL state variables (`GL_DEPTH_WRITEMASK`, `GL_COLOR_WRITEMASK`, `GL_DEPTH_FUNC`, current draw buffers):
80+
81+
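```cpp
// Read the scene FBO's color attachment back as tightly packed RGB8.
std::vector<unsigned char> buf(W * H * 3);
glBindFramebuffer(GL_READ_FRAMEBUFFER, sceneViewFBO);
glReadBuffer(GL_COLOR_ATTACHMENT0);
// The default GL_PACK_ALIGNMENT is 4; with 3-byte pixels that pads each row
// whenever W * 3 is not a multiple of 4, which would overrun buf.
glPixelStorei(GL_PACK_ALIGNMENT, 1);
glReadPixels(0, 0, W, H, GL_RGB, GL_UNSIGNED_BYTE, buf.data());

int maxR = 0, diffFromSky = 0;
for (int i = 0; i < W * H; ++i) {
    int r = buf[i*3], g = buf[i*3+1], b = buf[i*3+2];
    if (r > maxR) maxR = r;
    // (10, 10, 25) is the 8-bit clear color; count anything that differs.
    if (r != 10 || g != 10 || b != 25) diffFromSky++;
}
```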
94+
95+
That last check is the one that unlocked everything. The scene's clear color was `(0.04, 0.04, 0.1)` in linear space, which becomes `(10, 10, 25)` when written to an 8-bit framebuffer. Counting pixels that *differ* from exactly that value — not just pixels that are "bright" — distinguishes three very different failure modes:
96+
97+
- If fragments write `(0,0,0)` the count is high (the pixels are black, not sky).
- If fragments write a `finalColor` that is very small but positive, the count is also nonzero.
- If fragments are never written at all, the count is `0` because every pixel is still the cleared value.
100+
101+
The diagnostic said `diffFromSky = 0`: every pixel was **exactly** the clear color. That is not "dark lighting". That is "no write happened". Fragments were either discarded via `discard` or never reached the output stage.
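
The counting step can be pulled out of the render function into a pure helper that is easy to unit-test without a GL context. The following is a minimal sketch under stated assumptions: `Rgb8` and `countPixelsDifferingFrom` are hypothetical names, the readback is 8-bit RGB, and `strideBytes` is whatever row pitch the readback actually used (`width * 3` when `GL_PACK_ALIGNMENT` is 1).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit RGB triple used to describe the clear color.
struct Rgb8 { std::uint8_t r, g, b; };

// Counts pixels in an RGB8 readback that differ from the clear color.
// strideBytes is the distance between row starts, so padded rows work too;
// padding bytes at the end of a row are never inspected.
std::size_t countPixelsDifferingFrom(const std::vector<std::uint8_t>& buf,
                                     int width, int height,
                                     std::size_t strideBytes, Rgb8 clear) {
    assert(width >= 0 && height >= 0);
    const std::size_t rowBytes = static_cast<std::size_t>(width) * 3;
    assert(strideBytes >= rowBytes);
    assert(height == 0 ||
           buf.size() >= strideBytes * static_cast<std::size_t>(height - 1) + rowBytes);

    std::size_t differing = 0;
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* row = buf.data() + static_cast<std::size_t>(y) * strideBytes;
        for (int x = 0; x < width; ++x) {
            const std::uint8_t* p = row + static_cast<std::size_t>(x) * 3;
            if (p[0] != clear.r || p[1] != clear.g || p[2] != clear.b) ++differing;
        }
    }
    return differing;
}
```

A result of `0` corresponds to the "no write happened" case; any black or dim fragment that did land in the framebuffer makes the count nonzero.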
102+
103+
**Shader-side, per-fragment:** to distinguish *which* intermediate computation was breaking the pipeline, I temporarily replaced the final `FragColor = vec4(finalColor, alpha)` write with diagnostic variants and ran each one:
104+
105+
| Shader output line | Result | What it told us |
| --- | --- | --- |
| `FragColor = vec4(1.0, 1.0, 0.0, 1.0)` | Yellow pixels appear | The fragment shader executes and the framebuffer write path works. |
| `FragColor = vec4(alpha, alpha, alpha, 1)` | White pixels | The local `alpha` variable is valid (`= 1.0`). |
| `FragColor = vec4(0.5, 0.25, 0.75, 1)` | Purple pixels | Writing arbitrary literals works. |
| `FragColor = vec4(albedoAlpha.rgb, 1)` | White pixels (texture color) | Sampling `texture1` works. |
| `FragColor = vec4(albedoLinear, 1)` | Tinted colors appear | PBR albedo math works. |
| `FragColor = vec4(Lo * 5.0, 1)` | **Zero pixels differ from sky** | Something upstream of `Lo` is wrong. |
| `FragColor = vec4(finalColor * 1e6, 1)` | **Zero pixels differ from sky** | Not a magnitude problem. |
114+
115+
The critical clue: when the shader's output only depended on values the optimizer could compute without touching samplers, pixels appeared. When it depended on values that required the lighting path to run (which keeps the sampler-access code alive from a dead-code-elimination perspective), pixels did not appear.
116+
117+
That pattern — "the shader runs, writes *sometimes* work, and whether they work correlates with whether certain samplers are live" — is the fingerprint of a sampler-binding validation issue.
118+
119+
---
120+
121+
## The Bisect
122+
123+
To isolate which sampler was the culprit, I checked out `default.frag` from commit `949d50b` — a version from *before* the MRT/normal-output/reflection-probe refactor. It has a single `out vec4 FragColor` and no `samplerCube reflectionProbe`. With that shader, the scene rendered. Dim, because the ambient is low in the main menu, but *rendered*.
124+
125+
From there I ran a careful incremental merge: take the old working shader and add pieces of the new shader one at a time, re-running after each edit:
126+
127+
1. Add `layout(location = 1) out vec4 NormalOutput;` and write to it at the end → still works.
2. Add the `samplerCube reflectionProbe` uniform declaration (but no usage in `main()`) → still works.
3. Add the irradiance-probe SH uniforms (`uniform vec3 shCoeffs[9]`, etc.) → still works.
4. Add the reflection-probe sampling block:

   ```glsl
   if (hasReflectionProbe && metallic > 0.01) {
       ...
       vec3 envColor = textureLod(reflectionProbe, R, mip).rgb;
       Lo += envColor * F0 * ...;
   }
   ```

   → **renders go black.**
140+
141+
Commenting just that block back out restored rendering. The `textureLod(reflectionProbe, ...)` call was the discriminating line.
142+
143+
Crucially, `hasReflectionProbe` was `false` in this scene. The branch body *never executed at runtime*. But removing the code inside the branch fixed the problem. That could only mean one thing: the driver wasn't checking whether the sampler was *accessed*, it was checking whether it was *declared* and had a valid texture bound to its assigned unit.
144+
145+
---
146+
147+
## The Bug
148+
149+
`default.frag` declares several `samplerCube` uniforms:
150+
151+
```glsl
uniform samplerCube pointShadowMap0;
uniform samplerCube pointShadowMap1;
uniform samplerCube pointShadowMap2;
uniform samplerCube pointShadowMap3;
uniform samplerCube reflectionProbe;
```
158+
159+
At engine initialization, the C++ side assigns these samplers to dedicated texture units:
160+
161+
```cpp
mainShader->setInt("pointShadowMap0", 7);
mainShader->setInt("pointShadowMap1", 8);
mainShader->setInt("pointShadowMap2", 9);
mainShader->setInt("pointShadowMap3", 10);
// reflectionProbe was implicitly unit 15, set only inside the baking code
```
168+
169+
During normal scene rendering — say, a main menu with no active point shadows and no reflection probe — *nothing ever binds a cubemap to units 7, 8, 9, 10, or 15*. The C++ code only touches those units when a real point shadow or reflection probe is being rendered.
170+
171+
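According to the OpenGL 3.3 core specification, this is not supposed to matter. The "Texture Access" rules under "Shader Execution" (Section 2.11.7 for vertex shaders, with a matching rule in the fragment-shader chapter) define the result of sampling an incomplete texture: the sampler returns `(0, 0, 0, 1)`. A unit that never had a cubemap bound still holds the default cubemap object, which is incomplete because it has no face images, so a spec-compliant driver returns that default value instead of failing the draw.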
172+
173+
But *some* drivers — typically Windows OpenGL drivers bundled with certain GPU stacks — run a stricter validation pass at draw time: every `samplerCube` declared in the linked program must have a cubemap texture bound to its assigned unit, even if the shader's control flow would never actually sample it. When that validation fails, the draw is silently dropped. No `GL_INVALID_OPERATION`, no error callback — just no pixels.
174+
175+
That matches the symptoms exactly:
176+
177+
- The draws were issued and the fragment shader could run: the CPU-side counter `_dbgDrawn` recorded 4 draws per frame, and the constant-color variants produced pixels.
- Writes that used values the compiler could *prove* were independent of the guarded sampler access (constants, `alpha`, `albedoAlpha`) produced visible pixels, because the optimizer collapsed the shader to a short path that didn't keep the samplers live.
- Writes that depended on `Lo` or `finalColor` required the lighting loop, which contains the reflection-probe block. That kept the sampler references live, triggered the draw-time validation, and resulted in a silently dropped draw.
180+
181+
The old single-output shader worked coincidentally: it had the same `samplerCube pointShadowMap0..3` declarations, but for some scenes those were bound by the point-shadow rendering pass earlier in the same frame, and the bindings persisted across state changes because nothing explicitly rebound units 7 through 10 to anything else. The reflection-probe sampler was added *after* the old shader, so the coincidence broke.
182+
183+
This is exactly the class of bug that is very hard to trigger in CI or in validation layers for OpenGL, because the spec says it shouldn't happen. But it reliably happens on some production drivers, and you won't see a diagnostic message — just an empty frame.
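
Because the driver gives no diagnostic, the binding gap described in this section has to be caught on the CPU side. The sketch below is one way that could look, assuming the engine keeps (or can be taught to keep) a record of which texture targets it has bound on which units. Everything in it is hypothetical bookkeeping: `SamplerKind`, `SamplerSlot`, `BoundTargets`, and `auditSamplerBindings` are illustrative names, not engine or OpenGL APIs. The second check covers a related condition that the specification does define and that is worth ruling out in the same situation: samplers of different types pointing at the same texture unit make the draw fail with `GL_INVALID_OPERATION`, and a sampler uniform that is never assigned keeps its default value of `0`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class SamplerKind { Tex2D, Cube };

// One sampler uniform as the engine configured it (hypothetical bookkeeping).
struct SamplerSlot {
    std::string name;
    SamplerKind kind;
    int unit;
};

// (unit, target) pairs the engine believes currently hold a real texture.
using BoundTargets = std::set<std::pair<int, SamplerKind>>;

// Returns one human-readable line per problem; an empty result means every
// declared sampler has a matching texture and no unit is shared across types.
std::vector<std::string> auditSamplerBindings(const std::vector<SamplerSlot>& slots,
                                              const BoundTargets& bound) {
    std::vector<std::string> problems;
    std::map<int, SamplerKind> firstKindOnUnit;
    for (const SamplerSlot& s : slots) {
        assert(s.unit >= 0);
        const std::string where = s.name + " (unit " + std::to_string(s.unit) + ")";
        // Check 1: the unit must hold a texture of the sampler's own target.
        if (bound.count({s.unit, s.kind}) == 0)
            problems.push_back(where + ": no texture of the matching target is bound");
        // Check 2: two sampler types must not point at the same unit.
        auto inserted = firstKindOnUnit.emplace(s.unit, s.kind);
        if (!inserted.second && inserted.first->second != s.kind)
            problems.push_back(where + ": shares its unit with a sampler of another type");
    }
    return problems;
}
```

Running such an audit once per frame in debug builds, right before the first draw with the main shader, would turn an empty frame into a list of named samplers to look at.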
184+
185+
---
186+
187+
## The Fix
188+
189+
Create a 1×1 black cubemap at engine startup and bind it to every `samplerCube` texture unit. Real cubemaps replace the binding when a feature needs them and fall back to the dummy otherwise:
190+
191+
```cpp
// One-time setup at engine initialization, after the GL context exists.
// dummyCubemap is a GLuint that must outlive every draw that relies on it.
glGenTextures(1, &dummyCubemap);
glBindTexture(GL_TEXTURE_CUBE_MAP, dummyCubemap);

// A cubemap is only complete once all six faces have storage.
unsigned char black[4] = {0, 0, 0, 255};
for (int face = 0; face < 6; ++face)
    glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, 0, GL_RGBA, 1, 1, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, black);

// Only mip level 0 is uploaded, so the min filter must not be a mipmap filter.
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_WRAP_R, GL_CLAMP_TO_EDGE);

// Units used by the five samplerCube uniforms (7-10 shadows, 15 probe).
for (int unit : {7, 8, 9, 10, 15}) {
    glActiveTexture(GL_TEXTURE0 + unit);
    glBindTexture(GL_TEXTURE_CUBE_MAP, dummyCubemap);
}
glActiveTexture(GL_TEXTURE0);  // restore the default active unit
```
210+
211+
Six `glTexImage2D` calls, five texture-unit bindings, one `glGenTextures`. Zero runtime cost in the draw path. The `hasReflectionProbe` uniform still gates the actual `textureLod` at runtime, so the dummy cubemap is only ever *sampled* when the shader would have sampled it anyway (which it doesn't, in scenes without probes), and when it *is* sampled, it contributes `(0,0,0)` which is the spec-correct default for an incomplete texture.
212+
213+
---
214+
215+
## Lessons
216+
217+
**Always bind something to every sampler unit you declare.** Relying on uniform-bool guards to prevent sampler access is legal by the spec but not portable in practice. Create a dummy texture per sampler type (`sampler2D`, `samplerCube`, `sampler2DArray`, whatever you use) and use it as a default binding. It costs nothing.
218+
219+
**Use the framebuffer as a debugger.** When output looks wrong, the fastest diagnostic is to temporarily replace the final color with something that isolates one intermediate value. Constants rule out the whole pipeline. Individual uniforms or samplers rule in/out specific inputs. Scaled intermediates reveal magnitude problems. Count pixels that *differ from the clear color* rather than pixels that are "visible" — black pixels are a valid output, and treating them as invisible hides information.
220+
221+
**Trust bisecting more than reasoning.** Several hypotheses above were sensible and ruled out quickly. The actual bug — a sampler validation pass that isn't in the spec — would not have been on any list of hypotheses to check. But a clean "works vs. doesn't work" bisect on the shader source converged on it in a few edits.
222+
223+
**Redirect stdout/stderr to a log file in GUI apps from day one.** Shader compile errors, GL debug messages, any `std::cerr` you write for diagnostics — none of them are visible in a `WIN32`-subsystem executable without either a console allocation or a file redirect. A three-line addition to `main()` would have surfaced all of this ten minutes earlier.
224+
225+
**If you can, turn on `GL_KHR_debug`.** Register a debug callback with `glDebugMessageCallback` in debug builds and log every error, performance warning, and undefined-behavior report the driver is willing to give you. The draw wasn't producing a `GL_INVALID_OPERATION`, so even this wouldn't have flagged the exact bug here, but in the general case it converts silent driver quirks into explicit log lines and is free instrumentation.
226+
227+
---
228+
229+
*Commit: `2b5def6` of norlong — "Fix missing text/geometry in editor scene view".*

public/posts/posts.json

Lines changed: 21 additions & 0 deletions
@@ -1,4 +1,25 @@
 [
+  {
+    "slug": "debugging-opengl-samplercube-bug",
+    "title": "The Case of the Invisible Geometry: Debugging a Silent OpenGL Driver Validation Failure",
+    "date": "2026-04-24",
+    "updated": "2026-04-24",
+    "description": "How a samplerCube uniform guarded by `if (false)` made the entire 3D scene vanish — a silent OpenGL driver validation quirk, the bisect that caught it, and the one-paragraph fix.",
+    "tags": [
+      "opengl",
+      "graphics",
+      "debugging",
+      "shaders",
+      "cpp",
+      "rendering",
+      "norlong"
+    ],
+    "category": "dev",
+    "filename": "debugging-opengl-samplercube-bug.txt",
+    "authors": [
+      "fezcode"
+    ]
+  },
   {
     "slug": "vite-vike-ssg-migration",
     "title": "Migrating Fezcodex From CRA to Vite + Vike: A Static-Site-Generation Deep Dive",

public/rss.xml

Lines changed: 2 additions & 2 deletions
@@ -9,9 +9,9 @@
 <link>https://fezcode.com</link>
 </image>
 <generator>RSS for Node</generator>
-<lastBuildDate>Thu, 23 Apr 2026 23:01:51 GMT</lastBuildDate>
+<lastBuildDate>Thu, 23 Apr 2026 23:50:41 GMT</lastBuildDate>
 <atom:link href="https://fezcode.com/rss.xml" rel="self" type="application/rss+xml"/>
-<pubDate>Thu, 23 Apr 2026 23:01:51 GMT</pubDate>
+<pubDate>Thu, 23 Apr 2026 23:50:41 GMT</pubDate>
 <copyright><![CDATA[2026 Ahmed Samil Bulbul]]></copyright>
 <language><![CDATA[en]]></language>
 <managingEditor><![CDATA[samil.bulbul@gmail.com (Ahmed Samil Bulbul)]]></managingEditor>
