<pre class='metadata'>
Title: WebGPU Explainer
Shortname: webgpu-explainer
Level: None
Status: w3c/CG-DRAFT
Group: webgpu
URL: https://gpuweb.github.io/gpuweb/explainer/
!Participate: <a href="https://github.com/gpuweb/gpuweb/issues/new?labels=explainer">File an issue</a> (<a href="https://github.com/gpuweb/gpuweb/labels/explainer">open issues</a>)
Editor: Kai Ninomiya, Google https://www.google.com, kainino@google.com, w3cid 99487
Editor: Corentin Wallez, Google https://www.google.com, cwallez@google.com
Editor: Dzmitry Malyshau, Mozilla https://www.mozilla.org, dmalyshau@mozilla.com, w3cid 96977
No Abstract: true
Markup Shorthands: markdown yes
Markup Shorthands: dfn yes
Markup Shorthands: idl yes
Markup Shorthands: css no
Assume Explicit For: yes
Boilerplate: repository-issue-tracking no
</pre>
<style>
/* Our SVGs aren't responsive to light/dark mode, so they're opaque with a
* white or black background. Rounded corners make them a bit less jarring. */
object[type="image/svg+xml"] {
border-radius: .5em;
}
</style>
Issue(tabatkins/bikeshed#2006): Set up cross-linking into the WebGPU and WGSL specs.
Issue(gpuweb/gpuweb#1321): Complete the planned sections.
# Introduction # {#introduction}
WebGPU is a proposed Web API to enable webpages to use the system's [GPU (Graphics Processing Unit)](https://en.wikipedia.org/wiki/Graphics_processing_unit) to perform computations and draw complex images that can be presented inside the page.
This goal is similar to the [WebGL](https://www.khronos.org/webgl/) family of APIs, but WebGPU enables access to more advanced features of GPUs.
Whereas WebGL is mostly for drawing images but can be repurposed (with great effort) to do other kinds of computations, WebGPU has first-class support for performing general computations on the GPU.
## Use cases ## {#use-cases}
Example use cases for WebGPU that aren't addressed by WebGL 2 are:
- Drawing images with highly-detailed scenes with many different objects (such as CAD models). WebGPU's drawing commands are individually cheaper than WebGL's.
- Executing advanced algorithms for drawing realistic scenes.
Many modern rendering techniques and optimizations cannot execute on WebGL 2 due to the lack of support for general computations.
- Executing machine learning models efficiently on the GPU.
It is possible to do general-purpose GPU (GPGPU) computation in WebGL, but it is sub-optimal and much more difficult.
Concrete examples are:
- Improving existing Javascript 3D libraries like Babylon.js and Three.js with new rendering techniques (compute-based particles, fancier post-processing, ...) and offloading to the GPU expensive computations currently done on the CPU (culling, skinned model transformation, ...).
- Porting newer game engines to the Web, and enabling engines to expose more advanced rendering features.
For example, Unity's WebGL export uses the lowest feature set of the engine, but WebGPU could use a higher feature set.
- Porting new classes of applications to the Web: many productivity applications offload computations to the GPU and need WebGPU's support for general computations.
- Improving existing Web teleconferencing applications. For example, Google Meet uses machine learning to separate the user from the background.
Running the machine learning in WebGPU would make it faster and more power-efficient, allowing (1) these capabilities to reach cheaper, more accessible user devices and (2) more complex and robust models.
## Goals ## {#goals}
Goals:
- Enable rendering of modern graphics both onscreen and offscreen.
- Enable general purpose computations to be executed efficiently on the GPU.
- Support implementations targeting various native GPU APIs: Microsoft's D3D12, Apple's Metal, and Khronos' Vulkan.
- Provide a human-authorable language to specify computations to run on the GPU.
- Be implementable in the multi-process architecture of browsers and uphold the security of the Web.
- As much as possible, have applications work portably across different user systems and browsers.
- Interact with the rest of the Web platform in useful but carefully-scoped ways (essentially sharing images one way or another).
- Provide a foundation to expose modern GPU functionality on the Web.
WebGPU is structured similarly to all current native GPU APIs, even if it doesn't provide all their features.
There are plans to later extend it to have more modern functionality.
See also: [[#why-not-webgl3]].
Non-goals:
- Expose support for hardware that's not programmable at all, or much less flexible, like DSPs or specialized machine learning hardware.
- Expose support for hardware that can't do general-purpose computations (like older mobile phone GPUs or even older desktop GPUs).
- Exhaustively expose all functionality available on native GPU APIs (some functionality is only available on GPUs from a single vendor, or is too niche to be added to WebGPU).
- Allow extensive mixing and matching of WebGL and WebGPU code.
- Tightly integrate with the page rendering flow like [CSS Houdini](https://developer.mozilla.org/en-US/docs/Web/Houdini).
## Why not "WebGL 3"? ## {#why-not-webgl3}
WebGL 1.0 and WebGL 2.0 are Javascript projections of the OpenGL ES 2.0 and OpenGL ES 3.0 APIs, respectively. WebGL's design traces its roots back to the OpenGL 1.0 API released in 1992 (which further traces its roots back to IRIS GL from the 1980s). This lineage has many advantages, including the vast available body of knowledge and the relative ease of porting applications from OpenGL ES to WebGL.
However, this also means that WebGL doesn't match the design of modern GPUs, causing CPU performance and GPU performance issues. It also makes it increasingly hard to implement WebGL on top of modern native GPU APIs. [WebGL 2.0 Compute](https://www.khronos.org/registry/webgl/specs/latest/2.0-compute/) was an attempt at adding general compute functionality to WebGL but the impedance mismatch with native APIs made the effort incredibly difficult. Contributors to WebGL 2.0 Compute decided to focus their efforts on WebGPU instead.
# Additional Background # {#background}
## Sandboxed GPU Processes in Web Browsers ## {#gpu-process}
A major design constraint for WebGPU is that it must be implementable and efficient in browsers that use a GPU-process architecture.
GPU drivers need access to additional kernel syscalls beyond those otherwise used for Web content, and many GPU drivers are prone to hangs or crashes.
To improve stability and sandboxing, browsers use a special process that contains the GPU driver and talks with the rest of the browser through asynchronous IPC.
GPU processes are (or will be) used in Chromium, Gecko, and WebKit.
GPU processes are less sandboxed than content processes, and they are typically shared between multiple origins.
Therefore, they must validate all messages, for example to prevent a compromised content process from being able to look at the GPU memory used by another content process.
Most of WebGPU's validation rules are necessary to ensure it is secure to use, so all the validation needs to happen in the GPU process.
Likewise, all GPU driver objects only live in the GPU process, including large allocations (like buffers and textures) and complex objects (like pipelines).
In the content process, WebGPU types (`GPUBuffer`, `GPUTexture`, `GPURenderPipeline`, ...) are mostly just "handles" that identify objects that live in the GPU process.
This means that the CPU and GPU memory used by WebGPU objects isn't necessarily known in the content process.
A `GPUBuffer` object can use maybe 150 bytes of CPU memory in the content process but hold a 1GB allocation of GPU memory.
See also the description of [the content and device timelines in the specification](https://gpuweb.github.io/gpuweb/#programming-model-timelines).
## Memory Visibility with GPUs and GPU Processes ## {#memory-visibility}
The two major types of GPUs are called "integrated GPUs" and "discrete GPUs".
Discrete GPUs are separate from the CPU; they usually come as PCI-e cards that you plug into the motherboard of a computer.
Integrated GPUs live on the same die as the CPU and don't have their own memory chips; instead, they use the same RAM as the CPU.
When using a discrete GPU, it's easy to see that most GPU memory allocations aren't visible to the CPU because they are inside the GPU's RAM (or VRAM for Video RAM).
For integrated GPUs, most memory allocations are in the same physical places, but not made visible to the CPU for various reasons (for example, the CPU and GPU can have separate caches for the same memory, so accesses are not cache-coherent).
Instead, for the CPU to see the content of a GPU buffer, it must be "mapped", making it available in the virtual memory space of the application (think of mapped as in `mmap()`).
GPUBuffers must be specially allocated in order to be mappable - this can make them less efficient to access from the GPU (for example if they need to be allocated in RAM instead of VRAM).
All this discussion was centered around native GPU APIs, but in browsers, the GPU driver is loaded in the *GPU process*, so native GPU buffers can be mapped only in the GPU process's virtual memory.
In general, it is not possible to map the buffer directly inside the *content process* (though some systems can do this, providing optional optimizations).
To work with this architecture an extra "staging" allocation is needed in shared memory between the GPU process and the content process.
The table below recapitulates which type of memory is visible where:
<table class="data">
<thead>
<tr>
<th>
<th> Regular `ArrayBuffer`
<th> Shared Memory
<th> Mappable GPU buffer
<th> Non-mappable GPU buffer (or texture)
</tr>
</thead>
<tr>
<td> CPU, in the content process
<td> **Visible**
<td> **Visible**
<td> Not visible
<td> Not visible
<tr>
<td> CPU, in the GPU process
<td> Not visible
<td> **Visible**
<td> **Visible**
<td> Not visible
<tr>
<td> GPU
<td> Not visible
<td> Not visible
<td> **Visible**
<td> **Visible**
</table>
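To make the staging flow concrete, here is a sketch of reading GPU-only data back to the CPU through a mappable staging buffer. The helper name and structure are hypothetical; the calls used (`createBuffer`, `copyBufferToBuffer`, `mapAsync`, `getMappedRange`) are standard WebGPU API:

```js
// Read back the contents of a (non-mappable) GPU buffer by copying it
// into a mappable staging buffer, then mapping that staging buffer.
// Hypothetical helper; assumes `srcBuffer` was created with COPY_SRC usage.
async function readbackBuffer(device, srcBuffer, size) {
  // Only buffers allocated with MAP_READ can be mapped for reading.
  const staging = device.createBuffer({
    size,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
  });
  // Copy on the GPU timeline from the GPU-only buffer into the staging buffer.
  const encoder = device.createCommandEncoder();
  encoder.copyBufferToBuffer(srcBuffer, 0, staging, 0, size);
  device.queue.submit([encoder.finish()]);
  // mapAsync resolves once the copy is done and the content is visible
  // to the content process.
  await staging.mapAsync(GPUMapMode.READ);
  const copy = staging.getMappedRange().slice(0);
  staging.unmap();
  staging.destroy();
  return copy;
}
```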
# JavaScript API # {#api}
This section goes into details on important and unusual aspects of the WebGPU JavaScript API.
Generally, each subsection can be considered its own "mini-explainer",
though some require context from previous subsections.
## Adapters and Devices ## {#adapters-and-devices}
A WebGPU "adapter" (`GPUAdapter`) is an object which identifies a particular WebGPU
implementation on the system (e.g. a hardware accelerated implementation on an integrated or
discrete GPU, or software implementation).
Two different `GPUAdapter` objects on the same page could refer to the same underlying
implementation, or to two different underlying implementations (e.g. integrated and discrete GPUs).
The set of adapters visible to the page is at the discretion of the user agent.
A WebGPU "device" (`GPUDevice`) represents a logical connection to a WebGPU adapter.
It is called a "device" because it abstracts away the underlying implementation (e.g. video card)
and encapsulates a single connection: code that owns a device can act as if it is the only user
of the adapter.
As part of this encapsulation, a device is the root owner of all WebGPU objects created from it
(textures, etc.), which can be (internally) freed whenever the device is lost or destroyed.
Multiple components on a single webpage can each have their own WebGPU device.
All WebGPU usage is done through a WebGPU device or objects created from it.
In this sense, it serves a subset of the purpose of `WebGLRenderingContext`; however, unlike
`WebGLRenderingContext`, it is not associated with a canvas object, and most commands are
issued through "child" objects.
### Adapter Selection and Device Init ### {#initialization}
To get an adapter, an application calls `navigator.gpu.requestAdapter()`, optionally passing
options which may influence what adapter is chosen, like a
`powerPreference` (`"low-power"` or `"high-performance"`) or
`forceFallbackAdapter` to force a software implementation.
`requestAdapter()` never rejects, but may resolve to null if an adapter can't be returned with
the specified options.
A returned adapter exposes `info` (`vendor`/`architecture`/etc., implementation-defined), a boolean `isFallbackAdapter` so
applications with fallback paths (like WebGL or 2D canvas) can avoid slow software implementations,
and the [[#optional-capabilities]] available on the adapter.
<pre highlight=js>
const adapter = await navigator.gpu.requestAdapter(options);
if (!adapter) return goToFallback();
</pre>
To get a device, an application calls `adapter.requestDevice()`, optionally passing a descriptor
which enables additional optional capabilities - see [[#optional-capabilities]].
`requestDevice()` will reject (only) if the request is invalid,
i.e. it exceeds the capabilities of the adapter.
If anything else goes wrong in creation of the device,
it will resolve to a `GPUDevice` which has already been lost - see [[#device-loss]].
(This reduces the number of different situations an app must handle
by avoiding an extra possible return value like `null` or another exception type.)
<pre highlight=js>
const device = await adapter.requestDevice(descriptor);
device.lost.then(recoverFromDeviceLoss);
</pre>
An adapter may become unavailable, e.g. if it is unplugged from the system, disabled to save
power, or marked "stale" (`[[current]]` becomes false).
From then on, such an adapter can no longer vend valid devices,
and always returns already-lost `GPUDevice`s.
### Optional Capabilities ### {#optional-capabilities}
Each adapter may have different optional capabilities called "features" and "limits".
These are the maximum possible capabilities that can be requested when a device is created.
The set of optional capabilities exposed on each adapter is at the discretion of the user agent.
A device is created with an exact set of capabilities, specified in the arguments to
`adapter.requestDevice()` (see above).
When any work is issued to a device, it is strictly validated against the capabilities of the
device - not the capabilities of the adapter.
This eases development of portable applications by avoiding implicit dependence on the
capabilities of the development system.
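For example, an application can keep its device request portable by asking only for optional features the adapter actually advertises. The helper below is a hypothetical sketch:

```js
// Enable optional features only when the adapter advertises them, so
// requestDevice() cannot reject for exceeding the adapter's capabilities.
// Helper name is illustrative, not part of the API.
async function requestDeviceWithOptionalFeatures(adapter, wantedFeatures) {
  // adapter.features is a set-like listing of feature names this adapter
  // can enable; requesting anything beyond it would reject the request.
  const requiredFeatures = wantedFeatures.filter(f => adapter.features.has(f));
  // Work issued to the resulting device is validated against exactly
  // these features, not everything the adapter could have offered.
  return adapter.requestDevice({ requiredFeatures });
}
```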
## Object Validity and Destroyed-ness ## {#invalid-and-destroyed}
### WebGPU's Error Monad ### {#error-monad}
A.k.a. Contagious Internal Nullability.
A.k.a. transparent [promise pipelining](http://erights.org/elib/distrib/pipeline.html).
WebGPU is a very chatty API, with some applications making tens of thousands of calls per frame to render complex scenes.
We have seen that the GPU process needs to validate commands to uphold its security properties.
To avoid the overhead of validating commands twice, in both the GPU and content processes, WebGPU is designed so JavaScript calls can be forwarded directly to the GPU process and validated there.
See the error section for more details on what's validated where and how errors are reported.
At the same time, during a single frame WebGPU objects can be created that depend on one another.
For example a `GPUCommandBuffer` can be recorded with commands that use temporary `GPUBuffer`s created in the same frame.
In this example, because of WebGPU's performance constraints, it is not possible to send the message to create the `GPUBuffer` to the GPU process and synchronously wait for its processing before continuing JavaScript execution.
Instead, in WebGPU all objects (like `GPUBuffer`) are created immediately on the content timeline and returned to JavaScript.
The validation is almost all done asynchronously on the "device timeline".
In the good case, when no errors occur, everything looks to JS as if it is synchronous.
However, when an error occurs in a call, it becomes a no-op (except for error reporting).
If the call returns an object (like `createBuffer`), the object is tagged as "invalid" on the GPU process side.
Since validation and allocation occur asynchronously, errors are reported asynchronously.
By itself, this can make for challenging debugging - see [[#errors-cases-debugging]].
All WebGPU calls validate that all their arguments are valid objects.
As a result, if a call takes an invalid WebGPU object and returns a new one, the new object is also invalid (hence the term "contagious").
<figure>
<figcaption>
Timeline diagram of messages passing between processes, demonstrating how errors are propagated without synchronization.
</figcaption>
<object type="image/svg+xml" data="img/error_monad_timeline_diagram.svg"></object>
</figure>
<div class=example>
Using the API when doing only valid calls looks like a synchronous API:
<pre highlight="js">
const srcBuffer = device.createBuffer({
size: 4,
usage: GPUBufferUsage.COPY_SRC
});
const dstBuffer = ...;
const encoder = device.createCommandEncoder();
encoder.copyBufferToBuffer(srcBuffer, 0, dstBuffer, 0, 4);
const commands = encoder.finish();
device.queue.submit([commands]);
</pre>
</div>
<div class=example>
Errors propagate contagiously when creating objects:
<pre highlight="js">
// The size of the buffer is too big, this causes an OOM and srcBuffer is invalid.
const srcBuffer = device.createBuffer({
size: BIG_NUMBER,
usage: GPUBufferUsage.COPY_SRC
});
const dstBuffer = ...;
// The encoder starts as a valid object.
const encoder = device.createCommandEncoder();
// Special case: an invalid object is used when encoding commands, so the encoder
// becomes invalid.
encoder.copyBufferToBuffer(srcBuffer, 0, dstBuffer, 0, 4);
// Since the encoder is invalid, encoder.finish() is invalid and returns
// an invalid object.
const commands = encoder.finish();
// The command references an invalid object so it becomes a no-op.
device.queue.submit([commands]);
</pre>
</div>
#### Mental Models #### {#error-monad-mental-model}
One way to interpret WebGPU's semantics is that every WebGPU object is actually a `Promise` internally and that all WebGPU methods are `async` and `await` before using each of the WebGPU objects it gets as argument.
However the execution of the async code is outsourced to the GPU process (where it is actually done synchronously).
Another way, closer to actual implementation details, is to imagine that each `GPUFoo` JS object maps to a `gpu::InternalFoo` C++/Rust object on the GPU process that contains a `bool isValid`.
Then during the validation of each command on the GPU process, the `isValid` are all checked and a new, invalid object is returned if validation fails.
On the content process side, the `GPUFoo` implementation doesn't know if the object is valid or not.
### Early Destruction of WebGPU Objects ### {#early-destroy}
Most of the memory usage of WebGPU objects is in the GPU process: it can be GPU memory held by objects like `GPUBuffer` and `GPUTexture`, serialized commands held in CPU memory by `GPURenderBundles`, or complex object graphs for the WGSL AST in `GPUShaderModule`.
The JavaScript garbage collector (GC) is in the renderer process and doesn't know about the memory usage in the GPU process.
Browsers have many heuristics to trigger GCs but a common one is that it should be triggered on memory pressure scenarios.
However a single WebGPU object can hold on to MBs or GBs of memory without the GC knowing and never trigger the memory pressure event.
It is important for WebGPU applications to be able to directly free the memory used by some WebGPU objects without waiting for the GC.
For example applications might create temporary textures and buffers each frame and without the explicit `.destroy()` call they would quickly run out of GPU memory.
That's why WebGPU has a `.destroy()` method on those object types which can hold on to arbitrary amounts of memory.
It signals that the application doesn't need the content of the object anymore and that it can be freed as soon as possible.
Of course, it becomes a validation error to use the object after the call to `.destroy()`.
<div class=example>
<pre highlight="js">
const dstBuffer = device.createBuffer({
size: 4,
usage: GPUBufferUsage.COPY_DST
});
// The buffer is not destroyed (and valid), success!
device.queue.writeBuffer(dstBuffer, 0, myData);
dstBuffer.destroy();
// The buffer is now destroyed; commands that would use its content
// produce validation errors.
device.queue.writeBuffer(dstBuffer, 0, myData);
</pre>
</div>
Note that, while this looks somewhat similar to the behavior of an invalid buffer, it is distinct.
Unlike invalidity, destroyed-ness can change after creation, is not contagious, and is validated only when work is actually submitted (e.g. `queue.writeBuffer()` or `queue.submit()`), not when creating dependent objects (like command encoders, see above).
## Errors ## {#errors}
In a simple world, error handling in apps would be synchronous with JavaScript exceptions.
However, for multi-process WebGPU implementations, this is prohibitively expensive.
See [[#invalid-and-destroyed]], which also explains how the *browser* handles errors.
### Problems and Solutions ### {#errors-solutions}
Developers and applications need error handling for a number of cases:
- *Debugging*:
Getting errors synchronously during development, to break in to the debugger.
- *Fatal Errors*:
Handling device/adapter loss, either by restoring WebGPU or by fallback to non-WebGPU content.
- *Fallible Allocation*:
Making fallible GPU-memory resource allocations (detecting out-of-memory conditions).
- *Fallible Validation*:
Checking success of WebGPU calls, for applications' unit/integration testing, WebGPU
conformance testing, or detecting errors in data-driven applications (e.g. loading glTF
models that may exceed device limits).
- *App Telemetry*:
Collecting error logs in web app deployment, for bug reporting and telemetry.
The following sections go into more details on these cases and how they are solved.
#### Debugging #### {#errors-cases-debugging}
**Solution:** Dev Tools.
Implementations should provide a way to enable synchronous validation,
for example via a "break on WebGPU error" option in the developer tools.
This can be achieved with a content-process⇆gpu-process round-trip in every validated WebGPU
call, though in practice this would be very slow.
It can be optimized by running a "predictive" mirror of the validation steps in the content
process, which either ignores out-of-memory errors (which it can't predict),
or uses round-trips only for calls that can produce out-of-memory errors.
#### Fatal Errors: Adapter and Device Loss #### {#errors-cases-fatalerrors}
**Solution:** [[#device-loss]].
#### Fallible Allocation, Fallible Validation, and Telemetry #### {#errors-cases-other}
**Solution:** *Error Scopes*.
For important context, see [[#invalid-and-destroyed]]. In particular, all errors (validation and
out-of-memory) are detected asynchronously, in a remote process.
In the WebGPU spec, we refer to the thread of work for each WebGPU device as its "device timeline".
As such, applications need a way to instruct the device timeline on what to do with any errors
that occur. To solve this, WebGPU uses *Error Scopes*.
### Error Scopes ### {#errors-errorscopes}
WebGL exposes errors using a `getError` function which returns the first error since the last `getError` call.
This is simple, but has two problems.
- It is synchronous, incurring a round-trip and requiring all previously issued work to be finished.
We solve this by returning errors asynchronously.
- Its flat state model composes poorly: errors can leak to/from unrelated code, possibly in
libraries/middleware, browser extensions, etc. We solve this with a stack of error "scopes",
allowing each component to hermetically capture and handle its own errors.
In WebGPU, each device<sup>1</sup> maintains a persistent "error scope" stack state.
Initially, the device's error scope stack is empty.
`GPUDevice.pushErrorScope('validation')` or `GPUDevice.pushErrorScope('out-of-memory')`
begins an error scope and pushes it onto the stack.
This scope captures only errors of the given type, chosen according to which kind of error the application wants to detect.
It is rare to need to detect both at once; when needed, two nested error scopes can be used.
`GPUDevice.popErrorScope()` ends an error scope, popping it from the stack and returning a
`Promise<GPUError?>`, which resolves once enclosed operations have completed and reported back.
This includes exactly those fallible operations that were *issued* between the push and pop calls.
It resolves to `null` if no errors were captured, and otherwise resolves to an object describing
the first error that was captured by the scope - either a `GPUValidationError` or a
`GPUOutOfMemoryError`.
Any device-timeline error from an operation is passed to the top-most error scope on the stack at
the time it was issued.
- If an error scope captures an error, the error is not passed down the stack.
Each error scope stores only the **first** error it captures; any further errors it captures
are **silently ignored**.
- If not, the error is passed down the stack to the enclosing error scope.
- If an error reaches the bottom of the stack, it **may**<sup>2</sup> fire the `uncapturederror`
event on `GPUDevice`<sup>3</sup> (and could issue a console warning as well).
<sup>1</sup>
In the plan to add [[#multithreading]], error scope state would actually be **per-device, per-realm**.
That is, when a GPUDevice is posted to a Worker for the first time, the error scope stack for
that device+realm is always empty.
(If a GPUDevice is copied *back* to an execution context it already existed on, it shares its
error scope state with all other copies on that execution context.)
<sup>2</sup>
The implementation may choose not to fire the event for a given error, for example if the event
has fired too many times, too many times in rapid succession, or with too many errors of the same kind.
This is similar to how Dev Tools console warnings work today for WebGL.
In poorly-formed applications, this mechanism can prevent the events from having a significant
performance impact on the system.
<sup>3</sup>
More specifically, with [[#multithreading]], this event would only exist on the *originating*
`GPUDevice` (the one that came from `createDevice`, and not by receiving posted messages);
a distinct interface would be used for non-originating device objects.
```webidl
enum GPUErrorFilter {
"out-of-memory",
"validation"
};
interface GPUOutOfMemoryError {
constructor();
};
interface GPUValidationError {
constructor(DOMString message);
readonly attribute DOMString message;
};
typedef (GPUOutOfMemoryError or GPUValidationError) GPUError;
partial interface GPUDevice {
undefined pushErrorScope(GPUErrorFilter filter);
Promise<GPUError?> popErrorScope();
};
```
#### How this solves *Fallible Allocation* #### {#errors-errorscopes-allocation}
If a call that fallibly allocates GPU memory (e.g. `createBuffer` or `createTexture`) fails, the
resulting object is invalid (same as if there were a validation error), but an `'out-of-memory'`
error is generated.
An `'out-of-memory'` error scope can be used to detect it.
**Example: tryCreateBuffer**
```ts
async function tryCreateBuffer(device: GPUDevice, descriptor: GPUBufferDescriptor): Promise<GPUBuffer | null> {
device.pushErrorScope('out-of-memory');
const buffer = device.createBuffer(descriptor);
if (await device.popErrorScope() !== null) {
return null;
}
return buffer;
}
```
This interacts with buffer mapping error cases in subtle ways due to numerous possible
out-of-memory situations in implementations, but they are not explained here.
The principle used to design the interaction is that app code should need to handle as few
different edge cases as possible, so multiple kinds of situations should result in the same
behavior.
In addition, there are (will be) rules on the relative ordering of most promise resolutions,
to prevent non-portable browser behavior or flaky races between async code.
#### How this solves *Fallible Validation* #### {#errors-errorscopes-validation}
A `'validation'` error scope can be used to detect validation errors, as above.
**Example: Testing**
```ts
device.pushErrorScope('out-of-memory');
device.pushErrorScope('validation');
{
// (Do stuff that shouldn't produce errors.)
{
device.pushErrorScope('validation');
device.doOperationThatIsExpectedToError();
device.popErrorScope().then(error => { assert(error !== null); });
}
// (More stuff that shouldn't produce errors.)
}
// Detect unexpected errors.
device.popErrorScope().then(error => { assert(error === null); });
device.popErrorScope().then(error => { assert(error === null); });
```
#### How this solves *App Telemetry* #### {#errors-errorscopes-telemetry}
As mentioned above, if an error is not captured by an error scope, it **may** fire the
originating device's `uncapturederror` event.
Applications can either watch for that event, or encapsulate parts of their application with
error scopes, to detect errors for generating error reports.
`uncapturederror` is not strictly necessary to solve this, but has the benefit of providing a
single stream for uncaptured errors from all threads.
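For example, a telemetry hook could be installed as follows (a sketch; `installErrorTelemetry` and `reportErrorToTelemetry` are hypothetical application-side functions, not WebGPU API):

```js
// Sketch: forward uncaptured errors to application telemetry.
// `reportErrorToTelemetry` is a hypothetical app-side reporter.
function installErrorTelemetry(device, reportErrorToTelemetry) {
  device.addEventListener('uncapturederror', (event) => {
    // Per the IDL above, only GPUValidationError carries a message.
    const kind = event.error.message !== undefined ? 'validation' : 'out-of-memory';
    reportErrorToTelemetry({ kind, message: event.error.message ?? '' });
  });
}
```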
#### Error Messages and Debug Labels #### {#errors-errorscopes-labels}
Every WebGPU object has a read-write attribute, `label`, which can be set by the application to
provide information for debugging tools (error messages, native profilers like Xcode, etc.).
Every WebGPU object creation descriptor has a member `label` which sets the initial value of the
attribute.
Additionally, parts of command buffers can be labeled with debug markers and debug groups.
See [[#command-encoding-debug]].
For both debugging (dev tools messages) and app telemetry (`uncapturederror`),
implementations can choose to report some kind of "stack trace" in their error messages,
taking advantage of object debug labels.
For example, a debug message string could be:
```
<myQueue>.submit failed:
- commands[0] (<mainColorPass>) was invalid:
- in the debug group <environment>:
- in the debug group <tree 123>:
- in setIndexBuffer, indexBuffer (<mesh3.indices>) was invalid:
- in createBuffer, desc.usage (0x89) was invalid
```
### Alternatives Considered ### {#errors-alternatives}
- Synchronous `getError`, like WebGL. Discussed at the beginning: [[#errors-errorscopes]].
- Callback-based error scope: `device.errorScope('out-of-memory', async () => { ... })`.
Since it's necessary to allow asynchronous work inside error scopes, this formulation is
largely equivalent to the one shown above: the scope could not be popped until the promise
returned by the callback resolved.
Application architectures would be limited by the need to conform to a compatible call stack,
or they would remap the callback-based API into a push/pop-based API.
Finally, it's generally not catastrophic if error scopes become unbalanced, though the
stack could grow unboundedly resulting in an eventual crash (or device loss).
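For concreteness, here is a sketch of how such a callback-based wrapper could be layered on top of the push/pop API (the helper name `withErrorScope` is illustrative, not a proposed API):

```js
// Sketch: callback-style error scope built on pushErrorScope/popErrorScope.
// Resolves with the callback's result and the first captured error (or null).
async function withErrorScope(device, filter, callback) {
  device.pushErrorScope(filter);
  try {
    const result = await callback();
    return { result, error: await device.popErrorScope() };
  } catch (e) {
    // Keep the scope stack balanced even if the callback throws.
    await device.popErrorScope();
    throw e;
  }
}
```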
## Device Loss ## {#device-loss}
Any situation that prevents further use of a `GPUDevice` results in a *device loss*.
These can arise due to WebGPU calls or external events; for example:
`device.destroy()`, an unrecoverable out-of-memory condition, a GPU process crash, a long
operation resulting in GPU reset, a GPU reset caused by another application, a discrete GPU being
switched off to save power, or an external GPU being unplugged.
**Design principle:**
There should be as few different-looking error behaviors as possible.
This makes it easier for developers to test their app's behavior in different situations,
improves robustness of applications in the wild, and improves portability between browsers.
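A recovery flow might look like the following sketch. It assumes a `lost` promise on `GPUDevice` (as described in the linked error-handling design) resolving to an object with `reason` and `message`; `watchForDeviceLoss` and `recreateResources` are hypothetical application code:

```js
// Sketch: respond to device loss. Assumes a `lost` promise on GPUDevice
// (per the error handling design), resolving to { reason, message }.
async function watchForDeviceLoss(device, recreateResources) {
  const info = await device.lost;
  // Every object created from this device is now unusable.
  if (info.reason !== 'destroyed') {
    // The loss wasn't requested via device.destroy(); try to recover.
    await recreateResources();
  }
  return info;
}
```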
Issue: Finish this explainer (see [ErrorHandling.md](https://github.com/gpuweb/gpuweb/blob/main/design/ErrorHandling.md#fatal-errors-requestadapter-requestdevice-and-devicelost)).
## Buffer Mapping ## {#buffer-mapping}
A `GPUBuffer` represents a memory allocation usable by other GPU operations.
This memory can be accessed linearly, contrary to `GPUTexture` for which the actual memory layout of sequences of texels is unknown. Think of a `GPUBuffer` as the result of `gpu_malloc()`.
**CPU→GPU:** When using WebGPU, applications need to transfer data from JavaScript to `GPUBuffer` very often and potentially in large quantities.
This includes mesh data, drawing and computation parameters, ML model inputs, etc.
That's why an efficient way to update `GPUBuffer` data is needed. `GPUQueue.writeBuffer` is reasonably efficient but includes at least an extra copy compared to the buffer mapping used for writing buffers.
**GPU→CPU:** Applications also often need to transfer data from the GPU to JavaScript, though usually less often and in smaller quantities.
This includes screenshots, statistics from computations, simulation or ML model results, etc.
This transfer is done with buffer mapping for reading buffers.
See [[#memory-visibility]] for additional background on the various types of memory that buffer mapping interacts with.
### CPU-GPU Ownership Transfer ### {#buffer-mapping-ownership}
In native GPU APIs, when a buffer is mapped, its content becomes accessible to the CPU.
At the same time the GPU can keep using the buffer's content, which can lead to data races between the CPU and the GPU.
This means that the usage of mapped buffers is simple but leaves the synchronization to the application.
On the contrary, WebGPU prevents almost all data races in the interest of portability and consistency.
In WebGPU there is even more risk of non-portability with races on mapped buffers because of the additional "shared memory" step that may be necessary on some drivers.
That's why `GPUBuffer` mapping is done as an ownership transfer between the CPU and the GPU.
At each instant, only one of the two can access it, so no race is possible.
When an application requests to map a buffer, it initiates a transfer of the buffer's ownership to the CPU.
At this time, the GPU may still need to finish executing some operations that use the buffer, so the transfer doesn't complete until all previously-enqueued GPU operations are finished.
That's why mapping a buffer is an asynchronous operation (we'll discuss the other arguments below):
<xmp highlight=idl>
typedef [EnforceRange] unsigned long GPUMapModeFlags;
namespace GPUMapMode {
const GPUFlagsConstant READ = 0x0001;
const GPUFlagsConstant WRITE = 0x0002;
};
partial interface GPUBuffer {
Promise<undefined> mapAsync(GPUMapModeFlags mode,
optional GPUSize64 offset = 0,
optional GPUSize64 size);
};
</xmp>
<div class=example>
Using it is done like so:
<pre highlight="js">
// Mapping a buffer for writing. Here offset and size are defaulted,
// so the whole buffer is mapped.
const myMapWriteBuffer = ...;
await myMapWriteBuffer.mapAsync(GPUMapMode.WRITE);
// Mapping a buffer for reading. Only the first four bytes are mapped.
const myMapReadBuffer = ...;
await myMapReadBuffer.mapAsync(GPUMapMode.READ, 0, 4);
</pre>
</div>
Once the application has finished using the buffer on the CPU, it can transfer ownership back to the GPU by unmapping it.
This is an immediate operation that makes the application lose all access to the buffer on the CPU (i.e. detaches `ArrayBuffers`):
<xmp highlight=idl>
partial interface GPUBuffer {
undefined unmap();
};
</xmp>
<div class=example>
Using it is done like so:
<pre highlight="js">
const myMapReadBuffer = ...;
await myMapReadBuffer.mapAsync(GPUMapMode.READ, 0, 4);
// Do something with the mapped buffer.
myMapReadBuffer.unmap();
</pre>
</div>
When transferring ownership to the CPU, a copy may be necessary from the underlying mapped buffer to shared memory visible to the content process.
To avoid copying more than necessary, the application can specify which range it is interested in when calling `GPUBuffer.mapAsync`.
`GPUBuffer.mapAsync`'s `mode` argument controls which type of mapping operation is performed.
At the moment its values are redundant with the buffer creation's usage flags, but it is present for explicitness and future extensibility.
While a `GPUBuffer` is owned by the CPU, it is not possible to submit any operations on the device timeline that use it; otherwise, a validation error is produced.
However, it is valid (and encouraged!) to record `GPUCommandBuffer`s using the `GPUBuffer`.
### Creation of Mappable Buffers ### {#buffer-mapping-creation}
The physical memory location for a `GPUBuffer`'s underlying buffer depends on whether it should be mappable and whether it is mappable for reading or writing (native APIs give some control on the CPU cache behavior for example).
At the moment, mappable buffers can only be used to transfer data, so they can only have the corresponding `COPY_SRC` or `COPY_DST` usage in addition to a `MAP_*` usage.
That's why applications must specify that buffers are mappable when they are created, using the (currently) mutually exclusive `GPUBufferUsage.MAP_READ` and `GPUBufferUsage.MAP_WRITE` flags:
<div class=example>
<pre highlight="js">
const myMapReadBuffer = device.createBuffer({
usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
size: 1000,
});
const myMapWriteBuffer = device.createBuffer({
usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
size: 1000,
});
</pre>
</div>
### Accessing Mapped Buffers ### {#buffer-mapping-access}
Once a `GPUBuffer` is mapped, it is possible to access its memory from JavaScript.
This is done by calling `GPUBuffer.getMappedRange`, which returns an `ArrayBuffer` called a "mapping".
Mappings are available until `GPUBuffer.unmap` or `GPUBuffer.destroy` is called, at which point they are detached.
These `ArrayBuffer`s typically aren't new allocations, but instead pointers to some kind of shared memory visible to the content process (IPC shared memory, `mmap`ped file descriptor, etc.)
When transferring ownership to the GPU, a copy may be necessary from the shared memory to the underlying mapped buffer.
`GPUBuffer.getMappedRange` takes an optional range of the buffer to map (for which `offset` 0 is the start of the buffer).
This way the browser knows which parts of the underlying `GPUBuffer` have been "invalidated" and need to be updated from the memory mapping.
The range must be within the range requested in `mapAsync()`.
<xmp highlight=idl>
partial interface GPUBuffer {
ArrayBuffer getMappedRange(optional GPUSize64 offset = 0,
optional GPUSize64 size);
};
</xmp>
<div class=example>
Using it is done like so:
<pre highlight="js">
const myMapReadBuffer = ...;
await myMapReadBuffer.mapAsync(GPUMapMode.READ);
const data = myMapReadBuffer.getMappedRange();
// Do something with the data
myMapReadBuffer.unmap();
</pre>
</div>
### Mapping Buffers at Creation ### {#buffer-mapping-at-creation}
A common need is to create a `GPUBuffer` that is already filled with some data.
This could be achieved by creating a final buffer, then a mappable buffer, filling the mappable buffer, and then copying from the mappable to the final buffer, but this would be inefficient.
Instead this can be done by making the buffer CPU-owned at creation: we call this "mapped at creation".
All buffers can be mapped at creation, even if they don't have the `MAP_WRITE` buffer usage.
The browser will just handle the transfer of data into the buffer for the application.
Once a buffer is mapped at creation, it behaves as a regularly mapped buffer: `GPUBuffer.getMappedRange()` is used to retrieve `ArrayBuffer`s, and ownership is transferred to the GPU with `GPUBuffer.unmap()`.
<div class=example>
Mapping at creation is done by passing `mappedAtCreation: true` in the buffer descriptor on creation:
<pre highlight="js">
const buffer = device.createBuffer({
usage: GPUBufferUsage.UNIFORM,
size: 256,
mappedAtCreation: true,
});
const data = buffer.getMappedRange();
// write to data
buffer.unmap();
</pre>
</div>
When using advanced methods to transfer data to the GPU (with a rolling list of buffers that are mapped or being mapped), mapping buffers at creation can be used to immediately create additional space for data to be transferred.
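As an illustration of that pattern, here is a sketch of a staging-buffer pool (the class and its method names are hypothetical application code, not WebGPU API):

```js
// Sketch: a rolling pool of staging buffers. Recycled buffers are
// re-mapped asynchronously; if none is ready, a new buffer is created
// already mapped via mappedAtCreation.
class StagingBufferPool {
  constructor(device, size) {
    this.device = device;
    this.size = size;
    this.ready = [];  // buffers that are mapped and ready for writing
  }
  // Returns a buffer that is currently CPU-owned (mapped).
  acquire() {
    if (this.ready.length > 0) return this.ready.pop();
    return this.device.createBuffer({
      usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
      size: this.size,
      mappedAtCreation: true,
    });
  }
  // Call after the copies using `buffer` have been submitted.
  recycle(buffer) {
    buffer.mapAsync(GPUMapMode.WRITE).then(() => {
      this.ready.push(buffer);
    });
  }
}
```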
### Examples ### {#buffer-mapping-examples}
<div class=example>
The optimal way to create a buffer with initial data, for example here a [Draco](https://google.github.io/draco/)-compressed 3D mesh:
<pre highlight="js">
const dracoDecoder = ...;
const buffer = device.createBuffer({
usage: GPUBufferUsage.VERTEX | GPUBufferUsage.INDEX,
size: dracoDecoder.decompressedSize,
mappedAtCreation: true,
});
dracoDecoder.decodeIn(buffer.getMappedRange());
buffer.unmap();
</pre>
</div>
<div class=example>
Retrieving data from a texture rendered on the GPU:
<pre highlight="js">
const texture = getTheRenderedTexture();
const readbackBuffer = device.createBuffer({
usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
size: 4 * textureWidth * textureHeight,
});
// Copy data from the texture to the buffer.
const encoder = device.createCommandEncoder();
encoder.copyTextureToBuffer(
{ texture },
{ buffer: readbackBuffer, bytesPerRow: textureWidth * 4 },
[textureWidth, textureHeight],
);
device.queue.submit([encoder.finish()]);
// Get the data on the CPU.
await readbackBuffer.mapAsync(GPUMapMode.READ);
saveScreenshot(readbackBuffer.getMappedRange());
readbackBuffer.unmap();
</pre>
</div>
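One subtlety in the readback example above: WebGPU requires `bytesPerRow` in texture-buffer copies to be a multiple of 256 bytes, so widths whose row size isn't already aligned need padding. A small helper can compute the padded value (the helper name is illustrative):

```js
// Helper: bytesPerRow for texture<->buffer copies must be a multiple of
// 256 bytes in WebGPU, so rows may need padding up to that alignment.
function alignedBytesPerRow(width, bytesPerPixel) {
  const unpadded = width * bytesPerPixel;
  const align = 256;
  return Math.ceil(unpadded / align) * align;
}
```

When rows are padded, the readback buffer must be sized as `alignedBytesPerRow(...) * textureHeight`, and consumers of the mapped data must skip the padding at the end of each row.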
<div class=example>
Updating a bunch of data on the GPU for a frame:
<pre highlight="js">
function frame() {
// Create a new buffer for our updates. In practice we would
// reuse buffers from frame to frame by re-mapping them.
const stagingBuffer = device.createBuffer({
usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
size: 16 * objectCount,
mappedAtCreation: true,
});
const stagingData = new Float32Array(stagingBuffer.getMappedRange());
// For each draw we are going to:
// - Put the data for the draw in stagingData.
// - Record a copy from the stagingData to the uniform buffer for the draw
// - Encode the draw
const copyEncoder = device.createCommandEncoder();
const drawEncoder = device.createCommandEncoder();
const renderPass = myCreateRenderPass(drawEncoder);
for (let i = 0; i < objectCount; i++) {
stagingData[i * 4 + 0] = ...;
stagingData[i * 4 + 1] = ...;
stagingData[i * 4 + 2] = ...;
stagingData[i * 4 + 3] = ...;
const {uniformBuffer, uniformOffset} = getUniformsForDraw(i);
copyEncoder.copyBufferToBuffer(
stagingBuffer, i * 16,
uniformBuffer, uniformOffset,
16);
encodeDraw(renderPass, {uniformBuffer, uniformOffset});
}
renderPass.end();
// We are finished filling the staging buffer, unmap() it so
// we can submit commands that use it.
stagingBuffer.unmap();
// Submit all the copies and then all the draws. The copies
// will happen before the draw such that each draw will use
// the data that was filled inside the for-loop above.
device.queue.submit([
copyEncoder.finish(),
drawEncoder.finish()
]);
}
</pre>
</div>
## Multithreading ## {#multithreading}
Multithreading is a key part of modern graphics APIs.
Unlike OpenGL, newer APIs allow applications to encode commands, submit work, transfer data to the GPU, and
so on, from multiple threads at once, alleviating CPU bottlenecks.
This is especially relevant to WebGPU, since IDL bindings are generally much slower than C calls.
WebGPU does not *yet* allow multithreaded use of a single `GPUDevice`, but the API has been
designed from the ground up with this in mind.
This section describes the tentative plan for how it will work.
As described in [[#gpu-process]], most WebGPU objects are actually just "handles" that refer to
objects in the browser's GPU process.
As such, it is relatively straightforward to allow these to be shared among threads.
For example, a `GPUTexture` object can simply be `postMessage()`d to another thread, creating a
new `GPUTexture` JavaScript object containing a handle to the *same* (ref-counted) GPU-process object.
Several objects, like `GPUBuffer`, have client-side state.
Applications still need to use them from multiple threads without having to `postMessage` such
objects back and forth with `[Transferable]` semantics (which would also create new wrapper
objects, breaking old references).
Therefore, these objects will also be `[Serializable]` but have a small amount of (content-side)
**shared state**, just like `SharedArrayBuffer`.
Though access to this shared state is somewhat limited - it can't be changed arbitrarily quickly
on a single object - it might still be a timing attack vector, like `SharedArrayBuffer`,
so it is tentatively gated on cross-origin isolation.
See [Timing attacks](https://gpuweb.github.io/gpuweb/#security-timing).
<div class=example>
Given threads "Main" and "Worker":
- Main: `const B1 = device.createBuffer(...);`.
- Main: uses postMessage to send `B1` to Worker.
- Worker: receives message → `B2`.
- Worker: `const mapPromise = B2.mapAsync()` → successfully puts the buffer in the "map pending" state.
- Main: `B1.mapAsync()` → **throws an exception** (and doesn't change the state of the buffer).
- Main: encodes some command that uses `B1`, like:
```js
encoder.copyBufferToTexture(B1, T);
const commandBuffer = encoder.finish();
```
→ succeeds, because this doesn't depend on the buffer's client-side state.
- Main: `queue.submit(commandBuffer)` → **asynchronous WebGPU error**,
because the CPU currently owns the buffer.
- Worker: `await mapPromise`, writes to the mapping, then calls `B2.unmap()`.
- Main: `queue.submit(commandBuffer)` → succeeds.
- Main: `B1.mapAsync()` → successfully puts the buffer in the "map pending" state.
</div>
Further discussion can be found in [#354](https://github.com/gpuweb/gpuweb/issues/354)
(note that not all of it reflects current thinking).
### Unsolved: Synchronous Object Transfer ### {#multithreading-transfer}
Some application architectures require objects to be passed between threads without having to
asynchronously wait for a message to arrive on the receiving thread.
The most crucial class of such architectures are in WebAssembly applications:
Programs using native C/C++/Rust/etc. bindings for WebGPU will want to assume object handles
are plain-old-data (e.g. `typedef struct WGPUBufferImpl* WGPUBuffer;`)
that can be passed between threads freely.
Unfortunately, this cannot be implemented in C-on-JS bindings (e.g. Emscripten) without complex,
hidden, and slow asynchronicity (yielding on the receiving thread, interrupting the sending
thread to send a message, then waiting for the object on the receiving thread).
Some alternatives are mentioned in issue [#747](https://github.com/gpuweb/gpuweb/issues/747):
- `SharedObjectTable`, an object with shared-state (like `SharedArrayBuffer`) containing a table of
`[Serializable]` values. Effectively, a store into the table would serialize once, and then any
thread with the `SharedObjectTable` could (synchronously) deserialize the object on demand.
- A synchronous `MessagePort.receiveMessage()` method.
This would be less ideal as it would require any thread that creates one of these objects to
eagerly send it to every thread, just in case they need it later.
- Allow "exporting" a numerical ID for an object that can be used to "import" the object on
another thread. This bypasses the garbage collector and makes it easy to leak memory.
## Command Encoding and Submission ## {#command-encoding}
Many operations in WebGPU are purely GPU-side operations that don't use data from the CPU.
These operations are not issued directly; instead, they are encoded into `GPUCommandBuffer`s
via the builder-like `GPUCommandEncoder` interface, then later sent to the GPU with
`GPUQueue.submit()`.
This design is used by the underlying native APIs as well. It provides several benefits:
- Command buffer encoding is independent of other state, allowing encoding (and command buffer
validation) work to utilize multiple CPU threads.
- It provides a larger chunk of work at once, allowing the GPU driver to do more global
optimization, especially in how it schedules work across the GPU hardware.
### Debug Markers and Debug Groups ### {#command-encoding-debug}
For error messages and debugging tools, it is possible to label work inside a command buffer.
(See [[#errors-errorscopes-labels]].)
- `insertDebugMarker(markerLabel)` marks a point in a stream of commands.
- `pushDebugGroup(groupLabel)`/`popDebugGroup()` nestably demarcate sub-streams of commands.
This can be used e.g. to label which part of a command buffer corresponds to different objects
or parts of a scene.
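For instance, encoding could be wrapped like this (a sketch; `encodeTreeDraws` and the `encodeOneTree` callback are hypothetical application code):

```js
// Sketch: nest debug groups around per-object encoding so error messages
// and debugging tools can attribute commands. `encodeOneTree` is an
// application callback that encodes the commands for one object.
function encodeTreeDraws(encoder, trees, encodeOneTree) {
  encoder.pushDebugGroup('environment');
  for (const tree of trees) {
    encoder.pushDebugGroup(`tree ${tree.id}`);
    encodeOneTree(encoder, tree);
    encoder.popDebugGroup();
  }
  encoder.insertDebugMarker('environment encoded');
  encoder.popDebugGroup();
}
```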
### Passes ### {#command-encoding-passes}
Issue: Briefly explain passes?
## Pipelines ## {#pipelines}
## Image, Video, and Canvas input ## {#image-input}
Issue: Exact API still in flux as of this writing.
WebGPU is largely isolated from the rest of the Web platform, but has several interop points.
One of these is image data input into the API.
Aside from the general data read/write mechanisms (`writeTexture`, `writeBuffer`, and `mapAsync`),
data can also come from `<img>`/`ImageBitmap`, canvases, and videos.
There are many use-cases that require these, including: