<pre class='metadata'>
Title: WebGPU Explainer
Shortname: webgpu-explainer
Level: None
Status: w3c/CG-DRAFT
Group: webgpu
URL: https://gpuweb.github.io/gpuweb/explainer/
!Participate: <a href="https://github.com/gpuweb/gpuweb/issues/new?labels=explainer">File an issue</a> (<a href="https://github.com/gpuweb/gpuweb/labels/explainer">open issues</a>)
Editor: Kai Ninomiya, Google https://www.google.com, kainino@google.com, w3cid 99487
Editor: Corentin Wallez, Google https://www.google.com, cwallez@google.com
Editor: Dzmitry Malyshau, Mozilla https://www.mozilla.org, dmalyshau@mozilla.com, w3cid 96977
No Abstract: true
Markup Shorthands: markdown yes
Markup Shorthands: dfn yes
Markup Shorthands: idl yes
Markup Shorthands: css no
Assume Explicit For: yes
Boilerplate: repository-issue-tracking no
</pre>
<style>
/* Our SVGs aren't responsive to light/dark mode, so they're opaque with a
* white or black background. Rounded corners make them a bit less jarring. */
object[type="image/svg+xml"] {
border-radius: .5em;
}
</style>
Issue(tabatkins/bikeshed#2006): Set up cross-linking into the WebGPU and WGSL specs.
Issue(gpuweb/gpuweb#1321): Complete the planned sections.
# Introduction # {#introduction}
WebGPU is a proposed Web API to enable webpages to use the system's [GPU (Graphics Processing Unit)](https://en.wikipedia.org/wiki/Graphics_processing_unit) to perform computations and draw complex images that can be presented inside the page.
This goal is similar to the [WebGL](https://www.khronos.org/webgl/) family of APIs, but WebGPU enables access to more advanced features of GPUs.
Whereas WebGL is mostly for drawing images but can be repurposed (with great effort) to do other kinds of computations, WebGPU has first-class support for performing general computations on the GPU.
## Use cases ## {#use-cases}
Example use cases for WebGPU that aren't addressed by WebGL 2 are:
- Drawing images with highly-detailed scenes with many different objects (such as CAD models). WebGPU's drawing commands are individually cheaper than WebGL's.
- Executing advanced algorithms for drawing realistic scenes.
Many modern rendering techniques and optimizations cannot execute on WebGL 2 due to the lack of support for general computations.
- Executing machine learning models efficiently on the GPU.
It is possible to do general-purpose GPU (GPGPU) computation in WebGL, but it is sub-optimal and much more difficult.
Concrete examples are:
- Improving existing Javascript 3D libraries like Babylon.js and Three.js with new rendering techniques (compute-based particles, fancier post-processing, ...) and offloading to the GPU expensive computations currently done on the CPU (culling, skinned model transformation, ...).
- Porting newer game engines to the Web, and enabling engines to expose more advanced rendering features.
For example, Unity's WebGL export uses the lowest feature set of the engine, but WebGPU could use a higher feature set.
- Porting new classes of applications to the Web: many productivity applications offload computations to the GPU and need WebGPU's support for general computations.
- Improving existing Web teleconferencing applications. For example, Google Meet uses machine learning to separate the user from the background.
Running the machine learning in WebGPU would make it faster and more power-efficient, allowing (1) these capabilities to reach cheaper, more accessible user devices and (2) more complex and robust models.
## Goals ## {#goals}
Goals:
- Enable rendering of modern graphics both onscreen and offscreen.
- Enable general purpose computations to be executed efficiently on the GPU.
- Support implementations targeting various native GPU APIs: Microsoft's D3D12, Apple's Metal, and Khronos' Vulkan.
- Provide a human-authorable language to specify computations to run on the GPU.
- Be implementable in the multi-process architecture of browsers and uphold the security of the Web.
- As much as possible, have applications work portably across different user systems and browsers.
- Interact with the rest of the Web platform in useful but carefully-scoped ways (essentially sharing images one way or another).
- Provide a foundation to expose modern GPU functionality on the Web.
WebGPU is structured similarly to all current native GPU APIs, even if it doesn't provide all their features.
There are plans to later extend it to have more modern functionality.
See also: [[#why-not-webgl3]].
Non-goals:
- Expose support for hardware that's not programmable at all, or much less flexible, like DSPs or specialized machine learning hardware.
- Expose support for hardware that can't do general-purpose computations (like older mobile phone GPUs or even older desktop GPUs).
- Exhaustively expose all functionality available on native GPU APIs (some functionality is only available on GPUs from a single vendor, or is too niche to be added to WebGPU).
- Allow extensive mixing and matching of WebGL and WebGPU code.
- Tightly integrate with the page rendering flow like [CSS Houdini](https://developer.mozilla.org/en-US/docs/Web/Houdini).
## Why not "WebGL 3"? ## {#why-not-webgl3}
WebGL 1.0 and WebGL 2.0 are Javascript projections of the OpenGL ES 2.0 and OpenGL ES 3.0 APIs, respectively. WebGL's design traces its roots back to the OpenGL 1.0 API released in 1992 (which further traces its roots back to IRIS GL from the 1980s). This lineage has many advantages, including the vast available body of knowledge and the relative ease of porting applications from OpenGL ES to WebGL.
However, this also means that WebGL doesn't match the design of modern GPUs, causing CPU performance and GPU performance issues. It also makes it increasingly hard to implement WebGL on top of modern native GPU APIs. [WebGL 2.0 Compute](https://www.khronos.org/registry/webgl/specs/latest/2.0-compute/) was an attempt at adding general compute functionality to WebGL but the impedance mismatch with native APIs made the effort incredibly difficult. Contributors to WebGL 2.0 Compute decided to focus their efforts on WebGPU instead.
# Additional Background # {#background}
## Sandboxed GPU Processes in Web Browsers ## {#gpu-process}
A major design constraint for WebGPU is that it must be implementable and efficient in browsers that use a GPU-process architecture.
GPU drivers need access to additional kernel syscalls beyond those otherwise used for Web content, and many GPU drivers are prone to hangs or crashes.
To improve stability and sandboxing, browsers use a special process that contains the GPU driver and talks with the rest of the browser through asynchronous IPC.
GPU processes are (or will be) used in Chromium, Gecko, and WebKit.
GPU processes are less sandboxed than content processes, and they are typically shared between multiple origins.
Therefore, they must validate all messages, for example to prevent a compromised content process from being able to look at the GPU memory used by another content process.
Most of WebGPU's validation rules are necessary to ensure it is secure to use, so all the validation needs to happen in the GPU process.
Likewise, all GPU driver objects only live in the GPU process, including large allocations (like buffers and textures) and complex objects (like pipelines).
In the content process, WebGPU types (`GPUBuffer`, `GPUTexture`, `GPURenderPipeline`, ...) are mostly just "handles" that identify objects that live in the GPU process.
This means that the CPU and GPU memory used by WebGPU objects isn't necessarily known in the content process.
A `GPUBuffer` object can use maybe 150 bytes of CPU memory in the content process but hold a 1GB allocation of GPU memory.
See also the description of [the content and device timelines in the specification](https://gpuweb.github.io/gpuweb/#programming-model-timelines).
## Memory Visibility with GPUs and GPU Processes ## {#memory-visibility}
The two major types of GPUs are called "integrated GPUs" and "discrete GPUs".
Discrete GPUs are separate from the CPU; they usually come as PCI-e cards that you plug into the motherboard of a computer.
Integrated GPUs live on the same die as the CPU and don't have their own memory chips; instead, they use the same RAM as the CPU.
When using a discrete GPU, it's easy to see that most GPU memory allocations aren't visible to the CPU because they are inside the GPU's RAM (or VRAM for Video RAM).
For integrated GPUs, most memory allocations are in the same physical places, but not made visible to the CPU for various reasons (for example, the CPU and GPU can have separate caches for the same memory, so accesses are not cache-coherent).
Instead, for the CPU to see the content of a GPU buffer, it must be "mapped", making it available in the virtual memory space of the application (think of mapped as in `mmap()`).
GPUBuffers must be specially allocated in order to be mappable - this can make them less efficient to access from the GPU (for example if they need to be allocated in RAM instead of VRAM).
All this discussion was centered around native GPU APIs, but in browsers, the GPU driver is loaded in the *GPU process*, so native GPU buffers can be mapped only in the GPU process's virtual memory.
In general, it is not possible to map the buffer directly inside the *content process* (though some systems can do this, providing optional optimizations).
To work with this architecture an extra "staging" allocation is needed in shared memory between the GPU process and the content process.
The table below recapitulates which type of memory is visible where:
<table class="data">
<thead>
<tr>
<th>
<th> Regular `ArrayBuffer`
<th> Shared Memory
<th> Mappable GPU buffer
<th> Non-mappable GPU buffer (or texture)
</tr>
</thead>
<tr>
<td> CPU, in the content process
<td> **Visible**
<td> **Visible**
<td> Not visible
<td> Not visible
<tr>
<td> CPU, in the GPU process
<td> Not visible
<td> **Visible**
<td> **Visible**
<td> Not visible
<tr>
<td> GPU
<td> Not visible
<td> Not visible
<td> **Visible**
<td> **Visible**
</table>
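To make the staging flow concrete, here is a sketch of reading GPU-only data back to the CPU through a mappable staging buffer. The helper name and structure are hypothetical; the calls used (`createBuffer`, `copyBufferToBuffer`, `mapAsync`, `getMappedRange`) are standard WebGPU API:

```js
// Read back the contents of a (non-mappable) GPU buffer by copying it
// into a mappable staging buffer, then mapping that staging buffer.
// Hypothetical helper; assumes `srcBuffer` was created with COPY_SRC usage.
async function readbackBuffer(device, srcBuffer, size) {
  // Only buffers allocated with MAP_READ can be mapped for reading.
  const staging = device.createBuffer({
    size,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
  });
  // Copy on the GPU timeline from the GPU-only buffer into the staging buffer.
  const encoder = device.createCommandEncoder();
  encoder.copyBufferToBuffer(srcBuffer, 0, staging, 0, size);
  device.queue.submit([encoder.finish()]);
  // mapAsync resolves once the copy is done and the content is visible
  // to the content process.
  await staging.mapAsync(GPUMapMode.READ);
  const copy = staging.getMappedRange().slice(0);
  staging.unmap();
  staging.destroy();
  return copy;
}
```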
# JavaScript API # {#api}
This section goes into details on important and unusual aspects of the WebGPU JavaScript API.
Generally, each subsection can be considered its own "mini-explainer",
though some require context from previous subsections.
## Adapters and Devices ## {#adapters-and-devices}
A WebGPU "adapter" (`GPUAdapter`) is an object which identifies a particular WebGPU
implementation on the system (e.g. a hardware accelerated implementation on an integrated or
discrete GPU, or software implementation).
Two different `GPUAdapter` objects on the same page could refer to the same underlying
implementation, or to two different underlying implementations (e.g. integrated and discrete GPUs).
The set of adapters visible to the page is at the discretion of the user agent.
A WebGPU "device" (`GPUDevice`) represents a logical connection to a WebGPU adapter.
It is called a "device" because it abstracts away the underlying implementation (e.g. video card)
and encapsulates a single connection: code that owns a device can act as if it is the only user
of the adapter.
As part of this encapsulation, a device is the root owner of all WebGPU objects created from it
(textures, etc.), which can be (internally) freed whenever the device is lost or destroyed.
Multiple components on a single webpage can each have their own WebGPU device.
All WebGPU usage is done through a WebGPU device or objects created from it.
In this sense, it serves a subset of the purpose of `WebGLRenderingContext`; however, unlike
`WebGLRenderingContext`, it is not associated with a canvas object, and most commands are
issued through "child" objects.
### Adapter Selection and Device Init ### {#initialization}
To get an adapter, an application calls `navigator.gpu.requestAdapter()`, optionally passing
options which may influence what adapter is chosen, like a
`powerPreference` (`"low-power"` or `"high-performance"`) or
`forceFallbackAdapter` to force a software implementation.
`requestAdapter()` never rejects, but may resolve to null if an adapter can't be returned with
the specified options.
A returned adapter exposes `info` (`vendor`/`architecture`/etc., implementation-defined), a boolean `isFallbackAdapter` so
applications with fallback paths (like WebGL or 2D canvas) can avoid slow software implementations,
and the [[#optional-capabilities]] available on the adapter.
<pre highlight=js>
const adapter = await navigator.gpu.requestAdapter(options);
if (!adapter) return goToFallback();
</pre>
To get a device, an application calls `adapter.requestDevice()`, optionally passing a descriptor
which enables additional optional capabilities - see [[#optional-capabilities]].
`requestDevice()` will reject (only) if the request is invalid,
i.e. it exceeds the capabilities of the adapter.
If anything else goes wrong in creation of the device,
it will resolve to a `GPUDevice` which has already been lost - see [[#device-loss]].
(This reduces the number of different situations an app must handle
by avoiding an extra possible return value like `null` or another exception type.)
<pre highlight=js>
const device = await adapter.requestDevice(descriptor);
device.lost.then(recoverFromDeviceLoss);
</pre>
An adapter may become unavailable, e.g. if it is unplugged from the system, disabled to save
power, or marked "stale" (`[[current]]` becomes false).
From then on, such an adapter can no longer vend valid devices,
and always returns already-lost `GPUDevice`s.
### Optional Capabilities ### {#optional-capabilities}
Each adapter may have different optional capabilities called "features" and "limits".
These are the maximum possible capabilities that can be requested when a device is created.
The set of optional capabilities exposed on each adapter is at the discretion of the user agent.
A device is created with an exact set of capabilities, specified in the arguments to
`adapter.requestDevice()` (see above).
When any work is issued to a device, it is strictly validated against the capabilities of the
device - not the capabilities of the adapter.
This eases development of portable applications by avoiding implicit dependence on the
capabilities of the development system.
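For example, an application can keep its device request portable by asking only for optional features the adapter actually advertises. The helper below is a hypothetical sketch:

```js
// Enable optional features only when the adapter advertises them, so
// requestDevice() cannot reject for exceeding the adapter's capabilities.
// Helper name is illustrative, not part of the API.
async function requestDeviceWithOptionalFeatures(adapter, wantedFeatures) {
  // adapter.features is a set-like listing of feature names this adapter
  // can enable; requesting anything beyond it would reject the request.
  const requiredFeatures = wantedFeatures.filter(f => adapter.features.has(f));
  // Work issued to the resulting device is validated against exactly
  // these features, not everything the adapter could have offered.
  return adapter.requestDevice({ requiredFeatures });
}
```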
## Object Validity and Destroyed-ness ## {#invalid-and-destroyed}
### WebGPU's Error Monad ### {#error-monad}
A.k.a. Contagious Internal Nullability.
A.k.a. transparent [promise pipelining](http://erights.org/elib/distrib/pipeline.html).
WebGPU is a very chatty API, with some applications making tens of thousands of calls per frame to render complex scenes.
We have seen that the GPU process needs to validate commands to uphold its security properties.
To avoid the overhead of validating commands twice, in both the GPU and content processes, WebGPU is designed so JavaScript calls can be forwarded directly to the GPU process and validated there.
See the error section for more details on what's validated where and how errors are reported.
At the same time, during a single frame WebGPU objects can be created that depend on one another.
For example a `GPUCommandBuffer` can be recorded with commands that use temporary `GPUBuffer`s created in the same frame.
In this example, because of WebGPU's performance constraints, it is not possible to send the message to create the `GPUBuffer` to the GPU process and synchronously wait for its processing before continuing JavaScript execution.
Instead, in WebGPU all objects (like `GPUBuffer`) are created immediately on the content timeline and returned to JavaScript.
The validation is almost all done asynchronously on the "device timeline".
In the good case, when no errors occur, everything looks to JS as if it is synchronous.
However, when an error occurs in a call, it becomes a no-op (except for error reporting).
If the call returns an object (like `createBuffer`), the object is tagged as "invalid" on the GPU process side.
Since validation and allocation occur asynchronously, errors are reported asynchronously.
By itself, this can make for challenging debugging - see [[#errors-cases-debugging]].
All WebGPU calls validate that all their arguments are valid objects.
As a result, if a call takes an invalid WebGPU object and returns a new one, the new object is also invalid (hence the term "contagious").
<figure>
<figcaption>
Timeline diagram of messages passing between processes, demonstrating how errors are propagated without synchronization.
</figcaption>
<object type="image/svg+xml" data="img/error_monad_timeline_diagram.svg"></object>
</figure>
<div class=example>
Using the API when doing only valid calls looks like a synchronous API:
<pre highlight="js">
const srcBuffer = device.createBuffer({
size: 4,
usage: GPUBufferUsage.COPY_SRC
});
const dstBuffer = ...;
const encoder = device.createCommandEncoder();
encoder.copyBufferToBuffer(srcBuffer, 0, dstBuffer, 0, 4);
const commands = encoder.finish();
device.queue.submit([commands]);
</pre>
</div>
<div class=example>
Errors propagate contagiously when creating objects:
<pre highlight="js">
// The size of the buffer is too big, this causes an OOM and srcBuffer is invalid.
const srcBuffer = device.createBuffer({
size: BIG_NUMBER,
usage: GPUBufferUsage.COPY_SRC
});
const dstBuffer = ...;
// The encoder starts as a valid object.
const encoder = device.createCommandEncoder();
// Special case: an invalid object is used when encoding commands, so the encoder
// becomes invalid.
encoder.copyBufferToBuffer(srcBuffer, 0, dstBuffer, 0, 4);
// Since the encoder is invalid, encoder.finish() is invalid and returns
// an invalid object.
const commands = encoder.finish();
// The command references an invalid object so it becomes a no-op.
device.queue.submit([commands]);
</pre>
</div>
#### Mental Models #### {#error-monad-mental-model}
One way to interpret WebGPU's semantics is that every WebGPU object is actually a `Promise` internally and that all WebGPU methods are `async` and `await` before using each of the WebGPU objects it gets as argument.
However the execution of the async code is outsourced to the GPU process (where it is actually done synchronously).
Another way, closer to actual implementation details, is to imagine that each `GPUFoo` JS object maps to a `gpu::InternalFoo` C++/Rust object on the GPU process that contains a `bool isValid`.
Then during the validation of each command on the GPU process, the `isValid` are all checked and a new, invalid object is returned if validation fails.
On the content process side, the `GPUFoo` implementation doesn't know if the object is valid or not.
### Early Destruction of WebGPU Objects ### {#early-destroy}
Most of the memory usage of WebGPU objects is in the GPU process: it can be GPU memory held by objects like `GPUBuffer` and `GPUTexture`, serialized commands held in CPU memory by `GPURenderBundles`, or complex object graphs for the WGSL AST in `GPUShaderModule`.
The JavaScript garbage collector (GC) is in the renderer process and doesn't know about the memory usage in the GPU process.
Browsers have many heuristics to trigger GCs but a common one is that it should be triggered on memory pressure scenarios.
However a single WebGPU object can hold on to MBs or GBs of memory without the GC knowing and never trigger the memory pressure event.
It is important for WebGPU applications to be able to directly free the memory used by some WebGPU objects without waiting for the GC.
For example applications might create temporary textures and buffers each frame and without the explicit `.destroy()` call they would quickly run out of GPU memory.
That's why WebGPU has a `.destroy()` method on those object types which can hold on to arbitrary amounts of memory.
It signals that the application doesn't need the content of the object anymore and that it can be freed as soon as possible.
Of course, it becomes a validation error to use the object after the call to `.destroy()`.
<div class=example>
<pre highlight="js">
const dstBuffer = device.createBuffer({
size: 4,
usage: GPUBufferUsage.COPY_DST
});
// The buffer is not destroyed (and valid), success!
device.queue.writeBuffer(dstBuffer, 0, myData);
dstBuffer.destroy();
// The buffer is now destroyed; commands that would use its content
// produce validation errors.
device.queue.writeBuffer(dstBuffer, 0, myData);
</pre>
</div>
Note that, while this looks somewhat similar to the behavior of an invalid buffer, it is distinct.
Unlike invalidity, destroyed-ness can change after creation, is not contagious, and is validated only when work is actually submitted (e.g. `queue.writeBuffer()` or `queue.submit()`), not when creating dependent objects (like command encoders, see above).
## Errors ## {#errors}
In a simple world, error handling in apps would be synchronous with JavaScript exceptions.
However, for multi-process WebGPU implementations, this is prohibitively expensive.
See [[#invalid-and-destroyed]], which also explains how the *browser* handles errors.
### Problems and Solutions ### {#errors-solutions}
Developers and applications need error handling for a number of cases:
- *Debugging*:
Getting errors synchronously during development, to break in to the debugger.
- *Fatal Errors*:
Handling device/adapter loss, either by restoring WebGPU or by fallback to non-WebGPU content.
- *Fallible Allocation*:
Making fallible GPU-memory resource allocations (detecting out-of-memory conditions).
- *Fallible Validation*:
Checking success of WebGPU calls, for applications' unit/integration testing, WebGPU
conformance testing, or detecting errors in data-driven applications (e.g. loading glTF
models that may exceed device limits).
- *App Telemetry*:
Collecting error logs in web app deployment, for bug reporting and telemetry.
The following sections go into more details on these cases and how they are solved.
#### Debugging #### {#errors-cases-debugging}
**Solution:** Dev Tools.
Implementations should provide a way to enable synchronous validation,
for example via a "break on WebGPU error" option in the developer tools.
This can be achieved with a content-process⇆gpu-process round-trip in every validated WebGPU
call, though in practice this would be very slow.
It can be optimized by running a "predictive" mirror of the validation steps in the content
process, which either ignores out-of-memory errors (which it can't predict),
or uses round-trips only for calls that can produce out-of-memory errors.
#### Fatal Errors: Adapter and Device Loss #### {#errors-cases-fatalerrors}
**Solution:** [[#device-loss]].
#### Fallible Allocation, Fallible Validation, and Telemetry #### {#errors-cases-other}
**Solution:** *Error Scopes*.
For important context, see [[#invalid-and-destroyed]]. In particular, all errors (validation and
out-of-memory) are detected asynchronously, in a remote process.
In the WebGPU spec, we refer to the thread of work for each WebGPU device as its "device timeline".
As such, applications need a way to instruct the device timeline on what to do with any errors
that occur. To solve this, WebGPU uses *Error Scopes*.
### Error Scopes ### {#errors-errorscopes}
WebGL exposes errors using a `getError` function which returns the first error since the last `getError` call.
This is simple, but has two problems.
- It is synchronous, incurring a round-trip and requiring all previously issued work to be finished.
We solve this by returning errors asynchronously.
- Its flat state model composes poorly: errors can leak to/from unrelated code, possibly in
libraries/middleware, browser extensions, etc. We solve this with a stack of error "scopes",
allowing each component to hermetically capture and handle its own errors.
In WebGPU, each device<sup>1</sup> maintains a persistent "error scope" stack state.
Initially, the device's error scope stack is empty.
`GPUDevice.pushErrorScope('validation')` or `GPUDevice.pushErrorScope('out-of-memory')`
begins an error scope and pushes it onto the stack.
This scope captures only errors of the given type, chosen according to which kind of error the application wants to detect.
It is rare to need to detect both at once; when needed, two nested error scopes can be used.
`GPUDevice.popErrorScope()` ends an error scope, popping it from the stack and returning a
`Promise<GPUError?>`, which resolves once enclosed operations have completed and reported back.
This includes exactly those fallible operations that were *issued* between the push and pop calls.
It resolves to `null` if no errors were captured, and otherwise resolves to an object describing
the first error that was captured by the scope - either a `GPUValidationError` or a
`GPUOutOfMemoryError`.
Any device-timeline error from an operation is passed to the top-most error scope on the stack at
the time it was issued.
- If an error scope captures an error, the error is not passed down the stack.
Each error scope stores only the **first** error it captures; any further errors it captures
are **silently ignored**.
- If not, the error is passed down the stack to the enclosing error scope.
- If an error reaches the bottom of the stack, it **may**<sup>2</sup> fire the `uncapturederror`
event on `GPUDevice`<sup>3</sup> (and could issue a console warning as well).
<sup>1</sup>
In the plan to add [[#multithreading]], error scope state would actually be **per-device, per-realm**.
That is, when a GPUDevice is posted to a Worker for the first time, the error scope stack for
that device+realm is always empty.
(If a GPUDevice is copied *back* to an execution context it already existed on, it shares its
error scope state with all other copies on that execution context.)
<sup>2</sup>
The implementation may choose not to fire the event for a given error, for example if the event
has fired too many times, too many times in rapid succession, or with too many errors of the same kind.
This is similar to how Dev Tools console warnings work today for WebGL.
In poorly-formed applications, this mechanism can prevent the events from having a significant
performance impact on the system.
<sup>3</sup>
More specifically, with [[#multithreading]], this event would only exist on the *originating*
`GPUDevice` (the one that came from `createDevice`, and not by receiving posted messages);
a distinct interface would be used for non-originating device objects.
```webidl
enum GPUErrorFilter {
"out-of-memory",
"validation"
};
interface GPUOutOfMemoryError {
constructor();
};
interface GPUValidationError {
constructor(DOMString message);
readonly attribute DOMString message;
};
typedef (GPUOutOfMemoryError or GPUValidationError) GPUError;
partial interface GPUDevice {
undefined pushErrorScope(GPUErrorFilter filter);
Promise<GPUError?> popErrorScope();
};
```
#### How this solves *Fallible Allocation* #### {#errors-errorscopes-allocation}
If a call that fallibly allocates GPU memory (e.g. `createBuffer` or `createTexture`) fails, the
resulting object is invalid (same as if there were a validation error), but an `'out-of-memory'`
error is generated.
An `'out-of-memory'` error scope can be used to detect it.
**Example: tryCreateBuffer**
```ts
async function tryCreateBuffer(device: GPUDevice, descriptor: GPUBufferDescriptor): Promise<GPUBuffer | null> {
device.pushErrorScope('out-of-memory');
const buffer = device.createBuffer(descriptor);
if (await device.popErrorScope() !== null) {
return null;
}
return buffer;
}
```
This interacts with buffer mapping error cases in subtle ways due to numerous possible
out-of-memory situations in implementations, but they are not explained here.
The principle used to design the interaction is that app code should need to handle as few
different edge cases as possible, so multiple kinds of situations should result in the same
behavior.
In addition, there are (will be) rules on the relative ordering of most promise resolutions,
to prevent non-portable browser behavior or flaky races between async code.
#### How this solves *Fallible Validation* #### {#errors-errorscopes-validation}
A `'validation'` error scope can be used to detect validation errors, as above.
**Example: Testing**
```ts
device.pushErrorScope('out-of-memory');
device.pushErrorScope('validation');
{
// (Do stuff that shouldn't produce errors.)
{
device.pushErrorScope('validation');
device.doOperationThatIsExpectedToError();
device.popErrorScope().then(error => { assert(error !== null); });
}
// (More stuff that shouldn't produce errors.)
}
// Detect unexpected errors.
device.popErrorScope().then(error => { assert(error === null); });
device.popErrorScope().then(error => { assert(error === null); });
```
#### How this solves *App Telemetry* #### {#errors-errorscopes-telemetry}
As mentioned above, if an error is not captured by an error scope, it **may** fire the
originating device's `uncapturederror` event.
Applications can either watch for that event, or encapsulate parts of their application with
error scopes, to detect errors for generating error reports.
`uncapturederror` is not strictly necessary to solve this, but has the benefit of providing a
single stream for uncaptured errors from all threads.
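For example, a telemetry hook could be installed as follows (a sketch; `installErrorTelemetry` and `reportErrorToTelemetry` are hypothetical application-side functions, not WebGPU API):

```js
// Sketch: forward uncaptured errors to application telemetry.
// `reportErrorToTelemetry` is a hypothetical app-side reporter.
function installErrorTelemetry(device, reportErrorToTelemetry) {
  device.addEventListener('uncapturederror', (event) => {
    // Per the IDL above, only GPUValidationError carries a message.
    const kind = event.error.message !== undefined ? 'validation' : 'out-of-memory';
    reportErrorToTelemetry({ kind, message: event.error.message ?? '' });
  });
}
```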
#### Error Messages and Debug Labels #### {#errors-errorscopes-labels}
Every WebGPU object has a read-write attribute, `label`, which can be set by the application to
provide information for debugging tools (error messages, native profilers like Xcode, etc.).
Every WebGPU object creation descriptor has a member `label` which sets the initial value of the
attribute.
Additionally, parts of command buffers can be labeled with debug markers and debug groups.
See [[#command-encoding-debug]].
For both debugging (dev tools messages) and app telemetry (`uncapturederror`),
implementations can choose to report some kind of "stack trace" in their error messages,
taking advantage of object debug labels.
For example, a debug message string could be:
```
<myQueue>.submit failed:
- commands[0] (<mainColorPass>) was invalid:
- in the debug group <environment>:
- in the debug group <tree 123>:
- in setIndexBuffer, indexBuffer (<mesh3.indices>) was invalid:
- in createBuffer, desc.usage (0x89) was invalid
```
### Alternatives Considered ### {#errors-alternatives}
- Synchronous `getError`, like WebGL. Discussed at the beginning: [[#errors-errorscopes]].
- Callback-based error scope: `device.errorScope('out-of-memory', async () => { ... })`.
Since it's necessary to allow asynchronous work inside error scopes, this formulation is
largely equivalent to the one shown above: the scope could not be popped until the promise
returned by the callback resolved.
Application architectures would be limited by the need to conform to a compatible call stack,
or they would remap the callback-based API into a push/pop-based API.
Finally, it's generally not catastrophic if error scopes become unbalanced, though the
stack could grow unboundedly resulting in an eventual crash (or device loss).
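For concreteness, here is a sketch of how such a callback-based wrapper could be layered on top of the push/pop API (the helper name `withErrorScope` is illustrative, not a proposed API):

```js
// Sketch: callback-style error scope built on pushErrorScope/popErrorScope.
// Resolves with the callback's result and the first captured error (or null).
async function withErrorScope(device, filter, callback) {
  device.pushErrorScope(filter);
  try {
    const result = await callback();
    return { result, error: await device.popErrorScope() };
  } catch (e) {
    // Keep the scope stack balanced even if the callback throws.
    await device.popErrorScope();
    throw e;
  }
}
```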
## Device Loss ## {#device-loss}
Any situation that prevents further use of a `GPUDevice` results in a *device loss*.
These can arise due to WebGPU calls or external events; for example:
`device.destroy()`, an unrecoverable out-of-memory condition, a GPU process crash, a long
operation resulting in GPU reset, a GPU reset caused by another application, a discrete GPU being
switched off to save power, or an external GPU being unplugged.
**Design principle:**
There should be as few different-looking error behaviors as possible.
This makes it easier for developers to test their app's behavior in different situations,
improves robustness of applications in the wild, and improves portability between browsers.
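A recovery flow might look like the following sketch. It assumes a `lost` promise on `GPUDevice` (as described in the linked error-handling design) resolving to an object with `reason` and `message`; `watchForDeviceLoss` and `recreateResources` are hypothetical application code:

```js
// Sketch: respond to device loss. Assumes a `lost` promise on GPUDevice
// (per the error handling design), resolving to { reason, message }.
async function watchForDeviceLoss(device, recreateResources) {
  const info = await device.lost;
  // Every object created from this device is now unusable.
  if (info.reason !== 'destroyed') {
    // The loss wasn't requested via device.destroy(); try to recover.
    await recreateResources();
  }
  return info;
}
```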
Issue: Finish this explainer (see [ErrorHandling.md](https://github.com/gpuweb/gpuweb/blob/main/design/ErrorHandling.md#fatal-errors-requestadapter-requestdevice-and-devicelost)).
## Buffer Mapping ## {#buffer-mapping}
A `GPUBuffer` represents a memory allocation usable by other GPU operations.
This memory can be accessed linearly, contrary to `GPUTexture` for which the actual memory layout of sequences of texels is unknown. Think of a `GPUBuffer` as the result of `gpu_malloc()`.
**CPU→GPU:** When using WebGPU, applications need to transfer data from JavaScript to `GPUBuffer` very often and potentially in large quantities.
This includes mesh data, drawing and computation parameters, ML model inputs, etc.
That's why an efficient way to update `GPUBuffer` data is needed. `GPUQueue.writeBuffer` is reasonably efficient but includes at least an extra copy compared to the buffer mapping used for writing buffers.
**GPU→CPU:** Applications also often need to transfer data from the GPU to JavaScript, though usually less often and in smaller quantities.
This includes screenshots, statistics from computations, simulation or ML model results, etc.
This transfer is done with buffer mapping for reading buffers.
See [[#memory-visibility]] for additional background on the various types of memory that buffer mapping interacts with.
### CPU-GPU Ownership Transfer ### {#buffer-mapping-ownership}
In native GPU APIs, when a buffer is mapped, its content becomes accessible to the CPU.
At the same time the GPU can keep using the buffer's content, which can lead to data races between the CPU and the GPU.
This means that the usage of mapped buffers is simple but leaves the synchronization to the application.
On the contrary, WebGPU prevents almost all data races in the interest of portability and consistency.
In WebGPU there is even more risk of non-portability with races on mapped buffers because of the additional "shared memory" step that may be necessary on some drivers.
That's why `GPUBuffer` mapping is done as an ownership transfer between the CPU and the GPU.
At each instant, only one of the two can access it, so no race is possible.
When an application requests to map a buffer, it initiates a transfer of the buffer's ownership to the CPU.
At this time, the GPU may still need to finish executing some operations that use the buffer, so the transfer doesn't complete until all previously-enqueued GPU operations are finished.
That's why mapping a buffer is an asynchronous operation (we'll discuss the other arguments below):
<xmp highlight=idl>
typedef [EnforceRange] unsigned long GPUMapModeFlags;
namespace GPUMapMode {
const GPUFlagsConstant READ = 0x0001;
const GPUFlagsConstant WRITE = 0x0002;
};
partial interface GPUBuffer {
Promise<undefined> mapAsync(GPUMapModeFlags mode,
optional GPUSize64 offset = 0,
optional GPUSize64 size);
};
</xmp>
<div class=example>
Using it is done like so:
<pre highlight="js">
// Mapping a buffer for writing. Here offset and size are defaulted,
// so the whole buffer is mapped.
const myMapWriteBuffer = ...;
await myMapWriteBuffer.mapAsync(GPUMapMode.WRITE);
// Mapping a buffer for reading. Only the first four bytes are mapped.
const myMapReadBuffer = ...;
await myMapReadBuffer.mapAsync(GPUMapMode.READ, 0, 4);
</pre>
</div>
Once the application has finished using the buffer on the CPU, it can transfer ownership back to the GPU by unmapping it.
This is an immediate operation that makes the application lose all access to the buffer on the CPU (i.e. detaches `ArrayBuffers`):
<xmp highlight=idl>
partial interface GPUBuffer {
undefined unmap();
};
</xmp>
<div class=example>
Using it is done like so:
<pre highlight="js">
const myMapReadBuffer = ...;
await myMapReadBuffer.mapAsync(GPUMapMode.READ, 0, 4);
// Do something with the mapped buffer.
myMapReadBuffer.unmap();
</pre>
</div>
When transferring ownership to the CPU, a copy may be necessary from the underlying mapped buffer to shared memory visible to the content process.
To avoid copying more than necessary, the application can specify which range it is interested in when calling `GPUBuffer.mapAsync`.
`GPUBuffer.mapAsync`'s `mode` argument controls which type of mapping operation is performed.
At the moment its values are redundant with the buffer creation's usage flags, but it is present for explicitness and future extensibility.
While a `GPUBuffer` is owned by the CPU, it is not possible to submit any operations on the device timeline that use it; otherwise, a validation error is produced.
However, it is valid (and encouraged!) to record `GPUCommandBuffer`s using the `GPUBuffer`.
### Creation of Mappable Buffers ### {#buffer-mapping-creation}
The physical memory location for a `GPUBuffer`'s underlying buffer depends on whether it should be mappable and whether it is mappable for reading or writing (native APIs give some control on the CPU cache behavior for example).
At the moment, mappable buffers can only be used to transfer data, so they can only have the corresponding `COPY_SRC` or `COPY_DST` usage in addition to a `MAP_*` usage.
That's why applications must specify that buffers are mappable when they are created, using the (currently) mutually exclusive `GPUBufferUsage.MAP_READ` and `GPUBufferUsage.MAP_WRITE` flags:
<div class=example>
<pre highlight="js">
const myMapReadBuffer = device.createBuffer({
usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
size: 1000,
});
const myMapWriteBuffer = device.createBuffer({
usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
size: 1000,
});
</pre>
</div>
### Accessing Mapped Buffers ### {#buffer-mapping-access}
Once a `GPUBuffer` is mapped, it is possible to access its memory from JavaScript.
This is done by calling `GPUBuffer.getMappedRange`, which returns an `ArrayBuffer` called a "mapping".
Mappings are available until `GPUBuffer.unmap` or `GPUBuffer.destroy` is called, at which point they are detached.
These `ArrayBuffer`s typically aren't new allocations, but instead pointers to some kind of shared memory visible to the content process (IPC shared memory, `mmap`ped file descriptor, etc.)
When transferring ownership to the GPU, a copy may be necessary from the shared memory to the underlying mapped buffer.
`GPUBuffer.getMappedRange` takes an optional range of the buffer to map (for which `offset` 0 is the start of the buffer).
This way the browser knows which parts of the underlying `GPUBuffer` have been "invalidated" and need to be updated from the memory mapping.
The range must be within the range requested in `mapAsync()`.
<xmp highlight=idl>
partial interface GPUBuffer {
ArrayBuffer getMappedRange(optional GPUSize64 offset = 0,
optional GPUSize64 size);
};
</xmp>
<div class=example>
Using it is done like so:
<pre highlight="js">
const myMapReadBuffer = ...;
await myMapReadBuffer.mapAsync(GPUMapMode.READ);
const data = myMapReadBuffer.getMappedRange();
// Do something with the data
myMapReadBuffer.unmap();
</pre>
</div>
### Mapping Buffers at Creation ### {#buffer-mapping-at-creation}
A common need is to create a `GPUBuffer` that is already filled with some data.
This could be achieved by creating a final buffer, then a mappable buffer, filling the mappable buffer, and then copying from the mappable to the final buffer, but this would be inefficient.
Instead this can be done by making the buffer CPU-owned at creation: we call this "mapped at creation".
All buffers can be mapped at creation, even if they don't have the `MAP_WRITE` buffer usage.
The browser will just handle the transfer of data into the buffer for the application.
Once a buffer is mapped at creation, it behaves as a regularly mapped buffer: `GPUBuffer.getMappedRange()` is used to retrieve `ArrayBuffer`s, and ownership is transferred to the GPU with `GPUBuffer.unmap()`.
<div class=example>
Mapping at creation is done by passing `mappedAtCreation: true` in the buffer descriptor on creation:
<pre highlight="js">
const buffer = device.createBuffer({
usage: GPUBufferUsage.UNIFORM,
size: 256,
mappedAtCreation: true,
});
const data = buffer.getMappedRange();
// write to data
buffer.unmap();
</pre>
</div>
When using advanced methods to transfer data to the GPU (with a rolling list of buffers that are mapped or being mapped), mapping buffers at creation can be used to immediately create additional space for data to be transferred.
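As an illustration of that pattern, here is a sketch of a staging-buffer pool (the class and its method names are hypothetical application code, not WebGPU API):

```js
// Sketch: a rolling pool of staging buffers. Recycled buffers are
// re-mapped asynchronously; if none is ready, a new buffer is created
// already mapped via mappedAtCreation.
class StagingBufferPool {
  constructor(device, size) {
    this.device = device;
    this.size = size;
    this.ready = [];  // buffers that are mapped and ready for writing
  }
  // Returns a buffer that is currently CPU-owned (mapped).
  acquire() {
    if (this.ready.length > 0) return this.ready.pop();
    return this.device.createBuffer({
      usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
      size: this.size,
      mappedAtCreation: true,
    });
  }
  // Call after the copies using `buffer` have been submitted.
  recycle(buffer) {
    buffer.mapAsync(GPUMapMode.WRITE).then(() => {
      this.ready.push(buffer);
    });
  }
}
```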
### Examples ### {#buffer-mapping-examples}
<div class=example>
The optimal way to create a buffer with initial data, for example here a [Draco](https://google.github.io/draco/)-compressed 3D mesh:
<pre highlight="js">
const dracoDecoder = ...;
const buffer = device.createBuffer({
usage: GPUBufferUsage.VERTEX | GPUBufferUsage.INDEX,
size: dracoDecoder.decompressedSize,
mappedAtCreation: true,
});
dracoDecoder.decodeIn(buffer.getMappedRange());
buffer.unmap();
</pre>
</div>
<div class=example>
Retrieving data from a texture rendered on the GPU:
<pre highlight="js">
const texture = getTheRenderedTexture();
const readbackBuffer = device.createBuffer({
usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
size: 4 * textureWidth * textureHeight,
});
// Copy data from the texture to the buffer.
const encoder = device.createCommandEncoder();
encoder.copyTextureToBuffer(
{ texture },
{ buffer: readbackBuffer, bytesPerRow: textureWidth * 4 },
[textureWidth, textureHeight],
);
device.queue.submit([encoder.finish()]);
// Get the data on the CPU.
await readbackBuffer.mapAsync(GPUMapMode.READ);
saveScreenshot(readbackBuffer.getMappedRange());
readbackBuffer.unmap();
</pre>
</div>
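One subtlety in the readback example above: WebGPU requires `bytesPerRow` in texture-buffer copies to be a multiple of 256 bytes, so widths whose row size isn't already aligned need padding. A small helper can compute the padded value (the helper name is illustrative):

```js
// Helper: bytesPerRow for texture<->buffer copies must be a multiple of
// 256 bytes in WebGPU, so rows may need padding up to that alignment.
function alignedBytesPerRow(width, bytesPerPixel) {
  const unpadded = width * bytesPerPixel;
  const align = 256;
  return Math.ceil(unpadded / align) * align;
}
```

When rows are padded, the readback buffer must be sized as `alignedBytesPerRow(...) * textureHeight`, and consumers of the mapped data must skip the padding at the end of each row.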
<div class=example>
Updating a bunch of data on the GPU for a frame:
<pre highlight="js">
function frame() {
// Create a new buffer for our updates. In practice we would
// reuse buffers from frame to frame by re-mapping them.
const stagingBuffer = device.createBuffer({
usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
size: 16 * objectCount,
mappedAtCreation: true,
});
const stagingData = new Float32Array(stagingBuffer.getMappedRange());
// For each draw we are going to:
// - Put the data for the draw in stagingData.
// - Record a copy from the stagingData to the uniform buffer for the draw
// - Encode the draw
const copyEncoder = device.createCommandEncoder();
const drawEncoder = device.createCommandEncoder();
const renderPass = myCreateRenderPass(drawEncoder);
for (let i = 0; i < objectCount; i++) {
stagingData[i * 4 + 0] = ...;
stagingData[i * 4 + 1] = ...;
stagingData[i * 4 + 2] = ...;
stagingData[i * 4 + 3] = ...;
const {uniformBuffer, uniformOffset} = getUniformsForDraw(i);
copyEncoder.copyBufferToBuffer(
stagingBuffer, i * 16,
uniformBuffer, uniformOffset,
16);
encodeDraw(renderPass, {uniformBuffer, uniformOffset});
}
renderPass.end();
// We are finished filling the staging buffer, unmap() it so
// we can submit commands that use it.
stagingBuffer.unmap();
// Submit all the copies and then all the draws. The copies
// will happen before the draw such that each draw will use
// the data that was filled inside the for-loop above.
device.queue.submit([
copyEncoder.finish(),
drawEncoder.finish()
]);
}
</pre>
</div>
## Multithreading ## {#multithreading}
Multithreading is a key part of modern graphics APIs.
Unlike OpenGL, newer APIs allow applications to encode commands, submit work, transfer data to the GPU, and
so on, from multiple threads at once, alleviating CPU bottlenecks.
This is especially relevant to WebGPU, since IDL bindings are generally much slower than C calls.
WebGPU does not *yet* allow multithreaded use of a single `GPUDevice`, but the API has been
designed from the ground up with this in mind.
This section describes the tentative plan for how it will work.
As described in [[#gpu-process]], most WebGPU objects are actually just "handles" that refer to
objects in the browser's GPU process.
As such, it is relatively straightforward to allow these to be shared among threads.
For example, a `GPUTexture` object can simply be `postMessage()`d to another thread, creating a
new `GPUTexture` JavaScript object containing a handle to the *same* (ref-counted) GPU-process object.
Several objects, like `GPUBuffer`, have client-side state.
Applications still need to use them from multiple threads without having to `postMessage` such
objects back and forth with `[Transferable]` semantics (which would also create new wrapper
objects, breaking old references).
Therefore, these objects will also be `[Serializable]` but have a small amount of (content-side)
**shared state**, just like `SharedArrayBuffer`.
Though access to this shared state is somewhat limited - it can't be changed arbitrarily quickly
on a single object - it might still be a timing attack vector, like `SharedArrayBuffer`,
so it is tentatively gated on cross-origin isolation.
See [Timing attacks](https://gpuweb.github.io/gpuweb/#security-timing).
<div class=example>
Given threads "Main" and "Worker":
- Main: `const B1 = device.createBuffer(...);`.
- Main: uses postMessage to send `B1` to Worker.
- Worker: receives message → `B2`.
- Worker: `const mapPromise = B2.mapAsync()` → successfully puts the buffer in the "map pending" state.
- Main: `B1.mapAsync()` → **throws an exception** (and doesn't change the state of the buffer).
- Main: encodes some command that uses `B1`, like:
```js
encoder.copyBufferToTexture(B1, T);
const commandBuffer = encoder.finish();
```
→ succeeds, because this doesn't depend on the buffer's client-side state.
- Main: `queue.submit(commandBuffer)` → **asynchronous WebGPU error**,
because the CPU currently owns the buffer.
- Worker: `await mapPromise`, writes to the mapping, then calls `B2.unmap()`.
- Main: `queue.submit(commandBuffer)` → succeeds.
- Main: `B1.mapAsync()` → successfully puts the buffer in the "map pending" state.
</div>
Further discussion can be found in [#354](https://github.com/gpuweb/gpuweb/issues/354)
(note that not all of it reflects current thinking).
### Unsolved: Synchronous Object Transfer ### {#multithreading-transfer}
Some application architectures require objects to be passed between threads without having to
asynchronously wait for a message to arrive on the receiving thread.
The most crucial class of such architectures are in WebAssembly applications:
Programs using native C/C++/Rust/etc. bindings for WebGPU will want to assume object handles
are plain-old-data (e.g. `typedef struct WGPUBufferImpl* WGPUBuffer;`)
that can be passed between threads freely.
Unfortunately, this cannot be implemented in C-on-JS bindings (e.g. Emscripten) without complex,
hidden, and slow asynchronicity (yielding on the receiving thread, interrupting the sending
thread to send a message, then waiting for the object on the receiving thread).
Some alternatives are mentioned in issue [#747](https://github.com/gpuweb/gpuweb/issues/747):
- `SharedObjectTable`, an object with shared-state (like `SharedArrayBuffer`) containing a table of
`[Serializable]` values. Effectively, a store into the table would serialize once, and then any
thread with the `SharedObjectTable` could (synchronously) deserialize the object on demand.
- A synchronous `MessagePort.receiveMessage()` method.
This would be less ideal as it would require any thread that creates one of these objects to
eagerly send it to every thread, just in case they need it later.
- Allow "exporting" a numerical ID for an object that can be used to "import" the object on
another thread. This bypasses the garbage collector and makes it easy to leak memory.
## Command Encoding and Submission ## {#command-encoding}
Many operations in WebGPU are purely GPU-side operations that don't use data from the CPU.
These operations are not issued directly; instead, they are encoded into `GPUCommandBuffer`s
via the builder-like `GPUCommandEncoder` interface, then later sent to the GPU with
`GPUQueue.submit()`.
This design is used by the underlying native APIs as well. It provides several benefits:
- Command buffer encoding is independent of other state, allowing encoding (and command buffer
validation) work to utilize multiple CPU threads.
- It provides a larger chunk of work at once, allowing the GPU driver to do more global
optimization, especially in how it schedules work across the GPU hardware.
### Debug Markers and Debug Groups ### {#command-encoding-debug}
For error messages and debugging tools, it is possible to label work inside a command buffer.
(See [[#errors-errorscopes-labels]].)
- `insertDebugMarker(markerLabel)` marks a point in a stream of commands.
- `pushDebugGroup(groupLabel)`/`popDebugGroup()` nestably demarcate sub-streams of commands.
This can be used e.g. to label which part of a command buffer corresponds to different objects
or parts of a scene.
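For instance, encoding could be wrapped like this (a sketch; `encodeTreeDraws` and the `encodeOneTree` callback are hypothetical application code):

```js
// Sketch: nest debug groups around per-object encoding so error messages
// and debugging tools can attribute commands. `encodeOneTree` is an
// application callback that encodes the commands for one object.
function encodeTreeDraws(encoder, trees, encodeOneTree) {
  encoder.pushDebugGroup('environment');
  for (const tree of trees) {
    encoder.pushDebugGroup(`tree ${tree.id}`);
    encodeOneTree(encoder, tree);
    encoder.popDebugGroup();
  }
  encoder.insertDebugMarker('environment encoded');
  encoder.popDebugGroup();
}
```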
### Passes ### {#command-encoding-passes}
Issue: Briefly explain passes?
## Pipelines ## {#pipelines}
## Image, Video, and Canvas input ## {#image-input}
Issue: Exact API still in flux as of this writing.
WebGPU is largely isolated from the rest of the Web platform, but has several interop points.
One of these is image data input into the API.
Aside from the general data read/write mechanisms (`writeTexture`, `writeBuffer`, and `mapAsync`),
data can also come from `<img>`/`ImageBitmap`, canvases, and videos.
There are many use-cases that require these, including: