Skip to content

perf: avoid Rc refcount overhead in DomSlot chain traversal#4088

Merged
Madoshakalaka merged 4 commits intomasterfrom
perf/domslot-path-compression
Apr 19, 2026
Merged

perf: avoid Rc refcount overhead in DomSlot chain traversal#4088
Madoshakalaka merged 4 commits intomasterfrom
perf/domslot-path-compression

Conversation

@Madoshakalaka
Copy link
Copy Markdown
Member

This addresses a TODO left by @WorldSEnder

with_next_sibling is called on every DomSlot::insert, which backs every DOM node insertion.
For a BList of N component children processed right-to-left, the total chain hops across all
insertions is 1 + 2 + ... + N = O(N^2). Reducing the constant factor per hop directly improves
large-list creation and reconciliation.

Traverse the DynamicDomSlot chain using raw pointers instead of
Rc::clone/drop per hop. The chain is transitively kept alive by
the borrowed &self, so raw pointer access is sound.
@Madoshakalaka Madoshakalaka added performance A-yew Area: The main yew crate labels Mar 29, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 29, 2026

Visit the preview URL for this PR (updated for commit 75d1266):

https://yew-rs-api--pr4088-perf-domslot-path-co-hkn87gb5.web.app

(expires Sat, 25 Apr 2026 03:15:56 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 29, 2026

Benchmark - core

Yew Master

vnode           fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ vnode_clone  2.458 ns      │ 2.489 ns      │ 2.461 ns      │ 2.464 ns      │ 100     │ 1000000000

Pull Request

vnode           fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ vnode_clone  2.458 ns      │ 3.182 ns      │ 2.466 ns      │ 2.491 ns      │ 100     │ 1000000000

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 29, 2026

Size Comparison

Details
examples master (KB) pull request (KB) diff (KB) diff (%)
actix_ssr_router 613.142 613.130 -0.012 -0.002%
async_clock 100.050 100.034 -0.016 -0.016%
axum_ssr_router 613.141 613.129 -0.012 -0.002%
boids 163.882 163.876 -0.006 -0.004%
communication_child_to_parent 93.483 93.480 -0.003 -0.003%
communication_grandchild_with_grandparent 105.465 105.459 -0.006 -0.006%
communication_grandparent_to_grandchild 101.818 101.812 -0.006 -0.006%
communication_parent_to_child 90.896 90.893 -0.003 -0.003%
contexts 105.708 105.702 -0.006 -0.006%
counter 85.742 85.727 -0.016 -0.018%
counter_functional 87.754 87.745 -0.009 -0.010%
dyn_create_destroy_apps 89.631 89.615 -0.016 -0.017%
file_upload 99.251 99.235 -0.016 -0.016%
function_delayed_input 94.373 94.357 -0.016 -0.017%
function_memory_game 169.552 169.546 -0.006 -0.003%
function_router 399.062 399.057 -0.006 -0.001%
function_todomvc 164.199 164.193 -0.006 -0.004%
futures 234.741 234.726 -0.016 -0.007%
game_of_life 100.489 100.474 -0.016 -0.016%
immutable 258.396 258.395 -0.002 -0.001%
inner_html 80.586 80.570 -0.016 -0.019%
js_callback 109.292 109.286 -0.006 -0.005%
keyed_list 175.909 175.906 -0.003 -0.002%
mount_point 83.953 83.938 -0.016 -0.019%
nested_list 112.853 112.847 -0.006 -0.005%
node_refs 91.469 91.466 -0.003 -0.003%
password_strength 1717.808 1717.802 -0.006 -0.000%
portals 93.107 93.104 -0.003 -0.003%
router 365.807 365.801 -0.006 -0.002%
suspense 113.168 113.162 -0.006 -0.005%
timer 88.326 88.311 -0.016 -0.018%
timer_functional 98.784 98.778 -0.006 -0.006%
todomvc 141.351 141.335 -0.016 -0.011%
two_apps 85.927 85.917 -0.010 -0.011%
web_worker_fib 136.228 136.222 -0.006 -0.004%
web_worker_prime 184.618 184.612 -0.006 -0.003%
webgl 82.729 82.713 -0.016 -0.019%

✅ None of the examples has changed their size significantly.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 29, 2026

Benchmark - SSR

Yew Master

Details
Benchmark Round Min (ms) Max (ms) Mean (ms) Standard Deviation
Baseline 10 310.968 320.559 312.223 2.941
Hello World 10 463.052 465.101 463.905 0.752
Function Router 10 31846.829 32305.560 31992.127 133.754
Concurrent Task 10 1006.248 1007.624 1006.972 0.427
Many Providers 10 1010.631 1070.988 1033.436 21.395

Pull Request

Details
Benchmark Round Min (ms) Max (ms) Mean (ms) Standard Deviation
Baseline 10 310.818 317.213 312.048 1.919
Hello World 10 456.520 496.838 462.951 12.268
Function Router 10 31425.473 31893.226 31636.717 157.420
Concurrent Task 10 1005.877 1007.121 1006.496 0.509
Many Providers 10 1044.149 1111.238 1078.510 22.276

@Madoshakalaka
Copy link
Copy Markdown
Member Author

benchmarks looking good 0454fc7#commitcomment-180899917

@Madoshakalaka Madoshakalaka marked this pull request as ready for review March 29, 2026 13:14
github-actions[bot]
github-actions Bot previously approved these changes Mar 29, 2026
@Madoshakalaka
Copy link
Copy Markdown
Member Author

Madoshakalaka commented Mar 29, 2026

To clarify, it helps with the performance but doesn't actually address the original comment.

The 𝒪(𝑁²) comes from the chain structure itself: child[0] chains through 𝑁-1 hops, child[1] through 𝑁-2, etc. To eliminate this, BList would need to resolve insertion positions without chaining, for example by maintaining a side structure that maps each child index to its resolved DOM node reference, updated eagerly when positions change. That will be a larger refactor of how BList and NodeWriter interact with DomSlot.

I explored a possibility in #4090 briefly but couldn't make it work correctly

@WorldSEnder
Copy link
Copy Markdown
Member

WorldSEnder commented Mar 29, 2026

What you would need is a (self-balancing) tree structure, such as an AVL or Splay (not sure what the best here is - depends on usage I suppose). You need to be able to split and merge the tree (when you reassign a node in the middle) and find the right-most node starting from any node in the middle. Worst case could be O(log n) and most likely even close to O(1) if you splay correctly. I never did get it to work either though when I tried. Something about the intrusive nature of parent pointers in trees and the balancing operations needed and I didn't find an existing impl that fit to not have to invent it here.

EDIT: "worst case" should probably be "amortized" in the above, a true online worst case structure would be more complicated, see e.g. Scapegoat tree or Multi-Splay tree. It really does get hairy!

github-actions[bot]
github-actions Bot previously approved these changes Mar 30, 2026
Comment thread packages/yew/src/dom_bundle/position.rs Outdated
@WorldSEnder
Copy link
Copy Markdown
Member

WorldSEnder commented Mar 30, 2026

Actually, the current structure might not actually form a simple chain, but a tree. You can tell multiple DynamicDomSlots to track the same DomSlot, which can be another DynamicDomSlot etc forming a tree (although in code you go from child to parent). I'm not sure if this capability is used, or if it's simply accidental because it's not so easy to break the assignment of any other DynamicDomSlot that might be assigned to it. In this general form one would need to form a link/cut tree, which I don't fully understand.

github-actions[bot]
github-actions Bot previously approved these changes Apr 18, 2026
…ompression

# Conflicts:
#	packages/yew/src/dom_bundle/position.rs
@Madoshakalaka Madoshakalaka merged commit e163df0 into master Apr 19, 2026
41 checks passed
@Madoshakalaka Madoshakalaka deleted the perf/domslot-path-compression branch April 19, 2026 04:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-yew Area: The main yew crate performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants