-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Expand file tree
/
Copy pathParallelIterations.html
More file actions
469 lines (467 loc) · 45 KB
/
Copy pathParallelIterations.html
File metadata and controls
469 lines (467 loc) · 45 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
<!-- HTML header for doxygen 1.13.1-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=11"/>
<meta name="generator" content="Doxygen 1.13.1"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>Taskflow: A General-purpose Task-parallel Programming System: Parallel Iterations</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<script type="text/javascript" src="clipboard.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="cookie.js"></script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="custom.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr id="projectrow">
<td id="projectlogo"><img alt="Logo" src="taskflow_logo.png"/></td>
<td id="projectalign">
<div id="projectname"><a href="https://github.com/taskflow/taskflow" style="color:inherit; text-decoration:none;">Taskflow: A General-purpose Task-parallel Programming System</a>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.13.1 -->
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
var searchBox = new SearchBox("searchBox", "search/",'.html');
/* @license-end */
</script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function() { codefold.init(0); });
/* @license-end */
</script>
<script type="text/javascript" src="menudata.js"></script>
<script type="text/javascript" src="menu.js"></script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function() {
initMenu('',true,false,'search.php','Search',true);
$(function() { init_search(); });
});
/* @license-end */
</script>
<div id="main-nav"></div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function(){initNavTree('ParallelIterations.html',''); initResizable(true); });
/* @license-end */
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<div id="MSearchResults">
<div class="SRPage">
<div id="SRIndex">
<div id="SRResults"></div>
<div class="SRStatus" id="Loading">Loading...</div>
<div class="SRStatus" id="Searching">Searching...</div>
<div class="SRStatus" id="NoMatches">No Matches</div>
</div>
</div>
</div>
</div>
<div><div class="header">
<div class="headertitle"><div class="title">Parallel Iterations</div></div>
</div><!--header-->
<div class="contents">
<div class="toc"><h3>Table of Contents</h3>
<ul>
<li class="level1">
<a href="#ParallelIterationsIncludeTheHeader">Include the Header</a>
</li>
<li class="level1">
<a href="#ParallelIterationsIndexBased">Create an Index-based Parallel-Iteration Task</a>
<ul>
<li class="level2">
<a href="#ParallelForEachCaptureIndicesByReference">Capture Indices by Reference</a>
</li>
</ul>
</li>
<li class="level1">
<a href="#ParallelIterationsIndexRangeBased">Create an IndexRange-based Parallel-Iteration Task</a>
<ul>
<li class="level2">
<a href="#ParallelIterationsWhatIsIndexRange">What is an IndexRange?</a>
</li>
<li class="level2">
<a href="#ParallelIterationsIndexRange1D">Create a Parallel-Iteration Task over a 1D IndexRange</a>
</li>
<li class="level2">
<a href="#ParallelIterationsIndexRangeMD">Create a Parallel-Iteration Task over a Multi-dimensional IndexRange</a>
</li>
<li class="level2">
<a href="#ParallelIterationsIndexRangeMDZeroSize">Zero-size Dimensions</a>
</li>
<li class="level2">
<a href="#ParallelIterationsIndexRangeScheduling">Understand the Scheduling Algorithm</a>
</li>
<li class="level2">
<a href="#ParallelIterationsIndexRangeByReference">Capture Range by Reference</a>
</li>
</ul>
</li>
<li class="level1">
<a href="#ParallelIterationsIteratorBased">Create an Iterator-based Parallel-Iteration Task</a>
<ul>
<li class="level2">
<a href="#ParallelForEachCaptureIteratorsByReference">Capture Iterators by Reference</a>
</li>
</ul>
</li>
<li class="level1">
<a href="#ParallelIterationsConfigureAPartitioner">Configure a Partitioner</a>
</li>
</ul>
</div>
<div class="textblock"><p>Taskflow provides template functions for constructing tasks to perform parallel iterations over ranges of items.</p>
<h1><a class="anchor" id="ParallelIterationsIncludeTheHeader"></a>
Include the Header</h1>
<p>You need to include the header file, <code>taskflow/algorithm/for_each.hpp</code>, for using parallel-iteration algorithms.</p>
<div class="fragment"><div class="line"><span class="preprocessor">#include <taskflow/algorithm/for_each.hpp></span></div>
</div><!-- fragment --><h1><a class="anchor" id="ParallelIterationsIndexBased"></a>
Create an Index-based Parallel-Iteration Task</h1>
<p>Index-based parallel-for performs parallel iterations over a range <code>[first, last)</code> with the given <code>step</code> size. The task created by <a class="el" href="classtf_1_1FlowBuilder.html#a3b132bd902331a11b04b4ad66cf8bf77" title="constructs an index-based parallel-for task">tf::Taskflow::for_each_index(B first, E last, S step, C callable, P part)</a> represents parallel execution of the following loop:</p>
<div class="fragment"><div class="line"><span class="comment">// positive step</span></div>
<div class="line"><span class="keywordflow">for</span>(<span class="keyword">auto</span> i = first; i < last; i += step) {</div>
<div class="line"> callable(i);</div>
<div class="line">}</div>
<div class="line"> </div>
<div class="line"><span class="comment">// negative step</span></div>
<div class="line"><span class="keywordflow">for</span>(<span class="keyword">auto</span> i = first; i > last; i += step) {</div>
<div class="line"> callable(i);</div>
<div class="line">}</div>
</div><!-- fragment --><p>We support only integer-based ranges. The range can go in a positive or negative direction.</p>
<div class="fragment"><div class="line">taskflow.for_each_index(0, 100, 2, [](<span class="keywordtype">int</span> i) { }); <span class="comment">// 50 iterations with a positive step</span></div>
<div class="line">taskflow.for_each_index(100, 0, -2, [](<span class="keywordtype">int</span> i) { }); <span class="comment">// 50 iterations with a negative step</span></div>
</div><!-- fragment --><p>Notice that the direction is defined in terms of the half-open range <code>[first, last)</code>, where <code>last</code> is excluded. In the positive case, the 50 items are 0, 2, 4, 6, ..., 96, 98. In the negative case, the 50 items are 100, 98, 96, 94, ..., 4, 2. An example of the Taskflow graph for the positive case under 5 workers is depicted below:</p>
<div class="dotgraph">
<iframe scrolling="no" frameborder="0" src="dot_parallel_for_1.svg" width="867" height="155"><p><b>This browser is not able to show SVG: try Firefox, Chrome, Safari, or Opera instead.</b></p></iframe></div>
<h2><a class="anchor" id="ParallelForEachCaptureIndicesByReference"></a>
Capture Indices by Reference</h2>
<p>You can pass indices by reference using <a href="https://en.cppreference.com/w/cpp/utility/functional/ref">std::ref</a> to marshal parameter updates between dependent tasks. This is useful when the range is not known at the time the task graph is constructed but is initialized by an upstream task.</p>
<div class="fragment"><div class="line"><span class="keywordtype">int</span> first, last;</div>
<div class="line"><span class="keywordtype">int</span>* vec;</div>
<div class="line"> </div>
<div class="line"><span class="keyword">auto</span> init = taskflow.emplace([&](){</div>
<div class="line"> first = 0;</div>
<div class="line"> last = 1000;</div>
<div class="line"> vec = <span class="keyword">new</span> <span class="keywordtype">int</span>[1000];</div>
<div class="line">});</div>
<div class="line"> </div>
<div class="line"><span class="keyword">auto</span> pf = taskflow.for_each_index(</div>
<div class="line"> std::ref(first), std::ref(last), 1,</div>
<div class="line"> [&](<span class="keywordtype">int</span> i) {</div>
<div class="line"> std::cout << <span class="stringliteral">"parallel iteration on index "</span> << i << <span class="charliteral">'\n'</span>;</div>
<div class="line"> }</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line"><span class="comment">// The code below is wrong! first and last are captured by copy at construction time</span></div>
<div class="line"><span class="comment">// auto pf = taskflow.for_each_index(first, last, 1, [&](int i) { });</span></div>
<div class="line"> </div>
<div class="line">init.precede(pf);</div>
</div><!-- fragment --><p>When <code>init</code> finishes, the parallel-for task <code>pf</code> will see <code>first</code> as 0 and <code>last</code> as 1000 and performs parallel iterations over the 1000 indices.</p>
<h1><a class="anchor" id="ParallelIterationsIndexRangeBased"></a>
Create an IndexRange-based Parallel-Iteration Task</h1>
<p>While <a class="el" href="classtf_1_1FlowBuilder.html#a3b132bd902331a11b04b4ad66cf8bf77" title="constructs an index-based parallel-for task">tf::Taskflow::for_each_index</a> gives each worker a single scalar index, <a class="el" href="classtf_1_1FlowBuilder.html#a2582a216d54dacca2b7022ea7e89452a" title="constructs a parallel-for task over a one- or multi-dimensional index range">tf::Taskflow::for_each_by_index</a> gives each worker a contiguous <em>sub-range</em> (or <em>sub-box</em> in multiple dimensions) of the iteration space. This distinction is significant: instead of processing one element at a time, the callable operates on an entire block of consecutive indices in a single invocation, enabling block algorithms that exploit data locality, facilitate SIMD vectorisation, and reduce scheduling overhead.</p>
<h2><a class="anchor" id="ParallelIterationsWhatIsIndexRange"></a>
What is an IndexRange?</h2>
<p>A <a class="el" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b" title="alias for the common 1D case of tf::IndexRanges">tf::IndexRange</a> describes a typed, half-open index range <code>[begin, end)</code> with a step size. There are two variants:</p>
<ul>
<li><code><a class="el" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b" title="alias for the common 1D case of tf::IndexRanges">tf::IndexRange</a><T></code> (an alias for <code><a class="el" href="classtf_1_1IndexRanges.html" title="class to create an N-dimensional index range of integral indices">tf::IndexRanges</a><T, 1></code>): A 1D range of integral indices. It is defined by three parameters: <code>begin</code>, <code>end</code>, and <code>step_size</code>. The elements are <code>begin</code>, <code>begin+step</code>, <code>begin+2*step</code>, ... up to (but not including) <code>end</code>.</li>
</ul>
<div class="fragment"><div class="line"><span class="comment">// elements: 2, 5, 8, 11 (step = 3, size = 4)</span></div>
<div class="line"><a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a> r(2, 14, 3);</div>
<div class="line"> </div>
<div class="line">r.begin(); <span class="comment">// 2</span></div>
<div class="line">r.end(); <span class="comment">// 14</span></div>
<div class="line">r.step_size(); <span class="comment">// 3</span></div>
<div class="line">r.size(); <span class="comment">// 4</span></div>
<div class="ttc" id="anamespacetf_html_a6c928ec9248757ba8276e316ef26846b"><div class="ttname"><a href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange</a></div><div class="ttdeci">IndexRanges< T, 1 > IndexRange</div><div class="ttdoc">alias for the common 1D case of tf::IndexRanges</div><div class="ttdef"><b>Definition</b> iterator.hpp:971</div></div>
</div><!-- fragment --><div class="dotgraph">
<iframe scrolling="no" frameborder="0" src="dot_indexrange_1d.svg" width="612" height="68"><p><b>This browser is not able to show SVG: try Firefox, Chrome, Safari, or Opera instead.</b></p></iframe></div>
<ul>
<li><code><a class="el" href="classtf_1_1IndexRanges.html" title="class to create an N-dimensional index range of integral indices">tf::IndexRanges</a><T, N></code>: An N-dimensional Cartesian product of N independent 1D ranges, one per dimension. Dimension 0 is the outermost (slowest varying) and dimension N-1 is the innermost (fastest varying), matching the natural nesting of C-style for-loops (row-major order). Each dimension is stored as a <code>std::tuple<T, T, T></code> of (begin, end, step_size), accessible and mutable through <code>dim(d)</code>.</li>
</ul>
<div class="fragment"><div class="line"><span class="comment">// 2D range: i in [0,3), j in [0,4) — a 3x4 grid of 12 index pairs</span></div>
<div class="line"><a class="code hl_class" href="classtf_1_1IndexRanges.html">tf::IndexRanges<int, 2></a> r(</div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a>(0, 3, 1), <span class="comment">// dim 0: i = 0, 1, 2</span></div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a>(0, 4, 1) <span class="comment">// dim 1: j = 0, 1, 2, 3</span></div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line">r.size(); <span class="comment">// 12 (3 x 4)</span></div>
<div class="line">r.size(0); <span class="comment">// 3 (elements along dim 0)</span></div>
<div class="line">r.size(1); <span class="comment">// 4 (elements along dim 1)</span></div>
<div class="line">r.dim(0); <span class="comment">// std::tuple<int,int,int>{0, 3, 1}</span></div>
<div class="line">r.dim(1); <span class="comment">// std::tuple<int,int,int>{0, 4, 1}</span></div>
<div class="line"> </div>
<div class="line"><span class="keyword">auto</span> [b0, e0, s0] = r.dim(0); <span class="comment">// structured-binding access: b0=0, e0=3, s0=1</span></div>
<div class="ttc" id="aclasstf_1_1IndexRanges_html"><div class="ttname"><a href="classtf_1_1IndexRanges.html">tf::IndexRanges</a></div><div class="ttdoc">class to create an N-dimensional index range of integral indices</div><div class="ttdef"><b>Definition</b> iterator.hpp:188</div></div>
</div><!-- fragment --><div class="dotgraph">
<iframe scrolling="no" frameborder="0" src="dot_indexrange_2d.svg" width="576" height="474"><p><b>This browser is not able to show SVG: try Firefox, Chrome, Safari, or Opera instead.</b></p></iframe></div>
<p>The key advantage of tf::IndexRanges<T, N> over a raw scalar index is that the callable receives the full sub-range (or sub-box) at once, allowing it to implement cache-friendly block algorithms, apply SIMD over contiguous indices, or exploit any other block-level optimisation.</p>
<h2><a class="anchor" id="ParallelIterationsIndexRange1D"></a>
Create a Parallel-Iteration Task over a 1D IndexRange</h2>
<p>Passing a 1D <code><a class="el" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b" title="alias for the common 1D case of tf::IndexRanges">tf::IndexRange<T></a></code> to <a class="el" href="classtf_1_1FlowBuilder.html#a2582a216d54dacca2b7022ea7e89452a" title="constructs a parallel-for task over a one- or multi-dimensional index range">tf::Taskflow::for_each_by_index</a> creates a parallel task whose callable receives one contiguous sub-range of the original range per invocation:</p>
<div class="fragment"><div class="line"><a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a> range(0, 100, 1);</div>
<div class="line"> </div>
<div class="line">taskflow.for_each_by_index(range, [](<a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a> sub) {</div>
<div class="line"> <span class="comment">// sub is a contiguous slice of [0, 100)</span></div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> i = sub.<a class="code hl_function" href="classtf_1_1IndexRanges.html#ae37261f0d2f326449c469233561a3da6">begin</a>(); i < sub.<a class="code hl_function" href="classtf_1_1IndexRanges.html#a253e15e199f974ee26b2e33a5e2b3cf1">end</a>(); i += sub.<a class="code hl_function" href="classtf_1_1IndexRanges.html#afcb30e2b9567ad685e702201d2265880">step_size</a>()) {</div>
<div class="line"> process(i);</div>
<div class="line"> }</div>
<div class="line">});</div>
<div class="ttc" id="aclasstf_1_1IndexRanges_html_a253e15e199f974ee26b2e33a5e2b3cf1"><div class="ttname"><a href="classtf_1_1IndexRanges.html#a253e15e199f974ee26b2e33a5e2b3cf1">tf::IndexRanges::end</a></div><div class="ttdeci">T end() const</div><div class="ttdoc">queries the ending index of the range (only available when N == 1)</div><div class="ttdef"><b>Definition</b> iterator.hpp:358</div></div>
<div class="ttc" id="aclasstf_1_1IndexRanges_html_ae37261f0d2f326449c469233561a3da6"><div class="ttname"><a href="classtf_1_1IndexRanges.html#ae37261f0d2f326449c469233561a3da6">tf::IndexRanges::begin</a></div><div class="ttdeci">T begin() const</div><div class="ttdoc">queries the starting index of the range (only available when N == 1)</div><div class="ttdef"><b>Definition</b> iterator.hpp:346</div></div>
<div class="ttc" id="aclasstf_1_1IndexRanges_html_afcb30e2b9567ad685e702201d2265880"><div class="ttname"><a href="classtf_1_1IndexRanges.html#afcb30e2b9567ad685e702201d2265880">tf::IndexRanges::step_size</a></div><div class="ttdeci">T step_size() const</div><div class="ttdoc">queries the step size of the range (only available when N == 1)</div><div class="ttdef"><b>Definition</b> iterator.hpp:370</div></div>
</div><!-- fragment --><p>Because the callable sees the full sub-range at once, it can implement cache-friendly block algorithms that would be impossible when receiving a single index at a time. For instance, a worker assigned <code>[32, 64)</code> can prefetch that entire cache line before processing, rather than incurring per-element dispatch overhead.</p>
<h2><a class="anchor" id="ParallelIterationsIndexRangeMD"></a>
Create a Parallel-Iteration Task over a Multi-dimensional IndexRange</h2>
<p>Passing an N-dimensional <code>tf::IndexRanges<T, N></code> to <a class="el" href="classtf_1_1FlowBuilder.html#a2582a216d54dacca2b7022ea7e89452a" title="constructs a parallel-for task over a one- or multi-dimensional index range">tf::Taskflow::for_each_by_index</a> creates a parallel task whose callable receives one orthogonal <em>sub-box</em> of the full ND space per invocation. A sub-box is a valid hyperrectangle — a contiguous, axis-aligned tile that preserves the original step sizes of every dimension. The figure below shows a 2x3x4 range decomposed into two sub-boxes (one per outer-dimension value), each a 3x4 slice:</p>
<div class="image">
<object type="image/svg+xml" data="indexrange_3d.svg" width="90%" style="pointer-events: none;"></object>
</div>
<p>For a 2D range, the parallel task represents:</p>
<div class="fragment"><div class="line"><a class="code hl_class" href="classtf_1_1IndexRanges.html">tf::IndexRanges<int, 2></a> range(</div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a>(0, rows, 1), <span class="comment">// dimension 0: rows</span></div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a>(0, cols, 1) <span class="comment">// dimension 1: columns</span></div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line">taskflow.for_each_by_index(range, [](<span class="keyword">const</span> <a class="code hl_class" href="classtf_1_1IndexRanges.html">tf::IndexRanges<int, 2></a>& tile) {</div>
<div class="line"> <span class="comment">// tile is an orthogonal sub-box of the full 2D range</span></div>
<div class="line"> <span class="keyword">auto</span> [i0, i1, is] = tile.<a class="code hl_function" href="classtf_1_1IndexRanges.html#a4e0162b872edd6176e9d6a308b295427">dim</a>(0);</div>
<div class="line"> <span class="keyword">auto</span> [j0, j1, js] = tile.<a class="code hl_function" href="classtf_1_1IndexRanges.html#a4e0162b872edd6176e9d6a308b295427">dim</a>(1);</div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> i = i0; i < i1; i += is) {</div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> j = j0; j < j1; j += js) {</div>
<div class="line"> process(i, j);</div>
<div class="line"> }</div>
<div class="line"> }</div>
<div class="line">});</div>
<div class="ttc" id="aclasstf_1_1IndexRanges_html_a4e0162b872edd6176e9d6a308b295427"><div class="ttname"><a href="classtf_1_1IndexRanges.html#a4e0162b872edd6176e9d6a308b295427">tf::IndexRanges::dim</a></div><div class="ttdeci">const std::tuple< T, T, T > & dim(size_t d) const</div><div class="ttdoc">returns the (begin, end, step) tuple for dimension d (read-only)</div><div class="ttdef"><b>Definition</b> iterator.hpp:297</div></div>
</div><!-- fragment --><p>Each sub-box preserves the original step sizes for every dimension, so the inner loops are identical to what you would write for sequential code. The same pattern extends naturally to three or more dimensions:</p>
<div class="fragment"><div class="line"><a class="code hl_class" href="classtf_1_1IndexRanges.html">tf::IndexRanges<int, 3></a> range(</div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a>(0, depth, 1),</div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a>(0, height, 1),</div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a>(0, width, 1)</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line">taskflow.for_each_by_index(range, [](<span class="keyword">const</span> <a class="code hl_class" href="classtf_1_1IndexRanges.html">tf::IndexRanges<int, 3></a>& sub) {</div>
<div class="line"> <span class="keyword">auto</span> [d0, d1, ds] = sub.<a class="code hl_function" href="classtf_1_1IndexRanges.html#a4e0162b872edd6176e9d6a308b295427">dim</a>(0);</div>
<div class="line"> <span class="keyword">auto</span> [h0, h1, hs] = sub.<a class="code hl_function" href="classtf_1_1IndexRanges.html#a4e0162b872edd6176e9d6a308b295427">dim</a>(1);</div>
<div class="line"> <span class="keyword">auto</span> [w0, w1, ws] = sub.<a class="code hl_function" href="classtf_1_1IndexRanges.html#a4e0162b872edd6176e9d6a308b295427">dim</a>(2);</div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> i = d0; i < d1; i += ds) {</div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> j = h0; j < h1; j += hs) {</div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> k = w0; k < w1; k += ws) {</div>
<div class="line"> process(i, j, k);</div>
<div class="line"> }</div>
<div class="line"> }</div>
<div class="line"> }</div>
<div class="line">});</div>
</div><!-- fragment --><h2><a class="anchor" id="ParallelIterationsIndexRangeMDZeroSize"></a>
Zero-size Dimensions</h2>
<p>When a dimension has zero size, it and all dimensions inner to it produce no iterations — but outer dimensions are unaffected. This matches the behaviour of sequential nested loops: a zero-size inner loop simply never executes, while the outer loops still run.</p>
<div class="fragment"><div class="line"><span class="comment">// The middle dimension has zero iterations (j_begin == j_end).</span></div>
<div class="line"><span class="comment">// The outer i-loop still has work; the j/k body never executes.</span></div>
<div class="line"><a class="code hl_class" href="classtf_1_1IndexRanges.html">tf::IndexRanges<int, 3></a> range(</div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a>(0, 100, 1), <span class="comment">// i: 100 iterations</span></div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a>(0, 0, 1), <span class="comment">// j: 0 iterations — body never runs</span></div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a>(0, 100, 1) <span class="comment">// k: 100 iterations (never reached)</span></div>
<div class="line">);</div>
<div class="line"><span class="comment">// range.size() == 100 (outer i-loop only)</span></div>
<div class="line"> </div>
<div class="line">taskflow.for_each_by_index(range, [](<span class="keyword">const</span> <a class="code hl_class" href="classtf_1_1IndexRanges.html">tf::IndexRanges<int, 3></a>& sub) {</div>
<div class="line"> <span class="comment">// Each sub-box covers a slice of the active i-dimension.</span></div>
<div class="line"> <span class="comment">// Only process(i) is ever called — the j-loop never executes,</span></div>
<div class="line"> <span class="comment">// so process(i, j) and process(i, j, k) are never reached.</span></div>
<div class="line"> <span class="keyword">auto</span> [i0, i1, is] = sub.<a class="code hl_function" href="classtf_1_1IndexRanges.html#a4e0162b872edd6176e9d6a308b295427">dim</a>(0);</div>
<div class="line"> <span class="keyword">auto</span> [j0, j1, js] = sub.<a class="code hl_function" href="classtf_1_1IndexRanges.html#a4e0162b872edd6176e9d6a308b295427">dim</a>(1);</div>
<div class="line"> <span class="keyword">auto</span> [k0, k1, ks] = sub.<a class="code hl_function" href="classtf_1_1IndexRanges.html#a4e0162b872edd6176e9d6a308b295427">dim</a>(2);</div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> i = i0; i < i1; i += is) {</div>
<div class="line"> process(i); <span class="comment">// <-- called for every i</span></div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> j = j0; j < j1; j += js) { <span class="comment">// never entered</span></div>
<div class="line"> process(i, j); <span class="comment">// <-- never called</span></div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> k = k0; k < k1; k += ks) {</div>
<div class="line"> process(i, j, k); <span class="comment">// <-- never called</span></div>
<div class="line"> }</div>
<div class="line"> }</div>
<div class="line"> }</div>
<div class="line">});</div>
</div><!-- fragment --><p>This is equivalent to the sequential nested loop:</p>
<div class="fragment"><div class="line"><span class="keywordflow">for</span>(<span class="keywordtype">int</span> i = 0; i < 100; ++i) {</div>
<div class="line"> process(i); <span class="comment">// <-- called for every i</span></div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> j = 0; j < 0; ++j) { <span class="comment">// zero iterations — body skipped</span></div>
<div class="line"> process(i, j); <span class="comment">// <-- never called</span></div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> k = 0; k < 100; ++k) {</div>
<div class="line"> process(i, j, k); <span class="comment">// <-- never called</span></div>
<div class="line"> }</div>
<div class="line"> }</div>
<div class="line">}</div>
</div><!-- fragment --><dl class="section note"><dt>Note</dt><dd>This differs from OpenMP's <code>collapse</code> clause, which treats the entire iteration space as a Cartesian product and yields zero total iterations when any dimension is empty. Taskflow preserves the sequential semantics of nested loops: outer dimensions always execute independently of inner ones.</dd></dl>
<h2><a class="anchor" id="ParallelIterationsIndexRangeScheduling"></a>
Understand the Scheduling Algorithm</h2>
<p>Taskflow schedules parallel iterations over an index range by mapping the N-dimensional space to a 1D flat index space and distributing chunks of that flat space among workers. Specifically, the full ND range is first linearized in row-major order into a flat index space of size <code>N</code> = <code>range.size()</code>. For a 3x4 2D range, this looks like:</p>
<div class="dotgraph">
<iframe scrolling="no" frameborder="0" src="dot_indexrange_flat.svg" width="474" height="420"><p><b>This browser is not able to show SVG: try Firefox, Chrome, Safari, or Opera instead.</b></p></iframe></div>
<p>Then, workers claim contiguous chunks of the flat index space, either pre-assigned (static partitioner) or on demand via an atomic cursor (dynamic partitioners). During the assignment, each flat chunk boundary coincides with a <em>hyperplane</em> <em>boundary</em>, i.e., a suffix product of the dimension sizes (1, <code>dim[N-1]</code>, <code>dim[N-1]</code> x <code>dim[N-2]</code>, ...). This is the critical constraint: only boundaries that align to a complete inner row (or slice, or hyper-slice) produce a valid rectangular sub-box. The figure below illustrates why alignment matters for a 3x4 grid:</p>
<div class="image">
<object type="image/svg+xml" data="indexrange_alignment.svg" width="90%" style="pointer-events: none;"></object>
</div>
<p>A cut at flat index 6 (mid-row) would split row <code>i=1</code> between two workers, producing an L-shaped non-rectangular region that cannot be expressed as a single <code><a class="el" href="classtf_1_1IndexRanges.html" title="class to create an N-dimensional index range of integral indices">IndexRanges</a></code> sub-box. A cut at flat index 8 (row boundary) produces two clean rectangular sub-boxes. Taskflow enforces this automatically. The <code>chunk_size</code> hint is always snapped to the nearest hyperplane boundary via <code><a class="el" href="classtf_1_1IndexRanges.html#a5d99f151421740eb2f2d180c4cd5a8f1" title="returns the smallest hyperplane-aligned chunk size that is >= chunk_size, capped at size() when chunk...">IndexRanges::ceil()</a></code>:</p>
<div class="fragment"><div class="line"><span class="comment">// For a 3x4 range with 3 workers:</span></div>
<div class="line"><span class="comment">// raw N/W = 12/3 = 4 -> r.ceil(4) = 4 (already a row boundary)</span></div>
<div class="line"><span class="comment">// For a 3x5 range with 2 workers:</span></div>
<div class="line"><span class="comment">// raw N/W = 15/2 = 7 -> r.ceil(7) = 10 (next row boundary = 2 full rows)</span></div>
</div><!-- fragment --><dl class="section user"><dt>Static vs dynamic partitioners</dt><dd></dd></dl>
<p>With a static partitioner, each worker's chunk is pre-assigned before execution and never changes. The figure below shows a 3x4 range evenly split among 3 workers, each receiving exactly one aligned row:</p>
<div class="image">
<object type="image/svg+xml" data="indexrange_partition.svg" width="80%" style="pointer-events: none;"></object>
</div>
<p>With a dynamic partitioner, workers pull chunks from a shared atomic cursor. The chunk may overshoot to the next hyperplane boundary when the requested size does not align exactly, but the cursor self-corrects so no element is visited twice.</p>
<dl class="section note"><dt>Note</dt><dd>You do not need to think about alignment explicitly. Taskflow handles it automatically — the <code>chunk_size</code> you pass to the partitioner is a <em>hint</em>, and the scheduler snaps it to the nearest valid boundary before distributing work.</dd></dl>
<h2><a class="anchor" id="ParallelIterationsIndexRangeByReference"></a>
Capture Range by Reference</h2>
<p>When the range bounds are not known at task-graph construction time, pass the range by <a href="https://en.cppreference.com/w/cpp/utility/functional/ref">std::ref</a> so that an upstream task can set the bounds before the parallel loop runs. This works for both 1D and multi-dimensional ranges.</p>
<div class="fragment"><div class="line"><a class="code hl_class" href="classtf_1_1IndexRanges.html">tf::IndexRanges<int, 2></a> range(</div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a>(0, 0, 1), <span class="comment">// placeholder — filled by init</span></div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a>(0, 0, 1)</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line"><span class="keyword">auto</span> init = taskflow.emplace([&](){</div>
<div class="line"> range.dim(0) = {0, rows, 1};</div>
<div class="line"> range.dim(1) = {0, cols, 1};</div>
<div class="line">});</div>
<div class="line"> </div>
<div class="line"><span class="keyword">auto</span> pf = taskflow.for_each_by_index(</div>
<div class="line"> std::ref(range),</div>
<div class="line"> [](tf::IndexRanges<int, 2> tile) {</div>
<div class="line"> <span class="keyword">auto</span> [i0, i1, is] = tile.<a class="code hl_function" href="classtf_1_1IndexRanges.html#a4e0162b872edd6176e9d6a308b295427">dim</a>(0);</div>
<div class="line"> <span class="keyword">auto</span> [j0, j1, js] = tile.<a class="code hl_function" href="classtf_1_1IndexRanges.html#a4e0162b872edd6176e9d6a308b295427">dim</a>(1);</div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> i = i0; i < i1; i += is) {</div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> j = j0; j < j1; j += js) {</div>
<div class="line"> process(i, j);</div>
<div class="line"> }</div>
<div class="line"> }</div>
<div class="line"> }</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line"><span class="comment">// The code below is wrong! range is captured by copy at construction time</span></div>
<div class="line"><span class="comment">// auto pf = taskflow.for_each_by_index(range, callable);</span></div>
<div class="line"> </div>
<div class="line">init.precede(pf);</div>
</div><!-- fragment --><p>When <code>init</code> finishes, <code>pf</code> reads the updated range and partitions the <code>rows x cols</code> space among workers.</p>
<h1><a class="anchor" id="ParallelIterationsIteratorBased"></a>
Create an Iterator-based Parallel-Iteration Task</h1>
<p>Iterator-based parallel-for performs parallel iterations over a range specified by two <a href="https://en.cppreference.com/w/cpp/iterator/iterator">STL-styled iterators</a>, <code>first</code> and <code>last</code>. The task created by <a class="el" href="classtf_1_1FlowBuilder.html#aae3edfa278baa75b08414e083c14c836" title="constructs an STL-styled parallel-for task">tf::Taskflow::for_each(B first, E last, C callable, P part)</a> represents parallel execution of the following loop:</p>
<div class="fragment"><div class="line"><span class="keywordflow">for</span>(<span class="keyword">auto</span> i = first; i != last; i++) {</div>
<div class="line"> callable(*i);</div>
<div class="line">}</div>
</div><!-- fragment --><p><a class="el" href="classtf_1_1FlowBuilder.html#aae3edfa278baa75b08414e083c14c836" title="constructs an STL-styled parallel-for task">tf::Taskflow::for_each</a> simultaneously applies the callable to the object obtained by dereferencing every iterator in the range <code>[first, last)</code>. It is the user's responsibility to ensure the range is valid within the execution of the parallel-for task. Iterators must have the post-increment operator <code>++</code> defined.</p>
<div class="fragment"><div class="line">std::vector<int> vec = {1, 2, 3, 4, 5};</div>
<div class="line"> </div>
<div class="line">taskflow.for_each(vec.begin(), vec.end(), [](<span class="keywordtype">int</span> i) {</div>
<div class="line"> std::cout << <span class="stringliteral">"parallel for on item "</span> << i << <span class="stringliteral">'\n'</span>;</div>
<div class="line">});</div>
</div><!-- fragment --><h2><a class="anchor" id="ParallelForEachCaptureIteratorsByReference"></a>
Capture Iterators by Reference</h2>
<p>Similar to index-based parallel-for, iterators can be passed by reference using <a href="https://en.cppreference.com/w/cpp/utility/functional/ref">std::ref</a> so that one task can set up the range before another task performs the parallel-for.</p>
<div class="fragment"><div class="line">std::vector<int> vec;</div>
<div class="line">std::vector<int>::iterator first, last;</div>
<div class="line"> </div>
<div class="line"><span class="keyword">auto</span> init = taskflow.emplace([&](){</div>
<div class="line"> vec.resize(1000);</div>
<div class="line"> first = vec.begin();</div>
<div class="line"> last = vec.end();</div>
<div class="line">});</div>
<div class="line"> </div>
<div class="line"><span class="keyword">auto</span> pf = taskflow.for_each(</div>
<div class="line"> std::ref(first), std::ref(last),</div>
<div class="line"> [](<span class="keywordtype">int</span> i) {</div>
<div class="line"> std::cout << <span class="stringliteral">"parallel iteration on item "</span> << i << <span class="charliteral">'\n'</span>;</div>
<div class="line"> }</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line"><span class="comment">// The code below is wrong! first and last are captured by copy at construction time</span></div>
<div class="line"><span class="comment">// auto pf = taskflow.for_each(first, last, [](int i) { });</span></div>
<div class="line"> </div>
<div class="line">init.precede(pf);</div>
</div><!-- fragment --><p>When <code>init</code> finishes, <code>pf</code> will see <code>first</code> pointing to the beginning of <code>vec</code> and <code>last</code> pointing to the end of <code>vec</code> and performs parallel iterations over the 1000 items.</p>
<h1><a class="anchor" id="ParallelIterationsConfigureAPartitioner"></a>
Configure a Partitioner</h1>
<p>A partitioner controls how the iteration space is divided among workers. Taskflow provides four partitioners, each suited to different workload characteristics:</p>
<ul>
<li><a class="el" href="classtf_1_1StaticPartitioner.html" title="class to construct a static partitioner for scheduling parallel algorithms">tf::StaticPartitioner</a> divides the range into equal-sized chunks ahead of execution and assigns them to workers in order. It has the lowest scheduling overhead and delivers the best performance when every iteration costs roughly the same amount of work.</li>
<li><a class="el" href="classtf_1_1DynamicPartitioner.html" title="class to create a dynamic partitioner for scheduling parallel algorithms">tf::DynamicPartitioner</a> distributes fixed-sized chunks to workers on demand as they become available. It adapts well to workloads where iteration cost varies, at the expense of slightly higher coordination overhead.</li>
<li><a class="el" href="classtf_1_1GuidedPartitioner.html" title="class to create a guided partitioner for scheduling parallel algorithms">tf::GuidedPartitioner</a> distributes chunks whose size decreases adaptively as work is consumed — large chunks early to reduce overhead, smaller chunks late to balance the tail. This is the default partitioner and delivers stable, near-optimal performance across a wide range of workloads.</li>
<li><a class="el" href="classtf_1_1RandomPartitioner.html" title="class to construct a random partitioner for scheduling parallel algorithms">tf::RandomPartitioner</a> distributes chunks of randomly sampled sizes, which can help avoid systematic load imbalances caused by data-dependent cost patterns.</li>
</ul>
<p>The following example creates two parallel-iteration tasks with different partitioners:</p>
<div class="fragment"><div class="line">std::vector<int> vec(1024, 0);</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1StaticPartitioner.html">tf::StaticPartitioner</a> static_partitioner(10); <span class="comment">// chunk size of 10</span></div>
<div class="line"><a class="code hl_class" href="classtf_1_1GuidedPartitioner.html">tf::GuidedPartitioner</a> guided_partitioner(10); <span class="comment">// minimum chunk size of 10</span></div>
<div class="line"> </div>
<div class="line">taskflow.for_each(</div>
<div class="line"> vec.begin(), vec.end(),</div>
<div class="line"> [](<span class="keywordtype">int</span> i) { std::cout << <span class="stringliteral">"static: "</span> << i << <span class="stringliteral">'\n'</span>; },</div>
<div class="line"> static_partitioner</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line">taskflow.for_each(</div>
<div class="line"> vec.begin(), vec.end(),</div>
<div class="line"> [](<span class="keywordtype">int</span> i) { std::cout << <span class="stringliteral">"guided: "</span> << i << <span class="stringliteral">'\n'</span>; },</div>
<div class="line"> guided_partitioner</div>
<div class="line">);</div>
<div class="ttc" id="aclasstf_1_1GuidedPartitioner_html"><div class="ttname"><a href="classtf_1_1GuidedPartitioner.html">tf::GuidedPartitioner</a></div><div class="ttdoc">class to create a guided partitioner for scheduling parallel algorithms</div><div class="ttdef"><b>Definition</b> partitioner.hpp:417</div></div>
<div class="ttc" id="aclasstf_1_1StaticPartitioner_html"><div class="ttname"><a href="classtf_1_1StaticPartitioner.html">tf::StaticPartitioner</a></div><div class="ttdoc">class to construct a static partitioner for scheduling parallel algorithms</div><div class="ttdef"><b>Definition</b> partitioner.hpp:262</div></div>
</div><!-- fragment --><p>As a rule of thumb, prefer <a class="el" href="classtf_1_1StaticPartitioner.html" title="class to construct a static partitioner for scheduling parallel algorithms">tf::StaticPartitioner</a> for uniform workloads (e.g., element-wise arithmetic on arrays) and <a class="el" href="classtf_1_1GuidedPartitioner.html" title="class to create a guided partitioner for scheduling parallel algorithms">tf::GuidedPartitioner</a> for irregular workloads (e.g., graph traversal, variable-length processing). <a class="el" href="classtf_1_1DynamicPartitioner.html" title="class to create a dynamic partitioner for scheduling parallel algorithms">tf::DynamicPartitioner</a> is a good choice when chunks must be kept small and strictly equal in size.</p>
<dl class="section note"><dt>Note</dt><dd>By default, parallel-iteration tasks use <a class="el" href="namespacetf.html#ace2c5adcd5039483eebb6dbdbb6f33e3" title="default partitioner set to tf::GuidedPartitioner">tf::DefaultPartitioner</a> (currently <a class="el" href="classtf_1_1GuidedPartitioner.html" title="class to create a guided partitioner for scheduling parallel algorithms">tf::GuidedPartitioner</a>) if no partitioner is specified. </dd></dl>
</div></div><!-- contents -->
</div><!-- PageDoc -->
</div><!-- doc-content -->
<!-- HTML footer for doxygen 1.13.1-->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="navelem"><a class="el" href="Algorithms.html">Taskflow Algorithms</a></li>
<li class="footer">
Maintained by <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>
—
Generated by <a href="https://www.doxygen.org/index.html"><img class="footer" src="doxygen.svg" width="104" height="31" alt="doxygen"/></a> 1.13.1
</li>
</ul>
</div>