<!-- HTML header for doxygen 1.13.1-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=11"/>
<meta name="generator" content="Doxygen 1.13.1"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>Taskflow: A General-purpose Task-parallel Programming System: Parallel Reduction</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<script type="text/javascript" src="clipboard.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="cookie.js"></script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="custom.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr id="projectrow">
<td id="projectlogo"><img alt="Logo" src="taskflow_logo.png"/></td>
<td id="projectalign">
<div id="projectname"><a href="https://github.com/taskflow/taskflow" style="color:inherit; text-decoration:none;">Taskflow: A General-purpose Task-parallel Programming System</a>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.13.1 -->
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
var searchBox = new SearchBox("searchBox", "search/",'.html');
/* @license-end */
</script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function() { codefold.init(0); });
/* @license-end */
</script>
<script type="text/javascript" src="menudata.js"></script>
<script type="text/javascript" src="menu.js"></script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function() {
initMenu('',true,false,'search.php','Search',true);
$(function() { init_search(); });
});
/* @license-end */
</script>
<div id="main-nav"></div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function(){initNavTree('ParallelReduction.html',''); initResizable(true); });
/* @license-end */
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<div id="MSearchResults">
<div class="SRPage">
<div id="SRIndex">
<div id="SRResults"></div>
<div class="SRStatus" id="Loading">Loading...</div>
<div class="SRStatus" id="Searching">Searching...</div>
<div class="SRStatus" id="NoMatches">No Matches</div>
</div>
</div>
</div>
</div>
<div><div class="header">
<div class="headertitle"><div class="title">Parallel Reduction</div></div>
</div><!--header-->
<div class="contents">
<div class="toc"><h3>Table of Contents</h3>
<ul>
<li class="level1">
<a href="#ParallelReductionInclude">Include the Header</a>
</li>
<li class="level1">
<a href="#ParallelReductionCreate">Create a Parallel-Reduction Task</a>
<ul>
<li class="level2">
<a href="#ParallelReductionCaptureIteratorsByReference">Capture Iterators by Reference</a>
</li>
</ul>
</li>
<li class="level1">
<a href="#ParallelTransformReductionCreate">Create a Parallel-Transform-Reduction Task</a>
<ul>
<li class="level2">
<a href="#ParallelTransformReductionCaptureIteratorsByReference">Capture Iterators by Reference</a>
</li>
</ul>
</li>
<li class="level1">
<a href="#ParallelReduceByIndexCreate">Create a Parallel-Reduce-by-Index Task</a>
<ul>
<li class="level2">
<a href="#ParallelReduceByIndexCaptureByReference">Capture IndexRange by Reference</a>
</li>
</ul>
</li>
<li class="level1">
<a href="#ParallelReductionConfigureAPartitioner">Configure a Partitioner</a>
</li>
</ul>
</div>
<div class="textblock"><p>Taskflow provides template functions for constructing tasks to perform parallel reduction over a range of items.</p>
<h1><a class="anchor" id="ParallelReductionInclude"></a>
Include the Header</h1>
<p>Include the header file <code>taskflow/algorithm/reduce.hpp</code> to create parallel-reduction tasks.</p>
<div class="fragment"><div class="line"><span class="preprocessor">#include <taskflow/algorithm/reduce.hpp></span></div>
</div><!-- fragment --><h1><a class="anchor" id="ParallelReductionCreate"></a>
Create a Parallel-Reduction Task</h1>
<p>The task created by <a class="el" href="classtf_1_1FlowBuilder.html#afb24798ebf46e253a40b01bffb1da6a7" title="constructs an STL-styled parallel-reduction task">tf::Taskflow::reduce(B first, E last, T& result, O bop, P part)</a> performs parallel reduction over the range <code>[first, last)</code> using the binary operator <code>bop</code> and stores the reduced result in <code>result</code>. It represents the parallel execution of the following loop:</p>
<div class="fragment"><div class="line"><span class="keywordflow">for</span>(<span class="keyword">auto</span> itr = first; itr != last; itr++) {</div>
<div class="line"> result = bop(result, *itr);</div>
<div class="line">}</div>
</div><!-- fragment --><p>At runtime, the reduction task partitions the range among workers, each computing a partial result, and then combines those partial results into <code>result</code> using <code>bop</code>. The initial value of <code>result</code> participates in the reduction — it is combined with the partial results as if it were an additional element. <code>result</code> is captured by reference inside the task; it is the user's responsibility to ensure it remains alive during execution.</p>
<div class="fragment"><div class="line"><span class="keywordtype">int</span> sum = 100;</div>
<div class="line">std::vector<int> vec = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> task = taskflow.reduce(vec.begin(), vec.end(), sum,</div>
<div class="line"> [](<span class="keywordtype">int</span> l, <span class="keywordtype">int</span> r) { return l + r; }</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line">executor.run(taskflow).wait();</div>
<div class="line">assert(sum == 155); <span class="comment">// 100 + (1+2+...+10)</span></div>
<div class="ttc" id="aclasstf_1_1Task_html"><div class="ttname"><a href="classtf_1_1Task.html">tf::Task</a></div><div class="ttdoc">class to create a task handle over a taskflow node</div><div class="ttdef"><b>Definition</b> task.hpp:569</div></div>
</div><!-- fragment --><p>The order in which <code>bop</code> is applied to pairs of elements is <em>unspecified</em>. Elements of the range may be grouped and rearranged in arbitrary order, so <code>bop</code> should be associative and commutative for the result to be deterministic. The figure below illustrates one possible grouping for a sum-reduction over eight elements:</p>
<div class="dotgraph">
<iframe scrolling="no" frameborder="0" src="dot_parallel_reduction.svg" width="798" height="456"><p><b>This browser is not able to show SVG: try Firefox, Chrome, Safari, or Opera instead.</b></p></iframe></div>
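<p>The argument and return types of <code>bop</code> must be consistent with both the element type of the range and the type of <code>result</code>, because <code>bop</code> is applied to elements, to partial results, and to <code>result</code> itself.</p>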
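<p>To see why the grouping matters, the sketch below models the partition-and-combine scheme sequentially using only the standard library. It is an illustration under stated assumptions, not Taskflow's implementation: <code>chunked_reduce</code> and its <code>num_chunks</code> parameter are hypothetical names, and the real chunk boundaries depend on the partitioner and the number of workers.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Taskflow): reduces `data` in `num_chunks`
// contiguous chunks, then folds each partial result into `init` with `bop`.
template <typename T, typename BinaryOp>
T chunked_reduce(const std::vector<T>& data, T init,
                 std::size_t num_chunks, BinaryOp bop) {
  assert(num_chunks > 0);
  const std::size_t n = data.size();
  T result = init;
  for (std::size_t c = 0; c < num_chunks; ++c) {
    // Chunk c covers the half-open index range [beg, end).
    const std::size_t beg = c * n / num_chunks;
    const std::size_t end = (c + 1) * n / num_chunks;
    if (beg == end) continue;  // more chunks than elements
    // Phase 1: a partial result seeded with the chunk's first element.
    T partial = data[beg];
    for (std::size_t i = beg + 1; i < end; ++i) {
      partial = bop(partial, data[i]);
    }
    // Phase 2: the partial result is merged into the running result.
    result = bop(result, partial);
  }
  return result;
}
```

<p>With the elements <code>1</code> through <code>10</code> and an initial value of <code>100</code>, addition yields <code>155</code> for any chunk count. Subtraction yields <code>153</code> with one chunk but <code>45</code> with ten chunks, which shows how a non-associative operator makes the result depend on the grouping.</p>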
<h2><a class="anchor" id="ParallelReductionCaptureIteratorsByReference"></a>
Capture Iterators by Reference</h2>
<p>You can pass iterators by reference using <a href="https://en.cppreference.com/w/cpp/utility/functional/ref">std::ref</a> to marshal parameter updates between dependent tasks. This is useful when the range is not known at task-graph construction time but is initialized by an upstream task.</p>
<div class="fragment"><div class="line"><span class="keywordtype">int</span> sum = 100;</div>
<div class="line">std::vector<int> vec;</div>
<div class="line">std::vector<int>::iterator first, last;</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> init = taskflow.emplace([&]() {</div>
<div class="line"> vec = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};</div>
<div class="line"> first = vec.begin();</div>
<div class="line"> last = vec.end();</div>
<div class="line">});</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> task = taskflow.reduce(</div>
<div class="line"> std::ref(first), std::ref(last), sum,</div>
<div class="line"> [](<span class="keywordtype">int</span> l, <span class="keywordtype">int</span> r) { <span class="keywordflow">return</span> l + r; }</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line"><span class="comment">// wrong! first and last are captured by copy at construction time</span></div>
<div class="line"><span class="comment">// tf::Task task = taskflow.reduce(first, last, sum,</span></div>
<div class="line"><span class="comment">// [](int l, int r) { return l + r; }</span></div>
<div class="line"><span class="comment">// );</span></div>
<div class="line"> </div>
<div class="line">init.<a class="code hl_function" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(task);</div>
<div class="line"> </div>
<div class="line">executor.run(taskflow).wait();</div>
<div class="line">assert(sum == 155);</div>
<div class="ttc" id="aclasstf_1_1Task_html_a8c78c453295a553c1c016e4062da8588"><div class="ttname"><a href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">tf::Task::precede</a></div><div class="ttdeci">Task & precede(Ts &&... tasks)</div><div class="ttdoc">adds precedence links from this to other tasks</div><div class="ttdef"><b>Definition</b> task.hpp:1258</div></div>
</div><!-- fragment --><p>When <code>init</code> finishes, <code>first</code> and <code>last</code> point to the initialized data range of <code>vec</code>, and the reduction task performs parallel reduction over the 10 elements.</p>
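<p>The difference between passing iterators by copy and by <code>std::ref</code> can be reproduced with the standard library alone. The sketch below is a simplified model, not Taskflow's internals: <code>make_deferred_sum</code> is a hypothetical stand-in for a task that stores its arguments at construction time and reads the range only when it runs.</p>

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

using Iter = std::vector<int>::iterator;

// Hypothetical stand-in for a deferred task. Plain iterators are stored as
// copies of their current values; std::ref(...) arguments are stored as
// std::reference_wrapper objects that still refer to the caller's variables.
template <typename B, typename E>
std::function<int()> make_deferred_sum(B first, E last) {
  return [first, last]() {
    // Works for both cases: a reference_wrapper converts to the iterator
    // it refers to, so the current value is read at invocation time.
    Iter b = first;
    Iter e = last;
    assert(b <= e);  // the range must have been set up before invocation
    return std::accumulate(b, e, 0);
  };
}
```

<p>If the callable is created with <code>std::ref(first)</code> and <code>std::ref(last)</code> and the two iterators are assigned afterwards, invoking it sums the assigned range. A callable created from plain <code>first</code> and <code>last</code> keeps the values the iterators had when it was created.</p>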
<h1><a class="anchor" id="ParallelTransformReductionCreate"></a>
Create a Parallel-Transform-Reduction Task</h1>
<p>It is common to transform each element into a new type and then reduce over the transformed values. The task created by <a class="el" href="classtf_1_1FlowBuilder.html#a5283a732a77ea75446f8ed5d3377f02c" title="constructs an STL-styled parallel transform-reduce task">tf::Taskflow::transform_reduce(B first, E last, T& result, BOP bop, UOP uop, P part)</a> applies the unary operator <code>uop</code> to each element and then performs parallel reduction over <code>result</code> and the transformed values using <code>bop</code>. It represents the parallel execution of the following loop:</p>
<div class="fragment"><div class="line"><span class="keywordflow">for</span>(<span class="keyword">auto</span> itr = first; itr != last; itr++) {</div>
<div class="line"> result = bop(result, uop(*itr));</div>
<div class="line">}</div>
</div><!-- fragment --><p>The example below transforms each digit character in a string to an integer and then sums them in parallel:</p>
<div class="fragment"><div class="line">std::string str = <span class="stringliteral">"12345678"</span>;</div>
<div class="line"><span class="keywordtype">int</span> sum = 0;</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> task = taskflow.transform_reduce(str.begin(), str.end(), sum,</div>
<div class="line"> [](<span class="keywordtype">int</span> a, <span class="keywordtype">int</span> b) { return a + b; }, <span class="comment">// binary reduction operator</span></div>
<div class="line"> [](<span class="keywordtype">char</span> c) -> <span class="keywordtype">int</span> { return c - <span class="stringliteral">'0'</span>; } <span class="comment">// unary transformation operator</span></div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line">executor.run(taskflow).wait();</div>
<div class="line">assert(sum == 36); <span class="comment">// 1+2+3+4+5+6+7+8</span></div>
</div><!-- fragment --><p>The order in which <code>bop</code> is applied to the transformed elements is <em>unspecified</em>. Because the transformed values are temporaries, <code>bop</code> may receive r-value arguments on both sides (e.g., <code>bop(uop(*itr1), uop(*itr2))</code>). When copying values of type <code>T</code> is expensive, make <code>T</code> move-constructible so that these temporaries can be moved instead of copied.</p>
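<p>For comparison, the C++17 standard library offers a sequential counterpart, <code>std::transform_reduce</code>, which takes the reduction and transformation operators in the same order. The sketch below reproduces the digit-sum example with it; <code>sum_digits</code> is a hypothetical helper name, and the code assumes every character is a decimal digit.</p>

```cpp
#include <cassert>
#include <numeric>
#include <string>

// Sequential reference for the digit-sum example. Returns `init` plus the
// sum of the decimal digits in `str`.
int sum_digits(const std::string& str, int init) {
  return std::transform_reduce(
    str.begin(), str.end(), init,
    [](int a, int b) { return a + b; },  // binary reduction operator
    [](char c) -> int {                  // unary transformation operator
      assert(c >= '0' && c <= '9');      // assumption: digits only
      return c - '0';
    }
  );
}
```

<p>Calling <code>sum_digits("12345678", 0)</code> returns <code>36</code>, matching the parallel version. As with the Taskflow task, the standard algorithm leaves the grouping unspecified, so the reduction operator should be associative and commutative.</p>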
<h2><a class="anchor" id="ParallelTransformReductionCaptureIteratorsByReference"></a>
Capture Iterators by Reference</h2>
<p>As with <a class="el" href="classtf_1_1FlowBuilder.html#afb24798ebf46e253a40b01bffb1da6a7" title="constructs an STL-styled parallel-reduction task">tf::Taskflow::reduce</a>, iterators can be passed by reference using <a href="https://en.cppreference.com/w/cpp/utility/functional/ref">std::ref</a> so that an upstream task can set up the range before the parallel-transform-reduction runs.</p>
<div class="fragment"><div class="line">std::string str;</div>
<div class="line">std::string::iterator first, last;</div>
<div class="line"><span class="keywordtype">int</span> sum = 0;</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> init = taskflow.emplace([&]() {</div>
<div class="line"> str = <span class="stringliteral">"12345678"</span>;</div>
<div class="line"> first = str.begin();</div>
<div class="line"> last = str.end();</div>
<div class="line">});</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> task = taskflow.transform_reduce(</div>
<div class="line"> std::ref(first), std::ref(last), sum,</div>
<div class="line"> [](<span class="keywordtype">int</span> a, <span class="keywordtype">int</span> b) { <span class="keywordflow">return</span> a + b; },</div>
<div class="line"> [](<span class="keywordtype">char</span> c) -> <span class="keywordtype">int</span> { <span class="keywordflow">return</span> c - <span class="charliteral">'0'</span>; }</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line"><span class="comment">// wrong! first and last are captured by copy at construction time</span></div>
<div class="line"><span class="comment">// tf::Task task = taskflow.transform_reduce(first, last, sum, bop, uop);</span></div>
<div class="line"> </div>
<div class="line">init.<a class="code hl_function" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(task);</div>
<div class="line"> </div>
<div class="line">executor.run(taskflow).wait();</div>
<div class="line">assert(sum == 36);</div>
</div><!-- fragment --><h1><a class="anchor" id="ParallelReduceByIndexCreate"></a>
Create a Parallel-Reduce-by-Index Task</h1>
<p>Unlike <a class="el" href="classtf_1_1FlowBuilder.html#afb24798ebf46e253a40b01bffb1da6a7" title="constructs an STL-styled parallel-reduction task">tf::Taskflow::reduce</a>, whose binary operator sees one element at a time, <a class="el" href="classtf_1_1FlowBuilder.html#a3ea810696c4b29824d1aaef15342c825" title="constructs an index range-based parallel-reduction task">tf::Taskflow::reduce_by_index</a> hands each worker a contiguous <em>subrange</em> of the index space. This allows the local reduction to be written as an explicit loop over the subrange, enabling optimisations such as SIMD vectorisation, custom accumulator types, or data initialisation interleaved with reduction. It represents the parallel execution of the following two-phase procedure:</p>
<div class="fragment"><div class="line"><span class="comment">// phase 1: each worker computes a partial result over its subrange</span></div>
<div class="line">T partial = lop(first_subrange, std::nullopt); <span class="comment">// first subrange: no prior total</span></div>
<div class="line">partial = lop(next_subrange, partial); <span class="comment">// subsequent subranges: continue from the running total</span></div>
<div class="line"> </div>
<div class="line"><span class="comment">// phase 2: combine all partial results into the final result</span></div>
<div class="line">result = gop(result, partial1);</div>
<div class="line">result = gop(result, partial2);</div>
<div class="line"><span class="comment">// ...</span></div>
</div><!-- fragment --><p>The local operator <code>lop</code> is invoked once per subrange assigned to a worker. Its second argument is a <code>std::optional<T></code> carrying the running total accumulated by that worker so far:</p>
<ul>
<li><code>std::nullopt</code> on the first subrange processed by a worker — the worker should initialise its accumulator from scratch.</li>
<li>A value on subsequent subranges — the worker should continue accumulating from the provided running total.</li>
</ul>
<p>The global operator <code>gop</code> combines the per-worker partial results and the initial value of <code>result</code> into the final answer.</p>
<p>The example below performs a sum-reduction over a large array, initialising each element inside the local reducer:</p>
<div class="fragment"><div class="line">std::vector<double> data(100000);</div>
<div class="line"><span class="keywordtype">double</span> res = 1.0;</div>
<div class="line"> </div>
<div class="line">taskflow.reduce_by_index(</div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<size_t></a>(0, data.size(), 1),</div>
<div class="line"> res,</div>
<div class="line"> <span class="comment">// local reducer: called once per subrange per worker</span></div>
<div class="line"> [&](<a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<size_t></a> subrange, std::optional<double> running_total) {</div>
<div class="line"> double partial = running_total ? *running_total : 0.0;</div>
<div class="line"> for(size_t i = subrange.begin(); i < subrange.end(); i += subrange.step_size()) {</div>
<div class="line"> data[i] = 1.0;</div>
<div class="line"> partial += data[i];</div>
<div class="line"> }</div>
<div class="line"> <span class="keywordflow">return</span> partial;</div>
<div class="line"> },</div>
<div class="line"> <span class="comment">// global reducer: combines partial results into res</span></div>
<div class="line"> std::plus<double>()</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line">executor.run(taskflow).wait();</div>
<div class="line">assert(res == 100001.0); <span class="comment">// 1.0 (initial) + 100000 * 1.0</span></div>
<div class="ttc" id="anamespacetf_html_a6c928ec9248757ba8276e316ef26846b"><div class="ttname"><a href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange</a></div><div class="ttdeci">IndexRanges< T, 1 > IndexRange</div><div class="ttdoc">alias for the common 1D case of tf::IndexRanges</div><div class="ttdef"><b>Definition</b> iterator.hpp:971</div></div>
</div><!-- fragment --><p>The global reducer combines all partial results with the initial value of <code>res</code> (here <code>1.0</code>), so the final answer is <code>1.0</code> + <code>100000.0</code> = <code>100001.0</code>.</p>
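<p>The interplay between the local and global operators can be modelled sequentially. The sketch below is an assumption-laden illustration, not Taskflow's scheduler: <code>Subrange</code> and <code>model_reduce_by_index</code> are hypothetical names, the step size is fixed to one, and chunks are dealt to simulated workers round-robin, whereas the real assignment depends on the partitioner.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical half-open index range [begin, end) with unit step.
struct Subrange {
  std::size_t begin;
  std::size_t end;
};

// Sequential model of the two-phase scheme. Each simulated worker threads
// its running total through `lop`; `gop` then merges the per-worker totals
// into `init`.
template <typename T, typename Lop, typename Gop>
T model_reduce_by_index(std::size_t n, std::size_t num_workers,
                        std::size_t chunk, T init, Lop lop, Gop gop) {
  assert(num_workers > 0 && chunk > 0);
  // An empty optional means the worker has not processed a subrange yet.
  std::vector<std::optional<T>> totals(num_workers);
  // Phase 1: deal chunks to workers round-robin (an arbitrary choice here).
  std::size_t w = 0;
  for (std::size_t beg = 0; beg < n; beg += chunk) {
    const std::size_t end = (beg + chunk < n) ? beg + chunk : n;
    totals[w] = lop(Subrange{beg, end}, totals[w]);
    w = (w + 1) % num_workers;
  }
  // Phase 2: combine the totals of the workers that received any work.
  T result = init;
  for (const std::optional<T>& total : totals) {
    if (total) result = gop(result, *total);
  }
  return result;
}
```

<p>In this model, a local operator that starts from <code>0.0</code> when its optional argument is empty and otherwise continues from the stored value sees each index exactly once, so summing ones over <code>n</code> indices with an initial value of <code>1.0</code> gives <code>n + 1</code> regardless of the worker count or chunk size.</p>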
<h2><a class="anchor" id="ParallelReduceByIndexCaptureByReference"></a>
Capture IndexRange by Reference</h2>
<p>You can pass the index range by reference using <a href="https://en.cppreference.com/w/cpp/utility/functional/ref">std::ref</a> so that an upstream task can set the bounds before the parallel-reduce-by-index runs.</p>
<div class="fragment"><div class="line">std::vector<double> data;</div>
<div class="line"><span class="keywordtype">double</span> res = 0.0;</div>
<div class="line"> </div>
<div class="line"><a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<size_t></a> range(0, 0, 1); <span class="comment">// placeholder — filled by init</span></div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> init = taskflow.emplace([&]() {</div>
<div class="line"> data.assign(100000, 1.0);</div>
<div class="line"> range.begin(0).end(data.size()).step_size(1);</div>
<div class="line">});</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> task = taskflow.reduce_by_index(</div>
<div class="line"> std::ref(range),</div>
<div class="line"> res,</div>
<div class="line"> [&](<a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<size_t></a> subrange, std::optional<double> running_total) {</div>
<div class="line"> <span class="keywordtype">double</span> partial = running_total ? *running_total : 0.0;</div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">size_t</span> i = subrange.<a class="code hl_function" href="classtf_1_1IndexRanges.html#ae37261f0d2f326449c469233561a3da6">begin</a>(); i < subrange.<a class="code hl_function" href="classtf_1_1IndexRanges.html#a253e15e199f974ee26b2e33a5e2b3cf1">end</a>(); i += subrange.<a class="code hl_function" href="classtf_1_1IndexRanges.html#afcb30e2b9567ad685e702201d2265880">step_size</a>()) {</div>
<div class="line"> partial += data[i];</div>
<div class="line"> }</div>
<div class="line"> <span class="keywordflow">return</span> partial;</div>
<div class="line"> },</div>
<div class="line"> std::plus<double>()</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line"><span class="comment">// wrong! range is captured by copy at construction time</span></div>
<div class="line"><span class="comment">// tf::Task task = taskflow.reduce_by_index(range, res, lop, gop);</span></div>
<div class="line"> </div>
<div class="line">init.<a class="code hl_function" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(task);</div>
<div class="line"> </div>
<div class="line">executor.run(taskflow).wait();</div>
<div class="line">assert(res == 100000.0);</div>
<div class="ttc" id="aclasstf_1_1IndexRanges_html_a253e15e199f974ee26b2e33a5e2b3cf1"><div class="ttname"><a href="classtf_1_1IndexRanges.html#a253e15e199f974ee26b2e33a5e2b3cf1">tf::IndexRanges::end</a></div><div class="ttdeci">T end() const</div><div class="ttdoc">queries the ending index of the range (only available when N == 1)</div><div class="ttdef"><b>Definition</b> iterator.hpp:358</div></div>
<div class="ttc" id="aclasstf_1_1IndexRanges_html_ae37261f0d2f326449c469233561a3da6"><div class="ttname"><a href="classtf_1_1IndexRanges.html#ae37261f0d2f326449c469233561a3da6">tf::IndexRanges::begin</a></div><div class="ttdeci">T begin() const</div><div class="ttdoc">queries the starting index of the range (only available when N == 1)</div><div class="ttdef"><b>Definition</b> iterator.hpp:346</div></div>
<div class="ttc" id="aclasstf_1_1IndexRanges_html_afcb30e2b9567ad685e702201d2265880"><div class="ttname"><a href="classtf_1_1IndexRanges.html#afcb30e2b9567ad685e702201d2265880">tf::IndexRanges::step_size</a></div><div class="ttdeci">T step_size() const</div><div class="ttdoc">queries the step size of the range (only available when N == 1)</div><div class="ttdef"><b>Definition</b> iterator.hpp:370</div></div>
</div><!-- fragment --><h1><a class="anchor" id="ParallelReductionConfigureAPartitioner"></a>
Configure a Partitioner</h1>
<p>A partitioner controls how the iteration space is divided among workers. Taskflow provides four partitioners, each suited to different workload characteristics:</p>
<ul>
<li><a class="el" href="classtf_1_1StaticPartitioner.html" title="class to construct a static partitioner for scheduling parallel algorithms">tf::StaticPartitioner</a> divides the range into equal-sized chunks ahead of execution and assigns them to workers in order. It has the lowest scheduling overhead and delivers the best performance when every element costs roughly the same amount of work to reduce.</li>
<li><a class="el" href="classtf_1_1DynamicPartitioner.html" title="class to create a dynamic partitioner for scheduling parallel algorithms">tf::DynamicPartitioner</a> distributes fixed-sized chunks to workers on demand as they become available. It adapts well to workloads where reduction cost varies per element, at the expense of slightly higher coordination overhead.</li>
<li><a class="el" href="classtf_1_1GuidedPartitioner.html" title="class to create a guided partitioner for scheduling parallel algorithms">tf::GuidedPartitioner</a> distributes chunks whose size decreases adaptively as work is consumed — large chunks early to reduce overhead, smaller chunks late to balance the tail. This is the default partitioner and delivers stable, near-optimal performance across a wide range of workloads.</li>
<li><a class="el" href="classtf_1_1RandomPartitioner.html" title="class to construct a random partitioner for scheduling parallel algorithms">tf::RandomPartitioner</a> distributes chunks of randomly sampled sizes, which can help avoid systematic load imbalances caused by data-dependent cost patterns.</li>
</ul>
<p>The following example creates two parallel-reduction tasks using different partitioners:</p>
<div class="fragment"><div class="line"><span class="keywordtype">int</span> sum1 = 100, sum2 = 100;</div>
<div class="line">std::vector<int> vec = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1StaticPartitioner.html">tf::StaticPartitioner</a> static_partitioner(0); <span class="comment">// chunk size auto-determined</span></div>
<div class="line"><a class="code hl_class" href="classtf_1_1GuidedPartitioner.html">tf::GuidedPartitioner</a> guided_partitioner(0); <span class="comment">// minimum chunk size auto-determined</span></div>
<div class="line"> </div>
<div class="line"><span class="comment">// parallel-reduction with static partitioner</span></div>
<div class="line">taskflow.reduce(vec.begin(), vec.end(), sum1,</div>
<div class="line"> [](<span class="keywordtype">int</span> l, <span class="keywordtype">int</span> r) { return l + r; },</div>
<div class="line"> static_partitioner</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line"><span class="comment">// parallel-reduction with guided partitioner</span></div>
<div class="line">taskflow.reduce(vec.begin(), vec.end(), sum2,</div>
<div class="line"> [](<span class="keywordtype">int</span> l, <span class="keywordtype">int</span> r) { return l + r; },</div>
<div class="line"> guided_partitioner</div>
<div class="line">);</div>
<div class="ttc" id="aclasstf_1_1GuidedPartitioner_html"><div class="ttname"><a href="classtf_1_1GuidedPartitioner.html">tf::GuidedPartitioner</a></div><div class="ttdoc">class to create a guided partitioner for scheduling parallel algorithms</div><div class="ttdef"><b>Definition</b> partitioner.hpp:417</div></div>
<div class="ttc" id="aclasstf_1_1StaticPartitioner_html"><div class="ttname"><a href="classtf_1_1StaticPartitioner.html">tf::StaticPartitioner</a></div><div class="ttdoc">class to construct a static partitioner for scheduling parallel algorithms</div><div class="ttdef"><b>Definition</b> partitioner.hpp:262</div></div>
</div><!-- fragment --><p>As a rule of thumb, prefer <a class="el" href="classtf_1_1StaticPartitioner.html" title="class to construct a static partitioner for scheduling parallel algorithms">tf::StaticPartitioner</a> when every element costs the same to reduce (e.g., summation over a plain array) and <a class="el" href="classtf_1_1GuidedPartitioner.html" title="class to create a guided partitioner for scheduling parallel algorithms">tf::GuidedPartitioner</a> for irregular workloads (e.g., reductions whose cost depends on the element value). <a class="el" href="classtf_1_1DynamicPartitioner.html" title="class to create a dynamic partitioner for scheduling parallel algorithms">tf::DynamicPartitioner</a> is a good choice when chunks must be kept small and strictly equal in size.</p>
<dl class="section note"><dt>Note</dt><dd>By default, parallel-reduction tasks use <a class="el" href="namespacetf.html#ace2c5adcd5039483eebb6dbdbb6f33e3" title="default partitioner set to tf::GuidedPartitioner">tf::DefaultPartitioner</a> (currently <a class="el" href="classtf_1_1GuidedPartitioner.html" title="class to create a guided partitioner for scheduling parallel algorithms">tf::GuidedPartitioner</a>) if no partitioner is specified. </dd></dl>
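<p>The chunking strategies can be compared by listing the chunk sizes each one would produce. The sketch below is a deliberately simplified model and not Taskflow's algorithm: <code>static_chunks</code> and <code>guided_chunks</code> are hypothetical names, and the <code>2 * workers</code> divisor in the guided model is an illustrative assumption.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified static partitioning: near-equal chunks decided up front,
// at most one per worker.
std::vector<std::size_t> static_chunks(std::size_t n, std::size_t workers) {
  assert(workers > 0);
  std::vector<std::size_t> sizes;
  for (std::size_t w = 0; w < workers; ++w) {
    // The first n % workers chunks take one extra element.
    const std::size_t size = n / workers + (w < n % workers ? 1 : 0);
    if (size > 0) sizes.push_back(size);
  }
  return sizes;
}

// Simplified guided partitioning: each chunk is a fraction of the work that
// is still remaining, but never smaller than `min_chunk` unless fewer
// elements are left.
std::vector<std::size_t> guided_chunks(std::size_t n, std::size_t workers,
                                       std::size_t min_chunk) {
  assert(workers > 0 && min_chunk > 0);
  std::vector<std::size_t> sizes;
  std::size_t remaining = n;
  while (remaining > 0) {
    std::size_t size = std::max(min_chunk, remaining / (2 * workers));
    size = std::min(size, remaining);
    sizes.push_back(size);
    remaining -= size;
  }
  return sizes;
}
```

<p>For ten elements and four workers, the static model yields chunks of sizes 3, 3, 2, and 2. The guided model starts with large chunks and shrinks them as the remaining work decreases, which mirrors the trade-off described above between low scheduling overhead early and good load balance at the tail.</p>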
</div></div><!-- contents -->
</div><!-- PageDoc -->
</div><!-- doc-content -->
<!-- HTML footer for doxygen 1.13.1-->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="navelem"><a class="el" href="Algorithms.html">Taskflow Algorithms</a></li>
<li class="footer">
Maintained by <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>
—
Generated by <a href="https://www.doxygen.org/index.html"><img class="footer" src="doxygen.svg" width="104" height="31" alt="doxygen"/></a> 1.13.1
</li>
</ul>
</div>