<!-- HTML header for doxygen 1.13.1-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=11"/>
<meta name="generator" content="Doxygen 1.13.1"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>Taskflow: A General-purpose Task-parallel Programming System: Parallel Reduction</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<script type="text/javascript" src="clipboard.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="cookie.js"></script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="custom.css" rel="stylesheet" type="text/css"/>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
 <tbody>
 <tr id="projectrow">
  <td id="projectlogo"><img alt="Logo" src="taskflow_logo.png"/></td>
  <td id="projectalign">
   <div id="projectname"><a href="https://github.com/taskflow/taskflow" style="color:inherit; text-decoration:none;">Taskflow: A General-purpose Task-parallel Programming System</a>
   </div>
  </td>
 </tr>
 </tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.13.1 -->
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&amp;dn=expat.txt MIT */
var searchBox = new SearchBox("searchBox", "search/",'.html');
/* @license-end */
</script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&amp;dn=expat.txt MIT */
$(function() { codefold.init(0); });
/* @license-end */
</script>
<script type="text/javascript" src="menudata.js"></script>
<script type="text/javascript" src="menu.js"></script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&amp;dn=expat.txt MIT */
$(function() {
  initMenu('',true,false,'search.php','Search',true);
  $(function() { init_search(); });
});
/* @license-end */
</script>
<div id="main-nav"></div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
  <div id="nav-tree">
    <div id="nav-tree-contents">
      <div id="nav-sync" class="sync"></div>
    </div>
  </div>
  <div id="splitbar" style="-moz-user-select:none;" 
       class="ui-resizable-handle">
  </div>
</div>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&amp;dn=expat.txt MIT */
$(function(){initNavTree('ParallelReduction.html',''); initResizable(true); });
/* @license-end */
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
     onmouseover="return searchBox.OnSearchSelectShow()"
     onmouseout="return searchBox.OnSearchSelectHide()"
     onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<div id="MSearchResults">
<div class="SRPage">
<div id="SRIndex">
<div id="SRResults"></div>
<div class="SRStatus" id="Loading">Loading...</div>
<div class="SRStatus" id="Searching">Searching...</div>
<div class="SRStatus" id="NoMatches">No Matches</div>
</div>
</div>
</div>
</div>
<div><div class="header">
  <div class="headertitle"><div class="title">Parallel Reduction</div></div>
</div><!--header-->
<div class="contents">
<div class="toc"><h3>Table of Contents</h3>
<ul>
  <li class="level1">
    <a href="#ParallelReductionInclude">Include the Header</a>
  </li>
  <li class="level1">
    <a href="#ParallelReductionCreate">Create a Parallel-Reduction Task</a>
    <ul>
      <li class="level2">
        <a href="#ParallelReductionCaptureIteratorsByReference">Capture Iterators by Reference</a>
      </li>
    </ul>
  </li>
  <li class="level1">
    <a href="#ParallelTransformReductionCreate">Create a Parallel-Transform-Reduction Task</a>
    <ul>
      <li class="level2">
        <a href="#ParallelTransformReductionCaptureIteratorsByReference">Capture Iterators by Reference</a>
      </li>
    </ul>
  </li>
  <li class="level1">
    <a href="#ParallelReduceByIndexCreate">Create a Parallel-Reduce-by-Index Task</a>
    <ul>
      <li class="level2">
        <a href="#ParallelReduceByIndexCaptureByReference">Capture IndexRange by Reference</a>
      </li>
    </ul>
  </li>
  <li class="level1">
    <a href="#ParallelReductionConfigureAPartitioner">Configure a Partitioner</a>
  </li>
</ul>
</div>
<div class="textblock"><p>Taskflow provides template functions for constructing tasks to perform parallel reduction over a range of items.</p>
<h1><a class="anchor" id="ParallelReductionInclude"></a>
Include the Header</h1>
<p>You need to include the header file, <code>taskflow/algorithm/reduce.hpp</code>, for creating a parallel-reduction task.</p>
<div class="fragment"><div class="line"><span class="preprocessor">#include &lt;taskflow/algorithm/reduce.hpp&gt;</span></div>
</div><!-- fragment --><h1><a class="anchor" id="ParallelReductionCreate"></a>
Create a Parallel-Reduction Task</h1>
<p>The task created by <a class="el" href="classtf_1_1FlowBuilder.html#afb24798ebf46e253a40b01bffb1da6a7" title="constructs an STL-styled parallel-reduction task">tf::Taskflow::reduce(B first, E last, T&amp; result, O bop, P part)</a> performs parallel reduction over the range <code>[first, last)</code> using the binary operator <code>bop</code> and stores the reduced result in <code>result</code>. It represents the parallel execution of the following loop:</p>
<div class="fragment"><div class="line"><span class="keywordflow">for</span>(<span class="keyword">auto</span> itr = first; itr != last; itr++) {</div>
<div class="line">  result = bop(result, *itr);</div>
<div class="line">}</div>
</div><!-- fragment --><p>At runtime, the reduction task partitions the range among workers. Each worker reduces the elements it is assigned into a partial result, and the partial results are then combined into <code>result</code> using <code>bop</code>. The initial value of <code>result</code> participates in the reduction: it is combined with the partial results as if it were an additional element. Because <code>result</code> is captured by reference inside the task, it is the user's responsibility to ensure that it remains alive until the task finishes executing.</p>
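<p>The chunk-then-combine scheme described above can be sketched with plain <code>std::thread</code>. This is a minimal illustration under stated assumptions, not Taskflow's implementation: the helper name <code>chunked_reduce</code> is hypothetical, the iterators are assumed to be random-access, and the range is split into one contiguous chunk per worker, whereas Taskflow lets the partitioner decide the chunking.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical helper (not a Taskflow API): reduces [first, last) into `result`
// in three steps: (1) split the range into contiguous chunks, one per worker,
// (2) let each worker reduce its chunk into a partial result, and
// (3) fold the partial results into `result`, whose initial value takes part
// like an extra element. Assumes random-access iterators and an associative,
// commutative `bop`.
template <typename It, typename T, typename BOP>
void chunked_reduce(It first, It last, T& result, BOP bop, std::size_t num_workers) {
  using Diff = typename std::iterator_traits<It>::difference_type;
  const std::size_t n = static_cast<std::size_t>(std::distance(first, last));
  if (n == 0) return;                                              // nothing to reduce
  num_workers = std::max<std::size_t>(1, std::min(num_workers, n));
  const std::size_t chunk = (n + num_workers - 1) / num_workers;   // ceil(n / workers)
  const std::size_t num_chunks = (n + chunk - 1) / chunk;          // every chunk is non-empty

  std::vector<std::optional<T>> partials(num_chunks);              // one slot per chunk, no sharing
  std::vector<std::thread> workers;
  workers.reserve(num_chunks);
  for (std::size_t w = 0; w < num_chunks; ++w) {
    workers.emplace_back([&, w] {
      const std::size_t b = w * chunk;
      const std::size_t e = std::min(b + chunk, n);
      T partial = first[static_cast<Diff>(b)];                     // seed with the chunk's first element
      for (std::size_t i = b + 1; i < e; ++i) {
        partial = bop(partial, first[static_cast<Diff>(i)]);       // local (per-worker) reduction
      }
      partials[w] = partial;
    });
  }
  for (auto& t : workers) t.join();

  for (const auto& p : partials) {
    result = bop(result, *p);                                      // combine step
  }
}
```

<p>For example, calling <code>chunked_reduce(vec.begin(), vec.end(), sum, std::plus&lt;int&gt;(), 4)</code> with <code>sum = 100</code> and <code>vec = {1, ..., 10}</code> splits the ten elements into chunks of 3, 3, 3, and 1, produces partial sums 6, 15, 24, and 10, and leaves <code>sum</code> equal to 155, matching the <code>tf::Taskflow::reduce</code> example on this page.</p>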
<div class="fragment"><div class="line"><span class="keywordtype">int</span> sum = 100;</div>
<div class="line">std::vector&lt;int&gt; vec = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> task = taskflow.reduce(vec.begin(), vec.end(), sum,</div>
<div class="line">  [](<span class="keywordtype">int</span> l, <span class="keywordtype">int</span> r) { <span class="keywordflow">return</span> l + r; }</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line">executor.run(taskflow).wait();</div>
<div class="line">assert(sum == 155);  <span class="comment">// 100 + (1+2+...+10)</span></div>
<div class="ttc" id="aclasstf_1_1Task_html"><div class="ttname"><a href="classtf_1_1Task.html">tf::Task</a></div><div class="ttdoc">class to create a task handle over a taskflow node</div><div class="ttdef"><b>Definition</b> task.hpp:569</div></div>
</div><!-- fragment --><p>The order in which <code>bop</code> is applied to pairs of elements is <em>unspecified</em>. Elements of the range may be grouped and rearranged in arbitrary order, as illustrated below for a sum-reduction over eight elements:</p>
<div class="dotgraph">
<iframe scrolling="no" frameborder="0" src="dot_parallel_reduction.svg" width="798" height="456"><p><b>This browser is not able to show SVG: try Firefox, Chrome, Safari, or Opera instead.</b></p></iframe></div>
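<p>Because the grouping is unspecified, <code>bop</code> should be associative and commutative; otherwise the result may differ from that of the sequential loop above. For floating-point types, different groupings can also round differently, so the result may differ slightly from the sequential one and between runs. In addition, the result and argument types of <code>bop</code> must be consistent with both the element type and the type of <code>result</code>, because <code>bop</code> may combine two elements, an element and a partial result, or two partial results.</p>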
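<p>The effect of an unspecified grouping can be checked sequentially. The sketch below, whose helper names <code>sequential_fold</code> and <code>tree_fold</code> are hypothetical, evaluates the loop above left to right and also in one pairwise (tree-shaped) grouping that a parallel reduction could legitimately use. With <code>std::plus</code> both agree; with the non-associative <code>std::minus</code> they do not.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// The sequential loop: result = bop(result, x) for each element, left to right.
template <typename T, typename BOP>
T sequential_fold(const std::vector<T>& xs, T init, BOP bop) {
  for (const T& x : xs) init = bop(init, x);
  return init;
}

// One alternative grouping: combine neighbours pairwise, level by level,
// then combine the initial value with the single remaining partial result.
template <typename T, typename BOP>
T tree_fold(std::vector<T> xs, T init, BOP bop) {
  if (xs.empty()) return init;
  while (xs.size() > 1) {
    std::vector<T> next;
    for (std::size_t i = 0; i + 1 < xs.size(); i += 2) {
      next.push_back(bop(xs[i], xs[i + 1]));
    }
    if (xs.size() % 2 == 1) next.push_back(xs.back());  // odd element moves up unchanged
    xs = std::move(next);
  }
  return bop(init, xs.front());
}
```

<p>For the eight elements 1 to 8 with an initial value of 0, <code>std::plus&lt;int&gt;()</code> gives 36 under both groupings, while <code>std::minus&lt;int&gt;()</code> gives -36 sequentially but 0 under the tree grouping.</p>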
<h2><a class="anchor" id="ParallelReductionCaptureIteratorsByReference"></a>
Capture Iterators by Reference</h2>
<p>You can pass iterators by reference using <a href="https://en.cppreference.com/w/cpp/utility/functional/ref">std::ref</a> to marshal parameter updates between dependent tasks. This is useful when the range is not known at task-graph construction time but is initialized by an upstream task.</p>
<div class="fragment"><div class="line"><span class="keywordtype">int</span> sum = 100;</div>
<div class="line">std::vector&lt;int&gt; vec;</div>
<div class="line">std::vector&lt;int&gt;::iterator first, last;</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> init = taskflow.emplace([&amp;]() {</div>
<div class="line">  vec   = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};</div>
<div class="line">  first = vec.begin();</div>
<div class="line">  last  = vec.end();</div>
<div class="line">});</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> task = taskflow.reduce(</div>
<div class="line">  std::ref(first), std::ref(last), sum,</div>
<div class="line">  [](<span class="keywordtype">int</span> l, <span class="keywordtype">int</span> r) { <span class="keywordflow">return</span> l + r; }</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line"><span class="comment">// wrong! first and last are captured by copy at construction time</span></div>
<div class="line"><span class="comment">// tf::Task task = taskflow.reduce(first, last, sum,</span></div>
<div class="line"><span class="comment">//   [](int l, int r) { return l + r; }</span></div>
<div class="line"><span class="comment">// );</span></div>
<div class="line"> </div>
<div class="line">init.<a class="code hl_function" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(task);</div>
<div class="line"> </div>
<div class="line">executor.run(taskflow).wait();</div>
<div class="line">assert(sum == 155);</div>
<div class="ttc" id="aclasstf_1_1Task_html_a8c78c453295a553c1c016e4062da8588"><div class="ttname"><a href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">tf::Task::precede</a></div><div class="ttdeci">Task &amp; precede(Ts &amp;&amp;... tasks)</div><div class="ttdoc">adds precedence links from this to other tasks</div><div class="ttdef"><b>Definition</b> task.hpp:1258</div></div>
</div><!-- fragment --><p>When <code>init</code> finishes, <code>first</code> and <code>last</code> point to the initialized data range of <code>vec</code>, and the reduction task performs parallel reduction over the 10 elements.</p>
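<p>The reason <code>std::ref</code> matters can be reproduced without Taskflow. In the sketch below, the hypothetical <code>make_deferred_sum</code> stands in for a task: like the reduction task, it stores copies of its arguments when it is created but only dereferences them when it runs. A plain iterator is stored as a snapshot, whereas a <code>std::reference_wrapper</code> is unwrapped at run time and therefore sees the values written by an upstream step. The unwrapping helper <code>unwrap_arg</code> is an assumption for illustration, not Taskflow's internal mechanism.</p>

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <type_traits>
#include <vector>

// Detects std::reference_wrapper<U>.
template <typename T> struct is_ref_wrapper : std::false_type {};
template <typename U> struct is_ref_wrapper<std::reference_wrapper<U>> : std::true_type {};

// Hypothetical helper: returns the referenced object for a reference_wrapper,
// or the stored value itself otherwise.
template <typename T>
decltype(auto) unwrap_arg(T& v) {
  if constexpr (is_ref_wrapper<std::remove_cv_t<T>>::value) {
    return v.get();
  } else {
    return (v);  // parenthesized so decltype(auto) yields T&
  }
}

// Hypothetical deferred task: copies its arguments now, reads through them later.
template <typename B, typename E, typename T>
auto make_deferred_sum(B first, E last, T& result) {
  return [first, last, &result]() mutable {
    auto b = unwrap_arg(first);  // current iterator if passed via std::ref, else the snapshot
    auto e = unwrap_arg(last);
    result = std::accumulate(b, e, result);
  };
}
```

<p>If <code>first</code> and <code>last</code> start out value-initialized and an upstream step later points them at a filled vector, the <code>std::ref</code> version sums the new range, while the by-value version still compares two empty snapshots and leaves its result unchanged.</p>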
<h1><a class="anchor" id="ParallelTransformReductionCreate"></a>
Create a Parallel-Transform-Reduction Task</h1>
<p>It is common to transform each element into a new type and then reduce over the transformed values. The task created by <a class="el" href="classtf_1_1FlowBuilder.html#a5283a732a77ea75446f8ed5d3377f02c" title="constructs an STL-styled parallel transform-reduce task">tf::Taskflow::transform_reduce(B first, E last, T&amp; result, BOP bop, UOP uop, P part)</a> applies the unary operator <code>uop</code> to each element and then performs parallel reduction over <code>result</code> and the transformed values using <code>bop</code>. It represents the parallel execution of the following loop:</p>
<div class="fragment"><div class="line"><span class="keywordflow">for</span>(<span class="keyword">auto</span> itr = first; itr != last; itr++) {</div>
<div class="line">  result = bop(result, uop(*itr));</div>
<div class="line">}</div>
</div><!-- fragment --><p>The example below transforms each digit character in a string to an integer and then sums them in parallel:</p>
<div class="fragment"><div class="line">std::string str = <span class="stringliteral">&quot;12345678&quot;</span>;</div>
<div class="line"><span class="keywordtype">int</span> sum = 0;</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> task = taskflow.transform_reduce(str.begin(), str.end(), sum,</div>
<div class="line">  [](<span class="keywordtype">int</span> a, <span class="keywordtype">int</span> b) { <span class="keywordflow">return</span> a + b; },   <span class="comment">// binary reduction operator</span></div>
<div class="line">  [](<span class="keywordtype">char</span> c) -&gt; <span class="keywordtype">int</span> { <span class="keywordflow">return</span> c - <span class="charliteral">&#39;0&#39;</span>; } <span class="comment">// unary transformation operator</span></div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line">executor.run(taskflow).wait();</div>
<div class="line">assert(sum == 36);  <span class="comment">// 1+2+3+4+5+6+7+8</span></div>
</div><!-- fragment --><p>The order in which <code>bop</code> is applied to the transformed elements is <em>unspecified</em>. Because transformed values are temporaries, <code>bop</code> may receive r-value arguments on both sides (e.g., <code>bop(uop(*itr1), uop(*itr2))</code>). When the result type <code>T</code> is expensive to copy, make it move-constructible so that these temporaries can be moved rather than copied.</p>
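<p>The move-friendly pattern can be sketched with the sequential C++17 <code>std::transform_reduce</code> from <code>&lt;numeric&gt;</code>, which takes its arguments in the same order (first, last, initial value, binary operator, unary operator) and likewise leaves the grouping of the binary operator unspecified. The result type <code>DigitHistogram</code> and the helpers around it are hypothetical: the unary operator turns each digit into a temporary one-entry histogram, and the binary operator takes both operands by value, so r-value arguments are moved in rather than copied. The input is assumed to contain only the characters <code>'0'</code> to <code>'9'</code>.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical result type whose storage is worth moving rather than copying.
struct DigitHistogram {
  std::vector<long> counts = std::vector<long>(10, 0);  // counts[d] = occurrences of digit d
};

// uop: one digit character -> a temporary histogram with a single count.
// Assumes c is in '0'..'9'.
DigitHistogram to_histogram(char c) {
  DigitHistogram h;
  h.counts[static_cast<std::size_t>(c - '0')] = 1;
  return h;
}

// bop: both parameters are taken by value, so temporaries such as
// bop(uop(x), uop(y)) are moved (or constructed directly) into them instead of
// being copied; `a` is then updated in place and returned (implicitly moved).
DigitHistogram merge_histograms(DigitHistogram a, DigitHistogram b) {
  for (std::size_t d = 0; d < a.counts.size(); ++d) {
    a.counts[d] += b.counts[d];
  }
  return a;
}

// Sequential reference computation with the same (first, last, init, bop, uop) shape.
DigitHistogram digit_histogram(const std::string& s) {
  return std::transform_reduce(s.begin(), s.end(), DigitHistogram{},
                               merge_histograms, to_histogram);
}
```

<p>For the string <code>"12345678"</code>, each of the digits 1 to 8 ends up with a count of one, so the sum of digit times count is 36, the same total as in the <code>transform_reduce</code> example above.</p>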
<h2><a class="anchor" id="ParallelTransformReductionCaptureIteratorsByReference"></a>
Capture Iterators by Reference</h2>
<p>As with <a class="el" href="classtf_1_1FlowBuilder.html#afb24798ebf46e253a40b01bffb1da6a7" title="constructs an STL-styled parallel-reduction task">tf::Taskflow::reduce</a>, iterators can be passed by reference using <a href="https://en.cppreference.com/w/cpp/utility/functional/ref">std::ref</a> so that an upstream task can set up the range before the parallel-transform-reduction runs.</p>
<div class="fragment"><div class="line">std::string str;</div>
<div class="line">std::string::iterator first, last;</div>
<div class="line"><span class="keywordtype">int</span> sum = 0;</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> init = taskflow.emplace([&amp;]() {</div>
<div class="line">  str   = <span class="stringliteral">&quot;12345678&quot;</span>;</div>
<div class="line">  first = str.begin();</div>
<div class="line">  last  = str.end();</div>
<div class="line">});</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> task = taskflow.transform_reduce(</div>
<div class="line">  std::ref(first), std::ref(last), sum,</div>
<div class="line">  [](<span class="keywordtype">int</span> a, <span class="keywordtype">int</span> b) { <span class="keywordflow">return</span> a + b; },</div>
<div class="line">  [](<span class="keywordtype">char</span> c) -&gt; <span class="keywordtype">int</span> { <span class="keywordflow">return</span> c - <span class="charliteral">&#39;0&#39;</span>; }</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line"><span class="comment">// wrong! first and last are captured by copy at construction time</span></div>
<div class="line"><span class="comment">// tf::Task task = taskflow.transform_reduce(first, last, sum, bop, uop);</span></div>
<div class="line"> </div>
<div class="line">init.<a class="code hl_function" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(task);</div>
<div class="line"> </div>
<div class="line">executor.run(taskflow).wait();</div>
<div class="line">assert(sum == 36);</div>
</div><!-- fragment --><h1><a class="anchor" id="ParallelReduceByIndexCreate"></a>
Create a Parallel-Reduce-by-Index Task</h1>
<p>Unlike <a class="el" href="classtf_1_1FlowBuilder.html#afb24798ebf46e253a40b01bffb1da6a7" title="constructs an STL-styled parallel-reduction task">tf::Taskflow::reduce</a>, whose binary operator combines one element at a time, <a class="el" href="classtf_1_1FlowBuilder.html#a3ea810696c4b29824d1aaef15342c825" title="constructs an index range-based parallel-reduction task">tf::Taskflow::reduce_by_index</a> passes each invocation of its local reducer a contiguous <em>subrange</em> of the index space. This lets you write the local reduction as an explicit loop over the subrange, enabling optimizations such as SIMD vectorization, custom accumulator types, or data initialization interleaved with the reduction. <a class="el" href="classtf_1_1FlowBuilder.html#a3ea810696c4b29824d1aaef15342c825" title="constructs an index range-based parallel-reduction task">tf::Taskflow::reduce_by_index</a> represents the parallel execution of the following two-phase loop:</p>
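<div class="fragment"><div class="line"><span class="comment">// phase 1: each worker reduces the subranges assigned to it, carrying</span></div>
<div class="line"><span class="comment">// its running total from one subrange to the next</span></div>
<div class="line">std::optional&lt;T&gt; running_total;                  <span class="comment">// empty before the worker&#39;s first subrange</span></div>
<div class="line"><span class="keywordflow">for</span>(<span class="keyword">auto</span> subrange : subranges_of_this_worker) {</div>
<div class="line">  running_total = lop(subrange, running_total);   <span class="comment">// receives std::nullopt on the first call</span></div>
<div class="line">}</div>
<div class="line"> </div>
<div class="line"><span class="comment">// phase 2: combine each worker&#39;s final running total (partial1, partial2, ...)</span></div>
<div class="line"><span class="comment">// and the initial value of result into the final result</span></div>
<div class="line">result = gop(result, partial1);</div>
<div class="line">result = gop(result, partial2);</div>
<div class="line"><span class="comment">// ...</span></div>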
</div><!-- fragment --><p>The local operator <code>lop</code> is invoked once per subrange assigned to a worker. Its second argument is a <code>std::optional&lt;T&gt;</code> carrying the running total accumulated by that worker so far:</p>
<ul>
<li><code>std::nullopt</code> on the first subrange processed by a worker; the local reducer should start its accumulator from scratch (e.g., from zero for a sum).</li>
<li>A value on subsequent subranges; the local reducer should continue accumulating from the provided running total.</li>
</ul>
<p>The global operator <code>gop</code> combines the per-worker partial results and the initial value of <code>result</code> into the final answer.</p>
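<p>The two phases can be modeled with plain <code>std::thread</code> to show how the <code>std::optional</code> running total flows through one worker's subranges. This is a sketch under explicit assumptions, not Taskflow's scheduler: <code>Subrange</code> is a hypothetical stand-in for <code>tf::IndexRange&lt;size_t&gt;</code> with unit step, chunks of a fixed size are handed out round-robin, and <code>reduce_by_index_sketch</code> is a hypothetical name.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical stand-in for tf::IndexRange<size_t>: the half-open range [begin, end), step 1.
struct Subrange {
  std::size_t begin;
  std::size_t end;
};

// Hypothetical model of the two-phase loop over the index space [0, n).
template <typename T, typename L, typename G>
void reduce_by_index_sketch(std::size_t n, std::size_t chunk, std::size_t num_workers,
                            T& result, L lop, G gop) {
  chunk = std::max<std::size_t>(1, chunk);
  num_workers = std::max<std::size_t>(1, num_workers);
  std::vector<std::optional<T>> partials(num_workers);  // one slot per worker

  // Phase 1: worker w takes chunks w, w + num_workers, w + 2 * num_workers, ...
  std::vector<std::thread> workers;
  for (std::size_t w = 0; w < num_workers; ++w) {
    workers.emplace_back([&, w] {
      std::optional<T> running_total;                   // empty before the first subrange
      for (std::size_t b = w * chunk; b < n; b += num_workers * chunk) {
        running_total = lop(Subrange{b, std::min(b + chunk, n)}, running_total);
      }
      partials[w] = running_total;                      // stays empty if w got no chunk
    });
  }
  for (auto& t : workers) t.join();

  // Phase 2: fold every worker's partial result into `result`.
  for (const auto& p : partials) {
    if (p) result = gop(result, *p);
  }
}
```

<p>With 100000 ones, a chunk size of 1024, four workers, and an initial value of 1.0, the sketch produces 100001.0, the same answer as the example below.</p>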
<p>The example below performs a sum-reduction over a large array, initializing each element inside the local reducer:</p>
<div class="fragment"><div class="line">std::vector&lt;double&gt; data(100000);</div>
<div class="line"><span class="keywordtype">double</span> res = 1.0;</div>
<div class="line"> </div>
<div class="line">taskflow.reduce_by_index(</div>
<div class="line">  <a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange&lt;size_t&gt;</a>(0, data.size(), 1),</div>
<div class="line">  res,</div>
<div class="line">  <span class="comment">// local reducer: called once per subrange per worker</span></div>
<div class="line">  [&amp;](<a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange&lt;size_t&gt;</a> subrange, std::optional&lt;double&gt; running_total) {</div>
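<div class="line">    <span class="keywordtype">double</span> partial = running_total ? *running_total : 0.0;</div>
<div class="line">    <span class="keywordflow">for</span>(<span class="keywordtype">size_t</span> i = subrange.<a class="code hl_function" href="classtf_1_1IndexRanges.html#ae37261f0d2f326449c469233561a3da6">begin</a>(); i &lt; subrange.<a class="code hl_function" href="classtf_1_1IndexRanges.html#a253e15e199f974ee26b2e33a5e2b3cf1">end</a>(); i += subrange.<a class="code hl_function" href="classtf_1_1IndexRanges.html#afcb30e2b9567ad685e702201d2265880">step_size</a>()) {</div>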
<div class="line">      data[i] = 1.0;</div>
<div class="line">      partial += data[i];</div>
<div class="line">    }</div>
<div class="line">    <span class="keywordflow">return</span> partial;</div>
<div class="line">  },</div>
<div class="line">  <span class="comment">// global reducer: combines partial results into res</span></div>
<div class="line">  std::plus&lt;double&gt;()</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line">executor.run(taskflow).wait();</div>
<div class="line">assert(res == 100001.0);  <span class="comment">// 1.0 (initial) + 100000 * 1.0</span></div>
<div class="ttc" id="anamespacetf_html_a6c928ec9248757ba8276e316ef26846b"><div class="ttname"><a href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange</a></div><div class="ttdeci">IndexRanges&lt; T, 1 &gt; IndexRange</div><div class="ttdoc">alias for the common 1D case of tf::IndexRanges</div><div class="ttdef"><b>Definition</b> iterator.hpp:971</div></div>
</div><!-- fragment --><p>The global reducer combines all partial results with the initial value of <code>res</code> (here <code>1.0</code>), so the final answer is <code>1.0</code> + <code>100000.0</code> = <code>100001.0</code>.</p>
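<p>Because the local reducer is an ordinary loop, it can also use a custom accumulator type, one of the uses listed at the start of this section. The sketch below assumes a hypothetical compensated-summation (Kahan) accumulator <code>KahanSum</code> as <code>T</code>. The helpers <code>kahan_local</code> and <code>kahan_merge</code> play the roles of the local and global reducers, but take plain <code>begin</code>/<code>end</code> indices instead of <code>tf::IndexRange</code> so that the block compiles without Taskflow.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical accumulator: Kahan (compensated) summation. `comp` holds the
// negated low-order bits lost so far; the true total is approximately sum - comp.
struct KahanSum {
  double sum = 0.0;
  double comp = 0.0;

  void add(double x) {
    const double y = x - comp;  // correct the input by the previous error
    const double t = sum + y;   // low-order bits of y may be lost here
    comp = (t - sum) - y;       // recover what was lost (negated)
    sum = t;
  }
};

// Local-reducer role: resume from the running total if one is provided.
KahanSum kahan_local(const std::vector<double>& data, std::size_t begin, std::size_t end,
                     std::optional<KahanSum> running_total) {
  KahanSum acc = running_total ? *running_total : KahanSum{};
  for (std::size_t i = begin; i < end; ++i) acc.add(data[i]);
  return acc;
}

// Global-reducer role: fold partial b (worth b.sum - b.comp) into a.
KahanSum kahan_merge(KahanSum a, const KahanSum& b) {
  a.add(b.sum);
  a.add(-b.comp);
  return a;
}
```

<p>With <code>1.0</code> followed by ten values of <code>1e-16</code>, a plain left-to-right <code>double</code> sum stays at exactly <code>1.0</code>, because each tiny addend is below half an ulp of 1.0. The compensated accumulator, whether chained across subranges or merged by the global reducer, retains most of the missing <code>1e-15</code>.</p>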
<h2><a class="anchor" id="ParallelReduceByIndexCaptureByReference"></a>
Capture IndexRange by Reference</h2>
<p>You can pass the index range by reference using <a href="https://en.cppreference.com/w/cpp/utility/functional/ref">std::ref</a> so that an upstream task can set the bounds before the parallel-reduce-by-index runs.</p>
<div class="fragment"><div class="line">std::vector&lt;double&gt; data;</div>
<div class="line"><span class="keywordtype">double</span> res = 0.0;</div>
<div class="line"> </div>
<div class="line"><a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange&lt;size_t&gt;</a> range(0, 0, 1);  <span class="comment">// placeholder — filled by init</span></div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> init = taskflow.emplace([&amp;]() {</div>
<div class="line">  data.assign(100000, 1.0);</div>
<div class="line">  range.begin(0).end(data.size()).step_size(1);</div>
<div class="line">});</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> task = taskflow.reduce_by_index(</div>
<div class="line">  std::ref(range),</div>
<div class="line">  res,</div>
<div class="line">  [&amp;](<a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange&lt;size_t&gt;</a> subrange, std::optional&lt;double&gt; running_total) {</div>
<div class="line">    <span class="keywordtype">double</span> partial = running_total ? *running_total : 0.0;</div>
<div class="line">    <span class="keywordflow">for</span>(<span class="keywordtype">size_t</span> i = subrange.<a class="code hl_function" href="classtf_1_1IndexRanges.html#ae37261f0d2f326449c469233561a3da6">begin</a>(); i &lt; subrange.<a class="code hl_function" href="classtf_1_1IndexRanges.html#a253e15e199f974ee26b2e33a5e2b3cf1">end</a>(); i += subrange.<a class="code hl_function" href="classtf_1_1IndexRanges.html#afcb30e2b9567ad685e702201d2265880">step_size</a>()) {</div>
<div class="line">      partial += data[i];</div>
<div class="line">    }</div>
<div class="line">    <span class="keywordflow">return</span> partial;</div>
<div class="line">  },</div>
<div class="line">  std::plus&lt;double&gt;()</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line"><span class="comment">// wrong! range is captured by copy at construction time</span></div>
<div class="line"><span class="comment">// tf::Task task = taskflow.reduce_by_index(range, res, lop, gop);</span></div>
<div class="line"> </div>
<div class="line">init.<a class="code hl_function" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(task);</div>
<div class="line"> </div>
<div class="line">executor.run(taskflow).wait();</div>
<div class="line">assert(res == 100000.0);</div>
<div class="ttc" id="aclasstf_1_1IndexRanges_html_a253e15e199f974ee26b2e33a5e2b3cf1"><div class="ttname"><a href="classtf_1_1IndexRanges.html#a253e15e199f974ee26b2e33a5e2b3cf1">tf::IndexRanges::end</a></div><div class="ttdeci">T end() const</div><div class="ttdoc">queries the ending index of the range (only available when N == 1)</div><div class="ttdef"><b>Definition</b> iterator.hpp:358</div></div>
<div class="ttc" id="aclasstf_1_1IndexRanges_html_ae37261f0d2f326449c469233561a3da6"><div class="ttname"><a href="classtf_1_1IndexRanges.html#ae37261f0d2f326449c469233561a3da6">tf::IndexRanges::begin</a></div><div class="ttdeci">T begin() const</div><div class="ttdoc">queries the starting index of the range (only available when N == 1)</div><div class="ttdef"><b>Definition</b> iterator.hpp:346</div></div>
<div class="ttc" id="aclasstf_1_1IndexRanges_html_afcb30e2b9567ad685e702201d2265880"><div class="ttname"><a href="classtf_1_1IndexRanges.html#afcb30e2b9567ad685e702201d2265880">tf::IndexRanges::step_size</a></div><div class="ttdeci">T step_size() const</div><div class="ttdoc">queries the step size of the range (only available when N == 1)</div><div class="ttdef"><b>Definition</b> iterator.hpp:370</div></div>
</div><!-- fragment --><h1><a class="anchor" id="ParallelReductionConfigureAPartitioner"></a>
Configure a Partitioner</h1>
<p>A partitioner controls how the iteration space is divided among workers. Taskflow provides four partitioners, each suited to different workload characteristics:</p>
<ul>
<li><a class="el" href="classtf_1_1StaticPartitioner.html" title="class to construct a static partitioner for scheduling parallel algorithms">tf::StaticPartitioner</a> divides the range into equal-sized chunks ahead of execution and assigns them to workers in order. It has the lowest scheduling overhead and typically performs best when every element costs roughly the same amount of work to reduce.</li>
<li><a class="el" href="classtf_1_1DynamicPartitioner.html" title="class to create a dynamic partitioner for scheduling parallel algorithms">tf::DynamicPartitioner</a> distributes fixed-sized chunks to workers on demand as they become available. It adapts well to workloads where reduction cost varies per element, at the expense of slightly higher coordination overhead.</li>
<li><a class="el" href="classtf_1_1GuidedPartitioner.html" title="class to create a guided partitioner for scheduling parallel algorithms">tf::GuidedPartitioner</a> distributes chunks whose size decreases adaptively as work is consumed (large chunks early to reduce overhead, smaller chunks late to balance the tail). This is the default partitioner and delivers stable, near-optimal performance across a wide range of workloads.</li>
<li><a class="el" href="classtf_1_1RandomPartitioner.html" title="class to construct a random partitioner for scheduling parallel algorithms">tf::RandomPartitioner</a> distributes chunks of randomly sampled sizes, which can help avoid systematic load imbalances caused by data-dependent cost patterns.</li>
</ul>
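<p>The difference between equal-sized and shrinking chunks can be made concrete by computing the chunk sizes each scheme would hand out. The sketch below is illustrative only: <code>static_chunks</code> and <code>guided_chunks</code> are hypothetical helpers, and the guided rule used here (half of the remaining work divided by the number of workers, never below a minimum chunk size) is an assumption in the spirit of guided self-scheduling, not Taskflow's exact formula.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Static scheme: one contiguous chunk per worker, sizes differ by at most one.
std::vector<std::size_t> static_chunks(std::size_t n, std::size_t workers) {
  workers = std::max<std::size_t>(1, workers);
  std::vector<std::size_t> sizes;
  for (std::size_t w = 0; w < workers; ++w) {
    const std::size_t s = n / workers + (w < n % workers ? 1 : 0);
    if (s > 0) sizes.push_back(s);  // skip empty chunks when n < workers
  }
  return sizes;
}

// Guided scheme (assumed rule): each chunk is about half of the remaining work
// divided among the workers, but never smaller than min_chunk.
std::vector<std::size_t> guided_chunks(std::size_t n, std::size_t workers, std::size_t min_chunk) {
  workers = std::max<std::size_t>(1, workers);
  min_chunk = std::max<std::size_t>(1, min_chunk);
  std::vector<std::size_t> sizes;
  std::size_t remaining = n;
  while (remaining > 0) {
    std::size_t s = std::max(min_chunk, remaining / (2 * workers));
    s = std::min(s, remaining);     // the last chunk takes whatever is left
    sizes.push_back(s);
    remaining -= s;
  }
  return sizes;
}
```

<p>For 10 items on 4 workers, the static scheme yields chunks of 3, 3, 2, and 2. For 1000 items on 4 workers with a minimum chunk of 1, the guided scheme starts with a chunk of 125 (1000 / 8) and shrinks from there, ending with chunks of size 1.</p>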
<p>The following example creates two parallel-reduction tasks using different partitioners:</p>
<div class="fragment"><div class="line"><span class="keywordtype">int</span> sum1 = 100, sum2 = 100;</div>
<div class="line">std::vector&lt;int&gt; vec = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};</div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1StaticPartitioner.html">tf::StaticPartitioner</a> static_partitioner(0);  <span class="comment">// chunk size auto-determined</span></div>
<div class="line"><a class="code hl_class" href="classtf_1_1GuidedPartitioner.html">tf::GuidedPartitioner</a>  guided_partitioner(0); <span class="comment">// minimum chunk size auto-determined</span></div>
<div class="line"> </div>
<div class="line"><span class="comment">// parallel-reduction with static partitioner</span></div>
<div class="line">taskflow.reduce(vec.begin(), vec.end(), sum1,</div>
<div class="line">  [](<span class="keywordtype">int</span> l, <span class="keywordtype">int</span> r) { <span class="keywordflow">return</span> l + r; },</div>
<div class="line">  static_partitioner</div>
<div class="line">);</div>
<div class="line"> </div>
<div class="line"><span class="comment">// parallel-reduction with guided partitioner</span></div>
<div class="line">taskflow.reduce(vec.begin(), vec.end(), sum2,</div>
<div class="line">  [](<span class="keywordtype">int</span> l, <span class="keywordtype">int</span> r) { <span class="keywordflow">return</span> l + r; },</div>
<div class="line">  guided_partitioner</div>
<div class="line">);</div>
<div class="ttc" id="aclasstf_1_1GuidedPartitioner_html"><div class="ttname"><a href="classtf_1_1GuidedPartitioner.html">tf::GuidedPartitioner</a></div><div class="ttdoc">class to create a guided partitioner for scheduling parallel algorithms</div><div class="ttdef"><b>Definition</b> partitioner.hpp:417</div></div>
<div class="ttc" id="aclasstf_1_1StaticPartitioner_html"><div class="ttname"><a href="classtf_1_1StaticPartitioner.html">tf::StaticPartitioner</a></div><div class="ttdoc">class to construct a static partitioner for scheduling parallel algorithms</div><div class="ttdef"><b>Definition</b> partitioner.hpp:262</div></div>
</div><!-- fragment --><p>As a rule of thumb, prefer <a class="el" href="classtf_1_1StaticPartitioner.html" title="class to construct a static partitioner for scheduling parallel algorithms">tf::StaticPartitioner</a> when every element costs the same to reduce (e.g., summation over a plain array) and <a class="el" href="classtf_1_1GuidedPartitioner.html" title="class to create a guided partitioner for scheduling parallel algorithms">tf::GuidedPartitioner</a> for irregular workloads (e.g., reductions whose cost depends on the element value). <a class="el" href="classtf_1_1DynamicPartitioner.html" title="class to create a dynamic partitioner for scheduling parallel algorithms">tf::DynamicPartitioner</a> is a good choice when chunks must be kept small and strictly equal in size.</p>
<dl class="section note"><dt>Note</dt><dd>By default, parallel-reduction tasks use <a class="el" href="namespacetf.html#ace2c5adcd5039483eebb6dbdbb6f33e3" title="default partitioner set to tf::GuidedPartitioner">tf::DefaultPartitioner</a> (currently <a class="el" href="classtf_1_1GuidedPartitioner.html" title="class to create a guided partitioner for scheduling parallel algorithms">tf::GuidedPartitioner</a>) if no partitioner is specified. </dd></dl>
</div></div><!-- contents -->
</div><!-- PageDoc -->
</div><!-- doc-content -->
<!-- HTML footer for doxygen 1.13.1-->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
  <ul>
    <li class="navelem"><a class="el" href="Algorithms.html">Taskflow Algorithms</a></li>
    <li class="footer">
      Maintained by <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>
      &nbsp;&mdash;&nbsp;
      Generated by <a href="https://www.doxygen.org/index.html"><img class="footer" src="doxygen.svg" width="104" height="31" alt="doxygen"/></a> 1.13.1
    </li>
  </ul>
</div>
</body>
</html>