-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Expand file tree
/
Copy pathExamplesDivideAndConquer.html
More file actions
286 lines (284 loc) · 21.5 KB
/
Copy pathExamplesDivideAndConquer.html
File metadata and controls
286 lines (284 loc) · 21.5 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
<!-- HTML header for doxygen 1.13.1-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=11"/>
<meta name="generator" content="Doxygen 1.13.1"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>Taskflow: A General-purpose Task-parallel Programming System: Divide-and-Conquer Parallelism</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<script type="text/javascript" src="clipboard.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="cookie.js"></script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="custom.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr id="projectrow">
<td id="projectlogo"><img alt="Logo" src="taskflow_logo.png"/></td>
<td id="projectalign">
<div id="projectname"><a href="https://github.com/taskflow/taskflow" style="color:inherit; text-decoration:none;">Taskflow: A General-purpose Task-parallel Programming System</a>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.13.1 -->
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
var searchBox = new SearchBox("searchBox", "search/",'.html');
/* @license-end */
</script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function() { codefold.init(0); });
/* @license-end */
</script>
<script type="text/javascript" src="menudata.js"></script>
<script type="text/javascript" src="menu.js"></script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function() {
initMenu('',true,false,'search.php','Search',true);
$(function() { init_search(); });
});
/* @license-end */
</script>
<div id="main-nav"></div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function(){initNavTree('ExamplesDivideAndConquer.html',''); initResizable(true); });
/* @license-end */
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<div id="MSearchResults">
<div class="SRPage">
<div id="SRIndex">
<div id="SRResults"></div>
<div class="SRStatus" id="Loading">Loading...</div>
<div class="SRStatus" id="Searching">Searching...</div>
<div class="SRStatus" id="NoMatches">No Matches</div>
</div>
</div>
</div>
</div>
<div><div class="header">
<div class="headertitle"><div class="title">Divide-and-Conquer Parallelism</div></div>
</div><!--header-->
<div class="contents">
<div class="toc"><h3>Table of Contents</h3>
<ul>
<li class="level1">
<a href="#DivideAndConquerWhatIs">What is Divide-and-Conquer?</a>
</li>
<li class="level1">
<a href="#DivideAndConquerMergeSort">Merge Sort Example</a>
</li>
<li class="level1">
<a href="#DivideAndConquerParallelMergeSort">Parallel Merge Sort with TaskGroup</a>
</li>
<li class="level1">
<a href="#DivideAndConquerGeneralTemplate">General Template</a>
</li>
<li class="level1">
<a href="#DivideAndConquerBenchmarking">Benchmarking</a>
</li>
</ul>
</div>
<div class="textblock"><p>We study how to express <em>divide-and-conquer</em> parallelism using Taskflow, using parallel merge sort as a concrete example. Divide-and-conquer is one of the most important parallel programming patterns and appears across sorting, searching, tree traversal, numerical computation, and graph algorithms.</p>
<h1><a class="anchor" id="DivideAndConquerWhatIs"></a>
What is Divide-and-Conquer?</h1>
<p>A divide-and-conquer algorithm solves a problem by recursively splitting it into two or more independent subproblems, solving each subproblem separately, and combining the results. The recursive structure is:</p>
<div class="fragment"><div class="line">Result solve(Problem P) {</div>
<div class="line"> <span class="keywordflow">if</span>(P is small enough) <span class="keywordflow">return</span> solve_directly(P); <span class="comment">// base case</span></div>
<div class="line"> <span class="keyword">auto</span> [P1, P2] = divide(P); <span class="comment">// split</span></div>
<div class="line"> Result R1 = solve(P1); <span class="comment">// solve left</span></div>
<div class="line"> Result R2 = solve(P2); <span class="comment">// solve right</span></div>
<div class="line"> <span class="keywordflow">return</span> combine(R1, R2); <span class="comment">// merge</span></div>
<div class="line">}</div>
</div><!-- fragment --><p>The key observation is that the two recursive calls, <code>solve(P1)</code> and <code>solve(P2)</code>, are <em>independent</em> of each other and can therefore run in parallel. The combine step must wait for both, but no other dependency exists.</p>
<p>Taskflow's <a class="el" href="classtf_1_1TaskGroup.html" title="class to create a task group from a task">tf::TaskGroup</a> provides a natural way to express this: spawn one subproblem asynchronously, solve the other inline (tail optimisation), then call <code>corun</code> to wait cooperatively for the spawned task before combining. Note that <code>corun</code> does not block the calling worker but the program control flow. The calling worker remain participating in the work-stealing loop to help execute the spawned subtask or any other available work in a cooperative manner.</p>
<h1><a class="anchor" id="DivideAndConquerMergeSort"></a>
Merge Sort Example</h1>
<p>Merge sort is the canonical divide-and-conquer algorithm:</p>
<ol type="1">
<li><b>Divide:</b> split the array at the midpoint</li>
<li><b>Conquer:</b> recursively sort the left and right halves in parallel</li>
<li><b>Combine:</b> merge the two sorted halves</li>
</ol>
<div class="dotgraph">
<iframe scrolling="no" frameborder="0" src="dot_parallel_sort.svg" width="688" height="500"><p><b>This browser is not able to show SVG: try Firefox, Chrome, Safari, or Opera instead.</b></p></iframe></div>
<p>The sequential implementation is:</p>
<div class="fragment"><div class="line"><span class="keywordtype">void</span> merge_sort(std::vector<int>& data, <span class="keywordtype">int</span> left, <span class="keywordtype">int</span> right) {</div>
<div class="line"> <span class="keywordflow">if</span>(right - left <= 1) <span class="keywordflow">return</span>;</div>
<div class="line"> <span class="keywordtype">int</span> mid = left + (right - left) / 2;</div>
<div class="line"> merge_sort(data, left, mid); <span class="comment">// left half</span></div>
<div class="line"> merge_sort(data, mid, right); <span class="comment">// right half</span></div>
<div class="line"> std::inplace_merge(</div>
<div class="line"> data.begin() + left,</div>
<div class="line"> data.begin() + mid,</div>
<div class="line"> data.begin() + right</div>
<div class="line"> );</div>
<div class="line">}</div>
</div><!-- fragment --><h1><a class="anchor" id="DivideAndConquerParallelMergeSort"></a>
Parallel Merge Sort with TaskGroup</h1>
<p>We parallelise the two recursive calls using <a class="el" href="classtf_1_1TaskGroup.html" title="class to create a task group from a task">tf::TaskGroup</a>. A task group is created from the executor and provides the same <code>silent_async</code> and <code>corun</code> interface as <a class="el" href="classtf_1_1Runtime.html" title="class to create a runtime task">tf::Runtime</a>, but can be used from any execution context of a worker. There is no need for the caller to already be a runtime task.</p>
<p>We also apply a <em>cutoff:</em> when the subrange is smaller than a threshold, we fall back to <code>std::sort</code> rather than spawning more tasks. Without a cutoff, the recursion would create far more tasks than there are cores, and the scheduling overhead would dominate the actual computation:</p>
<div class="fragment"><div class="line"><span class="preprocessor">#include <taskflow/taskflow.hpp></span></div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Executor.html">tf::Executor</a> executor;</div>
<div class="line"> </div>
<div class="line"><span class="keywordtype">void</span> parallel_merge_sort(</div>
<div class="line"> std::vector<int>& data, <span class="keywordtype">int</span> left, <span class="keywordtype">int</span> right, <span class="keywordtype">int</span> cutoff = 1024</div>
<div class="line">) {</div>
<div class="line"> <span class="keywordflow">if</span>(right - left <= cutoff) {</div>
<div class="line"> <span class="comment">// base case: range is small enough — sort sequentially</span></div>
<div class="line"> std::sort(data.begin() + left, data.begin() + right);</div>
<div class="line"> <span class="keywordflow">return</span>;</div>
<div class="line"> }</div>
<div class="line"> </div>
<div class="line"> <span class="keywordtype">int</span> mid = left + (right - left) / 2;</div>
<div class="line"> </div>
<div class="line"> <span class="comment">// create a task group to manage the spawned subtask</span></div>
<div class="line"> <a class="code hl_class" href="classtf_1_1TaskGroup.html">tf::TaskGroup</a> tg = executor.<a class="code hl_function" href="classtf_1_1Executor.html#a8858a119b9ad697748293c4bb1408853">task_group</a>();</div>
<div class="line"> </div>
<div class="line"> <span class="comment">// spawn the left half asynchronously</span></div>
<div class="line"> tg.<a class="code hl_function" href="classtf_1_1TaskGroup.html#acf90acfcaf9468adc56bf647208a9e78">silent_async</a>([&, left, mid, cutoff]() {</div>
<div class="line"> parallel_merge_sort(data, left, mid, cutoff);</div>
<div class="line"> });</div>
<div class="line"> </div>
<div class="line"> <span class="comment">// sort the right half inline (tail optimisation — avoids one extra task)</span></div>
<div class="line"> parallel_merge_sort(data, mid, right, cutoff);</div>
<div class="line"> </div>
<div class="line"> <span class="comment">// cooperatively wait for the left half to finish</span></div>
<div class="line"> <span class="comment">// the calling worker stays active and helps execute other tasks while waiting</span></div>
<div class="line"> tg.corun();</div>
<div class="line"> </div>
<div class="line"> <span class="comment">// both halves are now sorted — merge them</span></div>
<div class="line"> std::inplace_merge(data.begin() + left, data.begin() + mid, data.begin() + right);</div>
<div class="line">}</div>
<div class="line"> </div>
<div class="line"><span class="keywordtype">int</span> main() {</div>
<div class="line"> </div>
<div class="line"> <span class="keyword">const</span> <span class="keywordtype">int</span> N = 1 << 20; <span class="comment">// 1M elements</span></div>
<div class="line"> std::vector<int> data(N);</div>
<div class="line"> std::generate(data.begin(), data.end(), std::rand);</div>
<div class="line"> </div>
<div class="line"> executor.<a class="code hl_function" href="classtf_1_1Executor.html#af960048056f7c6b5bc71f4f526f05df7">async</a>([&](){ parallel_merge_sort(data, 0, N); }).wait();</div>
<div class="line"> </div>
<div class="line"> assert(std::is_sorted(data.begin(), data.end()));</div>
<div class="line"> std::cout << <span class="stringliteral">"sorted "</span> << N << <span class="stringliteral">" elements\n"</span>;</div>
<div class="line"> </div>
<div class="line"> <span class="keywordflow">return</span> 0;</div>
<div class="line">}</div>
<div class="ttc" id="aclasstf_1_1Executor_html"><div class="ttname"><a href="classtf_1_1Executor.html">tf::Executor</a></div><div class="ttdoc">class to create an executor</div><div class="ttdef"><b>Definition</b> executor.hpp:62</div></div>
<div class="ttc" id="aclasstf_1_1Executor_html_a8858a119b9ad697748293c4bb1408853"><div class="ttname"><a href="classtf_1_1Executor.html#a8858a119b9ad697748293c4bb1408853">tf::Executor::task_group</a></div><div class="ttdeci">TaskGroup task_group()</div><div class="ttdoc">creates a task group that executes a collection of asynchronous tasks</div><div class="ttdef"><b>Definition</b> task_group.hpp:875</div></div>
<div class="ttc" id="aclasstf_1_1Executor_html_af960048056f7c6b5bc71f4f526f05df7"><div class="ttname"><a href="classtf_1_1Executor.html#af960048056f7c6b5bc71f4f526f05df7">tf::Executor::async</a></div><div class="ttdeci">auto async(P &&params, F &&func)</div><div class="ttdoc">creates a parameterized asynchronous task to run the given function</div></div>
<div class="ttc" id="aclasstf_1_1TaskGroup_html"><div class="ttname"><a href="classtf_1_1TaskGroup.html">tf::TaskGroup</a></div><div class="ttdoc">class to create a task group from a task</div><div class="ttdef"><b>Definition</b> task_group.hpp:61</div></div>
<div class="ttc" id="aclasstf_1_1TaskGroup_html_acf90acfcaf9468adc56bf647208a9e78"><div class="ttname"><a href="classtf_1_1TaskGroup.html#acf90acfcaf9468adc56bf647208a9e78">tf::TaskGroup::silent_async</a></div><div class="ttdeci">void silent_async(F &&f)</div><div class="ttdoc">runs the given function asynchronously without returning any future object</div><div class="ttdef"><b>Definition</b> task_group.hpp:756</div></div>
</div><!-- fragment --><p>Several design choices in this example apply broadly to parallel divide-and-conquer algorithms:</p>
<dl class="section user"><dt>Cutoff threshold</dt><dd></dd></dl>
<p>Recursively spawning tasks down to single elements would create O(N) tasks with trivial work each. The cutoff switches to <code>std::sort</code> for small subranges, keeping the task count proportional to the available parallelism rather than the input size. A good rule of thumb is to set the cutoff so that the sequential base case takes at least a few microseconds — typically a few thousand elements for integer sorting.</p>
<dl class="section user"><dt>Tail optimisation</dt><dd></dd></dl>
<p>Only the left half is spawned as an async task; the right half is sorted inline in the current execution context. This halves the number of tasks at every level of the recursion tree and eliminates one level of task-creation overhead per recursive call.</p>
<dl class="section user"><dt>Cooperative waiting</dt><dd></dd></dl>
<p>Calling <a class="el" href="classtf_1_1TaskGroup.html#a1f481dc466e3107a08346d1a124677bc" title="corun all tasks spawned by this task group with other workers">tf::TaskGroup::corun</a> does not block the calling thread from making progress but only the program control flow (i.e., the program execution will not proceed until <code>corun</code> returns). Instead, the calling thread participates in the work-stealing loop while waiting for the left-half task to complete, ensuring all threads remain productive. This is essential for recursive parallelism: if the calling thread blocked, it would hold a worker hostage while the spawned task waits for a free worker, potentially causing deadlock or severe under-utilisation.</p>
<dl class="section user"><dt>TaskGroup Lifetime</dt><dd></dd></dl>
<p>A <a class="el" href="classtf_1_1TaskGroup.html" title="class to create a task group from a task">tf::TaskGroup</a> can only be created by a worker of an executor due to the support for cooperative execution.</p>
<h1><a class="anchor" id="DivideAndConquerGeneralTemplate"></a>
General Template</h1>
<p>The same <a class="el" href="classtf_1_1TaskGroup.html" title="class to create a task group from a task">tf::TaskGroup</a> pattern applies to any divide-and-conquer algorithm. Replace the merge sort specifics with the divide/conquer/combine steps of your algorithm:</p>
<div class="fragment"><div class="line"><span class="keywordtype">void</span> solve(Problem& P, <span class="keywordtype">int</span> cutoff) {</div>
<div class="line"> <span class="keywordflow">if</span>(P.size() <= cutoff) {</div>
<div class="line"> solve_directly(P); <span class="comment">// sequential base case</span></div>
<div class="line"> <span class="keywordflow">return</span>;</div>
<div class="line"> }</div>
<div class="line"> </div>
<div class="line"> <span class="keyword">auto</span> [P1, P2] = P.divide();</div>
<div class="line"> </div>
<div class="line"> <a class="code hl_class" href="classtf_1_1TaskGroup.html">tf::TaskGroup</a> tg = executor.<a class="code hl_function" href="classtf_1_1Executor.html#a8858a119b9ad697748293c4bb1408853">task_group</a>();</div>
<div class="line"> </div>
<div class="line"> tg.<a class="code hl_function" href="classtf_1_1TaskGroup.html#acf90acfcaf9468adc56bf647208a9e78">silent_async</a>([&P1, cutoff]() {</div>
<div class="line"> solve(P1, cutoff); <span class="comment">// solve left half asynchronously</span></div>
<div class="line"> });</div>
<div class="line"> </div>
<div class="line"> solve(P2, cutoff); <span class="comment">// solve right half inline</span></div>
<div class="line"> </div>
<div class="line"> tg.<a class="code hl_function" href="classtf_1_1TaskGroup.html#a1f481dc466e3107a08346d1a124677bc">corun</a>(); <span class="comment">// wait cooperatively for left half</span></div>
<div class="line"> </div>
<div class="line"> P.combine(P1, P2); <span class="comment">// merge results</span></div>
<div class="line">}</div>
<div class="line"> </div>
<div class="line">executor.<a class="code hl_function" href="classtf_1_1Executor.html#af960048056f7c6b5bc71f4f526f05df7">async</a>([&](){ solve(); }).wait();</div>
<div class="ttc" id="aclasstf_1_1TaskGroup_html_a1f481dc466e3107a08346d1a124677bc"><div class="ttname"><a href="classtf_1_1TaskGroup.html#a1f481dc466e3107a08346d1a124677bc">tf::TaskGroup::corun</a></div><div class="ttdeci">void corun()</div><div class="ttdoc">corun all tasks spawned by this task group with other workers</div><div class="ttdef"><b>Definition</b> task_group.hpp:725</div></div>
</div><!-- fragment --><p>Examples of divide-and-conquer algorithms that fit this template directly:</p>
<ul>
<li><b>Quicksort:</b> partition around a pivot, recursively sort each partition</li>
<li><b>Binary</b> <b>search:</b> recurse into the half that contains the target</li>
<li><b>Strassen</b> <b>matrix</b> <b>multiplication:</b> recursively multiply 7 sub-matrices</li>
<li><b>Fast</b> <b>Fourier</b> <b>Transform:</b> recursively compute even/odd sub-transforms</li>
<li><b>Parallel</b> <b>reduction:</b> recursively reduce left and right halves, combine</li>
</ul>
<h1><a class="anchor" id="DivideAndConquerBenchmarking"></a>
Benchmarking</h1>
<p><a class="el" href="classtf_1_1Runtime.html" title="class to create a runtime task">Runtime</a> comparison for sorting 1M random integers on a 12-core machine:</p>
<div align="center"> <table class="markdownTable">
<tr class="markdownTableHead">
<th class="markdownTableHeadCenter">Algorithm </th><th class="markdownTableHeadCenter">Time </th></tr>
<tr class="markdownTableRowOdd">
<td class="markdownTableBodyCenter"><code>std::sort</code> (sequential) </td><td class="markdownTableBodyCenter">180 ms </td></tr>
<tr class="markdownTableRowEven">
<td class="markdownTableBodyCenter">Parallel merge sort (cutoff=1024) </td><td class="markdownTableBodyCenter">38 ms </td></tr>
<tr class="markdownTableRowOdd">
<td class="markdownTableBodyCenter">Speed-up </td><td class="markdownTableBodyCenter">~4.7× </td></tr>
</table>
</div><p>The speed-up is sub-linear because <code>std::inplace_merge</code> at the top levels of the recursion tree is sequential and dominates as the recursive parallelism collapses. Using a parallel merge step or switching to parallel quicksort (where the combine step is trivial) would push the speed-up closer to the core count. </p>
</div></div><!-- contents -->
</div><!-- PageDoc -->
</div><!-- doc-content -->
<!-- HTML footer for doxygen 1.13.1-->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="navelem"><a class="el" href="Examples.html">Learning from Examples</a></li>
<li class="footer">
Maintained by <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>
—
Generated by <a href="https://www.doxygen.org/index.html"><img class="footer" src="doxygen.svg" width="104" height="31" alt="doxygen"/></a> 1.13.1
</li>
</ul>
</div>