-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Expand file tree
/
Copy pathProjectMotivation.html
More file actions
220 lines (218 loc) · 17.3 KB
/
Copy pathProjectMotivation.html
File metadata and controls
220 lines (218 loc) · 17.3 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
<!-- HTML header for doxygen 1.13.1-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=11"/>
<meta name="generator" content="Doxygen 1.13.1"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>Taskflow: A General-purpose Task-parallel Programming System: Project Motivation</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<script type="text/javascript" src="clipboard.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="cookie.js"></script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="custom.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr id="projectrow">
<td id="projectlogo"><img alt="Logo" src="taskflow_logo.png"/></td>
<td id="projectalign">
<div id="projectname"><a href="https://github.com/taskflow/taskflow" style="color:inherit; text-decoration:none;">Taskflow: A General-purpose Task-parallel Programming System</a>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.13.1 -->
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
var searchBox = new SearchBox("searchBox", "search/",'.html');
/* @license-end */
</script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function() { codefold.init(0); });
/* @license-end */
</script>
<script type="text/javascript" src="menudata.js"></script>
<script type="text/javascript" src="menu.js"></script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function() {
initMenu('',true,false,'search.php','Search',true);
$(function() { init_search(); });
});
/* @license-end */
</script>
<div id="main-nav"></div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function(){initNavTree('ProjectMotivation.html',''); initResizable(true); });
/* @license-end */
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<div id="MSearchResults">
<div class="SRPage">
<div id="SRIndex">
<div id="SRResults"></div>
<div class="SRStatus" id="Loading">Loading...</div>
<div class="SRStatus" id="Searching">Searching...</div>
<div class="SRStatus" id="NoMatches">No Matches</div>
</div>
</div>
</div>
</div>
<div><div class="header">
<div class="headertitle"><div class="title">Project Motivation</div></div>
</div><!--header-->
<div class="contents">
<div class="toc"><h3>Table of Contents</h3>
<ul>
<li class="level1">
<a href="#TheParallelProgrammingChallenge">The Parallel Programming Challenge</a>
</li>
<li class="level1">
<a href="#HeterogeneousComputing">Heterogeneous Computing</a>
</li>
<li class="level1">
<a href="#LoopVsTaskParallelism">From Loops to Task Graphs</a>
</li>
<li class="level1">
<a href="#TaskBasedParallelism">Task-based Parallelism</a>
</li>
<li class="level1">
<a href="#TheProjectMantra">What Taskflow Does Differently</a>
</li>
</ul>
</div>
<div class="textblock"><p>Writing parallel programs is hard. Getting them right, efficient, and maintainable is even harder. Taskflow exists to close that gap - to let C++ developers express sophisticated parallel workloads using a clean task-parallel graph description language.</p>
<h1><a class="anchor" id="TheParallelProgrammingChallenge"></a>
The Parallel Programming Challenge</h1>
<p>For decades, software performance improved almost for free. Every processor generation brought faster clock speeds, more transistors, and deeper instruction-level parallelism. A program written in 1995 ran measurably faster on 2005 hardware without changing a single line of code.</p>
<p>That era ended. Clock frequencies plateaued around 2004 as chip designers hit the <em>power</em> <em>wall</em> — the point at which adding more transistors simply generated more heat than the chip could dissipate. The industry's answer was to put multiple independent cores on a single die, giving birth to the multicore era.</p>
<div class="image">
<img src="era_multicore.jpg" alt="" width="60%"/>
</div>
<p>The sweeping visualization above (courtesy of Prof. Mark Horowitz and his group) captures this transition clearly: performance scaling has shifted from single-core frequency improvements to parallelism across many cores. Today, every laptop, phone, server, and embedded device ships with multiple cores. The implication for software developers is unavoidable — <em>if</em> <em>your</em> <em>program</em> <em>does</em> <em>not</em> <em>run</em> <em>in</em> <em>parallel</em>, <em>it</em> <em>is</em> <em>leaving</em> <em>performance</em> <em>on</em> <em>the</em> <em>table</em>.</p>
<h1><a class="anchor" id="HeterogeneousComputing"></a>
Heterogeneous Computing</h1>
<p>Multicore CPUs were only the beginning. Driven by the explosive growth of artificial intelligence, scientific simulation, and data analytics, modern systems now combine CPUs, GPUs, TPUs, FPGAs, and domain-specific ASICs — each offering different throughputs, latencies, memory hierarchies, and programming models.</p>
<div class="image">
<img src="CPU-vs-TPU-vs-GPU.png" alt="" width="60%"/>
</div>
<p>Exploiting this hardware diversity is one of the most difficult problems in systems programming. A developer who wants to run a neural network inference must orchestrate data movement from host memory to device memory, launch GPU kernels with the right grid geometry, synchronise CPU and GPU execution at the right points, and do all of this without introducing unnecessary stalls or data races — all while keeping the CPU cores busy in the meantime. Miss any of these concerns and either correctness or performance suffers.</p>
<p>Parallel programming is hard. Parallel programming across heterogeneous devices is a different order of difficulty.</p>
<h1><a class="anchor" id="LoopVsTaskParallelism"></a>
From Loops to Task Graphs</h1>
<p>The first tool most developers reach for is <em>loop-level</em> <em>parallelism:</em> partition the iterations of a loop into blocks and run the blocks concurrently.</p>
<div class="image">
<img src="loop-level-parallelism.jpeg" alt="" width="50%"/>
</div>
<p>Loop parallelism is easy to reason about and well-supported by OpenMP, Intel TBB, and the C++ parallel algorithms library. For regular, independent workloads, such as matrix multiplication, image filters, and element-wise array operations, it delivers excellent results. But real applications are rarely that simple. Consider a program with four computational stages where stage A must complete before stages B and C can begin, and stage D cannot start until both B and C have finished, a diamond-shaped dependency as shown below:</p>
<div class="dotgraph">
<iframe scrolling="no" frameborder="0" src="dot_simple.svg" width="323" height="131"><p><b>This browser is not able to show SVG: try Firefox, Chrome, Safari, or Opera instead.</b></p></iframe></div>
<p>There is no natural way to express this with a parallel-for loop. The dependencies cut across loop boundaries. A developer forced to implement this with raw threads must manually coordinate joins, futures, or condition variables, producing code that obscures the actual parallel structure:</p>
<div class="fragment"><div class="line"><span class="comment">// -- Raw std::thread: a diamond A -> {B, C} -> D ---------------------------</span></div>
<div class="line"><span class="keywordtype">int</span> result_a, result_b, result_c, result_d;</div>
<div class="line"> </div>
<div class="line"><span class="comment">// Step 1: run A sequentially (nothing can overlap)</span></div>
<div class="line">std::thread tA([&]() { result_a = compute_a(); });</div>
<div class="line">tA.join(); <span class="comment">// force sync before B and C can start</span></div>
<div class="line"> </div>
<div class="line"><span class="comment">// Step 2: run B and C in parallel</span></div>
<div class="line">std::thread tB([&]() { result_b = compute_b(result_a); });</div>
<div class="line">std::thread tC([&]() { result_c = compute_c(result_a); });</div>
<div class="line">tB.join();</div>
<div class="line">tC.join(); <span class="comment">// force sync before D can start</span></div>
<div class="line"> </div>
<div class="line"><span class="comment">// Step 3: run D</span></div>
<div class="line">std::thread tD([&]() { result_d = compute_d(result_b, result_c); });</div>
<div class="line">tD.join();</div>
</div><!-- fragment --><p>The join-points expose the structure, but only because this example is tiny. In a real program with dozens of stages, mixed data and control dependencies, and dynamic subgraphs, the synchronisation logic quickly becomes the dominant concern — error-prone, hard to test, and painful to modify. The root problem is that the parallelism is real and the dependencies are real, but the code has no first-class way to represent either. Thread joins and mutexes are <em>mechanisms</em>, not <em>descriptions</em>.</p>
<h1><a class="anchor" id="TaskBasedParallelism"></a>
Task-based Parallelism</h1>
<p>The right abstraction is a <em>task</em> <em>dependency</em> <em>graph</em>. Each node is a unit of work; each edge is a dependency. The runtime takes the graph and schedules tasks on available cores, respecting all dependencies, without the developer needing to think about threads, joins, or synchronisation points.</p>
<div class="dotgraph">
<iframe scrolling="no" frameborder="0" src="dot_task-level-parallelism.svg" width="295" height="347"><p><b>This browser is not able to show SVG: try Firefox, Chrome, Safari, or Opera instead.</b></p></iframe></div>
<p>Task-based parallelism is not a new idea — it has been validated by the research community and embedded in standards such as OpenMP tasks, Intel TBB flow graphs, and C++ executors. What has been missing is a library that makes task graphs as easy to write as sequential code, works across both CPU and heterogeneous devices, and scales to real-world application complexity.</p>
<h1><a class="anchor" id="TheProjectMantra"></a>
What Taskflow Does Differently</h1>
<p>Taskflow gives developers a concise, expressive API for building and executing task dependency graphs in C++. The same diamond example that required manual thread management above becomes self-documenting with Taskflow:</p>
<div class="fragment"><div class="line"><span class="comment">// -- Taskflow: the same diamond, expressed as a graph ----------------------</span></div>
<div class="line"><a class="code hl_class" href="classtf_1_1Executor.html">tf::Executor</a> executor;</div>
<div class="line"><a class="code hl_class" href="classtf_1_1Taskflow.html">tf::Taskflow</a> taskflow;</div>
<div class="line"> </div>
<div class="line"><span class="keywordtype">int</span> result_a, result_b, result_c, result_d;</div>
<div class="line"> </div>
<div class="line"><span class="keyword">auto</span> A = taskflow.<a class="code hl_function" href="classtf_1_1FlowBuilder.html#a4d52a7fe2814b264846a2085e931652c">emplace</a>([&]() { result_a = compute_a(); });</div>
<div class="line"><span class="keyword">auto</span> B = taskflow.<a class="code hl_function" href="classtf_1_1FlowBuilder.html#a4d52a7fe2814b264846a2085e931652c">emplace</a>([&]() { result_b = compute_b(result_a); });</div>
<div class="line"><span class="keyword">auto</span> C = taskflow.<a class="code hl_function" href="classtf_1_1FlowBuilder.html#a4d52a7fe2814b264846a2085e931652c">emplace</a>([&]() { result_c = compute_c(result_a); });</div>
<div class="line"><span class="keyword">auto</span> D = taskflow.<a class="code hl_function" href="classtf_1_1FlowBuilder.html#a4d52a7fe2814b264846a2085e931652c">emplace</a>([&]() { result_d = compute_d(result_b, result_c); });</div>
<div class="line"> </div>
<div class="line">A.<a class="code hl_function" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(B, C); <span class="comment">// B and C run after A</span></div>
<div class="line">D.<a class="code hl_function" href="classtf_1_1Task.html#a331b1b726555072e7c7d10941257f664">succeed</a>(B, C); <span class="comment">// D runs after both B and C</span></div>
<div class="line"> </div>
<div class="line">executor.<a class="code hl_function" href="classtf_1_1Executor.html#a519777f5783981d534e9e53b99712069">run</a>(taskflow).wait();</div>
<div class="ttc" id="aclasstf_1_1Executor_html"><div class="ttname"><a href="classtf_1_1Executor.html">tf::Executor</a></div><div class="ttdoc">class to create an executor</div><div class="ttdef"><b>Definition</b> executor.hpp:62</div></div>
<div class="ttc" id="aclasstf_1_1Executor_html_a519777f5783981d534e9e53b99712069"><div class="ttname"><a href="classtf_1_1Executor.html#a519777f5783981d534e9e53b99712069">tf::Executor::run</a></div><div class="ttdeci">tf::Future< void > run(Taskflow &taskflow)</div><div class="ttdoc">runs a taskflow once</div></div>
<div class="ttc" id="aclasstf_1_1FlowBuilder_html_a4d52a7fe2814b264846a2085e931652c"><div class="ttname"><a href="classtf_1_1FlowBuilder.html#a4d52a7fe2814b264846a2085e931652c">tf::FlowBuilder::emplace</a></div><div class="ttdeci">Task emplace(C &&callable)</div><div class="ttdoc">creates a static task</div><div class="ttdef"><b>Definition</b> flow_builder.hpp:1571</div></div>
<div class="ttc" id="aclasstf_1_1Task_html_a331b1b726555072e7c7d10941257f664"><div class="ttname"><a href="classtf_1_1Task.html#a331b1b726555072e7c7d10941257f664">tf::Task::succeed</a></div><div class="ttdeci">Task & succeed(Ts &&... tasks)</div><div class="ttdoc">adds precedence links from other tasks to this</div><div class="ttdef"><b>Definition</b> task.hpp:1266</div></div>
<div class="ttc" id="aclasstf_1_1Task_html_a8c78c453295a553c1c016e4062da8588"><div class="ttname"><a href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">tf::Task::precede</a></div><div class="ttdeci">Task & precede(Ts &&... tasks)</div><div class="ttdoc">adds precedence links from this to other tasks</div><div class="ttdef"><b>Definition</b> task.hpp:1258</div></div>
<div class="ttc" id="aclasstf_1_1Taskflow_html"><div class="ttname"><a href="classtf_1_1Taskflow.html">tf::Taskflow</a></div><div class="ttdoc">class to create a taskflow object</div><div class="ttdef"><b>Definition</b> taskflow.hpp:64</div></div>
</div><!-- fragment --><p>The dependency structure is explicit and readable. There are no join calls, no mutexes, no condition variables. The executor handles scheduling across all available cores automatically. Adding a new stage, changing a dependency, or reusing the graph requires changing only the parts of the code that actually changed.</p>
<p>This is the design philosophy behind Taskflow:</p>
<ul>
<li><b>Expressiveness</b> — complex parallel patterns should be easy to write </li>
<li><b>Readability</b> — code should communicate its parallel structure, not hide it </li>
<li><b>Transparency</b> — the runtime should be predictable and observable</li>
</ul>
<p>In a nutshell, code written with Taskflow explains itself. Developers focus on algorithms and decomposition strategies, not on low-level synchronisation mechanics. Taskflow scales from simple parallel loops to large heterogeneous pipelines, and does so with a programming model that stays consistent across both. </p>
</div></div><!-- contents -->
</div><!-- PageDoc -->
</div><!-- doc-content -->
<!-- HTML footer for doxygen 1.13.1-->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="navelem"><a class="el" href="Cookbook.html">Cookbook</a></li>
<li class="footer">
Maintained by <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>
—
Generated by <a href="https://www.doxygen.org/index.html"><img class="footer" src="doxygen.svg" width="104" height="31" alt="doxygen"/></a> 1.13.1
</li>
</ul>
</div>