<!-- HTML header for doxygen 1.13.1-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=11"/>
<meta name="generator" content="Doxygen 1.13.1"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>Taskflow: A General-purpose Task-parallel Programming System: Compile Taskflow with CUDA</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<script type="text/javascript" src="clipboard.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="cookie.js"></script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="custom.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr id="projectrow">
<td id="projectlogo"><img alt="Logo" src="taskflow_logo.png"/></td>
<td id="projectalign">
<div id="projectname"><a href="https://github.com/taskflow/taskflow" style="color:inherit; text-decoration:none;">Taskflow: A General-purpose Task-parallel Programming System</a>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.13.1 -->
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
var searchBox = new SearchBox("searchBox", "search/",'.html');
/* @license-end */
</script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function() { codefold.init(0); });
/* @license-end */
</script>
<script type="text/javascript" src="menudata.js"></script>
<script type="text/javascript" src="menu.js"></script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function() {
initMenu('',true,false,'search.php','Search',true);
$(function() { init_search(); });
});
/* @license-end */
</script>
<div id="main-nav"></div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function(){initNavTree('CompileTaskflowWithCUDA.html',''); initResizable(true); });
/* @license-end */
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<div id="MSearchResults">
<div class="SRPage">
<div id="SRIndex">
<div id="SRResults"></div>
<div class="SRStatus" id="Loading">Loading...</div>
<div class="SRStatus" id="Searching">Searching...</div>
<div class="SRStatus" id="NoMatches">No Matches</div>
</div>
</div>
</div>
</div>
<div><div class="header">
<div class="headertitle"><div class="title">Compile Taskflow with CUDA</div></div>
</div><!--header-->
<div class="contents">
<div class="toc"><h3>Table of Contents</h3>
<ul>
<li class="level1">
<a href="#InstallCUDACompiler">Install CUDA Compiler</a>
</li>
<li class="level1">
<a href="#CompileTaskflowWithCUDADirectly">Compile Source Code Directly</a>
</li>
<li class="level1">
<a href="#CompileTaskflowWithCUDASeparately">Compile Source Code Separately</a>
<ul>
<li class="level2">
<a href="#CompileTaskflowWithCUDANaiveLinking">Link Objects Using nvcc</a>
</li>
<li class="level2">
<a href="#CompileTaskflowWithCUDADifferentLinkers">Link Objects Using Different Linkers</a>
</li>
</ul>
</li>
</ul>
</div>
<div class="textblock"><h1><a class="anchor" id="InstallCUDACompiler"></a>
Install CUDA Compiler</h1>
<p>To compile Taskflow with CUDA code, you need the <code>nvcc</code> compiler, which ships with the CUDA Toolkit. You can get it from the official <a href="https://developer.nvidia.com/cuda-downloads">CUDA Toolkit download page</a>.</p>
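<p>Before compiling, it can help to confirm that <code>nvcc</code> is visible on your <code>PATH</code>. The following is a minimal shell sketch; the helper name <code>check_tool</code> is made up for this example and is not part of Taskflow or the CUDA Toolkit.</p>

```shell
# Report whether a tool is available on PATH (hypothetical helper).
check_tool() {
  if command -v "$1" > /dev/null; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: not found"
    return 1
  fi
}

if check_tool nvcc; then
  nvcc --version   # prints the CUDA release this compiler belongs to
else
  echo "Install the CUDA Toolkit or add its bin directory to PATH."
fi
```

<p>If <code>nvcc</code> is installed but not found, the usual cause is that the toolkit's <code>bin</code> directory has not been added to <code>PATH</code>.</p>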
<h1><a class="anchor" id="CompileTaskflowWithCUDADirectly"></a>
Compile Source Code Directly</h1>
<p>Taskflow's GPU programming interface for CUDA is <code>tf::cudaGraph</code>. Consider the following <code>simple.cu</code> program that launches a single kernel to print a message:</p>
<div class="fragment"><div class="line"><span class="preprocessor">#include &lt;taskflow/taskflow.hpp&gt;</span></div>
<div class="line"><span class="preprocessor">#include &lt;taskflow/cudaflow.hpp&gt;</span> </div>
<div class="line"> </div>
<div class="line"><span class="keywordtype">int</span> main(<span class="keywordtype">int</span> argc, <span class="keyword">const</span> <span class="keywordtype">char</span>** argv) {</div>
<div class="line"> </div>
<div class="line"> <span class="comment">// create a CUDA graph with a single-threaded task</span></div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a713c427e4f9841a90dec67045a3babed">tf::cudaGraph</a> cg;</div>
<div class="line"> cg.single_task([] __device__ () { printf(<span class="stringliteral">"hello CUDA Graph!\n"</span>); });</div>
<div class="line"> </div>
<div class="line"> <span class="comment">// instantiate an executable CUDA graph and run it through a stream</span></div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#af19c9b301dc0b0fe2a51a960fa427e83">tf::cudaStream</a> stream;</div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a2be50e6880ead1d49a3fec2fc4bb893e">tf::cudaGraphExec</a> exec(cg);</div>
<div class="line"> </div>
<div class="line"> stream.<a class="code hl_function" href="classtf_1_1cudaStreamBase.html#a7dcdfb79385a57c4c59b7c9f21e8beb9">run</a>(exec).<a class="code hl_function" href="classtf_1_1cudaStreamBase.html#a1e5140505629afd4b3422399f8080cb0">synchronize</a>();</div>
<div class="line"> </div>
<div class="line"> <span class="keywordflow">return</span> 0;</div>
<div class="line">}</div>
<div class="ttc" id="aclasstf_1_1cudaStreamBase_html_a1e5140505629afd4b3422399f8080cb0"><div class="ttname"><a href="classtf_1_1cudaStreamBase.html#a1e5140505629afd4b3422399f8080cb0">tf::cudaStreamBase::synchronize</a></div><div class="ttdeci">cudaStreamBase &amp; synchronize()</div><div class="ttdoc">synchronizes the associated stream</div><div class="ttdef"><b>Definition</b> cuda_stream.hpp:232</div></div>
<div class="ttc" id="aclasstf_1_1cudaStreamBase_html_a7dcdfb79385a57c4c59b7c9f21e8beb9"><div class="ttname"><a href="classtf_1_1cudaStreamBase.html#a7dcdfb79385a57c4c59b7c9f21e8beb9">tf::cudaStreamBase::run</a></div><div class="ttdeci">cudaStreamBase &amp; run(const cudaGraphExecBase&lt; C, D &gt; &amp;exec)</div><div class="ttdoc">runs the given executable CUDA graph</div></div>
<div class="ttc" id="anamespacetf_html_a2be50e6880ead1d49a3fec2fc4bb893e"><div class="ttname"><a href="namespacetf.html#a2be50e6880ead1d49a3fec2fc4bb893e">tf::cudaGraphExec</a></div><div class="ttdeci">cudaGraphExecBase&lt; cudaGraphExecCreator, cudaGraphExecDeleter &gt; cudaGraphExec</div><div class="ttdoc">default smart pointer type to manage a cudaGraphExec_t object with unique ownership</div><div class="ttdef"><b>Definition</b> cudaflow.hpp:23</div></div>
<div class="ttc" id="anamespacetf_html_a713c427e4f9841a90dec67045a3babed"><div class="ttname"><a href="namespacetf.html#a713c427e4f9841a90dec67045a3babed">tf::cudaGraph</a></div><div class="ttdeci">cudaGraphBase&lt; cudaGraphCreator, cudaGraphDeleter &gt; cudaGraph</div><div class="ttdoc">default smart pointer type to manage a cudaGraph_t object with unique ownership</div><div class="ttdef"><b>Definition</b> cudaflow.hpp:18</div></div>
<div class="ttc" id="anamespacetf_html_af19c9b301dc0b0fe2a51a960fa427e83"><div class="ttname"><a href="namespacetf.html#af19c9b301dc0b0fe2a51a960fa427e83">tf::cudaStream</a></div><div class="ttdeci">cudaStreamBase&lt; cudaStreamCreator, cudaStreamDeleter &gt; cudaStream</div><div class="ttdoc">default smart pointer type to manage a cudaStream_t object with unique ownership</div><div class="ttdef"><b>Definition</b> cuda_stream.hpp:340</div></div>
</div><!-- fragment --><p>The easiest way to compile Taskflow with CUDA code (e.g., CUDA graphs, kernels) is to use <a href="https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html">nvcc</a>:</p>
<div class="fragment"><div class="line">~$ nvcc -std=c++17 -I path/to/taskflow/ --extended-lambda simple.cu -o simple</div>
<div class="line">~$ ./simple</div>
<div class="line">hello CUDA Graph!</div>
</div><!-- fragment --><h1><a class="anchor" id="CompileTaskflowWithCUDASeparately"></a>
Compile Source Code Separately</h1>
<p>Large GPU applications often compile a program into separate objects and link them together to form an executable or a library. With Taskflow, you can compile your CPU code and GPU code separately using <code>nvcc</code> and other compilers (such as <code>g++</code> and <code>clang++</code>). Consider the following example that defines two tasks in two separate source files, <code>main.cpp</code> and <code>cudaflow.cpp</code>:</p>
<div class="fragment"><div class="line"><span class="comment">// main.cpp</span></div>
<div class="line"><span class="preprocessor">#include &lt;taskflow/taskflow.hpp&gt;</span></div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> make_cudaflow(<a class="code hl_class" href="classtf_1_1Taskflow.html">tf::Taskflow</a>&amp; taskflow); <span class="comment">// create a cudaFlow task</span></div>
<div class="line"> </div>
<div class="line"><span class="keywordtype">int</span> main() {</div>
<div class="line"> </div>
<div class="line"> <a class="code hl_class" href="classtf_1_1Executor.html">tf::Executor</a> executor;</div>
<div class="line"> <a class="code hl_class" href="classtf_1_1Taskflow.html">tf::Taskflow</a> taskflow;</div>
<div class="line"> </div>
<div class="line"> <a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> task1 = taskflow.<a class="code hl_function" href="classtf_1_1FlowBuilder.html#a4d52a7fe2814b264846a2085e931652c">emplace</a>([](){ std::cout &lt;&lt; <span class="stringliteral">"main.cpp!\n"</span>; })</div>
<div class="line"> .name(<span class="stringliteral">"cpu task"</span>);</div>
<div class="line"> <a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> task2 = make_cudaflow(taskflow);</div>
<div class="line"> </div>
<div class="line"> task1.<a class="code hl_function" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(task2);</div>
<div class="line"> </div>
<div class="line"> executor.<a class="code hl_function" href="classtf_1_1Executor.html#a519777f5783981d534e9e53b99712069">run</a>(taskflow).wait();</div>
<div class="line"> </div>
<div class="line"> <span class="keywordflow">return</span> 0;</div>
<div class="line">}</div>
<div class="ttc" id="aclasstf_1_1Executor_html"><div class="ttname"><a href="classtf_1_1Executor.html">tf::Executor</a></div><div class="ttdoc">class to create an executor</div><div class="ttdef"><b>Definition</b> executor.hpp:62</div></div>
<div class="ttc" id="aclasstf_1_1Executor_html_a519777f5783981d534e9e53b99712069"><div class="ttname"><a href="classtf_1_1Executor.html#a519777f5783981d534e9e53b99712069">tf::Executor::run</a></div><div class="ttdeci">tf::Future&lt; void &gt; run(Taskflow &amp;taskflow)</div><div class="ttdoc">runs a taskflow once</div></div>
<div class="ttc" id="aclasstf_1_1FlowBuilder_html_a4d52a7fe2814b264846a2085e931652c"><div class="ttname"><a href="classtf_1_1FlowBuilder.html#a4d52a7fe2814b264846a2085e931652c">tf::FlowBuilder::emplace</a></div><div class="ttdeci">Task emplace(C &amp;&amp;callable)</div><div class="ttdoc">creates a static task</div><div class="ttdef"><b>Definition</b> flow_builder.hpp:1571</div></div>
<div class="ttc" id="aclasstf_1_1Task_html"><div class="ttname"><a href="classtf_1_1Task.html">tf::Task</a></div><div class="ttdoc">class to create a task handle over a taskflow node</div><div class="ttdef"><b>Definition</b> task.hpp:569</div></div>
<div class="ttc" id="aclasstf_1_1Task_html_a8c78c453295a553c1c016e4062da8588"><div class="ttname"><a href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">tf::Task::precede</a></div><div class="ttdeci">Task &amp; precede(Ts &amp;&amp;... tasks)</div><div class="ttdoc">adds precedence links from this to other tasks</div><div class="ttdef"><b>Definition</b> task.hpp:1258</div></div>
<div class="ttc" id="aclasstf_1_1Taskflow_html"><div class="ttname"><a href="classtf_1_1Taskflow.html">tf::Taskflow</a></div><div class="ttdoc">class to create a taskflow object</div><div class="ttdef"><b>Definition</b> taskflow.hpp:64</div></div>
</div><!-- fragment --><div class="fragment"><div class="line"><span class="comment">// cudaflow.cpp</span></div>
<div class="line"><span class="preprocessor">#include &lt;taskflow/taskflow.hpp&gt;</span></div>
<div class="line"><span class="preprocessor">#include &lt;taskflow/cudaflow.hpp&gt;</span></div>
<div class="line"> </div>
<div class="line"><a class="code hl_class" href="classtf_1_1Task.html">tf::Task</a> make_cudaflow(<a class="code hl_class" href="classtf_1_1Taskflow.html">tf::Taskflow</a>&amp; taskflow) {</div>
<div class="line"> <span class="keywordflow">return</span> taskflow.<a class="code hl_function" href="classtf_1_1FlowBuilder.html#a4d52a7fe2814b264846a2085e931652c">emplace</a>([](){</div>
<div class="line"> <span class="comment">// create a CUDA graph with a single-threaded task</span></div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a713c427e4f9841a90dec67045a3babed">tf::cudaGraph</a> cg;</div>
<div class="line"> cg.<a class="code hl_function" href="classtf_1_1cudaGraphBase.html#abb33299f42206f30f1d0f35c7c6fe6de">single_task</a>([] __device__ () { printf(<span class="stringliteral">"hello CUDA Graph!\n"</span>); });</div>
<div class="line"> </div>
<div class="line"> <span class="comment">// instantiate an executable CUDA graph and run it through a stream</span></div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#af19c9b301dc0b0fe2a51a960fa427e83">tf::cudaStream</a> stream;</div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a2be50e6880ead1d49a3fec2fc4bb893e">tf::cudaGraphExec</a> exec(cg);</div>
<div class="line"> </div>
<div class="line"> stream.<a class="code hl_function" href="classtf_1_1cudaStreamBase.html#a7dcdfb79385a57c4c59b7c9f21e8beb9">run</a>(exec).<a class="code hl_function" href="classtf_1_1cudaStreamBase.html#a1e5140505629afd4b3422399f8080cb0">synchronize</a>();</div>
<div class="line"> }).name(<span class="stringliteral">"gpu task"</span>);</div>
<div class="line">}</div>
<div class="ttc" id="aclasstf_1_1cudaGraphBase_html_abb33299f42206f30f1d0f35c7c6fe6de"><div class="ttname"><a href="classtf_1_1cudaGraphBase.html#abb33299f42206f30f1d0f35c7c6fe6de">tf::cudaGraphBase::single_task</a></div><div class="ttdeci">cudaTask single_task(C c)</div><div class="ttdoc">runs a callable with only a single kernel thread</div></div>
</div><!-- fragment --><p>Compile each source file into an object file (using <code>g++</code> as the host compiler in this example):</p>
<div class="fragment"><div class="line">~$ g++ -std=c++17 -I path/to/taskflow -c main.cpp -o main.o</div>
<div class="line">~$ nvcc -std=c++17 --extended-lambda -x cu -I path/to/taskflow \</div>
<div class="line"> -dc cudaflow.cpp -o cudaflow.o</div>
<div class="line">~$ ls</div>
<div class="line"># now we have the two compiled .o objects, main.o and cudaflow.o</div>
<div class="line">main.o cudaflow.o </div>
</div><!-- fragment --><p>The <code>--extended-lambda</code> option tells <code>nvcc</code> to generate GPU code for the lambda defined with <code>__device__</code>. The <code>-x cu</code> option tells <code>nvcc</code> to treat the input files as <code>.cu</code> files containing both CPU and GPU code. By default, <code>nvcc</code> treats <code>.cpp</code> files as CPU-only code. This option is required to have <code>nvcc</code> generate device code here, and it is also a handy way to avoid renaming source files in larger projects. The <code>-dc</code> option tells <code>nvcc</code> to generate relocatable device code for later linking.</p>
<p>You may also need to specify the target architecture with the <code>-arch</code> option so that <code>nvcc</code> generates code for a compatible SM architecture. For instance, the following command targets compute capability 7.5, so the resulting device code requires a GPU with compute capability 7.5 or later:</p>
<div class="fragment"><div class="line">~$ nvcc -std=c++17 --extended-lambda -x cu -arch=sm_75 -I path/to/taskflow \</div>
<div class="line"> -dc cudaflow.cpp -o cudaflow.o</div>
</div><!-- fragment --><h2><a class="anchor" id="CompileTaskflowWithCUDANaiveLinking"></a>
Link Objects Using nvcc</h2>
<p>Linking the compiled objects with <code>nvcc</code> requires nothing special: use <code>nvcc</code> in place of your usual compiler driver, and it takes care of all the necessary steps:</p>
<div class="fragment"><div class="line">~$ nvcc main.o cudaflow.o -o main</div>
<div class="line"> </div>
<div class="line"># run the main program </div>
<div class="line">~$ ./main</div>
<div class="line">main.cpp!</div>
<div class="line">hello CUDA Graph!</div>
</div><!-- fragment --><h2><a class="anchor" id="CompileTaskflowWithCUDADifferentLinkers"></a>
Link Objects Using Different Linkers</h2>
<p>You can choose to use a compiler other than <code>nvcc</code> for the final link step. Since your CPU compiler does not know how to link CUDA device code, you have to add a step to your build in which <code>nvcc</code> links the CUDA device code, using the <code>-dlink</code> option:</p>
<div class="fragment"><div class="line">~$ nvcc -o gpuCode.o -dlink main.o cudaflow.o</div>
</div><!-- fragment --><p>This step links all the <em>device object code</em> and places it into <code>gpuCode.o</code>.</p>
<dl class="section note"><dt>Note</dt><dd>This step does not link the CPU object code; the CPU object code in <code>main.o</code> and <code>cudaflow.o</code> is discarded.</dd></dl>
<p>To complete the link to an executable, you can use, for example, <code>ld</code> or <code>g++</code>.</p>
<div class="fragment"><div class="line"># replace /usr/local/cuda/lib64 with your own CUDA library installation path</div>
<div class="line">~$ g++ -pthread gpuCode.o main.o cudaflow.o \</div>
<div class="line"> -L /usr/local/cuda/lib64/ -lcudart -o main</div>
<div class="line"> </div>
<div class="line"># run the main program</div>
<div class="line">~$ ./main</div>
<div class="line">main.cpp!</div>
<div class="line">hello CUDA Graph!</div>
</div><!-- fragment --><p>We give <code>g++</code> all of the objects again because it needs the CPU object code, which is not in <code>gpuCode.o</code>. The device code stored in the original objects, <code>main.o</code> and <code>cudaflow.o</code>, does not conflict with the code in <code>gpuCode.o</code>. <code>g++</code> ignores device code because it does not know how to link it, and the device code in <code>gpuCode.o</code> is already linked and ready to go.</p>
<dl class="section note"><dt>Note</dt><dd>This intentional ignorance is extremely useful in large builds where intermediate objects may contain both CPU and GPU code. In this case, we let the GPU and CPU linkers each do their own job, noting that the CPU linker is always the last one we run. The CUDA Runtime API library is automatically linked when we use <code>nvcc</code> for linking, but we must explicitly link it (<code>-lcudart</code>) when using another linker. </dd></dl>
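<p>The separate-compilation steps above can be collected into one small shell script. The following is only a sketch under stated assumptions: the variable names (<code>TASKFLOW_DIR</code>, <code>CUDA_LIB</code>, <code>CUDA_ARCH</code>, <code>DRY_RUN</code>) and the helper functions (<code>arch_flag</code>, <code>run</code>, <code>build</code>) are invented for illustration, and the paths are placeholders to replace with your own. By default it only prints the commands instead of running them.</p>

```shell
# Sketch of the compile / device-link / host-link flow described above.
# All variable and function names here are hypothetical.
TASKFLOW_DIR="${TASKFLOW_DIR:-path/to/taskflow}"
CUDA_LIB="${CUDA_LIB:-/usr/local/cuda/lib64}"
CUDA_ARCH="${CUDA_ARCH:-}"   # e.g. 7.5; empty means nvcc's default target
DRY_RUN="${DRY_RUN:-1}"      # set to 0 to actually execute the commands

# Turn a compute capability such as "7.5" into "-arch=sm_75".
arch_flag() {
  if [ -n "$1" ]; then
    echo "-arch=sm_$(echo "$1" | tr -d '.')"
  fi
}

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

build() {
  # $(arch_flag ...) is left unquoted on purpose so that an empty
  # result adds no argument at all.
  # 1. CPU-only object, built with the host compiler.
  run g++ -std=c++17 -I "$TASKFLOW_DIR" -c main.cpp -o main.o || return 1
  # 2. Object with relocatable device code, built with nvcc.
  run nvcc -std=c++17 --extended-lambda -x cu $(arch_flag "$CUDA_ARCH") \
      -I "$TASKFLOW_DIR" -dc cudaflow.cpp -o cudaflow.o || return 1
  # 3. Device link: gathers the device code into gpuCode.o.
  run nvcc $(arch_flag "$CUDA_ARCH") -dlink main.o cudaflow.o -o gpuCode.o || return 1
  # 4. Host link: the CPU linker runs last; libraries follow the objects.
  run g++ -pthread gpuCode.o main.o cudaflow.o -L "$CUDA_LIB" -lcudart -o main
}

build
```

<p>Setting <code>DRY_RUN=0</code> executes the four commands in order and stops at the first failure; setting <code>CUDA_ARCH=7.5</code> adds <code>-arch=sm_75</code> to both <code>nvcc</code> invocations so that the compile and device-link steps target the same architecture.</p>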
</div></div><!-- contents -->
</div><!-- PageDoc -->
</div><!-- doc-content -->
<!-- HTML footer for doxygen 1.13.1-->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="navelem"><a class="el" href="install.html">Building and Installing</a></li>
<li class="footer">
Maintained by <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>
—
Generated by <a href="https://www.doxygen.org/index.html"><img class="footer" src="doxygen.svg" width="104" height="31" alt="doxygen"/></a> 1.13.1
</li>
</ul>
</div>