
Commit 731f187

updated cudaFlow to README
1 parent 2e10350 commit 731f187

147 files changed

Lines changed: 3419 additions & 683 deletions

README.md

Lines changed: 92 additions & 54 deletions
Original file line number | Diff line number | Diff line change
@@ -6,7 +6,7 @@
66
[![Wiki](image/api-doc.svg)][wiki]
77
[![Cite](image/cite-ipdps.svg)](doxygen/reference/ipdps19.pdf)
88

9-
A fast C++ *header-only* library to help you quickly write parallel programs with complex task dependencies
9+
A header-only C++ library to help you quickly write parallel and heterogeneous programs using task models
1010

1111
# Why Cpp-Taskflow?
1212

@@ -38,15 +38,16 @@ Cpp-Taskflow supports conditional tasking for you to implement cyclic and dynami
3838
| ![](image/condition.svg) |
3939

4040
Cpp-Taskflow is composable. You can create large parallel graphs through
41-
composition of modular and reusable blocks that are easier to optimize.
41+
composition of modular and reusable blocks that are easier to optimize
42+
at an individual scope.
4243

43-
| [Graph Composition](#taskflow-composition) |
44+
| [Taskflow Composition](#composable-tasking) |
4445
| :---------------: |
4546
|![](image/framework.png)|
4647

4748
Cpp-Taskflow supports heterogeneous tasking for you to
48-
speed up a wide range of computing applications
49-
by harnessing the power of both CPUs and GPUs.
49+
speed up a wide range of scientific computing applications
50+
by harnessing the power of CPU-GPU collaborative computing.
5051

5152
| [Concurrent CPU-GPU Tasking](#concurrent-cpu-gpu-tasking) |
5253
| :-----------------: |
@@ -75,8 +76,10 @@ Technical details can be referred to our [IEEE IPDPS19 paper][IPDPS19].
7576
* [Conditional Tasking](#conditional-tasking)
7677
* [Step 1: Create a Condition Task](#step-1-create-a-condition-task)
7778
* [Step 2: Scheduling Rules for Condition Tasks](#step-2-scheduling-rules-for-condition-tasks)
78-
* [Taskflow Composition](#taskflow-composition)
79+
* [Composable Tasking](#composable-tasking)
7980
* [Concurrent CPU-GPU Tasking](#concurrent-cpu-gpu-tasking)
81+
* [Step 1: Create a cudaFlow](#step-1-create-a-cudaflow)
82+
* [Step 2: Compile and Execute a cudaFlow](#step-2-compile-and-execute-a-cudaflow)
8083
* [Visualize a Taskflow Graph](#visualize-a-taskflow-graph)
8184
* [Monitor Thread Activities](#monitor-thread-activities)
8285
* [API Reference](#api-reference)
@@ -119,7 +122,7 @@ int main(){
119122
Compile and run the code with the following commands:
120123

121124
```bash
122-
~$ g++ simple.cpp -std=c++1z -O2 -lpthread -o simple
125+
~$ g++ simple.cpp -I path/to/include/taskflow/ -std=c++17 -O2 -lpthread -o simple
123126
~$ ./simple
124127
TaskA
125128
TaskC <-- concurrent with TaskB
@@ -240,6 +243,7 @@ tf::Task B = tf.emplace([] (tf::Subflow& subflow) {
240243
tf::Task B1 = subflow.emplace([](){}).name("B1");
241244
tf::Task B2 = subflow.emplace([](){}).name("B2");
242245
tf::Task B3 = subflow.emplace([](){}).name("B3");
246+
243247
B1.precede(B3);
244248
B2.precede(B3);
245249

@@ -251,31 +255,6 @@ tf::Task B = tf.emplace([] (tf::Subflow& subflow) {
251255
A subflow can be nested or recursive. You can create another subflow from
252256
the execution of a subflow and so on.
253257

254-
<img align="right" src="image/nested_subflow.svg" width="25%">
255-
256-
```cpp
257-
tf::Task A = tf.emplace([] (tf::Subflow& sbf) {
258-
std::cout << "A spawns A1 & subflow A2\n";
259-
tf::Task A1 = sbf.emplace([] () {
260-
std::cout << "subtask A1\n";
261-
}).name("A1");
262-
263-
tf::Task A2 = sbf.emplace([] (tf::Subflow& sbf2) {
264-
std::cout << "A2 spawns A2_1 & A2_2\n";
265-
tf::Task A2_1 = sbf2.emplace([] () {
266-
std::cout << "subtask A2_1\n";
267-
}).name("A2_1");
268-
269-
tf::Task A2_2 = sbf2.emplace([] () {
270-
std::cout << "subtask A2_2\n";
271-
}).name("A2_2");
272-
A2_1.precede(A2_2);
273-
}).name("A2");
274-
275-
A1.precede(A2);
276-
}).name("A");
277-
```
278-
279258
<div align="right"><b><a href="#table-of-contents">[↑]</a></b></div>
280259

281260
# Conditional Tasking
@@ -341,7 +320,7 @@ Make sure there is no task race.
341320

342321

343322

344-
# Taskflow Composition
323+
# Composable Tasking
345324

346325
A powerful feature of `tf::Taskflow` is composability.
347326
You can create multiple task graphs from different parts of your workload
@@ -364,9 +343,8 @@ auto [f2A, f2B, f2C] = f2.emplace(
364343
);
365344
auto f1_module_task = f2.composed_of(f1);
366345

367-
f2A.precede(f1_module_task);
368-
f2B.precede(f1_module_task);
369-
f1_module_task.precede(f2C);
346+
f1_module_task.succeed(f2A, f2B)
347+
.precede(f2C);
370348
```
371349
372350
Similarly, `composed_of` returns a task handle and you can use
@@ -378,7 +356,80 @@ to compose a larger taskflow and so on.
378356
379357
# Concurrent CPU-GPU Tasking
380358
381-
TBD
359+
Cpp-Taskflow enables concurrent CPU-GPU tasking by leveraging
360+
[Nvidia CUDA Toolkit][cuda-toolkit].
361+
You can harness the power of CPU-GPU collaborative computing
362+
to implement heterogeneous decomposition algorithms.
363+
364+
## Step 1: Create a cudaFlow
365+
366+
A `tf::cudaFlow` is a graph object created at runtime
367+
similar to dynamic tasking.
368+
It manages a task node in a taskflow and associates it
369+
with a [CUDA Graph][cudaGraph].
370+
To create a cudaFlow, emplace a callable with an argument
371+
of type `tf::cudaFlow`.
372+
373+
374+
375+
```cpp
376+
tf::Taskflow taskflow;
377+
tf::Executor executor;
378+
379+
const unsigned N = 1<<20; // size of the vector
380+
std::vector<float> hx(N, 1.0f), hy(N, 2.0f); // x and y vectors at host
381+
float *dx{nullptr}, *dy{nullptr}; // x and y vectors at device
382+
383+
tf::Task allocate_x = taskflow.emplace([&](){ cudaMalloc(&dx, N*sizeof(float));});
384+
tf::Task allocate_y = taskflow.emplace([&](){ cudaMalloc(&dy, N*sizeof(float));});
385+
tf::Task cudaflow = taskflow.emplace([&](tf::cudaFlow& cf) {
386+
tf::cudaTask h2d_x = cf.copy(dx, hx.data(), N); // host-to-device x data transfer
387+
tf::cudaTask h2d_y = cf.copy(dy, hy.data(), N); // host-to-device y data transfer
388+
tf::cudaTask d2h_x = cf.copy(hx.data(), dx, N); // device-to-host x data transfer
389+
tf::cudaTask d2h_y = cf.copy(hy.data(), dy, N); // device-to-host y data transfer
390+
// launch saxpy<<<(N+255)/256, 256, 0>>>(N, 2.0f, dx, dy)
391+
tf::cudaTask kernel = cf.kernel((N+255)/256, 256, 0, saxpy, N, 2.0f, dx, dy);
392+
kernel.succeed(h2d_x, h2d_y)
393+
.precede(d2h_x, d2h_y);
394+
});
395+
cudaflow.succeed(allocate_x, allocate_y); // overlap data allocations
396+
397+
executor.run(taskflow).wait();
398+
```
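The grid size `(N+255)/256` in the kernel launch above is a ceiling division: it rounds up so that every element gets a thread even when `N` is not a multiple of the block size. A minimal sketch of that arithmetic follows; `num_blocks` is a hypothetical helper written for illustration and is not part of Cpp-Taskflow.

```cpp
#include <cassert>

// Hypothetical helper (an assumption, not a Cpp-Taskflow API):
// number of blocks needed to cover n elements when each block
// runs block_size threads.
constexpr unsigned num_blocks(unsigned n, unsigned block_size) {
  return (n + block_size - 1) / block_size;  // ceiling division
}

// With N = 1<<20 and 256 threads per block, as in the example above.
static_assert(num_blocks(1u << 20, 256) == 4096, "exact multiple of the block size");

// A size that is not a multiple of 256 still gets a final, partially used block.
static_assert(num_blocks(1000, 256) == 4, "rounds up to cover the tail");
```

Because the last block may contain more threads than remaining elements, the kernel needs a bounds check such as `if (i < n)`.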
399+
400+
Assume our kernel implements the canonical saxpy operation
401+
(single-precision A·X Plus Y) using the CUDA syntax.
402+
403+
<img align="right" src="image/saxpy.svg" width="50%">
404+
405+
```cpp
406+
// saxpy (single-precision A·X Plus Y) kernel
407+
__global__ void saxpy(
408+
int n, float a, float *x, float *y
409+
) {
410+
// get the thread index
411+
int i = blockIdx.x*blockDim.x + threadIdx.x;
412+
413+
if (i < n) {
414+
y[i] = a*x[i] + y[i];
415+
}
416+
}
417+
```
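To check the values copied back from the device, it can help to compare them against a plain host-side loop. The sketch below is one possible reference; `saxpy_reference` is a hypothetical name chosen for illustration and is not provided by Cpp-Taskflow or CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (an assumption for illustration):
// computes y[i] = a*x[i] + y[i] sequentially on the CPU.
void saxpy_reference(float a, const std::vector<float>& x, std::vector<float>& y) {
  assert(x.size() == y.size());  // both vectors must have the same length
  for (std::size_t i = 0; i < x.size(); ++i) {
    y[i] = a * x[i] + y[i];
  }
}
```

With the inputs used in the cudaFlow example (`x` filled with `1.0f`, `y` filled with `2.0f`, `a = 2.0f`), every element of `y` becomes `4.0f`.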
418+
419+
420+
421+
## Step 2: Compile and Execute a cudaFlow
422+
423+
Name your source file with the extension `.cu`, for example `saxpy.cu`,
424+
and compile it through [nvcc][nvcc]:
425+
426+
```bash
427+
~$ nvcc saxpy.cu -I path/to/include/taskflow -O2 -o saxpy
428+
~$ ./saxpy
429+
```
430+
431+
Our source automatically enables cudaFlow for compilers that support
432+
CUDA.
382433

383434
<div align="right"><b><a href="#table-of-contents">[↑]</a></b></div>
384435

@@ -404,9 +455,9 @@ B.precede(D, E);
404455
taskflow.dump(std::cout); // dump the graph in DOT to std::cout
405456
```
406457

407-
When you have dynamic tasks (subflows),
408-
you cannot simply use the `dump` method because it displays only the static portion.
409-
Instead, you need to execute the graph first to spawn dynamic tasks.
458+
When you have tasks that are created at runtime (e.g., subflow, cudaFlow),
459+
you need to execute the graph first to spawn these tasks
460+
and dump the entire graph.
410461

411462
<img align="right" src="image/debug_subflow.svg" width="25%">
412463

@@ -557,19 +608,6 @@ auto [S, T] = tf.parallel_for(
557608
// will print 0, 2, 4, 6, 8, 10 (three partitions, {0, 2}, {4, 6}, {8, 10})
558609
```
559610

560-
You can also do opposite direction with negative step size.
561-
562-
```cpp
563-
// [10, -1) with a step size of -2
564-
auto [S, T] = tf.parallel_for(
565-
10, -1, 2,
566-
[] (int i) {
567-
std::cout << "parallel_for on index " << i << std::endl;
568-
}
569-
);
570-
// will print 10, 8, 6, 4, 2, 0
571-
```
572-
573611
## Task API
574612

575613
Each time you create a task, the taskflow object adds a node to the present task dependency graph
@@ -779,7 +817,7 @@ Cpp-Taskflow is licensed under the [MIT License](./LICENSE).
779817
[GitHub pull requests]: https://github.com/cpp-taskflow/cpp-taskflow/pulls
780818
[GitHub contributors]: https://github.com/cpp-taskflow/cpp-taskflow/graphs/contributors
781819
[GraphViz]: https://www.graphviz.org/
782-
[AwesomeGraphViz]: https://github.com/CodeFreezr/awesome-graphviz
820+
[AwesomeGraphViz]: https://dreampuf.github.io/GraphvizOnline/
783821
[OpenMP Tasking]: https://www.openmp.org/spec-html/5.0/openmpsu99.html
784822
[TBB FlowGraph]: https://www.threadingbuildingblocks.org/tutorial-intel-tbb-flow-graph
785823
[OpenTimer]: https://github.com/OpenTimer/OpenTimer

docs/FAQ.html

Lines changed: 2 additions & 2 deletions
@@ -110,7 +110,7 @@ <h2><a class="anchor" id="GeneralQuestion1"></a>
110110
<p>Cpp-Taskflow aims to help C++ developers quickly implement efficient parallel decomposition strategies using task-based approaches.</p>
111111
<h2><a class="anchor" id="GeneralQuestion2"></a>
112112
Q2: How do I use Cpp-Taskflow in my projects?</h2>
113-
<p>Cpp-Taskflow is a header-only library with zero dependencies. The only thing you need is a C++14 compiler. To use Cpp-Taskflow, simply drop the folder <code>taskflow/</code> to your project and include taskflow.hpp.</p>
113+
<p>Cpp-Taskflow is a header-only library with zero dependencies. The only thing you need is a C++14 compiler. To use Cpp-Taskflow, simply drop the folder <code>taskflow/</code> to your project and include <a class="el" href="taskflow_8hpp_source.html">taskflow.hpp</a>.</p>
114114
<h2><a class="anchor" id="GeneralQuestion3"></a>
115115
Q3: What is the difference between static tasking and dynamic tasking?</h2>
116116
<p>Static tasking refers to those tasks created before execution, while dynamic tasking refers to those tasks created during the execution of static tasks or dynamic tasks (nested). Dynamic tasks created by the same task node are grouped together to a subflow.</p>
@@ -125,7 +125,7 @@ <h2><a class="anchor" id="GeneralQuestion6"></a>
125125
<p>Unfortunately, Cpp-Taskflow relies heavily on modern C++14 features/idioms/STL, and it is very difficult to provide a version that compiles under older C++ versions.</p>
126126
<h2><a class="anchor" id="GeneralQuestion7"></a>
127127
Q7: How does Cpp-Taskflow schedule tasks?</h2>
128-
<p>Cpp-Taskflow implemented a very efficient <a href="https://en.wikipedia.org/wiki/Work_stealing">work-stealing scheduler</a> to execute task dependency graphs. The source code is available at <code><a class="el" href="executor_8hpp_source.html">taskflow/core/executor.hpp</a></code> and <code>taskflow/core/wsq.hpp</code>. </p><hr/>
128+
<p>Cpp-Taskflow implemented a very efficient <a href="https://en.wikipedia.org/wiki/Work_stealing">work-stealing scheduler</a> to execute task dependency graphs. The source code is available at <code><a class="el" href="executor_8hpp_source.html">taskflow/core/executor.hpp</a></code> and <code><a class="el" href="tsq_8hpp_source.html">taskflow/core/tsq.hpp</a></code>. </p><hr/>
129129
<h1><a class="anchor" id="ProgrammingQuestions"></a>
130130
Programming Questions</h1>
131131
<h2><a class="anchor" id="ProgrammingQuestions1"></a>

docs/annotated.html

Lines changed: 15 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -105,17 +105,21 @@
105105
<div class="textblock">Here are the classes, structs, unions and interfaces with brief descriptions:</div><div class="directory">
106106
<div class="levels">[detail level <span onclick="javascript:toggleLevel(1);">1</span><span onclick="javascript:toggleLevel(2);">2</span>]</div><table class="directory">
107107
<tr id="row_0_" class="even"><td class="entry"><span style="width:0px;display:inline-block;">&#160;</span><span id="arr_0_" class="arrow" onclick="toggleFolder('0_')">&#9660;</span><span class="icona"><span class="icon">N</span></span><b>tf</b></td><td class="desc"></td></tr>
108-
<tr id="row_0_0_"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1Executor.html" target="_self">Executor</a></td><td class="desc">The executor class to run a taskflow graph </td></tr>
109-
<tr id="row_0_1_" class="even"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1ExecutorObserver.html" target="_self">ExecutorObserver</a></td><td class="desc">Default executor observer to dump the execution timelines </td></tr>
110-
<tr id="row_0_2_"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1ExecutorObserverInterface.html" target="_self">ExecutorObserverInterface</a></td><td class="desc">The interface class for creating an executor observer </td></tr>
111-
<tr id="row_0_3_" class="even"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1FlowBuilder.html" target="_self">FlowBuilder</a></td><td class="desc">Building blocks of a task dependency graph </td></tr>
112-
<tr id="row_0_4_"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1Subflow.html" target="_self">Subflow</a></td><td class="desc">The building blocks of dynamic tasking </td></tr>
113-
<tr id="row_0_5_" class="even"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1Task.html" target="_self">Task</a></td><td class="desc"><a class="el" href="classtf_1_1Task.html" title="task handle to a node in a task dependency graph ">Task</a> handle to a node in a task dependency graph </td></tr>
114-
<tr id="row_0_6_"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1Taskflow.html" target="_self">Taskflow</a></td><td class="desc">Class to create a task dependency graph </td></tr>
115-
<tr id="row_0_7_" class="even"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1TaskView.html" target="_self">TaskView</a></td><td class="desc">Immutable accessor class to a task node, mainly used in the <a class="el" href="classtf_1_1ExecutorObserver.html" title="Default executor observer to dump the execution timelines. ">tf::ExecutorObserver</a> interface </td></tr>
116-
<tr id="row_1_"><td class="entry"><span style="width:16px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="structis__condition__task.html" target="_self">is_condition_task</a></td><td class="desc">Determines if a callable is a condition task </td></tr>
117-
<tr id="row_2_" class="even"><td class="entry"><span style="width:16px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="structis__dynamic__task.html" target="_self">is_dynamic_task</a></td><td class="desc">Determines if a callable is a dynamic task </td></tr>
118-
<tr id="row_3_"><td class="entry"><span style="width:16px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="structis__static__task.html" target="_self">is_static_task</a></td><td class="desc">Determines if a callable is a static task </td></tr>
108+
<tr id="row_0_0_"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1cudaFlow.html" target="_self">cudaFlow</a></td><td class="desc">Building methods of a cuda task dependency graph </td></tr>
109+
<tr id="row_0_1_" class="even"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1cudaTask.html" target="_self">cudaTask</a></td><td class="desc">Handle to a node in a cudaGraph </td></tr>
110+
<tr id="row_0_2_"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1Executor.html" target="_self">Executor</a></td><td class="desc">The executor class to run a taskflow graph </td></tr>
111+
<tr id="row_0_3_" class="even"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1ExecutorObserver.html" target="_self">ExecutorObserver</a></td><td class="desc">Default executor observer to dump the execution timelines </td></tr>
112+
<tr id="row_0_4_"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1ExecutorObserverInterface.html" target="_self">ExecutorObserverInterface</a></td><td class="desc">The interface class for creating an executor observer </td></tr>
113+
<tr id="row_0_5_" class="even"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1FlowBuilder.html" target="_self">FlowBuilder</a></td><td class="desc">Building methods of a task dependency graph </td></tr>
114+
<tr id="row_0_6_"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1Subflow.html" target="_self">Subflow</a></td><td class="desc">The building blocks of dynamic tasking </td></tr>
115+
<tr id="row_0_7_" class="even"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1Task.html" target="_self">Task</a></td><td class="desc">Handle to a node in a task dependency graph </td></tr>
116+
<tr id="row_0_8_"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1Taskflow.html" target="_self">Taskflow</a></td><td class="desc">Class to create a task dependency graph </td></tr>
117+
<tr id="row_0_9_" class="even"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1TaskQueue.html" target="_self">TaskQueue</a></td><td class="desc">Lock-free unbounded single-producer multiple-consumer queue </td></tr>
118+
<tr id="row_0_10_"><td class="entry"><span style="width:32px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="classtf_1_1TaskView.html" target="_self">TaskView</a></td><td class="desc">Immutable accessor class to a task node, mainly used in the <a class="el" href="classtf_1_1ExecutorObserver.html" title="Default executor observer to dump the execution timelines. ">tf::ExecutorObserver</a> interface </td></tr>
119+
<tr id="row_1_" class="even"><td class="entry"><span style="width:16px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="structis__condition__task.html" target="_self">is_condition_task</a></td><td class="desc">Determines if a callable is a condition task </td></tr>
120+
<tr id="row_2_"><td class="entry"><span style="width:16px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="structis__cudaflow__task.html" target="_self">is_cudaflow_task</a></td><td class="desc">Determines if a callable is a cudaflow task </td></tr>
121+
<tr id="row_3_" class="even"><td class="entry"><span style="width:16px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="structis__dynamic__task.html" target="_self">is_dynamic_task</a></td><td class="desc">Determines if a callable is a dynamic task </td></tr>
122+
<tr id="row_4_"><td class="entry"><span style="width:16px;display:inline-block;">&#160;</span><span class="icona"><span class="icon">C</span></span><a class="el" href="structis__static__task.html" target="_self">is_static_task</a></td><td class="desc">Determines if a callable is a static task </td></tr>
119123
</table>
120124
</div><!-- directory -->
121125
</div><!-- contents -->
