forked from taskflow/taskflow
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathCUDASTDTransform.html
More file actions
131 lines (127 loc) · 12.7 KB
/
Copy pathCUDASTDTransform.html
File metadata and controls
131 lines (127 loc) · 12.7 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>CUDA Standard Algorithms » Parallel Transforms | Taskflow QuickStart</title>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:400,400i,600,600i%7CSource+Code+Pro:400,400i,600" />
<link rel="stylesheet" href="m-dark+documentation.compiled.css" />
<link rel="icon" href="favicon.ico" type="image/vnd.microsoft.icon" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="theme-color" content="#22272e" />
</head>
<body>
<header><nav id="navigation">
<div class="m-container">
<div class="m-row">
<span id="m-navbar-brand" class="m-col-t-8 m-col-m-none m-left-m">
<a href="https://taskflow.github.io"><img src="taskflow_logo.png" alt="" />Taskflow</a> <span class="m-breadcrumb">|</span> <a href="index.html" class="m-thin">QuickStart</a>
</span>
<div class="m-col-t-4 m-hide-m m-text-right m-nopadr">
<a href="#search" class="m-doc-search-icon" title="Search" onclick="return showSearch()"><svg style="height: 0.9rem;" viewBox="0 0 16 16">
<path id="m-doc-search-icon-path" d="m6 0c-3.31 0-6 2.69-6 6 0 3.31 2.69 6 6 6 1.49 0 2.85-0.541 3.89-1.44-0.0164 0.338 0.147 0.759 0.5 1.15l3.22 3.79c0.552 0.614 1.45 0.665 2 0.115 0.55-0.55 0.499-1.45-0.115-2l-3.79-3.22c-0.392-0.353-0.812-0.515-1.15-0.5 0.895-1.05 1.44-2.41 1.44-3.89 0-3.31-2.69-6-6-6zm0 1.56a4.44 4.44 0 0 1 4.44 4.44 4.44 4.44 0 0 1-4.44 4.44 4.44 4.44 0 0 1-4.44-4.44 4.44 4.44 0 0 1 4.44-4.44z"/>
</svg></a>
<a id="m-navbar-show" href="#navigation" title="Show navigation"></a>
<a id="m-navbar-hide" href="#" title="Hide navigation"></a>
</div>
<div id="m-navbar-collapse" class="m-col-t-12 m-show-m m-col-m-none m-right-m">
<div class="m-row">
<ol class="m-col-t-6 m-col-m-none">
<li><a href="pages.html">Handbook</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
</ol>
<ol class="m-col-t-6 m-col-m-none" start="3">
<li><a href="annotated.html">Classes</a></li>
<li><a href="files.html">Files</a></li>
<li class="m-show-m"><a href="#search" class="m-doc-search-icon" title="Search" onclick="return showSearch()"><svg style="height: 0.9rem;" viewBox="0 0 16 16">
<use href="#m-doc-search-icon-path" />
</svg></a></li>
</ol>
</div>
</div>
</div>
</div>
</nav></header>
<main><article>
<div class="m-container m-container-inflatable">
<div class="m-row">
<div class="m-col-l-10 m-push-l-1">
<h1>
<span class="m-breadcrumb"><a href="cudaStandardAlgorithms.html">CUDA Standard Algorithms</a> »</span>
Parallel Transforms
</h1>
<nav class="m-block m-default">
<h3>Contents</h3>
<ul>
<li><a href="#CUDASTDParallelTransformsIncludeTheHeader">Include the Header</a></li>
<li><a href="#CUDASTDTransformARangeOfItems">Transform a Range of Items</a></li>
<li><a href="#CUDASTDTransformTwoRangesOfItems">Transform Two Ranges of Items</a></li>
</ul>
</nav>
<p>Taskflow provides template methods for transforming ranges of items to different outputs.</p><section id="CUDASTDParallelTransformsIncludeTheHeader"><h2><a href="#CUDASTDParallelTransformsIncludeTheHeader">Include the Header</a></h2><p>You need to include the header file, <code>taskflow/cuda/algorithm/transform.hpp</code>, for using the parallel-transform algorithm.</p><pre class="m-code"><span class="cp">#include</span><span class="w"> </span><span class="cpf"><taskflow/cuda/algorithm/transform.hpp></span><span class="cp"></span></pre></section><section id="CUDASTDTransformARangeOfItems"><h2><a href="#CUDASTDTransformARangeOfItems">Transform a Range of Items</a></h2><p>Parallel-transform algorithm applies the given transform function to a range of items and store the result in another range specified by two iterators, <code>first</code> and <code>last</code>. The task created by <a href="namespacetf.html#a3ed764530620a419e3400e1f9ab6c956" class="m-doc">tf::<wbr />cuda_transform(P&& p, I first, I last, O output, C op)</a> represents a parallel execution for the following loop:</p><pre class="m-code"><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">first</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">last</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="n">output</span><span class="o">++</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">op</span><span class="p">(</span><span class="o">*</span><span class="n">first</span><span class="o">++</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span></pre><p>The following example creates a transform kernel that transforms an input range of <code>N</code> items to an output range by multiplying each item by 10.</p><pre class="m-code"><span class="n">tf</span><span class="o">::</span><span class="n">cudaDefaultExecutionPolicy</span><span class="w"> </span><span class="n">policy</span><span class="p">;</span><span class="w"></span>
<span class="c1">// output[i] = input[i]*10</span>
<span class="n">tf</span><span class="o">::</span><span class="n">cuda_transform</span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">policy</span><span class="p">,</span><span class="w"> </span><span class="n">input</span><span class="p">,</span><span class="w"> </span><span class="n">input</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">N</span><span class="p">,</span><span class="w"> </span><span class="n">output</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="w"> </span><span class="n">__device__</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">x</span><span class="o">*</span><span class="mi">10</span><span class="p">;</span><span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">);</span><span class="w"></span>
<span class="c1">// synchronize the execution</span>
<span class="n">policy</span><span class="p">.</span><span class="n">synchronize</span><span class="p">();</span><span class="w"></span></pre><p>Each iteration is independent of each other and is assigned one kernel thread to run the callable. The transform algorithm runs <em>asynchronously</em> through the stream specified in the execution policy. You need to synchronize the stream to obtain correct results.</p></section><section id="CUDASTDTransformTwoRangesOfItems"><h2><a href="#CUDASTDTransformTwoRangesOfItems">Transform Two Ranges of Items</a></h2><p>You can transform two ranges of items to an output range through a binary operator. The task created by <a href="namespacetf.html#abdcb5b755f7ace2aa452541d5bf93b5f" class="m-doc">tf::<wbr />cuda_transform(P&& p, I1 first1, I1 last1, I2 first2, O output, C op)</a> represents a parallel execution for the following loop:</p><pre class="m-code"><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">first1</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">last1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="n">output</span><span class="o">++</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">op</span><span class="p">(</span><span class="o">*</span><span class="n">first1</span><span class="o">++</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">first2</span><span class="o">++</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span></pre><p>The following example creates a transform kernel that transforms two input ranges of <code>N</code> items to an output range by summing each pair of items in the input ranges.</p><pre class="m-code"><span class="n">tf</span><span class="o">::</span><span class="n">cudaDefaultExecutionPolicy</span><span class="w"> </span><span class="n">policy</span><span class="p">;</span><span class="w"></span>
<span class="c1">// output[i] = input1[i] + inpu2[i]</span>
<span class="n">tf</span><span class="o">::</span><span class="n">cuda_transform</span><span class="p">(</span><span class="n">policy</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">input1</span><span class="p">,</span><span class="w"> </span><span class="n">input1</span><span class="o">+</span><span class="n">N</span><span class="p">,</span><span class="w"> </span><span class="n">input2</span><span class="p">,</span><span class="w"> </span><span class="n">output</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="n">__device__</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">b</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">a</span><span class="o">+</span><span class="n">b</span><span class="p">;</span><span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">);</span><span class="w"> </span>
<span class="c1">// synchronize the execution</span>
<span class="n">policy</span><span class="p">.</span><span class="n">synchronize</span><span class="p">();</span><span class="w"></span></pre></section>
</div>
</div>
</div>
</article></main>
<div class="m-doc-search" id="search">
<a href="#!" onclick="return hideSearch()"></a>
<div class="m-container">
<div class="m-row">
<div class="m-col-m-8 m-push-m-2">
<div class="m-doc-search-header m-text m-small">
<div><span class="m-label m-default">Tab</span> / <span class="m-label m-default">T</span> to search, <span class="m-label m-default">Esc</span> to close</div>
<div id="search-symbolcount">…</div>
</div>
<div class="m-doc-search-content">
<form>
<input type="search" name="q" id="search-input" placeholder="Loading …" disabled="disabled" autofocus="autofocus" autocomplete="off" spellcheck="false" />
</form>
<noscript class="m-text m-danger m-text-center">Unlike everything else in the docs, the search functionality <em>requires</em> JavaScript.</noscript>
<div id="search-help" class="m-text m-dim m-text-center">
<p class="m-noindent">Search for symbols, directories, files, pages or
modules. You can omit any prefix from the symbol or file path; adding a
<code>:</code> or <code>/</code> suffix lists all members of given symbol or
directory.</p>
<p class="m-noindent">Use <span class="m-label m-dim">↓</span>
/ <span class="m-label m-dim">↑</span> to navigate through the list,
<span class="m-label m-dim">Enter</span> to go.
<span class="m-label m-dim">Tab</span> autocompletes common prefix, you can
copy a link to the result using <span class="m-label m-dim">⌘</span>
<span class="m-label m-dim">L</span> while <span class="m-label m-dim">⌘</span>
<span class="m-label m-dim">M</span> produces a Markdown link.</p>
</div>
<div id="search-notfound" class="m-text m-warning m-text-center">Sorry, nothing was found.</div>
<ul id="search-results"></ul>
</div>
</div>
</div>
</div>
</div>
<script src="search-v2.js"></script>
<script src="searchdata-v2.js" async="async"></script>
<footer><nav>
<div class="m-container">
<div class="m-row">
<div class="m-col-l-10 m-push-l-1">
<p>Taskflow handbook is part of the <a href="https://taskflow.github.io">Taskflow project</a>, copyright © <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>, 2018–2023.<br />Generated by <a href="https://doxygen.org/">Doxygen</a> 1.8.14 and <a href="https://mcss.mosra.cz/">m.css</a>.</p>
</div>
</div>
</div>
</nav></footer>
</body>
</html>