<!-- HTML header for doxygen 1.13.1-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=11"/>
<meta name="generator" content="Doxygen 1.13.1"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>Taskflow: A General-purpose Task-parallel Programming System: 2D Image Convolution</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<script type="text/javascript" src="clipboard.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="cookie.js"></script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="custom.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr id="projectrow">
<td id="projectlogo"><img alt="Logo" src="taskflow_logo.png"/></td>
<td id="projectalign">
<div id="projectname"><a href="https://github.com/taskflow/taskflow" style="color:inherit; text-decoration:none;">Taskflow: A General-purpose Task-parallel Programming System</a>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.13.1 -->
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
var searchBox = new SearchBox("searchBox", "search/",'.html');
/* @license-end */
</script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function() { codefold.init(0); });
/* @license-end */
</script>
<script type="text/javascript" src="menudata.js"></script>
<script type="text/javascript" src="menu.js"></script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function() {
initMenu('',true,false,'search.php','Search',true);
$(function() { init_search(); });
});
/* @license-end */
</script>
<div id="main-nav"></div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function(){initNavTree('ExamplesConv2D.html',''); initResizable(true); });
/* @license-end */
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<div id="MSearchResults">
<div class="SRPage">
<div id="SRIndex">
<div id="SRResults"></div>
<div class="SRStatus" id="Loading">Loading...</div>
<div class="SRStatus" id="Searching">Searching...</div>
<div class="SRStatus" id="NoMatches">No Matches</div>
</div>
</div>
</div>
</div>
<div><div class="header">
<div class="headertitle"><div class="title">2D Image Convolution</div></div>
</div><!--header-->
<div class="contents">
<div class="toc"><h3>Table of Contents</h3>
<ul>
<li class="level1">
<a href="#Conv2DIntroduction">What is 2D Convolution?</a>
</li>
<li class="level1">
<a href="#Conv2DWalkthrough">Concrete Walkthrough</a>
</li>
<li class="level1">
<a href="#Conv2DParallelism">Why it Maps Perfectly to IndexRanges?</a>
</li>
<li class="level1">
<a href="#Conv2DImplementation">Implementation</a>
</li>
<li class="level1">
<a href="#Conv2DDesignPoints">Design Points</a>
</li>
</ul>
</div>
<div class="textblock"><p>We implement a parallel 2D image convolution using <a class="el" href="classtf_1_1FlowBuilder.html#a2582a216d54dacca2b7022ea7e89452a" title="constructs a parallel-for task over a one- or multi-dimensional index range">tf::Taskflow::for_each_by_index</a> with a two-dimensional <a class="el" href="classtf_1_1IndexRanges.html" title="class to create an N-dimensional index range of integral indices">tf::IndexRanges</a>. This example demonstrates how a naturally 2D iteration space maps directly onto multidimensional index ranges and why output pixels are unconditionally safe to compute in parallel.</p>
<h1><a class="anchor" id="Conv2DIntroduction"></a>
What is 2D Convolution?</h1>
<p>Convolution is one of the most fundamental operations in image processing and computer vision. Given an input image and a small matrix called a kernel (or filter), convolution produces an output image where each output pixel is a weighted sum of a small neighbourhood of input pixels, with the weights given by the kernel. Depending on the kernel, convolution can blur an image (Gaussian blur), sharpen it, detect edges (Sobel filter), or emboss it. It is also the core operation inside every convolutional neural network (CNN). To see how it works, consider the following <code>5x5</code> input image and a <code>3x3</code> kernel:</p>
<div class="image">
<object type="image/svg+xml" data="conv_operation.svg" style="pointer-events: none;"></object>
</div>
<p>The kernel is placed over a <code>3x3</code> window of the input centred at output pixel <code>(2, 2)</code>. Each input value in the window is multiplied by the corresponding kernel weight, and the products are summed to produce the output value. For the Gaussian-like kernel shown (weights 1/2/4 with divisor 16), this gives a weighted average that smooths high-frequency noise. The kernel then slides to every other position in the image, and the same multiply-accumulate operation is repeated. Each output pixel depends only on its own input window and the kernel; it is completely independent of every other output pixel. This independence is what makes convolution embarrassingly parallel.</p>
<h1><a class="anchor" id="Conv2DWalkthrough"></a>
Concrete Walkthrough</h1>
<p>Let us trace the computation for a <code>5x5</code> image with the <code>3x3</code> kernel below, computing output pixel <code>(2, 2)</code> by hand. The input image:</p>
<div class="fragment"><div class="line">input = [ 1 2 0 1 3 ]</div>
<div class="line"> [ 0 3 1 2 1 ]</div>
<div class="line"> [ 2 1 4 0 2 ]</div>
<div class="line"> [ 1 0 2 3 1 ]</div>
<div class="line"> [ 0 2 1 1 0 ]</div>
</div><!-- fragment --><p>The <code>3x3</code> Gaussian kernel (sum = 16, used as divisor for normalisation):</p>
<div class="fragment"><div class="line">kernel = [ 1 2 1 ]</div>
<div class="line"> [ 2 4 2 ]</div>
<div class="line"> [ 1 2 1 ]</div>
</div><!-- fragment --><p>The <code>3x3</code> window centred at output pixel <code>(2, 2)</code> covers input rows 1 to 3 and columns 1 to 3:</p>
<div class="fragment"><div class="line">window = [ 3 1 2 ] (input rows 1-3, cols 1-3)</div>
<div class="line"> [ 1 4 0 ]</div>
<div class="line"> [ 0 2 3 ]</div>
</div><!-- fragment --><p>The weighted sum is:</p>
<div class="fragment"><div class="line">y[2][2] = (1*3 + 2*1 + 1*2 +</div>
<div class="line"> 2*1 + 4*4 + 2*0 +</div>
<div class="line"> 1*0 + 2*2 + 1*3) / 16</div>
<div class="line"> = (3 + 2 + 2 + 2 + 16 + 0 + 0 + 4 + 3) / 16</div>
<div class="line"> = 32 / 16</div>
<div class="line"> = 2.0</div>
</div><!-- fragment --><p>For interior pixels the window always fits inside the image. For pixels near the border, the window extends outside the image boundary. A common strategy is boundary clamping: any out-of-bounds row or column index is clamped to the nearest valid index, effectively extending the image edge values outward.</p>
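<p>The hand computation and the clamping rule can be checked with a few lines of standard C++. The sketch below is illustrative only: <code>convolve_pixel</code>, <code>walkthrough_image</code> and <code>gaussian3x3</code> are hypothetical names introduced here, not part of Taskflow, and the kernel size is fixed at <code>3x3</code> for brevity.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of Taskflow): computes one output pixel of a
// 3x3 convolution over a row-major image, clamping out-of-range indices.
float convolve_pixel(const std::vector<float>& input, int rows, int cols,
                     const float (&kernel)[3][3], float norm, int i, int j) {
  assert(static_cast<int>(input.size()) == rows * cols);
  float sum = 0.0f;
  for (int ki = 0; ki < 3; ++ki) {
    for (int kj = 0; kj < 3; ++kj) {
      // the window is centred at (i, j), so offsets run from -1 to +1
      int si = std::clamp(i + ki - 1, 0, rows - 1);
      int sj = std::clamp(j + kj - 1, 0, cols - 1);
      sum += kernel[ki][kj] * input[si * cols + sj];
    }
  }
  return sum / norm;
}

// The 5x5 image and the Gaussian kernel from the walkthrough.
const std::vector<float> walkthrough_image = {
  1, 2, 0, 1, 3,
  0, 3, 1, 2, 1,
  2, 1, 4, 0, 2,
  1, 0, 2, 3, 1,
  0, 2, 1, 1, 0,
};
const float gaussian3x3[3][3] = {
  { 1, 2, 1 },
  { 2, 4, 2 },
  { 1, 2, 1 },
};
```

<p>Calling <code>convolve_pixel(walkthrough_image, 5, 5, gaussian3x3, 16.0f, 2, 2)</code> returns <code>2.0</code>, matching the hand computation. At the corner <code>(0, 0)</code> the clamped window repeats the edge values and the result is <code>18 / 16 = 1.125</code>.</p>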
<h1><a class="anchor" id="Conv2DParallelism"></a>
Why it Maps Perfectly to IndexRanges?</h1>
<p>2D image convolution maps perfectly onto <code>tf::IndexRanges<int,2></code> because every output pixel <code>(i, j)</code> depends only on a fixed neighbourhood in the input and on the kernel. Additionally, since no two output pixels write to the same memory location, all output pixels can be computed simultaneously with no synchronization at all. The iteration space is inherently 2D: output row <code>i</code> ranges over <code>[0, output_rows)</code> and output column <code>j</code> ranges over <code>[0, output_cols)</code>. <code>tf::IndexRanges<int, 2></code> expresses this directly as the Cartesian product of two 1D ranges, and <a class="el" href="classtf_1_1FlowBuilder.html#a2582a216d54dacca2b7022ea7e89452a" title="constructs a parallel-for task over a one- or multi-dimensional index range">tf::Taskflow::for_each_by_index</a> partitions this 2D space into axis-aligned sub-boxes and assigns each sub-box to a worker.</p>
<p>The following figure illustrates how a 6×6 output image is split into four sub-boxes, one per worker:</p>
<div class="image">
<object type="image/svg+xml" data="conv_parallel.svg" style="pointer-events: none;"></object>
</div>
<p>Each worker processes its assigned sub-box independently, writing to a disjoint region of the output image. No locks, atomics, or barriers are needed at any point during the sweep.</p>
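<p>The disjointness claim can be made concrete with a small standalone sketch. It does not reproduce how Taskflow actually partitions an index range; <code>Box</code>, <code>split_into_boxes</code> and <code>count_writes</code> are hypothetical names, and a fixed tile grid is assumed purely for illustration.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical description of one axis-aligned sub-box: the half-open
// ranges [i0, i1) x [j0, j1) of output pixels.
struct Box {
  int i0, i1, j0, j1;
};

// Splits [0, rows) x [0, cols) into tiles of at most tile_rows x tile_cols
// pixels. Tiles on the bottom and right edges are truncated to fit.
std::vector<Box> split_into_boxes(int rows, int cols,
                                  int tile_rows, int tile_cols) {
  assert(tile_rows > 0 && tile_cols > 0);
  std::vector<Box> boxes;
  for (int i = 0; i < rows; i += tile_rows) {
    for (int j = 0; j < cols; j += tile_cols) {
      boxes.push_back({i, std::min(i + tile_rows, rows),
                       j, std::min(j + tile_cols, cols)});
    }
  }
  return boxes;
}

// Counts how many boxes would write each output pixel. Disjoint coverage
// means every count is exactly one.
std::vector<int> count_writes(const std::vector<Box>& boxes,
                              int rows, int cols) {
  std::vector<int> writes(rows * cols, 0);
  for (const Box& b : boxes) {
    for (int i = b.i0; i < b.i1; ++i) {
      for (int j = b.j0; j < b.j1; ++j) {
        ++writes[i * cols + j];
      }
    }
  }
  return writes;
}
```

<p>For the 6×6 case, <code>split_into_boxes(6, 6, 3, 3)</code> yields four 3×3 boxes and <code>count_writes</code> reports exactly one writer per pixel, which is why the workers need no synchronization on the output.</p>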
<h1><a class="anchor" id="Conv2DImplementation"></a>
Implementation</h1>
<p>The complete self-contained example below applies a 3×3 Gaussian blur to a hand-constructed 6×6 input image. Boundary pixels use clamped addressing.</p>
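<div class="fragment"><div class="line"><span class="preprocessor">#include <taskflow/taskflow.hpp></span></div>
<div class="line"><span class="preprocessor">#include <algorithm></span> <span class="comment">// std::clamp</span></div>
<div class="line"><span class="preprocessor">#include <cstdio></span> <span class="comment">// printf</span></div>
<div class="line"><span class="preprocessor">#include <vector></span></div>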
<div class="line"> </div>
<div class="line"><span class="keywordtype">int</span> main() {</div>
<div class="line"> </div>
<div class="line"> <span class="comment">// ── image dimensions ──────────────────────────────────────────────────────</span></div>
<div class="line"> <span class="keyword">const</span> <span class="keywordtype">int</span> rows = 6;</div>
<div class="line"> <span class="keyword">const</span> <span class="keywordtype">int</span> cols = 6;</div>
<div class="line"> </div>
<div class="line"> <span class="comment">// ── input image (row-major, single channel) ───────────────────────────────</span></div>
<div class="line"> std::vector<float> input = {</div>
<div class="line"> 1, 2, 0, 1, 3, 2,</div>
<div class="line"> 0, 3, 1, 2, 1, 0,</div>
<div class="line"> 2, 1, 4, 0, 2, 1,</div>
<div class="line"> 1, 0, 2, 3, 1, 2,</div>
<div class="line"> 0, 2, 1, 1, 0, 3,</div>
<div class="line"> 3, 1, 0, 2, 1, 1,</div>
<div class="line"> };</div>
<div class="line"> </div>
<div class="line"> std::vector<float> output(rows * cols, 0.0f);</div>
<div class="line"> </div>
<div class="line"> <span class="comment">// ── 3x3 Gaussian blur kernel (normalised by 1/16) ─────────────────────────</span></div>
<div class="line"> <span class="keyword">const</span> <span class="keywordtype">int</span> ksize = 3;</div>
<div class="line"> <span class="keyword">const</span> <span class="keywordtype">int</span> kradius = ksize / 2; <span class="comment">// 1</span></div>
<div class="line"> <span class="keyword">const</span> <span class="keywordtype">float</span> knorm = 16.0f;</div>
<div class="line"> <span class="keyword">const</span> <span class="keywordtype">float</span> kernel[3][3] = {</div>
<div class="line"> { 1, 2, 1 },</div>
<div class="line"> { 2, 4, 2 },</div>
<div class="line"> { 1, 2, 1 },</div>
<div class="line"> };</div>
<div class="line"> </div>
<div class="line"> <span class="comment">// ── parallel 2D convolution ───────────────────────────────────────────────</span></div>
<div class="line"> <span class="comment">// IndexRanges<int,2> covers the full output pixel space [0,rows) x [0,cols).</span></div>
<div class="line"> <span class="comment">// Each sub-box delivered to the kernel is a contiguous 2D tile of output</span></div>
<div class="line"> <span class="comment">// pixels. Pixels in different tiles never share output locations, so no</span></div>
<div class="line"> <span class="comment">// synchronization is needed.</span></div>
<div class="line"> tf::IndexRanges<int, 2> range(</div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a>(0, rows, 1), <span class="comment">// dim 0: output rows</span></div>
<div class="line"> <a class="code hl_typedef" href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange<int></a>(0, cols, 1) <span class="comment">// dim 1: output columns</span></div>
<div class="line"> );</div>
<div class="line"> </div>
<div class="line"> tf::Executor executor;</div>
<div class="line"> tf::Taskflow taskflow;</div>
<div class="line"> </div>
<div class="line"> taskflow.<a class="code hl_function" href="classtf_1_1FlowBuilder.html#a2582a216d54dacca2b7022ea7e89452a">for_each_by_index</a>(range,</div>
<div class="line"> [&](<span class="keyword">const</span> tf::IndexRanges<int, 2>& box) {</div>
<div class="line"> <span class="keyword">auto</span> [i0, i1, is] = box.<a class="code hl_function" href="classtf_1_1IndexRanges.html#a4e0162b872edd6176e9d6a308b295427">dim</a>(0);</div>
<div class="line"> <span class="keyword">auto</span> [j0, j1, js] = box.<a class="code hl_function" href="classtf_1_1IndexRanges.html#a4e0162b872edd6176e9d6a308b295427">dim</a>(1);</div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> i = i0; i < i1; i += is) {</div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> j = j0; j < j1; j += js) {</div>
<div class="line"> <span class="keywordtype">float</span> sum = 0.0f;</div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> ki = 0; ki < ksize; ki++) {</div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> kj = 0; kj < ksize; kj++) {</div>
<div class="line"> <span class="comment">// clamp-to-edge boundary handling</span></div>
<div class="line"> <span class="keywordtype">int</span> si = std::clamp(i + ki - kradius, 0, rows - 1);</div>
<div class="line"> <span class="keywordtype">int</span> sj = std::clamp(j + kj - kradius, 0, cols - 1);</div>
<div class="line"> sum += kernel[ki][kj] * input[si * cols + sj];</div>
<div class="line"> }</div>
<div class="line"> }</div>
<div class="line"> output[i * cols + j] = sum / knorm;</div>
<div class="line"> }</div>
<div class="line"> }</div>
<div class="line"> }</div>
<div class="line"> );</div>
<div class="line"> </div>
<div class="line"> executor.<a class="code hl_function" href="classtf_1_1Executor.html#a519777f5783981d534e9e53b99712069">run</a>(taskflow).wait();</div>
<div class="line"> </div>
<div class="line"> <span class="comment">// print output image</span></div>
<div class="line"> printf(<span class="stringliteral">"Output image (Gaussian blur):\n"</span>);</div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> i = 0; i < rows; i++) {</div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> j = 0; j < cols; j++) {</div>
<div class="line"> printf(<span class="stringliteral">"%5.2f "</span>, output[i * cols + j]);</div>
<div class="line"> }</div>
<div class="line"> printf(<span class="stringliteral">"\n"</span>);</div>
<div class="line"> }</div>
<div class="line"> </div>
<div class="line"> <span class="keywordflow">return</span> 0;</div>
<div class="line">}</div>
<div class="ttc" id="aclasstf_1_1Executor_html_a519777f5783981d534e9e53b99712069"><div class="ttname"><a href="classtf_1_1Executor.html#a519777f5783981d534e9e53b99712069">tf::Executor::run</a></div><div class="ttdeci">tf::Future< void > run(Taskflow &taskflow)</div><div class="ttdoc">runs a taskflow once</div></div>
<div class="ttc" id="aclasstf_1_1FlowBuilder_html_a2582a216d54dacca2b7022ea7e89452a"><div class="ttname"><a href="classtf_1_1FlowBuilder.html#a2582a216d54dacca2b7022ea7e89452a">tf::FlowBuilder::for_each_by_index</a></div><div class="ttdeci">Task for_each_by_index(R range, C callable, P part=P())</div><div class="ttdoc">constructs a parallel-for task over a one- or multi-dimensional index range</div></div>
<div class="ttc" id="aclasstf_1_1IndexRanges_html_a4e0162b872edd6176e9d6a308b295427"><div class="ttname"><a href="classtf_1_1IndexRanges.html#a4e0162b872edd6176e9d6a308b295427">tf::IndexRanges::dim</a></div><div class="ttdeci">const std::tuple< T, T, T > & dim(size_t d) const</div><div class="ttdoc">returns the (begin, end, step) tuple for dimension d (read-only)</div><div class="ttdef"><b>Definition</b> iterator.hpp:297</div></div>
<div class="ttc" id="anamespacetf_html_a6c928ec9248757ba8276e316ef26846b"><div class="ttname"><a href="namespacetf.html#a6c928ec9248757ba8276e316ef26846b">tf::IndexRange</a></div><div class="ttdeci">IndexRanges< T, 1 > IndexRange</div><div class="ttdoc">alias for the common 1D case of tf::IndexRanges</div><div class="ttdef"><b>Definition</b> iterator.hpp:971</div></div>
</div><!-- fragment --><p>The program output for the hand-constructed input is shown below. Every value is an exact multiple of 1/16; exact ties such as 1.125 are printed here with round-half-to-even (as glibc's <code>printf</code> does), so other C libraries may show 1.13 instead of 1.12:</p>
<div class="fragment"><div class="line">Output image (Gaussian blur):</div>
<div class="line"> 1.12 1.38 1.00 1.31 1.94 1.75</div>
<div class="line"> 1.12 1.69 1.62 1.44 1.38 1.00</div>
<div class="line"> 1.25 1.62 2.00 1.69 1.31 1.12</div>
<div class="line"> 0.94 1.19 1.75 1.69 1.44 1.75</div>
<div class="line"> 1.06 1.12 1.25 1.25 1.25 1.81</div>
<div class="line"> 2.00 1.25 0.88 1.12 1.19 1.31</div>
</div><!-- fragment --><dl class="section user"><dt>A real image example before and after Gaussian blur</dt><dd></dd></dl>
<div class="image">
<img src="conv_before_after.png" alt=""/>
</div>
<p>The image above demonstrates the effect of Gaussian blur. The left figure shows the original image with geometric shapes and salt-and-pepper noise, while the right figure shows the output after applying a <code>5x5</code> Gaussian blur: noise speckles are suppressed and sharp edges are softened because each output pixel is replaced by a weighted average of its neighbourhood, diluting isolated extreme values and smoothing abrupt transitions.</p>
<h1><a class="anchor" id="Conv2DDesignPoints"></a>
Design Points</h1>
<p>There are a few important design points worth noting for this example, which also apply generally to parallel convolution algorithms:</p>
<ul>
<li>Output pixels are unconditionally independent: Each pixel <code>(i, j)</code> writes exclusively to <code>output[i*cols+j]</code>. The input image is read-only during the sweep. No atomic operations or mutexes are needed, and the available parallelism is limited only by the number of output pixels; the speedup actually achieved also depends on memory bandwidth and scheduling overhead.</li>
<li>Boundary clamping inside the callable: The <code>std::clamp</code> calls handle boundary pixels without any special-casing outside the inner loop. Alternative boundary strategies (zero-padding, mirror, wrap-around) require only changing how the two source indices are computed inside the callable (or, for zero-padding, treating out-of-range samples as zero); the parallel structure is identical for all of them.</li>
<li>2D index range maps naturally to 2D pixel space: Using <code>tf::IndexRanges<int,2></code> instead of a flat 1D range over all pixels preserves the row-column structure of the problem. Each sub-box delivered to the callable is a geometrically contiguous tile of the output image, which keeps the input window accesses for adjacent output pixels close together in memory and benefits cache reuse across the inner loop over columns.</li>
<li>Partitioner choice: Each output pixel performs exactly <code>ksize*ksize</code> multiply-add operations regardless of its position, so the workload is uniform across pixels. <a class="el" href="classtf_1_1StaticPartitioner.html" title="class to construct a static partitioner for scheduling parallel algorithms">tf::StaticPartitioner</a> is therefore a good fit: it has the lowest scheduling overhead and balances uniform workloads well. It can be passed as the optional third argument of <code>for_each_by_index</code>; the listing above omits that argument and so runs with the default partitioner. There is little reason to pay for the runtime chunk scheduling of guided or dynamic partitioners when every tile does the same amount of work.</li>
</ul>
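<p>To illustrate the boundary strategies mentioned in the list above, here is a minimal sketch of the index mappings in standard C++. All function names are hypothetical, and the mirror variant shown repeats the edge pixel (other conventions exist that do not).</p>

```cpp
#include <algorithm>
#include <cassert>

// Clamp-to-edge: out-of-range indices snap to the nearest valid index.
int clamp_index(int i, int n) {
  assert(n > 0);
  return std::clamp(i, 0, n - 1);
}

// Wrap-around: indices are taken modulo n, so the image repeats periodically.
int wrap_index(int i, int n) {
  assert(n > 0);
  return ((i % n) + n) % n;
}

// Mirror with the edge pixel repeated: -1 maps to 0, -2 to 1, n to n-1.
int mirror_index(int i, int n) {
  assert(n > 0);
  int period = 2 * n;
  int m = ((i % period) + period) % period;
  return m < n ? m : period - 1 - m;
}

// Zero-padding has no source index to map to: out-of-range samples are zero.
float sample_zero_padded(const float* image, int rows, int cols, int i, int j) {
  bool inside = (0 <= i && i < rows && 0 <= j && j < cols);
  return inside ? image[i * cols + j] : 0.0f;
}
```

<p>Swapping <code>std::clamp</code> in the inner loop for one of these mappings changes only the boundary behaviour; the loop over sub-boxes and the writes to <code>output</code> stay the same.</p>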
<p>For large images or deep filter stacks, cache locality of the input window accesses becomes the dominant performance factor. Tiling the output into cache-sized sub-boxes (so that the corresponding input windows fit in L2 or L3 cache) can significantly improve throughput. <code>tf::IndexRanges<int,2></code> naturally expresses this tiling: simply choose a tile size that fits the relevant input region in cache and let the partitioner handle distribution across workers. </p>
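<p>A rough way to pick such a tile size is to estimate the input footprint of one tile: the tile itself plus a halo of <code>kradius</code> pixels on every side. The sketch below is a back-of-the-envelope estimate under simplifying assumptions (it ignores cache-line granularity, the output buffer and other data competing for cache); <code>tile_input_bytes</code> and <code>largest_square_tile</code> are hypothetical helpers, and how the chosen size is handed to Taskflow is not shown here.</p>

```cpp
#include <cassert>
#include <cstddef>

// Bytes of input touched by one tile of tile_rows x tile_cols output pixels:
// the tile plus a halo of kradius pixels on every side.
std::size_t tile_input_bytes(int tile_rows, int tile_cols, int kradius,
                             std::size_t bytes_per_pixel) {
  assert(tile_rows > 0 && tile_cols > 0 && kradius >= 0);
  std::size_t h = static_cast<std::size_t>(tile_rows + 2 * kradius);
  std::size_t w = static_cast<std::size_t>(tile_cols + 2 * kradius);
  return h * w * bytes_per_pixel;
}

// Largest square tile side whose input footprint fits in cache_bytes,
// or 0 if even a 1x1 tile does not fit.
int largest_square_tile(std::size_t cache_bytes, int kradius,
                        std::size_t bytes_per_pixel) {
  int side = 0;
  while (tile_input_bytes(side + 1, side + 1, kradius, bytes_per_pixel)
         <= cache_bytes) {
    ++side;
  }
  return side;
}
```

<p>For example, with single-channel <code>float</code> pixels and a <code>3x3</code> kernel (<code>kradius = 1</code>), a 256 KiB budget admits a 254×254 tile, since <code>(254 + 2) * (254 + 2) * 4</code> is exactly 262144 bytes.</p>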
</div></div><!-- contents -->
</div><!-- PageDoc -->
</div><!-- doc-content -->
<!-- HTML footer for doxygen 1.13.1-->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="navelem"><a class="el" href="Examples.html">Learning from Examples</a></li>
<li class="footer">
Maintained by <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>
—
Generated by <a href="https://www.doxygen.org/index.html"><img class="footer" src="doxygen.svg" width="104" height="31" alt="doxygen"/></a> 1.13.1
</li>
</ul>
</div>