<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<title></title>
<script src="libs/jquery-1.11.3/jquery.min.js"></script>
<script src="libs/jqueryui-1.11.4/jquery-ui.min.js"></script>
<link href="libs/tocify-1.9.1/jquery.tocify.css" rel="stylesheet" />
<script src="libs/tocify-1.9.1/jquery.tocify.js"></script>
<link href="libs/bootstrap-3.3.5/css/yeti.min.css" rel="stylesheet" />
<script src="libs/bootstrap-3.3.5/js/bootstrap.min.js"></script>
<script src="libs/bootstrap-3.3.5/shim/html5shiv.min.js"></script>
<script src="libs/bootstrap-3.3.5/shim/respond.min.js"></script>
<style type="text/css">code{white-space: pre;}</style>
<link rel="stylesheet"
href="libs/highlight/textmate.css"
type="text/css" />
<script src="libs/highlight/highlight.js"></script>
<style type="text/css">
pre:not([class]) {
background-color: white;
}
</style>
<script type="text/javascript">
if (window.hljs && document.readyState && document.readyState === "complete") {
window.setTimeout(function() {
hljs.initHighlighting();
}, 0);
}
</script>
<link rel="stylesheet" href="styles.css" type="text/css" />
</head>
<body>
<style type = "text/css">
.main-container {
max-width: 940px;
margin-left: auto;
margin-right: auto;
}
code {
color: inherit;
background-color: rgba(0, 0, 0, 0.04);
}
img {
max-width:100%;
height: auto;
}
h1 {
font-size: 34px;
}
h1.title {
font-size: 38px;
}
h2 {
font-size: 30px;
}
h3 {
font-size: 24px;
}
h4 {
font-size: 18px;
}
h5 {
font-size: 16px;
}
h6 {
font-size: 12px;
}
</style>
<div class="container-fluid main-container">
<script>
$(function() {
// establish options
var options = {
selectors: "h1,h2,h3",
theme: "bootstrap3",
context: '.toc-content',
hashGenerator: function (text) {
return text.replace(/[.\/?&!#<>]/g, '').replace(/\s/g, '_').toLowerCase();
},
ignoreSelector: "h1.title",
scrollTo: 0
};
options.showAndHide = false;
options.smoothScroll = true;
// tocify
var toc = $("#TOC").tocify(options).data("toc-tocify");
});
</script>
<style type="text/css">
#TOC {
margin: 25px 0px 20px 0px;
}
@media (max-width: 768px) {
#TOC {
position: relative;
width: 100%;
}
}
.toc-content {
padding-left: 30px;
padding-right: 40px;
}
div.main-container {
max-width: 1200px;
}
div.tocify {
width: 20%;
max-width: 260px;
max-height: 85%;
}
@media (min-width: 768px) and (max-width: 991px) {
div.tocify {
width: 25%;
}
}
.tocify ul, .tocify li {
line-height: 20px;
}
.tocify-subheader .tocify-item {
font-size: 0.9em;
padding-left: 5px;
}
.tocify .list-group-item {
border-radius: 0px;
}
.tocify-subheader {
display: inline;
}
.tocify-subheader .tocify-item {
font-size: 0.95em;
padding-left: 10px;
}
</style>
<!-- setup 3col/9col grid for toc_float and main content -->
<div class="row-fluid">
<div class="col-sm-4 col-md-3">
<div id="TOC" class="tocify">
</div>
</div>
<div class="toc-content col-sm-8 col-md-9">
<div class="navbar navbar-default navbar-inverse navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/">Rcpp Parallel</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li><a href="/">Home</a></li>
<li><a href="/tbb.html">Intel TBB</a></li>
<li><a href="/simd.html">Boost.SIMD</a></li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li><a href="https://github.com/RcppCore/RcppParallel">GitHub</a></li>
</ul>
</div><!--/.nav-collapse -->
</div><!--/.container -->
</div><!--/.navbar -->
<script>
// manage active state of menu based on current page
$(document).ready(function () {
// active menu
var href = window.location.pathname;
href = href.substr(href.lastIndexOf('/'));
$('a[href="' + href + '"]').parent().addClass('active');
});
</script>
<h1 class="title">
<img id="logo" src="images/RcppParallelLogo.png" width="643" height="90" />
</h1>
<div id="overview" class="section level2">
<h2>Overview</h2>
<p>RcppParallel provides a complete toolkit for creating portable, high-performance parallel algorithms without requiring direct manipulation of operating system threads. RcppParallel includes:</p>
<ul>
<li><p><a href="https://www.threadingbuildingblocks.org/">Intel TBB</a> (v4.3), a C++ library for task parallelism with a wide variety of parallel algorithms and data structures (Windows, OS X, Linux, and Solaris x86 only).</p></li>
<li><p><a href="http://nt2.metascale.fr/doc/html/boost_simd.html">Boost.SIMD</a>, a C++ template library that provides portable (vis-à-vis instruction sets and compilers) access to SIMD extensions.</p></li>
<li><p><a href="http://tinythreadpp.bitsnbites.eu/">TinyThread</a>, a C++ library for portable use of operating system threads.</p></li>
<li><p><code>RVector</code> and <code>RMatrix</code> wrapper classes for safe and convenient access to R data structures in a multi-threaded environment.</p></li>
<li><p>High level parallel functions (<code>parallelFor</code> and <code>parallelReduce</code>) that use Intel TBB as a back-end on systems that support it and TinyThread on other platforms.</p></li>
</ul>
</div>
<div id="examples" class="section level2">
<h2>Examples</h2>
<p>Below are some simple examples of RcppParallel in use, along with the performance increases achieved over serial code. The benchmarks were executed on a 2.6GHz Haswell MacBook Pro with 4 cores (8 with hyperthreading).</p>
<p><a href="http://gallery.rcpp.org/articles/parallel-matrix-transform/">Parallel Matrix Transform</a> — Demonstrates using <code>parallelFor</code> to transform a matrix (take the square root of each element) in parallel. In this example the parallel version performs about 2.5x faster than the serial version.</p>
<p><a href="http://gallery.rcpp.org/articles/parallel-vector-sum/">Parallel Vector Sum</a> — Demonstrates using <code>parallelReduce</code> to take the sum of a vector in parallel. In this example the parallel version performs 4.5x faster than the serial version.</p>
<p><a href="http://gallery.rcpp.org/articles/parallel-distance-matrix/">Parallel Distance Matrix</a> — Demonstrates using <code>parallelFor</code> to compute pairwise distances for each row in an input data matrix. In this example the parallel version performs 5.5x faster than the serial version.</p>
<p><a href="http://gallery.rcpp.org/articles/parallel-inner-product/">Parallel Inner Product</a> — Demonstrates using <code>parallelReduce</code> to compute the inner product of two vectors in parallel. In this example the parallel version performs 2.5x faster than the serial version.</p>
<p>Studying the examples is a good way to get the hang of RcppParallel; however, you should still review this guide in detail, as it includes important documentation on thread safety, tuning, and using Intel TBB directly for more advanced use cases.</p>
</div>
<div id="getting-started" class="section level2">
<h2>Getting Started</h2>
<p>You can install the RcppParallel package from CRAN as follows:</p>
<pre class="r"><code>install.packages("RcppParallel")</code></pre>
<div id="sourcecpp" class="section level3">
<h3>sourceCpp</h3>
<p>Add the following to a standalone C++ source file to import RcppParallel:</p>
<pre class="cpp"><code>// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h></code></pre>
<p>When you compile the file using <code>Rcpp::sourceCpp</code> the required compiler and linker settings for RcppParallel will be automatically included in the compilation.</p>
</div>
<div id="r-packages" class="section level3">
<h3>R Packages</h3>
<p>If you want to use RcppParallel from within an R package you need to edit several files to create the requisite build and runtime links. The following additions should be made:</p>
<p><strong>DESCRIPTION</strong></p>
<pre class="yaml"><code>Imports: RcppParallel
LinkingTo: RcppParallel
SystemRequirements: GNU make</code></pre>
<p><strong>NAMESPACE</strong></p>
<pre class="r"><code>importFrom(RcppParallel, RcppParallelLibs)</code></pre>
<p><strong>src/Makevars</strong></p>
<pre class="make"><code>PKG_LIBS += $(shell ${R_HOME}/bin/Rscript -e "RcppParallel::RcppParallelLibs()")</code></pre>
<p><strong>src/Makevars.win</strong></p>
<pre class="make"><code>PKG_CXXFLAGS += -DRCPP_PARALLEL_USE_TBB=1
PKG_LIBS += $(shell "${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe" \
-e "RcppParallel::RcppParallelLibs()")</code></pre>
<p>Note that the Windows variation (Makevars.win) requires an extra <code>PKG_CXXFLAGS</code> entry that enables the use of TBB. This is because TBB is not used by default on Windows (for backward compatibility with a previous version of RcppParallel which lacked support for TBB on Windows).</p>
<p>After you’ve added the above to the package you can simply include the main RcppParallel package header in source files that need to use it:</p>
<pre class="cpp"><code>#include <RcppParallel.h></code></pre>
</div>
</div>
<div id="thread-safety" class="section level2">
<h2>Thread Safety</h2>
<p>A major goal of RcppParallel is to make it possible to write parallel code without traditional threading and locking primitives (which are notoriously complicated and difficult to get right). This is achieved for the most part by <code>parallelFor</code> and <code>parallelReduce</code>; however, the fact that the R API itself is single-threaded must also be taken into consideration.</p>
<div id="api-restrictions" class="section level3">
<h3>API Restrictions</h3>
<p>The code that you write within parallel workers should not call the R or Rcpp API in any fashion. This is because R is single-threaded, and concurrent interaction with its data structures can cause crashes and other undefined behavior. Here is the official guidance from <a href="https://cran.rstudio.com/doc/manuals/r-release/R-exts.html">Writing R Extensions</a>:</p>
<blockquote>
<p>Calling any of the R API from threaded code is ‘for experts only’: they will need to read the source code to determine if it is thread-safe. In particular, code which makes use of the stack-checking mechanism must not be called from threaded code.</p>
</blockquote>
<p>Not being able to call the R or Rcpp API creates an obvious challenge: how to read and write R vectors and matrices. Fortunately, R vectors and matrices are just contiguous arrays of <code>int</code>, <code>double</code>, etc., and so can be accessed using traditional array and pointer offsets. The next section describes a safe and high-level way to do this.</p>
</div>
<div id="safe-accessors" class="section level3">
<h3>Safe Accessors</h3>
<p>To provide safe and convenient access to the arrays underlying R vectors and matrices, RcppParallel introduces several accessor classes:</p>
<ul>
<li><p><code>RVector<T></code> — Wrap R vectors of various types</p></li>
<li><p><code>RMatrix<T></code> — Wrap R matrices of various types (also includes <code>Row</code> and <code>Column</code> classes)</p></li>
</ul>
<p>To create a thread safe accessor for an Rcpp vector or matrix just construct an instance of <code>RVector</code> or <code>RMatrix</code> with it. For example:</p>
<pre class="cpp"><code>// [[Rcpp::export]]
IntegerVector transformVector(IntegerVector x) {
RVector<int> input(x);
// etc...
}</code></pre>
<p>Similarly, if you need to return a vector as a result of a parallel transformation you should first create it using Rcpp then construct a wrapper for writing from multiple threads. For example:</p>
<pre class="cpp"><code>// [[Rcpp::export]]
IntegerVector transformVector(IntegerVector x) {
RVector<int> input(x); // create threadsafe wrapper to input
IntegerVector y(x.size()); // allocate output vector
RVector<int> output(y); // create threadsafe wrapper to output
// ...transform vector in parallel ...
return y;
}</code></pre>
</div>
<div id="locking" class="section level3">
<h3>Locking</h3>
<p>When using RcppParallel you typically do not need to worry about explicit locking, as the mechanics of <code>parallelFor</code> and <code>parallelReduce</code> (explained below) take care of providing safe windows into input and output data that have no possibility of contention. Nevertheless, if for some reason you do need to synchronize access to shared data, you can use the TinyThread locking classes (automatically available via <code>RcppParallel.h</code>):</p>
<table style="width:114%;">
<colgroup>
<col width="29%" />
<col width="84%" />
</colgroup>
<thead>
<tr class="header">
<th align="left">Function</th>
<th align="left">Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left"><a href="http://tinythreadpp.bitsnbites.eu/doc/classtthread_1_1lock__guard.html"><code>lock_guard</code></a></td>
<td align="left">Lock guard class. The constructor locks the mutex, and the destructor unlocks the mutex, so the mutex will automatically be unlocked when the lock guard goes out of scope.</td>
</tr>
<tr class="even">
<td align="left"><a href="http://tinythreadpp.bitsnbites.eu/doc/classtthread_1_1mutex.html"><code>mutex</code></a></td>
<td align="left">Mutual exclusion object for synchronizing access to shared memory areas for several threads. The mutex is non-recursive (i.e. a program may deadlock if the thread that owns a mutex object calls lock() on that object).</td>
</tr>
<tr class="odd">
<td align="left"><a href="http://tinythreadpp.bitsnbites.eu/doc/classtthread_1_1recursive__mutex.html"><code>recursive_mutex</code></a></td>
<td align="left">Mutual exclusion object for synchronizing access to shared memory areas for several threads. The mutex is recursive (i.e. a thread may lock the mutex several times, as long as it unlocks the mutex the same number of times).</td>
</tr>
<tr class="even">
<td align="left"><a href="http://tinythreadpp.bitsnbites.eu/doc/classtthread_1_1fast__mutex.html"><code>fast_mutex</code></a></td>
<td align="left">Mutual exclusion object for synchronizing access to shared memory areas for several threads. It is similar to the tthread::mutex class, but instead of using system level functions, it is implemented as an atomic spin lock with very low CPU overhead.</td>
</tr>
</tbody>
</table>
<p>See the complete <a href="http://tinythreadpp.bitsnbites.eu/doc/">TinyThread documentation</a> for additional details.</p>
<p>The TinyThread locking primitives will work on all platforms. If you are using TBB directly you can alternatively use the synchronization classes provided by TBB. See the section on <a href="tbb.html#synchronization">TBB Synchronization</a> for additional details.</p>
</div>
</div>
<div id="algorithms" class="section level2">
<h2>Algorithms</h2>
<p>RcppParallel provides two high level parallel algorithms: <code>parallelFor</code> can be used to convert the work of a standard serial “for” loop into a parallel one and <code>parallelReduce</code> can be used for accumulating aggregate or other values.</p>
<div id="parallelfor" class="section level3">
<h3>parallelFor</h3>
<p>To use <code>parallelFor</code>, you create a <code>Worker</code> object that defines an <code>operator()</code> which is called by the parallel scheduler. This function is passed a <code>[begin,end)</code> exclusive range which is a safe window (i.e. not in use by other threads) into the input or output data. Note that the <code>end</code> element is not included in the range (just like an STL <code>end</code> iterator).</p>
<p>For example, here’s a <code>Worker</code> object that takes the square root of its input and writes it into its output:</p>
<pre class="cpp"><code>// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
using namespace RcppParallel;
struct SquareRoot : public Worker
{
// source matrix
const RMatrix<double> input;
// destination matrix
RMatrix<double> output;
// initialize with source and destination
SquareRoot(const NumericMatrix input, NumericMatrix output)
: input(input), output(output) {}
// take the square root of the range of elements requested
void operator()(std::size_t begin, std::size_t end) {
std::transform(input.begin() + begin,
input.begin() + end,
output.begin() + begin,
::sqrt);
}
};
</code></pre>
<p>Note that <code>SquareRoot</code> derives from <code>RcppParallel::Worker</code>. This is required for function objects passed to <code>parallelFor</code>.</p>
<p>Here’s a function that calls the <code>SquareRoot</code> worker we defined:</p>
<pre class="cpp"><code>// [[Rcpp::export]]
NumericMatrix parallelMatrixSqrt(NumericMatrix x) {
// allocate the output matrix
NumericMatrix output(x.nrow(), x.ncol());
// SquareRoot functor (pass input and output matrices)
SquareRoot squareRoot(x, output);
// call parallelFor to do the work
parallelFor(0, x.length(), squareRoot);
// return the output matrix
return output;
}</code></pre>
</div>
<div id="parallelreduce" class="section level3">
<h3>parallelReduce</h3>
<p>To use <code>parallelReduce</code> you also create a <code>Worker</code> object; it should include:</p>
<ol style="list-style-type: decimal">
<li><p>A standard and “splitting” constructor. The standard constructor takes the input data and initializes whatever value is being accumulated (e.g. initialize a sum to zero). The splitting constructor is called when work needs to be split onto other threads—it takes a reference to the instance it is being split from and simply copies the pointer to the input data and initializes its “accumulated” value to zero.</p></li>
<li><p>An operator() which performs the work. This works just like the operator() in <code>parallelFor</code>, but instead of writing to another vector or matrix it typically will accumulate a value.</p></li>
<li><p>A <code>join</code> method which composes the operations of two worker instances that were previously split. Here we simply add the accumulated value of the instance being joined to our own.</p></li>
</ol>
<p>For example, here’s a <code>Worker</code> object that is used to sum a vector:</p>
<pre class="cpp"><code>// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
using namespace RcppParallel;
struct Sum : public Worker
{
// source vector
const RVector<double> input;
// accumulated value
double value;
// constructors
Sum(const NumericVector input) : input(input), value(0) {}
Sum(const Sum& sum, Split) : input(sum.input), value(0) {}
// accumulate just the element of the range I've been asked to
void operator()(std::size_t begin, std::size_t end) {
value += std::accumulate(input.begin() + begin, input.begin() + end, 0.0);
}
// join my value with that of another Sum
void join(const Sum& rhs) {
value += rhs.value;
}
};</code></pre>
<p>Now that we’ve defined the Worker, implementing the parallel sum function is straightforward. Just initialize an instance of <code>Sum</code> with an input vector and call <code>parallelReduce</code>:</p>
<pre class="cpp"><code>// [[Rcpp::export]]
double parallelVectorSum(NumericVector x) {
// declare the Sum instance
Sum sum(x);
// call parallel_reduce to start the work
parallelReduce(0, x.length(), sum);
// return the computed sum
return sum.value;
}</code></pre>
</div>
<div id="tbb-algorithms" class="section level3">
<h3>TBB Algorithms</h3>
<p>RcppParallel provides the <code>parallelFor</code> and <code>parallelReduce</code> algorithms; however, the TBB library includes a wealth of more advanced algorithms and other tools for parallelization. See the <a href="tbb.html">Intel TBB</a> article for additional details.</p>
</div>
</div>
<div id="tuning" class="section level2">
<h2>Tuning</h2>
<p>There are several settings available for tuning the behavior of parallel algorithms. These settings as well as benchmarking techniques are covered below.</p>
<div id="grain-size" class="section level3">
<h3>Grain Size</h3>
<p>The grain size of a parallel algorithm sets a minimum chunk size for parallelization. In other words, it determines the point at which input stops being split across separate threads (creating too many threads can degrade an algorithm’s performance by introducing excessive synchronization overhead).</p>
<p>By default the grain size for TBB (and thus for <code>parallelFor</code> and <code>parallelReduce</code>) is 1. You can change the grain size by passing an additional parameter to these functions. For example:</p>
<pre class="cpp"><code>parallelReduce(0, x.length(), sum, 100);</code></pre>
<p>This prevents ranges of fewer than 100 items from being split onto separate threads. You should experiment with various chunk sizes and use the benchmarking tools described below to measure their effectiveness. The Intel TBB website includes a detailed <a href="https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/Controlling_Chunking.htm">discussion of grain sizes and partitioning</a> with useful guidelines for tweaking grain sizes.</p>
</div>
<div id="threads-used" class="section level3">
<h3>Threads Used</h3>
<p>By default all of the available cores on a machine are used for parallel algorithms. You may instead want to use a fixed number of threads or a fixed proportion of cores available on the machine.</p>
<p>R rather than C++ functions are provided to control these settings so that users of your algorithm can control the use of resources on their system. You can call the <code>setThreadOptions</code> function to allocate threads. For example, the following sets a maximum of 4 threads:</p>
<pre class="r"><code>RcppParallel::setThreadOptions(numThreads = 4)</code></pre>
<p>To use a proportion of available cores you can use the <code>defaultNumThreads</code> function. For example, the following says to use half of the available cores on a system:</p>
<pre class="r"><code>library(RcppParallel)
setThreadOptions(numThreads = defaultNumThreads() / 2)</code></pre>
</div>
<div id="benchmarking" class="section level3">
<h3>Benchmarking</h3>
<p>As you experiment with various settings to tune your parallel algorithms you should always measure the results. The <strong>rbenchmark</strong> package has some useful tools for doing this. For example, here’s a benchmark of the parallel matrix square root example from above (in this case it’s a comparison against the serial version):</p>
<pre class="r"><code># allocate a matrix
m <- matrix(as.numeric(c(1:1000000)), nrow = 1000, ncol = 1000)
# ensure that serial and parallel versions give the same result
stopifnot(identical(matrixSqrt(m), parallelMatrixSqrt(m)))
# compare performance of serial and parallel
library(rbenchmark)
res <- benchmark(matrixSqrt(m),
parallelMatrixSqrt(m),
order="relative")
res[,1:4]</code></pre>
<pre><code> test replications elapsed relative
2 parallelMatrixSqrt(m) 100 0.294 1.000
1 matrixSqrt(m) 100 0.755 2.568</code></pre>
</div>
</div>
</div>
</div>
</div>
<script>
// add bootstrap table styles to pandoc tables
$(document).ready(function () {
$('tr.header').parent('thead').parent('table').addClass('table table-condensed');
});
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement("script");
script.type = "text/javascript";
script.src = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
document.getElementsByTagName("head")[0].appendChild(script);
})();
</script>
</body>
</html>