Commit f8c3ed0

Merge pull request #147 from scijava/scijava-ops-benchmarks/fixups
Improve the SciJava Ops benchmarks
2 parents 940650f + 279a5be commit f8c3ed0

24 files changed

Lines changed: 916 additions & 1090 deletions

docs/ops/doc/Benchmarks.rst

Lines changed: 42 additions & 47 deletions
@@ -1,81 +1,76 @@
11
SciJava Ops Benchmarks
22
======================
33

4-
This page describes a quantitative analysis of the SciJava Ops framework, and is heavily inspired by a similar comparison of `ImgLib2 <https://imagej.net/libs/imglib2/benchmarks>`_.
4+
This page describes a quantitative analysis of the SciJava Ops framework, and is heavily inspired by a similar comparison of `ImgLib2 <https://imagej.net/libs/imglib2/benchmarks>`_. In all figures, benchmark times are displayed using bar charts (describing mean execution times, in microseconds) with error bars (used to denote the range of observed execution times).
55

66
Hardware and Software
77
---------------------
88

99
This analysis was performed with the following hardware:
1010

11-
* Dell Precision 7770
12-
* 12th Gen Intel i9-12950HX (24) @ 4.900GHz
13-
* 32 GB 4800 MT/s DDR5 RAM
11+
* 2021 Dell OptiPlex 5090 Small Form Factor
12+
* Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
13+
* 64 GB 3200 MHz DDR4 RAM
1414

1515
The following software components were used:
1616

17-
* Ubuntu 23.10
18-
* Kernel 6.5.0-26-generic
19-
* OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.8+10) with OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.8+10, mixed mode)
20-
* SciJava Incubator commit `77006edc <https://github.com/scijava/incubator/commit/77006edc6a567a08ec5aba39e56fdfab8d79a0b9>`_
17+
* Ubuntu 20.04.6 LTS
18+
* Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 20.0.1+9.1 (build 20.0.1+9-jvmci-23.0-b12, mixed mode, sharing)
19+
* SciJava Incubator commit `0b8012b2 <https://github.com/scijava/incubator/commit/0b8012b2b00ba84b0583ef7260fab1be8f251041>`_
2120
* ImageJ Ops version ``2.0.0``
2221

2322
All benchmarks are executed using the `Java Microbenchmark Harness <https://github.com/openjdk/jmh>`_, using the following parameters:
2423

2524
* Forked JVM
26-
* 2 warmup executions
25+
* 1 warmup execution
2726
* 2 10-second iterations per warm-up execution
2827
* 1 measurement execution
2928
* 5 5-second iterations per measurement execution
3029

3130
Op Matching
3231
-----------
3332

34-
We first analyze the performance of executing the following static method:
33+
We first analyze the performance of executing the following static method, written to contain the *fewest* instructions possible while still avoiding dead-code elimination by the just-in-time (JIT) compiler:
3534

3635
.. code-block:: java
3736
3837
/**
39-
* @param in the data to input to our function
40-
* @param d the value to add to each element in the input
41-
* @param out the preallocated storage buffer
42-
* @implNote op name="benchmark.match",type=Computer
38+
* Increments a byte value.
39+
*
40+
* @param data array containing the byte to increment
41+
* @implNote op name="benchmark.increment", type=Inplace1
4342
*/
44-
public static void op( //
45-
final RandomAccessibleInterval<DoubleType> in, //
46-
final Double d, //
47-
final RandomAccessibleInterval<DoubleType> out //
48-
) {
49-
LoopBuilder.setImages(in, out)
50-
.multiThreaded()
51-
.forEachPixel((i, o) -> o.set(i.get() + d));
43+
public static void incrementByte(final byte[] data) {
44+
data[0]++;
5245
}
5346
54-
We first benchmark the base penalty of executing this method using SciJava Ops, compared to direct execution of the static method. Notably, as this method requires a preallocated output buffer, we must either create it ourselves, *or* allow SciJava Ops to create it for us using an Op adaptation. Thus, we test the benchmark the following three scenarios:
47+
We first benchmark the overhead of executing this method through SciJava Ops, compared to direct execution of the static method. This method mutates a data structure in place, meaning the Ops engine can match it directly as an inplace Op, or **adapt** it to a function Op. Thus, we benchmark the following four scenarios:
5548

56-
* Output buffer creation + static method invocation
57-
* Output buffer creation + SciJava Ops invocation
58-
* SciJava Ops invocation using Op adaptation
49+
* Static method invocation
50+
* Output buffer creation + static method invocation **(A)**
51+
* SciJava Ops inplace invocation
52+
* SciJava Ops **function** invocation **(A)**
5953

60-
The results are shown in **Figure 1**. We find Op execution through the SciJava Ops framework adds a few milliseconds of additional overhead. A few additional milliseconds of overhead are observed when SciJava Ops is additionally tasked with creating an output buffer.
54+
The results are shown in **Figure 1**. We find that Op execution through the SciJava Ops framework adds approximately 100 microseconds of overhead. A further millisecond of overhead is observed when SciJava Ops also creates an output buffer.
6155

6256
.. chart:: ../images/BenchmarkMatching.json
6357

6458
**Figure 1:** Algorithm execution performance (lower is better)
6559

66-
Note that the avove requests are benchmarked without assistance from the Op cache, i.e. they are designed to model the full matching process. As repeated Op requests will utilize the Op cache, we benchmark cached Op retrieval separately, with results shown in **Figure 2**. These benchmarks suggest Op caching helps avoid the additional overhead of Op adaptation as its performance approaches that of normal Op execution.
60+
Note that the above requests are benchmarked without assistance from the Op cache, measuring the overhead of the full matching process. As repeated Op requests will utilize the Op cache, we benchmark cached Op execution separately, with results shown in **Figure 2**. From these results, we conclude that Op matching comprises the majority of SciJava Ops overhead, and repeated executions add only a few microseconds of overhead.
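To illustrate why a cache hit skips the matching cost, here is a minimal Python sketch of lookup memoization. It is an analogy only and does not reflect how SciJava Ops implements its Op cache; ``match_op``, ``cached_op``, the registry, and the ``match_calls`` counter are all hypothetical names introduced for this example.

```python
from typing import Callable, Dict, Tuple

# Hypothetical sketch: NOT the SciJava Ops implementation.
match_calls = 0  # counts how often the slow "matching" path runs


def increment_byte(data: bytearray) -> None:
    """Python stand-in for the benchmarked static method."""
    data[0] = (data[0] + 1) % 256


def match_op(name: str, arg_types: Tuple[type, ...]) -> Callable:
    """Stand-in for full Op matching, the expensive step."""
    global match_calls
    match_calls += 1
    registry = {("benchmark.increment", (bytearray,)): increment_byte}
    return registry[(name, arg_types)]


_cache: Dict[Tuple[str, Tuple[type, ...]], Callable] = {}


def cached_op(name: str, arg_types: Tuple[type, ...]) -> Callable:
    """Return a previously matched Op if one exists; otherwise match once and store it."""
    key = (name, arg_types)
    if key not in _cache:
        _cache[key] = match_op(name, arg_types)
    return _cache[key]


data = bytearray([0])
for _ in range(3):
    cached_op("benchmark.increment", (bytearray,))(data)

# Matching ran once; the two later requests were served from the cache.
print(match_calls, data[0])
```

Only the first request pays for matching in this sketch, which mirrors the pattern reported for Figure 2.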
6761

6862
.. chart:: ../images/BenchmarkCaching.json
6963

7064
**Figure 2:** Algorithm execution performance with Op caching (lower is better)
7165

72-
Finally, we benchmark the overhead of SciJava Ops parameter conversion. Suppose we instead wish to operate upon a ``RandomAccessibleInterval<ByteType>`` - we must convert it to call our Op. We consider the following procedures:
66+
Finally, we benchmark the overhead of SciJava Ops parameter conversion. Suppose we instead wish to operate upon a ``double[]``; we must first convert it to a ``byte[]`` to call our Op. We consider the following procedures:
7367

74-
* Image conversion + output buffer creation + static method invocation
75-
* output buffer creation + SciJava Ops invocation using Op conversion
76-
* SciJava Ops invocation using Op conversion and Op adaptation
68+
* Array conversion + static method invocation **(C)**
69+
* Array buffer creation + array conversion + static method invocation **(A+C)**
70+
* SciJava Ops converted inplace invocation **(C)**
71+
* SciJava Ops converted **function** invocation **(A+C)**
7772

78-
The results are shown in **Figure 3**; note the Op cache is **not** enabled. We observe overheads on the order of 10 milliseconds to perform Op conversion with and without Op adaptation.
73+
The results are shown in **Figure 3**; note the Op cache is **not** enabled. We find that parameter conversion imposes additional overhead of approximately 1 millisecond, and when both parameter conversion and Op adaptation are used the overhead (~2 milliseconds) is *additive*.
7974

8075
.. chart:: ../images/BenchmarkConversion.json
8176

@@ -84,35 +79,35 @@ The results are shown in **Figure 3**; note the Op cache is **not** enabled. We
8479
Framework Comparison
8580
--------------------
8681

87-
To validate our development efforts atop the original `ImageJ Ops <https://imagej.net/libs/imagej-ops/>`_ framework, we benchmark executions of the following method:
82+
To validate our development efforts atop the original `ImageJ Ops <https://imagej.net/libs/imagej-ops/>`_ framework, we additionally wrap the above static method within ImageJ Ops:
8883

8984
.. code-block:: java
9085
91-
/**
92-
* @param data the data to invert
93-
* @implNote op name="benchmark.invert",type=Inplace1
94-
*/
95-
public static void invertRaw(final byte[] data) {
96-
for (int i = 0; i < data.length; i++) {
97-
final int value = data[i] & 0xff;
98-
final int result = 255 - value;
99-
data[i] = (byte) result;
86+
/** Increment Op wrapper for ImageJ Ops. */
87+
@Plugin(type = Op.class, name = "benchmark.increment")
88+
public static class IncrementByteOp extends AbstractUnaryInplaceOp<byte[]>
89+
implements Op
90+
{
91+
92+
@Override
93+
public void mutate(byte[] o) {
94+
incrementByte(o);
10095
}
10196
}
10297
103-
We then benchmark the performance of executing this code using the following pathways:
98+
We then benchmark the performance of executing the static method using the following pathways:
10499

105100
* Static method invocation
106101
* SciJava Ops invocation
107-
* ImageJ Ops invocation (using a ``Class`` wrapper to make the method discoverable within ImageJ Ops)
102+
* ImageJ Ops invocation (using the above wrapper)
108103

109-
The results are shown in **Figure 4**. When algorithm matching dominates execution time, the SciJava Ops matching framework provides significant improvement in matching performance in comparison with the original ImageJ Ops framework.
104+
The results are shown in **Figure 4**. From this figure we can see that the "Op overhead" from ImageJ Ops is approximately 70x the "Op overhead" from SciJava Ops.
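As a sketch of how such a ratio could be derived, assume "Op overhead" means a framework's mean execution time minus the mean time of the direct static call. The helper names and the numbers below are hypothetical placeholders, not measured results.

```python
def op_overhead(framework_us: float, static_us: float) -> float:
    """Overhead attributed to the framework, in microseconds (assumed definition)."""
    return framework_us - static_us


def overhead_ratio(static_us: float, sj_us: float, ij_us: float) -> float:
    """Ratio of ImageJ Ops overhead to SciJava Ops overhead."""
    sj = op_overhead(sj_us, static_us)
    ij = op_overhead(ij_us, static_us)
    if sj <= 0:
        raise ValueError("SciJava Ops overhead must be positive to form a ratio")
    return ij / sj


# Placeholder inputs chosen only to exercise the arithmetic: (52 - 2) / (12 - 2)
ratio = overhead_ratio(static_us=2.0, sj_us=12.0, ij_us=52.0)
print(f"{ratio:.1f}x")
```

Subtracting the static baseline first keeps the algorithm's own cost out of the comparison, so the ratio reflects framework overhead only.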
110105

111106
.. chart:: ../images/BenchmarkFrameworks.json
112107

113108
**Figure 4:** Algorithm execution performance by Framework (lower is better)
114109

115-
Finally, here is a figure combining all the metrics above:
110+
We provide a final figure combining all the metrics above:
116111

117112
.. chart:: ../images/BenchmarkCombined.json
118113

docs/ops/graph_results.py

Lines changed: 86 additions & 73 deletions
@@ -5,103 +5,116 @@
55
import plotly.io as io
66

77
# This script parses JMH benchmarking results into charts developed using plot.ly (https://plotly.com/)
8-
# It currently develops one boxplot PER class, with each JMH benchmark method represented as a separate boxplot.
98
# It expects JMH benchmark results be dumped to a file "scijava-ops-benchmarks_results.json", within its directory.
109

11-
# If you'd like to add a title to the plotly charts, add an entry to the following dict.
12-
#
13-
# The key should be the simple name of the class containing the JMH benchmark
14-
# and the value should be the title of the chart
15-
figure_titles = {
16-
"BenchmarkFrameworks" : "Algorithm Execution Performance by Framework",
17-
"BenchmarkCaching" : "Caching Effects on Op Matching Performance",
18-
"BenchmarkConversion": "Parameter Conversion Performance",
19-
"BenchmarkMatching": "Basic Op Matching Performance",
20-
"BenchmarkCombined": "Combined Performance Metrics",
21-
}
22-
23-
# If you'd like to alias a particular test in the chart categories, add an entry to the following dict.
24-
#
25-
# The key should be the JMH benchmark method name, and the value should be the alias
26-
benchmark_categories = {
27-
"imageJOps" : "ImageJ Ops",
28-
"sciJavaOps": "SciJava Ops",
29-
"runStatic" : "Static Method",
30-
"runOp" : "Op Execution",
31-
"runOpCached": "Op Execution (cached)",
32-
"runOpConverted": "Op Execution (converted)",
33-
"runOpConvertedAdapted": "Op Execution (converted + adapted)",
34-
"runOpAdapted": "Op Execution (adapted)",
35-
}
36-
37-
# Read in the benchmark results
10+
# If you'd like to add a plotly chart, add an entry to the following list.
11+
12+
A = "<b style=\"color:black\">[<b style=\"color:#009E73\">A</b>]</b>"
13+
C = "<b style=\"color:black\">[<b style=\"color:#E69F00\">C</b>]</b>"
14+
AC = "<b style=\"color:black\">[<b style=\"color:#CC79A7\">AC</b>]</b>"
15+
figures = [
16+
{
17+
"name": "BenchmarkMatching",
18+
"title": "Basic Op Matching Performance",
19+
"bars": {
20+
"noOps": "Static Method",
21+
"noOpsAdapted": f"Static Method {A}",
22+
"sjOps": "SciJava Ops",
23+
"sjOpsAdapted": f"SciJava Ops {A}"
24+
}
25+
},
26+
{
27+
"name": "BenchmarkCaching",
28+
"title": "Caching Effects on Op Matching Performance",
29+
"bars": {
30+
"noOps": "Static Method",
31+
"sjOps": "SciJava Ops",
32+
"sjOpsWithCache": "SciJava Ops (cached)"
33+
}
34+
},
35+
{
36+
"name": "BenchmarkConversion",
37+
"title": "Parameter Conversion Performance",
38+
"bars": {
39+
"noOpsConverted": f"Static Method {C}",
40+
"noOpsAdaptedAndConverted": f"Static Method {AC}",
41+
"sjOpsConverted": f"SciJava Ops {C}",
42+
"sjOpsConvertedAndAdapted": f"SciJava Ops {AC}"
43+
}
44+
},
45+
{
46+
"name": "BenchmarkFrameworks",
47+
"title": "Algorithm Execution Performance by Framework",
48+
"bars": {
49+
"noOps": "Static Method",
50+
"sjOps": "SciJava Ops",
51+
"ijOps": "ImageJ Ops"
52+
}
53+
},
54+
{
55+
"name": "BenchmarkCombined",
56+
"title": "Combined Performance Metrics",
57+
"bars": {
58+
"noOps": "Static Method",
59+
"noOpsAdapted": f"Static Method {A}",
60+
"noOpsConverted": f"Static Method {C}",
61+
"noOpsAdaptedAndConverted": f"Static Method {AC}",
62+
"sjOpsWithCache": "SciJava Ops (cached)",
63+
"sjOps": "SciJava Ops",
64+
"sjOpsAdapted": f"SciJava Ops {A}",
65+
"sjOpsConverted": f"SciJava Ops {C}",
66+
"sjOpsConvertedAndAdapted": f"SciJava Ops {AC}",
67+
"ijOps": "ImageJ Ops",
68+
}
69+
}
70+
]
71+
72+
# Read in the benchmark results.
3873
with open("scijava-ops-benchmarks_results.json") as f:
3974
data = json.load(f)
4075

41-
# Build a map of results by benchmark class
42-
benchmark_classes = {}
43-
# And another map of results by benchmark test
44-
benchmark_tests = {}
76+
# Construct a mapping from test method to scores.
77+
results = {}
4578
for row in data:
46-
fqdn_tokens = row["benchmark"].split(".")
47-
cls, test = fqdn_tokens[-2], fqdn_tokens[-1]
48-
49-
# NB: Convert seconds to milliseconds
50-
score = 1000 * row["primaryMetric"]["score"]
51-
error = 1000 * row["primaryMetric"]["scoreError"]
52-
stats = {"score": score, "error": error}
53-
54-
if cls not in benchmark_classes:
55-
benchmark_classes[cls] = {}
56-
57-
benchmark_classes[cls][test] = stats
58-
59-
if test == "sciJavaOps":
60-
# NB: sciJavaOps == runOp
61-
test = "runOp"
62-
if test not in benchmark_tests:
63-
benchmark_tests[test] = []
64-
65-
benchmark_tests[test].append(stats)
79+
test = row["benchmark"].split(".")[-1]
80+
score = row["primaryMetric"]["score"]
81+
percentiles = row["primaryMetric"]["scorePercentiles"]
82+
minmax = [percentiles["0.0"], percentiles["100.0"]]
83+
results[test] = {"score": score, "minmax": minmax}
6684

67-
# Aggregate results into combined performance metrics
68-
benchmark_classes["BenchmarkCombined"] = {}
69-
for test, stats_list in benchmark_tests.items():
70-
# Take the average of all scores for this test
71-
score = statistics.fmean(stats["score"] for stats in stats_list)
72-
# Take the *worst* of all errors for this test
73-
error = max(stats["error"] for stats in stats_list)
74-
benchmark_classes["BenchmarkCombined"][test] = {"score": score, "error": error}
85+
# Build charts and dump them to JSON.
86+
for figure in figures:
87+
name = figure["name"]
88+
print(f"Generating figure for {name}", end="")
7589

76-
# For each class, build a chart and dump it to JSON
77-
for cls, test in benchmark_classes.items():
78-
print(f"Generating figure for {cls}", end="")
7990
x = []
8091
y = []
8192
error_y = []
93+
error_y_minus = []
8294

8395
# Add each bar configured for this figure
84-
for method, stats in sorted(test.items(), key=lambda item: item[1]["score"]):
85-
print(".", end="")
86-
method = benchmark_categories.get(method, method)
87-
x.append(method)
88-
y.append(stats["score"])
89-
error_y.append(stats["error"])
96+
for test, label in figure["bars"].items():
97+
print(".", end="")
98+
result = results[test]
99+
x.append(label)
100+
y.append(result["score"])
101+
error_y.append(result["minmax"][1] - result["score"])
102+
error_y_minus.append(result["score"] - result["minmax"][0])
90103

91104
# Create a bar chart
92105
fig = go.Figure()
93106
fig.add_bar(
94107
x=x,
95108
y=y,
96-
error_y=dict(type='data', array=error_y),
109+
error_y=dict(type='data', array=error_y, arrayminus=error_y_minus),
97110
)
98111
fig.update_layout(
99-
title_text=figure_titles.get(cls, "TODO: Add title"),
100-
yaxis_title="Performance (ms/op)"
112+
title_text=figure["title"] + f"<br><sup style=\"color: gray\">{A}=Adaptation, {C}=Conversion, {AC}=Adaptation & Conversion</sup>",
113+
yaxis_title="<b>Performance (&mu;s/execution)</b>"
101114
)
102115

103116
# Convert to JSON and dump
104-
with open(f"images/{cls}.json", "w") as f:
117+
with open(f"images/{name}.json", "w") as f:
105118
f.write(io.to_json(fig))
106119

107120
print()
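
The error-bar arithmetic in the updated script can be checked in isolation. The following sketch assumes a JMH JSON row with the same keys the script reads (``benchmark``, ``primaryMetric.score``, ``primaryMetric.scorePercentiles``). The sample row, its ``org.example`` package name, and the ``error_bars`` helper are made up for illustration.

```python
import json

# Hypothetical JMH-style result row; the values are placeholders, not real measurements.
sample = json.loads("""
[
  {
    "benchmark": "org.example.BenchmarkMatching.sjOps",
    "primaryMetric": {
      "score": 100.0,
      "scorePercentiles": {"0.0": 90.0, "50.0": 99.0, "100.0": 125.0}
    }
  }
]
""")


def error_bars(row: dict) -> dict:
    """Turn one result row into a bar height plus asymmetric error extents."""
    metric = row["primaryMetric"]
    score = metric["score"]
    percentiles = metric["scorePercentiles"]
    low, high = percentiles["0.0"], percentiles["100.0"]
    return {
        "test": row["benchmark"].split(".")[-1],  # method name only
        "score": score,
        "plus": high - score,   # upward extent (plotly error_y "array")
        "minus": score - low,   # downward extent (plotly error_y "arrayminus")
    }


bars = [error_bars(row) for row in sample]
print(bars[0])
```

Because the minimum and maximum are rarely equidistant from the mean, the upward and downward extents are computed separately, which matches the ``array`` and ``arrayminus`` arguments used in the script.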
