<!-- HTML header for doxygen 1.13.1-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=11"/>
<meta name="generator" content="Doxygen 1.13.1"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>Taskflow: A General-purpose Task-parallel Programming System: Profile Taskflow Programs</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<script type="text/javascript" src="clipboard.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="cookie.js"></script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="custom.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr id="projectrow">
<td id="projectlogo"><img alt="Logo" src="taskflow_logo.png"/></td>
<td id="projectalign">
<div id="projectname"><a href="https://github.com/taskflow/taskflow" style="color:inherit; text-decoration:none;">Taskflow: A General-purpose Task-parallel Programming System</a>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.13.1 -->
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
var searchBox = new SearchBox("searchBox", "search/",'.html');
/* @license-end */
</script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function() { codefold.init(0); });
/* @license-end */
</script>
<script type="text/javascript" src="menudata.js"></script>
<script type="text/javascript" src="menu.js"></script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function() {
initMenu('',true,false,'search.php','Search',true);
$(function() { init_search(); });
});
/* @license-end */
</script>
<div id="main-nav"></div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */
$(function(){initNavTree('Profiler.html',''); initResizable(true); });
/* @license-end */
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<div id="MSearchResults">
<div class="SRPage">
<div id="SRIndex">
<div id="SRResults"></div>
<div class="SRStatus" id="Loading">Loading...</div>
<div class="SRStatus" id="Searching">Searching...</div>
<div class="SRStatus" id="NoMatches">No Matches</div>
</div>
</div>
</div>
</div>
<div><div class="header">
<div class="headertitle"><div class="title">Profile Taskflow Programs</div></div>
</div><!--header-->
<div class="contents">
<div class="toc"><h3>Table of Contents</h3>
<ul>
<li class="level1">
<a href="#ProfilerEnableTFProf">Enable Taskflow Profiler</a>
</li>
<li class="level1">
<a href="#ProfilerTFPFormat">The .tfp Binary Format</a>
</li>
<li class="level1">
<a href="#ProfilerWebInterface">Visualize with the TFProf Web Interface</a>
<ul>
<li class="level2">
<a href="#ProfilerToolbar">Toolbar</a>
</li>
<li class="level2">
<a href="#ProfilerExecutorFilter">Executor Filter</a>
</li>
<li class="level2">
<a href="#ProfilerTimeline">Execution Timeline</a>
</li>
<li class="level2">
<a href="#ProfilerOverview">Overview Minimap</a>
</li>
<li class="level2">
<a href="#ProfilerParallelism">Task Parallelism vs Active Workers</a>
</li>
<li class="level2">
<a href="#ProfilerCriticalTasks">Critical Tasks</a>
</li>
<li class="level2">
<a href="#ProfilerExecutorStats">Executor Statistics</a>
</li>
<li class="level2">
<a href="#ProfilerHistogram">Task Duration Distribution</a>
</li>
</ul>
</li>
<li class="level1">
<a href="#ProfilerDisplayProfileSummary">Display Profile Summary to Console</a>
</li>
<li class="level1">
<a href="#ProfilerTips">Profiling Tips</a>
<ul>
<li class="level2">
<a href="#ProfilerTipsLarge">Large Traces</a>
</li>
<li class="level2">
<a href="#ProfilerTipsRecursive">Recursive and Nested Subflows</a>
</li>
<li class="level2">
<a href="#ProfilerTipsMultiExecutor">Multiple Executors</a>
</li>
<li class="level2">
<a href="#ProfilerTipsZoomWorkflow">Recommended Workflow</a>
</li>
</ul>
</li>
</ul>
</div>
<div class="textblock"><p>Taskflow comes with a built-in profiler, <em>TFProf</em>, for profiling and visualizing the execution of Taskflow programs. TFProf records every task execution across all worker threads in every executor and produces either a compact binary trace file (<code>.tfp</code>) for interactive visualization or a concise text summary on standard error.</p>
<h1><a class="anchor" id="ProfilerEnableTFProf"></a>
Enable Taskflow Profiler</h1>
<p>All taskflow programs include a lightweight, always-available profiling module. No recompilation or special build flags are needed. To activate it, set the environment variable <code>TF_ENABLE_PROFILER</code> to the desired output file path before running your program:</p>
<div class="fragment"><div class="line">~$ TF_ENABLE_PROFILER=result.tfp ./my_taskflow</div>
</div><!-- fragment --><p>When the program finishes, it writes the profiling data to <code>result.tfp</code> in the TFProf binary format (<code>.tfp</code>). If no file path is given (i.e., the variable is set but empty), TFProf prints a concise text summary to standard error instead.</p>
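<p>If you launch the program from a script rather than a shell, the same variable can be set on the child process environment. The following is a minimal sketch, assuming a binary named <code>./my_taskflow</code>; the helper names <code>profiler_env</code> and <code>run_profiled</code> are hypothetical and not part of Taskflow.</p>

```python
import os
import subprocess

def profiler_env(output_path=""):
    """Return a copy of the current environment with TF_ENABLE_PROFILER set.

    An empty output_path requests the text summary on standard error.
    """
    env = dict(os.environ)
    env["TF_ENABLE_PROFILER"] = output_path
    return env

def run_profiled(command, output_path=""):
    """Run `command` (a list of arguments) with the profiler enabled."""
    return subprocess.run(command, env=profiler_env(output_path), check=True)

# Hypothetical usage, assuming ./my_taskflow exists in the working directory:
# run_profiled(["./my_taskflow"], "result.tfp")
```

<p>Copying <code>os.environ</code> instead of mutating it keeps the profiler setting scoped to the child process.</p>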
<h1><a class="anchor" id="ProfilerTFPFormat"></a>
The .tfp Binary Format</h1>
<p>The <code>.tfp</code> file is a compact binary format designed for fast loading and efficient compression. Each segment record stores:</p>
<ul>
<li><b>delta_beg</b> : the time offset from the previous segment's start, encoded as a variable-length integer (varint). Because segments are time-ordered and adjacent tasks on the same worker tend to start close together, deltas are typically small (1–3 bytes rather than 8 bytes for an absolute timestamp).</li>
<li><b>duration</b> : the task execution time (<code>end</code> - <code>beg</code>), also varint-encoded.</li>
<li><b>name_off</b>, <b>name_len</b> : a reference into a per-executor string table that deduplicates task names.</li>
<li><b>type|name_len</b> : the task type and name length packed into a single byte.</li>
</ul>
<p>A file with <code>N</code> executors contains one 12-byte file header followed by <code>N</code> self-contained executor blocks, each with its own string table. This design keeps the format simple and allows each executor block to be decoded independently. In practice, the delta + varint encoding reduces file size by 63–71% compared to a naive fixed-width representation, so a 50 MB raw trace shrinks to roughly 15–19 MB without any external compression library.</p>
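<p>To make the delta + varint idea concrete, here is a minimal sketch that encodes only the <code>delta_beg</code> and <code>duration</code> fields. It assumes a LEB128-style varint (7 payload bits per byte, high bit as a continuation flag); the actual byte layout of a <code>.tfp</code> record is not specified on this page, so treat the function names and the encoding details as illustrative assumptions.</p>

```python
def encode_varint(value):
    """Encode a non-negative integer as LEB128-style bytes (7 bits per byte)."""
    if value < 0:
        raise ValueError("varint requires a non-negative integer")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data, pos=0):
    """Decode one varint starting at data[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7

def encode_segments(segments):
    """Encode time-ordered (beg, end) pairs as delta_beg and duration varints."""
    out = bytearray()
    prev_beg = 0
    for beg, end in segments:
        out += encode_varint(beg - prev_beg)  # delta from previous start
        out += encode_varint(end - beg)       # duration
        prev_beg = beg
    return bytes(out)

def decode_segments(data):
    """Inverse of encode_segments: rebuild absolute (beg, end) pairs."""
    segments = []
    pos = 0
    prev_beg = 0
    while pos < len(data):
        delta, pos = decode_varint(data, pos)
        duration, pos = decode_varint(data, pos)
        beg = prev_beg + delta
        segments.append((beg, beg + duration))
        prev_beg = beg
    return segments

# Three segments that would take 48 bytes as fixed-width 8-byte beg/end pairs.
example = [(1_000_000, 1_000_040), (1_000_050, 1_000_300), (1_000_310, 1_000_311)]
encoded = encode_segments(example)
```

<p>Only the first delta is large here; every later field fits in one or two bytes because it is measured relative to the previous segment.</p>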
<h1><a class="anchor" id="ProfilerWebInterface"></a>
Visualize with the TFProf Web Interface</h1>
<p>Open the TFProf web interface at <a href="https://taskflow.github.io/tfprof/">https://taskflow.github.io/tfprof/</a> and drop your <code>.tfp</code> file onto the page (or click <b>"Open .tfp"</b>). The interface is a self-contained HTML file with no server, no installation, and no network dependency; it runs entirely in your browser.</p>
<div class="image">
<img src="tfprof_start.png" alt="" width="100%"/>
</div>
<p>The interface is organized into the following panels from top to bottom:</p>
<h2><a class="anchor" id="ProfilerToolbar"></a>
Toolbar</h2>
<div class="image">
<img src="tfprof_toolbar.png" alt="" width="100%"/>
</div>
<p>The toolbar at the top shows:</p>
<ul>
<li><b>"Taskflow Profiler"</b> — the application title.</li>
<li><b>"Open .tfp"</b> — opens a file picker to load a trace.</li>
<li><b>"Reset Zoom"</b> — returns to the full time range.</li>
<li><b>"<- Back"</b> — steps back to the previous zoom window.</li>
<li><b>"Taskflow GitHub"</b> — links to the Taskflow repository.</li>
</ul>
<p>A statistics bar below the toolbar shows live summary values for the currently loaded trace and active zoom window: <b>Workers</b>, <b>Tasks</b>, <b>Wall</b> (total wall-clock duration), <b>Window</b> (current zoom range), and <b>Visible</b> (number of segments visible).</p>
<h2><a class="anchor" id="ProfilerExecutorFilter"></a>
Executor Filter</h2>
<div class="image">
<img src="tfprof_execfilter.png" alt="" width="30%"/>
</div>
<p>The <b>"Executors: All N ▾"</b> button opens a searchable popover listing every executor in the trace. Each row shows the executor ID alongside live statistics (task count, utilization, peak parallelism) drawn from the current zoom window, so you can immediately spot which executors are most active or most idle. All columns in the popover are sortable by clicking the column header — click once to sort descending, again to reverse. Selecting or deselecting executors instantly updates every panel below.</p>
<h2><a class="anchor" id="ProfilerTimeline"></a>
Execution Timeline</h2>
<div class="image">
<img src="tfprof_timeline.png" alt="" width="100%"/>
</div>
<p>The execution timeline is the main view. Each row represents one worker level (<code>E&lt;i&gt;.W&lt;j&gt;.L&lt;k&gt;</code> denotes executor <code>i</code>, physical worker <code>j</code>, nesting level <code>k</code>). A physical worker that spawns recursive subflows produces multiple levels; all levels share the same physical thread and are counted as one active worker.</p>
<p>Each colored segment represents a task execution, color-coded by task type:</p>
<table class="markdownTable">
<tr class="markdownTableHead">
<th class="markdownTableHeadNone">Color </th><th class="markdownTableHeadNone">Type </th></tr>
<tr class="markdownTableRowOdd">
<td class="markdownTableBodyNone">Blue </td><td class="markdownTableBodyNone">Static task </td></tr>
<tr class="markdownTableRowEven">
<td class="markdownTableBodyNone">Orange </td><td class="markdownTableBodyNone">Subflow task </td></tr>
<tr class="markdownTableRowOdd">
<td class="markdownTableBodyNone">Green </td><td class="markdownTableBodyNone">Condition task </td></tr>
<tr class="markdownTableRowEven">
<td class="markdownTableBodyNone">Red/Pink </td><td class="markdownTableBodyNone">Async task </td></tr>
<tr class="markdownTableRowOdd">
<td class="markdownTableBodyNone">Gray </td><td class="markdownTableBodyNone">Clustered (multiple tasks merged for display) </td></tr>
</table>
<p>When many tasks are too small to render individually at the current zoom level, TFProf merges adjacent tasks into a single <em>clustered</em> segment (shown in gray). Hover over any segment to see a tooltip with the task type, name, worker, duration, and start time. For clustered segments, the tooltip shows the task count and invites you to zoom in to see individual tasks.</p>
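<p>One plausible way to form clustered segments is to merge runs of neighboring tasks that are each narrower than a pixel at the current zoom level. The sketch below illustrates that idea only; it is not TFProf's actual rendering code, and the function name, the <code>time_per_pixel</code> parameter, and the merge rule are all assumptions.</p>

```python
def cluster_segments(segments, time_per_pixel, min_pixels=1.0):
    """Merge runs of adjacent sub-pixel segments on one timeline row.

    segments: time-ordered list of (beg, end) tuples for a single row.
    Returns a list of (beg, end, count); count > 1 marks a clustered segment.
    """
    min_span = time_per_pixel * min_pixels
    out = []  # entries are (beg, end, count, is_small)
    for beg, end in segments:
        small = (end - beg) < min_span
        if out:
            last_beg, last_end, count, last_small = out[-1]
            # Merge only when both neighbors are too small to draw and the gap
            # between them is itself below the drawable width.
            if small and last_small and beg - last_end < min_span:
                out[-1] = (last_beg, max(last_end, end), count + 1, True)
                continue
        out.append((beg, end, 1, small))
    return [(b, e, c) for b, e, c, _ in out]

row = [(0, 1), (1, 2), (2, 3), (100, 200), (200, 201)]
clustered = cluster_segments(row, time_per_pixel=10)
```

<p>With 10 time units per pixel, the first three tasks collapse into one segment with a count of 3, while the long task and the isolated short task after it stay separate.</p>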
<p><b>Zooming:</b> brush-select any horizontal region to zoom into that window. Double-click anywhere on the timeline to step back to the previous zoom level. The <b>Reset Zoom</b> button returns to the full trace.</p>
<p>The timeline uses virtual scrolling for large traces with thousands of workers — only the rows currently in the viewport are rendered, keeping the interface responsive regardless of worker count.</p>
<h2><a class="anchor" id="ProfilerOverview"></a>
Overview Minimap</h2>
<div class="image">
<img src="tfprof_overview.png" alt="" width="100%"/>
</div>
<p>The <b>Overview</b> panel below the timeline shows the entire trace compressed into a single minimap row per worker. A blue selection rectangle shows the current zoom window. Drag the selection to pan; brush a new region to jump there directly.</p>
<h2><a class="anchor" id="ProfilerParallelism"></a>
Task Parallelism vs Active Workers</h2>
<div class="image">
<img src="tfprof_parallelism.png" alt="" width="100%"/>
</div>
<p>This section contains two stacked panels that share the same time axis:</p>
<ul>
<li><b>Task</b> (top) — the number of tasks concurrently running at each point in time, drawn as a blue step-line area. This can exceed the physical worker count when subflow nesting produces multiple active tasks on the same thread.</li>
<li><b>Worker</b> (bottom) — the number of distinct physical workers simultaneously executing a task, drawn as a green filled area. Physical worker deduplication is applied: if worker <code>W1</code> is active at nesting levels <code>L0</code>, <code>L1</code>, and <code>L2</code> simultaneously, it is counted as one active worker.</li>
</ul>
<p>Both panels zoom together with the main timeline. Brush the Worker panel to zoom, or double-click to step back.</p>
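<p>The difference between the two curves can be reproduced with a simple sweep over task start and end times. This is a minimal sketch under the assumption that each segment carries a physical worker identifier and a nesting level; <code>concurrency_profiles</code> is a hypothetical helper, not a TFProf function.</p>

```python
def concurrency_profiles(segments):
    """Sweep over task boundaries and return step-function samples.

    segments: iterable of (worker, level, beg, end) with beg < end.
    Returns a list of (time, running_tasks, active_workers), one entry per
    distinct boundary time, giving the counts from that time onward.
    """
    events = []
    for worker, _level, beg, end in segments:
        events.append((beg, 1, worker))
        events.append((end, -1, worker))
    # At equal times, process ends (-1) before starts (+1) so that
    # back-to-back tasks are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    running = 0
    per_worker = {}  # worker -> number of tasks currently running on it
    samples = []
    for time, delta, worker in events:
        running += delta
        per_worker[worker] = per_worker.get(worker, 0) + delta
        if per_worker[worker] == 0:
            del per_worker[worker]
        sample = (time, running, len(per_worker))
        if samples and samples[-1][0] == time:
            samples[-1] = sample  # keep only the final state at this time
        else:
            samples.append(sample)
    return samples

# W1 runs three nested levels at once; W2 runs one task.
trace = [("W1", 0, 0, 10), ("W1", 1, 2, 8), ("W1", 2, 3, 5), ("W2", 0, 4, 6)]
profile = concurrency_profiles(trace)
```

<p>At time 4 the task count reaches 4 while only 2 physical workers are active, which is the situation where the Task curve exceeds the Worker curve.</p>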
<h2><a class="anchor" id="ProfilerCriticalTasks"></a>
Critical Tasks</h2>
<div class="image">
<img src="tfprof_criticaltasks.png" alt="" width="100%"/>
</div>
<p>The <b>"Critical Tasks"</b> bar chart ranks the top-N tasks by duration within the current zoom window. The default is top 50; adjust the number with the input field. Bars are color-coded by task type. Hover a bar to see the task details; click it to zoom the timeline to that task's time span (with 50% padding on each side, clamped to the trace bounds).</p>
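<p>The ranking and the click-to-zoom arithmetic can be sketched as follows. Whether TFProf ranks tasks that merely overlap the window or only those fully inside it is not stated here, so the overlap test below is an assumption, as are the helper names.</p>

```python
def top_tasks(tasks, window, n=50):
    """Return the n longest tasks that overlap window = (lo, hi).

    tasks: iterable of (name, beg, end).
    """
    lo, hi = window
    visible = [t for t in tasks if t[2] > lo and t[1] < hi]
    return sorted(visible, key=lambda t: t[2] - t[1], reverse=True)[:n]

def zoom_to_task(beg, end, trace_beg, trace_end, padding=0.5):
    """Window around [beg, end], padded by padding x duration on each side
    and clamped to the trace bounds."""
    pad = (end - beg) * padding
    return max(trace_beg, beg - pad), min(trace_end, end + pad)

tasks = [("a", 0, 10), ("b", 20, 60), ("c", 70, 75)]
longest = top_tasks(tasks, window=(0, 100), n=2)
window = zoom_to_task(20, 60, trace_beg=0, trace_end=100)
```

<p>For the 40-unit task <code>b</code>, 50% padding adds 20 units on each side, giving the window from 0 to 80.</p>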
<h2><a class="anchor" id="ProfilerExecutorStats"></a>
Executor Statistics</h2>
<div class="image">
<img src="tfprof_execstats.png" alt="" width="100%"/>
</div>
<p>The <b>"Executor Statistics"</b> table reports per-executor metrics computed over the current zoom window:</p>
<table class="markdownTable">
<tr class="markdownTableHead">
<th class="markdownTableHeadNone">Column </th><th class="markdownTableHeadNone">Meaning </th></tr>
<tr class="markdownTableRowOdd">
<td class="markdownTableBodyNone">Executor </td><td class="markdownTableBodyNone">Executor ID </td></tr>
<tr class="markdownTableRowEven">
<td class="markdownTableBodyNone">Workers </td><td class="markdownTableBodyNone">Number of distinct physical worker threads </td></tr>
<tr class="markdownTableRowOdd">
<td class="markdownTableBodyNone">Tasks </td><td class="markdownTableBodyNone">Total task executions in the window </td></tr>
<tr class="markdownTableRowEven">
<td class="markdownTableBodyNone">Wall Time </td><td class="markdownTableBodyNone">Length of the zoom window </td></tr>
<tr class="markdownTableRowOdd">
<td class="markdownTableBodyNone">Active Time </td><td class="markdownTableBodyNone">Union of all task intervals (time at least one worker was busy) </td></tr>
<tr class="markdownTableRowEven">
<td class="markdownTableBodyNone">Idle Time </td><td class="markdownTableBodyNone">Wall Time − Active Time </td></tr>
<tr class="markdownTableRowOdd">
<td class="markdownTableBodyNone">Utilization </td><td class="markdownTableBodyNone">Σ(worker active time) / (Workers × Wall Time) </td></tr>
<tr class="markdownTableRowEven">
<td class="markdownTableBodyNone">Peak </td><td class="markdownTableBodyNone">Maximum simultaneously active workers at any instant </td></tr>
<tr class="markdownTableRowOdd">
<td class="markdownTableBodyNone">Min Dur </td><td class="markdownTableBodyNone">Shortest individual task duration </td></tr>
<tr class="markdownTableRowEven">
<td class="markdownTableBodyNone">Avg Dur </td><td class="markdownTableBodyNone">Mean task duration </td></tr>
<tr class="markdownTableRowOdd">
<td class="markdownTableBodyNone">Max Dur </td><td class="markdownTableBodyNone">Longest individual task duration </td></tr>
</table>
<p>Click any column header to sort ascending or descending (indicated by ▲/▼). Utilization is color-coded: green ≥ 80%, amber 50–80%, red &lt; 50%.</p>
<p>All values update live as you zoom or filter executors.</p>
<p><b>Notes</b> printed below the table:</p><ul>
<li><b>Active</b> <b>Time</b> is the union of all task intervals (not the sum of individual durations), so overlapping tasks on different workers are not double-counted.</li>
<li><b>Utilization</b> counts time a worker is <em>executing</em> a task on any nesting level — if <code>W1</code> runs at <code>L0</code> and <code>L1</code> simultaneously, only one unit of worker time is counted per wall-clock instant.</li>
</ul>
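<p>The definitions of Active Time, Idle Time, and Utilization above can be written out as a short computation. This is a sketch, assuming segments are given as <code>(worker, beg, end)</code> tuples and that only workers with at least one task in the window are counted; TFProf may count workers differently.</p>

```python
def union_length(intervals):
    """Total length covered by possibly overlapping (beg, end) intervals."""
    total = 0
    cur_beg = cur_end = None
    for beg, end in sorted(intervals):
        if cur_end is None or beg > cur_end:
            if cur_end is not None:
                total += cur_end - cur_beg
            cur_beg, cur_end = beg, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_beg
    return total

def executor_stats(segments, window):
    """segments: iterable of (worker, beg, end); window: (lo, hi) with lo < hi."""
    lo, hi = window
    wall = hi - lo
    per_worker = {}
    for worker, beg, end in segments:
        beg, end = max(beg, lo), min(end, hi)  # clip to the zoom window
        if beg < end:
            per_worker.setdefault(worker, []).append((beg, end))
    # Active Time: union over all workers, so overlaps are not double-counted.
    active = union_length([iv for ivs in per_worker.values() for iv in ivs])
    # Per-worker busy time: union per worker, so nested levels count once.
    busy = sum(union_length(ivs) for ivs in per_worker.values())
    workers = len(per_worker)
    return {
        "workers": workers,
        "wall": wall,
        "active": active,
        "idle": wall - active,
        "utilization": busy / (workers * wall) if workers else 0.0,
    }

stats = executor_stats([("W1", 0, 10), ("W1", 2, 8), ("W2", 5, 15)], window=(0, 20))
```

<p>Here <code>W1</code> is busy for 10 units (its nested task adds nothing), <code>W2</code> for 10 units, and the union of all intervals is 15 units, so utilization is 20 / (2 × 20) = 50% while idle time is 5 units.</p>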
<h2><a class="anchor" id="ProfilerHistogram"></a>
Task Duration Distribution</h2>
<div class="image">
<img src="tfprof_histogram.png" alt="" width="100%"/>
</div>
<p>The <b>"Task Duration Distribution"</b> panel shows the shape of the task duration distribution for the active executor selection and zoom window, drawn as a cyan step-line area plot.</p>
<p>The x-axis is the task duration and the y-axis is the task count per bin. TFProf automatically selects linear or logarithmic binning based on two signals:</p>
<ol type="1">
<li><b>Range</b> <b>ratio</b> — if <code>max/min</code> > 50, the span is wide enough that linear bins would crush nearly all tasks into the leftmost few bins.</li>
<li><b>Skewness</b> <b>proxy</b> — if the mean duration is more than 2× the median, the distribution is right-skewed (a few very long tasks dominate).</li>
</ol>
<p>When both signals are present, log binning is used; otherwise linear binning is applied. Sub-nanosecond durations (below 1 ns = 0.001 µs) are clamped to the first bin; when this occurs the leftmost x-axis tick is labeled <code>&lt;1ns</code>.</p>
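<p>The selection rule described above can be sketched in a few lines. The thresholds (50 and 2×) come from the text; the function name and the decision to ignore non-positive durations (to avoid dividing by zero) are assumptions made for this illustration.</p>

```python
import statistics

def choose_binning(durations, ratio_threshold=50, skew_factor=2):
    """Return "log" when both signals fire, otherwise "linear"."""
    positive = [d for d in durations if d > 0]
    if not positive:
        return "linear"
    # Signal 1: range ratio max/min exceeds the threshold.
    wide_range = max(positive) / min(positive) > ratio_threshold
    # Signal 2: mean more than skew_factor times the median (right-skewed).
    skewed = statistics.mean(positive) > skew_factor * statistics.median(positive)
    return "log" if wide_range and skewed else "linear"

mostly_short = [1] * 98 + [1000, 2000]   # wide range and right-skewed
uniform = [10, 12, 14, 16]               # narrow range
wide_not_skewed = [1, 100, 100, 100]     # wide range, but mean below 2x median
```

<p>Only <code>mostly_short</code> satisfies both conditions; <code>wide_not_skewed</code> has a range ratio of 100 but a mean (75.25) below twice its median (100), so it stays on linear bins.</p>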
<h1><a class="anchor" id="ProfilerDisplayProfileSummary"></a>
Display Profile Summary to Console</h1>
<p>To get a quick overview without opening the browser, set <code>TF_ENABLE_PROFILER</code> to an empty string. TFProf will print a text summary to standard error for each executor:</p>
<div class="fragment"><div class="line"># enable the profiler without a file path to print summary to stderr</div>
<div class="line">~$ TF_ENABLE_PROFILER= ./my_taskflow_program</div>
</div><!-- fragment --><p>A typical summary looks like this:</p>
<div class="fragment"><div class="line">================================================================================</div>
<div class="line"> Observer 0 | Wall: 203.29 ms | Workers: 12 | Tasks: 45231 | Avg Utilization: 76.4%</div>
<div class="line">================================================================================</div>
<div class="line"> </div>
<div class="line">[Aggregate Task Statistics]</div>
<div class="line">------------------------------------------------------------------</div>
<div class="line"> Type Count Total(us) Avg(us) Min(us) Max(us)</div>
<div class="line">------------------------------------------------------------------</div>
<div class="line"> static 44892 1823451 40.62 1.00 285.00</div>
<div class="line"> async 339 12483 36.82 2.00 197.00</div>
<div class="line"> </div>
<div class="line">[Worker Utilization]</div>
<div class="line">----------------------------------------------------------------------------</div>
<div class="line"> Worker Tasks Busy(us) Idle(us) Avg(us) Min(us) Max(us) Util%</div>
<div class="line">----------------------------------------------------------------------------</div>
<div class="line"> 0 4821 155244 48049 32.20 1.0 285.0 76.4%</div>
<div class="line"> 1 3902 148821 54472 38.14 1.0 241.0 73.2%</div>
<div class="line"> ...</div>
<div class="line">----------------------------------------------------------------------------</div>
<div class="line"> Total 45231 1823451 76.4% (avg)</div>
</div><!-- fragment --><p>The summary has three sections:</p>
<ol type="1">
<li><b>Overview</b> — wall-clock duration, worker count, total task count, and average worker utilization. Utilization is the mean busy fraction across all workers; 100% means every worker was busy throughout the entire execution.</li>
<li><b>Aggregate</b> <b>Task</b> <b>Statistics</b> — execution statistics broken down by task type. Columns report the execution count, total time, average, minimum, and maximum per-task duration.</li>
<li><b>Worker</b> <b>Utilization</b> — per-worker breakdown listing task count, total busy time, idle time, average/min/max task duration, and per-worker utilization. Workers that ran no tasks are omitted. The <code>Total</code> row aggregates counts and times across all active workers.</li>
</ol>
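<p>The per-worker figures in the sample output are consistent with utilization being busy time divided by busy plus idle time, with busy plus idle equal to the wall-clock duration. The check below uses the two worker rows shown above; the formula is inferred from those numbers rather than taken from the TFProf source.</p>

```python
def worker_utilization(busy_us, idle_us):
    """Busy fraction of a worker's wall time, as a percentage."""
    return 100.0 * busy_us / (busy_us + idle_us)

# worker: (Busy(us), Idle(us)) copied from the sample summary
rows = {0: (155244, 48049), 1: (148821, 54472)}
for worker, (busy, idle) in rows.items():
    wall_ms = (busy + idle) / 1000
    print(f"worker {worker}: wall {wall_ms:.2f} ms, "
          f"util {worker_utilization(busy, idle):.1f}%")
```

<p>Both rows sum to 203293 µs, matching the 203.29 ms wall time in the header line, and yield 76.4% and 73.2% respectively.</p>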
<h1><a class="anchor" id="ProfilerTips"></a>
Profiling Tips</h1>
<h2><a class="anchor" id="ProfilerTipsLarge"></a>
Large Traces</h2>
<p>For programs with millions of tasks the <code>.tfp</code> file can be tens of megabytes. TFProf loads and parses the file entirely in a background browser thread so the page remains responsive during loading. The execution timeline uses virtual scrolling so even traces with thousands of worker rows render smoothly.</p>
<h2><a class="anchor" id="ProfilerTipsRecursive"></a>
Recursive and Nested Subflows</h2>
<p>Recursive taskflow programs (such as divide-and-conquer or Fibonacci-style graphs) produce many nesting levels per physical worker. The timeline labels these as <code>E&lt;i&gt;.W&lt;j&gt;.L&lt;k&gt;</code> where <code>L&lt;k&gt;</code> is the nesting depth. The <b>Worker</b> panel of the parallelism plot and the <b>Executor Statistics</b> table both deduplicate physical workers: if <code>W1</code> appears at levels <code>L0</code> through <code>L5</code> simultaneously, it counts as one active worker thread.</p>
<h2><a class="anchor" id="ProfilerTipsMultiExecutor"></a>
Multiple Executors</h2>
<p>When a program creates more than one <code><a class="el" href="classtf_1_1Executor.html" title="class to create an executor">tf::Executor</a></code>, TFProf records each one as a separate executor block in the <code>.tfp</code> file. Use the <b>Executor Filter</b> to focus on a single executor or compare multiple executors side by side. The Executor Statistics table always shows one row per executor, making it easy to spot load imbalance across executors.</p>
<h2><a class="anchor" id="ProfilerTipsZoomWorkflow"></a>
Recommended Workflow</h2>
<p>A typical profiling session follows this pattern:</p>
<ol type="1">
<li>Run your program with <code>TF_ENABLE_PROFILER=result.tfp</code>.</li>
<li>Open <a href="https://taskflow.github.io/tfprof/">https://taskflow.github.io/tfprof/</a> and drop <code>result.tfp</code> onto the page.</li>
<li>Examine the <b>Executor Statistics</b> table to find executors with low utilization or high idle time; these are the first candidates for optimization.</li>
<li>Use the <b>Executor Filter</b> to isolate a single executor.</li>
<li>Look at the <b>Task Parallelism</b> panel: a sustained low task count indicates sequential bottlenecks, while a high task count with a low worker count indicates scheduling overhead.</li>
<li>Click a bar in the <b>Critical Tasks</b> chart to zoom the timeline to the longest-running task and inspect its neighbors.</li>
<li>Examine the <b>Task Duration Distribution</b>: a bimodal distribution (two peaks) suggests two qualitatively different task categories that may benefit from being separated into different executors or sized differently.</li>
</ol>
</div></div><!-- contents -->
</div><!-- PageDoc -->
</div><!-- doc-content -->
<!-- HTML footer for doxygen 1.13.1-->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="navelem"><a class="el" href="Cookbook.html">Cookbook</a></li>
<li class="footer">
Maintained by <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>
—
Generated by <a href="https://www.doxygen.org/index.html"><img class="footer" src="doxygen.svg" width="104" height="31" alt="doxygen"/></a> 1.13.1
</li>
</ul>
</div>