-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathhash-table.en.html
More file actions
319 lines (311 loc) · 22 KB
/
hash-table.en.html
File metadata and controls
319 lines (311 loc) · 22 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<title>Hash Container Notes — AMC Traffic Server Documentation</title>
<link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="../_static/sphinxdoc.css" />
<link rel="stylesheet" type="text/css" href="../_static/graphviz.css" />
<script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
<script src="../_static/jquery.js"></script>
<script src="../_static/underscore.js"></script>
<script src="../_static/doctools.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Diagrams" href="../reference.en.html" />
<link rel="prev" title="Traffic Server Layer 7 Working Group" href="L7R-Denver.en.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="../genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="../reference.en.html" title="Diagrams"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="L7R-Denver.en.html" title="Traffic Server Layer 7 Working Group"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="../index.html">SWOC Docs</a> »</li>
<li class="nav-item nav-item-this"><a href="">Hash Container Notes</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<section id="hash-container-notes">
<span id="id1"></span><h1>Hash Container Notes<a class="headerlink" href="#hash-container-notes" title="Permalink to this headline">¶</a></h1>
<p>Currently there is a plethora of hash table in the Traffic Server core. One the initiatives from the Cork was
to attempt to reduce this number. Before doing that successfully it is necessary to understand the
uses cases inside Traffic Server.</p>
<section id="use-cases">
<h2>Use Cases<a class="headerlink" href="#use-cases" title="Permalink to this headline">¶</a></h2>
<dl class="simple">
<dt>Simple Hash Table</dt><dd><p>Use of a hash table for rapid lookup of stored items.</p>
</dd>
<dt>Concurrent Hash Table</dt><dd><p>The simple hash table case with the additional requirement concurrency. In general this means at
most one writer and an arbitrary number of reads without explicit locks. Multiple writers are
handled by an explicit lock which is required only for table updates, not for searching or
reading.</p>
</dd>
<dt>Multi-index hash tables</dt><dd><p>In a few cases the items stored in the table need to be accessed via multiple distinct keys. The
archetypical example is the upstream session table which needs to be keyed by FQDN and IP
address.</p>
</dd>
<dt>External memory management</dt><dd><p>Because of the heavy use of proxy and class allocators, it is sometimes the case that memory and
life time management of objects is independent of the hash container. Again, the upstream session
table is archetypical in that sessions are created and destroyed elsewhere, outside the scope of
the container). Therefore inserting and removing an object should not cause its creation or
destruction. While it would be possible to use a container of smart pointers, this means
allocating and deallocating nodes in the container for inserts and deletes, in contrast to an
intrusive container which does not require such memory management.</p>
</dd>
</dl>
<p>A side property which is not required but is very conventient is the ability to use a member of the
object as a key. While the key can be split off, this tends to tie the object to the container.
Alternatively the key can be duplicated but if key requires local storage this requires another
memory allocation for each object created.</p>
</section>
<section id="designs">
<h2>Designs<a class="headerlink" href="#designs" title="Permalink to this headline">¶</a></h2>
<p>There are two basic approaches to supporting hash containers in Traffic Server.</p>
<section id="standard-template-library">
<h3>Standard Template Library<a class="headerlink" href="#standard-template-library" title="Permalink to this headline">¶</a></h3>
<p>The Standard Template Library provides the containers <code class="code docutils literal notranslate"><span class="pre">std::unordered_set</span></code> and
<code class="code docutils literal notranslate"><span class="pre">std::unordered_map</span></code>. These suffice for basic hash container use. The main concerns with the
STL containers are</p>
<ol class="arabic simple">
<li><p>Object lifetimes and memory management are tied to the container. This does not interact well
with proxy allocators for reasons noted above.</p></li>
<li><p>Relatedly, memory management is “invisible” and therfore tempting to use in places where
allocations should be avoided. It is easy to use STL containers in ways that have very bad memory
use characteristics.</p></li>
</ol>
<p>The primary concern here, as I understand it, is that use of STL containers for specific purposes
will encourage the same use in general, leading to significant performance degradation. This is not
an idle concern - I am currently involved inside Oath with another project, a plugin for Traffic Server, which
manages to consume as much CPU as the rest of Traffic Server put together due to uninformed choices in using
STL containers.</p>
<p>As noted previously, the memory management issues could be worked around using smart pointers.
However this would create a significant amount of additional memory management activity as the nodes
in the container will need to be allocated and released with every insert and removal.</p>
<p>Based on research in to allocators for STL containers, it does look like this is possible. A major
change in this area for C++11 is allocators can now be stateful. This is critical to prevent memory
migration between two containers, e.g. if memory is allocated from an allocator for container A, it
is not used for container B. However, currently this requires an update to <code class="code docutils literal notranslate"><span class="pre">MemArena</span></code> which is
blocked.</p>
</section>
<section id="intrusive-containers">
<h3>Intrusive Containers<a class="headerlink" href="#intrusive-containers" title="Permalink to this headline">¶</a></h3>
<p>Traffic Server has to a large extent rolled its own containers, primarily intrusive ones where the objects
directly support the containers. This is likely due to the style of memory management where objects
are allocatedand released independently of containers. This was originally a set of what became
legacy containers in the header file “lib/ts/Map.h”. These lacked not only documentation but any
compliance with STL standards, such as iterators, and handled only pointers. The containers also
depended on the <code class="code docutils literal notranslate"><span class="pre">Vec</span></code> class which is being phased out. For this reason, and to support
multi-indexing, the class <code class="code docutils literal notranslate"><span class="pre">TSHashTable</span></code> was created. <code class="code docutils literal notranslate"><span class="pre">TSHahhTable</span></code> had the advantages of</p>
<ul class="simple">
<li><p>Contain an arbitrary type <code class="code docutils literal notranslate"><span class="pre">T</span></code> as long as it had intrusive list support.</p></li>
<li><p>Control of the key.</p></li>
<li><p>Control of bucket expansion policy.</p></li>
<li><p>Iteration.</p></li>
</ul>
<p>These features make the container much easier for a standard C++ programmer to use as it enables
following standard STL use patterns.</p>
<p>The ability to select the key is valuable for performance as well because <code class="code docutils literal notranslate"><span class="pre">IntrusiveHashMap</span></code>
takes care to never default construct a key, but only initialize them. In addition the comparison
operator is under the control of the template parameters. This means the type used by the container
for search may be a reference or pointer to the actual key. This allows the type in the object to
own its memory while not imposing that requirement on the search key. For example, if the key is a
<code class="code docutils literal notranslate"><span class="pre">std::string</span></code> then the key used for searching can be <code class="code docutils literal notranslate"><span class="pre">const</span> <span class="pre">std::string</span> <span class="pre">&</span></code> or even a
<code class="code docutils literal notranslate"><span class="pre">std::string_view</span></code>, with the result that searching the container does not require
instantiating a <code class="code docutils literal notranslate"><span class="pre">std::string</span></code> and the attendant memory management. For an STL container,
searching requires creating a instance of key type and therefore also any memory allocation that
construction requires. There have been attempts to work around this by using STL containers and
making the key a reference or even a <code class="code docutils literal notranslate"><span class="pre">std::string_view</span></code> in to the object but these have failed
with obscure memory corruption issues. At present this is not a viable option. Additionally, it can
appear that lookups can be done on an STL container using such a reference, but type coercion will
cause the creation of the key transparently. This is a big deal for Traffic Server because in many cases the
source key is a string that is embedded in other memory (e.g., the name of a field in an HTTP
header).</p>
<p>Selecting the key also enabled the multi-index capability in that each index can select a different
key. In addition, if using STL containers then the general solution is to have one owner container
which has the actual objects and ancillary ones that contain pointers or iterators to the object in
the owner container. The problem here is that one must either duplicate the keys (for the ancillary
indices and in the object) or not have a key value available from other indices. This issue doesn’t
arise in a container that allows the key to be embedded in the object.</p>
<p>With the switch to C++11 and then C++17, it seemed reasonable to update this class to take advantage
of the new language features and become more STL compliant. This resulted in
<code class="code docutils literal notranslate"><span class="pre">IntrusiveHashMap</span></code> which has the advantages</p>
<ul class="simple">
<li><p>Use of <code class="code docutils literal notranslate"><span class="pre">IntrusiveDList</span></code>, a pure C++ class, instead of the C macro based linked lists used previously.</p></li>
<li><p>STL compliant support for ranges.</p></li>
<li><p>Full iterator support, replacing the psuedo-iterator <code class="code docutils literal notranslate"><span class="pre">TSHashTable::Location</span></code>.</p></li>
<li><p>Simplified specification, reducing the number of elements required in the descriptor type.</p></li>
</ul>
<p><code class="code docutils literal notranslate"><span class="pre">IntrusiveHashMap</span></code> has been through several iterations and example uses have be provided in
pull requests. The intention is for <code class="code docutils literal notranslate"><span class="pre">IntrusiveHashMap</span></code> to completely replace
<code class="code docutils literal notranslate"><span class="pre">TSHashTable</span></code> and uses of the other hash maps in “lib/ts/Map.h”. If the necessary support in
<code class="code docutils literal notranslate"><span class="pre">MemArena</span></code> (currently blocked) can be made available then <code class="code docutils literal notranslate"><span class="pre">IntrusiveHashMap</span></code> could be
used to replace the TCL hash tables.</p>
<p><code class="code docutils literal notranslate"><span class="pre">IntrusiveHashMap</span></code> works well with the standard allocation style used in Traffic Server. This will make
it relatively easy to replace current uses of internal legacy hash containers with</p>
<p><code class="code docutils literal notranslate"><span class="pre">IntrusiveHashMap</span></code> satisfies the specialized requirements for multi-indexed containers. Given
that, in my view, we will want this class for this reason, it seems a low marginal cost effort to
use the class in the other situations mentioned above. Not doing so will not remove the need for
this class.</p>
</section>
</section>
<section id="concurrency">
<h2>Concurrency<a class="headerlink" href="#concurrency" title="Permalink to this headline">¶</a></h2>
<p>Concurrency is hard to get right and fast. Currently there are custom concurrent hash containers in
Traffic Server. These use the technique of splitting locks per bucket, or using a pool locks <a class="footnote-reference brackets" href="#id3" id="id2">1</a>. The
alternative recommended at Cork is to use a well seasoned third party concurrent hash container. The
three options are</p>
<ol class="arabic simple">
<li><p><a class="reference external" href="https://www.threadingbuildingblocks.org/">Thread Building Blocks</a></p></li>
<li><p><a class="reference external" href="http://libcds.sourceforge.net/">Concurrent Data Structures</a></p></li>
<li><p><a class="reference external" href="http://concurrencykit.org/">Concurrency Kit</a></p></li>
</ol>
<p>Each of these has issues from the Traffic Server point of view but, in my opinion, any would be a improvement
over the current state of concurrent containers.</p>
<dl class="simple">
<dt>Thread Building Blocks</dt><dd><p>This is sponsored by Intel, but it is a heavy weight solution designed for a different domain
(compute intensive data processing) than that of Traffic Server (event driven I/O processing). Use of TBB
depends on being able to use just the concurrent containes and not most of the library. TBB is
also under development and would require periodic updates.</p>
</dd>
<dt>Concurrent Data Structures</dt><dd><p>This is a nice set of concurrent containers although the documentation is a bit thin. The main
problem here, in my view, is its dependency on thread support. It is unclear to me why this is
necessary, but the main thread and each thread that uses the library must explicitly “attach” to
the library in the thread. The development status is unknown.</p>
</dd>
<dt>Concurrency Kit</dt><dd><p>This is a very mature, well tested and fast set of concurrency containers. The problem here is it
is written in C and not amenable to easy conversion to C++11. In fact, the C code is such that it
will not even compile as C++ without extension re-skinning. It has been discussed to start a
“CK++” project to do this, which was endorsed by the Concurrency Kit author. The base C version
is a completed project, there is no active development.</p>
</dd>
</dl>
</section>
<section id="conclusions">
<h2>Conclusions<a class="headerlink" href="#conclusions" title="Permalink to this headline">¶</a></h2>
<p>While the various merits and costs of the various concurrent container libraries can be debated, the
reality is likely we will adopt which ever of these is first brought in to compatibility with Traffic Server.
My personal preference is for CK++, the reskinning of Concurrency Kit. However, I expect any of the
choices will be good enough.</p>
<p>For uses cases that do not require concurrency I think concurrent containers should not be used.
There is a cost to using them and why pay that if it’s not necessary? Concurrent containers should
be used iff there is a clear reduction in use of an explicit lock. E.g. using a concurrent hash
container means that read operations no longer need to get a lock.</p>
<p>For general hash containers there are two issues, ease of use and performance.</p>
<p>I think for our use case, where many objects are allocated via proxy allocators, the intrusive
containers are a clear win. They clearly do less work, because they do not do any memory management.
On the other hand, there are many uses of hash containers where either performance isn’t critical
(e.g.. process static tables) or where the objects aren’t proxy allocated. In such cases I see no
reason to not use STL containers, other than the carry on effects noted above. There is also the
argument that use of jemalloc or even modern glibc provides the equivalent of per thread proxy
allocators without the explicit coding requirements of the Traffic Server internal ones and therefore the
performance differences will be minimal. I don’t consider that argument because the concensus, as I
understand it, is that for the forseeable future the code base will keep the proxy allocators.</p>
<p>I find the ease of use argument comes to the same point. For proxy allocated objects, for
multi-indexing situations, and where the key is naturally embedded in the object,
<code class="code docutils literal notranslate"><span class="pre">IntrusiveHashMap</span></code> is easier to use. I have for decades thought the require separation of key
and object in STL containers is a problem to be worked around, not a feature, and that’s no small
part of why I like <code class="code docutils literal notranslate"><span class="pre">IntrusiveHashMap</span></code> better. No small part of the recent upgrade was to make
the class more STL compliant so that it acts like an STL container, e.g. range <code class="code docutils literal notranslate"><span class="pre">for</span></code> loops
work as with an STL container. I agree that for cases where the object exists only in the container,
the STL containers are the easier to use choice. It is also the case that <code class="code docutils literal notranslate"><span class="pre">IntrusiveHashmap</span></code>
<em>requires</em> the key to be in the object. This is usually reasonable but can be a problem in some
cases. Obviously those are ones where an STL container is a better choice.</p>
<p>Overall, then, for basic hash containers I would have</p>
<ul class="simple">
<li><p><code class="code docutils literal notranslate"><span class="pre">IntrusiveHashMap</span></code> for high performance use cases, and ease of use for the noted use cases.</p></li>
<li><p>Use STL containers for process statics or non-performant code. I think it should be OK to make
this the default choice and use either <code class="code docutils literal notranslate"><span class="pre">IntrusiveHashMap</span></code> or a concurrent container if a
specific need to do so exists.</p></li>
</ul>
<p class="rubric">Footnotes</p>
<dl class="footnote brackets">
<dt class="label" id="id3"><span class="brackets"><a class="fn-backref" href="#id2">1</a></span></dt>
<dd><p>In some sense these aren’t different - if the mutex in the pool is selected by key hashing it is
effectively the same as a mutex per bucket.</p>
</dd>
</dl>
</section>
</section>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<p class="logo"><a href="../index.html">
<img class="logo" src="../_static/balcora-gate-400x400.jpg" alt="Logo"/>
</a></p>
<h3><a href="../index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Hash Container Notes</a><ul>
<li><a class="reference internal" href="#use-cases">Use Cases</a></li>
<li><a class="reference internal" href="#designs">Designs</a><ul>
<li><a class="reference internal" href="#standard-template-library">Standard Template Library</a></li>
<li><a class="reference internal" href="#intrusive-containers">Intrusive Containers</a></li>
</ul>
</li>
<li><a class="reference internal" href="#concurrency">Concurrency</a></li>
<li><a class="reference internal" href="#conclusions">Conclusions</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="L7R-Denver.en.html"
title="previous chapter">Traffic Server Layer 7 Working Group</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="../reference.en.html"
title="next chapter">Diagrams</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="../_sources/notes/hash-table.en.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="../search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="../genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="../reference.en.html" title="Diagrams"
>next</a> |</li>
<li class="right" >
<a href="L7R-Denver.en.html" title="Traffic Server Layer 7 Working Group"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="../index.html">SWOC Docs</a> »</li>
<li class="nav-item nav-item-this"><a href="">Hash Container Notes</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
© Copyright 2017, amc@apache.org.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.1.2.
</div>
</body>
</html>