mirror of https://gitee.com/bigwinds/arangodb
543 lines
21 KiB
HTML
543 lines
21 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
|
|
<html>
|
|
|
|
<head>
|
|
<title>Smart Pointer Timings</title>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
|
</head>
|
|
|
|
<body text="#000000" bgcolor="#ffffff" link="#0000ff" vlink="#0000ff">
|
|
|
|
<h1><img height="86" alt="boost.png (6897 bytes)" src="../../boost.png" width="277" align="middle" border="0">Smart Pointer Timings</h1>
|
|
|
|
<p>In late January 2000, Mark Borgerding put forward a suggestion to boost for
|
|
a new design of smart pointer whereby an intrusive doubly linked list is used
|
|
to join together all instances of smart pointers sharing a given raw pointer.
|
|
This allowed avoidance of the costly heap allocation of a reference count that
|
|
occurred in the initial construction of the then current version of boost::shared_ptr.
|
|
Of course, nothing is for free and the benefit here was gained at the expense
|
|
of increased size and more costly copy operations. A debate ensued on the boost
|
|
mailing list and the tests which this page describes were performed to provide
|
|
a guide for current and future investigations into smart pointer implementation
|
|
strategies.</p>
|
|
<p>Thanks are due to <a href="http://www.boost.org/people/dave_abrahams.htm">Dave Abrahams</a>,
|
|
Gavin Collings,
|
|
<a href="http://www.boost.org/people/greg_colvin.htm">Greg Colvin</a> and
|
|
<a href="http://www.boost.org/people/beman_dawes.html">Beman Dawes</a>
|
|
for test code and trial implementations, the final version of which can be found
|
|
in .zip format <a href="smarttest.zip">here</a>.</p>
|
|
<h2>Description</h2>
|
|
<p>Two tests were run: the first aimed to obtain timings for two basic individual
|
|
operations:</p>
|
|
<ol type="i">
|
|
<li> Initial construction from raw pointer.</li>
|
|
<li> An amortized copy operation consisting of half an assignment and half a
|
|
copy construction - designed to reflect average usage.</li>
|
|
</ol>
|
|
<p>The second attempted to gain more insight into normal usage by timing the fill
|
|
and sort algorithms for vectors and lists filled with the various smart pointers.</p>
|
|
<p>Five smart pointer implementation strategies were tested:</p>
|
|
<ol type="i">
|
|
<li>Counted pointer using a heap allocated reference count, this is referred
|
|
to as <b>simple counted</b>.</li>
|
|
<li>Counted pointer using a special purpose allocator for the reference count
|
|
- <b>special counted</b>.</li>
|
|
<li>Counted pointer using an intrusive reference count - <b>intrusive</b>.</li>
|
|
<li>Linked pointer as described above - <b>linked</b>.</li>
|
|
<li>Cyclic pointer, a counted implementation using a std::deque for allocation
|
|
with provision for weak pointers and garbage collection of cycles of pointers
|
|
- <b>cyclic</b>.</li>
|
|
</ol>
|
|
<p>on two compilers:</p>
|
|
<ol type="i">
|
|
<li>MSVC 6.0 service pack 3, using default release optimization mode (/O2 -
|
|
optimized for speed, no inlining of functions defined outside a class body
|
|
unless specified as inline).</li>
|
|
<li>gcc 2.95.2 using full optimization (-O3 -DNDEBUG).</li>
|
|
</ol>
|
|
<p>Additionally, generated pointer sizes (taking into account struct alignment)
|
|
were compared, as were generated code sizes for MSVC mainly by manual inspection
|
|
of generated assembly code - a necessity due to function inlining.</p>
|
|
<p>All tests were run on a PII-200 running Windows NT version 4.0</p>
|
|
<h2> </h2>
|
|
<h2>Operation Timing Test Results</h2>
|
|
<p>The following graphs show the overall time in nanoseconds to acquire a pointer
|
|
(default construction) perform n amortized copy operations on it and finally
|
|
release it. The initial allocation time for the contained pointer is not included,
|
|
although the time for it's deallocation is. The contained pointer pointed to
|
|
a trivial class, but for the inclusion of an intrusive reference count for the
|
|
benefit of the intrusive counted shared pointer. A dumb pointer (i.e. a smart
|
|
pointer that simply acquires and releases its contained pointer with no extra
|
|
overhead) and a raw pointer were also included for comparison.</p>
|
|
<table border="0" align="center">
|
|
<tr>
|
|
<td width="20" height="20"> </td>
|
|
<td> </td>
|
|
<td width="20"> </td>
|
|
</tr>
|
|
<tr>
|
|
<td width="20"> </td>
|
|
<td><img src="msvcspeed.gif" width="560" height="355" alt="MSVC speed graph"></td>
|
|
<td width="20"> </td>
|
|
</tr>
|
|
<tr>
|
|
<td height="20"> </td>
|
|
<td> </td>
|
|
<td> </td>
|
|
</tr>
|
|
<tr>
|
|
<td> </td>
|
|
<td><img src="gccspeed.gif" width="560" height="355" alt="GCC speed graph"></td>
|
|
<td> </td>
|
|
</tr>
|
|
<tr>
|
|
<td height="20"> </td>
|
|
<td> </td>
|
|
<td> </td>
|
|
</tr>
|
|
</table>
|
|
<p> </p>
|
|
<p>Fitting straight lines to the above plots gives the following figures for initialization
|
|
and amortized copy operation for the two compilers (times in nanoseconds, errors
|
|
at two standard deviations) : -</p>
|
|
<p> </p>
|
|
<h4 align="center">MSVC</h4>
|
|
<table align="center" cellpadding="5" cellspacing="0" class="codetable" width="400">
|
|
<tr>
|
|
<th width="120">
|
|
<div align="right"></div>
|
|
</th>
|
|
<th class="codetabletop" width="120">
|
|
<div align="center">initialization</div>
|
|
</th>
|
|
<th class="codetabletop" width="120">copy operation</th>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">simple counted</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">3000 +/- 170</td>
|
|
<td class="codetablecell" align="center">104 +/- 31</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">special counted</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">1330 +/- 50</td>
|
|
<td class="codetablecell" align="center">85 +/- 9</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">intrusive</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">1000 +/- 20</td>
|
|
<td class="codetablecell" align="center">71 +/- 3</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft" align="right">linked</th>
|
|
<td class="codetablecell" align="center">970 +/- 60</td>
|
|
<td class="codetablecell" align="center">136 +/- 10</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft" align="right">cyclic</th>
|
|
<td class="codetablecell" align="center">1290 +/- 70</td>
|
|
<td class="codetablecell" align="center">112 +/- 12</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft" align="right">dumb</th>
|
|
<td class="codetablecell" align="center">1020 +/- 20</td>
|
|
<td class="codetablecell" align="center">10 +/- 4</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">raw</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">1038 +/- 30</td>
|
|
<td class="codetablecell" align="center">10 +/- 5</td>
|
|
</tr>
|
|
</table>
|
|
<h4 align="center"> </h4>
|
|
<h4 align="center">GCC</h4>
|
|
<table align="center" cellpadding="5" cellspacing="0" class="codetable" width="400">
|
|
<tr>
|
|
<th width="120">
|
|
<div align="right"></div>
|
|
</th>
|
|
<th class="codetabletop" width="120">
|
|
<div align="center">initialization</div>
|
|
</th>
|
|
<th class="codetabletop" width="120">copy operation</th>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">simple counted</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">4620 +/- 150</td>
|
|
<td class="codetablecell" align="center">301 +/- 28</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">special counted</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">1990 +/- 40</td>
|
|
<td class="codetablecell" align="center">264 +/- 7</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">intrusive</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">1590 +/- 70</td>
|
|
<td class="codetablecell" align="center">181 +/- 12</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft" align="right">linked</th>
|
|
<td class="codetablecell" align="center">1470 +/- 140</td>
|
|
<td class="codetablecell" align="center">345 +/- 26</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft" align="right">cyclic</th>
|
|
<td class="codetablecell" align="center">2180 +/- 100</td>
|
|
<td class="codetablecell" align="center">330 +/- 18</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft" align="right">dumb</th>
|
|
<td class="codetablecell" align="center">1590 +/- 70</td>
|
|
<td class="codetablecell" align="center">74 +/- 12</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft" align="right">
|
|
<div align="right">raw</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">1430 +/- 60</td>
|
|
<td class="codetablecell" align="center">27 +/- 11</td>
|
|
</tr>
|
|
</table>
|
|
<p>Note that the above times include a certain amount of loop overhead etc. for
|
|
each operation. An estimate of the pure smart pointer operation time 'overhead'
|
|
can be obtained by subtracting the dumb or raw figure from the smart pointer
|
|
time of interest.</p>
|
|
<h3>Detail</h3>
|
|
<p>The test involved iterating a loop which creates raw pointers. These were then
|
|
shared among a varying number (set size) of smart pointers. A range of set sizes
|
|
was used and then a line fitted to get a linear relation with number of initializations
|
|
and copy-operations. A spreadsheet was used for the line fit, and to produce
|
|
the performance graphs above.</p>
|
|
<h2> </h2>
|
|
<h2>Container Test Results</h2>
|
|
<p>To gain some insight in to operation within real life programs, this test was
|
|
devised. Smart pointers were used to fill standard containers which were then
|
|
sorted.</p>
|
|
<p>In this case, the contained pointer pointed to a class which initializes a
|
|
private data member to a random value in its default constructor. This value
|
|
is used subsequently for the sort comparison test. The class also contains an
|
|
intrusive reference count for the benefit of the intrusive counted pointer.</p>
|
|
<p> All times are in seconds for 300,000 contained pointers.</p>
|
|
<h4 align="center">GCC</h4>
|
|
<table align="center" cellpadding="5" cellspacing="0" class="codetable" width="500">
|
|
<tr>
|
|
<th> </th>
|
|
<th class="codetabletop" colspan="2">vector</th>
|
|
<th class="codetabletop" colspan="2">list</th>
|
|
</tr>
|
|
<tr>
|
|
<th width="120">
|
|
<div align="right"></div>
|
|
</th>
|
|
<th class="codetabletop2" width="80">
|
|
<div align="center">fill</div>
|
|
</th>
|
|
<th class="codetabletop2" width="80">sort</th>
|
|
<th class="codetabletop2" width="80">fill</th>
|
|
<th class="codetabletop2" width="80">sort</th>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">simple counted</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">46.54</td>
|
|
<td class="codetablecell" align="center">2.44</td>
|
|
<td class="codetablecell" align="center">47.09</td>
|
|
<td class="codetablecell" align="center">3.22</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">special counted</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">14.02</td>
|
|
<td class="codetablecell" align="center">2.83</td>
|
|
<td class="codetablecell" align="center">7.28</td>
|
|
<td class="codetablecell" align="center">3.21</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">intrusive</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">12.15</td>
|
|
<td class="codetablecell" align="center">1.91</td>
|
|
<td class="codetablecell" align="center">7.99</td>
|
|
<td class="codetablecell" align="center">3.08</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft" align="right">linked</th>
|
|
<td class="codetablecell" align="center">12.46</td>
|
|
<td class="codetablecell" align="center">2.32</td>
|
|
<td class="codetablecell" align="center">8.14</td>
|
|
<td class="codetablecell" align="center">3.27</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft" align="right">cyclic</th>
|
|
<td class="codetablecell" align="center">22.60</td>
|
|
<td class="codetablecell" align="center">3.19</td>
|
|
<td class="codetablecell" align="center">1.63</td>
|
|
<td class="codetablecell" align="center">3.18</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft" align="right">
|
|
<div align="right">raw</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">11.81</td>
|
|
<td class="codetablecell" align="center">0.24</td>
|
|
<td class="codetablecell" align="center">27.51</td>
|
|
<td class="codetablecell" align="center">0.77</td>
|
|
</tr>
|
|
</table>
|
|
<p> </p>
|
|
<h4 align="center">MSVC</h4>
|
|
<table align="center" cellpadding="5" cellspacing="0" class="codetable" width="500">
|
|
<tr>
|
|
<th> </th>
|
|
<th class="codetabletop" colspan="2">vector</th>
|
|
<th class="codetabletop" colspan="2">list</th>
|
|
</tr>
|
|
<tr>
|
|
<th width="120">
|
|
<div align="right"></div>
|
|
</th>
|
|
<th class="codetabletop2" width="80">
|
|
<div align="center">fill</div>
|
|
</th>
|
|
<th class="codetabletop2" width="80">sort</th>
|
|
<th class="codetabletop2" width="80">fill</th>
|
|
<th class="codetabletop2" width="80">sort</th>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">simple counted</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">1.83</td>
|
|
<td class="codetablecell" align="center">2.37</td>
|
|
<td class="codetablecell" align="center">1.86</td>
|
|
<td class="codetablecell" align="center">4.85</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">special counted</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">1.04</td>
|
|
<td class="codetablecell" align="center">2.35</td>
|
|
<td class="codetablecell" align="center">1.38</td>
|
|
<td class="codetablecell" align="center">4.58</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">intrusive</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">1.04</td>
|
|
<td class="codetablecell" align="center">1.84</td>
|
|
<td class="codetablecell" align="center">1.16</td>
|
|
<td class="codetablecell" align="center">4.29</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft" align="right">linked</th>
|
|
<td class="codetablecell" align="center">1.08</td>
|
|
<td class="codetablecell" align="center">2.00</td>
|
|
<td class="codetablecell" align="center">1.21</td>
|
|
<td class="codetablecell" align="center">4.33</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft" align="right">cyclic</th>
|
|
<td class="codetablecell" align="center">1.38</td>
|
|
<td class="codetablecell" align="center">2.84</td>
|
|
<td class="codetablecell" align="center">1.47</td>
|
|
<td class="codetablecell" align="center">4.73</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft" align="right">
|
|
<div align="right">raw</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">0.67</td>
|
|
<td class="codetablecell" align="center">0.28</td>
|
|
<td class="codetablecell" align="center">1.24</td>
|
|
<td class="codetablecell" align="center">1.81</td>
|
|
</tr>
|
|
</table>
|
|
<p> </p>
|
|
<h2>Code Size</h2>
|
|
<p>The following code sizes were determined by inspection of generated code for
|
|
MSVC only. Sizes are given in the form N / M / I where:</p>
|
|
<ul type="circle">
|
|
<li> N is the instruction count of the operation</li>
|
|
<li>M is the size of the code in bytes</li>
|
|
<li>I determines whether generated code was inlined or not I = inline, O = "outline"</li>
|
|
</ul>
|
|
<p> </p>
|
|
<table align="center" cellpadding="5" cellspacing="0" class="codetable" width="570">
|
|
<tr>
|
|
<th height="28" width="140">
|
|
<div align="right"></div>
|
|
</th>
|
|
<th height="28" class="codetabletop" width="80">
|
|
<div align="center">ptr()</div>
|
|
</th>
|
|
<th height="28" class="codetabletop" width="80">ptr(p)</th>
|
|
<th height="28" class="codetabletop" width="80">ptr(ptr)</th>
|
|
<th height="28" class="codetabletop" width="80">op=()</th>
|
|
<th height="28" class="codetabletop" width="80">
|
|
<div align="center">~ptr()</div>
|
|
</th>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">simple counted</div>
|
|
</th>
|
|
<td class="codetablecell" align="center">38/110/O</td>
|
|
<td class="codetablecell" align="center">38/110/O</td>
|
|
<td class="codetablecell" align="center">9/23/I</td>
|
|
<td class="codetablecell" align="center">22/57/I</td>
|
|
<td class="codetablecell" align="center">17/40/I</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">special counted</div>
|
|
</th>
|
|
<td class="codetablecell" align="center"><font size="-1">50/141/O</font></td>
|
|
<td class="codetablecell" align="center"><font size="-1">50/141/O</font></td>
|
|
<td class="codetablecell" align="center"><font size="-1">9/23/I</font></td>
|
|
<td class="codetablecell" align="center"><font size="-1">23/64/I</font></td>
|
|
<td class="codetablecell" align="center"><font size="-1">13/38/I</font></td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">intrusive</div>
|
|
</th>
|
|
<td class="codetablecell" align="center"><font size="-1">1/2/I</font></td>
|
|
<td class="codetablecell" align="center"><font size="-1">3/6/I</font></td>
|
|
<td class="codetablecell" align="center"><font size="-1">3/6/I</font></td>
|
|
<td class="codetablecell" align="center"><font size="-1">6/11/I</font></td>
|
|
<td class="codetablecell" align="center"><font size="-1">6/11/I</font></td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">linked</div>
|
|
</th>
|
|
<td class="codetablecell" align="center"><font size="-1">5/19/I</font></td>
|
|
<td class="codetablecell" align="center"><font size="-1">5/15/I</font></td>
|
|
<td class="codetablecell" align="center"><font size="-1">10/30/I</font></td>
|
|
<td class="codetablecell" align="center"><font size="-1">27/59/I</font></td>
|
|
<td class="codetablecell" align="center"><font size="-1">14/38/I</font></td>
|
|
</tr>
|
|
</table>
|
|
<p>During the code inspection, a couple of minor points were noticed: -</p>
|
|
<ul>
|
|
<li>Function inlining was critical to performance.</li>
|
|
<li>For MSVC, at least, a "delete 0" caused execution of 11 assembly
|
|
instructions, including a function call. So in cases where performance is
|
|
at an absolute premium it can be worth inserting the extra manual test.</li>
|
|
</ul>
|
|
<h2> </h2>
|
|
<h2>Data Size</h2>
|
|
<p>The following smart pointer sizes were obtained in bytes</p>
|
|
<table align="center" cellpadding="5" cellspacing="0" class="codetable" width="270">
|
|
<tr>
|
|
<th height="28" width="150">
|
|
<div align="right"></div>
|
|
</th>
|
|
<th height="28" class="codetabletop" width="60">
|
|
<div align="center">MSVC</div>
|
|
</th>
|
|
<th height="28" class="codetabletop" width="60">
|
|
<div align="center">GCC</div>
|
|
</th>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">simple counted</div>
|
|
</th>
|
|
<td class="codetablecell">
|
|
<div align="center">8</div>
|
|
</td>
|
|
<td class="codetablecell">
|
|
<div align="center">8</div>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">special counted</div>
|
|
</th>
|
|
<td class="codetablecell">
|
|
<div align="center">8</div>
|
|
</td>
|
|
<td class="codetablecell">
|
|
<div align="center">12</div>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">intrusive</div>
|
|
</th>
|
|
<td class="codetablecell">
|
|
<div align="center">4</div>
|
|
</td>
|
|
<td class="codetablecell">
|
|
<div align="center">4</div>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">linked</div>
|
|
</th>
|
|
<td class="codetablecell">
|
|
<div align="center">12</div>
|
|
</td>
|
|
<td class="codetablecell">
|
|
<div align="center">12</div>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="codetableleft">
|
|
<div align="right">cyclic</div>
|
|
</th>
|
|
<td class="codetablecell">
|
|
<div align="center">8</div>
|
|
</td>
|
|
<td class="codetablecell">
|
|
<div align="center">8</div>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
<h2> </h2>
|
|
<h2>Summary</h2>
|
|
<p>The timing results mainly speak for themselves: clearly an intrusive pointer
|
|
outperforms all others and a simple heap based counted pointer has poor performance
|
|
relative to other implementations. The selection of an optimal non-intrusive
|
|
smart pointer implementation is more application dependent, however. Where small
|
|
numbers of copies are expected, it is likely that the linked implementation
|
|
will be favoured. Conversely, for larger numbers of copies a counted pointer
|
|
with some type of special purpose allocator looks like a win. Other factors
|
|
to bear in mind are: -</p>
|
|
<ul>
|
|
<li>Deterministic individual, as opposed to amortized, operation time. This
|
|
weighs against any implementation depending on an allocator.</li>
|
|
<li>Multithreaded synchronization. This weighs against an implementation which
|
|
spreads its information as in the case of linked pointer.</li>
|
|
</ul>
|
|
<hr>
|
|
<p>$Date$</p>
|
|
<p>© Copyright Gavin Collings 2000. Permission to copy, use, modify, sell
|
|
and distribute this document is granted provided this copyright notice appears in all
|
|
copies. This document is provided "as is" without express or implied warranty,
|
|
and with no claim as to its suitability for any purpose.</p>
|
|
</body>
|
|
</html>
|