Essentially the strategy is: A newly created and a newly opened file
is advised to be "SEQUENTIAL ACCESS", because we will either write to
it or scan it sequentially. As soon as it is sealed, we switch the
advice to "RANDOM ACCESS", because this should be the normal pattern and
aggressive read-aheads tend to be bad. The collector and the compactor
switch a sealed file back to "SEQUENTIAL ACCESS" just before they scan
it and back to "RANDOM ACCESS", when they are done.
Furthermore, all data files in a collection are advised with "WILLNEED"
just before the collection is scanned during loading.
Finally, the actual hash table of AssocMulti is advised to be random
access, although this is an anonymous map given to us by malloc and not
a memory mapped file.
This allows changing the internals of TRI_vector_t structs in order to make the struct smaller.
On 64 bits, the size of TRI_vector_t is reduced from 48 bytes to 28 bytes.
TRI_json_t does benefit from this, as its biggest component is a TRI_vector_t.
This patch reduces collection loading time by preallocating enough space in primary index ahead of time.
When a collection is closed, the number of documents in the collection will be stored in the collection's JSON info file.
This value is used to determine the initial size for the primary index when the collection is loaded next time.
Datafile iteration has also been made slightly faster.
The above changes will have a significant benefit when the collection's datafiles are already in the OS buffer cache, and when there are no secondary indexes.
Loading datafiles from disk or building secondary indexes may be more time-consuming than the improvements reapable by this patch, but the patch shouldn't hurt anyway.