mirror of https://gitee.com/bigwinds/arangodb
added readme
This commit is contained in:
parent
49cc12e6a9
commit
13905fcd84
|
@ -16,6 +16,8 @@
|
|||
* [Coming from SQL](GettingStarted/ComingFromSql.md)
|
||||
# * [Coming from MongoDB](GettingStarted/ComingFromMongoDb.md) #TODO
|
||||
#
|
||||
* [StorageEngines](StorageEngines/README.md)
|
||||
#
|
||||
* [Scalability](Scalability/README.md)
|
||||
* [Architecture](Scalability/Architecture.md)
|
||||
* [Data models](Scalability/DataModels.md)
|
||||
|
|
|
@ -0,0 +1,147 @@
|
|||
# Storage Engines
|
||||
|
||||
At the very bottom of the ArangoDB database lies the storage
|
||||
engine. The storage engine is repsonsible for persisting the documents
|
||||
on disk, holding copies in memory, providing indexes and caches to
|
||||
speed up queries.
|
||||
|
||||
Upto version 3.1 ArangoDB only supported memory mapped files (MMFILES)
|
||||
as sole storage engine. Beginning with 3.2 ArangoDB has supports
|
||||
pluggable storage engines. The second supported engine is RocksDB from
|
||||
Facebook.
|
||||
|
||||
RocksDB is an embeddable persistent key-value store. It is a log
|
||||
structure database and is optimized for fast storage.
|
||||
|
||||
The MMFILES engine is optimized for the use-case where the data fits
|
||||
into the main memory. It allows for very fast concurrent
|
||||
reads. However, writes block reads and locking is on collection
|
||||
level. Indexes are always in memory and are rebuild on startup. This
|
||||
gives a better performance but imposed a longer startup time.
|
||||
|
||||
The ROCKSDB engine is optimized for large data-sets and allows for a
|
||||
steady insert performance even if the data-set is much larger than the
|
||||
main memory. Indexes are always stored on disk but caches are used to
|
||||
speed up performance. RocksDB uses document level locks allowing for
|
||||
concurrent writes. Writes do not block reads.
|
||||
|
||||
The engine must be selected for the whole server / cluster. It is not
|
||||
possible to mix engines. The transaction handling and write-ahead-log
|
||||
format in the engines is very different and cannot be combined.
|
||||
|
||||
## RocksDB
|
||||
|
||||
### Advantages
|
||||
|
||||
The main advantages of RocksDB are
|
||||
|
||||
- document-level locks
|
||||
- support for large data-sets
|
||||
- persistent indexes
|
||||
|
||||
RocksDB is a very flexible engine that can be configured for various use cases.
|
||||
|
||||
### Caveats
|
||||
|
||||
RocksDB allows concurrent writes. However, when touching the same document a
|
||||
write conflict is raised. This cannot happen with the MMFILES engine, therefore
|
||||
application that switch to ROCKSDB need to be prepared that such exception can
|
||||
arise. It is possible to exclusively locking collections when executing AQL. This
|
||||
will avoid write conflicts but also inhibit concurrent writes.
|
||||
|
||||
Currently, another restriction is due to the transaction handling in
|
||||
RocksDB. Transactions are limited in total size. If you have a statement
|
||||
modifying a lot of documents it is necessary to commit data inbetween. This will
|
||||
be done automatically for AQL by default.
|
||||
improvements.
|
||||
|
||||
### Performance
|
||||
|
||||
RocksDB is a based on log structured merge tree. A good introduction can be
|
||||
found in:
|
||||
|
||||
- http://www.benstopford.com/2015/02/14/log-structured-merge-trees/
|
||||
- https://blog.acolyer.org/2014/11/26/the-log-structured-merge-tree-lsm-tree/
|
||||
|
||||
The basic idea is that data is organized in levels were each level is a factor
|
||||
larger than the previous. New data will reside in smaller levels while old data
|
||||
is moved down to the larger levels. This allows to support high rate of inserts
|
||||
over an extended period. In principle it is possible that the different levels
|
||||
reside on different storage media. The smaller ones on fast SSD, the larger ones
|
||||
on bigger spinning disks.
|
||||
|
||||
RocksDB itself provides a lot of different knobs to fine tune the storage
|
||||
engine according to your use-case. ArangoDB supports the most common ones
|
||||
using the options below.
|
||||
|
||||
Performance reports for the storage engine can be found here:
|
||||
|
||||
- https://github.com/facebook/rocksdb/wiki/performance-benchmarks
|
||||
- https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide
|
||||
|
||||
### ArangoDB options
|
||||
|
||||
ArangoDB has a cache for the persistent index. The size of this cache
|
||||
is controlled by the option
|
||||
|
||||
--cache.size
|
||||
|
||||
RocksDB also has a cache for the blocks stored on disk. The size of
|
||||
this cache is controlled by the option
|
||||
|
||||
--rocksdb.block-cache-size M
|
||||
|
||||
ArangoDB distributes the available memory equally between the two
|
||||
caches.
|
||||
|
||||
ArangoDB chooses a size for the various levels in RocksDB that is
|
||||
suitable for general purpose applications.
|
||||
|
||||
RocksDB log strutured data levels have increasing size
|
||||
|
||||
MEM: --
|
||||
L0: --
|
||||
L1: -- --
|
||||
L2: -- -- -- --
|
||||
...
|
||||
|
||||
New or updated Documents are first stored in memory. If this memtable
|
||||
reaches the limit given by
|
||||
|
||||
--rocksdb.write-buffer-size N
|
||||
|
||||
it will converted to a SST file and inserted at level 0.
|
||||
|
||||
The following option control the size of each level and the depth.
|
||||
|
||||
--rocksdb.num-levels N
|
||||
|
||||
Limits the number of levels to N. By default it is 7 and there is
|
||||
seldom a reason to change this. A new level is only opened if there is
|
||||
too much data in the previous one.
|
||||
|
||||
--rocksdb.max-bytes-for-level-base B
|
||||
|
||||
L0 will hold at most B bytes.
|
||||
|
||||
--rocksdb.max-bytes-for-level-multiplier M
|
||||
|
||||
Each level is at most M times as much bytes as the previous
|
||||
one. Therefore the maximum number of bytes forlevel L can be
|
||||
calcalculated as
|
||||
|
||||
max-bytes-for-level-base * (max-bytes-for-level-multiplier ^ (L-1))
|
||||
|
||||
## Future
|
||||
|
||||
RocksDB imposes a limit on the transaction size. It is optimized to
|
||||
handle small transaction very efficiently, but is limiting the total
|
||||
size of the transaction.
|
||||
|
||||
Currently, we are solely using RocksDB transaction to implement the
|
||||
ArangoDB transaction handling when using the ROCKSDB engine. Therefore
|
||||
the some restrictions apply there.
|
||||
|
||||
We will improve this by introducing distributed transactions in
|
||||
ArangoDB. This will allow to handle large transaction as a series of
|
||||
small RocksDB transaction and hence removing the size restriction.
|
File diff suppressed because one or more lines are too long
Loading…
Reference in New Issue