mirror of https://gitee.com/bigwinds/arangodb

added readme

This commit is contained in:
parent 49cc12e6a9
commit 13905fcd84

@@ -16,6 +16,8 @@
 * [Coming from SQL](GettingStarted/ComingFromSql.md)
 # * [Coming from MongoDB](GettingStarted/ComingFromMongoDb.md) #TODO
 #
+* [StorageEngines](StorageEngines/README.md)
+#
 * [Scalability](Scalability/README.md)
 * [Architecture](Scalability/Architecture.md)
 * [Data models](Scalability/DataModels.md)

@@ -0,0 +1,147 @@

# Storage Engines

At the very bottom of the ArangoDB database lies the storage
engine. The storage engine is responsible for persisting the documents
on disk, holding copies in memory, and providing indexes and caches to
speed up queries.

Up to version 3.1, ArangoDB only supported memory-mapped files (MMFILES)
as its sole storage engine. Beginning with 3.2, ArangoDB supports
pluggable storage engines. The second supported engine is RocksDB from
Facebook.

RocksDB is an embeddable persistent key-value store. It is a
log-structured database and is optimized for fast storage.

The MMFILES engine is optimized for the use case where the data fits
into main memory. It allows for very fast concurrent
reads. However, writes block reads, and locking is on the collection
level. Indexes are always held in memory and are rebuilt on startup. This
gives better performance but imposes a longer startup time.

The ROCKSDB engine is optimized for large data sets and allows for
steady insert performance even if the data set is much larger than the
main memory. Indexes are always stored on disk, but caches are used to
speed up performance. RocksDB uses document-level locks, allowing for
concurrent writes. Writes do not block reads.

The engine must be selected for the whole server / cluster. It is not
possible to mix engines. The transaction handling and write-ahead log
formats of the two engines are very different and cannot be combined.
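
As a sketch, assuming the startup option introduced with 3.2, the engine is
selected when the server is started:

    arangod --server.storage-engine rocksdb

From arangosh, the active engine can then be inspected:

    // arangosh: report the storage engine the server is running
    db._engine().name;   // "mmfiles" or "rocksdb"
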
## RocksDB

### Advantages

The main advantages of RocksDB are:

- document-level locks
- support for large data sets
- persistent indexes

RocksDB is a very flexible engine that can be configured for various use cases.

### Caveats

RocksDB allows concurrent writes. However, when touching the same document at
the same time, a write conflict is raised. This cannot happen with the MMFILES
engine, so applications that switch to ROCKSDB need to be prepared for such
exceptions. It is possible to lock collections exclusively when executing AQL;
this avoids write conflicts but also inhibits concurrent writes.
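
As a minimal arangosh sketch of how an application can cope with such
conflicts, assuming a hypothetical collection `accounts` (a write conflict
surfaces as error 1200, `ERROR_ARANGO_CONFLICT`):

    // Retry an AQL update when a write conflict is raised.
    var db = require("@arangodb").db;
    var errors = require("@arangodb").errors;

    function updateWithRetry(maxRetries) {
      for (var i = 0; i < maxRetries; i++) {
        try {
          db._query(
            "FOR a IN accounts FILTER a.balance < 0 " +
            "UPDATE a WITH { flagged: true } IN accounts");
          return true;   // committed without a conflict
        } catch (e) {
          if (e.errorNum !== errors.ERROR_ARANGO_CONFLICT.code) {
            throw e;     // a different error, do not swallow it
          }
          // another writer touched the same document, try again
        }
      }
      return false;      // still conflicting after maxRetries attempts
    }
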
Currently, another restriction is due to the transaction handling in
RocksDB. Transactions are limited in total size. If you have a statement
modifying a lot of documents, it is necessary to commit data in between.
For AQL, this is done automatically by default.
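
The thresholds at which these intermediate commits happen can be tuned;
assuming the option names shipped with 3.2, the relevant knobs are:

    --rocksdb.intermediate-commit-size   (commit after this many bytes)
    --rocksdb.intermediate-commit-count  (commit after this many operations)
    --rocksdb.max-transaction-size       (upper bound for a single transaction)
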
### Performance

RocksDB is based on a log-structured merge tree (LSM tree). A good introduction
can be found in:

- http://www.benstopford.com/2015/02/14/log-structured-merge-trees/
- https://blog.acolyer.org/2014/11/26/the-log-structured-merge-tree-lsm-tree/

The basic idea is that data is organized in levels, where each level is a factor
larger than the previous one. New data resides in the smaller levels, while old
data is moved down to the larger levels. This makes it possible to sustain a
high rate of inserts over an extended period. In principle, the different levels
can reside on different storage media: the smaller ones on fast SSDs, the larger
ones on bigger spinning disks.

RocksDB itself provides a lot of different knobs to fine-tune the storage
engine according to your use case. ArangoDB supports the most common ones
using the options below.

Performance reports for the storage engine can be found here:

- https://github.com/facebook/rocksdb/wiki/performance-benchmarks
- https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide

### ArangoDB options

ArangoDB has a cache for the persistent indexes. The size of this cache
is controlled by the option

    --cache.size

RocksDB also has a cache for the blocks stored on disk. The size of
this cache is controlled by the option

    --rocksdb.block-cache-size M

ArangoDB distributes the available memory equally between the two
caches.
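
As an illustration with hypothetical values, giving each cache 1GB explicitly
would look like:

    arangod --cache.size 1073741824 --rocksdb.block-cache-size 1073741824
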
ArangoDB chooses a size for the various levels in RocksDB that is
suitable for general-purpose applications.

RocksDB's log-structured data levels have increasing size:

    MEM: --
    L0:  --
    L1:  --  --
    L2:  --  --  --  --
    ...

New or updated documents are first stored in memory. If this memtable
reaches the limit given by

    --rocksdb.write-buffer-size N

it will be converted to an SST file and inserted at level 0.
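
For illustration, with the hypothetical setting

    --rocksdb.write-buffer-size 67108864

the memtable is flushed to an SST file once it reaches 64MB.
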
The following options control the size and depth of the levels.

    --rocksdb.num-levels N

Limits the number of levels to N. By default it is 7, and there is
seldom a reason to change this. A new level is only opened if there is
too much data in the previous one.

    --rocksdb.max-bytes-for-level-base B

L1 will hold at most B bytes.

    --rocksdb.max-bytes-for-level-multiplier M

Each level is at most M times as many bytes as the previous
one. Therefore the maximum number of bytes for level L can be
calculated as

    max-bytes-for-level-base * (max-bytes-for-level-multiplier ^ (L-1))
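
As a worked example with illustrative values, a base of 256MB and a multiplier
of 10 yield:

    L1: 256MB
    L2: 256MB * 10   = 2,560MB
    L3: 256MB * 10^2 = 25,600MB
    L4: 256MB * 10^3 = 256,000MB
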
## Future

RocksDB imposes a limit on the transaction size. It is optimized to
handle small transactions very efficiently, but limits the total
size of a transaction.

Currently, we are solely using RocksDB transactions to implement the
ArangoDB transaction handling when using the ROCKSDB engine. Therefore
the same restrictions apply there.

We will improve this by introducing distributed transactions in
ArangoDB. This will allow handling a large transaction as a series of
small RocksDB transactions and hence remove the size restriction.