Merge branch 'devel' of github.com:triAGENS/ArangoDB into devel

2013-07-29 16:06:29 +02:00 · 2013-07-29 16:06:29 +02:00 · 8588b648ea
parent 1be5c65d79 6cbf835adf
commit 8588b648ea
10 changed files with 204 additions and 46 deletions
--- a/Documentation/ImplementorManual/HttpReplication.md
+++ b/Documentation/ImplementorManual/HttpReplication.md
@ -7,7 +7,44 @@ HTTP Interface for Replication {#HttpReplication}
 Replication {#HttpReplicationIntro}
 ===================================

-This is an introduction to ArangoDB's Http replication interface.
+This is an introduction to ArangoDB's HTTP replication interface.
+
+The HTTP replication interface serves four main purposes:
+- fetch initial data from a server (e.g. for an initial synchronisation of data, or backups)
+- administer the replication logger (starting, stopping, querying state)
+- fetch the changelog from a server (used for incremental synchronisation of changes)
+- administer the replication applier (starting, stopping, configuring, querying state)
+
+Replication Dump Commands {#HttpReplicationDumpCommands}
+--------------------------------------------------------
+
+The `inventory` method can be used to query an ArangoDB server's current
+set of collections plus their indexes. Clients can use this method to get an 
+overview of which collections are present on the server. They can use this information
+to start either a full or a partial synchronisation of data, e.g. to initiate a backup
+or an incremental data synchronisation.
+
+@anchor HttpReplicationInventory
+@copydetails triagens::arango::RestReplicationHandler::handleCommandInventory
+
+The `dump` method can be used to fetch data from a specific collection. As the
+results of the dump command can be huge, it may not return all data from a collection
+at once. Instead, the dump command may be called repeatedly by replication clients
+until there is no more data to fetch. The dump command will not only return the
+current documents in the collection, but also document updates and deletions. 
+
+To get to an identical state of data, replication clients should apply the individual
+parts of the dump results in the same order as they are served to them.
+
+@anchor HttpReplicationDump
+@copydetails triagens::arango::RestReplicationHandler::handleCommandDump
+
+
+Replication Logger Commands {#HttpReplicationLoggerCommands}
+------------------------------------------------------------
+
+The logger commands allow starting, stopping, and fetching the current state of 
+the replication logger. 

@anchor HttpReplicationLoggerStart
@copydetails triagens::arango::RestReplicationHandler::handleCommandLoggerStart
@ -20,19 +57,21 @@ This is an introduction to ArangoDB's Http replication interface.
@anchor HttpReplicationLoggerState
@copydetails triagens::arango::RestReplicationHandler::handleCommandLoggerState

-@CLEARPAGE
+To query the latest changes logged by the replication logger, the HTTP interface
+also provides the `logger-follow` method.
+
+This method should be used by replication clients to incrementally fetch updates 
+from an ArangoDB instance.
+
@anchor HttpReplicationLoggerFollow
@copydetails triagens::arango::RestReplicationHandler::handleCommandLoggerFollow

-@CLEARPAGE
-@anchor HttpReplicationInventory
-@copydetails triagens::arango::RestReplicationHandler::handleCommandInventory
+Replication Applier Commands {#HttpReplicationApplierCommands}
+--------------------------------------------------------------

-@CLEARPAGE
-@anchor HttpReplicationDump
-@copydetails triagens::arango::RestReplicationHandler::handleCommandDump
+The applier commands allow remotely starting, stopping, and querying the state and 
+configuration of an ArangoDB server's replication applier.

-@CLEARPAGE
@anchor HttpReplicationApplierGetConfig
@copydetails triagens::arango::RestReplicationHandler::handleCommandApplierGetConfig

--- a/Documentation/ImplementorManual/HttpReplicationTOC.md
+++ b/Documentation/ImplementorManual/HttpReplicationTOC.md
@ -3,12 +3,15 @@ TOC {#HttpReplicationTOC}

 - @ref HttpReplication
  - @ref HttpReplicationIntro
+  - @ref HttpReplicationDumpCommands
+    - @ref HttpReplicationInventory "GET /_api/replication/inventory"
+    - @ref HttpReplicationDump "GET /_api/replication/dump"
+  - @ref HttpReplicationLoggerCommands
    - @ref HttpReplicationLoggerStart "PUT /_api/replication/logger-start"
    - @ref HttpReplicationLoggerStop "PUT /_api/replication/logger-stop"
    - @ref HttpReplicationLoggerState "GET /_api/replication/logger-state"
    - @ref HttpReplicationLoggerFollow "GET /_api/replication/logger-follow"
-    - @ref HttpReplicationInventory "GET /_api/replication/inventory"
-    - @ref HttpReplicationDump "GET /_api/replication/dump"
+  - @ref HttpReplicationApplierCommands
    - @ref HttpReplicationApplierGetConfig "GET /_api/replication/applier-config"
    - @ref HttpReplicationApplierSetConfig "PUT /_api/replication/applier-config"
    - @ref HttpReplicationApplierStart "PUT /_api/replication/applier-start"
--- a/Documentation/Makefile.files
+++ b/Documentation/Makefile.files
@ -152,6 +152,7 @@ WIKI = \
 	UserManualArangosh \
 	UserManualFoxx \
 	UserManualFoxxManager \
+	UserManualReplication \
 	UserManualWebInterface \
 	jsUnity

--- a/Documentation/RefManual/Replication.md
+++ b/Documentation/RefManual/Replication.md
@ -7,18 +7,11 @@ Replication Events{#RefManualReplication}
 The replication logger in ArangoDB will log all events into the `_replication`
 system collection. It will only log events when the logger is enabled.

-Continuous Replication Log{#RefManualReplicationContinuous}
-===========================================================
-
-Replication log events are made available to replication clients via the API at
-`/_api/replication/logger-follow`. This API can be called by clients to fetch
-replication log events repeatedly.
-
 The following sections describe in detail the structure of the log events
 returned by this API.

 Replication Event Types{#RefManualReplicationEventTypes}
--------------------------------------------------------
+========================================================

 The following replication event types will be logged by ArangoDB 1.4:

@ -53,7 +46,7 @@ value is a sequence number and is used by the replication applier to determine
 whether a replication event was already processed.

 Examples{#RefManualReplicationExamples}
---------------------------------------
+=======================================

 - 1000: the replication logger was stopped:

@ -440,9 +433,3 @@ event that is neither a ocument/edge operation nor a `transaction commit` event)
 should abort the ongoing transaction and discard all buffered operations. It can then
 consider the current transaction as failed.

-Collections{#RefManualReplicationCollections}
---------------------------------------------
-
-The replication logger will only log events that affect user-defined collections. Any
-events for system collections (collections with names that start with an underscore) are
-not logged by the replication logger, and thus cannot be fetched from the continuous log.
--- a/Documentation/RefManual/ReplicationTOC.md
+++ b/Documentation/RefManual/ReplicationTOC.md
@ -2,8 +2,6 @@ TOC {#RefManualReplicationTOC}
 ====================================

 - @ref RefManualReplication
-  - @ref RefManualReplicationContinuous
-    - @ref RefManualReplicationEventTypes
-    - @ref RefManualReplicationExamples
-    - @ref RefManualReplicationTransactions
-    - @ref RefManualReplicationCollections
+  - @ref RefManualReplicationEventTypes
+  - @ref RefManualReplicationExamples
+  - @ref RefManualReplicationTransactions
--- a/Documentation/UserManual/FoxxManager.md
+++ b/Documentation/UserManual/FoxxManager.md
@ -64,9 +64,9 @@ There is currently one application installed. It is called "aardvark" and it is
 a system application. You can safely ignore system applications.

 We are now going to install the hello world application. It is called
-"hello-world" - no suprise there.
+"hello-foxx" - no surprise there.

-    unix> foxx-manager install hallo-world /example
+    unix> foxx-manager install hello-foxx /example
    Application app:hello-foxx:1.2.2 installed successfully at mount point /example

 The second parameter `/example` is the mount path of the application. You should now
@ -87,7 +87,7 @@ command.

 You can install the application again under different mount path. 

-    unix> foxx-manager install hallo-world /hello
+    unix> foxx-manager install hello-foxx /hello
    Application app:hello-foxx:1.2.2 installed successfully at mount point /hello

 You now have to separated instances of the same application. They are completely
--- a/Documentation/UserManual/Transactions.md
+++ b/Documentation/UserManual/Transactions.md
@ -4,7 +4,6 @@ Transactions {#Transactions}
@NAVIGATE_Transactions
@EMBEDTOC{TransactionsTOC}

-
 Introduction {#TransactionsIntroduction}
 ========================================

@ -25,7 +24,6 @@ These *ACID* properties provide the following guarantees:
  transaction durability is configurable in ArangoDB, as is the durability
  on collection level.

-
 Transaction invocation {#TransactionsInvocation}
 ================================================

@ -54,7 +52,6 @@ data retrieval and/or modification operations, and at the end automatically
 commit the transaction. If an error occurs during transaction execution, the
 transaction is automatically aborted, and all changes are rolled back.

-
 Declaration of collections
 ==========================

@ -104,7 +101,6 @@ Even without specifying them, it is still possible to read from such collections
 from within a transaction, but with relaxed isolation. Please refer to 
@ref TransactionsLocking for more details.

-
 Declaration of data modification and retrieval operations
 =========================================================

@ -189,7 +185,6 @@ case, the user can return any legal Javascript value from the function:
      }
    });

-
 Examples
 ========

@ -303,7 +298,6 @@ start. The following example using a cap constraint should illustrate that:

    /* we now have these keys back: [ "key2", "key3", "key4" ] */

-
 Cross-collection transactions
 =============================

@ -359,7 +353,6 @@ transaction abort and roll back all changes in all collections:
    db.c1.count(); /* 0 */
    db.c2.count(); /* 0 */

-
 Passing parameters to transactions {#TransactionsParameters}
 ============================================================

@ -391,7 +384,6 @@ Some example that uses collections:
      }
    });

-
 Disallowed operations {#TransactionsDisallowedOperations}
 =========================================================

@ -403,7 +395,6 @@ If an attempt is made to carry out any of these operations during a transaction,
 ArangoDB will abort the transaction with error code `1653 (disallowed operation inside
 transaction)`.

-
 Locking and isolation {#TransactionsLocking}
 ============================================

@ -474,7 +465,6 @@ transaction. The total lock wait time may thus be much higher than the value of
 To avoid both deadlocks and non-repeatable reads, all collections used in a 
 transaction should always be specified if known in advance.

-
 Durability {#TransactionsDurability}
 ====================================

@ -549,7 +539,6 @@ synchronisation for multi-collection transactions in ArangoDB.
 The disk sync speed of the system will thus be the most important factor for the 
 performance of multi-collection transactions.

-
 Limitations {#TransactionsLimitations}
 ======================================

@ -588,4 +577,3 @@ It is legal to not declare read-only collections, but this should be avoided if
 possible to reduce the probability of deadlocks and non-repeatable reads.

 Please refer to @ref TransactionsLocking for more details.
-
--- a/Documentation/UserManual/UserManual.md
+++ b/Documentation/UserManual/UserManual.md
@ -17,6 +17,7 @@ ArangoDB's User Manual (@VERSION) {#UserManual}
@CHAPTER_REF{UserManualFoxxManager}
@CHAPTER_REF{UserManualFoxx}
@CHAPTER_REF{UserManualActions}
+@CHAPTER_REF{UserManualReplication}
@CHAPTER_REF{Transactions}
@CHAPTER_REF{CommandLine}
@CHAPTER_REF{Glossary}
--- a/Documentation/UserManual/UserManualReplication.md
+++ b/Documentation/UserManual/UserManualReplication.md
@ -0,0 +1,129 @@
+Replication {#UserManualReplication}
+====================================
+
+@NAVIGATE_UserManualReplication
+@EMBEDTOC{UserManualReplicationTOC}
+
+Introduction {#UserManualReplicationIntro}
+==========================================
+
+Starting with ArangoDB 1.4, ArangoDB comes with optional master-slave replication.
+
+The replication is asynchronous and eventually consistent, meaning that slaves will 
+*pull* changes from the master and apply them locally. Data on a slave may be
+behind the state of data on the master until the slave has fetched and applied all 
+changes. 
+
+Transactions are honored in replication, i.e. changes by a replicated transaction will 
+become visible on the slave atomically.
+
+It is possible to connect multiple slaves to the same master. Slaves should be used as
+read-only instances, though, as otherwise conflicts may occur that cannot be solved 
+automatically in ArangoDB 1.4.
+This is also the reason why master-master replication is not supported.
+
+Components {#UserManualReplicationComponents}
+=============================================
+
+ArangoDB's replication consists of two main components, which can be used together or
+separately: the *replication logger* and the *replication applier*.
+
+Using both components on two ArangoDB servers provides master-slave replication between
+the two, but there are also additional use cases.
+
+Replication Logger {#UserManualReplicationLogger}
+-------------------------------------------------
+
+The purpose of the replication logger is to log all changes that modify data.
+The replication logger will produce an ongoing stream of change events. That stream,
+or specific parts of the stream, can be queried by clients via an HTTP API.
+
+An example client for this is the ArangoDB replication applier. 
+The ArangoDB replication applier will continuously query the stream of change events 
+that the replication logger writes. It will apply "new" changes locally to get to 
+the same state of data as the logger server.
+
+External systems (e.g. indexers) could also incrementally query the log stream from 
+the replication logger. Using this approach, one could feed external systems with all 
+data modification operations done in ArangoDB.
+
+The replication logger will write all change events to a system collection named
+`_replication`. The events are thus persisted and will still be present after a server
+restart or crash.
+
+ArangoDB will only log changes if the replication logger is turned on. Should there be 
+any data modifications while the replication logger is turned off, these events will
+be lost for replication.
+
+The replication logger will mainly log events that affect user-defined collections. 
+Operations on ArangoDB's system collections (collections with names that start with 
+an underscore) are intentionally excluded from replication.
+
+There is exactly one replication logger present in an ArangoDB database. 
+
+Replication Applier {#UserManualReplicationApplier}
+---------------------------------------------------
+
+The purpose of the replication applier is to read data from a remote stream of change 
+events from a data provider and apply them locally. The applier is thus using the
+*pull* principle.
+
+Normally, one would connect an ArangoDB replication applier to an ArangoDB replication
+logger. This would make the applier fetch all data from the logger server incrementally.
+The data on the applier thus will be a copy of the data on the logger server, and the
+applier server can be used as a read-only or hot standby clone.
+
+The applier can connect to any system that speaks HTTP and returns replication log 
+events in the expected format (see @INTREF{HttpReplicationLoggerFollow,format} and @ref 
+RefManualReplicationEventTypes). It is thus possible (though not the scope of the 
+ArangoDB project) to implement data providers other than ArangoDB and still have an 
+ArangoDB applier fetch their data and apply it.
+
+As the replication applier does not get notified immediately when there are "new"
+changes, it might take some time until the applier has fetched and applied the newest changes 
+from a logger server. Data modification operations might thus become visible on the
+applying server later than on the server on which they originated.
+
+If the replication applier cannot connect to the data provider or the communication
+fails for some reason, it will try to reconnect and fetch outstanding data. Until this
+succeeds, the state of data on the replication applier might also be behind the state
+of the data provider.
+
+There is exactly one replication applier present in an ArangoDB database. It is thus
+not possible to have an applier collect data from multiple ArangoDB "master" instances.
+
+Setting up Replication {#UserManualReplicationSetup}
+====================================================
+
+Setting up a working replication topology requires two ArangoDB instances:
+- the replication logger server (_master_): this is the instance we'll replicate data from
+- the replication applier server (_slave_): this instance will fetch data from the logger server 
+  and apply all changes locally
+  
+For the following example setup, we'll use the instance *tcp://localhost:8529* as the 
+logger server, and the instance *tcp://localhost:8530* as an applier.
+
+The goal is to have all data from *tcp://localhost:8529* replicated to the instance
+*tcp://localhost:8530*.
+
+Setting up the Logger {#UserManualReplicationSetupLogger}
+---------------------------------------------------------
+
+
+
+Setting up the Applier {#UserManualReplicationSetupApplier}
+-----------------------------------------------------------
+
+
+Replication Overhead {#UserManualReplicationOverhead}
+=====================================================
+
+Running the replication logger will make all data modification operations more 
+expensive, as the ArangoDB server needs to write the operation into the replication log. 
+
+Additionally, replication appliers that connect to an ArangoDB server will cause some
+extra work as incoming HTTP requests need to be processed and results be generated.
+
+Overall, turning on the replication logger will reduce throughput on an ArangoDB server
+to some extent. If the replication feature is not required, the replication logger should 
+be turned off.
--- a/Documentation/UserManual/UserManualReplicationTOC.md
+++ b/Documentation/UserManual/UserManualReplicationTOC.md
@ -0,0 +1,12 @@
+TOC {#UserManualReplicationTOC}
+===============================
+
+- @ref UserManualReplication
+  - @ref UserManualReplicationIntro
+  - @ref UserManualReplicationComponents
+    - @ref UserManualReplicationLogger
+    - @ref UserManualReplicationApplier
+  - @ref UserManualReplicationSetup
+    - @ref UserManualReplicationSetupLogger
+    - @ref UserManualReplicationSetupApplier
+  - @ref UserManualReplicationOverhead