mirror of https://gitee.com/bigwinds/arangodb
More documentation for satellites
This commit is contained in:
parent
404e04baa4
commit
9238463cc1
|
@ -30,3 +30,105 @@ to "satellite".
|
|||
Using arangosh:
|
||||
|
||||
arangosh> db._create("satellite", {"replicationFactor": "satellite"});
|
||||
|
||||
### A full example
|
||||
|
||||
arangosh> var explain = require("@arangodb/aql/explainer").explain
|
||||
arangosh> db._create("satellite", {"replicationFactor": "satellite"})
|
||||
arangosh> db._create("nonsatellite", {numberOfShards: 8})
|
||||
arangosh> db._create("nonsatellite2", {numberOfShards: 8})
|
||||
|
||||
Let's analyse a normal join not involving satellite collections:
|
||||
|
||||
```
|
||||
arangosh> explain("FOR doc in nonsatellite FOR doc2 in nonsatellite2 RETURN 1")
|
||||
|
||||
Query string:
|
||||
FOR doc in nonsatellite FOR doc2 in nonsatellite2 RETURN 1
|
||||
|
||||
Execution plan:
|
||||
Id NodeType Site Est. Comment
|
||||
1 SingletonNode DBS 1 * ROOT
|
||||
4 CalculationNode DBS 1 - LET #2 = 1 /* json expression */ /* const assignment */
|
||||
2 EnumerateCollectionNode DBS 0 - FOR doc IN nonsatellite /* full collection scan */
|
||||
12 RemoteNode COOR 0 - REMOTE
|
||||
13 GatherNode COOR 0 - GATHER
|
||||
6 ScatterNode COOR 0 - SCATTER
|
||||
7 RemoteNode DBS 0 - REMOTE
|
||||
3 EnumerateCollectionNode DBS 0 - FOR doc2 IN nonsatellite2 /* full collection scan */
|
||||
8 RemoteNode COOR 0 - REMOTE
|
||||
9 GatherNode COOR 0 - GATHER
|
||||
5 ReturnNode COOR 0 - RETURN #2
|
||||
|
||||
Indexes used:
|
||||
none
|
||||
|
||||
Optimization rules applied:
|
||||
Id RuleName
|
||||
1 move-calculations-up
|
||||
2 scatter-in-cluster
|
||||
3 remove-unnecessary-remote-scatter
|
||||
```
|
||||
|
||||
All shards involved querying the `nonsatellite` collection will fan out via the
|
||||
coordinator to the shards of `nonsatellite`. In sum 8 shards will open 8 connections
|
||||
to the coordinator asking for the results of the `nonsatellite2` join. The coordinator
|
||||
will fan out to the 8 shards of `nonsatellite2`. So there will be quite some
|
||||
network traffic.
|
||||
|
||||
Let's now have a look at the same using satellite collections:
|
||||
|
||||
```
|
||||
arangosh> db._query("FOR doc in nonsatellite FOR doc2 in satellite RETURN 1")
|
||||
|
||||
Query string:
|
||||
FOR doc in nonsatellite FOR doc2 in satellite RETURN 1
|
||||
|
||||
Execution plan:
|
||||
Id NodeType Site Est. Comment
|
||||
1 SingletonNode DBS 1 * ROOT
|
||||
4 CalculationNode DBS 1 - LET #2 = 1 /* json expression */ /* const assignment */
|
||||
2 EnumerateCollectionNode DBS 0 - FOR doc IN nonsatellite /* full collection scan */
|
||||
3 EnumerateCollectionNode DBS 0 - FOR doc2 IN satellite /* full collection scan, satellite */
|
||||
8 RemoteNode COOR 0 - REMOTE
|
||||
9 GatherNode COOR 0 - GATHER
|
||||
5 ReturnNode COOR 0 - RETURN #2
|
||||
|
||||
Indexes used:
|
||||
none
|
||||
|
||||
Optimization rules applied:
|
||||
Id RuleName
|
||||
1 move-calculations-up
|
||||
2 scatter-in-cluster
|
||||
3 remove-unnecessary-remote-scatter
|
||||
4 remove-satellite-joins
|
||||
```
|
||||
|
||||
In this scenario all shards of nonsatellite will be contacted. However
|
||||
as the join is a satellite join all shards can do the join locally
|
||||
as the data is replicated to all servers reducing the network overhead
|
||||
dramatically.
|
||||
|
||||
### Caveats
|
||||
|
||||
The cluster will automatically keep all satellite collections on all servers in sync
|
||||
by facilitating the synchronous replication. This means that write will be executed
|
||||
on the leader only and this server will coordinate replication to the followers.
|
||||
If a follower doesn't answer in time (due to network problems, temporary shutdown etc.)
|
||||
it may be removed as a follower. This is being reported to the Agency.
|
||||
|
||||
The follower (once back in business) will then periodically check the Agency and know
|
||||
that it is out of sync. It will then automatically catch up. This may take a while
|
||||
depending on how much data has to be synced. When doing a join involving the satellite
|
||||
you can specify how long the DBServer is allowed to wait for sync until the query
|
||||
is being aborted.
|
||||
|
||||
Check [https://docs.arangodb.com/3.1/HTTP/AqlQueryCursor/AccessingCursors.html] for
|
||||
details.
|
||||
|
||||
During network failure there is also a minimal chance that a query was properly
|
||||
distributed to the DBServers but that a previous satellite write could not be
|
||||
replicated to a follower and the leader dropped the follower. The follower however
|
||||
only checks every few seconds if it is really in sync so it might indeed deliver
|
||||
stale results.
|
||||
|
|
|
@ -73,6 +73,12 @@ If set to *true*, then the additional query profiling information will be return
|
|||
in the sub-attribute *profile* of the *extra* return attribute if the query result
|
||||
is not served from the query cache.
|
||||
|
||||
@RESTSTRUCT{satelliteSyncWait,JSF_post_api_cursor_opts,boolean,optional,}
|
||||
This *enterprise* parameter allows to configure how long a DBServer will have time
|
||||
to bring the satellite collections involved in the query into sync.
|
||||
The default value is *60.0* (seconds). When the max time has been reached the query
|
||||
will be stopped.
|
||||
|
||||
@RESTDESCRIPTION
|
||||
The query details include the query string plus optional query options and
|
||||
bind parameters. These values need to be passed in a JSON representation in
|
||||
|
|
Loading…
Reference in New Issue