mirror of https://gitee.com/bigwinds/arangodb
Merge branch 'devel' of github.com:arangodb/arangodb into devel
commit
d63b8706dd
|
@@ -6,17 +6,20 @@ This feature is only available in the
|
|||
[**Enterprise Edition**](https://www.arangodb.com/why-arangodb/arangodb-enterprise/)
|
||||
{% endhint %}
|
||||
|
||||
This chapter describes the [smart-graph](../README.md) module.
|
||||
It enables you to manage graphs at scale, it will give a vast performance benefit for all graphs sharded in an ArangoDB Cluster.
|
||||
On a single server this feature is pointless, hence it is only available in a cluster mode.
|
||||
In terms of querying there is no difference between smart and General Graphs.
|
||||
The former are a transparent replacement for the latter.
|
||||
So for querying the graph please refer to [AQL Graph Operations](../../../AQL/Graphs/index.html)
|
||||
and [Graph Functions](../GeneralGraphs/Functions.md) sections.
|
||||
The optimizer is clever enough to identify if we are on a SmartGraph or not.
|
||||
This chapter describes the `smart-graph` module, which enables you to manage
|
||||
graphs at scale. It will give a vast performance benefit for all graphs sharded
|
||||
in an ArangoDB Cluster. On a single server this feature is pointless, hence it
|
||||
is only available in cluster mode.
|
||||
|
||||
The difference is only in the management section: creating and modifying the underlying collections of the graph.
|
||||
For a detailed API reference please refer to [SmartGraph Management](../SmartGraphs/Management.md).
|
||||
In terms of querying there is no difference between SmartGraphs and
|
||||
General Graphs. The former is a transparent replacement for the latter.
|
||||
For graph querying please refer to [AQL Graph Operations](../../../AQL/Graphs/index.html)
|
||||
and [General Graph Functions](../GeneralGraphs/Functions.md) sections.
|
||||
The optimizer is clever enough to identify whether it is a SmartGraph or not.
|
||||
|
||||
The difference is only in the management section: creating and modifying the
|
||||
underlying collections of the graph. For a detailed API reference please refer
|
||||
to [SmartGraph Management](Management.md).
|
||||
|
||||
Do the hands-on
|
||||
[ArangoDB SmartGraphs Tutorial](https://www.arangodb.com/using-smartgraphs-arangodb/)
|
||||
|
@@ -25,50 +28,63 @@ to learn more.
|
|||
What makes a graph smart?
|
||||
-------------------------
|
||||
|
||||
Most graphs have one feature that divides the entire graph into several smaller subgraphs.
|
||||
These subgraphs have a large amount of edges that only connect vertices in the same subgraph
|
||||
and only have few edges connecting vertices from other subgraphs.
|
||||
Most graphs have one feature that divides the entire graph into several smaller
|
||||
subgraphs. These subgraphs have a large amount of edges that only connect
|
||||
vertices in the same subgraph and only have few edges connecting vertices from
|
||||
other subgraphs.
|
||||
|
||||
Examples for these graphs are:
|
||||
|
||||
* Social Networks
|
||||
|
||||
- **Social Networks**<br/>
|
||||
Typically the feature here is the region/country users live in.
|
||||
Every user typically has more contacts in the same region/country then she has in other regions/countries
|
||||
Every user typically has more contacts in the same region/country then she
|
||||
has in other regions/countries
|
||||
|
||||
* Transport Systems
|
||||
- **Transport Systems**<br/>
|
||||
For those also the feature is the region/country. You have many local
|
||||
transportation but only few across countries.
|
||||
|
||||
For those also the feature is the region/country. You have many local transportation but only few across countries.
|
||||
|
||||
* E-Commerce
|
||||
|
||||
In this case probably the category of products is a good feature. Often products of the same category are bought together.
|
||||
- **E-Commerce**<br/>
|
||||
In this case probably the category of products is a good feature.
|
||||
Often products of the same category are bought together.
|
||||
|
||||
If this feature is known, SmartGraphs can make use if it.
|
||||
When creating a SmartGraph you have to define a smartAttribute, which is the name of an attribute stored in every vertex.
|
||||
The graph will than be automatically sharded in such a way that all vertices with the same value are stored on the same physical machine,
|
||||
all edges connecting vertices with identical smartAttribute values are stored on this machine as well.
|
||||
During query time the query optimizer and the query executor both know for every document exactly where it is stored and can thereby minimize network overhead.
|
||||
Everything that can be computed locally will be computed locally.
|
||||
|
||||
When creating a SmartGraph you have to define a smartAttribute, which is the
|
||||
name of an attribute stored in every vertex. The graph will than be
|
||||
automatically sharded in such a way that all vertices with the same value are
|
||||
stored on the same physical machine, all edges connecting vertices with
|
||||
identical smartAttribute values are stored on this machine as well.
|
||||
During query time the query optimizer and the query executor both know for
|
||||
every document exactly where it is stored and can thereby minimize network
|
||||
overhead. Everything that can be computed locally will be computed locally.
|
||||
|
||||
Benefits of SmartGraphs
|
||||
-----------------------
|
||||
|
||||
Because of the above described guaranteed sharding, the performance of queries that only cover one subgraph have a performance almost equal to an only local computation.
|
||||
Queries that cover more than one subgraph require some network overhead. The more subgraphs are touched the more network cost will apply.
|
||||
However the overall performance is never worse than the same query on a General Graph.
|
||||
Because of the above described guaranteed sharding, the performance of queries
|
||||
that only cover one subgraph have a performance almost equal to an only local
|
||||
computation. Queries that cover more than one subgraph require some network
|
||||
overhead. The more subgraphs are touched the more network cost will apply.
|
||||
However the overall performance is never worse than the same query using a
|
||||
General Graph.
|
||||
|
||||
Getting started
|
||||
---------------
|
||||
|
||||
First of all SmartGraphs *cannot use existing collections*, when switching to SmartGraph from an existing data set you have to import the data into a fresh SmartGraph.
|
||||
This switch can be easily achieved with [arangodump](../../Programs/Arangodump/README.md)
|
||||
and [arangorestore](../../Programs/Arangorestore/README.md).
|
||||
The only thing you have to change in this pipeline is that you create the new collections with the SmartGraph before starting `arangorestore`.
|
||||
First of all SmartGraphs *cannot use existing collections*, when switching to
|
||||
SmartGraph from an existing data set you have to import the data into a fresh
|
||||
SmartGraph. This switch can be easily achieved with
|
||||
[arangodump](../../Programs/Arangodump/README.md) and
|
||||
[arangorestore](../../Programs/Arangorestore/README.md).
|
||||
The only thing you have to change in this pipeline is that you create the new
|
||||
collections with the SmartGraph before starting `arangorestore`.
|
||||
|
||||
* Create a graph
|
||||
|
||||
In comparison to General Graph we have to add more options when creating the graph. The two options `smartGraphAttribute` and `numberOfShards` are required and cannot be modified later.
|
||||
- Create a graph
|
||||
|
||||
In comparison to General Graph we have to add more options when creating the
|
||||
graph. The two options `smartGraphAttribute` and `numberOfShards` are
|
||||
required and cannot be modified later.
|
||||
|
||||
@startDocuBlockInline smartGraphCreateGraphHowTo1
|
||||
arangosh> var graph_module = require("@arangodb/smart-graph");
|
||||
|
@@ -77,11 +93,10 @@ The only thing you have to change in this pipeline is that you create the new co
|
|||
[ SmartGraph myGraph EdgeDefinitions: [ ] VertexCollections: [ ] ]
|
||||
@endDocuBlock smartGraphCreateGraphHowTo1
|
||||
|
||||
- Add some vertex collections
|
||||
|
||||
* Add some vertex collections
|
||||
|
||||
This is again identical to General Graph. The module will setup correct sharding for all these collections. *Note*: The collections have to be new.
|
||||
|
||||
This is again identical to General Graph. The module will setup correct
|
||||
sharding for all these collections. *Note*: The collections have to be new.
|
||||
|
||||
@startDocuBlockInline smartGraphCreateGraphHowTo2
|
||||
arangosh> graph._addVertexCollection("shop");
|
||||
|
@@ -91,9 +106,7 @@ The only thing you have to change in this pipeline is that you create the new co
|
|||
[ SmartGraph myGraph EdgeDefinitions: [ ] VertexCollections: [ "shop", "customer", "pet" ] ]
|
||||
@endDocuBlock smartGraphCreateGraphHowTo2
|
||||
|
||||
|
||||
* Define relations on the Graph
|
||||
|
||||
- Define relations on the Graph
|
||||
|
||||
@startDocuBlockInline smartGraphCreateGraphHowTo3
|
||||
arangosh> var rel = graph_module._relation("isCustomer", ["shop"], ["customer"]);
|
||||
|
|
|
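Reviewer note on the SmartGraph hunks above: the documentation states that vertices with the same `smartGraphAttribute` value end up on the same machine, and that edges between such vertices are stored there as well. The following is a minimal sketch of that idea in plain JavaScript. It is not ArangoDB's sharding code; `shardFor`, `isLocalEdge`, `NUMBER_OF_SHARDS`, and the sample `region` attribute are hypothetical names chosen for illustration, and the hash is a toy one.

```javascript
// Illustrative only: a toy placement function, NOT ArangoDB's implementation.
// Assumed/hypothetical names: NUMBER_OF_SHARDS, shardFor, isLocalEdge.
const NUMBER_OF_SHARDS = 4;

// Map a smartAttribute value to a shard index with a simple string hash.
// Equal values always yield the same shard, which is the property the
// documentation relies on.
function shardFor(smartValue) {
  let hash = 0;
  for (const ch of String(smartValue)) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash % NUMBER_OF_SHARDS;
}

// An edge can be kept next to both endpoints when they land on the same shard.
// This is always the case when both vertices share the smartAttribute value.
function isLocalEdge(fromVertex, toVertex, smartAttribute) {
  return shardFor(fromVertex[smartAttribute]) === shardFor(toVertex[smartAttribute]);
}

const alice = { name: "Alice", region: "DE" };
const bob = { name: "Bob", region: "DE" };

console.log(shardFor(alice.region) === shardFor(bob.region)); // same value, same shard
console.log(isLocalEdge(alice, bob, "region"));
```

Vertices with different attribute values may or may not share a shard in this sketch; only equal values are guaranteed to be co-located, which mirrors the guarantee described in the documentation text.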
@@ -165,7 +165,7 @@ is used by these writers (in terms of "writers pool") one can use
|
|||
upon several possible configurable formulas as defined by their types.
|
||||
The currently supported types are:
|
||||
|
||||
- **bytes_accum**: Consolidation is performed based on current memory cunsumption
|
||||
- **bytes_accum**: Consolidation is performed based on current memory consumption
|
||||
of segments and `threshold` property value.
|
||||
- **tier**: Consolidate based on segment byte size and live document count
|
||||
as dictated by the customization attributes.
|
||||
|
|
|
@@ -20,8 +20,8 @@ of commit+consolidate), a lower value will cause a lot of disk space to be
|
|||
wasted.
|
||||
For the case where the consolidation policies rarely merge segments (i.e. few
|
||||
inserts/deletes), a higher value will impact performance without any added
|
||||
benefits.
|
||||
Background:
|
||||
benefits.<br/>
|
||||
_Background:_
|
||||
With every "commit" or "consolidate" operation a new state of the view
|
||||
internal data-structures is created on disk.
|
||||
Old states/snapshots are released once there are no longer any users
|
||||
|
@@ -38,8 +38,8 @@ commit, will cause the index not to account for them and memory usage would
|
|||
continue to grow.
|
||||
For the case where there are a few inserts/updates, a higher value will impact
|
||||
performance and waste disk space for each commit call without any added
|
||||
benefits.
|
||||
Background:
|
||||
benefits.<br/>
|
||||
_Background:_
|
||||
For data retrieval ArangoSearch views follow the concept of
|
||||
"eventually-consistent", i.e. eventually all the data in ArangoDB will be
|
||||
matched by corresponding query expressions.
|
||||
|
@@ -60,8 +60,8 @@ For the case where there are a lot of data modification operations, a higher
|
|||
value could potentially have the data store consume more space and file handles.
|
||||
For the case where there are a few data modification operations, a lower value
|
||||
will impact performance due to no segment candidates available for
|
||||
consolidation.
|
||||
Background:
|
||||
consolidation.<br/>
|
||||
_Background:_
|
||||
For data modification ArangoSearch views follow the concept of a
|
||||
"versioned data store". Thus old versions of data may be removed once there
|
||||
are no longer any users of the old data. The frequency of the cleanup and
|
||||
|
@@ -71,8 +71,8 @@ Background:
|
|||
|
||||
@RESTSTRUCT{consolidationPolicy,post_api_view_props,object,optional,post_api_view_props_consolidation}
|
||||
The consolidation policy to apply for selecting which segments should be merged
|
||||
(default: {})
|
||||
Background:
|
||||
(default: {})<br/>
|
||||
_Background:_
|
||||
With each ArangoDB transaction that inserts documents one or more
|
||||
ArangoSearch internal segments gets created.
|
||||
Similarly for removed documents the segments that contain such documents
|
||||
|
@@ -85,16 +85,16 @@ Background:
|
|||
released once old segments are no longer used.
|
||||
|
||||
|
||||
@RESTSTRUCT{type,post_api_view_props_consolidations,string,optional,string}
|
||||
@RESTSTRUCT{type,post_api_view_props_consolidation,string,optional,string}
|
||||
The segment candidates for the "consolidation" operation are selected based
|
||||
upon several possible configurable formulas as defined by their types.
|
||||
The currently supported types are (default: "bytes_accum"):
|
||||
- *bytes_accum*: consolidate if and only if ({threshold} range `[0.0, 1.0]`):
|
||||
{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes
|
||||
- *bytes_accum*: consolidate if and only if (`{threshold}` range `[0.0, 1.0]`):
|
||||
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
|
||||
i.e. the sum of all candidate segment byte size is less than the total
|
||||
segment byte size multiplied by the {threshold}
|
||||
segment byte size multiplied by the `{threshold}`
|
||||
- *tier*: consolidate based on segment byte size and live document count
|
||||
as dicated by the customization attributes.
|
||||
as dictated by the customization attributes.
|
||||
|
||||
|
||||
@RESTSTRUCT{links,post_api_view_props,object,optional,post_api_view_links}
|
||||
|
|
|
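Reviewer note on the `bytes_accum` text in the hunk above: the condition `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes` can be evaluated as a plain predicate. The sketch below only restates that inequality in JavaScript. It is not ArangoSearch code, `shouldConsolidate` and its parameter names are hypothetical, and how the set of merge candidates is chosen is outside what the diff describes.

```javascript
// Sketch of the bytes_accum condition quoted in the documentation diff.
// NOT ArangoSearch code; shouldConsolidate is a hypothetical helper.
function shouldConsolidate(threshold, segmentBytes, candidateBytes, allSegmentBytes) {
  // The documentation gives the threshold range as [0.0, 1.0].
  if (threshold < 0.0 || threshold > 1.0) {
    throw new RangeError("threshold must be within [0.0, 1.0]");
  }
  // sum_of_merge_candidate_segment_bytes
  const sumOfCandidates = candidateBytes.reduce((sum, bytes) => sum + bytes, 0);
  // {threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes
  return threshold > (segmentBytes + sumOfCandidates) / allSegmentBytes;
}

// (10 + 20 + 10) / 100 = 0.4
console.log(shouldConsolidate(0.5, 10, [20, 10], 100)); // true, since 0.5 > 0.4
console.log(shouldConsolidate(0.1, 10, [20, 10], 100)); // false, since 0.1 > 0.4 does not hold
```

With these made-up byte counts, a larger threshold admits the merge and a smaller one rejects it, which matches the documented reading that the combined candidate size must stay below the total segment size multiplied by the threshold.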
@@ -21,8 +21,8 @@ of commit+consolidate), a lower value will cause a lot of disk space to be
|
|||
wasted.
|
||||
For the case where the consolidation policies rarely merge segments (i.e. few
|
||||
inserts/deletes), a higher value will impact performance without any added
|
||||
benefits.
|
||||
Background:
|
||||
benefits.<br/>
|
||||
_Background:_
|
||||
With every "commit" or "consolidate" operation a new state of the view
|
||||
internal data-structures is created on disk.
|
||||
Old states/snapshots are released once there are no longer any users
|
||||
|
@@ -39,8 +39,8 @@ commit, will cause the index not to account for them and memory usage would
|
|||
continue to grow.
|
||||
For the case where there are a few inserts/updates, a higher value will impact
|
||||
performance and waste disk space for each commit call without any added
|
||||
benefits.
|
||||
Background:
|
||||
benefits.<br/>
|
||||
_Background:_
|
||||
For data retrieval ArangoSearch views follow the concept of
|
||||
"eventually-consistent", i.e. eventually all the data in ArangoDB will be
|
||||
matched by corresponding query expressions.
|
||||
|
@@ -61,8 +61,8 @@ For the case where there are a lot of data modification operations, a higher
|
|||
value could potentially have the data store consume more space and file handles.
|
||||
For the case where there are a few data modification operations, a lower value
|
||||
will impact performance due to no segment candidates available for
|
||||
consolidation.
|
||||
Background:
|
||||
consolidation.<br/>
|
||||
_Background:_
|
||||
For data modification ArangoSearch views follow the concept of a
|
||||
"versioned data store". Thus old versions of data may be removed once there
|
||||
are no longer any users of the old data. The frequency of the cleanup and
|
||||
|
@@ -72,8 +72,8 @@ Background:
|
|||
|
||||
@RESTSTRUCT{consolidationPolicy,post_api_view_props,object,optional,post_api_view_props_consolidation}
|
||||
The consolidation policy to apply for selecting which segments should be merged
|
||||
(default: {})
|
||||
Background:
|
||||
(default: {})<br/>
|
||||
_Background:_
|
||||
With each ArangoDB transaction that inserts documents one or more
|
||||
ArangoSearch internal segments gets created.
|
||||
Similarly for removed documents the segments that contain such documents
|
||||
|
@@ -86,16 +86,16 @@ Background:
|
|||
released once old segments are no longer used.
|
||||
|
||||
|
||||
@RESTSTRUCT{type,post_api_view_props_consolidations,string,optional,string}
|
||||
@RESTSTRUCT{type,post_api_view_props_consolidation,string,optional,string}
|
||||
The segment candidates for the "consolidation" operation are selected based
|
||||
upon several possible configurable formulas as defined by their types.
|
||||
The currently supported types are (default: "bytes_accum"):
|
||||
- *bytes_accum*: consolidate if and only if ({threshold} range `[0.0, 1.0]`):
|
||||
{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes
|
||||
- *bytes_accum*: consolidate if and only if (`{threshold}` range `[0.0, 1.0]`):
|
||||
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
|
||||
i.e. the sum of all candidate segment byte size is less than the total
|
||||
segment byte size multiplied by the {threshold}
|
||||
segment byte size multiplied by the `{threshold}`
|
||||
- *tier*: consolidate based on segment byte size and live document count
|
||||
as dicated by the customization attributes.
|
||||
as dictated by the customization attributes.
|
||||
|
||||
|
||||
@RESTSTRUCT{links,post_api_view_props,object,optional,post_api_view_links}
|
||||
|
|
|
@@ -20,8 +20,8 @@ of commit+consolidate), a lower value will cause a lot of disk space to be
|
|||
wasted.
|
||||
For the case where the consolidation policies rarely merge segments (i.e. few
|
||||
inserts/deletes), a higher value will impact performance without any added
|
||||
benefits.
|
||||
Background:
|
||||
benefits.<br/>
|
||||
_Background:_
|
||||
With every "commit" or "consolidate" operation a new state of the view
|
||||
internal data-structures is created on disk.
|
||||
Old states/snapshots are released once there are no longer any users
|
||||
|
@@ -38,8 +38,8 @@ commit, will cause the index not to account for them and memory usage would
|
|||
continue to grow.
|
||||
For the case where there are a few inserts/updates, a higher value will impact
|
||||
performance and waste disk space for each commit call without any added
|
||||
benefits.
|
||||
Background:
|
||||
benefits.<br/>
|
||||
_Background:_
|
||||
For data retrieval ArangoSearch views follow the concept of
|
||||
"eventually-consistent", i.e. eventually all the data in ArangoDB will be
|
||||
matched by corresponding query expressions.
|
||||
|
@@ -60,8 +60,8 @@ For the case where there are a lot of data modification operations, a higher
|
|||
value could potentially have the data store consume more space and file handles.
|
||||
For the case where there are a few data modification operations, a lower value
|
||||
will impact performance due to no segment candidates available for
|
||||
consolidation.
|
||||
Background:
|
||||
consolidation.<br/>
|
||||
_Background:_
|
||||
For data modification ArangoSearch views follow the concept of a
|
||||
"versioned data store". Thus old versions of data may be removed once there
|
||||
are no longer any users of the old data. The frequency of the cleanup and
|
||||
|
@@ -71,8 +71,8 @@ Background:
|
|||
|
||||
@RESTSTRUCT{consolidationPolicy,post_api_view_props,object,optional,post_api_view_props_consolidation}
|
||||
The consolidation policy to apply for selecting which segments should be merged
|
||||
(default: {})
|
||||
Background:
|
||||
(default: {})<br/>
|
||||
_Background:_
|
||||
With each ArangoDB transaction that inserts documents one or more
|
||||
ArangoSearch internal segments gets created.
|
||||
Similarly for removed documents the segments that contain such documents
|
||||
|
@@ -85,16 +85,16 @@ Background:
|
|||
released once old segments are no longer used.
|
||||
|
||||
|
||||
@RESTSTRUCT{type,post_api_view_props_consolidations,string,optional,string}
|
||||
@RESTSTRUCT{type,post_api_view_props_consolidation,string,optional,string}
|
||||
The segment candidates for the "consolidation" operation are selected based
|
||||
upon several possible configurable formulas as defined by their types.
|
||||
The currently supported types are (default: "bytes_accum"):
|
||||
- *bytes_accum*: consolidate if and only if ({threshold} range `[0.0, 1.0]`):
|
||||
{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes
|
||||
- *bytes_accum*: consolidate if and only if (`{threshold}` range `[0.0, 1.0]`):
|
||||
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
|
||||
i.e. the sum of all candidate segment byte size is less than the total
|
||||
segment byte size multiplied by the {threshold}
|
||||
segment byte size multiplied by the `{threshold}`
|
||||
- *tier*: consolidate based on segment byte size and live document count
|
||||
as dicated by the customization attributes.
|
||||
as dictated by the customization attributes.
|
||||
|
||||
|
||||
@RESTSTRUCT{links,post_api_view_props,object,optional,post_api_view_links}
|
||||
|
|
|
@@ -11,7 +11,7 @@
|
|||
<li class="collections-menu"><a id="collections" class="tab" href="#collections"><i class="fa fa-folder"></i>Collections</a></li>
|
||||
<li class="views-menu"><a id="views" class="tab" href="#views"><i class="fa fa-eye"></i>Views</a></li>
|
||||
<li class="queries-menu"><a id="queries" class="tab" href="#queries"><i class="fa fa-bolt"></i>Queries</a></li>
|
||||
<li class="graphs-menu"><a id="graphs" class="tab" href="#graphs"><i class="fa fa-area-chart"></i>Graphs</a></li>
|
||||
<li class="graphs-menu"><a id="graphs" class="tab" href="#graphs"><i class="fa fa-sitemap"></i>Graphs</a></li>
|
||||
<li class="services-menu">
|
||||
<a id="services" class="tab" href="#services"><i class="fa fa-cogs"></i>Services</a>
|
||||
</li>
|
||||
|
|