
Merge branch 'devel' of github.com:arangodb/arangodb into devel

jsteemann 2019-04-30 11:46:02 +02:00
commit d63b8706dd
6 changed files with 99 additions and 86 deletions

@ -6,17 +6,20 @@ This feature is only available in the
[**Enterprise Edition**](https://www.arangodb.com/why-arangodb/arangodb-enterprise/)
{% endhint %}
This chapter describes the [smart-graph](../README.md) module.
It enables you to manage graphs at scale, it will give a vast performance benefit for all graphs sharded in an ArangoDB Cluster.
On a single server this feature is pointless, hence it is only available in a cluster mode.
In terms of querying there is no difference between smart and General Graphs.
The former are a transparent replacement for the latter.
So for querying the graph please refer to [AQL Graph Operations](../../../AQL/Graphs/index.html)
and [Graph Functions](../GeneralGraphs/Functions.md) sections.
The optimizer is clever enough to identify if we are on a SmartGraph or not.
This chapter describes the `smart-graph` module, which enables you to manage
graphs at scale. It will give a vast performance benefit for all graphs sharded
in an ArangoDB Cluster. On a single server this feature is pointless, hence it
is only available in cluster mode.
The difference is only in the management section: creating and modifying the underlying collections of the graph.
For a detailed API reference please refer to [SmartGraph Management](../SmartGraphs/Management.md).
In terms of querying there is no difference between SmartGraphs and
General Graphs. The former is a transparent replacement for the latter.
For graph querying please refer to [AQL Graph Operations](../../../AQL/Graphs/index.html)
and [General Graph Functions](../GeneralGraphs/Functions.md) sections.
The optimizer is clever enough to identify whether it is a SmartGraph or not.
The difference is only in the management section: creating and modifying the
underlying collections of the graph. For a detailed API reference please refer
to [SmartGraph Management](Management.md).
Do the hands-on
[ArangoDB SmartGraphs Tutorial](https://www.arangodb.com/using-smartgraphs-arangodb/)
@ -25,50 +28,63 @@ to learn more.
What makes a graph smart?
-------------------------
Most graphs have one feature that divides the entire graph into several smaller subgraphs.
These subgraphs have a large amount of edges that only connect vertices in the same subgraph
and only have few edges connecting vertices from other subgraphs.
Most graphs have one feature that divides the entire graph into several smaller
subgraphs. These subgraphs have a large amount of edges that only connect
vertices in the same subgraph and only have few edges connecting vertices from
other subgraphs.
Examples for these graphs are:
* Social Networks
- **Social Networks**<br/>
Typically the feature here is the region/country users live in.
Every user typically has more contacts in the same region/country then she has in other regions/countries
Every user typically has more contacts in the same region/country than she
has in other regions/countries.
* Transport Systems
- **Transport Systems**<br/>
Here the feature is also the region/country. There are many local
transport connections but only a few across countries.
For those also the feature is the region/country. You have many local transportation but only few across countries.
* E-Commerce
In this case probably the category of products is a good feature. Often products of the same category are bought together.
- **E-Commerce**<br/>
In this case probably the category of products is a good feature.
Often products of the same category are bought together.
If this feature is known, SmartGraphs can make use of it.
When creating a SmartGraph you have to define a smartAttribute, which is the name of an attribute stored in every vertex.
The graph will than be automatically sharded in such a way that all vertices with the same value are stored on the same physical machine,
all edges connecting vertices with identical smartAttribute values are stored on this machine as well.
During query time the query optimizer and the query executor both know for every document exactly where it is stored and can thereby minimize network overhead.
Everything that can be computed locally will be computed locally.
When creating a SmartGraph you have to define a smartAttribute, which is the
name of an attribute stored in every vertex. The graph will then be
automatically sharded in such a way that all vertices with the same value are
stored on the same physical machine. All edges connecting vertices with
identical smartAttribute values are stored on this machine as well.
During query time the query optimizer and the query executor both know for
every document exactly where it is stored and can thereby minimize network
overhead. Everything that can be computed locally will be computed locally.
Benefits of SmartGraphs
-----------------------
Because of the above described guaranteed sharding, the performance of queries that only cover one subgraph have a performance almost equal to an only local computation.
Queries that cover more than one subgraph require some network overhead. The more subgraphs are touched the more network cost will apply.
However the overall performance is never worse than the same query on a General Graph.
Because of the guaranteed sharding described above, queries that only cover
one subgraph perform almost as well as a purely local computation. Queries
that cover more than one subgraph require some network overhead. The more
subgraphs are touched, the more network cost will apply.
However the overall performance is never worse than the same query using a
General Graph.
Getting started
---------------
First of all SmartGraphs *cannot use existing collections*, when switching to SmartGraph from an existing data set you have to import the data into a fresh SmartGraph.
This switch can be easily achieved with [arangodump](../../Programs/Arangodump/README.md)
and [arangorestore](../../Programs/Arangorestore/README.md).
The only thing you have to change in this pipeline is that you create the new collections with the SmartGraph before starting `arangorestore`.
First of all, SmartGraphs *cannot use existing collections*. When switching to
SmartGraphs from an existing data set you have to import the data into a fresh
SmartGraph. This switch can be easily achieved with
[arangodump](../../Programs/Arangodump/README.md) and
[arangorestore](../../Programs/Arangorestore/README.md).
The only thing you have to change in this pipeline is that you create the new
collections with the SmartGraph before starting `arangorestore`.
* Create a graph
In comparison to General Graph we have to add more options when creating the graph. The two options `smartGraphAttribute` and `numberOfShards` are required and cannot be modified later.
- Create a graph
In comparison to General Graph we have to add more options when creating the
graph. The two options `smartGraphAttribute` and `numberOfShards` are
required and cannot be modified later.
@startDocuBlockInline smartGraphCreateGraphHowTo1
arangosh> var graph_module = require("@arangodb/smart-graph");
@ -77,11 +93,10 @@ The only thing you have to change in this pipeline is that you create the new co
[ SmartGraph myGraph EdgeDefinitions: [ ] VertexCollections: [ ] ]
@endDocuBlock smartGraphCreateGraphHowTo1
- Add some vertex collections
* Add some vertex collections
This is again identical to General Graph. The module will setup correct sharding for all these collections. *Note*: The collections have to be new.
This is again identical to General Graph. The module will set up correct
sharding for all these collections. *Note*: The collections have to be new.
@startDocuBlockInline smartGraphCreateGraphHowTo2
arangosh> graph._addVertexCollection("shop");
@ -91,13 +106,11 @@ The only thing you have to change in this pipeline is that you create the new co
[ SmartGraph myGraph EdgeDefinitions: [ ] VertexCollections: [ "shop", "customer", "pet" ] ]
@endDocuBlock smartGraphCreateGraphHowTo2
* Define relations on the Graph
- Define relations on the Graph
@startDocuBlockInline smartGraphCreateGraphHowTo3
arangosh> var rel = graph_module._relation("isCustomer", ["shop"], ["customer"]);
arangosh> graph._extendEdgeDefinitions(rel);
arangosh> graph;
[ SmartGraph myGraph EdgeDefinitions: [ "isCustomer: [shop] -> [customer]" ] VertexCollections: [ "pet" ] ]
[ SmartGraph myGraph EdgeDefinitions: [ "isCustomer: [shop] -> [customer]" ] VertexCollections: [ "pet" ] ]
@endDocuBlock smartGraphCreateGraphHowTo3
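
The sharding idea described under "What makes a graph smart?" can be illustrated with a small standalone sketch. It is a toy model only: the helper names `toyHash`, `shardFor` and `isLocalEdge`, the `region` attribute and the hash itself are assumptions made for this example and are not part of the `smart-graph` module or of ArangoDB's actual shard placement.

```javascript
// Toy model of attribute-based sharding. The hash below is a stand-in,
// NOT the function ArangoDB uses internally.
function toyHash(value) {
  let h = 0;
  for (const ch of String(value)) {
    // Keep the running value an unsigned 32-bit integer.
    h = (h * 31 + ch.codePointAt(0)) >>> 0;
  }
  return h;
}

// All vertices with the same smart attribute value map to the same shard.
function shardFor(smartValue, numberOfShards) {
  return toyHash(smartValue) % numberOfShards;
}

// An edge can be stored next to its vertices when both ends share a shard.
function isLocalEdge(from, to, attribute, numberOfShards) {
  return shardFor(from[attribute], numberOfShards) ===
         shardFor(to[attribute], numberOfShards);
}

// Hypothetical vertices that use "region" as the smart attribute.
const alice = { name: "alice", region: "DE" };
const bob = { name: "bob", region: "DE" };

console.log(isLocalEdge(alice, bob, "region", 9)); // true
```

Vertices with different attribute values may still land on the same shard in this model. The only guarantee it demonstrates is that identical values always end up together, which is what lets a traversal inside one subgraph avoid network hops.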

@ -165,7 +165,7 @@ is used by these writers (in terms of "writers pool") one can use
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- **bytes_accum**: Consolidation is performed based on current memory cunsumption
- **bytes_accum**: Consolidation is performed based on current memory consumption
of segments and `threshold` property value.
- **tier**: Consolidate based on segment byte size and live document count
as dictated by the customization attributes.

@ -20,8 +20,8 @@ of commit+consolidate), a lower value will cause a lot of disk space to be
wasted.
For the case where the consolidation policies rarely merge segments (i.e. few
inserts/deletes), a higher value will impact performance without any added
benefits.
Background:
benefits.<br/>
_Background:_
With every "commit" or "consolidate" operation a new state of the view
internal data-structures is created on disk.
Old states/snapshots are released once there are no longer any users
@ -38,8 +38,8 @@ commit, will cause the index not to account for them and memory usage would
continue to grow.
For the case where there are a few inserts/updates, a higher value will impact
performance and waste disk space for each commit call without any added
benefits.
Background:
benefits.<br/>
_Background:_
For data retrieval ArangoSearch views follow the concept of
"eventually-consistent", i.e. eventually all the data in ArangoDB will be
matched by corresponding query expressions.
@ -60,8 +60,8 @@ For the case where there are a lot of data modification operations, a higher
value could potentially have the data store consume more space and file handles.
For the case where there are a few data modification operations, a lower value
will impact performance due to no segment candidates available for
consolidation.
Background:
consolidation.<br/>
_Background:_
For data modification ArangoSearch views follow the concept of a
"versioned data store". Thus old versions of data may be removed once there
are no longer any users of the old data. The frequency of the cleanup and
@ -71,8 +71,8 @@ Background:
@RESTSTRUCT{consolidationPolicy,post_api_view_props,object,optional,post_api_view_props_consolidation}
The consolidation policy to apply for selecting which segments should be merged
(default: {})
Background:
(default: {})<br/>
_Background:_
With each ArangoDB transaction that inserts documents one or more
ArangoSearch internal segments get created.
Similarly for removed documents the segments that contain such documents
@ -85,16 +85,16 @@ Background:
released once old segments are no longer used.
@RESTSTRUCT{type,post_api_view_props_consolidations,string,optional,string}
@RESTSTRUCT{type,post_api_view_props_consolidation,string,optional,string}
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are (default: "bytes_accum"):
- *bytes_accum*: consolidate if and only if ({threshold} range `[0.0, 1.0]`):
{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes
- *bytes_accum*: consolidate if and only if (`{threshold}` range `[0.0, 1.0]`):
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
i.e. the sum of all candidate segment byte size is less than the total
segment byte size multiplied by the {threshold}
segment byte size multiplied by the `{threshold}`
- *tier*: consolidate based on segment byte size and live document count
as dicated by the customization attributes.
as dictated by the customization attributes.
@RESTSTRUCT{links,post_api_view_props,object,optional,post_api_view_links}
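
The `bytes_accum` condition quoted above can be checked by hand with a short sketch. The function name `shouldConsolidate` and the sample byte sizes are assumptions for illustration; the code only evaluates the inequality as written in the documentation and does not reproduce how ArangoSearch selects candidates internally.

```javascript
// Evaluates the documented inequality:
//   threshold > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes
// Hypothetical helper, not an ArangoDB API.
function shouldConsolidate(threshold, segmentBytes, candidateBytes, allSegmentBytes) {
  if (threshold < 0 || threshold > 1) {
    throw new RangeError("threshold must be in the range [0.0, 1.0]");
  }
  // Sum the byte sizes of the segments already picked as merge candidates.
  const candidateSum = candidateBytes.reduce((sum, bytes) => sum + bytes, 0);
  return threshold > (segmentBytes + candidateSum) / allSegmentBytes;
}

// Made-up sizes: (10 + 20 + 30) / 200 = 0.3
console.log(shouldConsolidate(0.5, 10, [20, 30], 200)); // true
console.log(shouldConsolidate(0.1, 10, [20, 30], 200)); // false
```

With these numbers the candidates account for 30% of all segment bytes, so the condition holds for a threshold of 0.5 and fails for 0.1.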

@ -21,8 +21,8 @@ of commit+consolidate), a lower value will cause a lot of disk space to be
wasted.
For the case where the consolidation policies rarely merge segments (i.e. few
inserts/deletes), a higher value will impact performance without any added
benefits.
Background:
benefits.<br/>
_Background:_
With every "commit" or "consolidate" operation a new state of the view
internal data-structures is created on disk.
Old states/snapshots are released once there are no longer any users
@ -39,8 +39,8 @@ commit, will cause the index not to account for them and memory usage would
continue to grow.
For the case where there are a few inserts/updates, a higher value will impact
performance and waste disk space for each commit call without any added
benefits.
Background:
benefits.<br/>
_Background:_
For data retrieval ArangoSearch views follow the concept of
"eventually-consistent", i.e. eventually all the data in ArangoDB will be
matched by corresponding query expressions.
@ -61,8 +61,8 @@ For the case where there are a lot of data modification operations, a higher
value could potentially have the data store consume more space and file handles.
For the case where there are a few data modification operations, a lower value
will impact performance due to no segment candidates available for
consolidation.
Background:
consolidation.<br/>
_Background:_
For data modification ArangoSearch views follow the concept of a
"versioned data store". Thus old versions of data may be removed once there
are no longer any users of the old data. The frequency of the cleanup and
@ -72,8 +72,8 @@ Background:
@RESTSTRUCT{consolidationPolicy,post_api_view_props,object,optional,post_api_view_props_consolidation}
The consolidation policy to apply for selecting which segments should be merged
(default: {})
Background:
(default: {})<br/>
_Background:_
With each ArangoDB transaction that inserts documents one or more
ArangoSearch internal segments get created.
Similarly for removed documents the segments that contain such documents
@ -86,16 +86,16 @@ Background:
released once old segments are no longer used.
@RESTSTRUCT{type,post_api_view_props_consolidations,string,optional,string}
@RESTSTRUCT{type,post_api_view_props_consolidation,string,optional,string}
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are (default: "bytes_accum"):
- *bytes_accum*: consolidate if and only if ({threshold} range `[0.0, 1.0]`):
{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes
- *bytes_accum*: consolidate if and only if (`{threshold}` range `[0.0, 1.0]`):
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
i.e. the sum of all candidate segment byte size is less than the total
segment byte size multiplied by the {threshold}
segment byte size multiplied by the `{threshold}`
- *tier*: consolidate based on segment byte size and live document count
as dicated by the customization attributes.
as dictated by the customization attributes.
@RESTSTRUCT{links,post_api_view_props,object,optional,post_api_view_links}

@ -20,8 +20,8 @@ of commit+consolidate), a lower value will cause a lot of disk space to be
wasted.
For the case where the consolidation policies rarely merge segments (i.e. few
inserts/deletes), a higher value will impact performance without any added
benefits.
Background:
benefits.<br/>
_Background:_
With every "commit" or "consolidate" operation a new state of the view
internal data-structures is created on disk.
Old states/snapshots are released once there are no longer any users
@ -38,8 +38,8 @@ commit, will cause the index not to account for them and memory usage would
continue to grow.
For the case where there are a few inserts/updates, a higher value will impact
performance and waste disk space for each commit call without any added
benefits.
Background:
benefits.<br/>
_Background:_
For data retrieval ArangoSearch views follow the concept of
"eventually-consistent", i.e. eventually all the data in ArangoDB will be
matched by corresponding query expressions.
@ -60,8 +60,8 @@ For the case where there are a lot of data modification operations, a higher
value could potentially have the data store consume more space and file handles.
For the case where there are a few data modification operations, a lower value
will impact performance due to no segment candidates available for
consolidation.
Background:
consolidation.<br/>
_Background:_
For data modification ArangoSearch views follow the concept of a
"versioned data store". Thus old versions of data may be removed once there
are no longer any users of the old data. The frequency of the cleanup and
@ -71,8 +71,8 @@ Background:
@RESTSTRUCT{consolidationPolicy,post_api_view_props,object,optional,post_api_view_props_consolidation}
The consolidation policy to apply for selecting which segments should be merged
(default: {})
Background:
(default: {})<br/>
_Background:_
With each ArangoDB transaction that inserts documents one or more
ArangoSearch internal segments get created.
Similarly for removed documents the segments that contain such documents
@ -85,16 +85,16 @@ Background:
released once old segments are no longer used.
@RESTSTRUCT{type,post_api_view_props_consolidations,string,optional,string}
@RESTSTRUCT{type,post_api_view_props_consolidation,string,optional,string}
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are (default: "bytes_accum"):
- *bytes_accum*: consolidate if and only if ({threshold} range `[0.0, 1.0]`):
{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes
- *bytes_accum*: consolidate if and only if (`{threshold}` range `[0.0, 1.0]`):
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
i.e. the sum of all candidate segment byte size is less than the total
segment byte size multiplied by the {threshold}
segment byte size multiplied by the `{threshold}`
- *tier*: consolidate based on segment byte size and live document count
as dicated by the customization attributes.
as dictated by the customization attributes.
@RESTSTRUCT{links,post_api_view_props,object,optional,post_api_view_links}

@ -11,7 +11,7 @@
<li class="collections-menu"><a id="collections" class="tab" href="#collections"><i class="fa fa-folder"></i>Collections</a></li>
<li class="views-menu"><a id="views" class="tab" href="#views"><i class="fa fa-eye"></i>Views</a></li>
<li class="queries-menu"><a id="queries" class="tab" href="#queries"><i class="fa fa-bolt"></i>Queries</a></li>
<li class="graphs-menu"><a id="graphs" class="tab" href="#graphs"><i class="fa fa-area-chart"></i>Graphs</a></li>
<li class="graphs-menu"><a id="graphs" class="tab" href="#graphs"><i class="fa fa-sitemap"></i>Graphs</a></li>
<li class="services-menu">
<a id="services" class="tab" href="#services"><i class="fa fa-cogs"></i>Services</a>
</li>