
Feature 3.4/ncc1701 (#8441)

Jan 2019-03-21 14:53:28 +01:00 committed by GitHub
parent 87eafe7357
commit 3f3f0c6fe3
46 changed files with 669 additions and 388 deletions


@@ -34,6 +34,7 @@
#include <string>
#include "velocypack/velocypack-common.h"
#include "velocypack/Exception.h"
namespace arangodb {
namespace velocypack {
@@ -99,6 +100,16 @@ class StringRef {
/// @brief create a StringRef from a VPack slice of type String
StringRef& operator=(Slice slice);
StringRef substr(size_t pos = 0, size_t count = std::string::npos) const {
if (pos >= _length) {
throw Exception(Exception::IndexOutOfBounds, "substr index out of bounds");
}
if (count == std::string::npos || (count + pos >= _length)) {
count = _length - pos;
}
return StringRef(_data + pos, count);
}
int compare(std::string const& other) const noexcept {
int res = memcmp(_data, other.data(), (std::min)(_length, other.size()));
if (res != 0) {


@@ -164,7 +164,7 @@ to the [naming conventions](../NamingConventions/README.md).
dramatically when using joins in AQL at the costs of reduced write
performance on these collections.
- *distributeShardsLike*: distribute the shards of this collection
cloning the shard distribution of another. If this value is set,
it will copy the attributes *replicationFactor*, *numberOfShards* and
*shardingStrategy* from the other collection.
@@ -199,6 +199,20 @@ to the [naming conventions](../NamingConventions/README.md).
In single-server mode, the *shardingStrategy* attribute is meaningless
and will be ignored.
- *smartJoinAttribute*: in an *Enterprise Edition* cluster, this attribute
determines an attribute of the collection that must contain the shard key value
of the referred-to smart join collection. Additionally, the shard key
for a document in this collection must contain the value of this attribute,
followed by a colon, followed by the actual primary key of the document.
This feature can only be used in the *Enterprise Edition* and requires the
*distributeShardsLike* attribute of the collection to be set to the name
of another collection. It also requires the *shardKeys* attribute of the
collection to be set to a single shard key attribute, with an additional ':'
at the end.
A further restriction is that whenever documents are stored or updated in the
collection, the value stored in the *smartJoinAttribute* must be a string.
`db._create(collection-name, properties, type)`
Specifies the optional *type* of the collection, it can either be *document*


@@ -241,6 +241,7 @@
* [Auth and OAuth2](Foxx/Migrating2x/Auth.md)
* [Foxx Queries](Foxx/Migrating2x/Queries.md)
* [Satellite Collections](Satellites.md)
* [Smart Joins](SmartJoins.md)
## OPERATIONS


@@ -6,10 +6,10 @@ This feature is only available in the
[**Enterprise Edition**](https://www.arangodb.com/why-arangodb/arangodb-enterprise/)
{% endhint %}
When doing joins in an ArangoDB cluster data has to be exchanged between different servers.
Joins will be executed on a coordinator. It will prepare an execution plan
and execute it. When executing, the coordinator will contact all shards of the
starting point of the join and ask for their data. The database servers carrying
out this operation will load all their local data and then ask the cluster for
the other part of the join. This again will be distributed to all involved shards
@@ -23,7 +23,7 @@ Satellite collections are collections that are intended to address this issue.
They will facilitate the synchronous replication and replicate all its data
to all database servers that are part of the cluster.
This enables the database servers to execute that part of any join locally.
This greatly improves performance for such joins at the costs of increased
storage requirements and poorer write performance on this data.


@@ -0,0 +1,195 @@
Smart Joins
===========
{% hint 'info' %}
This feature is only available in the
[**Enterprise Edition**](https://www.arangodb.com/why-arangodb/arangodb-enterprise/)
{% endhint %}
When doing joins in an ArangoDB cluster data has to be exchanged between different servers.
Joins between collections in a cluster normally require roundtrips between the shards of
the different collections for fetching the data. Requests are routed through an extra
coordinator hop.
For example, with two collections *collection1* and *collection2* with 4 shards each,
the coordinator will initially contact the 4 shards of *collection1*. In order to perform
the join, the DBServer nodes which manage the actual data of *collection1* need to pull
the data from the other collection, *collection2*. This causes extra roundtrips via the
coordinator, which will then pull the data for *collection2* from the responsible shards:
arangosh> db._explain("FOR doc1 IN collection1 FOR doc2 IN collection2 FILTER doc1._key == doc2._key RETURN doc1");
Query String:
FOR doc1 IN collection1 FOR doc2 IN collection2 FILTER doc1._key == doc2._key RETURN doc1
Execution plan:
Id NodeType Site Est. Comment
1 SingletonNode DBS 1 * ROOT
3 EnumerateCollectionNode DBS 0 - FOR doc2 IN collection2 /* full collection scan, 4 shard(s) */
14 RemoteNode COOR 0 - REMOTE
15 GatherNode COOR 0 - GATHER
8 ScatterNode COOR 0 - SCATTER
9 RemoteNode DBS 0 - REMOTE
7 IndexNode DBS 0 - FOR doc1 IN collection1 /* primary index scan, 4 shard(s) */
10 RemoteNode COOR 0 - REMOTE
11 GatherNode COOR 0 - GATHER
6 ReturnNode COOR 0 - RETURN doc1
This is the general query execution, and it makes sense if there is no further
information available about how the data is actually distributed to the individual
shards. It works in case *collection1* and *collection2* have a different number
of shards, or use different shard keys or strategies. However, it comes with the
additional cost of having to do 4 x 4 requests to perform the join.
Using distributeShardsLike
--------------------------
In the specific case that the two collections have the same number of shards and
identical sharding, the data of the two collections can be co-located on the same
server for the same shard key values. In this case the extra hop via the coordinator
is not necessary.
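As a toy illustration of co-location (an assumed simplification for this sketch, not ArangoDB's actual hash function), a collection can be thought of as mapping each shard key value to a shard index:

```javascript
// Toy sketch: a collection maps a shard key value to a shard index.
// With distributeShardsLike, both collections share numberOfShards and
// the same mapping, so equal shard key values land on the same shard
// index and the join partner is found on the same server.
function shardIndex(shardKeyValue, numberOfShards) {
  let h = 0;
  for (const ch of shardKeyValue) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // toy string hash, not ArangoDB's
  }
  return h % numberOfShards;
}

const numberOfShards = 4; // identical for both collections
// "collection2" created with distributeShardsLike: "collection1"
// uses the identical mapping:
const shard1 = shardIndex("test7", numberOfShards); // shard of doc in collection1
const shard2 = shardIndex("test7", numberOfShards); // shard of join partner in collection2
// shard1 === shard2, so this part of the join can run server-locally
```

Because the mapping is deterministic and shared, the optimizer can prove co-location without contacting the shards.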
The query optimizer will remove the extra hop for the join in case it can prove
that data for the two collections is co-located. There are the following requirements
for this:
* using the cluster version of the ArangoDB *Enterprise Edition*
* using two collections with identical sharding. This requires the second collection
to be created with its *distributeShardsLike* attribute pointing to the first
collection
* using a single shard key per collection
Here is an example setup for this, using arangosh:
arangosh> db._create("collection1", {numberOfShards: 4, shardKeys: ["_key"]});
arangosh> db._create("collection2", {numberOfShards: 4, shardKeys: ["_key"], distributeShardsLike: "collection1"});
arangosh> for (i = 0; i < 100; ++i) {
db.collection1.insert({ _key: "test" + i });
db.collection2.insert({ _key: "test" + i });
}
With the two collections in place like this, an AQL query that uses a FILTER
condition comparing the shard key of the one collection with the shard key of the
other collection by equality is eligible for the "smart-join" optimization of the
query optimizer:
arangosh> db._explain("FOR doc1 IN collection1 FOR doc2 IN collection2 FILTER doc1._key == doc2._key RETURN doc1");
Query String:
FOR doc1 IN collection1 FOR doc2 IN collection2 FILTER doc1._key == doc2._key RETURN doc1
Execution plan:
Id NodeType Site Est. Comment
1 SingletonNode DBS 1 * ROOT
3 EnumerateCollectionNode DBS 0 - FOR doc2 IN collection2 /* full collection scan, 4 shard(s) */
7 IndexNode DBS 0 - FOR doc1 IN collection1 /* primary index scan, 4 shard(s) */
10 RemoteNode COOR 0 - REMOTE
11 GatherNode COOR 0 - GATHER
6 ReturnNode COOR 0 - RETURN doc1
As can be seen above, the extra hop via the coordinator is gone here, which will mean
less cluster-internal traffic and a faster response time.
Smart joins will also work if the shard key of the second collection is not *_key*,
and even for non-unique shard key values, e.g.:
arangosh> db._create("collection1", {numberOfShards: 4, shardKeys: ["_key"]});
arangosh> db._create("collection2", {numberOfShards: 4, shardKeys: ["parent"], distributeShardsLike: "collection1"});
arangosh> db.collection2.ensureIndex({ type: "hash", fields: ["parent"] });
arangosh> for (i = 0; i < 100; ++i) {
db.collection1.insert({ _key: "test" + i });
for (j = 0; j < 10; ++j) {
db.collection2.insert({ parent: "test" + i });
}
}
arangosh> db._explain("FOR doc1 IN collection1 FOR doc2 IN collection2 FILTER doc1._key == doc2.parent RETURN doc1");
Query String:
FOR doc1 IN collection1 FOR doc2 IN collection2 FILTER doc1._key == doc2.parent RETURN doc1
Execution plan:
Id NodeType Site Est. Comment
1 SingletonNode DBS 1 * ROOT
3 EnumerateCollectionNode DBS 2000 - FOR doc2 IN collection2 /* full collection scan, 4 shard(s) */
7 IndexNode DBS 2000 - FOR doc1 IN collection1 /* primary index scan, 4 shard(s) */
10 RemoteNode COOR 2000 - REMOTE
11 GatherNode COOR 2000 - GATHER
6 ReturnNode COOR 2000 - RETURN doc1
Using smartJoinAttribute
------------------------
In case the join on the second collection must be performed on a non-shard key
attribute, there is the option to specify a *smartJoinAttribute* for the collection.
Note that for this case, setting *distributeShardsLike* is still required, and that
only a single *shardKeys* attribute can be used.
The single attribute name specified in the *shardKeys* attribute for the collection
must then end with a colon character.
This *smartJoinAttribute* must be populated for all documents in the collection,
and must always contain a string value. The value of the *_key* attribute for each
document must consist of the value of the *smartJoinAttribute*, a colon character
and then some other user-defined key component.
The setup thus becomes:
arangosh> db._create("collection1", {numberOfShards: 4, shardKeys: ["_key"]});
arangosh> db._create("collection2", {numberOfShards: 4, shardKeys: ["_key:"], smartJoinAttribute: "parent", distributeShardsLike: "collection1"});
arangosh> db.collection2.ensureIndex({ type: "hash", fields: ["parent"] });
arangosh> for (i = 0; i < 100; ++i) {
db.collection1.insert({ _key: "test" + i });
db.collection2.insert({ _key: "test" + i + ":" + "ownKey" + i, parent: "test" + i });
}
Failure to populate the *smartJoinAttribute* with a string value, or not populating
it at all, will lead to a document being rejected on insert, update or replace.
Similarly, failure to prefix a document's *_key* attribute value with the value of
the *smartJoinAttribute* will also lead to the document being rejected:
arangosh> db.collection2.insert({ parent: 123 });
JavaScript exception in file './js/client/modules/@arangodb/arangosh.js' at 99,7: ArangoError 4008: smart join attribute not given or invalid
arangosh> db.collection2.insert({ _key: "123:test1", parent: "124" });
JavaScript exception in file './js/client/modules/@arangodb/arangosh.js' at 99,7: ArangoError 4007: shard key value must be prefixed with the value of the smart join attribute
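The two rejection rules shown above can be summarized in a small sketch. Note this is a hypothetical helper for illustration only, not part of ArangoDB's API:

```javascript
// Hypothetical validation sketch of the documented rules for a
// collection with a smartJoinAttribute (illustration, not ArangoDB code):
// 1. the smart join attribute must be present and must be a string, and
// 2. a given _key must start with that value followed by a colon.
function checkSmartJoinDocument(doc, smartJoinAttribute) {
  const value = doc[smartJoinAttribute];
  if (typeof value !== "string") {
    return "smart join attribute not given or invalid"; // cf. error 4008
  }
  if (doc._key !== undefined && !doc._key.startsWith(value + ":")) {
    // cf. error 4007
    return "shard key value must be prefixed with the value of the smart join attribute";
  }
  return null; // document acceptable
}

checkSmartJoinDocument({ parent: 123 }, "parent");                            // rejected: non-string
checkSmartJoinDocument({ _key: "123:test1", parent: "124" }, "parent");       // rejected: wrong prefix
checkSmartJoinDocument({ _key: "test1:ownKey1", parent: "test1" }, "parent"); // accepted
```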
The join can now be performed via the collection's *smartJoinAttribute*:
arangosh> db._explain("FOR doc1 IN collection1 FOR doc2 IN collection2 FILTER doc1._key == doc2.parent RETURN doc1")
Query String:
FOR doc1 IN collection1 FOR doc2 IN collection2 FILTER doc1._key == doc2.parent RETURN doc1
Execution plan:
Id NodeType Site Est. Comment
1 SingletonNode DBS 1 * ROOT
3 EnumerateCollectionNode DBS 101 - FOR doc2 IN collection2 /* full collection scan, 4 shard(s) */
7 IndexNode DBS 101 - FOR doc1 IN collection1 /* primary index scan, 4 shard(s) */
10 RemoteNode COOR 101 - REMOTE
11 GatherNode COOR 101 - GATHER
6 ReturnNode COOR 101 - RETURN doc1
Restricting smart joins to a single shard
-----------------------------------------
If a FILTER condition is used on one of the shard keys, the optimizer will also try
to restrict the queries to just the required shards:
arangosh> db._explain("FOR doc1 IN collection1 FOR doc2 IN collection2 FILTER doc1._key == 'test' && doc1._key == doc2.value RETURN doc1");
Query String:
FOR doc1 IN collection1 FOR doc2 IN collection2 FILTER doc1._key == 'test' && doc1._key == doc2.value
RETURN doc1
Execution plan:
Id NodeType Site Est. Comment
1 SingletonNode DBS 1 * ROOT
8 IndexNode DBS 1 - FOR doc1 IN collection1 /* primary index scan, shard: s2010246 */
7 IndexNode DBS 1 - FOR doc2 IN collection2 /* primary index scan, scan only, shard: s2010253 */
12 RemoteNode COOR 1 - REMOTE
13 GatherNode COOR 1 - GATHER
6 ReturnNode COOR 1 - RETURN doc1


@@ -168,6 +168,21 @@ collections (requires the *Enterprise Edition* of ArangoDB).
Manually overriding the sharding strategy does not yet provide a
benefit, but it may later in case other sharding strategies are added.
@RESTBODYPARAM{smartJoinAttribute,string,optional,string}
In an *Enterprise Edition* cluster, this attribute determines an attribute
of the collection that must contain the shard key value of the referred-to
smart join collection. Additionally, the shard key for a document in this
collection must contain the value of this attribute, followed by a colon,
followed by the actual primary key of the document.
This feature can only be used in the *Enterprise Edition* and requires the
*distributeShardsLike* attribute of the collection to be set to the name
of another collection. It also requires the *shardKeys* attribute of the
collection to be set to a single shard key attribute, with an additional ':'
at the end.
A further restriction is that whenever documents are stored or updated in the
collection, the value stored in the *smartJoinAttribute* must be a string.
@RESTQUERYPARAMETERS
@RESTQUERYPARAM{waitForSyncReplication,integer,optional}


@@ -58,6 +58,9 @@ void Collection::setExclusiveAccess() {
/// @brief get the collection id
TRI_voc_cid_t Collection::id() const { return getCollection()->id(); }
/// @brief collection type
TRI_col_type_e Collection::type() const { return getCollection()->type(); }
/// @brief count the number of documents in the collection
size_t Collection::count(transaction::Methods* trx) const {
// estimate for the number of documents in the collection. may be outdated...
@@ -186,3 +189,7 @@ bool Collection::isSmart() const { return getCollection()->isSmart(); }
/// @brief check if collection is a satellite collection
bool Collection::isSatellite() const { return getCollection()->isSatellite(); }
/// @brief return the name of the smart join attribute (empty string
/// if no smart join attribute is present)
std::string const& Collection::smartJoinAttribute() const { return getCollection()->smartJoinAttribute(); }


@@ -77,6 +77,9 @@ struct Collection {
return _name;
}
/// @brief collection type
TRI_col_type_e type() const;
/// @brief count the number of documents in the collection
size_t count(transaction::Methods* trx) const;
@@ -119,6 +122,10 @@ struct Collection {
/// @brief check if collection is a satellite collection
bool isSatellite() const;
/// @brief return the name of the smart join attribute (empty string
/// if no smart join attribute is present)
std::string const& smartJoinAttribute() const;
private:
arangodb::LogicalCollection* _collection;


@@ -32,20 +32,30 @@
#include "VocBase/LogicalCollection.h"
#include "VocBase/vocbase.h"
#include <velocypack/Iterator.h>
#include <velocypack/velocypack-aliases.h>
using namespace arangodb;
using namespace arangodb::aql;
CollectionAccessingNode::CollectionAccessingNode(aql::Collection const* collection)
: _collection(collection),
_prototypeCollection(nullptr),
_prototypeOutVariable(nullptr) {
TRI_ASSERT(_collection != nullptr);
}
CollectionAccessingNode::CollectionAccessingNode(ExecutionPlan* plan,
arangodb::velocypack::Slice slice)
: _collection(plan->getAst()->query()->collections()->get(
slice.get("collection").copyString())),
_prototypeCollection(nullptr),
_prototypeOutVariable(nullptr) {
if (slice.get("prototype").isString()) {
_prototypeCollection = plan->getAst()->query()->collections()->get(slice.get("prototype").copyString());
}
TRI_ASSERT(_collection != nullptr);
if (_collection == nullptr) {
@@ -76,6 +86,9 @@ void CollectionAccessingNode::collection(aql::Collection const* collection) {
void CollectionAccessingNode::toVelocyPack(arangodb::velocypack::Builder& builder) const {
builder.add("database", VPackValue(_collection->vocbase()->name()));
builder.add("collection", VPackValue(_collection->name()));
if (_prototypeCollection != nullptr) {
builder.add("prototype", VPackValue(_prototypeCollection->name()));
}
builder.add("satellite", VPackValue(_collection->isSatellite()));
if (ServerState::instance()->isCoordinator()) {


@@ -35,6 +35,7 @@ namespace arangodb {
namespace aql {
struct Collection;
class ExecutionPlan;
struct Variable;
class CollectionAccessingNode {
public:
@@ -78,11 +79,25 @@ class CollectionAccessingNode {
*/
std::string const& restrictedShard() const { return _restrictedTo; }
/// @brief set the prototype collection when using distributeShardsLike
void setPrototype(arangodb::aql::Collection const* prototypeCollection,
arangodb::aql::Variable const* prototypeOutVariable) {
_prototypeCollection = prototypeCollection;
_prototypeOutVariable = prototypeOutVariable;
}
aql::Collection const* prototypeCollection() const { return _prototypeCollection; }
aql::Variable const* prototypeOutVariable() const { return _prototypeOutVariable; }
protected:
aql::Collection const* _collection;
/// @brief A shard this node is restricted to, may be empty
std::string _restrictedTo;
/// @brief prototype collection when using distributeShardsLike
aql::Collection const* _prototypeCollection;
aql::Variable const* _prototypeOutVariable;
};
} // namespace aql


@@ -133,7 +133,7 @@ void EngineInfoContainerDBServer::EngineInfo::addNode(ExecutionNode* node) {
}
// do not set '_type' of the engine here,
// because satellite collections may consist of
// multiple "main nodes"
break;
@@ -262,7 +262,7 @@ void EngineInfoContainerDBServer::EngineInfo::serializeSnippet(
plan.root(previous);
plan.setVarUsageComputed();
// Always verbose
const unsigned flags = ExecutionNode::SERIALIZE_DETAILS;
plan.root()->toVelocyPack(infoBuilder, flags, /*keepTopLevelOpen*/ false);
}
@@ -287,6 +287,8 @@ void EngineInfoContainerDBServer::EngineInfo::serializeSnippet(
// this clone does the translation collection => shardId implicitly
// at the relevant parts of the query.
std::unordered_set<aql::Collection*> cleanup;
cleanup.emplace(_collection);
_collection->setCurrentShard(id);
ExecutionPlan plan(query.ast());
@@ -301,6 +303,26 @@ void EngineInfoContainerDBServer::EngineInfo::serializeSnippet(
// "varUsageComputed" flag below (which will handle the counting)
plan.increaseCounter(nodeType);
if (nodeType == ExecutionNode::INDEX || nodeType == ExecutionNode::ENUMERATE_COLLECTION) {
auto x = dynamic_cast<CollectionAccessingNode*>(clone);
auto const* prototype = x->prototypeCollection();
if (prototype != nullptr) {
auto s1 = prototype->shardIds();
auto s2 = x->collection()->shardIds();
if (s1->size() == s2->size()) {
for (size_t i = 0; i < s1->size(); ++i) {
if ((*s1)[i] == id) {
// inject shard id into collection
auto collection = const_cast<arangodb::aql::Collection*>(x->collection());
collection->setCurrentShard((*s2)[i]);
cleanup.emplace(collection);
break;
}
}
}
}
}
if (ExecutionNode::REMOTE == nodeType) {
auto rem = ExecutionNode::castTo<RemoteNode*>(clone);
// update the remote node with the information about the query
@@ -329,7 +351,11 @@ void EngineInfoContainerDBServer::EngineInfo::serializeSnippet(
plan.setVarUsageComputed();
const unsigned flags = ExecutionNode::SERIALIZE_DETAILS;
plan.root()->toVelocyPack(infoBuilder, flags, /*keepTopLevelOpen*/ false);
_collection->resetCurrentShard();
// remove shard id hack for all participating collections
for (auto& it : cleanup) {
it->resetCurrentShard();
}
}
void EngineInfoContainerDBServer::CollectionInfo::mergeShards(
@@ -347,30 +373,19 @@ void EngineInfoContainerDBServer::addNode(ExecutionNode* node) {
TRI_ASSERT(!_engineStack.empty());
_engineStack.top()->addNode(node);
switch (node->getType()) {
case ExecutionNode::ENUMERATE_COLLECTION:
auto* scatter = findFirstScatter(*node);
auto const& colNode = *ExecutionNode::castTo<EnumerateCollectionNode const*>(node);
auto const* col = colNode.collection();
std::unordered_set<std::string> restrictedShard;
if (colNode.isRestricted()) {
restrictedShard.emplace(colNode.restrictedShard());
}
handleCollection(col, AccessMode::Type::READ, scatter, restrictedShard);
updateCollection(col);
break;
}
case ExecutionNode::INDEX: {
auto* scatter = findFirstScatter(*node);
auto const* colNode = dynamic_cast<CollectionAccessingNode const*>(node);
if (colNode == nullptr) {
THROW_ARANGO_EXCEPTION_MESSAGE(TRI_ERROR_INTERNAL, "unable to cast node to CollectionAccessingNode");
}
std::unordered_set<std::string> restrictedShard;
if (colNode->isRestricted()) {
restrictedShard.emplace(colNode->restrictedShard());
}
auto const* col = colNode->collection();
handleCollection(col, AccessMode::Type::READ, scatter, restrictedShard);
updateCollection(col);
break;
@@ -452,8 +467,8 @@ void EngineInfoContainerDBServer::closeSnippet(QueryId coordinatorEngineId) {
if (it == _collectionInfos.end()) {
THROW_ARANGO_EXCEPTION_MESSAGE(
TRI_ERROR_INTERNAL,
"created a DBServer QuerySnippet without a collection. This should "
"not happen");
}
it->second.engines.emplace_back(std::move(e));
}


@@ -123,7 +123,7 @@ Result ExecutionEngine::createBlocks(std::vector<ExecutionNode*> const& nodes,
if (nodeType == ExecutionNode::GATHER) {
// we found a gather node
if (remoteNode == nullptr) {
return {TRI_ERROR_INTERNAL, "expecting a RemoteNode"};
}
// now we'll create a remote node for each shard and add it to the
@@ -134,7 +134,7 @@ Result ExecutionEngine::createBlocks(std::vector<ExecutionNode*> const& nodes,
TRI_ASSERT(serversForRemote != queryIds.end());
if (serversForRemote == queryIds.end()) {
return {TRI_ERROR_INTERNAL,
"Did not find a DBServer to contact for RemoteNode"};
}
// use "server:" instead of "shard:" to send query fragments to


@@ -1234,6 +1234,8 @@ ExecutionNode* EnumerateCollectionNode::clone(ExecutionPlan* plan, bool withDepe
outVariable, _random);
c->projections(_projections);
c->_prototypeCollection = _prototypeCollection;
c->_prototypeOutVariable = _prototypeOutVariable;
return cloneHelper(std::move(c), withDependencies, withProperties);
}


@@ -260,6 +260,16 @@ class ExecutionNode {
}
}
/// @brief get the singleton node of the node
ExecutionNode const* getSingleton() const {
auto node = this;
do {
node = node->getFirstDependency();
} while (node != nullptr && node->getType() != SINGLETON);
return node;
}
/// @brief get the node and its dependencies as a vector
void getDependencyChain(std::vector<ExecutionNode*>& result, bool includeSelf) {
auto current = this;


@@ -225,6 +225,8 @@ ExecutionNode* IndexNode::clone(ExecutionPlan* plan, bool withDependencies,
c->projections(_projections);
c->needsGatherNodeSort(_needsGatherNodeSort);
c->initIndexCoversProjections();
c->_prototypeCollection = _prototypeCollection;
c->_prototypeOutVariable = _prototypeOutVariable;
return cloneHelper(std::move(c), withDependencies, withProperties);
}


@@ -201,8 +201,9 @@ struct OptimizerRule {
// make operations on sharded collections use distribute
distributeInClusterRule,
#ifdef USE_ENTERPRISE
smartJoinsRule,
#endif
// make operations on sharded collections use scatter / gather / remote
scatterInClusterRule,


@@ -542,15 +542,6 @@ std::vector<arangodb::aql::ExecutionNode::NodeType> const patchUpdateRemoveState
arangodb::aql::ExecutionNode::UPDATE, arangodb::aql::ExecutionNode::REPLACE,
arangodb::aql::ExecutionNode::REMOVE};
int indexOf(std::vector<std::string> const& haystack, std::string const& needle) {
for (size_t i = 0; i < haystack.size(); ++i) {
if (haystack[i] == needle) {
return static_cast<int>(i);
}
}
return -1;
}
/// @brief find the single shard id for the node to restrict an operation to
/// this will check the conditions of an IndexNode or a data-modification node
/// (excluding UPSERT) and check if all shard keys are used in it. If all
@@ -1240,15 +1231,16 @@ void arangodb::aql::removeCollectVariablesRule(Optimizer* opt,
 class PropagateConstantAttributesHelper {
  public:
-  PropagateConstantAttributesHelper() : _constants(), _modified(false) {}
+  explicit PropagateConstantAttributesHelper(ExecutionPlan* plan)
+      : _plan(plan), _modified(false) {}

   bool modified() const { return _modified; }

   /// @brief inspects a plan and propages constant values in expressions
-  void propagateConstants(ExecutionPlan* plan) {
+  void propagateConstants() {
     SmallVector<ExecutionNode*>::allocator_type::arena_type a;
     SmallVector<ExecutionNode*> nodes{a};
-    plan->findNodesOfType(nodes, EN::FILTER, true);
+    _plan->findNodesOfType(nodes, EN::FILTER, true);

     for (auto const& node : nodes) {
       auto fn = ExecutionNode::castTo<FilterNode*>(node);
@@ -1256,7 +1248,7 @@ class PropagateConstantAttributesHelper {
       auto inVar = fn->getVariablesUsedHere();
       TRI_ASSERT(inVar.size() == 1);

-      auto setter = plan->getVarSetBy(inVar[0]->id);
+      auto setter = _plan->getVarSetBy(inVar[0]->id);
       if (setter != nullptr && setter->getType() == EN::CALCULATION) {
         auto cn = ExecutionNode::castTo<CalculationNode*>(setter);
         auto expression = cn->expression();
@@ -1274,7 +1266,7 @@ class PropagateConstantAttributesHelper {
       auto inVar = fn->getVariablesUsedHere();
       TRI_ASSERT(inVar.size() == 1);

-      auto setter = plan->getVarSetBy(inVar[0]->id);
+      auto setter = _plan->getVarSetBy(inVar[0]->id);
       if (setter != nullptr && setter->getType() == EN::CALCULATION) {
         auto cn = ExecutionNode::castTo<CalculationNode*>(setter);
         auto expression = cn->expression();
@@ -1422,20 +1414,55 @@ class PropagateConstantAttributesHelper {
     Variable const* variable = nullptr;
     std::string name;

-    if (!getAttribute(parentNode->getMember(accessIndex), variable, name)) {
+    AstNode* member = parentNode->getMember(accessIndex);
+    if (!getAttribute(member, variable, name)) {
       return;
     }

     auto constantValue = getConstant(variable, name);

     if (constantValue != nullptr) {
+      // first check if we would optimize away a join condition that uses a smartJoinAttribute...
+      // we must not do that, because that would otherwise disable smart join functionality
+      if (arangodb::ServerState::instance()->isCoordinator() &&
+          parentNode->type == NODE_TYPE_OPERATOR_BINARY_EQ) {
+        AstNode const* current = parentNode->getMember(accessIndex == 0 ? 1 : 0);
+        if (current->type == NODE_TYPE_ATTRIBUTE_ACCESS) {
+          AstNode const* nameAttribute = current;
+          current = current->getMember(0);
+          if (current->type == NODE_TYPE_REFERENCE) {
+            auto setter = _plan->getVarSetBy(static_cast<Variable const*>(current->getData())->id);
+            if (setter != nullptr &&
+                (setter->getType() == EN::ENUMERATE_COLLECTION || setter->getType() == EN::INDEX)) {
+              auto collection = ::getCollection(setter);
+              if (collection != nullptr) {
+                auto logical = collection->getCollection();
+                if (logical->hasSmartJoinAttribute() &&
+                    logical->smartJoinAttribute() == nameAttribute->getString()) {
+                  // don't remove a smart join attribute access!
+                  return;
+                } else {
+                  std::vector<std::string> const& shardKeys = logical->shardKeys();
+                  if (std::find(shardKeys.begin(), shardKeys.end(), nameAttribute->getString()) != shardKeys.end()) {
+                    // don't remove equality lookups on shard keys, as this may prevent
+                    // the restrict-to-single-shard rule from being applied later!
+                    return;
+                  }
+                }
+              }
+            }
+          }
+        }
+      }
+
       parentNode->changeMember(accessIndex, const_cast<AstNode*>(constantValue));
       _modified = true;
     }
   }

+  ExecutionPlan* _plan;
+
   std::unordered_map<Variable const*, std::unordered_map<std::string, AstNode const*>> _constants;

   bool _modified;
 };
@@ -1443,8 +1470,8 @@ class PropagateConstantAttributesHelper {
 void arangodb::aql::propagateConstantAttributesRule(Optimizer* opt,
                                                     std::unique_ptr<ExecutionPlan> plan,
                                                     OptimizerRule const* rule) {
-  PropagateConstantAttributesHelper helper;
-  helper.propagateConstants(plan.get());
+  PropagateConstantAttributesHelper helper(plan.get());
+  helper.propagateConstants();
   opt->addPlan(std::move(plan), rule, helper.modified());
 }
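The helper above collects constant attribute values per variable and substitutes them into filter conditions. A toy model of that lookup step, with strings standing in for the real `Variable`/`AstNode` types (names here are illustrative, not the actual ArangoDB API):

```cpp
#include <cassert>  // for the usage check below
#include <map>
#include <string>
#include <utility>

// collected constants are keyed by (variable, attribute)
using ConstantMap = std::map<std::pair<std::string, std::string>, int>;

// returns true and sets out if var.attr is known to be constant
bool propagate(ConstantMap const& constants, std::string const& var,
               std::string const& attr, int& out) {
  auto it = constants.find({var, attr});
  if (it == constants.end()) {
    return false;
  }
  out = it->second;
  return true;
}
```

In the real rule the substitution happens in place on the AST (`changeMember`); this sketch only shows the per-attribute constant lookup that gates it.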
@@ -3425,301 +3452,6 @@ void arangodb::aql::interchangeAdjacentEnumerationsRule(Optimizer* opt,
   opt->addPlan(std::move(plan), rule, false);
 }
-/// @brief optimize queries in the cluster so that the entire query gets pushed
-/// to a single server
-void arangodb::aql::optimizeClusterSingleShardRule(Optimizer* opt,
-                                                   std::unique_ptr<ExecutionPlan> plan,
-                                                   OptimizerRule const* rule) {
-  TRI_ASSERT(arangodb::ServerState::instance()->isCoordinator());
-  bool wasModified = false;
-  bool done = false;
-
-  std::unordered_set<std::string> responsibleServers;
-  auto collections = plan->getAst()->query()->collections();
-
-  for (auto const& it : *(collections->collections())) {
-    Collection* c = it.second;
-    TRI_ASSERT(c != nullptr);
-
-    if (c->numberOfShards() != 1) {
-      // more than one shard for this collection
-      done = true;
-      break;
-    }
-
-    size_t n = c->responsibleServers(responsibleServers);
-
-    if (n != 1) {
-      // more than one responsible server for this collection
-      done = true;
-      break;
-    }
-  }
-
-  if (done || responsibleServers.size() != 1) {
-    opt->addPlan(std::move(plan), rule, wasModified);
-    return;
-  }
-
-  // we only found a single responsible server, and all collections involved
-  // have exactly one shard
-  // that means we can move the entire query onto that server
-
-  // TODO: handle Traversals and ShortestPaths here!
-  // TODO: properly handle subqueries here
-  SmallVector<ExecutionNode*>::allocator_type::arena_type s;
-  SmallVector<ExecutionNode*> nodes{s};
-  std::vector<ExecutionNode::NodeType> types = {ExecutionNode::TRAVERSAL,
-                                                ExecutionNode::SHORTEST_PATH,
-                                                ExecutionNode::SUBQUERY};
-  plan->findNodesOfType(nodes, types, true);
-
-  bool hasIncompatibleNodes = !nodes.empty();
-
-  nodes.clear();
-  types = {ExecutionNode::INDEX, ExecutionNode::ENUMERATE_COLLECTION, ExecutionNode::TRAVERSAL};
-  plan->findNodesOfType(nodes, types, false);
-
-  if (!nodes.empty() && !hasIncompatibleNodes) {
-    // turn off all other cluster optimization rules now as they are superfluous
-    opt->disableRule(OptimizerRule::optimizeClusterJoinsRule);
-    opt->disableRule(OptimizerRule::distributeInClusterRule);
-    opt->disableRule(OptimizerRule::scatterInClusterRule);
-    opt->disableRule(OptimizerRule::distributeFilternCalcToClusterRule);
-    opt->disableRule(OptimizerRule::distributeSortToClusterRule);
-    opt->disableRule(OptimizerRule::removeUnnecessaryRemoteScatterRule);
-#ifdef USE_ENTERPRISE
-    opt->disableRule(OptimizerRule::removeSatelliteJoinsRule);
-#endif
-    opt->disableRule(OptimizerRule::undistributeRemoveAfterEnumCollRule);
-
-    // get first collection from query
-    Collection const* c = ::getCollection(nodes[0]);
-    TRI_ASSERT(c != nullptr);
-
-    auto& vocbase = plan->getAst()->query()->vocbase();
-    ExecutionNode* rootNode = plan->root();
-
-    // insert a remote node
-    ExecutionNode* remoteNode =
-        new RemoteNode(plan.get(), plan->nextId(), &vocbase, "", "", "");
-
-    plan->registerNode(remoteNode);
-    remoteNode->addDependency(rootNode);
-
-    // insert a gather node
-    auto const sortMode = GatherNode::evaluateSortMode(c->numberOfShards());
-    auto* gatherNode = new GatherNode(plan.get(), plan->nextId(), sortMode);
-
-    plan->registerNode(gatherNode);
-    gatherNode->addDependency(remoteNode);
-    plan->root(gatherNode, true);
-
-    wasModified = true;
-  }
-
-  opt->addPlan(std::move(plan), rule, wasModified);
-}
-void arangodb::aql::optimizeClusterJoinsRule(Optimizer* opt,
-                                             std::unique_ptr<ExecutionPlan> plan,
-                                             OptimizerRule const* rule) {
-  TRI_ASSERT(arangodb::ServerState::instance()->isCoordinator());
-  bool wasModified = false;
-
-  SmallVector<ExecutionNode*>::allocator_type::arena_type s;
-  SmallVector<ExecutionNode*> nodes{s};
-  std::vector<ExecutionNode::NodeType> const types = {ExecutionNode::ENUMERATE_COLLECTION,
-                                                      ExecutionNode::INDEX};
-  plan->findNodesOfType(nodes, types, true);
-
-  for (auto& n : nodes) {
-    ExecutionNode* current = n->getFirstDependency();
-    while (current != nullptr) {
-      if (current->getType() == ExecutionNode::ENUMERATE_COLLECTION ||
-          current->getType() == ExecutionNode::INDEX) {
-        Collection const* c1 = ::getCollection(n);
-        Collection const* c2 = ::getCollection(current);
-
-        bool qualifies = false;
-
-        // check how many (different) responsible servers we have for this
-        // collection
-        std::unordered_set<std::string> responsibleServers;
-        size_t n1 = c1->responsibleServers(responsibleServers);
-        size_t n2 = c2->responsibleServers(responsibleServers);
-
-        if (responsibleServers.size() == 1 && c1->numberOfShards() == 1 &&
-            c2->numberOfShards() == 1) {
-          // a single responsible server. so we can use a shard-local access
-          qualifies = true;
-        } else if ((c1->isSatellite() && (c2->numberOfShards() == 1 || n2 == 1)) ||
-                   (c2->isSatellite() && (c1->numberOfShards() == 1 || n1 == 1))) {
-          // a satellite collection and another collection with a single shard
-          // or single responsible server
-          qualifies = true;
-        }
-
-        if (!qualifies && n->getType() == EN::INDEX) {
-          Variable const* indexVariable = ::getOutVariable(n);
-          Variable const* otherVariable = ::getOutVariable(current);
-
-          std::string dist1 = c1->distributeShardsLike();
-          std::string dist2 = c2->distributeShardsLike();
-
-          // convert cluster collection names into proper collection names
-          if (!dist1.empty()) {
-            auto trx = plan->getAst()->query()->trx();
-            dist1 = trx->resolver()->getCollectionNameCluster(
-                static_cast<TRI_voc_cid_t>(basics::StringUtils::uint64(dist1)));
-          }
-          if (!dist2.empty()) {
-            auto trx = plan->getAst()->query()->trx();
-            dist2 = trx->resolver()->getCollectionNameCluster(
-                static_cast<TRI_voc_cid_t>(basics::StringUtils::uint64(dist2)));
-          }
-
-          if (dist1 == c2->name() || dist2 == c1->name() ||
-              (!dist1.empty() && dist1 == dist2)) {
-            // collections have the same "distributeShardsLike" values
-            // so their shards are distributed to the same servers for the
-            // same shardKey values
-            // now check if the number of shardKeys match
-            auto keys1 = c1->shardKeys();
-            auto keys2 = c2->shardKeys();
-
-            if (keys1.size() == keys2.size()) {
-              // same number of shard keys... now check if the shard keys are
-              // all used and whether we only have equality joins
-              Condition const* condition =
-                  ExecutionNode::castTo<IndexNode const*>(n)->condition();
-
-              if (condition != nullptr) {
-                AstNode const* root = condition->root();
-
-                if (root != nullptr && root->type == NODE_TYPE_OPERATOR_NARY_OR) {
-                  size_t found = 0;
-                  size_t numAnds = root->numMembers();
-
-                  for (size_t i = 0; i < numAnds; ++i) {
-                    AstNode const* andNode = root->getMember(i);
-
-                    if (andNode == nullptr) {
-                      continue;
-                    }
-
-                    TRI_ASSERT(andNode->type == NODE_TYPE_OPERATOR_NARY_AND);
-
-                    std::unordered_set<int> shardKeysFound;
-                    size_t numConds = andNode->numMembers();
-
-                    if (numConds < keys1.size()) {
-                      // too few join conditions, so we will definitely not
-                      // cover all shardKeys
-                      break;
-                    }
-
-                    for (size_t j = 0; j < numConds; ++j) {
-                      AstNode const* condNode = andNode->getMember(j);
-
-                      if (condNode == nullptr || condNode->type != NODE_TYPE_OPERATOR_BINARY_EQ) {
-                        // something other than an equality join. we do not
-                        // support this
-                        continue;
-                      }
-
-                      // equality comparison
-                      // now check if this comparison has the pattern
-                      // <variable from collection1>.<attribute from
-                      // collection1> ==
-                      // <variable from collection2>.<attribute from
-                      // collection2>
-
-                      auto const* lhs = condNode->getMember(0);
-                      auto const* rhs = condNode->getMember(1);
-
-                      if (lhs->type != NODE_TYPE_ATTRIBUTE_ACCESS ||
-                          rhs->type != NODE_TYPE_ATTRIBUTE_ACCESS) {
-                        // something else
-                        continue;
-                      }
-
-                      AstNode const* lhsData = lhs->getMember(0);
-                      AstNode const* rhsData = rhs->getMember(0);
-
-                      if (lhsData->type != NODE_TYPE_REFERENCE ||
-                          rhsData->type != NODE_TYPE_REFERENCE) {
-                        // something else
-                        continue;
-                      }
-
-                      Variable const* lhsVar =
-                          static_cast<Variable const*>(lhsData->getData());
-                      Variable const* rhsVar =
-                          static_cast<Variable const*>(rhsData->getData());
-
-                      std::string leftString = lhs->getString();
-                      std::string rightString = rhs->getString();
-
-                      int pos = -1;
-                      if (lhsVar == indexVariable && rhsVar == otherVariable &&
-                          indexOf(keys1, leftString) == indexOf(keys2, rightString)) {
-                        pos = indexOf(keys1, leftString);
-                        // indexedCollection.shardKeyAttribute ==
-                        // otherCollection.shardKeyAttribute
-                      } else if (lhsVar == otherVariable && rhsVar == indexVariable &&
-                                 indexOf(keys2, leftString) == indexOf(keys1, rightString)) {
-                        // otherCollection.shardKeyAttribute ==
-                        // indexedCollection.shardKeyAttribute
-                        pos = indexOf(keys2, leftString);
-                      }
-
-                      // we found a shardKeys match
-                      if (pos != -1) {
-                        shardKeysFound.emplace(pos);
-                      }
-                    }
-
-                    // conditions match
-                    if (shardKeysFound.size() >= keys1.size()) {
-                      // all shard keys covered
-                      ++found;
-                    } else {
-                      // not all shard keys covered
-                      break;
-                    }
-                  }
-
-                  qualifies = (found > 0 && found == numAnds);
-                }
-              }
-            }
-          }
-        }
-        // everything else does not qualify
-
-        if (qualifies) {
-          wasModified = true;
-
-          plan->excludeFromScatterGather(current);
-          break;  // done for this pair
-        }
-      } else if (current->getType() != ExecutionNode::FILTER &&
-                 current->getType() != ExecutionNode::CALCULATION &&
-                 current->getType() != ExecutionNode::LIMIT) {
-        // we allow just these nodes in between and ignore them
-        // we need to stop for all other types of nodes
-        break;
-      }
-
-      current = current->getFirstDependency();
-    }
-  }
-
-  opt->addPlan(std::move(plan), rule, wasModified);
-}
 /// @brief scatter operations in cluster
 /// this rule inserts scatter, gather and remote nodes so operations on sharded
 /// collections actually work
@@ -4805,6 +4537,39 @@ void arangodb::aql::restrictToSingleShardRule(Optimizer* opt,
   std::unordered_set<ExecutionNode*> toUnlink;
   std::map<Collection const*, std::unordered_set<std::string>> modificationRestrictions;

+  // forward a shard key restriction from one collection to the other if the two collections
+  // are used in a smart join (and use distributeShardsLike on each other)
+  auto forwardRestrictionToPrototype = [&plan](ExecutionNode const* current, std::string const& shardId) {
+    auto collectionNode = dynamic_cast<CollectionAccessingNode const*>(current);
+    auto prototypeOutVariable = collectionNode->prototypeOutVariable();
+
+    if (prototypeOutVariable == nullptr) {
+      return;
+    }
+
+    auto setter = plan->getVarSetBy(prototypeOutVariable->id);
+    if (setter == nullptr ||
+        (setter->getType() != EN::INDEX && setter->getType() != EN::ENUMERATE_COLLECTION)) {
+      return;
+    }
+
+    auto s1 = ::getCollection(current)->shardIds();
+    auto s2 = ::getCollection(setter)->shardIds();
+
+    if (s1->size() != s2->size()) {
+      // different number of shard ids... should not happen if we have a prototype
+      return;
+    }
+
+    // find matching shard key
+    for (size_t i = 0; i < s1->size(); ++i) {
+      if ((*s1)[i] == shardId) {
+        ::restrictToShard(setter, (*s2)[i]);
+        break;
+      }
+    }
+  };
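The lambda above relies on positional correspondence: with `distributeShardsLike`, shard *i* of the collection and shard *i* of its prototype live on the same server. The core index-matching step, sketched with plain string vectors (the function name is hypothetical, not part of ArangoDB):

```cpp
#include <cassert>  // for the usage checks below
#include <string>
#include <vector>

// map a shard of one collection to the same-position shard of its
// distributeShardsLike prototype; empty string if no mapping is possible
std::string mapShardToPrototype(std::vector<std::string> const& s1,
                                std::vector<std::string> const& s2,
                                std::string const& shardId) {
  if (s1.size() != s2.size()) {
    // different number of shard ids... no prototype relationship
    return {};
  }
  for (size_t i = 0; i < s1.size(); ++i) {
    if (s1[i] == shardId) {
      return s2[i];  // shards at the same index are co-located
    }
  }
  return {};
}
```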
   for (auto& node : nodes) {
     TRI_ASSERT(node->getType() == ExecutionNode::REMOTE);
     ExecutionNode* current = node->getFirstDependency();
@@ -4878,12 +4643,14 @@ void arangodb::aql::restrictToSingleShardRule(Optimizer* opt,
       if (finder.isSafeForOptimization(collectionVariable) && !shardId.empty()) {
         wasModified = true;
         ::restrictToShard(current, shardId);
+        forwardRestrictionToPrototype(current, shardId);
       } else if (finder.isSafeForOptimization(collection)) {
         auto& shards = modificationRestrictions[collection];
         if (shards.size() == 1) {
           wasModified = true;
           shardId = *shards.begin();
           ::restrictToShard(current, shardId);
+          forwardRestrictionToPrototype(current, shardId);
         }
       }
     } else if (currentType == ExecutionNode::UPSERT || currentType == ExecutionNode::REMOTE ||
@@ -124,14 +124,6 @@ void substituteClusterSingleDocumentOperations(Optimizer* opt,
                                               std::unique_ptr<ExecutionPlan> plan,
                                               OptimizerRule const* rule);

-/// @brief optimize queries in the cluster so that the entire query gets pushed to a single server
-void optimizeClusterSingleShardRule(Optimizer*, std::unique_ptr<ExecutionPlan>,
-                                    OptimizerRule const*);
-
-/// @brief try to find candidates for shard-local joins in the cluster
-void optimizeClusterJoinsRule(Optimizer*, std::unique_ptr<ExecutionPlan>,
-                              OptimizerRule const*);
-
 /// @brief scatter operations in cluster - send all incoming rows to all remote
 /// clients
 void scatterInClusterRule(Optimizer*, std::unique_ptr<ExecutionPlan>, OptimizerRule const*);
@@ -155,6 +147,9 @@ ExecutionNode* distributeInClusterRuleSmartEdgeCollection(ExecutionPlan*, Subque
 /// @brief remove scatter/gather and remote nodes for satellite collections
 void removeSatelliteJoinsRule(Optimizer*, std::unique_ptr<ExecutionPlan>,
                               OptimizerRule const*);
+
+void smartJoinsRule(Optimizer*, std::unique_ptr<ExecutionPlan>,
+                    OptimizerRule const*);
 #endif

 /// @brief try to restrict fragments to a single shard if possible
@@ -289,14 +289,6 @@ void OptimizerRulesFeature::addRules() {
                OptimizerRule::substituteSingleDocumentOperations,
                DoesNotCreateAdditionalPlans, CanBeDisabled);

-#if 0
-  registerRule("optimize-cluster-single-shard", optimizeClusterSingleShardRule,
-               OptimizerRule::optimizeClusterSingleShardRule, DoesNotCreateAdditionalPlans, CanBeDisabled);
-
-  registerRule("optimize-cluster-joins", optimizeClusterJoinsRule,
-               OptimizerRule::optimizeClusterJoinsRule, DoesNotCreateAdditionalPlans, CanBeDisabled);
-#endif
-
   // distribute operations in cluster
   registerRule("scatter-in-cluster", scatterInClusterRule, OptimizerRule::scatterInClusterRule,
                DoesNotCreateAdditionalPlans, CanNotBeDisabled);
@@ -329,6 +321,10 @@ void OptimizerRulesFeature::addRules() {
   registerRule("remove-satellite-joins", removeSatelliteJoinsRule,
                OptimizerRule::removeSatelliteJoinsRule,
                DoesNotCreateAdditionalPlans, CanBeDisabled);
+
+  registerRule("smart-joins", smartJoinsRule,
+               OptimizerRule::smartJoinsRule,
+               DoesNotCreateAdditionalPlans, CanBeDisabled);
 #endif

 #ifdef USE_IRESEARCH
@@ -419,6 +419,13 @@ static int distributeBabyOnShards(
     shardID = shards->at(0);
     userSpecifiedKey = true;
   } else {
+    int r = transaction::Methods::validateSmartJoinAttribute(*(collinfo.get()), value);
+
+    if (r != TRI_ERROR_NO_ERROR) {
+      return r;
+    }
+
     // Sort out the _key attribute:
     // The user is allowed to specify _key, provided that _key is the one
     // and only sharding attribute, because in this case we can delegate
@@ -717,6 +724,36 @@ bool shardKeysChanged(LogicalCollection const& collection, VPackSlice const& old
   return false;
 }

+bool smartJoinAttributeChanged(LogicalCollection const& collection,
+                               VPackSlice const& oldValue,
+                               VPackSlice const& newValue, bool isPatch) {
+  if (!collection.hasSmartJoinAttribute()) {
+    return false;
+  }
+  if (!oldValue.isObject() || !newValue.isObject()) {
+    // expecting two objects. everything else is an error
+    return true;
+  }
+
+  std::string const& s = collection.smartJoinAttribute();
+
+  VPackSlice n = newValue.get(s);
+  if (!n.isString()) {
+    if (isPatch && n.isNone()) {
+      // attribute not set in patch document. this means no update
+      return false;
+    }
+    // no string value... invalid!
+    return true;
+  }
+
+  VPackSlice o = oldValue.get(s);
+  TRI_ASSERT(o.isString());
+
+  return (arangodb::basics::VelocyPackHelper::compare(n, o, false) != 0);
+}
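The decision logic of `smartJoinAttributeChanged` can be modelled without VelocyPack. A simplified sketch using string maps instead of slices (names and types here are illustrative only, not the real API):

```cpp
#include <cassert>  // for the usage checks below
#include <map>
#include <string>

// did the smart-join attribute change between old and new document?
// a missing attribute in a patch means "no update"; in a full replace
// it is treated as a (forbidden) change, mirroring the check above
bool smartJoinAttributeChangedSimple(std::string const& attr,
                                     std::map<std::string, std::string> const& oldDoc,
                                     std::map<std::string, std::string> const& newDoc,
                                     bool isPatch) {
  auto n = newDoc.find(attr);
  if (n == newDoc.end()) {
    return !isPatch;  // absent in patch: unchanged; absent in replace: invalid
  }
  auto o = oldDoc.find(attr);
  return o == oldDoc.end() || o->second != n->second;
}
```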
 ////////////////////////////////////////////////////////////////////////////////
 /// @brief returns revision for a sharded collection
 ////////////////////////////////////////////////////////////////////////////////
@@ -59,6 +59,11 @@ std::unordered_map<std::string, std::string> getForwardableRequestHeaders(Genera
 bool shardKeysChanged(LogicalCollection const& collection, VPackSlice const& oldValue,
                       VPackSlice const& newValue, bool isPatch);

+/// @brief check if the value of the smartJoinAttribute has changed
+bool smartJoinAttributeChanged(LogicalCollection const& collection,
+                               VPackSlice const& oldValue,
+                               VPackSlice const& newValue, bool isPatch);
+
 ////////////////////////////////////////////////////////////////////////////////
 /// @brief returns revision for a sharded collection
 ////////////////////////////////////////////////////////////////////////////////
@@ -263,7 +263,7 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
     LOG_TOPIC(INFO, Logger::MAINTENANCE)
         << "Error reporting to agency: _statusCode: " << r.errorCode()
         << " message: " << r.errorMessage()
-        << ". This can be ignored, since it will be retried automaticlly.";
+        << ". This can be ignored, since it will be retried automatically.";
   } else {
     LOG_TOPIC(DEBUG, Logger::MAINTENANCE)
         << "Invalidating current in ClusterInfo";
@@ -3408,9 +3408,12 @@ Result MMFilesCollection::update(arangodb::transaction::Methods* trx,
     if (_isDBServer) {
       // Need to check that no sharding keys have changed:
-      if (arangodb::shardKeysChanged(_logicalCollection, oldDoc, builder->slice(), false)) {
+      if (arangodb::shardKeysChanged(_logicalCollection, oldDoc, builder->slice(), true)) {
         return Result(TRI_ERROR_CLUSTER_MUST_NOT_CHANGE_SHARDING_ATTRIBUTES);
       }
+      if (arangodb::smartJoinAttributeChanged(_logicalCollection, oldDoc, builder->slice(), true)) {
+        return Result(TRI_ERROR_CLUSTER_MUST_NOT_CHANGE_SMART_JOIN_ATTRIBUTE);
+      }
     }
   } else {
     revisionId = TRI_ExtractRevisionId(VPackSlice(
@@ -3543,6 +3546,9 @@ Result MMFilesCollection::replace(transaction::Methods* trx, VPackSlice const ne
       if (arangodb::shardKeysChanged(_logicalCollection, oldDoc, builder->slice(), false)) {
         return Result(TRI_ERROR_CLUSTER_MUST_NOT_CHANGE_SHARDING_ATTRIBUTES);
       }
+      if (arangodb::smartJoinAttributeChanged(_logicalCollection, oldDoc, builder->slice(), false)) {
+        return Result(TRI_ERROR_CLUSTER_MUST_NOT_CHANGE_SMART_JOIN_ATTRIBUTE);
+      }
     }
   }
@@ -278,7 +278,8 @@ void RestCollectionHandler::handleCommandPost() {
        StaticStrings::WaitForSyncString, "cacheEnabled",
        StaticStrings::ShardKeys, StaticStrings::NumberOfShards,
        StaticStrings::DistributeShardsLike, "avoidServers", StaticStrings::IsSmart,
-       "shardingStrategy", StaticStrings::GraphSmartGraphAttribute, StaticStrings::ReplicationFactor,
+       "shardingStrategy", StaticStrings::GraphSmartGraphAttribute,
+       StaticStrings::SmartJoinAttribute, StaticStrings::ReplicationFactor,
        "servers"});

   VPackSlice const parameters = filtered.slice();
@@ -42,6 +42,7 @@ QueryRegistryFeature::QueryRegistryFeature(application_features::ApplicationServ
       _trackSlowQueries(true),
       _trackBindVars(true),
       _failOnWarning(false),
+      _smartJoins(false),
       _queryMemoryLimit(0),
       _maxQueryPlans(128),
       _slowQueryThreshold(10.0),
@@ -126,6 +127,12 @@ void QueryRegistryFeature::collectOptions(std::shared_ptr<ProgramOptions> option
                      "default time-to-live of query snippets (in seconds)",
                      new DoubleParameter(&_queryRegistryTTL),
                      arangodb::options::makeFlags(arangodb::options::Flags::Hidden));
+
+  options->addOption("--query.smart-joins",
+                     "enable smart joins query optimization",
+                     new BooleanParameter(&_smartJoins),
+                     arangodb::options::makeFlags(arangodb::options::Flags::Hidden, arangodb::options::Flags::Enterprise))
+      .setIntroducedIn(30405).setIntroducedIn(30500);
 }

 void QueryRegistryFeature::validateOptions(std::shared_ptr<ProgramOptions> options) {
@@ -54,6 +54,7 @@ class QueryRegistryFeature final : public application_features::ApplicationFeatu
     return _slowStreamingQueryThreshold;
   }
   bool failOnWarning() const { return _failOnWarning; }
+  bool smartJoins() const { return _smartJoins; }
   uint64_t queryMemoryLimit() const { return _queryMemoryLimit; }
   uint64_t maxQueryPlans() const { return _maxQueryPlans; }
@@ -61,6 +62,7 @@ class QueryRegistryFeature final : public application_features::ApplicationFeatu
   bool _trackSlowQueries;
   bool _trackBindVars;
   bool _failOnWarning;
+  bool _smartJoins;
   uint64_t _queryMemoryLimit;
   uint64_t _maxQueryPlans;
   double _slowQueryThreshold;
@@ -883,9 +883,12 @@ Result RocksDBCollection::update(arangodb::transaction::Methods* trx,
     if (_isDBServer) {
       // Need to check that no sharding keys have changed:
-      if (arangodb::shardKeysChanged(_logicalCollection, oldDoc, builder->slice(), false)) {
+      if (arangodb::shardKeysChanged(_logicalCollection, oldDoc, builder->slice(), true)) {
         return Result(TRI_ERROR_CLUSTER_MUST_NOT_CHANGE_SHARDING_ATTRIBUTES);
       }
+      if (arangodb::smartJoinAttributeChanged(_logicalCollection, oldDoc, builder->slice(), true)) {
+        return Result(TRI_ERROR_CLUSTER_MUST_NOT_CHANGE_SMART_JOIN_ATTRIBUTE);
+      }
     }

     VPackSlice const newDoc(builder->slice());
@@ -987,6 +990,9 @@ Result RocksDBCollection::replace(transaction::Methods* trx,
       if (arangodb::shardKeysChanged(_logicalCollection, oldDoc, builder->slice(), false)) {
         return Result(TRI_ERROR_CLUSTER_MUST_NOT_CHANGE_SHARDING_ATTRIBUTES);
       }
+      if (arangodb::smartJoinAttributeChanged(_logicalCollection, oldDoc, builder->slice(), false)) {
+        return Result(TRI_ERROR_CLUSTER_MUST_NOT_CHANGE_SMART_JOIN_ATTRIBUTE);
+      }
     }

     VPackSlice const newDoc(builder->slice());
@@ -1738,6 +1738,12 @@ OperationResult transaction::Methods::insertLocal(std::string const& collectionN
     return Result(TRI_ERROR_ARANGO_DOCUMENT_TYPE_INVALID);
   }

+  int r = validateSmartJoinAttribute(*collection, value);
+
+  if (r != TRI_ERROR_NO_ERROR) {
+    return Result(r);
+  }
+
   ManagedDocumentResult documentResult;
   TRI_voc_tick_t resultMarkerTick = 0;
   TRI_voc_rid_t revisionId = 0;
@@ -3428,3 +3434,10 @@ Result Methods::replicateOperations(LogicalCollection const& collection,
   return Result{};
 }
+
+#ifndef USE_ENTERPRISE
+/*static*/ int Methods::validateSmartJoinAttribute(LogicalCollection const&,
+                                                   arangodb::velocypack::Slice) {
+  return TRI_ERROR_NO_ERROR;
+}
+#endif
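The community-edition stub added above follows a common pattern: a no-op fallback keeps every call site free of `#ifdef USE_ENTERPRISE` guards. A minimal self-contained illustration of that pattern (the error constant is a stand-in for `TRI_ERROR_NO_ERROR`, not the real definition):

```cpp
#include <cassert>  // for the usage check below

constexpr int ERROR_NO_ERROR = 0;  // stand-in error code

#ifndef USE_ENTERPRISE
// no-op fallback: nothing to validate in a community build, so callers can
// invoke this unconditionally and just check the returned error code
int validateSmartJoinAttribute() {
  return ERROR_NO_ERROR;
}
#endif
```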
@@ -434,6 +434,9 @@ class Methods {
   }
 #endif

+  static int validateSmartJoinAttribute(LogicalCollection const& collinfo,
+                                        arangodb::velocypack::Slice value);
+
  private:
   /// @brief build a VPack object with _id, _key and _rev and possibly
   /// oldRef (if given), the result is added to the builder in the
@@ -149,10 +149,10 @@ LogicalCollection::LogicalCollection(TRI_vocbase_t& vocbase, VPackSlice const& i
       _isSmart(Helper::readBooleanValue(info, StaticStrings::IsSmart, false)),
       _waitForSync(Helper::readBooleanValue(info, StaticStrings::WaitForSyncString, false)),
       _allowUserKeys(Helper::readBooleanValue(info, "allowUserKeys", true)),
-      _keyOptions(nullptr),
-      _keyGenerator(),
-      _physical(EngineSelectorFeature::ENGINE->createPhysicalCollection(*this, info)),
-      _sharding() {
+#ifdef USE_ENTERPRISE
+      _smartJoinAttribute(::readStringValue(info, StaticStrings::SmartJoinAttribute, "")),
+#endif
+      _physical(EngineSelectorFeature::ENGINE->createPhysicalCollection(*this, info)) {
   TRI_ASSERT(info.isObject());

   if (!TRI_vocbase_t::IsAllowedName(info)) {
@@ -182,6 +182,48 @@ LogicalCollection::LogicalCollection(TRI_vocbase_t& vocbase, VPackSlice const& i
   _sharding = std::make_unique<ShardingInfo>(info, this);

+#ifdef USE_ENTERPRISE
+  if (ServerState::instance()->isCoordinator()) {
+    if (!info.get(StaticStrings::SmartJoinAttribute).isNone() &&
+        !hasSmartJoinAttribute()) {
+      THROW_ARANGO_EXCEPTION_MESSAGE(TRI_ERROR_INVALID_SMART_JOIN_ATTRIBUTE,
+                                     "smartJoinAttribute must contain a string attribute name");
+    }
+
+    if (hasSmartJoinAttribute()) {
+      auto const& sk = _sharding->shardKeys();
+      TRI_ASSERT(!sk.empty());
+
+      if (sk.size() != 1) {
+        THROW_ARANGO_EXCEPTION_MESSAGE(TRI_ERROR_INVALID_SMART_JOIN_ATTRIBUTE,
+                                       "smartJoinAttribute can only be used for collections with a single shardKey value");
+      }
+      TRI_ASSERT(!sk.front().empty());
+      if (sk.front().back() != ':') {
+        THROW_ARANGO_EXCEPTION_MESSAGE(TRI_ERROR_INVALID_SMART_JOIN_ATTRIBUTE,
+                                       std::string("smartJoinAttribute can only be used for shardKeys ending on ':', got '") + sk.front() + "'");
+      }
+
+      if (_isSmart) {
+        if (_type == TRI_COL_TYPE_EDGE) {
+          THROW_ARANGO_EXCEPTION_MESSAGE(TRI_ERROR_INVALID_SMART_JOIN_ATTRIBUTE,
+                                         "cannot use smartJoinAttribute on a smart edge collection");
+        } else if (_type == TRI_COL_TYPE_DOCUMENT) {
+          VPackSlice sga = info.get(StaticStrings::GraphSmartGraphAttribute);
+          if (sga.isString() && sga.copyString() != info.get(StaticStrings::SmartJoinAttribute).copyString()) {
+            THROW_ARANGO_EXCEPTION_MESSAGE(TRI_ERROR_INVALID_SMART_JOIN_ATTRIBUTE,
+                                           "smartJoinAttribute must be equal to smartGraphAttribute");
+          }
+        }
+      }
+    }
+  }
+#else
+  // whatever we got passed in, in a non-enterprise build, we just ignore
+  // any specification for the smartJoinAttribute
+  _smartJoinAttribute.clear();
+#endif
+
   if (ServerState::instance()->isDBServer() ||
       !ServerState::instance()->isRunningInCluster()) {
     _followers.reset(new FollowerInfo(this));
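The coordinator-only checks in the constructor hunk above boil down to a handful of rules on the collection properties. The following is a minimal standalone sketch of those rules, not ArangoDB code; the function name and the plain-object property layout (`shardKeys`, `isSmart`, `type`) are illustrative assumptions:

```javascript
// Sketch of the smartJoinAttribute validation performed on coordinators.
// Throws on the same conditions that raise TRI_ERROR_INVALID_SMART_JOIN_ATTRIBUTE.
function validateSmartJoinProperties (props) {
  const sja = props.smartJoinAttribute;
  if (sja === undefined) {
    return; // the attribute is optional
  }
  if (typeof sja !== 'string' || sja === '') {
    throw new Error('smartJoinAttribute must contain a string attribute name');
  }
  const shardKeys = props.shardKeys || [];
  if (shardKeys.length !== 1) {
    throw new Error('smartJoinAttribute can only be used for collections with a single shardKey value');
  }
  if (!shardKeys[0].endsWith(':')) {
    throw new Error("smartJoinAttribute can only be used for shardKeys ending on ':'");
  }
  if (props.isSmart && props.type === 'edge') {
    throw new Error('cannot use smartJoinAttribute on a smart edge collection');
  }
  if (props.isSmart && props.smartGraphAttribute !== undefined &&
      props.smartGraphAttribute !== sja) {
    throw new Error('smartJoinAttribute must be equal to smartGraphAttribute');
  }
}
```

Note that an enterprise smart document collection may combine both features, but only when the smart graph attribute and the smart join attribute coincide.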
@@ -377,10 +419,6 @@ TRI_voc_rid_t LogicalCollection::revision(transaction::Methods* trx) const {
   return _physical->revision(trx);
 }

-bool LogicalCollection::waitForSync() const { return _waitForSync; }
-
-bool LogicalCollection::isSmart() const { return _isSmart; }
-
 std::unique_ptr<FollowerInfo> const& LogicalCollection::followers() const {
   return _followers;
 }
@@ -611,6 +649,10 @@ arangodb::Result LogicalCollection::appendVelocyPack(arangodb::velocypack::Build
   // Cluster Specific
   result.add(StaticStrings::IsSmart, VPackValue(_isSmart));

+  if (hasSmartJoinAttribute()) {
+    result.add(StaticStrings::SmartJoinAttribute, VPackValue(_smartJoinAttribute));
+  }
+
   if (!forPersistence) {
     // with 'forPersistence' added by LogicalDataSource::toVelocyPack
     // FIXME TODO is this needed in !forPersistence???

View File

@@ -148,10 +148,16 @@ class LogicalCollection : public LogicalDataSource {
   // SECTION: Properties
   TRI_voc_rid_t revision(transaction::Methods*) const;

-  bool waitForSync() const;
-  bool isSmart() const;
+  bool waitForSync() const { return _waitForSync; }
+  bool isSmart() const { return _isSmart; }
   bool isAStub() const { return _isAStub; }

+  bool hasSmartJoinAttribute() const { return !smartJoinAttribute().empty(); }
+
+  /// @brief return the name of the smart join attribute (empty string
+  /// if no smart join attribute is present)
+  std::string const& smartJoinAttribute() const { return _smartJoinAttribute; }
+
   void waitForSync(bool value) { _waitForSync = value; }

   // SECTION: sharding
@@ -380,9 +386,11 @@ class LogicalCollection : public LogicalDataSource {
   bool const _allowUserKeys;

+  std::string _smartJoinAttribute;
+
   // SECTION: Key Options

-  // @brief options for key creation, TODO Really VPack?
+  // @brief options for key creation
   std::shared_ptr<velocypack::Buffer<uint8_t> const> _keyOptions;
   std::unique_ptr<KeyGenerator> _keyGenerator;

View File

@@ -354,11 +354,16 @@ UpgradeResult methods::Upgrade::runTasks(TRI_vocbase_t& vocbase, VersionResult&
           LOG_TOPIC(ERR, Logger::STARTUP) << msg << " Aborting procedure.";
           return UpgradeResult(TRI_ERROR_INTERNAL, msg, vinfo.status);
         }
-      } catch (basics::Exception const& e) {
+      } catch (arangodb::basics::Exception const& e) {
         LOG_TOPIC(ERR, Logger::STARTUP)
-            << "Executing " << t.name << " (" << t.description << ") failed with "
-            << e.message() << ". Aborting procedure.";
+            << "Executing " << t.name << " (" << t.description << ") failed with error: "
+            << e.what() << ". Aborting procedure.";
         return UpgradeResult(e.code(), e.what(), vinfo.status);
+      } catch (std::exception const& e) {
+        LOG_TOPIC(ERR, Logger::STARTUP)
+            << "Executing " << t.name << " (" << t.description << ") failed with error: "
+            << e.what() << ". Aborting procedure.";
+        return UpgradeResult(TRI_ERROR_FAILED, e.what(), vinfo.status);
       }

       // remember we already executed this one

View File

@@ -22,3 +22,6 @@ authentication = false

 [ssl]
 keyfile = @TOP_DIR@/UnitTests/server.pem
+
+[query]
+smart-joins = true

View File

@@ -20,3 +20,6 @@ authentication = false

 [ssl]
 keyfile = @TOP_DIR@/UnitTests/server.pem
+
+[query]
+smart-joins = true

View File

@@ -167,6 +167,11 @@
         data.shardKeys = object.shardBy;
       }

+      if (object.smartJoinAttribute &&
+          object.smartJoinAttribute !== '') {
+        data.smartJoinAttribute = object.smartJoinAttribute;
+      }
+
       if (object.replicationFactor) {
         data.replicationFactor = object.replicationFactor;
         var pattern = new RegExp(/^[0-9]+$/);

View File

@@ -96,6 +96,17 @@
         </tr>
       <% } %>

+      <% if (figuresData.smartJoinAttribute) { %>
+        <tr>
+          <th class="collectionInfoTh2">Smart join attribute:</th>
+          <th class="collectionInfoTh">
+            <div class="modal-text"><%=figuresData.smartJoinAttribute%></div>
+          </th>
+          <th class="collectionInfoTh">
+          </th>
+        </tr>
+      <% } %>
+
       <% if (figuresData.replicationFactor) { %>
         <tr>
           <th class="collectionInfoTh2">Replication factor:</th>

View File

@@ -355,6 +355,7 @@
       var collSync = $('#new-collection-sync').val();
       var shards = 1;
       var shardBy = [];
+      var smartJoinAttribute = '';

       if (replicationFactor === '') {
         replicationFactor = 1;
@@ -364,6 +365,10 @@
       }

       if (isCoordinator) {
+        if ($('#smart-join-attribute').val() !== '') {
+          smartJoinAttribute = $('#smart-join-attribute').val().trim();
+        }
+
         shards = $('#new-collection-shards').val();

         if (shards === '') {
@@ -429,6 +434,9 @@
       if (self.engine.name !== 'rocksdb') {
         tmpObj.journalSize = collSize;
       }
+      if (smartJoinAttribute !== '') {
+        tmpObj.smartJoinAttribute = smartJoinAttribute;
+      }
       this.collection.newCollection(tmpObj, callback);
       window.modalView.hide();
       arangoHelper.arangoNotification('Collection', 'Collection "' + collName + '" will be created.');
@@ -525,6 +533,18 @@
             [{value: false, label: 'No'}, {value: true, label: 'Yes'}]
           )
         );
+        advancedTableContent.push(
+          window.modalView.createTextEntry(
+            'smart-join-attribute',
+            'Smart join attribute',
+            '',
+            'String attribute name. Can be left empty if smart joins are not used.',
+            '',
+            false,
+            [
+            ]
+          )
+        );
       }
       advancedTableContent.push(
         window.modalView.createTextEntry(

View File

@@ -350,6 +350,7 @@ ArangoCollection.prototype.properties = function (properties) {
     'waitForSync': true,
     'shardKeys': false,
     'smartGraphAttribute': false,
+    'smartJoinAttribute': false,
     'numberOfShards': false,
     'keyOptions': false,
     'indexBuckets': true,

View File

@@ -346,7 +346,7 @@ ArangoDatabase.prototype._create = function (name, properties, type, options) {
     'doCompact', 'keyOptions', 'shardKeys', 'numberOfShards',
       'distributeShardsLike', 'indexBuckets', 'id', 'isSmart',
       'replicationFactor', 'shardingStrategy', 'smartGraphAttribute',
-      'avoidServers', 'cacheEnabled'].forEach(function (p) {
+      'smartJoinAttribute', 'avoidServers', 'cacheEnabled'].forEach(function (p) {
       if (properties.hasOwnProperty(p)) {
         body[p] = properties[p];
       }
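The whitelist in `ArangoDatabase.prototype._create` above simply copies known property keys, now including `smartJoinAttribute`, into the request body sent to the server. A minimal sketch of that filtering; the helper name `buildCreateBody` is hypothetical, only the key list is taken from the diff:

```javascript
// Keys accepted by _create after this change ('smartJoinAttribute' is new).
var KNOWN_KEYS = [
  'doCompact', 'keyOptions', 'shardKeys', 'numberOfShards',
  'distributeShardsLike', 'indexBuckets', 'id', 'isSmart',
  'replicationFactor', 'shardingStrategy', 'smartGraphAttribute',
  'smartJoinAttribute', 'avoidServers', 'cacheEnabled'
];

// Copy only whitelisted properties into the request body; unknown keys
// supplied by the caller are silently dropped.
function buildCreateBody (name, properties) {
  var body = { name: name };
  KNOWN_KEYS.forEach(function (p) {
    if (properties.hasOwnProperty(p)) {
      body[p] = properties[p];
    }
  });
  return body;
}
```

This is why passing an unrecognized option to `db._create()` has no effect: it never reaches the server.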

View File

@@ -321,6 +321,10 @@
   "ERROR_KEY_MUST_BE_PREFIXED_WITH_SMART_GRAPH_ATTRIBUTE" : { "code" : 4003, "message" : "in smart vertex collections _key must be prefixed with the value of the smart graph attribute" },
   "ERROR_ILLEGAL_SMART_GRAPH_ATTRIBUTE" : { "code" : 4004, "message" : "attribute cannot be used as smart graph attribute" },
   "ERROR_SMART_GRAPH_ATTRIBUTE_MISMATCH" : { "code" : 4005, "message" : "smart graph attribute mismatch" },
+  "ERROR_INVALID_SMART_JOIN_ATTRIBUTE" : { "code" : 4006, "message" : "invalid smart join attribute declaration" },
+  "ERROR_KEY_MUST_BE_PREFIXED_WITH_SMART_JOIN_ATTRIBUTE" : { "code" : 4007, "message" : "shard key value must be prefixed with the value of the smart join attribute" },
+  "ERROR_NO_SMART_JOIN_ATTRIBUTE" : { "code" : 4008, "message" : "smart join attribute not given or invalid" },
+  "ERROR_CLUSTER_MUST_NOT_CHANGE_SMART_JOIN_ATTRIBUTE" : { "code" : 4009, "message" : "must not change the value of the smartJoinAttribute" },
   "ERROR_CLUSTER_REPAIRS_FAILED" : { "code" : 5000, "message" : "error during cluster repairs" },
   "ERROR_CLUSTER_REPAIRS_NOT_ENOUGH_HEALTHY" : { "code" : 5001, "message" : "not enough (healthy) db servers" },
   "ERROR_CLUSTER_REPAIRS_REPLICATION_FACTOR_VIOLATED" : { "code" : 5002, "message" : "replication factor violated during cluster repairs" },

View File

@@ -1438,7 +1438,7 @@ function processQuery(query, explain, planIndex) {
       case 'RemoteNode':
         return keyword('REMOTE');
       case 'DistributeNode':
-        return keyword('DISTRIBUTE');
+        return keyword('DISTRIBUTE') + ' ' + annotation('/* create keys: ' + node.createKeys + ', variable: ') + variableName(node.variable) + ' ' + annotation('*/');
       case 'ScatterNode':
         return keyword('SCATTER');
       case 'GatherNode':

View File

@@ -170,6 +170,7 @@ std::string const StaticStrings::IsSmart("isSmart");
 std::string const StaticStrings::NumberOfShards("numberOfShards");
 std::string const StaticStrings::ReplicationFactor("replicationFactor");
 std::string const StaticStrings::ShardKeys("shardKeys");
+std::string const StaticStrings::SmartJoinAttribute("smartJoinAttribute");

 // graph attribute names
 std::string const StaticStrings::GraphCollection("_graphs");

View File

@@ -156,6 +156,7 @@ class StaticStrings {
   static std::string const DistributeShardsLike;
   static std::string const ReplicationFactor;
   static std::string const ShardKeys;
+  static std::string const SmartJoinAttribute;

   // graph attribute names
   static std::string const GraphCollection;

View File

@@ -441,6 +441,10 @@ ERROR_CANNOT_DROP_SMART_COLLECTION,4002,"cannot drop this smart collection","Thi
 ERROR_KEY_MUST_BE_PREFIXED_WITH_SMART_GRAPH_ATTRIBUTE,4003,"in smart vertex collections _key must be prefixed with the value of the smart graph attribute","In a smart vertex collection _key must be prefixed with the value of the smart graph attribute."
 ERROR_ILLEGAL_SMART_GRAPH_ATTRIBUTE,4004,"attribute cannot be used as smart graph attribute","The given smartGraph attribute is illegal and connot be used for sharding. All system attributes are forbidden."
 ERROR_SMART_GRAPH_ATTRIBUTE_MISMATCH,4005,"smart graph attribute mismatch","The smart graph attribute of the given collection does not match the smart graph attribute of the graph."
+ERROR_INVALID_SMART_JOIN_ATTRIBUTE,4006,"invalid smart join attribute declaration","Will be raised when the smartJoinAttribute declaration is invalid."
+ERROR_KEY_MUST_BE_PREFIXED_WITH_SMART_JOIN_ATTRIBUTE,4007,"shard key value must be prefixed with the value of the smart join attribute","when using smartJoinAttribute for a collection, the shard key value must be prefixed with the value of the smart join attribute."
+ERROR_NO_SMART_JOIN_ATTRIBUTE,4008,"smart join attribute not given or invalid","The given document does not have the required smart join attribute set or it has an invalid value."
+ERROR_CLUSTER_MUST_NOT_CHANGE_SMART_JOIN_ATTRIBUTE,4009,"must not change the value of the smartJoinAttribute","Will be raised if there is an attempt to update the value of the smartJoinAttribute."

 ################################################################################
 ## Cluster repair errors
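Errors 4007 and 4008 describe the document-key convention smart joins rely on: a document's `_key` must start with the value of its smart join attribute, followed by `:`. A hypothetical checker illustrating that rule (not ArangoDB code; the function and return values merely mirror the error names above):

```javascript
// Returns 'OK' when a document satisfies the smart-join key convention,
// otherwise the name of the error that ArangoDB would raise.
function checkSmartJoinKey (doc, smartJoinAttribute) {
  var prefix = doc[smartJoinAttribute];
  if (typeof prefix !== 'string' || prefix === '') {
    // the smart join attribute is missing or not a non-empty string (4008)
    return 'ERROR_NO_SMART_JOIN_ATTRIBUTE';
  }
  if (doc._key.indexOf(prefix + ':') !== 0) {
    // _key is not prefixed with "<smartJoinValue>:" (4007)
    return 'ERROR_KEY_MUST_BE_PREFIXED_WITH_SMART_JOIN_ATTRIBUTE';
  }
  return 'OK';
}
```

Sharding both collections of a join on this common prefix is what lets the matching shards be co-located, so the join can run server-locally.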

View File

@@ -320,6 +320,10 @@ void TRI_InitializeErrorMessages() {
   REG_ERROR(ERROR_KEY_MUST_BE_PREFIXED_WITH_SMART_GRAPH_ATTRIBUTE, "in smart vertex collections _key must be prefixed with the value of the smart graph attribute");
   REG_ERROR(ERROR_ILLEGAL_SMART_GRAPH_ATTRIBUTE, "attribute cannot be used as smart graph attribute");
   REG_ERROR(ERROR_SMART_GRAPH_ATTRIBUTE_MISMATCH, "smart graph attribute mismatch");
+  REG_ERROR(ERROR_INVALID_SMART_JOIN_ATTRIBUTE, "invalid smart join attribute declaration");
+  REG_ERROR(ERROR_KEY_MUST_BE_PREFIXED_WITH_SMART_JOIN_ATTRIBUTE, "shard key value must be prefixed with the value of the smart join attribute");
+  REG_ERROR(ERROR_NO_SMART_JOIN_ATTRIBUTE, "smart join attribute not given or invalid");
+  REG_ERROR(ERROR_CLUSTER_MUST_NOT_CHANGE_SMART_JOIN_ATTRIBUTE, "must not change the value of the smartJoinAttribute");
   REG_ERROR(ERROR_CLUSTER_REPAIRS_FAILED, "error during cluster repairs");
   REG_ERROR(ERROR_CLUSTER_REPAIRS_NOT_ENOUGH_HEALTHY, "not enough (healthy) db servers");
   REG_ERROR(ERROR_CLUSTER_REPAIRS_REPLICATION_FACTOR_VIOLATED, "replication factor violated during cluster repairs");

View File

@@ -1693,6 +1693,29 @@ constexpr int TRI_ERROR_ILLEGAL_SMART_GRAPH_ATTRIBUTE
 /// graph attribute of the graph.
 constexpr int TRI_ERROR_SMART_GRAPH_ATTRIBUTE_MISMATCH = 4005;

+/// 4006: ERROR_INVALID_SMART_JOIN_ATTRIBUTE
+/// "invalid smart join attribute declaration"
+/// Will be raised when the smartJoinAttribute declaration is invalid.
+constexpr int TRI_ERROR_INVALID_SMART_JOIN_ATTRIBUTE = 4006;
+
+/// 4007: ERROR_KEY_MUST_BE_PREFIXED_WITH_SMART_JOIN_ATTRIBUTE
+/// "shard key value must be prefixed with the value of the smart join attribute"
+/// when using smartJoinAttribute for a collection, the shard key value must be
+/// prefixed with the value of the smart join attribute.
+constexpr int TRI_ERROR_KEY_MUST_BE_PREFIXED_WITH_SMART_JOIN_ATTRIBUTE = 4007;
+
+/// 4008: ERROR_NO_SMART_JOIN_ATTRIBUTE
+/// "smart join attribute not given or invalid"
+/// The given document does not have the required smart join attribute set or
+/// it has an invalid value.
+constexpr int TRI_ERROR_NO_SMART_JOIN_ATTRIBUTE = 4008;
+
+/// 4009: ERROR_CLUSTER_MUST_NOT_CHANGE_SMART_JOIN_ATTRIBUTE
+/// "must not change the value of the smartJoinAttribute"
+/// Will be raised if there is an attempt to update the value of the
+/// smartJoinAttribute.
+constexpr int TRI_ERROR_CLUSTER_MUST_NOT_CHANGE_SMART_JOIN_ATTRIBUTE = 4009;
+
 /// 5000: ERROR_CLUSTER_REPAIRS_FAILED
 /// "error during cluster repairs"
 /// General error during cluster repairs