ArangoDB v3.13 is under development and not released yet. This documentation is not final and potentially incomplete.

SatelliteGraph Details

How to create and use SatelliteGraphs

Below, you can find usage examples and advanced configuration options for SatelliteGraphs. The examples use arangosh and the @arangodb/satellite-graph module. You can also manage SatelliteGraphs via the HTTP API.

How to create a SatelliteGraph

SatelliteGraphs enforce and rely on special properties of the underlying collections. Hence, they only work with collections that are either created implicitly through the SatelliteGraph interface or created manually with the correct properties:

  • There needs to be a prototype collection with replicationFactor set to "satellite"
  • All other collections need to have distributeShardsLike set to the name of the prototype collection
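
As an illustration of these two rules, the following plain JavaScript sketch checks a list of collection descriptions. The function checkSatelliteCollections and the shape of its input objects (name, type, replicationFactor, distributeShardsLike) are hypothetical and not part of ArangoDB; the field names are only assumed to mirror the collection properties of the same names. The sketch treats a document collection with replicationFactor "satellite" and no distributeShardsLike as the prototype, on the assumption that collections following a prototype may report the inherited replication factor as well.

```javascript
// Hypothetical helper (not part of ArangoDB): checks whether a set of collection
// descriptions satisfies the two SatelliteGraph requirements listed above.
function checkSatelliteCollections(collections) {
  const errors = [];
  // Prototype: a document collection with replicationFactor "satellite"
  // that does not itself follow another collection.
  const prototypes = collections.filter(
    (c) => c.type === "document" && c.replicationFactor === "satellite" && !c.distributeShardsLike
  );
  if (prototypes.length !== 1) {
    errors.push(`expected exactly one prototype collection, found ${prototypes.length}`);
    return { ok: false, prototype: null, errors };
  }
  const prototype = prototypes[0].name;
  // Every other collection must reference the prototype via distributeShardsLike.
  for (const c of collections) {
    if (c.name === prototype) continue;
    if (c.distributeShardsLike !== prototype) {
      errors.push(`${c.name}: distributeShardsLike must be "${prototype}"`);
    }
  }
  return { ok: errors.length === 0, prototype, errors };
}

// Usage: "legacyEdges" was created without distributeShardsLike, so it is rejected.
const result = checkSatelliteCollections([
  { name: "person", type: "document", replicationFactor: "satellite" },
  { name: "isFriend", type: "edge", replicationFactor: "satellite", distributeShardsLike: "person" },
  { name: "legacyEdges", type: "edge", replicationFactor: 3 },
]);
console.log(result.ok); // false
console.log(result.errors); // [ 'legacyEdges: distributeShardsLike must be "person"' ]
```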

Collections can be part of multiple SatelliteGraphs. This means that, unlike SmartGraphs, SatelliteGraphs can overlap. If you have a larger SatelliteGraph and want to create an additional SatelliteGraph that only covers a part of it, you can do so.

Create a graph

To create a SatelliteGraph in arangosh, use the satellite-graph module:

var satelliteGraphModule = require("@arangodb/satellite-graph");
var graph = satelliteGraphModule._create("satelliteGraph");
graph = satelliteGraphModule._graph("satelliteGraph");
Output:
{[SatelliteGraph] 
}

In contrast to General Graphs and SmartGraphs, you do not need to take care of the sharding and replication properties. The properties distributeShardsLike, replicationFactor, and numberOfShards are set automatically.

Add node collections

Adding node collections is analogous to General Graphs:

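var satelliteGraphModule = require("@arangodb/satellite-graph");
var graph = satelliteGraphModule._create("satelliteGraph");
// Creates "aNodeCollection" with suitable properties if it does not exist yet
graph._addVertexCollection("aNodeCollection");
graph = satelliteGraphModule._graph("satelliteGraph");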
Output:
{[SatelliteGraph] 
  "aNodeCollection" : [ArangoCollection 6010307, "aNodeCollection" (type document, status loaded)] 
}

If the collection "aNodeCollection" doesn’t exist yet, the SatelliteGraph module creates it automatically with the correct properties. If it already exists, its properties must be suitable for a SatelliteGraph (see The prototype collection); otherwise, it is not added.

Define relations

Adding edge collections works the same as with General Graphs. Again, the SatelliteGraph module creates the collections with the right properties if they don’t exist already.

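var satelliteGraphModule = require("@arangodb/satellite-graph");
var graph = satelliteGraphModule._create("satelliteGraph");
// Edge definition: "isFriend" edges connect "person" documents to "person" documents
var relation = satelliteGraphModule._relation("isFriend", ["person"], ["person"]);
graph._extendEdgeDefinitions(relation);
graph = satelliteGraphModule._graph("satelliteGraph");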
Output:
{[SatelliteGraph] 
  "isFriend" : [ArangoCollection 6010535, "isFriend" (type edge, status loaded)], 
  "person" : [ArangoCollection 6010534, "person" (type document, status loaded)] 
}

Existing edge collections can be added, but they require the distributeShardsLike property to reference the prototype collection.

The prototype collection

Every SatelliteGraph needs exactly one document collection with replicationFactor set to "satellite". Such a collection automatically has exactly one shard. This collection is selected as the prototype.

All other collections of the SatelliteGraph need to inherit its properties by referencing its name in the distributeShardsLike property.

If collections are created implicitly through the SatelliteGraph module, then this is handled for you automatically. If you want to create the collections manually before adding them to the SatelliteGraph, then you need to take care of these properties.
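
For manual creation, the only properties this section requires are replicationFactor on the prototype and distributeShardsLike on every other collection. The sketch below uses a hypothetical helper, satelliteCollectionOptions (not part of ArangoDB), to build these properties objects. The arangosh calls db._create() and db._createEdgeCollection(), which accept a properties object as their second argument, appear only in comments because they need a running cluster. Note that the prototype has to be created first, as the other collections reference it by name.

```javascript
// Hypothetical helper: returns the properties object to use when manually creating
// a collection for a SatelliteGraph whose prototype collection is `prototypeName`.
function satelliteCollectionOptions(name, prototypeName) {
  if (name === prototypeName) {
    // The prototype is a document collection replicated as a SatelliteCollection.
    return { replicationFactor: "satellite" };
  }
  // All other collections inherit sharding and replication from the prototype.
  return { distributeShardsLike: prototypeName };
}

// In arangosh (cluster required, shown for illustration only):
//   db._create("person", satelliteCollectionOptions("person", "person"));
//   db._createEdgeCollection("isFriend", satelliteCollectionOptions("isFriend", "person"));
//   var graph = satelliteGraphModule._create("satelliteGraph");
//   graph._extendEdgeDefinitions(satelliteGraphModule._relation("isFriend", ["person"], ["person"]));
console.log(satelliteCollectionOptions("isFriend", "person")); // { distributeShardsLike: 'person' }
```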

Prototype collection examples

Creating an empty SatelliteGraph: No prototype collection is present.

var satelliteGraphModule = require("@arangodb/satellite-graph");
var graph = satelliteGraphModule._create("satelliteGraph");
graph;
Output:
{[SatelliteGraph] 
}

Creating an empty SatelliteGraph, then adding a document (node) collection. This leads to the creation of a prototype collection "myPrototypeColl" (assuming that no collection with this name existed before):

var satelliteGraphModule = require("@arangodb/satellite-graph");
var graph = satelliteGraphModule._create("satelliteGraph");
graph._addVertexCollection("myPrototypeColl");
graph = satelliteGraphModule._graph("satelliteGraph");
Output:
{[SatelliteGraph] 
  "myPrototypeColl" : [ArangoCollection 6010674, "myPrototypeColl" (type document, status loaded)] 
}

Creating an empty SatelliteGraph, then adding an edge definition. This selects the collection "person" as the prototype collection because it is the only document (node) collection. If you supply more than one document collection, one of them is chosen arbitrarily as the prototype collection.

var satelliteGraphModule = require("@arangodb/satellite-graph");
var graph = satelliteGraphModule._create("satelliteGraph");
var relation = satelliteGraphModule._relation("isFriend", ["person"], ["person"]);
graph._extendEdgeDefinitions(relation);
graph = satelliteGraphModule._graph("satelliteGraph");
Output:
{[SatelliteGraph] 
  "isFriend" : [ArangoCollection 6010744, "isFriend" (type edge, status loaded)], 
  "person" : [ArangoCollection 6010743, "person" (type document, status loaded)] 
}

The prototype collection is also selected automatically during graph creation if at least one document (node) collection is supplied directly. If more than one is available, one of them is chosen arbitrarily, regardless of whether it is part of an edge definition or supplied as a node (orphan) collection.
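
To make the set of possible prototypes concrete, the sketch below collects every document (node) collection that a graph definition refers to. The helper prototypeCandidates is hypothetical; it assumes edge definitions are objects with collection, from, and to fields (the shape that _relation() returns). It only lists the candidates, sorted for a deterministic result, and does not model which one ArangoDB actually picks, since that choice is arbitrary.

```javascript
// Hypothetical helper: returns the document (node) collections referenced by a
// SatelliteGraph definition. Any one of them may become the prototype collection.
function prototypeCandidates(edgeDefinitions, orphanCollections = []) {
  const names = new Set();
  for (const def of edgeDefinitions) {
    // Both ends of an edge definition contribute node collections.
    for (const n of def.from) names.add(n);
    for (const n of def.to) names.add(n);
  }
  // Orphan (node-only) collections are candidates, too.
  for (const n of orphanCollections) names.add(n);
  return [...names].sort();
}

// Single node collection: the choice is fixed.
console.log(prototypeCandidates([{ collection: "isFriend", from: ["person"], to: ["person"] }])); // [ 'person' ]
// Several node collections: any of them may be chosen.
console.log(prototypeCandidates([{ collection: "owns", from: ["person"], to: ["car"] }], ["city"])); // [ 'car', 'city', 'person' ]
```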

Utilizing SatelliteGraphs

A SatelliteGraph must be created before it can be queried. The operations that can then be optimized are traversals and (k-)shortest path(s) computations. Both can also be combined with local joins or other SatelliteGraph operations.

Here is an example showing the difference between the execution of a General Graph and a SatelliteGraph traversal query:

  1. First, set up the graphs and collections:

    var graphModule = require("@arangodb/general-graph");
    var satelliteGraphModule = require("@arangodb/satellite-graph");
    // A General Graph and a SatelliteGraph with the same structure
    graphModule._create("normalGraph", [ graphModule._relation("edges", "nodes", "nodes") ], [], {});
    satelliteGraphModule._create("satelliteGraph", [ satelliteGraphModule._relation("satEdges", "satNodes", "satNodes") ], [], {});
    // A regular collection with 8 shards to combine with the traversals
    db._create("collection", {numberOfShards: 8});
    Output:
    {[GeneralGraph] 
      "edges" : [ArangoCollection 6010654, "edges" (type edge, status loaded)], 
      "nodes" : [ArangoCollection 6010653, "nodes" (type document, status loaded)] 
    }
    
    {[SatelliteGraph] 
      "satEdges" : [ArangoCollection 6010660, "satEdges" (type edge, status loaded)], 
      "satNodes" : [ArangoCollection 6010659, "satNodes" (type document, status loaded)] 
    }
    
    [ArangoCollection 6010664, "collection" (type document, status loaded)]

  2. Let us analyze a query involving a traversal:

    db._explain(`FOR doc in collection FOR v,e,p IN OUTBOUND "nodes/start" GRAPH "normalGraph" RETURN [doc,v,e,p]`, {}, {colors: false});
    Output:
    Query String (96 chars, results cachable: true):
     FOR doc in collection FOR v,e,p IN OUTBOUND "nodes/start" GRAPH "normalGraph" RETURN [doc,v,e,p]
    
    Execution plan:
     Id   NodeType                  Site  Par   Est.   Comment
      1   SingletonNode             DBS            1   * ROOT 
      2   EnumerateCollectionNode   DBS            0     - FOR doc IN collection   /* full collection scan, 8 shard(s)  */
      8   RemoteNode                COOR           0       - REMOTE
      9   GatherNode                COOR           0       - GATHER   /* unsorted */
      3   TraversalNode             COOR           1       - FOR v  /* vertex */, e  /* edge */, p  /* paths: vertices, edges */ IN 1..1  /* min..maxPathDepth */ OUTBOUND 'nodes/start' /* startnode */  GRAPH 'normalGraph' /* order: dfs */
      4   CalculationNode           COOR           1         - LET #5 = [ doc, v, e, p ]   /* simple expression */   /* collections used: doc : collection */
      5   ReturnNode                COOR           1         - RETURN #5
    
    Indexes used:
     By   Name   Type   Collection   Unique   Sparse   Cache   Selectivity   Fields        Stored values   Ranges
      3   edge   edge   edges        false    false    false      100.00 %   [ `_from` ]   [  ]            base OUTBOUND
    
    Traversals on graphs:
     Id  Depth  Vertex collections  Edge collections  Options                                              Filter / Prune Conditions
     3   1..1   nodes               edges             uniqueVertices: none, uniqueEdges: path, order: dfs  
    
    Optimization rules applied:
     Id   Rule Name                                 Id   Rule Name                        
      1   scatter-in-cluster                         2   remove-unnecessary-remote-scatter
    
    Optimization rules with highest execution times:
     RuleName                                        Duration [s]
     use-vector-index                                     0.00075
    
    78 rule(s) executed, 1 plan(s) created, peak mem [b]: 0, exec time [s]: 0.00237

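    You can see that the TraversalNode is executed on a Coordinator, and only the EnumerateCollectionNode is executed on DB-Servers, once for each of the 8 shards of "collection". All documents from these shards are sent to the Coordinator (RemoteNode and GatherNode) before the traversal runs.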

  3. Let us now have a look at the same query using a SatelliteGraph:

    db._explain(`FOR doc in collection FOR v,e,p IN OUTBOUND "nodes/start" GRAPH "satelliteGraph" RETURN [doc,v,e,p]`, {}, {colors: false});
    Output:
    Query String (99 chars, results cachable: true):
     FOR doc in collection FOR v,e,p IN OUTBOUND "nodes/start" GRAPH "satelliteGraph" RETURN [doc,v,e,p]
    
    Execution plan:
     Id   NodeType                  Site  Par   Est.   Comment
      1   SingletonNode             DBS            1   * ROOT 
      2   EnumerateCollectionNode   DBS            0     - FOR doc IN collection   /* full collection scan, 8 shard(s)  */
     10   TraversalNode             DBS            1       - FOR v  /* vertex */, e  /* edge */, p  /* paths: vertices, edges */ IN 1..1  /* min..maxPathDepth */ OUTBOUND 'nodes/start' /* startnode */  GRAPH 'satelliteGraph' /* local graph node, used as satellite */ /* order: dfs */
      4   CalculationNode           DBS            1         - LET #5 = [ doc, v, e, p ]   /* simple expression */   /* collections used: doc : collection */
     13   RemoteNode                COOR           1         - REMOTE
     14   GatherNode                COOR           1         - GATHER   /* parallel, unsorted */
      5   ReturnNode                COOR           1         - RETURN #5
    
    Indexes used:
     By   Name   Type   Collection   Unique   Sparse   Cache   Selectivity   Fields        Stored values   Ranges
     10   edge   edge   satEdges     false    false    false      100.00 %   [ `_from` ]   [  ]            base OUTBOUND
    
    Traversals on graphs:
     Id  Depth  Vertex collections  Edge collections  Options                                              Filter / Prune Conditions
     10  1..1   satNodes            satEdges          uniqueVertices: none, uniqueEdges: path, order: dfs  
    
    Optimization rules applied:
     Id   Rule Name                                 Id   Rule Name                                 Id   Rule Name                        
      1   scatter-in-cluster                         3   remove-satellite-joins                     5   remove-unnecessary-remote-scatter
      2   scatter-satellite-graphs                   4   distribute-filtercalc-to-cluster           6   parallelize-gather               
    
    Optimization rules with highest execution times:
     RuleName                                        Duration [s]
     use-vector-index                                     0.00063
    
    78 rule(s) executed, 1 plan(s) created, peak mem [b]: 0, exec time [s]: 0.00222

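    Note that now the TraversalNode (and the subsequent CalculationNode) is executed on each DB-Server, alongside the shards of "collection", and only the results are sent to the Coordinator. This greatly reduces the required network communication and can therefore improve query performance.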

Convert General Graphs or SmartGraphs to SatelliteGraphs

If you want to transform an existing General Graph or SmartGraph into a SatelliteGraph, then you need to dump and restore your previous graph. This is necessary for the initial data replication and because some collection properties are immutable.

Use arangodump and arangorestore. The only change to this pipeline is that the new collections must exist before you start the arangorestore process: either let the SatelliteGraph module create them when you create the graph, or create them manually and add them to the SatelliteGraph.
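
One way to sketch this pipeline, assuming the SatelliteGraph is created in a separate target database so that the collection names can stay the same, is to build the command lines programmatically. The helper conversionCommands and the database, directory, and collection names are hypothetical. The options used (--server.database, --output-directory, --input-directory, --collection, --create-collection) exist in arangodump and arangorestore, but verify their behavior for your version: --create-collection false is intended to make arangorestore only load data into the collections you created beforehand, instead of recreating them with their old, non-satellite properties.

```javascript
// Hypothetical helper: builds argument lists for converting a graph via dump/restore.
// Assumption: the SatelliteGraph (with identically named collections) is created
// in `targetDb` with arangosh between the dump and the restore step.
function conversionCommands({ sourceDb, targetDb, dumpDir, collections }) {
  // Restrict both tools to the graph's collections.
  const collectionArgs = collections.flatMap((name) => ["--collection", name]);
  return {
    // Step 1: dump the old graph's collections.
    dump: ["arangodump", "--server.database", sourceDb, "--output-directory", dumpDir, ...collectionArgs],
    // Step 2 (not shown): create the SatelliteGraph and its collections in targetDb.
    // Step 3: restore only the data, keeping the pre-created SatelliteGraph collections.
    restore: ["arangorestore", "--server.database", targetDb, "--input-directory", dumpDir,
      "--create-collection", "false", ...collectionArgs],
  };
}

const cmds = conversionCommands({ sourceDb: "oldDb", targetDb: "newDb", dumpDir: "dump", collections: ["person", "isFriend"] });
console.log(cmds.restore.join(" "));
// arangorestore --server.database newDb --input-directory dump --create-collection false --collection person --collection isFriend
```

The lists are only built here, not executed; you could pass them to a process launcher of your choice or run the printed commands in a shell.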