SWIPE (SWappable Iterator for oPErations)

SWIPE represents an interface to modify the Operate process of TritanDB to extend the engine to support different languages, database and model tiers.

The implementation of the set of TritanDB operators is inspired by the work on an open and common relational algebra specification in Apache Calcite. The tables show the conversion from a SPARQL algebra operator to the corresponding TritanDB operator. Each TritanDB operator is described as follows.

Graph Patterns

SPARQL Algebra TritanDB Operator
BGP, $G$ match(BGP,map), scan_(TS)
Join, $\bowtie$ join(expr...)
LeftJoin, $\ltimes$ semiJoin(expr)
Filter, $\sigma$ filter(expr...)
Union, $\cup$ union()
Graph setMap(map)
Extend, $\rho$ extend(expr,var)
Minus minus()
Group/Aggregation, $\gamma$ aggregate(groupKey, aggr)

match which matches a Basic Graph Pattern (BGP) from a query with a mapping to produce a binding $\mathbb{B}$. A set of time-series are referenced within $\mathbb{B}$. scan is an operator that returns an iterator over a time-series TS.

join, combines two time-series according to conditions specified as expr while semiJoin joins two time-series according to some condition, but outputs only columns from the left input.

filter modifies the input to return an iterator over points for which the conditions specified in expr evaluates to true. A common filter condition would a specification of a range of time for a time-series.

union returns the union of the input time-series and bindings $\mathbb{B}$. If the same time-series is referenced within inputs, only the bindings need to be merged. If two different time-series are merged, the iterator is formed in linear time by a comparison-based sorting algorithm, for example, the merge step within a merge sort, as the time-series are retrieved in sorted time order.

setMap is used to apply the specified mapping to its algebra tree leaf nodes for match.

extend allows the evaluation of an expression expr to be bound to a new variable var. This evaluation is performed only if var is projected. There are three means in SPARQL to produce the algebra: using bind, expressions in the select clause or expressions in the group by clause.

minus returns the iterator of first input excluding points from the second input.

aggregate produces an iteration over a set of aggregated results from an input. To calculate aggregate values for an input, the input is first divided into one or more groups by the groupKey field and the aggregate value is calculated for the particular aggr function for each group. The aggr functions supported are count, sum, avg, min, max, sample and groupconcat.

Solution Modifiers

SPARQL Algebra TritanDB Operator
OrderBy sort(fieldOrdinal...)
Project, $\Pi$ project(exprList [, fieldNames])
Distinct, $\sigma_{D}$ distinct()
Reduced $\sigma_{R}$ distinct()
Slice, $\sigma_{S}$ limit(offset, fetch)

sort imposes a particular sort order on its input based on a sequence consisting of fieldOrdinals, each defining the time-series field index (zero-based) and specifying a positive ordinal for ascending and negative for descending order.

project computes the set of chosen variables to 'select' from its input, as specified by exprList, and returns an iterator to the result containing only the selected variables. The default name of variables provided can be renamed by specifying the new name within the fieldNames argument.

distinct eliminates all duplicate records while reduced, in the TritanDB implementation, performs the same function. The SPARQL specification defines the difference being that distinct ensures duplicate elimination while reduced simply permits duplicate elimination. Given that time-series are retrieved in sorted order of time, the distinct function works the same for reduced as well and eliminates immediately repeated duplicate result rows.

limit computes a window over the input returning an iterator over results that are of a maximum size (in rows) of fetch and are a distance of offset from the start of the result set.