# SWIPE (SWappable Iterator for oPErations)

SWIPE represents an interface to modify the Operate process of TritanDB to extend the engine to support different languages, database and model tiers.

The implementation of the set of TritanDB operators is inspired by the work on an open and common relational algebra specification in Apache Calcite. The tables show the conversion from a SPARQL algebra operator to the corresponding TritanDB operator. Each TritanDB operator is described as follows.

## Graph Patterns

SPARQL Algebra | TritanDB Operator |
---|---|

BGP, $G$ | match(BGP,map), scan_(TS) |

Join, $\bowtie$ | join(expr...) |

LeftJoin, $\ltimes$ | semiJoin(expr) |

Filter, $\sigma$ | filter(expr...) |

Union, $\cup$ | union() |

Graph | setMap(map) |

Extend, $\rho$ | extend(expr,var) |

Minus | minus() |

Group/Aggregation, $\gamma$ | aggregate(groupKey, aggr) |

*match* which matches a Basic Graph Pattern (BGP) from a query with a mapping to produce a binding $\mathbb{B}$. A set of time-series are referenced within $\mathbb{B}$. *scan* is an operator that returns an iterator over a time-series *TS*.

*join*, combines two time-series according to conditions specified as *expr* while *semiJoin* joins two time-series according to some condition, but outputs only columns from the left input.

*filter* modifies the input to return an iterator over points for which the conditions specified in *expr* evaluates to true. A common filter condition would a specification of a range of time for a time-series.

*union* returns the union of the input time-series and bindings $\mathbb{B}$. If the same time-series is referenced within inputs, only the bindings need to be merged. If two different time-series are merged, the iterator is formed in linear time by a comparison-based sorting algorithm, for example, the merge step within a merge sort, as the time-series are retrieved in sorted time order.

*setMap* is used to apply the specified mapping to its algebra tree leaf nodes for *match*.

*extend* allows the evaluation of an expression *expr* to be bound to a new variable *var*. This evaluation is performed only if *var* is projected. There are three means in SPARQL to produce the algebra: using *bind*, expressions in the *select* clause or expressions in the *group by* clause.

*minus* returns the iterator of first input excluding points from the second input.

*aggregate* produces an iteration over a set of aggregated results from an input. To calculate aggregate values for an input, the input is first divided into one or more groups by the *groupKey* field and the aggregate value is calculated for the particular *aggr* function for each group. The *aggr* functions supported are *count*, *sum*, *avg*, *min*, *max*, *sample* and *groupconcat*.

## Solution Modifiers

SPARQL Algebra | TritanDB Operator |
---|---|

OrderBy | sort(fieldOrdinal...) |

Project, $\Pi$ | project(exprList [, fieldNames]) |

Distinct, $\sigma_{D}$ | distinct() |

Reduced $\sigma_{R}$ | distinct() |

Slice, $\sigma_{S}$ | limit(offset, fetch) |

*sort* imposes a particular sort order on its input based on a sequence consisting of *fieldOrdinals*, each defining the time-series field index (zero-based) and specifying a positive ordinal for ascending and negative for descending order.

*project* computes the set of chosen variables to 'select' from its input, as specified by *exprList*, and returns an iterator to the result containing only the selected variables. The default name of variables provided can be renamed by specifying the new name within the *fieldNames* argument.

*distinct* eliminates all duplicate records while *reduced*, in the TritanDB implementation, performs the same function. The SPARQL specification defines the difference being that *distinct* ensures duplicate elimination while *reduced* simply permits duplicate elimination. Given that time-series are retrieved in sorted order of time, the *distinct* function works the same for *reduced* as well and eliminates immediately repeated duplicate result rows.

*limit* computes a window over the input returning an iterator over results that are of a maximum size (in rows) of *fetch* and are a distance of *offset* from the start of the result set.