sage.query_engine.iterators package

Submodules

sage.query_engine.iterators.filter module

class sage.query_engine.iterators.filter.FilterIterator(source: sage.query_engine.iterators.preemptable_iterator.PreemptableIterator, expression: str, context: dict)

Bases: sage.query_engine.iterators.preemptable_iterator.PreemptableIterator

A FilterIterator evaluates a FILTER clause in a pipeline of iterators.

Args:
  • source: Previous iterator in the pipeline.

  • expression: A SPARQL FILTER expression.

  • context: Information about the query execution.

has_next() → bool

Return True if the iterator has more item to yield

async next() → Optional[Dict[str, str]]

Get the next item from the iterator, following the iterator protocol.

This function may contains non interruptible clauses which must be atomically evaluated before preemption occurs.

Returns: A set of solution mappings, or None if none was produced during this call.

next_stage(mappings: Dict[str, str])

Propagate mappings to the bottom of the pipeline in order to compute nested loop joins

save() → iterators_pb2.SavedFilterIterator

Save and serialize the iterator as a Protobuf message

serialized_name() → str

Get the name of the iterator, as used in the plan serialization protocol

sage.query_engine.iterators.filter.to_rdflib_term(value: str) → Union[rdflib.term.Literal, rdflib.term.URIRef, rdflib.term.Variable]

Convert a N3 term to a RDFLib Term.

Argument: A RDF Term in N3 format.

Returns: The RDF Term in rdflib format.

sage.query_engine.iterators.loader module

sage.query_engine.iterators.loader.load(saved_plan: Union[iterators_pb2.RootTree, iterators_pb2.SavedBagUnionIterator, iterators_pb2.SavedFilterIterator, iterators_pb2.SavedIndexJoinIterator, iterators_pb2.SavedProjectionIterator, iterators_pb2.SavedScanIterator], dataset: sage.database.core.dataset.Dataset, context: dict) → sage.query_engine.iterators.preemptable_iterator.PreemptableIterator

Load a preemptable physical query execution plan from a saved state.

Args:
  • saved_plan: Saved query execution plan.

  • dataset: RDF dataset used to execute the plan.

  • context: Information about the query execution.

Returns:

The pipeline of iterator used to continue query execution.

sage.query_engine.iterators.loader.load_filter(saved_plan: iterators_pb2.SavedFilterIterator, dataset: sage.database.core.dataset.Dataset, context: dict) → sage.query_engine.iterators.preemptable_iterator.PreemptableIterator

Load a FilterIterator from a protobuf serialization.

Args:
  • saved_plan: Saved query execution plan.

  • dataset: RDF dataset used to execute the plan.

  • context: Information about the query execution.

Returns:

The pipeline of iterator used to continue query execution.

sage.query_engine.iterators.loader.load_nlj(saved_plan: iterators_pb2.SavedIndexJoinIterator, dataset: sage.database.core.dataset.Dataset, context: dict) → sage.query_engine.iterators.preemptable_iterator.PreemptableIterator

Load a IndexJoinIterator from a protobuf serialization.

Args:
  • saved_plan: Saved query execution plan.

  • dataset: RDF dataset used to execute the plan.

  • context: Information about the query execution.

Returns:

The pipeline of iterator used to continue query execution.

sage.query_engine.iterators.loader.load_projection(saved_plan: iterators_pb2.SavedProjectionIterator, dataset: sage.database.core.dataset.Dataset, context: dict) → sage.query_engine.iterators.preemptable_iterator.PreemptableIterator

Load a ProjectionIterator from a protobuf serialization.

Args:
  • saved_plan: Saved query execution plan.

  • dataset: RDF dataset used to execute the plan.

  • context: Information about the query execution.

Returns:

The pipeline of iterator used to continue query execution.

sage.query_engine.iterators.loader.load_scan(saved_plan: iterators_pb2.SavedScanIterator, dataset: sage.database.core.dataset.Dataset, context: dict) → sage.query_engine.iterators.preemptable_iterator.PreemptableIterator

Load a ScanIterator from a protobuf serialization.

Args:
  • saved_plan: Saved query execution plan.

  • dataset: RDF dataset used to execute the plan.

  • context: Information about the query execution.

Returns:

The pipeline of iterator used to continue query execution.

sage.query_engine.iterators.loader.load_union(saved_plan: iterators_pb2.SavedBagUnionIterator, dataset: sage.database.core.dataset.Dataset, context: dict) → sage.query_engine.iterators.preemptable_iterator.PreemptableIterator

Load a BagUnionIterator from a protobuf serialization.

Args:
  • saved_plan: Saved query execution plan.

  • dataset: RDF dataset used to execute the plan.

  • context: Information about the query execution.

Returns:

The pipeline of iterator used to continue query execution.

sage.query_engine.iterators.nlj module

class sage.query_engine.iterators.nlj.IndexJoinIterator(left: sage.query_engine.iterators.preemptable_iterator.PreemptableIterator, right: sage.query_engine.iterators.preemptable_iterator.PreemptableIterator, context: dict, current_mappings: Optional[Dict[str, str]] = None)

Bases: sage.query_engine.iterators.preemptable_iterator.PreemptableIterator

A IndexJoinIterator implements an Index Loop join in a pipeline of iterators.

Args:
  • left: Previous iterator in the pipeline, i.e., the outer relation of the join.

  • right: Next iterator in the pipeline, i.e., the inner relation of the join.

  • context: Information about the query execution.

  • current_mappings: The current mappings when the join is performed.

has_next() → bool

Return True if the iterator has more item to yield

async next() → Optional[Dict[str, str]]

Get the next item from the iterator, following the iterator protocol.

This function may contains non interruptible clauses which must be atomically evaluated before preemption occurs.

Returns: A set of solution mappings, or None if none was produced during this call.

next_stage(mappings: Dict[str, str])

Propagate mappings to the bottom of the pipeline in order to compute nested loop joins

save() → iterators_pb2.SavedIndexJoinIterator

Save and serialize the iterator as a Protobuf message

serialized_name() → str

Get the name of the iterator, as used in the plan serialization protocol

sage.query_engine.iterators.preemptable_iterator module

class sage.query_engine.iterators.preemptable_iterator.PreemptableIterator

Bases: abc.ABC

An abstract class for a preemptable iterator

abstract has_next() → bool

Return True if the iterator has more item to yield

abstract async next() → Optional[Dict[str, str]]

Get the next item from the iterator, following the iterator protocol.

This function may contains non interruptible clauses which must be atomically evaluated before preemption occurs.

Returns: A set of solution mappings, or None if none was produced during this call.

abstract next_stage(mappings: Dict[str, str])

Propagate mappings to the bottom of the pipeline in order to compute nested loop joins

abstract save() → Any

Save and serialize the iterator as a Protobuf message

abstract serialized_name() → str

Get the name of the iterator, as used in the plan serialization protocol

sage.query_engine.iterators.projection module

class sage.query_engine.iterators.projection.ProjectionIterator(source: sage.query_engine.iterators.preemptable_iterator.PreemptableIterator, context: dict, projection: List[str] = None)

Bases: sage.query_engine.iterators.preemptable_iterator.PreemptableIterator

A ProjectionIterator evaluates a SPARQL projection (SELECT) in a pipeline of iterators.

Args:
  • source: Previous iterator in the pipeline.

  • projection: Projection variables.

  • context: Information about the query execution.

has_next() → bool

Return True if the iterator has more item to yield

async next() → Optional[Dict[str, str]]

Get the next item from the iterator, following the iterator protocol.

This function may contains non interruptible clauses which must be atomically evaluated before preemption occurs.

Returns: A set of solution mappings, or None if none was produced during this call.

next_stage(mappings: Dict[str, str])

Propagate mappings to the bottom of the pipeline in order to compute nested loop joins

save() → iterators_pb2.SavedProjectionIterator

Save and serialize the iterator as a Protobuf message

serialized_name() → str

Get the name of the iterator, as used in the plan serialization protocol

sage.query_engine.iterators.scan module

class sage.query_engine.iterators.scan.ScanIterator(connector: sage.database.db_connector.DatabaseConnector, pattern: Dict[str, str], context: dict, current_mappings: Optional[Dict[str, str]] = None, mu: Optional[Dict[str, str]] = None, last_read: Optional[str] = None, as_of: Optional[datetime.datetime] = None)

Bases: sage.query_engine.iterators.preemptable_iterator.PreemptableIterator

A ScanIterator evaluates a triple pattern over a RDF graph.

It can be used as the starting iterator in a pipeline of iterators.

Args:
  • connector: The database connector that will be used to evaluate a triple pattern.

  • pattern: The evaluated triple pattern.

  • context: Information about the query execution.

  • current_mappings: The current mappings when the scan is performed.

  • mu: The last triple read when the preemption occured. This triple must be the next returned triple when the query is resumed.

  • last_read: An offset ID used to resume the ScanIterator.

  • as_of: Perform all reads against a consistent snapshot represented by a timestamp.

has_next() → bool

Return True if the iterator has more item to yield

last_read() → str
async next() → Optional[Dict[str, str]]

Get the next item from the iterator, following the iterator protocol.

This function may contains non interruptible clauses which must be atomically evaluated before preemption occurs.

Returns: A set of solution mappings, or None if none was produced during this call.

next_stage(mappings: Dict[str, str])

Propagate mappings to the bottom of the pipeline in order to compute nested loop joins

save() → iterators_pb2.SavedScanIterator

Save and serialize the iterator as a Protobuf message

serialized_name()

Get the name of the iterator, as used in the plan serialization protocol

sage.query_engine.iterators.union module

class sage.query_engine.iterators.union.BagUnionIterator(left: sage.query_engine.iterators.preemptable_iterator.PreemptableIterator, right: sage.query_engine.iterators.preemptable_iterator.PreemptableIterator, context: dict)

Bases: sage.query_engine.iterators.preemptable_iterator.PreemptableIterator

A BagUnionIterator performs a SPARQL UNION with bag semantics in a pipeline of iterators.

This operator sequentially produces all solutions from the left operand, and then do the same for the right operand.

Args:
  • left: left operand of the union.

  • right: right operand of the union.

  • context: Information about the query execution.

has_next() → bool

Return True if the iterator has more item to yield

async next() → Optional[Dict[str, str]]

Get the next item from the iterator, following the iterator protocol.

This function may contains non interruptible clauses which must be atomically evaluated before preemption occurs.

Returns: A set of solution mappings, or None if none was produced during this call.

next_stage(mappings: Dict[str, str])

Propagate mappings to the bottom of the pipeline in order to compute nested loop joins

save() → iterators_pb2.SavedBagUnionIterator

Save and serialize the iterator as a Protobuf message

serialized_name() → str

Get the name of the iterator, as used in the plan serialization protocol

class sage.query_engine.iterators.union.RandomBagUnionIterator(left: sage.query_engine.iterators.preemptable_iterator.PreemptableIterator, right: sage.query_engine.iterators.preemptable_iterator.PreemptableIterator, context: dict)

Bases: sage.query_engine.iterators.union.BagUnionIterator

A RandomBagUnionIterator performs a SPARQL UNION with bag semantics in a pipeline of iterators.

This operator randomly reads from the left and right operands to produce solution mappings.

Args:
  • left: left operand of the union.

  • right: right operand of the union.

  • context: Information about the query execution.

has_next() → bool

Return True if the iterator has more item to yield

async next() → Optional[Dict[str, str]]

Get the next item from the iterator, following the iterator protocol.

This function may contains non interruptible clauses which must be atomically evaluated before preemption occurs.

Returns: A set of solution mappings, or None if none was produced during this call.

sage.query_engine.iterators.union.random() → x in the interval [0, 1).

sage.query_engine.iterators.utils module

class sage.query_engine.iterators.utils.ArrayIterator(array: List[Dict[str, str]])

Bases: object

An iterator that sequentially yields all items from a list.

Argument: List of solution mappings.

has_next() → bool

Return True if the iterator has more item to yield

async next() → Optional[Dict[str, str]]

Get the next item from the iterator, following the iterator protocol.

This function may contains non interruptible clauses which must be atomically evaluated before preemption occurs.

Returns: A set of solution mappings, or None if none was produced during this call.

class sage.query_engine.iterators.utils.EmptyIterator

Bases: object

An Iterator that yields nothing

has_next() → bool

Return True if the iterator has more item to yield

async next() → None

Get the next item from the iterator, following the iterator protocol.

This function may contains non interruptible clauses which must be atomically evaluated before preemption occurs.

Returns: A set of solution mappings, or None if none was produced during this call.

sage.query_engine.iterators.utils.find_in_mappings(variable: str, mappings: Dict[str, str] = {}) → str

Find a substitution for a SPARQL variable in a set of solution mappings.

Args:
  • variable: SPARQL variable to look for.

  • bindings: Set of solution mappings to search in.

Returns:

The value that can be substituted for this variable.

Example:
>>> mappings = { "?s": ":Ann", "?knows": ":Bob" }
>>> find_in_mappings("?s", mappings)
":Ann"
>>> find_in_mappings("?unknown", mappings)
"?unknown"
sage.query_engine.iterators.utils.selection(triple: Tuple[str, str, str], variables: List[str]) → Dict[str, str]

Apply a selection on a RDF triple, producing a set of solution mappings.

Args:
  • triple: RDF triple on which the selection is applied.

  • variables: Input variables of the selection.

Returns:

A set of solution mappings built from the selection results.

Example:
>>> triple = (":Ann", "foaf:knows", ":Bob")
>>> variables = ["?s", None, "?knows"]
>>> selection(triple, variables)
{ "?s": ":Ann", "?knows": ":Bob" }
sage.query_engine.iterators.utils.tuple_to_triple(s: str, p: str, o: str) → Dict[str, str]

Convert a tuple-based triple pattern into a dict-based triple pattern.

Args:
  • s: Subject of the triple pattern.

  • p: Predicate of the triple pattern.

  • o: Object of the triple pattern.

Returns:

The triple pattern as a dictionnary.

Example:
>>> tuple_to_triple("?s", "foaf:knows", ":Bob")
{ "subject": "?s", "predicate": "foaf:knows", "object": "Bob" }
sage.query_engine.iterators.utils.vars_positions(subject: str, predicate: str, obj: str) → List[str]

Find the positions of SPARQL variables in a triple pattern.

Args:
  • subject: Subject of the triple pattern.

  • predicate: Predicate of the triple pattern.

  • obj: Object of the triple pattern.

Returns:

The positions of SPARQL variables in the input triple pattern.

Example:
>>> vars_positions("?s", "http://xmlns.com/foaf/0.1/name", '"Ann"@en')
[ "?s", None, None ]
>>> vars_positions("?s", "http://xmlns.com/foaf/0.1/name", "?name")
[ "?s", None, "?name" ]

Module contents