sage.database.hdt package

Submodules

sage.database.hdt.connector module

class sage.database.hdt.connector.HDTFileConnector(file: str, mapped=True, indexed=True)

Bases: sage.database.db_connector.DatabaseConnector

An HDTFileConnector searches for RDF triples in an HDT file.

Args:
  • file: Path to the HDT file.

  • mapped: If True, memory-map the HDT file from disk (faster); if False, load the whole file into memory.

  • indexed: True if the HDT must be loaded with indexes, False otherwise.

from_config()

Build an HDTFileConnector from a configuration object.

Args:
  • config: Configuration object. Must contain the 'file' field.

Example:
>>> config = { "file": "./dbpedia.hdt" }
>>> connector = HDTFileConnector.from_config(config)
>>> print(f"The HDT file contains {connector.nb_triples} RDF triples")

property nb_objects

Get the number of objects in the database.

property nb_predicates

Get the number of predicates in the database.

property nb_subjects

Get the number of subjects in the database.

property nb_triples

Get the number of RDF triples in the database.

search(subject: str, predicate: str, obj: str, last_read: Optional[str] = None, as_of: Optional[datetime.datetime] = None) → Tuple[sage.database.hdt.iterator.HDTIterator, int]

Get an iterator over all RDF triples matching a triple pattern.

Args:
  • subject: Subject of the triple pattern.

  • predicate: Predicate of the triple pattern.

  • obj: Object of the triple pattern.

  • last_read: An RDF triple ID. When set, the search resumes from this RDF triple.

  • as_of: A version timestamp. When set, perform all reads against a consistent snapshot represented by this timestamp.

Returns:

A tuple (iterator, cardinality), where iterator is a Python iterator over RDF triples matching the given triple pattern, and cardinality is the estimated cardinality of the triple pattern.
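
The calling pattern described above (unpack the tuple, read some triples, keep the last_read ID, then call search again to resume) can be sketched without an HDT file. StubConnector and StubIterator below are hypothetical in-memory stand-ins, not part of sage. The sketch assumes that terms starting with "?" act as variables and that the last_read ID is an offset encoded as a string; neither is confirmed by this page.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


class StubIterator:
    """Hypothetical stand-in exposing has_next(), next() and last_read()."""

    def __init__(self, triples: List[Triple], start_offset: int = 0):
        self._triples = triples
        self._position = start_offset

    def has_next(self) -> bool:
        return self._position < len(self._triples)

    def next(self) -> Optional[Triple]:
        if not self.has_next():
            return None
        triple = self._triples[self._position]
        self._position += 1
        return triple

    def last_read(self) -> str:
        # Assumption: the resume ID is the current offset, as a string.
        return str(self._position)


class StubConnector:
    """Hypothetical in-memory connector mimicking the search() signature."""

    def __init__(self, triples: List[Triple]):
        self._triples = list(triples)

    def search(self, subject: str, predicate: str, obj: str,
               last_read: Optional[str] = None) -> Tuple[StubIterator, int]:
        pattern = (subject, predicate, obj)
        # Assumption: a term starting with "?" is a variable and matches anything.
        matches = [
            triple for triple in self._triples
            if all(term.startswith("?") or term == value
                   for term, value in zip(pattern, triple))
        ]
        offset = int(last_read) if last_read else 0
        return StubIterator(matches, offset), len(matches)


connector = StubConnector([
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", '"Alice"'),
])

# First call: read two triples, then remember where the scan stopped.
iterator, cardinality = connector.search("ex:alice", "?p", "?o")
first_page = [iterator.next() for _ in range(2)]
checkpoint = iterator.last_read()

# Second call: resume the same pattern from the saved ID.
resumed, _ = connector.search("ex:alice", "?p", "?o", last_read=checkpoint)
rest = []
while resumed.has_next():
    rest.append(resumed.next())
```

With a real HDTFileConnector the cardinality is an estimate, so it is safer to use it for planning than as an exact loop bound.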

sage.database.hdt.iterator module

class sage.database.hdt.iterator.HDTIterator(source: hdt.TripleIterator, pattern: Dict[str, str], start_offset=0)

Bases: sage.database.db_iterator.DBIterator

An HDTIterator implements a DBIterator for scanning RDF triples in an HDT file.

Args:
  • source: HDT iterator which scans for RDF triples from an HDT file.

  • pattern: Triple pattern scanned.

  • start_offset: Initial offset of the source iterator. Used to compute the last_read triple when preemption occurs.

has_next() → bool

Return True if there are still results to read, and False otherwise.

last_read() → str

Return the ID of the last element read.

next() → Tuple[str, str, str]

Return the next RDF triple as a (subject, predicate, object) tuple, or None if there are no more triples.
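
To illustrate how start_offset and last_read() could relate, here is a minimal sketch that wraps a plain Python iterator and uses a one-element lookahead to answer has_next(). OffsetTrackingIterator is a hypothetical class written for this example, not the HDTIterator implementation. It assumes the ID returned by last_read() is start_offset plus the number of triples read so far, encoded as a string.

```python
from typing import Dict, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]
_END = object()  # sentinel marking an exhausted source


class OffsetTrackingIterator:
    """Hypothetical iterator with the has_next / next / last_read interface."""

    def __init__(self, source: Iterator[Triple], pattern: Dict[str, str],
                 start_offset: int = 0):
        self._source = source
        self._pattern = pattern
        self._start_offset = start_offset
        self._nb_reads = 0
        # Fetch one element ahead so has_next() never consumes a result.
        self._buffer = next(source, _END)

    def has_next(self) -> bool:
        return self._buffer is not _END

    def next(self) -> Optional[Triple]:
        if not self.has_next():
            return None
        triple = self._buffer
        self._buffer = next(self._source, _END)
        self._nb_reads += 1
        return triple

    def last_read(self) -> str:
        # Assumption: the ID is an absolute offset into the full scan.
        return str(self._start_offset + self._nb_reads)


triples = [("s1", "p", "o1"), ("s2", "p", "o2"), ("s3", "p", "o3")]
pattern = {"subject": "?s", "predicate": "p", "object": "?o"}

# Simulate a scan resumed after one triple had already been read.
iterator = OffsetTrackingIterator(iter(triples[1:]), pattern, start_offset=1)
first = iterator.next()
resume_id = iterator.last_read()

remaining = []
while iterator.has_next():
    remaining.append(iterator.next())
```

Because the offset is absolute, a resumed scan reports the same ID it would have reported had it never been interrupted.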

Module contents