metabograph package

Submodules

metabograph.cache module

Cached data management.

class metabograph.cache.CacheManager(config: Config)[source]

Bases: object

Manage cached data.

__init__(config: Config)[source]

property cache_dir: Path[source]

The cache directory path.

Returns:: The path to this application’s cache directory.

clear()[source]: Clear all cached data.

get_path(subpath: str | Path, directory: bool = False) → Path[source]

Get a path to a subpath in the cache directory. The parent directory will be created if missing.

Parameters:

subpath – A subpath to interpret relative to the cache directory.
directory – If True, the subpath will be created as a directory.

Returns:

A path within the cache directory.
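The directory-creation behavior described above can be sketched with pathlib. This is a minimal illustration under stated assumptions, not the CacheManager implementation: get_cache_path and its explicit cache_dir argument are hypothetical names introduced here.

```python
import tempfile
from pathlib import Path
from typing import Union


def get_cache_path(cache_dir: Path, subpath: Union[str, Path], directory: bool = False) -> Path:
    """Hypothetical stand-in for a get_path-style helper (not the library code)."""
    path = Path(cache_dir) / subpath
    # Create the path itself when a directory is requested, otherwise only its parent.
    target = path if directory else path.parent
    target.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    file_path = get_cache_path(Path(tmp), "queries/result.csv")
    dir_path = get_cache_path(Path(tmp), "owl", directory=True)
    parent_created = file_path.parent.is_dir()  # the parent now exists
    file_created = file_path.exists()           # the file itself is not created
    dir_created = dir_path.is_dir()             # the subpath exists as a directory
```

In this sketch the caller remains responsible for writing the file. Only the directories are created.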

metabograph.common module

Common global variables and functions.

metabograph.common.download(url: str, path: str | Path, append_name: bool = False, timeout=10, force=False)[source]

Download a URL to a path.

Parameters:

url – The URL to download.
path – The output path.
append_name – If True, append the URL’s filename to the path.
timeout – The timeout for remote requests, in seconds.
force – By default, existing local files will only be overwritten if the remote server reports a newer modification time (or no modification time at all). Set this option to True to force a download without checking for remote modification times.

Returns:

The output path.
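The overwrite rule described for force can be illustrated as a small decision function. This sketch makes no network requests and is not the library's code: should_download and its remote_mtime argument (a POSIX timestamp, or None when the server reports no modification time) are hypothetical names.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def should_download(path: Path, remote_mtime: Optional[float], force: bool = False) -> bool:
    """Hypothetical decision helper; remote_mtime is a POSIX timestamp or None."""
    if force or not path.exists():
        return True
    if remote_mtime is None:
        # No modification time reported by the server: download again.
        return True
    # Overwrite only when the remote copy is newer than the local file.
    return remote_mtime > path.stat().st_mtime


with tempfile.TemporaryDirectory() as tmp:
    local = Path(tmp) / "data.owl"
    local.write_text("cached")
    os.utime(local, (1000, 1000))  # pretend the local file was modified at t=1000
    decisions = {
        "remote older": should_download(local, 500),
        "remote newer": should_download(local, 2000),
        "no remote mtime": should_download(local, None),
        "forced": should_download(local, 500, force=True),
    }
```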

metabograph.config module

Graph configuration.

class metabograph.config.BiopaxData(custom_owl_files: list[~pathlib.Path] = <factory>, endpoint: str | None = None, include_complexes: bool = True, include_default_owl_files: bool = True, include_member_entities: bool = False, keep_unknown: bool = False, locations: list[str] | str | None = None, pathways: list[str] | str | None = None, species: str = 'homo sapiens')[source]

Bases: object

BioPAX data configuration.

Parameters:

endpoint – An optional URL to a SPARQL endpoint through which to query BioPAX data, such as a local Fuseki server.
include_default_owl_files – If True, include the default BioPAX files for the configured species. These files will be downloaded if necessary.
custom_owl_files – A list of paths to custom OWL files in the BioPAX level-3 format to use, either with or without the default files depending on the value of include_default_owl_files.
include_complexes – If True, include complexes and their components.
include_member_entities – If True, include member entities (as defined by BioPAX).
keep_unknown – If True, items for which a pathway or location is unknown will be kept when filtering by pathway and/or location.
locations – Either a list of BioPAX entity locations, or a path to a plaintext file with one location per line. See metabograph --list-locations for the complete list.
pathways – Either a list of BioPAX pathways, or a path to a plaintext file with one pathway per line. See metabograph --list-pathways for the complete list.
species – The target species. It must be one supported by BioPAX. See metabograph --list-species for the complete list.
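The locations and pathways parameters accept either a list or a path to a plaintext file. One way such a value could be normalized is sketched below. load_names is a hypothetical helper, the location names are illustrative placeholders, and skipping blank lines is an assumption rather than documented behavior.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Union


def load_names(value: Union[List[str], str, Path, None]) -> Optional[List[str]]:
    """Hypothetical helper: return a list of names from a list or a plaintext file."""
    if value is None:
        return None
    if isinstance(value, (str, Path)):
        # A string or Path is interpreted as a file with one name per line.
        lines = Path(value).read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    return list(value)


with tempfile.TemporaryDirectory() as tmp:
    listing = Path(tmp) / "locations.txt"
    listing.write_text("cytosol\nnucleoplasm\n\n", encoding="utf-8")
    from_file = load_names(str(listing))
    from_list = load_names(["cytosol", "nucleoplasm"])
```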

__init__(custom_owl_files: list[~pathlib.Path] = <factory>, endpoint: str | None = None, include_complexes: bool = True, include_default_owl_files: bool = True, include_member_entities: bool = False, keep_unknown: bool = False, locations: list[str] | str | None = None, pathways: list[str] | str | None = None, species: str = 'homo sapiens') → None[source]

custom_owl_files: list[Path][source]

endpoint: str | None = None[source]

include_complexes: bool = True[source]

include_default_owl_files: bool = True[source]

include_member_entities: bool = False[source]

keep_unknown: bool = False[source]

locations: list[str] | str | None = None[source]

pathways: list[str] | str | None = None[source]

species: str = 'homo sapiens'[source]

class metabograph.config.CacheData(path: Path | None = None, timeout: int | None = None)[source]

Bases: object

Cache data configuration.

Parameters:

path – The path to a cache directory. If unset, the standard XDG user cache directory will be used.
timeout – The timeout for the cached data. Data will be cleared from the cache after this timeout. If unset, cached data will not automatically time out.

__init__(path: Path | None = None, timeout: int | None = None) → None[source]

path: Path | None = None[source]

timeout: int | None = None[source]

class metabograph.config.Config(path: str | Path = None, data: dict = None)[source]

Bases: object

Graph configuration.

__init__(path: str | Path = None, data: dict = None)[source]

Parameters:

path – The path to a YAML configuration file that should be loaded.
data – A dict of keyword arguments for instantiating ConfigData.

asdict()[source]: Return the dict representing the current ConfigData object.

get_documented_yaml()[source]: Return the current object as a commented YAML document.

load(path: str | Path)[source]: Load a configuration file.

resolve_path(path: str | Path) → Path[source]

Resolve a path relative to the configuration file’s path if the path is set.

Parameters:: path – The path to resolve.
Returns:: The resolved path.
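A minimal sketch of this kind of resolution follows. It assumes that relative paths are interpreted against the directory containing the configuration file and that absolute paths are returned unchanged. resolve_relative_to_config is a hypothetical function, not the method itself.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_relative_to_config(
    path: Union[str, Path],
    config_path: Optional[Union[str, Path]] = None,
) -> Path:
    """Hypothetical sketch: resolve a path against a configuration file's directory."""
    path = Path(path)
    if config_path is None or path.is_absolute():
        # Nothing to resolve against, or the path is already absolute.
        return path
    return Path(config_path).parent / path


resolved = resolve_relative_to_config("data/custom.owl", "project/config.yaml")
unchanged = resolve_relative_to_config("data/custom.owl")
```

Resolving against the configuration file rather than the working directory keeps entries such as custom_owl_files valid regardless of where the program is launched.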

class metabograph.config.ConfigData(biopax: ~metabograph.config.BiopaxData = <factory>, cache: ~metabograph.config.CacheData = <factory>)[source]

Bases: object

Main configuration.

Parameters:

cache – Cache configuration.
biopax – BioPAX configuration.

__init__(biopax: ~metabograph.config.BiopaxData = <factory>, cache: ~metabograph.config.CacheData = <factory>) → None[source]

biopax: BiopaxData[source]

cache: CacheData[source]

metabograph.exception module

Exceptions.

exception metabograph.exception.MetabographConfigError[source]

Bases: MetabographException

Metabograph configuration error.

exception metabograph.exception.MetabographException[source]

Bases: Exception

Base class for custom exceptions raised by this package.

exception metabograph.exception.MetabographIOError[source]

Bases: MetabographException

Metabograph IO error.

exception metabograph.exception.MetabographRuntimeError[source]

Bases: MetabographException

Metabograph Runtime error.
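Because every exception above derives from MetabographException, a single handler can cover all of them. The sketch below defines local stand-in classes with the same names so that it runs without the package installed. run_and_report and failing_step are hypothetical.

```python
class MetabographException(Exception):
    """Local stand-in for the package's base exception."""


class MetabographConfigError(MetabographException):
    """Local stand-in for the configuration error."""


class MetabographIOError(MetabographException):
    """Local stand-in for the IO error."""


def run_and_report(func) -> str:
    """Run func and turn any Metabograph-style error into a message."""
    try:
        func()
    except MetabographException as err:
        # One handler covers every subclass of the base exception.
        return f"{type(err).__name__}: {err}"
    return "ok"


def failing_step():
    raise MetabographConfigError("unsupported species")


message = run_and_report(failing_step)
```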

metabograph.fuseki module

Run the Fuseki server.

metabograph.fuseki.main(args=None)[source]: Run the Fuseki server.

metabograph.fuseki.run_fuseki_server(config)[source]: Context manager to launch a Fuseki server.

metabograph.fuseki.run_main(args=None)[source]: Wrapper around main with error handling.

metabograph.logging module

Configure logging.

metabograph.logging.configure_logging(level=20)[source]

Configure logging.

Parameters:: level – Logging level.
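The default level of 20 corresponds to logging.INFO in the Python standard library. The snippet below uses only the standard logging module to show what that level means. It does not call configure_logging, and the logger name is arbitrary.

```python
import logging

# 20 is the numeric value of logging.INFO in the standard library.
default_level = 20
level_name = logging.getLevelName(default_level)

logging.basicConfig(level=default_level)
logger = logging.getLogger("example")
logger.info("Emitted when the effective level is INFO (20) or lower.")
logger.debug("Suppressed unless the level is lowered to logging.DEBUG (10).")
```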

metabograph.main module

Metabograph command-line application.

metabograph.main.main(args=None)[source]: Main function.

metabograph.main.run_main(args=None)[source]: Wrapper around main with error handling.

metabograph.owl module

Run SPARQL queries on OWL files.

class metabograph.owl.OwlLoader(cache_man, name, paths)[source]

Bases: object

Load OWL files and run SPARQL queries on them.

__init__(cache_man, name, paths)[source]

Parameters:

cache_man – A CacheManager instance.
name – A name for this loader. It will be used as a database name by the owlready backend to speed up queries after the initial loading.
paths – Paths to OWL files to load.

load()[source]: Load the configured OWL files. Normally the OWL files are loaded lazily on demand but this function can be used to force loading of files from a temporary context.

property owlready2_world: None[source]: owlready2 ontology.

query(query: str) → pandas.DataFrame[source]

Run a SPARQL query on the loaded ontology.

Parameters:: query – The SPARQL query to run.
Returns:: A Pandas dataframe with the results.

property rdflib_graph: rdflib.Graph[source]: Run a query using rdflib.

metabograph.sparql module

SPARQL query functions.

metabograph.sparql.canonicalize(query)[source]

Canonicalize a SPARQL query. This uses rdflib’s SPARQL parsing and algebra translation functions. The returned query string may be longer than the input string, so this function should not be used to shorten queries.

Parameters:: query – The input SPARQL query string.
Returns:: The canonicalized query string.

metabograph.sparql.hash_query(query)[source]

Get a hash value for a SPARQL query. This is used to cache query results.

Parameters:: query – The input query.
Returns:: A hash string.
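The idea of using a query hash as a cache key can be sketched with hashlib. This is not the library's implementation. Whitespace collapsing is a crude stand-in for the rdflib-based canonicalization, normalizing before hashing is an assumption, and both function names are hypothetical.

```python
import hashlib


def normalize_whitespace(query: str) -> str:
    """Crude stand-in for canonicalization: collapse all runs of whitespace."""
    return " ".join(query.split())


def hash_query_sketch(query: str) -> str:
    """Hash the normalized query text so equivalent layouts share a cache key."""
    return hashlib.sha256(normalize_whitespace(query).encode("utf-8")).hexdigest()


compact = "SELECT ?s WHERE { ?s ?p ?o }"
spread_out = """
SELECT ?s
WHERE {
    ?s ?p ?o
}
"""
same_key = hash_query_sketch(compact) == hash_query_sketch(spread_out)
```

Unlike real canonicalization, this stand-in would also alter whitespace inside string literals, so it is suitable only as an illustration.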

metabograph.utils module

Utility functions.

metabograph.utils.dict_from_dataframe_columns(dataframe: pandas.DataFrame, key_col: str, val_col: str)[source]

Get a dict mapping keys in one column of a dataframe to values in another column.

Parameters:

dataframe – The input dataframe.
key_col – The name of the column with the keys.
val_col – The name of the column with the corresponding values.

Returns:

A Python dict.

metabograph.utils.get_common_prefix(items)[source]

Get the common prefix of an iterable of strings.

Parameters:: items – The items to parse.
Returns:: The common string prefix.
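A comparable result can be obtained with the standard library's os.path.commonprefix, which compares strings character by character. The wrapper below is a sketch, not the function's actual implementation. common_prefix_sketch is a hypothetical name.

```python
import os


def common_prefix_sketch(items) -> str:
    """Return the longest prefix shared by all strings in items."""
    # os.path.commonprefix expects a sequence, so materialize any iterable first.
    return os.path.commonprefix(list(items))


prefix = common_prefix_sketch(["metabolite", "metabolism", "metabograph"])
```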

metabograph.utils.hash_data(data: Any, algorithm: str = 'sha256')[source]

Hash data.

Parameters:

data – The data to hash. If not a byte string then it will be converted to one.
algorithm – The hashing algorithm to use.

Returns:

The hexdigest of the data and the algorithm used.
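A minimal sketch of hashing arbitrary data with hashlib follows. How the library converts non-byte data is not documented here, so the str().encode() conversion is an assumption, and this sketch returns only the hexdigest. hash_data_sketch is a hypothetical name.

```python
import hashlib


def hash_data_sketch(data, algorithm: str = "sha256") -> str:
    """Hash data with the named algorithm and return the hexdigest."""
    if not isinstance(data, bytes):
        # Assumed conversion: use the UTF-8 encoding of the string representation.
        data = str(data).encode("utf-8")
    return hashlib.new(algorithm, data).hexdigest()


from_bytes = hash_data_sketch(b"abc")
from_text = hash_data_sketch("abc")
md5_digest = hash_data_sketch(b"abc", algorithm="md5")
```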

metabograph.utils.hash_file(path: str | Path, algorithm: str = 'sha256')[source]

Hash a file.

Parameters:

path – The path to the file.
algorithm – The hashing algorithm to use.

Returns:

The hexdigest of the file and the algorithm used.
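Files are commonly hashed in fixed-size chunks so that large files need not be read into memory at once. The sketch below shows that approach with hashlib. It is not taken from the library, and hash_file_sketch and chunk_size are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_file_sketch(path, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hash a file incrementally and return the hexdigest."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals the end of the file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    content = b"metabograph" * 10000
    sample.write_bytes(content)
    digest = hash_file_sketch(sample)
    expected = hashlib.sha256(content).hexdigest()
```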

metabograph.utils.is_older_or_missing(dst: str | Path, src: str | Path)[source]

Check if the destination path is older than the source path or missing.

Parameters:

dst – The destination path.
src – The source path.

Returns:

True if the destination path is older or missing, else False.
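A check of this kind is typically based on file modification times. The sketch below assumes st_mtime is the basis for comparison, which the description above does not state. needs_rebuild is a hypothetical name, and the file names are placeholders.

```python
import os
import tempfile
from pathlib import Path


def needs_rebuild(dst, src) -> bool:
    """Return True if dst is missing or was modified before src."""
    dst, src = Path(dst), Path(src)
    if not dst.exists():
        return True
    return dst.stat().st_mtime < src.stat().st_mtime


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source.owl"
    dst = Path(tmp) / "derived.db"
    src.write_text("source")
    missing = needs_rebuild(dst, src)  # dst does not exist yet

    dst.write_text("derived")
    os.utime(src, (2000, 2000))
    os.utime(dst, (1000, 1000))
    stale = needs_rebuild(dst, src)    # dst is older than src

    os.utime(dst, (3000, 3000))
    fresh = needs_rebuild(dst, src)    # dst is newer than src
```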

Module contents

Package stub.