metabograph.biopax package

Submodules

metabograph.biopax.graph_generator module

Create NetworkX graphs from BioPAX data.

class metabograph.biopax.graph_generator.BiopaxGraphGenerator(*, config: Config = None, bqm: BiopaxQueryManager = None)[source]

Bases: object

Create NetworkX graphs from BioPAX data.

__init__(*, config: Config = None, bqm: BiopaxQueryManager = None)[source]

Parameters:

config – An instance of Config. If not given, then the bqm parameter is required and the configuration from the BiopaxQueryManager instance will be used.
bqm – An instance of BiopaxQueryManager. If not given, one will be instantiated from the given Config instance.

filter_by_location(data: pandas.DataFrame, ent_cols: list[str]) → pandas.DataFrame[source]

Filter entities by their location.

Parameters:

data – The dataframe to filter.
ent_cols – The column names of the entities.

Returns:

The filtered dataframe.

filter_by_pathway(data: pandas.DataFrame, pathway_cols: list[str]) → pandas.DataFrame[source]

Filter interactions by pathway.

Parameters:

data – The dataframe to filter.
pathway_cols – The column names of the pathways.

Returns:

The filtered dataframe.

filter_complexes(complex_components: pandas.DataFrame, interaction_participants: pandas.DataFrame) → pandas.DataFrame[source]

Filter complex components to those containing the selected interaction participants. All components of each complex containing an interaction participant entity will be kept. This is used to indirectly filter by pathways.

Parameters:

complex_components – The complex components dataframe.
interaction_participants – The interaction participants dataframe.
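The rule described for filter_complexes can be sketched without pandas. This is a minimal illustration, not the library's implementation: the function name and the (complex, component) tuple layout are assumptions made for the example, since the dataframe column names are not documented here.

```python
def keep_complexes_with_participants(complex_components, participants):
    """Keep every row of any complex that contains a participant.

    Hypothetical stand-in for the DataFrame-based method: rows are
    (complex, component) tuples instead of dataframe rows.
    """
    participants = set(participants)
    # First pass: find the complexes that contain at least one participant.
    selected = {cplx for cplx, comp in complex_components if comp in participants}
    # Second pass: keep all rows of those complexes, not only the matching rows.
    return [(cplx, comp) for cplx, comp in complex_components if cplx in selected]


rows = [("C1", "A"), ("C1", "B"), ("C2", "D"), ("C2", "E")]
kept = keep_complexes_with_participants(rows, ["A"])
```

Here component B is retained even though only A is a participant, because both belong to the same complex.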

filter_controls(controls: pandas.DataFrame, interaction_participants: pandas.DataFrame) → pandas.DataFrame[source]

Filter controlled-controller relations to those for which the controlled entity is included in the interaction participants.

Parameters:

controls – The controls dataframe.
interaction_participants – The interaction participants dataframe.

filter_member_entities(member_entities: pandas.DataFrame, interaction_participants: pandas.DataFrame) → pandas.DataFrame[source]

Filter member_entities to those containing the selected interaction participants. All members with a transitive member relation to one of the interaction participants will be kept. This is used to indirectly filter by pathways.

Parameters:

member_entities – The member entities dataframe.
interaction_participants – The interaction participants dataframe.
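One way to read "transitive member relation" is as reachability over the member relation. The sketch below illustrates that reading with plain tuples. The function name, the (entity, member) row layout, and the choice to follow the relation in both directions are all assumptions for the example, not the library's documented behaviour.

```python
from collections import defaultdict, deque


def keep_transitive_members(member_rows, participants):
    """Keep (entity, member) rows connected to a participant.

    Hypothetical stand-in for the DataFrame-based method. The member
    relation is followed in both directions (an assumption).
    """
    neighbours = defaultdict(set)
    for entity, member in member_rows:
        neighbours[entity].add(member)
        neighbours[member].add(entity)

    # Breadth-first search outwards from the participants.
    reached = set(participants)
    queue = deque(reached)
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current] - reached:
            reached.add(nxt)
            queue.append(nxt)

    # Keep only the rows whose two ends were both reached.
    return [(e, m) for e, m in member_rows if e in reached and m in reached]


rows = [("Set1", "P1"), ("Set1", "P2"), ("Set2", "Set1"), ("Set3", "Q")]
kept = keep_transitive_members(rows, ["P1"])
```

In this example the row for Set3 is dropped because no chain of member relations links it to P1.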

get_custom_node_attributes(node: str) → dict[str, Any][source]

Get custom node attributes. Override in a subclass for custom user data.

Parameters: node – The target node.
Returns: A dict of custom node attributes to add to the given node.

get_graph() → networkx.Graph[source]: Get the graph for the current configuration.

get_node_identifier(node: str) → str[source]

Get a node identifier for the given node. This is sometimes required to ensure that the same entity is recognized as such when input data assigns different identifiers to the same entity. Subclass this class and override this method to handle custom identifiers.

Parameters: node – The node identifier, usually the entity string.
Returns: The possibly modified node identifier to use for nodes in the graph.
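The override pattern for the two hooks can be sketched as follows. To keep the example self-contained, GraphGeneratorStandIn is a hypothetical stand-in for BiopaxGraphGenerator. Its default behaviour (returning the node unchanged and an empty dict) is an assumption, as are the example.org prefix and the attribute name.

```python
class GraphGeneratorStandIn:
    """Hypothetical stand-in exposing only the two overridable hooks."""

    def get_node_identifier(self, node: str) -> str:
        # Assumed default: use the node string as-is.
        return node

    def get_custom_node_attributes(self, node: str) -> dict:
        # Assumed default: no extra attributes.
        return {}


class NormalizingGenerator(GraphGeneratorStandIn):
    """Map differently spelled identifiers of one entity to one node."""

    PREFIX = "http://example.org/"  # made-up prefix for illustration

    def get_node_identifier(self, node: str) -> str:
        # Drop the prefix if present, then normalize the case.
        if node.startswith(self.PREFIX):
            node = node[len(self.PREFIX):]
        return node.lower()

    def get_custom_node_attributes(self, node: str) -> dict:
        # Attach user data keyed by the normalized identifier.
        return {"normalized_id": self.get_node_identifier(node)}


gen = NormalizingGenerator()
same = gen.get_node_identifier("http://example.org/Protein1") == gen.get_node_identifier("PROTEIN1")
```

With the real class, the subclass would inherit from BiopaxGraphGenerator and be constructed with a Config or BiopaxQueryManager instance.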

property location_data[source]: The dataframe of location data.

property pathway_data[source]: The dataframe of pathway data.

class metabograph.biopax.graph_generator.Direction(*values)[source]

Bases: Enum

Directions for some participant relations.

LEFT_TO_RIGHT = 2[source]

REVERSIBLE = 1[source]

RIGHT_TO_LEFT = 3[source]

classmethod from_str(string: str) → Self[source]

Convert a string to an instance of Direction.

Parameters: string – The string to convert.
Returns: The corresponding Direction.
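A string-to-enum conversion of this kind might look like the sketch below. The member values match those listed for Direction, but DirectionSketch is a separate, hypothetical class. The accepted spellings (case-insensitive, hyphens or underscores) and the ValueError on unknown input are assumptions, since the strings accepted by the real from_str are not documented here.

```python
from enum import Enum


class DirectionSketch(Enum):
    """Hypothetical re-creation of Direction for illustration only."""

    REVERSIBLE = 1
    LEFT_TO_RIGHT = 2
    RIGHT_TO_LEFT = 3

    @classmethod
    def from_str(cls, string: str) -> "DirectionSketch":
        # Normalize e.g. "left-to-right" to "LEFT_TO_RIGHT" before lookup.
        name = string.strip().upper().replace("-", "_")
        try:
            return cls[name]
        except KeyError:
            raise ValueError(f"Unknown direction: {string!r}") from None


direction = DirectionSketch.from_str("LEFT-TO-RIGHT")
```

Looking members up by name keeps the conversion in sync with the enum definition, so adding a member does not require a separate mapping table.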

metabograph.biopax.protein module

Protein-specific methods for BiopaxQueryManager.

metabograph.biopax.protein.get_protein_entities(bqm: BiopaxQueryManager)[source]

Get the dataframe of protein entities.

Parameters: bqm – An instance of BiopaxQueryManager.
Returns: A Pandas DataFrame.

metabograph.biopax.protein.get_uniprot_id(row: dict[str, Any], bqm: BiopaxQueryManager) → str | None[source]

Get the UniProt ID for a row if it exists.

Parameters:

row – The row from the dataframe of entities.
bqm – An instance of BiopaxQueryManager.

Returns:

The UniProt ID, or None.

metabograph.biopax.protein.map_members_to_uniprot_ids(prot_ents: pandas.DataFrame, bqm: BiopaxQueryManager)[source]

Get a series with UniProt IDs of the members in the members column.

Parameters:

prot_ents – The protein entity dataframe.
bqm – An instance of BiopaxQueryManager.

Returns:

A Pandas series with the UniProt IDs.

metabograph.biopax.protein.map_uniprot_ids_to_gene_ids(prot_ents: pandas.DataFrame, bqm: BiopaxQueryManager)[source]

Map UniProt IDs to different gene IDs (GeneID, Ensembl).

Parameters:

prot_ents – The protein entity dataframe.
bqm – An instance of BiopaxQueryManager.

Returns:

The input dataframe with additional columns for the gene IDs. They will follow the format of the UniProt ID columns. For example, the “uniprot” column will map to a “geneid” column, and the “member_uniprot” column will map to a “member_geneid” column. Multiple UniProt IDs separated by the BiopaxQueryManager’s item delimiter will map to multiple gene IDs separated by the same delimiter. Missing values will map to empty strings.
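The delimiter handling described for the return value can be sketched for a single cell as follows. This is an illustration, not the library's code: the function name, the lookup dict, and the IDs in it are made up, and mapping an unknown ID to an empty string (to keep positions aligned) is an assumption. The delimiter value is the one documented as BiopaxQueryManager.ITEM_DELIMITER.

```python
ITEM_DELIMITER = ";;;"  # documented value of BiopaxQueryManager.ITEM_DELIMITER


def map_delimited_ids(value, mapping, delimiter=ITEM_DELIMITER):
    """Map each delimited UniProt ID in one cell to a gene ID.

    Hypothetical helper: `mapping` is a plain dict from UniProt ID to
    gene ID. Missing cells (None or empty) map to an empty string.
    """
    if not value:
        return ""
    # IDs without a known gene ID become empty items (an assumption).
    return delimiter.join(mapping.get(item, "") for item in value.split(delimiter))


lookup = {"P00001": "101", "P00002": "202"}  # made-up IDs for illustration
single = map_delimited_ids("P00001", lookup)
multiple = map_delimited_ids("P00001;;;P00002", lookup)
missing = map_delimited_ids(None, lookup)
```

Applying such a function to the “uniprot” and “member_uniprot” columns would yield the “geneid” and “member_geneid” columns in the same delimited format.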

metabograph.biopax.query_manager module

Download and query BioPAX data.

References: https://www.biopax.org/owldoc/Level3/

class metabograph.biopax.query_manager.BiopaxQueryManager(config: Config, debug: bool = False)[source]

Bases: object

BioPAX data manager.

FIELD_DELIMITER = ':::'[source]

ITEM_DELIMITER = ';;;'[source]

__init__(config: Config, debug: bool = False)[source]

property biopax_dir: Path[source]: The cache directory path for BioPAX data.

property endpoint: str[source]: The configured SPARQL endpoint. It may be None.

property entity_to_location_name_mapper: dict[str, str][source]: A dict mapping entities to their locations.

property entity_to_names_mappers: dict[str, str], dict[str, str][source]: Dicts mapping physical entities to names. The first maps entities to display names while the second maps entities to names. The names may be inconsistent due to the underlying data.

property entity_to_simplified_id: dict[str, str][source]: A dict mapping entities to simplified ID strings. This works by removing the common prefix from all entities. If there is no common prefix, the returned dict will be empty.
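The common-prefix simplification can be sketched with the standard library. The function name and the example.org entity strings are assumptions for the example. The real property may compute the prefix differently, for instance by cutting at a URI separator rather than character by character.

```python
import os


def simplify_ids(entities):
    """Map each entity to itself minus the prefix shared by all entities.

    Hypothetical helper: returns an empty dict when there is no common
    prefix, mirroring the behaviour described for the property.
    """
    entities = list(entities)
    # os.path.commonprefix compares character by character, not by path part.
    prefix = os.path.commonprefix(entities)
    if not prefix:
        return {}
    return {entity: entity[len(prefix):] for entity in entities}


simplified = simplify_ids([
    "http://example.org/Protein1",
    "http://example.org/Complex2",
])
no_prefix = simplify_ids(["abc", "xyz"])
```

Because the prefix is computed per character, entities that share leading characters beyond the namespace (for example Protein1 and Protein2) would be shortened further than the namespace alone.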

property entity_to_type_mapper: dict[str, str][source]: A dict mapping entities to their types.

extract_species_owl_file(dir_path: str | Path) → Path[source]

Extract a species OWL file to a directory. The data will be downloaded if necessary.

Parameters: dir_path – An output directory path.
Returns: The path to the extracted file.

get_owl_files(dir_path: str | Path) → list[Path][source]

Get the configured OWL files. Retrievable files may be downloaded or extracted to the given path. Existing files will simply return their paths, which may lie outside of the given directory.

Parameters: dir_path – The path to a directory.
Returns: The OWL file paths as pathlib.Path objects.

get_protein_entities()[source]

Get the dataframe of protein entities.

Returns: A Pandas DataFrame.

property level_3_owl_file: Path[source]: Get the path to the BioPAX level 3 OWL file. This may be a user-configured path. If not set, the biopax.org data will be used. The file will be downloaded if necessary.

list_locations() → list[str][source]: List all known locations by name.

list_pathways() → list[str][source]: List all known pathways by name.

list_species() → list[str][source]: List the available species.

query(query: str) → pandas.DataFrame[source]

Run a SPARQL query on the loaded graph.

Parameters: query – A SPARQL query string.

query_complex_components() → pandas.DataFrame[source]: Query all complex components.

query_controls() → pandas.DataFrame[source]: Query all controlled-controller relations.

query_conversion_directions() → pandas.DataFrame[source]: Query all conversion directions.

query_entities() → pandas.DataFrame[source]: Query all entities and their types.

query_interaction_participants() → pandas.DataFrame[source]: Query all interaction-participant-entity relations.

query_interactions() → pandas.DataFrame[source]: Query all interactions.

query_locations() → pandas.DataFrame[source]: Query all locations. Each entity is associated with at most one location.

query_member_entities() → pandas.DataFrame[source]: Query all member entities.

query_participants() → pandas.DataFrame[source]: Query all participant types.

query_pathways() → pandas.DataFrame[source]: Query all pathways and their components.

query_physical_entities() → pandas.DataFrame[source]: Query all physical entities and their references.

property reactome_archive_path: Path[source]: The path to the zipped Reactome BioPAX archive with the species data. The file will be downloaded if missing.

property species: str[source]: The configured species.

metabograph.biopax.query_manager.owl_file(species: str) → str[source]: Get the name of the OWL file for the given species.

Module contents

Package stub.