metabograph.biopax package
Submodules
metabograph.biopax.graph_generator module
Create NetworkX graphs from BioPAX data.
- class metabograph.biopax.graph_generator.BiopaxGraphGenerator(*, config: Config = None, bqm: BiopaxQueryManager = None)
Bases: object
Create NetworkX graphs from BioPAX data.
- __init__(*, config: Config = None, bqm: BiopaxQueryManager = None)
- Parameters:
config – An instance of Config. If not given, then the bqm parameter is required and the configuration from the BiopaxQueryManager instance will be used.
bqm – An instance of BiopaxQueryManager. If not given, one will be instantiated from the given Config instance.
- filter_by_location(data: pandas.DataFrame, ent_cols: list[str]) -> pandas.DataFrame
Filter entities by their location.
- Parameters:
data – The dataframe to filter.
ent_cols – The column names of the entities.
- Returns:
The filtered dataframe.
- filter_by_pathway(data: pandas.DataFrame, pathway_cols: list[str]) -> pandas.DataFrame
Filter interactions by pathway.
- Parameters:
data – The dataframe to filter.
pathway_cols – The column names of the pathways.
- Returns:
The filtered dataframe.
- filter_complexes(complex_components: pandas.DataFrame, interaction_participants: pandas.DataFrame) -> pandas.DataFrame
Filter complex components to those containing the selected interaction participants. All components of each complex containing an interaction participant entity will be kept. This is used to indirectly filter by pathways.
- Parameters:
complex_components – The complex components dataframe.
interaction_participants – The interaction participants dataframe.
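The complex filter described above can be sketched with plain pandas operations. This is only an illustration of the idea, not the library's implementation: the function name and the column names "complex", "component" and "participant" are assumptions made for the example.

```python
import pandas as pd


def keep_complexes_with_participants(complex_components, interaction_participants):
    """Keep every row of each complex that has at least one selected component."""
    # Hypothetical column names, chosen for this sketch only.
    selected = set(interaction_participants["participant"])
    # Rows whose component is one of the selected participants.
    hit = complex_components["component"].isin(selected)
    # Complexes that contain at least one such component.
    complexes = set(complex_components.loc[hit, "complex"])
    # Keep all components of those complexes, not just the matching rows.
    return complex_components[complex_components["complex"].isin(complexes)]


complex_components = pd.DataFrame(
    {
        "complex": ["C1", "C1", "C2", "C2"],
        "component": ["A", "B", "D", "E"],
    }
)
interaction_participants = pd.DataFrame({"participant": ["A", "X"]})
kept = keep_complexes_with_participants(complex_components, interaction_participants)
```

Component "B" survives even though it is not a participant itself, because it shares complex "C1" with participant "A".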
- filter_controls(controls: pandas.DataFrame, interaction_participants: pandas.DataFrame) -> pandas.DataFrame
Filter controlled-controller relations to those for which the controlled entity is included in the interaction participants.
- Parameters:
controls – The controls dataframe.
interaction_participants – The interaction participants dataframe.
- filter_member_entities(member_entities: pandas.DataFrame, interaction_participants: pandas.DataFrame) -> pandas.DataFrame
Filter member_entities to those containing the selected interaction participants. All members with a transitive member relation to one of the interaction participants will be kept. This is used to indirectly filter by pathways.
- Parameters:
member_entities – The member entities dataframe.
interaction_participants – The interaction participants dataframe.
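A transitive member relation can be followed with a breadth-first search. The sketch below uses plain (entity, member) tuples instead of a dataframe and assumes the relation is followed from an entity down to its members; both the data layout and the direction are assumptions of this example, not documented behavior.

```python
from collections import deque


def transitive_member_rows(rows, participants):
    """Keep (entity, member) rows reachable from the given participants."""
    # Index the direct members of each entity.
    members_of = {}
    for entity, member in rows:
        members_of.setdefault(entity, []).append(member)

    # Breadth-first search from the participants along member relations.
    reached = set(participants)
    queue = deque(participants)
    while queue:
        entity = queue.popleft()
        for member in members_of.get(entity, []):
            if member not in reached:
                reached.add(member)
                queue.append(member)

    # A row is kept when its entity was reached by the search.
    return [(entity, member) for entity, member in rows if entity in reached]


rows = [
    ("Family", "P1"),
    ("Family", "SubFamily"),
    ("SubFamily", "P2"),
    ("Other", "P3"),
]
kept = transitive_member_rows(rows, ["Family"])
```

The row for "SubFamily" is kept because it is a member of "Family", while "Other" has no relation to any participant and is dropped.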
- get_custom_node_attributes(node: str) -> dict[str, Any]
Get custom node attributes. Override in a subclass for custom user data.
- Parameters:
node – The target node.
- Returns:
A dict of custom node attributes to add to the given node.
- get_node_identifier(node: str) -> str
Get a node identifier for the given node. This is sometimes required to ensure that the same entity is recognized as such when input data assigns different identifiers to the same entity. Subclass this class and override this method to handle custom identifiers.
- Parameters:
node – The node identifier, usually the entity string.
- Returns:
The possibly modified node identifier to use for nodes in the graph.
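The two hook methods above are meant to be overridden in a subclass. The sketch below shows the pattern with a stand-in base class so that it runs without metabograph installed; in real code the subclass would derive from BiopaxGraphGenerator instead. The stand-in class, its default return values, the alias table and the example.org identifiers are all hypothetical.

```python
from typing import Any


class GraphGeneratorStandIn:
    """Stand-in exposing only the two documented hook methods."""

    def get_node_identifier(self, node: str) -> str:
        # Assumed default for this sketch: use the identifier unchanged.
        return node

    def get_custom_node_attributes(self, node: str) -> dict[str, Any]:
        # Assumed default for this sketch: no custom attributes.
        return {}


class AliasAwareGenerator(GraphGeneratorStandIn):
    """Map known alias identifiers onto one canonical identifier."""

    # Hypothetical alias table: two identifiers for the same entity.
    ALIASES = {
        "http://example.org/b#Protein1": "http://example.org/a#Protein1",
    }

    def get_node_identifier(self, node: str) -> str:
        return self.ALIASES.get(node, node)

    def get_custom_node_attributes(self, node: str) -> dict[str, Any]:
        return {"is_alias_target": node in self.ALIASES.values()}


generator = AliasAwareGenerator()
canonical = generator.get_node_identifier("http://example.org/b#Protein1")
attributes = generator.get_custom_node_attributes(canonical)
```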
metabograph.biopax.protein module
Protein-specific methods for BiopaxQueryManager.
- metabograph.biopax.protein.get_protein_entities(bqm: BiopaxQueryManager)
Get the dataframe of protein entities.
- Parameters:
bqm – An instance of BiopaxQueryManager.
- Returns:
A Pandas DataFrame.
- metabograph.biopax.protein.get_uniprot_id(row: dict[str, Any], bqm: BiopaxQueryManager) -> str | None
Get the UniProt ID for a row if it exists.
- Parameters:
row – The row from the dataframe of entities.
bqm – An instance of BiopaxQueryManager.
- Returns:
The UniProt ID, or None.
- metabograph.biopax.protein.map_members_to_uniprot_ids(prot_ents: pandas.DataFrame, bqm: BiopaxQueryManager)
Get a series with UniProt IDs of the members in the members column.
- Parameters:
prot_ents – The protein entity dataframe.
bqm – An instance of BiopaxQueryManager.
- Returns:
A Pandas series with the UniProt IDs.
- metabograph.biopax.protein.map_uniprot_ids_to_gene_ids(prot_ents: pandas.DataFrame, bqm: BiopaxQueryManager)
Map UniProt IDs to different gene IDs (GeneID, Ensembl).
- Parameters:
prot_ents – The protein entity dataframe.
bqm – An instance of BiopaxQueryManager.
- Returns:
The input dataframe with additional columns for the gene IDs. They will follow the format of the UniProt ID columns. For example, the “uniprot” column will map to a “geneid” column, and the “member_uniprot” column will map to a “member_geneid” column. Multiple UniProt IDs separated by the BiopaxQueryManager’s item delimiter will map to multiple gene IDs separated by the same delimiter. Missing values will map to empty strings.
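The delimiter handling described above can be illustrated on a single cell value. This is a minimal sketch: the function name, the ";" delimiter and the placeholder IDs are assumptions, and mapping an unknown ID inside a list to an empty string is a choice made for the example rather than documented behavior.

```python
def map_delimited_ids(value, mapping, delimiter=";"):
    """Map each delimited ID in value through mapping, keeping the delimiter."""
    # A missing value maps to an empty string.
    if not value:
        return ""
    # Unknown IDs become empty items so that positions stay aligned (assumption).
    return delimiter.join(mapping.get(item, "") for item in value.split(delimiter))


# Placeholder IDs, not real UniProt or GeneID accessions.
uniprot_to_geneid = {"UP0001": "1001", "UP0002": "1002"}

single = map_delimited_ids("UP0001", uniprot_to_geneid)
multiple = map_delimited_ids("UP0001;UP0002", uniprot_to_geneid)
missing = map_delimited_ids("", uniprot_to_geneid)
```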
metabograph.biopax.query_manager module
Download and query BioPAX data.
References:
- https://www.biopax.org/owldoc/Level3/
- class metabograph.biopax.query_manager.BiopaxQueryManager(config: Config, debug: bool = False)
Bases: object
BioPAX data manager.
- property entity_to_location_name_mapper: dict[str, str]
A dict mapping entities to their locations.
- property entity_to_names_mappers: dict[str, str], dict[str, str]
Dicts mapping physical entities to names. The first maps entities to display names while the second maps entities to names. The names may be inconsistent due to the underlying data.
- property entity_to_simplified_id: dict[str, str]
A dict mapping entities to simplified ID strings. This works by removing the common prefix from all entities. If there is no common prefix, the returned dict will be empty.
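Removing a common prefix can be sketched with the standard library. The function name and the example.org identifiers below are hypothetical, and the character-wise prefix computed by os.path.commonprefix is one possible reading of "common prefix", not necessarily the one the library uses.

```python
import os.path


def simplify_ids(entities):
    """Map each entity to itself minus the prefix shared by all entities."""
    entities = list(entities)
    # commonprefix compares the strings character by character.
    prefix = os.path.commonprefix(entities)
    # Without a shared prefix there is nothing to simplify.
    if not prefix:
        return {}
    return {entity: entity[len(prefix):] for entity in entities}


simplified = simplify_ids(
    [
        "http://example.org/data#Protein1",
        "http://example.org/data#Complex2",
    ]
)
no_prefix = simplify_ids(["abc", "xyz"])
```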
- extract_species_owl_file(dir_path: str | Path) -> Path
Extract a species OWL file to a directory. The data will be downloaded if necessary.
- Parameters:
dir_path – An output directory path.
- Returns:
The path to the extracted file.
- get_owl_files(dir_path: str | Path) -> list[Path]
Get the configured OWL files. Retrievable files may be downloaded or extracted to the given path. For files that already exist, their current paths are returned, and these may lie outside the given directory.
- Parameters:
dir_path – The path to a directory.
- Returns:
A list of OWL file paths as pathlib.Path objects.
- get_protein_entities()
Get the dataframe of protein entities.
- Parameters:
bqm – An instance of BiopaxQueryManager.
- Returns:
A Pandas DataFrame.
- property level_3_owl_file: Path
Get the path to the BioPAX level 3 OWL file. This may be a user-configured path. If not set, the biopax.org data will be used. The file will be downloaded if necessary.
- query(query: str) -> pandas.DataFrame
Run a SPARQL query on the loaded graph.
- Parameters:
query – A SPARQL query string.
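As an illustration of a query string that could be passed to query(), the sketch below selects proteins and their display names using the BioPAX Level 3 namespace. The PREFIX is declared explicitly because it is not documented here whether the query manager predefines any prefixes, and the column names of the resulting dataframe are likewise not specified, so the call itself is shown only as a comment.

```python
# A SPARQL query over BioPAX Level 3 data: proteins with their display names.
QUERY = """
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>

SELECT ?protein ?name
WHERE {
  ?protein a bp:Protein ;
           bp:displayName ?name .
}
LIMIT 10
"""

# Assumed usage with a configured BiopaxQueryManager instance named bqm:
# results = bqm.query(QUERY)
```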
- query_interaction_participants() -> pandas.DataFrame
Query all interaction-participant-entity relations.
- query_locations() -> pandas.DataFrame
Query all locations. Each entity is associated with at most one location.
- query_physical_entities() -> pandas.DataFrame
Query all physical entities and their references.
Module contents
Package stub.