catrax-pathfinder is a Python package for discovering and returning candidate paths between two CURIE nodes using a PloverDB endpoint and precomputed databases (NGD and node degree). It supports SQLite and MySQL backends for both the NGD and degree repositories via a simple URL prefix.
pip install catrax-pathfinderYou will need a compatible curie_ngd_v1.0_KG and kg2c_v1.0_KG SQLite database for the KG version you are using.
- Recommended: Ask a team member for mysql urls to these databases
- Alternative: Ask a team member for local copies of these databases
from pathfinder.Pathfinder import Pathfinder
plover_url = "https://kg2cploverdb.ci.transltr.io"
ngd_url = "sqlite:curie_ngd_v1.0_KG2.10.2.sqlite"
degree_url = "sqlite:kg2c_v1.0_KG2.10.2.sqlite"
# Optional filters
blocked_curies = set([
# "CHEBI:1234",
])
blocked_synonyms = set([
# "aspirin",
])
# Any logger-like object is acceptable (e.g., a Python logging.Logger)
logger = None
pathfinder = Pathfinder(
repository_name="MLRepo",
plover_url=plover_url,
ngd_url=ngd_url,
degree_url=degree_url,
blocked_curies=blocked_curies,
blocked_synonyms=blocked_synonyms,
logger=logger,
)
result, aux_graphs, knowledge_graph = pathfinder.get_paths(
src_node_id="MONDO:0005148",
dst_node_id="CHEBI:15365",
src_pinned_node="node_1",
dst_pinned_node="node_2",
hops_numbers=4,
max_hops_to_explore=6,
limit=500,
prune_top_k=30,
degree_threshold=30000,
category_constraints=[],
)Constructor:
Pathfinder(
repository_name: str,
plover_url: str,
ngd_url: str,
degree_url: str,
blocked_curies: Set[str],
blocked_synonyms: Set[str],
logger,
)- repository_name: For now, this should always be
"MLRepo". - plover_url: URL of the PloverDB endpoint (example:
https://kg2cploverdb.ci.transltr.io). - ngd_url: Connection string for the CURIE-NGD repository (SQLite or MySQL).
- degree_url: Connection string for the node degree repository (SQLite or MySQL).
- blocked_curies: A set of CURIE IDs; any path that passes through these CURIEs is dropped.
- blocked_synonyms: A set of strings; any path that passes through nodes whose names match these values is dropped.
- logger: A logger-like object used for logging.
get_paths(
src_node_id: str,
dst_node_id: str,
src_pinned_node: str,
dst_pinned_node: str,
hops_numbers: int = 4,
max_hops_to_explore: int = 6,
limit: int = 500,
prune_top_k: int = 30,
degree_threshold: int = 30000,
category_constraints: Set[str] = None
)- src_node_id: Source CURIE ID.
- dst_node_id: Destination CURIE ID.
- src_pinned_node: Source pinned node ID.
- dst_pinned_node: Destination pinned node ID.
- hops_numbers: Maximum number of hops a returned path can have.
- max_hops_to_explore: Maximum depth to explore during expansion; after exploration, paths longer than
hops_numbersare removed. - limit: Maximum number of paths to return.
- prune_top_k: During each expansion step, neighbors are ranked and only the top
kare kept for further expansion. - degree_threshold: Nodes with degree greater than this threshold are not expanded.
- category_constraints (optional): If non-empty, keeps only paths that contain at least one of these categories.
get_paths(...) returns a 3-tuple of TRAPI-compliant objects.
These correspond to standard Translator Reasoner API (TRAPI) result structures: For more details on TRAPI object formats and the overall API specification, see the TRAPI documentation on GitHub: https://github.com/NCATSTranslator/ReasonerAPI
(result, aux_graphs, knowledge_graph)Both ngd_url and degree_url accept a backend prefix.
Use sqlite: followed by the SQLite filename/path.
- NGD example:
sqlite:curie_ngd_v1.0_KG2.10.2.sqlite
- Degree example:
sqlite:kg2c_v1.0_KG2.10.2.sqlite
Use mysql: followed by your MySQL config string.
- NGD example:
mysql:arax-databases-mysql.rtx.ai:public_ro:curie_ngd_v1_0_kg2_10_2
- Degree example:
mysql:arax-databases-mysql.rtx.ai:public_ro:kg2c_v1_0_kg2_10_2
The package automatically detects which backend to use based on the
sqlite:/mysql:prefix.
- Start with smaller
hops_numbersandlimitif you are experimenting, then scale up. - If exploration grows too quickly on high-degree nodes, consider lowering
degree_thresholdand/orprune_top_k. - Use
blocked_curiesandblocked_synonymsto remove known “noisy” nodes and keep path results cleaner.