Orthology recapitulation experiments with Carlo #30

Draft
wants to merge 2 commits into
base: main
36 changes: 36 additions & 0 deletions src/curate_gpt/cli.py
@@ -1595,6 +1595,42 @@ def _text_lookup(obj: Dict):
db.update_collection_metadata(collection, object_type="OntologyClass")


# Gene orthology recapitulation experiment:
# 1) Make embeddings for all human and mouse genes (command: make_gene_embeddings).
# 2) Choose 1000 pairs of orthologous genes and 1000 pairs of non-orthologous genes, then compare
#    their LLM embeddings to see whether orthology is recapitulated (command: gene_orthology).

@ontology.command(name="make_gene_embeddings")
@click.option("--monarch_url", required=True, default="https://data.monarchinitiative.org/monarch-kg/2024-02-13/monarch-kg.tar.gz", help="URL for the Monarch knowledge graph")
@click.option("--gene_prefix", default=["HGNC:", "MGI:"], type=str, multiple=True)
@click.option("--association_prefix", default=["has_association"], type=str, multiple=True)
@click.option("--phenotype_prefix", default=["HP:", "MP:"], type=str, multiple=True, help="Prefixes for phenotypes")
@click.option("--collection", required=False, default="gene_embeddings", help="Collection name for gene embeddings")
@click.option("--model_option", required=False, default=None, help="Model to use for embeddings")
def make_gene_embeddings(monarch_url, gene_prefix, association_prefix, phenotype_prefix, collection, model_option):
    """Generate LLM embeddings for human and mouse genes and phenotypes.

    Example:
    -------
        curategpt ontology make_gene_embeddings --monarch_url $URL --collection gene_embeddings
    """

    # TODO: call an agent to:
    #   - download the Monarch knowledge graph
    #   - extract gene and phenotype associations
    #   - generate LLM embeddings for human and mouse genes and phenotypes
    #   - write the embeddings to a collection
    pass

@ontology.command(name="gene_orthology")
@click.option("--monarch_url", required=True, default="https://data.monarchinitiative.org/monarch-kg/2024-02-13/monarch-kg.tar.gz", help="URL for the Monarch knowledge graph")
@click.option("--collection", required=False, default="gene_embeddings", help="Collection name for gene embeddings")
@click.option("--gene_prefix", default=["HGNC:", "MGI:"], type=str, multiple=True)
@click.option("--orthology_biolink_type", required=False, default="biolink:orthologous_to", help="Biolink term for orthology")
def gene_orthology(monarch_url, collection, gene_prefix, orthology_biolink_type):
    """Compare LLM embeddings for orthologous and non-orthologous genes."""
    pass

@main.group()
def view():
    "Virtual store/wrapper"
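The "extract gene and phenotype associations" step in `make_gene_embeddings` is still a stub. One possible shape for it is sketched below. This is not code from the PR: `gene_phenotype_map` is a hypothetical helper, and it assumes the Monarch KG tarball yields a KGX-style edges TSV with `subject`, `predicate`, and `object` columns. The identifiers in the toy table are placeholders, not real gene or phenotype IDs.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Sequence


def gene_phenotype_map(
    edges: Iterable[Mapping[str, str]],
    gene_prefixes: Sequence[str] = ("HGNC:", "MGI:"),
    phenotype_prefixes: Sequence[str] = ("HP:", "MP:"),
) -> Dict[str, List[str]]:
    """Group phenotype IDs by gene ID.

    Hypothetical helper: keeps only edges whose subject starts with a gene
    prefix and whose object starts with a phenotype prefix.
    """
    genes = tuple(gene_prefixes)
    phenotypes = tuple(phenotype_prefixes)
    grouped = defaultdict(set)
    for edge in edges:
        subject, obj = edge["subject"], edge["object"]
        # str.startswith accepts a tuple and matches any of its prefixes.
        if subject.startswith(genes) and obj.startswith(phenotypes):
            grouped[subject].add(obj)
    return {gene: sorted(phenos) for gene, phenos in grouped.items()}


# Toy edge table standing in for the assumed edges TSV (placeholder IDs).
EDGES_TSV = (
    "subject\tpredicate\tobject\n"
    "HGNC:1\tbiolink:has_phenotype\tHP:0000002\n"
    "HGNC:1\tbiolink:has_phenotype\tHP:0000001\n"
    "MGI:1\tbiolink:has_phenotype\tMP:0000001\n"
    "HGNC:1\tbiolink:orthologous_to\tMGI:1\n"
    "MONDO:1\tbiolink:has_phenotype\tHP:0000001\n"
)

edges = csv.DictReader(io.StringIO(EDGES_TSV), delimiter="\t")
gene_to_phenotypes = gene_phenotype_map(edges)
```

The defaults mirror the `--gene_prefix` and `--phenotype_prefix` options in the diff. The resulting per-gene phenotype lists could then be turned into text for embedding, though how the text is built is left open here.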
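The comparison in `gene_orthology` is also unimplemented. A minimal sketch of what step 2 might look like follows, assuming embeddings are already available as plain float vectors keyed by gene ID. The PR does not say which similarity measure is intended, so cosine similarity is an assumption here, and every function name and toy vector below is hypothetical.

```python
import math
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sample_non_orthologs(
    human_genes: Sequence[str],
    mouse_genes: Sequence[str],
    orthologs: Sequence[Pair],
    n: int,
    seed: int = 0,
) -> List[Pair]:
    """Rejection-sample up to n distinct human/mouse pairs that are not known orthologs."""
    rng = random.Random(seed)
    known = set(orthologs)
    # Lower bound on the number of available non-ortholog pairs, so the loop terminates.
    target = min(n, len(human_genes) * len(mouse_genes) - len(known))
    sampled = set()
    while len(sampled) < target:
        pair = (rng.choice(human_genes), rng.choice(mouse_genes))
        if pair not in known:
            sampled.add(pair)
    return sorted(sampled)


def mean_pair_similarity(pairs: Sequence[Pair], embeddings: Dict[str, Sequence[float]]) -> float:
    """Average cosine similarity over a list of gene pairs."""
    return mean(cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs)


# Toy embeddings with placeholder IDs; real vectors would come from the collection.
embeddings = {
    "HGNC:1": [1.0, 0.0],
    "MGI:1": [0.9, 0.1],
    "HGNC:2": [0.0, 1.0],
    "MGI:2": [0.1, 0.9],
}
orthologs = [("HGNC:1", "MGI:1"), ("HGNC:2", "MGI:2")]
non_orthologs = sample_non_orthologs(["HGNC:1", "HGNC:2"], ["MGI:1", "MGI:2"], orthologs, n=2)

ortholog_mean = mean_pair_similarity(orthologs, embeddings)
non_ortholog_mean = mean_pair_similarity(non_orthologs, embeddings)
```

If orthology is recapitulated, `ortholog_mean` should come out clearly higher than `non_ortholog_mean`. With 1000 pairs per group, as proposed in the diff comments, comparing the two similarity distributions would be more informative than comparing the means alone.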