results from function mgoSim based on "Wang" method #30

Open
DeepColin opened this issue Aug 21, 2020 · 1 comment

Comments

@DeepColin

I have found that building hsGO with different ontologies (the ont argument of godata) has no effect on the similarity results. R code below:

library(GOSemSim)

go1 <- c("GO:0000005","GO:0000007") # MF
go2 <- c("GO:0005385", "GO:0004553") # MF
go3 <- c("GO:0000017", "GO:0000014") # BP + MF

hsGO <- godata('org.Hs.eg.db', ont="BP", computeIC=FALSE)
mgoSim(go1,go2,semData=hsGO,measure="Wang",combine="BMA") # 0.473
mgoSim(go1,go3,semData=hsGO,measure="Wang",combine="BMA") # 0.016
hsGO <- godata('org.Hs.eg.db', ont="MF", computeIC=FALSE)
mgoSim(go1,go2,semData=hsGO,measure="Wang",combine="BMA") # 0.473
mgoSim(go1,go3,semData=hsGO,measure="Wang",combine="BMA") # 0.016
hsGO <- godata('org.Hs.eg.db', ont="CC", computeIC=FALSE)
mgoSim(go1,go2,semData=hsGO,measure="Wang",combine="BMA") # 0.473
mgoSim(go1,go3,semData=hsGO,measure="Wang",combine="BMA") # 0.016

Are these results reasonable?

@QianqianLiang

I have been having similar issues with the Wang method. When I input the same terms with different ontologies, I always get back exactly the same results.
I went through the functions in WangMethod.R and found the following lines in the getSV function:
Lines 58-61:

  if( exists(ID, envir=.SemSimCache) ) {
    sv <- get(ID, envir=.SemSimCache)
    return(sv)
  }

Lines 108-112:

  if( ! exists(ID, envir=.SemSimCache) ) {
    assign(ID,
           sv,
           envir=.SemSimCache)
  }

getSV stores the semantic value of an ID in the .SemSimCache environment the first time it is computed. The next time the semantic value of the same ID is requested, it is retrieved from the cache rather than recomputed. The problem is that the cache key is the ID alone: if you request the semantic value of the same ID under a different ontology, you always get back the value from the first run.
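The stale-cache behaviour can be reproduced in isolation. The following is a minimal sketch, not GOSemSim code: `toy_cache`, `toy_values`, and `toy_get_sv` are hypothetical names, and the numbers are made up purely to show that a cache keyed on the ID alone ignores the ontology.

```r
# Toy stand-in for the cache; not the package's actual environment.
toy_cache <- new.env()

# Made-up per-ontology values for a single term ID (illustration only).
toy_values <- list(BP = 0.2, MF = 0.9)

# Mimics the lookup pattern quoted above: the cache key is the ID alone.
toy_get_sv <- function(ID, ont) {
  if (exists(ID, envir = toy_cache, inherits = FALSE)) {
    return(get(ID, envir = toy_cache))
  }
  sv <- toy_values[[ont]]
  assign(ID, sv, envir = toy_cache)
  sv
}

first  <- toy_get_sv("GO:0000001", "BP")  # computed, then cached
second <- toy_get_sv("GO:0000001", "MF")  # served from the cache
identical(first, second)                  # TRUE: the MF request got the BP value
```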
A quick workaround is to clear the .SemSimCache environment before rerunning with the same IDs under a different ontology. In your case you can do:

remove(list = ls(envir = .SemSimCache), envir = .SemSimCache)
hsGO <- godata('org.Hs.eg.db', ont="BP", computeIC=FALSE)
mgoSim(go1,go2,semData=hsGO,measure="Wang",combine="BMA") # 0
mgoSim(go1,go3,semData=hsGO,measure="Wang",combine="BMA") # 0
remove(list = ls(envir = .SemSimCache), envir = .SemSimCache)
hsGO <- godata('org.Hs.eg.db', ont="MF", computeIC=FALSE)
mgoSim(go1,go2,semData=hsGO,measure="Wang",combine="BMA") # 0.473
mgoSim(go1,go3,semData=hsGO,measure="Wang",combine="BMA") # 0.016
remove(list = ls(envir = .SemSimCache), envir = .SemSimCache)
hsGO <- godata('org.Hs.eg.db', ont="CC", computeIC=FALSE)
mgoSim(go1,go2,semData=hsGO,measure="Wang",combine="BMA") # 0
mgoSim(go1,go3,semData=hsGO,measure="Wang",combine="BMA") # 0
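To avoid repeating the remove() line, the clearing step could be wrapped in a small helper. This is only a sketch: `clear_semsim_cache` is a hypothetical name, not a GOSemSim function, and it assumes the cache is an environment named .SemSimCache in the global environment, as the workaround above implies.

```r
# Hypothetical helper: empties the cache environment if it exists.
# Assumes the cache is stored under `cache_name` inside `envir`.
clear_semsim_cache <- function(cache_name = ".SemSimCache", envir = .GlobalEnv) {
  if (!exists(cache_name, envir = envir, inherits = FALSE)) {
    return(invisible(0L))  # nothing cached yet, nothing to clear
  }
  cache <- get(cache_name, envir = envir, inherits = FALSE)
  keys <- ls(envir = cache, all.names = TRUE)
  rm(list = keys, envir = cache)
  invisible(length(keys))  # number of entries removed
}
```

Calling clear_semsim_cache() before each godata() call with a different ont value would then have the same effect as the remove() lines above.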

I have also attached a modified getSV function (lightly tested). It stores each ID together with its ontology, so a cached value is retrieved only if both the ID and the ontology match.

getSV <- function(ID, ont, rel_df, weight=NULL) {
  # Cache key combines the term ID with the ontology
  ID_ont <- paste(ID, ont, sep = ":")
  if (!exists(".SemSimCache")) .initial()
  .SemSimCache <- get(".SemSimCache", envir=.GlobalEnv)
  # Return the cached value only when both ID and ontology match
  if( exists(ID_ont, envir=.SemSimCache) ) {
    sv <- get(ID_ont, envir=.SemSimCache)
    return(sv)
  }
  if (ont == "DO") {
    topNode <- "DOID:4"
  } else {
    topNode <- "all"
  }
  if (ID == topNode) {
    sv <- 1
    names(sv) <- topNode
    return (sv)
  }
  # Default edge weights per relationship type
  if (is.null(weight)) {
    weight <- c(0.8, 0.6, 0.7)
    names(weight) <- c("is_a", "part_of", "other")
  }
  rel_df <- rel_df[rel_df$Ontology == ont,]
  if (! 'relationship' %in% colnames(rel_df))
    rel_df$relationship <- "other"
  rel_df$relationship[!rel_df$relationship %in% c("is_a", "part_of")] <- "other"
  sv <- 1
  names(sv) <- ID
  allid <- ID
  idx <- which(rel_df[,1] %in% ID)
  # Walk up through the parents, adding each ancestor's weighted contribution
  while (length(idx) != 0) {
    p <- rel_df[idx,]
    pid <- p$parent
    allid <- c(allid, pid)
    sv <- c(sv, weight[p$relationship]*sv[p[,1]])
    names(sv) <- allid
    idx <- which(rel_df[,1] %in% pid)
  }
  sv <- sv[!is.na(names(sv))]
  sv <- sv[!duplicated(names(sv))]
  if(ont != "DO")
    sv[topNode] <- 0
  # Store the result under the combined ID:ontology key
  if( ! exists(ID_ont, envir=.SemSimCache) ) {
    assign(ID_ont,
           sv,
           envir=.SemSimCache)
  }
  return(sv)
}
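The keying idea used in the function above (paste the ID and the ontology into one cache key) can be checked on its own with a toy cache. This is a sketch with hypothetical names (`keyed_cache`, `toy_values`, `keyed_get_sv`) and made-up values; it does not call GOSemSim.

```r
# Toy cache keyed on "ID:ontology" instead of the ID alone.
keyed_cache <- new.env()

# Made-up per-ontology values for a single term ID (illustration only).
toy_values <- list(BP = 0.2, MF = 0.9)

keyed_get_sv <- function(ID, ont) {
  key <- paste(ID, ont, sep = ":")  # same key construction as ID_ont above
  if (exists(key, envir = keyed_cache, inherits = FALSE)) {
    return(get(key, envir = keyed_cache))
  }
  sv <- toy_values[[ont]]
  assign(key, sv, envir = keyed_cache)
  sv
}

bp <- keyed_get_sv("GO:0000001", "BP")  # cached under "GO:0000001:BP"
mf <- keyed_get_sv("GO:0000001", "MF")  # cached separately under "GO:0000001:MF"
ls(keyed_cache)                         # two entries, one per ontology
```

With the composite key, the two requests no longer collide, so no manual cache clearing is needed between ontologies.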
