
RMassBank development task list

meowcat edited this page Dec 16, 2022 · 8 revisions

This page is a loose collection of development proposals for RMassBank, to be discussed at an upcoming dev meeting.

Use an h2 heading for every proposal. Descriptions may be as long or short as needed; add h3 subsections if required.

There is no automatic TOC here apparently :(

Linting the source code

Author: meowcat

Over the years we have amassed a lot of code style inconsistencies, and some of the code was bad to start with. We should fix the bad practices.

  • I propose to run lintr and fix everything it suggests, giving first priority to the actual package code (the R directory).
  • The default lintr style (like tidyverse and most of modern R) is snake_case. However, Bioconductor style is camelCase for functions and CamelCaps for S4 classes. I suggest we stick with Bioconductor style, because all our user-facing functions are camelCase. Many internal variables are snake_case or follow no consistent convention; we should clean those up.
  • The biocthis package has bioc_style (https://lcolladotor.github.io/biocthis/reference/bioc_style.html), which might work for automated restyling.
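As a starting point for the naming question, here is a minimal sketch of a lintr rule set that accepts camelCase and CamelCase instead of the default snake_case. The variable name biocLinters and the test snippet are made up for illustration, and the remaining default linters are left untouched, which may or may not be what we want in the end.

```r
library(lintr)

# Assumption: keep lintr's defaults and only swap the naming rule.
biocLinters <- linters_with_defaults(
    object_name_linter = object_name_linter(styles = c("camelCase", "CamelCase"))
)

# Lint a small in-memory snippet to check that snake_case names are flagged.
snippet <- "my_value <- 1\nmyValue <- 2\n"
lints <- lint(text = snippet, linters = biocLinters)
print(lints)

# For the package itself, run from the repository root:
# lint_package(linters = biocLinters)
```

The same rule set could also live in a .lintr file in the repository so that everyone lints with identical settings.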

Moving functionality to dplyr and purrr

Author: meowcat

An awful lot of RMassBank functionality is a poor reimplementation of what the tidyverse already does well: there are very crufty equivalents of mutate, bind_rows and map. Rewriting these parts with tidyverse functions would let us cut a lot of code and make the rest much more readable.

  • The one drawback is that we would add dependencies, since we technically don't yet depend on dplyr and purrr. However, these packages are installed almost everywhere nowadays.
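To illustrate the kind of rewrite meant here, the sketch below compares a loop-and-rbind pattern with a purrr/dplyr version. The peakTables list and its column names are hypothetical stand-ins and do not correspond to an actual RMassBank data structure.

```r
library(dplyr)
library(purrr)

# Hypothetical per-spectrum peak tables (not an RMassBank structure).
peakTables <- list(
    data.frame(mz = c(100.05, 150.10), intensity = c(1000, 250)),
    data.frame(mz = c(100.05, 200.20), intensity = c(800, 400))
)

# Base R pattern: loop, patch each data frame, then do.call(rbind, ...).
withLoop <- list()
for (i in seq_along(peakTables)) {
    tab <- peakTables[[i]]
    tab$relInt <- tab$intensity / max(tab$intensity)
    tab$scan <- i
    withLoop[[i]] <- tab
}
withLoop <- do.call(rbind, withLoop)

# Tidyverse equivalent: map over the list with its index, mutate, bind the rows.
withTidy <- peakTables %>%
    imap(function(tab, i) mutate(tab, relInt = intensity / max(intensity), scan = i)) %>%
    bind_rows()
```

Both variants produce the same four-row table; the tidyverse version states the intent (per-table mutate, then stack) without the bookkeeping.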

Unit tests

Author: meowcat

Old topic, but still not done. We should test more of our code.

Adding precursor and fragment EIC extraction / correlation

Author: meowcat

At Eawag/Uchem and at ETH, we already use wrapper scripts that extract EICs for all fragments and measure their correlation to the precursor EIC. This serves as a second criterion, in addition to formula assignment, to better discriminate between noise and signal. The current implementation is very hacky and uses attributes to tack the extra data onto the existing S4 objects.

I suggest moving this into msmsRead as an option. The data should go directly into the child spectra, probably as a list column in the properties data frame.
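The correlation criterion itself could look roughly like the sketch below. This is not the existing wrapper code: the EICs are synthetic, the names precursorEic, fragmentEics and properties are hypothetical, and Pearson correlation on a shared retention time grid is an assumption about how the comparison is done.

```r
# Synthetic EICs sampled on a shared retention time grid (not RMassBank output).
rt <- seq(100, 110, by = 1)
precursorEic <- dnorm(rt, mean = 105, sd = 1.5) * 1e6
fragmentEics <- list(
    fragA = dnorm(rt, mean = 105, sd = 1.5) * 2e5,      # co-eluting fragment
    fragB = rep(c(500, 1500), length.out = length(rt))  # noise-like trace
)

# Pearson correlation of each fragment EIC against the precursor EIC.
eicCorrelation <- vapply(
    fragmentEics,
    function(eic) cor(eic, precursorEic),
    numeric(1)
)

# One possible storage layout: the raw EICs as a list column next to the
# per-peak correlation value (I() keeps the list as a single column).
properties <- data.frame(
    peak = names(fragmentEics),
    eicCorrelation = eicCorrelation
)
properties$eic <- I(unname(fragmentEics))
```

A co-eluting fragment correlates strongly with the precursor, while a noise-like trace does not, so a threshold on eicCorrelation could complement the formula assignment filter.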

Match collision energy from raw data

Existing issue: https://github.com/MassBank/RMassBank/issues/316