Merge pull request #16 from qbicsoftware/feature/feedback
Add RO-crate use case
wow-such-code authored Oct 23, 2024
2 parents 06c20f7 + d3085f6 commit 3421b1b
Showing 8 changed files with 409 additions and 101 deletions.
77 changes: 75 additions & 2 deletions README.md
@@ -250,7 +250,7 @@ The script will try to find the provided **openbis ID** in experiments, samples
fetch any missing information to create a SEEK node containing at least one assay (when an
experiment without samples and datasets is specified).

The seek-study needs to be provides to attach the assay. TODO: This information is also used to
The seek-study needs to be provided to attach the assay. TODO: This information is also used to
decide if the node(s) should be updated (if they exist for the provided study) or created anew.

Similarly, the title of the project in SEEK where nodes should be added, can either be provided via
@@ -291,7 +291,7 @@ least one sample attribute is different in openBIS and SEEK

**Example command:**

`java -jar target/openbis-scripts-1.0.0-jar-with-dependencies.jar openbis-to-seek /MYSPACE/PROJECTY/00_P_INFO_691 mystudy -d -config config.txt --openbis-pw --seek-pw`
`java -jar scripts.jar openbis-to-seek /MYSPACE/PROJECTY/00_P_INFO_691 mystudy -d -config config.txt --openbis-pw --seek-pw`

**Example output:**

@@ -314,4 +314,77 @@ least one sample attribute is different in openBIS and SEEK
Mismatch found in Gender attribute of /MYSPACE/PROJECTY/00_P_INFO_691. Sample will be updated.
http://localhost:3000/assays/64 was successfully updated.

#### RO-Crates

RO-Crate creation is not yet fully implemented, but the command already creates a folder and
metadata structure based on openBIS experiment, sample and dataset information. The command works
similarly to the openBIS to SEEK command, except that no SEEK instance and fewer mapping
parameters need to be provided (there will be no references to existing study or project objects in
SEEK).

The script will try to find the provided **openbis ID** in experiments, samples or datasets and
fetch any missing information to create a folder structure in the provided **ro-path** containing at
least one assay's information (when an experiment without samples and datasets is specified).
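
The lookup order described above (experiment first, then sample, then dataset) can be sketched as follows. This is a minimal illustration and not the tool's actual code: `NodeKindResolver`, `NodeKind` and the three predicates are hypothetical stand-ins for the openBIS existence checks.

```java
import java.util.Optional;
import java.util.function.Predicate;

class NodeKindResolver {

  // Hypothetical stand-in for the node types used by the tool.
  enum NodeKind { ASSAY, SAMPLE, ASSET }

  // Tries the identifier as experiment, then sample, then dataset.
  // The predicates are assumptions standing in for real openBIS queries.
  static Optional<NodeKind> resolve(String id,
      Predicate<String> experimentExists,
      Predicate<String> sampleExists,
      Predicate<String> datasetExists) {
    if (experimentExists.test(id)) {
      return Optional.of(NodeKind.ASSAY);
    }
    if (sampleExists.test(id)) {
      return Optional.of(NodeKind.SAMPLE);
    }
    if (datasetExists.test(id)) {
      return Optional.of(NodeKind.ASSET);
    }
    // Nothing matched: the caller should report that the ID was not found.
    return Optional.empty();
  }

  public static void main(String[] args) {
    String id = "/MYSPACE/PROJECTY/00_P_INFO_691";
    // Pretend only samples in /MYSPACE/ exist.
    Optional<NodeKind> kind =
        resolve(id, x -> false, x -> x.startsWith("/MYSPACE/"), x -> false);
    System.out.println(kind.map(k -> k.name()).orElse("not found")); // prints SAMPLE
  }
}
```

Returning an empty `Optional` for an unknown identifier keeps the "not found" case explicit, so it cannot be confused with a default node type.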

Assets (files and their ISA metadata) are stored in a folder named like the openBIS dataset code
they are part of, which is either the subfolder of the experiment (assay), or the subfolder of a
sample, depending on where the dataset was attached in openBIS.
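
As a rough sketch of this layout rule, the following shows how a dataset folder could be derived from the crate path, the experiment identifier, the identifier of the object the dataset is attached to, and the dataset code. The class and method names (`CrateLayout`, `toFileName`, `datasetFolder`) are illustrative assumptions; only the naming scheme (slashes replaced by underscores, leading underscore dropped) follows the command's behaviour.

```java
import java.nio.file.Path;

class CrateLayout {

  // Turns an openBIS identifier such as /SPACE/PROJECT/EXPERIMENT into a folder name.
  static String toFileName(String openbisId) {
    String name = openbisId.replace("/", "_");
    return name.startsWith("_") ? name.substring(1) : name;
  }

  // Datasets attached to the experiment live directly under the assay folder,
  // datasets attached to a sample live under that sample's subfolder.
  static Path datasetFolder(String roPath, String experimentId,
      String closestSourceId, String datasetCode) {
    Path assayFolder = Path.of(roPath, toFileName(experimentId));
    if (closestSourceId.equals(experimentId)) {
      return assayFolder.resolve(datasetCode);
    }
    return assayFolder.resolve(toFileName(closestSourceId)).resolve(datasetCode);
  }

  public static void main(String[] args) {
    String experiment = "/TEMP_PLAYGROUND/TEMP_PLAYGROUND/TEST_PATIENTS1";
    String sample = "/TEMP_PLAYGROUND/TEMP_PLAYGROUND/00_P_INFO_670491";
    // Dataset attached to the experiment itself.
    System.out.println(
        datasetFolder("my-ro-crate", experiment, experiment, "20241021125328024-689105"));
    // Dataset attached to a sample of that experiment.
    System.out.println(
        datasetFolder("my-ro-crate", experiment, sample, "20241021191109163-689109"));
  }
}
```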

The metadata in each created asset `.json` file always links back to the openBIS path of the
respective dataset. The data itself can be downloaded into the structure using the '-d' flag.

To completely exclude some datasets from being transferred, a file containing openBIS dataset
codes (one code per line) can be specified via '--blacklist'. //TODO do this for samples/sample types
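
A blacklist file of this kind could be read roughly as in the sketch below: lines are trimmed, empty lines are skipped, and every remaining entry must look like a dataset code. The class name `BlacklistSketch` and the regular expression are assumptions for illustration only. The pattern is inferred from the example codes in this README (17 digits, a dash, more digits) and may differ from the pattern the tool actually uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class BlacklistSketch {

  // Assumed shape of an openBIS dataset code, e.g. 20241014205813459-689089.
  static final Pattern DATASET_CODE = Pattern.compile("\\d{17}-\\d+");

  // Trims each line, drops blank lines and rejects anything that is not a dataset code.
  static Set<String> parse(List<String> lines) {
    Set<String> codes = lines.stream()
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toSet());
    for (String code : codes) {
      if (!DATASET_CODE.matcher(code).matches()) {
        throw new IllegalArgumentException("Invalid dataset code: " + code);
      }
    }
    return codes;
  }

  // Convenience wrapper that reads the codes from a file, one code per line.
  static Set<String> parseFile(Path file) throws IOException {
    return parse(Files.readAllLines(file));
  }

  public static void main(String[] args) {
    Set<String> codes = parse(List.of(
        " 20241014205813459-689089 ", "", "20241021191109163-689109"));
    System.out.println(codes.size()); // prints 2
  }
}
```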

**Example command:**

`java -jar scripts.jar ro-crate /TEMP_PLAYGROUND/TEMP_PLAYGROUND/TEST_PATIENTS1 my-ro-crate -config config.txt --openbis-pw -d`

**Example output:**

reading config
Transfer openBIS -> RO-crate started.
Provided openBIS object: /TEMP_PLAYGROUND/TEMP_PLAYGROUND/TEST_PATIENTS1
Pack datasets into crate? true
Connecting to openBIS...
Searching for specified object in openBIS...
Search successful.
Collecting information from openBIS...
Translating openBIS structure to ISA structure...
Writing assay json for /TEMP_PLAYGROUND/TEMP_PLAYGROUND/TEST_PATIENTS1.
Writing sample json for /TEMP_PLAYGROUND/TEMP_PLAYGROUND/00_P_INFO_670490.
Writing sample json for /TEMP_PLAYGROUND/TEMP_PLAYGROUND/00_P_INFO_670491.
Writing asset json for file in dataset 20241014205813459-689089.
Downloading dataset file to asset folder.
Writing asset json for file in dataset 20241014210001025-689090.
Downloading dataset file to asset folder.
Writing asset json for file in dataset 20241014205813459-689089.
Downloading dataset file to asset folder.
Writing asset json for file in dataset 20241021191109163-689109.
Downloading dataset file to asset folder.
...

**Creates structure:**

my-ro-crate
└── TEMP_PLAYGROUND_TEMP_PLAYGROUND_TEST_PATIENTS1
    ├── 20241021125328024-689105
    │   ├── README.md
    │   └── README.md.json
    ├── TEMP_PLAYGROUND_TEMP_PLAYGROUND_00_P_INFO_670490
    │   └── TEMP_PLAYGROUND_TEMP_PLAYGROUND_00_P_INFO_670490.json
    ├── TEMP_PLAYGROUND_TEMP_PLAYGROUND_00_P_INFO_670491
    │   ├── 20241014210317842-689092
    │   │   ├── scripts-new.jar
    │   │   └── scripts-new.jar.json
    │   ├── 20241021173011602-689108
    │   │   └── smol_petab
    │   │       ├── metaInformation.yaml
    │   │       └── metaInformation.yaml.json
    │   ├── 20241021191109163-689109
    │   │   ├── testfile_100
    │   │   └── testfile_100.json
    │   └── TEMP_PLAYGROUND_TEMP_PLAYGROUND_00_P_INFO_670491.json
    └── TEMP_PLAYGROUND_TEMP_PLAYGROUND_TEST_PATIENTS1.json

## Caveats and Future Options
@@ -10,7 +10,8 @@
@Command(name = "openbis-scripts",
subcommands = {SampleHierarchyCommand.class, TransferSampleTypesToSeekCommand.class,
DownloadPetabCommand.class, UploadPetabResultCommand.class, UploadDatasetCommand.class,
SpaceStatisticsCommand.class, TransferDataToSeekCommand.class, FindDatasetsCommand.class},
SpaceStatisticsCommand.class, TransferDataToSeekCommand.class, FindDatasetsCommand.class,
CreateROCrate.class},
description = "A client software for querying openBIS.",
mixinStandardHelpOptions = true, versionProvider = ManifestVersionProvider.class)
public class CommandLineOptions {
237 changes: 237 additions & 0 deletions src/main/java/life/qbic/io/commandline/CreateROCrate.java
@@ -0,0 +1,237 @@
package life.qbic.io.commandline;

import ch.ethz.sis.openbis.generic.OpenBIS;
import ch.ethz.sis.openbis.generic.dssapi.v3.dto.datasetfile.DataSetFile;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.parsers.ParserConfigurationException;
import life.qbic.App;
import life.qbic.model.DatasetWithProperties;
import life.qbic.model.OpenbisExperimentWithDescendants;
import life.qbic.model.OpenbisSeekTranslator;
import life.qbic.model.download.OpenbisConnector;
import life.qbic.model.isa.GenericSeekAsset;
import life.qbic.model.isa.ISAAssay;
import life.qbic.model.isa.ISASample;
import life.qbic.model.isa.NodeType;
import life.qbic.model.isa.SeekStructure;
import org.xml.sax.SAXException;
import picocli.CommandLine.Command;
import picocli.CommandLine.Mixin;
import picocli.CommandLine.Option;
import picocli.CommandLine.Parameters;

@Command(name = "ro-crate",
description =
"Transfers metadata and (optionally) data from openBIS to an RO-Crate-like structure that is "
+ "based on assays, samples and one of several data types in SEEK. The data itself can "
+ "be put into the crate using the '-d' flag. To completely exclude some dataset "
+ "information from being transferred, a file ('--blacklist') containing dataset codes "
+ "can be specified. The crate is not zipped at the moment.")
public class CreateROCrate implements Runnable {

@Parameters(arity = "1", paramLabel = "openbis id", description = "The identifier of the "
+ "experiment, sample or dataset to transfer.")
private String objectID;
@Parameters(arity = "1", paramLabel = "ro-path", description = "Path to the output folder")
private String roPath;
@Option(names = "--blacklist", description = "Path to a file specifying, by dataset code, "
+ "which openBIS datasets not to transfer to the crate. The file must contain one code "
+ "per line.")
private String blacklistFile;
@Option(names = {"-d", "--data"}, description =
"Downloads the data itself into the crate along with the metadata. "
+ "Otherwise only the metadata, including the link(s) to the openBIS object, will be written.")
private boolean transferData;
@Mixin
OpenbisAuthenticationOptions openbisAuth = new OpenbisAuthenticationOptions();
OpenbisConnector openbis;
OpenbisSeekTranslator translator;

@Override
public void run() {
App.readConfig();
System.out.printf("Transfer openBIS -> RO-crate started.%n");
System.out.printf("Provided openBIS object: %s%n", objectID);
System.out.printf("Pack datasets into crate? %s%n", transferData);
if(blacklistFile!=null && !blacklistFile.isBlank()) {
System.out.printf("File with dataset codes that won't be transferred: %s%n", blacklistFile);
}

System.out.println("Connecting to openBIS...");

OpenBIS authentication = App.loginToOpenBIS(openbisAuth.getOpenbisPassword(),
openbisAuth.getOpenbisUser(), openbisAuth.getOpenbisAS(), openbisAuth.getOpenbisDSS());

this.openbis = new OpenbisConnector(authentication);

System.out.println("Searching for specified object in openBIS...");

boolean isExperiment = experimentExists(objectID);
NodeType nodeType = NodeType.ASSAY;

if (!isExperiment && sampleExists(objectID)) {
nodeType = NodeType.SAMPLE;
}

if (!isExperiment && !nodeType.equals(NodeType.SAMPLE) && datasetsExist(
Arrays.asList(objectID))) {
nodeType = NodeType.ASSET;
}

if (nodeType.equals(NodeType.ASSAY) && !isExperiment) {
System.out.printf(
"%s could not be found in openBIS. Make sure you specify an experiment, sample or dataset.%n",
objectID);
return;
}
System.out.println("Search successful.");

try {
translator = new OpenbisSeekTranslator(openbisAuth.getOpenbisBaseURL());
} catch (IOException | ParserConfigurationException | SAXException e) {
throw new RuntimeException(e);
}
OpenbisExperimentWithDescendants structure;
System.out.println("Collecting information from openBIS...");
switch (nodeType) {
case ASSAY:
structure = openbis.getExperimentWithDescendants(objectID);
break;
case SAMPLE:
structure = openbis.getExperimentAndDataFromSample(objectID);
break;
case ASSET:
structure = openbis.getExperimentStructureFromDataset(objectID);
break;
default:
throw new RuntimeException("Handling of node type " + nodeType + " is not supported.");
}
Set<String> blacklist = parseBlackList(blacklistFile);
System.out.println("Translating openBIS structure to ISA structure...");
try {
SeekStructure nodeWithChildren = translator.translateForRO(structure, blacklist, transferData);
String experimentID = nodeWithChildren.getAssayWithOpenBISReference().getRight();
ISAAssay assay = nodeWithChildren.getAssayWithOpenBISReference().getLeft();
String assayFileName = openbisIDToFileName(experimentID);

String assayPath = Path.of(roPath, assayFileName).toString();
new File(assayPath).mkdirs();

System.out.printf("Writing assay json for %s.%n", experimentID);
writeFile(Path.of(assayPath, assayFileName)+".json", assay.toJson());

for(ISASample sample : nodeWithChildren.getSamplesWithOpenBISReference().keySet()) {
String sampleID = nodeWithChildren.getSamplesWithOpenBISReference().get(sample);
String sampleFileName = openbisIDToFileName(sampleID);
String samplePath = Path.of(assayPath, sampleFileName).toString();
new File(samplePath).mkdirs();

System.out.printf("Writing sample json for %s.%n", sampleID);
writeFile(Path.of(samplePath, sampleFileName)+".json", sample.toJson());
}

Map<String, String> datasetIDToDataFolder = new HashMap<>();

for(DatasetWithProperties dwp : structure.getDatasets()) {
String sourceID = dwp.getClosestSourceID();
String code = dwp.getCode();
if(sourceID.equals(experimentID)) {
Path folderPath = Path.of(assayPath, code);
File dataFolder = new File(folderPath.toString());
datasetIDToDataFolder.put(dwp.getCode(), dataFolder.getAbsolutePath());
} else {
Path samplePath = Path.of(assayPath, openbisIDToFileName(sourceID), code);
File dataFolder = new File(samplePath.toString());
datasetIDToDataFolder.put(dwp.getCode(), dataFolder.getAbsolutePath());
}
}

for(GenericSeekAsset asset : nodeWithChildren.getISAFileToDatasetFiles().keySet()) {
DataSetFile file = nodeWithChildren.getISAFileToDatasetFiles().get(asset);
String datasetID = file.getDataSetPermId().getPermId();
String dataFolderPath = datasetIDToDataFolder.get(datasetID);
String assetJson = asset.toJson();
String assetWithoutOriginFolder = asset.getFileName().replace("original","");
File assetFolder = Path.of(dataFolderPath, assetWithoutOriginFolder).getParent().toFile();
assetFolder.mkdirs();

String assetPath = Path.of(dataFolderPath, assetWithoutOriginFolder+".json").toString();
System.out.printf("Writing asset json for file in dataset %s.%n", datasetID);
writeFile(assetPath, assetJson);
if(transferData) {
System.out.printf("Downloading dataset file to asset folder.%n");
openbis.downloadDataset(dataFolderPath, datasetID, asset.getFileName());
}
}
} catch (URISyntaxException | IOException e) {
throw new RuntimeException(e);
}

System.out.println("Done");
}

private String openbisIDToFileName(String id) {
id = id.replace("/","_");
if(id.startsWith("_")) {
return id.substring(1);
} else {
return id;
}
}

private void writeFile(String path, String content) throws IOException {
// try-with-resources closes the writer even if write() throws
try (FileWriter file = new FileWriter(path)) {
file.write(content);
}
}

private Set<String> parseBlackList(String blacklistFile) {
if(blacklistFile == null) {
return new HashSet<>();
}
// trim whitespace, skip empty lines
try (Stream<String> lines = Files.lines(Paths.get(blacklistFile))
.map(String::trim)
.filter(s -> !s.isBlank())) {

Set<String> codes = lines.collect(Collectors.toSet());

for(String code : codes) {
if(!OpenbisConnector.datasetCodePattern.matcher(code).matches()) {
throw new RuntimeException("Invalid dataset code: " + code+". Please make sure to use valid"
+ " dataset codes in the blacklist file.");
}
}
return codes;
} catch (IOException e) {
throw new RuntimeException(blacklistFile+" could not be found or read.", e);
}
}

private boolean sampleExists(String objectID) {
return openbis.sampleExists(objectID);
}

private boolean datasetsExist(List<String> datasetCodes) {
return openbis.findDataSets(datasetCodes).size() == datasetCodes.size();
}

private boolean experimentExists(String experimentID) {
return openbis.experimentExists(experimentID);
}

}
11 changes: 11 additions & 0 deletions src/main/java/life/qbic/model/DatasetWithProperties.java
@@ -57,4 +57,15 @@ public Person getRegistrator() {
public Date getRegistrationDate() {
return dataset.getRegistrationDate();
}

/**
* Returns the identifier of the sample the dataset is attached to or, if the dataset has no
* sample, the identifier of its experiment.
*/
public String getClosestSourceID() {
if(dataset.getSample()!=null) {
return dataset.getSample().getIdentifier().getIdentifier();
} else {
return getExperiment().getIdentifier().getIdentifier();
}
}
}