Storage Command Line
OpenCGA Storage (from v0.7+) implements two command line interfaces (CLIs) so that users can work with the storage engines easily. The two CLIs split client and server functionality into separate scripts:
- client: loads, indexes and queries data, and provides other specific features such as variant annotation. It is available in the script opencga-storage.sh
- server: two servers for querying data have been implemented. The first is a standard RESTful web service using Jetty; the second uses gRPC and offers a higher-performance, more scalable solution. Both are available in the script opencga-storage-server.sh
Both Storage CLIs are organized in two levels, commands and subcommands, to group functionality and expose parameters specific to each operation.
The client commands are feature, alignment and variant, and their subcommands are:
- feature
  - index: GFF/BED files are indexed with tabix by default; some plugins can override this and index in MongoDB or HBase
  - query: executes region-based queries
- alignment
  - index: BAM/CRAM files are indexed with samtools by default, but some Storage plugins can use more advanced technologies such as Apache Hive
  - query (formerly fetch-alignments): executes the queries implemented in AlignmentDBAdaptor, such as region-based or coverage queries
  - stats: calculates basic file statistics and coverage
  - benchmark: runs the common benchmark framework, which makes it possible to compare indexing and query times across plugin implementations
- variant
  - index: VCF/BCF (and gVCF) files are indexed with tabix by default, but some Storage plugins can use more advanced technologies such as MongoDB and Apache HBase to provide a higher-performance, more scalable solution
  - query (formerly fetch-variants): executes the queries implemented in VariantDBAdaptor, such as region-based queries or queries by variant annotation
  - query-grpc: a gRPC client that sends queries to a remote gRPC server
  - annotate: creates and loads the variant annotation from CellBase or Ensembl VEP; annotations are indexed with the data in the mongodb and hadoop plugins
  - stats: calculates variant and sample stats for different cohorts; stats are indexed with the data in the mongodb and hadoop plugins
  - sample: sample-based aggregation queries
  - admin: removes variants, samples, … from databases
  - benchmark: runs the common benchmark framework, which makes it possible to compare indexing and query times across plugin implementations
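The two-level layout of the client CLI can be illustrated with a small parser. This is only a sketch in Python, not the actual OpenCGA implementation: the command and subcommand names come from the list above, while the function name and the absence of per-subcommand options are assumptions made for illustration.

```python
import argparse

# Command and subcommand names are taken from the client list above.
# Everything else (names of helpers, no per-subcommand flags) is assumed.
CLIENT_COMMANDS = {
    "feature": ["index", "query"],
    "alignment": ["index", "query", "stats", "benchmark"],
    "variant": ["index", "query", "query-grpc", "annotate", "stats",
                "sample", "admin", "benchmark"],
}


def build_client_parser():
    """Build a two-level parser: first the command, then its subcommand."""
    parser = argparse.ArgumentParser(prog="opencga-storage.sh")
    commands = parser.add_subparsers(dest="command", required=True)
    for command, subcommands in CLIENT_COMMANDS.items():
        command_parser = commands.add_parser(command)
        subs = command_parser.add_subparsers(dest="subcommand", required=True)
        for subcommand in subcommands:
            subs.add_parser(subcommand)
    return parser


args = build_client_parser().parse_args(["variant", "annotate"])
print(args.command, args.subcommand)  # prints: variant annotate
```

Keeping the subcommands under their command means each one can later receive its own parameters without clashing with a same-named subcommand (for example index) of another command.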
The server commands are rest and grpc, and their subcommands are:
- rest
  - start: starts Jetty for the RESTful web services, at port 9090 by default
  - stop: stops the Jetty server
  - status: prints some useful information about the server status
- grpc
  - start: starts the gRPC server, at port 9091 by default
  - stop: stops the gRPC server
  - status: prints some useful information about the server status
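As a rough illustration of the server CLI layout, the helper below assembles the argument list for a server command and exposes the default ports listed above. The script name, command names, subcommand names and ports come from this page; the helper itself is hypothetical and does not launch anything.

```python
# Names and default ports come from the server list above; the helper is a
# hypothetical sketch that only assembles the command line, it does not run it.
SERVER_SCRIPT = "opencga-storage-server.sh"
SERVER_SUBCOMMANDS = ("start", "stop", "status")
DEFAULT_PORTS = {"rest": 9090, "grpc": 9091}


def server_invocation(command, subcommand):
    """Return the argv list for a server command after validating both levels."""
    if command not in DEFAULT_PORTS:
        raise ValueError(f"unknown server command: {command}")
    if subcommand not in SERVER_SUBCOMMANDS:
        raise ValueError(f"unknown subcommand for {command}: {subcommand}")
    return [SERVER_SCRIPT, command, subcommand]


print(server_invocation("grpc", "start"), DEFAULT_PORTS["grpc"])
```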
These parameters are not explicit command line options; instead, they change internal configuration parameters. Depending on the biotype (alignment or variant) and the selected storage engine, these parameters are added to the options field of the configuration file that was read.
-D<configuration-parameter-name>=<value>
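One way to picture how such arguments could be handled is to split them off from the rest of the command line and collect them as key/value pairs. The sketch below is an assumption about the mechanics, not OpenCGA's own parser, and the parameter name example.option is made up for illustration.

```python
def split_dynamic_params(argv):
    """Separate -D<name>=<value> arguments from the rest of the command line."""
    options, remaining = {}, []
    for arg in argv:
        if arg.startswith("-D") and "=" in arg:
            # Split on the first "=" only, so values may contain "=" themselves.
            name, value = arg[2:].split("=", 1)
            options[name] = value
        else:
            remaining.append(arg)
    return options, remaining


# "example.option" is a hypothetical parameter name.
options, remaining = split_dynamic_params(
    ["variant", "index", "-Dexample.option=42"]
)
print(options)    # prints: {'example.option': '42'}
print(remaining)  # prints: ['variant', 'index']
```

Values are kept as strings here; converting them to numbers or booleans would be left to whatever component consumes the options.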
The file storage-configuration.yml should be placed at $OPENCGA_HOME/conf/, and contains all the configuration needed by OpenCGA-Storage. There are three main blocks: storageEngines, server and cellbase.
The Storage command line interface defines this set of commands:
- index-variants: Index a variants file
- fetch-variants: Search over indexed variants
- annotate-variants: Create and load variant annotations into the database
- stats-variants: Create and load stats into a database
- create-accessions: Create accession IDs for an input file
- index-alignments: Index an alignment file
- fetch-alignments: Search over indexed alignments
- Storage configuration
- Storage Engine configuration
- Variant
- Alignment
- Server configuration
- CellBase configuration
Storage Engine configuration
This block defines a set of configuration options for each installed storage engine (mongodb, hadoop, ...). Each storage engine contains a section for every supported biotype, currently alignment and variant.
Common options between all storage-engines for variants are defined in VariantStorageManager::Options
Common options between all storage-engines for alignments are defined in AlignmentStorageManager::Options
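To make the per-engine, per-biotype organization more concrete, the sketch below models the configuration as a Python dictionary and looks up the options for one engine and biotype, applying any overrides on top. The three top-level block names and the options field come from this page; the exact nesting, the empty server and cellbase blocks, and the option name example.option are assumptions and may not match the real storage-configuration.yml layout.

```python
# Top-level block names and the "options" field come from the text above.
# The nesting, the engine entry, and the option names are assumptions.
storage_configuration = {
    "storageEngines": {
        "mongodb": {
            "variant": {"options": {"example.option": "from-file"}},
            "alignment": {"options": {}},
        },
    },
    "server": {},
    "cellbase": {},
}


def engine_options(config, engine, biotype, overrides=None):
    """Return the options for one engine and biotype, with overrides applied."""
    section = config["storageEngines"][engine][biotype]
    merged = dict(section["options"])  # copy so the loaded config is untouched
    merged.update(overrides or {})
    return merged


print(engine_options(storage_configuration, "mongodb", "variant",
                     {"example.option": "from-cli"}))
# prints: {'example.option': 'from-cli'}
```

Copying the options before merging keeps the values read from the file intact, so the same configuration can be reused for several commands with different overrides.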
OpenCGA is an open source project and it is freely available.