Data explorer indexers

Overview

This repo contains indexers which index a dataset into Elasticsearch, for use by the Data Explorer UI.

For each dataset, two Elasticsearch indices are created:

The main dataset index, named DATASET
The fields index, named DATASET_fields
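The naming convention above can be sketched in a few lines. The helper name `index_names` is hypothetical and not part of this repo; only the `DATASET` / `DATASET_fields` pattern comes from the text.

```python
from typing import Tuple


def index_names(dataset: str) -> Tuple[str, str]:
    """Return (main index name, fields index name) for a dataset.

    Hypothetical helper: it only encodes the DATASET / DATASET_fields
    convention described in this README.
    """
    return dataset, f"{dataset}_fields"


main_index, fields_index = index_names("1000_genomes")
print(main_index)    # main dataset index
print(fields_index)  # fields index
```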

Main dataset index

This index is used for faceted search.
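As a rough illustration, facet counts in Elasticsearch are commonly computed with a `terms` aggregation. The sketch below only builds a request body; it is an assumption about how a client might query this index, not the Data Explorer UI's actual query. The function name `facet_request` and the aggregation name `facet` are made up for the example.

```python
import json

# Participant field name taken from the 1000_genomes excerpt in this README.
GENDER_FIELD = (
    "verily-public-data.human_genome_variants."
    "1000_genomes_participant_info.Gender"
)


def facet_request(field: str, max_buckets: int = 10) -> dict:
    """Build a search body that returns value counts for one field.

    Assumption: the field is mapped so that terms aggregations work on it
    (for example as a keyword field).
    """
    return {
        "size": 0,  # return no hits, only aggregation buckets
        "aggs": {"facet": {"terms": {"field": field, "size": max_buckets}}},
    }


print(json.dumps(facet_request(GENDER_FIELD), indent=2))
```

Sample fields live in nested objects, so a facet over a sample field would additionally need to be wrapped in a `nested` aggregation.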

Each Elasticsearch document represents a participant. The document ID is the participant ID.

A participant can have zero or more samples. Within a participant document, a sample is a nested object. Each nested object has a sample_id field.
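One way to express this layout is Elasticsearch's `nested` field type. The mapping below is a minimal sketch, assuming typeless mappings (Elasticsearch 7 or later) and a `keyword` type for `sample_id`. Whether the indexers in this repo set an explicit mapping like this is not stated here.

```python
import json


def samples_mapping() -> dict:
    """Return an index body that declares `samples` as a nested field.

    Assumptions (not confirmed by this README): typeless mappings and a
    keyword-typed sample_id.
    """
    return {
        "mappings": {
            "properties": {
                "samples": {
                    "type": "nested",
                    "properties": {"sample_id": {"type": "keyword"}},
                }
            }
        }
    }


print(json.dumps(samples_mapping(), indent=2))
```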

Participant fields are in the top-level participant document. Sample fields are in the nested sample objects. For example, here's an excerpt of a 1000_genomes document:

"_id" : "NA12003",
"_source" : {
  "verily-public-data.human_genome_variants.1000_genomes_participant_info.Super_Population" : "EUR",
  "verily-public-data.human_genome_variants.1000_genomes_participant_info.Gender" : "male",
  "samples" : [
    {
      "sample_id" : "HG02924",
      "verily-public-data.human_genome_variants.1000_genomes_sample_info.In_Low_Coverage_Pilot" : true,
      "verily-public-data.human_genome_variants.1000_genomes_sample_info.chr_18_vcf" : "gs://genomics-public-data/1000-genomes-phase-3/vcf-20150220/ALL.chr18.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf"
    }
  ]
}
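The document layout described in this section can be assembled as follows. This is a sketch only: `build_participant_doc` is a hypothetical helper, not a function from this repo, and the field values are copied from the excerpt.

```python
from typing import Any, Dict, List

PARTICIPANT_TABLE = "verily-public-data.human_genome_variants.1000_genomes_participant_info"
SAMPLE_TABLE = "verily-public-data.human_genome_variants.1000_genomes_sample_info"


def build_participant_doc(
    participant_id: str,
    participant_fields: Dict[str, Any],
    samples: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Put participant fields at the top level and samples in a nested list."""
    for sample in samples:
        if "sample_id" not in sample:
            raise ValueError("every nested sample object needs a sample_id")
    source = dict(participant_fields)
    source["samples"] = [dict(s) for s in samples]  # may be empty
    return {"_id": participant_id, "_source": source}


doc = build_participant_doc(
    "NA12003",
    {f"{PARTICIPANT_TABLE}.Gender": "male"},
    [{"sample_id": "HG02924", f"{SAMPLE_TABLE}.In_Low_Coverage_Pilot": True}],
)
print(doc["_id"], len(doc["_source"]["samples"]))
```

A participant with no samples simply gets an empty `samples` list.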

Fields index

This index is used for field search. TODO: Include screenshot.

Each document represents a field. The document ID is the name of the Elasticsearch field in the main dataset index. Example fields are age, gender, etc. Here's an example document from 1000_genomes_fields:

"_id" : "samples.verily-public-data.human_genome_variants.1000_genomes_sample_info.In_Low_Coverage_Pilot",
"_source" : {
  "name" : "In_Low_Coverage_Pilot",
  "description" : "The sample is in the low coverage pilot experiment"
}
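A fields-index document like the one above could be derived as sketched below. `fields_doc` is a hypothetical helper. The `samples.` prefix for sample fields and the use of the column name as `name` are inferred from the example document, not from the indexer code.

```python
from typing import Any, Dict

SAMPLE_TABLE = "verily-public-data.human_genome_variants.1000_genomes_sample_info"


def fields_doc(
    table: str, column: str, description: str, is_sample_field: bool
) -> Dict[str, Any]:
    """Build a fields-index document for one column of one table."""
    es_field = f"{table}.{column}"
    if is_sample_field:
        # Sample fields live inside the nested `samples` objects.
        es_field = f"samples.{es_field}"
    return {
        "_id": es_field,
        "_source": {"name": column, "description": description},
    }


example = fields_doc(
    SAMPLE_TABLE,
    "In_Low_Coverage_Pilot",
    "The sample is in the low coverage pilot experiment",
    is_sample_field=True,
)
print(example["_id"])
```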

One-time setup

Set up git secrets.
