Skip to content
/ OSIM-v5 Public
forked from OHDSI/OSIM-v5

An updated version of OSIM for CDM v5

Notifications You must be signed in to change notification settings

bcbi/OSIM-v5

 
 

Repository files navigation

OSIM_v5 version 1.0.001

Synthetic Health Data Generator - OMOP CDM v5 version for OSIM simulator

Description

OSIM 5 is a OMOP CDM v5 compatible procedure for constructing simulated observational datasets. The simulated datasets are modeled after real observational data sources, but consist of synthetic persons with simulated drug exposures, condition exposures and procedure occurrences. These condition/drug/procedure instances are based on random draws from probability distributions; these distribution are modeled after the relationships between real like drugs and conditions.

This package was built on PostgreSQL.

Current Version

Written in PostgreSQL: Kausar Mukadam, Georgia Tech Research Institute

Additional Documentation

For more information about version 5, please check the enclosed presentation.

Please refer to the documentation available in v2 Documentation folder, or the paper avaibale at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3243118/ for more information about the initial version.

Initial Version

Oracle PL/SQL: Rich Murray, United BioSource Corporation
Last modified: 15 February 2011
2010 Foundation for the National Institutes of Health

Please note that much of the documentation and code logic for verison 5 is similar to version 2 with the addition of support for procedure occurrence generation and syntactical changes as required for the new format and PostgreSQL conversion.

Execution Process

Step 1: Edit Views

Modify first 4 views in OSIM5_views.sql

In order to analyze a CDM format database, the schema and tables for the source data need to be specified in OSIM 5. Modify the first 4 views in OSIM5_views.sql to point to the required tables. The final views should look as follows

============================================================================
Example view creation:
CREATE OR REPLACE VIEW s_person as select * from [CDM_SCHEMA].person;
CREATE OR REPLACE VIEW s_condition_era as select * from [CDM_SCHEMA].condition_era;
CREATE OR REPLACE VIEW s_observation_period as select * from [CDM_SCHEMA].observation_period;
CREATE OR REPLACE VIEW s_drug_era as select * from [CDM_SCHEMA].drug_era;

Step 2: Create Views

Run OSIM5_views.sql file

The above generated views are accessed through a standard set of read-only views. These are contained in the OSIM5_views.sql file and are seperate from the OSIM package and can be slightly modified with specialized filtering to limit the analysis (ex. gender_concept_id, person_id range).

After modifying the initial views in step 1, these additional views must be created by executing the OSIM5_views.sql script in Postgres.

NOTE: Change from version 2! - The persistance window cannot be changed in this version of the package. The code uses the standard persistance windows and does not filter based on persistance.

v_src_person -- This view selects the persons to analyze. The standard view limits selection to persons with an observation_period record and year_of_birth value.

v_src_person_strata -- This view returns the persons in the v_src_person view with a few additional precalculated values commonly used by the analysis, including distinct condition count, distinct drug count, and age.

v_observation_period -- This view returns the observation_period rows for the persons in the v_src_person view.

v_src_condition_era1_ids -- This view returns all IDs of the condition_eras.

v_src_condition_era1 -- This view returns all condition_era rows.

v_src_first_conditions -- This view returns only the first occurrence condition eras.

v_all_conditions -- This view returns all condition_eras including a precalculated person age at conditon start value.

v_src_drug_era1 -- This view returns all drug_eras.

v_src_first_drugs -- This view returns only the first occurrence drug_eras.

Step 3 (Optional): User Modifiable Range Functions

Modify range functions in OSIM5_package.sql

The user-modifiable range functions are used by both the database analysis and simulation phases of OSIM. They specify the bucketing of transition probabilities. The user can control these ranges and bucketing by modifying the function in OSIM5_package.sql. The default buckets are identical to version 2 (which were derived from trial and error during development). The functions are described in more detail in the Data dictionary and Process Design documentation of OSIM 2 avaiable in v2 documentation folder.

Please note: The same range functions must be used during anaysis and simulation phases.

Step 4: Table Generation

Run OSIM5_tables.sql file

The OSIM 5 package uses some tables (in the anaysis stage, to store final synthetic data, etc), whcih need to be created before the package is executed. This can be done by executing the OSIM5_tables.sql file. The tables are described in detail in the Data Dictionary and Process Design documentation of OSIM 2.

Step 5: Analysis Phase: analyze_source_db()

Run stored procedure (analyze_source_db)

OSIM 5 is based on transition probabilty tables which are used to store probability characteristics of the source CDM database. The OSIM5 package method analyze_source_db() performs all the CDM database analysis.

This method will truncate and repopulate all the Transition Probability Tables.

NOTE: This part of the process is time and computationlly intensive and may require several days to run, depending on the size of the database being analyzed. In version 2 the progress can be monitored in the process_log table, but since Postgres does not have support for autonomous transactions, the progress can be monitored through DEBUG statements by setting the postgres log level to DEBUG.

Step 6: Simulated Data: ins_sim_data(person_count,person_start_id)

Run stored procedure (ins_sim_data)

The simulated data will be inserted into four CDM format tables:

  1. osim_person
  2. osim_observation_period
  3. osim_condition_era
  4. osim_drug_era
  5. osim_procedure_occurrence

Patients are simulated through the ins_sim_data() method of the OSIM 5 package.

Optional parameters
person_count -- number of persons to simulate (default=5000)
person_start_id -- the starting person_id to use (default=next incremental value)

The method can be run multiple times in succession to append more and more data to the "osim_" prefixed CDM tables.

Step 7(Optional): Outcomes

In Progress

Step 8(Optional): Copy Data

In Progress

Other OSIM Methods

delete_all_sim_data() -- will delete all data from "osim_" prefixed tables


drop_osim_indexes() -- will drop all indexes from "osim_" prefixed tables


create_osim_indexes() -- will create all indexes from "osim_" prefixed tables

Simulation Scenarios

All command blocks should be executed inside PostgreSQL

========================================================================
Simple non-parallel analysis and simulation of 100,000 persons

begin
   perform analyze_source_db();
   peform ins_sim_data(100000);
end;

========================================================================
Analysis and parallel simulation of 50,000 persons (2 x 250,000)
Note: ANALYSIS MUST COMPLETE BEFORE STARTING SIMULATIONS

begin
   perform analyze_source_db();
end;



-- Parallel Simulation 1
begin
   perform ins_sim_data(250000,1);
end;



-- Parallel Simulation 2
begin
  perform ins_sim_data(250000,250001);
end; <br/> 

About

An updated version of OSIM for CDM v5

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • PLpgSQL 100.0%