Adding ETL Document #15

Open
wants to merge 6 commits into
base: master
39 changes: 39 additions & 0 deletions README.md
@@ -12,6 +12,45 @@ Attribute Info - https://www.hindawi.com/journals/bmri/2014/781670/tab1/
# Build Data Warehouse
![Process](process.png)

## ETL Document

| | | Target | | | | | | Source | | |
|----------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|------|-----------------|----------|---------------|--------------|----------------------------|------------------|---------------------------------------------------------------------------------------|
| Column Name | Description | Data Type | Size | Example Value | SCD Type | Source System | Source Table | Source Field Name | Source Data Type | ETL Rule |
| encounter_id | Unique identifier of an encounter | SMALLINT | | 2 | 1 | Derived | | encounter_id | INT | Surrogate key |
| patient_sk | source primary key | SMALLINT | | 5 | 1 | Derived | | | INT | Surrogate key |
| test_sk | source primary key | SMALLINT | | 10 | 1 | Derived | | | INT | Surrogate key |
| medication_sk | source primary key | SMALLINT | | 12 | 1 | Derived | | | INT | Surrogate key |
| diagnosis_sk | source primary key | SMALLINT | | 14 | 1 | Derived | | | INT | Surrogate key |
| date_sk | source primary key | SMALLINT | | 16 | 1 | Derived | | | INT | Surrogate key |
| discharge_sk | source primary key | SMALLINT | | 17 | 1 | Derived | | | INT | Surrogate key |
| admissionDetail_sk | source primary key | SMALLINT | | 67 | 1 | Derived | | | INT | Surrogate key |
| time_in_hospital | Number of days the patient stayed in the hospital | INT | | 10 | 1 | dbdiabetic | patient | time_in_hospital | INT | |
| num_lab_procedure | Number of lab tests performed during the encounter | INT | | 50 | 1 | dbdiabetic | | num_lab_procedure | INT | |
| num_procedures | Number of procedures (other than lab tests) performed during the encounter | INT | | 5 | 1 | dbdiabetic | | num_procedures | INT | |
| num_medication | Number of distinct generic names administered during the encounter | INT | | 12 | 1 | dbdiabetic | | num_medication | INT | |
| number_outpatient | Number of outpatient visits of the patient in the year preceding the encounter | INT | | 0 | 1 | dbdiabetic | | number_outpatient | INT | |
| number_emergency | Number of emergency visits of the patient in the year preceding the encounter | INT | | 1 | 1 | dbdiabetic | | number_emergency | INT | |
| number_inpatient | Number of inpatient visits of the patient in the year preceding the encounter | INT | | 2 | 1 | dbdiabetic | | number_inpatient | INT | |
| number_diagnoses | Number of diagnoses entered to the system | INT | | 7 | 1 | dbdiabetic | | number_diagnoses | INT | |
| race | Values: Caucasian, Asian, African American, Hispanic, and other | VARCHAR | 45 | AfricanAmerican | 1 | dbdiabetic | patient | race | VARCHAR | DELETE FROM `diabetes_dwh_staging`.`dataset_modified` WHERE `race` = '?'; |
| gender | Values: male, female, and unknown/invalid | VARCHAR | 45 | Male | 1 | dbdiabetic | patient | gender | VARCHAR | |
| age | Grouped in 10-year intervals: [0, 10), [10, 20), …, [90, 100) | VARCHAR | 45 | [0-10) | 1 | dbdiabetic | patient | age | VARCHAR | Copy column |
| weight | Weight in pounds. | VARCHAR | 45 | [50-75) | 1 | dbdiabetic | patient | weight | VARCHAR | Copy column |
| patient_number | Unique identifier of a patient | VARCHAR | 45 | 8222157 | 1 | dbdiabetic | | patient_nbr | VARCHAR | |
| discharge_dispositon | Integer identifier corresponding to 29 distinct values, for example, discharged to home, expired, and not available | VARCHAR | 45 | | 1 | dbdiabetic | | | VARCHAR | |
| readmitted | Days to inpatient readmission. Values: “<30” if the patient was readmitted in less than 30 days, “>30” if the patient was readmitted in more than 30 days, and “No” for no record of readmission. | VARCHAR | 45 | >30 | 1 | dbdiabetic | | readmitted | VARCHAR | Copy column |
| payer_code | Identifier corresponding to 23 distinct values, for example, Blue Cross/Blue Shield, Medicare, and self-pay | VARCHAR | 45 | MC | 1 | dbdiabetic | | | VARCHAR | DELETE FROM `diabetes_dwh_staging`.`dataset_modified` WHERE `payer_code` = '?'; |
| admission_type | Type of admission: emergency, urgent, elective, newborn, trauma center, not mapped, or null | VARCHAR | 45 | Emergency | 1 | dbdiabetic | | | VARCHAR | Copy column |
| admission_source | Identifier corresponding to 21 distinct values, for example, physician referral, emergency room, and transfer from a hospital | VARCHAR | 45 | Clinic Referral | 1 | dbdiabetic | | | VARCHAR | Copy column |
| medical_speciality | Identifier of the specialty of the admitting physician, corresponding to 84 distinct values, for example, cardiology, internal medicine, family/general practice, and surgeon | VARCHAR | 45 | Surgery-General | 1 | dbdiabetic | doctor | medical speciality | DOUBLE | DELETE FROM `diabetes_dwh_staging`.`dataset_modified` WHERE `medical_specialty` = '?'; |
| diagnosis_1 | The primary diagnosis (coded as first three digits of ICD9); 848 distinct values | DOUBLE | 45 | 250.7 | 1 | dbdiabetic | | diag_1 | DOUBLE | DELETE FROM `diabetes_dwh_staging`.`dataset_modified` WHERE `diag_1` = '?'; |
| diagnosis_2 | Secondary diagnosis (coded as first three digits of ICD9); 923 distinct values | DOUBLE | 45 | 411 | 1 | dbdiabetic | | diag_2 | DOUBLE | DELETE FROM `diabetes_dwh_staging`.`dataset_modified` WHERE `diag_2` = '?'; |
| diagnosis_3 | Additional secondary diagnosis (coded as first three digits of ICD9); 954 distinct values | DOUBLE | 45 | 486 | 1 | dbdiabetic | | diag_3 | VARCHAR | DELETE FROM `diabetes_dwh_staging`.`dataset_modified` WHERE `diag_3` = '?'; |
| glucose_serum_test_results | Indicates the range of the result or if the test was not taken. Values: “>200,” “>300,” “normal,” and “none” if not measured | VARCHAR | 45 | None | 1 | dbdiabetic | | glucose_serum_test_results | VARCHAR | Copy column |
| a1c_test_results | Indicates the range of the result or if the test was not taken. Values: “>8” if the result was greater than 8%, “>7” if the result was greater than 7% but less than 8%, “normal” if the result was less than 7%, and “none” if not measured. | VARCHAR | 45 | None | 1 | dbdiabetic | | a1c_test_results | VARCHAR | Copy column |
| change_of_medication | Indicates if there was a change in diabetic medications (either dosage or generic name). Values: “change” and “no change” | VARCHAR | 45 | Ch | 1 | dbdiabetic | | change_of_medication | VARCHAR | Copy column |

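Several ETL rules in the table above share one pattern: delete staging rows where a column holds the `?` placeholder. The sketch below shows how that pattern could be applied in one pass. It is only an illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in for the MySQL staging database, the helper names `drop_placeholder_rows` and `build_demo_connection` are hypothetical, and the demo rows are made up.

```python
import sqlite3

# Columns whose ETL rule in the table above deletes rows holding the '?' placeholder.
PLACEHOLDER_COLUMNS = [
    "race", "payer_code", "medical_specialty", "diag_1", "diag_2", "diag_3",
]


def drop_placeholder_rows(conn, columns=PLACEHOLDER_COLUMNS):
    """Delete staging rows where any listed column equals '?'; return rows removed."""
    removed = 0
    for column in columns:
        # Column names come from the fixed list above, never from user input.
        cur = conn.execute(
            f"DELETE FROM diabetes_dwh_staging.dataset_modified WHERE {column} = ?",
            ("?",),
        )
        removed += cur.rowcount
    conn.commit()
    return removed


def build_demo_connection():
    """Assumption: SQLite stands in for MySQL; the rows below are invented."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so the schema-qualified name resolves.
    conn.execute("ATTACH DATABASE ':memory:' AS diabetes_dwh_staging")
    conn.execute(
        "CREATE TABLE diabetes_dwh_staging.dataset_modified ("
        "encounter_id INTEGER, race TEXT, payer_code TEXT, "
        "medical_specialty TEXT, diag_1 TEXT, diag_2 TEXT, diag_3 TEXT)"
    )
    rows = [
        (1, "Caucasian", "MC", "Cardiology", "250.7", "411", "486"),
        (2, "?", "MC", "Cardiology", "250.7", "411", "486"),
        (3, "AfricanAmerican", "?", "Surgery-General", "428", "?", "250"),
        (4, "Hispanic", "MC", "Surgery-General", "428", "411", "?"),
    ]
    conn.executemany(
        "INSERT INTO diabetes_dwh_staging.dataset_modified VALUES (?, ?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo_connection()
    print("rows removed:", drop_placeholder_rows(demo))
```

Running the deletes column by column mirrors the individual rules in the ETL Rule column; against the real staging database the same statements would be issued through a MySQL client instead.
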
## Step 01 - Create Schema for Staging Area
Use the following query to create the database named '*diabetes_dwh_staging*' and its tables.
- Database: diabetes_dwh_staging