diff --git a/README.md b/README.md
index 18144e2..d57d7a9 100644
--- a/README.md
+++ b/README.md
@@ -12,6 +12,45 @@ Attribute Info - https://www.hindawi.com/journals/bmri/2014/781670/tab1/
 # Build Data Warehouse
 ![Process](process.png)
 
+## ETL Document
+
+| | | Target | | | | | | Source | | |
+|---|---|---|---|---|---|---|---|---|---|---|
+| Column Name | Description | Data Type | Size | Example Value | SCD Type | Source System | Source Table | Source Field Name | Source Data Type | ETL Rule |
+| encounter_id | Unique identifier of an encounter | SMALLINT | | 2 | 1 | Derived | | encounter_id | INT | Surrogate key |
+| patient_sk | Source primary key | SMALLINT | | 5 | 1 | Derived | | | INT | Surrogate key |
+| test_sk | Source primary key | SMALLINT | | 10 | 1 | Derived | | | INT | Surrogate key |
+| medication_sk | Source primary key | SMALLINT | | 12 | 1 | Derived | | | INT | Surrogate key |
+| diagnosis_sk | Source primary key | SMALLINT | | 14 | 1 | Derived | | | INT | Surrogate key |
+| date_sk | Source primary key | SMALLINT | | 16 | 1 | Derived | | | INT | Surrogate key |
+| discharge_sk | Source primary key | SMALLINT | | 17 | 1 | Derived | | | INT | Surrogate key |
+| admissionDetail_sk | Source primary key | SMALLINT | | 67 | 1 | Derived | | | INT | Surrogate key |
+| time_in_hospital | Number of days the patient stayed in the hospital | INT | | 10 | 1 | dbdiabetic | patient | time_in_hospital | INT | |
+| num_lab_procedure | Number of lab tests performed during the encounter | INT | | 50 | 1 | dbdiabetic | | num_lab_procedure | INT | |
+| num_procedures | Number of procedures (other than lab tests) performed during the encounter | INT | | 5 | 1 | dbdiabetic | | num_procedures | INT | |
+| num_medication | Number of distinct generic names administered during the encounter | INT | | 12 | 1 | dbdiabetic | | num_medication | INT | |
+| number_outpatient | Number of outpatient visits of the patient in the year preceding the encounter | INT | | 0 | 1 | dbdiabetic | | number_outpatient | INT | |
+| number_emergency | Number of emergency visits of the patient in the year preceding the encounter | INT | | 1 | 1 | dbdiabetic | | number_emergency | INT | |
+| number_inpatient | Number of inpatient visits of the patient in the year preceding the encounter | INT | | 2 | 1 | dbdiabetic | | number_inpatient | INT | |
+| number_diagnoses | Number of diagnoses entered to the system | INT | | 7 | 1 | dbdiabetic | | number_diagnoses | INT | |
+| race | Values: Caucasian, Asian, African American, Hispanic, and other | VARCHAR | 45 | AfricanAmerican | 1 | dbdiabetic | patient | race | VARCHAR | DELETE FROM `diabetes_dwh_staging`.`dataset_modified` WHERE `race` = '?'; |
+| gender | Values: male, female, and unknown/invalid | VARCHAR | 45 | Male | 1 | dbdiabetic | patient | gender | VARCHAR | |
+| age | Grouped in 10-year intervals: [0, 10), [10, 20), …, [90, 100) | VARCHAR | 45 | [0-10) | 1 | dbdiabetic | patient | age | VARCHAR | Copy column |
+| weight | Weight in pounds. | VARCHAR | 45 | [50-75) | 1 | dbdiabetic | patient | weight | VARCHAR | Copy column |
+| patient_number | Unique identifier of a patient | VARCHAR | 45 | 8222157 | 1 | dbdiabetic | | patient_nbr | VARCHAR | |
+| discharge_dispositon | Integer identifier corresponding to 29 distinct values, for example, discharged to home, expired, and not available | VARCHAR | 45 | | 1 | dbdiabetic | | | VARCHAR | |
+| readmitted | Days to inpatient readmission. Values: “<30” if the patient was readmitted in less than 30 days, “>30” if the patient was readmitted in more than 30 days, and “No” for no record of readmission. | VARCHAR | 45 | >30 | 1 | dbdiabetic | | readmitted | VARCHAR | Copy column |
+| payer_code | Identifier corresponding to 23 distinct values, for example, Blue Cross/Blue Shield, Medicare, and self-pay | VARCHAR | 45 | MC | 1 | dbdiabetic | | | VARCHAR | DELETE FROM `diabetes_dwh_staging`.`dataset_modified` WHERE `payer_code` = '?'; |
+| admission_type | Admission type: emergency, urgent, elective, newborn, trauma center, not mapped, or null | VARCHAR | 45 | Emergency | 1 | dbdiabetic | | | VARCHAR | Copy column |
+| admission_source | Identifier corresponding to 21 distinct values, for example, physician referral, emergency room, and transfer from a hospital | VARCHAR | 45 | Clinic Referral | 1 | dbdiabetic | | | VARCHAR | Copy column |
+| medical_speciality | Identifier of the specialty of the admitting physician, corresponding to 84 distinct values, for example, cardiology, internal medicine, family/general practice, and surgeon | VARCHAR | 45 | Surgery-General | 1 | dbdiabetic | doctor | medical speciality | DOUBLE | DELETE FROM `diabetes_dwh_staging`.`dataset_modified` WHERE `medical_specialty` = '?'; |
+| diagnosis_1 | The primary diagnosis (coded as first three digits of ICD9); 848 distinct values | DOUBLE | 45 | 250.7 | 1 | dbdiabetic | | diag_1 | DOUBLE | DELETE FROM `diabetes_dwh_staging`.`dataset_modified` WHERE `diag_1` = '?'; |
+| diagnosis_2 | Secondary diagnosis (coded as first three digits of ICD9); 923 distinct values | DOUBLE | 45 | 411 | 1 | dbdiabetic | | diag_2 | DOUBLE | DELETE FROM `diabetes_dwh_staging`.`dataset_modified` WHERE `diag_2` = '?'; |
+| diagnosis_3 | Additional secondary diagnosis (coded as first three digits of ICD9); 954 distinct values | DOUBLE | 45 | 486 | 1 | dbdiabetic | | diag_3 | VARCHAR | DELETE FROM `diabetes_dwh_staging`.`dataset_modified` WHERE `diag_3` = '?'; |
+| glucose_serum_test_results | Indicates the range of the result or if the test was not taken. Values: “>200,” “>300,” “normal,” and “none” if not measured | VARCHAR | 45 | None | 1 | dbdiabetic | | glucose_serum_test_results | VARCHAR | Copy column |
+| a1c_test_results | Indicates the range of the result or if the test was not taken. Values: “>8” if the result was greater than 8%, “>7” if the result was greater than 7% but less than 8%, “normal” if the result was less than 7%, and “none” if not measured. | VARCHAR | 45 | None | 1 | dbdiabetic | | a1c_test_results | VARCHAR | Copy column |
+| change_of_medication | Indicates if there was a change in diabetic medications (either dosage or generic name). Values: “change” and “no change” | VARCHAR | 45 | Ch | 1 | dbdiabetic | | change_of_medication | VARCHAR | Copy column |
+
 ## Step 01 - Create Schema for Staging Area
 Use following query to create the database named '*diabetes_dwh_staging*' and tables.
 - Database: diabetes_dwh_staging