Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
  • Loading branch information
Jenny Smith committed Feb 6, 2019
2 parents 3c1ceda + 540b630 commit 134a90c
Show file tree
Hide file tree
Showing 23 changed files with 612 additions and 1 deletion.
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -9,4 +9,5 @@ AML_assay_clinical.csv

scripts/AML_assay_clinical.csv
scripts/NBL_assay_clinical.csv
scripts/WT_assay_clinical.csv
scripts/WT_assay_clinical.csv
.Rproj.user
1 change: 1 addition & 0 deletions BuieRProj/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
*.nb.html
13 changes: 13 additions & 0 deletions BuieRProj/BuieRProj.Rproj
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX
122 changes: 122 additions & 0 deletions BuieRProj/KM_WHATEVER.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,122 @@
---
title: "R Notebook"
output: html_notebook
---

```{r setup}
library(xlsx)
library(MASS)
library(stats)
library(useful)
library(ggplot2)
```

# Aim

Identfiy relationships between categories of AML and genetic assay results

# Background

Acute Myeloid Leukemia is a cancer affecting blood stem cells. Onset is sudden and affected cells can outcompete normal cells, resulted in complications and death. AML has multiple variants. Unlike staging used in many cancers AML affected cells are typed as part of treatment. These types serve to identify phenotypes of expression of the disease. Treatment is similar to other cancerns, involving radiation and chemotherapy. Relapse affects 50% of cases.

```{r read_data}
#reading a full data set of patients (these are later subsetted to patients with assays
{
PatientList <- read.xlsx("./data/TARGET_AML_ClinicalData_20160714.xlsx", sheetIndex = 1, stringsAsFactors = FALSE)
PatientAndPhenotypeOnly <- PatientList
# removing non mutation phenotyping information from DF
PatientAndPhenotypeOnly[,c("Gender","Race",
"Ethnicity",
"Age.at.Diagnosis.in.Days",
"First.Event",
"Event.Free.Survival.Time.in.Days",
"Vital.Status",
"Overall.Survival.Time.in.Days",
"Year.of.Diagnosis",
"Year.of.Last.Follow.Up",
"Protocol",
"WBC.at.Diagnosis",
"Bone.marrow.leukemic.blast.percentage....",
"Peripheral.blasts....",
"CNS.disease",
"Chloroma",
"FAB.Category",
"Cytogenetic.Code.Other",
"Cytogenetic.Complexity",
"Primary.Cytogenetic.Code",
"ISCN",
"MRD.at.end.of.course.1",
"MRD...at.end.of.course.1",
"MRD.at.end.of.course.2",
"MRD...at.end.of.course.2",
"CR.status.at.end.of.course.1" ,
"CR.status.at.end.of.course.2" ,
"Risk.group",
"SCT.in.1st.CR",
"Bone.Marrow.Site.of.Relapse.Induction.Failure",
"CNS.Site.of.Relapse.Induction.Failure" ,
"Chloroma.Site.of.Relapse.Induction.Failure",
"Cytogenetic.Site.of.Relapse.Induction.Failure",
"Other.Site.of.Relapse.Induction.Failure",
"Comment",
"Refractory.Timepoint.sent.for.Induction.Failure.Project")] <- NULL
PatientAndPhenotypeOnly[PatientAndPhenotypeOnly == "Unknown" | PatientAndPhenotypeOnly == "N/A" | PatientAndPhenotypeOnly == "Not done"] <- 0
PatientAndPhenotypeOnly[PatientAndPhenotypeOnly == "No" |PatientAndPhenotypeOnly == "NO" ] <- 0
PatientAndPhenotypeOnly[PatientAndPhenotypeOnly == "Yes"| PatientAndPhenotypeOnly == "YES"] <- 1
PatientAndPhenotypeOnly[-1] <- lapply(PatientAndPhenotypeOnly[-1], as.numeric)
}
#reading compiled patient data
{
CleanedWithGenes <- read.csv("./data/AML_assay_clinical.csv")
}
```

# Data

Our data included 993 patients with AML. Each patient has demographic diagnostic and outcome data. Phenotyping and clinical data were not consistantly available in any case. Genetic assay data were available for 187 patients.

## Assumptions

Where phenotyping information were not available, we assumed the observation to be "not there" instead of NA. This was to

# Analysis

Our analysis took two phases, grouping of cellular classifications into phenotypes, and identification of differences in genetic assay results between phenotypes.

## Identifying Phenotypes: K-means

Data from all 993 patients were used to identify phenotypes. Patients had a variety of tests for cell mutation typing, including positive and negatives. However, all patients also had NA values across some tests. This aligns with clinical practice where a provider focuses on testing for certain types and, once an optimal course of treatment is identified, has no need to further specify the types of mutation. Because a step of clinical expertise was involved, we decided to identify all NA values has 0 (not found), under the assumption that, more likely than not, something not tested for was not expected by the expert.

Notably, rather than informing phenotypes through clinical or biological knowledge, we used the k-means unsupervised classification to group the available phenotype data. A synsitivity analysis of K= 20 to K=4 identified clustering of classes into groups of 4 across all values of K. Therefore the model of K = 4 was used to assign classes to the data.

```{r kmeans_clustering_analysis}
KMeansReview <- PatientAndPhenotypeOnly
unlink("plots/kmeans/*.jpg")
for(i in 20:4) {
km.PatientPheno <- kmeans(PatientAndPhenotypeOnly[-1],i, nstart = 25)
KMeansReview[,paste(i,"_k", sep = "")] <- km.PatientPheno$cluster
plot(km.PatientPheno)+ggsave(paste("plots/kmeans/",i,".jpg", sep = ""))
}
```

```{r attaching_clusters_cleaned_data }
GroupMatching <- KMeansReview[,c("TARGET.USI","4_k")]
Clea
```



474 changes: 474 additions & 0 deletions BuieRProj/KM_WHATEVER.nb.html

Large diffs are not rendered by default.

Binary file not shown.
Binary file added BuieRProj/plots/kmeans/10.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added BuieRProj/plots/kmeans/11.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added BuieRProj/plots/kmeans/12.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added BuieRProj/plots/kmeans/13.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added BuieRProj/plots/kmeans/14.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added BuieRProj/plots/kmeans/15.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added BuieRProj/plots/kmeans/16.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added BuieRProj/plots/kmeans/17.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added BuieRProj/plots/kmeans/18.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added BuieRProj/plots/kmeans/19.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added BuieRProj/plots/kmeans/20.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added BuieRProj/plots/kmeans/4.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added BuieRProj/plots/kmeans/5.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added BuieRProj/plots/kmeans/6.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added BuieRProj/plots/kmeans/7.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added BuieRProj/plots/kmeans/8.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added BuieRProj/plots/kmeans/9.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit 134a90c

Please sign in to comment.