Custom PertData new_data_process error #62

Open
Yonggie opened this issue Apr 3, 2024 · 1 comment

Yonggie commented Apr 3, 2024

According to the custom data tutorial:

(2) Create your own Perturb-Seq data
Prepare a scanpy adata object with:
  • adata.obs dataframe has condition and cell_type columns, where condition is the perturbation name for each cell. Control cells have condition format of ctrl, single perturbation has condition format of A+ctrl or ctrl+A, combination perturbation has condition format of A+B.
  • adata.var dataframe has gene_name column, where each gene name is the gene symbol.
  • adata.X stores the post-perturbed gene expression.
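The label rules quoted above can be checked before handing the object to the library. The following is a minimal sketch in plain Python; `check_conditions` is a hypothetical helper written for illustration, not part of any package, and it only tests the three label shapes the tutorial lists.

```python
def check_conditions(conditions):
    """Return a list of problems found in a collection of condition labels.

    Hypothetical helper: it only encodes the rules quoted from the tutorial
    ('ctrl', 'A+ctrl' / 'ctrl+A', 'A+B').
    """
    problems = []
    labels = set(conditions)
    # The tutorial expects control cells to be labeled exactly 'ctrl'.
    if "ctrl" not in labels:
        problems.append("no control cells labeled 'ctrl'")
    for label in sorted(labels):
        if label == "ctrl":
            continue
        parts = label.split("+")
        # Every non-control label needs exactly two non-empty parts.
        if len(parts) != 2 or not all(parts):
            problems.append(
                f"{label!r} is not of the form 'A+ctrl', 'ctrl+A', or 'A+B'"
            )
    return problems


for problem in check_conditions(["BHLHE40+pDS258", "*", "nan"]):
    print(problem)
```

With an AnnData object, the input could be something like `adata.obs['condition'].astype(str).unique()`.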

custom data

dataset download: https://zenodo.org/records/7041849/files/AdamsonWeissman2016_GSM2406675_10X001.h5ad?download=1

data

  • adata.obs.columns.values: ['perturbation', 'read count', 'UMI count', 'tissue_type', 'cell_line', 'cancer', 'disease', 'perturbation_type', 'celltype', 'organism', 'ncounts', 'ngenes', 'percent_mito', 'percent_ribo', 'nperts']
  • adata.var.columns.values: ['ensembl_id', 'ncounts', 'ncells']

processing code

import scanpy
from gears import PertData

adata = scanpy.read_h5ad('./AdamsonWeissman2016_GSM2406675_10X001.h5ad')
# modifications:
# 1. adata.obs['perturbation']: gene_compound => gene+compound, then rename to 'condition'
adata.obs['perturbation'] = adata.obs['perturbation'].str.replace('_', '+')
adata.obs.rename(columns={'perturbation': 'condition'}, inplace=True)
# 2. adata.obs['celltype'] => 'cell_type'
adata.obs.rename(columns={'celltype': 'cell_type'}, inplace=True)
# 3. adata.var['ensembl_id'] => 'gene_name'
adata.var.rename(columns={'ensembl_id': 'gene_name'}, inplace=True)

# condition must be of type str
adata.obs['condition'] = adata.obs['condition'].astype(str)

# pert_data was not defined in the original snippet; it is assumed here to be a
# GEARS PertData instance, with './data' as a placeholder data directory
pert_data = PertData('./data')
pert_data.new_data_process(dataset_name='AdW1', adata=adata)

error:

ValueError: reference = lymphoblasts_ctrl_1 needs to be one of groupby = ['lymphoblasts_62(mod)+pBA581_1+1', 'lymphoblasts_*_1', 'lymphoblasts_BHLHE40+pDS258_1+1', 'lymphoblasts_CREB1+pDS269_1+1', 'lymphoblasts_DDIT3+pDS263_1+1', 'lymphoblasts_EP300+pDS268_1+1', 'lymphoblasts_SNAI1+pDS266_1+1', 'lymphoblasts_SPI1+pDS255_1+1', 'lymphoblasts_ZNF326+pDS262_1+1', 'lymphoblasts_nan_1']
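The group names in this message look as if they are assembled as `<cell_type>_<condition>_<dose>`, with the dose part `1+1` when the condition contains two `+`-separated parts and `1` otherwise. That pattern is inferred from the error text alone, not from the library source, so the sketch below is only an illustration of why the lookup might fail: none of the condition labels is `ctrl`, so the reference group `lymphoblasts_ctrl_1` never appears among the groups.

```python
def group_key(cell_type, condition):
    # Assumption inferred from the error message: dose is "1+1" for
    # two-part labels such as "A+B", and "1" for everything else.
    dose = "1+1" if len(condition.split("+")) == 2 else "1"
    return f"{cell_type}_{condition}_{dose}"


# A few condition labels taken from the error message above.
conditions = ["62(mod)+pBA581", "*", "BHLHE40+pDS258", "nan"]
groups = [group_key("lymphoblasts", c) for c in conditions]

# The reference group that the error says is missing.
reference = group_key("lymphoblasts", "ctrl")
print(reference, "in groups:", reference in groups)
```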

Apart from condition, cell_type, gene_name, and X, what other preprocessing is required?
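One possible direction, going by the tutorial text quoted earlier, is to relabel the raw perturbation values so that control cells become `ctrl` and single perturbations become `GENE+ctrl`. The sketch below is plain Python and rests on several assumptions: `to_gears_condition` is a hypothetical helper, `CONTROL_GUIDE_pX` is a made-up placeholder (which raw label marks control cells has to be confirmed from the dataset's own documentation), raw labels are assumed to follow a `GENE_guide` pattern, and `*` and `nan` are assumed to mean unassigned cells.

```python
def to_gears_condition(raw, control_labels):
    """Map a raw perturbation label to 'ctrl', 'GENE+ctrl', or None.

    Hypothetical helper. None marks labels that are assumed to be
    unassigned cells and are candidates for filtering out.
    """
    if raw is None or raw in ("", "nan", "*"):
        return None
    if raw in control_labels:
        return "ctrl"
    # Assumes 'GENE_guide' naming, e.g. 'BHLHE40_pDS258' -> 'BHLHE40'.
    gene = raw.split("_")[0]
    return f"{gene}+ctrl"


controls = {"CONTROL_GUIDE_pX"}  # placeholder, not a real label from the dataset
for raw in ["BHLHE40_pDS258", "CONTROL_GUIDE_pX", "*", "nan"]:
    print(raw, "->", to_gears_condition(raw, controls))
```

Applied to the original `perturbation` column (before any `_` to `+` replacement), cells mapped to `None` would then need to be dropped. Separately, the tutorial asks for gene symbols in `gene_name`, while the code above fills that column with Ensembl IDs, which may also need attention.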

@bboyrush117

I am also having a similar issue with this dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE216595

[Attached image: Screenshot 2024-06-06 at 1 18 26 PM]
