Cell2location 是一个原则性的贝叶斯模型,它可以解析空间转录数据中的细胞类型,并创建不同组织的全面细胞图。Cell2location 解释了变异的技术来源,并借用了不同位置的统计强度,从而使单细胞和空间转录组的集成比现有工具具有更高的灵敏度和分辨率。这是通过估计哪种细胞类型的组合,其细胞丰度可以给出空间数据中的mRNA counts数,同时模拟技术效应(平台/技术效应,污染 RNA,无法解释的方差)来实现的。
Cell2location需要未转化、未标准化的空间mRNA counts数作为输入;还需要提供给Cell2location每个位置被期望的细胞丰度,这是用来作为一个估计完整细胞丰度的先导,这个数值取决于组织,可以通过计算配对组织学图像中少数部位的细胞核来估计,可以是近似的。
Cell2location提供了从 scRNA-seq 数据中估计参考细胞类型特征的两种方法:
cell2location.cluster_averages.compute_cluster_averages)。当批次效应很小时,这种计算每个集群平均值的更快的硬编码方法提供了同样高的准确性。我们还建议对于非 UMI 技术使用硬编码方法,如 Smart-Seq 2。
import sys
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB and branch == "stable":
!pip install --quiet git+https://github.com/BayraktarLab/cell2location#egg=cell2location[tutorials]
import scanpy as sc
import anndata
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import cell2location
import scvi
from matplotlib import rcParams
rcParams['pdf.fonttype'] = 42 # enables correct plotting of text
import seaborn as sns
#定义分析结果保存在哪
results_folder = './results/lymph_nodes_analysis/'
# create paths and names to results folders for reference regression and cell2location models
ref_run_name = f'{results_folder}/reference_signatures'
run_name = f'{results_folder}/cell2location_map'
#通过scanpy导入公开发表的淋巴结空间转录组数据
adata_vis = sc.datasets.visium_sge(sample_id="V1_Human_Lymph_Node")
adata_vis.obs['sample'] = list(adata_vis.uns['spatial'].keys())[0]
# rename genes to ENSEMBL
adata_vis.var['SYMBOL'] = adata_vis.var_names#_names:提取行名
adata_vis.var_names = adata_vis.var['gene_ids']
adata_vis.var_names.name = None#_names.name = None将名称设为无
注:线粒体编码基因(基因名称以 MT 或 MT-开头)对空间定位无关,因为它们的表达代表了单个细胞和细胞核数据中的技术性假象,而不是线粒体的生物丰度。然而,这些基因在每个位置的mRNA中占15-40% 。因此,为了避免绘制伪影,强烈建议移除线粒体基因。
# find mitochondria-encoded (MT) genes
adata_vis.var['MT_gene'] = [gene.startswith('MT-') for gene in adata_vis.var['SYMBOL']]
# remove MT genes for spatial mapping (keeping their counts in the object)
adata_vis.obsm['MT'] = adata_vis[:, adata_vis.var['MT_gene'].values].X.toarray()#传递了稀疏矩阵,但需要密集数据。使用X.toarray()转换为密集的numpy数组
adata_vis = adata_vis[:, ~adata_vis.var['MT_gene'].values]
已发表的淋巴结 scRNA-seq 数据集通常缺乏足够的代表性的生发中心相关的免疫细胞群体,由于患者捐赠者的年龄。因此,我们将涵盖淋巴结、脾脏和扁桃体的 scRNA-seq 数据集包含在单细胞参考数据中,以确保在空间转录数据集中能够捕捉到可能存在的免疫细胞状态的全部多样性。
在这里,我们下载这个数据集,导入到anndata并改变变量名称为 ENSEMBL 基因标识符。
# Download data if not already here
import os
if not os.path.exists('./data/sc.h5ad'):
!cd ./data/ && wget https://cell2location.cog.sanger.ac.uk/paper/integrated_lymphoid_organ_scrna/RegressionNBV4Torch_57covariates_73260cells_10237genes/sc.h5ad
# Read data
adata_ref = sc.read(f'./data/sc.h5ad')
# Use ENSEMBL as gene IDs to make sure IDs are unique and correctly matched
adata_ref.var['SYMBOL'] = adata_ref.var.index
adata_ref.var.index = adata_ref.var['GeneID-2'].copy()
adata_ref.var_names = adata_ref.var['GeneID-2'].copy()
adata_ref.var.index.name = None
adata_ref.raw.var['SYMBOL'] = adata_ref.raw.var.index
adata_ref.raw.var.index = adata_ref.raw.var['GeneID-2'].copy()
adata_ref.raw.var.index.name = None
# before we estimate the reference cell type signature we recommend to perform very permissive genes selection
# in this 2D histogram orange rectangle lays over excluded genes.
# In this case, the downloaded dataset was already filtered using this method,
# hence no density under the orange rectangle
from cell2location.utils.filtering import filter_genes
selected = filter_genes(adata_ref, cell_count_cutoff=5, cell_percentage_cutoff2=0.03, nonz_mean_cutoff=1.12)
# filter the object
adata_ref = adata_ref[:, selected].copy()

# prepare anndata for the regression model
scvi.data.setup_anndata(adata=adata_ref,
# 10X reaction / sample / batch
batch_key='Sample',
# cell type, covariate used for constructing signatures
labels_key='Subset',
# multiplicative technical effects (platform, 3' vs 5', donor effect)
categorical_covariate_keys=['Method']
)
scvi.data.view_anndata_setup(adata_ref)



# create and train the regression model
from cell2location.models import RegressionModel
mod = RegressionModel(adata_ref)
# Use all data for training (validation not implemented yet, train_size=1)
mod.train(max_epochs=250, batch_size=2500, train_size=1, lr=0.002, use_gpu=True)
# plot ELBO loss history during training, removing first 20 epochs from the plot
mod.plot_history(20)

# In this section, we export the estimated cell abundance (summary of the posterior distribution).
adata_ref = mod.export_posterior(
adata_ref, sample_kwargs={'num_samples': 1000, 'batch_size': 2500, 'use_gpu': True}
)
# Save model
mod.save(f"{ref_run_name}", overwrite=True)
# Save anndata object with results
adata_file = f"{ref_run_name}/sc.h5ad"
adata_ref.write(adata_file)
adata_file
mod.plot_QC()

#模型和输出 h5ad 可以像下面这样加载:
mod = cell2location.models.RegressionModel.load(f"{ref_run_name}", adata_ref)
adata_file = f"{ref_run_name}/sc.h5ad"
adata_ref = sc.read_h5ad(adata_file)
# export estimated expression in each cluster
if 'means_per_cluster_mu_fg' in adata_ref.varm.keys():
inf_aver = adata_ref.varm['means_per_cluster_mu_fg'][[f'means_per_cluster_mu_fg_{i}'
for i in adata_ref.uns['mod']['factor_names']]].copy()
else:
inf_aver = adata_ref.var[[f'means_per_cluster_mu_fg_{i}'
for i in adata_ref.uns['mod']['factor_names']]].copy()
inf_aver.columns = adata_ref.uns['mod']['factor_names']
inf_aver.iloc[0:5, 0:5]

# find shared genes and subset both anndata and reference signatures
intersect = np.intersect1d(adata_vis.var_names, inf_aver.index)
adata_vis = adata_vis[:, intersect].copy()
inf_aver = inf_aver.loc[intersect, :].copy()
# prepare anndata for cell2location model
scvi.data.setup_anndata(adata=adata_vis, batch_key="sample")
scvi.data.view_anndata_setup(adata_vis)

n_cells_per_location 和 detection_alpha)detection_alpha 超参数的默认值,但是将预期的细胞丰度 n_cells per_location调整到每个组织是有用的。n_cells_ per_location = 30)更改为在您的组织中观察到的值。
# create and train the model
mod = cell2location.models.Cell2location(
adata_vis, cell_state_df=inf_aver,
# the expected average cell abundance: tissue-dependent
# hyper-prior which can be estimated from paired histology:
N_cells_per_location=30,
# hyperparameter controlling normalisation of
# within-experiment variation in RNA detection (using default here):
detection_alpha=200
)
mod.train(max_epochs=30000,
# train using full data (batch_size=None)
batch_size=None,
# use all data points in training because
# we need to estimate cell abundance at all locations
train_size=1,
use_gpu=True)
# plot ELBO loss history during training, removing first 100 epochs from the plot
mod.plot_history(1000)
plt.legend(labels=['full data training']);

# In this section, we export the estimated cell abundance (summary of the posterior distribution).
adata_vis = mod.export_posterior(
adata_vis, sample_kwargs={'num_samples': 1000, 'batch_size': mod.adata.n_obs, 'use_gpu': True}
)
# Save model
mod.save(f"{run_name}", overwrite=True)
# mod = cell2location.models.Cell2location.load(f"{run_name}", adata_vis)
# Save anndata object with results
adata_file = f"{run_name}/sp.h5ad"
adata_vis.write(adata_file)
adata_file

#模型和输出 h5ad 可以像下面这样加载:
mod = cell2location.models.Cell2location.load(f"{run_name}", adata_vis)
adata_file = f"{run_name}/sp.h5ad"
adata_vis = sc.read_h5ad(adata_file)
# Examine reconstruction accuracy to assess if there are any issues with mapping
# the plot should be roughly diagonal, strong deviations will signal problems
mod.plot_QC()

当整合多个空间批次时和当处理载玻片中检测到的 RNA 差异很大的数据集时(这在组织学上无法用高细胞密度来解释) ,评估 cell2location 是否使这些影响恢复正常是很重要的。
期望看到的是不同批次中有相似的总细胞数和不同的RNA检测灵敏度(这都可以通过cell2location评估)。
期望组织学中的总细胞数量反映出高细胞密度。
# add 5% quantile, representing confident cell abundance, 'at least this amount is present',
# to adata.obs with nice names for plotting
adata_vis.obs[adata_vis.uns['mod']['factor_names']] = adata_vis.obsm['q05_cell_abundance_w_sf']
# select one slide
from cell2location.utils import select_slide
slide = select_slide(adata_vis, 'V1_Human_Lymph_Node')
# plot in spatial coordinates
with mpl.rc_context({'axes.facecolor': 'black','figure.figsize': [4.5, 5]}):
sc.pl.spatial(slide, cmap='magma',
# show first 8 cell types
color=['B_Cycling', 'B_GC_LZ', 'T_CD4+_TfH_GC', 'FDC','B_naive', 'T_CD4+_naive', 'B_plasma', 'Endo'],
ncols=4, size=1.3,
img_key='hires',
# limit color scale at 99.2% quantile of cell abundance
vmin=0, vmax='p99.2',
save=True
)

# Now we use cell2location plotter that allows showing multiple cell types in one panel
from cell2location.plt import plot_spatial
# select up to 6 clusters
clust_labels = ['T_CD4+_naive', 'B_naive', 'FDC']
clust_col = ['' + str(i) for i in clust_labels] # in case column names differ from labels
slide = select_slide(adata_vis, 'V1_Human_Lymph_Node')
import matplotlib.pyplot as plt
with mpl.rc_context({'figure.figsize': (15, 15)}):
fig = plot_spatial(
adata=slide,
# labels to show on a plot
color=clust_col, labels=clust_labels,
show_img=True,
# 'fast' (white background) or 'dark_background'
style='fast',
# limit color scale at 99.2% quantile of cell abundance
max_color_quantile=0.992,
# size of locations (adjust depending on figure size)
circle_diameter=6,
colorbar_position='right'
)
plt.savefig('showing_multiple_cell_types_in_one_panel.pdf')

我们通过使用由cell2location估计的细胞丰度来确定细胞组成不同的组织区域。
我们通过估计每种细胞类型的细胞丰度来聚集 Visium 点,从而找到组织区域。
首先构造了一个 k 邻近的 neigbour (KNN)图表示细胞丰度估计值中位置的相似性,然后应用 Leiden 聚类算法进行聚类。邻近区域的数量应适应数据集的大小和解剖学定义区域的大小(例如海马区域相当小,因此可能被大n_neighbors所掩盖)。这可以做一个范围的 KNN 近邻和Leiden聚类分辨率,直到一个聚类匹配的组织解剖结构获得。
聚类是跨所有 Visium 部分/批次共同完成的,因此区域身份是直接可比的。当多个批次之间存在强烈的技术效应时(这里不是这种情况) ,sc.external.pp.bbknn 原则上可以用来解释 KNN 构建过程中的这些效应。
生成的集群保存在 adata _ vis. obs [‘ region _ cluster’]中
# compute KNN using the cell2location output stored in adata.obsm
sc.pp.neighbors(adata_vis, use_rep='q05_cell_abundance_w_sf',
n_neighbors = 15)
# Cluster spots into regions using scanpy
sc.tl.leiden(adata_vis, resolution=1.1)
# add region as categorical variable
adata_vis.obs["region_cluster"] = adata_vis.obs["leiden"].astype("category")
# compute UMAP using KNN graph based on the cell2location output
sc.tl.umap(adata_vis, min_dist = 0.3, spread = 1)
# show regions in UMAP coordinates
with mpl.rc_context({'axes.facecolor': 'white',
'figure.figsize': [8, 8]}):
sc.pl.umap(adata_vis, color=['region_cluster'], size=30,
color_map = 'RdPu', ncols = 2, legend_loc='on data',
legend_fontsize=20)
sc.pl.umap(adata_vis, color=['sample'], size=30,
color_map = 'RdPu', ncols = 2,
legend_fontsize=20)
# plot in spatial coordinates
with mpl.rc_context({'axes.facecolor': 'black',
'figure.figsize': [4.5, 5]}):
sc.pl.spatial(adata_vis, color=['region_cluster'],
size=1.3, img_key='hires', alpha=0.5)


在这里,我们使用 cell2location 映射结果来确定细胞类型的空间共存,以便更好地理解组织组织和预测细胞相互作用。我们用cell2location进行了细胞类型丰度的非负矩阵分解估计。与将 NMF 应用于常规 scRNA-seq 的有点相似,加性 NMF 分解产生了一组空间细胞类型丰度分布图,这些分布图可以捕捉共同定位的细胞类型。这种基于NMF 的分解自然地解释了多种细胞类型和微环境可以在相同的 Visium 位置共存的事实 ,同时跨组织区域(例如个体生发中心)共享信息。
提示在实际应用中,最好对一系列因子 R = 5,..,30进行 NMF 训练。并选择 R作为捕获细小组织区域和分裂已知部分之间的平衡。如果你想找到一些最独特的细胞区间,使用小一点的R。如果你想找到非常强的同位信号,并假设大多数细胞类型不同位,使用大一点的R。
下面我们将展示如何执行此分析。为了帮助这个分析,我们把这个分析包装成一个管道,这个管道可以自动训练不同R的 NMF 模型:
from cell2location import run_colocation
res_dict, adata_vis = run_colocation(
adata_vis,
model_name='CoLocatedGroupsSklearnNMF',
train_args={
'n_fact': np.arange(11, 13), # IMPORTANT: use a wider range of the number of factors (5-30)
'sample_name_col': 'sample', # columns in adata_vis.obs that identifies sample
'n_restarts': 3 # number of training restarts
},
export_args={'path': f'{run_name}/CoLocatedComb/'}
)

cell_type_fractions_heatmap/:a dot plot of the estimated NMF weights of cell types (rows) across NMF components (columns)cell_type_fractions_mean/:the data used for dot plotfactor_markers/: tables listing top 10 cell types most speficic to each NMF factormodels/:saved NMF modelspredictive_accuracy/:2D histogram plot showing how well NMF explains cell2location outputspatial/:NMF weights across locatinos in spatial coordinateslocation_factors_mean/:the data used for the plot in spatial coordiantesstability_plots/:stability of NMF weights between training restartscell _ type _ fraction _ heatmap/中的文件,其中显示了 NMF 组件(列)中对应于细胞间隔的各种类型(行)的估计 NMF 权重的点图。显示的是相对权重,每个单元类型的组件之间的规范化。
# Here we plot the NMF weights (Same as saved to `cell_type_fractions_heatmap`)
res_dict['n_fact12']['mod'].plot_cell_type_loadings()

为此,我们采用了由 Cable 等人提出的估计条件期望表达式的方法。使用 cell2location,我们可以查看后验概率分布,而不仅仅是细胞型特定表达式的估计值
# Compute expected expression per cell type
expected_dict = mod.module.model.compute_expected_per_cell_type(
mod.samples["post_sample_q05"], mod.adata
)
# Add to anndata layers
for i, n in enumerate(mod.factor_names_):
adata_vis.layers[n] = expected_dict['mu'][i]
# Save anndata object with results
adata_file = f"{run_name}/sp.h5ad"
adata_vis.write(adata_file)
adata_file

# Look at cell type specific expression in spatial coordinates,
# Here we highlight CD3D, pan T-cell marker expressed by
# 2 subtypes of T cells in distinct locations but not expressed by co-located B cells
ctypes = ['T_CD4+_TfH_GC', 'T_CD4+_naive', 'B_GC_LZ']
genes = ['CD3D', 'CR2']
with mpl.rc_context({'axes.facecolor': 'black'}):
# select one slide
slide = select_slide(adata_vis, 'V1_Human_Lymph_Node')
from tutorial_utils import plot_genes_per_cell_type
plot_genes_per_cell_type(slide, genes, ctypes);

# Get posterior distribution samples for specific variables
samples_w_sf = mod.sample_posterior(num_samples=1000, use_gpu=True, return_samples=True,
batch_size=2020,
return_sites=['w_sf', 'm_g', 'u_sf_mRNA_factors'])
# samples_w_sf['posterior_samples'] contains 1000 samples as arrays with dim=(num_samples, ...)
samples_w_sf['posterior_samples']['w_sf'].shape

# Compute any quantile of the posterior distribution
medians = mod.posterior_quantile(q=0.5, use_gpu = True)
with mpl.rc_context({'axes.facecolor': 'white',
'figure.figsize': [5, 5]}):
plt.scatter(medians['w_sf'].flatten(), mod.samples['post_sample_means']['w_sf'].flatten());
plt.xlabel('median');
plt.ylabel('mean');

我在我的Rails3示例应用程序上使用CarrierWave。我想验证远程位置上传,因此当用户提交无效URL(空白或非图像)时,我不会收到标准错误异常:CarrierWave::DownloadErrorinImageController#createtryingtodownloadafilewhichisnotservedoverHTTP这是我的模型:classPaintingtrue,:length=>{:minimum=>5,:maximum=>100}validates:image,:presence=>trueend这是我的Controller:classPaintingsC
目录前言滤波电路科普主要分类实际情况单位的概念常用评价参数函数型滤波器简单分析滤波电路构成低通滤波器RC低通滤波器RL低通滤波器高通滤波器RC高通滤波器RL高通滤波器部分摘自《LC滤波器设计与制作》,侵权删。前言最近需要学习放大电路和滤波电路,但是由于只在之前做音乐频谱分析仪的时候简单了解过一点点运放,所以也是相当从零开始学习了。滤波电路科普主要分类滤波器:主要是从不同频率的成分中提取出特定频率的信号。有源滤波器:由RC元件与运算放大器组成的滤波器。可滤除某一次或多次谐波,最普通易于采用的无源滤波器结构是将电感与电容串联,可对主要次谐波(3、5、7)构成低阻抗旁路。无源滤波器:无源滤波器,又称
最近在学习CAN,记录一下,也供大家参考交流。推荐几个我觉得很好的CAN学习,本文也是在看了他们的好文之后做的笔记首先是瑞萨的CAN入门,真的通透;秀!靠这篇我竟然2天理解了CAN协议!实战STM32F4CAN!原文链接:https://blog.csdn.net/XiaoXiaoPengBo/article/details/116206252CAN详解(小白教程)原文链接:https://blog.csdn.net/xwwwj/article/details/105372234一篇易懂的CAN通讯协议指南1一篇易懂的CAN通讯协议指南1-知乎(zhihu.com)视频推荐CAN总线个人知识总
深度学习部署:Windows安装pycocotools报错解决方法1.pycocotools库的简介2.pycocotools安装的坑3.解决办法更多Ai资讯:公主号AiCharm本系列是作者在跑一些深度学习实例时,遇到的各种各样的问题及解决办法,希望能够帮助到大家。ERROR:Commanderroredoutwithexitstatus1:'D:\Anaconda3\python.exe'-u-c'importsys,setuptools,tokenize;sys.argv[0]='"'"'C:\\Users\\46653\\AppData\\Local\\Temp\\pip-instal
我完全不是程序员,正在学习使用Ruby和Rails框架进行编程。我目前正在使用Ruby1.8.7和Rails3.0.3,但我想知道我是否应该升级到Ruby1.9,因为我真的没有任何升级的“遗留”成本。缺点是什么?我是否会遇到与普通gem的兼容性问题,或者甚至其他我不太了解甚至无法预料的问题? 最佳答案 你应该升级。不要坚持从1.8.7开始。如果您发现不支持1.9.2的gem,请避免使用它们(因为它们很可能不被维护)。如果您对gem是否兼容1.9.2有任何疑问,您可以在以下位置查看:http://www.railsplugins.or
如何学习ruby的正则表达式?(对于假人) 最佳答案 http://www.rubular.com/在Ruby中使用正则表达式时是一个很棒的工具,因为它可以立即将结果可视化。 关于ruby-我如何学习ruby的正则表达式?,我们在StackOverflow上找到一个类似的问题: https://stackoverflow.com/questions/1881231/
深度学习12.CNN经典网络VGG16一、简介1.VGG来源2.VGG分类3.不同模型的参数数量4.3x3卷积核的好处5.关于学习率调度6.批归一化二、VGG16层分析1.层划分2.参数展开过程图解3.参数传递示例4.VGG16各层参数数量三、代码分析1.VGG16模型定义2.训练3.测试一、简介1.VGG来源VGG(VisualGeometryGroup)是一个视觉几何组在2014年提出的深度卷积神经网络架构。VGG在2014年ImageNet图像分类竞赛亚军,定位竞赛冠军;VGG网络采用连续的小卷积核(3x3)和池化层构建深度神经网络,网络深度可以达到16层或19层,其中VGG16和VGG
文章目录1、自相关函数ACF2、偏自相关函数PACF3、ARIMA(p,d,q)的阶数判断4、代码实现1、引入所需依赖2、数据读取与处理3、一阶差分与绘图4、ACF5、PACF1、自相关函数ACF自相关函数反映了同一序列在不同时序的取值之间的相关性。公式:ACF(k)=ρk=Cov(yt,yt−k)Var(yt)ACF(k)=\rho_{k}=\frac{Cov(y_{t},y_{t-k})}{Var(y_{t})}ACF(k)=ρk=Var(yt)Cov(yt,yt−k)其中分子用于求协方差矩阵,分母用于计算样本方差。求出的ACF值为[-1,1]。但对于一个平稳的AR模型,求出其滞
写在之前Shader变体、Shader属性定义技巧、自定义材质面板,这三个知识点任何一个单拿出来都是一套知识体系,不能一概而论,本文章目的在于将学习和实际工作中遇见的问题进行总结,类似于网络笔记之用,方便后续回顾查看,如有以偏概全、不祥不尽之处,还望海涵。1、Shader变体先看一段代码......Properties{ [KeywordEnum(on,off)]USL_USE_COL("IsUseColorMixTex?",int)=0 [Toggle(IS_RED_ON)]_IsRed("IsRed?",int)=0}......//中间省略,后续会有完整代码 #pragmamulti_c
TCL脚本语言简介•TCL(ToolCommandLanguage)是一种解释执行的脚本语言(ScriptingLanguage),它提供了通用的编程能力:支持变量、过程和控制结构;同时TCL还拥有一个功能强大的固有的核心命令集。TCL经常被用于快速原型开发,脚本编程,GUI和测试等方面。•实际上包含了两个部分:一个语言和一个库。首先,Tcl是一种简单的脚本语言,主要使用于发布命令给一些互交程序如文本编辑器、调试器和shell。由于TCL的解释器是用C\C++语言的过程库实现的,因此在某种意义上我们又可以把TCL看作C库,这个库中有丰富的用于扩展TCL命令的C\C++过程和函数,所以,Tcl是