
BLUEPRINT hematopoietic transcriptomes atlas
Results
Transcriptome complexity of hematopoietic cell types
We isolated 90 samples (Figure 1A and B, Online Supplementary Table S1) from 72 whole blood and cord blood donations, either by magnetic bead separation or by fluorescence-activated cell sorting (Online Supplementary Methods). Total RNA data were generated from the following 27 cell types: erythroblasts (EB), megakaryocytes (MK), platelets (PLT), eosinophils (EOS), basophils (BAS), neutrophils (NEU), monocytes (MONO), non-activated macrophages (M0), lipopolysaccharide-activated macrophages (M1), alternatively activated macrophages (M2), dendritic cells (DC), naive CD4 lymphocytes (CD4 naive), central memory CD4 lymphocytes (CD4 CM), effector memory CD4 lymphocytes (CD4 EM), regulatory CD4 lymphocytes (TREG), naive CD8 lymphocytes (CD8 naive), central memory CD8 lymphocytes (CD8 CM), effector memory CD8 lymphocytes (CD8 EM), terminally differentiated effector memory CD8 lymphocytes (CD8 TDEM), naive B lymphocytes (B naive), memory B lymphocytes (B M), class-switched B lymphocytes (BCS), natural killer cells (NK), blood outgrowth endothelial cell progenitors (BOEC), umbilical vein endothelial cells (resting and proliferating; HUVEC R and P) and mesenchymal stem cells (MSC). Small RNA data were generated from the following 11 cell types: EB, MK, NEU, MONO, M0, M1, M2, DC, CD4 naive, CD8 naive and NK. The number of samples of each cell type assayed by total and small RNA-sequencing is shown in Figure 1A and B and Online Supplementary Table S2. We generated a mean of 91M 75 bp paired-end reads per ribosomal RNA-depleted total RNA sample, except for PLT, BAS and EOS, which were sequenced at a comparable depth but with 150 bp paired-end reads (Online Supplementary Table S1). We also generated a mean of 4.5M 50 bp single-end reads per small RNA sample (Online Supplementary Table S3).
Principal component analysis of the log expression estimates for both protein-coding genes and small RNAs showed distinct clustering of samples by cell type, consistent with their ontology, along the first two principal components, which explained approximately 40% of the variance in expression of both types of RNA species (Figure 1C and D, Online Supplementary Figure S2A and B). This correspondence was also apparent in hierarchical clustering of samples using Spearman rank correlation (Online Supplementary Figure S2C and D).
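The two sample-level summaries mentioned above, the variance explained by leading principal components and a sample-by-sample Spearman correlation matrix (the usual input to hierarchical clustering), can be sketched in a few lines of NumPy. This is a minimal illustration, not the atlas pipeline: it assumes a samples x genes matrix of log expression values, computes PCA by singular value decomposition of the gene-centered matrix, breaks rank ties by order rather than averaging them, and uses synthetic toy data in place of real samples.

```python
import numpy as np


def pca_variance_explained(log_expr: np.ndarray, n_components: int = 2):
    """PCA of a samples x genes matrix of log expression estimates.

    Returns the sample scores on the first `n_components` principal
    components and the fraction of total variance each one explains.
    """
    # Center each gene (column) so the PCs describe variation around its mean
    centered = log_expr - log_expr.mean(axis=0)
    # Squared singular values of the centered matrix are proportional to PC variances
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    explained = variance / variance.sum()
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]


def spearman_matrix(log_expr: np.ndarray) -> np.ndarray:
    """Sample-by-sample Spearman correlation (ties broken by order, not averaged)."""
    # Rank genes within each sample, then take the Pearson correlation of the ranks
    ranks = np.argsort(np.argsort(log_expr, axis=1), axis=1).astype(float)
    return np.corrcoef(ranks)


# Hypothetical toy data: 6 samples from two "cell types", 50 genes each
rng = np.random.default_rng(0)
base = rng.normal(size=50)
group_a = base + rng.normal(scale=0.1, size=(3, 50))
group_b = base[::-1] + rng.normal(scale=0.1, size=(3, 50))
x = np.vstack([group_a, group_b])

scores, frac = pca_variance_explained(x)
corr = spearman_matrix(x)
```

In this toy example the first component separates the two groups, and within-group Spearman correlations exceed between-group ones, which is the pattern the clustering in Figure 1C and D reflects.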
The GTEx project²³ showed that whole blood has very low gene expression complexity compared with other tissues, as 60% of all blood transcripts emanate from three hemoglobin genes.²⁴ However, the low complexity of a heterogeneous tissue may mask high complexity in some of its component cell types. We therefore analyzed transcriptome complexity in different types of blood cells. After excluding mitochondrial genes, whose steady-state expression varies considerably across individuals,²⁵ the number of protein-coding genes contributing 50% of total expression ranged from only 14 in PLT to 600 in BAS. The number of protein-coding genes contributing 75% of total expression ranged from 168 in PLT to 2,422 in resting HUVEC (Figure 2A, Online Supplementary Table S4, Online Supplementary File 1). With the exception of PLT, the sets of genes yielding 75% of total expression in each cell type were enriched only for gene ontology (GO) terms describing general biological processes, such as translation or transcription. Thus, cellular integrity and basic cellular functions are supported at the transcriptional level even in mature cell types, some of which have short half-lives. In PLT, however, we found enrichment for GO terms related to the core functions of platelets (i.e., hemostasis, wound healing, coagulation, platelet degranulation), while more general processes featured less prominently (Online Supplementary Table S5). The corresponding analysis of the small RNA data revealed very low complexity: between one and seven miRNAs accounted for 50% of total expression, and fewer than ten miRNAs accounted for 75% of the expression in each of the 11 cell types (Figure 2B, Online Supplementary File 2).
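The complexity measure used above, the smallest number of genes that together account for a given share of total expression, can be illustrated with a short sketch. It assumes expression values on a linear (non-log) scale, identifies mitochondrial genes by the HGNC "MT-" symbol prefix, and uses a hypothetical toy dictionary; the exact quantification behind Figure 2A is described in the Online Supplementary Methods.

```python
import numpy as np


def genes_for_fraction(expr: dict[str, float], fraction: float) -> int:
    """Smallest number of genes whose summed expression reaches `fraction`
    of the total, after excluding mitochondrial genes (assumed "MT-" prefix).

    `expr` maps gene symbols to linear-scale expression values.
    """
    values = np.array(
        [v for gene, v in expr.items() if not gene.startswith("MT-")],
        dtype=float,
    )
    # Sort the non-mitochondrial genes from most to least expressed
    ordered = np.sort(values)[::-1]
    # Cumulative share of total expression contributed by the top k genes
    cumulative = np.cumsum(ordered) / ordered.sum()
    # First position where the cumulative share reaches the target, as a count
    return int(np.searchsorted(cumulative, fraction) + 1)


# Hypothetical toy profile: one dominant mitochondrial gene plus four others
toy = {"MT-CO1": 1000.0, "A": 50.0, "B": 30.0, "C": 10.0, "D": 10.0}
n50 = genes_for_fraction(toy, 0.5)
n75 = genes_for_fraction(toy, 0.75)
```

Without the mitochondrial exclusion, MT-CO1 alone would exceed 90% of the toy total, which shows why genes with large, individual-specific expression can dominate this statistic.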
Transcriptional signatures correspond to hematopoietic cell functions
As the most highly transcribed genes in a given cell type are generally not enriched for GO terms describing that cell type's specific functions, we reasoned that these functions must be encoded primarily by other, less highly expressed genes. In principle, the expression levels of these genes should vary with cell type to support functional specialization. To determine which genes form the transcriptional signature of each cell type, we grouped cell types into functional categories (Online Supplementary Table S2) and then identified genes expressed heterogeneously across these categories through a Bayesian comparison of two statistical models: one in which the gene under consideration had a single global mean expression parameter, and another in which the gene had a different mean expression parameter for each category. Both models included a binary covariate accounting for the source of the blood sample (venous or cord). Using this approach, we found that 19,861 (59.5%) of HUGO Gene Nomenclature Committee (HGNC)-annotated genes had a posterior probability of differential expression >0.8. Over half of these differentially expressed genes had a mean log expression across samples >0. In contrast, only 3.5% of the non-differentially expressed genes had a mean log expression >0, indicating that the number of ubiquitously expressed housekeeping genes in hematopoiesis is on the order of a few hundred. The differentially expressed genes were then classified by the cell type in which their expression was greatest. To ensure that the classification recapitulated cellular functions specific to the mature blood cells in this atlas, rather than functions of the shared progenitors from which they originate, we classified only the 16,572 genes whose maximum log expression level was at least 0.1 greater (i.e., about 10.5% greater on the linear scale, since e^0.1 ≈ 1.105) than that found in the cell type with the second greatest expression (Online Supplementary Methods, Online Supplementary Table S6).
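The two-model comparison described above can be approximated, for illustration only, with ordinary Gaussian linear models and the Bayesian information criterion (BIC). This is a swapped-in technique, not the authors' Bayesian model: the priors and likelihood used for the atlas are given in the Online Supplementary Methods. The sketch assumes log expression values per sample, integer category labels, a 0/1 cord-blood indicator, equal prior odds for the two models, and the hypothetical helper names `bic_linear` and `prob_differential`.

```python
import numpy as np


def bic_linear(y: np.ndarray, design: np.ndarray) -> float:
    """BIC (up to a constant shared by both models) of a least-squares Gaussian fit."""
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    # -2 log-likelihood at the MLE of the noise variance, plus the BIC penalty
    return n * np.log(rss / n) + k * np.log(n)


def prob_differential(y, category, cord) -> float:
    """Approximate posterior probability that mean expression differs by category.

    y: log expression per sample; category: integer label per sample;
    cord: 1 for cord-blood samples, 0 for venous blood (hypothetical encoding).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    intercept = np.ones((n, 1))
    source = np.asarray(cord, dtype=float).reshape(-1, 1)
    # Model 0: a single global mean plus the venous/cord covariate
    x0 = np.hstack([intercept, source])
    # Model 1: one mean per category plus the same covariate
    levels = np.unique(category)
    dummies = (np.asarray(category)[:, None] == levels[None, :]).astype(float)
    x1 = np.hstack([dummies, source])
    delta = np.clip(bic_linear(y, x1) - bic_linear(y, x0), -700.0, 700.0)
    # With equal prior odds, exp(-BIC / 2) approximates each model's evidence
    return float(1.0 / (1.0 + np.exp(delta / 2.0)))


# Hypothetical data: 30 samples in three categories, alternating blood source
rng = np.random.default_rng(1)
category = np.repeat([0, 1, 2], 10)
cord = np.tile([0, 1], 15)
de_gene = np.repeat([1.0, 3.0, 5.0], 10) + rng.normal(scale=0.2, size=30)
flat_gene = np.tile(rng.normal(scale=0.2, size=10), 3)  # identical in every category

p_de = prob_differential(de_gene, category, cord)
p_flat = prob_differential(flat_gene, category, cord)
```

A gene would then be called differentially expressed when this probability exceeds 0.8, mirroring the threshold in the text; the BIC penalty is what pushes `p_flat` below 0.5 even though the per-category model fits at least as well.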
For example, VWF was assigned the endothelial cell (EC) label because, first, its expression varies across cell types (posterior probability of differential expression ≈ 1); second, VWF is most highly expressed in EC (log expression estimate = 6.0); and third, the second most highly expressed category (MK/PLT, combined because PLT are the immediate anucleate descendants of MK) has a log expression estimate (averaged over MK and PLT) of 2.2, which is lower than 6.0 by more than 0.1 units (Figure 3A). The number of genes assigned to each category ranged from 186 in CD8 T lymphocytes (CD8TC) to 3,502 in MK/PLT (Figure 3B). Using these groups of genes, we found enrichment for GO terms
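The labeling rule walked through for VWF can be written as a small function: a gene receives the label of its top category only if it is differentially expressed and its top category exceeds the runner-up by at least 0.1 log units. This is a sketch under stated assumptions: the helper name `assign_label` is hypothetical, category means are taken as given (for example, the MK/PLT value is already averaged over MK and PLT), and only the two VWF values quoted in the text are used.

```python
import math
from typing import Optional


def assign_label(category_means: dict[str, float], prob_de: float,
                 min_prob: float = 0.8, min_gap: float = 0.1) -> Optional[str]:
    """Return the category with the highest mean log expression, or None.

    A gene is labeled only if it is differentially expressed
    (prob_de > min_prob) and its top category exceeds the runner-up
    by at least `min_gap` log units.
    """
    if prob_de <= min_prob or len(category_means) < 2:
        return None
    ranked = sorted(category_means.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_val), (_, second_val) = ranked[0], ranked[1]
    return top if top_val - second_val >= min_gap else None


# VWF values quoted in the text: EC = 6.0, MK/PLT (averaged) = 2.2
vwf_label = assign_label({"EC": 6.0, "MK/PLT": 2.2}, prob_de=1.0)

# A 0.1 gap on the natural-log scale is a ratio of about 1.105 (10.5%)
ratio = math.exp(0.1)
```

Requiring a margin over the second-ranked category, rather than simply taking the maximum, is what keeps genes shared by related lineages from being assigned to either of them.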
haematologica | 2021; 106(10)