
Single cell lncRNAs in hematopoiesis
of gene expression and migration of Th2 cells.16 Downregulation of linc-MAF-4 skews T-cell differentiation towards the Th2 phenotype.17 TMEVPG1, a Th1-specific intergenic lncRNA, controls the expression of interferon-γ together with the Th1-specific transcription factor T-bet, and is critical in modulating susceptibility to infection with Theiler's virus.19,20 Expression of lncRNAs in pro-B and mature B cells is regulated by PAX5, a transcription factor required to specify the B-cell lineage.18 Despite these many examples of lncRNAs with specific functions in stem cells or differentiated lineages, the repertoire of lncRNAs in human HSPCs has not been fully described.
Whole transcriptome sequencing allows large-scale profiling of lncRNAs in tissues and diseases and has therefore enabled the identification of many putative lncRNAs.5,21,22 In general, lncRNAs are expressed at much lower levels than mRNAs3,4,23,24 but are more cell type-specific.9,25 Until recently, lncRNA expression was assessed by averaging transcriptomes of bulk RNA extracted from mixed cell populations, which limits the sensitivity to detect lncRNAs expressed in small cell populations and thus to resolve diversity within a cell type. With recent advances in single cell transcriptome profiling methods, many seemingly homogeneous cell populations have shown unexpected variability in gene expression. Recently published studies profiling lncRNAs at the single cell level have revealed the cell-specific expression of these RNAs.5,26-30
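The sensitivity limitation of bulk averaging can be made concrete with a back-of-the-envelope calculation. The numbers below are hypothetical and are not data from this study; the calculation assumes that every cell contributes an equal amount of RNA to the bulk sample and that the transcript is absent outside a subpopulation making up a fraction f of cells, in which it is expressed at level e_s:

```latex
\bar{e}_{\mathrm{bulk}} = f\,e_{s} + (1 - f)\cdot 0 = f\,e_{s},
\qquad f = 0.02,\; e_{s} = 50\ \mathrm{FPKM}
\;\Rightarrow\; \bar{e}_{\mathrm{bulk}} = 1\ \mathrm{FPKM}
```

Under these assumptions, a transcript that is abundant in 2% of cells appears in bulk data only as a low-level signal, whereas single cell profiling measures e_s directly in the cells that express it.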
In the current work, we performed single cell RNA sequencing (scRNA-seq) of 979 freshly isolated bone marrow-derived human CD34+ cells from both healthy donors and patients with myelodysplastic syndrome (MDS). Using de novo transcriptome reconstruction, we identified a total of 3,173 lncRNAs, including 2,365 potential novel lncRNAs not reported in public databases. We further characterized the features and expression patterns of lncRNAs in CD34+ cells, revealing stage- and lineage-specificity of lncRNA expression and putative functions in normal hematopoiesis. The expression and lineage-specificity of almost 40 lncRNAs, including novel lncRNAs, were validated by quantitative real-time polymerase chain reaction (RT-PCR). We also profiled lncRNAs in MDS cells, particularly in aneuploid cells. Our study provides a global assessment of lncRNA biology in early human hematopoiesis.
Methods
Subjects and samples
Bone marrow samples from seven healthy donors and five MDS patients were obtained after written informed consent in accordance with the Declaration of Helsinki and under protocols (www.clinicaltrials.gov NCT00001620 and NCT00001397) approved by the Institutional Review Boards of the National Heart, Lung, and Blood Institute. Of the five patients with MDS, patients 1, 2, and 5 had MDS that evolved from aplastic anemia, while patients 3 and 4 had de novo MDS. After isolation of bone marrow mononuclear cells, fluorescence-activated cell sorting (FACS) was performed using a FACSAria II Cell Sorter (BD Biosciences); the gating strategies are shown in Online Supplementary Figure S1A. CD34+CD38- and CD34+CD38+ cells from four healthy donors and patient 4 were sequenced separately, whereas only the CD34+ populations of patients 1, 2, 3, and 5 were sequenced because of limited cell numbers (Online Supplementary Figure S1B). The clinical characteristics of these patients have been published.31 Bone marrow cells from three additional healthy donors were used for quantitative RT-PCR (Online Supplementary Figure S2).
Single cell RNA sequencing
The C1 Single-cell Auto Prep System (Fluidigm) was used to perform SMARTer (Clontech) whole transcriptome amplification on up to 96 individual cells per run, according to the manufacturer's protocols (www.fluidigm.com). Whole transcriptome amplification products were converted to Illumina sequencing libraries using the Nextera XT DNA Sample Preparation Kit (Illumina). Final cDNA libraries were quantified using High Sensitivity DNA Kits (Agilent) and sequenced on a HiSeq 2500 or 3000 (Illumina) with the paired-end 75-bp protocol, as described previously.31 RNA-seq data from this study have been deposited in the National Center for Biotechnology Information Gene Expression Omnibus (accession number GSE99095), and the record has been updated with intermediate and result files from the lncRNA analysis. Aliquots of the whole transcriptome amplification products were used for quantitative RT-PCR analysis.
Bioinformatic analysis
Total reads were mapped to the reference genome (hg19) with Rsubread, and gene-level read counts were calculated using featureCounts.32 Only data from high-quality cells with captured genes were used for further analysis; the schematic pipeline has been published.31 Aneuploidy was evaluated by three independent methods: a sliding window analysis of copy number variations, the distribution of chromosome relative expression values, and analysis of the degree of loss of heterozygosity.
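The idea behind the sliding window method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline (which is described in reference 31): it assumes a per-gene reference profile averaged from healthy donor cells, scores each gene as the log2 ratio of a cell's counts to that reference (with a pseudocount of 1), smooths the ratios along each chromosome with a centered moving average, and summarizes each chromosome by the mean smoothed value. The function names and the default window of 101 genes are hypothetical.

```python
import math
from statistics import mean


def relative_expression(cell_counts, reference_means):
    """Per-gene log2 ratio of one cell's counts to a reference profile (pseudocount 1)."""
    return [math.log2((c + 1.0) / (r + 1.0)) for c, r in zip(cell_counts, reference_means)]


def sliding_window_mean(values, window):
    """Centered moving average over `window` genes, truncated at chromosome ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


def chromosome_scores(genes, cell_counts, reference_means, window=101):
    """Mean smoothed relative expression per chromosome for one cell.

    genes: list of (chromosome, start_position) tuples, index-aligned with
    cell_counts and reference_means. The window size is a hypothetical default.
    """
    by_chrom = {}
    for idx, (chrom, pos) in enumerate(genes):
        by_chrom.setdefault(chrom, []).append((pos, idx))
    scores = {}
    for chrom, entries in by_chrom.items():
        # Put genes in genomic order before smoothing along the chromosome.
        order = [idx for _, idx in sorted(entries)]
        rel = relative_expression(
            [cell_counts[i] for i in order], [reference_means[i] for i in order]
        )
        scores[chrom] = mean(sliding_window_mean(rel, window))
    return scores


if __name__ == "__main__":
    # Toy example: 5 genes on each of two chromosomes; after the pseudocount,
    # the cell shows doubled signal on chr8 relative to the reference.
    genes = [("chr7", p) for p in range(5)] + [("chr8", p) for p in range(5)]
    print(chromosome_scores(genes, [9] * 5 + [19] * 5, [9.0] * 10, window=3))
    # prints {'chr7': 0.0, 'chr8': 1.0}
```

Under these assumptions, a chromosome-level score near log2(3/2) ≈ 0.58 would be consistent with a single-copy gain and a score near -1 with loss of one copy, although dropout and noise in single cell data make real scores far less clean than this toy case.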
Identification and classification of long noncoding RNAs
After computational quality filtering,31 single cells were used to define lncRNAs with a pipeline adapted from published methods for identifying high-confidence gene models.13,14,16,17,28 FASTQ files of cells from each subject were merged. Reads were mapped to human genome hg19 with TopHat2 and assembled using Cufflinks.33 The assembled transcripts from all subjects were merged with Cuffmerge,33 and genes shorter than 200 nucleotides or containing a single exon were removed in order to retain long transcripts. Assembled genes overlapping known protein-coding genes were excluded, and genes with low expression (FPKM<2) were removed to improve the reliability of the model. We investigated the coding potential of the remaining genes using three independent algorithms: (i) protein database homology with BlastX and Pfam 31.0 (hmmer2.0); (ii) codon potential assessment with CPAT;34 and (iii) presence of long open reading frames (>100 amino acids) with EMBOSS GetORF.35 The defined lncRNAs were compared with annotated databases from Ensembl, the University of California Santa Cruz (UCSC) Genome Browser, and GENCODE:36 overlapping lncRNAs were defined as “annotated lncRNAs” and the others as putative “novel lncRNAs”. lncRNA transcripts obtained by the same filtering pipeline but with medium expression levels (FPKM 0.1-2) were also considered to be expressed in human CD34+ cells if supported by cap analysis of gene expression (CAGE) data37 (Online Supplementary Methods and Results).
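The filtering cascade described in the preceding paragraph can be expressed as a single decision function. The sketch below is a hypothetical illustration, not the authors' code: it assumes that overlap with protein-coding genes or annotated lncRNAs, the BlastX/Pfam homology call, the CPAT call and CAGE support have already been computed upstream and stored as booleans on a `Transcript` record (all field and function names are illustrative). The thresholds (200 nucleotides, more than one exon, FPKM 2, FPKM 0.1 with CAGE support, ORFs over 100 amino acids) follow the text. The long-ORF check is reimplemented in simplified form as the longest ATG-to-stop open reading frame on either strand; EMBOSS getorf supports other ORF definitions, so this is a stand-in rather than a reproduction of its output.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class Transcript:
    """Hypothetical record for one assembled transcript; flags are computed upstream."""
    tid: str
    length_nt: int
    n_exons: int
    fpkm: float
    sequence: str
    overlaps_protein_coding: bool = False
    overlaps_annotated_lncrna: bool = False   # Ensembl / UCSC / GENCODE overlap
    coding_by_homology: bool = False          # e.g. a BlastX or Pfam hit
    coding_by_cpat: bool = False              # CPAT classifies it as coding
    cage_supported: bool = False


def longest_orf_aa(seq: str) -> int:
    """Longest ATG-to-stop ORF (in amino acids, stop excluded) on either strand.

    Simplified stand-in for EMBOSS getorf; ORFs lacking a stop codon are ignored.
    """
    seq = seq.upper()
    best = 0
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (i - start) // 3)
                    start = None
    return best


def classify(t: Transcript, min_fpkm=2.0, cage_fpkm_floor=0.1, max_orf_aa=100):
    """Return 'annotated', 'novel', or None if the transcript is filtered out."""
    if t.length_nt < 200 or t.n_exons < 2:   # keep long, multi-exon transcripts
        return None
    if t.overlaps_protein_coding:
        return None
    # FPKM >= 2, or medium expression (FPKM 0.1-2) rescued by CAGE support.
    expressed = t.fpkm >= min_fpkm or (t.fpkm >= cage_fpkm_floor and t.cage_supported)
    if not expressed:
        return None
    # Any of the three coding-potential tests rejects the transcript.
    if t.coding_by_homology or t.coding_by_cpat or longest_orf_aa(t.sequence) > max_orf_aa:
        return None
    return "annotated" if t.overlaps_annotated_lncrna else "novel"


if __name__ == "__main__":
    demo = Transcript("TCONS_demo", 850, 3, 0.8, "ACGT" * 60, cage_supported=True)
    print(classify(demo))  # "novel": medium expression rescued by CAGE support
```

Keeping the upstream tool calls as precomputed flags leaves the decision logic easy to audit against the stated thresholds.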
Results
Identification and characterization of long noncoding RNAs in human CD34+ hematopoietic cells
To assess lncRNA expression in human HSPCs, we purified CD34+ cells from the marrow of four healthy donors and five MDS patients. We then analyzed polyadenylated
haematologica | 2019; 104(5)
895