
Z. Wu et al.
RNA by scRNA-seq. After filtering, 391 cells from healthy donors and 588 cells from MDS patients were retained for analysis, with over 9.1 billion 75 bp paired-end mapped reads in total and 7.7 million reads per cell on average. Using a published strategy,31 a total of 10,791 protein-coding genes were captured, 3,777 per cell on average.
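The per-cell summary quoted above (genes captured overall and genes detected per cell on average) can be illustrated with a minimal sketch. This is not the published strategy cited in the text: the function name `detection_summary`, the `min_count` detection threshold, and the toy count matrix are assumptions made purely for illustration.

```python
def detection_summary(counts, min_count=1):
    """Summarize gene detection in a cells x genes count matrix.

    counts: list of rows, one row per cell, one column per gene.
    min_count: hypothetical detection threshold (an assumption, not
    the threshold used in the study).
    Returns (genes detected in at least one cell, mean genes per cell).
    """
    # True where a gene counts as detected in a given cell.
    detected = [[c >= min_count for c in row] for row in counts]
    # Number of detected genes in each cell.
    genes_per_cell = [sum(row) for row in detected]
    # A gene is "captured" if it is detected in at least one cell.
    genes_captured = sum(any(col) for col in zip(*detected))
    mean_genes_per_cell = sum(genes_per_cell) / len(genes_per_cell)
    return genes_captured, mean_genes_per_cell


# Toy matrix: 3 cells x 4 genes (illustrative values only).
toy_counts = [
    [5, 0, 2, 0],
    [0, 0, 1, 3],
    [4, 0, 0, 0],
]
total_genes, mean_per_cell = detection_summary(toy_counts)
```

In this toy matrix the second gene is never detected, so it does not contribute to the captured total, while the per-cell mean is taken over all three cells.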
To obtain reliable models of lncRNA expression, we followed a de novo transcript assembly pipeline (Figure 1A), in which “high-confidence” transcriptomes13,14,16,17,28 from CD34+ single cells of all nine subjects were merged in order to undergo multi-step filtering for: (i) overlap with known mRNA exon annotations, (ii) size and multiexonic selection, (iii) known protein domains, (iv) low levels of expression, and (v) predicted coding potential. Using this conservative multilayered analysis, we identified a total of 2,892 lncRNAs across 979 single human CD34+ cells. To assign lncRNAs to specific classes, we examined their overlap with annotated noncoding genes present in public databases: 808 lncRNAs were previously annotated and 2,084 were putative novel lncRNAs (Figure 1B and Online Supplementary File 1). In addition, transcripts that were expressed at medium levels and supported by CAGE data37 were also defined to be lncRNAs (n=281) expressed in human CD34+ cells (Online Supplementary File 2). Defined lncRNAs exhibited similarly low protein-coding potential (relative to protein-coding genes) as had previously annotated lncRNAs in the GENCODE database (Figure 1C). Such defined lncRNAs in single human CD34+ cells were distributed across all chromosomes, at much lower average abundance than were protein-coding transcripts. Compared with protein-coding genes, lncRNA-encoding genes had fewer exons, were shorter and less well conserved. In general, lncRNA-encoding genes were enriched in 4-kb regions around the transcriptional start sites of their neighboring protein-coding genes, in agreement with previous work,38 suggesting that they share promoter regions [lncRNA-encoding genes show higher co-expression with protein-coding neighbors than do pro-
Figure 1. Identification of long noncoding RNAs expressed in single human CD34+ cells. (A) Bioinformatics pipeline for identification of long noncoding RNAs (lncRNAs). Single cell RNA-sequencing (scRNA-seq) data from nine subjects were processed and filtered before further analysis of messenger RNA (mRNA) and lncRNA expression. mRNA transcriptome analysis including cell clustering, cell type assignment, and identification of monosomy 7 cells was described32 and employed to analyze gene expression patterns among cell types, functional imputation of lncRNAs, and differentiation trajectory analysis in the current study. scRNA-seq data were processed by genome-based transcriptome reconstruction for the quantification of lncRNAs expressed in human CD34+ cells through the multi-step filtering bioinformatic pipeline. Numbers of remaining transcripts after each filtering step are indicated. (B) By comparing defined lncRNA transcripts in de novo transcript assembly with transcripts in the GENCODE database, 808 lncRNAs were previously annotated while 2,084 were classified as potential novel lncRNAs. (C) Comparison of coding potential among previously annotated lncRNAs, novel lncRNAs, and mRNAs. x axis, coding probability calculated with CPAT; y axis, cumulative distribution function (CDF).
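The multi-step filtering described in the text and in panel A can be sketched as a chain of predicates that records how many transcripts remain after each step. This is a simplified illustration, not the authors' pipeline: the `Transcript` record, its field names, and every threshold (`min_length`, `min_exons`, `min_expression`, `max_coding_prob`) are hypothetical values chosen for the example, and real inputs would come from an assembler, a domain search, and a coding-potential tool such as CPAT.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    # Hypothetical per-transcript annotations gathered upstream.
    name: str
    overlaps_mrna_exon: bool   # overlaps a known mRNA exon annotation
    length: int                # transcript length in nucleotides
    n_exons: int               # number of exons
    has_protein_domain: bool   # hit against a known protein domain
    expression: float          # expression level (arbitrary units)
    coding_prob: float         # predicted coding probability, 0..1


def filter_lncrna_candidates(transcripts, min_length=200, min_exons=2,
                             min_expression=1.0, max_coding_prob=0.364):
    """Apply the five filters in order; all thresholds are assumptions."""
    steps = [
        ("no mRNA exon overlap", lambda t: not t.overlaps_mrna_exon),
        ("size and multiexonic",
         lambda t: t.length >= min_length and t.n_exons >= min_exons),
        ("no known protein domain", lambda t: not t.has_protein_domain),
        ("sufficient expression", lambda t: t.expression >= min_expression),
        ("low coding potential", lambda t: t.coding_prob < max_coding_prob),
    ]
    remaining = list(transcripts)
    remaining_counts = []
    for label, keep in steps:
        remaining = [t for t in remaining if keep(t)]
        # Track survivors per step, mirroring the counts shown in panel A.
        remaining_counts.append((label, len(remaining)))
    return remaining, remaining_counts


# Toy input: each of t1..t6 fails exactly one filter; t7 passes all.
toy = [
    Transcript("t1", True, 900, 3, False, 5.0, 0.05),
    Transcript("t2", False, 150, 2, False, 5.0, 0.05),
    Transcript("t3", False, 500, 1, False, 5.0, 0.05),
    Transcript("t4", False, 900, 3, True, 5.0, 0.05),
    Transcript("t5", False, 900, 3, False, 0.2, 0.05),
    Transcript("t6", False, 900, 3, False, 5.0, 0.90),
    Transcript("t7", False, 900, 3, False, 5.0, 0.05),
]
kept, step_counts = filter_lncrna_candidates(toy)
```

Applying the filters sequentially, rather than as one combined test, keeps the per-step counts available, which is what makes a panel such as Figure 1A possible to draw.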
haematologica | 2019; 104(5)

































































































