L. Grassi et al.
Figure 3. Cell type-specific transcriptional signatures. (A) VWF expression estimates and posterior variances. (B) The number of differentially expressed genes classified into each cell type grouping. (C) Graphical representations of the Gene Ontology term enrichments for the MK/PLT and DC groups. Note that, as PLT are the immediate anucleated descendants of MK, a gene was assigned to the composite MK/PLT group if it was maximally expressed in either cell type. The nodes represent terms, colored green if they are enriched and light blue if they are ontological ancestors of enriched terms; the edges represent ontological relations. Abbreviations as in Figure 1.
regions and other repetitive or low-complexity regions was higher than that of known coding genes and similar to that of novel non-coding genes (Online Supplementary Figure S3A). Secondly, their exons had low conservation among vertebrates, with scores resembling those of annotated lncRNA (P>0.05, Wilcoxon rank sum test) and novel non-coding genes (P>0.05, Wilcoxon rank sum test) and lower than those of protein-coding genes (P≤0.0001, Wilcoxon rank sum test) (Figure 4B). Thirdly, their median expression was similar to that of annotated lncRNA and of novel genes classified as non-coding by CPAT (median log expression levels: annotated lncRNA, 0.02; novel potentially coding, 0.03; novel non-coding, 0.02; protein-coding genes, 1.2) (Figure 4C). We therefore concluded that all the novel genes, including those with a CPAT score >0.364, were likely to be lncRNA.
Additionally, the novel genes differed from known lncRNA and protein-coding genes in having higher tissue specificity (median Tau: annotated lncRNA, 0.78; novel potentially coding, 0.95; novel non-coding, 0.94; protein-coding genes, 0.49) (Figure 4D). Low expression levels combined with high tissue specificity may explain why these transcripts have not been identified previously. The genomic coordinates of these novel transcripts are provided in Online Supplementary File 3.
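For readers unfamiliar with the Tau index, the sketch below computes it using the widely used definition: each cell type's expression is divided by the gene's maximum expression, and Tau is the mean shortfall from 1 over the remaining n − 1 cell types. It ranges from 0 (uniform expression) to 1 (expression in a single cell type). This is an illustration only; the input scale (for example log-transformed, replicate-averaged values) and the function name `tau` are assumptions, not the exact pipeline used in this study.

```python
def tau(expression):
    """Tissue-specificity index Tau for one gene (illustrative sketch).

    expression: non-negative expression values, one per cell type.
    The scale of these values (e.g. log-transformed, replicate-averaged)
    is an assumption and is not taken from the study.
    Returns a value in [0, 1]: 0 = uniform, 1 = restricted to one cell type.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("Tau needs at least two cell types")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("gene is not expressed in any cell type")
    # Normalise each value by the maximum, then average the shortfall from 1
    # over n - 1 cell types (the maximal cell type always contributes 0).
    return sum(1 - x / peak for x in expression) / (n - 1)


# A gene expressed in only one of four cell types is maximally specific,
# whereas one expressed equally everywhere has Tau = 0.
print(tau([0, 0, 0, 8]))
print(tau([5, 5, 5, 5]))
```

Under this definition, the median values reported above (about 0.95 for the novel genes versus 0.49 for protein-coding genes) indicate that most novel transcripts are expressed in very few cell types.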
Circular RNA in mature hematopoietic cells
CircRNA are single-stranded RNA molecules whose ends are covalently joined by backsplicing. Some circRNA have been shown to regulate transcription41 or act as miRNA sponges,42,43 but most circRNA have no known function. Peripheral blood contains thousands of circRNA.44 We identified backsplice junctions in the total RNA-sequencing dataset using five methods43-46 and, to mitigate methodological biases, excluded backsplices detected by fewer than three of these methods. We also excluded backsplices overlapping known segmental duplications,47 multiple genes or Ensembl 75-annotated readthrough transcripts. This yielded 91,866 consensus backsplices (Online Supplementary Table S9). We further removed junctions observed in only one sample, as they are likely to be spurious, although this may filter out junctions specific to cell types with few replicates. In total, 55,187 backsplices were retained for downstream analyses. Most (81.64%) of these backsplices were exonic and used annotated canonical splice sites (Figure 5A), consistent with previous reports.43,48 Almost half (44%) of the backsplices exactly matched structures in circBase,49 and a further 30% shared one of their two splice sites with structures in circBase.
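The two-stage filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions: detections arrive as hypothetical `(method, sample, junction)` triples, method agreement is pooled across samples, sample support is counted from detections by any method, and the exclusion set (segmental duplications, multi-gene overlaps, readthrough transcripts) is precomputed elsewhere. The function name and junction key format are hypothetical; the thresholds mirror the text (at least three of five methods, at least two samples).

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

# Hypothetical junction key: (chromosome, start, end, strand).
Junction = Tuple[str, int, int, str]


def consensus_backsplices(
    detections: Iterable[Tuple[str, str, Junction]],
    excluded: Set[Junction],
    min_methods: int = 3,
    min_samples: int = 2,
) -> Tuple[Set[Junction], Set[Junction]]:
    """Return (consensus, retained) backsplice sets (illustrative sketch).

    detections: (method, sample, junction) triples pooled from all tools.
    excluded: junctions pre-flagged as overlapping segmental duplications,
        multiple genes or annotated readthrough transcripts (assumed input).
    """
    methods_per_junction = defaultdict(set)
    samples_per_junction = defaultdict(set)
    for method, sample, junction in detections:
        methods_per_junction[junction].add(method)
        samples_per_junction[junction].add(sample)

    # Stage 1: keep junctions reported by at least `min_methods` tools
    # that are not in the exclusion set.
    consensus = {
        j
        for j, tools in methods_per_junction.items()
        if len(tools) >= min_methods and j not in excluded
    }
    # Stage 2: drop consensus junctions observed in fewer than `min_samples`
    # samples, as these are more likely to be spurious.
    retained = {j for j in consensus if len(samples_per_junction[j]) >= min_samples}
    return consensus, retained
```

Separating the two stages mirrors the two counts reported in the text (consensus backsplices, then those retained for downstream analyses), and keeping `min_samples` as a parameter makes it easy to check how many cell type-specific junctions are lost when replicates are scarce.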
haematologica | 2021; 106(10)