Page 97 - 2022_01-Haematologica-web
P. 97
Early epigenetic changes in KM3-AML
of methylation adaptors (synthesized with 5’-methylcytosine instead of cytosine). The adaptor-ligated DNA samples were purified using AMPure XP beads (Beckman Coulter Genomics, cat. #A63880). Nine hundred nanograms of each DNA sample were hybridized with Agilent SureSelect Methyl-Seq biotinyled RNA baits for 24 h at 65°C. The libraries were captured using Dynal MyOne Streptavidin T1 magnetic beads (Invitrogen, cat. #65601). The captured DNA was subjected to bisulfite conver- sion with the EZ DNA Methylation Gold Kit (Zymo Research, cat. #D5005) as described in the manufacturer’s instructions, fol- lowed by PCR for a total of 16 cycles using the Illumina indexed library primers. The final libraries were quantified by 2100 Bioanalyzer chips (Agilent Technologies) with quality control on a MiSeq (Illumina) sequencer. Libraries were sequenced on mul- tiple lanes of a HiSeq2000 flow cell at the IRIC genomics plat- form and the FastQC R package was used to assess the quality of the reads.
Bioinformatics analysis of ChIP-sequencing and ATAC- sequencing
The quality of the raw sequence data was checked by FastQC (version 0.11.4). Trimmomatic (version 0.32) was used to remove adapters and keep reads with quality scores >30. Paired-end sequencing reads were then mapped to the human reference genome (GRCH37/hg19) using BWA (version 0.7.10) with the default parameters. PCR duplicates were removed by Samtools (version 1.9). For ATAC-sequencing data, reads from mitochon- drial DNA were removed (samtools idxstats input.bam | cut -f 1 | grep -v MT | xargs samtools view -b input.bam > output.bam). The EaSeq software package (version 1.04)20 was used to analyze data and integrate gene expression and data from epigenetic marks from chromatin immunoprecipitation (ChIP)-sequencing experiments. To calculate the enrichment of each histone mark, the analysis focused on specific regions in the genome with a dis- tance from the transcription start site of -1,000 to +3,000 bp for H3K79me2, and -2,000 to +2,000 bp for H3K4me3. MACS2 (ver- sion 2.2.4)21 was the peak calling algorithm used to determine the list of significant enrichment peaks, following the default para- meters (callpeak –f BAMPE –g hs –q 0.05). Intervene22 was used to find peaks in common in our datasets (bedtools intersect –a – b –f 0,50 –r) and generate Venn diagrams (Pybedtools). A ngs.plot23 was used to visualize global repartition of mapped reads across the genome. Previously published data for lineage- specific ATAC-sequencing in a range of normal blood cells24 were accessed through the UCSC track hub (https://s3-us-west-1.ama- zonaws.com/chang-public-data/2016_NatGen_ATAC-AML/hub.txt) DNAse hypersensitive sites and transcription factor clusters gen- erated from the ENCODE consortium V3 were analyzed by cus- tom Perl Scripts. Public ATAC-sequencing peaks for all cell types were identified using a threshold of a normalized score >60 for all tracks for starts/ends of peaks. Fastq files for BLUEPRINT ATAC-sequencing data (accessions EGAD00001002710 and EGAD00001002709) were downloaded, remapped and analyzed with MACS2 as described above. Gene set enrichment analysis was performed using software (version 3.0) from the Broad Institute25 and a pre-ranked list based on significantly differential- ly expressed genes between CD34+ and CD34+KM3 cells using the R package DESeq2.26
Mapping methyl-sequencing reads and methylation analysis
Paired end sequencing reads (2x100 bp) were mapped to the human reference genome (GRCH37/hg19) using Bismark with a directional alignment mode and Bowtie2 as the alignment process. All putative duplicate reads were removed by previous-
ly described Perl scripts.27 The methylation calls were extracted with Bismark-methylation-extractor and the analysis was per- formed using custom python scripts, R package MethylKit 0.9.2 and SeqMonk 1.45.2 as the visualization tool (www.bioinformat- ics.babraham.ac.uk/projects/seqmonk/). Hierarchical clustering of the samples based on the differentially methylated cytosines was performed using the computed Pearson correlation distance between samples. For each cell type (CD34+, CD34+KM3, KM3- AML model) the DNA methylation data corresponded to the combination of methyl-sequencing libraries from two biological replicates. Analysis of methylated cytosines was restricted to positions covered by a minimum of eight reads coverage for each biological replicate. To identify differentially methylated cytosines (DMC) a logistic regression model was applied to compare the fraction of methylated cytosines across the three types of cells and a sliding linear model (SLIM) method was used to correct P-values for multiple testing and derive q-values.28 To define the DMC, we specified two parameters: q-values below 0.01 and a threshold of percentage methylation difference ≤40%. All functional annotations for genomic regions were defined based on GRCh37/hg19 assembly downloaded from the UCSC website. Gene bodies were defined as the transcribed regions from the start to the end of transcription sites for each RefSeq entry and promoter regions were defined as 1 kb of sequence upstream of the transcription start site for each RefSeq transcript.
Results
Transcriptional consequences of KM3 expression
To characterize the short-term transcriptional changes induced by the KM3 fusion, we analyzed the RNA- sequencing data from the initial unmanipulated cord blood cells and these cells after their transduction with the KM3 fusion. As previously described,16 untransduced cells are largely lost in culture after about 1 month of in vitro culture and the remaining cells show induction of genes associated with a myeloid phenotype (e.g., CCR1, CD14) (Figure 1A). Interestingly, transduced cells also dis- play the loss of normal or leukemia stem cell markers (e.g., CD34 and GPR56, respectively) and the loss of expression of other genes previously associated with AML induction in other subtypes (e.g., MN130). This observation is supported by expression data from patients with KMT2A-translocated AML in the TARGET study,4 which showed an equivalent loss of MN1 and GPR564 (Figure 1B) and a similar loss of expression in studies of GPR56 seen in adult AML.31 A global view of the expression changes through gene set enrichment analysis confirmed the loss of stem cell phenotype and the gain in expression of HOXA9-MEIS1 targets consis- tent with leukemic transformation (Online Supplementary Figure S1).
To examine the regulation underlying the transcription- al changes seen, we studied the transcription factor bind- ing motifs upstream of the differentially expressed genes. Analysis of enriched motifs using several different soft- ware (including HOMER,32 MAGICTRICKS,33 and ShinyGO34) produced differing lists of potential transcrip- tion factors (Figure 1C), but factors in the CEBP and FOS families, along with NFE2, were highlighted consistently, at differing ranks. Analysis of RNA-sequencing data con- firmed that the expression of all members of the CEBP and FOS families increased during in vitro culture (Figure
haematologica | 2022; 107(1)
89