Metabolic profiling in Pyruvate Kinase Deficiency
blood spots (DBS) in the diagnostic evaluation of PKD and report for the first time a metabolic fingerprint for PKD.
Methods
Samples
Sixteen patients diagnosed with PKD based on clinical phenotype, enzyme activity assays and molecular defect were included. Healthy controls (HC) were obtained from the institutional blood donor service. All patients or their legal guardians approved the use of remnant samples for method development and validation, in agreement with institutional and national regulations. All procedures followed were in accordance with the ethical standards of the University Medical Center Utrecht and with the Helsinki Declaration of 1975, as revised in 2000. In order to obtain DBS, 50 µL aliquots of blood were spotted onto Guthrie card filter paper (Whatman no. 903 Protein Saver™ cards). The filter paper was left to dry for at least 4 hours at room temperature and subsequently stored at -80°C in a foil bag with a desiccant package pending further analysis.
Metabolic profiling
Sample preparation, direct infusion high resolution mass spectrometry (DI-HRMS) and data processing were performed as previously reported.12,13 Mass peak intensities for metabolite annotation were averaged over technical triplicates. In addition, because DI-HRMS cannot separate isomers, each mass peak intensity represents the summed intensity of all isomers at that mass. Metabolite annotation was performed using a peak calling bioinformatics pipeline developed in R (https://github.com/UMCUGenetics/DIMS), based on the Human Metabolome Database (version 3.6). This resulted in 3,835 metabolite annotations corresponding to 1,903 unique metabolite features.14
In order to compare the metabolic profiles between HC and PKD, mass peak intensities for each identified feature were converted to Z-scores. These scores, based on metabolic control samples that were added to each DI-HRMS run, were calculated by the following formula:
$$\text{Z-score} = \frac{\text{mass peak intensity of patient or HC sample} - \text{mean of mass peak intensities of metabolic control samples}}{\text{standard deviation of mass peak intensities of metabolic control samples}^{*}}$$
*Metabolic controls consist of a batch of banked DBS samples from individuals in whom an inborn error of metabolism (IEM) was excluded after an extensive diagnostic workup.
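The Z-score formula above can be illustrated with a minimal Python sketch. This is not the authors' R pipeline: the function name `z_scores`, the dictionary layout and the example intensities are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def z_scores(sample_intensities, control_intensities):
    """Z-score each feature of one sample against the metabolic controls.

    sample_intensities:  dict mapping feature -> intensity for one patient or HC sample
    control_intensities: dict mapping feature -> list of intensities measured in the
                         metabolic control samples of the same run
    """
    scores = {}
    for feature, intensity in sample_intensities.items():
        controls = control_intensities[feature]
        mu = mean(controls)
        sd = stdev(controls)  # assumption: sample SD (n - 1 denominator)
        scores[feature] = (intensity - mu) / sd
    return scores


# Hypothetical intensities for a single feature (illustration only).
controls = {"feature_A": [100.0, 110.0, 90.0, 100.0]}
patient = {"feature_A": 130.0}
print(z_scores(patient, controls))
```

A feature with identical intensities in all controls would have a standard deviation of zero; this sketch does not handle that case.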
Data analysis
T-tests and multivariate analysis were conducted in MetaboAnalyst.15 Classification was performed in R (version 3.6.1) using the caret package, which provides a set of data processing functions that facilitate the generation of predictive models. A support vector machine (SVM) with a linear kernel, the simplest kernel function, was used to classify HC and PKD samples. An SVM is a supervised machine learning model whose kernel function takes the data as input and transforms it into the required form, for example through a linear or polynomial mapping into a high dimensional space.
In this space, the two groups of samples are separated into distinct regions by identifying the small fraction of samples that lie closest to the boundary between the groups, referred to as ‘support vectors’. Separation is achieved by identifying a separating hyperplane, or decision boundary, between the support vectors.16 The test set was classified by projecting each new sample into this space. Data and R code are available upon request.
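To make the idea of a separating hyperplane concrete, the following is a minimal sketch of a linear SVM trained by subgradient descent on the regularized hinge loss. It is an illustration only and not the solver used by the authors' R/caret workflow; the function names, the hyperparameters (`lam`, `lr`, `epochs`) and the two-dimensional toy data are all hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a hyperplane w.x + b = 0 by full-batch subgradient descent
    on the L2-regularized hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample is misclassified or inside the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify a new sample by the side of the hyperplane it falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy data: -1 could stand for HC and +1 for PKD (hypothetical values).
X = [[-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5], [2.0, 1.0], [1.0, 2.0], [1.5, 1.5]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print(predict(w, b, [3.0, 3.0]), predict(w, b, [-3.0, -2.0]))
```

Only samples with a margin below 1 contribute to the update, which mirrors the statement that a small fraction of samples near the boundary determines the hyperplane.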
Results
Explorative untargeted metabolomics analysis
A total of 1,903 unique metabolite features (and their respective isomers) were analyzed for 16 PKD patients and 32 HC samples. Clinical and laboratory characteristics, and baseline comparison are summarized in Table 1. The most significant differences between the groups, identified by a t-test, included glycolytic intermediates such as phosphoenolpyruvic acid and 2-/3-phosphoglyceric acid, polyamines (spermidine and spermine) and several acyl carnitines (methylmalonylcarnitine and propionylcarnitine) (Figure 1A). Broad data exploration to assess the variation between samples and separation between groups was performed by unsupervised principal component analysis (PCA) and supervised partial least squares discriminant analysis (PLS-DA), the latter taking group label into account as a response variable. Both analyses revealed close clustering of control samples and a more heterogeneous delineation for PKD patients (Online Supplementary Figure S1).
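As an illustration of a per-feature t-test comparison, the sketch below ranks features by the absolute value of a two-sample t-statistic. The analysis in the text was run in MetaboAnalyst; the choice of Welch's unequal-variance statistic here is an assumption, the function names and example values are hypothetical, and no p-values or multiple-testing correction are computed.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two independent groups (unequal variances)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def rank_features(z_a, z_b):
    """Order features by |t|, largest first.

    z_a, z_b: dicts mapping feature -> list of Z-scores in each group.
    """
    stats = {feature: welch_t(z_a[feature], z_b[feature]) for feature in z_a}
    return sorted(stats, key=lambda feature: abs(stats[feature]), reverse=True)


# Hypothetical Z-scores for two features in two groups.
pkd = {"feature_1": [4.0, 5.0, 6.0], "feature_2": [0.1, 0.0, -0.1]}
hc = {"feature_1": [1.0, 2.0, 3.0], "feature_2": [0.0, 0.1, -0.1]}
print(rank_features(pkd, hc))
```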
Machine learning algorithm identifies metabolic profile for PKD
In order to explore the potential of this extensive metabolic fingerprint in predicting PKD, a binary classification model was constructed using an SVM with a linear kernel. SVM has advantages over PLS-DA with regard to robustness to outliers, resistance to overfitting and predictive power.16 An optimal hyperplane to separate classes based on all metabolomics data was determined by repeated cross-validation (4-fold, five repeats). The final model had high performance characteristics with an average accuracy of 96%.
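The resampling scheme named above (4-fold cross-validation, five repeats) can be sketched as follows. This is a plain, unstratified index split written for illustration; it is not the caret implementation, the function name `repeated_kfold` and the seed are hypothetical, and the sample count of 48 assumes the model was trained on the 16 PKD and 32 HC samples described earlier.

```python
import random


def repeated_kfold(n_samples, k=4, repeats=5, seed=0):
    """Return a list of (train_indices, test_indices) pairs.

    Each repeat reshuffles the samples and partitions them into k folds;
    every fold serves once as the held-out set.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            splits.append((train, test))
    return splits


splits = repeated_kfold(48)  # assumption: 16 PKD + 32 HC samples
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

An average accuracy such as the one reported would then be the mean of the accuracies obtained on the 20 held-out folds.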
In addition, receiver operating characteristic curves with area under the curve (AUC) were used as a performance indicator (Online Supplementary Figure S2A). Important features for classification in this model include the polyamines spermidine and spermine, as well as phosphoenolpyruvic acid, 2-/3-phosphoglyceric acid and glutathione (Figure 1B). Most of these features were increased in PKD, with the exception of glutathione and asparaginyl-proline/prolyl-asparagine (Figure 1C).
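For readers unfamiliar with the AUC, the sketch below computes it directly from its pairwise definition: the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, with ties counted as one half. The function name `roc_auc` and the example scores are hypothetical and unrelated to the study data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from classifier scores.

    scores: decision values, higher meaning more likely positive
    labels: 1 for positive (for example PKD), 0 for negative (for example HC)
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```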
Metabolic profile predicts new samples with high accuracy
External model validation was performed by predicting new control (n=13) and PKD samples (n=6). This resulted in accurate prediction for all controls, and all but one patient (accuracy = 94%) (Figure 1D). In order to assess uncertainty of the model and its predictive ability, bootstrap resampling was applied to the complete dataset. By randomly generating training and validation (test) data from the original data, a similarly high prediction performance was achieved, supporting the validity of the presented model (Online Supplementary Figure S2B).
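The external validation counts stated above (13 of 13 controls and 5 of 6 patients predicted correctly) can be turned into performance measures with a few lines of Python, followed by a sketch of one bootstrap train/test split. The function names are hypothetical, the out-of-bag scheme is an assumption about how "training and validation (test) data" were generated, and the size of 67 assumes that the complete dataset comprises the 48 initial and 19 external samples.

```python
import random


def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Counts taken from the text: PKD is the positive class.
metrics = classification_metrics(tp=5, fn=1, tn=13, fp=0)
print(metrics)  # accuracy is 18/19, roughly 0.947


def bootstrap_split(n_samples, rng):
    """Draw a training set with replacement; unused samples form the test set."""
    train = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(train)
    test = [i for i in range(n_samples) if i not in drawn]
    return train, test


rng = random.Random(0)  # fixed seed for reproducibility (arbitrary choice)
train_idx, test_idx = bootstrap_split(67, rng)  # assumption: 48 + 19 samples
print(len(train_idx), len(test_idx))
```

Repeating the split many times and refitting the classifier on each training set gives a distribution of test accuracies, which is one way to express the uncertainty of the model.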
haematologica | 2021; 106(10)
2721