Page 202 - 2019_01-Haematologica-web

J.S. Gandelman et al.
Results
Patients’ organ scores
Three hundred and thirty-nine adult patients with chronic GvHD were analyzed, with predominantly intermediate (49.3%, n=167) and high (41.6%, n=141) overall NIH-Severity. Of these 339 patients, 338 had a malignancy as the indication for hematopoietic stem cell transplantation, with acute myeloid leukemia being the most common malignancy, affecting 109 (32%) of the subjects. Additional characteristics are described in Online Supplementary Table S1. The organs involved by chronic GvHD at study entry by NIH criteria were the mouth (63%), gastrointestinal tract (37%), eye (43%), joint (24%), fascia (14%), skin by sclerosis (15%), skin by erythema (49%), and lung by symptom score (21%). Detailed organ scores are shown in Online Supplementary Table S2.
Unique chronic graft-versus-host disease phenotypes revealed by machine learning
Computational analysis of % erythema, eye, liver, gastrointestinal tract, fascia, joint, mouth, and sclerosis scores revealed seven groups of patients with different clinical phenotypes and risks (Online Supplementary Figure S1). viSNE analysis reduced the dimensionality of chronic GvHD organ scores, with patients who are more similar to each other shown closer together and patients who are more different from each other shown further apart on the scatterplot (Figure 1). For example, a group of patients emerged with involvement of fascia and joints as well as skin sclerosis. In FlowSOM clustering analysis, this group of patients was labeled as Cluster 2 (Figure 1).
FlowSOM clustering revealed a total of seven unique clusters of patients (Figures 1 and 2).
• Cluster 1: ▲Eye+10 Liver+5 (7.1% of patients); unique in having predominantly ocular involvement, all with an NIH eye score of 3.
• Cluster 2: ▲Joint+10, Fascia+5, Sclerosis+4, ▼Mouth-5, Liver-10 (12.7% of patients); a phenotype with enrichment for joint and fascia sclerosis, while specifically lacking mouth and liver GvHD.
• Cluster 3: ▲Liver+5 (10.0% of patients); differentiated by moderate liver involvement, all patients with an NIH liver score of 2, while specifically lacking enrichment in other organ scores.
• Cluster 4: ▲Mouth+5, ▼Liver-10 (28.9% of patients); enriched for mouth involvement, while lacking enrichment in other organ scores.
• Cluster 5: ▲BSA Red+6, ▼Liver-10 (18.3% of patients); this cluster was differentiated by body surface area (BSA) involved by chronic GvHD.
• Cluster 6: ▲Mouth+5, Eye+5, Liver+5, GI+1 (13.9% of patients); a phenotype enriched for mouth, eye, liver and gastrointestinal (GI) tract chronic GvHD.
• Cluster 7: ▲Liver+10 (9.1% of patients); highly enriched for liver GvHD, all had NIH 3 liver scores while lacking specific involvement in other organ domains.
The meaning of positive liver enrichment differed between cluster groups. Cluster 7 differed from other clusters with liver enrichment by capturing patients with a liver score of 3, while Clusters 1, 3 and 6 had patients with liver scores of 1 and 2.
Machine-learning clusters were stable
In a cluster stability analysis involving four additional runs of viSNE and FlowSOM using the same organ features, five of the seven clusters were highly stable (Online Supplementary Figure S5). Stability was defined as having a median f-measure ≥0.85. Stable clusters had phenotypically similar MEM labels between replications of analysis as well. Clusters 2-5 and 7 were highly stable. Clusters 1 and 6 were unstable with low reproducibility between replications of analysis.
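The stability criterion can be illustrated with a minimal sketch. It assumes that each cluster is a set of patient identifiers, that a cluster from the original run is compared with its best-matching cluster in each replicate run, and that the f-measure is the usual harmonic mean of precision and recall. The text does not spell out the exact matching procedure, so the function names and the toy data below are hypothetical.

```python
from statistics import median

def f_measure(reference: set, candidate: set) -> float:
    """Harmonic mean of precision and recall between two patient sets."""
    overlap = len(reference & candidate)
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

def best_match_f(reference: set, replicate_clusters: list) -> float:
    """F-measure against the best-matching cluster of one replicate run (assumed matching rule)."""
    return max(f_measure(reference, c) for c in replicate_clusters)

def is_stable(reference: set, replicate_runs: list, threshold: float = 0.85) -> bool:
    """A cluster counts as stable when its median f-measure across runs meets the threshold."""
    scores = [best_match_f(reference, run) for run in replicate_runs]
    return median(scores) >= threshold

# Toy example with made-up patient IDs 1-12 and four replicate runs.
original_cluster = set(range(1, 11))
replicate_runs = [
    [set(range(1, 11)), {11, 12}],          # identical cluster recovered
    [set(range(1, 10)), {10, 11, 12}],      # one patient lost
    [set(range(1, 9)) | {11}, {9, 10, 12}], # two lost, one gained
    [set(range(1, 6)), set(range(6, 13))],  # cluster split in half
]

scores = [best_match_f(original_cluster, run) for run in replicate_runs]
stable = is_stable(original_cluster, replicate_runs)
```

Taking the median rather than the mean means that a single poor replicate, such as the split in the fourth toy run, does not by itself make a cluster unstable.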
Clusters of patients identified by machine learning had different overall survival
Overall survival probability was stratified for chronic GvHD patients identified in low-risk (Clusters 1-3), intermediate-risk (Cluster 4), and high-risk (Clusters 5-7) groups defined by computational analysis (Figure 2). Time from the development of chronic GvHD to death differed between the high-risk group and the low-risk group [hazard ratio (HR)=2.24; 95% confidence interval (CI): 1.36-3.68; P=0.002] and, at borderline significance, between the intermediate-risk group and the low-risk group (HR=1.70; 95% CI: 0.99-2.94; P=0.055).
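As a rough consistency check on the figures above, the standard error of the log hazard ratio and an approximate two-sided P value can be back-calculated from each reported confidence interval. This is only a sketch: it assumes Wald-type intervals that are symmetric on the log scale, it is not the Cox model fit used in the study, and the function name is hypothetical.

```python
import math

def wald_from_ci(hr: float, lower: float, upper: float, z_crit: float = 1.96):
    """Back-calculate SE(log HR), the Wald z statistic and a two-sided P value
    from a hazard ratio and its 95% CI (assumes a Wald interval on the log scale)."""
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Reported values: high-risk vs low-risk, then intermediate-risk vs low-risk.
se_high, z_high, p_high = wald_from_ci(2.24, 1.36, 3.68)
se_mid, z_mid, p_mid = wald_from_ci(1.70, 0.99, 2.94)
```

Under these assumptions the recovered P values land close to the reported 0.002 and 0.055. Small discrepancies are expected, because the published hazard ratios and interval bounds are rounded to two decimal places.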
Survival differences were not explained by NIH-Severity alone. When NIH-Severity was viewed on the viSNE scatter plot, clusters varied in NIH-Severity. For example, Cluster 2 patients had a combination of moderate and severe chronic GvHD (Figure 1). Additionally, when overall survival of all patients was stratified by NIH-Severity in a Kaplan-Meier analysis, NIH-Severity did not significantly stratify overall survival (log-rank for trend: P=0.08) (Online Supplementary Figure S6).
A physician-driven decision tree recapitulates machine-learning clusters
To test clinical applicability, a decision tree was developed to classify patients into the seven clusters (Figure 3). The decision tree was based on expert physicians’ interpretation of the organs that were found together in the machine-learning workflow. The decision tree was constructed through observation of viSNE scatter plots and MEM labels from the clusters of patients identified by the machine learning (Figures 1 and 2A). Patients’ outcomes were not considered in developing the decision tree. The decision tree asks a series of seven questions and can phenotype a patient in as few as one question, as is the case for patients in Cluster 7.
The decision tree successfully identified the seven clusters of patients, with highly similar phenotypes to those of the original analysis (Figure 3). Specifically, Clusters 3, 4 and 7 had identical phenotypes by MEM labels when compared with the original machine-learning analysis (Figure 2). The remaining clusters had similar MEM labels to those of the original machine-learning analysis.
The decision tree stratifies patients’ outcomes independently of NIH-Severity
Decision-tree-determined risk groups stratified survival. Patients in decision-tree-derived Clusters 1 (ocular predominant phenotype), 2 (sclerotic phenotype) and 3 (liver predominant-moderate phenotype) were classified as low risk based on Cox proportional hazards risk coefficients (Figure 4). Patients in decision-tree-derived Clusters 4 (mixed-phenotype intermediate risk) and 5 (erythema predominant phenotype) were classified as intermediate risk, while patients in Clusters 6 (mixed phenotype-high risk phenotype) and 7 (liver predominant-severe phenotype)
haematologica | 2019; 104(1)

































































































