2022_03-Haematologica-web


GEP for high dimensional prognostic models in CLL
missing values in the clinical data were imputed using chained equations.16 The algorithm imputes the missing values using a model with all other clinical variables as predictors, thus generating ‘plausible’ synthetic values. As the percentage of missingness for each variable was low (maximum of 16 missing values in 337 patients), a single imputation method was adequate. Furthermore, non-specific filtering was performed, selecting the 500 genes with the highest variability over all samples. The final model was built with a sparse Cox proportional hazards model using the smoothly clipped absolute deviation (SCAD) penalty.17 The “reference model” for our analysis is a Cox proportional hazards model including variables with confirmed prognostic impact: age (continuous), sex (male or female), study medication (FC or FCR), ECOG performance status (1 or 2 vs. 0), WBC, TK and β2-m (all continuous), IGHV/NOTCH1/SF3B1 mutation status (all unmutated vs. mutated), del(11q), del(13q), del(17p), trisomy 12 and TP53 mutation (all present or absent). The analysis is based on updated results from the CLL8 trial.1
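As an illustration of two ingredients named above, the non-specific variance filter and the SCAD penalty can be sketched in a few lines of Python. This is a minimal sketch and not the authors' code (the analysis itself was run in R): the function names, the toy data and the shape parameter a = 3.7 (a commonly used default) are assumptions that do not come from the paper.

```python
import statistics


def top_variance_genes(expr, k=500):
    """Return the names of the k genes with the largest sample variance.

    expr maps a gene name to its expression values across all samples
    (hypothetical input layout, chosen for illustration only).
    """
    ranked = sorted(expr, key=lambda g: statistics.variance(expr[g]), reverse=True)
    return ranked[:k]


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty for a single coefficient (requires a > 2).

    a = 3.7 is an assumed default; the paper does not report the value used.
    """
    b = abs(beta)
    if b <= lam:
        # Small coefficients: same linear penalty as the lasso.
        return lam * b
    if b <= a * lam:
        # Intermediate coefficients: quadratic transition zone.
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no additional shrinkage.
    return lam * lam * (a + 1) / 2


# Toy data: three made-up genes measured in four samples.
toy_expr = {
    "geneA": [1.0, 1.1, 0.9, 1.0],
    "geneB": [0.2, 2.5, 1.4, 3.1],
    "geneC": [5.0, 5.5, 4.5, 5.2],
}
selected = top_variance_genes(toy_expr, k=2)
penalties = [scad_penalty(b, lam=1.0) for b in (0.5, 2.0, 10.0)]
```

The penalty is flat beyond a·λ, which is why strong predictors are penalized less heavily than they would be under a lasso penalty with the same λ.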
Models investigated for possible improvement of prognostication using GEP included, first, the combination of all above-mentioned confirmed prognostic variables without penalization and a subset of the GEP data selected by SCAD penalization (referred to as the “fixed model”), and second, the combination of confirmed prognostic variables and GEP data in which all variables were equally penalized (the “equally penalized model”), allowing for substitution of the confirmed prognostic variables with equally strong prognostic GEP variables. For internal validation, bootstrap subsampling with 1,000 subsamples, each equal to 63.2% of the original sample size, was used.18 The prognostic value of the final model was evaluated on the basis of the time-dependent Brier score (as implemented in the R package pec).19 The Brier score was used to estimate the prediction error at a given time point. The resulting prediction error curves show the time-dependent Brier score over 60 months of follow-up, and the integrated Brier score (IBS) was used to summarize prediction accuracy. For external validation the apparent error was calculated. For visualization purposes, survival curves were calculated by means of the Stone-Beran estimator20 using symmetrical nearest neighborhoods around the lowest, the median, and the highest observed values of the prognostic variable combinations using the R package prodlim,21 both for OS and PFS. Statistical analysis was performed with the R environment for statistical computing, version 3.3.1, using the R packages survival, version 2.39-5, prodlim, version 1.5.7, mice, version 2.25, ncvreg, version 3.6-0, pec, version 2.4.9 and bootstrap, version 2015.2.
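The subsampling scheme and the Brier score summary described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the pec implementation: all function names are hypothetical, patients censored before the evaluation time are simply skipped (a full implementation would typically weight by the inverse probability of censoring), and normalizing the integrated score by the length of the time window is an assumption.

```python
import random


def subsample_indices(n, fraction=0.632, n_subsamples=1000, seed=0):
    """Draw subsamples without replacement, each of size round(fraction * n)."""
    rng = random.Random(seed)
    size = round(fraction * n)
    return [rng.sample(range(n), size) for _ in range(n_subsamples)]


def brier_score(times, events, surv_prob, t):
    """Mean squared error between observed status at t and predicted survival.

    times: follow-up time per patient; events: 1 if the event was observed,
    0 if censored; surv_prob: predicted probability of being event-free at t.
    Simplification: patients censored on or before t are skipped, unweighted.
    """
    sq_errors = []
    for time, event, p in zip(times, events, surv_prob):
        if time > t:
            sq_errors.append((1.0 - p) ** 2)  # still event-free at t
        elif event:
            sq_errors.append((0.0 - p) ** 2)  # event occurred by t
    return sum(sq_errors) / len(sq_errors)


def integrated_brier_score(grid, scores):
    """Trapezoidal integral of the Brier curve, divided by the time span."""
    area = sum(
        (grid[i + 1] - grid[i]) * (scores[i] + scores[i + 1]) / 2
        for i in range(len(grid) - 1)
    )
    return area / (grid[-1] - grid[0])


# 1,000 subsamples of a cohort of 337 patients, as in the text above.
subsamples = subsample_indices(337)

# Toy example with made-up values: four patients evaluated at t = 40 months.
bs_40 = brier_score(
    times=[10, 30, 50, 70],
    events=[1, 0, 1, 0],
    surv_prob=[0.2, 0.5, 0.8, 0.9],
    t=40,
)
ibs = integrated_brier_score([0, 30, 60], [0.0, 0.1, 0.2])
```

In an internal validation loop, a model would be refitted on each subsample and its Brier score evaluated on the patients left out of that subsample; that step is omitted here because it depends on the fitted survival model.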
For validation, the prognostic gene signature established on the CLL8 cohort was tested in an array-based GEP training set of an independent cohort (n=149 unsorted CLL samples from treatment-naive [83%] and pretreated [17%] patients).22 Unmutated IGHV was reported in 49.3% and del(17p) in 8.6% of tested samples. Further details on cohort characteristics are provided in a previous publication.22
Results
Gene expression profiling variables substitute established prognostic markers in multivariate models
We first established multivariate models for variables whose prognostic impact was confirmed in previous studies; these models are herein referred to as the “reference model”. Results are shown in Online Supplementary Table S1A for OS and in Online Supplementary Table S1B for PFS.
In order to evaluate the impact on OS of including a signature consisting of GEP variables selected in the penalized Cox model (Online Supplementary Table S2A), we tested various combinations of confirmed prognostic variables and GEP. Only model combinations including genetic markers with prognostic impact achieved prediction error estimates similar to those of the confirmed prognostic variables used in the “reference model” (Figure 1A). Using the “fixed model”, penalization of GEP resulted in selection of only one GEP variable (PITPNC1, phosphatidylinositol transfer protein cytoplasmic 1) and no further improvement as compared to the reference model (IBS: reference model 0.092; fixed model 0.092) (Figure 1A).
In contrast, using the “equally penalized model” on all variables from the reference model and GEP data resulted in selection of only three confirmed prognostic markers (FCR, β2-m, del(17p)) along with ten GEP variables comprising the genes CLEC2B, RGS1, LDOC1, L3MBTL4, PRKCA, FHL1, SGCE, DCLK2, VSIG1, CD72 (Online Supplementary Table S3A). When assessing the prediction accuracy, this model performed similarly to the reference model (IBS: reference model 0.092; equally penalized model 0.096) (Figure 1A). When analyzing PFS by prediction models including a signature of selected GEP variables for PFS (Online Supplementary Table S2B) with the same approach, the “fixed model” did not lead to selection of GEP variables besides the confirmed prognostic variables. Conversely, only four confirmed prognostic markers (FCR, del(11q), del(17p), SF3B1 mutation) were selected in the “equally penalized model”, together with 11 GEP variables including the genes RGS1, EIF1AY, LDOC1, L3MBTL4, DCAF12, PLD5, GTSF1L, NIPAL2, CYBRD1, ANXA1 (Online Supplementary Table S3B). Again, variables selected in the “equally penalized model” performed similarly to the “reference model”, as demonstrated by prediction error estimates (IBS: reference model 0.160; equally penalized model 0.166; fixed model 0.160) (Figure 1B). Of note, strong prognostic markers like TP53 and IGHV mutation status (Online Supplementary Table S1) were substituted in both models by prognostic GEP variables (Online Supplementary Table S3).
For the prognostication of PFS, inclusion of GEP data alone or in addition to non-genetic variables (β2-m, TK, WBC, ECOG, study medication, sex and age) compensated for missing genetic information in patients with late disease progression (Figure 1B). In such models, GEP reliably increased prediction accuracy for patients over time as prediction error curves converged with those of the reference model. Prediction accuracy was comparable with the reference model at 60 months.
The overall number of prognostic variables remained similar for either model (“reference model”: OS/PFS 15 variables vs. “equally penalized”: OS 13 and PFS 15 variables). Although chromosomal gains or losses covered multiple genes, these variables were substituted by the expression of only a few genes. Furthermore, expression variables selected along with clinical variables in the penalized models for OS and PFS were not derived from genes localized in the recurrently deleted or amplified chromosomal regions (Online Supplementary Table S3A and B).
Gene expression profiling signatures refine prognostic estimation and retain strong prognostic value in an independent cohort of unselected patients
In order to illustrate the distribution for OS and PFS within the different prediction models, conditional Kaplan-Meier estimates were generated and survival curve estimates are shown for lowest, median, and highest values of
haematologica | 2022; 107(3)
