Fit Cox Proportional Hazard Regression Model under dependent right censoring via MPL and Archimedean Copulas

Simultaneously estimate the regression coefficients and the baseline hazard function of proportional hazard Cox models under dependent right censoring using maximum penalised likelihood (MPL) and Archimedean Copulas

coxph_mpl_dc(surv, cova, control,...)

Arguments

surv	the outcome of survival data, with the first column X (observed time), second column del (event indicator) and third column eta (dependent right censoring indicator).
cova	the covariate matrix, with dimension of n rows and p columns, where 'n' is the sample size and 'p' is the number of covariates Default is `cova=matrix(0,n,1)`, which the covariates are not considered.
control	object of class `coxph_mpl_dc.control` specifying control options like basis choice, refer to `coxph_mpl_dc.control` to see the defaults.
...	other arguments. In `coxph_mpl_dc`, these elements, will be passed to `coxph_mpl_dc.control`.

Value

mpl_theta

MPL estimates of the regression coefficient for the basis functions of the baseline hazard of T, i.e. theta

mpl_gamma

MPL estimates of the regression coefficient for the basis functions of the baseline hazard of C, i.e. gamma

mpl_h0t

MPL estimates of the baseline hazard for T, i.e. h_0T(x_i)

mpl_h0c

MPL estimates of the baseline hazard for C, i.e. h_0C(x_i)

mpl_H0t

MPL estimates of the baseline cumulative hazard for T, i.e. H_0T(x_i)

mpl_H0c

MPL estimates of the baseline cumulative hazard for C, i.e. H_0c(x_i)

mpl_S0t

MPL estimates of the baseline survival for T, i.e. S_0T(x_i)

mpl_S0c

MPL estimates of the baseline survival for C, i.e. S_0C(x_i)

mpl_beta

MPL estimates of beta

mpl_phi

MPL estimates of phi

penloglik

the penalized log-likelihood function given the MPL estimates

mpl_Ubeta

the first derivative of penalized log-likelihood function with respect to beta given the MPL estimates

mpl_Uphi

the first derivative of penalized log-likelihood unction with respect to phi given the MPL estimates

mpl_Utheta

the first derivative of penalized log-likelihood function with respect to theta given the MPL estimates

mpl_Ugamma

the first derivative of penalized log-likelihood function with respect to gamma given the MPL estimates

iteration

a vector of length 3 indicating the number of iterations used to estimate the smoothing parameter (first value, equal to 1 when the user specified a chosen value), the beta, phi, theta and gamma parameters during the entire process (second value), and beta, phi, theta and gamma parameters during the last smoothing parameter iteration (third value)

mpl_cvl

the cross validation value given the MPL estimates

mpl_aic

the AIC value given the MPL estimates

mpl_beta_sd

the asymptotic standard deviation of the MPL estimated beta

mpl_phi_sd

the asymptotic standard deviation of the MPL estimated phi

mpl_h0t_sd

the asymptotic standard deviation of the MPL estimates for the baseline hazard coefficient of T, i.e. theta

mpl_h0c_sd

the asymptotic standard deviation of the MPL estimates for the baseline hazard coefficient of C, i.e. gamma

mpl_ht0_sd

the asymptotic standard deviation of the MPL estimates for the baseline hazard of T

mpl_hc0_sd

the asymptotic standard deviation of the MPL estimates for the baseline hazard of C

mpl_Ht0_sd

the asymptotic standard deviation of the MPL estimates for the cumulative baseline hazard of T

mpl_Hc0_sd

the asymptotic standard deviation of the MPL estimates for the cumulative baseline hazard of C

mpl_St0_sd

the asymptotic standard deviation of the MPL estimates for the baseline survival of T

mpl_Sc0_sd

the asymptotic standard deviation of the MPL estimates for the baseline survival of C

mpl_est_cov

the asymptotic covariance matrix of the MPL estimates

mpl_beta_phi_zp

the MPL estimates for regression coefficient with their corresponding standard deviations, z scores and p-values

binwv

the width of each discretized bin of the observed times when piecewise constant approximation applied to the baseline hazards

the bin ID for each individual of the sample when piecewise constant approximation applied to the baseline hazards

binedg

the edge for each discretized bin among the observed time vector X, which are the internal knots and boundaries

psix

basis function matrix psi(x_i) with dimension of n by m for baseline hazard, where m=number of internal knots+ordSp

Psix

basis function matrix Psi(x_i) with dimension of n by m for baseline cumulative hazard

Inputs defined in coxph_mpl_dc.control

Details

coxph_mpl_dc allows to simultaneously estimate the regression coefficients and baseline hazard function of Cox proportional hazard models, with dependent and independent right censored data, by maximizing a penalized likelihood, in which a penalty function is used to smooth the baseline hazard estimates. Note that the dependence between event and censoring times is modelled by an Archimedean copula.

Optimization is achieved using an iterative algorithm, which combines Newton’s method and the multiplicative iterative algorithm proposed by Ma (2010), and respects the non-negativity constraints on the baseline hazard estimate (refer to Ma et al (2014) and Xu et al (2018)).

The centered covariate matrix \(Z\) is used in the optimization process to get a better shaped (penalized) log-likelihood. Baseline hazard parameter estimates and covariance matrix are then respectively corrected using a correction factor and the delta method.

The estimates of zero are possible for baseline hazard parameters (e.g., when number of knots is relatively large to sample size) and will correspond to active constraints as defined by Moore and Sadler (2008). Inference, as described by Ma et al (2014) or Xu et al (2018), is then corrected accordingly (refer to Moore and Sadler (2008)) by adequately cutting the corresponding covariance matrix.

There are currently three ways to perform inference on model parameters: Let \(H\) denote the negative of Hessian matrix of the non-penalized likelihood, \(Q\) denote the product of the first order derivative of the penalized likelihood by its transpose, and \(M_2\) denote the negative of the second order derivative of the penalized likelihood. Then the three estimated covariance matrices for the MPL estimates are \(M_{2}^{-1}\), \(M_{2}^{-1}HM_{2}^{-1}\) and \(M_{2}^{-1}QM_{2}^{-1}\).

Simulation studies the coverage levels of confidence intervals for the regression parameters seem to indicate \(M_{2}^{-1}HM_{2}^{-1}\) performs better when using the piecewise constant basis, and that \(M_{2}^{-1}QM_{2}^{-1}\) performs better when using other bases.

References

Ma, J. (2010). "Positively constrained multiplicative iterative algorithm for maximum penalised likelihood tomographic reconstruction". IEEE Transactions On Signal Processing 57, 181-192.

Ma, J. and Heritier, S. and Lo, S. (2014). "On the Maximum Penalised Likelihood Approach forProportional Hazard Models with Right Censored Survival Data". Computational Statistics and Data Analysis 74, 142-156.

Xu J, Ma J, Connors MH, Brodaty H. (2018). "Proportional hazard model estimation under dependent censoring using copulas and penalized likelihood". Statistics in Medicine 37, 2238–2251.

Author

Jing Xu, Jun Ma, Thomas Fung

Examples

# \donttest{
 ##-- Copula types
 copula3 <- 'frank'

 ##-- Marginal distribution for T, C, and A
 a <- 2
 lambda <- 2
 cons7 <- 0.2
 cons9 <- 10
 tau <- 0.8
 betas <- c(-0.5, 0.1)
 phis <- c(0.3, 0.2)
 distr.ev <- 'weibull'
 distr.ce <- 'exponential'

 ##-- Sample size
 n <- 200

 ##-- One sample Monte Carlo dataset
 cova <- cbind(rbinom(n, 1, 0.5), runif(n, min=-10, max=10))
 surv <- surv_data_dc(n, a, cova, lambda, betas, phis, cons7, cons9, tau, copula3,
                     distr.ev, distr.ce)
 n <- nrow(cova)
 p <- ncol(cova)
 ##-- event and dependent censoring proportions
 colSums(surv)[c(2,3)]/n
#>   del   eta 
#> 0.415 0.415 
 X <- surv[,1] # Observed time
 del<-surv[,2] # failure status
 eta<-surv[,3] # dependent censoring status

 ##-- control inputs for the coxph_mpl_dc function
 control <- coxph_mpl_dc.control(ordSp = 4,
                             binCount = 100,
                             tau = 0.8, copula = copula3,
                             pent = 'penalty_mspl', smpart = 'REML',
                             penc = 'penalty_mspl', smparc = 'REML',
                             cat.smpar = 'No' )

 ##-- Fitting cox ph hazard model for T using MPL and an correct copula
 #with REML smoothing parameters
 coxMPLests5 <- coxph_mpl_dc(surv, cova, control, )
 mpl_beta_phi_zp5 <- coxMPLests5$mpl_beta_phi_zp
 mpl_h0t5 <- coxMPLests5$mpl_h0t
 mpl_h0Ti5 <- approx( X, mpl_h0t5, xout = seq(0, 5.4, 0.01),
                    method="constant", rule = 2, ties = mean)$y

 ##-- Real marginal baseline hazard for T
 ht0b <- a * (seq(0, 5.4, 0.01) ^ (a - 1)) / (lambda ^ a)


 ##-- Fitting cox ph hazard model for T using MPL and an correct copula
 #with zero smoothing parameters
 coxMPLests3 <- coxph_mpl_dc(surv, cova,
                          ordSp=4, binCount=100,
                          tau=0.8, copula=copula3,
                          pent='penalty_mspl', smpart=0, penc='penalty_mspl', smparc=0,
                          cat.smpar = 'No')
 mpl_beta_phi_zp3 <- coxMPLests3$mpl_beta_phi_zp
 mpl_h0t3 <- coxMPLests3$mpl_h0t
 mpl_h0Ti3 <- approx( X, mpl_h0t3, xout = seq(0, 5.4, 0.01),
 method="constant", rule = 2, ties = mean)$y

 ##-- Plot the true and estimated baseline hazards for T
 t_up <- 3.5
 y_uplim <- 2
 Ti<-seq(0, 5.4, 0.01)[seq(0, 5.4, 0.01)<=t_up]
 h0Ti<-ht0b[seq(0, 5.4, 0.01)<=t_up]
 h0Ti5<-mpl_h0Ti5[seq(0, 5.4, 0.01)<=t_up]
 h0Ti3<-mpl_h0Ti3[seq(0, 5.4, 0.01)<=t_up]

 plot( x = Ti, y = h0Ti5,
      type="l", col="grey", lty=4, lwd=3, cex.axis=1.6, cex.lab=1.6, ylim=c(0, y_uplim),
      xlab='Time', ylab='Hazard')
 lines(x = Ti, y = h0Ti,
      col="green",
      lty=1, lwd=3, cex.axis=1.6, cex.lab=1.6, ylim=c(0, y_uplim)
      )
 lines(x = Ti, y = h0Ti3,
      col="blue",
      lty=4, lwd=3, cex.axis=1.6, cex.lab=1.6, ylim=c(0, y_uplim)
      )

# }