Using Copulas to Model and Simulate Two-dimensional Warranty Data

Yuichi Hirose, Haylem Rayner, Stefanka Chukova and Binh Nguyen

December 6, 2019

Introduction

We consider a set of warranty data taken from model year 2001 vehicles sold mainly in calendar years 2000 and 2001. The ``cut’’ off date for this dataset is Oct., 24, 2003.

In the warranty data,

Thus we have two usage measures (age and mileage), but one of them (mileage) is incompletely observed. We model the warranty claims as recurrent events from a repairable system.

Warranty data set

Terminology and Notations I

Terminology and Notations II

The Model Formulation

The likelihood Function I

In order to derive the likelihood, we need to evaluate the following probabilities:

The likelihood Function II

The Likelihood Function III

Under 2D-NHPP scenario, as given above, the likelihood of the observed events generated by the \(i^{th}\) unit is

\[\begin{eqnarray*}\label{i_likelihood} L_i & = & \prod_{j=1}^{n_i} h(t_{ij},u_{ij}) \prod_{j=0}^{n_i}\exp\left\{-\int_{t_{ij}}^{t_{i (j+1)}} \int_{u_{ij}}^{u_{i(j+1)}} h(r,s) ds dr \right\}, \end{eqnarray*}\]

and the combined likelihood of the data for all units is \[\begin{eqnarray*}\label{M_likelihood} L %&=& \prod_{i=1}^M L_i \\ \nonumber &=& \prod_{i=1}^{M} \left\{\prod_{j=1}^{n_i} h(t_{ij},u_{ij}) \prod_{j=0}^{n_i}\exp\left\{-\int_{t_{ij}}^{t_{i (j+1)}} \int_{u_{ij}}^{u_{i(j+1)}} h(r,s) ds dr \right\} \right\}, \end{eqnarray*}\] where \(t_{i, n_{i+1}} = \mbox{min}\hspace{1ex} (a_i, T_0)\), \(u_{i, n_{i+1}} = U_0\) and \(t_{i0} = u_{i0} = 0.\)

Bivariate Intensity Models: Methods of Copula I

For a given copula function \(C(x,y)\), the two-dimensional joint survival copula \(\overline{C}(x,y)\) is analogous to the joint survival function \(\overline{F}(t,u)\). It is well known that \[\begin{equation}\label{surv} \overline{F}(t,u) = 1-F_T(t)-F_U(u)+F(t,u), \end{equation}\] where \(F(t,u)\) is the joint cdf, \(F_T(t)\) and \(F_U(u)\) are the marginal cdfs. Then by substituting in \(C(x,y)\) and using the Sklar’s theorem (, page 18) into () we arrive at \[\begin{equation} \overline{C}(F_T(t),F_U(u)) = 1-F_T(t)-F_U(u)+ C(F_T(t),F_U(u)). \end{equation}\]

Bivariate Intensity Models: Methods of Copula II

Therefore, the survival copula without marginal distributions is \[\begin{equation} \overline{C}(x,y) = 1 - x - y + C(x,y),\end{equation}\] where \(0 \leq x \leq 1\) and \(0 \leq y \leq 1\). The mixed partial derivative of the copula \(C(x,y)\) is the copula density. Substituting the joint survival copula in place of the survival function and the copula density leads to \[\begin{equation}h_C(x,y) = \cfrac{\frac{\partial^2 C (x,y)}{\partial x \partial y}}{1 - x - y + C(x,y)}.\end{equation}\]

Fitting the age marginal distributions

The best fit for the marginal of \(T\) was found to be the Weibull distribution with \(\beta_1=0.903\) and \(\theta_1=864.896\), i.e., distribution with cumulative hazard function \[\hat{\Lambda}_T(t)=\left(\frac{t}{864.896}\right)^{0.903}.\]

Fitting the mileage marginal distributions

Similarly, the best fit for the marginal of \(U\) was found to be the Weibull distribution with \(\beta_2=0.911\) and \(\theta_2=314.295\), i.e., distribution with cumulative hazard function \[\hat{\Lambda}_U(u)=\left(\frac{u}{314.295}\right)^{0.911}.\]

Examples of symmetric Copula

Name Copula \(C(u, v)\) Parameter \(\theta\)
Independence \(uv\)
Gumbel \(\exp\biggl\{-\bigg(\log(u)^\theta + \log(v)^\theta\bigg)^{1/\theta}\biggr\}\) \(\theta \in [1, \infty]\)
Clayton \(\max\{u ^{-\theta} + v ^{-\theta} - 1, 0\} ^{-1/\theta}\) \(\theta \in [-1, \infty)\backslash\{0\}\)
AMH \(\frac{uv}{1 - \theta(1 - u)(1-v)}\) \(\theta \in (-1, 1)\)
Joe \(1 - \big((1-u)^\theta + (1-v)^\theta - (1-u)^\theta(1-v)^\theta\big)^{1/\theta}\) \(\theta \in [1, \infty)\)
FGM \(uv + \theta u v(1-u)(1-v)\) \(\theta \in (-1, 1]\)

Model fitting using candidate copula I

Copula ln L AIC Parameters
Product -587465 1174932 ——-
Ali -572875 1145752 delta=1.000
Clayton -565068 1130138 delta=4.118
FGM -583761 1167524 delta=1.000
Frank -566861 1133724 delta=13.373
Joe -570028 1140058 delta=7.480
Gaussian -564643 1129288 rho=0.923
Student-t -564626 1129256 nu=55, rho=0.922
Gumbel -564387 1128776 delta=0.214

Based on the value of AIC, the Gumbel copula with \(\delta=0.214\) is the best fitted copula.

Simulation I

We treat each elementary region as a Bernoulli random variable \(X\) with probaibilty \(h(t,u)\delta t \delta u\):

\[ X \sim \textrm{Bernoulli}(p=h(t,u)\delta t \delta u). \]

Simulation II

Repeat the same procedure for the new region.

Simulated data from Gumbel copula

Asymmetric copula created from symmetric copula

Model fitting using candidate copula II

Copula AIC
KH (Clayton, Clayton) 839761.39
KH (Clayton, Gumbel) 840223.62
KH (FGM, Clayton) 840479.57
KH (Max, Clayton) 840481.76
KH (Min, Clayton) 840482.03
KH (Joe, Clayton) 840711.57
KH (AMH, Gumbel) 842520.04
KH (Joe, Gumbel) 842520.04
KH (FGM, Gumbel) 842520.04
KH (Gumbel, Gumbel) 842520.04
KH (Min, Gumbel) 842522.04
KH (Max, Gumbel) 842692.79
Wu(Gumbel) 844252.90
Gumbel 844366.35
Wu(Clayton) 845135.83
Clayton 845160.01
KH (Max, Joe) 845239.56
KH (Joe, Joe) 846724.36
KH (AMH, Clayton) 847060.09
KH (Joe, AMH) 849395.83
KH (Joe, FGM) 849539.47
Joe 849891.64
KH (Min, Joe) 854619.55
AMH 854803.72
Wu(Min) 855933.50
KH (Max, AMH) 856219.76
Max 858857.45
KH (Max, FGM) 859697.92
KH (FGM, AMH) 861122.03
KH (AMH, AMH) 861159.76
KH (FGM, FGM) 861763.73
FGM 861934.77
Wu(FGM) 861946.80
Wu(AMH) 862223.01
KH (Max, Max) 862571.52
KH (Max, Min) 864848.54
Wu(Joe) 865956.33
Min 866181.04
Wu(Max) 866194.86
KH (Min, AMH) 866236.35
KH (Min, FGM) 866237.40
KH (Min, Min) 866334.64

Based on the value of AIC, the KH (Clayton, Clayton) copula is the best fitted copula.

Simulated data from KH (Clayton, Clayton) copula

Currently working with …

Reference