## Abstract

Understanding conception probabilities is important not only for helping couples to achieve pregnancy but also in identifying acute or chronic reproductive toxicants that affect the highly timed and interrelated processes underlying hormonal profiles, ovulation, libido, and conception during menstrual cycles. Currently, 2 statistical approaches are available for estimating conception probabilities depending upon the research question and extent of data collection during the menstrual cycle: a survival approach when interested in modeling time-to-pregnancy (TTP) in relation to women or couples' purported exposure(s), or a hierarchical Bayesian approach when one is interested in modeling day-specific conception probabilities during the estimated fertile window. We propose a biologically valid discrete survival model that unifies the above 2 approaches while relaxing some assumptions that may not be consistent with human reproduction or behavior. This approach combines both the survival and the hierarchical models allowing investigators to obtain the distribution of TTP and day-specific probabilities during the fertile window in a single model. Our model allows for the consideration of covariate effects at both the cycle and the daily level while accounting for daily variation in conception. We conduct extensive simulations and utilize the New York State Angler Prospective Pregnancy Cohort Study to illustrate our approach. We also provide the code to implement the model in R software in the supplemental section of the supplementary material available at *Biostatistics* online.

## INTRODUCTION

Fecundability is defined as the probability of recognized conception in a menstrual cycle among couples having regular unprotected intercourse (Gini, 1924). It is used as a measure of a couple's fecundity or biologic capacity for reproduction. Motivated by the needs of natural family planners, and the research and clinical communities desire to identify fecundity determinants or timing of medical intervention, 2 quantities of interest have emerged: the time-to-pregnancy (TTP) and the day-specific conception probabilities in a given menstrual cycle for a couple.

TTP is defined as the number of menstrual cycles it takes for a couple, having regular sexual intercourse without the use of contraception, to conceive. It has been used as a measure to study the effect of various exposures on fecundity, see, for instance, Baird *and others* (1986), Law *and others* (2005), and Buck Louis *and others* (2009). Weinberg and Gladen (1986) first proposed a beta-geometric model for the probability distribution of the TTP. Subsequently, Scheike and Jensen (1997) proposed a discrete survival model for the hazard of conception in a cycle *j* given cycle level covariates $Uij$, as

A question of considerable interest, in the presence of detailed daily level information, is the probability of conception due to a single act of intercourse on a given day relative to ovulation. This is known as the day-specific conception probability. This enables one to determine the “fertile window,” a quantity of considerable interest to couples planning or trying to avoid pregnancy. The days outside of the “fertile window” have a small chance of pregnancy. The estimation of day-specific conception probabilities is complicated by the fact that it is unusual for a sexually active couple to have a single act of intercourse during the fertile window. Additionally, only a proxy for ovulation is available as the gold standard requires direct visualization of the ovaries via laparoscopic or ultrasonographic techniques that are available only for women seeking medical or infertility treatment such as *in vitro* fertilization or intracytoplasmic sperm injection. Consequently, differing precision exists in such identification.

The original seminal work of Barrett and Marshall (1969) proposed

**(A1)**Each day of intercourse has an independent effect (that is to say, each ejaculate of sperm acts independently) to fertilize the ovum.**(A2)**The daily probabilities of conception are constant across couples and cycles.

Since then, the literature has responded to the second assumption (A2) to a great extent starting with the work of Schwartz *and others* (1980) who noted that conception is not only dependent on the timing of the intercourse but also on several biological factors such as the penetrability of the mucus, the capacity of the ovum to be fertilized and the receptivity of the uterine lining for implantation. This led to the model

*ω*is the probability of a cycle being viable. Modifications of this model include the Weinberg

*and others*(1994c) model that accommodates cycle-specific covariates; the Zhou and Weinberg (1996) model that incorporates day-specific covariates; the Zhou

*and others*(1996) model that accounts for the within-woman dependency; the Dunson and Zhou (2000) model that incorporates both within-woman dependency and a sterile fraction. Zhou and Weinberg (1996) proposed an expectation–maximization (EM) algorithm for the maximum likelihood estimation of parameters and used a sandwich estimator of the variance to adjust for within-woman dependency. However, their estimates of day-specific probabilities are biased downward, and the sandwich estimator is not valid because less fertile women contribute more cycles to the data. This class of models culminated in the work of Dunson and Stanford (2005), who note the weak identifiability of the cycle viability term

*ω*. Consequently, they proposed a hierarchical model

An assumption that is common to all day-specific models is (A1), namely that the ejaculates of sperm introduced by the intercourse acts on different days compete with each other independently in an attempt to fertilize the ovum. In particular, the independence assumption necessitates the relation

*k*th intercourse act fertilizes the ovum, and $Xk$ denotes the indicator of intercourse occurring on the

*k*th day of the cycle. This may not be a reasonable biological assumption. Each intercourse act in the fertile window introduces a fresh ejaculate of sperm in the reproductive tract that can potentially fertilize the ovum. It is well known in the clinical literature (van Duijn and Freund, 1971; Tyler

*and others*, 1985; Levin

*and others*, 1986; Carlsen

*and others*, 2004) that tremendous inter- and intravariability exists in sperm quality ranging from azoospermic (absence of sperm) to high-quality sperm. In fact, studies have shown that there is a 29.2% reduction in sperm concentration when ejaculatory frequency went from one to two episodes during a 7-day period prior to semen collection following 2 days of abstinence. For 3 or more ejaculations, the level was reduced by about 41% as compared to the one ejaculate group (Carlsen

*and others*, 2004). Consequently, this implies that the number of sperm available for fertilization does not just vary from day to day (assuming one ejaculate per day) but also depends on how many intercourse acts occurred previously (that is, on the previous $Xk$'s). Sperm concentration and number per ejaculate may affect the probability of conception. These data suggest that (1.3) may not be reasonable. Although we focused on sperm concentration for the purpose of illustration, a host of other semen characteristics, such as motility and morphology also impact conception probability and are dependent upon previous intercourse pattern (Carlsen

*and others*, 2004), supporting the need to relax the independence assumption.

Royston and Ferreira (1999) proposed an alternative approach that does not require the independence assumption in that they assume that in cycles with multiple intercourse acts only the most fertile one contributes to the probability of conception. Although this may be a reasonable approximation in some cases, sperm introduced on less optimal days in the fertile window can also compete to fertilize the ovum and should not be ignored, as normal appearing sperm may survive up to 5 or 6 days in the female reproductive tract.

In this paper, we propose a model for the hazard for conception in a menstrual cycle that takes into account the daily level intercourse acts appropriate for conception as well as the effects of covariates (daily, cycle, or couple level). This enables us to assess the effect of covariates directly on the survival (hazard) function for TTP. This model retains the ease of interpretation of the effects of covariates as in the discrete Cox model, while still allowing us to assess the more subtle daily level probabilities of conception in a cycle. Moreover, in our approach, we do not require the assumption of independence of acts of intercourse in fertilizing the ovum within the same cycle inherent in the day-specific conception models. Under the assumption (A1), Dunson (2003) has discussed assessing hazard based on day-specific models that predate the Dunson and Stanford (2005) model. In Section 2, we present our model and discuss its relation with existing models; in Section 3, we provide extensive simulations; and in Section 4, we show the application of our model to a prospective pregnancy cohort study with preconception enrollment of women who were followed through 12 menstrual cycles at risk for pregnancy.

## MODEL FOR CONCEPTION

We begin by introducing some notation. Let $Ti$ denote the TTP for couple $i,i=1,2,\u2026,n$. As is usual in many time to event studies, $Ti$ is subject to right censoring ($\tau i$) and one observes ${Ti\u2227\tau i,\delta i=I(Ti\u2264\tau i)}$, where $I(\xb7)$ denotes the indicator function. Let $Xij=(Xij1,Xij2,\u2026,XijK)$ denote the intercourse indicators in the fertile window of *j*th cycle for the *i*th couple. Denote by $Uij,$ the cycle-level covariates. Further, denote by $\lambda i(j)$, the hazard rate for the TTP of the *i*th couple.

Let $Cj$ be the event that the ovum is fertilized in the *j*th cycle, and $Ak$ be the event that a sperm from the *k*th intercourse fertilizes the ovum. Note that the fertilization of an ovum normally requires a sperm originating from one of the potential intercourse acts that the couple may have had in the fertile window. Then $Cj=\u222ak=1KAk$ (disjoint union of events). So,

*k*with that occurring on day $k\u2032$ in the fertile window. These weights are estimated based on the observed sample. Furthermore, we propose to directly model conception in cycle

*j*, given that conception has not occurred so far, by

*j*. In other words, we propose the following discrete survival model for TTP:

*η*. Observe that the proposed model corrects for “immaculate conception,” that is, the hazard for conception in a cycle is zero if the couple does not have any intercourse in the fertile window of that cycle. The regression coefficients $\alpha k$ capture the baseline

*k*th day effect of intercourse on the probability of conception in cycle

*j*. The cycle-varying parameter $\rho (j)$ denotes the cycle-specific baseline, a quantity of considerable interest (Weinberg

*and others*, 1994b). The regression coefficients

*β*capture the effect of the covariates $Uij$. Observe that if a couple had intercourse only on day

*k*, that is, $Xijk=1,Xijk\u2032=0,fork\u2032\u2260k$, then, under the proposed model, the probability of conception in cycle

*j*is given by

*j*if the couple had intercourse only on a specific day in the fertile window of cycle

*j*. This is analogous to the day-specific probabilities of conception in the day-specific models for conception. Consequently, we refer to it as the

*k*th day-specific conditional hazard of conception. Also, note that the effect of covariates on $\lambda i(k)$ can be viewed as additive effects on complementary log–log scale.

Furthermore, we can also estimate the effect of covariates directly on the probability for conception in cycle *j* as follows:

Similar to Scheike and Jensen (1997), the marginal forms (with respect to random effect $Ri$) for the probability of conception in cycle *j* and the hazard for conception in cycle *j* can be expressed as

*η*, the hazard rate $\lambda i(j)=P(Ti=j|Ti\u2265j)$ is given by

*k*th day-specific conditional probability of conception in cycle

*j*is given by

Most prospective pregnancy studies design data collection to include the use of daily diaries to ascertain daily level covariates such as menstruation and sexual intercourse and, possibly, factors purported to impact couple fecundity (e.g., cigarette smoking, alcohol, and caffeine consumption). One can also estimate the effect of such covariates in the model by viewing them as a vector $Zij=(Zij1,\u2026,ZijL)$, the number of days *L* of interest need not be the same as the fertile window. One can incorporate the daily-level covariate into the model as follows:

To summarize, our model unifies the 2 approaches for modeling fecundability: TTP approach and day-specific approach while (i) accounting for the couple-level heterogeneity through the random effect, (ii) accounting for the cycle-level baseline effect $\rho (j)$, (iii) assessing day-level covariates directly on the cycle-level probability of conception rather than through day-specific probabilities of conception, (iv) while not requiring the independence of sperm fertilizing assumption (A1). The proposed discrete survival model can be fitted using a likelihood-based approach. We include the code to implement it using R software in the Supplementary Section of the supplementary material available at *Biostatistics* online.

## NUMERICAL STUDY

The goal of this section is to investigate the performance of the estimates using a likelihood-based approach. We also investigated the effect of zero-risk sets on estimation. Here, “zero-risk” indicates that a couple did not have an intercourse acts during the fertile window of a cycle and, consequently, did not put themselves at risk for conception. Observe that this unique requirement of an intermediate event is one of the features that sets the TTP data apart from the classical discrete survival setup. A common practice of incorporating intercourse into the discrete survival setup is to summarize the number of intercourse acts in the fertile window and include it as a cycle-varying covariate. Obviously, this practice ignores the effect of zero-risk sets on hazard. So, the simulation study focuses on the performance of the proposed estimates and also studies the impact of ignoring zero-risk sets.

We generated the data as follows: for subject *i*, the frailty variable $exp(Ri)$ was generated from a gamma distribution with mean 1 and variance *η*, the day-level intercourse behavior were generated such that $Xijk=0/1$ with probability of one as $0.6$ for all $k=1,\u2026,6$, $j=1,\u2026,12$, and the covariates $Ui=(Ui1,Ui2)$ were generated such that $Ui1=0/1$ variable with probability of one as $0.5$ each and $Ui2\u223cN(0,1)$. Subjects who had not experienced an event at $j=12$ were censored. Note that in this setup, we have incorporated approximately 8% zero-risk sets which is close to the percentage encountered in the real data analysis in the next section.

Tables 1 and 2 present the performance of the likelihood-based estimators accounting for the zero-risk sets as introduced in (2.1) or ignoring its effect. We made the comparison at various sample sizes ranging from $n=100,200,500$ to even $n=1000$ motivated by some recently completed prospective pregnancy studies, for example, the Life Study, the Oxford Conception Study. The censoring percentage was also varied from 10% to 30%. The results presented are based on 1000 replicates. The column bias refers to the average of the difference between the estimated value of the parameter and the true value, the Avg(SE) refers to the average of the asymptotic standard deviation (calculated using the estimated observed fisher information), SE(est) refers to the standard deviation of the estimates and CP refers to the coverage probability for a 95% confidence interval.

Variable | Corrected for zero-risk periods | Not corrected for zero-risk periods | ||||||

Bias | Avg(SE) | SE(est) | CP | Bias | Avg(SE) | SE(est) | CP | |

$n=100$ | ||||||||

$\beta 1$ | −0.01 | 0.30 | 0.32 | 0.942 | 0.00 | 0.30 | 0.32 | 0.937 |

$\beta 2$ | 0.01 | 0.16 | 0.16 | 0.944 | 0.01 | 0.16 | 0.16 | 0.938 |

$\alpha 1$ | 0.00 | 0.25 | 0.25 | 0.950 | 0.17 | 0.25 | 0.25 | 0.884 |

$\alpha 2$ | 0.00 | 0.25 | 0.26 | 0.952 | 0.17 | 0.25 | 0.26 | 0.877 |

$\alpha 3$ | 0.00 | 0.25 | 0.26 | 0.940 | 0.17 | 0.25 | 0.26 | 0.892 |

$\alpha 4$ | 0.01 | 0.25 | 0.25 | 0.954 | 0.18 | 0.25 | 0.25 | 0.889 |

$\alpha 5$ | 0.01 | 0.25 | 0.26 | 0.955 | 0.19 | 0.25 | 0.26 | 0.867 |

$\alpha 6$ | −0.01 | 0.25 | 0.25 | 0.949 | 0.16 | 0.25 | 0.25 | 0.898 |

η | −0.02 | 0.21 | 0.23 | 0.910 | −0.03 | 0.21 | 0.23 | 0.909 |

$n=200$ | ||||||||

$\beta 1$ | −0.01 | 0.21 | 0.22 | 0.950 | 0.01 | 0.21 | 0.21 | 0.954 |

$\beta 2$ | 0.01 | 0.11 | 0.11 | 0.946 | 0.00 | 0.11 | 0.11 | 0.950 |

$\alpha 1$ | 0.00 | 0.18 | 0.18 | 0.945 | 0.17 | 0.17 | 0.18 | 0.820 |

$\alpha 2$ | −0.01 | 0.17 | 0.18 | 0.948 | 0.16 | 0.17 | 0.18 | 0.855 |

$\alpha 3$ | 0.00 | 0.17 | 0.18 | 0.956 | 0.17 | 0.17 | 0.18 | 0.817 |

$\alpha 4$ | 0.00 | 0.17 | 0.18 | 0.937 | 0.17 | 0.17 | 0.18 | 0.833 |

$\alpha 5$ | 0.01 | 0.17 | 0.18 | 0.949 | 0.19 | 0.17 | 0.18 | 0.800 |

$\alpha 6$ | 0.00 | 0.18 | 0.17 | 0.955 | 0.16 | 0.17 | 0.18 | 0.842 |

η | −0.01 | 0.15 | 0.15 | 0.931 | −0.02 | 0.15 | 0.15 | 0.935 |

$n=500$ | ||||||||

$\beta 1$ | −0.01 | 0.13 | 0.14 | 0.946 | 0.01 | 0.13 | 0.14 | 0.948 |

$\beta 2$ | 0.00 | 0.07 | 0.07 | 0.958 | 0.00 | 0.07 | 0.07 | 0.958 |

$\alpha 1$ | 0.00 | 0.11 | 0.11 | 0.961 | 0.16 | 0.11 | 0.11 | 0.699 |

$\alpha 2$ | 0.00 | 0.11 | 0.11 | 0.939 | 0.17 | 0.11 | 0.11 | 0.661 |

$\alpha 3$ | 0.00 | 0.11 | 0.11 | 0.944 | 0.16 | 0.11 | 0.11 | 0.679 |

$\alpha 4$ | 0.00 | 0.11 | 0.11 | 0.944 | 0.17 | 0.11 | 0.11 | 0.639 |

$\alpha 5$ | −0.01 | 0.11 | 0.11 | 0.948 | 0.17 | 0.11 | 0.11 | 0.666 |

$\alpha 6$ | −0.01 | 0.11 | 0.11 | 0.945 | 0.15 | 0.11 | 0.11 | 0.696 |

η | 0.00 | 0.09 | 0.10 | 0.948 | −0.01 | 0.09 | 0.10 | 0.940 |

$n=1000$ | ||||||||

$\beta 1$ | −0.01 | 0.09 | 0.09 | 0.949 | 0.01 | 0.09 | 0.09 | 0.948 |

$\beta 2$ | 0.00 | 0.05 | 0.05 | 0.967 | −0.01 | 0.05 | 0.05 | 0.959 |

$\alpha 1$ | 0.00 | 0.08 | 0.08 | 0.936 | 0.16 | 0.08 | 0.08 | 0.456 |

$\alpha 2$ | 0.00 | 0.08 | 0.07 | 0.955 | 0.16 | 0.08 | 0.07 | 0.411 |

$\alpha 3$ | 0.00 | 0.08 | 0.08 | 0.951 | 0.16 | 0.08 | 0.08 | 0.438 |

$\alpha 4$ | 0.00 | 0.08 | 0.08 | 0.950 | 0.16 | 0.08 | 0.08 | 0.420 |

$\alpha 5$ | 0.01 | 0.08 | 0.07 | 0.959 | 0.18 | 0.08 | 0.07 | 0.342 |

$\alpha 6$ | 0.00 | 0.08 | 0.08 | 0.953 | 0.16 | 0.08 | 0.08 | 0.451 |

η | 0.00 | 0.07 | 0.07 | 0.950 | 0.00 | 0.07 | 0.07 | 0.948 |

Variable | Corrected for zero-risk periods | Not corrected for zero-risk periods | ||||||

Bias | Avg(SE) | SE(est) | CP | Bias | Avg(SE) | SE(est) | CP | |

$n=100$ | ||||||||

$\beta 1$ | −0.01 | 0.30 | 0.32 | 0.942 | 0.00 | 0.30 | 0.32 | 0.937 |

$\beta 2$ | 0.01 | 0.16 | 0.16 | 0.944 | 0.01 | 0.16 | 0.16 | 0.938 |

$\alpha 1$ | 0.00 | 0.25 | 0.25 | 0.950 | 0.17 | 0.25 | 0.25 | 0.884 |

$\alpha 2$ | 0.00 | 0.25 | 0.26 | 0.952 | 0.17 | 0.25 | 0.26 | 0.877 |

$\alpha 3$ | 0.00 | 0.25 | 0.26 | 0.940 | 0.17 | 0.25 | 0.26 | 0.892 |

$\alpha 4$ | 0.01 | 0.25 | 0.25 | 0.954 | 0.18 | 0.25 | 0.25 | 0.889 |

$\alpha 5$ | 0.01 | 0.25 | 0.26 | 0.955 | 0.19 | 0.25 | 0.26 | 0.867 |

$\alpha 6$ | −0.01 | 0.25 | 0.25 | 0.949 | 0.16 | 0.25 | 0.25 | 0.898 |

η | −0.02 | 0.21 | 0.23 | 0.910 | −0.03 | 0.21 | 0.23 | 0.909 |

$n=200$ | ||||||||

$\beta 1$ | −0.01 | 0.21 | 0.22 | 0.950 | 0.01 | 0.21 | 0.21 | 0.954 |

$\beta 2$ | 0.01 | 0.11 | 0.11 | 0.946 | 0.00 | 0.11 | 0.11 | 0.950 |

$\alpha 1$ | 0.00 | 0.18 | 0.18 | 0.945 | 0.17 | 0.17 | 0.18 | 0.820 |

$\alpha 2$ | −0.01 | 0.17 | 0.18 | 0.948 | 0.16 | 0.17 | 0.18 | 0.855 |

$\alpha 3$ | 0.00 | 0.17 | 0.18 | 0.956 | 0.17 | 0.17 | 0.18 | 0.817 |

$\alpha 4$ | 0.00 | 0.17 | 0.18 | 0.937 | 0.17 | 0.17 | 0.18 | 0.833 |

$\alpha 5$ | 0.01 | 0.17 | 0.18 | 0.949 | 0.19 | 0.17 | 0.18 | 0.800 |

$\alpha 6$ | 0.00 | 0.18 | 0.17 | 0.955 | 0.16 | 0.17 | 0.18 | 0.842 |

η | −0.01 | 0.15 | 0.15 | 0.931 | −0.02 | 0.15 | 0.15 | 0.935 |

$n=500$ | ||||||||

$\beta 1$ | −0.01 | 0.13 | 0.14 | 0.946 | 0.01 | 0.13 | 0.14 | 0.948 |

$\beta 2$ | 0.00 | 0.07 | 0.07 | 0.958 | 0.00 | 0.07 | 0.07 | 0.958 |

$\alpha 1$ | 0.00 | 0.11 | 0.11 | 0.961 | 0.16 | 0.11 | 0.11 | 0.699 |

$\alpha 2$ | 0.00 | 0.11 | 0.11 | 0.939 | 0.17 | 0.11 | 0.11 | 0.661 |

$\alpha 3$ | 0.00 | 0.11 | 0.11 | 0.944 | 0.16 | 0.11 | 0.11 | 0.679 |

$\alpha 4$ | 0.00 | 0.11 | 0.11 | 0.944 | 0.17 | 0.11 | 0.11 | 0.639 |

$\alpha 5$ | −0.01 | 0.11 | 0.11 | 0.948 | 0.17 | 0.11 | 0.11 | 0.666 |

$\alpha 6$ | −0.01 | 0.11 | 0.11 | 0.945 | 0.15 | 0.11 | 0.11 | 0.696 |

η | 0.00 | 0.09 | 0.10 | 0.948 | −0.01 | 0.09 | 0.10 | 0.940 |

$n=1000$ | ||||||||

$\beta 1$ | −0.01 | 0.09 | 0.09 | 0.949 | 0.01 | 0.09 | 0.09 | 0.948 |

$\beta 2$ | 0.00 | 0.05 | 0.05 | 0.967 | −0.01 | 0.05 | 0.05 | 0.959 |

$\alpha 1$ | 0.00 | 0.08 | 0.08 | 0.936 | 0.16 | 0.08 | 0.08 | 0.456 |

$\alpha 2$ | 0.00 | 0.08 | 0.07 | 0.955 | 0.16 | 0.08 | 0.07 | 0.411 |

$\alpha 3$ | 0.00 | 0.08 | 0.08 | 0.951 | 0.16 | 0.08 | 0.08 | 0.438 |

$\alpha 4$ | 0.00 | 0.08 | 0.08 | 0.950 | 0.16 | 0.08 | 0.08 | 0.420 |

$\alpha 5$ | 0.01 | 0.08 | 0.07 | 0.959 | 0.18 | 0.08 | 0.07 | 0.342 |

$\alpha 6$ | 0.00 | 0.08 | 0.08 | 0.953 | 0.16 | 0.08 | 0.08 | 0.451 |

η | 0.00 | 0.07 | 0.07 | 0.950 | 0.00 | 0.07 | 0.07 | 0.948 |

Avg(SE), the average of the asymptotic standard deviation; SE(est), the standard deviation of the estimates and CP, the coverage probability for a 95% confidence interval.

Corrected for zero-risk periods | Not corrected for zero-risk periods | |||||||

Variable | Bias | Avg(SE) | SE(est) | CP | Bias | Avg(SE) | SE(est) | CP |

$n=100$ | ||||||||

$\beta 1$ | −0.01 | 0.32 | 0.33 | 0.947 | -0.01 | 0.32 | 0.33 | 0.949 |

$\beta 2$ | 0.01 | 0.16 | 0.17 | 0.944 | 0.01 | 0.16 | 0.17 | 0.946 |

$\alpha 1$ | −0.01 | 0.27 | 0.27 | 0.953 | 0.12 | 0.27 | 0.27 | 0.927 |

$\alpha 2$ | −0.01 | 0.27 | 0.28 | 0.954 | 0.12 | 0.26 | 0.28 | 0.913 |

$\alpha 3$ | 0.00 | 0.27 | 0.27 | 0.950 | 0.14 | 0.26 | 0.26 | 0.927 |

$\alpha 4$ | 0.01 | 0.26 | 0.27 | 0.947 | 0.15 | 0.26 | 0.27 | 0.905 |

$\alpha 5$ | 0.01 | 0.26 | 0.25 | 0.963 | 0.15 | 0.26 | 0.25 | 0.915 |

$\alpha 6$ | −0.01 | 0.27 | 0.27 | 0.957 | 0.12 | 0.27 | 0.27 | 0.918 |

η | −0.02 | 0.31 | 0.34 | 0.900 | −0.01 | 0.32 | 0.34 | 0.899 |

$n=200$ | ||||||||

$\beta 1$ | 0.00 | 0.22 | 0.23 | 0.947 | 0.00 | 0.22 | 0.23 | 0.944 |

$\beta 2$ | 0.01 | 0.11 | 0.12 | 0.933 | 0.01 | 0.11 | 0.12 | 0.937 |

$\alpha 1$ | 0.00 | 0.19 | 0.18 | 0.957 | 0.13 | 0.19 | 0.18 | 0.907 |

$\alpha 2$ | 0.00 | 0.19 | 0.18 | 0.961 | 0.13 | 0.18 | 0.18 | 0.886 |

$\alpha 3$ | 0.01 | 0.18 | 0.19 | 0.934 | 0.14 | 0.18 | 0.19 | 0.867 |

$\alpha 4$ | 0.00 | 0.18 | 0.19 | 0.945 | 0.14 | 0.18 | 0.19 | 0.876 |

$\alpha 5$ | 0.01 | 0.18 | 0.18 | 0.959 | 0.15 | 0.18 | 0.18 | 0.853 |

$\alpha 6$ | 0.00 | 0.19 | 0.18 | 0.956 | 0.13 | 0.19 | 0.18 | 0.889 |

η | −0.02 | 0.22 | 0.23 | 0.935 | −0.02 | 0.22 | 0.23 | 0.934 |

$n=500$ | ||||||||

$\beta 1$ | 0.00 | 0.14 | 0.15 | 0.935 | 0.00 | 0.14 | 0.15 | 0.935 |

$\beta 2$ | 0.00 | 0.07 | 0.07 | 0.959 | 0.00 | 0.07 | 0.07 | 0.955 |

$\alpha 1$ | 0.00 | 0.12 | 0.12 | 0.958 | 0.13 | 0.12 | 0.12 | 0.786 |

$\alpha 2$ | $\u22120.01$ | 0.12 | 0.11 | 0.965 | 0.12 | 0.12 | 0.11 | 0.808 |

$\alpha 3$ | $\u22120.01$ | 0.12 | 0.12 | 0.949 | 0.13 | 0.12 | 0.12 | 0.797 |

$\alpha 4$ | 0.01 | 0.12 | 0.12 | 0.948 | 0.15 | 0.11 | 0.12 | 0.735 |

$\alpha 5$ | 0.00 | 0.11 | 0.12 | 0.948 | 0.14 | 0.11 | 0.12 | 0.751 |

$\alpha 6$ | −0.01 | 0.12 | 0.12 | 0.948 | 0.12 | 0.12 | 0.12 | 0.795 |

η | 0.00 | 0.14 | 0.14 | 0.954 | 0.00 | 0.14 | 0.14 | 0.952 |

$n=1000$ | ||||||||

$\beta 1$ | 0.00 | 0.10 | 0.10 | 0.951 | 0.00 | 0.10 | 0.10 | 0.950 |

$\beta 2$ | 0.00 | 0.05 | 0.05 | 0.965 | 0.00 | 0.05 | 0.05 | 0.964 |

$\alpha 1$ | 0.00 | 0.08 | 0.08 | 0.956 | 0.13 | 0.08 | 0.08 | 0.659 |

$\alpha 2$ | 0.00 | 0.08 | 0.08 | 0.943 | 0.13 | 0.08 | 0.08 | 0.622 |

$\alpha 3$ | 0.00 | 0.08 | 0.08 | 0.944 | 0.13 | 0.08 | 0.08 | 0.608 |

$\alpha 4$ | 0.00 | 0.08 | 0.08 | 0.950 | 0.13 | 0.08 | 0.08 | 0.621 |

$\alpha 5$ | 0.00 | 0.08 | 0.08 | 0.949 | 0.15 | 0.08 | 0.08 | 0.566 |

$\alpha 6$ | 0.00 | 0.08 | 0.08 | 0.945 | 0.13 | 0.08 | 0.08 | 0.646 |

η | 0.00 | 0.10 | 0.10 | 0.943 | 0.00 | 0.10 | 0.10 | 0.948 |

Corrected for zero-risk periods | Not corrected for zero-risk periods | |||||||

Variable | Bias | Avg(SE) | SE(est) | CP | Bias | Avg(SE) | SE(est) | CP |

$n=100$ | ||||||||

$\beta 1$ | −0.01 | 0.32 | 0.33 | 0.947 | -0.01 | 0.32 | 0.33 | 0.949 |

$\beta 2$ | 0.01 | 0.16 | 0.17 | 0.944 | 0.01 | 0.16 | 0.17 | 0.946 |

$\alpha 1$ | −0.01 | 0.27 | 0.27 | 0.953 | 0.12 | 0.27 | 0.27 | 0.927 |

$\alpha 2$ | −0.01 | 0.27 | 0.28 | 0.954 | 0.12 | 0.26 | 0.28 | 0.913 |

$\alpha 3$ | 0.00 | 0.27 | 0.27 | 0.950 | 0.14 | 0.26 | 0.26 | 0.927 |

$\alpha 4$ | 0.01 | 0.26 | 0.27 | 0.947 | 0.15 | 0.26 | 0.27 | 0.905 |

$\alpha 5$ | 0.01 | 0.26 | 0.25 | 0.963 | 0.15 | 0.26 | 0.25 | 0.915 |

$\alpha 6$ | −0.01 | 0.27 | 0.27 | 0.957 | 0.12 | 0.27 | 0.27 | 0.918 |

η | −0.02 | 0.31 | 0.34 | 0.900 | −0.01 | 0.32 | 0.34 | 0.899 |

$n=200$ | ||||||||

$\beta 1$ | 0.00 | 0.22 | 0.23 | 0.947 | 0.00 | 0.22 | 0.23 | 0.944 |

$\beta 2$ | 0.01 | 0.11 | 0.12 | 0.933 | 0.01 | 0.11 | 0.12 | 0.937 |

$\alpha 1$ | 0.00 | 0.19 | 0.18 | 0.957 | 0.13 | 0.19 | 0.18 | 0.907 |

$\alpha 2$ | 0.00 | 0.19 | 0.18 | 0.961 | 0.13 | 0.18 | 0.18 | 0.886 |

$\alpha 3$ | 0.01 | 0.18 | 0.19 | 0.934 | 0.14 | 0.18 | 0.19 | 0.867 |

$\alpha 4$ | 0.00 | 0.18 | 0.19 | 0.945 | 0.14 | 0.18 | 0.19 | 0.876 |

$\alpha 5$ | 0.01 | 0.18 | 0.18 | 0.959 | 0.15 | 0.18 | 0.18 | 0.853 |

$\alpha 6$ | 0.00 | 0.19 | 0.18 | 0.956 | 0.13 | 0.19 | 0.18 | 0.889 |

η | −0.02 | 0.22 | 0.23 | 0.935 | −0.02 | 0.22 | 0.23 | 0.934 |

$n=500$ | ||||||||

$\beta 1$ | 0.00 | 0.14 | 0.15 | 0.935 | 0.00 | 0.14 | 0.15 | 0.935 |

$\beta 2$ | 0.00 | 0.07 | 0.07 | 0.959 | 0.00 | 0.07 | 0.07 | 0.955 |

$\alpha 1$ | 0.00 | 0.12 | 0.12 | 0.958 | 0.13 | 0.12 | 0.12 | 0.786 |

$\alpha 2$ | $\u22120.01$ | 0.12 | 0.11 | 0.965 | 0.12 | 0.12 | 0.11 | 0.808 |

$\alpha 3$ | $\u22120.01$ | 0.12 | 0.12 | 0.949 | 0.13 | 0.12 | 0.12 | 0.797 |

$\alpha 4$ | 0.01 | 0.12 | 0.12 | 0.948 | 0.15 | 0.11 | 0.12 | 0.735 |

$\alpha 5$ | 0.00 | 0.11 | 0.12 | 0.948 | 0.14 | 0.11 | 0.12 | 0.751 |

$\alpha 6$ | −0.01 | 0.12 | 0.12 | 0.948 | 0.12 | 0.12 | 0.12 | 0.795 |

η | 0.00 | 0.14 | 0.14 | 0.954 | 0.00 | 0.14 | 0.14 | 0.952 |

$n=1000$ | ||||||||

$\beta 1$ | 0.00 | 0.10 | 0.10 | 0.951 | 0.00 | 0.10 | 0.10 | 0.950 |

$\beta 2$ | 0.00 | 0.05 | 0.05 | 0.965 | 0.00 | 0.05 | 0.05 | 0.964 |

$\alpha 1$ | 0.00 | 0.08 | 0.08 | 0.956 | 0.13 | 0.08 | 0.08 | 0.659 |

$\alpha 2$ | 0.00 | 0.08 | 0.08 | 0.943 | 0.13 | 0.08 | 0.08 | 0.622 |

$\alpha 3$ | 0.00 | 0.08 | 0.08 | 0.944 | 0.13 | 0.08 | 0.08 | 0.608 |

$\alpha 4$ | 0.00 | 0.08 | 0.08 | 0.950 | 0.13 | 0.08 | 0.08 | 0.621 |

$\alpha 5$ | 0.00 | 0.08 | 0.08 | 0.949 | 0.15 | 0.08 | 0.08 | 0.566 |

$\alpha 6$ | 0.00 | 0.08 | 0.08 | 0.945 | 0.13 | 0.08 | 0.08 | 0.646 |

η | 0.00 | 0.10 | 0.10 | 0.943 | 0.00 | 0.10 | 0.10 | 0.948 |

Avg(SE), the average of the asymptotic standard deviation; SE(est), the standard deviation of the estimates and CP, the coverage probability for a 95% confidence interval.

Observe that the bias of the estimates of the regression coefficients for the zero-risk set corrected model are reasonably small even for small sample size, and their performance improves considerably as *n* increases. Furthermore, the estimated standard deviation gets closer to the sampling standard deviation with the sample size. The coverage probability for the estimates are close to the nominal 95% and become closer as the sample size increases. However, the considerable bias in estimates of the $\alpha k$'s when the zero-risk sets are ignored causes the coverage probabilities to be much lower than the nominal value. In addition, the estimates of *β* and *η* have higher bias and standard error when zero-risk sets are ignored for both settings and all *n*. Overall, the estimates for the corrected model perform very well even for small sample size. We further observe that the coverage probability for the 95% confidence interval for the variance of the random effect was lower than that for other parameters for $n=100.$ However, it becomes much closer to the nominal level of 95% as the sample size increases from $n=200$ to $n=500$ and $n=1000.$

We next present our analysis of New York State Angler Prospective Pregnancy Cohort Study.

## ANALYSIS OF NEW YORK ANGLER PROSPECTIVE PREGNANCY COHORT STUDY

We illustrate our proposed method by analyzing the New York State Angler Prospective Pregnancy Cohort Study (Buck Louis *and others*, 2009). This prospective cohort study recruited women aged 20 to 34 years from 16 counties surrounding Lakes Erie and Ontario who were discontinuing contraception for the purposes of becoming pregnant. Women were followed until a human chorionic gonadotropin detected pregnancy was observed or up to 12 menstrual cycles at risk for pregnancy. A nice feature of this study is the follow-up time of 12 “at-risk” menstrual cycles, which is much longer than the 6 cycle follow-up of most other studies, see Buck *and others* (2004). Note that clinically a couple is eligible for infertility treatment if they do not conceive by 12 menstrual cycles. Among the 113 women recruited, 14 were pregnant at baseline and, thereby, excluded. Eighty-three (84%) women completed daily diaries on menstruation, sexual intercourse, home pregnancy test results and covariates believed to impact female fecundity. Given the absence of a proxy biomarker for ovulation in the study, we utilized the Ogino-Knaus method for estimating ovulation by counting back 14 days from the end of the cycle (Knaus, 1929; Ogino, 1930). *A priori*, the fertile window was defined as comprising eight days before ovulation through three days after ovulation (Buck Louis *and others*, 2009). In our analysis, we will focus on the following covariates, namely, female age (years) upon enrollment, parity (yes/no), cigarette smoking (yes/no) during cycle. We fitted the model

Variable | Estimate | SE | 95% CI |

Parity | 1.235 | 0.424 | (0.404, 2.065) |

Age | 0.237 | 0.174 | (−0.105, 0.578) |

Smoke | −1.079 | 0.454 | (−1.969, −0.189) |

$\alpha \u22128$ | −0.611 | 0.381 | (−1.357, 0.136) |

$\alpha \u22127$ | 0.403 | 0.336 | (−0.256, 1.062) |

$\alpha \u22126$ | 0.348 | 0.337 | (−0.314, 1.009) |

$\alpha \u22125$ | 0.393 | 0.329 | (−0.252, 1.039) |

$\alpha \u22124$ | 1.088 | 0.331 | (0.439, 1.737) |

$\alpha \u22123$ | 0.511 | 0.300 | (−0.077, 1.099) |

$\alpha \u22122$ | −0.285 | 0.323 | (−0.918, 0.347) |

$\alpha \u22121$ | −0.375 | 0.333 | (−1.027, 0.277) |

$\alpha 0$ | −0.356 | 0.336 | (−1.014, 0.302) |

$\alpha +1$ | −0.497 | 0.329 | (−1.143, 0.148) |

$\alpha +2$ | 0.404 | 0.304 | (−0.193, 1.000) |

$\alpha +3$ | −0.477 | 0.335 | (−1.134, 0.181) |

η | 0.132 | 0.374 | (−0.601, 0.866) |

Variable | Estimate | SE | 95% CI |

Parity | 1.235 | 0.424 | (0.404, 2.065) |

Age | 0.237 | 0.174 | (−0.105, 0.578) |

Smoke | −1.079 | 0.454 | (−1.969, −0.189) |

$\alpha \u22128$ | −0.611 | 0.381 | (−1.357, 0.136) |

$\alpha \u22127$ | 0.403 | 0.336 | (−0.256, 1.062) |

$\alpha \u22126$ | 0.348 | 0.337 | (−0.314, 1.009) |

$\alpha \u22125$ | 0.393 | 0.329 | (−0.252, 1.039) |

$\alpha \u22124$ | 1.088 | 0.331 | (0.439, 1.737) |

$\alpha \u22123$ | 0.511 | 0.300 | (−0.077, 1.099) |

$\alpha \u22122$ | −0.285 | 0.323 | (−0.918, 0.347) |

$\alpha \u22121$ | −0.375 | 0.333 | (−1.027, 0.277) |

$\alpha 0$ | −0.356 | 0.336 | (−1.014, 0.302) |

$\alpha +1$ | −0.497 | 0.329 | (−1.143, 0.148) |

$\alpha +2$ | 0.404 | 0.304 | (−0.193, 1.000) |

$\alpha +3$ | −0.477 | 0.335 | (−1.134, 0.181) |

η | 0.132 | 0.374 | (−0.601, 0.866) |

In Figure 1, we present the conditional day-specific probabilities of conception in cycle *j* for a non-smoking woman aged 30 years (the mean), and with parity = 1 versus 0. Note that the Figure 1(a) indicates the probability of conception in the first cycle given that the couple had intercourse on a specific day ($k=\u22128,\cdots ,+3$) in the fertile window. Figure 1(b) indicates the probability of conception in the second cycle given that the couple had intercourse only on a specific day in the fertile window and had not conceived in the first cycle, (c) and (d) denote these probabilities given that the couple had not conceived by the second and third cycles, respectively. These plots do indicate that parity significantly increases the conception probabilities at the daily-level. Also, the graphs do indicate considerable measurement error associated with identifying the day of ovulation by the Ogino-Knauss method, which is consistent with findings of using other proxies of ovulation with similar measurement errors. This is acknowledged in the reproductive literature, see Lynch *and others* (2006) for a summary.

Figure 2 displays the effect of parity on the TTP survival distribution for a nonsmoking woman aged 30 years with parity = 1 versus 0 and having intercourse on days −4 through 2. Here, it is clear that the parity significantly reduces the TTP, where the estimated median TTP reduces from 11 to 3 cycles.

Finally, we also assessed the model fit based upon a comparison with (i) model with no information concerning intercourse (ii) model with just accounting for total number of intercourse in the fertile window and ignoring the issue of “zero-risk” set as is the common practice (iii) proposed model incorporating the day-specific intercourse behavior in the fertile window. The Akaike information criterion for models (i–iii) were 355.17, 348.42, and 344.78, respectively. Thus, indicating improved model fit when accounting for daily intercourse pattern.

## DISCUSSION

We have proposed a unified approach for modeling fecundity, combining the discrete survival model for TTP and modeling the day-specific probabilities of conception while relaxing the sperm independence assumption. This unified approach is biologically plausible and provides an accessible method to ascertain the effects of covariates directly on the hazard for conception in a cycle. The day-specific conditional probabilities of conception have a practical interpretation for couples who have not yet conceived with regard to their chances in the next cycle, given their current pattern of intercourse behavior. In fact, this approach may be desirable when a prospective pregnancy study does not have daily level hormonal measurements either due to the burden on the subjects or the cost involved in obtaining such information. In such situations, the “fertile window” may be identified with considerable measurement error. Consequently, studying the effects of covariates through the day-level probabilities of conception approach of the hierarchical models may not be the best strategy. We also have illustrated the performance of the proposed likelihood-based estimates in the Angler study. Our findings are indicative of measurement error in specifying the day of ovulation, given the absence of a proxy biomarker in this cohort study. This limitation underscores the importance of cautious interpretation of the estimated day-specific probabilities of conception for this prospective cohort study. To this end, our findings await corroboration from future cohort studies such as the recently completed National Institute of Child Health and Human Development LIFE Study that utilized reliable proxies of ovulation for better-fit models.

An advantage of our approach is that it allows modeling TTP using conventional methods such as the proportional odds or discrete Cox models while accounting for day-level intercourse and other covariates to ensure biologically plausible models for estimation. Our approach also provides a context for assessing the sterile fraction in the context of the so-called “cure fraction.” Finally, another aspect of considerable interest is to assess how the biologically meaningful relaxation of assumption (A1) translates quantitatively. Note, the Dunson and Stanford (2005) model works under the independence assumption (A1), while our model does not have this requirement. An interesting problem to consider in future is to develop a model to account for various dependence structures and use the model that gives the best fit.

## SUPPLEMENTARY MATERIAL

Supplementary material is available at http://biostatistics.oxfordjournals.org.

## FUNDING

Intramural Research Program of the National Institutes of Health; *Eunice Kennedy Shriver* National Institute of Child Health and Human Development.

The authors would like to acknowledge that this study utilized the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD (http://biowulf.nih.gov). *Conflict of Interest:* None declared.