set more off
clear all
/* Section I:
OVERVIEW
The purpose of this example is to show you how to run a pooled logistic regression
(PLR) model with inverse probability (IP) and inverse probability of censoring
(IPC) weights to estimate the marginal hazard ratio for the effect of a
time-dependent treatment on a survival endpoint, in the presence of a
time-dependent confounder that is affected by treatment, and thus mediates at
least part of its effect. A Cox model adjusting for the confounder-mediator
as a time-dependent covariate (TDC) would at best estimate a conditional
relative hazard for the effect of treatment via other pathways ("at best"
because that model would have to meet conditional independence assumptions
for both treatment and the confounder-mediator, which may be unrealistic
in practice).
The gist is that we have to adjust for current values of the confounder-mediator
to meet the sequential randomization assumption (SRA), the extension of
conditional independence to the longitudinal setting. That is, if the
confounder-mediator is changing, we need to adjust for its history up to the
time treatment is initiated to control for confounding by indication. This
motivates including it as a TDC.
The trouble is that the TDC for the confounder-mediator would keep on getting
updated after treatment is initiated, and so subsequent values would reflect
the effects of treatment. Thus adjusting for it would adjust away whatever
part of the treatment effect is mediated via this pathway.
In contrast, the IPW-PLR model incorporates the time-dependent confounder-
mediator as weights. These make the treated and untreated sub-samples at
any time point representative of the population that would still be at risk
if we had both potential outcomes for each subject. Thus we can use an
unadjusted model for treatment, which avoids adjusting away the indirect
effect, but still gets rid of the confounding by indication.
You will see that a big part of this example is taken up with transforming the
dataset for analysis, involving a lot of relatively low-level data management
tasks. If you decide to fit IPW-PLR models, it is very unlikely that the data
you begin with will be in a form suitable for running the final analysis, and
you will need to make these moves (or be able to tell your programmer analyst
how to do them). Stata actually makes this about as easy as it can possibly
be, but there is still a lot of it. In the second section of the example, most of
these tasks will be taken care of already.
OBSERVED OUTCOMES DATASET
This do-file runs an analysis of simulated longitudinal data with a time-dependent
treatment, which begins at a specified first month on treatment, and then
continues (in short, once on treatment, always on treatment); a time-dependent
confounder-mediator; and a survival outcome, which occurs, or not, by
the end of follow-up for each subject. Think of the data as arising from
a study of four years duration. Subjects who remain event-free at four years
are censored. A subgroup of them are also lost to follow-up (LTFU) earlier.
POTENTIAL OUTCOMES DATASET
Because this is simulated, not real data, we were also able to simulate the true
potential outcomes, free of informative censoring by early losses to follow-up.
That is, we simulated an outcome assuming the subject was on treatment from the
start, and another as if he or she were off treatment the entire period. Those
are the two treatment histories we would like to compare.
We can analyze those potential outcomes data to get an estimate of the true
marginal hazard ratio, which is hard to calculate analytically from the parameter
values used in generating the data. Of course we can ONLY do this in simulated
data. We will do this just before getting estimates from the observed data.
Comparing the true value with the estimate allows us to evaluate the performance
of the estimate.
PREPARING THE OBSERVED OUTCOMES DATASET FOR PLR ANALYSIS
The dataset, msm.dta, is in wide format, with one record per subject, and includes
the following variables:
id: subject id
fmtx: first month on treatment
month: end of follow-up, in months
event: event indicator (0 = censored, 1 = event)
cm_0-cm_48: values of the confounder-mediator measured in each month, with cm_0
giving the baseline value. At the end of the example we will look at what happens
when we only get to see its values every 6 months, but treatment and events
depend on the monthly values.
*/
use msm, clear
describe
list id month event fmtx cm_0 cm_1 cm_2 cm_3 cm_4 cm_5 cm_48 in 1/5, noobs clean
/*
We must first transform msm.dta into the long format we need for the PLR models.
The first step in the procedure is to stset the data, as if we were going to
run a conventional survival analysis, then use the stsplit command to obtain
a single record for each month. The variable start specified in the stsplit
command gives the starting time of each interval; it is not used later, but
stsplit won't run unless we specify a name for it.
*/
stset month, failure(event) id(id) // need to specify id to run stsplit next
stsplit start, at(0/48) // start variable required, not used later
drop start _t0 _st month event // some made by stset, not needed for analyses
rename _t month // month of current record is what we need
rename _d event // event indicator specific to month, only 1 in last month
label variable month "Month of follow-up"
label variable event "Event indicator"
sort id month
save preliminary, replace
* check what this has done
keep if _n <= 100
by id: list month event cm_0 cm_1 cm_2 cm_48, noobs clean
use preliminary, clear
/*
Next we evaluate the time-dependent treatment indicator, by comparing the month of the
current record, month, to the first month on treatment (fmtx), which was on the
incoming dataset. The variable fmtx is missing if the subject never began treatment
during the 48 months of follow-up.
*/
capture drop tx
gen byte tx = (month >= fmtx)
/* The expression in parentheses evaluates to 1 if true, otherwise to zero. Because
missing values sort above all numbers in Stata, tx is 0 whenever fmtx is missing,
i.e., for subjects who never start treatment. */
label variable tx "Currently on treatment"
gen tx_lagged = tx[_n-1] if id==id[_n-1]
label variable tx_lagged "On treatment in previous month"
/*
Next we evaluate the confounder-mediator as a time-dependent covariate, using the
current value measured monthly; this will be used in estimating the censoring
weights. We also need the lagged value from the previous month, because that
is what we assume drives current treatment decisions, and so it will be used in
estimating the treatment weights. Note that the ">=" sign is used in the
definitions of cm_current and cm_lagged to ensure that non-missing values are
assigned at each time point: the last non-missing value is carried forward to
the current observation.
*/
cap drop cm_current cm_lagged
gen cm_current = cm_0
gen cm_lagged = cm_0
label variable cm_current "Confounder-mediator: current value"
label variable cm_lagged "Confounder-mediator: lagged value"
forvalues i = 1/48 {
* next line chooses confounder-mediator value for month `i', if non-missing
qui replace cm_current = cm_`i' if month>=`i' & cm_`i'~= .
local lagged = `i'-1
qui replace cm_lagged = cm_`lagged' if month>=`i' & cm_`lagged' ~= .
}
format cm_current cm_lagged %4.2f
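/*
A quick added spot-check that the carry-forward loop did what we wanted: list
the current and lagged values for one subject over the first few months. The
choice of id==1 is arbitrary -- any subject will do.
*/
list id month cm_current cm_lagged if id==1 & month<=6, noobs clean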
/*
Here we have to add an extra record for the loss-to-follow-up (LTFU) models to be
used in calculating inverse probability of censoring (IPC) weights. LTFU is
taken to happen the month after the last month in follow-up for subjects who remain
event-free and do not continue in the study through month 48. We first calculate
the largest follow-up time for each subject, then keep those last records if they
do not represent an event (event==0) and occur before month 48. For the additional
record, we advance the time by one month, and retain the treatment and
confounder-mediator values, which will be needed for evaluating the inverse
probability of censoring weights later.
*/
gen byte LTFU = 0 // set the LTFU indicator to zero on all existing records
label variable LTFU "Lost to follow-up indicator"
save preliminary, replace // save the main dataset
by id: egen maxt = max(month) // figure out the last month in follow-up
keep if month==maxt & month<48 & event==0 // pick out last record if LTFU
qui replace month = month+1 // reset LTFU event time to the following month
qui replace LTFU = 1 // reset LTFU indicator to 1
keep id month tx cm_0 cm_current LTFU // needed in the LTFU models
append using preliminary // join the new record to the analysis dataset
sort id month
save preliminary, replace
/*
Now print some records for LTFU subjects to see what this has done. You'll see
that we have an extra record on which event is missing, so those records will
not be used in outcome analyses; rather they will only be used in models for
the IPC weights.
*/
by id: egen everLTFU = max(LTFU)
by id: egen lastmonth = max(month)
keep if everLTFU==1 & lastmonth<=15
keep if _n<=100
sort id month
by id: list month LTFU event, noobs clean
use preliminary, clear
/*
Finally we make a restricted cubic spline in months since study entry. Then we
compress and save the almost analysis-ready dataset.
*/
mkspline month_sp = month, cubic
keep id month event fmtx tx tx_lagged month_sp* cm_current cm_0 cm_lagged LTFU
compress // stores variables more efficiently when possible
save preliminary, replace
/**TN: It's good to take a look at the dataset to keep track of what's going on:*/
keep if _n <= 100
sort id fmtx cm_0
by id fmtx cm_0: list month tx cm_current cm_lagged LTFU event, noobs clean
use preliminary, clear
/* Now we will verify that the confounder-mediator is affected by treatment --
in these data, by the value of treatment in the previous month. So we use the
lagged value of treatment, evaluated above, in a repeated-measures model to
test its association with the confounder-mediator.
*/
xtmixed cm_current month_sp* tx_lagged || id: month
/*
This analysis verifies that lagged treatment is a strong predictor of the
confounder-mediator, so we do need the IPW-PLR approach. Accordingly, the
last preparatory step is to make the combined stabilized weight for the
confounder-mediator and censoring.
First we calculate the denominator of the inverse probability of treatment/exposure
weight, which is allowed to depend on the time-dependent value of the confounder-
mediator. This is a PLR model for initiation of treatment, since we assume in
this example that once treatment is initiated, it continues without break;
records after the first month on treatment (month>fmtx) are not used; we also
exclude the extra records for subjects who are LTFU before going on treatment.
*/
logit tx month_sp* cm_lagged if month<=fmtx & LTFU~=1
/*
The next line calculates the conditional probability of being on treatment; this
is essentially a time-dependent propensity score.
*/
predict cp_tx_d
label variable cp_tx_d "Conditional prob starting tx"
/*
The conditional probability is 1 after initiation, under our assumption that
once on treatment, always on treatment.
*/
replace cp_tx_d = 1 if month>fmtx
/*
The weight should depend on the probability of the treatment actually received,
so we calculate the complement for each month in which the subject is not yet
on treatment.
*/
replace cp_tx_d = 1-cp_tx_d if tx==0
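/*
An added range check: each conditional probability of the observed treatment
(or non-treatment) should lie in (0, 1]. The if qualifier skips the extra
LTFU records, on which cm_lagged is missing and so cp_tx_d is missing.
*/
assert cp_tx_d>0 & cp_tx_d<=1 if cp_tx_d<.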
/*
Finally, we calculate the marginal probability of the treatment history. We don't get
this directly from the PLR model. Because it is restricted to subjects still off
treatment (except for the first month of treatment), it estimates the conditional
probability of starting treatment, given having remained off treatment so far.
To get the marginal probability of remaining off treatment in month 10, say,
we have to take the product of the probabilities of remaining off in month 1,
the conditional probability of remaining off in month 2, given that you remained
off through month 1, and so on, through month 10. If you started on treatment at
month 13, and are at risk in month 16, the marginal probability of your treatment
history then is the product of the probabilities of not starting in months 1-12, the
probability of starting in month 13, given off in 12, and 1 for months 14-16.
That's what the next lines of code do: after sorting by id and month, the
replace command builds a running product of the conditional probabilities
within each subject. The if qualifier at the end prevents the probabilities
from cumulating across subjects; id[_n-1] is the id number of the previous
record in the dataset (which is sorted by id and month).
*/
sort id month
gen mp_tx_d = cp_tx_d // marginal = conditional on each subject's first record
label variable mp_tx_d "Marginal prob of treatment history"
replace mp_tx_d = mp_tx_d * mp_tx_d[_n-1] if id==id[_n-1]
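/*
An added sanity check on the cumulation: since every factor in the running
product is a probability, the marginal probability can never exceed the
single-month conditional probability (small tolerance for rounding).
*/
assert mp_tx_d <= cp_tx_d + 1e-8 if mp_tx_d<. & cp_tx_d<.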
/*
The numerator of the weight is calculated using the same algorithm, but
the time-dependent values of the confounder-mediator are not used in the model.
*/
logit tx month_sp* if month<=fmtx & LTFU~=1
predict cp_tx_n
label variable cp_tx_n "Conditional prob of starting treatment"
replace cp_tx_n = 1 if month>fmtx
replace cp_tx_n = 1-cp_tx_n if tx==0
sort id month
gen mp_tx_n = cp_tx_n
label variable mp_tx_n "Marginal probability of treatment history"
replace mp_tx_n = mp_tx_n * mp_tx_n[_n-1] if id==id[_n-1]
/*
In a final step, we calculate the stabilized IP weight as the ratio of the
numerator to the denominator marginal probability of the observed treatment history.
*/
gen ip_wt = mp_tx_n/mp_tx_d
label variable ip_wt "Stabilized inverse probability of treatment weight"
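/*
An added check: stabilized IP weights should average close to 1 in large
samples; a mean far from 1 usually signals a misspecified weight model or a
coding error in the cumulation step.
*/
sum ip_wt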
/*
The censoring weights are now calculated using essentially the same
strategy, but now LTFU is the outcome, and we are allowed to use
current treatment in both weights; however, the time-dependent confounder-
mediator can only be used in the denominator of the weight. Since we are
trying to weight up the sample remaining in follow-up to the overall
surviving sample, we need to weight by the inverse probability of
remaining in follow-up. Hence what the course notes call inverse
probability of censoring (IPC) weights are called inverse probability of
remaining in follow-up (infu) weights here.
*/
logit LTFU month_sp* tx cm_current // denominator
predict cp_LTFU_d
label variable cp_LTFU_d "Conditional probability of LTFU"
gen cp_infu_d = 1-cp_LTFU_d
label variable cp_infu_d "Conditional probability in FU"
sort id month
gen mp_infu_d = cp_infu_d
label variable mp_infu_d "Marginal probability in FU"
replace mp_infu_d = mp_infu_d * mp_infu_d[_n-1] if id==id[_n-1]
logit LTFU month_sp* tx // numerator
predict cp_LTFU_n
label variable cp_LTFU_n "Conditional probability of LTFU"
gen cp_infu_n = 1-cp_LTFU_n
label variable cp_infu_n "Conditional probability in FU"
sort id month
gen mp_infu_n = cp_infu_n
label variable mp_infu_n "Marginal probability in FU"
replace mp_infu_n = mp_infu_n * mp_infu_n[_n-1] if id==id[_n-1]
gen ip_infu_wt = mp_infu_n/mp_infu_d
label variable ip_infu_wt "Stabilized IP in FU weight"
/*
In a final step, we calculate the combined stabilized weight as the product of
the treatment and censoring weights.
*/
gen sc_wt = ip_wt*ip_infu_wt
label variable sc_wt "Stabilized combined weight"
format cp* mp* ip* sc_wt %7.3f
save final, replace
/*
Check how the marginal probabilities change over time.
*/
keep if id==48
sort id month
by id: list month tx cp_tx_d mp_tx_d ip_wt, noobs clean
/*
Have a look at the resulting data for one subject.
*/
use final, clear
keep if id==48 // keep the record for one subject; feel free to look at someone else
sort id fmtx cm_0
by id fmtx cm_0: list month event tx cm_lagged ip_wt ip_infu_wt sc_wt, noobs clean
/*
Make sure you understand all the variables you see.
ESTIMATING THE TRUE TREATMENT EFFECT USING THE POTENTIAL OUTCOMES DATASET
We can use this dataset and fit a PLR model to estimate the parameters of the
true marginal structural model. The dataset has already been transformed into
the format needed.
*/
use msm_potential_outcomes, clear
describe
logistic event tx month_sp*
/*
In this analysis, the odds ratio of 0.25 is interpretable as a hazard ratio,
because the data for each monthly outcome are restricted to those still at risk
(so like the hazard ratio, the OR is conditional on having survived to the
current month), and the incidence is very low in any given month (so the OR
and HR don't differ much).
RUNNING THE ANALYSES USING THE OBSERVED DATA
First we will run an unadjusted PLR model, which would be expected to give an
estimate of the treatment effect that is badly confounded by indication.
*/
use final, clear
logistic event tx month_sp*
/*
Comparing the result to the true marginal RH of about 0.25, we see that this is the case.
Now we run a PLR model adjusting for the current value of confounder-mediator.
This is the naive model that inverse weighting is supposed to improve upon.
Because we are adjusting away the effects of treatment mediated by its effects
on the confounder-mediator, we would also expect the treatment to appear less
beneficial in this model. We need to turn on the robust SEs because we are
including a TDC in the model.
*/
logistic event tx month_sp* cm_current, vce(cluster id)
/*
So the bias here is very considerable. (In many applications it is not so bad; we
turned up the heat to make the point).
Now we are almost ready to run the IPW-PLR model.
But we should check the weights before proceeding -- really big ones are a
common problem and represent potential violations of the positivity assumption.
Big weights can be trimmed or those observations can be dropped, but no well-
established methods exist for doing this. Note that because of the stabilization,
a weight >=20 no longer corresponds to a probability of the observed treatment
history of less than 5%.
*/
sum sc_wt, detail
/*
Since there are essentially no big weights, we can run the model using the combined
stabilized weights as estimated. The weights are specified as pweights, as in
the IPTW propensity score models, and we use robust SEs, also required in this
application.
*/
logistic event tx month_sp* [pweight=sc_wt], vce(cluster id)
/*
Finally, the estimate is reasonably close to the relative hazard of 0.25 that we
obtained from the potential outcomes dataset. So at least with nice clean
simulated data like this, IPW-PLR can work well.
Section II
In this section of the example, we will consider four extensions of the IPW-PLR
model from Section I. That model used a failure time outcome, assumed that
once treatment began, it would continue without interruption until failure or
censoring, and that the treatment effect would be constant across time. In this
section, we will consider four extensions:
1. Treatment that can start and stop repeatedly. The models for the IP
weights have to be modified to accommodate this, by including previous
treatment as a predictor, and by including observations after treatment
is initiated. The line of code setting the conditional probability of
treatment to 1 after initiation is not needed, because continuation on
treatment is not a sure thing after initiation.
The causal effect estimated in this model is still the effect of being on
treatment. So this model posits an immediate effect of being on treatment,
which will not always hold. In the next part, we consider how to estimate
a treatment effect in cases where there may be no immediate effect, but
with time the effect increases.
2. Treatment effects that vary with time. We just need to include the
effect of time on treatment in the IPW-PLR model to accommodate this. In
this part of the example, we stick with once-on-treatment, always on treatment,
and failure time outcomes, but accommodating time-dependent treatment
effects could be done the same way in other contexts. Time-on-treatment
would admittedly be tricky to define in the on-off case: for example,
would it make more sense just to include the current episode of use in
calculating time on treatment, or should we include all observed episodes,
perhaps discounting more distant episodes? This is a substantive issue
depending on how the treatment works and the length of the gaps between
episodes of use. In either case, this type of variable treatment effect
is estimated by specifying a model that allows the effect to vary with
time on treatment. The protective effect of statins against CHD might
more or less follow this pattern; its effects through reducing LDL would
be expected to take time, though there might be a more immediate effect
through the inflammation pathway.
3. Repeated rather than failure time outcomes. Rather than IPW-PLR models
for the outcome we have to use IP-weighted repeated measures models. In
the example, treatment can stop and start, the repeated outcome is binary,
and is measured at 6, 12, 18, ..., 48 months. For example, a repeated
outcome might be hospitalizations for asthma, and treatments might be
use of controller medications.
4. A baseline covariate modifies the effect of treatment. The treatment
pattern is once-treated, always-treated, as in part 2 and Section 1, and
the outcome is a failure time, as in parts 1 and 2 and Section 1. For
example, the effect of bisphosphonates on hip fracture may differ between
men and women, or by baseline BMD.
In all four cases, informative censoring is allowed, so IPC weights are again
needed, but nothing about them changes. For each of the four extensions, a
different version of the simulated data is provided, for which the modified
analysis would be appropriate. This does simplify the problem, to provide a
gentler introduction to these models. With real data it's easy enough to know
or figure out in advance whether treatment follows the once-on, always-on
pattern, and whether your outcome is a repeated measure or a failure time.
But it would be up to you to determine whether the effect of treatment changes
with time or is modified by a baseline covariate.
We also need to make some minor modifications to the data management steps,
detailed below. I simplified the variable names, to make the code more compact
and easier to write without making typos, but they are systematic and easy to
learn; use describe to get the variable labels.
To clarify temporality, the order of events is assumed to be as follows: the
confounder-mediator is measured, and on that basis, treatment may be modified.
By the next month or visit, treatment has affected the confounder-mediator,
and risk of the outcome depends on the new value of both treatment and the
confounder-mediator. Accordingly, at any given time point:
- treatment depends on the "lagged" (i.e., most recent) value of the
confounder-mediator, as well as the lagged value of treatment in cases
where it can stop and start
- the confounder-mediator depends on the current value of treatment
- the outcome depends on the current values of treatment and the
confounder-mediator.
This new set-up requires a variable giving the value of the confounder-mediator
at the most recent time point, which we make using code similar to that used
for LVCF values.
To keep things within bounds, the analyses using LVCF values of the confounder-
mediator are omitted. Also, the transformations to long format for the potential
outcomes datasets have been done in the data generation step -- you can just
have a look at the data and run the model. Finally, we suggest doing at least
2 parts of this section, the ones that seem most likely to be useful for you,
only going on to the third and fourth extensions if they also seem worthwhile.
Feel free to take a look at the data in additional places; use methods from last
week to do that.
PART 1: ON-OFF TREATMENT
In this set-up, we assume that the probability of remaining on treatment depends
strongly on treatment status in the previous month, with subjects who are
already on treatment much more likely to remain on it, despite improvements
in the confounder-mediator. So we have to evaluate a variable giving treatment
status in the previous month.
PREPARING THE OBSERVED OUTCOMES DATASET FOR ANALYSIS
The dataset, msm_onoff.dta, is in wide format and includes the following variables:
id: subject id
month: end of follow-up, in months
event: event indicator (0 = censored, 1 = event)
cm_0-cm_48: values of the confounder-mediator measured in each month
tx_0-tx_48: indicators for current treatment in each month
tx_months: a variable giving the total months on treatment -- not needed in the
analysis, but good for picking out subjects with more time on treatment
*/
use msm_onoff, clear
describe
list id month event tx_months tx_0-tx_9 in 1/10, noobs clean
* The transformation into long format is unchanged.
stset month, failure(event) id(id)
stsplit start, at(0/48)
drop start _t0 _st month event
rename _t month
rename _d event
label variable month "Month of follow-up"
label variable event "Event indicator"
label values event noyes
sort id month
/**TN: I added next 2 commands to take a look at the data for some subjects
with events**/
list id if event==1 & id <30
list id month cm_0 cm_1 cm_2 tx_0 tx_1 tx_months event if id==7|id==14
/* Next we evaluate TDCs for current and lagged treatment, which will be used
in the models for current treatment and censoring. */
cap drop tx
cap drop tx_lagged
gen byte tx = 0 // no one on treatment at baseline -- new users only
gen byte tx_lagged = 0
forvalues i = 1/48 {
qui replace tx = tx_`i' if month>=`i' & tx_`i'~= . //use most recent if missing
local lagged = `i'-1 // defines "macro" variable analogous to `i'
qui replace tx_lagged = tx_`lagged' if month>=`i' & tx_`lagged' ~=.
}
label variable tx "Currently on treatment"
label variable tx_lagged "On treatment in previous month"
label values tx tx_lagged noyes
* current and lagged values of confounder-mediator; same code as above
cap drop cm
gen cm_current = cm_0
gen cm_lagged = cm_0
label variable cm_current "Confounder-mediator: current value"
label variable cm_lagged "Confounder-mediator: lagged value"
forvalues i = 1/48 {
qui replace cm_current = cm_`i' if month>=`i' & cm_`i'~= .
local lagged = `i'-1
qui replace cm_lagged = cm_`lagged' if month>=`i' & cm_`lagged'~= .
}
format cm_current cm_lagged %5.3f
/**TN: I added next command to make sure these commands did what we wanted**/
list id month cm_lagged cm_current tx_lagged tx tx_months event if id==7
* extra record for LTFU, unchanged except for the keep list
gen byte LTFU = 0
label variable LTFU "Lost to follow-up indicator"
save preliminary, replace
by id: egen maxt = max(month)
keep if month==maxt & month<48 & event==0
qui replace month = month+1
qui replace LTFU = 1
keep id month tx tx_lagged cm_0 cm_current cm_lagged tx_months LTFU
append using preliminary
sort id month
*spline in month number, unchanged, for fitting baseline event rate
mkspline month_sp = month, cubic
keep id month event tx tx_lagged tx_months month_sp* cm_current cm_lagged LTFU
save preliminary, replace
* Take a look at some records for observations with time on treatment
keep if tx_months>0
keep if _n <= 100
sort id tx_months
by id tx_months: list month event tx tx_lagged cm_current cm_lagged, noobs clean
use preliminary, clear
/* The last preparatory step is to make the combined stabilized weight for the
confounder-mediator and censoring. The differences are that we adjust for
treatment history (using insider information that the lagged value is all that
matters). Also, records after the first month of treatment are included in the
models for treatment, because treatment can switch off and then on again.
Finally, we do not impute cp_tx=1 for all months after the first month of
treatment, for the same reason; it is no longer a done deal at that point.
In accord with the clarified temporality, we use the lagged value of the
confounder-mediator in the models for treatment. Note that we now have to
exclude the extra LTFU records in the IP weight models. */
* IP weights
logit tx cm_lagged tx_lagged month_sp* if event~= . // exclude extra LTFU records
/**TN: Recall cp is for conditional probability and mp for marginal probability;
_d and _n refer to the denominator and numerator of the stabilized weights.**/
predict cp_tx_d
replace cp_tx_d = 1-cp_tx_d if tx==0
sort id month
gen mp_tx_d = cp_tx_d
replace mp_tx_d = mp_tx_d*mp_tx_d[_n-1] if id==id[_n-1]
logit tx tx_lagged month_sp* if event~=.
predict cp_tx_n
replace cp_tx_n = 1-cp_tx_n if tx==0
sort id month
gen mp_tx_n = cp_tx_n
replace mp_tx_n = mp_tx_n*mp_tx_n[_n-1] if id==id[_n-1]
gen ip_tx_wt = mp_tx_n/mp_tx_d
* IPC weights
logit LTFU cm_current tx month_sp*
predict cp_LTFU_d
gen cp_infu_d = 1-cp_LTFU_d
sort id month
gen mp_infu_d = cp_infu_d
replace mp_infu_d = mp_infu_d*mp_infu_d[_n-1] if id==id[_n-1]
logit LTFU tx month_sp*
predict cp_LTFU_n
gen cp_infu_n = 1-cp_LTFU_n
sort id month
gen mp_infu_n = cp_infu_n
replace mp_infu_n = mp_infu_n*mp_infu_n[_n-1] if id==id[_n-1]
gen ip_infu_wt = mp_infu_n/mp_infu_d
* stabilized combined weight
gen sc_wt = ip_tx_wt*ip_infu_wt
label variable sc_wt "Stabilized combined weight"
format cp* mp* ip* sc_wt %7.3f
* check positivity and trim weights
recode cp_tx_d 0/0.05=1 0.05/0.95=0 0.95/1=1, gen(ppv)
label variable ppv "Potential positivity violation"
label values ppv noyes
tab ppv
sum sc_wt if ppv==1, detail
/* In this case, there are a lot of PPV observations, but only a few have big
weights. The next step shows that there are additional big weights among the
rest. We will use sensitivity analyses to see if the big weights make a
difference. To conduct a proper analysis omitting the PPV observations, we
would need to recalculate the weights after omitting the PPV observations;
we will do that in Part 4 of the example. */
sum sc_wt, detail
recode sc_wt 0/20=0 20/max=1, gen(big_wt)
label variable big_wt "Stabilized combined weight > 20"
label values big_wt noyes
tab big_wt
recode sc_wt 20/max=20, gen(tr_sc_wt)
label variable tr_sc_wt "Trimmed stabilized combined weight"
save final, replace
* Have a look at the resulting data for the first few subjects
use final, clear
keep if id<=5
sort id tx_months
by id tx_months: list month event ip_tx_wt ip_infu_wt sc_wt, noobs clean
/*
RUNNING THE ANALYSIS USING THE POTENTIAL OUTCOMES DATASET
We again use the potential outcomes dataset, but this time it is already in long
format, so we can directly proceed to estimating the marginal RH.
*/
use msm_onoff_potential_outcomes, clear
sort id newid
list id newid tx month event in 1/100
logistic event tx month_sp*
/* This suggests that the effect of treatment is to reduce the event rate by >70%
(the OR is approximately interpretable as a hazard ratio in the PLR model). */
* RUNNING THE ANALYSES USING THE OBSERVED DATA
use final, clear
* unadjusted PLR model; robust SEs not needed because there are no TDCs
logistic event tx month_sp*
/* The estimate is somewhat attenuated by confounding by indication. Now adjusting
for the confounder mediator. */
logistic event tx cm_current month_sp*, vce(cluster id)
/* The estimate is worse when we naively adjust for the confounder-mediator. Finally,
the IPW-PLR model. */
logistic event tx month_sp* [pw=sc_wt], vce(cluster id)
* sensitivity analysis using trimmed weights
logistic event tx month_sp* [pw=tr_sc_wt], vce(cluster id)
* sensitivity analysis omitting big weights
logistic event tx month_sp* [pw=sc_wt] if big_wt==0, vce(cluster id)
/* So all IPW models give results that are consistent with the estimate based on
the potential outcomes dataset, and show little sensitivity to big weights. A
caution: in our experience with big datasets, weights can sometimes be much
larger, and quite influential. */
/* PART 2: MODELING TRENDS IN THE TREATMENT EFFECT
This extension requires that we estimate the effect of time on treatment, which we
have to calculate as a TDC.
PREPARING THE OBSERVED OUTCOMES DATASET FOR PLR ANALYSIS
The dataset, msm_txtrend.dta is in wide format and includes the following variables:
id: subject id
month: end of follow-up, in months
event: event indicator (0 = censored, 1 = event)
fmtx: first month on treatment
cm_0-cm_48: values of the confounder-mediator measured in each month
*/
use msm_txtrend, clear
describe
list id month event fmtx in 1/5, noobs clean
* The transformation into long format for PLR is unchanged from Section 1.
stset month, failure(event) id(id)
stsplit start, at(0/48)
drop start _t0 _st month event
rename _t month
rename _d event
label variable month "Month of follow-up"
label variable event "Event indicator"
label values event noyes
sort id month
save preliminary, replace
/* In this analysis we will assess the effect of treatment duration, which we
calculate in each month counting from the first month on treatment. This is
set to zero before treatment starts. */
cap drop tx
cap drop tx_months
gen tx = month>=fmtx
gen tx_months = max(0, month-(fmtx-1)) // max(0, x) = 0 if x<0 or x==.
gen tx_years = tx_months/12 // rescale to get more interpretable point estimate
label variable tx "Currently on treatment"
label variable tx_months "Current treatment duration (months)"
label values tx noyes
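/* Optional sanity checks on the derived treatment variables (not in the
original code): duration should be zero off treatment, and should count up
from 1 starting in the first treated month. Uncomment to verify. */
* assert tx_months==0 if tx==0
* assert tx_months==month-fmtx+1 if tx==1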
* current and lagged values of confounder-mediator (unchanged from Section 1)
cap drop cm
gen cm_current = cm_0
gen cm_lagged = cm_0
label variable cm_current "Confounder-mediator: current value"
label variable cm_lagged "Confounder-mediator: lagged value"
forvalues i = 1/48 {
qui replace cm_current = cm_`i' if month>=`i' & cm_`i'~= .
local lagged = `i'-1
qui replace cm_lagged = cm_`lagged' if month>=`i' & cm_`lagged'~= .
}
format cm_current cm_lagged %5.3f
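/* Optional check that the carry-forward loop behaved as intended (not in the
original code): within subject, this month's lagged value should equal the
previous month's current value. Uncomment to verify. */
* sort id month
* assert cm_lagged==cm_current[_n-1] if id==id[_n-1]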
* extra record for LTFU, unchanged from Section 1
gen byte LTFU = 0
label variable LTFU "Lost to follow-up indicator"
label values LTFU noyes
save preliminary, replace
by id: egen maxt = max(month)
keep if month==maxt & month<48 & event==0
qui replace month = month+1
qui replace LTFU = 1
keep id month tx tx_months cm_current cm_lagged LTFU
append using preliminary
sort id month
*spline in month number, also unchanged
mkspline month_sp = month, cubic
keep id month event fmtx tx tx_months tx_years month_sp* cm_0 ///
cm_current cm_lagged LTFU
save preliminary, replace
sort id fmtx
keep if _n <= 50
by id fmtx: list month cm_current tx tx_months event, noobs clean
use preliminary, clear
* IP weights
logit tx cm_lagged month_sp* if month<=fmtx // month<=. is true, so never-treated subjects (fmtx==.) are included
predict cp_tx_d
replace cp_tx_d = 1 if month>fmtx
replace cp_tx_d = 1-cp_tx_d if tx==0
sort id month
gen mp_tx_d = cp_tx_d
replace mp_tx_d = mp_tx_d*mp_tx_d[_n-1] if id==id[_n-1]
logit tx month_sp* if month<=fmtx
predict cp_tx_n
replace cp_tx_n = 1 if month>fmtx
replace cp_tx_n = 1-cp_tx_n if tx==0
sort id month
gen mp_tx_n = cp_tx_n
replace mp_tx_n = mp_tx_n*mp_tx_n[_n-1] if id==id[_n-1]
gen ip_tx_wt = mp_tx_n/mp_tx_d
* IPC weights
logit LTFU cm_current tx month_sp*
predict cp_LTFU_d
gen cp_infu_d = 1-cp_LTFU_d
sort id month
gen mp_infu_d = cp_infu_d
replace mp_infu_d = mp_infu_d*mp_infu_d[_n-1] if id==id[_n-1]
logit LTFU tx month_sp*
predict cp_LTFU_n
gen cp_infu_n = 1-cp_LTFU_n
sort id month
gen mp_infu_n = cp_infu_n
replace mp_infu_n = mp_infu_n*mp_infu_n[_n-1] if id==id[_n-1]
gen ip_infu_wt = mp_infu_n/mp_infu_d
gen sc_wt = ip_tx_wt*ip_infu_wt
label variable sc_wt "Stabilized combined weight"
format cp* mp* ip* sc_wt %7.3f
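/* A quick plausibility check on the stabilized weights (optional, not in the
original code): their means should be close to 1; a mean far from 1 suggests
a misspecified weight model. Uncomment to verify. */
* summarize ip_tx_wt ip_infu_wt sc_wt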
* Now we will check for potential positivity violations.
recode cp_tx_d 0/0.05=1 0.05/0.95=0 0.95/1=1, gen(ppv)
replace ppv = 0 if month>fmtx // cp_tx_d==1 after treatment begun
label variable ppv "Potential positivity violation"
label values ppv noyes
tab tx ppv, row
sum sc_wt if ppv==1, detail
/* So we have a lot of potential positivity violations, but no large weights
among those observations, or overall. The PPVs are almost all among subjects
who are off treatment, meaning that there was little indication for them to
start. So the probability of their observed treatment status was nearly 1,
and they got small weights. Restriction of the dataset to exclude these
subjects might be appropriate; they are like the infants in the phototherapy
study who are almost never treated. Again, the weights would need to be
re-estimated after restricting the dataset to exclude them, which we do in
the Part 4 example. The next check shows that there are also no big weights
overall. */
sum sc_wt, detail
save final, replace
* look at some data
keep if id<=4
sort id fmtx
by id fmtx: list month event tx tx_months sc_wt, noobs clean
/* We again use the potential outcomes dataset, already in long format, to get an
estimate of the true marginal treatment effects. */
use msm_txtrend_potential_outcomes, clear
logistic event tx tx_years month_sp*
* RH at annual intervals
forvalues i = 0/4 {
dis _newline as result "Estimated RH after `i' years of treatment:"
lincom tx + `i'*tx_years
}
/* In the potential outcomes data, we see almost no immediate effect of
treatment, but increasing effects over time, reaching a reduction of about
86% after 4 years of treatment. */
/* NOW USING THE OBSERVED DATA */
* Unadjusted PLR model
use final, clear
logistic event tx tx_years month_sp*
/* The trend estimate is OK, but the main effect is badly confounded by indication.
Now the naive PLR model adjusting for the current value of the confounder-mediator. */
logistic event tx tx_years month_sp* cm_current, vce(cluster id)
/* Now both estimates are off. That is because a large part of the treatment
effect is mediated by its gradually increasing effect on the confounder-mediator.
Finally we will run the IPW-PLR model.*/
logistic event tx tx_years month_sp* [pw=sc_wt], vce(cluster id)
* RH at annual intervals
forvalues i = 0/4 {
dis _newline as result "Estimated RH after `i' years of treatment:"
lincom tx + `i'*tx_years
}
/* These estimates are OK, at least qualitatively.*/
/* PART 3: REPEATED BINARY OUTCOME
IP weighting applies to repeated measures as well as survival outcomes. In
the PLR model, we have a repeated binary outcome every month, until the outcome
occurs or censoring. In this context, we have a repeated outcome at every
visit, until censoring, but the outcome can be continuous or a count as well
as binary; and even if it is binary, it can happen repeatedly, so subjects
are seen again after a first positive outcome.
In this example, treatment is allowed to stop as well as start, so the IP
weights are calculated as in PART 1. The other required change is substituting
a repeated measures model for the outcome in place of the PLR model used with
failure times. In Stata, we can't use the xt commands, which do not allow
time-varying weights -- the same problem that forces us to use PLR in place of
the Cox model with failure times. Instead we use standard regression commands
with the option vce(cluster id). This is tantamount to fitting GEE models with
so-called independence working correlation and robust SEs. Use of IPC weights
makes this procedure consistent if the data are missing at random, given
covariates and the outcomes we observe (MAR), provided the IPC weight models
are right and there are no unmeasured confounders of dropout. Without the IPC
weights, GEE is only consistent if the data are missing completely at random
(MCAR), given covariates, with no dependence of dropout on observed outcomes.
PREPARING THE OBSERVED OUTCOMES DATASET FOR ANALYSIS
The dataset, msm_repeated.dta is in wide format, with one record per subject, and
includes the following variables:
id: subject id
cm_bl: confounder-mediator at baseline
cm_6, cm_12, ..., cm_48: the confounder-mediator at each 6 month visit
tx_6, tx_12, ..., tx_48: treatment indicator at follow-up visits
event_6, event_12, ..., event_48: outcome event indicator at each visit.
The only thing we keep from baseline is the value of the confounder mediator.
We assume that some subjects may have been started on treatment after their
baseline visit, on the basis of the confounder-mediator value observed there.
So by the 6-month visit, the confounder-mediator and the outcome depend on
treatment, but treatment status depends on the baseline confounder-mediator
value, not the current value. */
use msm_repeated, clear
describe
list id tx_* in 1/10, noobs clean
list id event_* in 1/10, noobs clean
/* The transformation is simpler here, aided by the variable names, which were
chosen to be compatible with the reshape command. In particular the baseline
value of the confounder-mediator is called cm_bl, to avoid making records for
everyone at baseline; these are not needed, because there are no outcomes at
baseline. */
reshape long cm_ tx_ event_, i(id) j(month)
drop if event_==.
* for convenience get rid of the trailing underscores in the variable names
foreach x in cm tx event {
rename `x'_ `x'
}
label variable tx "Currently on treatment"
label variable month "Month of follow-up"
label variable event "Event indicator"
label values tx event noyes
sort id month
tab month tx
tab month event
save preliminary, replace
* check what this has done
keep if _n <= 100
by id: list month event, noobs clean
use preliminary, clear
* get lagged values of treatment and confounder-mediator
sort id month
gen tx_lagged = 0 if month==6
replace tx_lagged = tx[_n-1] if month>6 & id==id[_n-1]
label values tx_lagged noyes
gen cm_lagged = cm_bl if month==6
replace cm_lagged = cm[_n-1] if month>6 & id==id[_n-1]
rename cm cm_current
format cm_current cm_lagged %4.2f
* extra LTFU record
gen byte LTFU = 0 // set the LTFU indicator to zero on all existing records
label variable LTFU "Lost to follow-up indicator"
label values LTFU noyes
save preliminary, replace // save the main dataset
by id: egen maxt = max(month) // identify last month in follow-up for all subjects
keep if month==maxt & month<48 // pick out last record for LTFU subjects
qui replace month = month+6 // reset LTFU event time to the following visit (6 months later)
qui replace LTFU = 1 // reset LTFU indicator to 1
keep id month tx tx_lagged cm_current cm_lagged LTFU // needed in the LTFU models
append using preliminary // join the new record to the analysis dataset
sort id month
save preliminary, replace
keep id month event tx tx_lagged cm_current cm_lagged LTFU
save preliminary, replace
* Check this
keep if _n <= 100
sort id
by id: list month tx tx_lagged cm_current cm_lagged LTFU event, noobs clean
use preliminary, clear
* IP weights
logit tx cm_lagged tx_lagged i.month if event~= . // omit extra LTFU records
predict cp_tx_d
replace cp_tx_d = 1-cp_tx_d if tx==0
sort id month
gen mp_tx_d = cp_tx_d
replace mp_tx_d = mp_tx_d * mp_tx_d[_n-1] if id==id[_n-1]
logit tx tx_lagged i.month if event~=.
predict cp_tx_n
replace cp_tx_n = 1-cp_tx_n if tx==0
sort id month
gen mp_tx_n = cp_tx_n
replace mp_tx_n = mp_tx_n * mp_tx_n[_n-1] if id==id[_n-1]
gen ip_tx_wt = mp_tx_n/mp_tx_d
* IPC weights
logit LTFU tx cm_current i.month
predict cp_LTFU_d
gen cp_infu_d = 1-cp_LTFU_d
sort id month
gen mp_infu_d = cp_infu_d
replace mp_infu_d = mp_infu_d * mp_infu_d[_n-1] if id==id[_n-1]
logit LTFU tx i.month
predict cp_LTFU_n
gen cp_infu_n = 1-cp_LTFU_n
sort id month
gen mp_infu_n = cp_infu_n
replace mp_infu_n = mp_infu_n * mp_infu_n[_n-1] if id==id[_n-1]
gen ip_infu_wt = mp_infu_n/mp_infu_d
* stabilized combined weight
gen sc_wt = ip_tx_wt*ip_infu_wt
label variable sc_wt "Stabilized combined weight"
format ip* sc*_* %7.3f
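/* Optional plausibility check (not in the original code): the stabilized
weights should average close to 1, and because they are cumulative products,
any drift should show up at the later visits. Uncomment to inspect. */
* tabstat sc_wt, by(month) statistics(n mean max)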
* check positivity and trim weights
recode cp_tx_d 0/0.05=1 0.05/0.95=0 0.95/1=1, gen(ppv)
label variable ppv "Potential positivity violation"
label values ppv noyes
tab ppv
* combined stabilized weights in potential positivity violations
sum sc_wt if ppv==1, detail
* overall combined stabilized weight distribution
sum sc_wt, detail
recode sc_wt 0/20=0 20/max=1, gen(big_wt)
label variable big_wt "Stabilized combined weight > 20"
label values big_wt noyes
tab big_wt
recode sc_wt 20/max=20, gen(tr_sc_wt)
label variable tr_sc_wt "Trimmed stabilized combined weight"
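/* Optional (not in the original code): count how many records the trimming
at 20 actually changes, to gauge how informative the trimmed-weight
sensitivity analysis can be; the count should match big_wt==1. Uncomment to
check. */
* count if sc_wt>20 & sc_wt<.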
save final, replace
* Have a look at the resulting data
use final, clear
keep if id<=10
sort id
by id: ///
list month event tx tx_lagged cm_current cm_lagged ip_tx_wt ip_infu_wt sc_wt, ///
noobs clean
* RUNNING THE ANALYSES USING THE POTENTIAL OUTCOMES DATASET
use msm_repeated_potential_outcomes, clear
describe
*list in 1/5, noobs clean
logistic event tx i.month, vce(cluster id)
* RUNNING THE ANALYSES USING THE OBSERVED DATA
use final, clear
* unadjusted repeated measures model
logistic event tx i.month, vce(cluster id)
* Naive repeated measures model adjusting for confounder-mediator
logistic event tx cm_current i.month, vce(cluster id)
/* Using a non-independence working correlation or a random effects model does
not help. Uncomment the next two lines if you want to verify this. */
* xtlogit event tx cm_current i.month, pa i(id) vce(robust)
* xtlogit event tx cm_current i.month, i(id)
* IPW repeated measures model using stabilized combined weights
logistic event tx i.month [pw=sc_wt], vce(cluster id)
* sensitivity analysis omitting potential positivity violations
logistic event tx i.month [pw=sc_wt] if ppv==0, vce(cluster id)
* sensitivity analysis using trimmed weights
logistic event tx i.month [pw=tr_sc_wt], vce(cluster id)
* sensitivity analysis omitting big weights
logistic event tx i.month [pw=sc_wt] if big_wt==0, vce(cluster id)
* What do you find?
/* PART 4: MODELING INTERACTIONS BETWEEN BASELINE COVARIATE AND TREATMENT
This extension requires that we calculate the interaction, which is a TDC
because treatment is a TDC. In this part of the example, we go back to a once-on,
always-on treatment, but the approach would be essentially the same with an
on/off treatment.
PREPARING THE OBSERVED OUTCOMES DATASET FOR PLR ANALYSIS
The dataset, msm_blia.dta is in wide format and includes the following variables:
id: subject id
bl_em: fixed binary baseline effect modifier
month: end of follow-up, in months
event: event indicator (0 = censored, 1 = event)
fmtx: first month on treatment
cm_0-cm_48: values of the confounder-mediator measured in each month
*/
use msm_blia, clear
describe
label values event noyes
list id bl_em month event fmtx in 1/5, noobs clean
* The transformation into long format for PLR is unchanged from Section 1.
stset month, failure(event) id(id)
stsplit start, at(0/48)
drop start _t0 _st month event
rename _t month
rename _d event
label variable month "Month of follow-up"
label variable event "Event indicator"
label values event noyes
sort id month
save preliminary, replace
/* In this analysis we will assess the effect of the interaction between treatment
and the baseline effect modifier. We could calculate a product term in this simple
case, or use Stata factor coding to do it for us; the latter would be preferred in
cases where the baseline variable has more levels, or the treatment variables are
complicated. */
cap drop tx
cap drop tx_months
gen tx = month>=fmtx
gen tx_blia = tx * bl_em
label variable tx "Currently on treatment"
label variable tx_blia "Interaction of tx with baseline effect modifier"
label values bl_em tx tx_blia noyes
* current and lagged values of confounder-mediator (unchanged from Section 1)
cap drop cm
gen cm_current = cm_0
gen cm_lagged = cm_0
label variable cm_current "Confounder-mediator: current value"
label variable cm_lagged "Confounder-mediator: lagged value"
forvalues i = 1/48 {
qui replace cm_current = cm_`i' if month>=`i' & cm_`i'~= .
local lagged = `i'-1
qui replace cm_lagged = cm_`lagged' if month>=`i' & cm_`lagged'~= .
}
format cm_current cm_lagged %5.3f
* extra record for LTFU, unchanged from Section 1
gen byte LTFU = 0
label variable LTFU "Lost to follow-up indicator"
label values LTFU noyes
save preliminary, replace
by id: egen maxt = max(month)
keep if month==maxt & month<48 & event==0
qui replace month = month+1
qui replace LTFU = 1
keep id month tx tx_blia bl_em cm_current cm_lagged LTFU
append using preliminary
sort id month
*spline in month number, also unchanged
mkspline month_sp = month, cubic
keep id month event fmtx tx tx_blia bl_em month_sp* cm_0 cm_current cm_lagged LTFU
save preliminary, replace
keep if _n <= 50
sort id fmtx bl_em
by id fmtx bl_em: list month cm_current tx tx_blia event, noobs clean
use preliminary, clear
* IP weights
logit tx cm_lagged bl_em month_sp* if month<=fmtx
predict cp_tx_d
replace cp_tx_d = 1 if month>fmtx
replace cp_tx_d = 1-cp_tx_d if tx==0
sort id month
gen mp_tx_d = cp_tx_d
replace mp_tx_d = mp_tx_d*mp_tx_d[_n-1] if id==id[_n-1]
logit tx bl_em month_sp* if month<=fmtx
predict cp_tx_n
replace cp_tx_n = 1 if month>fmtx
replace cp_tx_n = 1-cp_tx_n if tx==0
sort id month
gen mp_tx_n = cp_tx_n
replace mp_tx_n = mp_tx_n*mp_tx_n[_n-1] if id==id[_n-1]
gen ip_tx_wt = mp_tx_n/mp_tx_d
* IPC weights
logit LTFU cm_current tx bl_em month_sp*
predict cp_LTFU_d
gen cp_infu_d = 1-cp_LTFU_d
sort id month
gen mp_infu_d = cp_infu_d
replace mp_infu_d = mp_infu_d*mp_infu_d[_n-1] if id==id[_n-1]
logit LTFU tx bl_em month_sp*
predict cp_LTFU_n
gen cp_infu_n = 1-cp_LTFU_n
sort id month
gen mp_infu_n = cp_infu_n
replace mp_infu_n = mp_infu_n*mp_infu_n[_n-1] if id==id[_n-1]
gen ip_infu_wt = mp_infu_n/mp_infu_d
gen sc_wt = ip_tx_wt*ip_infu_wt
label variable sc_wt "Stabilized combined weight"
format cp* mp* ip* sc_wt %7.3f
* Now we will check for potential positivity violations.
recode cp_tx_d 0/0.05=1 0.05/0.95=0 0.95/1=1, gen(ppv)
replace ppv = 0 if month>fmtx // cp_tx_d==1 after treatment begun
label variable ppv "Potential positivity violation"
label values ppv noyes
tab tx ppv, row
sum sc_wt if ppv==1, detail
/* Again, we have a lot of potential positivity violations, but no large
weights among those observations, or overall. The PPVs are almost all among
subjects who are off treatment, meaning that there was little indication for
them to start. So the probability of their observed treatment status was
nearly 1, and they got small weights. At the end of this section, we omit
these observations, re-evaluate the weights, and re-run the analysis. */
sum sc_wt, detail
save final, replace
/* We again use the potential outcomes dataset to get an estimate of the true
marginal treatment effects. This week it is already in long format, so we can
proceed directly to estimating the marginal RH. */
use msm_blia_potential_outcomes, clear
logistic event tx tx_blia bl_em month_sp*
* treatment effect in subjects with effect modifier
lincom tx + tx_blia
* rerun using factor coding
logistic event i.tx##i.bl_em month_sp*
* treatment effect in subjects with effect modifier
lincom 1.tx + 1.tx#1.bl_em
/* The effect modifier is associated with increased baseline risk, but
effectively doubles the protective effect of treatment. */
/* NOW USING THE OBSERVED DATA */
* Unadjusted PLR model
use final, clear
logistic event tx tx_blia bl_em month_sp*
* treatment effect in subjects with effect modifier
lincom tx + tx_blia
/* The effect modifier is well estimated, as is its interaction with
treatment, but the main effect for treatment is badly confounded; this also
biases the estimated effect in the subjects with the effect modifier. */
logistic event tx tx_blia bl_em month_sp* cm_current, vce(cluster id)
* treatment effect in subjects with effect modifier
lincom tx + tx_blia
/* Adjusting for the time-dependent confounder-mediator biases all the
estimates except the main effect of the effect modifier. Now the IPW model. */
logistic event tx tx_blia bl_em month_sp* [pw=sc_wt], vce(cluster id)
* treatment effect in subjects with effect modifier
lincom tx + tx_blia
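/* For reference (not in the original code): the same weighted model can be
fit with factor coding, as we did for the potential outcomes data above; it
should reproduce the estimates from the product-term version. */
* logistic event i.tx##i.bl_em month_sp* [pw=sc_wt], vce(cluster id)
* lincom 1.tx + 1.tx#1.bl_em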
/* Both estimates are pretty good, but we have lost considerable precision in the
interaction estimate, which we might be tempted to omit from the model.
Finally, a sensitivity analysis omitting PPV observations. We first
recalculate the weights. */
drop if ppv == 1
drop cp_* mp_* ip_* sc_*
logit tx cm_lagged bl_em month_sp* if month<=fmtx
predict cp_tx_d
replace cp_tx_d = 1 if month>fmtx
replace cp_tx_d = 1-cp_tx_d if tx==0
sort id month
gen mp_tx_d = cp_tx_d
replace mp_tx_d = mp_tx_d*mp_tx_d[_n-1] if id==id[_n-1]
logit tx bl_em month_sp* if month<=fmtx
predict cp_tx_n
replace cp_tx_n = 1 if month>fmtx
replace cp_tx_n = 1-cp_tx_n if tx==0
sort id month
gen mp_tx_n = cp_tx_n
replace mp_tx_n = mp_tx_n*mp_tx_n[_n-1] if id==id[_n-1]
gen ip_tx_wt = mp_tx_n/mp_tx_d
* IPC weights
logit LTFU cm_current tx bl_em month_sp*
predict cp_LTFU_d
gen cp_infu_d = 1-cp_LTFU_d
sort id month
gen mp_infu_d = cp_infu_d
replace mp_infu_d = mp_infu_d*mp_infu_d[_n-1] if id==id[_n-1]
logit LTFU tx bl_em month_sp*
predict cp_LTFU_n
gen cp_infu_n = 1-cp_LTFU_n
sort id month
gen mp_infu_n = cp_infu_n
replace mp_infu_n = mp_infu_n*mp_infu_n[_n-1] if id==id[_n-1]
gen ip_infu_wt = mp_infu_n/mp_infu_d
gen sc_wt = ip_tx_wt*ip_infu_wt
label variable sc_wt "Stabilized combined weight"
format cp* mp* ip* sc_wt %7.3f
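/* Optional (not in the original code): re-check the distribution of the
re-estimated weights before refitting the model. */
* sum sc_wt, detail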
logistic event tx tx_blia bl_em month_sp* [pw=sc_wt], vce(cluster id)
* treatment effect in subjects with effect modifier
lincom tx + tx_blia
/* These are reasonably good estimates */
* clean up
erase preliminary.dta
erase final.dta