Mediation Analysis with simcausal

Hi,
I'd like to use simcausal for mediation analysis but wonder how this can best be done. For natural direct and indirect effects I need to simulate the crossworld counterfactual Y(a, M(a*)), - A is the binary exposure variable and M the mediator. M(a*)  is the level of the mediator that it would have naturally been under the reference intervention A=a*.

I'm also particularly interested in simulating randomized interventional analogues of the natural effects in the presence of exposure-induced confounding of the relationship of M and the outcome Y. These effects require the counterfactual Y(a, G(m|a*, c)), with G(m|a*, c) denoting a random draw from the distribution of M given A=a* and non-exposure-induced confounders C=c.

What I did so far was rather intuitive. For the interventional effects, I need to draw from the distribution G(m|a*, c) - therefore I simulated data first and then specified a model (lm or glm,e.g.) for M|a, c  for each counterfactual dataset a1 and a0. I used the estimated parameters for specifying G(m|a*, c), G(m|a, c) and the respective outcome variables in new nodes and simulated the counterfactual dataset again to evaluate the interventional effects. Below is a simple example for illustration:

################################################################
#Simple Mediation process with an exposure-induced confounder L
#An "intuitive" approach
################################################################

# set seed for reproducability
set.seed(41188)

M <- DAG.empty()

M <- M +
node("a", # Binary exposure variable
distr = "rbern",
prob = 0.53) +
node("c", # Confounder variable C
distr="rnorm", mean=5, sd=2)+
node("l", #Binary confounder variable L -> exposure induced
distr="rbern" ,
prob=ifelse(c>=5, 0.6-0.3*a, 0.7-0.3*a))+
node("m", #Mediator variable M
distr = "rnorm",
mean=10-2*a+c+5*l,
sd=3)+
node("u.Y", #Latent variable
distr = "rnorm", mean = 0, sd = 100)+
node("y", #Outcome Variable Y
distr = "rnorm", mean = 2000-200*a+10*m-5*a*m + 50*l -20*a*l+ 15*c + u.Y, sd=200)
Mset <- set.DAG(M, latent.v = c("u.Y"))
str(Mset)
# specify the two interventions
a1 <- node("a", distr = "rbern", prob = 1)
Mset <- Mset + action("a1", nodes = a1)
a0 <- node("a", distr = "rbern", prob = 0)
Mset <- Mset + action("a0", nodes = a0)


# counterfactual data to estimate conditional distribution M|a,c and M|a*,c
dat <- simcausal::sim(DAG = Mset, actions = c("a1", "a0"), n = 10000000, rndseed = 345)

# look at counterfactual data
head(dat[["a1"]]); head(dat[["a0"]])

plotDAG(Mset, xjitter = 0.2, yjitter = 0.2,
edge_attrs = list(width = 0.5, arrow.width = 0.5, arrow.size = 0.3),
vertex_attrs = list(size = 20, label.cex = 0.8))

#Evaluate distribution M|a,c and M|a*,c 

summary(m0<-lm(m~c, data=dat[["a0"]]))
sd.m0<-summary(m0)$sigma; sd.m0
coef.m0<-coefficients(m0); coef.m0

summary(m1<-lm(m~c, data=dat[["a1"]]))
coef.m1<-coefficients(m1); coef.m1
sd.m1<-summary(m1)$sigma; sd.m1

M.r<- M +
node("m.0", distr = "rnorm", mean = 13.7465169 + 0.9006255*c, sd=3.828051)+ #insert values by hand 
node("m.1", distr = "rnorm", mean = 10.2433650 + 0.9009554 *c, sd=3.826656) +
#node("m.0", distr = "rnorm", mean = coef.m0[1]+coef.m0[2]*c, sd=sd.m0)+ this syntax doesn't work
#node("m.1", distr = "rnorm", mean =coef.m1[1] + coef.m1[2]*c, sd=sd.m1)+
node("y.m0", distr = "rnorm",
mean = 2000-200*a+10*m.0-5*a*m.0 + 50*l -20*a*l+ 15*c + u.Y, sd=200)+
node("y.m1", distr = "rnorm",
mean = 2000-200*a+10*m.1-5*a*m.1 + 50*l -20*a*l+ 15*c + u.Y, sd=200)
Mset.r <- set.DAG(M.r, latent.v = c("u.Y"))
str(Mset.r)

# specify the two interventions
a1 <- node("a", distr = "rbern", prob = 1)
Mset.r <- Mset.r + action("a1", nodes = a1)
a0 <- node("a", distr = "rbern", prob = 0)
Mset.r <- Mset.r + action("a0", nodes = a0)

##################
# Evaluate TRUTH #
##################
#Simulate counterfactual dataset
dat <- simcausal::sim(DAG = Mset.r, actions = c("a1", "a0"), n = 10000000, rndseed = 345)

#Evaluate true randomized interventional analogue for the direct effect

Mset.r <- set.targetE(Mset.r, outcome = "y.m0", param = "a1")
y_r.m0_a1<-eval.target(Mset.r, data = dat)$res ; y_r.m0_a1

Mset.r <- set.targetE(Mset.r, outcome = "y.m0", param = "a0")
y_r.m0_a0<-eval.target(Mset.r, data = dat)$res; y_r.m0_a0
RIA.NDE<-y_r.m0_a1-y_r.m0_a0; RIA.NDE 

#Evaluate true randomized interventional analogue for the indirect effect

Mset.r <- set.targetE(Mset.r, outcome = "y.m1", param = "a1")
y_r.m1_a1<-eval.target(Mset.r, data = dat)$res; y_r.m1_a1

RIA.NIE<-y_r.m1_a1-y_r.m0_a1; RIA.NIE 

#Evaluate randomized interventional analogue for total effect:
RIA.TE<-y_r.m1_a1-y_r.m0_a0
RIA.TE == RIA.NIE + RIA.NDE
###################################

I'm not quite sure, if this code is ok or if there are more efficient solutions?
Thanks a lot for suggestions!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Mediation Analysis with simcausal #13

set seed for reproducability

specify the two interventions

counterfactual data to estimate conditional distribution M|a,c and M|a*,c

look at counterfactual data

specify the two interventions

Evaluate TRUTH

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Mediation Analysis with simcausal #13

Description

set seed for reproducability

specify the two interventions

counterfactual data to estimate conditional distribution M|a,c and M|a*,c

look at counterfactual data

specify the two interventions

Evaluate TRUTH

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions