Skip to content

Mediation Analysis with simcausal #13

@heledi

Description

@heledi

Hi,
I'd like to use simcausal for mediation analysis but wonder how this can best be done. For natural direct and indirect effects I need to simulate the crossworld counterfactual Y(a, M(a*)), - A is the binary exposure variable and M the mediator. M(a*) is the level of the mediator that it would have naturally been under the reference intervention A=a*.

I'm also particularly interested in simulating randomized interventional analogues of the natural effects in the presence of exposure-induced confounding of the relationship of M and the outcome Y. These effects require the counterfactual Y(a, G(m|a*, c)), with G(m|a*, c) denoting a random draw from the distribution of M given A=a* and non-exposure-induced confounders C=c.

What I did so far was rather intuitive. For the interventional effects, I need to draw from the distribution G(m|a*, c) - therefore I simulated data first and then specified a model (lm or glm,e.g.) for M|a, c for each counterfactual dataset a1 and a0. I used the estimated parameters for specifying G(m|a*, c), G(m|a, c) and the respective outcome variables in new nodes and simulated the counterfactual dataset again to evaluate the interventional effects. Below is a simple example for illustration:

################################################################
#Simple Mediation process with an exposure-induced confounder L
#An "intuitive" approach
################################################################

set seed for reproducability

set.seed(41188)

M <- DAG.empty()

M <- M +
node("a", # Binary exposure variable
distr = "rbern",
prob = 0.53) +
node("c", # Confounder variable C
distr="rnorm", mean=5, sd=2)+
node("l", #Binary confounder variable L -> exposure induced
distr="rbern" ,
prob=ifelse(c>=5, 0.6-0.3a, 0.7-0.3a))+
node("m", #Mediator variable M
distr = "rnorm",
mean=10-2a+c+5l,
sd=3)+
node("u.Y", #Latent variable
distr = "rnorm", mean = 0, sd = 100)+
node("y", #Outcome Variable Y
distr = "rnorm", mean = 2000-200a+10m-5am + 50l -20al+ 15c + u.Y, sd=200)
Mset <- set.DAG(M, latent.v = c("u.Y"))
str(Mset)

specify the two interventions

a1 <- node("a", distr = "rbern", prob = 1)
Mset <- Mset + action("a1", nodes = a1)
a0 <- node("a", distr = "rbern", prob = 0)
Mset <- Mset + action("a0", nodes = a0)

counterfactual data to estimate conditional distribution M|a,c and M|a*,c

dat <- simcausal::sim(DAG = Mset, actions = c("a1", "a0"), n = 10000000, rndseed = 345)

look at counterfactual data

head(dat[["a1"]]); head(dat[["a0"]])

plotDAG(Mset, xjitter = 0.2, yjitter = 0.2,
edge_attrs = list(width = 0.5, arrow.width = 0.5, arrow.size = 0.3),
vertex_attrs = list(size = 20, label.cex = 0.8))

#Evaluate distribution M|a,c and M|a*,c

summary(m0<-lm(m~c, data=dat[["a0"]]))
sd.m0<-summary(m0)$sigma; sd.m0
coef.m0<-coefficients(m0); coef.m0

summary(m1<-lm(m~c, data=dat[["a1"]]))
coef.m1<-coefficients(m1); coef.m1
sd.m1<-summary(m1)$sigma; sd.m1

M.r<- M +
node("m.0", distr = "rnorm", mean = 13.7465169 + 0.9006255c, sd=3.828051)+ #insert values by hand
node("m.1", distr = "rnorm", mean = 10.2433650 + 0.9009554 c, sd=3.826656) +
#node("m.0", distr = "rnorm", mean = coef.m0[1]+coef.m0[2]c, sd=sd.m0)+ this syntax doesn't work
#node("m.1", distr = "rnorm", mean =coef.m1[1] + coef.m1[2]c, sd=sd.m1)+
node("y.m0", distr = "rnorm",
mean = 2000-200
a+10
m.0-5
a
m.0 + 50l -20al+ 15c + u.Y, sd=200)+
node("y.m1", distr = "rnorm",
mean = 2000-200a+10m.1-5am.1 + 50l -20al+ 15c + u.Y, sd=200)
Mset.r <- set.DAG(M.r, latent.v = c("u.Y"))
str(Mset.r)

specify the two interventions

a1 <- node("a", distr = "rbern", prob = 1)
Mset.r <- Mset.r + action("a1", nodes = a1)
a0 <- node("a", distr = "rbern", prob = 0)
Mset.r <- Mset.r + action("a0", nodes = a0)

##################

Evaluate TRUTH

##################
#Simulate counterfactual dataset
dat <- simcausal::sim(DAG = Mset.r, actions = c("a1", "a0"), n = 10000000, rndseed = 345)

#Evaluate true randomized interventional analogue for the direct effect

Mset.r <- set.targetE(Mset.r, outcome = "y.m0", param = "a1")
y_r.m0_a1<-eval.target(Mset.r, data = dat)$res ; y_r.m0_a1

Mset.r <- set.targetE(Mset.r, outcome = "y.m0", param = "a0")
y_r.m0_a0<-eval.target(Mset.r, data = dat)$res; y_r.m0_a0
RIA.NDE<-y_r.m0_a1-y_r.m0_a0; RIA.NDE

#Evaluate true randomized interventional analogue for the indirect effect

Mset.r <- set.targetE(Mset.r, outcome = "y.m1", param = "a1")
y_r.m1_a1<-eval.target(Mset.r, data = dat)$res; y_r.m1_a1

RIA.NIE<-y_r.m1_a1-y_r.m0_a1; RIA.NIE

#Evaluate randomized interventional analogue for total effect:
RIA.TE<-y_r.m1_a1-y_r.m0_a0
RIA.TE == RIA.NIE + RIA.NDE
###################################

I'm not quite sure, if this code is ok or if there are more efficient solutions?
Thanks a lot for suggestions!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions