
Commit 7031757

Update notes
1 parent 91925a1 commit 7031757

File tree

3 files changed: +43 −23 lines changed


notes/Master.bib

Lines changed: 6 additions & 0 deletions
@@ -1,3 +1,9 @@
+@article{wang2025causal,
+  title={Causal Inference: A Tale of Three Frameworks},
+  author={Wang, Linbo and Richardson, Thomas and Robins, James},
+  journal={arXiv preprint arXiv:2511.21516},
+  year={2025}
+}
 @article{chen2025identification,
   title={Identification and Debiased Learning of Causal Effects with General Instrumental Variables},
   author={Chen, Shuyuan and Zhang, Peng and Cui, Yifan},

notes/main.typ

Lines changed: 37 additions & 23 deletions
@@ -113,21 +113,25 @@ Issue of #emph("Observational Studies") titled #link("https://en.wikipedia.org/
 
 From a comment by Phil on #link("https://statmodeling.stat.columbia.edu/2025/11/14/how-is-it-that-this-problem-with-its-21-data-points-is-so-much-easier-to-handle-with-1-predictor-than-with-16-predictors/")[`Impossible statistical problems`] on Andrew Gelman's blog, November 14, 2025.
 
-#quote("I’m imagining a political science student coming in for statistical advice:
+#quote(
+  "I’m imagining a political science student coming in for statistical advice:
 Student: I’m trying to predict the Democratic percentage of the two-party vote in U.S. Presidential elections, six months before Election Day. I want to use just the past ten elections because I think the political landscape was too different before that.
 Statistician: Sounds interesting. What predictive variables do you have?
 Student: I’ve got the Democratic share in the last election, and the change in unemployment rate over the past year and the past three years, and the inflation rate over the past year and the past three years, and the change in median income over the past year and past three years.
 Statistician: That’s a lot of predictors for not many elections, we are going to have some issues, but maybe we can use lasso or a regularization scheme or something. Let’s get started.
 Student: I also own an almanac.
-Statistician: Oh. Sorry, I can’t help you, your problem is impossible.")
+Statistician: Oh. Sorry, I can’t help you, your problem is impossible.",
+)
 
 With only 10 data points and 7 predictors, there is still some room for analysis. However, when using an almanac with over 1,000 predictors, the problem becomes unsolvable: the model is overparameterized and loses all predictive power for future observations.
 
 Therefore, in scenarios with extremely small sample sizes, an excess of irrelevant predictors can contaminate the data—rather than enriching it—and render meaningful analysis impossible.
 
-But it now an empirical observation, can we theoretically explain this phenomenon?
+But so far this is only an empirical observation; can we explain the phenomenon theoretically?
 
-#question("Theoretical explanation for overparameterized models with small sample size")[For sample siez $n = 200$, outcome $Y in RR$ and predictors $X in RR^(p)$, $p / n = c in (0, infinity)$, $Y$ is independent of $X$, under what condition? We will see that a machine learning algorithm can still predict $Y$ well from $X$?.]
+#question(
+  "Theoretical explanation for overparameterized models with small sample size",
+)[For sample size $n = 200$, outcome $Y in RR$ and predictors $X in RR^(p)$ with $p / n = c in (0, infinity)$, and $Y$ independent of $X$: under what conditions does a machine learning algorithm still appear to predict $Y$ well from $X$?]
 
 Well, this is just the global null hypothesis testing problem in high-dimensional models. We can take a nonparametric regression view of this problem (a small simulation sketch follows this hunk).
 
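A minimal simulation sketch of the question above (hypothetical; assumes Python with numpy and is not part of the notes): when $Y$ is independent of $X$ and $p > n$, the minimum-norm least-squares fit interpolates the training data, so the in-sample fit looks perfect while out-of-sample prediction is worse than predicting the constant 0.

```python
# Sketch, not the notes' own code: pure-noise outcome, more predictors than samples.
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 40                              # tiny sample, p / n = 2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)                 # Y independent of every predictor

beta = np.linalg.pinv(X) @ y               # minimum-norm interpolating least squares
resid = y - X @ beta
in_sample_r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

X_new = rng.standard_normal((10_000, p))   # fresh data from the same null model
y_new = rng.standard_normal(10_000)
oos_mse = np.mean((y_new - X_new @ beta) ** 2)

print(f"in-sample R^2: {in_sample_r2:.3f}")              # ~ 1.000: perfect apparent fit
print(f"out-of-sample MSE: {oos_mse:.2f} (Var(Y) = 1)")  # > 1: no predictive power
```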
@@ -137,9 +141,11 @@ $ "H"_0 : f = 0 , "H"_1 : f eq.not 0 $
 
 Using a basis of $cal(H)$ and truncating at $k$ terms, this can be answered well as a hypothesis testing problem: the signal-to-noise ratio, the sparsity, and the constant $p/n$ give the detection boundary and the local minimax rate (a test statistic of this kind is sketched after this hunk).
 
-Well, that's asymptotic theory, there still is the question for tiny $n$, say $n = 10$ or $20$. Can we give any answer for this? Can we know anything useful form so tiny sample size? This case may be called #link("https://en.wikipedia.org/wiki/Knightian_uncertainty")[Knightian uncertainty]?
+Well, that's asymptotic theory; the question remains for tiny $n$, say $n = 10$ or $20$. Can we give any answer in this case? Can we learn anything useful from such a tiny sample size? Perhaps this case should be called #link("https://en.wikipedia.org/wiki/Knightian_uncertainty")[Knightian uncertainty]?
 
-#quote("In economics, Knightian uncertainty is a lack of any quantifiable knowledge about some possible occurrence, as opposed to the presence of quantifiable risk (e.g., that in statistical noise or a parameter's confidence interval). The concept acknowledges some fundamental degree of ignorance, a limit to knowledge, and an essential unpredictability of future events.")
+#quote(
+  "In economics, Knightian uncertainty is a lack of any quantifiable knowledge about some possible occurrence, as opposed to the presence of quantifiable risk (e.g., that in statistical noise or a parameter's confidence interval). The concept acknowledges some fundamental degree of ignorance, a limit to knowledge, and an essential unpredictability of future events.",
+)
 
 = On the indistinguishability or identification of statistical models
 
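A small sketch of the truncated-basis test mentioned in the hunk above (hypothetical; assumes a cosine basis, known unit noise variance, and Python with numpy/scipy). It only illustrates the test statistic, not the detection-boundary or local minimax analysis.

```python
# Sketch, not the notes' own code: Neyman-type truncated test of H0: f = 0
# in y_i = f(x_i) + eps_i with eps_i ~ N(0, 1) and x_i ~ Uniform(0, 1).
import numpy as np
from scipy.stats import chi2

def truncated_basis_test(x, y, k, sigma=1.0):
    # orthonormal cosine basis phi_j(x) = sqrt(2) cos(pi j x), j = 1..k
    Phi = np.sqrt(2) * np.cos(np.pi * np.outer(x, np.arange(1, k + 1)))
    theta_hat = Phi.T @ y / len(x)                    # empirical basis coefficients
    stat = len(x) * np.sum(theta_hat**2) / sigma**2   # approximately chi2_k under H0
    return stat, chi2.sf(stat, df=k)

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(size=n)
for label, f in [("f = 0", lambda t: 0.0 * t),
                 ("f = sin(2 pi x)", lambda t: np.sin(2 * np.pi * t))]:
    y = f(x) + rng.standard_normal(n)
    stat, pval = truncated_basis_test(x, y, k=10)
    print(f"{label}: statistic = {stat:.1f}, p-value = {pval:.3f}")
```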
@@ -236,34 +242,40 @@ Here $serif(Pr)(Y(1) = b | G = g)$ and $serif(Pr)( Y(0) = a | G = g)$ can be ide
 
 @dong2025marginal
 
-- talk about the indenfication of ATE with continuous or multiple-category IVs with binary treatment.
-
+- talks about the identification of the ATE with continuous or multiple-category IVs and a binary treatment.
+
 
 - data are $(X, D, Z, Y)$
 #image("media/image.png")
 
 - The identification assumptions:
-  + Stable Unit Treatment Value Assumption (SUTVA) for potential outcomes:
-    - Consistency and no interference between units:
-      $ Y = Y (D) & = D Y(1) + (1-D) Y(0) \
-      D & = D(Z) $
-  + IV relevance (version 1): $ Z cancel(perp) D | X$ almost surely.
-  + IV independence : $ Z perp U | X$
+  + Stable Unit Treatment Value Assumption (SUTVA) for potential outcomes:
+    - Consistency and no interference between units:
+      $
+        Y = Y (D) & = D Y(1) + (1-D) Y(0) \
+                D & = D(Z)
+      $
+  + IV relevance (version 1): $Z cancel(perp) D | X$ almost surely.
+  + IV independence: $Z perp U | X$
   + IV exclusion restriction: $Z perp Y | D, X$
-  + Unconfounderness/d-separation : $ (Z, D) perp Y(d) | X, U$ for $d = 0,1$
+  + Unconfoundedness/d-separation: $(Z, D) perp Y(d) | X, U$ for $d = 0, 1$
 
 As @levis2025covariate mentioned, under these assumptions the ATE is not point identified; the homogeneity assumptions are (a small simulation sketch follows this hunk):
-  + Version 1, for binary $Z$ : Either $ EE[D | Z = 1, X , U] - EE[D | Z = 0, X , U] $ or $ EE[ Y(1) - Y(0) | X , U] $ does not depend on $U$.
-    - #quote([Assumption 5′ rules out additive effect modification by $U$ of the $Z-D$ relationship or $d-Y (d)$ relationship within levels of $X$. A weaker alternative is the no unmeasured common effect modifier assumption (Cui and Tchetgen Tchetgen, 2021, Hartwig et al., 2023), which stipulates that no unmeasured confounder acts as a common effect modifier of both the additive effect of the IV on the treatment and the additive treatment effect on the outcome:])
-  + Version 2, weaker alternative for binary $Z$, following equation holds almost surely:
-    $ "Cov"(EE(D| Z= 1, X, U)- EE(D|Z=0, X, U), EE(Y(1) - Y(0) | X,U) | X ) = 0 $
-  + Final version, for continuous or multiple-category $Z$, for any $z$ in the support of $Z$, following equation holds almost surely:
-    $ "Cov"(EE(D| Z= z, X, U)- EE(D| X, U), EE(Y(1) - Y(0) | X,U) | X ) = 0 $
-    for any $z, z'$ in the support of $Z$.
+  + Version 1, for binary $Z$: either $EE[D | Z = 1, X, U] - EE[D | Z = 0, X, U]$ or $EE[Y(1) - Y(0) | X, U]$ does not depend on $U$.
+    - #quote(
+        [Assumption 5′ rules out additive effect modification by $U$ of the $Z-D$ relationship or $d-Y(d)$ relationship within levels of $X$. A weaker alternative is the no unmeasured common effect modifier assumption (Cui and Tchetgen Tchetgen, 2021, Hartwig et al., 2023), which stipulates that no unmeasured confounder acts as a common effect modifier of both the additive effect of the IV on the treatment and the additive treatment effect on the outcome:],
+      )
+  + Version 2, a weaker alternative for binary $Z$: the following equation holds almost surely:
+    $ "Cov"(EE(D | Z = 1, X, U) - EE(D | Z = 0, X, U), EE(Y(1) - Y(0) | X, U) | X) = 0 $
+  + Final version, for continuous or multiple-category $Z$: for any $z$ in the support of $Z$, the following equation holds almost surely:
+    $ "Cov"(EE(D | Z = z, X, U) - EE(D | X, U), EE(Y(1) - Y(0) | X, U) | X) = 0 $
+    for any $z, z'$ in the support of $Z$.
 
 
 - The real-data application combines many genetic variants, as weak IVs, into a strong continuous IV to address the "obesity paradox" in oncology.
-- #quote("Obesity is typically associated with poorer oncology outcomes. Paradoxically, however, many observational studies have reported that non-small cell lung cancer (NSCLC) patients with higher body mass index (BMI) experience lower mortality, a phenomenon often referred to as the “obesity paradox” (Zhang et al., 2017).")
+- #quote(
+    "Obesity is typically associated with poorer oncology outcomes. Paradoxically, however, many observational studies have reported that non-small cell lung cancer (NSCLC) patients with higher body mass index (BMI) experience lower mortality, a phenomenon often referred to as the “obesity paradox” (Zhang et al., 2017).",
+  )
 
 - Using the ratio of the conditional weighted average treatment effect (CWATE, for multiple-category $Z$) or the conditional weighted average derivative effect (CWADE) to identify the ATE.
 
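A small simulation sketch of what a homogeneity assumption buys with a binary IV (hypothetical Python/numpy code, no covariates $X$; the constant treatment effect makes Version 1 hold): the Wald ratio recovers the ATE under the IV assumptions plus homogeneity, while the naive treated-vs-control contrast is biased by the unmeasured confounder $U$.

```python
# Sketch, not the notes' own code: binary IV Z, binary treatment D, confounder U.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
U = rng.standard_normal(n)                                     # unmeasured confounder
Z = rng.integers(0, 2, n)                                      # IV, independent of U
D = (0.5 * Z + U + rng.standard_normal(n) > 0).astype(float)   # relevance: Z shifts D
tau = 2.0                                                      # constant (homogeneous) effect
Y = U + rng.standard_normal(n) + tau * D                       # exclusion: Z acts only via D

wald = (Y[Z == 1].mean() - Y[Z == 0].mean()) / (D[Z == 1].mean() - D[Z == 0].mean())
naive = Y[D == 1].mean() - Y[D == 0].mean()
print(f"true ATE = {tau}, Wald = {wald:.2f}, naive = {naive:.2f}")
```

The weaker Version 2 condition is designed to keep this same ratio valid even when the effect varies with $U$, provided the $U$-modification of the $Z$-$D$ relationship and of the treatment effect are uncorrelated, as the quoted passage describes.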
@@ -275,6 +287,8 @@ As @levis2025covariate mentioned, under these assumptions, the ATE is not point
 
 == The equivalence between DAG and potential outcome framework
 
+@wang2025causal
+
 === The equivalence between nonparametric structural equation model (NPSEM) and potential outcome framework
 
 === The equivalence between SWIG and FFRCISTG
static/notes/notes.pdf

1.88 KB
Binary file not shown.

0 commit comments

Comments
 (0)