notes/main.typ: 37 additions & 23 deletions
@@ -113,21 +113,25 @@ Issue of #emph("Observational Studies") titled #link("https://en.wikipedia.org/
From a comment by Phil on #link("https://statmodeling.stat.columbia.edu/2025/11/14/how-is-it-that-this-problem-with-its-21-data-points-is-so-much-easier-to-handle-with-1-predictor-than-with-16-predictors/")[`Impossible statistical problems`] on Andrew Gelman's blog, November 14, 2025.
#quote(
"I’m imagining a political science student coming in for statistical advice:
Student: I’m trying to predict the Democratic percentage of the two-party vote in U.S. Presidential elections, six months before Election Day. I want to use just the past ten elections because I think the political landscape was too different before that.
Statistician: Sounds interesting. What predictive variables do you have?
Student: I’ve got the Democratic share in the last election, and the change in unemployment rate over the past year and the past three years, and the inflation rate over the past year and the past three years, and the change in median income over the past year and past three years.
Statistician: That’s a lot of predictors for not many elections, we are going to have some issues, but maybe we can use lasso or a regularization scheme or something. Let’s get started.
Student: I also own an almanac.
Statistician: Oh. Sorry, I can’t help you, your problem is impossible.",
)
With only 10 data points and 7 predictors, there is still some room for analysis. However, when using an almanac with over 1,000 predictors, the problem becomes unsolvable: the model is overparameterized and loses all predictive power for future observations.
Therefore, in scenarios with extremely small sample sizes, an excess of irrelevant predictors can contaminate the data—rather than enriching it—and render meaningful analysis impossible.
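A tiny simulation makes this concrete (a hedged sketch on synthetic data; the dimensions and predictors are invented for illustration, not the election data): with fewer observations than predictors, least squares interpolates the training outcomes perfectly even when the outcome is pure noise, and the fit is worthless out of sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 50           # far fewer observations than predictors
X = rng.normal(size=(n, p))
y = rng.normal(size=n)  # outcome generated independently of every predictor

# When p > n, the minimum-norm least squares solution interpolates the data.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
train_resid = y - X @ beta   # essentially zero: a "perfect" in-sample fit

# On fresh data from the same null model the fitted coefficients are useless:
X_new = rng.normal(size=(1000, p))
y_new = rng.normal(size=1000)
mse_model = np.mean((y_new - X_new @ beta) ** 2)
mse_mean = np.mean((y_new - y.mean()) ** 2)
print(np.max(np.abs(train_resid)), mse_model, mse_mean)
```

The in-sample fit is exact, yet the out-of-sample error is typically no better than predicting with the sample mean, which is the sense in which the almanac's extra predictors contaminate rather than enrich the data.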
But this is so far only an empirical observation; can we explain the phenomenon theoretically?
#question(
"Theoretical explanation for overparameterized models with small sample size",
)[For sample size $n = 200$, outcome $Y in RR$ and predictors $X in RR^(p)$ with $p / n = c in (0, infinity)$, where $Y$ is independent of $X$: under what conditions will a machine learning algorithm still appear to predict $Y$ well from $X$?]
Well, this is just the global null hypothesis testing problem for high-dimensional models. We can take a nonparametric regression view of it.
@@ -137,9 +141,11 @@ $ "H"_0 : f = 0 , "H"_1 : f eq.not 0 $
Using a basis of $cal(H)$ and truncating at $k$ terms, this can be answered as a hypothesis testing problem: the signal-to-noise ratio, the sparsity, and the constant $p/n$ give the detection boundary and the local minimax rate.
Well, that's asymptotic theory; there still is the question of tiny $n$, say $n = 10$ or $20$. Can we give any answer in this case? Can we learn anything useful from so tiny a sample size? This case may be called #link("https://en.wikipedia.org/wiki/Knightian_uncertainty")[Knightian uncertainty]?
#quote(
"In economics, Knightian uncertainty is a lack of any quantifiable knowledge about some possible occurrence, as opposed to the presence of quantifiable risk (e.g., that in statistical noise or a parameter's confidence interval). The concept acknowledges some fundamental degree of ignorance, a limit to knowledge, and an essential unpredictability of future events.",
)
= On the indistinguishability or identification of statistical models
@@ -236,34 +242,40 @@ Here $serif(Pr)(Y(1) = b | G = g)$ and $serif(Pr)( Y(0) = a | G = g)$ can be ide
@dong2025marginal
- talks about the identification of the ATE with continuous or multiple-category IVs and a binary treatment.
- data are $(X,D,Z,Y)$
#image("media/image.png")
- The identification assumptions:
  + Stable Unit Treatment Value Assumption (SUTVA) for potential outcomes:
    - Consistency and no interference between units:
    $
    Y = Y (D) & = D Y(1) + (1-D) Y(0) \
    D & = D(Z)
    $
  + IV relevance (version 1): $Z cancel(perp) D | X$ almost surely.
  + IV independence: $Z perp U | X$
As @levis2025covariate mentioned, under these assumptions the ATE is not point identified; the additional homogeneity assumptions are:
  + Version 1, for binary $Z$: Either $EE[D | Z = 1, X , U] - EE[D | Z = 0, X , U]$ or $EE[ Y(1) - Y(0) | X , U]$ does not depend on $U$.
    - #quote(
      [Assumption 5′ rules out additive effect modification by $U$ of the $Z-D$ relationship or $d-Y (d)$ relationship within levels of $X$. A weaker alternative is the no unmeasured common effect modifier assumption (Cui and Tchetgen Tchetgen, 2021, Hartwig et al., 2023), which stipulates that no unmeasured confounder acts as a common effect modifier of both the additive effect of the IV on the treatment and the additive treatment effect on the outcome:],
    )
  + Version 2, a weaker alternative for binary $Z$: the following equation holds almost surely:
- The real-data application combines many genetic variants, as weak IVs, into a strong continuous IV to address the "obesity paradox" in oncology.
  - #quote(
    "Obesity is typically associated with poorer oncology outcomes. Paradoxically, however, many observational studies have reported that non-small cell lung cancer (NSCLC) patients with higher body mass index (BMI) experience lower mortality, a phenomenon often referred to as the “obesity paradox” (Zhang et al., 2017).",
  )
- Uses the ratio of the conditional weighted average treatment effect (CWATE), for multiple-category IVs, or of the conditional weighted average derivative effect (CWADE), for continuous IVs, to identify the ATE.
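For intuition on the role of the homogeneity assumptions, a hedged simulation sketch with an invented data-generating process (no covariates $X$, constant treatment effect, so a version-1 homogeneity holds): the classical binary-IV Wald ratio then recovers the ATE, while the naive treated-versus-control contrast is confounded by $U$.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 500_000

# Invented DGP for illustration. U is an unmeasured confounder;
# homogeneity (version 1) holds because Y(1) - Y(0) does not depend on U.
U = rng.normal(size=N)
Z = rng.integers(0, 2, size=N)      # binary IV, independent of U
D = (0.5 * Z + 0.8 * U + rng.normal(size=N) > 0).astype(float)
tau = 2.0                           # constant treatment effect
Y0 = U + rng.normal(size=N)
Y1 = Y0 + tau
Y = D * Y1 + (1 - D) * Y0           # consistency: Y = D Y(1) + (1-D) Y(0)

# Wald ratio: (E[Y|Z=1] - E[Y|Z=0]) / (E[D|Z=1] - E[D|Z=0])
wald = (Y[Z == 1].mean() - Y[Z == 0].mean()) / (
    D[Z == 1].mean() - D[Z == 0].mean()
)

# Naive treated-vs-control contrast, biased by U:
naive = Y[D == 1].mean() - Y[D == 0].mean()
print(wald, naive)
```

Here `wald` is close to the true ATE of 2 while `naive` is badly biased; dropping the homogeneity assumption breaks the population equality behind the Wald ratio, which is exactly the non-identification discussed above.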
@@ -275,6 +287,8 @@ As @levis2025covariate mentioned, under these assumptions, the ATE is not point
== The equivalence between DAG and potential outcome framework
@wang2025causal
=== The equivalence between the nonparametric structural equation model (NPSEM) and the potential outcome framework