-
Notifications
You must be signed in to change notification settings - Fork 103
Refactor chi-squared distribution #77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #77 +/- ##
==========================================
+ Coverage 92.24% 92.67% +0.42%
==========================================
Files 44 44
Lines 7187 7299 +112
==========================================
+ Hits 6630 6764 +134
+ Misses 557 535 -22
Continue to review full report at Codecov.
|
src/distribution/chi_squared.rs
Outdated
| if x > 0.0 { | ||
| self.g.ln_pdf(x) | ||
| } else { | ||
| -f64::INFINITY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conventionally we've been using f64::NEG_INFINITY, I'm not actually 100% sure -f64::INFINITY == f64::NEG_INFINITY
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I completely missed this one.
BTW, I didn't know that these constants are defined like this 😁 :
pub const INFINITY: f64 = 1.0f64 / 0.0f64
pub const NEG_INFINITY: f64 = -1.0f64 / 0.0f64| /// where `k` is the degrees of freedom and `Γ` is the gamma function | ||
| fn pdf(&self, x: f64) -> f64 { | ||
| self.g.pdf(x) | ||
| if x > 0.0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can be more specific, x == 0.0 is defined (at least according to Wikipedia) as long as freedom != 1.0, so we can probably do something like if x > 0.0 || (self.freedom != 1.0 && x == 0.0). May want to consider prec::almost_eq instead of == or != for floats but I'm just spitballing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I screwed up here. Should've done better research.
The thing is a little bit more complicated. Wikipedia silently assumes only integer degrees of freedom. If we take freedom as a positive real, then we have 3 cases:
0 < freedom < 2:pdf(0) = +∞, hence0is excluded from the supportfreedom == 2:pdf(0) = 0.52 < freedom:pdf(0) = 0
This is obvious from the pdf formula (see Wikipedia), specifically the x^(k/2 - 1) term. All other terms are non-zero for any freedom.
Anyway, I've tried out scipy's implementation of chi-squared. Its behaviour is exactly as above (i.e. +∞, 0.5 and 0).
So the question is, what to do for the case when 0 < freedom < 2: Should we let the pdf function return +∞ (as scipy does)? Or should we handle it as a separate case?
My personal preference right now is to return +∞, but document this behaviour in the documentation.
Edit: wording.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's fine by me as long as it's clear in the documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! Just to make sure: You prefer to return +∞ when 0 < freedom < 2 and pdf is evaluated at 0.
Then I'll rewrite the unit tests and update the docs.
| test_almost(2.5, 0.28134822576318228131, 1e-15, |x| x.pdf(1.0)); | ||
| test_almost(2.5, 0.045412171451573920401, 1e-15, |x| x.pdf(5.5)); | ||
| test_almost(2.5, 1.8574923023527248767e-24, 1e-36, |x| x.pdf(110.1)); | ||
| test_case(2.5, 0.0, |x| x.pdf(f64::INFINITY)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does pdf return for freedom == f64::INFINITY normally? I'd like to have this case defined but am not necessarily sure we need an explicit branch in the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returns a NaN. I guess as a result of 0 * Inf when computing the pdf.
TBH I don't like the idea of returning 0 instead of NaN because it doesn't make sense for me... Shouldn't a pdf integrate to 1 over support? How can it be upheld when pdf() returns 0 everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I think maybe being explicit about the NaN would be good, like adding a specific branch and note it in the docstring
|
Yup, I think being consistent with scipy is probably a good thing
…On Oct 30, 2017 11:52 AM, "Mikhail Pak" ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/distribution/chi_squared.rs
<#77 (comment)>:
> @@ -310,7 +310,11 @@ impl Continuous<f64, f64> for ChiSquared {
///
/// where `k` is the degrees of freedom and `Γ` is the gamma function
fn pdf(&self, x: f64) -> f64 {
- self.g.pdf(x)
+ if x > 0.0 {
Ok! Just to make sure: You prefer to return +∞ when 0 < freedom < 2 and
pdf is evaluated at 0.
Then I'll rewrite the unit tests and update the docs.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#77 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ACwkvpSBpFsyFbCHnZeBEr1mygWosyVlks5sxfDagaJpZM4P_MEl>
.
|
Mode cannot be negative
85d6884 to
2df72f2
Compare
|
Hey @mp4096 what's the status on this PR? Do you need any help bringing it across the finish line? |
Ok, this is a larger one:
Mode
Fix formula for the mode. Math.NET implementation has an error for
freedom < 2and faulty unit tests as well.Pdf and log density
When investigating this unit test:
I realised the problem two-fold: First, the old pdf implementation returned
Infat0instead of0. Second, forfreedom == 1.0the pdf is very steep around0and cannot be integrated with the hard-coded step size. So I implemented a branching in the pdf and log density (also see Wikipedia) and changedfreedomto1.5.Actually, Math.NET has a little bit more branching in the chi-squared pdf, but the bulk of it is covered in
statrs'sgamma.rsimplementation. I'm not sure if we want to cover the case offreedom = Inf. @boxtown What's your opinion on this? If yay, I'll add this branching topdf()andln_pdf()and uncomment the unit tests. If nay, I'll just delete the commented unit tests.Misc
And of course added more unit tests, but this is kind of self-explanatory.