Skip to content

Conversation

@BorisMuzellec
Copy link
Collaborator

@BorisMuzellec BorisMuzellec commented Feb 13, 2024

Reference Issue or PRs

(Partially) fixes #231, closes #235

What does your PR implement? Be specific.

This PR removes pydeseq2's dependence on statsmodels, which has been causing a few bugs recently (and is the reason why the CI currently fails).

Statsmodels was used to:

@BorisMuzellec BorisMuzellec marked this pull request as ready for review February 15, 2024 09:24
@BorisMuzellec BorisMuzellec requested review from a user and umarteauowkin February 15, 2024 09:24
@BorisMuzellec BorisMuzellec changed the title Remove statsmodels MAINT Remove statsmodels Feb 15, 2024
Copy link
Collaborator

@umarteauowkin umarteauowkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks perfect, I checked the math in the lowess computation and compared to the statsmodel implem, it is the same. Thanks a lot @BorisMuzellec

np.array([np.sort(np.abs(features - features[i]))[r] for i in range(n)]), 1e-12
)
w = np.clip(
np.abs(np.nan_to_num((features[:, None] - features[None, :]) / h)), 0.0, 1.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe for coherence put h[:,None]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would make the code clearer? Given that the code is working as is, I'm a bit reluctant to add unnecessary broadcasting operations.

# Adapted from https://gist.github.com/agramfort/850437


def lowess(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the review!

"Every gene contains at least one zero, "
"cannot compute log geometric means. Switching to iterative mode.",
RuntimeWarning,
UserWarning,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, what is the rationale of selecting UserWarning ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bad reason: it was easier to catch it simultaneously with a switch to "mean" trend curve fitting in the tests. The better reason is that I felt it was more suited (RuntimeWarning is supposed to cover dubious runtime behavior which is not really what's going on here).

@adamgayoso
Copy link
Member

Just wanted to flag that there's a very stable implementation of loess in this package, which is a common dependency of plotnine so should be a stable dependency.

return (coeffs, covariates_fit @ coeffs)

if not res.success:
raise RuntimeError("Gamma GLM optimization failed.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be cleaner to return res.success and raise the error in the while loop

@umarteauowkin
Copy link
Collaborator

@adamgayoso From what I gather lowess is not exactly loess, the weighting scheme is automatic in lowess and hand made in loess. @BorisMuzellec correct me if I m wrong, but in scikit-misc, I think the weighted version is not implemented.

@adamgayoso
Copy link
Member

@umarteauowkin it looks like you can set the weights. Nonetheless I would consider wrapping this custom lowess with a numba njit decorator for speed, as well as test it against the statsmodels one

@BorisMuzellec
Copy link
Collaborator Author

Just wanted to flag that there's a very stable implementation of loess in this package, which is a common dependency of plotnine so should be a stable dependency.

Hi @adamgayoso, thanks for the suggestion! Indeed I tried to use this implementation in #234 (tests are currently failing due to an issue in the docs which can be ignored), but I've experienced several issues because it causes segmentation errors on some platforms, and because scikit-misc's loess class is not picklable. (Cf the PR description of #234)

@BorisMuzellec BorisMuzellec merged commit be50f09 into main Feb 19, 2024
@BorisMuzellec BorisMuzellec deleted the remove_statsmodels branch February 19, 2024 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Negative coefficients in the trend curve

4 participants