From 6fb644b1a13b835e1aab6a95491ac7c1d18c2af7 Mon Sep 17 00:00:00 2001
From: bgoodri
Date: Fri, 27 Aug 2021 21:58:32 -0400
Subject: [PATCH 1/9] Create 0028-quantile-functions.md

---
 designs/0028-quantile-functions.md | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 designs/0028-quantile-functions.md

diff --git a/designs/0028-quantile-functions.md b/designs/0028-quantile-functions.md
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/designs/0028-quantile-functions.md
@@ -0,0 +1 @@
+

From 8ea2a56818d5da781a3177a304596711f323df69 Mon Sep 17 00:00:00 2001
From: bgoodri
Date: Sat, 28 Aug 2021 01:34:34 -0400
Subject: [PATCH 2/9] first draft

---
 designs/0028-quantile-functions.md | 211 +++++++++++++++++++++++++++++
 1 file changed, 211 insertions(+)

diff --git a/designs/0028-quantile-functions.md b/designs/0028-quantile-functions.md
index 8b13789..1d0baf6 100644
--- a/designs/0028-quantile-functions.md
+++ b/designs/0028-quantile-functions.md
@@ -1 +1,212 @@
- Feature Name: Quantile Functions
- Start Date: September 1, 2021
- RFC PR: (leave this empty)
- Stan Issue: (leave this empty)

# Summary
[summary]: #summary

I intend to add many quantile functions to Stan Math, at least for univariate continuous probability distributions, that would eventually be exposed in the Stan language. For a univariate continuous probability distribution, the quantile function is the inverse of the Cumulative Distribution Function (CDF), i.e. the quantile function is an increasing function from $\left[0,1\right]$ to the parameter space, $\Theta$. The quantile function can be used to _define_ a probability distribution, although for many well-known probability distributions (e.g. the standard normal) the quantile function cannot be written as an explicit function.
+ +# Motivation +[motivation]: #motivation + +The primary motivation for implementing quantile functions in Stan Math is so that users can call them in the `transformed parameters` block of a Stan program to express their prior beliefs about a parameter. The argument (in the mathematical sense) of a quantile function is a cumulative probability between $0$ and $1$ that would be declared in the `parameters` block and have an implicit standard uniform prior. Thus, if $\theta$ is defined in the `transformed parameters` block by applying the quantile function to this cumulative probability, then the distribution of $\theta$ under the prior is the probability distribution defined by that quantile function. When data is conditioned on, the posterior distribution of the cumulative probability becomes non-uniform but we still obtain MCMC draws from the posterior distribution of $\theta$. + +If we were to unnecessarily restrict ourselves to quantile functions for common probability distributions, this method of expressing prior beliefs about $\theta$ is no easier than declaring $\theta$ in the `parameters` block and using a prior Probability Density Function (PDF) to express beliefs about $\theta$ in the `model` block. However, if $\theta$ has a heavy-tailed distribution, then defining it in the `transformed parameters` block may yield more efficient MCMC because the distribution of the cumulative probability (when transformed to the unconstrained space) tends to have tails that are more reasonable. In addition, expressing prior beliefs via quantile functions is necessary if the log-likelihood function is reparameterized in terms of cumulative probabilities, which we intend to pursue for computational speed over the next three years under the same NSF grant. 
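
To make the inverse relationship above concrete, here is a minimal, self-contained C++ sketch (plain doubles rather than Stan Math types; the function names merely mirror this proposal and are not an existing API). Pushing a standard uniform draw `p` through the Cauchy quantile function produces a Cauchy draw, and the Cauchy CDF maps the result back to the original cumulative probability:

```cpp
#include <cassert>
#include <cmath>

// Illustration only (not Stan Math code): Cauchy quantile function and CDF
// in closed form.  If p ~ Uniform(0, 1), then cauchy_qf(p, mu, sigma) is
// distributed Cauchy(mu, sigma), and cauchy_cdf inverts the mapping.
double cauchy_qf(double p, double mu, double sigma) {
  const double pi = std::acos(-1.0);
  return mu + sigma * std::tan(pi * (p - 0.5));
}

double cauchy_cdf(double x, double mu, double sigma) {
  const double pi = std::acos(-1.0);
  return 0.5 + std::atan((x - mu) / sigma) / pi;
}
```

For instance, `cauchy_qf(0.75, 0, 1)` is the upper quartile of the standard Cauchy distribution, $\tan\left(\frac{\pi}{4}\right) = 1$, and `cauchy_cdf` recovers the depth $0.75$ from it.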

However, there is no need to restrict ourselves to quantile functions for common probability distributions, and it is conceptually easier for the user to express prior beliefs about $\theta$ using very flexible probability distributions that are defined by explicit quantile functions but lack explicit CDFs and PDFs. Examples include the Generalized Lambda Distribution, the metalog(istic) distribution, the Chebyshev distribution of the first kind, and increasing cubic splines. In each of these cases, the user would specify a few pairs of cumulative probabilities and values of $\theta$, and then an increasing curve would be run through these user-supplied points. In other words, the quantile function would interpolate through the prior median, the prior quartiles, etc. Thus, from the user's perspective, all of the hyperparameters are _location_ parameters --- even though they collectively characterize the shape of the prior distribution for $\theta$ --- and location parameters are easier for users to think about than other kinds of hyperparameters.

We can support this workflow for specifying prior beliefs about $\theta$ for any univariate continuous probability distribution, although the more flexible distributions include a lot of common probability distributions that have one or two hyperparameters as special cases. I anticipate that once Stan users are exposed to this workflow, many will prefer to use it over the traditional workflow of specifying prior beliefs about $\theta$ via a PDF.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

For univariate continuous probability distributions, the quantile function is the inverse of the CDF, or equivalently, the CDF is the inverse of the quantile function.
For example, in the following minimal Stan program
```
data {
  real median;
  real scale;
}
parameters {
  real p;
}
transformed parameters {
  real theta = median + scale * tan(pi() * (p - 0.5));
} // quantile function of the Cauchy distribution
```
the distribution of `p` is standard uniform (because there is no `model` block) and the distribution of $\theta$ is Cauchy because `theta` is defined by the quantile function of the Cauchy distribution, $\theta = m + s \tan\left(\pi \left(p - \frac{1}{2}\right)\right)$. But it would be better for users if the quantile function for the Cauchy distribution were implemented in Stan Math and (eventually) exposed to the Stan language, so that this minimal Stan program could be equivalently rewritten as
```
data {
  real median;
  real scale;
}
parameters {
  real p;
}
transformed parameters {
  real theta = cauchy_qf(p, median, scale);
}
```
However, users rarely have prior beliefs about $\theta$ that necessitate the Cauchy distribution specifically, i.e. a belief that $\theta$ is a ratio of centered normals. Rather, when users choose the Cauchy distribution as a prior, they do so because their beliefs about $\theta$ are heavy-tailed and symmetric. But there are many other probability distributions that are heavy-tailed and symmetric yet not exactly Cauchy. In that situation, a user may find it preferable to use a more flexible distribution, such as the metalog(istic) distribution, whose quantile function interpolates through $K$ pairs of `quantiles` and `depths` (cumulative probabilities) that the user passes to the `data` block of their Stan program.
```
data {
  int K;
  ordered[K] quantiles;
  ordered[K] depths;
}
parameters {
  real p;
}
transformed parameters {
  real theta = metalog_qf(p, quantiles, depths);
}
```
The mindset of the Stan user could be:

> Before seeing any new data, I believe there is an equal chance that $\theta$ is negative as positive.
There is a $\frac{1}{4}$ chance that $\theta < -1$ and a $\frac{1}{4}$ chance that $\theta > 1$. Finally, there is a $\frac{1}{10}$ chance that $\theta < -3$ and a $\frac{1}{10}$ chance that $\theta > 3$. Thus, `quantiles = [-3, -1, 0, 1, 3]'` and `depths = [0.1, 0.25, 0.5, 0.75, 0.9]'`.

Then, the `metalog_qf` interpolates through these $K = 5$ points that the user provides. In this case, the user's prior beliefs about $\theta$ happen to be very close to a standard Cauchy distribution, but if the prior `quantiles` and `depths` were different, then the metalog distribution would still apply. Thus, the user does not need to know about the Cauchy distribution, or that a distribution with slightly lighter tails could be obtained from the Student t distribution with a small number of degrees of freedom; they just need to specify the prior values of the quantile function at $\frac{1}{10}$, $\frac{9}{10}$, or other tail `depths`.

An exception is thrown by any quantile function if a purported cumulative probability is negative or greater than $1$, although $0$ and $1$ are valid edge cases. Similarly, for quantile functions like `metalog_qf` that input `quantiles` and `depths`, an exception is thrown if these vectors are not strictly increasing. Finally, for some quantile functions like `metalog_qf`, it is difficult to tell whether the function is strictly increasing over the entire $\left[0,1\right]$ interval; thus, an exception is thrown if the derivative of the quantile function (called the quantile density function) evaluated at `p` is negative. In the usual case where `quantiles` and `depths` are declared in the `data` block, it could be possible to call some sort of validity function in the `transformed data` block that would return $1$ if the quantile function is strictly increasing over the entire $\left[0,1\right]$ interval (and $0$ otherwise) by checking whether the quantile density function lacks a root on $\left[0,1\right]$.
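
A cheap approximation to such a validity check can be sketched as a grid scan of the quantile density function. This is a hedged illustration, not the proposed Stan Math interface (the function name and signature are invented here), and a grid scan is evidence rather than proof --- a sign change between grid points can be missed, which is why a root-finder would be the rigorous version:

```cpp
#include <cassert>

// Hypothetical sketch: scan the quantile density function q (the derivative
// of the quantile function) on an interior grid of depths.  Returns 1 if q
// is positive at every grid point (consistent with a strictly increasing
// quantile function) and 0 if a non-positive value is found.
template <typename QuantileDensity>
int is_increasing_on_grid(const QuantileDensity& q, int grid_points = 1000) {
  for (int i = 1; i < grid_points; i++) {
    double p = static_cast<double>(i) / grid_points;
    if (q(p) <= 0) return 0;
  }
  return 1;
}
```

A quantile density that is positive everywhere, such as that of the Cauchy distribution, passes the scan, while one that dips below zero anywhere on the grid fails it.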

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Unlike the `_cdf` and `_lpdf` functions in Stan Math that return scalars even if their arguments are not scalars, the quantile functions should return a scalar only if all of the arguments are scalars and otherwise should return something whose size is equivalent to the size of the argument with the largest size. In C++, `cauchy_qf` could be implemented in `prim/prob` roughly like
```
template <typename T_p, typename T_loc, typename T_scale>
auto cauchy_qf(const T_p& p, const T_loc& mu, const T_scale& sigma) {
  static const char* function = "cauchy_qf";
  check_consistent_sizes(function, "Depth", p, "Location parameter", mu,
                         "Scale parameter", sigma);
  check_probability(function, "Depth", p);
  check_finite(function, "Location parameter", mu);
  check_positive_finite(function, "Scale parameter", sigma);
  using T_partials_return = partials_return_t<T_p, T_loc, T_scale>;
  std::vector<T_partials_return> quantiles(size(p));
  for (size_t n = 0; n < quantiles.size(); n++) { // if mu and sigma are scalars
    quantiles[n] = mu + sigma * tan(stan::math::pi() * (p[n] - 0.5));
  }
  return quantiles;
}
```
In the actual implementation, we would want to handle the partial derivatives analytically rather than using autodiff, at least in cases like `cauchy_qf` where the quantile function is explicit. In this case, if `p[n]` is $1$ or $0$, then $\tan\left(\pm \frac{\pi}{2}\right)$ is $\pm \infty$, which is correct.
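
For `cauchy_qf`, those analytic partials are cheap: $\partial \theta / \partial \mu = 1$, $\partial \theta / \partial \sigma = \tan\left(\pi \left(p - \frac{1}{2}\right)\right)$, and $\partial \theta / \partial p = \sigma \pi / \cos^2\left(\pi \left(p - \frac{1}{2}\right)\right)$, which is the quantile density function. A standalone sketch in plain doubles (illustration only, not Stan Math code) that checks the depth derivative against a central finite difference:

```cpp
#include <cassert>
#include <cmath>

// Illustration only: Cauchy quantile function and its analytic derivative
// with respect to the depth p (the quantile density function).
double cauchy_qf(double p, double mu, double sigma) {
  const double pi = std::acos(-1.0);
  return mu + sigma * std::tan(pi * (p - 0.5));
}

// d theta / d p = sigma * pi / cos^2(pi * (p - 0.5))
double cauchy_qdf(double p, double sigma) {
  const double pi = std::acos(-1.0);
  double c = std::cos(pi * (p - 0.5));
  return sigma * pi / (c * c);
}
```

At $p = \frac{1}{2}$ with unit scale the quantile density is exactly $\pi$, and a central finite difference of `cauchy_qf` agrees with `cauchy_qdf` elsewhere in the interior of $\left[0,1\right]$.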

The `normal_qf` would be almost the same, even though it relies on `inv_Phi`, which is not an explicit function of the depth:
```
template <typename T_p, typename T_loc, typename T_scale>
auto normal_qf(const T_p& p, const T_loc& mu, const T_scale& sigma) {
  static const char* function = "normal_qf";
  check_consistent_sizes(function, "Depth", p, "Location parameter", mu,
                         "Scale parameter", sigma);
  check_probability(function, "Depth", p);
  check_finite(function, "Location parameter", mu);
  check_positive_finite(function, "Scale parameter", sigma);
  using T_partials_return = partials_return_t<T_p, T_loc, T_scale>;
  std::vector<T_partials_return> quantiles(size(p));
  for (size_t n = 0; n < quantiles.size(); n++) { // if mu and sigma are scalars
    quantiles[n] = mu + sigma * inv_Phi(p[n]);
  }
  return quantiles;
}
```
In this case, if `p[n]` is _near_ $1$ or $0$, then `inv_Phi(p[n])` will overflow to $\pm \infty$, which is numerically acceptable even though analytically, it should only be $\pm \infty$ if `p[n]` is _exactly_ $1$ or $0$, respectively.

In the case of the normal distribution, it is rather easy to differentiate with respect to $\mu$ and / or $\sigma$ because they do not enter the implicit function `inv_Phi`. However, in the case of many distributions like the gamma distribution, there is no explicit quantile function and the shape hyperparameters enter the implicit function. Thus, we can evaluate the quantile function of the gamma distribution rather easily via Boost:
```
template <typename T_p, typename T_shape, typename T_inv_scale>
return_type_t<T_p, T_shape, T_inv_scale> gamma_qf(const T_p& p,
                                                  const T_shape& alpha,
                                                  const T_inv_scale& beta) {
  // argument checking
  auto dist = boost::math::gamma_distribution<>(value_of(alpha),
                                                1 / value_of(beta));
  return quantile(dist, value_of(p));
}
```
but differentiating the quantile function with respect to the shape hyperparameters would be as difficult as differentiating the CDF with respect to the shape hyperparameters.
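
Differentiating with respect to the depth, by contrast, is always easy even when the quantile function is implicit: since $F\left(Q\left(p\right)\right) = p$, the inverse function theorem gives $Q'\left(p\right) = 1 / f\left(Q\left(p\right)\right)$, the reciprocal of the PDF evaluated at the quantile. A standalone sketch checking this identity, using the exponential distribution because every piece is explicit there (so no Boost dependency is needed for the illustration):

```cpp
#include <cassert>
#include <cmath>

// Illustration of Q'(p) = 1 / f(Q(p)) on the exponential distribution,
// where the quantile function, PDF, and derivative are all closed-form.
double exponential_qf(double p, double lambda) {
  return -std::log1p(-p) / lambda;  // Q(p) = -log(1 - p) / lambda
}

double exponential_pdf(double x, double lambda) {
  return lambda * std::exp(-lambda * x);
}

// Depth derivative obtained via the inverse function theorem.
double exponential_qdf(double p, double lambda) {
  return 1.0 / exponential_pdf(exponential_qf(p, lambda), lambda);
}
```

Here `exponential_qdf(p, lambda)` matches the direct derivative $1 / \left(\lambda \left(1 - p\right)\right)$, so the same recipe would give the depth derivative of `gamma_qf` from the gamma PDF without any extra root-finding.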
So, I might start by requiring the hyperparameters to be fixed data for distributions that lack explicit quantile functions, and then allow the hyperparameters to be unknown `var`s later.

Boost also has "complementary" quantile functions that have greater numerical accuracy when the depth is close to $1$. Hopefully, we can just utilize these inside Stan Math rather than exposing them to the Stan language. It would look something like
```
template <typename T_p, typename T_shape, typename T_inv_scale>
return_type_t<T_p, T_shape, T_inv_scale> gamma_qf(const T_p& p,
                                                  const T_shape& alpha,
                                                  const T_inv_scale& beta) {
  // argument checking
  auto dist = boost::math::gamma_distribution<>(value_of(alpha),
                                                1 / value_of(beta));
  if (p > 0.999) return quantile(complement(dist, 1 - value_of(p)));
  else return quantile(dist, value_of(p));
}
```
where Boost's `complement` wrapper takes the upper-tail probability, $1 - p$.

For any probability distribution with a conceptually _fixed_ number of hyperparameters, implementing the quantile function is basically like implementing the CDF (except without the reduction to a scalar return value). However, there are useful probability distributions, such as the metalog distribution, where the quantile function can take an _arbitrary_ number of pairs of quantiles and depths whose number is not known until runtime. In that case, some of the trailing arguments to the quantile function will be vectors or real arrays, or perhaps arrays of vectors or arrays of real arrays. The simplest signature would be
```
template <typename T_p>
T_p metalog_qf(const T_p& p, const vector_d& quantiles, const vector_d& depths);
```
In order to interpolate through the $K$ points given by `quantiles` and `depths`, you have to set up and then solve a system of $K$ linear equations, which is not too difficult. However, in the usual case where both `quantiles` and `depths` are fixed data, it is computationally wasteful to solve a constant system of $K$ linear equations at each leapfrog step.
It would be more efficient for the user to solve for the $K$ coefficients once in the `transformed data` block and pass those to `metalog_qf` in the `transformed parameters` block when it is applied to a `var`. If we were willing to distinguish user-facing functions by number of arguments, we could have a variant of the metalog quantile function that takes the coefficients:
```
template <typename T_p>
T_p metalog_qf(const T_p& p, const vector_d& coefficients);
```
that were produced by a helper function
```
vector_d metalog_coef(const vector_d& quantiles, const vector_d& depths);
```
that would check the `quantiles` and `depths` for validity, solve the system of $K$ equations, and return the resulting coefficients.

This `metalog_coef` function could also check that the quantile function that interpolates through these $K$ points is, in fact, strictly increasing over the entire $\left[0,1\right]$ interval. If `quantiles` and / or `depths` were not fixed data, then it would be very expensive to check whether the quantile function that interpolates through these $K$ points is strictly increasing over the entire $\left[0,1\right]$ interval, although it is cheap to check whether it is increasing at a _particular_ depth (and we would have to evaluate the derivative anyway if the depth is a `var`). Thus, if `quantiles` and / or `depths` is a `vector_v`, I propose that we check that the quantile function is increasing at the given depth(s) rather than at all possible depths, which is presumably adequate for MCMC and optimization but could result in an invalid probability distribution if it were used in some other context.

Boost has a well-known monotonic [interpolation](https://www.boost.org/doc/libs/1_76_0/libs/math/doc/html/math_toolkit/pchip.html) function called `pchip` that could be used as a flexible quantile function. It consists of piecewise cubic polynomials that are differentiable at the given points.
However, the `pchip` constructor returns a callable, and as in the metalog case, it would be computationally wasteful to reconstruct the spline at every leapfrog step when the points that it interpolates through are fixed data. But the Stan language currently lacks something useful like
```
data {
  int K;
  ordered[K] quantiles;
  ordered[K] depths;
}
transformed data { // not valid syntax
  callable my_qf = pchip(depths, quantiles);
}
parameters {
  real p;
}
transformed parameters {
  real theta = my_qf(p);
}
```
Maybe we could add something like that `transformed data` block when we do lambda functions.

# Drawbacks
[drawbacks]: #drawbacks

Beyond the usual fact that implementing any new function requires documentation, more computer time to unit test, maintenance, etc., this will create a lot of cognitive dissonance for users, even though it is intended to make things easier for them. I think it will be a lot like when we introduced the LKJ distribution, and most users had no idea what to input for the shape parameter because they had not previously thought about a probability distribution over the space of correlation matrices. In this case, almost all users have not previously thought about using a prior _transformation_ of a cumulative probability rather than a prior _PDF_ on the parameter of interest, even though these are flip sides of the same coin and you could express the same prior beliefs either way. So, there are going to be a lot of misguided questions like "How do I interpret the posterior distribution of `p`?". Also, even if we have quantile functions, there are going to be a lot of users who only want to utilize probability distributions like the normal that they have heard of before, even though they have no idea what to put for the prior standard deviation and even though probability distributions that they have not heard of before are easier because you only have to specify prior quantiles and depths.
Also, users are going to run into errors when the quantile function they try to define is not actually increasing over the entire $\left[0,1\right]$ interval, and they will not know what to do.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

A quantile function must be an increasing function from the $\left[0,1\right]$ interval to a parameter space, $\Theta$. So, it is not as if there is some other concept of a quantile function to compete with this definition, in the same way that there is essentially only one way to define the concept of a CDF. Thus, the only flexibility in the design is in the details, such as how much time we are willing to spend checking that a metalog quantile function is increasing.

I suppose we do not have to merge the ultimate quantile function pull request(s), but I am obligated to make the branches under an NSF grant. This NSF grant also funds me and Philip to interpolate the log-likelihood as a function of `p`, as opposed to a function of $\theta$. We can potentially interpolate the log-likelihood as a function of `p` much faster than we can evaluate the log-likelihood as a function of $\theta$, but in order to do that, users would need to reparameterize their Stan programs so that `p` is declared in the `parameters` block, in which case they would need to use quantile functions in the `transformed parameters` block to express their prior beliefs about $\theta$. Thus, if we do not implement a lot of quantile functions for users to express their prior beliefs with, then they cannot capture the gains in speed from interpolating the log-likelihood as a function of `p`.

# Prior art
[prior-art]: #prior-art

Quantile functions have been implemented (but presumably not used much) in other languages for decades, such as:

* R: [qcauchy](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Cauchy.html) and similarly for other common distributions
* Boost: [cauchy_distribution](https://www.boost.org/doc/libs/1_76_0/libs/math/doc/html/math_toolkit/dist_ref/dists/cauchy_dist.html) and similarly for other common distributions
* Mathematica: The quantile function is a standard (but not always explicit) function in the [Ultimate Probability Distribution Explorer](https://blog.wolfram.com/2013/02/01/the-ultimate-univariate-probability-distribution-explorer/)

There is one [textbook](https://www.google.com/books/edition/Statistical_Modelling_with_Quantile_Func/7c1LimP_e-AC?hl=en) on quantile functions that spends a good bit of time pointing out that quantile functions are underutilized even for data (priors, or anything to do with Bayesian analysis, are not mentioned). Tom Keelin has a number of [papers](http://www.metalogdistributions.com/publications.html) and some Excel workbooks on the metalog distribution (which is defined by its quantile function) that are essentially Bayesian but do not involve MCMC. I did a [presentation](https://youtu.be/_wfZSvasLFk) at StanCon and a couple at the Flatiron Institute on these ideas.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- Discrete distributions: Quantile functions exist for them, but they are not very useful because they are step functions. For completeness, we might implement them eventually, but they are irrelevant to the NSF grant and the proposed workflow.
- Multivariate distributions: Quantile functions are harder to define for multivariate continuous distributions, and you never get any explicit functions. Thus, I doubt we will ever do anything on that front.
- Density quantile functions: These are the reciprocals of the quantile density functions (which are the first derivatives of quantile functions) and can be used for log-likelihoods. There is a Ph.D. student at Lund who is working on this using Stan for his dissertation, but we will have to see who or what should implement that for Stan Math.
- Checking that the quantile function is increasing: In many cases, the quantile function is increasing by construction. But for some distributions like the metalog, the quantile function is increasing over the entire $\left[0,1\right]$ interval only for some pairs of quantiles and depths. If those quantiles and / or depths are not fixed data, then verifying whether it is increasing for all points in $\left[0,1\right]$ is much more expensive than verifying so for a given point in $\left[0,1\right]$.
- Can we have a three-argument version `metalog_qf(p, quantiles, depths)` and a two-argument version `metalog_qf(p, coefficients)` where the coefficients are produced in the `transformed data` block by `metalog_coef(quantiles, depths)`?
- What to do about derivatives of implicit quantile functions with respect to hyperparameters? We have the implicit function stuff worked out in `algebraic_solver`, but I do not know how accurate the partial derivatives are going to be for probability distributions like the gamma.
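
On that last question, the relevant identity is worth recording: implicitly differentiating $F\left(Q\left(p, c\right), c\right) = p$ with respect to a hyperparameter $c$ gives $\partial Q / \partial c = -\left(\partial F / \partial c\right) / f\left(Q\right)$. A standalone sanity check of this identity (using the exponential distribution rather than the gamma, since every term is explicit there; plain doubles, not Stan Math code):

```cpp
#include <cassert>
#include <cmath>

// Checks dQ/d(lambda) = -(dF/d(lambda)) / f(Q) for the exponential
// distribution, where F(x; lambda) = 1 - exp(-lambda * x).  Direct
// differentiation of Q(p; lambda) = -log(1 - p) / lambda gives
// dQ/d(lambda) = log(1 - p) / lambda^2 = -Q / lambda, so the
// implicit-function route must agree with it.
double exp_qf(double p, double lambda) { return -std::log1p(-p) / lambda; }

double exp_dq_dlambda_direct(double p, double lambda) {
  return std::log1p(-p) / (lambda * lambda);
}

double exp_dq_dlambda_implicit(double p, double lambda) {
  double q = exp_qf(p, lambda);
  double dF_dlambda = q * std::exp(-lambda * q);  // dF/d(lambda) at x = Q
  double pdf = lambda * std::exp(-lambda * q);    // f(Q)
  return -dF_dlambda / pdf;                       // = -Q / lambda
}
```

For the gamma distribution the same identity holds, but $\partial F / \partial \alpha$ involves the derivative of the incomplete gamma function with respect to its shape, which is exactly the hard quantity alluded to above.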

From 55d3dc1e4bf043acf022e915fd711062d2876965 Mon Sep 17 00:00:00 2001
From: Ben Goodrich
Date: Sun, 29 Aug 2021 12:39:28 -0400
Subject: [PATCH 3/9] remove LaTeX

---
 designs/0028-quantile-functions.md | 44 +++++++++++++++---------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/designs/0028-quantile-functions.md b/designs/0028-quantile-functions.md
index 1d0baf6..3dfb38e 100644
--- a/designs/0028-quantile-functions.md
+++ b/designs/0028-quantile-functions.md
@@ -6,18 +6,18 @@
 # Summary
 [summary]: #summary

-I intend to add many quantile functions to Stan Math, at least for univariate continuous probability distributions, that would eventually be exposed in the Stan language. For a univariate continuous probability distribution, the quantile function is the inverse of the Cumulative Distribution Function (CDF), i.e. the quantile function is an increasing function from $\left[0,1\right]$ to the parameter space, $\Theta$. The quantile function can be used to _define_ a probability distribution, although for many well-known probability distributions (e.g. the standard normal) the quantile function cannot be written as an explicit function.
+I intend to add many quantile functions to Stan Math, at least for univariate continuous probability distributions, that would eventually be exposed in the Stan language. For a univariate continuous probability distribution, the quantile function is the inverse of the Cumulative Distribution Function (CDF), i.e. the quantile function is an increasing function from [0,1] to the parameter space, Θ. The quantile function can be used to _define_ a probability distribution, although for many well-known probability distributions (e.g. the standard normal) the quantile function cannot be written as an explicit function.
# Motivation [motivation]: #motivation -The primary motivation for implementing quantile functions in Stan Math is so that users can call them in the `transformed parameters` block of a Stan program to express their prior beliefs about a parameter. The argument (in the mathematical sense) of a quantile function is a cumulative probability between $0$ and $1$ that would be declared in the `parameters` block and have an implicit standard uniform prior. Thus, if $\theta$ is defined in the `transformed parameters` block by applying the quantile function to this cumulative probability, then the distribution of $\theta$ under the prior is the probability distribution defined by that quantile function. When data is conditioned on, the posterior distribution of the cumulative probability becomes non-uniform but we still obtain MCMC draws from the posterior distribution of $\theta$. +The primary motivation for implementing quantile functions in Stan Math is so that users can call them in the `transformed parameters` block of a Stan program to express their prior beliefs about a parameter. The argument (in the mathematical sense) of a quantile function is a cumulative probability between 0 and 1 that would be declared in the `parameters` block and have an implicit standard uniform prior. Thus, if 𝜃 is defined in the `transformed parameters` block by applying the quantile function to this cumulative probability, then the distribution of 𝜃 under the prior is the probability distribution defined by that quantile function. When data is conditioned on, the posterior distribution of the cumulative probability becomes non-uniform but we still obtain MCMC draws from the posterior distribution of 𝜃. 
-If we were to unnecessarily restrict ourselves to quantile functions for common probability distributions, this method of expressing prior beliefs about $\theta$ is no easier than declaring $\theta$ in the `parameters` block and using a prior Probability Density Function (PDF) to express beliefs about $\theta$ in the `model` block. However, if $\theta$ has a heavy-tailed distribution, then defining it in the `transformed parameters` block may yield more efficient MCMC because the distribution of the cumulative probability (when transformed to the unconstrained space) tends to have tails that are more reasonable. In addition, expressing prior beliefs via quantile functions is necessary if the log-likelihood function is reparameterized in terms of cumulative probabilities, which we intend to pursue for computational speed over the next three years under the same NSF grant. +If we were to unnecessarily restrict ourselves to quantile functions for common probability distributions, this method of expressing prior beliefs about 𝜃 is no easier than declaring 𝜃 in the `parameters` block and using a prior Probability Density Function (PDF) to express beliefs about 𝜃 in the `model` block. However, if 𝜃 has a heavy-tailed distribution, then defining it in the `transformed parameters` block may yield more efficient MCMC because the distribution of the cumulative probability (when transformed to the unconstrained space) tends to have tails that are more reasonable. In addition, expressing prior beliefs via quantile functions is necessary if the log-likelihood function is reparameterized in terms of cumulative probabilities, which we intend to pursue for computational speed over the next three years under the same NSF grant. 
-However, there is no need to restrict ourselves to quantile functions for common probability distributions, and it is conceptually easier for the user to express prior beliefs about $\theta$ using very flexible probability distributions that are defined by explicit quantile functions but lack explicit CDFs and PDFs. Examples include the Generalized Lambda Distribution, the metalog(istic) distribution, the Chebyshev distribution of the first kind, and increasing cubic splines. In each of these cases, the user would specify a few pairs of cumulative probabilities and values of $\theta$, and then an increasing curve would be run through these user-supplied points. In other words, the quantile function would interpolate through the prior median, the prior quartiles, etc. Thus, from the user's perspective, all of the hyperparameters are _location_ parameters --- even though they collectively characterize the shape of the prior distribution for $\theta$ --- and location parameters are easier for user's to think about than other kinds of hyperparameters. +However, there is no need to restrict ourselves to quantile functions for common probability distributions, and it is conceptually easier for the user to express prior beliefs about 𝜃 using very flexible probability distributions that are defined by explicit quantile functions but lack explicit CDFs and PDFs. Examples include the Generalized Lambda Distribution, the metalog(istic) distribution, the Chebyshev distribution of the first kind, and increasing cubic splines. In each of these cases, the user would specify a few pairs of cumulative probabilities and values of 𝜃, and then an increasing curve would be run through these user-supplied points. In other words, the quantile function would interpolate through the prior median, the prior quartiles, etc. 
Thus, from the user's perspective, all of the hyperparameters are _location_ parameters --- even though they collectively characterize the shape of the prior distribution for 𝜃 --- and location parameters are easier for user's to think about than other kinds of hyperparameters. -We can support this workflow for specifying prior beliefs about $\theta$ for any univariate continuous probability distribution, although the more flexible distributions include a lot of common probability distributions that have one or two hyperparameters as special cases. I anticipate that once Stan users are exposed to this workflow, many will prefer to use it over the traditional workflow of specifying prior beliefs about $\theta$ via a PDF. +We can support this workflow for specifying prior beliefs about 𝜃 for any univariate continuous probability distribution, although the more flexible distributions include a lot of common probability distributions that have one or two hyperparameters as special cases. I anticipate that once Stan users are exposed to this workflow, many will prefer to use it over the traditional workflow of specifying prior beliefs about 𝜃 via a PDF. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation @@ -35,7 +35,7 @@ transformed parameters { real theta = median + scale * tan(pi() * (p - 0.5)); } // quantile function of the Cauchy distribution ``` -the distribution of `p` is standard uniform (because there is no `model` block) and the distribution of $\theta$ is Cauchy because `theta` is defined by the quantile function of the Cauchy distribution, $\theta = m + s \tan\left(\pi * \left(p - \frac{1}{2}\right)\right)$. 
But it would be better for users if the quantile function for the Cauchy distribution were implemented in Stan Math and (eventually) exposed to the Stan language, so that this minimal Stan program could be equivalently rewritten as
+the distribution of `p` is standard uniform (because there is no `model` block) and the distribution of 𝜃 is Cauchy because `theta` is defined by the quantile function of the Cauchy distribution. But it would be better for users if the quantile function for the Cauchy distribution were implemented in Stan Math and (eventually) exposed to the Stan language, so that this minimal Stan program could be equivalently rewritten as
``` data { real median; real scale; } parameters { real p; } transformed parameters { real theta = cauchy_qf(p, median, scale); -} +} // theta = median + scale * tan(pi() * (p - 0.5)) ```
-However, rarely do users have prior beliefs about $\theta$ that necessitate the Cauchy distribution, i.e. $\theta$ is a ratio of centered normals. Rather, when users use the Cauchy distribution as a prior they do so because their beliefs about $\theta$ are heavy-tailed and symmetric. But there are many other probability distributions that are heavy-tailed and symmetric that are not exactly Cauchy.
+However, rarely do users have prior beliefs about 𝜃 that necessitate the Cauchy distribution, i.e. 𝜃 is a ratio of centered normals. Rather, when users use the Cauchy distribution as a prior they do so because their beliefs about 𝜃 are heavy-tailed and symmetric. But there are many other probability distributions that are heavy-tailed and symmetric that are not exactly Cauchy.
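Because the Cauchy CDF is explicit, the inverse relationship is easy to check numerically. Here is an illustrative Python sketch (not the proposed Stan Math implementation; the function names are mine) verifying that the Cauchy quantile function above undoes the Cauchy CDF:

```python
import math

def cauchy_qf(p, median, scale):
    """Quantile function of the Cauchy distribution (inverse of its CDF)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a cumulative probability in [0, 1]")
    return median + scale * math.tan(math.pi * (p - 0.5))

def cauchy_cdf(x, median, scale):
    """CDF of the Cauchy distribution."""
    return 0.5 + math.atan((x - median) / scale) / math.pi

# The quantile function and the CDF are inverses of each other:
p = 0.9
x = cauchy_qf(p, 1.0, 2.0)
assert abs(cauchy_cdf(x, 1.0, 2.0) - p) < 1e-12

# The depth 0.5 maps to the median by construction:
assert cauchy_qf(0.5, 1.0, 2.0) == 1.0
```

The same round-trip check applies to any univariate continuous distribution whose CDF is strictly increasing.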
In that situation, a user may find it preferable to use a more flexible distribution, such as the metalog(istic) distribution, whose quantile function interpolates through _K_ pairs of `quantiles` and `depths` (cumulative probabilities) that the user passes to the `data` block of their Stan program. ``` data { int K; @@ -64,11 +64,11 @@ transformed parameters { ``` The mindset of the Stan user could be: -> Before seeing any new data, I believe there is an equal chance that $\theta$ is negative as positive. There is a $\frac{1}{4}$ chance that $\theta < -1$ and a $\frac{1}{4}$ chance that $\theta > 1$. Finally, there is a $\frac{1}{10}$ chance that $\theta < -3$ and a $\frac{1}{10}$ chance that $\theta > 3$. Thus, `quantiles = [-3, -1, 0, 1, 3]'` and `depths = [0.1, 0.25, 0.5, 0.75, 0.9]'`. +> Before seeing any new data, I believe there is an equal chance that 𝜃 is negative as positive. There is a 0.25 chance that 𝜃 < -1 and a 0.25 chance that 𝜃 > 1. Finally, there is a 0.1 chance that 𝜃 < -3 and a 0.1 chance that 𝜃 > 3. Thus, `quantiles = [-3, -1, 0, 1, 3]'` and `depths = [0.1, 0.25, 0.5, 0.75, 0.9]'`. -Then, the `metalog_qf` interpolates through these $K = 5$ points that the user provides. In this case, the user's prior beliefs about $\theta$ happen to be very close to a standard Cauchy distribution, but if the prior `quantiles` and `depths` were different, then the metalog distribution would still apply. Thus, the user does not need to know about the Cauchy distribution or that you could obtain a distribution with slightly lighter tails by using the Student t distribution with a small number of degrees of freedom; they just need to specify the prior values of the quantile function at $\frac{1}{10}$, $\frac{9}{10}$ or other tail `depths`. +Then, the `metalog_qf` interpolates through these _K_ = 5 points that the user provides. 
In this case, the user's prior beliefs about 𝜃 happen to be very close to a standard Cauchy distribution, but if the prior `quantiles` and `depths` were different, then the metalog distribution would still apply. Thus, the user does not need to know about the Cauchy distribution or that you could obtain a distribution with slightly lighter tails by using the Student t distribution with a small number of degrees of freedom; they just need to specify the prior values of the quantile function at 0.1 and 0.9 or other tail `depths`.
-An exception is thrown by any quantile function if a purported cumulative probability is negative or greater than $1$, although $0$ and $1$ are valid edge cases. Similarly, for quantile functions like `metalog_qf` that input `quantiles` and `depths`, an exception is thrown if these vectors are not strictly increasing.
+An exception is thrown by any quantile function if a purported cumulative probability is negative or greater than 1, although 0 and 1 are valid edge cases. Similarly, for quantile functions like `metalog_qf` that input `quantiles` and `depths`, an exception is thrown if these vectors are not strictly increasing.
Finally, for some quantile functions like `metalog_qf`, it is difficult to tell whether it is strictly increasing over the entire [0,1] interval; thus, an exception is thrown if the derivative of the quantile function (called the quantile density function) evaluated at `p` is negative. In the usual case where `quantiles` and `depths` are declared in the `data` block, it could be possible to call some sort of validity function in the `transformed data` block that would return 1 if the quantile function is strictly increasing over the entire [0,1] interval (and 0 otherwise) by checking whether the quantile density function lacks a root on [0,1]. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -92,7 +92,7 @@ return_type_t cauchy_qf(const T_p& p, const T_loc& mu, return quantiles; } ``` -In the actual implementation, we would want to handle the partial derivatives analytically rather than using autodiff, at least in cases like `cauchy_qf` where the quantile function is explicit. In this case, if `p[n]` is $1$ or $0$, then $\tan\left(\pm \frac{\pi}{2}\right)$ is $\pm \infty$, which is correct. +In the actual implementation, we would want to handle the partial derivatives analytically rather than using autodiff, at least in cases like `cauchy_qf` where the quantile function is explicit. In this case, if `p[n]` is 1 or 0, then 𝜃 is infinite, which is the correct limiting value. The `normal_qf` would be almost the same, even though it relies on `inv_Phi` which is not an explicit function of the depth: ``` @@ -113,9 +113,9 @@ return_type_t normal_qf(const T_p& p, const T_loc& mu, return quantiles; } ``` -In this case, if `p[n]` is _near_ $1$ or $0$, then `inv_Phi(p[n])` will overflow to $\pm \infty$, which is numerically acceptable even though analytically, it should only be $\pm \infty$ if `p[n]` is _exactly_ $1$ or $0$ respectively. 
+In this case, if `p[n]` is _near_ 1 or 0, then `inv_Phi(p[n])` will overflow to infinity, which is numerically acceptable even though analytically, it should only be infinite if `p[n]` is _exactly_ 1 or 0 respectively.
-In the case of the normal distribution, it is rather easy to differentiate with respect to $\mu$ and / or $\sigma$ because they do not enter the implicit function `inv_Phi`. However, in the case of many distributions like the gamma distribution, there is no explicit quantile function but the shape hyperparameters enter the implicit function. Thus, we can evaluate the quantile function of the gamma distribution rather easily via Boost:
+In the case of the normal distribution, it is rather easy to differentiate with respect to μ and / or σ because they do not enter the implicit function `inv_Phi`. However, in the case of many distributions like the gamma distribution, there is no explicit quantile function but the shape hyperparameters enter the implicit function. Thus, we can evaluate the quantile function of the gamma distribution rather easily via Boost:
``` template return_type_t gamma_qf(const T_p& p, @@ -128,7 +128,7 @@ return_type_t gamma_qf(const T_p& p, ```
but differentiating the quantile function with respect to the shape hyperparameters would be as difficult as differentiating the CDF with respect to the shape hyperparameters. So, I might start by requiring the hyperparameters to be fixed data for distributions that lack explicit quantile functions, and then allow the hyperparameters to be unknown `var` later.
-Boost also has "complementary" quantile functions that have greater numerical accuracy when the depth is close to $1$. Hopefully, we can just utilize these inside Stan Math rather than exposing them to the Stan language. It would look something like
+Boost also has "complementary" quantile functions that have greater numerical accuracy when the depth is close to 1.
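To see why complementary versions matter, consider this illustrative Python sketch, which uses `statistics.NormalDist.inv_cdf` as a stand-in for `inv_Phi` (the `normal_qf_complement` name is hypothetical, not Boost's actual API): a depth extremely close to 1 is not even representable in double precision, so the upper-tail quantile has to be computed from the tail probability itself.

```python
from statistics import NormalDist

std_normal = NormalDist()  # inv_cdf plays the role of inv_Phi

# A depth very close to 1 cannot be represented distinctly in double
# precision: subtracting the tail mass from 1 rounds away entirely.
tiny = 1e-17
assert 1.0 - tiny == 1.0

# A complementary version works with the upper-tail probability directly
# and, by the symmetry of the normal distribution, stays accurate:
def normal_qf_complement(q, mu=0.0, sigma=1.0):
    """Quantile at depth 1 - q, where q is the upper-tail probability."""
    return mu - sigma * std_normal.inv_cdf(q)

z = normal_qf_complement(1e-10)
assert 6.3 < z < 6.4  # the 1 - 1e-10 quantile of the standard normal
```

The symmetry trick only works for symmetric distributions; for asymmetric ones like the gamma, a genuinely separate complementary implementation (as in Boost) is needed.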
Hopefully, we can just utilize these inside Stan Math rather than exposing them to the Stan language. It would look something like ``` template return_type_t gamma_qf(const T_p& p, @@ -146,7 +146,7 @@ For any probability distribution with a conceptually _fixed_ number of hyperpara template T_p metalog_qf(const T_p& p, vector_d& quantiles, vector_d& depths); ``` -In order to interpolate through the $K$ points given by `quantiles` and `depths`, you have to set up and then solve a system of $K$ linear equations, which is not too difficult. However, in the usual case where both `quantiles` and `depths` are fixed data, it is computationally wasteful to solve a constant system of $K$ linear equations at each leapfrog step. It would be more efficient for the user to solve for the $K$ coefficients once in the `transformed data` block and pass those to `metalog_qf` in the `transformed parameters` block when it is applied to a `var`. If we were willing to distinguish user-facing functions by number of arguments, we could have a variant of the metalog quantile function that takes the coefficients: +In order to interpolate through the _K_ points given by `quantiles` and `depths`, you have to set up and then solve a system of _K_ linear equations, which is not too difficult. However, in the usual case where both `quantiles` and `depths` are fixed data, it is computationally wasteful to solve a constant system of _K_ linear equations at each leapfrog step. It would be more efficient for the user to solve for the _K_ coefficients once in the `transformed data` block and pass those to `metalog_qf` in the `transformed parameters` block when it is applied to a `var`. 
If we were willing to distinguish user-facing functions by number of arguments, we could have a variant of the metalog quantile function that takes the coefficients: ``` template T_p metalog_qf(const T_p& p, vector_d& coefficients); @@ -155,9 +155,9 @@ that were produced by a helper function ``` vector_d metalog_coef(vector_d& quantiles, vector_d& depths); ``` -that checked the `quantiles` and `depths` for validity, solved the system of $K$ equations, and returned the resulting coefficients. +that checked the `quantiles` and `depths` for validity, solved the system of _K_ equations, and returned the resulting coefficients. -This `metalog_coef` function could also check that the quantile function that interpolates through these $K$ points is, in fact, strictly increasing over the entire $\left[0,1\right]$ interval. If `quantiles` and / or `depths` were not fixed data, then it would be very expensive to check whether the quantile function that interpolates through these $K$ points is strictly increasing over the entire $\left[0,1\right]$ interval, although it is cheap to check whether it is increasing at a _particular_ depth (and we would have to evaluate the derivative anyway if the depth is a `var`). Thus, if `quantiles` and / or `depths` is a `vector_v`, I propose that we check that the quantile function is increasing at the given depth(s) rather than at all possible dephs, which is presumably adequate for MCMC and optimization but could result in an invalid probability distribution if it were used in some other context. +This `metalog_coef` function could also check that the quantile function that interpolates through these _K_ points is, in fact, strictly increasing over the entire [0,1] interval. 
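The solve-once-then-reuse pattern can be sketched in Python (illustrative only: `metalog_coef` and `metalog_qf` mirror the hypothetical signatures above, the depths and quantiles are the ones from the earlier example, and the basis terms are the first five from the metalog literature):

```python
import math

def metalog_basis(p, K):
    """First K metalog basis terms evaluated at depth p (0 < p < 1)."""
    logit = math.log(p / (1.0 - p))
    terms = [1.0, logit, (p - 0.5) * logit, p - 0.5, (p - 0.5) ** 2]
    return terms[:K]

def solve(A, b):
    """Tiny Gaussian elimination with partial pivoting for a K x K system."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def metalog_coef(depths, quantiles):
    """Solve the K linear equations once, as `metalog_coef` would."""
    K = len(depths)
    A = [metalog_basis(d, K) for d in depths]
    return solve(A, quantiles)

def metalog_qf(p, coefficients):
    """Two-argument variant that reuses the precomputed coefficients."""
    K = len(coefficients)
    return sum(c * g for c, g in zip(coefficients, metalog_basis(p, K)))

depths = [0.1, 0.25, 0.5, 0.75, 0.9]
quantiles = [-3.0, -1.0, 0.0, 1.0, 3.0]
coef = metalog_coef(depths, quantiles)  # done once, e.g. in transformed data
for d, q in zip(depths, quantiles):
    assert abs(metalog_qf(d, coef) - q) < 1e-9  # interpolates the K points
```

The linear solve happens exactly once, after which each evaluation of `metalog_qf` only needs the basis terms, which is the efficiency argument for the two-argument variant.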
If `quantiles` and / or `depths` were not fixed data, then it would be very expensive to check whether the quantile function that interpolates through these _K_ points is strictly increasing over the entire [0,1] interval, although it is cheap to check whether it is increasing at a _particular_ depth (and we would have to evaluate the derivative anyway if the depth is a `var`). Thus, if `quantiles` and / or `depths` is a `vector_v`, I propose that we check that the quantile function is increasing at the given depth(s) rather than at all possible depths, which is presumably adequate for MCMC and optimization but could result in an invalid probability distribution if it were used in some other context.

Boost has a well-known monotonic [interpolation](https://www.boost.org/doc/libs/1_76_0/libs/math/doc/html/math_toolkit/pchip.html) function called `pchip` that could be used as a flexible quantile function. It consists of piecewise cubic polynomials that are differentiable at the given points. However, the `pchip` constructor returns a callable, and like in the metalog case, it would be computationally wasteful to reconstruct the spline at every leapfrog step when the points that it is interpolating through are fixed data. But the Stan language currently lacks something useful like
``` @@ -181,14 +181,14 @@ Maybe we could add something like that `transformed data` block when we do lambd
# Drawbacks
[drawbacks]: #drawbacks
-Beyond the usual fact that implementing any new function requires documentation, more computer time to unit test, maintenance, etc., this will create a lot of cognitive dissonance for users, even though it is intended to make things easier for them. I think it will be a lot like when we introduced the LKJ distribution, and most users had no idea what to input for the shape parameter because they had not previously thought about a probability distribution over the space of correlation matrices.
In this case, almost all users have not previously thought about using a prior _transformation_ of a cumulative probability rather than a prior _PDF_ on the parameter of interest, even though these are the flip sides of the same coin and you could express the same prior beliefs either way. So, there are going to be a lot of misguided questions like "How do I interpret the posterior distribution of `p`?". Also, even if we have quantile functions, there are going to be a lot of users that only want to utilize probability distributions like the normal that they have heard of before, even though they have no idea what to put for the prior standard deviation and even though probability distributions that they have not heard of before are easier because you only have to specify prior quantiles and depths. Also, users are going to run into errors when the quantile function they try to define is not actually increasing over the entire $\left[0,1\right]$ interval and they will not know what to do.
+Beyond the usual fact that implementing any new function requires documentation, more computer time to unit test, maintenance, etc., this will create a lot of cognitive dissonance for users, even though it is intended to make things easier for them. I think it will be a lot like when we introduced the LKJ distribution, and most users had no idea what to input for the shape parameter because they had not previously thought about a probability distribution over the space of correlation matrices. In this case, almost all users have not previously thought about using a prior _transformation_ of a cumulative probability rather than a prior _PDF_ on the parameter of interest, even though these are the flip sides of the same coin and you could express the same prior beliefs either way. So, there are going to be a lot of misguided questions like "How do I interpret the posterior distribution of `p`?".
Also, even if we have quantile functions, there are going to be a lot of users that only want to utilize probability distributions like the normal that they have heard of before, even though they have no idea what to put for the prior standard deviation and even though probability distributions that they have not heard of before are easier because you only have to specify prior quantiles and depths. Also, users are going to run into errors when the quantile function they try to define is not actually increasing over the entire [0,1] interval and they will not know what to do.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives
-A quantile function must be an increasing function from the $\left[0,1\right]$ interval to a parameter space, $\Theta$. So, it is not as if there is some other concept of a quantile function to compete with this definition, in the same way that there is essentially only one way to define the concept of a CDF. Thus, the only flexibility in the design is in the details, such as how much time we are willing to spend checking that a metalog quantile function is increasing.
+A quantile function must be an increasing function from the [0,1] interval to a parameter space, Θ. So, it is not as if there is some other concept of a quantile function to compete with this definition, in the same way that there is essentially only one way to define the concept of a CDF. Thus, the only flexibility in the design is in the details, such as how much time we are willing to spend checking that a metalog quantile function is increasing.
-I suppose we do not have to merge the ultimate quantile function pull request(s), but I am obligated to make the branches under an NSF grant. This NSF grant also funds me and Philip to interpolate the log-likelihood as a function of `p`, as opposed to a function of $\theta$.
We can potentially interpolate the log-likelihood as a function of `p` much faster than we can evaluate the log-likelihood as a function of $\theta$, but in order to do that, users would need to reparameterize their Stan programs so that `p` is declared in the `parameters` block, in which case they would need to use quantile functions in the `transformed parameters` block to express their prior beliefs about $\theta$. Thus, if we do not implement a lot of quantile functions for users to express their prior beliefs with, then they cannot capture the gains in speed from interpolating the log-likelihood as a function of `p`.
+I suppose we do not have to merge the ultimate quantile function pull request(s), but I am obligated to make the branches under an NSF grant. This NSF grant also funds me and Philip to interpolate the log-likelihood as a function of `p`, as opposed to a function of 𝜃. We can potentially interpolate the log-likelihood as a function of `p` much faster than we can evaluate the log-likelihood as a function of 𝜃, but in order to do that, users would need to reparameterize their Stan programs so that `p` is declared in the `parameters` block, in which case they would need to use quantile functions in the `transformed parameters` block to express their prior beliefs about 𝜃. Thus, if we do not implement a lot of quantile functions for users to express their prior beliefs with, then they cannot capture the gains in speed from interpolating the log-likelihood as a function of `p`.

# Prior art
[prior-art]: #prior-art

@@ -207,6 +207,6 @@ There is one [textbook](https://www.google.com/books/edition/Statistical_Modelli
- Discrete distributions: Quantile functions exist for them, but they are not very useful because they are step functions. For completeness, we might implement them eventually but they are irrelevant to the NSF grant and the proposed workflow.
- Multivariate distributions: Quantile functions are harder to define for multivariate continuous distributions, and you never get any explicit functions. Thus, I doubt we will ever do anything on that front.
- Density quantile functions: These are the reciprocals of the quantile density functions (which are the first derivative of a quantile function), which can be used for log-likelihoods. There is a Ph.D. student at Lund who is working on this using Stan for his dissertation, but we will have to see who or what should implement that for Stan Math.
-- Checking that the quantile function is increasing: In many cases, the quantile function is increasing by construction. But for some distributions like the metalog, the quantile function is increasing over the entire $\left[0,1\right]$ interval only for some pairs of quantiles and depths. If those quantiles and / or depths are not fixed data, then verifying whether it is increasing for all points in $\left[0,1\right]$ is much more expensive than doing so for a given point in $\left[0,1\right]$.
+- Checking that the quantile function is increasing: In many cases, the quantile function is increasing by construction. But for some distributions like the metalog, the quantile function is increasing over the entire [0,1] interval only for some pairs of quantiles and depths. If those quantiles and / or depths are not fixed data, then verifying whether it is increasing for all points in [0,1] is much more expensive than doing so for a given point in [0,1].
- Can we have a three-argument version `metalog_qf(p, quantiles, depths)` and a two-argument version `metalog_qf(p, coefficients)` where the coefficients are produced in the `transformed data` block by `metalog_coef(quantiles, depths)`?
- What to do about derivatives of implicit quantile functions with respect to hyperparameters?
We have the implicit function stuff worked out in `algebraic_solver`, but I do not know how accurate the partial derivatives are going to be for probability distributions like the gamma. From 7d0161f2007ff41b7972d92b6916070fa4684c13 Mon Sep 17 00:00:00 2001 From: Ben Goodrich Date: Mon, 30 Aug 2021 19:49:39 -0400 Subject: [PATCH 4/9] almost ready for PR --- designs/0028-quantile-functions.md | 287 +++++++++++++++++++---------- 1 file changed, 188 insertions(+), 99 deletions(-) diff --git a/designs/0028-quantile-functions.md b/designs/0028-quantile-functions.md index 3dfb38e..eedb2d7 100644 --- a/designs/0028-quantile-functions.md +++ b/designs/0028-quantile-functions.md @@ -6,171 +6,187 @@ # Summary [summary]: #summary -I intend to add many quantile functions to Stan Math, at least for univariate continuous probability distributions, that would eventually be exposed in the Stan language. For a univariate continuous probability distribution, the quantile function is the inverse of the Cumulative Density Function (CDF), i.e. the quantile function is an increasing function from [0,1] to the parameter space, Θ. The quantile function can be used to _define_ a probability distribution, although for many well-known probability distributions (e.g. the standard normal) the quantile function cannot be written as an explicit function. +I intend to add many quantile functions to Stan Math for univariate continuous probability distributions that would eventually be exposed in the Stan language. For a univariate continuous probability distribution, the quantile function is the inverse of the Cumulative Density Function (CDF), i.e. the quantile function is an increasing function from a cumulative probability on [0,1] to the parameter space, Θ. The quantile function can be used to _define_ a probability distribution, although for many well-known probability distributions (e.g. the standard normal), the quantile function cannot be written as an explicit function. 
The partial derivative of a quantile function with respect to the cumulative probability is called the quantile density function, which is also needed in Stan Math but probably is not worth exposing in the Stan language.

# Motivation
[motivation]: #motivation

-The primary motivation for implementing quantile functions in Stan Math is so that users can call them in the `transformed parameters` block of a Stan program to express their prior beliefs about a parameter. The argument (in the mathematical sense) of a quantile function is a cumulative probability between 0 and 1 that would be declared in the `parameters` block and have an implicit standard uniform prior. Thus, if 𝜃 is defined in the `transformed parameters` block by applying the quantile function to this cumulative probability, then the distribution of 𝜃 under the prior is the probability distribution defined by that quantile function. When data is conditioned on, the posterior distribution of the cumulative probability becomes non-uniform but we still obtain MCMC draws from the posterior distribution of 𝜃.
+The primary motivation for implementing quantile functions in Stan Math is so that users can call them in the `transformed parameters` block of a Stan program to express their prior beliefs about a parameter. The argument (in the mathematical sense) of a quantile function is a cumulative probability between 0 and 1 that would be declared in the `parameters` block and have an implicit standard uniform prior. Thus, if 𝜃 is defined in the `transformed parameters` block by applying the quantile function to this cumulative probability, then the distribution of 𝜃 under the prior is the probability distribution defined by that quantile function. When data is conditioned on, the posterior distribution of the cumulative probability becomes non-uniform but we still obtain MCMC draws from the posterior distribution of 𝜃 that differs from the prior distribution.
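This construction can be checked by simulation (an illustrative Python sketch, not Stan Math code): pushing standard uniform draws through a quantile function yields draws from the distribution that the quantile function defines, here the standard Cauchy.

```python
import math
import random

def cauchy_qf(p, median, scale):
    """Quantile function of the Cauchy distribution."""
    return median + scale * math.tan(math.pi * (p - 0.5))

random.seed(12345)
# p ~ uniform(0, 1) plays the role of the parameter with the implicit
# standard uniform prior; theta = Q(p) then has the distribution that
# the quantile function Q defines.
theta = sorted(cauchy_qf(random.random(), 0.0, 1.0) for _ in range(100_000))

# The empirical quartiles should sit near the quartiles of a standard
# Cauchy distribution, which are -1, 0, and 1:
assert abs(theta[25_000] + 1.0) < 0.05
assert abs(theta[50_000]) < 0.05
assert abs(theta[75_000] - 1.0) < 0.05
```

This is exactly the inversion-sampling idea; conditioning on data then makes the distribution of `p` non-uniform, but the mapping from `p` to 𝜃 is unchanged.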
-If we were to unnecessarily restrict ourselves to quantile functions for common probability distributions, this method of expressing prior beliefs about 𝜃 is no easier than declaring 𝜃 in the `parameters` block and using a prior Probability Density Function (PDF) to express beliefs about 𝜃 in the `model` block. However, if 𝜃 has a heavy-tailed distribution, then defining it in the `transformed parameters` block may yield more efficient MCMC because the distribution of the cumulative probability (when transformed to the unconstrained space) tends to have tails that are more reasonable. In addition, expressing prior beliefs via quantile functions is necessary if the log-likelihood function is reparameterized in terms of cumulative probabilities, which we intend to pursue for computational speed over the next three years under the same NSF grant.

+As a side note, Stan Math is already using this construction internally in many of its `_rng` functions, where
+drawing from a standard uniform and applying a quantile function to it is called "inversion sampling". So,
+this design doc is mostly just a plan to make quantile functions available to be called directly. However, it does make the Stan program look more similar to one that draws from the prior predictive distribution in the `transformed data` block for Simulation Based Calibration.

-However, there is no need to restrict ourselves to quantile functions for common probability distributions, and it is conceptually easier for the user to express prior beliefs about 𝜃 using very flexible probability distributions that are defined by explicit quantile functions but lack explicit CDFs and PDFs. Examples include the Generalized Lambda Distribution, the metalog(istic) distribution, the Chebyshev distribution of the first kind, and increasing cubic splines.
In each of these cases, the user would specify a few pairs of cumulative probabilities and values of 𝜃, and then an increasing curve would be run through these user-supplied points. In other words, the quantile function would interpolate through the prior median, the prior quartiles, etc. Thus, from the user's perspective, all of the hyperparameters are _location_ parameters --- even though they collectively characterize the shape of the prior distribution for 𝜃 --- and location parameters are easier for user's to think about than other kinds of hyperparameters. +If we were to unnecessarily restrict ourselves to quantile functions for common probability distributions, this method of expressing prior beliefs about 𝜃 is no easier than declaring 𝜃 in the `parameters` block and using a prior Probability Density Function (PDF) to express beliefs about 𝜃 in the `model` block. However, if 𝜃 has a heavy-tailed distribution, then defining it in the `transformed parameters` block may yield more efficient MCMC because the distribution of the cumulative probability (when transformed to the unconstrained space) tends to have tails that are more reasonable. In addition, expressing prior beliefs via quantile functions is necessary if the log-likelihood function is reparameterized in terms of cumulative probabilities, which we intend to pursue for computational speed over the next three years under the same NSF grant but that will require a separate design doc. -We can support this workflow for specifying prior beliefs about 𝜃 for any univariate continuous probability distribution, although the more flexible distributions include a lot of common probability distributions that have one or two hyperparameters as special cases. I anticipate that once Stan users are exposed to this workflow, many will prefer to use it over the traditional workflow of specifying prior beliefs about 𝜃 via a PDF. 
+However, there is no need to restrict ourselves to quantile functions for common probability distributions, and it is conceptually easier for the user to express prior beliefs about 𝜃 using very flexible probability distributions that are defined by explicit quantile functions but lack explicit CDFs and PDFs. Examples include the Generalized Lambda Distribution, the metalog(istic) distribution, the no name distribution of the first kind, and distributions whose quantile function is a spline. In each of these cases, the user would specify a few pairs of cumulative probabilities and values of 𝜃, and then an increasing curve would be run through these user-supplied points. In other words, the quantile function would interpolate through the prior median, the prior quartiles, etc. Thus, from the user's perspective, all of the hyperparameters are _location_ parameters --- even though they collectively characterize the shape of the prior distribution for 𝜃 --- and location parameters are easier for users to think about than other kinds of hyperparameters (especially expectations). + +We can support this workflow for specifying prior beliefs about 𝜃 for any univariate continuous probability distribution, although the more flexible distributions include a lot of common probability distributions that have one or two hyperparameters as special cases. I anticipate that once Stan users become familiar with this workflow, many will prefer to use it over the traditional workflow of specifying prior beliefs about 𝜃 via a PDF. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -For univariate continuous probability distributions, the quantile function is the inverse of the CDF, or equivalently, the CDF is the inverse of the quantile function. 
For example, in the following minimal Stan program +In the following minimal Stan program ``` data { real median; real scale; } parameters { - real p; + real p; // implicitly standard uniform } transformed parameters { real theta = median + scale * tan(pi() * (p - 0.5)); -} // quantile function of the Cauchy distribution +} // ^^^ quantile function of the Cauchy distribution ``` -the distribution of `p` is standard uniform (because there is no `model` block) and the distribution of 𝜃 is Cauchy because `theta` is defined by the quantile function of the Cauchy distribution. But it would be better for users if the quantile function for the Cauchy distribution were implemented in Stan Math and (eventually) exposed to the Stan language, so that this minimal Stan program could be equivalently rewritten as +the distribution of `p` is standard uniform (because there is no `model` block) and the distribution of 𝜃 is Cauchy because `theta` is defined by the quantile function of the Cauchy distribution. But it would be better for users if the quantile function for the Cauchy distribution were implemented in Stan Math and eventually exposed to the Stan language, so that the even more minimal Stan program could be equivalently rewritten as ``` data { real median; real scale; } parameters { - real p; + real p; // implicitly standard uniform } transformed parameters { real theta = cauchy_qf(p, median, scale); } // theta = median + scale * tan(pi() * (p - 0.5)) ``` -However, rarely do user's have prior beliefs about 𝜃 that necessitate the Cauchy distribution, i.e. 𝜃 is a ratio of centered normals. Rather, when users use the Cauchy distribution as a prior they do so because their beliefs about 𝜃 are heavy-tailed and symmetric. But there are many other probability distributions that are heavy tailed and symmetric that are not exactly Cauchy. 
In that situation, a user may find it preferable to use a more flexible distribution, such as the metalog(istic) distribution, whose quantile function interpolates through _K_ pairs of `quantiles` and `depths` (cumulative probabilities) that the user passes to the `data` block of their Stan program. +However, rarely do users have prior beliefs about 𝜃 that necessitate the Cauchy distribution, i.e. 𝜃 is a ratio of standard normal variates. Rather, when users use the Cauchy distribution as a prior they do so because their prior beliefs about 𝜃 are vaguely heavy-tailed and symmetric. But there are many other probability distributions that are heavy tailed and symmetric that are not exactly Cauchy. + +In that situation, a user may find it preferable to use a more flexible distribution, such as the metalog(istic) distribution, whose quantile function interpolates through _K_ pairs of `depths` (cumulative probabilities) and `quantiles` that the user passes to the `data` block of their Stan program. If _K_ = 3, then the metalog quantile function is `theta = a + b * logit(p) + c * (p - 0.5) * logit(p)`, where `a`, `b`, and `c` are three coefficients whose values are implied by the _K_ = 3 `depths` and `quantiles`. This is a valid quantile function --- i.e. it is strictly increasing over the entire [0,1] interval --- if and only if both b > 0 and |c| / b < 1.66711. Having `depths` and `quantiles` in increasing order is necessary but not sufficient for the quantile function to be valid. 
But _K_ need not be 3 and its value is specified at runtime in the following Stan program: ``` data { int K; - ordered[K] quantiles; ordered[K] depths; + ordered[K] quantiles; } +transformed data { + vector[K] coefficients = metalog_coef(depths, quantiles); +} // maximal validity checking ^^^ parameters { - real p; + real p; // implicitly standard uniform } transformed parameters { - real theta = metalog_qf(p, quantiles, depths); -} + real theta = metalog_qf(p, coefficients); +} // minimal ^^^ validity checking ``` The mindset of the Stan user could be: -> Before seeing any new data, I believe there is an equal chance that 𝜃 is negative as positive. There is a 0.25 chance that 𝜃 < -1 and a 0.25 chance that 𝜃 > 1. Finally, there is a 0.1 chance that 𝜃 < -3 and a 0.1 chance that 𝜃 > 3. Thus, `quantiles = [-3, -1, 0, 1, 3]'` and `depths = [0.1, 0.25, 0.5, 0.75, 0.9]'`. +> Before seeing any new data, I believe there is an equal chance that 𝜃 is negative as it is positive. There is a 0.25 chance that 𝜃 < -1 and a 0.25 chance that 𝜃 > 1. Finally, there is a 0.1 chance that 𝜃 < -3 and a 0.1 chance that 𝜃 > 3. So, _K_ = 5, `depths = [0.1, 0.25, 0.5, 0.75, 0.9]'`, and `quantiles = [-3, -1, 0, 1, 3]'`. + +The `metalog_qf` interpolates through these _K_ points that the user provides. In this example, the user's prior beliefs about 𝜃 happen to be very close to a standard Cauchy distribution, but if the prior `depths` and `quantiles` were different, then the metalog distribution would still apply. Thus, the user does not need to know about the Cauchy distribution or that you could obtain a distribution with slightly lighter tails by using the Student t distribution with a small degrees of freedom; they just need to specify the prior values of the quantile function at 0.1, 0.9 or other tail `depths`. 
In other words, this workflow would drastically deemphasize the _name_ of the probability distribution and focus on its main _properties_ like the median, inter-quartile range, and left/right tail heaviness.
+
+The `metalog_coef` function that is called in the `transformed data` block above would output the `coefficients` that are needed to define the metalog quantile function as a weighted sum of basis functions, i.e. a, b, and c if _K_ = 3. These `coefficients` are not particularly interpretable but are implied by the readily interpretable `depths` and `quantiles`; in other words, they are the `coefficients` that make `metalog_qf` interpolate through those _K_ `depths` and `quantiles` and are thus the solution to a linear system of _K_ equations. When _K_ = 3, the equations to be solved using linear algebra are
+
+|   |   |              |   |                            |   |   |   |       |
+| - | - | ------------ | - | -------------------------- | - | - | - | ----- |
+| 1 |   | `logit(d_1)` |   | `(d_1 - 0.5) * logit(d_1)` |   | a |   | `q_1` |
+| 1 |   | `logit(d_2)` |   | `(d_2 - 0.5) * logit(d_2)` |   | b | = | `q_2` |
+| 1 |   | `logit(d_3)` |   | `(d_3 - 0.5) * logit(d_3)` |   | c |   | `q_3` |
+
-Then, the `metalog_qf` interpolates through these _K_ = 5 points that the user provides. In this case, the user's prior beliefs about 𝜃 happen to be very close to a standard Cauchy distribution, but if the prior `quantiles` and `depths` were different, then the metalog distribution would still apply. Thus, the user does not need to know about the Cauchy distribution or that you could obtain a distribution with slightly lighter tails by using the Student t distribution with a small number of degrees of freedom; they just need to specify the prior values of the quantile function at 0.1, 0.9 or other tail `depths`.
+The `metalog_coef` function would check that both `depths` and `quantiles` are strictly increasing and that the quantile density function is strictly positive throughout the [0,1] interval, which is fairly expensive if _K_ > 3 but only needs to be checked once if both `depths` and `quantiles` are constants. Then, the `metalog_qf` function that is called in the `transformed parameters` block only needs to check that `p` is between 0 and 1.

-An exception is thrown by any quantile function if a purported cumulative probability is negative or greater than 1, although 0 and 1 are valid edge cases. Similarly, for quantile functions like `metalog_qf` that input `quantiles` and `depths` an exception is thrown if these vectors are not strictly increasing. Finally, for some quantile functions like `metalog_qf`, it is difficult to tell whether it is strictly increasing over the entire [0,1] interval; thus, an exception is thrown if the derivative of the quantile function (called the quantile density function) evaluated at `p` is negative. In the usual case where `quantiles` and `depths` are declared in the `data` block, it could be possible to call some sort of validity function in the `transformed data` block that would return 1 if the quantile function is strictly increasing over the entire [0,1] interval (and 0 otherwise) by checking whether the quantile density function lacks a root on [0,1].
+An exception is thrown by any quantile function if a purported cumulative probability is negative or greater than 1, although 0 and 1 are valid edge cases. This behavior is unlike that of a CDF, where it is arguably coherent to ask, for example, what the probability is that a Beta random variate is less than or equal to 42. Helper functions like `metalog_coef` throw an exception if the implied quantile density function is negative at any point in the [0,1] interval, by numerically searching this interval for a root (if there is no better way).
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

-Unlike the `_cdf` and `_lpdf` functions in Stan Math that return scalars even if their arguments are not scalars, the quantile functions should return a scalar only if all of the arguments are scalars and otherwise should return something whose size is equivalent to the size of the argument with the largest size. In C++, `cauchy_qf` could be implemented in `prim/prob` roughly like
+From a developer's perspective, there are four categories of quantile functions:
+
+1. Explicit functions with a predetermined number of hyperparameters. Examples include the Cauchy distribution, the exponential distribution, and maybe ten other distributions. All of the arguments could easily be templated. For most such distributions, we already have the CDFs and PDFs, but an exception is the Generalized Lambda Distribution, which does not have an explicit CDF or PDF.
+2. Explicit functions with a number of hyperparameters that is determined at runtime. Examples include the metalog distribution, the no name distribution of the first kind, and a few other "quantile parameterized distributions". All of the arguments could be templated, but the hyperparameters are vectors of size _K_. None of these distributions currently exist in Stan Math because they do not have explicit CDFs or PDFs.
+3. Implicit functions where all the `var` hyperparameters are outside the kernel. Examples include the normal distribution and the Student t distribution with fixed degrees of freedom. In this case, the partial derivatives are fairly easy to compute.
+4. Implicit functions where some of the `var` hyperparameters are inside the kernel. Examples include the gamma distribution, the beta distribution, and many others that are not in the location-scale family. In this case, the partial derivative with respect to a hyperparameter is not an explicit function.
+
+Unlike the `_cdf` and `_lpdf` functions in Stan Math that return scalars even if their arguments are not scalars, the quantile (density) functions should return something whose size is the same as that of the first argument.
+
+## Category (1)
+
+In C++, `cauchy_qf` could be implemented in `prim/prob` basically like
```
template <typename T_p, typename T_loc, typename T_scale>
return_type_t<T_p, T_loc, T_scale> cauchy_qf(const T_p& p, const T_loc& mu,
                                             const T_scale& sigma) {
-  static const char* function = "cauchy_qf";
-  check_consistent_sizes(function, "Depth", p, "Location parameter",
-                         mu, "Scale Parameter", sigma);
-  check_probability(function, "Depth", p);
-  check_finite(function, "Location parameter", mu);
-  check_positive_finite(function, "Scale parameter", sigma);
-  using T_partials_return = partials_return_t<T_p, T_loc, T_scale>;
-  T_partials_return quantiles;
-  for (int n = 0; n < size(quantiles); n++) { // if mu and sigma are scalars
-    quantiles[n] = mu + sigma * tan(stan::math::pi() * (p[n] - 0.5));
-  }
-  return quantiles;
+  // argument checking
+  return mu + sigma * tan(pi() * (p - 0.5));
}
```
-In the actual implementation, we would want to handle the partial derivatives analytically rather than using autodiff, at least in cases like `cauchy_qf` where the quantile function is explicit. In this case, if `p[n]` is 1 or 0, then 𝜃 is infinite, which is the correct limiting value.
+In the actual implementation, we would want to handle the partial derivatives analytically rather than using autodiff, at least for categories (1) and (2). In this case, if `p` is 1 or 0, then the corresponding returned value is positive infinity or negative infinity respectively, which are the correct limiting values.
-The `normal_qf` would be almost the same, even though it relies on `inv_Phi` which is not an explicit function of the depth:
+The partial derivative of the quantile function with respect to the first argument is called the quantile density function and could be implemented in the Cauchy case like
```
-template <typename T_p, typename T_loc, typename T_scale>
-return_type_t<T_p, T_loc, T_scale> normal_qf(const T_p& p, const T_loc& mu,
-                                             const T_scale& sigma) {
-  static const char* function = "normal_qf";
-  check_consistent_sizes(function, "Depth", p, "Location parameter",
-                         mu, "Scale Parameter", sigma);
-  check_probability(function, "Depth", p);
-  check_finite(function, "Location parameter", mu);
-  check_positive_finite(function, "Scale parameter", sigma);
-  using T_partials_return = partials_return_t<T_p, T_loc, T_scale>;
-  T_partials_return quantiles;
-  for (int n = 0; n < size(quantiles); n++) { // if mu and sigma are scalars
-    quantiles[n] = mu + sigma * inv_Phi(p[n]);
-  }
-  return quantiles;
+template <typename T_p, typename T_scale>
+return_type_t<T_p, T_scale> cauchy_qdf(const T_p& p,
+                                       const T_scale& sigma) {
+  // argument checking
+  return pi() * sigma / square(sin(pi() * p));
}
```
-In this case, if `p[n]` is _near_ 1 or 0, then `inv_Phi(p[n])` will overflow to infinity, which is numerically acceptable even though analytically, it should only be infinite if `p[n]` is _exactly_ 1 or 0 respectively.
+Here there is a minor issue of whether the signature of the `cauchy_qdf` should also take `mu` as an argument, even though the quantile density function does not depend on `mu`.
+
+## Category (2)

-In the case of the normal distribution, it is rather easy to differentiate with respect to μ and / or σ because they do not enter the implicit function `inv_Phi`. However, in the case of many distributions like the gamma distribution, there is no explicit quantile function but the shape hyperparameters enter the implicit function.
Thus, we can evaluate the quantile function of the gamma distribution rather easily via Boost:
+For distributions like the metalog, there would be an exposed helper function with a `_coef` postfix that inputs vectors (or real arrays) of `depths` and `quantiles` and outputs the implied vector of `coefficients` after thoroughly checking for validity (and throwing an exception otherwise)
```
-template <typename T_p, typename T_shape, typename T_inv_scale>
-return_type_t<T_p, T_shape, T_inv_scale> gamma_qf(const T_p& p,
-                                                  const T_shape& alpha,
-                                                  const T_inv_scale& beta) {
-  // argument checking
-  auto dist = boost::gamma_distribution<>(value_of(alpha), 1 / value_of(beta));
-  return quantile(dist, value_of(p));
+template <typename T_d, typename T_q>
+return_type_t<T_d, T_q> metalog_coef(const T_d& depths,
+                                     const T_q& quantiles) {
+  // check that both depths and quantiles are ordered
+  // check that all elements of depths are between 0 and 1 inclusive
+  // solve linear system for coefficients derived by Tom Keelin
+  // check that metalog_qdf with those coefficients has no root on [0,1]
+  return coefficients;
}
```
-but differentiating the quantile function with respect to the shape hyperparameters would be as difficult as differentiating the CDF with respect to the shape hyperparameters. So, I might start by requiring the hyperparameters to be fixed data for distributions that lack explicit quantile functions, and then allow the hyperparameters to be unknown `var` later.
-
-Boost also has "complimentary" quantile functions that have greater numerical accuracy when the depth is close to 1. Hopefully, we can just utilize these inside Stan Math rather than exposing them to the Stan language.
It would look something like
+Then, the quantile and quantile density functions can be defined with `coefficients` as the hyperparameter vector:
```
-template <typename T_p, typename T_shape, typename T_inv_scale>
-return_type_t<T_p, T_shape, T_inv_scale> gamma_qf(const T_p& p,
-                                                  const T_shape& alpha,
-                                                  const T_inv_scale& beta) {
-  // argument checking
-  auto dist = boost::gamma_distribution<>(value_of(alpha), 1 / value_of(beta));
-  if (p > 0.999) return quantile(complement(dist, value_of(p)));
-  else return quantile(dist, value_of(p));
+template <typename T_p, typename T_c>
+return_type_t<T_p, T_c> metalog_qf(const T_p& p,
+                                   const T_c& coefficients) {
+  // check that p is between 0 and 1 inclusive
+  // calculate quantiles via dot_product
+  return quantiles;
}
-```
-For any probability distribution with a conceptually _fixed_ number of hyperparameters, implementing the quantile function is basically like implementing the CDF (except without the reduction to a scalar return value). However, there are useful probability distributions, such as the metalog distribution, where the quantile function can take an _arbitrary_ number of pairs of quantiles and depths whose number is not known until runtime. In that case, some of the trailing arguments to the quantile function will be vectors or real arrays, or perhaps arrays of vectors or arrays of real arrays. The simplest signature would be
-```
-template <typename T_p>
-T_p metalog_qf(const T_p& p, vector_d& quantiles, vector_d& depths);
-```
-In order to interpolate through the _K_ points given by `quantiles` and `depths`, you have to set up and then solve a system of _K_ linear equations, which is not too difficult. However, in the usual case where both `quantiles` and `depths` are fixed data, it is computationally wasteful to solve a constant system of _K_ linear equations at each leapfrog step. It would be more efficient for the user to solve for the _K_ coefficients once in the `transformed data` block and pass those to `metalog_qf` in the `transformed parameters` block when it is applied to a `var`.
If we were willing to distinguish user-facing functions by number of arguments, we could have a variant of the metalog quantile function that takes the coefficients:
-```
-template <typename T_p>
-T_p metalog_qf(const T_p& p, vector_d& coefficients);
-```
-that were produced by a helper function
-```
-vector_d metalog_coef(vector_d& quantiles, vector_d& depths);
+template <typename T_p, typename T_c>
+return_type_t<T_p, T_c> metalog_qdf(const T_p& p,
+                                    const T_c& coefficients) {
+  // check that p is between 0 and 1 inclusive
+  // calculate slopes via dot_product
+  return slopes;
+}
```
-that checked the `quantiles` and `depths` for validity, solved the system of _K_ equations, and returned the resulting coefficients.
-
-This `metalog_coef` function could also check that the quantile function that interpolates through these _K_ points is, in fact, strictly increasing over the entire [0,1] interval. If `quantiles` and / or `depths` were not fixed data, then it would be very expensive to check whether the quantile function that interpolates through these _K_ points is strictly increasing over the entire [0,1] interval, although it is cheap to check whether it is increasing at a _particular_ depth (and we would have to evaluate the derivative anyway if the depth is a `var`). Thus, if `quantiles` and / or `depths` is a `vector_v`, I propose that we check that the quantile function is increasing at the given depth(s) rather than at all possible dephs, which is presumably adequate for MCMC and optimization but could result in an invalid probability distribution if it were used in some other context.
+If the `_qdf` can be written as a polynomial in `p`, then it is possible for the `_coef` function to quickly check whether the `_qdf` has any root on the [0,1] interval using Budan's [theorem](https://en.wikipedia.org/wiki/Budan%27s_theorem).
If the `_qdf` is not a polynomial, then the `_coef` function would have to call one of the root-finding [functions](https://www.boost.org/doc/libs/1_76_0/libs/math/doc/html/root_finding.html) in Boost to verify that the `_qdf` has no root on the [0,1] interval. -Boost has a well-known monotonic [interpolation](https://www.boost.org/doc/libs/1_76_0/libs/math/doc/html/math_toolkit/pchip.html) function called `pchip` that could be used as a flexible quantile function. It consists of piecewise cubic polynomials that are differentiable at the given points. However, the `pchip` constructor returns a callable, and like in the metalog case, it would be computationally wasteful to reconstruct the spline at every leapfrog step when the points that it is interpolating through are fixed data. But the Stan language currently lacks something useful like +Boost has a well-known monotonic [interpolation](https://www.boost.org/doc/libs/1_76_0/libs/math/doc/html/math_toolkit/pchip.html) function called `pchip` that could be used as a flexible quantile function. It consists of piecewise cubic polynomials that are differentiable at the given points. However, the `pchip` constructor returns a callable, and it would be computationally wasteful to reconstruct the spline at every leapfrog step when the points that it is interpolating through are fixed data. But the Stan language currently lacks something useful like ``` data { int K; - ordered[K] quantiles; ordered[K] depths; + ordered[K] quantiles; } transformed data { // not valid syntax callable my_qf = pchip(depths, quantiles); } parameters { - real p; + real p; // implicitly standard uniform } transformed parameters { real theta = my_qf(p); @@ -178,35 +194,108 @@ transformed parameters { ``` Maybe we could add something like that `transformed data` block when we do lambda functions. 
+## Category (3)
+
+The `normal_qf` would be almost the same as `cauchy_qf`, even though it relies on `inv_Phi` which is not an explicit function of the cumulative probability:
+```
+template <typename T_p, typename T_loc, typename T_scale>
+return_type_t<T_p, T_loc, T_scale> normal_qf(const T_p& p, const T_loc& mu,
+                                             const T_scale& sigma) {
+  // argument checking
+  return mu + sigma * inv_Phi(p);
+}
+```
+In this case, if `p` is too _near_ 1 or 0, then `inv_Phi(p)` will overflow to positive or negative infinity respectively, which is numerically acceptable even though analytically, it should only be infinite if `p` is _exactly_ 1 or 0 respectively. In the case of the normal distribution, or another distribution in category (3), it is rather easy to differentiate with respect to the hyperparameters because they do not enter the kernel of the implicit function, in this case `inv_Phi`.
+
+The `inv_Phi` function currently in Stan Math [differentiates](https://github.com/stan-dev/math/blob/develop/stan/math/rev/fun/inv_Phi.hpp#L23) with respect to its argument by using the fact that the reciprocal of the quantile density function (called the density quantile function) is the same expression as the PDF, just in terms of `p`. For the general normal distribution, the quantile density function does not depend on `mu`, so it could be implemented like
+```
+template <typename T_p, typename T_scale>
+return_type_t<T_p, T_scale> normal_qdf(const T_p& p,
+                                       const T_scale& sigma) {
+  // argument checking
+  return exp(-normal_lpdf(sigma * inv_Phi(p), 0, sigma));
+}
+```
+
+## Category (4)
+
+In the case of many distributions like the gamma distribution, there is no explicit quantile function but the hyperparameters enter the kernel. We can evaluate the quantile function of the gamma distribution rather easily via Boost, which also has "complementary" quantile functions that have greater numerical accuracy when the depth is close to 1.
It would look something like
+```
+template <typename T_p, typename T_shape, typename T_inv_scale>
+return_type_t<T_p, T_shape, T_inv_scale> gamma_qf(const T_p& p,
+                                                  const T_shape& alpha,
+                                                  const T_inv_scale& beta) {
+  // argument checking
+  auto dist = boost::math::gamma_distribution<>(value_of(alpha), 1 / value_of(beta));
+  if (p > 0.5) return quantile(complement(dist, 1 - value_of(p)));
+  else return quantile(dist, value_of(p));
+}
+
+template <typename T_p, typename T_shape, typename T_inv_scale>
+return_type_t<T_p, T_shape, T_inv_scale> gamma_qdf(const T_p& p,
+                                                   const T_shape& alpha,
+                                                   const T_inv_scale& beta) {
+  // argument checking
+  return exp(-gamma_lpdf(gamma_qf(p, alpha, beta), alpha, beta));
+}
+```
+However, the derivative of the quantile function with respect to the shape and inverse scale hyperparameters is not an explicit function. We do have the machinery to differentiate an implicit function in `algebraic_solver` that could be essentially reused in this case. Autodiff could also work if Stan Math were ever to make `var` compatible with Boost's real number [concept](https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/real_concepts.html).
+
# Drawbacks
[drawbacks]: #drawbacks

-Beyond the usual fact that implementing any new function requires documentation, more computer time to unit test, maintainence, etc., this will create a lot of cognitive dissonance for users, even though it is intended to make things easier for them. I think it will be a lot like when we introduced the LKJ distribution, and most users had no idea what to input for the shape parameter because they had not previously thought about a probability distribution over the space of correlation matrices. In this case, almost all users have not previously thought about using a prior _transformation_ of a cumulative probability rather than a prior _PDF_ on the parameter of interest, even though these are the flip sides of the same coin and you could express the same prior beliefs either way. So, there are going to be a lot of misguided questions like "How do I interpret the posterior distribution of `p`?".
Also, even if we have quantile functions, there are going to be a lot of users that only want to utilize probability distributions like the normal that they have heard of before, even though they have no idea what to put for the prior standard deviation and even though probability distributions that they have not heard of before are easier because you only have to specify prior quantiles and depths. Also, users are going to run into errors when the quantile function they try to define is not actually increasing over the entire [0,1] interval and they will not know what to do.
+Beyond the usual fact that implementing any new function requires documentation, more computer time to unit test, maintenance, etc., this will create a lot of cognitive dissonance for users, even though it is intended to make things easier for them. I think it will be a lot like when we introduced the LKJ distribution, and most users had no idea what to input for the shape parameter because they had not previously thought about a probability distribution over the space of correlation matrices. In this case, almost all users have not previously thought about using a prior _transformation_ of a cumulative probability rather than a prior _PDF_ on the parameter of interest, even though these are the flip sides of the same coin and you could express the same prior beliefs either way, although that is more difficult in practice if either is not an explicit function.
+
+Some would say that CDFs and PDFs are the only appropriate way to think about continuous probability distributions because the measure theory that lets you go from discrete to continuous random variables unifies the PMF with the PDF. I think this is, at best, irrelevant to most Stan users because they have not taken any measure theory classes and, at worst, creates unnecessary difficulties when they write Stan programs.
+
+But there are going to be a lot of misguided questions like "How do I interpret the posterior distribution of `p`?". Also, once we expose quantile functions, there are going to be a lot of users that only want to utilize probability distributions like the normal that they have heard of before, even though they have no idea what to put for the prior standard deviation and even though probability distributions that they have not heard of before are easier because you only have to specify prior quantiles and depths. Then, users are going to run into errors when the quantile function they try to define is not actually increasing over the entire [0,1] interval and they will not know what to do.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

-A quantile function must be an increasing function from the [0,1] interval to a parameter space, Θ. So, it is not as if there is some other concept of a quantile function to compete with this definition, in the same way that there is essentially only one way to define the concept of a CDF. Thus, the only flexibility in the design is in the details, such as how much time are we willing to spend checking that a metlog quantile function is increasing?
+A quantile function must be an increasing function from the [0,1] interval to a parameter space, Θ. So, it is not as if there is some other concept of a quantile function with an alternative definition, in the same way that there is essentially only one way to define the concept of a CDF. Thus, the only genuine flexibility in the design is in some of the details.

-I suppose we do not have to merge the ultimate quantile function pull request(s), but I am obligated to make the branches under a NSF grant. This NSF grant also funds me and Philip to interpolate the log-likelihood as a function of `p`, as opposed to a function of 𝜃.
We can potentially interpolate the log-likelihood as a function of `p` much faster than we can evaluate the log-likelihood as a function of 𝜃, but in order to do that, users would need to reparameterize their Stan programs so that `p` is declared in the `parameters` block, in which case they would need to use quantile functions in the `transformed parameters` block to express their prior beliefs about 𝜃. Thus, if we do not implement a lot of quantile functions for users to express their prior beliefs with, then they cannot capture the gains in speed from interpolating the log-likelihood as a function of `p`.
+The scope for alternatives is mostly for distributions in category (2) where the size of the hyperparameter vectors is not known until runtime and there is no cheap way to check whether the implied quantile function is increasing throughout the entire [0,1] interval. This is somewhat similar to `multi_normal_lpdf` having no cheap way to check whether the covariance matrix is positive definite. If the covariance matrix is fixed, users are better served calculating its Cholesky factor once in the `transformed data` block and then calling `multi_normal_cholesky_lpdf` in the model block.
+
+In the workflow proposed above, it would be the user's responsibility to call `vector[K] coefficients = metalog_coef(depths, quantiles)` before calling `metalog_qf(p, coefficients)`. The advantage of this workflow is that `metalog_qf` only needs to check that `p` is between 0 and 1. The disadvantage is that if users somehow make up their own coefficients without first calling `metalog_coef`, then `metalog_qf(p, coefficients)` might not be an increasing function throughout the entire [0,1] interval and thus is not a valid quantile function.
+
+One alternative would be for `metalog_qf` to check that it is increasing every time that it is called, in which case we might as well make a three-argument version `metalog_qf(p, depths, quantiles)` and avoid having the two-argument version `metalog_qf(p, coefficients)`. That would be more similar to what `multi_normal_lpdf` does, even if the covariance matrix is fixed. However, `multi_normal_lpdf` is rarely called with a fixed covariance matrix, whereas `metalog_qf` would typically be called with fixed `depths` and `quantiles`, so the redundant checking would be more painful.
+
+Another alternative would be for `metalog_qf(p, coefficients)` to check that `metalog_qdf(p, coefficients)` is positive. However, it is conceptually insufficient to establish that the quantile density function is positive at a particular `p`; it needs to be positive for every value in the [0,1] interval. Thus, although this check is cheap, it would not be dispositive on its own.
+
+We could use the Eigen plugin mechanism to attach an additional boolean that is `true` if and only if the vector of coefficients has been produced by a `_coef` function that thoroughly checks for validity. Since users would not be able to set it to `true` within the Stan language, they could not call `metalog_qf` without having called `metalog_coef`. However, that seems heavy-handed, and there could be valid ways to construct the vector of coefficients in a higher-level interface and pass them to the `data` block of a Stan program.
+
+I suppose we do not have to merge the quantile function pull request(s) at all, but I am obligated to make the branches under an NSF grant. This NSF grant also funds me and Philip to interpolate the log-likelihood as a function of `p`, as opposed to a function of 𝜃.
We can potentially interpolate the log-likelihood as a function of `p` much faster than we can evaluate the log-likelihood as a function of 𝜃, but in order to do that, users would need to reparameterize their Stan programs so that `p` is declared in the `parameters` block, in which case they would need to use quantile functions in the `transformed parameters` block to express their prior beliefs about 𝜃. Thus, if we do not implement a lot of quantile functions for users to express their prior beliefs with, then they cannot capture the gains in speed from interpolating the log-likelihood as a function of `p`.

# Prior art
[prior-art]: #prior-art

-Quantile functions have been implemented (but presumably not used much) in other languages for decades, such as:
+There is no prior art needed in the narrow intellectual property sense because quantile functions are just math and math cannot be patented. However, quantile functions have been implemented (but presumably not used for much besides pseudo-random number generation) in other languages for decades, such as:

* R: [qcauchy](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Cauchy.html) and similarly for other common distributions
* Boost: [cauchy_distribution](https://www.boost.org/doc/libs/1_76_0/libs/math/doc/html/math_toolkit/dist_ref/dists/cauchy_dist.html) and similarly for other common distributions
* Mathematica: The quantile function is a standard (but not always explicit) function in the [Ultimate Probability Distribution Explorer](https://blog.wolfram.com/2013/02/01/the-ultimate-univariate-probability-distribution-explorer/)

-There is one [textbook](https://www.google.com/books/edition/Statistical_Modelling_with_Quantile_Func/7c1LimP_e-AC?hl=en) on quantile functions that spends a good bit of time pointing out that quantile functions are underutilized even for data (priors or anything to do with Bayesian analysis are not mentioned).
Tom Keelin has a number of [papers](http://www.metalogdistributions.com/publications.html) and some Excel workbooks on the metalog distribution, which is defined by its quantile function that are essentially Bayesian but do not involve MCMC. I did a [presentation](https://youtu.be/_wfZSvasLFk) at StanCon and a couple at the Flatiron Institute on these ideas.
+There is one [textbook](https://www.google.com/books/edition/Statistical_Modelling_with_Quantile_Func/7c1LimP_e-AC?hl=en) on quantile functions that spends a good bit of time pointing out that quantile functions are underutilized even for data (priors or anything to do with Bayesian analysis are not mentioned). Tom Keelin has a number of [papers](http://www.metalogdistributions.com/publications.html) and some Excel workbooks on the metalog distribution, which is defined by its quantile function; his approach is essentially Bayesian but does not involve MCMC. I did a [presentation](https://youtu.be/_wfZSvasLFk) at StanCon and a couple at the Flatiron Institute on these ideas.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

-- Discrete distributions: Quantile functions exist for them, but they are not very useful because they are step functions. For completeness, we might implement them eventually but they are irrelevant to the NSF grant and the proposed workflow.
-- Multivariate distributions: Quantile functions are harder to define for multivariate continuous distributions, and you never get any explicit functions. Thus, I doubt we will ever do anything on that front.
-- Density quantile functions: These are the reciprocals of the quantile density functions (which are the first derivative of a quantile function), which can be used for log-likelihoods. There is a Ph.D. student at Lund who is working on this using Stan for his dissertation, but we will have to see who or what should implement that for Stan Math.
-- Checking that the quantile function is increasing: In many cases, the quantile function is increasing by construction.
But for some distributions like the metalog, the quantile function is increasing over the entire [0,1] interval only for some pairs of quantiles and depths. If those quantiles and / or depths are not fixed data, then verifying whether it is increasing for all points in [0,1] is much more expensive than verifying so for a given point in [0,1].
-- Can we have a three-argument version `metalog_qf(p, quantiles, depths)` and a two-argument version `metalog_qf(p, coefficients)` where the coefficients are produced in the `transformed data` block by `metalog_coef(quantiles, depths)`?
-- What to do about derivatives of implicit quantile functions with respect to hyperparameters? We have the implicit function stuff worked out in `algebraic_solver`, but I do not know how accurate the partial derivatives are going to be for probability distributions like the gamma.
+There are a lot of unresolved questions about the other part of the NSF grant, which involves evaluating the log-likelihood in terms of `p`, rather than in terms of `theta`. However, quantile functions and their use as prior transformations can stand on their own and can be implemented before the rest of the NSF grant.
+
+I am not going to do
+
+ - Quantile functions for discrete distributions because they are not that useful in Stan and are not useful at all for the intended workflow.
+ - Empirical quantile functions that, for example, could be used to determine the median of a set of data points because they are not relevant to the intended workflow.
+ - Any of the `_cdf`, `_lpdf` etc. functions for distributions that have explicit quantile functions but not explicit CDFs because they are not that useful for the intended workflow.
+ - Quantile functions for multivariate distributions because they are harder to define and you never get to use any explicit functions.
+ - Density quantile functions, which are the reciprocals of the quantile density functions (which are the first derivative of a quantile function) and can be used for indirect log-likelihoods. There is a Ph.D. student at Lund who is working on this using Stan for his dissertation who might be interested in implementing this approach for the metalog density quantile function.
+
+The main question to resolve in the design is whether to have a `metalog_coef` function that would check whether `metalog_qf` is increasing over the entire [0,1] interval by checking whether `metalog_qdf` is positive over the entire [0,1] interval or to implement one of the two alternatives described above.
+
+Two minor questions to resolve in the design are
+
+ - Should quantile density functions for probability distributions in the location-scale family have signatures that involve the location parameter even though the quantile density function does not depend on it?
+ - Should we expose "complementary" versions of the quantile (density) functions, like `_cqf` and `_cqdf`? R does not have separate functions but takes a `lower.tail` argument that defaults to `TRUE` but can be specified as `FALSE` to obtain greater numerical precision when the cumulative probability is close to 1. I would say no for now because when the quantile function is explicit, you can usually enhance the numerical precision internally.
+
+There are some questions as to how many pull requests to do, in what order, and which quantile functions should go into which pull request, but those questions can be resolved later.
From 5aaf7d32c4eb3170c60c5633aeb7154b5b36a37b Mon Sep 17 00:00:00 2001
From: Ben Goodrich
Date: Tue, 31 Aug 2021 12:42:53 -0400
Subject: [PATCH 5/9] use png

---
 designs/0028-quantile-functions.md  | 24 +++---------------------
 designs/0028-quantile-functions.png | Bin 0 -> 18399 bytes
 2 files changed, 3 insertions(+), 21 deletions(-)
 create mode 100644 designs/0028-quantile-functions.png

diff --git a/designs/0028-quantile-functions.md b/designs/0028-quantile-functions.md
index eedb2d7..8815e07 100644
--- a/designs/0028-quantile-functions.md
+++ b/designs/0028-quantile-functions.md
@@ -79,27 +79,7 @@ The `metalog_qf` interpolates through these _K_ points that the user provides. I
 The `metalog_coef` function that is called in the `transformed data` block above would output the `coefficients` that are needed to define the metalog quantile function as a weighted sum of basis functions, i.e. a, b, and c if _K_ = 3. These `coefficients` are not particularly interpretable but are implied by the readily-interpretable `depths` and `quantiles`; in other words, they are the `coefficients` that imply `metalog_qf` interpolates through those _K_ `depths` and `quantiles` and are thus a solution to a linear system of _K_ equations. When _K_ = 3, the equations to be solved using linear algebra are
-
-|   |              |                            |   |   |       |
-| - | ------------ | -------------------------- | - | - | ----- |
-| 1 | `logit(d_1)` | `(d_1 - 0.5) * logit(d_1)` | a |   | `q_1` |
-| 1 | `logit(d_2)` | `(d_2 - 0.5) * logit(d_2)` | b | = | `q_2` |
-| 1 | `logit(d_3)` | `(d_3 - 0.5) * logit(d_3)` | c |   | `q_3` |
-
+![system of equations][K3]
 The `metalog_coef` function would check that both `depths` and `quantiles` are strictly increasing and that the quantile density function is strictly positive throughout the [0,1] interval, which is fairly expensive if _K_ > 3 but only needs to be checked once if both `depths` and `quantiles` are constants.
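The K = 3 linear system above can be made concrete with a standalone sketch. This is plain-`double` C++ (the eventual Stan Math versions would be templated for autodiff), and the names `metalog_coef3` and `metalog_qf3` are illustrative, not proposed signatures; the real `metalog_coef` would also have to verify monotonicity.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// logit(p) = log(p / (1 - p))
double logit(double p) { return std::log(p / (1.0 - p)); }

// Solve the 3x3 system A * coef = q by Cramer's rule, where row i of A is
// [1, logit(d_i), (d_i - 0.5) * logit(d_i)], so that the K = 3 metalog
// quantile function interpolates the depth/quantile pairs (d_i, q_i).
std::array<double, 3> metalog_coef3(const std::array<double, 3>& d,
                                    const std::array<double, 3>& q) {
  double A[3][3];
  for (int i = 0; i < 3; ++i) {
    A[i][0] = 1.0;
    A[i][1] = logit(d[i]);
    A[i][2] = (d[i] - 0.5) * logit(d[i]);
  }
  auto det3 = [](double m[3][3]) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
  };
  const double D = det3(A);
  std::array<double, 3> coef;
  for (int j = 0; j < 3; ++j) {
    double M[3][3];  // A with column j replaced by q
    for (int i = 0; i < 3; ++i)
      for (int k = 0; k < 3; ++k) M[i][k] = (k == j) ? q[i] : A[i][k];
    coef[j] = det3(M) / D;
  }
  return coef;
}

// K = 3 metalog quantile function as a weighted sum of basis functions.
double metalog_qf3(double p, const std::array<double, 3>& coef) {
  return coef[0] + coef[1] * logit(p) + coef[2] * (p - 0.5) * logit(p);
}
```

With `depths = {0.25, 0.5, 0.75}` and `quantiles = {-1, 0, 1}`, the solved coefficients reproduce the three supplied quantiles exactly, which is the interpolation property the design doc describes.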
Then, the `metalog_qf` function that is called in the `transformed parameters` block only needs to check that `p` is between 0 and 1.
@@ -299,3 +279,5 @@ Two minor questions to resolve in the design are
  - Should we expose "complementary" versions of the quantile (density) functions, like `_cqf` and `_cqdf`? R does not have separate functions but takes a `lower.tail` argument that defaults to `TRUE` but can be specified as `FALSE` to obtain greater numerical precision when the cumulative probability is close to 1. I would say no for now because when the quantile function is explicit, you can usually enhance the numerical precision internally.

 There are some questions as to how many pull requests to do, in what order, and which quantile functions should go into which pull request, but those questions can be resolved later.
+
+[K3]: 0028-quantile-functions.png
\ No newline at end of file
diff --git a/designs/0028-quantile-functions.png b/designs/0028-quantile-functions.png
new file mode 100644
index 0000000000000000000000000000000000000000..9e4da89f508f39f13e634a2770a7205902590a63
GIT binary patch
literal 18399
literal 0
HcmV?d00001

From e6d9e8e4b27a13695af69db8f936760e112f3bd9 Mon Sep 17 00:00:00 2001
From: Ben Goodrich
Date: Wed, 1 Sep 2021 16:48:50 -0400
Subject: [PATCH 6/9] updates based on comments from Dmytro

---
 designs/0028-quantile-functions.md | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/designs/0028-quantile-functions.md b/designs/0028-quantile-functions.md
index 8815e07..e2a144c 100644
--- a/designs/0028-quantile-functions.md
+++ b/designs/0028-quantile-functions.md
@@ -17,7 +17,7 @@ As a side note, Stan Math is already using internally this construction in many
 drawing from a standard uniform and applying a quantile function to it is called "inversion sampling". So, this design doc is mostly just a plan to make quantile functions available to be called directly. However, it does make the Stan program look more similar to one that draws from the prior predictive distribution in the `transformed data` block for Simulation Based Calibration.
-If we were to unnecessarily restrict ourselves to quantile functions for common probability distributions, this method of expressing prior beliefs about 𝜃 is no easier than declaring 𝜃 in the `parameters` block and using a prior Probability Density Function (PDF) to express beliefs about 𝜃 in the `model` block. However, if 𝜃 has a heavy-tailed distribution, then defining it in the `transformed parameters` block may yield more efficient MCMC because the distribution of the cumulative probability (when transformed to the unconstrained space) tends to have tails that are more reasonable. In addition, expressing prior beliefs via quantile functions is necessary if the log-likelihood function is reparameterized in terms of cumulative probabilities, which we intend to pursue for computational speed over the next three years under the same NSF grant but that will require a separate design doc. +If we were to unnecessarily restrict ourselves to quantile functions for common probability distributions, this method of expressing prior beliefs about 𝜃 is no easier than declaring 𝜃 in the `parameters` block and using a prior Probability Density Function (PDF) to express beliefs about 𝜃 in the `model` block. However, if 𝜃 has a heavy-tailed distribution, then defining it in the `transformed parameters` block may yield more efficient MCMC because the distribution of the cumulative probability (when transformed to the unconstrained space) tends to have tails that are closer to a standard logistic distribution. In addition, expressing prior beliefs via quantile functions is necessary if the log-likelihood function is reparameterized in terms of cumulative probabilities, which we intend to pursue for computational speed over the next three years under the same NSF grant but that will require a separate design doc. 
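The mechanics of expressing a heavy-tailed prior this way can be illustrated with the Cauchy distribution, whose quantile function is explicit. The following is a standalone plain-`double` C++ sketch (the eventual Stan Math version would be templated for autodiff, and `cauchy_qf` here is an illustrative name rather than a committed signature):

```cpp
#include <cassert>
#include <cmath>

// Quantile function of the Cauchy distribution: the inverse of the CDF
//   F(x) = 0.5 + atan((x - mu) / sigma) / pi.
// Applying it to a standard uniform draw yields a Cauchy(mu, sigma) variate
// ("inversion sampling"); the same map turns a uniform(0,1) cumulative
// probability declared in `parameters` into a transformed parameter with a
// Cauchy prior.
double cauchy_qf(double p, double mu, double sigma) {
  const double pi = std::acos(-1.0);
  return mu + sigma * std::tan(pi * (p - 0.5));
}
```

For example, `cauchy_qf(0.5, mu, sigma)` is the prior median `mu`, and `cauchy_qf(0.75, 0, 1)` is the upper quartile of a standard Cauchy, namely 1.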
However, there is no need to restrict ourselves to quantile functions for common probability distributions, and it is conceptually easier for the user to express prior beliefs about 𝜃 using very flexible probability distributions that are defined by explicit quantile functions but lack explicit CDFs and PDFs. Examples include the Generalized Lambda Distribution, the metalog(istic) distribution, the no name distribution of the first kind, and distributions whose quantile function is a spline. In each of these cases, the user would specify a few pairs of cumulative probabilities and values of 𝜃, and then an increasing curve would be run through these user-supplied points. In other words, the quantile function would interpolate through the prior median, the prior quartiles, etc. Thus, from the user's perspective, all of the hyperparameters are _location_ parameters --- even though they collectively characterize the shape of the prior distribution for 𝜃 --- and location parameters are easier for users to think about than other kinds of hyperparameters (especially expectations). @@ -90,7 +90,7 @@ An exception is thrown by any quantile function if a purported cumulative probab From a developer's perspective, there are four categories of quantile functions: -1. Explicit functions with a predetermined number of hyperparameters. Examples include the Cauchy distribution, the exponential distribution, and maybe ten other distributions. All of the arguments could easily be templated. For most such distributions, we already have the CDFs and PDFs, but an exception is the Generalized Lambda Distribution, which does not have an explicit CDF or PDF. +1. Explicit functions with a predetermined number of hyperparameters. Examples include the Cauchy distribution, the exponential distribution, and maybe ten other distributions. All of the arguments could easily be templated. For most such distributions, we already have the CDFs and PDFs. 
An exception is the Generalized Lambda Distribution, which does not have an explicit CDF or PDF. Also, there is a useful Johnson-like system of distributions with explicit quantile functions that do have explicit CDF or PDFs. 2. Explicit functions with a number of hyperparameters that is determined at runtime. Examples include the metalog distribution, the no name distribution of the first kind, and a few other "quantile parameterized distributions". All of the arguments could be templated, but the hyperparameters are vectors of size _K_. None of these distributions currently exist in Stan Math because they do not have explicit CDFs or PDFs. 3. Implicit functions where all the `var` hyperparameters are outside the kernel. Examples include the normal distribution and the Student t distribution with a fixed degrees of freedom. In this case, the partial derivatives are fairly easy to compute. 4. Implicit functions where some of the `var` hyperparameters are inside the kernel. Examples include the gamma distribution, beta distribution, and many others that are not in the location-scale family. In this case, the partial derivative with respect to a hyperparameter is not an explicit function. @@ -153,7 +153,7 @@ return_type_t metalog_qdf(const T_p& p, return slopes; } ``` -If the `_qdf` can be written as a polynomial in `p`, then it is possible for the `_coef` function to quickly check whether the `_qdf` has any root on the [0,1] interval using Budan's [theorem](https://en.wikipedia.org/wiki/Budan%27s_theorem). If the `_qdf` is not a polynomial, then the `_coef` function would have to call one of the root-finding [functions](https://www.boost.org/doc/libs/1_76_0/libs/math/doc/html/root_finding.html) in Boost to verify that the `_qdf` has no root on the [0,1] interval. 
+If the `_qdf` can be written as a polynomial in `p`, then it is possible for the `_coef` function to quickly check whether the `_qdf` has any root on the [0,1] interval using Budan's [theorem](https://en.wikipedia.org/wiki/Budan%27s_theorem). If the `_qdf` is not a polynomial, then the `_coef` function would have to call one of the root-finding [functions](https://www.boost.org/doc/libs/1_76_0/libs/math/doc/html/root_finding.html) in Boost to verify that the `_qdf` has no root on the [0,1] interval. Alternatively, we could approximate a non-polynomial with a Chebyshev interpolant and write the interpolant as a polynomial in `_p` as advocated by John Boyd and others. Boost has a well-known monotonic [interpolation](https://www.boost.org/doc/libs/1_76_0/libs/math/doc/html/math_toolkit/pchip.html) function called `pchip` that could be used as a flexible quantile function. It consists of piecewise cubic polynomials that are differentiable at the given points. However, the `pchip` constructor returns a callable, and it would be computationally wasteful to reconstruct the spline at every leapfrog step when the points that it is interpolating through are fixed data. But the Stan language currently lacks something useful like ``` @@ -241,10 +241,12 @@ In the workflow proposed above, it would be the user's responsibility to call `v One alternative would be for `metalog_qf` to check that it is increasing every time that it is called, in which case we might as well make a three argument version `metalog_qf(p, depths, quantiles)` and avoid having the two-argument version `metalog_qf(p, coefficients)`. That would be more similar to what `multi_normal_lpdf` does, even if the covariance matrix is fixed. However, `multi_normal_lpdf` is rarely called with a fixed covariance matrix, whereas `metalog_qf` would typically be called with fixed `depths` and `quantiles`, so the redundant checking would be more painful. 
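The Budan check mentioned above can be sketched as follows: count the sign variations in the sequence f, f', f'', ... evaluated at each endpoint. Budan's theorem says the number of roots in the half-open interval (a, b], counted with multiplicity, is at most V(a) - V(b) and of the same parity, so a difference of zero certifies that a polynomial `_qdf` has no root there. A standalone plain-`double` C++ sketch with illustrative function names:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Evaluate a polynomial with coefficients c (c[i] multiplies x^i) by Horner.
double poly_eval(const std::vector<double>& c, double x) {
  double y = 0.0;
  for (int i = static_cast<int>(c.size()) - 1; i >= 0; --i) y = y * x + c[i];
  return y;
}

// Number of sign variations in f(x), f'(x), ..., f^(n)(x), skipping zeros.
int sign_variations(std::vector<double> c, double x) {
  int v = 0;
  double prev = 0.0;
  while (!c.empty()) {
    const double y = poly_eval(c, x);
    if (y != 0.0) {
      if (prev != 0.0 && ((y > 0.0) != (prev > 0.0))) ++v;
      prev = y;
    }
    // Differentiate in place: new c[i] = (i + 1) * c[i + 1].
    for (std::size_t i = 0; i + 1 < c.size(); ++i) c[i] = (i + 1) * c[i + 1];
    c.pop_back();
  }
  return v;
}

// Budan's upper bound on the number of roots in (a, b]; zero means the
// polynomial (e.g. a polynomial _qdf) cannot vanish on the interval.
int budan_bound(const std::vector<double>& c, double a, double b) {
  return sign_variations(c, a) - sign_variations(c, b);
}
```

For instance, x^2 + 1 gets a bound of zero on (0, 1] (no root, so a `_qdf` of that shape would pass), whereas x - 0.5 gets a bound of one.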
-Another alternative would be for `metalog_qf(p, coefficients)` to check that `metalog_qdf(p, coefficients)` is positive. However, it is conceptually insufficient to establish that the quantile density function is positive at a particular `p`; it needs to be positive for every value in the [0,1] interval. Thus, although this check is cheap, it would not be dispositive on its own.
+Another alternative would be for `metalog_qf(p, coefficients)` to check that `metalog_qdf(p, coefficients)` is positive. However, it is conceptually insufficient to establish that the quantile density function is positive at a particular `p`; it needs to be positive for every value in the [0,1] interval. Thus, although this check is cheap, it would not be dispositive on its own. For example, if `depths = [0.4, 0.5, 0.6]'` and `quantiles = [-0.1, 0, 0.5]'`, then `metalog_qf` would be U-shaped with a trough just above `0.4`. In this case, this check would pass if `p` were to the right of the trough even though `metalog_qf` is decreasing (and hence is invalid) to the left of the trough.

 We could use the Eigen plugin mechanism to have an additional boolean that is `true` if and only if it has been produced by a `_coef` function that thoroughly checks for validity. Since users would not be able to set it to `true` within the Stan language, they could not call the `metalog_qf` without having called `metalog_coef`. However, that seems heavy-handed, and there could be valid ways to construct the vector of coefficients in a higher-level interface and pass them to the `data` block of their Stan programs.

+It has been suggested that if the `metalog_qf` implied by `depths` and `quantiles` is not strictly increasing over the entire [0,1] interval, then we could drop some of the points so that it becomes strictly increasing.
That would not work well in HMC if `depths` and / or `quantiles` were an unknown `var` because of the discontinuities, but even in the case where both `depths` and `quantiles` are fixed data, this would be taking the unprecedented (in Stan) step of conditioning on something other than what the user said to condition on, even though what the user said to condition on makes no sense.
+
 I suppose we do not have to merge the quantile function pull request(s) at all, but I am obligated to make the branches under an NSF grant. This NSF grant also funds me and Philip to interpolate the log-likelihood as a function of `p`, as opposed to a function of 𝜃. We can potentially interpolate the log-likelihood as a function of `p` much faster than we can evaluate the log-likelihood as a function of 𝜃, but in order to do that, users would need to reparameterize their Stan programs so that `p` is declared in the `parameters` block, in which case they would need to use quantile functions in the `transformed parameters` block to express their prior beliefs about 𝜃. Thus, if we do not implement a lot of quantile functions for users to express their prior beliefs with, then they cannot capture the gains in speed from interpolating the log-likelihood as a function of `p`.

# Prior art
@@ -269,9 +271,11 @@ I am not going to do
  - Empirical quantile functions that, for example, could be used to determine the median of a set of data points because they are not relevant to the intended workflow.
  - Any of the `_cdf`, `_lpdf` etc. functions for distributions that have explicit quantile functions but not explicit CDFs because they are not that useful for the intended workflow.
  - Quantile functions for multivariate distributions because they are harder to define and you never get to use any explicit functions.
- - Density quantile functions, which are the reciprocals of the quantile density functions (which are the first derivative of a quantile functions) and can be used for indirect log-likelihoods. There is a Ph.D student at Lund who is working on this using Stan for his dissertation who might be interested in implementing this approach for the metalog density quantile function. + - Density quantile functions, which are the reciprocals of the quantile density functions (which are the first derivative of a quantile function) and can be used for indirect log-likelihoods. There is a Ph.D. student at Lund who is working on this using Stan for his dissertation who is interested in implementing this approach for the metalog density quantile function. + +As usual, there is a question of naming. I prefer `_qf` suffixes to `_icdf` suffixes for three reasons. First, `_qf` is more coherent if there are also going to be `_qdf`, `_ldqf`, etc. functions. With `_icdf`, it is less obvious that `_qdf` is the derivative of it. Second, although `_icdf` would perhaps be more descriptive if we were only talking about distributions in category (1) that have explicit CDFs, the `_icdf` suffix is less intuitive for the Generalized Lambda Distribution and distributions in category (2) that have explicit quantile functions but lack explicit CDFs. If anything, the CDF should be called the inverse quantile function when the quantile function is primitive and taken as the definition of the probability distribution, which is how I am trying to get Stan users to think. Third, some of these functions would have an argument that will be documented as `quantiles`, which corresponds better with the `_qf` suffix than with the `_icdf` suffix.
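The derivative relationship that the `_qf` / `_qdf` pairing makes obvious can be checked numerically. Here is a minimal Python sketch (illustrative only, not Stan Math code) for the Cauchy case, which also makes concrete the point that the quantile density does not depend on `mu`:

```python
import math

# Illustrative sketch: the quantile density function (_qdf) is the
# derivative of the quantile function (_qf), shown for the Cauchy.
def cauchy_qf(p, mu, sigma):
    # inverse of the Cauchy CDF
    return mu + sigma * math.tan(math.pi * (p - 0.5))

def cauchy_qdf(p, sigma):
    # d/dp of cauchy_qf(p, mu, sigma); note it does not depend on mu
    return sigma * math.pi / math.cos(math.pi * (p - 0.5)) ** 2

# a central finite difference of the _qf matches the analytic _qdf
p, h = 0.3, 1e-6
fd = (cauchy_qf(p + h, 0.0, 2.0) - cauchy_qf(p - h, 0.0, 2.0)) / (2 * h)
assert abs(fd - cauchy_qdf(p, 2.0)) < 1e-4
```

The same finite-difference check applies to any `_qf` / `_qdf` pair, which is part of why the coordinated suffixes are preferable to `_icdf`.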
-The main question to resolve in the design is whether to have a `metalog_coef` function that would check whether `metalog_qf` is increasing over the entire [0,1] interval by checking whether `metalog_qdf` is positive over the entire [0,1] interval or to implement one of the two alternatives described above. +The main substantive question to resolve in the design is whether to have a `metalog_coef` function that would check whether `metalog_qf` is increasing over the entire [0,1] interval by checking whether `metalog_qdf` is positive over the entire [0,1] interval or to implement one of the two alternatives described above. Two minor questions to resolve in the design are From 1d25c85683629765ec86bacc09a72563567673f2 Mon Sep 17 00:00:00 2001 From: Ben Goodrich Date: Wed, 1 Sep 2021 17:37:45 -0400 Subject: [PATCH 7/9] mention GLD --- designs/0028-quantile-functions.md | 55 ++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/designs/0028-quantile-functions.md b/designs/0028-quantile-functions.md index e2a144c..f71e4c3 100644 --- a/designs/0028-quantile-functions.md +++ b/designs/0028-quantile-functions.md @@ -121,6 +121,61 @@ return_type_t cauchy_qdf(const T_p& p, ``` Here there is a minor issue of whether the signature of the `cauchy_qdf` should also take `mu` as an argument, even though the quantile density function does not depend on `mu`. +The Generalized Lambda Distribution (GLD) is a flexible probability distribution that can either be unbounded or bounded on one or both sides depending on the values of its four hyperparameters. The GLD is not in Stan Math currently but is defined by its explicit quantile function since its CDF and PDF are not explicit functions.
There are actually several parameterizations of the GLD quantile function that are not entirely equivalent, but the preferable parameterization looks like this in Stan code (when the median is zero and the interquartile range is one) +``` +/* + Standard Generalized Lambda Distribution quantile function + + See equation 11 of + https://mpra.ub.uni-muenchen.de/37814/1/MPRA_paper_37814.pdf + + @param p real cumulative probability + @param chi real skewness parameter between -1 and 1 + @param xi real steepness parameter between 0 and 1 + @return real number that can be scaled and shifted to ~GLD + */ + real std_gld_qf(real p, real chi, real xi) { + real alpha = 0.5 * (0.5 - xi) * inv_sqrt(xi * (1 - xi)); + real beta = 0.5 * chi * inv_sqrt(1 - square(chi)); + if (chi < -1 || chi > 1) reject("chi must be between -1 and 1"); + if (xi < 0 || xi > 1) reject("xi must be between 0 and 1"); + if (p > 0 && p < 1) { + if (chi != 0 || xi != 0.5) { + if (fabs(alpha) != beta) { + real s = alpha + beta; + real d = alpha - beta; + if (alpha == negative_infinity()) return 0; + return (p ^ s - 1) / s - ( (1 - p) ^ d - 1 ) / d; + } else if (xi == 0.5 * (1 + chi)) { + real d = 2 * alpha; + return log(p) - ( (1 - p) ^ d - 1 ) / d; + } else { // xi == 0.5 * (1 - chi) + real s = 2 * alpha; + return (p ^ s - 1) / s - log1m(p); + } + } else return log(p) - log1m(p); // chi == 0 and xi == 0.5 + } else if (p == 0) { // equation 13 + return xi < 0.5 * (1 + chi) ? -inv(alpha + beta) : negative_infinity(); + } else if (p == 1) { // equation 14 + return xi < 0.5 * (1 - chi) ?
inv(alpha - beta) : positive_infinity(); + } else reject("p must be between zero and one"); + return not_a_number(); // never reached + } +``` +The quantile function of the general GLD would be like +``` +real gld_qf(real p, real median, real iqr, real chi, real xi) { + return median + iqr * std_gld_qf(p, chi, xi); +} +``` +In this case, the GLD is guaranteed to be strictly increasing if the stated inequality conditions on `chi` and `xi` are satisfied. However, if the user supplies a prior median, prior lower / upper quartiles, and one other pair of a depth and a quantile, it is not guaranteed that there is any GLD quantile function that runs exactly through those four points. We can provide a helper function that + + * Inputs a prior median, prior lower / upper quartiles, and one other pair of a depth and a quantile + * Outputs the values of `chi` and `xi` that are consistent with those four points + * Throws an exception if there are no such values of `chi` and `xi` + +The main difficulty with that is that the equations are very nonlinear and difficult to solve numerically. The `algebraic_solver` in Stan Math currently sometimes fails to find values of `chi` and `xi` that are consistent with the prior quantiles even when admissible values exist, which is a source of frustration. Another difficulty is that sometimes the "right" values of `chi` and `xi` imply a GLD that is bounded on one or both sides even though the user intended for all real numbers to be in the parameter space. There is not a lot that can be done about that, except print a warning.
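The generic branch of `std_gld_qf` can be sanity-checked numerically. Below is a minimal Python transcription (illustrative only, not Stan Math code) that assumes `0 < p < 1` and handles the zero-exponent limits via logarithms, omitting the boundary cases and input validation that the Stan listing above includes. It verifies that the function is strictly increasing for admissible `chi` and `xi`, and that it collapses to the logistic (logit) quantile function when `chi = 0` and `xi = 0.5`:

```python
import math

# Illustrative transcription of std_gld_qf above (interior p only):
# alpha, beta, s = alpha + beta, and d = alpha - beta follow the Stan code,
# with (p^s - 1)/s -> log(p) and ((1-p)^d - 1)/d -> log(1-p) as the
# exponents go to zero.
def std_gld_qf(p, chi, xi):
    alpha = 0.5 * (0.5 - xi) / math.sqrt(xi * (1 - xi))
    beta = 0.5 * chi / math.sqrt(1 - chi ** 2)
    s, d = alpha + beta, alpha - beta
    term1 = math.log(p) if s == 0 else (p ** s - 1) / s
    term2 = -math.log1p(-p) if d == 0 else -(((1 - p) ** d - 1) / d)
    return term1 + term2

# strictly increasing on a grid for admissible hyperparameters
grid = [i / 1000 for i in range(1, 1000)]
values = [std_gld_qf(p, 0.2, 0.3) for p in grid]
assert all(a < b for a, b in zip(values, values[1:]))

# chi = 0, xi = 0.5 reduces to the logistic quantile function, logit(p)
assert abs(std_gld_qf(0.9, 0.0, 0.5) - math.log(0.9 / 0.1)) < 1e-9
```

Monotonicity in the generic branch also follows analytically: the derivative with respect to `p` is `p^(s-1) + (1-p)^(d-1)`, which is positive for all `p` in (0,1).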
+ + ## Category (2) For distributions like the metalog, there would be an exposed helper function with a `_coef` postfix that inputs vectors (or real arrays) of `depths` and `quantiles` and outputs the implied vector of `coefficients` after thoroughly checking for validity (and throwing an exception otherwise) From 776269fb3f83e18b472cdac27bf5aaab51cc8704 Mon Sep 17 00:00:00 2001 From: Ben Goodrich Date: Thu, 2 Sep 2021 14:19:44 -0400 Subject: [PATCH 8/9] add more alternatives --- designs/0028-quantile-functions.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/designs/0028-quantile-functions.md b/designs/0028-quantile-functions.md index f71e4c3..fe5ec7e 100644 --- a/designs/0028-quantile-functions.md +++ b/designs/0028-quantile-functions.md @@ -296,10 +296,14 @@ In the workflow proposed above, it would be the user's responsibility to call `v One alternative would be for `metalog_qf` to check that it is increasing every time that it is called, in which case we might as well make a three argument version `metalog_qf(p, depths, quantiles)` and avoid having the two-argument version `metalog_qf(p, coefficients)`. That would be more similar to what `multi_normal_lpdf` does, even if the covariance matrix is fixed. However, `multi_normal_lpdf` is rarely called with a fixed covariance matrix, whereas `metalog_qf` would typically be called with fixed `depths` and `quantiles`, so the redundant checking would be more painful. -Another alternative would be for `metalog_qf(p, coefficients)` to check that `metalog_qdf(p, coefficients)` is positive. However, it is conceptually insufficient to establish that the quantile density function is positive at a particular `p`; it needs to be positive for every value in the [0,1] interval. Thus, although this check is cheap, it would not be dispositive on its own. For example, if `depths = [0.4, 0.5, 0.6]'` and `quantiles = [-0.1, 0, 0.5]'`, then `metalog_qf` would be U-shaped with a trough just above `0.4`. 
In this case, this check would pass if `p` were to the right of the trough even though `metalog_qf` is decreasing (and hence is invalid) to the left of the trough. +An alternative suggested by Brian Ward would be to wrap the code in `metalog_qf` that checks whether the quantile function is increasing over the entire [0,1] interval in an `#ifndef X ... #endif` preprocessor block so that the check could be disabled by a compiler flag. This would ensure validity if that code were executed, but has the disadvantage that it could not be avoided in a precompiled Stan program, such as those that come with rstanarm, unless the user were to reinstall it from source every time they wanted to switch the compiler flag on and off. + +Another alternative would be for `metalog_qf(p, coefficients)` to check that `metalog_qdf(p, coefficients)` is positive. However, it is conceptually insufficient to establish that the quantile density function is positive at a particular `p`; it needs to be positive for every value in the [0,1] interval. Thus, although this check is cheap, it would not be dispositive on its own. For example, if `depths = [0.4, 0.5, 0.6]'` and `quantiles = [-0.1, 0, 0.5]'`, then `metalog_qf` would be U-shaped with a trough just above `0.4`. In this case, this check would pass if `p` were to the right of the trough even though `metalog_qf` is decreasing (and hence is invalid) to the left of the trough. In an HMC context, if there is any point in [0,1] where `metalog_qdf` is negative, it will undoubtedly be hit at some point, and we would throw a fatal exception. However, in the context of optimization, it is more plausible that no such point will be landed on along the optimization path, even though such points may exist in the [0,1] interval. We could use the Eigen plugin mechanism to have an additional boolean that is `true` if and only if it has been produced by a `_coef` function that thoroughly checks for validity.
Since users would not be able to set it to `true` within the Stan language, they could not call `metalog_qf` without having called `metalog_coef`. However, that seems heavy-handed, and there could be valid ways to construct the vector of coefficients in a higher-level interface and pass them to the `data` block of their Stan programs. +Steve Bronder suggested that if the user were to call `real theta = metalog_qf(p, metalog_coef(depths, quantiles));` in the `transformed parameters` block when both `depths` and `quantiles` are fixed, then the parser could elevate the call to `metalog_coef(depths, quantiles)` to the end of the `transformed data` block. If the parser could also disable the check inside `metalog_qf` that the quantile function is increasing throughout the [0,1] interval, that could work as well and could be implemented even after `metalog_qf` has been implemented. + It has been suggested that if the `metalog_qf` implied by `depths` and `quantiles` is not strictly increasing over the entire [0,1] interval, then we could drop some of the points so that it becomes strictly increasing. That would not work well in HMC if `depths` and/or `quantiles` were an unknown `var` because of the discontinuities, but even in the case where both `depths` and `quantiles` are fixed data, this would be taking the unprecedented (in Stan) step of conditioning on something other than what the user said to condition on, even though what the user said to condition on makes no sense. I suppose we do not have to merge the quantile function pull request(s) at all, but I am obligated to make the branches under an NSF grant. This NSF grant also funds me and Philip to interpolate the log-likelihood as a function of `p`, as opposed to a function of 𝜃.
We can potentially interpolate the log-likelihood as a function of `p` much faster than we can evaluate the log-likelihood as a function of 𝜃, but in order to do that, users would need to reparameterize their Stan programs so that `p` is declared in the `parameters` block, in which case they would need to use quantile functions in the `transformed parameters` block to express their prior beliefs about 𝜃. Thus, if we do not implement a lot of quantile functions for users to express their prior beliefs with, then they cannot capture the gains in speed from interpolating the log-likelihood as a function of `p`. From 3698564f7579f8d4cb65c35ff52a7f0d6c696c00 Mon Sep 17 00:00:00 2001 From: Ben Goodrich Date: Sun, 5 Sep 2021 00:40:27 -0400 Subject: [PATCH 9/9] fix GLD quantile function --- designs/0028-quantile-functions.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/designs/0028-quantile-functions.md b/designs/0028-quantile-functions.md index fe5ec7e..c66effe 100644 --- a/designs/0028-quantile-functions.md +++ b/designs/0028-quantile-functions.md @@ -121,9 +121,9 @@ return_type_t cauchy_qdf(const T_p& p, ``` Here there is a minor issue of whether the signature of the `cauchy_qdf` should also take `mu` as an argument, even though the quantile density function does not depend on `mu`. -The Generalized Lambda Distribution (GLD) is a flexible probability distribution that can either be bounded or bounded on one or both sides depending on the values of its four hyperparameters. The GLD is not in Stan Math currently but is defined by its explicit quantile function since its CDF and PDF are not explicit functions. 
There are actually several parameterizations of the GLD quantile function that are not entirely equivalent, but the preferable parameterization looks like this in Stan code (when the median is zero and the interquartile range is one) +The Generalized Lambda Distribution (GLD) is a flexible probability distribution that can either be unbounded or bounded on one or both sides depending on the values of its four hyperparameters. The GLD is not in Stan Math currently but is defined by its explicit quantile function since its CDF and PDF are not explicit functions. There are actually several parameterizations of the GLD quantile function that are not entirely equivalent, but the preferable parameterization looks like this in Stan code ``` - Standard Generalized Lambda Distribution quantile function + Generalized Lambda Distribution helper function See equation 11 of https://mpra.ub.uni-muenchen.de/37814/1/MPRA_paper_37814.pdf @@ -133,7 +133,7 @@ The Generalized Lambda Distribution (GLD) is a flexible probability distribution @param xi real steepness parameter between 0 and 1 @return real number that can be scaled and shifted to ~GLD */ - real std_gld_qf(real p, real chi, real xi) { + real S(real p, real chi, real xi) { real alpha = 0.5 * (0.5 - xi) * inv_sqrt(xi * (1 - xi)); real beta = 0.5 * chi * inv_sqrt(1 - square(chi)); if (chi < -1 || chi > 1) reject("chi must be between -1 and 1"); @@ -161,10 +161,11 @@ The Generalized Lambda Distribution (GLD) is a flexible probability distribution return not_a_number(); // never reached } ``` -The quantile function of the general GLD would be like +The quantile function of the GLD would then be like ``` real gld_qf(real p, real median, real iqr, real chi, real xi) { - return median + iqr * std_gld_qf(p, chi, xi); + return median + iqr * (S(p, chi, xi) - S(0.5, chi, xi)) + / (S(0.75, chi, xi) - S(0.25, chi, xi)); } ``` In this case, the GLD is guaranteed to be strictly increasing if the stated inequality conditions on `chi` and `xi`
are satisfied. However, if the user supplies a prior median, prior lower / upper quartiles, and one other pair of a depth and a quantile, it is not guaranteed that there is any GLD quantile function that runs exactly through those four points. We can provide a helper function that