diagis
is an R package containing functions relating
weighted samples obtained for example from importance sampling. The main
motivation for developing diagis
was to enable easy
computation of summary statistics and diagnostics of the weighted
IS-MCMC runs provided by bssm
package (Helske and Vihola
2021; Vihola, Helske, and Franks
2020) for Bayesian state space modelling. For more broader
use, the diagis
package provides functions for computing
(probability) weighted means, covariances, and quantiles of possibly
multivariate samples, the running versions of these, as well as
diagnostic plot function weight_plot
for graphical
diagnostic of weights.
All the mean and covariance functions are written in C++ using
Rcpp
(Eddelbuettel and François 2011; Eddelbuettel 2013) and
RcppArmadillo
(Eddelbuettel and Sanderson
2014) packages, making these function computationally very
efficient even for large samples. The weight diagnostic plot uses
ggplot
(Wickham 2009) for visually appealing
graphics, while gridExtra
(Auguie 2016) combines the plots
together.
As an illustration, consider estimating the expected value of Gamma(α, β) distribution using importance sampling. The density of Gamma(α, β) is $$ p(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} \exp(-\beta x). $$ We will use another Gamma distribution with parameters a and b as our proposal distribution q(x), so the weights are of form $$ w(x) = \frac{p(x)}{q(x)} = \frac{\frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} \exp(-\beta x)}{\frac{b^{a}}{\Gamma(a)} x^{a - 1} \exp(-b x)} = \frac{\Gamma(a)\beta^{\alpha}}{\Gamma(\alpha)b^a}x^{\alpha - a}\exp(-(\beta -b)x), $$ and our importance sampling estimator for θ = Ep(X) is $$ \hat \theta_n = \frac{1}{n}\sum_{i=1}^n Y_i w(Y_i), $$ where Yi, i = 1, …, n are drawn from Gamma(a, b). However, in practice we often have access only to unnormalized weights wu(x) = cw(x), which leads to self-normalized importance sampling estimate $$ \tilde \theta_n = \frac{1}{\sum_{i=1}^n w_u(Y_i)}\sum_{i=1}^n Y_i w_u(Y_i). $$
Note that the self-normalized estimate θ̃n is not
unbiased, but still consistent. Also, we might want to use
self-normalized version even when the computation of weights w is possible (Owen 2013). Thus
the functions in diagis
are focused on self-normalized
importance sampling as it is more generally applicable approach.
We can in principle choose a and b arbitrarily, but weights w(x) have finite variance only if a < α and b < β. As we will soon see, infinite variance of the weights can make the importance sampling unreliable.
Let us first take α = 2, and β = 1. The expected value is then α/β = 2. For proposal distribution, let us first use a = 1 and b = 0.5:
library("diagis")
set.seed(1)
x <- rgamma(10000, 1, 0.75)
w <- dgamma(x, 2, 1) / dgamma(x, 1, 0.75)
plot(w)
## [1] 2.012761
Our estimate is fairly close to the theoretical value, and the plot of weights w suggest that the distribution of the weights is nearly uniform with upper bound slightly below 2 (actual bound can be computed theoretically based on w(x)).
Let us now change a to 2:
set.seed(1)
x_bad <- rgamma(10000, 1, 2)
w_bad <- dgamma(x_bad, 2, 1) / dgamma(x_bad, 1, 2)
plot(w_bad)
## [1] 2.313655
Now our importance sampling estimate is cleary off, and the plot of
the importance weights show what is going on: There is one huge weight
and couple others which will dominate our mean estimator. We can use
weight_plot
function to further illustrate the differences
between our two approaches. First let’s check the results from our good
IS case:
The function weight_plot
draws four figures. The first one
shows 100 largest weights, again illustrating that there are no single
draws dominating the sample. On the second figure, the weights are
sorted and drawn in increasing order. This figure has mixed uses,
sometimes it shows interesting phenomena better than for example a
histogram (which often needs some tweaking), whereas in other cases its
information value is quite small. Here we see that the weights are
distributed quite uniformly. The third figure is often the most
important; it shows the variance of the weights computed from successive
samples $w_1, , w_t, t = 1, …, n. If the
importance weights have finite variance, this figure should show clear
converge towards finite values as is the case here. The final figure
shows the effective sample sizes again in a form of running line. The
effective sample size (ESS) is defined as $$
ESS_n = \frac{(\sum_{i=1}^n w_i)^2}{\sum_{i=1}^n w^2_i} =
\frac{1}{\sum_{i=1}^n \bar w^2_i},
$$ where $\bar w_i = w_i / \sum_{i=1}^n
w_i$. The interpretation of ESS is that our importance sampling
corresponds to the case with direct simulation from the target
distribution p(x)
using ESS samples (i.e. larger the ESS is better). Other measures of
efficiency are also available in literature, and they might be added to
diagis
in future.
Now let’s use weight_plot
function for our bad IS
run:
Oops! The figures are pretty self-explanatory, the running statistics look fine at first, but when the we take that one huge weight into account, the variance explodes and at the same time ESS drops significantly as that one (xi, wi) pair dominates the sums in ESS.
The diagis
package also contains function
weighted_quantile
for computing weighted quantiles (or
perhaps more precisely percentiles, but we call them quantiles as ìn
quantile
method in base R). As for non-weighted quantiles,
there are several ways to compute weighted quantiles, and the one used
in diagis
is based on the following linear interpolation of
the weighted empirical CDF (in nonweighted case, this corresponds to
type 4 quantile in quantile
). Let x1, …, xn
be an ordered (increasing) vector, w1, …, wn
the corresponding weights with $\sum_{i=1}^n
w_i=1$, and p the
target probability for which we want to find the corresponding quantile
q. Now let $w'_j=\sum_{i=1}^jw_i$ be the cumulative sum
of weights from 1 to j, and find index k so that wk − 1 < p
and wk ≥ p.
Then
$$
q = x_{k-1} + (x_k - x_{k-1})\frac{p - w_{k-1}}{w_{k} - w_{k-1}}.
$$ In case where x can
contain several identical values, these elements should be combined (and
corresponding weights added together), which is done automatically in
weighted_quantile
.