ramcmc: Building blocks for Robust Adaptive Metropolis algorithm

Introduction

This small R (R Core Team 2016) package provides key functions for the Robust Adaptive Metropolis (RAM) algorithm by Vihola (2012). These functions can be used directly from R, or the corresponding header files can be linked to from other R packages.

The package contains three functions, two of which can also be useful in a more general context: fast Cholesky update and downdate functions, and the actual update function for the scaling matrix of the proposal distribution used in RAM. All functions are written in C++ using the Rcpp (Eddelbuettel and François 2011; Eddelbuettel 2013) and RcppArmadillo (Eddelbuettel and Sanderson 2014) packages.

RAM algorithm

The idea of the RAM algorithm is to adaptively update the scaling matrix of the proposal distribution used in a general Metropolis-type Markov chain Monte Carlo (MCMC) algorithm. We first fix the target acceptance rate α*, for example to the famous 0.234, and the value of the tuning parameter γ ∈ (0, 1]. Then at iteration i of the MCMC:

  1. Compute the proposal $\theta' = \theta_{i-1} + S_{i-1} u_i$, where $u_i$ is a vector of random variables, often from the standard $d$-dimensional Gaussian distribution, and $S_{i-1}$ is a lower triangular matrix with positive diagonal elements.
  2. Accept the proposal with probability $\alpha_i := \min\left\{1, \frac{p(\theta')p(y \mid \theta')}{p(\theta_{i-1}) p(y \mid \theta_{i-1})}\right\}$.
  3. If the proposal $\theta'$ is accepted, set $\theta_i = \theta'$. Otherwise, set $\theta_i = \theta_{i-1}$.
  4. Compute the Cholesky factor $S_i$ satisfying the equation $$ S_i S_i^T = S_{i-1}\left(I + \min\{1, d i^{-\gamma}\} (\alpha_i - \alpha^{\ast}) \frac{u_i u_i^T}{\|u_i\|^2}\right) S_{i-1}^T $$ (a code sketch of this step follows the list).
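
The adapt_S function of this package implements step (4); here is a minimal sketch of a single update step (the argument order mirrors the call in the Illustration section below; the target rate α* = 0.234 and γ = 2/3 are assumed to be the package defaults):

S <- diag(2)       # current lower triangular factor S_{i-1}
u <- rnorm(2)      # the random vector u_i of step (1)
alpha <- 0.1       # acceptance probability alpha_i of step (2)
# one RAM update at iteration i = 10; returns the new factor S_i
S_new <- ramcmc::adapt_S(S, u, alpha, 10)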

For theoretical details and comparisons with other adaptive Metropolis algorithms, see Vihola (2012).

For step (4) of the previous algorithm, instead of computing the right-hand side of the equation and performing a new Cholesky decomposition, we can apply a rank-one update or downdate (depending on the sign of $(\alpha_i - \alpha^{\ast})$) directly to the previous $S_{i-1}$ in order to obtain the new $S_i$, which is computationally much more efficient.
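
These rank-one operations are also available as the stand-alone functions chol_update and chol_downdate. A small numerical check of the update (note that R's own chol returns an upper triangular factor, whereas this package works with lower triangular ones):

A <- crossprod(matrix(rnorm(9), 3, 3)) + diag(3)  # a positive definite matrix
L <- t(chol(A))                                   # its lower triangular Cholesky factor
u <- rnorm(3)
L_up <- ramcmc::chol_update(L, u)                 # factor of A + u u^T
all.equal(L_up, t(chol(A + tcrossprod(u))))       # should be TRUE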

Illustration

As an example, consider Bayesian estimation of a linear regression model $y_i = x_i^T \beta + \epsilon_i$, $\epsilon_i \sim N(0, \sigma^2)$, where $x_i$ contains an intercept term. For simplicity, we assume non-informative diffuse priors for all model parameters $(\beta, \sigma)$, with the constraint that $\sigma$ must be positive. Let’s first simulate some data:

set.seed(1)
X <- cbind(1, rnorm(100))                # design matrix with an intercept column
theta_true <- c(1, 1, 1)                 # true values of beta_1, beta_2, and sigma
y <- X %*% theta_true[1:2] + rnorm(100)  # simulate responses with sigma = 1

And here is our MCMC algorithm, which takes in the data (y, X), initial values for the parameters (theta0), an initial value for the Cholesky factor of the multivariate normal proposal (S), the number of iterations (n_iter), the length of the burn-in period (n_burnin), and a logical argument adapt which tells whether or not we should adapt the proposal covariance:

metropolis <- function(y, X, theta0, S, n_iter, n_burnin, adapt = FALSE) {
  
  p <- length(theta0)
  theta <- matrix(NA, n_iter, p)
  accept <- numeric(n_iter)
  # log-posterior at the initial value (with diffuse priors this is
  # just the log-likelihood)
  mu <- X %*% theta0[1:(p - 1)]
  posterior <- sum(dnorm(y, mean = mu, sd = theta0[p], log = TRUE))
  theta[1, ] <- theta0
  
  for (i in 2:n_iter) {
    u <- rnorm(p)
    theta_prop <- theta[i - 1, ] + S %*% u
    # the last element of theta is sigma, which must stay positive
    if (theta_prop[p] > 0) {
      mu <- X %*% theta_prop[1:(p - 1)]
      posterior_prop <- sum(dnorm(y, mean = mu, sd = theta_prop[p], log = TRUE))
      acceptance_prob <- min(1, exp(posterior_prop - posterior))
      if (runif(1) < acceptance_prob) {
        accept[i] <- 1
        theta[i, ] <- theta_prop
        posterior <- posterior_prop
      } else {
        theta[i, ] <- theta[i - 1, ]
      }
    } else {
      theta[i, ] <- theta[i - 1, ]
      acceptance_prob <- 0
    }
    # adapt the Cholesky factor of the proposal during burn-in
    if (adapt && i <= n_burnin) {
      S <- ramcmc::adapt_S(S, u, acceptance_prob, i - 1)
    }
  }
  list(theta = theta[(n_burnin + 1):n_iter, ], S = S,
    acceptance_rate = sum(accept[(n_burnin + 1):n_iter]) / (n_iter - n_burnin))
}

Now we can compare the adaptive and non-adaptive versions:

mcmc <- metropolis(y, X, c(0, 0, 1), diag(1, 3), 1e4, 1e4/2)
mcmc_adapt <- metropolis(y, X, c(0, 0, 1), diag(1, 3), 1e4, 1e4/2, adapt = TRUE)
mcmc$acceptance_rate
## [1] 0.004
mcmc_adapt$acceptance_rate
## [1] 0.2464
mcmc_adapt$S
##             [,1]       [,2]      [,3]
## [1,]  0.16338847 0.00000000 0.0000000
## [2,] -0.02958480 0.18924912 0.0000000
## [3,]  0.01159992 0.01019856 0.1243577
hist(mcmc$theta[, 2], main = "theta_2, non-adaptive")

hist(mcmc_adapt$theta[, 2], main = "theta_2, adaptive")

acf(mcmc$theta)

acf(mcmc_adapt$theta)

As we can see, even when we start with a very poor proposal covariance, we obtain good results with the adaptive MCMC, whereas the non-adaptive version is highly inefficient: the chains show very long-range autocorrelations because the acceptance rate is so small.

Linking to other R packages

Although the functions adapt_S (corresponding to the updating equation of the previous section), chol_update, and chol_downdate can be used from R, you can also use the underlying C++ code directly in your own R packages. In order to do so, you just need to add #include "ramcmc.h" to the relevant source files in your package, and add ramcmc to the LinkingTo field in the DESCRIPTION file of your package.
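
For example, a minimal sketch (assuming your package also links to Rcpp and RcppArmadillo, since the headers build on Armadillo), in the DESCRIPTION file:

LinkingTo: Rcpp, RcppArmadillo, ramcmc

and in a C++ source file:

#include "ramcmc.h"
// the functions are then available from C++ in the ramcmc namespace,
// e.g. ramcmc::chol_update, ramcmc::chol_downdate, and ramcmc::adapt_S
// (a sketch; see ramcmc.h for the exact signatures)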

References

Eddelbuettel, Dirk. 2013. Seamless R and C++ Integration with Rcpp. New York: Springer.
Eddelbuettel, Dirk, and Romain François. 2011. “Rcpp: Seamless R and C++ Integration.” Journal of Statistical Software 40 (8): 1–18. https://doi.org/10.18637/jss.v040.i08.
Eddelbuettel, Dirk, and Conrad Sanderson. 2014. “RcppArmadillo: Accelerating R with High-Performance C++ Linear Algebra.” Computational Statistics and Data Analysis 71: 1054–63. http://dx.doi.org/10.1016/j.csda.2013.02.005.
R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Vihola, Matti. 2012. “Robust Adaptive Metropolis Algorithm with Coerced Acceptance Rate.” Statistics and Computing 22 (5): 997–1008. https://doi.org/10.1007/s11222-011-9269-5.