Metropolis-Hastings algorithm in C

In a previous post, I presented the main ideas of the Metropolis-Hastings algorithm and showed how one may implement a random-walk Metropolis and an independent MH sampler in R. Running them, however, took quite some time to complete; hence, in this post, I’ll show how to speed things up by implementing these very algorithms in C.

We will however still rely on R to prepare the data as well as to analyze the MH samplers’ results.

This blog post is divided into four parts. First, I discuss the prerequisites, the C code, and the compilation of the MH algorithm into a shared library. Next, I show how to load this library into R, and how to call our C library function from within R to estimate a Poisson regression model. Third, I benchmark the R and C implementations of the MH samplers to assess the speed gain obtained through compiled code. Fourth, I conclude.

Prerequisites

The MH algorithm heavily relies on mathematical structures such as matrices and vectors, and requires – among other things – quality random number generators as well as statistical distribution functions. Therefore, my implementation of the Metropolis-Hastings algorithms relies on the GNU Scientific Library (GSL), which is not part of C’s standard library. Hence, in order to compile the code into working binaries, the GSL needs to be present and known to the compiler. On Linux, if not already installed, the GSL should be easily installable via the distribution’s package manager. On macOS, one possibility to obtain the library is via Homebrew or MacPorts.1 There may be some binaries available online for Windows, yet the only way I have tested is to build the GSL from source.
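
A quick way to verify that the GSL is installed and visible to your toolchain is to compile and run a tiny test program. This is just an illustration (the file name check_gsl.c is arbitrary); the link flags mirror the compile command used later in this post.

/* check_gsl.c – compile e.g. with: gcc check_gsl.c -lgsl -lgslcblas -lm */
#include <stdio.h>
#include "gsl/gsl_version.h"

int main(void) {
  /* gsl_version is a string constant provided by the GSL */
  printf("GSL version: %s\n", gsl_version);
  return 0;
}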

C implementation

The implementations use several math functions from C’s standard library as well as specialized functions from the GSL, so make sure you have the following headers in your code. I will not discuss in detail how the different GSL functions used throughout this post work; for more details, please consult the GSL documentation.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#include "gsl/gsl_blas.h"
#include "gsl/gsl_cdf.h"
#include "gsl/gsl_linalg.h"
#include "gsl/gsl_randist.h"
#include "gsl/gsl_rng.h"
#include "gsl/gsl_sf_gamma.h"
#include "gsl/gsl_statistics.h"
#include "gsl/gsl_vector.h"

We rely on the same data we used when implementing the Metropolis-Hastings algorithm in R, and investigate the association between the number of journal articles y_i published by biochemistry doctoral students and some of their characteristics x_i . Again, we assume the number of journal articles produced to follow a Poisson distribution:

\begin{align*} y_i &\sim \text{Pois}(\lambda_i) \\ \lambda_i &= \exp(x_i\beta) \end{align*}

Assuming normal priors for \beta , the main building blocks are:

  1. the Poisson log-likelihood \log p(y \mid X, \beta)
  2. the log of the normal prior density p(\beta) , \beta \sim N(b_0, B_0)
  3. the proposal distribution J(\cdot)
  4. possibly a correction factor for unequal jumping probabilities (the Hastings extension); the acceptance probability written out below shows how these pieces fit together
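
Putting these pieces together, a proposal \beta^* drawn from J is accepted with the familiar Metropolis-Hastings acceptance probability (the samplers below compute the ratio on the log scale to avoid numerical underflow):

\begin{align*} r = \frac{p(y \mid X, \beta^*)\, p(\beta^*)}{p(y \mid X, \beta^{(t-1)})\, p(\beta^{(t-1)})} \cdot \frac{J(\beta^{(t-1)} \mid \beta^*)}{J(\beta^* \mid \beta^{(t-1)})}, \qquad \alpha = \min(1, r) \end{align*}

For the symmetric random-walk proposal, the ratio of jumping densities equals one and drops out; for the independence sampler it does not, which is where the correction factor comes in.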

The Poisson log-likelihood function evaluates the data X and y conditional on \beta and returns the corresponding log-likelihood.

// poisson log-likelihood function
double loglik_poisson(const gsl_vector *beta, const gsl_vector *y,
                      const gsl_matrix *X) {
  size_t n = y->size;

  // X * beta
  gsl_vector *xbeta = gsl_vector_calloc(y->size);
  gsl_blas_dgemv(CblasNoTrans, 1, X, beta, 1, xbeta);

  // exp(X * beta)
  gsl_vector *lambda = gsl_vector_alloc(y->size);
  for (size_t i = 0; i < n; i++) {
    gsl_vector_set(lambda, i, exp(xbeta->data[i]));
  }

  // poisson loglikelihood
  double ll = 0;
  for (size_t i = 0; i < n; i++) {
    ll = ll +
	 log(gsl_ran_poisson_pdf((unsigned int) y->data[i], lambda->data[i]));
  }

  gsl_vector_free(lambda);
  gsl_vector_free(xbeta);

  return ll;
}
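
As an aside, the summation loop above could equivalently be written as follows. This is a sketch, not the code used in the post: summing the Poisson log-pmf directly avoids the exp()/log() round trip through gsl_ran_poisson_pdf() and is more robust for large linear predictors. gsl_sf_lngamma() comes from the gsl_sf_gamma.h header already included above.

  // equivalent: log p(y_i | lambda_i) = y_i * (x_i beta) - exp(x_i beta) - log(y_i!)
  double ll = 0;
  for (size_t i = 0; i < n; i++) {
    double eta = xbeta->data[i];               /* linear predictor x_i * beta */
    ll += y->data[i] * eta                     /* y_i * log(lambda_i)         */
          - exp(eta)                           /* - lambda_i                  */
          - gsl_sf_lngamma(y->data[i] + 1.0);  /* - log(y_i!)                 */
  }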

Similarly, the log-prior function computes the log-probability of a value of \beta conditional on the prior mean ( b_0 ) and variance ( B_0 ). The variance is specified in terms of its Cholesky decomposition, as expected by gsl_ran_multivariate_gaussian_log_pdf().

// lognormal prior probability density function
double lognormal_pdf(const gsl_vector *beta, const gsl_vector *mu,
                     const gsl_matrix *Sigma_cholesky) {
  int k = mu->size;
  gsl_vector *work = gsl_vector_alloc(k);

  double lognormal_pdf = 0;
  gsl_ran_multivariate_gaussian_log_pdf(beta, mu, Sigma_cholesky,
                                        &lognormal_pdf, work);

  gsl_vector_free(work);
  return lognormal_pdf;
}

The remaining steps to set up the random-walk Metropolis sampler are rather straightforward (except perhaps for some memory-management tasks typical of C). To run this function, one needs to pass the data y and X, the start values beta_start (which serve as the proposal mean in the first iteration), the variance P of the multivariate normal proposal distribution, the prior mean b_0 and variance B_0, and the number of iterations nsim. We also pass a (pointer to the) matrix betasim, into which the results of the sampler are written.

// random-walk metropolis-hastings sampler
void random_walk_metropolis_hastings(const gsl_vector *y, const gsl_matrix *X,
                                     const gsl_vector *beta_start,
                                     const gsl_matrix *P, const gsl_vector *b_0,
                                     const gsl_matrix *B_0, const size_t nsim,
                                     gsl_matrix *betasim) {
  int k = X->size2;

  // random-walk vector of means beta[i-1] and vector of means
  // from proposal distribution
  gsl_vector *beta_i_minus_1 = gsl_vector_calloc(k);
  gsl_vector *prop = gsl_vector_calloc(k);

  // proposal variance cholesky decomposition
  gsl_matrix *P_cholesky = gsl_matrix_calloc(k, k);
  gsl_matrix_memcpy(P_cholesky, P);
  gsl_linalg_cholesky_decomp1(P_cholesky);

  // prior variance cholesky decomposition
  gsl_matrix *B_0_cholesky = gsl_matrix_calloc(k, k);
  gsl_matrix_memcpy(B_0_cholesky, B_0);
  gsl_linalg_cholesky_decomp1(B_0_cholesky);

  // initialize RNG
  gsl_rng *rng = gsl_rng_alloc(gsl_rng_mt19937);
  gsl_rng_set(rng, 0);

  double r = 0;
  double alpha = 0;

  for (size_t i = 1; i < nsim; i++) {
    // 1. proposal values
    gsl_matrix_get_row(beta_i_minus_1, betasim, i - 1);
    gsl_ran_multivariate_gaussian(rng, beta_i_minus_1, P_cholesky, prop);

    // 2. compute acceptance probability
    r = exp(
	(loglik_poisson(prop, y, X) + lognormal_pdf(prop, b_0, B_0_cholesky)) -
	(loglik_poisson(beta_i_minus_1, y, X) +
	 lognormal_pdf(beta_i_minus_1, b_0, B_0_cholesky)));

    // 3. accept/reject
    alpha = (r < 1 ? r : 1);
    if (gsl_ran_flat(rng, 0, 1) <= alpha) {
      gsl_matrix_set_row(betasim, i, prop);
    } else {
      gsl_matrix_set_row(betasim, i, beta_i_minus_1);
    }
  }

  // free allocated memory
  gsl_vector_free(beta_i_minus_1);
  gsl_vector_free(prop);
  gsl_matrix_free(P_cholesky);
  gsl_matrix_free(B_0_cholesky);
  gsl_rng_free(rng);
}
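
Although this post calls the samplers through a wrapper function and R (see below), the following minimal, hypothetical driver sketches how random_walk_metropolis_hastings() could be called directly from C. It assumes the headers and functions defined above, a tiny intercept-only model, and made-up count data; none of it is part of the post’s actual code.

int main(void) {
  const size_t n = 5, k = 1, nsim = 1000;
  const double counts[] = {1, 0, 2, 1, 3};        /* made-up article counts */

  gsl_vector *y = gsl_vector_alloc(n);
  gsl_matrix *X = gsl_matrix_alloc(n, k);
  for (size_t i = 0; i < n; i++) {
    gsl_vector_set(y, i, counts[i]);
    gsl_matrix_set(X, i, 0, 1.0);                 /* intercept-only design matrix */
  }

  gsl_vector *beta_start = gsl_vector_calloc(k);  /* start at 0 */
  gsl_vector *b_0 = gsl_vector_calloc(k);         /* prior mean 0 */
  gsl_matrix *P = gsl_matrix_calloc(k, k);        /* proposal variance */
  gsl_matrix *B_0 = gsl_matrix_calloc(k, k);      /* vague prior variance */
  gsl_matrix_set(P, 0, 0, 0.1);
  gsl_matrix_set(B_0, 0, 0, 1e4);

  /* results matrix; every row starts out at beta_start, as in the wrapper below */
  gsl_matrix *betasim = gsl_matrix_calloc(nsim, k);
  for (size_t i = 0; i < nsim; i++) {
    gsl_matrix_set_row(betasim, i, beta_start);
  }

  random_walk_metropolis_hastings(y, X, beta_start, P, b_0, B_0, nsim, betasim);

  /* posterior mean of the intercept */
  double sum = 0;
  for (size_t i = 0; i < nsim; i++) {
    sum += gsl_matrix_get(betasim, i, 0);
  }
  printf("posterior mean (intercept): %f\n", sum / (double) nsim);

  gsl_vector_free(y);
  gsl_matrix_free(X);
  gsl_vector_free(beta_start);
  gsl_vector_free(b_0);
  gsl_matrix_free(P);
  gsl_matrix_free(B_0);
  gsl_matrix_free(betasim);
  return 0;
}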

The structure of the independent Metropolis sampler is almost identical to the random-walk sampler’s. What differs is that the candidate values are now drawn from a proposal distribution with a fixed mean \theta_1 and variance V_1 . Consequently, we need to adjust for unequal jumping probabilities when computing the acceptance ratio, since P_J(\theta^{(t-1)}|\theta^*) \neq P_J(\theta^*|\theta^{(t-1)}) .

// independence metropolis-hastings sampler
void independence_metropolis_hastings(const gsl_vector *y, const gsl_matrix *X,
                                      const gsl_vector *theta_1,
                                      const gsl_matrix *V_1,
                                      const gsl_vector *b_0,
                                      const gsl_matrix *B_0, const size_t nsim,
                                      gsl_matrix *betasim) {
  int k = X->size2;

  gsl_vector *beta_i_minus_1 = gsl_vector_calloc(k);
  gsl_vector *prop = gsl_vector_calloc(k);

  // proposal variance cholesky decomposition
  gsl_matrix *V_1_cholesky = gsl_matrix_calloc(k, k);
  gsl_matrix_memcpy(V_1_cholesky, V_1);
  gsl_linalg_cholesky_decomp1(V_1_cholesky);

  // prior variance cholesky decomposition
  gsl_matrix *B_0_cholesky = gsl_matrix_calloc(k, k);
  gsl_matrix_memcpy(B_0_cholesky, B_0);
  gsl_linalg_cholesky_decomp1(B_0_cholesky);

  // initialize RNG
  gsl_rng *rng = gsl_rng_alloc(gsl_rng_mt19937);
  gsl_rng_set(rng, 0);

  double r = 0;
  double alpha = 0;

  for (size_t i = 1; i < nsim; i++) {
    // 1. proposal values
    gsl_matrix_get_row(beta_i_minus_1, betasim, i - 1);
    gsl_ran_multivariate_gaussian(rng, theta_1, V_1_cholesky, prop);

    // 2. compute acceptance probability
    r = exp(
	(loglik_poisson(prop, y, X) + lognormal_pdf(prop, b_0, B_0_cholesky)) -
	(loglik_poisson(beta_i_minus_1, y, X) +
	 lognormal_pdf(beta_i_minus_1, b_0, B_0_cholesky)) +
	(lognormal_pdf(beta_i_minus_1, theta_1, V_1_cholesky) -
	 lognormal_pdf(prop, theta_1, V_1_cholesky)));

    // 3. accept/reject
    alpha = (r < 1 ? r : 1);
    if (gsl_ran_flat(rng, 0, 1) <= alpha) {
      gsl_matrix_set_row(betasim, i, prop);
    } else {
      gsl_matrix_set_row(betasim, i, beta_i_minus_1);
    }
  }

  // free allocated memory
  gsl_vector_free(beta_i_minus_1);
  gsl_vector_free(prop);
  gsl_matrix_free(V_1_cholesky);
  gsl_matrix_free(B_0_cholesky);
  gsl_rng_free(rng);
}
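
Written out, the log acceptance ratio computed in step 2 of the independence sampler above is

\begin{align*} \log r = \big[\ell(\beta^*) + \log p(\beta^*) + \log J(\beta^{(t-1)} \mid \theta_1, V_1)\big] - \big[\ell(\beta^{(t-1)}) + \log p(\beta^{(t-1)}) + \log J(\beta^* \mid \theta_1, V_1)\big], \end{align*}

where \ell denotes the Poisson log-likelihood and J(\cdot \mid \theta_1, V_1) the fixed multivariate normal proposal density; the two J-terms are the correction for unequal jumping probabilities that the random-walk sampler does not need.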

Building the Metropolis-Hastings C binary

Remember that we want to call our MH-samplers from R: we load and prepare the data in R and pass these R objects to the MH-samplers. The C functions described above, however, require GSL data types to work properly. Since R does not pass its objects to C as GSL data types, we need a wrapper function that converts the data coming from R into something our functions can use, and that converts the results back into something R understands. The data we pass from R consist of:

  1. the dependent variable vector y and the covariate matrix X
  2. the number of rows and columns of X
  3. the proposal mean and variance
  4. the prior mean and variance
  5. the number of iterations
  6. which sampler to use: (1) random-walk, (2) independent MH
  7. an empty results matrix, into which the results will be written

The code below shows one way of implementing this wrapper function.

// wrapper function creating gsl-vectors and matrices from array inputs,
// calling random_walk_metropolis_hastings() or
// independence_metropolis_hastings()
void metropolis_hastings(const double *yy, const double *XX, const int *rows,
                         const int *cols, const double *betastart,
                         const double *PP, const double *b0, const double *B0,
                         const int *iter, const int *type, double *Betasim) {
  int n = *rows;
  int k = *cols;
  int nsim = *iter;

  // dependent variable vector
  gsl_vector *y = gsl_vector_calloc(n);
  for (int i = 0; i < n; i++) {
    gsl_vector_set(y, i, yy[i]);
  }

  gsl_matrix *X = gsl_matrix_calloc(n, k);
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < k; j++) {
      gsl_matrix_set(X, i, j, XX[i * k + j]);
    }
  }

  // start values (proposal mean of first iteration),
  // variance of proposal distribution
  gsl_vector *beta_start = gsl_vector_calloc(k);
  for (int i = 0; i < k; i++) {
    gsl_vector_set(beta_start, i, betastart[i]);
  }

  gsl_matrix *P = gsl_matrix_calloc(k, k);
  for (int i = 0; i < k; i++) {
    for (int j = 0; j < k; j++) {
      gsl_matrix_set(P, i, j, PP[i * k + j]);
    }
  }

  // results-matrix
  gsl_matrix *betasim = gsl_matrix_calloc(nsim, k);
  for (int i = 0; i < nsim; i++) {
    gsl_matrix_set_row(betasim, i, beta_start);
  }

  // prior mean and variance
  gsl_vector *b_0 = gsl_vector_calloc(k);
  for (int i = 0; i < k; i++) {
    gsl_vector_set(b_0, i, b0[i]);
  }

  gsl_matrix *B_0 = gsl_matrix_calloc(k, k);
  for (int i = 0; i < k; i++) {
    for (int j = 0; j < k; j++) {
      gsl_matrix_set(B_0, i, j, B0[i * k + j]);
    }
  }

  // run metropolis hastings algorithm
  if (*type == 1) {
    random_walk_metropolis_hastings(y, X, beta_start, P, b_0, B_0, nsim,
				    betasim);
  } else if (*type == 2) {
    independence_metropolis_hastings(y, X, beta_start, P, b_0, B_0, nsim,
				     betasim);
  }

  // pass results to results-array
  for (int i = 0; i < nsim; i++) {
    for (int j = 0; j < k; j++) {
      Betasim[i * k + j] = gsl_matrix_get(betasim, i, j);
    }
  }

  // free allocated memory
  gsl_vector_free(y);
  gsl_matrix_free(X);

  gsl_vector_free(beta_start);
  gsl_matrix_free(P);
  gsl_matrix_free(betasim);

  gsl_vector_free(b_0);
  gsl_matrix_free(B_0);
}
This being said, we can now compile our C code into a shared library (.so/.dylib/.dll) using GCC (or any other decent C compiler).2

gcc metropolis_hastings_rw.c -lgsl -lgslcblas -lm -std=c99 -shared -o metropolis_hastings_rw.so

Using the MH-sampler library in R

Having built our library metropolis_hastings_rw.so, let us estimate the models we already encountered when implementing the MH algorithm in R. First of all, we have to load the data and prepare them for use with the new functions: we load the biochemists data and extract our dependent variable – the count of journal articles published by biochemistry doctoral students ( y ) – as well as the covariate matrix X containing the doctoral students’ characteristics. Then follow the prior mean b_0 and variance B_0 , and the mean and variance of the proposal density.

# ---- cat metropolis_hastings_c.R ----
## ---- Implementing the Metropolis-Hastings algorithm in C ----
library(pscl)

# ---- Data ----
head(bioChemists, 10)
y <- bioChemists$art
X <- model.matrix(art ~ ., data = bioChemists)
dimnames(X)[[2]] <- c("intercept", "woman", "married",
		      "kid", "phd", "mentor")

# ---- Priors ----
k <- ncol(X)
b_0 <- vector(mode = "numeric", length = k)
B_0 <- diag(rep(10^4, k))

The proposal density’s mean differs according to the algorithm used. In the random-walk case, the mean equals the value that was accepted at the previous iteration of the algorithm. When using the independent MH, we set the mean to \beta_1 = (B_0^{-1} + V^{-1})^{-1} (B_0^{-1} b_0 + V^{-1} \hat\beta) . In both cases, the variance is set to P = V_1 = T (B_0^{-1} + V^{-1})^{-1} T with T = 1.1\cdot I , where \hat\beta and V are the maximum-likelihood estimates and their variance-covariance matrix.

inits <- glm(art ~ ., data = bioChemists,
	     family = poisson(link = "log"))

# Random-walk MH
beta_start <- coef(inits)
V <- vcov(inits)
Tmat <- diag(rep(1.1, k))
P <- Tmat %*% solve(solve(B_0) + solve(V)) %*% Tmat

# Independent MH
beta_hat <- coef(inits)
beta_1  <- solve(solve(B_0) + solve(V)) %*%
  (solve(B_0) %*% b_0 + solve(V) %*% beta_hat)
V_1 <- Tmat %*% solve(solve(B_0) + solve(V)) %*% Tmat

# ---- Input objects ----
iter_rw <- as.integer(10^5)
iter_in <- as.integer(10^4)

# data
rows_of_X <- nrow(X)
cols_of_X <- ncol(X)

y <- as.double(y)

As noted earlier, our C functions require specific GSL data types; R does not use these data types. Furthermore, R’s .C()-interface passes all data as pointers to the R objects’ locations in memory. The interface thus provides the objects’ addresses, but information about their structure (e.g. the dimensions of a matrix, or the length of a vector) is lost. To alleviate this issue, we have to transform our R objects and pass the transformed objects, together with information about their original structure, to .C(). With the objects’ locations in memory and their structure, we can reconstruct them in C with data types suitable for use in our MH-functions.3

What kind of transformations are necessary? In short, we need to take care of those objects whose structural information is lost when passing solely their addresses – in our case, vectors and matrices. Vectors are stored in a contiguous block of memory; when passing them to .C(), we need to make sure to pass the correct data type as well as information about the vectors’ length. Matrices require a bit more work. First, we need to transform each matrix into a vector (of length n \times k ). We also have to keep in mind that R stores matrices in column-major order, whereas C stores multidimensional arrays in row-major order. Therefore, we transpose R’s matrices using t(), and then transform them into vectors with as.vector().

beta_start <- as.double(beta_start)
P <- as.double(as.vector(t(P)))

beta_hat <- coef(inits)
theta_1  <- as.vector(beta_1)
V_1 <- as.double(as.vector(t(V_1)))

# create empty results object to be filled by C functions
betasim_rw <- vector(mode = "double", length = iter_rw * cols_of_X)
betasim_in <- vector(mode = "double", length = iter_in * cols_of_X)

With these preparations done, we can load our library and call the Metropolis-Hastings functions:4

# ---- Run Metropolis-Hastings algorithms ----
# ---- load shared library ----
dyn.load("bin/metropolis_hastings.so")

# ---- run algorithms ----
x_rw <- .C("metropolis_hastings",
	   y, X,
	   rows_of_X, cols_of_X,
	   beta_start, P,
	   b_0, B_0,
	   iter_rw,
	   1L,
	   betasim_rw)

x_in <- .C("metropolis_hastings",
	   y, X,
	   rows_of_X, cols_of_X,
	   beta_1, V_1,
	   b_0, B_0,
	   iter_in,
	   2L,
	   betasim_in)

# ---- unload shared library ----
dyn.unload("bin/metropolis_hastings.so")

# ---- Prepare and save results ----
results_rw <- matrix(x_rw[[11]], byrow = TRUE, ncol = 6)[-c(1:1000), ]
colnames(results_rw) <- c("intercept", "woman", "married", "kid", "phd", "mentor")

results_in <- matrix(x_in[[11]], byrow = TRUE, ncol = 6)[-c(1:1000), ]
colnames(results_in) <- c("intercept", "woman", "married", "kid", "phd", "mentor")

save(list = c("results_rw", "results_in"), file = "results_data/metropolis_hastings_c.RData")

# ---- Results table ----
library(dplyr)
library(orgutils)
rw_df <- cbind(`estimate` = apply(results_rw, 2, mean),
	       `2.5%` = apply(results_rw, 2, quantile, 0.025),
	       `97.5%` = apply(results_rw, 2, quantile, 0.975),
	       `sd` = apply(results_rw, 2, sd),
	       `P(beta < 0)` = apply(results_rw, 2, function(x) mean(x < 0)),
	       `P(beta > 0)` = apply(results_rw, 2, function(x) mean(x > 0)))
Table 1: Random-walk MH: Bayesian Poisson regression estimates, 95% credible intervals, standard deviations, and probability of negative/positive association
estimate 2.5% 97.5% \sigma_{\textbf{estimate}} P(\beta < 0) P(\beta > 0)
intercept 0.302 0.096 0.502 0.104 0.002 0.998
woman -0.226 -0.332 -0.118 0.055 1.000 0.000
married 0.157 0.036 0.280 0.062 0.006 0.994
kid -0.186 -0.266 -0.109 0.040 1.000 0.000
phd 0.013 -0.038 0.065 0.026 0.313 0.687
mentor 0.026 0.022 0.029 0.002 0.000 1.000

Running our C implementation of the random-walk Metropolis-Hastings algorithm for 100,000 iterations produces results almost identical5 to those of R (see table 1, and here for the results of the R implementation). What does differ substantially, however, is the time the C and R implementations take to finish: whereas the pure R version took about 45 seconds, the C version completed after roughly 10 seconds.

Some speed comparisons

Execution times of 10 vs. 45 seconds strongly suggest that our C implementation of the random-walk Metropolis-Hastings algorithm is in fact faster than the equivalent R function. Having run each function only once, however, we should not put too much weight on these estimates. To get a better idea of our functions’ speed differences, I ran the R and C implementations of the random-walk and the independent MH algorithms 1,000 times for 1,000 iterations each.

Table 2: Metropolis-Hastings algorithm: implementation-dependent computation time required to run 1,000 iterations (time in milliseconds; the R/C columns give the ratio of R to C run times)
avg. duration 2.5% 97.5% R/C 2.5% 97.5%
random-walk R 471.39 447.28 522.13 4.70 4.26 5.25
random-walk C 100.55 97.65 108.24 1.00 1.00 1.00
independent R 773.97 746.06 846.86 7.62 7.01 8.45
independent C 101.59 98.57 108.13 1.00 1.00 1.00

Table 2 summarizes the time each implementation of the algorithms took to generate 1,000 draws from the posterior. The random-walk MH algorithm written in C is on average about 4.7 times faster than its R equivalent, with 95 % of the observed speedups lying between 4.3 and 5.3. The speedup is even more pronounced for the independent MH algorithm: here, we find on average a more than 7-fold [7.0, 8.5] increase in speed.

The main cause of the differences in speedup lies in the algorithms themselves: the random-walk algorithm draws from a symmetric proposal distribution and hence does not require any correction step when computing the acceptance ratios. The independent MH sampler, drawing candidate values from a distribution with a fixed mean, does require such a correction step. As a matter of fact, comparing the respective R functions reveals that the independent MH sampler is roughly 1.6 times slower than the random-walk version. Interestingly, the MH algorithms implemented in C display no such difference in computation time.

Conclusion

In this post, we have seen one way to implement the Metropolis-Hastings algorithm in C, motivated by the observation that running the same algorithm in R takes a while. The C implementations do indeed allow for faster computation times, with speedups from around 4.7 (on my machine) to 7.6, depending on the algorithm.

Note, however, that these speedups do not come without cost, as writing C code may take longer than expressing the same ideas in e.g. R: C requires manual memory management (and keeping track of allocated memory) as well as compilation, which makes testing more tedious than in R. Furthermore, interfacing R with C requires knowledge of how each language stores data internally, and of how to pass data between the two.

Note: For the C and R code discussed in this blog post, see https://git.staudtlex.de/blog/metropolis-hastings-c.


  1. An alternative is to build the library from source. Note however that some additional work may be required to let the compiler know where to find the library when compiling. ↩︎

  2. On Windows, note that you may have to adjust library and include-paths in order for the compiler to find the GSL-library. ↩︎

  3. The .C() interface provides direct (yet quite bare-bones) access to compiled C code in a shared library. R also comes with the higher-level .Call() interface, which has the advantage that R objects keep their structural information when passed to C. However, this requires including R-specific header files in our C code and linking it against R. Since, in this post, I want the C code to be independent of any R-specific headers, I use .C(). ↩︎

  4. Be aware that passing inappropriate arguments to .C() may quite easily result in crashing the current R session. Proceed with caution. ↩︎

  5. The differences arising from both implementations are most likely due to different random-number generators and/or initial states of the random number generators. ↩︎