Interfacing a Fortran library function with C

Sat, Jul 18, 2020

In this post, I describe how to call a Fortran linear regression function from C, as well as the steps required to import the data from a space delimited text file.

Using libraries

There are several reasons for using libraries. For instance, a library might provide very fast or very specialized algorithms and functionalities that we would otherwise have to write from scratch. Using existing code avoids the risk of introducing idiosyncratic coding errors¹ and reduces development time. When the project’s programming language and the library we want to use are written in the same language, there is not much to including a library:

In R, use library(package_name) to use the required library
In C/C++, include the relevant header-files and – if required – link to the appropriate library file
In Java and Python, import the classes/packages to be used

Yet, it may as well happen that the library of interest is implemented in a different programming language. In that case, “importing” that library to call the desired function requires a bit more effort. In a series of posts, I will demonstrate how to call a Fortran library function from three programming languages, namely C, Java, and Python.² This post treats interfacing Fortran with C.

The remainder of this post is divided into six sections. In the first section, I discuss why (of all programming languages available) I want to use C to interface the Fortran library function. Section two and three describe the Fortran function that we will call from C, as well as the data that we will use. Section four discusses the tasks the code needs to take care of, and their implementation in C. Section five demonstrates how to compile and run the Fortran library function from C. The last section concludes.

Why C?

C, first developed in the early 1970s, is a programming language that is widely used for operating systems programming (e.g. Unix, Linux, macOS/Darwin, Windows NT are written in C).³ Being available on so many operating systems, C constitutes the lingua franca for interfacing with different programming languages: Interfacing is often achieved by translating language-specific data types into their C equivalents, passing these to the compiled (here: Fortran) code, and translating back to the calling language. Playing the key role in this process, it appears reasonable to first interface compiled Fortran code directly with C.

Fortran linear regression library

The function we will call is the ordinary least squares subroutine we wrote in a previous blog post to estimate regression coefficients using ordinary least squares in Fortran. The subroutine is defined as follows:

            
              
                  
                     1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41

                  
                  
                    subroutine linreg(y, X, n, k, beta, vcov, se)

  ! avoid variables i, j, k, l, n to be treated as integer automatically
  implicit none

  ! declare type of input objects
  integer :: n, k
  real(8) :: y(n), X(n, k), beta(k), vcov(k, k), se(k)

  ! type of intermediary objects
  integer :: ipiv(k), info, i
  real(8) :: XpX(k, k), invXpX(k, k), work(k), Xpy(k), e(n)

  ! compute point estimates:
  ! intermediary steps
  ! (1) X'X
  XpX = matmul(transpose(X), X)

  ! (2) (X'X)^-1: invert XpX using LAPACK functions
  invXpX = XpX
  call dgetrf(k, k, invXpX, k, ipiv, info)
  call dgetri(k, invXpX, k, ipiv, work, k, info)

  ! (3) X'y
  Xpy = matmul(transpose(X), y)

  ! (4) beta = (X'X)^-1 X'y
  beta = matmul(invXpX, Xpy)

  ! compute variance-covariance matrix:
  ! var(betahat|X) = sigmahat^2 * (X'X)^-1
  ! sigmahat^2 = e'e / (n - k)
  ! e'e = (y - Xbeta)'(y - Xbeta)
  e = y - matmul(X, beta)
  vcov = dot_product(e, e) / (n - k) * invXpX

  ! compute standard errors:
  ! square root of main diagonal of variance-covariance matrix
  se = (/ (sqrt(vcov(i, i)), i = 1,k) /)

end subroutine linreg

                  
                

            
          

After compiling the code into a shared library, let’s have a look at its symbol table (using the UNIX nm command) to see what objects it contains:

gfortran linreg.f95 -o liblinreg.so -shared -lblas -llapack -std=f95
nm -g -n liblinreg.so

		 w __cxa_finalize@@GLIBC_2.2.5
		 U dgetrf_
		 U dgetri_
		 U free@@GLIBC_2.2.5
		 U _gfortran_matmul_r8@@GFORTRAN_1.0
		 w __gmon_start__
		 w _ITM_deregisterTMCloneTable
		 w _ITM_registerTMCloneTable
		 w _Jv_RegisterClasses
		 U malloc@@GLIBC_2.2.5
00000000000006a0 T _init
0000000000000830 T linreg_
0000000000001490 T _fini
0000000000202048 B __bss_start
0000000000202048 D _edata
0000000000202050 B _end

The library contains a reference to our linreg-subroutine (linreg_) as well as references to functions called within the subroutine, i.e. memory allocation functions (malloc and free), intrinsics (_gfortran_matmul_r8), and LAPACK-routines (dgetrf_, dgetri_). Note that the various underscores are due to compiler name mangling rules.⁴ Running the linreg-subroutine requires calling the function linreg_ from C (or other languages such as Java or Python).

Data and model

I will estimate an OLS regression model based on the R attitude dataframe, which contains aggregated responses of clerical employees of a large financial organization. Each observation corresponds to the responses of approximately 35 employees of each randomly selected department. To keep things simple, I exported the R dataframe as a space separated text file. The first row of that text file contains the variable names, the second and third the number of observations and variables respectively. The respondents’ response values are stored in the subsequent rows.

rating complaints privileges learning raises critical advance
30 30 30 30 30 30 30
7 7 7 7 7 7 7
43 51 30 39 61 92 45
63 64 51 54 63 73 47
71 70 68 69 76 86 48
61 63 45 47 54 84 35
81 78 56 66 71 83 47
43 55 49 44 54 49 34

The response values represent the average percent proportion of favourable responses to seven questions in each department.⁵

Table 1: Variable description

Name	Symbol	Description
rating		overall rating
complaints		handling of employee concerns
privileges		does not allow special privileges
learning		opportunity to learn
raises		raises based on performance
critical		too critical
advance		advancement

Given this data, we want to estimate how much each variable contributes to employees’ overall department rating. We model this relationship as a linear model:

C code

Given the Fortran subroutine linreg and the attitude data stored as a space-separated text file, what does the C code need to do to successfully call linreg?

First, we have to provide the C compiler with information about the external function and its inputs (see line 6-7). From the source code, we know that linreg takes as arguments a vector for the dependend variable, a covariate matrix , the number of cases and covariates, as well as an (empty) vector for the estimated coefficients , an (empty) variance-covariance matrix vcov and an (empty) vector for the estimated standard errors se. Since Fortran passes function inputs by reference, the corresponding inputs of the function prototype are declared as pointers to double or integer.

Second, import the space-separated text file containing the attitude data and write the information into appropriate arrays (lines 12-61). In this example, this involves

to read the variable names (lines 16-38)
to determine the number of observations and variables (lines 41-46)
to allocate an array of appropriate size (line 49)
to read/parse the remaining data and to store it in the array (lines 50-60)

Note that in C, data is layed out in row-major order, whereas Fortran uses column-major order. To avoid any headaches converting arrays from one layout to the other, fill the array column-wise (this will make column manipuliations easier).

Third, prepare the function inputs, i.e. the dependent variable , the covariate matrix (including the intercept), as well as the vectors of coefficients and standard errors, and the variance-covariance matrix (lines 64-77).

Fourth, call linreg (line 80). The remaining code deals with printing the results to the screen (lines 83-96) and freeing the previously allocated memory (99-104).

            
              
                  
                      1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107

                  
                  
                    #include <stdio.h>
#include <stdlib.h>
#include <string.h>

// declare prototype of external function
extern void linreg_(double *y, double *x, int *n, int *k, double *beta,
		    double *vcov, double *se);

int main(void) {

  // read data from file
  FILE *file;
  file = fopen("attitude.txt", "r");

  // variable names (first row)
  char str[100]; // assuming the data is at most 100 characters wide
  fgets(str, sizeof(str), file);

  char delim[2] = " ";
  char **varnames = calloc(100, sizeof(char*));
  char *token = strtok(str, delim);

  int i = 0;
  int j = 0;
  int strl = 0;

  while (token != NULL) {
      varnames[i] = strdup(token);
      token = strtok(NULL, delim);
      // replace newlines with line null-string
      strl = strlen(varnames[i]);
      if (varnames[i][strl - 1] == '\n') {
	  varnames[i][strl - 1] = '\0';
      }
      i++;
  }
  varnames = realloc(varnames, i * sizeof(char*));
  varnames[0] = "intercept";

  // number of observations (second row)
  fgets(str, sizeof(str), file);
  int n = atoi(str);

  // number of variables (third row)
  fgets(str, sizeof(str), file);
  int k = atoi(str);

  // remaining data
  int *f_data = calloc(n * k, sizeof(int));
  i = 0;
  while (fgets(str, sizeof(str), file) != NULL) {
      j = 0;
      token = strtok(str, delim);
      while (token != NULL) {
	  f_data[n * j + i] = atoi(strdup(token));
	  token = strtok(NULL, delim);
	  j++;
      }
      i++;
  }
  fclose(file);

  // set up arrays for OLS-function
  double *y = calloc(n, sizeof(double));
  double *X = calloc(n * k, sizeof(double));
  double *beta = calloc(k, sizeof(double));
  double *vcov = calloc(k * k, sizeof(double));
  double *se = calloc(k, sizeof(double));

  // data is written to arrays in column-major format
  for (int i = 0; i < n; i++) {
    y[n * 0 + i] = f_data[i];
    X[n * 0 + i] = 1;
    for (int j = 1; j < k; j++) {
      X[n * j + i] = f_data[n * j + i];
    }
  }

  // call Fortran OLS-function
  linreg_(y, X, &n, &k, beta, vcov, se);

  // display results
  printf("\nData:\n");
  printf("========\n");
  printf("n = %i\n", n);
  printf("k = %i\n", k);
  printf("========\n");
  printf("\n");
  printf("Results:\n");
  printf("=======================================\n");
  printf("           Estimate  Std. Err  z-Value \n");
  printf("=======================================\n");
  for (int i = 0; i < k; i++) {
      printf("%-10s   %6.3f    %6.3f   %6.3f\n", varnames[i], beta[i], se[i], beta[i] / se[i]);
  }
  printf("=======================================\n");

  // free memory
  free(f_data);
  free(y);
  free(X);
  free(beta);
  free(vcov);
  free(se);

  return 0;
}

                  
                

            
          

Compilation and results

Having defined the necessary C code, we are now all set to create the binary (which will be named a.out) that will automatically load the data and run the Fortran linear regression function. Note that in the C code, we specified attitude.txt to be located in the same directory as a.out. Hence, make sure that the binary and the data file share the same directory.

# compile
gcc call_fortran.c liblinreg.so -Wall -Wpedantic -std=c99 -D_POSIX_C_SOURCE=200809L
# run
./a.out

Running the binary yields the output displayed below⁶, comprising the dimensions of the data (30 observations, 7 variables) and the regression estimates. According to the estimates, complaints (the way how complaints are handled) is positively associated with the overall rating. An increase of one percentage point in complaints is associated with an in increase in the overall rating by around 0.6 percentage points (±0.3).⁷

Data:
========
n = 30
k = 7
========

Results:
=======================================
	   Estimate  Std. Err  z-Value
=======================================
intercept    10.787    11.589    0.931
complaints    0.613     0.161    3.809
privileges   -0.073     0.136   -0.538
learning      0.320     0.169    1.901
raises        0.082     0.221    0.369
critical      0.038     0.147    0.261
advance      -0.217     0.178   -1.218
=======================================

Concluding remarks

In this (first) post about interfacing foreign language functions, we saw how to call our linreg Fortran function from C. The steps required to run the Fortran library from C was (1) to determine the function name (as displayed with the nm command) and input data types (int, real, double, char etc.), and (2) to write C code that takes care of data handling, memory allocation, and calling the foreign function, and (3) to compile, link, and run the resulting binary. In an upcoming post, I will show how to interface the same library with Java.⁸

Note: For this post’s accompanying code, see https://git.staudtlex.de/blog/call-fortran-from-c.

Assuming that said library has been extensively tested and proven reliable. ↩︎
For information about how to load and call foreign library functions from R, see here and the R documentation, especially dyn.load() and dyn.unload() and Registering native routines. ↩︎
Furthermore, C provides a low-level access to typical machine instructions, allowing the efficient implementation of computationally demanding algorithms. For further information about C, see this Wikipedia-entry. ↩︎
See Fortran naming conventions. ↩︎
For more details, see the R documentation. ↩︎
For a comparison of the results of our Fortran linreg library with R’s lm(), see this blog post. ↩︎
Learning (the opportunities given to employees to learn) as well seems to be positvely associated with the overall rating, yet the effect size of 0.32 (±0.33) does not statistically differ from zero at the 95%-level. ↩︎
For a quick overview about the Java programming language, see this Wikipedia entry. ↩︎