Interfacing a Fortran library function with C
In this post, I describe how to call a Fortran linear regression function from C, as well as the steps required to import the data from a space-delimited text file.
Using libraries
There are several reasons for using libraries. For instance, a library might provide very fast or highly specialized algorithms and functionality that we would otherwise have to write from scratch. Using existing code avoids the risk of introducing idiosyncratic coding errors1 and reduces development time. When our project and the library we want to use are written in the same language, including a library takes little effort:
- In R, use library(package_name) to load the required library
- In C/C++, #include the relevant header files and, if required, link to the appropriate library file
- In Java and Python, import the classes/packages to be used
Yet, the library of interest may just as well be implemented in a different programming language. In that case, “importing” that library to call the desired function requires a bit more effort. In a series of posts, I will demonstrate how to call a Fortran library function from three programming languages, namely C, Java, and Python.2 This post covers interfacing Fortran with C.
The remainder of this post is divided into six sections. In the first section, I discuss why (of all programming languages available) I want to use C to interface the Fortran library function. Sections two and three describe the Fortran function that we will call from C and the data that we will use. Section four discusses the tasks the code needs to take care of and their implementation in C. Section five demonstrates how to compile the code and run the Fortran library function from C. The last section concludes.
Why C?
C, first developed in the early 1970s, is a programming language that is widely used for operating systems programming (e.g. Unix, Linux, macOS/Darwin, and Windows NT are largely written in C).3 Being available on virtually every operating system, C serves as the lingua franca for interfacing different programming languages: interfacing is often achieved by translating language-specific data types into their C equivalents, passing these to the compiled (here: Fortran) code, and translating the results back into the calling language. Since C plays the key role in this process, it seems reasonable to first interface compiled Fortran code directly with C.
Fortran linear regression library
The function we will call is linreg, the Fortran subroutine we wrote in a previous blog post to estimate regression coefficients using ordinary least squares (OLS).
After compiling the code into a shared library, let’s have a look at its symbol table (using the UNIX nm command) to see which symbols it contains:
gfortran linreg.f95 -o liblinreg.so -shared -fPIC -lblas -llapack -std=f95
nm -g -n liblinreg.so
w __cxa_finalize@@GLIBC_2.2.5
U dgetrf_
U dgetri_
U free@@GLIBC_2.2.5
U _gfortran_matmul_r8@@GFORTRAN_1.0
w __gmon_start__
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
w _Jv_RegisterClasses
U malloc@@GLIBC_2.2.5
00000000000006a0 T _init
0000000000000830 T linreg_
0000000000001490 T _fini
0000000000202048 B __bss_start
0000000000202048 D _edata
0000000000202050 B _end
The library defines our linreg subroutine (linreg_, listed with type T, i.e. located in the code section) and references the functions called within the subroutine, which are still undefined (type U): memory allocation functions (malloc and free), the gfortran runtime routine behind the matmul intrinsic (_gfortran_matmul_r8), and LAPACK routines (dgetrf_, dgetri_). Note that the various underscores are due to compiler name mangling rules.4 Running the linreg subroutine therefore requires calling the function linreg_ from C (or from other languages such as Java or Python).
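The trailing underscore on linreg_ (and on dgetrf_ and dgetri_) follows gfortran's default convention for external, non-module procedures: the Fortran name is lower-cased and a single underscore is appended. Module procedures are mangled differently, and flags such as -fno-underscoring change the result. The hypothetical helper gfortran_symbol below merely spells out this default rule in C as an illustration; the post's own code does not need it.

```c
#include <ctype.h>
#include <stddef.h>

/* Illustrative helper (hypothetical, not part of the post's code): builds the
   symbol name gfortran emits by default for an external, non-module procedure,
   i.e. the lower-cased name followed by one underscore.
   Returns 0 on success, -1 if out cannot hold the result. */
int gfortran_symbol(const char *fortran_name, char *out, size_t size)
{
    size_t i = 0;
    for (; fortran_name[i] != '\0'; i++) {
        if (i + 2 > size)              /* keep room for '_' and '\0' */
            return -1;
        out[i] = (char)tolower((unsigned char)fortran_name[i]);
    }
    if (i + 2 > size)
        return -1;
    out[i] = '_';
    out[i + 1] = '\0';
    return 0;
}
```

For example, gfortran_symbol("LINREG", buf, sizeof buf) writes "linreg_" into buf, which is the name nm lists for our subroutine.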
Data and model
I will estimate an OLS regression model based on the R attitude
dataframe, which contains aggregated responses of clerical employees of a large financial organization. Each observation corresponds to the responses of approximately 35 employees of each randomly selected department. To keep things simple, I exported the R dataframe as a space separated text file. The first row of that text file contains the variable names, the second and third the number of observations and variables respectively. The respondents’ response values are stored in the subsequent rows.
rating complaints privileges learning raises critical advance
30 30 30 30 30 30 30
7 7 7 7 7 7 7
43 51 30 39 61 92 45
63 64 51 54 63 73 47
71 70 68 69 76 86 48
61 63 45 47 54 84 35
81 78 56 66 71 83 47
43 55 49 44 54 49 34
The response values give the percentage of favourable responses to seven questions, aggregated per department.5
Name | Description
---|---
rating | overall rating
complaints | handling of employee concerns
privileges | does not allow special privileges
learning | opportunity to learn
raises | raises based on performance
critical | too critical
advance | advancement
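The second and third header rows of the text file repeat n and k once per column. A minimal sketch of how such a row could be turned into a single count, assuming every field on the row must agree (the function name parse_count_row and the consistency check are illustrative, not taken from the post's code):

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: reads a header row such as "30 30 30 30 30 30 30",
   checks that all fields repeat the same positive integer, and returns it.
   Returns -1 if the row is empty, inconsistent, or has trailing garbage. */
long parse_count_row(const char *line)
{
    char *end;
    long first = strtol(line, &end, 10);
    if (end == line || first <= 0)
        return -1;                        /* no (positive) number on the row */
    const char *p = end;
    for (;;) {
        long v = strtol(p, &end, 10);
        if (end == p)
            break;                        /* no further number */
        if (v != first)
            return -1;                    /* columns disagree */
        p = end;
    }
    while (isspace((unsigned char)*p))
        p++;
    return *p == '\0' ? first : -1;       /* reject non-numeric leftovers */
}
```

Applied to the second and third rows of the excerpt shown earlier, it would return 30 and 7, respectively.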
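Given these data, we want to estimate how much each of the six other variables contributes to employees’ overall department rating. We model this relationship as a linear model:
$$\text{rating}_i = \beta_0 + \beta_1\,\text{complaints}_i + \beta_2\,\text{privileges}_i + \beta_3\,\text{learning}_i + \beta_4\,\text{raises}_i + \beta_5\,\text{critical}_i + \beta_6\,\text{advance}_i + \varepsilon_i, \qquad i = 1, \dots, 30$$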
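For reference, stacking the 30 observations gives the matrix form y = Xβ + ε, where y holds the ratings and X is the 30 × 7 matrix with a column of ones (intercept) followed by the six covariates. The block below lists the textbook OLS quantities that correspond to linreg's outputs (coefficients, variance-covariance matrix, standard errors) and to the z-values reported later. It assumes that linreg implements this standard estimator, including the n − k correction for the residual variance; the LAPACK references dgetrf_ and dgetri_ (LU factorization and the corresponding matrix inversion) in its symbol table are at least consistent with inverting X'X.

```latex
\begin{aligned}
y &= X\beta + \varepsilon, \qquad y \in \mathbb{R}^{n},\; X \in \mathbb{R}^{n \times k},\; n = 30,\; k = 7\\
\hat\beta &= (X^\top X)^{-1} X^\top y\\
\hat\sigma^2 &= \frac{(y - X\hat\beta)^\top (y - X\hat\beta)}{n - k}\\
\widehat{\operatorname{Var}}(\hat\beta) &= \hat\sigma^2 \, (X^\top X)^{-1}\\
\operatorname{se}(\hat\beta_j) &= \sqrt{\bigl[\widehat{\operatorname{Var}}(\hat\beta)\bigr]_{jj}}, \qquad z_j = \frac{\hat\beta_j}{\operatorname{se}(\hat\beta_j)}
\end{aligned}
```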
C code
Given the Fortran subroutine linreg and the attitude data stored as a space-separated text file, what does the C code need to do to successfully call linreg?
First, we have to provide the C compiler with information about the external function and its inputs, i.e. declare its prototype (see lines 6-7). From the source code, we know that linreg takes as arguments a vector for the dependent variable, a covariate matrix, the number of cases and covariates, as well as an (empty) vector for the estimated coefficients, an (empty) variance-covariance matrix vcov, and an (empty) vector for the estimated standard errors se. Since Fortran passes arguments by reference, the corresponding parameters of the function prototype are declared as pointers to double or int.
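A sketch of such a declaration, assuming the arguments appear in the order in which they are listed above (check linreg.f95 before relying on it) and that gfortran's default INTEGER matches C's int. Because the prototype cannot be exercised without the Fortran library, the block also contains a small pure-C stand-in, column_mean_ (a hypothetical name), that uses the same by-reference convention, so the calling pattern can be tried on its own.

```c
/* Hypothetical prototype for the Fortran subroutine linreg. The argument
   order is an assumption taken from the description in the text. Every
   argument is a pointer because Fortran passes arguments by reference;
   X and vcov are column-major matrices (n*k and k*k elements). */
void linreg_(double *y, double *X, int *n, int *k,
             double *beta, double *vcov, double *se);

/* Pure-C stand-in following the same convention: even the scalar n is
   passed by address, and the result is written through a pointer. */
void column_mean_(const double *x, const int *n, double *mean)
{
    double sum = 0.0;
    for (int i = 0; i < *n; i++)
        sum += x[i];
    *mean = sum / *n;
}
```

Under these assumptions, the actual call would look like linreg_(y, X, &n, &k, beta, vcov, se); with n and k declared as int variables, since Fortran expects the address of each scalar rather than its value.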
Second, import the space-separated text file containing the attitude data and write the information into appropriate arrays (lines 12-61). In this example, this involves
- reading the variable names (lines 16-38)
- determining the number of observations and variables (lines 41-46)
- allocating an array of appropriate size (line 49)
- reading/parsing the remaining data and storing it in the array (lines 50-60)
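The first of these steps can be sketched with the standard strtok function, which splits the header row into whitespace-separated variable names. The helper name split_names is hypothetical, and the post's own code may read the names differently.

```c
#include <string.h>

/* Hypothetical helper: splits a header row such as
   "rating complaints privileges ..." in place and stores pointers to at
   most max names. Returns the number of names stored.
   Note: strtok modifies line and is not reentrant. */
int split_names(char *line, char **names, int max)
{
    const char *delims = " \t\r\n";
    int count = 0;
    for (char *tok = strtok(line, delims);
         tok != NULL && count < max;
         tok = strtok(NULL, delims))
        names[count++] = tok;
    return count;
}
```

Because strtok writes terminators into line and returns pointers into it, the names remain valid only as long as the line buffer does; copy them (e.g. with malloc and strcpy) if that buffer is reused for the next row.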
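Note that C lays out multidimensional arrays in row-major order, whereas Fortran uses column-major order. To avoid the headache of converting arrays from one layout to the other, fill the array column-wise, i.e. store the value of observation i on variable j at position i + j*n (counting from zero). Each column then occupies a contiguous block of memory, which also makes column manipulations easier.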
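Filling the array column-wise while parsing can be sketched as follows, assuming the whole data set lives in one array of n*k doubles. parse_row_colmajor is a hypothetical helper built on the standard strtod function.

```c
#include <stdlib.h>

/* Hypothetical helper: parses up to k numbers from one data row and stores
   them as row i of an n-by-k matrix in column-major order, i.e. element
   (i, j) goes to data[i + j*n]. Returns the number of values parsed,
   which should equal k for a complete row. */
int parse_row_colmajor(const char *line, double *data, int n, int k, int i)
{
    const char *p = line;
    char *end;
    int j = 0;
    for (; j < k; j++) {
        double v = strtod(p, &end);
        if (end == p)
            break;                     /* no further number on this row */
        data[i + j * n] = v;
        p = end;
    }
    return j;
}
```

Calling it for rows i = 0, ..., n-1 fills the array so that column j starts at data + j*n, which is the layout a Fortran routine expects for an n-by-k matrix.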
Third, prepare the function inputs, i.e. the dependent variable and the covariate matrix (including a column of ones for the intercept), as well as the vectors of coefficients and standard errors and the variance-covariance matrix (lines 64-77).
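A sketch of this preparation step, assuming the data are already stored column-major with the dependent variable (rating) in the first column, as in attitude.txt. The struct ols_buffers and both functions are hypothetical names chosen for illustration.

```c
#include <stdlib.h>

/* Hypothetical container for linreg's inputs and (zero-initialized) outputs. */
typedef struct {
    int n, k;      /* observations; columns of X (intercept + covariates) */
    double *y;     /* dependent variable, length n */
    double *X;     /* covariate matrix, n*k, column-major */
    double *beta;  /* output: coefficients, length k */
    double *vcov;  /* output: variance-covariance matrix, k*k */
    double *se;    /* output: standard errors, length k */
} ols_buffers;

void free_buffers(ols_buffers *b)
{
    free(b->y); free(b->X); free(b->beta); free(b->vcov); free(b->se);
    b->y = b->X = b->beta = b->vcov = b->se = NULL;
}

/* data: n-by-nvars matrix (column-major) whose first column is the dependent
   variable. The intercept column of ones takes its place in X, so X has
   k = nvars columns. Returns 0 on success, -1 if an allocation fails. */
int prepare_inputs(const double *data, int n, int nvars, ols_buffers *b)
{
    b->n = n;
    b->k = nvars;
    b->y = malloc((size_t)n * sizeof *b->y);
    b->X = malloc((size_t)n * nvars * sizeof *b->X);
    b->beta = calloc((size_t)nvars, sizeof *b->beta);
    b->vcov = calloc((size_t)nvars * nvars, sizeof *b->vcov);
    b->se = calloc((size_t)nvars, sizeof *b->se);
    if (!b->y || !b->X || !b->beta || !b->vcov || !b->se) {
        free_buffers(b);               /* free(NULL) is a no-op */
        return -1;
    }
    for (int i = 0; i < n; i++) {
        b->y[i] = data[i];             /* column 0 of data: dependent variable */
        b->X[i] = 1.0;                 /* column 0 of X: intercept */
    }
    for (int j = 1; j < nvars; j++)    /* remaining columns: covariates */
        for (int i = 0; i < n; i++)
            b->X[i + j * n] = data[i + j * n];
    return 0;
}
```

For attitude (7 variables), X therefore ends up with 7 columns: the intercept plus the six covariates.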
Fourth, call linreg (line 80). The remaining code prints the results to the screen (lines 83-96) and frees the previously allocated memory (lines 99-104).
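For the printing step, a hypothetical helper could compute each row's z-value as the estimate divided by its standard error (consistent with the values in the results table below) and format the row with snprintf. The function names and column widths are illustrative and do not reproduce the original program's output exactly.

```c
#include <stdio.h>

/* z-value: estimate divided by its standard error. */
double z_value(double estimate, double std_err)
{
    return estimate / std_err;
}

/* Writes one results row ("name  estimate  std. error  z-value") into buf.
   Returns the snprintf result, i.e. the number of characters that would
   have been written without truncation. */
int format_result_row(char *buf, size_t size, const char *name,
                      double estimate, double std_err)
{
    return snprintf(buf, size, "%-12s %9.3f %9.3f %9.3f",
                    name, estimate, std_err, z_value(estimate, std_err));
}
```

In a complete program, one such call per coefficient (with beta[j] and se[j] from linreg) followed by puts(buf) would print the table body.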
Compilation and results
Having written the necessary C code, we are now all set to create the binary (named a.out by default) that loads the data and runs the Fortran linear regression function. Note that the C code expects attitude.txt to be located in the same directory as a.out. Hence, make sure that the binary and the data file share the same directory, and run the binary from within that directory.
# compile
gcc call_fortran.c liblinreg.so -Wall -Wpedantic -std=c99 -D_POSIX_C_SOURCE=200809L
# run (the dynamic loader must be able to find liblinreg.so, e.g. via LD_LIBRARY_PATH)
LD_LIBRARY_PATH=. ./a.out
Running the binary yields the output displayed below6, comprising the dimensions of the data (30 observations, 7 variables) and the regression estimates. According to the estimates, complaints (the way complaints are handled) is positively associated with the overall rating. An increase of one percentage point in complaints is associated with an increase in the overall rating of around 0.6 percentage points (±0.3 at the 95% level).7
Data:
========
n = 30
k = 7
========
Results:
=======================================
Estimate Std. Err z-Value
=======================================
intercept 10.787 11.589 0.931
complaints 0.613 0.161 3.809
privileges -0.073 0.136 -0.538
learning 0.320 0.169 1.901
raises 0.082 0.221 0.369
critical 0.038 0.147 0.261
advance -0.217 0.178 -1.218
=======================================
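As a worked check of the interpretation above, the complaints row of the results reproduces the stated numbers, assuming the ±0.3 is the half-width of a 95% interval based on the normal approximation (1.96 standard errors), in line with the table's z-values:

```latex
\begin{aligned}
z_{\text{complaints}} &= \frac{0.613}{0.161} \approx 3.81\\
0.613 \pm 1.96 \times 0.161 &\approx 0.613 \pm 0.316 = [0.297,\ 0.929]
\end{aligned}
```

The ratio of the rounded table values gives 3.81 rather than the printed 3.809 because the program divides the unrounded estimates.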
Concluding remarks
In this (first) post about interfacing foreign-language functions, we saw how to call our linreg Fortran function from C. The steps required to run the Fortran library from C were (1) to determine the function name (as displayed by the nm command) and the input data types (int, real, double, char, etc.), (2) to write C code that takes care of data handling, memory allocation, and calling the foreign function, and (3) to compile, link, and run the resulting binary. In an upcoming post, I will show how to interface the same library with Java.8
Note: For this post’s accompanying code, see https://git.staudtlex.de/blog/call-fortran-from-c.
- Assuming that said library has been extensively tested and proven reliable.
- For information about how to load and call foreign library functions from R, see here and the R documentation, especially dyn.load() and dyn.unload() and Registering native routines.
- Furthermore, C provides low-level access to memory and language constructs that map efficiently to typical machine instructions, allowing the efficient implementation of computationally demanding algorithms. For further information about C, see this Wikipedia entry.
- For more details, see the R documentation.
- For a comparison of the results of our Fortran linreg library with R’s lm(), see this blog post.
- Learning (the opportunities given to employees to learn) also seems to be positively associated with the overall rating, yet the effect size of 0.32 (±0.33) is not statistically different from zero at the 95% level.
- For a quick overview of the Java programming language, see this Wikipedia entry.