# OLS in good old Fortran!

In this post, we will discuss the algorithm that produces point estimates $\hat{\beta}$ given inputs $y$ and $X$ in linear regression. After deriving the function producing $\hat{\beta}$, I’ll briefly show how to implement this procedure in yet another compiled language, Fortran, and how to call it from within R.

## About linear regression

Let $y$ be a response vector, $X$ a matrix of covariates, $\beta$ a vector of population parameters, and $\varepsilon$ a vector of error terms. Let furthermore $n$ be the number of observations, $k$ the number of parameters, $y, \varepsilon \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times k}$, $\beta \in \mathbb{R}^k$, and the Gauss-Markov assumptions^{1} hold such that

$$y = X\beta + \varepsilon.$$

Unfortunately, we do not know the actual population parameters $\beta$ and error terms $\varepsilon$, and hence attempt to obtain their respective estimates $\hat{\beta}$ and $e$. One way to compute $\hat{\beta}$ is by minimizing the sum of squared residuals

$$e'e = (y - X\hat{\beta})'(y - X\hat{\beta}).$$

Minimization requires the first derivative with respect to $\hat{\beta}$ to equal zero:^{2}

$$\frac{\partial\, e'e}{\partial \hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0 \quad\Longrightarrow\quad \hat{\beta} = (X'X)^{-1}X'y.$$

To assess the uncertainty around $\hat{\beta}$, we can derive its variance-covariance matrix (also see the Gauss-Markov assumptions):

$$\operatorname{Var}(\hat{\beta} \mid X) = \sigma^2 (X'X)^{-1}.$$

We estimate $\sigma^2$ with $\hat{\sigma}^2 = e'e / (n - k)$. Hence,

$$\widehat{\operatorname{Var}}(\hat{\beta} \mid X) = \hat{\sigma}^2 (X'X)^{-1} = \frac{e'e}{n - k} (X'X)^{-1}.$$

With $\hat{\beta}$ and $\widehat{\operatorname{Var}}(\hat{\beta} \mid X)$, we have all we need to estimate linear regressions in Fortran.
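Before moving to Fortran, the closed-form solution can be sanity-checked in a few lines of Python/NumPy. This is only an illustration on simulated data, separate from the Fortran implementation discussed in this post:

```python
import numpy as np

# simulate a small data set with known coefficients (illustration only)
rng = np.random.default_rng(42)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
true_beta = np.array([1.0, 2.0, -0.5])
y = X @ true_beta + rng.normal(scale=0.1, size=n)

# closed-form OLS: beta = (X'X)^-1 X'y
XpX_inv = np.linalg.inv(X.T @ X)
beta = XpX_inv @ X.T @ y

# variance-covariance matrix: sigmahat^2 * (X'X)^-1, sigmahat^2 = e'e / (n - k)
e = y - X @ beta
vcov = (e @ e) / (n - k) * XpX_inv
se = np.sqrt(np.diag(vcov))

# the closed form agrees with NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta, beta_lstsq))
```

On well-conditioned data like this, the normal-equation solution and NumPy's QR-based solver agree to machine precision.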

## Implement in Fortran

Fortran provides two ways to call algorithms, *functions* and *subroutines*. Simply put, functions return results and do not modify their inputs, while subroutines may also modify their inputs. Since we’ll call Fortran from within R, we’ll rely on the low-level interface function `.Fortran()` to call shared Fortran binaries. Yet, similar to R’s `.C()` function, `.Fortran()` does not allow code to create new R objects, only modification of existing ones. We’ll hence implement the linear regression algorithm as a Fortran subroutine.

The *linreg* subroutine takes as inputs a response vector $y$ and a covariate matrix $X$, as well as the number of observations $n$ and parameters $k$. It furthermore takes three objects to be modified by the subroutine: the parameter estimates, the variance-covariance matrix, and the vector of standard errors.

```fortran
subroutine linreg(y, X, n, k, beta, vcov, se)
! disable implicit typing (otherwise names starting with i-n default to integer)
implicit none
! declare type of input objects
integer :: n, k
real(8) :: y(n), X(n, k), beta(k), vcov(k, k), se(k)
! type of intermediary objects
integer :: ipiv(k), info, i
real(8) :: XpX(k, k), invXpX(k, k), work(k), Xpy(k), e(n)
! compute point estimates:
! intermediary steps
! (1) X'X
XpX = matmul(transpose(X), X)
! (2) (X'X)^-1: invert XpX using LAPACK functions
invXpX = XpX
call dgetrf(k, k, invXpX, k, ipiv, info)
call dgetri(k, invXpX, k, ipiv, work, k, info)
! (3) X'y
Xpy = matmul(transpose(X), y)
! (4) beta = (X'X)^-1 X'y
beta = matmul(invXpX, Xpy)
! compute variance-covariance matrix:
! var(betahat|X) = sigmahat^2 * (X'X)^-1
! sigmahat^2 = e'e / (n - k)
! e'e = (y - Xbeta)'(y - Xbeta)
e = y - matmul(X, beta)
vcov = dot_product(e, e) / (n - k) * invXpX
! compute standard errors:
! square root of main diagonal of variance-covariance matrix
se = (/ (sqrt(vcov(i, i)), i = 1,k) /)
end subroutine linreg
```
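For readers without a Fortran toolchain, the same sequence of steps can be sketched in Python/NumPy. The function below (named `linreg_py` to distinguish it from the subroutine above) is a rough line-by-line analogue, not a verified translation:

```python
import numpy as np

def linreg_py(y, X):
    """Python analogue of the Fortran linreg subroutine (assumes X has full rank)."""
    n, k = X.shape
    XpX = X.T @ X                        # (1) X'X
    invXpX = np.linalg.inv(XpX)          # (2) (X'X)^-1 (dgetrf/dgetri in the Fortran version)
    Xpy = X.T @ y                        # (3) X'y
    beta = invXpX @ Xpy                  # (4) beta = (X'X)^-1 X'y
    e = y - X @ beta                     # residuals
    vcov = (e @ e) / (n - k) * invXpX    # sigmahat^2 * (X'X)^-1
    se = np.sqrt(np.diag(vcov))          # standard errors
    return beta, vcov, se
```

As a design aside, solving the linear system $X'X\hat{\beta} = X'y$ directly (e.g. `np.linalg.solve`, or LAPACK’s `dgesv`) is generally preferred over forming an explicit inverse for the point estimates; here the inverse is needed anyway for the variance-covariance matrix.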

To make *linreg* callable from R, we compile the above code into a shared library. Assuming the code is saved in a file named *linreg.f95*:

`gfortran-9 linreg.f95 -o linreg.so -shared -lblas -llapack -std=f95`

## Run linear regression in R

We test our Fortran linear regression subroutine in R using the `attitude` data frame. `attitude` contains aggregated survey responses of clerical employees of a large financial organization. Each observation aggregates the responses of roughly 35 employees from one of 30 departments:

```
rating complaints privileges learning raises critical advance
1 43 51 30 39 61 92 45
2 63 64 51 54 63 73 47
3 71 70 68 69 76 86 48
4 61 63 45 47 54 84 35
5 81 78 56 66 71 83 47
6 43 55 49 44 54 49 34
```

To be able to call *linreg* from R, we need to load our Fortran library using `dyn.load()`. In a next step, we prepare the input data (`y`, `X`, `n`, `k`) needed to estimate the linear regression. Furthermore, we prepare empty results objects into which *linreg* will write the regression results. We then run both our Fortran linear regression implementation and R’s `lm()` function.

```r
## ---- implementing linear regression in Fortran ----
data(attitude)
## ---- prepare input data ----
# data inputs
X <- model.matrix(rating ~ ., data = attitude)
y <- attitude$rating
n <- nrow(X)
k <- ncol(X)
# results objects
beta <- double(k)
names(beta) <- colnames(X)
vcov <- matrix(0, nrow = k, ncol = k)
se <- double(k)
names(se) <- colnames(X)
## ---- run regression ----
# R
ols_R <- lm(rating ~ ., data = attitude)
# Fortran
dyn.load("linreg.so") # load Fortran library
ols_fortran <- .Fortran("linreg", y = y, X = X, n = n, k = k, beta = beta, vcov = vcov, se = se)
dyn.unload("linreg.so") # unload library
# summarize results from Fortran function
res <- cbind(ols_fortran$beta, ols_fortran$se, ols_fortran$beta/ols_fortran$se)
res <- cbind(res, 2 * pt(abs(res[, 3]), df = n - k, lower.tail = FALSE))
colnames(res) <- c("Estimate", "Std. Err", "t value", "Pr(>|t|)")
# results
summary(ols_R)
print(res)
```

A first glance at the results of `lm()` and `linreg()` indicates that our Fortran implementation produces estimates very similar to R’s `lm()`.

```
Call:
lm(formula = rating ~ ., data = attitude)
Residuals:
Min 1Q Median 3Q Max
-10.9418 -4.3555 0.3158 5.5425 11.5990
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.78708 11.58926 0.931 0.361634
complaints 0.61319 0.16098 3.809 0.000903 ***
privileges -0.07305 0.13572 -0.538 0.595594
learning 0.32033 0.16852 1.901 0.069925 .
raises 0.08173 0.22148 0.369 0.715480
critical 0.03838 0.14700 0.261 0.796334
advance -0.21706 0.17821 -1.218 0.235577
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.068 on 23 degrees of freedom
Multiple R-squared: 0.7326, Adjusted R-squared: 0.6628
F-statistic: 10.5 on 6 and 23 DF, p-value: 1.24e-05
```

```
Estimate Std. Err t value Pr(>|t|)
(Intercept) 10.78707639 11.5892572 0.9307824 0.3616337210
complaints 0.61318761 0.1609831 3.8090182 0.0009028679
privileges -0.07305014 0.1357247 -0.5382229 0.5955939205
learning 0.32033212 0.1685203 1.9008516 0.0699253459
raises 0.08173213 0.2214777 0.3690310 0.7154800884
critical 0.03838145 0.1469954 0.2611064 0.7963342642
advance -0.21705668 0.1782095 -1.2179862 0.2355770486
```

To be sure, let’s check the results more precisely with R’s `all.equal()` function. Note that for `lm()`, we extract the model’s variance-covariance matrix and take the square root of its main diagonal to obtain the estimates’ standard errors. Indeed, `all.equal()` suggests that `lm()` and our OLS implementation produce the same results.^{3}

```r
# compare coefficients from lm() and linreg()
all.equal(coef(ols_R), res[, 1])
# compare standard errors
all.equal(sqrt(diag(vcov(ols_R))), res[, 2])
```

```
[1] TRUE
[1] TRUE
```

## Concluding remarks

This post describes in general terms how to compute parameter estimates using ordinary least squares. As we have seen, the algorithm behind linear regression essentially relies on basic matrix algebra and high-school calculus. More specifically, OLS offers a closed-form expression that computes the parameter estimates directly: no starting values or iterative computations necessary.

Assuming some basic programming experience, implementing linear regression in Fortran has proven quite straightforward as well. Indeed, Fortran – unlike C – has built-in support for vectors, matrices, and various linear algebra functions, such that implementing linear regression almost^{4} boiled down to writing $\hat{\beta} = (X'X)^{-1}X'y$ in Fortran code.

- Linear relation between $y$ and $X$ ($y = X\beta + \varepsilon$); $X$ has full rank; zero conditional mean ($E(\varepsilon \mid X) = 0$); constant and uncorrelated error variance ($\operatorname{Var}(\varepsilon \mid X) = \sigma^2 I_n$). ^{[return]}
- For completeness, minimization also requires $\frac{\partial^2\, e'e}{\partial \hat{\beta}\, \partial \hat{\beta}'} = 2X'X$ to be positive definite, which is the case when $X$ has full rank. ^{[return]}
- Note that `all.equal()` only tests for “near equality”, allowing numbers to differ within a tolerance of about 1.5 × 10^{-8} (approximately seven significant digits of precision). This should still be good enough for our purposes. ^{[return]}
- Some may very well beg to differ. ^{[return]}