Enabling OpenMP support for data.table on macOS
The compilers shipped with Xcode on macOS do not support OpenMP out of the box. As a consequence, R packages that rely on OpenMP use only one of the available cores for their computations, which often results in longer computing times. In this blog post, I show how to re-enable OpenMP on macOS and how to rebuild affected R packages.
The R package data.table provides very efficient functions to read and write large CSV files: fread and fwrite. To do so, data.table makes heavy use of multithreaded C code.
Yet, given Xcode’s missing OpenMP support[1] on macOS,[2] data.table greets users with the following message:
> library(data.table)
data.table 1.14.8 using 1 threads (see ?getDTthreads). Latest news:
r-datatable.com
**********
This installation of data.table has not detected OpenMP support. It
should still work but in single-threaded mode.
This is a Mac. Please read https://mac.r-project.org/openmp/. Please
engage with Apple and ask them for support. Check r-datatable.com for
updates, and our Mac instructions here:
https://github.com/Rdatatable/data.table/wiki/Installation. After
several years of many reports of installation problems on Mac, it's
time to gingerly point out that there have been no similar problems
on Windows or Linux.
**********
Fortunately, fixing this is not too complicated.[3] All we have to do to convince Xcode Clang to enable multithreading in data.table (and other R packages) is to install the OpenMP runtime and to add some build flags.
Get the OpenMP runtime library
The runtime consists of a dynamic library (libomp.dylib) and three header files, which can be fetched from https://mac.r-project.org/openmp and need to be copied to /usr/local/lib and /usr/local/include, respectively.
Alternatively, fetch the LLVM sources, select the commit matching the version shipped with Xcode (for Xcode 14.x: 4ba6a9c9f65b), and build libomp from source.[4]
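Once the files are in place, a quick sanity check can confirm that they landed where the compiler and linker will look. The following is a minimal sketch: check_libomp is a hypothetical helper, and the three header names (omp.h, ompt.h, omp-tools.h) are an assumption based on the contents of the R project's tarballs, so adjust them if your download differs.

```shell
# Hypothetical helper: report which OpenMP runtime files exist under a prefix.
# Assumption: the runtime ships libomp.dylib plus omp.h, ompt.h and omp-tools.h.
check_libomp() {
  prefix="${1:-/usr/local}"   # default install prefix used in this post
  missing=0
  for f in lib/libomp.dylib include/omp.h include/ompt.h include/omp-tools.h; do
    if [ -f "$prefix/$f" ]; then
      echo "found:   $prefix/$f"
    else
      echo "missing: $prefix/$f"
      missing=1
    fi
  done
  # Non-zero exit status if at least one file is missing.
  return "$missing"
}

# Usage (not run automatically):
#   check_libomp              # checks /usr/local
#   check_libomp /opt/custom  # checks another prefix
```

The function only inspects the file system; it does not verify that the library matches your Xcode version.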
Build data.table
Now that we have the OpenMP runtime, let’s (re-)build data.table. We may choose one of these approaches:
- Update ~/.R/Makevars.[5]

      CPPFLAGS += -Xclang -fopenmp
      LDFLAGS += -lomp -Wl,-rpath,/usr/local/lib

  Then, build data.table from within R with install.packages("data.table", type = "source") or from the command line. Installing from the command line requires the package tarball to be present in the current working directory.

      R CMD INSTALL data.table_1.14.8.tar.gz
- Directly provide R CMD INSTALL with the information to compile and link data.table with OpenMP.

      PKG_CPPFLAGS='-Xclang -fopenmp' \
      PKG_LIBS='-lomp -Wl,-rpath,/usr/local/lib' \
      R CMD INSTALL data.table_1.14.8.tar.gz
- Alternatively, link the OpenMP runtime statically to data.table. This comes with the usual advantages and disadvantages of linking statically,[6] and requires libomp to be built as a static library (see footnote 4).

      PKG_CPPFLAGS='-Xclang -fopenmp' \
      PKG_LIBS='/usr/local/lib/libomp.a' \
      R CMD INSTALL data.table_1.14.8.tar.gz
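For the Makevars approach, the edit can also be scripted so that it is safe to run more than once. This is a rough sketch under a few assumptions: add_openmp_flags is a hypothetical helper name, the default path ~/.R/Makevars is the usual per-user location, and the presence of the string -fopenmp is taken as a sign that the flags were already added.

```shell
# Hypothetical helper: append the OpenMP build flags to a Makevars file,
# but only if no -fopenmp flag is in there yet (assumed marker).
add_openmp_flags() {
  makevars="${1:-$HOME/.R/Makevars}"
  mkdir -p "$(dirname "$makevars")"
  touch "$makevars"
  if grep -q -e '-fopenmp' "$makevars"; then
    echo "OpenMP flags already present in $makevars"
    return 0
  fi
  # Same two lines as in the first approach above.
  cat >> "$makevars" <<'EOF'
CPPFLAGS += -Xclang -fopenmp
LDFLAGS += -lomp -Wl,-rpath,/usr/local/lib
EOF
  echo "OpenMP flags added to $makevars"
}

# Usage (not run automatically):
#   add_openmp_flags                  # edits ~/.R/Makevars
#   add_openmp_flags /tmp/Makevars    # edits a file of your choice
```

Keep in mind that Makevars applies to every package built from source afterwards, whereas the PKG_CPPFLAGS and PKG_LIBS variants only affect the one R CMD INSTALL call.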
Use the rebuilt data.table
After (re-)building data.table, start up R and load the fresh library. data.table should now display how many threads it will use.
> library(data.table)
data.table 1.14.8 using 4 threads (see ?getDTthreads). Latest news: r-datatable.com
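The same check can be done from the command line, which is handy in setup scripts. The sketch below assumes that Rscript is on the PATH and that the rebuilt data.table is installed; dt_threads and dt_is_multithreaded are hypothetical helper names, while getDTthreads() is the data.table function mentioned in the startup message.

```shell
# Hypothetical helper: print the thread count reported by data.table.
# Assumes Rscript is available and data.table is installed.
dt_threads() {
  Rscript -e 'suppressPackageStartupMessages(library(data.table)); cat(getDTthreads(), "\n", sep = "")'
}

# Hypothetical helper: succeed only if data.table reports more than one thread.
dt_is_multithreaded() {
  n="$(dt_threads)" || return 1
  [ "$n" -gt 1 ]
}

# Usage (not run automatically):
#   dt_is_multithreaded && echo "OpenMP is active" || echo "still single-threaded"
```

On a machine with a single core, or with the thread count limited through setDTthreads() or environment variables, the second helper reports failure even when OpenMP works.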
Happy coding!
- This is not to say that macOS doesn’t provide means to build multithreaded applications. In fact, the system ships with, for instance, pthread.h as well as Apple’s Grand Central Dispatch.
- Thanks to the work of the R project at https://mac.r-project.org/openmp.
- What I did was something along these lines:

      # fetch sources and switch to OpenMP directory
      git clone --branch release/15.x https://github.com/llvm/llvm-project.git
      cd llvm-project
      git checkout 4ba6a9
      cd openmp

      # configure and build (to build a static library, add
      # -DLIBOMP_ENABLE_SHARED=false)
      mkdir build && cd build
      cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ..
      make -j

      # install
      sudo make install
- Avid readers of the data.table installation documentation on GitHub or the macOS documentation on R-Project (https://mac.r-project.org/openmp) may notice that I added -Wl,-rpath,/usr/local/lib to PKG_LIBS. This is necessary to ensure that libomp.dylib is actually found by data.table.
- See e.g. https://en.wikipedia.org/wiki/Static_build and https://en.wikipedia.org/wiki/Library_(computing).