Enabling OpenMP support for data.table on macOS

Xcode's compilers on macOS do not support OpenMP out of the box. As a consequence, R packages that rely on OpenMP use only one of the many available cores for their computations, which often results in longer computing times. In this blog post, I show how to re-enable OpenMP on macOS and how to rebuild the affected R packages.

The R package data.table provides very efficient functions to read and write large CSV files: fread and fwrite. To do so, data.table makes heavy use of multithreaded C code.

Yet, given Xcode’s missing OpenMP support¹ on macOS,² data.table greets users with the following message:

> library(data.table)
data.table 1.14.8 using 1 threads (see ?getDTthreads).  Latest news: r-datatable.com
**********
This installation of data.table has not detected OpenMP support. It 
should still work but in single-threaded mode.
This is a Mac. Please read https://mac.r-project.org/openmp/. Please 
engage with Apple and ask them for support. Check r-datatable.com for 
updates, and our Mac instructions here: 
https://github.com/Rdatatable/data.table/wiki/Installation. After 
several years of many reports of installation problems on Mac, it's 
time to gingerly point out that there have been no similar problems 
on Windows or Linux.
**********

Fortunately, fixing this is not too complicated.³ To convince Xcode’s Clang to enable multithreading in data.table (and other R packages), all we have to do is install the OpenMP runtime and add a few build flags.

Get the OpenMP runtime library

The runtime consists of a dynamic library (libomp.dylib) and three header files. They can be fetched from https://mac.r-project.org/openmp and need to be copied to /usr/local/lib and /usr/local/include, respectively.

Alternatively, fetch the LLVM sources, check out the commit matching the LLVM version shipped with Xcode (for Xcode 14.x, 4ba6a9c9f65b), and build libomp from source.⁴
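Whichever way you obtain the runtime, it can help to confirm that the files ended up where the compiler and linker will look for them before rebuilding anything. The following is a minimal POSIX shell sketch; the function name check_omp_runtime is hypothetical, and it only checks for libomp.dylib and omp.h, not for the remaining headers.

```shell
#!/bin/sh
# Hypothetical helper (not part of R or data.table): check that the OpenMP
# runtime files are present under an install prefix (default: /usr/local).
# Returns 0 if both files exist, 1 otherwise.
check_omp_runtime() {
    prefix="${1:-/usr/local}"
    status=0
    for f in "$prefix/lib/libomp.dylib" "$prefix/include/omp.h"; do
        if [ -f "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage: check the default location used in this post
# check_omp_runtime /usr/local
```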

Build data.table

Now that we have the OpenMP runtime, let’s (re-)build data.table. We may choose one of these approaches:

  1. Update ~/.R/Makevars.⁵

    CPPFLAGS += -Xclang -fopenmp
    LDFLAGS += -lomp -Wl,-rpath,/usr/local/lib
    

    Then, build data.table from within R with

    install.packages("data.table", type = "source")
    

    or from the command line. Installing from the command line requires the package tarball to be present in the current working directory.

    R CMD INSTALL data.table_1.14.8.tar.gz
    
  2. Pass the flags needed to compile and link data.table with OpenMP directly to R CMD INSTALL.

    PKG_CPPFLAGS='-Xclang -fopenmp' \
    PKG_LIBS='-lomp -Wl,-rpath,/usr/local/lib' \
    R CMD INSTALL data.table_1.14.8.tar.gz
    
  3. Alternatively, link the OpenMP runtime statically into data.table. This comes with the usual advantages and disadvantages of static linking⁶ and requires libomp to be built as a static library (see footnote 4).

    PKG_CPPFLAGS='-Xclang -fopenmp' \
    PKG_LIBS='/usr/local/lib/libomp.a' \
    R CMD INSTALL data.table_1.14.8.tar.gz
    

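If you go with the first approach and script it, re-running the script should not append the same flags to ~/.R/Makevars over and over. Below is a minimal POSIX shell sketch, assuming the runtime lives in /usr/local as described above; add_omp_makevars is a hypothetical name, and it compares whole lines literally, so a differently spaced but equivalent flag line would still be added.

```shell
#!/bin/sh
# Hypothetical helper: append the OpenMP flags from approach 1 to a Makevars
# file (default: ~/.R/Makevars), skipping lines that are already present so
# that the script is safe to run repeatedly.
add_omp_makevars() {
    makevars="${1:-$HOME/.R/Makevars}"
    mkdir -p "$(dirname "$makevars")"
    touch "$makevars"

    # Terminate an existing last line with a newline before appending,
    # otherwise our first flag would be glued onto it.
    if [ -s "$makevars" ] && [ -n "$(tail -c 1 "$makevars")" ]; then
        printf '\n' >> "$makevars"
    fi

    for line in \
        'CPPFLAGS += -Xclang -fopenmp' \
        'LDFLAGS += -lomp -Wl,-rpath,/usr/local/lib'
    do
        # -q: quiet, -x: match whole lines, -F: fixed string (no regex)
        grep -qxF "$line" "$makevars" || printf '%s\n' "$line" >> "$makevars"
    done
}

# Usage (then reinstall data.table from source):
# add_omp_makevars
```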
Use the rebuilt data.table

After (re-)building data.table, start R and load the freshly built package. The OpenMP warning should be gone, and data.table should now report using more than one thread.

> library(data.table)
data.table 1.14.8 using 4 threads (see ?getDTthreads).  Latest news: r-datatable.com
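For scripts or CI jobs, the thread count can also be queried with getDTthreads(), the function the startup banner points to. A minimal shell sketch follows; check_dt_threads and the RSCRIPT override are hypothetical conveniences. Keep in mind that data.table by default uses only a portion of the logical cores, so on a machine with very few cores a count of 1 is not conclusive on its own.

```shell
#!/bin/sh
# Hypothetical helper: ask data.table how many threads it will use and
# return non-zero if it is still single-threaded (or if R fails).
# Set RSCRIPT to use an Rscript other than the one on PATH.
check_dt_threads() {
    n=$("${RSCRIPT:-Rscript}" -e 'cat(data.table::getDTthreads())') || return 2
    echo "data.table threads: $n"
    [ "$n" -gt 1 ]
}

# Usage:
# check_dt_threads || echo "data.table still runs single-threaded"
```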

Happy coding!


  1. Clang is Xcode’s C compiler. ↩︎

  2. This is not to say that macOS doesn’t provide means to build multithreaded applications. In fact, the system ships with, for instance, pthread.h as well as Apple’s Grand Central Dispatch.↩︎

  3. Thanks to the work of the R project at https://mac.r-project.org/openmp↩︎

  4. What I did was something along these lines:

    # fetch sources and switch to OpenMP directory
    git clone --branch release/15.x https://github.com/llvm/llvm-project.git
    cd llvm-project
    git checkout 4ba6a9c9f65b
    cd openmp
    
    # configure and build (to build a static library, add 
    # -DLIBOMP_ENABLE_SHARED=false)
    mkdir build && cd build
    cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ..
    make -j
    
    # install
    sudo make install
    
     ↩︎
  5. Avid readers of the data.table installation documentation on GitHub or the macOS documentation on R-Project (https://mac.r-project.org/openmp) may notice that I added -Wl,-rpath,/usr/local/lib to LDFLAGS and PKG_LIBS. This is necessary to ensure that libomp.dylib is actually found when data.table is loaded.↩︎

  6. See e.g. https://en.wikipedia.org/wiki/Static_build and https://en.wikipedia.org/wiki/Library_(computing)↩︎