Counting Digits Quickly

When things run slower than we’d like in R, we tend to reach for another, usually compiled, language and move our code there. What if it “just happened”? What started out as a silly exploration of how to count digits ended up as a race to see which language does it fastest. Maybe some surprises here for some, maybe some bad implementations on my part - let’s find out.

I saw some recent activity on {quickr} (github.com), Tomasz Kalinowski’s R to Fortran transpiler - I had starred the repo a long time ago (and in haste, accidentally unstarred it, then re-starred it) but never really played with it. I’m familiar with slightly older Fortran; what’s written nowadays is called “modern Fortran”, but I did my PhD using Fortran 95 in the late 2000s. I’ve even pushed some of my postdoc code to GitHub (github.com) after getting it working again for a recent student.

I figured now was a great chance to have a proper play with the package.

{quickr} “transpiles” R code, which means it takes R code and converts the syntax into Fortran syntax, using the same variables and equivalent functions where available. The idea is that when R isn’t working fast enough for you, instead of re-writing your function in something like C++ (via {Rcpp}), {quickr} can automatically write a Fortran version of your code and compile that into a highly performant function which can be called with the same arguments. Faster running code with no additional effort - sounds great!

The README for {quickr} has some examples highlighting how it can improve the performance of some functions beyond what {Rcpp} can offer, in some cases approaching C speeds. That’s not surprising to those who know Fortran - it’s still very much used in theoretical physics partly because of the performance, partly due to the existing support in that field, but also partly because despite being an ‘old’ language, it’s actually pretty nice to use.

One of the big advantages of Fortran, which I only came to appreciate when learning other languages afterwards, is that there’s no manual memory management. If you want a vector or an array/tensor with many dimensions, you just ask for one (specifying a size along each dimension, or sizing it dynamically, but never manually freeing memory). R is known for its statistics chops, but under the hood some of those functions still call out to Fortran code (github.com).
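
As one quick illustration (and by no means an exhaustive audit of where R uses Fortran), we can ask a base package which Fortran routines its compiled code registers - here {stats}, whose shared library includes Fortran entry points such as the routines behind loess()

# Fortran routines registered by the {stats} package's compiled code
getDLLRegisteredRoutines("stats")$.Fortran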

I wanted some example code to try out myself and see if I even recognise the Fortran it produces. I didn’t just want to use the example code from the package, so what could I use?

In this post I celebrated the fact that Julia has an ndigits() function, while in R I cheated and used nchar(), which works fine provided you’re dealing with non-negative integers up to 99999, outside of which it doesn’t do what you want

nchar(99)       # 99    = 2 characters
## [1] 2
nchar(99999)    # 99999 = 5 characters
## [1] 5
nchar(99999+1)  # 1e+05 = 5 characters
## [1] 5
nchar(-99)      # -99   = 3 characters
## [1] 3

I had some interesting discussions on Mastodon (fosstodon.org) about different ways to implement ndigits() properly for R and in the end, re-implementing the Julia solution seemed to work great for all edge cases. I decided to use this for my Fortran testing with {quickr}.

I got the package installed and the compiler hooked up correctly so that I could run the example code, then tried adapting it to the ndigits() problem.
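
For anyone following along, the setup amounted to roughly this - a sketch, with the {quickr} README as the authoritative source; on macOS it also meant having a Fortran compiler (e.g. gfortran) available on the PATH

# development version from GitHub (assumes {pak} is already installed)
pak::pak("t-kalinowski/quickr")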

R

The R code I started with was

nd_R <- function(x) {
  out <- double(length(x))
  x <- abs(x / 10)
  for (v in seq_along(x)) {
    d <- 1
    m <- 1
    while (m <= x[v]) {
      m <- m * 10
      d <- d + 1
    }
    out[v] <- d
  }
  out
}

nd_R(c(123456, 234, -72))
## [1] 6 3 2

and while this looks like a moderate amount of code, in essence it takes the absolute value of the input (since we want to ignore the sign - and abs() is nicely vectorised in R), divides by 10, then steps through successive powers of 10 until one exceeds the value; the first power of 10 greater than our value tells us the number of digits. For what it’s worth, this is also why in that post I noted an alternative route to the same answer: ceil(log10(x)).
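
To make that concrete, here’s the loop traced by hand for a single value

x <- abs(123456 / 10)  # 12345.6
# m:  1   10   100   1000   10000   100000
# d:  1    2     3      4       5        6
# the loop exits once m = 100000 > 12345.6, leaving d = 6 digits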

Fortran

Hoping to transpile this to Fortran right away, I immediately hit my first snag; {quickr} hasn’t yet implemented while() so I can’t transpile this exactly as I have it. There’s no early return() or break either, so I can’t just exit an oversized loop early. Without an alternative, I’m going to cheat a bit and just run a loop a fixed 12 times (sketched below) - this puts an upper limit on the input of a 12 digit number, but I can live with that.
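
For reference, that workaround looked something like this in plain R (a reconstruction of the idea rather than my exact code)

nd_R12 <- function(x) {
  out <- double(length(x))
  x <- abs(x / 10)
  for (v in seq_along(x)) {
    d <- 1
    m <- 1
    for (i in 1:12) {    # fixed trip count stands in for while(): caps inputs at 12 digits
      if (m <= x[v]) {   # only keep counting while we're still below the value
        m <- m * 10
        d <- d + 1
      }
    }
    out[v] <- d
  }
  out
}

nd_R12(c(123456, 234, -72))  # same answers as nd_R: 6 3 2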

Update while writing the post: I suppose good things come to those who wait - on digging through some source code for this post I saw that while has been implemented in the last week, so I’m going to pretend that was always the case.

The other piece this transpiler needs is a type declaration for the input; R is fully dynamic in that a function can take any type of object and it’s up to the function to decide what to do with it. Fortran is a bit stricter, and requires types to be annotated, so I need to add a declare(type()) to the code.

nd2f <- function(x) {
  declare(
    type(x = double(NA))
  )
  out <- double(length(x))
  x <- abs(x / 10)
  for (v in seq_along(x)) {
    d <- 1
    m <- 1
    while (m <= x[v]) {
      m <- m * 10
      d <- d + 1
    }
    out[v] <- d
  }
  out
}

Note that this is still very much R code at this point - I can even run it in R and get the same answers as before

nd2f(c(123456, 234, -72))
## [1] 6 3 2

What surprised me here is that declare() is a base R function (not from {quickr}) intended for “specifying information about R code for use by the interpreter, compiler, and code analysis tools”. I originally thought it would be neat to leverage that for some type-checking on the R side as well as informing the Fortran code, but it “ignores the arguments and returns NULL invisibly”, so no luck getting an error thrown from R

int_id <- function(x) {
  declare(type(x = integer(NA)))
  x
}

int_id(3L)
## [1] 3
int_id(1.5)
## [1] 1.5

The magic happens when we ask {quickr} to do the transpilation.

The type information is used in the Fortran code, so compiling the int_id() example produces something that is more restrictive on types

int_id_F <- quickr::quick(int_id)
int_id_F(3L)
## [1] 3
int_id_F(1.5)
## Error in int_id_F(1.5): typeof(x) must be 'integer', not 'double'

I can inspect the generated code with r2f(), though one wouldn’t normally need to - it’s interesting to see what the Fortran code looks like

quickr:::r2f(int_id)
## subroutine int_id(x, x__len_) bind(c)
##   use iso_c_binding, only: c_int, c_ptrdiff_t
##   implicit none
## 
##   ! manifest start
##   ! sizes
##   integer(c_ptrdiff_t), intent(in), value :: x__len_
## 
##   ! args
##   integer(c_int), intent(in out) :: x(x__len_)
##   ! manifest end
## 
## 
## end subroutine
## 
## @r: function (x)
##   {
##       declare(type(x = integer(NA)))
##       x
##   }
## @closure: function (x)
##   {
##       declare(type(x = integer(NA)))
##       x
##   }

But of course, this just returns the value and that’s not particularly enlightening. Doing the same for the ndigits code

quickr:::r2f(nd2f)
## subroutine nd2f(x, out, x__len_) bind(c)
##   use iso_c_binding, only: c_double, c_int, c_ptrdiff_t
##   implicit none
## 
##   ! manifest start
##   ! sizes
##   integer(c_ptrdiff_t), intent(in), value :: x__len_
## 
##   ! args
##   real(c_double), intent(in out) :: x(x__len_)
##   real(c_double), intent(out) :: out(x__len_)
## 
##   ! locals
##   integer(c_int) :: v
##   real(c_double) :: d
##   real(c_double) :: m
##   ! manifest end
## 
## 
##   out = 0
##   x = abs((x / 10.0_c_double))
##   do v = 1, size(x)
##     d = 1.0_c_double
##     m = 1.0_c_double
##     do while ((m <= x(v)))
##       m = (m * 10.0_c_double)
##       d = (d + 1.0_c_double)
##     end do
##     out(v) = d
##   end do
## end subroutine
## 
## @r: function (x)
##   {
##       declare(type(x = double(NA)))
##       out <- double(length(x))
##       x <- abs(x/10)
##       for (v in seq_along(x)) {
##           d <- 1
##           m <- 1
##           while (m <= x[v]) {
##               m <- m * 10
##               d <- d + 1
##           }
##           out[v] <- d
##       }
##       out
##   }
## @closure: function (x)
##   {
##       declare(type(x = double(NA)))
##       out <- double(length(x))
##       x <- abs(x/10)
##       for (v in seq_along(x)) {
##           d <- 1
##           m <- 1
##           while (m <= x[v]) {
##               m <- m * 10
##               d <- d + 1
##           }
##           out[v] <- d
##       }
##       out
##   }

The subroutine itself looks a lot like the R code; sure, some type annotations are sprinkled around, do v = 1, size(x) replaces for v in seq_along(x) and do while replaces while, but I don’t think it’s entirely alien.

What might surprise some is the line

x = abs((x / 10.0_c_double))

Notice there’s no loop around this? Fortran is an array language… Rank-polymorphism, baby! I covered this in another post of mine, but thanks to this, abs() is vectorised wherever it’s needed

program test_abs
  implicit none
  integer, dimension(5) :: i = [-1, 2, -3, 4, -5]
  write(*,*) abs(i)
end program test_abs
#           1           2           3           4           5

Generating the compiled Fortran code from nd2f is as easy as

nd_F <- quickr::quick(nd2f)
nd_F
## function (x) 
## .External(<pointer: 0x11fddd73c>, x)

and we see that it’s referencing some external code. This can be called

nd_F(c(123456, 234, -72))
## [1] 6 3 2

with the big benefit that now it’s a LOT faster!

Generating a million random values and excluding any zero values, we see a roughly 40x performance increase (!!!)

set.seed(1)
nums <- round(runif(1e6, -1, 1) * 1e6)
nums <- nums[nums != 0]

b0 <- bench::mark(
  R = nd_R(nums),
  Fortran = nd_F(nums),
  min_iterations = 10
)

dplyr::arrange(b0[, 1:8], median)
## # A tibble: 2 × 6
##   expression      min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 Fortran      3.56ms   3.88ms    255.      15.3MB    266. 
## 2 R          154.27ms 154.42ms      6.48    15.3MB     25.9
plot(b0)

For those not familiar, this benchmark plot shows the individual times taken for repeated executions of the code in each ‘expression’, grouped vertically by the ‘expression’ itself (annotated as the language here) with some random scatter to show the spread of execution times. Points to the left are faster. It’s also worth noting that bench::mark() defaults to check = TRUE so we can rest assured that the results from each of the different languages we’re about to explore are consistent and it’s not some artifact of one language doing less work.
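
As a quick aside on that check = TRUE default: if two expressions disagreed, bench::mark() would throw an error instead of returning timings. A hypothetical example

# these produce different results, so the default check fails and
# bench::mark() aborts rather than silently timing different work
bench::mark(
  ok  = sum(1:10),  # 55
  off = sum(1:9)    # 45
)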

If you run these yourself you’ll get slightly different results. I’m running them on a newish M3 MacBook Pro.

All that performance increase from just adding one line to the R code and wrapping it with one other function (resulting in an entirely different program being written and compiled, producing the correct results).

I should note that in the first iteration of this post (in which while was not yet supported) I used an excessive for loop which resulted in a not-as-impressive-but-still-very-impressive 15x performance boost.

R (compiled)

If compiled code is so great, what about just compiling the R code with, e.g. compiler::cmpfun()?

nd_comp <- compiler::cmpfun(nd_R)

nd_comp(c(123456, 234, -72))
## [1] 6 3 2
b1 <- bench::mark(
  compiled = nd_comp(nums),
  R = nd_R(nums),
  Fortran = nd_F(nums),
  min_iterations = 10
)

dplyr::arrange(b1[, 1:8], median)
## # A tibble: 3 × 6
##   expression      min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 Fortran      3.58ms   3.93ms    253.      15.3MB   123.  
## 2 compiled   147.86ms    149ms      6.66    15.3MB     6.66
## 3 R          154.69ms 155.25ms      6.30    15.3MB     2.70
plot(b1)

That doesn’t help; by the time the benchmark was running, the nd_R function had already been called enough times for it to be JIT-compiled anyway.
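
That’s because R’s JIT byte-compiler has been enabled by default since R 3.4. We can query the current level without changing anything by passing a negative value

compiler::enableJIT(-1)  # a negative argument reports the current JIT level
# the default is level 3, under which closures are byte-compiled around
# their first use - so nd_R was compiled almost immediately anyway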

This did get me thinking, though - what about other compiled alternatives?

C

Since I’m going through Harvard’s CS50 ‘Introduction to Computer Science’ course (cs50.harvard.edu) with R Contributors (contributor.r-project.org) to learn a bit more structured C, I figured I’d add that via coolbutuseless’ {callme} (github.com) package. This surely isn’t the world’s greatest C code, but it compiles and runs…

callme::compile(
  "
#include <R.h>
#include <Rinternals.h>
#include <stdlib.h>
#include <math.h>

SEXP nd_C(SEXP vec) {
  double *vec_ptr = REAL(vec);
  SEXP res = PROTECT(allocVector(REALSXP, length(vec)));
  double *res_ptr = REAL(res);
  for (int i = 0; i < length(vec); i++) {
    double abs_x = fabs(vec_ptr[i] / 10.0);
    int d = 1;
    double m = 1.0;
    while (m <= abs_x) {
      m *= 10.0;
      d++;
    }
    res_ptr[i] = d;
  }

  UNPROTECT(1);
  return res;
}
"
)

nd_C(c(123456, 234, -72))
## [1] 6 3 2

So, how does it compare?

b2 <- bench::mark(
  C = nd_C(nums),
  R = nd_R(nums),
  Fortran = nd_F(nums),
  min_iterations = 10
)

dplyr::arrange(b2[, 1:8], median)
## # A tibble: 3 × 6
##   expression      min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 Fortran       3.3ms   3.58ms    277.     15.26MB   107.  
## 2 C            3.92ms   4.14ms    240.      7.63MB    50.6 
## 3 R          147.99ms 149.25ms      6.71   15.26MB     4.47
plot(b2)

Whoa - automatically transpiled Fortran runs faster than (my) C… That’s fast.

Impressively fast

C++

What about C++ via {Rcpp}? Dealing with vectors is made easier by {Rcpp} having pre-built types compatible with R, and this otherwise looks very similar to the R code

nd_Rcpp <- Rcpp::cppFunction(
  "
IntegerVector nd(NumericVector x) {
    int n = x.size();
    IntegerVector out(n);

    for (int v = 0; v < n; v++) {
        double abs_x = std::abs(x[v] / 10.0);
        int d = 1;
        double m = 1.0;  // double, as in the C version, so >9-digit inputs can't overflow an int
        while (m <= abs_x) {
            m *= 10.0;
            d++;
        }
        out[v] = d;
    }

    return out;
}
"
)

nd_Rcpp(c(123456, 234, -72))
## [1] 6 3 2
b3 <- bench::mark(
  `C++` = nd_Rcpp(nums),
  C = nd_C(nums),
  R = nd_R(nums),
  Fortran = nd_F(nums),
  min_iterations = 10
)

dplyr::arrange(b3[, 1:8], median)
## # A tibble: 4 × 6
##   expression      min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 C++          3.02ms   3.12ms    320.      3.82MB    16.6 
## 2 Fortran      3.29ms   3.58ms    279.     15.26MB    92.9 
## 3 C            3.75ms   3.93ms    255.      7.63MB    24.9 
## 4 R          148.52ms 148.85ms      6.71   15.26MB     1.68
plot(b3)

This one seems to wander around a bit; on different runs I’ve seen performance equal to or better than the C code, and on others about 3x as long, but generally it’s pretty fast.

Julia

After all of this, I remembered that I had been comparing against the Julia implementation - how does that perform? Julia is a JIT/AOT compiled language, so maybe it’s not too bad… and I can still call it directly from R

JuliaCall::julia_eval("ndigits.([123456, 234, -72])")
## [1] 6 3 2

keeping in mind that the Julia function ndigits (whose implementation (github.com) I’ve borrowed for all of the examples, so we are dealing with the same algorithm in each case) is in fact compiled, but available simply as ndigits(). As long as I make the vector available in the Julia session (as integers; the function is only defined for integers) I can run this

JuliaCall::julia_assign("nums", as.integer(nums))

b4 <- bench::mark(
  Julia = JuliaCall::julia_eval("ndigits.(nums)"),
  `C++` = nd_Rcpp(nums),
  C = nd_C(nums),
  R = nd_R(nums),
  Fortran = nd_F(nums),
  min_iterations = 10
)

dplyr::arrange(b4[, 1:8], median)
## # A tibble: 5 × 6
##   expression      min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 C++          3.02ms   3.06ms    324.      3.81MB    26.6 
## 2 Julia         3.1ms   3.32ms    266.      3.81MB    22.0 
## 3 Fortran      3.29ms    3.6ms    270.     15.26MB    84.2 
## 4 C            3.74ms   3.94ms    251.      7.63MB    50.7 
## 5 R          157.75ms 157.91ms      6.33   15.26MB     2.71
plot(b4)

Ten points to Julia - remember, this is a language that looks and feels like an interpreted one.

That’s really fast!

I should note there’s work being done towards making Julia binaries out of scripts (jbytecode.github.io), but this still has a startup time of a few dozen milliseconds for even a Hello, World example.

Rust

One more? What about Rust? We can use {rextendr} to call Rust code inline, making sure to target the release profile for maximum performance

rextendr::rust_function(
  r"(
  fn nd_Rust(x: &[f64]) -> Vec<i32> {
    let mut out = vec![0; x.len()];
    for v in 0..x.len() {
        let abs_x = x[v].abs() / 10.0;
        let mut d = 1;
        let mut m = 1.0;
        while m <= abs_x {
            m *= 10.0;
            d += 1;
        }
        out[v] = d;
    }
    out
  }
)",
  profile = "release"
)
## ℹ build directory: '/private/var/folders/1h/k6c5hb4d2qx07m8kfqb54f9c0000gn/T/RtmppiKq7x/file8b2f28646e2'
## ✔ Writing '/private/var/folders/1h/k6c5hb4d2qx07m8kfqb54f9c0000gn/T/RtmppiKq7x/file8b2f28646e2/target/extendr_wrappers.R'
nd_Rust(c(123456, 234, -72))
## [1] 6 3 2
b5 <- bench::mark(
  Rust = nd_Rust(nums),
  Julia = JuliaCall::julia_eval("ndigits.(nums)"),
  `C++` = nd_Rcpp(nums),
  C = nd_C(nums),
  R = nd_R(nums),
  Fortran = nd_F(nums),
  min_iterations = 10
)

dplyr::arrange(b5[, 1:8], median)
## # A tibble: 6 × 6
##   expression      min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 Rust         2.87ms   3.07ms    322.      3.82MB   24.9  
## 2 C++          3.03ms   3.13ms    318.      3.81MB   22.1  
## 3 Fortran      3.28ms   3.56ms    283.     15.26MB   65.5  
## 4 Julia        3.24ms   3.66ms    265.      3.81MB   17.8  
## 5 C            3.75ms   3.94ms    255.      7.63MB   22.5  
## 6 R          150.69ms 151.09ms      6.61   15.26MB    0.734
plot(b5)

Ridiculous speeds!

We are truly spoiled for choice these days - not only do we have a plethora of languages we can call directly from R, but several of them run faster than even (at least my implementation of) C and count the number of digits of a million values in under 4ms.

Python

Just for funsies, what about Python? It’s not a compiled language, but maybe if I use numpy it will be fast…? It’s at least another language I can call from R and one that’s generally considered ‘faster’ than R. Is it?

library(reticulate)
reticulate::py_run_string('
import numpy as np
def nd_python(x):
    x = np.asarray(x)
    out = np.zeros(len(x), dtype=int)  # numpy only provides the container

    # the work itself is a plain element-by-element Python loop
    for v in range(len(x)):
        abs_x = abs(x[v] / 10.0)
        d = 1
        m = 1
        while m <= abs_x:
            m *= 10
            d += 1
        out[v] = d

    return out.tolist()
')

py$nd_python(c(123456, 234, -72))
## [1] 6 3 2
b6 <- bench::mark(
  Python = py$nd_python(nums),
  Rust = nd_Rust(nums),
  Julia = JuliaCall::julia_eval("ndigits.(nums)"),
  `C++` = nd_Rcpp(nums),
  C = nd_C(nums),
  R = nd_R(nums),
  Fortran = nd_F(nums),
  min_iterations = 10
)

dplyr::arrange(b6[, 1:8], median)
## # A tibble: 7 × 6
##   expression      min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 Rust         2.87ms   3.08ms    322.      3.81MB   13.0  
## 2 C++          3.02ms   3.13ms    321.      3.81MB    6.21 
## 3 Fortran      3.27ms   3.44ms    285.     15.26MB   32.7  
## 4 Julia        3.33ms   3.53ms    280.      3.81MB    7.49 
## 5 C            3.74ms   3.94ms    255.      7.63MB   10.6  
## 6 R           157.9ms 158.51ms      6.31   15.26MB    0.701
## 7 Python     269.08ms 271.27ms      3.64    3.81MB    0
plot(b6)

In fairness, there’s overhead here involved with calling it from R, but I think that’s apples-to-apples considering I’m doing the same with all the compiled languages.

Does it scale?

I’ve been running these benchmarks for a million numbers, but how do the results scale with that size? What if it’s just a handful of numbers? What about in between these extremes? Running the benchmarks at various scales should show this.

n_vals <- 10^(1:7)
scales <- purrr::map_df(n_vals, ~{
  set.seed(1)
  nums <- round(runif(.x, -1, 1) * .x)
  nums <- nums[nums != 0]
  JuliaCall::julia_assign("nums", as.integer(nums))
  b <- bench::mark(
    Python = py$nd_python(nums),
    Rust = nd_Rust(nums),
    Julia = JuliaCall::julia_eval("ndigits.(nums)"),
    `C++` = nd_Rcpp(nums),
    C = nd_C(nums),
    R = nd_R(nums),
    Fortran = nd_F(nums),
    min_iterations = 10,
    check = TRUE
  )
  dplyr::bind_cols(vec_len = .x, b[, 1:8])
})

library(ggplot2)
ggplot(scales,
       aes(x = vec_len,
           y = 1e6*as.numeric(median),
           col = as.character(expression)
       )) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  scale_x_log10() +
  scale_y_log10() +
  scale_color_discrete(palette = "Set2") +
  labs(
    title = "Scaling of Counting ndigits Benchmarks",
    x = "Vector Length",
    y = "Microseconds",
    color = "Language"
  ) +
  theme_bw()

What a nice, log-log linear result with that one exception - Julia is pretty constant up until 1000 values, after which it starts to follow the same trajectory as the other languages - presumably that’s just the fixed per-call overhead of crossing from R into the (already running) Julia session, which dominates until there’s enough work to amortise it.

There’s definitely a clear divide between the interpreted languages (R and Python) and the compiled ones.

At lower vector lengths there’s a little bit of a spread with Fortran really showing off at the lowest lengths

dplyr::arrange(scales[scales$vec_len == 10, ], median)
## # A tibble: 7 × 7
##   vec_len expression      min   median `itr/sec` mem_alloc `gc/sec`
##     <dbl> <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1      10 Fortran     122.9ns 205.12ns  4081331.        0B     0   
## 2      10 C++         287.1ns 369.04ns  2585126.        0B     0   
## 3      10 C             410ns 491.97ns  1914470.        0B     0   
## 4      10 Rust        532.9ns  697.1ns  1367702.        0B     0   
## 5      10 R             943ns   1.11µs   878760.        0B     0   
## 6      10 Python       20.7µs  22.92µs    43281.        0B     4.33
## 7      10 Julia        69.8µs  71.59µs    13636.        0B     2.03

but we’re looking at sub-microsecond differences - what will you do with all that free time?

By the time we’re looking at 1000 values, the natively compiled languages are all about the same

dplyr::arrange(scales[scales$vec_len == 1000, ], median)
## # A tibble: 7 × 7
##   vec_len expression      min   median `itr/sec` mem_alloc `gc/sec`
##     <dbl> <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1    1000 C++          1.68µs   1.84µs   487745.    3.95KB     0   
## 2    1000 Rust         2.13µs   2.46µs   397120.    3.95KB     0   
## 3    1000 Fortran      1.97µs   2.79µs   347513.   15.72KB    34.8 
## 4    1000 C            2.79µs   3.08µs   299955.    7.86KB    30.0 
## 5    1000 Julia       73.06µs  77.49µs    12381.    3.95KB     0   
## 6    1000 R           77.53µs  81.22µs    12306.   15.72KB     0   
## 7    1000 Python      213.9µs 232.35µs     4282.    3.95KB     2.06

At ten million values it’s a complete wash between the compiled languages, with maybe a slight drop for C

dplyr::arrange(scales[scales$vec_len == 1e7, ], median)
## # A tibble: 7 × 7
##    vec_len expression      min   median `itr/sec` mem_alloc `gc/sec`
##      <dbl> <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 10000000 Rust        33.78ms  34.58ms    28.9      38.1MB   2.07  
## 2 10000000 C++         35.05ms  35.38ms    28.3      38.1MB   2.02  
## 3 10000000 Julia       37.63ms  38.56ms    18.1      38.1MB   2.01  
## 4 10000000 Fortran     39.29ms   41.9ms    23.8     152.6MB  17.0   
## 5 10000000 C           43.99ms  44.19ms    22.5      76.3MB   2.05  
## 6 10000000 R              1.8s    1.81s     0.550   152.6MB   0.236 
## 7 10000000 Python        3.06s     3.1s     0.322    38.1MB   0.0357

All very interesting!

It would probably be worthwhile digging into the memory usage of all of these, since the big differences likely indicate something different happening under the hood. One hint from the generated Fortran above: the subroutine takes x as intent(in out) and overwrites it with abs(x / 10), so presumably R has to hand over a copy of the input on top of allocating the output. Beyond that it’s outside my understanding - feel free to let me know!
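
A rough way to probe that extra-copy theory from the R side (just a sketch; tracemem() needs a build with memory profiling enabled, which the standard binaries have)

v <- runif(10)
tracemem(v)            # report whenever v gets duplicated
invisible(nd_F(v))     # if the theory holds, a duplication is reported here
invisible(nd_Rcpp(v))  # the C++ version only reads x, so no copy is expected
untracemem(v)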

So, what might be the reason for Rust and Julia being so fast, even compared to C? These are newer languages with a lot of focus on their compilers, and it’s entirely possible that they’re able to make some optimisations that a very general C compiler can’t, but more likely we’re at the upper limit of what the machine can do in that much time and my C code is non-optimal.

Conclusions

Back to the original point, though - the transpilation does an amazing job of speeding up the code without me having to write anything in a different language. Sure, Julia solves this ‘two language problem’ by just being ridiculously fast to begin with, but if I’m writing R code, it’s fantastic to see there’s an option for just “making it go brrr” without actually doing anything extra.

Not every R function has a Fortran translation implemented yet, so there’s plenty of code that won’t transpile just yet, but it’s a truly inspiring project that I’ll surely be keeping a close eye on.

I’d love to hear what people think about these comparisons - are there points I’ve overlooked? Better ways to do it? Improvements to my implementations which change the results? Other considerations I’ve missed? As always, I can be found on Mastodon (fosstodon.org) and the comment section below.


devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.1 (2024-06-14)
##  os       macOS 15.5
##  system   aarch64, darwin20
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Australia/Adelaide
##  date     2025-06-29
##  pandoc   3.4 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package      * version    date (UTC) lib source
##  beeswarm       0.4.0      2021-06-01 [1] CRAN (R 4.4.1)
##  bench          1.1.4      2025-01-16 [1] CRAN (R 4.4.1)
##  blogdown       1.21.1     2025-06-28 [1] Github (rstudio/blogdown@33313a5)
##  bookdown       0.41       2024-10-16 [1] CRAN (R 4.4.1)
##  brio           1.1.5      2024-04-24 [1] CRAN (R 4.4.0)
##  bslib          0.8.0      2024-07-29 [1] CRAN (R 4.4.0)
##  cachem         1.1.0      2024-05-16 [1] CRAN (R 4.4.0)
##  callme         0.1.10     2024-07-27 [1] CRAN (R 4.4.0)
##  cli            3.6.4      2025-02-13 [1] CRAN (R 4.4.1)
##  codetools      0.2-20     2024-03-31 [1] CRAN (R 4.4.1)
##  devtools       2.4.5      2022-10-11 [1] CRAN (R 4.4.0)
##  dichromat      2.0-0.1    2022-05-02 [1] CRAN (R 4.4.1)
##  digest         0.6.37     2024-08-19 [1] CRAN (R 4.4.1)
##  dotty          0.1.0      2024-08-30 [1] CRAN (R 4.4.1)
##  dplyr          1.1.4      2023-11-17 [1] CRAN (R 4.4.0)
##  ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.4.0)
##  evaluate       1.0.3      2025-01-10 [1] CRAN (R 4.4.1)
##  farver         2.1.2      2024-05-13 [1] CRAN (R 4.4.0)
##  fastmap        1.2.0      2024-05-15 [1] CRAN (R 4.4.0)
##  fs             1.6.5      2024-10-30 [1] CRAN (R 4.4.1)
##  generics       0.1.3      2022-07-05 [1] CRAN (R 4.4.0)
##  ggbeeswarm     0.7.2      2023-04-29 [1] CRAN (R 4.4.0)
##  ggplot2      * 3.5.2.9001 2025-06-15 [1] Github (tidyverse/ggplot2@9f80c8c)
##  glue           1.8.0      2024-09-30 [1] CRAN (R 4.4.1)
##  gtable         0.3.6      2024-10-25 [1] CRAN (R 4.4.1)
##  here           1.0.1      2020-12-13 [1] CRAN (R 4.4.0)
##  htmltools      0.5.8.1    2024-04-04 [1] CRAN (R 4.4.0)
##  htmlwidgets    1.6.4      2023-12-06 [1] CRAN (R 4.4.0)
##  httpuv         1.6.15     2024-03-26 [1] CRAN (R 4.4.0)
##  jquerylib      0.1.4      2021-04-26 [1] CRAN (R 4.4.0)
##  jsonlite       2.0.0      2025-03-27 [1] CRAN (R 4.4.1)
##  JuliaCall      0.17.6     2024-12-07 [1] CRAN (R 4.4.1)
##  knitr          1.50       2025-03-16 [1] CRAN (R 4.4.1)
##  later          1.4.1      2024-11-27 [1] CRAN (R 4.4.1)
##  lattice        0.22-6     2024-03-20 [1] CRAN (R 4.4.1)
##  lifecycle      1.0.4      2023-11-07 [1] CRAN (R 4.4.0)
##  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.4.0)
##  Matrix         1.7-1      2024-10-18 [1] CRAN (R 4.4.1)
##  memoise        2.0.1      2021-11-26 [1] CRAN (R 4.4.0)
##  mime           0.12       2021-09-28 [1] CRAN (R 4.4.0)
##  miniUI         0.1.1.1    2018-05-18 [1] CRAN (R 4.4.0)
##  pillar         1.10.1     2025-01-07 [1] CRAN (R 4.4.1)
##  pkgbuild       1.4.7      2025-03-24 [1] CRAN (R 4.4.1)
##  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.4.0)
##  pkgload        1.4.0      2024-06-28 [1] CRAN (R 4.4.0)
##  png            0.1-8      2022-11-29 [1] CRAN (R 4.4.0)
##  processx       3.8.6      2025-02-21 [1] CRAN (R 4.4.1)
##  profmem        0.7.0      2025-05-02 [1] CRAN (R 4.4.1)
##  profvis        0.4.0      2024-09-20 [1] CRAN (R 4.4.1)
##  promises       1.3.2      2024-11-28 [1] CRAN (R 4.4.1)
##  ps             1.9.0      2025-02-18 [1] CRAN (R 4.4.1)
##  purrr          1.0.4      2025-02-05 [1] CRAN (R 4.4.1)
##  quickr         0.1.0.9000 2025-06-29 [1] Github (t-kalinowski/quickr@254b4d0)
##  R6             2.6.1      2025-02-15 [1] CRAN (R 4.4.1)
##  RColorBrewer   1.1-3      2022-04-03 [1] CRAN (R 4.4.0)
##  Rcpp           1.0.14     2025-01-12 [1] CRAN (R 4.4.1)
##  remotes        2.5.0      2024-03-17 [1] CRAN (R 4.4.1)
##  reticulate   * 1.42.0     2025-03-25 [1] CRAN (R 4.4.1)
##  rextendr       0.3.1      2023-06-20 [1] CRAN (R 4.4.0)
##  rlang          1.1.5      2025-01-17 [1] CRAN (R 4.4.1)
##  rmarkdown      2.28       2024-08-17 [1] CRAN (R 4.4.0)
##  rprojroot      2.0.4      2023-11-05 [1] CRAN (R 4.4.0)
##  rstudioapi     0.17.1     2024-10-22 [1] CRAN (R 4.4.1)
##  S7             0.2.0      2024-11-07 [1] CRAN (R 4.4.1)
##  sass           0.4.9      2024-03-15 [1] CRAN (R 4.4.0)
##  scales         1.4.0      2025-04-24 [1] CRAN (R 4.4.1)
##  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.4.0)
##  shiny          1.9.1      2024-08-01 [1] CRAN (R 4.4.0)
##  stringi        1.8.4      2024-05-06 [1] CRAN (R 4.4.0)
##  tibble         3.2.1      2023-03-20 [1] CRAN (R 4.4.0)
##  tidyr          1.3.1      2024-01-24 [1] CRAN (R 4.4.0)
##  tidyselect     1.2.1      2024-03-11 [1] CRAN (R 4.4.0)
##  urlchecker     1.0.1      2021-11-30 [1] CRAN (R 4.4.0)
##  usethis        3.1.0.9000 2025-03-31 [1] Github (r-lib/usethis@a653d6e)
##  utf8           1.2.4      2023-10-22 [1] CRAN (R 4.4.0)
##  vctrs          0.6.5      2023-12-01 [1] CRAN (R 4.4.0)
##  vipor          0.4.7      2023-12-18 [1] CRAN (R 4.4.1)
##  withr          3.0.2      2024-10-28 [1] CRAN (R 4.4.1)
##  xfun           0.51       2025-02-19 [1] CRAN (R 4.4.1)
##  xtable         1.8-4      2019-04-21 [1] CRAN (R 4.4.0)
##  yaml           2.3.10     2024-07-26 [1] CRAN (R 4.4.0)
## 
##  [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
## 
## ─ Python configuration ───────────────────────────────────────────────────────
##  python:         /Users/jono/Library/Caches/org.R-project.R/R/reticulate/uv/cache/archive-v0/2tc-cviHm3ODucI_hIfUb/bin/python3
##  libpython:      /Users/jono/Library/Caches/org.R-project.R/R/reticulate/uv/python/cpython-3.11.12-macos-aarch64-none/lib/libpython3.11.dylib
##  pythonhome:     /Users/jono/Library/Caches/org.R-project.R/R/reticulate/uv/cache/archive-v0/2tc-cviHm3ODucI_hIfUb:/Users/jono/Library/Caches/org.R-project.R/R/reticulate/uv/cache/archive-v0/2tc-cviHm3ODucI_hIfUb
##  virtualenv:     /Users/jono/Library/Caches/org.R-project.R/R/reticulate/uv/cache/archive-v0/2tc-cviHm3ODucI_hIfUb/bin/activate_this.py
##  version:        3.11.12 (main, Apr  9 2025, 03:49:53) [Clang 20.1.0 ]
##  numpy:          /Users/jono/Library/Caches/org.R-project.R/R/reticulate/uv/cache/archive-v0/2tc-cviHm3ODucI_hIfUb/lib/python3.11/site-packages/numpy
##  numpy_version:  2.3.1
##  
##  NOTE: Python version was forced by py_require()
## 
## ──────────────────────────────────────────────────────────────────────────────


rstats  c  c++  fortran  rust  julia  python 
