Taking from Infinite Sequences

One thing that has really caught my attention as I learn more programming languages is the idea of generators or infinite sequences of values. Yes, infinite. Coming from R, that seems unlikely, but in at least several other languages, it’s entirely possible thanks to iterators and lazy evaluation.

I saw this video (youtu.be) which solves a codewars challenge using an infinite list, which references this one (youtu.be) on the same topic.

First, a diversion into recursion

A timely combination. @rverbsr
A timely combination. @rverbsr (botsin.space)

In Haskell, one of the first exercises after “Hello, World!” people discover is the Fibonacci sequence, where the nthn^{\rm th} value is given by the sum of the two previous values. As a function, this can be written as

λ> fib 0 = 0
λ> fib 1 = 1
λ> fib n = fib (n-1) + fib (n-2)

Essentially, fib(0) returns 0. fib(1) returns 1, and for any value n it returns the (recursively defined) sum of the two previous values. This isn’t infinite at all…

fib(4)=fib(3)+fib(2)=(fib(2)+fib(1))+(fib(1)+fib(0))=fib(1)+fib(0)+fib(1)+fib(1)+fib(0)=1+0+1+1+0=3 \begin{align*} {\rm fib}(4) &= {\rm fib}(3) + {\rm fib}(2)\\ &= ({\rm fib}(2) + {\rm fib}(1)) + ({\rm fib}(1) + {\rm fib}(0))\\ &= {\rm fib}(1) + {\rm fib}(0) + {\rm fib}(1) + {\rm fib}(1) + {\rm fib}(0)\\ &= 1 + 0 + 1 + 1 + 0\\ &= 3 \end{align*}

We could write that in R as

fib <- function(n) {
  if (n == 0) return(0)
  if (n == 1) return(1)
  fib(n - 1) + fib(n - 2)
}

fib(4)
## [1] 3
fib(8)
## [1] 21

This may come as a surprise to some - the function is defined in terms of itself and some base cases. This is entirely fine in R and Haskell as they’re lazily evaluated - nothing happens until a value is actually used. In R, this means that if an argument to a function isn’t used, it’s not evaluated at all

loud_func <- function() {
  cat("HELLLOOOOO!!!\n")
}

stays_quiet <- function(f, g = loud_func()) {
  f(c(1, 2, 3, 4))
}

stays_quiet(f = mean)
## [1] 2.5

Notice that although the argument g is the invocation of loud_func(), it’s never evaluated because we don’t use it. If we did use it…

noisy <- function(f, g = loud_func()) {
  g
  f(c(1, 2, 3, 4))
}

noisy(sum)
## HELLLOOOOO!!!
## [1] 10

So, in these languages, we can define a function recursively. Given the base case(s), these will eventually return just a number, so the computation will complete.

Instead of just adding the values together, we can create a sequence of values by concatenating the iterations together. Starting with data and working down to a base case is called “recursion”, while starting from a base case and building up a data structure is “corecursion”.

If we want a sequence of values that represents the Fibonacci numbers, we can use

fibs <- function(n) {
  if (n == 0) return(0)
  if (n == 1) return(1)
  prev <- fibs(n - 1)
  c(prev, sum(tail(prev, 2)))
}

fibs(10)
##  [1]  1  1  2  3  5  8 13 21 34 55

What’s blown my mind recently is the concept of infinite data structures. If I defined some function that, instead of working down to a base case, just kept expanding, say, by concatenating with a larger number (corecursion), such as

inf_series <- function(x) {
  c(x, inf_series(x + 1))
}

Then, if I tried to evaluate inf_series(5) this would produce

c(5, inf_series(6))

which would expand to

c(5, 6, inf_series(7))

and so on… forever. Of course, R can’t keep going forever

inf_series(5)
Error: C stack usage  7971732 is too close to the limit

This error comes about because each function in R is a “closure” which “encloses” its environment. The way R keeps track of that (and where it needs to return after returning from a function) is by adding “stack frames” each time it dives deeper into a function calling a function. We’ve asked it to add infinity of these, so at some point it says “too many”.

Okay, so, not possible, right?

In Haskell I can define

λ> fibs = 0 : scanl (+) 1 fibs

which is a concatenation (:) of 0 with the result of scanl (+) 1 fibs. Note carefully, this isn’t a function - it’s a vector of values defined recursively 🤯

Mind = blown
Mind = blown

To explain the definition a little more: scanl (hackage.haskell.org) is similar to reduce in that it takes a starting value, a vector, and a binary operator, but rather than reducing the vector to a value, it creates a new vector with the successively reduced values. In this example, the values 1..5 are successively added (+) to 0, so the second entry is 0+1=1, the next is 1+2=3, the next is 3+3=6, then 6+4=10, then 10+5=15

λ> scanl (+) 0 [1..5]
[0,1,3,6,10,15]

In the Fibonacci case, the operator is still addition, but the starting value is 1 and the vector is … the entire vector we’re defining. Writing out some of the terms makes this easier to understand

λ> scanl (+) 1 [0, 1, 1, 2, 3, 5, 8]
[1,1,2,3,5,8,13,21]

The first terms in the sequence, after concatenating with 0 will be

λ> [0, 1, 1, 2, 3, 5, 8]

so, by defining fibs in terms of fibs, the sequence can go on forever. So, what if you try to print this? In GHCI, the output will just stream forever, which isn’t particularly useful. Instead, we can ask for some number of values, say, ten

λ> take 10 fibs
[0,1,1,2,3,5,8,13,21,34]

Due to the laziness of Haskell, nothing is computed until it’s needed, so asking for any number of values is fast, despite the list being “infinite”

λ> take 100 fibs
[0,1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765,10946,
 17711,28657,46368,75025,121393,196418,317811,514229,832040,1346269,2178309,
 3524578,5702887,9227465,14930352,24157817,39088169,63245986,102334155,
 165580141,267914296,433494437,701408733,1134903170,1836311903,2971215073,
 4807526976,7778742049,12586269025,20365011074,32951280099,53316291173,
 86267571272,139583862445,225851433717,365435296162,591286729879,956722026041,
 1548008755920,2504730781961,4052739537881,6557470319842,10610209857723,
 17167680177565,27777890035288,44945570212853,72723460248141,117669030460994,
 190392490709135,308061521170129,498454011879264,806515533049393,
 1304969544928657,2111485077978050,3416454622906707,5527939700884757,
 8944394323791464,14472334024676221,23416728348467685,37889062373143906,
 61305790721611591,99194853094755497,160500643816367088,259695496911122585,
 420196140727489673,679891637638612258,1100087778366101931,1779979416004714189,
 2880067194370816120,4660046610375530309,7540113804746346429,12200160415121876738,
 19740274219868223167,31940434634990099905,51680708854858323072,
 83621143489848422977,135301852344706746049,218922995834555169026]

The idea of taking some values from an iterator shows up in other languages.

In Rust, I can create an infinite iterator of the value 1

use std::iter;
let ones = iter::repeat(1);

and I can take some number of these, say, five, collected into a vector

ones.take(5).collect::<Vec<_>>()
[1, 1, 1, 1, 1]

Python also has a notion of infinite sequences, and they’re likely even more common.

When you work with a (regular, finite) list of values in a range, you get a range object

numbers = range(1, 9)
numbers
## range(1, 9)
type(numbers)
## <class 'range'>

You can convert this to a regular list with

list(numbers)
## [1, 2, 3, 4, 5, 6, 7, 8]
type(list(numbers))
## <class 'list'>

If you filter the numbers (which works on a range or a list) you get a filter object

even_numbers = filter(lambda x: x % 2 == 0, numbers)
even_numbers
## <filter object at 0x7f5ac61da470>
type(even_numbers)
## <class 'filter'>

which you can also convert to a list to see the values

list(even_numbers)
## [2, 4, 6, 8]

But, you can use this as an iterable, so you can get the ‘next’ value as many times as you need (defined here again to restart the iterator)

even_numbers = filter(lambda x: x % 2 == 0, numbers)
next(even_numbers)
## 2
next(even_numbers)
## 4
next(even_numbers)
## 6
next(even_numbers)
## 8

until there’s none left

next(even_numbers)
## Error in py_call_impl(callable, dots$args, dots$keywords): StopIteration

As before, you can convert these to a fixed-length list, if desired

list(filter(lambda x: x % 2 == 0, numbers))
## [2, 4, 6, 8]

Still with me? Great. We can create an infinite list, if we want to, because it isn’t evaluated until we ask for elements

def infinitenumbers():
    count = 0
    while True:
        yield count
        count += 1

nums = infinitenumbers()        

nums
## <generator object infinitenumbers at 0x7f5ac615d310>
type(nums)
## <class 'generator'>

This is a generator which means it’s capable of generating values. We can ask for as many as we want, now

next(nums)
## 0
next(nums)
## 1
next(nums)
## 2
next(nums)
## 3
next(nums)
## 4

If we want to convert some number of these to a list, we need a new function, roughly the equivalent of Haskell’s take, in order to extract these

from itertools import islice

nums = infinitenumbers()        

list(islice(nums, 10))
## [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

and as before, if we ask for more now, we get the next batch

list(islice(nums, 10))
## [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

So, can we do this at all in R? We can’t build an infinite data structure per se, but what if we use a recursive definition and just… stop when it’s recursed enough times?

I initially thought about hacking into functions with body() to keep track of this, but we’ve already discussed something we can use - the stack frames! R keeps track of how deep it’s recursed using these, and we can access that information with sys.calls(), the help for which describes

.GlobalEnv is given number 0 in the list of frames. Each subsequent function evaluation increases the frame stack by 1.

So, we can tell from inside a recursive function how deeply nested we currently are.

I wrote the following helper, named for the Haskell inspiration

take <- function(f, n, x = 1) {
  current_depth <- length(sys.calls())  # Subtract 1 to exclude the current call,
                                        # but add 1 to start at 1
  
  # uncomment this line to watch the magic happen!
  # cat("Current Stack Depth: ", current_depth, "\n")
 
  if (current_depth >= n) {
    return(f(x))
  } else {
    return(c(f(x), take(f, n, x + 1)))
  }
}

this checks length(sys.calls()) which starts at 1 the first time it’s called, and adds 1 every time we go deeper. So long as we haven’t reached the requested depth, it combines the passed-in function evaluated at x+ix + i with a new evaluation one level deeper.

When that reaches the requested depth, it returns the evaluated function, bubbling up the returned values so that we end up with a vector of nn values

[f(x),f(x+1),f(x+2),f(x+3),,f(x+n1)] [f(x), f(x+1), f(x+2), f(x+3), \dots, f(x+n-1)]

Neat idea, but does it work? We can’t pass in an actual infinite data structure, but we can pass a function that defines one

A (trivial) function that produces a number at each value of x is

numbers <- function(x) {
  x
}

If we pretend that’s a generator for every number, we can “take” some values from it

take(numbers, 5)
 [1] 1 2 3 4 5
take(numbers, 10)
 [1]  1  2  3  4  5  6  7  8  9 10

A more complicated recipe for an infinite list of numbers could be

g <- function(x) {
  x + 1
}

take(g, 7)
[1] 2 3 4 5 6 7 8
take(g, 10)
 [1]  2  3  4  5  6  7  8  9 10 11

Dealing with non-sequential numbers might be trickier… what if we want all the even numbers?

evens <- function(x) {
  if(x %% 2 == 0) x else NULL
}

take(evens, 10)
 [1]  2  4  6  8 10

No, that only checks the first 10 numbers. Instead,

evens <- function(x) {
  x * 2
}

take(evens, 7)
 [1]  2  4  6  8 10 12 14
take(evens, 10)
 [1]  2  4  6  8 10 12 14 16 18 20

What about our original example?

fibs <- function(x) {
  if (x == 0) return(0)
  if (x == 1) return(1)
  fibs(x - 1) + fibs(x - 2)
}

# 10 values, starting at 0
take(fibs, 10, x = 0)
 [1]  0  1  1  2  3  5  8 13 21 34
# 10 values, starting at 1
take(fibs, 10, x = 1)
 [1]  1  1  2  3  5  8 13 21 34 55

Or, if we want 12 values, starting at the 10th

take(fibs, 12, x = 10)
 [1]    55    89   144   233   377   610   987  1597  2584  4181  6765 10946
It… works!
It… works!

I’d say that’s working quite nicely!!!

Some caveats to keep in mind, though…

Since we’re relying on a count of stack frames on top of .GlobalEnv, this take() implementation won’t work nicely inside another function. In fact, since {knitr} is already a few functions deep, it also doesn’t work in an Rmd file (including this blog which is Rmd via {blogdown}). Not for use in production, but a fun exercise to figure it out at all.

Is there a better way to achieve this take() functionality? Where do you use infinite iterators/generators in R or another language? Spot an improvement that I can make? I can be found on Mastodon (fosstodon.org) or use the comments below.


devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.1.2 (2021-11-01)
##  os       Pop!_OS 22.04 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_AU.UTF-8
##  ctype    en_AU.UTF-8
##  tz       Australia/Adelaide
##  date     2023-08-18
##  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  blogdown      1.17    2023-05-16 [1] CRAN (R 4.1.2)
##  bookdown      0.29    2022-09-12 [1] CRAN (R 4.1.2)
##  bslib         0.5.0   2023-06-09 [3] CRAN (R 4.3.1)
##  cachem        1.0.8   2023-05-01 [3] CRAN (R 4.3.0)
##  callr         3.7.3   2022-11-02 [3] CRAN (R 4.2.2)
##  cli           3.6.1   2023-03-23 [3] CRAN (R 4.2.3)
##  crayon        1.5.2   2022-09-29 [3] CRAN (R 4.2.1)
##  devtools      2.4.5   2022-10-11 [1] CRAN (R 4.1.2)
##  digest        0.6.33  2023-07-07 [3] CRAN (R 4.3.1)
##  ellipsis      0.3.2   2021-04-29 [3] CRAN (R 4.1.1)
##  evaluate      0.21    2023-05-05 [3] CRAN (R 4.3.0)
##  fastmap       1.1.1   2023-02-24 [3] CRAN (R 4.2.2)
##  fs            1.6.3   2023-07-20 [3] CRAN (R 4.3.1)
##  glue          1.6.2   2022-02-24 [3] CRAN (R 4.2.0)
##  here          1.0.1   2020-12-13 [1] CRAN (R 4.1.2)
##  htmltools     0.5.6   2023-08-10 [3] CRAN (R 4.3.1)
##  htmlwidgets   1.5.4   2021-09-08 [1] CRAN (R 4.1.2)
##  httpuv        1.6.6   2022-09-08 [1] CRAN (R 4.1.2)
##  jquerylib     0.1.4   2021-04-26 [3] CRAN (R 4.1.2)
##  jsonlite      1.8.7   2023-06-29 [3] CRAN (R 4.3.1)
##  knitr         1.43    2023-05-25 [3] CRAN (R 4.3.0)
##  later         1.3.0   2021-08-18 [1] CRAN (R 4.1.2)
##  lattice       0.21-8  2023-04-05 [4] CRAN (R 4.3.0)
##  lifecycle     1.0.3   2022-10-07 [3] CRAN (R 4.2.1)
##  magrittr      2.0.3   2022-03-30 [3] CRAN (R 4.2.0)
##  Matrix        1.6-0   2023-07-08 [4] CRAN (R 4.3.1)
##  memoise       2.0.1   2021-11-26 [3] CRAN (R 4.2.0)
##  mime          0.12    2021-09-28 [3] CRAN (R 4.2.0)
##  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.1.2)
##  pkgbuild      1.4.0   2022-11-27 [1] CRAN (R 4.1.2)
##  pkgload       1.3.0   2022-06-27 [1] CRAN (R 4.1.2)
##  png           0.1-7   2013-12-03 [1] CRAN (R 4.1.2)
##  prettyunits   1.1.1   2020-01-24 [3] CRAN (R 4.0.1)
##  processx      3.8.2   2023-06-30 [3] CRAN (R 4.3.1)
##  profvis       0.3.7   2020-11-02 [1] CRAN (R 4.1.2)
##  promises      1.2.0.1 2021-02-11 [1] CRAN (R 4.1.2)
##  ps            1.7.5   2023-04-18 [3] CRAN (R 4.3.0)
##  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.1.2)
##  R6            2.5.1   2021-08-19 [3] CRAN (R 4.2.0)
##  Rcpp          1.0.9   2022-07-08 [1] CRAN (R 4.1.2)
##  remotes       2.4.2   2021-11-30 [1] CRAN (R 4.1.2)
##  reticulate    1.26    2022-08-31 [1] CRAN (R 4.1.2)
##  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.1.2)
##  rmarkdown     2.23    2023-07-01 [3] CRAN (R 4.3.1)
##  rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.1.2)
##  rstudioapi    0.15.0  2023-07-07 [3] CRAN (R 4.3.1)
##  sass          0.4.7   2023-07-15 [3] CRAN (R 4.3.1)
##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.2)
##  shiny         1.7.2   2022-07-19 [1] CRAN (R 4.1.2)
##  stringi       1.7.12  2023-01-11 [3] CRAN (R 4.2.2)
##  stringr       1.5.0   2022-12-02 [1] CRAN (R 4.1.2)
##  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.1.2)
##  usethis       2.1.6   2022-05-25 [1] CRAN (R 4.1.2)
##  vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.1.2)
##  xfun          0.40    2023-08-09 [3] CRAN (R 4.3.1)
##  xtable        1.8-4   2019-04-21 [1] CRAN (R 4.1.2)
##  yaml          2.3.7   2023-01-23 [3] CRAN (R 4.2.2)
## 
##  [1] /home/jono/R/x86_64-pc-linux-gnu-library/4.1
##  [2] /usr/local/lib/R/site-library
##  [3] /usr/lib/R/site-library
##  [4] /usr/lib/R/library
## 
## ─ Python configuration ───────────────────────────────────────────────────────
##  python:         /usr/bin/python3
##  libpython:      /usr/lib/python3.10/config-3.10-x86_64-linux-gnu/libpython3.10.so
##  pythonhome:     //usr://usr
##  version:        3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
##  numpy:          /home/jono/.local/lib/python3.10/site-packages/numpy
##  numpy_version:  1.24.1
##  
##  NOTE: Python version was forced by RETICULATE_PYTHON_FALLBACK
## 
## ──────────────────────────────────────────────────────────────────────────────



See also