Who Moved My Cheese: Laziness in Clojure
In this post, I try to understand what lazy sequences are and how to create our own lazy sequence in Clojure.
Lazy Sequences in Clojure
Clojure reference explains laziness as:
Most of the sequence library functions are lazy, i.e. functions that return seqs do so incrementally, as they are consumed, and thus consume any seq arguments incrementally as well.
The important parts of this is that many library functions that produce sequences (e.g. lists) do so incrementally, as they are consumed. This means that unless there is someone to consume the sequence, nothing really happens.
This was particularly perplexing to the eyes of a beginner learning Clojure, especially as to why the following the following does not print anything:
(defn print-numbers [n] (map println (range n)) ;; .... (1) (println "Done printing: " n))
When this function was called with proper arguments, the REPL produced the following output:
butterfly.core=> (print-numbers 9) Done printing: 9 nil
It turns out that the function
print-numbers produces a lazy sequence on line 1 (as indicated on the code-block above), i.e., the
map produces a lazy sequence. As no one is consuming the sequence from
map, it is never really realised. For this reason, the
println inside the
map is never executed. This is what lazy evaluation means.
This can be fixed in any of the following ways.
Use a non-lazy version of
(defn print-numbers [n] (mapv println (range n)) (println "Done printing: " n))
Using map + doall
doall which realises a lazy sequence
(defn print-numbers [n] (doall (map println (range n))) (println "Done printing: " n))
Instead of generating a sequence, we can use
doseq which acts on each element of a sequence
(defn print-numbers [n] (doseq [i (range n)] (println i)) (println "Done printing: " n))
A better version (for the present example) than using
doseq is to use
run! which applies a given function on every element of a sequence, without generating another sequence (unlike
(defn print-numbers [n] (run! println (range n)) (println "Done printing: " n))
Generating a Lazy Sequence 💪🏼
Let us now try to generate an infinite lazy sequence of prime numbers.
First, we write a function which returns
true if a number is prime
(defn is-prime? [n] (not-any? (fn [factor] (zero? (mod n factor))) (range 2 (dec n))))
Let us now define an infinite sequence of prime numbers:
(def infinite-primes (filter is-prime? (drop 2 (range))))
infinite-primes var is doing a
filter operation on an infinite sequence of numbers. Because
range both produce lazy sequences, this will not halt our program. In fact, this is the power of lazy sequences, that we can compute as many number of prime numbers, as we need when we need it without having to know apriori how many we may need.
Thus, all of the following generates prime number sequences of varying lengths:
user=> (take 5 infinite-primes) (2 3 5 7 11) user=> (take 10 infinite-primes) (2 3 5 7 11 13 17 19 23 29) user=> (take 100 infinite-primes) (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541)
Constructing a Lazy Sequence from Scratch 🧙🏼
In the above part, we generated a lazy sequence by leveraging the library functions
range both of which are lazy.
In order to construct a lazy sequence from scratch, we can use
lazy-seq. But before writing a
lazy-primes function, let us write an
eager-primes function that constructs a sequence of prime numbers from scratch:
(defn eager-primes ([n] (eager-primes n 2 )) ([n curr xs] (if (= (count xs) n) xs (if (is-prime? curr) (recur n (inc curr) (conj xs curr)) (recur n (inc curr) xs)))))
Here’s the how
eager-primes can be used:
butterfly.core=> (eager-primes 8) [2 3 5 7 11 13 17 19]
This is not a lazy sequence, because irrespective of how many primes are consumed,
eager-primes will always generate
n prime numbers, no more, no less.
So, in order to make it lazy, first we need to get rid of
n, so that the production of primes depends on how many are consumed and not on a pre-defined number. For example, here we need only
10 primes, but
eager-primes will still generate
(take 10 (eager-primes 100))
In order to make it a lazy sequence, we need to stop the recursion. The way to do this is to wrap the recursive call around
lazy-seq. What this will do is that it will tell Clojure that if any one is consuming from this sequence, this is the step to repeat. In other words, if we have a pause/resume functionality around
recur, we have essentially generated a lazy sequence. But, in practice, we cannot actually use
recur because when we use
lazy-seq around the recursive call, the recursive call is no longer a tail call. So, we have to make the recursive call by using the function name.
(defn primes ( (primes 2)) ([curr] (if (is-prime? curr) (cons curr (lazy-seq (primes (inc curr)))) (lazy-seq (primes (inc curr))))))
butterfly.core=> (def all-primes (primes)) #'butterfly.core/all-primes butterfly.core=> (take 5 all-primes ) (2 3 5 7 11) butterfly.core=> (take 10 all-primes ) (2 3 5 7 11 13 17 19 23 29) butterfly.core=> (take 15 all-primes ) (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47)
Important things to note:
Why we must use
consappends an element to a sequence at the beginning, without traversing the rest of the sequence. This is important because, if we are generating an infinite sequence, we cannot traverse the sequence fully because the sequence is infinite. In contrast,
conjappends a new element at the end of the sequence, which, by definition, requires traversal of the whole sequence (to find the end). This is why we must use
lazy-seq: Whenever Clojure sees
lazy-seqit stops evaluation until someone is realising it. This means that a
conson a lazy sequence is a sequence which has not really been computed. It will be computed when someone is traversing that list, i.e., realising it. The way this works is that
conspoints to an element and a lazy sequence which, again, includes a
conscell that points to an element and another lazy sequence. The Clojure environment evaluates these
lazy-seqs as demanded as the traversal progresses. Therefore, it only evaluates that part of the sequence which has been traversed at least once.
To visualize the above, let us try to understand how a lazy sequence is traversed upon realization. For example, when four elements are taken out of the lazy sequence, the following happens:
cons cell has a subsequent sequence, which is lazy. This means that it has not been computed. Therefore, only when we traverse that lazy sequence, do we find another
cons cell with the next prime number, followed by another lazy sequence. This can be thought of as the proverbial Russian Dolls. Only by opening the first doll do you see the next doll (and not any more).
This traversal of lazy sequences and realisation (and caching of realised
cons cells) is transparently done by Clojure, as they are consumed by some enclosing code. As a corollary, by virtue of the REPL having an eval step, lazy sequences are realised on the spot if directly written on the REPL (i.e., the REPL tries to consume the entire lazy sequence).
user=> (cons 1 (range)) ;; this will never finish on the REPL
Who Moved My Cheese: Why laziness is not a bad thing
In the book ‘Who Moved My Cheese’, the author says that the lazy mouse ultimately loses out on life because when the going gets tough, the hard-working mouse (eager mouse) finds a solution and the lazy one, out of sheer laziness, perishes.
However, in a modern-day language like Clojure, a lazy sequence can prove to be very useful because some problems, like the prime numbers sequence, by definition, are infinite. If we built a web-page that displayed a paginated result of prime numbers, any language that did not implement a lazy sequence of prime numbers, would have to re-evaluate all the prime numbers for every page. This means that to generate the 500th prime number, we would have to generate 499 prime numbers and then the 500th one. Consequently, to generate the 501st prime number, we would have to re-generate the first 500 prime numbers all over again (as seen in the
eager-primes example above). This in Clojure would not be required because we maintain only one sequence of prime numbers that are realised as required and no recomputaton would be necessary.
Edit: In an earlier version, I had mentioned that a lazy sequence can provide efficiency. This is not accurate: a lazy sequence does not improve the inherent algorithmic efficiency of an expression. (However, caching of a lazy sequence does prevent recomputations). Thanks to Aditya Athalye, for pointing this out!