I’m happy to say I’ve finished Advent of Code 2020.
It was quite a battle and along the way I’ve learned a few new tricks and painful lessons.

Lazy evaluation

While lazy evaluation nicely solves some problems it is also a great source of confusion when inevitably bugs arise during development. What follows is a simple sample that demonstrates how this can happen:

(defn on-evaluate [x] 
  (println x)
  (if (= x 5)
    (throw (Exception. "fail"))
    (+ x 5)
    )   
  )

(defn demo-lazy-bug 
  []  
  (let [call-lazy-bug (map on-evaluate (range 10))]
    (println "it would be nice to see a crash here")
    (try 
      (doall call-lazy-bug)
      (catch Exception e
        (println "but no")
        )   
      )   
    )   
  )

Calling demo-lazy-bug produces the following output.

it would be nice to see a crash here
0
1
2
3
4
5
but no

The reason is that map produces what is called a lazy sequence which gets evaluated when it is actually used. This means if I wouldn’t have added a call-lazy-bug it wouldn’t be called at all. It all makes sense but during development it does result in quite a bit of head scratching.
To force evaluation early use either doall or dorun

Sequences and vectors

There were quite a few problems in AOC where I struggled with speed of execution. Initially I stored everything in (lazy) sequences by default. Clojure standard library also has plenty of functions that return lazy sequences even when used on evaluated lists, maps etc. so it pays to be careful when composing functions.

While sequences are easy to use and feel idiomatic using them can easily result in severe performance bottlenecks when used indiscriminately. Here’s a short demo where I just continuously split and merge a list of numbers. Note that in sequence-shuffler I use doall to force evaluation. On any list of significant size omitting doall causes a stack overflow error for this particular benchmark.

(defn sequence-shuffler [n iterations]
  (loop [some-sequence (range n)
         steps 0]
      (if (>= steps iterations)
        some-sequence
        (recur (doall (concat (drop (/ n 4) some-sequence) 
                              (take (/ n 4) some-sequence))) (inc steps))
        )
      )
    )
  
(defn vector-shuffler [n iterations]
  (loop [some-vector (into [] (range n))
         steps 0]
      (if (>= steps iterations)
        some-vector
        (recur (into (subvec some-vector (/ n 4) n) 
                     (subvec some-vector 0 (/ n 4))) (inc steps))
        )
      )
    )
  
(defn sequence-vector-benchmark []
  (let [all-ns [1e1 1e2 1e3 1e4 1e5]
        steps 1e3]
    (loop [remaining-ns all-ns]
      (when (not (empty? remaining-ns))
        (println "n:" (first remaining-ns) "steps:" steps)
        (time (sequence-shuffler (first remaining-ns) steps))
        (time (vector-shuffler (first remaining-ns) steps))
        (recur (rest remaining-ns))
        )
      )
    )
  )

For this particular benchmark vector-shuffler works faster as soon as n exceeds 100.
There are two reasons why performance is much better:

  • subvec is fast (O(1))
  • into works faster than concat for vectors
    (from testing both ways of merging vectors on several aoc days)
n steps sequence-shuffler [ms] vector-shuffler [ms]
10 1000 8.363016 7.38162
100 1000 21.158169 5.998396
1000 1000 208.560073 43.798472
10000 1000 1889.878375 349.64432
100000 1000 20178.565774 4021.776388

While this is only one particular case the real lesson here is to always use an appropriate data structure for the problem at hand.

Last but not least

  • simplifying problem statement makes it easier to hit the right solution earlier - pen and paper helps here
  • using loop/recur rather than recursion makes it less likely to encounter stack overflows
  • extract functions early before code becomes a mess of round brackets