How my algorithm accidentally became O(n!)

Advent of Code is a yearly event where you solve a series of programming puzzles. The puzzles are released daily and you can solve them in any language you like. I like to do them (until I give up after a few days), and this year is no different.

This year’s day four puzzle was interesting, and - that’s why I’m writing this blogpost - it becomes a good example of how sometimes your algorithms perform significantly worse than you expect them to.

The puzzle

Let’s first look at the puzzle. Effectively, you’re given a list of Scratchards. Each scratchcard can win other scratchcards. The cards are ordered, and you always win the cards behind it. In other words: Suppose your looking at card n, which wins 5 other cards. You can then add cards n + 1, n + 2, n + 3, n + 4 and n + 5 to your collection.

You’re tasked with finding the total amount of cards you’ve won.

First attempt: Naive solution

In my code, I started out with this¹:

cards = [
    (0, 4), # Card 0 wins 4 other cards
    (1, 0), # Card 1 wins 0 other card
    (2, 3), # Card 2 wins 3 other cards
    # ...
]

I then wrote a function that goes through all cards:

cards_stack = [card for card in cards]

for index, num_won in cards_stack:
    cards_stack.extend(cards[index + 1:index + num_won + 1])

We can then simply count the amount of cards in the stack:

print(len(cards_stack))

I tried it out with the example on the page and sure enough; it ran in a few milliseconds. Then I tried running it with the actual input. It took about 50 seconds. 50 seconds might not sound that bad, but it’s stunning for such a puzzle.

So why does this happen?

Big O notation

The Big O notation is a way to describe the performance of an algorithm. It gives an upper bound on the runtime of an algorithm on the size of its input. We write it like this: $\mathcal{O}(f)$ . The function $f$ is the upper bound on the runtime of the algorithm. Common Big Os are:

$\mathcal{O}(1)$ : The algorithm runs in constant time. This means that the runtime of the algorithm is independent of the size of the input.
$\mathcal{O}(n)$ : The algorithm runs in linear time. This means that the runtime of the algorithm is proportional to the size of the input. Intuitvely, one would need to look at every element in the input.

A good example is searching for an element in an (unsorted) array. You need to look at every element in the array to find the element you’re looking for.
$\mathcal{O}(n^2)$ : The algorithm runs in quadratic time. This means that the runtime of the algorithm is proportional to the square of the size of its input.

A good example is the bubble sort algorithm. It compares every element in the array with every other element in the array.
$\mathcal{O}(n \log n)$ (pronounced “n log n”): The algorithm runs in linearithmic time. This means that the runtime of the algorithm is proportional to the size of the input times the logarithm of the size of the input.

A good example is the merge sort algorithm. It splits the array in half, sorts the two halves, and then merges them together. The runtime of the algorithm is $\mathcal{O}(n \log n)$ because the array is split in half $\log n$ times, and each split takes $\mathcal{O}(n)$ time.
$\mathcal{O}(\log n)$ . The algorithm runs in logarithmic time. This means that the runtime of the algorithm is proportional to the logarithm of the size of the input.

A good example is the binary search algorithm. It splits the array in half, and then searches the half that contains the element you’re looking for. It then splits that half in half, and so on. The runtime of the algorithm is $\mathcal{O}(\log n)$ because the array is split in half $\log n$ times.

Careful with Big O

Note that Big O gives an asymptotic approximation of the runtime. As the size of the input grows towards infinity, the runtime of the algorithm is bounded by the function inside the O.

This explains why $\mathcal{O}(n^2) \supseteq \mathcal{O}(2n) = \mathcal{O}(n)$ : If an algorithm grows at most linearlly, it will trivially also grow at most quadratically.

However, if an algorithm is in $\mathcal{O}(n^2)$ , this doesn’t mean it will always be slower than an algorithm in $\mathcal{O}(n)$ . Let’s call the algorithm in $\mathcal{O}(n^2)$ $Q$ (quadratic), and the algorithm in $\mathcal{O}(n)$ $L$ (linear). Let’s assume $Q$ takes 2 second to run on an input of size 10 and $L$ takes 10 seconds to run on an input of size 10.

If we now double the input size to 20, $Q$ will take 4 8 seconds to run², and $L$ will take 20 seconds to run. So $Q$ is faster than $L$ .

When talking about Big O notation, always remember that we’re talking about the asymptotic behaviour of an algorithm.

Let’s look at how different functions grow:

A graph of how different functions grow. Taken with permission from Wikimedia. By Cmglee - Own work, CC BY-SA 4.0,
https://commons.wikimedia.org/w/index.php?curid=50321072 — A graph of how different functions grow. Taken with permission from Wikimedia. By Cmglee - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=50321072

It’s easy to see that $\mathcal{O}(1) \subseteq \mathcal{O}(\log n) \subseteq \mathcal{O}(n) \subseteq \mathcal{O}(n \log n) \subseteq \mathcal{O}(n^2) \subseteq \mathcal{O}(2^n) \subseteq \mathcal{O}(n!)$ .

So why is the algorithm $\mathcal{O}(n!)$ ?

Consider this: Suppose you have 100 cards. The first card wins 99 cards, the second one wins 98 cards, and so on. The last card wins 0 cards (basically each card wins all others behind it). The algorithm will look at the first card, and add 99 cards to the stack. Each of the added cards will add 98 cards to the stack. Those in turn will each add 97 cards to the stack.

Having realized this, what’s a better way to solve the puzzle?

Memoization avoids unnecessary computations

When we take a step back, we can actually see that there is a different way to express our algorithm:

cards = [
    (0, 4), # Card 0 wins 4 other cards
    (1, 0), # Card 1 wins 0 other card
    (2, 3), # Card 2 wins 3 other cards
    # ...
]

def get_num_cards_won(card_index):
    ret = cards[card_index][1] # This card wins this many other cards
    for index, num_won in cards[card_index + 1:card_index + num_won + 1]:
        ret += get_num_cards_won(index) # The cards it won, win this many other cards themselves

    return ret

This function is recursive, in other words: It calls itself. Moreover, the function is pure: It doesn’t have any side effects. Side effects are actions that change the state of the program (setting a variable, printing to the console, etc.). The nice thing about pure functions is that for a given input $\alpha$ they will always return the same output $\beta_{\alpha}$ .

We can exploit this behavior to avoid unnecessary computations. Instead of actually calling the function, we can store the result of a function call, and only compute it, if it’s missing³:

cache = {}

def get_num_cards_won(card_index):
    if card_index in cache: # If we've already done the calculation, just use that
        return cache[card_index]

    ret = cards[card_index][1] # This card wins this many other cards
    for index, num_won in cards[card_index + 1:card_index + num_won + 1]:
        ret += get_num_cards_won(index) # The cards it won, win this many other cards themselves
    cache[card_index] = ret

    return ret

This is called memoization. We check if our cache contains the result of the function call for a given input, and only do the work if absolutely necessary.

Exercise for the reader

Fibonacci numbers are a famous example of a recursive function. The $n$ -th Fibonacci number is defined as $F_n = F_{n - 1} + F_{n - 2}$ , with $F_0 = 0$ and $F_1 = 1$ . You can try implementing a function that computes the $n$ -th Fibonacci number and which uses memoization.

This approach drastically decreases the amount of calculations we need to do: Because for each card we calculate the result only once, we only need to look at each card once. This means that our algorithm is now $\mathcal{O}(n)$ .

Benchmarks

My solution includes a tiny benchmark that compares the two solutions. It uses time.perfcounter to measure the time each solution takes to run. I ran it on my machine (Ryzen R9 7950X, 64GB RAM⁴) and got the following results:

Part 2 (naive):             (50.370501 seconds)
Part 2 (with memoization):  (0.0506474 seconds)

That’s an improvement by 99.9%!

Closing remarks

This only scratches the surface of algorithms and Big O notation: Typically when analyzing algorithms, we’re interested in best case, average and worst case runtime complexity⁵. We can also look at the ammortized runtime complexity of an algorithm. Moreover, we haven’t considered space complexity yet.

Lastly, Python has the functools.cache decorator to do memoization.

2025-01-21: A previous version of this article gave the wrong increase in runtime. The mistake has been fixed. Thanks to Enric Solé-Farré for catching the error!

Don’t judge this too hard, this is mainly for illustrative purposes. My actual code can be found here. ↩
There was an error in the previous version of the article. It has been fixed now. ↩
This is a very naive implementation of memoization. A better implementation would be to use a dictionary that maps the input to the output, and then use a decorator to memoize the function. I didn’t do this because I wanted to keep the code as simple as possible. Also our function isn’t technically pure anymore because it has side effects (it mutates the cache). ↩
Not that it matters that much. ↩
For example, for sorting, the best case is an already sorted array and the worst case is an array sorted in reverse order. ↩