Book notes from Elements of Clojure by Zach Tellman.

I created a GitHub project for the corresponding “code notes”.

Review

It’s hard to imagine a book like this being written about another language.

Chapter 1: Names

“Names should be narrow and consistent.”

  • Narrow means that the name cannot represent anything else
  • Consistent means that the name is congruent with the surrounding code and should not be misunderstood by someone familiar with the codebase
  • The textual representation of a name is its sign
  • The thing a name refers to is its referent
  • How a name is used is its sense
  • Narrowness does not equal specificity
  • Describe the purpose of the function, not its implementation
  • Consider a name’s sense when thinking about referential transparency
  • The only way to achieve true consistency is to have a one-to-one relationship between signs and senses
  • Favour synthetic names over natural names to avoid ambiguity
  • Synthetic names allow experts to communicate without ambiguity
  • Novices are forced to learn the lexicon if they want to participate—a monad has no sense to a layperson
  • Natural names allow everyone to reason by analogy: great for quickly grokking a codebase, bad for reducing ambiguity
  • Choose accordingly!

Naming Data

  • The relationship between our code and the outside world can be adversarial—we should make invariant checks at the periphery of our code
  • vars provide indirection by hiding the underlying value; function parameters provide indirection by hiding the implementation of the invoking function
  • We don’t need to name every intermediate result when transforming data
  • Consistent code means fewer deep dives to understand a codebase’s core concepts
  • Being able to skim and quickly understand Clojure code is a function of the language’s syntax and use of immutable data structures (as well as an individual’s experience)

“If a function’s implementation is more self-explanatory than any name you can think of, it should be an anonymous function.”

Idiomatic Clojure names

  • Could be anything: x
  • A sequence of anything: xs
  • Arbitrary function: f
  • Sequence of arbitrary functions: fs
  • Arbitrary map: m
  • Sequence of arbitrary maps: ms
  • Self-reference: this
  • Arguments of the same datatype: [a b c & rst]
  • Arbitrary expression: form

Narrowing

  • Maps of more narrowly named data, e.g.: class->students, department->classes->student
  • Tuples of more narrowly named data, e.g.: tutor+student
  • A sequence of tutor-student tuples could be tutor+students, but this could also be read as a tuple of one tutor and a sequence of students; a synthetic name here can remove ambiguity
  • Clearly document synthetic names!
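
A minimal sketch of the -> and + conventions, using invented data. The tutor and student values are assumptions made up for the example.

```clojure
;; All data here is invented for illustration.

;; A map from a class to the students enrolled in it.
(def class->students
  {"algebra" ["ana" "ben"]
   "biology" ["cy"]})

;; A two-element tuple of a tutor and a student.
(def tutor+student ["dr-ito" "ana"])

(get class->students "algebra")           ; => ["ana" "ben"]

;; Destructuring mirrors the name: tutor first, then student.
(let [[tutor student] tutor+student]
  (str tutor " tutors " student))         ; => "dr-ito tutors ana"
```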

Naming Functions

  • Our data scope at runtime is any data accessible by our thread
  • Functions can do three things: pull new data into scope, transform data, push data into a different scope
  • One function in every process needs to do all three, but most functions should do only one

“Shared mutable state creates asymmetric scopes.”

  • Functions that cross scope boundaries should have a verb in the name
  • Functions that pull data from another scope should have the returned type in their name
  • Functions that push data into another scope should communicate their side effect

If a function only transforms data, we should avoid verbs wherever possible.
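
One way to picture the pull, transform and push split is sketched below. The two atoms stand in for scopes outside our own, and every name (inbox, outbox, read-messages, shouted, publish-messages!, relay-messages!) is hypothetical.

```clojure
(require '[clojure.string :as str])

;; Stand-ins for other scopes; in real code these might be a queue or a socket.
(def inbox  (atom [{:id 1 :body "hi"}]))
(def outbox (atom []))

;; Pulls data into scope: a verb plus the returned type.
(defn read-messages []
  @inbox)

;; Only transforms data: no verb in the name.
(defn shouted [message]
  (update message :body str/upper-case))

;; Pushes data into another scope: the name and the ! signal the side effect.
(defn publish-messages! [messages]
  (swap! outbox into messages)
  nil)

;; The one function that does all three.
(defn relay-messages! []
  (publish-messages! (map shouted (read-messages))))

(relay-messages!)
@outbox   ; => [{:id 1, :body "HI"}]
```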

Naming Macros

“There are two kinds of macros: those that we understand syntactically, and those that we understand semantically.”

  • If we are required to understand a macro syntactically, this is a poor form of indirection
  • Macros that include with, def or let in their name should have predictable macroexpanded forms
  • It is difficult for a macro to be self-evident—the macro-expanded form and semantics matter more than the name

Chapter 2: Idioms

Inequalities

  • Favour < and <= for inequalities
  • Infix, prefix
  • Left or right associative?

Defaults for accumulating functions

  • Offer every arity if a function accumulates
  • Many Clojure functions come with a sensible default for their niladic arity, e.g., concat, conj
  • The niladic/zero-arity variant of a function returns the identity value
  • Combining the identity value with any other argument leaves the argument unchanged—e.g., 0 for +, 1 for *
  • Generally only the niladic and dyadic versions of a function will be interesting
  • Monoids are sets that have an associative dyadic function as well as an identity value (the result of the niladic function)
  • Monoids are useful for passing to reduce
  • Do not implement an identity value for a function if an obvious one does not exist—an exception is better than unexpected behaviour
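
The points above can be sketched with an accumulating function that offers every arity. combined-counts is a hypothetical name; the assumption is that an empty map is the obvious identity for adding up maps of counts.

```clojure
;; Hypothetical accumulating function, written for this sketch only.
(defn combined-counts
  "Combines maps of counts by adding the values of matching keys."
  ([] {})                                  ; niladic: the identity value
  ([a] a)                                  ; monadic: nothing to combine
  ([a b] (merge-with + a b))               ; dyadic: the interesting case
  ([a b & rst]                             ; variadic: fold the rest in
   (reduce combined-counts (combined-counts a b) rst)))

(combined-counts)                          ; => {}
(combined-counts {:x 1} {:x 2 :y 1})       ; => {:x 3, :y 1}

;; With no initial value, reduce calls the niladic arity for an empty input.
(reduce combined-counts [])                ; => {}
(reduce combined-counts [{:x 1} {:x 1} {:y 5}])   ; => {:x 2, :y 5}
```

Combining {} with any other map of counts leaves that map unchanged, which is what makes it a reasonable identity here.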

Option maps over named parameters

  • Using positional parameters and giving them default values means implicitly defining a hierarchy in terms of which parameters are most likely to change
  • Using an option map means that invocations of a function do not need to change even though the potential arguments have
  • Non-positional keyword arguments carry the performance overhead of having to construct a hash map each time the function is called (but option maps still carry a performance overhead relative to positional parameters)

In performance-sensitive contexts, we should only use positional parameters.

  • On the spectrum of positional arguments to non-positional keyword arguments to option maps, choose an option map unless you have a good reason not to (e.g., performance)
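
A minimal sketch of an option map with defaults follows. The function connection-string and its :port and :scheme options are invented for the example.

```clojure
;; Hypothetical function; the option keys and defaults are assumptions.
(defn connection-string
  "Builds a URL-like string for host. The optional second argument is a map
  that may contain :port and :scheme."
  ([host]
   (connection-string host {}))
  ([host {:keys [port scheme]
          :or   {port 80 scheme "http"}}]
   (str scheme "://" host ":" port)))

(connection-string "example.com")                  ; => "http://example.com:80"
(connection-string "example.com" {:port 8080})     ; => "http://example.com:8080"

;; An option map is plain data, so callers can build it up and pass it along.
(let [options {:scheme "https"}]
  (connection-string "example.com" options))       ; => "https://example.com:80"
```

Adding a new option later only touches the destructuring form; existing invocations keep working as they are.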

Bindings

  • No one should have to know you’ve used a binding
  • Bindings and dynamic vars break referential transparency (as do side effects)
  • Laziness, generally speaking, relies on referential transparency
  • Higher-order functions assume referential transparency - where and when a function that is passed in as an argument is called is an implementation detail
  • Large enough chunked sequences could have all-but-the-first chunks realised outside of a binding - this could lead to some hard-to-diagnose bugs
  • Consider with-redefs if you need to use a dynamic var
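
The interaction between laziness and binding can be reproduced in a few lines. The dynamic var *factor* and the function scaled are hypothetical.

```clojure
;; Hypothetical dynamic var with a root value of 1.
(def ^:dynamic *factor* 1)

(defn scaled
  "Lazily multiplies each element of xs by the current value of *factor*."
  [xs]
  (map #(* *factor* %) xs))

;; The lazy sequence is created inside the binding but realised after it has
;; ended, so every element sees the root value.
(def escaped
  (binding [*factor* 10]
    (scaled [1 2 3])))

escaped    ; => (1 2 3)

;; Forcing realisation inside the binding gives the intended result.
(def realised
  (binding [*factor* 10]
    (doall (scaled [1 2 3]))))

realised   ; => (10 20 30)
```

The caller of scaled has no way to see this difference from the call site, which is the sense in which a binding breaks referential transparency.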

Favour atoms for mutable state

  • A state container’s utilisation is a measure of how often it is in the process of being updated
    • Rule of thumb: you will see update retries increase dramatically when utilisation approaches 60%
  • Clojure’s software transactional memory (STM) implementation (i.e., refs) offers better throughput in pathological cases
    • But you are highly unlikely to need this throughput lift, and should favour the simplicity and reliability of atoms
  • Lazy realisation of updates via alter, commute or ref-set can lead to hard-to-find bugs
  • Having state defined by multiple refs makes it difficult to get a consistent snapshot of our state at any time - because updates to one ref can happen while we are reading another, we need to wrap our reading of the entire state in its own transaction
  • STM is useful in write-heavy workloads that can’t be offloaded to a database
  • You need to have a good reason to not represent mutable state as a single atom
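
As a sketch of the single-atom approach, the example below keeps two related pieces of state in one map so that they always change together. The shape of the state and the name register-user! are assumptions.

```clojure
;; Hypothetical application state held in a single atom.
(def state
  (atom {:users {} :registrations 0}))

(defn register-user!
  "Adds the user and bumps the counter in one atomic update."
  [id user-name]
  (swap! state
         (fn [s]
           (-> s
               (assoc-in [:users id] user-name)
               (update :registrations inc)))))

(register-user! 1 "ana")

;; A single deref is always a consistent snapshot of the whole state.
@state   ; => {:users {1 "ana"}, :registrations 1}
```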

Communicate side effects consistently

  • Valid idioms for implying a side effect is occurring:
    • Redundant do block
    • Leaving whitespace around the side-effecting function
    • Binding the return value of the side-effecting function to _ within a let block
  • All are fine, just use one consistently within a codebase
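
The three idioms might look like this side by side. log! is a hypothetical side-effecting function and the doubled-* names exist only to label each variant.

```clojure
;; Hypothetical side-effecting function.
(defn log! [message]
  (println message))

;; 1. A redundant do block (a function body already has an implicit do).
(defn doubled-with-do [x]
  (do
    (log! "doubling")
    (* 2 x)))

;; 2. Whitespace around the side-effecting call.
(defn doubled-with-whitespace [x]

  (log! "doubling")

  (* 2 x))

;; 3. Binding the ignored return value to _ within a let block.
(defn doubled-with-underscore [x]
  (let [_ (log! "doubling")
        y (* 2 x)]
    y))
```

All three return the same value and print the same line; they differ only in how loudly they flag the call to log!.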

Use data structure-specific functions where possible

  • Just because two ways of doing something are functionally equivalent doesn’t mean they are equal to the reader
  • If it matters that the data structure is a vector or map or set, use the narrowest possible accessor for said data structure
    • Do this hand-in-hand with intentional naming (e.g., for a map: key->value)
    • E.g., for a map k->v, (keys k->v) is much better than (map key k->v) - the former tells the reader that k->v is a map
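
A small sketch of the difference, using an invented map:

```clojure
;; Invented data for illustration.
(def k->v {:a 1 :b 2})

;; Both expressions return the same keys...
(keys k->v)       ; => (:a :b)
(map key k->v)    ; => (:a :b)

;; ...but (map key ...) is the general tool, suited to a derived sequence of
;; map entries, so on its own it says less about what it is applied to.
(map key (filter (comp odd? val) k->v))   ; => (:a)
```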

Don’t bother with letfn

  • If you need a let-bound function, just split let and fn for readability and consistency with other special forms that use bindings
  • Letting functions refer to one another (mutual recursion) within the same set of bindings is one (and maybe the only) advantage of using letfn
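
Both forms are sketched below. The function names are hypothetical, and the even/odd pair is only there to show the mutual recursion that let cannot express.

```clojure
;; let + fn: reads like any other binding form.
(defn sum-of-squares [xs]
  (let [square (fn [x] (* x x))]
    (reduce + (map square xs))))

(sum-of-squares [1 2 3])   ; => 14

;; letfn: each local function can refer to the other.
(defn even-steps? [n]
  (letfn [(ev? [n] (if (zero? n) true  (od? (dec n))))
          (od? [n] (if (zero? n) false (ev? (dec n))))]
    (ev? n)))

(even-steps? 4)   ; => true
(even-steps? 3)   ; => false
```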

Don’t obfuscate Java interop

  • Clojure data structures are understood by their internals
  • Java data structures are understood by their names
  • Stay away from anything (e.g., the .. macro) that can make it non-obvious that a Java method is being used on a Java object rather than a Clojure function
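
To make the contrast concrete, here is the same chain of Java calls written both ways. The list contents are invented.

```clojure
;; A plain java.util.ArrayList with two invented entries.
(def names
  (doto (java.util.ArrayList.)
    (.add "ana")
    (.add "ben")))

;; Explicit interop: each leading dot marks a Java method call.
(.toUpperCase (.get names 0))      ; => "ANA"

;; The .. macro: get and toUpperCase now look like ordinary symbols.
(.. names (get 0) toUpperCase)     ; => "ANA"
```

In the second form a reader may briefly wonder whether get is clojure.core/get, which is the kind of doubt the advice above is trying to avoid.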

You don’t always need a transducer; use for for cartesian products

  • Think carefully about sacrificing readability for performance when introducing transducers
  • for’s best quality is to provide a readable way (without any :when or :let clauses) to generate all the possible combinations of lists - i.e., the cartesian product
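
A minimal example of for as a cartesian product, with invented suits and ranks:

```clojure
;; Invented data for illustration.
(def suits [:hearts :spades])
(def ranks [:ace :king])

;; The rightmost binding varies fastest, like the innermost of nested loops.
(for [suit suits
      rank ranks]
  [suit rank])
;; => ([:hearts :ace] [:hearts :king] [:spades :ace] [:spades :king])
```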

Nil

  • Ambiguity makes our code more concise, but unbounded ambiguity makes it impossible to reason about
  • We must interpret nil at regular intervals throughout our code - a synthetically named keyword might be better at communicating absence than the generic nil
  • Don’t wrap in when!
    • If the result can be coerced to an empty collection (or something else sensible), do that
    • Or else throw an error
    • Don’t pass back a generic nil to the caller
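
The two alternatives to returning nil could be sketched like this. The function names, the maps they read from, and the error message are all hypothetical.

```clojure
;; Absence coerced to an empty collection: callers can always treat the
;; result as a vector of students.
(defn students-in [class->students class-name]
  (get class->students class-name []))

;; Absence with no sensible default: fail where the problem is found rather
;; than handing a generic nil back to the caller.
(defn tutor-for [student->tutor student]
  (or (get student->tutor student)
      (throw (ex-info "No tutor assigned" {:student student}))))

(students-in {"algebra" ["ana"]} "biology")   ; => []
(tutor-for {"ana" "dr-ito"} "ana")            ; => "dr-ito"
```

Calling tutor-for with an unknown student throws an ExceptionInfo whose data map names the student, so the failure is interpreted once, at the point where the meaning of the absence is still known.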

Chapter 3: Indirection

  • Indirection provides separation between what and how
  • It allows readers of our code to stop reading at some point and be OK with not dissecting the underlying implementation of a function or macro
  • Two fundamental components of indirection:
    • References (each paired with its referent)
      • A name is a lexical reference that is dereferenced at compile time
      • A pointer is a memory reference that is dereferenced at runtime
    • Conditionals
  • Generally speaking, dereferencing happens implicitly - e.g., in passing the symbol (reference) for a sequence to map, the referent is implicitly accessed so that a function can be applied to each element
    • This is not the case for Clojure’s stateful constructs (e.g., atoms)
  • Conditionals are used for deciding functionality based on some input (e.g., input type, input value, input value relative to something else (index))
    • This can mean grouping types together to make use of a generic function
    • Or splitting behaviour based on type (or otherwise)
  • References are open - we can change the behaviour of a program by changing a referent
  • Conditionals are closed - we need to change the underlying implementation of the code to change the program’s behaviour
    • Just varying the arguments to a conditional isn’t enough - conditionals are ordered
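
A rough sketch of the open versus closed distinction follows. The shapes, the names area-closed, area-open and kind->area, and the choice of an atom as the reference are assumptions made for illustration.

```clojure
;; Closed: supporting a new kind of shape means editing this function.
(defn area-closed [shape]
  (case (:kind shape)
    :square (* (:side shape) (:side shape))
    (throw (ex-info "Unknown shape" {:shape shape}))))

;; Open: behaviour is reached through a reference, here an atom that must be
;; dereferenced explicitly with @.
(def kind->area
  (atom {:square (fn [{:keys [side]}] (* side side))}))

(defn area-open [shape]
  (let [area (get @kind->area (:kind shape))]
    (area shape)))

;; Changing the referent changes the behaviour; area-open itself is untouched.
(swap! kind->area assoc :rect (fn [{:keys [w h]}] (* w h)))

(area-closed {:kind :square :side 3})   ; => 9
(area-open {:kind :rect :w 2 :h 3})     ; => 6
```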