Book notes from Elements of Clojure by Zach Tellman.
I created a GitHub project for the corresponding “code notes”.
Review
It’s hard to imagine a book like this being written about another language.
Chapter 1: Names
“Names should be narrow and consistent.”
- Narrow means that the name cannot represent anything else
- Consistent means that the name is congruent with the surrounding code and should not be misunderstood by someone familiar with the codebase
- The textual representation of a name is its sign
- The thing a name refers to is its referent
- How a name is used is its sense
- Narrowness does not equal specificity
- Describe the purpose of the function, not its implementation
- Consider a name’s sense when thinking about referential transparency
- The only way to achieve true consistency is to have a one-to-one relationship between signs and senses
- Favour synthetic names over natural names to avoid ambiguity
- Synthetic names allow experts to communicate without ambiguity
- Novices are forced to learn the lexicon if they want to participate—a monad has no sense to a layperson
- Natural names allow everyone to reason by analogy - great for quickly grokking a codebase, bad for reducing ambiguity
- Choose accordingly!
Naming Data
- The relationship between our code and the outside world can be adversarial—we should make invariant checks at the periphery of our code
- `var`s provide indirection by hiding the underlying value; function parameters provide indirection by hiding the implementation of the invoking function
- We don’t need to name every intermediate result when transforming data
- Consistent code means fewer deep dives to understand a codebase’s core concepts
- Being able to skim and quickly understand Clojure code is a function of the language’s syntax and use of immutable data structures (as well as an individual’s experience)
“If a function’s implementation is more self-explanatory than any name you can think of, it should be an anonymous function.”
Idiomatic Clojure names
- Could be anything: `x`
- A sequence of anything: `xs`
- Arbitrary function: `f`
- Sequence of arbitrary functions: `fs`
- Arbitrary map: `m`
- Sequence of arbitrary maps: `ms`
- Self-reference: `this`
- Arguments of the same datatype: `[a b c & rst]`
- Arbitrary expression: `form`
Narrowing
- Maps of more narrowly named data, e.g.: `class->students`, `department->classes->student`
- Tuples of more narrowly named data, e.g.: `tutor+student`
- A sequence of `tutor`-`student` tuples could be `tutor+students`, but this could be conflated with a tuple of a `tutor` and a sequence of `student`s - a synthetic name here can remove ambiguity
- Clearly document synthetic names!
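A small sketch of these naming conventions, using made-up school data. The values and the destructuring helper are assumptions for illustration only:

```clojure
;; Hypothetical data illustrating the naming conventions above.

;; A map from class to the students enrolled in it.
(def class->students
  {"algebra" ["ann" "bob"]
   "biology" ["cat"]})

;; A tuple pairing one tutor with one student.
(def tutor+student ["dr-lee" "ann"])

;; When the name describes the shape, destructuring reads naturally.
(defn describe-pairing [[tutor student]]
  (str tutor " tutors " student))

(describe-pairing tutor+student)
```

The `->` and `+` in the names tell the reader the shape of the data before they look at any value.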
Naming Functions
- Our data scope at runtime is any data accessible by our thread
- Functions can do three things: pull new data into scope, transform data, push data into a different scope
- One function in every process needs to do all three, but most functions should do only one
“Shared mutable state creates asymmetric scopes.”
- Functions that cross scope boundaries should have a verb in the name
- Functions that pull data from another scope should have the returned type in their name
- Functions that push data into another scope should communicate their side effect
If a function only transforms data, we should avoid verbs wherever possible.
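The three kinds of function might look roughly like this. The in-memory store and all function names are hypothetical, and the trailing `!` is a common Clojure convention for side effects rather than something these notes prescribe:

```clojure
;; Hypothetical "other scope", standing in for a database or queue.
(def user-store (atom {1 {:name "Ada" :visits 3}}))

;; Pulls data into scope: a verb plus the kind of data returned.
(defn fetch-user [id]
  (get @user-store id))

;; Only transforms data: a noun-like name, no verb.
(defn user-summary [user]
  (str (:name user) " (" (:visits user) " visits)"))

;; Pushes data into another scope: the name signals the side effect.
(defn save-user! [id user]
  (swap! user-store assoc id user)
  nil)

(user-summary (fetch-user 1))
```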
Naming Macros
“There are two kinds of macros: those that we understand syntactically, and those that we understand semantically.”
- If we are required to understand a macro syntactically, this is a poor form of indirection
- Macros that include `with`, `def` or `let` in their name should have predictable macroexpanded forms
- It is difficult for a macro to be self-evident - the macro-expanded form and semantics matter more than the name
Chapter 2: Idioms
Inequalities
- Favour `<` and `<=` for
- Infix, prefix
- Left or right associative?
Defaults for accumulating functions
- Offer every arity if a function accumulates
- Many Clojure functions come with sensible defaults for their niladic arity - e.g., `concat`, `conj`
- The niladic/zero-arity variant of a function returns the identity value
- Combining the identity value with any other argument leaves the argument unchanged - e.g., `0` for `+`, `1` for `*`
- Generally only the niladic and dyadic versions of a function will be interesting
- Monoids are sets that have an associative dyadic function as well as an identity value (the result of the niladic function)
- Monoids are useful for passing to `reduce`
- Do not implement an identity value for a function if an obvious one does not exist—an exception is better than unexpected behaviour
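As a rough illustration of the points above, here are the identity values of a few core functions, followed by a hypothetical accumulating function (`merge-counts` is not from the book) that offers every arity:

```clojure
;; Niladic calls return the identity value of each function.
(+)        ; => 0
(*)        ; => 1
(conj)     ; => []

;; A hypothetical accumulating function offering every arity.
(defn merge-counts
  ([] {})                      ; identity: the empty map
  ([a] a)
  ([a b] (merge-with + a b))
  ([a b & more] (reduce merge-counts (merge-counts a b) more)))

;; With no initial value, reduce calls (merge-counts) for an empty collection.
(reduce merge-counts [])                     ; => {}
(reduce merge-counts [{:a 1} {:a 2 :b 1}])   ; => {:a 3, :b 1}
```

Because the zero-arity case exists, callers of `reduce` do not need to special-case empty input.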
Option maps over named parameters
- Using positional parameters and giving them default values means implicitly defining a hierarchy in terms of which parameters are most likely to change
- Using an option map means that invocations of a function do not need to change even though the potential arguments have
- Non-positional keyword arguments carry the performance overhead of having to construct a hash map each time the function is called (but option maps still carry a performance overhead relative to positional parameters)
In performance-sensitive contexts, we should only use positional parameters.
- On the spectrum of positional arguments to non-positional keyword arguments to option maps, choose an option map unless you have a good reason not to (e.g., performance)
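The three styles can be compared side by side. This is only a sketch: the `fetch-*` functions, their parameters and their defaults are invented for illustration and simply return the resolved options:

```clojure
;; Positional parameters with defaults: the arity order implies which
;; parameter is expected to change most often.
(defn fetch-positional
  ([url] (fetch-positional url 1000))
  ([url timeout-ms] (fetch-positional url timeout-ms 3))
  ([url timeout-ms retries]
   {:url url :timeout-ms timeout-ms :retries retries}))

;; Non-positional keyword arguments.
(defn fetch-kwargs [url & {:keys [timeout-ms retries]
                           :or   {timeout-ms 1000 retries 3}}]
  {:url url :timeout-ms timeout-ms :retries retries})

;; Option map: new options can be added without touching existing callers.
(defn fetch-opts
  ([url] (fetch-opts url {}))
  ([url {:keys [timeout-ms retries]
         :or   {timeout-ms 1000 retries 3}}]
   {:url url :timeout-ms timeout-ms :retries retries}))

(fetch-opts "http://example.com" {:retries 5})
```

An option map is also an ordinary value, so it can be built up, merged and passed along by intermediate functions.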
Bindings
- No one should have to know you’ve used a binding
- Bindings and dynamic vars break referential transparency (as do side effects)
- Laziness, generally speaking, relies on referential transparency
- Higher-order functions assume referential transparency - where and when a function that is passed in as an argument is called is an implementation detail
- Large enough chunked sequences could have all-but-the-first chunks realised outside of a binding - this could lead to some hard-to-diagnose bugs
- Consider `with-redefs` if you need to use a dynamic var
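A minimal sketch of how laziness and bindings interact badly. The dynamic var `*factor*` and the `scale` function are hypothetical:

```clojure
(def ^:dynamic *factor* 1)

(defn scale [x] (* x *factor*))

;; The lazy sequence leaves the binding before it is realised,
;; so scale sees the root value of *factor* when it finally runs.
(def lazy-result
  (binding [*factor* 10]
    (map scale [1 2 3])))

;; Forcing realisation inside the binding gives the intended result.
(def eager-result
  (binding [*factor* 10]
    (mapv scale [1 2 3])))

lazy-result    ; => (1 2 3)
eager-result   ; => [10 20 30]
```

Nothing at the call site of `map` hints that the result depends on when it is consumed, which is why such bugs are hard to diagnose.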
Favour atoms for mutable state
- A state container’s utilisation is a measure of how often it is in the process of being updated
- Rule of thumb: you will see update retries increase dramatically when utilisation approaches 60%
- Clojure’s software transactional memory (STM) implementations (i.e., `ref`s) offer better throughput in pathological cases
  - But you are highly unlikely to need this throughput lift, and should favour the simplicity and reliability of `atom`s
- Lazy realisation of updates via `alter`, `commute` or `ref-set` can lead to hard-to-find bugs
- Having state defined by multiple refs makes it difficult to get a consistent snapshot of our state at any time - because updates to one ref can happen while we are reading another, we need to wrap our reading of the entire state in its own transaction
- STM is useful in write-heavy workloads that can’t be offloaded to a database
- You need to have a good reason to not represent mutable state as a single atom
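Keeping all related state in a single atom might look like the following sketch. The account data and the `after-transfer` function are assumptions for illustration:

```clojure
;; All related state lives in one map inside one atom.
(def app-state
  (atom {:accounts  {:alice 100 :bob 50}
         :transfers 0}))

;; A pure function from old state to new state.
(defn after-transfer [state from to amount]
  (-> state
      (update-in [:accounts from] - amount)
      (update-in [:accounts to] + amount)
      (update :transfers inc)))

;; One atomic update; readers always see a consistent snapshot.
(swap! app-state after-transfer :alice :bob 30)

@app-state
;; => {:accounts {:alice 70, :bob 80}, :transfers 1}
```

A single `deref` yields the whole state at one point in time, with no transaction needed to read it.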
Communicate side effects consistently
- Valid idioms for implying a side effect is occurring:
  - Redundant `do` block
  - Leaving whitespace around the side-effecting function
  - Binding the return value of the side-effecting function to `_` within a `let` block
- All are fine, just use one consistently within a codebase
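The three idioms, sketched with a hypothetical side-effecting helper `log!`:

```clojure
;; Hypothetical side-effecting helper.
(defn log! [msg] (println msg))

;; 1. Redundant do block around the side effect.
(defn process-a [x]
  (do
    (log! "processing"))
  (inc x))

;; 2. Whitespace around the side-effecting call.
(defn process-b [x]

  (log! "processing")

  (inc x))

;; 3. Binding the return value to _ within a let block.
(defn process-c [x]
  (let [_ (log! "processing")]
    (inc x)))
```

All three behave identically at runtime; the difference is purely in what they signal to the reader.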
Use data structure-specific functions where possible
- Just because two ways of doing something are functionally equivalent doesn’t mean they are equal to the reader
- If it matters that the data structure is a vector or map or set, use the narrowest possible accessor for said data structure
  - Do this hand-in-hand with intentional naming (e.g., for a map: `key->value`)
  - E.g., for a map `k->v`, `(keys k->v)` is much better than `(map key k->v)` - the former can only be a map
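For example, with a small literal map (the data is made up), both expressions produce the same keys, but only one of them tells the reader what kind of collection is involved:

```clojure
(def k->v {:a 1 :b 2})

;; Map-specific: signals that k->v is a map.
(keys k->v)      ; => (:a :b)

;; Generic: works on any sequence of map entries, so it says less.
(map key k->v)   ; => (:a :b)
```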
Don’t bother with letfn
- If you need a `let`-bound function, just split `let` and `fn` for readability and consistency with other special forms that use bindings
- Being able to define functions that refer to one another within the bindings of `letfn` is one (and maybe the only) advantage of using `letfn`
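A sketch of both cases. The function names are hypothetical, and the mutually recursive example is only meant for small non-negative integers:

```clojure
;; let + fn reads like any other binding.
(defn total-area [rects]
  (let [area (fn [[w h]] (* w h))]
    (reduce + (map area rects))))

;; letfn is only needed when local functions refer to one another.
(defn even-number? [n]
  (letfn [(ev? [n] (if (zero? n) true  (od? (dec n))))
          (od? [n] (if (zero? n) false (ev? (dec n))))]
    (ev? n)))

(total-area [[1 2] [2 4]])   ; => 10
(even-number? 10)            ; => true
```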
Don’t obfuscate Java interop
- Clojure data structures are understood by their internals
- Java data structures are understood by their names
- Stay away from anything (e.g., the `..` macro) that can make it non-obvious that a Java method is being used on a Java object rather than a Clojure function
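As an illustration, assuming a plain `java.lang.String` as the Java object, the two forms below are equivalent, but the second makes each method call visible through its leading dot:

```clojure
(def greeting "  Hello  ")

;; With the .. macro, trim and toUpperCase look like ordinary symbols.
(.. greeting trim toUpperCase)          ; => "HELLO"

;; Explicit interop: the leading dot marks each Java method call.
(.toUpperCase (.trim greeting))         ; => "HELLO"
```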
You don’t always need a transducer; use `for` for cartesian products
- Think carefully about sacrificing readability for performance when introducing transducers
- `for`’s best quality is to provide a readable way (without any `:when` or `:let` clauses) to generate all the possible combinations of lists - i.e., the cartesian product
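A minimal example of `for` producing a cartesian product, using made-up suits and ranks:

```clojure
;; Every combination of suit and rank; the rightmost binding varies fastest.
(def cards
  (for [suit [:hearts :spades]
        rank [1 2 3]]
    [suit rank]))

cards
;; => ([:hearts 1] [:hearts 2] [:hearts 3] [:spades 1] [:spades 2] [:spades 3])
```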
Nil
- Ambiguity makes our code more concise, but unbounded ambiguity makes it impossible to reason about
- We must interpret `nil` at regular intervals throughout our code - a synthetically named keyword might be better at communicating absence than the generic `nil`
- Don’t wrap in `when`!
  - If the result can be coerced to an empty collection (or something else sensible), do that
  - Or else throw an error
  - Don’t pass back a generic `nil` to the caller
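One way the two alternatives to returning `nil` could look. The lookup functions and data below are hypothetical:

```clojure
(def class->students {"algebra" ["ann" "bob"]})

;; Coerce absence to an empty collection so callers never see nil.
(defn students-in [class]
  (get class->students class []))

;; Where no sensible default exists, throw with context instead.
(defn tutor-for [student->tutor student]
  (or (get student->tutor student)
      (throw (ex-info "No tutor assigned" {:student student}))))

(students-in "art")                    ; => []
(tutor-for {"ann" "dr-lee"} "ann")     ; => "dr-lee"
```

Either way, the decision about what absence means is made once, close to where the absence arises, rather than by every caller.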
Chapter 3: Indirection
- Indirection provides separation between what and how
- It allows readers of our code to stop reading at some point and be OK with not dissecting the underlying implementation of a function or macro
- Two fundamental components of indirection:
  - References (each paired with its referent)
    - A name is a lexical reference that is dereferenced at compile time
    - A pointer is a memory reference that is dereferenced at runtime
  - Conditionals
- Generally speaking, dereferencing happens implicitly - e.g., in passing the symbol (reference) for a sequence to `map`, the referent is implicitly accessed so that a function can be applied to each element
  - This is not the case for Clojure’s stateful constructs (e.g., `atom`s)
- Conditionals are used for deciding functionality based on some input (e.g., input type, input value, input value relative to something else (index))
- This can mean grouping types together to make use of a generic function
- Or splitting behaviour based on type (or otherwise)
- References are open - we can change the behaviour of a program by changing a referent
- Conditionals are closed - we need to change the underlying implementation of the code to change the program’s behaviour
- Just varying the arguments to a conditional isn’t enough - conditionals are ordered
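The open/closed distinction can be sketched as follows. This is my own illustration rather than an example from the book, and the shape data and function names are assumptions:

```clojure
;; Closed: supporting a new shape means editing this function,
;; and the clauses are tried in order.
(defn area-closed [{:keys [shape] :as s}]
  (cond
    (= shape :square) (* (:side s) (:side s))
    (= shape :rect)   (* (:w s) (:h s))
    :else (throw (ex-info "Unknown shape" {:shape shape}))))

;; Open: behaviour is found through a reference, so changing the
;; referent changes the program without touching area-open.
(def shape->area-fn
  (atom {:square (fn [s] (* (:side s) (:side s)))}))

(defn area-open [{:keys [shape] :as s}]
  (if-let [f (get @shape->area-fn shape)]
    (f s)
    (throw (ex-info "Unknown shape" {:shape shape}))))

;; Extending the open version from the outside.
(swap! shape->area-fn assoc :rect (fn [s] (* (:w s) (:h s))))

(area-open {:shape :rect :w 2 :h 3})   ; => 6
```

Note the explicit `@` on the atom: unlike a plain symbol, a stateful reference has to be dereferenced by hand.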