25 Jun 2013

Clojure style: accessing maps and records

There are three ways of accessing the member :key of a map my-map:
(:key my-map)
(my-map :key)
(get my-map :key)
What are the drawbacks and advantages of each, and which one should you use? What should a style guide recommend?

Since Clojure is still a young language, there are not too many style guides available. Two popular sources for style guidance are Bozhidar Batsov's guide and clojure.org's Library Coding Standards. Both of them recommend the first form, (:key my-map). The ZenRobotics style guide (as yet unpublished) takes an opposing view: you should use the second form, (my-map :key).

What's wrong with (:key my-map)? Of course it works just fine:
user=> (def my-map {:foo 1 :bar 2})
user=> (:foo my-map)
The problem is that it works just as fine for too many cases:
user=> (:foo (java.util.LinkedList. [1 2 3]))
user=> (:foo [1 2 3])
user=> (:foo nil)
whereas (my-map :foo) gives you an error for these inputs:
user=> ((java.util.LinkedList. [1 2 3]) :foo)
ClassCastException java.util.LinkedList cannot be cast to clojure.lang.IFn  user/eval27 (NO_SOURCE_FILE:12)
user=> ([1 2 3] :foo)
IllegalArgumentException Key must be integer  clojure.lang.APersistentVector.invoke (APersistentVector.java:261)
user=> (nil :foo)
CompilerException java.lang.IllegalArgumentException: Can't call nil, compiling:(NO_SOURCE_PATH:14)
An error in each of these cases is good, because usually you shouldn't be looking for the key :foo in a linked list or a vector, and the sooner you discover it, the less time you'll spend debugging why some other code is getting a nil input. The (get my-map :foo) form has the same problem: it allows non-maps as input and just returns nil.

Of course, a style guide must balance various concerns, one of which is conciseness. Our code does have multiple instances of calls like
(filter :foo list-of-maps)
where #(% :foo) would be considered just too verbose. Similarly, sometimes you do want to make your code work with various different types and just get nil when the type is not applicable. A good thing about keeping (my-map :foo) as the general rule is that uses like this stand out, so if you're looking for why you're only getting nil values somewhere, you immediately think of list-of-maps being not actually maps at all - perhaps a case of making wrong code look wrong.

Often even better than calling maps as functions is to define a version of get that ensures errors for another kind of mistakes:
(defn get*
  "Like clojure.core/get but throws an error if
  result is not found."
  [m k]
  (let [result (get m k ::not-found)]
    (if (= result ::not-found)
      (throw (Error. (format
         "Key %s not found among %s" k (keys m))))
Now (get* {:foo 1 :bar 2} :baz) is an error, so using get* guards against typos in map keys.

Once code matures, it is common to replace hash-maps with records. With records the same rule won't work:
user=> (defrecord MyRecord [x y z size value material])
user=> (def r (->MyRecord 4.2 5.2 -0.9 7.0 9 :plastic))
user=> r
#user.MyRecord{:x 4.2, :y 5.2, :z -0.9, :size 7.0, :value 9, :material :plastic}
user=> (:x r)
user=> (r :x)
ClassCastException user.MyRecord cannot be cast to clojure.lang.IFn  user/eval108 (NO_SOURCE_FILE:23)
user=> (get r :x)
The get* function defined above works fine even for records, but an intriguing alternative is to take advantage of the Java interface associated with each record type and use Java interop to access fields:
user=> (.x r)
user=> (.x [1 2 3])
IllegalArgumentException No matching field found: x for class clojure.lang.PersistentVector  clojure.lang.Reflector.getInstanceField (Reflector.java:271)
The immediate problem with this is that some unrelated objects could have an x field or method, but since we are looking for type safety, it would be recommended to tag the record object with the interface - usually in the function signature, but we can demonstrate with let:
user=> (let [^MyRecord my-record r] (.x my-record))
user=> (defrecord OtherRecord [x y velocity])
user=> (def r' (->OtherRecord 1 2 3))
user=> (let [^MyRecord my-record r'] (.x my-record))
ClassCastException user.OtherRecord cannot be cast to user.MyRecord  user/eval158 (NO_SOURCE_FILE:46)
Since this can compile into a straightaway field access, it is actually very slightly faster than using the keyword as a function:
user=> (time (dotimes [_ 1000000]
        (let [^MyRecord my-record r] (.x my-record))))
"Elapsed time: 3.081 msecs"
user=> (time (dotimes [_ 1000000]
        (let [^MyRecord my-record r] (:x my-record))))
"Elapsed time: 6.24 msecs"
But there is a far more devious problem with this idea:
user=> r
#user.MyRecord{:x 4.2, :y 5.2, :z -0.9, :size 7.0, :value 9, :material :plastic}
user=> (.size r)
How did that happen? The value should be 7 but we get 6.

The reason is that MyRecord is, among other things, a java.util.Collection:
user=> (ancestors MyRecord)
#{clojure.lang.IMeta clojure.lang.Associative clojure.lang.Counted clojure.lang.IObj java.io.Serializable clojure.lang.ILookup java.lang.Object clojure.lang.IRecord java.lang.Iterable clojure.lang.Seqable clojure.lang.IPersistentCollection java.util.Map clojure.lang.IKeywordLookup clojure.lang.IPersistentMap}
... and Collection has a nullary method named size, and since apparently method calls take precedence over field accesses, that was a call to the size method, which told us that the record had six entries.

For records, we have so far decided to live with get* though some code does use the (.field record) form, which means we have to take care not to name any record field one of these:
user=> (defrecord Empty [])
user=> (sort (map :name (filter #(and (instance? clojure.reflect.Method %) (empty? (.parameter-types %))) ((reflect Empty) :members))))
(clear count empty entrySet getBasis hashCode isEmpty iterator keySet meta seq size values)
Out of these, mainly size and count are really plausible names for record fields. The danger, of course, is that future versions of Clojure add more methods.