25 Jun 2013

Clojure style: accessing maps and records

There are three ways of accessing the member :key of a map my-map:
(:key my-map)
(my-map :key)
(get my-map :key)
What are the drawbacks and advantages of each, and which one should you use? What should a style guide recommend?

Since Clojure is still a young language, there are not too many style guides available. Two popular sources for style guidance are Bozhidar Batsov's guide and clojure.org's Library Coding Standards. Both of them recommend the first form, (:key my-map). The ZenRobotics style guide (as yet unpublished) takes an opposing view: you should use the second form, (my-map :key).

What's wrong with (:key my-map)? Of course it works just fine:
user=> (def my-map {:foo 1 :bar 2})
#'user/my-map
user=> (:foo my-map)
1
The problem is that it works just as fine for too many cases:
user=> (:foo (java.util.LinkedList. [1 2 3]))
nil
user=> (:foo [1 2 3])
nil
user=> (:foo nil)
nil
whereas (my-map :foo) gives you an error for these inputs:
user=> ((java.util.LinkedList. [1 2 3]) :foo)
ClassCastException java.util.LinkedList cannot be cast to clojure.lang.IFn  user/eval27 (NO_SOURCE_FILE:12)
user=> ([1 2 3] :foo)
IllegalArgumentException Key must be integer  clojure.lang.APersistentVector.invoke (APersistentVector.java:261)
user=> (nil :foo)
CompilerException java.lang.IllegalArgumentException: Can't call nil, compiling:(NO_SOURCE_PATH:14)
An error in each of these cases is good, because usually you shouldn't be looking for the key :foo in a linked list or a vector, and the sooner you discover it, the less time you'll spend debugging why some other code is getting a nil input. The (get my-map :foo) form has the same problem: it allows non-maps as input and just returns nil.

Of course, a style guide must balance various concerns, one of which is conciseness. Our code does have multiple instances of calls like
(filter :foo list-of-maps)
where #(% :foo) would be considered just too verbose. Similarly, sometimes you do want to make your code work with various different types and just get nil when the type is not applicable. A good thing about keeping (my-map :foo) as the general rule is that uses like this stand out, so if you're looking for why you're only getting nil values somewhere, you immediately think of list-of-maps being not actually maps at all - perhaps a case of making wrong code look wrong.

Often even better than calling maps as functions is to define a version of get that ensures errors for another kind of mistakes:
(defn get*
  "Like clojure.core/get but throws an error if
  result is not found."
  [m k]
  (let [result (get m k ::not-found)]
    (if (= result ::not-found)
      (throw (Error. (format
         "Key %s not found among %s" k (keys m))))
      result)))
Now (get* {:foo 1 :bar 2} :baz) is an error, so using get* guards against typos in map keys.

Once code matures, it is common to replace hash-maps with records. With records the same rule won't work:
user=> (defrecord MyRecord [x y z size value material])
user.MyRecord
user=> (def r (->MyRecord 4.2 5.2 -0.9 7.0 9 :plastic))
#'user/r
user=> r
#user.MyRecord{:x 4.2, :y 5.2, :z -0.9, :size 7.0, :value 9, :material :plastic}
user=> (:x r)
4.2
user=> (r :x)
ClassCastException user.MyRecord cannot be cast to clojure.lang.IFn  user/eval108 (NO_SOURCE_FILE:23)
user=> (get r :x)
4.2
The get* function defined above works fine even for records, but an intriguing alternative is to take advantage of the Java interface associated with each record type and use Java interop to access fields:
user=> (.x r)
4.2
user=> (.x [1 2 3])
IllegalArgumentException No matching field found: x for class clojure.lang.PersistentVector  clojure.lang.Reflector.getInstanceField (Reflector.java:271)
The immediate problem with this is that some unrelated objects could have an x field or method, but since we are looking for type safety, it would be recommended to tag the record object with the interface - usually in the function signature, but we can demonstrate with let:
user=> (let [^MyRecord my-record r] (.x my-record))
4.2
user=> (defrecord OtherRecord [x y velocity])
user.OtherRecord
user=> (def r' (->OtherRecord 1 2 3))
#'user/r'
user=> (let [^MyRecord my-record r'] (.x my-record))
ClassCastException user.OtherRecord cannot be cast to user.MyRecord  user/eval158 (NO_SOURCE_FILE:46)
Since this can compile into a straightaway field access, it is actually very slightly faster than using the keyword as a function:
user=> (time (dotimes [_ 1000000]
        (let [^MyRecord my-record r] (.x my-record))))
"Elapsed time: 3.081 msecs"
nil
user=> (time (dotimes [_ 1000000]
        (let [^MyRecord my-record r] (:x my-record))))
"Elapsed time: 6.24 msecs"
nil
But there is a far more devious problem with this idea:
user=> r
#user.MyRecord{:x 4.2, :y 5.2, :z -0.9, :size 7.0, :value 9, :material :plastic}
user=> (.size r)
6
How did that happen? The value should be 7 but we get 6.

The reason is that MyRecord is, among other things, a java.util.Collection:
user=> (ancestors MyRecord)
#{clojure.lang.IMeta clojure.lang.Associative clojure.lang.Counted clojure.lang.IObj java.io.Serializable clojure.lang.ILookup java.lang.Object clojure.lang.IRecord java.lang.Iterable clojure.lang.Seqable clojure.lang.IPersistentCollection java.util.Map clojure.lang.IKeywordLookup clojure.lang.IPersistentMap}
... and Collection has a nullary method named size, and since apparently method calls take precedence over field accesses, that was a call to the size method, which told us that the record had six entries.

For records, we have so far decided to live with get* though some code does use the (.field record) form, which means we have to take care not to name any record field one of these:
user=> (defrecord Empty [])
user.Empty
user=> (sort (map :name (filter #(and (instance? clojure.reflect.Method %) (empty? (.parameter-types %))) ((reflect Empty) :members))))
(clear count empty entrySet getBasis hashCode isEmpty iterator keySet meta seq size values)
Out of these, mainly size and count are really plausible names for record fields. The danger, of course, is that future versions of Clojure add more methods.

5 comments:

  1. The Clojure style of error handling is to use nil in multiple cases. Using Java exceptions can break functional composition in unpredictable ways. Also if you start mixing exceptions where there usually isn't any, you must reason about your code in two different ways. So using the keyword as the map lookup function behaves properly when compared to other core Clojure functions such as: assoc, get, get-in and so on. Where the interface is consistent: with a missing value, empty map or an unsupported object; return nil.

    If you use the keyword as a function, you gain some nice interfaces to maps. You already mentioned using a keyword as a filter fn. You can make an exception for yourself to use keywords as functions in these cases, but with that mindset, you might not think how to compose with keywords (juxt and threading comes to mind first).

    ReplyDelete
    Replies
    1. The point of throwing an exception here is to catch programming mistakes such as typos, not conditions that should normally occur at run time. It is true that many Clojure functions just return nil for errors, but if you intended to get a non-nil value out of these functions and use it, eventually there will be some kind of error.

      In a good case you expect the value to be something like a number or a function and attempt to use it, and get an exception -- e.g. (inc nil), (< 1 nil), and (nil) all throw. In a bad case you just get a result that is not what you intended. In both cases getting an error earlier helps you find the actual mistake in the code.

      Yes, we all like to write things like

      (-> input-record :x (->> (partial >)) (comp :x) (filter records) count)

      and when experimenting with one-liners in the REPL, usually you'll notice when you have a typo or a mistaken assumption. But if you check code like that into a larger codebase, it pays to make sure the callers of the code provide data in the expected form. The difference between "NullPointerException" and "Error Key :x not found among (:z :y :x0)" is appreciated by the external user of the code.

      Delete
    2. I actually like fail fast behavior in applications, and even using Option monads, I'm just argumenting about the Clojure style.

      Have you used Clojure post and pre conditions? And what do you think about them? And how about the practice of just using more rigid error checking at the API boundaries and using unit tests to check for errors?

      Delete
  2. How can I create a function that accepts one argument and then use that argument as a key to lookup a value in a map variable?

    ReplyDelete