Monday, 16 July 2012

Replicating Datomic/Datalog queries with core.logic

I've been toying with Datomic recently, and I particularly like the power of it's query language (~Datalog). Mr Halloway showed a couple of months ago how the query engine is generic enough to be run on standard Clojure collections, gist here. Here is an example from that page of a simple join; A question you might ask yourself is how can I use core.logic to do the same kind of queries? It turns out that it's pretty straight forward, and also very fast. Core.logic provides some convenient helper functions for unification, that we are going to use. Here's an example of how to get a binding map for some logical variables over a collection; Let's write a little helper function that maps binding-map over all elements of a seq (of tuples) (I'm using binding-map* so I only need to prep the rule once) We can use clojure.set/join to perform the natural join of 2 sets of binding maps like so; Note that I also pick out just the first/height lvars here, this corresponds to the :find clause in the datomic query. That's it really, not as generic (and data driven) as the datomic query, but working; Here's the kicker, for this join query the datomic.api/q's time complexity estimates to O[n2] (actually 22n2-50n) where as the time complexity for core.logic/unify + clojure.set/join solution is O[n] (10n). That means that for a run over a modest dataset of size 5000 the difference is 50x!

Edit: The Datomic query used in the benchmark is not optimal as it turns out. In the example below you'll see a more optimal version that's infact ~18x faster than the core.logic + clojure.set/join solution. Obviously this little example doesn't convey the true power of either datomic/datalog or core.logic/unifier. Be careful writing your Datomic queries, the running time can be vastly different!

Here some more of the datomic queries converted in a similar fashion.

2 comments:

  1. The fast way to express that in Datomic is this:

    user=> (dotimes [_ 10]
    (bench 5000 (partial q '[:find ?first ?height
    :in $a $b
    :where [$a ?last ?first ?email] [$b ?email ?height]])))
    "Elapsed time: 8.88 msecs"
    "Elapsed time: 9.323 msecs"
    "Elapsed time: 9.507 msecs"
    "Elapsed time: 10.077 msecs"
    "Elapsed time: 10.013 msecs"
    "Elapsed time: 7.722 msecs"
    "Elapsed time: 9.141 msecs"
    "Elapsed time: 11.807 msecs"
    "Elapsed time: 11.712 msecs"
    "Elapsed time: 9.225 msecs"
    nil

    ;;core.logic on the same machine for comparison:

    user=> (dotimes [_ 10] (bench 5000 join-test))
    "Elapsed time: 133.484 msecs"
    "Elapsed time: 133.037 msecs"
    "Elapsed time: 133.162 msecs"
    "Elapsed time: 132.284 msecs"
    "Elapsed time: 137.815 msecs"
    "Elapsed time: 133.844 msecs"
    "Elapsed time: 135.673 msecs"
    "Elapsed time: 131.266 msecs"
    "Elapsed time: 134.066 msecs"
    "Elapsed time: 137.533 msecs"
    nil

    At some point we will turn your former query into the latter. Please be careful doing benchmarking such as this. Datomic should prove to be much faster at queries than core.logic, as Datalog is set oriented and core.logic is tuple-oriented.

    Rich

    ReplyDelete
  2. Thanks for this Rich, will update accordingly. This highlights the importance of good datalog/query documentation, the impacts on performance can be massive.

    ReplyDelete

Note: only a member of this blog may post a comment.