Monday, 27 August 2012

Some more Datalog

I've written about datalog and Datomic a bit recently. To conclude, here's another post comparing execution speed with the contrib.datalog library by Jeffrey Straszheim. Clojure 1.4-ready source can be found here.

The example I'm using in my benchmark is a simple join between two relations. In datomic/datalog it would look like this;

In contrib.datalog the same query requires a bit more ceremony; you can write it like this;

In my previous posts I described a number of different ways to use core.logic and unify + clojure.set/join to replicate the same query. How do the execution times compare? I use the same benchmark as in the previous post (the same query, with 5000 joins between the 2 'tables').
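To make the comparison concrete, here's a sketch of the two queries. The relation shapes ([last first email] joined against [email height] on the shared email) are illustrative, not the exact benchmark data. First the Datomic version, which can query plain in-memory collections:

```clojure
(require '[datomic.api :as d])

;; Datomic's datalog runs against in-memory collections directly;
;; the two relations are joined implicitly on the shared ?email var.
(d/q '[:find ?first ?height
       :in $people $heights
       :where [$people ?last ?first ?email]
              [$heights ?email ?height]]
     [["Doe" "John" "jdoe@example.com"]
      ["Doe" "Jane" "jane@example.com"]]
     [["jdoe@example.com" 180]])
;; => #{["John" 180]}
```

And a hedged sketch of the contrib.datalog version, reconstructed from the library's documented API (exact namespace names differ between the original contrib library and the Clojure 1.4 port) -- the database/rule/work-plan definitions are the extra ceremony:

```clojure
;; contrib.datalog (Straszheim) -- namespaces as in the original contrib.
(use 'clojure.contrib.datalog.rules
     'clojure.contrib.datalog.database
     'clojure.contrib.datalog.softstrat)

(def db
  (add-tuples
   (make-database
    (relation :person [:last :first :email])
    (relation :height [:email :cm]))
   [:person :last "Doe" :first "John" :email "jdoe@example.com"]
   [:height :email "jdoe@example.com" :cm 180]))

(def rules
  (rules-set
   (<- (:first-height :first ?f :cm ?h)   ; derived relation
       (:person :first ?f :email ?e)
       (:height :email ?e :cm ?h))))

(run-work-plan
 (build-work-plan rules (?- :first-height :first ?f :cm ?h))
 db {})
```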

Datomic/datalog is fastest by far, needing ~11ms to complete. Second fastest is the unify + clojure.set/join solution described here, about an order of magnitude slower at ~150ms. The core.logic defrel/fact and contrib.datalog solutions are about equal in speed at ~320ms, i.e. 2x slower than unify+join and ~30x slower than datomic/datalog.
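For reference, the join half of the unify + clojure.set/join approach boils down to joining two sets of maps on their shared key; a minimal sketch (maps and keys are illustrative):

```clojure
(require '[clojure.set :as set])

;; clojure.set/join joins on whatever keys the two relations share (:email here).
(set/join #{{:last "Doe" :first "John" :email "jdoe@example.com"}}
          #{{:email "jdoe@example.com" :height 180}})
;; => #{{:last "Doe" :first "John" :email "jdoe@example.com" :height 180}}
```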

Conclusion

My recent datalog posts focused on querying in-memory data structures (like log rings etc). This is not exactly what Datomic was designed for, but its query engine still performs really well. An added bonus is that it's low on ceremony. The recently announced Datomic free edition eradicates some of my initial fear of using it in my projects. The main drawback is that Datomic is closed source, even the free edition. Another annoying detail is that even if you're just after the datalog/query machinery, adding datomic-free pulls in all of Datomic's 27 jar dependencies, weighing in at 19MB. That's quite a heavy tax on your uberjar.

There are certainly alternatives, like core.logic and contrib.datalog, but an order-of-magnitude worse execution time can be hard to live with if your datasets are big. By using datomic/datalog queries you also have the flexibility to move to disk-based databases if your datasets explode. More on this in upcoming blog posts.

For completeness, here's the join-test function I used for the contrib.datalog benchmark;
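A hedged reconstruction of what such a join-test might look like -- the relation names, tuple shapes and 5000-row generator are illustrative assumptions, while the contrib.datalog calls follow its documented make-database/rules-set/work-plan API:

```clojure
(use 'clojure.contrib.datalog.rules
     'clojure.contrib.datalog.database
     'clojure.contrib.datalog.softstrat)

;; Build a database with n person tuples and n matching height tuples,
;; then time a rule that joins them on :email.
(defn join-test [n]
  (let [db    (apply add-tuples
                     (make-database
                      (relation :person [:last :first :email])
                      (relation :height [:email :cm]))
                     (mapcat (fn [i]
                               [[:person :last "L" :first (str "name" i)
                                 :email (str i "@example.com")]
                                [:height :email (str i "@example.com") :cm i]])
                             (range n)))
        rules (rules-set
               (<- (:first-height :first ?f :cm ?h)
                   (:person :first ?f :email ?e)
                   (:height :email ?e :cm ?h)))
        wp    (build-work-plan rules (?- :first-height :first ?f :cm ?h))]
    (time (doall (run-work-plan wp db {})))))

(join-test 5000)
```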

Sunday, 12 August 2012

cKanren time!

Mr David Nolen recently published core.logic 0.8.alpha2, with added cKanren (c for constraints) support. To celebrate this glorious event I'm writing up some core.logic/cKanren stuff I've been looking at recently.

Enter the Queens 

If you've followed this blog, you've perhaps seen my previous posts on solving N-Queens in core.logic (part1, part2). How will this look and perform using the shiny new cKanren extensions in core.logic 0.8? Obviously there are many (new) ways to solve this problem; here's a core.logic-ized version of the solution described in the cKanren paper (please read section 4.2 for an in-depth explanation);

distinctfd (and infd) are really nice additions to core.logic; they really help me personally write logic programs. How does this code perform? It's very similar in speed to the (non-fd) core.logic solution described in my previous post. Not bad: all the extra cKanren expressive power without any performance drop!
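Transcribed into core.logic, the paper's solution comes out roughly as below. This is a sketch under the 0.8-alpha API (infd, distinctfd, +fd and !=fd lived directly in clojure.core.logic; later 0.8 releases moved them into clojure.core.logic.fd), and the goal decomposition is mine rather than a verbatim copy:

```clojure
(ns queens.core
  (:refer-clojure :exclude [==])
  (:use clojure.core.logic))

;; Queen y (a row number) may not attack any queen in ys;
;; d is the column distance to the head of ys.
(defne noattacko [y ys d]
  ([_ () _])
  ([y [x . xs] d]
   (fresh [x+d y+d d+1]
     (infd x+d y+d (interval 1 20))  ; generous domain for the sums (n <= 10)
     (+fd x d x+d)
     (+fd y d y+d)
     (!=fd y x)     ; not the same row
     (!=fd y x+d)   ; not on one diagonal
     (!=fd x y+d)   ; not on the other
     (+fd d 1 d+1)
     (noattacko y xs d+1))))

(defne safeo [l]
  ([()])
  ([[y . ys]]
   (noattacko y ys 1)
   (safeo ys)))

(defn n-queens [n]
  (let [vars (repeatedly n lvar)]   ; one row var per column
    (run 1 [q]
      (== q vars)
      (everyg #(infd % (interval 1 n)) vars)
      (distinctfd vars)             ; no two queens share a row
      (safeo vars))))
```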

Sudoku time

How would you use core.logic to solve sudoku? Let's start by looking at a simple all-Clojure functional solution;

A few things are worth mentioning. First, this code finds all (potential) solutions of a given board layout. This is a bit strange, since a true sudoku board should only have one solution! It does make the code a bit more similar to logic programs (which typically look for many solutions), and gives you some nice perks: you can use this code to check whether a given puzzle is indeed a true puzzle. Secondly, in order to terminate faster, this snippet uses a quite common sudoku heuristic called "most constrained". At each level of the backtracking search it considers the open squares in order, sorted by fewest possible numbers (i.e. most constrained first). This helps to "fail fast" and minimises the dead ends we recursively search.
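A sketch of that kind of hand-rolled solver -- the board is a vector of 81 cells with 0 for open squares; the helper names are mine:

```clojure
;; All cells sharing a row, column or 3*3 sub-square with cell i.
(defn- peers [i]
  (let [r (quot i 9), c (mod i 9)
        br (* 3 (quot r 3)), bc (* 3 (quot c 3))]
    (-> (concat (for [j (range 9)] (+ (* r 9) j))            ; row
                (for [j (range 9)] (+ (* j 9) c))            ; column
                (for [dr (range 3), dc (range 3)]            ; sub-square
                  (+ (* (+ br dr) 9) bc dc)))
        set (disj i))))

(defn- candidates [board i]
  (remove (set (map board (peers i))) (range 1 10)))

;; Lazily returns every solution; branches on the most constrained cell.
(defn solve [board]
  (if-let [open (seq (filter #(zero? (board %)) (range 81)))]
    (let [i (apply min-key #(count (candidates board %)) open)]
      (mapcat #(solve (assoc board i %)) (candidates board i)))
    [board]))   ; no open squares left: one complete solution
```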

How would we do this in cKanren? As we'd expect, the code comes very close to the definition of the rules: we have 9*9 squares, each square can contain a number from 1-9, and the numbers in each row, column and 3*3 sub-square must be unique.

In this example I will use a 4*4 sudoku for simplicity. That's it, exactly how the rules of sudoku were written; logic programming really is magical!
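Spelled out for a 4*4 board, the rules-as-code translation might look like this (again under the 0.8-alpha operator names; init-boardo and all-distincto are my helper names, not library goals):

```clojure
(ns sudoku4.core
  (:refer-clojure :exclude [==])
  (:use clojure.core.logic))

;; Unify each fixed square with its hint; 0 means open.
(defn init-boardo [vars hints]
  (everyg (fn [[v h]] (if (zero? h) succeed (== v h)))
          (map vector vars hints)))

;; Every group (row, column, sub-square) holds distinct values.
(defn all-distincto [groups]
  (if (seq groups)
    (all (distinctfd (first groups))
         (all-distincto (rest groups)))
    succeed))

(defn sudoku-4x4 [hints]
  (let [vars    (vec (repeatedly 16 lvar))
        rows    (partition 4 vars)
        cols    (apply map list rows)
        squares (for [r [0 2], c [0 2]]
                  (for [dr [0 1], dc [0 1]]
                    (vars (+ (* (+ r dr) 4) c dc))))]
    (run 1 [q]
      (== q vars)
      (everyg #(infd % (interval 1 4)) vars)
      (init-boardo vars hints)
      (all-distincto (concat rows cols squares)))))
```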

Writing it this way is good for understanding the solution, but not very practical for a real 9*9 board. Fortunately we can write some more helpers to make this compact; here's an example from Mr Nolen himself. This solution uses a "pseudo" goal called everyg, which is similar to all-distincto in the previous example.
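Nolen's compact 9*9 version, essentially as it later appeared in core.logic's benchmark suite (transcribed from memory, so treat details as approximate); everyg maps a goal over a collection, playing the role of the all-distincto helper:

```clojure
(ns sudoku9.core
  (:refer-clojure :exclude [==])
  (:use clojure.core.logic))

(defn get-square [rows x y]
  (for [x (range x (+ x 3))
        y (range y (+ y 3))]
    (get-in rows [x y])))

;; Unify the vars with the non-zero hints of the puzzle.
(defn init [vars hints]
  (if (seq vars)
    (all
     (if-not (zero? (first hints))
       (== (first vars) (first hints))
       succeed)
     (init (next vars) (next hints)))
    succeed))

(defn sudokufd [hints]
  (let [vars (repeatedly 81 lvar)
        rows (->> vars (partition 9) (map vec) (into []))
        cols (apply map vector rows)
        sqs  (for [x (range 0 9 3)
                   y (range 0 9 3)]
               (get-square rows x y))]
    (run 1 [q]
      (== q vars)
      (everyg #(infd % (interval 1 9)) vars)
      (init vars hints)
      (everyg distinctfd rows)
      (everyg distinctfd cols)
      (everyg distinctfd sqs))))
```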

So how fast is this then? How does it stack up against the hand-rolled implementation above? Well, on an empty board (all squares open) the hand-rolled code finds 5 solutions in 400ms, while the code above gets stuck in a (>5min) loop. For more realistic "hard" boards the core.logic solution is on average 20x faster than the hand-rolled code (far fewer temporary objects, I reckon). Finally, on really evil boards like this one, discovered by Peter Norvig, the core.logic code terminates in ~6 seconds whereas the hand-rolled code loops forever (well, I gave up after 20 minutes).

Conclusion

In most cases where "searching" is involved, I warmly recommend using core.logic. The expressive power makes for easy-to-read code, and the performance cost over hand-rolled code is either insignificant or reversed (i.e. core.logic is faster). The new constraint primitives (cKanren) in core.logic 0.8 are a great addition to an already impressive library.

Some other stuff
  • The excellent cKanren paper is getting a bit dated. It's still a very good read, but for the latest innovations check out its GitHub page, and of course core.logic.
  • If you can't get enough of logic programming, the next step is to dip into the ocean of Prolog; there are plenty of awesome (and practical) books written over many years. Here's a good list of books to get you started.