Monday, 23 July 2012

Untying the Recursive Knot

Here I present a couple of examples of the functional design pattern "untying the recursive knot". I've found this useful on a couple of occasions, for instance when breaking apart mutually recursive functions. Material inspired by Jon Harrop's excellent Visual F# to Technical Computing.

First, let's look at a simple factorial implementation using direct recursion. We can break the direct recursive dependency by replacing the recursive call with a call to a function argument. We have now broken the recursive knot! The original functionality can be recovered by tying the knot back together with a y combinator. This has given us extra power: we can, for instance, inject new functionality into every invocation without touching the original definition. Now for a more practical example, where we have mutually recursive functions. We can break these functions apart using the same strategy as with fact' above. Please note that the (declare ...) form is no longer required; the functions are now entirely independent and could live in different namespaces.
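Since the original snippets aren't reproduced here, a minimal sketch of the idea (the fact' / y / my-even?' names are mine, and even/odd is just an illustrative stand-in for the mutually recursive example):

  ;; Direct recursion: fact calls itself.
  (defn fact [n]
    (if (zero? n) 1 (* n (fact (dec n)))))

  ;; Knot untied: the recursive call is replaced by the function argument f.
  (defn fact' [f n]
    (if (zero? n) 1 (* n (f (dec n)))))

  ;; A y combinator that ties the knot back together.
  (defn y [f]
    (fn g [& args]
      (apply f g args)))

  ((y fact') 5) ;=> 120

  ;; Extra power: inject behaviour into every invocation without touching fact'.
  ((y (fn [f n] (println "fact called with" n) (fact' f n))) 5)

  ;; Mutually recursive functions can be untied the same way; each partner
  ;; becomes an argument, so no (declare ...) is needed.
  (defn my-even?' [odd? n] (if (zero? n) true  (odd?  (dec n))))
  (defn my-odd?'  [even? n] (if (zero? n) false (even? (dec n))))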

Using the same y combinator above, we can get back to the original functionality. A handy pattern to add to your functional toolbox.
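The post ties the pair back with the y combinator; as a simpler stand-in illustration (not the original code), the untied even/odd pair from the sketch above can also be re-tied directly:

  ;; Tie the mutual knot back together by hand, no (declare ...) required.
  (defn my-even? [n] (my-even?' #(my-odd?' my-even? %) n))
  (defn my-odd?  [n] (my-odd?'  #(my-even?' my-odd?  %) n))

  (my-even? 10) ;=> true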

Tuesday, 17 July 2012

Replicating Datomic/Datalog queries with core.logic, take 2

This is a follow-up to my previous post on datalog-equivalent queries in core.logic.

Here I present an alternative way to do the unification and join inside core.logic (without having to use clojure.set/join). It uses the relationships / facts API in core.logic, described here. First let's consider this datomic query. In core.logic we start by defining the relationships between our 2 datasets; this mirrors the layout of the data above. The actual datapoints are defined as facts, and once we have those we can do a core.logic run* using goals with the same names as our defrels. The join is accomplished by using the same lvar (email) in both defrel goals.
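The original snippets aren't included here, but a rough sketch of the shape of it, assuming the last/first/email and email/height datasets from the earlier post and the relations API as it looked in core.logic around that time (defrel / facts), could be:

  (ns logic-join
    (:refer-clojure :exclude [==])
    (:use [clojure.core.logic]))

  ;; Relations mirroring the two datasets (the relation names are my own)
  (defrel person lastname firstname email)
  (defrel owner-height email height)

  ;; Load the datapoints as facts
  (facts person [["Doe" "John" "jdoe@example.com"]
                 ["Doe" "Jane" "jane@example.com"]])
  (facts owner-height [["jdoe@example.com" 71]
                       ["jane@example.com" 68]])

  ;; Sharing the email lvar between the two goals gives us the join
  (run* [q]
    (fresh [firstname lastname email height]
      (person lastname firstname email)
      (owner-height email height)
      (== q [firstname height])))
  ;;=> (["John" 71] ["Jane" 68])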

Now consider a slightly more complicated query. Applying the same technique (with some more defrels), the result looks suspiciously like a victim-of-a-macro(tm), so the pattern can be captured with a macro. The two examples above can then be written far more compactly. An important point here is that we have separated the definition of the relationships, the definition (loading) of the facts, and the actual running of the query.
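I don't have the original macro here; a minimal sketch of the idea, with names of my own choosing, might look like this:

  ;; Hypothetical sketch: wrap run*/fresh so a query is a :find-style vector
  ;; of result lvars, a vector of extra fresh lvars, and the relation goals.
  (defmacro defquery [name find-vars fresh-vars & goals]
    `(defn ~name []
       (run* [q#]
         (fresh [~@find-vars ~@fresh-vars]
           ~@goals
           (== q# ~(vec find-vars))))))

  ;; The join example above then becomes
  (defquery first-and-height [firstname height] [lastname email]
    (person lastname firstname email)
    (owner-height email height))

  (first-and-height) ;=> (["John" 71] ["Jane" 68])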

So how does this perform for larger datasets compared to the unification / clojure.set/join approach described in the previous post? If we look at the time for running the query, it's ~33% faster than that approach, but still about 12x slower than the optimal datomic/datalog query. Follow this link for some more datalog-y queries in core.logic.

Monday, 16 July 2012

Replicating Datomic/Datalog queries with core.logic

I've been toying with Datomic recently, and I particularly like the power of its query language (~Datalog). Mr Halloway showed a couple of months ago how the query engine is generic enough to be run on standard Clojure collections, gist here. Here is an example from that page of a simple join. A question you might ask yourself is: how can I use core.logic to do the same kind of queries? It turns out that it's pretty straightforward, and also very fast. Core.logic provides some convenient helper functions for unification that we are going to use. Here's an example of how to get a binding map for some logical variables over a collection. Let's write a little helper function that maps binding-map over all elements of a seq (of tuples) (I'm using binding-map* so I only need to prep the rule once). We can then use clojure.set/join to perform the natural join of the 2 sets of binding maps. Note that I also pick out just the first/height lvars here; this corresponds to the :find clause in the datomic query. That's it really, not as generic (and data driven) as the datomic query, but working. Here's the kicker: for this join query, datomic.api/q's time complexity estimates to O(n^2) (actually 22n^2 - 50n), whereas the time complexity of the core.logic/unify + clojure.set/join solution is O(n) (10n). That means that for a run over a modest dataset of size 5000 the difference is 50x!
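The original snippets aren't reproduced here; a rough sketch of the approach, assuming the unifier API (prep / binding-map / binding-map*) as it looked in core.logic at the time and helper names of my own, could be:

  (require '[clojure.core.logic :as logic]
           '[clojure.set :as set])

  ;; A binding map for some lvars over one tuple
  (logic/binding-map '[?last ?first ?email]
                     ["Doe" "John" "jdoe@example.com"])
  ;;=> {?last "Doe", ?first "John", ?email "jdoe@example.com"}

  ;; Map binding-map* over a seq of tuples, prepping the rule only once
  (defn bind-all [rule tuples]
    (let [prule (logic/prep rule)]
      (map #(logic/binding-map* prule (logic/prep %)) tuples)))

  ;; Natural join on the shared ?email lvar, then pick out ?first / ?height
  ;; (the equivalent of the :find clause)
  (defn first-and-height [persons heights]
    (->> (set/join (set (bind-all '[?last ?first ?email] persons))
                   (set (bind-all '[?email ?height] heights)))
         (map (juxt '?first '?height))))

  (first-and-height [["Doe" "John" "jdoe@example.com"]]
                    [["jdoe@example.com" 71]])
  ;;=> (["John" 71])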

Edit: The Datomic query used in the benchmark is not optimal, as it turns out. In the example below you'll see a more optimal version that's in fact ~18x faster than the core.logic + clojure.set/join solution. Obviously this little example doesn't convey the true power of either datomic/datalog or core.logic/unifier. Be careful writing your Datomic queries, the running time can be vastly different!

Here are some more of the datomic queries converted in a similar fashion.

Wednesday, 11 July 2012

N-Queens with core.logic, take 2

This post is a follow-up to my previous post on N-Queens and core.logic, in which I tried to find the solutions using "pure" logic (without arithmetic goals) and basic miniKanren / Reasoned Schemer building blocks.

After some excellent feedback and hints from Mr David Nolen (big thanks), I here present a greatly simplified (and faster) way of using core.logic to find all solutions. Credit also goes to good old Bratko.

First, let's fix the safeo function (and the def-subo macro). In miniKanren, you can use arithmetic goals given 2 prerequisites: the fresh variable must be bound to a finite (number) space, and we must use project to bind the values. This means we can get rid of subo altogether.

Project is mentioned in passing in the Reasoned Schemer book; it says "project is syntactically like fresh, but it binds different values to the lexical variables. project binds walk*ed values, whereas fresh binds variables using var". What that means in this case is that we can perform arithmetic operations on the projected variables. Here's our new safeo. This works if the x's are bound to a number range with membero (remember that the y's are never fresh), which is done in an updated queens-run macro. As you can see, we got rid of subo (and all its associations). That improves performance on a 6-queens run by ~5x.
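The original listing isn't shown here; a minimal sketch of a project-based pairwise check, assuming each queen is an [x y] pair where x is an lvar and y is a fixed row number (and the usual (:refer-clojure :exclude [==]) / (:use clojure.core.logic) ns setup), might look like:

  ;; With x1/x2 projected we can use plain arithmetic; the (== true ...)
  ;; trick turns the boolean result into goal success/failure.
  (defn safeo [[x1 y1] [x2 y2]]
    (project [x1 x2]
      (== true (and (not= x1 x2)
                    (not= (Math/abs (- x1 x2))
                          (Math/abs (- y1 y2)))))))

  ;; The x's have to be bound to numbers first, e.g. with membero:
  (run* [q]
    (fresh [x1 x2]
      (membero x1 (range 4))
      (membero x2 (range 4))
      (safeo [x1 0] [x2 1])
      (== q [x1 x2])))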

Next, we want to get rid of the big macro above and replace it with some readable looping constructs. To replicate the block of safeos generated by this macro (the gen-safeos fn), we are going to need 2 loops: one outer loop that goes over all N queens q, and one inner loop that, for every q, checks that it's safe against all other queens. It's time for some core.logic pattern matching magic!

Let's change safeo again. Instead of checking if 2 queens are not attacking each other, it checks if a given queen is not attacking any other queen in a given list (this is our inner loop). There are a couple of things going on here: defne is a convenience macro for building conde functions (which you end up doing all the time), and there is some clever destructuring / pattern matching going on to pick the head / tail of a list and recur.
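Again a sketch rather than the original listing: the pairwise check from the previous sketch is renamed not-attackingo here (my naming), and safeo becomes the list-walking goal; the usual core.logic ns setup is assumed.

  ;; Pairwise check, now taking the coordinates directly
  (defn not-attackingo [x1 y1 x2 y2]
    (project [x1 y1 x2 y2]
      (== true (and (not= x1 x2)
                    (not= (Math/abs (- x1 x2))
                          (Math/abs (- y1 y2)))))))

  ;; Inner loop: the queen [x y] is safe against every queen in the list;
  ;; defne pattern-matches on the list, checks the head and recurs on the tail.
  ;; (As before, the x's must be ground, e.g. via membero, before this runs.)
  (defne safeo [q others]
    ([_ ()])
    ([[x y] [[x1 y1] . tail]]
       (not-attackingo x y x1 y1)
       (safeo [x y] tail)))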

Now for the outer loop: let's introduce another recursive "conde function" called nqueenso. This function loops over all queens in a list and checks that they are all safe against each other (using safeo). It also needs to bind the x variables to a finite number space using membero. We're almost done now, we just need to rewrite the (run* ...) form. We can actually do without the (fresh ...) expression by creating core.logic logical variables (lvar) directly; this also eliminates the need for a macro to generate n symbols. Here's the whole thing. As you can see, with looping we are generating drastically fewer associations for core.logic to consider, which is good for performance.
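A sketch of how the outer loop and the final run could look, building on the safeo sketch above (solve-nqueens and the exact goal ordering are my own):

  ;; Outer loop: walk the list of queens, bind each x to the board with
  ;; membero, and check it against the remaining queens (n is the board size).
  (defne nqueenso [queens n]
    ([() _])
    ([[[x y] . tail] _]
       (nqueenso tail n)
       (membero x (range n))
       (safeo [x y] tail)))

  ;; No (fresh ...) needed: create the lvars directly, one per queen/row.
  (defn solve-nqueens [n]
    (let [queens (mapv vector (repeatedly n lvar) (range n))]
      (run* [q]
        (nqueenso queens n)
        (== q (mapv first queens)))))

  (count (solve-nqueens 6)) ;=> 4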

Now it's ~70x faster than the original solution in the previous post. For an 8-queens run, this is ~50x slower than the hand-rolled functional backtracking solution in the very first posting. That's still pretty good for generic Prolog machinery with all the extra expressive power that it packs.

Next part in this series will use cKanren functionality that is being worked on at the moment in core.logic. That might be even faster! 

Saturday, 7 July 2012

N-Queens with core.logic, take 1

I've been "hammock-reading" the excellent Reasoned Schemer book the last couple of months, on my quest trying to develop a gut feel for when logic programming, as defined by miniKanren/core.logic, is applicable.

My first attempt is to apply it to a problem where (as it turns out) miniKanren isn't a good fit: n-queens. What you really need for this problem, in the logic programming world, is something called constraint logic programming (CLP), which is implemented (for example) in cKanren. The good people over at core.logic are working on integrating CLP and cKanren into core.logic as we speak, so I intend to revisit this problem as that work progresses.

Let's have a crack at this problem anyway, shall we? I've previously posted a functional implementation of n-queens in Clojure, and it's both nice to read and fast. What would this look like using core.logic?


Here's the core function (in Clojure) which determines if 2 queens are threatening each other.
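The function itself isn't reproduced here; a minimal sketch of such a check, with queens as [x y] pairs, could be:

  ;; Two queens threaten each other if they share a column or sit on the
  ;; same diagonal (by construction they are always on different rows).
  (defn safe? [[x1 y1] [x2 y2]]
    (and (not= x1 x2)
         (not= (Math/abs (- x1 x2))
               (Math/abs (- y1 y2)))))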
The main problem in core.logic is the subtraction operator, which cannot be applied to fresh variables. We need a goal that has all the subtraction algebra "mapped out" as associations; let's call it subo. If we had that, our safeo function could be written in terms of it.
With a subo for a (range 2) subtraction spelled out by hand, this isn't too bad, since in the classic case we only need to subtract numbers in (range 8). We can write a macro that generates a subo for any range our heart desires.
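A hedged sketch of what such a macro might look like (not the original code; it assumes the usual (:refer-clojure :exclude [==]) / (:use clojure.core.logic) setup):

  ;; Spell out every x - y = d association for a given range as conde clauses.
  (defmacro def-subo [name n]
    (let [x (gensym "x") y (gensym "y") d (gensym "d")]
      `(defn ~name [~x ~y ~d]
         (conde
           ~@(for [a (range n) b (range n)]
               [`(== ~x ~a) `(== ~y ~b) `(== ~d ~(- a b))])))))

  ;; (def-subo subo 2) expands to the four associations 0-0=0, 0-1=-1,
  ;; 1-0=1 and 1-1=0; for the classic board we need (range 8).
  (def-subo subo 8)

  (run* [q] (subo 5 2 q)) ;=> (3)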

Next up, the associations for the queen pieces. We can brute force this as well by writing safeo associations for each queen piece against each other, and finally constraining each queen to a unique row. Typing it out for 4-queens would look something like this. Please note that the y variables don't have to be fresh, since they can only take one value (each queen is determined by a unique y variable).
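The 4-queens listing isn't shown here, but structurally it amounts to one fresh x per queen (the y fixed to the row) and a safeo goal for every pair, roughly along these lines (a rough sketch only, assuming a pairwise subo-based safeo):

  (run* [q]
    (fresh [x0 x1 x2 x3]
      (safeo [x0 0] [x1 1]) (safeo [x0 0] [x2 2]) (safeo [x0 0] [x3 3])
      (safeo [x1 1] [x2 2]) (safeo [x1 1] [x3 3])
      (safeo [x2 2] [x3 3])
      (== q [x0 x1 x2 x3])))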

Here's a complete listing of the whole thing, with an example of a 7-queens run. So how does it perform? Well, you guessed it: terribly. My previous functional Clojure implementation finds all 4 solutions for 6-queens in ~7ms. The core.logic one above does it in ~6.5 seconds, 3 orders of magnitude slower, ouch!

It's quite possible that this brute force approach is a very inefficient way of solving n-queens using miniKanren. Maybe building / removing queens from solution lists is a better approach? Also, cKanren in core.logic promises much faster and cleaner solutions. Either way, I'll keep you posted...

Tuesday, 3 July 2012

Some thoughts on logging

Have you ever tried to log from a multi-threaded program? Have you tried to make sense of the log output that multiple subsystems were writing to? Have you tried to do average latency calculations based on that log file?

If you're reading this blog, I am guessing you answered yes to a couple of the questions above.

There are multiple problems here: multiple producers (race conditions), out-of-order logs, conflated subsystems in the same log, etc. You have to put a lot of effort into your log post-processor to make any sense at all of the data, decorating it with various metadata to make it possible at all.

In this post I point out a few ways you can use Clojure (both the language and its general ethos) to overcome these problems. The solutions here are primarily tailored to "pipeline systems" where you want to find time-consuming bottlenecks and produce latency reports etc. For simplicity, this is in-process logging, where all the parts (threads) can share a log resource and tick timer etc.

TL;DR the complete code snippet

Agents

First off, the race conditions: if the log is simply a file, or an in-memory data structure, you have to somehow serialise the "emit" calls. Conventional wisdom would have you put the emit call in a critical section, but this is a) ugly, b) can introduce deadlocks, and c) can affect the system under stress. We want to post a log entry using a lock-free, asynchronous mechanism; that's why we have agents in Clojure.
Please note that when emitting with timestamps, we take a snapshot of the time instantly (in the context of the calling thread), not in the agent-thread context later on.
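A minimal sketch of such an agent-backed log (the log / emit! names are mine, not necessarily those in the linked snippet):

  ;; The log is an agent holding a vector of events; emit! is asynchronous
  ;; and lock free. The timestamp is taken on the calling thread, before the
  ;; send, not later on the agent thread.
  (def log (agent []))

  (defn emit! [event]
    (let [stamped (assoc event :timestamp (System/nanoTime))]
      (send log conj stamped)))

  (emit! {:type :request-received :id "req-1"})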

Tree Logging

How do we apply structure to the log data? One idea is to put the log data into a tree instead of a flat vector (or file). Then log entries from different subsystems can be separated (for easier post-processing), and we can express dependencies between log events for latency calculations.

Let's say each log entry is associated with one id and multiple correlation ids. The id is typically a UUID which you assign to a request, operation or instruction that "travels" through multiple parts of your system. The correlation ids can be used for splitting your logs into categories of arbitrary depth, yielding possibly more meaningful reports etc.

Here's how such a tree structure can be built using Clojure's maps and the mighty assoc-in function. The logger handles the timestamping for you; it also gives you functionality such as extracting chronological logs for ids, and timing/latency reports.
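A sketch of the idea (not the original logger; I use assoc-in's cousin update-in so each leaf accumulates a vector of events):

  ;; Events are placed in a tree keyed by the correlation ids and the id.
  (def log (agent {}))

  (defn emit! [corr-ids id event]
    (let [stamped (assoc event :timestamp (System/nanoTime))]
      (send log update-in (conj (vec corr-ids) id)
            (fnil conj []) stamped)))

  (emit! [:frontend :auth] "req-1" {:type :login-attempt})
  (emit! [:frontend :auth] "req-1" {:type :login-ok})
  ;; @log => {:frontend {:auth {"req-1" [{:type :login-attempt ...} ...]}}}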
Ok, this looks like it might work, but there are some drawbacks: we must keep the tree in memory for processing, which can be hard for huge logs, and serialisation to disk is trickier (but not really, if you have a reader). We're also forcing a structure upon the log data, which feels awkward. Supplying the id and corr-ids is not as clean as it could be.

Metadata

Let's get back to a simple flat log, with more metadata in the events. We use the metadata to express the categories and other dependencies between the events that we used the tree for in the previous example. In a sense, we are still using a tree (or maybe even graph) data structure, but the branching is described with metadata in the sequence of events. Using the emit function described under the "Agents" paragraph above, here is an example of some "connected" log events.
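For illustration (the :id / :corr-ids key names are my assumption, reusing the emit! sketch from the Agents section above), connected events might simply carry their id and correlation ids as plain data:

  (emit! {:id "req-1" :corr-ids [:frontend :auth] :type :login-attempt})
  (emit! {:id "req-1" :corr-ids [:frontend :auth] :type :password-checked})
  (emit! {:id "req-1" :corr-ids [:frontend :auth] :type :login-ok})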
The helper functions also become much cleaner this way.
Please note that these functions take a (de-referenced) log rather than using @log directly. This helps testing, but is also very important for more complicated (looping) functions which should work on a stable snapshot of the log events.
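A sketch of what such helpers could look like (names are mine); note that they take the dereferenced log as an argument rather than reading @log themselves:

  ;; All events for an id, in chronological order.
  (defn events-for [log-snapshot id]
    (->> log-snapshot
         (filter #(= id (:id %)))
         (sort-by :timestamp)))

  ;; Latency (in nanoseconds) between the first and last event for an id.
  (defn latency-ns [log-snapshot id]
    (let [evs (events-for log-snapshot id)]
      (when (seq evs)
        (- (:timestamp (last evs)) (:timestamp (first evs))))))

  (latency-ns @log "req-1")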

Conclusion

The flat log file, with simple events described as Clojure maps, worked out really well, and it's certainly more idiomatic. The log is easier to serialise, we only need parts of it in memory for meaningful analysis, and we can treat it as a continuous stream. The analysis functions that we write can leverage the power of Clojure's library directly and compose beautifully. Also, we are not forcing any structure (or schema) on the log events (beyond what the analysis functions expect), which makes it very flexible and future proof. The only price we are paying is additional (redundant) metadata that could be more cheaply expressed in a tree data structure.

Here's a complete listing of an event logger, using metadata, with statistical reporting.

More reading