Tuesday, June 2, 2009

Exploring a Hierarchy in F# with Immutability

Recently a task was given to me where I had to extract a new code file from an existing object. The requirements were to extract only the basics of a type, it's public properties on classes and public fields on enumerations. Here's the catch, however: the new code should not reference any of the existing types. What this means, is that my code is going to have to recurse through my custom types and generate those as well. This created a situation where I had to explore the entire hierarchy of types associated with one specific type, and generate a class file for each of them.

To make matters worse, there are cycles in the object model. A parent class may have a List of child class instances, and each child may hold a reference to the parent via a Parent property. Exploring the hierarchy and simply adding each and every property, then exploring each particular type as they come along would most certainly cause a stack overflow as I go back and forth between the parent and child types. So I needed the list generated to be filled with only unique values, and the output of the method should have only the unique values within the entire hierarchy.

My next challenge (which, I admit, was self-imposed) was to do this whole hierarchical search in a completely immutable way. Though the OO developer in me really wanted to simply add the types to a generic list of types, and use the handy Contains method to determine if I've already added a specific type to the list as I recurse the hierarchy, this, I felt, was not a purely "functional" solution to the problem. In the end, I came up with this generic method to explore a hierarchy and return the distinct values from within the hierarchy.


let rec ExploreHierarchy (x:'a) (f:'a -> seq<'a>) (list:list<'a>) =
let newlist = x::list
let notExists x = List.exists ((=) x) newlist |> not
let children = f x |> Seq.filter notExists |> Seq.distinct
let rec innerloop (possible:seq<'a>) (completed:list<'a>) =
match possible |> Seq.to_list with
| h::t -> if List.exists ((=) h) completed
then innerloop t completed
else ExploreHierarchy h f completed |> innerloop t
| [] -> completed
innerloop children newlist


The first argument passed into the hierarchy is the starting point to explore the hierarchy. The second argument is a function that, taking an instance of the same type as the first argument, will return a sequence of the same type. The last argument is an F# list containing the existing items so far. When this function is first called, you'll want to pass in an empty list ([]).

Inside the function, the first thing to happen is that a new list, containing x combined with the list passed in is created.

Next, a function is created to determine if an item exists or not.

Next, a distinct list of children that don't already exist in the list passed in is created. This list of children is going to be used to recursively call the innerloop function, which itself will call the ExploreHierarchy function, so it's important that the items on this list haven't already been explorered.

The innerloop function takes two lists, a possible list, and a completed list. The idea here is to populate the completed list, while removing items from the possible sequence one at a time until there are none left, at which point, the completed list is returned.

Lastly, the innerloop function is called, passing in the children, as well as the newlist.

A simple exploration might look like this. First, I have to define a function to get the children of a particular node:


let getProperties (x:Type) = 
x.GetProperties() |> Seq.map (fun x -> x.PropertyType)


Next, I could display all the types referenced by calling something like this:


ExploreHierarchy typeof<System.Windows.Forms.Form> getProperties [] |> List.iter (printfn "%A")  


Of course, my final implementation for my purposes was much more complex, involving some CodeDom and a much more sophisticated getProperties method, but I think you can see the general gist of it.

As you can see, F# is ideal for a situation like this. I've managed to create a completely generic function that explores a hierarchy and returns distinct children from any hierarchy. I think the big advantage here is the ability to treat a function as a first-class citizen. It makes the code much more succinct.