Sunday, September 18, 2011

Diskette

I was emptying out a bunch of junk from my apartment and came across an old 3.5" floppy disk. My mind - being as it is - immediately wondered what the unit tests would look like if I were writing a virtual floppy disk (because nobody's got real floppy drives these days.)

On my Diskette class, I would expose the following methods to modify/access state:
  • bool ToggleDiskAccessSlot()
  • bool ToggleWriteProtect()
  • void RotateDisk(int numSectors)
  • int Read(int cylinder, int side)
  • void Write(int cylinder, int side, int value)

Using Roy Osherove's naming strategy (a la accepted answer to this question), I'd add a unit test class named DisketteTests. Unit tests - IMHO - are intended to ensure that public methods modify the internal state of an object as expected. A test first sets up the object under test to a known state, then invokes a method, and finally makes an assertion about the state.

I might want to test the ability to read or write while the disk access slot is closed (I'd expect some exception to be thrown, so there's no explicit calls to Assert in the method body, the assertion is made by the test framework based on the test's custom attributes):
[ExpectedException(typeof(InvalidOperationException))]
[TestMethod]
public void Read_DiskAccessSlotIsClosed_ThrowsException()
{
Diskette d = new Diskette();
// intentionally missing the step to ToggleDiskAccessSlot();
Ignore(d.Read(1, 1));
}

A more standard test might look like this:
[TestMethod]
public void Read_DiskAccessSlotIsOpen_GetsCurrentValue()
{
Diskette d = new Diskette();
if (!d.ToggleDiskAccessSlot())
{
Assert.Fail("Unable to open disk access slot");
}
int value = d.Read(1, 1);
Assert.AreEqual(value, 0);
}

Saturday, September 10, 2011

Memoize

The first time I saw this word I thought someone had mispelled it. Instead - as I later found out - it's the name for a common pattern in functional programming. Memoize caches the result of a function based on its input parameters. That may not sound particularly special, but in cases where a solution is recursive and frequently calls itself with the same parameters, memoize can turn an algorithm that takes billions of years to run[1], into one that takes a fraction of a second.

Consider this problem outlined on ROT13(Cebwrpg Rhyre). Every node in the triangle has an associated maximum cost. For the elements at the base level of the triangle, that value is the value of the element. For the element at the next highest level, the maximum cost is the value of the element itself, plus the greater value of the two elements below it. This property holds recursively throughout the entire graph, and we can use the memoize pattern to our advantage: instead of working out the cost of each element every time our algorithm requires it, we simply fetch it from the cache (if it doesn't exist, we add it to the cache, and use it on all subsequent calls.)

The original brute force C# code looked like this:
static int MaxCost(Triangle triangle, int x, int y)
{
if (y == triangle.Depth)
{
return 0;
}
else
{
int left = MaxCost(triangle, x, y + 1);
int right = MaxCost(triangle, x + 1, y + 1);
return Math.Max(left, right) + triangle[x, y];
}
}

We need to rewrite the method body slightly because it's self-recursive, and we can't take advantage of the memoize cache if our function just calls back into itself directly.

If we're pressed for time, we might simply code the cache into the method body:
static int MaxCost(Triangle triangle, int x, int y)
{
var key = new Tuple<Triangle,int,int>(triangle, x, y);
if (cache.ContainsKey(key))
{
return cache[key];
}

if (y == triangle.Depth)
{
return 0;
}
else
{
int left = MaxCost(triangle, x, y + 1);
int right = MaxCost(triangle, x + 1, y + 1);
int result = Math.Max(left, right) + triangle[x, y];
cache.Add(key, result);
return result;
}
}

With more time on our hands, we might separate the function into two distinct logic functions, one that deals with memoization, and the other that deals with the problem at hand:
static int MemoizedMaxCost(Triangle triangle, int y, int x)
{
var key = new Tuple<Triangle, int, int>(triangle, x, y);
if (cache.ContainsKey(key))
{
cache[key];
}
int result = InternalMaxCost(triangle, y, x);
cache.Add(key, result);
return result;
}

static int InternalMaxCost(Triangle triangle, int x, int y)
{
if (y == triangle.Depth)
{
return 0;
}
else
{
int left = MemoizedMaxCost(triangle, x, y + 1);
int right = MemoizedMaxCost(triangle, x + 1, y + 1);
return Math.Max(left, right) + triangle[x, y];
}
}

Note that we now have a pair of mutually recursive functions: the memoized function calls the internal one to evaluate any uncached values, and the internal one calls the memoized one to gain any of the performance benefits from memoizing. A match made in heaven?

Monday, September 05, 2011

Covariance / Contravariance

This is a little experiment to get your head around co- and contravariance, as applied to generic interfaces in C#. Covariance allows the result of a computation to vary, while Contravariance allows the input to vary. There are three interfaces in the code sample. The first demonstrates covariance by emulating a function that only returns a result. The second demonstrates contravariance by emulating an action that only takes an input. The third demonstrates both co- and contravariance. Let's look at the code:
#region Covariance
interface IFunction<out TResult>
{
TResult Invoke();
}

class Function<TResult> : IFunction<TResult>
{
public TResult Invoke()
{
throw new NotImplementedException();
}
}
#endregion

#region Contravariance
interface IAction<in T>
{
void Invoke(T arg);
}

class Action<T> : IAction<T>
{
public void Invoke(T arg)
{
throw new NotImplementedException();
}
}
#endregion

#region Both
interface IFunction<in T, out TResult>
{
TResult Invoke(T arg);
}

class Function<T, TResult> : IFunction<T, TResult>
{
public TResult Invoke(T arg)
{
throw new NotImplementedException();
}
}
#endregion

None of the methods are implemented, because a) I'm lazy and b) you don't need to see an implementation to witness the co- or contravariance first hand. The next code block shows the phenomena at work:
IFunction<object> f1 = new Function<string>();
IAction<string> a1 = new Action<object>();
IFunction<string, object> f2 = new Function<object, string>();

Notes:

  1. In the case of covariance, the generic argument type of the class is more specific than the interface (covariance: we can always implicitly cast from string to object when reading from the container).
  2. In the case of contravariance, the generic argument type of the interface is more specific than the class (contravariance: we can always implicitly cast from string to object when writing to the container).
  3. In the mixed case, we can see covariance for the output and contravariance for the input.