Tuesday, December 08, 2009

Think Big

I was playing around with my new installation of Windows 7 today, frustrated that I couldn't install SQL Server 2008 Management Studio Express (not yet supported), but determined to have fun. I was also playing around with 64-bit Windows Server 2008 running SQL Server 2008 on a machine with 8GB of RAM. I thought it would be fun to create a table with 2^32 rows, and I was right. If you run SELECT COUNT(*) FROM #Big on a table of this size, you're shown an error message "Arithmetic overflow error converting expression to data type int." No fear though, just use the COUNT_BIG() function instead of COUNT(). Bingo: 4294967296!
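The overflow happens because COUNT() returns SQL Server's int, a signed 32-bit integer with a maximum of 2,147,483,647, while COUNT_BIG() returns bigint, a signed 64-bit integer. The arithmetic is easy to check with a small Python sketch. fits_signed is just an illustrative helper and has nothing to do with SQL Server itself:

```python
# Signed limits for the widths behind SQL Server's int and bigint types.
INT_MAX = 2**31 - 1      # int: 32-bit signed
BIGINT_MAX = 2**63 - 1   # bigint: 64-bit signed

def fits_signed(value, bits):
    """True if value is representable as a signed two's-complement integer of `bits` bits."""
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1

row_count = 2**32  # rows in #Big

print(row_count)                   # 4294967296
print(fits_signed(row_count, 32))  # False: too big for COUNT(*)'s int
print(fits_signed(row_count, 64))  # True: fine for COUNT_BIG(*)'s bigint
```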

Tuesday, November 17, 2009

Realistic Scalability

Everybody seems to focus on adding processors and memory (or complete nodes) when they talk about scalability, but little is said about adding new people to manage the systems. True scalability should, in my opinion, include factors like the cost of human labour. You've designed a new system: great! It runs with only 20 errors a day on 1000 engines: super! It's linearly scalable, so your boss buys another 1000 engines: unfortunately, this means an extra 20 errors per day, which could mean another person needs to be added to the support team!

Tuesday, October 06, 2009

Escribir en español (Writing in Spanish)

I'm using Ubuntu 8.10 and learning to speak Spanish, so I wanted to know how to type the accented characters and the inverted exclamation and question marks. As it turns out, it's pretty easy, even with a generic UK keyboard. Follow these easy steps to do it yourself:

System > Preferences > Keyboard

On the Layouts tab, click the "Other Options..." button, then expand the "Compose key position" element. Select "Right Alt is Compose." Close the Keyboard Layout Options dialog and focus the cursor on the "Type to test settings" box.

For á, é, í, ó and ú:
Press and release Right-Alt, then press and release @', then press and release the vowel key.

For the upper case Á, É, Í, Ó and Ú:
Press and release Right-Alt, then press and release @', then hold shift while pressing and releasing the vowel key.

For ñ:
Press and release Right-Alt, then press and release ~#, then press and release n.

For the upper case Ñ:
Press and release Right-Alt, then press and release ~#, then hold shift while pressing and releasing N.

For ¿:
Press and release Right-Alt, then hold shift while pressing and releasing the ?/ key twice.

For ¡:
Press and release Right-Alt, then hold shift while pressing and releasing the !1 key twice.
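Each compose sequence produces a single precomposed Unicode character. The Python sketch below is only an illustration and is not how the X compose table is implemented. It maps a few of the sequences above to their results, prints each character's Unicode name, and uses unicodedata to show that a base letter followed by a combining accent normalises (NFC) to the same character:

```python
import unicodedata

# A few of the compose sequences from the steps above (typed after Right-Alt).
COMPOSE = {
    ("'", "a"): "á",
    ("'", "E"): "É",
    ("~", "n"): "ñ",
    ("~", "N"): "Ñ",
    ("?", "?"): "¿",
    ("!", "!"): "¡",
}

# Combining marks matching the accent-style sequences.
COMBINING = {"'": "\u0301", "~": "\u0303"}  # combining acute accent, combining tilde

for (first, second), result in COMPOSE.items():
    print(first, second, "->", result, unicodedata.name(result))
    if first in COMBINING:
        # letter + combining mark normalises to the single precomposed character
        assert unicodedata.normalize("NFC", second + COMBINING[first]) == result
```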

Tuesday, June 30, 2009

IIS and Process Orphaning

IIS has some very neat features. Consider process orphaning: your application has become unresponsive... IIS can take it out of the application pool so that it receives no further requests, and can automatically attach a debugger or write a full memory dump for you to examine later. And while all this is happening, it will have brought up a new worker to seamlessly service all other inbound requests. All in the name of reliability.

Thursday, May 28, 2009

Isolation / Immutability / Synchronization

If there's any one recipe for software runtime scalability, it's this: (IIS)^2. It's no secret I'm a fan of Internet Information Services, but in today's post it plays the role of host to our application, and my focus instead is on the underlying application's design.

Isolation: prevent state from being shared, allowing us to read and write to our hearts' content. Shared Nothing (SN) architectures are the ultimate realization of isolation. Defining the isolation boundaries within an application might become contentious, and someone will need to be responsible for owning each isolated domain.

Immutability: prevent state from being modified, and we can have as many readers as we want. In reality though, immutability by code contract is not as straightforward as simply not providing setter methods. In distributed environments especially, objects must be (de)serialized, and serializers such as XmlSerializer require a default constructor and public get/set accessors. We could choose to pass restrictive interface references instead, but converting a struct to an interface reference forces it to be boxed (and unboxed on the way back). Possibly, custom attributes applied to state-modifying methods/properties could be inspected statically by a tool like FxCop to prevent accidental object mutation. Still, objects do need to be mutated at times to be useful.
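Python has a rough analogue of "immutability by contract": a frozen dataclass rejects attribute assignment at runtime, and "modifying" it means building a copy. This is a sketch of the idea only, not a .NET technique, and Price is a made-up type:

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class Price:
    """Hypothetical immutable value: safe to hand to any number of readers."""
    symbol: str
    amount: float

quote = Price("ABC", 10.0)

try:
    quote.amount = 11.0  # attempted mutation is rejected
except FrozenInstanceError:
    print("mutation rejected")

# "Changing" an immutable object means creating a new one; the original is untouched.
updated = replace(quote, amount=11.0)
print(quote.amount, updated.amount)  # 10.0 11.0
```

The cost is an allocation per change, which is the trade-off that makes immutability cheap to share but awkward for objects that genuinely need to evolve.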

Synchronization: mark critical sections where locks must be acquired before reading or writing. This is the typical style of programming that I've seen in Java and C#, and it doesn't easily lend itself to distributed programming. Also, threads waiting on a lock often tie up valuable resources while doing nothing useful, so this approach tends not to scale as well.
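A minimal Python sketch of the lock-based style: a shared counter whose read-modify-write is a critical section guarded by threading.Lock. Every thread queues for the same lock, which is exactly the waiting described above. The class and function names are illustrative:

```python
import threading

class SharedCounter:
    """Shared mutable state protected by a lock (the synchronization approach)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:  # critical section: one thread at a time
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value

def hammer(counter, times):
    for _ in range(times):
        counter.increment()

counter = SharedCounter()
threads = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 40000
```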

The best solution to a large distributed computing problem will undoubtedly be a combination of the three approaches above. Isolated tasks are likely candidates for physical distribution, as they will scale horizontally. Immutability is hard to enforce completely, and synchronization implies waiting for other tasks. For these reasons, I give top priority to isolation, and let the other two duke it out as I see fit on the day.

Saturday, May 23, 2009

Without Intellisense, I'm Nothing

Imagine you're given a task: write a program in a hitherto unknown DSL, embedding that code within a hitherto unknown markup language. What would be your tool of choice? Without any easily understood reference documentation, it's not going to be easy. Without visual cues telling you that you're missing a quote (or that your string contains an extra one), or that your function call doesn't exist or has the wrong arguments, you're going to have to run the program just to debug it. If running it involves multiple clicks and some typing followed by a short wait, you're going to get frustrated and probably won't deliver the program on time. Intellisense is great. You can write JavaScript inside an HTML page, and Visual Studio will squiggle under everything you've done wrong!

Sunday, May 10, 2009

Hello Ruby

def reverse(value)
  if value.length > 1 then
    value[value.length-1] + reverse(value[0,value.length-1])
  else
    value
  end
end
The string formatting functions of Ruby caught my interest, but I haven't checked if any equivalents exist in Python.
irb> "#{reverse 'Hello, World!'}"
=> "!dlroW ,olleH"

Saturday, April 18, 2009

Synchronisation... is... slooooow...

a.k.a. an argument for Shared Nothing. Continuing my !random post, I decided to see how long it would take to build up a cumulative normal distribution using the values generated by System.Random.Next(). I timed the following configurations:
1) Four threads on four cores sharing a Random (with thread synchronisation): 180 seconds.
2) One thread on one core with its own Random: 63 seconds.
3) Four threads on four cores, each with their own Random (sharing nothing): 17 seconds.
The lesson here is that synchronisation is slow: so slow in this case that it was actually faster to single-thread the application. However, when each thread was given its own object that it didn't have to share with the others, the speed increase was dramatic (more than 3.5x faster than the single thread, and over 10x faster than the shared Random). Not only was it massively faster, but I'd put my money on it scaling pretty well on an 8-, 16-, 32- or even 64-way server.

Out of curiosity, I also tried this configuration:
4) Four threads on four cores, each with their own Random, locking it unnecessarily: 39 seconds.
I found it interesting that even uncontended locks could more than halve the speed of my algorithm (39 seconds versus 17).

The moral of the story is to think about your locking strategy (hint: avoid putting yourself in the position where you need one) when looking to parallelize tasks.


System.Random's Next() method isn't guaranteed to be thread-safe, so if you start calling it from multiple threads at the same time, you're likely to end up with very little entropy indeed! I cooked up a static class with a static System.Random field, and set 4 threads in motion calling Next() continuously. Each thread got its own core to run on, and very soon the only "random" number returned from Next() was 0: the object had been well and truly corrupted. At this point I needed to choose: a single System.Random protected by lock() statements, or multiple System.Random objects. If I chose the single route (why?), all the synchronization would slow me down, and I'd end up not using each core to its fullest potential. If I chose the multiple route (only as many System.Random objects as there were threads), I would need to seed each one with a different value; otherwise, if created at the same time, they could return the same series across more than one System.Random object.

Interestingly, if I called the overloaded version of Next(int min, int max), soon the only return values would be min.
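The "multiple route" can be sketched in Python: each thread owns a private random.Random seeded differently, so no lock is needed and no two generators share a series. This is an analogue rather than the .NET code. CPython's global interpreter lock means it will not reproduce either the corruption or the timings above, and the base_seed + index seeding scheme is a hypothetical choice:

```python
import random
import threading

def shared_nothing_samples(num_threads, samples_per_thread, base_seed=2009):
    """Each thread owns a private random.Random, so no lock is needed.

    Seeds are base_seed + thread index (a hypothetical scheme), so generators
    created at the same moment still produce different sequences.
    """
    results = [None] * num_threads

    def worker(index):
        rng = random.Random(base_seed + index)  # never shared with other threads
        results[index] = [rng.randint(1, 100) for _ in range(samples_per_thread)]

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

samples = shared_nothing_samples(num_threads=4, samples_per_thread=1000)
print([s[:3] for s in samples])  # first few values from each thread's generator
```

Because each generator's output depends only on its own seed, the results are also reproducible from run to run, which a single shared generator interleaved across threads cannot offer.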

Thursday, March 26, 2009

Buffer vs. Array

To clear up any misconceptions: System.Array.Copy is not the same as System.Buffer.BlockCopy unless you're operating on arrays of System.Byte. Array.Copy counts elements, whereas the Buffer class copies bytes from one array to another. If you have an array of System.Int32 (4 bytes each) and you copy 3 bytes (not items) from your source to your destination array, you will get just the first 3 bytes of your 4-byte System.Int32. Depending on the endianness of your system, this could give you different results. Also, the System.Buffer class only works on arrays of primitive types. Not C# built-in types in general (which include string), and not even derivatives of System.ValueType (e.g. enums and your own structs). Clearly it can't work on reference types safely (imagine copying just 3 of the 4 bytes of one object reference into another), but I would have expected it to work with enumerations (essentially ints).
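The byte-versus-item distinction and the endianness effect can be simulated in Python with the struct module. block_copy_int32 is a hypothetical helper that only mimics a byte-level copy in the spirit of Buffer.BlockCopy; it is not the .NET API:

```python
import struct

def block_copy_int32(source, byte_count, byteorder="<"):
    """Copy the first byte_count *bytes* of an int32 array into a zeroed array
    of the same length, then reinterpret the result as int32 values.

    byteorder is a struct prefix: "<" little-endian, ">" big-endian.
    """
    fmt = f"{byteorder}{len(source)}i"
    raw = struct.pack(fmt, *source)       # the array's underlying bytes
    dest = bytearray(len(raw))            # destination starts as all zeros
    dest[:byte_count] = raw[:byte_count]  # byte-wise copy, ignoring item boundaries
    return list(struct.unpack(fmt, bytes(dest)))

print(hex(block_copy_int32([0x11223344], 3, "<")[0]))  # 0x223344 (little-endian)
print(hex(block_copy_int32([0x11223344], 3, ">")[0]))  # 0x11223300 (big-endian)
print(block_copy_int32([1, 2, 3], 4))                  # [1, 0, 0]: 4 bytes is one item
```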

Monday, March 23, 2009

Hello Python

>>> def reverse(value):
...     if len(value) > 1:
...         return value[len(value) - 1] + reverse(value[:len(value) - 1])
...     return value[0]
...
>>> print(reverse('!dlroW ,olleH'))
Hello, World!

System.OutOfMemoryException Part 4

And then there were 2. I'm talking about code paths that lead to an OutOfMemoryException being thrown by the CLR.

#1 is the standard "You've run out of contiguous virtual memory, sonny boy!", which is pretty easy to hit in a 32-bit process. With the advent of 64-bit operating systems, you get a little more headroom. Actually, a lot more headroom: precisely 2^32 times as much address space as you had to start with! But, due to limitations built into the CLR, you're no further from an OutOfMemoryException...

#2 is when you try to allocate more than 2GiB in a single object. In the following example, it's the +1 that pushes you over the edge:
new System.Byte[System.Int32.MaxValue + 1];
If you're using a different ValueType your mileage may vary:
new System.Int32[(System.Int32.MaxValue / sizeof(System.Int32)) + 1];
Or if you're on a 64-bit system:
new System.IntPtr[(System.Int32.MaxValue / sizeof(System.Int64)) + 1];
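The element counts behind these examples are simple division. The Python sketch below takes the post's per-object limit as Int32.MaxValue bytes and ignores the array's own header overhead, so the real CLR ceiling is slightly lower. max_elements is an illustrative helper:

```python
INT32_MAX = 2**31 - 1  # System.Int32.MaxValue

def max_elements(element_size_bytes):
    """Largest element count whose total size stays within Int32.MaxValue bytes
    (the post's per-object limit; array header overhead is ignored)."""
    return INT32_MAX // element_size_bytes

for name, size in [("System.Byte", 1), ("System.Int32", 4), ("System.Int64 / 64-bit IntPtr", 8)]:
    print(f"{name}: {max_elements(size):,} elements")

# 64-bit vs 32-bit address space, as quoted in #1
print(2**64 // 2**32 == 2**32)  # True
```

Adding 1 to any of these counts, as the C# examples do, pushes the requested size past the limit.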

Many thanks to this blog for pointing me in the right direction here.

Hopscotch in the 64-bit Minefield

So it's no secret I've been playing with virtualization, Windows, Linux, IA32 and amd64. Virtualization looks the part, but so does a 24-bit color depth 1280 x 1024 bitmap of a fake desktop. You can't do much with either.

Microsoft has given us WOW64, allowing us to run old x86 (IA32) software on new 64-bit operating systems. It's so seamless you forget you've got it: until you try and install ISA Server 2006.

Getting Windows Hyper-V Server up and running on a headless box... well, let's just say I'm not that patient. I eventually settled for the Hyper-V role on a full-fat Windows Server 2008 Enterprise Edition install, with a crash-carted KVM. It doesn't appear to run on anything except the 64-bit version, either. Even then, loads of caveats await:
gigabit network card? you'll have to use a ridiculously slow emulated "switch" instead.
multiple cpus? not!
scsi drives? think again.
usb? umm... for some reason nobody in the world has ever thought of plugging a usb device into a server. i was the first. i'm so proud!
Granted, all these restrictions are imposed on the non-MS (see ubuntu) and older guest operating systems like Windows XP, but isn't half the pitch behind virtualization about getting rid of old physical kit and consolidating servers?

Flex Builder 3? Oh yes. But not for Linux. No wait... there's a free alpha version but it's not compatible with 64-bit.

VMware ESXi? Check your hardware list first.

Saturday, March 21, 2009

Debug Enable!

All wireless functions on my O2 wireless box II appear to have died overnight. This is a problem for me because I use wireless all the time: to stream music to my Apple TV, to surf the Internet from my laptop, and to play games on my iPod Touch. The wired Ethernet is still happily routing packets between my LAN and the Internet. So I climbed up into my cupboard and pulled down the old Netgear DG834G - what a beautiful beast. It runs embedded Linux with BusyBox, and you can connect to it using any telnet client. Just navigate to http://<netgear-router>/setup.cgi?todo=debug from your browser first, and you're A for Away: you can then telnet into it. Reboot the router when you're finished to disable the telnet server until next time.

Things that frustrated me:
There was no way to change the default gateway address handed out by either router. I suspect I could have done so with the Netgear box in time, but there is no text editor in the distribution.

Getting my network back up and running would be difficult without the usual trance music coming through the Apple TV to keep me focused. Still, I needed to:

  1. put both routers onto the same subnet
  2. keep DHCP running on the O2 box (so that the default gateway would remain the O2 box address)
  3. turn off DHCP on the Netgear box (otherwise it would give out its own IP address as default gateway)
  4. turn off the wireless interface on the O2 box for good

In a nutshell:
o2wirelessbox is now with DHCP handing out addresses in the range 64-253
netgear is now with disabled DHCP (I would have liked to use the range 2-63)

Wednesday, March 11, 2009

Gigabit Ethernet

... is actually quite fast. So fast, in fact, that the bottleneck on both PCs under my desk is currently the PCI bus into which the network cards are plugged. I had no idea that the bus ran at 33MHz and was only able to transfer 32 bits per cycle (math says: 33M * 32b = 1056Mb/s). Todo: is this duplex?

There's a very useful FAQ regarding Gigabit Ethernet available here.

In a test with 4 network cards (2 in each machine) I saw the following:

w) 1A -> 2C (93Mb/s)
x) 1A -> 2D (902Mb/s)
y) 1B -> 2C (93Mb/s)
z) 1B -> 2D (294Mb/s)

1A: Gigabit embedded on motherboard (1000 link at switch)
1B: Gigabit on PCI bus (1000 link at switch)
2C: Fast embedded on motherboard ( 100 link at switch)
2D: Gigabit on PCI bus (1000 link at switch)

Machine 1 was sending large UDP datagrams. Machine 2 was not listening; it was just up so that ARP could resolve the MAC addresses of its adapters (without which we could not send a datagram).

tests w + y appeared to be throttled by the router as it managed a mixed 100/1000 route
test x was great and showed that an onboard gigabit controller can actually do its job
test z showed the PCI bus being the bottleneck, running about 3x faster than Fast Ethernet, but about 3x slower than Gigabit.

Saturday, March 07, 2009

Extellect Utilities

I've finally put a set of C# productivity classes on Google Code under the Apache 2.0 License. So, check 'em out, see if they make your work any easier, and let me know what you think.

Remoting, bindTo and IPAddress.Any

Just solved a fun problem. While running a .NET remoting server on my ubuntu box with multiple NICs, I saw some strange behavior where the server would return an alternate IP address, and the client would attempt to reconnect (using a new Socket) to that address. Problem being: the server was responding with its localhost address, which the client was resolving to its own loopback adapter. See the problem yet?

It turns out that in the absence of a specific binding, the server binds to IPAddress.Any. When a client attempts to connect, it's redirected to the server's bound address. Unless client and server are hosted on the same physical machine, there's really no point in ever using the loopback adaptor... which makes it a strange choice for default.

The solution:
Before you create and register your TcpServerChannel, you need to set some options.
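using System.Collections;
using System.Runtime.Remoting;
using System.Runtime.Remoting.Channels;
using System.Runtime.Remoting.Channels.Tcp;
IDictionary properties = new Hashtable();
properties["bindTo"] = "dotted.quad.ip.address"; // listen on this specific address instead of IPAddress.Any
properties["port"] = port;
IChannel serverChannel = new TcpServerChannel(properties, new BinaryServerFormatterSinkProvider());
ChannelServices.RegisterChannel(serverChannel, false); // register the configured channel
RemotingConfiguration.RegisterWellKnownServiceType(typeof(Explorer), "Explorer.rem", WellKnownObjectMode.SingleCall);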

Voila! 'appiness should ensue...
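The same distinction exists at the raw socket level. In this Python sketch (plain sockets, not .NET remoting), a server socket bound to the wildcard address reports 0.0.0.0, the equivalent of IPAddress.Any, while one bound to a specific address reports that address. A wildcard bind does not, by itself, name a single address that a client could be told to reconnect to. 127.0.0.1 stands in for the dotted quad here, and bound_address is a hypothetical helper:

```python
import socket

def bound_address(host):
    """Bind a TCP server socket to host on an OS-chosen port and report the bound address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))  # port 0: let the OS pick a free port
        return server.getsockname()[0]

print(bound_address("0.0.0.0"))    # wildcard, like IPAddress.Any
print(bound_address("127.0.0.1"))  # a specific address, like setting "bindTo"
```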

PS. If for some reason the dotted quad doesn't appeal to your particular situation (e.g. load balancing), you can set two other properties instead:
properties["machineName"] = "load.balanced.server.name";
properties["useIpAddress"] = false;

PPS. I think the client always makes two socket connections, A and B. A is used at the start to do some initialization and get the address for connection B. B is used for the meat and bones of the operation, and finally A is used just before they're both closed.

Friday, March 06, 2009

Goodput - Season Finale

I thought I'd take a look at a couple of different (and by no means an exhaustive list of) options for transferring a reasonably large file across a network. Over the past couple of days I tried sending a 700MB DivX using the .NET Remoting API (over both TCP and HTTP), the .NET Sockets API (over TCP), and finally using a mounted network share, reading the file as if it was local.

The table that follows shows the results of these tests:

I'd advise taking these numbers with a pinch of salt.

In general, remoting doesn't support the state-machine classes that the C# compiler generates when it compiles an iterator block (i.e. a method using yield return). This was quickly remedied by exposing the IEnumerator<T> as a remote MarshalByRefObject itself, wrapping the call to the iterator block. This gave us a nice-looking (easy to read) interface, but will have increased the chattiness of the application, as every call to MoveNext() and Current would have required a network round trip. Further to this, the default SOAP serialization used with HTTP remoting doesn't support generic classes, so I had to write a non-generic version of my Streamable<T> class.

The performance of the HTTP/SOAP remoting was abysmal and there was very little gain by switching to a faster network. Even with what I suspect to be a massively chatty protocol (mine, not theirs), the bottleneck was probably somewhere else.

TCP remoting was next up. Under the covers it will have done all the marshalling/unmarshalling on a single TCP socket, but the chatty protocol (e.g. Current, MoveNext(), Current, MoveNext() etc.) probably let it down. TCP/Binary remoting's performance jumped 2.5x when given a 10x faster network, indicating some other bottleneck as it still used just 16% of the advertised available bandwidth.

CIFS was pretty quick, but not as quick as the System.Net.Sockets approach. Both used around 30% of the bandwidth on the Gigabit tests, indicating that some kind of parallelism might increase the utilization of the network link. An inverse-multiplexer could distribute the chunks evenly (round-robin) over 3 sockets sharing the same ethernet link, and a de-inverse-multiplexer (try saying that 10 times faster, after a couple of beers) could put them together.
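The inverse-multiplexer idea can be sketched in Python. Plain lists stand in for the 3 sockets, and each chunk carries a sequence number so the receiving side can put the chunks back in order whichever link they arrive on. inverse_multiplex and demultiplex are hypothetical names, and a real version would also need framing on each socket:

```python
from itertools import cycle

def inverse_multiplex(data, chunk_size, channel_count):
    """Split data into numbered chunks and deal them round-robin across channels."""
    channels = [[] for _ in range(channel_count)]
    chunks = (data[i:i + chunk_size] for i in range(0, len(data), chunk_size))
    for (seq, chunk), channel in zip(enumerate(chunks), cycle(channels)):
        channel.append((seq, chunk))  # sequence number lets the receiver reorder
    return channels

def demultiplex(channels):
    """Reassemble chunks from all channels in sequence-number order."""
    ordered = sorted((item for channel in channels for item in channel), key=lambda item: item[0])
    return b"".join(chunk for _, chunk in ordered)

data = bytes(range(256)) * 40  # 10,240 bytes of sample payload
channels = inverse_multiplex(data, chunk_size=1000, channel_count=3)
print([len(c) for c in channels])     # [4, 4, 3] chunks per channel
print(demultiplex(channels) == data)  # True
```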

Back on track...

Seeing as TCP/Binary remoting was the problem area that drove me to research this problem, I thought I'd spend a little more time trying to optimise it - without changing the algorithm/protocol/interface - by parameterizing the block size. The bigger the block size, the fewer times the network calls MoveNext() and get_Current have to be made, but the trade-off is that we have to deal with successively larger blocks of data.

What the numbers say: transmission rate is a bit of an upside down smile; at very low block sizes the algorithm is too chatty, at 4M it's at its peak, and beyond that something else becomes the bottleneck. At the 4M peak, the remote iteration invocations would only have been called 175 times, and the data transfer rate was 263Mb/s (roughly 89% of the observed CIFS' 296Mb/s).
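The 175 figure falls out of simple division. A Python sketch, assuming "700MB" means 700 MiB and one MoveNext()/Current pair per block (remote_iterations is a hypothetical helper):

```python
MiB = 2**20

def remote_iterations(total_bytes, block_size):
    """Blocks needed to move total_bytes, i.e. how many times MoveNext()/Current run remotely."""
    return (total_bytes + block_size - 1) // block_size  # ceiling division

file_size = 700 * MiB  # assumption: the 700MB DivX is 700 MiB

for block_size in (64 * 1024, MiB, 4 * MiB, 16 * MiB):
    n = remote_iterations(file_size, block_size)
    print(f"{block_size // 1024:>6} KiB blocks: {n:>6} iterations, {2 * n:>6} remote calls")

# The 4M result relative to CIFS, using the measured rates in Mb/s
print(f"{263 / 296:.0%}")  # 89%
```

Going from 64 KiB to 4 MiB blocks cuts the round trips from 11,200 to 175. Beyond that point the call count is already small, which fits the observation that something else becomes the bottleneck.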

Thursday, March 05, 2009

Full Duplex

Simple English: if two computers are connected by a full-duplex Ethernet link, then they should be able to carry on two conversations with each other simultaneously. For example, imagine two computers named A and B with a 100Mb/s full-duplex connection linking them. A starts "talking" at 100Mb/s and B "listens". B also starts "talking" at 100Mb/s and A "listens". The total data moving up and down the link is 200Mb/s. That's full duplex, baby!

Only, in real life you don't get the full 100Mb/s in either direction. On my PC, I managed to get 91Mb/s in one direction and 61Mb/s in the other direction. If I stopped the 91Mb/s conversation (call it X), the 61Mb/s conversation (call it Y) would quickly use up the extra bandwidth, becoming a 91Mb/s conversation itself. As soon as I restarted X, it reclaimed its original 91Mb/s, and Y returned to its original 61Mb/s. Freaky.

Goodput - Part 2

So then I thought to myself, "Hey, you have two NICs in each machine. Why don't you try and get double the throughput?" Even though all my NICs are gigabit ethernet, my modem/router/switch is only capable of 10/100 (a gigabit switch is in the post, as I type). Yesterday's tests indicated that I was getting roughly 89Mb/s, so I'd be aiming for 178Mb/s with my current hardware setup. And a glorious (hypothetical) 1.78Gb/s when the parcel arrives from amazon.co.uk.

What would have to change? For starters, the server was binding one socket to System.Net.IPAddress.Any; we'd have to create two sockets and bind each one to its own IP address. Easy enough. The client would also have to connect to each of the two new server IP addresses.

Wait a minute... there isn't any System.Net.Sockets option on the client side to specify which ethernet adapter to use. You only specify the remote IP address. Oh no! This means we could end up sending/receiving all the data through just one of the client's NICs. Luckily, you can modify the routing table so that all traffic to a particular subnet can be routed via a specific interface. I'm using ubuntu as the client, and my routing table looks like this, which indicates that eth0 would get all the traffic to my LAN:

Destination     Gateway  Genmask  Flags  Metric  Ref  Use  Iface
                *                 U      1       0    0    eth0
                *                 U      1       0    0    eth1

I want to add a static route, with higher precedence than the LAN subnet, using eth1 for all communication with the remote IP, leaving eth0 with the traffic for the rest of the subnet. I type this command at the console:

sudo route add -net netmask dev eth1

Disaster averted. The routing table now looks like this, and I'm happy to say my diagnostics report that I'm getting around 170Mb/s with my new trick in place. It's not the 178Mb/s I was hoping for (I've lost about 4.5% on each connection), but it's still 190% of the original throughput.

Destination     Gateway  Genmask  Flags  Metric  Ref  Use  Iface
                *                 UH     0       0    0    eth1
                *                 U      1       0    0    eth0
                *                 U      1       0    0    eth1
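Why does the new route win? Linux picks the most specific (longest-prefix) matching route first, and the "H" flag marks a route to a single host. The Python sketch below models only that rule with the standard ipaddress module. The kernel also uses metrics to break ties between equally specific routes; this sketch just keeps table order. All addresses are hypothetical placeholders, since the real ones are not shown in the tables above:

```python
import ipaddress

# Hypothetical routing table mirroring the one above, with made-up addresses.
ROUTES = [
    (ipaddress.ip_network("192.168.1.20/32"), "eth1"),  # host route added with `route add`
    (ipaddress.ip_network("192.168.1.0/24"), "eth0"),   # LAN subnet via eth0
    (ipaddress.ip_network("192.168.1.0/24"), "eth1"),   # same subnet via eth1
]

def pick_interface(destination):
    """Longest-prefix match: the most specific matching route wins; ties keep table order."""
    address = ipaddress.ip_address(destination)
    matches = [(net, iface) for net, iface in ROUTES if address in net]
    best = max(matches, key=lambda route: route[0].prefixlen)  # max() keeps the first tie
    return best[1]

print(pick_interface("192.168.1.20"))  # eth1: the /32 host route is more specific
print(pick_interface("192.168.1.30"))  # eth0: only the /24 routes match
```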

Throughput comparison reading a 700MB DivX file:
73Mb/s - Mounted network share using CIFS (although it also appeared to be caching the file on disk, incurring disk write penalties)
89Mb/s - 1x NIC using System.Net.Sockets
170Mb/s - 2x NIC using System.Net.Sockets

Goodput - Part 1

I've been interested recently in maximizing the application-level throughput of data across networks. The word I didn't know I was looking for - but found anyway - was goodput.

At first I tried streaming across a large DivX file. When I realised that I might be measuring disk seek time (doubtful, at the pitiful data rates I was achieving) I transitioned my tests to stream meaningless large byte arrays directly from memory (being careful not to invoke the wrath of the garbage collector, or any of the slow operating system-level memory functions).

What I noticed was that the application-level control of the stream was a big factor in slowing down the effective data transfer rate. In short, if my design of the logical/physical stream protocol was "bad", so would be the goodput.

Throughout the test, data was transmitted in chunks of varying sizes (from 4k to 128k). Firstly, just to see what it was like, I tried establishing a new System.Net.Sockets.Socket connection for each chunk. Not good. This is why database connection pooling has really gained ground: it's expensive to create new connections. Next I tried a single connection where the client explicitly requested the next chunk. Also really bad. It was chatty like an office secretary, and got less done. So I tried a design I thought would be pretty good, where 1 request resulted in many chunks being returned. For some reason, I thought that prepending each chunk with its size would be a good idea. It was 360% better than the previous incarnations, but the size information just repeated data that my protocol didn't need: it was wasting bits and adding CPU load while giving nothing in return, so it had to go. Stripping these items from the stream resulted in an extra 3.6% of throughput.

Interestingly, I noticed that the choice of buffer size could drastically affect the goodput, especially when it was ((1024*128)+4) bytes. I expect this was something to do with alignment. It would be cool to do some more tests, looking for optimal buffer sizes.
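The two stream designs can be compared in a Python sketch. The first prefixes every chunk with a 4-byte length, and the second sends bare chunks and relies on both ends agreeing on the chunk size in advance. That up-front agreement is an assumption, because the post doesn't say how its own protocol delimited chunks. The helper names are hypothetical:

```python
import struct

HEADER = struct.Struct(">I")  # 4-byte big-endian length prefix

def frame_with_lengths(chunks):
    """Prefix every chunk with its size: self-describing, but 4 extra bytes per chunk."""
    return b"".join(HEADER.pack(len(chunk)) + chunk for chunk in chunks)

def unframe_with_lengths(stream):
    """Read length-prefixed chunks back out of a byte stream."""
    chunks, offset = [], 0
    while offset < len(stream):
        (size,) = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        chunks.append(stream[offset:offset + size])
        offset += size
    return chunks

def unframe_fixed(stream, chunk_size):
    """Without prefixes, the receiver must already know the chunk size."""
    return [stream[i:i + chunk_size] for i in range(0, len(stream), chunk_size)]

chunk_size = 128 * 1024
chunks = [bytes([n]) * chunk_size for n in range(4)]
framed = frame_with_lengths(chunks)
print(len(framed) - chunk_size * len(chunks))  # 16 bytes of headers for 4 chunks
print(len(framed) // len(chunks))              # 131076 bytes per framed chunk
```

A 128k chunk plus its 4-byte prefix comes to exactly ((1024*128)+4) bytes. Whether that coincidence explains the buffer-size effect above is not something this sketch can show.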

Saturday, February 28, 2009

LiveCycle Data Services / amd64 / ubuntu

I haven't tried out the alternative - BlazeDS - yet, but I've had the time of my life trying to get ubuntu to use all my RAM and both my monitors, and even more fun trying to get LCDS installed. Nothing has been exceptionally difficult, but I've noticed a lot of things just are not compatible. Knowing not only where you're going, but also the route to get there, can make a major difference in total time taken. That's why I document my "mistakes" - for next time. Next time...

  1. Well, maybe next time I'll get Hyper-V Server 2008 to work. Even VMware ESXi wasn't happy with my SCSI drives. Xen looked like more than I was prepared to bite off, never mind chew.

  2. Accessing any more than 3GB of physical RAM on an ubuntu install requires the amd64 package or a PAE-enabled kernel. Similar limits are imposed on other 32-bit operating systems. In this case I chose to upgrade to the 64-bit version rather than enable PAE.

  3. to do anything useful in an unfamiliar operating system you'll want to use the desktop version.

  4. the graphics driver and operating system seemed to fight like small children if the driver's not open source ... I'm just glad that nVidia have made the effort for a 64 bit driver so I can see my desktop in TwinView, rather than being forced to run the server version (see 3).

  5. the Flash plugin that you can download from Adobe's website is - at time of writing - only compatible with i386 and not amd64. Problem's easily solved by using sudo apt-get install flashplugin-nonfree which was already set up inside Aptitude.

  6. when you download a .bin file, you're expected to know you're going to chmod +x and execute it. However, in the case of lcds261-lin.bin you can't just do this inside a terminal from the CTRL-ALT-F7 session; you must run it from one of the CTRL-ALT-F{2..6} sessions. Else you'll get a Java error java.lang.UnsatisfiedLinkError: Can't load library: /usr/lib/jvm/java-6-openjdk/jre/lib/amd64/xawt/libmawt.so. Amusingly, one of the lines in the stack trace read: at ZeroGay.a(DashoA10*..)

  7. even if you asked the installer for a Tomcat install, don't expect your container to be started when installation is complete. Perhaps it's too much to expect a reasonable error message too. Anyway, I created a new service account user "tomcat" with sudo adduser tomcat and then gave it ownership of the (otherwise locked down) Tomcat install directory and started the server.

  8. Actually check that the samples are working - front-to-back - at this point. First time around, I celebrated an early victory when I first saw the sample pages running inside Tomcat. There was a leeeettle bit more setup to do re: the sample database. The LCDS install ships with an HSQLDB database, which needs write permission on a bunch of directories. In the spirit of the previous step (and, perhaps, a pre-existing condition in which I derive pleasure from creating new service accounts) I created a new user called hsqldb with permission to create the .lck (lock) files. sudo adduser hsqldb and then /opt/lcds/sampledb$ sudo chown -R hsqldb .

  9. LCDS is free under Adobe's developer license on more than 1 CPU (or even in production environments, but only on a single CPU). This is great for developers like me who can tinker at home without parting with hard cash. We can even demo our applications to clients for free. The people with the cheque books can keep them in their pockets until bigger test and production environments are commissioned.

Friday, February 27, 2009

System.OutOfMemoryException Part 3

It's not every day you come home and plug 8GB of physical memory into your home PC. Before now, I wasn't completely aware that there was even a limit to the amount of memory you could usefully install on a Windows XP machine. There are all sorts of hurdles, it seems. Luckily, Windows Server 2008 x64 has no problem addressing it all. Bonus. But can you imagine the size of the swap files you would start to rack up if you ran a bunch of memory-hungry processes simultaneously? Processes only see virtual memory... Windows is smart about whether this memory is backed by physical memory pages or by pages in a system file called the page file (a.k.a. the swap file). As we move to the wider 64-bit addresses, we're not going to get another System.OutOfMemoryException; instead we'll run out of disk space for the swap file, or spend so much time thrashing that we'll want the small address space back!

Saturday, February 21, 2009

Gen / Spec

I think it's common for vendors and consultancies to push their own technologies as solutions to the problems of clients. Even individual consultants are often bullish about the skills they've acquired. Together, these behaviours make it difficult (if not impossible) for the optimum solution to be found and implemented. Of course, an optimum solution isn't always what a client wants: consider delivering a sub-optimal solution that leaves your client better off than the nearest competitor. Still, I feel that recognising that an optimal solution does exist is an important step towards better architectures.

Adam Smith - the economist - wrote about the division of labour in his magnum opus, The Wealth of Nations. To paraphrase: he talks of a pin factory and the difference in productivity between two scenarios - sharing responsibilities across a group of employees, and assigning specific tasks to employees. There are many conclusions one can draw, but the one that shines out particularly to me is that performance gains can be made by choosing the right tool for the job (RTFJ).

Databases. Great for relational data storage and retrieval. They're designed to meet ACID requirements, and even with all the log shipping and replication in the world, don't scale quite as nicely as other technologies can/do. In my book, that would have been reason enough to keep business logic out of the database. However, certain cases might call for a gargantuan set of data to be worked on at once, and it might be prudent to "bring the computation to the data".

Grids and high performance computing. Great for compute intensive operations. However they're distributed by nature, and that generally makes things more difficult. They usually offer only a subset of the common operating system constructs we're used to - well conceptually, anyway. Spinning up a new thread locally is the "equivalent" of starting up a new process on a remote machine. Also there's the problem of moving data. (De)serialization is computationally intensive - optimisations can take the form of using shared metadata of common versions (e.g. .NET assemblies, and binary serialization) which bring new problems of managing versioning across the environment.

Whatever you're doing, always make "efficient" use of your CPU. For non-CPU work (e.g. waiting on I/O), use asynchronous patterns with callbacks. Thread.Sleep() and spinning in a tight loop are generally evil (but I'm sure there exists a case where both are specifically wonderful).
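A minimal C# sketch of the difference, assuming .NET Core 3.0 or later; `AsyncIoSketch` and its methods are hypothetical names for illustration. Instead of polling with Thread.Sleep(), the read is awaited and a callback runs when it completes, so no thread is parked while the I/O is outstanding.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper: callback-style completion instead of Thread.Sleep() or spinning.
public static class AsyncIoSketch
{
    // Anti-pattern, for contrast only (not used below):
    //   while (!done) Thread.Sleep(10);   // parks a thread that does nothing useful

    // Awaiting hands the calling thread back until the read has finished.
    public static async Task<string> ReadAllAsync(Stream stream)
    {
        using var reader = new StreamReader(stream, Encoding.UTF8);
        return await reader.ReadToEndAsync();
    }

    // Callback flavour: onCompleted runs once the read succeeds; the caller is free meanwhile.
    public static Task ReadWithCallback(Stream stream, Action<string> onCompleted) =>
        ReadAllAsync(stream).ContinueWith(
            t => onCompleted(t.Result),
            TaskContinuationOptions.OnlyOnRanToCompletion);
}
```

The Task returned by ReadWithCallback can itself be awaited or ignored; if the read faults, the continuation is cancelled rather than invoking the callback.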

Distribute only what you have to. If your constraint is virtual ("addressable") memory then it might be OK just to have multiple processes on the same machine with lots of physical memory, talking to each other via some non-network IPC mechanism.
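As a sketch of talking "via some non-network IPC mechanism", here is a named-pipe request/response in C#, assuming .NET Core 3.0 or later (on Linux and macOS, .NET implements named pipes over Unix domain sockets, so the traffic stays on the machine). `LocalIpcSketch`, the echo protocol and the pipe name are hypothetical; in practice the server and client would live in separate processes.

```csharp
using System.IO;
using System.IO.Pipes;
using System.Threading.Tasks;

// Hypothetical example: one process owns the data, another asks it questions over a local pipe.
public static class LocalIpcSketch
{
    // Server side: accepts a single client, reads one line, replies with one line.
    public static async Task ServeOnceAsync(string pipeName)
    {
        using var server = new NamedPipeServerStream(pipeName, PipeDirection.InOut, 1,
            PipeTransmissionMode.Byte, PipeOptions.Asynchronous);
        await server.WaitForConnectionAsync();
        // leaveOpen: true so disposing a reader/writer doesn't close the pipe under the other.
        using var reader = new StreamReader(server, leaveOpen: true);
        string request = await reader.ReadLineAsync();
        using var writer = new StreamWriter(server, leaveOpen: true) { AutoFlush = true };
        await writer.WriteLineAsync("echo: " + request);
    }

    // Client side: connects (waiting up to 5 seconds), sends a request, returns the reply.
    public static string Ask(string pipeName, string request)
    {
        using var client = new NamedPipeClientStream(".", pipeName, PipeDirection.InOut);
        client.Connect(5000);
        // Dispose (and so flush) the writer before reading, while the server is still listening.
        using (var writer = new StreamWriter(client, leaveOpen: true))
        {
            writer.WriteLine(request);
        }
        using var reader = new StreamReader(client, leaveOpen: true);
        return reader.ReadLine();
    }
}
```

Calling ServeOnceAsync in one process and Ask in another gives a line-based round trip without touching the network stack; anything fancier (framing, multiple clients) is left out of this sketch.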

Cache hits should be quick. Cache misses (generally) shouldn't result in multiple simultaneous requests to insert fresh data into the cache. The tricky bit is not making any "threads" wait while the cache data is inserted. That ties my previous point in with the next:
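One common way to get both properties in C# (a sketch, not the only approach) is to cache the in-flight Task rather than the value: the first miss starts the load, concurrent misses for the same key receive the same Task, and callers await it instead of blocking a thread. `SingleFlightCache` is a hypothetical name; the code assumes C# 9 and .NET Core 3.0 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical cache: at most one load per key is in flight; every caller shares its Task.
public sealed class SingleFlightCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, Lazy<Task<TValue>>> _entries = new();
    private readonly Func<TKey, Task<TValue>> _load;

    public SingleFlightCache(Func<TKey, Task<TValue>> load) => _load = load;

    public Task<TValue> GetAsync(TKey key)
    {
        // GetOrAdd may run its factory more than once under contention, but only one Lazy
        // is stored, and ExecutionAndPublication ensures that Lazy calls _load exactly once.
        var entry = _entries.GetOrAdd(key, k => new Lazy<Task<TValue>>(
            () => _load(k), LazyThreadSafetyMode.ExecutionAndPublication));

        // Callers get the shared Task and await it; nobody blocks while the data arrives.
        return entry.Value;
    }
}
```

In this sketch a failed load stays cached; a real cache would evict faulted entries and add expiry.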

DRY: don't repeat yourself. This goes for operations as well as for boilerplate, copy-and-pasted code. If a cache can give you the result of an expensive operation you've already computed, for less cost, then consider caching. In-memory, disk, distributed and bespoke caches exist. Each will have a

Thursday, February 12, 2009

Licensing Components in .NET - Part 2

I reckon the only reason LicFileLicenseProvider is part of the framework is to get the point across to licensing greenhorns. All of a sudden, brains start ticking: you can load the license from anywhere. You begin crafting nefarious schemes using public key cryptography. It's brilliantly academic. But something's wrong. It would be easier just to hack the code. Steal the intellectual property. Hey, maybe that's why Microsoft stopped at the LicFileLicenseProvider? Maybe, and here's a thought: maybe they should have.

Another crazy piece of the (even crazier) licensing puzzle: licenses.licx files and lc.exe.

lc.exe is a tool written in .NET by Microsoft, which MSBuild uses transparently when compiling your Visual Studio projects. Looking inside the assembly's resource strings, we discover it:

Generates a .NET Licenses file and adds it to the manifest of the given assembly
lc /target:TargetAssembly /complist:filename [/outdir:path] [/i:modules] [/v] [/nologo]

/target: Target assembly for the generated licenses file
/complist: Licensed component list file
/outdir: Output directory for the generated licenses file
/i: Specify modules to load
/v Verbose output
/nologo Suppress the display of the startup banner

The entry point into this assembly is the Main method of the System.Tools.LicenseCompiler class. Of (arguably) most importance is the /target: switch. This is the name of the assembly into which the compiled resource will be added. In Elsie.Target.exe this would be a resource named "Elsie.Target.exe.licenses", containing a binary stream representation of a serialized .NET object. More to come...
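To see the resource that lc.exe adds, you can list an assembly's manifest resources. A small C# sketch (the `LicensesResourceSketch` name is hypothetical); it only finds resources by their ".licenses" suffix and makes no attempt to deserialize the stream inside.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper: lists embedded resources whose names end in ".licenses",
// e.g. "Elsie.Target.exe.licenses" in an assembly built with a licenses.licx file.
public static class LicensesResourceSketch
{
    public static string[] FindLicensesResources(Assembly assembly) =>
        assembly.GetManifestResourceNames()
                .Where(name => name.EndsWith(".licenses", StringComparison.OrdinalIgnoreCase))
                .ToArray();
}
```

For example, passing `Assembly.LoadFrom("Elsie.Target.exe")` should list "Elsie.Target.exe.licenses" once lc.exe has run as part of the build; an assembly built without a .licx file returns an empty array.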

If you add an empty text file named "licenses.licx" to your project, Visual Studio automatically sets its BuildAction:EmbeddedResource and CopyToOutput:DoNotCopy. It also calls lc.exe before calling csc.exe (yes, I'm a C#-a-holic). It makes the decision based on the .licx extension, and you can have as many .licx files as you want in a single project (OK, that may not be true, but why would you want that many? Anyway, it will generate one /complist:[filename.licx] for each .licx file in your project).

So what do you type in this/these text file(s)? If you really care, we'll have to make a 3rd installment.

Licensing Components in .NET - Part 1

I'm going to wear three hats here: the fantastic component developer who's written a third-party library; the poor sod who has to make the aforementioned fantastic component (not developer) work with the existing build environment; and, finally, that guy who has to run this application and explain to his boss why it's suddenly stopped working.

Unsurprisingly, Microsoft do have a how-to for developing licensed components. There's also a nice diagram here. Some of the information presented in these tutorials is a little misleading, so I figured I'd get back to trying on my hats.

Hat 1

I've developed the one component that could change the world. That's a little c, not big C for component: there's no requirement for my class to fit into Microsoft's ComponentModel namespace (which is, incidentally, where all the bog-standard licensing classes reside). So, I apply the [LicenseProvider(typeof(LicFileLicenseProvider))] attribute to my fantastic class, and somewhere in the class I make a call to the static method LicenseManager.Validate (passing in the type of my class, as this is how the manager figures out that I want it to use the LicFileLicenseProvider). There are two overloads for Validate:
a) public static License Validate(Type type, object instance)
b) public static void Validate(Type type)
Option (a) offers the fullest functionality, and it makes sense to make the call in my class's constructor; after all, the error message (of the LicenseException that's thrown if the call to Validate fails for some reason) WANTS me to do this: "An instance of type 'Elsie.Proprietary.Fantastic' was being created, and a valid license could not be granted for the type 'Elsie.Proprietary.Fantastic'. Please, contact the manufacturer of the component for more information." Because the call to Validate returns a new disposable License object, I'm responsible at least for ensuring it gets cleaned up properly. I'll assign it to an instance field, and make my class implement IDisposable.
Option (b) is a little less messy: I don't have to pass in an instance of my class, and I don't have to worry about managing a License object's lifetime. Its error message reads: "A valid license cannot be granted for the type Elsie.Proprietary.Fantastic. Contact the manufacturer of the component for more information."
That's it. I don't even have to create a license file.
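Hat 1 in code: a minimal sketch of the component using option (a), assuming the System.ComponentModel licensing types (part of the .NET Framework and, as far as I know, also shipped in .NET Core 3.0 and later). Only the attribute, the Validate call and the License disposal come from the description above; the rest of the class is illustrative.

```csharp
using System;
using System.ComponentModel;

namespace Elsie.Proprietary
{
    // Tells LicenseManager which LicenseProvider to ask when this type is validated.
    [LicenseProvider(typeof(LicFileLicenseProvider))]
    public sealed class Fantastic : IDisposable
    {
        private License _license;

        public Fantastic()
        {
            // Option (a): throws LicenseException if no valid license can be granted.
            // Option (b) would be LicenseManager.Validate(typeof(Fantastic)), with nothing to dispose.
            _license = LicenseManager.Validate(typeof(Fantastic), this);
        }

        public void Dispose()
        {
            // Validate handed us a disposable License, so releasing it is our job.
            _license?.Dispose();
            _license = null;
        }
    }
}
```

With no Elsie.Proprietary.Fantastic.lic available, constructing Fantastic should throw the LicenseException quoted above.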

Hat 2

I'm going to use the Fantastic class, so I mock up a new project of my own (which I call Elsie.Target.exe) and add a reference to the component's assembly. Then I create (probably in notepad2.exe) a one-line text file: inside it I type "Elsie.Proprietary.Fantastic is a licensed component." I make sure the file is called "Elsie.Proprietary.Fantastic.lic" and that it's copied to my output directory (probably by setting BuildAction:Content and CopyToOutput:CopyAlways). Inside my application, I call the Fantastic constructor (within a using statement, because the class implements IDisposable, because the component developer was a responsible guy after all). Hidden inside the constructor, Fantastic checks whether I'm allowed to use it by loading the .lic file. If the checks are successful, I go on my way to being a superstar developer. Otherwise, an exception will be thrown and it's back to the streets for me!
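Because the provider matches on the exact type name, a typo in either the file name or its contents is enough to break things. A small C# sketch for producing the file; `LicFileHelper` is hypothetical, and the idea that the provider looks for the file in the directory of the assembly that defines the type is an assumption based on the .NET Framework behaviour, so check it on your runtime.

```csharp
using System;
using System.IO;

// Hypothetical helper for Hat 2: writes "<FullName>.lic" containing the expected text.
public static class LicFileHelper
{
    public static string FileNameFor(Type type) => type.FullName + ".lic";

    public static string ContentFor(Type type) => type.FullName + " is a licensed component.";

    // Assumption: LicFileLicenseProvider looks beside the assembly that defines the type,
    // which in a normal build is the same output folder that CopyToOutput writes to.
    public static string WriteLicFile(Type type, string directory)
    {
        string path = Path.Combine(directory, FileNameFor(type));
        File.WriteAllText(path, ContentFor(type));
        return path;
    }
}

// Consumer side (Hat 2), assuming the Fantastic component described in Hat 1:
//   try
//   {
//       using (var fantastic = new Elsie.Proprietary.Fantastic()) { /* be a superstar */ }
//   }
//   catch (System.ComponentModel.LicenseException ex) { Console.Error.WriteLine(ex.Message); }
```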

Hat 3

I'm in the London office at 7am. I deployed Elsie.Target.exe, along with Elsie.Proprietary.dll and Elsie.Proprietary.Fantastic.lic, last night. While I've been sleeping, everyone in APAC has been delighted with just how fantastic the application is. In my excitement, I forget to be a cheeky monkey and change the .lic file's contents to read "... is *not* a licensed component". This is lucky for me, because that would BREAK the application!

Other examples:
Good: "Elsie.Proprietary.Fantastic is a licensed component."
Good: "Elsie.Proprietary.Fantastic is a licensed component. Yes it is!"
Good: "Elsie.Proprietary.Fantastic is a licensed component. NOT!"
Bad: "Elsie.Proprietary.Fantastic is a licensed component"
Bad: "Elsie.Proprietary.Fantastic is not a licensed component"
It turns out that the key is valid if the text in the file starts with the fully qualified type name followed by " is a licensed component." (full stop included).
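The rule fits in a few lines. A C# sketch with hypothetical names (`LicKeyRule`, `ExactMatchLicFileLicenseProvider`): the first method mirrors the behaviour observed above, and the class shows that, because LicFileLicenseProvider exposes IsKeyValid and GetKey as protected virtual methods, a stricter provider could override the check.

```csharp
using System;
using System.ComponentModel;

// Hypothetical: mirrors the observed rule, a prefix match on "<FullName> is a licensed component."
public static class LicKeyRule
{
    public static bool IsKeyValid(string key, Type type) =>
        key != null &&
        key.StartsWith(type.FullName + " is a licensed component.", StringComparison.Ordinal);
}

// Hypothetical stricter provider: accepts only the exact key (ignoring surrounding whitespace),
// so "... is a licensed component. NOT!" would no longer pass.
public class ExactMatchLicFileLicenseProvider : LicFileLicenseProvider
{
    protected override bool IsKeyValid(string key, Type type) =>
        key != null && key.Trim() == GetKey(type);
}
```

A component would opt in with [LicenseProvider(typeof(ExactMatchLicFileLicenseProvider))] instead of the stock provider.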

This is crazy! So crazy, in fact, that it might just work...