
One problem facing large software corporations is N^2 (or quadratic) development, where N is the number of features in a product. This tends to be a major problem when the product is not object-oriented to begin with, or when a product has been around for a long time, such as Microsoft Excel and Word, which have each been around for about 20 years.
If you are not familiar with the notion of algorithmic complexity: technically, the time or space efficiency of an algorithm is expressed as a function, f(N), of the input size, N. So quicksort takes average time proportional to N log N, where N is the number of items to sort, whereas insertion sort takes time proportional to N(N+1)/2 = N^2/2 + N/2. Each function can simply be approximated by its largest term, since that term (N^2 in the insertion sort case) easily dwarfs all the others as N grows larger. Time complexity is usually written in big “O” (“order”) notation: quicksort takes O(N log N) and insertion sort takes O(N^2).
The function f(N) becomes quite interesting to study when N is large, because it makes it very easy to compare two algorithms. By taking the ratio of two such functions, f(N) and g(N), and looking at the limit as N goes to infinity, one can determine which algorithm is slower. If the limit is infinite, then the algorithm using f(N) doesn’t scale as well as the one using g(N), because as N grows larger the performance ratio between the two grows without bound. For insertion sort versus quicksort, for example, the ratio (N^2/2) / (N log N) = N / (2 log N) goes to infinity as N grows, so insertion sort falls further and further behind.
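To make the comparison concrete, here is a small illustrative Python sketch (not tied to any particular product) that counts insertion sort’s comparisons on random input and measures them against an N log N estimate, a stand-in for quicksort’s average cost; as the math predicts, the ratio between the two keeps growing with N.

```python
# Illustrative sketch: count insertion sort's comparisons on random input and
# compare them against N*log2(N), a stand-in for quicksort's average cost.
import math
import random

def insertion_sort_comparisons(items):
    """Sort a copy of items and return the number of comparisons performed."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            else:
                break
        a[j + 1] = key
    return comparisons

for n in (100, 1000, 10000):
    data = [random.random() for _ in range(n)]
    quadratic = insertion_sort_comparisons(data)
    linearithmic = n * math.log2(n)
    print("N=%6d  insertion: %10d comparisons   N log N: %8d   ratio: %.1f"
          % (n, quadratic, linearithmic, quadratic / linearithmic))
```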
I find time complexity actually useful in a lot of mundane activities, such as how one organizes and sorts things around the house, but my main purpose in this post is to apply time complexity to the process of software development.
I am defining N here as the number of features in the product. (The case where N is the number of developers is well known from The Mythical Man Month: having too many developers, who must all cooperate with each other, can doom a project, because with N developers there are roughly N^2/2 communication channels, and those N^2 communication and integration costs overwhelm the linear (N) increase in development productivity.)
This principle applies to many products, but my experience lies with Excel. With each new release of Office comes a host of major cross-cutting infrastructure code that interacts with every other feature.
Excel 1–4: Undo, Load/Save, Macros, Command Handling, Editing, Dialogs, Recalc, Localization, Memory Management, Menus, Toolbars
Excel 5: VBA/AppleScript, Recording, OLE
Excel 97: Change Tracking, Simultaneous Editing, Office (Drawing, Toolbars, etc.)
Excel 2000: HTML Load/Save, Backward file compatibility, Publish to Web
Excel XP/2003: XML Load/Save, XML UI Integration
This is just a small sampling of the cross-cutting features I can think of. Keep in mind that each of these is complex, with multiple sub-features. Load/Save, for instance, includes past versions, secondary formats (CSV, XML, HTML), and competitor formats. Excel macro support spans four or five different regimes: Lotus macros, AppleScript, XLM, and VBA.
Each time a new feature is added, it needs to work with all the other infrastructure features. It needs new UI controls (dialogs, menus, toolbars), persistence (load/save), programmability, undo support, and so on. Individual features may also interact with other regular features. Hence, N different sets of code must be written for each new feature, and the new feature itself ups the N count.
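To make that integration burden concrete, here is a hypothetical sketch (the feature and method names are invented, and this is not Excel’s actual code) of what a single new feature has to wire up by hand when every cross-cutting system requires its own support code:

```python
# Hypothetical sketch: every method below is feature-specific glue that must be
# written, reviewed, and tested against one cross-cutting infrastructure system.
class PivotTableFeature:
    def register_ui(self, menus, toolbars, dialogs):
        """Hand-written UI hookup: menu items, toolbar buttons, dialogs."""
        ...

    def save(self, stream, file_format):
        """Hand-written persistence, repeated for each supported file format."""
        ...

    def load(self, stream, file_format):
        ...

    def expose_object_model(self, automation_registry):
        """Hand-written programmability / macro-recording support."""
        ...

    def create_undo_record(self, undo_stack):
        """Hand-written undo/redo support."""
        ...

    # ...and another block of code for every remaining infrastructure feature,
    # plus any interactions with ordinary features (printing, copy/paste, ...).
```

With N features already in the product, each of these hooks multiplies out, which is exactly the N^2 cost described above.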
Sometimes, when I look at the latest versions of Office, I am surprised at the small number of new features compared to the Office 97 release. I wonder if it’s because of this N^2 complexity in the product. If Microsoft doubles the development staff for Office 2006, I doubt that would address the long-term problem. There will be twice as many features for Office 2006, but then, for the following version of Office, N will be that much larger (and N^2 larger still).
On the other hand, Microsoft engineers, who are smart, may have already figured this out and may be rearchitecting the products toward a zero-cost model, which I’ll describe below. In that case, there would also be few new features to show in the near-term release, but many more in subsequent releases.
The problems with N^2 development are:

1) It makes it easier for a new company to catch up in software development, because that company has a smaller N. If the new company employs linear development (N instead of N^2), then it has a long-term developmental advantage.
2) Implementing new features becomes progressively more expensive as time goes on.
3) With features interacting with every other feature, N^2 development implies N^2 testing.
It seems to me that Avalon provides a good example of linear development, in which features are written in isolation from one another. Cross-cutting features are automatically supported by every object without the need for special code.
The codebase is written in object-oriented managed code, with an emphasis on the Composite design pattern. The importance of OOP is that it promotes modularization and reuse, which are essential for the zero-cost model. Avalon uses declarative techniques and includes a system for registering self-describing dependency properties, commands, and events. In this way, Avalon allows architectural features that cut across every feature to require zero code. New objects created in Avalon do not need to provide special support for data binding, animation, undo/redo, or XAML serialization.
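As a rough illustration of the pattern (in Python for brevity; this is not Avalon’s actual API, just the shape of the idea), properties are registered once with metadata describing how they behave, and generic base-class code supplies undo and persistence for any object that declares them:

```python
# Illustrative sketch of self-describing, registered properties. A new object
# declares its properties and inherits undo and serialization from generic code.
UNDO_STACK = []

class DependencyProperty:
    registry = {}  # (owner name, property name) -> shared property metadata

    def __init__(self, name, default, serializable=True, undoable=True):
        self.name = name
        self.default = default
        self.serializable = serializable
        self.undoable = undoable

    @classmethod
    def register(cls, owner_name, name, default, **metadata):
        prop = cls(name, default, **metadata)
        cls.registry[(owner_name, name)] = prop
        return prop


class DependencyObject:
    """Base class: state lives in registered properties, so cross-cutting
    features can be implemented once, here, instead of once per feature."""

    def __init__(self):
        self._values = {}

    def set_value(self, prop, value):
        if prop.undoable:
            old = self._values.get(prop.name, prop.default)
            UNDO_STACK.append((self, prop, old))   # generic undo, no per-feature code
        self._values[prop.name] = value

    def get_value(self, prop):
        return self._values.get(prop.name, prop.default)

    def serialize(self):
        # Generic persistence: walk whatever properties the object has set.
        return dict(self._values)


# A brand-new feature only declares its data...
class ChartFeature(DependencyObject):
    Title = DependencyProperty.register("ChartFeature", "Title", "")

# ...and undo and save come along automatically.
chart = ChartFeature()
chart.set_value(ChartFeature.Title, "Quarterly Sales")
print(chart.serialize())   # {'Title': 'Quarterly Sales'}
```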
In my own application framework, which I have built up over time, I came up with a similar design, so that when I add a new feature, I get things like undo, load/save, and so on for free. It means slightly heavier data structures, such as dynamic expando-like properties, whose cost I mitigate with things like the flyweight pattern. For me, there is no real alternative: I am trying to build a large application, and I don’t have the testing resources to test every combination of cross-cutting feature and regular feature.
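A minimal sketch of that flyweight idea (illustrative only, with invented names, not my framework’s actual code): property descriptors are shared across every object that uses them, and each object stores only the values that differ from the defaults, so the expando-style flexibility stays cheap.

```python
# Illustrative flyweight sketch: descriptors are interned and shared; objects
# pay only for the property values that actually differ from the defaults.
class PropertyDescriptor:
    _interned = {}   # (name, default) -> one shared descriptor instance

    def __new__(cls, name, default):
        key = (name, default)
        if key not in cls._interned:
            instance = super().__new__(cls)
            instance.name = name
            instance.default = default
            cls._interned[key] = instance
        return cls._interned[key]


class ExpandoObject:
    __slots__ = ("_values",)              # keep per-object overhead small

    def __init__(self):
        self._values = {}                 # only non-default property values

    def get(self, descriptor):
        return self._values.get(descriptor.name, descriptor.default)

    def set(self, descriptor, value):
        if value == descriptor.default:
            self._values.pop(descriptor.name, None)   # back to default: store nothing
        else:
            self._values[descriptor.name] = value


# Thousands of cells can share the same handful of descriptors.
COLOR = PropertyDescriptor("color", "black")
VISIBLE = PropertyDescriptor("visible", True)

cell = ExpandoObject()
cell.set(COLOR, "red")
print(cell.get(COLOR), cell.get(VISIBLE))              # red True
assert PropertyDescriptor("color", "black") is COLOR   # flyweight: shared instance
```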