Wesner Moise

Algorithmic Complexity

 

In his PDC session, “Five Things Every Win32 Developer Should Know,” Raymond Chen makes a dangerous comment in the session abstract:

The top two factors that control performance are I/O and memory. After exploring the relationship between physical memory, virtual memory, and the virtual address space with popular Win32 blogger and Microsoft developer Raymond Chen, we mix in some knowledge of how CPUs work and see how an O(n) algorithm can outperform an O(log n) algorithm: Why much of what you learned in school simply doesn’t matter.

I understand that his comment is meant to be provocative and intended to drum up interest in his talk. He wants to stress the importance of being aware of multi-level caching hierarchies, as well as the notion that “memory is speed”: the more memory software uses, the slower it runs, because caches are flushed more frequently. Some background information on the memory hierarchy is available at Ars Technica.

While binary trees can deliver logarithmic-time performance, versus linear time for vector representations of the same data, they come with some costs:

- *Poor locality*. Tree nodes are connected via links, which defeats
  the caching benefits of locality and also leads to more frequent and
  costly paging operations.

- *Overhead*. Tree nodes typically introduce additional per-item memory
  overhead for bookkeeping, such as pointers to parent and child nodes,
  CLR object header information, and tree-balancing information.
  Arrays, in contrast, typically carry no overhead on top of the data.
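The overhead point can be made concrete with some back-of-envelope arithmetic. The field sizes below are my own illustrative assumptions for a 64-bit managed runtime, not exact CLR numbers:

```python
# Back-of-envelope comparison of per-element memory cost on a 64-bit
# runtime. Field sizes are illustrative assumptions, not CLR-exact.

ARRAY_BYTES_PER_ITEM = 8          # one 64-bit value, no per-item bookkeeping

TREE_NODE_BYTES = (
    8      # the value itself
    + 8    # left-child pointer
    + 8    # right-child pointer
    + 8    # parent pointer
    + 8    # balance information plus padding
    + 16   # assumed object header (sync block + type pointer on the CLR)
)

overhead_factor = TREE_NODE_BYTES / ARRAY_BYTES_PER_ITEM
print(f"tree node: {TREE_NODE_BYTES} bytes/item "
      f"({overhead_factor:.0f}x the array's {ARRAY_BYTES_PER_ITEM})")
```

Under these assumptions a one-item-per-node tree spends several times more memory per element than a flat array, which is exactly the pressure on caches that Chen is talking about.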

He ignores the possibility of hybrid data structures. Tree nodes, for example, can each contain multiple items, increasing the caching benefits of locality and thereby reducing the overall constant factor of algorithms applied to the tree. Holding multiple items per node also reduces per-object overhead. Because of this hybridization, my own tree-based list carries overhead comparable to a simple ArrayList, with average overhead of less than 50% per element.
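As a rough sketch of the hybridization idea (the class, names, and sizes here are my own illustrative assumptions, not the author's actual implementation):

```python
# Hypothetical sketch of a "fat" tree node that stores a small sorted
# array of items instead of one item per node, B-tree style.

class FatNode:
    CAPACITY = 32                      # items per node; a tunable assumption

    def __init__(self):
        self.items = []                # up to CAPACITY sorted items
        self.children = []             # child links when the node is internal

# The node's pointer/header bookkeeping is now amortized over up to
# CAPACITY elements, so per-item overhead drops by roughly that factor,
# and a node's items sit contiguously, sharing a few cache lines.
per_item_overhead = (8 * 2 + 16) / FatNode.CAPACITY   # links + header, per item
print(f"~{per_item_overhead:.1f} bytes of bookkeeping per item")
```

With 32 items per node, the same bookkeeping that cost dozens of bytes per element in a one-item-per-node tree shrinks to about a byte per element, while lookups stay logarithmic in the number of nodes.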

Also, the .NET garbage collector mitigates the impact of poor locality. Nodes created together are adjacent in memory, and over multiple collections the locality benefit increases, because temporary objects allocated between two nodes are freed and the two nodes are compacted together.

With these disadvantages out of the way, logarithmic data structures also offer these advantages:

- *Scalability* to large numbers of elements. Trees touch only a
  fraction of the data they contain.

- *Better composability*. Operations on linear-time data structures can
  quickly explode to quadratic performance.
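One way to see the composability point: keeping a flat sorted array up to date shifts O(n) elements per insert, so n inserts compose into O(n²) work, where a balanced tree would pay O(log n) each. A small Python illustration (the arithmetic is language-neutral, not tied to any particular .NET collection):

```python
import bisect

# Inserting n items into a flat sorted array: each insert may shift O(n)
# elements, so composing n inserts costs O(n^2) element moves overall.
# A balanced tree would do each insert in O(log n), O(n log n) total.

def sorted_array_moves(items):
    """Count element shifts while keeping a Python list sorted."""
    arr, moves = [], 0
    for x in items:
        i = bisect.bisect_left(arr, x)
        moves += len(arr) - i          # elements shifted right by the insert
        arr.insert(i, x)
    return moves

# Worst case: items arrive in reverse order, so every insert shifts
# every element already present.
n = 1000
print(sorted_array_moves(range(n, 0, -1)))   # n*(n-1)/2 = 499500 shifts
```

At a thousand elements the array already performs half a million moves; the constant-factor advantage of contiguous memory has to be weighed against that quadratic blow-up.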

Trees are also a more flexible data structure, supporting inheritance and composition, and allowing for many interesting, inexpensive new features. I have been able to use my tree-based list and set data structures universally, without resorting to special-purpose data structures; that would not have been the case with vector-based lists.

Update: Coetzee has a related post on unrolled linked lists.
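An unrolled linked list applies the same hybridization discussed above to lists: each node holds a small array of items rather than a single item, trading a little per-node slack for much better locality. A minimal, hypothetical sketch (names and the capacity are my own assumptions):

```python
# Minimal sketch of an unrolled linked list: each node carries a small
# array of items, so pointer-chasing happens once per NODE_CAPACITY
# items instead of once per item.

class UNode:
    __slots__ = ("items", "next")

    def __init__(self):
        self.items = []
        self.next = None

class UnrolledList:
    NODE_CAPACITY = 16                 # items per node; an assumption

    def __init__(self):
        self.head = UNode()

    def append(self, value):
        node = self.head
        while node.next is not None:   # walk to the tail node
            node = node.next
        if len(node.items) == self.NODE_CAPACITY:
            node.next = UNode()        # tail is full; start a fresh node
            node = node.next
        node.items.append(value)

    def __iter__(self):
        node = self.head
        while node is not None:
            yield from node.items
            node = node.next

ul = UnrolledList()
for i in range(40):
    ul.append(i)
print(list(ul) == list(range(40)))     # True
```

A production version would also split and merge nodes on insertion and removal in the middle; the sketch only shows the fat-node layout that gives the structure its cache behavior.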

 
