Roots of Software Inefficiency

Being a GEEK ACMer or TopCoder beast (i.e design and analysis of algorithms) is about writing an efficient algorithm in the scope of 3 or 2 functions doing a very specific and limited task. However, writing a software that runs on customers’ computers is bigger than this. A typical windows commercial mid-sized software consists of a set of Executable, Services and DLLs interacting with each other to shape what is called software. The efficiency of algorithms and data structures is necessary but not sufficient: By itself, it does not guarantee good overall program efficiency. It is important to know the roots of software inefficiency if we care about writing fast one.

What are the factors that affect efficiency? Efficient C++ Performance Programming Techniques book By Dov Bulka, David Mayhew made a very good high-level categorization to these factors:

Design Efficiency This involves the program’s high-level design. To fix performance problems at that level you must understand the program’s big picture.  We are talking here about software architecture, UML diagrams, pseudo codes, algorithms, data-structures, and anything you can consider it language independent.

Code Efficiency Small-to medium-scale implementation issues fall into this category. Fixing performance in this category generally involves local modifications. For example, you do not need to look very far into a code fragment in order to lift a constant expression out of a loop and prevent redundant computations. The code fragment you need to understand is limited in scope to the loop body.

Both these 2 levels can be broken down more:

1. Design

1.1 Algorithms and Data Structures Technically speaking, every program is an algorithm in itself. Referring to “algorithms and data structures” actually refers to the well-known subset of algorithms for accessing, searching, sorting, compressing, and otherwise manipulating large collections of data. Oftentimes performance automatically is associated with the efficiency of the algorithms and data structures used in a program, as if nothing else matters which is inaccurate.

1.2 Program Decomposition This involves decomposition of the overall task into communicating subtasks, object hierarchies, functions, data, and function flow. It is the program’s high-level design and includes component design as well as inter-component communication. Few programs consist of a single component. A typical Web application interacts (via API) with a Web server, TCP sockets, and a database, at the very least.

2. Coding

2.1 Language Constructs A programming language is a tool we use to express to computers how to do a specific task. Whether you use C++, C#, Java and you care about performance, you need understand the cost of your programming language constructs so not to be shocked at run time when you program scale. C++ adds power and flexibility to its C ancestor (i.e Object Oriented capabilities). These added benefits do not come for free—some C++ language constructs may produce overhead in exchange.

2.2 System Architecture System designers invest considerable effort to present the programmer with an idealistic view of the system: infinite memory, dedicated CPU, parallel thread execution, and uniform-cost memory access. Of course, none of these is true—it just feels that way. Developing software free of system architecture considerations is also convenient. To achieve high performance, however, these architectural issues cannot be ignored since they can impact performance drastically. When it comes to performance we must bear in mind that:

  • Memory is not infinite. It is the virtual memory system that makes it appear that way.
  • The cost of memory access is non-uniform. There are orders of magnitude difference among cache, main memory, and disk access.
  • Our program does not have a dedicated CPU. We get a time slice only once in a while.
  • On a uniprocessor machine, parallel threads do not truly execute in parallel—they take turns.

If you write Windows software, you need read well about WinAPI and dig deep in Windows programming world, to understand how your host operating system – Windows in this case – will execute your program. This applies if you write software for Linux, iOS or any operating system. Good understanding of the host operating system is a must.

2.3 Libraries The choice of libraries used by an implementation can also affect performance. For starters, some libraries may perform a task faster than others. Because you typically don’t have access to the library’s source code, it is hard to tell how library calls implement their services. For example, to convert an integer to a character string, you can choose between
sprintf(string, “%d”, i); or an integer-to-ASCII function call, itoa(i, string); Which one is more efficient? Is the difference significant?

There is also the option of rolling your own version even if a particular service is already available in a library. Libraries are often designed with flexibility and reusability in mind. Often, flexibility and reusability trade off with performance. If, for some critical code fragment, you choose to put performance considerations above the other two, it might be reasonable to override a library service with your own home-grown implementation. Applications are so diverse in their specific needs, it is hard to design a library that will be the perfect solution for everybody, everywhere, all the time.

2.4 Compiler Optimizations Simply a more descriptive name than “miscellaneous,” this category includes all those small coding tricks that don’t fit in the other coding categories, such as loop unrolling, lifting constant expressions out of loops, and similar techniques for elimination of computational redundancies. Most compilers will perform many of those optimizations for you. But you cannot count on any specific compiler to perform a specific optimization.For ultimate control, you have to take coding matters into your own hands.

I remember how Visual Studio saved my team in an Image Processing performance competition in my faculty. In this competition your image processing package has to run many image processing algorithms and your package timing in each algorithm is used to rank it among the others. We optimized some of our algorithms manually, but didn’t have much time to optimize the others. I got an evil idea of enabling Visual Studio code optimization, and I was shocked by the results. The running time of many algorithms dropped down greatly and I couldn’t believe how C++ code optimization held by the compiler can be that effective.


  1. Teach yourself how to design and analyze algorithms and practice well (i.e problem solving through ACM Online Judge and TopCoder)
  2. Pick a programming language and master it (i.e read about its internals and understand the scary dark side of it).
  3. Know the internals of a certain operating system on which you prefer to write your software (e.g Windows, Linux or Mac OS programming).
  4. Write big multi-file, multi-module projects with real requirements.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: