Monday, December 15, 2008

How Your Memory Affects Code Understanding

In a 1956 article, "The Magical Number Seven, Plus or Minus Two", psychologist George Miller argued that human short-term memory is limited to (roughly) 7 "bits" of information at a time. Beyond this, long-term memory has to kick in.

I've started thinking about this in terms of how I refactor my code. Let's say I have a piece of code that looks like this:

Console.WriteLine(string.Format("Hello {0}, {1} has called you {2} time{3} today",
    callee.FirstName, caller.FirstName, callcount, (callcount == 1 ? "" : "s")));

While this one statement describes exactly what I want to do, it's incredibly confusing. It is short, but it is relatively unreadable, and I have to spend a lot of time analyzing it to understand its purpose.

Miller also describes how grouping bits of information together helps retention. He calls these groups chunks, and states that a chunk is about as difficult to remember as a single bit. This means that the average programmer can hold approximately 7 items in short-term memory, whether those items are bits or chunks.

How does this apply to your code? The code I posted above is one long statement. The average reader is going to read code in bits, and then mentally group those bits into one coherent whole. This becomes harder when the number of bits in the code exceeds the number the reader can hold in short-term memory. When the reader then has to analyze the text to find ways to chunk the information, reading slows drastically. In fact, the very act of analyzing the code takes up a slot or two of that 7-item maximum, displacing what was already there. Here's an example of how one developer might mentally separate the bits:

Console.WriteLine ... string.Format ... "Hello {0}, {1} has called you {2} time ...
    {3} ... today" ... callee ... FirstName ... caller ... FirstName ...
    callcount ... callcount == 1 ... ? ... "" ... : ... "s"

That is too many bits for the average person to hold in short-term memory without chunking. The reader is forced to reread the code in order to chunk it: focus on a small portion of the text, group that portion, move on to a larger portion, group that, and finally put it all together. This may take multiple passes, which costs significant time and can all but destroy efficiency in code review. Here's how our reader might chunk the information:

Console.WriteLine ... string.Format ... "Hello {0}, {1} has called you {2} time ...
    {3} today" ... callee.FirstName ... caller.FirstName ...
    callcount ... (callcount == 1 ? "" : "s")

Now the reader has 8 distinct chunks. This is somewhat manageable on a good day (says Miller). If I'm really tired, I might have to chunk it again:

Console.WriteLine(string.Format("Hello {0}, {1} has called you {2} time{3} today" ...
    callee.FirstName ... caller.FirstName ...
    callcount ... (callcount == 1 ? "" : "s")

Now I've gotten the code down to 5 chunks, but it took me two passes to get there. The question is: would it be easier for the readers of your code to group it in their own minds, or for you to chunk the code for them by setting some local variables, so that they read your code in small pieces? Compare the code I first posted with the following:

string helloMessage = "Hello {0}, {1} has called you {2} {3} today";

string timeText = "time";
if (callcount != 1)
    timeText = "times";

string displayText = string.Format(helloMessage,
    callee.FirstName, caller.FirstName, callcount, timeText);

// Print the result, as the original one-liner did.
Console.WriteLine(displayText);



This code is significantly longer than the original, and small shortcuts could be taken without much cost to comprehension, but it does the grouping for the reader, which makes the code simpler to understand. Also, note that I've respected the natural chunking of the words "times" and "time". I haven't separated "time" from its plural specifier. Splitting the word would turn one bit of information into two, which the reader would have to rejoin mentally before fully understanding the code. Finally, notice that I've indented a line when declaring the displayText variable. The indentation creates a natural mental break for the reader. It makes the data simpler to chunk, and it is placed at a logical location. (The top line pertains to the format string, and the bottom line pertains to the objects to be inserted into the format string.)
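To try the chunked version on its own, here is a minimal, self-contained sketch. It assumes the names are passed in as plain strings rather than read from callee and caller objects; the CallSummary class, the Build method, and the sample names are hypothetical and exist only for illustration.

```csharp
using System;

public static class CallSummary
{
    // Hypothetical helper: the class name, method name, and parameters are
    // illustrative assumptions, not part of the original snippet.
    public static string Build(string calleeName, string callerName, int callCount)
    {
        // Chunk 1: the sentence template.
        string helloMessage = "Hello {0}, {1} has called you {2} {3} today";

        // Chunk 2: the whole word, singular or plural, kept in one piece.
        string timeText = "time";
        if (callCount != 1)
            timeText = "times";

        // Chunk 3: the template on the first line, the values on the second.
        return string.Format(helloMessage,
            calleeName, callerName, callCount, timeText);
    }

    public static void Main()
    {
        Console.WriteLine(Build("Ann", "Bob", 1)); // Hello Ann, Bob has called you 1 time today
        Console.WriteLine(Build("Ann", "Bob", 3)); // Hello Ann, Bob has called you 3 times today
    }
}
```

Each local variable can be inspected by itself in a debugger, which is one practical payoff of handing the reader ready-made chunks.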

In conclusion, while the one-line version of the code is less verbose and has fewer lines than the later version, it has to be grouped and chunked to be understood. Doing so can easily lead to fatigue or to misunderstanding the code. It's also harder to debug. So, in general, it's a better idea to write your code as if you're writing it for a five-year-old than to conglomerate and nest code statements ad infinitum in an attempt to "shorten" your code.

On a side note, you can avoid many of the pitfalls of long one-line statements by setting a larger font in your IDE and reducing the horizontal space available in your editor window. Doing this will naturally push you toward shorter, more easily digested code statements.