Wednesday, August 19, 2009

Understanding Escape Characters (For Beginners)

Someone recently asked a question on the forum about escape characters, and since this is a rather frequent question for C# beginners, I've chosen to reproduce the material here. If you already understand escape characters in string literals, you can stop reading now.

Let's say that I have to write a computer program that reads a string character by character. I've decided that I need to use double quotes to signify the beginning and the end of a string. Knowing this, I can represent the string

bob

in my code file by typing:

"bob"

Great, no problem. But what if I want to represent the following:

Bob says, "Hello."

Now I have a problem. The text I want to represent contains a quote character. So doing this won't work:

"Bob says, "Hello.""

That fails because the quote signifies the beginning and the end of the string, so the string would appear to end at the quote just before Hello. So what do I do? I use a special character that means "the next character should be considered part of the string, and should not signify the beginning or the end of the string." I'll make that character be a "\". So now, I can type the following to represent my desired string:

"Bob says, \"Hello.\""

Preceding the quote symbol with a backslash means that the quote is considered part of the string, not the end of the string. A quote without a preceding backslash signifies the beginning or the end of the string.
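To see this in a real C# file, here is a minimal sketch. The class name QuoteDemo and the helper Greeting are names I made up for illustration; only the string literal itself comes from the text above.

```csharp
using System;

static class QuoteDemo
{
    // Hypothetical helper: returns the example string, written with escaped quotes.
    public static string Greeting()
    {
        // Each \" is a quote that belongs to the string.
        // The first and last quotes mark where the literal begins and ends.
        return "Bob says, \"Hello.\"";
    }

    static void Main()
    {
        // Prints: Bob says, "Hello."
        Console.WriteLine(Greeting());
    }
}
```

The backslashes exist only in the source file. What gets printed is the text I wanted to represent in the first place.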

Now that I've done that, I have another problem. What if I want to represent the escape character (the "\" character)? For example, what if I have a string like the following:

C:\Program Files\Morton\

Because I've defined the backslash to mean that the next character is to be taken literally and not as a boundary of the value I'm trying to represent, how do I represent the "\" character itself?

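To do this, I follow that backslash with another backslash. The first backslash is the escape character and the second is the character to be taken literally, so the pair represents a single backslash in the string. So now, to represent the string, I do this: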

"C:\\Program Files\\Morton\\"

The above format is what the C# debugger will show when you hover over a particular string variable containing the text:

C:\Program Files\Morton\

In other words, it shows the value as it would need to be written in the source file, and not as it actually is.
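As a quick check, the path example can be written like this. It is only a sketch, and PathDemo and InstallPath are names I invented for it. Printing to the console shows the actual value, as opposed to the escaped form that has to appear in the source file.

```csharp
using System;

static class PathDemo
{
    // Hypothetical helper: the path from the text, with every backslash doubled.
    public static string InstallPath()
    {
        return "C:\\Program Files\\Morton\\";
    }

    static void Main()
    {
        // Prints: C:\Program Files\Morton\
        Console.WriteLine(InstallPath());
    }
}
```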

So what about when you need to represent the following text:

When two backslashes (\\) are together, it means we want to represent a single "\" character.

How would you format that in the source code? Like this:

"When two backslashes (\\\\) are together, it means we want to represent a single \"\\\" character."

Confusing, I know, but the idea is still the same. The actual text I want to represent contains two backslashes in succession, so to represent that, I have to put four backslashes in succession. Also, note how the characters following "single" and before "character" are represented: by a backslash, quote, backslash, backslash, backslash, quote, in that order.
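If you want to convince yourself, a small sketch like the following prints the sentence back out. SentenceDemo and Sentence are made-up names; the literal is the one shown above.

```csharp
using System;

static class SentenceDemo
{
    // Hypothetical helper: the sentence from the text, fully escaped.
    public static string Sentence()
    {
        // \\\\ becomes two backslashes; \"\\\" becomes quote, backslash, quote.
        return "When two backslashes (\\\\) are together, it means we want to represent a single \"\\\" character.";
    }

    static void Main()
    {
        // Prints: When two backslashes (\\) are together, it means we want to represent a single "\" character.
        Console.WriteLine(Sentence());
    }
}
```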

This is what it means to escape a string. Though strings appear escaped in the debugger, this does not add to their length. The runtime stores an escaped character as one character, not two. In other words, the escape character (\) is not counted when calculating the length of an escaped sequence such as (\\).
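The Length property makes this easy to check. Here is a minimal sketch; LengthDemo and its three properties are names I chose for illustration.

```csharp
using System;

static class LengthDemo
{
    // Two characters in the source, one character in the string.
    public static int BackslashLength { get { return "\\".Length; } }
    public static int QuoteLength { get { return "\"".Length; } }

    // 27 characters between the quotes in the source, 24 in the string.
    public static int PathLength { get { return "C:\\Program Files\\Morton\\".Length; } }

    static void Main()
    {
        Console.WriteLine(BackslashLength); // 1
        Console.WriteLine(QuoteLength);     // 1
        Console.WriteLine(PathLength);      // 24
    }
}
```

The three escape backslashes in the path literal account for the difference between 27 and 24.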

This is how regular string literals are represented in C#. Escaping is needed so that the compiler, which reads a string literal character by character, can understand the programmer's intention: that they mean to enter a literal quote character rather than end the string. I hope this helps you understand the concept.