4 June 2007

C# Strings

by mo

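C# strings are an interesting topic. In C#, strings are immutable, which means that once you construct an instance of a string it can no longer be altered.

A statement like the following looks as though it constructs 3 different instances of strings (“Hello”, “Mo” and the combined result). That is what happens when the operands are variables, since every run-time concatenation allocates a new string. With two literals, as here, the compiler folds the constants into a single string.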

  string myString = "Hello" + "Mo";
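To make the difference between compile-time and run-time concatenation concrete, here is a minimal sketch. The class name ConcatDemo and the helper Join are made up for illustration; only the string operators and Object.ReferenceEquals come from the framework.

```csharp
using System;

// Hypothetical demo class, not part of any library.
public static class ConcatDemo
{
    // Both operands are constants, so the compiler can fold them into "HelloMo".
    public const string Folded = "Hello" + "Mo";

    // The operands are only known at run time, so this allocates a new string.
    public static string Join(string left, string right)
    {
        return left + right;
    }

    public static void Main()
    {
        string joined = Join("Hello", "Mo");
        Console.WriteLine(Folded == joined);                       // compares characters
        Console.WriteLine(object.ReferenceEquals(Folded, joined)); // compares instances
    }
}
```

The == operator on strings compares contents, while ReferenceEquals tells you whether two variables point at the very same object.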

Strings seem to be smarter in C# because of the intern table. The string intern table holds a reference to each string literal (and to any string passed to String.Intern), not to every string created at run time. If an instance of “Mo” has already been created, using the same literal again references that same instance from the intern table. If either variable referencing “Mo” is then assigned a different value, that variable simply points at a new string; the shared instance is never modified, so the other variable is not affected.

Coming from a C background, you would think that if two pointers reference the same area in memory and one pointer alters the value at that location, the value changes for both pointers. C# sidesteps this quite nicely: because strings are immutable, nothing can alter the shared instance in place.
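A rough sketch of what that looks like in practice. InternDemo and RebindSecond are hypothetical names I picked for the example; the point is only that “changing” one variable rebinds it to a new string and leaves the other alone.

```csharp
using System;

// Hypothetical demo class, not part of any library.
public static class InternDemo
{
    // "Changing" one variable rebinds it; the other still sees the original.
    public static string RebindSecond(string original)
    {
        string first = original;
        string second = first;              // two references, one instance
        second = second.ToUpperInvariant(); // returns a new string, original untouched
        return first + "/" + second;
    }

    public static void Main()
    {
        string a = "Mo";
        string b = "Mo";
        // Identical literals in one assembly normally share an interned instance.
        Console.WriteLine(object.ReferenceEquals(a, b));
        Console.WriteLine(RebindSecond("Mo"));
    }
}
```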

“” is often said to construct a new instance of a string object where String.Empty does not. As far as I can tell the literal is actually served from the intern table, so any overhead is quite small at most, but it’s something to be aware of. I prefer String.Empty, just because the sight of quotation marks kind of makes me wary. (Move it to a const or #define is what the C in me says!)

A static method that I think is underused is String.IsNullOrEmpty(). Instead of checking whether an instance of a string is equal to “”, use that static method; it also covers the null case.

Kind of sloppy code…

  if( myString != "" ){}

Better code…

  if( null != myString && myString.Length > 0 ){}

My preferred version…

  if( !String.IsNullOrEmpty( myString ) ){ }
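To see how the three checks differ, here is a small sketch that wraps each one in a method and runs it against null, an empty string and a normal string. The class and method names are invented for the example.

```csharp
using System;

// Hypothetical demo class, not part of any library.
public static class EmptyCheckDemo
{
    // Treats null as "has text", because null is not equal to "".
    public static bool SloppyHasText(string s) { return s != ""; }

    // Handles null explicitly before touching Length.
    public static bool ManualHasText(string s) { return null != s && s.Length > 0; }

    // Same result as the manual check, in one call.
    public static bool PreferredHasText(string s) { return !String.IsNullOrEmpty(s); }

    public static void Main()
    {
        string[] samples = { null, "", "Mo" };
        foreach (string s in samples)
        {
            Console.WriteLine("{0} {1} {2}",
                SloppyHasText(s), ManualHasText(s), PreferredHasText(s));
        }
    }
}
```

The sloppy version is the only one that lets a null string through, which is usually where the NullReferenceException shows up a few lines later.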

Carriage return and new line literals like “\r\n” should be replaced by Environment.NewLine. This allows you to write environment-agnostic code. In Unix environments the new line character is “\n”, whereas on Windows it is “\r\n”. As the .NET platform moves to Linux and Mac, this can come in quite handy. (Maybe some day… got Mono!?)
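A minimal sketch of the idea, assuming you are joining a handful of lines for output. NewLineDemo and JoinLines are hypothetical names; String.Join and Environment.NewLine are the real framework members.

```csharp
using System;

// Hypothetical demo class, not part of any library.
public static class NewLineDemo
{
    // Joins lines with the platform's line terminator instead of a hard-coded "\r\n".
    public static string JoinLines(params string[] lines)
    {
        return String.Join(Environment.NewLine, lines);
    }

    public static void Main()
    {
        Console.WriteLine(JoinLines("first", "second"));
    }
}
```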

Our good friend string literals… FxCop always screams at the sight of string literals in code. A great tool for refactoring literals into a resource file is the “Resource Refactoring Tool”, which can be found on CodePlex… check it out!

But I am still unsure as to when string literals should be refactored to a resource file and when it is OK to keep them. One thing I can say is this… please be aware that if you have hard-coded string literals, they are also hard-coded in the MSIL. So hard-coded user names and passwords are definitely a bad idea… (especially since anyone with access to your assembly can disassemble it back to IL).

All I really know about strings is that they are tricky. Some people say to use StringBuilder instead of concatenating strings. Some say StringBuilder is more efficient when concatenating strings in loops; others say it’s only more efficient once a certain number of concatenations are performed. Some say string literals are more efficient than string constants because of the string intern table. FxCop says that string literals should be refactored to resource files, which makes sense for localization and for strings that appear in the UI.

But all this kind of makes my head spin… It seems there is no single answer that fits every situation when it comes to strings. Being aware of how strings work is probably the most efficient way to use strings! My 2 cents…
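For what it’s worth, here is a small sketch of the two loop styles people argue about. It makes no claim about where the break-even point is; it only shows that both produce the same text while doing different amounts of allocation. BuilderDemo and its methods are names made up for the example.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of any library.
public static class BuilderDemo
{
    // Each += allocates a fresh string that copies everything built so far.
    public static string ConcatInLoop(int count)
    {
        string result = String.Empty;
        for (int i = 0; i < count; i++)
        {
            result += i + ",";
        }
        return result;
    }

    // StringBuilder appends into an internal buffer and builds one string at the end.
    public static string BuildInLoop(int count)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            builder.Append(i).Append(',');
        }
        return builder.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ConcatInLoop(5) == BuildInLoop(5));
    }
}
```

If you want real numbers for your own case, wrapping each method in a System.Diagnostics.Stopwatch with a large count is a reasonable first experiment.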