Everything You Need To Know About .NET Performance Optimization

In performance-critical applications, every microsecond counts, and wasted time quickly adds up into bottlenecks. We’ll discuss some general programming performance tips, as well as .NET-specific best practices and optimizations.

How to Time Your Code

The most basic way to measure how long a function takes is with a Stopwatch. This class takes very precise measurements of code when debugging, and can even be used programmatically for things like setting timeouts.

Stopwatch watch = System.Diagnostics.Stopwatch.StartNew();

// the code that you want to measure
watch.Stop();
double elapsedMs = watch.Elapsed.TotalMilliseconds;

While it’s pretty precise, it can lead to some unexpected results—it doesn’t factor out “background noise,” such as .NET running JIT compilation of a function, which can often lead to the very first execution being longer. You should take an average over many runs to get more accurate results.

For a better debugging experience though, you should use a performance profiler. Visual Studio has some great built-in debugging tools that can even display graphs showing which functions are taking the longest.

You can also use a performance profiler to debug memory usage. Simply click “Take Snapshot,” and you can view all objects currently allocated.

Check Your Loops

This one is a little obvious, but it’s necessary to mention. If you’re writing a function that’s only called a few times per minute, it doesn’t matter much if it takes 5 microseconds or 5 milliseconds. It’s likely not a bottleneck. (Though this can definitely lead to bad code in unimportant spots going unchecked, and you should still write good code in principle.)

However, if you’re doing something many times over, on every tick, or on every request, the standards are immediately different. Even small improvements matter much more, because over 10,000 loop cycles they add up quickly.

Another similar scenario is functions with unbounded input sizes. For example, a function might run well with a small string or a small list, but pass it a List with 10,000 elements, and a poorly optimized algorithm can reveal itself.

Keep Big O Notation in Mind

In general, a computer’s performance is limited by one of two things: CPU speed or memory performance. If you can reduce the number of CPU cycles something takes, or how much memory it uses, you can usually speed it up.

Big O notation is used to categorize the speed and complexity of algorithms. You can think of it as a graph, with the size of the input (like the number of list elements) on the x axis, and execution time on the y axis.

The real world is obviously very messy, so rather than trying to describe the exact execution time, Big O notation represents the shape of the curve itself. You can quickly see where something like O(N^2), or quadratic time, would be an issue; with 100 list elements it might run fine, but with 10,000 list elements, it could grind to a halt.

Big O notation isn’t just for execution time though—it can also be used to describe other resource usage like disk space and memory footprint.

In .NET, one of the biggest algorithmic improvements you can make is replacing iterative list lookups with hash-based data structures like Dictionary and HashSet, rather than searching through big lists one element at a time.

For example, the code below loops over every Person object in a big list, checking if any are named Steve. For small lists this isn’t a big deal, but the query resolves in O(N), or linear time, and the cost grows with the list, especially if you do this lookup often.
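A minimal sketch of that lookup, assuming a Person class with a FirstName property:

Person match = null;
foreach (Person person in people)
{
    // O(N): every element may need to be checked.
    if (person.FirstName == "Steve")
    {
        match = person;
        break;
    }
}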

If you’re commonly looking up by first name, you can instead use a Dictionary, which stores values indexed by the hash of a unique key. In this case, the key is each person’s first name, though you’ll need to build the dictionary manually and keep it in sync with any other collections you maintain.
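A sketch of that index, again assuming the hypothetical Person type:

// Build the index once; it must be updated whenever the list changes.
Dictionary<string, Person> byFirstName = new Dictionary<string, Person>();
foreach (Person person in people)
{
    byFirstName[person.FirstName] = person;
}

// Later: an O(1) hash lookup instead of a linear scan.
byFirstName.TryGetValue("Steve", out Person steve);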

Looking up a Dictionary value is an O(1), or constant time, operation. No matter the size of the collection, a lookup costs the same: one hash and one memory access. The same applies to direct array lookups, like array[i], which is just a direct memory access.

Another big performance hog is .Contains(), which checks if a collection contains something. On a List<T>, this runs in linear time. If the List you’re checking is large, or you’re checking it many times, you may want to use a HashSet. A HashSet is a set of unique items that stores a hash alongside each entry, basically like a Dictionary without a separate value. Doing .Contains() on a HashSet runs in constant time.
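A quick sketch of the difference:

List<string> nameList = new List<string> { "Alice", "Bob", "Steve" };
bool slow = nameList.Contains("Steve"); // O(N): walks the list

HashSet<string> nameSet = new HashSet<string>(nameList);
bool fast = nameSet.Contains("Steve");  // O(1): one hash and lookup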

Lastly, LINQ queries use deferred execution. For example, .Where() checks each element to see if it matches a predicate. But it doesn’t return a new list; it returns an IEnumerable, which defers execution until it’s iterated over or materialized back into a list with .ToList().

This means that if you chain two queries, you’re not doing two iterations; both checks run in a single pass, which removes the overhead of the extra loop.
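As a sketch, assuming the Person type from earlier also has an Age property:

IEnumerable<Person> query = people
    .Where(p => p.FirstName == "Steve")
    .Where(p => p.Age > 30); // nothing has been iterated yet

// One pass over people; both predicates run per element.
List<Person> steves = query.ToList();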

RELATED: How Do Enumerators and Enumerables Work in C#?

Reduce Your Garbage, Avoid Unneeded Allocations

C# and other .NET languages use a garbage collector, which periodically scans memory and cleans up objects that no longer have references. This is fantastic for developers, as it enables them to write simpler code much more quickly.

But, because there’s no such thing as a free lunch, it comes at a cost: performance. When the GC runs, it must pause program execution, which hurts performance. However, just because C# is garbage collected doesn’t mean you should go around making unnecessary garbage. Anything you can do to relieve pressure on the GC and trigger collections less often will lead to better performance.

But what is garbage? In .NET, there are two kinds of types: value types and reference types. Value types have fixed sizes, like int, bool, float, and other fixed-size structs. They’re stored on the stack, a region of memory that’s very fast to allocate from.

The stack isn’t fast because it’s special memory; it’s fast because it’s a Last-In, First-Out (LIFO) data structure that’s trivially managed. Variables you allocate essentially stack on top of each other in a bucket. When the variables go out of scope, like when the function exits, your computer pops items off the top one by one until the stack is clean again. This makes cleanup very cheap. (Though while stack allocations are cheap, they’re not entirely free.)

The other kind of value is stored by reference, and is used for any type that doesn’t have a fixed size, like List and string. The variable holding the reference lives on the stack, but the underlying memory lives on the heap.

Compared to the stack, the heap is an unmanaged mosh pit of allocations. Not only is it much harder to clean up, requiring a garbage collector, but allocating objects on it is also more expensive. It’s still quite fast in absolute terms, but a lot slower than a stack allocation.

Every time you initialize something with the new keyword, keep in mind that you’re creating memory that has to be cleaned up later. For example, the following code only uses one variable, but because you’re allocating two new objects, the first one gets thrown away.

Fruit apple = new Fruit();
apple = new Fruit();

When you reset this variable, you’re really just setting it to a different memory address. The garbage collector will realize you don’t care about the first address anymore, but it can be easy not to care when you’re not the one cleaning up.

Pool Expensive Objects When Possible

Object pooling is basically like recycling instead of just throwing things away, and it can lead to some big speedups when doing a lot of allocations in a loop.

For example, the following code runs 10,000 times, and leaves 10,000 garbage lists at the end.
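A minimal sketch of that pattern:

for (int i = 0; i < 10000; i++)
{
    List<int> list = new List<int>();
    // fill and process the list...
    // it becomes garbage as soon as this iteration ends
}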

The point here is that this function only needs one list’s worth of memory, but over its lifetime it allocates 10,000 times more than that.

With an object pool, rather than allocating a new object directly, you request one from the pool. The pool may create a new object if it doesn’t have one available. Then, when you’re done with it, you release that object back to the pool.

Rather than chucking the object in the garbage, the object pool keeps it allocated but clears out its data. The next time you request a new object, it returns an old, now-empty one. For large objects like Lists, this is much easier on the underlying memory.

Another, more general implementation of this concept is simply clearing out large lists rather than allocating new ones, as shown below. This achieves the same effect of optimal memory usage, without explicitly using an object pool.
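A sketch of that simpler approach, reusing the loop from above:

List<int> list = new List<int>();
for (int i = 0; i < 10000; i++)
{
    list.Clear(); // keeps the backing array; no new allocation
    // fill and process the list...
}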

If you want to use object pools, Microsoft offers an implementation in Microsoft.Extensions.ObjectPool. If you’d rather implement it yourself, you can open up the source of DefaultObjectPool to see how it works. Generally, you’ll have a separate object pool for each type, with a maximum number of objects to keep in the pool.
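As a rough sketch of what using the Microsoft pool looks like, with a hypothetical policy for pooling List<int> objects:

using Microsoft.Extensions.ObjectPool;

// Policy: how to create a pooled list, and how to reset one on return.
class ListPolicy : PooledObjectPolicy<List<int>>
{
    public override List<int> Create() => new List<int>();

    public override bool Return(List<int> list)
    {
        list.Clear();   // wipe the data, keep the allocation
        return true;    // true = accept the object back into the pool
    }
}

// Usage:
ObjectPool<List<int>> pool = new DefaultObjectPoolProvider().Create(new ListPolicy());

List<int> list = pool.Get();   // reuses a previously returned list if available
// fill and process the list...
pool.Return(list);             // recycle instead of discarding

The policy’s Return method decides whether a returned object is fit for reuse; clearing the list keeps its backing array allocated for the next caller.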

JSON Serialization Tips

JSON serialization and deserialization are often a major performance bottleneck. After all, they work with gigantic strings and huge chunks of memory. Luckily, there are a few tricks to speed things up.

Deserialize From Streams

If you’re working with large pieces of JSON, particularly anything over 85KB, you’ll want to deserialize from a stream. That’s because any string larger than 85KB must be stored on the large object heap, which is bad for performance. When you deserialize from a stream instead, this doesn’t matter, because only a small piece is read at a time.

HttpClient client = new HttpClient();

using (Stream s = client.GetStreamAsync("http://www.test.com/large.json").Result)
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
    JsonSerializer serializer = new JsonSerializer();
    Person p = serializer.Deserialize<Person>(reader);
}
Manual Serialization

Beyond working with a lot of memory, JSON serialization is slow for another reason: reflection. To deserialize a generic object into a given type, most JSON converters must inspect the structure of that type, which is a slow operation.

However, in performance-critical scenarios, you can drastically speed it up by manually serializing each field, bypassing the overhead of reflection-based serialization. In some cases, this can be up to twice as fast as standard serialization.

public static string ToJson(this Person p)
{
    StringWriter sw = new StringWriter();
    JsonTextWriter writer = new JsonTextWriter(sw);

    writer.WriteStartObject();

    writer.WritePropertyName("name");
    writer.WriteValue(p.Name);

    writer.WritePropertyName("likes");
    writer.WriteStartArray();
    foreach (string like in p.Likes)
    {
        writer.WriteValue(like);
    }
    writer.WriteEndArray();

    writer.WriteEndObject();

    return sw.ToString();
}
Consider Using Jil

Jil is a .NET JSON serializer/deserializer from Stack Exchange, built with a number of “somewhat crazy optimization tricks.” It’s over twice as fast at serialization as Newtonsoft.Json, and nearly twice as fast at deserialization (with the tradeoff of higher memory usage). If you’re interested, you can read all about its performance optimizations on the project’s GitHub page.

Use StringBuilder for Mutable Strings

Strings are essentially character arrays in disguise, and they’re immutable on top of that. It’s pretty easy to forget both facts when working with them, which can lead to significantly more memory allocations than necessary.

For example, take the following code that works with a couple strings:

string first = "Hello, ";
first += "World";

The intuitive understanding is that this simply adds “World” to the end of the string, and changes the value of the variable. But what it actually does is make an entirely new string, with a separate chunk of memory, and throw away the first string. The first variable is changed to point to a new memory address.

It quickly becomes clear that doing this many times in a loop is terrible for performance. The solution is StringBuilder, which stores the string in a more List-like structure. This allows you to directly modify the string itself, which saves a ton of memory (and memory allocation time) in the process.

To use one, create a new StringBuilder(), which you can initialize with a base string and a preallocated capacity in characters. You can then append other strings or chars to it, and when you want an actual string, call .ToString() on it.

static void Main()
{
    // Create a StringBuilder initialized with "Hello, "
    // and capacity for 50 characters.
    StringBuilder sb = new StringBuilder("Hello, ", 50);
    sb.Append("World");
    Console.WriteLine(sb.ToString());
}

Use stackalloc When Applicable

stackalloc is a feature that allocates a block of memory on the stack, which is much faster than a heap allocation and produces no garbage. It previously required the use of pointers in an unsafe context, but since C# 7.2, you can use it with Span<T> safely:

Span<byte> data = stackalloc byte[256];

Of course, with great power comes great responsibility, and stackalloc should only be used very sparingly in specific scenarios. If you’re interested, you can read this article on the dos and don’ts, but the general rule is to use it only for small, fixed-size local buffers that go out of scope quickly when the function exits. You should never use it with variable allocation lengths, or in a variable-length loop.

Use Tasks and Coroutines for Lengthy Functions

Sometimes, you can’t get around doing expensive calculations. However, you can minimize the effect these have on the end user. If you’re building a desktop application, waiting for long actions like web requests must be done on a separate thread, or else you’ll cause the entire application to hang.

This is usually done with Tasks, which are queued to the thread pool once started, and can await other operations, like fetching a web request or reading from a database. Once the Task completes, its result can be accessed from the main thread.

RELATED: How Do Tasks Work In C#? Async/Background Threads

A big benefit of Tasks is that because they’re on a different thread, they can be processed concurrently. For example, you can start doing JSON deserialization on a different thread, then do some other processing on the main thread, and await the JSON result once you need it. For some applications, you can multithread the process, allowing it to make use of a CPU’s full power.
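As a sketch, assuming Newtonsoft.Json, the Person type from earlier, and a hypothetical DoOtherWork method:

// Inside an async method: kick off deserialization on a thread pool thread.
Task<Person> parseTask = Task.Run(() => JsonConvert.DeserializeObject<Person>(json));

DoOtherWork(); // hypothetical work that keeps the main thread busy meanwhile

Person person = await parseTask; // wait only when the result is actually needed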

Another useful C# feature is coroutines built on IEnumerators. Unlike Tasks, these are meant to be run on the main thread, but called once per frame to do some processing. For example, in video games the framerate matters a lot, so loading a large list of objects in one go can cause unpleasant stuttering. Using a coroutine, you can configure it to only process X items per frame and then yield to the next frame.
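A sketch of that pattern, assuming a hypothetical Item type and Process method, plus a driver that calls MoveNext() once per frame (in Unity, StartCoroutine plays that role):

IEnumerator LoadItems(List<Item> items, int itemsPerFrame)
{
    for (int i = 0; i < items.Count; i++)
    {
        Process(items[i]); // hypothetical per-item work

        if ((i + 1) % itemsPerFrame == 0)
        {
            yield return null; // hand the frame back; resume next frame
        }
    }
}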

RELATED: How Do Enumerators and Enumerables Work in C#?

Again, neither of these techniques makes your code any faster (unless you use Tasks for concurrent processing), but they can make slow code far less noticeable to the user.

Avoid Unnecessary Boxing and Unboxing

object is the ultimate base class in .NET. Even value types, like int and bool, are objects. Because of this, methods and classes that accept object can be very useful for polymorphism. But, in most cases, you should instead look to use generic type parameters, to avoid unnecessary boxing.

Boxing is what happens when you pass a value type to something that accepts object: the value is copied to the heap, which costs a heap allocation and a memory copy. This is obviously less performant, so any time you can get away with using generic type parameters and avoid boxing, you should.
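A small sketch of the difference:

object boxed = 42;          // boxing: the int is copied to the heap
int unboxed = (int)boxed;   // unboxing: the value is copied back out

static T Identity<T>(T value) => value;
int same = Identity(42);    // generic: no boxing, no heap allocation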

In particular, this includes ArrayList, an older collection type that boxes every value type it stores. It’s been phased out in favor of the more performant List<T>, which you should definitely use instead.

Optimize 2D List Iterations For Cache Prefetching

This one’s a bit weird, but for two-dimensional lists, the order of iteration actually matters. For example, consider a 3x4 array, like so:

[1 2 3 4],
[5 6 7 8],
[9 1 2 3]

You would access each element first by giving it a Y value, then by giving it an X value.

list[y][x]

The problem is, .NET stores this array row by row, left to right, in one contiguous block. If you were to iterate over Y in your inner loop (i.e., moving down each column), you’d be jumping across a large chunk of memory on every single access.

Because modern CPUs make heavy use of cache prefetching (basically, fetching the next 64 contiguous bytes before they’re actually requested) to improve performance, you’ll want to iterate over the list in the order it’s stored in memory. The difference can be drastic: for large lists, iterating the wrong way can be over 10x slower.
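A sketch of both orders, using a jagged array indexed like the example above:

int[][] grid = new int[1000][];
for (int y = 0; y < 1000; y++)
    grid[y] = new int[1000];

long sum = 0;

// Cache-friendly: the inner loop walks each row's contiguous memory.
for (int y = 0; y < 1000; y++)
    for (int x = 0; x < 1000; x++)
        sum += grid[y][x];

// Cache-hostile: the inner loop jumps to a different row on every step.
for (int x = 0; x < 1000; x++)
    for (int y = 0; y < 1000; y++)
        sum += grid[y][x];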