Allocation-free code in .NET

Introduction
01 — Stack vs Heap
02 — Structs & Value Types
03 — Span<T> and Memory<T>
04 — ArrayPool<T> and ObjectPool<T>
05 — stackalloc
06 — String Allocations
07 — Boxing
08 — Collections
09 — Closures & Delegates
10 — async/await and ValueTask
11 — Tooling & Measurement
12 — P/Invoke with [LibraryImport]
13 — Sequential Structs & Memory Layout
Quick Reference

Introduction

After 20 years of writing enterprise .NET applications, I’ve started exploring something completely different — game development. I’m not sure whether it’s a mid-career crisis, the sense that AI is taking over more and more of our development freedom, or just a bit of nostalgia that every developer eventually gets, but I’ve always been curious about what goes on underneath all the abstraction layers I’ve been comfortable with for so long.

I started by learning C. Proper, manual, “you asked for this memory and now you’re responsible for it” C. I wanted to understand what a pointer actually is, what stack and heap mean at the hardware level rather than just as concepts in a .NET book, and why old-school game developers treated memory as a precious resource rather than something the garbage collector would sort out eventually. It turns out that once you’ve manually freed memory in C a few times, the GC starts looking less like a convenience and more like something of a burden on the application.

Once I got a feel for that, I started thinking about C#. I know C# deeply, arguably better than most things in my life at this point, but I’ve never had to think about memory with the same discipline that game programming demands. A business application that allocates a few extra objects and triggers an occasional GC pause is fine. Code readability, unit test coverage, business logic and code structure have always mattered more than a manually optimised set of nested loops. A game loop running at 60 frames per second, where a single Gen 2 collection can cost 30 ms, is most definitely not fine.

I want to explore what C# can do in this space. Libraries like Raylib and SDL3 have C# bindings, and the idea of building a small game in a language I already know well — while calling into native .dylib files on macOS through P/Invoke — is genuinely exciting. C# also gives me access to unsafe code and direct pointer manipulation when I need it, which is more capable than most developers who’ve only ever written web APIs would expect. The runtime gets out of your way if you ask it nicely enough.

This article is essentially a reference I’ve written for myself. When I eventually start building something with Raylib or SDL3 in C#, I want all of this close at hand. The rest of this site is mostly enterprise .NET — cloud-native APIs, event-driven systems, PostgreSQL — so this is a bit of a departure. But after 20 years in the industry, I think I’m entitled to a side quest.

01 — Stack vs Heap

Stack allocation is effectively a single CPU instruction: the method prologue moves the stack pointer. Heap allocation requires the CLR to find space, write object headers, and eventually run a collection to reclaim it. In a 60 fps game loop with a budget of roughly 16.7 ms per frame, a Gen 2 GC pause can cost 5–30 ms on its own.

Rule: target zero heap allocations per game-loop tick. Allocate freely at startup and on scene load.

// Stack — free, auto-reclaimed at end of method scope
int frameCount = 0;
float deltaTime = 0.016f;

// Heap — GC must eventually collect this
var message = new StringBuilder();  // ← avoid in Update()

// Value type on stack, no heap involvement
Vector3 velocity = new Vector3(1f, 0f, 0f);

02 — Structs & Value Types

Keep structs small (≤16–24 bytes). Mark them readonly to stop the compiler emitting hidden defensive copies when passing with in. A defensive copy is what the compiler produces when it can’t guarantee a method won’t mutate the struct — it silently duplicates the value first to protect the caller’s original, which costs you a copy you never asked for. Use ref struct to enforce stack-only lifetime — the type literally cannot escape to the heap.

// Without readonly: calling a method or property on an 'in' ColorRgba makes the
// compiler emit a silent defensive copy first. It is a hidden copy, not a heap alloc.
public readonly struct ColorRgba
{
    public readonly byte R, G, B, A;
    public ColorRgba(byte r, byte g, byte b, byte a = 255) => (R, G, B, A) = (r, g, b, a);
}

// Pass by readonly reference — no copy, no alloc
void DrawSprite(int textureId, in ColorRgba tint) => _renderer.Submit(textureId, tint);

// ref struct: the compiler forbids storing it in a class field or a List<T>,
// boxing it, or keeping it alive across an await.
// Use it for transient per-frame data that must not outlive the current scope.
ref struct SpriteBatch
{
    public ReadOnlySpan<Sprite> Sprites;
    public ReadOnlySpan<Matrix3x2> Transforms;
    public float Alpha;
}
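
The defensive copy described above is easy to observe with a struct that is deliberately not readonly. The following is a minimal sketch; Counter and DefensiveCopyDemo are hypothetical names used only for illustration. When a mutating method is called through an in parameter, the mutation lands on the compiler's hidden copy and the original is left untouched.

```csharp
using System;

// Hypothetical struct, deliberately NOT marked readonly.
struct Counter
{
    public int Value;
    public void Increment() => Value++;
}

static class DefensiveCopyDemo
{
    // 'in' promises the callee will not mutate c. Because Increment() might,
    // the compiler invokes it on a hidden temporary copy.
    public static int IncrementViaIn(in Counter c)
    {
        c.Increment();   // mutates the temporary copy
        return c.Value;  // reads the untouched original
    }

    // 'ref' allows mutation, so no copy is made.
    public static int IncrementViaRef(ref Counter c)
    {
        c.Increment();
        return c.Value;
    }
}
```

Marking the struct (or the individual method) readonly removes the need for the copy, because the compiler can then prove the call cannot mutate the value.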

03 — Span<T> and Memory<T>

Span<T> is a pointer + length — a stack-only view over any contiguous memory. Use it instead of allocating sub-arrays or substrings. Memory<T> is the async-safe, heap-storable version; call .Span at the point of use.

byte[] packetBuffer = new byte[1024];

// Slice the buffer without any copy or allocation
Span<byte> header  = packetBuffer.AsSpan(0,   20);
Span<byte> payload = packetBuffer.AsSpan(20, 512);
Span<byte> trailer = packetBuffer.AsSpan(532);

// All three point into the same backing array — zero extra bytes.
ProcessHeader(header);
Decompress(payload, out int written);
ValidateChecksum(trailer);

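// Span cannot cross async boundaries; use Memory<T> instead.
// Keep the IMemoryOwner so the rented block returns to the pool on dispose.
using IMemoryOwner<byte> owner = _sharedPool.Rent(4096);  // _sharedPool: a MemoryPool<byte>
Memory<byte> buffer = owner.Memory;
await socket.ReceiveAsync(buffer, token);

// Materialise a Span only at the point of use, after the await
int messageLength = BitConverter.ToInt32(buffer.Span[..4]);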

04 — ArrayPool<T> and ObjectPool<T>

Never allocate temporary arrays inside a loop. Rent from ArrayPool<T>.Shared instead; it keeps per-thread caches of arrays bucketed by power-of-two sizes. Always return in a finally block. For heavyweight objects like parsers, use ObjectPool<T> from the Microsoft.Extensions.ObjectPool package.

// Called every frame — must not allocate
void UpdateParticles(ReadOnlySpan<Particle> particles)
{
    var alive = ArrayPool<Particle>.Shared.Rent(particles.Length);
    try
    {
        int count = 0;
        foreach (ref readonly var p in particles)
            if (p.Lifetime > 0f) alive[count++] = p;

        RenderBatch(alive.AsSpan(0, count));
    }
    finally
    {
        ArrayPool<Particle>.Shared.Return(alive);
    }
}

Gotcha: the rented array may be larger than requested. Always slice to your actual length before use — alive.AsSpan(0, count), not alive.AsSpan().
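
ObjectPool<T> lives in a separate NuGet package, so to show the idea without that dependency here is a hand-rolled sketch. SimplePool<T> is a hypothetical type, not the library's API, and it is not thread-safe. It only illustrates the rent/return cycle that keeps heavyweight objects out of the per-frame allocation path.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal pool for illustration only.
// Not thread-safe and NOT Microsoft.Extensions.ObjectPool.
sealed class SimplePool<T> where T : class
{
    private readonly Stack<T> _items;
    private readonly Func<T> _factory;

    public SimplePool(Func<T> factory, int prewarm)
    {
        _factory = factory;
        _items = new Stack<T>(prewarm);

        // Pay the allocation cost up front, e.g. at scene load.
        for (int i = 0; i < prewarm; i++)
            _items.Push(factory());
    }

    public int Available => _items.Count;

    // Falls back to the factory (and therefore allocates) only when empty.
    public T Rent() => _items.Count > 0 ? _items.Pop() : _factory();

    // The caller is responsible for resetting the object's state first.
    public void Return(T item) => _items.Push(item);
}
```

Pre-warming at startup fits the rule from section 01: allocate freely on load, then reuse during the game loop.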

05 — stackalloc

Allocates directly on the stack and returns a Span<T>. Zero GC involvement. Use a threshold and fall back to ArrayPool for variable or large sizes. The default main-thread stack is 1 MB on Windows and 8 MB on Linux and macOS, but worker threads (including .NET ThreadPool threads) can get as little as 512 KB, which is the tightest limit you’ll hit in practice.

const int StackLimit = 512;

byte[]? rented = null;
Span<byte> scratch = size <= StackLimit
    ? stackalloc byte[StackLimit]
    : (rented = ArrayPool<byte>.Shared.Rent(size));

try
{
    SerialiseComponent(component, scratch[..size]);
    _socket.Send(scratch[..size]);
}
finally
{
    if (rented is not null)
        ArrayPool<byte>.Shared.Return(rented);
}

06 — String Allocations

Every Substring, + concatenation, and string.Format call creates a new heap object. In hot paths, avoid creating strings entirely. Use span-based methods for comparisons and parsing.

ReadOnlySpan<char> command = input.AsSpan();

// Parse verb without allocating a new string.
// The length check avoids an out-of-range slice on short input.
bool isMove = command.Length >= 4 && command[..4].SequenceEqual("MOVE");

// Split on ':' and read the value side — no allocation
int colon = command.IndexOf(':');
ReadOnlySpan<char> value = command[(colon + 1)..];

// Parse an integer directly from the span
if (int.TryParse(value, out int entityId))
    MoveEntity(entityId);

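// When you must return a string, allocate once and fill via Span.
// Avoids StringBuilder or intermediate concatenation strings.
// Fixed width: "SCN:" + 3-digit scene + ':' + 4-digit entity = 12 chars
// (assumes sceneId is 0..999 and entityId is 0..9999).
string BuildEntityTag(int sceneId, int entityId) =>
    string.Create(12, (sceneId, entityId), static (buf, s) =>
    {
        "SCN:".AsSpan().CopyTo(buf);
        s.sceneId.TryFormat(buf[4..7], out _, "D3");   // zero-padded to 3 digits
        buf[7] = ':';
        s.entityId.TryFormat(buf[8..], out _, "D4");   // zero-padded to 4 digits
    });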
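
For text that never needs to become a string at all (a debug overlay, for instance), .NET 6+ can format an interpolated string straight into a caller-supplied buffer. The following is a small sketch using MemoryExtensions.TryWrite; HudText and FormatFps are hypothetical names, and the idea that the renderer accepts a ReadOnlySpan<char> is an assumption.

```csharp
using System;

static class HudText
{
    // Writes "FPS: <n>" into the destination buffer without creating a string.
    // Returns the number of chars written, or -1 if the buffer is too small.
    public static int FormatFps(Span<char> destination, int fps)
    {
        return destination.TryWrite($"FPS: {fps}", out int written)
            ? written
            : -1;
    }
}
```

A caller would typically pass a stackalloc char[32] buffer and hand buffer[..written] to the text renderer.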

07 — Boxing

Boxing wraps a value type inside a heap object. It’s invisible in source but ruins hot paths. The two most common traps: casting a struct to an interface, and passing value types to non-generic APIs.

interface IUpdatable { void Update(float dt); }

struct PhysicsBody : IUpdatable
{
    public void Update(float dt) { /* integrate velocity */ }
}

// ❌ Boxes PhysicsBody on every call — new heap object each frame
void Tick(IUpdatable u, float dt) => u.Update(dt);

// ✅ JIT generates specialised code per T — zero boxing
void Tick<T>(ref T u, float dt) where T : struct, IUpdatable
    => u.Update(dt);

// ❌ on older .NET Framework, enum keys can box on every lookup (not an issue on .NET 5+)
var map = new Dictionary<ComponentType, ISystem>();

// ❌ non-generic string.Format boxes every value arg
var log = string.Format("pos={0} vel={1}", pos, vel);

// ✅ interpolation on .NET 6+ uses DefaultInterpolatedStringHandler,
//    which formats value-type args without boxing (the result string still allocates)
var log2 = $"pos={pos} vel={vel}";

08 — Collections

Pre-size your collections. Every resize copies the backing array. Avoid LINQ in loops — it allocates enumerators and state machines. Use CollectionsMarshal.AsSpan() to get a zero-copy span over a List<T>.

// Pre-size to avoid resize allocations during the game loop
var visibleEntities = new List<Entity>(capacity: 512);

// Iterate via Span — no enumerator allocation, cache-friendly
Span<Entity> span = CollectionsMarshal.AsSpan(visibleEntities);
for (int i = 0; i < span.Length; i++)
    span[i].Render();

// ❌ LINQ allocates Where enumerator + state machine
foreach (var e in entities.Where(e => e.IsActive))
    e.Update(dt);

// ✅ Zero allocation — iterate directly on List or array
Span<Entity> all = CollectionsMarshal.AsSpan(entities);
foreach (ref var e in all)
    if (e.IsActive) e.Update(dt);
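
One detail worth spelling out: when the list holds structs, the span gives you references to the elements themselves, so in-place mutation sticks. The sketch below assumes a hypothetical Entity struct with just two fields (the Entity used elsewhere in this article is not defined, so this shape is an assumption).

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical struct for illustration.
struct Entity
{
    public bool IsActive;
    public float Health;
}

static class EntityOps
{
    // Mutates the list's elements in place through the span.
    // No enumerator object and no per-element copy.
    public static void DamageAll(List<Entity> entities, float amount)
    {
        Span<Entity> span = CollectionsMarshal.AsSpan(entities);
        foreach (ref Entity e in span)
        {
            if (e.IsActive)
                e.Health -= amount;
        }
    }
}
```

The span points at the list's current backing array, so do not add or remove items while it is in use; a resize would leave the span looking at a stale array.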

09 — Closures & Delegates

Every lambda that captures a local variable generates a hidden class — and allocates it. Use static lambdas to prevent capture, cache delegate instances in fields, or use generic overloads that accept a state parameter.

// ❌ captures 'threshold' — allocates a closure object each call
float threshold = 100f;
entities.RemoveAll(e => e.Health < threshold);

// ✅ static lambda — cannot capture, cached as a single delegate
entities.RemoveAll(static e => e.Health <= 0f);

// ✅ cache event-handler delegates — allocate once in constructor
sealed class InputSystem
{
    private readonly Action<KeyEvent> _onKey;

    public InputSystem() { _onKey = HandleKey; }   // alloc once
    public void Subscribe() => Bus.On(_onKey);      // no alloc
    private void HandleKey(KeyEvent e) { /* ... */ }
}
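
The third option, a generic overload that accepts a state parameter, can be sketched as follows. RemoveWhere is a hypothetical extension method, not part of the BCL. The captured value is passed explicitly as an argument, so the lambda can be static and no closure object is created.

```csharp
using System;
using System.Collections.Generic;

static class ListStateExtensions
{
    // Hypothetical helper: removes every item for which predicate(item, state)
    // is true and returns the number of items removed.
    public static int RemoveWhere<T, TState>(
        this List<T> list, TState state, Func<T, TState, bool> predicate)
    {
        int write = 0;
        for (int read = 0; read < list.Count; read++)
        {
            T item = list[read];
            if (!predicate(item, state))
                list[write++] = item;   // compact the survivors to the front
        }

        int removed = list.Count - write;
        list.RemoveRange(write, removed);
        return removed;
    }
}
```

Usage would look like entities.RemoveWhere(threshold, static (e, limit) => e.Health < limit), which keeps the threshold variable without capturing it.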

10 — async/await and ValueTask

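Each async method compiles into a state machine (a struct in Release builds). If a method returning a non-generic Task completes synchronously, the runtime hands back a cached Task and nothing is allocated. Once the method actually suspends, the state machine is moved to the heap.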

Task<T> is a heap-allocated reference type. Apart from a handful of cached results (true and false, for example), every time you return one you allocate an object, even if the result was already available. That’s fine for operations that genuinely go to the network or disk, where the allocation cost is negligible compared to the I/O wait. It’s not fine for methods that are called thousands of times per second and usually have data ready immediately: a hot cache read, a ring buffer, a pre-filled queue.

ValueTask is a struct that wraps either the result directly or a Task as a fallback. If the method completes synchronously, you return new ValueTask<T>(result) and nothing is allocated. If it actually has to wait, it falls back to allocating a Task as normal. The win is in the common case — synchronous completion costs nothing. The reason you don’t use it everywhere is that ValueTask carries more restrictions — it must be awaited exactly once, can’t be cached or awaited again, and adds cognitive overhead that isn’t worth it when the method always suspends anyway.

// Ring-buffer audio reader — usually has data, rarely blocks
public ValueTask<AudioFrame> ReadFrameAsync(CancellationToken ct)
{
    if (_ring.TryRead(out var frame))
        return new ValueTask<AudioFrame>(frame);  // ← no heap alloc

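    // Only allocates on actual I/O. Assumes ReadFromDiskAsync returns ValueTask<AudioFrame>;
    // if it returns Task<AudioFrame>, wrap it: new ValueTask<AudioFrame>(ReadFromDiskAsync(ct)).
    return ReadFromDiskAsync(ct);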
}

Warning: ValueTask must be awaited exactly once. If you need to store or await it multiple times, call .AsTask() first.
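
To make the two paths concrete, here is a self-contained sketch with a hypothetical FrameSource type backed by a plain Queue<int>. The slow path is only a placeholder that yields and returns -1; a real implementation would do I/O there.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical type for illustration.
sealed class FrameSource
{
    private readonly Queue<int> _ready = new();

    public void Enqueue(int frame) => _ready.Enqueue(frame);

    public ValueTask<int> ReadAsync()
    {
        // Fast path: data is already there, so wrap the value directly.
        if (_ready.TryDequeue(out int frame))
            return new ValueTask<int>(frame);

        // Slow path: fall back to a real Task<int>, which does allocate.
        return new ValueTask<int>(SlowPathAsync());
    }

    private static async Task<int> SlowPathAsync()
    {
        await Task.Yield();   // placeholder for real I/O
        return -1;            // placeholder result
    }
}
```

On the fast path the returned ValueTask<int> is already completed, so awaiting it continues synchronously without scheduling anything.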

// Avoids capturing SynchronizationContext — reduces marshalling
// overhead and potential allocations in UI / legacy ASP.NET hosts.
public async Task LoadAssetsAsync()
{
    await _manifest.FetchAsync().ConfigureAwait(false);
    await _textures.LoadAsync().ConfigureAwait(false);
}

11 — Tooling & Measurement

Only optimise what you can measure. Add this assertion to your hot-path unit tests:

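// Warm-up call: the first run pays one-off costs (static initialisers,
// the pool's first Rent) that would otherwise fail the assertion.
UpdateParticles(particles);

long before = GC.GetAllocatedBytesForCurrentThread();

UpdateParticles(particles);   // the hot path under test

long after = GC.GetAllocatedBytesForCurrentThread();
Assert.Equal(before, after);  // fail if anything was allocated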
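
If you write this check more than once, it may be worth wrapping it. The following is a sketch of a hypothetical AllocationProbe helper; it runs the action once as a warm-up so one-off costs are excluded, then reports the bytes allocated on the current thread by the second run.

```csharp
using System;

static class AllocationProbe
{
    // Returns the bytes allocated on the current thread by one run of 'action',
    // measured after a warm-up run.
    public static long Measure(Action action)
    {
        action();   // warm-up: static initialisers, first pool rent, etc.

        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

Pass a static lambda or a cached delegate; a capturing lambda allocates its closure at the call site, outside the measured region, which is easy to overlook.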

Tool	Purpose
dotMemory	Heap snapshot diffing, allocation call trees
BenchmarkDotNet	Micro-benchmarks with `[MemoryDiagnoser]` — bytes allocated per op
PerfView / ETW	GC event traces, allocation sampling (Windows)
Roslyn Analysers	Detects boxing, closure captures, missing `readonly` — in-IDE
dotnet-trace / dotnet-counters	Cross-platform event traces and live GC counters

12 — P/Invoke with [LibraryImport]

.NET 7 introduced [LibraryImport] as the source-generated replacement for [DllImport]. Instead of the runtime generating marshalling stubs at run time, a Roslyn source generator emits the interop code at compile time. The result is AOT-friendly, avoids the run-time stub generation cost, and leaves the marshalling visible in your project rather than hidden in the runtime.

Your binding class must be static partial, and each binding method must be partial too.

// Binding to a function in a C library you compiled yourself:
// int add_vectors(float* a, float* b, float* out, int count);

internal static partial class MyLib
{
    [LibraryImport("mylib", EntryPoint = "add_vectors")]
    [UnmanagedCallConv(CallConvs = [typeof(CallConvCdecl)])]
    // Return type matches the C declaration above (int).
    internal static unsafe partial int AddVectors(float* a, float* b, float* result, int count);
}

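On macOS your compiled library is a .dylib; on Linux it’s a .so. For a plain name like "mylib" the runtime already probes the usual platform variations of prefix and extension, but when you need control over the file name or location, use NativeLibrary.SetDllImportResolver to handle it once at startup rather than hardcoding the extension: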

NativeLibrary.SetDllImportResolver(typeof(MyLib).Assembly, (name, assembly, path) =>
{
    if (name != "mylib") return IntPtr.Zero;

    string lib = RuntimeInformation.IsOSPlatform(OSPlatform.OSX)
        ? "libmylib.dylib"
        : "libmylib.so";

    return NativeLibrary.Load(lib, assembly, path);
});

Strings require explicit marshalling — there is no implicit conversion. Specify StringMarshalling.Utf8 when binding to functions that take const char*:

[LibraryImport("mylib", EntryPoint = "set_window_title", StringMarshalling = StringMarshalling.Utf8)]
[UnmanagedCallConv(CallConvs = [typeof(CallConvCdecl)])]
internal static partial void SetWindowTitle(string title);

For callbacks, where you pass a C# method into native code as a function pointer, use delegate* unmanaged and [UnmanagedCallersOnly]. The method must be static and non-generic, can only take blittable arguments, cannot be called from managed code, and must not let an exception escape into native code.

unsafe
{
    delegate* unmanaged[Cdecl]<float, void> fp = &OnUpdate;
    MyLib.RegisterUpdateCallback(fp);
}

[UnmanagedCallersOnly(CallConvs = [typeof(CallConvCdecl)])]
static void OnUpdate(float dt)
{
    // Called from native code: catch everything here, because an exception
    // must not propagate back into native frames.
    _world.Tick(dt);
}

13 — Sequential Structs & Memory Layout

When you pass a struct to a native function, the CLR must lay it out in memory exactly as the C side expects. [StructLayout(LayoutKind.Sequential)] tells the runtime to match field order and alignment, producing a layout the C struct can read directly.

/* mylib.h */
typedef struct { float x, y, z; } Vec3;

typedef struct {
    Vec3  position;
    Vec3  velocity;
    float lifetime;
    int   id;
} Particle;

[StructLayout(LayoutKind.Sequential)]
public struct Vec3
{
    public float X, Y, Z;
}

[StructLayout(LayoutKind.Sequential)]
public struct Particle
{
    public Vec3  Position;
    public Vec3  Velocity;
    public float Lifetime;
    public int   Id;
}

Both structs are blittable — every field is a primitive or a blittable struct. Blittable means the managed and unmanaged memory layouts are identical, so P/Invoke pins the memory rather than copying it. No marshalling overhead at all.
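
A cheap way to gain confidence in a binding is to check sizes and offsets against what the C compiler reports for sizeof and offsetof. The sketch below redeclares the two structs so it stands alone; LayoutCheck is a hypothetical helper. With only 4-byte fields there is no padding, so the expected size is 12 + 12 + 4 + 4 = 32 bytes.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct Vec3
{
    public float X, Y, Z;
}

[StructLayout(LayoutKind.Sequential)]
struct Particle
{
    public Vec3 Position;
    public Vec3 Velocity;
    public float Lifetime;
    public int Id;
}

static class LayoutCheck
{
    // Unmanaged size in bytes; compare with sizeof(Particle) on the C side.
    public static int SizeOfParticle() => Marshal.SizeOf<Particle>();

    // Byte offset of Id; compare with offsetof(Particle, id) on the C side.
    public static int OffsetOfId() =>
        (int)Marshal.OffsetOf<Particle>(nameof(Particle.Id));
}
```

Running a check like this once at startup, or in a unit test, catches a mismatched field order or an unexpected padding byte before it turns into corrupted data.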

For C fixed-size inline arrays (e.g. char name[64]), use fixed in an unsafe struct:

/* C: typedef struct { char name[64]; int score; } Player; */

[StructLayout(LayoutKind.Sequential)]
public unsafe struct Player
{
    public fixed byte Name[64];  // inline — same byte footprint as C, no heap pointer
    public int Score;
}

For C unions, LayoutKind.Explicit with [FieldOffset(0)] overlaps fields at the same address:

/* reinterpret float bits as uint */
typedef union { float f; uint32_t bits; } FloatBits;

[StructLayout(LayoutKind.Explicit)]
public struct FloatBits
{
    [FieldOffset(0)] public float F;
    [FieldOffset(0)] public uint  Bits;
}
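
As a quick sanity check of the overlap, writing the float and reading the uint should give the IEEE 754 bit pattern. This sketch redeclares the struct so it stands alone; FloatBitsDemo is a hypothetical helper name.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
struct FloatBits
{
    [FieldOffset(0)] public float F;
    [FieldOffset(0)] public uint Bits;
}

static class FloatBitsDemo
{
    // Writes through one field and reads the same four bytes through the other.
    public static uint BitsOf(float value)
    {
        var u = new FloatBits();   // zero-initialised
        u.F = value;
        return u.Bits;
    }
}
```

For this particular reinterpretation, BitConverter.SingleToUInt32Bits (.NET 6+) does the same thing without a custom struct; the explicit layout earns its keep when you are mirroring a union that a native API actually hands you.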

When the native API is messy enough to be awkward to bind directly — variadic functions, opaque handle patterns, deeply nested callback registration — the simplest approach is a thin C bridge file. You compile it into your own .dylib and bind to your simplified surface instead of fighting the raw API:

/* bridge.c — compiled into libbridge.dylib alongside the game */
#include <SDL3/SDL.h>

void* bridge_create_window(const char* title, int w, int h)
{
    /* SDL3 dropped SDL_WINDOW_SHOWN; windows are shown by default, so pass no flags. */
    return SDL_CreateWindow(title, w, h, 0);
}

void bridge_destroy_window(void* window)
{
    SDL_DestroyWindow((SDL_Window*)window);
}

void bridge_poll_events(int* quit_flag)
{
    SDL_Event e;
    while (SDL_PollEvent(&e))
        if (e.type == SDL_EVENT_QUIT) *quit_flag = 1;
}

internal static partial class Bridge
{
    [LibraryImport("bridge", EntryPoint = "bridge_create_window",
        StringMarshalling = StringMarshalling.Utf8)]
    [UnmanagedCallConv(CallConvs = [typeof(CallConvCdecl)])]
    internal static partial nint CreateWindow(string title, int w, int h);

    [LibraryImport("bridge", EntryPoint = "bridge_destroy_window")]
    [UnmanagedCallConv(CallConvs = [typeof(CallConvCdecl)])]
    internal static partial void DestroyWindow(nint window);
}

The bridge pattern gives you full control over the ABI surface and keeps the C# side clean. Worth writing one .c file if the alternative is fighting a complex native API header by header.

Quick Reference

Concern	Solution
Temporary buffers	`stackalloc` / `ArrayPool<T>`
Array / string slicing	`Span<T>` / `ReadOnlySpan<T>`
Async hot path	`ValueTask<T>`
List iteration	`CollectionsMarshal.AsSpan()`
Interface on struct	`where T : struct, IInterface`
Repeated callbacks	Cache delegate in field
Math structs	`readonly struct` + `in` param
String parsing	`Span.SequenceEqual` / `TryParse`
Heap snapshot	dotMemory / PerfView
Benchmark allocations	`[MemoryDiagnoser]` BenchmarkDotNet
Native library binding	`[LibraryImport]` + `static partial` class
Platform .dylib / .so	`NativeLibrary.SetDllImportResolver`
C struct mapping	`[StructLayout(LayoutKind.Sequential)]`
C union mapping	`[StructLayout(LayoutKind.Explicit)]`
Inline C array (`char[64]`)	`unsafe fixed byte Name[64]`
Native callbacks	`delegate* unmanaged[Cdecl]` + `[UnmanagedCallersOnly]`
Complex API bridging	Thin `.c` bridge file compiled to own `.dylib`