Let me tell you why you're here. You're here because you know something. What you know you can't explain, but you feel it. You've felt it your entire life, that there's something wrong with the world. You don't know what it is, but it's there, like a splinter in your mind, driving you mad. It is this feeling that has brought you to me. You want to know what is heap fragmentation.

And now for something completely different : Virtual Memory

Should you wish to know more about virtual memory, I advise you to read this Wikipedia article.

Virtual memory is an abstraction layer maintained by the Virtual Memory Manager (VMM), a component within the operating system kernel. It sits on top of physical memory. Every pointer in your program - running in an OS having virtual memory that is - refers to a virtual memory address. You don't address physical memory directly from an user mode program in a modern OS.

When your pointer says "0x400000", it's 0x400000 relative to the base of your virtual address space. It's totally unrelated to the actual physical memory address (only the VMM knows what is the corresponding physical address).

You don't have to allocate virtual memory directly, you use new or malloc.

It's nonetheless possible. On Windows this is done with the VirtualAlloc function that allows fine grained operation on the ranges you wish to allocate and the rights. You can choose to commit the memory or not.

Sounds useless? In most cases it is.

Here's one use case however: let's say you want to have a huge matrix in memory, but only plan on using few spots. You can allocate the whole matrix, but only commit the spots you need (you can even add a handler to the access violation routine to commit on the fly...). In doing so you're going to minimize the actual amount of memory used.

VirtualAlloc (or mmap in BSD) is however extremely slow. It's not Windows fault. Not only do you have to go through the kernel gate, but you're also requesting the system to fetch resources... The higher the load on your system, the slower it will be. Even when you have huge amounts of memory available, virtualalloc et al. are slow.

Fortunately, you generally don't allocate virtual memory directly. So what happens when you make a call to new or malloc?

You are using a heap!

The heap is a block allocator that reduces the amount of calls to the system. It will request a large block of memory and slice it to your needs. It will try to make your memory request fit in one of its available slots.

What happens when no slot is available? It asks the operating system for another block of virtual memory ,commits it and give you what you need.

What happens when no more virtual memory is available? A null pointer is returned and most of the time the program will cash in its chips unless it's been written to handle low memory conditions (hint : it hasn't been).

When you work with pointers, it's very important to keep your pointer valid (it's always pinned to use a .Net term). That means your heap allocator cannot move blocks around to optimize the space usage.

So you're randomly filling blocks of various sizes and cannot move them around. Here comes the fragmentation train, next stop is you.

Memory fragmentation results in increased memory usage as if you were leaking. It's pretty tough to narrow down the problem to fragmentation as you will generally waste of lot of time looking for a inexistent leak.

Differences between leaking and fragmenting

Fragmentation generally results in huge blocks of memory going away for no reason. Typically you'll suddenly lose 200 MiB of memory after allocating 20 bytes. First time it happens to you, believe me, you're going to get a bottle of rum and a cigar, and pray for Baron Samedi to help you on this case.

Memory leaking is usually triggered by a specific event (a call to the leaking code) and the memory increase is more linear and regular than with fragmentation.

The graph below sums up the differences:

Fragmentation vs Leaking

Please understand that the behavior of memory fragmentation is largely dependent on your heap allocation strategy. On this graph I exhibit a fragmentation where the allocator requests twice more memory than you're already using.

How to reduce heap fragmentation?

Use a better heap

The heap is nothing more than a chunck of code sitting somewhere in your C/C++ runtime. If you search the Internet a little bit, you will hear about lkmalloc, jemalloc, hoard, libumem, etc.

The classic strategy is to group allocations of similar sizes. If you have allocations close to, say, 4 kib, sitting one next to the other, you reduce the odds of fragmentation as it will be easier and faster to find room for new allocations.

Modern heaps are not only designed to fragment as little as possible but also to be efficient on multi CPU machines and to offer security and debugging features. So you might hit two birds with one stone: reduce fragmentation and get better performances!

FreeBSD comes with a very decent allocator by default (jemalloc since FreeBSD 7) and I see very few use cases where you would want to switch to another one.

In Windows, the default allocator is poor. For retrocompatiblity reasons, the more advanced memory allocator has to be explicitly requested. This allocator is called the "Low Fragmentation Heap" (LFH).

Activating the allocator consists in listing all existing heaps in your program and flipping the feature on for each one with the HeapSetInformation function.

Here is the code we use at Bureau 14 to switch all heaps to low fragmentation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
void get_heaps(vector<handle>& heaps)
{
    HANDLE bogus[1];
    // measure
    DWORD c1 =::GetProcessHeaps(0, bogus);
    // resize
    heaps.resize(c1);
    // get
    if(::GetProcessHeaps(c1, &heaps[0])!= c1)
    {
        // fail, we discard everything
        heaps.clear();
    }
}

long lowfragheaps(void)
{

    // getting the heaps
    vector<handle> heaps;
    get_heaps(heaps);

    if(heaps.empty())
    {
        // no heap found => error, there must be at
        // least the process heap
        return0;
    }

    typedef std::pair<handle, LONG> heaptype;
    typedef std::vector<heap
type> heaps_vector;

    heapsvector heapsinfo(heaps.size());

    struct compatinfo
    {
        heap
type operator()(HANDLE h)const
        {
            LONG type =0;
            SIZE_T returnedLength =0;

            ::HeapQueryInformation(h,
                HeapCompatibilityInformation,
                &type,
                sizeof(type),
                &returnedLength);

            return heap_type(h, type);
        }
    };

    // we get the type of all heaps
    transform(heaps.begin(),
        heaps.end(),
        heapsinfo.begin(),
        compat
info());

    // and we only work on the ones that are of type 0
    // (standard heap)on the left we have all heaps
    // that are low fragmentation or look-aside on
    // the right all heaps that are standard (and will
    // need to be set to lf)
    // note : I don't think look-asides are available
    // in user mode
    heapsvector::iterator stdstart = partition(
        heapsinfo.begin(),
        heaps
info.end(),
        lambda::bind(&heaptype::second, lambda::1)!=0);

    // we want to include the heaps that are already lf in
    // our count (type 2)
    heapsvector::sizetype lfcount = std::countif(
        heapsinfo.begin(),
        std
start,
        lambda::bind(&heaptype::second, lambda::1)==2);

    ULONG hf_val =2;

    // we activate Low Fragmentation on all std heaps
    foreach(stdstart,
        heapsinfo.end(),
        lambda::var(lf
count)+=
        lambda::bind<bool>(&HeapSetInformation,
        lambda::bind(&heaptype::first, lambda::1),
        HeapCompatibilityInformation,
        &hfval,
        sizeof(hf
val)));

    // no need to close heaps handle
    // cast is safe as we won't have more than 2^32
    // heaps...
    returnstaticcast<long>(lfcount &0xffffffff);
}

The function returns the count of heaps that are in LF mode. You only need to call this once per program once all the heaps are created. In other words, call it after all your libraries have been initialized.

Note that the Low Fragmentation Heap is not available in debug mode unless you set the global variable NODEBUG_HEAP to 1 . As the variable's name suggest, doing so will strip the heap of all debugging features.

Write better programs

It's easy to blame others for your own deficiencies.

If you have a wide range of allocations of very different sizes, the above approach will only have a limited effect. It is my experience that the programmer has to do some effort to optimize allocations to get the best out of the operating system.

In this post I give you a precise example about how you can hugely improve memory usage in your programs without (too much) hassle.

Life is complicated...

A memory intensive program needs to be written with a broad view. You will need to have a good understanding of memory management, use the right allocator and make efficient use of it.

Heap fragmentation is one of the many enemies that will ambush you on the path to awesome programming.

I know you're reading this and wonder when I'm going to tell you how to cook rice well. Don't worry, I'll keep my word. The trick is to wash the rice. Wash the rice as long as the water is white. Be thorough. Make sure your hands are clean and use them to mix the rice with the water. Here comes the next important step to have a wonderfully cooked rice: take the washed rice, put it in the rice cooker and press on.