Reducing heap pressure

Posted by user on 14 Jun 2009

C++ is the world of do-it-yourself memory management. When discussing programming with someone, as I reach the point where I tell that I mainly work with C++ the first thing I hear is "Oh... But you have to deallocate memory yourself... That's so cumbersome and obsolete!"

Although I don't have to wrestle with news and deletes anymore, I find this remark pretty interesting. It's true that for most projects having to bother about memory management and other similar details is a waste of time. But for many projects, tight memory management is the difference between awesome software and bloated software.

Generally these projects imply programs that stay up and running for months, middleware used by developers on crack or video games that really, really, shouldn't go below 60 frames per seconds.

I lied

Actually, the reason why I find the comment about C++ memory management interesting is because it shows a lack of comprehension of the whole memory management thing.

Garbage collection is not a silver bullet against memory leaks, it's a different approach that works very well in many cases but horribly wrong in some others.

The garbage collector memory basically tells you :

You won't have to bother about calling delete (or free, or whatever you want to call it), I'll be able to see when it's appropriate. But please, don't do anything stupid because I'm just a program and I won't call delete if I cannot know for sure that the object must be destroyed. By the way if I find the one who left a half eaten sandwich in the fridge I'm going to beat him to death with linked lists.

Most of C++ programmers use reference counted memory management through the use of smart pointers. It's pretty safe and straightforward but it also has got some catches. You have two main things to keep in mind: you must be careful with circular references and you must make sure that you don't keep unnecessary copies of an object (singletons...).

We also hope to see improvement in a multithreaded context when smart pointer will start to use the new atomic semantics (aka lock free smart pointers). Currently, increasing and decreasing the reference counter can be a bottle neck when a lot of threads access the same variable.

In addition, it's not garbage collected memory. Memory isn't "abstracted out". When you create an object and encapsulate it in a smart pointer, you're making a call to new which in turns call the heap allocator of your operating system and that might result in a extremely expensive request to the kernel. That's the second most frequent performance penalty.

All things being equal, smart pointers make the usage of dynamic objects safe and easy. This is why they're great. No more memory leaks. Safe deletion. I love them and use them a lot.

Wait.

I lied again

Actually, I don't use them a lot. You know what's better than smart pointers ? Not using pointers.
The problem is that, if you're not careful, you can end up with stuff like this:

1
2
3
4
5
6
7
struct object
{
    shared ptr<object1> _o1;
    shared
ptr <object2 > o2;
    shared
ptr <object3 > o3;
    shared
ptr <object4 > _o4 ;
} ;

and of course you're going to do

1
shared ptr<object> o = makeshared <o > ( ) ;

That's a minimum of 5 allocations for you. Notwithstanding other allocations that sub-objects may do.

And when you have something like:

1
vector <shared_ptr <object >>

The heap allocator looks at you with scared eyes. He knows his face is going to be the landing zone of the baseball bat you have in the right hand.

That kind of pattern is acceptable in a garbage collected language such as Java or C#, especially because the garbage collector is going to reuse unused objects and issue less calls to the kernel.

In C++ we do care about the allocations count, because it's an expensive operation. I can already hear you say "premature optimization is evil!", "measure and then optimize!" and "don't assume anything!".

Sure. Whatever.

How can we improve this?

Don't use dynamic object unless you really need to. With references, in C++, you can avoid using pointers in many situations.

The above object becomes:

1
2
3
4
5
6
7
struct object
{
    object1 _o1 ;
    object2 _o2 ;
    object3 _o3 ;
    object4 _o4 ;
} ;

And if you need an accessor:

1
const object4 & get_o4 ( ) const { return _o4 ; }

That gives even more guarantees in terms of safety because the return value can never be a null pointer.

And maybe you can place it in the vector as in

1
vector <object >

Which is more straightforward.

Also don't forget that

1
2
3
4
struct object
{
    object1 & _o1 ;
}

Is perfectly legal, the only requisite being that o1 must be initialized in the constructor. Another alternative to dynamic allocation.

But I really need a dynamic object!

Of course you do. You're not going to put everything on the stack, are you? You may also want to make use of polymorphism. Last but not least, pointers have the advantage of taking only few bytes which is great when you want to swap data around.

But keeping their number to a minimum is a good way to reduce heap pressure and horrible heap fragmentation issues. Fragmentation is an issue whatever awesome heap manager you believe to use.

It occurs when you allocate and deallocate a lot of object of various sizes. It becomes difficult for the heap manager to find a suitable location for your new memory chunk, resulting in an increase of the heap size. When that happens, performances drop, memory usage raises and users cry.

This is extremely true when having large STL containers with a lot of dynamic objects. Not only the maintenance of your STL container is going to require dynamically allocated memory, but your objects too! You can easily find yourself doing thousands of small allocations when applying algorithms to your collections.

Something as innocent as:

1
2
3
4
5
6
7
8
vector <shared ptr<object>> v1;
vector<shared
ptr <object >> v2 ;
vector <shared ptr<object>> r;
set
intersection (v1. begin ( ),
    v1. end ( ),
    v2. begin ( ),
    v2. end ( ),
    back_inserter (r ) ) ;

Can kill your heap for good (I know it lacks a functor to make something useful, but you get the idea).

But Edouard, you n00b! You're using a vector whereas in that case a deque would be more appropriate! lol!

Right. Deque will help. But not enough. You're going to make so many hidden news and deletes that the heap will agonize, crawl to you and beg for mercy.

You need a pool in your backyard

You're not pimping you C++ program enough if you don't put a pool in your backyard. A memory pool is a specialized allocator that creates and destroys objects of identical sizes.

As you can guess, it's pretty easy to fill a hole in your closet when it's designed to receive boxes of identical sizes. That's what a pool is all about. Not trying to solve the difficult general case, but solving well an easy special case.

A well used pool results in increased performances and reduced memory fragmentation.

Guess what, the boost libraries have pools. If you couple one with a shared pointer, you'll be ready to rain down meteors on your containers.

How to use safely and efficiently the boost pool

Let's say we have an object we want to allocate dynamically a lot:

1
2
3
4
5
6
7
8
9
struct rock it
{
    rock
it ( void ) : _a ( 0 ), _b ( 1 ), _c ( 2 ), _d ( true ) { }

    int _a;
    int _b;
    char _c;
    bool _d;
};

I know, this object rocks. It's probably the most awesome object to be ever written in C++ or any other language.
In our example we will use the singleton pool which is safe in a multithreaded environment. You define the pool with a typedef and a tag:

1
2
3
4
5
6
struct a pooltag { } ;
typedef singleton pool
<
    a
pool tag,
    sizeof(rock
it )
> elite allocation;

This pool will create as many blocks as needed. Each block will be large enough to hold a rockit object. It's got the disadvantage of requiring explicit calls to free. You probably don't want that. Even if the pool can free all allocated objects at once when requested to, throughout the life of your program you probably want to keep in memory no more than what's needed.

You can do that in encapsulating your buffer into a smart pointer. You allocate the necessary memory with the pool and then you use a placement new on the allocated buffer to construct it. Then, you put that into a smart pointer and make sure it's going to be deallocated using the pool's semantics.

We'll design destruction first. Our smart pointer is going to need a bespoke destructor, the default one calls delete on the buffer and makes Mr. Access Violation pay you a visit.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
struct rock itdeallocator
{
    void operator ( ) (rock it * yeah)
    {
        // delete *must* work with NULL
        if(yeah)
        {
            // must come from our allocator
            BOOST
ASSERT (elite allocation::isfrom (yeah ) ) ;
            // explicit destruction
            yeah - >~rock it();
            // now we free the memory
            elite
allocation :: free (yeah ) ;
        }
    }
} ;

The assertion is pretty handy to catch you mixing custom deallocators. Thanks to the tag you specified in the pool definition, it's got the ability to recognize its own people. You want this error to be caught as early as possible in your code.

Other than that, he custom deallocator is all about making an explicit destructor call and then freeing memory. If you did the opposite the result would be undefined.

Destruction after deallocation would be like trying to carefully dismantle a building after a nuclear attack. I did that once. Even if you're careful with the radiations you spend your time figuring out if the pile of sand you have in the hand was the door or the roof.

Now time for the full fledged construction:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
rock itptr make rockit ( )
{
    // let's get a block of memory
    void * p = elite allocation::malloc();
    // boost allocator doesn't trow, you might want to
    // throw when you cannot allocate
    if(!p)
        throw bad
alloc ( ) ;

    // we construct on the allocated block,
    // we don't use the
    // strandard new allocator
    return rockitptr(new(p)rockit(),
        rock
it_deallocator());
}

This weird new call is the infamous placement new. It's a nice C++ feature that enables you to build objects on already allocated memory. Also note the pointer validation after the call to the pool, you might want to have a difference behavior (such as returning a null pointer instead of throwing an exception).

That's all there is to it. You can now pass around your rockitptr without having to bother how they were allocated and how they should be destroyed. The pool allocator will grow as necessary and you can be confident that freed objects will make room for new objects in a blink of eye, without fragmenting memory. Your memory management strategy is carefully delimited into the makerockit function. No one needs to know how you created the object to use correctly.

That's what C++ and generic programming is all about: solving problems once.

Conclusions

  • Don't use dynamically allocated objects unless you really need to. References help.
  • If you know you're going to need a flurry of identical objects, use a pool. Combine the pool with reference counting for optimal memory management.
  • Encapsulate your memory management. This decreases maintenance costs and raises reliability.

Topics: boost, c++, heap, pool, Uncategorized