A faster std::vector

This article is about using the std::vector class in the C++ programming language. The std::vector is a useful class for many things. However, when storing integral data types such as an int in a vector, there can be some overhead involved. For example, when we use a vector like this: 

// Construct a buffer with some dynamically determined size
vector<char> buffer(buffer_size);

// Open a file
ifstream file("myfile.bin");

// Read some data into the buffer
file.read(buffer.data(), buffer.size()); 

The use of vector is quite elegant, but there is overhead in using the vector like this: the buffer will be value initialized (initialized to zero), while this is unnecessary.

The other options...

There are several other options to allocate a dynamic array:

  1. Stack allocated storage: char buffer[buffer_size]; Such usage is a bit dangerous because the stack has limited space, and one can not tell how much space is available. Variable length arrays are not widely supported either. It is very fast though, and it is normally the fastest of the available options.
  2. Allocate a buffer with raw pointers: char* buffer = new char[buffer_size]; and delete[] it afterwards. Of course, as we all know, it is easy to leak such a buffer in case of an exception or early return, so raw pointers are to be avoided... Yes, raw pointers are yuk!
  3. Using C++11's unique_ptr: unique_ptr buffer(new char[buffer_size]); A bit verbose, but does it perform well? I compared the assembly code of options 2 versus option 3 in a few short programs, and the generated code was always equal. Hence, this is as fast as options (ii), with no danger of leaking the buffer. One can still index the array with its operator[], e.g. secondByte = buffer[1];. The standard library support of unique_ptr is good nowadays, and I'm even able to compile such code on the supercomputers I use, which often have old compilers (e.g., they don't have rvalue support yet). Sweet!

Despite option 3 being safe and fast, the unique_ptr is of course not nearly as powerfull as a vector. It does not know the size of its stored array and it does not allow adding or removing elements. Constructing it is also not as elegant. I would think that the behaviour of std::vector without its initialization would be commonly desired, and some googling shows indeed more programmers with the same question.

A vector that does not initialize

So, I wrote a vector with an interface equal to std::vector, but one which does not initialize its elements in the constructor with size parameter and the resize() method: the uvector, or "uninitialized vector". The full source code for the uvector can be found here: uvector.h (free to be used under the GPL 3 license). It supports rvalue stuff, so it requires a decent C++11 compiler. The uvector allows to hold any type that is trivial, and as it turns out, there are a few more beneficial performance effects of assuming the element type is trivial. The type is checked with the use of C++11's awesome static_assert: static_assert(std::is_trivial(), "A uvector can only hold classes that are trivial"); 

Enforce trivial or standard layout type?

I later relaxed the assertion to check for std::is_standard_layout<Tp>, because it turns out that a std::pair with two trivial types is not a trivial type itself, and I want to be able to store them in my uvector. I assume that the behaviour of delaying the initializing of a std::pair<trivial,trivial> is defined, since it is a standard layout class, hence it is required to be compatible with a C struct. Since a C struct can be left uninitialized without causing undefined behaviour, and we know that the elements do not require initialization, so should a std::pair<trivial,trivial>. Therefore, there is no freedom for an implementer to make a std::pair<trivial,trivial> not behave trivial. It sounds though the standard does not agree with std::pair<trivial,trivial> being trivial. Does it mean it is undefined behaviour to skip the constructor or memcopy it? The std::is_trivial reference on cppreference.com says: "Note: Only objects (including arrays) of trivial type may be created by a call to std::malloc.". Confusing.

Further performance improvements

Because the element type is known to be trivial(...ish), the elements can be copied and moved with memcpy() and memmove(). Also, construction, copying and moving without memcpy/memmove can never throw an exception, which makes sure that the uvector does not have to enclose any of its actions in try { ... } catch(...) { } blocks. While a compiler could in theory perform the memcpy/memmove and exception optimizations for a std::vector, this seems not to happen. Okay, enough background, let us look at the timings!

Timings

I created short test cases for several operations in the vector classes with a value type of int, and measured the time to do these operations a large number of times. I also tested the new uvector both with gcc and Clang. These are the results: Afbeelding verwijderd. On average, the uvector compiled in gcc takes 84% the time of the std::vector and in Clang it is 88%. These timings are relative to the time std::vector takes to perform the same operations, when compiled with gcc. I did that to be able to also compare the difference between Clang and gcc. In general, Clang generates code that is optimized reasonably similar, but it fails quite badly on the push_back(size_n n, const Tp& val) operation (this is not a standard std::vector method; see below for info on the new methods). Clearly, the largest difference between the two vectors is in the constructor that takes a size parameter, since the uvector skips initialization of the elements. This uvector's constructor is 70% faster, which is a significant speed gain. Other operations that resize the storage, and for which the uvector semantically does exactly the same as the std::vector, are also significantly faster: vector(const vector& source) is 19% faster, operator=(const vector& source) is 12% faster, push_back(Tp& val) is 38% faster(!), and the insert() and erase() methods are a few percent faster. I know what will be my preferred vector in the future! The only thing to be careful with, is the difference in initialization semantics. A uvector does not lose any functionality though, since it can still be initialize with the uvector(size_t n, Tp& val, [..]) constructor. Personally, I prefer this more verbose form of std::vector anyway when I want zero initialization of e.g. ints, because it shows that this is desired.

More about timing

While it is intuitively easy to see that various operations of a uvector should be fast, timing the operations is the only way to be sure (see also Andrei Alexandrescu's awesome GoingNative 2013 talk, themed "Measure everything"). However, the tests done are of course simple examples that do not reflect all the real life uses of a vector class. It is hard to time a very generic container and say something about it in real life uses. For example, memory does likely not become fragmented in these tests, causing allocation to be very fast and making initializaton seem more expensive than it might be in real examples. I compiled with "-O3 -march=native", and you can also see in the source code that I added funky assignments to a volatile pointer, to make sure statements that the compiler does not remove statements that are seemingly unuseful. Nevertheless, it should be clear that there is a significant performance benefit in some cases!

Extra methods in uvector

The only change in semantics of standard defined methods between the uvector and std::vector are the vector(size_t n) constructor and the resize(size_t n) methods. I found that, if you really are concerned about performance, there might be two more useful operations:

  • insert_uninitialized(const_iterator position, size_t n) This function is similar to the corresponding std::vector::insert method without initializing the new elements. The tests show it is 8% faster when inserting elements in the middle.
  • push_back_uninitialized(size_t n) This corresponds with uvector.insert_uninitialized(vector.end(), n). Compared to std::vector.insert(vector.end(), n), it is 53% faster.

To be honest, I can't think of many scenarios where you would use these functions, but it won't hurt having them available. Finally, I was struck by the fact that insert() has the following overloads:

  • iterator insert(const_iterator position, InputIterator first, InputIterator last);
  • iterator insert(const_iterator position, size_t n, const Tp& val; and
  • iterator insert(const_iterator position, std::initializer_list initlist),

but there are no corresponding overloads for push_back() in std::vector. I assume the thought behind this is that these operations are already available with insert(), for example: void add_some_values(vector& vec) { vec.insert(vec.end(), {1, 3, 3, 7}); } This might be performance-wise equal or very close to a push_back(initializer_list) call (the insert will result in an extra memmove of zero size, and thus can be optimized away), it is definitely not as elegant and readable as: void add_some_values(vector& vec) { vec.push_back({1, 3, 3, 7}); } This also does not require one to think about whether insert adds before or after the given position. I haven't come across a situation where I need this behaviour many times, but examples do exist. After all, a vector is a push_back monster: pushing back is one of its major strengths.

Conclusions

  • Initialization is a significant cost in the construction of a std::vector.
  • With the current state of compiler optimization, it is beneficial to specialize the std::vector operations to (a) use memcpy/memmove and therefore (b) remove exception handling during copying/moving.
  • The uvector is a useful class for performance-hungry developers. I've permanently replaced uses of std::vector<trivial> with the uvector in new code I write, having finally an elegant class for zero-overhead dynamic arrays.
  • I like the new overloads of push_back(), especially push_back(initializer_list), and wonder why it was chosen not to provide them in std::vector.

References