cpp

Introduction

A list, an associative map, a set, and other data structures, are called containers in C++. A container:

is a generic data structure,
stores elements of any, but only one type,
stores elements by value, not by reference,
stores elements at the heap,
can grow or shrink dynamically (i.e., at run-time).

A container is generic, because it can store data of any type T. A container type (e.g., std::vector<T>) is templated, because we have to give a template argument T at compile-type.

While the container itself (i.e., the object of the container type) can be anywhere in memory, the elements have to be stored at the heap, because the size of the container can change at run-time.

Containers are complemented by iterators and algorithms. The iterator is a generalization of a pointer. Algorithms, such as std::sort, are generalized in that they can be used with various containers, because access to the container elements is provided with iterators.

Standard containers, iterators, and algorithms fit together because of the naming and semantic convention. As for the naming convention, for instance, every container has the begin, and end, size, and other functions. As for the semantic convention, for instance, we get the data of iterator i by dereferencing: *i.

History

In the early 90’s, containers were:

a cutting-edge technology, and a hot research topic,
originally implemented as the STL, the seminal work by Alexander Stepanov.

Now containers are:

an indispensable tool of every-day use,
a part of the standard library.

There is no excuse, put them to use.

Basic container types

Sequence containers:

std::vector<T> - the vector,
std::list<T> - the doubly-linked list,
std::deque<T> - the deque, amalgamation of the vector and the doubly-linked list,
std::forward_list<T> - the singly-linked list,

Associative containers:

std::map<K, V> - the associative array (aka dictionary),
std::multimap<K, V> - the associative array with duplicate keys allowed,
std::set<T> - the set,
std::multiset<T> - the set with duplicate elements allowed,

The adapters:

std::stack - the stack,
std::queue - the queue,
std::priority_queue<T> - the priority queue.

Container types can be nested, i.e., T can be a container type too.

Comparison of basic types of containers

`std::vector<T>`

The vector offers:

random access with the random access operator, i.e., the [] operator, aka the index operator, or the subscript operator,
memory contiguity: all elements are stored contiguous in memory, which implies:
- the random access operator is as fast as can be, because it only has to increase a pointer,
- the memory access is faster: the elements of the vector are all stored compactly together, and so the processor cache is used the most effectively,
- slow insertion and removal.

The vector may have to reallocate its elements when the currently allocated memory is not enough, when we insert a new element at the end of the vector. First, new memory has to be allocated. Next, the elements are copied or moved, depending on whether the type of the elements has the move semantics implemented, and then a new element is added (copied or moved). Finally, the old memory is released.

The performance of the vector drops not only when elements are frequently reallocated, but also when elements are frequently inserted or removed. When an element is inserted or removed, the elements that follow have to be moved (or copied) one-by-one, because the vector has to guarantee the memory contiguity.

In comparison with other containers, the vector performs very well if the reallocation, insertion and removal do not frequently happen, for instance, if we build a vector, fill it in, and then random-access the elements frequently.

`std::list<T>`

The list does not store its elements contiguously in memory, but stores them in different locations at the heap, which were allocated separately. Then the elements are doubly-linked, which implies:

fast insertion and removal, because elements do not have to be reallocated, and neither the elements that follow have to be moved one-by-one,
iterative access only, because to access some element, we have to get to it through the elements that proceed or follow it.

The list does not offer element random access, even though it could with a more elaborate implementation. However, C++ aims at providing fast and lean tools, and such a bloated list would decrease performance. The list of type std::list offers the minimal functionality required, but with superior (time and memory) performance.

`std::deque<T>`

The deque (pronounced as “deck”, as in deck of cards) offers:

random access,
fast insertion and removal.

How come? Well, insertion and removal are not as fast as in std::list, and random access is not as fast as in std::vector. The deque offers a trade-off between functionality, and efficiency. Use the deque if you frequently need to random-access, insert and remove elements.

The deque is implemented with small vectors organized one after another, but without guaranteeing memory contiguity. If element reallocation, insertion or removal is required, then it’s limited to one small vector. However, random access is less efficient in comparison with vector, because finding an address of an element requires more arithmetic.

Use the deque only when the vector and the list won’t do.

`std::forward_list<T>`

Sometimes std::list is more than we need from a list. Sometimes we need to forward-iterate only, and can do without the capability to backward-iterate that is offered by std::list, the capability that we pay for with performance.

Type std::forward_list is even leaner and more performant than std::list, because it is a singly linked list: we can iterate forward, but not backward.

`std::set`, and `std::multiset`

The set stores unique elements, and the multiset allows for multiple elements of an equal value. Both container types store the elements sorted. By default the order is ascending, but we can establish any order with a callable.

Interestingly, we cannot modify the elements, because that would destroy the order of elements, and make the container inconsistent. For this reason, the type of the elements stored is made const, even if the template argument was non-const.

If we want to modify an element, then we have to remove the element first, and insert a new element with a different value next.

#include <iostream>
#include <set>

using namespace std;

int
main()
{
  // A compiler can deduce the type of the elements stored in the set.
  // Equivalent to: set<int> s = {3, 1, 2};
  set s = {3, 1, 2};

  for(auto &e: s)
    cout << e << endl;

  // This would not compile, because we cannot modify the elements.
  // for(auto &e: s)
  //   cout << ++e << endl;

  multiset ms = {1, 2, 3, 1, 2, 3};

  for(auto &e: ms)
    cout << e << endl;
}

Iterators

Iterators are the glue between the containers, and the algorithms. Function std::sort can be used with various containers through iterators. Iterators provide access to the container elements, so that we do not care about the type of a container (the container type was abstracted away).

An iterator is a generalization of a pointer. We could say that a pointer is the iterator of a C-style array. We can initialize the pointer, compare it to some other pointer, dereference it to get to the element it points to, and also increment it. Furthermore, we can random access any element in the C-style array if we increase (with the + operator) the pointer to the element number 0 by the value of the index as in here:

#include <iostream>

using namespace std;

int
main()
{
  // C-style array.
  int a[] = {1, 2, 3};

  // The same as: int *i = a;
  auto i = std::begin(a);
  cout << *i << endl;
  cout << *(i + 2) << endl;
  ++i;
  cout << *(i - 1) << endl;
  --i;
  cout << std::boolalpha << (i == std::end(a)) << endl;
  i += 3;
  cout << std::boolalpha << (i == std::end(a)) << endl;
}

Iterator types are user-defined, e.g., of a structure type. Iterators are wrappers around pointers, where the operators (defined for that type) implement the required functionality. For instance, if in the example above we replace the C-style array with a deque, the rest of the code can remain untouched:

#include <deque>
#include <iostream>

using namespace std;

int
main()
{
  deque<int> a = {1, 2, 3};

  auto i = std::begin(a);
  cout << *i << endl;
  cout << *(i + 2) << endl;
  ++i;
  cout << *(i - 1) << endl;
  --i;
  cout << std::boolalpha << (i == std::end(a)) << endl;
  i += 3;
  cout << std::boolalpha << (i == std::end(a)) << endl;
}

Iterators of the standard library are very small and very efficient. They typically store only a single pointer. Therefore we are free to use them by value, and copy them. We could use iterators by reference too, but that would be just awkward, just as awkward would be using pointers by reference.

For a given container type T, there are always at least two iterator types defined:

non-const iterator type: T::iterator,
const iterator type: T::const_iterator.

You cannot modify elements of a container with a const iterator. For better code, always use the const iterator if you do not modify the elements.

Functions `begin`, `end`

We know where the elements of a container are with the begin, and end functions. The begin function returns an iterator to the first element. The end function returns a past-the-end iterator which you would get if you incremented an iterator to the last element: we could say the end function returns an iterator to an imaginary element (imaginary, because that element does not exist) that would follow the last element. If a container has no elements, the iterators returned by begin and end equal.

The begin and end functions return non-const iterators for a non-const container, and const iterators for a const container. If we want to iterate with a const iterator over a non-const container, we can use the cbegin and cend functions, which return const iterators.

The cbegin and cend are for convenience only, because they are dispensable. We can achieve the same functionality by calling the begin and end functions for a non-const container when we reference the container with a const reference, which we can do with the std::as_const function.

Iterators can be categorized into a few basic categories (concepts, technically speaking), depending on the provided functionality:

a forward iterator,
a bidirectional iterator,
a random-access iterator.

A forward iterator

A forward iterator provides only the two most basic operations: *i, ++i. The iterator of std::forward_list is a forward iterator. Here’s an example:

#include <deque>
#include <forward_list>
#include <iostream>

using namespace std;

int
main()
{
  forward_list<int> a = {1, 2, 3};

  auto i = std::begin(a);
  cout << *i << endl;
  // That would not compile.
  // cout << *(i + 2) << endl;
  ++i;
  // That would not compile.
  // cout << *(i - 1) << endl;
  // That would not compile.
  // --i;
  cout << std::boolalpha << (i == std::end(a)) << endl;
  // That would not compile.
  // i += 3;
  cout << std::boolalpha << (i == std::end(a)) << endl;
}

A bidirectional iterator

A bidirectional iterator is a forward iterator with an extra operation defined: --i, i.e., move back by one element. The iterator of std::list is a bidirectional iterator. For example:

#include <deque>
#include <list>
#include <iostream>

using namespace std;

int
main()
{
  list<int> a = {1, 2, 3};

  auto i = std::begin(a);
  cout << *i << endl;
  // That would not compile.
  // cout << *(i + 2) << endl;
  ++i;
  // That would not compile.
  // cout << *(i - 1) << endl;
  --i;
  cout << std::boolalpha << (i == std::end(a)) << endl;
  // That would not compile.
  // i += 3;
  cout << std::boolalpha << (i == std::end(a)) << endl;
}

A random-access iterator

A random-access iterator allows for increasing or decreasing the iterator value by any integer, as we’ve done in the example above with the deque. The iterator of the vector is a random-access iterator. A pointer is a random-access iterator too.

Forward-iterating over the elements of a container

We can forward-iterate over the elements of a container (or, more broadly, a range, that includes the C-style array) in two ways: the old, and the new.

Iterating the old way

As shown in the example below, you can forward-iterate over the elements of a container the old way, which is a bit tedious, because we have to initialize the iteration variable i, write the loop condition, and then increment the variable. This loop is also error-prone, as it’s easy to mistakenly write ‘–i’ instead of ‘++i’, or begin instead of end.

#include <iostream>
#include <vector>

using namespace std;

int main()
{
  vector v = {1, 2, 3};

  // The legacy way of iterating.  We can modify the elements, because
  // we're using a non-const iterator.
  for(vector<int>::iterator i = v.begin(); i != v.end(); ++i)
    cout << (*i)++ << endl;

  // That would not compile, because we're trying to modifying the
  // elements that we access through a const iterator.
  // for(vector<int>::const_iterator i = v.begin(); i != v.end(); ++i)
  //   cout << (*i)++ << endl;

  // We can use the 'auto' type -- the preferred way, because it's
  // less error-prone.  We let the compiler deduce the iterator type.
  for(auto i = v.begin(); i != v.end(); ++i)
    cout << (*i)++ << endl;

  // We're using const iterators here.
  for(auto i = v.cbegin(); i != v.cend(); ++i)
    cout << *i << endl;

  // We're using const iterators here as well.
  for(auto i = begin(as_const(v)); i != end(as_const(v)); ++i)
    cout << *i << endl;
}

Likewise for C-style arrays:

#include <iostream>
#include <iterator>

using namespace std;

int main()
{
  int a[] = {1, 2, 3};

  // The old way of iterating.  We can modify the elements, because
  // we're using a pointer to non-const data.
  for(int *i = begin(a); i != end(a); ++i)
    cout << (*i)++ << endl;

  // That would not compile, because we're trying to modifying the
  // elements that we access through a pointer to the const data.
  // for(const int *i = begin(a); i != end(a); ++i)
  //   cout << (*i)++ << endl;
  
  // We can use the 'auto' type -- the preferred way, because it's
  // less error-prone.  We let the compiler deduce the iterator type.
  for(auto i = begin(a); i != end(a); ++i)
    cout << (*i)++ << endl;

  // We're using a const iterator (a pointer to const data) here.
  for(auto i = cbegin(a); i != cend(a); ++i)
    cout << *i << endl;

  // We're using const iterators here.
  for(auto i = begin(as_const(a)); i != end(as_const(a)); ++i)
    cout << *i << endl;
}

Iterating the new way

Since C++11, we can iterate the new way, using the range-based (aka for-each) syntax of the for loop. The semantics is the same as for the old way. The range-based loop is less error-prone, because we have to write less.

The syntax is:

for(declaration: expression) statement

Where:

declaration declares the variable that is initialized with the container elements (or more precisely, the range elements) in every iteration of the loop. We refer to this variable as the declared variable.
expression is the range expression. Most often, we put the container here. Having the range expression, we can get the values of the begin and end iterators.
statement is the statement executed in every iteration of the loop. In that statement we use the declared variable.

An example:

#include <iostream>
#include <vector>
#include <utility>

using namespace std;

int main()
{
  // The code works any range: a container and a C-style array.
  vector v = {1, 2, 3};
  // int v[] = {1, 2, 3};

  // We can modify the elements, because we use them through a
  // non-const reference of type: auto &
  for(auto &e: v)
    cout << ++e << endl;

  // Here we iterate with a const reference.
  for(const auto &e: v)
    cout << e << endl;

  // This would not compile, because we refer to the elements through
  // a const reference: const auto &

  // for(const auto &e: v)
  //   cout << ++e << endl;

  // We could also iterate through a non-const range and reference the
  // elements with a const reference, becasue we refer to the
  // non-const range with a const reference.
  for(auto &e: as_const(v))
    cout << e << endl;
}

How iteration the new way works

The new range-based loop is translated by a compiler to a regular loop, where the iteration variable is of an iterator type. The iteration variable is initialized with a value returned by the begin function. The loop continues if the value of the iterator is not equal to the value returned by the end function. In every iteration of the loop, the declared variable is initialized by the dereferenced iteration variable. After an iteration, the iterator is incremented.

To use the range-based loops, we need to make sure that:

we can call the begin and end functions for the range expression,
the type of the values returned by the begin and end functions should have the following defined:
- the != comparison operator,
- the dereference operator, i.e., *,
- the prefix increment operator, i.e., ++.

Here is an example how we can use that functionality:

#include <iostream>

template <typename T>
struct range_iter
{
  T m_i;

  range_iter(T i): m_i(i)
  {
  }

  range_iter &
  operator++()
  {
    ++m_i;
    return *this;
  }

  T
  operator *() const
  {
    return m_i;
  }
  
  bool
  operator!=(const range_iter &i)
  {
    return m_i != *i;
  }
};

template <typename T>
struct range
{
  T m_a;
  T m_b;

  range(T a, T b): m_a(a), m_b(b)
  {
  }

  range_iter<T> begin()
  {
    return range_iter<T>(m_a);
  }

  range_iter<T> end()
  {
    return range_iter<T>(m_b);
  }
};

using namespace std;

int main()
{
  for(auto i: range<int>(1, 11))
    cout << i << endl;
}

Containers and element management

We can copy or move an element into a container. We can also move it from a container, except for the associative containers, for which we can extract an element. We can also emplace an element in a container.

Containers have the move semantics implemented. An example:

#include <iostream>
#include <list>
#include <set>
#include <vector>
#include <string>

using namespace std;

struct A
{
  string m_id;

  A(string &&id): m_id(std::move(id))
  {
    cout << "ctor: " << m_id << '\n';
  }

  ~A()
  {
    cout << "dtor: " << m_id << '\n';
  }

  A(const A &a): m_id(a.m_id)
  {
    m_id += "-copied";
    cout << "copy-ctor: " << m_id << '\n';
  }

  A(A &&a): m_id(std::move(a.m_id))
  {
    m_id += "-moved";
    cout << "copy-ctor: " << m_id << '\n';
  }

  A& operator = (const A &a) = delete;
  A& operator = (A &&a) = delete;

  bool
  operator < (const A &a) const
  {
    return m_id < a.m_id;
  }
};

int main()
{
  // A temporary object is not moved but copied because {A("V1")}
  // creates an std::initializer_list object that provides access to
  // its elements through a const reference.
  vector<A> va, vb{A("V1")};
  cout << "Moving a container touches no element.\n";
  va = std::move(vb);

  cout << "-------------------------------------------------\n";
  
  // A temporary object is not moved but copied.  I don't know why.
  list<A> la, lb{A("L1")};
  cout << "Moving a container touches no element.\n";
  la = std::move(lb);

  cout << "-------------------------------------------------\n";

  // A temporary object is not moved but copied.  I don't know why.
  set<A> sa, sb{A("S1")};
  cout << "Moving a container touches no element.\n";
  sa = std::move(sb);

  cout << "-------------------------------------------------\n";

  return 0;
}

Move semantics for element types

We can move elements into containers: it’s enough to make sure that the element we insert is used in an rvalue. We can also move from an element of a sequence container. Example:

#include <iostream>
#include <list>
#include <set>
#include <vector>
#include <string>

using namespace std;

struct A
{
  string m_id;

  A(string &&id): m_id(std::move(id))
  {
    cout << "ctor: " << m_id << '\n';
  }

  ~A()
  {
    cout << "dtor: " << m_id << '\n';
  }

  A(const A &a) = delete;
  A& operator = (const A &a) = delete;

  A(A &&a): m_id(std::move(a.m_id))
  {
    m_id += "-moved";
    cout << "move-ctor: " << m_id << '\n';
  }

  A &
  operator = (A &&a)
  {
    m_id = std::move(a.m_id) + "-moved";
    cout << "move-assignment: " << m_id << '\n';
    return *this;
  }

  bool
  operator < (const A &a) const
  {
    return m_id < a.m_id;
  }
};

int main()
{
  vector<A> v;
  cout << "Vector push_back:\n";
  v.push_back(A("V1"));
  cout << "Vector insert:\n";
  v.insert(v.begin(), A("V2"));
  cout << "Vector element assignment:\n";
  A a = std::move(v[0]);

  cout << "-------------------------------------------------\n";

  list<A> l;
  cout << "List push_back:\n";
  l.push_back(A("L1"));
  cout << "List push_front:\n";
  l.push_front(A("L2"));
  cout << "List element assignment:\n";
  (*l.begin()) = std::move(a);

  cout << "-------------------------------------------------\n";

  set<A> s;
  cout << "Set insert:\n";
  s.insert(A("S2"));
  cout << "Set insert again:\n";
  s.insert(A("S1"));

  cout << "-------------------------------------------------\n";

  return 0;
}

Extract

We cannot move an element from an associative container, because we cannot modify that element, even if we are using a non-const iterator, as the type of the element is const. We shouldn’t even move, because the object after moving its value would remain in the container, but in an undefined state, which would make the container inconsistent. Instead of moving the value of an object, we should extract the object. Extraction is supported only by the associative containers, because it’s needed only there.

Extraction is implemented by unlinking the element from the container. As the result of the extraction, we get a node handle (an object of a move-only type) which owns the extracted element: when a node handle is destroyed while still holding the element, that element is destroyed too. The node handle can be implemented with the unique smart pointer, i.e., std::unique_ptr.

Having a node handle, we can insert the element into a different container of the same type as the container from which the element was extracted. The element remains untouched (still in the same place with the same value) when it’s extracted and inserted; the ownership of the element changes from one container to the other passing through a node handle.

Having a node handle, we can access the owned element with the value function, and move it (move its value) somewhere, e.g., to a different container of a different type. When the node handle is destroyed, it will destroy the element from which we moved.

An example:

#include <list>
#include <iostream>
#include <set>
#include <string>

using namespace std;

struct A
{
  string m_id;

  A(string &&id): m_id(std::move(id))
  {
    cout << "ctor: " << m_id << '\n';
  }

  ~A()
  {
    cout << "dtor: " << m_id << '\n';
  }

  A(const A &a) = delete;
  A& operator = (const A &a) = delete;

  A(A &&a): m_id(std::move(a.m_id))
  {
    cout << "move-ctor: " << m_id << '\n';
  }

  A &
  operator = (A &&a)
  {
    m_id = std::move(a.m_id);
    cout << "move-assignment: " << m_id << '\n';
    return *this;
  }

  bool
  operator < (const A &a) const
  {
    return m_id < a.m_id;
  }
};

int main()
{
  set<A> a, b;
  // The temporary object is moved into a container element.
  a.insert(A("A1"));
  a.insert(A("A2"));
  cout << "TEST\n";
  b.insert(a.extract(a.begin()));

  list<A> l;
  {
    // That node handle owns A2.
    auto nh = a.extract(a.begin());
    l.push_back(std::move(nh.value()));
  }

  return 0;
}

Emplace

An element can be copied, moved, or emplaced into a container. Copying is needed when we want to keep the source element intact. Moving is faster, and so preferred over copying, if the source won’t be needed later. In both copying and moving, we create an object ourselves, and then pass it to a container. Emplacing creates an object based on the arguments we provide.

Emplacing is the fastest: a container tries to create the element in the required place: the element is created in-place, i.e., in the place (memory location) required by the container. No copying, no moving… if all goes well.

An emplace function takes the arguments of an element constructor, and passes them (forwards, technically speaking) to the constructor when it’s known where (i.e., the place is known) the element should be constructed.

We emplace by calling an emplace function of a container. Containers have other functions for emplacing with slight semantic differences, e.g., std::list has emplace_front, and std::forward_list has emplace_after.

Emplacement works similar to insertion in that the elements that follow are “pushed to the right”. Therefore emplacement entails the same performance issues as the insertion does.

A container tries to emplace the element. Tries, because the place for the element might be already taken by some other element, e.g., when we emplace at the front of a non-empty vector. If that happens, the new element is created in a different memory location, and then moved into the required place.

#include <iostream>
#include <forward_list>
#include <list>
#include <set>
#include <vector>
#include <string>

using namespace std;

struct A
{
  string m_id;

  A(string &&id): m_id(std::move(id))
  {
    cout << "ctor: " << m_id << '\n';
  }

  ~A()
  {
    cout << "dtor: " << m_id << '\n';
  }

  A(const A &a) = delete;
  A& operator = (const A &a) = delete;

  A(A &&a): m_id(std::move(a.m_id))
  {
    m_id += "-moved";
    cout << "move-ctor: " << m_id << '\n';
  }

  A &
  operator = (A &&a)
  {
    m_id = std::move(a.m_id) + "-moved";
    cout << "move-assignment: " << m_id << '\n';
    return *this;
  }
};

int main()
{
  cout << "vector: -----------------------------------------\n";

  {
    vector<A> v;

    // Uncommend the line below to prevent vector reallocation.
    // v.reserve(3);

    auto i1 = v.begin();

    // This becomes the first element in the vector.
    cout << "Checkpoint V1\n";
    v.emplace_back("V1");
    cout << "Checkpoint V2\n";
    v.emplace(v.begin(), "V2");
    cout << "Checkpoint V3\n";
    v.emplace(v.end(), "V3");

    cout << "The vector elements are:\n";
    for(const auto &e: v)
      cout << e.m_id << endl;

    cout << "The vector was relocated: " << boolalpha
         << (i1 != v.begin()) << endl;
  }
  
  cout << "list: -------------------------------------------\n";

  {
    list<A> l;
    // We can emplace at the front, and at the back, because the list
    // is doubly-linked.
    l.emplace_front("L1");
    l.emplace_back("L2");
    l.emplace(++(l.begin()), "L3");

    cout << "The list elements are:\n";
    for(const auto &e: l)
      cout << e.m_id << endl;
  }
  
  cout << "forward_list: -----------------------------------\n";

  {
    forward_list<A> f;
    f.emplace_front("L1");
    // We can emplace after an element, but not before, because it's a
    // singly-linked list.
    f.emplace_after(f.begin(), "L2");

    // We can't emplace at the back, because we don't have an iterator
    // to the preceeding element in the list.

    // f.emplace_back("L3");

    cout << "The forward_list elements are:\n";
    for(const auto &e: f)
      cout << e.m_id << endl;

  }

  return 0;
}

Conclusion

Don’t implement the basic data structures, because they are already there.
Use the containers, and get better at using them. Their functionality is quite rich.
With the containers, you can build complex data structures.
With the containers use the standard algorithms, because your own algorithm implementations will most likely perform far worse.

Quiz

What are the prominent differences between container types?
Why can’t we modify the set elements?
How does emplacement work?

Acknowledgement

The project financed under the program of the Minister of Science and Higher Education under the name “Regional Initiative of Excellence” in the years 2019 - 2022 project number 020/RID/2018/19 the amount of financing 12,000,000 PLN.

cpp