A list, an associative map, a set, and other data structures, are called containers in C++. A container:
is a generic data structure,
stores elements of any, but only one type,
stores elements by value, not by reference,
stores elements at the heap,
can grow or shrink dynamically (i.e., at run-time).
A container is generic, because it can store data of any type T
. A
container type (e.g., std::vector<T>
) is templated, because we
have to give a template argument T
at compile-type.
While the container itself (i.e., the object of the container type) can be anywhere in memory, the elements have to be stored at the heap, because the size of the container can change at run-time.
Containers are complemented by iterators and algorithms. The iterator
is a generalization of a pointer. Algorithms, such as std::sort
,
are generalized in that they can be used with various containers,
because access to the container elements is provided with iterators.
Standard containers, iterators, and algorithms fit together because of
the naming and semantic convention. As for the naming convention, for
instance, every container has the begin
, and end
, size
, and
other functions. As for the semantic convention, for instance, we get
the data of iterator i
by dereferencing: *i
.
In the early 90’s, containers were:
a cutting-edge technology, and a hot research topic,
originally implemented as the STL, the seminal work by Alexander Stepanov.
Now containers are:
an indispensable tool of every-day use,
a part of the standard library.
There is no excuse, put them to use.
Sequence containers:
std::vector<T>
- the vector,
std::list<T>
- the doubly-linked list,
std::deque<T>
- the deque, amalgamation of the vector and the
doubly-linked list,
std::forward_list<T>
- the singly-linked list,
Associative containers:
std::map<K, V>
- the associative array (aka dictionary),
std::multimap<K, V>
- the associative array with duplicate keys
allowed,
std::set<T>
- the set,
std::multiset<T>
- the set with duplicate elements allowed,
The adapters:
std::stack
- the stack,
std::queue
- the queue,
std::priority_queue<T>
- the priority queue.
Container types can be nested, i.e., T
can be a container type too.
std::vector<T>
The vector offers:
random access with the random access operator, i.e., the []
operator, aka the index operator, or the subscript operator,
memory contiguity: all elements are stored contiguous in memory, which implies:
the random access operator is as fast as can be, because it only has to increase a pointer,
the memory access is faster: the elements of the vector are all stored compactly together, and so the processor cache is used the most effectively,
slow insertion and removal.
The vector may have to reallocate its elements when the currently allocated memory is not enough, when we insert a new element at the end of the vector. First, new memory has to be allocated. Next, the elements are copied or moved, depending on whether the type of the elements has the move semantics implemented, and then a new element is added (copied or moved). Finally, the old memory is released.
The performance of the vector drops not only when elements are frequently reallocated, but also when elements are frequently inserted or removed. When an element is inserted or removed, the elements that follow have to be moved (or copied) one-by-one, because the vector has to guarantee the memory contiguity.
In comparison with other containers, the vector performs very well if the reallocation, insertion and removal do not frequently happen, for instance, if we build a vector, fill it in, and then random-access the elements frequently.
std::list<T>
The list does not store its elements contiguously in memory, but stores them in different locations at the heap, which were allocated separately. Then the elements are doubly-linked, which implies:
fast insertion and removal, because elements do not have to be reallocated, and neither the elements that follow have to be moved one-by-one,
iterative access only, because to access some element, we have to get to it through the elements that proceed or follow it.
The list does not offer element random access, even though it could
with a more elaborate implementation. However, C++ aims at providing
fast and lean tools, and such a bloated list would decrease
performance. The list of type std::list
offers the minimal
functionality required, but with superior (time and memory)
performance.
std::deque<T>
The deque (pronounced as “deck”, as in deck of cards) offers:
random access,
fast insertion and removal.
How come? Well, insertion and removal are not as fast as in
std::list
, and random access is not as fast as in std::vector
.
The deque offers a trade-off between functionality, and efficiency.
Use the deque if you frequently need to random-access, insert and
remove elements.
The deque is implemented with small vectors organized one after another, but without guaranteeing memory contiguity. If element reallocation, insertion or removal is required, then it’s limited to one small vector. However, random access is less efficient in comparison with vector, because finding an address of an element requires more arithmetic.
Use the deque only when the vector and the list won’t do.
std::forward_list<T>
Sometimes std::list
is more than we need from a list. Sometimes we
need to forward-iterate only, and can do without the capability to
backward-iterate that is offered by std::list
, the capability that
we pay for with performance.
Type std::forward_list
is even leaner and more performant than
std::list
, because it is a singly linked list: we can iterate
forward, but not backward.
std::set
, and std::multiset
The set stores unique elements, and the multiset allows for multiple elements of an equal value. Both container types store the elements sorted. By default the order is ascending, but we can establish any order with a callable.
Interestingly, we cannot modify the elements, because that would destroy the order of elements, and make the container inconsistent. For this reason, the type of the elements stored is made const, even if the template argument was non-const.
If we want to modify an element, then we have to remove the element first, and insert a new element with a different value next.
#include <iostream>
#include <set>
using namespace std;
int
main()
{
// A compiler can deduce the type of the elements stored in the set.
// Equivalent to: set<int> s = {3, 1, 2};
set s = {3, 1, 2};
for(auto &e: s)
cout << e << endl;
// This would not compile, because we cannot modify the elements.
// for(auto &e: s)
// cout << ++e << endl;
multiset ms = {1, 2, 3, 1, 2, 3};
for(auto &e: ms)
cout << e << endl;
}
Iterators are the glue between the containers, and the algorithms.
Function std::sort
can be used with various containers through
iterators. Iterators provide access to the container elements, so
that we do not care about the type of a container (the container type
was abstracted away).
An iterator is a generalization of a pointer. We could say that a pointer is the iterator of a C-style array. We can initialize the pointer, compare it to some other pointer, dereference it to get to the element it points to, and also increment it. Furthermore, we can random access any element in the C-style array if we increase (with the + operator) the pointer to the element number 0 by the value of the index as in here:
#include <iostream>
using namespace std;
int
main()
{
// C-style array.
int a[] = {1, 2, 3};
// The same as: int *i = a;
auto i = std::begin(a);
cout << *i << endl;
cout << *(i + 2) << endl;
++i;
cout << *(i - 1) << endl;
--i;
cout << std::boolalpha << (i == std::end(a)) << endl;
i += 3;
cout << std::boolalpha << (i == std::end(a)) << endl;
}
Iterator types are user-defined, e.g., of a structure type. Iterators are wrappers around pointers, where the operators (defined for that type) implement the required functionality. For instance, if in the example above we replace the C-style array with a deque, the rest of the code can remain untouched:
#include <deque>
#include <iostream>
using namespace std;
int
main()
{
deque<int> a = {1, 2, 3};
auto i = std::begin(a);
cout << *i << endl;
cout << *(i + 2) << endl;
++i;
cout << *(i - 1) << endl;
--i;
cout << std::boolalpha << (i == std::end(a)) << endl;
i += 3;
cout << std::boolalpha << (i == std::end(a)) << endl;
}
Iterators of the standard library are very small and very efficient. They typically store only a single pointer. Therefore we are free to use them by value, and copy them. We could use iterators by reference too, but that would be just awkward, just as awkward would be using pointers by reference.
For a given container type T
, there are always at least two iterator
types defined:
non-const iterator type: T::iterator
,
const iterator type: T::const_iterator
.
You cannot modify elements of a container with a const iterator. For better code, always use the const iterator if you do not modify the elements.
begin
, end
We know where the elements of a container are with the begin
, and
end
functions. The begin
function returns an iterator to the
first element. The end
function returns a past-the-end iterator
which you would get if you incremented an iterator to the last
element: we could say the end
function returns an iterator to an
imaginary element (imaginary, because that element does not exist)
that would follow the last element. If a container has no elements,
the iterators returned by begin
and end
equal.
The begin
and end
functions return non-const iterators for a
non-const container, and const iterators for a const container. If we
want to iterate with a const iterator over a non-const container, we
can use the cbegin
and cend
functions, which return const
iterators.
The cbegin
and cend
are for convenience only, because they are
dispensable. We can achieve the same functionality by calling the
begin
and end
functions for a non-const container when we
reference the container with a const reference, which we can do with
the std::as_const
function.
Iterators can be categorized into a few basic categories (concepts, technically speaking), depending on the provided functionality:
a forward iterator,
a bidirectional iterator,
a random-access iterator.
A forward iterator provides only the two most basic operations: *i
,
++i
. The iterator of std::forward_list
is a forward iterator.
Here’s an example:
#include <deque>
#include <forward_list>
#include <iostream>
using namespace std;
int
main()
{
forward_list<int> a = {1, 2, 3};
auto i = std::begin(a);
cout << *i << endl;
// That would not compile.
// cout << *(i + 2) << endl;
++i;
// That would not compile.
// cout << *(i - 1) << endl;
// That would not compile.
// --i;
cout << std::boolalpha << (i == std::end(a)) << endl;
// That would not compile.
// i += 3;
cout << std::boolalpha << (i == std::end(a)) << endl;
}
A bidirectional iterator is a forward iterator with an extra operation
defined: --i
, i.e., move back by one element. The iterator of
std::list
is a bidirectional iterator. For example:
#include <deque>
#include <list>
#include <iostream>
using namespace std;
int
main()
{
list<int> a = {1, 2, 3};
auto i = std::begin(a);
cout << *i << endl;
// That would not compile.
// cout << *(i + 2) << endl;
++i;
// That would not compile.
// cout << *(i - 1) << endl;
--i;
cout << std::boolalpha << (i == std::end(a)) << endl;
// That would not compile.
// i += 3;
cout << std::boolalpha << (i == std::end(a)) << endl;
}
A random-access iterator allows for increasing or decreasing the iterator value by any integer, as we’ve done in the example above with the deque. The iterator of the vector is a random-access iterator. A pointer is a random-access iterator too.
We can forward-iterate over the elements of a container (or, more broadly, a range, that includes the C-style array) in two ways: the old, and the new.
As shown in the example below, you can forward-iterate over the
elements of a container the old way, which is a bit tedious, because
we have to initialize the iteration variable i
, write the loop
condition, and then increment the variable. This loop is also
error-prone, as it’s easy to mistakenly write ‘–i’ instead of ‘++i’,
or begin
instead of end
.
#include <iostream>
#include <vector>
using namespace std;
int main()
{
vector v = {1, 2, 3};
// The legacy way of iterating. We can modify the elements, because
// we're using a non-const iterator.
for(vector<int>::iterator i = v.begin(); i != v.end(); ++i)
cout << (*i)++ << endl;
// That would not compile, because we're trying to modifying the
// elements that we access through a const iterator.
// for(vector<int>::const_iterator i = v.begin(); i != v.end(); ++i)
// cout << (*i)++ << endl;
// We can use the 'auto' type -- the preferred way, because it's
// less error-prone. We let the compiler deduce the iterator type.
for(auto i = v.begin(); i != v.end(); ++i)
cout << (*i)++ << endl;
// We're using const iterators here.
for(auto i = v.cbegin(); i != v.cend(); ++i)
cout << *i << endl;
// We're using const iterators here as well.
for(auto i = begin(as_const(v)); i != end(as_const(v)); ++i)
cout << *i << endl;
}
Likewise for C-style arrays:
#include <iostream>
#include <iterator>
using namespace std;
int main()
{
int a[] = {1, 2, 3};
// The old way of iterating. We can modify the elements, because
// we're using a pointer to non-const data.
for(int *i = begin(a); i != end(a); ++i)
cout << (*i)++ << endl;
// That would not compile, because we're trying to modifying the
// elements that we access through a pointer to the const data.
// for(const int *i = begin(a); i != end(a); ++i)
// cout << (*i)++ << endl;
// We can use the 'auto' type -- the preferred way, because it's
// less error-prone. We let the compiler deduce the iterator type.
for(auto i = begin(a); i != end(a); ++i)
cout << (*i)++ << endl;
// We're using a const iterator (a pointer to const data) here.
for(auto i = cbegin(a); i != cend(a); ++i)
cout << *i << endl;
// We're using const iterators here.
for(auto i = begin(as_const(a)); i != end(as_const(a)); ++i)
cout << *i << endl;
}
Since C++11, we can iterate the new way, using the range-based (aka for-each) syntax of the for loop. The semantics is the same as for the old way. The range-based loop is less error-prone, because we have to write less.
The syntax is:
for(declaration: expression) statement
Where:
declaration
declares the variable that is initialized with the
container elements (or more precisely, the range elements) in every
iteration of the loop. We refer to this variable as the declared
variable.
expression
is the range expression. Most often, we put the
container here. Having the range expression, we can get the values
of the begin and end iterators.
statement
is the statement executed in every iteration of the
loop. In that statement we use the declared variable.
An example:
#include <iostream>
#include <vector>
#include <utility>
using namespace std;
int main()
{
// The code works any range: a container and a C-style array.
vector v = {1, 2, 3};
// int v[] = {1, 2, 3};
// We can modify the elements, because we use them through a
// non-const reference of type: auto &
for(auto &e: v)
cout << ++e << endl;
// Here we iterate with a const reference.
for(const auto &e: v)
cout << e << endl;
// This would not compile, because we refer to the elements through
// a const reference: const auto &
// for(const auto &e: v)
// cout << ++e << endl;
// We could also iterate through a non-const range and reference the
// elements with a const reference, becasue we refer to the
// non-const range with a const reference.
for(auto &e: as_const(v))
cout << e << endl;
}
The new range-based loop is translated by a compiler to a regular
loop, where the iteration variable is of an iterator type. The
iteration variable is initialized with a value returned by the begin
function. The loop continues if the value of the iterator is not
equal to the value returned by the end
function. In every iteration
of the loop, the declared variable is initialized by the
dereferenced iteration variable. After an iteration, the iterator
is incremented.
To use the range-based loops, we need to make sure that:
we can call the begin
and end
functions for the range
expression,
the type of the values returned by the begin
and end
functions
should have the following defined:
the !=
comparison operator,
the dereference operator, i.e., *
,
the prefix increment operator, i.e., ++
.
Here is an example how we can use that functionality:
#include <iostream>
template <typename T>
struct range_iter
{
T m_i;
range_iter(T i): m_i(i)
{
}
range_iter &
operator++()
{
++m_i;
return *this;
}
T
operator *() const
{
return m_i;
}
bool
operator!=(const range_iter &i)
{
return m_i != *i;
}
};
template <typename T>
struct range
{
T m_a;
T m_b;
range(T a, T b): m_a(a), m_b(b)
{
}
range_iter<T> begin()
{
return range_iter<T>(m_a);
}
range_iter<T> end()
{
return range_iter<T>(m_b);
}
};
using namespace std;
int main()
{
for(auto i: range<int>(1, 11))
cout << i << endl;
}
We can copy or move an element into a container. We can also move it from a container, except for the associative containers, for which we can extract an element. We can also emplace an element in a container.
Containers have the move semantics implemented. An example:
#include <iostream>
#include <list>
#include <set>
#include <vector>
#include <string>
using namespace std;
struct A
{
string m_id;
A(string &&id): m_id(std::move(id))
{
cout << "ctor: " << m_id << '\n';
}
~A()
{
cout << "dtor: " << m_id << '\n';
}
A(const A &a): m_id(a.m_id)
{
m_id += "-copied";
cout << "copy-ctor: " << m_id << '\n';
}
A(A &&a): m_id(std::move(a.m_id))
{
m_id += "-moved";
cout << "copy-ctor: " << m_id << '\n';
}
A& operator = (const A &a) = delete;
A& operator = (A &&a) = delete;
bool
operator < (const A &a) const
{
return m_id < a.m_id;
}
};
int main()
{
// A temporary object is not moved but copied because {A("V1")}
// creates an std::initializer_list object that provides access to
// its elements through a const reference.
vector<A> va, vb{A("V1")};
cout << "Moving a container touches no element.\n";
va = std::move(vb);
cout << "-------------------------------------------------\n";
// A temporary object is not moved but copied. I don't know why.
list<A> la, lb{A("L1")};
cout << "Moving a container touches no element.\n";
la = std::move(lb);
cout << "-------------------------------------------------\n";
// A temporary object is not moved but copied. I don't know why.
set<A> sa, sb{A("S1")};
cout << "Moving a container touches no element.\n";
sa = std::move(sb);
cout << "-------------------------------------------------\n";
return 0;
}
We can move elements into containers: it’s enough to make sure that the element we insert is used in an rvalue. We can also move from an element of a sequence container. Example:
#include <iostream>
#include <list>
#include <set>
#include <vector>
#include <string>
using namespace std;
struct A
{
string m_id;
A(string &&id): m_id(std::move(id))
{
cout << "ctor: " << m_id << '\n';
}
~A()
{
cout << "dtor: " << m_id << '\n';
}
A(const A &a) = delete;
A& operator = (const A &a) = delete;
A(A &&a): m_id(std::move(a.m_id))
{
m_id += "-moved";
cout << "move-ctor: " << m_id << '\n';
}
A &
operator = (A &&a)
{
m_id = std::move(a.m_id) + "-moved";
cout << "move-assignment: " << m_id << '\n';
return *this;
}
bool
operator < (const A &a) const
{
return m_id < a.m_id;
}
};
int main()
{
vector<A> v;
cout << "Vector push_back:\n";
v.push_back(A("V1"));
cout << "Vector insert:\n";
v.insert(v.begin(), A("V2"));
cout << "Vector element assignment:\n";
A a = std::move(v[0]);
cout << "-------------------------------------------------\n";
list<A> l;
cout << "List push_back:\n";
l.push_back(A("L1"));
cout << "List push_front:\n";
l.push_front(A("L2"));
cout << "List element assignment:\n";
(*l.begin()) = std::move(a);
cout << "-------------------------------------------------\n";
set<A> s;
cout << "Set insert:\n";
s.insert(A("S2"));
cout << "Set insert again:\n";
s.insert(A("S1"));
cout << "-------------------------------------------------\n";
return 0;
}
We cannot move an element from an associative container, because we cannot modify that element, even if we are using a non-const iterator, as the type of the element is const. We shouldn’t even move, because the object after moving its value would remain in the container, but in an undefined state, which would make the container inconsistent. Instead of moving the value of an object, we should extract the object. Extraction is supported only by the associative containers, because it’s needed only there.
Extraction is implemented by unlinking the element from the
container. As the result of the extraction, we get a node handle
(an object of a move-only type) which owns the extracted element: when
a node handle is destroyed while still holding the element, that
element is destroyed too. The node handle can be implemented with the
unique smart pointer, i.e., std::unique_ptr
.
Having a node handle, we can insert the element into a different container of the same type as the container from which the element was extracted. The element remains untouched (still in the same place with the same value) when it’s extracted and inserted; the ownership of the element changes from one container to the other passing through a node handle.
Having a node handle, we can access the owned element with the value
function, and move it (move its value) somewhere, e.g., to a different
container of a different type. When the node handle is destroyed, it
will destroy the element from which we moved.
An example:
#include <list>
#include <iostream>
#include <set>
#include <string>
using namespace std;
struct A
{
string m_id;
A(string &&id): m_id(std::move(id))
{
cout << "ctor: " << m_id << '\n';
}
~A()
{
cout << "dtor: " << m_id << '\n';
}
A(const A &a) = delete;
A& operator = (const A &a) = delete;
A(A &&a): m_id(std::move(a.m_id))
{
cout << "move-ctor: " << m_id << '\n';
}
A &
operator = (A &&a)
{
m_id = std::move(a.m_id);
cout << "move-assignment: " << m_id << '\n';
return *this;
}
bool
operator < (const A &a) const
{
return m_id < a.m_id;
}
};
int main()
{
set<A> a, b;
// The temporary object is moved into a container element.
a.insert(A("A1"));
a.insert(A("A2"));
cout << "TEST\n";
b.insert(a.extract(a.begin()));
list<A> l;
{
// That node handle owns A2.
auto nh = a.extract(a.begin());
l.push_back(std::move(nh.value()));
}
return 0;
}
An element can be copied, moved, or emplaced into a container. Copying is needed when we want to keep the source element intact. Moving is faster, and so preferred over copying, if the source won’t be needed later. In both copying and moving, we create an object ourselves, and then pass it to a container. Emplacing creates an object based on the arguments we provide.
Emplacing is the fastest: a container tries to create the element in the required place: the element is created in-place, i.e., in the place (memory location) required by the container. No copying, no moving… if all goes well.
An emplace function takes the arguments of an element constructor, and passes them (forwards, technically speaking) to the constructor when it’s known where (i.e., the place is known) the element should be constructed.
We emplace by calling an emplace
function of a container.
Containers have other functions for emplacing with slight semantic
differences, e.g., std::list
has emplace_front
, and
std::forward_list
has emplace_after
.
Emplacement works similar to insertion in that the elements that follow are “pushed to the right”. Therefore emplacement entails the same performance issues as the insertion does.
A container tries to emplace the element. Tries, because the place for the element might be already taken by some other element, e.g., when we emplace at the front of a non-empty vector. If that happens, the new element is created in a different memory location, and then moved into the required place.
#include <iostream>
#include <forward_list>
#include <list>
#include <set>
#include <vector>
#include <string>
using namespace std;
struct A
{
string m_id;
A(string &&id): m_id(std::move(id))
{
cout << "ctor: " << m_id << '\n';
}
~A()
{
cout << "dtor: " << m_id << '\n';
}
A(const A &a) = delete;
A& operator = (const A &a) = delete;
A(A &&a): m_id(std::move(a.m_id))
{
m_id += "-moved";
cout << "move-ctor: " << m_id << '\n';
}
A &
operator = (A &&a)
{
m_id = std::move(a.m_id) + "-moved";
cout << "move-assignment: " << m_id << '\n';
return *this;
}
};
int main()
{
cout << "vector: -----------------------------------------\n";
{
vector<A> v;
// Uncommend the line below to prevent vector reallocation.
// v.reserve(3);
auto i1 = v.begin();
// This becomes the first element in the vector.
cout << "Checkpoint V1\n";
v.emplace_back("V1");
cout << "Checkpoint V2\n";
v.emplace(v.begin(), "V2");
cout << "Checkpoint V3\n";
v.emplace(v.end(), "V3");
cout << "The vector elements are:\n";
for(const auto &e: v)
cout << e.m_id << endl;
cout << "The vector was relocated: " << boolalpha
<< (i1 != v.begin()) << endl;
}
cout << "list: -------------------------------------------\n";
{
list<A> l;
// We can emplace at the front, and at the back, because the list
// is doubly-linked.
l.emplace_front("L1");
l.emplace_back("L2");
l.emplace(++(l.begin()), "L3");
cout << "The list elements are:\n";
for(const auto &e: l)
cout << e.m_id << endl;
}
cout << "forward_list: -----------------------------------\n";
{
forward_list<A> f;
f.emplace_front("L1");
// We can emplace after an element, but not before, because it's a
// singly-linked list.
f.emplace_after(f.begin(), "L2");
// We can't emplace at the back, because we don't have an iterator
// to the preceeding element in the list.
// f.emplace_back("L3");
cout << "The forward_list elements are:\n";
for(const auto &e: f)
cout << e.m_id << endl;
}
return 0;
}
Don’t implement the basic data structures, because they are already there.
Use the containers, and get better at using them. Their functionality is quite rich.
With the containers, you can build complex data structures.
With the containers use the standard algorithms, because your own algorithm implementations will most likely perform far worse.
What are the prominent differences between container types?
Why can’t we modify the set elements?
How does emplacement work?
The project financed under the program of the Minister of Science and Higher Education under the name “Regional Initiative of Excellence” in the years 2019 - 2022 project number 020/RID/2018/19 the amount of financing 12,000,000 PLN.