cpp

Introduction

Pointers are indispensible. Pointers:

point to memory locations, usually dynamically-allocated,
are used in every programming language: C, C++, Java, C#,
can be used in the form of a reference, e.g., in Java or C#.

The pointer support can be:

wrapped in a reference, e.g., in Java or C#,
raw or advanced, as in C++,
raw only, as in C.

In C++, it’s best to avoid the raw pointers, and go for the advanced pointer support in the form of standard smart pointers.

A reference in Java or C# is a shared pointer with the object member selection syntax (i.e., object.member). A C++ reference is an alias, which at compile time will be either optimized out or turned into a raw pointer.

Motivation: the problems of raw pointers

Raw pointers are easy to use, but their use is very error-prone, i.e., it’s easy to make mistakes.

The problems

When we have a pointer of type T * which points to a dynamically-allocated memory location, we face these problems:

the type problem: we don’t know whether a pointer points to a single piece of data or to an array of data,
the ownership problem: we don’t know whether we or someone else (i.e., some other programmer who implemented some other part of code) should destroy the allocated data,
the exception handling problem: exception handling with raw pointers is difficult, arduous, and error-prone.

The type problem

The new and delete operators come in many versions, most notably:

the single version for allocating a single piece of data,
the array version for allocating an array of data.

If we allocate data with either the single or array version of the new operator, we should destroy the data with the same version of the delete operator. However, the type of the pointer used is the same for both version, so it’s easy to mismatch the versions, which results in an undefined behaviour.

The ownership problem

The ownership problem can result in:

a memory leak, when the dynamically-allocated data is never destroyed,
a dangling pointer, when we keep using a memory location, but the data that used to be there was already destroyed,
a double deletion, when we try to destroy again the data that was already destroyed.

The exception handling problem

If we manage the dynamically-allocated data with raw pointers, the exception handling becomes a boring and error-prone task, espacially if the data is complex. It’s doable, but who really wants to do it?

An example

The example below shows how easily we can run into the type, ownership, and exception handling problems. The compiler does not report problems with this broken code.

// Who should destroy the allocated data?  Should the data be
// destroyed by the foo function?

void
foo(int *p)
{
  // By mistake the array delete is used.
  delete [] p;
}

int *
factory()
{
  int *p;

  try
    {
      p = new int;

      // Work on the new data.  An exception could be thrown here.
      throw true;

      return p;
    }
  catch(bool)
    {
      // It's easy to forget this delete:
      delete p;      
    }

  return p;
}

int
main()
{
  // The problem is brewing: we use a pointer to integer to point to
  // the begining of an array of integers.
  int *p = factory();

  // I'm thinking that foo will use, but not destroy the data.
  foo(p);

  // This is the second delete.
  delete p;
}

The smart pointer solution

A smart pointer manages dynamically-allocated data, and so we call a smart pointer the managing object, and the dynamically-allocated data the managed data.

A smart pointer doesn’t copy or move the managed data, it can only destroy the data.

The type of the managed data doesn’t have to be prepared in some special way in order to be managed by smart pointers, e.g., the type doesn’t have to be derived from some special type with the required functionality implemented.

The smart pointers solve:

the type problem: a smart pointer knows the type of the object, so that it can be automatically (i.e., without a programmer requesting it explicitely) destroyed in the proper way,
the ownership problem: a smart pointer automatically manages the dynamically-allocated data, i.e., takes care of their destruction, and implements either the exclusive or shared ownership,
the exception handling problem: a smart pointer is automatically destroyed (and so is the managed data) when an exception is handled.

Every flexible language should support raw pointers, because this low-level functionality is needed to implement high-level functionality, such as smart pointers.

A programmer should have a choice between the raw pointers (perhaps for implementing an intricate functionality) and smart pointers (just for every day use).

In C++, for every day use, a programmer should not resort to the raw pointers, let alone to the void * trickery – these times are long gone.

Smart pointer types

There are three smart pointer types defined in the memory header file:

std::unique_ptr - used to exclusively own the managed data,
std::shared_ptr - used to share the managed data,
std::weak_ptr - used to track, but not share, the managed data.

Smart pointer types are wrappers around raw pointers, which are used at compile-time only, and so at run-time they should not degrade the memory or time performance. They should be as fast and take as little memory as the code of the same functionality manually crafted with raw pointers.

Smart pointer types are:

exception-safe: they can be used without problems when throwing or catching exceptions,
not thread-safe: they should be carefully used in a multithreaded program.

There is also deprecated type std::auto_ptr – don’t use it.

`std::unique_ptr`

An object of type std::unique_ptr has the exclusive ownership semantics:

exclusive, because the managing object is the sole owner of the managed data, i.e., there can be only a single object that owns the managed data,
ownership, because the managing object is responsible for destroying the managed data.

The exclusivity implies that std::unique_ptr is a move-only type, and so:

you cannot copy-initialize or copy-assign objects of this type, and for this reason this type has the copy constructor, and the copy assignment operator explicitly deleted,
you can transfer the ownership between the managing objects by move-initializing, and move-assigning.

The ownership implies that the managed data should be destroyed when the managing object is:

destroyed, e.g., goes out of scope,
assigned new data to manage.

Most likely you need this smart pointer when you want to switch from raw pointers to smart pointers.

An example

Type std::unique_ptr is a templated type: you need to pass the type of managed data as an argument to the template type. We pass the template arguments in the angle brackets, i.e., <>, like this:

std::unique_ptr<managed_data_type> p;

In the example below, the managing object p manages the data of type int, which will be automatically destroyed by p when it goes out of scope.

#include <memory>

int
main()
{
  std::unique_ptr<int> p(new int);
}

Function `std::make_unique`

Function template std::make_unique was introduced for convenience (we could do without it): it creates both the managing object, and the managed data.

We can create the managed data yourself with the new operator, and pass its raw pointer to the managing object like this:

unique_ptr<A> up(new A("A1"));

Instead, we can write the equivalent code like this, without typing type A twice:

auto up = make_unique<A>("A1");

Function std::make_unique introduces no overhead: the move constructor will be elided for the return value, and so the managing object will be created directly in the location of up.

By type auto above we ask the compiler to make the type of up the same type as the type of the initializing expression make_unique<A>("A1"), which is std::unique_ptr<A>. We could have equivallently written:

unique_ptr<A> up = make_unique<A>("A1");

To use function std::make_unique, you need to pass the template argument, which is the type of the managed data to create, and manage. The arguments (none, one or more) of the function are passed to the constructor of the managed data (a feat accomplished with the variadic templates). In the example above "A1" is the argument passed to the constructor of type A.

No performance overhead

This example demonstrates that there is no performance overhead of using smart pointers. In more complicated examples there might be some small overhead, which should go away as compilers get better at optimization.

The following example uses both the std::unique_ptr and std::make_unique. Save this file as test1.cc:

#include <memory>

int
main()
{
  auto p = std::make_unique<int>();
}

The following example of the same functionality uses raw pointers. Save this file as test2.cc:

#include <memory>

int
main()
{
  int *p = new int;
  delete p;
}

Now compile them to the assembly code with:

g++ -S -O3 test1.cc test2.cc

Now there are two files with the assembly code: test1.s, and test2.s. Take a look at one of them:

c++filt < test1.s | less

Compare them to see that they are instruction-to-instruction the same (almost, there is one small difference), which shows there is no overhead of using std::unique_ptr and std::make_unique:

diff test1.s test2.s

How to use `std::unique_ptr`

The example below demonstrates the basic usage of std::unique_ptr.

#include <cassert>
#include <iostream>
#include <memory>
#include <string>

using namespace std;

struct A
{
  string m_name;

  A()
  {
    cout << "default ctor\n";
  }

  A(string &&name): m_name(std::move(name))
  {
    cout << "ctor: " << m_name << endl;
  }

  ~A()
  {
    cout << "dtor: " << m_name << endl;
  }

  // Smart pointers never copy or move their managed data, so we can
  // delete these special member functions, and the code should
  // compile.
  A(const A &) = delete;
  A(A &&) = delete;
  A &operator=(const A &) = delete;
  A &operator=(A &&) = delete;
};

int
main()
{
  // That's an empty pointer.
  std::unique_ptr<A> p1;

  // That's how we test whether a pointer manages some data.
  assert(!p1);
  assert(p1 == nullptr);

  // This pointer manages an object.
  std::unique_ptr<A> p2(new A("A1"));
  assert(p2);
  assert(p2 != nullptr);

  // We can assign a new object to manage, but not this way.
  // p1 = new A("A1'");

  // That's the correct way.  The previously managed object is
  // destroyed.
  p2.reset(new A("A2"));

  // Or better yet:
  p2 = make_unique<A>("A3");

  // We cannot copy-initialize, because the ownership is exclusive.
  // std::unique_ptr<A> p3(p2);
  // auto p3(p2);

  // We cannot copy-assign, because the ownership is exclusive.
  // p2 = p1;

  // We can move-initialize to move the ownership.
  auto p3 = std::move(p2);

  // We can move-assign to move the ownership.
  p2 = std::move(p3);

  // That's how we can get access to the managed data.
  cout << p2->m_name << endl;
  cout << (*p2).m_name << endl;
  cout << p2.get()->m_name << endl;

  // The "release" function releases p1 from managing the data.  The
  // managed data is not destroyed.  Luckily, p1 doesn't manage
  // anything, so we don't get a memory leak.
  p1.release();
}

The solutions to the problems

The type problem

The type problem, more specifically the problem of mismatching the single and array versions of the new and delete operators, is solved by two versions (two template overloads) of smart pointers:

std::unique_ptr<A>: the managed data will be destroyed with the single version of the delete operator,
std::unique_ptr<A[]>: the managed data will be destroyed with the array version of the delete operator.

By using the right version of the smart pointer, you don’t have to remember to destroy the managed data with the matching version of the delete operator.

Lurking problems, and how to deal with them.

However, we still can introduce bugs like in the example below, where we:

declare to allocate a single integer, but allocate an array of integers,
declare to allocate an array of integers, but allocate a single integer.

Use std::make_unique to get the same done safer, as shown below.

#include <memory>

using namespace std;

int
main()
{
  // Undefined behavior!
  unique_ptr<int> up1(new int[5]);
  unique_ptr<int[]> up2(new int);

  // The preferred way, because it's less error-prone.
  auto up3 = make_unique<int>();
  auto up4 = make_unique<int[]>(5);
}

Use `std::array` instead!

If you really have to have an array of static size (i.e., the size doesn’t change at run-time), it’s better to use std::array instead of the C-style array. You can use it with smart pointers like this:

#include <array>
#include <memory>

using namespace std;

int
main()
{
  unique_ptr<array<int, 5>> up1(new array<int, 5>);
  auto up2 = make_unique<array<int, 5>>();
}

The ownership problem

The ownership problem is solved: you just move the ownership where you need to, e.g., a function or some structure. You can move the ownership when you pass or return a unique pointer by value in a function call, as shown in the example below.

#include <iostream>
#include <memory>

using namespace std;

struct A
{
  A()
  {
    cout << "ctor\n";
  }

  ~A()
  {
    cout << "dtor\n";
  }
};

auto // C++14
factory()
{
  auto p = make_unique<A>();

  return p; // Return value optimization.
}

void
stash(unique_ptr<A> p)
{
  static unique_ptr<A> stash;
  stash = std::move(p);
}

int
main()
{
  auto p = factory();
  stash(std::move(p));
}

The exception handling problem

When an exception is thrown, the data previously allocated (or any other resource acquired) and not required any longer because of the exception, should be deleted (or released). When programming with raw pointers, we can release the memory in the catch block, as shown in the example below. We have to declare p before the try block, so that it’s accessible in the catch block, and that complicates the code.

#include <iostream>

struct A
{
  A()
  {
    std::cout << "ctor\n";
  }

  ~A()
  {
    std::cout << "dtor\n";
  }
};

void
foo()
{
  throw true;
}

int
main(void)
{
  A *p;

  try
    {
      p = new A();
      foo();
      delete p;
    }
  catch (bool)
    {
      // Have to delete.
      delete p;
    }
  
  return 0;
}

The same can be accomplished with smart pointers better:

#include <iostream>
#include <memory>

struct A
{
  A()
  {
    std::cout << "ctor\n";
  }

  ~A()
  {
    std::cout << "dtor\n";
  }
};

void
foo()
{
  throw true;
}

int
main(void)
{
  try
    {
      auto p = std::make_unique<A>();
      foo();
    }
  catch (bool)
    {
    }
  
  return 0;
}

Raw pointers not so easy, rather error-prone.

Because function arguments are not guaranteed to be evalued in the order they are listed, in the example below we’ve got a memory leak. At least I’ve got it with GCC, and if you don’t, try to swap the arguments in the call to foo. The object of class A is:

created, because the second argument of the call to function foo is evaluated first, before function index is called,
not destroyed, because function foo is not called, because a call to function index throws an exception.

#include <iostream>

struct A
{
  A()
  {
    std::cout << "ctor\n";
  }

  ~A()
  {
    std::cout << "dtor\n";
  }
};

void
foo(int, A *p)
{
  delete p;
}

int
index()
{
  throw true;
  return 0;
}

int
main(void)
{
  try
    {
      foo(index(), new A());
    }
  catch (bool)
    {
    }
  
  return 0;
}

The same can be accomplished the safe way with smart pointers. This code works correctly regardless of whether an exception is thrown or not.

#include <iostream>
#include <memory>

struct A
{
  A()
  {
    std::cout << "ctor\n";
  }

  ~A()
  {
    std::cout << "dtor\n";
  }
};

void
foo(int, std::unique_ptr<A> p)
{
}

int
index()
{
  throw true;
  return 0;
}

int
main(void)
{
  try
    {
      foo(index(), std::make_unique<A>());
    }
  catch (bool)
    {
    }
  
  return 0;
}

The first example revisited

Below there is the first example fixed with raw pointers. All problems gone.

#include <memory>

using namespace std;

void
foo(unique_ptr<int> p)
{
}

unique_ptr<int>
factory()
{
  try
    {
      auto p = make_unique<int>();

      // Work on the new data.  An exception could be thrown here.
      throw true;

      return p;
    }
  catch(bool)
    {
    }

  return nullptr;
}

int
main()
{
  auto p = factory();
  foo(std::move(p));
}

Conclusion

Don’t use raw pointers, unless you really have to.
Start using std::unique_ptr, the most useful smart pointer type.
Smart pointers solve the type, ownership and exception handling problems.
Smart pointers introduce no, or little, performance overhead.
Go for the smart pointers!

Quiz

Why should we use smart pointers instead of raw pointers?
What is the exclusive ownership?
What do we need the make_uniqe function for?

Acknowledgement

The project financed under the program of the Minister of Science and Higher Education under the name “Regional Initiative of Excellence” in the years 2019 - 2022 project number 020/RID/2018/19 the amount of financing 12,000,000 PLN.

cpp

Introduction

Motivation: the problems of raw pointers

The problems

The type problem

The ownership problem

The exception handling problem

An example

The smart pointer solution

Smart pointer types

std::unique_ptr

An example

Function std::make_unique

No performance overhead

How to use std::unique_ptr

The solutions to the problems

The type problem

Lurking problems, and how to deal with them.

Use std::array instead!

The ownership problem

The exception handling problem

Raw pointers not so easy, rather error-prone.

The first example revisited

Conclusion

Quiz

Acknowledgement

`std::unique_ptr`

Function `std::make_unique`

How to use `std::unique_ptr`

Use `std::array` instead!