
Introduction

C++ processes data of built-in types (e.g., int, long double) or user-defined types (e.g., struct A, class B, enum C, union D). The C++ standard describes how this data is organized in memory.

The C++ memory organization has to respect the basic requirements of an operating system, but the rest is up to C++.

The basic requirements of an operating system

When we run a program, it becomes a process of the operating system: a task that a processor executes. A process manages its memory within the limits imposed by the operating system. An operating system offers a process two types of memory: read-only and read-write.

The read-only memory stores the code of the program (i.e., the processor instructions) and the static data (e.g., string literals). This memory is shared by all processes of the same program, which can be a substantial saving for a large program run as many processes (e.g., a web server).

An unprivileged task (a privileged task is a kernel task, i.e., a task of an operating system) cannot do anything that could disturb the operating system and other processes. For instance, an unprivileged task cannot write to its read-only memory. Every process is an unprivileged task.

In the following example we try to write to the read-only memory. The code compiles, but the process is killed by the operating system with the SIGSEGV (segmentation violation) signal.

#include <iostream>

using namespace std;

int main()
{
  // "Hello!" is a string literal in the read-only memory.
  *static_cast<char *>("Hello!") = 'Y';

  // A string literal is of type "const char *", and that's why we had
  // to static-cast it to "char *".  This would not compile:
  // *"Hello!" = 'Y';
}

All other data is located in the read-write memory, because it can be changed by a process. Every process has its own read-write memory, even when there are many processes of the same program.

What is up to C++

C++ strives for time and memory performance, and that is reflected in the memory organization, e.g., by using pointers (C++ keeps close to the hardware). Furthermore, C++ strives for flexible control over data management, e.g., by allowing a programmer to allocate an object statically, globally, locally, or dynamically. Finally, the C++ memory organization is deterministic: we know exactly when and where data are destroyed (so that they are destroyed as soon as they are no longer needed).

C++ is in stark contrast with other languages, such as Java or C#, where object management is simplified at the cost of performance and flexible control. For instance, such languages allow allocation of objects on the heap only, which hurts performance and flexibility, but enables an easy implementation of garbage collection. Some garbage collectors are even less efficient because they are nondeterministic, i.e., it is undefined when data are destroyed.

In the past, the C++ Standard Committee considered garbage collection support, but dropped it for performance reasons. Nowadays C++ requires no garbage collection, since it offers advanced containers and smart pointers, which could be considered a form of garbage collection.

Data and their location

The read-write memory stores the global and static data, the local data (on the stack), and the dynamic data (on the heap).

The global and static data

Global data are initialized before entering the main function:

#include <iostream>

using namespace std;

// This is not a string literal, but a table of characters initialized
// with a string literal.
char t[] = "Hello!";

int main()
{
  t[0] = 'Y';
  cout << t << endl;
}

Static data local to a function are initialized when control first passes through their declaration:

#include <iostream>

using namespace std;

struct A
{
  A()
  {
    cout << "A" << endl;
  }
};

void foo(bool flag)
{
  cout << "foo" << endl;
  if (flag)
    static A a;
}

int main()
{
  cout << "Main" << endl;
  foo(false);
  foo(true);
  foo(true);
}

In the example above, remove static and notice how the program output changes.

The local data

All data local to a function or a block scope are allocated on the stack. Local data are automatically destroyed when they go out of scope. This is not only a great property you can rely on to have your data destroyed, but also a necessity, since the stack has to be cleaned up when the scope ends.

Data created locally are destroyed in the reverse order of their creation, because the stack is a FILO (first in, last out) structure.

#include <iostream>
#include <string>

using namespace std;

struct A
{
  string m_name;

  A(const string &name): m_name(name)
  {
    cout << "ctor: " << m_name << endl;
  }

  ~A()
  {
    cout << "dtor: " << m_name << endl;
  }
};

int main()
{
  A a("a, function scope");
  A b("b, function scope");

  // Block scope.
  {
    A a("a, block scope");
    A b("b, block scope");
  }
  
  cout << "Bye!" << endl;
}

The dynamic data

Dynamic data are created on the heap, and should be managed by smart pointers, which in turn use the low-level functionality of the new and delete operators provided for raw pointers.

Data created with the new operator eventually has to be destroyed with the delete operator, otherwise we get a memory leak. We must not destroy the same data twice, otherwise we get undefined behavior (e.g., a crash or silent memory corruption).
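
To illustrate, here is a minimal sketch of both pitfalls (the struct name Leaky is made up for this example): the allocated object is never deleted, so it leaks, and the repeated delete is mentioned only in a comment, because executing it would be undefined behavior.

#include <iostream>

using namespace std;

struct Leaky
{
  Leaky()
  {
    cout << "ctor" << endl;
  }

  ~Leaky()
  {
    cout << "dtor" << endl;
  }
};

int main()
{
  Leaky *p = new Leaky();
  cout << "Allocated at " << p << endl;

  // We should eventually call "delete p;".  Since we don't, the
  // object leaks: "dtor" is never printed, and the memory is never
  // released.

  // Had we called "delete p;" twice, the second delete would be
  // undefined behavior.
}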

A programmer should use smart pointers, which are safe but harder to use. In contrast, raw pointers are easy to use but error-prone (often resulting in vexing heisenbugs). Since smart pointers arrived with C++11, modern code uses smart pointers, and legacy code uses raw pointers.

The following example uses the low-level new and delete operators, which is not recommended, but is suitable for demonstrating dynamic allocation.

#include <iostream>
#include <string>

using namespace std;

struct A
{
  A()
  {
    cout << "ctor\n";
  }

  ~A()
  {
    cout << "dtor\n";
  }
};

A * factory()
{
  return new A();
}

int main()
{
  A *p = factory();
  delete p;

  cout << "Bye!" << endl;
}
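
For comparison, here is a sketch of the same factory rewritten with std::unique_ptr (using make_unique, available since C++14): the object is destroyed automatically when the pointer goes out of scope, so there is no delete to forget.

#include <iostream>
#include <memory>

using namespace std;

struct A
{
  A()
  {
    cout << "ctor\n";
  }

  ~A()
  {
    cout << "dtor\n";
  }
};

unique_ptr<A> factory()
{
  // make_unique allocates the object on the heap and wraps it in a
  // smart pointer.
  return make_unique<A>();
}

int main()
{
  unique_ptr<A> p = factory();

  cout << "Bye!" << endl;

  // "p" goes out of scope here: the destructor of A is called and
  // the memory is released automatically.
}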

Local vs dynamic data

Allocation on the stack is fast: it’s only necessary to increase (or decrease, depending on the system architecture) the stack pointer (a.k.a. the stack register) by the size of the data needed. No memory allocation is faster. If an operating system supports it, the stack can have more memory allocated automatically when needed, i.e., without the process requesting it explicitly.

The following code tests how big the stack is, and whether the operating system automatically allocates more memory for the stack. A function calls itself recursively and prints how many times it has been called. If we see small numbers (below a million) when the process is terminated, the operating system does not automatically allocate more memory for the stack. If we see large numbers (a million or far more), then the operating system most likely does.

#include <iostream>

using namespace std;

void
foo(long int x)
{
  int y = x;
  cout << y << endl;
  foo(++y);
}

int main()
{
  foo(0);
}

Allocation on the heap is slow, because the heap is a complex data structure which not only allocates and deallocates memory of arbitrary size, but also deals with fragmentation, and so several memory reads and writes are necessary for an allocation. The operating system allocates more memory for the heap when the process (specifically, the library that manages the heap) requests it.

Data located on the stack are packed together according to when they were created, and so related data are close to each other. This is called locality. And locality is good, because the data a process needs are most likely already in the processor cache (which caches recently accessed memory), speeding up memory access manyfold. Data allocated on the heap are less localized, i.e., they are more likely to be spread all over the heap memory, which slows down memory access, as quite likely the data are not in the processor cache.
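
To get a rough feel for the difference in allocation speed, here is a minimal benchmark sketch (the buffer size and the iteration count are arbitrary choices made for this illustration). It times many small stack allocations against the same number of heap allocations with new and delete. The exact numbers depend on the machine, the compiler, and the optimization level (an optimizer may even remove the stack loop entirely), so treat them only as an illustration.

#include <chrono>
#include <iostream>

using namespace std;

int main()
{
  const int n = 1000000;
  long sum1 = 0, sum2 = 0;

  auto t0 = chrono::steady_clock::now();

  for (int i = 0; i < n; ++i)
    {
      // A local buffer: allocated by moving the stack pointer.
      char buffer[64];
      buffer[0] = static_cast<char>(i);
      sum1 += buffer[0];
    }

  auto t1 = chrono::steady_clock::now();

  for (int i = 0; i < n; ++i)
    {
      // A heap buffer: the allocator has to find and account for a
      // free block.
      char *buffer = new char[64];
      buffer[0] = static_cast<char>(i);
      sum2 += buffer[0];
      delete [] buffer;
    }

  auto t2 = chrono::steady_clock::now();

  cout << "stack: "
       << chrono::duration_cast<chrono::milliseconds>(t1 - t0).count()
       << " ms, heap: "
       << chrono::duration_cast<chrono::milliseconds>(t2 - t1).count()
       << " ms" << endl;

  // Print the sums so that the compiler cannot optimize the loops
  // away completely.
  cout << sum1 << " " << sum2 << endl;
}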

Function calls

When calling a function we pass an argument by either value or reference. Also, a function can return its result by either value or reference.

Passing arguments

In C++ arguments are always passed either by value or by reference.

If a parameter of a function is of a non-reference type, we say that a function takes an argument by value, or that we pass an argument to a function by value. The argument (i.e., the argument expression) is used to initialize the parameter, which in legacy C++ always entailed copying the data from the argument to the parameter.

If a parameter of a function is of a reference type, we say that a function takes an argument by reference, or that we pass an argument to a function by reference. The reference parameter is initialized by the argument expression. The parameter becomes a name (an alias) for the data of the argument expression.
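
As a small illustration of that aliasing, in the following sketch (the function name set_to_one is made up for this example) the parameter is a non-const reference, so assigning to the parameter modifies the caller's variable:

#include <iostream>

using namespace std;

// The parameter "x" is just another name for the argument.
void set_to_one(int &x)
{
  x = 1;
}

int main()
{
  int i = 0;
  set_to_one(i);
  // Prints 1: the function modified "i" through the reference.
  cout << i << endl;
}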

This example shows how we pass arguments by value and by reference. Compile the example with the flag -fno-elide-constructors.

#include <iostream>
#include <string>

using namespace std;

struct A
{
  string m_name;

  A(const string &name): m_name(name)
  {
    cout << "ctor: " << m_name << endl;
  }

  A(const A &a)
  {
    cout << "copy-ctor: " << a.m_name << endl;
    m_name = a.m_name + " copy";
  }

  void hello() const
  {
    cout << "Hello from " << m_name << endl;
  }
};

void foo(A a)
{
  a.hello();
}

void goo(const A &a)
{
  a.hello();
}

int main()
{
  foo(A("foo"));
  goo(A("goo"));
}

Returning values

A function can return a result either by value or reference.

If the return type is of a non-reference type, we say that a function returns the result by value. In the deep past (before C++ was standardized) that always entailed copying the result (i.e., the data local to the function) from one location on the stack to a temporary on the stack, and then to its final location, e.g., a variable.

If the return type is of a reference type, we say that a function returns the result by reference. The reference should be bound to data that will still exist when the function returns (i.e., the data should outlive the function call). Containers (e.g., std::vector) return references to their dynamically-allocated data from functions such as operator[] and front.
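
For instance, in this minimal sketch we assign through the references returned by front and operator[] of std::vector, and so we modify the elements stored in the dynamically-allocated memory of the vector:

#include <iostream>
#include <vector>

using namespace std;

int main()
{
  vector<int> v = {1, 2, 3};

  // "front" returns a reference to the first element, so we can
  // assign to it.
  v.front() = 10;

  // "operator[]" also returns a reference.
  v[1] = 20;

  // Prints: 10 20 3
  cout << v[0] << " " << v[1] << " " << v[2] << endl;
}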

This example shows how to return results by value and by reference. Compile the example with the flag -fno-elide-constructors.

#include <iostream>
#include <string>

using namespace std;

struct A
{
  string m_name;

  A(const string &name): m_name(name)
  {
    cout << "ctor: " << m_name << endl;
  }

  A(const A &a)
  {
    cout << "copy-ctor: " << a.m_name << endl;
    m_name = a.m_name + " copy";
  }

  void hello() const
  {
    cout << "Hello from " << m_name << endl;
  }
};

A foo()
{
  A a("foo");
  return a;
}

A & goo()
{
  static A a("goo");
  return a;
}

int main()
{
  foo().hello();
  goo().hello();

  A a = foo();
  a.hello();
}

Function call convention

The technical details of how exactly a function is called are known as the call convention, which depends on the system architecture, the operating system, and the compiler. C++ does not specify a call convention, but some C++ functionality (like constructor elision and the return value optimization) follows from a typical call convention.

Typically, a call convention requires that the caller of the function (i.e., the code that calls the function) prepares the arguments and the memory for the return value before making the call.

Small data may be passed or returned in processor registers. For instance, if a function returns an integer, the return value can be returned in a register, e.g., EAX for x86, Linux, and GCC.

Legacy call conventions required the memory for the return value to be the last data on the stack before a function was called, so that it could be located with the stack pointer register. This, however, entailed copying the return value from that temporary location (the last on the stack) to its final destination, e.g., a local variable.

Modern call conventions allow the memory for the return value to be allocated anywhere in memory (on the stack, on the heap, or in the fixed-size memory for the static and global data), and the address to be passed to the function in a processor register (e.g., RDI for x86-64, Linux, and GCC), so that the function can create the return value in the pointed-to location.

The following example demonstrates that the return value can be created anywhere (as the modern call convention allows), and not only on the stack (as the legacy call convention stipulated). In the example a function returns an object which is created directly in the memory location for global and static data, without copying the object from the stack as the legacy call convention would require.

#include <iostream>

using namespace std;

struct A
{
  A()
  {
    cout << "default-ctor" << endl;
  }

  A(const A &a)
  {
    cout << "copy-ctor" << endl;
  }

  ~A()
  {
    cout << "dtor" << endl;
  }
};

A foo()
{
  return A();
}

A a = foo();

int main()
{
}

Constructor elision

C++ elides (avoids) constructors (specifically, two constructors: the copy constructor, and the move constructor) for temporary or local objects that would soon be destroyed. Instead of creating a temporary, the object is created in the final location where it would end up.

This example demonstrates constructor elision. Compile the example first with, then without, the flag -fno-elide-constructors. Notice the differences at run time.

#include <iostream>
#include <string>

using namespace std;

struct A
{
  string m_name;

  A()
  {
    cout << "default-ctor" << endl;
  }

  A(const string &name): m_name(name)
  {
    cout << "direct-ctor: " << m_name << endl;
  }

  A(const A &a): m_name(a.m_name)
  {
    cout << "copy-ctor: " << m_name << endl;
  }

  ~A()
  {
    cout << "dtor: " << m_name << endl;
  }
};

int main()
{
  // That's a function declaration (the most vexing parse), not a
  // default initialization of an object "foo".
  A foo();

  // The equivalent ways of default initialization.
  {
    A a;
    A b{};
    A c = A();
    A d = A{};

    // Acceptable and interesting, but we don't code like that.
    A e = A(A());
    A f = A{A{}};
  }

  // The equivalent ways of direct (with arguments) initialization.
  {
    A a("a");
    A b{"b"};
    A c = A("c");
    A d = A{"d"};

    // Acceptable and interesting, but we don't code like that.
    A e = A(A("e"));
    A f = A{A{"f"}};
  }
}

Compile the previous examples of passing arguments to and returning results from functions but without disabling the constructor elision. Notice that with constructor elision, objects are not copied unnecessarily.

When a temporary is passed by value as an argument, that temporary is created directly (i.e., with the constructor elided) in the location of the function parameter.

Return value optimization

When a result is returned by value from a function, it can be created directly (i.e., with the constructor elided) in the location for the return value. This is known as the return value optimization (RVO).
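
Here is a minimal sketch of RVO at work (compile it without the flag -fno-elide-constructors): the local object is created directly in the location of "a" in main, so no copy constructor is called even though the function returns by value.

#include <iostream>

using namespace std;

struct A
{
  A()
  {
    cout << "default-ctor" << endl;
  }

  A(const A &a)
  {
    cout << "copy-ctor" << endl;
  }

  ~A()
  {
    cout << "dtor" << endl;
  }
};

A foo()
{
  // Created directly in the location for the return value, which in
  // turn is the location of "a" in main.
  return A();
}

int main()
{
  A a = foo();
  // With RVO the output is just "default-ctor" followed by "dtor";
  // "copy-ctor" never appears.
}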

RVO cannot always take place, for technical reasons. First, when the data to return has to be created before we know which data exactly will be returned:

#include <iostream>
#include <string>

using namespace std;

struct A
{
  A()
  {
    cout << "default-ctor" << endl;
  }

  A(const A &a)
  {
    cout << "copy-ctor" << endl;
  }

  ~A()
  {
    cout << "dtor" << endl;
  }
};

A foo(bool flag)
{
  // These objects have to be created, and since we don't know which
  // is going to be returned, both of them have to be created locally.
  A a, b;

  // The returned value must be copied.
  return flag ? a : b;
}

int main()
{
  foo(true);
}

Second, when we return a function parameter: the parameter was created by the caller, not by the function, and so the function could not have created it in the location for the return value:

#include <iostream>
#include <string>

using namespace std;

struct A
{
  A()
  {
    cout << "default-ctor" << endl;
  }

  A(const A &a)
  {
    cout << "copy-ctor" << endl;
  }

  ~A()
  {
    cout << "dtor" << endl;
  }
};

A foo(A a)
{
  return a;
}

int main()
{
  foo(A());
}

Finally, when we return static or global data, which has to remain available after the function returns, and so the function can only copy the result from the static or global data:

#include <iostream>
#include <string>

using namespace std;

struct A
{
  A()
  {
    cout << "default-ctor" << endl;
  }

  A(const A &a)
  {
    cout << "copy-ctor" << endl;
  }

  ~A()
  {
    cout << "dtor" << endl;
  }
};

// Global data.
A a;

A foo()
{
  // This one shadows the global "a".
  static A a;
  return a;
}

A goo()
{
  return a;
}

int main()
{
  foo();
  goo();
}

Conclusion

Data can be allocated statically, globally, locally or dynamically.

Allocating memory for local data (on the stack) is ultra fast, while allocating memory for dynamic data (on the heap) is much slower.

Don’t use dynamically-allocated data if local data is good enough.

Passing arguments or returning results by value is not that bad, because most likely the copying or moving will be elided.