cpp

Introduction

These are the most important facts about references:

The main uses of references:

We say that a reference binds to data, which means roughly the same as “points to”, except that “points to” is reserved for pointers.

A reference binds to the data of an lvalue or an rvalue, but for short we say that a reference binds to an lvalue or an rvalue.

C++ references are unlike references in other languages: in C++, a reference might not exist at run-time, because it was optimized out at compile-time.

In languages like Java or C#, references are pointers with shared-ownership semantics (i.e., a reference can be copied, and the object exists as long as at least one reference exists), and with the object member selection syntax. In these languages references must exist at run-time.
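For comparison only, the behavior of a Java or C# reference can be roughly approximated in C++ with std::shared_ptr. This is a sketch under assumptions: Counter and shared_ownership_demo are hypothetical names, and Java uses garbage collection rather than reference counting, so the analogy covers the shared-ownership semantics, not the mechanism.

```cpp
#include <cassert>
#include <memory>

struct Counter // Hypothetical type, used only for illustration.
{
  int value = 0;
};

inline long
shared_ownership_demo()
{
  // Roughly what "Counter a = new Counter();" does in Java.
  std::shared_ptr<Counter> a = std::make_shared<Counter>();

  // Copying the handle shares ownership; no new object is created.
  std::shared_ptr<Counter> b = a;

  // Member selection goes through the handle.
  b->value = 5;
  assert(a->value == 5);

  // The object survives, because b still owns it.
  a.reset();
  assert(b->value == 5);

  // The number of owners that are left.
  return b.use_count();
}
```

Unlike a C++ reference, the handle here is a real object that exists at run-time and can be copied and reset.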

As an example of references being optimized out at compile-time, the two programs below produce the same output, but the second one uses references. After compilation, however, the references are gone.

Save this file as test1.cc:

#include <iostream>

int
main()
{
  int x = 1;
  double y = .2;
  std::cout << x << y << std::endl;
}

Save this file as test2.cc:

#include <iostream>
#include <utility>

int
main()
{
  int x = 1;
  double y = .2;
  std::pair<int &, double &> p{x, y};
  std::cout << p.first << p.second << std::endl;
}

Now compile them to the assembly code with:

g++ -S -O3 test1.cc test2.cc

Now there are two files with the assembly code: test1.s, and test2.s. Take a look at one of them:

c++filt < test1.s | less

Compare them to see that they are the same, instruction for instruction (the .file directive, which records the source file name, is expected to differ):

diff test1.s test2.s

Reference types

There are three reference types: the lvalue reference (T &), the const reference (const T &), and the rvalue reference (T &&).

Terms lvalue and rvalue in type names

Expressions are called an lvalue or an rvalue, e.g.:

These terms are also used to name a reference type:

NOW I GET IT: Even if the variable’s type is an rvalue reference, the expression consisting of its name is an lvalue expression.
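The difference between the type of a variable and the category of the expression naming it can be checked at compile-time. The sketch below relies on the fact that decltype((e)), with double parentheses, yields T & when e is an lvalue, while decltype(r), without them, yields the declared type of r. The function name categories_hold is hypothetical.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

inline bool
categories_hold()
{
  int x = 1;
  int &&r = 2;

  // The expression "x" is an lvalue.
  static_assert(std::is_lvalue_reference<decltype((x))>::value, "");

  // The expression "x + 1" is an rvalue (a prvalue, to be precise).
  static_assert(!std::is_reference<decltype((x + 1))>::value, "");

  // The declared type of r is an rvalue reference...
  static_assert(std::is_rvalue_reference<decltype(r)>::value, "");

  // ... but the expression "r" is an lvalue.
  static_assert(std::is_lvalue_reference<decltype((r))>::value, "");

  // std::move(r) is an rvalue again (an xvalue, to be precise).
  static_assert(
    std::is_rvalue_reference<decltype((std::move(r)))>::value, "");

  (void)x;
  (void)r;
  return true;
}
```

All checks are static_assert declarations, so the example proves its point by compiling.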

Lvalue reference

An lvalue reference can bind to an lvalue, but not to an rvalue.

We define an lvalue reference like this:

T &name = <expr>;

Reference name binds to data of type T. & is called the lvalue reference declarator. The reference is initialized with expression <expr>.

Here are some examples:

#include <cassert>
#include <iostream>

int &
foo()
{
  static int x = 1;
  return x;
}

int
main()
{
  int x = 1;

  // Can initialize an lvalue reference with an lvalue.
  int &xr = x;

  // Error: needs initialization.
  // int &a;

  // OK: foo() is an lvalue.
  int &fr = foo();

  // A const.
  const int c = 300000000;

  // Error: an lvalue reference to non-const data cannot bind to a const.
  // int &r = c;

  // Error: an lvalue reference cannot bind to an rvalue.
  // int &ncr = 1;

  // Initialized alright.
  int &z = x, y = 2;

  // The same as above, but this time we know what "&" applies to.
  // int y = 2, &z = x;

  // This placement of "&" is confusing: does "&" apply to "y" too?
  // int& z = x, y = 2;

  // IMPORTANT: Now z is an alias of x.  Whenever you see z, just
  // replace it with x to understand the code below.
  
  // A reference has no address.  &z is the address of x.
  assert (&z == &x);

  // There's no "reinitialization"!  It's an assignment to x.
  z = y;
  // Now x has 2.
  std::cout << "x = " << x << std::endl;

  // Initialize an lvalue reference with an lvalue reference.
  int &zz = z;
  // The above has the same effect as this one.
  int &zx = x;
}

Here are some examples for containers and arrays:

#include <array>
#include <vector>

int main()
{
  int x, y;

  // Containers can store pointers.
  std::vector<int *> v = {&x, &y};
  // But not references.
  // std::vector<int &> v;

  // Array can store pointers.
  int *a[] = {&x, &y};
  // But not references.
  // int &r[] = {x, y};

  // std::array can store pointers.
  std::array<int *, 2> b = {&x, &y};
  // But not references.
  // std::array<int &, 2> c = {x, y};
}
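When reference-like elements are needed in a container, the standard library offers std::reference_wrapper, an ordinary copyable object that converts implicitly to T &. The sketch below is a minimal illustration; the function name reference_wrapper_demo is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <vector>

inline int
reference_wrapper_demo()
{
  int x = 1, y = 2;

  // std::ref(x) creates a std::reference_wrapper<int> bound to x.
  std::vector<std::reference_wrapper<int>> v = {std::ref(x),
                                                std::ref(y)};

  // get() returns int &, so this writes to x.
  v[0].get() = 10;

  // The wrapper converts implicitly to int &, so this binds to y.
  int &ry = v[1];
  ry = 20;

  return x + y;
}
```

Unlike a reference, the wrapper can be assigned, which is what containers require of their elements.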

Here are some examples for std::pair and std::tuple:

#include <iostream>
#include <utility>
#include <tuple>
#include <vector>

// This example demonstrates that, unlike containers, std::pair and
// std::tuple can have elements of a reference type.  std::pair,
// std::tuple, and std::array are quasi-containers, because they have
// some container functionality, but cannot change size at run-time as
// containers do.
int main()
{
  // Error: the pair elements of reference type must be initialized.
  // std::pair<int &, int &> p2;

  int x = 1;

  std::pair<int &, int &> p(x, x);
  p.second = 2;

  std::cout << "x = " << x << std::endl;

  // The tuple elements of reference type must be initialized.
  std::tuple<int &, int &, int &> t(x, x, x);
  std::get<2>(t) = 3;

  std::cout << "x = " << x << std::endl;

  // Interestingly, we cannot have a vector of references, but we can
  // have a vector of reference pairs.
  std::vector<std::pair<int &, int &> > v{p, p};
  v[1].second = 4;

  std::cout << "v[0].first = " << v[0].first << std::endl;
  std::cout << "x = " << x << std::endl;
}

Const reference

A const reference can bind not only to an lvalue, but to an rvalue too. This rule was introduced in C++98 to allow for binding a function parameter reference to a temporary.

We define a const reference like this:

const T &name = <expr>;

This is exactly an lvalue reference that binds to the const data of type T, i.e., the const qualifier refers to the type of data the reference binds to. The reference itself is not really const, because we can’t change what the reference is bound to anyway. Nonetheless, it’s called the const reference for short; no need to say it’s an lvalue reference to const data.

Here are some examples:

#include <string>

using namespace std;

// The function parameter is a const reference.
int
foo(const string &)
{
  return 0;
}

int
main()
{
  int x = 1;
  // The const reference binds to an lvalue.  The type of the
  // initializing expression `x` is automatically augmented with the
  // const qualifier.
  const int &l1 = x;

  // Error: an lvalue reference cannot bind to an rvalue.
  // int &l2 = 1;
  // The const reference binds to an rvalue.
  const int &l3 = 1;

  // Expression "foo()" is an rvalue, because the function returns the
  // result by value.
  const int &c = foo("Hello!");
  // Error: we cannot take the address of the returned value.
  // const int *p = &foo("Hello!");
  // But we can take the address of the temporary to which reference c
  // is bound.  The reference extends the lifetime of the temporary.
  const int *p = &c;
  
  string s;
  // The function parameter reference binds to an lvalue.
  foo(s);
  // Since C++98, a const reference can bind to the data of an rvalue.
  // Here the function parameter reference binds to an rvalue, which
  // is the expression with a temporary of type string created with
  // the constructor taking a "const char *".
  foo("Hello!");
}
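That the const qualifier applies to the data as seen through the reference, and not to the object itself, can be sketched as follows. The function name const_view_demo is hypothetical.

```cpp
#include <cassert>

inline int
const_view_demo()
{
  int x = 1;

  // A read-only view of x.
  const int &cr = x;

  // Error: cannot modify x through a const reference.
  // cr = 2;

  // x itself is not const, so we can still modify it directly.
  x = 2;

  // The change is visible through cr, because cr is an alias of x.
  assert(cr == 2);
  assert(&cr == &x);

  return cr;
}
```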

Rvalue reference

An rvalue reference can bind to an rvalue, but not to an lvalue.

We define an rvalue reference like this:

T &&name = <expr>;

&& is called the rvalue reference declarator.

The rvalue reference was introduced in C++11 to enable move semantics and perfect forwarding.

Here are some examples:

#include <iostream>

int main()
{
  // Error: an rvalue reference must be initialized.
  // int &&a;

  int i = 1;

  // Error: an rvalue reference cannot bind to an lvalue.
  // int &&z = i;

  int &l = i;
  // Error: l is an lvalue, and an rvalue reference can't bind to it.
  // int &&r = l;

  // OK: an rvalue reference can bind to an rvalue.
  int &&x = 1;

  // Error: x is an lvalue, and an rvalue reference can't bind to it.
  // x is an lvalue, because it has a name, even though it's of the
  // rvalue reference type.
  // int &&z = x;
}
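One practical difference from a const reference is that the data bound by an rvalue reference can be modified. A minimal sketch, with the hypothetical function name modify_temporary:

```cpp
#include <cassert>

inline int
modify_temporary()
{
  // Both references bind to a temporary of type int.
  const int &c = 1;
  int &&r = 1;

  // Error: the data is const when accessed through c.
  // c += 41;

  // OK: the data is not const when accessed through r.
  r += 41;

  (void)c;
  return r;
}
```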

A reference cannot rebind

No reference (not only the const reference) can rebind to a new expression. A reference can only be initialized, i.e., bound to an expression, once. Such rebinding would be required in the assignment operator of a class type that has a reference member field, which is why the implicit assignment operators of such a class are deleted, as in this example:

#include <utility>

struct A
{
  int &m_i;

  A(int &i): m_i(i)
  {
  }
};

int
main()
{
  int i;
  A a(i), b(i);

  // These would not compile:
  // a = b;
  // a = std::move(b);
}
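The missing assignment operators can be confirmed with type traits. The sketch below also shows a common alternative, a pointer member, which can be reseated. The type names WithReference and WithPointer and the function name are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

struct WithReference
{
  int &m_i;
  explicit WithReference(int &i) : m_i(i) {}
};

struct WithPointer
{
  int *m_i;
  explicit WithPointer(int &i) : m_i(&i) {}
};

// A reference member deletes the implicit assignment operators.
static_assert(!std::is_copy_assignable<WithReference>::value, "");
static_assert(!std::is_move_assignable<WithReference>::value, "");

// Copy construction still works, because it only initializes.
static_assert(std::is_copy_constructible<WithReference>::value, "");

// A pointer can be reseated, so assignment is available.
static_assert(std::is_copy_assignable<WithPointer>::value, "");

inline bool
pointer_member_rebinds()
{
  int i = 1, j = 2;
  WithPointer a(i), b(j);

  // Now a points to j.
  a = b;

  return a.m_i == &j;
}
```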

Reference tricks

Reference type and function overload resolution

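A function can be overloaded depending on the parameter types, and this applies to references too. We can have these overloads: void foo(int &) (overload #1), void foo(const int &) (overload #2), and void foo(int &&) (overload #3).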

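For a call expression foo(<expr>), a compiler will choose (which is called overload resolution): overload #1 if <expr> is an lvalue of a non-const type, overload #2 if <expr> is an lvalue of a const type, and overload #3 if <expr> is an rvalue.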

A const reference (used in overload #2) can also bind to an lvalue of a non-const type or to an rvalue, so when overload #1 or #3 is missing, a compiler will choose overload #2 in its place.

Here’s a complete example:

#include <iostream>

using namespace std;

// Uncomment, and analyze the difference in overload resolution.
//
// // Overload #1
// void
// foo(int &)
// {
//   cout << "int &" << endl;
// }

// Overload #2
void
foo(const int &)
{
  cout << "const int &" << endl;
}

// Uncomment, and analyze the difference in overload resolution.
//
// // Overload #3
// void
// foo(int &&)
// {
//   cout << "int &&" << endl;
// }

int
main()
{
  int x = 1;
  int &rx = x;
  foo(x); // Overload #1 if available, overload #2 otherwise.
  foo(rx); // Overload #1 if available, overload #2 otherwise.

  const int y = 2;
  const int &ry = y;
  foo(y); // Always overload #2; an error if it is unavailable.
  foo(ry); // Always overload #2; an error if it is unavailable.

  int &&rz = 3;
  foo(3); // Overload #3 if available, overload #2 otherwise.
  foo(rz); // Overload #1 if available, overload #2 otherwise.
}
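With all three overloads present, the choices can be checked without reading the output. This is a sketch only: the enumeration Chosen and the function names pick and check_overloads are hypothetical, and pick plays the role of foo.

```cpp
#include <cassert>

// Names the overload that was chosen.
enum class Chosen { lvalue_ref, const_ref, rvalue_ref };

// Overload #1
inline Chosen pick(int &) { return Chosen::lvalue_ref; }
// Overload #2
inline Chosen pick(const int &) { return Chosen::const_ref; }
// Overload #3
inline Chosen pick(int &&) { return Chosen::rvalue_ref; }

inline void
check_overloads()
{
  int x = 1;
  const int y = 2;
  int &&rz = 3;

  // An lvalue of a non-const type.
  assert(pick(x) == Chosen::lvalue_ref);
  // An lvalue of a const type.
  assert(pick(y) == Chosen::const_ref);
  // An rvalue.
  assert(pick(3) == Chosen::rvalue_ref);
  // The expression "rz" is an lvalue, even though the type of rz is
  // an rvalue reference.
  assert(pick(rz) == Chosen::lvalue_ref);
}
```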

Rvalue reference to an lvalue

We can explicitly get an rvalue reference to an lvalue with static_cast<T &&>(<expr>), where <expr> can be an lvalue or an rvalue. This is, however, a bit wordy, since we have to type in the type T.

It’s easier to get an rvalue reference with std::move(<expr>), where <expr> can be an lvalue or an rvalue. std::move is a function template: a compiler will deduce the type T based on <expr>, so we don’t have to type it in. That function uses static_cast<T &&>(<expr>).

Here’s an example:

#include <utility>

class A {};

int main()
{
  A a;

  // How come an lvalue is not converted to an rvalue with the
  // standard conversion?  Because this rule does not apply to
  // reference initialization.  The standard says that an rvalue
  // reference must be initialized with an rvalue.
  A &&r1 = static_cast<A &&>(a);
  A &&r2 = std::move(a);
}
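How a function template like std::move can wrap the cast is sketched below. This is a hypothetical re-implementation for illustration (my_move, Widget, and my_move_yields_rvalue are made-up names), and real code should call std::move. Note that the template parameter T is deduced as a reference type for an lvalue argument, so the sketch strips the reference with std::remove_reference before casting.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// T && is a forwarding reference here: it accepts lvalues and
// rvalues alike.
template <typename T>
constexpr typename std::remove_reference<T>::type &&
my_move(T &&t) noexcept
{
  return static_cast<typename std::remove_reference<T>::type &&>(t);
}

struct Widget {};

inline bool
my_move_yields_rvalue()
{
  Widget w;

  // The result is Widget && for an lvalue argument...
  static_assert(
    std::is_same<decltype(my_move(w)), Widget &&>::value, "");
  // ... and for an rvalue argument.
  static_assert(
    std::is_same<decltype(my_move(Widget())), Widget &&>::value, "");

  // Nothing is moved or copied: r is just bound to w.
  Widget &&r = my_move(w);
  return &r == &w;
}
```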

The use case

I can think of one use case only. We use std::move(x) to explicitly enable the move semantics for object x (i.e., we turn x from an lvalue into an rvalue), which by default would not have the move semantics enabled, because the expression x is an lvalue. We enable the move semantics by making the compiler choose a different overload depending on the value category of the expression.
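The use case can be sketched with a type that records whether it was moved from. The names Tracker and move_is_opt_in are hypothetical.

```cpp
#include <cassert>
#include <utility>

struct Tracker
{
  bool moved_from = false;

  Tracker() = default;

  // The copy constructor leaves the source untouched.
  Tracker(const Tracker &) {}

  // The move constructor marks the source as moved from.
  Tracker(Tracker &&other) { other.moved_from = true; }
};

inline bool
move_is_opt_in()
{
  Tracker a;

  // The expression "a" is an lvalue: the copy constructor is chosen.
  Tracker b(a);
  bool after_copy = a.moved_from;

  // std::move(a) is an rvalue: the move constructor is chosen.
  Tracker c(std::move(a));
  bool after_move = a.moved_from;

  (void)b;
  (void)c;
  return !after_copy && after_move;
}
```

std::move itself moves nothing; it only changes which constructor overload resolution picks.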

A temporary lifetime extension by reference

The lifetime of a temporary is extended by the reference that binds to it. The temporary will be destroyed when the reference goes out of scope. Without the reference, the temporary would be destroyed at the end of the full expression that created it.

#include <iostream>
#include <string>

using namespace std;

struct A
{
  string m_name;

  A(const string &name): m_name(name)
  {
    cout << "ctor: " << m_name << endl;
  }

  ~A()
  {
    cout << "dtor: " << m_name << endl;
  }
};

int main()
{
  A a = A("a");

  {
    const A &r1 = a;
  }

  {
    const A &r2 = A("r2");

    {
      const A &R2 = r2;
    }

    cout << "Just checking." << endl;
  }

  {
    A &&r3 = A("r3");
  }
}
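The order of construction and destruction can also be recorded instead of printed, which makes the lifetime extension easy to verify. This is a sketch with hypothetical names (events, Noisy, lifetime_demo).

```cpp
#include <cassert>
#include <string>
#include <vector>

// A log of events, used here instead of printing.
inline std::vector<std::string> &
events()
{
  static std::vector<std::string> log;
  return log;
}

struct Noisy
{
  std::string m_name;

  explicit Noisy(const std::string &name) : m_name(name)
  {
    events().push_back("ctor " + m_name);
  }

  ~Noisy()
  {
    events().push_back("dtor " + m_name);
  }
};

inline std::vector<std::string>
lifetime_demo()
{
  events().clear();

  // No reference: destroyed at the end of this statement.
  Noisy("plain");
  events().push_back("after plain");

  {
    // The reference keeps the temporary alive until the end of scope.
    const Noisy &r = Noisy("bound");
    (void)r;
    events().push_back("in scope");
  }

  events().push_back("after scope");
  return events();
}
```

The returned log should read: ctor plain, dtor plain, after plain, ctor bound, in scope, dtor bound, after scope.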

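We can even make a member reference bind to a temporary with a default member initializer. The temporary is meant to be destroyed when the object is destroyed, but treat this example with caution: the standard now makes it ill-formed for a constructor to use a default member initializer that binds a reference member to a temporary, so the result depends on the compiler: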

#include <iostream>

struct A
{
  ~A()
  {
    std::cout << "dtor of A\n";
  }
};

struct B
{
  const A &a = A();
};

int
main()
{
  B();
}

Conclusion