Introduction

These are the most important facts about references:

A reference is an alias (a name) for some data (a variable, an object, a temporary).
A reference has no identity of its own: taking its address yields the address of the data it refers to.
When using a reference to an object, we use the object member selection syntax (i.e., object.member), and not the object member selection through a pointer syntax (i.e., pointer->member).
A reference must be initialized, and so there are no null references like there can be null pointers.
Unlike a pointer, a reference cannot be changed to be an alias for some other data.
A reference type cannot have top-level const and volatile type qualifiers.
There is a reference to a pointer, but not a pointer to a reference.
A reference can be an element of std::pair and std::tuple, but not of a container or an array.
There is type void *, but not void & (thank goodness).

The main uses of references:

passing an argument to a function by reference,
returning a result from a function by reference,
referencing data in an object member field,
accessing data read-only (a const reference).
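
The four uses listed above can be sketched in one short listing. All names in it (increment, counter, Wrapper, length, demo) are made up for illustration; this is only a minimal sketch, not code from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// 1. Passing an argument by reference: the function modifies the
// caller's data.
void increment(int &n)
{
  ++n;
}

// 2. Returning a result by reference: the caller gets an alias of the
// static counter, not a copy of it.
int &counter()
{
  static int c = 0;
  return c;
}

// 3. Referencing data in an object member field.
struct Wrapper
{
  int &m_ref;

  explicit Wrapper(int &r): m_ref(r)
  {
  }
};

// 4. Accessing data read-only through a const reference.
std::size_t length(const std::string &s)
{
  return s.size();
}

// Exercises the four uses and returns counter() + length("abc").
int demo()
{
  int x = 1;
  increment(x);    // x is now 2
  counter() = x;   // assigns 2 to the static counter
  Wrapper w(x);
  w.m_ref = 5;     // modifies x through the member reference
  assert(x == 5);
  return counter() + static_cast<int>(length("abc"));
}
```

Calling demo() from main should yield 5: the counter holds 2, and the string has 3 characters.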

We say that a reference binds to data, which means something like points to, though “points to” is used when talking about pointers.

A reference binds to the data of an lvalue, or an rvalue, but in short we say that a reference binds to an lvalue or an rvalue.

C++ references are unlike the references of other languages: in C++ a reference might not exist at run-time, because it was optimized out at compile-time.

In languages like Java or C#, references are pointers with the shared-ownership semantics (i.e., a reference can be copied, and the object exists as long as at least one reference exists), and with the object member selection syntax. In these languages references must exist at run-time.
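
As a rough analogy only, a Java or C# reference behaves a bit like std::shared_ptr in C++, while a C++ reference is a mere alias that owns nothing. The sketch below assumes a made-up Account type. Java and C# use garbage collection rather than reference counting, so the owner count is just a stand-in for "the object lives as long as someone refers to it".

```cpp
#include <cassert>
#include <memory>

// A hypothetical type standing in for a Java or C# object.
struct Account
{
  int balance = 0;
};

// Returns the number of owners while two shared "references" exist.
long shared_owners()
{
  // Roughly what "Account a = new Account();" means in Java.
  std::shared_ptr<Account> a = std::make_shared<Account>();
  // Copying the "reference" adds an owner; the object is not copied.
  std::shared_ptr<Account> b = a;
  b->balance = 10;            // pointer syntax here, b.balance in Java
  assert(a->balance == 10);   // both names refer to the same object
  return a.use_count();
}

// Returns the number of owners while a C++ reference exists.
long owners_with_cpp_reference()
{
  std::shared_ptr<Account> a = std::make_shared<Account>();
  Account &r = *a;            // a C++ reference: an alias, no ownership
  r.balance = 1;              // object member selection syntax
  return a.use_count();
}
```

Here shared_owners() should return 2, and owners_with_cpp_reference() should return 1, because the C++ reference does not take part in ownership.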

As an example that references are optimized out at compile-time, there are two programs below that produce the same output, but in the second one we use references. However, after compilation, the references are gone.

Save this file as test1.cc:

#include <iostream>

int
main()
{
  int x = 1;
  double y = .2;
  std::cout << x << y << std::endl;
}

Save this file as test2.cc:

#include <iostream>
#include <utility>

int
main()
{
  int x = 1;
  double y = .2;
  std::pair<int &, double &> p{x, y};
  std::cout << p.first << p.second << std::endl;
}

Now compile them to the assembly code with:

g++ -S -O3 test1.cc test2.cc

Now there are two files with the assembly code: test1.s, and test2.s. Take a look at one of them:

c++filt < test1.s | less

Compare them to see that they are instruction for instruction the same (lines of metadata, such as the .file directive with the source file name, will differ):

diff test1.s test2.s

Reference types

There are three reference types:

T & - an lvalue reference: binds to data that we can modify, but not move (because the data will still be needed),
const T & - a const reference: binds to data that we can neither modify nor move,
T && - an rvalue reference: binds to data that we can both modify and move (because the data soon will not be needed).

We use the reference types only when declaring the type of a variable, of a function parameter, or of a function return value. An expression is never of a reference type, because a reference is replaced with the data the reference refers to. [expr.type]

Terms lvalue and rvalue in type names

An expression is called an lvalue or an rvalue, e.g.:

"1" is an lvalue,
1 is an rvalue.

These terms are also used to name a reference type:

int &x = <expr>; - variable x is of the lvalue reference type, and expression x is of the lvalue category,
int &&x = <expr>; - variable x is of the rvalue reference type, but expression x is of the lvalue category.

NOW I GET IT: Even if the variable’s type is an rvalue reference, the expression consisting of its name is an lvalue expression.
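
The distinction between the declared type of a variable and the category of the expression naming it can be checked with decltype and with overloading. This is a minimal sketch; the helper is_lvalue and the function name are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Reports the category of the argument expression via overloading.
constexpr bool is_lvalue(int &) { return true; }
constexpr bool is_lvalue(int &&) { return false; }

bool name_of_rvalue_reference_is_lvalue()
{
  int &&x = 1;

  // decltype(x) without extra parentheses yields the declared type.
  static_assert(std::is_same<decltype(x), int &&>::value,
                "x is declared as an rvalue reference");
  // decltype((x)) inspects the expression x: an lvalue of type int is
  // reported as int &.
  static_assert(std::is_same<decltype((x)), int &>::value,
                "the expression x is an lvalue");

  // Overload resolution agrees: x selects the int & overload...
  assert(is_lvalue(x));
  // ... while std::move(x) selects the int && overload.
  return !is_lvalue(std::move(x));
}
```

The function should return true, and the static assertions already pass at compile-time.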

Lvalue reference

An lvalue reference can bind to an lvalue, but not to an rvalue.

We define an lvalue reference like this:

T &name = <expr>;

Reference name binds to data of type T. That thingy & is called the lvalue reference declarator. The reference is initialized with expression <expr>.

Here are some examples:

#include <cassert>
#include <iostream>

int &
foo()
{
  static int x = 1;
  return x;
}

int
main()
{
  int x = 1;

  // Can initialize an lvalue reference with an lvalue.
  int &xr = x;

  // Error: needs initialization.
  // int &a;

  // OK: foo() is an lvalue.
  int &fr = foo();

  // A const.
  const int c = 300000000;

  // Error: a non-const lvalue reference cannot bind to const data.
  // int &r = c;

  // Error: an lvalue reference cannot bind to an rvalue.
  // int &ncr = 1;

  // Initialized alright.
  int &z = x, y = 2;

  // The same as above, but this time we know what "&" applies to.
  // int y = 2, &z = x;

  // This placement of "&" is confusing: does "&" apply to "y" too?
  // int& z = x, y = 2;

  // IMPORTANT: Now z is an alias of x.  Whenever you see z, just
  // replace it with x to understand the code below.
  
  // A reference has no address.  &z is the address of x.
  assert (&z == &x);

  // There's no "reinitialization"!  It's an assignment to x.
  z = y;
  // Now x has 2.
  std::cout << "x = " << x << std::endl;

  // Initialize an lvalue reference with an expression that z is an
  // alias of.  Expression z is simply treated as x.
  int &zz = z;
  // Therefore the above has the same effect as this one.
  int &zx = x;
}

Here are some examples for containers and arrays:

#include <array>
#include <vector>

int main()
{
  int x, y;

  // Containers can store pointers.
  std::vector<int *> v = {&x, &y};
  // But not references.
  // std::vector<int &> v;

  // Array can store pointers.
  int *a[] = {&x, &y};
  // But not references.
  // int &r[] = {x, y};

  // std::array can store pointers.
  std::array<int *, 2> b = {&x, &y};
  // But not references.
  // std::array<int &, 2> c = {x, y};
}

Here are some examples for std::pair and std::tuple:

#include <iostream>
#include <utility>
#include <tuple>
#include <vector>

// This example demonstrates that, unlike containers, std::pair and
// std::tuple can have elements of a reference type.  std::pair,
// std::tuple, and std::array are quasi-containers, because they have
// some container functionality, but cannot change size at run-time as
// containers do.
int main()
{
  // Error: the pair elements of reference type must be initialized.
  // std::pair<int &, int &> p2;

  int x = 1;

  std::pair<int &, int &> p(x, x);
  p.second = 2;

  std::cout << "x = " << x << std::endl;

  // The tuple elements of reference type must be initialized.
  std::tuple<int &, int &, int &> t(x, x, x);
  std::get<2>(t) = 3;

  std::cout << "x = " << x << std::endl;

  // Interestingly, we cannot have a vector of references, but we can
  // have a vector of reference pairs.
  std::vector<std::pair<int &, int &> > v{p, p};
  v[1].second = 4;

  std::cout << "v[0].first = " << v[0].first << std::endl;
  std::cout << "x = " << x << std::endl;
}

Const reference

We define a const reference like this:

const T &name = <expr>;

Precisely, it is called an lvalue reference that binds to the const data of type T, i.e., the const qualifies the type of the data and not the reference. The reference itself is not really const because we can’t change the reference to bind to something else anyway. Nonetheless, it’s called the const reference for short; no need to say it’s an lvalue reference to const data.

The const reference was introduced so that the data can be referenced read-only. For instance, a function can accept an argument by const reference, thus ensuring that the argument will not be modified. Furthermore, in order to let a function accept a temporary (i.e., an rvalue) as an argument, C++98 stated:

A const reference can bind not only to an lvalue, but to an rvalue too.

Here are some examples:

#include <string>

using namespace std;

// The function parameter is a const reference.
int
foo(const string &)
{
  return 0;
}

int
main()
{
  // The data we bind references to.
  int x = 1;
  const int cx = 1;

  // The const reference binds to an lvalue.  The type of the
  // initializing expression `x` is automatically augmented with the
  // const qualifier.
  const int &l1 = x;

  // The reference to a const int is initialized with a const int.
  const int &cr = cx;

  // Error: an lvalue reference cannot bind to an rvalue.
  // int &l2 = 1;
  // The const reference binds to an rvalue.
  const int &l3 = 1;

  // Expression "foo()" is an rvalue, because the function returns the
  // result by value.
  const int &c = foo("Hello!");
  // Error: we cannot take the address of the returned value.
  // const int *p = &foo("Hello!");
  // But we can take the address of the temporary to which reference c
  // is bound.  The reference extends the lifetime of the temporary.
  const int *p = &c;
  
  string s;
  // The function parameter reference binds to an lvalue.
  foo(s);
  // Since C++98, a const reference can bind to the data of an rvalue.
  // Here the function parameter reference binds to an rvalue, which
  // is the expression with a temporary of type string created with
  // the constructor taking a "const char *".
  foo("Hello!");
}

Rvalue reference

An rvalue reference can bind to an rvalue, but not to an lvalue.

We define an rvalue reference like this:

T &&name = <expr>;

The rvalue reference declarator is &&.

The rvalue reference was introduced in C++11 to enable:

the move semantics,
the perfect forwarding of function arguments.

Here are some examples:

#include <iostream>

int main()
{
  // Error: an rvalue reference must be initialized.
  // int &&a;

  int i = 1;

  // Error: an rvalue reference cannot bind to an lvalue.
  // int &&z = i;

  int &l = i;
  // Error: l is an lvalue, and an rvalue reference can't bind to it.
  // int &&r = l;

  // OK: an rvalue reference can bind to an rvalue.
  int &&x = 1;

  // Error: x is an lvalue, and an rvalue reference can't bind to it.
  // x is an lvalue, because it has a name, even though it's of the
  // rvalue reference type.
  // int &&z = x;
}
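
The perfect forwarding mentioned above can be sketched as follows. The names sink, relay, relay_without_forward and check_forwarding are made up for illustration. The point is that a named parameter is always an lvalue, and std::forward restores the category of the original argument.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overloads that report which one was selected.
std::string sink(const std::string &) { return "copy"; }
std::string sink(std::string &&) { return "move"; }

// T && with a deduced T is a forwarding reference: it binds to both
// lvalues and rvalues, and std::forward restores the original category.
template <typename T>
std::string relay(T &&arg)
{
  return sink(std::forward<T>(arg));
}

// Without std::forward, arg is a named variable, hence always an lvalue.
template <typename T>
std::string relay_without_forward(T &&arg)
{
  return sink(arg);
}

void check_forwarding()
{
  std::string s = "lvalue";
  assert(relay(s) == "copy");
  assert(relay(std::string("rvalue")) == "move");
  assert(relay_without_forward(std::string("rvalue")) == "copy");
}
```

Calling check_forwarding() from main should pass all three assertions.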

A reference cannot rebind

No reference (const or not) can rebind to other data. A reference can only be initialized, i.e., bound to data once. Rebinding would be required in the assignment operator of a class type that has a reference member field, and so the compiler does not generate that operator, as in this example:

#include <utility>

struct A
{
  int &m_r;

  A(int &r): m_r(r)
  {
  }
};

int
main()
{
  int i, j;
  A a(i), b(j);

  // These would not compile:
  // a = b;
  // a = std::move(b);
}
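
The missing assignment operator can be confirmed with type traits. When rebinding is really needed, one common workaround is std::reference_wrapper from <functional>, which can be reassigned. The struct names below are made up, and the listing is only a sketch of that workaround.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// Same shape as struct A above: the reference member makes the
// implicit assignment operators deleted.
struct WithReference
{
  int &m_r;

  explicit WithReference(int &r): m_r(r)
  {
  }
};

static_assert(!std::is_copy_assignable<WithReference>::value,
              "copy assignment is not available");
static_assert(!std::is_move_assignable<WithReference>::value,
              "move assignment is not available");

// std::reference_wrapper is assignable, and so is a class holding it.
struct WithWrapper
{
  std::reference_wrapper<int> m_r;

  explicit WithWrapper(int &r): m_r(r)
  {
  }
};

static_assert(std::is_copy_assignable<WithWrapper>::value,
              "assignment rebinds the wrapper");

// Returns true if the assignment made a refer to j and left i alone.
bool rebinds_on_assignment()
{
  int i = 1, j = 2;
  WithWrapper a(i), b(j);
  a = b;                        // a.m_r now refers to j
  assert(&a.m_r.get() == &j);
  a.m_r.get() = 3;              // writes to j, not to i
  return i == 1 && j == 3;
}
```

Note that the assignment rebinds the wrapper and does not copy the integer, which is exactly what a plain reference member cannot do.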

Qualifiers

A pointer type can be cv-qualified, i.e., can have the const or volatile qualifiers, while a reference type cannot (i.e., can only be cv-unqualified).

A pointer example:

int main()
{
  int x = 1;
  const int cx = 2;

  // A pointer to an integer.
  int *p2i_1 = &x;
  // A pointer to an integer cannot point to a const integer, because
  // then we could modify the integer that we declare const.
  // int *p2i_2 = &cx;

  // A pointer to a const integer.
  const int *p2ci_1 = &x;
  const int *p2ci_2 = &cx;

  // A const pointer to an integer.
  int * const cp2i = &x;
  // We can modify the data pointed to by the pointer all right.
  *cp2i = 2;
  // But not the pointer itself.
  // cp2i = &x;

  // Const pointer to a const integer.
  const int * const cp2ci_1 = &x;
  const int * const cp2ci_2 = &cx;
  // We cannot modify the data:
  // *cp2ci_1 = 2;
  // Nor the pointer itself:
  // cp2ci_1 = &cx;

  // The same applies for the volatile qualifier.
  const volatile int * const volatile cvp2cvi = &x;
}

A reference example:

int main()
{
  int x = 1;
  const int cx = 2;

  // A reference to an integer.
  int &r2i_1 = x;
  // A reference to an integer cannot bind to a const integer, because
  // then we could modify the integer that we declare const.
  // int &r2i_2 = cx;

  // A reference to a const integer.
  const int &r2ci = x;
  // A reference to a volatile integer.
  volatile int &r2vi = x;
  // A reference to a const volatile integer.
  const volatile int &r2cvi = x;

  // A const-qualified reference type to an integer does not exist.
  // int & const cr2i = x;
  // Nor a const-qualified reference type to a const integer.
  // const int & const cr2ci = x;
  // Nor any other cv-qualified reference type.
  // const volatile int & const volatile cvr2cvi = x;
}

In the above example we used the qualifiers in reference declarations but not at the top level. A top-level qualifier for a reference type would be on the right of the & declarator but is disallowed there.
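
There is one corner case worth knowing: when a qualifier reaches a reference type through a type alias, it is ignored rather than rejected. The sketch below assumes a made-up alias IntRef, and some compilers may warn that the qualifier has no effect.

```cpp
#include <cassert>
#include <type_traits>

using IntRef = int &;  // hypothetical alias name

// const applied to the alias would be a top-level qualifier of a
// reference type, so it is dropped.
static_assert(std::is_same<const IntRef, int &>::value,
              "const IntRef is still int &");
// In particular, it does not become a reference to a const int.
static_assert(!std::is_same<const IntRef, const int &>::value,
              "const IntRef is not const int &");

// Returns the value of x after writing through a "const IntRef".
int write_through_const_alias()
{
  int x = 1;
  const IntRef r = x;  // r has type int &, the const is dropped
  r = 2;               // allowed: modifies x
  assert(&r == &x);
  return x;
}
```

The function should return 2, which shows that the const had no effect on the reference.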

A reference to a pointer, but not the other way around.

A reference to a pointer exists, but a pointer to a reference doesn’t.

#include <iostream>

int main()
{
  int x = 1;
  int *p = &x;
  // A reference to a pointer to an integer.
  int * & r2p = p;
  // A reference to a const pointer to an integer.
  int * const & r2cp = p;
  // We can modify the integer through the reference.
  *r2cp = 2;
  // We cannot modify the const pointer through the reference.
  // r2cp = &x;
  p = &x;
  std::cout << x << '\n';

  int &r = x;
  // A pointer to a reference does not exist.
  // int & * p2r = r;
}

Reference tricks

Reference type and function overload resolution

A function can be overloaded depending on the parameter types, and this applies to references too. We can have these overloads:

void foo(T &); - overload #1,
void foo(const T &); - overload #2,
void foo(T &&); - overload #3.

For a call expression foo(<expr>), a compiler will choose (which is called overload resolution):

overload #1, if <expr> is an lvalue of a non-const type,
overload #2, if <expr> is an lvalue of a const type,
overload #3, if <expr> is an rvalue.

A const reference (used in overload #2) can also bind to an lvalue of a non-const type or to an rvalue, so when overload #1 or #3 is missing, a compiler will choose overload #2 instead.

Here’s a complete example:

#include <iostream>

using namespace std;

// Uncomment, and analyze the difference in overload resolution.
//
// // Overload #1
// void
// foo(int &)
// {
//   cout << "int &" << endl;
// }

// Overload #2
void
foo(const int &)
{
  cout << "const int &" << endl;
}

// Uncomment, and analyze the difference in overload resolution.
//
// // Overload #3
// void
// foo(int &&)
// {
//   cout << "int &&" << endl;
// }

int
main()
{
  int x = 1;
  int &rx = x;
  foo(x); // Overload #1 if available, overload #2 otherwise.
  foo(rx); // Overload #1 if available, overload #2 otherwise.

  const int y = 2;
  const int &ry = y;
  foo(y); // Always overload #2, an error otherwise.
  foo(ry); // Always overload #2, an error otherwise.

  int &&rz = 3;
  foo(3); // Overload #3 if available, overload #2 otherwise.
  foo(rz); // Overload #1 if available, overload #2 otherwise.
}

Explicit conversion from an lvalue to an rvalue

We already know the standard conversion that implicitly converts an lvalue to an rvalue. However, that standard conversion is suppressed in the initialization of an rvalue reference: the initializing expression must be an rvalue ([dcl.init.ref]).

We can explicitly get an rvalue reference to an lvalue with static_cast<T &&>(<expr>), where <expr> can be an lvalue or an rvalue. This is, however, a bit wordy, since we have to type in the type T.

It’s easier to get an rvalue reference with std::move(<expr>), where <expr> can be an lvalue or an rvalue. std::move is a function template: a compiler will deduce the type based on <expr>, so we don’t have to type it in. In essence, that function performs static_cast<T &&>(<expr>), where T is the type of <expr> with any reference stripped.

Here’s an example:

#include <utility>

class A {};

int main()
{
  A a;

  A &&r1 = static_cast<A &&>(a);
  A &&r2 = std::move(a);
  // The standard lvalue-to-rvalue conversion is suppressed.
  // A &&r3 = a;
}
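
To see what such a function template might look like, here is a minimal sketch of a std::move look-alike. The name my_move is made up, and a real standard library implementation may differ in details.

```cpp
#include <cassert>
#include <type_traits>

// T is deduced from the argument.  std::remove_reference strips the &
// that deduction adds for lvalue arguments, so the cast always targets
// an rvalue reference.
template <typename T>
constexpr typename std::remove_reference<T>::type &&
my_move(T &&t) noexcept
{
  return static_cast<typename std::remove_reference<T>::type &&>(t);
}

struct A {};

// Returns true if the reference obtained with my_move refers to a.
bool my_move_yields_rvalue_reference()
{
  A a;

  static_assert(std::is_same<decltype(my_move(a)), A &&>::value,
                "an lvalue argument yields A &&");
  static_assert(std::is_same<decltype(my_move(A())), A &&>::value,
                "an rvalue argument yields A && too");

  A &&r = my_move(a);  // compiles, unlike: A &&r = a;
  assert(&r == &a);    // nothing was moved, r is just an alias of a
  return &r == &a;
}
```

The function should return true: the cast changes only the category of the expression, not the object.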

The use case

I can think of one use case only. We use std::move(x) to explicitly enable the move semantics for object x (i.e., we turn x from an lvalue to an rvalue), which by default would not have the move semantics enabled, because the expression x is an lvalue. We enable the move semantics by making the compiler choose a different overload depending on the category of the expression.
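
The choice of overload can be observed with a class that records which constructor built it. Tracked and the two functions below are made-up names for this sketch.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class that records which constructor built it.
struct Tracked
{
  std::string how;

  Tracked(): how("default") {}
  Tracked(const Tracked &): how("copy") {}      // chosen for lvalues
  Tracked(Tracked &&) noexcept: how("move") {}  // chosen for rvalues
};

std::string construct_from_lvalue()
{
  Tracked x;
  Tracked y(x);             // x is an lvalue: the copy constructor
  return y.how;
}

std::string construct_from_moved()
{
  Tracked x;
  assert(x.how == "default");
  Tracked y(std::move(x));  // std::move(x) is an rvalue: the move constructor
  return y.how;
}
```

Here construct_from_lvalue() should return "copy" and construct_from_moved() should return "move", even though both functions start from the same named object x.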

A temporary lifetime extension by reference

The lifetime of a temporary is extended by the reference that binds to it: the temporary is destroyed when the reference goes out of scope. Without the reference, the temporary would be destroyed at the end of the full expression that created it.

#include <iostream>
#include <string>

using namespace std;

struct A
{
  string m_name;

  A(const string &name): m_name(name)
  {
    cout << "ctor: " << m_name << endl;
  }

  ~A()
  {
    cout << "dtor: " << m_name << endl;
  }
};

int main()
{
  A a = A("a");

  {
    const A &r1 = a;
  }

  {
    const A &r2 = A("r2");

    {
      const A &R2 = r2;
    }

    cout << "Just checking." << endl;
  }

  {
    A &&r3 = A("r3");
  }
}
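
The same behavior can be checked without reading the output, by counting the objects that are alive. Counted is a made-up type for this sketch.

```cpp
#include <cassert>

// Hypothetical type that counts how many of its objects are alive.
struct Counted
{
  static int alive;

  Counted() { ++alive; }
  Counted(const Counted &) { ++alive; }
  ~Counted() { --alive; }
};

int Counted::alive = 0;

// Returns the number of Counted objects alive at the end.
int lifetime_follows_reference()
{
  // No reference: the temporary is destroyed at the end of the statement.
  static_cast<void>(Counted());
  assert(Counted::alive == 0);

  {
    const Counted &r = Counted();  // lifetime extended by r
    assert(Counted::alive == 1);
    Counted &&rr = Counted();      // an rvalue reference extends it too
    assert(Counted::alive == 2);
    static_cast<void>(r);
    static_cast<void>(rr);
  }                                // both temporaries are destroyed here

  return Counted::alive;
}
```

The function should return 0, since both temporaries are gone once their references go out of scope.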

A reference member field has to be initialized by a constructor, but it must not be bound to a temporary expression there [class.base.init#8]. The following example is ill-formed:

#include <iostream>

struct A
{
  ~A()
  {
    std::cout << "dtor of A\n";
  }
};

struct B
{
  // ACHTUNG!  These initializations are ill-formed! Does the compiler
  // complain?  Clang 11 does, but GCC 12 doesn't.
  const A &a = A();
  const int &i = 1;
};

int
main()
{
  const B &b = B();
  std::cout << "&b   = " << &b << '\n';
  std::cout << "&b.a = " << &b.a << '\n';
}

Conclusion

A reference gives us a way to refer by name to some data.
A reference is initialized, and then cannot be changed.
Three reference types:
- an lvalue reference, which can bind to an lvalue only,
- a const reference, which can bind to both an lvalue and an rvalue,
- an rvalue reference, which can bind to an rvalue only.
A reference extends the lifetime of a temporary it’s bound to.
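
The binding rules summarized above can be spelled out as compile-time checks with std::is_constructible. In this sketch, the helper binds is a made-up name, and the second template argument describes the initializing expression: int & stands for a non-const lvalue, const int & for a const lvalue, and int for an rvalue.

```cpp
#include <cassert>
#include <type_traits>

// True if a reference of type R can be initialized with an expression
// described by E.
template <typename R, typename E>
constexpr bool binds()
{
  return std::is_constructible<R, E>::value;
}

// An lvalue reference binds to a non-const lvalue only.
static_assert( binds<int &, int &>(), "");
static_assert(!binds<int &, const int &>(), "");
static_assert(!binds<int &, int>(), "");

// A const reference binds to lvalues and rvalues alike.
static_assert( binds<const int &, int &>(), "");
static_assert( binds<const int &, const int &>(), "");
static_assert( binds<const int &, int>(), "");

// An rvalue reference binds to an rvalue only.
static_assert(!binds<int &&, int &>(), "");
static_assert(!binds<int &&, const int &>(), "");
static_assert( binds<int &&, int>(), "");

// A run-time spot check of the same rules.
void check_binding_rules()
{
  assert((binds<const int &, int>()));
  assert(!(binds<int &, int>()));
  assert(!(binds<int &&, int &>()));
}
```

If the listing compiles, all nine rules hold; check_binding_rules() repeats three of them at run-time.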

Quiz

What are the reference types, and what are their hallmarks?
What can we initialize a const reference with?
Can we get an rvalue reference to a non-const lvalue?

Acknowledgement

The project is financed under the program of the Minister of Science and Higher Education under the name “Regional Initiative of Excellence” in the years 2019 - 2022, project number 020/RID/2018/19, the amount of financing 12,000,000 PLN.