cpp

Introduction

The most important facts about references:

A reference is an alias (a name) for some data (a variable, an object, or a temporary object).
A reference has no identity, and therefore we cannot take the address of a reference: applying & to a reference yields the address of the data it refers to.
Given a reference to an object, we access member fields and functions with the member access operator (i.e., object.member, as with a variable name), not with the member access through pointer operator (i.e., pointer->member, as with a pointer).
A reference must be initialized, so there are no null or uninitialized references. A pointer can be null (i.e., nullptr).
Unlike a pointer, a reference cannot be changed to become an alias of other data.
A reference type cannot have top-level cv-qualifiers (const or volatile).
There is a reference to a pointer, but no pointer to a reference.
A reference can be stored in std::pair and std::tuple, but not in a container or an array.
There is the type void *, but no void & (fortunately).

The main uses of references:

passing an argument to a function by reference,
returning a value from a function by reference,
using data by reference in member fields of objects,
read-only access to data (a const reference).

A reference is initialized from an initializing expression. Given the data that is the value of the initializing expression, we can equivalently say that the reference:

refers to the data,
binds to the data,
names the data.

A reference refers to the value of an expression of the lvalue category or of the rvalue category, but for short we say that a reference binds to an lvalue or to an rvalue.

C++ references have no counterpart in Java or C#: in C++ a reference may not exist at run time, because it was optimized out at compile time.

In Java or C#, a reference is a pointer with shared-ownership semantics: references can be copied, and the data exists as long as at least one such pointer exists. In these languages the pointer (not a plain one as in C, but one that implements sharing of an object) always exists at run time.

The example below shows references being optimized out at compile time. The two programs do the same thing, but the second one uses references, which are optimized out.

Save this file as test1.cc:

#include <iostream>

int
main()
{
  int x = 1;
  double y = .2;
  std::cout << x << y << std::endl;
}

Save this file as test2.cc:

#include <iostream>
#include <utility>

int
main()
{
  int x = 1;
  double y = .2;
  std::pair<int &, double &> p{x, y};
  std::cout << p.first << p.second << std::endl;
}

Compile them to assembly:

g++ -S -O3 test1.cc test2.cc

We now have two files, test1.s and test2.s. Let's look at the first one:

c++filt < test1.s | less

Compare them to see that the generated code is the same (i.e., that there is no trace of the references), apart from metadata such as the source file name:

diff test1.s test2.s

Reference types

There are three reference types:

an lvalue reference of type T &: refers to data that we can modify but not move, because the data may still be needed in its current place,
a const reference of type const T &: refers to data that we can neither modify nor move,
an rvalue reference of type T &&: refers to data that we can modify and move, because it will soon be destroyed.

An lvalue reference is also called a reference to an lvalue. An rvalue reference is also called a reference to an rvalue.

We use reference types in type definitions, e.g., the type of a variable or the return type of a function. An expression is never of a reference type, because the reference (e.g., a reference variable) is replaced with the data it refers to. [expr.type]

The terms lvalue and rvalue in type names

We call an expression an lvalue or an rvalue, e.g.:

"1" is an lvalue,
1 is an rvalue.

These terms are also used in type names, which is a bit confusing:

int &x = <expr>; - the variable x is of an lvalue reference type, and the expression x is of the lvalue category.
int &&x = <expr>; - the variable x is of an rvalue reference type, and the expression x is of the lvalue category.

Now this sentence from http://cppreference.com makes sense:

Even if the type of a variable is an rvalue reference, the expression consisting of the name of that variable is an lvalue.

Lvalue reference

A (non-const) lvalue reference can be initialized only with an lvalue.

This is how we define an lvalue reference:

T &name = <expr>;

The reference name refers to data of type T. The initializing expression is <expr>. The declarator of an lvalue reference is &.

Basic examples:

#include <cassert>
#include <iostream>

int &
foo()
{
  static int x = 1;
  return x;
}

int
main()
{
  int x = 1;

  // Can initialize an lvalue reference with an lvalue.
  int &xr = x;

  // Error: needs initialization.
  // int &a;

  // OK: foo() is an lvalue.
  int &fr = foo();

  // A const.
  const int c = 300000000;

  // Error: a non-const lvalue reference cannot bind to a const.
  // int &r = c;

  // Error: a non-const lvalue reference cannot bind to an rvalue.
  // int &ncr = 1;

  // Initialized alright.
  int &z = x, y = 2;

  // The same as above, but this time we know what "&" applies to.
  // int y = 2, &z = x;

  // This placement of "&" is confusing: does "&" apply to "y" too?
  // int& z = x, y = 2;

  // IMPORTANT: Now z is an alias of x.  Whenever you see z, just
  // replace it with x to understand the code below.
  
  // A reference has no address.  &z is the address of x.
  assert (&z == &x);

  // There's no "reinitialization"!  It's an assignment to x.
  z = y;
  // Now x has 2.
  std::cout << "x = " << x << std::endl;

  // Initialize an lvalue reference with an expression that z is an
  // alias of.  Expression z is simply treated as x.
  int &zz = z;
  // Therefore the above has the same effect as this one.
  int &zx = x;
}

Examples for containers and arrays:

#include <array>
#include <vector>

int main()
{
  int x, y;

  // Containers can store pointers.
  std::vector<int *> v = {&x, &y};
  // But not references.
  // std::vector<int &> v;

  // Array can store pointers.
  int *a[] = {&x, &y};
  // But not references.
  // int &r[] = {x, y};

  // std::array can store pointers.
  std::array<int *, 2> b = {&x, &y};
  // But not references.
  // std::array<int &, 2> c = {x, y};
}

Examples for std::pair and std::tuple:

#include <iostream>
#include <utility>
#include <tuple>
#include <vector>

// This example demonstrates that, unlike containers, std::pair and
// std::tuple can have elements of a reference type.  std::pair,
// std::tuple, and std::array are quasi-containers, because they have
// some container functionality, but cannot change size at run-time as
// containers do.
int main()
{
  // Error: the pair elements of reference type must be initialized.
  // std::pair<int &, int &> p2;

  int x = 1;

  std::pair<int &, int &> p(x, x);
  p.second = 2;

  std::cout << "x = " << x << std::endl;

  // The tuple elements of reference type must be initialized.
  std::tuple<int &, int &, int &> t(x, x, x);
  std::get<2>(t) = 3;

  std::cout << "x = " << x << std::endl;

  // Interestingly, we cannot have a vector of references, but we can
  // have a vector of reference pairs.
  std::vector<std::pair<int &, int &> > v{p, p};
  v[1].second = 4;

  std::cout << "v[0].first = " << v[0].first << std::endl;
  std::cout << "x = " << x << std::endl;
}

Const reference

This is how we define a const reference:

const T &name = <expr>;

It is an lvalue reference that refers to data of type const T, i.e., the const qualifier refines the type of the data, not the type of the reference. We cannot change a reference anyway, but we call it a const reference as a shorthand, to avoid saying "an lvalue reference to const data".

The const reference was introduced so that data can be used read-only. For example, a function can take an argument by const reference to guarantee that the argument is not modified. Moreover, so that a function could accept temporary data (i.e., an rvalue), C++98 states:

A const reference can be initialized with an lvalue or an rvalue.

Examples:

#include <string>

using namespace std;

// The function parameter is a const reference.
int
foo(const string &)
{
  return 0;
}

int
main()
{
  // The data we bind references to.
  int x = 1;
  const int cx = 1;

  // The const reference binds to an lvalue.  The type of the
  // initializing expression `x` is automatically augmented with the
  // const qualifier.
  const int &l1 = x;

  // The reference to a const int is initialized with a const int.
  const int &cr = cx;

  // Error: a non-const lvalue reference cannot bind to an rvalue.
  // int &l2 = 1;
  // The const reference binds to an rvalue.
  const int &l3 = 1;

  // Expression "foo()" is an rvalue, because the function returns the
  // result by value.
  const int &c = foo("Hello!");
  // Error: we cannot take the address of the returned value.
  // const int *p = &foo("Hello!");
  // But we can take the address of the temporary to which reference c
  // is bound.  The reference extends the lifetime of the temporary.
  const int *p = &c;
  
  string s;
  // The function parameter reference binds to an lvalue.
  foo(s);
  // Since C++98, a const reference can bind to the data of an rvalue.
  // Here the function parameter reference binds to an rvalue, which
  // is the expression with a temporary of type string created with
  // the constructor taking a "const char *".
  foo("Hello!");
}

Rvalue reference

An rvalue reference can be initialized only with an rvalue.

This is how we define an rvalue reference, with the && declarator:

T &&name = <expr>;

The rvalue reference was introduced in C++11 to enable:

move semantics,
perfect forwarding of the arguments of a function template call.

Examples:

#include <iostream>

int main()
{
  // Error: an rvalue reference must be initialized.
  // int &&a;

  int i = 1;

  // Error: an rvalue reference cannot bind to an lvalue.
  // int &&z = i;

  int &l = i;
  // Error: l is an lvalue, and an rvalue reference can't bind to it.
  // int &&r = l;

  // OK: an rvalue reference can bind to an rvalue.
  int &&x = 1;

  // Error: x is an lvalue, and an rvalue reference can't bind to it.
  // x is an lvalue, because it has a name, even though it's of the
  // rvalue reference type.
  // int &&z = x;
}

A reference cannot be changed

A reference (any reference, not only a const one) cannot be changed to refer to other data. A reference can only be initialized. Such a change would be required by the assignment operator of a class type that has a reference member field, so the implicitly declared assignment operators of such a class are deleted, as in the example below:

#include <utility>

struct A
{
  int &m_r;

  A(int &r): m_r(r)
  {
  }
};

int
main()
{
  int i, j;
  A a(i), b(j);

  // These would not compile:
  // a = b;
  // a = std::move(b);
}

Qualifiers

A pointer type can have the const or volatile qualifiers (it can be cv-qualified), but a reference type cannot (it is cv-unqualified).

Examples for a pointer:

int main()
{
  int x = 1;
  const int cx = 2;

  // A pointer to an integer.
  int *p2i_1 = &x;
  // A pointer to an integer cannot point to a const integer, because
  // then we could modify the integer that we declare const.
  // int *p2i_2 = &cx;

  // A pointer to a const integer.
  const int *p2ci_1 = &x;
  const int *p2ci_2 = &cx;

  // A const pointer to an integer.
  int * const cp2i = &x;
  // We can modify the data pointed to by the pointer all right.
  *cp2i = 2;
  // But not the pointer itself.
  // cp2i = &x;

  // Const pointer to a const integer.
  const int * const cp2ci_1 = &x;
  const int * const cp2ci_2 = &cx;
  // We cannot modify the data:
  // *cp2ci_1 = 2;
  // Nor the pointer itself:
  // cp2ci_1 = &cx;

  // The same applies for the volatile qualifier.
  const volatile int * const volatile cvp2cvi = &x;
}

Examples for a reference:

int main()
{
  int x = 1;
  const int cx = 2;

  // A reference to an integer.
  int &r2i_1 = x;
  // A reference to an integer cannot bind to a const integer, because
  // then we could modify the integer that we declare const.
  // int &r2i_2 = cx;

  // A reference to a const integer.
  const int &r2ci = x;
  // A reference to a volatile integer.
  volatile int &r2vi = x;
  // A reference to a const volatile integer.
  const volatile int &r2cvi = x;

  // A const-qualified reference type to an integer does not exist.
  // int & const cr2i = x;
  // Nor a const-qualified reference type to a const integer.
  // const int & const cr2ci = x;
  // Nor any other cv-qualified reference type.
  // const volatile int & const volatile cvr2cvi = x;
}

In the example above we used qualifiers in reference declarations, but not at the top level. A top-level qualifier of a reference type would be placed to the right of the & declarator, but it is not allowed there.

A reference to a pointer, but not the other way around

A reference to a pointer exists, but a pointer to a reference does not.

#include <iostream>

int main()
{
  int x = 1;
  int *p = &x;
  // A reference to a pointer to an integer.
  int * & r2p = p;
  // A reference to a const pointer to an integer.
  int * const & r2cp = p;
  *r2p = 2;
  // We cannot modify the const pointer through the reference.
  // r2cp = &x;
  // But we can modify the pointer.
  p = &x;
  std::cout << x << '\n';

  int &r = x;
  // A pointer to a reference does not exist.
  // int & * p2r = &r;
}

Reference tricks

Reference type and function overload resolution

A function can be overloaded on parameter types, and this includes reference types. We therefore have three possible overloads:

void foo(T &); - overload #1,
void foo(const T &); - overload #2,
void foo(T &&); - overload #3.

Given the three overloads above, for a call foo(<expr>) overload resolution selects:

#1, if <expr> is an lvalue of a non-const type,
#2, if <expr> is an lvalue of a const type,
#3, if <expr> is an rvalue.

A const reference (used in overload #2) can also be initialized with an lvalue of a non-const type or with an rvalue, so when overload #1 or #3 is not declared, the compiler falls back to overload #2 for the corresponding argument.

Here is an example:

#include <iostream>

using namespace std;

// Uncomment, and analyze the difference in overload resolution.
//
// // Overload #1
// void
// foo(int &)
// {
//   cout << "int &" << endl;
// }

// Overload #2
void
foo(const int &)
{
  cout << "const int &" << endl;
}

// Uncomment, and analyze the difference in overload resolution.
//
// // Overload #3
// void
// foo(int &&)
// {
//   cout << "int &&" << endl;
// }

int
main()
{
  int x = 1;
  int &rx = x;
  foo(x); // Overload #1 if available, overload #2 otherwise.
  foo(rx); // Overload #1 if available, overload #2 otherwise.

  const int y = 2;
  const int &ry = y;
  foo(y); // Always overload #2, an error otherwise.
  foo(ry); // Always overload #2, an error otherwise.

  int &&rz = 3;
  foo(3); // Overload #3 if available, overload #2 otherwise.
  foo(rz); // Overload #1 if available, overload #2 otherwise.
}

Explicit conversion of an lvalue to an rvalue

We already know the standard conversion that implicitly converts an lvalue to an rvalue. However, this standard conversion does not apply when initializing an rvalue reference: the initializing expression must be an rvalue ([dcl.init.ref]).

We can explicitly convert an lvalue to an rvalue with static_cast<T &&>(<expr>), where <expr> can be an lvalue or an rvalue. Having to spell out the type T is a bit cumbersome, though.

It is simpler to obtain an rvalue with std::move(<expr>), where <expr> can be an lvalue or an rvalue. std::move is a function template that deduces the type of the rvalue from the type of <expr>, and returns static_cast<T &&>(<expr>), where T is the deduced type with any reference removed.

Example:

#include <utility>

class A {};

int main()
{
  A a;

  A &&r1 = static_cast<A &&>(a);
  A &&r2 = std::move(a);
  // The standard lvalue-to-rvalue conversion is suppressed.
  // A &&r3 = a;
}

A single use case

We use the explicit conversion of an lvalue to an rvalue only to explicitly allow moving an object x that otherwise could not be moved, because the expression x is an lvalue. By changing the category of the expression, we affect the choice of the overload of a constructor, an assignment operator, or any other function overloaded on reference types.

Extending the lifetime of a temporary

The lifetime of a temporary can be extended by a reference that is bound to it. The temporary is destroyed when the reference goes out of scope. Without the reference, the temporary would be destroyed once the expression has been evaluated.

#include <iostream>
#include <string>

using namespace std;

struct A
{
  string m_name;

  A(const string &name): m_name(name)
  {
    cout << "ctor: " << m_name << endl;
  }

  ~A()
  {
    cout << "dtor: " << m_name << endl;
  }
};

int main()
{
  A a = A("a");

  {
    const A &r1 = a;
  }

  {
    const A &r2 = A("r2");

    {
      const A &R2 = r2;
    }

    cout << "Just checking." << endl;
  }

  {
    A &&r3 = A("r3");
  }
}

A reference member field must be initialized by a constructor, but not with a temporary expression [class.base.init#8]. The example below is ill-formed:

#include <iostream>

struct A
{
  ~A()
  {
    std::cout << "dtor of A\n";
  }
};

struct B
{
  // ACHTUNG!  These initializations are ill-formed! Does the compiler
  // complain?  Clang 11 does, but GCC 12 doesn't.
  const A &a = A();
  const int &i = 1;
};

int
main()
{
  const B &b = B();
  std::cout << "&b   = " << &b << '\n';
  std::cout << "&b.a = " << &b.a << '\n';
}

Summary

A reference lets us refer to data by name.
A reference must be initialized and cannot be changed.
There are three reference types:
- lvalue reference,
- const reference,
- rvalue reference.
A reference extends the lifetime of a temporary.