
Introduction

Expression categories are fundamental, yet difficult to understand. It’s all about the details of lvalues and rvalues, which we rarely think about in our daily programming.

To understand the meaning of lvalues and rvalues, it’s best to go through this text without searching for some deeper meaning at this time. Alice got similar advice from Humpty Dumpty in the novel “Through the Looking-Glass” by Lewis Carroll:

“Must a name mean something?” Alice asks Humpty Dumpty, only to get this answer: “When I use a word… it means just what I choose it to mean – neither more nor less.”

The value of an expression

An expression can be:

a literal: 3.14,
a variable: x,
an operator with operands: x + y,
a function call: foo(x).

The value of an expression is the result of evaluating an expression.

An expression has:

a type (e.g., int, bool, class A) known at compile time,
a value of the type (e.g., 5, false, A()) known at run time,
a category (e.g., lvalue, rvalue) known at compile time.

History: CPL, C, C++98

Two expression categories introduced in the CPL language (about half a century ago) were:

lvalue: “left of assignment” value, i.e., any expression that can go on the left of the assignment operator is an lvalue,
rvalue: “right of assignment” value, i.e., any expression that can go on the right of the assignment operator is an rvalue.

CPL defined the lvalue and rvalue categories in relation to the assignment operator. These definitions are only of historical importance, and do not apply to C++.

In C, expressions are either lvalues (for locator value; a locator is something that locates (points to) the value, e.g., the name of a variable) or non-lvalues. In C, a non-lvalue is an expression that is not an lvalue. There is no rvalue in C!

C++98 adopted lvalues from C, and named the expressions that are not lvalues rvalues.

Details

Category of an expression

In C++, the two most important categories of an expression are: the lvalue category and the rvalue category. In short, an lvalue is an expression of the lvalue category, and an rvalue is an expression of the rvalue category.

The expression category determines what we can do with the expression. Some operations we can do only with an lvalue (e.g., &x, i.e., taking the address of variable x), other operations only with an rvalue.

Example operations for expression <expr>:

assign: <expr> = 1
initialize reference: <reference type> y = <expr>
take the address: &<expr>
dereference: *<expr>
increment: ++<expr>, <expr>++

The definitions of lvalues and rvalues

You can look in vain for a concise and correct definition of lvalues and rvalues in the C++ standard. The C++ standard, which has about 1500 pages, defines them partially in various places, as needed.

Furthermore, in modern C++ new expression categories were introduced: prvalue, glvalue, and xvalue. However, the most important categories are still lvalue and rvalue.

We need to learn the details of the lvalue and rvalue categories to understand and efficiently use modern C++. For instance, the following is a statement from http://cppreference.com, which is hard to understand without knowing the lvalue and rvalue details:

Even if the variable’s type is an rvalue reference, the expression consisting of its name is an lvalue expression.

The lvalue category

It’s hard to find a succinct definition in the C++ standard of the lvalue category, because the meaning of the lvalue category is spread all over the standard. But the following is a good description of the lvalue category.

If &<expr> compiles, then <expr> is an lvalue. That is, if we can take the address of an expression, then this expression is an lvalue.

An expression with a variable name (e.g., x) is always an lvalue.

The examples of lvalues are:

the name of a variable: x
the name of a function: foo
a string literal: "Hello World!"
the result of the prefix increment: ++i

The definition of an lvalue as anything that can go on the left of the assignment operator does not apply to C++. You can have an lvalue on the left of the assignment operator, and the code will not compile:

int main()
{
  const int i = 1;

  &i; // Expression "i" is an lvalue.
  // &2; // Expression "2" is an rvalue.

  // i = 2; // Error, even though "i" is an lvalue.
}

The assignment operator for the integral types expects an lvalue on the left, so we cannot write 1 = 1. Here is a more elaborate example:

struct A
{
  int m_t[3];

  int
  operator[](unsigned i)
  {
    return m_t[i];
  }
};

int main()
{
  A a1;
  // The built-in assignment operator for integers expects an lvalue
  // on the left-hand side.  However, the overloaded operator[]
  // function returns a non-reference type, and so its call expression
  // is an rvalue.  That's why the following equivalent lines of code
  // do not compile.
  // a1[0] = 1;
  // a1.operator[](0) = 1;
}

The rvalue category

An expression is an rvalue, if it’s not an lvalue. We can’t take the address of an rvalue.

The examples of rvalues are:

a numeric literal: 1
an expression that creates a temporary object: std::string("Hello World!")
the result of the suffix increment: i++
a function call: foo(), if int foo();

The definition of the rvalue as something that should be on the right of the assignment operator does not apply to C++. You can have an rvalue on the left of the assignment operator, and the code will compile. For instance, A() is an rvalue (that creates a temporary object), and we can assign to it, because we defined the assignment operator in class A:

int main()
{
  struct A
  {
    void
    operator = (int i)
    {
    }
  };

  A() = 1;
  A().operator=(1);
}

From lvalue to rvalue

The C++ standard defines this standard conversion, which is applied without the programmer explicitly requesting it:

An lvalue of a non-function, non-array type T can be converted to an rvalue.

For instance, the + operator for an integer type (e.g., int) requires rvalues as its operands. In the following example the + operator expects rvalues, and so the lvalues x and y are converted to rvalues.

int main()
{
  int x = 1, y = 2;
  x + y;
}

As another example, the unary * operator (i.e., the dereference operator) requires a value of a memory address, which is an rvalue. However, we can use the dereference operator with an lvalue too, because that lvalue will be converted to an rvalue.

int main()
{
  // The dereference operator requires an rvalue.  The null pointer
  // value static_cast<int *>(0) is an rvalue.
  *static_cast<int *>(0);

  int x = 1;
  int *p = &x;
  *p; // OK: "p" is an lvalue, but converted to an rvalue.
}

There is no standard or implicit conversion from an rvalue to an lvalue. For example, the address-of operator (i.e., the unary & operator) expects an lvalue. The rvalue that you try to pass will not be converted to an lvalue.

int main()
{
  // static_cast<int *>(0) is a null pointer value, and nullptr is
  // the null pointer literal (of type std::nullptr_t).  Both are rvalues.

  // &static_cast<int *>(0); // Error: lvalue required.

  // &nullptr; // Error: lvalue required.
}

Example of the increment operator

The increment operator (i.e., the ++ operator) requires an lvalue as its operand. This requirement applies to both the prefix and the suffix versions of the operator. The same applies to the decrement operator.

int main()
{
  int x = 1;
  ++x; // The prefix version of the increment operator.
  x++; // The suffix version of the increment operator.
  // ++1; // Error: lvalue needed, no rvalue to lvalue conversion.
  // 1++; // Error: lvalue needed, no rvalue to lvalue conversion.
}

The expression of the increment operator for built-in types is:

an lvalue for the prefix version of the operator, i.e., the ++<expr> is an lvalue, because the prefix increment operator returns a reference to the just-incremented object it got as an operand,
an rvalue for the suffix version of the operator, i.e., the <expr>++ is an rvalue, because the suffix increment operator returns a temporary copy (which is an rvalue) of the object it got as an operand.

Therefore ++++x compiles, and x++++ doesn’t.

int main()
{
  int x = 1;
  ++++x; // OK: ++x is an lvalue, and ++ wants an lvalue.
  // x++++; // Error: x++ is an rvalue, and ++ wants an lvalue.
}

As a side note:

the prefix version has lower precedence than the suffix version,
the prefix version has the right-to-left associativity,
the suffix version has the left-to-right associativity.

In the example below, std::string has the suffix increment operator defined. The loop with the prefix operator would be more complicated.

#include <algorithm>
#include <iostream>
#include <string>

using namespace std;

// We have to define the function as non-member, because we cannot
// modify type std::string.
string
operator++(string &s, int)
{
  string tmp = s;
  next_permutation(s.begin(), s.end());
  return tmp;
}

int main()
{
  cout << "Permutations for abc:" << endl;
  for(string i = "abc"; i++ != "cba";)
    cout << i << endl;
}

Temporary objects

A temporary object (or just a temporary) is an object that is created when an expression is evaluated. A temporary is automatically destroyed (i.e., you don’t need to explicitly destroy it) when it is not needed anymore.

A temporary is needed when:

evaluating an operation: 1 + 2, string("T") + "4"
passing an argument to a function: foo(A())
returning an object from a function: string x = foo();
throwing an exception: throw A();

A temporary is an object, not an expression, and so a temporary is neither an lvalue nor an rvalue, because an object has no category of expression. An object is used in an expression that is either an lvalue or an rvalue. Usually a temporary is created in rvalue expressions.

A temporary as a function argument

An expression with a temporary can be an argument of a function call, in which case that expression is an rvalue. If a function takes an argument by reference (i.e., the parameter of the function is of a const reference type), the expression with that parameter name is an lvalue even though the reference is bound to an rvalue.

An example follows. The constructor outputs the address of the object, so that we can make sure it’s the same object in function foo.

#include <iostream>

struct A
{
  A()
  {
    std::cout << "ctor: " << this << std::endl;
  }
};

// "a" is a parameter of a const reference type.
void
foo(const A &a)
{
  // "a" is an lvalue.
  std::cout << "foo: " << &a << std::endl;
}

int main()
{
  // "A()" is an rvalue.
  foo(A());
}

A temporary as an exception

An expression with a temporary can be an argument of the throw instruction, in which case that expression is an rvalue. If a catch block catches the exception by a reference, the expression with that reference name is an lvalue even though the reference is bound to an rvalue.

An example follows. The constructor outputs the address of the object, so that we can make sure it’s the same object in the catch block:

#include <iostream>

int main()
{
  struct A
  {
    A()
    {
      std::cout << "ctor: " << this << std::endl;
    }
  };

  try
    {
      // "A()" is an rvalue.
      throw A();
    }
  // Catch the exception by reference.  It's a non-const reference!
  catch (A &a)
    {
      // "a" is an lvalue.
      std::cout << "catch: " << &a << std::endl;
    }
}

We should catch an exception by reference: if we catch it by value, we’re going to copy that exception. Change the example so that an exception is caught by value, and you’ll see that we get a copy (you’ll see different addresses).

Interestingly, and as a side note: in the example above, that non-const reference seems to be bound to an rvalue. C++98 states that only a const reference can bind to an rvalue, so I would expect catch(A &a) to fail to compile, as it should be catch(const A &a). It compiles, because the reference in the handler is initialized not with the operand of throw, but with the exception object, and the standard treats the exception object as an lvalue.

Interestingly, and as a side note, a statement block (i.e., {<statements>}) can be replaced with a single statement, e.g., {++i;} can be replaced with ++i;. However, the try and catch blocks always have to be blocks, and you cannot remove {} even if the block has a single statement. Weird.

Overloading member functions

By default, a member function can be called for both an lvalue and an rvalue. However, a member function can be declared with a reference qualifier & or && (and therefore be ref-qualified), so that it can be called only for an lvalue (with &) or only for an rvalue (with &&). Example:

int main()
{
  struct A
  {
    void operator = (int) &
    {
    }

    void operator = (int) && = delete;
  };

  A a;
  a = 1;

  // Does not compile, because the && overload is declared deleted.
  // A() = 1;
}

Functions and categories of expressions

Function foo (e.g., void foo(<params>)) can be used in an expression in two ways:

by name only:
- i.e., the expression: foo,
- that expression is an lvalue, because we can take its address: &foo,
by a function call:
- i.e., the expression: foo(<args>),
- that expression is:
  - an lvalue if the function returns a reference (an lvalue reference, specifically),
  - an rvalue otherwise.

This is an example of a function call that is an lvalue:

int &
loo()
{
  // FYI: It compiles even if we remove the return statement below!
  return *static_cast<int *>(0);
}

int main()
{
  &loo(); // OK: "loo()" is an lvalue.
  int &l = loo(); // OK: "loo()" is an lvalue.
}

This is an example of a function call that is an rvalue:

int
roo()
{
  return 0;
}

int main()
{
   // &roo(); // Error: "roo()" is an rvalue.
   // int &r = roo(); // Error: "roo()" is an rvalue.
}

Incomplete types and categories of expressions

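An incomplete class type is a class type that was declared, but not defined.

An abstract class is a complete type, but, as with an incomplete type, we cannot create an object of that type.

Expressions of an incomplete class type or of an abstract class type can be only lvalues (and so rvalues of a class type can be only of complete, non-abstract types).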

class B;

B &
boo()
{
  return *static_cast<B *>(0);
}

int main()
{
  &boo(); // OK: "boo()" is an lvalue.
  // B(); // Error: expression "B()" is an rvalue.
}

Conclusion

An expression has a category. A value of some type (e.g., of class A or type int) has no category.

What we can do with an expression depends on its category.

Every expression is either an lvalue or an rvalue.

We covered only the basics; there is more: glvalue, prvalue, xvalue.

Quiz

Can an expression be both an lvalue and an rvalue at the same time?
Is a temporary object an rvalue?
What does the category of the function-call expression depend on?
Why does int i; ++i++; not compile?

Acknowledgement

The project is financed under the program of the Minister of Science and Higher Education under the name “Regional Initiative of Excellence” in the years 2019-2022, project number 020/RID/2018/19, the amount of financing 12,000,000 PLN.