cpp

Introduction

In C we use a function pointer to ship a piece of code (e.g., that establishes order between objects) to some other piece of code (e.g., a sorting function). In C++ a generalization of a function is something that we can call. We call it a callable. Calling a callable has the syntax of calling a function (i.e., applying the () operator), and the function interface: we know the types of the arguments and the return value.

The goal of that generalization is to:

The standard library is using (passing, keeping as a member field) callables by value (i.e., not by reference), so copying a callable should be fast. Callables passed to standard algorithms (e.g., std::sort) and standard containers (e.g., std::priority_queue) should be copied fast. That means, a callable should not keep a lot of data that would be copied.

Callable is frequently used by value (e.g., by the standard library), but it can be used by reference or by pointer too.

Motivation

We sort integers below. We can do it, because integers have a built-in operator < defined:

#include <algorithm>
#include <iostream>
#include <vector>

using namespace std;

int
main()
{
  vector v{3, 1, 2};
  sort(v.begin(), v.end());
  for(const auto &e: v)
    cout << e << endl;
}

Below we sort data of a user-defined type, and to this end we need to define the ordering (we could also say to induce order or to establish order) between the objects with the comparison operator, i.e., the < operator. There are a number of comparison operators that we can define for a user-defined type: ==, !=, <, <=, >, =>, <=>, but for inducing order between elements (the objects of the user-defined type) the most important operator is the less-than operator, i.e., the < operator. We could refer to the less-than operator as the comparison operator, but the avoid confusion, we refer to it as the < operator.

We defined the < operator as a member function: this is the first object to compare, and the function parameter is the second object to compare. The member comparison operator should be declared as const (because it should not modify its object, i.e., this), and it should take the other operand by a const reference (because it should not change that operand).

#include <algorithm>
#include <iostream>
#include <vector>

using namespace std;

struct A
{
  int id;
  A(int id): id(id)
  {
  }

  bool
  operator < (const A &a) const
  {
    return this->id < a.id;
  }
};

int
main()
{
  vector v{A(3), A(1), A(2)};
  sort(v.begin(), v.end());
  for(const auto &e: v)
    cout << e.id << endl;
}

Function std::sort is using the < operator if we do not provide a comparison callable as the last call argument. Well, std::sort is actually using an object of type std::less which in turn uses the < operator. For comparing the sorted elements, the implementation of std::sort does not use the < directly, because it would be stuck with it, and be unfit for customization. Instead, the implementation uses a callable for comparison.

No wonder that we can get the same effect as in the example above if we pass as the last argument an object of type std::less<A> (that’s a callable), because if we didn’t, it would be used anyway:

#include <algorithm>
#include <iostream>
#include <functional>

using namespace std;

struct A
{
  int id;
  A(int id): id(id)
  {
  }

  bool
  operator < (const A &a) const
  {
    return this->id < a.id;
  }
};

int
main()
{
  vector v{A(3), A(1), A(2)};
  sort(v.begin(), v.end(), std::less<A>{});
  for(const auto &e: v)
    cout << e.id << endl;
}

With std::sort, we do not always have to use the < operator, and instead we can pass a callable to induce ordering. We can sort in ascending order if we pass an object of the std::greater type. That type is using the > operator, and so we have to define it instead of the < operator.

#include <algorithm>
#include <iostream>
#include <functional>
#include <vector>

using namespace std;

struct A
{
  int id;
  A(int id): id(id)
  {
  }

  bool
  operator > (const A &a) const
  {
    return this->id > a.id;
  }
};

int
main()
{
  vector v{A(3), A(1), A(2)};
  sort(v.begin(), v.end(), std::greater<A>{});
  for(const auto &e: v)
    cout << e.id << endl;
}

A callable can be passed not only to a function, but also to a constructor, which can store the callable as a member field, as, e.g., the standard priority queue (type std::priority_queue) does. Here’s an example, which we’ll be modifying later:

#include <iostream>
#include <queue>

using namespace std;

int
main(void)
{
  priority_queue<int> q;

  q.push(2);
  q.push(1);
  q.push(3);

  while(!q.empty())
    {
      cout << q.top() << endl;
      q.pop();
    }
  
  return 0;
}

Callable types

A callable is either:

A pointer to a function

An expression that is only the name of a function, e.g., foo, (without the () operator) is treated as the address of that function. We can also take the address of a function by preceeding the name of the function with the & operator, e.g., &foo. These ways of taking a function address are equivalent, which is inconsistent, because there should be just one way.

Having this address, we can call the function. The only operations we can do with a function pointer is: take a function address, call the function.

#include <iostream>

bool
foo(const int &a, const int &b)
{
  std::cout << "foo: a = " << a << ", b = " << b << '\n';
  return true;
}

int
main()
{
  // I don't like the pointer-to-function C syntax.
  bool (*f2a)(const int &, const int &) = foo;
  bool (*f2b)(const int &, const int &) = &foo;
  f2a(10, 20);
  (*f2a)(10, 20);
  f2b(10, 20);
  (*f2b)(10, 20);

  // That's better.
  using foo_type = bool(const int &a, const int &b);
  foo_type *f3a = foo;
  foo_type *f3b = &foo;
  f3a(10, 20);
  (*f3a)(10, 20);
  f3b(10, 20);
  (*f3b)(10, 20);
}

Here we sort descendingly by passing a callable, which is a function pointer:

#include <algorithm>
#include <iostream>
#include <vector>

using namespace std;

bool
cmp(const int &a, const int &b)
{
  return a > b;
}

int
main()
{
  vector v{3, 1, 2};
  sort(v.begin(), v.end(), cmp);
  for(const auto &e: v)
    cout << e << endl;
}

In the example below, we pass a pointer to a comparison function.

#include <functional>
#include <iostream>
#include <queue>

using namespace std;

bool
foo(const int &a, const int &b)
{
  return a < b;
}

int
main(void)
{
  priority_queue<int, vector<int>,
                 bool(*)(const int &, const int &)> q(foo);

  q.push(2);
  q.push(1);
  q.push(3);

  while(!q.empty())
    {
      cout << q.top() << endl;
      q.pop();
    }
  
  return 0;
}

A functor

A functor is an object on which we can call the () operator. The exact definition of the () operator depends on how we call a functor. In comparison with a function, a functor is more capable, because it can keep some extra data passed to its constructor.

In the example below we define a simple functor type, create a functor, and call it. The functor is a callable, because we can call it. Since the () operator was defined constant, we can call it for constant functors.

#include <iostream>

struct functor_type
{
  void operator()() const
  {
    std::cout << "Hello World from a functor!\n";
  }
};

int
main()
{
  const functor_type f;
  f();
}

We can store some data in a functor:

#include <iostream>

struct functor_type
{
  int m_n;

  functor_type(int n): m_n(n)
  {
  }

  void operator()() const
  {
    for(int i = 0; i < m_n; ++i)
      std::cout << "Hello World from a functor!\n";
  }
};

int
main()
{
  functor_type f(10);
  f();
}

A functor acts just like a function when it keeps no data. For example, types std::less and std::greater act like functions, because they do not store any data. Such functor types introduce no performance hit: the constructors and destructors are empty, so they generate no code, and the comparison function is inlined. It’s as fast as it can be.

By default, the priority queue uses std::less, and returns the largest element. To make the priority queue return the smallest element, we can pass it a functor of type std::greater:

#include <iostream>
#include <queue>
#include <vector>

using namespace std;

int
main(void)
{
  // HERE'S THE DIFFERENCE!
  priority_queue<int, vector<int>, greater<int>> q;

  q.push(2);
  q.push(1);
  q.push(3);

  while(!q.empty())
    {
      cout << q.top() << endl;
      q.pop();
    }
  
  return 0;
}

Here we define a functor type, which acts like a function:

#include <iostream>
#include <queue>
#include <vector>

using namespace std;

struct CMP
{
  bool
  operator()(const int &a, const int &b)
  {
    return a > b;
  }
};

int
main(void)
{
  // An object of CMP will be default-constructed by q.
  priority_queue<int, vector<int>, CMP> q;

  q.push(2);
  q.push(1);
  q.push(3);

  while(!q.empty())
    {
      cout << q.top() << endl;
      q.pop();
    }
  
  return 0;
}

In the example below we pass to a functor a boolean value that is used by the call operator. This value is passed at run-time. This functionality couldn’t be achieved with a function.

#include <functional>
#include <iostream>
#include <queue>

using namespace std;

struct CMP
{
  bool m_order;

  CMP(bool order): m_order(order)
  {
  }

  bool
  operator()(const int &a, const int &b) const
  {
    return m_order ? a < b : a > b;
  }
};

int
main(void)
{
  bool order;

  cout << "Enter 0 or 1:";
  cin >> order;
  
  priority_queue<int, vector<int>, CMP> q(order);

  q.push(2);
  q.push(1);
  q.push(3);

  while(!q.empty())
    {
      cout << q.top() << endl;
      q.pop();
    }
  
  return 0;
}

Closure

A closure is a functor which is the result of a lambda expression. A lambda (in short for a lambda expression) is syntactic sugar for conveniently creating functors: they help us create functors with less fuss in comparison with creating a functor. We could do away with lambda expressions, and achieve the same functionality with functors. Lambdas are just handy.

Since a closure is an object, it must have some type, but we usually do not need it, and so we say that a closure is of an anonymous type. We can get the type of a closure with the decltype operator.

Syntax

Lambda expressions can be nuanced, and we’ll not cover all the nuances. However, most lambdas are of this syntax:

[capture list](parameter list) mutable {body}

The capture list and the parameter list are comma-separated. If the parameter list is empty, the () can be dropped. Even if the capture list and the body are empty, [] and {} cannot be dropped.

The capture list can have:

The parameter list is the list of function parameters, just like for a regular function.

The mutable specifier is optional. By default the member () operator function is defined constant, but we can make it non-const with the mutable specifier.

Therefore the simplest lambda is []{}. Here we create a closure and call it in one go (in one expression):

int
main()
{
  // These two are equivalent.
  [](){};
  []{};

  // Here we also call the closure: notice the trailing ().
  []{}();
}

Expression []{}() is simply translated into this code:

int
main()
{
  // The block scope is here, so that object x behaves like a
  // temporary object.  Object x is of anonymous type.
  {
    struct
    {
      void
      operator()() const
      {
      }
    } x;
    x();
  }
}

Semantics

A lambda creates a functor type (a structure or a class), and an object of this type. These are the basic facts:

The capture list describes how to capture variables from the scope of the lambda expression, so that they are available in the body. The scope is the fragment of code where variables are accessible: the global scope, the class scope, the function scope, and the block scope.

The capture list can be empty. In that case only the parameters of the parameter list are available in the body. Example:

#include <iostream>

int
main()
{
  int x = 1;

  // Would not compile, because x is not captured, and the parameter
  // list is empty.
  // []{x = 10;}();

  [](int &x){x = 10;}(x);
  std::cout << "x = " << x << std::endl;
}

The code above is equivalent to this code:

#include <iostream>

int
main()
{
  int x = 1;

  {
    struct
    {
      void
      operator()(int &x) const
      {
        x = 10;
      }
    } f;
    f(x);
  }

  std::cout << "x = " << x << std::endl;
}

Ways of capturing variables

A variable can be captured by value or by reference. When a variable is captured by value, the closure keeps in its member field a copy of the value of the captured variable, i.e., the member field was initialized by copying the value of the captured variable. To capture a variable by value, put its name in the capture list.

When a variable is captured by reference, the closure keeps in its member field a reference to the captured variable, i.e., the member reference was initialized with the captured variable. To capture a variable by reference, put its name in the capture list, and preceeded it by an ampersand.

For example:

#include <iostream>

int
main()
{
  int x = 1, y = 2;

  // Variable x is captured by value, y by reference.
  [x, &y]() mutable {x = 10, y = 20;}();

  std::cout << "x = " << x << ", y = " << y << std::endl;
}

The code above is equivalent to this code:

#include <iostream>

int
main()
{
  int x = 1, y = 2;

  {
    struct F
    {
      int x;
      int &y;

      F(int x, int &y): x(x), y(y)
      {
      }

      void
      operator()()
      {
        x = 10;
        y = 20;
      }
    };

    F f(x, y);
    f();
  }

  std::cout << "x = " << x << ", y = " << y << std::endl;
}

Default capture policy

The capture list can begin with the default policy of capturing variables either by value or by reference. If a default capture policy is given, all variables used in the body are captured, and we do not have to list them.

We set the default capture-by-value policy with =. For example:

#include <iostream>

int
main()
{
  int x = 1, y = 2;

  // Capture every variable by value.
  [=]() mutable {x = 10; y = 20;}();

  std::cout << "x = " << x << ", y = " << y << std::endl;
}

The code above is equivalent to this code:

#include <iostream>

int
main()
{
  int x = 1, y = 2;

  {
    struct F
    {
      int x;
      int y;

      F(int x, int y): x(x), y(y)
      {
      }

      void
      operator()()
      {
        x = 10;
        y = 20;
      }
    };

    F f(x, y);
    f();
  }

  std::cout << "x = " << x << ", y = " << y << std::endl;
}

We set the default capture-by-reference policy with &. Please note that in the example below, and the next example too, the call operator can be const, because we are not modifying a member reference, but a variable to which the member reference is bound.

#include <iostream>

int
main()
{
  int x = 1, y = 2;

  // Capture every variable by reference.
  [&]{x = 10; y = 20;}();

  std::cout << "x = " << x << ", y = " << y << std::endl;
}

The code above is equivalent to this code:

#include <iostream>

int
main()
{
  int x = 1, y = 2;

  {
    struct F
    {
      int &x;
      int &y;

      F(int &x, int &y): x(x), y(y)
      {
      }

      void
      operator()() const
      {
        x = 10;
        y = 20;
      }
    };

    F f(x, y);
    f();
  }

  std::cout << "x = " << x << ", y = " << y << std::endl;
}

We can specify the default capture policy, and then list those variables that should be captured differently. Also, for the member fields, we do not have to use the names of the captured variables, but we can give any name.

#include <iostream>

using namespace std;

int
main()
{
  {
    int x = 1, y = 2;

    // Because of the default capture-by-value policy, x is captured
    // by value, while y is captured by reference.
    [=, &y]() mutable {x = 10, y = 20;}();
    cout << "x = " << x << ", y = " << y << endl;
  }

  {
    int x = 1, y = 2;

    // Because of the default capture-by-reference policy, x is
    // captured by reference, while y is captured by value.
    [&, y]() mutable {x = 10, y = 20;}();
    cout << "x = " << x << ", y = " << y << endl;
  }

  {
    int x = 1, y = 2;

    // We name the captured variables differently: a, b.
    [a = x, &b = y]() mutable {a = 10, b = 20;}();
    cout << "x = " << x << ", y = " << y << endl;
  }
}

Closure Examples

Since a closure has some type, but we just don’t care about it, we write:

auto c = closure expression;

By using the auto type, we let the compiler deduce (this process is called type deduction) the type of c based on the type of the closure expression. Even though there is the = sign, the line above is not an assignment expression, but an initialization statement. That entails that the closure is not copied, but created directly in the memory location of c with copy elision.

Here’s an example of using a lambda with a priority queue:

#include <functional>
#include <iostream>
#include <queue>
#include <vector>

using namespace std;

int
main(void)
{
  auto f = [](const int &a, const int &b){return a > b;};

  priority_queue<int, vector<int>, decltype(f)> q(f);

  q.push(2);
  q.push(1);
  q.push(3);

  while(!q.empty())
    {
      cout << q.top() << endl;
      q.pop();
    }
  
  return 0;
}

Here we pass an argument to a closure:

#include <functional>
#include <iostream>
#include <queue>
#include <vector>

using namespace std;

int
main(void)
{
  bool order;

  cout << "Enter 0 or 1: ";
  cin >> order;
  
  auto c = [order](const int &a, const int &b)
           {return order ? a < b : a > b;};

  priority_queue<int, vector<int>, decltype(c)> q(c);

  q.push(2);
  q.push(1);
  q.push(3);

  while(!q.empty())
    {
      cout << q.top() << endl;
      q.pop();
    }
  
  return 0;
}

Conclusion

A callable is a generalization of a function. A callable can be:

Lambdas are nifty and succinct: we can create closures with little writing, and with less room for mistakes (lambdas are less error-prone).

Quiz