C++ exercise: Folding range adaptor#

C++20 has a lot of nice features. I've still barely explored the main ones, but I've already gotten great use out of concepts and even more out of ranges. There's something quite beautiful about lazy for-range loops that just inline themselves into the equivalent, clunkier imperative code that feels tiresome before you even start typing it out. (I think the keyword is “functional”.)

Now, there's certainly something less beautiful about how complicated C++ makes this (and everything else). Rest assured that I went through funny-looking internal template nonsense errors and standard specifications while doing this little exercise. These along with design issues have been discussed before (you've likely read Jonathan Boccara's The Surprising Limitations of C++ Ranges Beyond Trivial Cases), but I don't have the expertise nor the incentive to join this discussion, so I'm going to leave that away.

The way this exercise came up is from a binary analysis project of mine. When analyzing a binary, I have a number of objects (mostly functions) that each occupy an interval of the address space, sometimes intersecting each other. On multiple occasions I needed an iteration principle that yields of all the uniform segments, which requires me to iterate the sequence while keeping track of which interval is active at each point.

I'll go back to the modeling of this problem at the end; its solution is rather specific. Instead we can start with a similar and slightly simpler structure: a range adaptor iterfold() that returns the successive partial folds of an input range over a given function. So let's make that.

auto const v = {1, 2, 3, 4, 5};
for(int i: v | iterfold(0, std::plus<int>()))
    std::print("{}\n", i);
// -> 1, 3, 6, 10, 15

Source for testing: https://gist.github.com/lephe/df76db492fe8a1c3afcdeb3a2b3b86c3

Disclaimer: I'm not a C++ wizard, but will happily incorporate suggestions if any get emailed to me.

Contents:

Introduction
“Smart iterator” adaptor
Building up to a range, view, and adaptor closure
Comparison with a C++23 generator
Solving my original problem
Conclusion

Introduction#

This “fold” operation I'm talking about is a well-known traversal function from the functional world, but is also called “reduce” in the map-reduce paradigm. It computes the “accumulation” of the elements in a range through a custom function, which we could do naively like this:

auto fold(std::ranges::input_range auto r, auto acc, auto f)
{
    for(auto value: r)
        acc = f(acc, value);
    return acc;
}

fold(v, 0, std::plus<int>()); // -> 15

(Note that I'm using abbreviated function template syntax, so in this case fold is a template with three parameters: the type of r, which must be an input_range, the type of acc, and the type of f.)

The function is available in C++23 as std::ranges::fold_left, but what I really want is to get the partial folds (i.e. all the intermediate values of acc) out of a lazy range. This will require some custom code.

“Smart iterator” adaptor#

Since ranges are built on top of iterators, I'll write the folding logic in an iterator that computes the next value of acc when advanced. Let's start with the iterator structure and its internal state:

/* I: Underlying iterator
   S: Sentinel for underlying iterator (same as I for usual iterators)
   T: Type of accumulated value
   F: Type of folding function, morally T(T, iterator_traits<I>::value_type) */
template<std::input_iterator I, typename S, typename T, typename F>
struct iterfold_iterator {
    T const &operator*() const {
        return acc;
    }

private:
    I it;
    T acc;
    F f;
public:
    S sentinel;
};

The smart iterator stores the underlying iterator it, the current accumulated value acc which is returned when we access the iterator with operator*, and the folding function f—so far so good.

Slightly annoying however, is the fact that it also has to store a sentinel (or end iterator, but we'll see why a sentinel soon). At a surface level, this is because there is no nice way to do the ending bounds check automatically. To see this, let's rewrite the naive fold() function in iterator style and plug in some user code at each iteration, similar to the use pattern that we want for the iterfold() range adaptor.

for(auto it = std::ranges::begin(r); it != std::ranges::end(r); ++it) {
    acc = f(acc, *it);
    use(acc);
}

Notice the order of operations on it:

If not the first iteration, ++it.
If it is the end iterator, stop.
Otherwise, use *it to update the accumulator.

It's obvious that we have to bounds-check the incremented it before we can access it. But then, there's no solid option for when to update the accumulator in our smart adaptor, whose iteration will look like this:

for(auto ad = /* ... */; ad != /* ... */; ++ad)
    use(*ad);

There are two options for where to update acc = f(acc, *it), both unsatisfying:

If we do it in ++ad, we'll be reading the newly-incremented it without bounds-checking it, and will access the past-the-end iterator (going in the order 1, 3, 2).
If we do it in *ad, we might end up calling f multiple times (the user can refer to *ad multiple times), which is annoying in general, and uniquely incorrect here as updating acc multiple times will yield the wrong results.

This conundrum comes up frequently with smart iterators and has been called the terrible problem of incrementing a smart iterator, which sums it up pretty well.

I elected to go with option #1 but add a bounds check based on a pre-determined sentinel, which gives me the following implementation of operator++().

    /* ... inside struct iterfold_iterator ... */
    constexpr iterfold_iterator &operator++() {
        ++it;
        if(it != sentinel)
            acc = f(acc, *it);
        return *this;
    }
    constexpr iterfold_iterator operator++(int) {
        iterfold_iterator tmp(*this);
        operator++();
        return tmp;
    }

Now if you've evolved the off-by-one diagnosis co-brain like the C programmer that I am, you've been counting. For an underlying range with N elements, whose end we reach in N calls to operator++(), there are N partial folds, not counting the initial value, and therefore there must be N updates to acc. The bounds check will obviously disable the update exactly once, when we reach the end of the underlying range, and therefore we are left with only N-1 updates. Which means we're missing one somewhere.

Predictably, if we're removing one at the end of the iteration then we should probably be adding it back at the beginning. And indeed, as implicitly specified by my introductory example, we consume the initial element of the underlying range before generating the first partial fold in the series. So we must also advance in the constructor:

    /* ... inside struct iterfold_iterator ... */
    constexpr iterfold_iterator(I it, S sentinel, T init, F f):
        it {it}, acc {init}, f {f}, sentinel {sentinel} {
        if(it != sentinel)
            acc = f(acc, *it);
    }

The last thing we need is a sentinel for the adaptor and a way to compare against it; I'm fine with not having a common range so I'll repurpose the underlying iterator's sentinel and provide equality comparison against S.

    /* ... inside struct iterfold_iterator ... */
    bool operator==(S const &other) const {
        return it == other;
    }

And that's the entirety of iterfold_adaptor structure with the folding logic. All that's left is specifying its iterator_traits and we'll fulfill the input iterator concept.

template<std::input_iterator I, typename S, typename T, typename F>
struct std::iterator_traits<iterfold_iterator<I, S, T, F>> {
    /* Distance is measured using the original iterator */
    using difference_type = std::iterator_traits<I>::difference_type;
    /* But values are of type T */
    using value_type = T;
    using pointer = T *;
    using reference = T &;
    /* This is exclusively an input iterator */
    using iterator_category = std::input_iterator_tag;
};

Unfortunately, to the best of my knowledge C++'s type system is completely incapable of checking concepts on parameterized definitions⁽¹⁾, so the best I know is to test that on a sample representative type.

using I = std::vector<int>::iterator;
using FI = iterfold_iterator<I, I, double, std::function<double(double,int)>>;
static_assert(std::input_iterator<FI>); // OK

At this point the core already works, and we can write out a full iteration:

for(auto ad = iterfold_iterator(v.begin(), v.end(), 0, std::plus<int>()); ad != ad.sentinel; ++ad)
    print("{}\n", *ad);
// -> 1, 3, 6, 10, 15

Now we get to enjoy lifting it into nicer C++20 abstractions.

Building up to a range, view, and adaptor closure#

Making a range out of the smart adaptor isn't difficult; all we have to do is build a structure inheriting from view_interface which returns the smart adaptor through begin() and its sentinel through end().

template<std::input_iterator I, typename S, typename T, typename F>
struct iterfold_range:
    public std::ranges::view_interface<iterfold_range<I, S, T, F>> {
    iterfold_range(I it, S sentinel, T init, F f): adaptor {it, sentinel, init, f} {}

    auto begin() const {
        return adaptor;
    }
    S end() const {
        return adaptor.sentinel;
    }

private:
    iterfold_iterator<I, S, T, F> adaptor;
};

And... for once, it's that easy. We already fulfill the concepts, and we can use the range in a for-range loop.

using FR = iterfold_range<I, I, double, std::function<double(double,int)>>;
static_assert(std::ranges::input_range<FR>); // OK
static_assert(std::ranges::viewable_range<FR>); // OK

for(int n: iterfold_range(v.begin(), v.end(), 0, std::plus<int>()))
    print("{}\n", n);
// -> 1, 3, 6, 10, 15

The next thing we can do is provide a view-like function that takes a range as input instead of the iterator pair. Abbreviated function templates really shine for this one. We also get extra brevity because the way I declared v gives me an std::initializer_list which is already a range.

auto iterfold_view(std::ranges::input_range auto r, auto init, auto f) {
    return iterfold_range(std::ranges::begin(r), std::ranges::end(r), init, f);
}

for(int n: iterfold_view(v, 0, std::plus<int>()))
    print("{}\n", n);
// -> 1, 3, 6, 10, 15

And finally, since C++23 we can also define our own range adaptor closures, which is a pretty convoluted way to say custom pipeline-style adaptors. You'll have noticed that standard adaptors like std::views::transform are called without a range argument, since it's provided with the pipeline syntax:

v | std::views::transform([](int n){ return 2 * n; })

The right-hand side of that pipe is a utility object that just holds the adaptor's parameters (here the transform function) until the operator| gets expanded to also supply it with the input range (here v). So it's essentially a partial application of the adaptor construction function, i.e. a closure, hence the name.

C++20 provided no tools for defining such closures, but C++23 does. Credit to Jonas Greitemann's article The C++ range adaptor pipe operator is associative for showing me how to do this, as the paper introducing this feature was not quite as enlightening to my poor brain.

template<typename T, typename F>
struct iterfold: std::ranges::range_adaptor_closure<iterfold<T, F>> {
    iterfold(T init, F f): init {init}, f {f} {}

    constexpr auto operator()(std::ranges::input_range auto &&r) const {
        return iterfold_view(r, init, f);
    }

private:
    T init;
    F f;
};

It's really simple once you see it; the closure is just there to hold the parameters in-between being constructed and having its operator() called with the left-hand side of the pipe. range_adaptor_closure is a CRTP structure and it just handles everything.

And now, behold!

for(int i: v | iterfold(0, std::plus<int>()))
    print("{}\n", i);
// -> 1, 3, 6, 10, 15

Yes you can chain it and mix it with standard adaptors. ;)

for(int i: v | iterfold(0, std::plus<int>())
             | std::views::filter([](int n){ return n % 2 == 1; })
             | iterfold(0, std::plus<int>()))
    print("{}\n", i);
// -> 1, 4, 19

g++ gives me maybe-uninitialized about the first iterfold's sentinel in a call to the filter's operator++(), but I believe it's spurious, as that sentinel is initialized as soon as the underlying range is constructed. Neither clang nor AddressSanitizer complain.

Comparison with a C++23 generator#

With C++23 generators, writing this whole thing becomes laughably easy.

#include <generator>

template<typename T>
std::generator<T> iterfold_generator(std::ranges::range auto r, T acc, auto f)
{
    for(auto value: r)
        co_yield (acc = f(acc, value));
}

for(int i: iterfold_generator(v, 0, std::plus<int>()))
    use(i);

However, as of now (I'm not sure whether that might be an early implementation) the code generated for this version is very unconvincing. The range adaptor closure gives you super clean assembler code (g++ 14.1 -O3) that has all the lazy evaluation I hoped I'd get.

void asm_adaptor(void)
{
    auto const v = {1, 2, 3, 4, 5};
    for(int i: v | iterfold(0, std::plus<int>()))
        use(i);
}

asm_adaptor():
        xor     eax, eax
        mov     edx, 1
        jmp     .L45
.L46:
        add     edx, DWORD PTR C.0.0[rax]
.L45:
        mov     edi, edx
        add     rax, 4
        call    use(int)
        cmp     rax, 20
        jne     .L46
        ret
C.0.0:
        .long   1, 2, 3, 4, 5

But the generator version is, erm... slightly longer: https://godbolt.org/z/6aj9qfE5P. I reckon there is no need to benchmark this comparison.

Solving my original problem#

So back to my binary analysis; I have objects that are overlapping intervals in the address space, like so:

Example set of overlapping intervals and the associated uniform segments.

This is stored in a map:

struct Interval {
    int start, length;
    int end() const { return start + length; }
};
using IntervalMap = std::map<int, Interval>;

And I want to iterate on all the uniform segments, i.e. stop at each of the vertical dotted lines while keeping track of which interval is open or not. To keep things simple at addresses where multiple intervals start/end (such as 16 in the example), I'll sweep from left to right and generate a value for every “change”. For instance, the iteration between addresses 9 and 22 will be:

Address	What	Active intervals
9	Interval [9..11] starts	[9..11]
11	Interval [9..11] ends	(none)
13	Interval [13..16] starts	[13..16]
16	Interval [13..16] ends	(none)
16	Interval [16..22] starts	[16..22]
18	Interval [18..20] starts	[16..22], [18..20]
20	Interval [18..20] ends	[16..22]
22	Interval [16..22] ends	(none)

The main piece of work is identifying these segments in the right order. Notice that there are two changes for each interval (one at the start and one at the end), so we're going to yield two changes for each read from the map iterator. However, this is not a ”one read, two writes” pattern: whenever there are intersections, we need to remember which intervals are open so we can go through their endpoints in the right order.

Since the map iterator gives us intervals by increasing start address but not by increasing end address, the responsibility for tracking that falls to the iterator. Here's an example at address 45, just after generating the change that opens interval [45..53]:

Queuing seen intervals in increasing order of end addresses.

The intervals [40..50], [42..48] and [45..53] are all open at that time, so we need to remember them in a queue sorted by increasing end addresses. Then, to find out what the next change is, we just need to check when the next interval to open does so (from the map iterator), when the next interval to end does so (from the queue), and pick whichever comes first.⁽²⁾

There's just one subtlety, and it's again related to the terrible problem of incrementing a smart iterator, i.e. the fact that operator++() and operator*() are separate (unlike e.g. in a generator). I want to expose a view of the queue to the user to indicate which intervals are currently open, as per the “Active intervals” column of the earlier table. However, this column gives you the state after the change, whereas a direct implementation of the concept would give you the state before the change.

As a result, I'm going to eagerly precompute one step in advance in a cached attribute called next. Here's the logic for that:

struct iterintervals_iterator {
    iterintervals_iterator(IntervalMap &IM): it {IM.begin()}, end {IM.end()} {
        compute_next_change();
    }

    constexpr iterintervals_iterator &operator++() {
        compute_next_change();
        return *this;
    }
    iterintervals_iterator operator++(int) {
        iterintervals_iterator tmp(*this);
        operator++();
        return tmp;
    }

    value_type operator*() const {
        return next;
    }
    value_type const *operator->() const {
        return &next;
    }

private:
    IntervalMap::iterator it;
    std::vector<Interval *> queue;
    value_type next;
public:
    IntervalMap::iterator end;
};

With this kind of setup, compute_next_change() does one round of what a generator for this function would look like: advancing and producing the next value at the same time. The value type is a row of the table, with the vector being exposed directly via a pointer (not the cleanest, but fine).

    /* ... within struct iterintervals_iterator ... */
    struct value_type {
        int address;
        bool added;
        Interval const *interval;
        std::vector<Interval *> const *active;
    };

    void compute_next_change() {
        /* Select the next opener or the next closer, whichever comes first */
        bool has_opener = (it != end);
        bool has_closer = !queue.empty();

        if(!has_opener && !has_closer)
            next = {-1, false, nullptr, &queue};
        else if(has_opener && (!has_closer || it->first < queue.back()->end())) {
            next = {it->second.start, true, &it->second, &queue};
            queue.push_back(&it->second);
            for(int i = queue.size() - 1; i > 0 && queue[i]->end() > queue[i-1]->end(); i--)
                std::swap(queue[i], queue[i-1]);
            ++it;
        }
        else {
            next = {queue.back()->end(), false, queue.back(), &queue};
            queue.pop_back();
        }
    }

    bool operator==(IntervalMap::iterator const &other) const {
        return it == other && next.interval == nullptr;
    }

It's quite a mouthful, but the key idea is finding out whether we open the next interval from it before we end the next interval from queue, or the other way around. The condition has_option_1 && (!has_option_2 || option_1_better_than_2) is typical of this “better of two options” process. You can find it in classical algorithms such as the merge step of merge sort.

Once we know which event happens first, we can build the associated value of next, and then update internal state. Newly-opened intervals are inserted sorted in the queue with a round of insertion sort, while newly-closed intervals are popped from the queue.

Since we precompute one step in advance, comparing against the sentinel requires us to check not just if we've reached the end but also if we've consumed the last element.

And with this, we have a working iterator.

IntervalMap IM { {9, {9,2}}, {13, {13,3}}, {16, {16,6}}, {18, {18,2}} };

for(auto it = iterintervals_iterator(IM); it != IM.end(); ++it) {
    print("@{:02d}: {} [{:2d}..{:2d}]   //", it->address,
        it->added ? "+" : "-", it->interval->start, it->interval->end());
    for(Interval *intv: *it->active)
        print(" [{:2d}..{:2d}]", intv->start, intv->end());
    print("\n");
}

@09: + [ 9..11]   // [ 9..11]
@11: - [ 9..11]   //
@13: + [13..16]   // [13..16]
@16: - [13..16]   //
@16: + [16..22]   // [16..22]
@18: + [18..20]   // [16..22] [18..20]
@20: - [18..20]   // [16..22]
@22: - [16..22]   //

Since this is not an adaptor there's no need to go overboard with abstractions, but we can make it into a nice range at least.

struct iterintervals: public std::ranges::view_interface<iterintervals> {
    iterintervals(IntervalMap &IM): iiv {IM} {}

    auto begin() const { return iiv; }
    auto end() const { return iiv.end; }

private:
    iterintervals_iterator iiv;
};

I'm pretty happy with this result.

Conclusion#

The ability to write lazy iteration principles within the standard is extremely satisfying. However, it's still a bit tedious to get there, especially because of the separation of operator++() and operator*() which doesn't make much sense for forward ranges, i.e. the most common type.

I hope that inter-procedural optimizations for generators will catch up in the future and inline the logic and storage for generators that clearly live in one function's context (e.g. those constructed and exhausted in for-range loops). Such optimizations would make iterfold() and iterintervals() trivially easy to write with basically no performance penalty.

The source code for iterfold() can be found at https://gist.github.com/lephe/df76db492fe8a1c3afcdeb3a2b3b86c3.

Think typeclasses in dependently-typed languages... though the approach is completely different. Typeclass resolution looks more like a Prolog query, with varying degrees of constraints depending on the expressiveness of type system at hand. Anyway, these functional paradigms mix with overloading and specialization about as well as oil with an actual fire, so I'm not surprised. ↩
The worst-case complexity for handling the queue is quadratic for arbitrary inputs but stays linear if the number of intersecting intervals at any given point is bounded. ↩