C++ exercise: Folding range adaptor#
C++20 has a lot of nice features.
I've still barely explored the main ones, but I've already gotten great use out of concepts and even more out of ranges.
There's something quite beautiful about lazy for
-range loops that just inline themselves into the equivalent, clunkier imperative code that feels tiresome before you even start typing it out.
(I think the keyword is “functional”.)
Now, there's certainly something less beautiful about how complicated C++ makes this (and everything else). Rest assured that I went through funny-looking internal template nonsense errors and standard specifications while doing this little exercise. These along with design issues have been discussed before (you've likely read Jonathan Boccara's The Surprising Limitations of C++ Ranges Beyond Trivial Cases), but I don't have the expertise nor the incentive to join this discussion, so I'm going to leave that away.
The way this exercise came up is from a binary analysis project of mine. When analyzing a binary, I have a number of objects (mostly functions) that each occupy an interval of the address space, sometimes intersecting each other. On multiple occasions I needed an iteration principle that yields of all the uniform segments, which requires me to iterate the sequence while keeping track of which interval is active at each point.
I'll go back to the modeling of this problem at the end; its solution is rather specific. Instead we can start with a similar and slightly simpler structure: a range adaptor iterfold()
that returns the successive partial folds of an input range over a given function. So let's make that.
1 2 3 4 |
|
Source for testing: https://gist.github.com/lephe/df76db492fe8a1c3afcdeb3a2b3b86c3
Disclaimer: I'm not a C++ wizard, but will happily incorporate suggestions if any get emailed to me.
Introduction#
This “fold” operation I'm talking about is a well-known traversal function from the functional world, but is also called “reduce” in the map-reduce paradigm. It computes the “accumulation” of the elements in a range through a custom function, which we could do naively like this:
1 2 3 4 5 6 7 8 |
|
(Note that I'm using abbreviated function template syntax, so in this case fold
is a template with three parameters: the type of r
, which must be an input_range
, the type of acc
, and the type of f
.)
The function is available in C++23 as std::ranges::fold_left
, but what I really want is to get the partial folds (i.e. all the intermediate values of acc
) out of a lazy range.
This will require some custom code.
“Smart iterator” adaptor#
Since ranges are built on top of iterators, I'll write the folding logic in an iterator that computes the next value of acc
when advanced.
Let's start with the iterator structure and its internal state:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
The smart iterator stores the underlying iterator it
, the current accumulated value acc
which is returned when we access the iterator with operator*
, and the folding function f
—so far so good.
Slightly annoying however, is the fact that it also has to store a sentinel (or end iterator, but we'll see why a sentinel soon).
At a surface level, this is because there is no nice way to do the ending bounds check automatically.
To see this, let's rewrite the naive fold()
function in iterator style and plug in some user code at each iteration, similar to the use pattern that we want for the iterfold()
range adaptor.
1 2 3 4 |
|
Notice the order of operations on it
:
- If not the first iteration,
++it
. - If
it
is the end iterator, stop. - Otherwise, use
*it
to update the accumulator.
It's obvious that we have to bounds-check the incremented it
before we can access it.
But then, there's no solid option for when to update the accumulator in our smart adaptor, whose iteration will look like this:
1 2 |
|
There are two options for where to update acc = f(acc, *it)
, both unsatisfying:
- If we do it in
++ad
, we'll be reading the newly-incrementedit
without bounds-checking it, and will access the past-the-end iterator (going in the order 1, 3, 2). - If we do it in
*ad
, we might end up callingf
multiple times (the user can refer to*ad
multiple times), which is annoying in general, and uniquely incorrect here as updatingacc
multiple times will yield the wrong results.
This conundrum comes up frequently with smart iterators and has been called the terrible problem of incrementing a smart iterator, which sums it up pretty well.
I elected to go with option #1 but add a bounds check based on a pre-determined sentinel, which gives me the following implementation of operator++()
.
1 2 3 4 5 6 7 8 9 10 11 12 |
|
Now if you've evolved the off-by-one diagnosis co-brain like the C programmer that I am, you've been counting.
For an underlying range with N elements, whose end we reach in N calls to operator++()
, there are N partial folds, not counting the initial value, and therefore there must be N updates to acc
.
The bounds check will obviously disable the update exactly once, when we reach the end of the underlying range, and therefore we are left with only N-1 updates.
Which means we're missing one somewhere.
Predictably, if we're removing one at the end of the iteration then we should probably be adding it back at the beginning. And indeed, as implicitly specified by my introductory example, we consume the initial element of the underlying range before generating the first partial fold in the series. So we must also advance in the constructor:
1 2 3 4 5 6 |
|
The last thing we need is a sentinel for the adaptor and a way to compare against it; I'm fine with not having a common range so I'll repurpose the underlying iterator's sentinel and provide equality comparison against S
.
1 2 3 4 |
|
And that's the entirety of iterfold_adaptor
structure with the folding logic.
All that's left is specifying its iterator_traits
and we'll fulfill the input iterator concept.
1 2 3 4 5 6 7 8 9 10 11 |
|
Unfortunately, to the best of my knowledge C++'s type system is completely incapable of checking concepts on parameterized definitions(1), so the best I know is to test that on a sample representative type.
1 2 3 |
|
At this point the core already works, and we can write out a full iteration:
1 2 3 |
|
Now we get to enjoy lifting it into nicer C++20 abstractions.
Building up to a range, view, and adaptor closure#
Making a range out of the smart adaptor isn't difficult; all we have to do is build a structure inheriting from view_interface
which returns the smart adaptor through begin()
and its sentinel through end()
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
And... for once, it's that easy. We already fulfill the concepts, and we can use the range in a for
-range loop.
1 2 3 4 5 6 7 |
|
The next thing we can do is provide a view-like function that takes a range as input instead of the iterator pair.
Abbreviated function templates really shine for this one.
We also get extra brevity because the way I declared v
gives me an std::initializer_list
which is already a range.
1 2 3 4 5 6 7 |
|
And finally, since C++23 we can also define our own range adaptor closures, which is a pretty convoluted way to say custom pipeline-style adaptors.
You'll have noticed that standard adaptors like std::views::transform
are called without a range argument, since it's provided with the pipeline syntax:
1 |
|
The right-hand side of that pipe is a utility object that just holds the adaptor's parameters (here the transform function) until the operator|
gets expanded to also supply it with the input range (here v
).
So it's essentially a partial application of the adaptor construction function, i.e. a closure, hence the name.
C++20 provided no tools for defining such closures, but C++23 does. Credit to Jonas Greitemann's article The C++ range adaptor pipe operator is associative for showing me how to do this, as the paper introducing this feature was not quite as enlightening to my poor brain.
1 2 3 4 5 6 7 8 9 10 11 12 |
|
It's really simple once you see it; the closure is just there to hold the parameters in-between being constructed and having its operator()
called with the left-hand side of the pipe.
range_adaptor_closure
is a CRTP structure and it just handles everything.
And now, behold!
1 2 3 |
|
Yes you can chain it and mix it with standard adaptors. ;)
1 2 3 4 5 |
|
g++ gives me maybe-uninitialized
about the first iterfold
's sentinel in a call to the filter's operator++()
, but I believe it's spurious, as that sentinel is initialized as soon as the underlying range is constructed.
Neither clang nor AddressSanitizer complain.
Comparison with a C++23 generator#
With C++23 generators, writing this whole thing becomes laughably easy.
1 2 3 4 5 6 7 8 9 10 11 |
|
However, as of now (I'm not sure whether that might be an early implementation) the code generated for this version is very unconvincing.
The range adaptor closure gives you super clean assembler code (g++ 14.1 -O3
) that has all the lazy evaluation I hoped I'd get.
1 2 3 4 5 6 |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
But the generator version is, erm... slightly longer: https://godbolt.org/z/6aj9qfE5P. I reckon there is no need to benchmark this comparison.
Solving my original problem#
So back to my binary analysis; I have objects that are overlapping intervals in the address space, like so:
This is stored in a map:
1 2 3 4 5 |
|
And I want to iterate on all the uniform segments, i.e. stop at each of the vertical dotted lines while keeping track of which interval is open or not. To keep things simple at addresses where multiple intervals start/end (such as 16 in the example), I'll sweep from left to right and generate a value for every “change”. For instance, the iteration between addresses 9 and 22 will be:
Address | What | Active intervals |
---|---|---|
9 | Interval [9..11] starts | [9..11] |
11 | Interval [9..11] ends | (none) |
13 | Interval [13..16] starts | [13..16] |
16 | Interval [13..16] ends | (none) |
16 | Interval [16..22] starts | [16..22] |
18 | Interval [18..20] starts | [16..22], [18..20] |
20 | Interval [18..20] ends | [16..22] |
22 | Interval [16..22] ends | (none) |
The main piece of work is identifying these segments in the right order. Notice that there are two changes for each interval (one at the start and one at the end), so we're going to yield two changes for each read from the map iterator. However, this is not a ”one read, two writes” pattern: whenever there are intersections, we need to remember which intervals are open so we can go through their endpoints in the right order.
Since the map iterator gives us intervals by increasing start address but not by increasing end address, the responsibility for tracking that falls to the iterator. Here's an example at address 45, just after generating the change that opens interval [45..53]:
The intervals [40..50], [42..48] and [45..53] are all open at that time, so we need to remember them in a queue sorted by increasing end addresses. Then, to find out what the next change is, we just need to check when the next interval to open does so (from the map iterator), when the next interval to end does so (from the queue), and pick whichever comes first.(2)
There's just one subtlety, and it's again related to the terrible problem of incrementing a smart iterator, i.e. the fact that operator++()
and operator*()
are separate (unlike e.g. in a generator).
I want to expose a view of the queue to the user to indicate which intervals are currently open, as per the “Active intervals” column of the earlier table.
However, this column gives you the state after the change, whereas a direct implementation of the concept would give you the state before the change.
As a result, I'm going to eagerly precompute one step in advance in a cached attribute called next
.
Here's the logic for that:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
|
With this kind of setup, compute_next_change()
does one round of what a generator for this function would look like: advancing and producing the next value at the same time.
The value type is a row of the table, with the vector being exposed directly via a pointer (not the cleanest, but fine).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
|
It's quite a mouthful, but the key idea is finding out whether we open the next interval from it
before we end the next interval from queue
, or the other way around.
The condition has_option_1 && (!has_option_2 || option_1_better_than_2)
is typical of this “better of two options” process.
You can find it in classical algorithms such as the merge step of merge sort.
Once we know which event happens first, we can build the associated value of next
, and then update internal state.
Newly-opened intervals are inserted sorted in the queue with a round of insertion sort, while newly-closed intervals are popped from the queue.
Since we precompute one step in advance, comparing against the sentinel requires us to check not just if we've reached the end but also if we've consumed the last element.
And with this, we have a working iterator.
1 2 3 4 5 6 7 8 9 |
|
1 2 3 4 5 6 7 8 |
|
Since this is not an adaptor there's no need to go overboard with abstractions, but we can make it into a nice range at least.
1 2 3 4 5 6 7 8 9 |
|
I'm pretty happy with this result.
Conclusion#
The ability to write lazy iteration principles within the standard is extremely satisfying.
However, it's still a bit tedious to get there, especially because of the separation of operator++()
and operator*()
which doesn't make much sense for forward ranges, i.e. the most common type.
I hope that inter-procedural optimizations for generators will catch up in the future and inline the logic and storage for generators that clearly live in one function's context (e.g. those constructed and exhausted in for
-range loops).
Such optimizations would make iterfold()
and iterintervals()
trivially easy to write with basically no performance penalty.
The source code for iterfold()
can be found at https://gist.github.com/lephe/df76db492fe8a1c3afcdeb3a2b3b86c3.
-
Think typeclasses in dependently-typed languages... though the approach is completely different. Typeclass resolution looks more like a Prolog query, with varying degrees of constraints depending on the expressiveness of type system at hand. Anyway, these functional paradigms mix with overloading and specialization about as well as oil with an actual fire, so I'm not surprised. ↩
-
The worst-case complexity for handling the queue is quadratic for arbitrary inputs but stays linear if the number of intersecting intervals at any given point is bounded. ↩