The Curious Case Of Inclusive Ranges
I stumbled upon this reddit post recently, which at first glance, was complaining about the performance discrepancy between a while and a for loop, but the true answer is within the replies. TL;DR: using inclusive ranges in Rust lead to suboptimal code!
Have a look at the docs of
RangeIteratorImpl
and
RangeInclusiveIteratorImpl
:
impl<A: Step> RangeIteratorImpl for ops::Range { type Item = A;
#[inline]
default fn spec_next(&mut self) -> Option<A> {
if self.start < self.end {
let n =
Step::forward_checked(self.start.clone(), 1).expect("`Step` invariants not upheld");
Some(mem::replace(&mut self.start, n))
} else {
None
}
}
...
[`RangeInclusiveIteratorImpl`]:
```rust
impl<A: Step> RangeInclusiveIteratorImpl for ops::RangeInclusive<A> {
type Item = A;
#[inline]
default fn spec_next(&mut self) -> Option<A> {
if self.is_empty() {
return None;
}
let is_iterating = self.start < self.end;
Some(if is_iterating {
let n =
Step::forward_checked(self.start.clone(), 1).expect("`Step` invariants not upheld");
mem::replace(&mut self.start, n)
} else {
self.exhausted = true;
self.start.clone()
})
}
...
Obviously, the bottom version is more complex. Does that produce worse code though?
I tested it out with this simple program (code here):
use std::time;
fn for_inclusive() {
let mut num: u16 = 0;
for i in 0..=(u16::MAX - 1) {
num += i;
}
}
fn for_exclusive() {
let mut num: u16 = 0;
for i in 0..u16::MAX {
num += i;
}
}
fn main() {
println!("Running for 5 iterations");
let start = time::Instant::now();
for _ in 0..5 {
// Uncomment the desired function to test
for_exclusive();
//for_inclusive();
}
let end = start.elapsed();
println!("Done: {:?}", end);
}
And here are the results:
# for_inclusive()
$ cargo run --release
Finished release [optimized] target(s) in 0.15s
Running `target/release/iter`
Running for 5 iterations
Done: 49.03µs
# for_exclusive()
$ cargo run --release
Finished release [optimized] target(s) in 0.00s
Running `target/release/iter`
Running for 5 iterations
Done: 74ns
As it turns out, the loop using the inclusive range finished in some microseconds, while the loop using the inclusive range finshed in mere nanoseconds, a few factors faster than the inclusive range!
But why?
Playground
Plugging the above loops into the Rust playground gives us a better idea of what's happening. You can see that the HIR and MIR generated by the inclusive range is worse than the HIR/MIR generated by the exclusive range.
Godbolt
I plugged in the same functions into godbolt, on
-C opt-level=1
and the results
are telling. There's far more assembly generated by the inclusive range
rather than the exclusive range.
If you look at the definition of the RangeInclusiveIteratorImpl
from
above, you might notice that the RangeInclusive
struct contains an
exhausted
boolean - which deals with the equality check at the end of
the range - and this is perhaps the reason why there's so much extra
code generated by the inclusive range.
With -C opt-level=3
. The
results are even more interesting - because the functions aren't doing
anything, you'd think that the functions could be optimized away
completely - which is what happens for the exclusive range code - but
not for the inclusive range code! The exclusive range code returns
immediately and we are done, but the inclusive range code still
generates assembly to deal with the equality check at the end of the
range.
Takeaways
There isn't any real takeaway here other than to prefer using Range
rather than RangeInclusive
as much as possible. Of course, perhaps
where a loop ends can perhaps be more understandable with the inclusive
range notation, but that tradeoff would be for you to decide.