The Curious Case Of Inclusive Ranges

I stumbled upon this reddit post recently, which at first glance, was complaining about the performance discrepancy between a while and a for loop, but the true answer is within the replies. TL;DR: using inclusive ranges in Rust lead to suboptimal code!

Have a look at the docs of RangeIteratorImpl and RangeInclusiveIteratorImpl:

impl<A: Step> RangeIteratorImpl for ops::Range { type Item = A;

#[inline]
default fn spec_next(&mut self) -> Option<A> {
    if self.start < self.end {
        let n =
            Step::forward_checked(self.start.clone(), 1).expect("`Step` invariants not upheld");
        Some(mem::replace(&mut self.start, n))
    } else {
        None
    }
}

...

[`RangeInclusiveIteratorImpl`]:

```rust
impl<A: Step> RangeInclusiveIteratorImpl for ops::RangeInclusive<A> {
    type Item = A;

    #[inline]
    default fn spec_next(&mut self) -> Option<A> {
        if self.is_empty() {
            return None;
        }
        let is_iterating = self.start < self.end;
        Some(if is_iterating {
            let n =
                Step::forward_checked(self.start.clone(), 1).expect("`Step` invariants not upheld");
            mem::replace(&mut self.start, n)
        } else {
            self.exhausted = true;
            self.start.clone()
        })
    }

    ...

Obviously, the bottom version is more complex. Does that produce worse code though?

I tested it out with this simple program (code here):

use std::time;

fn for_inclusive() {
    let mut num: u16 = 0;
    for i in 0..=(u16::MAX - 1) {
        num += i;
    }
}

fn for_exclusive() {
    let mut num: u16 = 0;
    for i in 0..u16::MAX {
        num += i;
    }
}

fn main() {
    println!("Running for 5 iterations");
    let start = time::Instant::now();
    for _ in 0..5 {
        // Uncomment the desired function to test
        for_exclusive();
        //for_inclusive();
    }
    let end = start.elapsed();
    println!("Done: {:?}", end);
}

And here are the results:

# for_inclusive()
$ cargo run --release
    Finished release [optimized] target(s) in 0.15s
     Running `target/release/iter`
Running for 5 iterations
Done: 49.03µs

# for_exclusive()
$ cargo run --release
    Finished release [optimized] target(s) in 0.00s
     Running `target/release/iter`
Running for 5 iterations
Done: 74ns

As it turns out, the loop using the inclusive range finished in some microseconds, while the loop using the inclusive range finshed in mere nanoseconds, a few factors faster than the inclusive range!

But why?

Playground

Plugging the above loops into the Rust playground gives us a better idea of what's happening. You can see that the HIR and MIR generated by the inclusive range is worse than the HIR/MIR generated by the exclusive range.

Godbolt

I plugged in the same functions into godbolt, on -C opt-level=1 and the results are telling. There's far more assembly generated by the inclusive range rather than the exclusive range.

If you look at the definition of the RangeInclusiveIteratorImpl from above, you might notice that the RangeInclusive struct contains an exhausted boolean - which deals with the equality check at the end of the range - and this is perhaps the reason why there's so much extra code generated by the inclusive range.

With -C opt-level=3. The results are even more interesting - because the functions aren't doing anything, you'd think that the functions could be optimized away completely - which is what happens for the exclusive range code - but not for the inclusive range code! The exclusive range code returns immediately and we are done, but the inclusive range code still generates assembly to deal with the equality check at the end of the range.

Takeaways

There isn't any real takeaway here other than to prefer using Range rather than RangeInclusive as much as possible. Of course, perhaps where a loop ends can perhaps be more understandable with the inclusive range notation, but that tradeoff would be for you to decide.