Daniel Micay | 6 Jun 06:09 2013
Picon

The future of iterators in Rust

A quick terminology refresher, for those who aren't familiar with it:

* Internal iterator: takes a closure, runs the closure until it asks to break
* External iterator: state machine, advanced by the caller in a loop

To a caller, external iterators provide the most functionality, because they
can be used as an internal iterator. You lose the state of an internal iterator
by breaking out of iterator, so generic algorithms like zip, union, intersect
and merge can't be implemented for a pair of iterators.

# Issues with internal iterators in Rust

A few months ago, we only had internal iterators and there were no generic
algorithms to use with any iterator - only with BaseIter's `each` method.

Rust's internal iterators implement the protocol encoded in Rust's for
statement, but it's not possible to give them all a common trait or implement
generic methods or functions taking any internal iterator.

As a workaround, we can write algorithms assuming internal iterators only take
one argument (the closure):

    fn count<T>(f: &fn(fn(T) -> bool) -> bool) -> uint

The caller has to use a partial function to call these adaptors, for which we
lack sugar:

    count(|f| uint::range(0, 10, f))

For simple functions, this is fairly reasonable once you're used to it. It
quickly gets out of control though, even for a simple function like filter:

    filter<T>(pred: &fn(&T) -> bool, input: &fn(&fn(T) -> bool) -> bool, output: &fn(T) -> bool) -> bool {}

Sadly, `filter_ref` is also needed to work around closures behaving badly with
lifetimes. An example of the problem with `fold_ref`:

    fn product<T: One + Mul<T, T>>(iter: &fn(f: &fn(&T) -> bool) -> bool) -> T {
        fold_ref(One::one::<T>(), iter, |a, x| *a = a.mul(x))
    }

Since `product` expects an iterator yielding `&T` (a borrowed pointer in any
region), it won't work with `fold` because that requires the borrowed pointer
to have the same type (and thus lifetime) for every iteration like `&'a int`.

This issue with borrowed pointers was blocking me from replacing the existing
algorithms reimplemented for both `str` and `vec` with the generic ones.

Chaining together iteration algorithms is a common use case, but even chaining
two together is confusing at first:

    to_vec(|g| filter(|&x| *x < 3, |f| xs.each(f), g)

Another more alarming issue is that with internal iterators, the `break` and
`return` statements don't always work in a `for` loop. If the iterator isn't
implemented correctly, the loop will keep going.

This also has borrow checking implications, because the compiler can't assume
those statements actually cause flow control to leave the loop immediately.

# External iterators

Based on the above, you might think generic iteration algorithms in Rust are a
bleak prospect. However, we already have a nice external iterator library, and
they don't suffer from the above issues.

All kinds of external iterators implement the following trait, whether they are
a fibonacci number generator, a reverse iterator over a vector or iterator over
a range in a sorted set:

    pub trait Iterator<A> {
        /// Advance the iterator and return the next value. Return `None` when the end is reached.
        fn next(&mut self) -> Option<A>;
    }

Generic adaptors are implemented on `Iterator`, and many of them are `Iterator`
implementations themselves:

    use std::iterator::*;

    fn main() {
        let mut it = Counter::new(0.0, 1.0)
            .take_while(|x| *x < 10000000.0)
            .transform(|x| x / 2.0)
            .transform(|x| x + 2.0);
        println(it.fold(0.0, |a, b| a + b).to_str())
    }

If you're curious, the optimized LLVM IR: http://ix.io/5Xl

Unlike internal iterators, external iterators only run one iteration at a time,
so a `for` loop designed for them would always be able to succeed with `break`
and `return`. It would also be able to avoid the ugly `advance` wrapper
currently required to use external iterators with `for`.

    // The current situation, wrapping an external iterator as an internal one
    //
    // Since the advance method is not known the be correct, borrow checking
    // still assumes `return` and `break` are imperfect.
    for xs.zip(ys).advance |x| { ... }

    // A hypothetical `for` loop using the `Iterator` trait
    for iterator |x| { ... }

    // It could also fall back to an `Iterable` trait and obtain an iterator
    for container |x| { ... }

External iterators also avoid the problems with references and closures,
because they simply return `T` rather than passing it to a closure.

# Why not just switch to external iterators?

Algorithms that can be represented easily without the call stack are as easy to
write as either an internal or external iterator.

However, without support for compiling a function to a state machine (C# does
this for yield), traversal of a recursive data structure has to be manually
translated to a procedural algorithm with an explicit stack. For complex data
structures, this process can be very difficult.

I'm hopeful Rust will gain support for this down the road after 1.0. If it
does, there will be no reason to write immutable internal iterators.

Another issue is mutability, as you can write iterators that are able to mutate
containers. With internal iterators, this is easy to do with safe code. With
external ones, it will `unsafe` and won't be easy to get right.

# Solution?

I don't have any proposal to completely solve this issue. :)

I think extending the built-in `for` loop to work with external iterators
should be considered, because right now the verbosity discourages using them
and makes borrow checking more painful than it has to be.

It could treat functions as internal iterators, and look for an `Iterator`
implementation (using a `lang` item) for external ones.

Python's `for` loop starts by looking for an iterator (a `__next__` method) and
falls back to an iterable (an `__iter__` method) so behaviour like this isn't
an alien concept.
A quick terminology refresher, for those who aren't familiar with it:

* Internal iterator: takes a closure, runs the closure until it asks to break
* External iterator: state machine, advanced by the caller in a loop

To a caller, external iterators provide the most functionality, because they
can be used as an internal iterator. You lose the state of an internal iterator
by breaking out of iterator, so generic algorithms like zip, union, intersect
and merge can't be implemented for a pair of iterators.

# Issues with internal iterators in Rust

A few months ago, we only had internal iterators and there were no generic
algorithms to use with any iterator - only with BaseIter's `each` method.

Rust's internal iterators implement the protocol encoded in Rust's for
statement, but it's not possible to give them all a common trait or implement
generic methods or functions taking any internal iterator.

As a workaround, we can write algorithms assuming internal iterators only take
one argument (the closure):

    fn count<T>(f: &fn(fn(T) -> bool) -> bool) -> uint

The caller has to use a partial function to call these adaptors, for which we
lack sugar:

    count(|f| uint::range(0, 10, f))

For simple functions, this is fairly reasonable once you're used to it. It
quickly gets out of control though, even for a simple function like filter:

    filter<T>(pred: &fn(&T) -> bool, input: &fn(&fn(T) -> bool) -> bool, output: &fn(T) -> bool) -> bool {}

Sadly, `filter_ref` is also needed to work around closures behaving badly with
lifetimes. An example of the problem with `fold_ref`:

    fn product<T: One + Mul<T, T>>(iter: &fn(f: &fn(&T) -> bool) -> bool) -> T {
        fold_ref(One::one::<T>(), iter, |a, x| *a = a.mul(x))
    }

Since `product` expects an iterator yielding `&T` (a borrowed pointer in any
region), it won't work with `fold` because that requires the borrowed pointer
to have the same type (and thus lifetime) for every iteration like `&'a int`.

This issue with borrowed pointers was blocking me from replacing the existing
algorithms reimplemented for both `str` and `vec` with the generic ones.

Chaining together iteration algorithms is a common use case, but even chaining
two together is confusing at first:

    to_vec(|g| filter(|&x| *x < 3, |f| xs.each(f), g)

Another more alarming issue is that with internal iterators, the `break` and
`return` statements don't always work in a `for` loop. If the iterator isn't
implemented correctly, the loop will keep going.

This also has borrow checking implications, because the compiler can't assume
those statements actually cause flow control to leave the loop immediately.

# External iterators

Based on the above, you might think generic iteration algorithms in Rust are a
bleak prospect. However, we already have a nice external iterator library, and
they don't suffer from the above issues.

All kinds of external iterators implement the following trait, whether they are
a fibonacci number generator, a reverse iterator over a vector or iterator over
a range in a sorted set:

    pub trait Iterator<A> {
        /// Advance the iterator and return the next value. Return `None` when the end is reached.
        fn next(&mut self) -> Option<A>;
    }

Generic adaptors are implemented on `Iterator`, and many of them are `Iterator`
implementations themselves:

    use std::iterator::*;

    fn main() {
        let mut it = Counter::new(0.0, 1.0)
            .take_while(|x| *x < 10000000.0)
            .transform(|x| x / 2.0)
            .transform(|x| x + 2.0);
        println(it.fold(0.0, |a, b| a + b).to_str())
    }

If you're curious, the optimized LLVM IR: http://ix.io/5Xl

Unlike internal iterators, external iterators only run one iteration at a time,
so a `for` loop designed for them would always be able to succeed with `break`
and `return`. It would also be able to avoid the ugly `advance` wrapper
currently required to use external iterators with `for`.

    // The current situation, wrapping an external iterator as an internal one
    //
    // Since the advance method is not known the be correct, borrow checking
    // still assumes `return` and `break` are imperfect.
    for xs.zip(ys).advance |x| { ... }

    // A hypothetical `for` loop using the `Iterator` trait
    for iterator |x| { ... }

    // It could also fall back to an `Iterable` trait and obtain an iterator
    for container |x| { ... }

External iterators also avoid the problems with references and closures,
because they simply return `T` rather than passing it to a closure.

# Why not just switch to external iterators?

Algorithms that can be represented easily without the call stack are as easy to
write as either an internal or external iterator.

However, without support for compiling a function to a state machine (C# does
this for yield), traversal of a recursive data structure has to be manually
translated to a procedural algorithm with an explicit stack. For complex data
structures, this process can be very difficult.

I'm hopeful Rust will gain support for this down the road after 1.0. If it
does, there will be no reason to write immutable internal iterators.

Another issue is mutability, as you can write iterators that are able to mutate
containers. With internal iterators, this is easy to do with safe code. With
external ones, it will `unsafe` and won't be easy to get right.

# Solution?

I don't have any proposal to completely solve this issue. :)

I think extending the built-in `for` loop to work with external iterators
should be considered, because right now the verbosity discourages using them
and makes borrow checking more painful than it has to be.

It could treat functions as internal iterators, and look for an `Iterator`
implementation (using a `lang` item) for external ones.

Python's `for` loop starts by looking for an iterator (a `__next__` method) and
falls back to an iterable (an `__iter__` method) so behaviour like this isn't
an alien concept.

Gmane