RustBrock/Traits for Async.md
2025-03-31 14:55:57 -06:00


A Closer Look at the Traits for Async

A high-level understanding is fine for most day-to-day Rust writing.

Sometimes, though, you will encounter situations where you need to understand a few more of these details.

In this chapter we will dig in just enough to help in those scenarios.

Diving even deeper requires reading the documentation.

The Future Trait

Let's start by looking at how the Future trait works.

Here is how Rust defines it:

use std::pin::Pin;
use std::task::{Context, Poll};

pub trait Future {
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

The trait definition includes a bunch of new types and also some syntax we haven't seen before.

First, Future's associated type Output says what the future resolves to.

This is analogous to the Item associated type for the Iterator trait.

Second, Future has the poll method. It takes a special Pin reference for its self parameter and a mutable reference to a Context type, and it returns a Poll<Self::Output>.

For now, we will focus on what the method returns, the Poll type:

enum Poll<T> {
    Ready(T),
    Pending,
}

This Poll type is similar to an Option.

It has one variant that has a value, Ready(T), and one that does not, Pending.

However, Poll means something quite different from Option.

The Pending variant indicates that the future still has work to do, so the caller will need to check again later.

The Ready variant indicates that the future has finished its work and the T value is available.
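To make Ready and Pending concrete, here is a minimal sketch using only the standard library. `Countdown`, `NoopWaker`, and `poll_to_completion` are hypothetical names invented for illustration. The future reports Pending a fixed number of times and then Ready. The helper plays the role of a very naive caller that simply polls again after every Pending.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical future: reports `Pending` `remaining` times, then `Ready(value)`.
struct Countdown {
    remaining: u32,
    value: &'static str,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            // All work is done: hand the value to the caller.
            Poll::Ready(self.value)
        } else {
            self.remaining -= 1;
            // Ask to be polled again. A real future would arrange a wakeup
            // for when some external event (I/O, timer, message) happens.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// A waker that does nothing; enough for polling by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` until it is ready. Returns the output and how many
/// times `poll` answered `Pending`.
fn poll_to_completion<F: Future + Unpin>(mut fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut pending_count = 0;
    loop {
        match Pin::new(&mut fut).poll(&mut cx) {
            Poll::Ready(output) => return (output, pending_count),
            Poll::Pending => pending_count += 1,
        }
    }
}

fn main() {
    let (value, pending) = poll_to_completion(Countdown { remaining: 3, value: "done" });
    println!("{value} after {pending} Pending results");
}
```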

Note that with most futures, the caller should not call poll again after the future has returned Ready.

Many futures will panic if polled again after becoming ready.

Futures that are safe to poll again will say so explicitly in their documentation.

This is similar to the behavior of Iterator::next.
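You can see this for yourself with a quick experiment, assuming the default unwinding panic strategy. Poll an async block to completion, then poll it again. With current compilers the second poll panics. The sketch below catches that panic with `std::panic::catch_unwind` so it can report it. `poll_twice` and `NoopWaker` are hypothetical helpers. The exact panic message is a compiler detail and should not be relied on.

```rust
use std::future::Future;
use std::panic::{self, AssertUnwindSafe};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls an async block once (it completes immediately), then polls it a
/// second time inside `catch_unwind`. Returns the first result and whether
/// the second poll panicked.
fn poll_twice() -> (Poll<u32>, bool) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(async { 40_u32 + 2 });

    let first = fut.as_mut().poll(&mut cx);
    // Polling a finished async block is a contract violation; catch the panic.
    let second = panic::catch_unwind(AssertUnwindSafe(|| fut.as_mut().poll(&mut cx)));
    (first, second.is_err())
}

fn main() {
    let (first, panicked) = poll_twice();
    println!("first poll: {first:?}, second poll panicked: {panicked}");
}
```

The default panic hook will still print the panic message to stderr. That is expected here.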

When you see code that uses await, Rust compiles it under the hood to code that calls poll.

Look back at the previous example, where we printed out the page title for a single URL once it resolved. Rust compiles it into something kind of (although not exactly) like this:

// Simplified: the real poll method also needs the future pinned and a &mut Context.
match page_title(url).poll() {
    Ready(page_title) => match page_title {
        Some(title) => println!("The title for {url} was {title}"),
        None => println!("{url} had no title"),
    }
    Pending => {
        // But what goes here?
    }
}

What should we do when the future is still Pending?

We need a way to try again and again, until the future is finally ready.

In other words, we need a loop:

let mut page_title_fut = page_title(url);
loop {
    match page_title_fut.poll() {
        Ready(value) => {
            match value {
                Some(title) => println!("The title for {url} was {title}"),
                None => println!("{url} had no title"),
            }
            break; // the future is done, so stop polling
        }
        Pending => {
            // continue
        }
    }
}

If Rust compiled it to exactly this code, however, every await would be blocking, which is exactly the opposite of what we are trying to do.

Instead, Rust makes sure that the loop can hand off control to something that can pause work on this future, work on other futures, and then check this one again later.

That something is an async runtime, and this scheduling and coordination work is one of its main jobs.
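The smallest possible runtime drives a single future. The sketch below follows the pattern shown in the standard library's documentation for the `std::task::Wake` trait. Instead of spinning in a busy loop, `block_on` parks the thread on Pending and relies on the future's waker to unpark it. `block_on`, `ThreadWaker`, and `YieldOnce` are illustrative names, not trpl or Tokio APIs. A real runtime would juggle many futures rather than one.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// A minimal single-future "runtime": poll, and if the future is not ready,
/// park the thread until the future's waker says it is worth polling again.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // Sleep until `wake` is called (park may also return spuriously,
            // which is harmless because we just poll again).
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical future: returns Pending once (after scheduling its own
/// wakeup), then Ready.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

fn main() {
    let answer = block_on(async {
        YieldOnce { yielded: false }.await;
        42
    });
    println!("{answer}");
}
```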

Earlier we described waiting on rx.recv.

The recv call returns a future, and awaiting the future polls it.

We noted that a runtime will pause the future until it is ready with either Some(message) or None when the channel closes.

Now, with a deeper understanding of the Future trait, and specifically Future::poll, we can see how that works.

The runtime knows the future isn't ready when it returns Poll::Pending.

Conversely, the runtime knows the future is ready, and advances it, when poll returns Poll::Ready(Some(message)) or Poll::Ready(None).

The exact details of how a runtime works are beyond the scope of this book.

The key is to see the basic mechanics of futures.

A runtime polls each future it is responsible for, putting the future back to sleep when it is not yet ready.
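Here is a toy illustration of a runtime polling each future it is responsible for: a round-robin loop over a queue of boxed futures, written with only the standard library. It rests on a simplifying assumption: it ignores wakers entirely and just re-polls every Pending task, which a real runtime would not do. `run_all`, `YieldOnce`, and `interleaved_log` are hypothetical names.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Returns Pending on its first poll and Ready on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Toy round-robin scheduler: poll each task in turn and put any task that
/// is still Pending at the back of the queue. A real runtime would re-poll
/// a task only after its waker fires, instead of cycling blindly.
fn run_all(tasks: Vec<Task>) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut queue: VecDeque<Task> = tasks.into();
    while let Some(mut task) = queue.pop_front() {
        if task.as_mut().poll(&mut cx).is_pending() {
            queue.push_back(task);
        }
    }
}

/// Runs two async blocks that each log a step, yield, and repeat.
fn interleaved_log() -> Vec<String> {
    let log: Rc<RefCell<Vec<String>>> = Rc::new(RefCell::new(Vec::new()));
    let mut tasks: Vec<Task> = Vec::new();
    for name in ["a", "b"] {
        let log = Rc::clone(&log);
        tasks.push(Box::pin(async move {
            for step in 0..2 {
                log.borrow_mut().push(format!("{name}{step}"));
                YieldOnce(false).await;
            }
        }));
    }
    run_all(tasks);
    let entries = log.borrow().clone();
    entries
}

fn main() {
    println!("{:?}", interleaved_log());
}
```

The log comes out interleaved because each task gives up control at its await point, letting the loop poll the other one.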

The Pin and Unpin Traits

When we introduced pinning, we ran into this error message.

Here is the relevant part again:

error[E0277]: `{async block@src/main.rs:10:23: 10:33}` cannot be unpinned
  --> src/main.rs:48:33
   |
48 |         trpl::join_all(futures).await;
   |                                 ^^^^^ the trait `Unpin` is not implemented for `{async block@src/main.rs:10:23: 10:33}`, which is required by `Box<{async block@src/main.rs:10:23: 10:33}>: Future`
   |
   = note: consider using the `pin!` macro
           consider using `Box::pin` if you need to access the pinned value outside of the current scope
   = note: required for `Box<{async block@src/main.rs:10:23: 10:33}>` to implement `Future`
note: required by a bound in `futures_util::future::join_all::JoinAll`
  --> file:///home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/future/join_all.rs:29:8
   |
27 | pub struct JoinAll<F>
   |            ------- required by a bound in this struct
28 | where
29 |     F: Future,
   |        ^^^^^^ required by this bound in `JoinAll`

This error message tells us not only that we need to pin the values but also why pinning is required.

The trpl::join_all function returns a struct called JoinAll.

The struct is generic over a type F, which is constrained to implement the Future trait.

Directly awaiting a future with await pins the future implicitly.

This is why we don't need to use pin! everywhere we want to await futures.

However, we are not directly awaiting a future here.

Instead, we construct a new future, JoinAll, by passing a collection of futures to the join_all function.

The signature for join_all requires that the types of the items in the collection all implement the Future trait.

Box<T> implements Future only if the T it wraps is a future that implements the Unpin trait.

Now we will dive a little deeper into how the Future trait actually works, in particular around pinning.

Let's look at the definition of the Future trait again:

use std::pin::Pin;
use std::task::{Context, Poll};

pub trait Future {
    type Output;

    // Required method
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

The cx parameter and its Context type are the key to how a runtime actually knows when to check any given future while still being lazy.

The details of how that works are beyond the scope of this chapter, and you generally only need to think about this when writing a custom Future implementation.

Here we will focus on the type for self, as this is the first time we have seen a method where self has a type annotation.

A type annotation for self works like type annotations for other function parameters, but with two key differences:

  • It tells Rust what type self must be for the method to be called.
  • It can't be just any type.
    • It is restricted to the type on which the method is implemented, a reference or smart pointer to that type, or a Pin wrapping a reference to that type. We will see more on this syntax in Chapter 18.

For now, know that if we want to poll a future to check whether it is Pending or Ready(Output), we need a Pin-wrapped mutable reference to the type.
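Because poll takes `self: Pin<&mut Self>`, a future has to be pinned before you can call poll on it by hand. The sketch below shows the two usual ways of getting there with the standard library: `std::pin::pin!` for pinning on the stack, and `Box::pin` for pinning on the heap. `poll_both` and `NoopWaker` are hypothetical helpers.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn poll_both() -> (Poll<&'static str>, Poll<&'static str>) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    // Option 1: pin on the stack. `pin!` yields a `Pin<&mut _>` that cannot
    // outlive this function.
    let mut on_stack = pin!(async { "stack" });
    let first = on_stack.as_mut().poll(&mut cx);

    // Option 2: pin on the heap. `Box::pin` yields a `Pin<Box<_>>` that can
    // be returned or stored; `as_mut()` turns it into `Pin<&mut _>`.
    let mut on_heap: Pin<Box<dyn Future<Output = &'static str>>> = Box::pin(async { "heap" });
    let second = on_heap.as_mut().poll(&mut cx);

    // Calling `poll` on an unpinned async block would not compile: the
    // receiver must be `Pin<&mut Self>`, not `&mut Self`.
    (first, second)
}

fn main() {
    println!("{:?}", poll_both());
}
```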

Pin is a wrapper for pointer-like types such as &, &mut, Box, and Rc.

(Technically, Pin works with types that implement the Deref or DerefMut traits, but this is effectively equivalent to working only with pointers.)

Pin is not a pointer itself.

It also doesn't have any behavior of its own like Rc and Arc do with reference counting.

This is purely a tool the compiler can use to enforce constraints on pointer usage.

Recall that await is implemented in terms of calls to poll, which starts to explain the error message we saw earlier.

But that error message was in terms of Unpin, not Pin.

How does Pin relate to Unpin and why does Future need self to be in a Pin type to call poll?

Remember from earlier that a series of await points in a future gets compiled into a state machine, and the compiler makes sure that state machine follows all of Rust's normal rules around safety, including borrowing and ownership.

In order to make this work, Rust looks at what data is needed between one await point and either the next await point or the end of the async block.

Each variant gets the access it needs to the data that will be used in that section of the source code, whether by taking ownership of that data or by getting a mutable or immutable reference to it.

If we get anything wrong about the ownership or references in a given async block, the borrow checker will tell us.
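To get a feel for what "each variant gets the access it needs" means, here is a hand-written state machine for a tiny async computation. This is only an approximation for illustration. The compiler's generated types are anonymous and look different, and `AddOne`, `run`, and `NoopWaker` are hypothetical names. The await point is simulated by returning Pending once.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical hand-written version of the state machine the compiler
// might generate for something like:
//     async fn add_one(x: u32) -> u32 { some_future.await; x + 1 }
// Each variant stores only what the code after the current point needs.
enum AddOne {
    // Not yet started: holds the argument.
    Start { x: u32 },
    // Suspended at the await point: `x` is still needed afterwards.
    Suspended { x: u32 },
    // Finished: nothing left to keep.
    Done,
}

impl Future for AddOne {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // `AddOne` holds no self-references, so it is `Unpin` and we may
        // take a plain `&mut` out of the `Pin`.
        let this = self.get_mut();
        match *this {
            AddOne::Start { x } => {
                // Stand-in for awaiting an inner future that is not ready yet.
                *this = AddOne::Suspended { x };
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            AddOne::Suspended { x } => {
                *this = AddOne::Done;
                Poll::Ready(x + 1)
            }
            AddOne::Done => panic!("AddOne polled after completion"),
        }
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Drives an `Unpin` future to completion by polling in a loop.
fn run<F: Future + Unpin>(mut fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    println!("{}", run(AddOne::Start { x: 41 }));
}
```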

Things get trickier when we want to move around the future that corresponds to that block, for example by moving it into a Vec to pass to join_all.

When we move a future, either by pushing it into a data structure to use as an iterator with join_all or by returning from a function, this actually means moving the state machine Rust creates for us.

Unlike most other types in Rust, the futures Rust creates for async blocks can end up with references to themselves in the fields of any given variant.

This is shown in the accompanying illustration. By default, though, any object that has a reference to itself is unsafe to move, because references always point to the actual memory address of whatever they refer to.

If you move the data structure itself, those internal references will be left pointing to the old location.

However that memory location is now invalid.

For one thing, its value will not be updated when you make changes to the data structure.

For another, more important thing, the computer is now free to reuse that memory for other purposes.

You could end up reading completely unrelated data later.

Theoretically, the Rust compiler could try to update every reference to an object whenever it gets moved, but that could add a lot of performance overhead.

This is especially true if a whole web of references needs updating.

If we could instead ensure that the data structure doesn't move in memory, we then wouldn't have to update any references.

This is exactly what Rust's borrow checker requires: in safe code, it prevents you from moving any item with an active reference to it.

Pin builds on that to give us the exact guarantee we need.

When we pin a value by wrapping a pointer to that value in Pin, it can no longer move.

Thus, if you have Pin<Box<SomeType>>, you actually pin the SomeType value, not the Box pointer.

The image illustrates this process. In fact, the Box pointer can still move around freely.

What we care about is making sure the data ultimately being referenced stays in place.

If a pointer moves around, but the data it points to stays in the same place, there is no potential problem.
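The claim that the Box pointer can move while the pinned data stays put can be checked directly. In this sketch, `Pinned` is a hypothetical type made `!Unpin` with `std::marker::PhantomPinned`. We record the address of the pinned value, move the `Pin<Box<_>>` into a `Vec`, and record the address again. `addresses` is also a hypothetical name.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical `!Unpin` type standing in for a self-referential future.
#[allow(dead_code)]
struct Pinned {
    data: [u8; 16],
    _pin: PhantomPinned,
}

/// Returns the heap address of the pinned value before and after the
/// `Pin<Box<_>>` itself is moved into a `Vec`.
fn addresses() -> (*const Pinned, *const Pinned) {
    let boxed: Pin<Box<Pinned>> = Box::pin(Pinned { data: [0; 16], _pin: PhantomPinned });
    let before: *const Pinned = &*boxed;

    // Moving the Box is fine: only the pointer moves, not the heap data.
    let mut holder = Vec::new();
    holder.push(boxed);

    let after: *const Pinned = &*holder[0];
    (before, after)
}

fn main() {
    let (before, after) = addresses();
    println!("same address after move: {}", before == after);
}
```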

As an independent exercise, look at the docs for the types as well as the std::pin module and try to work out how you would do this with a Pin wrapping a Box.

The key is that the self-referential type cannot move, because it is still pinned. However, most types are perfectly safe to move around, even if they happen to be behind a Pin pointer.

We only need to think about pinning when items have internal references.

Primitive values such as numbers and Booleans are safe because they obviously don't have any internal references.

Neither do most types you normally work with in Rust.

You can move around a Vec, for example, without worrying.

Given what we have seen so far, if you have a Pin<Vec<String>>, you would have to do everything via the safe but restrictive APIs provided by Pin, even though a Vec<String> is always safe to move if there are no other references to it.

We need a way to tell the compiler that it is fine to move items around in cases like this, and this is where Unpin comes into play.

Unpin is a marker trait, similar to the Send and Sync traits.

Thus, it has no functionality of its own.

Marker traits exist only to tell the compiler that it is fine to use the type implementing a given trait in a particular context.

Unpin informs the compiler that a given type does not need to uphold any guarantees about whether the value in question can be safely moved.

Just like Send and Sync, the compiler implements Unpin automatically for all types where it can prove it is safe.

The special case is where Unpin is not implemented for a type.

The notation for this is impl !Unpin for *SomeType*, where *SomeType* is the name of a type that does need to uphold those guarantees to be safe whenever a pointer to that type is used in a Pin.

There are two important things to remember about the relationship between Pin and Unpin:

  • Unpin is the "normal case", and !Unpin is the special case.
  • Whether a type implements Unpin or !Unpin only matters when you are using a pinned pointer to that type, like Pin<&mut *SomeType*>.
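On stable Rust the `impl !Unpin for SomeType` syntax (a negative impl) is not available. The usual way to opt out of Unpin is to include a `std::marker::PhantomPinned` field. The sketch below uses a hypothetical `NotUnpin` type to show both halves of the relationship. Common types like `String` and `Vec<String>` satisfy an `Unpin` bound. A `!Unpin` value behind a Pin still allows shared access, but not a safe `&mut`. `assert_unpin` and `read_through_pin` are hypothetical helpers.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// `impl !Unpin for SomeType {}` needs a nightly-only feature. On stable,
// a type opts out of `Unpin` by containing a `!Unpin` field.
struct NotUnpin {
    value: u32,
    _pin: PhantomPinned,
}

/// Compiles only for types that implement `Unpin`.
fn assert_unpin<T: Unpin>() {}

fn read_through_pin() -> u32 {
    assert_unpin::<String>();
    assert_unpin::<Vec<String>>();
    // assert_unpin::<NotUnpin>(); // would fail: NotUnpin does not implement Unpin

    let pinned: Pin<Box<NotUnpin>> = Box::pin(NotUnpin { value: 7, _pin: PhantomPinned });
    // Shared access is always allowed through a Pin...
    let value = pinned.as_ref().get_ref().value;
    // ...but `pinned.as_mut().get_mut()` would not compile, because handing
    // out `&mut NotUnpin` would let safe code move the value.
    value
}

fn main() {
    println!("{}", read_through_pin());
}
```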

To make that concrete, think about a String: it has a length and the Unicode characters that make it up.

We can wrap a String in Pin.

However, String automatically implements Unpin, as do most other types in Rust. In the "Pinning a String" illustration, a dotted line indicates that the String implements the Unpin trait and thus is not pinned. As a result, we can do things that would be illegal if String implemented !Unpin, such as replacing one string with another at the exact same location in memory.

This wouldn't violate the Pin contract, because String has no internal references that make it unsafe to move around.
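Here is that String case in code, as a sketch using only the standard library. Because `String: Unpin`, `Pin::get_mut` is available and hands back a plain `&mut String`, so the pinned string can be swapped out in place. `replace_pinned` is a hypothetical name.

```rust
use std::mem;
use std::pin::{pin, Pin};

/// Pins a String on the stack and then replaces it in place. Safe code can
/// do this only because `String: Unpin`.
fn replace_pinned() -> (String, String) {
    let mut pinned: Pin<&mut String> = pin!(String::from("hello"));

    // For an `Unpin` target, `Pin::get_mut` returns a plain `&mut String`,
    // so we can move the old value out and put a new one in its place.
    let old = mem::replace(pinned.as_mut().get_mut(), String::from("goodbye"));

    // `Pin<&mut String>` also implements `DerefMut`, so this works too.
    pinned.push('!');

    (old, pinned.to_string())
}

fn main() {
    println!("{:?}", replace_pinned());
}
```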

This is precisely why it implements Unpin rather than !Unpin.

Now we know enough to understand the errors reported for that join_all call from before.

There we originally tried to move the futures produced by the async blocks into a Vec<Box<dyn Future<Output = ()>>>.

As we have seen, those futures may have internal references, so they don't implement Unpin.

They need to be pinned, and then we can pass the Pin type into the Vec, confident that the underlying data in the futures will not be moved.

Pin and Unpin are mostly important for building lower-level libraries, or when you are building a runtime itself, rather than for day-to-day Rust.

When you see these traits in error messages, you will now have a better idea of how to fix your code.

Note: the combination of Pin and Unpin makes it possible to safely implement a whole class of complex types in Rust that would otherwise prove challenging because they are self-referential.

Types that require Pin show up most commonly in async Rust today.

Every once in a while, you may see them in other contexts too.

The specifics of how Pin and Unpin work, and the rules they are required to uphold are covered extensively in the API documentation for std::pin, so you can check there for more info.

In fact, there is a whole book on async Rust programming: Asynchronous Programming in Rust.

The Stream Trait

As we learned earlier, streams are similar to asynchronous iterators.

Unlike Iterator and Future, Stream has no definition in the standard library (as of this writing), but there is a very common definition from the futures crate used throughout the ecosystem.

Let's review the Iterator and Future traits before looking at how a Stream trait might merge them together.

From Iterator, we have the idea of a sequence: its next method provides an Option<Self::Item>.

From Future, we have the idea of readiness over time: its poll method provides a Poll<Self::Output>.

To represent a sequence of items that become ready over time, we define a Stream trait that puts those features together:

use std::pin::Pin;
use std::task::{Context, Poll};

trait Stream {
    type Item;

    fn poll_next(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>
    ) -> Poll<Option<Self::Item>>;
}

Here the Stream trait defines an associated type called Item for the type of the items produced by the stream.

This is similar to Iterator, where there may be zero to many items, and unlike Future, where there is always a single Output, even if it is the unit type ().

Stream also defines a method to get those items.

We call it poll_next, to make it clear that it polls in the same way Future::poll does and produces a sequence of items in the same way Iterator::next does.

Its return type combines Poll with Option.

The outer type is Poll, because it has to be checked for readiness, just as a future does.

The inner type is Option, because it needs to signal whether there are more messages, just as an iterator does.
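To see Poll and Option working together, here is a sketch that defines the Stream trait shown above locally, rather than importing it from the futures crate. It implements the trait for a hypothetical `Ticks` type that pretends each item arrives a little later. `Ticks`, `trace`, and `NoopWaker` are illustrative names.

```rust
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Same shape as the Stream trait above, defined locally (no external crates).
trait Stream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Hypothetical stream: yields `next..=limit`, reporting Pending before
/// each item to mimic values that arrive over time.
struct Ticks {
    next: u32,
    limit: u32,
    ready: bool,
}

impl Stream for Ticks {
    type Item = u32;

    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<u32>> {
        if self.next > self.limit {
            return Poll::Ready(None); // sequence finished, like Iterator's None
        }
        if !self.ready {
            self.ready = true;
            cx.waker().wake_by_ref();
            return Poll::Pending; // item not ready yet, like a Future
        }
        self.ready = false;
        let item = self.next;
        self.next += 1;
        Poll::Ready(Some(item))
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a stream by hand, recording every result as a short label.
fn trace<S: Stream<Item = u32> + Unpin>(mut stream: S) -> Vec<String> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut events = Vec::new();
    loop {
        match Pin::new(&mut stream).poll_next(&mut cx) {
            Poll::Pending => events.push("pending".to_string()),
            Poll::Ready(Some(n)) => events.push(format!("item {n}")),
            Poll::Ready(None) => {
                events.push("end".to_string());
                return events;
            }
        }
    }
}

fn main() {
    println!("{:?}", trace(Ticks { next: 1, limit: 2, ready: false }));
}
```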

Something like this will likely end up as part of Rust's standard library.

In the meantime, it is part of the toolkit of most runtimes, so you can rely on it, and everything that will be covered should apply generally.

In the example we saw previously in the section on streaming, we didn't use poll_next or Stream, but instead used next and StreamExt.

We could work directly with the poll_next API by hand-writing our own Stream state machines, just as we could work with futures directly via their poll method.

Using await is much nicer, though, and the StreamExt trait supplies the next method so we can do just that:

trait StreamExt: Stream {
    async fn next(&mut self) -> Option<Self::Item>
    where
        Self: Unpin;

    // other methods...
}

Note: The definition that we used earlier in the chapter looks slightly different than this.

This is because it supports versions of Rust that did not yet support using async functions in traits.

As a result it looks like this:

fn next(&mut self) -> Next<'_, Self> where Self: Unpin;

This Next type is a struct that implements Future and allows us to name the lifetime of the reference to self with Next<'_, Self>, so that await can work with this method.

The StreamExt trait also has some interesting methods available to use with streams.

StreamExt is automatically implemented for every type that implements Stream.

These traits are defined separately to enable the community to iterate on convenience APIs without affecting the foundational trait.

In the version of StreamExt used in the trpl crate, the trait not only defines the next method but also supplies a default implementation of next that correctly handles the details of calling Stream::poll_next.

This means that even when you need to write your own streaming data type, you only have to implement Stream, and then anyone who uses your data type can use StreamExt and its methods with it automatically.
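This pattern can be sketched without any external crates. The code below is a simplified, hypothetical stand-in for the real StreamExt. It defines Stream locally, a `Next` future like the one described above, and a `StreamExt` trait whose `next` method has a default body. A blanket impl then gives every Stream that method. `Count` implements only Stream, yet `.next().await` works on it. The busy-polling `block_on` is also for illustration only.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Local copy of the Stream trait shown earlier.
trait Stream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Future returned by `next`: borrows the stream and forwards each poll
/// to `poll_next`.
struct Next<'a, S: ?Sized> {
    stream: &'a mut S,
}

impl<S: Stream + Unpin + ?Sized> Future for Next<'_, S> {
    type Output = Option<S::Item>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        Pin::new(&mut *self.stream).poll_next(cx)
    }
}

/// Hypothetical, cut-down extension trait: `next` has a default body,
/// so implementors never write it themselves.
trait StreamExt: Stream {
    fn next(&mut self) -> Next<'_, Self>
    where
        Self: Unpin,
    {
        Next { stream: self }
    }
}

// Blanket impl: every type that implements Stream gets StreamExt for free.
impl<S: Stream + ?Sized> StreamExt for S {}

/// Hypothetical stream that yields 1, 2, 3 and then ends.
struct Count(u32);

impl Stream for Count {
    type Item = u32;

    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<u32>> {
        if self.0 < 3 {
            self.0 += 1;
            Poll::Ready(Some(self.0))
        } else {
            Poll::Ready(None)
        }
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polling executor; fine for futures that are ready almost at once.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// `Count` only implements `Stream`, yet `.next().await` is available.
fn collect_count() -> Vec<u32> {
    block_on(async {
        let mut stream = Count(0);
        let mut items = Vec::new();
        while let Some(n) = stream.next().await {
            items.push(n);
        }
        items
    })
}

fn main() {
    println!("{:?}", collect_count());
}
```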