# A Closer Look at the Traits for Async

A high-level understanding of the async traits is fine for most day-to-day Rust writing. Sometimes, though, you will encounter situations where you need to understand a few more of the details. In this chapter we will dig in just enough to help in those scenarios. Diving even deeper requires reading the documentation.

## The `Future` Trait

Let's start by looking at how the `Future` trait works. Here is how Rust defines it:

```rust
use std::pin::Pin;
use std::task::{Context, Poll};

pub trait Future {
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}
```

The trait definition includes a bunch of new types and also some syntax we haven't seen before.

First, `Future`'s associated type `Output` says what the future resolves to. This is analogous to the `Item` associated type for the `Iterator` trait. Second, `Future` also has the `poll` method, which takes a special `Pin` reference for its `self` parameter and a mutable reference to a `Context` type, and returns a `Poll<Self::Output>`. For now we will focus on what the method returns, the `Poll` type:

```rust
enum Poll<T> {
    Ready(T),
    Pending,
}
```

This `Poll` type is similar to an `Option`. It has one variant that has a value, `Ready(T)`, and one that does not, `Pending`. `Poll` means something quite different from `Option`, though. The `Pending` variant indicates that the future still has work to do, so the caller will need to check again later. The `Ready` variant indicates that the future has finished its work and the `T` value is available.

Note: With most futures, the caller should not call `poll` again after the future has returned `Ready`. Many futures will panic if polled again after becoming ready. Futures that are safe to poll again will say so explicitly in their documentation. This is similar to how `Iterator::next` behaves.

When you see code that uses `await`, Rust compiles it under the hood to code that calls `poll`.
If you look back at the previous example, where we printed out the page title for a single URL once it resolved, Rust compiles it into something kind of (although not exactly) like this:

```rust
// Simplified sketch: the real `poll` takes a pinned `self` and a `Context`.
match page_title(url).poll() {
    Ready(page_title) => match page_title {
        Some(title) => println!("The title for {url} was {title}"),
        None => println!("{url} had no title"),
    }
    Pending => {
        // But what goes here?
    }
}
```

What should we do when the future is still `Pending`? We need a way to try again and again, until the future is finally ready. In other words, we need a loop:

```rust
let mut page_title_fut = page_title(url);
loop {
    match page_title_fut.poll() {
        Ready(page_title) => match page_title {
            Some(title) => println!("The title for {url} was {title}"),
            None => println!("{url} had no title"),
        }
        Pending => {
            // continue
        }
    }
}
```

If Rust compiled it to exactly this code, though, every `await` would be blocking, which is exactly the opposite of what we are trying to do. Instead, Rust makes sure that the loop can hand off control to something that can pause work on this future, work on other futures, and then check this one again later. That something is an async runtime, and this scheduling and coordination work is one of its main jobs.

Earlier we described waiting on `rx.recv`. The `recv` call returns a future, and awaiting the future polls it. We noted that a runtime will pause the future until it is ready with either `Some(message)` or `None` when the channel closes. With our deeper understanding of the `Future` trait, and specifically `Future::poll`, we can now see how that works. The runtime knows the future isn't ready when it returns `Poll::Pending`. Conversely, the runtime knows the future *is* ready and advances it when `poll` returns `Poll::Ready(Some(message))` or `Poll::Ready(None)`.

The exact details of how a runtime does that are beyond the scope of this book. The key is to see the basic mechanics of futures.
A runtime *polls* each future it is responsible for, putting the future back to sleep when it is not yet ready.

## The `Pin` and `Unpin` Traits

When we introduced pinning, we ran into a rather gnarly error message. Here is the relevant part again:

```
error[E0277]: `{async block@src/main.rs:10:23: 10:33}` cannot be unpinned
  --> src/main.rs:48:33
   |
48 |         trpl::join_all(futures).await;
   |                                 ^^^^^ the trait `Unpin` is not implemented for `{async block@src/main.rs:10:23: 10:33}`, which is required by `Box<{async block@src/main.rs:10:23: 10:33}>: Future`
   |
   = note: consider using the `pin!` macro
           consider using `Box::pin` if you need to access the pinned value outside of the current scope
   = note: required for `Box<{async block@src/main.rs:10:23: 10:33}>` to implement `Future`
note: required by a bound in `futures_util::future::join_all::JoinAll`
  --> file:///home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/future/join_all.rs:29:8
   |
27 | pub struct JoinAll<F>
   |            ------- required by a bound in this struct
28 | where
29 |     F: Future,
   |        ^^^^^^ required by this bound in `JoinAll`
```

This error message tells us not only that we need to pin the values but also why pinning is required. The `trpl::join_all` function returns a struct called `JoinAll`. That struct is generic over a type `F`, which is constrained to implement the `Future` trait. Directly awaiting a future with `await` pins the future implicitly. This is why we don't need to use `pin!` everywhere we want to await futures.

However, we are not directly awaiting a future here. Instead, we construct a new future, `JoinAll`, by passing a collection of futures to the `join_all` function. The signature for `join_all` requires that the types of the items in the collection all implement the `Future` trait, and `Box<T>` implements `Future` only if the `T` it wraps is a future that implements the `Unpin` trait.

To really understand this, we will dive a little deeper into how the `Future` trait actually works, in particular around *pinning*.
Let's look again at the definition of the `Future` trait:

```rust
use std::pin::Pin;
use std::task::{Context, Poll};

pub trait Future {
    type Output;

    // Required method
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}
```

The `cx` parameter and its `Context` type are the key to how a runtime actually knows when to check any given future while still being lazy. The details of how that works are beyond the scope of this chapter, and you generally only need to think about this when writing a custom `Future` implementation. Here we will focus on the type for `self`, as this is the first time we have seen a method where `self` has a type annotation. A type annotation for `self` works like type annotations for other function parameters, but with two key differences:

- It tells Rust what type `self` must be for the method to be called.
- It can't be just any type. It is restricted to the type on which the method is implemented, a reference or smart pointer to that type, or a `Pin` wrapping a reference to that type.

We will see more on this syntax in Chapter 18. For now, it is enough to know that if we want to poll a future to check whether it is `Pending` or `Ready(Output)`, we need a `Pin`-wrapped mutable reference to the type.

`Pin` is a wrapper for pointer-like types such as `&`, `&mut`, `Box`, and `Rc`. (Technically, `Pin` works with types that implement the `Deref` or `DerefMut` traits, but this is effectively equivalent to working only with pointers.) `Pin` is not a pointer itself, and it doesn't have any behavior of its own the way `Rc` and `Arc` do with reference counting. It is purely a tool the compiler can use to enforce constraints on pointer usage.

Recalling that `await` is implemented in terms of calls to `poll` starts to explain the error message we saw earlier, but that message was in terms of `Unpin`, not `Pin`. So how exactly does `Pin` relate to `Unpin`, and why does `Future` need `self` to be in a `Pin` type to call `poll`?
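
Before getting to the answer, a concrete sketch of calling `poll` by hand may help. The code below is a minimal illustration, not how a runtime works: `CountDown`, `NoopWaker`, and `poll_to_completion` are hypothetical names invented for this example, and the loop is the busy-waiting approach that a real runtime avoids. It assumes the future is `Unpin`, which is what lets it build the `Pin<&mut Self>` with the safe `Pin::new` constructor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical future that reports `Pending` a fixed number of times.
struct CountDown {
    remaining: u32,
}

impl Future for CountDown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("done")
        } else {
            // Mutating through `Pin` works here because `CountDown` is `Unpin`.
            self.remaining -= 1;
            // Request another poll; a real future would wake only on progress.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Waker that does nothing, just enough to build a `Context` for the demo.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` in a busy loop; returns its output and the number of polls.
fn poll_to_completion<F: Future + Unpin>(mut fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        // `poll` cannot be called on `&mut fut` directly; it needs a `Pin`.
        match Pin::new(&mut fut).poll(&mut cx) {
            Poll::Ready(output) => return (output, polls),
            Poll::Pending => continue,
        }
    }
}

fn main() {
    let (output, polls) = poll_to_completion(CountDown { remaining: 3 });
    println!("{output} after {polls} polls"); // prints "done after 4 polls"
}
```

Note that `Pin::new` only compiles because of the `F: Unpin` bound. For a future that is not `Unpin`, such as one produced by an async block, the caller would first have to pin it with `Box::pin` or the `pin!` macro.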
Remember from before that a series of await points in a future gets compiled into a state machine, and the compiler makes sure that the state machine follows all of Rust's normal rules around safety, including borrowing and ownership. To make this work, Rust looks at what data is needed between one await point and either the next await point or the end of the async block. It then creates a corresponding variant in the compiled state machine. Each variant gets the access it needs to the data that will be used in that section of the source code, whether by taking ownership of that data or by getting a mutable or immutable reference to it.

If we get anything wrong about the ownership or references in a given async block, the borrow checker will tell us. When we want to move around the future that corresponds to that block, like moving it into a `Vec` to pass to `join_all`, things get trickier.

When we move a future, either by pushing it into a data structure to use as an iterator with `join_all` or by returning it from a function, that actually means moving the state machine Rust creates for us. Unlike most other types in Rust, the futures Rust creates for async blocks can end up with references to themselves in the fields of any given variant, as shown in the accompanying illustration.

By default, any object that has a reference to itself is unsafe to move, because references always point to the actual memory address of whatever they refer to. If you move the data structure itself, those internal references will be left pointing to the old location. However, that memory location is now invalid. For one thing, its value will not be updated when you make changes to the data structure. For another, and more importantly, the computer is now free to reuse that memory for other purposes. You could end up reading completely unrelated data later.

Theoretically, the Rust compiler could try to update every reference to an object whenever it gets moved, but that could add a lot of performance overhead.
This is especially true if a whole web of references needs updating. If we could instead ensure that the data structure in question *doesn't move in memory*, we wouldn't have to update any references. This is exactly what Rust's borrow checker already requires: in safe code, it prevents you from moving any item with an active reference to it.

`Pin` builds on that to give us the exact guarantee we need. When we *pin* a value by wrapping a pointer to that value in `Pin`, it can no longer move. Thus, if you have `Pin<Box<SomeType>>`, you actually pin the `SomeType` value, *not* the `Box` pointer. The accompanying image illustrates this process.

In fact, the `Box` pointer can still move around freely. We care about making sure the data ultimately being referenced stays in place. If a pointer moves around, *but the data it points to is in the same place*, there is no potential problem. As an independent exercise, look at the docs for the types as well as the `std::pin` module and try to work out how you would do this with a `Pin` wrapping a `Box`. The key is that the self-referential type itself cannot move, because it is still pinned.

However, most types are perfectly safe to move around, even if they happen to be behind a `Pin` pointer. We only need to think about pinning when the items have internal references. Primitive values such as numbers and Booleans obviously don't have any internal references, so they are safe to move. Neither do most types you normally work with in Rust. You can move around a `Vec`, for example, without worrying. Given only what we have seen so far, if you have a `Pin<Vec<String>>`, you would have to do everything via the safe but restrictive APIs provided by `Pin`, even though a `Vec<String>` is always safe to move if there are no other references to it. We need a way to tell the compiler that it is fine to move items around in cases like this, and this is where `Unpin` comes into play.

`Unpin` is a marker trait, similar to the `Send` and `Sync` traits, and thus has no functionality of its own.
Marker traits exist only to tell the compiler it is safe to use the type implementing a given trait in a particular context. `Unpin` informs the compiler that a given type does *not* need to uphold any guarantees about whether the value in question can be safely moved.

Just as with `Send` and `Sync`, the compiler implements `Unpin` automatically for all types where it can prove it is safe. A special case, again similar to `Send` and `Sync`, is where `Unpin` is *not* implemented for a type. The notation for this is `impl !Unpin for SomeType`, where `SomeType` is the name of a type that *does* need to uphold those guarantees to be safe whenever a pointer to that type is used in a `Pin`.

In other words, there are two things to keep in mind about the relationship between `Pin` and `Unpin`:

- `Unpin` is the "normal" case, and `!Unpin` is the special case.
- Whether a type implements `Unpin` or `!Unpin` *only* matters when you are using a pinned pointer to that type, like `Pin<&mut SomeType>`.

To make that concrete, think about a `String`: it has a length and the Unicode characters that make it up. We can wrap a `String` in `Pin`. However, `String` automatically implements `Unpin`, as do most other types in Rust.

Figure: Pinning a `String`. The dotted line indicates that the `String` implements the `Unpin` trait, and thus is not pinned.

As a result, we can do things that would be illegal if `String` implemented `!Unpin` instead, such as replacing one string with another at the exact same location in memory. This doesn't violate the `Pin` contract, because `String` has no internal references that make it unsafe to move around. This is precisely why it implements `Unpin` rather than `!Unpin`.

Now we know enough to understand the errors reported for that `join_all` call from before. There, we originally tried to move the futures produced by the async blocks into a `Vec<Box<dyn Future<Output = ()>>>`. As we have seen, those futures may have internal references, so they don't implement `Unpin`.
They need to be pinned, and then we can pass the `Pin` type into the `Vec`, confident that the underlying data in the futures will *not* be moved.

`Pin` and `Unpin` are mostly important for building lower-level libraries, or when you are building a runtime itself, rather than for day-to-day Rust code. When you see these traits in error messages, though, you will now have a better idea of how to fix your code.

Note: The combination of `Pin` and `Unpin` makes it possible to safely implement a whole class of complex types in Rust that would otherwise prove challenging because they are self-referential. Types that require `Pin` show up most commonly in async Rust today, but every once in a while, you may see them in other contexts too.

The specifics of how `Pin` and `Unpin` work, and the rules they are required to uphold, are covered extensively in the API documentation for `std::pin`, so you can check there for more information. In fact, there is a whole book on async Rust programming, which you can find [here](https://rust-lang.github.io/async-book/).

## The `Stream` Trait

As we learned earlier, streams are similar to asynchronous iterators. Unlike `Iterator` and `Future`, `Stream` has no definition in the standard library as of this writing, but there *is* a very common definition from the `futures` crate used throughout the ecosystem.

Let's review the definitions of the `Iterator` and `Future` traits before looking at how a `Stream` trait might merge them together. From `Iterator`, we have the idea of a sequence: its `next` method provides an `Option<Self::Item>`. From `Future`, we have the idea of readiness over time: its `poll` method provides a `Poll<Self::Output>`. To represent a sequence of items that become ready over time, we define a `Stream` trait that puts those features together.
```rust
use std::pin::Pin;
use std::task::{Context, Poll};

trait Stream {
    type Item;

    fn poll_next(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>
    ) -> Poll<Option<Self::Item>>;
}
```

Here the `Stream` trait defines an associated type called `Item` for the type of the items produced by the stream. This is similar to `Iterator`, where there may be zero to many items, and unlike `Future`, where there is always a single `Output`, even if it is the unit type `()`.

`Stream` also defines a method to get those items. We call it `poll_next`, to make it clear that it polls in the same way `Future::poll` does and produces a sequence of items in the same way `Iterator::next` does. Its return type combines `Poll` with `Option`. The outer type is `Poll`, because it has to be checked for readiness, just as a future does. The inner type is `Option`, because it needs to signal whether there are more messages, just as an iterator does.

Something very similar to this definition will likely end up as part of Rust's standard library. In the meantime, it is part of the toolkit of most runtimes, so you can rely on it, and everything covered here should apply generally.

In the example we saw previously in the section on streaming, we didn't use `poll_next` or `Stream`, but instead used `next` and `StreamExt`. We *could* work directly in terms of the `poll_next` API, just as we *could* work with futures directly via their `poll` method. Using `await` is much nicer, though, and the `StreamExt` trait supplies the `next` method so we can do just that:

```rust
trait StreamExt: Stream {
    async fn next(&mut self) -> Option<Self::Item>
    where
        Self: Unpin;

    // other methods...
}
```

Note: The definition that we used earlier in the chapter looks slightly different from this. That is because it supports versions of Rust that did not yet support using async functions in traits.
As a result, it looks like this:

```rust
fn next(&mut self) -> Next<'_, Self> where Self: Unpin;
```

This `Next` type is a `struct` that implements `Future` and allows us to name the lifetime of the reference to `self` with `Next<'_, Self>`, so that `await` can work with this method.

The `StreamExt` trait is also the home of all the interesting methods available to use with streams. `StreamExt` is automatically implemented for every type that implements `Stream`. These traits are defined separately to enable the community to iterate on convenience APIs without affecting the foundational trait.

In the version of `StreamExt` used in the `trpl` crate, the trait not only defines the `next` method but also supplies a default implementation of `next` that correctly handles the details of calling `Stream::poll_next`. This means that even when you need to write your own streaming data type, you *only* have to implement `Stream`, and then anyone who uses your data type can use `StreamExt` and its methods with it automatically.
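
To see how little a custom streaming type has to provide, here is a minimal sketch. So that it compiles with only the standard library, it declares a local `Stream` trait that mirrors the definition shown earlier instead of importing the one from the `futures` crate. `Counter`, `NoopWaker`, and `collect_ready` are hypothetical names invented for this illustration, and the hand-written polling loop stands in for what `StreamExt::next` plus `await` would normally do for you.

```rust
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Local stand-in mirroring the `Stream` trait from the text
/// (an assumption for this sketch, not the `futures` crate's own trait).
trait Stream {
    type Item;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Hypothetical stream that yields 1 through `max` and then ends.
struct Counter {
    current: u32,
    max: u32,
}

impl Stream for Counter {
    type Item = u32;

    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        if self.current < self.max {
            self.current += 1;
            // Outer `Poll::Ready`: an answer is available right now.
            // Inner `Some`: the sequence has another item.
            Poll::Ready(Some(self.current))
        } else {
            // Inner `None`: the stream is finished, like an iterator.
            Poll::Ready(None)
        }
    }
}

/// Waker that does nothing, just enough to build a `Context` for the demo.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Drains a stream by polling it directly and collects every item.
fn collect_ready<S: Stream + Unpin>(mut stream: S) -> Vec<S::Item> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut items = Vec::new();
    loop {
        match Pin::new(&mut stream).poll_next(&mut cx) {
            Poll::Ready(Some(item)) => items.push(item),
            Poll::Ready(None) => return items,
            // A runtime would pause here until woken instead of spinning.
            Poll::Pending => continue,
        }
    }
}

fn main() {
    let items = collect_ready(Counter { current: 0, max: 3 });
    println!("{items:?}"); // prints "[1, 2, 3]"
}
```

`Counter` is always ready, so it never returns `Poll::Pending`. A stream backed by a channel or a timer would return `Pending` and arrange for the waker from `cx` to be called once the next item could be produced.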