RustBrock/Futures and Async.md
darkicewolf50 c73d875808
started ch17.3, completed half
2025-03-24 16:01:40 -06:00


Futures and the Async Syntax

The key parts of asynchronous programming in Rust are futures and Rust's async and await keywords.

A future is a value that may not be ready now but will become ready at some point in the future.

In other languages the same concept shows up under other names such as task or promise.

Rust provides a Future trait as a building block so that different async operations can be implemented with different data structures but with a common interface.

Rust futures are types that implement the Future trait.

Each future holds its own information about the progress that has been made and what "ready" means.

You can apply the async keyword to blocks and functions to specify that they can be interrupted and resumed.

Within an async block or async function, you can use the await keyword to await a future (that is, wait for it to become ready).

Any point where you await a future within an async block or function is a potential spot for that async block or function to pause and resume.

The process of checking with a future to see if its value is available yet is called polling.
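To make polling concrete, here is a minimal sketch of a hand-written future and a loop that polls it. The names Countdown, NoopWaker, and poll_until_ready are hypothetical and exist only for this illustration; they are not part of trpl or the standard library. A real runtime would wait to be woken rather than poll in a tight loop.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical future that reports Pending `remaining` times, then is ready.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            // The future itself decides what "ready" means.
            Poll::Ready("liftoff")
        } else {
            // Record progress, ask to be polled again, and report "not yet".
            self.remaining -= 1;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// A waker that does nothing; enough for polling by hand in a loop.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls the future until it is ready and returns (number of polls, output).
fn poll_until_ready(mut fut: Countdown) -> (u32, &'static str) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(value) = Pin::new(&mut fut).poll(&mut cx) {
            return (polls, value);
        }
    }
}
```

Calling poll_until_ready(Countdown { remaining: 2 }) returns (3, "liftoff"): two polls report Pending and the third reports Ready.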

Some other languages, such as C# and JavaScript, also use async and await keywords for async programming.

There are some significant differences in how Rust does things, including how it handles the syntax.

When writing async Rust, we use the async and await keywords most of the time.

Rust compiles them into equivalent code using the Future trait, much as it compiles for loops into equivalent code using the Iterator trait.
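As a reminder of the for loop half of that comparison, here is a rough sketch of a for loop next to a hand-written version that calls next directly. The function names are made up for this example, and the second function only approximates what the compiler generates.

```rust
/// Sums a slice with an ordinary for loop.
fn sum_with_for(values: &[i32]) -> i32 {
    let mut total: i32 = 0;
    for value in values {
        total += *value;
    }
    total
}

/// Roughly what the for loop above turns into: get an iterator,
/// then call next until it returns None.
fn sum_with_next(values: &[i32]) -> i32 {
    let mut total: i32 = 0;
    let mut iter = IntoIterator::into_iter(values);
    loop {
        match iter.next() {
            Some(value) => total += *value,
            None => break,
        }
    }
    total
}
```

Both functions return the same result for the same input. In the same way, async and await are friendlier syntax over code written in terms of the Future trait.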

Because Rust provides the Future trait, you can also implement it for your own data types when you need to.

Many of the functions that we will see have return types with their own implementations of Future.

We will return to the definition of the trait at the end of the chapter and dig into more of how it works.

This may all feel a bit abstract so we will go into our first program: a little web scraper.

We will pass in two URLs from the command line, fetch both of them concurrently, and return the result of whichever one finishes first.

This will have a fair bit of new syntax.

Our First Async Program

To keep the focus on learning async rather than juggling parts of the ecosystem, we created the trpl crate (this is short for "The Rust Programming Language").

This re-exports all the types, traits, and functions you will need, primarily from the futures and tokio crates.

The futures crate is an official home for Rust experimentation for async code, and it is where the Future trait was originally designed.

Tokio is the most widely used async runtime in Rust today, especially for web applications.

Here we use the tokio crate under the hood for trpl because it is well tested and widely used.

In some cases trpl also renames or wraps the original APIs to keep us focused on the details relevant to this chapter.

If you want to understand in depth what the crate does, check out its source code.

You will then be able to see what crate each re-export comes from, and we have left extensive comments explaining what the crate does.

First we will build a little command line tool that fetches two web pages, pulls the <title> element from each, and prints out the title of whichever page finishes that whole process first.

Defining the page_title Function

We will start by writing a function that takes one page URL as a parameter, makes a request to it, and returns the text of the <title> element:

extern crate trpl; // required for mdbook test

fn main() {
    // TODO: we'll add this next!
}

use trpl::Html;

async fn page_title(url: &str) -> Option<String> {
    let response = trpl::get(url).await;
    let response_text = response.text().await;
    Html::parse(&response_text)
        .select_first("title")
        .map(|title_element| title_element.inner_html())
}

First we define a function named page_title and mark it with the async keyword.

We then use the trpl::get function to fetch whatever URL is passed in and add the await keyword to await the response.

To get the text of the response, we call its text method, and once again await it with the await keyword.

Both of these steps are asynchronous.

For the get function, we have to wait for the server to send back the first part of its response. This will include HTTP headers, cookies, and so on, and can be delivered separately from the response body.

Especially if the body is very large, it can take some time for it all to arrive.

Because we have to wait for the entirety of the response to arrive, the text method is also async.

Here we have to explicitly await both of these futures, because futures in Rust are lazy.

Futures will not do anything until you ask them to with the await keyword. (Rust will show a compiler warning if you don't use a future.)

This might remind you of iterators in the section Processing a Series of Items with Iterators.

Iterators do nothing unless you call their next method whether directly or by using for loops or methods such as map that use next under the hood.

Likewise, futures do nothing unless you explicitly ask them to.

This laziness allows Rust to avoid running async code until it's actually needed.
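Here is a small sketch that makes the laziness visible using only the standard library. The block_on helper, ThreadWaker, and laziness_demo are hypothetical names for this example; block_on is a simplified stand-in for a real runtime and is not how trpl works internally.

```rust
use std::cell::Cell;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the thread that is blocked in block_on.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal stand-in for a runtime: polls one future to completion.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

/// Returns whether the async block had run (before polling, after polling).
fn laziness_demo() -> (bool, bool) {
    let ran = Cell::new(false);

    // Creating the future runs none of the code inside the block.
    let fut = async {
        ran.set(true);
    };
    let ran_before_polling = ran.get();

    // Only driving the future to completion executes the body.
    block_on(fut);
    let ran_after_polling = ran.get();

    (ran_before_polling, ran_after_polling)
}
```

laziness_demo() returns (false, true): the body of the async block only runs once something polls the future.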

This is different from the behavior we saw before when using thread::spawn in Creating a New Thread with spawn, where the closure we passed to another thread started running immediately.

This is also different from how many other languages approach async.

This is important for Rust, and we will see why later.

Once we have response_text, we can then parse it into an instance of the Html type using Html::parse.

Instead of a raw string, we now have a data type we can use to work with the HTML as a richer data structure.

In particular, we can use the select_first method to find the first instance of a given CSS selector.

By passing the string "title", we get the first <title> element in the document, if there is one.

Because there may not be any matching element, select_first returns an Option<ElementRef>.

Lastly we use the Option::map method, which lets us work with the item in the Option if it is present and do nothing if it isn't.

We could also use a match expression, but map is more idiomatic.

In the body of the function we supply to map, we call inner_html on title_element to get its content, which is a String.

When it is all done, we have an Option<String>.
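To compare the two approaches, here is a small sketch with map and match side by side. The functions and the trimming step are invented for this example; they stand in for calling inner_html on the title element.

```rust
/// Uses Option::map: the closure runs only if a value is present.
fn title_with_map(element: Option<&str>) -> Option<String> {
    element.map(|text| text.trim().to_string())
}

/// The same logic written out with a match expression.
fn title_with_match(element: Option<&str>) -> Option<String> {
    match element {
        Some(text) => Some(text.trim().to_string()),
        None => None,
    }
}
```

Both versions return the same result. The map version simply avoids writing out the None => None arm by hand.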

Note that Rust's await keyword goes after the expression you are awaiting, not before it.

It is a postfix keyword.

This may differ from what you are used to if you have used async in other languages, but in Rust it makes chains of methods much nicer to work with.

As a result, we can change the body of page_title to chain the trpl::get and text function calls together with await between them:

    let response_text = trpl::get(url).await.text().await;
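If you want to try the chaining without a network connection, here is a sketch using only the standard library. FakeResponse, fake_get, and block_on are hypothetical stand-ins for trpl::Response, trpl::get, and trpl::run; they only mimic the shape of those APIs.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal stand-in for a runtime: polls one future to completion.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical response type with an async text method.
struct FakeResponse {
    body: String,
}

impl FakeResponse {
    async fn text(self) -> String {
        self.body
    }
}

/// Hypothetical async "fetch" that builds a response without any I/O.
async fn fake_get(url: &str) -> FakeResponse {
    FakeResponse {
        body: format!("<title>{url}</title>"),
    }
}

/// Two separate statements, each with its own await.
async fn body_in_two_steps(url: &str) -> String {
    let response = fake_get(url).await;
    response.text().await
}

/// The same work as one chain, thanks to postfix await.
async fn body_chained(url: &str) -> String {
    fake_get(url).await.text().await
}
```

Running block_on(body_chained("a")) and block_on(body_in_two_steps("a")) produces the same string.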

With this we have successfully written our first async function.

Before we add some code in main to call it, we will dive even deeper into what we have written and what it means.

When Rust sees a block marked with the async keyword, it compiles it into a unique, anonymous data type that implements the Future trait.

When Rust sees a function marked with async, it compiles it into a non-async function whose body is an async block.

An async function's return type is the type of the anonymous data type the compiler creates for that async block.

Writing async fn is equivalent to writing a function that returns a future of the return type.

To the compiler, a function definition such as the async fn page_title is equivalent to a non-async function defined like this:

use std::future::Future;
use trpl::Html;

fn page_title(url: &str) -> impl Future<Output = Option<String>> + '_ {
    async move {
        let text = trpl::get(url).await.text().await;
        Html::parse(&text)
            .select_first("title")
            .map(|title| title.inner_html())
    }
}

Let's go through each part of the transformed version:

  • It uses the impl Trait syntax we discussed in Ch 10 in the "Traits as Parameters" section
  • The returned trait is a Future with an associated type of Output.
    • Note that the Output type is Option<String>, which is the same as the original return type from the async fn version of page_title.
  • All of the code called in the body of the original function is wrapped in an async move block.
    • Blocks are expressions.
    • This whole block is the expression returned from the function.
  • This async block produces a value with the type Option<String>.
    • That value matches the Output type in the return type.
    • This is just like other blocks you have seen.
  • The new function body is an async move block because of how it uses the url parameter.
    • We will go into async versus async move later.
  • The new version of the function has a kind of lifetime we haven't seen before in the output type: '_
    • This is because the function returns a future that refers to a reference: in this case, the reference from the url parameter.
    • Here we need to tell Rust that we want that reference to be included.
    • We don't have to name the lifetime here because Rust is smart enough to know there is only one reference that could be involved, but we do have to be explicit that the resulting future is bound by that lifetime.
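The same equivalence can be checked without trpl. In this sketch, shout, shout_desugared, and block_on are hypothetical names made up for the example, and block_on is only a simplified stand-in for a runtime.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal stand-in for a runtime: polls one future to completion.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

/// The async fn form.
async fn shout(text: &str) -> String {
    text.to_uppercase()
}

/// Roughly the same function written by hand: a plain fn that returns a
/// future borrowing from `text`, with an async move block as its body.
fn shout_desugared(text: &str) -> impl Future<Output = String> + '_ {
    async move { text.to_uppercase() }
}
```

Both block_on(shout("hi")) and block_on(shout_desugared("hi")) produce "HI", and neither function does any work until its future is polled.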

Determining a Single Page's Title

To start, we will just get a single page.

Here we follow the same pattern as we used in Ch 12 to get command line arguments.

Then we pass the first URL to page_title and await the result.

Because the value produced by the future is an Option<String>, we use a match expression to print different messages to account for whether the page has a <title>.

extern crate trpl; // required for mdbook test

use trpl::Html;

async fn main() {
    let args: Vec<String> = std::env::args().collect();
    let url = &args[1];
    match page_title(url).await {
        Some(title) => println!("The title for {url} was {title}"),
        None => println!("{url} had no title"),
    }
}

async fn page_title(url: &str) -> Option<String> {
    let response_text = trpl::get(url).await.text().await;
    Html::parse(&response_text)
        .select_first("title")
        .map(|title_element| title_element.inner_html())
}

This will not compile.

The only place we can use the await keyword is in async functions or blocks.

Rust will not let us mark the special main function as async.

We will get this error:

error[E0752]: `main` function is not allowed to be `async`
 --> src/main.rs:6:1
  |
6 | async fn main() {
  | ^^^^^^^^^^^^^^^ `main` function is not allowed to be `async`

The reason that main can't be marked async is that async code needs a runtime: a Rust crate that manages the details of executing asynchronous code.

A program's main function can initialize a runtime but it is not a runtime itself.

(We will see why this is the case in a bit)

Every Rust program that executes async code has at least one place where it sets up a runtime that executes the futures.

Most languages that support async bundle a runtime, but Rust doesn't.

Instead, there are many different async runtimes available, each of which makes different tradeoffs suitable to the use case it targets.

For example, a high-throughput web server with many CPU cores and a large amount of RAM has very different needs than a microcontroller with a single core, a small amount of RAM, and no heap allocation ability.

The crates that provide those runtimes also often supply async versions of common functionality such as file or network I/O.

Throughout the rest of this chapter, we will use the run function from the trpl crate, which takes a future as an argument and runs it to completion.

Behind the scenes, calling run sets up a runtime that is used to run the future passed in.

Once this completes, run returns whatever value the future produced.
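To get a feel for what "runs it to completion" involves, here is a minimal sketch of a run-like function built only on the standard library. This is an assumption-laden toy, not how trpl::run is implemented (trpl uses Tokio under the hood); ThreadWaker, yield_once, and add_after_yield are hypothetical names for this example.

```rust
use std::future::{poll_fn, Future};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the thread that is blocked inside run.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Toy version of a run function: drives a single future on the
/// current thread and returns whatever value it produces.
fn run<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            // The future finished: hand its output back to the caller.
            Poll::Ready(value) => return value,
            // Not ready yet: sleep until the waker unparks this thread.
            Poll::Pending => thread::park(),
        }
    }
}

/// Reports Pending once (after requesting a wake-up), then finishes.
async fn yield_once() {
    let mut yielded = false;
    poll_fn(|cx: &mut Context<'_>| {
        if yielded {
            Poll::Ready(())
        } else {
            yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    })
    .await
}

/// Pauses at one await point before producing its result.
async fn add_after_yield(a: u32, b: u32) -> u32 {
    yield_once().await;
    a + b
}
```

Calling run(add_after_yield(1, 2)) returns 3 after polling the future twice. Real runtimes add a lot on top of this, such as scheduling many futures at once and integrating with I/O.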

We could have passed the future returned by page_title directly to run, and once it completed, we could match on the resulting Option<String> as we did before.

However, for most of the examples in the chapter (and most async code in the real world), we'll be doing more than just one async function call.

Instead we will pass an async block and explicitly await the result of the page_title call.

Here is the updated version:

fn main() {
    let args: Vec<String> = std::env::args().collect();

    trpl::run(async {
        let url = &args[1];
        match page_title(url).await {
            Some(title) => println!("The title for {url} was {title}"),
            None => println!("{url} had no title"),
        }
    })
}

Now when we run this code, we get the behavior we expected initially:

$ cargo run -- https://www.rust-lang.org
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/async_await 'https://www.rust-lang.org'`
The title for https://www.rust-lang.org was
            Rust Programming Language

Now we finally have some working async code.

But before we add the code to race the two sites against each other, we will briefly turn back to how futures work.

Each await point, every place where the code uses the await keyword, represents a place where control is handed back to the runtime.

To make this work, Rust needs to keep track of the state involved in the async block so that the runtime can kick off some other work and then come back when it is ready to try advancing the first one again.

This is an invisible state machine, as if you had written an enum like this to save the current state at each await point:

enum PageTitleFuture<'a> {
    Initial { url: &'a str },
    GetAwaitPoint { url: &'a str },
    TextAwaitPoint { response: trpl::Response },
}

Writing the code to transition between each state by hand would be tedious and error-prone, especially when you need to add more functionality and more states to the code later.

Fortunately, the Rust compiler creates and manages the state machine data structures for async code automatically.

The normal borrowing and ownership rules around data structures all still apply.

The compiler also handles checking those for us and provides useful error messages.
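To see what writing such a state machine by hand looks like, here is a deliberately tiny sketch. AddThenDouble, NoopWaker, and drive are hypothetical names for this example; the machine has one pause point and holds no borrowed data, so it is far simpler than what the compiler generates for page_title.

```rust
use std::future::Future;
use std::mem;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hand-written state machine: add one, pause, then double.
enum AddThenDouble {
    Start { input: u32 },
    Added { partial: u32 },
    Done,
}

impl Future for AddThenDouble {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // Fine here because this enum holds only plain data (it is Unpin).
        let this = self.get_mut();
        // Take the current state out, leaving Done behind as a placeholder.
        match mem::replace(this, AddThenDouble::Done) {
            AddThenDouble::Start { input } => {
                // Save progress for the next poll, then pause.
                *this = AddThenDouble::Added { partial: input + 1 };
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            AddThenDouble::Added { partial } => Poll::Ready(partial * 2),
            AddThenDouble::Done => panic!("future polled after completion"),
        }
    }
}

/// A waker that does nothing; enough for polling by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future until it is ready and returns (number of polls, output).
fn drive<F: Future + Unpin>(mut fut: F) -> (u32, F::Output) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(value) = Pin::new(&mut fut).poll(&mut cx) {
            return (polls, value);
        }
    }
}
```

drive(AddThenDouble::Start { input: 4 }) returns (2, 10): the first poll moves from Start to Added and pauses, and the second poll finishes with (4 + 1) * 2. Even this small machine needs a manual transition for every state, which is the tedium the compiler removes.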

Ultimately, something has to execute this state machine, and that something is a runtime.

(This is why you may come across references to executors when looking into runtimes: an executor is the part of a runtime responsible for executing the async code.)

You can now see why the compiler stopped us from making main itself an async function.

If main were an async function, something else would need to manage the state machine for whatever future main returned, but main is the starting point for the program.

Instead we call the trpl::run function in main to set up a runtime and run the future returned by the async block until it returns Ready.

Note: Some runtimes provide macros so you are able to write an async main function.

These macros rewrite async fn main() { ... } to be a normal fn main, which does the same thing we did by hand: it calls a function that runs a future to completion the way trpl::run does.

Now we can put these pieces together and see how we can write concurrent code.

Racing Our Two URLs Against Each Other

Here we will call page_title with two different URLs passed in from the command line and race them.

use trpl::{Either, Html};

fn main() {
    let args: Vec<String> = std::env::args().collect();

    trpl::run(async {
        let title_fut_1 = page_title(&args[1]);
        let title_fut_2 = page_title(&args[2]);

        let (url, maybe_title) =
            match trpl::race(title_fut_1, title_fut_2).await {
                Either::Left(left) => left,
                Either::Right(right) => right,
            };

        println!("{url} returned first");
        match maybe_title {
            Some(title) => println!("Its page title is: '{title}'"),
            None => println!("Its title could not be parsed."),
        }
    })
}

async fn page_title(url: &str) -> (&str, Option<String>) {
    let text = trpl::get(url).await.text().await;
    let title = Html::parse(&text)
        .select_first("title")
        .map(|title| title.inner_html());
    (url, title)
}

Here we start off by calling page_title for each of the user-supplied URLs.

We save the resulting futures as title_fut_1 and title_fut_2.

Remember, these don't do anything yet, because futures are lazy and we haven't awaited them.

Next we pass the futures to trpl::race, which returns a value to indicate which of the futures passed to it finishes first.

Note that under the hood, race is built on a more general function, select, which you will encounter more often in real-world Rust code.

A select function can do a lot of things that the trpl::race function can't, but it also has some additional complexity that we can skip over for now.

Either future can legitimately "win", so it doesn't make sense to return a Result.

Instead, race returns a trpl::Either.

This Either type is somewhat similar to a Result in that it has two cases.

Unlike Result, there is no notion of success or failure baked into Either.

Instead it uses Left and Right to indicate "one or the other":

enum Either<A, B> {
    Left(A),
    Right(B),
}

The race function returns Left with the first future's output if the first argument wins, and Right with the second future's output if that one wins.

This matches the order the arguments appear in when calling the function: the first argument is to the left of the second argument.
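To make the Left and Right behavior concrete, here is a simplified sketch of a race-like function using only the standard library. It is not the implementation of trpl::race (which is built on select); this version simply polls the left future first and then the right one each time it is polled. The block_on helper and ThreadWaker are hypothetical stand-ins for a runtime.

```rust
use std::future::{poll_fn, Future};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

#[derive(Debug, PartialEq)]
enum Either<A, B> {
    Left(A),
    Right(B),
}

/// Simplified race: resolves with the output of whichever future is
/// ready first, checking the left one before the right one.
async fn race<A, B>(left: A, right: B) -> Either<A::Output, B::Output>
where
    A: Future,
    B: Future,
{
    let mut left = Box::pin(left);
    let mut right = Box::pin(right);
    poll_fn(move |cx: &mut Context<'_>| {
        if let Poll::Ready(value) = left.as_mut().poll(cx) {
            return Poll::Ready(Either::Left(value));
        }
        if let Poll::Ready(value) = right.as_mut().poll(cx) {
            return Poll::Ready(Either::Right(value));
        }
        Poll::Pending
    })
    .await
}

struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal stand-in for a runtime: polls one future to completion.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}
```

For example, block_on(race(std::future::pending::<i32>(), std::future::ready(2))) gives Either::Right(2), because the first future never completes. If both futures are ready at once, this sketch returns Left because it checks the left future first.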

We also update page_title to return the same URL passed in.

Now if the page that returns first does not have a <title> we can resolve, we can still print a meaningful message.

With that information available, we wrap up by updating our println! output to indicate both which URL finished first and what, if any, the <title> is for the web page at that URL.

Here we have built a small working web scraper.

Now you can pick a couple of URLs and run the command line tool.

Some sites are consistently faster than others, while in other cases the faster site varies from run to run.

These are the basics of working with futures, so now we can dig deeper into what we can do with async.