RustBrock/Advanced Types.md
2025-04-16 15:44:05 -06:00

Advanced Types

The Rust type system has some features that we have mentioned so far but haven't yet discussed in detail.

To start, we will discuss newtypes in general and examine why they are useful as types.

Then we will move on to type aliases, a feature similar to newtypes but with slightly different semantics.

We will also discuss the ! type and dynamically sized types.

Using the Newtype Pattern for Type Safety and Abstraction

The newtype pattern is also useful for tasks beyond those discussed already.

This includes statically enforcing that values are never confused and indicating the units of a value.

Earlier, we saw an example of using newtypes to indicate units: recall that the Millimeters and Meters structs wrapped u32 values in a newtype.

If we wrote a function with a parameter of type Millimeters, we couldn't compile a program that accidentally tried to call that function with a value of type Meters or a plain u32.
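As a minimal sketch of that guarantee, consider the following. The functions describe_length and to_millimeters are hypothetical names introduced only for illustration.

```rust
// Newtypes wrapping a plain u32 so the two units cannot be confused.
struct Millimeters(u32);
struct Meters(u32);

// Hypothetical function: it accepts only Millimeters.
fn describe_length(len: Millimeters) -> String {
    format!("{} mm", len.0)
}

// Hypothetical helper: an explicit conversion is the only way
// to get from Meters to Millimeters.
fn to_millimeters(m: Meters) -> Millimeters {
    Millimeters(m.0 * 1000)
}

fn main() {
    println!("{}", describe_length(Millimeters(250)));
    println!("{}", describe_length(to_millimeters(Meters(2))));

    // Neither of these lines would compile (mismatched types):
    // describe_length(Meters(2));
    // describe_length(250u32);
}
```

The commented-out calls are the accidental mix-ups that the compiler rejects.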

We can also use the newtype pattern to abstract away some implementation details of a type.

The new type can expose a public API that is different from the API of the private inner type.

Newtypes can also hide internal implementation.

For example, we could provide a People type that wraps a HashMap<i32, String> storing a person's ID associated with their name.

Code using People would only interact with the public API we provide, such as a method to add a name string to the People collection; that code wouldn't need to know that we assign an i32 ID to names internally.

The newtype pattern is a lightweight way to achieve encapsulation to hide implementation details, which we discussed before in Ch18.
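One possible shape for such a People type is sketched below. The method names (new, add, contains, len) and the way IDs are assigned are assumptions made for this illustration, not part of the original description.

```rust
use std::collections::HashMap;

// The wrapped HashMap is private, so callers never see the i32 keys.
pub struct People(HashMap<i32, String>);

impl People {
    pub fn new() -> Self {
        People(HashMap::new())
    }

    // Callers supply only a name; the ID is assigned internally.
    // Assumption: entries are never removed, so the current length
    // is always an unused ID.
    pub fn add(&mut self, name: &str) {
        let id = self.0.len() as i32;
        self.0.insert(id, name.to_string());
    }

    pub fn contains(&self, name: &str) -> bool {
        self.0.values().any(|n| n == name)
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }
}

fn main() {
    let mut people = People::new();
    people.add("Ada");
    people.add("Grace");
    println!("{} people, has Ada: {}", people.len(), people.contains("Ada"));
}
```

Because the inner map is hidden, we could later change how IDs are stored without touching any code that uses People.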

Creating Type Synonyms with Type Aliases

Rust provides the ability to declare a type alias to give an existing type another name.

We need to use the type keyword to do this.

For example, we can create the alias Kilometers for i32 like this:

    type Kilometers = i32;

The alias Kilometers is a synonym for i32.

Unlike the Millimeters and Meters types we created before, Kilometers is not a separate, new type.

Values that have the type Kilometers will be treated the same as values of type i32.

    type Kilometers = i32;

    let x: i32 = 5;
    let y: Kilometers = 5;

    println!("x + y = {}", x + y);

Because Kilometers and i32 are the same type, we can add values of both types and we can pass Kilometers values to functions that take i32 parameters.

However, using this method, we don't get the type checking benefits that we get from the newtype pattern discussed earlier.

In other words, if we mix up Kilometers and i32 values somewhere, the compiler will not give us an error.
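To make the missing check concrete, here is a small sketch. The function double_distance and the apples variable are hypothetical, chosen only to show a value that is clearly not a distance being accepted.

```rust
type Kilometers = i32;

// Hypothetical function that is meant to receive a distance.
fn double_distance(d: Kilometers) -> Kilometers {
    d * 2
}

fn main() {
    let distance: Kilometers = 5;
    let apples: i32 = 7; // not a distance at all

    // Both calls compile: the alias is only another name for i32.
    println!("{}", double_distance(distance));
    println!("{}", double_distance(apples));
}
```

With a newtype such as struct Kilometers(i32), the second call would be rejected at compile time.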

The main use for type synonyms is to reduce repetition.

As an example, we might have a lengthy type like this.

Box<dyn Fn() + Send + 'static>

Writing this lengthy type in function signatures and as type annotations all over the code can be tiresome and error prone.

Just imagine a project full of code like this.

    let f: Box<dyn Fn() + Send + 'static> = Box::new(|| println!("hi"));

    fn takes_long_type(f: Box<dyn Fn() + Send + 'static>) {
        // --snip--
    }

    fn returns_long_type() -> Box<dyn Fn() + Send + 'static> {
        // --snip--
    }

A type alias makes this code more manageable by reducing the amount of repetition.

Here we have introduced an alias named Thunk for the verbose type and can replace all uses of the type with the shorter alias Thunk.

    type Thunk = Box<dyn Fn() + Send + 'static>;

    let f: Thunk = Box::new(|| println!("hi"));

    fn takes_long_type(f: Thunk) {
        // --snip--
    }

    fn returns_long_type() -> Thunk {
        // --snip--
    }

This is much easier to read and write.

Choosing a meaningful name for a type alias can help communicate your intent as well.

Thunk is a word for code to be evaluated at a later time, so it is an appropriate name for a closure that gets stored.
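The snippets above elide the function bodies. A runnable sketch using the same Thunk alias could look like this; make_counter_thunk and run_times are hypothetical helpers, and the shared counter exists only so the effect of running the thunk can be observed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

type Thunk = Box<dyn Fn() + Send + 'static>;

// Hypothetical helper: builds a thunk that bumps a shared counter when run.
fn make_counter_thunk(counter: Arc<AtomicUsize>) -> Thunk {
    Box::new(move || {
        counter.fetch_add(1, Ordering::SeqCst);
    })
}

// Hypothetical helper: runs a stored thunk a number of times.
fn run_times(f: &Thunk, times: usize) {
    for _ in 0..times {
        f();
    }
}

fn main() {
    let counter = Arc::new(AtomicUsize::new(0));
    let thunk = make_counter_thunk(Arc::clone(&counter));

    // Creating the thunk runs nothing; the closure is only stored.
    run_times(&thunk, 3);
    println!("thunk ran {} times", counter.load(Ordering::SeqCst));
}
```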

Type aliases are also commonly used with the Result<T, E> type to reduce repetition.

Consider the std::io module in the std library.

I/O operations often return a Result<T, E> to handle situations when operations fail to work.

This library has a std::io::Error struct that represents all possible I/O errors.

Many of the functions in std::io return Result<T, E> where the E is std::io::Error, such as these functions in the Write trait:

use std::fmt;
use std::io::Error;

pub trait Write {
    fn write(&mut self, buf: &[u8]) -> Result<usize, Error>;
    fn flush(&mut self) -> Result<(), Error>;

    fn write_all(&mut self, buf: &[u8]) -> Result<(), Error>;
    fn write_fmt(&mut self, fmt: fmt::Arguments) -> Result<(), Error>;
}

The Result<..., Error> is repeated a lot.

Therefore, std::io has this type alias declaration:

type Result<T> = std::result::Result<T, std::io::Error>;

Because this declaration is in the std::io module, we can use the fully qualified alias std::io::Result<T>.

That is a Result<T, E> with the E filled in as std::io::Error.

The Write trait function signatures end up looking like this.

pub trait Write {
    fn write(&mut self, buf: &[u8]) -> Result<usize>;
    fn flush(&mut self) -> Result<()>;

    fn write_all(&mut self, buf: &[u8]) -> Result<()>;
    fn write_fmt(&mut self, fmt: fmt::Arguments) -> Result<()>;
}

This type alias helps in two ways:

  • It makes code easier to write.
  • It gives us a consistent interface across all of std::io.

Because it is an alias, it is just another Result<T, E>, which means we can use any methods that work on Result<T, E> with it, as well as special syntax like the ? operator.
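As a short illustration of both points, the sketch below returns io::Result<()> and uses the ? operator on calls to Write methods. The function write_greeting is a hypothetical example, and a Vec<u8> stands in for a real output stream.

```rust
use std::io::{self, Write};

// io::Result<()> is shorthand for Result<(), io::Error>.
fn write_greeting<W: Write>(out: &mut W, name: &str) -> io::Result<()> {
    // The ? operator works because the alias is an ordinary Result.
    out.write_all(b"Hello, ")?;
    out.write_all(name.as_bytes())?;
    out.flush()?;
    Ok(())
}

fn main() -> io::Result<()> {
    // Vec<u8> implements Write, so it can serve as an in-memory sink.
    let mut buffer: Vec<u8> = Vec::new();
    write_greeting(&mut buffer, "Rust")?;
    println!("{}", String::from_utf8_lossy(&buffer));

    // Ordinary Result methods such as is_ok are available too.
    let ok = write_greeting(&mut buffer, "again").is_ok();
    println!("second write succeeded: {}", ok);
    Ok(())
}
```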

The Never Type that Never Returns

Rust has a special type named ! that is known in type theory lingo as the empty type because it has no values.

We prefer to call it the never type because it stands in the place of the return type when a function will never return.

Here is an example in use.

fn bar() -> ! {
    // --snip--
}

This code should be read as "the function bar returns never."

Functions that return never are called diverging functions.

We can't create values of the type ! so bar can never possibly return.

What is the use of a type you can never create values for?

Recall the code from Ch2, part of the number guessing game.

Here is a sample of that code:

        let guess: u32 = match guess.trim().parse() {
            Ok(num) => num,
            Err(_) => continue,
        };

Earlier, we skipped over some details about this code.

In Ch6 we discussed that match arms must all return the same type.

For example this code will not compile.

    let guess = match guess.trim().parse() {
        Ok(_) => 5,
        Err(_) => "hello",
    };

The type of guess in this code would have to be an integer and a string, and Rust requires that guess have only one type.

So what does continue return?

How are we allowed to return a u32 from one arm and have another arm that ends with continue?

continue has a ! value.

That is, when Rust computes the type of guess, it looks at both match arms, the former with a value of u32 and the latter with a ! value.

Because ! can never have a value, Rust decides that the type of guess is u32.

The formal way to describe this behavior is that expressions of type ! can be coerced into any other type.

We are allowed to end this match arm with continue because continue doesn't return a value.

Instead it moves control back to the top of the loop, so in the Err case, we never assign a value to guess.
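The same pattern can be tried outside the guessing game. In this sketch, sum_valid is a hypothetical helper that skips any input that fails to parse.

```rust
// Hypothetical helper: sums the inputs that parse as u32, skipping the rest.
fn sum_valid(inputs: &[&str]) -> u32 {
    let mut total = 0;
    for input in inputs {
        // The Ok arm has type u32; the Err arm has type ! and coerces to u32.
        let n: u32 = match input.trim().parse() {
            Ok(num) => num,
            // Jumps to the next iteration, so n is never assigned here.
            Err(_) => continue,
        };
        total += n;
    }
    total
}

fn main() {
    // "abc" and "-1" are not valid u32 values, so they are skipped.
    let inputs = ["10", "abc", " 32 ", "-1"];
    println!("{}", sum_valid(&inputs));
}
```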

The never type is useful with the panic! macro as well.

Recall the unwrap function that we call on Option<T> values to produce a value or panic. Here is its definition:

impl<T> Option<T> {
    pub fn unwrap(self) -> T {
        match self {
            Some(val) => val,
            None => panic!("called `Option::unwrap()` on a `None` value"),
        }
    }
}

Here, the same thing happens as in the match case from before.

Rust sees that val has the type T and panic! has the type !, so the result of the overall match expression is T.

This works because panic! doesn't produce a value; it ends the program.

In the None case, we will not be returning a value from unwrap, so this code is valid.
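The same reasoning applies to a diverging function we write ourselves. In this sketch, fail and parse_port are hypothetical names; fail always panics, so a call to it can sit in a match arm where a u32 is expected.

```rust
// Hypothetical diverging function: it always panics, so it never returns.
fn fail(reason: &str) -> ! {
    panic!("fatal: {}", reason);
}

// The Ok arm has type u32; the call to fail has type !, which coerces to u32.
fn parse_port(text: &str) -> u32 {
    match text.parse() {
        Ok(port) => port,
        Err(_) => fail("port must be a number"),
    }
}

fn main() {
    println!("{}", parse_port("8080"));
    // parse_port("abc") would panic instead of returning a value.
}
```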

One final expression that has the type ! is a loop.

    print!("forever ");

    loop {
        print!("and ever ");
    }

This loop never ends, so ! is the type of the expression.

However, this wouldn't be true if we included a break, because the loop would terminate when it got to the break.
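For contrast, here is a sketch of a loop that does contain a break. The function first_power_of_two_at_least is a hypothetical example; because the loop exits with break value, the loop expression has type u32 rather than !.

```rust
// Hypothetical helper: returns the first power of two that is >= limit.
// Assumption: limit is small enough that doubling does not overflow u32.
fn first_power_of_two_at_least(limit: u32) -> u32 {
    let mut value = 1;
    // Because of `break value`, this loop expression has type u32, not !.
    loop {
        if value >= limit {
            break value;
        }
        value *= 2;
    }
}

fn main() {
    println!("{}", first_power_of_two_at_least(100));
}
```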

Dynamically Sized Types and the Sized Trait

Rust must know certain details about its types, such as how much space to allocate for a value of a particular type.

This leaves one corner of its type system a little confusing at first: the concept of dynamically sized types.

Sometimes referred to as DSTs or unsized types, these types let us write code using values whose size we can know only at runtime.

Let's look into the details of a dynamically sized type called str, which we have been using throughout.

That's right: not &str, but str on its own, is a DST.

We can't know how long the string is until runtime, meaning we can't create a variable of type str, nor can we take an argument of type str.

Consider this code, which will not compile.

    let s1: str = "Hello there!";
    let s2: str = "How's it going?";

Rust needs to know how much memory to allocate for any value of a particular type, and all values of a type must use the same amount of memory.

If Rust allowed us to write this code, these two str values would need to take up the same amount of memory.

But these two have different lengths:

  • s1 needs 12 bytes of storage.
  • s2 needs 15.

This is why it is not possible to create a variable holding a dynamically sized type.

So what should we do?

We should make the types of s1 and s2 a &str rather than a str.

Recall from the "String Slices" section in Ch4 that the slice data structure just stores the starting position and the length of the slice.

Even though a &T is a single value that stores the memory address of where the T is located, a &str is two values: the address of the str and its length.

We can know the size of a &str value at compile time: it's twice the size of a usize.

This means we always know the size of a &str, no matter how long the string it refers to is.
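We can check this claim with std::mem::size_of. The following is a small sketch; the variable names are illustrative, and the exact byte counts depend on the target platform, so the code only compares sizes with each other.

```rust
use std::mem::{size_of, size_of_val};

fn main() {
    // A &str holds an address plus a length: two usize-sized values.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    // A reference to a Sized type is a single address.
    assert_eq!(size_of::<&u8>(), size_of::<usize>());

    let short: &str = "Hello there!";
    let long: &str = "How's it going?";

    // The references have the same size even though the strings differ in length.
    assert_eq!(size_of_val(&short), size_of_val(&long));
    println!("string lengths: {} and {}", short.len(), long.len());
}
```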

Generally, this is the way in which dynamically sized types are used in Rust: they have an extra bit of metadata that stores the size of the dynamic information.

The golden rule of dynamically sized types is that we must always put values of dynamically sized types behind a pointer of some kind.

We can combine str with all kinds of pointers.

For example Box<str> or Rc<str>.
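A brief sketch of those two combinations follows; the variable names are illustrative. Both Box<str> and Rc<str> can be built from a &str with From.

```rust
use std::rc::Rc;

fn main() {
    // An owned, heap-allocated str behind a Box.
    let boxed: Box<str> = Box::from("Hello there!");

    // A reference-counted str that several owners can share.
    let shared: Rc<str> = Rc::from("How's it going?");
    let also_shared = Rc::clone(&shared);

    println!("{} ({} bytes)", boxed, boxed.len());
    println!("{} ({} owners)", also_shared, Rc::strong_count(&shared));
}
```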

In fact, we have seen this before but with a different dynamically sized type: traits.

Every trait is a dynamically sized type we can refer to by using the name of the trait.

In Ch18 in "Using Trait Objects That Allow for Values of Different Types", we mentioned that to use traits as trait objects, we must put them behind a pointer, such as &dyn Trait or Box<dyn Trait> (Rc<dyn Trait> would work as well).
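The three pointer kinds can be seen side by side in this sketch. The Shape trait, the Square and Rect types, and total_area are all hypothetical, introduced only to have something to put behind the pointers.

```rust
use std::rc::Rc;

// Hypothetical trait and implementors, used only for illustration.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// dyn Shape is unsized, so the slice stores Box<dyn Shape> pointers.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let by_ref: &dyn Shape = &Square(2.0);
    let boxed: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Rect(1.0, 3.0))];
    let shared: Rc<dyn Shape> = Rc::new(Rect(2.0, 2.5));

    println!("{} {} {}", by_ref.area(), total_area(&boxed), shared.area());
}
```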

To work with DSTs, Rust provides the Sized trait to determine whether or not a type's size is known at compile time.

This trait is automatically implemented for everything whose size is known at compile time.

Additionally Rust implicitly adds a bound on Sized to every generic function.

That is, a generic function definition like this:

fn generic<T>(t: T) {
    // --snip--
}

is actually treated as though we had written this:

fn generic<T: Sized>(t: T) {
    // --snip--
}

By default, generic functions will work only on types that have a known size at compile time.

However, you can use the following special syntax to relax this restriction.

fn generic<T: ?Sized>(t: &T) {
    // --snip--
}

A trait bound on ?Sized means "T may or may not be Sized".

This notation overrides the default that generic types must have a known size at compile time.

The ?Trait syntax with this meaning is only available for Sized, not any other traits.

Note that we switched the type of the t parameter from T to &T.

Because the type might not be Sized, we need to use it behind some kind of pointer.

Here we have chosen to use a reference.
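To see the relaxed bound in use, here is a sketch with a hypothetical describe function. The extra Debug bound is an assumption added so the function has something to do with its argument.

```rust
use std::fmt::Debug;

// T may be unsized, so it is taken by reference.
fn describe<T: ?Sized + Debug>(t: &T) -> String {
    format!("{:?}", t)
}

fn main() {
    println!("{}", describe("hi")); // T = str, an unsized type
    println!("{}", describe(&[1, 2, 3][..])); // T = [i32], also unsized
    println!("{}", describe(&42)); // T = i32, an ordinary Sized type
}
```

Without ?Sized, the first two calls would be rejected because str and [i32] do not implement Sized.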