
# Unsafe Rust
So far the code we have discussed has had Rust's memory safety guarantees enforced at compile time.

Rust has a second language hidden inside it that doesn't enforce memory safety guarantees.

This is called *unsafe Rust*, and it works just like regular Rust but gives us extra superpowers.

Unsafe Rust exists because static analysis is conservative.

When the compiler tries to determine whether or not code upholds the guarantees, it is better for it to reject some valid programs than to accept some invalid programs.

Even if the code *might* be okay, if the compiler doesn't have enough information to be confident, it will reject the code.

In these cases you can use unsafe code to tell the compiler, "Trust me, I know what I'm doing."

Be warned that you use unsafe Rust at your own risk.

If you use unsafe code incorrectly, problems can occur due to memory unsafety, such as null pointer dereferencing.

Another reason Rust has an unsafe alter ego is that the underlying computer hardware is inherently unsafe.

If Rust didn't allow us to do unsafe operations, you couldn't do certain tasks.

Rust needs to allow you to do low-level systems programming, such as directly interacting with the operating system or even writing your own operating system.

Working with low-level systems programming is one of the goals of the language.

## Unsafe Superpowers
To switch to unsafe Rust, you need to use the `unsafe` keyword and then start a new block that holds the unsafe code.

You can do five additional actions in unsafe Rust that you can't do in safe Rust, which we call *unsafe superpowers*.

These superpowers are:
- Dereference a raw pointer
- Call an unsafe function or method
- Access or modify a mutable static variable
- Implement an unsafe trait
- Access fields of a `union`

Note: `unsafe` does not turn off the borrow checker or disable any of Rust's other safety checks.

If you use a reference in unsafe code, it will still be checked.

The `unsafe` keyword only gives access to these five features that are then not checked by the compiler for memory safety.

You will still get some degree of safety inside of an unsafe block.

Additionally, `unsafe` does not mean the code inside the block is necessarily dangerous or that it will definitely have memory safety problems.

The intent is that you, as the programmer, will ensure the code inside an `unsafe` block accesses memory in a valid way.

People are fallible, and mistakes will happen, but by requiring these five unsafe operations to be inside blocks annotated with `unsafe`, you will know that any errors related to memory safety must be within an `unsafe` block.

Keep `unsafe` blocks small: you will be thankful later when you investigate memory bugs.

To isolate unsafe code as much as possible, it is best to enclose it within a safe abstraction and provide a safe API (this will be covered when we examine unsafe functions and methods).

Parts of the std library are implemented as safe abstractions over unsafe code that has been audited.

Wrapping unsafe code in a safe abstraction prevents uses of `unsafe` from leaking out into all the places that you want to use the functionality implemented with `unsafe` code, because using a safe abstraction is safe.
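As a small illustration of that idea, here is a minimal sketch of a safe abstraction. The function name `ascii_to_str` is hypothetical; it wraps the real but unsafe `std::str::from_utf8_unchecked` behind a check that upholds its contract, so callers never write `unsafe` themselves:

```rust
/// Hypothetical safe wrapper: returns `bytes` as a `&str` if every byte is ASCII.
fn ascii_to_str(bytes: &[u8]) -> Option<&str> {
    if bytes.is_ascii() {
        // SAFETY: ASCII bytes are always valid UTF-8, which is exactly the
        // precondition `from_utf8_unchecked` requires.
        Some(unsafe { std::str::from_utf8_unchecked(bytes) })
    } else {
        None
    }
}

fn main() {
    // Callers use a plain safe function; the check inside keeps the promise.
    assert_eq!(ascii_to_str(b"hello"), Some("hello"));
    assert_eq!(ascii_to_str(&[0xFF]), None);
}
```

In real code the checked `std::str::from_utf8` would usually be the simpler choice; the point here is only the shape of the abstraction: a small `unsafe` block guarded by a check inside a safe function.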

Now we will look into each of the five unsafe superpowers.

We will also look at some abstractions that provide a safe interface to unsafe code.

## Dereferencing a Raw Pointer

We mentioned before that the compiler ensures references are always valid.

Unsafe Rust has two new types called *raw pointers* that are similar to references.

As with references, raw pointers can be immutable or mutable and are written as `*const T` and `*mut T`, respectively.

The asterisk isn't the dereference operator; it is part of the type name.

In the context of raw pointers, *immutable* means that the pointer can't be directly assigned to after being dereferenced.
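To make the type names and the meaning of *immutable* concrete, here is a minimal sketch using the raw borrow operators introduced below. It writes through a `*mut i32` and reads through a `*const i32` derived from the same pointer:

```rust
fn main() {
    let mut num = 5;

    // The `*` in `*mut i32` and `*const i32` is part of the type name.
    let p: *mut i32 = &raw mut num;
    // A `*mut T` coerces to a `*const T`; `q` points at the same `num`.
    let q: *const i32 = p;

    unsafe {
        // Assigning through a `*mut` pointer is allowed (inside `unsafe`).
        *p = 10;
        // `*q = 20;` would not compile: a `*const` pointer can't be assigned through.
        assert_eq!(*q, 10);
    }

    assert_eq!(num, 10);
}
```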

Different from references and smart pointers, raw pointers:
- Are allowed to ignore the borrowing rules by having both immutable and mutable pointers or multiple mutable pointers to the same location
- Aren't guaranteed to point to valid memory
- Are allowed to be null
- Don't implement any automatic cleanup

By opting out of Rust's enforced guarantees, you can give up safety in exchange for greater performance or the ability to interface with another language or hardware where Rust's guarantees don't apply.

Here is an example of how to create an immutable and a mutable raw pointer.
```rust
fn main() {
    let mut num = 5;

    let r1 = &raw const num;
    let r2 = &raw mut num;
}
```

Note that we don't include the `unsafe` keyword here.

We can create raw pointers in safe code; we just can't dereference raw pointers outside an unsafe block (as we will see shortly).

We have created raw pointers by using the raw borrow operators.

`&raw const num` creates a `*const i32` immutable raw pointer.

`&raw mut num` creates a `*mut i32` mutable raw pointer.

Because we created them directly from a local variable, we know these particular raw pointers are valid, but we can't make that assumption about just any raw pointer.

To demonstrate this, we will create a raw pointer whose validity we can't be so certain of, using `as` to cast a value instead of using the raw borrow operators.

This shows how to create a raw pointer to an arbitrary location in memory.

Trying to use arbitrary memory is undefined: there might be data at that address or there might not, the compiler might optimize the code so there is no memory access, or the program might error with a segmentation fault.

Usually there is no good reason to write code like this, especially in cases where you can use a raw borrow operator instead, but it is possible.
```rust
fn main() {
    // An arbitrary address: nothing guarantees valid data lives here.
    let address = 0x012345usize;
    let r = address as *const i32;
}
```
Remember that we can create raw pointers in safe code, but we can't *dereference* raw pointers and read the data being pointed to.

Here we use the dereference operator `*` on a raw pointer, which requires an `unsafe` block.
```rust
fn main() {
    let mut num = 5;

    let r1 = &raw const num;
    let r2 = &raw mut num;

    unsafe {
        println!("r1 is: {}", *r1);
        println!("r2 is: {}", *r2);
    }
}
```
Creating a pointer does no harm; it is only when we attempt to access the value that it points at that we might end up dealing with an invalid value.

Note that in the first and third examples, we created `*const i32` and `*mut i32` raw pointers that both pointed to the same memory location, where `num` is stored.

If we instead tried to create an immutable and a mutable reference to `num`, the code would not have compiled because Rust's ownership rules don't allow a mutable reference at the same time as any immutable references.

With raw pointers, we can create a mutable pointer and an immutable pointer to the same location and change data through the mutable pointer, potentially creating a data race.

With all of the dangers, why ever use raw pointers?

One major use case is when interfacing with C code, as you will see in the next section, ["Calling an Unsafe Function or Method"]().

Another case is when building up safe abstractions that the borrow checker doesn't understand.

We will introduce unsafe functions and then look at an example of a safe abstraction that uses unsafe code.

## Calling an Unsafe Function or Method
The second type of superpower that you can perform in an unsafe block is calling unsafe functions.

Unsafe functions and methods look exactly like regular functions and methods, but they have an extra `unsafe` before the rest of the definition.

The `unsafe` keyword indicates the function has requirements we need to uphold when we call this function, because Rust can't guarantee we have met these requirements.

By calling an unsafe function within an `unsafe` block, we are saying that we have read this function's documentation and take responsibility for upholding the function's contracts.

This code is an unsafe function named `dangerous` that doesn't do anything in its body:
```rust
unsafe fn dangerous() {}

fn main() {
    unsafe {
        dangerous();
    }
}
```
We must call the `dangerous` function within a separate `unsafe` block.

If we attempted to call `dangerous` without the `unsafe` block, we would get this error:
```
$ cargo run
   Compiling unsafe-example v0.1.0 (file:///projects/unsafe-example)
error[E0133]: call to unsafe function `dangerous` is unsafe and requires unsafe function or block
 --> src/main.rs:4:5
  |
4 |     dangerous();
  |     ^^^^^^^^^^^ call to unsafe function
  |
  = note: consult the function's documentation for information on how to avoid undefined behavior

For more information about this error, try `rustc --explain E0133`.
error: could not compile `unsafe-example` (bin "unsafe-example") due to 1 previous error
```
With the `unsafe` block, we are asserting to Rust that we have read this function's documentation, we understand how to use it properly, and we have verified that we are fulfilling the contract of the function.

To perform unsafe operations in the body of an unsafe function you still need to use an `unsafe` block just as within a regular function, and the compiler will warn you if you forget.

This helps to keep `unsafe` blocks as small as possible, as unsafe operations may not be needed across the whole function body.
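Here is a minimal sketch of an unsafe function whose body still wraps its one unsafe operation in an `unsafe` block. The name `read_unchecked` is hypothetical; it relies on the real slice method `get_unchecked`. In the 2024 edition, leaving out the inner block triggers the warn-by-default `unsafe_op_in_unsafe_fn` lint:

```rust
/// Hypothetical helper: returns `values[index]` without a bounds check.
///
/// SAFETY: the caller must guarantee that `index < values.len()`.
unsafe fn read_unchecked(values: &[i32], index: usize) -> i32 {
    // This body is already inside an `unsafe fn`, but the unsafe call still
    // gets its own small `unsafe` block.
    // SAFETY: the caller guarantees `index` is in bounds.
    unsafe { *values.get_unchecked(index) }
}

fn main() {
    let values = [10, 20, 30];
    // SAFETY: 2 < values.len(), which is 3.
    let last = unsafe { read_unchecked(&values, 2) };
    assert_eq!(last, 30);
}
```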

### Creating a Safe Abstraction over Unsafe Code
Just because a function contains unsafe code doesn't mean we need to mark the entire function as unsafe.

In fact, wrapping unsafe code in a safe function is a common abstraction.

For example, let's take a look at the `split_at_mut` function from the std library, which requires some unsafe code.

We will also explore how we might implement it.

This safe method is defined on mutable slices: it takes one slice and makes it two by splitting the slice at the index given as an argument.

Here is how to use `split_at_mut`:
```rust
fn main() {
    let mut v = vec![1, 2, 3, 4, 5, 6];

    let r = &mut v[..];

    let (a, b) = r.split_at_mut(3);

    assert_eq!(a, &mut [1, 2, 3]);
    assert_eq!(b, &mut [4, 5, 6]);
}
```
We can't implement this function using only safe Rust.

An attempt might look something like this, which will not compile.

For simplicity we will implement `split_at_mut` as a function rather than a method and only for slices of `i32` values rather than for a generic type `T`.
```rust
fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();

    assert!(mid <= len);

    (&mut values[..mid], &mut values[mid..])
}
```
This function first gets the total length of the slice.

Then it asserts that the index given as a parameter is within the slice by checking whether it is less than or equal to the length.

This assertion means that if we pass an index that is greater than the length to split the slice at, the function will panic before it attempts to use that index.

Next we return two mutable slices in a tuple, one from the start of the original slice to the `mid` index and another from `mid` to the end of the slice.

We get this compilation error:
```
$ cargo run
   Compiling unsafe-example v0.1.0 (file:///projects/unsafe-example)
error[E0499]: cannot borrow `*values` as mutable more than once at a time
 --> src/main.rs:6:31
  |
1 | fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
  |                         - let's call the lifetime of this reference `'1`
...
6 |     (&mut values[..mid], &mut values[mid..])
  |     --------------------------^^^^^^--------
  |     |     |                   |
  |     |     |                   second mutable borrow occurs here
  |     |     first mutable borrow occurs here
  |     returning this value requires that `*values` is borrowed for `'1`
  |
  = help: use `.split_at_mut(position)` to obtain two mutable non-overlapping sub-slices

For more information about this error, try `rustc --explain E0499`.
error: could not compile `unsafe-example` (bin "unsafe-example") due to 1 previous error
```
Rust's borrow checker can't understand that we are borrowing different parts of the slice.

It only knows that we are borrowing from the same slice twice.

Borrowing different parts of a slice is fundamentally okay because the two slices aren't overlapping, but Rust isn't smart enough to know this.

When we know code is ok, but Rust doesn't, it is time to reach for unsafe code.

The code below shows how to use an `unsafe` block, a raw pointer, and some calls to unsafe functions to make the implementation of `split_at_mut` work.
```rust
use std::slice;

fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();

    assert!(mid <= len);

    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}
```
Remember from ["The Slice Type"]() in Chapter 4 that a slice is a pointer to some data and the length of the slice.

We can use the `len` method to get the length of a slice and the `as_mut_ptr` method to access the raw pointer of a slice.

In this case because we have a mutable slice to `i32` values, `as_mut_ptr` returns a raw pointer with the type `*mut i32`, which we have stored in the variable `ptr`.

We keep the assertion that the `mid` index is within the slice.

Next we get to the unsafe code: the `slice::from_raw_parts_mut` function takes a raw pointer and a length, and it creates a slice.

We use this function to create a slice that starts from `ptr` and is `mid` items long.

Next we call the `add` method on `ptr` with `mid` as an argument to get a raw pointer that starts at `mid`, and we create a slice using that pointer and the remaining number of items after `mid` as the length.

The function `slice::from_raw_parts_mut` is unsafe because it takes a raw pointer and must trust that this pointer is valid.

The `add` method on raw pointers is also unsafe, because it must trust that the offset location is also a valid pointer.

We have to put an `unsafe` block around our calls to `slice::from_raw_parts_mut` and `add` so we can call them.

By looking at the code and by adding the assertion that `mid` must be less than or equal to `len`, we can tell that all the raw pointers used within the `unsafe` block will be valid pointers to data within the slice.

This is an acceptable and appropriate use of `unsafe`.

Notice that we don't need to mark the resulting `split_at_mut` function as `unsafe`.

We can call this function from safe Rust.

We have created a safe abstraction to the unsafe code with an implementation of the function that uses `unsafe` code in a safe way, because it creates only valid pointers from the data this function has access to.
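The same implementation works for any element type, not just `i32`. Here is a minimal generic sketch of the function above (still a free function, not the standard library's method), with a `SAFETY` comment stating why the two slices are valid and disjoint:

```rust
use std::slice;

fn split_at_mut<T>(values: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();

    assert!(mid <= len);

    // SAFETY: `mid <= len`, so `[0, mid)` and `[mid, len)` are both in bounds
    // and do not overlap, and both come from the same live `&mut [T]`.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut words = vec![String::from("a"), String::from("b"), String::from("c")];

    // Two simultaneous mutable borrows of disjoint parts of `words`.
    let (left, right) = split_at_mut(&mut words[..], 1);
    left[0].push('!');
    right[1].push('?');

    assert_eq!(words, ["a!", "b", "c?"]);
}
```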

By contrast, the use of `slice::from_raw_parts_mut` here would likely crash when the slice is used.

This code takes an arbitrary memory location and creates a slice 10,000 items long.
```rust
use std::slice;

fn main() {
    let address = 0x01234usize;
    let r = address as *mut i32;

    // Undefined behavior: we don't own this memory, so this slice is invalid.
    let values: &[i32] = unsafe { slice::from_raw_parts_mut(r, 10000) };
}
```
We don't own the memory at this arbitrary location and there is no guarantee that the slice this code creates contains valid `i32` values.

Attempting to use `values` as though it's a valid slice results in undefined behavior.

## Using `extern` Functions to Call External Code
Sometimes, your Rust code might need to interact with code written in another language.

To enable this, Rust has the keyword `extern` that facilitates the creation and use of a *Foreign Function Interface (FFI)*.

An FFI is a way for a programming language to define functions and enable a different (foreign) programming language to call those functions.

This code demonstrates how to set up an integration with the `abs` function from the C standard library.

Functions declared within `extern` blocks are usually unsafe to call from Rust code, so they must also be marked `unsafe`.

The reason is that other languages don't enforce Rust's rules and guarantees, and Rust can't check them, so the responsibility falls on the programmer to ensure safety.
```rust
unsafe extern "C" {
    fn abs(input: i32) -> i32;
}

fn main() {
    unsafe {
        println!("Absolute value of -3 according to C: {}", abs(-3));
    }
}
```
Within the `unsafe extern "C"` block, we list the names and signatures of external functions from another language we want to call.

The `"C"` part defines which *application binary interface (ABI)* the external function uses: the ABI defines how to call the function at the assembly level.

The `"C"` ABI is the most common and follows the C programming language's ABI.

This particular function does not have any memory safety considerations.

In fact, we know that any call to `abs` will always be safe for any `i32`, so we can use the `safe` keyword to say that this specific function is safe to call even though it is in an `unsafe extern` block.

Once we make this change, calling it no longer requires an `unsafe` block.

Here is the updated code:
```rust
unsafe extern "C" {
    safe fn abs(input: i32) -> i32;
}

fn main() {
    println!("Absolute value of -3 according to C: {}", abs(-3));
}
```
Marking a function as `safe` does not inherently make it safe.

Instead, it is like a promise you are making to Rust that it *is* safe.

It is still your responsibility to make sure that the promise is kept.
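In contrast to `abs`, a C function that takes a pointer cannot honestly be marked `safe`. Here is a minimal sketch, assuming the platform's C library (which the Rust standard library already links on common targets) provides `strlen`. It declares `strlen` without `safe` and wraps the call in a safe Rust function; `c_string_len` is a hypothetical name. A `&CStr` always points to a NUL-terminated string, which is exactly the contract `strlen` needs:

```rust
use std::ffi::{c_char, CStr};

unsafe extern "C" {
    // Not marked `safe`: the caller must pass a pointer to a valid,
    // NUL-terminated string, and Rust cannot check that.
    fn strlen(s: *const c_char) -> usize;
}

/// Hypothetical safe wrapper around `strlen`.
fn c_string_len(s: &CStr) -> usize {
    // SAFETY: `s.as_ptr()` points to a NUL-terminated string that stays
    // alive for the duration of this call.
    unsafe { strlen(s.as_ptr()) }
}

fn main() {
    // `c"hello"` is a C string literal of type `&CStr`.
    let len = c_string_len(c"hello");
    assert_eq!(len, 5);
    println!("strlen says: {len}");
}
```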

### Calling Rust Functions from Other Languages
You can also use `extern` to create an interface that allows other languages to call Rust functions.

Instead of creating a whole `extern` block, we add the `extern` keyword and specify the ABI to use before the `fn` keyword for the relevant function.

Additionally, we need to add a `#[unsafe(no_mangle)]` annotation to tell the Rust compiler not to mangle the name of this function.

*Mangling* is when a compiler changes the name we have given a function to a different name that contains more information for other parts of the compilation process to consume but is less human readable.

Every programming language compiler mangles names slightly differently, so for a Rust function to be nameable by other languages, we must disable the Rust compiler's name mangling.

This is unsafe because there might be name collisions across libraries without the built-in mangling, so it is our responsibility to make sure the name we export is safe to export without mangling.

Here we make the `call_from_c` function accessible from C code, after it is compiled to a shared library and linked from C:
```rust
#[unsafe(no_mangle)]
pub extern "C" fn call_from_c() {
    println!("Just called a Rust function from C!");
}
```
This usage of `extern` does not require an `unsafe` block; the only `unsafe` is in the `#[unsafe(no_mangle)]` attribute.
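A slightly fuller sketch: `add_i32` is a hypothetical exported function that takes and returns C-compatible integers. To build it as a library that C can link against, the crate would typically set `crate-type = ["cdylib"]` in Cargo.toml. Note that it can also be called from Rust like any other function, with no `unsafe` block:

```rust
/// Hypothetical exported function. Built as a `cdylib`, C code could declare
/// it as `int32_t add_i32(int32_t a, int32_t b);` and call it.
#[unsafe(no_mangle)]
pub extern "C" fn add_i32(a: i32, b: i32) -> i32 {
    // `wrapping_add` avoids an overflow panic inside an `extern "C"` function.
    a.wrapping_add(b)
}

fn main() {
    // Calling it from Rust needs no `unsafe` block.
    assert_eq!(add_i32(2, 3), 5);
}
```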

## Accessing or Modifying a Mutable Static Variable
So far we have not talked about *global variables*, which Rust does support but which can be problematic with Rust's ownership rules.

If two threads are accessing the same mutable global variable, it can cause a data race.

In Rust, global variables are called *static* variables.

Here is an example of declaring and using a static variable with a string slice as its value:
```rust
static HELLO_WORLD: &str = "Hello, world!";

fn main() {
    println!("name is: {HELLO_WORLD}");
}
```
Static variables are similar to constants, which we discussed in the ["Constants"]() section in Chapter 3.

The names of static variables are in `SCREAMING_SNAKE_CASE` by convention.

Static variables can only store references with the `'static` lifetime.

This means the Rust compiler can figure out the lifetime and we aren't required to annotate it explicitly.

Accessing an immutable static variable is safe.

A subtle difference between constants and immutable static variables is that values in a static variable have a fixed address in memory.

Using the value will always access the same data.

Constants, on the other hand, are allowed to duplicate their data whenever they are used.
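The fixed-address difference can be observed directly. In this minimal sketch (the names `LIMIT_STATIC`, `LIMIT_CONST`, and `static_addr` are hypothetical), every use of the static refers to the same memory, while the constant makes no such promise, so the code deliberately compares only the constant's value:

```rust
use std::ptr;

// Hypothetical names for illustration.
static LIMIT_STATIC: u32 = 100;
const LIMIT_CONST: u32 = 100;

fn static_addr() -> *const u32 {
    // Every reference to a static points at its single, fixed location.
    &LIMIT_STATIC
}

fn main() {
    // Same static, same address, no matter where the reference is taken.
    assert!(ptr::eq(static_addr(), &LIMIT_STATIC));

    // A const is substituted at each use, so its value is the same but its
    // address is not guaranteed to be; we only compare the values.
    let from_const: &u32 = &LIMIT_CONST;
    assert_eq!(*from_const, LIMIT_STATIC);
}
```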

Another difference is that static variables can be mutable.

Accessing and modifying mutable static variables is *unsafe*.

The following code shows how to declare, access, and modify a mutable static variable named `COUNTER`:
```rust
static mut COUNTER: u32 = 0;

/// SAFETY: Calling this from more than a single thread at a time is undefined
/// behavior, so you *must* guarantee you only call it from a single thread at
/// a time.
unsafe fn add_to_count(inc: u32) {
    unsafe {
        COUNTER += inc;
    }
}

fn main() {
    unsafe {
        // SAFETY: This is only called from a single thread in `main`.
        add_to_count(3);
        println!("COUNTER: {}", *(&raw const COUNTER));
    }
}
```
Just like with regular variables, we specify mutability using the `mut` keyword.

Any code that reads or writes from `COUNTER` must be within an `unsafe` block.

The code above compiles and prints `COUNTER: 3` as expected because it is single threaded.

Having multiple threads access `COUNTER` would likely result in data races, so it is undefined behavior.

Due to this we need to mark the entire function as `unsafe` and document the safety limitation, so anyone calling this function knows what they are and are not allowed to do safely.

Whenever we write an unsafe function, it is idiomatic to write a comment starting with `SAFETY` and explaining what the caller needs to do to call the function safely.

Also whenever we perform an unsafe operation it is idiomatic to write a comment starting with `SAFETY` to explain how the safety rules are upheld.

Additionally, the compiler will not allow you to create references to a mutable static variable.

You can only access it via a raw pointer, created with one of the raw borrow operators.

This includes cases where the reference would be created invisibly, such as by passing `COUNTER` directly to `println!`; that is why the code above reads it through `&raw const`.

The requirement that references to static mutable variables can only be created via raw pointers helps make the safety requirements for using them more obvious.

With mutable data that is globally accessible, it is difficult to ensure that there are no data races, which is why Rust considers mutable static variables to be unsafe.

Where possible, it is preferable to use concurrency techniques and thread-safe smart pointers, so that the compiler checks that data accessed from different threads is handled safely.
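As one example of that alternative, here is a sketch of the `COUNTER` example rewritten with an atomic integer from `std::sync::atomic` (a swapped-in technique, not something introduced in this section). Because `AtomicU32` can be updated through a shared reference, an immutable `static` of that type can be modified from many threads with no `unsafe` at all:

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// An immutable static holding an atomic: updates go through `&self` methods,
// so no `static mut` and no `unsafe` are needed.
static COUNTER: AtomicU32 = AtomicU32::new(0);

fn add_to_count(inc: u32) {
    // `fetch_add` is a single atomic read-modify-write, so no update is lost.
    COUNTER.fetch_add(inc, Ordering::Relaxed);
}

fn main() {
    let handles: Vec<_> = (0..4).map(|_| thread::spawn(|| add_to_count(3))).collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Joining the threads makes all of their updates visible here.
    println!("COUNTER: {}", COUNTER.load(Ordering::Relaxed)); // prints "COUNTER: 12"
}
```

`Ordering::Relaxed` is enough here because each increment is atomic and joining the threads already orders their updates before the final load.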

## Implementing an Unsafe Trait
We can also use `unsafe` to implement an unsafe trait.

A trait is unsafe when at least one of its methods has some invariant that the compiler can't verify.

We declare that a trait is `unsafe` by adding the `unsafe` keyword before `trait` and marking the implementation of the trait as `unsafe` too.

Here is an example of this:
```rust
unsafe trait Foo {
    // methods go here
}

unsafe impl Foo for i32 {
    // method implementations go here
}

fn main() {}
```
By using `unsafe impl`, we are promising that we will uphold the invariants that the compiler can't verify.

As an example, recall the `Sync` and `Send` marker traits we discussed in ["Extensible Concurrency with the `Sync` and `Send` Traits"]() section.

The compiler implements these traits automatically if our types are composed entirely of `Send` and `Sync` types.

If we implement a type that contains a type that is not `Send` or `Sync`, such as raw pointers, and we want to mark that type as `Send` or `Sync`, we must use `unsafe`.

Rust can't verify that our type upholds the guarantees that it can be safely sent across threads or accessed from multiple threads.

We need to do these checks manually and indicate as such with `unsafe`.
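Here is a minimal sketch of that situation. `SendPtr` and `bump_on_other_thread` are hypothetical names. Because raw pointers are not `Send`, the compiler will not let the wrapper cross a thread boundary until we write `unsafe impl Send` and take on the proof obligation ourselves:

```rust
use std::thread;

/// Hypothetical wrapper around a raw pointer; not `Send` automatically.
struct SendPtr(*mut i32);

impl SendPtr {
    fn get(&self) -> *mut i32 {
        self.0
    }
}

// SAFETY: the pointer is only dereferenced on one thread at a time (the
// spawning thread waits on `join` before touching it again), and the
// allocation outlives every use.
unsafe impl Send for SendPtr {}

fn bump_on_other_thread(value: i32) -> i32 {
    let raw = Box::into_raw(Box::new(value));
    let ptr = SendPtr(raw);

    let handle = thread::spawn(move || {
        // SAFETY: `raw` came from `Box::into_raw`, and no other thread uses it now.
        unsafe { *ptr.get() += 1 };
    });
    handle.join().unwrap();

    // SAFETY: reclaim ownership exactly once, after the other thread finished.
    let boxed = unsafe { Box::from_raw(raw) };
    *boxed
}

fn main() {
    assert_eq!(bump_on_other_thread(41), 42);
}
```

The closure calls the `get` method rather than reading `ptr.0` directly: since the 2021 edition, closures capture individual fields, and capturing only the `*mut i32` field would bring back the `Send` error.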

## Accessing Fields of a Union
The final action that works only with `unsafe` is accessing fields of a *union*.

A `union` is similar to a `struct`, but only one declared field is used in a particular instance at one time.

Unions are primarily used to interface with unions in C code.

Accessing union fields is unsafe because Rust can't guarantee the type of data currently being stored in the union instance.

You can learn more about unions in [the Rust Reference](https://doc.rust-lang.org/reference/items/unions.html).
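A minimal sketch of a union, with hypothetical names: writing a field is safe, but reading one is `unsafe` because Rust does not track which field was last written. Here reading `i` after writing `f` reinterprets the float's bits, which is well-defined only because both fields are 4 bytes and every bit pattern is a valid `u32`:

```rust
/// Hypothetical union; `#[repr(C)]` gives it the layout a C union with the
/// same fields would have.
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

fn float_bits(x: f32) -> u32 {
    // Creating a union by writing one field is safe.
    let u = IntOrFloat { f: x };
    // SAFETY: both fields are 4 bytes and every bit pattern is a valid `u32`.
    unsafe { u.i }
}

fn main() {
    assert_eq!(float_bits(1.0), 0x3F80_0000);
    assert_eq!(float_bits(1.0), 1.0f32.to_bits());
}
```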

## Using Miri to Check Unsafe Code
When writing unsafe code, you may want to check that what you have written actually is safe and correct.

Using [Miri](https://github.com/rust-lang/miri), an official Rust tool for detecting undefined behavior, is one of the best ways to do it.

While the borrow checker is a *static* tool which works at compile time, Miri is a *dynamic* tool which works at runtime.

It checks your code by running your program or its test suite and detecting when you violate the rules it understands about how Rust should work.

Miri requires a nightly build of Rust (discussed in [Appendix G: How Rust is Made and "Nightly Rust"](https://doc.rust-lang.org/book/appendix-07-nightly-rust.html)).

You can install both a nightly version of Rust and the Miri tool by typing `rustup +nightly component add miri`.

This does not change what version of Rust your project uses.

This only adds the tool to your system so you can use it when you want to.

You can run Miri on a project by typing `cargo +nightly miri run` or `cargo +nightly miri test`.

Here is an example of how useful it can be, running it on a version of the `COUNTER` example that passes `COUNTER` directly to `println!`:
```
$ cargo +nightly miri run
   Compiling unsafe-example v0.1.0 (file:///projects/unsafe-example)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
     Running `/Users/chris/.rustup/toolchains/nightly-aarch64-apple-darwin/bin/cargo-miri runner target/miri/aarch64-apple-darwin/debug/unsafe-example`
warning: creating a shared reference to mutable static is discouraged
  --> src/main.rs:14:33
   |
14 |         println!("COUNTER: {}", COUNTER);
   |                                 ^^^^^^^ shared reference to mutable static
   |
   = note: for more information, see <https://doc.rust-lang.org/nightly/edition-guide/rust-2024/static-mut-references.html>
   = note: shared references to mutable statics are dangerous; it's undefined behavior if the static is mutated or if a mutable reference is created for it while the shared reference lives
   = note: `#[warn(static_mut_refs)]` on by default

COUNTER: 3
```
It helpfully and correctly notices that we have shared references to mutable data and warns about it.

Here it does not tell us how to fix the problem, but it means that we know there is a possible issue and can think about how to make sure it is safe.

In other cases, it can actually tell us that some code is *sure* to be wrong and make recommendations about how to fix it.

Miri will not catch *everything* you may get wrong when writing unsafe code.

Since it is a dynamic check, it only catches problems with code that actually gets run.

This means you will need to use it with good testing techniques to increase your confidence about the unsafe code you have written.

Additionally, it does not cover every possible way your code can be unsound.

If Miri *does* catch a problem, you know that there is a bug, but just because Miri *doesn't* catch a bug doesn't mean there isn't a problem.

Miri can catch a lot despite this.
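Because Miri only checks code that actually runs, it pairs naturally with tests. Here is a minimal sketch, with a hypothetical `sum_via_raw` function, of the kind of test module you might run with `cargo +nightly miri test`; the tests exercise the empty case as well as the normal one, so Miri gets to check both paths:

```rust
/// Hypothetical example: sums a slice by walking a raw pointer.
fn sum_via_raw(values: &[i32]) -> i32 {
    let ptr = values.as_ptr();
    let mut total = 0;
    for i in 0..values.len() {
        // SAFETY: `i < values.len()`, so `ptr.add(i)` stays in bounds.
        total += unsafe { *ptr.add(i) };
    }
    total
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sums_empty_slice() {
        assert_eq!(sum_via_raw(&[]), 0);
    }

    #[test]
    fn sums_several_values() {
        assert_eq!(sum_via_raw(&[1, 2, 3, 4]), 10);
    }
}

fn main() {
    println!("{}", sum_via_raw(&[1, 2, 3]));
}
```

If the loop were written as `0..=values.len()`, the last iteration would read one element past the end, and running these tests under Miri would report it as undefined behavior.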

## When to Use Unsafe Code
Using `unsafe` to use one of the five superpowers isn't wrong or even frowned upon.

But it is trickier to get `unsafe` code correct because the compiler can't help uphold memory safety.

When you have a reason to use `unsafe` code, you can do so, and having the explicit `unsafe` annotation makes it easier to track down the source of problems when they occur.

When you write unsafe code, you can use Miri to help to be more confident that the code you wrote upholds Rust's rules.

For a deeper exploration of how to work effectively with unsafe Rust, read Rust's official guide on the subject, the [Rustonomicon](https://doc.rust-lang.org/nomicon/).