darkicewolf50 329c2e0d8d
finished ch15.6 and ch15
2025-03-10 05:29:01 -06:00

Reference Cycles Can Leak Memory

Rust's memory safety guarantees make it difficult, but not impossible, to accidentally create memory that is never cleaned up (known as a memory leak).

Preventing memory leaks entirely is not one of Rust's guarantees.

This means that memory leaks are memory safe in Rust.

Rust allows memory leaks through the use of Rc<T> and RefCell<T>.

These two make it possible to create references where items refer to each other in a cycle.

This creates a memory leak, because the reference count of each item in the cycle will never reach 0, so the values will never be dropped.
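To make the "count never reaches 0" point concrete, here is a minimal sketch. It assumes a hypothetical Tracked type and a thread-local DROPS counter, neither of which appears in the original text. Tracked implements Drop so we can count how many values are actually freed. Without a cycle both values are dropped, and with a cycle neither is.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

thread_local! {
    // Counts how many Tracked values have actually been dropped on this thread.
    static DROPS: Cell<usize> = Cell::new(0);
}

// Hypothetical type for illustration: a value that may hold an Rc to another one.
struct Tracked {
    next: RefCell<Option<Rc<Tracked>>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.with(|d| d.set(d.get() + 1));
    }
}

/// Creates x and y where y points at x; if make_cycle is true, x is also
/// pointed back at y. Returns how many values were dropped once both
/// handles went out of scope.
fn drops_after_scope(make_cycle: bool) -> usize {
    DROPS.with(|d| d.set(0));
    {
        let x = Rc::new(Tracked { next: RefCell::new(None) });
        let y = Rc::new(Tracked { next: RefCell::new(Some(Rc::clone(&x))) });
        if make_cycle {
            *x.next.borrow_mut() = Some(Rc::clone(&y));
        }
        // y and then x are dropped here. With a cycle, each strong count
        // only falls from 2 to 1, so neither Drop::drop runs.
    }
    DROPS.with(|d| d.get())
}
```

Calling drops_after_scope(false) reports 2 freed values. Calling drops_after_scope(true) reports 0, and those two values stay allocated until the program exits.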

Creating a Reference Cycle

Let's look at how a reference cycle can happen and how to prevent it.

We will start with the definition of the List enum and a tail method.

use crate::List::{Cons, Nil};
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug)]
enum List {
    Cons(i32, RefCell<Rc<List>>),
    Nil,
}

impl List {
    fn tail(&self) -> Option<&RefCell<Rc<List>>> {
        match self {
            Cons(_, item) => Some(item),
            Nil => None,
        }
    }
}

fn main() {}

Here the second element in the Cons variant is now RefCell<Rc<List>>. This means that, instead of being able to modify the i32 value as we did before, we want to modify the List value a Cons variant is pointing to.

Here we also add a tail method to make it convenient for us to access the second item if we have a Cons variant.

In the next example we add to the body of main. This code creates a list in a and a list in b that points to the list in a.

It then modifies the list in a to point to b, which creates a reference cycle.

We use println! statements to show what the reference count is at various points in the process.

fn main() {
    let a = Rc::new(Cons(5, RefCell::new(Rc::new(Nil))));

    println!("a initial rc count = {}", Rc::strong_count(&a));
    println!("a next item = {:?}", a.tail());

    let b = Rc::new(Cons(10, RefCell::new(Rc::clone(&a))));

    println!("a rc count after b creation = {}", Rc::strong_count(&a));
    println!("b initial rc count = {}", Rc::strong_count(&b));
    println!("b next item = {:?}", b.tail());

    if let Some(link) = a.tail() {
        *link.borrow_mut() = Rc::clone(&b);
    }

    println!("b rc count after changing a = {}", Rc::strong_count(&b));
    println!("a rc count after changing a = {}", Rc::strong_count(&a));

    // Uncomment the next line to see that we have a cycle;
    // it will overflow the stack
    // println!("a next item = {:?}", a.tail());
}

Here we create a Rc<List> instance holding a List value in the variable a with an initial list of 5, Nil.

Next we create another Rc<List> instance holding another List value in the variable b that contains the value 10 and points to the list in a.

We then modify a so that it points to b instead of Nil.

This creates a reference cycle.

We do this by using the tail method to get a reference to the RefCell<Rc<List>> in a, which we put in the variable link.

Then we use the borrow_mut method on the RefCell<Rc<List>> to change the value inside from an Rc<List> that holds a Nil value to the Rc<List> in b.

When we run this, keeping the last println! commented out for the moment, we get this output:

$ cargo run
   Compiling cons-list v0.1.0 (file:///projects/cons-list)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.53s
     Running `target/debug/cons-list`
a initial rc count = 1
a next item = Some(RefCell { value: Nil })
a rc count after b creation = 2
b initial rc count = 1
b next item = Some(RefCell { value: Cons(5, RefCell { value: Nil }) })
b rc count after changing a = 2
a rc count after changing a = 2

We can see the reference count of the Rc<List> instances in both a and b are 2 after we change the list in a to point to b.

At the end of main, Rust drops the variable b, which decreases the reference count of the b Rc<List> instance from 2 to 1.

The memory that Rc<List> has on the heap won't be dropped at this point, because its reference count is 1, not 0.

Next Rust drops a, which decreases the reference count of the a Rc<List> instance from 2 to 1 as well.

This instance's memory can't be dropped either, because the other Rc<List> instance still refers to it.

This memory allocated to the list will remain uncollected forever.

Visually, the list in a points to the list in b, which in turn points back to the list in a.

If you now uncomment the last println! and run the program, Rust will try to print this cycle, with a pointing to b pointing to a and so forth, until it overflows the stack.
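Printing with {:?} overflows because the Debug output keeps following the links. The following minimal sketch walks the cycle safely instead, using a hypothetical walk helper with a step limit that is not part of the original example. The List type, tail method, and cycle construction mirror the code above, written with List:: paths so the block stands alone.

```rust
use std::cell::RefCell;
use std::rc::Rc;

enum List {
    Cons(i32, RefCell<Rc<List>>),
    Nil,
}

impl List {
    fn tail(&self) -> Option<&RefCell<Rc<List>>> {
        match self {
            List::Cons(_, item) => Some(item),
            List::Nil => None,
        }
    }
}

/// Builds the cycle from the text: a = Cons(5, -> b), b = Cons(10, -> a).
fn build_cycle() -> (Rc<List>, Rc<List>) {
    let a = Rc::new(List::Cons(5, RefCell::new(Rc::new(List::Nil))));
    let b = Rc::new(List::Cons(10, RefCell::new(Rc::clone(&a))));
    if let Some(link) = a.tail() {
        *link.borrow_mut() = Rc::clone(&b);
    }
    (a, b)
}

/// Hypothetical helper: follows tail links from start, visiting at most
/// max_steps Cons cells, and collects their values. The step limit is what
/// keeps it from looping forever on a cyclic list.
fn walk(start: &Rc<List>, max_steps: usize) -> Vec<i32> {
    let mut values = Vec::new();
    let mut current = Rc::clone(start);
    while values.len() < max_steps {
        let next = match &*current {
            List::Cons(value, link) => {
                values.push(*value);
                Rc::clone(&*link.borrow())
            }
            List::Nil => break,
        };
        current = next;
    }
    values
}
```

Walking five steps from a yields 5, 10, 5, 10, 5: the same alternation that the Debug printer would follow without end.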

Compared to a real-world program, the consequences of creating a reference cycle in this example aren't very dire.

Right after we create the reference cycle, the program ends.

If this were a more complex program that allocated lots of memory in a cycle and held onto it for a long time, the program would use more memory than it needs and might overwhelm the system, causing it to run out of available memory.

If you have RefCell<T> values that contain Rc<T> values or similar nested combinations of types with interior mutability and reference counting, you must ensure that you don't create these cycles.

You can't rely on Rust to catch them.

Creating a reference cycle would be a logic bug in your program, which you should minimize using automated tests, code reviews, and other software development practices.
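An automated test could catch this kind of leak by keeping a Weak observer and checking that it can no longer be upgraded once all strong handles are gone. Rc::downgrade and Weak::upgrade are explained later in this section. The helper names weak_to_a_after_drop and is_freed are hypothetical, and this is a sketch of one possible check rather than a standard tool.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

enum List {
    Cons(i32, RefCell<Rc<List>>),
    Nil,
}

/// Hypothetical helper: builds b -> a (as in the text) and, if close_cycle is
/// true, also points a back at b. Both strong handles are then dropped and a
/// Weak handle to a is returned as an observer.
fn weak_to_a_after_drop(close_cycle: bool) -> Weak<List> {
    let a = Rc::new(List::Cons(5, RefCell::new(Rc::new(List::Nil))));
    let b = Rc::new(List::Cons(10, RefCell::new(Rc::clone(&a))));
    if close_cycle {
        if let List::Cons(_, link) = &*a {
            *link.borrow_mut() = Rc::clone(&b);
        }
    }
    let observer = Rc::downgrade(&a);
    drop(b);
    drop(a);
    observer
}

/// Hypothetical leak check: true once nothing strong keeps the value alive.
fn is_freed(observer: &Weak<List>) -> bool {
    observer.upgrade().is_none()
}
```

Without the cycle, is_freed returns true. With the cycle, it returns false, which is exactly the failure a test would flag.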

Another solution for avoiding reference cycles is reorganizing your data structures so that some references express ownership and some references don't.

As a result, you can have cycles made up of some ownership relationships and some non-ownership relationships, and only the ownership relationships affect whether or not a value can be dropped.

In the previous example we always want Cons variants to own their list, so reorganizing the data structure isn't possible.

Now we will look at an example using graphs made up of parent nodes and child nodes to see when non-ownership relationships are an appropriate way to prevent reference cycles.

Preventing Reference Cycles: Turning a Rc<T> into a Weak<T>

So far we have demonstrated that calling Rc::clone increases the strong_count of a Rc<T> instance, and a Rc<T> instance is only cleaned up if its strong_count is 0.

You can also create a weak reference to the value within a Rc<T> instance by calling Rc::downgrade and passing a reference to the Rc<T>.

Strong references are how you share ownership of an Rc<T> instance.

Weak references don't express an ownership relationship, and their count doesn't affect when an Rc<T> instance is cleaned up.

They won't cause a reference cycle, because any cycle involving weak references will be broken once the strong reference count of the values involved is 0.

When you call Rc::downgrade, you get a smart pointer of type Weak<T>.

Instead of increasing the strong_count in the Rc<T> instance by 1, calling Rc::downgrade increases the weak_count by 1.

The Rc<T> type uses weak_count to keep track of how many Weak<T> references exist, similar to strong_count.

The difference is that weak_count doesn't need to be 0 for the Rc<T> instance to be cleaned up.
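Here is a minimal sketch of the two counters. It uses a plain Rc<i32> instead of a List or Node so that only the counting is visible. count_snapshots is a hypothetical helper name.

```rust
use std::rc::{Rc, Weak};

/// Records (strong_count, weak_count) at three points: right after creation,
/// after an Rc::clone, and after an Rc::downgrade.
fn count_snapshots() -> Vec<(usize, usize)> {
    let value = Rc::new(42);
    let mut snapshots = vec![(Rc::strong_count(&value), Rc::weak_count(&value))];

    // A strong reference shares ownership, so only strong_count goes up.
    let owner = Rc::clone(&value);
    snapshots.push((Rc::strong_count(&value), Rc::weak_count(&value)));

    // A weak reference does not own the value, so only weak_count goes up.
    let observer: Weak<i32> = Rc::downgrade(&value);
    snapshots.push((Rc::strong_count(&value), Rc::weak_count(&value)));

    drop(owner);
    drop(observer);
    snapshots
}
```

The snapshots come out as (1, 0), (2, 0), (2, 1): cloning touches only the strong count, and downgrading touches only the weak count.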

Because the value that a Weak<T> refers to might have been dropped, to do anything with that value you must make sure it still exists.

You can do this by calling the upgrade method on a Weak<T> instance, which returns an Option<Rc<T>>.

You will get a result of Some if the Rc<T> value has not been dropped yet and a result of None if it has been dropped.

Because upgrade returns an Option<Rc<T>>, Rust ensures that both the Some case and the None case are handled, and there won't be an invalid pointer.
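The following sketch shows upgrade returning Some while an owner exists and None after the last strong reference is dropped. It assumes an Rc<String> and the hypothetical helpers read_through and upgrade_demo. The match makes both cases explicit.

```rust
use std::rc::{Rc, Weak};

/// Hypothetical helper: reads through a Weak<String>. upgrade() yields
/// Some(Rc<String>) only while at least one strong reference exists.
fn read_through(observer: &Weak<String>) -> Option<String> {
    match observer.upgrade() {
        // The temporary Rc returned by upgrade is dropped at the end of this arm.
        Some(rc) => Some((*rc).clone()),
        None => None,
    }
}

fn upgrade_demo() -> (Option<String>, Option<String>) {
    let owner = Rc::new(String::from("still here"));
    let observer = Rc::downgrade(&owner);
    let before = read_through(&observer); // owner is alive, so Some
    drop(owner); // last strong reference gone, so the String is dropped
    let after = read_through(&observer); // nothing left to upgrade to, so None
    (before, after)
}
```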

Next we will show an example where rather than using a list whose items know only about the next item, we will create a tree whose items know about their children items and their parent items.

Creating a Tree Data Structure: a Node with Child Nodes

We will start with building a tree with nodes that know about their child nodes.

We will create a struct called Node that holds its own i32 value as well as references to its children Node values.

use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug)]
struct Node {
    value: i32,
    children: RefCell<Vec<Rc<Node>>>,
}

Here we want a Node to own its children, and we want to share that ownership with variables so we can access each Node in the tree directly.

We do this by defining the Vec<T> items to be values of type Rc<Node>.

We also want to be able to modify which nodes are children of another node.

To allow that, children wraps the Vec<Rc<Node>> in a RefCell<T>.
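Because children is wrapped in a RefCell<T>, we can add children even after a node is inside an Rc<Node>. Here is a minimal sketch, assuming the hypothetical helpers new_node, build_branch and child_values, none of which come from the original text.

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug)]
struct Node {
    value: i32,
    children: RefCell<Vec<Rc<Node>>>,
}

/// Hypothetical helper: a childless node wrapped in Rc.
fn new_node(value: i32) -> Rc<Node> {
    Rc::new(Node { value, children: RefCell::new(vec![]) })
}

/// Builds branch(5) with children [leaf(3), extra(7)]. The extra child is
/// pushed after branch already exists, which works only because children
/// is a RefCell, even though branch sits behind an immutable Rc.
fn build_branch() -> (Rc<Node>, Rc<Node>) {
    let leaf = new_node(3);
    let branch = Rc::new(Node {
        value: 5,
        children: RefCell::new(vec![Rc::clone(&leaf)]),
    });
    branch.children.borrow_mut().push(new_node(7));
    (branch, leaf)
}

/// Values of a node's direct children, in insertion order.
fn child_values(node: &Node) -> Vec<i32> {
    node.children.borrow().iter().map(|child| child.value).collect()
}
```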

Next we will use the struct definition to create one Node instance named leaf with the value 3 and no children, and another instance named branch with the value 5 and leaf as one of its children.

fn main() {
    let leaf = Rc::new(Node {
        value: 3,
        children: RefCell::new(vec![]),
    });

    let branch = Rc::new(Node {
        value: 5,
        children: RefCell::new(vec![Rc::clone(&leaf)]),
    });
}

Here we clone the Rc<Node> in leaf and store that in branch, meaning the Node in leaf now has two owners: leaf and branch.

We can get from branch to leaf through branch.children.

There is no way to get from leaf to branch.

The reason is that leaf has no reference to branch, so it doesn't know they are related.

We want leaf to know that branch is its parent.

Adding a Reference from a Child to Its Parent

To make the child node aware of the relationship to its parent, we need to add a parent field to our Node struct definition.

The trouble is in deciding what the type of parent should be.

We know that it cannot contain a Rc<T>, because that would create a reference cycle with leaf.parent pointing to branch and branch.children pointing to leaf.

This would cause their strong_count values to never be 0.

Thinking about the relationships another way, a parent node should own its children.

If a parent node is dropped, then its child nodes should be dropped as well.

A child should not own its parent.

If a child node is dropped, the parent should still exist.

We should use weak references in this case.

So instead of Rc<T>, we will make the type of parent use Weak<T>, specifically a RefCell<Weak<Node>>. Now the Node definition looks like this:

use std::cell::RefCell;
use std::rc::{Rc, Weak};

#[derive(Debug)]
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

Now a node will be able to refer to its parent node but doesn't own its parent.
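Setting both directions of the relationship by hand is easy to get half right, so one approach is to bundle it into a single function. add_child and new_node are hypothetical helpers, not part of the original example. The parent gets a strong Rc clone of the child, while the child gets only a Weak from Rc::downgrade.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

#[derive(Debug)]
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

/// Hypothetical helper: a new node with no parent and no children.
fn new_node(value: i32) -> Rc<Node> {
    Rc::new(Node {
        value,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    })
}

/// Hypothetical helper that wires up both directions at once:
/// the parent owns the child (Rc), the child only refers back (Weak).
fn add_child(parent: &Rc<Node>, child: &Rc<Node>) {
    parent.children.borrow_mut().push(Rc::clone(child));
    *child.parent.borrow_mut() = Rc::downgrade(parent);
}
```

After add_child(&branch, &leaf), branch still has a strong count of 1 (only the branch variable owns it) and a weak count of 1. leaf has a strong count of 2.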

In the next example we will update main to use this new definition so the leaf node will have a way to refer to its parent branch.

fn main() {
    let leaf = Rc::new(Node {
        value: 3,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    });

    println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());

    let branch = Rc::new(Node {
        value: 5,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![Rc::clone(&leaf)]),
    });

    *leaf.parent.borrow_mut() = Rc::downgrade(&branch);

    println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
}

Creating the leaf node is similar to the example above with the exception of the parent field.

leaf starts out without a parent, so we create a new, empty Weak<Node> reference instance.

At this point when we try to get a reference to the parent of leaf by using the upgrade method, we get a None value.

We will see this in the output from the first println! statement.

leaf parent = None

When we create the branch node, it also has a new Weak<Node> reference in the parent field, because branch doesn't have a parent node.

We will still have leaf as one of the children of branch.

Once we have the Node instance in branch, we can modify leaf to give it a Weak<Node> reference to its parent.

We will use the borrow_mut method on the RefCell<Weak<Node>> in the parent field of leaf.

Then we use the Rc::downgrade function to create a Weak<Node> reference to branch from the Rc<Node> in branch.

When we print the parent of leaf again, this time we get a Some variant holding branch.

Now leaf will know the relationship to the parent and will have access to its parent.
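Since each parent link is a Weak<Node>, walking up the tree means calling upgrade at each step and stopping at the first None. The following minimal sketch reuses hypothetical new_node and add_child helpers and adds a hypothetical ancestor_values function. None of these names come from the original text.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

/// Hypothetical helper: a new node with no parent and no children.
fn new_node(value: i32) -> Rc<Node> {
    Rc::new(Node {
        value,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    })
}

/// Hypothetical helper: parent owns child (Rc), child refers back (Weak).
fn add_child(parent: &Rc<Node>, child: &Rc<Node>) {
    parent.children.borrow_mut().push(Rc::clone(child));
    *child.parent.borrow_mut() = Rc::downgrade(parent);
}

/// Collects the values of all ancestors of node, nearest first, by
/// upgrading the Weak parent link until it yields None.
fn ancestor_values(node: &Rc<Node>) -> Vec<i32> {
    let mut values = Vec::new();
    let mut current = node.parent.borrow().upgrade();
    while let Some(parent) = current {
        values.push(parent.value);
        current = parent.parent.borrow().upgrade();
    }
    values
}
```

For a three-level tree root(1) -> mid(2) -> leaf(3), the ancestors of leaf are [2, 1]. The root has none, because its parent is still an empty Weak::new().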

When we print leaf, we also avoid the cycle that previously ended in a stack overflow, because the Weak<Node> references are printed as (Weak):

leaf parent = Some(Node { value: 5, parent: RefCell { value: (Weak) },
children: RefCell { value: [Node { value: 3, parent: RefCell { value: (Weak) },
children: RefCell { value: [] } }] } })

The lack of infinite output indicates that this code didn't create a reference cycle.

We can also tell this by looking at the values we get from calling Rc::strong_count and Rc::weak_count.

Visualizing Changes to strong_count and weak_count

Now we will look at how the strong_count and weak_count values of the Rc<Node> instances change by creating a new inner scope and moving the creation of branch into that scope.

This lets us see what happens when branch is created and then dropped when it goes out of scope.

Here are the modifications to main:

fn main() {
    let leaf = Rc::new(Node {
        value: 3,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    });

    println!(
        "leaf strong = {}, weak = {}",
        Rc::strong_count(&leaf),
        Rc::weak_count(&leaf),
    );

    {
        let branch = Rc::new(Node {
            value: 5,
            parent: RefCell::new(Weak::new()),
            children: RefCell::new(vec![Rc::clone(&leaf)]),
        });

        *leaf.parent.borrow_mut() = Rc::downgrade(&branch);

        println!(
            "branch strong = {}, weak = {}",
            Rc::strong_count(&branch),
            Rc::weak_count(&branch),
        );

        println!(
            "leaf strong = {}, weak = {}",
            Rc::strong_count(&leaf),
            Rc::weak_count(&leaf),
        );
    }

    println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
    println!(
        "leaf strong = {}, weak = {}",
        Rc::strong_count(&leaf),
        Rc::weak_count(&leaf),
    );
}

After leaf is created, its Rc<Node> has a strong count of 1 and a weak count of 0.

In the inner scope we create branch and associate it with leaf. At that point, when we print the counts, the Rc<Node> in branch has a strong count of 1 and a weak count of 1 (for leaf.parent pointing to branch with a Weak<Node>).

When we print the counts in leaf, we see it has a strong count of 2, because branch now has a clone of the Rc<Node> of leaf stored in branch.children, but it still has a weak count of 0.

When the inner scope ends, branch goes out of scope and the strong count of its Rc<Node> decreases to 0, so its Node is dropped.

The weak count of 1 from leaf.parent has no bearing on whether or not Node is dropped, so we don't get any memory leaks.

Now if we try to access the parent of leaf after the inner scope we will get None again.

At the end of the program the Rc<Node> in leaf has a strong count of 1 and a weak count of 0.

This is because the variable leaf is now the only reference to the Rc<Node>.
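The same experiment can be written with the counts collected instead of printed, so they can be checked against the numbers described above. In this sketch, scope_experiment is a hypothetical name, and the Node definition matches the one in this section.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

#[derive(Debug)]
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

/// Replays the inner-scope experiment and returns the (strong, weak) pairs
/// at each point the text prints them, plus whether leaf's parent is gone
/// after the scope ends.
fn scope_experiment() -> (Vec<(usize, usize)>, bool) {
    let mut counts = Vec::new();
    let leaf = Rc::new(Node {
        value: 3,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    });
    counts.push((Rc::strong_count(&leaf), Rc::weak_count(&leaf))); // leaf, before branch

    {
        let branch = Rc::new(Node {
            value: 5,
            parent: RefCell::new(Weak::new()),
            children: RefCell::new(vec![Rc::clone(&leaf)]),
        });
        *leaf.parent.borrow_mut() = Rc::downgrade(&branch);
        counts.push((Rc::strong_count(&branch), Rc::weak_count(&branch))); // branch
        counts.push((Rc::strong_count(&leaf), Rc::weak_count(&leaf))); // leaf, inside scope
    } // branch's strong count reaches 0 here, so the branch Node is dropped

    let parent_gone = leaf.parent.borrow().upgrade().is_none();
    counts.push((Rc::strong_count(&leaf), Rc::weak_count(&leaf))); // leaf, after scope
    (counts, parent_gone)
}
```

The pairs come out as (1, 0), (1, 1), (2, 0), (1, 0), and the parent is gone after the scope ends.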

All of the logic that manages the counts and value dropping is built into Rc<T> and Weak<T> and their implementations of the Drop trait.

By specifying in the definition of Node that the relationship from a child to its parent should be a Weak<T> reference, you are able to have parent nodes point to child nodes and vice versa without creating a reference cycle and memory leaks.
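To check that the Weak<T> parent links don't keep anything alive, we can count drops. This sketch assumes a hypothetical Drop impl on Node that increments a thread-local FREED counter, plus the hypothetical new_node and add_child helpers. With these, dropping the root of a small tree frees all three nodes, and a Weak to a child no longer upgrades.

```rust
use std::cell::{Cell, RefCell};
use std::rc::{Rc, Weak};

thread_local! {
    // Number of Node values freed so far on this thread.
    static FREED: Cell<usize> = Cell::new(0);
}

struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

// Hypothetical Drop impl, added only to observe when nodes are freed.
impl Drop for Node {
    fn drop(&mut self) {
        FREED.with(|f| f.set(f.get() + 1));
    }
}

fn new_node(value: i32) -> Rc<Node> {
    Rc::new(Node {
        value,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    })
}

fn add_child(parent: &Rc<Node>, child: &Rc<Node>) {
    parent.children.borrow_mut().push(Rc::clone(child));
    *child.parent.borrow_mut() = Rc::downgrade(parent);
}

/// Builds root -> [a, b] with Weak back-links, drops every strong handle we
/// hold, and reports (nodes freed, whether child a can still be reached).
fn drop_tree() -> (usize, bool) {
    FREED.with(|f| f.set(0));
    let root = new_node(1);
    let a = new_node(2);
    let b = new_node(3);
    add_child(&root, &a);
    add_child(&root, &b);
    let watch_a = Rc::downgrade(&a);
    drop(a); // root.children still owns a
    drop(b); // root.children still owns b
    drop(root); // root's strong count hits 0; dropping its children frees a and b
    let freed = FREED.with(|f| f.get());
    (freed, watch_a.upgrade().is_some())
}
```

drop_tree() reports three freed nodes and an unreachable child. If the parent field were an Rc<Node>, the same tree would form cycles and nothing would be freed.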