16 KiB
Reference Cycles Can Leak Memory
It is not impossible but difficult to accidentally create memory that is never cleaned up (known as a memory leak). This is despite Rust's memory safety guarantees.
Preventing memory leaks is entirely is not one of Rust's guarantees.
This means that memory leaks are memory safe in Rust.
Rusts allows for memory leaks by using Rc<T>
and RefCell<T>
.
These two make it possible to create references where items refer to each other other in aa cycle.
This will create a memory leak because the reference count of e4ach item in the cycle will never get to 0 and the values will not be dropped.
Creating a Reference Cycle
Lets look into how a reference cycle could happen and how to prevent it.
We will start with the definition of the List
enum and a tail
method.
use crate::List::{Cons, Nil};
use std::cell::RefCell;
use std::rc::Rc;
#[derive(Debug)]
enum List {
Cons(i32, RefCell<Rc<List>>),
Nil,
}
impl List {
fn tail(&self) -> Option<&RefCell<Rc<List>>> {
match self {
Cons(_, item) => Some(item),
Nil => None,
}
}
}
fn main() {}
Here the second element in the Cons
variant is now RefCell<Rc<List>>
means that instead of having the ability to modify the i32
value as we did before.
We instead want to modify the List
value a Cons
variant is pointing to.
Here we also add a tail
method to make it convenient for us to access the second item if we have a Cons
variant.
In the next example we add to main
function's body.
This code create a list in a
and a list in b
that points to the list in a
It then modifies the list in a
to point to b
, this creates a reference cycle.
We show what is happening via println!
statements to show what the reference count is at various points in the process.
fn main() {
let a = Rc::new(Cons(5, RefCell::new(Rc::new(Nil))));
println!("a initial rc count = {}", Rc::strong_count(&a));
println!("a next item = {:?}", a.tail());
let b = Rc::new(Cons(10, RefCell::new(Rc::clone(&a))));
println!("a rc count after b creation = {}", Rc::strong_count(&a));
println!("b initial rc count = {}", Rc::strong_count(&b));
println!("b next item = {:?}", b.tail());
if let Some(link) = a.tail() {
*link.borrow_mut() = Rc::clone(&b);
}
println!("b rc count after changing a = {}", Rc::strong_count(&b));
println!("a rc count after changing a = {}", Rc::strong_count(&a));
// Uncomment the next line to see that we have a cycle;
// it will overflow the stack
// println!("a next item = {:?}", a.tail());
}
Here we create a Rc<List>
instance holding a List
value in the variable a
with an initial list of 5, Nil
.
Next we create another Rc<List>
instance holding another List
value in the variable b
that contains the value 10 and points to the list in a
.
We then modify a
so that it points a b
instead of Nil
.
This creates a reference cycle.
We do this by using the tail
method to get a reference to the RefCell<Rc<List>>
in a
which we put in the variable link
.
Then we use the borrow_mut
method on the RefCell<Rc<LIST>>
to change the value inside form a Rc<List>
that holds a Nil
value to the Rc<List>
in b
.
When we run this, while keeping the last println!
commented out for the moment, we get this output
$ cargo run
Compiling cons-list v0.1.0 (file:///projects/cons-list)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.53s
Running `target/debug/cons-list`
a initial rc count = 1
a next item = Some(RefCell { value: Nil })
a rc count after b creation = 2
b initial rc count = 1
b next item = Some(RefCell { value: Cons(5, RefCell { value: Nil }) })
b rc count after changing a = 2
a rc count after changing a = 2
We can see the reference count of the Rc<List>
instances in both a
and b
are 2 after we change the list in a
to point to b
.
At the end of main
Rust will drop the variable b
, this decreases the reference count of the b
Rc<List>
instance from 2 to 1.
The memory of that Rc<List>
has on heap won't be dropped at this point because the reference count is still not 0, it is 1 currently.
Next Rust drops a
, this also decreases the reference count from the a
Rc<List>
instance from 2 to 1.
This instance's memory cant be dropped either because of the reference form the other Rc<List>
instance still referes to it.
This memory allocated to the list will remain uncollected forever.
Here is a visual aid to show this.
If you now uncomment the last
println!
and run the program.
Rust will try to print this cycle with a
pointing to a
and so forth until it overflows the stack.
Compared to a real-world program, the consequences of creating a reference cycle in this example aren't very dire.
Right after we create the reference cycle, the program ends.
If this were a more complex program allocating lots of memory in a cycle and held onto it for a long time, the program will use more memory than it needs and might overwhelm the system, causing it to run out of available memory.
If you have RefCell<T>
values that contain Rc<T>
values or similar nested combinations of types with interior mutability and reference counting, you must ensure that you don't create these cycles.
You are unable to rely on Rust to catch them.
Creating a reference cycle would be a logic bug in your program that you should use automated tests, code reviews and other software development practices to minimize.
Another solution for avoiding this is reorganizing your data structs so that some references express ownership and some references don't.
This results in you can have cycles made up of some ownership relationships and some non-ownership relationships, and only the ownership relationships affect whether or not a value can be dropped.
In the previous example we always want Cons
variants to own their list, so reorganizing the data struct isn't possible.
Now we will look at an example using graphs made up of parent nodes and child nodes to see when non-ownership relationships are an appropriate way to prevent reference cycles.
Preventing Reference Cycles: Turning a Rc<T>
into a Weak<T>
So far we have demonstrated that calling Rc::clone
increases the strong_count
of a Rc<T>
instance, and a Rc<T>
instance is only cleaned up if its strong_count
is 0.
You can also create a weak reference to the value within a Rc<T>
instance by calling Rc::downgrade
and passing a reference to the Rc<T>
.
Strong references don't express an ownership relationship.
Weak references don't express an ownership relationship, and their count doesn't affect when a Rc<T>
instance is cleaned up.
They will no cause a reference cycle because any cycle involving weak references will be broken once the strong reference count of values involved is 0.
When you call Rc::downgrade
, you get a smart point of type Weak<T>
.
Instead of increasing the strong_count
in the Rc<T>
by 1, using Rc::downgrade
increases the weak count by 1
.
The Rc<T>
type uses weak_count
to keep track of how many Weak<T>
references exist, this is similar to strong_count
.
The differences is that weak_count
doesn't need to be 0 for the Rc<T>
instance to be cleaned up.
Due to the value that Weak<T>
references could be referred to might have been dropped, to do anything with a value that a Weak
is pointing to, you must ensure that the value will still exist.
You can do this by calling the upgrade
method on a Weak<T>
instance, this will return an Option<Rc<T>>
.
You will get a result of Some
if the Rc<T>
value has not been dropped yet and a results of None
if the Rc<T>
value has been dropped.
Due to upgrade
returning an Option<Rc<T>>
, Rust will ensure that the Some
case and the None
cases are handled and there will no be an invalid pointer.
Next we will show an example where rather than using a list whose items know only about the next item, we will create a tree whose items know about their children items and their parent items.
Creating a Tree Data Structure a Node with Child Nodes
We will start with building a tree with nodes that know about their child nodes.
We will create a struct called Node
that holds its own i32
value as well as references to its children Node
values.
use std::cell::RefCell;
use std::rc::Rc;
#[derive(Debug)]
struct Node {
value: i32,
children: RefCell<Vec<Rc<Node>>>,
}
Here we want a Node
to own its children, and we want to share that ownership with variables so we can access each Node
in the tree directly.
We do this by defining the Vec<T>
items to be values of type Rc<Node>
.
We also want to be able to modify which nodes are children of another node.
Now we have a RefCell<T>
in children around the Vec<Rc<Node>>
.
Next we will use the struct definition and create one Node
instance named leaf
with the value 3 and no children and another instance named branch
with the vale 5 and leaf
as one of its children.
fn main() {
let leaf = Rc::new(Node {
value: 3,
children: RefCell::new(vec![]),
});
let branch = Rc::new(Node {
value: 5,
children: RefCell::new(vec![Rc::clone(&leaf)]),
});
}
Here we clone the Rc<Node>
in leaf
and store that in branch
, meaning the Node
in leaf
now has two owners.
These are leaf
and branch
.
We can get from branch
to leaf
through branch.children
.
There is no way to get from leaf
to branch
.
The reason that leaf
has no reference to branch
and doesn't know they are related.
The reason that leaf
has no reference to branch
is that it doesn't know they are related.
We want leaf
to know that branch
is its parent.
Adding a Reference form a child
to Its Parent
To make the child node aware of the relationship to its parent, we need to add a parent
field to our Node
struct definition.
The trouble is in deciding what the type of parent
should be.
We know that it cannot contain a Rc<T>
, because that would create a reference cycle with leaf.parent
pointing to branch
and branch.children
pointing to leaf
.
This would cause their strong_count
values to never be 0.
Thinking about the relationships another way, a parent node should own its children.
If a parent node is dropped, then its child nodes should be dropped as well.
A child should not own its parent.
If a child node, the parent should still exist.
We should use weak references in this case.
So we instead of Rc<T>
we will make the type of parent
use Weak<T>
.
Specifically a RefCell<Weak<Node>>
. Now the Node
definition will look like this.
use std::cell::RefCell;
use std::rc::{Rc, Weak};
#[derive(Debug)]
struct Node {
value: i32,
parent: RefCell<Weak<Node>>,
children: RefCell<Vec<Rc<Node>>>,
}
Now a node will be able to refer to its parent node but doesn't own its parent.
In the next example we will update main
to use this new definition so the leaf
node will have a way to refer to its parent branch
.
fn main() {
let leaf = Rc::new(Node {
value: 3,
parent: RefCell::new(Weak::new()),
children: RefCell::new(vec![]),
});
println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
let branch = Rc::new(Node {
value: 5,
parent: RefCell::new(Weak::new()),
children: RefCell::new(vec![Rc::clone(&leaf)]),
});
*leaf.parent.borrow_mut() = Rc::downgrade(&branch);
println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
}
Creating the leaf
node is similar to the example above with the exception of the parent
field.
leaf
starts out without a parent, so we create a new , empty Weak<node>
reference instance.
At this point when we try to get a reference to the parent of leaf
by using the upgrade
method, we get a None
value.
We will see this in the output from the first println!
statement.
leaf parent = None
Here when we create the branch
node, it will also have a new Weak<Node>
reference in the parent
field, this is because the branch
doesn't have a parent node.
We will still have leaf
as one of the children of branch
.
Once we have the Node
instance in branch
, we can modify leaf
to give it a Weak<Node>
reference to its parent.
We will use the borrow_mut
method on the RefCell<Weak<Node>>
in the parent
field of leaf
.
Then we use the Rc::downgrade
function to create a Weak<Node>
reference to branch
form the Rc<Node>
in branch
.
When we then print the parent of leaf
again, this time we will get a Some
variant holding branch
Now leaf
will know the relationship to the parent and will have access to its parent.
When we print leaf
, we need to avoid the cycle that eventually end in a stack overflow, like what we had before.
The Weak<Node>
references are printed as (Weak)
leaf parent = Some(Node { value: 5, parent: RefCell { value: (Weak) },
children: RefCell { value: [Node { value: 3, parent: RefCell { value: (Weak) },
children: RefCell { value: [] } }] } })
Th lack of infinite output indicates that this code didn't create a reference cycle.
We can also tell this by looking at the values we get form calling Rc::strong_count
and Rc::weak_count
.
Visualizing Changes to strong_count
and weak_count
Now we will look at how the strong_count
and weak_count
values of the Rc<Node>
instances change by creating a new inner scope and moving the creation of branch
into that scope`.
Doing this we can then see what happens when branch
and then dropped when it goes out of scope.
Here we can see the modifications to main
fn main() {
let leaf = Rc::new(Node {
value: 3,
parent: RefCell::new(Weak::new()),
children: RefCell::new(vec![]),
});
println!(
"leaf strong = {}, weak = {}",
Rc::strong_count(&leaf),
Rc::weak_count(&leaf),
);
{
let branch = Rc::new(Node {
value: 5,
parent: RefCell::new(Weak::new()),
children: RefCell::new(vec![Rc::clone(&leaf)]),
});
*leaf.parent.borrow_mut() = Rc::downgrade(&branch);
println!(
"branch strong = {}, weak = {}",
Rc::strong_count(&branch),
Rc::weak_count(&branch),
);
println!(
"leaf strong = {}, weak = {}",
Rc::strong_count(&leaf),
Rc::weak_count(&leaf),
);
}
println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
println!(
"leaf strong = {}, weak = {}",
Rc::strong_count(&leaf),
Rc::weak_count(&leaf),
);
}
Now after creating leaf
is created its Rc<Node>
has a strong count of 1 and a weak count of 0.
In the inner scope we create branch
and associate it with leaf
, at which point when we print the counts, the Rc<Node>
in branch
will have a strong count of 1 and a weak count of 1.
For leaf.parent
pointing to branch
with a Weak<Node>
.
When we print the counts in leaf
we will see it have a strong count of 2, because branch
now has a clone of the Rc<Node>
of leaf
stored in branch.children
, built will still have a weak count of 0.
When the inner scope ends, branch
goes out of scope and the strong count of the Rc<node>
decreases to 0. The Node
will be dropped.
The weak count of 1 form leaf.parent
has no bearing on whether or not Node
is dropped, so now we will not het any memory leaks.
Now if we try to access the parent of leaf
after the inner scope we will get None
again.
At the end of the program the Rc<Node>
in leaf
has a strong count of 1 and a weak count of 0.
This is because the variable leaf
is now the only reference to the Rc<Node>
.
All of this logic manages the counts and value dropping is built into Rc<T>
and Weak<T>
and their implementations of the Drop
trait.
By specifying that the relationship form a child to its parent should be a Weak<T>
reference in the definition of Node
.
You are able to have parent nodes point to child nodes and vice versa without creating a reference cycle and memory leaks.