RustBrock/Hash.md
2025-01-20 17:14:33 -07:00

7.5 KiB

Hash Map

Hash maps are defined as HashMap<K, V>, where it stores a mapping of keys of type K to values of type V using a hashing function.

This determines how it places these keys and values into memory

Other programming support this kind of data structure its often called a different name, such as hash, map, object, hash table, dictionary, or associative array, these are just some of the names

Hash maps are useful when you want to look up data not by using an index, as you can with vectors, but instead you use a key that can be any type

For example it could be different teams scores where the name of the team is the key and the score is the value

Check the std library documentation for more info

Creating a New Hash Map

One way to create an empty hash map is to use new and to add elements with insert

Lets say we are keeping track of the score of two teams whose names are Blue and Yellow

Blue team has 10 points Yellow team has 50 points

    use std::collections::HashMap;

    let mut scores = HashMap::new();

    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Yellow"), 50);

Note we must bring HashMap into scope because it is not included by default

Hash Maps also has less support from the std library, therefore there is no macros to construct them and there are less methods associated with them

They are store on the heap The keys must be all be the same type as each other The values must be all be the same type as each other

Keys and Values do not need to be the same type as each other

Accessing Values in a Hash Map

We can get a value out of the hash map by using its key to the get method

Here is an example

    use std::collections::HashMap;

    let mut scores = HashMap::new();

    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Yellow"), 50);

    let team_name = String::from("Blue");
    let score = scores.get(&team_name).copied().unwrap_or(0);

This would result in score being equal to 10

The get method returns an Option<&V>, if there is no value for that key in the hash map, then the Option will be None

This program uses copied to get an Option<i32> rather than a Option<&i32> then unwrap_or to set score to zero if there is no entry for the key

You can iterate over each key-value pair in the hash map in a similar way as vectors using a for loop

    use std::collections::HashMap;

    let mut scores = HashMap::new();

    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Yellow"), 50);

    for (key, value) in &scores {
        println!("{key}: {value}");
    }

This would output each pair in an arbitrary order

Yellow: 50
Blue: 10

Hash Maps and Ownership

If a type implements the Copy trait values are copied into the hash map For owned values like String the values with be moved and the hash map will be the owner of those values

Example

    use std::collections::HashMap;

    let field_name = String::from("Favorite color");
    let field_value = String::from("Blue");

    let mut map = HashMap::new();
    map.insert(field_name, field_value);
    // field_name and field_value are invalid at this point, try using them and
    // see what compiler error you get!

You aren't able to use field_name nor field_value because ownership of the associated value from those two variables has changed

This change happened after the insert method was called

If references are inserting into the hash map, then the values wont be moved but the reference MUST be valid for as long as the hash map is also valid

Updating a Hash Map

While a Hash Map is growable each key MUST by unique and can only be associated with a single value at a time. The reverse is not true

When you want to change the data in a hash map, you have to decided how to handle the case when a key already has a value assigned

Overwriting a Value

If you insert a key and value into a hash map then insert the same key with a different value then the value associated with the key will be overwritten

    use std::collections::HashMap;

    let mut scores = HashMap::new();

    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Blue"), 25);

    println!("{scores:?}");

This only contains 1 key: value This will output {"Blue": 25} The original 10 is overwritten

Adding a Key and Value Only If a Key Isn't Present

This is common do do if there is no key associated with a desired value, then add the key and value otherwise do nothing.

Hash Maps have a special API for this entry that takes the key you want to check as a parameter

The return value of the entry method is an enum called Entry that represents a value that might or might not exist

This can be used if it doesn't have a value associated with the key then insert the value

Example of use

    use std::collections::HashMap;

    let mut scores = HashMap::new();
    scores.insert(String::from("Blue"), 10);

    scores.entry(String::from("Yellow")).or_insert(50);
    scores.entry(String::from("Blue")).or_insert(50);

    println!("{scores:?}");

The or_insert method on Entry is defined to return a mutable reference to the value for the corresponding Entry key if that key exists and if not it inserts the parameter as the new value for this key and returns a mutable refer to the new value

This is more clear then writing our own definition that complies with the borrow checker

This code will output {"Yellow": 50, "Blue": 10}

This means that this or_insert will not change it if it already exists

Updating a Value Based on the Old Value

Another common use case for hash maps is to look up a key's value then update it based on the old value

Example

    use std::collections::HashMap;

    let text = "hello world wonderful world";

    let mut map = HashMap::new();

    for word in text.split_whitespace() {
        let count = map.entry(word).or_insert(0);
        *count += 1;
    }

    println!("{map:?}");

This counts how many times each word appears in some text

If its the first time we see that word we initialize it to zero, then we add one using a reference to that key's value

This will output {"world": 2, "hello": 1, "wonderful": 1}

Output order is arbitrary when iterating over a hash map

split_whitespace method returns an iterator over sub slices, separated by whitespace, of the value in text.

or_insert method returns a mutable reference (&mut V) to the value for the specified key

In this case this reference is stored in count variable, then it is dereferenced count using * to assign a value to it. The reference then goes out of scope which ensures that the reference is freed and allows for the new assignment of another mutable reference to count. This makes all of the changes by the mutable reference safe because of the use of a for loop.

Hashing Functions

By default HashMap uses a hashing function called SipHash that can provided resistance to DoS (denial-of-service) attacks involving hash tables

This is not the fastest hashing algorithm available, but it the trade-off is for better security but with a drop in performance. This is worth it.

If your use case requires a different or faster hash you can switch by specifying a different hasher

A hasher is a type that implements the BuildHasher trait

Many other types of hashers can be provided in library crates which can be found on crates.io which implement many common hashing algorithms.