I/O Project: Building a Command Line Program
In this module I will be recreating grep (globally search a regular expression and print), a classic command line search tool.
Rust's speed, safety, single binary output and cross-platform support make it an ideal language for creating command line tools.
In the simplest use case, grep searches a specified file for a specified string.
To do this, grep takes a file path and a string as its arguments. It then reads the file, finds the lines in that file that contain the string argument, and prints those lines.
Along the way, this project will show how to use the terminal features that many other command line tools use.
It will include reading the value of an environment variable to allow the user to configure the behavior of our tool.
This project will also cover printing error messages to the standard error console stream (stderr) instead of the standard output (stdout).
We do that so the user can, for example, redirect successful output to a file while still seeing error messages onscreen.
One Rust community member, Andrew Gallant, has already created a fully featured, very fast version of grep called ripgrep.
The version built here will be fairly simple by comparison.
Initial Goal: Accept Command Line Arguments
The program should accept two command line arguments:
- A string to search for
- A path to a file to search in
We can pass them when running our program with cargo run by adding two hyphens, which indicate that the following arguments are for our program rather than for cargo.
Here is an example run
$ cargo run -- searchstring example-filename.txt
The program generated by cargo new cannot process the arguments we give it.
There are some existing libraries on crates.io that can help with writing a program that accepts command line arguments.
But since this is a learning opportunity, I (with the help of The Rust Programming Language) will be implementing this capability myself.
Reading the Argument Values
We will need the std::env::args function provided in Rust's std library.
This function returns an iterator of the command line arguments passed to the program.
Iterators will be covered fully in the next chapter.
For now, there are two important details about iterators:
- iterators produce a series of values
- we can call the collect method on an iterator to turn it into a collection, such as a vector, that contains all the elements the iterator produces
We bring the std::env module into scope with a use statement so we can use its args function.
Note that the std::env::args function is nested in two levels of modules.
In cases where the desired function is nested in more than one module, we choose to bring the parent module into scope rather than the function.
By doing this we can also use other functions from std::env.
It is also less ambiguous than adding use std::env::args and then calling the function with just args, because args might easily be mistaken for a function that is defined in the current module.
The args Function and Invalid Unicode
Note that std::env::args will panic if any argument contains invalid Unicode.
If your program needs to accept arguments containing invalid Unicode, use std::env::args_os instead.
That function returns an iterator that produces OsString values instead of String values.
We chose to use std::env::args for simplicity because OsString values differ per platform and are more complex to work with than String values.
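As a rough illustration of the alternative, here is a minimal sketch that reads the arguments with args_os and converts them lossily, so that invalid Unicode does not cause a panic. The helper name to_lossy_strings is hypothetical and not part of these notes; the rest of the project sticks with std::env::args.

```rust
use std::env;
use std::ffi::OsString;

// Hypothetical helper: converts each OsString to a String, replacing any
// invalid Unicode with the U+FFFD replacement character.
fn to_lossy_strings(raw: Vec<OsString>) -> Vec<String> {
    raw.iter()
        .map(|s| s.to_string_lossy().into_owned())
        .collect()
}

fn main() {
    // args_os yields OsString values and does not panic on invalid Unicode.
    let raw: Vec<OsString> = env::args_os().collect();
    let args = to_lossy_strings(raw);
    println!("{args:?}");
}
```

Lossy conversion is only one option; a tool that must preserve the exact bytes of a file name would keep working with the OsString values directly.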
On the first line of main we call env::args, and collect is immediately used to turn the iterator into a vector containing all the values it produces.
We can use the collect function to create many kinds of collections, so we explicitly annotate the type of args to specify that we want a vector of strings.
When using collect and other functions like it, we often need to annotate the type because Rust isn't able to infer the kind of collection desired.
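The code this paragraph describes is not reproduced in these notes, so here is a minimal sketch reconstructed from the description: main collects env::args into an annotated Vec<String> and prints it with dbg!. The second function, unique_sorted, is a hypothetical helper added only to show that the same collect call builds a different collection when the annotated type changes.

```rust
use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();
    dbg!(args);
}

// Hypothetical helper (not in the notes): here the return type asks collect
// for a BTreeSet, so duplicates are dropped and the values come out sorted.
#[allow(dead_code)]
fn unique_sorted(args: &[String]) -> std::collections::BTreeSet<String> {
    args.iter().cloned().collect()
}
```

Without an annotation (or a return type, as in the helper) the compiler has no way to choose between a vector, a set, or any other collection that collect can build.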
See the output with and without any arguments after cargo run
$ cargo run
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.61s
Running `target/debug/minigrep`
[src/main.rs:5:5] args = [
"target/debug/minigrep",
]
$ cargo run -- needle haystack
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.57s
Running `target/debug/minigrep needle haystack`
[src/main.rs:5:5] args = [
"target/debug/minigrep",
"needle",
"haystack",
]
Notice that the first value in the vector is "target/debug/minigrep", which is the name of our binary.
This matches the behavior of the arguments list in C, letting programs use the name by which they were invoked in their execution.
It's often convenient to have access to the program name in case you want to print it in messages or change the behavior of the program based on what command line alias was used to invoke the program.
For this program we will ignore it and save only the two arguments we need.
Saving the Argument Values in Variables
The program is currently able to access the values specified as command line arguments.
Now we should save the two arguments in variables so that we can use them throughout the rest of the program.
We do this by taking references to elements of the vector, for example &args[1].
The first argument that minigrep takes is the string we are searching for, so we put a reference to the first argument in the variable query.
The second argument is the file path, so we put a reference to the second argument in the variable file_path.
We will temporarily print the values of these variables to prove that the code is working as intended.
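A minimal sketch of this step, under two assumptions that are not in the notes: the indexing is moved into a hypothetical helper named query_and_path so it can be exercised on its own, and main gets a length guard so the sketch exits quietly when run without arguments.

```rust
use std::env;

// Hypothetical helper (the notes do this inline in main): borrows the query
// and the file path from the argument vector. Like any direct indexing, it
// panics if the slice holds fewer than three items.
fn query_and_path(args: &[String]) -> (&String, &String) {
    let query = &args[1];
    let file_path = &args[2];
    (query, file_path)
}

fn main() {
    let args: Vec<String> = env::args().collect();

    // Assumption: this guard is only here so the sketch does not panic
    // when no arguments are supplied.
    if args.len() < 3 {
        return;
    }

    let (query, file_path) = query_and_path(&args);
    println!("Searching for {query}");
    println!("In file {file_path}");
}
```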
Here is what the output would look like at this point
$ cargo run -- test sample.txt
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.0s
Running `target/debug/minigrep test sample.txt`
Searching for test
In file sample.txt
Second Goal: Reading a File
Now we will add functionality to read the file specified in the file_path argument.
First we will create a sample file to test it with: a small file with lots of repeated words.
Here is an Emily Dickinson poem that we will use. It will be stored in poem.txt at the root level of the project.
I'm nobody! Who are you?
Are you nobody, too?
Then there's a pair of us - don't tell!
They'd banish us, you know.
How dreary to be somebody!
How public, like a frog
To tell your name the livelong day
To an admiring bog!
Now let's add the functionality to read the contents of the file.
use std::env;
use std::fs;
fn main() {
// --snip--
println!("In file {file_path}");
let contents = fs::read_to_string(file_path)
.expect("Should have been able to read the file");
println!("With text:\n{contents}");
}
The first thing to note is that we bring in std::fs, which is part of the std library, to handle files.
main now contains a call to fs::read_to_string, which takes the file_path, opens that file, and returns a value of type std::io::Result<String> that contains the file's contents.
Afterwards we add a temporary println! statement that prints the value of contents after the file is read so that we can check for correctness.
Here is an example output. Note that the file name goes in the second argument
$ cargo run -- How poem.txt
Compiling minigrep v0.1.0 (/mnt/usb/RustBrock/minigrep)
warning: hard linking files in the incremental compilation cache failed. copying files instead. consider moving the cache directory to a file system which supports hard linking in session dir `/mnt/usb/RustBrock/minigrep/target/debug/incremental/minigrep-15n8unzbjfgbp/s-h4l8llgfdu-18r36aq-working`
warning: `minigrep` (bin "minigrep") generated 1 warning
Finished `dev` profile [unoptimized + debuginfo] target(s) in 9.26s
Running `target/debug/minigrep How poem.txt`
Searching for How
In file poem.txt
With text:
I'm nobody! Who are you?
Are you nobody, too?
Then there's a pair of us - don't tell!
They'd banish us, you know.
How dreary to be somebody!
How public, like a frog
To tell your name the livelong day
To an admiring bog!
It works as expected.
However, the main function now has multiple responsibilities; generally, functions are clearer and easier to maintain if each function is responsible for only one idea.
The other problem is that we are not handling errors as well as we could.
These aren't big problems while the program is small, but as the program grows it will be harder to fix them cleanly.
It is good practice to begin refactoring early on when developing because it is easier to refactor smaller amounts of code.
Third Goal: Refactor to Improve Modularity and Error Handling
We have 4 problems to fix
- Our main function now performs two tasks: parsing arguments and reading files.
It would be better to separate the tasks in the main function. As a function gains responsibilities, it becomes more difficult to reason about, harder to test, and harder to change without breaking one of its parts.
It is best to separate functionality so each function is responsible for one task.
- This is partly related to the first problem: although query and file_path are configuration variables for our program, variables like contents are used to perform the program's logic.
The longer main gets, the more variables we will need to bring into scope; the more variables we have in scope, the harder it is to track the purpose of each.
It is best to group the configuration variables into one struct to make their purpose clear.
- We use expect to print an error message when reading the file fails, but the error message just prints Should have been able to read the file, which does not say what actually went wrong.
Reading a file can fail in a number of ways: for example, the file could be missing, or we may not have permission to open it. Currently we would print the same message regardless of the situation or type of error.
- We use expect to handle an error, and if the user runs our program without specifying enough arguments, they will get an index out of bounds error from Rust that doesn't clearly explain the problem.
It would be best if all the error-handling code were in one place, so that future maintainers have only one place to consult if the error-handling logic needs to change.
Having all of the error-handling code in one place will also ensure that the messages we print make sense to our end users.
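To make the third problem concrete, here is a small sketch of how the different ways a read can fail could be told apart. It is not the approach these notes go on to take; the helper name describe_read_failure and the message wording are assumptions made for illustration.

```rust
use std::fs;
use std::io::ErrorKind;

// Hypothetical helper: returns None when the file can be read, or a message
// that names the specific cause when it cannot.
fn describe_read_failure(path: &str) -> Option<String> {
    match fs::read_to_string(path) {
        Ok(_) => None,
        Err(e) => Some(match e.kind() {
            ErrorKind::NotFound => format!("{path}: file not found"),
            ErrorKind::PermissionDenied => format!("{path}: permission denied"),
            // Any other failure falls back to the error's own description.
            _ => format!("{path}: {e}"),
        }),
    }
}

fn main() {
    // Assumption: this file does not exist in the working directory.
    if let Some(message) = describe_read_failure("no-such-file.txt") {
        println!("{message}");
    }
}
```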
Separation of Concerns for Binary Projects
The organizational problem of allocating responsibility for multiple tasks to the main function is common to many binary projects.
As a result, the Rust community has developed guidelines for splitting the separate concerns of a binary program when main starts getting large.
The process has the following steps
- Split your program into a main.rs file and a lib.rs file and move the program's logic to lib.rs
- As long as your command line parsing logic is small, it can remain in main.rs
- When the command line parsing logic starts getting complicated, extract it from main.rs and move it to lib.rs
The responsibilities that remain in the main function after this process should be limited to:
- Calling the command line parsing logic with the argument values
- Setting up any other configuration
- Calling a run function in lib.rs
- Handling the error if run returns an error
This pattern is about separating concerns: main.rs handles running the program and lib.rs handles all of the logic of the task at hand.
Because the main function cannot be tested directly, this structure lets you test all of your program's logic by moving it into functions in lib.rs.
The code that remains in main.rs will be small enough to verify its correctness by reading it.
Extracting the Argument Parser
We will first extract the functionality for parsing arguments into a function that main will call, to prepare for moving the command line parsing logic to src/lib.rs.
Here is how the start of main should now look
fn main() {
let args: Vec<String> = env::args().collect();
let (query, file_path) = parse_config(&args);
// --snip--
}
fn parse_config(args: &[String]) -> (&str, &str) {
let query = &args[1];
let file_path = &args[2];
(query, file_path)
}
We are still collecting the command line arguments into a vector, but instead of assigning the argument values at index 1 and index 2 to variables within main, we pass the whole vector to the parse_config function.
The parse_config function then holds the logic that determines which argument goes in which variable and passes the values back to main.
We still create query and file_path in main, but main no longer has the responsibility of determining how the command line arguments and variables correspond.
This rework may seem like overkill, but we are refactoring in small incremental steps.
After making this change it is good practice to verify that the argument parsing still works.
It is good to check your progress often, to help identify the cause of problems when they occur.
Grouping Configuration Values
We can take another small step to improve the parse_config function further.
At the moment we're returning a tuple and then immediately breaking that tuple into individual parts again.
This is a sign that we might not have the right abstraction yet.
Another indicator that there is room for improvement is the config part of parse_config.
This implies that the two values we return are related and are both part of one configuration value.
We are currently not conveying this meaning in the structure of the data other than by grouping the two values into a tuple.
Instead we should put the two values into one struct and give each of the struct fields a meaningful name.
Doing this makes it easier for future maintainers of this code to understand how the different values relate to each other and what their purpose is.
Here is the improved version of the function
fn main() {
let args: Vec<String> = env::args().collect();
let config = parse_config(&args);
println!("Searching for {}", config.query);
println!("In file {}", config.file_path);
let contents = fs::read_to_string(config.file_path)
.expect("Should have been able to read the file");
// --snip--
}
struct Config {
query: String,
file_path: String,
}
fn parse_config(args: &[String]) -> Config {
let query = args[1].clone();
let file_path = args[2].clone();
Config { query, file_path }
}
We have now added a struct named Config which has fields named query and file_path, which are both String values.
The signature of parse_config now indicates that it returns a Config value.
In the body of parse_config, where we used to return string slices that reference String values in args, we now build a Config that contains owned String values.
The args variable in main is the owner of the argument values and is only letting the parse_config function borrow them, which means we'd violate Rust's borrowing rules if Config tried to take ownership of the values in args.
There are a number of ways we could manage the String data; the easiest, though somewhat inefficient, route is to call the clone method on the values.
This makes a full copy of the data for the Config instance to own, which takes more time and memory than storing a reference to the string data.
However, cloning the data also makes the code very straightforward because we don't have to manage the lifetimes of the references; in this circumstance, giving up a little performance to gain this simplicity is a worthwhile trade-off.
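The ownership point can be checked with a small sketch. It reuses Config and parse_config as described here, and feeds them fixed sample arguments (an assumption, so the sketch runs without a command line) to show that args stays usable after the call while config owns independent copies.

```rust
struct Config {
    query: String,
    file_path: String,
}

fn parse_config(args: &[String]) -> Config {
    // clone copies the string data, so Config does not borrow from args.
    let query = args[1].clone();
    let file_path = args[2].clone();

    Config { query, file_path }
}

fn main() {
    // Fixed sample arguments, standing in for env::args().collect().
    let args = vec![
        String::from("minigrep"),
        String::from("needle"),
        String::from("haystack.txt"),
    ];

    let config = parse_config(&args);

    // args was only borrowed, so it is still fully usable here.
    println!("args still holds {} items", args.len());
    println!("config owns {} and {}", config.query, config.file_path);
}
```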
The Trade-Offs of Using clone
There is a tendency to avoid using clone to fix ownership problems because of its runtime cost.
The next chapter will go over how to use more efficient methods in this type of situation.
For now it is ok to copy a few strings to continue making progress, because you will make these copies only once and your file path and query string are very small.
It is better to have a working program that is a bit inefficient than to try to hyperoptimize code on the first pass.
With more experience it will be easier to start with the most efficient solution; for now it is perfectly acceptable to call clone.
Creating a Constructor for Config
So far we have extracted the logic responsible for parsing the command line arguments from main and placed it in the parse_config function.
Doing this helped us see that the query and file_path values are related and that this relationship should be conveyed in our code.
We then added a Config struct to name the related purpose of query and file_path and to be able to return the values' names as struct field names from the parse_config function.
Now that the purpose of the parse_config function is to create a Config instance, we can change parse_config from a plain function to a function named new that is associated with the Config struct.
Making this change will make the code more idiomatic.
We can create instances of types in the std library, such as String, by calling String::new.
Similarly, by changing parse_config into a new function associated with Config, we will be able to create instances of Config by calling Config::new.
Here is how these changes should be made
fn main() {
let args: Vec<String> = env::args().collect();
let config = Config::new(&args);
// --snip--
}
// --snip--
impl Config {
fn new(args: &[String]) -> Config {
let query = args[1].clone();
let file_path = args[2].clone();
Config { query, file_path }
}
}
Fourth Goal: Fixing the Error Handling
Recall that attempting to access the values in the args
vector at index 1 or index 2 will cause the program to panic if the vector contains fewer than three items
Here is what the output would look like
$ cargo run
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.0s
Running `target/debug/minigrep`
thread 'main' panicked at src/main.rs:27:21:
index out of bounds: the len is 1 but the index is 1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
The line index out of bounds: the len is 1 but the index is 1 is an error message intended for programmers.
It will not help our end users understand what went wrong or what they should do instead.
Improving the Error Message
First we will add a check in the new function that verifies that the slice is long enough before accessing index 1 and index 2.
If the slice is not long enough, the program panics and displays a better error message.
// --snip--
fn new(args: &[String]) -> Config {
if args.len() < 3 {
panic!("not enough arguments");
}
// --snip--
This code is similar to the Guess::new function we wrote in [ch9](../Error%20Handling.md#To-panic!-or-Not-to-panic!), where we called panic! when the value argument was out of the range of valid values.
Instead of checking for a range of values, here we are checking that the length of args is at least 3, so the rest of the function can operate under the assumption that this condition has been met.
If args has fewer than three items, the condition will be true and the program will call the panic! macro, ending it immediately.
Here is the new output after adding this code, with the same lack of arguments
$ cargo run
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.0s
Running `target/debug/minigrep`
thread 'main' panicked at src/main.rs:26:13:
not enough arguments
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
This output is a more reasonable error message.
However, we also have extraneous information that we don't want to give to our users.
As discussed in ch9, a call to panic! is more appropriate for a programming problem than for a usage problem.
Instead we will use a different technique: returning a Result that indicates either success or an error.
Returning a Result Instead of Calling panic!
We will return a Result value that will contain a Config instance in the successful case and will describe the problem in the error case.
We are also going to change the function name from new to build because many programmers expect new functions to never fail.
When Config::build communicates with main, we can use the Result type to signal that there was a problem.
Then we can change main to convert an Err variant into a more practical error for our users, without the surrounding text about thread 'main' and RUST_BACKTRACE that a call to panic! causes.
Here is how we would make these changes to build.
Note that this will not compile without changes to main as well.
impl Config {
fn build(args: &[String]) -> Result<Config, &'static str> {
if args.len() < 3 {
return Err("not enough arguments");
}
let query = args[1].clone();
let file_path = args[2].clone();
Ok(Config { query, file_path })
}
}
Our build function returns a Result with a Config instance in the success case and a string literal in the error case.
Our error values will always be string literals that have the 'static lifetime.
There are two major changes in the body of the function
- Instead of calling panic! when the user doesn't pass enough arguments, we now return an Err value
- We wrapped the Config return value in an Ok
These changes make the function conform to its new type signature.
Returning an Err value from Config::build allows the main function to handle the Result value returned from the build function and exit the process more cleanly in the error case.
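To see both outcomes side by side, here is a small self-contained sketch that calls Config::build once with too few arguments and once with enough. The derive attributes and the fixed sample argument lists are assumptions added for illustration; the build function itself is the one described here.

```rust
// Debug and PartialEq are derived only so results can be printed and compared.
#[derive(Debug, PartialEq)]
struct Config {
    query: String,
    file_path: String,
}

impl Config {
    fn build(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let file_path = args[2].clone();

        Ok(Config { query, file_path })
    }
}

fn main() {
    // Fixed sample argument lists standing in for real command lines.
    let too_few = vec![String::from("minigrep")];
    let enough = vec![
        String::from("minigrep"),
        String::from("needle"),
        String::from("poem.txt"),
    ];

    for args in [&too_few, &enough] {
        match Config::build(args) {
            Ok(config) => println!("built: {} in {}", config.query, config.file_path),
            Err(message) => println!("build failed: {message}"),
        }
    }
}
```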
Calling Config::build and Handling Errors
To handle the error case and print a user-friendly message, we need to update main to handle the Result being returned by Config::build.
We will also take the responsibility of exiting the command line tool with a nonzero error code away from panic! and instead implement it by hand.
A nonzero exit status is a convention to signal to the process that called our program that the program exited with an error state.
Here is the implementation of these changes
use std::process;
fn main() {
let args: Vec<String> = env::args().collect();
let config = Config::build(&args).unwrap_or_else(|err| {
println!("Problem parsing arguments: {err}");
process::exit(1);
});
// --snip--
In this listing we used unwrap_or_else, which is defined on Result<T, E> by the std library.
Using this method allows us to define some custom, non-panic! error handling.
If the Result is an Ok value, this method's behavior is similar to unwrap: it returns the inner value that Ok is wrapping.
If the value is an Err value, this method calls the code in the closure, which is an anonymous function we define and pass as an argument to unwrap_or_else.
Closures will be covered in the next chapter (ch13).
For now, you just need to know that unwrap_or_else will pass the inner value of the Err to our closure in the argument err that appears between the vertical pipes.
The code in the closure can then use the err value when it runs.
We also brought process from the std library into scope.
The code in the closure that will be run in the error case is only two lines:
- we print the err value
- we call process::exit
The process::exit function will stop the program immediately and return the number that was passed as the exit status code.
This is similar to the panic!-based handling we used before, but we no longer get all the extra output.
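The behavior of unwrap_or_else described above can be tried in isolation. This is a minimal sketch on an unrelated Result (parsing a number); the helper name parse_or_zero and the fallback value are assumptions chosen for illustration, and the closure returns a default instead of exiting the process.

```rust
// Hypothetical helper: parses a number, and on failure runs the closure,
// which receives the inner Err value as `err` and supplies a fallback.
fn parse_or_zero(input: &str) -> i32 {
    input.parse::<i32>().unwrap_or_else(|err| {
        println!("Problem parsing {input:?}: {err}");
        0
    })
}

fn main() {
    // Ok case: the inner value is returned and the closure never runs.
    println!("{}", parse_or_zero("42"));
    // Err case: the closure runs and its result is returned instead.
    println!("{}", parse_or_zero("forty-two"));
}
```

In main the closure ends with process::exit(1) rather than a fallback value, which type checks because process::exit never returns.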
Here is the new output in an error case
$ cargo run
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.48s
Running `target/debug/minigrep`
Problem parsing arguments: not enough arguments
This output is much friendlier for our users.
Fifth Goal: Extracting Logic from main
Now that we have finished refactoring the configuration parsing, let's separate out the program's logic.
As stated in Separation of Concerns for Binary Projects, we will extract a function named run that will hold all the logic currently in the main function that isn't involved with setting up configuration or handling errors.
When this is done, main will be concise and easy to verify by inspection, and we will be able to write tests for all the other logic.
Here is the extracted run function. For now it will be small, and we will improve it incrementally after extracting it.
fn main() {
// --snip--
println!("Searching for {}", config.query);
println!("In file {}", config.file_path);
run(config);
}
fn run(config: Config) {
let contents = fs::read_to_string(config.file_path)
.expect("Should have been able to read the file");
println!("With text:\n{contents}");
}
// --snip--
The run function now contains all the remaining logic from main, starting from reading the file.
Returning Errors from the run Function
With the remaining program logic in the run function, we can improve the error handling just like we did with Config::build.
Instead of allowing the program to panic by calling expect, the run function will return a Result<T, E> when something goes wrong.
This will let us further consolidate the logic around handling errors into main in a user-friendly way.
Here is the updated function with its new signature
use std::error::Error;
// --snip--
fn run(config: Config) -> Result<(), Box<dyn Error>> {
let contents = fs::read_to_string(config.file_path)?;
println!("With text:\n{contents}");
Ok(())
}
The three significant changes are:
- We changed the return type of the run function to Result<(), Box<dyn Error>>
The function previously returned the unit type, (), and we keep that as the value returned in the Ok case.
For the error type we used the trait object Box<dyn Error>, and we brought std::error::Error into scope with a use statement.
We will cover trait objects later (ch17).
For now, know that Box<dyn Error> means the function will return a type that implements the Error trait, but we don't have to specify what particular type the return value will be.
This gives us the flexibility to return error values that may be of different types in different error cases.
The dyn keyword is short for dynamic.
- We removed the call to expect in favor of the ? operator (can be found here)
Rather than panic! on an error, ? will return the error value from the current function for the caller to handle.
- The run function now returns an Ok value in the success case
The function declares () as its success type, so we wrap the unit value in Ok.
The Ok(()) syntax might look strange at first, but using () like this is the idiomatic way to indicate that we are calling run for its side effects only; it doesn't return a value we need.
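The flexibility of Box<dyn Error> is easier to see when one function can fail in two different ways. The following is a minimal sketch with a hypothetical helper, first_line_as_number, that is not part of minigrep: the read can fail with an I/O error and the parse with a ParseIntError, and ? converts both into the same boxed error type.

```rust
use std::error::Error;
use std::fs;

// Hypothetical helper: reads a file and parses its first line as a number.
// Both `?` uses return early on failure, boxing whichever error occurred.
fn first_line_as_number(path: &str) -> Result<i64, Box<dyn Error>> {
    let contents = fs::read_to_string(path)?; // may fail with std::io::Error
    let first = contents.lines().next().unwrap_or("");
    let number = first.trim().parse::<i64>()?; // may fail with ParseIntError
    Ok(number)
}

fn main() {
    // Assumption: this file does not exist in the working directory.
    match first_line_as_number("no-such-file.txt") {
        Ok(number) => println!("parsed {number}"),
        Err(e) => println!("error: {e}"),
    }
}
```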
Here is the warning that the compiler will output at this point
$ cargo run -- the poem.txt
Compiling minigrep v0.1.0 (file:///projects/minigrep)
warning: unused `Result` that must be used
--> src/main.rs:19:5
|
19 | run(config);
| ^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
= note: `#[warn(unused_must_use)]` on by default
help: use `let _ = ...` to ignore the resulting value
|
19 | let _ = run(config);
| +++++++
warning: `minigrep` (bin "minigrep") generated 1 warning
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.71s
Running `target/debug/minigrep the poem.txt`
Searching for the
In file poem.txt
With text:
I'm nobody! Who are you?
Are you nobody, too?
Then there's a pair of us - don't tell!
They'd banish us, you know.
How dreary to be somebody!
How public, like a frog
To tell your name the livelong day
To an admiring bog!
Rust tells us that our code ignored the Result value and that the Result value might indicate that an error occurred.
We are not checking whether or not there was an error, and the compiler reminds us that we probably meant to have some error-handling code here.
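One possible way to address the warning, sketched here as an assumption about the next step rather than something these notes have reached yet, is to check the Result with if let. To keep the sketch runnable without a command line, main uses a hard-coded Config that points at a file assumed not to exist, and it only prints the error instead of exiting.

```rust
use std::error::Error;
use std::fs;

struct Config {
    query: String,
    file_path: String,
}

fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.file_path)?;

    println!("With text:\n{contents}");

    Ok(())
}

fn main() {
    // Assumption: hard-coded values standing in for Config::build(&args).
    let config = Config {
        query: String::from("the"),
        file_path: String::from("no-such-file.txt"),
    };

    println!("Searching for {}", config.query);

    // run returns only () on success, so there is nothing to unwrap;
    // we only care about detecting the error case.
    if let Err(e) = run(config) {
        println!("Application error: {e}");
        // A real command line tool would likely call std::process::exit(1) here.
    }
}
```

if let fits better than unwrap_or_else here because the success value is (), so there is no inner value worth extracting.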