I/O Project: Building a Command Line Program

In this module I will be recreating the classic command line search tool grep (globally search a regular expression and print).

Rust's speed, safety, single binary output, and cross-platform support make it an ideal language for creating command line tools.

In the simplest use case, grep searches a specified file for a specified string.

To do this, grep takes a file path and a string as its arguments. It then reads the file, finds the lines in that file that contain the string argument, and prints those lines.

Along the way, this project will also show how to use the terminal features that many other command line tools use.

It will include reading the value of an environment variable to allow the user to configure the behavior of our tool.

This project will also go into printing error messages to the standard error console stream (stderr) instead of standard output (stdout).

We do that so the user can, for example, redirect successful output to a file while still seeing error messages onscreen.

One Rust community member, Andrew Gallant, has already created a fully featured, very fast version of grep called ripgrep.

By comparison, this version will be fairly simple.

Initial Goal: Accept Command Line Arguments

We want to be able to run our program with cargo run, followed by two hyphens to indicate that the following arguments are for our program rather than for cargo, and then the two arguments the program needs:

  • A string to search for
  • A path to a file to search in

Here is an example invocation:

$ cargo run -- searchstring example-filename.txt

The program generated by cargo new cannot yet process the arguments we give it.

There are some existing libraries on crates.io that can help with writing a program that accepts command line arguments.

But since this is a learning opportunity, I (with the help of The Rust Programming Language) will be implementing this capability myself.

Reading the Argument Values

We will need the std::env::args function provided in Rust's standard library.

This function returns an iterator of the command line arguments passed to the program.

Iterators will be covered in more detail in the chapter after this one.

For now, there are only two important details to know about iterators:

  • iterators produce a series of values
  • we can call the collect method on an iterator to turn it into a collection, such as a vector, that contains all the elements the iterator produces
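
The two details above can be seen in a minimal sketch. The collect_words helper is hypothetical and exists only for illustration; it is not part of the project.

```rust
// Hypothetical helper for illustration only: splits text on whitespace and
// collects the pieces produced by the iterator into a vector.
fn collect_words(text: &str) -> Vec<String> {
    // split_whitespace returns an iterator that yields one &str at a time.
    let words = text.split_whitespace();
    // collect gathers every value the iterator produces into a Vec<String>.
    words.map(String::from).collect()
}

fn main() {
    let words = collect_words("needle haystack");
    println!("{words:?}");
}
```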

We bring the std::env module into scope with a use statement so we can call its args function.

Note that the std::env::args function is nested in two levels of modules.

In cases where the desired function is nested in more than one module, we choose to bring the parent module into scope rather than the function itself.

By doing this, we can also easily use other functions from std::env.

It is also less ambiguous than adding use std::env::args and then calling the function with just args, because args might easily be mistaken for a function that is defined in the current module.
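
As a rough sketch of this import choice: the argument_count helper below is hypothetical, and env::current_dir is used only as an example of another function reachable through the same import.

```rust
use std::env;

// Hypothetical helper for illustration: counts the command line arguments,
// including the program name.
fn argument_count() -> usize {
    // The env:: prefix shows at the call site that args comes from std::env
    // and is not a function defined in the current module.
    env::args().count()
}

fn main() {
    println!("argument count: {}", argument_count());

    // Other std::env functions are available through the same import.
    if let Ok(dir) = env::current_dir() {
        println!("current directory: {}", dir.display());
    }
}
```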

The args Function and Invalid Unicode

Note that std::env::args will panic if any argument contains invalid Unicode.

If your program needs to accept arguments containing invalid Unicode, use std::env::args_os instead.

This function returns an iterator that produces OsString values instead of String values.

We chose to use std::env::args for simplicity because OsString values differ per platform and are more complex to work with than String values.
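
For comparison, here is a minimal sketch of what using std::env::args_os might look like. The to_lossy_strings helper is hypothetical, and converting lossily (replacing invalid Unicode with a placeholder character) is just one possible way to handle OsString values, not something the project itself does.

```rust
use std::env;
use std::ffi::OsString;

// Hypothetical helper for illustration: converts each OsString to a String,
// replacing any invalid Unicode with the U+FFFD replacement character.
fn to_lossy_strings(raw: Vec<OsString>) -> Vec<String> {
    raw.iter()
        .map(|value| value.to_string_lossy().into_owned())
        .collect()
}

fn main() {
    // args_os does not panic on invalid Unicode; it yields OsString values.
    let raw: Vec<OsString> = env::args_os().collect();
    let args = to_lossy_strings(raw);
    println!("{args:?}");
}
```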

On the first line of main, we call env::args and immediately use collect to turn the iterator into a vector containing all the values it produces.

We can use the collect function to create many kinds of collections, so we explicitly annotate the type of args to specify that we want a vector of strings.

collect is one function we often need to annotate, because Rust isn't able to infer the kind of collection we want.

Here is the output of cargo run, first without any arguments and then with two arguments:
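
The code described above is not reproduced in these notes. A sketch consistent with the description, assuming the vector is printed with the dbg! macro, would be:

```rust
use std::env;

fn main() {
    let args: Vec<String> = env::args().collect(); // the annotation tells collect to build a Vec<String>
    dbg!(args); // prints the vector to stderr along with the file and line number
}
```

The comments are kept on the same lines as the code so that dbg! stays on line 5 of src/main.rs.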

$ cargo run
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.61s
     Running `target/debug/minigrep`
[src/main.rs:5:5] args = [
    "target/debug/minigrep",
]
$ cargo run -- needle haystack
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.57s
     Running `target/debug/minigrep needle haystack`
[src/main.rs:5:5] args = [
    "target/debug/minigrep",
    "needle",
    "haystack",
]

Notice that the first value in the vector is "target/debug/minigrep", which is the name of our binary.

This matches the behavior of the arguments list in C, letting programs use the name by which they were invoked in their execution.

It's often convenient to have access to the program name in case you want to print it in messages or change the behavior of the program based on what command line alias was used to invoke the program.

For this program we will ignore it and save only the two arguments we need.
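
As an aside, here is a small sketch of printing the program name in a message. The usage_message helper and the wording of the message are hypothetical and are not part of this project.

```rust
use std::env;

// Hypothetical helper for illustration: builds a usage message that includes
// whatever name the program was invoked with.
fn usage_message(program_name: &str) -> String {
    format!("Usage: {program_name} <query> <file_path>")
}

fn main() {
    let args: Vec<String> = env::args().collect();
    // The first element is conventionally the program name; fall back to a
    // fixed name in the unusual case that the list is empty.
    let program_name = args.first().map(String::as_str).unwrap_or("minigrep");
    eprintln!("{}", usage_message(program_name));
}
```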

Saving the Argument Values in Variables

The program is currently able to access the values specified as command line arguments.

Now we should save the two arguments in variables so that we can use them throughout the rest of the program.

We do this by indexing into the vector and taking a reference to each element, for example &args[1].

The first argument that minigrep takes is the string we are searching for, so we put a reference to it in the variable query. It sits at index 1 because the program name takes up index 0.

The second argument is the file path, so we put a reference to it (index 2) in the variable file_path.

We will temporarily print the values of these variables to prove that the code is working as intended.

Here is what the output looks like at this point:
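
The steps above can be sketched as follows. The query_and_path helper and the length check in main are assumptions added so the sketch can be run without arguments; the project itself simply indexes args directly in main.

```rust
use std::env;

// Hypothetical helper for illustration: picks the two values out by index.
// Indexing panics if fewer than two arguments follow the program name.
fn query_and_path(args: &[String]) -> (&String, &String) {
    let query = &args[1]; // args[0] is the program name
    let file_path = &args[2];
    (query, file_path)
}

fn main() {
    let args: Vec<String> = env::args().collect();

    // Assumption: guard against missing arguments so the sketch does not panic.
    if args.len() < 3 {
        eprintln!("expected a search string and a file path");
        return;
    }

    let (query, file_path) = query_and_path(&args);

    // Temporary output to confirm the values were captured.
    println!("Searching for {query}");
    println!("In file {file_path}");
}
```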

$ cargo run -- test sample.txt
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep test sample.txt`
Searching for test
In file sample.txt