layout: post title: “Simple Rust, Part Four” date: 2015-12-30T14:07:28-06:00 subtitle: “Iterators, References and Options” author: “Ian Whitney” —
This is it, the home stretch. In the 3 previous articles we’ve learned a lot about Rust (like Strings, Arrays and Lifetimes), but we haven’t actually written the code that started us down this path – finding anagrams of words. Today we will write that code.
Of our 11 tests we have managed to get the first one passing. Just 10 more tests to go! Here’s our next one:
Our current function won’t work, obviously. It only returns an empty vector:
Now, I have no idea how to do this in Rust. But I know how I would do it in Ruby, which is a good place to start.
I can use my working Ruby implementation like a Rosetta Stone to translate code I know how to write into a language that I’m still learning.
In the first line of my Ruby
anagrams_for, I’m using
reject to remove any input that is the same word as our source. “ant” is not an anagram of “ant” because they are the same word. Then I use
select to gather up all the input words that have the exact same letters as our source. This relies on Ruby’s Enumerable library and its support for closures.
Rust loves list-transformation functions like these, which it calls Iterator Adaptors. Another common name for them is Higher Order Functions. Ruby’s
reject are examples of the filter higher order function. Rust also offers filter, which it calls filter. The
filter documentation reads:
An iterator that filters the elements of iter with predicate.
Ok. That almost makes sense. I know what a predicate is (though I just learned it recently). It’s an expression that returns true or false. But what’s
iter? Let’s go back to Ruby and see how it handles iteration.
In Ruby, you can iterate over anything that’s enumerable:
Arrays, hashes, ranges – these are enumerable in Ruby so you can iterate over their elements and use Ruby’s collection of higher order functions like
Rust offers arrays, hashes and ranges as well. You can iterate over them…sometimes:
Notice my qualifying ‘sometimes’ above:
1..3 can be iterated over, but an array can not. Let’s follow the error’s advice and use
Great. Simply put,
iter() takes a collection converts it into an Iterator; once you have an Iterator you can iterate over it and use all those fancy higher order functions, like
Filter will make it easy for us to replicate the first part of my Ruby solution, rejecting all inputs that are duplicates of the source word.
Our filter uses the
same_word function as its predicate. If the words are the same, we remove the input. If not, we keep the input. Running this gives us the error:
(expected struct `collections::vec::Vec`, found struct `core::iter::Filter`) [E0308]
We’ve told Rust that
anagrams_for returns a
Vec, but it’s returning a
This is because Rust’s iterators are lazy. Our call to filter doesn’t actually do the filtering until we try to gather up the results. Ruby’s Enumerator works similarly, as does its lazy enumerable. Rust calls its result-gathering functions Consumers. Since we want to collect the results of our filter, we use the
collect consumer. And, as the documentation points out, we have to tell collect what type we want to get back. Our function promises to return a
Vec<&str>, so let’s have collect give us one of those:
Now we’re sure to compile, right? By now you should know that every time I ask this question the answer is, “No.”
error: the trait `core::iter::FromIterator<&&str>` is not implemented for the type `collections::vec::Vec<&str>`
Hell. What this is saying is that collect can’t give me a
Vec<&str> because I’ve given it a iterator that contains
&&str? Or should I say
&&str??. Where did that second
& come from?
The short answer is that
iter() borrows each element in inputs. Inputs was full of
&str, which became
&&str when they were borrowed by
&&&str if they are borrowed a 3rd time. I don’t think there’s a limit to how many times you can re-borrow something). Herman Radtke’s blog post Effectively Using Iterators In Rust dives into this topic, if you want more details.
We can fix our code by changing it to collect the right type, or we can also tell
collect to figure out the type itself:
_ is a ‘type placeholder’, which allows us to leave out the type of elements our vector will contain and let Rust determine it for us.
The type placeholder is nice, but it just gets us a new error. Our vector is still full of
&&str elements, and our function says it returns a
Vec<&str>. That’s a no go. We need a way to return
I’ve been using the words ‘reference’ and ‘borrow’ pretty interchangeably without defining what’s going on. There’s a good reason for that – I don’t understand what’s going on. But let me take a stab at it:
If I write:
Then x is a
&str. I read that as “x is borrowing a str”, or “x holds a reference to a str”. Somewhere in memory is the actual
str, and our variable
x is holding a reference to that location.
If we then borrow x again:
y is a
&&str. Or, “y holds a reference to a reference to a str.”
Rust offers a way to get at the thing we are referencing – the entirely un-googleable
As far as I can tell
* is not well documented in the Rust book. It’s mentioned in the References and Borrowing section with the almost throwaway sentence:
You’ll also notice we added an asterisk (*) in front of y, making it *y, this is because y is an &mut reference. You’ll also need to use them for accessing the contents of a reference as well.
Emphasis mine. I dunno, maybe I’m off base, but this sounds pretty important. It’s certainly important to us right now. We have a vector full of references to references, and we want to get a vector of just the original references:
We create a new mutable vector and fill it with the
&str references from our filtered collection. We then return that vector, and our code compiles!
Now that we can remove inputs that match our source word, we have to find inputs that are anagrams of our source word. In my Ruby solution, I converted the word to an array of letters with
chars and then sorted the letters alphabetically with
I can then compare the sorted array of input characters to the sorted array of source characters.
We can do the same thing in Rust. The code looks familiar.
same_letters we use
chars() to convert our source and input to an iterable collection of characters, which we then
collect. We then sort those collections and compare them. This looks an awful lot like my Ruby example. Though in Rust, unlike Ruby, the
sort() function sorts the receiver in place, it does not return the sorted collection.
If we run this code against our test suite, nearly everything passes. Some tests cover upper-case characters, but the fix for case insensitivity isn’t that interesting so I’m going to skip it. If you want to see my solution, it’s over here.
Our tests pass, but I’m not happy with the code. I understand why I have to dereference everything, but that code seems totally irrelevant to what the
anagrams_for function is doing. It exists to make the compiler happy.
Creates an iterator that both filters and maps elements. If the specified function returns None, the element is skipped. Otherwise the option is unwrapped and the new value is yielded.
‘Unwrapping’ is kind of a weird word, and not one we’ve come across yet. There’s actually a lot of complexity here, which I’m going to entirely skip. The relevant thing we need to know is that I can stick a reference in the type
Some and when I ‘unwrap’ it I get my reference back. So:
Some’s partner in crime is
None. You always see the two of them together. Taken together they are the “Option” or “Optional” type. If an expression can return something or nothing, you’ll see code like this:
When you deal with the result of this expression you write code that checks, “Did I get Some or None?” and behave appropriately.
Though it’s not about Rust, I found the book Maybe Haskell to be a great introduction to the concepts behind Some and None It can be a hard concept to wrap your head around, though once you do the value it provides is immense.
Knowing that background, the
filter_map description makes more sense. It:
- filters an iterator’s elements, just like
- The filter uses a function that returns Some or None
- If None, the element is removed
- If Some, the reference is returned
Let’s see what our
anagrams_for function would look like if it used
2 lines shorter. More importantly, I find this easier to understand. The result of
filter_map is what we want to return, no need to loop and dereference a bunch of references.
And that’s it! Not only do we have a working anagram finder, but we (hopefully) know a lot more about Rust that when we started. There’s a lot (a lot) more to Rust than we’ve covered here, but this is where I’m going to stop for now. The next post on this site will return to the topic of Refactoring, and will be the first in a series of posts on Practicing Refactoring.
If you want to read more about Rust, Ruby, Refactoring and other things that start with R (and maybe other letters), maybe check out my totally free newsletter. You can read previous newsletters, or sign up for free. Comments/feedback/&c. welcome on twitter, at email@example.com, or leave comments on the pull request for this post.