# Exercism: The Hamming Exercise

Posted by on June 3, 2014

The Readme

The Test Suite

We need to find the differences between two strings. But those strings are also very array-like; as in, “On this DNA strand there’s a G at the first position.” And Ruby already has a lot of nice syntax for comparing arrays. So I started thinking about this problem as one with an array-focused solution.

I also thought about the Ruby Set library, but that won’t work because Sets want unique objects, so a “G” in the first position and a “G” in the 8th position would be squished down to just one “G”.

``````strand = %w(g a t t a g)
=> ["g", "a", "t", "t", "a", "g"]
strand.to_set
=> #<Set: {"g", "a", "t"}>
``````

That won’t work.

So, following my standard approach, I did the easiest fastest thing that got the tests passing. And it looked like this

``````class Hamming
def self.compute(a,b)
a = a.chars
b = b.chars
max = (a.count < b.count) ? a.count : b.count
a = a[0,max]
b = b[0,max]
y = a.zip(b)
y.inject(0) {|ret, h| ret += 1 if h.first != h.last; ret}
end
end
``````

Oh, yeah, that’s nasty. The duplicate logic; the weird, meaningless sorting and trimming of arrays; the block that takes a few seconds to puzzle out. It’s rough stuff.

But, it passed my tests, and that’s all I wanted. With tests green, I can refactor. And refactor I did! Removing the array trimming was an easy fix. Then I began to pull out Strand behaviors

``````class Strand
include Enumerable

attr_accessor :collection

def self.parse(strand_string)
self.new(strand_string)
end

def initialize(strand_string)
self.collection = strand_string.chars
end

def each(&block)
collection.each(&block)
end
end
``````

Like I said, I quickly thought of a Strand as being array-ish. So `Strand` has a way of parsing a string into an array and then acts as a thin wrapper around the array.

But if an array for each Strand is good, then surely one more array must be better! After noticing the behaviors that required two Strands, I decided to contain that logic in a `Strands` object.

This was a mistake. And even when I tried to make it better it was still a mistake. The `compute` method became nedlessly complex and tied to its knowledge of both Strand and Strands. Time to ditch this and revisit how I originally wanted to solve this problem.

I was initially drawn to a Strand as an array, largely because of Ruby’s implementation of array subtraction/diffing/etc. The Ruby implementaition of subtraction doesn’t work for Hamming, but I still liked that syntax. In my perfect world, this would work:

``````hamming_differences = Strand.parse('GAT') - Strand.parse('AAT')
``````

Your own desires are a great design tool. This is the syntax you want to see? Write code that makes it happen. Don’t know how? Learn!

So I ditched Strands. As I say in the commit:

Strands didn’t really give me anything beyond a way to manipulate two Strand objects. It seemed unecessary if I could have a Strand know how to do the maniuplation. Like, if I want to add 1 + 2, I don’t have an Integers object that does the math, the Integer object knows how to do it.

And, after I googled “What the hell is Hamming?” and discovered that it is a very generic idea used all the time, I also decide to separate the Hamming logic in such a way that I can add it to any class.

``````class Hamming
def self.compute(a,b)
(Strand.parse(a) - Strand.parse(b)).count
end
end

module Hammable
def -(other)
sorted = [self,other].sort_by { |x| x.count }
combined = sorted.first.zip(sorted.last)
x = combined.select {|strand_set| strand_set.first != strand_set.last}
self.class.new(x)
end
end

class Strand
include Enumerable
include Hammable

def self.parse(strand_string)
self.new(strand_string.chars)
end

def initialize(strand_array)
self.collection = Array(strand_array)
end

def each(&block)
collection.each(&block)
end

private

attr_accessor :collection
end
``````

Now I have a `Hammable` module I can add to whatever. The `-` method implementaiton is ugly, but it works.

I started to doubt the wisdom of overriding the array `-` method. It is probably a really dumb idea. Imagine the unexepcted surprises. Want to subtract two normal arrays, it works one way. Subtract two hammable arrays and it’s totally different. Yikes! And if you had one hammable and one normal array, the behavior would change depending on the order you used. Terrible. So a better approach is to define a new method, say `hamming_difference`

``````Strand.parse(a).hamming_difference(Strand.parse(b))
``````

It’s a better idea, but it’s not anywhere as nice to read. Oh, well. Sometimes we can’t have everything we want.

And there’s the question of what type of object it should return. Right now it will return a `Strand` as it was called by a `Strand`. But a better approach might be to have it return a `HammingDifference` object (itself just a thin wrapper around an array).

``````module Hammable
def hamming_difference(other)
sorted = [self,other].sort_by { |x| x.count }
combined = sorted.first.zip(sorted.last)
x = combined.select {|strand_set| strand_set.first != strand_set.last}
HammingDifference.new(x)
end
end

class HammingDifference < SimpleDelegator; end
``````

But what does that give us? Is there an appreciable difference between Hamming and HammingDifference? If so, I’m not seeing it. So now, maybe after all this code re-arranging, we should move the Hamming logic back under `Hamming`.

``````class Hamming
def self.compute(a, b, type = Strand)
self.new(a, b, type).count
end

def initialize(a, b, type)
self.one = type.parse(a)
self.two = type.parse(b)
end

def count
difference.count
end

def difference
one.zip(two).select {|pair| Comparison.different?(pair)}
end

class Comparison
def self.different?(couple)
couple.first != couple.last &&
!couple.last.nil? &&
!couple.first.nil?
end
end

private

attr_accessor :one, :two
end

require 'delegate'
class Strand < SimpleDelegator
def self.parse(strand_string)
self.new(strand_string.chars)
end
end
``````

And finally, maybe, I’m ok with the code. Well, the attributes inside of `Hamming` are badly named, but that’s minor. `compute` has become a declarative builder method that instantiates an instance of `Hamming` and call’s the instance’s `count` method.

Gone is `Hammable`, since its logic now lives in `Hamming`. We no longer have to mix that code into `Strand`, though we do now have to tell what kind of objects `Hamming` should create. I’ve defaulted it to `Strand` here, but `Hamming` could easily work with other strings we want to find the Hamming distance of.

`Comparison` lives inside of the `Hamming` namespace, which looks a little weird. Why not just have `difference?` be a method in `Hamming`? Well, I wanted to encapsulate the logic of what constitutes a difference between two strands: both strands have to have a letter and those letters have to be different from one another. And to encapsulate the logic inside of a `Hamming` method seemed weird. Because somewhere in the code we’nd end up calling `self.different?(pair)` and that wasn’t a method I wanted a `Hamming` instance to have. For some reason. Even I’m a little unclear on why I didn’t like it. Further proof that code design is frequently more an art that a science.

### Summary

I think the code here is more reasonable than my work on the Bob problem. There’s less cleverness for cleverness’ sake. `Hamming` works easily with `Strand` objects, but can work with other objects. The code is simple enough to understand. I can see this implementation being one that I’d be happy to work with.

Possible dowsides are that the array manipulation will make this slow. I’m sure there’s a much better algorithm out there for figuring out the hamming distance. But Exercism isn’t about performance tuning, so I’m not going to worry about that.