Identity of the results of Ruby conversion methods
Ruby standard types define various conversion methods. For example,
Array#to_set makes a Set of an Array, and Set#to_a makes an Array of a Set.
Some conversion methods, however, don’t seem so useful at first sight, such
as Array#to_a and Set#to_set. These “conversion” methods are useful because
they allow code to be written to operate on a specific type, Set for example,
while still accepting anything that supports being converted to a Set by
defining a #to_set method.
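As a minimal sketch of that idea, the helper below (shared_words is a hypothetical name, not something from the standard library) is written purely in terms of Set, yet accepts an Array, a Range, or a Set because each of them responds to #to_set.

```ruby
require 'set'

# Hypothetical helper: works on Sets internally, but accepts any
# argument that can be converted with #to_set.
def shared_words(left, right)
  left.to_set & right.to_set
end

from_array_and_set   = shared_words(%w[a b c], Set['b', 'c', 'd'])
from_range_and_array = shared_words('a'..'c', %w[c d])

puts from_array_and_set == Set['b', 'c']   # prints "true"
puts from_range_and_array == Set['c']      # prints "true"
```

The caller never has to convert anything up front; the conversion happens once, inside the method.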
What I am wondering is: do those “no-op” or “identity” conversion methods
create a new copy of the target or do they return the target itself? One way
to find out is to use the Object#object_id method. Because this method is
defined on the Object class, it is available on every single object. Its
return value is an integer which uniquely identifies its target. If two
objects have the same object ID, then they are one and the same. We say they
are identical. Two objects can be equal without being identical, however. For
example, [] == [] will be true because all empty arrays are equal, but
[].object_id == [].object_id will be false because these are two distinct
empty arrays which reside in two different locations in memory. Identical
objects, on the other hand, are equal in all but unusual cases (Float::NAN is
the classic exception), because an object normally compares equal to itself.
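The distinction can be sketched in a few lines. The variable names are only illustrative; the point is that == compares contents while object IDs compare identity.

```ruby
first  = []
second = []
alias_of_first = first # a second name for the same object, not a copy

equal_contents  = first == second
same_object     = first.object_id == second.object_id
alias_identical = first.object_id == alias_of_first.object_id

puts "equal: #{equal_contents}"             # prints "equal: true"
puts "identical: #{same_object}"            # prints "identical: false"
puts "alias identical: #{alias_identical}"  # prints "alias identical: true"
```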
With this out of the way, let’s get to testing.
irb> ary = []
=> []
irb> ary.object_id
=> 373620
irb> ary.to_a.object_id
=> 373620
Now this shows us that Array#to_a just returns the array without making a
copy. Let’s also check Set#to_set.
irb> set = Set[]
=> #<Set: {}>
irb> set.object_id
=> 405360
irb> set.to_set.object_id
=> 405360
This means Set#to_set also just returns the set without copying it. This is a
reasonable optimization, but it has consequences one should be aware of.
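For contrast, here is a small script that runs the same object ID check on all four combinations of Array and Set. It is only a sketch: the labels in the hash are illustrative, and it assumes plain Array and Set instances rather than subclasses.

```ruby
require 'set'

ary = [1, 2, 3]
set = Set[1, 2, 3]

# true means the conversion handed back the receiver itself.
returns_receiver = {
  'Array#to_a'   => ary.to_a.object_id == ary.object_id,
  'Set#to_set'   => set.to_set.object_id == set.object_id,
  'Array#to_set' => ary.to_set.object_id == ary.object_id,
  'Set#to_a'     => set.to_a.object_id == set.object_id
}

returns_receiver.each do |name, identical|
  puts "#{name}: #{identical}"
end
```

Only the two same-type conversions report true; the cross-type conversions have to build a new object, so they always hand back something the caller can mutate freely.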
Let’s define a method that will accept any kind of collection and count how many items are three-letter words.
def three_letters_word_count(collection)
  ary = collection.to_a
  ary.select! { |word| word.length == 3 }
  ary.count
end
This certainly isn’t the nicest implementation. We could just call #count
with a block that does the filtering. But this is for illustration purposes
only. Let’s test this method.
irb> ary = %w[the quick brown fox jumps over the lazy dog]
=> ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
irb> three_letters_word_count ary
=> 4
irb> set = Set['foo', 'bar', 'quux']
=> #<Set: {"foo", "bar", "quux"}>
irb> three_letters_word_count set
=> 3
It all looks quite reasonable until we take another look at our set and our array after the fact.
irb> ary
=> ["the", "fox", "the", "dog"]
irb> set
=> #<Set: {"foo", "bar", "quux"}>
The set is fine, but the array got mutated! This in itself is not surprising,
as we know that Array#to_a did not perform a copy and the implementation of
our method mutates the array. What is surprising is that this behavior depends
on the type of the argument, since any other type of collection will get
copied into a new array which the method can safely mutate.
This means using to_a or to_set isn’t enough if you’re planning on mutating a
copy of your argument. You also need to dup it if the type is already
correct. Or you could dup only if the result of to_a is identical to the
argument, which can be checked with Object#equal?, which is equivalent to
checking equality of the object IDs.
ary = collection.to_a
ary = ary.dup if ary.equal? collection
# Mutating ary is safe here.
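Put together, the counting method from earlier might look like the sketch below. The second method, three_letters_word_count_without_mutation, is a hypothetical name for the #count-with-a-block alternative mentioned above; it assumes the argument is Enumerable rather than merely convertible with to_a.

```ruby
require 'set'

# Same method as before, with the guard added: dup only when to_a
# handed back the argument itself.
def three_letters_word_count(collection)
  ary = collection.to_a
  ary = ary.dup if ary.equal?(collection)
  ary.select! { |word| word.length == 3 } # mutates the private copy only
  ary.count
end

# Hypothetical alternative: no intermediate array, nothing to mutate.
def three_letters_word_count_without_mutation(collection)
  collection.count { |word| word.length == 3 }
end

words = %w[the quick brown fox jumps over the lazy dog]
word_set = Set['foo', 'bar', 'quux']

puts three_letters_word_count(words)                       # prints 4
puts three_letters_word_count(word_set)                    # prints 2
puts three_letters_word_count_without_mutation(word_set)   # prints 2
puts words.length                                          # prints 9
```

The guard costs one shallow copy for arrays and nothing for other collections, which already get a fresh array from to_a. Note that dup is shallow: the strings inside the copy are still shared with the caller's array, which is fine here because only the array itself is modified.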