Victor Schubert’s personal page

Identity of the results of Ruby conversion methods

Ruby standard types define various conversion methods. For example, Array#to_set makes a Set out of an Array, and Set#to_a makes an Array out of a Set. Some conversion methods, however, don’t seem so useful at first sight, such as Array#to_a and Set#to_set. These “conversion” methods are useful because they allow code to be written to operate on a specific type, Set for example, while still accepting anything that supports being converted to a Set by defining a #to_set method.
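As a sketch of that pattern, here is a method that calls #to_set on whatever it receives and then works purely with a Set. The name distinct_count is hypothetical and not part of the standard library; Array and Range both pick up #to_set from Enumerable.

```ruby
require 'set'

# Hypothetical helper: counts the distinct elements of anything that responds to #to_set.
def distinct_count(collection)
  collection.to_set.size
end

distinct_count([1, 2, 2, 3])  # => 3, an Array converted via #to_set
distinct_count(Set[1, 2])     # => 2, already a Set
distinct_count(1..4)          # => 4, a Range is Enumerable too
```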

What I am wondering is: do those “no-op” or “identity” conversion methods create a new copy of the target, or do they return the target itself? One way to find out is to use the Object#object_id method. Because this method is defined on the Object class, it is available on every single object. Its return value is an integer which uniquely identifies its target among live objects. If two objects have the same object ID, then they are one and the same. We say they are identical. Two objects can be equal without being identical, however. For example, [] == [] will be true because all empty arrays are equal, but [].object_id == [].object_id will be false because these are two distinct empty arrays which reside in two different locations in memory. Identical objects, on the other hand, are normally also equal, because an object is normally equal to itself.
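The difference between equality and identity can be checked directly. This sketch uses Object#equal?, which compares identity, next to ==, which compares values; the Float::NAN lines show that identity implies equality only as long as == is not defined otherwise.

```ruby
a = []
b = []

a == b                      # => true: two empty arrays are equal
a.equal?(b)                 # => false: they are two distinct objects
a.object_id == b.object_id  # => false, the same answer as equal?
a.equal?(a)                 # => true: an object is identical to itself

# == can be defined so that an object is not equal to itself; NaN is a built-in case.
nan = Float::NAN
nan.equal?(nan)             # => true
nan == nan                  # => false
```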

With this out of the way, let’s get to testing.

irb> ary = []
=> []
irb> ary.object_id
=> 373620
irb> ary.to_a.object_id
=> 373620

Now this shows us that Array#to_a just returns the array without making a copy. Let’s also check Set#to_set.

irb> set = Set[]
=> #<Set: {}>
irb> set.object_id
=> 405360
irb> set.to_set.object_id
=> 405360

This means Set#to_set also just returns the set without copying it. This is a reasonable optimization, but it has consequences one should be aware of.
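The same checks can be written as a standalone script, using Object#equal? instead of comparing object IDs by hand. This is a sketch assuming a Ruby version whose Set#to_set returns its receiver, as the irb session shows; a conversion that does build a new object is included for contrast.

```ruby
require 'set'

ary = [1, 2, 3]
set = Set[1, 2, 3]

# "Identity" conversions hand back the receiver itself, not a copy.
ary.to_a.equal?(ary)    # => true
set.to_set.equal?(set)  # => true

# A real conversion builds a new object, so mutating it leaves the source alone.
copy = set.to_a
copy << 4
set.size                # => 3
```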

Let’s define a method that will accept any kind of collection and count how many items are three-letter words.

def three_letters_word_count(collection)
  ary = collection.to_a
  ary.select! { |word| word.length == 3 }
  ary.length
end

This certainly isn’t the nicest implementation. We could just call #count with a block that does the filtering. But this is for illustration purposes only. Let’s test this method.
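For comparison, the #count alternative could look like this sketch. The name three_letter_count is hypothetical, chosen so it does not clash with the method above; since Enumerable#count only reads the collection, nothing the caller passes in is modified.

```ruby
require 'set'

# Hypothetical non-mutating variant: Enumerable#count only reads the collection.
def three_letter_count(collection)
  collection.count { |word| word.length == 3 }
end

words = %w[the quick brown fox jumps over the lazy dog]
three_letter_count(words)                      # => 4
three_letter_count(Set['foo', 'bar', 'quux'])  # => 2
words.length                                   # => 9, nothing was removed
```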

irb> ary = %w[the quick brown fox jumps over the lazy dog]
=> ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
irb> three_letters_word_count ary
=> 4
irb> set = Set['foo', 'bar', 'quux']
=> #<Set: {"foo", "bar", "quux"}>
irb> three_letters_word_count set
=> 2

It all looks quite reasonable until we take another look at our set and our array after the fact.

irb> ary
=> ["the", "fox", "the", "dog"]
irb> set
=> #<Set: {"foo", "bar", "quux"}>

The set is fine, but the array got mutated! This in itself is not surprising: we know that Array#to_a did not perform a copy, and the implementation of our method mutates the array. What is surprising is that this behavior depends on the type of the argument, since other collections, such as a Set, get copied into a new array which the method can safely mutate.

This just means that calling to_a or to_set isn’t enough if you’re planning on mutating a copy of your argument. You also need to dup the result when the argument already has the target type. Alternatively, you can dup only when the result of to_a is identical to the argument, which can be checked with Object#equal?, the equivalent of comparing the two object IDs.

ary = collection.to_a
ary = ary.dup if ary.equal? collection
# Mutating ary is safe here.
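Put together, a mutation-safe version of the word-counting method might look like this sketch. It assumes the filtering step is done in place with Array#select!; the guard copies the array only when #to_a handed back the caller’s own object.

```ruby
require 'set'

# Copy only when to_a returned the argument itself, then mutate freely.
def three_letters_word_count(collection)
  ary = collection.to_a
  ary = ary.dup if ary.equal?(collection)
  ary.select! { |word| word.length == 3 }  # mutates our private array only
  ary.length
end

words = %w[the quick brown fox jumps over the lazy dog]
three_letters_word_count(words)  # => 4
words.length                     # => 9, the caller's array is untouched
```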