Hashes and encapsulation

Earlier today I made an off-the-cuff remark about encapsulation on twitter.

FYI: if you're accessing hash elements via obj.hashthings['foo'] rather than obj.hashthing('foo'), you're breaking encapsulation

Then Josh, not realising that I was simply casting judgement down from my ivory tower, asked for more than 140 characters worth of explanation:

@jonleighton FYI its nice if you explain a bit more :) blog post?

So, I agreed. Here is that explanation.

Suppose you have an object representing a DOM element. DOM elements have attributes, which you store as a hash. Something like this:

class DOMElement
  attr_reader :attributes

  def initialize
    # ...
    @attributes = Hash[parse_attributes.map { |name, value|
      [name, DOMAttribute.new(name, value)]
    }]
  end
end

In your code that uses DOMElement, you access an attribute like this:

element.attributes['margin-left']

If you try to access an attribute that does not exist, nil will be returned.

Later on, you realise that you are checking for nil in lots of places where you use attributes. Listening to the code smell, you decide that missing attributes should instead return a null object.

What do you do? Well, your only option is to add a default proc to the Hash:

@attributes = Hash[parse_attributes.map { |name, value|
  [name, DOMAttribute.new(name, value)]
}]
@attributes.default_proc = proc { NullDOMAttributes.new }

You have now made your object impossible to marshal without using insane hackery, since procs cannot be marshalled.

The more important point, though, is that by allowing users to dip into the @attributes hash like this, you are failing to encapsulate the internal implementation of your DOMElement object.

This presents another problem: an unrelated dodgy bit of code could remove an attribute from the element, without your DOMElement object realising!

el.attributes.delete('margin-left')

If you pass around the same hash between lots of different objects, that hash is quite likely to get mutated at some point in my experience, and when that happens it will affect all the other objects relying on it.

Yes, you could call @attributes.dup every time the attributes method is called, but that’s hardly very efficient when we only want to get at a single attribute.

If the code had been written in an encapsulated manner to begin with:

class DOMElement
  def initialize
    # ...
    @attributes = Hash[parse_attributes.map { |name, value|
      [name, DOMAttribute.new(name, value)]
    }]
  end

  def attributes
    @attributes.dup
  end

  def attribute(name)
    @attributes[name]
  end
end

It would be straightforward to add the null object without further consequences:

def attribute(name)
  @attributes.fetch(name) { NullDOMElement.new }
end

So, exposing internal hashes can be convenient, but you should always think twice before doing so and realise that you might regret the decision at a later date.

Comments

I'd love to hear from you here instead of on corporate social media platforms! You can also contact me privately.

Celestial M Weasel's avatar

Celestial M Weasel

Yesbutnobut, you could always define [] and []= for whatever class you made attributes an instance of.

Jon Leighton's avatar

Jon Leighton

Yesbutnobut, that would still be a demeter violation :)

Jon Leighton's avatar

Jon Leighton

The other point is that you may want #attributes to return a 'plain old hash' when called - and only want the null object when the user has specifically request a specific attribute.

Celestial M Weasel's avatar

Celestial M Weasel

Yes, indeed, but not encapsulation.

Fun fact - one of the 'advantages' (for some value of advantage) of FORTRAN over C was that array reference and function call both have syntax BADGER(I,J), for getting values, so under certain (rare) circumstances you can change from having an array to having a function without having to change code.

Celestial M Weasel's avatar

Celestial M Weasel

You could do that when overloading [] - after all you could always fabricate the 'plain old hash' from whatever you were storing.

Celestial M Weasel's avatar

Celestial M Weasel

I am agnostic about Demeter anyway. We have a grids (as in spreadsheet type grids) library which is badger.duck().hamster().wombat() up to the hilt. I find things like this aesthetically unpleasing but am not sure that the disadvantages outweigh the advantages. I will have to look at the code and see what I think.

Lucas Hungaro's avatar

Lucas Hungaro

It's not a "one size fits all" solution, but I like to use OpenStruct for value objects (while most people use hashes). A nice and underused Ruby class. :)

iande's avatar

iande

Favorite line: "Listening to the code smell ..."

Alex Kahn's avatar

Alex Kahn

How so? [] would just be a different name for the method you called attribute.

Jon Leighton's avatar

Jon Leighton

You're now doing two method calls. You're doing element.attributes.[]('margin-left'), rather than element.attribute('margin-left').

Alex Kahn's avatar

Alex Kahn

I misunderstood. I thought the commenter was proposing DOMElement#[], so you would access attribute values like: el['href'].

Bill Dueber's avatar

Bill Dueber

I don't understand this. When attributes is a Hash, you're calling Hash#[]. When attributes is another class that defines the method [], you're just calling MyOtherClass#[]. One method call in each case.

Jon Leighton's avatar

Jon Leighton

Yes, but I am comparing the case of when you want to get a *single* attribute from *element*.

element.attributes[foo] results in two chained method calls. There's DOMElement#attributes, and then Hash#[] foo.

element.attribute(foo) results in one method call, to DOMElement#attribute.

(Note that I am talking about external calls, of course DOMElement#attribute contains a method call to Hash#[], but that's an implementation detail and not the concern of the user of DOMElement.)

Avdi Grimm wrote some good articles on Demeter:

* http://avdi.org/devblog/201...
* http://avdi.org/devblog/201...

Anthony Navarre's avatar

Anthony Navarre

The practical disadvantage of that kind of chaining (besides being difficult to write automated tests for it) is that when something breaks in one of those systems, it can become excruciating to track it down.

Avdi Grimm's avatar

Avdi Grimm

Indeed. There's a larger point here, which is that having objects expose their aggregations to the world directly--any type of aggregation--breaks encapsulation. If you're going to treat objects as objects rather than as glorified C structs, they need to be empowered to manage their own collections and control access to them.

Timon Vonk's avatar

Timon Vonk

Nice article! In cases like these I usually prefer dynamic methods on the attributes (e.g. attributes.keys.each &define_method), given the implementation is usually common. I guess in most cases attributes are part of the public interface anyway, so that's ok*.

* As long as it's readable -.-

Luke Redpath's avatar

Luke Redpath

What's wrong with defining #[] on DOMElement, so attributes can be fetched thus: el["some-attribute"]. Seems reasonable to me. I'm pretty sure this is what Alex meant.

Jon Leighton's avatar

Jon Leighton

Ok, in which case it's fine; it doesn't matter if you name the method #attribute or #[].

Samuel Cochran's avatar

Samuel Cochran

In this (increasingly contrived) example you could presume that a DOMElement exists mainly to manipulate attributes, and add #[], #[]= and #delete directly to DOMElement to enable encapsulated access to the attributes.

The better approach here would be to create a DOMAttributes object (lazily constructed by #attributes) which encapsulates this for you and mutates DOMElement when necessary. Mix in as much hash behaviour as makes sense, or add a #to_hash to get a copy of the attributes as a hash if neccessary.

All this being the programmer answer of jumping to solutions, you're right that this is a problem that occurs far too often mainly out of ignorance.

Alexey Muranov's avatar

Alexey Muranov

I am disappointed that `nil` does not qualify as a null object. But i do not like either checking the presence of a key by the "truth" of the return value, i prefer `has_key?` or at least `nil?` on the value.

skrat's avatar

skrat

Very nice! One note though. I wasn't familiar with null-object approach before, and after reading the linked article, it doesn't like such a good idea. In your case, hiding the fact that the attribute wasn't found, and pushing this fuzziness down the stack, can, and will cause issues in unexpected places. Being a polyglot, the common approach, the language supported approach would be to raise KeyError.

Emili Parreño's avatar

Emili Parreño

As mentioned above, why don't you use an OpenStruct object?

irb$ require 'ostruct'
=> true
irb$ person = OpenStruct.new(:name => "John", :lastname => "Smith")
irb$ person.name
=> "John"
irb$ person.phone
=> nil

Add your comment