The Problems of Data Hiding

Published September 26, 2007
Back to the technical stuff for now. I start at Day 1 on Monday, so you can probably guess what vein the posts for next week will be in.

I was reminded of this yesterday, when I was asked to reconsider sealing a bunch of SlimDX classes. (SlimDX objects were all previously sealed if there was no further inheritance in the codebase.) I looked at a couple of MSDN things I was pointed to, and decided to unseal, with no particular objections or concerns. There was no particular reason to seal the classes in the first place, except for a vague idea of "well, these are the most derived classes that show up in DX". It's worth noting that you can safely inherit DirectX classes and COM classes in general if you are careful about it, despite what I may or may not have said some years ago.
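For readers who think in C++ rather than .NET, here is a rough sketch of what sealing does. SlimDX uses the .NET notion of a sealed class; the closest standard C++ analog is the `final` specifier. The class names (`Texture`, `DebugTexture`, `SealedTexture`) and the `m_native` member are invented for illustration and are not SlimDX types.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical wrapper: a pointer to a native object and nothing more.
class Texture {
public:
    explicit Texture(void* native) : m_native(native) {}
    virtual ~Texture() = default;
protected:
    void* m_native;  // reachable from derived classes only
};

// Unsealed: a client can derive and reach the protected member.
class DebugTexture : public Texture {
public:
    using Texture::Texture;  // inherit the constructor
    void* nativePointer() const { return m_native; }
};

// Sealed: 'final' forbids any further derivation, so nobody outside
// this class can get at m_native through inheritance.
class SealedTexture final : public Texture {
public:
    using Texture::Texture;
};

// class Broken : public SealedTexture {};  // would not compile

static_assert(std::is_final<SealedTexture>::value, "SealedTexture is sealed");
static_assert(!std::is_final<Texture>::value, "Texture can be inherited");
```

The only thing the sealed variant takes away is the derived class's access to the protected member.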

Sealing classes is basically a form of data hiding; it prevents people from accessing your protected members. Most people are familiar with scope limitations (private, protected, internal, package, etc) as a technique for data hiding. There are other techniques as well, like interfaces that selectively expose members via function calls (C APIs, COM, and other interop systems work like this). Sometimes you can provide a public version of the object that simply doesn't tell the user what's inside it. (This one is quite common in C.)
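The "public version of the object that doesn't tell the user what's inside" usually looks something like the sketch below. It is written so that it compiles as C++, but the same shape works in plain C with malloc and free. The `Counter` type and the `counter_*` functions are made-up names, and the two halves would normally live in separate files.

```cpp
#include <cassert>

// ---- What would go in the public header ----
struct Counter;  // incomplete type: clients never see the fields

Counter* counter_create(int start);
void     counter_increment(Counter* c);
int      counter_value(const Counter* c);
void     counter_destroy(Counter* c);

// ---- What would go in the implementation file ----
struct Counter {
    int value;  // only code that sees this definition can touch it
};

Counter* counter_create(int start) { return new Counter{start}; }
void counter_increment(Counter* c) { ++c->value; }
int counter_value(const Counter* c) { return c->value; }
void counter_destroy(Counter* c) { delete c; }
```

Client code that only includes the header half can hold a `Counter*` and call the functions, but it cannot name `value` because the struct's layout is not visible to it.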

So what is data hiding? Why do we use it? Most people are aware that it relates to encapsulation, somehow. Some people think it is encapsulation. (Those are the people who end up with pass-through getters and setters for no reason other than that it provides 'encapsulation'.) I disagree with that view. Encapsulation is not information hiding. As for the why, most people will tell you that you do it so that people can't mess up your internal data and therefore break your code. The problem is that doesn't work. Breaking through the barrier is usually pretty easy, and I know how to do it with minimal effort in C, C++, and .NET. And if this is the case, why would you disallow reading of the internal data?

It should be clear at this point that data hiding isn't about preventing people from modifying internal values, it's just about discouraging it and suggesting going through the proper interfaces. There was a question posed in the forums recently about how you would implement private in C. (Someone please find this thread and post it in comments.) The answer is not clever syntactical tricks. It's much simpler than that.
struct SomeData
{
    int publicValue1;
    float publicValue2;

    //internal use only
    int m_privateValue1;
    float m_privateValue2;
};
That's all. Having language enforced scope is merely a convenience. As long as you settle on a convention, you can have it even in C.

What about encapsulation, then? The point of encapsulation is to hide implementation details -- details which might change. A 3D vector will always have X, Y, and Z members, so there's no meaningful encapsulation for one. If you make the members private and give it get and set functions, you're just wasting your time and creating more code to maintain. It's data hiding, but it's not encapsulation! Data hiding merely provides a support mechanism which can be used to assist with encapsulation. Sadly, it seems that most introductory texts and courses don't explain this or even suggest it.
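To make the distinction concrete, here is a minimal sketch with two types. `Vector3` has nothing worth encapsulating, so its members are simply public. `Rotation2D` is a hypothetical example of a type whose representation really could change (an angle today, perhaps a sine/cosine pair tomorrow), so hiding the member actually buys something. All names here are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

// The layout is the interface: a 3D vector will always have x, y, z.
struct Vector3 {
    float x, y, z;
};

inline float length(const Vector3& v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

constexpr float kPi = 3.14159265358979f;

// The stored form is a detail that might change, so it stays private.
class Rotation2D {
public:
    static Rotation2D fromDegrees(float degrees) {
        return Rotation2D(degrees * kPi / 180.0f);
    }

    float degrees() const { return m_radians * 180.0f / kPi; }

    // Rotate the point (x, y) counter-clockwise around the origin, in place.
    void apply(float& x, float& y) const {
        const float c = std::cos(m_radians);
        const float s = std::sin(m_radians);
        const float rx = x * c - y * s;
        const float ry = x * s + y * c;
        x = rx;
        y = ry;
    }

private:
    explicit Rotation2D(float radians) : m_radians(radians) {}
    float m_radians;  // could become a cached sin/cos pair without breaking clients
};
```

Wrapping `Vector3` in getters and setters would add code without protecting anything that can change; doing the same to `Rotation2D` protects callers from a change in representation.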

So why were we sealing classes? To provide encapsulation? The composition of most SlimDX classes is a pointer to a DX object and nothing more. That will never change. Hiding that stuff away is not useful, and in fact we're in the process of exposing that to the world via a CLS-friendly IntPtr member. The tricky part, then, is knowing what should actually be encapsulated. This is a tough one, and there are a couple of questions to look at:

* What could a client possibly want this data for?
* Is there any harm in letting the client have this data?
* Is it likely that this data will vanish or change substantially in the future?
* Is this going to pollute reflection tools like Intellisense and property grids? (This last one is very new-age.)

Sadly, there's no definitive answer. Personally, I tend to favor not encapsulating. My reasoning is that for most users, failing to expose something is far more problematic than exposing it unnecessarily, and hostile or abusive client code WILL get its hands on your data if it's determined to. (Just read Raymond Chen's blog for a while if you want confirmation of that.) Take a look at .NET's System.Collections.Generic.List<T>. It's a straightforward resizable array. The mistake they made was to not expose the internal array directly, at least as a read-only property. The only danger in doing so is that you could skip the bounds check against the list size, and you'd only be subject to the capacity limitation. But who cares? It's not like that's even dangerous. And if someone wants your internal array, it's reasonable to assume they have a good reason. For example, they might want to pin it and submit it to SlimDX for use as buffer contents. People do this anyway; they just end up using reflection to penetrate the scope shield and get at it. So what was accomplished by the data hiding here? Nothing at all. Not only that, MS can't even change the internal name of the member anymore.
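For comparison, C++'s std::vector does hand out its internal array: `data()` returns a pointer to the contiguous storage. The sketch below passes that pointer to a stand-in for a native API. `sumBuffer` and `submit` are hypothetical names; a real graphics call would take the place of `sumBuffer`. This illustrates the C++ container only and says nothing about how List<T> is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a native API that wants a raw pointer and a
// count, the way a buffer upload would.
float sumBuffer(const float* data, std::size_t count) {
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        total += data[i];
    }
    return total;
}

float submit(const std::vector<float>& vertices) {
    // data() exposes the internal array; size() is the valid range,
    // which may be smaller than capacity().
    return sumBuffer(vertices.data(), vertices.size());
}
```

Nothing stops the caller from reading past `size()` through that pointer, which is exactly the trade-off described above: the container trusts that whoever asks for the array has a reason.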

In short, please take a moment to think about why you're hiding your class data, and if it's really doing any good, or just making everyone's life harder. Think about it real hard if you're using the 'friend' keyword in C++. You might just be wasting your time.

Comments

_the_phantom_
I personally favour encapsulating by default (within sensible rules, so no Vector classes with 'setX') based on the idea that it is easier to give than to take away; once exposed it's hard to remove that exposure without breaking code, but if it's newly exposed there is less of a problem.. well, imo of course [smile]
September 26, 2007 04:49 PM
LachlanL
I agree with Phantom on this one.

I think that data-hiding/encapsulation/whatever provides a useful safety mechanism for anyone who would use your classes (including yourself). Sure, putting a comment "these are for internal use only" is one way of doing this, but people who use VS's auto-complete or similar IDE functionality to specify data members won't see that (or at least, they won't in your example. If you put the "this is private!" comment in front of all such members, you might get their attention, but then again, you might not. Also, you then remove your ability to comment your members).

Sure, if someone really wants to do something you didn't intend, they can probably find ways around (dunno if you can in C#, that's what I'm coding in atm), but these mechanisms are mainly there to stop people from accidentally abusing your code, aren't they?
September 27, 2007 12:15 AM
Promit
Quote: Original post by LachlanL
people who use VS's auto-complete or similar IDE functionality to specify data members won't see that (or at least, they won't in your example. If you put the "this is private!" comment in front of all such members, you might get their attention, but then again, you might not.
That's why the private members are prefixed. The comment is just a clarification, it's not necessary.
September 27, 2007 12:10 PM
EnlightDM
I do not agree. Data hiding and encapsulation are just features that API developers use to make life easier for the client. If .NET's List<T> doesn't work for you, use a different class. In C++ I can retrieve the pointer to the array with std::vector. So it's a matter of using the right class for the right job, not *forcing* the classes to do what you want, because that's where your problems begin...

How do you solve this, without encapsulation:
class D3DGraphics
{
protected:
    bool m_windowed;
public:
    // Changing the mode also has to refresh the device, so the setter does both.
    inline void setWindowed(bool b) { m_windowed = b; updateDevice(); }
    inline bool getWindowed() const { return m_windowed; }
    ...
};

By removing get/set and forcing the user to "remember" to call updateDevice(), you are making his life HARDER, not easier. It is so simple, too, and if you care about performance, inline functions like getWindowed() get compiled the same way as if m_windowed were declared public and accessed directly.

From the article you posted: "Encapsulation is not information hiding" (By Wm. Paul Rogers, JavaWorld.com)
Quote: Information hiding rule 1: Don't expose data items

Make all data items private and use getters and setters. Don't fool yourself into believing no harm will result from directly accessing an object's internal data items. Even if only you code against those internals, future vulnerability still exists. You can't predict when you might need to change the internal data's nature, and brittle coupling with client objects sounds unnerving when shattered.
November 05, 2007 07:14 AM