The Problems of Data Hiding
I was reminded of this yesterday, when I was asked to reconsider sealing a bunch of SlimDX classes. (SlimDX objects were all previously sealed if there was no further inheritance in the codebase.) I looked at a couple of MSDN pages I was pointed to, and decided to unseal, with no particular objections or concerns. There was no particular reason to seal the classes in the first place, except for a vague idea of "well, these are the most derived classes that show up in DX". It's worth noting that you can safely inherit from DirectX classes, and COM classes in general, if you are careful about it, despite what I may or may not have said some years ago.
Sealing classes is basically a form of data hiding; it prevents people from accessing your protected members. Most people are familiar with scope limitations (private, protected, internal, package, etc.) as a technique for data hiding. There are other techniques as well, like interfaces that selectively expose members via function calls (C APIs, COM, and other interop systems work like this). Sometimes you can provide a public declaration of the object that simply doesn't tell the user what's inside it, usually called an opaque type. (This one is quite common in C.)
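As a rough sketch of that last technique, here is what an opaque type can look like in C. The Counter type and its functions are made up for illustration, and the header and source file are shown together in one listing; in a real project the first half would sit in a public header and the second half in a private .c file.

```c
#include <assert.h>
#include <stdlib.h>

/* --- Would live in a public header (hypothetical counter.h) --- */
typedef struct Counter Counter;      /* incomplete type: clients never see the layout */
Counter *Counter_create(int start);
void     Counter_increment(Counter *c);
int      Counter_value(const Counter *c);
void     Counter_destroy(Counter *c);

/* --- Would live in a private source file (hypothetical counter.c) --- */
struct Counter {
    int value;
};

Counter *Counter_create(int start) {
    Counter *c = malloc(sizeof *c);
    if (c != NULL) {
        c->value = start;
    }
    return c;   /* NULL on allocation failure */
}

void Counter_increment(Counter *c) {
    assert(c != NULL);
    c->value += 1;
}

int Counter_value(const Counter *c) {
    assert(c != NULL);
    return c->value;
}

void Counter_destroy(Counter *c) {
    free(c);
}
```

Client code that only includes the header can hold a Counter pointer and call these functions, but it can't name the value member because the compiler has never seen the struct body.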
So what is data hiding? Why do we use it? Most people are aware that it relates to encapsulation, somehow. Some people think it is encapsulation. (Those are the people who end up with pass-through getters and setters for no reason other than that it provides 'encapsulation'.) I disagree with that view. Encapsulation is not information hiding. As for the why, most people will tell you that you do it so that people can't mess up your internal data and therefore break your code. The problem is that doesn't work. Breaking through the barrier is usually pretty easy, and I know how to do it with minimal effort in C, C++, and .NET. And if this is the case, why would you disallow reading of the internal data?
It should be clear at this point that data hiding isn't about preventing people from modifying internal values; it's just about discouraging it and pointing them toward the proper interfaces. There was a question posed in the forums recently about how you would implement private in C. (Someone please find this thread and post it in comments.) The answer is not clever syntactical tricks. It's much simpler than that: a comment and a naming convention.
    struct SomeData
    {
        int publicValue1;
        float publicValue2;

        //internal use only
        int m_privateValue1;
        float m_privateValue2;
    };
What about encapsulation, then? The point of encapsulation is to hide implementation details -- details which might change. A 3D vector will always have X, Y, and Z members, so there's no meaningful encapsulation for one. If you make the members private and give it get and set functions, you're just wasting your time and creating more code to maintain. It's data hiding, but it's not encapsulation! Data hiding merely provides a support mechanism which can be used to assist with encapsulation. Sadly, it seems that most introductory texts and courses don't explain this or even suggest it.
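Here is a minimal sketch of the 3D vector case in C. Both types and the demo function are hypothetical names picked for the example. WrappedVec3 stands in for the private-members-plus-accessors style, with the m_ prefix playing the role of private.

```c
#include <assert.h>

/* Plain data: the members are the interface. */
typedef struct { float x, y, z; } Vec3;

/* Data hiding without encapsulation: every accessor is a pass-through. */
typedef struct { float m_x, m_y, m_z; } WrappedVec3;

float WrappedVec3_getX(const WrappedVec3 *v) { return v->m_x; }
void  WrappedVec3_setX(WrappedVec3 *v, float x) { v->m_x = x; }
/* ...plus four more functions just like these for y and z. */

/* Both styles store and return exactly the same float. */
void vec3_styles_demo(float x) {
    Vec3 plain = { 0.0f, 0.0f, 0.0f };
    WrappedVec3 wrapped = { 0.0f, 0.0f, 0.0f };

    plain.x = x;
    WrappedVec3_setX(&wrapped, x);

    assert(plain.x == WrappedVec3_getX(&wrapped));
}
```

The accessors hide nothing that could ever change, so all they add is six functions to write, test, and maintain.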
So why were we sealing classes? To provide encapsulation? The composition of most SlimDX classes is a pointer to a DX object and nothing more. That will never change. Hiding that stuff away is not useful, and in fact we're in the process of exposing that to the world via a CLS-friendly IntPtr member. The tricky part, then, is knowing what should actually be encapsulated. This is a tough one, and there are a couple of questions to look at:
* What could a client possibly want this data for?
* Is there any harm in letting the client have this data?
* Is it likely that this data will vanish or change substantially in the future?
* Is this going to pollute reflection tools like IntelliSense and property grids? (This last one is very new-age.)
Sadly, there's no definitive answer. Personally, I tend to favor not encapsulating. My reasoning is that, for most users, failing to expose something is far more problematic than exposing it unnecessarily, and hostile or abusive client code WILL get its hands on your data if it's determined to. (Just read Raymond Chen's blog for a while if you want confirmation of that.) Take a look at .NET's System.Collections.Generic.List<T>. It's a straightforward resizable array. The mistake they made was to not expose the internal array directly, at least as a read-only property. The only danger in doing so is that you could skip the bounds check against the list size, and you'd only be subject to the capacity limitation. But who cares? It's not like that's even dangerous. And if someone wants your internal array, it's reasonable to assume they have a good reason. For example, they might want to pin it and submit it to SlimDX for use as buffer contents. People do this regardless; they just end up using reflection to penetrate the scope shield. So what was accomplished by the data hiding here? Nothing at all. Not only that, MS can't even change the internal name of the member anymore without breaking that code.
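For illustration, here is roughly what "just expose the internal array" looks like, sketched in C rather than .NET. IntList and sum_buffer are hypothetical; this is not how List<T> is implemented, and sum_buffer merely stands in for any API that wants a raw contiguous buffer.

```c
#include <assert.h>
#include <stdlib.h>

/* A straightforward resizable array with nothing hidden. */
typedef struct {
    int   *items;      /* internal array, deliberately left visible */
    size_t count;      /* number of elements in use */
    size_t capacity;   /* number of elements allocated */
} IntList;

/* Appends a value, growing the array when full. Returns 1 on success, 0 on failure. */
int IntList_add(IntList *list, int value) {
    assert(list != NULL);
    if (list->count == list->capacity) {
        size_t newCapacity = list->capacity ? list->capacity * 2 : 4;
        int *grown = realloc(list->items, newCapacity * sizeof *list->items);
        if (grown == NULL) {
            return 0;
        }
        list->items = grown;
        list->capacity = newCapacity;
    }
    list->items[list->count++] = value;
    return 1;
}

void IntList_free(IntList *list) {
    assert(list != NULL);
    free(list->items);
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}

/* Stand-in for an external API that consumes a contiguous buffer. */
long sum_buffer(const int *data, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

A caller can pass list.items and list.count straight to sum_buffer with no copy and no tricks. The worst it can do is read between count and capacity, which is exactly the "danger" described above.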
In short, please take a moment to think about why you're hiding your class data, and if it's really doing any good, or just making everyone's life harder. Think about it real hard if you're using the 'friend' keyword in C++. You might just be wasting your time.