Tuesday, November 22, 2011

Data privacy in Java

Encapsulation is one of the most powerful concepts in object-oriented programming. It is a technique in which member fields (the data that represents an object's state) are declared private and exposed only through accessor methods.

Encapsulation provides a shield that regulates data access through getter and setter methods, so that the data stays tightly associated with the code around it and is protected from cluttered, arbitrary access by external code - that is, code that does not belong to the class declaring the member field.

But to what degree does Java actually provide this protection? And in what context should this notion of data privacy be understood? Let us examine different scenarios.

1. When a member field is declared private and flanked by public getter and setter methods, external code can do everything to the field that an internal method can: the field can be read, written, and even purged (set to null). In short, a private field with a plain getter-setter pair is as good as a public field in every execution aspect; only the title bestowed by the language differs.

2. When a member field is declared private and covered by a public getter but no setter, external code can still perform most of the actions an internal method can. The field can be read. If the field refers to a user-defined, mutable object, external code can do to that object everything the container object can, except purge (nullify) the field itself. Because the getter returns a copy of the field's reference (not a copy of the object), the returned reference points to the same data on the heap. The component reference (the private field residing in the container) and the value returned by the getter are therefore equivalent with respect to permitted actions, and the underlying data can be modified from outside code as well. There are two exceptions: i) a private primitive field cannot be modified, because the getter returns a copy of the value, not a reference; ii) the component reference cannot be nullified, because the getter returned a copy of the reference, and nullifying that copy leaves the original untouched.
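
The second scenario can be sketched roughly as follows. The `Roster` class and its fields are hypothetical names made up for this illustration; the point is only that the list obtained from the getter is the same heap object the private field refers to, while the primitive and the reference itself are handed out as copies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container class, used only for illustration.
class Roster {
    private final List<String> names = new ArrayList<>();
    private int capacity = 10;

    Roster() {
        names.add("alice");
    }

    // Returns a copy of the reference, not a copy of the list.
    public List<String> getNames() {
        return names;
    }

    // Returns a copy of the primitive value.
    public int getCapacity() {
        return capacity;
    }
}

public class GetterLeakDemo {
    public static void main(String[] args) {
        Roster roster = new Roster();

        List<String> view = roster.getNames();
        view.add("mallory");                    // mutates the list behind the private field
        System.out.println(roster.getNames());  // prints [alice, mallory]

        view = null;                            // clears only the local copy of the reference
        System.out.println(roster.getNames() != null);  // prints true

        int c = roster.getCapacity();
        c = 99;                                 // changes only the local copy of the value
        System.out.println(roster.getCapacity());       // prints 10
    }
}
```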

In short, in both these cases, if the programmer wants to prevent outside code from modifying the object behind a private field, the getter has to return a cloned copy of that object rather than the original reference.
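
A minimal sketch of such a copying getter is shown here, assuming a hypothetical `Meeting` class invented for this example. Note that `new ArrayList<>(...)` makes a shallow copy: it is sufficient here only because the elements are immutable `String` objects, and a list of mutable elements would need each element copied as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

// Hypothetical class, used only to illustrate copying getters.
class Meeting {
    private final Date start;
    private final List<String> attendees;

    Meeting(Date start, List<String> attendees) {
        // Copy on the way in, so the caller's objects are not shared either.
        this.start = new Date(start.getTime());
        this.attendees = new ArrayList<>(attendees);
    }

    // Returns a fresh Date; changing it does not touch the private field.
    public Date getStart() {
        return new Date(start.getTime());
    }

    // Returns a shallow copy of the list (elements are immutable Strings).
    public List<String> getAttendees() {
        return new ArrayList<>(attendees);
    }
}

public class DefensiveCopyDemo {
    public static void main(String[] args) {
        Meeting meeting = new Meeting(new Date(1000L), Arrays.asList("alice"));

        meeting.getStart().setTime(0L);          // modifies the returned copy only
        meeting.getAttendees().add("mallory");   // modifies the returned copy only

        System.out.println(meeting.getStart().getTime());  // prints 1000
        System.out.println(meeting.getAttendees());        // prints [alice]
    }
}
```

The cost of this approach is an extra allocation on every getter call, which is the price of keeping the private object out of reach of ordinary (non-reflective) code.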

3. When the field is private and has no getter or setter methods, the object appears to be hidden from outside code, but it is not. Given an object, all of its components can be accessed and modified, including being purged or cleared, from outside Java code by using the reflection APIs. In addition, there is an undocumented, unsupported API (sun.misc.Unsafe) through which any object reference, any object, any part of the Java heap and any part of the process address space can be reached, with full read-write capability. All of this is done programmatically, within the semantics of the language. This kind of access can be restricted with a custom security manager, but that comes with side effects.
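
The reflection route can be sketched as below. `Vault`, `peek` and `poke` are hypothetical names introduced for this example; `getDeclaredField`, `setAccessible`, `get` and `set` are the standard `java.lang.reflect` calls. The sketch assumes both classes live in the same unnamed module and that no security manager is installed; otherwise `setAccessible(true)` may be refused.

```java
import java.lang.reflect.Field;

// Hypothetical class with a private field and no getter or setter.
class Vault {
    private String secret = "initial";

    public String describe() {
        return "secret=" + secret;
    }
}

public class ReflectionDemo {
    // Reads a private field by name (helper invented for this sketch).
    static Object peek(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);   // suppress the language access check
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Overwrites a private field by name (helper invented for this sketch).
    static void poke(Object target, String fieldName, Object value) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            field.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Vault vault = new Vault();

        System.out.println(peek(vault, "secret"));  // prints initial

        poke(vault, "secret", "overwritten");
        System.out.println(vault.describe());       // prints secret=overwritten

        poke(vault, "secret", null);                // purge the private reference
        System.out.println(vault.describe());       // prints secret=null
    }
}
```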

In C++, an object or primitive field returned by value from a getter method is a copy of the original, so external code cannot modify the shielded object, which adheres strictly to the data privacy documented in the language. Only when the getter returns a pointer or reference to the object, instead of the object itself, is the 'intrusion' possible. The Java limitation explained above stems from its pointer-less design, which denies the program flexible control over how objects are allocated, managed and passed around.

Having said that, it is important to understand data privacy in Java as a means of designing modular code for better maintainability, and as a mild endorsement by the compiler for fending off cluttered and unsolicited data access in the code. Encapsulation cannot be used as a means of achieving data security in an application. Reusable code modules that take part in enterprise business applications should not rely on this language feature to preserve the desired security; instead, the data has to be secured through custom means specific to the application.

1 comment:

  1. When I learnt the Reflection API, I was shocked too, as I was able to peek into another class instance and view even its private variables.

    Gireesh, I am just curious to know how an entire object is returned in C++, because we can return either data that fits in a register or a pointer (which is also the size of a register) that points to much bigger data. But how are objects, or even a long long (which is 64 bits), returned on 32-bit systems?

    I have been returning objects and 64-bit types on the x86 architecture all these days, but never checked the assembly code generated from the C++ code to understand how this works :)