Sunday, May 10, 2015

Kernel abstraction in Java


This is what I think of JNI in Java.

A managed runtime which implements a virtual machine has to abstract two types of programming needs: those in the user space and those in the kernel space. While the programming needs in the user space, such as assignments, arithmetic operations and method invocations, are abstracted into well-defined bytecodes which theoretically run on a stack-based virtual machine, abstracting the programming needs in the kernel space is not as easy and straightforward, and requires special treatment.

Kernel services are exposed to the user world through APIs called system calls, and different platforms expose different sets of these calls, which vary in their nomenclature, their input, the service they provide, and the side effects they cause in the execution environment.

There are three approaches to abstracting the programming needs in the kernel space:

1. Define one bytecode for each system call on each platform. This approach has many drawbacks: i) the number of bytecodes will bloat up beyond a maintainable limit, ii) it will cause the size of the bytecode to exceed one byte, iii) the programmer would need precise knowledge of the underlying system, and iv) the program will no longer be platform independent, as it will contain platform-specific conditionals and considerations.

2. Define one bytecode (such as invokenative) for all the system calls on all the platforms. This has the last two disadvantages mentioned above - platform independence is lost, and the programmer has to keep track of the platform differences and code accordingly. There are also other logistical issues, such as the arrival of new bytecode combinations (invokenativevirtual, invokenativestatic etc., based on the access type of the native method).

3. Abstract all the kernels and define generic APIs which meet the system needs of the program (the existing approach), managing the platform-dependent details from within. This includes mapping each high-level API to one or more system calls, preparing their input, and managing the call dispatch. It is not possible to do these procedures in the interpreter, as these activities differ from method to method. Moreover, it is tiresome to identify custom invocation sequences for each native call based on the name and signature of the generic API under execution. At this point it makes sense to have custom native wrappers (programs which compile into native code and run in an un-managed runtime such as the machine itself) around the system (or library) calls, and to define a protocol for invoking these wrappers from the Java APIs. All of the (operating) system abstractions in the JRE (file system, console, networking, graphics, process management, threading, synchronization, etc.) have native interfaces wrapping around their system counterparts, and manage the service invocation between the system layer and the Java layer.

This protocol, used to communicate between the Java APIs and their native back-ends which interface with the underlying system, is called the JNI protocol. JNI is a necessity for the Java language and the virtual machine to achieve kernel abstraction and thereby platform neutrality. As a matter of fact, not all system calls are abstracted in this way. There are also scenarios where a programmer needs to perform tasks natively, or to avail of computation services from pre-existing native libraries. So it makes perfect sense to open up JNI as a language feature for hybrid programming.
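As an illustration, here is a minimal sketch of the Java side of such a wrapper. The class, the library name and the native routine are hypothetical, but the JRE's own abstractions (file system, networking and so on) follow the same pattern: a Java API backed by a native method, with the C implementation issuing the actual system call.

// Hypothetical wrapper; the real JRE classes (e.g. java.io.FileInputStream)
// are structured along the same lines.
public class NativeFile {
    static {
        // Loads libnativefile.so / nativefile.dll, which holds the compiled
        // C implementations of the native methods declared below.
        System.loadLibrary("nativefile");
    }

    // Declared in Java, implemented in C; the JVM dispatches this call through
    // the JNI protocol to the native wrapper, which in turn issues the
    // platform's open() system call.
    private native long open0(String path, int flags);

    public long open(String path) {
        return open0(path, 0);
    }
}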

JNI is not necessarily a Java feature, rather an indispensable part of the design of the language runtime. A crucial internal capability which implements a subroutine linkage channel between the abstract and the real machine. The bedrock infrastructure which underpins the platform independent programming model, the by-product of which was exposed outside under the pretext of a language feature.

Friday, May 1, 2015

Function pointers in Java

This is what I think of Java interfaces:

Interfaces are defined as one or more methods with empty bodies grouped together, to represent an object's interaction with the outside world. This is required if you want to impose certain abstract behaviors on the subtypes from which the said abstraction is made. But then we have abstract classes which serve a similar purpose. Technically, abstract classes and interfaces differ in many aspects - in their inheritance models, in what types of variables and methods they contain, and so on - but there are no programming scenarios where an abstract class cannot meet the purpose of an interface. If the requirement is to impose abstract methods on subtypes, abstract classes will anyway help. If the requirement is to exhibit runtime polymorphism, normal virtual methods will help. If the requirement is something else, there are different semantics elsewhere in the language. In short, there exist no programming scenarios where both abstract classes and interfaces are discretely required.
Except for simulating function pointers and callbacks.

In (traditionally structural) programming, there are numerous cases where one needs to pass a function pointer as an argument to a method call, essentially to register a callback. Classical examples are: i) custom plug-ins, which get embedded and consumed in a program after the executable is built. As the plug-in library is built separately, and possibly chronologically later than the executable, the main program does not have any information about the plug-ins and their routines. A contract is made wherein the main program defines the skeletons of the methods (function pointers which point to NULL), the plug-ins implement these routines, and at run time, after loading the plug-in library, the implemented methods are assigned to these pointers. At appropriate times, the main program invokes the routines through the pointers, and because of the aforesaid binding, they get dispatched to the right methods. ii) a re-usable library which intercepts certain asynchronous events and wants to pass them back to the application. The application calls a register method in the library and passes the callback function as an argument. The library routine assigns this function to a pre-defined function pointer. When the library intercepts the event, the callback is invoked through the pointer. In either of the examples, the module which pre-defines the function pointer has no knowledge about the future assignments, and has no means to provide a default implementation - and hence they are always created as null pointers, the prototypes (primitive forms) of pure virtual methods.

Without pointers, this mechanism is not feasible in Java. At the same time, callbacks are powerful features without which a programming language cannot be deemed complete. Java has a number of scenarios where this is essential. A couple of examples are: i) a thread creation scenario wherein one needs to pass the entry point method where the newly created thread should start its execution, and ii) GUI scenarios wherein one needs to pass the callbacks pertinent to handling the various events relevant to the graphics being rendered. In either case, a default implementation and a default behavior seem neither feasible nor sensible. So it makes perfect sense to elevate abstract classes one level up in their abstraction, and designate them as interfaces. If you rip open an interface anatomically, what you get are one or more function pointers inside - essentially NULL, well crafted and isolated within an object. In places where function pointers need to be passed, one can pass the container object itself; the interceptor routine can cache this object and invoke the member method appropriately, and since the caller actually passes an object of a concrete class implementing this interface, the dispatch happens appropriately, and the callback is simulated with great effort, though in style.
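To make the anatomy concrete, here is a minimal sketch of the thread entry point case mentioned above. Runnable plays the role of the container object holding the function pointer, and the anonymous class is the concrete implementation to which the call is finally dispatched.

public class CallbackDemo {
    public static void main(String[] args) throws InterruptedException {
        // The interface reference is, anatomically, a wrapped function pointer.
        Runnable entryPoint = new Runnable() {
            @Override
            public void run() {
                System.out.println("Running in " + Thread.currentThread().getName());
            }
        };

        Thread worker = new Thread(entryPoint);   // "register" the callback
        worker.start();                           // the runtime dispatches to run()
        worker.join();
    }
}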

Interfaces are not a new Java feature; they are inevitabilities and imminences. Imperatives of a pointer-less language. An elegant way of creating, transporting and processing function pointers without ever talking about them.

Friday, December 16, 2011

Ahead of Time compilation in Java.

Banal initialization sequences have been a curse on the Java runtime. The constant amount of time it takes to boot up the JVM - creating and initializing the vital organs of the virtual machine along with loading the kernel classes (such as java/lang/*) - used to degrade the startup performance of enterprise applications at large, irrespective of the characteristics of the resident application.

While thinking of solutions to reduce startup time, an obvious thought is the dynamic compiler (the JIT), which can intelligently profile the Java bytecodes which get executed in this early phase of the JVM life-cycle, and perform native translation of those bytecodes specially, overriding the usual compilation qualifications and policies.

But this technique has its own drawback: when you go for optimally compiling the methods which take part in the startup, the compilation effort adds its own overhead to the startup, and the end result is worsened performance.

Reflect upon this new problem at hand and you get the next obvious solution - fall back to static compilation for these methods. One can fancy statically compiling the entire rt.jar (or core.jar) on the target host, before the JVM starts, and just using the compiled code at runtime, much in the same manner as statically compiled executables and libraries.

But static compilation does not fare well with Java: apart from losing the platform independence (which can be solved by compiling on the target host), several powerful optimizations such as virtual method inlining cannot be properly performed, because much of the information which a dynamic compiler can obtain at runtime to positively influence the optimization will be missing at static compilation time.

AoT is an attempt to address the startup delay in applications by compiling methods with an optimization plan in which all the statically computable information is utilized to the best possible extent, still providing better performance than interpreted bytecodes. It is especially useful in clustered environments, where methods compiled at the AoT level in one JVM can be shared and reused by other JVMs which come to life at a later point in time, boosting their startup drastically.

Tuesday, November 22, 2011

Data privacy in Java

Encapsulation is one of the powerful object oriented programming concepts. It is a technique by which member fields (the data representative of the object) are designated with private access, and harbored by accessor methods.

Encapsulation provides a shield which regulates data access through getter and setter methods, such that the data is tightly associated with the code around it and protected from cluttered and random accesses from external code - code which does not belong to the class that encloses the declaration of the member field.

But up to what degree is this protection provided in Java? And in which context does this notion of data privacy have to be perceived? Let us examine different scenarios.

1. When a member field is declared as private and flanked by public getter-setter methods, an external code entity can perform every action which an internal method is capable of doing on the field - the private field can be read from, written to, and even purged (nullified). In short, a private field with associated setter-getter methods is as good as a public field in all execution aspects, except in the title bestowed by the language.

2. When a member field is declared as private and covered by a public getter method but not a setter method, external code can perform 'most' of the actions which an internal method is capable of doing on the member: the field can be read from. If the field is a user-defined object, every action which the container object is capable of performing on it can be performed by external code as well, except purging (nullifying) the field. Since the getter method returns a copy of the field reference (not a copy of the object), the returned reference also points to the same data in the heap. This means the component reference (the private field residing in the container) and the returned value of the getter method are the same in all aspects with respect to the permitted actions, and the underlying data pointed to can be modified from outside code as well, as the sketch below illustrates. Two exceptions here are: i) private primitive fields cannot be modified, as the getter method returns a copy of the value, not a reference; ii) nullification of the component reference is not possible, as a copy of the reference is what was returned by the getter method, and nullifying it nullifies only the copy, not the original.
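A minimal sketch of this second case, with hypothetical Account and Address classes - the getter returns the field's reference, so external code can mutate the 'private' data, though it cannot nullify the container's field:

class Address {
    String city;
    Address(String city) { this.city = city; }
}

class Account {
    private Address address = new Address("Bangalore");
    public Address getAddress() { return address; }   // returns a copy of the reference
}

public class PrivacyDemo {
    public static void main(String[] args) {
        Account acc = new Account();
        Address a = acc.getAddress();
        a.city = "Chennai";          // modifies the data behind the private field
        a = null;                    // nullifies only the local copy of the reference
        System.out.println(acc.getAddress().city);   // prints "Chennai"
    }
}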

In short, in both these cases, if the programmer wants to restrict outside code from modifying a private field, the getter method has to clone out a copy of the field and return it.
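For instance, a defensive version of the hypothetical getter above would be:

class Account {
    private Address address = new Address("Bangalore");

    // Defensive getter: hand out a detached copy, not the internal reference.
    public Address getAddress() {
        return new Address(address.city);
    }
}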

3. When the field is private and there are no getter/setter methods, the object apparently is hidden from outside code - but it is not. Given an object, all of its components can be accessed and modified, including purging and cleansing, from outside Java code by making use of the reflection APIs. In addition, there is the undocumented Unsafe API, through which any object reference, any object, any part of the Java heap and any part of the process address space can be reached, with complete read-write capability. Programmatically, by adhering to the language semantics. This way of data access can be restricted using custom security managers, but they have side effects.
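A minimal sketch of this third case, assuming a hypothetical Secret class with a private field and no accessors - the reflection APIs read, rewrite and purge the field anyway:

import java.lang.reflect.Field;

class Secret {
    private String token = "original";
}

public class ReflectionDemo {
    public static void main(String[] args) throws Exception {
        Secret s = new Secret();
        Field f = Secret.class.getDeclaredField("token");
        f.setAccessible(true);            // defeats the private access check
        System.out.println(f.get(s));     // reads the private field
        f.set(s, "overwritten");          // modifies it
        f.set(s, null);                   // even purges it
    }
}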

In C++, an object or a primitive field returned through a getter method is a cloned copy of the original, so external code is incapable of modifying the shielded object, adhering strictly to the data privacy documented in the language. Only when the getter method returns an object pointer instead of the object itself is the 'intrusion' possible. The above limitation in Java is root-caused by the pointer-less design, which inhibits the program from flexible object allocation, administration and propagation.

Having said that, it is important to understand the notion of data privacy in Java as a means to design modularized code for better maintainability, and as a mild endorsement by the compiler for fending off cluttered and unsolicited data accesses in the code. Encapsulation cannot be used as a means to achieve data security in the application. Re-usable code modules which take part in enterprise business applications should not rely on this language-supported feature for preserving the desired security; instead, the data has to be secured through custom means specific to the application.

Wednesday, August 4, 2010

Hyper-virtual methods in Java.

Here is what I think about the virtual methods in java.

In Java, by design and specification, all the non-static, non-private, non-final, non-constructor methods are virtual. This means the selection of the method to be invoked at a call site depends on the actual (runtime) type of the invoker object (the receiver), rather than its declared type.
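A minimal sketch of this, with hypothetical Shape and Circle classes - the definition selected at the call site depends on the runtime type of the receiver, not on the declared type of the reference:

class Shape {
    void draw() { System.out.println("Shape"); }
}

class Circle extends Shape {
    @Override
    void draw() { System.out.println("Circle"); }
}

public class DispatchDemo {
    public static void main(String[] args) {
        Shape s = new Circle();   // declared type Shape, runtime type Circle
        s.draw();                 // prints "Circle" - selected by the runtime type
    }
}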

In the case of C++, this is true only when the invoker object is declared as a pointer (or reference) type and the method is declared explicitly as 'virtual'. If either of these is false, then the method is resolved (its definition identified and selected) always to the definition in the defining class of the declared (static) type of the invoker.

In contrast, in Java, since there are no pointers, there is no flexibility for methods to exhibit virtual and non-virtual behavior based on the mode of type declaration - there is only one way to cite objects, and that is through references. Moreover, in JRE implementations, a Java object loses its connection with the declaring type and gets associated with the defining class. At this point it is imperative that the virtual keyword be dropped and all the normal methods be designated as virtual.

But how often does a program really require the virtual property? Very rarely. What percentage of virtual methods exercise this feature in a meaningful manner? Less than 5%. Even in those cases where multiple subclasses are designed and methods redefined, an efficient programmer will go for an interface (or abstract class) for the base class, which means the base method is pure virtual (abstract), not virtual.

This precisely means that a normal, concrete Java method (designed to be virtual) actually utilizing its virtual-ness is the rarest of possibilities.

Implementing virtual methods is easy in JREs, but their presence makes the execution engine incapable of pre-linking the method call site, potentially slowing down performance. In practice, the method resolution has to wait until execution reaches the call site. Dynamic compilers devirtualize methods to an extent, by tracing the source of the invoker object in the neighborhood of the call site, but this does not really alleviate the problem, and adds its own additional computation overhead. One of the standing challenges of the JIT today is the inability to perform inter-procedural analysis and compress the code any further, owing to the extremely delayed method resolution. A powerful technique called ahead-of-time compilation is rendered ineffective because of the inability to resolve methods in advance.

The decision to qualify all the methods as virtual was not a well-thought-out design, but an unanticipated side effect. An accidental by-product, or an unexpected misfire, of the pointer-less design.

Object leaks in Java.

Here is what I think of Java parameter passing conventions.

At the programmer's level, Java is said to pass objects by reference and primitives by value. This means that for objects, what the callee receives is the heap address of the object; the object references themselves are actually passed by value. This also means Java saves some space and effort by not copying the entire object onto the subroutine linkage channel (for example, the stack memory).

By definition, pass by reference means 'a parameter passing convention where the lvalue of the actual parameter (argument) is assigned to the lvalue of the formal parameter.'

When passed by reference, the callee method can manipulate the original object's attributes, can invoke the methods of the object, and can re-new, re-assign and purge the components of a composite object thus passed. These operations affect the original reference of the caller, because we have only one object in the heap, pointed to by both of these references.

For destroying an object, the C++ way is to 'delete' the object, and the C way is to 'free' the pointer. If passed by reference or address, both these languages have the flexibility of cleaning the object or a structure from anywhere in the caller-callee chain. The invalidation of an object indirectly invalidates other references or pointers cached elsewhere in the stack locations, and trying to reuse those references or pointers results in a crash.

This is different in Java. Since there is no explicit freeing of objects, we rely on null assignment to the reference, which is the only way to force an object cleanup. Even after the callee nullifies an object, the object lives on through the caller's reference. This means that an object cannot be freed (or initiated for freeing) through an assignee reference while a peer reference is alive, and vice versa.
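A minimal sketch of this, with a hypothetical Buffer type - nullifying the callee's copy of the reference neither frees the object nor affects the caller's reference, while mutating the object is visible to both:

public class NullifyDemo {
    static class Buffer {
        byte[] data = new byte[1024];
    }

    static void discard(Buffer b) {
        b.data = new byte[0];   // mutation through the reference: visible to the caller
        b = null;               // nullifies only the callee's copy of the reference
    }

    public static void main(String[] args) {
        Buffer buf = new Buffer();
        discard(buf);
        System.out.println(buf.data.length);   // prints 0
        System.out.println(buf == null);       // prints false - the object lives on
    }
}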

This may be a conscious design to eliminate bad references and make sure that all object references are either null or a valid object's address. This is because, in garbage collection, the memory of unreferenced objects is not really freed back to the system; rather, it is kept in the internal free pool, still mapped into the process and accessible through stale references, and such a dangling pointer would actually cause more damage than a crash.

But then how does one clean up an unwanted Java object? Set your object reference to null and wait for a GC to occur? That might not work, because if there is a second reference held elsewhere in the stacks and registers, consciously or unknowingly, the object is not collected. Consequently, many of the objects the programmer has explicitly discarded will lie remnant in the heap until the last reference to the object also goes out of scope. That may happen sooner or later, or never.

Many memory leaks, including the infamous classloader leaks, can be attributed to this 'hidden and under-documented' behavior of Java. And this is the very reason we see more OutOfMemoryErrors than NullPointerExceptions.

Garbage generation in Java.

Here is what I think of java garbage collection:

In Java programs, the use of pointers is forbidden by virtue of a design strategy or a security policy. Without pointers, functions cannot access objects across stack frames, among many other limitations. The inability to pass objects to and from functions would limit the scope of a programming language at large. To remedy this, in Java, user-defined objects are inherently passed by address (termed a reference), in contrast to C and C++ where passing arguments by their addresses is a volitional choice.

Conventionally, when arguments are passed by value, what the callee receives is an isolated copy of the passed object. In C, when passed by address, the callee can manipulate the caller's arguments. In C++ the same applies, along with call by reference. User objects are normally created on the stack. In the case of producer functions, where the function generates and returns an object, the allocation has to be made in the heap (locally created objects cannot be returned from a function, as that would cause a dangling reference). Such cases are not that frequent, so one can manually free the object which was 'new'ed. The two modes of creating user objects are:

Class obj;                   => object created on the stack.
Class *obj = new Class();    => object in the heap, pointer on the stack.

In Java, without pointers, the language semantics do not allow the above flexibility and we have only one way to create objects - either everything on the stack or everything in the heap, not both. Creating all the objects on the stack is a bad choice, since objects whose life span is greater than that of the defining method would be destroyed when the frame is popped off on the function's return, essentially forbidding methods from returning generated objects and making Java an incomplete language. As a workaround, all the objects are created in the heap. Now, as a matter of fact, it is difficult for a programmer to delete all the objects he has 'new'ed - which are quite many, indeed most of them.
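A minimal Java counterpart of the snippet above, with a hypothetical producer method - every object lands on the heap, so a locally created object can safely outlive the frame that created it, at the cost of leaving the cleanup to the collector:

public class ProducerDemo {
    static StringBuilder makeGreeting(String name) {
        StringBuilder sb = new StringBuilder();   // allocated on the heap
        sb.append("Hello, ").append(name);
        return sb;                                // safe: survives the popped frame
    }

    public static void main(String[] args) {
        StringBuilder greeting = makeGreeting("world");
        System.out.println(greeting);
        // no explicit delete/free; the object becomes garbage once unreachable
    }
}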

Hence the garbage and hence the collector.

In a non-Java programming paradigm, it is like allocating memory at arbitrary heap locations, and later scanning the entire virtual memory to clean up the filth.

Garbage collection is not a Java feature. It is a compromise. A consequence of refraining from pointers. A skillful attempt to mend a defect. An unchecked Sun heredity and an unbridled software hypothesis which we carried and dragged all the way along.