Sunday, May 10, 2015

Kernel abstraction in Java

This is what I think of JNI in Java.

A managed runtime that implements a virtual machine has to abstract, among many other things, two types of programming needs: those in the user space and those in the kernel space. While the programming needs in the user space, such as assignments, arithmetic operations, method invocations etc., are abstracted in well-defined bytecodes which theoretically run on a stack-based virtual machine, abstracting the programming needs in the kernel space is neither easy nor straightforward, and requires special treatment.

Kernel services are exposed to the user world through APIs called system calls, and different platforms expose a multitude of such calls, which differ in their nomenclature, their input, the service they provide, and the side effects they cause in the execution environment.

There are three approaches to abstracting the programming needs in the kernel space:

1. Define one bytecode for each system call in each platform. This approach has many drawbacks: i) the number of bytecodes will bloat beyond a maintainable limit, ii) the opcode will no longer fit in one byte, iii) the programmer needs precise knowledge of the underlying system, iv) the program will no longer be platform independent, as it will contain platform-specific conditionals and considerations.

2. Define one bytecode (such as invokenative) for all the system calls in all the platforms. This has the last two disadvantages mentioned above - platform independence is lost, and the programmer has to keep track of the platform differences and code accordingly. There are also other logistical issues, such as the need for new bytecode variants (such as invokenativevirtual, invokenativestatic etc., based on the access type of the native method).

3. Abstract all the kernels and define generic APIs which meet the system needs of the program (the existing approach), and manage the platform-dependent details from within. This includes mapping each high-level API to one or more system calls, preparing their input, and managing the call dispatch. It is not practical to do all this in the interpreter itself, as these activities differ from one method to another. Moreover, it is tiresome to identify a custom invocation sequence for each native call based on the name and signature of the generic API under execution. At this point it makes sense to have custom native wrappers (programs which compile into native code and run in an unmanaged runtime such as the machine itself) around the system (or library) calls, and to define a protocol for invoking these wrappers from the Java APIs. All of the (operating) system abstractions in the JRE (file system, console, networking, graphics, process management, threading, synchronization, etc.) have native interfaces wrapping around their system counterparts, which manage the service invocation between the system layer and the Java layer.
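To make the third approach a little more concrete, here is a minimal sketch of what the Java side of such a wrapper looks like. It assumes an OpenJDK-style class library in which `System.nanoTime` is declared `native` (other implementations may differ); the class `NativeBackend` and the method `hypotheticalGetPid` are made up for illustration and have no native library behind them.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class NativeBackend {
    // Hypothetical native method: the signature lives in Java, the body is
    // expected to come from a native library that this sketch never loads.
    private static native long hypotheticalGetPid();

    // Reports whether the named method is declared with the 'native' modifier.
    static boolean isNative(Class<?> owner, String name, Class<?>... params)
            throws NoSuchMethodException {
        Method m = owner.getDeclaredMethod(name, params);
        return Modifier.isNative(m.getModifiers());
    }

    // Calling a native method whose implementation was never linked fails
    // with UnsatisfiedLinkError, thrown on the Java side of the boundary.
    static boolean callFailsWithoutLibrary() {
        try {
            hypotheticalGetPid();
            return false;
        } catch (UnsatisfiedLinkError e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: true on OpenJDK, where the clock is a native back-end.
        System.out.println("System.nanoTime native? "
                + isNative(System.class, "nanoTime"));
        System.out.println("Own method native? "
                + isNative(NativeBackend.class, "hypotheticalGetPid"));
        System.out.println("Call fails without library? "
                + callFailsWithoutLibrary());
    }
}
```

The generic API (a plain Java method signature) is all the program sees; the platform-specific work sits behind the `native` keyword, and the runtime is responsible for linking the two.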

This protocol, which is used to communicate between the Java APIs and their native back-ends which interface with the underlying system, is called the JNI protocol. JNI is a necessity for the Java language and the virtual machine to achieve kernel abstraction and thereby platform neutrality. As a matter of fact, not all the system calls are abstracted in this way. There can also be scenarios where a programmer needs to perform tasks natively, or to use computation services from pre-existing native libraries. So it makes perfect sense to open up JNI as a language feature for hybrid programming.

JNI is not necessarily a Java feature, rather an indispensable part of the design of the language runtime. A crucial internal capability which implements a subroutine linkage channel between the abstract and the real machine. The bedrock infrastructure which underpins the platform independent programming model, the by-product of which was exposed outside under the pretext of a language feature.

Friday, May 1, 2015

Function pointers in Java

This is what I think of Java interfaces:

Interfaces are defined as one or more methods grouped together with empty bodies, to represent an object's interaction with the outside world. This is required if you want to impose certain abstract behaviors on the sub types from which the said abstraction is made. But then we have abstract classes which serve a similar purpose. Technically, abstract classes and interfaces differ in many aspects - with respect to inheritance models, what types of variables and methods they contain etc. - but there are no programming scenarios where an abstract class cannot meet the purpose of an interface. If the requirement is to impose abstract methods on sub types, abstract classes will help anyway. If the requirement is to exhibit runtime polymorphism, normal virtual methods will help. If the requirement is something else, there are different semantics elsewhere in the language. In short, there exist no programming scenarios where both abstract classes and interfaces are discretely required.
Except for simulating function pointers and callbacks.

In (traditionally structural) programming, there exist numerous cases where one needs to pass a function pointer as an argument to a function call, essentially to register a callback. Classical examples are: i) Custom plug-ins, which get embedded and consumed in a program after the executable is built. As the plug-in library is built separately, and possibly chronologically later than the executable, the main program does not have any information about the plug-ins and their routines. A contract is made wherein the main program defines the skeleton of the methods (function pointers which point to NULL), the plug-ins implement these routines, and at run time, after loading the plug-in library, the implemented methods are assigned to these pointers. At appropriate times, the main program invokes the routines through the pointers, and because of the aforesaid binding, they get dispatched to the right methods. ii) Re-usable library code which intercepts certain asynchronous events and wants to pass them back to the application. The application calls a register method in the library and passes the callback function as an argument. The library routine assigns this function to a pre-defined function pointer. When the library intercepts the event, the callback is invoked through the pointer. In both examples, the module which pre-defines the function pointer has no knowledge about the future assignments, and has no means to provide a default implementation - and hence the pointers are always created as null pointers - the prototypes (primitive forms) of pure virtual methods.

Without pointers, this mechanism is not feasible in Java. At the same time, callbacks are a powerful feature without which a programming language cannot be deemed complete. Java has a number of scenarios where this is essential. A couple of examples are: i) thread creation, wherein one needs to pass the entry point method where the newly created thread should start its execution, ii) GUI scenarios, wherein one needs to pass the callbacks which handle the various events relevant to the graphics being rendered. In either case, a default implementation and a default behavior seem neither feasible nor sensible. So it makes perfect sense to elevate abstract classes one level up in their abstraction, and designate them as interfaces. If you dissect an interface anatomically, all you get is one or more function pointers inside, essentially NULL, well crafted and isolated within an object. In places where function pointers need to be passed, one can pass the container object itself, and the interceptor routine can cache this object and invoke the member method appropriately. Since the caller actually passes an object of a concrete class implementing this interface, the dispatch happens appropriately, and the callback is simulated with great effort, though in style.
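The two scenarios can be sketched in a few lines. `Runnable` and `Thread` are the real JDK types; `EventListener`, `EventSource` and `CallbackDemo` are hypothetical names invented for this illustration, not any library's API. Anonymous classes are used deliberately, so that the "container object" carrying the method stays visible.

```java
import java.util.ArrayList;
import java.util.List;

public class CallbackDemo {
    // Hypothetical callback contract: one method and no default behavior,
    // the moral equivalent of a function pointer initialized to NULL.
    interface EventListener {
        void onEvent(String event);
    }

    // Hypothetical library that caches the container object and later
    // invokes the member method through it.
    static class EventSource {
        private EventListener listener; // null until someone registers

        void register(EventListener l) {
            listener = l;
        }

        void fire(String event) {
            if (listener != null) {
                listener.onEvent(event);
            }
        }
    }

    static List<String> run() throws InterruptedException {
        final List<String> log = new ArrayList<>();

        // Registering a callback: pass an object of a concrete class
        // implementing the interface.
        EventSource source = new EventSource();
        source.register(new EventListener() {
            @Override
            public void onEvent(String event) {
                log.add("got " + event);
            }
        });
        source.fire("click");

        // Thread entry point: Runnable carries the method the thread runs.
        Thread t = new Thread(new Runnable() {
            @Override
            public void run() {
                log.add("thread ran");
            }
        });
        t.start();
        t.join(); // wait for the thread so its entry is visible here

        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // prints [got click, thread ran]
    }
}
```

In both cases the receiver (`EventSource` or `Thread`) knows nothing about the implementation at compile time; it only holds a reference typed by the interface and dispatches through it.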

Interfaces are not new Java features, they are inevitables and imminences. Imperatives of a pointer-less language. An elegant way of creating, transporting and processing function pointers without ever talking about it.