Approach
It is becoming apparent that Java will play a significant role in the nation’s computing base, especially in the context of mobile computing, where security issues are of paramount importance. Unlike other widely-used programming languages, Java’s clean design is amenable to formal analysis. Sun has published an English specification of the JVM and makes available a reference implementation. However, the English specification, while quite good, is nonetheless incomplete and occasionally in conflict with the reference implementation. The reference implementation is a large, complex C program in which security checks are distributed throughout the code making it difficult to understand and impossible to prove anything about. This situation is not acceptable for security-critical applications. Because of the importance of the verifier, and because it is amenable to formal analysis, this is an ideal opportunity to apply formal specification and development technology. Use of formal methods will improve the security of Java applications.
The approach is to first create a specification in naive set theory based on Sun’s English specification and reference implementation. The specification formalizes, fills gaps in, removes bugs, and generalizes the Sun specification. Our specification removes constraints on loading strategies that are in the Sun implementation.
The specification builds heavily on standard mathematical structures, such as stacks, lattices, and product and sum lattice constructors. The bytecode verifier is formalized by the specification as an instance of a constraint satisfaction architecture. Thus, to verify a JVM class description, a constraint satisfaction problem is derived from the class description and then solved. If the constraint problem does not have a (non-trivial) solution, the class fails verification. Reliance on standard architectures and mathematical structures makes the specification easier to comprehend and validate.
The constraint satisfaction architecture is parameterized by a lattice and a collection of monotone functions over the lattice. The lattice formalizes the typing structure of the JVM. The monotone functions are used in constraint inequalities that directly formalize the JVM typing rules. The specification is thus at a high-level of abstraction, and in close correspondence to how a type theorist would specify the JVM type system. Furthermore there is clean separation of the specification of bytecode verification problem as a constraint problem, and its implementation by constraint solving algorithm. The algorithm specified in the constraint solving architecture is a chaotic or non-deterministic fixed-point iteration scheme similar to those used to solve dataflow analysis problems.
Next, the specification is expressed in Specware. It is type checked, and putative properties of the specification are proved. Specware composition capabilities permit a very natural structuring of the specification into components. For example, Specware’s parameterized specifications, which, like ML functors, build specifications (or modules) from specifications, are used to instantiate the constraint satisfaction architecture and lattice-constructing operations used extensively in the specification.
Finally, the specification is refined into code. This requires defining refinements for each of the components of the specification. These refinements are composed to yield an implementation in Lisp or C++. The refinements carry formal proof obligations that guarantee that the implementation is correct. The main work is to derive an efficient data representation for the JVM lattice, and to refine the constraint-solving algorithm into a more efficient form.
The specification of the bytecode verifier has been structured so that can be easily extended to verify additional JVM class properties, such as information flow properties.