Annual Technical Report F30602-96-2-0223
CHAIMS project, Stanford University

Objective: The objective of the CHAIMS project (Compiling High-level Access Interfaces for Multi-site Software) is the composition of large, autonomous, heterogeneous, and distributed software services, called megamodules. These can be legacy services as well as CHAIMS-compliant ones. Technical details of the composition, such as generating client code for the CORBA, RMI, DCE, or DCOM protocols, are hidden from the person doing the composition, who only needs to be a specialist in the application domain and not a programmer of distributed systems. CHAIMS also addresses the novel optimization opportunities and complexity challenges that arise from composing large distributed modules. CHAIMS provides an executable procedural infrastructure at a level that comes close to that of architecture description languages.

URL: Details, including publications and demos, can be found at http://www-db.stanford.edu/CHAIMS

Approach: CHAIMS consists of a novel, purely compositional language and a composition system supporting this language. The CHAIMS language is at a higher level than traditional languages, yet it takes into account the complexity issues that emerge when calling large, distributed services. The traditional CALL statement is split up into substatements: pre-invocation setups, asynchronous invocation of services and extraction of results, and termination of services. Having several primitives also makes it possible to add a further primitive for pre-invocation estimates of the performance of a service. Such estimates are important for choosing optimal services at run time, for scheduling the order of invocations according to criteria like time or cost, and for giving more control to the user during the execution of the megaprogram containing the composition. Similar objectives lead to a primitive for monitoring ongoing invocations.

The CHAIMS system consists of two parts.
For the people providing megamodules, there have to be tools that support them in wrapping legacy modules and in presenting the necessary information about available megamodules to the people using them. For the actual composition process, CHAIMS provides a compiler that not only compiles the megaprogram written in the CHAIMS language but also generates all the necessary client code for the various distribution protocols, such as RMI, CORBA, DCE, and DCOM. This generated CRTS (client-side run time) is also responsible for automatic optimization by reordering the invocation of substatements (scheduling) and by controlling direct dataflow between megamodules.

Our approach differs from what is found in the commercial sector in that we address the composition issues that arise when composing large, heterogeneous, and distributed services. Today, such composition is either not automated at all (e.g., a person responsible for logistics does it manually by phoning various companies and entering the information into a spreadsheet) or it is done by hand-coded systems that directly use one specific distribution protocol and are limited to services provided for that protocol. Yet megamodules available for wrapping and integration are showing up in increasing numbers on the Internet. Using these sources in a more automated and flexible way is now the challenge for the DoD and for the many organizations that want to take advantage of fast and flexible access to these services. CHAIMS addresses issues that will become very important in this domain, and thus for the DoD, and it addresses them in a general way, i.e., without being bound to one particular protocol such as CORBA, Microsoft's DCOM, or Java's RMI.
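
The split of the CALL statement described in this section can be illustrated with a minimal Python sketch. It is not CHAIMS code and does not use the CHAIMS language: the class Megamodule, its method names (setup, estimate, invoke, examine, extract, terminate), the function run_cheapest, and the cost figures are all hypothetical, and a local thread pool stands in for a remote service that would really be reached over a protocol such as CORBA or RMI. The sketch only shows how separate primitives let a client compare estimates before committing to one service, invoke it asynchronously, and collect the result later.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple


class Megamodule:
    """Hypothetical in-process stand-in for a remote megamodule."""

    def __init__(self, name: str, work: Callable[[int], int], est_cost: float) -> None:
        self.name = name
        self._work = work            # the service computation (assumed)
        self._est_cost = est_cost    # cost the service reports up front (assumed)
        self._pool: Optional[ThreadPoolExecutor] = None

    def setup(self) -> None:
        # Pre-invocation setup; a real client would open a connection here.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def estimate(self) -> float:
        # Pre-invocation estimate: available before any work is started.
        return self._est_cost

    def invoke(self, arg: int) -> "Future[int]":
        # Asynchronous invocation: returns a handle without waiting.
        if self._pool is None:
            raise RuntimeError("setup() must be called before invoke()")
        return self._pool.submit(self._work, arg)

    @staticmethod
    def examine(handle: "Future[int]") -> bool:
        # Monitoring of an ongoing invocation: is the result ready yet?
        return handle.done()

    @staticmethod
    def extract(handle: "Future[int]") -> int:
        # Extraction of results; blocks until the invocation has finished.
        return handle.result()

    def terminate(self) -> None:
        # Termination of the service connection.
        if self._pool is not None:
            self._pool.shutdown(wait=True)
            self._pool = None


def run_cheapest(candidates: List[Megamodule], arg: int) -> Tuple[str, int]:
    """Set up all candidates, invoke only the one with the lowest estimate."""
    for module in candidates:
        module.setup()
    try:
        best = min(candidates, key=lambda m: m.estimate())
        handle = best.invoke(arg)
        # The caller could do other work here, polling with examine().
        best.examine(handle)
        return best.name, best.extract(handle)
    finally:
        for module in candidates:
            module.terminate()


if __name__ == "__main__":
    route_a = Megamodule("route_a", lambda x: x * 2, est_cost=5.0)
    route_b = Megamodule("route_b", lambda x: x * 2, est_cost=9.0)
    print(run_cheapest([route_b, route_a], 21))
```

Keeping the estimate separate from the invocation is what allows the choice between services to be made before any service has started work; a single blocking call would offer no point at which to make that choice.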

Recent Accomplishments: Since the 1997 project report, the CHAIMS project has gone through a further iteration: refining the architecture of the CHAIMS system, enhancing the CHAIMS protocols and compiler so that complex data can also be exchanged between megamodules and the megaprogram, and adding the capability to wrap legacy modules. The current version (version 1.1) offers the following added features:

- Wrapper templates for wrapping legacy RMI and CORBA modules (using either Orbix or Omnibroker as the ORB). These wrapper templates require only very limited hand-coding when wrapping a megamodule and automatically provide most of the functionality required to create a CHAIMS-compliant megamodule. The definition of CHAIMS compliance has been enhanced since the last progress report in order to take full advantage of the CHAIMS megaprogramming language and of the capabilities offered by using Gentypes and ASN.1 for data transfer.
- Use of the public standard ASN.1 for shipping complex objects among megamodules while keeping the data opaque to the megaprogram as well as to the transport layer. This will eventually also allow dataflow optimization by introducing direct dataflows among megamodules.
- Protocols for RMI and CORBA. The CHAIMS protocols, layered on top of the various distribution protocols, now also allow the transfer of complex data together with type information and descriptive names. This is achieved by packaging all data into Gentypes and encoding them using ASN.1 and BER. This not only allows protocol-independent data transfer but also eliminates the need for the client side to interpret data that should be opaque to it.
- The new compiler now generates code for two different CORBA systems (Orbix and Omnibroker) and for RMI. Because ORBs differ significantly in how connections to megamodules are established, we have to treat each ORB as a different distribution protocol.
This is not a problem, since dealing with heterogeneous megamodules is part of the CHAIMS objectives. The new compiler was also enhanced to include all CHAIMS primitives, to deal with opaque data encoded in ASN.1, and to allow basic control structures.
- We have also developed a general, fully CHAIMS-compliant I/O megamodule based on the Gentype. This I/O megamodule can get and display data packaged into a Gentype without any further coding.
- A first version of a preprocessing scheduler has helped us to determine how to approach automatic scheduling in future versions of CHAIMS.

Implementing the basic infrastructure has shown that most of our concepts can be implemented as defined; a few need refinement. Furthermore, we have seen that CHAIMS can indeed be layered on top of CORBA and RMI, even though these differ considerably in their approach to building distributed software systems.

Current Plan: Based on the experience with CHAIMS 1.1, we are now refining some concepts and enhancing the current infrastructure in order to incorporate certain details left out of version 1.1 (e.g., extracting lists of elements, having client-specific pre-invocation settings of parameters, and mixing megamodules of several distribution protocols within the same megaprogram). For CHAIMS 2.0 we also want to add at least one other distribution protocol, which requires additional wrapper templates and enhancements to the compiler. Another major point that we are working on now, and will continue to work on in the upcoming months, is more powerful demonstrations, based on CHAIMS 1.1 as well as 2.0. This has become possible now that we have a basic infrastructure, and it will help us to verify our approach effectively.

Towards the end of this year and throughout next year we will focus on the integration of the automatic scheduling of invocations.
In contrast to the current preprocessing scheduler, which allows only limited scheduling, scheduling will ultimately be done by a run-time scheduler built into the generated CRTS (client-side run time). This will necessitate a more complex architecture for the compiler and the CRTS than we have now, yet we expect to reuse most of the current implementation. For next year's demo days we plan to present CHAIMS version 2.0 with various demonstration examples that verify the approach chosen by CHAIMS.

Technology Transition: CHAIMS is a very young project (started only in September 1996) and a small one (two students funded by DARPA, one additional student, and one postdoc funded by a fellowship). We are also moving to a new plateau of technology, not yet seen in commercial products. We are therefore still in the stage of initial research and of building the basic infrastructure for validating and proving our approach to large-scale software composition. Though we have not yet been able to give large demonstrations or to distribute releases of our software, there has already been an amazing amount of interest in our work from various software companies and research institutions, e.g., from HP Labs, SAP, and BEASys. Our goal for technology transition, for the time being and for the near future, is exactly what is already going on now: inspiring companies and research groups working in related domains with our concepts and experiences. A technology transition of our software is planned to start next year; the first step will be the application of our approach and tools to the domain of bio-informatics by using CHAIMS for the composition of genomics software modules. We have submitted a proposal to NSF with that objective, and we are open to investigating other application domains.