Title of the project: Knowledge Networking for Genomic Services using CHAIMS

Participants:
Prof. Gio Wiederhold, Computer Science Department, Stanford University
Prof. Russ Altman, Medical Information Sciences, Stanford University
Dr. Dorothea Beringer, Computer Science Department, Stanford University
Two students

Principal Investigator: Prof. Gio Wiederhold, Gates Computer Science 4A, Stanford, CA 94305; tel: +650 725-8363, fax: +650 725-2588, email: gio@cs.stanford.edu

Anticipated budget: $300,000 per year, for three years

Focus: Knowledge Networking: Foundational Research and Prototype Development

Description of the project:

Genomic information is available from many sources: GenBank at NLM, GDB (stored using OPM, moving to Oak Ridge National Lab), Swiss-Prot, OMIM at Johns Hopkins, EST information from Merck and public sources at Washington University, metabolic pathways (EcoCyc) at SRI International, etc. There has been much work on linking these information sources using advanced database technology. However, most of these sources also have powerful search and processing engines, and these service capabilities have been integrated only sporadically; examples are the partial incorporation of BLAST into OPM at Lawrence Berkeley labs, and similar efforts in many dispersed research groups. A smaller-scale effort has been the development of bio-widgets, initiated by David Searls at the University of Pennsylvania. These have focused on processing for information visualization, although a broader focus is now being proposed through OMG as a Life-Sciences group initiative, using CORBA for remote method access.

We propose to develop megaprogramming technology, which enables the composition of remote computational services, and to extend it into the genomic domain. We would provide composition capability not only for CORBA-based tools, but also for DCOM and other distribution protocols.
This is achieved through the use of a new high-level, composition-only language, CHAIMS, which provides access to heterogeneous services, either directly or via wrappers.

Our current work is supported in part by DARPA, and earlier work was motivated by the CommerceNet Consortium. These efforts had to focus on relatively short-range demonstrations, as in logistics, and do not support essential research, such as exploiting the potential for optimization.

The principles underlying CHAIMS are: a clear separation between providing services and composing services; heterogeneity across the boundaries of distribution protocols such as CORBA, DCE, RMI or DCOM (now part of Windows DNA); a focus on composition, with a clear separation between composition and computation; additional control over the execution of services; and optimization of dataflows between services. The CHAIMS megaprogramming language we have developed is a purely compositional language, intended for use by domain experts, that leaves room for various optimization techniques at run time as well as at compile time. The CHAIMS compiler takes over the handling of the various distribution protocols, using the available client-server protocols. The compiler will also be responsible for the optimization techniques we would like to investigate and implement in this project.

So far we have defined the CHAIMS megaprogramming language and implemented a basic CHAIMS infrastructure, consisting of a basic compiler for CHAIMS megaprograms and of wrapper templates for wrapping services that are not CHAIMS-compliant. In the context of this project we would like to focus on the following two points: exploring, integrating and evaluating optimization techniques; and applying the CHAIMS approach to the composition of genomic services.

Optimization in composition becomes an important issue, especially in compute- and data-intensive domains like genomics.
When composing remote services, new possibilities for optimization arise. Remote services can be executed in parallel, resulting in considerable gains in execution time whenever the services are of substantial complexity. Furthermore, the exchange of data between the client megaprogram and the services can become a bottleneck; data should therefore be routed directly among the services. The domain experts who write a megaprogram in order to compose services should not be burdened with performing these optimizations manually. We want to free the domain experts from any tasks not directly related to their domain knowledge. Therefore, all optimization should be taken over by the CHAIMS system.

In this project we will explore the implications that the optimizations mentioned above have in a completely heterogeneous environment. We will investigate techniques for direct data exchange between services, design the necessary control mechanisms between megaprogram and services, and develop the optimization algorithms for the compiler. In parallel with implementing optimization techniques, we also want to use CHAIMS in the composition of genomic services, and thus demonstrate the composition capabilities of CHAIMS and make them available to this domain.

The project falls under the focus KN (knowledge networking). It addresses the still largely unexplored domain of integrating computation, in contrast to projects addressing data integration. The paradigm chosen for knowledge networking is that of composing services by means of a dedicated compositional megaprogramming language with automatic support for optimization. We also expect the results of this research to contribute to the focus NCC (new computational challenges).

Gio Wiederhold is a professor of Computer Science at Stanford University, with courtesy appointments in Medicine and Electrical Engineering. He has a long history of applying advances in computer science to medicine.
Russ Altman is an assistant professor of Medicine with a courtesy appointment in Computer Science, doing research in genomics matching and functionality. Dorothea Beringer is a postdoc with a PhD in software engineering from EPFL in Lausanne, Switzerland; she also has research experience in distributed systems and several years of industrial experience.
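As an illustration of the two optimizations named in the project description (executing independent remote services in parallel, and routing data directly among services rather than through the client megaprogram), the following Python sketch may be helpful. It is not CHAIMS syntax and not part of the existing CHAIMS infrastructure: the `Service` class, the handle store, the service names and the toy search functions are all hypothetical, and remote calls are simulated locally with threads and a short delay. The client-side composition only passes opaque handles; the actual data is resolved on the simulated service side and is transferred to the client once, at the end.

```python
import itertools
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for data held at the service sites.
# The composing client only ever sees opaque handles into this store.
_store = {}
_lock = threading.Lock()
_ids = itertools.count()


def put(value):
    """Store a value on the simulated service side and return an opaque handle."""
    with _lock:
        handle = f"h{next(_ids)}"
        _store[handle] = value
    return handle


def resolve(handle):
    """Look up the data behind a handle."""
    with _lock:
        return _store[handle]


class Service:
    """Hypothetical stand-in for a remote computational service."""

    def __init__(self, name, func, delay=0.05):
        self.name = name
        self.func = func
        self.delay = delay  # simulated remote computation time in seconds

    def invoke(self, *handles):
        """Resolve input handles at the service side and return a result handle."""
        time.sleep(self.delay)
        args = [resolve(h) for h in handles]
        return put(self.func(*args))


def megaprogram(query):
    """Compose three simulated services; only handles pass through the client."""
    search_a = Service("search_a", lambda q: [q + "_hitA1", q + "_hitA2"])
    search_b = Service("search_b", lambda q: [q + "_hitB1"])
    merge = Service("merge", lambda a, b: sorted(a + b))

    q = put(query)
    # The two searches do not depend on each other, so they run in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(search_a.invoke, q)
        future_b = pool.submit(search_b.invoke, q)
        handle_a, handle_b = future_a.result(), future_b.result()

    # The merge service receives handles, not data: the intermediate hit
    # lists never travel through the client.
    merged = merge.invoke(handle_a, handle_b)

    # The only transfer of result data to the client.
    return resolve(merged)
```

Calling `megaprogram("BRCA1")` returns the merged hit list from both simulated searches. In this sketch the parallelism and the handle passing are written out by hand; the proposal's aim is for the CHAIMS compiler to derive both automatically, so that the domain expert writes only the composition.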