Publication Abstracts

1. Query Rewriting using Semistructured Views
We address the problem of query rewriting for TSL, a language for querying semistructured data. We develop and present an algorithm that, given a semistructured query q and a set of semistructured views V, finds rewriting queries, i.e., queries that access the views and produce the same result as q. Our algorithm is based on appropriately generalizing containment mappings, the chase, and unification -- techniques that were developed for structured, relational data. We also develop an algorithm for equivalence checking of TSL queries. We show that the algorithm is sound and complete for TSL, i.e., it always finds every TSL rewriting query of q, and we discuss its complexity. We extend the rewriting algorithm to use available structural constraints (such as DTDs) to find more opportunities for query rewriting. We currently incorporate the algorithm in the TSIMMIS system.

2. Expressive Capabilities Description Languages and Query Rewriting Algorithms
Information integration systems have to cope with a wide variety of different information sources, which support query interfaces with very varied capabilities. To deal with this problem, the integration systems need descriptions of the query capabilities of each source, i.e., the set of queries supported by each source. Moreover, the integration systems need algorithms for deciding how a query can be answered given the capabilities of the sources. Finally, they need to translate a query into the format that the source understands. We present two languages suitable for descriptions of query capabilities of sources and compare their expressive power. We also use one of the languages to automatically derive the capabilities description of the integration system itself, in terms of the capabilities of the sources it integrates. We describe algorithms for deciding whether a query "matches" the description and show their application to the problem of translating user queries into source-specific queries and commands. We propose new, improved algorithms for the problem of answering queries using these descriptions. Finally, we identify an interesting class of source capability descriptions, for which our algorithms are much more efficient.
This paper extends the work in paper no. 6.

3. Incremental Maintenance of Semistructured Views over Semistructured Data
Semistructured data is not strictly typed like relational or object-oriented data and may be irregular or incomplete. It often arises in practice, e.g., when heterogeneous data sources are integrated or data is taken from the World Wide Web. Views over semistructured data can be used to filter the data and to restructure (or provide structure to) it. To achieve fast query response time, these views are often materialized. This paper studies incremental maintenance techniques for materialized views over semistructured data. We use the graph-based data model OEM and the query language Lorel, developed at Stanford, as the framework for our work. We propose a new algorithm that produces a set of queries that compute the changes to the view based upon a change to the source. We develop an analytic cost model and compare the cost of executing our incremental maintenance algorithm to that of recomputing the view. We show that for nearly all types of database updates, it is more efficient to apply our incremental maintenance algorithm to the view than to recompute the view from the database, even when there are thousands of updates.

4. Using Knowledge of Redundancy for Query Optimization in Mediators
Many autonomous and heterogeneous information sources are becoming increasingly available to the user through the Internet -- especially through the World Wide Web. The integration of Internet sources poses several challenges which have not been sufficiently addressed. In particular, knowledge of redundancy can be used to reduce the number of source accesses that have to be performed to retrieve the answer to the user query. The system must avoid retrieving the same set of data from multiple sites. Moreover, redundancy can be exploited to further increase query performance and system availability by having the system collect information from the ``cheapest'' available sources. We formulate this problem as a scheduling problem with AND/OR precedence constraints and discuss its complexity. Moreover, probabilistic information about source overlap can help derive efficient query plans for delivering partial answers to queries. We formulate the optimization framework and propose approximations to the problem that make efficient use of the source overlap information to provide suboptimal solutions.

5. Capability-based Mediation in TSIMMIS
Conventional mediators focus their attention on the contents of the sources and their relationship to the integrated views provided to the users. They do not take into account the capabilities of sources to answer queries. This may lead them to generate plans involving source queries that cannot be answered by the sources. In the TSIMMIS system, we have developed a capability-sensitive plan generation module that constructs feasible plans for user queries in the presence of limited source capabilities.

6. Describing and Using Query Capabilities of Heterogeneous Sources
Information integration systems have to cope with the different and limited query interfaces of the underlying information sources. First, the integration systems need descriptions of the query capabilities of each source, i.e., the set of queries supported by each source. Second, the integration systems need algorithms for deciding how a query can be answered given the capabilities of the sources. Third, they need to translate a query into the format that the source understands. We present two languages suitable for descriptions of query capabilities of sources and compare their expressive power. We also describe algorithms for deciding whether a query "matches" the description and show their application to the problem of translating user queries into source-specific queries and commands. Finally, we propose new improved algorithms for the problem of answering queries using these descriptions.

7. Template-Based Wrappers in the TSIMMIS System
Wrappers provide access to heterogeneous information sources by converting application queries into one or more source-specific queries or commands. and transforming the source results into the application data model. This paper describes the wrapper implementation toolkit we have developed for wrappers. The wrapper implementor specifies a set of templates written in a high level declarative language that describe the queries accepted by the wrapper and the objects returned. If an application query matches a template, an action associated with the template is executed to provide the native query for the underlying source. A query can match a template by being equivalent to it, or by being equivalent to a template modulo a wrapper-generated filter query. Finally, the wrapper transforms the answer into the representation that is used by the application. The wrapper can be easily extended by adding templates when more functionality is required.

8. Views for Semistructured Data
Defining a view over a semistructured database introduces many new problems. In this paper we propose a view specification language and consider the problem of answering queries posed over views. The two main approaches, query rewriting and view materialization, are outlined with focus on the difficult problems caused by the semistructured nature of the data.

9. The TSIMMIS approach to mediation: Data models and Languages
TSIMMIS -- The Stanford-IBM Manager of Multiple Information Sources -- is a system for integrating information. It offers a data model and a common query language that are designed to support the combining of information from many different sources. It also offers tools for generating automatically the components that are needed to build systems for integrating information. In this paper we shall discuss the principal architectural features and their rationale.

10. An Analysis Of Factors Directing The Admission Process Of AI Technologies
We start with the observation that, while artificial intelligence is a prominent branch of computer science that has even found its way into the popular culture, it appears that its fruits are an inconsequential part of the computer industry. The paper tries to assess the extent to which this observation is accurate, by determining and analyzing some important factors that affect the fortunes of AI technology in the marketplace. It identifies the stages in the admission process of Information Technology and the particular obstacles that AI is facing during that process. Finally, based upon the patterns that the research reveals, we make some predictions about the commercialization of AI in the future. Some of the observations made in this paper in 1995 have since been echoed by popular business magazines (see Forbes, Nov 30, 1998, "AI gets real").

11. Buffer size and congestion control in ATM networks
An important problem in ATM networks is to determine the optimal size of buffers in ATM switches such that traffic and quality of service requirements are met. I developed a simple scheduling scheme that, for a simple traffic model, under very general conditions and for a variety of common network topologies, required much smaller buffers in the worst case than other known approaches.
 
 

If you have comments or questions, please contact me at vassalos@db.stanford.edu