Query Flock Compiler: Efficient Implementation of Query Flocks

Abstract

The query flock framework enables efficient, ad-hoc, on-line data mining. In this document, we describe two different approaches to efficient implementation of query flocks using a conventional RDBMS. First, we show how, using a new logical operator group-select in query plans, the query flock idea can be implemented inside a conventional optimizer of RDBMS. Along with the new logical operator, we also present a novel optimization technique that involves pushing aggregate conditions down the query plan tree. This technique is applicable to any SQL query with aggregate conditions but is especially important and intuitive when presented within the context of query flocks.

The second approach to efficient implementation of query flocks is designed to work on top of a conventional RDBMS. However, this approach is still tightly coupled with the RDBMS as we only use SQL'92. For this approach we build query flock plans that use auxiliary relations to limit the size of the intermediate results of query flocks. Intuitively, auxiliary relations contain values of the parameters that satisfy the filter condition for a subset of the query flock. This idea is an extension of the optimization technique presented in the first approach. We also present an optimization algorithm for generating query flock plans and some greedy heuristics that empirically perform very well. They are used in the Query Flock Compiler that implements this approach. We also describe some experimental result which emphasize the power and flexibility of query flocks.