In my talk, I will present Query Flocks, a general framework over relational data that enables the declarative formulation, systematic optimization, and efficient processing of a large class of mining queries. In Query Flocks, each mining problem is expressed as a datalog query with parameters and a filter condition. In the optimization phase, a query flock is transformed into a sequence of simpler queries that can be executed efficiently. As a proof of concept, I have integrated Query Flocks with a conventional database system and will report on the performance results.
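To make the idea concrete, here is a minimal sketch of one well-known mining problem, frequent item pairs, written as a query flock: the parameterized query is `answer(B) :- baskets(B, $1) AND baskets(B, $2)` and the filter is `COUNT(answer.B) >= threshold`. The relation `baskets`, the toy data, and the function names `naive_flock` and `staged_flock` are assumptions made for illustration. They are not the system or the optimizer reported in the talk. The staged version only illustrates the general idea of replacing one expensive query with a sequence of simpler ones.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy relation baskets(basket_id, item); not data from the talk.
BASKETS = [
    (1, "beer"), (1, "diapers"), (1, "milk"),
    (2, "beer"), (2, "diapers"),
    (3, "beer"), (3, "milk"),
    (4, "diapers"),
]


def naive_flock(baskets, threshold):
    """Evaluate the flock directly: try every assignment of ($1, $2).

    Query:  answer(B) :- baskets(B, $1) AND baskets(B, $2)
    Filter: COUNT(answer.B) >= threshold
    """
    items_by_basket = defaultdict(set)
    for basket_id, item in baskets:
        items_by_basket[basket_id].add(item)

    items = sorted({item for _, item in baskets})
    accepted = set()
    for p1, p2 in combinations(items, 2):
        # Evaluate the query for this one parameter assignment.
        answer = {b for b, its in items_by_basket.items()
                  if p1 in its and p2 in its}
        if len(answer) >= threshold:  # apply the filter condition
            accepted.add((p1, p2))
    return accepted


def staged_flock(baskets, threshold):
    """Evaluate the same flock as a sequence of two simpler queries."""
    # Step 1: the simpler flock answer(B) :- baskets(B, $1) with the same
    # filter. An item that fails here cannot appear in any accepted pair.
    baskets_by_item = defaultdict(set)
    for basket_id, item in baskets:
        baskets_by_item[item].add(basket_id)
    survivors = sorted(item for item, bs in baskets_by_item.items()
                       if len(bs) >= threshold)

    # Step 2: the original query, restricted to surviving parameter values.
    return {(p1, p2) for p1, p2 in combinations(survivors, 2)
            if len(baskets_by_item[p1] & baskets_by_item[p2]) >= threshold}
```

Both functions return the same set of parameter assignments. The staged version simply examines fewer candidate pairs whenever the first step eliminates infrequent items.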
While the Query Flocks framework is well suited to relational data, it is of limited use for semistructured data, i.e., nested data with implicit and/or irregular structure, such as web pages. The lack of an explicit, fixed schema makes semistructured data easy to generate or extract but hard to browse and query. In my talk, I will present methods for structure discovery in semistructured data that alleviate this problem. The discovered structure can vary in precision and complexity. In particular, I will present an algorithm for deriving a schema-by-example and an algorithm for extracting an approximate schema in the form of a datalog program.