My thesis defense took place on September 9, 1999. Here are the
abstract and the slides of my talk.
Data Mining Techniques for Structured and Semistructured Data
With the advent of the World Wide Web, the amount of data stored and
accessible electronically on the Internet and private Intranets has
grown tremendously. The process of knowledge discovery (data mining)
by sophisticated and intricate analysis of this data is becoming
increasingly important for the business and scientific-research
communities alike. The current state of the art in data mining consists
of specialized techniques and algorithms, each designed for a limited
type of data and a specific problem. In my thesis, I focus on the
design and development of a general framework and algorithms for
formulating, optimizing, and processing complex data mining queries in
a uniform manner for structured and semistructured data.
Mining Structured Data
- Query Flocks: A Generalization of Association-Rule Mining
1998 ACM International Conference on Management of Data (SIGMOD'98), Seattle, Washington, June 1998.
joint work with Dick Tsur, Jeffrey Ullman, Chris Clifton, Serge Abiteboul, Rajeev Motwani, and Arnie Rosenthal
The query-flock framework is designed for efficient mining of
relational data in a uniform manner. Mining queries are expressed
declaratively (in datalog with parameters) along with a filter
condition (SQL-style). The paper introduces query flocks and explores
systematic ways of efficiently processing them. (Abstract)
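As a rough illustration of the idea (not the paper's implementation), a query flock can be read as a parameterized query plus a filter: each assignment of values to the parameters yields one ordinary query, and the flock's result is the set of assignments whose answer passes the filter. The sketch below assumes a hypothetical baskets(BID, Item) relation, two parameters $1 and $2, and a support-style filter COUNT(answer.B) >= 2. It evaluates the flock naively, with none of the optimizations the paper explores.

```python
from itertools import combinations

# Hypothetical relation baskets(BID, Item), invented for illustration.
baskets = {
    (1, "beer"), (1, "diapers"), (1, "milk"),
    (2, "beer"), (2, "diapers"),
    (3, "beer"), (3, "milk"),
    (4, "diapers"),
}


def answer(item1, item2):
    """Evaluate the parameterized query
         answer(B) :- baskets(B, $1) AND baskets(B, $2)
    for one assignment ($1, $2) = (item1, item2)."""
    with_item1 = {b for (b, i) in baskets if i == item1}
    with_item2 = {b for (b, i) in baskets if i == item2}
    return with_item1 & with_item2


def flock(min_support):
    """Return every parameter assignment whose answer passes the filter
    COUNT(answer.B) >= min_support."""
    items = sorted({i for (_, i) in baskets})
    return [
        (a, b)
        for a, b in combinations(items, 2)
        if len(answer(a, b)) >= min_support
    ]


frequent_pairs = flock(min_support=2)
print(frequent_pairs)
```

With this particular query and filter the flock reduces to mining frequent item pairs. Changing the query body or the filter gives other mining problems without changing the evaluation loop.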
- Integrating Data Mining with Relational DBMS: A Tightly-Coupled Approach
4th Workshop on Next Generation Information Technologies and Systems (NGITS'99), Zikhron-Yaakov, Israel, July 1999.
joint work with Dick Tsur
This paper reports on an integrated system for mining data that
fully utilizes the capabilities of a relational DBMS. Using the
query-flock framework, complex mining queries are first optimized into
a sequence of simpler SQL queries and then processed at the
underlying DBMS. (Abstract)
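A minimal sketch of what "a sequence of simpler SQL queries" might look like, assuming a hypothetical baskets(bid, item) table in SQLite and a support threshold of 2. The first query keeps only the items that can possibly appear in a frequent pair, and the second runs the expensive self-join on that reduced set. The table names, the data, and the particular rewrite are assumptions made for illustration, not the system described in the paper.

```python
import sqlite3

MIN_SUPPORT = 2

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baskets (bid INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO baskets VALUES (?, ?)",
    [
        (1, "beer"), (1, "diapers"), (1, "milk"),
        (2, "beer"), (2, "diapers"),
        (3, "beer"), (3, "milk"),
        (4, "diapers"), (4, "bread"),
    ],
)

# Step 1: a cheap single-table query that prunes infrequent items.
conn.execute("CREATE TABLE ok_items (item TEXT)")
conn.execute(
    """
    INSERT INTO ok_items
    SELECT item FROM baskets
    GROUP BY item
    HAVING COUNT(DISTINCT bid) >= ?
    """,
    (MIN_SUPPORT,),
)

# Step 2: the self-join, restricted to items that survived step 1.
rows = conn.execute(
    """
    SELECT a.item, b.item, COUNT(*) AS support
    FROM baskets AS a
    JOIN baskets AS b ON a.bid = b.bid AND a.item < b.item
    WHERE a.item IN (SELECT item FROM ok_items)
      AND b.item IN (SELECT item FROM ok_items)
    GROUP BY a.item, b.item
    HAVING COUNT(*) >= ?
    ORDER BY a.item, b.item
    """,
    (MIN_SUPPORT,),
).fetchall()

print(rows)
conn.close()
```

Both steps are ordinary SQL, so all of the work stays inside the DBMS. The sketch assumes each (bid, item) pair occurs at most once in the table.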
Mining Semistructured Data
- Representative Objects: Concise Representations of Semistructured, Hierarchical Data
13th International Conference on Data Engineering (ICDE'97), Birmingham, England, April 1997.
joint work with Jeffrey Ullman, Janet Wiener, and Sudarshan Chawathe
Representative objects are designed to serve as a DTD-by-example
for semistructured (XML-like) data. The paper presents an algorithm,
based on determinization of nondeterministic finite automata, for
deriving representative objects from semistructured data. (Abstract)
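The determinization step can be sketched with the textbook subset construction, treating the data graph as a nondeterministic automaton whose transitions are the edge labels. The graph below, the function names, and the dictionary encoding are all assumptions made for illustration. The paper's algorithm and its definition of representative objects are more refined than this sketch.

```python
from collections import deque

# Hypothetical data graph: object id -> list of (label, child id).
data = {
    "root": [("restaurant", "r1"), ("restaurant", "r2")],
    "r1": [("name", "n1"), ("address", "a1")],
    "r2": [("name", "n2"), ("phone", "p2")],
    "a1": [("city", "c1")],
    "n1": [], "n2": [], "p2": [], "c1": [],
}


def determinize(graph, root):
    """Subset construction: each state is the set of objects reachable
    from the root by one particular sequence of labels."""
    start = frozenset([root])
    dfa = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in dfa:  # already expanded (also handles cycles)
            continue
        targets = {}
        for obj in state:
            for label, child in graph[obj]:
                targets.setdefault(label, set()).add(child)
        dfa[state] = {label: frozenset(kids) for label, kids in targets.items()}
        queue.extend(dfa[state].values())
    return start, dfa


def labels_after(path, start, dfa):
    """Labels that can follow the given label path somewhere in the data."""
    state = start
    for label in path:
        state = dfa[state][label]
    return sorted(dfa[state])


start, dfa = determinize(data, "root")
print(labels_after(["restaurant"], start, dfa))
```

In the deterministic result every label path leads to exactly one state, so a question such as "which labels can follow restaurant?" is answered by a single lookup instead of a scan over all restaurant objects.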
- Inferring Structure in Semistructured Data
Workshop on Management of Semistructured Data in conjunction with SIGMOD'97, Tucson, Arizona, May 1997.
joint work with Serge Abiteboul and Rajeev Motwani
This paper explores methods for organizing semistructured objects
into type hierarchies based on object attributes. The type
hierarchies are inferred from the data and have varying degrees of
precision (inversely proportional to their size). (Abstract)
- Extracting Schema from Semistructured Data
1998 ACM International Conference on Management of Data (SIGMOD'98), Seattle, Washington, June 1998.
joint work with Serge Abiteboul and Rajeev Motwani
This paper presents algorithms for deriving approximate
schemas, akin to schemas in object-oriented databases, for
semistructured data. Schemas are represented as datalog programs
with the semantics of the greatest fixed-point over the data. (Abstract)
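To show why greatest fixed-point semantics suits recursive type definitions, here is a small sketch under simplifying assumptions: each type is defined only by required outgoing edges, written as (label, type of target), and the objects, edges, and rule names are hypothetical. Evaluation starts with every object in every type and repeatedly removes objects that violate their rule until nothing changes.

```python
# Hypothetical semistructured data as (source, label, target) edges.
edges = {
    ("e1", "name", "s1"), ("e1", "manager", "e2"),
    ("e2", "name", "s2"), ("e2", "manager", "e1"),
    ("d1", "name", "s3"),
}
atomic = {"s1", "s2", "s3"}
objects = {"e1", "e2", "d1"} | atomic

# Hypothetical typing rules: type -> required (label, type of target).
# Read "employee" as:
#   employee(X) :- link(X, name, Y), atomic(Y),
#                  link(X, manager, Z), employee(Z)
rules = {
    "employee": [("name", "atomic"), ("manager", "employee")],
    "named": [("name", "atomic")],
}


def greatest_fixed_point(objects, atomic, edges, rules):
    # Start from the top: every object belongs to every type.
    extent = {t: set(objects) for t in rules}
    extent["atomic"] = set(atomic)
    changed = True
    while changed:
        changed = False
        for t, body in rules.items():
            for obj in list(extent[t]):
                satisfied = all(
                    any(
                        src == obj and lab == label and tgt in extent[target_type]
                        for (src, lab, tgt) in edges
                    )
                    for label, target_type in body
                )
                if not satisfied:
                    extent[t].discard(obj)
                    changed = True
    return extent


extent = greatest_fixed_point(objects, atomic, edges, rules)
print(sorted(extent["employee"]), sorted(extent["named"]))
```

Here e1 and e2 manage each other, so each one's membership in employee depends on the other's. Starting from the full set and shrinking keeps both, whereas the usual least fixed point of the same rule would leave employee empty.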
Svetlozar Nestorov
Last modified: Fri Sep 10 14:43:58 PDT