Data Mining Techniques for Structured and Semistructured Data

9/10/99

Table of Contents

Data Mining Techniques for Structured and Semistructured Data

Abundance of Data

Data Mining

Talk Outline

Query Flocks: Efficient, On-line, Ad-hoc Mining of Structured Data

Mining Structured Data

Query Flock Features

Query Flock Roadmap

Association-Rule Mining

Mythical Association Rule

Market Baskets as a Query Flock

Query Flock Result

Formal Definition of Flocks

Association-Rule Challenge

The A-Priori Technique

A-Priori for Query Flocks

Query Flock Roadmap

Medical Example: Side Effects

Side-Effect Query Flock

Some Safe Subqueries

Processing Flocks Efficiently

Query Flock Plans

Auxiliary Relations

Generating Flock Plans

Example Query Flock Plan

Side Effects Directly in SQL

Typical Direct Plan in RDBMS

Why Do Flocks?

Query Flock Roadmap

Query Flock Architecture

Query Flock Compiler

Flock Compiler Architecture

Performance: Medical Data

Structure Discovery: Mining Semistructured Data

Semistructured Data: Example

Semistructured Data: Definition

Semistructured Data: Data Model

Semistructured Data: Challenges

Benefits of Explicit Structure

Research Contributions

Representative Objects (RO)

RO Construction Algorithm

Semistructured Data: Example

RO Example

RO Features

Approximate Schema: Challenges

Approximate Schema: Our Solution

Notation

Notation Example

Typing Program: Definition

Typing Program: Example

Defect: Excess and Deficit

Typing Program: Construction

Stage 1: Perfect Types

Stage 2: Clustering Types

Clustering Types: Example

Semistructured Data: Example

Stage 3: Recasting Objects

Optimal Typing

Approximate Schema: Features

Research Contributions

Future Research Interests

Acknowledgements

Author: DBGroup

Email: evtimov@db.stanford.edu

Home Page: http://www-db.stanford.edu/~evtimov