07/10/1996 | Marcus Breunig |
07/14/1996 | Joachim Hammer |
Table of Contents
This document was written by Markus Breunig (breunig@cs.stanford.edu) and describes the work I have done in June/July 1996 on the TSIMMIS project.
My original and main project was to write code to match templates and queries for the wrappers in TSIMMIS.
Variable X | Variable Y | remember binding X=Y |
Constant C | error | |
Placeholder P | error | |
Constant C | Variable Y | generate filter Y=C |
Constant C | ok | |
Constant D | error (D !=C) | |
Placeholder P | remember binding P=C |
Variable Y | Variable X | remember binding X=Y |
Constant C | generate filter Y=C | |
Constant C | Variable X | error |
Constant C | ok | |
Constant D | (D !=C) error | |
Placeholder P | Variable X | error |
Constant C | remeber binding P=C |
The directory contains code to match a query with a template.
The following files are important:
We use a template consisting of 2 rules:
C :- C: <report {<title $X>}>
// "select * from book where Title = " ^ $X //.
B :- B: <book {<author $X><title Y><date Y>}>
// $S = "select * from "
$T = $S ^ "book where ",
$$ = $T ^ "Author = " ^ $X //.
and submit the following query:
A :- A: <book {<author "Joe"><title "1991"><date "1991">}>.
The parser and matcher together will produce the following results:
** Parsing template
C_0 :-
C_0:<OID1_0 report {<OID0_0 title $X_0>}>@(null)
// "select * from book where Title = " ^ $X_0 //.
B_1 :-
B_1:<OID5_1 book {<OID2_1 author $X_1> <OID3_1 title Y_1>
<OID4_1 date Y_1>}>@(null)
// $S_1 = "select * from " ,
$T_1 = $S_1 ^ "book where " ,
$$ = $T_1 ^ "Author = " ^ $X_1 //.
** Parsing Query
A_0 :-
A_0:<OID9_0 book {<OID6_0 author Joe> <OID7_0 title 1991>
<OID8_0 date 1991>}>@(null).
** Matching
** Results
Rule chosen : 1
Action : "select * from book where Author = Joe"
Placeholders:
{
$X_1 = Joe
$S_1 = select * from
$T_1 = select * from book where
}
extractor :
A_0:<OID9_0 book {<OID6_0 author Joe> <OID7_0 title 1991>
<OID8_0 date 1991>}>
constructor :
A_0
Optimizer chose rule: 1
Note:
void f() {
Match *m;
struct rulelisttype *templ;
struct rulelisttype *query;
Tracer TrQuery, TrTempl;
Parser parser;
char **vtTempl, **vtQuery;
// parse the template file
templ = parser.File("TEMPLATE");
// check if the parsing was successful
if(templ==NULL) { // parse error
cerr << " Parser error parsing the template. "
<< endl;
} else {
// get the symbol table from the parser
vtTempl = parser.GetVT();
// initialize the tracer with the symbol table
TrTempl.SetVT(vtTempl);
// print the parsed template
TrTempl.print(templ);
// parse the query file
query = parser.File("QUERY");
// check if the parsing was successful
if(query==NULL) { // parse error
cerr << " Parser error parsing the query. " << endl;
} else {
// same as for template
vtQuery = parser.GetVT();
TrQuery.SetVT(vtQuery);
TrQuery.print(query);
// create the matcher and let him do the matching
m = new Match(query->rule, templ);
// get the results from the matcher
const MatchResult *mr = m->GetResult();
// see what the matcher says
if(mr==NULL) { // matching not successful
if(m->GetState()==S_NOMATCH) {
// matching did just fail
} else if(m->GetState()==S_ERROR) {
// error while matching
// in this case more info about the error can be found in the
// ErrMsg string the matcher provides
cerr << "ERROR: " << m->GetErrMsg() << endl);
}
} else {
// matching was successful
// use the mr->... to continue working
// optimize the result.
Optimizer o;
mr = o.optimize(mr);
// now use mr to go to the source
}
delete m;
delete vtQuery;
}
delete vtTempl;
}
}
The following is NOT a concise abstract about the implementation of the matcher, but rather just s bunch of points I think might be important for someone trying to understand the code.
The functions that do the actual matching.
In order to test my code I've written a test-program that read a template and a query, parses them, matches them, calls the optimizer and prints the results. This program is called main.cc, read the file TEMPLATE and the file QUERY. It can be used to test more templates and queries.
During the coding phase of the wrapper some deficiencies of the parser became apparent. I was assigned the task to clear up the parser and do the necessary changed. The parser was originally written by Yannis, modified by Amit Agarwal to incorporate the actions.
Unfortunately due to a misunderstanding between Amit and Vassilis the code Amit wrote did not solve the actions in a satisfactory manner. After looking at the code I decided the best way to go was to start anew from Yannis' code.
As the parser was originally written for the mediators it required every condition to have a site associated with it, e.g. A :- A:<book {<author $B>}>@s1 As the wrapper only runs at one site this is not necessary for the wrapper. I changed to parser so the site is only optional and not required any more. So A :- A:<book {<authro $B>} will parse without error.
Actions were completely recoded. Newly introduced were names for the rules and a way to specify desirability, which is used by the optimizer. The syntax for template is now:
template ::= {rule}* rule ::= [PreHead] Head :- Condition-List Actionlist. PreHead ::= RuleName:Desirability RuleName ::= Constant Desirability ::= Constant Head ::= as before Actionslist ::= // NoAssAction // | // AssAction { , AssAction }* // NoAssAction ::= ActionBody AssAction ::= $$ = ActionBody | Placeholder = ActionBody ActionBody ::= ActionTerm ActionTerm ::= Func ( ActionTerm ) | ActionTerm { ^ ActionTerm }* | Constant | Placeholder | $$ Constant ::= "{AlphanumericalChar}*" Placeholder ::= $UpercaseChar{AlphenumericalChar}* Condition-List ::= similar to before, but everywhere where Variables were allowed we also now allow Placeholders Func is a function symbol. Each function symbol expects a certain number of parameters. Currently the following function-symbols and associated functions are available: uppercase(X) : converts X into uppercase letters lowercase(X) : converts X into lowercase letters equal(X,Y) : if the caseinsensitive comparison of X and Y yields that they are the same, this function returns "TRUE", otherwise it returns "" (the empty string) if(X,Y,Z) : if X is not the empty string ("") then this function returns X otherwise it returns Z cr() : returns a "\n", i.e. a newline quote(X) : puts the string in quotes, i.e. returns "X"
The files trace.cc and trace.h contained C functions to print a parse tree into understandable form. These functions have been encapsulated into a "Tracer" class (files tracer.cc and tracer.h). Furthermore new functions have been written to enable printing of Placeholders and Actions.
The easiest way to free the memory occupied by the parse trees
is by calling the Free functions of the class Parser. They are
overloaded functions that recursively free the whole parsetree
starting at the point given. Typical code would look like this:
{
Parser parse;
struct rulelisttype *res;
res = parse.File("TEMPLATES");
...
parse.Free(res);
}
The pdp.h file has been changed in the following ways:
Furthermore the file pdp.h now contains comments describing all structures
and which fields are valid for which kind values (at least to the best of
my knowledge). More comments would not hurt, though.
The parser returns a list of rules that match. The optimizer has the job of determining which one will be the most efficient to execute. It currently uses the following heuristic: Let R be the set of all matching rules. It can be partitioned into two disjoint sets P and NP, where P contains all rules that have not enough predicates for the query, i.e. rules for which additional predicates have to be evaluated after getting the result from the source. NP contains all other matching rules. So R = P union NP, and P and NP are disjoint.
If NP it not empty then the optimizer picks as the best rule the
one with the highest desirability on NP, otherwise the one with
the highest desirability in P.
The interface to the optimizer is very simple:
const MatchResult *mr, *omr;
Optimizer o;
.. get mr from matcher...
omr = o.optimize(mr)
..
omr points to the best element in the list mr.
~breunig/tsimmis/include/pdp.h
=> ~tsimmis/tsimmis-1.0/src/common/pdp.h
~breunig/tsimmis/parser/*
=> ~tsimmis/tsimmis-1.0/src/wrappers-96/parser/*
~breunig/tsimmis/matcher/*
=> ~tsimmis/tsimmis-1.0/src/wrappers-96/matcher/*
~breunig/tsimmis/matcher/L/*
=> ~tsimmis/tsimmis-1.0/src/wrappers-96/matcher/L/*