CS99I Meeting 4 Notes: XML

By Gio Wiederhold, Updated 28 Jan 2001.

Topics Covered briefly

Last time: HTML: for document transmittal, varied presentation, hierarchically structured + links; ordered
Tags provide metadata for presentation (HTML intro). Problem: the nice-for-people presentation doesn't really define what is being represented. For business use we want web pages that can be processed automatically.

To the rescue: XML: for document processing, hierarchically structured + links, more; ordered (except for attributes)

Read more in XML intro.

Whereas the HTML tags are common to all HTML documents, the XML tags are domain dependent. Domains might be:

For each domain the allowable tags, and the structure in which they appear, have to be defined. That is done in a Document Type Definition (DTD). To indicate whether elements are optional or can be repeated, they are labeled with characters used in regular expressions.
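As an illustration of how a DTD marks elements as repeatable or optional, here is a minimal sketch in Python. The "order" domain and all of its tag names are hypothetical and do not come from these notes. In DTD syntax `+` marks an element that appears one or more times and `?` marks an optional one. Python's standard ElementTree parser reads the document but does not validate it against the DTD, so the sketch only shows the notation and the automatic processing that domain-specific tags allow.

```python
import xml.etree.ElementTree as ET

# Hypothetical "order" domain; the tag names are illustrative only.
# The DTD says: an order holds one customer, one or more items (item+),
# and an optional note (note?).
DOCUMENT = """<!DOCTYPE order [
  <!ELEMENT order (customer, item+, note?)>
  <!ELEMENT customer (#PCDATA)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item quantity CDATA #REQUIRED>
  <!ELEMENT note (#PCDATA)>
]>
<order>
  <customer>Ann Smith</customer>
  <item quantity="2">pencil</item>
  <item quantity="1">notebook</item>
</order>
"""

# ElementTree parses the document but does NOT check it against the DTD.
root = ET.fromstring(DOCUMENT)

# Because the tags name what the data means, a program can pick values out.
customer = root.findtext("customer")
items = [(item.text, int(item.get("quantity"))) for item in root.findall("item")]
note = root.find("note")  # None here, since the optional note is absent

print(customer, items, note)
```

A validating parser would be needed to reject, for example, an order with no item at all.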
Regular expression syntax

Important for formulating

  1. representation grammars
  2. queries (getting some subset of the representation)

    sequence: (a,b,c)
    alternatives: (x|y); in combination ((x|y),b,c) {x,b,c or y,b,c}
    optional: q$ {q | nothing} (written q? in standard regular-expression and DTD syntax)
    any: r* {nothing | r | rr | rrr | ... }
    repeats: s+ {s | ss | sss | ... }
Example:
(((S|s)ection|paragraph(s$) )*.)
matches all citations looking like
Section xx., section xx., paragraph xx., paragraphs xx.
By setting a marker for xx, that text can be retrieved for display or processing. Regular-expression notation is powerful, but not really user-friendly.
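The citation example can be sketched with Python's `re` module. This is one possible rendering under some assumptions, not the notation used in the notes: Python writes the optional marker as `?` where the notes write `$`, the xx part is assumed to be a single run of letters or digits, and the named group `ref` plays the role of the marker for xx. The sample sentence is made up for the illustration.

```python
import re

# (S|s)ection or paragraph with an optional "s", then a space,
# then the cited reference (captured as "ref"), then a literal period.
CITATION = re.compile(r"(?:(?:S|s)ection|paragraphs?) (?P<ref>\w+)\.")

# Made-up sample text for illustration.
text = "See Section 12. and paragraphs 4. but also section 7a."

# Collect the marked part (xx) of every citation found in the text.
refs = [match.group("ref") for match in CITATION.finditer(text)]
print(refs)
```

The pattern is compact, but it is not easy to read, which is the usability problem noted above.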

Sales taxes and the Internet

What is the problem?

What is a solution?

References

Brief intro to XML.
Brief intro to RDS ADO [ASP, 25 Feb 2000].
XSL information
See also the references.