CS99I Meeting 4 Notes: XML
By Gio Wiederhold, updated 28 Jan 2001.
Topics covered briefly
Last time: HTML, for document transmittal: varied presentation, hierarchically structured + links; ordered.
Tags provide metadata for presentation (see the HTML intro).
Problem: the presentation that is nice for people doesn't really define what is being represented. For business use, we want web pages that can be processed automatically.
To the rescue: XML, for document processing: hierarchically structured + links and more; ordered (except for attributes).
- tags with `semantic' names (<person> person stuff </person>)
- still hyperlinks: http://computer/directory/file+/entrypoint$ (see the regular expression syntax below)
- optional metadata for description (DTD) and/or presentation (XSL)
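To make the contrast with HTML concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. Because the tags name what they contain, a program can pull out each person's fields without guessing from layout. The `<people>`, `<name>` and `<phone>` tags, the sample data, and the `extract_people` helper are all hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain tags: <people>, <person>, <name>, <phone>.
DOC = """<people>
  <person id="p1">
    <name>Ann Smith</name>
    <phone>555-0101</phone>
  </person>
  <person id="p2">
    <name>Bob Jones</name>
  </person>
</people>"""

def extract_people(xml_text):
    """Return one (id, name, phone) tuple per <person>; phone is None if absent."""
    root = ET.fromstring(xml_text)
    rows = []
    for person in root.iter("person"):
        rows.append((
            person.get("id"),          # attribute, looked up by name (attributes are unordered)
            person.findtext("name"),   # text of the first <name> child
            person.findtext("phone"),  # None when there is no <phone> child
        ))
    return rows

if __name__ == "__main__":
    for row in extract_people(DOC):
        print(row)
```

The same program works on any document using these tags, however it is later presented, which is the point of separating content (XML) from presentation (XSL).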
Read more in the XML intro.
Whereas HTML tags are common to all HTML documents, XML tags are domain-dependent. Domains might be:
- Petroleum products trading
- Shakespeare plays
- Household stuff (from manufacturers to stores)
- Office supplies
- . . . suggest more . . .
For each domain, the allowable tags and the structure in which they appear have to be defined. That is done in a Document Type Definition (DTD).
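Part of a DTD's job can be approximated in a few lines of Python. The sketch below assumes a hypothetical office-supplies vocabulary (`order`, `customer`, `item`, `sku`, `quantity` are invented for illustration) and records, for each tag, which tags may appear directly inside it. It is much weaker than a real DTD, which also fixes order and repetition; note too that Python's standard `xml.etree.ElementTree` parser does not validate against DTDs, so this is only a toy.

```python
import xml.etree.ElementTree as ET

# Hypothetical office-supplies vocabulary (invented for illustration):
# for each allowed tag, the set of tags that may appear directly inside it.
# A toy stand-in for a DTD: it ignores order and repetition.
ALLOWED_CHILDREN = {
    "order": {"customer", "item"},
    "customer": set(),
    "item": {"sku", "quantity"},
    "sku": set(),
    "quantity": set(),
}

def check_children(elem, allowed=ALLOWED_CHILDREN):
    """Recursively report child tags that the vocabulary does not allow."""
    problems = []
    for child in elem:
        if child.tag not in allowed.get(elem.tag, set()):
            problems.append(f"<{child.tag}> not allowed inside <{elem.tag}>")
        else:
            problems.extend(check_children(child, allowed))
    return problems

def validate(xml_text, root_tag="order"):
    """Parse the document and return a list of structural problems (empty = ok)."""
    root = ET.fromstring(xml_text)
    if root.tag != root_tag:
        return [f"root must be <{root_tag}>, found <{root.tag}>"]
    return check_children(root)

if __name__ == "__main__":
    good = ("<order><customer>ACME</customer>"
            "<item><sku>PEN-01</sku><quantity>12</quantity></item></order>")
    bad = "<order><item><colour>blue</colour></item></order>"
    print(validate(good))  # []
    print(validate(bad))   # ['<colour> not allowed inside <item>']
```

A different domain (Shakespeare plays, petroleum trading) would simply swap in a different `ALLOWED_CHILDREN` table, just as it would use a different DTD.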
To indicate whether elements are optional or can be repeated, they are labeled with characters used in regular expressions.
Regular expressions are important for formulating
- Representation grammars
- Queries (getting some subset of the representation)
sequence: (a,b,c)
alternatives: (x|y); in combination ((x|y), b, c) {x, b, c or y, b, c}
optional: q$ {q | nothing} (written q? in DTDs and standard regular-expression syntax)
zero or more: r* {nothing | r | rr | rrr | ...}
one or more: s+ {s | ss | sss | ...}
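To see the operators at work, here is a small Python sketch that turns a DTD-style content model into a pattern for Python's `re` module and checks whether a list of child tags fits it. Like DTDs, Python writes "optional" as `?`; the helper also accepts the `$` used above. The names `content_model_to_regex` and `fits` are hypothetical, and real DTD content models have features (`#PCDATA`, mixed content, `EMPTY`, `ANY`) that this sketch does not handle.

```python
import re

def content_model_to_regex(model):
    """Translate a content model such as "(name, phone?, email*)" into a
    regular expression over encoded child-tag sequences.

    Sketch only: handles ',', '|', '?', '*', '+' and parentheses.
    The notes' '$' for "optional" is mapped to '?'.
    """
    model = model.replace("$", "?")
    # Wrap each element name as "(?:name;)" so that every child in the
    # encoded sequence is exactly one "name;" token.
    model = re.sub(r"[A-Za-z_][\w.-]*",
                   lambda m: "(?:" + re.escape(m.group(0)) + ";)",
                   model)
    # ',' (sequence) becomes plain concatenation; whitespace is insignificant.
    return re.sub(r"[,\s]", "", model)

def fits(model, child_tags):
    """True if the list of child tag names is allowed by the content model."""
    encoded = "".join(tag + ";" for tag in child_tags)
    return re.fullmatch(content_model_to_regex(model), encoded) is not None

if __name__ == "__main__":
    print(fits("((x|y), b, c)", ["y", "b", "c"]))                  # True
    print(fits("(name, phone$, email*)", ["name", "email", "email"]))  # True
    print(fits("(s+)", []))                                         # False
```

The inner parentheses in `((x|y), b, c)` matter: DTD syntax does not allow `|` and `,` to be mixed within one group, and without grouping the `|` would split the whole model into two alternatives.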
Example:
((S|s)ection|paragraph(s$)) xx.
matches all citations looking like Section xx., section xx., paragraph xx., or paragraphs xx.
By setting a marker for xx, that text can be retrieved for display or processing.
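The citation example can be tried with Python's `re` module, which writes the optional `s` as `s?`. In this sketch a capture group plays the role of the marker for xx. It assumes xx is a run of letters or digits (such as `12` or `4b`), so the final `.` is the sentence period and is escaped; a dotted number like `3.2` would be cut short. The `CITATION` pattern and `find_citations` helper are hypothetical names.

```python
import re

# Standard-syntax version of the citation pattern; "\b" keeps it from
# matching inside words such as "subsection".
CITATION = re.compile(r"\b((?:S|s)ection|paragraphs?) (\w+)\.")

def find_citations(text):
    """Return (keyword, xx) pairs; the second capture group is the xx marker."""
    return CITATION.findall(text)

if __name__ == "__main__":
    sample = "See Section 12. and paragraphs 4b. but not subsection 3."
    print(find_citations(sample))  # [('Section', '12'), ('paragraphs', '4b')]
```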
A regular-expression language is powerful, but not really user-friendly.
- Would such a query language help your browsing?
- Would such a language help in screen-scraping?
Sales taxes and the Internet
What is the problem?
What is a solution?
References
Brief intro to XML.
Brief intro to RDS ADO [ASP, 25 Feb 2000].
XSL information
See also the references.