CS99I Meeting 3 Notes: HTML

By Gio Wiederhold, Updated 20 Jan 2001.

Topics Covered briefly

HTML

Hyper (multi-linked) Text (documents) Markup (with format annotations) Language. HTML is used to mark up documents so they can be displayed on a variety of computer devices, and to reference (HREF) local and remote documents and images. Remote documents require a computer address (http://www.somewhere.xxx) so they can be found.
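
To show what referencing looks like in the markup itself, here is a minimal sketch using Python's standard html.parser module. It collects the HREF targets of <a> markups (documents) and the SRC targets of <img> markups (images). The sample page, its file names, and the LinkCollector class are hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects targets of <a href=...> (documents) and <img src=...> (images)."""

    def __init__(self):
        super().__init__()
        self.links = []   # documents referenced with HREF
        self.images = []  # images referenced with SRC

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lower-cases names,
        # so HREF and href are treated alike.
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "img" and "src" in attributes:
            self.images.append(attributes["src"])

# Hypothetical sample page: one local link, one remote link, one image.
page = """
<html><body>
<p>See the <a href="intro.html">local intro</a> and a
<a HREF="http://www.somewhere.xxx/doc.html">remote document</a>.</p>
<img src="logo.gif">
</body></html>
"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)   # ['intro.html', 'http://www.somewhere.xxx/doc.html']
print(collector.images)  # ['logo.gif']
```

The local link is just a file name; only the remote one carries a computer address.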

Document Formats

Paper: arbitrarily structured/unstructured; physical order.
Books: somewhat structured/unstructured; layout order; metadata: ToC, index.
Tables: very structured. Exceptions awkward (relegated to footnotes).
Databases: very structured. Machine processable, queryable. Exceptions awkward.

    relational: table-based, links by references, join operator; unordered. Example: student ⋈ course-info
    object-oriented: tree-based, structural (and optional reference) links; ordered (often)
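
The relational join operator can be sketched in a few lines of Python. This is a minimal hash-join sketch under stated assumptions, not how any particular database implements it; the table contents, the student_id column, and the join_on helper are all hypothetical.

```python
# Hypothetical relations: each is a list of rows (dicts). As in the relational
# model, the order of the rows carries no meaning.
students = [
    {"student_id": 1, "name": "Ada"},
    {"student_id": 2, "name": "Ben"},
]
course_info = [
    {"student_id": 1, "course": "CS99I"},
    {"student_id": 2, "course": "CS99I"},
    {"student_id": 1, "course": "CS145"},
]

def join_on(left, right, key):
    """Equi-join: pair each left row with every right row having the same key value."""
    # Index the right relation by key so each left row finds its matches directly.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    result = []
    for row in left:
        for match in index.get(row[key], []):
            result.append({**row, **match})  # merge the matching rows into one
    return result

for row in join_on(students, course_info, "student_id"):
    print(row)
```

Because the two relations share only student_id, this equi-join is also their natural join. An object-oriented store would instead nest each student's courses inside the student object, so the link is followed structurally rather than computed by a join.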

Components

Three older inventions combined:

  1. Document markup for typesetting: SGML [IBM, Air Force, about 1975]. Markups are metadata for presentation (see HTML intro).
  2. Hypertext linkages to create a hierarchical document [Nelson, about 1960]. Uses hyperlinks of the form http://computer/directory/file+/entrypoint$ (see Regular expression syntax).
  3. Simplified FTP, with an embedded site address (http://cs.stanford.edu/account/...) that avoids having to log in [Berners-Lee, CERN]; uses Internet-based addressing for remote documents.
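
A hyperlink bundles the computer address, the directory path and file, and optionally an entry point. A minimal sketch using Python's standard urllib.parse.urlsplit to take one apart; the URL is hypothetical. Note that in actual URLs the entry point is written after a '#' (the fragment), not as a final /entrypoint component.

```python
from urllib.parse import urlsplit

# Hypothetical URL: computer, directory, file, and an entry point (fragment).
url = "http://www.somewhere.xxx/directory/file.html#entrypoint"

parts = urlsplit(url)
print(parts.scheme)    # http                  -> protocol used to fetch the document
print(parts.netloc)    # www.somewhere.xxx     -> the computer (site address)
print(parts.path)      # /directory/file.html  -> directory and file on that computer
print(parts.fragment)  # entrypoint            -> place to jump to within the file
```
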
Two technologies:
  1. A means to access documents remotely: the Hypertext Transfer Protocol (HTTP), an FTP that includes links.
  2. A browser [Mosaic, by Andreessen and Bina at NCSA, University of Illinois]. A browser program interprets HTML, retrieves documents with HTTP, and integrates text, images, and remote references (hyperlinks).
and a business requisite
    A community of high-energy physicists who
  1. benefitted from rapid access to complex documents and
  2. had the computers on which the (free) browsers could be installed.
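
To fetch a document over HTTP, the browser connects to the computer named in the URL and sends a short text request. The sketch below only builds the text of a minimal HTTP/1.0 GET request (it does not send anything); http_get_request is a hypothetical helper, and real browsers send many more header lines.

```python
from urllib.parse import urlsplit

def http_get_request(url):
    """Build the text of a minimal HTTP/1.0 GET request for url (not sent)."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path means the site's root document
    if parts.query:
        path += "?" + parts.query
    # The fragment (#...) is never sent; the browser uses it locally.
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {parts.netloc}",  # site address (and port, if one was given)
        "",                       # a blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)

print(http_get_request("http://www.somewhere.xxx/directory/file.html#entrypoint"))
```

In Python, urllib.request.urlopen(url) performs the whole exchange, sending such a request and reading the reply.
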

Browser competition [Clark-Netscape] [Gates-Microsoft]

Learn by reading and doing

Reading: Load a simple HTML web document (like this one) into a browser, then view its source to see what the markup looks like.

If you look at a 'commercial' web page, you will find many markups that we won't have to care about. Make notes about the ones that puzzle you and discuss them in class. The essential ones are listed in our CS99I HTML notes.
Doing, indirectly: Create a document with, say, Microsoft Word, save it as HTML, and look at it.
Doing, directly: Create a document with HTML markups yourself, as shown in the notes, and save it as plain text. Change (rename) the file extension from .txt to .html, and then open it in a browser to see what you have created.
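
The "type markups, save as text, name it .html" steps can also be scripted. A minimal sketch in Python that writes the file directly with the .html extension (so no rename is needed); the page content, the file name first.html, and write_page are hypothetical.

```python
from pathlib import Path

# A hypothetical minimal page using a handful of basic markups.
PAGE = """<html>
<head><title>My first page</title></head>
<body>
<h1>Hello</h1>
<p>This page was marked up by hand. It links to
<a href="http://www.somewhere.xxx">a remote document</a>.</p>
</body>
</html>
"""

def write_page(filename="first.html"):
    """Save the markup as plain text under a .html name so a browser renders it."""
    path = Path(filename)
    path.write_text(PAGE, encoding="utf-8")
    return path

if __name__ == "__main__":
    # Open the printed path in a browser to see the result.
    print(write_page().resolve())
```
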

Some preliminary hints for future meetings

Role of HTML
Selling over the Internet
  1. fungible versus unique goods
  2. return policies and problems
Regular expression syntax

Important for formulating

  1. Representation grammars
  2. queries (getting some subset of the representation)
    sequence: (a,b,c)
    alternatives: (x|y), in combination (x|y, b,c) {x,b,c or y, b, c}
    optional: q$ {q | nothing}
    any: r* {nothing | r | rr | rrr | rrr... }
    repeats: s+ { s | ss | sss | sss... }
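
These constructs can be tried out with Python's standard re module. The notation differs slightly: the sequence (a,b,c) is written simply abc, and the optional marker written q$ here is q? in Python (and most other regex tools), where $ instead anchors the end of the text. The PATTERNS table and the generates helper are hypothetical.

```python
import re

# The constructs listed above, written in Python's re syntax.
PATTERNS = {
    "sequence":     r"abc",      # (a,b,c): a, then b, then c
    "alternatives": r"(x|y)bc",  # (x|y, b,c): xbc or ybc
    "optional":     r"q?",       # q$ in these notes: q or nothing
    "any":          r"r*",       # nothing, r, rr, rrr, ...
    "repeats":      r"s+",       # s, ss, sss, ...
}

def generates(name, text):
    """True if the named pattern matches the whole of text."""
    return re.fullmatch(PATTERNS[name], text) is not None

print(generates("sequence", "abc"))      # True
print(generates("alternatives", "ybc"))  # True
print(generates("optional", ""))         # True: q? also matches nothing
print(generates("any", "rrr"))           # True
print(generates("repeats", ""))          # False: s+ needs at least one s
```
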
Example:
(((S|s)ection|paragraph(s$) )*.)
matches all citations looking like
Section xx., section xx., paragraph xx., paragraphs xx.
By setting a marker for xx, that text can be retrieved for display or processing. The regular-expression notation is capable, but not really user-friendly.
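
One way to write the citation example in Python's re syntax is sketched below. It assumes xx is a single word or number; the pattern, the sample text, and the find_citations helper are hypothetical. The named group xx plays the role of the marker: it captures the cited text so it can be retrieved.

```python
import re

# Translation of the citation pattern into Python re syntax (an assumption):
#   [Ss]ection|paragraphs?  -> Section, section, paragraph, paragraphs
#   (?P<xx>\w+)             -> the "marker": a named group capturing xx
#   \.                      -> a literal period (an unescaped . matches any character)
#   \b                      -> word boundary, so "subsection" is not matched
CITATION = re.compile(r"\b(?P<kind>[Ss]ection|paragraphs?) (?P<xx>\w+)\.")

def find_citations(text):
    """Return (kind, xx) pairs for every citation found in text."""
    return [(m.group("kind"), m.group("xx")) for m in CITATION.finditer(text)]

sample = "See Section 4. and section 7. Compare paragraph 2. with paragraphs 9."
print(find_citations(sample))
# [('Section', '4'), ('section', '7'), ('paragraph', '2'), ('paragraphs', '9')]
```
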

Notes

See the brief intro to HTML.
See also the references.