CS99I HTML INFO

Abstract by Gio Wiederhold, 19 Jan 2000. Minor updates Mar 2001.

HTML briefly

We describe only a few basic commands of the HyperText Markup Language (HTML). The baseline version is HTML 2.0, but most browsers also handle later versions (HTML 3.2 and 4.0), which add features such as tables and fonts. In a browser you can inspect or save the source file of any page to learn how its formatting was done. Not all browsers handle all formats, and they certainly don't treat them the same way.

Conventions

HTML is an application conforming to ISO 8879 (Standard Generalized Markup Language, or SGML). SGML uses embedded directives to indicate formatting, while leaving the interpretation to the client's display program and its knowledge about the screen, paper, user preferences, etc. These directives are bracketed by Less-Than (<) and Greater-Than (>) symbols. In this document we use UPPER CASE for all HTML directives shown, although lower- and upper-case directives are equivalent.
Browsers generally ignore directives in these <brackets> that they don't recognize. To be able to show the directives in this HTML document, we internally use some special symbols (see Special characters below).
There are also special characters, which start with an ampersand (&).

General layout

Each document should start with a declaration
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">,
here indicating that the document conforms to HTML version 2.0, followed by
<HTML>.
Most commands have a corresponding closing command; for instance, there should be a
</HTML> at the end of the document.

 

A document is split into a HEAD and a BODY.

 The HEAD holds information about the page rather than its content, such as the TITLE, which the browser shows in its window frame, and the BASE, the page's own external address for the browser, e.g.,
<HEAD><TITLE>HTML information for CS99I book</TITLE>
<BASE HREF="http://www-db.stanford.edu/pub/gio/CS99I/html-info.html">
</HEAD>

The BODY follows, i.e.,
<BODY>, then everything in the document, up to the closing </BODY>.
Declarations of the form <! ... > are not displayed.
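To see the pieces in one place, here is a small Python sketch that assembles such a skeleton (Python is used only because HTML itself cannot be run). The function name skeleton is hypothetical; the DOCTYPE uses the RFC 1866 public identifier for HTML 2.0.

```python
import html

# RFC 1866 public identifier for HTML 2.0
DOCTYPE = '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">'


def skeleton(title, base_href, body):
    """Return a minimal page: DOCTYPE, HTML, HEAD with TITLE and BASE, then BODY.

    title and base_href are escaped; body is inserted as ready-made HTML.
    """
    return "\n".join([
        DOCTYPE,
        "<HTML>",
        "<HEAD><TITLE>" + html.escape(title) + "</TITLE>",
        '<BASE HREF="' + html.escape(base_href) + '">',
        "</HEAD>",
        "<BODY>",
        body,
        "</BODY>",
        "</HTML>",
    ])


page = skeleton("HTML information for CS99I book",
                "http://www-db.stanford.edu/pub/gio/CS99I/html-info.html",
                "<P>Hello.")
print(page)
```

The title is escaped so that a < in it cannot be mistaken for a directive; the body is passed through unchanged because it is already HTML.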

Headers and paragraph breaks

There are six levels of section headers:
<Hx>heading text</Hx> x = 1..6
We use <H1> for the chapter headings, <H2> for the major sections, and <H3> for subsections.

<P> starts a paragraph, to be terminated with </P>,
and
<BR> forces a line break (used liberally in this document).
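A minimal Python sketch of the heading and paragraph rules; heading and paragraph are hypothetical helper names, and the text is escaped with the standard html.escape so that a stray < is not read as a directive.

```python
import html


def heading(level, text):
    """Wrap text in <Hx>...</Hx>; only levels 1 to 6 exist."""
    if not 1 <= level <= 6:
        raise ValueError("heading level must be 1..6, got %d" % level)
    return "<H%d>%s</H%d>" % (level, html.escape(text), level)


def paragraph(lines):
    """One paragraph whose lines are separated by forced <BR> breaks."""
    return "<P>" + "<BR>\n".join(html.escape(s) for s in lines) + "</P>"


print(heading(2, "Headers and paragraph breaks"))
print(paragraph(["first line", "second line"]))
```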

Lists are of three types, written <yL> ... </yL>, where y is U, O, or D:
<UL> unnumbered; <OL> numbered; <DL> definition.
Each entry of a UL or OL list starts with <LI>; a DL entry pairs a term <DT> with its definition <DD>.
The list is terminated by </yL>.
List commands such as <OL> and </OL> are also (mis)used to indent text.
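The three list types can be generated as follows; this is a sketch, with make_list a hypothetical name. A definition list pairs a term <DT> with its definition <DD> rather than using <LI>.

```python
import html


def make_list(kind, items):
    """Build a UL or OL list from strings, or a DL list from (term, definition) pairs."""
    kind = kind.upper()
    if kind in ("UL", "OL"):
        entries = ["<LI>" + html.escape(item) for item in items]
    elif kind == "DL":
        entries = ["<DT>%s<DD>%s" % (html.escape(term), html.escape(text))
                   for term, text in items]
    else:
        raise ValueError("list kind must be UL, OL or DL")
    return "<%s>\n%s\n</%s>" % (kind, "\n".join(entries), kind)


print(make_list("ol", ["gif", "jpg"]))
print(make_list("DL", [("HTML", "HyperText Markup Language")]))
```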

Normally you want to leave as much formatting as possible to the browser, since it will adjust the text to the available page size and user preferences, but formatting can be disabled by bracketing text with
<PRE> preformatted as is </PRE>.
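Inside <PRE> the browser keeps spaces and line breaks, but <, > and & are still markup, so they have to be written as &lt;, &gt; and &amp;. A Python sketch using the standard html.escape (preformatted is a hypothetical helper):

```python
import html


def preformatted(text):
    """Wrap text in <PRE>...</PRE>, escaping only <, > and &.

    quote=False leaves quotation marks alone, since they are plain text here.
    """
    return "<PRE>" + html.escape(text, quote=False) + "</PRE>"


print(preformatted("if a < b:\n    swap(a, b)"))
```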

Cross References

The ability to go to other documents is the main innovation of HTML.
<A HREF="filename"> mousearea </A>, as in
<A HREF="http://db.stanford.edu/pub/gio/CS99I/intro.html">CS99I Introductory Chapter</A>
This also works to go to files that are in other formats, if your browser has the appropriate plugin, say Ghostscript for
<A HREF="http://db.stanford.edu/pub/gio/slides/atarpa.ps">ARPA postscript slides</A>.
One can also go into the middle of a document, if a name has been given to the entrypoint:
<A HREF="#SecSix">Section 6</A> --> <A NAME="SecSix">
(Note: The NAME=definition appears not to work inside of TABLEs)
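Since a reference like #SecSix silently goes nowhere when the NAME is missing, a small check helps. This sketch uses Python's standard html.parser.HTMLParser to collect HREF targets and NAME entry points, and lists in-page references without a matching NAME; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect HREF targets and NAME entry points from <A> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.names = set()

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases names, so <A HREF=...> arrives as "a", "href".
        if tag != "a":
            return
        for key, value in attrs:
            if key == "href" and value is not None:
                self.hrefs.append(value)
            elif key == "name" and value is not None:
                self.names.add(value)


def dangling_fragments(page):
    """Return in-page references (#name) that have no matching <A NAME=...>."""
    collector = AnchorCollector()
    collector.feed(page)
    collector.close()
    return [h for h in collector.hrefs
            if h.startswith("#") and h[1:] not in collector.names]


page = ('<A HREF="#SecSix">Section 6</A> <A HREF="#SecSeven">Section 7</A> '
        '<A HREF="intro.html">Intro</A> <A NAME="SecSix">')
print(dangling_fragments(page))
```

Links to other files, such as intro.html, are not checked; only references into the same page are.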

Images

There are many image formats, and two ways to show them. Images can be embedded, as
<IMG Align=top/middle SRC="imagefilename.format">
say <IMG SRC="../gifs/exclaim.gif">,
(I keep shared GIFs in a gifs directory under the parent directory, hence the ../)
or referenced as distinct documents, requiring a click: <A HREF="../gifs/exclaim.gif">show exclaim.gif</A>. Common formats, including some for video and audio, are
  1. .gif, the most common graphic image format used with HTML
  2. .tiff (Tagged Image File Format) is often available as well;
  3. .bmp for Bitmaps, a PC format
  4. .xbm for XBitmaps, a UNIX format
  5. .jpg or .jpeg a compressed format for images.
  6. .mpg or .mpeg a compressed format for video.
  7. .mp3 a compressed format for audio and music.

Which formats can be handled depends on the browser and its plugins.
One can also create clickable areas within an image.
In UNIX use xv to edit images.
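A sketch of the choice between embedding and referencing. Which extensions count as displayable inline is an assumption here (GIF and JPEG only); the real answer depends on the browser and its plugins. image_markup is a hypothetical helper.

```python
import html
import posixpath

# Assumed set of formats a browser shows inline; everything else gets a link.
INLINE_IMAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg"}


def image_markup(src, label):
    """Embed src with <IMG> if its extension is in INLINE_IMAGE_EXTENSIONS,
    otherwise link to it so that a click opens it as a separate document."""
    ext = posixpath.splitext(src)[1].lower()  # URLs use "/" on every platform
    src_attr = html.escape(src)
    if ext in INLINE_IMAGE_EXTENSIONS:
        return '<IMG ALIGN=MIDDLE SRC="%s" ALT="%s">' % (src_attr, html.escape(label))
    return '<A HREF="%s">%s</A>' % (src_attr, html.escape(label))


print(image_markup("../gifs/exclaim.gif", "exclaim"))
print(image_markup("clip.mpg", "show clip.mpg"))
```

The ALT text is what browsers show when images are turned off or cannot be displayed.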

email addresses

Use
<A HREF="mailto:gio@cs.stanford.edu">email to: gio@cs.stanford.edu</A>
to insert a mailing address. The text between the opening <A..> and the closing </A> is arbitrary.
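A Python sketch that builds such a link; mailto_link is a hypothetical name, and the optional subject parameter (appended as ?subject=..., percent-encoded with urllib.parse.quote) is an extension that many mail programs, but not necessarily all, honor.

```python
import html
from urllib.parse import quote


def mailto_link(address, text, subject=None):
    """Build <A HREF="mailto:...">text</A>; an optional subject is percent-encoded."""
    href = "mailto:" + address
    if subject is not None:
        href += "?subject=" + quote(subject)
    return '<A HREF="%s">%s</A>' % (html.escape(href), html.escape(text))


print(mailto_link("gio@cs.stanford.edu", "email to: gio@cs.stanford.edu"))
print(mailto_link("gio@cs.stanford.edu", "mail", subject="CS99I question"))
```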

Other useful formatting commands

<BLOCKQUOTE> for quotations</BLOCKQUOTE>
<ADDRESS> for addresses </ADDRESS>
<CENTER> text </CENTER>

Special characters

These symbols all start with &. Not all browsers interpret all of them, as you may notice in the parentheses below:
  1. &lt for <
  2. &gt for >
  3. &amp for &
  4. &quot for "
  5. &nbsp for a non-breaking space ( ), compare size to nothing ()
  6. &ndash for a short (n-sized) dash (–)
  7. &mdash for a long (m-sized) dash (—)
  8. &shy for a soft hyphen, shown only where the browser breaks a word at the end of a line (­)
  9. &auml for a-umlaut (ä), &ouml for o-umlaut (ö)
  10. &#169 copyright symbol (©)
  11. &trade or &#8482 trademark symbol (™)
and many others. A symbol can be terminated with a semicolon, which is not displayed.
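Python's standard html module performs both directions of this translation, which is handy for checking what a symbol stands for:

```python
import html

# Escape the characters that would otherwise be read as markup.
escaped = html.escape('Use <P> & "quotes"')
# Decode named and numeric symbols back into characters.
decoded = html.unescape("&lt;P&gt; &amp; &copy; &#169; &trade; &#8482;")

print(escaped)  # Use &lt;P&gt; &amp; &quot;quotes&quot;
print(decoded)  # <P> & © © ™ ™
```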
<NULL>, a tag that browsers do not recognize and therefore ignore, creates an invisible break, useful when combining special and ordinary characters, while &nbsp; creates a space at which the browser will not break the line.
A <HR> creates a horizontal rule, a line across the page.
<U> ... </U> underlines the enclosed text (maybe; not all browsers support it).

More characters are denoted numerically as &#nnn;, where nnn is the sum of the row and column numbers in the table below:

All 256 1 byte characters

Note that any characters your browser does not understand come out funny or as entered. I hope none crash your browser.

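  + |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
  0 |
 20 |                                         !  "  #  $  %  &  '
 40 |  (  )  *  +  ,  -  .  /  0  1  2  3  4  5  6  7  8  9  :  ;
 60 |  <  =  >  ?  @  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O
 80 |  P  Q  R  S  T  U  V  W  X  Y  Z  [  \  ]  ^  _  `  a  b  c
100 |  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w
120 |  x  y  z  {  |  }  ~     €     ‚  ƒ  „  …  †  ‡  ˆ  ‰  Š  ‹
140 |  Œ     Ž        ‘  ’  “  ”  •  –  —  ˜  ™  š  ›  œ     ž  Ÿ
160 |     ¡  ¢  £  ¤  ¥  ¦  §  ¨  ©  ª  «  ¬  ­  ®  ¯  °  ±  ²  ³
180 |  ´  µ  ¶  ·  ¸  ¹  º  »  ¼  ½  ¾  ¿  À  Á  Â  Ã  Ä  Å  Æ  Ç
200 |  È  É  Ê  Ë  Ì  Í  Î  Ï  Ð  Ñ  Ò  Ó  Ô  Õ  Ö  ×  Ø  Ù  Ú  Û
220 |  Ü  Ý  Þ  ß  à  á  â  ã  ä  å  æ  ç  è  é  ê  ë  ì  í  î  ï
240 |  ð  ñ  ò  ó  ô  õ  ö  ÷  ø  ù  ú  û  ü  ý  þ  ÿ

Blank cells are spaces, control characters, or unassigned codes; 160 is the non-breaking space and 173 the soft hyphen, both invisible.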
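The row-plus-column rule can be checked with a short Python sketch; table_position and numeric_reference are hypothetical names. Python's html.unescape, like current browsers, maps codes 128 to 159 to the Windows characters shown in rows 120 and 140 (so &#150; gives an en dash).

```python
import html


def table_position(code):
    """Split a character code into (row, column) of the table above; rows step by 20."""
    return (code // 20) * 20, code % 20


def numeric_reference(code):
    """Return the numeric character reference &#nnn; for a code."""
    return "&#%d;" % code


for code in (65, 169, 233, 150):
    row, col = table_position(code)
    ref = numeric_reference(code)
    print("%d + %d = %d  %s  %s" % (row, col, code, ref, html.unescape(ref)))
```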

Font Styles

Styles, relative sizes, and colors can be indicated, but your browser chooses the actual representation.
<FONT> takes options to change the size, say <FONT SIZE=+1> to increase it by 1, until </FONT>,
and/or to set the color, say <FONT COLOR=BLUE>, until </FONT>.

Logical styles

<EM> Emphasis, usually italics </EM>; we use these for words cited in the glossary.
<STRONG> Strong emphasis, usually bold </STRONG>
<CITE> book or journal citation, usually italics </CITE>
<KBD> typing font </KBD>; we use these for examples of type-ins.
<VAR> substitution example font </VAR>

Physical styles

<B> bold </B>
<I> italic </I>
<TT> typewriter </TT>
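Each style tag needs a matching closing tag with a slash. A Python sketch that enforces this for the tags listed above (styled is a hypothetical helper):

```python
import html

LOGICAL = {"EM", "STRONG", "CITE", "KBD", "VAR"}
PHYSICAL = {"B", "I", "TT"}


def styled(tag, text):
    """Wrap text in <TAG>...</TAG>; the closing tag carries the slash."""
    tag = tag.upper()
    if tag not in LOGICAL | PHYSICAL:
        raise ValueError("unknown style tag: " + tag)
    return "<%s>%s</%s>" % (tag, html.escape(text), tag)


print(styled("em", "glossary"))  # <EM>glossary</EM>
print(styled("KBD", "ls -l"))    # <KBD>ls -l</KBD>
```

Logical styles say what the text is and let the browser pick the look; physical styles fix the look directly.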

Tables

We just show a summary example.
<TABLE>, or with options such as <TABLE BORDER=3> or <TABLE CELLSPACING=2> (the standard spacing)
<CAPTION> one line only, centered, plain; the last line wins</CAPTION>
<TR><TH>a row of header items, centered by default <TH> more <TH> for as many columns as wanted
<TH WIDTH=pixels> or <TH WIDTH=percent%> sets the column width; CENTER is the default alignment.
<TR><TD>a row of data fields <TD> more data <TD> data fields are left-aligned by default
<TR> more rows; column widths are shared across rows and multi-line fields are handled automatically <TD> <TD>
<TR> more rows
Options for <TD> and <TH> include WIDTH, ALIGN, COLSPAN, and ROWSPAN.
</TABLE>

By default the alignment of tables is done automatically; an example without SPACING, WIDTH, ALIGN, or SPAN options is seen above in the table of characters.
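A Python sketch that emits a table in the shape of the summary above, with an optional BORDER and CAPTION; html_table is a hypothetical helper, and the two rows are only sample data.

```python
import html


def html_table(header, rows, border=None, caption=None):
    """Emit <TABLE>, an optional CAPTION, one <TR> of <TH> cells, then <TR>/<TD> rows.

    Closing </TH>, </TD> and </TR> tags are left out, as in the summary above.
    """
    attrs = "" if border is None else " BORDER=%d" % border
    lines = ["<TABLE%s>" % attrs]
    if caption is not None:
        lines.append("<CAPTION>%s</CAPTION>" % html.escape(caption))
    lines.append("<TR>" + "".join("<TH>" + html.escape(h) for h in header))
    for row in rows:
        lines.append("<TR>" + "".join("<TD>" + html.escape(str(c)) for c in row))
    lines.append("</TABLE>")
    return "\n".join(lines)


print(html_table(["Year", "Film"], [[1958, "Vertigo"], [1960, "Psycho"]],
                 border=3, caption="Two Hitchcock films"))
```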

Long Tables

Tables can be very long; for instance, the list of all 84 Hitchcock movies has been manually split into 4 distinct tables. Long tables take long to load and are hard to manage with scrollbars. We can use an option, DATAPAGESIZE, in the TABLE specification to split the presentation of a long table, as
<TABLE DATAPAGESIZE=8 ID=table>. To let the reader page through that table we add a button,
<INPUT TYPE="button" VALUE="Next" ONCLICK="table.nextPage();">,
whose click handler refers to the table by its ID. Now we can look at all of Hitchcock's films page by page, although they are stored as a single HTML table.

(The remainder of this section is not completed.)
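Both the manual split into 4 tables and paging with DATAPAGESIZE=8 come down to cutting the rows into fixed-size chunks. This Python sketch shows only that arithmetic, not how the browser implements paging; pages is a hypothetical name and the titles are placeholders.

```python
def pages(rows, page_size):
    """Split rows into consecutive chunks of at most page_size rows each."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return [rows[i:i + page_size] for i in range(0, len(rows), page_size)]


films = ["film %d" % n for n in range(1, 85)]  # 84 placeholder titles

print(len(pages(films, 21)))  # 4 tables of 21 rows
print(len(pages(films, 8)))   # 11 pages; the last holds the remaining 4
```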

Comments

There are two levels of comments:

  1. Comments intended for the systems that process HTML text. Those start with <!command, where command is to be understood by the processor, such as DOCTYPE in the header of an HTML file. They are terminated by a > character.
  2. Internal comments, intended for the persons maintaining an HTML file. Those start with <!-- and are terminated by -->.
Comments should not extend over more than one line.
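Maintainer comments can be stripped before publishing a page. This Python sketch uses a regular expression, a simplification that does not handle every corner of the comment syntax; strip_comments is a hypothetical name.

```python
import re

# Internal comments run from <!-- to the next -->. The non-greedy .*? stops at the
# first -->, and re.DOTALL lets a comment that spans lines be removed too.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(page):
    """Remove <!-- ... --> comments; <!DOCTYPE ...> and other <! declarations stay."""
    return COMMENT.sub("", page)


sample = '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<P>a<!-- to do -->b</P>'
print(strip_comments(sample))
```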

Counters

The counter we are using for the Web-book is installed on the server bergman.stanford.edu. An example would be:

Hits since 15 Jan 2000:

More information can be found at the counter's home page.

HTML Checkers

One possible HTML checker is the Web Site Garage (to be checked: the company was bought by Netscape in late 1998).
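One check such tools perform is matching each closing tag to its opening tag. This Python sketch, built on the standard html.parser.HTMLParser, reports closing tags that do not match and tags left open; the set of elements allowed without a closing tag is an assumption, not a complete list, and check is a hypothetical name.

```python
from html.parser import HTMLParser

# Elements whose closing tag may or must be left out (assumed list, not complete).
NO_CLOSE = {"br", "hr", "img", "input", "base", "p", "li", "dt", "dd",
            "tr", "th", "td"}


class TagBalanceChecker(HTMLParser):
    """Collect closing tags that do not match the innermost open tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case, e.g. <EM> as "em".
        if tag not in NO_CLOSE:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in NO_CLOSE:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append("unexpected </%s>" % tag.upper())


def check(page):
    """Return a list of problems found in page (empty if the tags balance)."""
    checker = TagBalanceChecker()
    checker.feed(page)
    checker.close()
    leftover = ["unclosed <%s>" % tag.upper() for tag in checker.open_tags]
    return checker.problems + leftover


print(check("<B>bold<B>"))  # ['unclosed <B>', 'unclosed <B>']
```

A missing slash, as in <B>bold<B>, shows up as two unclosed tags rather than as a closing tag.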

Notes

Current versions of Microsoft Word and PowerPoint can convert their documents to HTML, and vice versa. But since the capabilities of HTML browsers don't match those of Word, the result is often imperfect. Some subsequent manual editing can make it look much better.

See perhaps Chris Hector's "rtftohtml" to convert Word files to HTML [Cray Research Tech. Report, 1995, ftp://ftp.cray.com/src/wwwstuff/RTF/rtftohtml_overview.html].
See also the CS99I references.