CS99I Freshman Seminar

Winter 1997/1998.

Traveling the Information Highways: World-Wide Browsing

Maps, Encounters, and Directions

Master copy on Earth.
Draft 14Dec1993, rev. 10Mar94, updates 25May94, 15Jan98, 28 May 1998.
This material is

©Gio Wiederhold and CS99I students, Stanford University, 1998.

Chapter: World-Wide Browsing

Cartoon: "The Guy Who Took a Wrong Turn Off the Electronic Superhighway and Wound up in A Microwave Oven in Davenport Iowa": Defrost Wingettes, 4:15. [New Yorker]
Previous chapter: Ubiquitous Computing - Next chapter: Entertainment and Education

BROWSING.Intro

When we get to a town we haven't visited before, we might immediately get down to business and search out the factories, stores, libraries, or government offices we need. Alternatively, we can take some walks and get comfortable with the environment. When wandering through the neighborhoods we might do some window shopping, browse in the shopping malls, meet people in restaurants, and enjoy playgrounds and parks. This chapter focuses on this second alternative, while later chapters are more directed.

To get into remote towns we get on the highway with some idea of the destination, and turn onto an *off-ramp when we get near to where we want to be. If we live in a large city, we may not need the highway, because many services will be local. We can meet friends, known ones or new ones, by looking at *bulletin boards (BBs). We can look at local guides and directories, advertisements, and newspapers to learn what is going on.

In the meeting places one can meet potential friends, but also crooks. While one will not be physically hurt when interacting solely on the network, one should be careful when giving out addresses, keys, credit card numbers, and the like. In Chap\F we will present the possibilities of making contacts for electronic commerce, and the need for security. Today's electronic highways are still in the Middle Ages, and one can encounter knights, bandits, Robin Hoods, good Samaritans, peasants, and artisans trudging warily to markets, and many individuals just seeking adventure in other countries.

Off-ramps from the highways also provide access to fantasy worlds, with castles, labyrinths, etc., populated by mythical beasts to slay or befriend. A visitor can assume a playful role in this fantasy world: a meek person can be a fearless hero, and anyone can issue sage advice. Playing games with others in such fantasy worlds is enabled by accessing *Multi-User Dungeons (MUDs). Players can deal with mythical beasts or other players, whom they have never met as real persons. Players preserve anonymity by giving themselves imaginative names (*handles?) such as "Moonshadow" [ref Washington Post Legislate #1193184, 28Nov93].

At other off-ramps one finds opportunities to meet people, to shop, and to obtain a wide variety of information. The table in this chapter lists some of the resources, but the scene varies so rapidly that it is best explored on-line, although many guides are published [Braun:94, Dern:94, Krol:92]. Methods for effective browsing on the Internet are presented in [Grønbaek:94]. Even with guidance one can easily get lost, since the signage on the `Internet look[s] like the New Jersey Turnpike outside Newark' [Acronym:94].

BROWSING.History

Using the computer for browsing is a relatively new activity. Until the Internet was well established there were few places one could get to, and even fewer that could be accessed freely. There were libraries, but access to them was often limited to qualified experts. There were some directories, but those were intended for scientists, for instance to locate data gathered by NASA explorations [ref ].

But as travel became affordable, people started hawking their wares along the roads. And since goods are not consumed, only copied, along the highways, many people set up stalls showing and sharing their wares, without expecting any reimbursement other than some `thank you's, some recognition, and the hope to be able to change the world a bit. Many government institutions started making their data available, often with similar motivation.

BROWSING.History.MEDLINE

Some institutions have delivered information to remote users for a long time, for instance the National Library of Medicine (NLM) with its Medline service. The papers that are made available are carefully selected and indexed. Such library operations will be presented in Chap\L. The number of such *value-added services is increasing, but the services discussed in this chapter focus on broad and free access, with little guarantee that the content is accurate, complete, and unbiased. The reader must judge the value of what has been stored and retrieved. Knowing the source can help; for instance, one would not expect !example of an obviously biased BB.

BROWSING.History.access

There are many documents that people want to make available at a much more informal level. By providing *anonymous *ftp access to colleagues on the *Internet, the formal library system can be bypassed, avoiding both delays and scientific scrutiny. Accessing more general information remotely was awkward using the basic Internet File Transfer Protocol (FTP). A succession of services developed, which culminated in the browsers you are using to view this book.

ARCHIE

Since FTP sites are widely dispersed, and may use somewhat different access conventions, a tool to broaden access is helpful. An early popular tool was `Archie'. An ftp-site can become an Archie-server by submitting its TCP/IP address to [[xx]]. Software at this site makes an index of ftp-accessible documents and programs available to Archie clients. A searcher for a document now has a much wider choice, and when likely documents are identified, can execute the proper ftp protocol to obtain them.
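The idea of a central index over many ftp-sites can be illustrated with a small sketch. This is not Archie's actual software or data format; the host names, the file listings, and the functions `build_index` and `search` are all hypothetical and only show the principle of indexing file names once and searching the index instead of visiting each site.

```python
from collections import defaultdict


def build_index(site_listings):
    """Map each lowercase file name to the (host, path) pairs where it is stored."""
    index = defaultdict(list)
    for host, paths in site_listings.items():
        for path in paths:
            name = path.rsplit("/", 1)[-1].lower()
            index[name].append((host, path))
    return index


def search(index, pattern):
    """Return sorted (host, path) pairs whose file name contains the pattern."""
    pattern = pattern.lower()
    hits = []
    for name, locations in index.items():
        if pattern in name:
            hits.extend(locations)
    return sorted(hits)


# Hypothetical listings, standing in for what registered sites might report.
SITE_LISTINGS = {
    "ftp.example.edu": ["/pub/docs/browsing-guide.txt", "/pub/tools/unzip.tar"],
    "ftp.example.org": ["/mirror/Browsing-Guide.txt", "/mirror/maps/world.gif"],
}

INDEX = build_index(SITE_LISTINGS)
for host, path in search(INDEX, "guide"):
    # Each hit tells the searcher which site to contact with ordinary ftp.
    print(f"ftp://{host}{path}")
```

The searcher still fetches the document with ordinary ftp; the index only answers the question of where to look.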

GOPHER

Univ. of Minnesota

VERONICA

Veronica [Univ. of Nevada]: Gopher index server, updated monthly, replicated

PROSPERO

WAIS

In 1989 Thinking Machines Corporation, and in particular [Brewster Kahle], were investigating broader uses of their *Connection Machine (CM), a powerful parallel computer suitable for rapid scanning of large bodies of text. The CM computers had been effective in intelligence agencies, but that market is limited. Thinking Machines made their software, *WAIS (Wide Area Information Server), and CMs at their home site freely available and so enabled many groups with data resources to experiment with provision of free data access over networks. Data can be accessed by anyone with a minimal terminal or PC; all the search effort is done in the CM. The search conventions and data formats established for WAIS led to a standard: *Z39.50. Today some of those experimenters have installed their own equipment, and can make data available themselves, following the same standards. Today a separate company, WAIS, Inc., provides WAIS.

but inadequate means of making them available,

WAIS needs 1. incremental delivery, 2. measures of relevance (now 1/0), 3. then ranking
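The three needs in the note above can be sketched together. This is an illustration only, not how WAIS works: the scoring rule (fraction of query words found in a document), the sample documents, and the names `relevance` and `ranked` are assumptions chosen to show a graded score in place of a 1/0 match, results ordered by that score, and delivery one result at a time.

```python
def relevance(query, text):
    """Graded score: the fraction of query words found in the text (0.0 to 1.0)."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms)


def ranked(query, documents):
    """Yield (score, title) pairs one at a time, best match first."""
    scored = [(relevance(query, text), title) for title, text in documents.items()]
    for score, title in sorted(scored, key=lambda pair: (-pair[0], pair[1])):
        if score > 0:
            yield score, title


# Hypothetical document collection.
DOCUMENTS = {
    "aurora-status": "aurora borealis status report",
    "weather": "weather report for aurora watchers",
    "music": "music videos",
}

results = ranked("aurora borealis report", DOCUMENTS)
best = next(results)        # the caller may stop after the first result
print(best)
for score, title in results:  # or keep asking for more
    print(score, title)
```

Because `ranked` is a generator, the client decides how many results to pull, which is one simple reading of incremental delivery.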

HTTP

Tim [Berners.Lee:] at CERN, 1989: hypertext for high-energy physicists

Client-server model. The objective was simply to reduce lead time for physics preprints.

MOSAIC

Mosaic is a browsing tool provided by the supercomputer center (NCSA) in Champaign, Illinois, supported through NASA. Browses through the World-Wide Web. [Berners.Lee:] at CERN

Mouse driven. Image access. Sound. Hundreds of DBs configured with Mosaic.

NASA weather, Clinton's speeches @ Univ. of Missouri, music videos @ MTV, Library of Congress catalogue, UC Berkeley paleontology

Novell for its documents. Next on-line magazines, supported by advertising [O'Reilly and Associates, Sebastopol CA]

need direct Internet hookup

Mitch Kapor, ex Lotus, head of the Electronic Frontier Foundation.

Today an enormous volume of information is available.

BROWSING.Functions

Browsing is an informal, unaided search through information sources.

It distinguishes itself from formal *querying (Chap.\L\F\Query?) by serving casual visitors wandering along the information highway.

BROWSING.Functions.starting

When browsing, the searcher has no specific idea of what exists in the information bases, and little idea of where relevant information might be. Initial steps try to identify candidate resources, using the equivalent of the *yellow pages to find candidate suppliers. In subsequent steps the material on the shelves of the candidate suppliers is scanned to look for interesting stuff. If there are many shelves, you may try to find the most likely shelves by their label, or you may consult local inventory lists.

Since the casual browser will not know the precise designation of what is wanted, assistance is needed. A number of methods can be employed by a helpful assistant.

\item{1} A menu can be provided. Since there is too much stuff to fit on one menu page, the menu will be hierarchically organized. Figure\friendly showed the top entry of such a menu. At each level of a hierarchy a choice among $7\pm2$ categories seems optimal for human perception [TMN<<>>]. Creating natural icons for all entries can be difficult. When there is no natural hierarchy which can serve as a layer, then initial letters or digits can be used, but user-friendliness is soon lost.

\item{2} Multiple menus are often needed. A single hierarchy imposes one organization principle. Even if it can be shown that one taxonomy is best, say arranging automobiles by brand, type, and serial number, some searchers will prefer to search for the same cars by color, size, and age, and yet others by state, town, and license numbers. Zooming in a map may be the best way to pinpoint a location. Sometimes one may want to use characteristics from multiple hierarchies. There might have to be a menu of menus.

\item{3} Generalizing from examples. A browser may want to bring an example, perhaps by reference, and look for similar items [CBR]. One may want a car that is similar to one owned earlier, or find suspects that match a sketch or a video clip. To locate a piece of music one may hum a few bars, and to locate a house one may sketch its outline.

\enditem In all cases browsing is characterized by successive refinement and interaction. In Chap.\U\H we reviewed which types of computers are effective in supporting such interaction.
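The trade-off in item 1 between the number of choices per menu and the depth of the hierarchy can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `menu_levels` and the figure of one million items are assumptions, not taken from the text. It computes how many menu levels a visitor must descend when every menu offers the same number of choices.

```python
def menu_levels(num_items, fan_out=7):
    """Smallest depth d such that fan_out ** d >= num_items."""
    if num_items < 1 or fan_out < 2:
        raise ValueError("need at least one item and at least two choices per menu")
    levels, capacity = 0, 1
    while capacity < num_items:
        capacity *= fan_out
        levels += 1
    return levels


# Depth needed to reach one of a million items with 5, 7, or 9 choices per menu.
for fan_out in (5, 7, 9):
    print(fan_out, "choices per menu:", menu_levels(1_000_000, fan_out), "levels")
```

Under these assumptions, widening each menu from 5 to 9 entries saves only two levels out of nine, which suggests why additional menus and search by example become attractive for large collections.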

BROWSING.functions

\NEWS

BROWSING.functions

\GAMES

BROWSING.functions

\ENCYCLOPEDIAS

BROWSING.functions

\LINKING

Problem: WWWeb requires modification of source documents; unacceptable, confusion. [Engelbart]

BROWSING.functions.annotation

(!or in Library)

BROWSING.functions.flexible-media

The systems that are becoming available for browsing use an increasing variety of media. While simple text still dominates, there are drawings, pictures, video clips, film, sounds, and voice. The only sensory outputs missing along the digital highways are smells and bumps. With the variety of information media comes a variety of presentation and input *modes. For graphics we need to display or enter lines and shadings for areas. For pictures we need TV-like displays and digital input of photographs, x-rays, etc. For video and film we need sequences of images, presented with precision, so that motion remains smooth. Sounds are represented by digitized waveforms, and spoken words must be played back precisely to be clear to the listener.

Technology is making rapid progress in managing data in all these media. The cost of transmission is high for some of them, as discussed in Chap\U\T\? . However, not everyone along the digital highways has the same capability to receive or enter information in all those media, and there are situations where some media are not appropriate. For example, receiving driving directions in image form while driving is dangerous. In a noisy environment speech may not be audible and receiving data in voice form will create a distraction in a library or classroom.

Applies also to noisy environments, or to situations such as within a library, where bleeps disturb the desired silence, or to people working underwater.

BROWSING.functions.disabled

An important category of requirements for media conversion is to provide fair access to *disabled persons. Participation in activities along the digital highways can make a crucial difference here. Access to the information highways should empower, rather than hinder, disabled people. Today the U.S. alone is spending $200 billion per year on services to disabled and elderly persons. Bringing the digital highways into their homes is the first step. Assuring access for *visually, speech, or motion impaired individuals is the next step.

BROWSING.Technology

Technology

BROWSING.Technology.services

BROWSING.Technology.engines

Mosaic
Netscape
Microsoft Internet Explorer

BROWSING.Technology.CATALOGUING

Altavista

BROWSING.Technology.classifying

Yahoo
Linkages to electronic Commerce
Shopping, recruitment: $5B industry in advertising, including classified ads, real estate. Linguistic feature extraction / JDBC / ODBC

BROWSING.Technology.rating

Alexa

BROWSING.Technology.mining

Alexa

BROWSING.Technology.knowbots

KNOWBOTS

Cookies

When a program comes into your home on the Information Highway, it can leave a memento behind.

BROWSING.Technology.repositories

REPOSITORIES

BROWSING.Technology

\KNOWBOTS

BROWSING.Technology

\STANDARDS Z39.50

BROWSING.Technology

\HYPERTEXT A hypertext is an active text, where the reader can * touch any term in the document and move to a section in the document where more information on that topic is provided. In practice only certain terms are touchable in hypertext systems, typically indicated by being displayed in bold-face. Invisible to the user are embedded cross-references, which indicate the position in the document of the referenced term.

Initially hypertext linkages may be created by the author. Authors who structure their writing in a top-down fashion, from layout to chapters to sections etc. create a natural linkage hierarchy, which is easily captured by these hyperlinks. Such an author may also be aware when the hierarchy breaks down and references across the hierarchical tree are needed.

Tools to convert existing texts to hypertext are available. They will seek out all terms and cross-link them, omitting words that are common (* stopwords, as considered when indexing in Chapter\L\T\INDEXING) or appear in every section. Terms that appear only once cannot be linked, of course. Some human assistance is typically required. It makes no sense to cross-link all citations; only links that provide useful explanatory material need to become hyperlinks. Just as in indexing, a problem for automatic creation of hypertext is that concepts are expressed by multiple terms, and the terms themselves are spelled inconsistently, so that it is easy to miss useful linkages. Having a good * thesaurus will generate many more links, but the result will require yet more editing to remove irrelevant linkages.
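The filtering rules just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any particular conversion tool: the stopword list, the sample sections, and the function `candidate_links` are hypothetical, and "appears only once" is read here as "appears in only one section". Thesaurus handling and spelling variants are left out.

```python
import re

# A tiny, hypothetical stopword list; real indexing tools use much longer ones.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def candidate_links(sections, stopwords=STOPWORDS):
    """Map each linkable term to the sorted titles of the sections that use it."""
    occurrences = {}
    for title, text in sections.items():
        for term in set(re.findall(r"[a-z]+", text.lower())):
            if term not in stopwords:
                occurrences.setdefault(term, set()).add(title)
    # Keep terms found in more than one section but not in every section.
    return {
        term: sorted(titles)
        for term, titles in occurrences.items()
        if 1 < len(titles) < len(sections)
    }


# Hypothetical document with three short sections.
SECTIONS = {
    "Archie": "Archie builds an index of documents on FTP sites",
    "Gopher": "Gopher menus lead to documents",
    "Veronica": "Veronica is an index of Gopher menus with documents",
}

for term, titles in sorted(candidate_links(SECTIONS).items()):
    print(term, "->", titles)
```

In this sample "documents" is dropped because it occurs in every section, and the surviving candidates would still go to a human editor for pruning.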

Ongoing interaction with users is one way to maintain hypertext documents. A responsive maintainer is then needed throughout a document's useful lifetime. Such a maintainer should be reimbursed, and that requires charging mechanisms, which have been anathema to the community developing browsing.

Linkages among documents add further value, but should probably be limited to major topics, so that the browsing user is not induced to open an excessive number of marginal documents. The technology for inter-document and inter-node browsing is also more complex, since the references will be much more indirect. A remote document is also subject to editing, requiring updating of cross-references to it. Since up-to-date documents are more valuable than static ones, their maintenance is of great value.

Currently, remote hyperlinks are rarely available. No standards exist for links and link interpretation. If standards were available, remote access would be readily enabled, since most suppliers of hypertext documents are committed to * open systems. Having an open system does not imply that a hypertext service needs to be free, so that the ability to handle * e-money remains an issue.

BROWSING.Technology

\MOSAIC

HTML MIME standards (no synchronized video)

Once the material has been received from a Mosaic server, display software is needed to present it on your computer. Such software is available for most computers (Windows, Macintosh, Unix computers), but you have to bring that software to your computer.

Work is going on to make it extensible, and to work with SGML suppliers.

Shared Mosaic (NCSA Collage - whiteboard)

Script language to create a hypermedia tour.

Secure Mosaic.

Authoring tool for Mosaic

Storyboard (EITsech, available on PCs also, without standards): emailable animation

Success of Mosaic is due to the good viewers on X-Windows, Mac, MS Windows


BROWSING.Technology.HYPERG
[H.Maurer, IICM, Graz, Austria] still compatible with Gopher

Adds 3-D (using Silicon Graphics 3D icons), also real-world 3D models (digitizing the baroque library building in Vienna)

Anchors in arbitrary datatypes

Computer-navigable links

Annotations of different types

Collections and guided tours overlaid on the WWWeb net, defined by public or private supplier

Avoids physical copies

Attributes to constrain search, with intersection capability

Spin-offs: HyperM presentation system

HM-card personal hypermedia system

PClibrary electronic library = collection of books with Langenscheidt, Brockhaus, and Springer. Initially mainly dictionaries, encyclopedias, Duden; now handbook of Maschinenbau, ENT, ... medical texts

Select books for one's desktop, then search for hyperterms; then personal linkages can be made

Journal of Universal Computer Science (JUCS), annually in paper by Springer. [C.Calude, H.Maurer, A.Salomaa] Submission by email, refereeing via Hyper-G.

Publication via Hyper-G at multiple server sites (at many Univ. for fast local access, 50 committees), free 1995-1996, after 1997 $100/year per line per University

Net access needed for detail in figures (local default `postage-stamp figures'; for printing acquire better quality PS copies).

CDROM, paper

Quality citable

150 leading scientist editors. { .. boman SRI,Stanford, ... , schllgeter Nievergelt (Zurich)

production

Protection: trust universities, large companies; individual control has been lost. [Maurer]

Keep things affordable to discourage copying.

cross check with manual reference to see if you have the manual handy ask for a color code

BROWSING.Alternatives

MCC EInet MCC for corporate, industry networks with commercial txs

Security: firewalls, Kerberos

for Galaxy directory services see http://galaxy.einet.net

Active learning [H.Maurer]: record voice and image of prof, presentation, etc. digitally, with xx Dartmouth, Peter Klauer [was Univ. Zürich, now Swiss bank]

Remote access, and green/red light to indicate speed up and slow down. Authoring on the fly.

AEIOU project for Austria's 1000th anniversary, to become publicly available, with history, pictures, culture, film archive, Musikgeschichte with sound, Austriaca, demo films on how one lives, eats, dies in Austria, 15000 world images collected by Maurer.

Österreich Lexikon, [ontological unification with Germany]

BROWSING.Alternatives.active objects

, tell users, NII channel with 100 most active Uniform Resource Locators (URLs)

BROWSING.Alternatives.collaboration

\ Collaboration [hpc meet] for design UCberkeley

BROWSING.Alternatives.controlled vocabularies

[[or in library]]

BROWSING.Alternatives.ontologies

BROWSING.Alternatives.personal links

In the hypertext model described above we have assumed that the author or a subsequent maintainer creates hyperlinks as an added value to the users. But some users are likely to require private links.

Mixed, exclusive, invalidated links. Who maintains them? Best local, unless turned over.

Such linkages may be created implicitly, by tracing the user's path while navigating through a document. Creating such a path also has the immediate byproduct of allowing backup, by having an * undo option which reverses the travel, although retaining a record of the path taken.
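A path trace with an undo option can be sketched as follows. This is a minimal illustration, not the design of any existing browser: the class `Trail`, its method names, and the section titles used in the example are all hypothetical. The trail keeps a stack for backing up, a complete record of everything displayed (including the backward steps), and the list of forward jumps that could become the user's private links.

```python
class Trail:
    """Records a reader's path through a document and supports undo."""

    def __init__(self, start):
        self.position = start
        self._back = []        # places we can return to with undo()
        self.record = [start]  # every place displayed, in order; never erased
        self.links = []        # distinct forward jumps: candidate personal links

    def follow(self, target):
        """Move forward to target and remember the jump."""
        jump = (self.position, target)
        if jump not in self.links:
            self.links.append(jump)
        self._back.append(self.position)
        self.position = target
        self.record.append(target)

    def undo(self):
        """Reverse the last forward move, keeping it in the record."""
        if not self._back:
            raise IndexError("already at the start of the trail")
        self.position = self._back.pop()
        self.record.append(self.position)
        return self.position


trail = Trail("Intro")
trail.follow("History")
trail.follow("WAIS")
trail.undo()              # back to "History"; the visit to "WAIS" stays on record
trail.follow("Mosaic")
print(trail.record)
print(trail.links)
```

Keeping `record` separate from the undo stack is the design choice that lets travel be reversed without forgetting where the reader has been.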

Created by -what are you looking at 3 -- data helmet

BROWSING.Technology

\STANDARDS

URL

URI

Z39.50

HTML

XML

BROWSING.Bio

Bio Brewster Kahle

BROWSING.Conclusion

Remote browsing is here today, and has opened the eyes and minds of many people to the benefits that can be gained by traveling the information highways. A secondary reaction is that too much is available along those roads, and the number of hawkers is increasing steadily. There will be a market for guides and advisors to help the traveler. If the traveler is in search of specific items, rather than idly perusing the wares, then there is also a role for brokers or mediators. As in any new enterprise, the market is quite inconsistent in form and content. As remote access becomes broadly available the problem of inconsistent terminology will become more troublesome. Those troubles will motivate [groundswell] efforts to become consistent; today the establishment of common * ontologies is largely carried out in isolation.

BROWSING.Lists

BROWSING.Resources

aurora@xi.uleth.ca /Canada t.holloway@warwick.ac.uk
Resources
name / type sponsor topic access path charging [ref] | %source

AlterNex / BBoard / Brazil Ecology |
ARPA / Doc.svce HPCC documents free / http://ftp.arpa.mil |
Aurora? / finger file S.T.D., Un. .../ Canada status of the Aurora Borealis |
Chatback / email group IBM Great Britain / Warwick 01 223 0017 contacts for / speech-handicapped children free Telecom Gold 01:CLK001 / |
Comlink / BBoard / Germany Ecology |
ConflictNet / BBoard Inst.for Global Comm. / San Francisco |
EcoNet / BBoard Inst.for Global Comm. / San Francisco Ecology Sprint |
EcuNex / BBoard / Ecuador Ecology |
Fedworld / file service federal documents @130.11.48.107 |
GILS / Locator Office of Management and Budget / (OMB) US Government Information multiple / proposed IITF / echriti@usgs.gov |
GlasNet / BBoard / Russia Ecology |
GreenNet / BBoard / Great Britain Ecology |
LaborNet / BBoard Inst.for Global Comm. / San Francisco |
PeaceNet / BBoard Inst.for Global Comm. / San Francisco |
Pegasus / BBoard / Australia Ecology |
Web / BBoard / Canada Ecology |

Wellington NZ museum digitizes 1.5M objects, 100K in 3D.

Free Internet communication makes satellite communication unacceptable

but needed to motivate introduction of new technologies.


Fin


CS99I home page.


Notes

elsewhere in text 3D: polarized, switching glasses, TO omnview to place spots on 3D space (use for air traffic control with multiple controllers), fine print, helmet

Superman as control: hand direction and stretch to speed to warp speeds

(inside body?) space and body seems real, the familiar

somewhere

Deal with thousands of servers. [MIT Gifford: content labels on servers. Mediator agents.]

http://www-psrg.lcs.mit.edu/ for 500 WAIS servers, with query completion, probability based on headlines in content.