CS99I Freshman Seminar

Winter 1997/1998.

Traveling the Information Highways: ENTertainment and EDUcation

Maps, Encounters, and Directions

Master copy on Earth.
Spotty draft, from earlier material 15Jan98, updates 7Mar1998
This material is

©Gio Wiederhold and CS99I students, Stanford University, 1998.

Previous chapter: Browsing - Next chapter: Digital Libraries

Chapter: Entertainment and Education

"We are all teachers, and we always teach what we know" [Shirley MacLaine (actress), on receiving the Cecil B. DeMille lifetime achievement award, January 1998]

ENTEDU.Intro

We combine entertainment and education in one chapter because of the great similarities in objectives, methods, and technology in both domains. Differences do exist, but we will see that the electronic highways accentuate the similarities.

Paper by Andrew Dickson

ENTEDU.Intro.knowledge

When we deal with education and entertainment, three terms are used frequently, and often imprecisely. Since they are crucial to further discussions, we will distinguish them throughout this book, and define them now.
  • Data: recordings of factual observations. They should be verifiable by measurement, unless the state of the world has changed. Historical data should be time-stamped to avoid confusion.
  • Knowledge: general rules about the world. They are learned, perhaps by looking at many data elements, or given as explicit constraints.
  • Information: combinations of relevant data and knowledge, useful to a customer.

It is not surprising that the boundaries are often blurred. This book, for instance, is just data by this strict definition until it is read and acted on by a reader. A reader who now creates web pages according to the prescription in the HTML note uses it as information, as was the intent of the author; but for the author it was just data. The transmission of data from a source to a receiver who is not yet aware of those facts is an important method of creating information. The reader also had to contribute knowledge to convert this data into information: how to locate and access the particular web page, and the background to know what terms such as heading, list, etc. mean. Our model is hence that we convert data into information, as sketched here.

    Fig. Data-to-Information: Converting Data to Information

    If the information is used for an action, such as enrolling in CS99I, the state of the world is changed, and hence the observed data have to be updated. We now have a data loop. As we handle more data we create useful abstractions, perhaps that CS freshman seminars are rarely filled. This rule then becomes knowledge, for later reuse or for transmittal to fellow students. Knowledge, being general, is more compact than data: the rule takes less space than the list of all CS freshman seminars and their enrollments.

    Teaching is concerned with the transmission of knowledge. In order to substantiate the knowledge, teachers often use factual examples, since those can be verified by the students. Knowledge is powerful, but often less precise. For instance, the rule that Stanford buildings have red roofs is imprecise: there are some buildings at Stanford which do not have red roofs. But that will not confuse a pilot, as long as the rule is largely true and provides a distinction with respect to other neighborhoods on the San Francisco Peninsula.
    Fig. Data-Loop: Updating of Data and Knowledge


    ENTEDU.Support

    One difference between entertainment and education has been in support, and that has made a difference in presentation. Both domains seek the support of benefactors, since taxes and receipts often fall short of costs. Entertainment, when it can claim benefits for the public good, as do Public Broadcasting (PBS) and symphony orchestras, seeks tax support; conversely, many of the best educational institutions levy substantial fees.

    In the electronic world, collecting fees becomes more difficult; we will deal with these business aspects in the chapter on Electronic Commerce.

    ENTEDU.History

    Both entertainment and education have a long history, and we will make only some points that relate to the electronic highways.

    ENTEDU.History.minstrels

    Distribution of information by wandering individuals.

    ENTEDU.History.colleges

    Distribution and generation of information by groups.

    ENTEDU.History.broadcasting

    Distribution of information by dissemination.

    Teaching

    Shows

    ENTEDU.Functions

    [[copied material, to be massively edited for content ]]

    NATIONAL CHALLENGE: Education, Training and Lifelong learning (ETLL) Education "White Paper"

    The Challenge.

    The enormous and accelerating advances in technology in the past century have brought many benefits to nearly everyone, particularly in the industrialized nations. While this advance can be expected to continue unabated, in order to accrue a corresponding increase in benefit, it is necessary to concentrate more specifically on the one unchanging key component of every system: The human being. The fundamental human characteristics of memory capacity, input and output bandwidths etc., are essentially unchangeable. What can and must be improved and maintained is the skill set with which each person is equipped. This requires more effective education at all levels, improved job specific skill-based training and comprehensive accessibility of Lifelong Learning resources that will enable each person to maintain and develop the skills necessary to live and work successfully in an environment of constantly changing demands and opportunities.

    The classical model of education and training is based on direct communication between a teacher and the student or students. To be effective this has demanded small groups of students, physically collocated with the teacher. Recent adoption of TV broadcasts has effectively enlarged the size of the class that can be addressed by one teacher by electronically expanding the classroom to include the locations served by the receiving sites. Despite the significant advantage TV has brought, the model is fundamentally unchanged: one teacher providing instruction to a number of students in real time. The use of video recording technology allows a student to "attend class" in a delayed time mode, but the teacher-classroom model remains essentially intact. There have also been other successful applications of technology to the education and training process, but the overwhelming majority of ETLL providers remain entrenched in the classic model, or introduce slight variations to it.

    The demands of true lifelong learning for all citizens can only be met by application of digital processing and advanced communications technologies that will enable substantive, individualized training and education to be delivered on-demand, at affordable costs, anywhere in the country and any time of the day or night.

    The Task

    It is time to replace the classical ETLL model with one that allows the individual to pursue new knowledge and skills in a manner that places the location, content and timing of the process fully under his or her control. The power and rapidly increasing availability of computers and digital communications now make it possible to realize this new model. Because the classical model is so entrenched, the change to a new concept will encounter scepticism and resistance on the part of many potential users as well as some established education and training institutions. Overcoming these obstacles will require careful planning, combined with compelling pilot implementations that clearly demonstrate effectiveness, usability and economic advantage of the new approach. Effective planning to bring HPCC technology to bear on the ETLL challenge will require partnership with practitioners of the "soft sciences" who are expert in the areas of job and personal skill assessments (industrial psychologists). It will also be necessary to bring technological expertise together with educational psychologists and practicing teachers and trainers, both to learn how best to apply technology to the teaching/learning process and to assist the practitioners in developing confidence in its efficacy. There is also a major need for the technological community to reduce the amount of specialized knowledge necessary to access and employ applications designed to benefit education and training professionals and individuals pursuing independent learning objectives.

    Role of computation, information processing, storage and communications in modernizing Education, Training and Lifelong learning.

    Computation: A significant computational challenge is presented by the need to more carefully define the skills needed to perform real job functions, assess the level of relevant skills in employees and applicants, and define the course modules necessary to make up the difference. The skill assessment work that is being done today is still largely a craft, performed by highly trained professionals without significant technological support. As a result it does not scale to the nationwide application that is necessary in order for the new, skill-based ETLL model to become a reality. These processes need to be automated, which will require programs that are capable of dealing with large amounts of information and of performing processing that can deal with and resolve the ambiguities that still plague the "soft sciences." Probabilistic rather than deterministic processing will be necessary, suggesting the need for Artificial Intelligence or "Fuzzy Logic" applications. To automate and scale the current "expert" processes will demand both sophisticated software and high-capacity, high-speed computational hardware.

    One of the principal advantages that digital technology can bring to the ETLL process is the ability for the student to "learn by doing" in simulated environments. Simulations have the advantage that they are safer and less expensive than the actual experience, and can be controlled in ways that will optimize the learning process. Creating simulations of this quality will impose large computational demands, particularly in those cases where the simulation itself must be capable of being reproduced on platforms that are inexpensive enough to be commonplace in the home or office.

    Information processing: The challenge confronting information processing stems largely from the scale and variety of the ETLL problem. Skill requirements for virtually any job need to be kept current and readily available to industry and individuals. Each individual needs to be able to assess his or her own skills in terms of a standard metric and use that assessment to make personal career or ETLL decisions. These personal data must be afforded security such that they are accessible only to those with a legitimate right to them, yet must be easily shared among those with a valid need as determined by the data owner. The interface between the individual and stored courseware modules (e.g. simulations) must facilitate the identification and presentation of the desired ETLL modules with minimal demand for specialized technical knowledge or skills. The information processing system must also keep track of the ETLL records of millions of individuals and maintain the currency of skill databases and skill-based ETLL course modules stored simultaneously in multiple locations on the network. It must also enable course modules to be selected from competitive offerings, acquired, and paid for electronically (using electronic commerce processes).

    Communications: A new, technology-based model of ETLL will require multiple concurrent access to training opportunities from anywhere in the country. This can only be accomplished through ubiquitous and reliable broadband communications that can move the educational and training materials to the student on demand. Because some of these materials may consist of simulations or video segments, the bandwidth requirements, at least in the user's local environment, will be large (several megahertz); however, movement of the modules into storage in the local environment could occur at slower rates than those required for presentation to the student. In many learning situations and for many learners, peer interaction is a valuable part of the learning process. Consequently the communications systems must support one-to-one and one-to-many communications, both in real time and in delayed mode, at the option of the student (or mentor/teacher, where appropriate). In most cases, simple audio or email-style communications will probably suffice; however, there will probably be occasions when true video conferencing will be essential to the learning process, and must consequently be supported. For certain users and in certain circumstances it can also be anticipated that mobile or portable communications will be used in ETLL applications.

    Storage: Large amounts of information storage capacity will be required to house the numerous multimedia course modules that will be necessary to realize the vision of ETLL available to anyone anywhere at any time. For reliability and to avoid frequent overload of long-distance telecommunications, several large "warehouse" storage facilities serving population centers or geographical regions will probably be needed. The course modules in the various servers will largely be the same, and it will be necessary to ensure that any updates to course modules are made to all copies of those modules regardless of the location of the servers housing them. (It can be anticipated that regional warehouses will contain some course modules that are only of interest to the local population and which would consequently not be replicated in other warehouses.)

    Relationship of this HPCC/IITA National Challenge activity to those of other agency activities.

    High-level interest in both the legislative and executive branches has stimulated vigorous activity in the ETLL area in virtually all government agencies. The Office of Science and Technology Policy Special Assistant for Education and Training is actively promoting technology-based training and coordinating the activities of interdisciplinary working groups and committees. The Department of Education is logically the focus of much of the current effort, particularly in the K-12 area. The establishment of a specific office for Educational Technology demonstrates the strength of their commitment, as does the intensity of their efforts to assist the State education agencies in adopting modern technology in their school systems.

    The Department of Labor, in line with the creation of the Skill Standards Board, has begun a process that will allow trainers, educators and individuals to specifically identify the skills required for success on the job.

    The Department of Commerce, through the NTIA, is currently in the process of reviewing grant applications in anticipation of awarding millions of dollars worth of grants to support technology planning activities in the various states.

    The National Science Foundation continues to support educational research and development through grants and has also supported development of technology based courseware in the Math and Science areas.

    NASA actively supports K-12 education, placing resource material on the Internet for access by students doing school assignments and projects.

    Like NASA, the Department of Energy makes instructional material available to students over the Internet. As with the NASA offerings, the DOE materials receive high praise from the teachers whose students have been able to use them.

    The Department of Agriculture fosters distance learning through the Land Grant Colleges, and Americans Communicating Electronically. The Department of Defense has been very active in use of simulation and networks for training purposes, and is in dialogue with non-defense agencies with the goal of making appropriate learning modules available to the civilian population.

    NSA, in concert with other federal agencies, has joined with a group of industrial consumers of job-related training, forming a consortium to pilot applications of Job Skills Analysis leading to certified technology-based training modules to develop or enhance needed skills.

    Included in this effort is the establishment of the non-profit American Training Standards Institute to coordinate and manage the skills assessment and course module certification processes.

    ENTEDU.Functions.broadcast

    ENTEDU.Functions.multicast

    ENTEDU.Functions.on-demand

    Here's a response I sent to Paul Losleben last week around the "education on-demand" concept and some approaches we are working on to deliver Stanford programming to industry. I've had a response from Ullman, Tobagi, and Harris and will be following up with them in the near future.
    Andy DiPaolo
    Assistant Dean, School of Engineering
    --------------------------------------------------------------------
    Paul,
    I wanted to respond to your recent message regarding on-demand education. Based on our discussions with engineering and education/training managers at SITN's member companies, you're right on target. What we're hearing is that practicing engineers, and the companies that pay for their education, want "control" of the teaching and learning. They want control over the place and time (ideally at the desktop or even at home), the pace, and even the scope and sequence of the material, and do not want to be constrained by the barriers imposed by the traditional on-campus class. If you wish, I can send you an outline of our assessment of the industry environment and engineering education.

    The idea of smaller increments of instruction is something SITN has been working on with Stanford faculty in the development of non-credit short courses. These are programs that average five hours in length and are typically broken into one-hour increments. The programs are taped and offered as a five-hour course on satellite and on video tape. In the future these courses could be divided up and offered as modules available on-line from video servers. The products (including regular 30-hour courses designed to be broken into stand-alone modules) could also be converted into CD-ROMs with the entire course indexed for easy access. In fact, we have developed a CD-ROM demo of a repurposed engineering class that might serve as a model for future development.

    Stanford faculty have offered about 15 short courses over the last 20 months in a range of engineering disciplines. Examples include: C++ (Cheriton), Digital Circuits (DeMicheli), T-CAD (Dutton), Composites (Tsai), Cryptography (Hellman), Turbulence Modeling (Bradshaw), and Design for Assembly (Barkan). Our target is to have at least one of these a month, eventually ratcheting up to three a month using both Stanford faculty and distinguished industry experts.

    As Jeff Ullman alluded to in his recent note, a few of us (Jeff, Fouad Tobagi, Dale Harris, Dwain Fullerton et al.) have been trying to figure out a way to run an experiment to make some of the School's televised classes and short courses available to customers on demand using video servers. With the real possibility of some resources to get this idea started (and Gibbons' endorsement), I suggest those who are interested gather to talk about next steps. It would probably make sense to have a few potential industry customers in attendance at one of these sessions so that they can provide a reality check about what we have in mind; they are the ones who are going to pay to receive the programming. For example, Hewlett-Packard's corporate engineering education group is currently delivering our 250 courses to the workstation as a live signal, but they are very interested in having a menu-driven "pull" system. Their idea is to have SITN repurpose existing video product into smaller increments so that an H-P engineer can "pull" modules of instruction or information when and where needed. I know H-P would have an interest in discussing this concept with Stanford engineering faculty. Since H-P represents over 50% of the School's external engineering education business, we should listen to their suggestions. H-P and other SITN companies might also have resources they wish to add to the mix.

    ENTEDU.Functions.feedback

    ENTEDU.Functions.collaboratory

    Gio, we describe below some ideas regarding database tools to capture the value added to data during a collaboration session involving a distributed team supported by electronic collaboration tools, the collaboratory. These ideas would augment our ongoing collaboratory development; however, database activities are not now supported. I will look forward to your reaction and guidance as to how to proceed. Best wishes, and warm regards, Bob

    ENTEDU.Technology

    Both entertainment and education require high transmission bandwidth. There are two components to this issue:
    1. The material is voluminous, and often includes images and video clips
    2. The number of recipients is large

    ENTEDU.Technology.multimedia

    The material we deal with in entertainment and education contains text, images, and today often voice and video as well. Smell is still rare. For each of these data representations there are a variety of technological issues and solutions. We will deal with them individually here, but must keep in mind that in the end an integrated presentation is desired, where text, images, voice, and video are synchronized. Achieving such synchronization over the shared communication lines of the Internet is still a major challenge.

    Multimedia in Japan Today and Tomorrow
    By Iwasaki Ieo

    The mass media in Japan are giving heavy coverage to multimedia. Seminars, symposia, and expositions on this theme are being held with incredible frequency, and numerous books, ranging from educational primers to fairly specialized works, are being published. To say that Japan is now in the midst of multimedia fever is indeed no exaggeration. [Hyperbole and exaggerated extrapolations are everywhere, as in the US. The multimedia excitement significantly overlaps that associated with the Internet, on which four books appeared in bookstores here last week. Several ministries are competing for major control of this new technology, including the Ministry of Posts and Telecommunications (MOPT, but usually abbreviated MPT), and the Ministry of International Trade and Industry (MITI), but also including others such as the Ministry of Construction, and the Ministry of Education (Mombusho). Japan is not alone in Asia in getting involved in multimedia. Recently, the Korean Daewoo Electronics Company announced that it would invest US$2B over the next ten years in various multimedia services and equipment and will get involved in cable and satellite broadcasting, the production of compact discs and CD-ROM software, electronic printing, films and theaters, with the intention of becoming one of the world's leading multimedia companies by 2015. Daewoo's plans begin with the establishment of the Daewoo Cinema Network for CA-TV, and will follow with satellite broadcasting, HDTV, etc. DKK] The video game equipment sector is also looking ahead by putting a steady stream of CD-ROM players, designed for the future multimedia age, on the market. Companies are employing animation compression technologies in the production of all of these game machines, whose high functionality, designed for the multimedia era, is used as a selling point.

    Appliance manufacturers are also selling home video CD players with simple, built-in interactive functions while personal computers equipped with CD-ROM drives, or, "multimedia PCs," that have recently entered the market are becoming the mainstream type of equipment.

    The direct impetus for this multimedia boom was US Vice President Al Gore's announcement in September 1993 of an action plan for the construction of an "information superhighway" that would use new infrastructure to raise educational, medical, and other social welfare levels by the year 2010. This had a tremendous impact on Japan. The average person had considered multimedia something indistinct and in the distant future, but it had suddenly begun to take shape in Japan.

    DAVID L. TENNENHOUSE
    Telemedia, Networks & Systems Group
    Laboratory For Computer Science
    Massachusetts Institute Of Technology
    The first wave of media applications, i.e., those that simply copy and store multimedia objects, will be followed by a second wave of computation-intensive applications that actively process the media-based information. These applications extend the requirement for `video to the desktop' to a more general requirement for `media to the application'.

    The ViewStation architecture embodies a software-oriented approach that supports this `media to the application' paradigm. Our programming environment makes the raw media data, e.g., the actual video pixels and audio samples, accessible to the applications.

    We have derived a set of architectural guidelines and have constructed an integrated system that supports media-intensive applications. The principal components of our `stack' are:

    [[end of copied material]]

    ENTEDU.Technology.compression

    Need

    [[Copied material, to be massively edited]]
    Garson emphasized the difficulties of transmitting images over the networks: a full-page image, uncompressed, could take 2 hours to transmit at 1200 baud, 14 minutes at 9600, 2.4 minutes at 56000, and 5 seconds on a T1 line.
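    These figures are mutually consistent: they imply an uncompressed page image of roughly one megabyte. A small sketch of the arithmetic (the 8.6-million-bit image size and the line rates are our assumptions, chosen to match the quoted times):

        # Back-of-the-envelope check of the quoted transmission times.
        # Assumption: an uncompressed full-page image of about 8.6 million
        # bits (roughly 1 MByte), chosen to match the figures above.
        links_bps = {
            "1200 baud modem": 1_200,
            "9600 baud modem": 9_600,
            "56 kbit/s line": 56_000,
            "T1 line (1.544 Mbit/s)": 1_544_000,
        }
        image_bits = 8.6e6

        for name, bps in links_bps.items():
            seconds = image_bits / bps
            print(f"{name:24s} {seconds:10.1f} s  ({seconds/60:6.1f} min)")
        # 1200 baud  -> ~119 min (~2 h); 9600 -> ~15 min;
        # 56 kbit/s  -> ~2.6 min;        T1   -> ~5.6 s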

    MPEG

    BERND WOLFINGER
    Hamburg University, Germany
    Computer Science Department
    berndw@icsi.berkeley.edu
    "Efficiency of PET and MPEG Encoding for Video Streams: Analytical QoS Evaluations"
    A promising solution in the transmission of video streams via communication networks is to use forward error control in order to mask some of the transmission errors and data losses at the receiving side. The redundancy required to achieve error correction without retransmissions will, however, consume some transmission capacity of a network, therefore possibly forcing stronger compression of the video stream to be transmitted.

    In this talk we introduce analytical models which allow us to determine the expected frame loss probability of MPEG-encoded video streams, assuming communication via constant bit rate (CBR) virtual circuits with data losses and/or unrecoverable transmission errors. The models can be used to compare the quality-of-service (QoS) as observed on the Application Layer for encoding schemes without and with forward error control, possibly making use of different prioritization of transmitted data units (in particular, applying the PET encoding algorithm as designed at ICSI). The talk covers preliminary results and is conceived as a forum for critical discussions on the approach chosen.
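    To make the tradeoff concrete, here is a minimal sketch; the (n, k) erasure-code model and the independent-loss assumption are ours, not Wolfinger's. A frame sent as n packets, any k of which suffice to reconstruct it, is lost only when more than n - k packets are lost, so a little redundancy buys a large drop in frame loss:

        from math import comb

        def frame_loss_prob(n: int, k: int, p_loss: float) -> float:
            """Probability that a frame is lost when it is sent as n packets,
            any k of which suffice to reconstruct it ((n, k) erasure code),
            with independent per-packet loss probability p_loss."""
            # The frame is lost if fewer than k packets arrive.
            return sum(comb(n, i) * (1 - p_loss)**i * p_loss**(n - i)
                       for i in range(k))

        p = 0.01                            # assumed packet-loss rate
        print(frame_loss_prob(10, 10, p))   # no redundancy:  ~9.6e-2
        print(frame_loss_prob(12, 10, p))   # 20% redundancy: ~2.2e-4

    The cost is the 20% of capacity consumed by the redundant packets, which is exactly the capacity the abstract notes may have to be recovered by compressing the video more strongly.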


    ENTEDU.Technology.redistribution

    ENTEDU.Technology.sharing

    Advanced Multimedia Database Tools to Support Distributed Scientific Team Analysis and Collaboration

    INVESTIGATORS
    Elke A. Rundensteiner
    Assistant Professor
    Software Systems Research Laboratory
    Electrical Engineering and Computer Science Dept.
    The University of Michigan
    Ann Arbor, Michigan 48109-2122
    Phone: (313) 936-2971
    Fax: (313)
    Email: rundenst@eecs.umich.edu

    Robert Clauer
    Research Scientist
    Atmospheric, Oceanic and Space Sciences Department
    Ann Arbor, Michigan 48109-2143
    Email: clauer@pitts.sprl.umich.edu

    Jason Daida
    Terry Weymouth
    Atul Prakash

    ABSTRACT:
    ADVANCED MULTIMEDIA DATABASE TOOLS TO SUPPORT DISTRIBUTED SCIENTIFIC TEAM ANALYSIS AND COLLABORATION

    We would like to design, implement, test, and distribute a software tool-kit that supports data analysis by a geographically distributed science group.

    Our vision, and the purpose of our proposal, is to enable a distributed team of scientists to work together with their data in a more productive fashion. The distributed scientific team will be supported by an emerging electronic infrastructure called the National Information Infrastructure (NII) and new object-oriented multimedia database technologies. Building upon collaboration tools being developed under separate NSF support for ground-based science (NSF Upper Atmosphere Research Collaboratory, or UARC), we propose to leverage off this NSF project to implement a distributed team collaboration facilitator.

    To create the facilitator, we will design and implement the software technology for researchers to jointly interact with data, add annotation and discussion that can be accumulated, retrieved, edited, added to, using distributed multimedia database technology. Key technologies will include distributed database tools to support archiving collaboration sessions, as well as the retrieving and updating of these sessions. The proposed suite of software tools would thus support the team analysis process from initial data collection all the way through the publication of new results and knowledge.

    We will utilize the existing Upper Atmospheric Research Collaboratory (UARC) collaboration tools developed through NSF support, and will augment them with object-oriented, multimedia database tools to capture information from collaboration sessions.

    By providing a basic collaboration framework, we envision each member of a research team to be able to:

    (1) link electronically through their workstations,
    (2) utilize shared data display windows,
    (3) rapidly prototype a new view of the shared data,
    (4) exchange email and voice dialogue,
    (5) share drawing tools, and
    (6) make annotations upon the data.

    By providing multimedia database tools within this framework, we now envision members of this research team to also be able to:

    (7) store and save an entire collaboration session,
    (8) fast-forward, rewind, skip forward, skip backward, pause, and play any previously stored collaboration session,
    (9) replay any previous collaboration session within the context of another collaboration session, and
    (10) search, browse, and retrieve content from within any stored session.
    To accomplish this level of functionality for the database tools, we propose to develop a new type of dynamic metadata that should be saved in a multimedia object-oriented database system. Note that our usage of the term metadata does not simply refer to descriptive data about the raw scientific data (e.g., netCDF, HDF), but also descriptive data about the process-context in which the raw scientific data appears (i.e., context-sensitive metadata that observes the temporal and distributed relationships between multimedia artifacts). Such multimedia objects must be synchronized with the scientific data being investigated, in addition to establishing possibly complex interrelationships among different types and groups of multimedia annotations.
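    A minimal sketch of what one record of such context-sensitive metadata might hold (the field names and types are our invention for illustration; the proposal defines no concrete schema):

        from dataclasses import dataclass, field
        from typing import List

        @dataclass
        class Annotation:
            """One multimedia annotation captured during a collaboration
            session; illustrative only."""
            author: str
            t_session: float      # seconds from session start, for replay sync
            data_ref: str         # which raw data set/window is annotated
            kind: str             # "text" | "audio" | "sketch" ...
            payload: bytes
            replies_to: List[int] = field(default_factory=list)  # links to
                                  # other annotations (inter-relationships)

        session_log: List[Annotation] = []
        session_log.append(Annotation("clauer", 12.5, "UARC/radar-01", "text",
                                      b"Note the onset at 12:05 UT"))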

    Members of the research team will be linked electronically through their workstations, utilizing shared data display windows, typed and voice dialogue, shared drawing tools, and annotation upon the data. The dialogue, discussion, annotation, and drawings which result from such collaboration sessions form this dynamic metadata: truly diverse multimedia data, including hand-drawn sketches representing graphical interpretations of phenomena observed in the scientific data, and general conversations about the scientific observations or even the scientific process in general.

    In short, we will employ state-of-the-art digital library technology to achieve this level of collaboration support for scientific processes, consisting of the following components:

    IMPACT

    The proposed research into constructing distributed multimedia object-oriented database tools supporting scientific collaborations will clearly make a major impact on the ease with which scientists, distributed geographically among several institutions, can advance in their scientific interactions to generate publications of the studied data. Facilitating teams of investigators to collaboratively study science data should increase their productivity. More importantly, however, while the collection of 'raw' scientific data is important in general, the collection of interpretations of such scientific data by 'the' experts will be a true value-added asset. Note that the augmentation of the archived science data sets with value-added interpretations generated by 'the' experts of the data will be a natural by-product of their scientific studies, rather than requiring a painstaking effort by the scientists in documenting their findings. Indeed, documentation tasks often receive a low priority because they are tedious, even though such tasks are often deemed important to the overall scientific inquiry.

    Furthermore, given that such annotations and interpretations are typically several orders of magnitude smaller in volume than the actual data, it would be much more feasible to successfully interrogate and retrieve information based on these interpretations. In fact, these interpretations will typically focus on 'interesting' data sets, thus providing key pointers to meaningful features that would otherwise be buried in a sea of information. The proposed multimedia database will be a key technology in extracting truly useful information from scientific investigations. It will provide support for 'replaying' previous scientific sessions, which would allow for annotating or revising previous interpretations with new information. Furthermore, this will bring scientific data and the scientific process itself into a format that can be utilized for educational purposes, to demonstrate the scientific process involved in studying and learning from data.

    As noted above, this work will be undertaken in a testbed environment, utilizing the existing UARC collaboratory testbed. Ultimately, however, the results of this proposed effort will have impact far beyond just the space science community. Many science teams in all disciplines could benefit from the technology that we propose to develop. The generic quality of this technology could impact distributed teams who must work together over data in almost all scientific disciplines, in engineering, in business, and in education. (For example, students could learn about the process of satellite image interpretation by collaborating with students at other, distant locations to obtain "live" ground truth information.) While we are proposing a testbed in a scientific context, we feel that the impact of the technology will be much broader, affecting all collaborating teams in all manner of activity.

    [[end of copied material]]

    ENTEDU.Technology.compression

    Compression reduces the volume of data to be transmitted over the networks and stored on storage devices by taking advantage of the inherent redundancy in the representations of information that we use. Redundancy in writing allows us to understand a sentence in the presence of some typos or smudged print. Redundancy in speaking allows us to understand a message even when a slamming door causes us to miss a word. A car ad with a staple in the center still conveys its message. We can follow a film even if distracted for a minute. However, for each of these scenarios we can make up instances where the loss would be significant. For each of these media we have to consider what is truly redundant and what is of marginal benefit to the intended receiver.

    Even with increasing bandwidths, there is still a need for compression. Compression techniques have been developed that provide compression ratios varying from as low as 10:1 to as high as 60,000:1. Compression can be applied to texts, sound, images (ranging from line drawings to animation or moving pictures), and video.

    Ideally we do not want to lose any information in the sequence Compression, Transmission, Storage, Transmission, Decompression. Lossless compression is essential when even a one-bit change can make a crucial difference in the result, say when the equation E = mc^2 is changed to E = mc^3. Similar precision is needed for musical notation. In general, text has to be compressed without loss, because its redundancy is low (~50%). Formatting information associated with text may have more redundancy. Voice, images, and video contain much redundancy, so that the loss of a few bits may not be noticed, even though the receiver may still feel uncomfortable about any loss. A radiologist, receiving an X-ray, will be legitimately concerned about any loss. However, images used for entertainment and education are deemed less critical, so that here lossy compression dominates, since much greater compression ratios can be achieved.
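    The ~50% figure can be checked with a simple zeroth-order entropy estimate, sketched below. The sample text and the 8-bit baseline are our assumptions, and correlations between adjacent characters add further redundancy that this estimate misses:

        from collections import Counter
        from math import log2

        # Sample text (a sentence from this chapter); any sizable
        # English text behaves similarly.
        text = ("Redundancy in writing allows us to understand a sentence "
                "in the presence of some typos or smudged print.")

        freq = Counter(text)
        n = len(text)
        # Zeroth-order entropy: average bits per character if each character
        # were coded according to its frequency, ignoring context.
        h = -sum((c / n) * log2(c / n) for c in freq.values())
        print(f"{h:.2f} bits/char vs. 8 bits raw: "
              f"about {1 - h/8:.0%} redundant")   # roughly 50%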

    Compression does require computational capability. Since we expect that compression will be less frequent than decompression (material is read more often than it is written), most schemes put much computational effort into compression, and arrange the results so that decompression is fast, preferably as fast as the data can be received. The most powerful compression methods investigate an entire document for redundancy, create tables of recurring patterns, and then transmit first those tables and then the skeleton of the document, where each occurrence of a pattern is replaced by a reference. The delay implicit in this process is significant, so often the redundant information is determined dynamically, and the pattern entries are embedded in the document as they are found.
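    A minimal sketch of this asymmetry, using Python's zlib library, whose dictionary-based scheme is one instance of the pattern-table idea described above (the sample data and the levels compared are our choices):

        import time
        import zlib

        data = b"We combine entertainment and education in one chapter. " * 2000

        for level in (1, 9):   # 1 = light effort, 9 = maximum effort
            t0 = time.perf_counter()
            packed = zlib.compress(data, level)
            t_pack = time.perf_counter() - t0

            t0 = time.perf_counter()
            assert zlib.decompress(packed) == data   # lossless round trip
            t_unpack = time.perf_counter() - t0

            print(f"level {level}: {len(data)} -> {len(packed)} bytes, "
                  f"compress {t_pack*1e3:.1f} ms, "
                  f"decompress {t_unpack*1e3:.1f} ms")

    Higher effort levels spend more time searching for patterns during compression; decompression merely follows the references and runs at essentially the same speed regardless of the level chosen.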


    Lossless Compression

    The principle of lossless compression is to examine the bit-patterns of the data for repeated patterns. [[To document: page- and word-level patterns, related to indexing. Example (Gtext): 750M words, 900,000 unique; compressible by word-based Huffman coding to 20 MB, plus a 10.5 MB dictionary, but unique words can be omitted (5 MB) --> prefix tree, 1 MB ('canonical' Huffman coding). Index compressed to 6% by delta coding the bitmap (partially due to document granularity; else about 15-20%).]]
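    Since the note above relies on word-based Huffman coding, here is a minimal sketch of the technique on a toy word list (the list is ours): frequent words receive short bit codes, rare words long ones, and the code table must accompany the compressed text.

        import heapq
        from collections import Counter

        def huffman_code(symbols):
            """Build a Huffman code table: frequent symbols get short codes."""
            freq = Counter(symbols)
            # Heap of (weight, tiebreak, tree); a tree is a symbol or a pair.
            heap = [(w, i, s) for i, (s, w) in enumerate(freq.items())]
            heapq.heapify(heap)
            i = len(heap)
            while len(heap) > 1:          # merge the two lightest trees
                w1, _, t1 = heapq.heappop(heap)
                w2, _, t2 = heapq.heappop(heap)
                heapq.heappush(heap, (w1 + w2, i, (t1, t2)))
                i += 1
            code = {}
            def walk(tree, prefix):
                if isinstance(tree, tuple):
                    walk(tree[0], prefix + "0")
                    walk(tree[1], prefix + "1")
                else:
                    code[tree] = prefix or "0"
            walk(heap[0][2], "")
            return code

        words = "the cat and the dog and the bird".split()  # toy example
        table = huffman_code(words)
        bits = "".join(table[w] for w in words)
        print(table)                 # 'the' gets the shortest code
        print(bits, len(bits), "bits")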

    [[use material from CS545I lecture]]
    Examples of lossless compression are the Graphics Interchange Format (GIF) (8-bit color), often used in Web pages, and the BitMaP format (BMP) (24-bit color), used initially by Microsoft Windows and IBM OS/2. In GIF files each 8-bit byte points into a palette table of 256 colors. That palette, or a reference to some standard palette, is transmitted with each image.
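    A minimal sketch of the palette idea (synthetic pixel data, not a GIF codec; real GIF files additionally compress the index stream):

        # Indexed color: each pixel becomes a 1-byte index into a table
        # of up to 256 full RGB colors that travels with the image.
        pixels = [(255, 0, 0), (255, 0, 0), (0, 128, 0), (255, 0, 0)]

        palette = sorted(set(pixels))            # up to 256 distinct colors
        assert len(palette) <= 256
        index_of = {color: i for i, color in enumerate(palette)}

        indexed = bytes(index_of[p] for p in pixels)   # 1 byte per pixel
        print("palette:", palette)
        print("indices:", list(indexed))

        # Decoding: look each index back up in the transmitted palette.
        decoded = [palette[i] for i in indexed]
        assert decoded == pixels                 # lossless round trip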

    Other techniques, such as wavelets and fractals, are also being incorporated into compression schemes, along with a variety of error detection and correction techniques.

    Lossy Compression

    [[use material from CS545I lecture]]
    Examples of lossy compression are the JPEG image format established by the Joint Photographic Experts Group, also used in Web pages, and the MPEG video formats established by the Moving Picture Experts Group.

    Lossy compression effectively disables some protection techniques, such as digital signatures.

    ENTEDU.Technology.indexing

    ENTEDU.Alternatives

    ENTEDU.Bio

    ENTEDU.Conclusion

    ENTEDU.Lists


    Fin

    Previous chapter: Browsing - Next chapter: Digital Libraries

    List of all Chapters.
    CS99I home page.