We are all teachers, and we always teach what we know. [Shirley MacLaine (actress), on receiving the Cecil B. DeMille lifetime achievement award, January 1998]
If the information is used for an action, such as enrolling in CS99I, the state of the world is changed, and hence the observed data have to be updated. We now have a data loop. As we handle more data we create useful abstractions, perhaps that CS freshman seminars are rarely filled. This rule then becomes knowledge for later reuse, or for transmittal to fellow students. Knowledge, being general, is more compact than the data it summarizes, here the enrollment lists of all CS freshman seminars.
Teaching is concerned with transmission of knowledge. In order to
substantiate the knowledge, teachers often use factual examples, since
those can be verified by the students. Knowledge is powerful, but
often less precise. Some buildings at Stanford do not have red roofs, but that will not confuse a pilot as long as the rule is largely true and provides a distinction with respect to other neighborhoods on the San Francisco Peninsula.
In the electronic world, collecting fees becomes more difficult; we will deal with these business aspects in the chapter on Electronic Commerce.
The enormous and accelerating advances in technology in the past century have brought many benefits to nearly everyone, particularly in the industrialized nations. While this advance can be expected to continue unabated, in order to accrue a corresponding increase in benefit it is necessary to concentrate more specifically on the one unchanging key component of every system: the human being. The fundamental human characteristics of memory capacity, input and output bandwidths, etc., are essentially unchangeable. What can and must be improved and maintained is the skill set with which each person is equipped. This requires more effective education at all levels, improved job-specific, skill-based training, and comprehensive accessibility of lifelong learning resources that will enable each person to maintain and develop the skills necessary to live and work successfully in an environment of constantly changing demands and opportunities.
The classical model of education and training is based on direct communication between a teacher and the student or students. To be effective this has demanded small groups of students, physically collocated with the teacher. Recent adoption of TV broadcasts has effectively enlarged the size of the class that can be addressed by one teacher by electronically expanding the classroom to include the locations served by the receiving sites. Despite the significant advantage TV has brought, the model is fundamentally unchanged: one teacher providing instruction to a number of students in real time. The use of video recording technology allows a student to "attend class" in a delayed time mode, but the teacher-classroom model remains essentially intact. There have also been other successful applications of technology to the education and training process, but the overwhelming majority of ETLL providers remain entrenched in the classic model, or introduce slight variations to it.
The demands of true lifelong learning for all citizens can only be met by application of digital processing and advanced communications technologies that will enable substantive, individualized training and education to be delivered on-demand, at affordable costs, anywhere in the country and any time of the day or night.
One of the principal advantages that digital technology can bring to the ETLL process is the ability for the student to "learn by doing" in simulated environments. Simulations have the advantage that they are safer and less expensive than the actual experience, and can be more controlled in ways that will optimize the learning process. Creating simulations of this quality will place large computational demands, particularly in those cases where the simulation itself must be capable of being reproduced on platforms that are inexpensive enough to be commonplace in the home or office.
Information processing: The challenge confronting information processing stems largely from the scale and variety of the ETLL problem. Skill requirements for virtually any job need to be kept current and readily available to industry and individuals. Each individual needs to be able to assess his or her own skills in terms of a standard metric and use that assessment to make personal career or ETLL decisions. These personal data must be afforded security such that they are accessible only to those with a legitimate right to them, yet must be easily shared among those with a valid need as determined by the data owner. The interface between the individual and stored courseware modules (e.g., simulations) must facilitate the identification and presentation of the desired ETLL modules with minimal demand for specialized technical knowledge or skills. The information processing system must also keep track of the ETLL records of millions of individuals and maintain currency of skill databases and skill-based ETLL course modules stored simultaneously in multiple locations on the network. The information processing system must further enable course modules to be selected from competitive offerings, and to be acquired and paid for electronically (using electronic commerce processes).
Communications: A new, technology-based model of ETLL will require multiple concurrent access to training opportunities from anywhere in the country. This can only be accomplished through ubiquitous and reliable broadband communications that can move the educational and training materials to the student on demand. Because some of these materials may consist of simulations or video segments, the bandwidth requirements, at least in the user's local environment, will be large (several megabits per second); however, movement of the modules into storage in the local environment could occur at slower rates than those required for presentation to the student. In many learning situations and for many learners, peer interaction is a valuable part of the learning process. Consequently the communications systems must support one-to-one and one-to-many communications, both in real time and in delayed mode, at the option of the student (or mentor/teacher, where appropriate). In most cases, simple audio or email-style communications will probably suffice; however, there will probably be occasions when true video conferencing will be essential to the learning process, and it must consequently be supported. For certain users and in certain circumstances it can also be anticipated that mobile or portable communications will be used in ETLL applications.
Storage: Large amounts of information storage capacity will be required to house the numerous multimedia course modules that will be necessary to realize the vision of ETLL available to anyone, anywhere, at any time. For reliability and to avoid frequent overload of long-distance telecommunications, several large "warehouse" storage facilities serving population centers or geographical regions will probably be needed. The course modules in the various servers will largely be the same, and it will be necessary to ensure that any updates to course modules are made to all copies of those modules regardless of the location of the servers housing them.
(It can be anticipated that regional warehouses will contain some course modules that are only of interest to the local population and which would consequently not be replicated in other warehouses.)
The Department of Labor, in line with the creation of the Skill Standards Board, has begun a process that will allow trainers, educators and individuals to specifically identify the skills required for success on the job.
The Department of Commerce, through the NTIA, is currently in the process of reviewing grant applications in anticipation of awarding millions of dollars worth of grants to support technology planning activities in the various states.
The National Science Foundation continues to support educational research and development through grants and has also supported development of technology based courseware in the Math and Science areas.
NASA actively supports K-12 education, placing resource material on the Internet for access by students doing school assignments and projects.
Like NASA, the Department of Energy makes instructional material available to students over the Internet. As with the NASA offerings, the DOE materials receive high praise from the teachers whose students have been able to use them.
The Department of Agriculture fosters distance learning through the Land Grant Colleges, and Americans Communicating Electronically. The Department of Defense has been very active in use of simulation and networks for training purposes, and is in dialogue with non-defense agencies with the goal of making appropriate learning modules available to the civilian population.
NSA, in concert with other federal agencies, has joined with a group of industrial consumers of job-related training, forming a consortium to pilot applications of Job Skills Analysis leading to certified technology-based training modules to develop or enhance needed skills.
Included in this effort is the establishment of the non-profit American Training Standards Institute to coordinate and manage the skills assessment and course module certification processes.
Multimedia in Japan Today and Tomorrow
The mass media in Japan are giving heavy coverage to multimedia. Seminars, symposia, and expositions on this theme are being
held with incredible frequency and numerous books, ranging from
educational primers to fairly specialized works, are being published.
To say that Japan is now in the midst of multimedia fever is indeed no
exaggeration.
[Hyperbole and exaggerated extrapolations are everywhere, as in the US.
The multimedia excitement significantly overlaps that associated with the
Internet, on which four books appeared in bookstores here last week.
Several ministries are competing for major control of this new
technology, including the Ministry of Posts and Telecommunications
(MOPT, but usually abbreviated MPT), and the Ministry of International
Trade and Industry (MITI), but also including others such as the Ministry
of Construction, and the Ministry of Education (Mombusho). Japan is not
alone in Asia in getting involved in multimedia. Recently, the Korean
Daewoo Electronics Company announced that it would invest US$2B over the
next ten years in various multimedia services and equipment and will get
involved in cable and satellite broadcasting, the production of compact
discs and CD-ROM software, electronic printing, films and theaters, with an
intention of being one of the world's leading multimedia companies by 2015.
Daewoo's plans begin with the establishment of the Daewoo Cinema Network
for CA-TV, and will follow with satellite broadcasting, HDTV, etc. DKK]
The video game equipment sector is also looking ahead by putting a
steady stream of CD-ROM players, designed for the future multimedia age,
on the market. Companies are employing animation compression
technologies in the production of all of these game machines, whose high
functionality, designed for the multimedia era, is used as a selling
point.
Appliance manufacturers are also selling home video CD players with simple, built-in interactive functions, while personal computers equipped with CD-ROM drives, or "multimedia PCs," which have recently entered the market, are becoming the mainstream type of equipment.
The direct impetus for this multimedia boom was US Vice President Al
Gore's announcement in September 1993 of an action plan for the
construction of an "information superhighway" that would use new
infrastructure to raise educational, medical, and other social welfare
levels by the year 2010. This had a tremendous impact on Japan. The average person had considered multimedia something indistinct and far in the future, but it suddenly began to take shape.
DAVID L. TENNENHOUSE
The ViewStation architecture embodies a
software-oriented approach that supports this `media to the
application' paradigm. Our programming environment makes the raw
media data, e.g., the actual video pixels and audio samples,
accessible to the applications. We have derived a set of
architectural guidelines and have constructed an integrated system
that supports media-intensive applications. The principal components
of our `stack' are:
[[end of copied material]]
In this talk we introduce analytical models which allow us to
determine the expected frame loss probability of MPEG encoded video
streams assuming communication via constant bit rate (CBR) virtual
circuits with data losses and/or unrecoverable transmission errors.
The models can be used to compare the quality of service (QoS) as observed at the application layer for encoding schemes with and without forward error control, possibly making use of different prioritization of transmitted data units (in particular, applying the PET encoding algorithm as designed at ICSI). The talk covers preliminary results and is conceived as a forum for critical discussion of the chosen approach.
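The flavor of such an evaluation can be sketched with a simple calculation. This is an illustrative assumption of independent cell losses, not the analytical model of the talk: a frame that occupies n network cells is lost if any cell is lost, unless an (n, k) erasure code, in the spirit of PET, allows reconstruction from any k received cells.

```python
from math import comb

def frame_loss_no_fec(p, n):
    """P(frame lost) when a frame occupies n cells and each cell is
    lost independently with probability p: one lost cell kills the frame."""
    return 1.0 - (1.0 - p) ** n

def frame_loss_with_fec(p, n, k):
    """P(frame lost) with an (n, k) erasure code: any k of the n
    transmitted cells suffice, so the frame is lost only when more
    than n - k cells are lost."""
    survive = sum(comb(n, i) * p**i * (1.0 - p)**(n - i)
                  for i in range(n - k + 1))
    return 1.0 - survive

# Hypothetical numbers: a 100-cell frame, cell-loss rate 1e-3.
p, n = 1e-3, 100
print(frame_loss_no_fec(p, n))        # about 0.095
print(frame_loss_with_fec(p, n, 96))  # far smaller: tolerates 4 lost cells
```

The redundancy cost of the code, (n - k)/n of the transmission capacity, is exactly the trade-off against stronger compression that the talk examines.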
INVESTIGATORS
Robert Clauer
Jason Daida
Our vision, and the purpose of our proposal, is to enable a
distributed team of scientists to work together with their data in a
more productive fashion. The distributed scientific team will be
supported by an emerging electronic infrastructure called the National
Information Infrastructure (NII) and new object-oriented multimedia
database technologies. Building upon collaboration tools being
developed under separate NSF support for ground-based science (NSF
Upper Atmosphere Research Collaboratory, or UARC), we propose to
leverage off this NSF project to implement a distributed team
collaboration facilitator.
To create the facilitator, we will design and implement the software
technology for researchers to jointly interact with data, add
annotation and discussion that can be accumulated, retrieved, edited, and extended, using distributed multimedia database technology. Key
technologies will include distributed database tools to support
archiving collaboration sessions, as well as the retrieving and
updating of these sessions. The proposed suite of software tools would
thus support the team analysis process from initial data collection
all the way through the publication of new results and knowledge.
By providing a basic collaboration framework, we envision that each member of a research team will be able to:
Members of the research team will be linked electronically through
their workstations, utilizing shared data display windows, typed and
voice dialogue, shared drawing tools and annotation upon the data. The
dialogue, discussion, annotation and drawings which result from such
collaboration sessions form a new type of dynamic metadata which
should be saved in a multimedia object-oriented data base system. Note
that our usage of the term metadata does not simply refer to
descriptive data about the raw scientific data, as commonly assumed in
the community; rather, we are targeting truly diverse multimedia data, including hand-drawn sketches representing graphical interpretations of phenomena observed in the scientific data, and general conversations about the scientific observations or even the
scientific process in general. Such multimedia objects must be
synchronized with the scientific data being investigated, in addition
to establishing possibly complex interrelationships among different
types and groups of multimedia annotations.
In short, we will employ state-of-the-art digital library technology
to achieve this level of collaboration support for scientific
processes, consisting of the following components:
Furthermore, given that such annotations and interpretations are typically several orders of magnitude smaller in volume than the actual data, it is much more feasible to successfully interrogate and retrieve information based on
these interpretations. In fact, these interpretations will typically
focus on 'interesting' data sets, thus providing key pointers to
meaningful features that would otherwise be buried in a sea of
information. The proposed multimedia database will be a key technology
in extracting truly useful information from scientific investigations.
It will provide support for 'replaying' of previous scientific
sessions, which would allow for annotating or revising previous
interpretations with new information. Furthermore, this will bring
scientific data and the scientific process itself into a format so
that it can be utilized for educational purposes to demonstrate the
scientific process involved in studying and learning from data.
As noted above, this work will be undertaken in a testbed environment,
utilizing the existing UARC collaboratory testbed. Ultimately, however, the results of this proposed effort will have impact far beyond just the space science community. Many science teams in all disciplines could benefit from the technology that we propose to develop. The generic quality of this technology could impact distributed teams who must work together over data in almost all scientific disciplines, in
engineering, in business, and in education. (For example, students
could learn about the process of satellite image interpretation by
collaborating with students at other distant locations to obtain
"live" ground truth information.) While we are proposing a testbed in
a scientific context, we feel that the impact of the technology will
be much broader, affecting all collaborating teams in all manner of
activity.
[[end of copied material]]
Even with increasing bandwidths, there is still a need for
compression. Compression techniques have been developed to provide
compression ratios varying from as low as 10:1 to as high as 60,000:1.
Compression can be applied to text, sound, images (ranging from line drawings to animation or moving pictures), and video.
Ideally we do not want to lose any information in the sequence Compression, Transmission, Storage, Transmission, Decompression. Lossless compression is essential when even a one-bit change can make a crucial difference in the result, say when the equation E = mc^2 is changed to E = mc^3. Similar precision is needed for musical notation. In general, text has to be compressed without loss, because its redundancy is low (about 50%).
Formatting information associated with text may have more redundancy.
Voice, images, and video contain much redundancy, so that the loss of
a few bits may not be noticed, even though the receiver may still feel
uncomfortable about any loss. A radiologist, receiving an X-ray, will
be legitimately concerned about any loss. However, images used for
entertainment and education are deemed to be less critical, so that
here lossy compression dominates, since much
greater compression ratios can be achieved.
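The lossless requirement is easy to demonstrate with any general-purpose compressor; here is a sketch using Python's zlib (an arbitrary choice for illustration): redundant text shrinks dramatically, and decompression restores every bit.

```python
import zlib

text = b"E = mc^2 is checked to the last bit. " * 200
packed = zlib.compress(text, 9)          # maximum effort on the compress side

assert zlib.decompress(packed) == text   # lossless: bit-exact recovery
print(len(text), "bytes reduced to", len(packed))
```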
Compression does require computational capability. Since we expect that compression will be less frequent than decompression (material is read more often than written), most schemes put much computational effort into compression, and arrange the results so that decompression is fast, preferably as fast as the data can be received. The most powerful compression methods investigate an entire document for redundancy, create tables of recurring patterns, and then transmit first those tables and then the skeleton of the document, in which each occurrence of a pattern is replaced by a reference. The delay implicit in this process is significant, so that often the redundant information is determined dynamically, and the pattern entries are embedded in the document as they are found.
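The two-pass table scheme described above can be sketched in a few lines. This is a toy illustration, not any specific product's algorithm; it assumes the text contains no NUL characters and, for this simple reference encoding, no digits.

```python
from collections import Counter

def build_table(text, width=4, max_entries=8):
    """First pass: collect the most frequent fixed-width substrings."""
    counts = Counter(text[i:i + width] for i in range(len(text) - width + 1))
    return [pat for pat, c in counts.most_common(max_entries) if c > 1]

def compress(text):
    """Second pass: transmit the table, then the skeleton in which each
    occurrence of a table entry is replaced by a reference \\x00<index>."""
    table = build_table(text)
    skeleton = text
    for i, pat in enumerate(table):
        skeleton = skeleton.replace(pat, f"\x00{i}")
    return table, skeleton

def decompress(table, skeleton):
    """Expand each reference back into its pattern."""
    for i, pat in enumerate(table):
        skeleton = skeleton.replace(f"\x00{i}", pat)
    return skeleton

doc = "the cat and the dog and the bird watched the fish " * 4
table, skeleton = compress(doc)
assert decompress(table, skeleton) == doc
print(len(doc), "characters ->", len(skeleton), "plus the table")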
[[use material from CS545I lecture]]
Other techniques, such as wavelets and fractals, are also being incorporated into compression schemes, along with a variety of error detection and correction techniques.
Lossy compression effectively disables some digital protection techniques.
ENTEDU.Technology.indexing
Fig. Data-Loop: Updating of Data and Knowledge
ENTEDU.Support
One difference between entertainment and education has been in support, and that has made a difference in presentation:
Both domains seek support of benefactors, since taxes and receipts
often fall short of costs. Entertainment, when it can claim benefits for the public good, as with Public Broadcasting (PBS) and symphony orchestras, seeks tax support, while many of the best educational institutions levy substantial fees.
ENTEDU.History
Both entertainment and education have a long history, and we will make only some points that relate to the electronic highways.
ENTEDU.History.minstrels
Distribution of information by wandering individuals.
ENTEDU.History.colleges
Distribution and generation of information by groups.
ENTEDU.History.broadcasting
Distribution of information by dissemination.
Teaching
Shows
ENTEDU.Functions
[[copied material, to be massively edited for content ]]
NATIONAL CHALLENGE: Education, Training and Lifelong learning (ETLL) Education "White Paper"
The Challenge
The Task
It is time to replace the classical ETLL model with one that allows the individual to pursue new knowledge and skills in a manner that places the location, content and timing of the process fully under his or her control. The power and rapidly increasing availability of computers and digital communications now make it possible to realize this new model. Because the classical model is so entrenched, the change to a new concept will encounter scepticism and resistance on the part of many potential users as well as some established education and training institutions. Overcoming these obstacles will require careful planning, combined with compelling pilot implementations that clearly demonstrate the effectiveness, usability and economic advantage of the new approach. Effective planning to bring HPCC technology to bear on the ETLL challenge will require partnership with practitioners of the "soft sciences" who are expert in the areas of job and personal skill assessments (industrial psychologists). It will also be necessary to bring technological expertise together with educational psychologists and practicing teachers and trainers, both to learn how best to apply technology to the teaching/learning process and to assist the practitioners in developing confidence in its efficacy. There is also a major need for the technological community to reduce the amount of specialized knowledge necessary to access and employ applications designed to benefit education and training professionals and individuals pursuing independent learning objectives.
Role of computation, information processing, storage and communications in modernizing Education, Training and Lifelong Learning
Computation: A significant computational challenge is presented by the need to more carefully define the skills needed to perform real job functions, assess the level of relevant skills in employees and applicants, and define the course modules necessary to make up the difference. The skill assessment work that is being done today is still largely a craft, performed by highly trained professionals without significant technological support. As a result it does not scale to the nationwide application that is necessary in order for the new, skill-based ETLL model to become a reality. These processes need to be automated, which will require programs that are capable of dealing with large amounts of information and of performing processing that can resolve the ambiguities that still plague the "soft sciences." Probabilistic rather than deterministic processing will be necessary, suggesting the need for Artificial Intelligence or "Fuzzy Logic" applications. To automate and scale the current "expert" processes will demand both sophisticated software and high-capacity, high-speed computational hardware.
Relationship of this HPCC/IITA National Challenge activity to those of other agency activities
High-level interest in both the legislative and executive branches has stimulated vigorous activity in the ETLL area in virtually all government agencies: The Office of Science and Technology Policy Special Assistant for Education and Training is actively promoting technology-based training and coordinating the activities of interdisciplinary working groups and committees. The Department of Education is logically the focus of much of the current effort, particularly in the K-12 area. The establishment of a specific office for Educational Technology demonstrates the strength of their commitment, as does the intensity of their efforts to assist the State education agencies in adopting modern technology in their school systems.
ENTEDU.Functions.broadcast
ENTEDU.Functions.multicast
ENTEDU.Functions.on-demand
Here's a response I sent to Paul Losleben last week around the "education on-demand" concept and some approaches we are working on to deliver Stanford programming to industry. I've had a response from Ullman, Tobagi, and Harris and will be following up with them in the near future.
Andy DiPaolo
Assistant Dean, School of Engineering
--------------------------------------------------------------------
Paul,
I wanted to respond to your recent message regarding on-demand
education. Based on our discussions with engineering and
education/training managers at SITN's member companies you're right on
target. What we're hearing is that practicing engineers, and the
companies that pay for their education, want "control" of the teaching
and learning. They want control over the place and time (ideally at
the desktop or even at home), pace, and even the scope and sequence of
the material --- and not be constrained by the barriers imposed by the
traditional on-campus class. If you wish, I can send you an outline
of our assessment of the industry environment and engineering
education. The idea of smaller increments of instruction is something
SITN has been working on with Stanford faculty in the development of
non-credit short courses. These are programs that average five hours
in length and are typically broken into one hour increments. The
programs are taped and offered as a five hour course on satellite and
on video tape. In the future these courses could be divided up and
offered as modules available on-line from video servers. The products
(including regular 30 hour courses designed to be broken into stand
alone modules) could also be converted into CD-ROMs with the entire
course indexed for easy access. In fact, we have developed a CD-ROM
demo of a repurposed engineering class that might serve as a model for
future development.
Stanford faculty have offered about 15 short courses over the last 20
months in a range of engineering disciplines. Examples include: C++
(Cheriton), Digital Circuits (DeMicheli), T-CAD (Dutton), Composites
(Tsai), Cryptography (Hellman), Turbulence Modeling (Bradshaw) and
Design for Assembly (Barkan). Our target is to have at least one of
these a month, eventually ratcheting it up to three a month using both
Stanford faculty and distinguished industry experts.
As Jeff Ullman alluded to in his recent note, a few of us (Jeff, Fouad
Tobagi, Dale Harris, Dwain Fullerton et al) have been trying to figure
a way to run an experiment to get some of the School's televised
classes and short courses available to customers on-demand using video
servers. With the real possibility of some resources to get this idea
started (and Gibbons endorsement), I suggest those who are interested
gather to talk about next steps. It would probably make sense to have
a few potential industry customers in attendance at one of these
sessions so that they can provide a reality check about what we have
in mind-they are the ones who are going to pay to receive the
programming. For example, Hewlett-Packard's corporate engineering
education group is currently delivering our 250 courses to the
workstation as a live signal, but they are very interested in having a
menu driven "pull" system. Their idea is to have SITN repurpose
existing video product into smaller increments so that an H-P engineer
can "pull" modules of instruction or information when and where
needed. I know H-P would have an interest in discussing this concept
with Stanford engineering faculty. Since H-P represents over 50% of
the School's external engineering education business we should listen
to their suggestions. H-P and other SITN companies might also have
resources they wish to add to the mix.
ENTEDU.Functions.feedback
ENTEDU.Functions.collaboratory
Gio, we describe below some ideas regarding data base tools to capture
the value added to data during a collaboration session involving a
distributed team supported by electronic collaboration tools-the
collaboratory. These ideas would augment our ongoing collaboratory
development; however, database activities are not now supported. I look forward to your reaction and guidance as to how to proceed.
Best wishes, and warm regards, Bob
ENTEDU.Technology
Both entertainment and education require high transmission
bandwidth. There are two components to this issue:
ENTEDU.Technology.multimedia
The material we deal with in entertainment and education
contains text, images, and today often voice and video as well. Smell
is still rare. For each of those data representations there are a
variety of technological issues and solutions. We will deal with them
individually here, but must keep in mind that in the end an integrated
presentation is desired, where text, images, voice, and video are
synchronized. Achieving such synchronization over the shared communication lines of the Internet is still a major challenge.
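One common approach to such synchronization, sketched here purely as an illustration: timestamp each media unit at capture and delay playout by a fixed offset that covers network jitter, so that units of different streams carrying equal timestamps are presented together. The function names and numbers below are invented for this sketch.

```python
def choose_delay(observed_transit_times, margin=0.02):
    """Pick a playout delay covering the worst observed one-way transit
    time plus a safety margin (seconds); later packets are discarded."""
    return max(observed_transit_times) + margin

def playout_time(capture_ts, playout_delay):
    """Schedule a media unit for presentation: capture time plus the same
    fixed delay for every stream keeps text, audio, and video aligned."""
    return capture_ts + playout_delay

transits = [0.050, 0.081, 0.064]      # hypothetical one-way delays (s)
d = choose_delay(transits)

# A unit captured at t=10.0 s that arrives 120 ms later misses its
# playout deadline and is discarded rather than played out of sync:
arrival = 10.0 + 0.120
assert arrival > playout_time(10.0, d)
```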
By Iwasaki Ieo
Telemedia, Networks & Systems Group
Laboratory For Computer Science
Massachusetts Institute Of Technology
The first wave of media applications, i.e., those that simply copy and
store multimedia objects, will be followed by a second wave of computation-intensive applications that actively process the media-based information. These applications extend the requirement
for `video to the desktop' to a more general requirement for `media to
the application'.
ENTEDU.Technology.compression
Need
[[Copied material, to be massively edited]]
Garson emphasized the difficulties of transmitting images over the networks: a full-page image, uncompressed, could take 2 hours to transmit at 1200 baud, 14 minutes at 9600, 2.4 minutes at 56000, and 5 seconds on a T1 line.
MPEG
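The transmission times quoted for Garson's example are easy to check: they are consistent with an uncompressed page image of roughly 8.6 megabits (about one megabyte, an assumed size) divided by the line rate.

```python
def transmit_seconds(size_bits, rate_bps):
    """Time to move size_bits over a link of rate_bps (ignoring overhead)."""
    return size_bits / rate_bps

image_bits = 8.64e6                     # assumed ~1 MB uncompressed page
for rate in (1200, 9600, 56000, 1.544e6):   # last entry: T1
    print(f"{rate:>9.0f} b/s: {transmit_seconds(image_bits, rate):8.1f} s")
# 1200 -> 7200 s (2 hours); 9600 -> 900 s (15 min, vs. the quoted 14);
# 56000 -> ~154 s (~2.6 min); T1 -> ~5.6 s
```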
BERND WOLFINGER
Hamburg University, Germany
Computer Science Department
berndw@icsi.berkeley.edu
"Efficiency of PET and MPEG Encoding for Video Streams:
Analytical QoS Evaluations"
A promising solution in the transmission of video streams via
communication networks is to use forward error control in order to
mask some of the transmission errors and data losses at the receiving
side. The redundancy required, however, to achieve error correction without retransmissions will consume some of the transmission capacity of a network, thereby possibly forcing stronger compression of the video stream to be transmitted.
ENTEDU.Technology.redistribution
ENTEDU.Technology.sharing
Advanced Multimedia Database Tools to Support
Distributed Scientific Team Analysis and Collaboration
Elke A. Rundensteiner
Assistant Professor
Software Systems Research Laboratory
Electrical Engineering and Computer Science Dept.
The University of Michigan
Ann Arbor, Michigan 48109-2122
Phone: (313) 936-2971
Fax: (313)
Email: rundenst@eecs.umich.edu
Research Scientist
Atmospheric, Oceanic and Space Sciences Department
Ann Arbor, Michigan 48109-2143
Email: clauer@pitts.sprl.umich.edu
Terry Weymouth
Atul Prakash
ABSTRACT:
We would like to design, implement, test, and distribute a software
tool-kit that supports data analysis by a geographically distributed
science group.
To accomplish this level of functionality for the database tools, we
propose to develop a new type of dynamic metadata that should be saved
in a multimedia object-oriented database system. Note that our usage
of the term metadata does not simply refer to descriptive data about
the raw scientific data (e.g., netCDF, HDF), but also descriptive data
about the process context in which the raw scientific data appears
(i.e., context-sensitive metadata that captures the temporal and
distributed relationships among multimedia artifacts). Such
multimedia objects must be synchronized with the scientific data under
investigation, and possibly complex interrelationships must be
established among different types and groups of multimedia
annotations.
By providing multimedia database tools within this framework, we
envision that members of this research team will also be able to:
IMPACT
The proposed research into distributed multimedia object-oriented database tools supporting scientific collaboration will clearly make a major impact on the ease with which scientists, distributed geographically among several institutions, can advance their scientific interactions and generate publications from the studied data. Enabling teams of investigators to study science data collaboratively should increase their productivity. More importantly, while the collection of 'raw' scientific data is important in general, the collection of interpretations of such data by the experts will be a true value-added asset. Note that the augmentation of the archived science data sets with value-added interpretations generated by the experts will be a natural by-product of their scientific studies, rather than requiring a painstaking documentation effort by the scientists. Indeed, documentation tasks often receive a low priority because they are tedious, even though such tasks are often deemed important to the overall scientific inquiry.
ENTEDU.Technology.compression
Compression reduces the volume of data to be transmitted over
the networks and stored on storage devices by taking advantage of
the inherent redundancy in the representations of information that we
use. Redundancy in writing allows us to understand a sentence in the
presence of some typos or smudged print. Redundancy in speech allows
us to understand a message even when a slamming door causes us to miss
a word. A car ad with a staple in the center still conveys its
message. We can follow a film even if distracted for a minute.
However, for each of these scenarios we can construct instances where
the loss would be significant. For each of these media we have to
consider what is truly redundant and what is of marginal benefit to
the intended receiver.
Lossless Compression
The principle of lossless compression is to
examine the bit patterns of the data for repetitions, so that frequent
patterns can be encoded with shorter codes. [[to
document: page, word, related to indexing.
Example: Gtext, 750M words, 900,000 unique; compressible by word-based
Huffman coding to 20MB, plus a 10.5MB dictionary, but unique words can
be omitted (5MB) --> prefix tree 1MB ('canonical Huffman coding').
Index compressed to 6% by delta coding the bitmap (partially due to
doc. granularity; else about 15-20%)]]
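The Huffman coding mentioned in the note above can be sketched in a few lines: symbols are repeatedly merged from least frequent to most frequent, so that common symbols end up with short bit strings and rare ones with long bit strings. The sample text and sizes here are illustrative only, not the Gtext figures from the note.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code: frequent symbols get shorter bit strings."""
    freq = Counter(text)
    # Heap entries: (frequency, unique tiebreak, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

text = "this is an example of huffman coding"
codes = huffman_codes(text)
compressed_bits = sum(len(codes[ch]) for ch in text)
print(compressed_bits, "bits vs", 8 * len(text), "bits uncompressed")
```

The resulting code is prefix-free, so the bit stream can be decoded without separators; in practice a canonical form of the code (as in the note) is stored so that only code lengths, not the whole tree, need to be transmitted.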
Examples of lossless compression are the Graphics Interchange Format
(GIF) (8-bit color), often used in Web pages, and the BitMaP (BMP)
format (24-bit color) used initially by Microsoft Windows and IBM
OS/2. In GIF files each 8-bit byte points into a palette table of 256
colors. That palette, or a reference to some standard palette, is
transmitted with each image.
Lossy Compression
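The palette idea can be illustrated with a toy example (the three-color palette and pixel values below are made up): each pixel is stored as a one-byte index into the color table, rather than as three bytes of RGB, so the image body shrinks to a third at the cost of shipping the small palette alongside it.

```python
# Palette-indexed storage as used by GIF: each pixel is one byte that
# indexes a table of up to 256 RGB colors transmitted with the image.
# The palette and pixel values here are invented for illustration.
palette = [(255, 0, 0), (0, 128, 0), (0, 0, 255)]  # index -> RGB triple
pixels = bytes([0, 0, 1, 2, 1, 0])                 # six pixels, one byte each

decoded = [palette[i] for i in pixels]             # back to RGB triples
raw_size = 3 * len(pixels)                         # 18 bytes as 24-bit RGB
indexed_size = len(pixels) + 3 * len(palette)      # 6 + 9 = 15 bytes
```

For a realistic image with thousands of pixels the fixed palette cost is negligible, and the 3:1 body saving dominates; the limitation is that at most 256 distinct colors can appear.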
[[use material from CS545I lecture]]
Examples of lossy compression are the image JPEG format
established by the Joint Photographic Experts Group, also used in Web
pages, and the video MPEG formats established by the Moving Picture
Experts Group.
ENTEDU.Alternatives
ENTEDU.Bio
ENTEDU.Conclusion
ENTEDU.Lists
Fin