Stanford Computer Systems Laboratory Technical Reports from the 1980s

The authors have given permission to make their technical reports available on this server. If a report waspublished in print and is not here it may be that the author published it elsewhere.

Report Number: CSL-TR-80-182
Institution: Stanford University, Computer Systems Laboratory
Title: Design for autonomous test
Author: McCluskey, Edward J.
Author: Bozorgui-Nesbat, Saied
Date: June 1981
Abstract: A technique for modifying networks so that they are capable of self test is presented. The major innovation is partitioning the network into subnetworks with sufficiently few inputs that exhaustive testing of the subnetworks is possible. Procedures for reconfiguring the existing registers into modified linear feedback registers (LFSR's) which apply the exhaustive (not pseudo-random) test patterns or convert the responses into signatures are described. No fault models or test pattern generation programs are required. A method to modify CMOS circuits so that exhaustive testing can be used even when stuck-open faults must be detected is described. A detailed example using the 74181 ALU is presented.
CSL-TR-80-182

Report Number: CSL-TR-80-184
Institution: Stanford University, Computer Systems Laboratory
Title: Design automation at Stanford II
Author: vanCleemput, Willem M.
Date: February 1980
Abstract: This report contains a copy of the visual aids used by the authors during the presentation of their work at the Second Workshop on Design Automation at Stanford, held on Feb. 19, 1980. The topics covered range from circuit level simulation and integrated circuit process modelling to high level languages and design techniques. The presentations are a survey of the activities in design automation at Stanford University.
CSL-TR-80-184

Report Number: CSL-TR-80-189
Institution: Stanford University, Computer Systems Laboratory
Title: Center-based broadcasting
Author: Wall, David W.
Author: Owicki, Susan S.
Date: June 1980
Abstract: We consider the problem of routing broadcast messages in a loosely-coupled store-and-forward network like the ARPANET. Dalal discussed a solution to this problem that minimizes the cost of a broadcast; in contrast, we are interested in performing broadcast with small delay. Existing algorithms can minimize the delay but seem unsuitable for use in a distributed environment because they involve a high degree of overhead in the form of redundant messages or data-structure space. We propose the schemes of center-based forwarding; the routing of all broadcasts via the shortest-path tree for some selected node called the center. These algorithms have small delay and also are easy to implement in a distributed system. To evaluate center-based forwarding, we define four measures of the delay associated with a given broadcast mechanism, and then propose three ways of selecting a center node. For each of the three forms of center-based forwarding we compare the delay to the minimum delay for any broadcasting scheme and also to the minimum delay for any single tree. In most cases, a given measure of the delay on the centered tree is bounded by a small constant factor relative to either of these two minimum delays. When it is possible, we give a tight bound on the ratio between the center-based delay and the minimum delay; otherwise we demonstrate that no bound is possible. These results give corollary bounds on how bad the three centered trees can be with respect to each other; most of these bounds are immediately tight, and the rest are replaced by better bounds that are also shown to be tight.
CSL-TR-80-189

Report Number: CSL-TR-80-192
Institution: Stanford University, Computer Systems Laboratory
Title: Verifying network protocols using temporal logic
Author: Hailpern, Brent T.
Author: Owicki, Susan S.
Date: June 1980
Abstract: Programs that implement computer communications protocols can exhibit extremely complicated behavior, and neither informal reasoning nor testing is reliable enough to establish their correctness. In this paper we discuss the application of program verification techniques to protocols. This approach is more reliable than informal reasoning, but has the advantage over formal reasoning based on finite-state models that the complexity of the proof does not grow unmanageably as the size of the program increases. Certain tools of concurrent program verification that are especially useful for protocols are presented: history variables that record sequences of input and output values, temporal logic for expressing properties that must hold in a future system state (such as eventual receipt of a message), and module specification and composition rules. The use of these techniques is illustrated by verifying a simple data transfer protocol from the literature.
CSL-TR-80-192

Report Number: CSL-TR-80-193
Institution: Stanford University, Computer Systems Laboratory
Title: A language for microcode description and simulation in VLSI
Author: Hennessy, John L.
Date: July 1980
Abstract: This paper presents a programming language based system for specifying and simulating microcode in a VLSI chip. The language is oriented toward PLA implementation of microcoded machines using either a microprogram counter or a finite state machine. The system supports simulation of the microcode and will drive a PLA layout program to automatically create the PLA.
CSL-TR-80-193

Report Number: CSL-TR-81-201
Institution: Stanford University, Computer Systems Laboratory
Title: Research in VLSI systems design and architecture
Author: Baskett, Forest
Author: Clark, James
Author: Hennessy, John
Author: Owicki, Susan
Author: Reid, Brian
Date: March 1981
Abstract: The Computer Systems Laboratory has been involved in a VLSI research program for one and a half years. The major areas under investigation have included: analysis and synthesis design aids, applications of VLSI to computer graphics, the design of a personal workstation, special purpose chip design, VLSI computer architectures, and hardware specification and verification. Progress on these research problems is discussed, and a research program for the next two years is proposed.
CSL-TR-81-201

Report Number: CSL-TR-81-209
Institution: Stanford University, Computer Systems Laboratory
Title: Dynamic detection of concurrency in do-loops using ordering matrices
Author: Wedig, Robert G.
Date: May 1981
Abstract: This paper describes the data structures and techniques used in dynamically detecting concurrency in Directly Executed Language (DEL) instruction streams. By dynamic detection, it is meant that these techniques are designed to be used at run time with no special source manipulation or preprocessing required to perform the detection. An abstract model of a concurrency detection structure called an ordering matrix is presented. This structure is used, with two other execution vectors, to represent the dependencies between instructions and indicate where potential concurrency exists. An algorithm is developed which utilizes the ordering matrix to detect concurrency within determinate DO-loops. It is then generalized to detect concurrency in arbitrary del instruction streams.
CSL-TR-81-209

Report Number: CSL-TR-81-214
Institution: Stanford University, Computer Systems Laboratory
Title: An exponential failure/load relationship: results of a multi-computer statistical study
Author: Iyer, Ravishankar K.
Author: Butner, Steven E.
Author: McCluskey, Edward J.
Date: July 1981
Abstract: In this paper we present an exponential statistical model which relates computer failure rates to level of system activity. Our analysis reveals a strong statistical dependency of both hardware and software component failure rates on several common measures of utilization (specifically CPU utilization, I/O initiation, paging, and job-step initiation rates). We establish that this effect is not dominated by a specific component type, but exists across the board in the two systems studied. Our data covers three years of normal operation (including significant upgrades and reconfigurations) for two large Stanford University computer complexes. The complexes, which are composed of IBM mainframe equipment of differing models and vintage, run similar operating systems and provide the same interface and capability to their users. The empirical data domes from identically-structured and maintained failure logs at the two sites along with IBM OS/VS2 operating system performance/load records The statistically strong relationship between failures and load is evident for many equipment types, including electronic, mechanical, as well as software components. This is in opposition to the commonly-held belief that systems which are primarily electronic in nature exhibit no such effect to any significant degree. The exponential character of our statistical model is significantly not only in its simplicity, but also due to its compatibility with classical reliability techniques.
CSL-TR-81-214

Report Number: CSL-TR-81-219
Institution: Stanford University, Computer Systems Laboratory
Title: Consistency in interprocessor communications for fault-tolerant multiprocessors
Author: Fu, Peter Lincoln
Date: September 1981
Abstract: Consistency among processors is vital for fault-tolerant multiprocessors. This report describes modular communication interprocessor interface units which implement distributed consistency schemes such that failures within a single processor module cannot affect the consistency of data transferred among the remaining processors. Furthermore, one scheme provides concurrent and consistent self-diagnosis data on the integrity of the units themselves. Another scheme is tolerant to almost all failures within two processor modules. The theory of the schemes are explained and their implementations in LSI circuits are described in detail. The interprocessor communication structure defined by any of these schemes serves well as a critical element in highly reliable multiprocessor systems.
CSL-TR-81-219

Report Number: CSL-TR-81-221
Institution: Stanford University, Computer Systems Laboratory
Title: Parametric curves, surfaces and volumes in computer graphics and computer-aided geometric design
Author: Clark, James H.
Date: November 1981
Abstract: This document has four purposes. It is a tutorial in parametric curve and surface representations, it describes a number of algorithms for generating both shaded and line-drawn pictures of bivariate surfaces and trivariate volumes, it explicitly gives transformations between all of the widely used curve and surface representations, and it proposes a solution to the problem of displaying the results of three-dimensional flow-field calculations.
CSL-TR-81-221

Report Number: CSL-TR-81-223
Institution: Stanford University, Computer Systems Laboratory
Title: MIPS: a VLSI processor architecture
Author: Hennessy, John L.
Author: Jouppi, Norman
Author: Baskett, Forest
Author: Gill, John
Date: November 1981
Abstract: MIPS is a new single chip VLSI processor architecture. It attempts to achieve high performance with the use of a simplified instruction set, similar to those found in microengines. The processor is a fast pipelined engine without pipeline interlocks. Software solutions to several traditional hardware problems, such as providing pipeline interlocks, are used.
CSL-TR-81-223

Report Number: CSL-TR-81-224
Institution: Stanford University, Computer Systems Laboratory
Title: Code generation and reorganization in the presence of pipeline constraints
Author: Hennessy, John L.
Author: Gross, Thomas
Date: November 1981
Abstract: Pipeline interlocks are used in a pipelined architecture to prevent the execution of a machine instruction before its operands are available. An alternative to this complex piece of hardware is to rearrange the instructions at compile-time to avoid pipeline interlocks. This problem, called code reorganization, is studied. The basic problem of reorganization of machine level instructions at compile-time is shown to be NP-complete. A heuristic algorithm is proposed and its properties and effectiveness are explored. The impact of code reorganization techniques on the rest of a compiler system are discussed.
CSL-TR-81-224

Report Number: CSL-TR-81-225
Institution: Stanford University, Computer Systems Laboratory
Title: Automatic compiler code generation
Author: Ganapathi, Mahadevan
Author: Fischer, Charles N.
Author: Hennessy, John L.
Date: November 1981
Abstract: A classification of automatic code generation techniques and a survey of the work on these techniques is presented. Automatic code-generation research is classified into three categories: formal treatments, interpretive approaches and descriptive approaches. An analysis of these approaches and a critique of automatic code-generation algorithms are presented.
CSL-TR-81-225

Report Number: CSL-TR-81-226
Institution: Stanford University, Computer Systems Laboratory
Title: SILT: a VLSI design language
Author: Davis, Tom
Author: Clark, James
Date: October 1982
Abstract: SILT is an efficient, medium-level language to describe VLSI layout. Layout features are described in terms of a coordinate system based on the concept of relative geometry. SILT provides hierarchical cell description, a library format for parameterized cells with defaults for the parameters, constraint checking (but not enforcement), and some name control. It is designed to be used with a graphical interface, but can be used by itself.
CSL-TR-81-226

Report Number: CSL-TR-81-228
Institution: Stanford University, Computer Systems Laboratory
Title: Hardware/software tradeoffs for increased performance
Author: Hennessy, John L.
Author: Jouppi, Norman
Author: Baskett, Forest
Author: Gross, Thomas
Author: Gill, John
Date: February 1983
Abstract: Most new computer architectures are concerned with maximizing performance by providing suitable instruction sets for compiled code, and support for systems functions. We argue that the most effective design methodology must make simultaneous tradeoffs across all three areas: hardware, software support, and systems support. Recent trends lean toward extensive hardware support for both the compiler and operating systems software. However, consideration of all possible design tradeoffs may often lead to less hardware support. Several examples of this approach are presented, including: omission of condition codes, word-addressed machines, and imposing pipeline interlocks in software. The specifics and performance of these approaches are examined with respect to the MIPS processor.
CSL-TR-81-228

Report Number: CSL-TR-82-229
Institution: Stanford University, Computer Systems Laboratory
Title: The SUN workstation architecture
Author: Bechtolsheim, Andrew
Date: March 1982
Abstract: The Sun workstation is a personal computer system that combines graphics and networking capabilities with powerful local processing. The workstation has been developed for research in VLSI design automation, text processing, distributed operating systems and programming environments. Clusters of Sun workstations are connected via a local network sharing a network-based file system. The Sun workstation is based on a Motorola 6800 processor, has a 1024 by 800 pixel bitmap display, and uses Ethernet as its local network. The hardware supports virtual memory management, a "RasterOP" mechanism for high-speed display updates, and data-link-control for the Ethernet. The entire workstation electronics consists of 260 chips mounted on three 6.75 by 12 inch PC boards compatible with the IEEE 796 Bus (Intel Multibus). In addition to implementing a workstation, the boards have been configured to serve as network nodes for file servers, printer servers, network gateways, and terminal concentrators. The report discusses the architecture and implementation of the Sun workstation, gives the background and goals of the project, contemplates future developments, and describes in detail its three main components: the processor, graphics, and Ethernet boards.
CSL-TR-82-229

Report Number: CSL-TR-82-230
Institution: Stanford University, Computer Systems Laboratory
Title: Packet-voice communication on an Ethernet local computer network: an experimental study
Author: Gonsalves, Timothy A.
Date: February 1982
Abstract: Local computer networks have been used successfully for data applications such as file transfers for several years. Recently, there have been several proposals for using these networks for voice applications. This paper describes simple voice protocol for use on a packet-switching local network. This protocol is used in an experimental study of the feasibility of using a 3 Mbs experimental Ethernet network for packet-voice communications. This study shows that with appropriately chosen parameters the experimental Ethernet is capable of supporting about 40 simultaneous 64 Kbps voice conversations with acceptable quality. This corresponds to a utilization of 95% of the network capacity.
CSL-TR-82-230

Report Number: CSL-TR-82-231
Institution: Stanford University, Computer Systems Laboratory
Title: Dynamic detection of concurrency in DEL instruction streams
Author: Wedig, Robert G.
Date: February 1982
Abstract: Detection of concurrency in Directly Executed Languages (DEL) is investigated. It is theorized that if DELs provide a minimal time -space execution of serial programs, then concurrency detection of such instruction streams approaches the minimum execution time possible for a single task without resorting to algorithm restructuring or source manipulation. It is shown how DEL encodings facilitate the detection of concurrency by allowing early decoding and explicity detection of dependency information. The decoding and dependency algorithms as applied to DELs are developed in detail. Concurrency structures are presented which facilitate the detection process. Since all concurrency is capable of exploitation as soon as it is known that the code is to be executed, i.e., the result of the branch is known, it is proven that all explicit parallelism can be detected and exploited using the techniques developed.
CSL-TR-82-231

Report Number: CSL-TR-82-232
Institution: Stanford University, Computer Systems Laboratory
Title: Studies in microprocessor design
Author: Alpert, Donald
Date: June 1982
Abstract: Microprocessor design practice is briefly surveyed. Examples are given for high-level and low-level tradeoffs in specific designs with emphasis on integrated memory functions. Some relations between architectural complexity and design are discussed, and a simple model is presented for implementing a RISC-like architecture. A direction for microprocessor architecture is proposed to allow flexibility for designing with varying processing technologies, cost goals, and performance goals.
CSL-TR-82-232

Report Number: CSL-TR-82-233
Institution: Stanford University, Computer Systems Laboratory
Title: Yale user's guide: a SILT-based layout editor
Author: Davis, Tom
Author: Clark, James
Date: October 1982
Abstract: YALE is a layout editor which runs on SUN workstations, and deals with cells expressed in the SILT language. It provides graphical hooks into many features describable in SILT. YALE runs under the V kernel, and makes use of a window manager than provides a multiple viewpoint capability.
CSL-TR-82-233

Report Number: CSL-TR-83-236
Institution: Stanford University, Computer Systems Laboratory
Title: Design of a high performance VLSI processor
Author: Hennessy, John L.
Author: Jouppi, Norman
Author: Przybylski, Steven
Author: Rowen, Christopher
Author: Gross, Thomas
Date: February 1983
Abstract: Current VLSI fabrication technology makes it possible to design a 32-bit CPU on a single chip. However, to achieve high performance from that processor, the architecture and implementation must be carefully designed and tuned. The MIPS processor incorporates some new architectural ideas into a single-chip, nMOS implementation. Processor performance is obtained by the careful integration of the software (e.g., compilers), the architecture, and the hardware implementation. This integrated view also simplifies the design, making it practical to implement the processor at a university.
CSL-TR-83-236

Report Number: CSL-TR-83-240
Institution: Stanford University, Computer Systems Laboratory
Title: ADAM: an ADA-based language for multiprocessing
Author: Luckham, David C
Author: von Henke, Frederick W.
Author: Larsen, H. J.
Author: Stevenson, D. R.
Date: May 1983
Abstract: Adam is a high level language for parallel processing. It is intended for programming resource scheduling applications, in particular supervisory packages for runtime scheduling of multiprocessing systems. An important design goal was to provide support for implementation of Ada and its runtime environment. Adam has been used to implement Ada task supervision and also as a high level target language for compilation of Ada tasking. Adam provides facilities that match the Ada sequential constructs (including subprograms, packages, exceptions, generics). In addition there are specialized module constructs for implementation of packages that may be shared between parallel processes. Adam omits the Ada real types but includes some new predefined types for scheduling. The parallel processing constructs of Adam are more primitive than Ada tasking. Some restrictions are enforced on the ways in which parallel processes can interact. A compiler for Adam has been implemented in MacLisp on DEC PDP-10 computers. Runtime support packages in Adam for scheduling (on a single CPU) and I/O are also provided. The compile contains a library manipulation facility for separate compilation. The Adam compiler has been used to build an Ada compiler for most of the July 1980 Ada language design including task types and rendezvous constructs. This was achieved by implementing algorithms translating Ada tasking into Adam parallel processing as a preprocessor to the Adam compiler. This present Ada compiler, which has been operational since December 1980, uses a procedure call implementation of tasking (due to Haberman and Nassi and to Stevenson). It can be easily modified to other implementations. Compilation of Ada tasking into a high level target language such as Adam facilitates studying questions of correctness and efficiency of various compilation algorithms, and code optimizations specific to tasking, e.g. elimination of unnecessary thread of control. This paper gives an overview of Adam and examples of its use. Emphasis is placed on the differences from Ada. Experience using Adam to build the experimental Ada system is evaluated. Design of runtime supervisors in Adam and algorithms for translating Ada tasking to Adam processing are discussed in detail.
CSL-TR-83-240

Report Number: CSL-TR-83-242
Institution: Stanford University, Computer Systems Laboratory
Title: Fault simulation using ADLIB-SABLE
Author: Ghosh, Sumit
Author: vanCleemput, Willem
Date: March 1983
Abstract: This technical report presents work in the area of deductive fault simulation. This technique, one of the three fault simulation techniques discussed in the literature, has been implemented in ADLIB-SABLE, a hierarchical multi-level simulator designed and used at Stanford University. Most of the fault models illustrated in this report consider only two fault types: single stuck-at-0 and single stuck-at-Z (high impedance). Gate level fault models have been built for most commonly used gates. The ability to model the fault behavior of functional blocks in ADLIB-SABLE is also demonstrated. The motivation is that for many functional blocks, a gate level description may not be available or that the designer wishes to sacrifice detailed analysis for a higher simulation speed. Functional fault models are built for many commonly used blocks, using a decomposition technique. The ratio of functional fault simulation speed to gate level fault simulation speed has been observed to be of the order of 5 for the typical functional block sizes considered. The ratio however, is not the upper limit and will be larger for larger-sized functional blocks. It was also proved that the functional fault models are invariant with respect to the internal implementation details. A design discipline for sequential circuits is worked out which allows deductive fault simulation. Extensions to the simple (0,1) deductive techniques are studied and the fault models built in the extended domain are observed to be useful in modelling gates of some technologies. A comparison between deductive and concurrent fault simulation methods is given. Performance of deductive fault simulation, implemented in ADLIB-SABLE, shows that for sequential as well as combinational circuits, the CPU time increases linearly with increasing number of components simulated, an advantage over fault simulators which simulate one fault at a time and display a quadratic behavior.
CSL-TR-83-242

Report Number: CSL-TR-83-244
Institution: Stanford University, Computer Systems Laboratory
Title: High speed image rasterization using a highly parallel smart bulk memory
Date: June 1983
Abstract: VLSI technology allows the efficient realization of a class of highly parallel architectures consisting of high density semiconductor memory with an on-chip processor which accesses the memory in large sections simultaneously. A processor is described which uses this architecture to rasterize lines, polygons and text quickly, providing the rasterization support required in high performance graphic raster displays and fast page printers. This on-chip processor translates high-level low bandwidth commands into low-level high bandwidth actions on chip, where the high bandwidth can be tolerated. This architecture is capable of achieving performance comparable to the "processor per pixel" approaches while avoiding the tremendous density penalty incurred by such approaches. Consequently, it is practical to build a very high performance high resolution system from a small number of these chips.
CSL-TR-83-244

Report Number: CSL-TR-83-245
Institution: Stanford University, Computer Systems Laboratory
Title: EDT - a syntax-based program editor reference manual
Author: Finlayson, Ross S.
Date: July 1983
Abstract: This report describes an experimental syntax-based editor that has recently been developed at Stanford. Syntax-based editors are unlike conventional text editors in that they use knowledge of the syntactic structure of the item (typically a program) being edited, to provide "high level" editing operations. The editor described in this report is currently being used as an editor for programs written in Ada. Other programming languages could also be handled, by replacing the appropriate language definition files by those for another language.
CSL-TR-83-245

Report Number: CSL-TR-83-247
Institution: Stanford University, Computer Systems Laboratory
Title: Maintaining the time in a distributed system
Author: Marzullo, Keith
Author: Owicki, Susan
Date: August 1983
Abstract: To a client, one of the simplest services provided by a distributed system is a time service. A client simply requests the time from any set of servers, and uses any reply. The simplicity in this interaction, however, misrepresents the complexity of implementing such a service. An algorithm is needed that will keep a set of clocks synchronized, reasonably correct and accurate with respect to a standard, and able to withstand errors such as communication failures and inaccurate clocks. This paper presents a partial solution to the problem by describing two algorithms which will keep clocks both correct and synchronized.
CSL-TR-83-247

Report Number: CSL-TR-83-249
Institution: Stanford University, Computer Systems Laboratory
Title: Runtime and description of deadness errors in ADA tasking
Author: Helmbold, D.
Author: Luckham, David C.
Date: November 1983
Abstract: A routine monitoring system for detecting and describing tasking errors in Ada programs is presented. Basic concepts for classifying tasking errors, called deadness errors, are defined. These concepts indicate which aspects of an Ada computation must be monitored in order to detect deadness errors resulting from attempts to rendezvous or terminate. They also provide a basis for the definition and proof of correct detection. Descriptions of deadness errors are given in terms of the basic concepts. The monitoring system has two parts: (1) a separately compiled runtime monitor that is added to any Ada source to be monitored, and (2) a preprocessor that transforms the Ada source so that necessary descriptive data is communicated to the monitor at runtime. Some basic preprocessing transformations and an abstract monitoring for a limited class of errors were previously presented. Here an Ada implementation of a monitor and a more extensive set of preprocessing transformations are described. This system provides an experimental automated tool for detecting deadness errors in Ada83 tasking and supplies useful diagnostics. The use of the runtime monitor for debugging and for programming evasive actions to avoid imminent errors is described and examples of experiments are given.
CSL-TR-83-249

Report Number: CSL-TR-83-250
Institution: Stanford University, Computer Systems Laboratory
Title: Data buffers for execution architectures
Author: Alpert, Donald
Date: November 1983
Abstract: Directly Executed Language (DEL) architectures are derived from idealized representations of high-level languages. DEL architectures show dramatic reduction in the number of instructions and memory references executed when compared to traditional architectures, offering the design considerations for the data buffer in a DEL microprocessor. Simulation techniques were used to evaluate the performance of different sized buffers for a set of Pascal test programs. The results show that a buffer with 256 words typically faults on less than 5% of storage allocations.
CSL-TR-83-250

Report Number: CSL-TR-83-251
Institution: Stanford University, Computer Systems Laboratory
Title: GEM: a tool for concurrency specification and verification
Author: Lansky, Amy
Author: Owicki, Susan
Date: November 1983
Abstract: The GEM model of concurrent computation is presented. Each GEM computation consists of a set of partially ordered events, and represents a particular concurrent execution. Language primitives for concurrency, code segments, as well as concurrency problems may be described as logic formulae (restrictions) on the domain of possible GEM computations. An event-oriented method of program verification is also presented. GEM is unique in its ability to easily describe and reason about synchronization properties.
CSL-TR-83-251

Report Number: CSL-TR-83-253
Institution: Stanford University, Computer Systems Laboratory
Title: Evaluation of an interpreted architecture for Pascal on a personal computer
Author: Mitchell, Chad Leland
Date: December 1983
Abstract: This report describes the design and implementation of an interpreter on a personal computer. The architecture interpreted was specifically designed for the execution of Pascal and belongs to the class of architecture known as Direct Correspondence Architectures. The evaluation of the interpreter provides information about the suitability of the host for this architecture and identifies features of the architecture which are not adequately supported by the host.
CSL-TR-83-253

Report Number: CSL-TR-84-256
Institution: Stanford University, Computer Systems Laboratory
Title: Instruction selection by attributed parsing
Author: Ganapathi, Mahadevan
Author: Fischer, Charles N.
Date: February 1984
Abstract: Affix grammars are used to describe the instruction-set of a target architecture for purposes of compiler code generation. A code generator is obtained automatically for a compiler using attributed parsing techniques. A compiler built on this model can automatically perform most popular machine-dependent optimizations, including peephole optimizations. Implementations of code generators based on this model exist for the VAX-11, iAPX-86, Z -8000, PDP-ll and IBM-370 architectures.
CSL-TR-84-256

Report Number: CSL-TR-84-257
Institution: Stanford University, Computer Systems Laboratory
Title: Reverse synthesis compilation for architectural research
Author: Ganapathi, Mahadevan
Author: Hennessy, John
Author: Sarkar, Vivek
Date: March 1984
Abstract: This paper discusses the development of compilation strategies for DEL architectures and tools to assist in the evaluation of their efficiency. Compilation is divided into a series of independent simpler problems. To explore optimization of code for DEL compilers, two intermediate representations are employed. One of these representations is at a lower level than target machine instructions. Machine-independent optimization is performed on this intermediate representation. The other intermediate representation has been specifically designed for compiler retargetability It is at a higher level than the target machine. Target code generation is performed by reverse synthesis followed by attributed parsing. This technique demonstrates the feasibility of using automated table-driven code generation techniques for inflexible architectures.
CSL-TR-84-257

Report Number: CSL-TR-84-258
Institution: Stanford University, Computer Systems Laboratory
Title: A strongly typed language for specifying programs
Author: Henke, Friedrich W. von
Date: January 1984
Abstract: A language for specifying and annotating programs is presented. The language is intended to be used in connection with a strongly typed programming language. It provides a framework for the definition of specification concepts and the specification of programs by means of assertions and annotations. The language includes facilities for defining concepts axiomatically and to group definitions of related concepts and derived properties (lemmas) in theories. All entities in the language are required to be strongly typed; however, the language provides a very flexible type system which includes polymorphic (or generic) types. The paper presents a type checking algorithm for the language and discusses the relationship between specification language and programming language.
CSL-TR-84-258

Report Number: CSL-TR-84-259
Institution: Stanford University, Computer Systems Laboratory
Title: Organization and VLSI implementation of MIPS
Author: Przybylski, Steven A.
Author: Gross, Thomas R.
Author: Hennessy, John L.
Author: Jouppi, Norman P.
Author: Rowen, Christopher
Date: April 1984
Abstract: MIPS is an 32-bit, high performance processor architecture implemented as an nMOS VLSI chip. The processor uses a low level, streamlined instruction set coupled with a fast pipeline to achieve an instruction rate of two million instructions per second. Close interaction between the processor design and compilers for the machine yields efficient execution of programs on the chip. Simplifying the instruction set and the requirements placed on the hardware by the architecture, facilitates both processor control and interrupt handling in the pipeline. High speed MOS circuit design techniques and a sophisticated timing methodology enable the processor to achieve a 250nS clock cycle.
CSL-TR-84-259

Report Number: CSL-TR-84-261
Institution: Stanford University, Computer Systems Laboratory
Title: ANNA: a language for annotating ADA programs
Author: Luckham, David C.
Author: Henke, Friedrich W. von
Author: Krieg-Brueckner, Bernd
Author: Owe, Olaf
Date: July 1984
Abstract: ANNA is a proposed language extension of Ada to include facilities for formally specifying the intended behavior of Ada programs (or portions thereof) at all stages of program development. Anna programs are Ada programs extended by formal comments. Formal comments in ANNA consist of virtual Ada text and annotations. Anna provides annotations for all Ada constructs, including declarative annotations (for variables, subtypes, subprograms, and packages), statement annotations, annotations of generic units, exception annotations and visibility annotations. (The current Anna design does not include extensions for annotating Ada multi-tasking constructs.) Anna also includes a small number of new predefined attributes, which may appear only in annotations, e.g. the collection attribute of an access type. Since all Anna extensions appear as Ada comments, Anna programs are also legal Ada programs and acceptable by Ada translators. The semantics of annotations are defined in terms of Ada concepts; in particular, many kinds of annotations are generalizations of the Ada constraint concept. This simplifies the training of Ada programmers to use Anna for formal specification of Ada programs. Anna provides a formal framework within which different theories of formal specification may be applied to Ada. This manual also describes a translation of annotations into Ada text for run-time check of consistency with annotations.
CSL-TR-84-261

Report Number: CSL-TR-84-262
Institution: Stanford University, Computer Systems Laboratory
Title: DEBUGGING ADA TASKING PROBLEMS
Author: Helmhold, David
Author: Luckham, David
Date: July 1984
Abstract: A new class of errors, not found in sequential languages, can result when the tasking constructs of Ada are used. These errors are called deadness errors and arise when task communication fails. Since deadness errors often occur intermittently, they are particularly hard to detect and diagnose. Previous papers describe the theory and implementation of runtime monitors to detect deadness errors in tasking programs. The problems of detection and description of errors are different. Even when a dead state is detected, giving adequate diagnostics that enable the programmer to locate its cause in the Ada text is difficult. This paper discusses the use of simple diagnostic descriptions based on Ada tasking concepts. These diagnostics are implemented in an experimental runtime monitor. Similar facilities could be implemented in task debuggers in forthcoming Ada support environments. Their usefulness and shortcomings are illustrated in an example experiment with the runtime monitor. Possible future directions in task error monitoring and diagnosis based on formal specifications are discussed.
CSL-TR-84-262

Report Number: CSL-TR-84-265
Institution: Stanford University, Computer Systems Laboratory
Title: An overview of ANNA - a specification language for ADA
Author: Luckham, David
Author: Henke, Friedrich W. von
Date: September 1984
Abstract: A specification language permits information about various aspects of a program to be expressed in a precise machine processable form. This information is not normally part of the program itself. Specification languages are viewed as evolving from modern high level programming languages. The first step in this evolution is cautious extension of the programming language. Some of the features of Anna, a specification language extending Ada, are discussed. The extensions include generalizations of constructs (such as type constraints) that are already in Ada, and new constructs for specifying subprograms, packages, exceptions, and contexts.
CSL-TR-84-265

Report Number: CSL-TR-85-270
Institution: Stanford University, Computer Systems Laboratory
Title: A Model and Temporal Proof System for Networks of Processes
Author: Nguyen, Van
Author: Gries, David
Author: Owicki, Susan
Date: February 1985
Abstract: A model and a sound and complete proof system for networks of processes in which component processes communicate exclusively through messages is given. The model, an extension of the trace model, can describe both synchronous and asynchronous networks. The proof system uses temporal-logic assertions on sequences of observations - a generalization of traces. The use of observations traces makes the proof system simple, compositional and modular, since internal details can be hidden. The expressive power of temporal logic makes it possible to prove temporal properties (safety, liveness, precedence, etc.) in the system. The proof system is language-independent and works for both synchronous and asynchronous networks.
CSL-TR-85-270

Report Number: CSL-TR-86-289
Institution: Stanford University, Computer Systems Laboratory
Title: MIPS-X instruction set and programmer's manual
Author: Chow, Paul
Date: May 1986
Abstract: MIPS-X is a high performance second generation reduced instruction set microprocessor. This document describes the visible architecture of the machine, the basic timing of the instructions, and the instruction set.
CSL-TR-86-289

Report Number: CSL-TR-86-298
Institution: Stanford University, Computer Systems Laboratory
Title: Parallel program behavior - specification and abstraction using BDL
Author: Yan, Jerry C.
Date: August 1986
Abstract: This paper describes the syntax, semantics, and usage of BDL - a Behavior Description Language for concurrent programs. BDL program models can be used to describe and abstract the behavior of real programs formulated in various computation paradigms (such as CSP, remote procedures, data-flow, actors, etc.). BDL models are constructed from abstract computing entities known as "players". The models can behave as closely as possible to the actual program in terms of message passing, player creation and cpu usage. Although behavior abstraction using BDL only involves identifying the "redundant part" of the computation and replacing them with simple "NO-OP" statements, proper application of this technique remains difficult and requires a thorough understanding of how the program is architectured. Simulating BDL models is much more economical than instruction level emulation while program behavior is realistically preserved.
CSL-TR-86-298

Report Number: CSL-TR-86-300
Institution: Stanford University, Computer Systems Laboratory
Title: An overview of the MIPS-X-MP project
Author: Hennessy, John L.
Author: Horowitz, Mark A.
Date: April 1986
Abstract: MIPS-X-MP is a research project whose end goal is to build a small (workstation-sized) multiprocessor with a total throughput of 100-200 mips. The architectural approach uses a small number (tens) of high performance RISC-based microprocessors (10-20 mips each) The multiprocessor architecture uses software-controlled cache coherency to allow cooperation among processors without sacrificing performance of the processors. Software technology for automatically decomposing problems to allow the entire machine to be concentrated on a single problem is a key component of the research. This report surveys the four key components of the project: high performance VLSI processor architecture and design, multiprocessor architectural studies, multiprocessor programming systems, and optimizing compiler technology.
CSL-TR-86-300

Report Number: CSL-TR-86-301
Institution: Stanford University, Computer Systems Laboratory
Title: The complete transformation methodology for sequential runtime checking of an ANNA subset
Author: Sankar, Sriram
Author: Rosenblum, David
Date: June 1986
Abstract: We present in this report a complete description of a methodology for transformation of Anna (Annotated Ada) programs to executable self-checking Ada programs. The methodology covers a subset of Anna which allows annotation of scalar types and objects. The allowed annotations include subtype annotations, subprogram annotations, result annotations, object annotations, out annotations and statement annotations. Except for package state expressions and quantified expressions, the full expression language of Anna is allowed in the subset. The transformation of annotations to executable checking functions is thoroughly illustrated through informal textual description, universal checking function templates and several transformation examples. We also describe the transformer and related software tools used to transform Anna programs. In conclusion, we describe validation of the transformer and some methods of making the transformation and runtime checking processes more efficient.
CSL-TR-86-301

Report Number: CSL-TR-86-303
Institution: Stanford University, Computer Systems Laboratory
Title: The semantics of timing constructs in hardware description languages
Author: Luckham, David C.
Author: Huh, Youm
Author: Stanculescu, Alec G.
Date: August 1986
Abstract: Three different approaches to the representation of time in high level hardware design languages are described and compared. The first is the timed assignment statement of ADLIB/SABLE which anticipates future events. The second is the timed assignment of VHDL which predicts future events and allows predictions to be preempted by other predictions. The third is a new proposed method of expressing time dependency by qualifying expressions so that their values are required to be constant over a specified time interval. Examples comparing these three approaches are given. It is shown how time-qualified expressions could be introduced into a hardware description language. The possibility of proving correctness of hardware models in this language is illustrated.
CSL-TR-86-303

Report Number: CSL-TR-86-306
Institution: Stanford University, Computer Systems Laboratory
Title: Queueing network models for parallel processing of task systems: an operational approach
Author: Mak, Victor W.K.
Date: September 1986
Abstract: Computer performance modeling of possibly complex computations running on highly concurrent systems is considered. Earlier works in this area either dealt with a very simple program structure or resulted in methods with exponential complexity. A computationally efficient approximate solution method is developed to compute the performance measures for series-parallel-reducible task systems using queueing network models. Numerical results for a number of test cases are presented and compared to those of simulations.
CSL-TR-86-306

Report Number: CSL-TR-86-307
Institution: Stanford University, Computer Systems Laboratory
Title: A survey of concurrent architectures
Author: Mak, Victor W.K.
Date: September 1986
Abstract: A survey of 18 different concurrent architectures is presented in this report. Although this is by no means complete, it does cover a wide spectrum of both commercial and research architectures. A scheme is proposed to describe concurrent architectures using different dimensions: models of computation, interconnection network, processing element, memory system, and application areas.
CSL-TR-86-307

Report Number: CSL-TR-86-309
Institution: Stanford University, Computer Systems Laboratory
Title: Design of testbed and emulation tools
Author: Lundstrom, Stephen F.
Author: Flynn, Michael J.
Date: September 1986
Abstract: The research summarized in this report was concerned with the design of testbed and emulation tools suitable to assist in projecting, with reasonable accuracy, the expected performance of highly concurrent computing systems on large, complete applications. Such testbed and emulation tools are intended for the eventual use of those exploring new concurrent system architectures and organizations, either as users or as designers of such systems. While a range of alternatives was considered, a software-based set of hierarchical tools was chosen to provide maximum flexibility, to ease in moving to new computers as technology improves and to take advantage of the inherent reliability and availability of commercially available computing systems.
CSL-TR-86-309

Report Number: CSL-TR-86-310
Institution: Stanford University, Computer Systems Laboratory
Title: Dynamic resource allocation in a hierarchical multiprocessor system - a preliminary report
Date: October 1986
Abstract: In this report, an integrated system approach to dynamic resource allocation is proposed. Some of the problems in dynamic resource allocation and the relationship of these problems to system structures are examined. A general dynamic resource allocation scheme is presented. A hierarchical system architecture which dynamically maps between processor structure and programs at multiple levels of instantiations is described. Simulation experiments have been conducted to study dynamic resource allocation on the proposed system. Preliminary evaluation based on simple dynamic resource allocation algorithms indicates that with the proposed system approach, the complexity of dynamic resource management could be significantly reduced while achieving reasonably effective dynamic resource allocation.
CSL-TR-86-310

Report Number: CSL-TR-87-314
Institution: Stanford University, Computer Systems Laboratory
Title: Post-game analysis -- an initial experiment for heuristic-based resource management in concurrent systems
Author: Yan, Jerry C
Date: February 1987
Abstract: In concurrent systems, a major responsibility of the resource management system is to decide how the application program is to be mapped onto the multi-processor. Instead of using abstract program and machine models, a generate-and-test framework known as "post-game analysis" that is based on data gathered during program execution is proposed. Each iteration consists of (i) (a simulation of) an execution of the program; (ii) analysis of the data gathered; and (iii) the proposal of a new mapping that would have a smaller execution time. These heuristics are applied to predict execution time changes in response to small perturbations applied to the current mapping. An initial experiment was carried out using simple strategies on "pipeline-like" applications. The results obtained from four simple strategies demonstrated that for this kind of application, even simple strategies can produce acceptable speed-up with a small number of iterations.
CSL-TR-87-314

Report Number: CSL-TR-87-326
Institution: Stanford University, Computer Systems Laboratory
Title: SRT division diagrams and their usage in designing intergrated circuits for division
Author: Williams, Ted E.
Author: Horowitz, Mark
Date: November 1986
Abstract: This paper describes the construction and analysis of several diagrams which depict SRT division algorithms. These diagrams yield insight into the operation of the algorithms and the many implementation tradeoffs available in custom circuit design. Examples of simple low radix diagrams are shown, as well as tables for higher radices. The tables were generated by a program which can create and verify the diagrams for different division schemes. Also discussed is a custom CMOS integrated circuit designed which performs SRT division using self-timed circuit techniques. This chip implements an intermediate approach between a fully combinational array and a fully iterative in time method in order to get both speed and small silicon area.
CSL-TR-87-326

Report Number: CSL-TR-87-333
Institution: Stanford University, Computer Systems Laboratory
Title: Managing and measuring two parallel programs on a multiprocessor
Author: Yan, Jerry C
Date: June 1987
Abstract: Research is being conducted to determine how distributed computations can be mapped onto multiprocessors so as to minimize execution time. Instead of employing optimization techniques based on some abstract program/machine models, the approach being investigated here (called "post-game analysis") is based on placement heuristics which utilizes program execution history. Although initial experiments have demonstrated that "post-game analysis" indeed discovered mappings that exhibit significantly shorter execution times than the worst cases for the programs tested, three important issues remain to be addressed: i) the need to evaluate the performance of placement heuristics against the "optimal" speed-up attainable, ii) to find evidence to help explain why these heuristics work and iii) to develop better heuristics by understanding how and why the basic set performed well. Parallel program execution was simulated using "Axe" -- an integrated environment for computation model description, processor architecture specification, discrete-time simulation and automated data collection. Five groups of parameters are measured representing different aspects in the concurrent execution environment: (i) overall measurements, (ii) communication parameters, (iii) cpu utilization, (iv) cpu contention and (v) dependencies between players. Two programs were simulated -- a "pipe-line" of players and a "divide-and-conquer" program skeleton. The results showed that program execution time indeed correlated well with some of the parameters measured. It was also shown that "post-game" analysis achieved close to 96% optimal speed-up for both programs in most cases.
CSL-TR-87-333

Report Number: CSL-TR-87-335
Institution: Stanford University, Computer Systems Laboratory
Title: Allocations of Objects Considered as Nondeterministic Expressions - Towards a More Abstract Axiomatics of Access Types
Author: Meldal, Sigurd
Date: September 1987
Abstract: The concept of access ("reference" or "pointer") values is formalized as parametrized abstract data types, using the axiomatic method of Guttag and Horning as extended by Owe. Two formalizations are given. The first is a formalization of the approach used in the definition of a partial correctness system for Pascal by Hoare and Wirth. Its lack of abstraction is pointed out. This is caused by the annotation language being too expressive. An approach is taken which results in a more abstract system: The expressiveness of the annotation language is reduced and the allocation operator is viewed as a nondeterministic expression. This reinterpretation of the program language results in an appropriate level of abstraction of the proof system. An example is given, verification of a package defining a set type.
CSL-TR-87-335

Report Number: CSL-TR-87-337
Institution: Stanford University, Computer Systems Laboratory
Title: Design of testbed and emulation tools
Author: Flynn, Michael J.
Author: Lundstrom, Stephen
Date: October 1987
Abstract: In order to understand how to predict the performance of concurrent computing systems, an experimental environment is needed. The purpose of the research conducted under the grant was to investigate various aspects of this environment. A first performance prediction system was developed and evaluated (by comparison both with simulations and with actual systems). The creation of a second, complementary system is well underway.
CSL-TR-87-337

Report Number: CSL-TR-87-338
Institution: Stanford University, Computer Systems Laboratory
Title: Sparse, distributed memory prototpe: principles of operation
Author: Flynn, Michael J.
Author: Kanerva, Pentti
Author: Ahanin, Bahram
Author: Flaherty, Paul A.
Author: Hickey, Philip
Author: Bhadkamkar, Neal A.
Date: February 1988
Abstract: Sparse distributed memory is a generalized random-access memory (RAM) for long (e.g., 1,000 bit) binary words. Such words can be written into and read from the memory, and they can also be used to address the memory. The main attribute of the memory is sensitivity to similarity, meaning that a word can be read back not only by giving the original write address but also by giving one close to it as measured by the Hamming distance between addresses. Large memories of this kind are expected to have wide use in speech and scene analysis, in signal detection and verification, and in adaptive control of automated equipment---in general, in dealing with real-world information in real time. The memory can be realized as a simple, massively parallel computer. Digital technology has reached a point where building large memories is becoming practical. This research project is aimed at resolving major design issues that have to be faced in building the memories. This report describes the design of a prototype memory with 256-bit addresses and from 8K to 128K locations for 256-bit words. A key aspect of the design is extensive use of dynamic RAM and other standard components.
CSL-TR-87-338

Report Number: CSL-TR-87-339
Institution: Stanford University, Computer Systems Laboratory
Title: MIPS-X: the external interface
Author: Salz, Arturo
Author: Agarwal, Anant
Author: Chow, Paul
Date: November 1987
Abstract: MIPS-X is a 20-MIPS-peak VLSI processor designed at Stanford University. This document describes the external interface of MIPS-X and the organization of the MIPS-X processor system, including the external cache and coprocessors. The external interface has been designed to optimize the paths between the processor, the external cache and the coprocessors. The signals used by the processor and their timing are documented here. Signal use and timings during exceptions and cache misses are also shown.
CSL-TR-87-339

Report Number: CSL-TR-87-342
Institution: Stanford University, Computer Systems Laboratory
Title: Interprocedural analysis useless for code optimization
Author: Richardson, S.
Author: Ganapathi, M.
Date: November 1987
Abstract: The problem of tracking data flow across procedure boundaries has a long history of theoretical study by people who believed that such information would be useful for code optimization. Building upon previous work, we have implemented an algorithm for interprocedural data flow analysis. The algorithm produces three flow-insensitive summary sets: MOD, USE, and ALIASES. The utility of the resulting information was investigated using an optimizing Pascal compiler. Over a sampling of 27 benchmarks, we found that additional optimizations performed as a result of interprocedural summary information contributed almost nothing to program execution speed.
CSL-TR-87-342

Report Number: CSL-TR-88-347
Institution: Stanford University, Computer Systems Laboratory
Title: Trace compaction using cache filtering with blocking
Author: Agarwal, Anant
Date: December 1987
Abstract: Trace-driven simulation is a popular method of estimating the performance of cache memories, translation lookaside buffers, and paging schemes. Because the cost of trace-driven simulation is directly proportional to trace length, reducing the number of references in the trace significantly impacts simulation time. This paper concentrates on trace-driven simulation for cache analysis. A technique called cache filtering with blocking is presented that compresses traces by exploiting both the temporal and spatial locality in the trace. Experimental results show that this scheme can reduce trace length by nearly two orders of magnitude while introducing less than 15% error in cache miss rate estimates.
CSL-TR-88-347

Report Number: CSL-TR-88-348
Institution: Stanford University, Computer Systems Laboratory
Title: Thor user's manual: tutorial and commands
Author: Alverson, Robert
Author: Blank, Tom
Author: Choi, Kiyoung
Author: Hwang, Sun Young
Author: Salz, Arturo
Author: Soule, Larry
Author: Rokicki, Thomas
Date: January 1988
Abstract: THOR is a behavioral simulation environment intended for use with digital circuits at either the gate, register transfer, or functional levels. Models are written in the CHDL modeling language (a hardware description language based on the C programming language.) Network descriptions are written in the CSL language supporting hierarchical network descriptions. Using interactive mode, batch mode, or both combined, a variety of commands are available to control execution. Simulation output can be viewed in tabular format or in waveforms. A library of components and a toolbox for building simulation models are also provided. Other tools include CSLIM, used to generate boolean equations directly from THOR models and an interface to other simulators (e.g. RSIM and a physical chip tester) so that two simulations can be run concurrently verifying equivalent operation. This technical report is part one of two parts and is formatted similar to UNIX manuals. Part one contains the THOR tutorial and all the commands associated with THOR. Part two contains descriptions of the general purpose functions used in models, the parts library including many TTL components, and the logic analyzer model.
CSL-TR-88-348

Report Number: CSL-TR-88-349
Institution: Stanford University, Computer Systems Laboratory
Title: Thor user's manual: library functions
Author: Alverson, Robert
Author: Blank, Tom
Author: Choi, Kiyoung
Author: Hwang, Sun Young
Author: Salz, Arturo
Author: Soule, Larry
Author: Rokicki, Thomas
Date: January 1988
Abstract: THOR is a behavioral simulation environment intended for use with digital circuits at either the gate, register transfer, or functional levels. Models are written in the CHDL modeling language (a hardware description language based on the "C" programming language). Network descriptions are written in the CSL language supporting hierarchical network descriptions. Using interactive mode, batch mode or both combined, a variety of commands are available to control execution. Simulation output can be viewed in tabular format or in waveforms. A library of components and a toolbox for building simulation models are also provided. Other tools include CSLIM, used to generate boolean equations directly from THOR models and an interface to other simulators (e.g. RSIM and a physical chip tester) so that two simulations can be run concurrently verifying equivalent operation. This technical report is part one of two parts and is formatted similar to UNIX manuals. Part one contains the THOR tutorial and all the commands associated with THOR. Part two contains descriptions of the general purpose functions used in models, the parts library including many TTL components, and the logic analyzer model.
CSL-TR-88-349

Report Number: CSL-TR-88-350
Institution: Stanford University, Computer Systems Laboratory
Title: The ILSP behavioral description language and its graph representation for behavioral synthesis
Author: Odani, Masayasu
Author: Hwang, Sun Young
Author: Blank, Tom
Author: Rokicki, Thomas
Date: March 1988
Abstract: This report describes the ILSP behavioral description language and its internal representation employed in the Hermod behavioral synthesis system. Using combined control and data flow graph C/DFG as an intermediate representation, the Hermod system generates hardware modules and their interconnections from behavioral descriptions. The Hermod system is included in an integrated environment for hardware simulation and synthesis under development at Stanford University. The functional models written in the ILSP can be simulated on the THOR logic/functional/behavioral simulator without translation. After proper verification of its behavior, an ILSP model can be input to the synthesizer for compilation into an RT-level description. This report consists of two parts: the specification of the ILSP language and its graph representation.
CSL-TR-88-350

Report Number: CSL-TR-88-355
Institution: Stanford University, Computer Systems Laboratory
Title: Introductory user's guide to the architect's workbench tools
Author: Torrellas, Josep
Author: Bray, Brian
Author: Cuderman, Kathy
Author: Goldschmidt, Stephen
Author: Kobrin, Alan
Author: Z immerman, Andrew
Date: May 1988
Abstract: The Architect's Workbench is a set of simulation tools to provide insight on how the instruction set and the organization of registers and cache affect processor-memory traffic and, as a result, processor performance. This report is designed to be an introductory guide to the tools for the novice user.
CSL-TR-88-355

Report Number: CSL-TR-88-358
Institution: Stanford University, Computer Systems Laboratory
Title: Interviews: A C++ graphical interface toolkit
Author: Linton, Mark A.
Author: Calder, Paul R.
Author: Vlissides, John M.
Date: July 1988
Abstract: We have implemented an object-oriented user interface package, called InterViews, that supports the composition of a graphical user interface from a set of interactive objects. The base class for interactive objects, called an interactor, and base class for composite objects, called a scene, define a protocol for combining interactive behaviors. Subclasses of scenes define common types of composition: a box tiles its components, a tray allows components to overlap or constrain each other's placement, a deck stacks its components so that only one is visible, a frame adds a border, and a viewport shows part of a component. Predefined components include menus, scrollers, buttons, and text editors. InterViews also includes classes for structured text and graphics. InterViews is written in C++ and runs on top of the X window system.
CSL-TR-88-358

Report Number: CSL-TR-88-364
Institution: Stanford University, Computer Systems Laboratory
Title: Applying object-oriented design to structured graphics
Author: Vlissides, John M.
Author: Linton, Mark A.
Date: August 1988
Abstract: Structured graphics are useful for building applications that use a direct manipulation metaphor. Object-oriented languages offer inheritance, encapsulation, and runtime binding of operations to objects. Unfortunately, standard structured graphics packages do not use an object-oriented model, and object-oriented systems do not provide general-purpose structured graphics, relying instead on low-level graphics primitives. An object-oriented approach to structured graphics can give application programmers the benefits of both paradigms. We have implemented a two-dimensional structured graphics library in C++ that presents an object-oriented model to the programmer. The graphic class defines a general graphical object from which all others are derived. The picture subclass supports hierarchical composition of graphics. Programmers can define new graphical objects either statically by subclassing or dynamically by composing instances of existing classes. We have used both this library and an earlier, non-object-oriented library to implement a MacDraw-like drawing editor. We discuss the fundamentals of the object-oriented design and its advantages based on our experiences with both libraries.
CSL-TR-88-364

Report Number: CSL-TR-88-367
Institution: Stanford University, Computer Systems Laboratory
Title: An overview of VAL
Author: Augustin, Larry M.
Author: Gennart, Benoit A.
Author: Huh, Youm
Author: Luckham, David C.
Author: Stanculescu, Alec G.
Date: October 1988
Abstract: VAL (VHDL Annotation Language) provides a small number of new language constructs to annotate VHDL hardware descriptions. VAL annotations, added to the VHDL entity declaration in the form of formal comments, express intended behavior common to all architectural bodies of the entity. Annotations are expressed as parallel processes that accept streams of input signals and generate constraints on output streams. VAL views signals as streams of values ordered by time. Generalized timing expressions allow the designer to refer to relative points on a stream. No concept of preemptive delayed assignment or inertial delay are needed when referring to different relative points in time on a stream. The VAL abstract state model permits abstract data types to be used in specifying history dependent device behavior. Annotations placed inside a VHDL architecture define detailed correspondences between the behavior specification and architecture. The result is a simple but expressive language extension of VHDL with possible applications to automatic checking of VHDL simulations, hierarchical design, and automatic verification of hardware designs in VHDL.
CSL-TR-88-367

Report Number: CSL-TR-88-369
Institution: Stanford University, Computer Systems Laboratory
Title: Composing user interfaces with interviews
Author: Linton, Mark A.
Author: Vlissides, John M.
Author: Calder, Paul R.
Date: November 1988
Abstract: In this paper we show how to compose user interfaces with InterViews, a user interface toolkit we have developed at Stanford. InterViews provides a library of predefined objects and a set of protocols for composing them. A user interface is created by composing simple primitives in a hierarchical fashion, allowing complex user interfaces to be implemented easily. InterViews supports the composition of interactive objects (such as scroll bars and menus), text objects such as words and whitespace, and graphics objects such as circles and polygons. To illustrate how InterViews composition mechanisms facilitate the implementation of user interfaces, we present three simple applications: a dialog box built from interactive objects, a drawing editor using a hierarchy of graphical objects, and a class browser using a hierarchy of text objects. We also describe how InterViews supports consistency across applications as well as end-user customization.
CSL-TR-88-369

Report Number: CSL-TR-88-373
Institution: Stanford University, Computer Systems Laboratory
Title: Sparse distributed memory prototype: address module hardware guide
Author: Flynn, M. J.
Author: Z eidman, R.
Author: Lochner, E.
Date: December 1988
Abstract: This document is a detailed specification of the hardware design of the Address Module for the prototype Sparse Distributed Memory. It contains all of the information needed to build, test, debug, modify and operate the Address Module.
CSL-TR-88-373

Report Number: CSL-TR-89-378
Institution: Stanford University, Computer Systems Laboratory
Title: Analysis of Parallelism and Deadlocks in Distributed-Time Logic Simulation
Author: Soule, Larry
Author: Gupta, Anoop
Date: May 1989
Abstract: This paper explores the suitability of the Chandy-Misra algorithm for digital logic simulation. We use four realistic circuits as benchmarks for our analysis, with one of them being the vector-unit controller for the Titan supercomputer from Ardent. Our results show that the average number of logic elements available for concurrent execution ranges from 10 to 111 for the four circuits, with an overall average of 68. Although this is twice as much parallelism as that obtained by traditional event-driven algorithms for these circuits, we feel it is still too low. One major factor limiting concurrency is the large number of global synchronization points --- "deadlocks" in the Chandy-Misra terminology --- that occur during execution. Towards the goal of reducing the number of deadlocks, the paper presents a classification of the types of deadlocks that occur during digital logic simulation. Four different types are identified and described intuitively in terms of circuit structure. Using domain specific knowledge, the paper proposes methods for reducing these deadlock occurrences. For one of the benchmark circuits, the use of the proposed techniques eliminated all deadlocks and increased the average parallelism from 40 to 160. We believe that the use of such domain knowledge will make the Chandy-Misra algorithm significantly more effective than it would be in its generic form.
CSL-TR-89-378

Report Number: CSL-TR-89-379
Institution: Stanford University, Computer Systems Laboratory
Title: Two Dimensional Pinpointing: An Application of Formal Specification to Debugging Packages
Author: Luckham, David
Date: April 1989
Abstract: New methods of testing and debugging software utilizing high-level formal specifications are presented. These methods require a new generation of support tools. Such tools must be capable of automatically comparing the runtime behavior of hierarchically structured software with high-level specifications; they must provide information about inconsistencies in terms of abstractions used in specifications. This use of specifications has several advantages over present-day debugging methods: (1) the debugging problem itself is precisely defined by specifications; (2) violations of specifications are detected automatically, thus eliminating the need to search output traces and recognize errors manually; (3) complex tests, such as tests for side-effects on global data, can be made easily; (4) the new methods are independent of any compiler and runtime environment for a programming language; (5) they apply generally to hierarchically structured software --- e.g., packages containing nested units, (6) they also apply to other life-cycle processes such as analysis of prototypes, and the use of prototypes to build formal specifications. In this paper a particular process for locating errors in software packages, called two dimensional pinpointing, is described. Tests consist of sequences of package operations (first dimension). Specifications at the highest (most abstract) level are checked first. If violations occur then new specifications are added if possible, otherwise checking of specifications at the next lower level (second dimension) is activated. Violation of a new specification provides more information about the error which reduces the region of program text under suspicion. All interaction between programmer and toolset is phrased in terms of the concepts used to specify the program. Two dimensional pinpointing is presented using the Anna specification language for Ada programs. Anna and a toolset for comparing behavior of Ada programs with Anna specifications is described. Pinpointing techniques are then illustrated by examples. The examples involve debugging of Ada packages, for which Anna provides a rich set of specification constructs. The Anna toolset supports use of the methodology on the full Ada/Anna languages, and is being engineered to commercial standards.
CSL-TR-89-379

Report Number: CSL-TR-89-380
Institution: Stanford University, Computer Systems Laboratory
Title: Unidraw: A Framework for Building Domain-Specific Graphical Editors
Author: Vlissides, John M.
Author: Linton, Mark A.
Date: July 1989
Abstract: Unidraw is a framework for creating object-oriented graphical editors in domains such as technical and artistic drawing, music composition, and CAD. The Unidraw architecture simplifies the construction of these editors by providing programming abstractions that are common across domains. Unidraw defines four basic abstractions: components encapsulate the appearance and semantics of objects in a domain, tools support direct manipulation of components, commands define operations on components and other objects, and external representations define the mapping between components and the file format generated by the editor. Unidraw also supports multiple views, graphical connectivity and confinement, and dataflow between components. This paper describes the Unidraw design, implementation issues, and three prototype domain-specific editors we have developed with Unidraw: a drawing editor, a user interface builder, and a schematic capture system. Experience indicates a substantial reduction in implementation time compared with existing tools.
CSL-TR-89-380

Report Number: CSL-TR-89-383
Institution: Stanford University, Computer Systems Laboratory
Title: Super-Scalar Processor Design
Author: Johnson, William M.
Date: June 1989
Abstract: A super-scalar processor is one that is capable of sustaining an instruction-execution rate of more than one instruction per clock cycle. Maintaining this execution rate is primarily a problem of scheduling processor resources (such as functional units) for high utilization. A number of scheduling algorithms have been published, with wide-ranging claims of performance over the single-instruction issue of a scalar processor. However, a number of these claims are based on idealizations or on special-purpose applications. This study uses trace-driven simulation to evaluate many different super-scalar hardware organizations. Super-scalar performance is limited primarily by instruction-fetch inefficiencies caused by both branch delays and instruction misalignment. Because of this instruction-fetch limitation, it is not worthwhile to explore highly-concurrent execution hardware. Rather, it is more appropriate to explore economical execution hardware that more closely matches the instruction throughput provided by the instruction fetcher. This study examines techniques for reducing the instruction-fetch inefficiencies and explores the resulting hardware organizations. This study concludes that a super-scalar processor can have nearly twice the performance of a scalar processor, but that this requires that four major hardware features: out-of-order execution, register renaming, branch prediction, and a four-instruction decoder. These features are interdependent, and removing any single feature reduces average performance by 18% or more. However, there are many hardware simplifications that cause only a small performance reduction.
CSL-TR-89-383

Report Number: CSL-TR-89-387
Institution: Stanford University, Computer Systems Laboratory
Title: Specification and automatic verification of self-timed queues
Author: Dill, David L.
Author: Nowick, Steven M.
Author: Sproull, Robert F.
Date: August 1989
Abstract: Speed-independent circuit design is of increasing interest because of global timing problems in VLSI. Unfortunately, speed-independent design is very subtle. We propose the use of state-machine verification tools to ameliorate this problem. This paper illustrates issues in the modelling, specification, and verification of speed-independent circuits through consideration of self-timed queues. User-level specifications are given as Petri nets, which are translated into trace structures for automatic processing. Three different implementations of queues are considered: a chain of queue cells, two parallel chains, and "circular buffer" example using a separate RAM.
CSL-TR-89-387

Report Number: CSL-TR-89-390
Institution: Stanford University, Computer Systems Laboratory
Title: VAL to VHDL transformer: an implementation guide
Author: Augustin, Larry M.
Author: Gennart, Benoit A.
Author: Huh, Youm
Author: Luckham, David C.
Author: Sahai, Bob
Author: Stanculescu, Alec G.
Date: September 1989
Abstract: This report presents one implementation of the VAL semantics. It is based on a transformation from VAL annotated VHDL to self-checking VHDL that is equivalent to the original source from the simulation semantics standpoint. The transformation is performed as a sequence of tree to tree transformations. The report describes the semantic preserving transformations, as well as the structure of the transformer.
CSL-TR-89-390

Report Number: CSL-TR-89-395
Institution: Stanford University, Computer Systems Laboratory
Title: Design of Run Time Monitors for Concurrent Programs
Author: Helmbold, David
Author: Bryan, Doug
Date: October 1989
Abstract: We address the problem of correctly monitoring the run time behavior of a concurrent program. We view a program as having three (potentially different) sets of behavior: computations of the original program when monitoring is not performed, computations after the monitor is added to the program, and "observations'' produced by the monitor. Using these sets of behaviors, we define four properties of monitor systems: non-interference, safety, accuracy and correctness. We define both a minimal level and a total level for each of these properties. The non-interference and safety properties address the degree to which the presence of the monitor alters a computation (the differences between the first two sets of computations). Accuracy is a relationship between a monitored computation and the observation of the computation produced by the monitor. Correctness is a relationship between observations and the unmonitored computations. A run time monitor for TSL-1 and Ada has been implemented. This monitor system uses two techniques for constructing the observation. We show that any monitoring system using these two techniques is at least minimally correct, from which the (minimal) correctness of the TSL-1 monitor follows.
CSL-TR-89-395

Report Number: CSL-TR-89-396
Institution: Stanford University, Computer Systems Laboratory
Title: COOL: a language for parallel programming
Author: Chandra, Rohit
Author: Gupta, Anoop
Author: Hennessy., John L.
Date: October 1989
Abstract: We present COOL, an object-oriented parallel language derived from C++ by adding constructs to specify concurrent execution. We describe the language design, and the facilities for creating parallelism, performing synchronization, and communicating. The parallel construct is parallel functions that execute asynchronously. Synchronization support includes mutex functions and future types. A shared-memory model is assumed for parallel execution, and all communication is through shared-memory. The parallel programming model of COOL has proved useful in several small programs that we have attempted. We present some examples and discuss the primary implementation issues.
CSL-TR-89-396

Report Number: CSL-TR-89-397
Institution: Stanford University, Computer Systems Laboratory
Title: Design and Clocking of VLSI Multipliers
Author: Santoro, Mark Ronald
Date: October 1989
Abstract: This thesis presents a versatile new multiplier architecture, which can provide better performance than conventional linear arry multipliers at a fraction of the silicon area. The high performance is obtained by using a new binary tree structure, the 4-2 tree. The 4-2 tree is symmetric and far more regular than other multiplier trees while offering comparable performance, making it better suited for VLSI implementations. To reduce area, a partial, pipelined 4-2 tree is used with a 4-2 carry-save accumulator placed at its outputs to iteratively sum the partial products as they are generated. Maximum performance is obtained by accurately matching the iterative clock to the pipeline rate of the 4-2 tree, using a stoppable on-chip clock generator. To prove the new architecture a test chip, called SPIM, was fabricated in a 1.6 (Mu)m CMOS process. SPIM contains 41,000 transistors with an array size of 2.9 X 5.3 mm. Running at an internal clock frequency of 85 MHz, SPIM performs the 64 bit mantissa portion of a double extended precision floating-point multiply in under 120 ns. To make the new architecture commercially interesting, several high-performance rounding algorithms compatible with IEEE standard 754 for binary floating-point arithmetic have also been developed.
CSL-TR-89-397

Report Number: CSL-TR-89-398
Institution: Stanford University, Computer Systems Laboratory
Title: The relative effects of optimization on instruction architecture performance
Author: Cuderman, K. J.
Author: Flynn, M. J.
Date: October 1989
Abstract: The Stanford Architect's Workbench is a simulation platform used to evaluate the impact of optimization on the relative performance of instruction set architectures. The total impact optimization makes on an application is the combined interaction of the optimizer, the architecture, and the cache configuration. The relative performance of seven architectures are compared using a suite of six application programs. Optimization reduces the number of executed instructions, but its effectiveness varies with architecture. Register architectures capitalize on temporaries introduced by optimization without incurring penalties for moving data. Short instructions for register operations reduce the instruction bandwidth in addition to reducing the number of instructions. Reducing the number of executed instructions does not yield a reduction in memory traffic. Optimization only slightly alters the program working set size. An instruction cache quickly masks the effect of optimization. The result is that the instruction memory traffic remains almost constant for an application.
CSL-TR-89-398

Report Number: CSL-TR-89-400
Institution: Stanford University, Computer Systems Laboratory
Title: Sparse distributed memory prototype: principles and operation
Author: Flynn, Michael J.
Author: Kanerva, Pentti
Author: Bhadkamkar, Neil
Date: December 1989
Abstract: Sparse distributed memory is a generalized random-access memory (RAM) for long (e.g., 1,000 bit) binary words. Such words can be written into and read from the memory, and they can also be used to address the memory. The main attribute of the memory is sensitivity to similarity, meaning that a word can be read back not only by giving the original write address but also by giving one close to it as measured by the Hamming distance between addresses. Large memories of this kind are expected to have wide use in speech recognition and scene analysis, in signal detection and verification, and in adaptive control of automated equipment---in general, in dealing with real-world information in real time. The memory can be realized as a simple, massively parallel computer. Digital technology has reached a point where building large memories is becoming practical. This research project is aimed at resolving major design issues that have to be faced in building the memories.This report describes the design of a prototype memory with 256-bit addresses and from 8K to 128K locations for 256-bit words. A key aspect of the design is extensive use of dynamic RAM and other standard components.
CSL-TR-89-400