A miniPascal compiler for the E-machine by Frances Wren Goosey A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science Montana State University © Copyright by Frances Wren Goosey (1993) Abstract: This thesis is the third phase in the development of a program animation system called DYNALAB (DYNAmic LABoratory). DYNALAB is an interactive software system that demonstrates programming and computer science concepts at an introductory level. The first DYNALAB development phase was the design of a virtual computer—the E-machine (Education Machine). The E-machine was designed by Samuel D. Patton and is presented in his Master’s thesis, The E-machine: Supporting the Teaching of Program Execution Dynamics. In order to facilitate the support of program animation activities, the E-machine has many unique features, notably the ability to execute in reverse. The second phase in the development of DYNALAB was the design and implementation of an E-machine emulator, which is presented in Michael L. Birch’s Master’s thesis, An Emulator for the E-machine. This thesis presents the design and implementation of a compiler for the E-machine. The compiler’s source language is miniPascal, which is a subset of ISO Standard Pascal. The miniPascal compiler was developed using the Unix lex and yacc compiler development tools. It has successfully generated object files ready for execution on the E-machine. This thesis focuses on the compilation aspects that are unique to the E-machine architecture and the planned animation environment.  
A miniPASCAL COMPILER FOR THE E-MACHINE

by

Frances Wren Goosey

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

Montana State University
Bozeman, Montana
April 1993

APPROVAL

of a thesis submitted by

Frances Wren Goosey

This thesis has been read by each member of the thesis committee and has been found to be satisfactory regarding content, English usage, format, citations, bibliographic style, and consistency, and is ready for submission to the College of Graduate Studies.

Date    Chairperson, Graduate Committee

Approved for the Major Department

Approved for the College of Graduate Studies

Date    Graduate Dean

STATEMENT OF PERMISSION TO USE

In presenting this thesis in partial fulfillment of the requirements for a master’s degree at Montana State University, I agree that the Library shall make it available to borrowers under rules of the Library. If I have indicated my intention to copyright this thesis by including a copyright notice page, copying is allowable only for scholarly purposes, consistent with “fair use” as prescribed in the U.S. Copyright Law. Requests for permission for extended quotation from or reproduction of this thesis in whole or in parts may be granted only by the copyright holder.

ACKNOWLEDGMENTS

This thesis is part of a larger software development project, called DYNALAB. The DYNALAB project evolved from an earlier pilot project called DYNAMOD [Ross 91], a program animation system that has been used extensively at Montana State University in introductory Pascal programming classes. DYNAMOD was originally developed by Cheng Ng [Ng 82-1, Ng 82-2] and later extended and ported to various computing environments by a number of students, including Lih-nah Meng, Jim McInerny, Larry Morris, and Dean Gehnert.
Experience with DYNAMOD proved the worth of program animation as a tool for teaching and learning programming and computer science concepts. It also provided extensive insight into the facilities needed in a fully functional program animation system and the inspiration for the subsequent DYNALAB project and this thesis.

Many people have contributed to the DYNALAB project. Samuel Patton and Michael Birch laid the groundwork for this thesis by designing and implementing the underlying virtual machine for DYNALAB in their Masters’ theses. As this thesis is being completed, Craig Pratt is developing the animator portion of DYNALAB, and Robin Winslett and David Poole are implementing new compilers for the project.

I would like to take this opportunity to thank my graduate committee members, Dr. Rockford Ross, Dr. Gary Harkin, and Dr. Year Back Yoo, and the rest of the faculty members from the Department of Computer Science for their help and guidance during my graduate program. I would also like to thank my thesis advisor, Dr. Ross, and DYNALAB team members, David Poole, Craig Pratt, Robin Winslett, and Michael Woodring, for their help and suggestions for my thesis.

The original DYNAMOD project was supported by the National Science Foundation, grant number SPE-8320677. Work on this thesis was also supported in part by a grant from the National Science Foundation, grant number USE-9150298.

Contents

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
1. INTRODUCTION
    The DYNALAB System
    Preview
2. THE E-MACHINE
    E-machine Design Considerations
    E-machine Architecture
    E-machine Emulator
    E-machine Object File Sections
        The CODESECTION
        The PACKETSECTION
        The VARIABLESECTION
        The LABELSECTION
        The SOURCESECTION
        The STATSCOPESECTION
        The STRINGSECTION
3. E-MACHINE COMPILATION CONSIDERATIONS
    Program Animation Units and E-code Packets
        Identifying Program Animation Units
        Translating Program Animation Units into E-code Packets
    Generation of the Static Scope Table
    Translating Enumerated Type Variables
    Identifying Critical and Noncritical E-code Instructions
4. THE DESIGN OF THE miniPASCAL COMPILER
    The miniPascal Language
    Overview of the miniPascal Compiler
    Error Detection and Recovery
    Optimization
    The Compiler Modules
        The Main Module
        The Parser Module
            Calls to the Scanner
            Interface to the Symbol Table
            Initiating Semantic Actions
            Providing for Dynamic Scoping
            Translating Animation Units into Packets
            The Lookahead Problem in Animation Unit Translation
            The Semicolon Problem in Animation Unit Translation
            Adjusting an Animation Unit’s Ending Delimiter
            Adjusting an Animation Unit’s Beginning Delimiter
            Adjusting the Starting Memory Address of a Packet
            Adjusting the Ending Memory Address of a Packet
            Fragmented Animation Units
            To Highlight or Not
        The Scanner Module
        The Code Driver Module
        The Semantic Analysis Module
        The PACKET Module
        The SOURCE Module
        The LABEL Module
        The VARIABLE Module
        The STRING Module
        The Error Module
        The Memory Allocation Module
        The Assembly Code Module
        The CODE Module
        The Symbol Table Module
        The STATSCOPE Module
            Generating a Static Scope Block
            The ProcNum Field
            Writing the STATSCOPESECTION
            Example of STATSCOPESECTION Generation
5. CONCLUSIONS AND FUTURE ENHANCEMENTS
    Conclusions
    Future Enhancements
REFERENCES
APPENDICES
    APPENDIX A: THE E-MACHINE INSTRUCTION SET
    APPENDIX B: THE E-MACHINE ADDRESSING MODES
    APPENDIX C: A miniPASCAL COMPILATION EXAMPLE

List of Tables

1. Packet Table Resulting from Compilation of Program Samp1
2. Static Scope Table Resulting from Compilation of Program Samp1
3. Packet Table Resulting from Compilation of Program Increment1
4. Static Scope Table Resulting from Compilation of Program Ftr1
5. Scope Owner Table for Program Samp2
6. Scope Block for Function B in Procedure A in Program Samp2
7. Scope Block for Procedure A in Program Samp2
8. Scope Block for Procedure B in Program Samp2
9. Scope Block for Program Scope in Program Samp2
10. Scope Block for “Bootstrap” Scope in Program Samp2
11. Final Static Scope Table for Program Samp2
12. The E-code LABELSECTION for Program Samp3
13. The E-code VARIABLESECTION for Program Samp3
14. The E-code PACKETSECTION for Program Samp3
15. The E-code STATSCOPESECTION for Program Samp3

List of Figures

1. The E-machine
2. Source Code for Program Samp1
3. Animation Units Identified in Program Samp1
4. E-code Instructions Resulting from Compilation of Program Samp1
5. Animation Display After Execution of X := I;
6. E-code Instructions Translating N := K + I * J
7. Schematic Diagram of the miniPascal Compiler
8. Code Fragment Illustrating the Semicolon Problem
9. Source Code for Program Increment1
10. E-code Translation of Program Increment1
11. Source Code for Program Increment2
12. E-code Translation of Program Increment2
13. Source Code for a CASE Statement
14. Source Code for Program Payroll1
15. Animation Display After Execution of Program Payroll1
16. String Space’s Relationship with Variable Registers and Data Memory
17. Source Code for Program Payroll2
18. Animation Display After Execution of Program Payroll2
19. The Symbol Table Hash Implementation
20. The Symbol Table Structures
21. The miniPascal Identifier Types
22. The miniPascal Identifier Classes
23. Source Code for Program Ftr1
24. Animation Display After Final Recursive Call of Function Fact
25. Procedure Count Array and Dynamic Scope Stack
26. Source Code for Program Samp2
27. The E-code SOURCESECTION for Program Samp3
28. The E-code STRINGSECTION for Program Samp3
29. The E-code CODESECTION for Program Samp3
30. Animation Display After Constant Declarations in Program Samp3
31. Animation Display Before Calling Procedure InitD in Program Samp3
32. Animation Display at End of Procedure InitD in Program Samp3
CHAPTER 1

INTRODUCTION

The DYNALAB System

This thesis represents the third phase of the ongoing DYNALAB software development project. DYNALAB is an acronym for DYNAmic LABoratory, and its purpose is to support formal computer science laboratories at the introductory undergraduate level. Students will use DYNALAB to experiment with and explore programs and fundamental concepts of computer science. The current objectives of DYNALAB include:

o providing students with facilities for studying the dynamics of programming language constructs (such as iteration, selection, recursion, parameter passing mechanisms, and so forth) in an animated and interactive fashion;

o providing students with capabilities to validate or empirically determine the run time complexities of algorithms interactively in the experimental setting of a laboratory;

o extending to instructors the capability of incorporating animation into lectures on programming and algorithm analysis.

In order to meet these immediate objectives, the DYNALAB project was divided into four phases. The first phase was the design of a virtual computer, called the Education Machine, or E-machine, that would support the animation activities envisioned for DYNALAB. The two primary technical problems to overcome in the design of the E-machine were the incorporation of features for reverse execution and provisions for coordination with a program animator. Reverse execution was engineered into the E-machine to allow students and instructors to repeatedly animate sections of a program that were unclear without requiring that the entire program be restarted. Also, since the purpose of DYNALAB is to allow user interaction with animated programs, the E-machine had to be designed to be driven by an animator system that controls the execution of programs and displays pertinent information dynamically in animated fashion on a video screen.
This first phase was completed by Samuel Patton in his Master’s thesis, The E-machine: Supporting the Teaching of Program Execution Dynamics [Patton 89].

The second phase of the DYNALAB project was the implementation of an emulator for the E-machine. This was accomplished by Michael Birch in his Master’s thesis, An Emulator for the E-machine [Birch 90]. As the emulator was implemented, Birch also included some modifications and extensions to the E-machine.

The third phase of the DYNALAB project, and the subject of this thesis, is the design and implementation of a Pascal compiler for the E-machine. The source language for the compiler is a subset of ISO Standard Pascal, called miniPascal, and the object language is E-code, the machine language of the E-machine. During compiler development, the E-machine and its emulator were again modified somewhat as practical considerations uncovered new design issues.

The fourth phase of the DYNALAB project, currently in progress, is the design and implementation of a program animator that will drive the E-machine and display miniPascal programs in dynamic, animated fashion under control of the user. Once the animator is complete, the first functional version of DYNALAB will be ready for use in introductory computer science laboratory and lecture courses by students and instructors alike.

The DYNALAB project will not end at this point. Compilers for other programming languages, such as C, Ada, and Juno (a pseudolanguage used purely for teaching [Winslett 93]), are in the initial stages of development. Algorithm animation (as opposed to program animation; see, for example, [Brown 88-1, Brown 88-2]) is also a planned extension to DYNALAB. In fact, the DYNALAB project will likely never be finished, as new ideas and pedagogical conveniences are incorporated as they become apparent.

Preview

The thesis consists of five chapters and three appendices. Chapter 1 presents an overview of the thesis.
Since a thorough understanding of the target computer’s architecture and instruction set is required for compiler development, a summary of the E-machine and its emulator is given in chapter 2. Much of the information in chapter 2 is taken from the Patton and Birch theses. During the compiler development process, it became apparent that several additional E-machine features and modifications were necessary or desirable. These changes have been made and are so noted in chapter 2. For a more detailed explanation of the E-machine and its emulator, the reader is referred to the above-mentioned theses. Chapter 3 describes the special considerations that E-machine compilers must address in order to function within the DYNALAB animation environment.

Chapter 4 contains a description of the miniPascal compiler. The Pascal subset comprising the miniPascal language is presented, followed by an overview of the compiler design. It is the intent of chapter 4 to focus on the solutions to the compilation considerations unique to the DYNALAB animation environment. The current status of the miniPascal compiler is given in chapter 5. Chapter 5 also includes suggestions for future enhancements.

Since there are many E-code examples used throughout the thesis, appendices A and B are included for completeness. Appendix A describes the E-machine instruction set and appendix B describes the E-machine addressing modes. Both of these appendices are adapted from chapter 2 of Birch’s thesis. Appendix C presents a complete miniPascal compilation example.

CHAPTER 2

THE E-MACHINE

This chapter is included to provide a description of the E-machine and is adapted from chapter 5 of Patton’s thesis [Patton 89] and chapters 1, 2, and 3 of Birch’s thesis [Birch 90]. This chapter is a summary and update of information from those two theses (much of the material is taken verbatim). New E-machine features that have been added as a result of this thesis are noted by a leading asterisk (*).
The E-machine is a virtual computer with its own machine language, called E-code. The E-code instructions are described in appendix A; these instructions may reference various E-machine addressing modes, which are described in appendix B. The E-machine’s task is to execute E-code translations of high level language programs. The miniPascal language is the first language to be translated into E-code. The real purpose of the E-machine is to support the DYNALAB program animation system, as described more fully in [Ross 91], [Birch 90], [Ross 93] and in Patton’s thesis [Patton 89], where it was called a “dynamic display system”.

E-machine Design Considerations

The fact that the E-machine’s sole purpose is to support program animation was central to its design. The E-machine operates as follows. After the E-machine is loaded with a compiled E-code translation of a high level language program, it awaits a call from a driver program (the animator). A call from the animator causes a group of E-code instructions, called a packet, to be executed by the E-machine. A packet contains the E-code translation of a single high level language construct, or animation unit, that is to be highlighted by the animator. An animation unit could be a complete high level language assignment statement, for example

    A := X + 2*Y;

which is to be highlighted as a result of a single call from the animator; the corresponding packet would be the E-code instructions that translate this assignment statement. Another animation unit could be just the conditional part of an if statement; in this case the corresponding packet would be just the E-code instructions translating the conditional expression. It is the compiler writer’s responsibility to identify the animation units in the source program so that corresponding E-code packets can be generated.
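The call-and-return cycle between the animator and the E-machine described above can be pictured with a small sketch. It is written in Python purely for illustration and is not E-machine code: the names Packet, EMachine, run_packet, and animate are hypothetical, the instruction strings are placeholders rather than actual E-code mnemonics, and the line and column values are invented. The sketch also assumes that packets run in table order, whereas in a real program control flow decides which packet comes next.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    # Source span of the animation unit (hypothetical field names).
    start_line: int
    start_col: int
    end_line: int
    end_col: int
    # Inclusive range of program-memory addresses holding the packet's code.
    start_addr: int
    end_addr: int


class EMachine:
    def __init__(self, program, packets):
        self.program = program        # list of placeholder instruction strings
        self.packets = packets        # the packet table
        self.packet_register = 0      # index of the next packet to execute
        self.trace = []               # every instruction "executed" so far

    def run_packet(self):
        """Execute one whole packet, then return control to the caller."""
        packet = self.packets[self.packet_register]
        for addr in range(packet.start_addr, packet.end_addr + 1):
            self.trace.append(self.program[addr])
        self.packet_register += 1     # simplification: packets run in table order
        return packet


def animate(machine):
    """Animator loop: one call per animation unit, then record its span."""
    highlighted = []
    while machine.packet_register < len(machine.packets):
        p = machine.run_packet()
        highlighted.append(((p.start_line, p.start_col), (p.end_line, p.end_col)))
    return highlighted


# Placeholder instruction stream for two animation units:
#   A := X + 2*Y;      (a whole assignment statement)
#   A > 0              (only the condition of an if statement)
program = ["<push X>", "<push 2>", "<push Y>", "<multiply>", "<add>", "<store A>",
           "<push A>", "<push 0>", "<compare>"]
packets = [Packet(3, 3, 3, 15, 0, 5), Packet(4, 6, 4, 10, 6, 8)]

machine = EMachine(program, packets)
spans = animate(machine)
```

Each call to run_packet stands for one call from the animator: all instructions of one packet run, and only then does the animator learn which source span to highlight.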
After the E-machine executes a packet, control is returned to the animator, which then performs the necessary animation activities before repeating the process by again calling the E-machine to execute the packet corresponding to the next animation unit. Chapter 3 describes this process in more detail.

Since the E-machine’s purpose is to enable program execution dynamics of high level programming languages to be displayed easily by a program animator, it had to incorporate the following:

o structures for easy implementation of high level programming language constructs;

o a simple method for implementing functions, procedures, and parameters;

o the ability to execute either forward or in reverse.

The driving force in the design of the E-machine was the requirement for reverse execution. The approach taken by the E-machine to accomplish reverse execution is to save the minimal amount of information necessary to recover just the previous E-machine state from the current state in a given reversal step. The E-machine can then be restored to an arbitrary prior state by doing the reversal one state at a time until the desired prior state is obtained. This one-step-at-a-time reversal means that it is necessary only to store successive differences between the previous state and the current state, instead of storing the entire state of the E-machine for each step of execution.

One other aspect of program animation substantially influenced the design of the reversing mechanism of the E-machine. Since the animator is meant to animate high level language programs, the E-machine actually has to be able to effect reversal only through high level language animation units in one reversal step, not each low level E-machine instruction in the packet that is the translation of an animation unit. This observation led to further efficiencies in the design of the E-machine and the incorporation of two classes of E-machine code instructions, critical and noncritical.
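The idea of storing only successive differences, and of undoing one animation unit per reversal step, can be illustrated with a short sketch. It is a simplified Python model under stated assumptions, not the E-machine's actual mechanism: the class ReversibleStore and its methods are hypothetical names, and the only state modeled is a set of named variables.

```python
class ReversibleStore:
    """Toy data store that can undo one packet's worth of writes at a time."""

    def __init__(self, **initial):
        self.values = dict(initial)
        self.save_log = []        # (name, overwritten value) pairs, newest last
        self.packet_marks = []    # length of save_log when each packet began

    def begin_packet(self):
        self.packet_marks.append(len(self.save_log))

    def write(self, name, value):
        # Record only the value about to be destroyed, never the whole state.
        self.save_log.append((name, self.values[name]))
        self.values[name] = value

    def reverse_packet(self):
        """Undo every write made since the most recent begin_packet()."""
        mark = self.packet_marks.pop()
        while len(self.save_log) > mark:
            name, old = self.save_log.pop()
            self.values[name] = old


store = ReversibleStore(A=0, X=5, Y=3)

store.begin_packet()                      # animation unit: A := X + 2*Y;
store.write("A", store.values["X"] + 2 * store.values["Y"])

store.begin_packet()                      # animation unit: X := A;
store.write("X", store.values["A"])

after_forward = dict(store.values)
store.reverse_packet()                    # one reversal step: back over X := A;
after_one_step = dict(store.values)
store.reverse_packet()                    # next step: back over A := X + 2*Y;
after_two_steps = dict(store.values)
```

Any earlier state is reached by repeating reverse_packet, and the log grows by one entry per overwritten value rather than by one full snapshot per step.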
An E-machine instruction within a packet is classified as critical if it destroys information essential to reversing through the corresponding high level language animation unit; it is classified as noncritical otherwise. For example, in translating the animation unit corresponding to an arithmetic assignment statement, a number of intermediate values are likely to be generated in the corresponding E-code packet. These intermediate values are needed in computing the value on the right-hand side of the assignment statement before this value can be assigned to the variable on the left-hand side. However, the only value that needs to be restored during reverse execution as far as the animation unit is concerned is the original value of the variable on the left-hand side. The intermediate values computed by various E-code instructions are of no consequence. Hence, E-code instructions generating intermediate values can be classified as noncritical and their effects ignored during reverse execution. It is the compiler writer’s responsibility to produce the correct E-code (involving critical and noncritical instructions) for reverse execution. However, it should also be noted that the E-machine has the flexibility to accurately execute E-code in reverse, instruction by instruction (rather than a packet at a time), by simply designating each E-code instruction as critical.

E-machine Architecture

Figure 1 shows the logical structure of the E-machine. A stack-based architecture was chosen for the E-machine; however, a number of components that are not found in real stack-based computers were included.

Program memory contains the E-code program currently being executed by the E-machine. Program memory is loaded with the instruction stream found in the CODESECTION of the E-machine object code file, which is described later in this chapter. The program counter contains the address in program memory of the next E-code instruction to be executed.
The previous program counter, needed for reverse execution, contains the address in program memory of the most recently executed E-code instruction.

Packet memory contains information about the translated E-code packets and their corresponding source language animation units. Packet memory, which is loaded with the information found in the PACKETSECTION of the E-machine object code file, essentially effects the “packetization” of the E-code program found in program memory. Packet information includes the starting and ending line and column numbers of the original source program animation unit (e.g., an entire assignment statement, or just the conditional expression in an if statement) whose translation is the packet of E-code instructions about to be executed. Other packet information includes the starting and ending program memory addresses for the E-code packet, which are used internally to determine when execution of the packet is complete. The packet register contains the packet memory address of the packet information corresponding to either the next packet to be executed, or the packet that is currently being executed.

[Figure 1: The E-machine. Components labeled in the figure include the label registers and label stacks; the evaluation stack and evaluation stack register; the variable registers and variable stacks; the index register; the address register; the *dynamic scope stack and *dynamic scope stack register; static scope memory; the return address stack and return address stack register; the *save dynamic scope stack and *save dynamic scope stack register; the save stack and save stack registers; the program counter and previous program counter; the packet register; packet memory; and source memory.]

The variable registers are an unbounded number of registers that are assigned to source program variables, constants, and parameters during compilation of a source program into E-code. Each identifier name representing memory in the source program will be assigned its own unique variable register in the E-machine.
For example, in a miniPascal program, a variable named Result might be declared in the current program scope and another variable, also named Result, might be declared in another enclosing procedure scope. The compiler will assign a unique variable register to each of these two variables. Once a variable is assigned a variable register, the register remains associated with the variable for the duration of the program’s compilation and subsequent execution, regardless of whether the variable is currently active or not.

The information held in a variable register consists of the corresponding variable’s size (e.g., number of bytes) as well as a pointer to a corresponding variable stack. Each variable stack entry, in turn, holds a pointer into data memory, where the actual variable values are stored. The variable stacks are necessary because a particular variable may have multiple associated instances due to being declared in recursive procedures or functions. In such instances, the top of a particular variable’s register stack points to the value of the current instance of the associated variable in data memory; the second stack element points to the value of the previous instantiation of the variable, and so on.

The E-machine’s data memory represents the usual random access memory found on real computers. The E-machine, however, uses data memory only to hold data values (it does not hold any of the program instructions).

*The string space component of the E-machine’s architecture was added as a result of the miniPascal compiler development. The string space contains the values of all string literals and enumerated constant names encountered during the compilation of a miniPascal program. The string space is loaded with the information contained in the STRINGSECTION of the E-machine object file. Currently, this string space is used only by the animator when displaying string constant and enumerated constant values.
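The relationship among a variable register, its variable stack, and data memory, as described above, can be pictured with the following sketch. It is a simplified Python model rather than the emulator's implementation: VariableRegister, DataMemory, allocate, and the other names are hypothetical, and data memory is modeled as a growing list whose cells are never reclaimed.

```python
class DataMemory:
    """Toy data memory: a growable list of cells addressed by index."""

    def __init__(self):
        self.cells = []

    def allocate(self, size):
        addr = len(self.cells)
        self.cells.extend([0] * size)
        return addr


class VariableRegister:
    """One register per declared identifier: a size plus a stack of addresses."""

    def __init__(self, size):
        self.size = size
        self.stack = []                   # top of stack = current instance

    def push_instance(self, memory):
        self.stack.append(memory.allocate(self.size))

    def pop_instance(self):
        self.stack.pop()

    def current_address(self):
        return self.stack[-1]


memory = DataMemory()
n_reg = VariableRegister(size=1)          # register for a parameter N
depth_seen = []                           # stack depth observed in each activation


def fact(n):
    """Recursive factorial that keeps each activation's N behind the register."""
    n_reg.push_instance(memory)           # new instance on entry
    memory.cells[n_reg.current_address()] = n
    depth_seen.append(len(n_reg.stack))
    result = 1 if n <= 1 else n * fact(n - 1)
    # Once the inner calls have returned, the top of the stack is this
    # activation's instance again.
    assert memory.cells[n_reg.current_address()] == n
    n_reg.pop_instance()                  # instance discarded on exit
    return result


answer = fact(3)
```

A single register serves every activation of the function; only the stack behind it grows and shrinks as calls are entered and left.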
A more detailed discussion of the interaction of the string space and variable registers is found in chapter 4.

The label registers are another unique component of the E-machine required for reverse execution. There are an unbounded number of these registers, and they are used to keep track of labeled E-code instructions. Each E-code label instruction is assigned a unique label register at compile time. The information held in a label register consists of the program memory address of the corresponding E-code label instruction as well as a pointer to a label stack. A label stack essentially maintains a history of previous instructions that caused a branch to the label represented by the label register in question. During reverse execution, the top of the label stack allows for correct determination of the instruction that previously caused the branch to the label instruction.

An index register is found in many real computers, and the E-machine’s index register serves the same purpose. In many circumstances, the data in a variable is accessed directly through the appropriate variable register. However, in the translation of a high level language data structure, such as an array or record, the address of the beginning of the structure is in a variable register; to access an individual data value in the structure, an offset (stored in the index register) is used. When necessary, the compiler can therefore utilize the index register so that the E-machine can access the proper memory location via one of the indexed addressing modes.

The address register is provided to allow access to memory areas that are not accessible through variable registers. For example, a pointer in Pascal is a variable that contains a data address. Data at that address can be accessed using the address register via the appropriate E-machine addressing mode. The address register can be used in place of variable registers for any of the addressing modes.
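The label stack mechanism described earlier in this section can be illustrated with a minimal Python sketch. The class and method names (LabelRegister, record_branch, undo_branch) and the addresses are hypothetical and are not taken from the E-machine emulator; the sketch assumes only what is stated above, namely that a label register holds the program memory address of its label instruction together with a stack recording the instructions that branched to it.

```python
class LabelRegister:
    """Hypothetical model of a single E-machine label register."""

    def __init__(self, address):
        self.address = address   # program memory address of the label instruction
        self.stack = []          # label stack: addresses of instructions that branched here

    def record_branch(self, from_address):
        """Forward execution: remember the branching instruction, return the new PC."""
        self.stack.append(from_address)
        return self.address

    def undo_branch(self):
        """Reverse execution: return the address of the instruction that branched here."""
        return self.stack.pop()


# Illustrative addresses only: a label at address 44 reached by a branch at address 39.
label = LabelRegister(44)
pc = label.record_branch(39)
previous_pc = label.undo_branch()
```

Because each branch pushes its own address, backing up through a loop that reaches the same label many times unwinds the branches in the reverse of the order in which they were taken.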
As in many real computers, the results of all arithmetic and logical operations are maintained on the evaluation stack; the evaluation stack register keeps track of the top of this stack. For example, in an arithmetic operation, the operands are pushed onto the evaluation stack and the appropriate operation is performed on them. The operands are consumed by the operation and the result is pushed onto the top of the stack. An assignment is performed by popping the top value of the evaluation stack and placing it into the proper location in data memory.

The return address stack (or call stack) is the E-machine’s mechanism for implementing procedure and function calls. When a subroutine call is made, the program counter plus one is pushed onto the return address stack. Then, when the E-machine executes a return from subroutine instruction, it simply loads the program counter with the top of the return address stack. A pointer to the top of the return address stack is kept in the return address stack register.

The save stack contains information necessary for reverse execution. Whenever some critical information (as determined by the execution of a critical instruction) is about to be destroyed, the required information is pushed onto the save stack. This ensures that when backing up, the instruction that most recently destroyed some critical information can be reversed by retrieving that critical information from the save stack. The save stack registers point to the top and bottom of the save stack.

*The dynamic scope stack was added to the original E-machine architecture as a result of the miniPascal compiler development. The original E-machine did not provide a way for the animator to determine (for display) the currently active program scopes. The animator must be able to display variable values associated with the execution of a packet both from within the current invocation of a procedure (or function) and from within the calling scope(s).
That is, the animator must have the ability to illustrate a program’s run time stack during execution. The Static Scope Table, which is loaded into static scope memory from the E-machine object file’s STATSCOPESECTION, provides the animator with the information relevant to the static nature of a program (e.g., information pertaining to variable names local to a given procedure). However, the specific calling sequence resulting in a particular invocation of a procedure (or function) was not available. The dynamic scope stack provides the dynamic chain as found in the run time stack activation records generated by most conventional compilers. Even though the E-machine’s return address stack could be used to hold this information, a separate dynamic scope stack was added to the E-machine architecture in order to minimize the impact on the existing E-machine and its emulator.

At any given point during program execution, the dynamic scope stack entries reflect the currently active scopes. Each dynamic scope stack entry (corresponding to a program name, a procedure name, or a function name) contains the index of the Static Scope Table entry describing that name (i.e., a static scope name). Once these indices are available, the animator can then use the Static Scope Table information to determine the variables whose values must be displayed following the execution of a packet. The animator needs access to the entire dynamic scope stack in order to display all pertinent data memory information following the execution of any given packet. A more detailed discussion of this process is found in chapter 4. The dynamic scope stack register points to the top of the dynamic scope stack.

*In order to handle reverse execution, a save dynamic scope stack was added to the E-machine architecture. This stack records the history of procedures and/or functions that have been called and subsequently returned from. The save dynamic scope stack register points to the top of this stack.
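One way the dynamic scope stack and the save dynamic scope stack might cooperate is sketched below in Python. This is a simplified illustration under stated assumptions, not the emulator’s implementation: the function names are hypothetical, the Static Scope Table indices are arbitrary, and it is assumed that returning from a scope moves its entry onto the save dynamic scope stack so that backing up over the return can restore it.

```python
dynamic_scope_stack = []        # indices of Static Scope Table entries for active scopes
save_dynamic_scope_stack = []   # scopes that were called and subsequently returned from


def enter_scope(index):
    """Forward execution of a call: the called scope becomes the most current scope."""
    dynamic_scope_stack.append(index)


def leave_scope():
    """Forward execution of a return: keep the departed scope for reverse execution."""
    save_dynamic_scope_stack.append(dynamic_scope_stack.pop())


def undo_leave_scope():
    """Reverse execution of a return: the departed scope becomes active again."""
    dynamic_scope_stack.append(save_dynamic_scope_stack.pop())


enter_scope(12)                                   # arbitrary index for a main program
enter_scope(9)                                    # arbitrary index for a called procedure
active_during_call = list(dynamic_scope_stack)
leave_scope()
active_after_return = list(dynamic_scope_stack)
undo_leave_scope()
active_after_backing_up = list(dynamic_scope_stack)
```

The animator would read the whole of dynamic_scope_stack, not just its top, since the values of variables in the calling scopes must be displayed as well.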
Finally, source memory holds an array of records, each of which is a copy of a line of source code for the compiled program. Source memory is loaded from the E-machine object file’s SOURCESECTION at run time and is referenced only by the animator for display purposes.

E-machine Emulator

The E-machine emulator was designed and written by Michael Birch and is described in his thesis [Birch 90]. The emulator’s design essentially follows the design of the E-machine presented in the previous sections of this chapter. The emulator was written in ANSI Standard C for portability and has been compiled in both Turbo C 2.0 and Borland C++ 3.1 by the current author. Within the complete DYNALAB environment, the emulator will act as a slave to the program animator, executing a packet of E-code instructions upon each call. The current author has written a simple DOS animator to drive the emulator in order to test compiled miniPascal programs. This animator/emulator has successfully run compiled miniPascal programs on several IBM PC compatible computers including 286, 386, and 486 architectures.

E-machine Object File Sections

The E-machine emulator defines the object file format that must be generated by a compiler. As a result of the miniPascal compiler development, several changes were made to the original E-machine object file definition; these changes are denoted with a leading asterisk (*) in the following discussion. A single E-code object file ready for execution on the E-machine consists of seven sections, which may occur in any order. Each section is preceded by an object file record containing the section’s name, followed by a record that contains a count of the number of records in that particular section.
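The layout just described (a name record, then a count record, then that many records) is what allows a loader to accept the sections in any order. The Python sketch below illustrates the idea only. It assumes, purely for the sake of the example, that each object file record can be treated as one text string; the actual record encoding is defined by the emulator and is not reproduced here. The function name read_sections and the record contents are hypothetical.

```python
def read_sections(records):
    """Group a flat sequence of records into a dictionary keyed by section name."""
    sections = {}
    position = 0
    while position < len(records):
        name = records[position]              # record holding the section's name
        count = int(records[position + 1])    # record holding the number of records
        start = position + 2
        sections[name] = records[start:start + count]
        position = start + count              # the next section header follows directly
    return sections


# Invented contents: two sections given in an arbitrary order.
example = [
    "STRINGSECTION", "2", "Red", "Green",
    "SOURCESECTION", "1", "Program P;",
]
loaded = read_sections(example)
```

Since every section announces its own name and length, the loader never needs to know in advance which section comes next.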
Each of these seven sections (whose names are shown in capital letters) holds information that is loaded into a corresponding E-machine component at run time as follows:

o the CODESECTION, which is loaded into program memory;
o the PACKETSECTION, which is loaded into packet memory;
o the VARIABLESECTION, which is loaded into the size information associated with the variable registers;
o the LABELSECTION, which is loaded into the label program address information associated with the label registers;
o the SOURCESECTION, which is loaded into source memory;
o the STATSCOPESECTION, which is loaded into static scope memory;
o the STRINGSECTION, which is loaded into the string space.

The file sections are described below.

The CODESECTION

The CODESECTION contains the translated program, that is, the E-code instruction stream. Even though the instruction stream can be thought of as a stream of pseudo assembly language instructions, the instructions are actually contained in an array of C structures, and are loaded from the CODESECTION into the E-machine’s program memory at run time. Each E-code instruction structure contains the following information:

o an operation code (e.g., push or pop);
o the instruction mode (critical or noncritical);
o the data type of the operand (e.g., I indicates INTEGER);
o either a numeric data value or an addressing mode.

*The PACKETSECTION

The PACKETSECTION consists of packet structures describing source program animation units and their translated E-code packets. These structures are loaded into the E-machine’s packet memory at run time.
Each packet structure contains the following information:

o the packet’s starting and ending E-code instruction addresses in program memory;
o the starting and ending line and column numbers in the original source file of the program animation unit corresponding to the packet;
o *an index into the current scope block of the Static Scope Table (discussed in chapter 3);
o *the program memory address at which the packet may be “fragmented” (discussed in chapter 4);
o *a flag indicating whether or not the animator should display information when the packet is executed (discussed in chapter 4).

The VARIABLESECTION

The VARIABLESECTION consists of structures describing the variable registers used by the compiled program. A variable register structure consists of a single field that contains the size of the data represented by the register. For example, on a DOS machine where the addressable unit is a byte, a variable representing a 32-bit integer would have a size of 4. This information is used to initialize size information held in the E-machine’s variable registers.

The LABELSECTION

The LABELSECTION consists of label structures describing the label numbers generated by the compiled program. A label structure consists of a single field that contains the program address at which the corresponding label is defined. This information is used to initialize the label program address information held in the E-machine’s label registers.

The SOURCESECTION

The SOURCESECTION contains a copy of the source program being executed. Each record in this section corresponds to a line of original source code, and is loaded into the E-machine’s source memory at run time. Source memory is referenced only by the animator for display purposes. The animator references source memory via packet memory information that describes correlations between the currently executing E-code packet and the corresponding source program animation unit.
The animator references the packet structure fields that hold starting and ending line and column numbers in source memory to determine the animation unit to highlight.

*The STATSCOPESECTION

The STATSCOPESECTION was originally named the SYMBOLSECTION in Birch’s thesis. It contains a complex structure, the Static Scope Table (called the symbol table in Birch’s thesis), which is used by the animator to determine the variable values that should be displayed upon execution of a packet. The name was changed to Static Scope Table in order to avoid confusion with the compiler’s symbol table. The STATSCOPESECTION records are loaded into the E-machine’s static scope memory at run time.

A number of additions and changes were made to the Static Scope Table’s structure during miniPascal compiler development. These changes deal primarily with making information available so that the animator can display both the dynamic and static information that are appropriate at various stages of program execution. The Static Scope Table is logically divided into “scope blocks,” each of which describes identifiers declared within a single static scope of the source program. A more complete discussion of this section is found in chapters 3 and 4.
Each Static Scope Table entry contains the following information:

o the name of the identifier being described (e.g., a variable name or a procedure name);
o upper and lower bounds (for array variables);
o *the index of the Static Scope Table entry containing the next array index bounds (for multidimensional arrays);
o the offset value (for record fields);
o an enumerated value indicating the data type (e.g., INTEGER, RECORD, or STRING);
o *the record size (for arrays of records);
o a pointer to this entry’s parent Static Scope Table entry;
o a pointer to the child of this entry (e.g., if this static scope entry describes a procedure, this field would hold the index of the first entry in the static scope block describing the variables declared local to the procedure);
o a variable register number (for variable names);
o *a number statically assigned to procedure and function entries; this number is used in determining the dynamic scoping level at execution time.

*The STRINGSECTION

The STRINGSECTION, which contains the values of string literals and enumerated constant names, was added as a result of miniPascal compiler development. The contents of the STRINGSECTION are loaded into the E-machine’s string space at run time. The string space allows the animator to have dynamic access to the names of an enumerated type as well as the internal numeric values corresponding to the names. The animator can also retrieve the values of string constants from the string space.

CHAPTER 3

E-MACHINE COMPILATION CONSIDERATIONS

Many of the compilation concerns confronting E-machine compiler writers are the same as those faced by writers of compilers for conventional machines.
There are, however, several unique factors that must be addressed when compiling for the E-machine’s animation environment, including:

o identification and translation of program animation units into E-code packets;
o generation of the Static Scope Table;
o providing access to names associated with enumerated type variables;
o identifying critical and noncritical E-code instructions.

Program Animation Units and E-code Packets

As briefly described in chapter 2, the animation of a high level language program is accomplished by dividing its source code into program “chunks” called animation units. The compiler is responsible for isolating a source program’s animation units. Each animation unit, in turn, must be translated into a group (or packet) of E-code instructions, along with a packet structure that describes the animation unit and its translated E-code packet.

When a high level language program is animated, the animator begins execution by displaying the first several lines of the source code and highlighting the first animation unit in the program. The animator then awaits a response from the user. When the user responds, the animator calls the E-machine to execute the currently highlighted animation unit of the program. Actually, what the E-machine executes is the packet of instructions corresponding to the animation unit. When the E-machine has completed execution of the instructions contained in the packet, control is returned to the animator. The animator then performs various animation tasks (e.g., displaying pertinent data memory values) and then again awaits a user response before repeating this process by highlighting the next animation unit and so forth. Thus, two of the challenging tasks facing the compiler designer are identifying animation units and properly translating them into E-code packets for successful animation.
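The interaction between the animator and the E-machine described above can be summarized in a short Python sketch. Everything named here (animate, execute_packet, and the callback parameters) is hypothetical; the sketch assumes only that the E-machine executes one packet per call and that the animator can then find out which packet comes next.

```python
def animate(packets, execute_packet, wait_for_user, highlight, display_values):
    """Drive one forward animation run.

    packets        -- descriptions of the animation units, one per packet
    execute_packet -- runs one packet and returns the index of the next packet,
                      or None when the program has finished
    """
    current = 0
    while current is not None:
        highlight(packets[current])          # show the animation unit about to run
        wait_for_user()                      # pause until the user responds
        current = execute_packet(current)    # the E-machine executes the packet
        display_values()                     # e.g., show pertinent data memory values


# Stand-in callbacks that simply record the order of events.
events = []
units = ["unit 0", "unit 1"]
animate(
    units,
    execute_packet=lambda i: i + 1 if i + 1 < len(units) else None,
    wait_for_user=lambda: events.append("wait"),
    highlight=lambda unit: events.append("highlight " + unit),
    display_values=lambda: events.append("display"),
)
```

The stand-in execute_packet simply advances to the following packet; in a real run the next packet depends on the branches taken while the current packet executes.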
The following two sections present an example program to illustrate how the miniPascal compiler accomplishes these two tasks. Although this example program posed no particular problems for the compiler, a number of subtle problems relative to identifying and translating program animation units were encountered during the compiler’s development. These problems and their solutions are discussed in detail in chapter 4.

Identifying Program Animation Units

The compiler identifies individual animation units as it is parsing the high level language source code. Consider the miniPascal program in figure 2 (the numbers on the left correspond to line numbers in the source program file). For this program, the miniPascal compiler identifies the nineteen animation units shown in figure 3 (the numbers on the left correspond to each animation unit’s associated packet structure, as discussed in the next section). These animation units will be successively highlighted (in the original source program of figure 2) by the animator as it performs the animation of the program. It should be noted that the determination of animation units is arbitrary and can vary from one compiler to another based on subjective aesthetics of program animation.

     0  Program Sampl;
     1
     2    VAR
     3      I,J,K:INTEGER;
     4      N:INTEGER;
     5
     6    Procedure Init(VAR X,Y:INTEGER);
     7      BEGIN
     8        X := 1;
     9        Y := 2;
    10        END;
    11
    12    BEGIN
    13      Init(I,J);
    14      IF I < 10
    15        THEN K := 100
    16        ELSE K := 0;
    17      N := K + I*J
    18      END.

Figure 2: Source Code for Program Sampl

     0  Program Sampl;
     1  VAR
     2  I,J,K:INTEGER;
     3  N:INTEGER;
     4  Procedure Init
     5  (VAR X,Y:INTEGER);
     6  BEGIN
     7  X := 1;
     8  Y := 2;
     9  END;
    10  BEGIN
    11  Init(I,J);
    12  IF I < 10
    13  THEN
    14  K := 100
    15  ELSE
    16  K := 0;
    17  N := K + I*J
    18  END.

Figure 3: Animation Units Identified in Program Sampl
As can be seen from this example, an animation unit can correspond to “chunks” of source code representing a single keyword, an entire program statement, the conditional part of an if statement, and so forth.

Translating Program Animation Units into E-code Packets

Once the compiler has identified an animation unit, it must then translate this unit into a corresponding packet of E-code instructions along with an associated descriptive packet structure. Thus, compilation of the example given in figure 2 would result in the generation of nineteen E-code packets and nineteen corresponding packet structures. Figure 4 shows the pseudo assembly language representation of the E-code instructions generated for the miniPascal program shown in figure 2. The numbers shown on the left in figure 4 correspond to program memory addresses (instruction numbers). The individual packets, corresponding to the animation units of figure 3, are shown separated by blank lines in figure 4.

Table 1 shows the array of packet structures, called the Packet Table, describing the individual packets resulting from the translation of the program of figure 2. The PacketNumber field (column) is included for clarity; it is not part of the Packet Table. The first two fields in the Packet Table (StartAddr and EndAddr) give the starting and ending addresses in program memory of the E-code packet. The next four fields (StartLine, StartCol, EndLine, and EndCol) demarcate the physical location of the packet’s corresponding program animation unit in the source program array. The ScopeIndex field in the Packet Table is discussed in the next section of this chapter. The final two fields (FragAddr and DisplayPacket) provide additional information necessary for animating an animation unit and are discussed in chapter 4.

     0  pushd C12
     1  nop

     2  nop

     3  inst c,V0
     4  inst c,V1
     5  inst c,V2

     6  inst c,V3
     7  br 0

     8  label L1
     9  pushd C9

    10  link V5
    11  link V4

    12  nop

    13  push I,C1
    14  pop c,I,V5

    15  push I,C2
    16  pop c,I,V4

    17  nop
    18  unlink V4
    19  unlink V5
    20  popd
    21  return

    22  label 0
    23  nop

    24  pusha V1
    25  pusha V2
    26  call 1

    27  label 2
    28  label 3
    29  inst c,V6
    30  push I,V2
    31  push I,C10
    32  less c,I
    33  pop c,B,V6
    34  push B,V6
    35  brf c,4

    36  nop

    37  push I,C100
    38  pop c,I,V0
    39  br 5

    40  label 4
    41  nop

    42  push I,C0
    43  pop c,I,V0

    44  label L5
    45  inst c,V7
    46  push I,V2
    47  push I,V1
    48  mult c,I
    49  pop c,I,V7
    50  inst c,V8
    51  push I,V0
    52  push I,V7
    53  add c,I
    54  pop c,I,V8
    55  push I,V8
    56  pop c,I,V3

    57  nop
    58  uninst c,V8
    59  uninst c,V7
    60  uninst c,V6
    61  uninst c,V3
    62  uninst c,V0
    63  uninst c,V1
    64  uninst c,V2
    65  popd

Figure 4: E-code Instructions Resulting from Compilation of Program Sampl

    Packet  Start  End   Start  Start  End   End  Scope  Frag  Display
    Number  Addr   Addr  Line   Col    Line  Col  Index  Addr  Packet
       0      0      1     0      0      0    14    0     -1   TRUE
       1      2      2     2      2      2     4    0     -1   TRUE
       2      3      5     3      4      3    17    3     -1   TRUE
       3      6      7     4      4      4    13    4     -1   TRUE
       4      8      9     6      2      6    15    0     -1   TRUE
       5     10     11     6     17      6    33    2     -1   TRUE
       6     12     12     7      4      7     8    2     -1   TRUE
       7     13     14     8      6      8    12    2     -1   TRUE
       8     15     16     9      6      9    12    2     -1   TRUE
       9     17     21    10      6     10     9    5     -1   TRUE
      10     22     23    12      2     12     6    5     -1   TRUE
      11     24     26    13      4     13    13    5     -1   TRUE
      12     27     35    14      4     14    12    5     -1   TRUE
      13     36     36    15      6     15     9    5     -1   TRUE
      14     37     39    15     11     15    18    5     -1   TRUE
      15     40     41    16      6     16     9    5     -1   TRUE
      16     42     43    16     11     16    17    5     -1   TRUE
      17     44     56    17      4     17    15    5     -1   TRUE
      18     57     65    18      4     18     7    5     -1   TRUE

Table 1: Packet Table Resulting from Compilation of Program Sampl

Generation of the Static Scope Table

The compiler writer must also provide information describing all of the data memory variables that the animator must display. This information is provided in the Static Scope Table, a linear array which is, in turn, logically divided into numerous scope blocks. Each scope block describes the identifiers (e.g., variable names and procedure names) declared in a single static scope in a program.
Even though this information is obtained from the compiler’s symbol table, the generation of the Static Scope Table is not a straightforward task due to the scope nesting characteristics of many high level languages, such as miniPascal.

Table 2 shows the Static Scope Table that is generated as a result of compiling the miniPascal program given in figure 2. The Entry (entry number) column, or field, is included for clarity; it is not part of the Static Scope Table. This Static Scope Table consists of three scope blocks: a block describing the identifiers declared within the scope of procedure Init (entries 0-3), a block describing the identifiers declared within the scope of program Sampl (entries 4-10), and a “bootstrap” block describing the main program entry (entries 11-13).

    Entry  Id     Upr  Lwr  Nxt  Offset  Type       Rec  Parent  Child  Var  Proc
           Name   Bnd  Bnd  Idx                     Siz                 Reg  Num

    Scope block describing procedure Init
      0    -      -    -    -    -       HEADER     -    4       -      -    -
      1    X      -    -    -    -       INTEGER    -    -       -      5    -
      2    Y      -    -    -    -       INTEGER    -    -       -      4    -
      3    -      -    -    -    -       END        -    -       -      -    -

    Scope block describing program Sampl
      4    -      -    -    -    -       HEADER     -    11      -      -    -
      5    I      -    -    -    -       INTEGER    -    -       -      2    -
      6    J      -    -    -    -       INTEGER    -    -       -      1    -
      7    K      -    -    -    -       INTEGER    -    -       -      0    -
      8    N      -    -    -    -       INTEGER    -    -       -      3    -
      9    Init   -    -    -    -       PROCEDURE  -    -       0      -    1
     10    -      -    -    -    -       END        -    -       -      -    -

    Bootstrap scope block
     11    -      -    -    -    -       HEADER     -    -       -      -    -
     12    Sampl  -    -    -    -       PROGRAM    -    -       4      -    0
     13    -      -    -    -    -       END        -    -       -      -    -

Table 2: Static Scope Table Resulting from Compilation of Program Sampl

The bootstrap block contains three entries: the HEADER and END entries that delimit the scope block and a PROGRAM entry containing information about the program itself. There are two fields of interest in the PROGRAM entry: the child pointer field (Child) and the procedure number field (ProcNum). The Child field contains the index of the first entry of the scope block describing the identifiers declared in the program.
The ProcNum field contains a compiler-generated number that is used in conjunction with dynamic scoping; this field is discussed in chapter 4.

The entries in the scope block describing the identifiers declared in the program scope consist of the HEADER and END delimiter entries as well as entries describing each of the scope’s identifiers. The Parent field of the HEADER entry in this scope block contains the index of the first entry of the bootstrap scope block. This scope block’s PROCEDURE entry, which describes procedure Init, uses the Child field, which contains the index of the first entry of the scope block describing the identifiers declared in procedure Init. The ProcNum field is also used in the PROCEDURE entry; it contains a compiler-generated number to be used in conjunction with dynamic scoping.

The entries in the scope block describing the identifiers declared in procedure Init consist of the HEADER and END delimiter entries as well as entries describing each identifier declared in the scope, in this case the procedure’s parameters. The Parent field of the HEADER entry of this scope block contains the index of the first entry of the scope block containing the procedure’s declaration.

There must also be some way to relate a high level language program’s dynamic nature to the static information found in the Static Scope Table. That is, the animator must be able to determine all of the active scopes at any given point during execution of the program. The animator can then display the data memory values pertinent to the most current scope as well as the data memory values associated with the scopes in the calling sequence leading to the most current scope.

The animator retrieves dynamic scoping information from the E-machine’s dynamic scope stack. For instance, suppose that the animator has just highlighted the animation unit X := 1; in procedure Init.
After receiving a response from the user, the animator then calls the E-machine to execute the E-code packet corresponding to this animation unit. When the E-machine returns control to the animator, the animator must then determine the relevant data memory values to be displayed following any changes that resulted from execution of the packet. This task is accomplished by querying the E-machine’s dynamic scope stack, which contains a history of the active scopes. In this example, the dynamic scope stack currently consists of two entries, each containing an index into the Static Scope Table. The top entry contains the value 9 and the bottom entry contains the value 12. These values indicate to the animator that procedure Init (Static Scope Table entry number 9) is the most current active scope and that program Sampl (entry number 12) is the calling scope. By using the child pointers associated with these two Static Scope Table entries, the animator can now determine the appropriate data memory values to be displayed. Figure 5 shows a possible animation resulting from the execution of this animation unit. The arrow (==>) pointing to the instruction Y := 2; indicates where animation will proceed.

The ScopeIndex field of the packet structure can now be explained. Suppose that the E-machine has completed execution of the packet corresponding to the animation unit I,J,K:INTEGER; and has returned control to the animator. The animator, via a query of the dynamic scope stack, now determines that only the values of the variables contained in the outer program scope should be displayed. The variables listed in the block describing this scope’s variables are I, J, K, and N. However, at this point in the program’s execution, variable N has not yet been declared, and thus should not be displayed.
The ScopeIndex field of the packet structure associated with the above animation unit contains the value 3. This value indicates to the animator that it should only display data memory values for entries numbered 0, 1, 2, and 3 in the window associated with the most current active scope block. Hence, the animator will display the values of the variables I, J, and K (0 stands for the HEADER entry). In this case, all of these variables would have the value “undefined,” as they have only just been declared and have not yet had values assigned to them.

        Program Sampl;                          Program Sampl
                                                  I = 1
          VAR                                     J is undefined
            I,J,K:INTEGER;                        K is undefined
            N:INTEGER;                            N is undefined

          Procedure Init(VAR X,Y:INTEGER);      Procedure Init
            BEGIN                                 X = 1
              X := 1;                             Y is undefined
    ==>       Y := 2;
              END;

          BEGIN
            Init(I,J);
            IF I < 10
              THEN K := 100
              ELSE K := 0;
            N := K + I*J
            END.

Figure 5: Animation Display After Execution of X := 1;

Translating Enumerated Type Variables

Ordinarily, only the internal numeric value of an enumerated type variable is required in translated object code. It is desirable, however, for program animation purposes to have the animator display the enumerated constant name rather than just the internal numeric value of a variable of an enumerated type. Thus, when translating an enumerated type variable, the compiler must provide a way for the animator to relate the variable’s internal numeric value to its corresponding constant name. This task was accomplished by the addition of the string space to the E-machine’s architecture. The string space holds the enumerated constant names (as well as string literals) defined in a miniPascal program. The method that the miniPascal compiler uses to relate an enumerated type variable’s internal numeric value to the appropriate name in the string space is discussed in chapter 4.
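The purpose of the string space for enumerated types can be illustrated with a deliberately simplified Python sketch. The layout assumed here (the constant names of one enumerated type stored consecutively and located through a base index) is hypothetical; the method actually used by the miniPascal compiler is the subject of chapter 4 and may well differ. The constant names and the function enum_name are invented for the example.

```python
# Invented enumerated constant names, assumed to be stored consecutively.
string_space = ["Red", "Green", "Blue"]


def enum_name(base_index, internal_value):
    """Map an enumerated variable's internal numeric value to its constant name.

    base_index is the (assumed) position in the string space of the first
    constant name of the variable's enumerated type.
    """
    return string_space[base_index + internal_value]


shown = enum_name(0, 1)   # the name the animator would display for internal value 1
```

Whatever the actual layout, the point is the same: the object code continues to manipulate only the numeric value, and the name is looked up solely for display.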
Identifying Critical and Noncritical E-code Instructions

The final major E-machine compilation concern is that of identifying the E-code instructions that would destroy information that is needed (i.e., critical) for successful reverse execution. Since the immediate concern for the miniPascal compiler was to produce a usable compiler, the current version of the compiler treats all E-code instructions as critical. For example, the animation unit N := K + I*J in figure 2 corresponds to the packet of E-code instructions numbered 44 through 56 in figure 4. All of these instructions are marked critical via the “c” operand. Only instruction number 56 is actually critical, however, as only it results in critical information being destroyed. That is, the old value of N is being destroyed by popping a new value into it in instruction 56; for reverse execution, this old value of N must be saved. Thus, the packet of E-code instructions corresponding to this animation unit could be generated as shown in figure 6, where the operand “n” indicates a noncritical instruction.

    44  label L5
    45  inst n,V7
    46  push I,V2
    47  push I,V1
    48  mult n,I
    49  pop n,I,V7
    50  inst n,V8
    51  push I,V0
    52  push I,V7
    53  add n,I
    54  pop n,I,V8
    55  push I,V8
    56  pop c,I,V3

Figure 6: E-code Instructions Translating N := K + I*J

CHAPTER 4

THE DESIGN OF THE miniPASCAL COMPILER

The miniPascal compiler is a one-pass compiler written in ANSI Standard C and developed with Borland C++ 3.1 on an IBM PC compatible computer. E-machine object files (E-code files) generated by the miniPascal compiler have been tested using a simple DOS animator driving the E-machine emulator. Even though the capabilities of this animator are quite limited, a significant number of miniPascal programs have been compiled, executed, and animated successfully.
The miniPascal Language

The miniPascal language is a subset (with a few noted extensions) of ISO Standard Pascal as defined in the book Pascal User Manual and Report by Jensen and Wirth [Jensen 91]. The following Pascal features are supported by miniPascal:

o constant, type, and variable declarations;
o procedure and function declarations;
o simple types including integer, real, character, boolean, enumerated types, and subrange types;
o structured types:
  - single and multidimensional arrays,
  - strings, including arrays of strings,
  - fixed-part records including records whose fields are arrays, records, strings, or enumerated types (arrays of records are also supported);
o boolean expressions, unary expressions, and infix expressions;
o assignment statements;
o procedure and function calls;
o control statements:
  - the if-then and if-then-else statements,
  - the while loop,
  - the repeat loop,
  - the for loop,
  - the case statement (with the extension of an others clause).

The following Pascal features are not currently supported in miniPascal:

o records with variant parts;
o the with statement;
o pointers;
o sets;
o labels;
o the goto statement;
o external files;
o the forward directive;
o predeclared functions and procedures;
o procedure or function names as parameters;
o conformant-array parameters.

Overview of the miniPascal Compiler

The miniPascal compiler was developed using the lex and yacc compiler development tools [Mason 90]. Lex is a scanner generator written by M.E. Lesk and E. Schmidt of Bell Laboratories [Lesk 75] and yacc is a parser generator written by S.C. Johnson, also of Bell Laboratories [Johnson 75]. Lex reads a specification file of regular expressions identifying the tokens in a language and generates a C module containing a scanner for those tokens.
Yacc reads a specification file containing a context-free grammar (and associated semantic actions) for a language and produces a C module containing an LALR(1) parser for the language. The basic lex and yacc specifications for ISO Standard Pascal were obtained from the ftp network site primost.cs.wisc.edu. The semantic stack definition and semantic actions were then added to these specifications.

Both lex and yacc are standard utilities available on Unix machines. Even though there are versions of these utilities available for DOS machines, the lex and yacc specifications for miniPascal have been run exclusively on a Unix machine, with the resulting C modules being downloaded to a DOS machine. These C modules were then compiled and linked with numerous other C modules containing the semantic analysis and code generation routines.

The compiler consists of a total of sixteen modules. Figure 7 is a schematic diagram showing the interactions among the various modules; the directions of the arrows indicate calls to a module. Three of the sixteen modules are omitted from the figure for the sake of clarity. These are the Error Message module, the Memory Allocation module, and the module that produces a text file containing the pseudo assembly language instructions translating the source program (used for compiler debugging purposes). A brief description of the compiler operation is given below.

[Figure 7: Schematic Diagram of the miniPascal Compiler. Modules shown: Main, Parser, Scanner, Semantic Analysis, Code Driver, Symbol Table, PACKET, LABEL, VARIABLE, CODE, STRING, SOURCE, and STATSCOPE, together with the source file.]

After the Main module opens appropriate files, it calls the Parser module, which drives the compilation process by requesting tokens from the Scanner module and by calling various semantic analysis and code generation routines, notably the Semantic Analysis module and the Code Driver module.
As can be seen in figure 7, the Symbol Table module plays a central role during semantic analysis and code generation. Seven of the modules are dedicated to producing the E-code object file. These modules are:

• the PACKET module, which produces the PACKETSECTION;
• the LABEL module, which produces the LABELSECTION;
• the VARIABLE module, which produces the VARIABLESECTION;
• the CODE module, which produces the CODESECTION;
• the STRING module, which produces the STRINGSECTION;
• the SOURCE module, which produces the SOURCESECTION;
• the STATSCOPE module, which produces the STATSCOPESECTION.

When compilation is complete, control is returned to the Main module, which then calls routines in each of these seven E-code production modules in order to generate the final E-code file (these calls are not indicated in figure 7). If the compiler encounters an error during compilation, a call is made to the Error module (omitted from figure 7), which prints an error message and then calls a routine in the Main module for immediate termination of compilation.

Error Detection and Recovery

When the compiler detects an error in a miniPascal source file, an appropriate message is printed and the compilation is halted. The initial users of this compiler will be instructors preparing laboratory exercises, not students developing programs. Thus, minimal error reporting with no recovery was considered to be sufficient.

Optimization

There are no provisions for optimization in the compiler. There is no real need for optimization in the animation environment, and many optimizations would alter the E-code/source language relationship too severely for animation to be successful.

The Compiler Modules

The remainder of this chapter describes the individual compiler modules in more detail.
The discussion is focused on the role each module plays in the generation of the seven sections of the E-code file, giving particular attention to those sections that presented problems unique to this compiler. The E-code’s CODESECTION is essentially the equivalent of the intermediate code files generated by many compilers; the problems encountered in generating this section were the same as would be found in the development of any compiler. The four sections, VARIABLESECTION, LABELSECTION, SOURCESECTION, and STRINGSECTION, are unique to the E-machine; they, however, posed no particular problems and are generated in a straightforward manner. The PACKETSECTION, also unique to the E-machine, did present some problems, which are discussed below in the Parser Module description. The problems presented by the STATSCOPESECTION are discussed in the STATSCOPE Module description. Another E-code generation problem occurred due to the desire to have the animator display an enumerated type variable’s constant name as well as its internal numeric value. The solution to this problem was to add the STRINGSECTION to the E-machine object file as discussed in the STRING Module description.

The Main Module

When the miniPascal compiler is invoked, control passes to the main routine in the Main module. The Main module consists of the main routine, routines that handle the opening and closing of files, and a routine to handle abnormal end of compilation. The Main module opens the miniPascal source file, whose name is obtained from a command line argument when the compiler is invoked. The Main module then creates three files to hold the output from the compilation: the E-code (object) file, a file to hold the pseudo assembly language instruction stream (for compiler debugging), and a temporary file to hold output from the module producing the CODESECTION of the E-code file.
Next, the Main module calls the yacc-generated yyparse routine in the Parser module to begin the compilation. When yyparse returns successfully to the Main module, the compilation is complete, and the Main module then calls routines in the code producing modules to write the various E-code sections to the E-code file. Finally, the files are closed and the compiler exits normally. If a return marking an unsuccessful compilation is made to the Main module, the miniPascal source file is closed, the output files are deleted, and an abnormal exit is indicated.

The Parser Module

As indicated above, the yacc-generated Parser module is responsible for driving the compilation process. Yacc produces an LALR(1) parser by processing a specification file containing a context-free grammar that generates the source language. Calls to semantic routines, written in C, are interspersed among the grammar production rules given in the miniPascal yacc specification. The yacc-generated parser maintains a parser-controlled semantic stack, whose records hold information corresponding to each token and non-terminal found in the grammar productions. The parser has access to the information in the semantic stack records via pseudo variables used in the semantic actions. The yacc specification provides a union structure to define the different types of semantic stack records necessary to describe the various semantic information required for each symbol in the grammar. In the case of the miniPascal specification, this union structure consists primarily of pointers to dynamically allocated structures containing information needed to produce the E-code for the animation of a miniPascal program. The yacc-generated Parser module consists of one very large routine, yyparse. Two small user-supplied supporting routines are also included in this module.

Calls to the Scanner.
As in conventional compilers, the Parser module requests the next token from the Scanner module by calling the lex-generated yylex routine. The Parser module has access to the value of a token through the external variable yytext, whose value is produced in the Scanner module. Since the miniPascal language is not case-sensitive, the Scanner converts all letters in an identifier name token to lower case before returning the token (and its value) to the Parser module. The numeric values of integer and real literal tokens are available to the Parser module via the external variable yylval, also produced in the Scanner module.

Interface to the Symbol Table. The Parser module interfaces directly with the Symbol Table module to enter and retrieve identifier names. The Parser module enters and retrieves an identifier’s symbol table attributes by calling routines in the Semantic Analysis module.

Initiating Semantic Actions. Many of the semantic actions initiated by the Parser module are accomplished by calls to routines in the Code Driver module. These routines perform further semantic analysis (via calls to routines in the Semantic Analysis module) and then generate code (via calls to the code production routines). For example, when the Parser module recognizes an assignment production, it calls the GenAssign routine in the Code Driver module. The parameters passed to GenAssign are pointers to the semantic stack structures corresponding to the symbols involved in the assignment production rule of the grammar. GenAssign can then determine whether the assignment is valid, determine what value (if any) to load into the index register, and generate appropriate E-code by calling routines in the code production modules. There are also situations in which the Parser module itself can cause E-code generation directly by calling code production routines.

Providing for Dynamic Scoping.
The Parser module provides dynamic scoping information to the E-machine by generating code to manipulate the E-machine’s dynamic scope stack. (Static scoping information is contained in the Static Scope Table and is discussed in detail in the STATSCOPE module description.) When the Parser module encounters the beginning of a program, function, or procedure scope, it calls the GenInstr routine in the CODE module to generate the pushd instruction. At run time, the pushd instruction causes an entry to be pushed onto the E-machine’s dynamic scope stack. This entry contains the index of the scope’s declaration (e.g., procedure name description) in the Static Scope Table; this index must be passed as an operand in the pushd instruction. (Recall from the discussion of the dynamic scope stack in chapter 2 that this is necessary in order that identifiers in calling scopes be accessible at run time.) At this point in the parse, however, this index value is not known because the scope’s declaration is “owned” by the containing scope, whose Static Scope Table entries will not be generated until that entire scope has been parsed. This means that the Parser must associate a dummy index value with the pushd instruction, and the instruction must be “patched” when the actual value becomes available. When the Parser module encounters the end of a scope, it generates the popd instruction and then calls the GenStatScopeBlock routine in the STATSCOPE module to generate the Static Scope Table entries for the scope. When the Parser module finally encounters the end of a containing scope, the STATSCOPE module can calculate the index of any nested procedure or function scope declarations and patch their corresponding pushd instructions via a call to the CODE module.

Translating Animation Units into Packets. The Parser module controls the identification and subsequent translation of a miniPascal program’s animation units.
This translation involves the generation of a packet of E-code instructions (via calls to the CODE and Code Driver modules) as well as the construction of an associated packet structure describing the animation unit. The Parser calls the following routines in the PACKET module to construct a packet structure:

• StartPacket;
• EndPacket;
• AdjustStartPacket;
• AdjustEndPacket;
• AddPktFragInstr.

A packet structure’s delimiter values (the source file line and column number boundaries of the animation unit that the packet translates, as well as the starting and ending program memory addresses of the E-code packet itself) are determined by the Parser module, which then passes these values to the appropriate PACKET module routine. The Parser module has access to the source file line and column number values via external variables that are initialized when the Scanner module recognizes a token; the Parser has access to the current program memory address (instruction number) via an external variable maintained by the CODE module as E-code instructions are generated. When the Parser module recognizes a token that marks the beginning of an animation unit, it calls the StartPacket routine, passing as parameters the source file line and column numbers corresponding to the first character of this token as well as (in the general case) the number of the next E-code instruction to be generated. The PACKET module maintains an internal variable, PktNum, containing the number of the packet structure currently under construction. This variable, which serves as the index into the PACKET module’s array of packet structures, is incremented in the StartPacket routine. Subsequent calls to any of the other PACKET module routines listed above refer to the previously “started” packet structure.
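The bookkeeping just described can be sketched in C as follows. This is only an illustrative sketch, not the thesis's actual PACKET module: the field names, the array capacity, and the exact parameter lists are assumptions. Only the routine names, the PktNum variable, and the rule that StartPacket advances PktNum while the other routines update the current packet come from the text.

```c
#include <assert.h>

#define MAX_PACKETS 256   /* assumed capacity; the text gives no limit */

/* Hypothetical layout of one packet structure (field names are assumptions). */
typedef struct {
    int start_addr, end_addr;   /* E-code instruction numbers of the packet */
    int start_line, start_col;  /* first character of the animation unit   */
    int end_line, end_col;      /* last character of the animation unit    */
} Packet;

static Packet packets[MAX_PACKETS];
static int PktNum = -1;         /* index of the packet under construction  */

/* Begin a new packet; this is the only routine that advances PktNum. */
void StartPacket(int line, int col, int next_instr)
{
    Packet *p;
    PktNum++;
    assert(PktNum < MAX_PACKETS);
    p = &packets[PktNum];
    p->start_line = line;
    p->start_col = col;
    p->start_addr = next_instr;
    /* Until EndPacket is called, the packet is treated as empty. */
    p->end_line = line;
    p->end_col = col;
    p->end_addr = next_instr;
}

/* Record (or re-record) the end of the current packet; may be called repeatedly. */
void EndPacket(int line, int col, int cur_instr)
{
    assert(PktNum >= 0);
    packets[PktNum].end_line = line;
    packets[PktNum].end_col = col;
    packets[PktNum].end_addr = cur_instr;
}

/* Change only the ending program memory address of the current packet. */
void AdjustEndPacket(int instr)
{
    assert(PktNum >= 0);
    packets[PktNum].end_addr = instr;
}
```

With these definitions, calling StartPacket(5, 4, 7) followed by EndPacket(5, 8, 7) would produce the values that Table 3 records for the BEGIN keyword of the procedure (start address 7, end address 7, line 5, columns 4 through 8).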
For a given animation unit, StartPacket is called only once, while the remaining routines (including EndPacket) may be called any number of times while the animation unit’s packet structure is being constructed.

In the simplest case, upon recognizing a token that marks the beginning of an animation unit, the Parser module first calls the StartPacket routine and then generates any corresponding E-code instructions. As other tokens within the animation unit are recognized, the Parser continues to generate E-code instructions. When the Parser recognizes the token marking the end of the animation unit, it calls the EndPacket routine, passing as parameters the source file line and column numbers corresponding to the last character of this token as well as the number of the current E-code instruction. For example, the BEGIN keyword is considered to be a complete packet. Thus, upon recognizing the BEGIN token, the Parser first calls the StartPacket routine, passing the source file line and column numbers associated with the letter B as well as the number of the next E-code instruction to be generated. Next, the Parser generates the E-code nop instruction (via a call to the CODE module). Finally, the Parser calls the EndPacket routine, passing the source file line and column numbers associated with the letter N as well as the current E-code instruction number (corresponding to the nop instruction). There are many cases in which an animation unit’s delimiters can be determined in such a straightforward manner; there were, however, a number of subtle animation unit translation problems encountered during compiler development.

The Lookahead Problem in Animation Unit Translation. One of these problems occurs when the Parser module is assigning source file line and column numbers to an animation unit.
For those tokens that delimit an animation unit, the parser calls either StartPacket or EndPacket, passing the token’s line and column numbers and the appropriate instruction number. The values in the external variables containing the line and column numbers, however, are incorrect when the Parser module must examine the lookahead token in order to determine which production to reduce. (Recall that yacc produces an LALR(1) parser that sometimes requires a one-symbol lookahead for proper parsing actions.) In these cases, the current line and column numbers reflect the location of the lookahead token instead of the token delineating the animation unit. This problem was solved by identifying the tokens that were involved in these situations and replacing them with non-terminals on the right-hand sides of productions. Each new non-terminal becomes the left-hand side of a unit production whose right-hand side is just the token the non-terminal replaced. During reduction of one of these new unit productions, the token’s line and column numbers can be captured and placed in the semantic record belonging to the token because no lookahead is required to reduce these unit productions. Yacc places this record on the semantic stack, which allows access by routines processing the stack. Thus, when a production that has one of the new non-terminals on its right-hand side is reduced, the correct values can be retrieved from the semantic stack and passed to the PACKET module routines.

The Semicolon Problem in Animation Unit Translation. Another problem was easily solved by calling the EndPacket routine more than once for the same animation unit. This situation can be illustrated by Pascal’s use of semicolons to separate, rather than terminate, statements (recall that the semicolon is not part of a Pascal statement [Jensen 91]). For example, in the Pascal code fragment shown in figure 8, the semicolon at the end of J := 2; is unnecessary.
It really separates J := 2 from a null statement, and the null statement precedes the END statement. However, the statement (including the semicolon)

    J := 2;

is considered to be an animation unit. The yacc production associated with assignment statements, however, does not immediately associate the semicolon with the statement (e.g., J := 2). Rather, an enclosing production eventually accomplishes this association. Thus, in this case, the Parser first issues a call to EndPacket, passing the source file line and column numbers associated with the token “2”, as well as the number of the final instruction within the E-code translation of the animation unit. Then, when the Parser reduces the enclosing production recognizing the semicolon, the Parser again calls the EndPacket routine, passing the source file line and column numbers associated with the token “;”, as well as the current instruction number (this number will not have changed since no code is generated for the semicolon). Hence, the animation unit is “ended” correctly. As noted for this case, since the statement precedes an END statement, the semicolon is optional. If the semicolon is omitted, the Parser performs correctly by calling the EndPacket routine only once because the enclosing production in this case has no semicolon.

    BEGIN
      I := 1;
      J := 2;
    END;

Figure 8: Code Fragment Illustrating the Semicolon Problem

Adjusting an Animation Unit’s Ending Delimiter. There are instances when an animation unit’s ending delimiters must be adjusted after the Parser has already ended construction of its associated packet structure. For example, if there are procedure declarations following a scope’s variable declarations, the Parser must generate an instruction to branch around the code translating the nested procedures in order to achieve the correct program flow.
During program execution, the animator will need to highlight the animation units corresponding to the variable declarations in this routine, then skip the procedure declarations and continue by highlighting the body of this routine, demonstrating execution flow. In such cases, the Parser has already (correctly) ended the construction of the current packet structure, which describes the animation unit consisting of the last variable declaration in the scope, prior to reaching the first procedure declaration. The branch instruction number, however, must now be included in the current packet structure as the ending program memory address of the corresponding E-code instruction packet to ensure proper animation around the procedure declarations. In this situation, the packet structure cannot simply be “ended” again, because the current source line and column numbers now reflect the location of the beginning of the animation unit corresponding to the subsequent procedure declaration. Hence, the AdjustEndPacket routine was designed to alter the ending program memory address associated with the current packet structure. For the example given above, the Parser calls the AdjustEndPacket routine, passing as a parameter the E-code instruction number associated with the branch instruction. The Parser then continues by calling the StartPacket routine to begin construction of the packet structure describing the procedure declaration animation unit.

Adjusting an Animation Unit’s Beginning Delimiter. The routine AdjustStartPacket was provided to support the situation in which an animation unit is nested within another animation unit. This situation occurs when there is a function call within another miniPascal statement.
For example, consider the following statement:

    Result := Min(x,y) + 2;

Upon recognizing the assignment production, the Parser issues a call to the StartPacket routine to begin construction of the packet structure describing the animation unit consisting of the entire statement. For animation purposes, however, it is desirable to highlight the function call, Min(x,y), separately in order to illustrate the fact that the function Min must be called before the assignment statement can be completed (i.e., Min(x,y) should be treated as a separate animation unit). Thus, when the Parser recognizes the function call, it issues a call to the AdjustStartPacket routine, passing the source line and column numbers associated with the beginning of the function call. The AdjustStartPacket routine returns (via parameters) the previous source line and column numbers associated with the original StartPacket call for the current packet structure. The Parser then continues to control the construction of the current packet structure, which now describes only the Min(x,y) portion of the statement, generating an E-code instruction packet translating Min(x,y). When the Parser completes the processing of the function call production, it calls the EndPacket routine to end the current packet structure and then calls the StartPacket routine to start construction of the next packet structure. The line and column numbers passed to this call to StartPacket are the previous source line and column numbers returned by the AdjustStartPacket routine; the instruction number of the next E-code instruction is also passed to StartPacket. Thus, the source code associated with the now current packet structure is the entire assignment statement; the E-code packet translating this assignment statement does not include the instructions translating the function call.

Adjusting the Starting Memory Address of a Packet.
There is a case when it is necessary to retain the value of the current E-code instruction number so that it can be used as the starting program memory address of the packet translating the next animation unit. Normally, a packet’s starting program memory address is the next instruction number to be generated. However, when an E-code label instruction is the current instruction, there are situations when this instruction must become the first instruction in the next E-code packet. Unfortunately, since it is not an enclosing production that needs the memory address of the label instruction, the semantic stack cannot be used to store this value. This problem is solved by storing the current instruction number (i.e., the number of the E-code label instruction) in a global variable, SaveStartInstrNum, which is accessible in the appropriate production. Thus, whenever the Parser recognizes a token that marks the beginning of an animation unit, it first queries SaveStartInstrNum for a valid instruction number. If the instruction number is valid (i.e., its value is not -1), the Parser passes this instruction number value to the StartPacket routine and then sets SaveStartInstrNum’s value to -1. If the value of SaveStartInstrNum is -1, the Parser passes the number of the next E-code instruction to be generated to StartPacket. It should be noted that use of global variables in the yacc parser must be done very carefully to ensure that the parser does not alter a variable’s value before it is used.

Adjusting the Ending Memory Address of a Packet. Similarly, there is also a case when it is necessary to retain the value of the current E-code instruction number so that it can be used as the ending program memory address of the packet currently under construction. This situation arises when the Parser generates an E-code br (branch) or call instruction immediately preceding the generation of a label instruction.
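The SaveStartInstrNum protocol described above for the starting address can be sketched in C as follows. This is a minimal sketch under stated assumptions: the two helper function names are hypothetical and do not appear in the thesis; only the variable name SaveStartInstrNum and the use of -1 as the "no saved value" marker come from the text.

```c
#include <assert.h>

#define NO_SAVED_INSTR (-1)   /* the "invalid" marker named in the text */

static int SaveStartInstrNum = NO_SAVED_INSTR;

/* Hypothetical helper: called right after a label instruction is generated
 * when that label must become the first instruction of the next packet. */
void RememberLabelAsPacketStart(int label_instr_num)
{
    SaveStartInstrNum = label_instr_num;
}

/* Hypothetical helper: returns the starting address to pass to StartPacket.
 * A saved label address takes priority and is consumed (reset to -1) so
 * that it cannot leak into a later, unrelated packet. */
int TakePacketStartAddr(int next_instr_num)
{
    int start = next_instr_num;
    if (SaveStartInstrNum != NO_SAVED_INSTR) {
        start = SaveStartInstrNum;
        SaveStartInstrNum = NO_SAVED_INSTR;
    }
    return start;
}
```

Resetting the variable inside the same routine that reads it is one way to honor the caution about global variables: the saved value is used exactly once, by the first animation unit that begins after the label.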
Due to the nature of the parse, however, the Parser has not yet determined that it is time to end construction of the animation unit corresponding to the current packet. This decision is made when the next production is processed. This next production is not an enclosing production, and thus cannot retrieve the necessary information from the semantic stack. Here again, a global variable is used. The Parser queries this variable, SaveEndInstrNum, before calling the EndPacket routine. (This is the same situation in which the succeeding label instruction must become the starting program address for the next animation unit.)

Fragmented Animation Units. Another problem in translating animation units occurs when an animation unit becomes “fragmented”. Fragments result when parsing either a single conditional statement or a single procedure/function call that occurs within another conditional statement alone, not within a compound (BEGIN/END) statement. This situation is best explained by an example. Consider the miniPascal program in figure 9 (the numbers on the left correspond to line numbers in the source file). This example illustrates a single procedure call statement that occurs within a while loop (line number 11). Figure 10 shows the pseudo assembly language representation of the E-code instructions translating program Increment1. The numbers shown on the left correspond to program memory addresses (instruction numbers). The individual packets are separated by a blank line in the figure.

First, assume that the animator has two options pertaining to when an animation unit should be highlighted.
One of these options is to highlight an animation unit, await a response from the user before executing the corresponding E-code packet (so that the user can contemplate what will happen when the animation unit is executed), and then rehighlight the same animation unit upon completion of its execution (so that the user can ponder where execution will proceed next). This is the scenario that has been used in previous examples. However, as the user progresses in his understanding of program flow, it would also be desirable to give the animator a second option. This option would allow the animator, upon completion of the execution of an animation unit, to immediately highlight the next animation unit to be executed. The following discussion assumes that the animator is running under this second option.

     0  PROGRAM Increment1;
     1  VAR
     2    i: INTEGER;
     3
     4  PROCEDURE IncrI;
     5  BEGIN
     6    i:=i+1
     7  END;
     8
     9  BEGIN
    10    i:=1;
    11    WHILE i < 5 DO IncrI;
    12  END.

Figure 9: Source Code for Program Increment1

     0  pushd   C7
     1  nop

     2  nop

     3  inst    c,V0
     4  br      0

     5  label   1
     6  pushd   C4

     7  nop

     8  inst    c,V1
     9  push    I,V0
    10  push    I,C1
    11  add     c,I
    12  pop     c,I,V1
    13  push    I,V1
    14  pop     c,I,V0

    15  nop
    16  uninst  c,V1
    17  popd
    18  return

    19  label   0
    20  nop

    21  push    I,C1
    22  pop     c,I,V0

    23  label   2
    24  inst    c,V2
    25  push    I,V0
    26  push    I,C5
    27  less    c,I
    28  pop     c,B,V2
    29  push    B,V2
    30  brf     c,3

    31  nop

    32  call    1
    33  label   5
    34  br      2

    35  label   3
    36  nop
    37  uninst  c,V2
    38  uninst  c,V0
    39  popd

Figure 10: E-code Translation of Program Increment1

Suppose that the animator has just highlighted the animation unit END; of procedure IncrI. Upon receiving a response from the user, the animator calls the E-machine to execute the corresponding E-code packet (consisting of the instructions numbered 15 through 18). The E-machine returns to the animator when it completes execution of the packet. The animator now queries the E-machine’s packet register in order to determine the next animation unit to be highlighted.
Although it is not evident from figure 10, the RETURN instruction (number 18) causes control to pass to the LABEL instruction (number 33) following the CALL instruction that caused control to pass to procedure IncrI. (This is accomplished via the E-machine’s query of its return address stack, as discussed in chapter 2.) Instruction number 33 is within the E-code packet translating the animation unit consisting of the call to procedure IncrI (instruction numbers 32 through 34). Thus, the E-machine’s packet register contains the address of the packet structure describing the animation unit consisting of the call to procedure IncrI. This animation unit, however, was highlighted prior to the call to the procedure and should not be rehighlighted. The animation unit that should be highlighted is WHILE i < 5.

This fragmentation problem was solved by adding a new field to the packet structure definition. Table 3 shows the Packet Table containing the packet structures resulting from the compilation of program Increment1. This new field, named FragAddr in table 3, holds the first program memory address at which this fragmentation can occur. (More than one such LABEL instruction within the E-code packet can cause this problem to occur multiple times due to multiple nesting possibilities.) When the Parser module determines that this situation is possible, it calls the AddPktFragInstr routine to initialize the FragAddr field. The animator must now query the E-machine’s program counter as well as its packet register when determining the next animation unit to be displayed. If the program counter value is greater than or equal to the FragAddr value in the packet structure corresponding to the packet register, the animator does not change its current display (i.e., it continues to highlight the animation unit it is on, not the animation unit described by the packet structure to which the return was made).
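The animator's test can be sketched in C as follows. This is a hedged illustration, not the animator's actual code: the structure, its field names, and the function name are hypothetical. The sketch also assumes that a FragAddr of -1 (the value Table 3 shows for unfragmented packets) means "no fragment address", so that the comparison is applied only to packets that actually carry one.

```c
#include <assert.h>

#define NO_FRAG (-1)   /* assumed meaning of the -1 FragAddr entries in Table 3 */

/* Hypothetical view of the packet fields the animator needs for this test. */
typedef struct {
    int start_addr;
    int end_addr;
    int frag_addr;     /* first address at which fragmentation can occur */
} PacketInfo;

/* Returns 1 if the animator should leave its current highlight unchanged,
 * i.e., control has re-entered the packet at or beyond its fragment address;
 * returns 0 if the packet's animation unit should be highlighted normally. */
int KeepCurrentHighlight(const PacketInfo *pkt, int program_counter)
{
    return pkt->frag_addr != NO_FRAG && program_counter >= pkt->frag_addr;
}
```

For the packet translating the call to IncrI (instructions 32 through 34, FragAddr 33), a program counter of 33 after the return yields 1, so the call is not rehighlighted; a program counter of 32, as on the first entry into the packet, yields 0.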
Of course, the animator must still call the E-machine to complete execution of the fragmented E-code packet even though there is no change in what the animator highlights. Once execution of the fragmented packet is completed, the next animation unit is highlighted, in this case WHILE i < 5.

    Packet  Start  End   Start  Start  End   End  Scope  Frag  Display
    Number  Addr   Addr  Line   Col    Line  Col  Index  Addr  Packet
    ------  -----  ----  -----  -----  ----  ---  -----  ----  -------
       0      0      1     0      0      0    18    0     -1   TRUE
       1      2      2     1      2      1     4    0     -1   TRUE
       2      3      4     2      4      2    13    1     -1   TRUE
       3      5      6     4      2      4    22    0     -1   TRUE
       4      7      7     5      4      5     8    0     -1   TRUE
       5      8     14     6      5      6    10    0     -1   TRUE
       6     15     18     7      5      7     8    2     -1   TRUE
       7     19     20     9      2      9     6    2     -1   TRUE
       8     21     22    10      3     10     7    2     -1   TRUE
       9     23     30    11      3     11    13    2     -1   TRUE
      10     31     31    11     15     11    16    2     -1   TRUE
      11     32     34    11     18     11    28    2     33   TRUE
      12     35     39    12      3     12     6    2     -1   TRUE

Table 3: Packet Table Resulting from Compilation of Program Increment1

It should be noted that the miniPascal code

    WHILE i < 0 DO BEGIN IncrI(i) END;

does not produce a fragmentation problem, because the call to IncrI is contained in a compound statement (BEGIN/END) pair. Figure 11 contains the source code illustrating this situation. Figure 12 shows the pseudo assembly language representation of the E-code instructions translating program Increment2.

     0  PROGRAM Increment2;
     1  VAR
     2    i:INTEGER;
     3
     4  PROCEDURE IncrI;
     5  BEGIN
     6    i:=i+1
     7  END;
     8
     9  BEGIN
    10    i:=1;
    11    WHILE i < 5 DO BEGIN IncrI END;
    12  END.

Figure 11: Source Code for Program Increment2

     0  pushd   C7
     1  nop

     2  nop

     3  inst    c,V0
     4  br      0

     5  label   1
     6  pushd   C4

     7  nop

     8  inst    c,V1
     9  push    I,V0
    10  push    I,C1
    11  add     c,I
    12  pop     c,I,V1
    13  push    I,V1
    14  pop     c,I,V0

    15  nop
    16  uninst  c,V1
    17  popd
    18  return

    19  label   0
    20  nop

    21  push    I,C1
    22  pop     c,I,V0

    23  label   2
    24  inst    c,V2
    25  push    I,V0
    26  push    I,C5
    27  less    c,I
    28  pop     c,B,V2
    29  push    B,V2
    30  brf     c,3

    31  nop

    32  nop

    33  call    1

    34  label   5
    35  nop
    36  br      2

    37  label   3
    38  nop
    39  uninst  c,V2
    40  uninst  c,V0
    41  popd

Figure 12: E-code Translation of Program Increment2
As can be seen in figure 12, in this case the LABEL instruction following the CALL instruction is not within the E-code packet that contains the CALL instruction. This LABEL instruction is the first instruction in the E-code packet translating the END statement associated with the while loop. Thus, upon completion of the execution of the E-code packet translating the procedure’s END statement, the animator would (correctly) highlight the animation unit containing the END statement of the while loop. The fact that the LABEL instruction is physically adjacent to an instruction involved in the translation of the next animation unit (in this case, the while loop’s END statement) allows the Parser to “adjust” the E-code packet boundaries by querying the SaveStartInstrNum and SaveEndInstrNum variables as previously discussed. The sample program found in appendix C illustrates another situation in which a packet becomes fragmented.

To Highlight or Not. A final field, named DisplayPacket in table 3, was added to the packet structure to indicate whether or not the animator should display (highlight) anything at all before the corresponding E-code packet is executed. There are two miniPascal situations when the animator should not change its display before execution of a packet:

• before execution of a packet associated with the return from a function call;

• before execution of a packet containing instructions that determine the case label to which a branch should be made based on the case selector value.

The following two examples illustrate these situations. First, consider the miniPascal statement,

    Result := Min(x,y) + 2;

In this example, the animator will eventually highlight the animation unit Min(x,y) and then await a response from the user before executing the corresponding E-code packet.
When execution of Min is complete and control returns to the calling procedure, a dummy packet containing the E-code to pop the value returned by Min into a temporary variable is executed. Since Min(x,y) has already been highlighted and its corresponding E-code packet has been executed, there is no corresponding source code (i.e., animation unit) associated with this dummy packet. This situation is similar to the fragmentation problem discussed above. In this case, however, execution of the entire dummy packet should not result in any animation (highlighting) of the source program. Thus, the DisplayPacket field in the dummy packet’s associated packet structure is set to FALSE. The animator would continue to highlight Min(x,y) until execution of the dummy packet is complete, and then highlight the animation unit containing the statement

    Result := Min(x,y) + 2;

in order to illustrate the result of executing this assignment statement.

Now, consider the code in figure 13. In this example, upon completion of the parse of the entire case statement (up to the END statement), the Parser module calls the GenCaseSearch routine in the Code Driver module. This routine generates a packet of E-code that enables control to pass to the proper case label at run time. Here again, this is a “dummy” packet in that there is no animation unit associated directly with it. For the case statement in figure 13, the animator will first highlight the animation unit, CASE i OF, and then await a response from the user. Upon completion of execution of the corresponding E-code packet, the animator will subsequently call the E-machine to execute the dummy packet, without changing its display (i.e., CASE i OF continues to be highlighted). Then, since the value of the case selector is 2, the animator highlights the animation unit containing the case label 2: and again awaits a response from the user.
    i:=2;
    CASE i OF
      1: j:=100
      2: j:=200
      3: j:=300
      OTHERS: j:=0
    END;

Figure 13: Source Code for a CASE Statement

The Scanner Module

The Scanner module performs the scanning (or lexical analysis) function for the compiler. This module consists of the lex-generated yylex routine and two user-supplied routines that handle miniPascal comments and quoted strings, respectively. The yylex routine is called by the Parser when the next token is required. The other two routines are called internally (from yylex in the Scanner module).

Lex produces a scanner by processing a specification file containing rules that consist of regular expressions. These regular expressions define the tokens in a language, in this case miniPascal. Actions, written in C, are interspersed among the rules; these actions carry out the two scanner tasks performed upon each call to yylex. The Scanner module’s first task is to recognize and return miniPascal tokens (and their values) to the parser. Its second task is to enter the original miniPascal source code into the E-code’s SOURCESECTION via calls to the GenSource routine in the SOURCE module.

The Scanner module is also responsible for ensuring that the miniPascal compiler is not case-sensitive. Thus, when the Scanner module recognizes an identifier token, it first enters the name of the identifier into the SOURCESECTION, and then converts the name (in the yytext variable) to all lower case characters before returning to the Parser.

The Code Driver Module

The Code Driver module drives the E-code translation of the source program. This module is a large one, consisting of thirty-two routines. Many of these routines are called by the Parser module when the parse reaches a point where code should be generated. The remaining routines in this module are called internally (from within the Code Driver module).
The Code Driver module interfaces directly with the Semantic Analysis and Symbol Table modules to perform semantic analysis, and with the CODE, LABEL, and VARIABLE modules to perform code generation.

The Semantic Analysis Module

The Semantic Analysis module performs semantic analysis during compilation. This module is a large one, consisting of fifty-eight routines. These routines are called by the Parser module, the Code Driver module, and the STATSCOPE module when semantic checking must be done. The Semantic Analysis routines may also be called internally (from within the Semantic Analysis module). These routines perform tasks relevant to both the initialization and the retrieval of semantic information. For example, the Parser module calls the SetProcAttributes routine upon encountering a procedure declaration. This routine associates with the procedure name its (compiler-generated) starting program memory address as well as any formal parameter attributes. Later, when the Parser encounters a call to the procedure, it calls the GetProcAttributes routine to retrieve this information in order to associate the correct program memory address with the generated E-code call instruction and to verify the actual parameter list. The Semantic Analysis module interfaces directly with the Symbol Table module to enter and retrieve symbol table attribute information. The Semantic Analysis module also interfaces directly with the STRING module by calling the EnterString routine to enter the values of string literals and enumerated constant names into the string space array.

The PACKET Module

The PACKET module produces the E-code packet descriptors during compilation. This module contains routines that initialize a statically allocated array of structures containing the packet descriptions. With the exception of the WritePKT routine, the PACKET module routines are called by the Parser module during the parsing process.
The WritePKT routine, called by the Main module at the end of compilation, writes the packet structure array elements to records in the PACKETSECTION of the E-code file.

The SOURCE Module

The SOURCE module produces the source code array (for animation purposes). This module contains the GenSource routine, which initializes a statically allocated array containing the source code of the miniPascal program being compiled. Each element in the source code array corresponds to a single line in the miniPascal source program. The GenSource routine is called by the Scanner module during the scanning process. The WriteSOURCE routine, called by the Main module at the end of compilation, writes the source code array elements to records in the SOURCESECTION of the E-code file.

The LABEL Module

The LABEL module produces the table that maps E-code label numbers to their corresponding E-code instruction numbers (i.e., E-code label instructions). This module contains the GenLabRegTable routine, which initializes a statically allocated array whose elements contain the instruction number of corresponding E-code (label) instructions. The GenLabRegTable routine is called by both the Parser and the Code Driver modules during the compilation process. The WriteLAB routine, called by the Main module at the end of compilation, writes the label array elements to records in the LABELSECTION of the E-code file.

The VARIABLE Module

The VARIABLE module produces the table that maps E-code variable register numbers to their corresponding data memory sizes. This module contains the GenVarRegTable routine, which initializes a statically allocated array whose elements contain the size of the data memory reserved for corresponding variable register numbers. The GenVarRegTable routine is called by both the Parser and the Code Driver modules during the compilation process.
The WriteVAR routine, called by the Main module at the end of compilation, writes the variable register array elements to records in the VARIABLESECTION of the E-code file.

The STRING Module

The STRING module generates the string array found in the E-code STRINGSECTION. The miniPascal compiler’s implementation of enumerated types precipitated the need for a new E-machine component (the string space), and hence the need for a corresponding section in the E-code file. This new section is called the STRINGSECTION. Ordinarily, only the internal numeric value of an enumerated type variable is required in translated object code for real computers and computing environments. It is desirable, however, for a program animation system to have the animator display the enumerated constant name rather than (or in addition to) the internal numeric value of a variable of an enumerated type. The STRINGSECTION consists of a statically allocated character array containing all of the enumerated constant names defined in a miniPascal program, as well as the values of any string literals declared in the source program (which may also need to be displayed by the animator).

When the Semantic Analysis module encounters the definition of a string literal or an enumerated constant name, it calls the EnterString routine in the STRING module. The WriteSTRINGS routine, called by the Main module at the end of compilation, writes the string character array to the STRINGSECTION of the E-code file.

When a miniPascal program is animated, the STRINGSECTION portion of the E-code file is loaded into the E-machine’s string space. The string space is then accessed by the animator for displaying string constants and enumerated variable values. For example, upon completion of execution of the program in figure 14, the animator can display the enumerated type variable values as shown in figure 15.
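As an illustration of how an animator might turn an enumerated value into its name, the following C sketch walks a string space of null-terminated names. It assumes the layout described in this section for an enumerated variable: a 32-bit word whose 16 high-order bits hold the numeric value and whose 16 low-order bits hold the string space index of the first constant name of the type. The names pack_enum and enum_name, and the contents of string_space (modelled on program Payroll1), are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed string space contents for Payroll1; every name is null-terminated
   (the final terminator is supplied by the string literal itself). */
static const char string_space[] =
    "MON\0TUES\0WED\0THURS\0FRI\0WEEK\0MONTH";

/* Build the 32-bit word: numeric value in the high half, string space index
   of the type's first constant name in the low half. */
uint32_t pack_enum(uint16_t value, uint16_t first_name_index)
{
    return ((uint32_t)value << 16) | first_name_index;
}

/* Step through the names, starting at the type's first constant, until the
   name whose position equals the numeric value is reached. Returns NULL if
   the search runs off the end of the string space. */
const char *enum_name(const char *space, size_t space_len, uint32_t word)
{
    uint16_t value = (uint16_t)(word >> 16);
    size_t   pos   = word & 0xFFFFu;

    assert(space != NULL);
    while (value > 0 && pos < space_len) {
        pos += strlen(space + pos) + 1;  /* skip one name and its terminator */
        value--;
    }
    return pos < space_len ? space + pos : NULL;
}
```

The sketch does not check that the value stays within its own enumerated type; a value that is too large simply runs on into the names of the next type, so a real animator would presumably bound the search using the type's maximum value.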
Figure 16 illustrates the relationship of the E-machine’s string space with the variable registers and data memory. This illustration assumes that a variable register associated with an enumerated type variable represents 32 bits of data memory. The 16 high-order bits of this data memory location contain the dynamically determined internal numeric value of the enumerated constant associated with this variable; the 16 low-order bits contain an index into the string space where the associated enumerated constant name can be found. As can be seen in figure 16, the index into the string space is always that of the first constant name of the enumerated type. This is due to the fact that the compiler can statically generate code to increment or decrement the numeric value of an enumerated type variable (e.g., for an enumerated type control variable in a for loop). The compiler cannot, however, statically determine in advance the absolute string space index of the enumerated constant name associated with an enumerated type variable at any given time. Instead, the animator has the capability to retrieve the variable’s numeric value and the starting string space index. The animator can then step sequentially through the string space until the name corresponding to the numeric value is found; the names are null-terminated, thus allowing such a search. (A similar situation will exist when the predeclared Pascal functions, pred and succ, are eventually implemented.)

    Program Payroll1;
    TYPE
      DAYS = (MON,TUES,WED,THURS,FRI);
      FREQUENCY = (WEEK,MONTH);
    VAR
      OffDay,PayDay:DAYS;
      PayFreq:FREQUENCY;
    BEGIN
      OffDay:=WED;
      PayDay:=FRI;
      PayFreq:=WEEK;
    END.

Figure 14: Source Code for Program Payroll1

    Program Payroll1;                     Program Payroll1
    TYPE                                    OffDay  = 2 /* WED */
      DAYS = (MON,TUES,WED,THURS,FRI);      PayDay  = 4 /* FRI */
      FREQUENCY = (WEEK,MONTH);             PayFreq = 0 /* WEEK */
    VAR
      OffDay,PayDay:DAYS;
      PayFreq:FREQUENCY;
    BEGIN
      OffDay:=WED;
      PayDay:=FRI;
      PayFreq:=WEEK;
    END.

Figure 15: Animation Display After Execution of Program Payroll1

[Figure 16 is a diagram with four columns labelled Variable Registers, Variable Stacks, Data Memory, and String Space. It shows entries for PayDay, OffDay, and PayFreq, the numbers 0, 4, and 8, and a string space holding the null-terminated names MON, TUES, WED, THURS, FRI, WEEK, and MONTH.]

Figure 16: String Space’s Relationship with Variable Registers and Data Memory

The animator also accesses the string space when displaying enumerated type array indices. Thus, upon completion of the execution of the program shown in figure 17, the animator can display DayCode’s value as shown in figure 18. In this case, the animator retrieves the values of the enumerated type indices through information stored in the Static Scope Table. In this example, the Static Scope Table entry for the variable DayCode contains the following information:

• Identifier Name: DayCode

• Upper array bound: 19

• Lower array bound: 0

• Entry type: INTEGERENUMI

• Variable Reg: 0

Type INTEGERENUMI means that the variable DayCode is an array with integer elements and an enumerated index type. This indicates to the animator that the array bounds are indices into the string space rather than absolute numbers.

    Program Payroll2;
    TYPE
      DAYS = (MON,TUES,WED,THURS,FRI);
      DAYLIST = ARRAY [MON..FRI] OF INTEGER;
    VAR
      DayCode:DAYLIST;
      Day:DAYS;
    BEGIN
      FOR Day := MON TO FRI DO
        DayCode[Day] := 0;
    END.

Figure 17: Source Code for Program Payroll2

    Program Payroll2;                          Program Payroll2
    TYPE                                         DayCode[MON] = 0
      DAYS=(MON,TUES,WED,THURS,FRI);             DayCode[TUES] = 0
      DAYLIST=ARRAY [MON..FRI] OF INTEGER;       DayCode[WED] = 0
    VAR                                          DayCode[THURS] = 0
      DayCode:DAYLIST;                           DayCode[FRI] = 0
      Day:DAYS;
    BEGIN
      FOR Day := MON TO FRI DO
        DayCode[Day] := 0;
    END.

Figure 18: Animation Display After Execution of Program Payroll2

The Error Module

The Error module produces an error message whenever an error is encountered during compilation. This module consists of routines to report the following types of errors:

• scan errors;

• parse errors;

• internal errors;

• parse warnings;

• lack of support messages.

Each of these routines prints an appropriate message, and, with the exception of the parse warning routine, then calls the AbnormalEnd routine in the Main module. The Error module routines are called by various other modules during the compilation process.

The Memory Allocation Module

The Memory Allocation module is responsible for allocating and freeing memory for the various data structures required during the compilation process. This module consists of allocate and free routines associated with each data structure defined in the compiler. The Memory Allocation module routines are called by many of the other modules during the compilation process.

The Assembly Code Module

The Assembly Code module produces a text file containing the pseudo assembly language translation of a source program. The WrtAsmFile routine in this module writes a copy of the CODESECTION instructions to a text file in pseudo assembly language format. This module is not required for compilation since it does not generate any of the E-code file sections; it does, however, provide an excellent debugging tool for compiler development. The routines in this module are (optionally) called by the CODE module as the instructions are generated.

The CODE Module

The CODE module produces the array of C structures containing the E-code instructions. The CODE module contains the GenInstr routine, which writes a single E-code instruction to a temporary file, and (optionally) calls the WrtAsmFile routine in the Assembly Code module (to output an equivalent pseudo assembly code instruction).
The GenInstr routine is called by the Parser and Code Driver modules. The CODE module also contains the PatchInstr routine, which maintains an array of structures that associate a “patch value” with a program memory address. There are two situations when a code patch is necessary:

1. References to label values before they are known during the generation of case statement code.

2. The association of the index of the Static Scope Table entry for a procedure/function with the appropriate pushd instruction (see the Parser module section previously in this chapter).

When compilation is complete, the Main module closes and reopens the temporary CODESECTION file and then calls the WriteCODE routine. This routine reads the temporary file and writes the records to the CODESECTION of the E-code file, incorporating any patches into the proper instructions.

The Symbol Table Module

The Symbol Table module manages the compiler’s symbol table. The Symbol Table module routines are responsible for opening and closing static scopes as well as entering and retrieving identifier names and their attributes. The Symbol Table routines are called by the Parser, the Semantic Analysis, the Code Driver, and the STATSCOPE modules.

Each identifier name in the symbol table has a static scope level and, possibly, a record number, associated with it. This allows the same identifier name to be used in more than one scope, including scopes nested within each other. The current scope level is contained in a global variable in the Symbol Table Interface module. When the Parser module determines the beginning of a scope, it calls the OpenScope routine, which simply increments the scope level. When the Parser module determines an end of scope, it calls the CloseScope routine, which deletes all symbol table entries for that scope and then decrements the scope level.
It should be noted that, at the beginning of compilation, the Parser module (via the Semantic Analysis module) enters the predefined Pascal types (integer, real, boolean, and char) and the predefined Pascal constants (true, false, and maxint) into the symbol table. These predefined identifiers are associated with the outermost scope of the program.

The symbol table itself is implemented as a single hash table using chaining to resolve collisions. The chained buckets are stacked such that the identifiers declared in the most recent scope are at the top of the stack. This ensures that the proper identifier is retrieved by an outward search of the buckets associated with the same identifier name. The hashing algorithm used is “hashpjw” from P.J. Weinberger’s C compiler, as presented in Compilers: Principles, Techniques, and Tools by Aho, Sethi, and Ullman [Aho 86].

The basic structure of the symbol table, as well as the design of the Symbol Table module, is based on the symbol table design presented in Crafting A Compiler by Charles N. Fischer and Richard J. LeBlanc, Jr. [Fischer 88]. There are also several symbol table features adapted from the symbol table design given in Compiler Design in C by Allen I. Holub [Holub 90]. Figure 19 illustrates the symbol table implementation.

A symbol table entry for an identifier consists of several structures which are chained together. The various symbol table structures are shown in figure 20. The identifier’s primary structure, the Symbol Structure, is initialized as soon as the Parser module encounters the identifier’s declaration. Later, when the parse has progressed to the point where the identifier’s attributes become known, the remaining “descriptor” structures are initialized. Once its attributes are known, an identifier’s symbol table entry will consist of at least a Symbol Structure, a Type Descriptor Structure, and a Class Descriptor Structure. The various symbol table structures are described below.
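The combination of hashpjw with stacked collision chains can be sketched in C as shown here. The hash function follows the published hashpjw algorithm; everything else is an assumption made for illustration: the table size of 211, the reduced Sym structure, and the routine names enter, lookup, open_scope, and close_scope (modelled loosely on the OpenScope and CloseScope routines). In particular, this close_scope scans every bucket, which is a simplification and not necessarily how the miniPascal compiler deletes a scope.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211  /* assumed prime table size, not taken from the thesis */

typedef struct Sym {
    char       *name;
    int         scope_level;
    struct Sym *next_bucket;  /* older entry in the same collision chain */
} Sym;

static Sym *table[TABLE_SIZE];
static int  current_scope = 0;

/* hashpjw: shift in each character and fold the top four bits back down. */
unsigned hashpjw(const char *s)
{
    uint32_t h = 0, g;
    for (; *s != '\0'; s++) {
        h = (h << 4) + (unsigned char)*s;
        g = h & 0xF0000000u;
        if (g != 0) {
            h ^= g >> 24;
            h ^= g;
        }
    }
    return (unsigned)(h % TABLE_SIZE);
}

/* Push a new entry on top of its chain so the newest declaration is found first. */
Sym *enter(const char *name)
{
    assert(name != NULL);
    unsigned b = hashpjw(name);
    Sym *s = malloc(sizeof *s);
    if (s == NULL) abort();
    s->name = malloc(strlen(name) + 1);
    if (s->name == NULL) abort();
    strcpy(s->name, name);
    s->scope_level = current_scope;
    s->next_bucket = table[b];
    table[b] = s;
    return s;
}

/* Outward search: the first match on the chain is the innermost declaration. */
Sym *lookup(const char *name)
{
    for (Sym *s = table[hashpjw(name)]; s != NULL; s = s->next_bucket)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

void open_scope(void) { current_scope++; }

/* Pop every entry of the current scope off the top of each chain. */
void close_scope(void)
{
    for (int b = 0; b < TABLE_SIZE; b++) {
        while (table[b] != NULL && table[b]->scope_level == current_scope) {
            Sym *dead = table[b];
            table[b] = dead->next_bucket;
            free(dead->name);
            free(dead);
        }
    }
    current_scope--;
}
```

Because inner scopes are always closed before the scope that contains them, the entries of the current scope are guaranteed to sit at the top of their chains, which is what lets close_scope stop at the first entry from an outer scope.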
[Figure 19 is a diagram of the hash table, with entries numbered 0 through 15.]

Figure 19: The Symbol Table Hash Implementation

A Symbol structure consists of the following fields:

• IdName (identifier name). This field contains a pointer to the dynamically allocated memory location containing the identifier name.

• ScopeLevel. This field contains the static scope level of the identifier.

• RecordNum. If the identifier being described is a field name, this Symbol Structure field contains the compiler-assigned number of the record containing the identifier name. If the identifier is not a field name, this number is 0.

• LineNum. This field contains the identifier’s source code line number. Since all of the symbol table names are stored as lower case, this number is later used by the STATSCOPE module to retrieve the identifier’s original name (for animation purposes) from the source code array, which is maintained in the SOURCE module.

• ColNum. This field contains the identifier’s starting column number in the source code and is likewise used by the STATSCOPE module to retrieve the identifier’s original name.

• IdType. This field contains the pointer to the identifier’s Type Descriptor record.

• IdClass. This field contains a pointer to the identifier’s Class Descriptor record.

• NextInList. If the identifier is declared within a list of identifiers, this field contains the pointer to the Symbol Structure of the previous identifier in the list. The Parser module can then traverse this chain to associate common attributes with the identifiers in the list. This chain is also used (in reverse order) when assigning internal numeric values to enumerated constant names.

• NextInScope. This field contains a pointer to the Symbol Structure of the previous identifier in the current static scope level. This chain is traversed by the STATSCOPE module to retrieve the identifiers declared within a given static scope (as discussed in the next section).

• NextBucket. This field contains the pointer to the previous Symbol Structure in the hash table entry’s collision chain. This chain is used internally in the Symbol Table module.

A Type Descriptor structure consists of the following fields:

• UseCount. Since numerous other structures may point to the same Type Descriptor structure, the UseCount field is utilized to prevent the Memory Allocation module from freeing the structure while it is still in use.

• Size. This field contains the identifier’s size (in terms of the host computer’s smallest addressable memory size, normally bytes).

• Packed. This field indicates whether or not a structured type identifier is packed. The miniPascal compiler recognizes the packed attribute only if the identifier is a string variable.

• TypeName. This field contains an enumerated constant value indicating the identifier’s type (e.g., INTEGERTYPE, ENUMTYPE, or ARRAYTYPE).

• SelectType. This field contains a pointer to another descriptor based on the value of the TypeName field. For example, if the TypeName is ARRAYTYPE, SelectType would point to an Array Descriptor record, which would contain further attribute information pertaining to the identifier.
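The field lists above suggest C declarations along the following lines. This is a guess at their shape, not the compiler's actual declarations: the C types, the snake_case names, the reduced TypeName enumeration, and the helper routines new_type, attach_type, and release_type are all hypothetical. The helpers show one common way a UseCount field can keep a shared descriptor alive until its last user lets go.

```c
#include <assert.h>
#include <stdlib.h>

/* Subset of the identifier types, enough for the sketch. */
typedef enum { INTEGERTYPE, ENUMTYPE, ARRAYTYPE } TypeName;

typedef struct TypeDesc {
    int       use_count;    /* number of structures pointing at this one */
    int       size;         /* size in addressable units, normally bytes */
    int       packed;       /* nonzero if a structured type is packed    */
    TypeName  type_name;
    void     *select_type;  /* further descriptor chosen by type_name    */
} TypeDesc;

typedef struct Symbol {
    char          *id_name;
    int            scope_level;
    int            record_num;     /* 0 unless the identifier is a field name */
    int            line_num;
    int            col_num;
    TypeDesc      *id_type;
    void          *id_class;       /* Class Descriptor, not modelled here     */
    struct Symbol *next_in_list;
    struct Symbol *next_in_scope;
    struct Symbol *next_bucket;
} Symbol;

/* Allocate a descriptor that nothing points at yet; NULL on failure. */
TypeDesc *new_type(TypeName name, int size)
{
    TypeDesc *t = calloc(1, sizeof *t);
    if (t != NULL) {
        t->type_name = name;
        t->size = size;
    }
    return t;
}

/* Point a symbol at a (possibly shared) type descriptor. */
void attach_type(Symbol *sym, TypeDesc *t)
{
    assert(sym != NULL && t != NULL);
    sym->id_type = t;
    t->use_count++;
}

/* Drop one reference; free the descriptor only when no users remain.
   Returns 1 if the descriptor was freed, 0 otherwise. */
int release_type(TypeDesc *t)
{
    assert(t != NULL && t->use_count > 0);
    if (--t->use_count == 0) {
        free(t);
        return 1;
    }
    return 0;
}
```

For instance, two variables declared with the same type would each call attach_type on one descriptor, and the descriptor would survive the first release_type and be freed by the second.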
    Symbol Structure:                  IdName, ScopeLevel, RecordNum, LineNum, ColNum,
                                       IdType, IdClass, NextInList, NextInScope, NextBucket
    Type Descriptor Structure:         UseCount, Size, Packed, TypeName, SelectType
    Class Descriptor Structure:        ClassName, SelectClass
    Array Descriptor Structure:        UseCount, ElemntType, FirstIndex, ConstPart
    Index Descriptor Structure:        UseCount, LowerBound, UpperBound, EnumDesc, IsChar, NextIndex
    Enumeration Descriptor Structure:  UseCount, FirstConst, BaseType, MaxVal
    Subrange Descriptor Structure:     UseCount, BaseType, LowerBound, UpperBound
    Record Descriptor Structure:       UseCount, RecordNum, Size, NumFields, FirstField
    Parameter Descriptor Structure:    UseCount, Mode, ParamType, NextParam

Figure 20: The Symbol Table Structures

A Class Descriptor structure consists of the following fields:

• ClassName. This field contains an enumerated constant value indicating the identifier’s class (e.g., VARIABLECLASS, CONSTANTCLASS, or PROCEDURECLASS).

• SelectClass. This field is composed of a union structure containing information based on the value of the ClassName field. For example, if the ClassName is VARIABLECLASS, one of the fields in this structure would contain the variable register number associated with the variable.

The remaining symbol table structures are used in describing specific type or class attributes pertaining to a given identifier. These symbol table structures are discussed below. Figure 21 shows the enumerated names of the various miniPascal identifier types; figure 22 shows the enumerated names of the various miniPascal identifier classes. (Identifier types POINTERTYPE, SETTYPE, and FILETYPE listed in figure 21 are not currently implemented.)
    /* Identifier types */
    typedef enum {
        INTEGERTYPE, REALTYPE, BOOLEANTYPE, CHARTYPE,
        ENUMTYPE, SUBRANGETYPE, POINTERTYPE, SETTYPE,
        ARRAYTYPE, RECORDTYPE, FILETYPE, EXISTINGTYPE,
        STRINGTYPE
    } IdTypes;

Figure 21: The miniPascal Identifier Types

    /* Identifier classes */
    typedef enum {
        VARIABLECLASS, FIELDCLASS, TYPENAMECLASS, CONSTANTCLASS,
        PROCEDURECLASS, FUNCTIONCLASS, INTLITCLASS, REALLITCLASS,
        CHARLITCLASS, STRINGLITCLASS, PROGRAMCLASS, TEMPCLASS,
        PARAMETERCLASS
    } IdClasses;

Figure 22: The miniPascal Identifier Classes

In some cases, no additional structures are needed to describe an identifier’s attributes. For example, the three basic symbol structures (defined above) suffice in describing a variable (i.e., identifier class is VARIABLECLASS) of type integer (i.e., identifier type is INTEGERTYPE).

An Array Descriptor structure is used when an identifier’s TypeName field (in its Type Descriptor structure) has the value ARRAYTYPE. The SelectType field holds a pointer to the corresponding Array Descriptor structure. An Array Descriptor structure consists of the following fields:

• UseCount. Since numerous other structures may point to the same Array Descriptor structure, the UseCount field is utilized to prevent the Memory Allocation module from freeing the structure while it is still in use.

• ElemntType. This field holds a pointer to the Type Descriptor structure defining the array’s element type (e.g., the predefined integer type or some previously declared user-defined type).

• FirstIndex. This field holds the pointer to the Index Descriptor structure describing the array’s (first) index. The Index Descriptor structure is defined below.

• ConstPart. This field holds the “constant part” used in the algorithm that calculates the linear address for a (multidimensional) array reference. This algorithm was taken from [Fischer 88].

An Index Descriptor structure is referenced by the FirstIndex field in a corresponding Array Descriptor structure, as discussed above.
It may also be referenced by the NextIndex field of another Index Descriptor structure. An Index Descriptor structure consists of the following fields:

• UseCount. Since numerous other structures may point to the same Index Descriptor structure, the UseCount field is utilized to prevent the Memory Allocation module from freeing the structure while it is still in use.

• LowerBound. This field holds the (numeric) lower bound of an array index.

• UpperBound. This field holds the (numeric) upper bound of an array index.

• EnumDesc. If the array index values are an enumerated type, this field holds a pointer to the Enumeration Descriptor structure describing the enumerated type. The Enumeration Descriptor structure is discussed below.

• IsChar. This field’s value is TRUE if the array index values are CHARTYPE; otherwise this field’s value is FALSE.

• NextIndex. This field holds a pointer to the Index Descriptor structure describing the next array index (if any).

An Enumeration Descriptor structure is used when an identifier’s TypeName field (in its Type Descriptor structure) has the value ENUMTYPE. The SelectType field holds a pointer to the corresponding Enumeration Descriptor structure. An Enumeration Descriptor structure consists of the following fields:

• UseCount. Since numerous other structures may point to the same Enumeration Descriptor structure, the UseCount field is utilized to prevent the Memory Allocation module from freeing the structure while it is still in use.

• FirstConst. This field holds the pointer to the Symbol structure describing the first constant in the enumeration.

• BaseType. This field holds a pointer to the Type Descriptor structure that describes the first constant in the enumeration.

• MaxVal. This field holds the maximum numeric value associated with the enumeration (i.e., the number of constants in the enumeration minus 1).
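To illustrate what a “constant part” buys in array addressing, the following C sketch uses the standard row-major formulation; the thesis does not reproduce the algorithm from [Fischer 88], so the exact form used by the compiler is an assumption here. The IndexDesc structure keeps only the LowerBound, UpperBound, and NextIndex fields, and the function names const_part and element_offset are hypothetical. The idea is that the contribution of the lower bounds can be computed once at compile time and subtracted from the run-time subscript expression.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced Index Descriptor: one entry per array dimension, chained in
   declaration order. */
typedef struct IndexDesc {
    int lower_bound;
    int upper_bound;
    struct IndexDesc *next_index;
} IndexDesc;

/* Compile-time part: the lower bounds folded together with the same
   multipliers the subscripts will use, scaled by the element size. */
long const_part(const IndexDesc *first, int elem_size)
{
    long c = 0;
    for (const IndexDesc *d = first; d != NULL; d = d->next_index) {
        long extent = (long)d->upper_bound - d->lower_bound + 1;
        c = c * extent + d->lower_bound;
    }
    return c * elem_size;
}

/* Run-time part: fold the actual subscripts the same way, then subtract the
   precomputed constant part to get the byte offset from the array base. */
long element_offset(const IndexDesc *first, const int *subscripts,
                    int elem_size, long cpart)
{
    long v = 0;
    size_t k = 0;
    assert(subscripts != NULL);
    for (const IndexDesc *d = first; d != NULL; d = d->next_index) {
        long extent = (long)d->upper_bound - d->lower_bound + 1;
        v = v * extent + subscripts[k++];
    }
    return v * elem_size - cpart;
}
```

For an array declared as ARRAY [1..3, 5..8] with 4-byte elements, the constant part is (1 * 4 + 5) * 4 = 36, and the reference [2,6] gives (2 * 4 + 6) * 4 - 36 = 20, the same offset obtained by subtracting each lower bound from its subscript first.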
A Subrange Descriptor structure is used when an identifier’s TypeName field (in its Type Descriptor structure) has the value SUBRANGETYPE. The SelectType field holds the pointer to the corresponding Subrange Descriptor structure. A Subrange Descriptor structure consists of the following fields:

• UseCount. Since numerous other structures may point to the same Subrange Descriptor structure, the UseCount field is utilized to prevent the Memory Allocation module from freeing the structure while it is still in use.

• BaseType. This field holds a pointer to the Type Descriptor structure describing the base type of the subrange (e.g., a pointer to the Type Descriptor structure describing the predefined integer or char types).

• LowerBound. This field holds the (numeric) lower bound of the subrange.

• UpperBound. This field holds the (numeric) upper bound of the subrange.

A Record Descriptor structure is used when an identifier’s TypeName field (in its Type Descriptor structure) has the value RECORDTYPE. The SelectType field holds a pointer to the corresponding Record Descriptor structure. A Record Descriptor structure consists of the following fields:

• UseCount. Since numerous other structures may point to the same Record Descriptor structure, the UseCount field is utilized to prevent the Memory Allocation module from freeing the structure while it is still in use.

• RecordNum. This field holds the (compiler-generated) number associated with the record.

• Size. This field holds the record’s size (normally, in bytes).

• NumFields. This field holds the number of fields in the record.

• FirstField. This field holds a pointer to the Symbol structure associated with the record’s first field.

A Parameter Descriptor structure is used when an identifier’s ClassName field (in its Class Descriptor structure) has the value PROCEDURECLASS or FUNCTIONCLASS. A Parameter Descriptor structure describes the formal parameters associated with a procedure (or function).
The union structure corresponding to the SelectClass field holds a pointer to the Parameter Descriptor structure describing the first parameter in the formal parameter list. A Parameter Descriptor structure consists of the following fields:

• UseCount. Since numerous other structures may point to the same Parameter Descriptor structure, the UseCount field is utilized to prevent the Memory Allocation module from freeing the structure while it is still in use.

• Mode. This field holds the parameter’s mode (i.e., VALUE or REFERENCE).

• ParamType. This field holds a pointer to the Type Descriptor structure that describes the parameter.

• NextParam. This field holds a pointer to the Parameter Descriptor structure of the next parameter in the formal parameter list.

The STATSCOPE Module

The STATSCOPE module contains the routines that build the Static Scope Table. The animator uses the Static Scope Table in conjunction with the dynamic scope stack to determine the data memory values that should be displayed at a given point during a program’s execution. The Static Scope Table was called the Symbol Table in Birch’s thesis [Birch 90]; the name was changed here to avoid confusion with the compiler’s symbol table. This table is a linear array of structures (or entries) which are in turn divided into numerous scope blocks. The scope blocks are chained together via parent/child pointers as discussed later. A scope block is used to describe the program identifiers associated with a single static scope. For example, a scope block would describe all of the local variable names and locally declared functions and/or procedures within a given procedure.

Generating a Static Scope Block. The Parser module calls the STATSCOPE module’s GenStatScopeBlock routine whenever the end of a static scope is encountered during parsing (i.e., at the end of a procedure, function, or program).
The parsing of an inner scope is always completed before the containing scope is completely parsed (a result of Pascal syntax). The GenStatScopeBlock routine drives the generation of the static scope block in the Static Scope Table for the scope in question from information in the symbol table for the current scope. (Recall that the symbol table entries for this scope will be deleted at this point of the parse, so this information must be saved in the Static Scope Table for animation purposes.) This routine, via calls to other STATSCOPE module routines, performs the following tasks:

o dynamic allocation of a static scope block. The number of static scope entries (i.e., the size of the static scope block) is passed as a parameter to GenStatScopeBlock;
o entry of the static scope’s owner name into the Scope Owner Table. The Scope Owner Table contains the information necessary to tie all of the static scope blocks together at the end of compilation. The static scope’s owner name is passed as a parameter to GenStatScopeBlock;
o initialization of the descriptive information contained in the static scope block entries. The names and descriptive attributes of the identifiers declared within a scope are retrieved by traversing the symbol table’s NextInScope chain; the head of the appropriate “scope chain” is passed as a parameter to GenStatScopeBlock.

A Static Scope Table entry describing a simple variable identifier includes the variable’s type attribute (e.g., INTEGER) and its variable register number attribute. For the more complicated array variable entry, additional fields are used to describe the array bounds. If the array’s index values are simple integers or characters, the lower and upper bound values are entered directly into the corresponding fields.
For arrays whose index values are enumerated type values, the appropriate indices into the STRINGSECTION array are computed and entered into the static scope entry’s array bound fields (see the STRING module section earlier in this chapter). For multidimensional arrays, additional scope blocks are used to describe the other index bounds. These additional index scope blocks are chained via the NxtIdx field. A corresponding entry is placed in the Scope Owner Table indicating that the array “owns” the index scope block (via the Array Descriptor field).

Record variable entries also use an additional scope block to describe the fields within the record. The child pointer is used to associate a record name with its defining scope block. Here again, an entry is placed in the Scope Owner Table indicating that the record “owns” the scope block (via the Record Descriptor field).

The ProcNum Field. The Static Scope Table’s ProcNum field can now be explained. As each program, procedure, and function name identifier is encountered during compilation, it is assigned a unique “procedure number.” These identifier names are referred to as static scope names in the following discussion. The procedure number is produced by a counter variable in the Semantic Analysis module. Since the program name is the first static scope name encountered, the procedure number assigned to a miniPascal program name is always 0. The next static scope name declaration encountered in the program is assigned the procedure number 1, and so on. A static scope name’s procedure number is stored as one of its symbol table attributes. Thus, when the GenStatScopeBlock routine encounters a static scope name while traversing a containing scope’s NextInScope chain, one of the attributes it retrieves is the corresponding procedure number. This number is then placed in the ProcNum field of the Static Scope Table entry describing the static scope name.
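The entry layout and the procedure number counter described above might look roughly as follows in C. This is a sketch under assumptions: the field names are taken from the column headings of the Static Scope Table, but the C types, the NO_VALUE marker for unused cells, the EntryKind names, and the routines AssignProcNum and MakeScopeNameEntry are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NO_VALUE (-1)   /* hypothetical marker for an unused ("-") cell */

typedef enum {
    KIND_HEADER, KIND_INTEGER, KIND_FUNCTION,
    KIND_PROCEDURE, KIND_PROGRAM, KIND_END
} EntryKind;

/* One Static Scope Table entry; one field per table column. */
typedef struct StatScopeEntry {
    char IdName[32];
    int UprBnd, LwrBnd, NxtIdx, Offset;
    EntryKind Type;
    int RecSiz, Parent, Child, VarReg, ProcNum;
} StatScopeEntry;

/* Counter standing in for the one kept by the Semantic Analysis module. */
static int NextProcNum = 0;

/* Called once per program, procedure, or function name, in the order
   the declarations are encountered. */
int AssignProcNum(void)
{
    return NextProcNum++;
}

/* Builds the entry that describes a static scope name in its containing
   scope block; the Child pointer is filled in later. */
StatScopeEntry MakeScopeNameEntry(const char *name, EntryKind kind, int procNum)
{
    StatScopeEntry e;
    assert(name != NULL && strlen(name) < sizeof e.IdName);
    strcpy(e.IdName, name);
    e.UprBnd = e.LwrBnd = e.NxtIdx = e.Offset = NO_VALUE;
    e.Type = kind;
    e.RecSiz = e.Parent = e.Child = e.VarReg = NO_VALUE;
    e.ProcNum = procNum;
    return e;
}
```

Calling AssignProcNum first for the program name and then for each procedure or function name yields 0, 1, 2, and so on, matching the numbering described in the text.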
The animator uses the ProcNum field in conjunction with the dynamic scope stack when determining the dynamics of program execution. The use of this field is best explained by an example. The program shown in figure 23 contains a recursively called function (function Fact). That Fact is recursive implies that for any given call to function Fact, the animator must be able to determine the “depth” of the pertinent data memory values associated with the variables declared in function Fact, as well as the depths of any variables in the calling (program) scope. These values are retrieved by querying the appropriate variable stacks, as discussed in chapter 2. Thus, upon the final recursive call to function Fact, the animator should be able to display data memory values as shown in figure 24.

   Program Ftrl;
   VAR
      n,nfact:INTEGER;
   Function Fact(n:INTEGER):INTEGER;
   BEGIN
      IF n = 0 THEN Fact:=1
      ELSE Fact:=n * Fact(n-1)
   END;
   BEGIN
      n:=3;
      nfact:=Fact(n);
   END.

   Figure 23: Source Code for Program Ftrl

   Program Ftrl;                           Program Ftrl
   VAR                                       n = 3
      n,nfact:INTEGER;                       nfact is undefined
   Function Fact(n:INTEGER):INTEGER;       Function Fact
   BEGIN                                     n = 3
      IF n = 0 THEN Fact:=1                  Fact is undefined
      ELSE Fact:=n * Fact(n-1)             Function Fact
   END;                                      n = 2
   BEGIN                                     Fact is undefined
      n:=3;                                Function Fact
      nfact:=Fact(n);                        n = 1
   END.                                      Fact is undefined
                                           Function Fact
                                             n = 0
                                             Fact = 1

   Figure 24: Animation Display After Final Recursive Call of Function Fact

The ProcNum field is used in the following manner when determining the depths of the variables declared in a program, procedure, or function scope. After the E-machine has been loaded with the E-code translation of a source program, the animator queries the E-machine to determine the total number of static “procedure” scopes that are described in the Static Scope Table. The Static Scope Table for the example in figure 23 is shown in table 4. The animator then dynamically allocates a procedure count array that contains an entry corresponding to each of these scopes.
Thus, for the program shown in figure 23, this array has two entries. Entry 0 corresponds to the program scope and entry 1 corresponds to function Fact. During program animation, the animator sets the values of the procedure count array entries to reflect the current number of active calls to the corresponding procedure or function. (This means that the animator reinitializes the values in the procedure count array every time control is passed to the animator.) At the same time, the E-machine’s dynamic scope stack contains a history of active scopes, with the Static Scope Table entry number of the most current scope being the value at the top of this stack.

   Entry  Id Name    Upr Bnd  Lwr Bnd  Nxt Idx  Offset  Type       Rec Siz  Parent  Child  Var Reg  Proc Num

   Scope block describing function Fact
   0                 -        -        -        -       HEADER     -        4       -      -        -
   1      n          -        -        -        -       INTEGER    -        -       -      2        -
   2      Fact       -        -        -        -       INTEGER    -        -       -      3        -
   3                 -        -        -        -       END        -        -       -      -        -

   Scope block describing program Ftrl
   4                 -        -        -        -       HEADER     -        9       -      -        -
   5      n          -        -        -        -       INTEGER    -        -       -      1        -
   6      nfact      -        -        -        -       INTEGER    -        -       -      0        -
   7      Fact       -        -        -        -       FUNCTION   -        -       0      -        1
   8                 -        -        -        -       END        -        -       -      -        -

   Bootstrap scope block
   9                 -        -        -        -       HEADER     -        -       -      -        -
   10     Ftrl       -        -        -        -       PROGRAM    -        -       4      -        0
   11                -        -        -        -       END        -        -       -      -        -

   Table 4: Static Scope Table Resulting from Compilation of Program Ftrl

Now, consider the animation of the current example. Suppose the program has executed to the point that it is in the third recursive call to function Fact. When the animator begins displaying data memory variables after the execution of the packet translating the animation unit Fact:=1, the procedure count array and the dynamic scope stack are in the state shown in figure 25. The values in the procedure count array indicate that the program Ftrl has one active “call” and that function Fact has four active calls.
In this example, the animator begins its retrieval of data memory values by examining the value at the bottom of the dynamic scope stack. The bottom stack value is 10, which means that the animator now examines entry 10 in the Static Scope Table. This entry is a PROGRAM entry describing Ftrl. The ProcNum field in the PROGRAM entry has the value 0. Next, the animator examines entry 0 in the procedure count array to determine the depth of the variables to be displayed for this invocation of the program scope. Since the program scope cannot be called recursively, this value will always be 1. Thus, when the animator retrieves the values of the variables described in the program’s child scope block, it will instruct the E-machine to retrieve the data memory values associated with the top of the appropriate variable stacks. After these values have been displayed, the animator decrements the value in entry 0 of the procedure count array.

   Procedure Count Array
      Entry 0 (Program Ftrl):    1
      Entry 1 (Function Fact):   4

   Dynamic Scope Stack
      Position:   0     1     2     3     4
      Value:      10    7     7     7     7
               (bottom)                (top)

   Figure 25: Procedure Count Array and Dynamic Scope Stack

Next, the animator examines the value in entry 1 in the dynamic scope stack. This value is 7, corresponding to entry 7 in the Static Scope Table. This entry, whose ProcNum field has the value 1, describes function Fact. The animator then examines entry 1 in the procedure count array. The current value in this entry is 4, indicating that the animator should instruct the E-machine to retrieve data memory values associated with the fourth level of the appropriate variable stacks when displaying variable values described in the function’s child scope block. These values reflect the function’s variable values resulting from its initial call from the program scope.
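The retrieval loop in this walk-through can be sketched in C. This is a minimal sketch, not the animator's actual code: the routine name ComputeDisplayDepths and its array-based interface are assumptions, and a depth of 1 is taken to mean the top of a variable stack, as in the program scope case above.

```c
#include <assert.h>

/* Walks the dynamic scope stack from bottom to top. For each stack slot
   it looks up the ProcNum of the Static Scope Table entry found there,
   records the current procedure count as the variable-stack depth to
   display (1 = top of the variable stack), and then decrements the count.

   scopeStack[i]  Static Scope Table entry number, bottom of stack first
   procNumOf[e]   ProcNum field of Static Scope Table entry e
   procCount[p]   active calls of procedure number p (modified in place)
   depthOut[i]    receives the depth to use for stack slot i */
void ComputeDisplayDepths(const int *scopeStack, int stackSize,
                          const int *procNumOf, int *procCount,
                          int *depthOut)
{
    int i;
    for (i = 0; i < stackSize; i++) {
        int procNum = procNumOf[scopeStack[i]];
        assert(procCount[procNum] > 0);   /* every slot is an active call */
        depthOut[i] = procCount[procNum];
        procCount[procNum]--;
    }
}
```

For the state in figure 25 (stack values 10, 7, 7, 7, 7 and counts 1 and 4), this yields depth 1 for the program scope and depths 4, 3, 2, and 1 for the successive calls to Fact, which corresponds to the display order of figure 24.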
The animator then decrements the value in entry 1 of the procedure count array so that the next iteration will result in displaying the values associated with the first recursive call to function Fact. The animator continues this process until the dynamic scope stack is exhausted, resulting in the display shown in figure 24.

Writing the STATSCOPESECTION. When parsing is completed, the Main module calls the WriteSTATSCOPE routine. This routine first traverses the Scope Owner Table in reverse order to initialize the program, procedure, and function parent/child pointers that will appear in the final linear array containing the complete Static Scope Table. Since the nesting characteristics of miniPascal allow the same name to be given to more than one procedure or function, the reverse order traversal ensures that the proper child is found. The final entry in the Scope Owner Table describes a “bootstrap” scope block, which will become the final scope block in the completed Static Scope Table. The Scope Owner Table contains the information needed to initialize the child pointer in the bootstrap scope block; this child pointer is the computed index of the first entry in the scope block describing the local variables, procedures, and functions belonging to the program. Also, the parent pointer in the program scope block can now be initialized with the computed index of the bootstrap scope block. Similarly, each function and procedure can have its child and parent pointers initialized.

Finally, the WriteSTATSCOPE routine traverses the Scope Owner Table in forward order to sequentially write the various scope blocks to the STATSCOPESECTION of the E-code file.

Example of STATSCOPESECTION Generation. Program Samp2, shown in figure 26, is used to illustrate the generation of the STATSCOPESECTION. This program contains two procedures, named A and B, which are at the same static scope level. Procedure A contains a nested function, also named B.
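Because the scope blocks are written sequentially in Scope Owner Table order, the "computed index" of each block's first entry can be obtained as a running sum of block sizes. The following C sketch illustrates that arithmetic only; the ScopeOwner structure is reduced to three fields, and the routine name ComputeScopeTableIndices is hypothetical. It also assumes that the size recorded for a block counts its HEADER and END entries, which is what the entry counts of the Samp2 example suggest.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced, hypothetical view of one Scope Owner Table row. */
typedef struct ScopeOwner {
    const char *OwnerName;
    int NumEntries;        /* entries in the block (assumed to include
                              the HEADER and END entries) */
    int ScopeTableIndex;   /* computed index of the block's first entry */
} ScopeOwner;

/* Assigns each block its starting index in the final linear table and
   returns the total number of Static Scope Table entries. */
int ComputeScopeTableIndices(ScopeOwner *owners, int count)
{
    int i;
    int next = 0;
    assert(owners != NULL || count == 0);
    for (i = 0; i < count; i++) {
        owners[i].ScopeTableIndex = next;
        next += owners[i].NumEntries;
    }
    return next;
}
```

Applied to blocks of 4, 5, 5, 7, and 3 entries (the sizes in the Samp2 example), the computed starting indices are 0, 4, 9, 14, and 21, for a table of 24 entries. Parent and child pointers can then be filled in from these indices.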
Table 5 is the Scope Owner Table for this program. The Scope Owner Table holds the following information:

o Owner Name. This field contains the corresponding static scope’s owner’s name (for program, procedure, and function scopes);
o Pointer to Scope Block. This field contains the memory address of the corresponding scope’s dynamically allocated scope block. The numbers used in this example are for illustrative purposes only;
o Scope Table Index. This field contains the computed (final) index of the first entry of the static scope block describing the corresponding static scope;
o Number of Scope Entries. This field contains the number of identifiers (e.g., variable names and function names) declared in the corresponding static scope;
o Array Descriptor (indicates the “owner” of additional scope blocks containing index descriptions for multidimensional array variables). This field contains the memory address of a dynamically allocated symbol table array descriptor structure, and thus allows array variables sharing the same user defined type to share an index scope block;
o Record Descriptor (indicates the “owner” of the additional scope containing record field descriptions). This field contains the memory address of a dynamically allocated symbol table record descriptor structure, and thus allows record variables that share the same user defined type also to share a field description scope block.

   Program Samp2;
   TYPE
      LIST = ARRAY [1..4] OF INTEGER;
   VAR
      X,Y :INTEGER;
      List1:LIST;
   Procedure A(VAR X,Y :INTEGER);
      Function B (I:INTEGER):INTEGER;
      BEGIN { Function B }
      END; { Function B }
   BEGIN { Procedure A }
   END; { Procedure A }
   Procedure B(I,J :INTEGER);
   VAR
      List2:LIST;
   BEGIN { Procedure B }
   END; { Procedure B }
   BEGIN { Program Samp2 }
   END. { Program Samp2 }

   Figure 26: Source Code for Program Samp2

   Owner Name   Pointer to    Scope Table   Number of       Array        Record
                Scope Block   Index         Scope Entries   Descriptor   Descriptor
   B            1000          0             4               -            -
   A            3002          4             5               -            -
   B            5001          9             5               -            -
   Samp2        6240          14            7               -            -
   Bootstrap    7000          21            3               -            -

   Table 5: Scope Owner Table for Program Samp2

Tables 6 through 10 show the five scope blocks generated by the STATSCOPE module during compilation. Table 11 shows the completed Static Scope Table as it would be written to the STATSCOPESECTION at the end of compilation.

   Entry  Id Name    Upr Bnd  Lwr Bnd  Nxt Idx  Offset  Type       Rec Siz  Parent  Child  Var Reg  Proc Num
   0                 -        -        -        -       HEADER     -        -       -      -        -
   1      I          -        -        -        -       INTEGER    -        -       -      5        -
   2      B          -        -        -        -       INTEGER    -        -       -      6        -
   3                 -        -        -        -       END        -        -       -      -        -

   Table 6: Scope Block for Function B in Procedure A in Program Samp2

   Entry  Id Name    Upr Bnd  Lwr Bnd  Nxt Idx  Offset  Type       Rec Siz  Parent  Child  Var Reg  Proc Num
   0                 -        -        -        -       HEADER     -        -       -      -        -
   1      X          -        -        -        -       INTEGER    -        -       -      4        -
   2      Y          -        -        -        -       INTEGER    -        -       -      3        -
   3      B          -        -        -        -       FUNCTION   -        -       -      -        2
   4                 -        -        -        -       END        -        -       -      -        -

   Table 7: Scope Block for Procedure A in Program Samp2

   Entry  Id Name    Upr Bnd  Lwr Bnd  Nxt Idx  Offset  Type       Rec Siz  Parent  Child  Var Reg  Proc Num
   0                 -        -        -        -       HEADER     -        -       -      -        -
   1      I          -        -        -        -       INTEGER    -        -       -      8        -
   2      J          -        -        -        -       INTEGER    -        -       -      7        -
   3      List2      4        1        -        -       INTEGER    -        -       -      9        -
   4                 -        -        -        -       END        -        -       -      -        -

   Table 8: Scope Block for Procedure B in Program Samp2

   Entry  Id Name    Upr Bnd  Lwr Bnd  Nxt Idx  Offset  Type       Rec Siz  Parent  Child  Var Reg  Proc Num
   0                 -        -        -        -       HEADER     -        -       -      -        -
   1      X          -        -        -        -       INTEGER    -        -       -      1        -
   2      Y          -        -        -        -       INTEGER    -        -       -      0        -
   3      List1      4        1        -        -       INTEGER    -        -       -      2        -
   4      A          -        -        -        -       PROCEDURE  -        -       -      -        1
   5      B          -        -        -        -       PROCEDURE  -        -       -      -        3
   6                 -        -        -        -       END        -        -       -      -        -

   Table 9: Scope Block for Program Scope in Program Samp2

   Entry  Id Name    Upr Bnd  Lwr Bnd  Nxt Idx  Offset  Type       Rec Siz  Parent  Child  Var Reg  Proc Num
   0                 -        -        -        -       HEADER     -        -       -      -        -
   1      Samp2      -        -        -        -       PROGRAM    -        -       -      -        0
   2                 -        -        -        -       END        -        -       -      -        -

   Table 10: Scope Block for “Bootstrap” Scope in Program Samp2

   Entry  Id Name    Upr Bnd  Lwr Bnd  Nxt Idx  Offset  Type       Rec Siz  Parent  Child  Var Reg  Proc Num

   Scope block describing function B in procedure A
   0                 -        -        -        -       HEADER     -        4       -      -        -
   1      I          -        -        -        -       INTEGER    -        -       -      5        -
   2      B          -        -        -        -       INTEGER    -        -       -      6        -
   3                 -        -        -        -       END        -        -       -      -        -

   Scope block describing procedure A
   4                 -        -        -        -       HEADER     -        14      -      -        -
   5      X          -        -        -        -       INTEGER    -        -       -      4        -
   6      Y          -        -        -        -       INTEGER    -        -       -      3        -
   7      B          -        -        -        -       FUNCTION   -        -       0      -        2
   8                 -        -        -        -       END        -        -       -      -        -

   Scope block describing procedure B
   9                 -        -        -        -       HEADER     -        14      -      -        -
   10     I          -        -        -        -       INTEGER    -        -       -      8        -
   11     J          -        -        -        -       INTEGER    -        -       -      7        -
   12     List2      4        1        -        -       INTEGER    -        -       -      9        -
   13                -        -        -        -       END        -        -       -      -        -

   Scope block describing program Samp2
   14                -        -        -        -       HEADER     -        21      -      -        -
   15     X          -        -        -        -       INTEGER    -        -       -      1        -
   16     Y          -        -        -        -       INTEGER    -        -       -      0        -
   17     List1      4        1        -        -       INTEGER    -        -       -      2        -
   18     A          -        -        -        -       PROCEDURE  -        -       4      -        1
   19     B          -        -        -        -       PROCEDURE  -        -       9      -        3
   20                -        -        -        -       END        -        -       -      -        -

   Bootstrap scope block
   21                -        -        -        -       HEADER     -        -       -      -        -
   22     Samp2      -        -        -        -       PROGRAM    -        -       14     -        0
   23                -        -        -        -       END        -        -       -      -        -

   Table 11: Final Static Scope Table for Program Samp2

CHAPTER 5

CONCLUSIONS AND FUTURE ENHANCEMENTS

Conclusions

The first compiler for the E-machine has been designed and implemented. The compiler’s source language, called miniPascal, is a subset of ISO Standard Pascal. The miniPascal compiler is a one-pass compiler written in ANSI Standard C and was developed using the Unix development tools, lex and yacc [Mason 90], [Lesk 75], [Johnson 75]. The compiler’s scanner module, produced by running lex on a Unix machine, and its parser module, produced by running yacc on a Unix machine, were subsequently downloaded to a DOS machine. These two modules, compiled and linked with numerous semantic analysis and code generation modules, comprise the miniPascal compiler.
A number of miniPascal programs compiled into E-machine object files have been successfully animated using a simple DOS animator to drive the E-machine.

Future Enhancements

Since miniPascal is a subset of Pascal, future versions of miniPascal will include additional Pascal features. A logical next feature to implement is pointers, which are particularly important to animate because they are often a difficult concept for students to grasp. Other desirable features to be implemented in the future include records with variant parts, the with statement, sets, and predeclared functions and procedures. It would be particularly useful to implement the predeclared procedure read. The availability of the read procedure would greatly facilitate the initialization of data (e.g., arrays) in programs demonstrating concepts such as sorting and matrix multiplication.

One feature that is not completely implemented in the current version is the method of displaying the value returned by a function call. Currently, the code generated by the compiler allows the animator to display a function value only when displaying the variable values in a window associated with the called function itself. The function name, however, is actually declared in the calling scope, and hence its value is available in this scope. It would be desirable to have the function value also displayed in the calling scope’s data memory window. A problem occurs when a function is called multiple times from the same scope, either by calls in several different statements or by multiple calls within the same statement. The question here is whether to display only the most recent value returned by the function, or to display all previous function values as well. Once this design decision is made, the compiler will require modification to produce code to support the display method.

The compiler should also be enhanced to identify the E-code instructions that are considered critical.
Currently, the compiler simply designates all E-code instructions as critical, thus hampering the efficiency of the E-machine.

Another compiler enhancement is improvement of error handling. Currently, only minimal error reporting is supported by the compiler, and there is no attempt at error recovery. This minimal support is considered sufficient for the present DYNALAB system since the miniPascal programs will be prepared by expert programmers. Later, however, the DYNALAB system may be used by students preparing their own programs for animation. Thus, error handling must be enhanced to provide a more “friendly” environment for the miniPascal programmers.

Finally, since the DYNALAB system is intended to be an evolutionary system, the miniPascal compiler will continue to evolve in order to support new animation features. For example, the animator may provide visualization of expression evaluation in order to demonstrate precedence rules in a language. The animator may also display “TRUE” or “FALSE” as conditional expressions are evaluated. It may be desirable to have the programmer indicate groups of source code lines that should appear in the same source code animation window in order to clearly illustrate some concept. All of these animation features require modifications to the compiler in order to generate the supporting code.

Thus, even though the miniPascal compiler is a usable first compiler for the E-machine, its evolution is expected to continue. The compiler is also expected to serve as a pattern for developers of future E-machine compilers.

REFERENCES

[Aho 86] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, Massachusetts. 1986.
[Birch 90] M. L. Birch. An Emulator for the E-machine. Master’s thesis. Computer Science Department, Montana State University. June 1990.
[Brown 88-1] M. Brown. Algorithm Animation. The MIT Press, Cambridge, Massachusetts. 1988.
[Brown 88-2] M. Brown. “Exploring Algorithms Using Balsa-II”, Computer, Volume 21, Number 5. May 1988.
[Fischer 88] C. N. Fischer and R. J. LeBlanc, Jr. Crafting a Compiler. Benjamin/Cummings Publishing Company, Menlo Park, California. 1988.
[Holub 90] A. I. Holub. Compiler Design in C. Prentice Hall, Englewood Cliffs, New Jersey. 1990.
[Jensen 91] K. Jensen and N. Wirth. Pascal: User Manual and Report. Springer-Verlag, New York, New York. 1991.
[Johnson 75] S. C. Johnson. “Yacc: Yet Another Compiler-Compiler”, Computer Science Technical Report Number 32. Bell Laboratories, Murray Hill, New Jersey. July 1975.
[Lesk 75] M. E. Lesk and E. Schmidt. “Lex - A Lexical Analyzer Generator”, Computer Science Technical Report Number 39. Bell Laboratories, Murray Hill, New Jersey. October 1975.
[Mason 90] T. Mason and D. Brown. lex & yacc. O’Reilly and Associates, Sebastopol, California. 1990.
[Ng 82-1] C. Ng. Ling User’s Guide. Unpublished Master’s project. Computer Science Department, Washington State University. 1982.
[Ng 82-2] C. Ng. Ling Programmer’s Guide. Unpublished Master’s project. Computer Science Department, Washington State University. 1982.
[Patton 89] S. D. Patton. The E-machine: Supporting the Teaching of Program Execution Dynamics. Master’s thesis. Computer Science Department, Montana State University. June 1989.
[Ross 91] R. J. Ross. “Experience with the DYNAMOD Program Animator”, Proceedings of the Twenty-second Symposium on Computer Science Education, SIGCSE Bulletin, 23(1):35-42. 1991.
[Ross 93] R. J. Ross. “Visualizing Computer Science”, Invited chapter to appear in the AACE monograph, Scientific Visualization in Mathematics and Science Education. 1993.
[Winslett 93] R. Winslett. Juno. Master’s thesis in progress. Computer Science Department, Montana State University.
APPENDICES

APPENDIX A

THE E-MACHINE INSTRUCTION SET

This appendix, which is adapted from chapter 2 of Birch’s thesis, lists all of the instructions in the instruction set of the E-machine. A pseudo assembly language format is used to describe the instructions; however, the instruction stream itself is actually an array of structures loaded from the CODESECTION portion of the E-machine object file at run time. The object file is described in detail in chapters 2 and 4 of this thesis. Each instruction is composed of four fields (or arguments):

o an opcode mnemonic (e.g., push, pop, add);
o a flag marking the instruction critical or noncritical (CFLAG);
o a field denoting the data type to be used in the instruction (TYPE);
o a field containing either a number (#) or an addressing mode (ADDR).

Addressing modes and their formats are described in appendix B. The mnemonic field is separated from the others by one or more spaces, and the remaining fields are separated by commas. The CFLAG field must be either c or n to designate whether the instruction is to be treated as critical (c) or noncritical (n). The TYPE field holds a single capital letter, I, R, B, C, or A, referring to the data types integer, real, boolean, character, or address, respectively. The # refers to a constant specifying the number of an E-code label, a constant numeric value, or an E-machine variable register number. If the ADDR argument is used for the fourth field, it refers to any of the addressing modes described in appendix B.

In the following description of the instruction set, the effects of executing an instruction both forward and in reverse are given. The actions taken in each case will be different, depending on whether the instruction has been designated critical or noncritical. Some instructions have no critical/noncritical flag, because their execution (either forward or in reverse) would be the same in either case.
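The four-field format described above can be illustrated with a small C sketch that holds one instruction and prints it in the pseudo assembly notation (mnemonic, a space, then comma-separated fields). This is an assumption-laden illustration, not the CODESECTION layout: the Instruction structure, its member names, and FormatInstruction are hypothetical, and the ADDR form of the fourth field is omitted because addressing modes are defined in appendix B.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one instruction. */
typedef struct Instruction {
    const char *Mnemonic;  /* e.g., "add", "br" */
    char CFlag;            /* 'c', 'n', or 0 if the instruction has no flag */
    char Type;             /* 'I', 'R', 'B', 'C', 'A', or 0 if unused */
    int HasNumber;         /* nonzero if the # field is present */
    long Number;           /* label, constant, or variable register number */
} Instruction;

/* Appends text to the string in buf; returns 0 if it does not fit. */
static int Append(char *buf, size_t size, const char *text)
{
    size_t used = strlen(buf);
    size_t len = strlen(text);
    if (used + len + 1 > size)
        return 0;
    memcpy(buf + used, text, len + 1);
    return 1;
}

/* Writes the pseudo assembly form of ins into buf. Returns the length
   written, or -1 on an invalid field or a buffer that is too small. */
int FormatInstruction(const Instruction *ins, char *buf, size_t size)
{
    char field[24];
    const char *sep = " ";   /* a space after the mnemonic, commas after */

    assert(ins != NULL && ins->Mnemonic != NULL && buf != NULL && size > 0);
    buf[0] = '\0';
    if (!Append(buf, size, ins->Mnemonic))
        return -1;
    if (ins->CFlag != 0) {
        if (ins->CFlag != 'c' && ins->CFlag != 'n')
            return -1;
        field[0] = ins->CFlag;
        field[1] = '\0';
        if (!Append(buf, size, sep) || !Append(buf, size, field))
            return -1;
        sep = ",";
    }
    if (ins->Type != 0) {
        if (strchr("IRBCA", ins->Type) == NULL)
            return -1;
        field[0] = ins->Type;
        field[1] = '\0';
        if (!Append(buf, size, sep) || !Append(buf, size, field))
            return -1;
        sep = ",";
    }
    if (ins->HasNumber) {
        sprintf(field, "%ld", ins->Number);
        if (!Append(buf, size, sep) || !Append(buf, size, field))
            return -1;
    }
    return (int)strlen(buf);
}
```

For example, a critical integer add would be rendered as "add c,I", and an unconditional branch to label 12 as "br 12".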
Reversing through a noncritical instruction sometimes requires that something be pushed onto the evaluation stack to keep the stack of the proper size; in such cases an arbitrary value, called DUMMY, is used.

add CFLAG, TYPE
   Adds the top two values on the evaluation stack and places the result onto the evaluation stack.
   Forward-Critical: Pops the top two values off the evaluation stack, pushes them onto the save stack, and then pushes their sum onto the evaluation stack.
   Forward-Noncritical: Pops the top two values off the evaluation stack and pushes their sum onto the evaluation stack.
   Reverse-Critical: Pops the top value off the evaluation stack and discards the value. Pops the top two elements off the save stack and pushes them onto the evaluation stack.
   Reverse-Noncritical: Pushes DUMMY onto the evaluation stack.

alloc CFLAG, #
   Allocates a block of memory of size #.
   Forward: Attempts to allocate # computer words of storage. If successful, the address of the first word of data memory that was allocated is pushed onto the evaluation stack. Otherwise, a NULL address is pushed onto the evaluation stack.
   Reverse: Pops the top value off the evaluation stack, which should be a data address, and frees # words of data memory starting at that address.

and CFLAG, TYPE
   Bitwise and’s the top two values of the evaluation stack and places the result onto the evaluation stack.
   Forward-Critical: Pops the top two values off the evaluation stack, pushes the two values onto the save stack, and then pushes the bottom value bitwise and’ed with the top value onto the evaluation stack.
   Forward-Noncritical: Pops the top two values off the evaluation stack and pushes the bottom value bitwise and’ed with the top value onto the evaluation stack.
   Reverse-Critical: Pops the top value off the evaluation stack and discards it. Pops the top two values off the save stack and pushes them onto the evaluation stack.
   Reverse-Noncritical: Pushes DUMMY onto the evaluation stack.

br #
   Unconditionally branches to label #.
   Forward: Loads the program counter with the address of the label # instruction.
   Reverse: No operation.

brt, brf CFLAG, #
   Conditionally branches depending on whether the top of the evaluation stack is TRUE or FALSE.
   Forward-Critical: Pops the top value off the evaluation stack and pushes it onto the save stack. If the value satisfies the conditional on the branch (TRUE for brt, FALSE for brf), the program counter is loaded with the address of the label # instruction.
   Forward-Noncritical: Pops the top value off the evaluation stack. If the value agrees with the conditional branch (TRUE for brt, FALSE for brf), the program counter is loaded with the address of the label # instruction.
   Reverse-Critical: Pops the top value off the save stack and pushes it onto the evaluation stack.
   Reverse-Noncritical: Arbitrarily pushes DUMMY onto the evaluation stack.

call #
   Branches to label #, saving the program address which follows the call instruction so that execution will continue there upon execution of a return instruction.
   Forward: Pushes the current program counter onto the return address stack, then loads the address of the label # instruction into the program counter.
   Reverse: Pops the top value from the return address stack.
cast CFLAG, TYPE, TYPE
   Changes the top value of the evaluation stack from the first TYPE to the second.
   Forward-Critical: Pops the top value off the evaluation stack and pushes it onto the save stack, then transforms the value from the first TYPE to the second. The result is pushed onto the evaluation stack.
   Forward-Noncritical: Pops the top value off the evaluation stack, then transforms the value from the first TYPE to the second. The result is pushed onto the evaluation stack.
   Reverse-Critical: Pops the top value off the evaluation stack. Then pops the top value off the save stack and pushes it onto the evaluation stack.
   Reverse-Noncritical: Nothing happens.

div CFLAG, TYPE
   Divides the second value from the top of the evaluation stack by the first and places the result onto the evaluation stack.
   Forward-Critical: Pops the top two values off the evaluation stack, pushes the two values onto the save stack, and pushes the bottom value divided by the top value onto the evaluation stack.
   Forward-Noncritical: Pops the top two values off the evaluation stack and pushes the bottom value divided by the top value onto the evaluation stack.
   Reverse-Critical: Pops the top value off the evaluation stack and discards it. Pops the top two values off the save stack and pushes them onto the evaluation stack.
   Reverse-Noncritical: Pushes DUMMY onto the evaluation stack.

eql, neql, less, leql, gtr, geql CFLAG, TYPE
   If the second value from the top of the evaluation stack compares favorably with the first, then TRUE is pushed onto the evaluation stack. Otherwise FALSE is pushed onto the evaluation stack.
   Forward-Critical: Pops the top two values off the evaluation stack, pushes the two values onto the save stack, and compares the bottom value with the top. If the result of the comparison matches the comparison operation performed, a boolean TRUE is pushed onto the evaluation stack; otherwise, a boolean FALSE is pushed onto the evaluation stack.
   Forward-Noncritical: Pops the top two values off the evaluation stack and compares the bottom value with the top value. If the result matches the comparison operation performed, a boolean TRUE is pushed onto the evaluation stack; otherwise, a boolean FALSE is pushed onto the evaluation stack.
   Reverse-Critical: Pops the top value off the evaluation stack and discards it, then pops the top two values off the save stack and pushes them onto the evaluation stack.
   Reverse-Noncritical: Pushes DUMMY onto the evaluation stack.

inst CFLAG, #
   Creates an instance of the variable register #.
   Forward-Critical: Allocates enough data memory for the variable represented by the variable register #. The address of the allocated memory is then pushed onto the variable register’s stack.
   Forward-Noncritical: Allocates enough data memory for the variable represented by the variable register #. The size of the variable is stored in the variable register. The address of the allocated memory is then pushed onto the variable register’s stack.
   Reverse-Critical: The data memory occupied by the variable register is freed and the top value is popped off the variable register’s stack.
   Reverse-Noncritical: Frees the space taken up by the variable in data memory and pops the top value off the variable register’s stack.

label #
   Marks the location to which a branch may be made.
Forward: Pushes the previous program counter onto the stack pointed to by label register #.
Reverse: Pops the top value of the stack pointed to by label register # and places it in the program counter.

link #
Associates one variable register with the value of another.
Forward: Pops the top value of the evaluation stack and pushes it onto the variable stack pointed to by variable register #.
Reverse: Pops the top value of the variable stack pointed to by variable register # and pushes it onto the evaluation stack.

loadar CFLAG, ADDR
Places the address ADDR in the address register.
Forward-Critical: The contents of the address register are pushed onto the save stack. Then the address computed for the addressing mode is placed in the address register. Important note: it is the address that is computed by the addressing mode that is used, not the contents of that address.
Forward-Noncritical: The address computed for the addressing mode is placed in the address register. The note under Forward-Critical applies here as well.
Reverse-Critical: The address on top of the save stack is popped off and placed in the address register.
Reverse-Noncritical: Nothing happens.

loadir CFLAG, #
Places the # into the index register.
Forward-Critical: The contents of the index register are pushed onto the save stack. Then # is placed in the index register.
Forward-Noncritical: # is placed in the index register.
Reverse-Critical: The value on top of the save stack is popped off and placed in the index register.
Reverse-Noncritical: Nothing happens.
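The critical and noncritical variants of a register load such as loadir differ only in whether the old register contents are kept for later restoration. The following is a minimal Python sketch of that idea, not the emulator's actual code; the class name, method names, and the use of Python lists as stacks are all assumptions made for illustration.

```python
class ToyMachine:
    """Hypothetical model of the index register and save stack (illustration only)."""

    def __init__(self):
        self.index_register = 0
        self.save_stack = []

    def loadir_forward(self, critical, value):
        # Critical: remember the old contents so the step can be undone exactly.
        if critical:
            self.save_stack.append(self.index_register)
        self.index_register = value

    def loadir_reverse(self, critical):
        # Critical: restore the saved contents. Noncritical: nothing happens.
        if critical:
            self.index_register = self.save_stack.pop()


machine = ToyMachine()
machine.loadir_forward(True, 2)   # index register: 0 -> 2, save stack: [0]
machine.loadir_forward(True, 7)   # index register: 2 -> 7, save stack: [0, 2]
machine.loadir_reverse(True)      # index register restored to 2
print(machine.index_register, machine.save_stack)
```

A noncritical load skips the save, which is cheaper but means the reverse step cannot recover the overwritten value.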
mod CFLAG, TYPE
Finds the remainder of the division of the second value from the top of the evaluation stack by the first and places the result onto the evaluation stack.
Forward-Critical: Pops the top two values of the evaluation stack, pushes the two values onto the save stack, and then pushes the bottom value modulo the top value onto the evaluation stack.
Forward-Noncritical: Pops the top two values of the evaluation stack and pushes the bottom value modulo the top value onto the evaluation stack.
Reverse-Critical: Pops the top value of the evaluation stack and discards it. Pops the top two values of the save stack and pushes them onto the evaluation stack.
Reverse-Noncritical: Pushes DUMMY onto the evaluation stack.

mult CFLAG, TYPE
Multiplies the top two values on the evaluation stack and places the result onto the evaluation stack.
Forward-Critical: Pops the top two values of the evaluation stack, pushes the two values onto the save stack, and then pushes their product onto the evaluation stack.
Forward-Noncritical: Pops the top two values of the evaluation stack and pushes their product onto the evaluation stack.
Reverse-Critical: Pops the top value of the evaluation stack and discards it. Pops the top two values of the save stack and pushes them onto the evaluation stack.
Reverse-Noncritical: Pushes DUMMY onto the evaluation stack.

neg TYPE
Negates the top value on the evaluation stack.
Forward: Pops the top of the evaluation stack and pushes the negation of that value onto the evaluation stack.
Reverse: Pops the top of the evaluation stack and pushes the negation of that value onto the evaluation stack.
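The binary arithmetic instructions (div, mod, mult, and so on) all share the same forward and reverse pattern, while neg is its own inverse. The sketch below illustrates this in Python under stated assumptions: stacks are modelled as lists, the function names are hypothetical, DUMMY is represented by a placeholder object, and the order in which the two operands are placed on the save stack (bottom first) is an assumption chosen so that the reverse step restores the original layout.

```python
import operator

DUMMY = object()  # hypothetical stand-in for the E-machine's DUMMY value


def binary_forward(eval_stack, save_stack, op, critical):
    top = eval_stack.pop()
    bottom = eval_stack.pop()
    if critical:
        # Assumed order: bottom operand first, then top operand.
        save_stack.append(bottom)
        save_stack.append(top)
    eval_stack.append(op(bottom, top))


def binary_reverse(eval_stack, save_stack, critical):
    if critical:
        eval_stack.pop()            # discard the result
        top = save_stack.pop()
        bottom = save_stack.pop()
        eval_stack.append(bottom)   # restore both operands
        eval_stack.append(top)
    else:
        eval_stack.append(DUMMY)    # operands are lost; only stack depth is restored


def neg(eval_stack):
    # Forward and reverse are the same operation.
    eval_stack.append(-eval_stack.pop())


eval_stack, save_stack = [17, 5], []
binary_forward(eval_stack, save_stack, operator.mod, critical=True)
after_forward = list(eval_stack)    # [2]
binary_reverse(eval_stack, save_stack, critical=True)
print(after_forward, eval_stack, save_stack)
```

In the noncritical case the reverse step cannot recover the operands, so it only restores the stack depth by pushing DUMMY, as the instruction descriptions state.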
nop
This instruction is the standard no-operation instruction. It can be used to create packets for high level program text for which no E-machine instructions are generated but which nonetheless need to be highlighted for animation purposes. An example of this is the begin keyword in Pascal. In illustrating the flow of control during program animation, a begin keyword may need to be highlighted (and thus have its own underlying E-machine packet of instructions). The nop instruction can be used in these cases.

not CFLAG, TYPE
Bitwise complements the top value of the evaluation stack.
Forward: Pops the top of the evaluation stack and pushes the bitwise not of that value onto the evaluation stack.
Reverse: Pops the top of the evaluation stack and pushes the bitwise not of that value onto the evaluation stack.

or CFLAG, TYPE
Bitwise ors the top two values of the evaluation stack and places the result onto the evaluation stack.
Forward-Critical: Pops the top two values of the evaluation stack, pushes the two values onto the save stack, and then pushes the bottom value bitwise or'ed with the top value onto the evaluation stack.
Forward-Noncritical: Pops the top two values of the evaluation stack and pushes the bottom value bitwise or'ed with the top value onto the evaluation stack.
Reverse-Critical: Pops the top value of the evaluation stack and discards it. Pops the top two values of the save stack and pushes them onto the evaluation stack.
Reverse-Noncritical: Pushes DUMMY onto the evaluation stack.

pop CFLAG, TYPE, ADDR
Pops the top value of the evaluation stack and places it in ADDR.
Forward-Critical: Pushes the value in ADDR onto the save stack and then pops the top value of the evaluation stack and stores it in ADDR.
Forward-Noncritical: Pops the top value of the evaluation stack and stores it in ADDR.
Reverse-Critical: Pushes the value in ADDR onto the evaluation stack and then pops the top value of the save stack and places it in ADDR.
Reverse-Noncritical: Pushes the value in ADDR onto the evaluation stack.

popar CFLAG
Pops the address on top of the evaluation stack and places it in the address register.
Forward-Critical: The contents of the address register are pushed onto the save stack. The address on top of the evaluation stack is popped and placed in the address register.
Forward-Noncritical: The address on top of the evaluation stack is popped off and placed in the address register.
Reverse-Critical: The contents of the address register are pushed onto the evaluation stack. Then the address on top of the save stack is popped off and placed in the address register.
Reverse-Noncritical: The contents of the address register are pushed onto the evaluation stack.

popd
Pops the top value from the dynamic scope stack.
Forward: Pops the top value from the dynamic scope stack and pushes it onto the save dynamic scope stack.
Reverse: Pops the top value from the save dynamic scope stack and pushes it onto the dynamic scope stack.

popir CFLAG
Pops the integer on top of the evaluation stack and places it in the index register.
Forward-Critical: The contents of the index register are pushed onto the save stack. Then the integer on top of the evaluation stack is popped off and placed in the index register.
Forward-Noncritical: The integer on top of the evaluation stack is popped off and placed in the index register.
Reverse-Critical: The contents of the index register are pushed onto the evaluation stack. Then the integer on top of the save stack is popped off and placed in the index register.
Reverse-Noncritical: The contents of the index register are pushed onto the evaluation stack.

push TYPE, ADDR
Pushes the value in ADDR onto the evaluation stack.
Forward: Pushes the value in ADDR onto the evaluation stack.
Reverse: Pops the top value of the evaluation stack and stores it in ADDR.

pusha ADDR
Pushes the calculated address of ADDR onto the evaluation stack. This instruction is intended to be used for pushing the addresses of parameters passed by reference.
Forward: Pushes the calculated address of ADDR onto the evaluation stack.
Reverse: Pops and discards the address on top of the evaluation stack.

pushd #
Pushes the # onto the dynamic scope stack (where # is the index of a program, procedure, or function entry in the Static Scope Table).
Forward: Pushes # onto the dynamic scope stack.
Reverse: Pops the top value from the dynamic scope stack.

read CFLAG, TYPE
Reads a value from the user.
Forward: A user interface function is called to get input from the user. The input is converted from a string to the appropriate type and pushed onto the evaluation stack.
Reverse: The top value is popped off the evaluation stack.

return
Returns to the appropriate program address following a call instruction.
Forward: Pops the top value of the return address stack and loads it into the program counter.
Reverse: Pushes the previous program counter onto the return address stack.
shl CFLAG, TYPE, #
Shifts the value on top of the evaluation stack # bits to the left, filling on the right with 0's.
Forward-Critical: Pops the top value of the evaluation stack, pushes it onto the save stack, then shifts it # bits to the left and pushes the result back onto the evaluation stack.
Forward-Noncritical: Pops the top value of the evaluation stack, shifts it left # bits, then pushes the result back onto the evaluation stack.
Reverse-Critical: Pops the top value of the evaluation stack. Then pops the top value of the save stack and pushes it onto the evaluation stack.
Reverse-Noncritical: Nothing happens.

shr CFLAG, TYPE, #
Shifts the value on top of the evaluation stack # bits to the right, filling on the left with 0's.
Forward-Critical: Pops the top value of the evaluation stack, pushes it onto the save stack, then shifts it # bits to the right and pushes the result back onto the evaluation stack.
Forward-Noncritical: Pops the top value of the evaluation stack, shifts it right # bits, then pushes the result back onto the evaluation stack.
Reverse-Critical: Pops the top value of the evaluation stack. Then pops the top value of the save stack and pushes it onto the evaluation stack.
Reverse-Noncritical: Nothing happens.

sub CFLAG, TYPE
Subtracts the value on the top of the evaluation stack from the second value from the top and places the result onto the evaluation stack.
Forward-Critical: Pops the top two values of the evaluation stack, pushes the two values onto the save stack, and then pushes the bottom value minus the top value onto the evaluation stack.
Forward-Noncritical: Pops the top two values of the evaluation stack and pushes the bottom value minus the top value onto the evaluation stack.
Reverse-Critical: Pops the top value of the evaluation stack and discards it. Pops the top two values of the save stack and pushes them onto the evaluation stack.
Reverse-Noncritical: Pushes DUMMY onto the evaluation stack.

unalloc CFLAG, #
Deallocates a block of memory of size # beginning at the data address atop the evaluation stack.
Forward-Critical: Pops the top value off the evaluation stack, which should be a data address, copies # words of data memory starting at that address to the save stack, then frees the data memory.
Forward-Noncritical: Pops the top value off the evaluation stack, which should be a data address, and frees # words of data memory starting at that address.
Reverse-Critical: Pops the top value off the save stack, which should be a data address, pushes it onto the evaluation stack, and allocates # words of data memory starting at that location. # words are then moved from the save stack to this data memory.
Reverse-Noncritical: Allocates # words of data memory and pushes the address of the first word of allocated memory onto the evaluation stack.

uninst CFLAG, #
Disposes of an instance of variable register #.
Forward-Critical: Frees the memory occupied by the variable, then pops the top data memory address off the variable register's stack and pushes it onto the save stack.
Forward-Noncritical: Frees the memory occupied by the variable, then pops the top address off the variable register's stack.
Reverse-Critical: Pops the address off the save stack and pushes it onto the variable register's stack, then reallocates enough data memory for the variable # starting at that address.
Reverse-Noncritical: Reallocates enough data memory for the variable # and pushes the address of the data memory allocated onto the variable register's stack.

unlink #
Disassociates a variable register from another.
Forward: Pops the top value of the variable stack pointed to by variable register # and pushes it onto the save stack.
Reverse: Pops the top value of the save stack and pushes it onto the variable stack pointed to by variable register #.

write CFLAG, TYPE
Displays a value for the user.
Forward-Critical: The top of the evaluation stack is popped and the value pushed onto the save stack. This value is then converted into a string and passed to a user interface function, which takes appropriate action to display the value.
Forward-Noncritical: The top of the evaluation stack is popped, converted into a string, and passed to a user interface function to be displayed.
Reverse-Critical: The value on top of the save stack is popped and pushed onto the evaluation stack. Then a user interface function is called to handle undisplaying of the last value displayed.
Reverse-Noncritical: DUMMY is pushed onto the evaluation stack and then a user interface function is called to handle undisplaying of the last value displayed.

xor CFLAG, TYPE
Bitwise exclusive-ors the top two values of the evaluation stack and places the result onto the evaluation stack.
Forward-Critical: Pops the top two values of the evaluation stack, pushes the two values onto the save stack, and then pushes the bottom value bitwise exclusive or'ed with the top value onto the evaluation stack.
Forward-Noncritical: Pops the top two values of the evaluation stack and pushes the bottom value bitwise exclusive or'ed with the top value onto the evaluation stack.
Reverse-Critical: Pops the top value of the evaluation stack and discards it. Pops the top two values of the save stack and pushes them onto the evaluation stack.
Reverse-Noncritical: Pushes DUMMY onto the evaluation stack.

APPENDIX B
THE E-MACHINE ADDRESSING MODES

This appendix, which is adapted from chapter 2 of Birch's thesis, describes the various addressing modes allowed in E-machine instructions. Quite a few modes are defined in order to accommodate standard high level language data structures more conveniently. Note that each addressing mode refers to either the data at the computed address or the computed address itself, depending on the instruction. That is, for those instructions that need a data value, such as push, the data value at the address computed from the addressing mode is used. For instructions that need an address, such as pop, the address that was computed from the addressing mode is used.

For each addressing mode listed below, an example of its intended use is given. Each example is given in pseudo assembly language form for clarity; it is important to remember that no assembler (and hence no assembly language) has yet been developed for the E-machine. However, the pseudo assembly language examples should be easily understood.
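The distinction between using the data at the computed address (as push does) and using the computed address itself (as pop does) can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: data memory is modelled as a list, the addressing mode is assumed to have already been resolved to an integer effective address, and the function names are hypothetical rather than taken from the emulator.

```python
data_memory = [0] * 16   # hypothetical data memory, one value per cell
eval_stack = []          # hypothetical evaluation stack


def push(effective_address):
    # push needs a data value: read the contents at the computed address.
    eval_stack.append(data_memory[effective_address])


def pop(effective_address):
    # pop needs an address: store the popped value at the computed address.
    data_memory[effective_address] = eval_stack.pop()


data_memory[3] = 42
push(3)   # evaluation stack now holds the value stored at address 3
pop(5)    # that value is written to address 5
print(data_memory[5], eval_stack)
```

The same effective address therefore plays two roles: a source of data for push and a destination for pop.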
constant mode - C#
This mode is often called the immediate mode in other architectures; # is itself the integer, real, boolean, character, or address constant operand required in the instruction.

Example: A := 1.5; could be translated into:

    push R,C1.5      ; push 1.5
    pop c,R,V1       ; assign to A

variable mode - V#: variable register # -> top of variable stack -> data memory
This mode accesses the data memory location given in the top element of the variable stack that is pointed to by variable register #. This mode is intended to address source program variables that are of one of the basic E-machine types.

Example: B := 1; could be translated into:

    push I,C1        ; push 1
    pop c,I,V3       ; assign to B

variable indirect - (V#): variable register # -> top of variable stack -> data memory -> data memory
This mode accesses the data in data memory whose location is stored at another data memory location, which is pointed to by the top of the variable stack pointed to by variable register #. This mode is intended for accessing the contents of high level language pointer variables. It would be particularly useful for handling parameters in C which are passed as pointers with the intention of passing by reference.

Example:

    int foo(C)
    int *C
    {
        *C = 1;
    }

could be translated into:

    label c,5        ; procedure entry
    inst c,V3        ; create new instance of C
    pop c,A,V3       ; assign argument passed to *c
    push I,C1        ; push 1
    pop c,I,(V3)     ; assign to *c
    uninst c,V3      ; destroy instance of C
    return           ; return from call

variable offset mode - V#{offset}: variable register # -> top of variable stack + IR -> data memory
This mode accesses the data pointed to by the top of the variable register # stack plus a byte offset which was previously loaded into the index register. This mode is useful for accessing fields in a structured data type such as a Pascal record or C struct.

Example: A := D.Field2 could be translated into:

    push I,2         ; D is at offset of 2 in structure
    popir c          ; put offset into index register
    push R,V4{IR}    ; push D.Field2
    pop c,R,V1       ; assign to A

address indirect - (A): address register -> data memory
This mode provides access to data located at the data address in the address register. The address register must be loaded with a data memory address which points to data memory. This mode is useful for multiple indirection.

Example: c = *(*g); could be translated into:

    loadar c,V7      ; load addr reg with addr of g
    loadar c,(A)     ; load addr reg with addr of *g
    push I,(A)       ; push *(*g)
    pop c,I,V3       ; assign to c

address offset mode - A{offset}: address register + IR -> data memory
This mode provides access to structured data through the address register. The index register is added to the address register to provide an address to the data to be accessed. This mode is useful for indirection with structured data, such as pointers to records in Pascal.
Example: I := H^.Data; could be translated into:

    push A,V8        ; push H^ (address value of H)
    popar c          ; load ar with H^
    push I,C2        ; Data has offset of 2 in record
    popir c          ; load ir with offset
    push I,A{IR}     ; push H^.Data
    pop c,I,V9       ; assign to I

variable indexed mode - V#[index]: variable register # -> top of variable stack + IR * data size -> data memory
This address mode uses the top of the variable register # stack as a base address and adds the index register, which must be previously loaded, multiplied by the number of bytes occupied by the data type, which is a basic E-machine data type. The resulting address points to the data item. This mode is useful for accessing an array whose elements are of a basic E-machine data type.

Example: B := L[3]; could be translated into:

    push n,I,3       ; put index of 3 into
    popir c          ; the index register
    push I,V12[IR]   ; push L[3]
    pop c,I,V2       ; assign to B

address indexed mode - A[index]: address register + IR * data size -> data memory
This mode provides the same function as variable indexed mode, except that instead of a variable register providing the base address, the address register is loaded with the base address. This mode could be used for accessing elements of an array which is pointed to by a variable.

Example: B := ST[4]; could be translated into:

    push A,V19       ; put address of array into
    popar c          ; address register
    push I,4         ; put index of 4 into
    popir c          ; the index register
    push I,A[IR]     ; push ST[4]
    pop c,I,V2       ; assign to B

APPENDIX C
A miniPASCAL COMPILATION EXAMPLE

This appendix provides an example showing the complete results of the compilation of a miniPascal program. The compilation was produced on a DOS machine.
The example program, shown in figure 27, is referred to as program Samp3 throughout the remainder of this appendix. The numbers on the left refer to source program line numbers. The program, as shown in figure 27 (with the exception of the line numbers), is written to the SOURCESECTION portion of the E-machine object file (or E-code file). Program Samp3 contains several features that were not illustrated previously. These features include constant and type declarations, a record definition, a two-dimensional array, and an array of records. The record definition, DRec, consists of two fields, one of which is a two-dimensional array of the previously defined Matrx type. An array of these records (DBase) is then declared, with an instance of such an array (Data) being declared in the variable declaration section of the main program. Another variable, also named Data, is declared in the formal parameter list of procedure InitD. In this case, Data is declared as only a single record of type DRec.

Program Samp3 also contains a situation in which a packet becomes fragmented (see chapter 4 for a discussion of the packet fragmentation problem). The fragmentation occurs in procedure InitD, which contains a nested for loop in which the inner for loop is a single statement within another conditionally executed statement. The particular packet fragmentation situation found in program Samp3 is discussed later in this appendix.

Table 12 shows the array containing the program memory addresses corresponding to program Samp3's generated E-code label instructions. The column holding the label numbers (or label register numbers) is included for clarity; only the array of program memory addresses is actually written to the LABELSECTION portion of the E-code file.

Table 13 shows the array containing the data memory sizes reserved for program Samp3's variable registers.
The columns holding the variable names and the variable register numbers are included for clarity; only the array of data memory sizes is actually written to the VARIABLESECTION portion of the E-code file. The variable registers whose corresponding names are blank are temporary registers needed to hold intermediate values. In this implementation, the data memory sizes are in terms of bytes; hence, the corresponding data memory size for a 32-bit integer (e.g., J) is 4. As can be seen in table 13, the full size of the array of records (variable Data represented by variable register number 3) is reserved for the array. The full size of a single record (40 bytes) is reserved for the record Data (variable register number 6) found in procedure InitD. Variable register number 18, representing a 2-byte temporary variable, holds the result of the if comparison found in function Fact.

Figure 28 shows the contents of program Samp3's string space array. In this example, the string literal 'Sample Program', associated with the string constant Name, is entered into the string array. The string array is subsequently written to the STRINGSECTION portion of the E-code file.

Table 14 shows the Packet Table generated for program Samp3. The column holding the packet number is included for clarity; the remaining fields (columns) are written to the PACKETSECTION of the E-code file. As can be seen in table 14, packet number 24 is a fragmented packet. This fragmentation situation is discussed later in this appendix. There are also two packets (numbers 36 and 43) whose execution should not result in changing the animation display. These two packets correspond to a return from a function call; this situation was discussed in the Parser module section of chapter 4.

Table 15 shows the Static Scope Table for program Samp3. The column holding the entry number is included for clarity; the remaining fields (columns) are written to the STATSCOPESECTION of the E-code file.
Two previously unillustrated types of scope blocks are found in table 15. These are a record description scope block (entries 6-9) and an array index description scope block (entries 10-13). As can be seen in table 15, two identifiers (entry 1 in procedure InitD's scope block and entry 21 in program Samp3's scope) both refer to the same child scope block, which is the record scope block describing a record of type DRec (entries 6-9). The compiler is able to determine that this record description scope block needs to be present only once (and possibly referenced multiple times) by querying the Scope Owner Table's Record Descriptor field, as discussed in the STATSCOPE module section of chapter 4.

Examine entry number 7 in table 15. This entry describes field A of a record of type DRec; field A is an array of type Matrx. The bounds of A's first index are included in entry number 7. The NxtIdx field of this entry holds the index of the first entry of the scope block describing A's second index (entries 10-13). Entry numbers 7 and 8, which describe the fields named A and B, also utilize the Offset field to denote the fields' offsets from the beginning of the record. Finally, note that the RecSiz field is utilized for a variable representing an array of records (e.g., entry number 21 describing the variable Data in program Samp3). This value is required by the animator for proper calculation of offsets when displaying values associated with arrays of records.

Figure 29 shows the pseudo assembly language representation of the E-code instructions generated for program Samp3. Figure 29 is formatted to enumerate the program's animation units, with translated E-code packets printed directly beneath corresponding animation units. Here again, the reader is reminded that the pseudo assembly language format is used for clarity; it is an array of C structures representing the E-code instruction stream that is actually written to the CODESECTION of the E-code file.
Figure 29 illustrates several situations that need to be discussed. First, examine E-code instruction numbers 3-11. Each name declared in the constant declaration section is assigned a variable register number. The constants' values are then stored in their corresponding variable registers. Thus, the compiler treats constants as though they were variable names in order to allow the animator access to their values at run time. Figure 30 shows a possible animation snapshot after the constant declarations have been executed (i.e., at this point the keyword TYPE will be highlighted, indicating that it is the next animation unit to be executed). It should be noted that as each of the subsequent type declarations is executed, the animator simply highlights the corresponding animation unit in sequence; no corresponding data memory values are added to the right-hand side of the display until variable names are actually declared. The nop instructions (numbers 12-15) serve as "dummy" instructions to allow the animator to highlight the appropriate animation unit.

As mentioned above, program Samp3 contains a situation in which a packet is fragmented. This packet, number 24, is the E-code translation of the animation unit

    Data.A[I,J] := I + 101.33 * MultF;

This animation unit is part of a single for statement, thus illustrating the fragmentation of a packet resulting from a single for statement nested within a conditionally executed statement, in this case another for statement. The fragmentation problem is manifested as follows. As the inner for loop is executed, the animator sequentially highlights the four animation units composing the inner for statement (i.e., the animation units translated by E-code packets numbered 21-24). The animator repeats this process upon each iteration of the inner loop. When the inner loop index eventually reaches its upper limit, the branch to label 7 (shown in instruction number 64) is taken.
The E-code instruction defining label 7 (instruction number 117) is contained in packet number 24, which translates the above-mentioned animation unit. At this point, however, this animation unit should not be highlighted (since the instructions translating the assignment statement represented by the animation unit will not now be executed). The FragAddr field for packet number 24 in the Packet Table (shown in table 14) holds the value 117, indicating that packet number 24 is considered fragmented whenever control branches into the packet at (or beyond) instruction number 117. The animator queries the E-machine's program counter and packet register to determine which animation unit (if any) should be highlighted prior to the E-machine's execution of the corresponding packet. Thus, the animator, upon querying the E-machine's program counter (currently 117) and packet register (currently 24), determines that packet number 24 is fragmented at its current point of entry, instruction number 117. The animator must now retain its previous display while the E-machine executes instruction numbers 117-118. When the branch to label 2 (shown in instruction number 118) is accomplished, the E-machine returns control to the animator, which again queries the E-machine's program counter (currently 33) and packet register (currently 19). Since packet number 19 is not fragmented, the animator now highlights the animation unit corresponding to this packet,

    I := 1 TO Rows

Finally, figures 31 and 32 show two possible animation screens occurring during the animation of program Samp3. Figure 31 shows an animation display that could occur immediately before procedure InitD is called from the main program (i.e., the animator is highlighting the animation unit

    InitD(Data[Num],3);

while awaiting a response from the user). The dotted lines shown in the source program window indicate omitted source lines.
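The highlighting decision described above for fragmented packets can be sketched as follows. This is a hedged Python illustration and not the animator's actual code: the dictionary layout and the function name should_highlight are assumptions, and only three rows of table 14 are reproduced. The rule that a packet whose Display Packet field is FALSE is never highlighted is taken from the description of packets 36 and 43.

```python
NOT_FRAGMENTED = -1  # FragAddr value shown in table 14 for unfragmented packets

# Subset of table 14: packet number -> (start addr, end addr, frag addr, display packet)
PACKETS = {
    19: (32, 46, -1, True),
    24: (66, 118, 117, True),
    36: (166, 168, -1, False),
}


def should_highlight(packet_number, program_counter):
    """Decide whether the animation unit for this packet should be highlighted."""
    _start, _end, frag_addr, display = PACKETS[packet_number]
    if not display:
        return False  # packet must not change the animation display
    if frag_addr != NOT_FRAGMENTED and program_counter >= frag_addr:
        return False  # control entered at or beyond the fragmentation address
    return True


print(should_highlight(24, 66))    # normal entry into the assignment packet
print(should_highlight(24, 117))   # entry at label 7: keep the previous display
print(should_highlight(19, 33))    # branch to label 2: highlight packet 19
```

With the values from the walkthrough (program counter 117, packet register 24), the check reports that the previous display should be retained; after the branch to label 2 (program counter 33, packet register 19) highlighting resumes.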
Figure 32 shows an animation display that could occur immediately before a return is issued from procedure InitD (i.e., the animator is highlighting the animation unit END; in procedure InitD). Here again, the dotted lines indicate omitted source lines.

     0  Program Samp3;
     1  CONST
     2   Rows = 3;
     3   Cols = 3;
     4   Name = 'Sample Program';
     5  TYPE
     6   Matrx = ARRAY [1..Rows,1..Cols] OF REAL;
     7   DRec = RECORD
     8       A:Matrx;
     9       B:INTEGER;
    10     END; { DRec }
    11   DBase = ARRAY [1..2] OF DRec;
    12  VAR
    13   Data:DBase;
    14   Num,nFact:INTEGER;
    15
    16   Procedure InitD(VAR Data:DRec;
    17                   MultF:INTEGER);
    18     VAR
    19      I,J:INTEGER;
    20     BEGIN { Procedure InitD }
    21      FOR I := 1 TO Rows DO
    22       FOR J := 1 TO Cols DO
    23        Data.A[I,J] := I + 101.33 * MultF;
    24      Data.B := MultF;
    25      END; { Procedure InitD }
    26
    27   Function Fact(n:INTEGER):INTEGER;
    28     BEGIN { Function Fact }
    29      IF n = 0
    30        THEN Fact := 1
    31        ELSE Fact := n * Fact(n-1)
    32      END; { Function Fact }
    33
    34   BEGIN { Program Samp3 }
    35    Num := 2;
    36    InitD(Data[Num],3);
    37    nFact := Fact(Data[Num].B);
    38    END. { Program Samp3 }

Figure 27: The E-code SOURCESECTION for Program Samp3

    Label Register   Label Program
    Number           Address
    L0               187
    L1               21
    L2               33
    L3               42
    L4               119
    L5               51
    L6               60
    L7               117
    L8               139
    L9               145
    L10              157
    L11              179
    L12              166
    L13              208
    L14              233

Table 12: The E-code LABELSECTION for Program Samp3

    Variable   Variable          Variable
    Name       Register Number   Size
    Rows        0                 4
    Cols        1                 4
    Name        2                 4
    Data        3                80
    nFact       4                 4
    Num         5                 4
    Data        6                40
    MultF       7                 4
    J           8                 4
    I           9                 4
               10                 4
               11                 4
               12                 4
               13                 4
               14                 4
               15                 4
    n          16                 4
    Fact       17                 4
               18                 2
               19                 4
               20                 4
               21                 4
               22                 4
               23                 4
               24                 4
               25                 4
               26                 4

Table 13: The E-code VARIABLESECTION for Program Samp3

    String Space
    Index:  0    1  2  3  4  5  6  7    8  9  10  11  12  13  14  15
    Char:   \0   S  a  m  p  l  e  ' '  P  r  o   g   r   a   m   \0

Figure 28: The E-code STRINGSECTION for Program Samp3

    Packet  Start  End   Start  Start  End   End  Scope  Frag  Display
    Number  Addr   Addr  Line   Col    Line  Col  Index  Addr  Packet
     0        0      1    0      0      0    13    0      -1   TRUE
     1        2      2    1      0      1     4    0      -1   TRUE
     2        3      5    2      1      2     9    1      -1   TRUE
     3        6      8    3      1      3     9    2      -1   TRUE
     4        9     11    4      1      4    24    3      -1   TRUE
     5       12     12    5      0      5     3    3      -1   TRUE
     6       13     13    6      1      6    40    3      -1   TRUE
     7       14     14    7      1     10     6    3      -1   TRUE
     8       15     15   11      1     11    29    3      -1   TRUE
     9       16     16   12      0     12     2    3      -1   TRUE
    10       17     17   13      1     13    11    4      -1   TRUE
    11       18     20   14      1     14    18    6      -1   TRUE
    12       21     22   16      1     16    15    0      -1   TRUE
    13       23     23   16     17     16    30    1      -1   TRUE
    14       24     25   17     17     17    31    2      -1   TRUE
    15       26     26   18      3     18     5    2      -1   TRUE
    16       27     28   19      4     19    15    4      -1   TRUE
    17       29     29   20      3     20     7    4      -1   TRUE
    18       30     31   21      4     21    13    4      -1   TRUE
    19       32     46   21      8     21    21    4      -1   TRUE
    20       47     47   21     23     21    24    4      -1   TRUE
    21       48     49   22      5     22    14    4      -1   TRUE
    22       50     64   22      9     22    22    4      -1   TRUE
    23       65     65   22     24     22    25    4      -1   TRUE
    24       66    118   23      6     23    39    4     117   TRUE
    25      119    130   24      4     24    19    4      -1   TRUE
    26      131    138   25      4     25     7    7      -1   TRUE
    27      139    140   27      1     27    13    0      -1   TRUE
    28      141    142   27     15     27    24    1      -1   TRUE
    29      143    143   27     25     27    33    2      -1   TRUE
    30      144    144   28      3     28     7    2      -1   TRUE
    31      145    152   29      4     29    11    2      -1   TRUE
    32      153    153   30      6     30     9    2      -1   TRUE
    33      154    156   30     11     30    19    2      -1   TRUE
    34      157    158   31      6     31     9    2      -1   TRUE
    35      159    165   31     23     31    31    2      -1   TRUE
    36      166    168    -      -      -     -    2      -1   FALSE
    37      169    178   31     11     31    31    2      -1   TRUE
    38      179    186   32      4     32     7    8      -1   TRUE
    39      187    188   34      1     34     5    8      -1   TRUE
    40      189    190   35      2     35    10    8      -1   TRUE
    41      191    207   36      2     36    20    8      -1   TRUE
    42      208    232   37     11     37    27    8      -1   TRUE
    43      233    235    -      -      -     -    8      -1   FALSE
    44      236    237   37      2     37    28    8      -1   TRUE
    45      238    250   38      2     38     5    8      -1   TRUE

Table 14: The E-code PACKETSECTION for Program Samp3

    Entry  Id     Upr  Lwr  Nxt  Offset  Type         Rec  Parent  Child  Var  Proc
           Name   Bnd  Bnd  Idx                       Siz                 Reg  Num

    Scope block describing procedure InitD
     0            -    -    -    -       HEADER       -    17      -      -    -
     1     Data   -    -    -    -       RECORD       -    -       6      6    -
     2     MultF  -    -    -    -       INTEGER      -    -       -      7    -
     3     I      -    -    -    -       INTEGER      -    -       -      9    -
     4     J      -    -    -    -       INTEGER      -    -       -      8    -
     5            -    -    -    -       END          -    -       -      -    -

    Scope block describing record of type DRec
     6            -    -    -    -       HEADER       -    -       -      -    -
     7     A      3    1    10   0       REAL         -    -       -      -    -
     8     B      -    -    -    36      INTEGER      -    -       -      -    -
     9            -    -    -    -       END          -    -       -      -    -

    Scope block describing second index of array of type Matrx
    10            -    -    -    -       HEADER       -    -       -      -    -
    11            3    1    -    -       -            -    -       -      -    -
    12            -    -    -    -       END          -    -       -      -    -

    Scope block describing function Fact
    13            -    -    -    -       HEADER       -    17      -      -    -
    14     n      -    -    -    -       INTEGER      -    -       -      16   -
    15     Fact   -    -    -    -       INTEGER      -    -       -      17   -
    16            -    -    -    -       END          -    -       -      -    -

    Scope block describing program Samp3
    17            -    -    -    -       HEADER       -    27      -      -    -
    18     Rows   -    -    -    -       INTCONST     -    -       -      0    -
    19     Cols   -    -    -    -       INTCONST     -    -       -      1    -
    20     Name   -    -    -    -       STRINGCONST  -    -       -      2    -
    21     Data   2    1    -    -       RECORD       40   -       6      3    -
    22     Num    -    -    -    -       INTEGER      -    -       -      5    -
    23     nFact  -    -    -    -       INTEGER      -    -       -      4    -
    24     InitD  -    -    -    -       PROCEDURE    -    -       0      -    1
    25     Fact   -    -    -    -       FUNCTION     -    -       13     -    2
    26            -    -    -    -       END          -    -       -      -    -

    Bootstrap scope block
    27            -    -    -    -       HEADER       -    -       -      -    -
    28     Samp3  -    -    -    -       PROGRAM      -    -       17     -    0
    29            -    -    -    -       END          -    -       -      -    -

Table 15: The E-code STATSCOPESECTION for Program Samp3

    Pkt  Animation Unit
    Num     Instr  E-code
            Num    Instruction      Comment

     0   Program Samp3;
              0    pushd C28        ; Push program's Static Scope Table
              1    nop              ; index onto dynamic scope stack
     1   CONST
              2    nop
     2   Rows = 3;
              3    inst c,V0        ; Create instance of Rows
              4    push I,C3        ; Store value of Rows in data memory
              5    pop c,I,V0
     3   Cols = 3;
              6    inst c,V1        ; Create instance of Cols
              7    push I,C3        ; Store value of Cols in data memory
              8    pop c,I,V1
     4   Name = 'Sample Program';
              9    inst c,V2        ; Create instance of Name
             10    push I,C1        ; Store Name's string space index in
             11    pop c,C,V2       ; corresponding variable register
     5   TYPE
             12    nop
     6   Matrx = ARRAY [1..Rows,1..Cols] OF REAL;
             13    nop
     7   DRec = RECORD A:Matrx; B:INTEGER; END;
             14    nop
     8   DBase = ARRAY [1..2] OF DRec;
             15    nop
     9   VAR
             16    nop
    10   Data:DBase;
             17    inst c,V3        ; Create instance of Data
    11   Num,nFact:INTEGER;
             18    inst c,V4        ; Create instance of nFact
             19    inst c,V5        ; Create instance of Num
             20    br 0             ; Branch to beginning of main program
    12   Procedure InitD
             21    label 1          ; Enter Procedure InitD
             22    pushd C24        ; Push procedure's Static Scope Table
                                    ; index onto dynamic scope stack
    13   (VAR Data:DRec;
             23    link V6          ; Link Data to actual param
    14   MultF:INTEGER);
             24    inst c,V7        ; Create instance of MultF
             25    pop c,I,V7       ; Put actual param into MultF
    15   VAR
             26    nop
    16   I,J:INTEGER;
             27    inst c,V8        ; Create instance of J
             28    inst c,V9        ; Create instance of I
    17   BEGIN
             29    nop
    18   FOR I := 1
             30    push I,C1        ; Initialize I with value of 1
             31    pop c,I,V9
    19   I := 1 TO Rows
             32    br 3             ; Branch around MAXINT test and
                                    ; increment of I on first pass
                                    ; through the loop
             33    label 2          ; Test label of outer FOR loop
             34    push I,V9
             35    push I,C32767
             36    eql c,I          ; Test that I has not exceeded MAXINT
             37    brt c,4          ; If so, branch out of loop
             38    push I,V9
             39    push I,C1
             40    add c,I          ; Increment I
             41    pop c,I,V9
             42    label 3
             43    push I,V9
             44    push I,C3
             45    gtr c,I          ; Test for I reaching upper loop limit
             46    brt c,4          ; If so, branch out of loop
    20   DO
             47    nop
    21   FOR J := 1
             48    push I,C1        ; Initialize J with value of 1
             49    pop c,I,V8
    22   J := 1 TO Cols
             50    br 6             ; Branch around MAXINT test and
                                    ; increment of J on first pass
                                    ; through the loop
             51    label 5          ; Test label of inner FOR loop
             52    push I,V8
             53    push I,C32767
             54    eql c,I          ; Test that J has not exceeded MAXINT
             55    brt c,7          ; If so, branch out of loop
             56    push I,V8
             57    push I,C1
             58    add c,I          ; Increment J
             59    pop c,I,V8
             60    label 6
             61    push I,V8
             62    push I,C3
             63    gtr c,I          ; Test for J reaching upper loop limit
             64    brt c,7          ; If so, branch out of loop
    23   DO
             65    nop
    24   Data.A[I,J] := I + 101.33 * MultF;
             66    inst c,V10       ; Create instance of temporary variable
             67    push I,V9        ; (V10) and store value of first
             68    pop c,I,V10      ; index (I) in V10
             69    inst c,V11       ; Create instance of temporary variable
             70    push I,V8        ; (V11) and store value of second
             71    pop c,I,V11      ; index (J) in V11
             72    inst c,V12       ; Create instance of temporary variable
             73    push I,C0        ; (V12) and calculate the final (linear)
             74    pop c,I,V12      ; array index value based on the values
             75    push I,V10       ; of the two indices, I and J
             76    push I,C3
             77    mult c,I
             78    push I,V11
             79    add c,I
             80    pop c,I,V10
             81    push I,V10
             82    push I,C4
             83    sub c,I
             84    pop c,I,V10
             85    push I,V10
             86    push I,C4
             87    mult c,I
             88    push I,V12
             89    add c,I
             90    pop c,I,V12      ; Store calculated value of final index in V12
             91    push I,V12       ; Convert index value in V12 to
             92    push I,C0        ; offset value
             93    add c,I
             94    pop c,I,V12
             95    inst c,V13       ; Create instance of temporary variable
             96    push R,C101.33   ; (V13) to hold result of 101.33 * MultF
             97    push I,V7
             98    cast c,I,R       ; Cast MultF to REAL
             99    mult c,R         ; 101.33 * MultF
            100    pop c,R,V13      ; Store multiplication result in V13
            101    inst c,V14       ; Create instance of temporary variable
            102    push I,V9        ; (V14) to hold result I + V13
            103    cast c,I,R       ; and cast I to REAL
            104    push R,V13
            105    add c,R          ; I + V13
            106    pop c,R,V14      ; Store addition result in V14
            107    push R,V14
            108    push I,V12
            109    popir c          ; Put offset value in index reg
            110    pop c,R,V6{IR}   ; Put V14's value in Data.A[I,J]
            111    uninst c,V14     ; Delete instances of temporary
            112    uninst c,V13     ; variables created within the
            113    uninst c,V12     ; inner FOR loop
            114    uninst c,V11
            115    uninst c,V10
            116    br 5             ; Branch to test of inner FOR loop
            117    label 7          ; Branch out label of inner FOR loop
            118    br 2             ; Branch to test of outer FOR loop
    25   Data.B := MultF;
            119    label 4          ; Branch out label of outer FOR loop
            120    inst c,V15       ; Create instance of a temporary variable
            121    push I,C0        ; (V15) to hold offset of field B
            122    pop c,I,V15
            123    push I,V15
            124    push I,C36
            125    add c,I          ; Calculate offset of field B
            126    pop c,I,V15      ; Store offset of field B in V15
            127    push I,V7        ; Put MultF on evaluation stack
            128    push I,V15       ; Put offset of field B on eval stack
            129    popir c          ; Put offset of field B in index reg
            130    pop c,I,V6{IR}   ; Put MultF in Data.B
    26   END;
            131    nop
            132    uninst c,V15     ; Delete instance of temporary variable
            133    uninst c,V8      ; Delete instance of J
            134    uninst c,V9      ; Delete instance of I
            135    uninst c,V7      ; Delete instance of MultF
            136    unlink c,V6      ; Unlink Data
            137    popd             ; Pop procedure's Static Scope Table
                                    ; from the dynamic scope stack
            138    return           ; Return to calling scope
    27   Function Fact
            139    label L8         ; Enter Function Fact
            140    pushd C25        ; Push function's Static Scope Table
                                    ; index onto dynamic scope stack
    28   (n:INTEGER)
            141    inst c,V16       ; Create instance of n
            142    pop c,I,V16      ; Put actual param into n
    29   :INTEGER;
            143    inst c,V17       ; Create instance of Fact
                                    ; (function's return value)
    30   BEGIN
            144    nop
    31   IF n = 0
            145    label L9
            146    inst c,V18       ; Create instance of temporary
            147    push I,V16       ; variable (V18) to hold comparison
            148    push I,C0        ; result
            149    eql c,I          ; Check for n = 0
            150    pop c,B,V18      ; Put comparison result in V18
            151    push B,V18
            152    brf c,10         ; If n not = 0, branch to ELSE
    32   THEN
            153    nop
    33   Fact := 1
            154    push I,C1        ; Put 1 in Fact
            155    pop c,I,V17
            156    br 11            ; Branch around ELSE
    34   ELSE
            157    label 10         ; ELSE label
            158    nop
    35   Fact(n-1)
            159    inst c,V19       ; Create instance of temporary
            160    push I,V16       ; variable (V19) to hold n-1
            161    push I,C1
            162    sub c,I          ; Subtract 1 from n
            163    pop c,I,V19      ; Put n-1 in V19
            164    push I,V19       ; Push n-1 onto evaluation stack
            165    call 8           ; Call Fact
    36   (none)
            166    label 12         ; Return from Fact
            167    inst c,V20       ; Create instance of temporary
            168    pop c,I,V20      ; variable (V20) to hold function value
    37   Fact := n * Fact(n-1)
            169    inst c,V21       ; Create instance of temporary
            170    push I,V16       ; variable (V21) to hold n * Fact(n-1)
            171    push I,V20
            172    mult c,I
            173    pop c,I,V21      ; Put multiplication result in V21
            174    push I,V21
            175    pop c,I,V17      ; Put function value in Fact
            176    uninst c,V21     ; Delete instances of temporary
            177    uninst c,V20     ; variables created in ELSE clause
            178    uninst c,V19
    38   END;
            179    label 11         ; Branch out label for ELSE
            180    nop
            181    push I,V17       ; Put function value on eval stack
            182    uninst c,V18     ; Delete instance of temp variable
            183    uninst c,V17     ; Delete instance of Fact's result var
            184    uninst c,V16     ; Delete instance of n
            185    popd             ; Pop function's Static Scope Table
                                    ; index from the dynamic scope stack
            186    return           ; Return to calling scope
    39   BEGIN
            187    label 0          ; Start label for main program
            188    nop
    40   Num := 2;
            189    push I,C2        ; Put value of 2 in Num
            190    pop c,I,V5
    41   InitD(Data[Num],3);
            191    inst c,V22       ; Create instance of temporary
            192    push I,V5        ; variable (V22) and store value
            193    pop c,I,V22      ; of index (Num) in V22
            194    push I,V22       ; Calculate final (linear) array
            195    push I,C1        ; index
            196    sub c,I
            197    pop c,I,V22      ; Put final array index in V22
            198    push I,C3        ; Put 3 on evaluation stack
            199    inst c,V23       ; Create instance of temporary variable
            200    push I,V22       ; (V23) to hold offset of Data
            201    push I,C40       ; Calculate Data's offset
            202    mult c,I         ; and put it in V23
            203    pop c,I,V23
            204    push I,V23
            205    popir c          ; Put Data's offset in index reg
            206    pusha V3{IR}     ; Put address of Data[Num] on eval stack
            207    call 1           ; Call InitD
    42   Fact(Data[Num].B)
            208    label 13         ; Return from InitD
            209    inst c,V24       ; Create instance of temporary
            210    push I,V5        ; variable (V24) and store value
            211    pop c,I,V24      ; of index (Num) in V24
            212    inst c,V25       ; Create instance of temporary
            213    push I,C0        ; variable to hold calculated
            214    pop c,I,V25      ; offset of Data[Num].B
            215    push I,V24
            216    push I,C1
            217    sub c,I
            218    pop c,I,V24
            219    push I,V24
            220    push I,C40
            221    mult c,I
            222    push I,V25
            223    add c,I
            224    pop c,I,V25
            225    push I,V25
            226    push I,C36
            227    add c,I
            228    pop c,I,V25
            229    push I,V25       ; Put offset of Data[Num].B
            230    popir c          ; in index reg
            231    push I,V3{IR}    ; Put Data[Num].B on eval stack
            232    call 8           ; Call Fact
    43   (none)
            233    label 14         ; Return from Fact
            234    inst c,V26       ; Create instance of temp variable
            235    pop c,I,V26      ; (V26) to hold function value
    44   nFact := Fact(Data[Num].B);
            236    push I,V26
            237    pop c,I,V4       ; Put value of Fact in nFact
    45   END.
            238    nop
            239    uninst c,V26     ; Delete instances of temporary
            240    uninst c,V25     ; variables
            241    uninst c,V24
            242    uninst c,V23
            243    uninst c,V22
            244    uninst c,V4      ; Delete instance of nFact
            245    uninst c,V5      ; Delete instance of Num
            246    uninst c,V3      ; Delete instance of Data
            247    uninst c,V2      ; Delete instance of Name
            248    uninst c,V1      ; Delete instance of Cols
            249    uninst c,V0      ; Delete instance of Rows
            250    popd             ; Pop program's Static Scope Table
                                    ; index from the dynamic scope stack

Figure 29: The E-code CODESECTION for Program Samp3

    Source program window               Data window

    Program Samp3;                      Program Samp3
    CONST                               Rows = 3
     Rows = 3;                          Cols = 3
     Cols = 3;                          Name = 'Sample Program'
     Name = 'Sample Program';
    TYPE
     Matrx = ARRAY [1..Rows,1..Cols] OF REAL;
     DRec = RECORD
         A:Matrx;
         B:INTEGER;
       END; { DRec }
     DBase = ARRAY [1..2] OF DRec;
    VAR
     Data:DBase;
     Num,nFact:INTEGER;

    (Highlighted animation unit: TYPE)

Figure 30: Animation Display After Constant Declarations in Program Samp3

    Source program window               Data window

    Program Samp3;                      Program Samp3
    CONST                               Rows = 3
     Rows = 3;                          Cols = 3
     Cols = 3;                          Name = 'Sample Program'
     Name = 'Sample Program';           Data[1].A
    TYPE                                 undef  undef  undef
     Matrx = ARRAY [1..Rows,1..Cols] OF REAL;
                                         undef  undef  undef
     DRec = RECORD                       undef  undef  undef
         A:Matrx;                       Data[1].B is undefined
         B:INTEGER;                     Data[2].A
       END; { DRec }                     undef  undef  undef
     DBase = ARRAY [1..2] OF DRec;       undef  undef  undef
    VAR                                  undef  undef  undef
     Data:DBase;                        Data[2].B is undefined
     Num,nFact:INTEGER;                 Num = 2
     . . .                              nFact is undefined
     BEGIN { Program Samp3 }
      Num := 2;
      InitD(Data[Num],3);

    (Highlighted animation unit: InitD(Data[Num],3);)

Figure 31: Animation Display Before Calling Procedure InitD in Program Samp3

    Source program window               Data window

     . . .                              Program Samp3
     Procedure InitD(VAR Data:DRec;     Rows = 3
                     MultF:INTEGER);    Cols = 3
       VAR                              Name = 'Sample Program'
        I,J:INTEGER;                    Data[1].A
       BEGIN { Procedure InitD }         undef  undef  undef
        FOR I := 1 TO Rows DO            undef  undef  undef
         FOR J := 1 TO Cols DO           undef  undef  undef
          Data.A[I,J] := I + 101.33 * MultF;
                                        Data[1].B is undefined
        Data.B := MultF;                Data[2].A
        END; { Procedure InitD }         304.99  304.99  304.99
     . . .                               305.99  305.99  305.99
     BEGIN { Program Samp3 }             306.99  306.99  306.99
      Num := 2;                         Data[2].B = 3
      InitD(Data[Num],3);               Num = 2
      nFact := Fact(Data[Num].B);       nFact is undefined
      END. { Program Samp3 }            Procedure InitD
                                        Data.A
                                         304.99  304.99  304.99
                                         305.99  305.99  305.99
                                         306.99  306.99  306.99
                                        Data.B = 3
                                        MultF = 3
                                        I = 4
                                        J = 4

    (Highlighted animation unit: END; in procedure InitD)

Figure 32: Animation Display at End of Procedure InitD in Program Samp3
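
As a cross-check on the listing in figure 29, the address arithmetic carried out by instruction numbers 66-94 (for Data.A[I,J]) and 191-203 (for Data[Num]) can be restated in a few lines. This is a minimal sketch in Python based on one reading of the listing, not the compiler's code: the function and constant names are hypothetical, the constant 4 subtracted at instruction 82 is assumed to be the linearized lower bounds (1 * Cols + 1), and the constant 4 multiplied at instruction 86 is assumed to be the size of a REAL as given in table 13.

```python
ROWS, COLS = 3, 3      # constants declared in program Samp3
REAL_SIZE = 4          # assumed element size, per the sizes in table 13
OFFSET_A = 0           # offset of field A in DRec (table 15, entry 7)
OFFSET_B = 36          # offset of field B in DRec (table 15, entry 8)
DREC_SIZE = 40         # record size of DRec (table 15, entry 21)


def offset_of_a(i: int, j: int) -> int:
    """Offset of Data.A[i,j] within a DRec, following instructions 75-94."""
    linear = i * COLS + j              # instructions 75-80
    linear -= 1 * COLS + 1             # instructions 81-84 (subtract C4)
    return linear * REAL_SIZE + OFFSET_A   # instructions 85-94


def offset_of_data_element(num: int) -> int:
    """Offset of Data[num] within Data, following instructions 194-203."""
    return (num - 1) * DREC_SIZE


def assigned_value(i: int, mult_f: int) -> float:
    """Value stored by Data.A[I,J] := I + 101.33 * MultF."""
    return i + 101.33 * mult_f
```

Under these assumptions, A[1,1] lies at offset 0 and A[3,3] at offset 32, which leaves field B at offset 36 as table 15 records, and Data[2].B lies at offset 40 + 36 = 76 within Data. With MultF = 3, the assigned values are approximately 304.99, 305.99, and 306.99 for I = 1, 2, and 3, matching the rows displayed in figure 32.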