% Copyright (C) 1989, Digital Equipment Corporation % All rights reserved. % See the file COPYRIGHT for a full description. % % Last modified on Mon Dec 14 14:00:26 PST 1992 by kalsow % modified on Tue Feb 18 13:23:15 PST 1992 by muller \chapter{Internals} This section contains a brief introduction to the internal structure of the compiler and runtime system. This introduction is neither comprehensive nor tutorial; it is merely intended as a stepping stone for the courageous. \section{A tour of the compiler} %=============================== \index{intermediate code} \index{C intermediate code@{\tt C} intermediate code} The compiler has undergone much evolution. It started as a project to build a simple and easy to maintain compiler. Somewhere along the way we decided to compile \modula. Much later we decided to generate C. In hindsight, \modula was a good choice, C was at best mediocre. \index{compiler!organization} The initial observation was that most compilers' data structures were visible and complex. This situation makes it necessary to understand a compiler in its entirety before attempting non-trivial enhancements or bug fixes. By keeping most of the compiler's primary data structures hidden behind opaque interfaces, we hoped to avoid this pitfall. So far, bugs have been easy to find. During early development, it was relatively easy to track the weekly language changes. \index{objects, within compiler} The compiler is decomposed by language feature rather than the more traditional compiler passes. We attempted to confine each language feature to a single module. For example, the parsing, name binding, type checking and code production for each statement is in its own module. This separation means that only the \verb|CaseStmt| module needs to know what data structures exist to implement \verb|CASE| statements. Other parts of the compiler need only know that the \verb|CASE| statement is a statement. This fact is captured by the object subtype hierarchy. A \verb|CaseStmt.T| is a subtype of a \verb|Stmt.T|. The main object types within the compiler are: values, statements, expressions, and types. ``Values'' is a misnomer; ``bindings'' would be better. This object class include anything that can be named: modules, procedures, variables, exceptions, constants, types, enumeration elements, record fields, methods, and procedure formals. Statements include all of the \modula statements. Expressions include all the \modula expression forms that have a special syntax. And finally, types include the \modula types. The compiler retains the traditional separation of input streams, scanner, symbol table, and output stream. The compilation process retains the usual phases. Symbols are scanned as needed by the parser. A recursive descent parser reads the entire source and builds the internal syntax tree. All remaining passes simply add decorations to this tree. The next phase binds all identifiers to values in scopes. \modula allows arbitrary forward references so it is necessary to accumulate all names within a scope before binding any identifiers to values. The next phase divides the types into structurally equivalent classes. This phase actually occurs in two steps. First, the types are divided into classes such that each class will have a unique C representation. Then, those classes are refined into what \modula defines as structurally equivalent types. After the types have been partitioned, the entire tree is checked for type errors. Finally, the C code is emitted. C's requirement that declarations precede uses means that the code is generated in several passes. First, the types are generated during type checking. Then, the procedure headers are produced. And finally, the procedure bodies are generated. The compiler implementation is in the \verb|compiler| directory. Within that directory the following directories exist: \begin{quote} \begin{tabular}{ll} builtinOps & \verb|ABS|, \verb|ADR|, \verb|BITSIZE|, ... \\ builtinTypes & \verb|INTEGER|, \verb|CHAR|, \verb|REFANY|, ... \\ builtinWord & \verb|Word.And|, \verb|Word.Or|, ... \\ exprs & \verb|+|, \verb|-|, \verb|[]|, \verb|^|, \verb|AND|, \verb|OR|, ... \\ misc & main program, scanner, symbol tables, ... \\ stmts & \verb|:=|, \verb|IF|, \verb|TRY|, \verb|WHILE|, ... \\ types & \verb|ARRAY|, \verb|BITS FOR|, \verb|RECORD|, ... \\ values & \verb|MODULE|, \verb|PROCEDURE|, \verb|VAR|, ... \\ \end{tabular} \end{quote} \section{A tour of the runtime} %============================== The runtime itself implements the garbage collector, \modula startup code and a few miscellaneous functions. The runtime exists in the \verb|libm3/runtime| directory. \index{M3Runtime.h@{\tt M3Runtime.h}} \index{M3Machine.h@{\tt M3Machine.h}} The interface between the compiler and runtime system is embodied (and very sparsely documented) in \verb|M3Runtime.h|, \verb|M3Machine.h| (an architecture-dependent file) and \verb|M3RuntimeDecls.h|. Every C file generated by the compiler includes these files. \index{garbage collection!copying} \index{garbage collection} The allocator and garbage collector are based on Joel Bartlett's ``mostly copying collector''. The best description of his collector is in \cite{m3-Bar88}. Since that paper, we've made a few modifications to support a growing heap and to use extra information that the \modula compiler generates. \index{exceptions!setjmp, longjmp@{\tt setjmp}, {\tt longjmp}} \index{setjmp@{\tt setjmp}} \index{longjmp@{\tt longjmp}} Exceptions are implemented with \verb|setjmp| and \verb|longjmp|. The jump buffers and scope descriptors are chained together to form a stack. The head of the chain is kept in \verb|ThreadSupport.handlers|. There is a distinct chain for each thread. When an exception is raised, the chain is searched. If a handler for the exception is found, the exception is allowed to unwind the stack, otherwise a runtime error is signaled. To unwind the stack, a \verb|longjmp| is done to the first handler on the stack. It does whatever cleanup is necessary and passes control on up the stack to the next handler until the exception is actually handled. \index{typecells} Reference types are represented at runtime by a ``typecell''. Due to separate compilation, opaque types and revelations, it is not possible to fully initialize typecells at compile time. Typecell initialization is finished at link time. A typecell contains a type's typecode, a pointer to its parent typecell, the size of the types referent and method list if any, the type's brand, the number of open array dimensions, the type's fingerprint, and procedures to initialize the typecell, initialize new instances of the type, print instances of the type and trace the type for garbage collection. \index{linking} \index{typecodes} \index{type brands} Link time type elaboration occurs in several steps. First, all types are registered. That is, a global array that points to all typecells is built. Next, the runtime verifies that all opaque types have been given concrete representations. Then, the initialization of typecells is finished. Then, all types with the same brand and fingerprint are identified with the same typecode. Finally, a check is made to ensure that distinct types have distinct brands. \index{initialization} \index{module initialization} At the beginning of the execution of the program, all global variables are initialized, and the main bodies of the modules are invoked. The skeletal code that ensures that every module is initialized is generated by the linker part of the driver. Other parts of the runtime, such as threads, are actually implemented in the base library. \index{threads} \index{threads!preemption} \index{atomicity} Thread switching is implemented with \verb|setjmp| and \verb|longjmp|. A timer interrupt (signal) is used to cause preemptive thread switching. The global variable \verb|ThreadSupport.self| points to the currently running thread. The integer \verb|ThreadSupport.inCritical| is used by the runtime to prevent thread switching during garbage collection and other ``atomic'' runtime operations. \section{Porting to another machine} %=================================== \index{porting} \index{machine description} Anyone who is interested in porting this compiler is encouraged! We would like to know how it goes. The primary concerns when doing a port will be the size and alignment constraints of the target machine and the runtime. We tried to avoid suspicious C constructs, but we doubt that we were completely successful. The directions in this section are somewhat sparse. We tried to make the installation of \srcmodula smooth, but it is another story to make the development of ports smooth. Please bear with us and tell us what we can do to improve this section. If you want to a port to an unsupported system you should: \begin{itemize} \item get the {\tt compiler} and {\tt driver} source archives (in addition to {\tt m3make} that came with the boot archive and {\tt libm3} which you had to install anyway). \item decide on the name of the new architecture; in the rest, we assume that it is {\it new} \item describe the target machine for the compiler \item implement the machine-specific part of the base libraries for the new machine \item build a cross-compiler on a supported machine \item cross-compile (to C) the driver and the compiler \item finish the compilation of the driver and the compiler on the target machine \end{itemize} In the following, all the paths are relative to the directory in which you unpacked the archives (also known as the top-level directory). \paragraph{Describing the target machine} \index{Target.i3@{\tt Target.i3}} The compiler has a small number of parameters that are used to describe the target machine. These parameters are expressed in the interface {\tt compiler/src/{\it new}/Target.i3}. Create the directory {\tt compiler/src/{\it new}} and build the file {\tt Target.i3}, using the descriptions for the other machines as models. In {\tt compiler/src/m3makefile}, add the lines: \begin{verse} \tt \#if defined (TARGET\_{\it new}) \\ source\_dir (../src/{\it new}) \\ \#endif \\ \end{verse} \paragraph{Porting the runtime and base libraries} Some of the Modula-3 code (as well as very little pieces of C) are machine-dependent. Of course, it may be that some code we thought to be machine-independent will turn to have to be changed for your {\it new} architecture, so we cannot guarantee that the list below is exhaustive. In general, look at what is done for the other machines, and find the most similar as a starting point. In {\tt libm3/Csupport/src}, add a directory {\it new} and put in it the files: \begin{itemize} \item {\tt m3makefile} to describe the contents of the directory \item {\tt M3Machine.h} which is included in every C file generated by the \srcmodula compiler \item {\tt dtoa.c} to configure {\tt ../generic/dtoa.h}; look at that file for the things to configure. \item {\tt float.h} if your system does not have one. You can build it using a program called {\tt enquire}, which can be found on the net. % in the directory {\tt compiler/src/enquire} \end{itemize} In {\tt libm3/C/src}, add a directory {\it new} and put in it the files: \begin{itemize} \item {\tt m3makefile} to describe the contents of the directory \item {\tt Csetjmp.i3} to describe the interface to {\tt setjmp}, {\tt longjmp}, {\tt \_setjmp} and {\tt \_longjmp}. Be careful to get the size of the \verb|jmp_buf| right. \item {\tt Cstdio.i3}, essentially a \modula translation of {\tt stdio.h} \end{itemize} In {\tt libm3/thread/src}, add a directory {\it new} and put in it the files: \begin{itemize} \item {\tt m3makefile} to describe the contents of the directory \item {\tt WildJmp.i3}: this interface provides functions that are similar to {\tt \_setjmp} and {\tt \_longjmp}, but that do not impose any restriction on what is possible. The {\tt DS3100} version is the simplest, because on that architecture, {\tt \_setjmp} and {\tt \_longjmp} are just fine. The {\tt VAX} version is the most complex, we had to rewrite our own versions because {\tt longjmp} requires that the stack be popped (remember the {\tt longjmp botch} message ?). \end{itemize} \index{stack scanning} \index{conservative collection} \index{threads!stacks} In {\tt libm3/runtime/src}, you can either reuse one of the {\tt StackInc-{\it n}} component, or you may have to create a new one (i.e., you need a different value of {\it n}). The dependency described there is for the benefit of the garbage collector. At the beginning of a collection, the collector must find all the roots, that is, all the heap objects that are referenced from outside the heap itself. The stacks contain such pointers, and the collector scans them to find roots. However, the collector does not know the full structure of the stacks (frames, argument lists and so on); rather, it just looks at the values that are there and make conservative decisions by interpreting these values as possible pointers. For each stack, the collector initializes a pointer $P$ to the bottom of the stack; then it repeatedly tries to interpret the bits pointed by $P$ as a pointer in the heap, marks the root if the interpretation is successfull and advances $P$. The question is by how much $P$ should be advanced; if all entries in the stack are aligned at $n$-bytes boundaries, it is sufficient to increment $P$ by $n$ bytes; a smaller value would be an overkill. We have found that some machines require $n$ to be 2, and that 4 is enough for others. \index{floating-point} \index{IEEE floating-point} In {\tt libm3/float/src}, add a directory {\it new} and copy the files that are in {\tt MODEL} in that directory. The routines in these modules provide access the floating point control (to set exceptions and so on). The version in {\tt MODEL} is a template, and most of the routines will fail (because of an \verb|<*ASSERT FALSE*>|) if executed. The {\tt DS3100} and {\tt SPARC} directories are examples of implementations for IEEE machines, the {\tt VAX} directory is an example for non-IEEE machines. It is not essential that you implement the proper procedures right now: the versions in {\tt MODEL} are good enough for the compiler, the driver and simple test programs. But you will have to take care of that at some point. \index{random numbers} In {\tt libm3/random/src}, you can either reuse one of the directories {\tt VAX}, {\tt IEEE-le} or {\tt IEEE-be}, or create your own on those models. The goal is to describe enough of the floating point representation for the random number generator. There is probably some overlap with the {\tt libm3/float} stuff, we will take care of that at some point. \index{Unix} \index{OS interfaces} \index{system interfaces} In {\tt libm3/unix/src}, you will find a bunch of interfaces to the procedures of sections 2 and 3 of U**X. Not everything is there, but we sometime dream to have a complete set; in other words, it's quite a bit of work to make sure that you have the proper descriptions, and of course, there is nothing from which these interfaces could be mechanically derived. Fortunately, the driver and the compiler rely on very few of these procedures, and any version is probably good enough for your machine. We suggest that you do the work only when you find some problems (at least, wait until you get a basic port running). The last thing is to reflect all these changes in {\tt libm3/src/m3makefile}. Add a bunch of lines: \begin{verse} \tt \#if defined (TARGET\_{\it new}) \\ ... \\ \#endif \\ \end{verse} similar to those that are already there. You need to make a similar modification to {\tt compiler/src/m3makefile} and {\tt driver/src/m3makefile} (sorry for the duplication, but it is difficult to avoid). \paragraph{Creating a cross-compiler} At the top level, type to the shell: \begin{verse} \tt \$ m3make cross NEW={\it new} \\ \end{verse} After a while, you should get an executable {\tt compiler/cross-{\it new}/m3compiler}. \paragraph{Cross-compiling the driver and the compiler} At the top level, type to the shell: \begin{verse} \tt \$ m3make bootstrap\_driver NEW={\it new} \\ \$ m3make bootstrap\_compiler NEW={\it new} \\ \end{verse} At this point, you should in the same state as if we had built a boot archive for {\it new} and you had grabbed it. If you cannot mount the file system that contains all the files on the {new} machine, create a boot archive: \begin{verse} \tt \$ m3make pack NEW={\it new} \\ \end{verse} This creates a file {\tt boot\_files/boot.{\it new}-{\it version.tar.Z}}, which you need to unpack on the {\it new} machine. You can then proceed as for the first installation of \srcmodula (see the top level {\tt README}). Good Luck!