Bill Kalsow
Last modified 25 March 2021
The first version of the M3 compiler was written in Modula-2+ [*** footnote Modula-2+]. It had the full M2+ runtime available, but only needed a few things: access to command line arguments, opening and closing files, and reading and writing bytes. There was no need for threads, garbage collection, or exception handling. It parsed the input with a hand-written recursive descent parser and built an abstract syntax tree (AST). Then it ran a type-checking pass over the tree. Finally, it emitted code. This first version emitted C source code, which was compiled and linked by the native compilers.
Since the compiler emitted C, the details of function calls, parameters, and return values were all handled by the C compiler. This compatibility made it easy to interface with existing C libraries and OS support. While not a part of the language, the "EXTERNAL" tag was added very early to make this access easier.
The compiler and its underlying runtime evolved in parallel. John DeTreville wrote the first trace & sweep garbage collector. [*** footnote GC] He continued improving that collector for several years. I believe the initial collector was written in Modula-3. The compiler supplied "maps" that identified references in all heap allocated structures. Thread stacks were scanned conservatively: every 4-byte aligned word was treated as a potential heap reference.
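To make the conservative scan concrete, here is a minimal sketch in Python rather than the collector's actual code. The function name, the little-endian word layout, the heap bounds, and the sample stack contents are all assumptions made for illustration. It shows only the core idea: any aligned 4-byte word whose value falls inside the heap is treated as a potential reference.

```python
import struct

WORD_SIZE = 4  # bytes; matches the 4-byte alignment described above


def conservative_roots(stack_bytes, heap_start, heap_end):
    """Return every aligned word in stack_bytes whose value lies in the heap.

    Hypothetical helper: a real collector would also check that the value
    refers to an allocated object, not merely to an address in the heap range.
    """
    roots = []
    usable = len(stack_bytes) - len(stack_bytes) % WORD_SIZE
    for offset in range(0, usable, WORD_SIZE):
        # Interpret the word as a little-endian unsigned 32-bit integer
        # (the byte order is an assumption of this sketch).
        (word,) = struct.unpack_from("<I", stack_bytes, offset)
        if heap_start <= word < heap_end:
            roots.append(word)
    return roots


# A made-up 16-byte "stack": two heap addresses, one small integer,
# and one value that lies outside the heap.
stack = struct.pack("<IIII", 0x10000040, 7, 0x10000200, 0x20000000)
found = conservative_roots(stack, heap_start=0x10000000, heap_end=0x10001000)
print([hex(w) for w in found])
```

The scan cannot tell a real reference from an integer that happens to look like one, so a conservative collector may keep some garbage alive, but it never frees an object that a stack word might still reach.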
Control was passed from thread to thread via the C library's setjmp/longjmp routines. Periodic timer interrupts were used to trigger thread switching. The initial garbage collector was never interrupted: if the interrupt routine detected that the collector was running, it tossed the interrupt. Later versions of the collector ran as multiple shorter passes, so thread switching saw more regular, but shorter, interruptions.
Exception handling was also done using the setjmp/longjmp routines. Initially the compiler generated code to push and pop handler frames on the running thread's stack. Later versions of the compiler generated tables of C labels that identified the exception handling scopes. A tiny bit of platform-dependent code was needed to recover the current PC and walk the thread stack to determine which exception scopes were active. Optimizing C compilers that moved code around could break this scheme by moving code from inside to outside a handler scope, though I don't know that we ever identified a case where that actually happened.
Andrew Birrell's mail client (Postcard?) threw exceptions so frequently that it exposed a bug in the Ultrix library version of longjmp. The routine restored the SP before it had finished restoring all the other registers, leaving the longjmp argument exposed above the top of the stack, where it could be mutated by other interrupt-driven code.
I used the Vulcan [*** footnote Vulcan] programming environment's AST builder to translate the M3 compiler's M2+ code into Modula-3. It did an amazingly good job. At least 90% of the translation work was done automatically. All that was left was to go through and tidy up.
The M2+ version of the compiler did not have objects available as a programming construct, but I wrote the compiler with that in mind. Converting the translated code from pseudo-objects into the objects supported by Modula-3 was tedious but straightforward.
At this point the compiler for Modula-3 was written in Modula-3. It's a precarious situation. Bugs in the compiler could break the compiler. But, we survived.
The next major iteration of the compiler switched from generating C code to generating an internal intermediate language. The language was defined as a set of method calls on a Modula-3 code generator object. This interface provided a clean way to port the compiler to various platforms. One back end generated native Windows NT code. Another generated a text version of the IL. The back end that saw the most use read that textual intermediate language and called the GCC code generation libraries. This back end was a relatively small program written in C. Eric Muller was its primary developer. Because the GPL "virus" took effect when you linked with GPL-protected code, only this tiny back end code was released under the GPL. The vast majority of the system (the compiler, the runtime, the libraries, and the applications) was released under DEC's license. It made DEC lawyers happy, and I suspect Free Software Foundation supporters were grumpy.
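The idea of an intermediate language defined as method calls on a code generator object can be sketched roughly as follows. This is an illustration in Python, not the actual Modula-3 interface. The class names and the three operations (`load_integer`, `add`, `store`) are hypothetical, chosen only to show how a front end can drive interchangeable back ends, one of which writes a textual form of the IL.

```python
class CodeGenerator:
    """Hypothetical IL interface: one method per intermediate-language operation."""

    def load_integer(self, value):
        raise NotImplementedError

    def add(self):
        raise NotImplementedError

    def store(self, name):
        raise NotImplementedError


class TextBackEnd(CodeGenerator):
    """Back end that records each IL call as one line of text."""

    def __init__(self):
        self.lines = []

    def load_integer(self, value):
        self.lines.append(f"load_integer {value}")

    def add(self):
        self.lines.append("add")

    def store(self, name):
        self.lines.append(f"store {name}")


def compile_sum(cg, target, left, right):
    """Stand-in for the front end: emit IL for `target := left + right`."""
    cg.load_integer(left)
    cg.load_integer(right)
    cg.add()
    cg.store(target)


backend = TextBackEnd()
compile_sum(backend, "x", 2, 3)
print("\n".join(backend.lines))
```

Because the front end only ever calls methods on `cg`, swapping in a different subclass is enough to retarget it, which is the portability property the interface was meant to provide.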
Debugging Modula-3 programs from the early C-generating compilers was not for the faint of heart. It was possible to include Modula-3 source line number information in the generated code, but the information describing data structures was almost non-existent. The compiler tried to generate C function and structure names that were as close as possible to the original Modula-3 names, but it wasn't a 100% success. Later versions of the compiler that generated the intermediate code also generated better debugging information.
Another thing that was happening during the compiler development was Fingerprints [*** footnote Fingerprints]. I did the initial implementation (in C?) for Andrei. The compiler then used that code to fingerprint types. Modula-3 was defined to use structural type equivalence. Generating a fingerprint for each type as it was defined meant that the compiler could test type equivalence with a 64-bit comparison and avoid zillions of recursive walks over type descriptors.
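As an illustration of why fingerprints make structural equivalence cheap, the sketch below computes a 64-bit value for a type from the fingerprints of its components. It is not the compiler's code: the tuple encoding of types is hypothetical, recursive types are ignored, and a standard hash from Python's `hashlib` is swapped in for the actual fingerprint algorithm, which is not described here.

```python
import hashlib


def fingerprint(type_desc):
    """Return a 64-bit fingerprint of a structural type description.

    type_desc is a hypothetical encoding: a string for an atom (a base type
    or a field name), or a tuple (constructor, component, ...) for a
    composite type. Recursive types are not handled in this sketch.
    """
    if isinstance(type_desc, str):
        payload = b"atom:" + type_desc.encode()
    else:
        constructor, *components = type_desc
        # Combine the constructor name with the fingerprints of the
        # components rather than with their full descriptions.
        payload = b"ctor:" + constructor.encode()
        for component in components:
            payload += fingerprint(component).to_bytes(8, "big")
    digest = hashlib.blake2b(payload, digest_size=8).digest()
    return int.from_bytes(digest, "big")


# Two separately written but structurally identical record types,
# and a third that differs in one field type.
point_a = ("RECORD", ("FIELD", "x", "INTEGER"), ("FIELD", "y", "INTEGER"))
point_b = ("RECORD", ("FIELD", "x", "INTEGER"), ("FIELD", "y", "INTEGER"))
other = ("RECORD", ("FIELD", "x", "INTEGER"), ("FIELD", "y", "REAL"))

print(fingerprint(point_a) == fingerprint(point_b))
print(fingerprint(point_a) == fingerprint(other))
```

Once each type carries its fingerprint, an equivalence test is a single integer comparison, at the cost of a very small probability that two different types collide.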
I seem to recall that the Network Objects system [*** footnote Network Objects] also used compiler-generated fingerprints. The fingerprints were included in the runtime type descriptors used by the garbage collector. I'd guess the Network Objects system grabbed them from the same runtime tables. (The earlier RPC system used a stub generator with a few built-in types that it supported.) (*** Wasn't this the flume stub generator for M2+ RPC?)
"m3make" also used compiler-generated fingerprints to determine when code needed to be recompiled. If the set of symbols defined by an interface and their fingerprints matched those previously imported (and used!) by the compiler, recompilation of the importer was unnecessary. The result was that you could edit an interface and trigger minimal recompilation. File dates didn't matter. Additions to interfaces were usually non-events. With today's processors very few people seem to care about recompilation time (or performance in general 🙁).
It sure was an amazing ecosystem....
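The recompilation rule that m3make applied can be sketched as a comparison of fingerprint tables. This is a hedged illustration in Python, not m3make's implementation. The function name, the dictionary representation, and the symbol names and fingerprint values are all made up for the example.

```python
def needs_recompile(used_at_last_compile, current_interface):
    """Decide whether an importer must be recompiled.

    used_at_last_compile: {symbol: fingerprint} for the symbols the importer
        actually used when it was last compiled.
    current_interface: {symbol: fingerprint} for everything the interface
        exports now.
    """
    for symbol, old_fp in used_at_last_compile.items():
        if current_interface.get(symbol) != old_fp:
            return True  # a used symbol changed or disappeared
    return False  # additions and changes to unused symbols are ignored


recorded = {"Open": 0x1111, "Close": 0x2222}

# Adding a new symbol does not disturb the importer.
after_addition = {"Open": 0x1111, "Close": 0x2222, "Flush": 0x3333}
# Changing the type of a used symbol does.
after_change = {"Open": 0x9999, "Close": 0x2222}

print(needs_recompile(recorded, after_addition))
print(needs_recompile(recorded, after_change))
```

Only the symbols the importer used are consulted, which is why additions to an interface, or a fresh file date with unchanged content, would not force any rebuilding under this scheme.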
My recollection of the timeline is fuzzy. I do remember that the first public release of the system was in December 1989, a Christmas present to myself. :-) This first public version was written in Modula-3, generated C code as its intermediate language, and had support for the full language. I would guess that the Modula-3 language design committee started meeting 12-18 months prior to that first release. The initial Modula-3 Report (#31) was issued in August 1988 and the revised report (#52) in November 1989. Greg's book is copyrighted 1991.
I left SRC in July of 1995. Sometime early in 1996 Farshad Nayeri recruited me to work for Critical Mass. We continued development of Modula-3 using the SRC release as a basis for CM3. We wanted to create a web-based GUI development environment and leverage the growing momentum behind Java. For example, we switched to a 16-bit CHAR datatype. The HTML-based GUI environment was innovative for the time, but would be considered conventional today. I wrote a clean-room implementation of the JVM in Modula-3, mostly as a proof-of-concept.
*** footnote Modula-2+: See [vanLeunen1986]
*** footnote GC: See Bill Kalsow and John DeTreville. History of the SRC Modula-3 Garbage Collector. Email to Paul McJones, April 2021. HTML
*** footnote Vulcan: See Section 6 of: Mark R. Brown and John R. Ellis. Bridges: Tools to Extend the Vesta Configuration Management System. Report 108, Systems Research Center, Digital Equipment Corporation, 14 June 1993. PDF
*** footnote Fingerprints
*** footnote Network Objects