The Multics MACLISP Compiler

The Basic Hackery -- a Tutorial

By Bernard S. Greenberg, December, 1977

Copyright © 1977, 1988, 1996, Bernard S. Greenberg


13 February 1996

Every decade this unpublished but nonetheless popular 1977 paper of mine on the Multics Lisp Compiler (lcp) seems to require upgrade to a new text-representation technology. First written in Multics runoff and printed out in 80-column format justified by padding with spaces, converted in 1988 to the then-reasonable TJ-Pigsnout (a SCRIBE®-like laserwriting text-justifier of my own), "The LCP Paper" now reappears in the clothes of the 90's, HTML, but hopefully, this time, it will publish itself via the miracle of the World Wide Web as part of the Multics Web Site and remain unpublished no more.

Although this document is littered with references to obsolete programs, impossible-to-obtain listings and sources, references to people since widely scattered, and the pretentious rantings of the 27-year-old who wrote it, once it gets going, it has something wonderful to explain (lcp's value management scheme) which is as relevant to compiler-writing today as ever--it is worth slogging through. I have not altered the document text or content in any way except to convert it to HTML and add the table of contents at the front, and a small fix to a PL/I example in a footnote.


NB: The acronym MAC in the term MACLISP refers not to the Apple® Macintosh® computer, but to "Project MAC", which was the name of the MIT Laboratory for Computer Science at the time MACLISP was developed.

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by the author. All persons copying this information are expected to adhere to the terms and constraints invoked by the author's copyright. In most cases, these works may not be reposted or republished without the explicit permission of the copyright holder.

Table of Contents

Exordium
Great Overview
Basic Passes and Recursions
Pass 1
Multics Lisp Execution Environment
Pass 2
Value Management: An Example
Value Management: In Summary
Object Code List
Branching and Tags
Branching and Value Management
PDL Height Management
Compilation for Predicate
Object Type Strategy
Appendix A -- Value tag forms
Appendix B -- NCOMPLR and LCP
Appendix C -- Relevant Humor

29 February 1988

This paper is a reprint, an upgrade to "modern" computer typography and printing technology, of my unpublished, "bootleg" paper written ten years ago. In this age of marketed Lisp-based architectures, this paper still presents one of very few accounts of powerful compilation techniques for Lisp on so-called "stock architectures," and I have received in the interim many requests for it. The best of the techniques described herein flower a dozenfold in ITS' NCOMPLR, and are brought to their documented fulfillment (albeit for BLISS, not Lisp) in Wulf, Johnsson, et al.'s The Design of an Optimizing Compiler (Elsevier 1975).

Other than for the addition of the new appendices B and C, this reprint represents only a reprint and upgrading of typography. I have not attempted to rewrite it in any way, nor altered at all the potpourri of at times gratingly informal, at times erhoben, jargon-laden, and ostentatiously anthropomorphic and overtly sexist language, which is as much a reflection of the circumstances of origin as is the compiler technology described.


The Multics MACLISP Compiler

The Basic Hackery -- a Tutorial

December 1977

Exordium

To those LISPmen who have achieved proficiency in this our chosen language, and who have wondered about the strategies of the MACLISP compilers, I offer this tutorial, wherein I divulge the basic hacks of the Multics Compiler (lcp). Many of these hacks, including the entire first pass, were derived from the ITS Compiler (COMPLR, now NCOMPLR (or NCOMPL)), a much, much, hairier beast of which I have not yet achieved mastery. By learning that which I offer herein, the ambitious LISPman might not only acquire sufficient knowledge to modify or debug the Multics Compiler, but undertake the serious study of NCOMPLR. In fact, the state management hacks presented herein are of great interest in comparison to those of other modern compilers, e.g., Multics PL/I. He who comprehends this will also glean generally powerful knowledge of compiler-writing techniques in general, and will have interesting and useful knowledge under his belt.

If you are not already familiar with LISP, in some detail, including the traditional implementations and value/object issues, you probably should not be reading this.

Pass 1 of lcp was obscurely derived by Moon from COMPLR.264, which was a product of White, Rosen, Golden, Diffie, Nelson, and Greenblatt. Pass 2 of lcp was written by Reed, after study of COMPLR. Both passes were maintained for years by Daves Moon and Reed. I myself have added some of the recent optimizations.

I. Great Overview

lcp is a lisp program itself, usually compiled by itself, although it was bootstrapped via the interpreter during its childhood. The Multics lcp or lisp_compiler commands enter a PL/I program which figures out what directory it is in, and calls lisp (the lisp interpreter) with the pathname of the saved environment compiler.sv.lisp and the arguments to the lcp command. Thus,

    lcp abfuns -list -long

results in a call as though the command

    lisp >am>compiler abfuns -list -long

had been issued. This tells the interpreter to load up the saved environment, and execute the errlist (^g handler) of the compiler, which then accesses its Multics arguments via (status arg n).

lcp produces a standard Multics object segment; there is no LAP involved with it. [1] A PL/I utility (lisp_cg_utility_) is called at the absolute end of compilation to take a list of binary words, fixnums, representing the object program, and build a standard object segment out of them. This PL/I program deals only with construction of the object segment, including the building of LISP and Multics linkage. All code generation issues have already been addressed by the lcp code generator at the time lisp_cg_utility_ is invoked. [2]

Other than the PL/I program, lcp comes in three source files (and a couple of include files, which we will skip over):

    lcp_init_(.lisp), an obarray hack
    lcp_semant_(.lisp), Pass 1
    lcp_cg_(.lisp), Pass 2

lcp_init_ is a trivial program used at the time the compiler saved environment is built. It is loaded by the exec_com which builds the compiler saved environment, and loads the two main files of lcp after saving a copy of the interpreter's obarray. This obarray, which is known as the "working" obarray, will be used by lcp to read user source code over, so that the functions and variables of the compiler are inaccessible to user macros and compile-time functions. [3] Another copy of the obarray (the "compiler" obarray) is used to read lcp_cg_ and lcp_semant_ over. Load-time calls in the two main files to the fexpr global in lcp_init_ request the explicit interning of compiler symbols (e.g., special, *lexpr, etc.) on the working obarray.

The basic flow of lcp is two-pass, function by function. The command interface (command-interface in lcp_semant_) is invoked by the errlist upon environment startup. It interprets Multics control arguments, and remains in control, and executes (quit) out of lisp at the end of compilation. The function cf (in lcp_semant_) is called by command-interface to compile a file into an object segment: its single argument is a file name. [4] It deals with accessing the source file and metering and printing various meters of compilation CPU time and paging. At the center of this function is a call to cmp1 (in lcp_semant_) which is the implementation of the compiler top-level.

The compiler top-level (cmp1 in lcp_semant_) implements the documented, well-known manifestations of the compiler, specifically:

Anything that comes through the above as

          (defprop FUNNAME
                  (lambda...) PROP)

where PROP is one of expr, fexpr, or lexpr, gets handed to compile (in lcp_semant_) for the two-pass compilation process.

Macro definitions are evaluated in simulation to avoid putting macro properties on system functions at the time the compiler is used interpretively. In the case that the symbol on which a macro property is supposed to be hung has an explicit or declared (i.e., *fexpr, etc.) functional property, the macro lambda expression is hung off of *macro instead of macro. It goes without saying that compile-time user code cannot hope to invoke such macros: the user can only compile code using them. [5]

The function compile in lcp_semant_ compiles, i.e., applies the two-pass compilation process to the function definition arrived at at top level. It is called like this:

    (compile 'foobar 'expr 
     '(lambda (x) (* x 2)) nil)

The nil is a cheat to bind the global special rnl (rename list, to be discussed) to nil at the time compile is invoked. compile checks for redefinition, declares the function being defined (i.e., gives it a *expr property as its second argument is expr), invokes "pass1 processing" (via a call to p1glm, a point about 2 functions into pass-1 lambda-expression processing) on the lambda expression, and if no errors occurred, invokes pass 2 (pass2, in lcp_cg_) upon the result produced by pass 1.

II. Basic Passes and Recursions

It is quite simple to develop a lisp compiler that makes one pass over its source code, i.e., sees the sequence of forms in a function only once through. [6] However, certain optimizations and code-producing techniques in both MACLISP compilers require a prepass over the source code for each function before correct and efficient object code can be generated. Knowledge about certain forms which could have only been gleaned from preanalyzing that form entirely is necessary for these techniques to operate. This knowledge is:

Since two passes are necessary anyway, certain responsibilities have been given to pass 1 in order to simplify pass 2. These responsibilities are:

The contract [7] of a lisp compiler is to produce a machine-language program which, when executed, will have the same effect on the lisp environment, and return the same value, as having applied eval to the forms it was given to compile. This involves a great deal of knowledge of eval on the compiler's part. The only viable technique for fulfilling this contract, especially given that the compiler is a lisp program, is to consider source functions recursively, as eval does, considering arguments and sub-forms in the exact same order that eval would, for the order of evaluation of sub-forms is specified by the language as being from first to last. [8] Thus, both passes center around a "simulated-evaluator" function, which considers the source function recursively. The two functions, one for each pass, are

p1 (pass 1)
    Given a form, performs all pass 1 checking and transformation on it, producing as a
    VALUE: the pass-1 transmogrified (canonicalized and semanticated) form,
    and as an
    EFFECT: changes upon pass 1's global data about this function.

comp (pass 2)
    Given a form, performs all pass 2 code generation functions on it, producing as a
    VALUE: a "value tag," a name for the result of this form used by pass 2 state management,
    and as an
    EFFECT: additions to the list of machine-code instructions which is the object code for this program, and changes to pass 2's model of the runtime machine state.

Both p1 and comp perform trivial manipulations when the form on which they are invoked is a variable or a constant. For other forms, i.e., function calls, they dispatch their work to more specialized routines which know how to p1 or comp calls to specific subrs, fsubrs, or lsubrs, or lambda expressions, etc. In both cases, these more specific routines, perhaps through other intermediate routines, almost [9] always wind up calling p1 or comp respectively on the elements of the cdr (i.e., the argument forms) of the form they were handed, and doing something with the result. These recursive calls must be performed on the argument forms in the language-defined order: first to last for subrs/exprs, functional lambda expressions after their arguments, and as fits each fsubr. These two functions are the heart of their respective passes.

Both basic recursions are cognizant of compiling for value versus compiling for effect. Compiling for value is the normal default: a form is being compiled for value when it is known that its value will not be thrown away at runtime. A form is being compiled for effect when it is known that its value will not be used in any case at runtime. For example, all clauses of a progn except the last are compiled for effect. Similarly all forms at the top level of a prog, and all forms except the second of a prog2. In the case of Pass 2, this is very important, for even though all Lisp forms are defined as returning a result, the lisp code generator (pass 2) need not worry about preserving the result of a form (e.g., a setq) that is being compiled for effect. Pass 2 does not save, nor ever call for the results of computations that are compiled for effect. Constants and variables compiled for effect may be simply ignored by the compiler.

Both passes maintain a variable, effs, normally bound nil, but t when a recursion for effect is in progress. Both basic recursions normally bind it nil; compe is used instead of comp when it must be bound t, and returns no meaningful result. p1e is used instead of p1v in pass 1. The functions p1 and comp0 are interfaces which preserve the current value of effs: in the compilation of progn, for instance, the last form is for value if the progn is being compiled for value, but for effect if the progn is being compiled for effect. Pass 1 rarely worries about value or effects, i.e., uses the value of the flag effs, but in pass 2 it is quite important.
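
The effs discipline can be sketched in modern Python (lcp itself is Lisp; the names compile_form and compile_progn below are invented for illustration, and forms are nested Python lists standing in for s-expressions):

```python
# Schematic model (not lcp's actual code) of how the effs flag steers
# compilation for value versus compilation for effect, using progn.

def compile_progn(forms, effs, emit):
    """All clauses but the last are for effect; the last clause
    inherits the caller's value/effect context (like comp0 in lcp)."""
    for f in forms[:-1]:
        compile_form(f, effs=True, emit=emit)      # like compe: result discarded
    compile_form(forms[-1], effs=effs, emit=emit)  # preserve caller's context

def compile_form(form, effs, emit):
    if not isinstance(form, list):          # a variable or a constant
        if effs:
            return                          # for effect: no code at all
        emit(('load', form))
    elif form[0] == 'progn':
        compile_progn(form[1:], effs, emit)
    else:
        # a real compiler would dispatch on the function here; a call
        # is emitted even for effect, since it may have side effects
        for arg in form[1:]:
            compile_form(arg, effs=False, emit=emit)
        emit(('call', form[0], len(form) - 1))
```

Compiling (progn a (f b) c) for value thus emits nothing at all for the bare a, since constants and variables compiled for effect are simply ignored.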

Pass 2 also has a notion of compiling "for predicate," which will be described at the appropriate time.

III. Pass 1

The basic idea of pass 1 is to take the source-code function, and produce a "transmogrified version," upon which all pass 1 optimizations, semantications, and other transformations have been performed. The output of pass 1 is not a strange internal list structure: it is still lisp code, modulo two or three new arguments at the head of COND, AND, OR, and PROG forms. A form handed [10] to pass 1 as

    (cons (cdr x)(+ a '3))

will come out unchanged (although reconsed, i.e., not EQ to the original form, p1 having been called on each sub-form).

A service performed by pass 1 for pass 2 is to rename all lambda variables that have the same name as currently regnant lambda variables. A list called rnl is bound to its previous value each time a lambda expression is p1'ed, and an "assq pair" (e.g., (old . new)) is pushed onto rnl each time a lambda list of a lambda expression contains a variable already in a regnant lambda list. The "old" is the symbol in the source code: the "new" is a gensym created by p1rename in lcp_semant_. The gensym is given a rename property of the old symbol so that error diagnostic printers, upon seeing this, will print the "old" symbol. Thus, p1's result for a variable will be (cdr (assq var rnl)) if (assq var rnl) is non-nil at all.
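
The rnl machinery might be modeled in Python roughly as follows (a schematic sketch with invented names; gensyms are simulated, and local list copies stand in for the stack-bound rebinding of rnl and bvars):

```python
# Schematic Python model of pass 1's rename hack. rnl is an alist of
# (old, new) pairs; bvars lists all regnant (renamed) lambda variables.
import itertools

_counter = itertools.count()

def gensym(old):
    # stands in for a MACLISP gensym carrying a 'rename' property
    return f"{old}%{next(_counter)}"

def p1_var(var, rnl):
    # p1's result for a variable: (cdr (assq var rnl)) if non-nil
    for old, new in rnl:
        if old == var:
            return new
    return var

def p1_lambda(lambda_vars, body_vars, rnl, bvars):
    # Local copies model the stack-bound lists: the caller's rnl and
    # bvars "pop" automatically when this call returns.
    rnl = list(rnl)
    for v in lambda_vars:
        if v in bvars:            # name conflict with a regnant variable
            rnl.insert(0, (v, gensym(v)))
    bvars = [p1_var(v, rnl) for v in lambda_vars] + list(bvars)
    return [p1_var(v, rnl) for v in body_vars], bvars
```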

This rename hack is a feature of the lexical scoping of compiled lisp local variables:

    (defun foo (x)
       ((lambda (x)(bar x 3))
        (mung x 5)))

In this example, the outer x has nothing at all to do with the inner x: the outer x cannot even be referred to inside the lambda expression. It is a different variable. By removing name conflicts in this way, pass 2 can use eq test for two variables being the same, and put properties on the symbols which represent variables for whatever purposes it needs.

Pass 1 also performs macro reduction/expansion. By expansion, we mean that p1, upon being given a form whose car is a symbol with a macro or *macro property, applies that property to the input form, as is specified in the MACLISP semantics of macros, and reconsiders the result (iteratively) as though that were the original input. By reduction, we mean beating on the form in this way until its car is no longer a macro. Thus, inductively, we may prove that the input to pass 2 contains no macro forms whatsoever. Macro expansion is delegated by p1 to macro-expand in lcp_semant_, which errsets this fearsome task, to protect lcp against random user macro lossages, and diagnose the latter if they occur.
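
The reduction loop can be sketched as follows (a hypothetical Python rendering; the property lookup is modeled as a dict, and the errset protection is only gestured at):

```python
# A minimal sketch of macro "reduction": keep applying macro
# properties until the form's car is no longer a macro.

def macro_reduce(form, macro_props, limit=100):
    """Beat on the form until its car has no macro/*macro property.
    Each expander receives the whole input form, per MACLISP macro
    semantics, and its result is reconsidered iteratively."""
    while isinstance(form, list) and form and form[0] in macro_props:
        if limit == 0:
            # lcp errsets the real thing to survive user macro lossage
            raise RuntimeError("runaway macro expansion")
        form = macro_props[form[0]](form)
        limit -= 1
    return form
```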

Several very critical mechanisms in pass 1 depend upon lists that have their current head designated by a special variable that is bound (not setqed) to its current value at the start of some processing, and rebound to that value at the end of that processing, thus popping all that was pushed on to the list since its head was saved. The best example of this is rnl, the rename-list, which has rename-pairs pushed onto it by the lambda-list processor (p1lmbify), after its value has been saved by compile or p1lam (the p1er of lambda applications), to be popped by whichever of the latter pushed it. Thus, during the p1'ing of any given lambda expression, rnl appears as a list of rename pairs of all regnant lambda variables. The list bvars is similarly managed to always be a list of all regnant lambda variables, after renaming, managed by those same functions. We will call such a list a "stack-bound list."

Pass 1 inserts as the first two elements of AND, OR, COND and PROG forms two quantities known as the setq-list and call-out flag, respectively. The setq-list is a list of all local variables setqed anywhere within that AND, OR, COND, or PROG. Since local variables can only be named, in compiled lisp, in their own function, no code outside of this function can setq any of these variables. The call-out flag is set t if any external function which the compiler knows can setq special variables, or any random user function, is called, within that AND, OR, COND, or PROG. If this flag is on, in effect, all of the special variables in the world may be considered to be on the setq-list for this form, because one of these external functions might, for all we know, setq that variable. Pass 2 must know what variables are setqed in a form which has more than one path through it before compiling that form (PROG qualifies because GO and RETURN can get you out of it without executing all of it). This will be motivated in the discussion on branching.

Suffice it to say that each conditional form (we will use this to mean COND, AND, OR, or PROG) which contains internal conditional forms has the setq-list of the internal form as a subset of its own setq-list, and has a call-out flag of t if any internal conditional form has a call-out flag of t, for what is setqed in some form is by virtue of containment setqed in the form that contains it.

Three lists of setqed variables are maintained, one by the regnant COND, AND, or OR (p1csq), one by the regnant PROG (p1psq), and one by the regnant lambda expression (p1lsq), with the flags condp, progp, and lmbp saying if there are any of the above at all. Each time the setq of a non-special variable is p1'ed, p1sqv is called to add that variable to whichever of the above lists are valid (conceivably they all might be). A utility function (add) is used to add a symbol to a list if and only if it is not already on it. When the p1'ing of the conditional form or lambda expression is complete, the contents of the appropriate setq-list is made the setq-list of the output form, and that list merged into all regnant lists after the previous binding of the particular list-head is restored. [11] This is done by p1setqex, the "setq expander." The reason that multiple kinds of lists are maintained, even though the function is seemingly identical, is so that setq's of lambda variables may be deleted out of the setq-list (or PROG variables out of PROG's setq list) before it is either made the setq-list of the form, or merged to the higher regnant setq lists. Note that lambda expressions are NOT conditional forms, but must participate in this hack as though they were, in order to delete their lambda variables out of their setq-list before propagating it.
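
The add utility and the merge performed by p1setqex might be modeled roughly as follows (invented Python names; the deletion of a form's own variables before propagation is shown explicitly):

```python
# Schematic model of the setq-list bookkeeping described above.

def add(sym, lst):
    """lcp's add utility: push sym only if not already present."""
    if sym not in lst:
        lst.append(sym)
    return lst

def p1sqv(var, active_lists):
    # add the setqed variable to every currently-valid list
    # (conceivably p1csq, p1psq and p1lsq all at once)
    for lst in active_lists:
        add(var, lst)

def setq_expand(inner_list, own_vars, outer_lists):
    """Like p1setqex: delete the form's own lambda/prog variables,
    then merge what remains into all regnant setq-lists."""
    propagated = [v for v in inner_list if v not in own_vars]
    for lst in outer_lists:
        for v in propagated:
            add(v, lst)
    return propagated
```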

The call-out flag is handled similarly: it is bound to nil at the start of the p1'ing of each conditional form or lambda expression, and setqed t if a setq of a special variable or call to a random function is ever p1'ed. p1setqex propagates it downward at the time that setq-lists are merged (his argument is a list of setq-list and call-out flag). The name of the call-out flag is p1cof.

Pass 1 also renames prog-tags for the same reasons as it renames variables, and puts in a prog, as the fourth element, the rename list. As a prog is processed, renamed tags are given a defined property. p1fsubr, upon hitting a go to a constant tag, checks (after renaming) for defined property: if there is one, this tag is the target of a "backward jump," a fact which pass 2 must know at the time it encounters the tag, and it is given a back-reference property. If an expression is gone to, all tags must get the back-reference property.

Pass 1 maintains a variable, cnt, during the p1'ing of each function. It is zeroed at the start of p1'ing of a function. It is incremented by one each time a variable is p1'ed. It is incremented by two every time a setq of a variable is p1'ed, and by one at several other random points. Pass 2 maintains an identical variable, also called cnt, which is of utmost importance to the operation of pass 2, and increments it at exactly the same points of comp'ing as in p1'ing. If the cnt produced by pass 1 is not equal to the cnt produced by pass 2 at the end of the function, the compiler is in error. This counter is used to mark the passage of time in simulated evaluation in pass 2: this will become clearer in our discussion of pass 2. However, the fact that pass 1 maintains it identically allows pass 1 to give to pass 2 critical hints about the usage of variables. Specifically, a list of all local variables ever defined during a function (either function or internal lambdas, after renaming, of course), called locvars is made by pass 1 for pass 2. Each element of this list is a cons of the variable and a fixnum. This fixnum is rplaced to the current value of cnt every time that variable is p1'ed or a setq of that variable is p1'ed (p1 and p1sqv do these things). Thus, when p1'ing of the function is complete, pass 2 knows, by asking (cdr (assq var locvars)) the latest time at which this variable is ever used, thus allowing its storage to be reused for other things. More on this later. Let us also mention in this context that the obscurely-named function p1bug is used every time a conditional form is finished being p1'ed or a label is hit, and it (being passed the value of cnt at which the difficult form was first starting to be p1'ed) sets the last-use-cnt of all variables in locvars at that time to "now" (the current cnt) if it appears that they have been used (p1'ed) since then (cdr of locvars element is greater than saved cnt). 
This is to say that such storage may not be reused, because the object code might branch back to a place where that variable was expected to be there, but the storage was reused! More on this later too.
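
The locvars/cnt interplay can be sketched as follows (schematic Python; locvars is modeled as a dict mapping variable to last-use cnt, rather than an alist of rplaced conses):

```python
# Sketch of the locvars last-use bookkeeping: each variable carries
# the cnt (simulated time) of its most recent use, and a p1bug-style
# pass pushes last-use forward past a conditional form or label.

def touch(var, cnt, locvars):
    """Record a use (or setq) of var at simulated time cnt,
    as p1 and p1sqv do."""
    locvars[var] = cnt

def p1bug(saved_cnt, now, locvars):
    """On finishing a conditional form begun at saved_cnt: any
    variable used since then must be kept live until 'now', since
    the object code might branch back into the form."""
    for var, last in locvars.items():
        if last > saved_cnt:
            locvars[var] = now
```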

Pass 1 semanticates: this is to say, constructs whose semantics may be expressed, at the source-language level, in terms of simpler constructs are so expressed. For example, do is given a *macro property (as though the source programmer had defined a macro called do!) of an internal compiler function, doexpander in lcp_semant_. Like any macro, this function is "out of the domain of the compiler:" it gets its entire source form as its single argument, and produces valid lisp code as its value. In the case of do, a bunch of setqs, conds, etc., in a prog are created by doexpander to represent the do-loop. [12]

The bulk of pass 1 consists of semantication, i.e., rendering complex lisp constructs into simpler ones which express their meanings. As p1 encounters calls to subrs, lsubrs, or fsubrs, respectively, it hands the form down to p1subr, p1lsubr, and p1fsubr, respectively (all in lcp_semant_) to perform semantication and checking on specific functions. Many of these semantications are random [13] and gratuitous in nature, and optimizations of constructs that no sane programmer or macro would ever produce. For example,

    (set (quote a) b)

becomes

    (setq a b)

or

    (eval (cons 'random-fsubr .....))

becomes

    (random-fsubr ....)

or (list) becomes 'nil.
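
Rewrites of this flavor are simple pattern matches; a schematic Python rendering (forms as nested lists, with ['quote', x] standing for 'x, and the names invented here):

```python
# Sketch of two of the gratuitous semantications named above.

def semanticate(form):
    if not isinstance(form, list) or not form:
        return form
    head = form[0]
    # (set (quote a) b) => (setq a b)
    if head == 'set' and len(form) == 3 \
       and isinstance(form[1], list) and form[1][:1] == ['quote']:
        return ['setq', form[1][1], form[2]]
    # (list) => 'nil
    if head == 'list' and len(form) == 1:
        return ['quote', 'nil']
    return form            # anything else passes through unchanged
```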

More useful semantications include transforming the map functions (map, mapc, mapcan, etc.) into dos, themselves but a macro. One profound effect of this is to transform the functional argument of the mapping function into inline code. This has the effect of achieving the desired lexical scoping of variables in the map-function's functional argument. p1lsubr calls p1map to produce this do, and calls p1 on the result before handing it back to p1 as an answer. This has the effect of the map-functions acting like macros, i.e., their answer is Lisp, not p1 output, in pass 1. The actual function returned is not do, but a magic uninterned symbol named map-do, which also has a *macro property of the same doexpander function in pass 1.

A peculiar feature resulting from the transformation of map-functions into dos, with their resulting PROGs, is that GOs and RETURNs inside map-functions inside source do/prog constructs would refer to the compiler-generated do/prog instead of the source programmer's unless special action were taken! For instance,

    (prog (y)
      (mapc
       '(lambda (x)
          (and (car x)(return (cdr x))))
       y)
      (return 3))

which is supposed to return (cdr x) when an x with non-nil car is found, will always return 3. This must not be allowed to happen-- thus, pass 1 keeps track of whether any PROG it is compiling was produced by a map function or not--- this is encoded in a flag, mapf, t for p1ing the internal function of a map-function (bound this way by p1, not setqed), nil otherwise, and a counter, p1mapcnt, rebound at each p1ing of a PROG, to 1+ its old value if this is a "phony" PROG (e.g., compiler-generated for map function) or to 0 at each "legitimate" PROG. This allows the p1er of gos and returns (p1sqg in lcp_semant_) to put as the caddr of gos and returns the value p1mapcnt, which is how many PROGs (legitimate or otherwise) we are being asked to return out of. Pass 2 will need to know this--- note that this is not an optimization, but a consequence of generating progs non-transparently. The progs and functions generated by p1map are magic uninterned atoms named map-prog and map-funarg, with *macro properties of t. This causes p1 to catch them, and bind mapf and p1mapcnt appropriately before calling the p1ers of progs expressions, respectively.

Functional expressions appearing in calls to functions with known functional arguments, e.g., the second argument to the sort functions, are compiled by recursive call to compile from p1gfy in lcp_semant_. p1fcnarg is called on all forms which are expected to be functions: if not quoted, they are p1'ed as any other form; otherwise, the result is the symbol (of gensym-name, generated by gen-fcn-name, called by p1gfy) representing the function compiled (both passes!) on the spot, p1gfy having called compile recursively. Note that if a symbol (e.g., alphalessp) appears instead of a lambda expression behind the quote, that symbol gets returned. Note also that p1fsubr, upon being handed a (function ...) form that p1 got, converts it to (quote ...) and hands it to p1fcnarg. Note that for things where functions are expected (e.g., sort calls) as opposed to merely hit (e.g., (setq x (function (lambda....)))) p1 will never encounter the lambda expression, but it will be given to compile to compile as a separate function.

Other lesser pass 1 semantications:

All BOOLE's of known constant first argument are converted into compoundings of logical AND, OR, and XOR and NOT, because these are the only ones that pass 2 can generate. p1boole does this. Specific (e.g., +, +$) arithmetic functions are substituted for generic (e.g., plus) arithmetic functions where all argument typings let this be done: p1 and p1arithsubst do this. Similarly, generic arithmetic expressions all of whose numeric types are known are converted (by convert-mixed-to-float) to floating computations by wrapping (float ...) around the known fixnums. The sassoc and sassq functions are changed by p1subr into COND's and APPLY's with assoc and assq. NOT gets changed by p1subr into NULL. Winning cases of DELETE, ASSOC, and MEMBER become DELQ, ASSQ, and MEMQ where possible (in p1subr). MEMQ's and ASSQ's of sufficiently small constant second arguments get expanded by p1subr into nested COND's of EQ's.
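
As one illustration, the MEMQ expansion for a small constant list might look like this in schematic Python (the exact shape of lcp's output is not specified above; having each arm return the matching tail, preserving memq's value, is an assumption of this sketch):

```python
# Sketch of expanding (memq x '(a b ...)) into a COND of EQ tests
# when the constant second argument is sufficiently small.

def expand_memq(x, const_list):
    """Return a cond form; each arm tests eq against one element and
    yields the corresponding tail, as memq itself would."""
    arms = []
    tail = const_list
    while tail:
        arms.append([['eq', x, ['quote', tail[0]]], ['quote', tail]])
        tail = tail[1:]
    return ['cond'] + arms
```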

Reducible arithmetic expressions get their constants collected and evaluated at compile time by p1ctev (compile-time eval, in lcp_semant_). p1ctevq returns the same quoted, which is generally useful, given p1's output contract. p1ctev errsets possibly-losing user computations: Source-code *reducible's are also reduced by him. Multiple-form lambda expressions and multiple-form COND consequents are converted into PROGN's to reduce the amount of checking and processing in pass 2.
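
Constant collection for a reducible + form can be sketched as follows (a hypothetical Python stand-in for what p1ctev does; the errset protection of possibly-losing computations is omitted):

```python
# Sketch of compile-time constant folding for a reducible + form.

def fold_plus(form):
    """Collect the constant arguments of (+ ...) and evaluate them
    now, keeping the non-constant arguments for runtime."""
    assert form[0] == '+'
    total = sum(a for a in form[1:] if isinstance(a, int))
    rest = [a for a in form[1:] if not isinstance(a, int)]
    if not rest:
        return total           # fully constant: reduce to a fixnum
    return ['+'] + rest + [total]
```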

IV. Multics Lisp Execution Environment

Herein I state all the things you have to know about the execution environment of Multics Lisp to understand the code generation aspects of the Pass 2 discussion. They are not many.

Multics Lisp objects live in Multics segments. Multics Lisp values are doubleword ITS [14] pointers to the objects, with type bits stored in unused bits of the pointer format. Cons values have zero type bits, because the pointer registers of the machine can hold an address: a store pointer instruction stores the pointer with the type bits zero.

Fixnums and flonums are entirely encoded within the doubleword value format, the first word containing, in addition to the type bits stating that this is a fixnum or a flonum, a quantity which causes a hardware error if an attempt is made to indirect through it.

Symbols have their bound values stored in a "shallow binding cell" at the first word of the symbol. Thus, an indirect load through a LISP value corresponding to a symbol loads its binding. Conses are stored car-cdr: the first doubleword of the cons, loadable indirect through a value corresponding to the cons, is the value which is its car.

There exists an A register and a Q register, in both of which arithmetic can be done, although only the A can be negated. The A and Q can also be considered as a double-length AQ register, which can hold a doubleword. Instructions exist to load and store the AQ from and to doublewords in storage. The AQ is basically the only register in Multics LISP which can hold a lisp value, although the pointer registers can hold the lisp value corresponding to a cons, because the type bits would be stored as zero, which is correct for a cons.

There are eight pointer registers, each of which can hold a Multics address. Two are reserved for the system (Multics, #6 and #7). One (#0, "AP") is reserved by the lisp runtime environment to maintain a stack of lisp values, the "marked PDL" [15], AP pointing to the next available location. All saving of values of lisp variables is done on the marked PDL. Pointer register #1 ("AB") is reserved by the lisp runtime to point to a transfer vector of "lisp operators," assembler-coded routines to perform quick and basic functions (e.g., function return, function call, cons allocation, variable binding (for specials)) in the runtime environment. These routines must always be addressable by compiled code. Pointer register #2 ("BP") may be used at the discretion of the compiler, as may pointer register #3 ("BB"). BB, however, is also used by the cons-allocation operator to return its result, and this is known and used to great advantage [16] so that BB might be stored with a hardware pointer-store to move the lisp value it contains around. Pointer register #4 ("LP"), during the running of compiled functions, is loaded by the function call operator [17] with the address of the lisp linkage section of the object program. This is a block into which the function (load) which loads lisp object programs into the lisp environment places lisp values corresponding to all of the constants and special variables referenced by the source code. The compiler provides, in the object segment, information about what lisp value it has built the object program to assume will be stored in each slot of the lisp linkage section. [18] Pointer register #5 ("LB") can be used freely by the compiled code, but rarely is.

The contents of the pointer registers and the AQ are not subject to garbage collector scrutiny. At the time the garbage collector is invoked by the cons allocator, no active registers contain lisp values, all having been saved on the marked PDL.

Functions are called by placing the arguments to the function (i.e., the lisp values representing them) on the marked PDL: the first argument at the lowest address, the next 2 locations beyond, etc. AP will point to the location beyond the last argument (first available PDL location). A tspbp (transfer, leaving return address in BP) instruction is used to transfer to the function being called, transferring indirectly through the lisp linkage section. This will initially cause a transfer to an operator which finds the machine code for the function being called (if compiled or builtin), "snapping" the link19 by placing the runtime code address of the function20 in that link, and transferring there. Thus, future calls branch to the function directly21. It is the responsibility of a called function to pop all of its arguments off of the marked PDL before returning; compiled code will assume that this has been done, and will do it. Functions return their result in the AQ, as a fully typed lisp value.
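As a sketch only — this is modern Python modeling the convention just described, not anything from lcp, and all names are invented — the marked-PDL calling discipline looks like this:

```python
# Hypothetical model of the Multics LISP calling convention:
# a Python list stands in for the marked PDL, and a returned
# Python value stands in for the result "in the AQ".
marked_pdl = []                     # "AP" is always len(marked_pdl)

def call(fn, *args):
    """Caller: push arguments, first argument at the lowest address,
    then transfer (tspbp) to the function."""
    for a in args:
        marked_pdl.append(a)
    return fn(len(args))            # result comes back "in the AQ"

def plus(nargs):
    """Callee: must pop ALL of its arguments before returning."""
    args = marked_pdl[-nargs:]
    del marked_pdl[-nargs:]         # the callee pops its own arguments
    return sum(args)

result = call(plus, 1, 2, 3)        # marked_pdl is empty again here
```

Note that, exactly as the text says, it is the callee and not the caller that pops the arguments, so the caller may assume AP is back where it started when the result arrives.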

The garbage collector does not protect the marked PDL above AP.

There exists an "unmarked PDL," usable for storing values other than lisp values. It is addressed by the pair of AB and index (not pointer) register 722.

V. Pass 2

Pass 2 is that part of lcp which actually compiles LISP: what it is given by Pass 1 is a subset of LISP with a few quantities added here and there which are necessary to be able to compile LISP in this way.

The goal of compilation is twofold: the first goal is to remove all of the overhead of the evaluator which is spent in figuring out what to do next and in what order to do it by figuring it out once and for all for every execution of this program. The second goal is to recognize calls to functions which can be compiled inline, i.e., code can be generated to carry out the semantics of these functions in a very small number of machine instructions. Good examples of such functions are +, rplaca, setq, cddar, and cadadr. When a function call can be compiled as inline code, all of the overhead associated with setting up its arguments on the marked PDL and performing the actual transfer can be eliminated. Now for functions like sortcar, although a persistent questioner might pose that inline code could be generated for this function, the relative overhead of argument list preparation is negligible compared to the actual work done by this function. Thus, a Multics LISP function call is generated for calls to such functions, as it is for calls to arbitrary non-system functions.

A secondary goal of compilation is to create and manage the beings known as compiled local variables, the lexically-scoped value holders so akin to variables in "normal" programming languages. Basically, "slots" (fixed locations during function execution) on the marked PDL are allocated to such variables. This slot is known as the "home" of the variable. This is a definition of a home; this must be kept in mind during the confusion which is to follow. When this variable is setqed, its value is stored into this "home."

Another goal is to implement the semantics of fexprs: CONDs, PROGs, ANDs/ORs etc. appearing in the source code may be viewed as compiler directives to construct control-flow code; to generate conditional branches, executing code conditionally. SETQs map into the notion of assignment in compiled languages: a SETQ appearing in the source code is a directive to the compiler to construct code which stores a value in a local variable or in the value cell of a special variable. Each fexpr is like this: a special case for the compiler.

Compiling LISP source code also means identifying the constants: Quoted lists and atoms tell the evaluator to return what is behind the quote: the compiler calls this whole construct a "constant," and can compile code to use this constant value at the corresponding point of the compiled program.

It has been posed that it is not even clear what it means to compile an interpreted language like LISP: let us reiterate therefore what it means. To compile a LISP program is to produce an object program, which, when executed by the processor, will produce the lisp value which corresponds to the object that the interpretation of the source program is supposed to produce, and produce the same effect upon the lisp environment. With this in hand, let us develop inductively what it is that the compiler does.

The process of evaluation of a form has a direct analog in the notion of execution of code which produces the value of that form: As evaluation of a non-atomic form involves evaluation of the argument sub-forms, in argument order, execution of code producing the value of a non-atomic form involves execution of code producing the values of the argument sub-forms, in argument order.

Since machine23 code executes in linear order, the compiler must produce code to develop the value of that form by, at least in part, considering the form recursively, so that code for all argument sub-forms at every level is generated first. This is so for the same reason that eval must evaluate lowest-level sub-forms first: it needs the argument values before it can apply the function.

Thus, producing code to be the object code of a function consists of compiling code to develop the value of the lambda-body of the function24 in the AQ, and pop the function arguments off the marked PDL. If the simplest approach to building a LISP compiler were to be taken, compiling a form consisting of a variable alone would consist of compiling code to load its value into the AQ. Compiling a form for a function call would consist of:

  1. Reserving some PDL slots for the function arguments, compiling code to bump the PDL pointer if necessary
  2. For each argument,
    1. Compiling code recursively to develop its value (into the AQ), and
    2. Compiling code to store that in the appropriate PDL slot.
  3. Compiling a transfer-and-set-bp to the function.
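The simple-minded algorithm above can be sketched in a few lines of modern Python (a hypothetical rendering, not lcp source; the symbolic instruction names follow the examples in the text):

```python
# Hypothetical sketch of the "simpleton" compiler: recursively
# compile each argument into the AQ, store it into a fresh PDL
# slot, then emit a call.  Emits symbolic instructions.
def comp_simple(form, code, first_slot=1):
    if isinstance(form, str):                     # a variable
        code.append(('load-aq', form))
        return
    fn, *args = form                              # a function call
    code.append(('up-pdl', len(args)))            # reserve PDL slots
    for i, arg in enumerate(args, start=first_slot):
        comp_simple(arg, code, first_slot + len(args))
        code.append(('store-aq', 'pdl-slot-%d' % i))
    code.append(('tspbp', fn + '_function'))      # transfer-and-set-bp

code = []
comp_simple(('+', 'a', 'b'), code)
```

Running this on (+ a b) produces exactly the six-instruction sequence shown in the text: reserve two slots, load and store each argument, then transfer.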

This is not actually a half-bad approach. Several less hirsute lisp compilers do precisely this, with the help of stack-oriented hardware25 in some cases. There are several cases, however, where this is less than optimal. One such case is where function code is to be compiled inline, as opposed to via a function call with arguments on the marked26 PDL. Doing a little injustice to the Multics instruction set, our proposed "simpleton" compiler might generate for (+ a b)

          up-pdl   2slots
          load-aq  a
          store-aq pdl-slot-1
          load-aq  b
          store-aq pdl-slot-2
          tspbp    +_function

If + were a non-inline-compileable function, this is about as good as we could expect to do. Or if we were dealing in hardware that did "load and push on pdl" in one operation, and had operators in the hardware that dealt with the top n values on the PDL27, this would again be as good as we could do. Thus, the lisp machine compiler is pretty simple.

The next approximation would be to change the function-call compiling algorithm to compile code to perform the function instead of to call it if it knew how. Given the typical single-address arithmetic instructions available on Multics and the PDP10, we would then have

          up-pdl   2slots
          load-aq  a
          store-aq pdl-slot-1
          load-aq  b
          store-aq pdl-slot-2
          load-aq  pdl-slot-1
          add      pdl-slot-2 + 1
           ;number in second word.

This is insane. What we clearly want to do is get rid of all of the PDLing around and argument preparation. What we want to get is clearly

          load-aq  a
          add      b + 1

and that's it.

Another case where the simple-minded approach loses is in LISPs28 where the argument passing convention is to pass arguments in registers instead of on a PDL. If we change "place it in the appropriate PDL slot" in our simple algorithm to "place it in the right argument register," the algorithm does not even work, because later argument calculations are almost guaranteed to destroy the value of that register. Even if they attempted to save it, a tremendous amount of useless data movement would result. This is a much more serious problem than the previous one. The creators of NCOMPLR developed a technique to solve this problem which turns out to be adequate to solve the other problem, too. It is this "value management technique" which is the major arcanum of both lcp and NCOMPLR.

The technique of covering the source-code recursively in a manner identical to the evaluator is mandated by the definition of the evaluator that the object code is to simulate. The basic recursive flow of the simple-minded algorithm cannot be abandoned: where it is Losing, it must be fixed. The central function of pass 2, comp, previously mentioned in relation to p1, is that function which considers the source code29 recursively. When a call to a function is to be comp'ed, the outputting of object code for that function will almost30 always result in code terminating with a call or performing of that function. Some study will show that almost all object code inefficiencies of the simple-minded approach result from the loading and copying of values into PDL slots (and other places) when the object code for an inline function could have equally well used them in their homes. Now of course, if we have a call to an external function, we can still use the simple-minded algorithm, at least at this level of call, because we determined we had to do that anyway, and we couldn't do any better. But we can now say, "If we are compiling code for an inline function, only allocate PDL temporaries for argument values whose arguments weren't variables. When the time comes to use the values for the machine operations, we will find them somehow, and get variables out of their homes."

At first inspection, this may not seem like a bad approach. In fact, many languages do just this. But this is no good at all for LISP. Consider the form

          (+ a (setq a b) a)

Our not-so-simple-minded approach now generates

          up-pdl    1slot
          load-aq   b
          store-aq  a ;inline code 
                      ;for setq
          store-aq  pdl-slot-1
          load-aq   a
          add       pdl-slot-1 + 1
          add       a

This is out-and-out wrong. The "add a" line adds a correctly, but the "load-aq a" line does not: the value of a which the + form wants is the one before the setq of a changed it. Thus, in order for this to work properly, either

  1. Before we attempted to avoid loading and storing a, we checked through all of the arguments of the form, recursively, to see what variables were setqed, to see if we needed to do this and did it if so. For this to work, either we spend a gigantic amount of time rescanning forms for possible side effects31, or have pass 1 build a "setq list" into every form32. In either case, we would have to know how to "locate the variable value," i.e., find out where a "good value" of a for each reference to a was at the time we generated code for the application of the inline function.

  2. We could have the part of the compiler which comp'ed setq's recognize that someone was depending on the value of a before he generated code to setq it, and have him generate code to save the current value of a somewhere. In order for this to work, we need

    1. A globally addressable list of variables which people have comp'ed but not actually used, that we will look through if we intend to setq some variable.
    2. A way of finding out that this happened when we need the variable value after all the arguments have been comp'ed.
    3. A way of finding out exactly where it was that this clever hack put the damned thing33.

lcp and NCOMPLR implement the second approach. The "globally addressable list" of comped-but-unused results is called the loadlist: it is the binding of the variable of the same name.

To make this methodology consistent, rather than just being a "save the values of variables hack," the function (iloc, in lcp_cg_34, one of the most critical actors in the whole compiler) which locates values, i.e., finds out "exactly where the thing is," and returns answers like

    BB
    AQ
    (temp 6) ;i.e., PDL slot at 6, #3
    (special x) ;i.e., special var named "x"
    (quote 5) ;i.e., the fixnum constant 5

can find any value which we somehow guaranteed had code compiled to produce it, not just variables. Well, what is it exactly that we give this thing to identify exactly what it is whose value we want to find, and how do we guarantee to it that indeed we comp'ed it?

Well, these questions answer each other, and dig this closely. Calling comp, like any lisp function, returns a value to the function which called it. What comp returns is a type of lisp laundry ticket called a value tag. It is this value-tag that we hand to iloc.

These characters maintain a model of the runtime PDL and registers called the slotlist in which they keep value-tags representing what is currently assumed to be in each PDL slot or register. When we hand iloc a value tag, it searches the slotlist (and the variables AQ and BB; we will use the term "extended slotlist" for the whole) to locate the register or PDL slot where the compiled code "stored"35 or "left" that result. Note how elegantly the notion of "result" maps into "value-tag returned by comp." The latter is the compile-time implementation of the former.

comp always36 conses the value-tag he is about to return onto the loadlist37 so that people will look there before attempting to destroy its loadability. When we comp any kind of function call, in-line or external38, comp, or he whom he calls, returns a "gensym" result for a value tag, i.e., the ncons of a new gensym. The fact that the cdr of this value-tag is nil identifies it as such. When we comp a variable, we return as a value tag a cons of the symbol representing that variable with the current value of cnt, the simulated-evaluation time counter incremented every time a variable is comped or a setq is comped, and other times. Thus, handing s to comp might return (s.264) and set cnt to 265. This is the implementation of "x now" or "x then" or "x before that happened." Thus, the loadlist does not contain requests for the value of x as "x needed by somebody," but "x at time 264 needed by somebody." These "x needed by somebody"'s are called "unrealized demands on x," or "unrealized loads" or "outstanding demands," etc., on x. Managing outstanding loads is the better part of what lcp does for a living.
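The value-tag conventions just described can be sketched as follows — a hypothetical modern rendering, not lcp source, with Python two-element lists standing in for mutable conses and None for nil:

```python
# Hypothetical sketch of comp's value-tag conventions.  A variable's
# tag is (name . cnt); a function result's tag is the ncons of a
# fresh gensym, and its cdr of nil (None) identifies it as such.
import itertools

cnt = 0
loadlist = []
_gensyms = itertools.count(5)

def comp_variable(name):
    """comp of a variable: bump cnt, cons (name . cnt), push it on
    the loadlist, and return it as the laundry ticket."""
    global cnt
    cnt += 1
    tag = [name, cnt]               # (name . cnt)
    loadlist.insert(0, tag)         # comp conses it onto the loadlist
    return tag

def comp_call_result():
    """comp of any function call: a fresh "gensym" result."""
    tag = ['g%04d' % next(_gensyms), None]   # cdr nil marks a gensym
    loadlist.insert(0, tag)
    return tag

def is_variable_tag(tag):
    return tag[1] is not None

x_then = comp_variable('x')         # a demand on "x now"
g = comp_call_result()              # a function-call result
```

The cnt discipline here (increment, then tag) matches the worked example later in the text, where comping the first x with cnt at 0 yields (x.1).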

Thus, the general algorithm for comping an inline function is FIRST to get the comp results of all of the arguments, and THEN call iloc on these value-tags to find out where the results "are"39 and generate hand-tailored machine code to operate upon these quantities optimally from these locations40. Finally, construct a "gensym" value tag, and before returning it, place it in the correct slot in the slotlist (or in BB or AQ) so that iloc can later find our result. comp (who called this specific routine) will cons this gensym result on to the loadlist.

Wait. Stop. What is the meaning of this? Why is comp putting gensym (function) results on the loadlist? We understand that we put unrealized variable loads on the loadlist so that the setq-comper (compsetq) can notice them before it setq's a variable and41 causes its value to be saved. But no-one is in danger of setqing a function result. Why that's as silly as (setq (cos x) 22)! Or is it? Where does that function result live? In whatever register the compiler claimed it left it when it compiled the call. What is to prevent the next code sequence from clobbering that register? There's only one BB and an even scarcer AQ to go around, and most function results are left there. If every inline function saved the AQ before it executed, we would generate a lot of useless code and soon run out of places to uselessly save the AQ. We know that at some times the AQ has something useless, something duplicated elsewhere, or nothing known in it42, and the rest of the time has something critical in it. How can a part of the compiler which is going to clobber the AQ know whether it has to be saved or not? Simply by checking the loadlist to see if there is an unrealized demand on that result. For a gensym result, it's easy: if it's on the loadlist, but not elsewhere on the extended slotlist, we'd better find a slot in the slotlist, and save it. For a variable, it is a little more complex. For a constant (see earlier footnote), we need never save it. Thus, we have the concept of the damnability of a result. A result is damnable43 if no unrealized load depends on it. A register that contains a damnable result may be used for something else; a register that contains a non-damnable result must be saved before it is reused44. The predicates storeaq? and savebb? are used to test the damnability of the values of AQ and BB in lcp, and save them if needed.
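The gensym case of the damnability test can be sketched like this (hypothetical Python, not lcp's storeaq? itself): a gensym result is damnable if no unrealized load on the loadlist depends on it, and a register holding a non-damnable result must be saved into a PDL slot before the register can be reused.

```python
# Hypothetical sketch of storeaq?-style logic for a gensym result:
# check the loadlist for an unrealized demand; if one exists and the
# value lives nowhere else, save it into a slot before reuse.
def damnable_gensym(tag, loadlist):
    """A gensym result is damnable iff no unrealized load wants it."""
    return tag not in loadlist

def clear_aq(aq, loadlist, slotlist):
    """Before clobbering the AQ: save its contents if non-damnable.
    Returns the (possibly extended) slotlist."""
    if aq is None or damnable_gensym(aq, loadlist):
        return slotlist                  # free to reuse the AQ
    if aq not in slotlist:               # not saved anywhere else yet
        slotlist = slotlist + [aq]       # i.e., emit a store-aq
    return slotlist

g = ('g0005', None)
slots = clear_aq(g, loadlist=[g], slotlist=[])      # must be saved
slots2 = clear_aq(g, loadlist=[], slotlist=[])      # damnable: no save
```

The variable case, as the text says, is more complex, and is sketched separately below under locvalue.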

A list of all possible forms of value tags is given in Appendix A.

When PDL temporaries are needed, the damnability test is applied (by freeable) to the contents of each slot on the current PDL map (slotlist) to see if that temporary can be reused. Note that those PDL slots which are the "homes" of local variables become reusable after cnt passes the last cnt of a use of this variable as recorded in the locvars list by pass 1. Already-used function results (which therefore don't appear on the loadlist) are always damnable, and PDL temps or registers containing them can freely be reused.

The damnability and ilocing of variables is a complex issue because variable values, unlike function values, are not unique45. All compings of the same variable between two setq's refer to the same lisp value at runtime, and can be made into references to the same value (usually the home of the variable) at compile time. In the most trivial case, let us consider the form:

      (+ x x (- y z) x)

After all arguments have been comped, the loadlist will look like this:

      ((x.347)(z.346)(y.345)(x.344)(x.343))

When iloc is called to realize the demand, i.e., come flat out and say where the value can be found, of (x.343), (x.344), or (x.347), having no reason to believe otherwise, all of these demands can be realized from the home of x (which we denote as (x.home) for this is the not-really-value-tag which appears in the slotlist position for x's home). And thus they will be.

Clearly, the function that is about to compile code to setq must put something somewhere to cause iloc to realize that (x.home) won't do, if it is about to setq x. The "somewhere" is the extended slotlist, and the "something" is a value-tag for a later or contemporary x, i.e., (x.now) where "now" is the current cnt, equal to or greater than the cdrs of outstanding demands on the loadlist. When iloc is asked to realize a demand on a dotted-fixnum (variable) value-tag, it calls locvalue to find if there are x's on the extended slotlist (with corresponding values in runtime registers, of course) later or contemporary (higher or equal cnt) than the cdr of the input value-tag. The earliest such value on the extended slotlist was stored there so that we might realize our demand on x from it. That value is the right x.

To "get something onto the extended slotlist" all we must do is load it into a register (e.g., the AQ), and indicate that we have done so by placing a value-tag for it in that extended-slotlist position (e.g., the variable AQ). That value may be shoved around, but never lost, because the unrealized demand on an x, on the loadlist, justifies this value's existence. When iloc finally calls for it, it is guaranteed that it will be somewhere on the extended slotlist, i.e., somewhere on the runtime PDL or in some register. It is not damnable.
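The rule iloc and locvalue follow can be sketched like this (a hypothetical rendering, not lcp source): the home, dups, and idups count as "as recent as can be," and the earliest value not earlier than the demand wins.

```python
# Hypothetical sketch of locvalue: realize a demand (x . n) from the
# extended slotlist, modeled as a list of (location, value-tag) pairs.
RECENT = float('inf')     # home, dups, and idups are maximally recent

def locvalue(demand, extended_slotlist):
    name, n = demand
    candidates = []
    for where, tag in extended_slotlist:
        if tag[0] != name:
            continue
        when = RECENT if tag[1] in ('home', 'dup', 'idup') else tag[1]
        if when >= n:                 # later or contemporary will do
            candidates.append((when, where))
    # the EARLIEST such value is the right one for this demand
    return min(candidates)[1] if candidates else None

esl = [(('temp', 2), ('x', 'home')),   # x's home
       (('temp', 4), ('x', 1))]       # an x saved at cnt 1
```

With this slotlist, a demand for (x.1) is realized from the saved copy in temp 4 (the earliest value not earlier than the demand), while a demand for (x.4) falls through to the home, just as in the worked example of the next section.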

VI. Value Management: An Example

Consider the compilation of the following form:

          (+ x (setq x '7) x)

Assume an initial cnt of 0. Assume x's home is PDL temp 2 (the first PDL slot), and thus slotlist position 1 contains (x.home).

comp is given (+ x (setq x '7) x).

comp passes the buck to comparith, who knows how to compile arithmetic functions. He calls comp on the first x recursively. comp generates the value-tag (x.1), conses it onto the loadlist, and returns it to comparith. comparith remembers this value.

The current state of the world: the loadlist has (x.1) on it, cnt is at 1, the slotlist has (x.home) in position 1, and comparith has (x.1) in hand.

comparith now invokes comp recursively on (setq x '7). compsetq gets the buck. The second operand is comped, obtaining a value tag of '746 which is retained by compsetq. Knowing that x is about to be setqed, compsetq calls cleanup-var-load on x to see if there are unrealized demands on x on the loadlist that do not already have x-value-tags on the extended slotlist to satisfy them. In other words, it asks "Are there compilations in progress which need the current value of x, and thus should I save it?" cleanup-var-load does this by asking the predicate gotvalue on each unrealized demand on x in the loadlist. gotvalue, given a specific x-value-tag from the loadlist, walks over the entire extended slotlist, looking for any x-value (except, obviously, (x.home), because that's gonna change right now) from which this particular demand on x (i.e., (x.1)) might be realized. gotvalue calls the predicate goodval on the contents of each element of the extended slotlist, to see if it is such a value.

When cleanup-var-load obtains nil from an answer from gotvalue on (x.1), it concludes that the current value of x must be saved, and calls savevalue on (x.1). savevalue locates x's home, and (assuming the current value of x is not already in the AQ), generates a Load AQ instruction to load that home into the AQ47. The AQ position of the extended slotlist (the variable AQ) is set to (x.idup), indicating that it contains "an idup of x," a copy of the home made for this purpose. goodval and locvalue (the former used at setq time, the latter at iloc time) and the damnability predicates all consider an idup to be as recent as can be, i.e., as recent as x.home. At the end of processing the loadlist, fixidups is called to actually rplacd all idups (and dups, which we will meet below) to the current cnt48.

The current state of the world: cnt stands at 1. comparith has (x.1) in its hand. compsetq has a '7 in its hand. The loadlist contains (x.1). The slotlist has (x.home) in position 1. AQ has the idup (x.1) in it.

compsetq now increments cnt to 3, as convention is to increment it by 2, and calls get-in-aq on the value-tag '7 (NOT the form '7) as previously returned. get-in-aq realizes that it is going to load something into the AQ that it has determined is not there already (recall that (x.1) is there now), and thus calls clearaq to call storeaq? to see if AQ must be saved. Indeed, storeaq? finds that AQ contains this variable value, and calls damned-variable on it to see if it is damnable. damned-variable searches the slotlist and finds no other value for x (it will not look in x.home, because that might be stored into later). Thus, (x.1) must be the most recent x around. It then searches the loadlist for unrealized demands on x, asking the following question of each: "If the x-value whose tag I was passed as an argument were to be damned, i.e., not kept around, would there still be some x-value somewhere on the extended slotlist from which this unrealized demand might be realized?" Since there were no other x's (other than (x.home), of course) on the extended slotlist, the answer here must be "no, there would not," or "you'd better save the value you asked me about." storeaq?, being told this by damned-variable, decides that indeed, it must save the AQ, and it generates code to store the AQ at, say, (temp 4), i.e., the second PDL location. To indicate that it has done this, the second element of the slotlist is rplacaed to (x.1), by copying what is in AQ.

Current state of world: cnt stands at 3. The slotlist has (x.home) in its first position, and (x.1) in its second. The loadlist has (x.1) on it, and AQ has (x.1) in it too. (These are not eq, by the way.)

get-in-aq proceeds to load the constant 7 into the AQ, setting the variable AQ to the value-tag '7 to indicate this. compsetq generates code to store the AQ at (x.home), i.e., the PDL temporary which is the home for x, because this is what it means to compile a setq. compsetq sets the variable AQ to (x.dup), meaning "a duplicate of the home value of x, because I either just loaded it from there or (as in this case) just stored it from here." This "dup" of x is considered to be as recent a value of x as can be had by locvalue (at iloc time), the damnability predicates (at register-re-use time) and by goodval (at setq time). The reason we use (x.dup) instead of (x.3) is so that later demands on x might be satisfied from this same value; as more and more demands on x are made from this point on, (x.dup) in the AQ can say "Yeah, we can take care of all of you." That is to say, were our form

     (list x (setq x '7) x x x x x)

the last 5 x's can all be found by iloc in the AQ, which is a good place.

Now to make this "dup" hack work, this "dup" had better know when it starts to become invalid, and had better take its philanthropic x-satisfier out of business. That time is the time at which we are about to setq x: fixidups, which we met earlier, rplacds all dups of x to the current cnt as well as idups. From that point on, (x.dup) is (x.number), and that number is the upper limit on his validity.

Dups have the property that they are always damnable, from the moment they are created. As long as they are dups (i.e., not rplacded to (x.number)), they are guaranteed to be a copy of what is in (x.home), and thus can always be dispensed with. The damnability predicates know this. Idups, on the other hand (made by savevalue) are never damnable at the moment they are created: savevalue put them there because gotvalue saw that an unrealized demand on x needed this to happen. Idups become damnable when all unrealized demands on x that had been depending on that saved value are realized, i.e., removed from the loadlist.
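The fixidups bookkeeping can be sketched as follows (hypothetical; Python lists stand in for the rplacd-able conses):

```python
# Hypothetical sketch of fixidups: when x is setqed, every (x . dup)
# and (x . idup) on the extended slotlist is rplacd'ed to the current
# cnt, capping its validity at that moment.
def fixidups(name, current_cnt, extended_slotlist):
    for _where, tag in extended_slotlist:
        if tag[0] == name and tag[1] in ('dup', 'idup'):
            tag[1] = current_cnt      # rplacd: (x . dup) -> (x . cnt)

aq_tag = ['x', 'dup']                 # the AQ holds a dup of x's home
fixidups('x', 3, [('AQ', aq_tag)])    # a setq of x is being compiled
# aq_tag is now ['x', 3]: good only for demands no later than cnt 3
```

Because the rplacd mutates the very cons that sits in the extended-slotlist model, every holder of that tag sees the validity cap at once, which is exactly the point of using rplacd rather than building a new tag.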

compsetq, while putting (x.dup) in AQ, returns (x.3) (3 is the current cnt) to its caller, comp, who conses it onto the loadlist, and returns it to comparith. (x.3) is a representation of the value of x representing the value of the setq form, not necessarily the current contents of the AQ. To return a dup as a value would be meaningless.

Current state of world: The loadlist has (x.1) and (x.3) on it. comparith has an (x.1) and an (x.3) in its hand. The AQ has (x.dup) in it. The slotlist has (x.home) in position 1, and (x.1) in position 2. The cnt stands at 3.

comparith now has two values (i.e., two value-tags as laundry tickets for compiled results), and can start adding stuff. A clever little subroutine called get-fixnum-commu is called with the value tags (x.1) and (x.3). Knowing that it is being invoked on behalf of a commutative operation, get-fixnum-commu tries to see if either is in the AQ, by calling the predicate in-aq on both. Acting as a kind of iloc, in-aq sees that it is being asked about a variable in both cases (i.e., (x.number)), and asks iloc's helper locvalue (see above) if the AQ can satisfy the demand on hand. locvalue sees that there is no x (other than (x.home)) on the extended slotlist later than (x.3), and thus (x.home), or (x.dup), which is as good as right now, can indeed satisfy this demand. The (x.dup) in the AQ can be used as (x.3). get-fixnum-commu thus removes (x.3) from the loadlist, realizing the demand on that value of x, and returns the other argument, (x.1), to comparith, having "gobbled" the (x.3) by indeed fulfilling its contract that one of its arguments is in the AQ and the other returned.

comparith now contemplates generating code to add the (x.1) into the Q register (the number part of the AQ when it contains a lisp object). Since it is going to change what is in the AQ, it calls storeaq? first to see if what happens to be in the AQ is some idup or something like that (even though it is also, by coincidence, a good value for our variable) and has to be stored49.

comparith calls ilocs, a version of iloc which refuses to see things in the AQ (or BB), storing them if necessary50. iloc is given (x.1): seeing that it is a variable (i.e., (x.1) not (gensym)) it calls locvalue. locvalue searches the extended slotlist and finds that there is indeed a value on it, (x.1) in temp 4, which is not earlier than our (x.1), and that is the earliest (and in this case only) x of which that is true. Thus, iloc returns (temp 4) to comparith. comparith compiles an instruction to add the second word of temp 4 to the Q-register.

comparith has now changed the entire state of the world. It must make this apparent. It has used the result (x.1) that it was given by comp: thus it removes (x.1) (by a call to remove) from the loadlist. Note that this implicitly renders the (x.1) in temp 4 on the slotlist damnable. The AQ no longer contains (x.dup), but a new result. So comparith sets AQ to the ncons of a gensym (function value representation). comparith now conses this value onto the loadlist to make sure that it does not get damned, i.e., the AQ reused while the remaining argument(s) are comped.

Current state of world: comparith has the value tag (g0005) in its hand. AQ has (g0005) in it. The slotlist has (x.home) in position 1 and (x.1) in position 2. The loadlist has (g0005) on it. The first addition has been compiled.

comparith now loops around its potentially-endless form and comps the final x in the form. comp, invoked recursively, increments cnt by 1, creates the value tag (x.4), conses it on to the loadlist, and returns it to comparith.

Current state of world: comparith has the value tags (x.4) and (g0005) in its hand. AQ has (g0005) in it. The loadlist has (g0005) and (x.4) on it. The slotlist has (x.home) in position 1 and (x.1) in position 2. The cnt stands at 4.

comparith calls get-fixnum-commu on (x.4) and (g0005). get-fixnum-commu asks in-aq about (g0005); being a gensym result, it is the only good value for itself, and yes, it is in the AQ. So get-fixnum-commu removes it from the loadlist and returns (x.4) to comparith. comparith calls ilocs on that (x.4). There are no saved x's on the slotlist as late as (x.4), so (x.home) (or any dup or idup of x, but, in this case, there are none) will do. It returns (temp 2), or whatever temp (x.home) lives in. (The slotlist contains (x.home) for (x.home), not (x.number) or the locator (temp 2), which is not even a value tag.) So comparith generates code to add x's home to the Q-register.

Again, the state of the world has been changed. The result corresponding to (g0005) is no longer in the AQ. What is more, (g0005) has been used. comparith deletes (g0005) from the loadlist, and sets the variable AQ to (g0006), a new gensym result. comparith notices that it finished all its arguments, and returns (g0006) to comp to cons onto the loadlist, and return to its caller.

The compilation of (+ x (setq x '7) x) is complete. The state of the world is as follows: The loadlist contains (g0006). The slotlist still has (x.home) in position 1, and the damnable variable (x.1) in position 2. The cnt stands at 4. The AQ contains (g0006). comparith is off the stack. The caller of comp on the + form has been given a value-tag for a non-damnable value now in the AQ. He can iloc it as he sees fit whenever he so chooses.
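For concreteness, the end-of-example state can be written down as plain data (a hypothetical rendering with conses as Python tuples), and the two damnability facts checked against it:

```python
# The final state of the worked example, as plain data.
state = {
    'cnt': 4,
    'AQ': ('g0006', None),                # the result of the + form
    'loadlist': [('g0006', None)],        # one unrealized demand
    'slotlist': [('x', 'home'),           # position 1: x's home
                 ('x', 1)],               # position 2: saved old x
}

# (g0006) is still on the loadlist, so the AQ is NOT damnable.
aq_damnable = state['AQ'] not in state['loadlist']
# (x.1) has no unrealized demand left, so its slot IS reusable.
x1_damnable = ('x', 1) not in state['loadlist']
```

(This simple membership test is only the gensym half of the story; for the variable (x.1) the real damned-variable predicate also consults the other x's on the extended slotlist, as described above.)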

VII. Value Management: In Summary

We see that Pass 2 value management is an interaction of four sets of primitives, keeping track of the location of results, the damnability of results, and the naming of results:

  1. comp, the producer of results, who leaves them in registers or PDL locations, and conses their tags on to the loadlist. comping produces object code, and changes variable values and register contents.

  2. iloc, (and his friends like in-aq, in-bb, get-in-aq, etc.), who finds results in the registers and PDL locations, and calls locvalue to find savings of variables to realize demands on them. ilocing may produce object code, which might even change the contents of registers, but will never have side effects like changing the values of variables.

  3. cleanup-var-load, called at the time that a variable is about to be setqed, to see if there are unrealized demands on this variable by searching the loadlist.

  4. The damning predicates, storeaq?, savebb?, freeable, and damned-variable, which, looking at a result, see if that result is destructible or must be saved. These primitives are called when code is generated to put new results in registers or PDL locations, or usable PDL locations must be found.

comp returns a result to a specific-function compiler having put it on the loadlist, guaranteeing that iloc will be able to find it, or something that will satisfy the demand for it, someplace. When a specific-function compiler calls remove on a result, to remove it from the loadlist, it becomes damnable. The damning predicates determine if the result in a specific register or PDL location must be saved or stored, because some demand for a value will not be able to be satisfied without it. cleanup-var-load exists because demands for variables are generally realized from the home of the variable, and the old value in the home must be saved if an unrealized demand on the variable exists at the time it is to be setqed.

Local variables can only be setqed in the function in which they are defined: they are lexically scoped, like PL/I or Algol variables. Thus, the compiler, while compiling a function, knows to create idups (i.e., perform a savevalue) upon them when an unrealized demand on them exists when a setq is about to be compiled. For the case of a special variable, this is not so. They are dynamically scoped: any lisp function can freely (so to speak) change the value of any special variable. Thus, pass 2 must consider any call to a function whose behavior is not known (i.e., non-builtin function) to be potentially a setq of all special variables that exist. Thus, when such a call is comp'ed (by make-call) a cleanup-var-load is done for all special variables on the loadlist (i.e., all unrealized demands on special variables). This is performed by cleanup-special-var-loads.

Among the functions that the compiler knows about, it is capable of determining which can potentially setq special variables and which cannot. For instance, cons cannot, but set (not setq) certainly can. break, or other random user-interaction functions can. Those functions which can potentially setq special variables are defined in the include file compiler-badfns.incl.lisp, and are returned as the list value of the macro badfns defined therein. Thus, make-call can check all system functions against (badfns) to see if a cleanup-special-var-loads is required. Pass 1 also checks this list when p1'ing to see if the call-out flag, p1cof, should be set. This is used to set the downward-propagating call-out flags in conditional forms, already discussed, but not yet motivated.
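As a modern illustration of make-call's decision, the following Python sketch models the check; the function name and the set contents are invented stand-ins, not the actual contents of (badfns):

```python
# Hypothetical model of make-call's test: cleanup-special-var-loads is
# required for any call to a function the compiler does not know about,
# or to a known function on the badfns list.
BADFNS = {'set', 'break'}                    # stand-in for (badfns)
KNOWN_FNS = {'cons', 'car', 'cdr'} | BADFNS  # functions the compiler knows

def needs_special_cleanup(fn):
    """Could this call potentially setq a special variable?"""
    return fn not in KNOWN_FNS or fn in BADFNS

safe = needs_special_cleanup('cons')      # cons cannot setq specials
bad = needs_special_cleanup('set')        # set certainly can
unknown = needs_special_cleanup('my-fn')  # unknown function: assume the worst
```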

There are times when it is necessary to perform a cleanup-var-load on all demands on the loadlist. This is basically for catch and errset, which can receive control from arbitrary code not compiled in this function, invoked from an unpredictable point in this function. When the throw or error occurs, the catch or errset receives control. Let us consider the following form:

   (foo x (errset
             (progn
                 (bar)
                 (setq x 5)
                 (quux))))

Upon comping the errset, an unrealized demand exists for the x, the first argument to foo. While compiling the progn, the setq of x to 5 would perform a cleanup-var-load on x, in preparation for the setq, storing an idup of x into some PDL temp. x not being special, the external calls to bar and quux do not cause a cleanup-var-load on x. At the completion of the comping of the errset form, the value-tag for the first x would presumably be ilocable at this PDL temp, for the setq caused code to load it and probably code to store it before the 5 was loaded. However, if a lisp error occurs while bar is executing, control returns to the errset without the code for saving x ever having been executed. The code that loads x for the call to foo would load garbage, for iloc has clearly lied.

The solution to this dilemma is that the comping of any form which can receive control from an unpredictable arbitrary depth within it must clean up all variable loads. cleanup-var-loads does this.

A little thought here leads us to consideration of conditional forms, where the problem is somewhat simpler. The clever reader can begin to motivate the setq list and call-out flags in such forms. I, however, cannot discuss this until I have explained the compilation of branches.


It is worthwhile stating the precise algorithm of locvalue, which, given a value-tag for a variable, finds a value (guaranteed by the damning-predicates and savevalue to be there) that can satisfy this demand. Recall that locvalue is called by iloc and those like him:

Among all the positions of the extended slotlist, i.e., the PDL, BB, and AQ, with dups, idups, and the home being considered as (x.now), the earliest x on the extended slotlist which is not earlier than the one we are looking for satisfies this demand.

Example:

 loadlist 1 2 3 5 6 7 9 10 11 12 14
 slotlist     3     7 9       12    home
where all the above are cdr's of (x.---).

(x.1), (x.2), and (x.3) demands all can be satisfied from the (x.3) on the PDL. Clearly, at time cnt = 3, a savevalue of x was done by one of the cleanup-var-load'er's, either because of an imminent setq, a random-function call if x is special, etc., as just discussed. (x.5), (x.6), and (x.7) can be satisfied similarly from the (x.7) on the PDL, (x.9) from the (x.9) saved there, and (x.10), (x.11), and (x.12) from the (x.12) saved there for this reason. The reference to (x.14) has no later x's saved: the current value of x can be used, e.g., (x.home) or a dup of x in the AQ or BB.
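The rule just stated can be modeled in a few lines of Python; this sketch is purely illustrative (infinity stands in for the current value, i.e., (x.home) or a dup):

```python
# Model of the locvalue rule: among the savings of x on the extended
# slotlist, the earliest one not earlier than the demand satisfies it.
NOW = float('inf')  # (x.home) or a dup/idup of the current value

def locvalue(demand_cnt, extended_slotlist):
    """Return the cnt of the saving that satisfies demand (x.demand_cnt)."""
    return min(c for c in extended_slotlist if c >= demand_cnt)

slotlist = [3, 7, 9, 12, NOW]  # the savings in the example above
```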

A similar consideration of the damnability predicate on the value-tag of a variable is worthwhile:

A value-tag for a variable (and thus the register it is in) is damnable if there are no demands for that variable on the loadlist for which iloc would have to return this value.

Thus, in the above example, there are no damnable quantities on the slotlist, because we gave a case above where each one of them would be found by iloc. Were, however, the (x.9) to be used, and thus (x.9) removed from the loadlist, the (x.9) on the slotlist would be damnable. Picture:

 loadlist 1 2 3 5 6 7  10 11 12 14
 slotlist     3     7 9      12    home

Another way of stating this is that a variable value is damnable if there are no earlier demands on the loadlist for which there is not a value of this variable earlier than us, but later than or contemporary to the demand.
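Operationally, then, a saved value of a variable is damnable exactly when the locvalue rule would never select it for any demand still on the loadlist. A minimal Python sketch of that predicate, with illustrative names:

```python
NOW = float('inf')  # the home, standing for the current value

def damnable(saving, loadlist, slotlist):
    """A saving (x.saving) is damnable if no demand on the loadlist would
    be resolved to it (earliest saving not earlier than the demand)."""
    for demand in loadlist:
        satisfiers = [c for c in slotlist if c >= demand]
        if satisfiers and min(satisfiers) == saving:
            return False   # some demand would be found here by iloc
    return True

slotlist = [3, 7, 9, 12, NOW]
full_loadlist = [1, 2, 3, 5, 6, 7, 9, 10, 11, 12, 14]
pruned_loadlist = [1, 2, 3, 5, 6, 7, 10, 11, 12, 14]  # (x.9) removed
```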

Similarly, we must state in entirety the algorithm of the var-load-cleaner-uppers, which determine if an idup must be made of a variable (i.e., a savevalue done) when a setq is about to be done. This is implemented by goodval:

An idup must be made at cleanup-var-load-time if there are outstanding demands on the loadlist for which iloc would find (x.now) if it were there.

Another way of stating this is that an idup must be made if there exists at least one unrealized demand for this variable on the loadlist for which no contemporary or later value other than the home exists on the extended slotlist. Examples:

x needs to be saved:

loadlist 1 2 3 5 6 7 9 11
slotlist     3     7     home

x need not be saved:

loadlist 1 2 3 5 6 7 9 11
slotlist     3     7     12 home
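goodval's question thus reduces to: is there an unrealized demand that only the home could satisfy? A hedged Python sketch of that test, where the slotlist here holds only the saved values, not the home:

```python
def must_idup(loadlist, saved_slotlist):
    """Must an idup of x be made before a setq of x?  Yes, iff some
    unrealized demand has no contemporary-or-later saving to fall back on."""
    return any(not any(c >= demand for c in saved_slotlist)
               for demand in loadlist)

loadlist = [1, 2, 3, 5, 6, 7, 9, 11]
```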

The damnability of a variable-home is the "and" of the damnability of (x.now) and cnt being greater than what is stored in locvars (last-use-cnt) for this variable. The damnability of a slot containing a function-result is the inverse of whether that result is on the loadlist.

VIII. Object Code List

lcp produces machine-language output, not LAP. Instructions produced are most often 36-bit fixnums representing a full Multics instruction. A list of such fixnums, i.e., a chain of conses with fixnums in their cars, is maintained by outwrd. The variable pc represents the simulated program-counter (instruction counter); the variable codelist keeps the head (highest address, place where things are being added) of this chain.

Most instructions, those which call operators, increment or decrement the PDL pointer, load constants, or branch backwards, are completely-determined constants at the time they are consed onto the codelist. Others, such as references to the PDL, references to functions, references to constants, and forward jumps are not. It is the function outinst, given a fixnum opcode and a representation of a runtime address, almost always an iloc answer, which must deal with this.

PDL references are not known at the time they are consed onto the codelist because the size of the PDL frame for a function is not known until the compilation of the function is finished, and PDL references are made as negative off of the AP. There is only one PDL pointer, and it points to the location beyond the top of the PDL region for this function. Thus, (temp.inst-2N), where inst-2N is a machine-language instruction whose address-field specifies twice (for 2 words per PDL slot) the PDL slot number, is consed onto codelist. At the completion of function compilation, pass2 scans the portion of the codelist for this function, rplacaing all conses with caars of temp with their cdars with the known PDL-frame size subtracted from the offset.
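The patch-back can be sketched as follows; this Python model is a loose analogy in which codelist entries are either finished fixnum instructions or ('temp', offset) pairs awaiting the frame size (real instructions would have the offset embedded in an address field):

```python
# Hypothetical model of pass2's fix-up of PDL references: once the frame
# size is known, each ('temp', offset) cons is "rplaca'ed" into a final
# AP-relative (negative) offset.
def fix_temp_refs(codelist, frame_size):
    for i, word in enumerate(codelist):
        if isinstance(word, tuple) and word[0] == 'temp':
            codelist[i] = word[1] - frame_size
    return codelist

code = [0o236000, ('temp', 4), 0o756000, ('temp', 2)]
patched = fix_temp_refs(code, 10)   # frame turned out to be 10 words
```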

Instructions which need to reference numeric constants can often be compiled into "immediate" instructions by outarith, which will wind up in the codelist as normal fixnum instructions. But often, full doubleword lisp-value typed pointer constants must be referenced. These are compiled into the text of the object segment following the function. Thus, their locations cannot be known until compilation of the function is complete. Therefore, outinst places a cons similar to that for PDL references, but with a car of literal instead of temp in the codelist. get-literal-addr keeps track of pooling these literals, via a hash table of all constants referenced maintained by make-const. Thus, pass2 rplacas all conses containing this appropriately, adding in the base of the literal pool after the function to their cdars.

Similarly for array references and function references in the object program, where conses with array and function are put on the codelist. Since these snappable pointers51 are pooled, and will appear in the LISP linkage section, the final instruction cannot be known until the layout of the whole LISP linkage section is known at the end of the entire compilation of the source file. Thus, conses with array and function as their cars are consed onto the codelist, with the template instruction containing the relative offset into the particular pool (which is known at the time the instruction is encountered). When all compilation is complete, finish-code is called, which lays out the LISP linkage section, knowing how many of each of these things there are, and scans the total code list to rplaca conses having these things in their cars with their cdars with the correct pool offset added.

Constants, e.g., list structure, symbols, bignums, and strings referenced by the compiled code also are referenced via pointers in the LISP linkage section. However, since the constant pointers (which, by the way, are garbage-collectable lisp values) are placed first in the LISP linkage, the linkage offset of constants (once uniquized by make-const) is known at the time instructions referencing them are compiled, and thus, these instructions can be consed into codelist as fixnums.

References by the object code to itself are always made in instruction-counter relative format; the object code contains no object-code addresses. Thus, when the compilation of a given function is finished, there are no instructions in it which make any assumptions about the value of the program-counter at runtime. This allows pass2 to insert code at the beginning of the function, i.e., rplacding a deep point in the codelist, with new code52 after function compilation without upsetting any code.

The entire codelist is nreversed by finish-code, to put location 0 at the top, before handing it to lisp_cg_utility_.

IX. Branching and Tags

Divergences from the standard order of evaluation are implied in the LISP language by the use of COND, AND, OR, GO, RETURN, and other hairier dynamic constructs (e.g., THROW, ERRSET, etc.). All of these constructs imply conditional evaluation or explicitly-controlled evaluation in the interpreter; in compiled code, these map into the conditional execution of code, and explicit transfer of machine control within a compiled program.

The standard technique in most compilers, which is used by lcp, is the generation of tags, or compiler-generated labels, which reduce all cases of control flow constructs to go-to's. Thus, a construct like

       (cond (predicate (do-this))
             (t (do-that)))

generates code something like this:

      (code for predicate)
      conditional-transfer middle-tag
      code-to-do-this
      transfer end-tag
middle-tag:
      code-to-do-that
end-tag:

The notion of tags is an extension of the tags in PROGs: PROG tags are a special case of tags in general. By reducing all internally-generated and implicit control transfer to the simple case of the go-to, the ultimate run-time control construct, all cases can be handled. Thus, we consider the operation of the simple tag mechanism53.

The compiler generates tags for PROG labels, middles and ends of conditional forms, etc. In cases of conditional forms54 the compiler generates branches to these tags, conditional or unconditional, as part of rendering the semantics of these forms. The compilation of an actual GO or RETURN, however, even if it appears in a conditional form, is a more complex deal, involving issues of PDL management, which will be described in a subsequent section. The compilation of a GO or RETURN, however, always culminates in the compiler generating a conditional or unconditional transfer to the PROG label (or bottom-of-PROG internally-generated label).

When the compiler (comp) is asked to compile a form like (go g0006), a primitive called outjump is called with an opcode and the tag g0006. If earlier processing has not already passed the definition of the tag g0006 (at PROG top level in comp-prog), outjump passes the symbol g0006 to outinst. But if the location of g0006 is already known, i.e., the runtime program-counter to which it corresponds, the difference of that program counter from the current one is computed, and an instruction-counter-relative55 address and an appropriate transfer are constructed. The instruction-counter value corresponding to a tag is hung off its lvalue property; if it has none, its definition is not yet known.

If an attempt is made to output an instruction which references a tag which has not yet been defined, outinst senses that this symbol has no lvalue property. It outputs an instruction with the negative value of the current program-counter, and pushes the cons of the codelist into which it did so onto the references property of the symbol operand it was given. Thus, when a tag is finally defined, the symbol which represents it has off of its references property a list of conses of the codelist, whose cars are instructions whose address fields must have the location counter at this time added to them56. At that time, the symbol gets its lvalue property as well, and all future references reduce to the previous case.

It should be clear that references to tags can be processed by outjump whether or not that tag has been defined, assuming that at some time it gets defined. Thus, the compiler can generate a symbol any time it desires, representing a tag it will define later, and put it in an instruction being handed to outjump. When the time comes to define that tag, a call to define-tag on it will do all the right things.
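The forward-reference machinery can be sketched in Python; this is a loose, illustrative analogy in which the lvalue and references properties become attributes, and an "instruction" is just an opcode/offset pair:

```python
class Tag:
    """A tag, with its lvalue and references 'properties'."""
    def __init__(self):
        self.lvalue = None      # location counter, once defined
        self.references = []    # codelist indices of forward references

def outjump(codelist, pc, tag):
    if tag.lvalue is not None:
        codelist.append(('tra', tag.lvalue - pc))  # ic-relative branch
    else:
        codelist.append(('tra', -pc))              # negative of current pc
        tag.references.append(len(codelist) - 1)

def define_tag(codelist, pc, tag):
    tag.lvalue = pc
    for i in tag.references:   # fix-refs: add location counter to each
        op, addr = codelist[i]
        codelist[i] = (op, addr + pc)
    tag.references = []

code, t = [], Tag()
outjump(code, 0, t)            # forward reference at pc 0
code += ['nop', 'nop']         # two more words at pc 1 and 2
define_tag(code, 3, t)         # tag lands at pc 3; forward ref patched
outjump(code, 4, t)            # backward reference, resolved immediately
```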

When a tag, regardless of whether it was user-generated or compiler-generated, is to be defined, it is given as an argument to define-tag. define-tag conses it onto a list of tags to be defined at the next instruction. First thought leads us to believe that this is foolish, that since the location counter of the next instruction is a certainty, we could define it right now, and go back and fix all references (done by fix-refs) at this time. However, the actual defining of the tag, being the putting of the lvalue property and the rplacaing of the forward references, is delayed until it is ascertained that the next instruction is not itself an unconditional transfer. If the next instruction is an unconditional transfer, then all references to the tags about-to-be-defined should be made into references to the target of the unconditional transfer, to avoid transferring to transfers and so on57. Thus, outjump, when about to output an unconditional jump, checks to see if there are labels that were supposed to be defined as this instruction. If so, a fix-refs is done to all forward (already-encountered) references to these labels, giving the target of our instruction (assuming that it has already been defined, i.e., has an lvalue property) as a definition for each of those labels. If our target has not yet itself been defined, then all of these "tags to be defined" are hung off of the synonyms property of our target tag. When a fix-refs is finally done on our target tag, either by the normal (label is defined) mechanism or the transfer-tensioning mechanism, fix-refs will process references to these synonyms as well as references to our target tag, including our instruction. Examples:

Source construct:

          (prog ()
           c
           (cond ((p1) (do-this))
                 (t (do-that)))
           (go c))
The tag c is first encountered. It is handed to define-tag, which conses it onto labels-to-define. The code for the first predicate is compiled. When the first attempt is made to output a word, by outwrd, he sees that there are labels to define, and defines c as "here," giving it an lvalue property of the current location counter. The code for the first predicate is then output:
c:
          code-for-predicate
          conditional-transfer g0053
where g0053 is a compiler-generated gensym being the tag for the "middle-of-the-cond" place which hasn't been defined yet. g0053 gets a references property of a list containing the cons containing that last conditional-transfer instruction in the codelist. Object code for the first consequent is generated.
          code-for-do-this
          transfer g0054

g0054 is another compiler-generated tag, this time representing the as-yet-undefined "end-of-the-cond" place. It gets a references property of the list of the codelist cons containing this transfer instruction. The second clause is now compiled: The middle-of-the-cond label, g0053, is now defined (compcond kept it on hand) by passing it to define-tag, who conses it on to labels-to-define. The code for the second-consequent58 is now generated:

When an attempt is made to output the first word of code-for-do-that, outwrd sees that there are labels to define, and defines g0053 as "here," fixing the previous conditional transfer by changing its address field, which it finds from g0053's references property:

g0053:  code-for-do-that
A transfer is not generated by compcond for the last clause; the object code drops through to the end-of-the-cond tag. compcond now defines the end-of-the-cond tag by passing g0054 to define-tag. It gets consed onto labels-to-define, which had been nil. We now attempt to compile the (go c): outjump is handed c, with an opcode of "unconditional transfer." outjump sees that there are labels to define, namely g0054, and sees that c is already defined, from its lvalue property. So a fix-refs is done on g0054, with the location of c. The conditional transfer to g0054 which had been generated is found from the references property of g0054, and is changed to a transfer to c. The unconditional transfer is now generated anyway for the benefit of the second cond clause.
          transfer c
The case of synonyms is much harder to get from reasonable source constructs, although constructs like
         (cond ((this)(that))
               ((otherwise)(maybe)))
         (go a)

where a is further down the PROG, and has not been defined yet, will give a synonyms property of the end-of-the-cond tag when an attempt is made to define that.

There is also a fairly recent hairy hack to condense conditional transfers around transfer instructions into transfers in the opposite sense to the unconditional transfer's target. This also includes eliminating transfers to "the next location." Both of these constructs are generated naturally by the compiler because of the interactions of compilations of control-transfer and conditional constructs at different levels. These peephole optimizations are performed by tra-adjust. Whenever an unconditional transfer is compiled, the next attempt to output a word calls tra-adjust, which inspects any labels to be defined to see if they are referenced by the penultimate instruction. Since that unconditional transfer cannot have any labels on it (branches to it; they would have been tensioned out), it is eliminated, and the conditional transfer 2 instructions back gets its sense inverted, and gets redirected as a reference to the unconditional transfer's target.

Example:

          transfer-zero x
          transfer y

becomes

          transfer-nonzero y

when an attempt is made to define x as the next location. The function tra-adjust takes care of moving around the references property between x and y, etc.
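A toy Python version of this peephole; the opcode names and the representation of instructions are invented for illustration only:

```python
# Collapse "conditional transfer around an unconditional transfer" into a
# single conditional transfer of the opposite sense.
INVERSE = {'tze': 'tnz', 'tnz': 'tze', 'tmi': 'tpl', 'tpl': 'tmi'}

def tra_adjust(codelist, labels_to_define):
    (cop, ct), (uop, ut) = codelist[-2], codelist[-1]
    if uop == 'tra' and cop in INVERSE and ct in labels_to_define:
        codelist[-2:] = [(INVERSE[cop], ut)]  # inverted, retargeted
    return codelist

before = [('tze', 'x'), ('tra', 'y')]
after = tra_adjust(before, {'x'})   # x is about to be defined "here"
```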

X. Branching and Value Management

Transfer of control within the object code produced by the compiler has widespread and deep implications upon the value management scheme described in a previous section. Code for conditional forms is compiled linearly, one edge59 at a time. The model of the runtime PDL and registers maintained by the extended slotlist, as described up until now, assumes that their contents at the beginning of each instruction are the same as at the end of the previous instruction. In the case of an instruction with a tag on it, i.e., the target of any kind of branch, this is not so. The compiler must be able to model the contents of the runtime registers at all times: thus, when a referenced label is encountered (and defined, thus define-tag concerns itself with this), the extended slotlist must be reconstructed from the contents of the extended slotlist at each point which branches to this tag. Therefore, unless a given value is in the same position in the extended slotlist whenever a (conditional or unconditional) branch to a given tag is compiled, that value may not be assumed to be in that position of the extended slotlist when the code at that tag is compiled. The extended slotlist at a tag may be described as the intersection of the extended slotlists at its forward references. If a tag has backward references60, nothing may be assumed to be in any position of the extended slotlist, except the homes of all variables defined in all regnant lambdas and progs. This is because we have not compiled the code yet that branches here, and we have no way of telling61 what will be in the runtime registers and PDL when it branches here62.

The compiler therefore merges extended slotlists to form the extended slotlist at a tag. The bulk of this work is done by the function level-tag, which is used by the compiler every time it wants to call outjump on a symbol representing a tag. level-tag is thus called when a branch to a tag is about to be generated. level-tag maintains off of the level, AQ, and BB properties of the tag a copy of the extended slotlist which will be merged with the to-be-current slotlist at the time the tag is defined. The first time a tag is referenced (level-tag is called each time), the current extended slotlist is copied, and hung off these properties of that tag. Each succeeding time, until the tag is defined, the extended slotlist at that time is merged into the copy hung off these properties, setting to nil (empty) those positions in the copy that do not hold the same value in both.

When the tag is finally defined, define-tag merges the saved, merged slotlist with the current slotlist. If the previous instruction was an unconditional transfer (as indicated by the variable useless-code), there is no current extended slotlist: the saved image becomes the real one.

The extended slotlist at a referenced tag is therefore the intersection of the final extended slotlists over all edges that lead to that point.
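The merge itself is simple; a Python sketch of the positionwise intersection (None standing for nil, an emptied position):

```python
def level_tag(saved, current):
    """Merge the current extended slotlist into the copy saved at a tag;
    positions that disagree are emptied (set to None, i.e., nil)."""
    if saved is None:                 # first reference: just copy
        return list(current)
    return [a if a == b else None for a, b in zip(saved, current)]

at_tag = None
at_tag = level_tag(at_tag, ['x.home', 'x.3', 'g5'])  # first branch here
at_tag = level_tag(at_tag, ['x.home', 'x.3', 'g7'])  # second branch here
```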

The situation is complicated further by the existence of unrealized variable loads. Consider the form

  (foo x (cond ((what)(setq x 5))))

When time comes to comp this form, comp returns, say (x.356) for the first x, consing this on to the loadlist. The cond form is now comped, by make-call, who is sequentially comping arguments to the foo form. At the time the setq form is comped, cleanup-var-load, invoked by compsetq on x, notices that x is about to be setqed, and that there is a demand on x on the loadlist that depends upon the current x. So, as we have seen, the value of x is saved, and winds up in a slotlist position, as an idup, at the time the comping of the cond form is finished. But what happens when the "end-of-the-cond" tag is defined by the code after the cond? The merging of slotlists with the code that branched here if (what) were nil insists that that idup not be in the slotlist, for surely, it was not there when the conditional transfer after the predicate was compiled. What this is saying is that dependent upon the object-time value of the (what) result, x either got saved in some slotlist position or didn't. Well, one might argue that that's reasonable, in one case x got clobbered, and in one case it didn't. But the compiler will have a jolly hard time ilocing x when the arguments to foo are to be prepared, because it cannot know how the object code arrived here63. Thus, the compiler must ensure that (x.356) will be ilocable at the merge point of all edges emanating from the point where the demand on (x.356) was placed on the loadlist. There is never any problem (i.e., the previous explanations hold) as long as there are no conditional branches. But at the point where a form which can conditionally branch is encountered, idups must be made of all unrealized variable demands for which idups would have to be made along any edge of the cond. This must be done at the furthest point in the program where you "have to go through" to get to the point where the demand on (x.356) is realized64. 
Therefore, at the time a conditional form is encountered, we must know precisely what variables are setqed anyplace within it, so that we might make idups of them at this point, the back dominator of the demand-realization-point, so that each path to the demand-realization point will have a usable value for that variable at the same point in the slotlist. Once a non-damnable variable is in the slotlist (not the extended slotlist---this is not so of the AQ and BB) it will stay there. A non-damnable value in the AQ or BB is guaranteed to be saved somewhere, so it is ilocable, but is not guaranteed to stay in the AQ or BB. Thus, the code at the back-dominator of a delayed variable load65 before a conditional form must do a cleanup-var-load on each variable setqed in that conditional form, to be sure that an idup is made, and then do a storeaq? and a savebb to make sure that these idups will be in the same place at the target of all of the conditional transfers.

This is precisely the long-awaited reason why pass 1 goes through all this hair to maintain lists of setqed variables and call-out flags in all conditional forms. If any external function or "badfn" function is called in the conditional form, a cleanup-special-var-loads must be done as well.

Note that we already encountered something related to this in the handling of ERRSETs and CATCHes in an earlier section.

XI. PDL Height Management

Each lisp function, as it begins execution, raises the marked PDL pointer, AP, up above its arguments, to cover its frame, the first n positions of which are the homes of its n arguments. The PDL frame includes enough locations to cover the maximum number of regnant lambda variables at any point in the function body, and enough (multiplexed) temporary locations to hold all intermediate results and idups that need be stored. The size of the frame is determined by the end of function compilation, after the slotlist has been grown to its maximum length. The slotlist grows when a temporary is needed (usually because the AQ must be saved because the AQ is needed and its contents are non-damnable) and no current temporaries have damnable contents. Thus, quantities are not "pushed" onto the PDL66 but are stored in fixed locations relative to the PDL pointer.

However, when a function calls another function, the first function must place the arguments for the second function above its own frame on the PDL, for the slots containing those arguments will become the first slots of the PDL frame of the second function, and be popped by it. The "arg temps" of function calls are not managed by the slotlist mechanism: when another function is about to be called, AP is bumped up to cover all of the arg temps67 and the function arguments are loaded into the AQ and stored into those temps (by storearg), one by one, although an attempt is made to group together the loading of results with identical iloc'ings.

The AP can be bumped before the comp'ing of the arguments to the function, or after. It will be popped by the function itself on return from the call. The variable arg-height indicates by how many arg temps the PDL frame is extended; note that the addressing of arg temps is a constant off of AP regardless of the frame size, as they are above the frame no matter how large the frame is. This allows instructions which refer to them to be output as constants immediately, without the patching-back described above for other PDL references (see storearg).

The compiler tries to avoid bumping arg-height (and thus AP at runtime) until after the comp'ing of all the arguments. This is to save AP-pushing instructions. For instance, if the form

    (f1 (f2 (f3 (f4 ...) ...  )))

were being compiled, the first step of compilation of f1's form would be to push AP to cover f1's args. The next step would be to compile f2's form, which would start by pushing AP to cover f2's args as well, etc. If we avoid moving AP at all until somebody actually wants to store something there, we can push AP all at once.

In order to carry this out, a variable called bump-arg-height, normally zero, is set before the comping of any arguments to how high the PDL ought to be bumped. When any attempt is actually made to store into an arg temp, which normally occurs after all the args have been comp'ed, storearg (or storearg-bb) calls a function called force-arg-height, which bumps AP by bump-arg-height, and sets the latter to zero.
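A Python sketch of this lazy bumping; the class and the opcode name are invented, and two words per PDL slot is taken from the description of temps above:

```python
class Frame:
    """Toy model of deferred AP bumping during argument compilation."""
    def __init__(self):
        self.bump_arg_height = 0   # words AP still "owes"
        self.arg_height = 0        # words AP has actually been bumped
        self.code = []

    def reserve_args(self, n):     # noticing a call with n arguments
        self.bump_arg_height += 2 * n   # two words per PDL slot

    def force_arg_height(self):    # storearg, or a transfer, demands temps
        if self.bump_arg_height:
            self.code.append(('bump-ap', self.bump_arg_height))  # one push
            self.arg_height += self.bump_arg_height
            self.bump_arg_height = 0

f = Frame()
f.reserve_args(2)      # outer call (f1 ...)
f.reserve_args(3)      # nested call (f2 ...)
f.force_arg_height()   # first actual store: a single AP push covers both
```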

There are several other circumstances that can cause the arg-height to be forced, i.e., the arg temps that are going to have to be there made to appear. The compiling of any kind of transfer (as seen by a reference to a tag by level-tag) implies that there must be some agreed-upon arg height at any tag. That is to say, we must know what arg-height the compiler will assume to be in effect at the tag to which we are transferring. If the tag is an internally generated one, let us say from a COND, AND, or OR, then the convention is that the arg height there (i.e., at the forward tag) is the same as it should be here (i.e., the current arg height after it has been forced). Thus, when a (conditional or unconditional) transfer is compiled, the arg height is forced, so that if we branch, AP is right, and if we don't, AP is what it had to be set sooner or later to anyway. The end or middle of the COND, AND, or OR must be part of the same argument to the same function as the beginning, i.e., the place where the transfer was encountered. Thus, there cannot be fewer or more arg-temps on the PDL at that place. This is why this convention is possible.

If the transfer is that to a user-defined (i.e., PROG) tag, which can only occur via the go fexpr, the situation is more complex. Consider the form

    (prog ()
       (foo a b
          (bar (cond ((p1) 
                      (go a)......
       (....)
     a (.....

At the time the GO is compiled, the arg height for the calls to foo and bar will already have been forced by the call to p1 to include the arg-temps of foo and bar. At the tag a, there are no arg-temps68 on the PDL. If the (GO A) branch is taken, the compiler must pop AP down to the level at a. Therefore, when a GO is to be compiled, the compiler must know the arg-height at the tag which is being gone to, and generate code to pop AP. Note that the arg-height at the top-level of any PROG is the same throughout the PROG; this is because it is top level: PROG tags may only exist at top level. Thus, compprog maintains a list called framel of arg-heights associated with PROGs. When a GO is about to be compiled, unwindgoret goes down as many levels into this stack as the PROG depth that Pass 1 put in GO69 forms to find the arg height of top-level of the PROG whose label70 is being GOne to. unwindgoret calls popap enough times to generate code to unwind as many PROG frames and their arg heights as necessary.
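The unwinding that unwindgoret performs can be sketched as below. This is a hypothetical illustration, assuming (as the text states) that framel holds the arg height current at top level of each enclosing PROG and that Pass 1 has annotated each GO with the number of PROG levels it jumps out of; the function and instruction names are invented.

```python
# Sketch of PROG-frame unwinding for GO: pop AP down, frame by frame,
# to the arg height that holds at top level of the target PROG.

def unwind_for_go(framel, current_height, prog_depth):
    """Return pop-AP pseudo-instructions (one popap call per PROG frame
    crossed) bringing AP down to the target PROG's top-level arg height."""
    code = []
    height = current_height
    for level in range(prog_depth):
        target = framel[-(level + 1)]   # arg height at that PROG's top level
        if height > target:
            code.append(("pop-ap", height - target))  # popap
        height = target
    return code

# A GO from inside the calls to foo and bar (arg height 5, say) out to a
# PROG whose top-level arg height is 0: one pop of all 5 slots.
print(unwind_for_go([0], 5, 1))   # [('pop-ap', 5)]
```

Going out through two PROGs whose top-level heights are 2 and 0 from height 3 would similarly produce two pops, one per frame crossed.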

The compiler forces arg-height while compiling function arguments if it finds that one of the argument comp'ings has left a result in the AQ. In this case, the one instruction needed to bump the AP now, so that the result can be stored at once, more than makes up for the multiple instructions that would otherwise be needed to save this result and load it back after the arg height had been forced once all arguments had been compiled.

The compiler also forces arg-height at other random times, such as during the bindings of special variables (when the bindings will be above the args, the args must appear), at PROG labels (to validate the assumptions about transfers to PROG labels), and at CATCHes and ERRSETs, to validate assumptions about transfers to these forms from within.

XII. Compilation for Predicate

In addition to the two notions of compilation for value and effects71, there exists in Pass 2 a notion of compilation "for predicate," meaning that a form is being compiled so that its value's nilness or non-nilness will be used to determine whether a runtime conditional transfer will indeed transfer.

Consider the form:

   (cond ((eq x y) (setq z 3)))

The obvious compilation of this code would look something like this:

   (code-to-load-(eq x y)-into-AQ)
   compare-aq nil
   transfer-equal end-tag
   (code-for-setq)
end-tag:

Now the code to compute the eq form is going to do something like load the value of x, compare it with the value of y, and execute code to load either t or nil as a result. If the form we were dealing with were

          (setq b (eq x y))

that would be reasonable. But it is not, and the conditional loading of t or nil, only to test it again to see which one it was and branch on that, is wasteful and inefficient.

If the compiler of the cond could tell the compiler of the eq "Look, eq, you're one of these guys who makes a decision and loads t or nil (i.e., a predicate), so you know and I know that among your object code is going to be a conditional transfer to load one or the other. Why don't you avoid loading it, I'll avoid testing it, and we'll both save a lot of useless work. I'll give you a tag to which to generate your conditional transfer in case of runtime nil, and everybody will be happier."

Of course, I would not be telling you this if it were not what is in fact done. It is, however, a little more subtle than that. The compiler is optimized to handle the case of predicates being used as predicates: the case of a predicate being used for value is handled by compsubr, which invokes the other case, the one called "compilation for predicate," handing it a tag, and then loads t or nil based upon whether or not this tag is transferred to.

The function which compiles predicates is compred. It is called by compcond, compandor, and the like, and by itself recursively, as would be expected. These functions call compred with any form which is to be compiled for predicate: if the form is not that of a lisp predicate, compred compiles it for value, and (via testnil) generates code to load t or nil based on the outcome. The critical arguments to compred are thus a predicate form and a tag to which a conditional transfer is to be generated. compred need not hand back a value-tag, just as compilation for effects need not: compilation for predicate does not result in any value.

There are several other tricks fundamental to compilation for predicate. Another argument to compred says whether the supplied tag is to be transferred to for nil or non-nil outcome. Thus, the compilation of not (or null) is simply a recursive call on compred with this flag inverted: not generates no code at all. The remaining argument to compred tells compred, if non-nil, that the value of the predicate form should be in the AQ when the conditional branch is taken. This is used, for example, in the compilation of AND and OR, where the value of the form is defined to be the value of the winning (or losing) form. Each form would be compiled internally with a load flag of t.

Now compilation for predicate can get awfully clever: if conditional forms like AND are compiled for predicate72, the tags of the outer conditional can be passed down to the compilation of the inner conditional, and the loading of conditional results can be avoided. compred is particularly clever about compiling CONDs for predicate in this way.

XIII. Object Type Strategy

Because Multics Lisp objects carry type information in their values, there is no overhead in storing a fixnum or flonum in storage as compared to implementations where space in a reserved area of the address space must be allocated. Temporary fixnum or flonum results may be stored in temporaries on the PDL as Lisp objects. If a numerical calculation is performed in the Q register, on fixnums, all that is needed to develop a LISP object is to load the type-bits word for a fixnum, which is a constant, into the A register. The AQ may now be stored as a LISP object.

The compiler maintains only one lisp object in the A, Q, or AQ at once. The variable AQ contains the value-tag for that object at all times, or nil if nothing or garbage is known to be in the AQ73. The variable AQ-state tells whether or not a typed LISP object is in the AQ: if its value is nil, one is. Otherwise, either there is a fixnum computation in the A (and AQ-state is the symbol A), or in the Q (and it is the symbol Q), or there is a flonum computation in the E and AQ registers74.

When code is generated to load any LISP value, from the PDL or LISP linkage, or a function call has just been compiled, it is known that the AQ contains a typed object, so AQ-state is set nil. When fixnum or flonum constants are loaded from immediate operands of machine instructions, the AQ will not contain a LISP object. Before the AQ can be stored in an arg temp or in a PDL slot, it must be typed (by put-type-in-aq). Because there is only one PDL, and it is subject to garbage collection, only typed results may be stored there.
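This bookkeeping can be sketched as below. It is a hypothetical modern illustration under the stated conventions (aq_state of None standing for the nil value, meaning a typed object is in the AQ); the class and the instruction mnemonics are invented, not lcp's.

```python
# Sketch of AQ-state tracking: None means a fully typed Lisp object sits
# in the AQ; "Q" means a raw fixnum computation sits in the Q register
# and must have type bits supplied (put-type-in-aq) before it may be
# stored on the garbage-collected PDL.

class AQTracker:
    def __init__(self):
        self.aq_state = None   # nil: typed object (or nothing) in the AQ
        self.code = []

    def load_lisp_value(self, loc):
        self.code.append(("load-aq", loc))
        self.aq_state = None          # loads of Lisp values are typed

    def fixnum_op(self, op, operand):
        self.code.append((op, operand))
        self.aq_state = "Q"           # raw, untyped fixnum now in the Q

    def put_type_in_aq(self):
        # Supply the constant fixnum type word in the A before any store.
        if self.aq_state == "Q":
            self.code.append(("load-a", "fixnum-type-word"))
            self.aq_state = None

    def store_temp(self, slot):
        self.put_type_in_aq()         # only typed objects may reach the PDL
        self.code.append(("store-aq", slot))

t = AQTracker()
t.fixnum_op("add-q", 1)
t.store_temp(3)
print(t.code)  # [('add-q', 1), ('load-a', 'fixnum-type-word'), ('store-aq', 3)]
```

The single load-a of a constant is the whole cost of "boxing" here, which is the economy the section's opening paragraph claims for value-typed objects.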

Local declaration allows the compiler to generate code to store fixnum and flonum results in the second words of PDL temporaries, knowing that the first word (the type bits) will be valid (those of an ungarbage-collectable fixnum or flonum). The programmer declares variables fixnum or flonum to promise this to the compiler. Thus, the compiler generates code at the beginning of each function to place valid fixnums and flonums in these slots (of value 0 or 0.0)75 which are so declared.

The compiler also allocates temporaries on a typed basis. When the AQ has to be stored, and contains numbers, without type bits, saveaq attempts to find a typed temporary, such that the A or Q can be stored without loading the type bits76. If there is no such temporary available, findtemp simply declares that the PDL frame of this function is one larger than it had been, and grows the slotlist, returning the iloc of the new temporary. The list slot-types maintains the type of each slot.

The compiler can get away with this because of the feature described under "Object Code List," whereby it goes back and patches the object code list for the beginning of the function to include code to initialize the type fields (initialize-slot-types does this) of these temporaries. The fact that all text references in the compiled function are instruction-counter relative allows code to be so patched in without disturbing that which was already compiled.
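The allocation-and-patch scheme of the last two paragraphs can be sketched as follows. This is a hypothetical illustration of the technique; the class, method names, and pseudo-instructions are invented, and only the slot-types list and the grow-then-patch behavior come from the text.

```python
# Sketch of typed temporary allocation with back-patching: slot_types
# records the type of each PDL slot; if no free slot of the wanted type
# exists, the frame simply grows, and the function prologue is later
# patched (as initialize-slot-types does) to preinitialize each typed
# slot's type word so the GC always finds a valid object there.

class Frame:
    def __init__(self):
        self.slot_types = []   # declared type of each PDL temporary
        self.in_use = []

    def find_temp(self, want_type):
        # Prefer a free slot already carrying the right type bits, so
        # that only the value word need be stored (saveaq's goal).
        for i, (ty, used) in enumerate(zip(self.slot_types, self.in_use)):
            if ty == want_type and not used:
                self.in_use[i] = True
                return i
        # Otherwise declare the frame one slot larger than it had been.
        self.slot_types.append(want_type)
        self.in_use.append(True)
        return len(self.slot_types) - 1

    def prologue_patch(self):
        # Code patched in at function start: one type-init per slot.
        return [("init-slot", i, ty) for i, ty in enumerate(self.slot_types)]

f = Frame()
print(f.find_temp("fixnum"))     # 0  (frame grows by one slot)
print(f.find_temp("flonum"))     # 1  (no free flonum slot; grows again)
print(f.prologue_patch())        # [('init-slot', 0, 'fixnum'), ('init-slot', 1, 'flonum')]
```

The prologue list is computable only after the whole body has been compiled, which is why the instruction-counter-relative text mentioned above matters: the patched-in code displaces everything after it without invalidating any reference.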


Appendices

Appendix A -- Value tag forms

The following is a table of the various forms value tags can take:
(quote (a b c))
Value tag for comping of a constant, '(a b c)
(g0006)
Value tag for comping of a function call: Has a nil cdr.
(x . 357)
Value tag for comping of a variable: Has a numeric cdr.
(x . home)
The value tag in the home of x; i.e., the PDL slot where x lives.
(x . dup)
A copy of (x . home) by loading, or from just having stored.
(x . idup)
A copy made of (x . home) to save a previous value for an unrealized demand.
nil
No meaningful value in this register.

The following is a table of all of the various forms that "iloc" answers can take:

(quote xxx)
A constant, usually to be put in constant table.
(special y)
The value cell of the special variable y
(temp 6)
PDL slot 3 (slots are two words each); the arg height will be subtracted. Arg temps are never returned by iloc, so this form can never designate an arg temp.
AQ
The AQ register
BB
The BB register.

Appendix B -- NCOMPLR and LCP

Written February, 1988

So infrequently has the above strategy of compilation been explored in print, that the above paper has acquired a reputation of actually explaining NCOMPLR, the hoary spiritual parent of LCP. In fact, as stated in the preface, some of the important strategies of NCOMPLR are reproduced in the Multics Compiler, and explained in this paper.

As a gesture towards those attempting to understand what can be understood of NCOMPLR through this paper, and in recognition of the fact that no similar paper, published or not, has ever been written about the '10 compiler, I will enumerate here those notable differences in strategies and capabilities between the two compilers that are known to me.

The ITS data representation strategy is described excellently at length in MIT AI Lab Memo 420, Data Representation in PDP-10 MacLisp, by Guy Steele, dating from September 1977. His companion paper of the same date, Memo 421, Fast Arithmetic in MacLisp, describes the number and PDLNMK strategy in detail. (Neither deals with the details of the compiler, however.)

NCOMPLR is (was, in 1978, version 712) about one-and-a-half times as long as LCP. Functions seem to have many more arguments, many of which cover obscure cases. The implications of the complex strategies described above pervade the basic organs of the compiler. By the time I encountered NCOMPLR in 1977, the program had acquired a reputation of monstrous, convoluted complexity, to the point where no living person claimed to understand it, and no person could deterministically modify it, for new features or bug fixes, without fear of damaging it in a way he or she would not be able to undo: it had become a metaphor for complexity and opacity gone berserk. This reputation inspired me to study and attempt to understand the code of NCOMPLR, and it was largely borne out. Although a good piece of this complexity arises out of Lisp pointers not large enough to encode numbers (a difficulty present on neither Multics nor dedicated Lisp architectures), much of it comes from the flexibility of many general-purpose registers, which is not present on Multics, if compensated for by the use of stack caches in more modern architectures.

At this point--


Appendix C -- Relevant Humor

Written February, 1988

In January 1978, while preparing for the annual teaching of the MIT IAP (January) Lisp Course, Daniel Weinreb and the author ran the above paper through "Dissociated Press," a toy program at the MIT AI Lab that scrambles text while preserving strings of words, splicing passages at words they have in common. Some remarkable wisdom passed its innocent lips, which soon acquired a reputation of its own. The output is reproduced here:

,can get you out of the home value of x, because I either just loaded by the function call operator (2) with the address of the of the object program. This is a block into which the function ("load it back after the arg height had been forced after all arguments had been compiled. When the first attempt is made of all unrealized variable demands for which we denote as "(x.home)", for this variable value, and calls damned thing. (3)"

lcp and NCOMPLR implement the second approach. The "globally addressable list" of comped-but-unused results is (x.<number>), and that number in second word.

This is insane. What we clearly want to do is not exactly clear, and is rooted in NCOMPLR.78 Experiment and consultation have shown that indeed this alternative works. Bear in mind that this is not the concept of (x.---).

(x.1), (x.2), (x.3) is so that later demands on x might be satisfied from this same value; As more and more on this later. Let us also mention in this context that the obscurely-named function p1bug is used every time a conditional form as it happens.) Declaring arrays seen as (array foo ... the saved image becomes the real one.

The extended slotlist at a referenced tag is therefore the intersection of the final extended slotlists over all edges that list merged into all regnant lists after the previous binding of the particular list-head is restored (1). This is done by the

The pair of sentences indicated as bold above is found uproariously funny by many, on account of NCOMPLR's widespread, deserved notoriety as a program grown to stupendous complexity, well past the limits of human comprehension, let alone its own maintainability. The fact that "What we clearly want to do is not exactly clear" is rooted in NCOMPLR makes more than a little sense. The phrase "rooted in NCOMPLR" subsequently became current, and the two sentences above were incorporated by Weinreb at the head of the Symbolics (3600 Series) Lisp Compiler.

