     The  Data  Storage  and  Retrieval  subsystem  of  the  Data
Management System has  a need for a more  powerful error handling
mechanism  than  the standard  status code-based  mechanism.  The
current  status  code mechanism  is  inadequate for  the  DS&R in
basically four ways:

   - information loss;
   - incomplete information;
   - lack of debugging and error detection aids;
   - possible  performance  degradation   incurred  carrying  the
     "code" parameter from module to module to module.

The DS&R uses an error handling mechanism based on the signalling
mechanism  to  overcome  the   limitations  of  the  status  code
mechanism.

     Actually, only the user ring portion of the DS&R (all except
the  file_manager_) uses  this new  mechanism.  The file_manager_
and  the  Integrity  Services  subsystem are  excluded  from this
discussion so as  not to require dealing with  issues relative to
running  in an  inner ring.  This  is not  to say that  it is not
desirable to include  all of DMS.  Comments on  how to modify the
error mechanism described in this MTB to be usable by all of DMS
are welcome.

2.1 Information loss

     Due to the layered design of the DS&R subsystem, errors must
be reported in  a manner meaningful to the  caller at each layer.
The result of this requirement  is that errors encountered at low
levels are potentially translated at each layer's interface, each
time  losing information  about the actual  error.  The following
call/return  sequence  demonstrates  such a  loss  of information
through code translation:

     Call relation_manager_$get_tuple
       Call index_manager_$get_key
         Call collection_manager_$get_header
           Call collection_manager_$get_element
           Return with error code no_element
         Return with error code collection_not_found
       Return with error code collection_not_found
     Return with error code index_not_in_relation

The final  code, index_not_in_relation, alerts the  caller to the
problem on the  caller's level, i.e., the index  specified in the
call is not in the relation.  However, the actual error, that no
element was found, is lost.  The DS&R needs a way to report
errors without losing such information.

2.2 Incomplete information

     This  example  also  illustrates  the  lack  of  information
reported   by   the   status   code   mechanism.    The   routine
collection_manager_$get_header   knows  the   element  for  which
collection_manager_$get_element  was  looking,   but  it  has  no
convenient way to  convey that information to its  caller or to a
user investigating  the problem.  Information such  as that which
can be provided  via sub_err_ would be very  helpful in detecting
errors, especially if there were information associated with each
interesting level on  the stack.  The following is  an example of
the type of information desired:

     (at relation_manager_$get_tuple):
          The  specified index  does not  exist in  the relation.
          The index with the identifier of 16o could not be found
          in the relation with opening identifier of 340561o.

     (at collection_manager_$get_header):
          The specified collection could  not be found.  There is
          no collection_header at control  interval 0, slot index
          14.

     (at collection_manager_$get_element):
          The  specified element  was not found.   The element at
          control interval 0, slot 16 has been freed.

This   information  quickly   points  out  the   existence  of  a
programming   error,    as   collection_manager_$get_header   and
collection_manager_$get_element   think   they  are   looking  at
different locations (slots 14 and 16 respectively).

2.3 Lack of ancillary error detection aids

     The manner in which the DS&R is used, both in production and
in  debug, lends  itself to  some additional  debugging and error
detection aids.  Two such aids are  the ability to log errors and
the  ability  to  maintain/report information  about  the process
state at the  time an error occurred.  Logging  certain errors is
desirable during debug to spot  all occurrences of certain errors
and in production to track commonly encountered errors.

     Debugging the DS&R has required heavy use of long absentee
processes.  Errors encountered in an absentee process are very
difficult to investigate, since developers cannot examine the
process which encountered the problem.  A trace of the stack
taken at the time of the error provides a good deal of helpful
and timely information about the process state.  Logging and
stack traces would both help developers debug problems and help
Beta test sites report problems accurately.

2.4 The status code parameter

     The DS&R  contains very many  modules and a  typical call to
the relation_manager_  produces a large number  of calls to lower
level modules.  Currently each call includes a code parameter and
most calls are followed by the standard cliche:

          if code ^= 0
          then call ERROR_RETURN;

The large majority of modules do not care what the code is, other
than that it is zero or non-zero, and pass the code on to the
caller.  Many modules, then, incur an unnecessary expense when
calling modules by having to pass an argument of little use, and
again after the call by having to check the value of that
argument.  In effect, most DS&R modules want only to be unwound
past if an error of any type has occurred, and returned to if no
error has occurred.


     The dm_error_util_ mechanism is designed to meet some of the
needs of the  DS&R which standard status code  mechanisms fail to
satisfy.   The   major  components  of  the   mechanism  are  the
signalling mechanism, dm_error_objects, the dm_error_util_ module
and a protocol by which modules use the mechanism.

3.1 The signalling mechanism

     Modules  are alerted  of errors  via the  Multics signalling
mechanism.  The  majority of DS&R  modules do not  handle errors.
Currently most modules that receive a non-zero status code from a
called module  simply pass the  status code on  to their callers.
The  signalling  mechanism  approach  frees  these  modules  from
dealing  with  status  codes, removes  code-checking  cliches and
removes status codes from their calling sequences.

     Those modules which actually handle errors set up "on units"
for the dm_sub_error_ condition.  The dm_error_util_ mechanism
uses the dm_sub_error_ condition in much the same way as the
sub_err_ system routine uses the sub_error_ condition.  The
dm_sub_error_ condition is used instead of the sub_error_
condition so as to avoid possible confusion between
dm_error_util_ and sub_err_.

     The performance penalty paid by  those modules that must set
up  dm_sub_error_  on  units is  not  expected to  be  very high.
Because  the  number  of  modules  needing  such  an  on  unit is
considered to be low, the  relative price should be nominal.  The
savings gained by freeing all  modules of processing status codes
may even be greater than  the performance penalty.  The reader is
reminded that this analysis is not based on actual measurements.

3.2 Error objects

     Associated with each instance of the dm_sub_error_ condition
is a linked list of dm_error_objects.  These dm_error_objects are
structures  which  contain  information  about  the  error  which
resulted  in the  signalling of  dm_sub_error_.  The  linked list
contains  one dm_error_object  created as a  result of signalling
the  error and  potentially one  dm_error_object for  each module
which handles  the error.  This  allows for information  about an
error to be described in high-level terms (say, at the level of
relation_manager_$get_tuple) without discarding information about
the   error   described   in   terms   of   lower   levels  (say,
record_manager_$get_record and collection_manager_$get_element).

     The primary pieces of  information in a dm_error_object  are
an error  code and a  message string.  The  message describes the
error  in  the context  of  the module  handling the  error.  The
message  is  just  like  the kind  of  message  supported  by the
sub_err_ system  routine.  Each module which  handles errors uses
the error code to determine if the error is the kind of error the
particular  module  wants  to  handle.  The  error  codes  in all
dm_error_objects of a single list need not be the same.  A module
may translate  the code to  a more appropriate code,  such as the
translation    of    no_element   to    collection_not_found   to
index_not_in_relation shown  in an earlier example.   The code is
translated  not by  changing the value  of the error  code in the
dm_error_object, but  by creating a new  dm_error_object with the
new code.  This behavior prevents the loss of information by code
translation.

3.3 dm_error_util_

     The DS&R modules use  dm_error_util_ operations to deal with
errors.   The  dm_error_util_  module contains  four  entries, as
follows:

     signal
          creates a dm_error_object and signals the dm_sub_error_
          condition.
     continue_to_signal
          creates a dm_error_object and  calls the system routine
          continue_to_signal_.   This  entry  is  called  from  a
          dm_sub_error_ handler.
     handle
          handles a specified error.  This entry is called from a
          dm_sub_error_ on unit.
     display
          displays  information   about  dm_error_objects.   This
          entry  is called  by the  default_error_handler_ and by
          error reporting commands.

3.4 Protocol

     The power  of the signalling  mechanism is great  enough  to
allow   for   many   complex   situations.    To   simplify   the
dm_error_util_  approach, restrictions  are placed on  the use of
the  signalling mechanism  and a  strict protocol  is defined for
proper use of the  dm_error_util_ operations.  These restrictions
do not actually  prevent the use of any  aspect of the signalling
mechanism;  rather they  spell out  those uses  which may produce
non-intuitive  or  problematic  results  that are  not  under the
control of dm_error_util_.


4.1 The basic model

     The modules  which use dm_error_util_  are easily classified
into  three groups:   modules that  detect an  error and  wish to
report  it;  DS&R modules  that  wish to  handle errors;  and the
default_error_handler_  and  commands  to  examine  errors.   The
dm_error_util_  mechanism is  easily discussed  by describing how
each group uses dm_error_util_.


     Any DS&R  module which detects  an error reports  that error
via  dm_error_util_$signal.  The  module supplies in  the call an
error code, the module's name, an error message and action flags,
as when calling sub_err_.  dm_error_util_$signal creates a
dm_sub_error_info condition info structure and a dm_error_object
structure in the "dm free area" (the area returned from
get_dm_free_area_) and signals dm_sub_error_ (via the signal_
system routine).  If the caller of dm_error_util_$signal has an
enabled dm_sub_error_ on unit, that on unit will catch the
condition, so the module must be prepared to continue the signal
without handling it.

     Before returning, or if unwound, dm_error_util_$signal frees
all of the  dm_error_objects allocated as a result  of the error.
This is  necessary because the  objects are not  allocated in the
stack, so they are not automatically released when the stack is
unwound.


     Any  module  which  wishes  to  handle  errors  must  have a
dm_sub_error_ on unit enabled.  The  on unit should have at least
one call  to dm_error_util_$handle, passing an  error code and an
entry variable to a handler routine  for the error.  If the error
code  matches  the  code  in  the  most  recent  dm_error_object,
dm_error_util_$handle invokes the handler with a standard calling
sequence.

     The handler  can, in fact,  do anything it wants  to do, but
some  restrictions   are  necessary  to   guarantee  well-defined
behavior.  The  following four types  of action can  be taken, in
the manner described:

   - Continue  the  signal without  adding any  information.  The
     handler   should   call   the   continue_to_signal_   system
     subroutine and return.

   - Continue the  signal after adding  information.  The handler
     can add  a dm_error_object to  the list of  dm_error_objects
     by  calling  dm_error_util_$continue_to_signal.   This entry
     creates  a dm_error_object,  fills it  with the  information
     supplied in the parameters, links the dm_error_object to the
     previous  dm_error_object,  and  calls  continue_to_signal_.
     The     handler     should      return     after     calling
     dm_error_util_$continue_to_signal.

   - Stop  the signal.   The handler  can stop  the signal  via a
     non-local transfer of control or via a simple return without
     having called continue_to_signal_.  In  the former case, all
     stack frames more recent than  the one into which control is
     transferred  are  unwound  from the  stack,  causing cleanup
     handlers  to  be  invoked.   In the  latter  case, execution
     continues from the point of the original signal, i.e., from
     the statement after the call to dm_error_util_$signal.

   - Re-signal.  Any action which could cause dm_sub_error_ to be
     signalled  should  be  avoided  unless  the  on  unit  has a
     dm_sub_error_ on  unit of its own  enabled.  This is because
     the signalling mechanism will search all stack frames for on
     units, including  those that have already  handled the prior
     instance  of  the  dm_sub_error_  condition.   Such  actions
     include calling dm_error_util_$signal,  calling signal_ with
     the  dm_sub_error_  condition, or  calling any  module which
     might directly or indirectly signal dm_sub_error_.
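
     The second action above (continue the signal after adding
information) can be sketched as follows.  This is a minimal
sketch, not the DS&R code: get_header_example, get_element_example
and translate_no_element are hypothetical names, the error code
dm_error_$collection_not_found and the use of the string
"dm_error_$no_element" as the error type are assumptions, and the
meaning of the handler's first pointer argument is not defined
here.

```pl1
/* Sketch only: get_header_example, get_element_example and    */
/* translate_no_element are hypothetical names.                 */

get_header_example:
     proc ();

/* Declaration as given in the $handle entry description. */
dcl  dm_error_util_$handle entry (char (*), entry variable, ptr,
          bit (1) aligned);
dcl  dm_error_util_$continue_to_signal entry options (variable);
dcl  dm_error_$collection_not_found fixed bin (35) ext static;
                                        /* assumed code name */
dcl  get_element_example entry ();      /* hypothetical callee */
dcl  dm_sub_error_ condition;
dcl  handled_sw bit (1) aligned init ("0"b);
dcl  null builtin;

     /* Invoke translate_no_element only for the no_element error. */
     on dm_sub_error_
          call dm_error_util_$handle ("dm_error_$no_element",
               translate_no_element, null (), handled_sw);

     call get_element_example ();
     return;

translate_no_element:
     proc (p_error_type, p_ptr_1, p_handler_info_ptr);

dcl  p_error_type char (*) parameter;
dcl  p_ptr_1 ptr parameter;             /* meaning assumed */
dcl  p_handler_info_ptr ptr parameter;  /* null in this sketch */

          /* Add a dm_error_object describing the error at this   */
          /* level.  The entry itself calls continue_to_signal_,  */
          /* so the handler simply returns afterwards.            */
          call dm_error_util_$continue_to_signal
               (dm_error_$collection_not_found,
               "get_header_example",
               "^/The specified collection could not be found.");
          return;
     end translate_no_element;

end get_header_example;
```

The lower-level dm_error_object (with its no_element code and
message) stays in the list; the handler only adds a new object in
front of it.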


     The  default_error_handler_  will  handle  the dm_sub_error_
condition  by  first  calling  dm_error_util_$display  to display
information about the error, then getting to a new command level.
dm_error_util_$display    finds    the    last    (most   recent)
dm_error_object in  the list of  dm_error_objects associated with
the   condition   and   displays    the   information   in   that
dm_error_object.  Existing error reporting commands, such as
reprint_error, can be changed, or new ones written, to exploit the
ability of dm_error_util_$display to display optionally several
dm_error_objects.

4.2 The dm_sub_error_ condition

     Following  is  a  Reference Guide-style  description  of the
dm_sub_error_ condition:


     Cause:
                    a Data Management  subroutine has detected an
                    error situation for which  it wants to signal
                    a  condition, often  with the  possibility of
                    continuing,  rather  than returning  a status
                    code.   The  dm_error_util_$signal subroutine
                    signals this condition.
     Default action:
                    prints  a  message  and  returns  to  command
                    level; however, the condition name printed is
                    not  dm_sub_error_ but  the module  name from
                    the dm_error_object in the data structure.
     Restrictions:
                    immediately     restartable,    conditionally
                    restartable, or not  restartable depending on
                    the particular  situation and how  the action
                    flags in the data structure are set.
     Data structure:
          dcl 1 dm_sub_error_info  aligned,
                2 header      like condition_info_header,
                2 dm_error_object_ptr ptr;


     dm_error_object_ptr  points to  a dm_error_object  structure
     created by dm_error_util_$signal.

4.3 The dm_error_object structure

     The      dm_error_object       structure,      found      in
dm_error_object.incl.pl1, has the following format and meaning:

     dcl 1 dm_error_object      aligned based (dm_error_object_ptr),
           2 version            char (8) init (ERROR_OBJECT_VERSION_1),
           2 next_error_object_ptr
                                ptr init (null),
           2 prev_error_object_ptr
                                ptr init (null),
           2 dm_sub_error_info_ptr
                                ptr init (null),
           2 flags,
             3 begins_new_error bit (1) unal init ("0"b),
             3 mbz1             bit (33) unal init ("0"b),
           2 signalling_program_name
                                char (32) varying init (""),
           2 message            char (256) varying init ("");


       version
            is     equal     to     ERROR_OBJECT_VERSION_1     in
            dm_error_object.incl.pl1.

       next_error_object_ptr
            points  to  the next  most recent  dm_error_object in
            this chain of dm_error_objects.

       prev_error_object_ptr
            points  to the  next least  recent dm_error_object in
            this chain of dm_error_objects.

       dm_sub_error_info_ptr
            points to  the dm_sub_error_info condition  info data
            structure  for  the  instance  of  dm_sub_error_ with
            which this dm_error_object is associated.

       begins_new_error
            if  on,   indicates   that    this   is   the   first
            dm_error_object  associated with  an instance  of the
            dm_sub_error_ condition.  If dm_error_util_$signal is
            called   when  there   is  already   an  instance  of
            dm_sub_error_,   and   hence  already   a   chain  of
            dm_error_objects, the new  dm_error_object created by
            dm_error_util_$signal is added to the chain with this
            flag on to show that a new error has occurred.

       mbz1
            must be zero ("0"b).

       signalling_program_name
            is  the  name  of   the  module  which  created  this
            dm_error_object, i.e.,  the last module  to signal or
            continue to signal dm_sub_error_.

       message
            is a message describing the error.
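
     As an illustration of how the chain pointers might be used,
the following sketch walks a chain from the most recent
dm_error_object to the least recent and prints the module name and
message of each.  It is not part of dm_error_util_:
walk_error_chain and its parameter are hypothetical, and the
structure is redeclared locally (without the init attributes) so
that the fragment stands alone.

```pl1
/* Sketch only: walk_error_chain is a hypothetical routine. */

walk_error_chain:
     proc (p_newest_object_ptr);

dcl  p_newest_object_ptr ptr parameter; /* most recent object */
dcl  object_ptr ptr;
dcl  ioa_ entry options (variable);
dcl  null builtin;

/* Local copy of the dm_error_object layout shown above. */
dcl  1 error_object aligned based (object_ptr),
       2 version char (8),
       2 next_error_object_ptr ptr,
       2 prev_error_object_ptr ptr,
       2 dm_sub_error_info_ptr ptr,
       2 flags,
         3 begins_new_error bit (1) unal,
         3 mbz1 bit (33) unal,
       2 signalling_program_name char (32) varying,
       2 message char (256) varying;

     object_ptr = p_newest_object_ptr;
     do while (object_ptr ^= null ());
          /* One entry per level, newest first. */
          call ioa_ ("(at ^a):^/^10x^a",
               error_object.signalling_program_name,
               error_object.message);
          /* Step to the next least recent object. */
          object_ptr = error_object.prev_error_object_ptr;
     end;
     return;

end walk_error_chain;
```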


     Although  the  main reason  for adopting  the dm_error_util_
model of  error handling is  for maintainability, it  is expected
that a performance enhancement may  be a welcome side-effect.  If
it  becomes  clear that  a  performance degradation  will result,
dm_error_util_ will not be used.

     Performance degradations could result in two ways:  in the
added expense of setting up a dm_sub_error_ on unit and in the
added expense of signalling and handling the dm_sub_error_
condition.  It is speculated that the savings recouped by the
removal of code parameters and code checking will offset any
increase in time spent enabling on units.  The rationale for this
argument lies in the belief that very few modules will need to
enable the on units.  A cursory look at the index_manager_
modules revealed that three of the thirty-three modules would
require a dm_sub_error_ on unit if the index_manager_ were
converted directly to using dm_error_util_.

     The cost of signalling and handling conditions is only a
problem if the condition signalled ultimately proves not to be an
error.  An example of an error that arises in the normal and
common course of events is the index_manager_'s reliance on the
dm_error_$long_element status code.  In some cases,
index_manager_ determines if a key will fit in a control interval
by attempting to put it in the control interval.  If
dm_error_$long_element is returned, index_manager_ shifts keys
around until a space is found for the new key.  It would be very
expensive for collection_manager_ to signal dm_sub_error_ simply
to give index_manager_ a small piece of information, especially
since this is a very common occurrence.  This expense can be
bypassed in this case by changing the relevant
collection_manager_ entry to return a failure indicator if there
is not room for the element and to report all other errors via
dm_error_util_.  In fact, such a scheme would
eliminate   all   requirements   for   error   handling   in  the

     In short, there are known  performance penalties for using a
signalling-based model,  some of which are  offset by performance
gains  and some,  possibly the rest,  of which  are eliminated by
minor interface changes to a few select DS&R modules.


Name:  dm_error_util_

This module  is for reporting, handling  and displaying errors in
the Data  Management System.  The  report of an error  is made by
calling   the  $signal   entry.   These  error   signals  can  be
selectively caught and handled by using the $handle entry.

Entry:  dm_error_util_$signal

     This entry is for creating and signalling error objects.
Signalling an error object means signalling the dm_sub_error_
condition where the condition info structure points to a
dm_error_object structure.  An error object can be caught using
the $handle entry from inside a dm_sub_error_ on unit.  If more
than one error object has been signalled, they are all chained
together in a single list, with the most recently signalled at
the head of the list.  The default_error_handler_ can be directed
to display any number of the error objects in such a list.  How
much of each error object the default_error_handler_ displays by
default can also be specified.  The $display entry can be used
directly to display the current error object list.


     dcl dm_error_util_$signal entry options (variable);

     call dm_error_util_$signal (code, signalling_program_name,
          control_flags, message, message_args);


       code                         (Input)
            is a  standard system error code,  declared fixed bin
            (35).

       signalling_program_name      (Input)
            is  the  name  of  the program  signalling  the error
            object, declared char (*).

       control_flags                (Input)
            is a  set of flags controlling  how the signalling of
            the error  object is to be  handled (e.g.  whether to
            log the error object in the DM system log, whether to
            create  a  trace_stack,  what  ACTION  flags  to  set
            defining the restartability  of the condition).  This
            is  declared  bit  (36) aligned,  and  is interpreted
            according  to  the  dm_error_flags  structure  in the
            dm_error_flags.incl.pl1 include file.   The flags can
            be   set  by   or-ing  the   DM_ACTION  constants  in
            dm_error_flags.incl.pl1, as in:

       message                      (Input)
            is  an  ioa_  control  string  for  a  message  to be
            associated with the error object being signalled.

       message_args                 (Input)
            is  any  number  of  arguments for  the  message ioa_
            control string.


     call dm_error_util_$signal (
          dm_error_$ci_already_allocated, cm_allocate_ci,
          "^/Control interval  ^d in file  ^3bo, collection ^3bo,
          was marked as free in the file_reservation_map, but was
          already allocated.",
          control_interval_id, file_opening_id, collection_id);

     call dm_error_util_$signal (
          dm_error_$key_out_of_order, im_rotate_insert,
          "^/The key  in node ^d,  slot ^d has a  value less than
          the  key in  node ^d,  slot ^d.   The former  should be
          greater than the latter.",
          new_key_id.control_interval_id, new_key_id.index,
          old_key_id.control_interval_id, old_key_id.index);

Entry:  dm_error_util_$continue_to_signal

     This entry is for adding an  error object to a list of error
objects and  continuing to signal  the most recent  error object.
Continuing to signal an error object means calling
continue_to_signal_ from inside a handler invoked by the $handle
entry, where the condition info structure points to a
dm_error_object structure.  An error object can be caught using
the $handle entry from inside a dm_sub_error_ on unit.  The
default_error_handler_ can be directed to display any number of
the error objects in such a list.  How much of each error object
the default_error_handler_ displays by default can also be
specified.  The $display entry can be used directly to display
the current error object list.


     dcl dm_error_util_$continue_to_signal entry options
          (variable);

     call dm_error_util_$continue_to_signal (code,
          signalling_program_name, message, message_args);


       code                         (Input)
            is a  standard system error code,  declared fixed bin
            (35).

       signalling_program_name      (Input)
            is  the  name  of  the program  signalling  the error
            object, declared char (*).

       message                      (Input)
            is  an  ioa_  control  string  for  a  message  to be
            associated with the error object being signalled.

       message_args                 (Input)
            is  any  number  of  arguments for  the  message ioa_
            control string.

Entry:  dm_error_util_$handle

     This entry is used to invoke error handlers when the current
dm_error_object contains an error of some particular type.  The
error handler invoked is a program with a particular calling
sequence which can do anything the caller of $handle desires.
However, the handler should obey the restrictions cited in
"Notes".  The call of the $handle entry is made from the on unit
for the dm_sub_error_ condition.


     dcl dm_error_util_$handle entry (char (*), entry variable,
          ptr, bit(1)aligned);

     call dm_error_util_$handle (error_type, handler_entry,
          handler_info_ptr, handled_sw);


       error_type                   (Input)
            is the name of an error type.  Currently this must be
            the same as the name of an error code, and it matches
            only error objects with that error code.

       handler_entry                (Input)
            is an entry to be invoked if there is an error of the
            specified type  in the current  dm_error_object list.
            The syntax of the handler is:

            dcl handler entry (char (*), ptr, ptr);
            call handler (error_type,

       handler_info_ptr             (Input)
            is a  pointer to a caller-defined  info structure for
            use by the caller-specified handler_entry.

       handled_sw                   (Output)
            is an  output flag which  indicates, if on,  that the
            error was handled.  This is  useful if an on unit has
            multiple  calls to  $handle, and wants  to stop after
            one such call handles the error.

     The following code fragment illustrates a use of the $handle
entry to catch the dm_error_$no_element error:

     my_no_element_handler_info.return_label = EXIT;
     on dm_sub_error_ call dm_error_util_$handle

     call foo;

EXIT: return;

               proc (p_error_type,

     goto p_my_no_element_handler_info_ptr ->
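
     One plausible complete form of such a fragment is sketched
below.  It is a sketch under stated assumptions rather than the
original example: handle_example and foo are hypothetical names,
the error type string "dm_error_$no_element" is assumed, the
meaning of the handler's first pointer argument is not defined
here, and the based structure handler_info is introduced only so
that the handler can reach the return label through its pointer
argument.

```pl1
/* Sketch only: handle_example and foo are hypothetical names. */

handle_example:
     proc ();

/* Declaration as given in the $handle entry description. */
dcl  dm_error_util_$handle entry (char (*), entry variable, ptr,
          bit (1) aligned);
dcl  foo entry ();                      /* hypothetical callee */
dcl  dm_sub_error_ condition;
dcl  handled_sw bit (1) aligned init ("0"b);
dcl  addr builtin;

dcl  1 my_no_element_handler_info aligned,
       2 return_label label variable;

dcl  1 handler_info aligned based,      /* same layout, used  */
       2 return_label label variable;   /* via the info ptr   */

     my_no_element_handler_info.return_label = EXIT;
     on dm_sub_error_
          call dm_error_util_$handle ("dm_error_$no_element",
               my_no_element_handler,
               addr (my_no_element_handler_info), handled_sw);

     call foo ();

EXIT:
     return;

my_no_element_handler:
     proc (p_error_type, p_ptr_1, p_my_no_element_handler_info_ptr);

dcl  p_error_type char (*) parameter;
dcl  p_ptr_1 ptr parameter;             /* meaning assumed */
dcl  p_my_no_element_handler_info_ptr ptr parameter;

          /* Non-local transfer: stops the signal and unwinds   */
          /* the stack back to handle_example, which returns.   */
          goto p_my_no_element_handler_info_ptr
               -> handler_info.return_label;
     end my_no_element_handler;

end handle_example;
```

Because the handler leaves by a non-local transfer, this is the
"stop the signal" case: frames more recent than handle_example are
unwound and their cleanup handlers are invoked.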


Entry:  dm_error_util_$display

     This  entry displays  information from  the current  list of
error objects.


     dcl dm_error_util_$display entry (fixed bin (17) aligned);

     call dm_error_util_$display (depth);


       depth                        (Input)
            is the  number of error objects  in the current error
            list  (counting  from  the  "top",  or  most recently
            signalled) which are displayed.