Multics Technical Bulletin                                MTB-556
  DM: Transaction Manager Design

  To:       Distribution

  From:     Steve Herbst

  Date:     09/06/84

  Subject:  Data Management: Transaction Manager Design

  1  ABSTRACT

       This   paper  describes   the  internal   operation  of  the |
  Transaction Manager and the Transaction Definition Table (TDT) it |
  uses to  keep track of  the state of  each transaction.  Included |
  are  brief  descriptions  of  the Transaction  Manager's  role in |
  system   initialization,   process   initialization,   and  crash |
  recovery.                                                         |

  Comments should be sent to the author:

  via Multics Mail:
     Spratt.Multics on either MIT Multics or System M.

  via US Mail:
     Lindsey Spratt
     Honeywell Information Systems, Inc.
     4 Cambridge Center
     Cambridge, Massachusetts 02142

  via telephone:
     (HVN) 261-9321, or
     (617) 492-9321

  _________________________________________________________________

  Multics  project  internal  working  documentation.   Not  to  be
  reproduced or distributed outside the Multics project without the
  consent of the author or the author's management.


                            CONTENTS

            1 Abstract
            2 Introduction
            3 Transaction Definition Table
               3.1 TDT Header Structure
                     The tm_tdt structure
               3.2 TDT Entry Structure
                     The tm_tdt_entry structure
               3.3 Current Transaction Info
               3.4 System Initialization
               3.5 Recovery
               3.6 Detailed Description of Operations
                  3.6.1 begin_txn
                  3.6.2 commit_txn
                  3.6.3 abort_txn
                  3.6.4 rollback_txn
                  3.6.5 suspend_txn
                  3.6.6 resume_txn
                  3.6.7 adjust_txn
                  3.6.8 adjust_tdt
                  3.6.9 adjust_process_id
                  3.6.10 per_system_init
                  3.6.11 per_process_init
                  3.6.12 recover_after_crash


  2  INTRODUCTION

       The purpose  of Data Management's Transaction  Manager is to |
  maintain  the  consistency   of  transaction  operations  despite |
  process interruption and system failure.  It does this by         |
  performing its internal steps in a strictly defined order and by  |
  recording each step it takes as a unique transaction "state" in   |
  the TDT entry corresponding to the transaction.  Using this       |
  state, it can restart any interrupted operation.                  |

       Beginning and  completing transactions involve   locking and |
  unlocking Data Management tables,  writing journals, and flushing |
  modified  pages of  data to disk.   These operations  are done by |
  calls   to   the    other   managers:    before_journal_manager_, |
  file_manager_,  and  lock_manager_.   For  descriptions  of these |
  facilities, see:                                                  |

     MTB-553:  File Manager Functional Specifications               |
     MTB-557:  Lock Manager Functional Specifications               |
     MTB-559:  Before Journal Manager Functional Specifications     |

  3  TRANSACTION DEFINITION TABLE

       The  Transaction  Definition  Table (TDT)  is  a system-wide
  table containing runtime information  about transactions that are
  currently in progress  or are in the process  of being committed,
  rolled back, or aborted.  The  TDT is logically divided into four
  parts, each managed exclusively by one of the Transaction, Before
  Journal, File and Lock Managers.  Header information is contained
  in the  part managed by  the Transaction Manager,  called tm_tdt.
  This  table  also contains  information about  the state  of each
  transaction in progress.

  3.1  TDT HEADER STRUCTURE

  The tm_tdt structure                                              |

  The  portion  of the  TDT  maintained by  transaction_manager_ is |
  declared  in  the  include  file  dm_tm_tdt.incl.pl1.   The table |
  header is declared as follows:                                    |



  dcl 1 tm_tdt                  aligned based (tm_tdt_ptr),
        2 version               char (8),
        2 lock                  fixed bin (71),
        2 last_uid              bit (27) aligned,
        2 flags,
          3 no_begins           bit (1) unaligned,
          3 mbz2                bit (35) unaligned,
        2 entry_count           fixed bin,
        2 entry (tdt_max_count  refer (tm_tdt.entry_count))
                                like tm_tdt_entry;

| where:

| version
|    is the version of the structure, currently "TM-TDT 2".

| lock
|    can be used to lock the table.  It is not currently used.

| last_uid
|    is  a  bit  string  used  to  generate  the  next  transaction
|    identifier.

| no_begins
|    is          turned          on          temporarily         by
|    transaction_manager_$recover_after_crash  to  prevent  any new
|    transactions from beginning while recovery is taking place.

| entry_count
|    is the total number of entry slots allocated.

| entry
|    is the array of TDT entries.

  3.2  TDT ENTRY STRUCTURE

| The tm_tdt_entry structure

| The individual transaction entry is  declared in the include file
| dm_tm_tdt.incl.pl1.



  dcl 1 tm_tdt_entry            aligned based (tm_tdt_entry_ptr),
        2 process_id            bit (36) unaligned,
        2 event_channel         fixed bin (71),
        2 transaction           aligned,
          3 txn_id              bit (36) aligned,
          3 date_time_created   fixed bin (71),
          3 mode                fixed bin (17) unaligned,
          3 state               fixed bin (17) unaligned,
          3 error_code          fixed bin (35),
          3 return_idx          fixed bin (17) unaligned,
          3 flags,
            4 dead_process_sw   bit (1) unaligned,
            4 suspended_sw      bit (1) unaligned,
            4 error_sw          bit (1) unaligned,
            4 mbz1              bit (12) unaligned,
          3 post_commit_flags,
            4 (fmgr,
               bjmgr,
               ajmgr)           bit (1) unaligned;

  where:                                                            |

  process_id                                                        |
     is the unique identifier of  the owner process.  This field is |
     filled in by transaction_manager_$per_process_init and remains |
     unchanged for the life of the process.                         |

  event_channel                                                     |
     is an event-call  channel used by the Daemon  to send messages |
     to  the  owner  process.   This field  is  also  filled  in by |
     transaction_manager_$per_process_init.                         |

  txn_id                                                            |
     is   the  unique   identifier  of  the   transaction,  set  by |
     transaction_manager_$begin_txn.                                |

  date_time_created                                                 |
     is  the  time  that  transaction_manager_$begin_txn  began the |
     transaction.                                                   |

  mode                                                              |
     is  the  mode passed  to  transaction_manager_$begin_txn.  The |
     available   modes    are   listed   in    the   include   file |
     dm_tm_modes.incl.pl1, and  include special modes  used to test |
     and meter the system.                                          |

  state                                                             |
     is  the transaction's  state, used to  maintain consistency of |
     operations  on  the  transaction.   The  available  states are |
     listed in the include file dm_tm_states.incl.pl1.              |

  error_code                                                        |
     if  error_sw (below)  is on,  this field  contains the nonzero |
     error   code   returned   by   the  last   entry   point  that |
     transaction_manager_ called.                                   |

| return_idx
|    currently unused, this field is  reserved for the index of the
|    parent  transaction  when  transactions   are  allowed  to  be
|    stacked.

| dead_process_sw
|    can be  turned on to  cause transaction_manager_$adjust_txn to
|    adjust the transaction even though process_id corresponds to a
|    live process.  It is not currently used.

| suspended_sw
|    is   ON  if   this  transaction  is   currently  suspended  by
|    transaction_manager_$suspend_txn.

| error_sw
|    is ON  if a transaction_manager_ operation  received a nonzero
|    error code  from one of  the entry points it  called.  If this
|    switch is  ON, state is equal  to one of the  error states.  A
|    transaction in error is eventually logged and aborted.

| post_commit_flags
|    are  flags  indicating  that  post-commit  operations  must be
|    performed  after  a  commit  by calling  entry  points  in the
|    appropriate   managers.    The  three   flags   correspond  to
|    file_manager_,      before_journal_manager_,      and      the
|    not-yet-implemented after_journal_manager_.
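
  As a reading aid, the two structures can be mirrored in a small
  runtime model.  The sketch below is an illustration in Python, not
  the PL/I implementation: the class names and the make_tdt helper
  are assumptions, bit strings are modelled as integers and
  booleans, field widths and the unused lock and mbz fields are
  omitted, and Python lists are 0-based whereas the PL/I entry array
  is indexed from 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PostCommitFlags:
    fmgr: bool = False    # file_manager_ has post-commit work to do
    bjmgr: bool = False   # before_journal_manager_ has post-commit work
    ajmgr: bool = False   # after_journal_manager_ (not yet implemented)


@dataclass
class TdtEntry:
    # Per-process fields, filled in by per_process_init.
    process_id: int = 0
    event_channel: int = 0
    # Per-transaction fields, zeroed when a transaction ends.
    txn_id: int = 0
    date_time_created: int = 0
    mode: int = 0
    state: int = 0
    error_code: int = 0
    return_idx: int = 0
    dead_process_sw: bool = False
    suspended_sw: bool = False
    error_sw: bool = False
    post_commit_flags: PostCommitFlags = field(default_factory=PostCommitFlags)


@dataclass
class Tdt:
    version: str = "TM-TDT 2"
    last_uid: int = 0
    no_begins: bool = False
    entries: List[TdtEntry] = field(default_factory=list)

    @property
    def entry_count(self) -> int:
        # Total number of entry slots allocated.
        return len(self.entries)


def make_tdt(slots: int) -> Tdt:
    """Hypothetical helper: build a table with `slots` empty entries."""
    return Tdt(entries=[TdtEntry() for _ in range(slots)])
```

  Each entry receives its own PostCommitFlags instance, so setting a
  flag for one transaction does not affect any other slot.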

  3.3  CURRENT TRANSACTION INFO

|      The  following  fields are  maintained  in the  data segment
| dm_data_ for the use of all programs in the user process:

| dm_data_$current_txn_id       bit (36) aligned;
|      The unique identifier of the current transaction, or "0"b if
|      there is no current transaction.
| dm_data_$current_txn_index    fixed bin;
|      The index of the current transaction's TDT entry.
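| dm_data_$suspended_txn_id     bit (36) aligned;
|      The unique identifier of the suspended transaction, or "0"b
|      if no transaction is suspended.  Set by
|      transaction_manager_$suspend_txn.
| dm_data_$suspended_txn_index  fixed bin;
|      The index of the suspended transaction's TDT entry, or 0 if
|      no transaction is suspended.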
| dm_data_$tm_tdt_ptr           ptr;
|      A pointer to transaction_manager_'s TDT.
| dm_data_$my_tdt_index         fixed bin;
|      The  index   of  the  process'  TDT   entry,  in  which  its
|      transaction (only one allowed)  is recorded.  A process uses
|      the same TDT slot throughout its life.



       User programs that need the current transaction should call  |
  transaction_manager_$get_current_txn_id rather than read          |
  dm_data_$current_txn_id or dm_data_$current_txn_index directly.   |
  The entry point takes precautions about interrupted transactions  |
  and other unusual situations, so the value it returns is          |
  guaranteed to be correct.  The dm_data_ values are widely used    |
  within transaction_manager_.                                      |

  3.4  SYSTEM INITIALIZATION

       The  TDT (per  bootload) is  created and  initialized by the |
  Initializer   process  at   system  start-up,   via  a   call  to |
  dm_initializer_.  Initialization of  the current transaction info |
  (per process)  is triggered by  a first-reference trap  the first |
  time a process references the data segment dm_data_.              |

       These  mechanisms are  described in more  detail in MTB-592, |
  "Data Management:  System Structure".                             |

  3.5  RECOVERY

       When  the system  is brought  up after  a crash,  one of the |
  first   things  the   Daemon  does   is  call   the  entry  point |
  transaction_manager_$recover_after_crash.    This   entry   point |
  rebuilds  a  TDT  containing   all  the  unfinished  transactions |
  reconstructed  from the  last bootload,  and aborts  all of them. |
  Then it zeroes out the TDT and allows new transactions to begin.  |

       The  recovery  protocol  is  described  in  more  detail  in |
  MTB-603, "Data Management:  Crash Recovery".                      |

  3.6  DETAILED DESCRIPTION OF OPERATIONS

       The Transaction Manager is concerned mainly with keeping the |
  TDT  consistent   and  keeping  track   of  the  state   of  each |
  transaction.  The rest of the work  is done by calls to the other |
  managers.  Before each such call,  the transaction's state is set |
  to a value that indicates the routine that is about to be called. |
  (See  the  include  file  dm_tm_states.incl.pl1  and  the program |
  tm_cleanup.)  If  the called routine returns  an error code, this |
  code  is recorded  in the TDT  entry and  the state is  set to an |
  error value.  A transaction that is in an error state can only be |
  modified  by  special  error-logging   routines;  for  all  other |
  purposes, it remains in an error state.                           |
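
  The "set the state, then call" discipline described above can be
  sketched as follows.  This is a minimal Python model under stated
  assumptions: only TM_IN_PROGRESS_STATE is a name taken from
  dm_tm_states.incl.pl1, while its numeric value, FLUSHING_STATE,
  ERROR_OFFSET, and the call_manager helper are hypothetical.

```python
TM_IN_PROGRESS_STATE = 1   # name from dm_tm_states.incl.pl1; value assumed
FLUSHING_STATE = 10        # hypothetical "about to flush" state
ERROR_OFFSET = 100         # hypothetical: error state = call state + offset


class Entry:
    """Only the TDT entry fields this sketch needs."""

    def __init__(self):
        self.state = TM_IN_PROGRESS_STATE
        self.error_code = 0
        self.error_sw = False


def call_manager(entry, pre_call_state, routine):
    """Record which routine is about to run, then run it.

    `routine` takes no arguments and returns an integer error code,
    with 0 meaning success.
    """
    # The state is stored BEFORE the call, so that an interruption
    # inside the call leaves a record of where the operation stopped.
    entry.state = pre_call_state
    code = routine()
    if code != 0:
        # Record the failure in the entry and move to an error state.
        entry.error_code = code
        entry.error_sw = True
        entry.state = pre_call_state + ERROR_OFFSET
    return code
```

  Because the state is written first, a process that dies inside
  routine() leaves the entry showing exactly which call was under
  way.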

       Most  operations,  when  they   find  a  transaction  in  an |
  intermediate state,  call the internal  utility tm_cleanup.  This |
  routine uses the value of state  to decide which calls to make to |
  complete the unfinished operation.  The contract of tm_cleanup is |
  to  leave  the  transaction  either  aborted  (if  the unfinished |
| operation was an abort) or  in the "in-progress" state.  The name
| of    this   state    as   declared    in   the    include   file
| dm_tm_states.incl.pl1 is TM_IN_PROGRESS_STATE.

|      An  important  part of  each  protocol listed  below  is the
| setting of the transaction's state  before and after each call to
| another routine.  The action taken by tm_cleanup when cleaning up
| an  unfinished  operation  is  to start  at  the  step  where the
| original operation was interrupted and  proceed to the end of the
| operation.
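
  One way to picture tm_cleanup is as a table of steps indexed by
  state.  The Python sketch below is an assumption about the shape
  of that logic, not a transcription of tm_cleanup: the numeric
  state values and the ROLLBACK_STEPS table are hypothetical, and
  the sketch re-runs the interrupted step itself, which presumes
  each step can safely be repeated.

```python
TM_IN_PROGRESS_STATE = 1   # name from dm_tm_states.incl.pl1; value assumed

# Hypothetical per-step states for a rollback, in protocol order.
ROLLBACK_STEPS = [
    (21, "flush_transaction"),
    (22, "rollback"),
    (23, "flush_modified_ci"),
    (24, "write_rolled_back_mark"),
    (25, "unlock_to_checkpoint"),
]


def cleanup(state, steps, final_state, log):
    """Resume an interrupted operation at the step named by `state`.

    Each resumed step is appended to `log` in place of a real call.
    Returns the state the transaction is left in.
    """
    step_states = [step_state for step_state, _ in steps]
    start = step_states.index(state)       # the step that was interrupted
    for _, name in steps[start:]:
        log.append(name)                   # stand-in for the manager call
    return final_state
```

  A transaction found in state 23 would have its last three steps
  run and would then be returned to the "in-progress" state.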

  3.6.1  BEGIN_TXN

       This  entry  point  is callable  by  users as  well  as Data
  Management system routines.  It takes the following steps:

| o    If the TDT's no_begins flag is on (running recovery), return
|      the error code dm_error_$no_begins.

| o    If    there    is     a    current    transaction,    return
|      dm_error_$transaction_in_progress.

| o    If    there    is    a    suspended    transaction,   return
|      dm_error_$transaction_suspended.

| o    Zero the transaction  info in the TDT entry  and fill in the
|      current clock time.

| o    Generate a new transaction id and put it into the entry.

| o    Call before_journal_manager_$write_begin_mark.

| o    Set    the    values    of    dm_data_$current_txn_id    and
|      dm_data_$current_txn_index.

| o    Set the transaction's state to "in-progress".
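
  The steps above can be sketched in Python as follows.  The sketch
  makes several assumptions: tables are modelled as dictionaries,
  error codes are returned as their dm_error_ names, the next
  transaction id is taken to be last_uid + 1, and the
  write_begin_mark parameter is a stand-in for the call to
  before_journal_manager_$write_begin_mark.

```python
import time

TM_IN_PROGRESS_STATE = 1   # name from dm_tm_states.incl.pl1; value assumed


def begin_txn(tdt, entry, dm_data, write_begin_mark=lambda txn_id: None):
    """Return 0 on success, or the name of a dm_error_ code."""
    if tdt["no_begins"]:                       # recovery is running
        return "dm_error_$no_begins"
    if dm_data["current_txn_id"]:
        return "dm_error_$transaction_in_progress"
    if dm_data["suspended_txn_id"]:
        return "dm_error_$transaction_suspended"

    # Zero the transaction info (process_id is left alone) and
    # record the current clock time.
    entry["transaction"] = {"txn_id": 0, "state": 0,
                            "date_time_created": time.time()}

    # Assumption: the next identifier is simply last_uid + 1.
    tdt["last_uid"] += 1
    entry["transaction"]["txn_id"] = tdt["last_uid"]

    write_begin_mark(entry["transaction"]["txn_id"])

    dm_data["current_txn_id"] = entry["transaction"]["txn_id"]
    dm_data["current_txn_index"] = dm_data["my_tdt_index"]
    entry["transaction"]["state"] = TM_IN_PROGRESS_STATE
    return 0
```

  The state becomes "in-progress" only as the last step, so an
  entry seen in any other state was interrupted part-way through.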

  3.6.2  COMMIT_TXN

       This operation consists of the following steps:

| o    If  there  is no  current  or suspended  transaction, return
|      dm_error_$no_current_transaction.

| o    If     the     transaction      is     suspended,     return
|      dm_error_$transaction_suspended.

| o    If the  state of the transaction  is not "in-progress", call
|      tm_cleanup to complete any unfinished operation.

| o    If running in one of the test modes, call
|      transaction_manager_$abort_txn and return.

  o    Call before_journal_manager_$flush_transaction  to flush the |
       appropriate journals.                                        |

  o    Call file_manager_$flush_modified_ci.                        |

  o    Call before_journal_manager_$write_committed_mark.           |

  o    Perform any necessary post-commit  operations by calling the |
       appropriate managers' post_commit entry points.              |

  o    Call lock_manager_$unlock_all  to release all  locks held by |
       the transaction.                                             |

  o    Zero dm_data_$current_txn_id and dm_data_$current_txn_index. |

  o    Zero the transaction info in the TDT entry.                  |
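
  The commit protocol above can be traced with a small Python
  model.  It is a sketch under assumptions: instead of calling the
  other managers it appends each routine name to a trace list, the
  test_mode argument stands in for the check on the transaction's
  mode, and tm_cleanup is assumed to leave the transaction
  "in-progress".

```python
TM_IN_PROGRESS_STATE = 1   # name from dm_tm_states.incl.pl1; value assumed

COMMIT_CALLS = (
    "before_journal_manager_$flush_transaction",
    "file_manager_$flush_modified_ci",
    "before_journal_manager_$write_committed_mark",
)


def commit_txn(txn, dm_data, trace, test_mode=False):
    """Append each routine name to `trace` instead of calling it."""
    if not dm_data["current_txn_id"]:
        if dm_data["suspended_txn_id"]:
            return "dm_error_$transaction_suspended"
        return "dm_error_$no_current_transaction"
    if txn["state"] != TM_IN_PROGRESS_STATE:
        trace.append("tm_cleanup")         # finish the earlier operation
        txn["state"] = TM_IN_PROGRESS_STATE
    if test_mode:
        # Test modes never commit; the stub only records the abort.
        trace.append("transaction_manager_$abort_txn")
        return 0
    trace.extend(COMMIT_CALLS)
    for manager, needed in txn["post_commit_flags"].items():
        if needed:
            trace.append(manager + " post_commit")
    trace.append("lock_manager_$unlock_all")
    dm_data["current_txn_id"] = 0
    dm_data["current_txn_index"] = 0
    txn.update(txn_id=0, state=0,
               post_commit_flags={"fmgr": False, "bjmgr": False,
                                  "ajmgr": False})
    return 0
```

  Locks are released only after the committed mark has been
  written and the post-commit work is done, which matches the order
  of the steps listed above.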

  3.6.3  ABORT_TXN

       The following steps are involved:

  o    If  there  is no  current  or suspended  transaction, return |
       dm_error_$no_current_transaction.                            |

  o    If  the transaction  is suspended,  abort is  still allowed. |
       Temporarily resume the transaction.                          |

  o    If  the  state  is  not  "in-progress",  call  tm_cleanup to |
       complete any unfinished operation.                           |

  o    Call before_journal_manager_$flush_transaction  to flush the |
       appropriate journals.                                        |

  o    Call  before_journal_manager_$rollback  to   roll  back  the |
       transaction.                                                 |

  o    Call file_manager_$flush_modified_ci.                        |

  o    Call before_journal_manager_$write_aborted_mark.             |

  o    Perform  any  necessary post-commit  actions by  calling the |
       appropriate managers' post_commit entry points.              |

  o    Call lock_manager_$unlock_all  to unlock all  locks held for |
       this transaction.                                            |

  o    Zero dm_data_$current_txn_id and dm_data_$current_txn_index. |

  o    Zero the transaction info in the TDT entry.                  |
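
  The abort protocol differs from commit mainly in its handling of
  a suspended transaction and in the rollback call.  The Python
  sketch below illustrates this under assumptions: routine names
  are appended to a trace list rather than called, tables are
  dictionaries, and the state value is hypothetical.

```python
TM_IN_PROGRESS_STATE = 1   # name from dm_tm_states.incl.pl1; value assumed

ABORT_CALLS = (
    "before_journal_manager_$flush_transaction",
    "before_journal_manager_$rollback",
    "file_manager_$flush_modified_ci",
    "before_journal_manager_$write_aborted_mark",
)


def abort_txn(txn, dm_data, trace):
    """Return 0 on success, or the name of a dm_error_ code."""
    if not (dm_data["current_txn_id"] or dm_data["suspended_txn_id"]):
        return "dm_error_$no_current_transaction"
    if dm_data["suspended_txn_id"]:
        # A suspended transaction may still be aborted: resume it first.
        dm_data["current_txn_id"] = dm_data["suspended_txn_id"]
        dm_data["current_txn_index"] = dm_data["suspended_txn_index"]
        dm_data["suspended_txn_id"] = 0
        dm_data["suspended_txn_index"] = 0
        txn["suspended_sw"] = False
    if txn["state"] != TM_IN_PROGRESS_STATE:
        trace.append("tm_cleanup")         # finish the earlier operation
    trace.extend(ABORT_CALLS)
    if any(txn["post_commit_flags"].values()):
        trace.append("post_commit entry points")
    trace.append("lock_manager_$unlock_all")
    dm_data["current_txn_id"] = 0
    dm_data["current_txn_index"] = 0
    txn.update(txn_id=0, state=0)
    return 0
```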



| 3.6.4  ROLLBACK_TXN

|      This  entry point  rolls a  transaction back  to a specified
| checkpoint,  currently always  the beginning  of the transaction.
| It takes the following steps:

| o    If  there  is no  current  or suspended  transaction, return
|      dm_error_$no_current_transaction.

| o    If the transaction is  suspended, rollback is still allowed.
|      Temporarily resume the transaction.

| o    If  the   caller  does  not  own   the  transaction,  return
|      dm_error_$not_own_transaction.

| o    If  the  state  is  not  "in-progress",  call  tm_cleanup to
|      complete any unfinished operation.

| o    Call before_journal_manager_$flush_transaction  to flush the
|      journals.

| o    Call before_journal_manager_$rollback.

| o    Call file_manager_$flush_modified_ci.

| o    Call before_journal_manager_$write_rolled_back_mark.

| o    Call  lock_manager_$unlock_to_checkpoint with  the specified
|      checkpoint number.

| o    Set the transaction's state to "in-progress".
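
  Unlike commit and abort, rollback leaves the transaction current
  and "in-progress".  The Python sketch below shows this together
  with the ownership check.  It is a simplified model: the suspended
  case is omitted, routine names are recorded in a trace list rather
  than called, and the use of 0 as the checkpoint number for the
  beginning of the transaction is an assumption.

```python
TM_IN_PROGRESS_STATE = 1   # name from dm_tm_states.incl.pl1; value assumed

ROLLBACK_CALLS = (
    "before_journal_manager_$flush_transaction",
    "before_journal_manager_$rollback",
    "file_manager_$flush_modified_ci",
    "before_journal_manager_$write_rolled_back_mark",
)


def rollback_txn(txn, dm_data, caller_pid, trace, checkpoint=0):
    """Return 0 on success, or the name of a dm_error_ code."""
    if not dm_data["current_txn_id"]:
        return "dm_error_$no_current_transaction"
    if txn["process_id"] != caller_pid:
        return "dm_error_$not_own_transaction"
    if txn["state"] != TM_IN_PROGRESS_STATE:
        trace.append("tm_cleanup")         # finish the earlier operation
    trace.extend(ROLLBACK_CALLS)
    # Only locks acquired since the checkpoint are released.
    trace.append(("lock_manager_$unlock_to_checkpoint", checkpoint))
    # The transaction survives and can continue to be used.
    txn["state"] = TM_IN_PROGRESS_STATE
    return 0
```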

| 3.6.5  SUSPEND_TXN

|      This   entry  point   suspends  the   current  (in-progress)
| transaction,  preventing it  from being  used for  protected file
| operations until transaction_manager_$resume_txn  is called.  The
| following steps are involved:

| o    If    there    is    no    current    transaction,    return
|      dm_error_$no_current_transaction.

| o    Copy  dm_data_$current_txn_id  to dm_data_$suspended_txn_id,
|      dm_data_$current_txn_index to dm_data_$suspended_txn_index.

| o    Set      dm_data_$current_txn_id       to      "0"b      and
|      dm_data_$current_txn_index   to   0,   so   that  subsequent
|      operations cannot reference the current transaction.

| o    Turn on the transaction's suspended_sw in the TDT.



  3.6.6  RESUME_TXN                                                 |

       This     entry     point    reverses    the     effect    of |
  transaction_manager_$suspend_txn,    restoring     the    current |
  transaction.  The following steps are involved:                   |

  o    If   there   is  a   current  transaction   defined,  return |
       dm_error_$transaction_in_progress.                           |

  o    If    there    is   no    suspended    transaction,   return |
       dm_error_$no_suspended_transaction.                          |

  o    Copy  dm_data_$suspended_txn_id  to dm_data_$current_txn_id, |
       dm_data_$suspended_txn_index to dm_data_$current_txn_index.  |

  o    Set      dm_data_$suspended_txn_id      to      "0"b     and |
       dm_data_$suspended_txn_index to 0.                           |

  o    Turn off the transaction's suspended_sw in the TDT.          |
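
  Suspend and resume are mirror images of each other, as the
  following Python sketch of both step lists shows.  The dictionary
  representation of dm_data_ and of the TDT entry is an assumption
  made for illustration; the error names are those given above.

```python
def suspend_txn(dm_data, entry):
    """Move the current transaction into the 'suspended' slots."""
    if not dm_data["current_txn_id"]:
        return "dm_error_$no_current_transaction"
    dm_data["suspended_txn_id"] = dm_data["current_txn_id"]
    dm_data["suspended_txn_index"] = dm_data["current_txn_index"]
    # Clearing the current values keeps later operations from
    # referencing the suspended transaction.
    dm_data["current_txn_id"] = 0
    dm_data["current_txn_index"] = 0
    entry["suspended_sw"] = True
    return 0


def resume_txn(dm_data, entry):
    """Move the suspended transaction back into the 'current' slots."""
    if dm_data["current_txn_id"]:
        return "dm_error_$transaction_in_progress"
    if not dm_data["suspended_txn_id"]:
        return "dm_error_$no_suspended_transaction"
    dm_data["current_txn_id"] = dm_data["suspended_txn_id"]
    dm_data["current_txn_index"] = dm_data["suspended_txn_index"]
    dm_data["suspended_txn_id"] = 0
    dm_data["suspended_txn_index"] = 0
    entry["suspended_sw"] = False
    return 0
```

  A suspend followed by a resume restores dm_data_ and the entry to
  their original values.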

  3.6.7  ADJUST_TXN

       This entry  point is run only  in the Data_Management.Daemon
  process.  When a Data  Management program discovers a transaction
  belonging to a  dead process, it sends a wakeup  to the Daemon to
  adjust the transaction.  The following steps are involved:        |

  o    If the  process that owns  the transaction is  still active, |
       return dm_error_$transaction_in_progress.                    |

  o    Call  tm_adopt  to begin  executing on  behalf of  the owner |
       process.                                                     |

  o    Unless the transaction  is in an error state  or an abort or |
       commit mark may already have  been written, set the state of |
       the transaction to force abortion.                           |

  o    Call  tm_cleanup  to  perform  the  abort  or  complete  the |
       unfinished commit.                                           |

  o    Call tm_abandon to reverse the effect of tm_adopt.           |
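
  The decision made in the third step can be sketched in Python as
  follows.  All state values here are hypothetical, as is the set
  used to represent "an abort or commit mark may already have been
  written"; the calls to tm_adopt, tm_cleanup, and tm_abandon are
  recorded in a trace list rather than performed.

```python
FORCE_ABORT_STATE = 50                    # hypothetical state value
# Hypothetical states at or beyond the point where a mark is written.
MARK_POSSIBLY_WRITTEN_STATES = {33, 43}


def adjust_txn(entry, live_pids, trace):
    """Return 0 on success, or the name of a dm_error_ code."""
    if entry["process_id"] in live_pids:
        return "dm_error_$transaction_in_progress"
    trace.append("tm_adopt")               # act on behalf of the owner
    leave_state_alone = (entry["error_sw"]
                         or entry["state"] in MARK_POSSIBLY_WRITTEN_STATES)
    if not leave_state_alone:
        # Nothing irreversible has happened yet, so force an abort.
        entry["state"] = FORCE_ABORT_STATE
    trace.append("tm_cleanup")             # abort, or finish the commit
    trace.append("tm_abandon")             # stop acting for the owner
    return 0
```

  A transaction whose commit mark may already be on the journal
  keeps its state, so tm_cleanup completes the commit instead of
  aborting it.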

  3.6.8  ADJUST_TDT                                                 |

       Called  only  in  the  Data_Management.Daemon  process, this |
  routine  adjusts each  transaction in the  TDT that  belongs to a |
  dead process.                                                     |



| 3.6.9  ADJUST_PROCESS_ID

|      Called  only  in  the  Data_Management.Daemon  process, this
| routine adjusts  all transactions (currently  only one) belonging
| to  a  specified dead  process.   It is  called when  the process
| terminates and also  when there is contention for  a lock held by
| the process.

| 3.6.10  PER_SYSTEM_INIT

|      This entry point, called in the Initializer process, creates
| and initializes the TDT.

| 3.6.11  PER_PROCESS_INIT

|      This entry point, called by each user process the first time
| it references dm_data_, obtains a  TDT entry for exclusive use by
| the  process  and initializes  the values  of various  entries in
| dm_data_.  The following steps are involved:

| o    Calculate  a  pointer  to  the  TDT  previously  created  by
|      transaction_manager_$per_system_init.

| o    If  there are  any TDT  entries reserved  by inactive (dead)
|      processes, send  a wakeup to  the Daemon causing  it to call
|      transaction_manager_$adjust_tdt.

| o    Reserve  the  first  available  TDT  entry  by  writing  the
|      process'  unique  identifier into  it.   Store the  index in
|      dm_data_$my_tdt_index.      Observing      the     temporary
|      implementation restriction of one  transaction at a time per
|      process, all  operations in this  process will use  the same
|      reserved TDT entry.

| o    Zero   dm_data_$current_txn_id,  dm_data_$current_txn_index,
|      dm_data_$suspended_txn_id, dm_data_$suspended_txn_index.
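
  The slot reservation above can be sketched in Python as follows.
  The sketch rests on assumptions: a process_id of 0 marks a free
  entry, liveness is tested against a supplied set of process ids,
  wake_daemon stands in for sending the wakeup, the behaviour when
  the table is full is invented for the example, and no attempt is
  made to handle two processes reserving a slot at the same time.

```python
def per_process_init(tdt_entries, my_pid, live_pids, wake_daemon):
    """Reserve a TDT entry and return the initial dm_data_ values."""
    # Ask the Daemon to adjust entries left behind by dead processes.
    if any(e["process_id"] and e["process_id"] not in live_pids
           for e in tdt_entries):
        wake_daemon()

    # Reserve the first free entry.  Indices start at 1, as in PL/I,
    # which leaves 0 free to mean "no entry".
    for index, entry in enumerate(tdt_entries, start=1):
        if entry["process_id"] == 0:
            entry["process_id"] = my_pid
            break
    else:
        # Not specified in the text above; assumed for this sketch.
        raise RuntimeError("no free TDT entry")

    return {
        "my_tdt_index": index,
        "current_txn_id": 0,
        "current_txn_index": 0,
        "suspended_txn_id": 0,
        "suspended_txn_index": 0,
    }
```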

| 3.6.12  RECOVER_AFTER_CRASH

|      This  entry point  is called in  the Daemon  process after a
| crash to  complete all transactions that  were interrupted by the
| crash.   It   is  passed  two  data   structures  containing  the
| information   needed  to   reconstruct  the   transactions.   The
| following steps are involved:

| o    Build a TDT from information passed in the input structures.

| o    Turn on  the TDT's no_begins  switch, temporarily preventing
|      any new transactions from beginning.



  o    Call before_journal_manager_$rebuild_after_crash  to rebuild |
       before_journal_manager_'s part of the TDT.                   |

  o    Abort all transactions in the  rebuilt TDT, on behalf of the |
       owner processes.                                             |

  o    Turn off the TDT's no_begins switch.                         |
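
  The recovery sequence can be sketched in Python as follows.  This
  is a simplified model: the two input structures are collapsed
  into one list of unfinished transactions, the calls to other
  routines are recorded in a trace list, and the clearing of the
  table before begins are re-enabled follows the description in
  section 3.5 rather than the step list above.

```python
def recover_after_crash(unfinished, trace):
    """Rebuild the TDT, abort every transaction in it, and reopen it."""
    # Build a TDT from the information passed in.
    tdt = {"no_begins": False,
           "entries": [dict(txn) for txn in unfinished]}

    # Keep new transactions out while recovery is running.
    tdt["no_begins"] = True

    trace.append("before_journal_manager_$rebuild_after_crash")

    # Abort each rebuilt transaction on behalf of its owner process.
    for entry in tdt["entries"]:
        trace.append("abort txn %d for process %d"
                     % (entry["txn_id"], entry["process_id"]))

    # Section 3.5: zero out the TDT, then allow begins again.
    tdt["entries"] = []
    tdt["no_begins"] = False
    return tdt
```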