Multics Technical Bulletin MTB-508 DM Architecture To: Distribution From: Lindsey L. Spratt Date: 05/14/81 Subject: Data Management: An Architectural Overview 1 ABSTRACT This document presents some of the basic concepts of the new Data Management architecture for Multics. Each of the major design areas is introduced, and a discussion of some of the more important interactions between them is presented. Comments should be sent to the author: via Multics Mail: Spratt.Multics on either MIT Multics or System M. via US Mail: Lindsey Spratt Honeywell Information Systems, inc. 4 Cambridge Center Cambridge, Massachusetts 02142 via telephone: (HVN) 261-9321, or (617) 492-9321 _________________________________________________________________ Multics project internal working documentation. Not to be reproduced or distributed outside the Multics project without the consent of the author or the author's management. Page i. CONTENTS Page 1 Abstract . . . . . . . . . . . . . . i 2 Introduction . . . . . . . . . . . . 1 2.1 Note on the Organization of Data Management Documents . . . . 1 3 The Data Management Architecture . . 3 3.1 Some Basic Definitions . . . . 3 3.1.1 The Layer . . . . . . . . 3 3.1.2 The Request . . . . . . . 4 3.1.3 The Object of Discourse . 4 3.2 Distributed System Architecture (DSA) and Multics Data Management 4 4 Example Use of DSR System . . . . . 5 4.1 Definition of the Database . . 5 4.2 Accessing . . . . . . . . . . . 7 5 Objects of Discourse in the DSR System . . . . . . . . . . . . . . . 7 5.1 Description of the Relationship Between the Objects of Discourse . 9 6 External Access Layer . . . . . . . 11 6.1 Services . . . . . . . . . . . 11 6.2 Naming . . . . . . . . . . . . 11 6.3 The Interface . . . . . . . . . 12 7 Container Access Layer . . . . . . . 13 7.1 Services . . . . . . . . . . . 13 7.2 The Interface . . . . . . . . . 13 7.3 Administrative Services . . . . 14 7.4 Naming . . . . . . . . . . . . 14 7.5 Internal Structure . . . . . . 14 8 Page Access Layer . . . . . . . . . 15 8.1 Services . . . . . . . . . . . 15 8.2 Naming . . . . . . . . . . . . 15 9 Media Access Layer . . . . . . . . . 15 9.1 Services . . . . . . . . . . . 15 10 Ancillary Services System . . . . . 16 10.1 Services . . . . . . . . . . . 16 10.1.1 Transaction Management . 16 10.1.2 Concurrent Access Synchronization . . . . . . . . 17 10.1.3 Recovery Management . . . 18 iii Multics Technical Bulletin MTB-508 DM Architecture 2 INTRODUCTION The work being done is not simply an attempt to "make vfile_ better" or to provide MRDS with a better file manager; it is to make data management on Multics better and more respected. Possibly the single-most important objective is to make data management a part of the operating system, not just an application on top of it. This means a large concentration of effort in hardcore-related issues. Part and parcel to making data management "better" is providing Multics with ways to cost effectively and reliably offer a coherent set of data management services. Some of these are: 1) Reliability and recovery against system failure. 2) Handling of very large sets of data. 3) Integrity of data. 4) Concurrent access synchronization. 5) Speed of access. 6) Transaction management. 7) Security of data, with both discretionary and non-discretionary access control. As lagniappe (something extra), we want to centralize redundant code as much as possible and make useful routines available as external entrypoints, both to save programming time and to facilitate the writing of user utilities that can make use of these features. An exhaustive list of the goals of data management can be found in MTB-481. 2.1 Note on the Organization of Data Management Documents There is a set of documents describing the new data management of which this document is intended as a broad overview. There is a level of documents immediately following this one which provide overviews of specific areas of the new Data Management architecture. Each of these documents, in turn, has the potential for a series of documents of a greater degree of detail deriving from it. Page 1. MTB-508 Multics Technical Bulletin DM Architecture The documents are as follows: 1) MTB-508: Architectural Overview - A discussion of the entire Data Management Architecture. 2) MTB-509: External Access Layer - Describes the various collections of data management services which the Data Management architecture provides to its users, and presents the elements of the implementation. 3) MTB-510: Container Access Layer - Describes the set of services which this layer provides and presents the implementation of these services. 4) MTB-511: Page Access Layer - Describes the set of services which this layer provides and presents the implementation. 5) MTB-512: Transaction Management Overview - Defines what transactions are, their uses, their interactions, and the techniques for managing them. 6) MTB-513: Recovery Management Overview - Describes the possible failure situations for which recovery of data is necessary, and also describes the techniques used to achieve this recovery. 7) MTB-514: Concurrency Management Overview - Describes how concurrent operations are synchronized. 8) MTB-515: Security - Describes how access security interacts with the architecture. 9) MTB-516: Distributed Databases - Describes how the data management architecture will deal with distributed environments. Additional documents will be produced to discuss in detail the specification and implementation of various parts of the Data Management Architecture. There are some other documents which are not part of the set of documents describing the Data Management Architecture, but relate closely to it. They are: MTB-481 - "A Framework of Goals toward more Complete Data Management Capabilities on Multics." MTB-483 - "Problems with Data Management on Multics." MTB-487 - "Medium Range Goals in Data Management." Page 2. Multics Technical Bulletin MTB-508 DM Architecture 3 THE DATA MANAGEMENT ARCHITECTURE The Data Management architecture is composed of two parts, the Data Storage and Retrieval system (DSR) and the Ancillary Services system (AS). These systems provide logically distinct sets of services. The DSR system has only the capability of storing and retrieving data, while the AS system knows what a transaction is, how to keep track of locks, and how to recover from various interruptions. DSR and AS may make use of each other, particularly DSR shall use AS, but it is possible to use one of them without the other as well. If one doesn't want to have transactions and all of the other things the AS can add to the DSR, it is reasonable to use the DSR system by itself. Similarly, one may use the locking, journalling, and transaction definition facilities of the AS system without using the DSR system, if one isn't interested in doing any data storage or retrieval via the DSR system. The DSR system is organized in several layers. The outermost layer is the one which all users of Data Management (DM) work through, or make their requests of; it is the External Access Layer. The complete list of (software) layers, in order of depth, is: 1) External Access Layer 2) Container Access Layer 3) Page Access Layer 4) Media Access Layer 3.1 Some Basic Definitions The terms "layer", "request", and "object of discourse" all have fairly specific meanings within the discussion of the Data Management Architecture. The explanations of these terms which follow are the intended meanings whenever one of the terms is used anywhere in the Data Management documents. 3.1.1 THE LAYER Within the proposed architecture the term "layer" has some strong implications. All of a layer's services are employed by invoking that layer's interface requests. All processing within a layer depends on the services of the next "lower" layer to implement data storage and retrieval "action" requests. This processing within a layer may employ certain portions of the requests of the AS, depending on the particular implementation of services being used. It is important to emphasize that the same implementation of DSR may use the AS from several layers, this is Page 3. MTB-508 Multics Technical Bulletin DM Architecture why the DSR system and the AS system are presented as separate systems; the DSR system is a user of the AS system, primarily in an advisory fashion. 3.1.2 THE REQUEST Each layer has two types of requests which it implements; "administrative" and "action". The administrative requests are those which allow one to specify the method, or implementation, which the layer is to use in carrying out its duties. Typically, the implementation of these requests does not require the services of any other layer of the DSR system, specifically the next lower layer. The action requests, already mentioned above, implement the data storage and retrieval services of the layer. The implementation of these requests necessarily does include the use of the action requests of the lower layer. In fact, a layer's action requests are only invoked by the implementation of the next higher layer. Administrative requests are only invoked from outside the DSR system. 3.1.3 THE OBJECT OF DISCOURSE At the interface for any given layer there exists a limited set of objects which can be named when using the requests of that layer. These are the objects of discourse for that layer. There is a set of objects for the administrative requests and one for the action requests. These two sets of objects of discourse, the objects of the administrative discourse and the objects of the action discourse, may be identical. It is not necessary that they be so, however. An example object of discourse is the "relation", when using the mdbm_util_ interface of the External Access Layer It is an object of both the action discourse and also the administrative discourse to the External Access Layer. 3.2 Distributed System Architecture (DSA) and Multics Data Management The basic outline of this proposal is drawn on the current work for DSA Data Storage and Retrieval. The Data Storage and Retrieval portion of Multics Data Management which is being proposed will be compatible with DSA DS&R sub-architecture. The Ancillary Services portion of Multics DM needs to be compatible with the Process and Resource Control subarchitecture of DSA to allow for transaction management, recovery, and synchronization within a DSA environment. Both architectures (Multics DM DS&R Page 4. Multics Technical Bulletin MTB-508 DM Architecture and DSA DS&R) employ the layering formalism, with a simple correspondence. The Internal Access Layer of the DSA DS&R has been incorporated into the External Access Layer of Multics DM DS&R. For the other layers of DSA, they have the same names as their corresponding layers in Multics DM. 4 EXAMPLE USE OF DSR SYSTEM To illustrate how the objects discussed in DSR interact with each other, and how they correspond with and are used by traditional database objects, consider the following relational database, office, composed of three relations: old_employees, current_employees, and managers. Both old_employees and current_employees relations have the attributes name, job and address; managers relation has attributes mgr_name and emp_name. The old_employees relation will rarely be accessed. The system was previously defined, in terms of page files and media, as such: Page File 1 is on Media File 1 (Tape A). Page File 2 is on Media File 2 (Disk A) and on Media File 3 (Disk B). 4.1 Definition of the Database When the database is created, the Database Administrator or the database management system describes the database as follows: 1) The database is defined to be the external file "office". 2) The containers which make up office are defined and created. 3) Each container is determined to use a certain page file. 4) Each container is described as to how storage management is to be done. These four steps yield the following set of containers which make up the "office" external file: Page 5. MTB-508 Multics Technical Bulletin DM Architecture office: external file old_employee_tuples: container, style3, Page File 1 ACTIVE_DATA: container, style4, Page File 2 old_employee_index: container, style1, Container ACTIVE_DATA employee_tuples: container, style3, Container ACTIVE_DATA employee_index: container, style1, Container ACTIVE_DATA manager_tuples: container, style3, Container ACTIVE_DATA manager_hash: container, style2, Container ACTIVE_DATA For this example, old_employee_tuples contains data for the old_employees relation, and old_employee_index contains that relation's index. Likewise for the other two relations. With each container is specified a page file that the container will reside on and a style of storage management. Page File 1 was chosen for old_employee_tuples because the old_employees relation is seldom referenced, and Page File 1 is on a tape device. However, old_employee_index is on Page File 2 (disk) because disk access is better for an index. For this example it is not important what style1, style2 and style3 actually correspondd to, except that the style of storage management is specifiable. The containers given make up the external file office. Figure 1 is the resulting configuration of external file, containers, page files and media files. External File Page Files Media Files . . . . . . . . . . . . . . . . . . --------------------------- . . | office.xf | . . --------------------------- . . . . Containers . . . . . . . . . . . . . . . . . . . . . . . . . . -------- . --------- . --------- . | old_t | . | page | . | media | . -------- . | file 1 | . | file 1 | . . --------- . --------- . . . . . . . . . . . . . . . . . . . . . . . . . --------------------------- . --------- . --------- . | ------- | . | | . | | . || old_i | | . | | . | media | . | ------- ACTIVE_DATA | . | | . | file 2 | . | | . | | . --------- . | ----------- ----------- | . | page | . --------- . || current_i | | current_t || . | file 2 | . | media | . | ----------- ----------- | . | | . | file 3 | . | ------- ------- | . | | . | | . || mgr_i | | mgr_t | | . | | . | | . | ------- ------- | . | | . | | . --------------------------- . --------- . --------- . . . . . . . . . . . . . . . . . . . . . . . . Page 6. Multics Technical Bulletin MTB-508 DM Architecture Figure 1. Organization of Example Database. Next the insides of each container must be defined. In this case, there are three kinds of containers. One is that which holds the complete tuple and is managed by a record manager. An example of this is the old_employee_tuples. The other is that which holds an index for a relation and is managed by an index manager. An example of this is the old_employees_index. The tuples are containers themselves, each of which contains elements that correspond to fields in a tuple. So, for example, old_employees_tuples container contains employee_tuple_N containers, where N is the identifier of the individual tuple, which contains name, address and job container elements. --------------------------------------------------------- | old_employee_tuples | | ---------------------- ---------------------- ... | | | employee_tuple_1 | | employee_tuple_2 | ... | | |----------------------| |----------------------| ... | | | name | addr | job | | name | addr | job | ... | | ---------------------- ---------------------- | --------------------------------------------------------- Figure 2. Tuple Implemented as a Container. 4.2 Accessing Each time a tuple is stored, current_employee_tuples is checked to see if there is room in it to fit another element, Page File 2 is searched for free space to store the tuple container element in, and Media Files 2 and 3 are searched for actual pages on disk to store the data. 5 OBJECTS OF DISCOURSE IN THE DSR SYSTEM Each layer in the Data Storage and Retrieval system has two objects of discourse, a file-like object and a record-like object. A file-like object comprises a set of record-like objects. In tabular form, the names of each kind of object for each layer are: Page 7. MTB-508 Multics Technical Bulletin DM Architecture Layer Name File-like Record-like Object Object ---------------------------------------------------------- | External Access | external file | record | | Layer | | | |----------------------|-----------------|-----------------| | Container Access | container | container | | Layer | | element | |----------------------|-----------------|-----------------| | Page Access Layer | unstructured | extent (address | | | file | and length) | |----------------------|-----------------|-----------------| | Media Access Layer | media | physical record | ---------------------------------------------------------- Figure 3. Major Objects of the Layers. In Data Structure Diagram form: Page 8. Multics Technical Bulletin MTB-508 DM Architecture ---------------- ---------------- | external file | | record | ---------------- --> ---------------- | | | | | | | | | | v v ---------------- ---------------- | container | | container | | | | element | ---------------- --> ---------------- ^ | ^ | | | | | | ------- | | | | | v ---------------- ---------------- | unstructured | | extent | | file | | | ---------------- --> ---------------- | | | | | | | | | | v v ---------------- ----------------- | media | | physical record | ---------------- --> ----------------- Figure 4. Data Structure Diagram of the Major Objects. The boxes identify the kind of object and the arrows indicate that a single instance of the kind of object at the root of the arrow identifies a set of instances of the kind of object at the head of the arrow. There can be any number of elements in the identified set. Page 9. MTB-508 Multics Technical Bulletin DM Architecture 5.1 Description of the Relationship Between the Objects of Discourse There are two "directions" of relationships to investigate; the relationships between objects in the same layer's discourse, and the relationships between objects in different discourses. Within the same discourse there are only two major kinds of objects, the file-like object and the record-like object. An instance of a file-like object identifies a set of instances of record-like objects. The multi-layer relationships are not obvious from a layer by layer presentation and at the same time are potentially more interesting. A few are presented, therefore, before the more detailed layer-by-layer is given. An external file can be mapped on to any number of page files. A page file can be mapped on to any number of media files, but page files may not share media files. Media files may "compete" for the same physical space on disks. An external file is composed of any number of records. When talking across the External Access Layer, using the action requests of this layer, these are the only objects which can be identified and manipulated. Instances of external files, are: - relational databases (see the "Example use of the DSR system" section), - indexed sequential files, and - unstructured files. Examples of records are: - tuples, - file records (with keys as names), and - bytes (or even single bits), respectively. An external file is built from containers. A container must be in one and only one external file. A record is built from container elements. A record may be composed of container elements from more than one container. This allows the construction of a record which is structured in a strikingly different fashion than the layout of the containers. A container is composed of any number of container elements. A container is similar in some respects to a pl/1 area. In this model, a container element is a piece of allocated storage (or freed storage), and the pointer is the container element identifier. A container which is not contained in other Page 10. Multics Technical Bulletin MTB-508 DM Architecture containers is an "exterior" container. Only one of these may be in a page file. A container is part of a page file. It is defined by one or more offsets and extents within a page file. Container elements are also defined by a set of one or more offsets and extents, but these must be within (a subset of) the offsets and extents of their parent container. A container is parceled out as container elements. A container element within a container may be defined to be a container in its own right. 6 EXTERNAL ACCESS LAYER This layer provides "logical" access to data. What are generally thought of as database managers are implemented in this layer and, hence, the services of database management are available as "action" requests of this layer. 6.1 Services There are several types of sets of services; these "types" are mutually exclusive. The term "access method" will be used to mean a type of service set. It is not necessarily possible to view a set of data through more than one access method. For instance, data organized for use with a sequential access method cannot be used with the relational access method. Some of the access methods are: 1) stream access (unstructured). 2) sequential access to record-like objects. 3) direct access to record-like objects (keyed lookups). The record-like objects referred to here may either be records organized by an index of some sort, or they may be the objects of which an index is composed ("keys"). 4) relational access (e.g., MRDS). 5) network access (e.g., IDS). 6) hierarchical access (For manipulating "tree" structured data simply). Page 11. MTB-508 Multics Technical Bulletin DM Architecture 6.2 Naming An external file is named by a pathname, which is a character string with ">" used to delimit components of the "path". The pathname of an external file is unique within a system, or node of a network. 6.3 The Interface There are many interfaces to the External Access Layer. Each different "External File Manager" in the External Access Layer defines the terms in which applications outside of the External Access Layer are to "speak" with it. Examples of existing systems which can be implemented as "External File Managers" are: MRDS, a relational manager; read_mail/send_mail, the "extended mail facility"; continuum; and vfile_'s several interfaces, unstructured, sequential, blocked, and indexed. To provide the above services, several classes of modules within the External Access Layer are used: 1) Index managers. For instance, a b-tree manager, a hash-table manager, an array manager. 2) Record manager(s). For instance, a variable size manager, a fixed size manager, a variable size within fixed size, pre-allocated blocks manager(a la "blocked" vfiles). Each of the foregoing would associate identifiers (which could be permanent) with records. This last is a major service. 3) Relation manager utility. This is currently implemented by mdbm_util_, under MRDS. This utility understands the structure of tuples, as stored in the record manager, and how these tuples are indexed (by one or more of the index managers). It translates primitive operations against tuples, such as "delete", "get", "put", "find", into operations of the index managers and the record manager. 4) Relational database manager (MRDS). This system is built on top of relation manager utility and understands the existence of multiple relations, searches and other operations which involve them, translating these operations into terms mdbm_util_ understands. The relational database manager may directly invoke index or record managers to manage some of its control or definitional data. The relational database manager relies totally on the relation manager utility and the index and record managers for all DS&R activity; and the relation manager utility relies totally on the Page 12. Multics Technical Bulletin MTB-508 DM Architecture index and record managers for all of its DS&R activity. Both the index and record managers use the Container Access Layer to manage their internal objects. For instance, a b-tree manager uses "nodes" as its storage objects. If it needs a new node or wants to delete an old one, it uses the allocate or free verbs, respectively. A record manager would use the container manager in similar circumstances, as well as when it needed to expand or contract a record. The put and get interfaces of the Container Access Layer would be used whenever "putting" and "getting" were to be done, since only the Container Access Layer knows how storage ids map into something the lower layers can understand. 7 CONTAINER ACCESS LAYER This layer provides logical storage management (allocation, freeing, and garbage-collection). 7.1 Services 1) Reading and writing of record-like objects addressed by unique "container element identifiers". 2) Allocation and freeing of arbitrary sized pieces of storage, referred to as "container element". 3) Creation and destruction of containers. 7.2 The Interface The following are the operations of the interface for the Container Access Layer: 1) Create a container in a specified page file. This (or the following) operation is required before any other Container Access Layer operation can be used. The name of the container to be created and the page file in which it is to be placed must be supplied. 2) Create a container in a specified container. The name of the container to be created and the container in which it is to be placed must be provided. 3) Destroy a container. The name (or local identifier) of the container to be destroyed must be provided as input. Page 13. MTB-508 Multics Technical Bulletin DM Architecture 4) Allocate a container element. The name of the container in which the element is to be allocated and the size of the element needed must be provided as input. Returns the name of the element allocated. 5) Free a container element. The name of the container and the name of the container element must be provided as input. Or, only the local identifier of the container element need be provided, as it implies the container. 6) Get the "contents" of a container element. The name of the container and the name of the container element must be provided as input (with the same exception about the local identifier as above). This is done whenever it's necessary to look at a container element. It is always necessary to go through the Container Access Layer to look at or alter data. 7) Put the "contents" of a container element. The name of the container and the name of the container element must be provided as input (with the local identifier exception). This is done whenever it's necessary to alter a container element. 7.3 Administrative Services Some of these services are: 1) Current state of a container, including space allocated. 2) Set the type of container management. 7.4 Naming The exterior container is named by a character string name, unique within its external file. Interior containers are identified as container elements within their continaining container. 7.5 Internal Structure Several different techniques of managing storage are available. Two examples are: a circular free-list with roving pointer; and, a multiple bucket first-fit. Containers are either implicitly or explicitly "typed" at creation time in terms of the style of storage management which is to be used on them. Page 14. Multics Technical Bulletin MTB-508 DM Architecture 8 PAGE ACCESS LAYER This layer is an interface between the higher layer in the architecture and the low-level Multics functions which manage the physical representation of the data. The actual physical representation of the data is hidden by means of this layer. This layer presents the following view of data storage to higher layers: 1) Data is stored in named entities called page files. The name of a page file within the system is unique (analogous to path names in the Multics Storage System). 2) A page file is a contiguous stream of bits, each addressable, with no practical limitation in size. 3) A section of a page file is referenced by specifying the page file name, the bit offset of the beginning of the section, and the length of the section in bits. 8.1 Services 1) Maintain page files as lower-level Multics structures. Create, destroy, and administer these structures as needed. 2) Translate page file names and sections within page files (bit offsets and lengths) into the appropriate parameters identifying the corresponding lower-level Multics structures. This can be thought of as converting logical addresses into physical addresses. 8.2 Naming Page files have pathnames which are unique per system. 9 MEDIA ACCESS LAYER 9.1 Services Moves data between physical storage locations in peripherals, such as a disk, and main memory locations. Basically, the services the disk Device Interface Module (DIM) provides. Page 15. MTB-508 Multics Technical Bulletin DM Architecture 10 ANCILLARY SERVICES SYSTEM This system is logically outside of the DSR architecture but provides services that are used by several layers of the DSR. The reason the system is outside of DSR is that its services are logically distinct from the storing and retrieving of data done by the DSR, and because these services can be used without using the DSR. Also, these services do not correspond directly to any one layer; many services can be used by any layer. These services also are not absolutely necessary for data storage and retrieval, but enhance the overall data management abilities. 10.1 Services The major services provided by the Ancillary Services system are: 1) Transaction definition management. 2) Concurrent access synchronization. 3) Recovery management. 10.1.1 TRANSACTION MANAGEMENT A transaction is a sequence of data management operations which form an atomic unit using the concepts of commit, checkpoint and rollback. Transactions are run by processes, but are not dependent upon the existence of any one process. Transactions control the state of data, in terms of the volatility and availability of data. Volatile data is data which has no existence beyond that of some particular transaction, the transaction which "owns" it. Permanent data does not depend on the existence of any transaction for its existence. Public data is that which can be seen by any transaction, private data is only available to one transaction. The generally usable data of a database is permanent and public. During the course of a transaction, the transaction will make some of the public data private and will create some volatile data. Committing a transaction makes all of its volatile data permanent and all of its private data public. Checkpointing a transaction transforms all of its volatile data into permanent data. Should all of the processes of a transaction die, the transaction can be resumed from the last checkpoint. Page 16. Multics Technical Bulletin MTB-508 DM Architecture Rolling back a transaction destroys all of the data it made volatile since a given checkpoint (perhaps the beginning of the transaction) and makes public all of the data that the transaction made private since that checkpoint. A committed transaction cannot be rolled back. Aborting a transaction requires rolling it back to the beginning of the transaction. 10.1.2 CONCURRENT ACCESS SYNCHRONIZATION If transactions were run one at a time, each transaction would see the consistent state left behind by its predecessor. If several transactions run concurrently, without properly synchronizing their actions, their interference may produce incorrect results and may cause the data base to become inconsistent. It is the purpose of concurrency management to regulate concurrent access to the data base in order to preserve its consistency and prevent all transactions from producing incorrect results. A system in which concurrent transactions are synchronized in such a way that their result is equivalent to SOME serial execution of these transactions is commonly regarded as correct with respect to concurrency. This property is known as the "serialization" property and can be obtained if all transactions behave according to the following protocol: 1) Never modify any data already read or modified by another transaction still in progress. 2) Never read any data already modified by another transaction still in progress. Read locks and write locks are available to assist transactions in enforcing these rules. A read lock can be granted to many readers while a write lock can be granted only to one writer. During its execution, a transaction requests a read lock on an object before reading it and a write lock on an object before modifying it. These locks are held by the transaction until the transaction commits or aborts, at which point they are released. It is therefore impossible for a transaction to see the modifications made by another transaction still in progress. As a result, if a transaction has to be undone, partially or completely, by the recovery mechanism, the rollback will not affect any other transaction. This locking protocol does not prevent deadlocks from occurring. The lock manager is responsible for detecting Page 17. MTB-508 Multics Technical Bulletin DM Architecture deadlock situations when they occur; one of the transactions involved in the deadlock will be interrupted and rolled back. The rollback will undo what the transaction has done and, in particular, unlock the locks set by the transaction. The transaction is rolled back to the nearest checkpoint that eliminates the conflict. 10.1.3 RECOVERY MANAGEMENT Recovery management guarantees that data cannot be lost and cannot be left inconsistent after program or system failure. The two important concepts in recovery are the "transaction" and the "checkpoint". The transaction is the basic unit of data base consistency; the checkpoint is the basic unit of data base recovery. A transaction may either run successfully to completion, in which case it is said to be committed, or abort (or be aborted) if it has no chance to reach completion. A committed transaction leaves the data base in a consistent state, and its effects must survive any system failure. A transaction to be aborted may have set the data base in an inconsistent state, and its effects must be completely undone. A transaction may be broken up into several consecutive phases, each of which being ended by a checkpoint operation. At a checkpoint, the data base may be inconsistent, since the transaction is not completed yet. However, the intermediate state of the data base is well understood by the particular transaction. In the event that the transaction is interrupted during the execution of a given phase, restoring the state of the data base as it was at the last checkpoint would allow the transaction to restart from the last checkpoint instead of having to be aborted, saving all the work done in the checkpointed phases. The recovery mechanism must therefore guarantee that it can undo the effect of an uncommitted transaction up to its last checkpoint. The method chosen to implement these recovery requirements is to log, for each modification made to the data base by a transaction, the value of the data before modification and its value after modification. The values before modification are logged in a journal called the Before Image Journal, which is used to undo, or "rollback", the effects of an interrupted transaction, up to the last checkpoint if transaction does not have to be aborted or completely if the transaction has to be aborted. Page 18. Multics Technical Bulletin MTB-508 DM Architecture The values after modification are logged in a journal called the After Image Journal, which is used to redo all committed transactions in the event of a catastrophic system failure. The data base has to be reconstructed from its last saved version to which all modifications recorded in the After Image Journal are applied. After all committed transactions are redone, the checkpointed phases of uncommitted transactions are redone to allow these transactions to resume execution from their last checkpoint. Page 19.