Tuesday, July 3, 2012

SQL Server: Transaction Log Architecture


--------------------------------------- Logical Architecture --------------------------------------


-   Transaction log guarantees the data integrity of the database and data recovery.

-   Transaction log is a string of log records. Each log record is identified by a log sequence number (LSN).

-   Each new log record is written to the logical end of the log with an LSN that is higher than the LSN of the record before it.
-   Log records are stored in a serial sequence as they are created. Each log record contains the ID of the transaction that it belongs to. For each transaction, all log records associated with the transaction are individually linked in a chain using backward pointers that speed the rollback of the transaction.

-   Log records for data modifications record either the logical operation performed or they record the before and after images of the modified data. The before image is a copy of the data before the operation is performed; the after image is a copy of the data after the operation has been performed.

-   The steps to recover an operation depend on the type of log record:
  • Logical operation logged
    • To roll the logical operation forward, the operation is performed again.
    • To roll the logical operation back, the reverse logical operation is performed.
  • Before and after image logged
    • To roll the operation forward, the after image is applied.
    • To roll the operation back, the before image is applied.
Many types of operations are recorded in the transaction log. These operations include:
  • The start and end of each transaction.

  • Every data modification (insert, update, or delete). This includes changes by system stored procedures or data definition language (DDL) statements to any table, including system tables.

  • Every extent and page allocation or deallocation.

  • Creating or dropping a table or index.
Rollback operations are also logged. Each transaction reserves space on the transaction log to make sure that enough log space exists to support a rollback that is caused by either an explicit rollback statement or if an error is encountered. The amount of space reserved depends on the operations performed in the transaction, but generally is equal to the amount of space used to log each operation. This reserved space is freed when the transaction is completed.
The section of the log file from the first log record that must be present for a successful database-wide rollback to the last-written log record is called the active part of the log, or the active log. This is the section of the log required to do a full recovery of the database. No part of the active log can ever be truncated.



----------------------------------------- Physical Architecture -------------------------------------


-   The transaction log is used to guarantee the data integrity of the database and for data recovery.

-   The transaction log in a database maps over one or more physical files. Conceptually, the log file is a string of log records. Physically, the sequence of log records is stored efficiently in the set of physical files that implement the transaction log.

-   The SQL Server Database Engine divides each physical log file internally into a number of virtual log files. Virtual log files have no fixed size, and there is no fixed number of virtual log files for a physical log file. The Database Engine chooses the size of the virtual log files dynamically while it is creating or extending log files.  The size or number of virtual log files cannot be configured or set by administrators.

-   The only time virtual log files affect system performance is if the log files are defined by small size and growth_increment values. If these log files grow to a large size because of many small increments, they will have lots of virtual log files. This can slow down database startup and also log backup and restore operations. We recommend that you assign log files a size value close to the final size required, and also have a relatively large growth_increment value

-   The transaction log is a wrap-around file. For example, consider a database with one physical log file divided into four virtual log files. When the database is created, the logical log file begins at the start of the physical log file. New log records are added at the end of the logical log and expand toward the end of the physical log. Log truncation frees any virtual logs whose records all appear in front of the minimum recovery log sequence number (MinLSN). The MinLSN is the log sequence number of the oldest log record that is required for a successful database-wide rollback.


--------------------- Checkpoints and the Active Portion of the Log ----------------------



-   Checkpoints flush dirty data pages from the buffer cache of the current database to disk. This minimizes the active portion of the log that must be processed during a full recovery of a database. During a full recovery, the following types of actions are performed:
  • The log records of modifications not flushed to disk before the system stopped are rolled forward.
  • All modifications associated with incomplete transactions, such as transactions for which there is no COMMIT or ROLLBACK log record, are rolled back.

Checkpoint Operation

A checkpoint performs the following processes in the database:
  • Writes a record to the log file, marking the start of the checkpoint.
  • Stores information recorded for the checkpoint in a chain of checkpoint log records.

    One piece of information recorded in the checkpoint is the log sequence number (LSN) of the first log record that must be present for a successful database-wide rollback. This LSN is called the Minimum Recovery LSN (MinLSN).
  • If the database uses the simple recovery model, marks for reuse the space that precedes the MinLSN. 
  • Writes all dirty log and data pages to disk.
  • Writes a record marking the end of the checkpoint to the log file.
  • Writes the LSN of the start of this chain to the database boot page.
The checkpoint records also contain a list of all the active transactions that have modified the database.

Activities That Cause a Checkpoint

Checkpoints occur in the following situations:
  • A CHECKPOINT statement is explicitly executed. A checkpoint occurs in the current database for the connection.

  • A minimally logged operation is performed in the database; for example, a bulk-copy operation is performed on a database that is using the Bulk-Logged recovery model.

  • Database files have been added or removed by using ALTER DATABASE.

  • An instance of SQL Server is stopped by a SHUTDOWN statement or by stopping the SQL Server (MSSQLSERVER) service. Either action causes a checkpoint in each database in the instance of SQL Server.

  • An instance of SQL Server periodically generates automatic checkpoints in each database to reduce the time that the instance would take to recover the database.

  • A database backup is taken.

  • An activity requiring a database shutdown is performed. For example, AUTO_CLOSE is ON and the last user connection to the database is closed, or a database option change is made that requires a restart of the database.


------------------------------ Write-Ahead Transaction Log ------------------------------------



-   SQL Server uses a write-ahead log (WAL), which guarantees that no data modifications are written to disk before the associated log record is written to disk. This maintains the ACID properties for a transaction.

-   SQL Server maintains a buffer cache into which it reads data pages when data must be retrieved. Data modifications are not made directly to disk, but are made to the copy of the page in the buffer cache. The modification is not written to disk until a checkpoint occurs in the database, or the modification must be written to disk so the buffer can be used to hold a new page. Writing a modified data page from the buffer cache to disk is called flushing the page. A page modified in the cache, but not yet written to disk, is called a dirty page.


-   At the time a modification is made to a page in the buffer, a log record is built in the log cache that records the modification. This log record must be written to disk before the associated dirty page is flushed from the buffer cache to disk. If the dirty page is flushed before the log record is written, the dirty page creates a modification on the disk that cannot be rolled back if the server fails before the log record is written to disk. SQL Server has logic that prevents a dirty page from being flushed before the associated log record is written. Log records are written to disk when the transactions are committed.





1 comment: