Information about http://www.postgresql.org/files/developer/transactions.pdf

Transaction Processing in PostgreSQL Transaction Processing in…

Tags: application process, authentication client, client daemon, client processes, great bridge, library api, management locks, pgh, postmaster, process library, request server, server library, server processes, sql queries, sss, storage management, system overview, tgl, transaction processing, unix system,
Pages: 22
Language: english
Display cached document
Page 1
image
Page 2
image
Page 3
image
Page 4
image
Page 5
image
Page 6
image
Page 7
image
Page 8
image
Page 9
image
Page 10
image
Page 11
image
Page 12
image
Page 13
image
Page 14
image
Page 15
image
Page 16
image
Page 17
image
Page 18
image
Page 19
image
Page 20
image
Page 21
image
Page 22
image
Transaction Processing in PostgreSQL



Transaction Processing in PostgreSQL




Tom Lane
Great Bridge, LLC
tgl@sss.pgh.pa.us
                                       1
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 Outline

Introduction
    · What is a transaction?
User's view
    · Multi-version concurrency control
Implementation
    · Tuple visibility
    · Storage management
    · Locks




                                          2
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 PostgreSQL system overview


                 Client Processes                       Server Processes

                                                          Postmaster
                         Client                            Daemon
                       Application                         Process
                                           Initial
                       DB Requests      Connection
                                                             Spawn
                       and Results       Request
                                                              Server
                            via             and
                                                             Process
                        Library API    Authentication


                         Client                             Postgres
                       Interface       SQL Queries
                                                             Server
                        Library        and Results         (Backend)




                                                                           3
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 PostgreSQL system overview

                       Server Processes             Shared Memory   Unix System

                           Postmaster     Create
                            Daemon                      Shared         Kernel
                            Process                      Disk           Disk
                                                        Buffers        Buffers
                               Spawn
                                Server
                               Process
                                                        Shared
                                            Read/                       Disk
                             Postgres       Write       Tables
                              Server                                   Storage
                            (Backend)




    · Database files are accessed through shared buffer pool
        · Hence, two backends can never see inconsistent views of a file
    · Unix kernel usually provides additional buffering
                                                                                  4
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 What is a transaction, anyway?


Definition: a transaction is a group of SQLcommands whose results will be made
visible to the rest of the system as aunit when the transaction commits --- or not
at all, if the transaction aborts.


Transactions are expected to be atomic, consistent,isolated, and durable.


    · Postgres does not support distributed transactions, so all commandsof a transaction
    are executed by one backend.
    · We don't currently handle nested transactions, either.




                                                                                            5
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 The ACID test: atomic, consistent, isolated, durable

    Atomic: results of a transaction are seen entirely or not at all within other transactions.
    (A transaction need not appear atomic to itself.)


    Consistent: system-defined consistency constraints areenforced on the results of
    transactions. (Not going to discuss constraintchecking today.)


    Isolated: transactions are not affected by the behavior ofconcurrently-running
    transactions.
    Stronger variant: serializable. If the final resultsof a set of concurrent transactions
    are the same as if we'd run thetransactions serially in some order (not necessarily
    any predetermined order), then we say the behavior is serializable.


    Durable: once a transaction commits, its results will not belost regardless of
    subsequent failures.
                                                                                                  6
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 But how can thousands of changes be made "atomically"?

    · The actual tuple insertions/deletions/updates are all markedas done by transaction N
    as they are being made. Concurrently running backends ignore the changes
    because they know transaction N is not committed yet. When the transactioncommits,
    all those changes become logically visible at once.
    · The control file pg_log contains 2 status bits pertransaction ID, with possible states
    in progress, committed,aborted. Setting those two bits to thevalue committed is
    the atomic action that marks a transaction committed.
    · An aborting transaction will normally set its pg_log statusto aborted. But even if
    the process crashes without havingdone so, everything is safe. The next time some
    backend checks the state ofthat transaction, it will observe that the transaction is
    marked inprogress but is not running on any backend, deduce that it crashed,and
    update the pg_log entry to aborted on its behalf.No changes are needed
    in any table data file during abort.


                                                                                               7
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 But is it really atomic and durable, even if the system crashes?

    Well ... that depends on how much you trust your kernel and hard disk.


    · Postgres transactions are only guaranteed atomic if a disk page write isan atomic
    action. On most modern hard drives that's true if a page is aphysical sector, but most
    people run with disk pages configured as 8K or so,which makes it a little more dubious
    whether a page write is all-or-nothing.


    · pg_log is safe anyway since we're only flipping bits init, and both bits of a
    transaction's status must be in the same sector.


    · But when moving tuples around in a data page, there's a potential fordata corruption
    if a power failure should manage to abort the page writepartway through (perhaps only
    some of the component sectors get written).This is one reason to keep page sizes
    small ... and to buy a UPS for your server!
                                                                                             8
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 Working through the Unix kernel costs us something, too

    It's critical that we force a transaction's data page changes down to diskbefore we write
    pg_log. If the disk writes occur in the wrongorder, a power failure could leave us with a
    transaction that'smarked committed in pg_log but not all ofwhose data changes are
    reflected on disk --- thus failing the atomicity test.


    · Unix kernels allow us to force the correct write order via fsync(2), butthe performance
    penalty of fsync'ing many files is pretty high.


    · We're looking at ways to avoid needing so many fsync()s, but that's adifferent talk.




                                                                                                9
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 User's view: multi-version concurrency control

A PostgreSQL application sees the following behavior of concurrenttransactions:


· Each transaction sees a snapshot (database version) as of its starttime,
no matter what other transactions are doing while it runs


· Readers do not block writers, writers do not block readers


· Writers only block each other when updating the same row




                                                                                  10
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 Concurrent updates are tricky

Consider this example: transaction A does
                              UPDATE foo SET x = x + 1 WHERE rowid = 42

and before it commits,transaction B comes along and wants to do the same thing
on the same row.


· B clearlymust wait to see if A commits or not.
· If A aborts then B can go ahead,using the pre-existing value of x.
· But if A commits, what then?
    · Usingthe old value of x will yield a clearly unacceptableresult: x ends up
    incremented by 1 not 2 after both transactions commit.
    · But if B is allowed to increment the new valueof x, then B is reading data committed
    since it began execution. This violates the basic principle oftransaction isolation.




                                                                                             11
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 Read committed vs. serializable transaction level

PostgreSQL offers two answers to the concurrent-update problem(out of four
transaction isolation levels defined in the ISO SQL standard):
    Read committed level: allow B to use new tuple as inputvalues (after checking
    to ensure new tuple still satisfies query's WHERE clause). Thus, B isallowed to
    see just this tuple of A's results.
    Serializable level: abort B with "not serializable" error.Client application must redo
    the whole transaction B, which will then be allowedto see the new value of x under
    strict serializable-behavior rules.


    · Serializable level is logically cleaner but requires more code inapplication, so by
    default we run in read-committed level which usually produces the desiredbehavior.
    · In either case a pure SELECT transaction only sees data committed beforeit started.
    It's just updates and deletes that are interesting.


                                                                                             12
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 How it's implemented

"O say, can you see that tuple?"


The most fundamental implementation concept is tuplevisibility: which versions
of which table rows are seen by which transactions.


Ignoring tuples you're not supposed to be able to see is the key tomaking
transactions appear atomic.


Definition: a tuple is a specific stored object ina table,representing one version
of some logical table row. A row may exist inmultiple versions simultaneously.




                                                                                     13
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 Non-overwriting storage management

We must store multiple versions of every row. A tuple can be removed onlyafter
it's been committed as deleted for long enough that no activetransaction
can see it anymore.


Fortunately, PostgreSQL has always practiced "non overwriting" storage
management: updated tuples are appended to the table, and older versionsare
removed sometime later.


Currently, removal of long-dead tuples is handled bya VACUUM maintenance
command that must be issuedperiodically. We are looking at ways to reduce
need for VACUUM by recycling dead tuples on-the-fly.




                                                                                 14
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 Per-tuple status information

Tuple headers contain:
    · xmin: transaction ID of inserting transaction
    · xmax: transaction ID of replacing/deleting transaction (initially NULL)
    · forward link: link to newer version of same logical row, if any
Basic idea: tuple is visible if xmin is valid and xmax is not. "Valid"means
"either committed or the current transaction".


    If we plan to update rather than delete, we first add new version of rowto table,
    then set xmax and forward link in old tuple. Forward link willbe needed by
    concurrent updaters (but not by readers).
    To avoid repeated consultation of pg_log, there are alsosome statusbits that indicate
    "known committed" or "known aborted" for xmin and xmax.These are set by the first
    backend that inspects xmin or xmax after thereferenced transaction commits/aborts.


                                                                                            15
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 "Snapshots" filter away active transactions

If Transaction A commits while Transaction B is running, we don't want Bto
suddenly start seeing A's updates partway through.


    · Hence, we make a listat transaction start of which transactions are currently being
    run by other backends.(Cheap shared-memory communication is essential here: we
    just look in ashared-memory table, in which each backend records its current
    transactionnumber.)
    · These transaction IDs will never be considered validby the current transaction,
    even if they are shown to be committed in pg_log or on-row status bits.
    · Nor will a transaction with ID higher than the current transaction'sever be
    considered valid.
    · These rules ensure that no transaction committing after the currenttransaction's
    start will be considered committed.
    · Validity is in the eye of the beholder.
                                                                                            16
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 Table-level locks: still gotta have 'em for some things

Even though readers and writers don't block each other under MVCC, we stillneed
table-level locking.


This exists mainly to prevent the entire table frombeing altered or deleted
out from under readers or writers.


We also offer various lock levels for application use (mainly forporting applications
that take a traditional lock-based approach toconcurrency).




                                                                                        17
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 Types of locks

     Lock type                         Acquired by system for            Conflicts with



     1   AccessShareLock               SELECT                            7

     2   RowShareLock                  SELECT FOR UPDATE                 6,7

     3   RowExclusiveLock              UPDATE, INSERT, DELETE            4,5,6,7

     4   ShareLock                     CREATE INDEX                      3,5,6,7

     5   ShareRowExclusiveLock                                           3,4,5,6,7

     6   ExclusiveLock                                                   2,3,4,5,6,7

     7   AccessExclusiveLock           DROP TABLE, ALTER TABLE, VACUUM   all



    All lock types can be obtained by user LOCK TABLE commands.


    Locks are held till end of transaction: you can grab a lock, but you can'trelease it
    except by ending your transaction.


                                                                                           18
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 Lock implementation

Locks are recorded in a shared-memory hash table keyed by kind and ID ofobject
being locked. Each item shows the types and numbers of locks held orpending on
its object. Would-be lockers who have a conflict with an existinglock must wait.


Waiting is handled by waiting on a per-process IPC semaphore, which willbe
signaled when another process releases the wanted lock. Note we needonly one
semaphore per concurrent backend, not one per potentially lockableobject.




                                                                                   19
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 Deadlock detection

Deadlock is possible if two transactions try to grab conflicting locksin different orders.


If a would-be locker sleeps for more than a second without getting thedesired lock,
it runs a deadlock-check algorithm that searches thelock hash table for circular
lock dependencies. If it finds any, thenobtaining the lock will be impossible, so it
gives up and reports anerror. Else it goes back to sleep and waits till granted the
lock (ortill client application gives up and requests transaction cancel).


    · The delay before running the deadlock check algorithm can betuned to match the
    typical transaction time in a particular server'sworkload. In this way, unnecessary
    deadlock checks are seldomperformed, but real deadlocks are detected reasonably
    quickly.



                                                                                          20
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 Short-term locks

Short-term locks protect datastructures in shared memory, such as the lock
hashtable described above.


These locks should only be held for long enough toexamine and/or update a
shared item --- in particular a backend should neverblock while holding one.


Implementation: spin locks based on platform-specificatomic test-and-set
instructions. This allows the lock code to fall through extremely quicklyin the
common case where there is no contention for the lock. If the test-and-setfails,
we sleep for a short period (using select(2)) andtry again. No deadlock detection
as such, but we give up and report errorif fail too many times.




                                                                                    21
30 Oct 2000 Tom Lane
Transaction Processing in PostgreSQL


 Summary

PostgreSQL offers true ACID semantics for transactions, given somereasonable
assumptions about the behavior of the underlying Unix kerneland hardware.


Multi-version concurrency control allows concurrent readingand writing of tables,
blocking only for concurrent updates of same row.


MVCC is practical because of non-overwriting storage manager that weinherited
from the Berkeley POSTQUEL project. Traditionalrow-overwriting storage
management would have a much harder time.




                                                                                    22
30 Oct 2000 Tom Lane