Information about http://www.postgresql.org/files/developer/tour.pdf

A Tour of PostgreSQL Internals A Tour of PostgreSQL Internals …

Tags: authentication client, client daemon, client libraries, client processes, communication structure, cost estimation, data types, great bridge, library api, pgh, postgresql, process library, request server, server communication, server library, server processes, sql queries, system catalogs, tgl, view 3,
Pages: 25
Language: english
Display cached document
Page 1
image
Page 2
image
Page 3
image
Page 4
image
Page 5
image
Page 6
image
Page 7
image
Page 8
image
Page 9
image
Page 10
image
Page 11
image
Page 12
image
Page 13
image
Page 14
image
Page 15
image
Page 16
image
Page 17
image
Page 18
image
Page 19
image
Page 20
image
Page 21
image
Page 22
image
Page 23
image
Page 24
image
Page 25
image
A Tour of PostgreSQL Internals



A Tour of PostgreSQL Internals




Tom Lane
Great Bridge, LLC
tgl@sss.pgh.pa.us
                                 1
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 Outline

I'm going to present three separate views of PostgreSQL.
Each view is equally valid but will teach you something different about the beast.


· VIEW 1: Processes and interprocess communication structure
     Key ideas: client/server separation, inter-server communication


· VIEW 2: System catalogs and data types
     Key ideas: extensibility, compartmentalization


· VIEW 3: Steps of query processing
     Key ideas: parse and plan trees, scans and joins, cost estimation




                                                                                     2
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 PostgreSQL processes: client/server communication

                       Client Processes                     Server Processes

                                                               Postmaster
                             Client                             Daemon
                           Application                          Process
                                               Initial
                            DB Requests     Connection
                                                                  Spawn
                            and Results      Request
                                                                  Server
                                 via            and
                                                                  Process
                             Library API   Authentication


                              Client                             Postgres
                            Interface      SQL Queries
                                                                  Server
                             Library       and Results          (Backend)




     · Multiple client libraries offer different APIs: libpq, ODBC, JDBC,Perl DBD
     · Client libraries insulate client applications from changes inon-the-wire protocol
                                                                                           3
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 PostgreSQL processes: inter-server communication


                Server Processes             Shared Memory   Unix System

                     Postmaster    Create
                      Daemon                     Shared         Kernel
                      Process                     Disk           Disk
                                                 Buffers        Buffers
                         Spawn
                          Server
                         Process
                                                 Shared
                                     Read/                       Disk
                        Postgres     Write       Tables
                         Server                                 Storage
                       (Backend)




                                                                           4
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 PostgreSQL processes: summary

Pros:
· Hard separation of client and server is good for security/reliability
· Works well in a networked environment
· Portable across most flavors of Unix


Cons:
· Dependence on shared memory for inter-server communication limits scalability
     A single server site can't be spread across multiple machines
· Connection startup overhead is bad for short-duration client tasks
     Usual workaround: client-side connection pooling, as in AOLServer for example




                                                                                     5
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 System catalogs and data types: introduction

Postgres is catalog-driven to a much greater extent than most other DBMSes.


     · We have the usual sort of system catalogs to describe tables, theircolumns,
      their indexes, etc.


     · But we also use system catalogs to store informationabout datatypes, functions,
      operators, index access methods, and so forth.


     · The system can be extended by adding new catalog entries (and writingthe
      underlying code, in the case of adding a function).




                                                                                         6
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 System catalogs and data types: basic catalogs

Basic system catalogs that describe a table:
     pg_class: one row for each table in the database, containing name,owner,
     access permissions, number of columns, etc.


     pg_attribute: one row for each column of each table, containing name,data type,
     column number, etc.


     pg_index: relates a table to its indexes. (Indexes aretables and so have their own
     entries in pg_class and pg_attribute, too.) One row per index, containingreferences
     to pg_class entries of index and underlying table, plus info about whichcolumns
     of the table the index is computed on and what the index operators are.


     Lots of less-important tables, such as pg_relcheck which stores constraintdefinitions,
     but I'm just hitting the high spots.
                                                                                              7
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 System catalogs and data types: functions

     pg_proc: defines functions. Each function has one row that gives itsname,
     input argument types, result type, implementation language,and definition
     (either the text, if it's in an interpreted language,or a reference to its executable
     code, if it's compiled). Compiled functionscan be statically linked into the server,
     or stored in shared librariesthat are dynamically loaded on first use. Compiled
     functions are typically written in C, though in theory one could use other choices.


     pg_language: defines implementation languages for functions. Languageswith
     hard-wired support are internal (statically-linked compiledcode),
     C (dynamically-linked compiled code), and SQL (body is one or moreSQL queries).
     Optional languages currently include pl/pgsql (PL/SQL-ish),tcl, and perl, with more
     to come. These languages are supported by handlersthat are dynamically-linked
     functions --- the core server doesn't know athing about them.


                                                                                             8
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 System catalogs and data types: example functions

C:
      int4
      square_int4 (int4 x)
      {
          return x * x;
      }


     Compile the above into a shared-library file, then say:
      CREATE FUNCTION square(int4) RETURNS int4
      AS '/path/to/square.so', 'square_int4'
      LANGUAGE 'C';



PL/PGSQL:
      CREATE FUNCTION square(int4) RETURNS int4 AS 'begin
      return $1 * $1;
      end;' LANGUAGE 'plpgsql';




                                                               9
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 System catalogs and data types: aggregate functions

     pg_aggregate: defines aggregate functions like min(),max(), count(). Aggregates
     involve a working-state datatype, an update function,and a final-output function.

     AVG(int4) in pltcl:
      -- The working state is a 2-element integer array, sum and count.
      -- We use split(n) as a quick-and-dirty way of parsing the input array
      -- value, which comes in as a string like '{1,2}'. There are better ways...

      create function tcl_int4_accum(int4[], int4) returns int4[] as '
          set state [split $1 "{,}"]
          set newsum [expr {[lindex $state 1] + $2}]
          set newcnt [expr {[lindex $state 2] + 1}]
          return "{$newsum,$newcnt}"
      ' language 'pltcl';

      create function tcl_int4_avg(int4[]) returns int4 as '
          set state [split $1 "{,}"]
          if {[lindex $state 2] == 0} { return_null }
          return [expr {[lindex $state 1] / [lindex $state 2]}]
      ' language 'pltcl';

      create aggregate tcl_avg (
              basetype = int4,               --   input datatype
              sfunc = tcl_int4_accum,        --   update function name
              stype = int4[],                --   working-state datatype
              finalfunc = tcl_int4_avg,      --   final-output function name
              initcond = '{0,0}'             --   initial value of working state
          );
                                                                                         10
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 System catalogs and data types: operators

     pg_operator: defines operators that can be used inexpressions.
     An operator is mainly just syntactic sugar for a function of one or twoarguments.


     Do-it-yourself exponentiation operator:
      CREATE FUNCTION mypower(float8, float8) RETURNS float8 AS 'begin
      return exp(ln($1) * $2);
      end;' LANGUAGE 'plpgsql';

      CREATE OPERATOR ** (
          procedure = mypower,
          leftarg = float8,
          rightarg = float8 );

      SELECT 44 ** 2, 81 ** 0.5;
       ?column? | ?column?
      ----------+----------
           1936 |        9
      (1 row)




                                                                                         11
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 System catalogs and data types: data types

     pg_type: defines the basic data types that values stored intables can have
     and that are accepted and returned by operators and functions.
     New data types can be invented by making new entries in pg_type.


     Example: say your application wants to store coordinates of points in3-D space.
     The built-in type Point won't do (it's only 2-D), so youbuild a 3-D point datatype.
     At minimum, a data type must have two associated functions, which convertits
     internal representation to and from external textual form. Havingwritten these
     functions, you can define the type to Postgres:
      CREATE TYPE point3 (
          input = point3in,
          output = point3out,
          internallength = 24,        -- space for three float8's
          alignment = double );       -- ensure storage will be aligned properly




                                                                                           12
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 System catalogs and data types: data types

     Typically one wants to do more with a datatype than just store and retrievevalues,
     so the next step is to add functions and operators that do usefulthings with the datatype.
     For instance, distance between two points isa function you'd probably want to have for
     your 3-D point type. This iswhere things get tedious...

     A new datatype can be made indexable by adding entriesto pg_amop and pg_amproc.
     These entriestell the indexing machinery about a set ofcomparison operators and
     functions that behave in the way an indexaccess method expects. For example,
     to be btree-indexable, entries mustbe provided for a datatype's <  >=
     comparison operators, as well as a 3-way sort comparison function.

     None of the standard index types are very suitable for 3-D points (R-trees aregeometric,
     but only 2-D). In theory you could write a new index access methodfor 3-D R-trees
     and link it into the system with a new pg_am entry. No onehas actually done
     something like that in recent memory, however.
                                                                                             13
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 System catalogs and data types: summary

Pros:
· Extensibility of functions and datatypes is a big win for specializedapplications


Cons:
· Making a new datatype with a usefully rich set of functions is a lotof work
· System design demands an "arm's length", "black box" treatment ofobjects
     For example, MAX() and MIN() aggregate functions are processed the same as allother
     aggregates: they must scan all the data. In the presence of an ordered index,these
     could be implemented more efficiently by looking at the extremal indexvalue. But
     making this happen in a way that isn't a complete kluge requiresdesigning a general
     representation of a connection between aggregates andindexes, which is a rather hard
     problem. The system design isn't veryforgiving of partial solutions to issues like this.



                                                                                                14
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 Steps of query processing: overview

                                               Client
                                           Interface Lib

                 SQL Queries                                            Query Results


                                       Postgres Server
                        1                                                    4
                                  Look up              Fetch/
                   Parser                                                Executor
                                   Object               Store
                                 Definitions Database User Data
                 Parse Trees                   Tables                    Plan Trees
                                  Look up
                                 Rule/View                 Look up
                                 Definitions               Statistics
                        2                                                    3
                  Rewriter              Rewritten Parse Trees            Planner


     Key data structures: parse tree, plan tree
                                                                                        15
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 Steps of query processing: parser

Input:
      SELECT * FROM tab1, tab2 WHERE tab1.a = tab2.f

Output:


                       SELECT Query
                           Target List                    tab1.a       int4
                                                          tab1.b       text
                           Join Tree                      tab1.c       float8
                                                          tab2.d       text
                                                          tab2.e       text
                          Qualification
                                                          tab2.f       int4



                                                              Cross Join
                                 int4 =
                                                       tab1                tab2
                    tab1.a                tab2.f

                                                                                  16
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 Steps of query processing: rewriter

Views are handled by substitution of subqueries into the parse tree.
     For example, if tab2 were a view then the rewriter would emit somethinglike this:

                        SELECT Query
                            Target List               tab1.a       int4
                                                      tab1.b       text
                            Join Tree                 tab1.c       float8
                                                      tab2.d       text
                                                      tab2.e       text
                           Qualification
                                                      tab2.f       int4



                                                          Cross Join
                                int4 =
                                                   tab1                     SELECT Query
                       tab1.a             tab2.f                              Target List
                                                                               Join Tree
                                                                              Qualification




ON INSERT/UPDATE/DELETE rules require more extensive transformations,
and may produce multiple queries from a single query.
                                                                                              17
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 Steps of query processing: executor

The basic idea of the executor is that it executes a plan tree, which isa pipelined
demand-pull network of processing nodes. Each node producesthe next tuple
in its output sequence each time it is called. Upper-levelnodes call their
subnodes to get input tuples, from which they compute theirown output tuples.


Bottom-level nodes are scans of physical tables --- either sequential scans
or index scans.


Upper-level nodes are usually join nodes --- nested-loop, merge, and hashjoins
are available. Each join node combines two input tuple streams intoone.


There are also special-purpose node types, such as SORT and AGGREGATE.



                                                                                      18
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 Steps of query processing: executor example

Query:
      SELECT SUM(a1)+1 FROM a WHERE a2 < a3

Plan tree:

                       Executor top level


                                      Single result tuple of aggregate calculation

                            Aggregate
                                                                                                   +
                             Target List
                                                                                     SUM result          1
                          Working state(s)
                                                              Running state
                                                               for SUM(a1)


                                       Equivalent to SELECT a1 FROM a WHERE a2 < a3

                        Sequential Scan
                                 Table: a                                                         a.a1
                             Target List                              int4 <
                            Qualification
                                                              a.a2             a.a3
                                                                                                             19
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 Steps of query processing: executor example

Query:
      SELECT DISTINCT a1, b1 FROM a, b WHERE a2 = b2 AND a3 = 42

Plan tree:
   Executor top level


              Adjacent duplicate tuples removed

        Unique
                                                                              SELECT a1, b1 FROM a, b WHERE a2 = b2 AND a3 = 42
              Tuples sorted to bring duplicates together
                                                                    Nestloop Join
          Sort                                                     Target List: a1, b1
   Sort Columns: a1, b1                                           Qualification: a2 = b2

                                                                     Current a2 value

                                                       SELECT a1, a2 FROM a WHERE a3 = 42              SELECT b1, b2 FROM b WHERE b2 = $1

                                         Index Scan                                            Index Scan
                                            Table: a                                              Table: b
                                          Index: a(a3)                                          Index: b(b2)
                                       Target List: a1, a2                                   Target List: b1, b2
                                     Qualification: a3 = 42                                 Qualification: b2 = $1


                                                                                                                                      20
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 Steps of query processing: execution of non-SELECT queries

What if the query's not a SELECT?
Guess what, the plan tree machinery doesn't really care.


· INSERT is same as SELECT except for where the result rows go.


· For UPDATE/DELETE, the planner adds a hidden targetlist item to returnthe TID
 of the selected rows, which the executor's top level uses to determinewhich row
 to update or delete.


So, for planning and most of the executor, all queries look like SELECTs.
Only the executor top level has to behave differently depending on thequery type.




                                                                                    21
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 Steps of query processing: planner

Now that we've looked at plan trees, we can talk about the planner.
The basic idea of the planner is cost-estimate-based selection of the bestplan tree
for a given query.
Simple example:
           SELECT * FROM t WHERE f1 < 100

Assuming there is an index on t(f1), two possible execution plans areconsidered:
· sequential scan over all of t
· index scan with index restriction f1 < 100
Costs of each plan (in disk pagefetches and CPU time) are estimated and the plan
with lower estimatedcost is selected.
Note that the indexscan is not an automatic winner,and should not be. The choice
will depend on what fraction of the rowsof t are estimated to be retrieved.




                                                                                      22
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 Steps of query processing: join planning

In a multi-table query, we first estimate costs for sequential scansand index scans
(where applicable) of each component table, then build upa join tree in which all
possible pairwise join paths are considered.The k 'th level of the tree gives us the
cheapest ways tojoin any k ofthe base tables, and at the top level (level n for
an n -table query) we find the cheapest overall joinpath.


If there are a large number of tables involved (more than about ten),this exhaustive
search of the join space is impractical due to exponentialgrowth of the number of
possible join paths. In such cases we fall backon a probabilistic search through a
limited number of alternatives, using agenetic optimization algorithm.




                                                                                       23
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 Steps of query processing: summary

Pros:
· System usually finds a good query plan without human assistance


Cons:
· Plan is only as good as the system's cost models and statistics aboutuser data
     It's possible to choose a spectacularly bad plan if the statisticsare way off base.
     We are working on improving the models andstatistics.
· Time to generate a plan can be annoying, particularly forrepetitive queries
     Pushing complex queries into plpgsqlfunctions helps, since plpgsql caches plans.
     We are also talking aboutcreating a more generic plan cache mechanism.




                                                                                           24
31 Oct 2000 Tom Lane
A Tour of PostgreSQL Internals


 Summary

PostgreSQL is a complex system that takes a good deal of study to become
familiar with.


If you do study it, you find considerable elegance of design.
     ... much of the credit for which is due to Stonebraker and his students atBerkeley,
     not to the current developers.


While PostgreSQL has some designed-in limitations, it is also capable ofdoing
things that no competing DBMS can.




                                                                                           25
31 Oct 2000 Tom Lane