Monday, August 29, 2011

A crash course in PostgreSQL, Part 1

SkyHi @ Monday, August 29, 2011
PostgreSQL is a first-rate, enterprise-worthy open source RDBMS (relational database management system) that compares very favorably to high-priced closed-source commercial databases. Databases are complex, tricksy beasts full of pitfalls. In this two-part crash course, we'll get a new PostgreSQL database up and running with elegant ease, and learn important fundamentals.
['NewSQL' could combine the best of SQL and NoSQL and NoSQL offers users scalability, flexibility, speed]
If you're a database novice, then give yourself plenty of time to learn your way around. PostgreSQL is a great database for beginners because it's well documented and aims to adhere to standards. Even better, everything is discoverable -- nothing is hidden, not even the source code, so you can develop as complete an understanding of it as you want.


The most important part of administering any database is preparation, in planning and design, and in learning best practices. A good requirements analysis will help you decide what data to store, how to organize it, and what business rules to incorporate. You'll need to figure out where your business logic goes -- in the database, in middleware, or applications? You may not have the luxury of a clean, fresh new installation, but must instead grapple with a migration from a different database. These are giant topics for another day; fortunately there are plenty of good resources online, starting with the excellent PostgreSQL manuals and Wiki.


We'll use three things in this crash course: PostgreSQL, its built-in interactive command shell psql, and the excellent pgAdmin3 graphical administration and development tool. Linux users will find PostgreSQL and pgAdmin3 in the repositories of their favorite Linux distributions, and there are downloads on for Linux, FreeBSD, Mac OS X, Solaris, and Windows. There are one-click installers for OS X and Windows, and they include pgAdmin3. Any of these operating systems are fine for testing and learning. For production use, I recommend a Linux or Unix server, because they're reliable, efficient, and secure.
Linux and FreeBSD split PostgreSQL into multiple packages. You want both the server and client. For example, on Debian the metapackage postgresql installs all of these packages:
# apt-get install postgresql
postgresql postgresql-9.0 postgresql-client-9.0 
           postgresql-client-common postgresql-common
See the detailed installation guides on the PostgresSQL wiki for more information for all platforms.
The downloads page also includes some live CDs which make it dead easy to set up a test server; simply boot the CD and go to work. For this article, I used a Debian Wheezy (Testing) system running PostgreSQL 9.0.4, the current stable release.

Creating and destroying a new PostgreSQL database

You'll have to jump through a number of startup and user account hurdles to get started. First, verify that your PostgreSQL server has started by opening psql, its interactive command shell:
$ psql
psql: could not connect to server: No such file or directory
Oops, PostgreSQL is not running. Go back to the appropriate installation guide to see how to start it on your system. On Debian, it starts automatically after installation, so the above command produces the following result:
carla@xena:~$ psql
psql: FATAL:  role "carla" does not exist
Well, excuse me all to heck! But this really isn't a big deal, because PostgreSQL creates a default postgres superuser with no password. So you have to change to this user, and then create a new user account that you will use to administer the database. On Linux and Unix you need to gain rootly powers, like this:
carla@xena:~$ su
root@xena:/home/carla# su postgres
There, now we can get some real work done! Let's create a carla superuser for PostgreSQL:
postgres@xena:/home/carla$ createuser carla
could not change directory to "/home/carla"
Shall the new role be a superuser? (y/n) y
These PostgreSQL roles, postgres and carla, are PostgreSQL user accounts that are independent of system accounts. A role can be a single user or a group of users. Roles can own database objects, such as tables, and can assign privileges to access those tables to other roles. Use the dropuser command to delete a role:
$ dropuser carla
Now, let's create a brand-new database:
$ createdb testdb

No news is good news; if this is successful there will be no feedback. You'll see a message only if something went wrong. Now let's destroy our new database:
$ dropdb testdb

Again, silence equals success.

Exploring PostgreSQL

Start by creating a new database so you have something to work with. There won't be any data in it yet, but that's all right. Once you've created to it, connect to it using the psql command:
$ psql testdb
psql (9.0.4)
Type "help" for help.

Excellent! We are now sitting at the command prompt of our new database. The hash mark indicates that a superuser is logged in. (An angle brace would indicate a less-privileged user.) You would use \q to quit (i.e., testdb=# \q). man psql lists all of the PostgresSQL commands. All commands that begin with a backslash are internal psql commands. They are not operating system shell commands, and they are not SQL commands.
Now, let's list all of our databases. (The last column of the table has been separated out so it'll all fit on the page, but this will be one table on your screen.)
$ testdb=# \list
 List of databases
   Name    |  Owner   | Encoding |  Collation  |    Ctype      
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 
           |          |          |             |             
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 
           |          |          |             |             
 testdb    | carla    | UTF8     | en_US.UTF-8 | en_US.UTF-8 
 |   Access privileges 
 | =c/postgres          +
 | postgres=CTc/postgres
 | =c/postgres          +
 | postgres=CTc/postgres
There's our little testdb down at the bottom. The postgres database holds system data. template0 and template1 are templates for new databases, and template1 is the default. (You can use any Postgres database as a template.) template0 cannot be changed, but template0 can. Let's take a look inside postgres.
$ psql postgres
psql (9.0.4)
Type "help" for help.

Who are the postgres DB users?
postgres=# \du
List of roles
 Role name |            Attributes             | Member of 
 carla     | Superuser, Create role, Create DB | {}
 postgres  | Superuser, Create role, Create DB | {}
What tablespaces do we have?
postgres=# \db
       List of tablespaces
    Name    |  Owner   | Location 
 pg_default | postgres | 
 pg_global  | postgres | 
(2 rows)
Wait, back up: What are tablespaces? That's PostgreSQL's name for the physical storage locations of your database objects on disk. In other words, they're plain old directories on your filesystem. So you can control your disk layout and put your database objects wherever you like; for example, you could put a heavily-used index on a fast disk, or move objects to another partition or disk if you run out of space. Postgres has a squillion built-in objects, such as tables, functions, data types, aggregates, operators, and views. It also supports user-created objects. You can see the objects in the postgres DB, as this abbreviated example shows:
postgres=# \ddS

                      Object descriptions
   Schema   |    Name    |  Object   |      Description                                                   
 pg_catalog | abbrev     | function  | abbreviated display of inet value
 pg_catalog | abs        | function  | absolute value
 pg_catalog | abstime    | data type | absolute, limited-range date and time 
 pg_catalog | aclitem    | data type | access control list
 pg_catalog | bit_and    | aggregate | bitwise-and bigint aggregate
Press the Q key to exit the object descriptions. \dd means "show descriptions of all objects that have descriptions." By default system objects are not shown, so to see these add S.
Postgres is more than a RDBMS; it calls itself an an object-relational database management system, or ORDBMS. This is not the same as a object-oriented database management system (OODBMS), but something of a hybrid, a traditional relational database with an object-oriented database model; it supports objects, classes and inheritance.

Viewing Tables

Let's look at some tables while we're in the postgres DB. First, list its system tables:
postgres=# \dtS
              List of relations
   Schema   |      Name       | Type  |  Owner   
 pg_catalog | pg_aggregate    | table | postgres
 pg_catalog | pg_am           | table | postgres
 pg_catalog | pg_amop         | table | postgres
What's inside the pg_am table?
postgres=# \d pg_am

Table "pg_catalog.pg_am"
     Column      |   Type   | Modifiers 
 amname          | name     | not null
 amstrategies    | smallint | not null
 amsupport       | smallint | not null
 amcanorder      | boolean  | not null
 amcostestimate  | regproc  | not null
 amoptions       | regproc  | not null
    "pg_am_name_index" UNIQUE, btree (amname)
    "pg_am_oid_index" UNIQUE, btree (oid)
This is just the table structure without data. It shows the column names, the data type for each column, and any optional modifiers. Try creating your own table in your test database. For instance, this simple table tracks my underground comics collection:
    name              varchar(80),
    publisher         varchar(80),
    date_published    date
You can copy and paste this right into your own psql prompt:
testdb=# CREATE TABLE comics (
testdb(#     name              varchar(80),
testdb(#     publisher         varchar(80),
testdb(#     date_published    date
testdb(# );
The commas tell psql where your columns end, and the semi-colon tells where your command ends. Try running the commands you've already learned on your test database and new table to see how they look. Use DROP TABLE tablename;
That's all for today. Come back for Part 2 to find out how to populate tables with data, to learn about schema, keys, normalization, views, and tuples, and to discover how to use pgAdmin3.