MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

PostgreSQL Auto IDs

with 2 comments

PostgreSQL’s approach to automatic numbering is as simple as Oracle but different than MySQL, and Microsoft SQL Server. For example, you have a two-step process with Oracle, PostgreSQL, MySQL, and Microsoft SQL Server. First, you create an Oracle table with the GENERATED AS IDENTITY clause, a PostgreSQL table with the SERIAL data type, a MySQL table with the AUTO_INCREMENT clause, and a Microsoft SQL Server table with the IDENTITY(1,1) clause. Then, you need to write an INSERT statement for Oracle, MySQL, or Microsoft SQL Server like:

  1. Oracle’s INSERT statement excludes the auto-incrementing column from the list of columns or provides a NULL value in the VALUES-list. You can then assign the RETURNING INTO result from an INSERT statement to a session-level (bind) variable.
  2. MySQL’s INSERT statement excludes the auto-incrementing column from the list of columns or provides a NULL value in the VALUES-list. You can then assign the LAST_INSERT_ID() function value to a session-level variable, and populate a foreign key column.
  3. Microsoft SQL Server’s INSERT statement excludes the auto-incrementing column from the list of columns or provides a NULL value in the VALUES-list. You can then assign the SCOPE_IDENTITY() function’s value to a session-level variable, and populate a foreign key column.

PostgreSQL differs because it works differently between the SQL and PL/pgSQL contexts. Let’s look at how you link the insert of data into two tables in both contexts.

The following PostgreSQL syntax creates an ADDRESS table with an auto incrementing ADDRESS_ID column that uses a SERIAL data type, which acts like an auto numbering column:

/* Create a customer table. */
CREATE TABLE customer
( customer_id     SERIAL CONSTRAINT customer_pk PRIMARY KEY
, first_name      VARCHAR(20)
, last_name       VARCHAR(20));
 
/* Create an address table. */
CREATE TABLE address
( address_id      SERIAL CONSTRAINT address_pk PRIMARY KEY
, customer_id     INTEGER
, street_address  VARCHAR(40)
, city            VARCHAR(30)
, state           VARCHAR(8)
, zip_code        VARCHAR(10));

If you want to insert one row into the CUSTOMER table and a related row in the ADDRESS table. You have two possible approaches. One works in both the SQL and PL/pgSQL contexts. That mechanism requires you to use a scalar subquery to capture the foreign key value of the CUSTOMER_ID column in the ADDRESS table, like this:

/* Insert into customer table. */
INSERT INTO customer
( first_name, last_name ) 
VALUES
('F. Scott','Fitzgerald');
 
/* Insert into address table. */
INSERT INTO address
( customer_id
, street_address
, city
, state
, zip_code )
VALUES
((SELECT   customer_id
  FROM     customer
  WHERE    first_name = 'F. Scott'
  AND      last_name = 'Fitzgerald')
,'599 Summit Avenue'
,'St. Paul'
,'Minnesota'
,'55102');

The RETURNING INTO clause of PostgreSQL only works in a PL/pgSQL context, like this:

DO $$
DECLARE lv_customer_id INTEGER;
BEGIN
  /* Insert into customer table. */
  INSERT INTO customer
  ( first_name, last_name ) 
  VALUES
  ('Madeleine','Smith') 
  RETURNING customer_id INTO lv_customer_id;
 
  /* Insert into address table. */
  INSERT INTO address
  ( customer_id
  , street_address
  , city
  , state
  , zip_code )
  VALUES
  ( lv_customer_id
  ,'7 Blythswood Square'
  ,'Glasgow'
  ,'Scotland'
  ,'G2 4BG');
 
  /* Manage any exceptions. */
  EXCEPTION
    WHEN OTHERS THEN
      RAISE NOTICE '% %', SQLERRM, SQLSTATE;
END$$;

You query the auto generated values and data from the INSERT statement to the CUSTOMER table with a scalar subquery against the natural key (the FIRST_NAME and LAST_NAME columns) from the ADDRESS table. The following is an example of such a query:

SELECT   *
FROM     customer c INNER JOIN address a
ON       c.customer_id = a.customer_id;

It returns:

 customer_id | first_name | last_name  | address_id | customer_id |   street_address    |   city   |   state   | zip_code
-------------+------------+------------+------------+-------------+---------------------+----------+-----------+----------
           1 | F. Scott   | Fitzgerald |          1 |           1 | 599 Summit Avenue   | St. Paul | Minnesota | 55102
           2 | Madeleine  | Smith      |          2 |           2 | 7 Blythswood Square | Glasgow  | Scotland  | G2 4BG
(2 rows)

My take is that the RETURNING column_value INTO local_value clause is a better approach than using Oracle’s .NEXTVAL and .CURRVAL values. I also think the RETURNING INTO clause is a better approach than using MySQL’s LAST_INSERT_ID() or Microsoft SQL Server’s SCOPE_IDENTITY().

Initially, I felt it was odd that the PostgreSQL disallows the RETURNING INTO clause in a SQL context, because it allows the syntax in a PL/pgSQL context. After some reflection the choice makes more sense because most developers work within a procedural context when they use transactions across two or more tables. PL/pgSQL is PostgreSQL’s procedural context from managing transactions across two or more tables.

As always, I hope this helps.

Written by maclochlainn

March 15th, 2015 at 9:37 pm