MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

PostgreSQL Arrays

with one comment

If you’re wondering about this post, it shows the basic array of a set of integers and strings before showing you how to create nested tables of data in PostgreSQL. By the way, they’re not called nested tables in PostgreSQL, like they are in Oracle but perform like their Oracle cousins.

Let’s create a table with an auto-incrementing column and two arrays, one array of integers and another of strings:

-- Conditionally drop the demo table.
DROP TABLE IF EXISTS demo;
 
-- Create the test table.
CREATE TABLE demo
( demo_id     serial
, demo_number integer[5]
, demo_string varchar(5)[7]);

You can insert test values like this:

INSERT INTO demo
(demo_number, demo_string)
VALUES
( array[1,2,3,4,5]
, array['One','Two','Three','Four','Five','Six','Seven']);

Then, you can query them with this unnest function, like:

SELECT  unnest(demo_number) AS numbers
,       unnest(demo_string) AS strings
FROM    demo;

It returns:

 numbers | strings
---------+---------
 1       | One
 2       | Two
 3       | Three
 4       | Four
 5       | Five
         | Six
         | Seven
(7 rows)

You may note that the two arrays are asymmetrical. It only becomes an issue when you navigate the result in a PL/pgSQL cursor or imperative programming language, like Python.

Now, let’s do something more interesting like work with a composite user-defined type, like the player structure. You would create the composite user-defined type with this syntax:

-- Conditionally drop the player type.
DROP TYPE IF EXISTS player;
 
-- Create the player type.
CREATE TYPE player AS
( player_no        integer
, player_name      varchar(24)
, player_position  varchar(14)
, ab               integer
, r                integer
, h                integer
, bb               integer
, rbi              integer );

You can create a world_series table that include a players column that uses an array of player type, like

-- Conditionally drop the world_series table.
DROP TABLE IF EXISTS world_series;
 
-- Create the player type.
CREATE TABLE world_series
( world_series_id  serial
, team             varchar(24)
, players          player[30]
, game_no          integer
, year             integer );

If you’re familiar with the Oracle Database, you’d have to specify a tested table in the syntax. Fortunately, PostgreSQL doesn’t require that.

Insert two rows with the following statement:

INSERT INTO world_series
( team
, players
, game_no
, year )
VALUES
('San Francisco Giants'
, array[(24,'Willie Mayes','Center Fielder',5,0,1,0,0)::player
       ,(5,'Tom Haller','Catcher',4,1,2,0,2)::player]
, 4
, 1962 );

You can append to the array with the following syntax. A former student and I have a disagreement on whether this is shown in the PostgreSQL 8.15.4 Modifying Array documentation. I believe it can be inferred from the document and he doesn’t believe so. Anyway, here’s how you add an element to an existing array in a table with the UPDATE statement:

UPDATE world_series
SET    players = (SELECT array_append(players,(7,'Henry Kuenn','Right Fielder',3,0,0,1,0)::player) FROM world_series)
WHERE  team = 'San Francisco Giants'
AND    year = 1962
AND    game_no = 4;

Like Oracle’s nested tables, PostgreSQL’s arrays of composite user-defined types requires writing a PL/pgSQL function. I’ll try to add one of those shortly in another blog entry to show you how to edit and replace entries in stored arrays of composite user-defined types.

You can query the unnested rows and get a return set like a Python tuple with the following query:

SELECT unnest(players) AS player_list
FROM   world_series
WHERE  team = 'San Francisco Giants'
AND    year = 1962
AND    game_no = 4;

It returns the three rows from the players array:

                 player_list
----------------------------------------------
 (24,"Willie Mayes","Center Field",5,0,1,0,0)
 (5,"Tom Haller",Catcher,4,1,2,0,2)
 (7,"Henry Kuenn","Right Fielde",3,0,0,1,0)
(3 rows)

It returns the data set in entry-order. If we step outside of the standard 8.15 Arrays PostgreSQL Documentation, you can do much more with arrays (or nested tables). The balance of this example demonstrates some new syntax that helps you achieve constructive outcomes in PostgreSQL.

You can use a Common Table Expression (CTE) to get the columnar display of the player composite user-defined type. This type of solution is beyond the standard , like:

WITH list AS
 (SELECT unnest(players) AS row_result
  FROM   world_series
  WHERE  team = 'San Francisco Giants'
  AND    year = 1962
  AND    game_no = 4)
SELECT  (row_result).player_name
,       (row_result).player_no
,       (row_result).player_position
FROM     list;

If you’re unfamiliar with accessing composite user-defined types, I wrote a post on that 7 years ago. You can find the older blog entry PostgreSQL Composites on my blog.

It returns only the three requested columns of the player composite user-defined type:

 player_name  | player_no | player_position
--------------+-----------+-----------------
 Willie Mayes |        24 | Center Fielder
 Tom Haller   |         5 | Catcher
 Henry Kuenn  |         7 | Right Fielder
(3 rows)

You should note that the data is presented in an entry-ordered manner when unnested alone in the SELECT-list. That behavior changes when the SELECT-list includes non-array data.

The easiest way to display data from the non-array and array columns is to list them inside the SELECT-list of the CTE, like:

WITH list AS
 (SELECT game_no AS game
  ,      year
  ,      unnest(players) AS row_result
  FROM   world_series
  WHERE  team = 'San Francisco Giants'
  AND    year = 1962
  AND    game_no = 4)
SELECT   game
,        year 
,       (row_result).player_name
,       (row_result).player_no
,       (row_result).player_position
FROM     list;

It returns an ordered set of unnested rows when you include non-array columns, like:

 game | year | player_name  | player_no | player_position
------+------+--------------+-----------+-----------------
    4 | 1962 | Henry Kuenn  |         7 | Right Fielder
    4 | 1962 | Tom Haller   |         5 | Catcher
    4 | 1962 | Willie Mayes |        24 | Center Fielder
(3 rows)

While you can join the world_series table to the unnested array rows (returned as a derived table, its a bad idea. The mechanics to do it require you to return the primary key column in the same SELECT-list of the CTE. Then, you join the CTE list to the world_series table by using the world_series_id primary key.

However, there is no advantage to an inner join approach and it imposes unnecessary processing on the database server. The odd rationale that I heard when I noticed somebody was using a CTE to base-table join was: “That’s necessary so they could use column aliases for the non-array columns.” That’s not true because you can use the aliases inside the CTE, as shown above when game is an alias to the game_no column.

As always, I hope this helps those looking to solve a problem in PostgreSQL.