MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Archive for the ‘PostgreSQL DBA’ tag

PostgreSQL Unicode

with 3 comments

It seems unavoidable to use Windows. Each time I’m compelled to run tests on the platform I find new errors. For example, the Windows console defaults to a legacy code page rather than UTF-8, and as a result there’s a mismatch when you want to use Unicode in PostgreSQL.

For example, you can change the Active Console Code Page with the chcp (change code page) command to match the one PostgreSQL uses, like:

chcp 1252

It lets you avoid this warning message:

Password for user postgres:
psql (14.1)
WARNING: Console code page (437) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
Type "help" for help.
 
postgres=#

However, it won’t avoid display issues with real Unicode values. For example, let’s use a small international table like the following:

/* Conditionally drop the conquistador table. */
DROP TABLE IF EXISTS conquistador;
 
/* Create the conquistador table. */
CREATE TABLE conquistador
( conquistador_id   SERIAL
, conquistador      VARCHAR(30)
, actual_name       VARCHAR(30)
, nationality       VARCHAR(30)
, lang              VARCHAR(2));
 
/* Insert some conquistadors into the table. */
INSERT INTO conquistador
( conquistador
, actual_name
, nationality
, lang )
VALUES
 ('Juan de Fuca','Ioánnis Fokás','Greek','el')
,('Nicolás de Federmán','Nikolaus Federmann','German','de')
,('Sebastián Caboto','Sebastiano Caboto','Venetian','it')
,('Jorge de la Espira','Georg von Speyer','German','de')
,('Eusebio Francisco Kino','Eusebius Franz Kühn','Italian','it')
,('Wenceslao Linck','Wenceslaus Linck','Bohemian','cs')
,('Fernando Consag','Ferdinand Konšcak','Croatian','sr')
,('Américo Vespucio','Amerigo Vespucci','Italian','it')
,('Alejo García','Aleixo Garcia','Portuguese','pt');
 
/* Query the values from the conquistador table. */
SELECT * FROM conquistador;

When you call the script to load it, like:

\i testScript.sql

It’ll display the following, which you can check against the strings in the VALUES clause above. There are encoding issues on lines 1, 2, 3, 5, 7, and 8 below.

 conquistador_id |      conquistador      |     actual_name      | nationality | lang
-----------------+------------------------+----------------------+-------------+------
               1 | Juan de Fuca           | Ioánnis Fokás      | Greek       | el
               2 | Nicolás de Federmán  | Nikolaus Federmann   | German      | de
               3 | Sebastián Caboto      | Sebastiano Caboto    | Venetian    | it
               4 | Jorge de la Espira     | Georg von Speyer     | German      | de
               5 | Eusebio Francisco Kino | Eusebius Franz Kühn | Italian     | it
               6 | Wenceslao Linck        | Wenceslaus Linck     | Bohemian    | cs
               7 | Fernando Consag        | Ferdinand Konšcak   | Croatian    | sr
               8 | Américo Vespucio      | Amerigo Vespucci     | Italian     | it
               9 | Alejo García           | Aleixo Garcia        | Portuguese  | pt
(9 rows)

If you’re like me, you’ll find it annoying. The problem is a mismatch between the encoding of the characters in the script and the code page that the Windows session uses to interpret them. The bytes are read back as unintended characters from another character encoding set.
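A minimal Python sketch of this kind of mismatch, assuming the script's text is UTF-8 encoded and the bytes are read back as Windows code page 1252. The function names are hypothetical and only illustrate the effect; they are not part of psql or PostgreSQL.

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as Windows code page 1252."""
    return text.encode("utf-8").decode("cp1252")


def repair_mojibake(garbled: str) -> str:
    """Reverse the misreading by re-encoding as cp1252 and decoding as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


# Each accented character becomes two unrelated characters when misread.
for name in ("Ioánnis Fokás", "Eusebius Franz Kühn", "Ferdinand Konšcak"):
    garbled = misread_utf8_as_cp1252(name)
    print(f"{name} -> {garbled} -> {repair_mojibake(garbled)}")
```

Each accented letter occupies two bytes in UTF-8, so it shows up as two characters after the misreading, which is also why the column spacing drifts in the output above.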

While you can’t set Windows generic encoding to 65001 without causing the system problems, you can set the Active Console Code Page value in the scope of a command-line session before running the script.

The chcp command lets you set it to UTF-8 (code page 65001), like:

chcp 65001

Now, rerun the script and PostgreSQL will display the correct characters, with some spacing irregularities. However, that’s not what’s important when you call the table from another programming language through the ODBC layer. The data will be returned as a UTF-8 encoded stream.

 conquistador_id |      conquistador      |     actual_name      | nationality | lang
-----------------+------------------------+----------------------+-------------+------
               1 | Juan de Fuca           | Ioánnis Fokás        | Greek       | el
               2 | Nicolás de Federmán    | Nikolaus Federmann   | German      | de
               3 | Sebastián Caboto       | Sebastiano Caboto    | Venetian    | it
               4 | Jorge de la Espira     | Georg von Speyer     | German      | de
               5 | Eusebio Francisco Kino | Eusebius Franz Kühn  | Italian     | it
               6 | Wenceslao Linck        | Wenceslaus Linck     | Bohemian    | cs
               7 | Fernando Consag        | Ferdinand Konšcak    | Croatian    | sr
               8 | Américo Vespucio       | Amerigo Vespucci     | Italian     | it
               9 | Alejo García           | Aleixo Garcia        | Portuguese  | pt
(9 rows)

This is similar to an error I encountered when testing MySQL Workbench’s ability to export SQL Server databases 10 years ago. I thought a solution that coerces correct Unicode data insertion might help those who may also be surprised by the behavior.

PL/pgSQL List to Struct

without comments

This blog post addresses how to convert a list of values into a structure (in C/C++ it’s a struct, in Java it’s an ArrayList, and in PL/pgSQL it’s a composite type). The cast_strings function converts a list of strings into a record data structure. It calls the verify_date function to identify a DATE data type and uses regular expressions to identify numbers and strings.

You need to build the struct type below first.

CREATE TYPE struct AS
( xnumber  DECIMAL
, xdate    DATE
, xstring  VARCHAR(100));

The cast_strings function is defined below:

CREATE FUNCTION cast_strings
( pv_list  VARCHAR(10)[] ) RETURNS struct AS
  $$
  DECLARE
  /* Declare a UDT and initialize an empty struct variable. */
  lv_retval  STRUCT := (null, null, null); 
  BEGIN  
    /* Loop through the list of values and assign each to the first matching member. */
    FOR i IN 1..ARRAY_LENGTH(pv_list,1) LOOP
      /* Order if statements by evaluation. */
      CASE
        /* Check for a value with only digits. */
        WHEN lv_retval.xnumber IS NULL AND REGEXP_MATCH(pv_list[i],'^[0-9]+$') IS NOT NULL THEN
          lv_retval.xnumber := pv_list[i];
        /* Check for a valid date; verify_date returns a BOOLEAN, which is never NULL. */
        WHEN lv_retval.xdate IS NULL AND verify_date(pv_list[i]) THEN
          lv_retval.xdate := pv_list[i];
        /* Check for a string with characters, whitespace, and digits. */
        WHEN lv_retval.xstring IS NULL AND REGEXP_MATCH(pv_list[i],'^[A-Za-z 0-9]+$') IS NOT NULL THEN
          lv_retval.xstring := pv_list[i];
        ELSE
          NULL;
      END CASE;
    END LOOP;
 
    /* Return the results. */
    RETURN lv_retval;
  END;
$$ LANGUAGE plpgsql;

There are two test cases for the cast_strings function. One uses a DO-block and the other a query.

  • The first use-case checks with a DO-block:

    DO
    $$
    DECLARE
      lv_list    VARCHAR(11)[] := ARRAY['86','1944-04-25','Happy'];
      lv_struct  STRUCT;
    BEGIN
      /* Pass the array of strings and return a record type. */
      lv_struct := cast_strings(lv_list);
     
      /* Print the elements returned. */
      RAISE NOTICE '[%]', lv_struct.xnumber;
      RAISE NOTICE '[%]', lv_struct.xdate;
      RAISE NOTICE '[%]', lv_struct.xstring;
    END;
    $$;

    It should return:

    psql:verify_pg.SQL:263: NOTICE:  [86]
    psql:verify_pg.SQL:263: NOTICE:  [1944-04-25]
    psql:verify_pg.SQL:263: NOTICE:  [Happy]

    The program returns a structure with values converted into their appropriate data type.

  • The second use-case checks with a query:

    WITH get_struct AS
    (SELECT cast_strings(ARRAY['99','2015-06-14','Agent 99']) AS mystruct)
    SELECT (mystruct).xnumber
    ,      (mystruct).xdate
    ,      (mystruct).xstring
    FROM    get_struct;

    It should return:

     xnumber |   xdate    | xstring
    ---------+------------+----------
          99 | 2015-06-14 | Agent 99
    (1 row)

    The query defines a call to the cast_strings function with a valid set of values and then displays the elements of the returned structure.
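For comparison, here is a rough Python sketch of the same idea: walk a list of strings and place each one in the first empty, matching member of a structure. The Struct dataclass and the parse_date helper are hypothetical names for illustration, and the date check accepts only the YYYY-MM-DD form.

```python
import re
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List, Optional


@dataclass
class Struct:
    xnumber: Optional[Decimal] = None
    xdate: Optional[date] = None
    xstring: Optional[str] = None


def parse_date(value: str) -> Optional[date]:
    """Return a date for a valid YYYY-MM-DD string, otherwise None."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:
        return None


def cast_strings(values: List[str]) -> Struct:
    """Assign each string to the first empty member whose pattern it matches."""
    result = Struct()
    for value in values:
        parsed = parse_date(value)
        if result.xnumber is None and re.fullmatch(r"[0-9]+", value):
            result.xnumber = Decimal(value)
        elif result.xdate is None and parsed is not None:
            result.xdate = parsed
        elif result.xstring is None and re.fullmatch(r"[A-Za-z 0-9]+", value):
            result.xstring = value
    return result


print(cast_strings(["86", "1944-04-25", "Happy"]))
```

Because the date branch tests an actual parse result rather than a non-null flag, the order of the input strings doesn't matter in this sketch.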

As always, I hope this helps those looking for how to solve this type of problem. Just a quick reminder that this was written and tested in PostgreSQL 14.

PL/pgSQL Date Function

with 2 comments

This post provides an example of using PostgreSQL’s REGEXP_MATCH function, which works very much like the REGEXP_LIKE function in Oracle when you test its result with IS NOT NULL, and a verify_date function that checks whether a string holds a valid date.

Here’s a basic anonymous block to show how to use the REGEXP_MATCH function:

DO
$$
DECLARE
  lv_date_in  DATE := '2022-10-22';
BEGIN
 
  IF (REGEXP_MATCH('2022-10-02','^[0-9]{4,4}-[0-9]{2,2}-[0-9]{2,2}$') IS NOT NULL) THEN
    RAISE NOTICE '[%]', 'Truth';
  END IF;
END;
$$;

The following is a verify_date function, which takes a string with the ‘YYYY-MM-DD’ or ‘YY-MM-DD’ format and returns a BOOLEAN true or false value.

CREATE FUNCTION verify_date
  ( IN pv_date_in  VARCHAR(10)) RETURNS BOOLEAN AS
  $$
  DECLARE
    /* Local return variable. */
    lv_retval  BOOLEAN := FALSE;
  BEGIN
    /* Check for a YYYY-MM-DD or YY-MM-DD string. */
    IF REGEXP_MATCH(pv_date_in,'^[0-9]{2,4}-[0-9]{2,2}-[0-9]{2,2}$') IS NOT NULL THEN
 
      /* Case statement checks for 28 or 29, 30, or 31 day month. */
      CASE
        /* Valid 31 day month date value. */
        WHEN (LENGTH(pv_date_in) = 10 AND
              SUBSTRING(pv_date_in,6,2) IN ('01','03','05','07','08','10','12') AND
              TO_NUMBER(SUBSTRING(pv_date_in,9,2),'99') BETWEEN 1 AND 31) OR
             (LENGTH(pv_date_in) = 8 AND
              SUBSTRING(pv_date_in,4,2) IN ('01','03','05','07','08','10','12') AND
              TO_NUMBER(SUBSTRING(pv_date_in,7,2),'99') BETWEEN 1 AND 31) THEN 
          lv_retval := TRUE;
 
        /* Valid 30 day month date value. */
        WHEN (LENGTH(pv_date_in) = 10 AND
              SUBSTRING(pv_date_in,6,2) IN ('04','06','09','11') AND
              TO_NUMBER(SUBSTRING(pv_date_in,9,2),'99') BETWEEN 1 AND 30) OR
             (LENGTH(pv_date_in) = 8 AND
              SUBSTRING(pv_date_in,4,2) IN ('04','06','09','11') AND
              TO_NUMBER(SUBSTRING(pv_date_in,7,2),'99') BETWEEN 1 AND 30) THEN 
          lv_retval := TRUE;
 
        /* Valid 28 or 29 day month date value. */
        WHEN (LENGTH(pv_date_in) = 10 AND SUBSTRING(pv_date_in,6,2) = '02') OR
             (LENGTH(pv_date_in) =  8 AND SUBSTRING(pv_date_in,4,2) = '02') THEN
          /* Verify a leap year from the 4-digit or 2-digit year; the divisible-by-4 test ignores the century rule. */
          IF (LENGTH(pv_date_in) = 10 AND
              MOD(TO_NUMBER(SUBSTRING(pv_date_in,1,4),'9999'),4) = 0 AND
              TO_NUMBER(SUBSTRING(pv_date_in,9,2),'99') BETWEEN 1 AND 29) OR
             (LENGTH(pv_date_in) =  8 AND
              MOD(TO_NUMBER(SUBSTRING(pv_date_in,1,2),'99'),4) = 0 AND
              TO_NUMBER(SUBSTRING(pv_date_in,7,2),'99') BETWEEN 1 AND 29) THEN
            lv_retval := TRUE;
          ELSE /* Not a leap year. */
            IF (LENGTH(pv_date_in) = 10 AND
                TO_NUMBER(SUBSTRING(pv_date_in,9,2),'99') BETWEEN 1 AND 28) OR
               (LENGTH(pv_date_in) = 8 AND
                TO_NUMBER(SUBSTRING(pv_date_in,7,2),'99') BETWEEN 1 AND 28)THEN
              lv_retval := TRUE;
            END IF;
          END IF;
        ELSE
          NULL;
      END CASE;
    END IF;
 
    /* Return date. */
    RETURN lv_retval;
  END;
$$ LANGUAGE plpgsql;

The following four SQL test cases:

SELECT verify_date('2020-07-04') AS "verify_date('2020-07-04')";
SELECT verify_date('71-05-31')   AS "verify_date('71-05-31')";
SELECT verify_date('2024-02-29') AS "verify_date('2024-02-29')";
SELECT verify_date('2019-04-31') AS "verify_date('2019-04-31')";

Return the following:

 verify_date('2020-07-04')
---------------------------
 t
(1 row)
 
 
 verify_date('71-05-31')
-------------------------
 t
(1 row)
 
 
 verify_date('2024-02-29')
---------------------------
 t
(1 row)
 
 
 verify_date('2019-04-31')
---------------------------
 f
(1 row)
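As a cross-check on the test cases above, here is a minimal Python sketch of the same validation. It assumes the two formats described in this post and leans on the standard library's calendar rules (including the century rule for leap years) instead of hand-written month tables. The function mirrors the verify_date name only for readability.

```python
import re
from datetime import datetime


def verify_date(value: str) -> bool:
    """Return True when value is a valid YYYY-MM-DD or YY-MM-DD date string."""
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        fmt = "%Y-%m-%d"
    elif re.fullmatch(r"[0-9]{2}-[0-9]{2}-[0-9]{2}", value):
        fmt = "%y-%m-%d"
    else:
        return False
    try:
        # strptime raises ValueError for impossible dates such as April 31st.
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


for candidate in ("2020-07-04", "71-05-31", "2024-02-29", "2019-04-31"):
    print(candidate, verify_date(candidate))
```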

As always, I hope the example code fills somebody’s need.

Written by maclochlainn

May 25th, 2022 at 1:47 am

PL/pgSQL Coupled Loops

without comments

I love a challenge. A loyal Oracle PL/SQL developer said PL/pgSQL couldn’t support coupled loops and user-defined lists. That’s part true and part false. It’s true that PL/pgSQL doesn’t support user-defined lists because it supports only arrays. It’s false in practice because PL/pgSQL supports an ARRAY_APPEND function that lets you manage arrays like Java’s ArrayList class.

Anyway, without further ado: you only need to create one data type because PL/pgSQL supports natural array syntax, like Java, C#, and other languages, and doesn’t adhere rigidly to the Interface Definition Language (IDL) standard that Oracle imposes. Oracle requires creating an Attribute Data Type (ADT) for the string collections, which you can avoid in PL/pgSQL.

You do need to create a record structure type, like:

/* Create a lyric object type. */
CREATE TYPE lyric AS
( day   VARCHAR(8)
, gift  VARCHAR(24));

You can build a function to accept an array of strings and an array of record structures that returns a new array constructed from parts of the two input arrays. The function also compares and matches the two arrays before returning an array that combines strings for a song’s lyrics. While the example uses the ever boring 12 Days of Christmas, I’d love another song for examples. It just needs to use this type of repetitive structure. If you have one that you would like to share, let me know.

The twelve_days function is:

CREATE FUNCTION twelve_days
  ( IN pv_days   VARCHAR(8)[]
  , IN pv_gifts  LYRIC[] ) RETURNS VARCHAR[] AS
$$  
DECLARE 
  /* Initialize the collection of lyrics. */
  lv_retval  VARCHAR(36)[114];
BEGIN
  /* Read forward through the days. */
  FOR i IN 1..ARRAY_LENGTH(pv_days,1) LOOP
    lv_retval := ARRAY_APPEND(lv_retval,('On the ' || pv_days[i] || ' day of Christmas')::text);
    lv_retval := ARRAY_APPEND(lv_retval,('my true love sent to me:')::text);
 
    /* Read backward through the lyrics based on the ascending value of the day. */
    FOR j IN REVERSE i..1 LOOP
      IF i = 1 THEN
        lv_retval := ARRAY_APPEND(lv_retval,('-'||'A'||' '|| pv_gifts[j].gift)::text);
      ELSIF j <= i THEN
        lv_retval := ARRAY_APPEND(lv_retval,('-'|| pv_gifts[j].day ||' '|| pv_gifts[j].gift )::text);
      END IF;
    END LOOP;
 
    /* A line break by verse. */
    lv_retval := ARRAY_APPEND(lv_retval,' '::text);
  END LOOP;
 
  /* Return the song's lyrics. */
  RETURN lv_retval;
END;
$$ LANGUAGE plpgsql;

Then, you can test it with this query:

SELECT UNNEST(twelve_days(ARRAY['first','second','third','fourth'
                          ,'fifth','sixth','seventh','eighth'
                          ,'ninth','tenth','eleventh','twelfth']
                         ,ARRAY[('and a','Partridge in a pear tree')::lyric
                          ,('Two','Turtle doves')::lyric
                          ,('Three','French hens')::lyric
                          ,('Four','Calling birds')::lyric
                          ,('Five','Golden rings')::lyric
                          ,('Six','Geese a laying')::lyric
                          ,('Seven','Swans a swimming')::lyric
                          ,('Eight','Maids a milking')::lyric
                          ,('Nine','Ladies dancing')::lyric
                          ,('Ten','Lords a leaping')::lyric
                          ,('Eleven','Pipers piping')::lyric
                          ,('Twelve','Drummers drumming')::lyric])) AS "12-Days of Christmas";

It prints:

       12-Days of Christmas
----------------------------------
 On the first day of Christmas
 my true love sent to me:
 -A Partridge in a pear tree
 
 On the second day of Christmas
 my true love sent to me:
 -Two Turtle doves
 -and a Partridge in a pear tree
 
 On the third day of Christmas
 my true love sent to me:
 -Three French hens
 -Two Turtle doves
 -and a Partridge in a pear tree
 
... Redacted for space ...
 
 On the twelfth day of Christmas
 my true love sent to me:
 -Twelve Drummers drumming
 -Eleven Pipers piping
 -Ten Lords a leaping
 -Nine Ladies dancing
 -Eight Maids a milking
 -Seven Swans a swimming
 -Six Geese a laying
 -Five Golden rings
 -Four Calling birds
 -Three French hens
 -Two Turtle doves
 -and a Partridge in a pear tree
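The coupled loops can also be sketched in Python for readers who want to compare the control flow. This is only an illustration under the same assumptions as the PL/pgSQL function: the outer loop reads forward through the days, and the inner loop reads backward through the gifts. The (day, gift) tuples stand in for the lyric type.

```python
from typing import List, Tuple


def twelve_days(days: List[str], gifts: List[Tuple[str, str]]) -> List[str]:
    """Build the verses by reading days forward and gifts backward."""
    lines: List[str] = []
    for i, day in enumerate(days):
        lines.append(f"On the {day} day of Christmas")
        lines.append("my true love sent to me:")
        # Walk back from the current day's gift to the first gift.
        for j in range(i, -1, -1):
            prefix, gift = gifts[j]
            if i == 0:
                prefix = "A"  # The first verse has no "and a" lead-in.
            lines.append(f"-{prefix} {gift}")
        lines.append(" ")  # A line break by verse.
    return lines


verses = twelve_days(
    ["first", "second"],
    [("and a", "Partridge in a pear tree"), ("Two", "Turtle doves")],
)
print("\n".join(verses))
```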

So, I believe that I met the challenge and hopefully provided a concrete example of some syntax that seems to be missing from most of the typical places.

Written by maclochlainn

May 16th, 2022 at 1:32 am

Record Type Arrays

with one comment

Another question that I was asked today: “Can you create an array of a record type in PL/pgSQL?” The answer is yes.

You first have to create a type, which is what you do when you want to create a table with an embedded table. This is a simple full_name record type:

CREATE TYPE full_name AS
( first_name   VARCHAR(20)
, middle_name  VARCHAR(20)
, last_name    VARCHAR(20));

The following DO block shows you how to create a record type array and then print its contents in a FOR-LOOP:

DO
$$
DECLARE
  -- An array of full_name records.
  list  full_name[] = 
          array[('Harry','James','Potter')
               ,('Ginevra','Molly','Potter')
               ,('James','Sirius','Potter')
               ,('Albus','Severus','Potter')
               ,('Lily','Luna','Potter')];
BEGIN
  -- Loop through the records.
  FOR i IN 1..CARDINALITY(list) LOOP
    RAISE NOTICE '%, % %', list[i].last_name, list[i].first_name, list[i].middle_name;
  END LOOP;
END;
$$;

Since you typically only have a single dimension array with record-type structure, using CARDINALITY is clearer than ARRAY_LENGTH(list,1). If you don’t agree use the latter.

It prints the following:

NOTICE:  Potter, Harry James
NOTICE:  Potter, Ginevra Molly
NOTICE:  Potter, James Sirius
NOTICE:  Potter, Albus Severus
NOTICE:  Potter, Lily Luna
DO
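If you come from Python, a rough analogue of an array of a record type is a list of named tuples. The sketch below uses a hypothetical FullName class to mirror the full_name type and loops by index the way the DO block does.

```python
from typing import List, NamedTuple


class FullName(NamedTuple):
    first_name: str
    middle_name: str
    last_name: str


names: List[FullName] = [
    FullName("Harry", "James", "Potter"),
    FullName("Ginevra", "Molly", "Potter"),
    FullName("James", "Sirius", "Potter"),
    FullName("Albus", "Severus", "Potter"),
    FullName("Lily", "Luna", "Potter"),
]


def format_name(name: FullName) -> str:
    """Format a record as 'last, first middle'."""
    return f"{name.last_name}, {name.first_name} {name.middle_name}"


# len() plays the role of CARDINALITY; Python indexes start at 0, not 1.
for i in range(len(names)):
    print(format_name(names[i]))
```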

As always, I hope this helps those looking for a solution to this type of problem.

Multidimension Arrays

without comments

Picking up where I left off on yesterday’s post on PostgreSQL arrays, you can also write multidimensional arrays provided all the nested arrays are equal in size. You can’t use the CARDINALITY function to determine the length of nested arrays because it returns the total number of elements across all dimensions. You must use the ARRAY_LENGTH function, with the dimension as its second argument, to determine the length of subordinate arrays.

Here’s an example file with a multidimensional array of integers:

DO
$$
DECLARE
  /* Declare an array of integers with a subordinate array of integers. */
  list  int[][] = array[array[1,2,3,4]
                       ,array[1,2,3,4]
                       ,array[1,2,3,4]
                       ,array[1,2,3,4]
                       ,array[1,2,3,4]];
  row   varchar(20) = '';
BEGIN
  /* Loop through the first dimension of integers. */
  <<Outer>>
  FOR i IN 1..ARRAY_LENGTH(list,1) LOOP
    row = '';
    /* Loop through the second dimension of integers. */
    <<Inner>>
    FOR j IN 1..ARRAY_LENGTH(list,2) LOOP
      IF LENGTH(row) = 0 THEN
        row = row || list[i][j];
      ELSE
        row = row || ',' || list[i][j];
      END IF;
    END LOOP;
    /* Print the row built by the inner loop. */
    RAISE NOTICE 'Row [%][%]', i, row;
  END LOOP;
END;
$$;

It prints:

NOTICE:  Row [1][1,2,3,4]
NOTICE:  Row [2][1,2,3,4]
NOTICE:  Row [3][1,2,3,4]
NOTICE:  Row [4][1,2,3,4]
NOTICE:  Row [5][1,2,3,4]
DO
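To make the difference between the two functions concrete, here is a small Python sketch that imitates them on a rectangular nested list. The array_length and cardinality helpers are hypothetical stand-ins written for this illustration, not PostgreSQL calls.

```python
from typing import Any, List


def array_length(array: List[Any], dimension: int) -> int:
    """Length of the given 1-based dimension of a rectangular nested list."""
    level = array
    for _ in range(dimension - 1):
        level = level[0]
    return len(level)


def cardinality(array: Any) -> int:
    """Total number of scalar elements across all dimensions."""
    if not isinstance(array, list):
        return 1
    return sum(cardinality(item) for item in array)


grid = [[1, 2, 3, 4] for _ in range(5)]

print(array_length(grid, 1))  # rows
print(array_length(grid, 2))  # columns in each row
print(cardinality(grid))      # every element in the grid
```

With five rows of four integers, the first dimension has length 5, the second has length 4, and the total element count is 20, which is why a total count can't drive the inner loop.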

Oracle doesn’t offer multidimensional arrays like these, but you can have nested lists of tables or varrays inside an Oracle database. Oracle also supports nested lists that are asymmetrical.

As always, I hope this helps those trying to sort out the syntax.

PL/pgSQL Array Listing

without comments

Somebody asked me how to navigate a collection in PostgreSQL’s PL/pgSQL and whether it supported table and varray data types, like Oracle’s PL/SQL. The most important thing to correct was that PostgreSQL supports only array types.

The only example that I found with a Google search used a FOREACH-loop, like this:

DO
$$
DECLARE
  /* An array of integers. */
  list  int[] = array[1,2,3,4,5];
  /* Define a local variable for array members. */
  i     int;
BEGIN
  /* Loop through the integers. */
  FOREACH i IN ARRAY list LOOP
    RAISE NOTICE '[%]', i;
  END LOOP;
END;
$$;

It prints:

NOTICE:  [1]
NOTICE:  [2]
NOTICE:  [3]
NOTICE:  [4]
NOTICE:  [5]

As I suspected, the student didn’t want to use a FOREACH-loop. The student wanted to use a for-loop, which was much closer to the Oracle PL/SQL syntax with which they were most familiar. That example is:

DO
$$
DECLARE
  /* An array of integers. */
  list  int[] = array[1,2,3,4,5];
BEGIN
  /* Loop through the integers. */
  FOR i IN 1..5 LOOP
    RAISE NOTICE '[%]', list[i];
  END LOOP;
END;
$$;

However, it’s bad form to use a literal for the upper number in a range for-loop, and you should use the CARDINALITY function in PostgreSQL because there is no collection API, like Oracle’s COUNT method. There is an ARRAY_LENGTH function but it’s really only necessary when you use a multidimensional array.

The modified code is:

DO
$$
DECLARE
  -- An array of integers.
  list  int[] = array[1,2,3,4,5];
BEGIN
  /* Loop through the integers. */
  FOR i IN 1..CARDINALITY(list) LOOP
    RAISE NOTICE '[%]', list[i];
  END LOOP;
END;
$$;

If you use the ARRAY_LENGTH function, the FOR statement would look like:

  /* Loop through the integers, and determine the length of the first dimension. */
  FOR i IN 1..ARRAY_LENGTH(list,1) LOOP

As always, I hope this helps those looking for a clear solution to basic activities.

Written by maclochlainn

April 27th, 2022 at 1:21 am

PostgreSQL Arrays

with one comment

If you’re wondering about this post, it shows the basic array of a set of integers and strings before showing you how to create nested tables of data in PostgreSQL. By the way, they’re not called nested tables in PostgreSQL, like they are in Oracle but perform like their Oracle cousins.

Let’s create a table with an auto-incrementing column and two arrays, one array of integers and another of strings:

-- Conditionally drop the demo table.
DROP TABLE IF EXISTS demo;
 
-- Create the test table.
CREATE TABLE demo
( demo_id     serial
, demo_number integer[5]
, demo_string varchar(5)[7]);

You can insert test values like this:

INSERT INTO demo
(demo_number, demo_string)
VALUES
( array[1,2,3,4,5]
, array['One','Two','Three','Four','Five','Six','Seven']);

Then, you can query them with this unnest function, like:

SELECT  unnest(demo_number) AS numbers
,       unnest(demo_string) AS strings
FROM    demo;

It returns:

 numbers | strings
---------+---------
 1       | One
 2       | Two
 3       | Three
 4       | Four
 5       | Five
         | Six
         | Seven
(7 rows)

You may note that the two arrays are asymmetrical. It only becomes an issue when you navigate the result in a PL/pgSQL cursor or imperative programming language, like Python.
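A short Python sketch of what that asymmetry looks like on the client side, assuming the rows arrive with None where PostgreSQL returned NULL. Here itertools.zip_longest reproduces the padding shown in the query result, and the variable names simply mirror the column names.

```python
from itertools import zip_longest

demo_number = [1, 2, 3, 4, 5]
demo_string = ["One", "Two", "Three", "Four", "Five", "Six", "Seven"]

# Pair the arrays element by element, padding the shorter one with None.
rows = list(zip_longest(demo_number, demo_string))

for number, string in rows:
    # Guard against the padded None values before using the number.
    label = "" if number is None else str(number)
    print(f"{label:>7} | {string}")
```

Any code that walks the result needs a guard like the one above, because the last two rows carry no number.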

Now, let’s do something more interesting like work with a composite user-defined type, like the player structure. You would create the composite user-defined type with this syntax:

-- Conditionally drop the player type.
DROP TYPE IF EXISTS player;
 
-- Create the player type.
CREATE TYPE player AS
( player_no        integer
, player_name      varchar(24)
, player_position  varchar(14)
, ab               integer
, r                integer
, h                integer
, bb               integer
, rbi              integer );

You can create a world_series table that includes a players column that uses an array of the player type, like:

-- Conditionally drop the world_series table.
DROP TABLE IF EXISTS world_series;
 
-- Create the world_series table.
CREATE TABLE world_series
( world_series_id  serial
, team             varchar(24)
, players          player[30]
, game_no          integer
, year             integer );

If you’re familiar with the Oracle Database, you’d have to specify a nested table in the syntax. Fortunately, PostgreSQL doesn’t require that.

Insert one row with two players using the following statement:

INSERT INTO world_series
( team
, players
, game_no
, year )
VALUES
('San Francisco Giants'
, array[(24,'Willie Mayes','Center Fielder',5,0,1,0,0)::player
       ,(5,'Tom Haller','Catcher',4,1,2,0,2)::player]
, 4
, 1962 );

You can append to the array with the following syntax. A former student and I disagree on whether this is shown in section 8.15.4, Modifying Arrays, of the PostgreSQL documentation. I believe it can be inferred from the document and he doesn’t believe so. Anyway, here’s how you add an element to an existing array in a table with the UPDATE statement. Call array_append directly in the SET clause; wrapping it in a subquery against world_series would fail once the table holds more than one row:

UPDATE world_series
SET    players = array_append(players,(7,'Henry Kuenn','Right Fielder',3,0,0,1,0)::player)
WHERE  team = 'San Francisco Giants'
AND    year = 1962
AND    game_no = 4;

Like Oracle’s nested tables, PostgreSQL’s arrays of composite user-defined types require writing a PL/pgSQL function to edit and replace entries. I’ll try to add one of those shortly in another blog entry to show you how to edit and replace entries in stored arrays of composite user-defined types.

You can query the unnested rows and get a return set like a Python tuple with the following query:

SELECT unnest(players) AS player_list
FROM   world_series
WHERE  team = 'San Francisco Giants'
AND    year = 1962
AND    game_no = 4;

It returns the three rows from the players array:

                 player_list
----------------------------------------------
 (24,"Willie Mayes","Center Field",5,0,1,0,0)
 (5,"Tom Haller",Catcher,4,1,2,0,2)
 (7,"Henry Kuenn","Right Fielde",3,0,0,1,0)
(3 rows)

It returns the data set in entry-order. If we step outside of the standard 8.15 Arrays PostgreSQL Documentation, you can do much more with arrays (or nested tables). The balance of this example demonstrates some new syntax that helps you achieve constructive outcomes in PostgreSQL.

You can use a Common Table Expression (CTE) to get the columnar display of the player composite user-defined type. This type of solution goes beyond the standard documentation, like:

WITH list AS
 (SELECT unnest(players) AS row_result
  FROM   world_series
  WHERE  team = 'San Francisco Giants'
  AND    year = 1962
  AND    game_no = 4)
SELECT  (row_result).player_name
,       (row_result).player_no
,       (row_result).player_position
FROM     list;

If you’re unfamiliar with accessing composite user-defined types, I wrote a post on that 7 years ago. You can find the older blog entry PostgreSQL Composites on my blog.

It returns only the three requested columns of the player composite user-defined type:

 player_name  | player_no | player_position
--------------+-----------+-----------------
 Willie Mayes |        24 | Center Fielder
 Tom Haller   |         5 | Catcher
 Henry Kuenn  |         7 | Right Fielder
(3 rows)

You should note that the data is presented in an entry-ordered manner when unnested alone in the SELECT-list. That behavior changes when the SELECT-list includes non-array data.

The easiest way to display data from the non-array and array columns is to list them inside the SELECT-list of the CTE, like:

WITH list AS
 (SELECT game_no AS game
  ,      year
  ,      unnest(players) AS row_result
  FROM   world_series
  WHERE  team = 'San Francisco Giants'
  AND    year = 1962
  AND    game_no = 4)
SELECT   game
,        year 
,       (row_result).player_name
,       (row_result).player_no
,       (row_result).player_position
FROM     list;

It returns an ordered set of unnested rows when you include non-array columns, like:

 game | year | player_name  | player_no | player_position
------+------+--------------+-----------+-----------------
    4 | 1962 | Henry Kuenn  |         7 | Right Fielder
    4 | 1962 | Tom Haller   |         5 | Catcher
    4 | 1962 | Willie Mayes |        24 | Center Fielder
(3 rows)

While you can join the world_series table to the unnested array rows (returned as a derived table), it’s a bad idea. The mechanics to do it require you to return the primary key column in the same SELECT-list of the CTE. Then, you join the CTE list to the world_series table by using the world_series_id primary key.

However, there is no advantage to an inner join approach and it imposes unnecessary processing on the database server. The odd rationale that I heard when I noticed somebody was using a CTE to base-table join was: “That’s necessary so they could use column aliases for the non-array columns.” That’s not true because you can use the aliases inside the CTE, as shown above when game is an alias to the game_no column.

As always, I hope this helps those looking to solve a problem in PostgreSQL.

Python on PostgreSQL

without comments

The database adapter you use when connecting Python to PostgreSQL is the psycopg2 Python library. This blog post will show you how to use it in Python and install it on your Fedora Linux installation. It leverages a videodb database that I show you how to build in this earlier post on configuring PostgreSQL 14.

You would import psycopg2 as follows in your Python code:

import psycopg2

Unfortunately, that only works on Linux servers when you’ve installed the library. That library isn’t installed with generic Python libraries. You get the following error when the psycopg2 library isn’t installed on your server.

Traceback (most recent call last):
  File "python_new_hire.sql", line 1, in <module>
    import psycopg2
ModuleNotFoundError: No module named 'psycopg2'
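If you'd rather check for the library before a script fails, a small sketch like the following works with only the standard library. The has_module helper is a hypothetical name for this example; importlib.util.find_spec returns None when a top-level module can't be found.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True when a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


if has_module("psycopg2"):
    print("psycopg2 is available")
else:
    print("psycopg2 is missing; install it before importing")
```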

You can install it on Fedora Linux with the following command:

yum install python3-psycopg2

It will install:

====================================================================================
 Package                  Architecture   Version               Repository      Size
====================================================================================
Installing:
 python3-psycopg2         x86_64         2.7.7-1.fc30          fedora         160 k
 
Transaction Summary
====================================================================================
Install  1 Package
 
Total download size: 160 k
Installed size: 593 k
Is this ok [y/N]: y
Downloading Packages:
python3-psycopg2-2.7.7-1.fc30.x86_64.rpm            364 kB/s | 160 kB     00:00    
------------------------------------------------------------------------------------
Total                                               167 kB/s | 160 kB     00:00     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                            1/1 
  Installing       : python3-psycopg2-2.7.7-1.fc30.x86_64                       1/1 
  Running scriptlet: python3-psycopg2-2.7.7-1.fc30.x86_64                       1/1 
  Verifying        : python3-psycopg2-2.7.7-1.fc30.x86_64                       1/1 
 
Installed:
  python3-psycopg2-2.7.7-1.fc30.x86_64                                              
 
Complete!

Here’s a quick test case that you can run in PostgreSQL and Python to test all the pieces. The first SQL script creates a new_hire table and inserts two rows, and the Python program queries data from the new_hire table.

The new_hire.sql file creates the new_hire table and inserts two rows:

-- Environment settings for the script.
SET SESSION "videodb.table_name" = 'new_hire';
SET CLIENT_MIN_MESSAGES TO ERROR;
 
--  Verify table name.
SELECT current_setting('videodb.table_name');
 
-- ------------------------------------------------------------------
--  Conditionally drop table.
-- ------------------------------------------------------------------
DROP TABLE IF EXISTS new_hire CASCADE;
 
-- ------------------------------------------------------------------
--  Create table.
-- ------------------------------------------------------------------
CREATE TABLE new_hire
( new_hire_id  SERIAL
, first_name   VARCHAR(20)  NOT NULL
, middle_name  VARCHAR(20)
, last_name    VARCHAR(20)  NOT NULL
, hire_date    TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP
, PRIMARY KEY (new_hire_id));
 
-- Alter the sequence by restarting it at 1001.
ALTER SEQUENCE new_hire_new_hire_id_seq RESTART WITH 1001;
 
-- Display the table organization.
SELECT   tc.table_catalog || '.' || tc.constraint_name AS constraint_name
,        tc.table_catalog || '.' || tc.table_name AS table_name
,        kcu.column_name
,        ccu.table_catalog || '.' || ccu.table_name AS foreign_table_name
,        ccu.column_name AS foreign_column_name
FROM     information_schema.table_constraints AS tc JOIN information_schema.key_column_usage AS kcu
ON       tc.constraint_name = kcu.constraint_name
AND      tc.table_schema = kcu.table_schema JOIN information_schema.constraint_column_usage AS ccu
ON       ccu.constraint_name = tc.constraint_name
AND      ccu.table_schema = tc.table_schema
WHERE    tc.constraint_type = 'FOREIGN KEY'
AND      tc.table_name = current_setting('videodb.table_name')
ORDER BY 1;
 
SELECT c1.table_name
,      c1.ordinal_position
,      c1.column_name
,      CASE
         WHEN c1.is_nullable = 'NO' AND c2.column_name IS NOT NULL THEN 'PRIMARY KEY'
         WHEN c1.is_nullable = 'NO' AND c2.column_name IS NULL THEN 'NOT NULL'
       END AS is_nullable
,      CASE
         WHEN data_type = 'character varying' THEN
           data_type||'('||character_maximum_length||')'
         WHEN data_type = 'numeric' THEN
           CASE
             WHEN numeric_scale != 0 AND numeric_scale IS NOT NULL THEN
               data_type||'('||numeric_precision||','||numeric_scale||')'
             ELSE
               data_type||'('||numeric_precision||')'
             END
         ELSE
           data_type
        END AS data_type
FROM    information_schema.columns c1 LEFT JOIN
          (SELECT trim(regexp_matches(column_default,current_setting('videodb.table_name'))::text,'{}')||'_id' column_name
           FROM   information_schema.columns) c2
ON       c1.column_name = c2.column_name
WHERE    c1.table_name = current_setting('videodb.table_name')
ORDER BY c1.ordinal_position;
 
-- Display primary key and unique constraints.
SELECT constraint_name
,      lower(constraint_type) AS constraint_type
FROM   information_schema.table_constraints
WHERE  table_name = current_setting('videodb.table_name')
AND    constraint_type IN ('PRIMARY KEY','UNIQUE');
 
-- Insert two test records.
INSERT INTO new_hire
( first_name, middle_name, last_name, hire_date )
VALUES
 ('Malcolm','Jacob','Lewis','2018-2-14')
,('Henry',null,'Chabot','1990-07-31');

You can put the script into a local directory, connect as the student user to the videodb database (or any database you’ve created), and run the following command:

\i new_hire.sql

The new_hire.py file queries the new_hire table and prints the rows it returns:

# Import the PostgreSQL connector library.
import psycopg2
 
# Initialize the handles so the finally block can test them safely,
# even when the connection attempt itself fails.
connection = None
cursor = None
 
try:
  # Open a connection to the database.
  connection = psycopg2.connect( user="student"
                               , password="student"
                               , port="5432"
                               , dbname="videodb")
 
  # Open a cursor.
  cursor = connection.cursor()
 
  # Assign a static query.
  query = "SELECT new_hire_id, first_name, last_name " \
          "FROM new_hire"
 
  # Parse and execute the query.
  cursor.execute(query)
 
  # Fetch all rows returned by the query.
  records = cursor.fetchall()
 
  # Read through and print the rows as tuples.
  for row in records:
    print(row)
 
except (Exception, psycopg2.Error) as error:
  print("Error while fetching data from PostgreSQL", error)
 
finally:
  # Close the cursor and the connection only if they were opened.
  if cursor is not None:
    cursor.close()
  if connection is not None:
    connection.close()

You run it from the command line, like:

python3 ./new_hire.py

It should print:

(1001, 'Malcolm', 'Lewis')
(1002, 'Henry', 'Chabot')
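
The program prints every row in the table. As a minimal sketch of filtering with a bound parameter instead, the helper below accepts any DB-API 2.0 connection. The fetch_new_hires name, its placeholder argument, and the in-memory sqlite3 stand-in are assumptions for illustration, chosen so the example runs without a PostgreSQL server. With psycopg2 you would pass the connection returned by psycopg2.connect() and keep the default %s marker.

```python
import sqlite3
from contextlib import closing


def fetch_new_hires(connection, last_name, placeholder="%s"):
    """Return (new_hire_id, first_name, last_name) rows for one last name.

    placeholder is the driver's parameter marker: "%s" for psycopg2 and
    "?" for sqlite3. Only the marker is formatted into the SQL text; the
    value itself is always passed as a bound parameter.
    """
    query = ("SELECT new_hire_id, first_name, last_name "
             "FROM new_hire "
             "WHERE last_name = {} "
             "ORDER BY new_hire_id").format(placeholder)
    # closing() releases the cursor even when execute() raises.
    with closing(connection.cursor()) as cursor:
        cursor.execute(query, (last_name,))
        return cursor.fetchall()


# Stand-in database (assumption): sqlite3 replaces PostgreSQL here.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE new_hire (new_hire_id INTEGER PRIMARY KEY, "
             "first_name TEXT, middle_name TEXT, last_name TEXT)")
demo.executemany("INSERT INTO new_hire VALUES (?, ?, ?, ?)",
                 [(1001, "Malcolm", "Jacob", "Lewis"),
                  (1002, "Henry", None, "Chabot")])

rows = fetch_new_hires(demo, "Chabot", placeholder="?")
for row in rows:
    print(row)
demo.close()
```

Binding the value rather than concatenating it into the query string keeps the statement safe when the last name comes from user input.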

As always, I hope this helps those trying to sort out how to connect Python to PostgreSQL.

Written by maclochlainn

March 2nd, 2022 at 1:06 am

PostgreSQL Tables

without comments

The most straightforward way to view the description of a PostgreSQL table is the \d command. For example, this lets you display an account_list table:

\d account_list

Unfortunately, this shows you the table, indexes, and foreign key constraints. Often, you only want to see the list of columns in positional order. So, I wrote a little function to let me display only the table and columns.

There are a few techniques in the script that might seem new to some developers. For example, a function that returns values from the data dictionary must declare its return columns with the types that the data dictionary uses. These specialized types are required because the SQL cursor gathers the information from the views in the information_schema, and PL/pgSQL doesn’t implicitly convert most of these types to variable-length strings.

It’s easy to assume that the data dictionary strings would implicitly cast to variable-length strings, but that assumption is incorrect. While you can query them like VARCHAR values, they aren’t implicitly cast to variable-length strings in a function’s result. If you wrote a wrapper function that returned VARCHAR columns, you would probably get a result like this when you call your function:

ERROR:  structure of query does not match function result type
DETAIL:  Returned type information_schema.sql_identifier does not match expected type character varying in column 1.

The “character varying” type is another name for the VARCHAR data type. Some notes will advise you to fix this type of error by using the column name and %TYPE. The %TYPE anchors the data type in the function’s parameter list to the actual data type of the data dictionary’s column. You would implement that suggestion with code like:

RETURNS TABLE ( table_schema      information_schema.columns.table_schema%TYPE
              , table_name        information_schema.columns.table_name%TYPE
              , ordinal_position  information_schema.columns.ordinal_position%TYPE
              , column_name       information_schema.columns.column_name%TYPE
              , data_type         information_schema.columns.data_type%TYPE
              , is_nullable       information_schema.columns.is_nullable%TYPE ) AS

Unfortunately, PostgreSQL would raise a NOTICE for every anchored column when the script creates the function. The NOTICE messages would appear as follows for the describe_table function with anchored parameter values:

psql:describe_table.sql:34: NOTICE:  type reference information_schema.columns.table_schema%TYPE converted to information_schema.sql_identifier
psql:describe_table.sql:35: NOTICE:  type reference information_schema.columns.table_name%TYPE converted to information_schema.sql_identifier
psql:describe_table.sql:36: NOTICE:  type reference information_schema.columns.ordinal_position%TYPE converted to information_schema.cardinal_number
psql:describe_table.sql:37: NOTICE:  type reference information_schema.columns.column_name%TYPE converted to information_schema.sql_identifier
psql:describe_table.sql:38: NOTICE:  type reference information_schema.columns.data_type%TYPE converted to information_schema.character_data
psql:describe_table.sql:39: NOTICE:  type reference information_schema.columns.is_nullable%TYPE converted to information_schema.yes_or_no
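
Each NOTICE line already names the type that PostgreSQL resolved for the anchored column. As a small sketch, assuming you captured the psql output as text, the following pulls those pairs out with a regular expression. The resolved_types function and the sample_log variable are hypothetical names used only for illustration.

```python
import re

# Matches the NOTICE text emitted when a %TYPE reference is resolved.
NOTICE_PATTERN = re.compile(
    r"type reference (?P<reference>\S+)%TYPE converted to (?P<resolved>\S+)")


def resolved_types(log_text):
    """Map each anchored column name to the type it was converted to."""
    types = {}
    for match in NOTICE_PATTERN.finditer(log_text):
        # Keep only the column name, the part after the last dot.
        column = match.group("reference").rsplit(".", 1)[-1]
        types[column] = match.group("resolved")
    return types


# Two of the NOTICE lines shown above, copied verbatim.
sample_log = """\
psql:describe_table.sql:34: NOTICE:  type reference information_schema.columns.table_schema%TYPE converted to information_schema.sql_identifier
psql:describe_table.sql:36: NOTICE:  type reference information_schema.columns.ordinal_position%TYPE converted to information_schema.cardinal_number
"""

for column, type_name in resolved_types(sample_log).items():
    print(column, "->", type_name)
```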

As a rule, there’s a better solution when you know how to discover the underlying data types. You can discover the required data types with the following query of the pg_attribute catalog table, filtered to the columns view in the information_schema:

SELECT attname
,      atttypid::regtype
FROM   pg_attribute
WHERE  attrelid = 'information_schema.columns'::regclass
AND    attname IN ('table_schema','table_name','ordinal_position','column_name','data_type','is_nullable')
ORDER  BY attnum;

It returns:

     attname      |              atttypid
------------------+------------------------------------
 table_schema     | information_schema.sql_identifier
 table_name       | information_schema.sql_identifier
 ordinal_position | information_schema.cardinal_number
 column_name      | information_schema.sql_identifier
 is_nullable      | information_schema.yes_or_no
 data_type        | information_schema.character_data
(6 rows)

Only the character_data type can be replaced with a VARCHAR data type; the others should be typed as shown above. Here’s the modified describe_table function:

CREATE OR REPLACE
  FUNCTION describe_table (table_name_in  VARCHAR)
  RETURNS TABLE ( table_schema      information_schema.sql_identifier
                , table_name        information_schema.sql_identifier
                , ordinal_position  information_schema.cardinal_number
                , column_name       information_schema.sql_identifier
                , data_type         VARCHAR
                , is_nullable       information_schema.yes_or_no ) AS
$$
BEGIN
  RETURN QUERY
  SELECT   c.table_schema
  ,        c.table_name
  ,        c.ordinal_position
  ,        c.column_name
  ,        CASE
             WHEN c.character_maximum_length IS NOT NULL
             THEN CONCAT(c.data_type, '(', c.character_maximum_length, ')')
             ELSE
               CASE
                 WHEN c.data_type NOT IN ('date','timestamp','timestamp with time zone')
                 THEN CONCAT(c.data_type, '(', numeric_precision::text, ')')
                 ELSE c.data_type
               END
           END AS modified_type
  ,        c.is_nullable
  FROM     information_schema.columns c
  WHERE    c.table_schema NOT IN ('information_schema', 'pg_catalog')
  AND      c.table_name = table_name_in
  ORDER BY c.table_schema
  ,        c.table_name
  ,        c.ordinal_position;
END;
$$ LANGUAGE plpgsql;

If you’re new to PL/pgSQL table functions, you can check my basic tutorial on table functions. You call the describe_table table function with the following syntax:

SELECT * FROM describe_table('account_list');

It returns:

 table_schema |  table_name  | ordinal_position |   column_name    |        data_type         | is_nullable
--------------+--------------+------------------+------------------+--------------------------+-------------
 public       | account_list |                1 | account_list_id  | integer(32)              | NO
 public       | account_list |                2 | account_number   | character varying(10)    | NO
 public       | account_list |                3 | consumed_date    | date                     | YES
 public       | account_list |                4 | consumed_by      | integer(32)              | YES
 public       | account_list |                5 | created_by       | integer(32)              | NO
 public       | account_list |                6 | creation_date    | timestamp with time zone | NO
 public       | account_list |                7 | last_updated_by  | integer(32)              | NO
 public       | account_list |                8 | last_update_date | timestamp with time zone | NO
(8 rows)
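
The data_type column in that output comes from the nested CASE expression in describe_table. As a rough sketch of the same branching outside the database, the hypothetical modified_type function below mirrors it in Python. It also mirrors the fact that CONCAT skips NULL arguments, so, as far as I can tell, a type that reports neither a character length nor a numeric precision (text, for example) would come out with empty parentheses.

```python
def modified_type(data_type, character_maximum_length=None,
                  numeric_precision=None):
    """Mirror the CASE expression that builds the data_type column."""
    # First branch: types with a declared character length.
    if character_maximum_length is not None:
        return "{}({})".format(data_type, character_maximum_length)
    # Second branch: everything except the listed date and time types.
    if data_type not in ("date", "timestamp", "timestamp with time zone"):
        # CONCAT ignores NULL, so a missing precision leaves "()".
        precision = "" if numeric_precision is None else str(numeric_precision)
        return "{}({})".format(data_type, precision)
    # Fallback: the listed date and time types pass through unchanged.
    return data_type


print(modified_type("integer", numeric_precision=32))
print(modified_type("character varying", character_maximum_length=10))
print(modified_type("date"))
print(modified_type("text"))
```

The first three calls reproduce the integer(32), character varying(10), and date values shown in the output above, and the last call shows the empty-parentheses case.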

As always, I hope this helps those looking for a way to write functions that wrap the PostgreSQL data dictionary and display a table’s columns.

Written by maclochlainn

February 27th, 2022 at 12:43 am