MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

Archive for the ‘PostgreSQL 14’ Category

Add PostGIS to PostgreSQL

without comments

The following blog post shows you how to add PostGIS and PgRouting to your existing install of PostgeSQL 14 on the Ubuntu Desktop, Version 22.0.4. This blog post relies on information in this earlier Install and Configure PostgreSQL on Ubuntu post. Generalized documentation on PostGIS exists at this URL.

You install the postgis libraries:

sudo apt install -y postgis

You install the postgresql-14-pgrouting libraries:

sudo apt install -y postgresql-14-pgrouting

You should also install ogr2ogr program, which is a command-line utility for converting data between GIS data formats, including common file formats and common spatial databases. You install the ogr2ogr libraries:

sudo apt install -y gdal-bin

Verify the installation by using the which utility, like

which -a ogr2ogr

It should return:

/usr/bin/ogr2ogr

You can qualify the installed PostGIS packages with the following command:

dpkg -l | grep -i postgis

It should display:

ii  postgis                                    3.2.0+dfsg-1ubuntu1                     amd64        Geographic objects support for PostgreSQL
ii  postgis-doc                                3.2.0+dfsg-1ubuntu1                     all          Geographic objects support for PostgreSQL -- documentation
ii  postgresql-14-pgrouting                    3.3.0-2                                 amd64        Routing functionality support for PostgreSQL/PostGIS
ii  postgresql-14-pgrouting-scripts            3.3.0-2                                 all          Routing functionality support for PostgreSQL/PostGIS - SQL scripts
ii  postgresql-14-postgis-3                    3.2.0+dfsg-1ubuntu1                     amd64        Geographic objects support for PostgreSQL 14
ii  postgresql-14-postgis-3-scripts            3.2.0+dfsg-1ubuntu1                     all          Geographic objects support for PostgreSQL 14 -- SQL scripts

Connect as the postgres user by becoming the root user with this command:

sudo sh

Then, assume the role of the postgres user with this command:

su - postgres

Connect to the PostgreSQL database as the privileged postgres owner/user:

psql postgres

You will see the following prompt after connecting to the PostgreSQL database:

postgres@student-virtual-machine:~$ psql
psql (14.10 (Ubuntu 14.10-0ubuntu0.22.04.1))
Type "help" for help.
 
postgres=#

As the the privileged postgres owner/user issue the following commands to create the gisdb database and set a new search path for it:

CREATE DATABASE gisdb;
ALTER DATABASE gisdb SET search_path=public,postgis,contrib,tiger;

Connect to the gisdb database:

\connect gisdb

You are now connected to database gisdb as the postgres user. You change to the postgis schema, and create the following extensions in this schema.

CREATE SCHEMA postgis;
CREATE EXTENSION postgis SCHEMA postgis;
CREATE EXTENSION postgis_raster SCHEMA postgis;
CREATE EXTENSION fuzzystrmatch SCHEMA postgis;
CREATE EXTENSION address_standardizer_data_us SCHEMA postgis;

You query the modified catalog with this query:

SELECT n.nspname AS "Name"
,      pg_catalog.pg_get_userbyid(n.nspowner) AS "Owner"
FROM   pg_catalog.pg_namespace n
WHERE  n.nspname !~ '^pg_'
AND n.nspname <> 'information_schema'
ORDER BY 1;

It should return:

  Name   |  Owner   
---------+----------
 postgis | postgres
 public  | postgres
(2 rows)

You must assign the postgis_tiger_geocoder and postgis_topology without a schema assignment. The
postgis_tiger_eeocoder must be assigned by default to the tiger schema, and the postgis_topology schema.

CREATE EXTENSION postgis_tiger_geocoder;
CREATE EXTENSION postgis_topology;

You reuse the same above referenced query to see the modified catalog with this query:

It should return:

    Name    |  Owner   
------------+----------
 postgis    | postgres
 public     | postgres
 tiger      | postgres
 tiger_data | postgres
 topology   | postgres
(5 rows)

Connect as the gisdb database with this command:

\connect gisdb

You can see the active PostGIS extension with this command:

\dx postgis

It shows:

                               List of installed extensions
  Name   | Version | Schema  |                        Description                         
---------+---------+---------+------------------------------------------------------------
 postgis | 3.2.0   | postgis | PostGIS geometry and geography spatial types and functions
(1 row)

Now, you can use this query:

SELECT postgis_full_version();

to discover what PostGIS version is installed:

                                                                       postgis_full_version                                                                        
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 POSTGIS="3.2.0 c3e3cc0" [EXTENSION] PGSQL="140" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.1" LIBXML="2.9.12" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)"
(1 row)

Connect back to as the privileged postgres owner/user with this command:

\connect postgres

Next, check the available databases with this command:

\l

It should display the following:

                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 gisdb     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 videodb   | student  | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/student          +
           |          |          |             |             | student=CTc/student  +
           |          |          |             |             | dba=CTc/student
(5 rows)

As the privileged postgres owner/user make the following grants with these commands:

GRANT TEMPORARY, CONNECT ON DATABASE gisdb TO PUBLIC;
GRANT ALL PRIVILEGES ON DATABASE gisdb TO postgres;
GRANT ALL PRIVILEGES ON DATABASE gisdb TO dba;

After making the grants, check the available databases access with this \l command:

\l

It should display the following:

                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 gisdb     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres         +
           |          |          |             |             | postgres=CTc/postgres+
           |          |          |             |             | dba=CTc/postgres
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 videodb   | student  | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/student          +
           |          |          |             |             | student=CTc/student  +
           |          |          |             |             | dba=CTc/student
(5 rows)

At this point, you can exit psql, the postgres user’s account, and the root user’s account. This should return you to your sudoer account, which in my case is the student user.

Connect to the gisdb with the following command-line interface command:

psql -U student -W -d gisdb

You can create the following, as per instructions in PostGIS Chapter 9 instructions:

-- Conditionally drop table.
DROP TABLE IF EXISTS geometries;
 
-- Create table with geometry column in table.
CREATE TABLE geometries
( geometries_id  INT
, name           VARCHAR
, geometry_obj   GEOMETRY);
 
 
INSERT INTO geometries
( name
, geometry_obj )
VALUES
 ('Point', 'POINT(0 0)')
,('Linestring', 'LINESTRING(0 0, 1 1, 2 1, 2 2)')
,('Polygon', 'POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))')
,('PolygonWithHole', 'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0),(1 1, 1 2, 2 2, 2 1, 1 1))')
,('Collection', 'GEOMETRYCOLLECTION(POINT(2 0),POLYGON((0 0, 1 0, 1 1, 0 1, 0 0)))');
 
SELECT name
,      ST_AsText(geometry_obj)
FROM   geometries;

Unfortunately, it raises the following error when attempting to create the geometries table:

psql:/home/student/Code/postgis/geometry.sql:7: ERROR:  type "geometry" does not exist
LINE 4: , geometry_obj   GEOMETRY);

As always, I hope the solutions presented helps move forward implementations of the technology. You can also find an excellent tutorial to learning PostGIS in the Introduction to PostGIS tutorial.

Written by maclochlainn

February 15th, 2024 at 10:57 pm

Ruby+PostgreSQL on Ubuntu

without comments

This extends the earlier post on installing and configuring Ruby 3.3.0 on Ubuntu 22.0.4. Please refer to that earlier post to install Ruby. This post shows you how to install the necessary libraries and Ruby Gems for PostgreSQL.

You need to install the libra-dev package, as shown:

sudo apt install postgresql libpq-dev

Next, you need to install the PG Gem:

gem install pg

You can now write a postgres_version.rb program to verify a connection to the PostgreSQL database, like:

# Include Ruby Gem libraries.
require 'rubygems'
require 'pg'
 
# Begin block.
begin
  # Create a new connection resource.
  db = PG::connect( 'localhost', 5432, '', '', 'videodb', 'student', 'student')
 
  # Create a result set.
  stmt = db.query('SELECT version() AS version')
 
  # Read through the result set hash.
  stmt.each do | row |
    puts "#{row['version']}"
  end
  # Release the result set resources.
  stmt.freeze
 
rescue PG::Error => e
  # Print the error.
  puts "ERROR #{e.error} (#{e.sqlstate})"
  puts "Can't connect to the PostgreSQL database specified."
  # Signal an error.
  exit 1
 
ensure
  # Close the connection when it is open.
  db.close if db
end

Call the postgres_version.rb program with this syntax:

ruby mysql_version.rb

It should return:

PostgreSQL 14.10 (Ubuntu 14.10-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit

The postgres_columns.rb script returns a couple columns concatenated into a single column:

# Include Ruby Gem libraries.
require 'rubygems'
require 'pg'
 
# Begin block.
begin
  # Create a new connection resource.
  db = PG::connect( 'localhost', 5432, '', '', 'videodb', 'student', 'student')
 
  # Create a result set.
  stmt = db.query("SELECT   CONCAT(nh.last_name, ', ', nh.first_name) AS name " + \
                  "FROM     new_hire nh " +                                       \
                  "ORDER BY nh.last_name")
 
  # Read through the result set hash.
  stmt.each do | row |
    out = ""
    i = 0
    while i < stmt.fields.count()
      # Check when not last column and use the:
      #   - Hash returned by the result set for the value, and
      #   - String array value returned by the statement object
      #     as the name value of the hash by leveraging its 
      #     numeric index.
      if i < stmt.fields.count() - 1
        out += "#{row[stmt.fields[i]]}"
        out += ", "
      else
        out += "#{row[stmt.fields[i]]}"
      end
      i += 1
    end
    puts "#{out}"
  end
 
  # Release the result set resources.
  stmt.freeze
 
rescue PG::Error => e
  # Print the error.
  puts "ERROR #{e.error} (#{e.sqlstate})"
  puts "Can't connect to PostgreSQL database specified."
  # Signal an error.
  exit 1
 
ensure
  # Close the connection when it is open.
  db.close if db
end

Call the postgres_columns.rb program with this syntax:

ruby mysql_columns.rb

It should return:

Chabot, Henry
Lewis, Malcolm

As always, I hope this helps those looking to learn and solve a problem. You can find the PG Gem documentation here.

Written by maclochlainn

February 7th, 2024 at 12:15 am

Ubuntu DaaS VM

without comments

Completed the build of my new Ubuntu Virtual Machine (VM) with Oracle 23c installed in a Docker instance, and MySQL and PostgreSQL installed locally. The VM image also provides a LAMP stack with Apache2, PHP 8.1 with MySQLi and PDO modules. Since the original post, I’ve added a number of items to support our program courses and the Data Science degrees.

There are lots of tricks and techniques in the blog associated with creating the build. I took the background photograph of Chapel Bridge early on Sunday morning August 30, 1987 in Lucerne Switzerland (with a Canon A1 and ASA 64 slide film that subsequently digitized well).

Next step: roll it out into an AWS image for the students to use for their courses.

Related blog posts:

As always, I hope the post and information helps others.

Written by maclochlainn

December 13th, 2023 at 1:15 am

TDE on PostgreSQL

without comments

The scope of Transparent Data Encryption (TDE) in PostgreSQL only applies to columns. It does not encrypt other aspects of the database, like table-level and database-level encryption; and those who deploy PostgreSQL may need to implement additional security measures to protect these database components.

You need to know two key elements before exploring TDE in PostgreSQL: Scheme inside a database and extensions. Unlike many databases, PostgreSQL schemas are not synonymous with a database. You may have multiple scheme (or, alternatively schemas) inside any PostgreSQL database.

Creating an extension is a one time event. Therefore, it’s easier to show you that first. You create a pgcrypto extension with the following command:

CREATE EXTENSION pgcrypto;

The public schema is the one most users deploy but for the purpose of hiding our AES encryption key this example creates a hidden schema. Unless you change the default find setting the hidden schema is not visible when connecting to the database.

You create the hidden schema with the following idimpotent (re-runnable) set of commands:

/* Drop dependent objects before dropping the schema. */
DROP TABLE IF EXISTS hidden.aes_key;
DROP FUNCTION IF EXISTS hidden.get_aes_key;
 
/*
 *  Drop function with cascade to remove the
 *  film_character_t trigger at same time.
 */
DROP FUNCTION IF EXISTS hidden.film_character_dml_f CASCADE;
 
/* Drop the schema conditionally. */
DROP SCHEMA IF EXISTS hidden;
 
/* Create the schema. */
CREATE SCHEMA hidden;

Next, we need to create a aes_key table and get_aes_key function in the hidden schema. The table will store the AES encryption key and the function lets us create an AES encryption key.

/* Create an aes encryption key table. */
CREATE TABLE hidden.aes_key
( aes_key  text );
 
/* Create a hidden function to build an AES encryption key. */
CREATE OR REPLACE
  FUNCTION hidden.get_aes_key() RETURNS text AS
  $$
  BEGIN
    RETURN gen_random_bytes(16)::text;
  END;
  $$
  LANGUAGE plpgsql;

After creating the public get_key() function, you insert a single row to the aes_key table by prefacing it with the hidden schema name, like this:

/* Insert the AES encryption key into a table. */ 
INSERT INTO hidden.aes_key
( aes_key )
VALUES
( hidden.get_aes_key());

Having built the plumbing for our AES encryption key, let’s show you how to encrypt and decrypt string values. This example lets you create an idimpotent film_character table in the public schema, like:

/* Drop the table conditionally. */  
DROP TABLE IF EXISTS film_character;
 
/* Create the demonstration table for encrypting and decrypting strings. */ 
CREATE TABLE film_character
( character_id    serial PRIMARY KEY
, plain_text      text
, encrypted_text  bytea );

After creating the AES encryption key table, function, and inserting a row of data, you need to create a public get_key() function, like:

/* Create a public function to retrieve the AES encryption key. */
CREATE OR REPLACE
  FUNCTION get_key() RETURNS text AS
  $$
  DECLARE
    retval  text;
  BEGIN
    SELECT aes_key INTO retval FROM hidden.aes_key;
    RETURN retval;
  END;
  $$
  LANGUAGE plpgsql;

The following INSERT statement write a plain text column and encrypted text column into the film_character table. The get_key() function hides how the pgp_sym_encrypt function encrypts the string.

/* Insert plain and encrypted text into a table. */
INSERT INTO film_character
( plain_text
, encrypted_text )
VALUES
('Severus Snape'
, pgp_sym_encrypt('Slytherin',get_key()));

The following query displays the plain and encrypted text stored in a row of the film_character table.

/* Query plain and encrypted text from a table. */
SELECT character_id
,      plain_text
,      encrypted_text
FROM   film_character;

It displays:

 character_id |  plain_text   | encrypted_text
--------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------
            1 | Severus Snape | \xc30d04070302fa1c4eebd90204cc7bd23901f1d4fa91b2455c3ef2987a305aebe01a4d94f9ebb467d6cb7a3846342ccd09cb55ac5e82a71cbaef93728fbeb4aaa9bf71b6fb93457758d1
(1 row)

Last, the following query displays the plain and decrypted text with the pgp_sym_decrypt function in a query:

/* Query the plain and decrypted text from a table. */
SELECT character_id
,      plain_text
,      pgp_sym_decrypt(encrypted_text,get_key()) AS encrypted_text
FROM   film_character;

The query returns the plain and decrypted values:

 character_id |  plain_text   | encrypted_text
--------------+---------------+-----------------
            1 | Severus Snape | Slytherin
(1 row)

However, this approach exposes the method for encrypting the encrypted_text column’s string value. You can hide this by creating a film_character_dml_f function in the hidden schema and a film_character_t trigger in the public schema, like:

/* Create trigger function for insert or update. */
CREATE FUNCTION hidden.film_character_dml_f()
  RETURNS trigger AS
$$
DECLARE
  /* Declare local variable. */
  unencrypted_input  VARCHAR(30);
BEGIN
  unencrypted_input := new.encrypted_text::text;
  /* Encrypt the column. */
  new.encrypted_text := pgp_sym_encrypt(unencrypted_input,get_key());
 
  /* Return new record type. */
  RETURN NEW;
END;
$$
LANGUAGE plpgsql;
 
CREATE TRIGGER film_character_t
  BEFORE INSERT OR UPDATE ON film_character
  FOR EACH ROW EXECUTE FUNCTION hidden.film_character_dml_f();

Now, you can insert the plain text data in an INSERT statement and the encryption occurs without disclosing how it happens. Here’s a sample statement:

INSERT INTO film_character
( plain_text
, encrypted_text )
VALUES
('Harry Potter'
,'Gryffindor');

A query of the table shows you that both rows have an encrypted value in the encrypted_text column.

/* Query plain and encrypted text from a table. */
SELECT character_id
,      plain_text
,      encrypted_text
FROM   film_character;

Displayed like:

 character_id |  plain_text   |                                                                                   encrypted_text
--------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
            1 | Severus Snape | \xc30d040703026716034f140d83e76cd23a01f99168afebe50d760b85c69373e3947c74473115a939843887db8e102cd0b2524378f4d684e0ba91c20afc436a056cd983fc47794eef7d4904
            2 | Harry Potter  | \xc30d040703020d8cc71d1f84e1ef6fd24701fd308f669e28a6135beac130fc51a6ccb5cef3c5005f4f557207fe5c84c4aedbb5b098dc9a882a9b7d801c61e34cd90517b4628b5a18b96b3fc61663b48391146b8c0fa2a858
(2 rows)

As always, I hope this code complete solution helps those trying to work with this technical stack.

Written by maclochlainn

January 8th, 2023 at 10:23 am

PostgreSQL Trigger 1

without comments

This entry covers how to write a statement logging trigger for PostgreSQL. It creates two tables: avenger and avenger_log; one avenger_t1 trigger, and a testing INSERT statement.

It was written to help newbies know how and what to return from a function written for a statement-level trigger. They often get stuck on the following when they try to return true. The term non-composite is another way to describe the tuple inserted.

psql:basics_postgres.sql:
59: ERROR:  cannot return non-composite value from function returning composite type
CONTEXT:  PL/pgSQL function write_avenger_t1() line 15 at RETURN

The avenger table:

/* Conditionally drop table. */
DROP TABLE IF EXISTS avenger;
 
/* Create table. */
CREATE TABLE avenger
( avenger_id    SERIAL
, avenger_name  VARCHAR(30)
, first_name    VARCHAR(20)
, last_name     VARCHAR(20));

Seed the avenger table:

/* Seed the avenger table with data. */
INSERT INTO avenger
( first_name, last_name, avenger_name )
VALUES
 ('Anthony', 'Stark', 'Iron Man')
,('Thor', 'Odinson', 'God of Thunder')
,('Steven', 'Rogers', 'Captain America')
,('Bruce', 'Banner', 'Hulk')
,('Clinton', 'Barton', 'Hawkeye')
,('Natasha', 'Romanoff', 'Black Widow')
,('Peter', 'Parker', 'Spiderman')
,('Steven', 'Strange', 'Dr. Strange')
,('Scott', 'Lange', 'Ant-man');

The avenger_log table:

/* Conditionally drop table. */
DROP TABLE IF EXISTS avenger_log;
 
/* Create table. */
CREATE TABLE avenger_log
( avenger_log_id  SERIAL
, trigger_name    VARCHAR(30)
, trigger_timing  VARCHAR(6)
, trigger_event   VARCHAR(6)
, trigger_type    VARCHAR(12));

The INSERT statement that tests the trigger:

DROP FUNCTION IF EXISTS avenger_t1_function; 
CREATE FUNCTION avenger_t1_function()
  RETURNS TRIGGER AS
$$
BEGIN
  /* Insert a row into the avenger_log table.
   * Also, see PostrgreSQL 39.9 Trigger Procedures. */
  INSERT INTO avenger_log
  ( trigger_name
  , trigger_timing
  , trigger_event
  , trigger_type )
  VALUES
  ( UPPER(TG_NAME)
  , TG_WHEN
  , TG_OP
  , TG_LEVEL );
 
  /* A statement trigger doesn't use a composite type or tuple,
   * it should simply return an empty composite type or void. */
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

The avenger_t1 statement trigger:

CREATE TRIGGER avenger_t1
  BEFORE INSERT ON avenger
  EXECUTE FUNCTION avenger_t1_function();

The INSERT statement:

INSERT INTO avenger
( first_name, last_name, avenger_name )
VALUES
 ('Hope', 'van Dyne', 'Wasp');

The results logged to the avenger_log table from a query:

 avenger_log_id | trigger_name | trigger_timing | trigger_event | trigger_type
----------------+--------------+----------------+---------------+--------------
              1 | AVENGER_T1   | BEFORE         | INSERT        | STATEMENT
(1 row)

As always, I hope this helps those looking for a solution.

Written by maclochlainn

July 11th, 2022 at 3:12 pm

PL/pgSQL Transactions

without comments

There are many nuances that I show students about PL/pgSQL because first I teach them how to use PL/SQL. These are some of the differences:

  • PL/SQL declares the function or procedure and then uses the IS keyword; whereas, PL/pgSQL uses the AS keyword.
  • PL/SQL uses the RETURN keyword for functions declarations, like:

    RETURN [data_type} IS

    Whereas, PL/pgSQL uses the plural RETURNS keyword in the function declaration, like:

    RETURNS [data_type] AS
  • PL/SQL considers everything after the function or procedure header as the implicit declaration section; whereas, PL/pgSQL requires you block the code with something like $$ (double dollar symbols) and explicitly use the DECLARE keyword.
  • PL/SQL supports local functions (inside the DECLARE block of a function or procedure); whereas, PL/pgSQL doesn’t.
  • PL/SQL puts the variable modes (IN, INOUT, OUT) between the parameter name and type; whereas, PL/pgSQL puts them before the variable name.
  • PL/SQL declares cursors like:

    CURSOR cursor_name (parameter_list) IS

    Whereas, PL/pgSQL declares them like

    cursor_name CURSOR (parameter_list) FOR
  • PL/SQL terminates and runs the block by using an END keyword, an optional module name, a semicolon to terminate the END; statement, and a forward slash to dispatch the program to PL/SQL statement engine:

    END [module_name];
    /

    Whereas, PL/pgSQL terminates and runs the block by using an END keyword, a semicolon to terminate the END; statement, two dollar signs to end the PL/pgSQL block, and a semicolon to dispatch the program.

    END LANGUAGE plpgsql;
    $$;

After all that basic syntax discussion, we try to create a sample set of tables, a function, a procedure, and a test case in PL/pgSQL. They’ve already done a virtually equivalent set of tasks in PL/SQL.

Here are the steps:

  1. Create the grandma and tweetie_bird tables:

    /* Conditionally drop grandma table and grandma_s sequence. */
    DROP TABLE IF EXISTS grandma CASCADE;
     
    /* Create the table. */
    CREATE TABLE GRANDMA
    ( grandma_id     SERIAL
    , grandma_house  VARCHAR(30)  NOT NULL
    , PRIMARY KEY (grandma_id)
    );
     
    /* Conditionally drop a table and sequence. */
    DROP TABLE IF EXISTS tweetie_bird CASCADE;
     
    /* Create the table with primary and foreign key out-of-line constraints. */
    SELECT 'CREATE TABLE tweetie_bird' AS command;
    CREATE TABLE TWEETIE_BIRD
    ( tweetie_bird_id     SERIAL
    , tweetie_bird_house  VARCHAR(30)   NOT NULL
    , grandma_id          INTEGER       NOT NULL
    , PRIMARY KEY (tweetie_bird_id)
    , CONSTRAINT tweetie_bird_fk        FOREIGN KEY (grandma_id)
      REFERENCES grandma (grandma_id)
    );
  2. Create a get_grandma_id function that returns a number, which should be a valid primary key value from the grandma_id column of the grandma table.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    
    CREATE OR REPLACE
      FUNCTION get_grandma_id
      ( IN pv_grandma_house  VARCHAR ) RETURNS INTEGER AS
    $$
      /* Required for PL/pgSQL programs. */
      DECLARE
     
        /* Local return variable. */
        lv_retval  INTEGER := 0;  -- Default value is 0.
     
        /* Use a cursor, which will not raise an exception at runtime. */
        find_grandma_id CURSOR 
        ( cv_grandma_house  VARCHAR ) FOR
          SELECT grandma_id
          FROM   grandma
          WHERE  grandma_house = cv_grandma_house;
     
      BEGIN  
     
        /* Assign a value when a row exists. */
        FOR i IN find_grandma_id(pv_grandma_house) LOOP
          lv_retval := i.grandma_id;
        END LOOP;
     
        /* Return 0 when no row found and the ID # when row found. */
        RETURN lv_retval;
      END;
    $$ LANGUAGE plpgsql;
  3. Create a Warner_brother procedure that writes data across two tables as a transaction. You con’t include any of the following in your functions or procedures because all PostgreSQL PL/pgSQL functions and procedures are transaction by default:

    • SET TRANSACTION
    • START TRANSACTION
    • SAVEPOINT
    • COMMIT

    A ROLLBACK should be placed in your exception handler as qualified on lines #33 thru #36. The warner_brother procedure inserts rows into the grandma and tweetie_bird tables.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    
    /* Create or replace procedure warner_brother. */
    CREATE OR REPLACE
      PROCEDURE warner_brother
      ( pv_grandma_house       VARCHAR
      , pv_tweetie_bird_house  VARCHAR ) AS
    $$ 
      /* Required for PL/pgSQL programs. */
      DECLARE
     
      /* Declare a local variable for an existing grandma_id. */
      lv_grandma_id   INTEGER;
     
    BEGIN  
      /* Check for existing grandma row. */
      lv_grandma_id := get_grandma_id(pv_grandma_house);
      IF lv_grandma_id = 0 THEN 
        /* Insert grandma. */
        INSERT INTO grandma
        ( grandma_house )
        VALUES
        ( pv_grandma_house )
        RETURNING grandma_id INTO lv_grandma_id;
      END IF;
     
      /* Insert tweetie bird. */
      INSERT INTO tweetie_bird
      ( tweetie_bird_house 
      , grandma_id )
      VALUES
      ( pv_tweetie_bird_house
      , lv_grandma_id );
     
    EXCEPTION
      WHEN OTHERS THEN
        ROLLBACK;
        RAISE NOTICE '[%] [%]', SQLERRM, SQLSTATE;  
    END;
    $$ LANGUAGE plpgsql;

    You should take note of the RETURNING-INTO statement on line #22. The alternative to this clause isn’t pretty if you know that PostgreSQL uses a table name, column name, and the literal seq value separated by underscores (that is, snake case), like:

        /* Assign current value to local variable. */
        lv_grandma_id := CURRVAL('grandma_grandma_id_seq');

    It would be even uglier if you had to look up the sequence name, like:

        /* Assign current value to local variable. */
        lv_grandma_id := CURRVAL(pg_get_serial_sequence('grandma','grandma_id'));
  4. You can test the combination of these two stored procedures with the following DO-block:

    /* Test the warner_brother procedure. */
    DO
    $$
    BEGIN
      /* Insert the yellow house. */
      CALL warner_brother( 'Yellow House', 'Cage');
      CALL warner_brother( 'Yellow House', 'Tree House');
     
      /* Insert the red house. */
      CALL warner_brother( 'Red House', 'Cage');
      CALL warner_brother( 'Red House', 'Tree House');
    END;
    $$ LANGUAGE plpgsql;

    Then, query the results:

    SELECT *
    FROM   grandma g INNER JOIN tweetie_bird tb
    ON.    g.grandma_id = tb.grandma_id;

    It should return:

     grandma_id | grandma_house | tweetie_bird_id | tweetie_bird_house | grandma_id
    ------------+---------------+-----------------+--------------------+------------
              1 | Red House     |               1 | Cage               |          1
              1 | Red House     |               2 | Tree House         |          1
              2 | Yellow House  |               3 | Cage               |          2
              2 | Yellow House  |               4 | Tree House         |          2
    (4 rows)
  5. As always, I hope writing a clear and simple examples helps those looking for sample code.

Written by maclochlainn

June 16th, 2022 at 9:38 pm

Logging Table Function

with one comment

It is interesting when somebody remembers a presentation from 10 years ago. They asked if it was possible in PL/pgSQL to write an autonomous procedure to log data when calling a table view function. The answer is two fold. PL/pgSQL doesn’t support autonomous functions or procedures like the Oracle database but it doesn’t need to because unless you invoke a transaction it auto commits writes.

Logging table functions are important for security auditing and compliance management against laws, like SOX, HIPAA, and FERPA. All too many systems lack the basic ability to audit who queries records without raising an error and blocking the access. That means the bad actor or actress gains the ability to probe the system for weaknesses before determining an attack vector. It’s often better to capture the unauthorized access and take direct action to protect both the the data and systems.

While the example lets an unauthorized person access the information in the first version of the student_query, it blocks access by reporting no rows returned in the latter. Both versions of the query log the data and thereby collect the evidence necessary to act against the hack.

This blog post shows you how to write it and test it. Follow the following steps:

  1. Create the necessary tables and data to work with a logging PL/pgSQL table view function:

    /* Conditionally drop and create table. */
    DROP TABLE IF EXISTS student;
    CREATE TABLE student
    ( student_id      SERIAL
    , first_name      VARCHAR(20)
    , last_name       VARCHAR(20)
    , hogwarts_house  VARCHAR(10));
     
    /* Conditionally drop and create table. */
    DROP TABLE IF EXISTS logger;
    CREATE TABLE logger
    ( logger_id        SERIAL
    , app_user         VARCHAR(30)
    , queried_student  VARCHAR(30)
    , query_time       TIMESTAMP );
     
    /* Insert one record into table. */
    INSERT INTO student
    ( first_name, last_name, hogwarts_house )
    VALUES
     ( 'Harry', 'Potter', 'Gryffindor' )
    ,( 'Hermione', 'Granger', 'Gryffindor' )
    ,( 'Ronald', 'Weasily', 'Gryffindor' )
    ,( 'Draco', 'Malfoy', 'Slytherin' )
    ,( 'Vincent', 'Crabbe', 'Slytherin' )
    ,( 'Susan', 'Bones', 'Hufflepuff' )
    ,( 'Hannah', 'Abbott', 'Hufflepuff' )
    ,( 'Luna', 'Lovegood', 'Ravenclaw' )
    ,( 'Cho', 'Chang', 'Ravenclaw' )
    ,( 'Gilderoy', 'Lockhart', 'Ravenclaw' );
  2. While not necessary if you’re very familiar with PL/pgSQL, it may be helpful to review:

    • The SET command that lets you assign a value to a session-level variable, which you can later use in a PL/pgSQL block.
    • The SELECT-INTO statement in a DO-block.

    Here’s a test script that demonstrates both:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    
    /* Set a session-level variable. */
    SET credential.app_user = 'Draco Malfoy';
     
    /* Secure the value from a session-level variable. */
    SELECT current_setting('credential.app_user');
     
    /* 
    DO
    $$
    DECLARE
      input   VARCHAR(30) := 'Hermione';
      output  VARCHAR(30);
    BEGIN
      /* Sample for partial name construction of full name. */
      SELECT CONCAT(s.first_name, ' ', s.last_name) AS student_name
      INTO   output
      FROM   student s
      WHERE  CONCAT(s.first_name, ' ', s.last_name) LIKE '%'||input||'%';
     
      /* Show result of local assignment via a query. */
      RAISE NOTICE '[%][%]', current_setting('credential.app_user'), output;
    END;
    $$;

    There’s an important parsing trick to this sample program. It uses the LIKE operator rather than the SIMILAR TO operator because the parser fails to recognize the SIMILAR TO operator.

    The DO-block returns the following output:

    NOTICE:  [Draco Malfoy][Hermione Granger]
  3. This creates the student_query logging table function, which takes a partial portion of a students first and last name to return the student information. While the example only returns the name and the Hogwarts House it lays a foundation for a more complete solution.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    
    CREATE OR REPLACE
      FUNCTION student_query (partial_name  VARCHAR)
      RETURNS TABLE ( first_naem      VARCHAR(20)
                    , last_name       VARCHAR(20)
                    , hogwarts_house  VARCHAR(10) ) AS
    $$
    DECLARE
      queried   VARCHAR;
      by_whome  VARCHAR;
    BEGIN
      /* Query separately because embedding in insert statement fails. */
      SELECT CONCAT(s.first_name, ' ', s.last_name) AS student_name
      FROM   student s INTO queried
      WHERE  CONCAT(s.first_name, ' ', s.last_name) LIKE '%'||partial_name||'%';
     
      /* Log the query with the credentials of the user. */  
      INSERT INTO logger
      ( app_user
      , queried_student
      , query_time )
      VALUES
      ( current_setting('credential.app_user')
      , queried
      , NOW());
     
      /* Return the result set without disclosing the query was recorded. */
      RETURN QUERY
      SELECT s.first_name
      ,      s.last_name
      ,      s.hogwarts_house
      FROM   student s
      WHERE  CONCAT(s.first_name, ' ', s.last_name) LIKE '%'||partial_name||'%';
    END;
    $$ LANGUAGE plpgsql;
  4. You can test the function by calling it, like this:

    SELECT * FROM student_query('Hermione');

    It displays:

     first_naem | last_name | hogwarts_house
    ------------+-----------+----------------
     Hermione   | Granger   | Gryffindor
    (1 row)

    You can check the logging table and discover who looked up another student’s records.

    SELECT * FROM logger;

    It displays:

     logger_id |   app_user   | queried_student  |         query_time
    -----------+--------------+------------------+----------------------------
             1 | Draco Malfoy | Hermione Granger | 2022-05-29 22:51:50.398987
    (1 row)
  5. Assuming you’ve built an authorized_user function that returns a Boolean, you can add a call to it in the WHERE clause. For simplicity, let’s implement the function to deny all users, like:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    
    CREATE OR REPLACE
      FUNCTION authorized_user
      (user_name  VARCHAR) RETURNS BOOLEAN AS
    $$
    DECLARE
      lv_retval  BOOLEAN := FALSE;
    BEGIN
      RETURN lv_retval;
    END;
    $$  LANGUAGE plpgsql;

    You can now replace the query on lines 28 through 32 with the new one below. The added clause on line 33 denies access to unauthorized users because there aren’t any.

    28
    29
    30
    31
    32
    33
    
      SELECT s.first_name
      ,      s.last_name
      ,      s.hogwarts_house
      FROM   student s
      WHERE  CONCAT(s.first_name, ' ', s.last_name) LIKE '%'||partial_name||'%'
      AND    authorized_user(current_setting('credential.app_user'));

    While it returns:

     first_naem | last_name | hogwarts_house
    ------------+-----------+----------------
    (0 rows)

    The logger table shows two entries. One for the query that returned a value and one for the version that didn’t.

     logger_id |   app_user   | queried_student  |         query_time
    -----------+--------------+------------------+----------------------------
             1 | Draco Malfoy | Hermione Granger | 2022-05-29 23:23:39.82063
             2 | Draco Malfoy | Hermione Granger | 2022-05-29 23:23:40.736945
    (2 rows)

    In both cases the bad actor Draco Malfoy’s unauthorized access is captured and he was denied any information without alerting him to the security precaution in a logging table function.

As always, I hope this helps those looking for this type of solution.

PL/SQL Table Function

without comments

An Oracle example was requested as a comparison against the quick tutorial I wrote on how to do this in PostgreSQL’s PL/pgSQL. Unfortunately, there are many more moving parts to deliver this type of solution in Oracle’s PL/SQL.

The functions is same and simple. It returns the list of conquistadors that were originally German. It does that by filtering on the lang column in the table. For example, you use ‘de‘ for German. The additional moving parts are the required User-Defined Types (UDTs); one is a record structure and the other is a list (or Oracle parlance table).

The drops are unconditional and as such will trigger errors the first time they’re run but including PL/SQL blocks to make them conditional would have made the code much larger. It’s already larger because Oracle doesn’t support comma-delimited lists in the VALUES clause.

I’ll stage this with the same conquistador table used in the last post. Then, connect to the psql shell and run the following script file:

/* Drop the conquistador table. */
DROP TABLE conquistador;
 
/* Create the conquistador table. */
CREATE TABLE conquistador
( conquistador_id   NUMBER
, conquistador      VARCHAR(30)
, actual_name       VARCHAR(30)
, nationality       VARCHAR(30)
, lang              VARCHAR(2));
 
/* Drop the conquistador sequence. */
DROP SEQUENCE conquistador_seq;
 
/* Create the conquistador_seq with a 1001 start value. */
CREATE SEQUENCE conquistador_seq START WITH 1001;
 
/* Insert 9 rows into the table. */
INSERT INTO conquistador
( conquistador_id, conquistador, actual_name, nationality, lang )
VALUES
(conquistador_seq.NEXTVAL,'Juan de Fuca','Ioánnis Fokás','Greek','el');
 
INSERT INTO conquistador
( conquistador_id, conquistador, actual_name, nationality, lang )
VALUES 
(conquistador_seq.NEXTVAL,'Nicolás de Federmán','Nikolaus Federmann','German','de');
 
INSERT INTO conquistador
( conquistador_id, conquistador, actual_name, nationality, lang )
VALUES
(conquistador_seq.NEXTVAL,'Sebastián Caboto','Sebastiano Caboto','Venetian','it');
 
INSERT INTO conquistador
( conquistador_id, conquistador, actual_name, nationality, lang )
VALUES
(conquistador_seq.NEXTVAL,'Jorge de la Espira','Georg von Speyer','German','de');
 
INSERT INTO conquistador
( conquistador_id, conquistador, actual_name, nationality, lang )
VALUES
(conquistador_seq.NEXTVAL,'Eusebio Francisco Kino','Eusebius Franz Kühn','Italian','it');
 
INSERT INTO conquistador
( conquistador_id, conquistador, actual_name, nationality, lang )
VALUES
(conquistador_seq.NEXTVAL,'Wenceslao Linck','Wenceslaus Linck','Bohemian','cs');
 
INSERT INTO conquistador
( conquistador_id, conquistador, actual_name, nationality, lang )
VALUES
(conquistador_seq.NEXTVAL,'Fernando Consag','Ferdinand Konšcak','Croatian','sr');
 
INSERT INTO conquistador
( conquistador_id, conquistador, actual_name, nationality, lang )
VALUES
(conquistador_seq.NEXTVAL,'Américo Vespucio','Amerigo Vespucci','Italian','it');
 
INSERT INTO conquistador
( conquistador_id, conquistador, actual_name, nationality, lang )
VALUES
(conquistador_seq.NEXTVAL,'Alejo García','Aleixo Garcia','Portuguese','pt');

While unnecessary in PL/pgSQL, you must create User-Defined Types (UDTs) to write a table function. You must also create a local procedure to avoid allocating memory before assigning values to the list. These are the UDTs required:

/* Drop the dependency before the dependent type. */
DROP TYPE conquistador_table;
DROP TYPE conquistador_struct;
 
/* Create the UDT for a record structure accessible in SQL. */
CREATE OR REPLACE
  TYPE conquistador_struct IS OBJECT
  ( conquistador      VARCHAR(30)
  , actual_name       VARCHAR(30)
  , nationality       VARCHAR(30));
/
 
/* Create a list of the UDT. */
CREATE OR REPLACE
  TYPE conquistador_table IS TABLE OF conquistador_struct;
/

Drop any existing function or procedure of the same name before you try to build it. Oracle’s OR REPLACE fails when you try to use it for a function when there is already a procedure using the same name, and vice versa.

/* Drop the function to avoid any conflict with a procedure of
   the same name. */
DROP FUNCTION getConquistador;

Now, you can build another script file to create the getConquistador function, like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
/* Create the function. */
CREATE OR REPLACE
  FUNCTION getConquistador
  (pv_lang IN VARCHAR) RETURN conquistador_table IS
 
  /* Declare a return variable. */
  lv_retval  CONQUISTADOR_TABLE := conquistador_table();
 
  /* Declare a dynamic cursor. */
  CURSOR get_conquistador
  ( cv_lang  VARCHAR2 ) IS
    SELECT c.conquistador
    ,      c.actual_name
    ,      c.nationality
    FROM   conquistador c
    WHERE  c.lang = cv_lang;
 
  /* Local procedure to add to the song. */
  PROCEDURE ADD
  ( pv_input  CONQUISTADOR_STRUCT ) IS
  BEGIN
    lv_retval.EXTEND;
    lv_retval(lv_retval.COUNT) := pv_input;
  END ADD;
 
BEGIN
  /* Read through the cursor and assign to the UDT table. */
  FOR i IN get_conquistador(pv_lang) LOOP
    add(conquistador_struct( i.conquistador
                           , i.actual_name
                           , i.nationality ));
  END LOOP;
 
  /* Return collection. */
  RETURN lv_retval;
END;
/

While there is some white space for readability, the Oracle version is basically twice as long as the PL/pgSQL version. It also requires you to add UDTs to the data dictionary to make it work. PL/pgSQL actually doesn’t let you add references to type definitions and requires you use enumerated descriptions with column definitions.

Then, you can test it with the following syntax. The TABLE function is required to convert the list to a SQL consumable result set:

COL conquistador  FORMAT A21
COL actual_name   FORMAT A21
COL nationality   FORMAT A12
SELECT * FROM TABLE(getConquistador('de'));

It will return the following:

CONQUISTADOR          ACTUAL_NAME           NATIONALITY
--------------------- --------------------- ------------
Nicolás de Federmán   Nikolaus Federmann    German
Jorge de la Espira    Georg von Speyer      German
 
2 rows selected.

As always, I hope this helps with a technique that’s useful.

Written by maclochlainn

May 28th, 2022 at 6:47 pm

PostgreSQL Table Function

without comments

A quick tutorial on how to write a PL/pgSQL Table function. The functions is simple. It returns the list of conquistadors that were originally German. It does that by filtering on the lang column in the table. For example, you use ‘de‘ for German.

I’ll stage this with the same conquistador table used in the last post. Don’t forget to use the chcp command to the Active Console Code Page to 4-byte Unicode before you run the script file, like:

chcp 65001

Then, connect to the psql shell and run the following script file:

/* Conditionally drop the conquistador table. */
DROP TABLE IF EXISTS conquistador;
 
/* Create the conquistador table. */
CREATE TABLE conquistador
( conquistador_id   SERIAL
, conquistador      VARCHAR(30)
, actual_name       VARCHAR(30)
, nationality       VARCHAR(30)
, lang              VARCHAR(2));
 
/* Insert some conquistadors into the table. */
INSERT INTO conquistador
( conquistador
, actual_name
, nationality
, lang )
VALUES
 ('Juan de Fuca','Ioánnis Fokás','Greek','el')
,('Nicolás de Federmán','Nikolaus Federmann','German','de')
,('Sebastián Caboto','Sebastiano Caboto','Venetian','it')
,('Jorge de la Espira','Georg von Speyer','German','de')
,('Eusebio Francisco Kino','Eusebius Franz Kühn','Italian','it')
,('Wenceslao Linck','Wenceslaus Linck','Bohemian','cs')
,('Fernando Consag','Ferdinand Konšcak','Croatian','sr')
,('Américo Vespucio','Amerigo Vespucci','Italian','it')
,('Alejo García','Aleixo Garcia','Portuguese','pt');

Now, you can build another script file to create the getConquistador function, like:

/* Drop the funciton conditionally. */
DROP FUNCTION IF EXISTS getConquistador;

Create the getConquistador function:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
CREATE FUNCTION getConquistador (IN lang_in VARCHAR(2))
  RETURNS TABLE
    ( conquistador      VARCHAR(30)
    , actual_name       VARCHAR(30)
    , nationality       VARCHAR(30)) AS
$$
BEGIN
  RETURN QUERY
  SELECT c.conquistador
  ,      c.actual_name
  ,      c.nationality
  FROM   conquistador c
  WHERE  c.lang = lang_in;
END;
$$ LANGUAGE plpgsql;

Then, you can test it like:

SELECT * FROM getConquistador('de');

It will return the following:

     conquistador      |    actual_name     | nationality
-----------------------+--------------------+-------------
 Nicolás de Federmán   | Nikolaus Federmann | German
 Jorge de la Espira    | Georg von Speyer   | German
(2 rows)

As always, I hope this helps with a technique that’s useful.

PostgreSQL Unicode

with 3 comments

It seems unavoidable to use Windows. Each time I’m compelled to run tests on the platform I find new errors. For example, they don’t use 4-byte unicode and as a result when you want to use Unicode in PostgreSQL there’s a mismatch.

For example, change the Active Console Code Page with the chcp (change code page) to match the one PostgreSQL uses, like:

chip 1252

It lets you avoid this warning message:

Password for user postgres:
psql (14.1)
WARNING: Console code page (437) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
Type "help" for help.
 
postgres=#

However, it won’t avoid display issues with real Unicode values. For example, let’s use a small international table like the following:

/* Conditionally drop the conquistador table. */
DROP TABLE IF EXISTS conquistador;
 
/* Create the conquistador table. */
CREATE TABLE conquistador
( conquistador_id   SERIAL
, conquistador      VARCHAR(30)
, actual_name       VARCHAR(30)
, nationality       VARCHAR(30)
, lang              VARCHAR(2));
 
/* Insert some conquistadors into the table. */
INSERT INTO conquistador
( conquistador
, actual_name
, nationality
, lang )
VALUES
 ('Juan de Fuca','Ioánnis Fokás','Greek','el')
,('Nicolás de Federmán','Nikolaus Federmann','German','de')
,('Sebastián Caboto','Sebastiano Caboto','Venetian','it')
,('Jorge de la Espira','Georg von Speyer','German','de')
,('Eusebio Francisco Kino','Eusebius Franz Kühn','Italian','it')
,('Wenceslao Linck','Wenceslaus Linck','Bohemian','cs')
,('Fernando Consag','Ferdinand Konšcak','Croatian','sr')
,('Américo Vespucio','Amerigo Vespucci','Italian','it')
,('Alejo García','Aleixo Garcia','Portuguese','pt');
 
/* Query the values from the conquistador table. */
SELECT * FROM conquistador;

When you call the script to load it, like:

\i testScript.sql

It’ll display the following, which you can check against the strings in the VALUES clause above. There are encoding issues on lines 1, 2, 3, 5, 7, and 8 below.

 conquistador_id |      conquistador      |     actual_name      | nationality | lang
-----------------+------------------------+----------------------+-------------+------
               1 | Juan de Fuca           | Ioánnis Fokás      | Greek       | el
               2 | Nicolás de Federmán  | Nikolaus Federmann   | German      | de
               3 | Sebastián Caboto      | Sebastiano Caboto    | Venetian    | it
               4 | Jorge de la Espira     | Georg von Speyer     | German      | de
               5 | Eusebio Francisco Kino | Eusebius Franz Kühn | Italian     | it
               6 | Wenceslao Linck        | Wenceslaus Linck     | Bohemian    | cs
               7 | Fernando Consag        | Ferdinand Konšcak   | Croatian    | sr
               8 | Américo Vespucio      | Amerigo Vespucci     | Italian     | it
               9 | Alejo García           | Aleixo Garcia        | Portuguese  | pt
(9 rows)

If you’re like me, it was annoying. The problem is that the native 2-byte Unicode of Microsoft sends values into PostgreSQL that are invalid. Those codes are read back with unintended values from other character encoding sets.

While you can’t set Windows generic encoding to 65001 without causing the system problems, you can set Active Console Code Page value in the scope of a Command-Line session before running the script.

The chcp command lets you set it to 4-byte Unicode, like:

chcp 65001

Now, rerun the script and PostgreSQL will display the correct character encoding set with some spacing irregularities. However, that’s not what’s important when you call table from another programming language through the ODBC-layer. The data will be returned in a 4-byte Unicode encoding stream.

 conquistador_id |      conquistador      |     actual_name      | nationality | lang
-----------------+------------------------+----------------------+-------------+------
               1 | Juan de Fuca           | Ioánnis Fokás        | Greek       | el
               2 | Nicolás de Federmán    | Nikolaus Federmann   | German      | de
               3 | Sebastián Caboto       | Sebastiano Caboto    | Venetian    | it
               4 | Jorge de la Espira     | Georg von Speyer     | German      | de
               5 | Eusebio Francisco Kino | Eusebius Franz Kühn  | Italian     | it
               6 | Wenceslao Linck        | Wenceslaus Linck     | Bohemian    | cs
               7 | Fernando Consag        | Ferdinand Konšcak    | Croatian    | sr
               8 | Américo Vespucio       | Amerigo Vespucci     | Italian     | it
               9 | Alejo García           | Aleixo Garcia        | Portuguese  | pt
(9 rows)

A similar error to what I encountered testing MySQL Workbench’s ability to export SQL Server databases 10 years ago. I thought giving a solution to get coerce correct 4-byte Unicode data insertion may help those who also may be surprised by the behavior.