MacLochlainn's Weblog

Michael McLaughlin's Technical Blog

Archive for the ‘Linux’ Category

Python on PostgreSQL

The library you use to connect Python to PostgreSQL is the psycopg2 database adapter (not an ODBC driver). This blog post shows you how to install it on your Fedora Linux installation and use it in Python. It leverages a videodb database that I show you how to build in this earlier post on configuring PostgreSQL 14.

You would import psycopg2 as follows in your Python code:

import psycopg2

Unfortunately, that import only works when you've installed the library, because it isn't part of Python's standard library. You get the following error when the psycopg2 library isn't installed on your server.

Traceback (most recent call last):
  File "python_new_hire.sql", line 1, in <module>
    import psycopg2
ModuleNotFoundError: No module named 'psycopg2'
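
If you want a friendlier failure than the traceback, you can probe for the library before importing it. Here's a minimal sketch using Python's standard importlib module; the install hint in the message is just a suggestion:

# Probe for the psycopg2 module without triggering an ImportError.
import importlib.util

if importlib.util.find_spec("psycopg2") is None:
  print("psycopg2 is missing; install it with: yum install python3-psycopg2")
else:
  import psycopg2
  print("psycopg2", psycopg2.__version__, "is available.")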

You can install it on Fedora Linux with the following command:

yum install python3-psycopg2

It will install:

====================================================================================
 Package                  Architecture   Version               Repository      Size
====================================================================================
Installing:
 python3-psycopg2         x86_64         2.7.7-1.fc30          fedora         160 k
 
Transaction Summary
====================================================================================
Install  1 Package
 
Total download size: 160 k
Installed size: 593 k
Is this ok [y/N]: y
Downloading Packages:
python3-psycopg2-2.7.7-1.fc30.x86_64.rpm            364 kB/s | 160 kB     00:00    
------------------------------------------------------------------------------------
Total                                               167 kB/s | 160 kB     00:00     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                            1/1 
  Installing       : python3-psycopg2-2.7.7-1.fc30.x86_64                       1/1 
  Running scriptlet: python3-psycopg2-2.7.7-1.fc30.x86_64                       1/1 
  Verifying        : python3-psycopg2-2.7.7-1.fc30.x86_64                       1/1 
 
Installed:
  python3-psycopg2-2.7.7-1.fc30.x86_64                                              
 
Complete!

Here’s a quick test case that you can run in PostgreSQL and Python to test all the pieces. The first SQL script creates a new_hire table and inserts two rows, and the Python program queries data from the new_hire table.

The new_hire.sql file creates the new_hire table and inserts two rows:

-- Environment settings for the script.
SET SESSION "videodb.table_name" = 'new_hire';
SET CLIENT_MIN_MESSAGES TO ERROR;
 
--  Verify table name.
SELECT current_setting('videodb.table_name');
 
-- ------------------------------------------------------------------
--  Conditionally drop table.
-- ------------------------------------------------------------------
DROP TABLE IF EXISTS new_hire CASCADE;
 
-- ------------------------------------------------------------------
--  Create table.
-- -------------------------------------------------------------------
CREATE TABLE new_hire
( new_hire_id  SERIAL
, first_name   VARCHAR(20)  NOT NULL
, middle_name  VARCHAR(20)
, last_name    VARCHAR(20)  NOT NULL
, hire_date    TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP
, PRIMARY KEY (new_hire_id));
 
-- Alter the sequence by restarting it at 1001.
ALTER SEQUENCE new_hire_new_hire_id_seq RESTART WITH 1001;
 
-- Display the table organization.
SELECT   tc.table_catalog || '.' || tc.constraint_name AS constraint_name
,        tc.table_catalog || '.' || tc.table_name AS table_name
,        kcu.column_name
,        ccu.table_catalog || '.' || ccu.table_name AS foreign_table_name
,        ccu.column_name AS foreign_column_name
FROM     information_schema.table_constraints AS tc JOIN information_schema.key_column_usage AS kcu
ON       tc.constraint_name = kcu.constraint_name
AND      tc.table_schema = kcu.table_schema JOIN information_schema.constraint_column_usage AS ccu
ON       ccu.constraint_name = tc.constraint_name
AND      ccu.table_schema = tc.table_schema
WHERE    tc.constraint_type = 'FOREIGN KEY'
AND      tc.table_name = current_setting('videodb.table_name')
ORDER BY 1;
 
SELECT c1.table_name
,      c1.ordinal_position
,      c1.column_name
,      CASE
         WHEN c1.is_nullable = 'NO' AND c2.column_name IS NOT NULL THEN 'PRIMARY KEY'
         WHEN c1.is_nullable = 'NO' AND c2.column_name IS NULL THEN 'NOT NULL'
       END AS is_nullable
,      CASE
         WHEN data_type = 'character varying' THEN
           data_type||'('||character_maximum_length||')'
         WHEN data_type = 'numeric' THEN
           CASE
             WHEN numeric_scale != 0 AND numeric_scale IS NOT NULL THEN
               data_type||'('||numeric_precision||','||numeric_scale||')'
             ELSE
               data_type||'('||numeric_precision||')'
             END
         ELSE
           data_type
        END AS data_type
FROM    information_schema.columns c1 LEFT JOIN
          (SELECT trim(regexp_matches(column_default,current_setting('videodb.table_name'))::text,'{}')||'_id' column_name
           FROM   information_schema.columns) c2
ON       c1.column_name = c2.column_name
WHERE    c1.table_name = current_setting('videodb.table_name')
ORDER BY c1.ordinal_position;
 
-- Display primary key and unique constraints.
SELECT constraint_name
,      lower(constraint_type) AS constraint_type
FROM   information_schema.table_constraints
WHERE  table_name = current_setting('videodb.table_name')
AND    constraint_type IN ('PRIMARY KEY','UNIQUE');
 
-- Insert two test records.
INSERT INTO new_hire
( first_name, middle_name, last_name, hire_date )
VALUES
 ('Malcolm','Jacob','Lewis','2018-2-14')
,('Henry',null,'Chabot','1990-07-31');

You can put it into a local directory, connect as the student user to the videodb database (or any database you've created), and run the following command:

\i new_hire.sql

The new_hire.py file queries the new_hire table and prints its rows:

# Import the PostgreSQL connector library.
import psycopg2

# Initialize the connection so the finally block can test it safely.
connection = None

try:
  # Open a connection to the database.
  connection = psycopg2.connect( user="student"
                               , password="student"
                               , port="5432"
                               , dbname="videodb")

  # Open a cursor.
  cursor = connection.cursor()

  # Assign a static query.
  query = "SELECT new_hire_id, first_name, last_name " \
          "FROM new_hire"

  # Parse and execute the query.
  cursor.execute(query)

  # Fetch all rows from the table.
  records = cursor.fetchall()

  # Read through and print the rows as tuples.
  for row in records:
    print(row)

except (Exception, psycopg2.Error) as error:
  print("Error while fetching data from PostgreSQL", error)

finally:
  # Close the cursor and the database connection.
  if connection:
    cursor.close()
    connection.close()

You run it from the command line like this:

python3 ./new_hire.py

It should print:

(1001, 'Malcolm', 'Lewis')
(1002, 'Henry', 'Chabot')

As always, I hope this helps those trying to sort out how to connect Python to PostgreSQL.

Written by maclochlainn

March 2nd, 2022 at 1:06 am

PostgreSQL CLI Error

Problems with my students' installations get reported to me all the time; this one was interesting. They got an error complaining about a missing libpq.so.5 library.

psql: /usr/pgsql-11/lib/libpq.so.5: no version information available (required by psql)
psql: /usr/pgsql-11/lib/libpq.so.5: no version information available (required by psql)
could not change directory to "/root": Permission denied
psql (11.7, server 11.8)
Type "help" for help.
 
postgres=#

It appeared to be a mismatch of libraries, but it's not that. For reference, this was a Fedora instance. I ran the rpm utility:

rpm -qa | grep postgres

It returned:

postgresql11-libs-11.8-1PGDG.f30.x86_64
postgresql-11.7-2.fc30.x86_64
postgresql-server-11.7-2.fc30.x86_64

Then, I had them run the rpm utility again looking for the Python driver for PostgreSQL:

rpm -qa | grep psycopg2

It returned:

python3-psycopg2-2.7.7-1.fc30.x86_64

Then, it was easy to explain. The Python psycopg2 library depends on both the PostgreSQL 11.7 and 11.8 client libraries, and the libpq.so.5 library is missing version information. You can safely ignore the error, which is really only a warning message, when you work with Fedora, PostgreSQL 11, and Python 3.
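
If you want to see exactly which libpq release your psycopg2 driver was built against, the driver exposes it as an integer. Here's a small sketch; the decoding assumes the PostgreSQL 10+ numbering scheme (major * 10000 + minor):

# Report the libpq version that psycopg2 was compiled against.
import psycopg2

raw = psycopg2.__libpq_version__       # e.g., 110008 for libpq 11.8
major, minor = raw // 10000, raw % 100
print("psycopg2 {0} built against libpq {1}.{2}".format(
      psycopg2.__version__, major, minor))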

Written by maclochlainn

March 2nd, 2022 at 12:41 am

PostgreSQL Tables

The most straightforward way to view the description of a PostgreSQL table is the \d command. For example, this lets you display an account_list table:

\d account_list

Unfortunately, this shows you the table, indexes, and foreign key constraints. Often, you only want to see the list of columns in positional order. So, I wrote a little function to let me display only the table and columns.

There are a few techniques in the script that might seem new to some developers. For example, the return parameters of a function that returns values from the data dictionary must use the data types of the data dictionary itself. These specialized types are required because the SQL cursor gathers the information from the data dictionary in the information_schema, and most of these types can't be cast as variable-length strings.

A simple assumption that the data dictionary strings would implicitly cast to variable-length strings is incorrect. While you can query them like VARCHAR variables, they don't cast to variable-length strings. If you wrote a wrapper function that returned VARCHAR variables, you would probably get a result like this when you call your function:

ERROR:  structure of query does not match function result type
DETAIL:  Returned type information_schema.sql_identifier does not match expected type character varying in column 1.

The “character varying” type is another name for the VARCHAR data type. Some notes advise you to fix this type of error by using the column name and a %TYPE. The %TYPE anchors the data type in the function's parameter list to the actual data type of the data dictionary's table. You would implement that suggestion with code like:

RETURNS TABLE ( table_schema      information_schema.columns.table_schema%TYPE
              , table_name        information_schema.columns.table_name%TYPE
              , ordinal_position  information_schema.columns.ordinal_position%TYPE
              , column_name       information_schema.columns.column_name%TYPE
              , data_type         information_schema.columns.data_type%TYPE
              , is_nullable       information_schema.columns.is_nullable%TYPE ) AS

Unfortunately, your function would raise a NOTICE for every dynamically anchored column at runtime. The NOTICE messages would appear as follows for the describe_table function with anchored parameter values:

psql:describe_table.sql:34: NOTICE:  type reference information_schema.columns.table_schema%TYPE converted to information_schema.sql_identifier
psql:describe_table.sql:35: NOTICE:  type reference information_schema.columns.table_name%TYPE converted to information_schema.sql_identifier
psql:describe_table.sql:36: NOTICE:  type reference information_schema.columns.ordinal_position%TYPE converted to information_schema.cardinal_number
psql:describe_table.sql:37: NOTICE:  type reference information_schema.columns.column_name%TYPE converted to information_schema.sql_identifier
psql:describe_table.sql:38: NOTICE:  type reference information_schema.columns.data_type%TYPE converted to information_schema.character_data
psql:describe_table.sql:39: NOTICE:  type reference information_schema.columns.is_nullable%TYPE converted to information_schema.yes_or_no

As a rule, there's a better solution when you know how to discover the underlying data types. You can discover the required data types with the following query against the pg_catalog.pg_attribute table:

SELECT attname
,      atttypid::regtype
FROM   pg_attribute
WHERE  attrelid = 'information_schema.columns'::regclass
AND    attname IN ('table_schema','table_name','ordinal_position','column_name','data_type','is_nullable')
ORDER  BY attnum;

It returns:

     attname      |              atttypid
------------------+------------------------------------
 table_schema     | information_schema.sql_identifier
 table_name       | information_schema.sql_identifier
 ordinal_position | information_schema.cardinal_number
 column_name      | information_schema.sql_identifier
 is_nullable      | information_schema.yes_or_no
 data_type        | information_schema.character_data
(6 rows)

Only the character_data type can be replaced with a VARCHAR data type; the others should be typed as shown above. Here's the modified describe_table function:

CREATE OR REPLACE
  FUNCTION describe_table (table_name_in  VARCHAR)
  RETURNS TABLE ( table_schema      information_schema.sql_identifier
                , table_name        information_schema.sql_identifier
                , ordinal_position  information_schema.cardinal_number
                , column_name       information_schema.sql_identifier
                , data_type         VARCHAR
                , is_nullable       information_schema.yes_or_no ) AS
$$
BEGIN
  RETURN QUERY
  SELECT   c.table_schema
  ,        c.table_name
  ,        c.ordinal_position
  ,        c.column_name
  ,        CASE
             WHEN c.character_maximum_length IS NOT NULL
             THEN CONCAT(c.data_type, '(', c.character_maximum_length, ')')
             ELSE
               CASE
                 WHEN c.data_type NOT IN ('date','timestamp','timestamp with time zone')
                 THEN CONCAT(c.data_type, '(', numeric_precision::text, ')')
                 ELSE c.data_type
               END
           END AS modified_type
  ,        c.is_nullable
  FROM     information_schema.columns c
  WHERE    c.table_schema NOT IN ('information_schema', 'pg_catalog')
  AND      c.table_name = table_name_in
  ORDER BY c.table_schema
  ,        c.table_name
  ,        c.ordinal_position;
END;
$$ LANGUAGE plpgsql;

If you’re new to PL/pgSQL table functions, you can check my basic tutorial on table functions. You call the describe_table table function with the following syntax:

SELECT * FROM describe_table('account_list');

It returns:

 table_schema |  table_name  | ordinal_position |   column_name    |        data_type         | is_nullable
--------------+--------------+------------------+------------------+--------------------------+-------------
 public       | account_list |                1 | account_list_id  | integer(32)              | NO
 public       | account_list |                2 | account_number   | character varying(10)    | NO
 public       | account_list |                3 | consumed_date    | date                     | YES
 public       | account_list |                4 | consumed_by      | integer(32)              | YES
 public       | account_list |                5 | created_by       | integer(32)              | NO
 public       | account_list |                6 | creation_date    | timestamp with time zone | NO
 public       | account_list |                7 | last_updated_by  | integer(32)              | NO
 public       | account_list |                8 | last_update_date | timestamp with time zone | NO
(8 rows)
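
You can also call the function from Python with the psycopg2 driver covered in an earlier post. A minimal sketch, assuming the videodb database and student user from that setup:

# Query the describe_table function through psycopg2.
import psycopg2

connection = psycopg2.connect( user="student"
                             , password="student"
                             , port="5432"
                             , dbname="videodb")
cursor = connection.cursor()

# Pass the table name as a bind variable in a single-element tuple.
cursor.execute("SELECT * FROM describe_table(%s)", ("account_list",))
for row in cursor.fetchall():
  print(row)

cursor.close()
connection.close()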

As always, I hope this helps those looking for a solution to functions that wrap the data dictionary and display table data from the PostgreSQL data dictionary.

Written by maclochlainn

February 27th, 2022 at 12:43 am

PL/pgSQL Function

Here's how to write an overloaded set of hello_world functions in PostgreSQL PL/pgSQL. The following code lets you write and test overloaded functions and how they handle null, zero-length string, and ordinary string values.

-- Drop the overloaded functions.
DROP FUNCTION IF EXISTS hello_world(), hello_world(whom VARCHAR);
 
-- Create the function.
CREATE FUNCTION hello_world()
RETURNS text AS
$$
DECLARE
  output  VARCHAR(20);
BEGIN
  /* Query the string into a local variable. */
  SELECT 'Hello World!' INTO output;
 
  /* Return the output text variable. */
  RETURN output;
END
$$ LANGUAGE plpgsql;
 
-- Create the function.
CREATE FUNCTION hello_world(whom VARCHAR)
RETURNS text AS
$$
DECLARE
  output  VARCHAR(20);
BEGIN
  /* Query the string into a local variable. */
  IF whom IS NULL OR LENGTH(whom) = 0 THEN
    SELECT 'Hello World!' INTO output;
  ELSE
    SELECT CONCAT('Hello ', whom, '!') INTO output;
  END IF;
 
 
  /* Return the output text variable. */
  RETURN output;
END
$$ LANGUAGE plpgsql;
 
-- Call the function.
SELECT hello_world() AS output;
SELECT hello_world(Null) AS output;
SELECT hello_world('') AS output;
SELECT hello_world('Harry') AS output;

It should print:

    output
--------------
 Hello World!
(1 row)
 
    output
--------------
 Hello World!
(1 row)
 
    output
--------------
 Hello World!
(1 row)
 
    output
--------------
 Hello Harry!
(1 row)

As always, I hope this helps those looking for the basics and how to solve problems.

Written by maclochlainn

February 25th, 2022 at 1:48 am

Oracle Container User

After you create and provision the Oracle Database 21c Express Edition (XE), you can create a c##student container user with the following two-step process.

  1. Create a c##student Oracle user account with the following command:

    CREATE USER c##student IDENTIFIED BY student
    DEFAULT TABLESPACE users QUOTA 200M ON users
    TEMPORARY TABLESPACE temp;

  2. Grant necessary privileges to the newly created c##student user:

    GRANT CREATE CLUSTER, CREATE INDEXTYPE, CREATE OPERATOR
    ,     CREATE PROCEDURE, CREATE SEQUENCE, CREATE SESSION
    ,     CREATE TABLE, CREATE TRIGGER, CREATE TYPE
    ,     CREATE VIEW TO c##student;
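
Afterward, you can verify the new account with a quick connection test. Here's a hedged sketch that uses the python-oracledb driver; the localhost host name and xe service name are assumptions for a default XE install:

# Verify the c##student account can connect with python-oracledb.
import oracledb

# The host and service name below assume a default 21c XE install.
connection = oracledb.connect( user="c##student"
                             , password="student"
                             , dsn="localhost:1521/xe")
print("Connected to Oracle Database", connection.version)
connection.close()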

As always, I hope this helps those looking for how to do something that’s less than clear because everybody uses tools.

Written by maclochlainn

January 31st, 2022 at 5:58 pm

macOS 2021+ DirectX

Obsolescence is always a factor with the macOS. After all, it does keep the revenue flow stable in hardware sales. Last October, Apple released macOS Monterey (12). It gets problematic for me when helping my students set up MySQL on their Apple computers. Some students come with very old machines. Take, for example, my wonderfully maintained MacBook Pro (mid 2014), which became obsolete with the terminal release of macOS Big Sur (11.6.2). When I finally upgraded to that terminal release, I found my VMware (11.5.*) failed to load with a “Not enough physical memory is available …” error dialog.

The pretty Apple dialog box is quite misleading unless you place it in the context of a VMware application problem that occurs when you upgrade from one version of the macOS to another. You might go down any of the three generalized rat holes it offers, like paring running applications, or the other nonsense on the Apple Discussion Board. While these suggestions are useful when your macOS is truly running short of memory, it appears the same error can simply mean the application isn't supported on the new macOS release. It strikes me that this error message may be linked to a VMware virtualization issue with Hyper-V memory management in the combination of VMware (11.5.*) and macOS Big Sur (11.6).

For that reason, you can’t believe discussion threads unreservedly whether they’re from Apple or other vendors because they’re targeted to a universal context. Often users are looking for a specific fix, which means an answer to a specific use-case or problem context. The VMware Fusion 11 Release Notes clearly state that it only supports macOS Mojave (10.14) and macOS Catalina (10.15), which narrows the context, or use-case, for the error.

The error message, in this context, is most likely raised because the product is incompatible with how VMware Fusion manages memory at some level in the macOS Big Sur (11.6.*) version. As I speculated earlier, the out-of-memory error may be linked to how VMware uses Hyper-V, but that's a shot in the dark (a random guess, not the Peter Sellers movie of the same name that reprises his role as Inspector Jacques Clouseau).

How VMware works on the macOS is important to my students because we give them two alternatives for setting up MySQL on the macOS: one inside a Linux VM and the other using Docker (my notes from January this year cover creating a Docker instance on the macOS). My students reported errors like this earlier in the year, and I suggested they upgrade to VMware Fusion 12. It seemed to work for everybody, but now I can report the exact error message and verify the fix with a qualified reason.

While I'm on this topic, it's probably best to deal with DirectX support on Apple hardware. The Apple hardware requirements for supporting DirectX 11 3D acceleration in a virtual machine are currently:

  • Mac Pro 2013 and later
  • iMac 27-inch 2014 and later
  • MacBook Pro 13-inch 2015 and later
  • MacBook Pro 15-inch 2015 with dual graphics and later
  • MacBook Air 2015 and later
  • MacBook 2015 and later
  • iMac 21-inch 2015 and later
  • iMac Pro 2017 and later
  • MacMini 2018 and later

It looks like everybody must upgrade any older Apple machines because we can probably assume most 2015 hardware will become obsolete with the new macOS in October 2022. If it’s not in your budget, you should plan for that cost now.

Fortunately, I also have a MacBook Pro (mid 2021), the last of the Intel Core i9 chip models. I bought it the week before the announcement of the new tech; a little birdie told me it would be M1-only after the announcement. The newer MacBook Pro is awesome, and the 16″ screen is better than the older 15″ screen. I just hate the lack of a magnetic power cord. Alas, that's the price of ensuring I had an Intel chipset.

As always, I hope this helps those looking for an answer.

Written by maclochlainn

December 24th, 2021 at 4:53 pm

Posted in Apple, Linux, Mac, macOS

Read CSV with Python

In 2009, I showed an example of how to use the MySQL LOAD DATA INFILE command. Last year, I updated the details to reset the secure_file_priv variable to use the LOAD DATA INFILE command, but you can avoid that approach with a simple Python 3 program like the one in this example. You also can use MySQL Shell's new parallel table import feature, introduced in 8.0.17, as noted in a comment on this blog post (see the sketch after the steps below).

The example requires creating an avenger table, creating an avenger.csv file, creating a readWriteFile.py Python script, running the readWriteFile.py script, and running a query that validates the insertion of the avenger.csv file's data into the avenger table. Here's the complete code in five steps using the sakila demonstration database:

  • Creating the avenger table with the create_avenger.sql script:

    -- Conditionally drop the avenger table.
    DROP TABLE IF EXISTS avenger;
     
    -- Create the avenger table.
    CREATE TABLE avenger
    ( avenger_id    int unsigned PRIMARY KEY AUTO_INCREMENT
    , first_name    varchar(20)
    , last_name     varchar(20)
    , avenger_name  varchar(20))
      ENGINE=InnoDB
      AUTO_INCREMENT=1001
      DEFAULT CHARSET=utf8mb4
      COLLATE=utf8mb4_0900_ai_ci;
  • Create the avenger.csv file with the following data:

    Anthony,Stark,Iron Man
    Thor,Odinson,God of Thunder
    Steven,Rogers,Captain America
    Bruce,Banner,Hulk
    Clinton,Barton,Hawkeye
    Natasha,Romanoff,Black Widow
    Peter,Parker,Spiderman
    Steven,Strange,Dr. Strange
    Scott,Lange,Ant-man
    Hope,van Dyne,Wasp
  • Create the readWriteFile.py Python 3 script:

    # Import libraries.
    import csv
    import mysql.connector
    from mysql.connector import errorcode
    from csv import reader
     
    #  Attempt the statement.
    # ============================================================
    #  Use a try-catch block to manage the connection.
    # ============================================================
    try:
      # Open connection.
      cnx = mysql.connector.connect( user='student'
                                   , password='student'
                                   , host='127.0.0.1'
                                   , database='sakila')
      # Create cursor.
      cursor = cnx.cursor()
     
      # Open file in read mode and pass the file object to reader.
      with open('avenger.csv', 'r') as read_obj:
        csv_reader = reader(read_obj)
     
        # Declare the dynamic statement.
        stmt = ("INSERT INTO avenger "
                "(first_name, last_name, avenger_name) "
                "VALUES "
                "(%s, %s, %s)")
     
        # Iterate over each row in the csv using reader object
        for row in csv_reader:
          cursor.execute(stmt, row)
     
        # Commit the writes.
        cnx.commit()
     
        # Close the cursor.
        cursor.close()
     
    # Handle exception and close connection.
    except mysql.connector.Error as e:
      if e.errno == errorcode.ER_ACCESS_DENIED_ERROR:
        print("Something is wrong with your user name or password")
      elif e.errno == errorcode.ER_BAD_DB_ERROR:
        print("Database does not exist")
      else:
        print("Error code:", e.errno)        # error number
        print("SQLSTATE value:", e.sqlstate) # SQLSTATE value
        print("Error message:", e.msg)       # error message
     
    # Close the connection when the try block completes.
    else:
      cnx.close()
  • Run the readWriteFile.py file:

    python3 readWriteFile.py
  • Query the avenger table:

    SELECT * FROM avenger;

    It returns:

    +------------+------------+-----------+-----------------+
    | avenger_id | first_name | last_name | avenger_name    |
    +------------+------------+-----------+-----------------+
    |       1001 | Anthony    | Stark     | Iron Man        |
    |       1002 | Thor       | Odinson   | God of Thunder  |
    |       1003 | Steven     | Rogers    | Captain America |
    |       1004 | Bruce      | Banner    | Hulk            |
    |       1005 | Clinton    | Barton    | Hawkeye         |
    |       1006 | Natasha    | Romanoff  | Black Widow     |
    |       1007 | Peter      | Parker    | Spiderman       |
    |       1008 | Steven     | Strange   | Dr. Strange     |
    |       1009 | Scott      | Lange     | Ant-man         |
    |       1010 | Hope       | van Dyne  | Wasp            |
    +------------+------------+-----------+-----------------+
    10 rows in set (0.00 sec)
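
Alternatively, here's a sketch of the MySQL Shell parallel table import mentioned above. It runs in the Shell's Python mode over a classic protocol connection; the option values are assumptions you should confirm against your Shell version's util.import_table() help:

# Run inside MySQL Shell's Python mode after connecting, for example:
#   mysqlsh mysql://student@127.0.0.1:3306/sakila --py
# Import the CSV in parallel; the dialect and columns are assumptions.
util.import_table("avenger.csv",
  { "schema": "sakila"
  , "table": "avenger"
  , "columns": ["first_name", "last_name", "avenger_name"]
  , "dialect": "csv-unix" })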

Written by maclochlainn

December 12th, 2021 at 12:17 am

MySQL Query Performance

Working through our chapter on MySQL views, I wrote the query two ways to introduce the idea of SQL tuning. That’s one of the final topics before introducing JSON types.

I gave the students this query based on the Sakila sample database after explaining how to use the EXPLAIN syntax. The query uses only inner joins, which, as a rule of thumb, are generally faster and more efficient than correlated subqueries.

SELECT   ctry.country AS country_name
,        SUM(p.amount) AS tot_payments
FROM     city c INNER JOIN address a
ON       c.city_id = a.city_id INNER JOIN customer cus
ON       a.address_id = cus.address_id INNER JOIN payment p
ON       cus.customer_id = p.customer_id INNER JOIN country ctry
ON       c.country_id = ctry.country_id
GROUP BY ctry.country;

It generated the following tabular explain plan output:

+----+-------------+-------+------------+--------+---------------------------+--------------------+---------+------------------------+------+----------+------------------------------+
| id | select_type | table | partitions | type   | possible_keys             | key                | key_len | ref                    | rows | filtered | Extra                        |
+----+-------------+-------+------------+--------+---------------------------+--------------------+---------+------------------------+------+----------+------------------------------+
|  1 | SIMPLE      | cus   | NULL       | index  | PRIMARY,idx_fk_address_id | idx_fk_address_id  | 2       | NULL                   |  599 |   100.00 | Using index; Using temporary |
|  1 | SIMPLE      | a     | NULL       | eq_ref | PRIMARY,idx_fk_city_id    | PRIMARY            | 2       | sakila.cus.address_id  |    1 |   100.00 | NULL                         |
|  1 | SIMPLE      | c     | NULL       | eq_ref | PRIMARY,idx_fk_country_id | PRIMARY            | 2       | sakila.a.city_id       |    1 |   100.00 | NULL                         |
|  1 | SIMPLE      | ctry  | NULL       | eq_ref | PRIMARY                   | PRIMARY            | 2       | sakila.c.country_id    |    1 |   100.00 | NULL                         |
|  1 | SIMPLE      | p     | NULL       | ref    | idx_fk_customer_id        | idx_fk_customer_id | 2       | sakila.cus.customer_id |   26 |   100.00 | NULL                         |
+----+-------------+-------+------------+--------+---------------------------+--------------------+---------+------------------------+------+----------+------------------------------+
5 rows in set, 1 warning (0.02 sec)

Then, I used MySQL Workbench to generate a visual explain plan for the query.

Then, I compared it against a refactored version of the query that uses a correlated subquery in the SELECT-list. The example comes from Appendix B of Learning SQL, 3rd Edition by Alan Beaulieu.

SELECT ctry.country
,      (SELECT   SUM(p.amount)
        FROM     city c INNER JOIN address a
        ON       c.city_id = a.city_id INNER JOIN customer cus
        ON       a.address_id = cus.address_id INNER JOIN payment p
        ON       cus.customer_id = p.customer_id
        WHERE    c.country_id = ctry.country_id) AS tot_payments
FROM   country ctry;

It generated the following tabular explain plan output:

+----+--------------------+-------+------------+------+---------------------------+--------------------+---------+------------------------+------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys             | key                | key_len | ref                    | rows | filtered | Extra       |
+----+--------------------+-------+------------+------+---------------------------+--------------------+---------+------------------------+------+----------+-------------+
|  1 | PRIMARY            | ctry  | NULL       | ALL  | NULL                      | NULL               | NULL    | NULL                   |  109 |   100.00 | NULL        |
|  2 | DEPENDENT SUBQUERY | c     | NULL       | ref  | PRIMARY,idx_fk_country_id | idx_fk_country_id  | 2       | sakila.ctry.country_id |    5 |   100.00 | Using index |
|  2 | DEPENDENT SUBQUERY | a     | NULL       | ref  | PRIMARY,idx_fk_city_id    | idx_fk_city_id     | 2       | sakila.c.city_id       |    1 |   100.00 | Using index |
|  2 | DEPENDENT SUBQUERY | cus   | NULL       | ref  | PRIMARY,idx_fk_address_id | idx_fk_address_id  | 2       | sakila.a.address_id    |    1 |   100.00 | Using index |
|  2 | DEPENDENT SUBQUERY | p     | NULL       | ref  | idx_fk_customer_id        | idx_fk_customer_id | 2       | sakila.cus.customer_id |   26 |   100.00 | NULL        |
+----+--------------------+-------+------------+------+---------------------------+--------------------+---------+------------------------+------+----------+-------------+
5 rows in set, 2 warnings (0.00 sec)

MySQL Workbench likewise generated a visual explain plan for the refactored query.

The tabular explain plan identifies the better-performing query to an experienced eye, but the visual explain plan works better for those new to SQL tuning.

The second query performs better because it reads the least data by leveraging the indexes most effectively.
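
Explain plans predict cost rather than measure elapsed time, so it's worth timing the statements too. Here's a rough timing sketch with the mysql.connector driver used elsewhere on this blog; the credentials, host, and repetition count are assumptions:

# Average the runtime of the join-only query over ten executions.
import time
import mysql.connector

cnx = mysql.connector.connect( user='student', password='student'
                             , host='127.0.0.1', database='sakila')
cursor = cnx.cursor()

query = ("SELECT   ctry.country AS country_name "
         ",        SUM(p.amount) AS tot_payments "
         "FROM     city c INNER JOIN address a "
         "ON       c.city_id = a.city_id INNER JOIN customer cus "
         "ON       a.address_id = cus.address_id INNER JOIN payment p "
         "ON       cus.customer_id = p.customer_id INNER JOIN country ctry "
         "ON       c.country_id = ctry.country_id "
         "GROUP BY ctry.country")

# Repeat the query to smooth out cold-cache effects.
start = time.perf_counter()
for i in range(10):
  cursor.execute(query)
  cursor.fetchall()
print("Average runtime: {0:.4f} seconds".format((time.perf_counter() - start) / 10))

cursor.close()
cnx.close()

As always, I hope these examples help those looking at learning more about MySQL.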

Written by maclochlainn

December 9th, 2021 at 1:01 am

MySQL WITH Clause

When I went over my example of using the WITH clause to show how to use a series of literal values in data sets, some students got it right away and some didn't. The original post showed how to solve a problem where one value in the data set is returned in the SELECT-list and two values are used as the minimum and maximum values with a BETWEEN operator. It used three approaches with literal values:

  • A list of Python dictionaries that require you to filter the return set from the database through a range loop and if statement that mimics a SQL BETWEEN operator.
  • A WITH clause that accepts the literals as bind variables to filter the query results inside the query.
  • A table design that holds the literal values that an analyst might use for reporting.

It was the last example that required elaboration. I explained that you might build a web form that uses a table, and the table could allow a data analyst to enter parameter sets. That way, the analyst could submit a flag value to use one or another set of values. I threw out the idea on the whiteboard of introducing a report column to the prior post's level table. The student went off to try it.

Two problems occurred. The first was in the design of the new table and the second was how to properly use the MySQL Python driver.

Below is a formal table design that supports this extension of the first blog post as a list of parameter values. It uses a report column as a super key to return a set of possible values. One value shows in the SELECT-list, and the other two deploy as the minimum and maximum values in a BETWEEN operator. The table is seeded with two sets of values: the Summary report level with three rows and the Detail report level with five rows.

-- Conditionally drop the levels table.
DROP TABLE IF EXISTS levels;
 
-- Create the levels list.
CREATE TABLE levels
( level      VARCHAR(16)
, report     ENUM('Summary','Detail')
, min_roles  INT
, max_roles  INT );
 
-- Insert values into the list table.
INSERT INTO levels
( level, report, min_roles, max_roles )
VALUES
 ('Hollywood Star','Summary', 30, 99999)
,('Prolific Actor','Summary', 20, 29)
,('Newcommer','Summary', 1, 19)
,('Hollywood Star','Detail', 30, 99999)
,('Prolific Actor','Detail', 20, 29)
,('Regular Actor','Detail', 10, 19)
,('Actor','Detail', 5, 9)
,('Newcommer','Detail', 1, 4);

The foregoing table design uses an ENUM type because reporting parameter sets are typically fewer than 64 possibilities. If you use the table to support multiple reports, you should add a second super key column like report_type. The report_type column key would let you use the table to support a series of different report parameter lists.

While the student used a %s placeholder inside the query, they created a runtime error when trying to pass the single bind variable into the query. The student misunderstood how to convert the report input parameter into a tuple, which shows up when the student calls the Python MySQL driver on line 59, like this:

59  cursor.execute(query, (report))

The student’s code generated the following error stack:

Traceback (most recent call last):
  File "./python-with-clause.py", line 59, in <module>
    cursor.execute(query,(report))
  File "/usr/lib/python3.7/site-packages/mysql/connector/cursor_cext.py", line 248, in execute
    prepared = self._cnx.prepare_for_mysql(params)
  File "/usr/lib/python3.7/site-packages/mysql/connector/connection_cext.py", line 632, in prepare_for_mysql
    raise ValueError("Could not process parameters")
ValueError: Could not process parameters

The ValueError should indicate to the developer that they’ve used a wrong data type in the call to the method:

cursor.execute(<class 'str'>,<class 'tuple'>)

This clearly was a misunderstanding of how to cast a single string to a tuple. A quick explanation of how Python casts a single string into a tuple can best be illustrated inside an interactive Python shell, like:

>>> # Define a variable.
>>> x = 'Detail'
>>> # An incorrect attempt to make a string a tuple.
>>> y = (x)
>>> # Check type of y after assignment.
>>> print(type(y))
<class 'str'>
>>> # tuple(x) yields a tuple, but it splits the string into characters.
>>> y = tuple(x)
>>> # Check type of y after assignment.
>>> print(type(y))
<class 'tuple'>
>>> # A trailing comma creates a single-element tuple, which is the correct fix.
>>> z = (x,)
>>> # Check type of z after assignment.
>>> print(type(z))
<class 'tuple'>

So, the fix to line 59 was quite simple:

59  cursor.execute(query, (report,))

The student started with a copy of a Python program that I provided. I fixed the argument handling and added some comments. The line 59 reference above maps to the cursor.execute() call in this code example.

# Import the library.
import sys
import mysql.connector
from mysql.connector import errorcode
 
# Capture argument list.
fullCmdArguments = sys.argv
 
# Assign argument list to variable.
argumentList = fullCmdArguments[1:]
 
# Define a standard report variable.
report = "Summary"
 
#  Check and process the argument list.
# ============================================================
#  If there is one argument, use it to override the default
#  report value.
# ============================================================
if (len(argumentList) == 1):
  # Override the default report value.
  if (isinstance(report,str)):
    report = argumentList[0]
 
#  Attempt the query.
# ============================================================
#  Use a try-catch block to manage the connection.
# ============================================================
try:
  # Open connection.
  cnx = mysql.connector.connect(user='student', password='student',
                                host='127.0.0.1',
                                database='sakila')
  # Create cursor.
  cursor = cnx.cursor()
 
  # Set the query statement.
  query = ("WITH actors AS "
           "(SELECT   a.actor_id "
           " ,        a.first_name "
           " ,        a.last_name "
           " ,        COUNT(fa.actor_id) AS num_roles "
           " FROM     actor a INNER JOIN film_actor fa "
           " ON       a.actor_id = fa.actor_id "
           " GROUP BY a.actor_id "
           " ,        a.first_name "
           " ,        a.last_name ) "
           " SELECT   a.first_name "
           " ,        a.last_name "
           " ,        l.level "
           " ,        a.num_roles "
           " FROM     actors a CROSS JOIN levels l "
           " WHERE    a.num_roles BETWEEN l.min_roles AND l.max_roles "
           " AND      l.report = %s "
           " ORDER BY a.last_name "
           " ,        a.first_name")
 
  # Execute cursor.
  cursor.execute(query,(report,))
 
  # Display the rows returned by the query.
  for (first_name, last_name, level, num_roles) in cursor:
    print('{0} {1} is a {2} with {3} films.'.format( first_name.title()
                                                   , last_name.title()
                                                   , level.title()
                                                   , num_roles))
 
  # Close cursor.
  cursor.close()
 
# ------------------------------------------------------------
# Handle exception and close connection.
except mysql.connector.Error as e:
  if e.errno == errorcode.ER_ACCESS_DENIED_ERROR:
    print("Something is wrong with your user name or password")
  elif e.errno == errorcode.ER_BAD_DB_ERROR:
    print("Database does not exist")
  else:
    print("Error code:", e.errno)        # error number
    print("SQLSTATE value:", e.sqlstate) # SQLSTATE value
    print("Error message:", e.msg)       # error message
 
# Close the connection when the try block completes.
else:
  cnx.close()

A Linux shell script like the following (provided the names of the shell script and the Python program match except for the extension) can run the Python program with or without a parameter. It works without a parameter because the Python program sets a default value for the report variable.

#!/bin/bash
# Switch the file extension and run the python program.
file=${0/%sh/py}
python3 ${file} "${@}"

You call the shell script like this:

./python-with-clause.sh Detail

As always, I hope this helps those looking for a solution.

Written by maclochlainn

November 14th, 2021 at 11:01 pm

Linux sqlplus wrapper

Here's a quick way to ensure you can use the up-arrow and navigation keys with the sqlplus command-line interface. You can just add this function to your .bashrc file.

sqlplus ()
{
    # Find the rlwrap binary; quote the tests so an empty result
    # correctly fails the -n checks.
    path=`which rlwrap 2>/dev/null`;
    file='';
    if [ -n "${path}" ]; then
        file=${path##/*/};
    fi;
    if [ -n "${file}" ] && [[ ${file} = "rlwrap" ]]; then
        rlwrap sqlplus "${@}";
    else
        echo "Command-line history unavailable: Install the rlwrap package.";
        $ORACLE_HOME/bin/sqlplus "${@}";
    fi
}

As always, I hope this helps those looking for solutions.

Written by maclochlainn

November 12th, 2021 at 11:34 pm