Archive for the ‘Linux’ Category
Python on PostgreSQL
The library you use when connecting Python to PostgreSQL is the psycopg2 Python library, a PostgreSQL database adapter rather than an ODBC driver. This blog post shows you how to use it in Python and how to install it on your Fedora Linux installation. It leverages a videodb database that I show you how to build in this earlier post on configuring PostgreSQL 14.
You would import psycopg2 as follows in your Python code:
import psycopg2
Unfortunately, that import only works on Linux servers where you've installed the library, because psycopg2 isn't part of the standard Python libraries. You get the following error when the psycopg2 library isn't installed on your server:
Traceback (most recent call last):
  File "python_new_hire.sql", line 1, in <module>
    import psycopg2
ModuleNotFoundError: No module named 'psycopg2'
You can install it on Fedora Linux with the following command:
yum install python3-psycopg2
It will install:
====================================================================================
 Package                 Architecture    Version           Repository         Size
====================================================================================
Installing:
 python3-psycopg2        x86_64          2.7.7-1.fc30      fedora            160 k

Transaction Summary
====================================================================================
Install  1 Package

Total download size: 160 k
Installed size: 593 k
Is this ok [y/N]: y
Downloading Packages:
python3-psycopg2-2.7.7-1.fc30.x86_64.rpm             364 kB/s | 160 kB     00:00
------------------------------------------------------------------------------------
Total                                                167 kB/s | 160 kB     00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                            1/1
  Installing       : python3-psycopg2-2.7.7-1.fc30.x86_64                       1/1
  Running scriptlet: python3-psycopg2-2.7.7-1.fc30.x86_64                       1/1
  Verifying        : python3-psycopg2-2.7.7-1.fc30.x86_64                       1/1

Installed:
  python3-psycopg2-2.7.7-1.fc30.x86_64

Complete!
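Once the package is installed, you can confirm the driver is importable before moving on to the test case. This is a minimal sketch; the version string will differ by distribution and release:

# Verify the psycopg2 driver is importable and report its version.
try:
  import psycopg2
  print("psycopg2 version:", psycopg2.__version__)
except ModuleNotFoundError:
  print("psycopg2 is not installed; run: yum install python3-psycopg2")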
Here’s a quick test case that you can run in PostgreSQL and Python to test all the pieces. The first SQL script creates a new_hire table and inserts two rows, and the Python program queries data from the new_hire table.
The new_hire.sql file creates the new_hire table and inserts two rows:
-- Environment settings for the script.
SET SESSION "videodb.table_name" = 'new_hire';
SET CLIENT_MIN_MESSAGES TO ERROR;

-- Verify table name.
SELECT current_setting('videodb.table_name');

-- ------------------------------------------------------------------
--  Conditionally drop table.
-- ------------------------------------------------------------------
DROP TABLE IF EXISTS new_hire CASCADE;

-- ------------------------------------------------------------------
--  Create table.
-- ------------------------------------------------------------------
CREATE TABLE new_hire
( new_hire_id  SERIAL
, first_name   VARCHAR(20)  NOT NULL
, middle_name  VARCHAR(20)
, last_name    VARCHAR(20)  NOT NULL
, hire_date    TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP
, PRIMARY KEY (new_hire_id));

-- Alter the sequence by restarting it at 1001.
ALTER SEQUENCE new_hire_new_hire_id_seq RESTART WITH 1001;

-- Display the table organization.
SELECT   tc.table_catalog || '.' || tc.constraint_name AS constraint_name
,        tc.table_catalog || '.' || tc.table_name AS table_name
,        kcu.column_name
,        ccu.table_catalog || '.' || ccu.table_name AS foreign_table_name
,        ccu.column_name AS foreign_column_name
FROM     information_schema.table_constraints AS tc
JOIN     information_schema.key_column_usage AS kcu
ON       tc.constraint_name = kcu.constraint_name
AND      tc.table_schema = kcu.table_schema
JOIN     information_schema.constraint_column_usage AS ccu
ON       ccu.constraint_name = tc.constraint_name
AND      ccu.table_schema = tc.table_schema
WHERE    tc.constraint_type = 'FOREIGN KEY'
AND      tc.table_name = current_setting('videodb.table_name')
ORDER BY 1;

SELECT   c1.table_name
,        c1.ordinal_position
,        c1.column_name
,        CASE
           WHEN c1.is_nullable = 'NO' AND c2.column_name IS NOT NULL THEN 'PRIMARY KEY'
           WHEN c1.is_nullable = 'NO' AND c2.column_name IS NULL THEN 'NOT NULL'
         END AS is_nullable
,        CASE
           WHEN data_type = 'character varying' THEN
             data_type||'('||character_maximum_length||')'
           WHEN data_type = 'numeric' THEN
             CASE
               WHEN numeric_scale != 0 AND numeric_scale IS NOT NULL THEN
                 data_type||'('||numeric_precision||','||numeric_scale||')'
               ELSE
                 data_type||'('||numeric_precision||')'
             END
           ELSE
             data_type
         END AS data_type
FROM     information_schema.columns c1 LEFT JOIN
         (SELECT trim(regexp_matches(column_default,current_setting('videodb.table_name'))::text,'{}')||'_id' column_name
          FROM   information_schema.columns) c2
ON       c1.column_name = c2.column_name
WHERE    c1.table_name = current_setting('videodb.table_name')
ORDER BY c1.ordinal_position;

-- Display primary key and unique constraints.
SELECT   constraint_name
,        lower(constraint_type) AS constraint_type
FROM     information_schema.table_constraints
WHERE    table_name = current_setting('videodb.table_name')
AND      constraint_type IN ('PRIMARY KEY','UNIQUE');

-- Insert two test records.
INSERT INTO new_hire
( first_name, middle_name, last_name, hire_date )
VALUES
 ('Malcolm','Jacob','Lewis','2018-2-14')
,('Henry',null,'Chabot','1990-07-31');
You can put it into a local directory, connect as the student user to the videodb database (or any other database you've created), and run the following command:
\i new_hire.sql
The new_hire.py file queries the new_hire table and prints the rows:
# Import the PostgreSQL connector library.
import psycopg2

try:
  # Open a connection to the database.
  connection = psycopg2.connect( user="student"
                               , password="student"
                               , port="5432"
                               , dbname="videodb")

  # Open a cursor.
  cursor = connection.cursor()

  # Assign a static query.
  query = "SELECT new_hire_id, first_name, last_name " \
          "FROM new_hire"

  # Parse and execute the query.
  cursor.execute(query)

  # Fetch all rows from a table.
  records = cursor.fetchall()

  # Read through and print the rows as tuples.
  for row in range(0, len(records)):
    print(records[row])

except (Exception, psycopg2.Error) as error:
  print("Error while fetching data from PostgreSQL", error)

finally:
  # Close the database connection.
  if (connection):
    cursor.close()
    connection.close()
You run it from the command line, like:
python3 ./new_hire.py
It should print:
(1001, 'Malcolm', 'Lewis')
(1002, 'Henry', 'Chabot')
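The sample program only reads data. If you also want to add a row from Python, remember that psycopg2 starts a transaction for you, so the insert needs an explicit commit. Here's a minimal sketch against the same table; the new employee's values are only illustrative:

# Insert one illustrative row from Python; psycopg2 requires an explicit commit.
import psycopg2

connection = psycopg2.connect( user="student", password="student"
                             , port="5432", dbname="videodb")
cursor = connection.cursor()

# Bind variables use %s placeholders and are passed as a tuple.
cursor.execute( "INSERT INTO new_hire (first_name, last_name, hire_date) "
                "VALUES (%s, %s, %s)"
              , ("Grace", "Hopper", "1943-07-01"))

# Make the insert permanent.
connection.commit()

cursor.close()
connection.close()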
As always, I hope this helps those trying to sort out how to connect Python to PostgreSQL.
PostgreSQL CLI Error
Students report installation problems to me all the time, and this one was interesting. They got an error complaining that the libpq.so.5 library had no version information.
psql: /usr/pgsql-11/lib/libpq.so.5: no version information available (required by psql)
psql: /usr/pgsql-11/lib/libpq.so.5: no version information available (required by psql)
could not change directory to "/root": Permission denied
psql (11.7, server 11.8)
Type "help" for help.

postgres=#
It appeared to be a mismatch of libraries, but it's not that. For reference, this was a Fedora instance. I ran the rpm utility:
rpm -qa | grep postgres
It returned:
postgresql11-libs-11.8-1PGDG.f30.x86_64
postgresql-11.7-2.fc30.x86_64
postgresql-server-11.7-2.fc30.x86_64
Then, I had them run the rpm utility again looking for the Python driver for PostgreSQL:
rpm -qa | grep psycopg2
It returned:
python3-psycopg2-2.7.7-1.fc30.x86_64
Then, it was easy to explain. The Python psycopg2 library depends on both the PostgreSQL 11.7 and 11.8 libraries, and the libpq.so.5 library is missing version information. You can safely ignore the error, which is really only a warning message, when you want to work with Fedora, PostgreSQL 11, and Python 3.
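If you want to reassure yourself that the Python side still works despite the warning, a quick check like this prints the driver and server versions. It's a minimal sketch that assumes the student user and videodb database from the earlier psycopg2 post in this archive:

# Print the psycopg2 driver version and the PostgreSQL server version (sketch).
import psycopg2

connection = psycopg2.connect( user="student", password="student"
                             , port="5432", dbname="videodb")

print("psycopg2 version:", psycopg2.__version__)
print("server version:", connection.server_version)   # e.g., 110008 for 11.8

connection.close()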
PostgreSQL Tables
The most straightforward way to view the description of a PostgreSQL table is the \d command. For example, this lets you display an account_list table:
\d account_list
Unfortunately, this shows you the table's columns, indexes, and foreign key constraints together. Often, you only want to see the list of columns in positional order. So, I wrote a little function that displays only the table and its columns.
There are a few techniques in the script that might seem new to some developers. For example, when a function returns values from the data dictionary, the return parameters must use the data dictionary's own specialized types. Those types are required because the SQL cursor gathers its information from the information_schema, and most of those types can't be cast as variable-length strings.
It's tempting to assume the data dictionary strings would implicitly cast to variable-length strings, but that assumption is incorrect. While you can query them like VARCHAR values, they don't cast to variable-length strings. If you wrote a wrapper function that returned VARCHAR values, you would probably get a result like this when you call your function:
ERROR:  structure of query does not match function result type
DETAIL:  Returned type information_schema.sql_identifier does not match expected type character varying in column 1.
The “character varying” is another name for a VARCHAR data type. Some notes will advise you to fix this type of error by using the column name and a %TYPE. The %TYPE anchors the data type in the function’s parameter list to the actual data type of the data dictionary’s table. You would implement that suggestion with code like:
RETURNS TABLE ( table_schema      information_schema.columns.table_schema%TYPE
              , table_name        information_schema.columns.table_name%TYPE
              , ordinal_position  information_schema.columns.ordinal_position%TYPE
              , column_name       information_schema.columns.column_name%TYPE
              , data_type         information_schema.columns.data_type%TYPE
              , is_nullable       information_schema.columns.is_nullable%TYPE ) AS
Unfortunately, your function would raise a NOTICE for every anchored column when you create it. The NOTICE messages would appear as follows for the describe_table function with anchored parameter values:
psql:describe_table.sql:34: NOTICE:  type reference information_schema.columns.table_schema%TYPE converted to information_schema.sql_identifier
psql:describe_table.sql:35: NOTICE:  type reference information_schema.columns.table_name%TYPE converted to information_schema.sql_identifier
psql:describe_table.sql:36: NOTICE:  type reference information_schema.columns.ordinal_position%TYPE converted to information_schema.cardinal_number
psql:describe_table.sql:37: NOTICE:  type reference information_schema.columns.column_name%TYPE converted to information_schema.sql_identifier
psql:describe_table.sql:38: NOTICE:  type reference information_schema.columns.data_type%TYPE converted to information_schema.character_data
psql:describe_table.sql:39: NOTICE:  type reference information_schema.columns.is_nullable%TYPE converted to information_schema.yes_or_no
As a rule, there’s a better solution when you know how to discover the underlying data types. You can discover the required data types with the following query against the pg_attribute catalog table:
SELECT   attname
,        atttypid::regtype
FROM     pg_attribute
WHERE    attrelid = 'information_schema.columns'::regclass
AND      attname IN ('table_schema','table_name','ordinal_position'
                    ,'column_name','data_type','is_nullable')
ORDER BY attnum;
It returns:
     attname      |              atttypid
------------------+------------------------------------
 table_schema     | information_schema.sql_identifier
 table_name       | information_schema.sql_identifier
 ordinal_position | information_schema.cardinal_number
 column_name      | information_schema.sql_identifier
 is_nullable      | information_schema.yes_or_no
 data_type        | information_schema.character_data
(6 rows)
Only the character_data type can be replaced with a VARCHAR data type; the others should be declared as shown above. Here’s the modified describe_table function:
CREATE OR REPLACE FUNCTION describe_table (table_name_in VARCHAR)
  RETURNS TABLE ( table_schema      information_schema.sql_identifier
                , table_name        information_schema.sql_identifier
                , ordinal_position  information_schema.cardinal_number
                , column_name       information_schema.sql_identifier
                , data_type         VARCHAR
                , is_nullable       information_schema.yes_or_no ) AS
$$
BEGIN
  RETURN QUERY
  SELECT   c.table_schema
  ,        c.table_name
  ,        c.ordinal_position
  ,        c.column_name
  ,        CASE
             WHEN c.character_maximum_length IS NOT NULL THEN
               CONCAT(c.data_type, '(', c.character_maximum_length, ')')
             ELSE
               CASE
                 WHEN c.data_type NOT IN ('date','timestamp','timestamp with time zone') THEN
                   CONCAT(c.data_type, '(', numeric_precision::text, ')')
                 ELSE
                   c.data_type
               END
           END AS modified_type
  ,        c.is_nullable
  FROM     information_schema.columns c
  WHERE    c.table_schema NOT IN ('information_schema', 'pg_catalog')
  AND      c.table_name = table_name_in
  ORDER BY c.table_schema
  ,        c.table_name
  ,        c.ordinal_position;
END;
$$ LANGUAGE plpgsql;
If you’re new to PL/pgSQL table functions, you can check my basic tutorial on table functions. You call the describe_table table function with the following syntax:
SELECT * FROM describe_table('account_list');
It returns:
 table_schema |  table_name  | ordinal_position |   column_name    |        data_type         | is_nullable
--------------+--------------+------------------+------------------+--------------------------+-------------
 public       | account_list |                1 | account_list_id  | integer(32)              | NO
 public       | account_list |                2 | account_number   | character varying(10)    | NO
 public       | account_list |                3 | consumed_date    | date                     | YES
 public       | account_list |                4 | consumed_by      | integer(32)              | YES
 public       | account_list |                5 | created_by       | integer(32)              | NO
 public       | account_list |                6 | creation_date    | timestamp with time zone | NO
 public       | account_list |                7 | last_updated_by  | integer(32)              | NO
 public       | account_list |                8 | last_update_date | timestamp with time zone | NO
(8 rows)
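Because describe_table returns an ordinary result set, you can also call it from Python with psycopg2. This is a minimal sketch that assumes you created the function in the same videodb database and student account used earlier in this archive:

# Call the describe_table function from Python and print each column's metadata.
import psycopg2

connection = psycopg2.connect( user="student", password="student"
                             , port="5432", dbname="videodb")
cursor = connection.cursor()

# The table function behaves like any other query source in a FROM clause.
cursor.execute("SELECT * FROM describe_table(%s)", ("account_list",))

for row in cursor.fetchall():
  print(row)

cursor.close()
connection.close()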
As always, I hope this helps those looking for a solution to functions that wrap the data dictionary and display table data from the PostgreSQL data dictionary.
PL/pgSQL Function
This post shows how to write an overloaded set of hello_world functions in PostgreSQL PL/pgSQL. The following code lets you write and test overloaded functions and how they handle null values, zero-length strings, and ordinary string values.
-- Drop the overloaded functions.
DROP FUNCTION IF EXISTS hello_world(), hello_world(whom VARCHAR);

-- Create the function.
CREATE FUNCTION hello_world()
  RETURNS text AS
$$
DECLARE
  output  VARCHAR(20);
BEGIN
  /* Query the string into a local variable. */
  SELECT 'Hello World!' INTO output;

  /* Return the output text variable. */
  RETURN output;
END
$$ LANGUAGE plpgsql;

-- Create the function.
CREATE FUNCTION hello_world(whom VARCHAR)
  RETURNS text AS
$$
DECLARE
  output  VARCHAR(20);
BEGIN
  /* Query the string into a local variable. */
  IF whom IS NULL OR LENGTH(whom) = 0 THEN
    SELECT 'Hello World!' INTO output;
  ELSE
    SELECT CONCAT('Hello ', whom, '!') INTO output;
  END IF;

  /* Return the output text variable. */
  RETURN output;
END
$$ LANGUAGE plpgsql;

-- Call the function.
SELECT hello_world();
SELECT hello_world(Null) AS output;
SELECT hello_world('') AS output;
SELECT hello_world('Harry') AS output;
It should print:
    output
--------------
 Hello World!
(1 row)

    output
--------------
 Hello World!
(1 row)

    output
--------------
 Hello World!
(1 row)

    output
--------------
 Hello Harry!
(1 row)
As always, I hope this helps those looking for the basics and how to solve problems.
Oracle Container User
After you create and provision the Oracle Database 21c Express Edition (XE), you can create a c##student container user with the following two-step process.
- Create a c##student Oracle user account with the following command:
CREATE USER c##student IDENTIFIED BY student
DEFAULT TABLESPACE users QUOTA 200M ON users
TEMPORARY TABLESPACE temp;
- Grant necessary privileges to the newly created c##student user:
GRANT CREATE CLUSTER, CREATE INDEXTYPE, CREATE OPERATOR
,     CREATE PROCEDURE, CREATE SEQUENCE, CREATE SESSION
,     CREATE TABLE, CREATE TRIGGER, CREATE TYPE
,     CREATE VIEW TO c##student;
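If you want to verify the new account from Python rather than SQL*Plus, the python-oracledb driver works for a quick smoke test. This is only a sketch: the localhost host and XE service name are assumptions that depend on your listener configuration.

# Quick connection test for the new c##student account (sketch).
import oracledb

# Connect to the container database service; adjust host and service name as needed.
connection = oracledb.connect(user="c##student", password="student",
                              dsn="localhost/XE")
cursor = connection.cursor()

# A trivial query proves the grants include CREATE SESSION.
cursor.execute("SELECT user FROM dual")
print(cursor.fetchone())

cursor.close()
connection.close()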
As always, I hope this helps those looking for how to do something that’s less than clear because everybody else relies on tools.
macOS 2021+ DirectX
Obsolescence is always a factor with the macOS. After all, it does keep the revenue flow stable in hardware sales. Last October, Apple released macOS Monterey (12). It gets problematic for me when helping my students set up MySQL on their Apple computers, because some students come with very old machines. Take, for example, my wonderfully maintained MacBook Pro (mid 2014), which became obsolete with the terminal release of macOS Big Sur (11.6.2). When I did finally upgrade to that terminal release, I found my VMware (11.5.*) failed to load with a “Not enough physical memory is available …” error dialog.
The pretty Apple dialog box is quite misleading unless you place it in the context of a VMware application problem that occurs when you upgrade from one version of the macOS to another. You might go down any of three generalized rat holes, like paring down running applications, or chase the other nonsense on the Apple Discussion Board. While those suggestions are useful when your macOS is truly running short of memory, it appears the same error can simply mean the application isn’t supported on the new macOS release. It strikes me that this error message may be linked to a VMware virtualization issue with Hyper-V memory management in the combination of VMware (11.5.*) and macOS Big Sur (11.6).
For that reason, you can’t believe discussion threads unreservedly whether they’re from Apple or other vendors because they’re targeted to a universal context. Often users are looking for a specific fix, which means an answer to a specific use-case or problem context. The VMware Fusion 11 Release Notes clearly state that it only supports macOS Mojave (10.14) and macOS Catalina (10.15), which narrows the context, or use-case, for the error.
The error message, in this context, is most likely raised because the product is incompatible with how VMware Fusion manages memory at some level in the macOS Big Sur (11.6.*) version. As I speculated earlier, the out-of-memory error may be linked to how VMware uses Hyper-V, but that’s a shot in the dark (or a random guess, not the Peter Sellers movie of the same name in which he reprises his role as Inspector Jacques Clouseau).
How VMware works on the macOS is important to my students because we give them two alternatives for setting up MySQL on macOS: one is inside a Linux VM and the other uses Docker (my notes from January this year for creating a Docker instance on the macOS). My students reported errors like this earlier in the year, and I suggested they upgrade to VMware Fusion 12. It seemed to work for everybody, but now I can report the exact error message and verify the fix with a qualified reason.
While I’m on this topic, it’s probably best to deal with DirectX support on Apple hardware. The Apple hardware requirements for supporting DirectX 11 3D Acceleration in a virtual machine are currently:
- Mac Pro 2013 and later
- iMac 27-inch 2014 and later
- MacBook Pro 13-inch 2015 and later
- MacBook Pro 15-inch 2015 with dual graphics and later
- MacBook Air 2015 and later
- MacBook 2015 and later
- iMac 21-inch 2015 and later
- iMac Pro 2017 and later
- MacMini 2018 and later
It looks like everybody must upgrade any older Apple machines because we can probably assume most 2015 hardware will become obsolete with the new macOS in October 2022. If it’s not in your budget, you should plan for that cost now.
Fortunately, I also have a MacBook Pro (Mid 2021), the last of the Intel 9 chip models. I bought it the week before the announcement of the new tech. A little birdie told me it would be only the M1 after the announcement. The newer MacBook Pro is awesome and the 16″ screen is better than the older 15″ screen. I just hate the lack of a magnetic power cord. Alas, that’s the price of ensuring I had an Intel chip set.
As always, I hope this helps those looking for an answer.
Read CSV with Python
In 2009, I showed an example of how to use the MySQL LOAD DATA INFILE command. Last year, I updated the details to reset the secure_file_priv variable required by the LOAD DATA INFILE command, but you can avoid that approach with a simple Python 3 program like the one in this example. You can also use MySQL Shell’s new parallel table import feature, introduced in 8.0.17, as noted in a comment on this blog post.
The example requires creating an avenger table, an avenger.csv file, and a readWriteFile.py Python script; running the readWriteFile.py script; and running a query that validates the insertion of the avenger.csv file’s data into the avenger table. Here’s the complete code in five steps, using the sakila demonstration database:
- Creating the avenger table with the create_avenger.sql script:
-- Conditionally drop the avenger table.
DROP TABLE IF EXISTS avenger;

-- Create the avenger table.
CREATE TABLE avenger
( avenger_id    int unsigned PRIMARY KEY AUTO_INCREMENT
, first_name    varchar(20)
, last_name     varchar(20)
, avenger_name  varchar(20))
ENGINE=InnoDB
AUTO_INCREMENT=1001
DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_0900_ai_ci;
- Create the avenger.csv file with the following data:
Anthony,Stark,Iron Man
Thor,Odinson,God of Thunder
Steven,Rogers,Captain America
Bruce,Banner,Hulk
Clinton,Barton,Hawkeye
Natasha,Romanoff,Black Widow
Peter,Parker,Spiderman
Steven,Strange,Dr. Strange
Scott,Lange,Ant-man
Hope,van Dyne,Wasp
- Create the readWriteFile.py Python 3 script:
# Import libraries.
import csv
import mysql.connector
from mysql.connector import errorcode
from csv import reader

# Attempt the statement.
# ============================================================
#  Use a try-catch block to manage the connection.
# ============================================================
try:
  # Open connection.
  cnx = mysql.connector.connect( user='student'
                               , password='student'
                               , host='127.0.0.1'
                               , database='sakila')
  # Create cursor.
  cursor = cnx.cursor()

  # Open file in read mode and pass the file object to reader.
  with open('avenger.csv', 'r') as read_obj:
    csv_reader = reader(read_obj)

    # Declare the dynamic statement.
    stmt = ("INSERT INTO avenger "
            "(first_name, last_name, avenger_name) "
            "VALUES "
            "(%s, %s, %s)")

    # Iterate over each row in the csv using the reader object.
    for row in csv_reader:
      cursor.execute(stmt, row)

    # Commit the writes.
    cnx.commit()

    # Close the cursor.
    cursor.close()

# Handle exception and close connection.
except mysql.connector.Error as e:
  if e.errno == errorcode.ER_ACCESS_DENIED_ERROR:
    print("Something is wrong with your user name or password")
  elif e.errno == errorcode.ER_BAD_DB_ERROR:
    print("Database does not exist")
  else:
    print("Error code:", e.errno)        # error number
    print("SQLSTATE value:", e.sqlstate) # SQLSTATE value
    print("Error message:", e.msg)       # error message

# Close the connection when the try block completes.
else:
  cnx.close()
- Run the readWriteFile.py file:
python3 readWriteFile.py
- Query the avenger table:
SELECT * FROM avenger;
It returns:
+------------+------------+-----------+-----------------+
| avenger_id | first_name | last_name | avenger_name    |
+------------+------------+-----------+-----------------+
|       1001 | Anthony    | Stark     | Iron Man        |
|       1002 | Thor       | Odinson   | God of Thunder  |
|       1003 | Steven     | Rogers    | Captain America |
|       1004 | Bruce      | Banner    | Hulk            |
|       1005 | Clinton    | Barton    | Hawkeye         |
|       1006 | Natasha    | Romanoff  | Black Widow     |
|       1007 | Peter      | Parker    | Spiderman       |
|       1008 | Steven     | Strange   | Dr. Strange     |
|       1009 | Scott      | Lange     | Ant-man         |
|       1010 | Hope       | van Dyne  | Wasp            |
+------------+------------+-----------+-----------------+
10 rows in set (0.00 sec)
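If you'd rather verify the load from Python instead of the MySQL client, a short sketch with the same connector and credentials used above works too:

# Verify the CSV load by counting rows in the avenger table (sketch).
import mysql.connector

cnx = mysql.connector.connect( user='student', password='student'
                             , host='127.0.0.1', database='sakila')
cursor = cnx.cursor()

cursor.execute("SELECT COUNT(*) FROM avenger")
print("Rows loaded:", cursor.fetchone()[0])   # Expect 10 for the sample file.

cursor.close()
cnx.close()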
MySQL Query Performance
Working through our chapter on MySQL views, I wrote the query two ways to introduce the idea of SQL tuning. That’s one of the final topics before introducing JSON types.
I gave the students this query, based on the Sakila sample database, after explaining how to use the EXPLAIN syntax. The query uses only inner joins, which, as a rule of thumb, are generally faster and more efficient than correlated subqueries.
SELECT   ctry.country AS country_name
,        SUM(p.amount) AS tot_payments
FROM     city c INNER JOIN address a
ON       c.city_id = a.city_id INNER JOIN customer cus
ON       a.address_id = cus.address_id INNER JOIN payment p
ON       cus.customer_id = p.customer_id INNER JOIN country ctry
ON       c.country_id = ctry.country_id
GROUP BY ctry.country;
It generated the following tabular explain plan output:
+----+-------------+-------+------------+--------+---------------------------+--------------------+---------+------------------------+------+----------+------------------------------+
| id | select_type | table | partitions | type   | possible_keys             | key                | key_len | ref                    | rows | filtered | Extra                        |
+----+-------------+-------+------------+--------+---------------------------+--------------------+---------+------------------------+------+----------+------------------------------+
|  1 | SIMPLE      | cus   | NULL       | index  | PRIMARY,idx_fk_address_id | idx_fk_address_id  | 2       | NULL                   |  599 |   100.00 | Using index; Using temporary |
|  1 | SIMPLE      | a     | NULL       | eq_ref | PRIMARY,idx_fk_city_id    | PRIMARY            | 2       | sakila.cus.address_id  |    1 |   100.00 | NULL                         |
|  1 | SIMPLE      | c     | NULL       | eq_ref | PRIMARY,idx_fk_country_id | PRIMARY            | 2       | sakila.a.city_id       |    1 |   100.00 | NULL                         |
|  1 | SIMPLE      | ctry  | NULL       | eq_ref | PRIMARY                   | PRIMARY            | 2       | sakila.c.country_id    |    1 |   100.00 | NULL                         |
|  1 | SIMPLE      | p     | NULL       | ref    | idx_fk_customer_id        | idx_fk_customer_id | 2       | sakila.cus.customer_id |   26 |   100.00 | NULL                         |
+----+-------------+-------+------------+--------+---------------------------+--------------------+---------+------------------------+------+----------+------------------------------+
5 rows in set, 1 warning (0.02 sec)
Then, I used MySQL Workbench to generate the following visual explain plan:
Then, I compared it against a refactored version of the query that uses a correlated subquery in the SELECT-list. The example comes from Appendix B of Learning SQL, 3rd Edition by Alan Beaulieu.
SELECT ctry.country
,      (SELECT SUM(p.amount)
        FROM   city c INNER JOIN address a
        ON     c.city_id = a.city_id INNER JOIN customer cus
        ON     a.address_id = cus.address_id INNER JOIN payment p
        ON     cus.customer_id = p.customer_id
        WHERE  c.country_id = ctry.country_id) AS tot_payments
FROM   country ctry;
It generated the following tabular explain plan output:
+----+--------------------+-------+------------+------+---------------------------+--------------------+---------+------------------------+------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys             | key                | key_len | ref                    | rows | filtered | Extra       |
+----+--------------------+-------+------------+------+---------------------------+--------------------+---------+------------------------+------+----------+-------------+
|  1 | PRIMARY            | ctry  | NULL       | ALL  | NULL                      | NULL               | NULL    | NULL                   |  109 |   100.00 | NULL        |
|  2 | DEPENDENT SUBQUERY | c     | NULL       | ref  | PRIMARY,idx_fk_country_id | idx_fk_country_id  | 2       | sakila.ctry.country_id |    5 |   100.00 | Using index |
|  2 | DEPENDENT SUBQUERY | a     | NULL       | ref  | PRIMARY,idx_fk_city_id    | idx_fk_city_id     | 2       | sakila.c.city_id       |    1 |   100.00 | Using index |
|  2 | DEPENDENT SUBQUERY | cus   | NULL       | ref  | PRIMARY,idx_fk_address_id | idx_fk_address_id  | 2       | sakila.a.address_id    |    1 |   100.00 | Using index |
|  2 | DEPENDENT SUBQUERY | p     | NULL       | ref  | idx_fk_customer_id        | idx_fk_customer_id | 2       | sakila.cus.customer_id |   26 |   100.00 | NULL        |
+----+--------------------+-------+------------+------+---------------------------+--------------------+---------+------------------------+------+----------+-------------+
5 rows in set, 2 warnings (0.00 sec)
and, MySQL Workbench generated the following visual explain plan:
The tabular explain plan identifies the better performing query to an experienced eye but the visual explain plan works better for those new to SQL tuning.
The second query performs better because it reads less data by leveraging the indexes more effectively. As always, I hope these examples help those looking to learn more about MySQL.
MySQL WITH Clause
When I went over my example of using the WITH clause to work with a series of literal values in data sets, some students got it right away and some didn’t. The original post showed how to solve a problem where one value in the data set is returned in the SELECT-list and two values are used as the minimum and maximum values with a BETWEEN operator. It used three approaches with literal values:
- A list of Python dictionaries that require you to filter the return set from the database through a range loop and if statement that mimics a SQL BETWEEN operator.
- A WITH clause that accepts the literals as bind variables to filter the query results inside the query.
- A table design that holds the literal values that an analyst might use for reporting.
It was the last example that required elaboration. I explained you might build a web form that uses a table, and the table could allow a data analyst to enter parameter sets. That way the analyst could submit a flag value to use one or another set of values. I threw out the idea on the whiteboard of introducing a report column to the prior post’s level table. The student went off to try it.
Two problems occurred. The first was in the design of the new table and the second was how to properly use the MySQL Python driver.
Below is a formal table design that supports this extension of the first blog post as a list of parameter values. It uses a report column as a super key to return a set of possible values. One value shows in the SELECT-list, and the other two values deploy as the minimum and maximum values in a BETWEEN operator. The table is seeded with two sets of values: the Summary report level with three rows and the Detail report level with five rows.
-- Conditionally drop the levels table.
DROP TABLE IF EXISTS levels;

-- Create the levels list.
CREATE TABLE levels
( level      VARCHAR(16)
, report     ENUM('Summary','Detail')
, min_roles  INT
, max_roles  INT );

-- Insert values into the list table.
INSERT INTO levels
( level, report, min_roles, max_roles )
VALUES
 ('Hollywood Star','Summary', 30, 99999)
,('Prolific Actor','Summary', 20, 29)
,('Newcommer','Summary', 1, 19)
,('Hollywood Star','Detail', 30, 99999)
,('Prolific Actor','Detail', 20, 29)
,('Regular Actor','Detail', 10, 19)
,('Actor','Detail', 5, 9)
,('Newcommer','Detail', 1, 4);
The foregoing table design uses an ENUM type because reporting parameter sets are typically fewer than 64 possibilities. If you use the table to support multiple reports, you should add a second super key column like report_type. The report_type column key would let you use the table to support a series of different report parameter lists.
While the student used a %s placeholder inside the query, they created a runtime error when trying to pass the single bind variable into the query. The student misunderstood how to convert the report input parameter into a single-element tuple, which shows up when the student calls the MySQL Python driver, like this:
59 | cursor.execute(query, (report))
The student’s code generated the following error stack:
Traceback (most recent call last):
  File "./python-with-clause.py", line 59, in <module>
    cursor.execute(query,(report))
  File "/usr/lib/python3.7/site-packages/mysql/connector/cursor_cext.py", line 248, in execute
    prepared = self._cnx.prepare_for_mysql(params)
  File "/usr/lib/python3.7/site-packages/mysql/connector/connection_cext.py", line 632, in prepare_for_mysql
    raise ValueError("Could not process parameters")
ValueError: Could not process parameters
The ValueError should indicate to the developer that they’ve used a wrong data type in the call to the method:
cursor.execute(<class 'str'>, <class 'tuple'>)
This clearly was a misunderstanding of how to convert a single string into a one-element tuple. The distinction is easiest to see inside an interactive Python shell, like this:
>>> # Define a variable.
>>> x = 'Detail'
>>> # An incorrect attempt to make a string a tuple.
>>> y = (x)
>>> # Check type of y after assignment.
>>> print(type(y))
<class 'str'>
>>> # tuple() returns a tuple, but it splits the string into characters.
>>> y = tuple(x)
>>> # Check type and contents of y after assignment.
>>> print(type(y))
<class 'tuple'>
>>> print(y)
('D', 'e', 't', 'a', 'i', 'l')
>>> # The correct way to make a single-element tuple is a trailing comma.
>>> z = (x,)
>>> # Check type and contents of z after assignment.
>>> print(type(z))
<class 'tuple'>
>>> print(z)
('Detail',)
So, the fix to line 59 was quite simple:
59 | cursor.execute(query, (report,))
The student started with a copy of a Python program that I provided. I fixed the argument handling and added some comments. The line 59 reference above maps to the cursor.execute(query, (report,)) call in this code example.
# Import the library.
import sys
import mysql.connector
from mysql.connector import errorcode

# Capture argument list.
fullCmdArguments = sys.argv

# Assign argument list to variable.
argumentList = fullCmdArguments[1:]

# Define a standard report variable.
report = "Summary"

# Check and process argument list.
# ============================================================
#  If there are less than two arguments provide default values.
#  Else enumerate and convert strings to dates.
# ============================================================
if (len(argumentList) == 1):
  # Set a default start date.
  if (isinstance(report,str)):
    report = argumentList[0]

# Attempt the query.
# ============================================================
#  Use a try-catch block to manage the connection.
# ============================================================
try:
  # Open connection.
  cnx = mysql.connector.connect(user='student', password='student',
                                host='127.0.0.1', database='sakila')
  # Create cursor.
  cursor = cnx.cursor()

  # Set the query statement.
  query = ("WITH actors AS "
           "(SELECT   a.actor_id "
           " ,        a.first_name "
           " ,        a.last_name "
           " ,        COUNT(fa.actor_id) AS num_roles "
           " FROM     actor a INNER JOIN film_actor fa "
           " ON       a.actor_id = fa.actor_id "
           " GROUP BY a.actor_id "
           " ,        a.first_name "
           " ,        a.last_name ) "
           " SELECT   a.first_name "
           " ,        a.last_name "
           " ,        l.level "
           " ,        a.num_roles "
           " FROM     actors a CROSS JOIN levels l "
           " WHERE    a.num_roles BETWEEN l.min_roles AND l.max_roles "
           " AND      l.report = %s "
           " ORDER BY a.last_name "
           " ,        a.first_name")

  # Execute cursor.
  cursor.execute(query,(report,))

  # Display the rows returned by the query.
  for (first_name, last_name, level, num_roles) in cursor:
    print('{0} {1} is a {2} with {3} films.'.format( first_name.title()
                                                   , last_name.title()
                                                   , level.title()
                                                   , num_roles))

  # Close cursor.
  cursor.close()

# ------------------------------------------------------------
# Handle exception and close connection.
except mysql.connector.Error as e:
  if e.errno == errorcode.ER_ACCESS_DENIED_ERROR:
    print("Something is wrong with your user name or password")
  elif e.errno == errorcode.ER_BAD_DB_ERROR:
    print("Database does not exist")
  else:
    print("Error code:", e.errno)        # error number
    print("SQLSTATE value:", e.sqlstate) # SQLSTATE value
    print("Error message:", e.msg)       # error message

# Close the connection when the try block completes.
else:
  cnx.close()
A Linux shell script like the following (provided the shell script and the Python program share the same base name) can run the Python program with or without a parameter. It works without a parameter because the Python program sets a default value for the report variable.
# Switch the file extension and run the python program.
file=${0/%sh/py}
python3 ${file} "${@}"
You call the shell script like this:
./python-with-clause.sh Detail
As always, I hope this helps those looking for a solution.
Linux sqlplus wrapper
Here’s a quick way to ensure you can use the up-arrow and navigation keys when using the sqlplus command-line interface. You can just add it to your .bashrc file.
sqlplus ()
{
  path=`which rlwrap 2>/dev/null`
  file=''
  if [ -n ${path} ]; then
    file=${path##/*/}
  fi
  if [ -n ${file} ] && [[ ${file} = "rlwrap" ]]; then
    rlwrap sqlplus "${@}"
  else
    echo "Command-line history unavailable: Install the rlwrap package."
    $ORACLE_HOME/bin/sqlplus "${@}"
  fi
}
As always, I hope this helps those looking for solutions.