Archive for the ‘DBA’ Category
PL/SQL Overloading
So, I wrote an updated example of my grandma and tweetie_bird for my students. It demonstrates overloading with the smallest parameter lists possible across a transaction of two tables. It also shows how one version of the procedure can call another version of the procedure.
The tables are created with the following:
/* Conditionally drop grandma table and grandma_s sequence. */ BEGIN FOR i IN (SELECT object_name , object_type FROM user_objects WHERE object_name IN ('GRANDMA','GRANDMA_SEQ')) LOOP IF i.object_type = 'TABLE' THEN /* Use the cascade constraints to drop the dependent constraint. */ EXECUTE IMMEDIATE 'DROP TABLE '||i.object_name||' CASCADE CONSTRAINTS'; ELSE EXECUTE IMMEDIATE 'DROP SEQUENCE '||i.object_name; END IF; END LOOP; END; / /* Create the table. */ CREATE TABLE GRANDMA ( grandma_id NUMBER CONSTRAINT grandma_nn1 NOT NULL , grandma_house VARCHAR2(30) CONSTRAINT grandma_nn2 NOT NULL , created_by NUMBER CONSTRAINT grandma_nn3 NOT NULL , CONSTRAINT grandma_pk PRIMARY KEY (grandma_id) ); /* Create the sequence. */ CREATE SEQUENCE grandma_seq; /* Conditionally drop a table and sequence. */ BEGIN FOR i IN (SELECT object_name , object_type FROM user_objects WHERE object_name IN ('TWEETIE_BIRD','TWEETIE_BIRD_SEQ')) LOOP IF i.object_type = 'TABLE' THEN EXECUTE IMMEDIATE 'DROP TABLE '||i.object_name||' CASCADE CONSTRAINTS'; ELSE EXECUTE IMMEDIATE 'DROP SEQUENCE '||i.object_name; END IF; END LOOP; END; / /* Create the table with primary and foreign key out-of-line constraints. */ CREATE TABLE TWEETIE_BIRD ( tweetie_bird_id NUMBER CONSTRAINT tweetie_bird_nn1 NOT NULL , tweetie_bird_house VARCHAR2(30) CONSTRAINT tweetie_bird_nn2 NOT NULL , grandma_id NUMBER CONSTRAINT tweetie_bird_nn3 NOT NULL , created_by NUMBER CONSTRAINT tweetie_bird_nn4 NOT NULL , CONSTRAINT tweetie_bird_pk PRIMARY KEY (tweetie_bird_id) , CONSTRAINT tweetie_bird_fk FOREIGN KEY (grandma_id) REFERENCES GRANDMA (GRANDMA_ID) ); /* Create sequence. */ CREATE SEQUENCE tweetie_bird_seq; |
The sylvester package specification holds the two overloaded procedures, like:
CREATE OR REPLACE PACKAGE sylvester IS /* Three variable length strings. */ PROCEDURE warner_brother ( pv_grandma_house VARCHAR2 , pv_tweetie_bird_house VARCHAR2 , pv_system_user_name VARCHAR2 ); /* Two variable length strings and a number. */ PROCEDURE warner_brother ( pv_grandma_house VARCHAR2 , pv_tweetie_bird_house VARCHAR2 , pv_system_user_id NUMBER ); END sylvester; / |
The sylvester package implements two warner_brother procedures. One takes the system user’s ID and the other takes the system user’s name. The procedure that accepts the system user name queries the system_user table with the system_user_name to get the system_user_id column and then calls the other version of itself. This demonstrates how you only write logic once when overloading and let one version call the other with the added information.
Here’s the sylvester package body code:
CREATE OR REPLACE PACKAGE BODY sylvester IS /* Procedure warner_brother with user name. */ PROCEDURE warner_brother ( pv_grandma_house VARCHAR2 , pv_tweetie_bird_house VARCHAR2 , pv_system_user_id NUMBER ) IS /* Declare a local variable for an existing grandma_id. */ lv_grandma_id NUMBER; FUNCTION get_grandma_id ( pv_grandma_house VARCHAR2 ) RETURN NUMBER IS /* Initialized local return variable. */ lv_retval NUMBER := 0; -- Default value is 0. /* A cursor that lookups up a grandma's ID by their name. */ CURSOR find_grandma_id ( cv_grandma_house VARCHAR2 ) IS SELECT grandma_id FROM grandma WHERE grandma_house = cv_grandma_house; BEGIN /* Assign a grandma_id as the return value when a row exists. */ FOR i IN find_grandma_id(pv_grandma_house) LOOP lv_retval := i.grandma_id; END LOOP; /* Return 0 when no row found and the grandma_id when a row is found. */ RETURN lv_retval; END get_grandma_id; BEGIN /* Set the savepoint. */ SAVEPOINT starting; /* * Identify whether a member account exists and assign it's value * to a local variable. */ lv_grandma_id := get_grandma_id(pv_grandma_house); /* * Conditionally insert a new member account into the member table * only when a member account does not exist. */ IF lv_grandma_id = 0 THEN /* Insert grandma. */ INSERT INTO grandma ( grandma_id , grandma_house , created_by ) VALUES ( grandma_seq.NEXTVAL , pv_grandma_house , pv_system_user_id ); /* Assign grandma_seq.currval to local variable. */ lv_grandma_id := grandma_seq.CURRVAL; END IF; /* Insert tweetie bird. */ INSERT INTO tweetie_bird ( tweetie_bird_id , tweetie_bird_house , grandma_id , created_by ) VALUES ( tweetie_bird_seq.NEXTVAL , pv_tweetie_bird_house , lv_grandma_id , pv_system_user_id ); /* If the program gets here, both insert statements work. Commit it. */ COMMIT; EXCEPTION /* When anything is broken do this. */ WHEN OTHERS THEN /* Until any partial results. */ ROLLBACK TO starting; END; PROCEDURE warner_brother ( pv_grandma_house VARCHAR2 , pv_tweetie_bird_house VARCHAR2 , pv_system_user_name VARCHAR2 ) IS /* Define a local variable. */ lv_system_user_id NUMBER := 0; FUNCTION get_system_user_id ( pv_system_user_name VARCHAR2 ) RETURN NUMBER IS /* Initialized local return variable. */ lv_retval NUMBER := 0; -- Default value is 0. /* A cursor that lookups up a grandma's ID by their name. */ CURSOR find_system_user_id ( cv_system_user_id VARCHAR2 ) IS SELECT system_user_id FROM system_user WHERE system_user_name = pv_system_user_name; BEGIN /* Assign a grandma_id as the return value when a row exists. */ FOR i IN find_system_user_id(pv_system_user_name) LOOP lv_retval := i.system_user_id; END LOOP; /* Return 0 when no row found and the grandma_id when a row is found. */ RETURN lv_retval; END get_system_user_id; BEGIN /* Convert a system_user_name to system_user_id. */ lv_system_user_id := get_system_user_id(pv_system_user_name); /* Call the warner_brother procedure. */ warner_brother ( pv_grandma_house => pv_grandma_house , pv_tweetie_bird_house => pv_tweetie_bird_house , pv_system_user_id => lv_system_user_id ); EXCEPTION /* When anything is broken do this. */ WHEN OTHERS THEN /* Until any partial results. */ ROLLBACK TO starting; END; END sylvester; / |
The following anonymous block test case works with the code:
BEGIN sylvester.warner_brother( pv_grandma_house => 'Blue House' , pv_tweetie_bird_house => 'Cage' , pv_system_user_name => 'DBA 3' ); sylvester.warner_brother( pv_grandma_house => 'Blue House' , pv_tweetie_bird_house => 'Tree House' , pv_system_user_id => 4 ); END; / |
You can now query the results with this SQL*PLus formatting and query:
/* Query results from warner_brother procedure. */ COL grandma_id FORMAT 9999999 HEADING "Grandma|ID #" COL grandma_house FORMAT A14 HEADING "Grandma House" COL created_by FORMAT 9999999 HEADING "Created|By" COL tweetie_bird_id FORMAT 9999999 HEADING "Tweetie|Bird ID" COL tweetie_bird_house FORMAT A18 HEADING "Tweetie Bird House" SELECT * FROM grandma g INNER JOIN tweetie_bird tb ON g.grandma_id = tb.grandma_id; |
You should see the following data:
Grandma Created Tweetie Grandma Created ID # Grandma House By Bird ID Tweetie Bird House ID # By -------- -------------- -------- -------- ------------------ -------- -------- 1 Blue House 3 1 Cage 1 3 1 Blue House 3 2 Tree House 1 4
As always, I hope complete code samples help solve real problems.
PostgreSQL Trigger 1
This entry covers how to write a statement logging trigger for PostgreSQL. It creates two tables: avenger and avenger_log; one avenger_t1 trigger, and a testing INSERT statement.
It was written to help newbies know how and what to return from a function written for a statement-level trigger. They often get stuck on the following when they try to return true. The term non-composite is another way to describe the tuple inserted.
psql:basics_postgres.sql: 59: ERROR: cannot return non-composite value from function returning composite type CONTEXT: PL/pgSQL function write_avenger_t1() line 15 at RETURN |
The avenger table:
/* Conditionally drop table. */ DROP TABLE IF EXISTS avenger; /* Create table. */ CREATE TABLE avenger ( avenger_id SERIAL , avenger_name VARCHAR(30) , first_name VARCHAR(20) , last_name VARCHAR(20)); |
Seed the avenger table:
/* Seed the avenger table with data. */ INSERT INTO avenger ( first_name, last_name, avenger_name ) VALUES ('Anthony', 'Stark', 'Iron Man') ,('Thor', 'Odinson', 'God of Thunder') ,('Steven', 'Rogers', 'Captain America') ,('Bruce', 'Banner', 'Hulk') ,('Clinton', 'Barton', 'Hawkeye') ,('Natasha', 'Romanoff', 'Black Widow') ,('Peter', 'Parker', 'Spiderman') ,('Steven', 'Strange', 'Dr. Strange') ,('Scott', 'Lange', 'Ant-man'); |
The avenger_log table:
/* Conditionally drop table. */ DROP TABLE IF EXISTS avenger_log; /* Create table. */ CREATE TABLE avenger_log ( avenger_log_id SERIAL , trigger_name VARCHAR(30) , trigger_timing VARCHAR(6) , trigger_event VARCHAR(6) , trigger_type VARCHAR(12)); |
The INSERT statement that tests the trigger:
DROP FUNCTION IF EXISTS avenger_t1_function; CREATE FUNCTION avenger_t1_function() RETURNS TRIGGER AS $$ BEGIN /* Insert a row into the avenger_log table. * Also, see PostrgreSQL 39.9 Trigger Procedures. */ INSERT INTO avenger_log ( trigger_name , trigger_timing , trigger_event , trigger_type ) VALUES ( UPPER(TG_NAME) , TG_WHEN , TG_OP , TG_LEVEL ); /* A statement trigger doesn't use a composite type or tuple, * it should simply return an empty composite type or void. */ RETURN NULL; END; $$ LANGUAGE plpgsql; |
The avenger_t1 statement trigger:
CREATE TRIGGER avenger_t1 BEFORE INSERT ON avenger EXECUTE FUNCTION avenger_t1_function(); |
The INSERT statement:
INSERT INTO avenger ( first_name, last_name, avenger_name ) VALUES ('Hope', 'van Dyne', 'Wasp'); |
The results logged to the avenger_log table from a query:
avenger_log_id | trigger_name | trigger_timing | trigger_event | trigger_type ----------------+--------------+----------------+---------------+-------------- 1 | AVENGER_T1 | BEFORE | INSERT | STATEMENT (1 row) |
As always, I hope this helps those looking for a solution.
MySQL Query from JSON
One of my students asked how you could get JSON data out in tabular format. I said they should look at Øystein Grøvlen’s JSON_TABLE – Best of Both Worlds blog post from 2018. Unfortunately, the student wanted another example with the Video Store model that we use in class.
For clarity, all path definitions start with a $ followed by other selectors:
- A period followed by a name, such as $.website
- [N] where N is the position in a zero-indexed array
- The .[*] wildcard evaluates all members of an object
- The [*] wildcard evaluates all members of an array
- The prefix and suffix wildcard, **, evaluates to all paths that begin with the named prefix and end with the named suffix
So, here’s a quick supplement to what’s already there. It assumes you created an example table based on my prior blog post that looks like this:
+----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | id | struct | +----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | {"contact": [{"last_name": "Winn", "first_name": "Randi"}, {"last_name": "Winn", "first_name": "Brian"}], "account_number": "US00001"} | | 2 | {"contact": [{"last_name": "Vizquel", "first_name": "Oscar"}, {"last_name": "Vizquel", "first_name": "Doreen"}], "account_number": "US00002"} | | 3 | {"contact": [{"last_name": "Sweeney", "first_name": "Meaghan"}, {"last_name": "Sweeney", "first_name": "Matthew"}, {"last_name": "Sweeney", "first_name": "Ian"}], "account_number": "US00003"} | | 4 | {"contact": [{"last_name": "Clinton", "first_name": "Goeffrey"}], "account_number": "US00004"} | | 5 | {"contact": [{"last_name": "Moss", "first_name": "Wendy"}], "account_number": "US00005"} | | 6 | {"contact": [{"last_name": "Gretelz", "first_name": "Simon"}], "account_number": "US00006"} | | 7 | {"contact": [{"last_name": "Royal", "first_name": "Elizabeth"}], "account_number": "US00007"} | | 8 | {"contact": [{"last_name": "Smith", "first_name": "Brian"}], "account_number": "US00008"} | | 9 | {"contact": [{"last_name": "Potter", "first_name": "Harry"}, {"last_name": "Potter", "first_name": "Ginny"}, {"last_name": "Potter", "first_name": "Lily"}], "account_number": "US00011"} | +----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 9 rows in set (0.01 sec) |
You can query the account_number key value like this:
SELECT id , JSON_EXTRACT(struct, "$.account_number") AS account_no FROM example; |
It returns:
+----+------------+ | id | account_no | +----+------------+ | 1 | "US00001" | | 2 | "US00002" | | 3 | "US00003" | | 4 | "US00004" | | 5 | "US00005" | | 6 | "US00006" | | 7 | "US00007" | | 8 | "US00008" | | 9 | "US00011" | +----+------------+ 9 rows in set (0.00 sec) |
You use the JSON_TABLE function to get the embedded array elements of first and last name, like:
SELECT id , contact.* FROM example CROSS JOIN JSON_TABLE( struct ,"$.contact[*]" COLUMNS( lname JSON PATH "$.last_name" , fname JSON PATH "$.first_name")) AS contact; |
It returns:
+----+-----------+-------------+ | id | lname | fname | +----+-----------+-------------+ | 1 | "Winn" | "Randi" | | 1 | "Winn" | "Brian" | | 2 | "Vizquel" | "Oscar" | | 2 | "Vizquel" | "Doreen" | | 3 | "Sweeney" | "Meaghan" | | 3 | "Sweeney" | "Matthew" | | 3 | "Sweeney" | "Ian" | | 4 | "Clinton" | "Goeffrey" | | 5 | "Moss" | "Wendy" | | 6 | "Gretelz" | "Simon" | | 7 | "Royal" | "Elizabeth" | | 8 | "Smith" | "Brian" | | 9 | "Potter" | "Harry" | | 9 | "Potter" | "Ginny" | | 9 | "Potter" | "Lily" | +----+-----------+-------------+ 15 rows in set (0.00 sec) |
You can combine both approaches, as shown below.
SELECT id , JSON_EXTRACT(struct, "$.account_number") AS account_no , contact.* FROM example CROSS JOIN JSON_TABLE( struct ,"$.contact[*]" COLUMNS( lname JSON PATH "$.last_name" , fname JSON PATH "$.first_name")) AS contact; |
It returns:
+----+------------+-----------+-------------+ | id | account_no | lname | fname | +----+------------+-----------+-------------+ | 1 | "US00001" | "Winn" | "Randi" | | 1 | "US00001" | "Winn" | "Brian" | | 2 | "US00002" | "Vizquel" | "Oscar" | | 2 | "US00002" | "Vizquel" | "Doreen" | | 3 | "US00003" | "Sweeney" | "Meaghan" | | 3 | "US00003" | "Sweeney" | "Matthew" | | 3 | "US00003" | "Sweeney" | "Ian" | | 4 | "US00004" | "Clinton" | "Goeffrey" | | 5 | "US00005" | "Moss" | "Wendy" | | 6 | "US00006" | "Gretelz" | "Simon" | | 7 | "US00007" | "Royal" | "Elizabeth" | | 8 | "US00008" | "Smith" | "Brian" | | 9 | "US00011" | "Potter" | "Harry" | | 9 | "US00011" | "Potter" | "Ginny" | | 9 | "US00011" | "Potter" | "Lily" | +----+------------+-----------+-------------+ 15 rows in set (0.00 sec) |
Lastly, if you want to get rid of the enclosing double quotes you can do the following:
WITH raw AS (SELECT id , JSON_EXTRACT(struct, "$.account_number") AS account_no , contact.* FROM example CROSS JOIN JSON_TABLE( struct ,"$.contact[*]" COLUMNS( lname JSON PATH "$.last_name" , fname JSON PATH "$.first_name")) AS contact) SELECT id , REGEXP_REPLACE(account_no,'"','') AS account_no , REGEXP_REPLACE(lname,'"','') AS lname , REGEXP_REPLACE(fname,'"','') AS fname FROM raw; |
It’s also possible to use the JSON_UNQUOTE function to cleanup the double quotes. I hope this helps those extracting JSON data into tabular result sets.
MySQL Backslashes
Yesterday, I wrote a blog post that showed you how to write a query returning a JSON structure for a 1:many relationship. The relationship was between the member and contact table. It returns one account_number from the member table and a list of first_name and last_name columns from the contact table in a JSON structure.
One of my students asked why I choose to strip the backslashes with Python, and my reply was the SQL was already complex for most blog readers. The student asked but how would you do it in SQL. OK, that’s a fair question for two reasons. First, you don’t need to do in your local programs because it’ll run faster on the server. Second, if you strip the backslashes you can insert it into a standard JSON column. This blog post will show you how to do both.
You would use three REGEXP_REPLACE function calls, like:
SELECT REGEXP_REPLACE( REGEXP_REPLACE( REGEXP_REPLACE( JSON_OBJECT( 'account_number', account_number ,'contact', CONCAT('[' , GROUP_CONCAT( JSON_OBJECT('first_name',first_name ,'last_name',last_name ) SEPARATOR ',') ,']') ) ,'\\\\','') ,'"\\\[','\\\[') ,'\\\]"','\\\]') AS json_result FROM member m INNER JOIN contact c ON m.member_id = c.member_id GROUP BY m.account_number; |
It returns the following:
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | {"contact": [{"last_name": "Winn", "first_name": "Randi"},{"last_name": "Winn", "first_name": "Brian"}], "account_number": "US00001"} | | {"contact": [{"last_name": "Vizquel", "first_name": "Oscar"},{"last_name": "Vizquel", "first_name": "Doreen"}], "account_number": "US00002"} | | {"contact": [{"last_name": "Sweeney", "first_name": "Meaghan"},{"last_name": "Sweeney", "first_name": "Matthew"},{"last_name": "Sweeney", "first_name": "Ian"}], "account_number": "US00003"} | | {"contact": [{"last_name": "Clinton", "first_name": "Goeffrey"}], "account_number": "US00004"} | | {"contact": [{"last_name": "Moss", "first_name": "Wendy"}], "account_number": "US00005"} | | {"contact": [{"last_name": "Gretelz", "first_name": "Simon"}], "account_number": "US00006"} | | {"contact": [{"last_name": "Royal", "first_name": "Elizabeth"}], "account_number": "US00007"} | | {"contact": [{"last_name": "Smith", "first_name": "Brian"}], "account_number": "US00008"} | | {"contact": [{"last_name": "Potter", "first_name": "Harry"},{"last_name": "Potter", "first_name": "Ginny"},{"last_name": "Potter", "first_name": "Lily"}], "account_number": "US00011"} | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 9 rows in set (0.00 sec) |
Let’s create a table with a JSON structure with the following script:
/* Drop table if it exists. */ DROP TABLE IF EXISTS example; /* Create a example table. */ CREATE TABLE example ( id int unsigned auto_increment , struct json , PRIMARY KEY (id)); |
Now, we can embed the query inside an INSERT statement:
INSERT INTO example ( struct ) (SELECT REGEXP_REPLACE( REGEXP_REPLACE( REGEXP_REPLACE( JSON_OBJECT( 'account_number', account_number ,'contact', CONCAT('[' , GROUP_CONCAT( JSON_OBJECT('first_name',first_name ,'last_name',last_name ) SEPARATOR ',') ,']') ) ,'\\\\','') ,'"\\\[','\\\[') ,'\\\]"','\\\]') AS json_result FROM member m INNER JOIN contact c ON m.member_id = c.member_id GROUP BY m.account_number); |
A query of the example table, like:
SELECT * FROM example; |
Returns:
+----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | id | struct | +----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | {"contact": [{"last_name": "Winn", "first_name": "Randi"}, {"last_name": "Winn", "first_name": "Brian"}], "account_number": "US00001"} | | 2 | {"contact": [{"last_name": "Vizquel", "first_name": "Oscar"}, {"last_name": "Vizquel", "first_name": "Doreen"}], "account_number": "US00002"} | | 3 | {"contact": [{"last_name": "Sweeney", "first_name": "Meaghan"}, {"last_name": "Sweeney", "first_name": "Matthew"}, {"last_name": "Sweeney", "first_name": "Ian"}], "account_number": "US00003"} | | 4 | {"contact": [{"last_name": "Clinton", "first_name": "Goeffrey"}], "account_number": "US00004"} | | 5 | {"contact": [{"last_name": "Moss", "first_name": "Wendy"}], "account_number": "US00005"} | | 6 | {"contact": [{"last_name": "Gretelz", "first_name": "Simon"}], "account_number": "US00006"} | | 7 | {"contact": [{"last_name": "Royal", "first_name": "Elizabeth"}], "account_number": "US00007"} | | 8 | {"contact": [{"last_name": "Smith", "first_name": "Brian"}], "account_number": "US00008"} | | 9 | {"contact": [{"last_name": "Potter", "first_name": "Harry"}, {"last_name": "Potter", "first_name": "Ginny"}, {"last_name": "Potter", "first_name": "Lily"}], "account_number": "US00011"} | +----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 9 rows in set (0.00 sec) |
I hope this answers the question about whether you can use SQL remove the backslashes from the original result set and how you can insert the JSON result set into a JSON data type.
PL/pgSQL Transactions
There are many nuances that I show students about PL/pgSQL because first I teach them how to use PL/SQL. These are some of the differences:
- PL/SQL declares the function or procedure and then uses the IS keyword; whereas, PL/pgSQL uses the AS keyword.
- PL/SQL uses the RETURN keyword for functions declarations, like:
RETURN [data_type} IS
Whereas, PL/pgSQL uses the plural RETURNS keyword in the function declaration, like:
RETURNS [data_type] AS
- PL/SQL considers everything after the function or procedure header as the implicit declaration section; whereas, PL/pgSQL requires you block the code with something like $$ (double dollar symbols) and explicitly use the DECLARE keyword.
- PL/SQL supports local functions (inside the DECLARE block of a function or procedure); whereas, PL/pgSQL doesn’t.
- PL/SQL puts the variable modes (IN, INOUT, OUT) between the parameter name and type; whereas, PL/pgSQL puts them before the variable name.
- PL/SQL declares cursors like:
CURSOR cursor_name (parameter_list) IS
Whereas, PL/pgSQL declares them like
cursor_name CURSOR (parameter_list) FOR
- PL/SQL terminates and runs the block by using an END keyword, an optional module name, a semicolon to terminate the END; statement, and a forward slash to dispatch the program to PL/SQL statement engine:
END [module_name]; /
Whereas, PL/pgSQL terminates and runs the block by using an END keyword, a semicolon to terminate the END; statement, two dollar signs to end the PL/pgSQL block, and a semicolon to dispatch the program.
END LANGUAGE plpgsql; $$;
After all that basic syntax discussion, we try to create a sample set of tables, a function, a procedure, and a test case in PL/pgSQL. They’ve already done a virtually equivalent set of tasks in PL/SQL.
Here are the steps:
- Create the grandma and tweetie_bird tables:
/* Conditionally drop grandma table and grandma_s sequence. */ DROP TABLE IF EXISTS grandma CASCADE; /* Create the table. */ CREATE TABLE GRANDMA ( grandma_id SERIAL , grandma_house VARCHAR(30) NOT NULL , PRIMARY KEY (grandma_id) ); /* Conditionally drop a table and sequence. */ DROP TABLE IF EXISTS tweetie_bird CASCADE; /* Create the table with primary and foreign key out-of-line constraints. */ SELECT 'CREATE TABLE tweetie_bird' AS command; CREATE TABLE TWEETIE_BIRD ( tweetie_bird_id SERIAL , tweetie_bird_house VARCHAR(30) NOT NULL , grandma_id INTEGER NOT NULL , PRIMARY KEY (tweetie_bird_id) , CONSTRAINT tweetie_bird_fk FOREIGN KEY (grandma_id) REFERENCES grandma (grandma_id) );
- Create a get_grandma_id function that returns a number, which should be a valid primary key value from the grandma_id column of the grandma table.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
CREATE OR REPLACE FUNCTION get_grandma_id ( IN pv_grandma_house VARCHAR ) RETURNS INTEGER AS $$ /* Required for PL/pgSQL programs. */ DECLARE /* Local return variable. */ lv_retval INTEGER := 0; -- Default value is 0. /* Use a cursor, which will not raise an exception at runtime. */ find_grandma_id CURSOR ( cv_grandma_house VARCHAR ) FOR SELECT grandma_id FROM grandma WHERE grandma_house = cv_grandma_house; BEGIN /* Assign a value when a row exists. */ FOR i IN find_grandma_id(pv_grandma_house) LOOP lv_retval := i.grandma_id; END LOOP; /* Return 0 when no row found and the ID # when row found. */ RETURN lv_retval; END; $$ LANGUAGE plpgsql;
- Create a Warner_brother procedure that writes data across two tables as a transaction. You con’t include any of the following in your functions or procedures because all PostgreSQL PL/pgSQL functions and procedures are transaction by default:
- SET TRANSACTION
- START TRANSACTION
- SAVEPOINT
- COMMIT
A ROLLBACK should be placed in your exception handler as qualified on lines #33 thru #36. The warner_brother procedure inserts rows into the grandma and tweetie_bird tables.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
/* Create or replace procedure warner_brother. */ CREATE OR REPLACE PROCEDURE warner_brother ( pv_grandma_house VARCHAR , pv_tweetie_bird_house VARCHAR ) AS $$ /* Required for PL/pgSQL programs. */ DECLARE /* Declare a local variable for an existing grandma_id. */ lv_grandma_id INTEGER; BEGIN /* Check for existing grandma row. */ lv_grandma_id := get_grandma_id(pv_grandma_house); IF lv_grandma_id = 0 THEN /* Insert grandma. */ INSERT INTO grandma ( grandma_house ) VALUES ( pv_grandma_house ) RETURNING grandma_id INTO lv_grandma_id; END IF; /* Insert tweetie bird. */ INSERT INTO tweetie_bird ( tweetie_bird_house , grandma_id ) VALUES ( pv_tweetie_bird_house , lv_grandma_id ); EXCEPTION WHEN OTHERS THEN ROLLBACK; RAISE NOTICE '[%] [%]', SQLERRM, SQLSTATE; END; $$ LANGUAGE plpgsql;
You should take note of the RETURNING-INTO statement on line #22. The alternative to this clause isn’t pretty if you know that PostgreSQL uses a table name, column name, and the literal seq value separated by underscores (that is, snake case), like:
/* Assign current value to local variable. */ lv_grandma_id := CURRVAL('grandma_grandma_id_seq');
It would be even uglier if you had to look up the sequence name, like:
/* Assign current value to local variable. */ lv_grandma_id := CURRVAL(pg_get_serial_sequence('grandma','grandma_id'));
- You can test the combination of these two stored procedures with the following DO-block:
/* Test the warner_brother procedure. */ DO $$ BEGIN /* Insert the yellow house. */ CALL warner_brother( 'Yellow House', 'Cage'); CALL warner_brother( 'Yellow House', 'Tree House'); /* Insert the red house. */ CALL warner_brother( 'Red House', 'Cage'); CALL warner_brother( 'Red House', 'Tree House'); END; $$ LANGUAGE plpgsql;
Then, query the results:
SELECT * FROM grandma g INNER JOIN tweetie_bird tb ON. g.grandma_id = tb.grandma_id;
It should return:
grandma_id | grandma_house | tweetie_bird_id | tweetie_bird_house | grandma_id ------------+---------------+-----------------+--------------------+------------ 1 | Red House | 1 | Cage | 1 1 | Red House | 2 | Tree House | 1 2 | Yellow House | 3 | Cage | 2 2 | Yellow House | 4 | Tree House | 2 (4 rows)
As always, I hope writing a clear and simple examples helps those looking for sample code.
Is SQL Programming
Is SQL, or Structured Query Language, a programming language? That’s a great question! A question that many answer with emphasis: “No, SQL is not a programming language!” There are some who answer yes; and they usually qualify that answer with something like: “SQL is a programming language designed to communicate with relational databases.”
It strikes me that those saying “yes” are saying that SQL is only a collection of interface methods to read from and write to a database engine. Those saying SQL is not a programming language often qualify that a programming language must have conditional logic and iterative structures, which don’t exist in SQL.
There’s a third group that are fence sitters. They decline to say whether SQL is a programming language, but they also say individuals who only write SQL aren’t programmers. That’s a bit harsh from my perspective.
Before determining whether SQL is a programming language let’s define a programming language. Let’s define a programming language as a collection of lexical units, or building blocks, that build program units. Lexical units are typically organized as delimiters, identifiers, literals, and comments:
- Delimiters include single or double quotes to identify strings and operators that let you assign and compare values.
- Identifiers are reserved words, keywords, predefined identifiers (like data type names), user-defined variables, subroutines, or types.
- Literals are typically numbers and strings, where some strings qualify as dates because they implement a default format mask to convert strings to dates or date-times.
- Comments are simply delimited text that the program ignores but the programmer uses.
That means a programming language must let you define a variable, assign a value to a variable, iterate across a set of values, and make conditional statements. SQL meets these four conditions, but it does, as a set-programming language, qualify all variables as lists of tuples. Though it is possible to have variables with zero to many elements and one to many members in any given tuple. That means you can assign a literal value to to a one-element list with a single-member tuple, like you would a string or integer to a variable of that type.
As Kris Köhntopp commented, computer science defines a programming language as Turing Complete. As his comment qualifies and the Wikipedia page explains: “Turing completeness in declarative SQL is implemented through recursive common table expressions. Unsurprisingly, procedural extensions to SQL (PLSQL, etc.) are also Turing-complete.” While PostgreSQL introduces recursive query syntax through CTEs, it recently added the search and cycle feature in PostgreSQL 14. The recursive query feature has existed in the Oracle database since Oracle 8, but their documentation calls them hierarchical queries. I wrote a quick a tutorial on hierarchical queries in 2008.
For clarity, define and declare are two words that give grief to some newbies. Let’s qualify what they mean. Declare means to give a variable a name and data type. Define means to declare a variable and assign it a value. Another word for assigning a variable is initializing it. Unassigned variables are automatically assigned a default value or a null dependent on the programming language.
Let’s first declare a local variable, assign it to variable, and display the variable. The following example uses Node.js to define the input variable, assign the input variable to the display variable, and then print the display variable to console. Node.js requires that you assign an empty string to the display variable to define it as a string otherwise its type would be undefined, which is common behavior in dynamically typed languages.
/* Declare the display variable as a string. */ var display = "" /* Define the input variable. */ var input = "Hello World!" /* Assign the input variable contents to the display variable. */ display = input /* Print the display variable contents to console. */ console.log(display) |
It prints:
Hello World! |
Let’s write the same type of program in MySQL. Like the Node.js, there are implementation differences. The biggest difference in MySQL or other relational databases occurs because SQL is a declarative set-based language. That means every variable is a collection of a record structure . You can only mimic a scalar or primitive data type variable by creating a record structure with a single member.
In the case below, there are four processing steps:
- The ‘Hello World!’ literal value is assigned to an input variable.
- The SELECT-list (or comma-delimited set of values in the SELECT clause) is assigned like a tuple to the struct collection variable by treating the query of the literal value as an expression.
- The FROM clause returns the struct collection as the data set or as a derived table.
- The topmost SELECT clause evaluates the struct collection row-by-row, like a loop, and assigns the input member to a display variable.
The query is:
SELECT struct.input AS display FROM (SELECT 'Hello World!' AS input) struct; |
Since the struct collection contains only one element, it displays the original literal value one time, like
+--------------+ | display | +--------------+ | Hello World! | +--------------+ 1 row in set (0.00 sec) |
Let’s update the SQL syntax to the more readable, ANSI 1999 and forward, syntax with a Common Table Expression (CTE). CTEs are implemented by the WITH clause.
WITH struct AS (SELECT 'Hello World!' AS input) SELECT struct.input AS display FROM struct; |
The best thing about CTE values they run one-time and are subsequently available anywhere in your query, subqueries, or correlated subqueries. In short, there’s never an excuse to write a subquery twice in the same query.
Let’s look at loops and if-statements. Having established that we can assign a literal to a variable, re-assign the value from one variable to another, and then display the new variable, let’s assign a set of literal values to an array variable. As before, let’s use Node.js to structure the initial problem.
The program now assigns an array of strings to the input variable, uses a for-loop to read the values from the input array, and uses an if-statement with a regular expression evaluation. The if-statement determines which of the array value meets the condition by using a negating logical expression. That’s because the search() function returns a 0 or greater value when the needle value is found in the string and a -1 when not found. After validating that the needle variable value is found in an input string, the input value is assigned to the display variable.
/* Declare the display variable as a string. */ var display = "" /* Declare a lookup variable. */ var needle = "Goodbye" /* Define the input variable as an array of strings. */ var input = ["Hello World!" ,"Goodbye, Cruel World!" ,"Good morning, too early ..."] /* Read through an array and assign the value that meets * the condition to the display variable. */ for (i = 0; i < input.length; i++) if (!(input[i].search(needle) < 0)) display = input[i] /* Print the display variable contents to console. */ console.log(display) |
Then, it prints the display value:
Goodbye, Cruel World! |
To replicate the coding approach in a query, there must be two CTEs. The needle CTE assigns a literal value of ‘goodbye’ to a one-element collection of a single-member tuple variable. The struct CTE creates a collection of strings by using the UNION ALL operator to append three unique tuples instead of one tuple as found in the early example.
The needle CTE returns a one-element collection of a single-member tuple variable. The struct CTE returns a three-element collection of a single-member tuple, which mimics an array of strings. The needle and struct CTEs return distinct variables with different data types. A cross join operation between the two CTEs puts their results together into the same context. It returns a Cartesian product that:
- Adds a single-row tuples to each row of the query’s result set or derived table.
- Adds a multiple-tuples to each row of the query’s result set or derived table by creating copies of each row (following the Cartesian set theory which multiplies rows and adds columns).
In this case, the Cartesian join adds a one-element needle CTE value to each element, or row, returned by the multiple-element struct CTE and produces the following derived table:
+-----------------------------+---------+ | display | lookup | +-----------------------------+---------+ | Hello World! | goodbye | | Goodbye, cruel world! | goodbye | | Good morning, too early ... | goodbye | +-----------------------------+---------+ 3 rows in set (0.00 sec) |
The following query reads through the CTE collection like a loop and filters out any invalid input values. It uses the MySQL regular expression like function in the WHERE clause, which acts as a conditional or if-statement.
WITH needle AS (SELECT 'goodbye' AS lookup) , struct AS (SELECT 'Hello World!' AS input UNION ALL SELECT 'Goodbye, cruel world!' AS input UNION ALL SELECT 'Good morning, too early ...' AS input) SELECT struct.input AS display FROM struct CROSS JOIN needle WHERE REGEXP_LIKE(struct.input, CONCAT('^.*',needle.lookup,'.*$'),'i'); |
It returns the one display value that meets the criteria:
+-----------------------+ | display | +-----------------------+ | Goodbye, cruel world! | +-----------------------+ 1 row in set (0.00 sec) |
The comparisons of the imperative programming approach in Node.js and declarative programming approach should have established that SQL has all the elements of a programming language. That is, SQL has variable declaration and assignment and both iterative and conditional statements. SQL also has different styles for implementing variable declaration and the examples covered subqueries and CTEs with cross joins placing variables in common scope.
Comparative Approaches:
Next, let’s examine a problem that a programmer might encounter when they think SQL only queries or inserts, updates, or deletes single rows. With that perspective of SQL there’s often a limited perspective on how to write queries. Developers with this skill set level typically write only basic queries, which may include inner and outer joins and some aggregation statements.
Let’s assume the following for this programming assignment:
- A sale table as your data source, and
- A requirement to display the type, number, pre-tax sale amount, and percentage by type.
The sale table definition:
+------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------+--------------+------+-----+---------+----------------+ | sale_id | int unsigned | NO | PRI | NULL | auto_increment | | item_desc | varchar(20) | YES | | NULL | | | unit_price | decimal(8,2) | YES | | NULL | | | serial_no | varchar(10) | YES | | NULL | | +------------+--------------+------+-----+---------+----------------+ |
A basic Node.js program may contain a SQL query that returns the item_desc and unit_price columns while counting the number of serial_no rows and summing the unit_price amounts (that assumes no discount sales, after all its Apple). That type of query leaves calculating the total amount of sales and percentage by type to the Node.js program.
const mysql = require('mysql') const connection = mysql.createConnection({ host: 'localhost', user: 'student', password: 'student', database: 'studentdb' }) connection.connect((err) => { if (err) throw err else { console.log('Connected to MySQL Server!\n') connection.query("SELECT s.item_desc " + ", s.unit_price " + ", COUNT(s.serial_no) AS quantity_sold " + ", SUM(s.unit_price) AS sales " + "FROM sale s " + "GROUP BY s.item_desc " + ", s.unit_price", function (err, result) { if (err) throw err else { // Prints the index value in the RowDataPacket. console.log(result) connection.end() }})} }) |
This program would return a JSON structure, like:
[ RowDataPacket { item_desc: 'MacBook Pro 16', unit_price: 2499, quantity_sold: 16, sales: 39984 }, ... RowDataPacket { item_desc: 'MacBook Air M1', unit_price: 999, quantity_sold: 22, sales: 21978 } ] |
While the remaining JavaScript code isn’t difficult to write, it’s unnecessary effort if the developer knew SQL well enough to program in it. The developer could simply re-write the query like the following and return the percentage by type value in the base JSON structure.
WITH sales AS (SELECT SUM(unit_price) AS total FROM sale) SELECT s.item_desc , s.unit_price , COUNT(s.serial_no) AS quantity_sold , SUM(s.unit_price) AS sales , CONCAT(FORMAT((s.unit_price * COUNT(s.serial_no))/sales.total * 100,2),'%') AS percentage FROM sale s CROSS JOIN sales GROUP BY s.item_desc , s.unit_price , sales.total; |
The query uses the sales CTE to calculate and define a tuple with the total sales and adds a derived column calculating the percentage by type of device. It’s probably important to note that aggregation rules require you add the sales.total CTE tuple to the group by clause.
The new query returns this JSON list:
[ RowDataPacket { item_desc: 'MacBook Pro 16', unit_price: 2499, quantity_sold: 16, sales: 39984, percentage: '17.70%' }, ... RowDataPacket { item_desc: 'MacBook Air M1', unit_price: 999, quantity_sold: 22, sales: 21978, percentage: '9.73%' } ] |
The developer would get a complete JSON list when the new query replaces the old. It also would eliminate the need to write additional JavaScript to calculate the percentage by type of device.
Conclusions:
Leveraging the programming power of SQL is frequently possible in many frontend and backend programming solutions. However, the programming power of SQL is infrequently found in programming solutions. That leaves me to ask: “Is it possible that the almost systemic failure to leverage the programming capabilities of SQL is a result of biases by instructors and mentors to their own limited skill sets?” That likely might be true if their instructors and mentors held the belief that: “No, SQL is not a programming language!”
Candidly, folks that write SQL at the programming level almost always have concurrent mastery in two or more imperative programming languages. They’re probably the ones who say, “SQL is a programming language designed to communicate with relational databases.”
Who are those pesky fence sitters? You remember those, don’t you. They’re the ones who declined to take a position on whether SQL is a programming language. Are they the developers who are still learning, and those without an entrenched, preconceived, or learned bias? Or, do they wonder if SQL is Turing complete?
Node.js MySQL Error
While I blogged about how to setup Node.js and MySQL almost two years ago, it was interesting when a student ran into a problem. The student said they’d configured the environment but were unable to use Node.js to access MySQL.
The error is caused by this import statement:
const mysql = require('mysql') |
The student got the following error, which simply says that they hadn’t installed the Node.js package for MySQL driver.
internal/modules/cjs/loader.js:638 throw err; ^ Error: Cannot find module 'mysql' at Function.Module._resolveFilename (internal/modules/cjs/loader.js:636:15) at Function.Module._load (internal/modules/cjs/loader.js:562:25) at Module.require (internal/modules/cjs/loader.js:692:17) at require (internal/modules/cjs/helpers.js:25:18) at Object.<anonymous> (/home/student/Data/cit325/oracle-s/lib/Oracle12cPLSQLCode/Introduction/query.js:4:15) at Module._compile (internal/modules/cjs/loader.js:778:30) at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10) at Module.load (internal/modules/cjs/loader.js:653:32) at tryModuleLoad (internal/modules/cjs/loader.js:593:12) at Function.Module._load (internal/modules/cjs/loader.js:585:3) |
I explained they could fix the problem with the following two Node.js Package Manager (NPM) commands:
npm init --y npm install --save mysql |
The student was able to retest the code with success. The issue was simply that the Node.js couldn’t find the NPM MySQL module.
MySQL Partitioned Tables
MySQL Partitioned Tables
Learning Outcomes
- Learn about List Partitioning.
- Learn about Range Partitioning.
- Learn about Columns Partitioning.
- Learn about Hash Partitioning.
- Learn about Key Partitioning.
- Learn about Subpartitioning.
Lesson Material
MySQL supports partitioning of tables. It supports range, list, hash, and key partitioning. Range partitioning lets you partition based on column values that fall within given ranges. List partitioning lets you partition based on columns matching one of a set of discrete values. Hash partitioning lets you partition based on the return value from a user-defined expression (the result from a stored SQL/PSM function). Key partitioning performs like hash partitioning, but it lets a user select one or more columns from the set of columns in a table; a hash manages the selection process for you. A hash is a method of organizing keys to types of data, and hashes speed access to read and change data in tables.
Each of the following subsections discusses one of the supported forms of partitioning in MySQL. Naturally, there are differences between other databases and MySQL’s implementation.
List Partitioning
A MySQL list partition works by identifying a column that contains an integer value, the franchise_number in the following example. Partitioning clauses follow the list of columns and constraints and require a partitioning key to be in the primary key or indexed.
The following list partition works with literal numeric values. MySQL uses the IN keyword for list partitions. Note that there’s no primary key designated and an index is on the auto-incrementing surrogate key column. A complete example is provided to avoid confusion on how to index the partitioning key:
CREATE TABLE franchise ( franchise_id INT UNSIGNED AUTO_INCREMENT , franchise_number INT UNSIGNED , franchise_name VARCHAR(20) , city VARCHAR(20) , state VARCHAR(20) , index idx (franchise_id)) PARTITION BY LIST(franchise_number) ( PARTITION offshore VALUES IN (49,50) , PARTITION west VALUES IN (34,45,48) , PARTITION desert VALUES IN (46,47) , PARTITION rockies VALUES IN (38,41,42,44)); |
The inclusion of a PRIMARY KEY constraint on the franchise_id column would trigger an ERROR 1503 when the partitioning key isn’t the primary key. The reason for the error message is that a primary key implicitly creates a unique index, and that index would conflict with the partitioning by list instruction. The use of a non-unique idx index on the franchise_id column is required when you want to partition on a non-primary key column.
Range Partitioning
Range partitioning works only with an integer value or an expression that resolves to an integer against the primary key column. The limitation of the integer drives the necessity of choosing an integer column for range partitioning. You can’t define a range-partitioned table with a PRIMARY KEY constraint unless the primary key becomes your partitioning key, like
the one below.
CREATE TABLE ordering ( ordering_id INT UNSIGNED AUTO_INCREMENT , item_id INT UNSIGNED , rental_amount DECIMAL(15,2) , rental_date DATE , index idx (ordering_id)) PARTITION BY RANGE(item_id) ( PARTITION jan2011 VALUES LESS THAN (10000) , PARTITION feb2011 VALUES LESS THAN (20000) , PARTITION mar2011 VALUES LESS THAN (30000)); |
Range partitioning is best suited to large tables that you want to break into smaller pieces based on the integer column. You can also use stored functions that return integers as the partitioning key instead of the numeric literals shown. Few other options are available in MySQL.
Columns Partitioning
Columns partitioning is a new variant of range and list partitioning. It is included in MySQL 5.5 and forward. Both range and list partitioning work on an integer-based column (using TINYINT, SMALLINT, MEDIUMINT, INT [alias INTEGER], and BIGINT). Columns partitioning extends those models by expanding the possible data types for the partitioning column to include CHAR, VARCHAR, BINARY, and VARBINARY string data types, and DATE, DATETIME, or TIMESTAMP data types. You still can’t use other number data types such as DECIMAL and FLOAT. The TIMESTAMP data type is also available only in range partitions with the caveat that you use a UNIX_TIMESTAMP function, according to MySQL Bug 42849.
Hash Partitioning
Hash partitions ensure an even distribution of rows across a predetermined number of partitions. It is probably the easiest way to partition a table quickly to test the result of partitioning on a large table. You should base hash partitions on a surrogate or natural primary key.
The following provides a modified example of the ordering table:
CREATE TABLE ordering ( ordering_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT , item_id INT UNSIGNED , rental_amount DECIMAL(15,2) , rental_date DATE) PARTITION BY HASH(ordering_id) PARTITIONS 8; |
This is the partitioning type that benefits from a PRIMARY KEY constraint because it automatically creates a unique index that can be used by the hash. A non-unique index such as the list partitioning example doesn’t work for a hash partition.
Key Partitioning
Key partitioning is valuable because you can partition on columns that aren’t integers. It performs along the line of hash partitioning, except the MySQL Server uses its own hashing expression.
CREATE TABLE orders_list ( order_list_id INT UNSIGNED AUTO_INCREMENT , customer_surname VARCHAR(30) , store_id INT UNSIGNED , salesperson_id INT UNSIGNED , order_date DATE , index idx (order_list_id)) PARTITION BY KEY (order_date) PARTITIONS 8; |
This is the only alternative when you want to partition by date ranges. Like the hash partition, it’s easy to deploy. The only consideration is the number of slices that you want to make of the data in the table.
Subpartitioning
The concept of subpartitioning is also known as composite partitioning. You can subpartition range or list partitions with a hash, linear hash, or linear key.
A slight change to the previously created ordering table is required to demonstrate composite partitioning: we’ll add a store_id column to the table definition. The following is an example of a range partition subpartitioned by a hash:
CREATE TABLE ordering INT UNSIGNED INT UNSIGNED DATE ( ordering_id INT UNSIGNED AUTO_INCREMENT , item_id INT UNSIGNED , store_id INT UNSIGNED , rental_amount DECIMAL(15,2) , rental_date DATE , index idx (ordering_id)) PARTITION BY RANGE(item_id) SUBPARTITION BY HASH(store_id) SUBPARTITIONS 4 ( PARTITION jan2011 VALUES LESS THAN (10000) , PARTITION feb2011 VALUES LESS THAN (20000) , PARTITION mar2011 VALUES LESS THAN (30000)); |
Composite partitioning is non-trivial and might require some experimentation to achieve optimal results. Plan on making a few tests of different scenarios before you deploy a solution.
Oracle Partitioned Tables
Oracle Partitioned Tables
Learning Outcomes
- Learn about List Partitioning.
- Learn about Range Partitioning.
- Learn about Hash Partitioning.
- Learn about Composite Partitioning.
Lesson Material
Partitioning is the process of breaking up a data source into a series of data sources. Partitioned tables are faster to access and transact against. Partitioning data becomes necessary as the amount of data grows in any table. It speeds the search to find rows and insert, update, or delete rows.
Oracle Database 21c supports four types of table partitioning: list, range, hash, and composite partitioning.
List Partitioning
A list partition works by identifying a column that contains a value, such as a STATE column in an ADDRESS table. Partitioning clauses follow the list of columns and constraints.
A list partition could use a STATE column, like the following (the complete example is avoided to conserve space, and the three dots represent the balance of partitions not shown):
CREATE TABLE franchise ( franchise_id NUMBER CONSTRAINT pk_franchise PRIMARY KEY , franchise_name VARCHAR(20) , city VARCHAR(20) , state VARCHAR(20)) PARTITION BY LIST(state) ( PARTITION offshore VALUES('Alaska', 'Hawaii') , PARTITION west VALUES('California', 'Oregon', 'Washington') , PARTITION desert VALUES ('Arizona','New Mexico') , PARTITION rockies VALUES ('Colorado', 'Idaho', 'Montana', 'Wyoming') , ... ); |
This can be used with other values such as ZIP codes with great effect, but the maintenance of list partitioning can be considered costly. Cost occurs when the list of values changes over time. Infrequent change means low cost, while frequent change means high costs. In the latter case, you should consider other partitioning strategies. Although an Oracle database supports partitioning on a variable-length string, MySQL performs list partitioning only on integer columns.
Range Partitioning
Range partitioning is very helpful on any column that contains a continuous metric, such as dates or time. It works by stating a minimum set that is less than a certain value, and then a group of sets of higher values until you reach the top most set of values. This type of partition helps you improve performance by letting you search ranges rather than complete data sets. Range partitioning is also available in MySQL.
A range example based on dates could look like this:
PARTITION BY RANGE(rental_date) ( PARTITION rental_jan2011 VALUES LESS THAN TO_DATE('31-JAN-11','DD-MON-YY') , PARTITION rental_feb2011 VALUES LESS THAN TO_DATE('28-FEB-11','DD-MON-YY') , PARTITION rental_mar2011 VALUES LESS THAN TO_DATE('31-MAR-11','DD-MON-YY') , ... ); |
The problem with this type of partitioning, however, is that the new months require constant management. Many North American businesses simply add partitions for all months in the year as an annual maintenance task during the holidays in November or December. Companies that opt for bigger range increments reap search and access benefits from range partitioning, while minimizing ongoing maintenance expenses.
Hash Partitioning
Hash partitioning is much easier to implement than list or range partitioning. Many DBAs favor it because it avoids the manual maintenance of list and range partitioning. Oracle Database 21c documentation recommends that you implement a hash for the following reasons:
- There is no concrete knowledge about how much data maps to a partitioning range.
- The sizes of partitions are unknown at the outset and difficult to balance as data is added to the database.
- A range partition might cluster data in an ineffective way.
This next statement creates eight partitions and stores them respectively in one of the eight tablespaces. The hash partition manages nodes and attempts to balance the distribution of rows across the nodes.
PARTITION BY HASH(store) PARTITIONS 8 STORE IN (tablespace1, tablespace2, tablespace3, tablespace4 ,tablespace5, tablespace6, tablespace7, tablespace8); |
As you can imagine the maintenance for this type of partitioning is low. Some DBAs choose this method to get an initial sizing before adopting a list or range partitioning plan. Maximizing the physical resources of the machine ultimately rests with the DBAs who manage the system. Developers need to stand ready to assist DBAs with analysis and syntax support.
Composite Partitioning
Composite partitioning requires a partition and subpartition. The composites are combinations of two types of partitioning—typically, list and range partitioning, or range and hash composite partitioning. Which of these you should choose depends on a few considerations. List and range composite partitioning is done for historical information and is well suited for data warehouses. This method lets you partition on unordered or unrelated column values.
A composite partition like this uses the range as the partition and the list as the subpartition, like the following:
PARTITION BY RANGE (rental_date) SUBPARTITION BY LIST (state) (PARTITION FQ1_1999 VALUES LESS THAN (TO_DATE('1-APR-2011','DD-MON-YYYY')) (SUBPARTITION offshore VALUES('Alaska', 'Hawaii') , SUBPARTITION west VALUES('California', 'Oregon', 'Washington') , SUBPARTITION desert VALUES ('Arizona','New Mexico') , SUBPARTITION rockies VALUES ('Colorado', 'Idaho', 'Montana', 'Wyoming') , ... ) ,(PARTITION FQ2_1999 VALUES LESS THAN (TO_DATE('1-APR-2011','DD-MON-YYYY')) (SUBPARTITION offshore VALUES('Alaska', 'Hawaii') , SUBPARTITION west VALUES('California', 'Oregon', 'Washington') , SUBPARTITION desert VALUES ('Arizona','New Mexico') , SUBPARTITION rockies VALUES ('Colorado', 'Idaho', 'Montana', 'Wyoming') , ... ) , ... ) |
Range and hash composite partitioning is done for historical information when you also need to stripe data. Striping is the process of creating an attribute in a table that acts as a natural subtype or separator of data. Users typically view data sets of one subtype, which means organizing the data by stripes (subtypes) can speed access based on user access patterns.
Range is typically the partition and the hash is the subpartition in this composite partitioning schema. The syntax for this type of partition is shown next:
PARTITION BY RANGE (rental_date) SUBPARTITION BY HASH(store) SUBPARTITIONS 8 STORE IN (tablespace1, tablespace2, tablespace3 ,tablespace4, tablespace5, tablespace6 ,tablespace7, tablespace8) ( PARTITION rental_jan2011 VALUES LESS THAN TO_DATE('31-JAN-11','DD-MON-YY') , PARTITION rental_feb2011 VALUES LESS THAN TO_DATE('28-FEB-11','DD-MON-YY') , PARTITION rental_mar2011 VALUES LESS THAN TO_DATE('31-MAR-11','DD-MON-YY') , ... ) |
Logging Table Function
It is interesting when somebody remembers a presentation from 10 years ago. They asked if it was possible in PL/pgSQL to write an autonomous procedure to log data when calling a table view function. The answer is two fold. PL/pgSQL doesn’t support autonomous functions or procedures like the Oracle database but it doesn’t need to because unless you invoke a transaction it auto commits writes.
Logging table functions are important for security auditing and compliance management against laws, like SOX, HIPAA, and FERPA. All too many systems lack the basic ability to audit who queries records without raising an error and blocking the access. That means the bad actor or actress gains the ability to probe the system for weaknesses before determining an attack vector. It’s often better to capture the unauthorized access and take direct action to protect both the the data and systems.
While the example lets an unauthorized person access the information in the first version of the student_query, it blocks access by reporting no rows returned in the latter. Both versions of the query log the data and thereby collect the evidence necessary to act against the hack.
This blog post shows you how to write it and test it. Follow the following steps:
- Create the necessary tables and data to work with a logging PL/pgSQL table view function:
/* Conditionally drop and create table. */ DROP TABLE IF EXISTS student; CREATE TABLE student ( student_id SERIAL , first_name VARCHAR(20) , last_name VARCHAR(20) , hogwarts_house VARCHAR(10)); /* Conditionally drop and create table. */ DROP TABLE IF EXISTS logger; CREATE TABLE logger ( logger_id SERIAL , app_user VARCHAR(30) , queried_student VARCHAR(30) , query_time TIMESTAMP ); /* Insert one record into table. */ INSERT INTO student ( first_name, last_name, hogwarts_house ) VALUES ( 'Harry', 'Potter', 'Gryffindor' ) ,( 'Hermione', 'Granger', 'Gryffindor' ) ,( 'Ronald', 'Weasily', 'Gryffindor' ) ,( 'Draco', 'Malfoy', 'Slytherin' ) ,( 'Vincent', 'Crabbe', 'Slytherin' ) ,( 'Susan', 'Bones', 'Hufflepuff' ) ,( 'Hannah', 'Abbott', 'Hufflepuff' ) ,( 'Luna', 'Lovegood', 'Ravenclaw' ) ,( 'Cho', 'Chang', 'Ravenclaw' ) ,( 'Gilderoy', 'Lockhart', 'Ravenclaw' );
- While not necessary if you’re very familiar with PL/pgSQL, it may be helpful to review:
- The SET command that lets you assign a value to a session-level variable, which you can later use in a PL/pgSQL block.
- The SELECT-INTO statement in a DO-block.
Here’s a test script that demonstrates both:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
/* Set a session-level variable. */ SET credential.app_user = 'Draco Malfoy'; /* Secure the value from a session-level variable. */ SELECT current_setting('credential.app_user'); /* DO $$ DECLARE input VARCHAR(30) := 'Hermione'; output VARCHAR(30); BEGIN /* Sample for partial name construction of full name. */ SELECT CONCAT(s.first_name, ' ', s.last_name) AS student_name INTO output FROM student s WHERE CONCAT(s.first_name, ' ', s.last_name) LIKE '%'||input||'%'; /* Show result of local assignment via a query. */ RAISE NOTICE '[%][%]', current_setting('credential.app_user'), output; END; $$;
There’s an important parsing trick to this sample program. It uses the LIKE operator rather than the SIMILAR TO operator because the parser fails to recognize the SIMILAR TO operator.
The DO-block returns the following output:
NOTICE: [Draco Malfoy][Hermione Granger]
- This creates the student_query logging table function, which takes a partial portion of a students first and last name to return the student information. While the example only returns the name and the Hogwarts House it lays a foundation for a more complete solution.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
CREATE OR REPLACE FUNCTION student_query (partial_name VARCHAR) RETURNS TABLE ( first_naem VARCHAR(20) , last_name VARCHAR(20) , hogwarts_house VARCHAR(10) ) AS $$ DECLARE queried VARCHAR; by_whome VARCHAR; BEGIN /* Query separately because embedding in insert statement fails. */ SELECT CONCAT(s.first_name, ' ', s.last_name) AS student_name FROM student s INTO queried WHERE CONCAT(s.first_name, ' ', s.last_name) LIKE '%'||partial_name||'%'; /* Log the query with the credentials of the user. */ INSERT INTO logger ( app_user , queried_student , query_time ) VALUES ( current_setting('credential.app_user') , queried , NOW()); /* Return the result set without disclosing the query was recorded. */ RETURN QUERY SELECT s.first_name , s.last_name , s.hogwarts_house FROM student s WHERE CONCAT(s.first_name, ' ', s.last_name) LIKE '%'||partial_name||'%'; END; $$ LANGUAGE plpgsql;
- You can test the function by calling it, like this:
SELECT * FROM student_query('Hermione');
It displays:
first_naem | last_name | hogwarts_house ------------+-----------+---------------- Hermione | Granger | Gryffindor (1 row)
You can check the logging table and discover who looked up another student’s records.
SELECT * FROM logger;
It displays:
logger_id | app_user | queried_student | query_time -----------+--------------+------------------+---------------------------- 1 | Draco Malfoy | Hermione Granger | 2022-05-29 22:51:50.398987 (1 row)
- Assuming you’ve built an authorized_user function that returns a Boolean, you can add a call to it in the WHERE clause. For simplicity, let’s implement the function to deny all users, like:
1 2 3 4 5 6 7 8 9 10
CREATE OR REPLACE FUNCTION authorized_user (user_name VARCHAR) RETURNS BOOLEAN AS $$ DECLARE lv_retval BOOLEAN := FALSE; BEGIN RETURN lv_retval; END; $$ LANGUAGE plpgsql;
You can now replace the query on lines 28 through 32 with the new one below. The added clause on line 33 denies access to unauthorized users because there aren’t any.
28 29 30 31 32 33
SELECT s.first_name , s.last_name , s.hogwarts_house FROM student s WHERE CONCAT(s.first_name, ' ', s.last_name) LIKE '%'||partial_name||'%' AND authorized_user(current_setting('credential.app_user'));
While it returns:
first_naem | last_name | hogwarts_house ------------+-----------+---------------- (0 rows)
The logger table shows two entries. One for the query that returned a value and one for the version that didn’t.
logger_id | app_user | queried_student | query_time -----------+--------------+------------------+---------------------------- 1 | Draco Malfoy | Hermione Granger | 2022-05-29 23:23:39.82063 2 | Draco Malfoy | Hermione Granger | 2022-05-29 23:23:40.736945 (2 rows)
In both cases the bad actor Draco Malfoy’s unauthorized access is captured and he was denied any information without alerting him to the security precaution in a logging table function.
As always, I hope this helps those looking for this type of solution.