MacLochlainn's Weblog

Michael McLaughlin's Technical Blog


Archive for the ‘Fedora’ Category

Postgres Reading Files

without comments

I’m working on items for migrating my database class from Oracle to PostgreSQL. I ran into an interesting limitation when I tried using the COPY command to read an external CSV file.

I had prepared the system by creating a new directory hierarchy owned by the postgres user on top of a /u01/app mount point. I set the ownership of the directories and files with the following command from the /u01/app mount point:

chown -R postgres:postgres postgres

I ran the following command:

COPY transaction_upload
FROM '/u01/app/upload/postgres/transaction_upload_postgres.csv' DELIMITERS ',' CSV;

The command raised the following error:

COPY transaction_upload FROM '/u01/app/upload/postgres/transaction_upload_postgres.csv' DELIMITERS ',' CSV;
ERROR:  must be superuser or a member of the pg_read_server_files role to COPY from a file
HINT:  Anyone can COPY to stdout or from stdin. psql's \copy command also works for anyone.

There are two options for fixing the problem: changing the student user to a superuser, or granting the pg_read_server_files role to the student user. Changing the student user to a superuser isn't really a practical option, so I connected as the postgres superuser and granted the pg_read_server_files role to the student user. It is a system-level role, so the grant isn't limited to the videodb database.

As the postgres user, type the following command to grant the pg_read_server_files role to the student user:

GRANT pg_read_server_files TO student;
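
You can verify the grant by querying the Postgres catalog. Here's a minimal sketch against the pg_roles and pg_auth_members catalog tables that lists the roles granted to the student user:

SELECT r.rolname
FROM   pg_auth_members m JOIN pg_roles r
ON     m.roleid = r.oid
WHERE  m.member = (SELECT oid FROM pg_roles WHERE rolname = 'student');

It should include pg_read_server_files in the list.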

After granting the role to the student user, I created a small test case. The test table definition is:

CREATE TABLE test
( id          INTEGER
, first_name  VARCHAR(20)
, last_name   VARCHAR(20));

I created a test.csv file in the /u01/app/upload/postgres directory with the following contents:

1,Simon,Bolivar
2,Peter,Davenport
3,Michael,Swan

The test.csv file requires the following permissions and ownerships:

-rw-r--r--. 1 postgres postgres 49 Nov 13 10:56 test.csv

The permissions are user read-write, group read, and other read. The file should be owned by the postgres user and the postgres user's primary group, which should also be postgres.

You can then connect to psql as the student user with the database set to videodb and run the following copy command:

COPY test
FROM '/u01/app/upload/postgres/test.csv' DELIMITERS ',' CSV;
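
Alternatively, if you'd rather not grant a server-side role at all, the hint in the earlier error message points to psql's \copy command. It reads the file with the client operating system user's permissions rather than the server's, so the student user needs no extra role. Here's a sketch against the same file, which you must type on a single line:

\copy test FROM '/u01/app/upload/postgres/test.csv' WITH DELIMITER ',' CSV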

If you put a comma at the end of each line, like you would in MySQL, the trailing comma raises the following error:

ERROR:  extra data after last expected column

If you forget a delimiting comma somewhere on a line, the copy command raises the following error:

ERROR:  missing data for column "last_name"
CONTEXT:  COPY tester, line 3: "3,Michael Swan"

The error names the column that is missing data, and the context reports the line number along with the offending text.

You should take careful note that the copy command is an appending command. If you run it a second time, you insert a duplicate set of values in the target table.
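
If you want a rerunnable load during testing, you can truncate the table before each copy. A minimal sketch:

TRUNCATE TABLE test;
COPY test
FROM '/u01/app/upload/postgres/test.csv' DELIMITERS ',' CSV;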

After experimenting, it's time to fix my student instance. The transaction_upload_mysql.csv file has two critical errors that need to be fixed:

  1. A comma terminates each line, which raises an extra data after last expected column error.
  2. A comma terminates each line followed by an indefinite amount of whitespace, which raises the same extra data after last expected column error.

Since my students have little expertise with Unix or Linux commands, I must provide a single command they can use to convert the problem file into a clean one. However, they should work on a copy of the transaction_upload_mysql.csv file to ensure they don't break the equivalent functionality for the MySQL solution.

They should copy two files as the root user from the mysql directory to the postgres directory, as follows:

cp /u01/app/mysql/upload/transaction_upload_mysql.csv /u01/app/postgres/upload/transaction_upload_postgres.csv
cp /u01/app/mysql/upload/transaction_upload2_mysql.csv /u01/app/postgres/upload/transaction_upload2_postgres.csv

As the root user in the /u01/app/postgres/upload directory, run the following command:

cat transaction_upload_postgres.csv | sed -e 's/\,$//g' > x; cat x | sed -e 's/\,[[:space:]]*$//g' > y; mv y transaction_upload_postgres.csv; rm x

Check the file permissions and ownership with the ll (long list) command. If the file isn't like this:

-rw-r--r--. 1 postgres postgres 49 Nov 13 10:56 transaction_upload_postgres.csv

Then, they can change it as the root user with these commands:

chown postgres:postgres transaction_upload_postgres.csv
chmod 644 transaction_upload_postgres.csv

Lastly, they should connect to psql as the student user, using the videodb database, and run the following command:

COPY transaction_upload
FROM '/u01/app/postgres/upload/transaction_upload_postgres.csv' DELIMITERS ',' CSV;

A query of the import table with this:

SELECT COUNT(*) FROM transaction_upload;

should return:

 count 
-------
 11520
(1 row)

As always, I hope this helps those looking for some explanation and example on the copy feature of PostgreSQL.

Written by maclochlainn

November 13th, 2019 at 7:38 am

PostgreSQL and LPAD

without comments

While porting my Oracle code to PostgreSQL I encountered a little quirk. It’s probably not a quirk except for the fact that I’ve worked in Oracle so long. Oracle implicitly type casts so well that we seldom notice.

PostgreSQL doesn't work like Oracle. It does type cast sometimes, but not very often. I tried porting the following segment from my Oracle stored procedure to PostgreSQL:

/* Add account number with zeros. */
FOR j IN 1..50 LOOP
  INSERT INTO account_list
  VALUES
  ( lv_airport_code||'-'||LPAD(j,6,'0')
  , NULL
  , NULL
  , 1002
  , 1002 );
END LOOP;

Oracle implicitly casts the integer j to a text string before running the LPAD function. PostgreSQL doesn't cast the integer to a text string before calling the LPAD function. Since the LPAD function isn't overloaded with an integer first parameter, calling it with an integer, integer, and text set of parameters fails at runtime, but the code doesn't raise an exception when you compile it as a stored procedure.

At runtime, it raises the following error:

ERROR:  invalid input syntax for integer: "function lpad(integer, integer, unknown) does not exist"
CONTEXT:  PL/pgSQL function seed_account_list() line 48 at assignment

You can fix the code by explicitly casting the LPAD function's first parameter to a text string, as follows:

/* Add account number with zeros. */
FOR j IN 1..50 LOOP
  INSERT INTO account_list
  VALUES
  ( lv_airport_code||'-'||LPAD(j::text,6,'0')
  , NULL
  , NULL
  , 1002
  , 1002 );
END LOOP;
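
You can verify the cast behavior from psql before recompiling the stored procedure. For example, the following standalone query returns 000001:

SELECT LPAD(1::text,6,'0');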

This fixes the casting problem. As always, I hope this helps those looking for a solution.

Written by maclochlainn

November 12th, 2019 at 4:08 pm

Postgres Overloaded Routines

without comments

Earlier I showed how to write an anonymous block in PostgreSQL PL/pgSQL to drop routines, like functions and procedures. However, it only worked when the functions or procedures weren't overloaded. The following lets you drop all routines, including overloaded functions and procedures. Overloaded routines are those that share the same name but have different parameter lists.

Before you can test the anonymous block, you need to create a set of overloaded functions or procedures. You can create a set of overloaded hello functions with the following syntax:

CREATE FUNCTION hello()
RETURNS text AS
$$
DECLARE
  output  VARCHAR;
BEGIN
  SELECT 'Hello World!' INTO output;
  RETURN output;
END
$$ LANGUAGE plpgsql;
 
CREATE FUNCTION hello(whom text)
RETURNS text AS
$$
DECLARE
  output  VARCHAR;
BEGIN
  SELECT CONCAT('Hello ',whom,'!') INTO output;
  RETURN output;
END
$$ LANGUAGE plpgsql;
 
CREATE FUNCTION hello(id int, whom text)
RETURNS text AS
$$
DECLARE
  output  VARCHAR;
BEGIN
  SELECT CONCAT('[',id,'] Hello ',whom,'!') INTO output;
  RETURN output;
END
$$ LANGUAGE plpgsql;

You can test the overloaded hello function, like so from the videodb schema:

videodb=> SELECT hello();
    hello     
--------------
 Hello World!
(1 row)
 
videodb=> SELECT hello('Captain Marvel');
         hello         
-----------------------
 Hello Captain Marvel!
(1 row)
 
videodb=> SELECT hello(1,'Captain America');
           hello            
----------------------------
 [1] Hello Captain America!
(1 row)

Then, you can query the information_schema to verify that you've created a set of overloaded functions with the following query:

SELECT   proc.specific_schema AS procedure_schema
,        proc.specific_name
,        proc.routine_name AS procedure_name
,        proc.external_language
,        args.parameter_name
,        args.parameter_mode
,        args.data_type
FROM     information_schema.routines proc LEFT JOIN information_schema.parameters args
ON       proc.specific_schema = args.specific_schema
AND      proc.specific_name = args.specific_name
WHERE    proc.routine_schema NOT IN ('pg_catalog', 'information_schema')
AND      proc.routine_type IN ('FUNCTION','PROCEDURE')
ORDER BY procedure_schema
,        specific_name
,        procedure_name
,        args.ordinal_position;

It should return the following:

 procedure_schema | specific_name | procedure_name | external_language | parameter_name | parameter_mode | data_type 
------------------+---------------+----------------+-------------------+----------------+----------------+-----------
 public           | hello_35451   | hello          | PLPGSQL           |                |                | 
 public           | hello_35452   | hello          | PLPGSQL           | whom           | IN             | text
 public           | hello_35453   | hello          | PLPGSQL           | id             | IN             | integer
 public           | hello_35453   | hello          | PLPGSQL           | whom           | IN             | text
(4 rows)

The SET SESSION command assigns the videodb catalog name to a session variable that the following anonymous block program reads:

SET SESSION "videodb.catalog_name" = 'videodb';
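
You can confirm the setting before running the block, like:

SELECT current_setting('videodb.catalog_name');

It should return videodb.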

The following anonymous block lets you get rid of any ordinary or overloaded function and procedure:

DO $$
DECLARE
  /* Declare an indefinite length string for SQL statement. */
  sql  VARCHAR;
 
  /* Declare variables to manage cursor return values. */
  row  RECORD;
  arg  VARCHAR;
 
  /* Declare parameter list. */
  list VARCHAR;
 
  /* Declare a routine cursor. */
  routine_cursor CURSOR FOR
    SELECT routine_name
    ,      specific_name
    ,      routine_type
    FROM   information_schema.routines
    WHERE  specific_catalog = current_setting('videodb.catalog_name')
    AND    routine_schema = 'public';
 
  /* Declare a parameter cursor. */
  parameter_cursor CURSOR (cv_specific_name varchar) FOR
    SELECT args.data_type
    FROM   information_schema.parameters args
    WHERE  args.specific_schema = 'public'
    AND    args.specific_name = cv_specific_name;
 
BEGIN
  /* Open the cursor. */
  OPEN routine_cursor;
  <<row_loop>>
  LOOP
    /* Fetch table names. */
    FETCH routine_cursor INTO row;
 
    /* Exit when no more records are found. */
    EXIT row_loop WHEN NOT FOUND;
 
    /* Initialize parameter list. */
    list := '(';
 
    /* Open the parameter cursor. */
    OPEN parameter_cursor(row.specific_name::varchar);
    <<parameter_loop>>
    LOOP
      FETCH parameter_cursor INTO arg;
 
      /* Exit the parameter loop. */
      EXIT parameter_loop WHEN NOT FOUND;
 
      /* Add parameter and delimit more than one parameter with a comma. */
      IF LENGTH(list) > 1 THEN
        list := CONCAT(list,',',arg);
      ELSE
        list := CONCAT(list,arg);
      END IF;
    END LOOP;
 
    /* Close the parameter list. */
    list := CONCAT(list,')');
 
    /* Close the parameter cursor. */
    CLOSE parameter_cursor;
 
    /* Concatenate together a DDL statement to drop the routine. */
    sql := 'DROP '||row.routine_type||' IF EXISTS '||row.routine_name||list;
 
    /* Execute the DDL statement. */
    EXECUTE sql;
  END LOOP;
 
  /* Close the routine_cursor. */
  CLOSE routine_cursor;
END;
$$;
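
After running the block, you can confirm the routines are gone by rerunning a count against the same view the routine cursor reads. Assuming nothing else lives in the public schema, it returns zero:

SELECT COUNT(*)
FROM   information_schema.routines
WHERE  specific_catalog = current_setting('videodb.catalog_name')
AND    routine_schema = 'public';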

Now, you possess the magic to automate cleaning up your schema when you combine this with my earlier post on dynamically dropping tables, sequences, and triggers.

Written by maclochlainn

November 5th, 2019 at 12:11 am

DBeaver for PostgreSQL

without comments

I’m migrating my database classes from the Oracle database to the PostgreSQL database. Using the Oracle Express Edition has always required a virtualized image because students use Windows and Mac OS. Also, the university doesn’t like my use of a virtualized image because virtualization imposes an incremental cost on students, who need high-end laptops to run it.

The available Docker images typically don’t support the Oracle Express Edition, which means there are licensing implications tied to Oracle.

As a committee, we decided to use PostgreSQL as our new database platform. We opted for PostgreSQL over MySQL because it supports arrays and stored procedures. PostgreSQL’s PL/pgSQL is a solid implementation of stored procedures. While MySQL supports a stored procedure language, it effectively discourages using it.

We use the sqlplus, mysql, or psql Command-Line Interface (CLI) as the primary interface because a CLI supports production-oriented deployment and DevOps practices. The Open Source DBeaver project provides a clean native GUI management console for PostgreSQL. DBeaver also supports other databases, which pgAdmin doesn’t.

You click the Download button on the DBeaver home page. It supports Windows, Mac OS, and Linux distributions. I downloaded the Linux RPM package to my Downloads directory. You can install the RPM package with the following syntax:

rpm -ivh dbeaver-ce-6.2.3-stable.x86_64.rpm

It doesn’t check for dependencies. The installation is quick. You can verify that it’s installed with the following command:

which dbeaver

It installs here:

/usr/bin/dbeaver

If you failed to install the PostgreSQL JDBC JAR file, DBeaver raises an error message when you create and test the connection. You should also have a user entry in the pg_hba.conf file, like the following:

host    all             all             127.0.0.1/32            trust
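
Changes to the pg_hba.conf file only take effect after a reload. You can restart the service, or trigger a reload from psql as a superuser with the built-in pg_reload_conf function:

SELECT pg_reload_conf();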

As always, I hope this helps those solving problems.

Written by maclochlainn

November 3rd, 2019 at 5:06 pm

Java and Postgres

without comments

I wanted to get Java working with PostgreSQL to test some GUI interfaces on Linux. Figuring out the necessary JAR file for the JDBC driver was my first hurdle. I found it was the postgresql-42.2.5.jar file.

You can download it with the following command line:

wget https://jdbc.postgresql.org/download/postgresql-42.2.5.jar

I downloaded it to a Java directory off the /home/student directory. Then, I added the following CLASSPATH to a local java.env environment file.

#!/usr/bin/bash
 
# Set the Java CLASSPATH environment variable to include the JDBC jar file.
export CLASSPATH=/home/student/Java/postgresql-42.2.5.jar:.

I sourced the java.env file and then wrote the following JavaTest.java program:

/* Import classes. */
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.logging.Level;
import java.util.logging.Logger;
 
public class JavaTest { 
  public static void main(String[] args) {
    /* Set default strings for the connection. */
    String url = "jdbc:postgresql://localhost:5432/videodb";
    String user = "student";
    String password = "student";
 
    /* Try the connection and statement. */
    try {
      Connection conn = DriverManager.getConnection(url, user, password);
      Statement st = conn.createStatement();
      ResultSet rs = st.executeQuery("SELECT VERSION()");
 
      if (rs.next()) {
        System.out.println(rs.getString(1));
      }
 
      /* Close the result set and statement. */
      rs.close();
      st.close();
    }
    catch (SQLException e) {
      Logger logger = Logger.getLogger(JavaTest.class.getName());
      logger.log(Level.SEVERE, e.getMessage(), e);
    }
  }
}

I compiled the JavaTest.java program and tested it. It failed with the following error:

FATAL: Ident authentication failed for user - Unable to connect to PostgreSQL

The failure occurred because I hadn’t allowed the connection in PostgreSQL’s pg_hba.conf file. I changed the following line in my pg_hba.conf file:

host    all             all             127.0.0.1/32            ident

to

host    all             all             127.0.0.1/32            trust

Then, I restarted the postgresql-11 service, like this:

systemctl restart postgresql-11.service

The JavaTest program ran successfully and returned:

PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), 64-bit

You can extend the logic to output a comma-separated value file by leveraging the JDBC help page, like this:

/* Import classes. */
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.logging.Level;
import java.util.logging.Logger;
 
/* Create a class with only a static method for runtime testing. */
public class JavaReturnSet { 
  public static void main(String[] args) {
    /* Set default strings for the connection. */
    String url = "jdbc:postgresql://localhost:5432/videodb";
    String user = "student";
    String password = "student";
 
    /* Try the connection and statement. */
    try {
      /* Set connection, statement, and result set. */
      Connection conn = DriverManager.getConnection(url, user, password);
      Statement st = conn.createStatement();
 
      /* Use the + to concatenate lines for query clarity. */
      ResultSet rs = st.executeQuery("SELECT m.account_number\n" +
                                     ",      CONCAT(c.last_name,', ',c.first_name)\n" +
                                     "FROM   member m JOIN contact c\n" +
                                     "ON     m.member_id = c.member_id");
 
      /* Get query metadata for subsequent management of results. */
      ResultSetMetaData rsmd = rs.getMetaData();
      String line;
 
      while (rs.next()) {
        /* Initialize the line output for each row. */
        line = "";
 
        /* Process the columns. */
        for (int i = 0; i < rsmd.getColumnCount(); i++) {
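          /* Column type 12 is java.sql.Types.VARCHAR; wrap string values in quotes. */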
          if (rsmd.getColumnType(i + 1) == 12) {
            line = line + "\"" + rs.getString(i + 1) + "\"";
          }
          else {
            line = line + rs.getInt(i + 1);
          }
 
          /* Put a comma between output columns. */
          if (i < rsmd.getColumnCount() - 1) {
            line = line + ",";
          }
        }
        System.out.println(line);
      }
 
      /* Close the result set and statement. */
      rs.close();
      st.close();
    }
    catch (SQLException e) {
      Logger logger = Logger.getLogger(JavaReturnSet.class.getName());
      logger.log(Level.SEVERE, e.getMessage(), e);
    }
  }
}

As always, I hope this helps those looking for a clean solution.

Written by maclochlainn

November 2nd, 2019 at 11:46 pm

Postgres Foreign Constraints

without comments

You can’t disable a foreign key constraint in Postgres, like you can do in Oracle. However, you can remove the foreign key constraint from a column and then re-add it to the column.

Here’s a quick test case in five steps:

  1. Drop the big and little tables if they exist. The first drop statement requires a cascade because a dependent little table holds a foreign key constraint against the primary key column of the big table. The second drop statement doesn't require the cascade keyword because there is no dependent foreign key constraint.

    DROP TABLE IF EXISTS big CASCADE;
    DROP TABLE IF EXISTS little;

  2. Create the big and little tables:

    -- Create the big table.
    CREATE TABLE big
    ( big_id     SERIAL
    , big_text   VARCHAR(20) NOT NULL
    , CONSTRAINT pk_little_1 PRIMARY KEY (big_id));
     
    -- Display the big table.
    \d big
     
    -- Create little table.
    CREATE TABLE little
    ( little_id     SERIAL
    , big_id        INTEGER     NOT NULL
    , little_text   VARCHAR(20) NOT NULL
    , CONSTRAINT fk_little_1 FOREIGN KEY (big_id) REFERENCES big (big_id));
     
    -- Display the little table.
    \d little

    If you failed to designate the big_id column as a primary key, Postgres raises the following exception:

    ERROR:  there is no unique constraint matching given keys for referenced table "big"

  3. Insert a non-compliant row in the little table. An insert statement into the little table with a value for the foreign key column that does not exist in the big_id column of the big table would fail with the following error:

    ERROR:  insert or update on table "little" violates foreign key constraint "fk_little_1"
    DETAIL:  Key (big_id)=(2) is not present in table "big".

    With the foreign key constraint in place, the insert statement succeeds only after you first insert a new row into the big table whose primary key matches the foreign key value for the little table. The following two insert statements add a row to both the big and little tables:

    -- Insert into a big table.
    INSERT INTO big
    (big_text)
    VALUES
    ('Cat in the Hat 2');
     
    -- Insert into a little table.
    INSERT INTO little
    (big_id
    ,little_text)
    VALUES
    ( 2
    ,'Thing 3');

    Then, you can query it like this:

    SELECT *
    FROM   big b JOIN little l ON b.big_id = l.big_id;

     big_id |     big_text     | little_id | big_id | little_text 
    --------+------------------+-----------+--------+-------------
          1 | Cat in the Hat 1 |         1 |      1 | Thing 1
          1 | Cat in the Hat 1 |         2 |      1 | Thing 2
          2 | Cat in the Hat 2 |         3 |      2 | Thing 3
    (3 rows)

  4. You can drop a foreign key constraint with the following syntax:

    ALTER TABLE little DROP CONSTRAINT fk_little_1;

  5. You can add a foreign key constraint with the following syntax:

    ALTER TABLE little ADD CONSTRAINT fk_little_1 FOREIGN KEY (big_id) REFERENCES big (big_id);
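
If validating all the existing rows during the re-add is too expensive on a large table, Postgres also lets you add the foreign key without checking current data, which is the closest analog to Oracle's ENABLE NOVALIDATE. A sketch:

ALTER TABLE little ADD CONSTRAINT fk_little_1 FOREIGN KEY (big_id) REFERENCES big (big_id) NOT VALID;
ALTER TABLE little VALIDATE CONSTRAINT fk_little_1;

The VALIDATE CONSTRAINT statement checks the existing rows later, when you're ready.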

As always, I hope this helps you solve problems.

Written by maclochlainn

October 8th, 2019 at 8:49 pm

Postgres Remove Constraints

without comments

You can’t disable a not null constraint in Postgres, like you can do in Oracle. However, you can remove the not null constraint from a column and then re-add it to the column.

Here’s a quick test case in four steps:

  1. Drop a demo table if it exists:

    DROP TABLE IF EXISTS demo;

  2. Create a demo table:

    CREATE TABLE demo
    ( demo_id    SERIAL
    , demo_text  VARCHAR(20) NOT NULL );

  3. Insert a compliant row into the demo table:

    INSERT INTO demo
    (demo_text)
    VALUES
    ('Thing 1');

    Attempt to insert another row with a null value in the demo_text column:

    INSERT INTO demo
    (demo_text)
    VALUES
    (NULL);

    It raises the following error:

    INSERT 0 1
    psql:remove_not_null.sql:22: ERROR:  null value in column "demo_text" violates not-null constraint
    DETAIL:  Failing row contains (2, null).

  4. You can drop the not null constraint from the demo_text column:

    ALTER TABLE demo ALTER COLUMN demo_text DROP NOT NULL;

    You can now successfully insert a row with a demo_text column value of null. After you have performed your table maintenance you can add the not null constraint back on to the demo_text column.

    You need to update the row with a null value in the demo_text column with a valid value before you re-add the not null constraint. The following shows an update statement that replaces the null value with a text string:

    UPDATE demo
    SET    demo_text = 'Thing 2'
    WHERE  demo_text IS NULL;

    Now, you can change the demo_text column back to a not null constrained column with the following syntax.

    ALTER TABLE demo ALTER COLUMN demo_text SET NOT NULL;

While you cannot defer the constraint, removing it and adding it back works well.
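
You can confirm the column's state at any point in the process from the information_schema, like:

SELECT is_nullable
FROM   information_schema.columns
WHERE  table_name = 'demo'
AND    column_name = 'demo_text';

It returns YES while the constraint is dropped and NO after you set it back.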

Written by maclochlainn

October 8th, 2019 at 12:26 am

Postgres Check Constraints

without comments

The Postgres 11 database documentation says that it supports naming constraints. While you can create a table with a named not null constraint inside the CREATE TABLE statement, the name is not assigned to the generated not-null check constraint.

Here’s a quick test case in three steps:

  1. Drop a demo table if it exists:

    DROP TABLE IF EXISTS demo;

  2. Create a demo table with a named not null constraint:

    CREATE TABLE demo
    ( demo_id    SERIAL
    , demo_text  VARCHAR(20) CONSTRAINT nn_demo_1 NOT NULL );

  3. Query the check constraint metadata from the information_schema:

    SELECT substr(check_clause,1,strpos(check_clause,' ')-1) AS check_column
    ,      constraint_name
    FROM   information_schema.check_constraints
    WHERE  check_clause LIKE 'demo_text%';

    You should see the following output with a parsed check_column name and the system generated check constraint name rather than the nn_demo_1 constraint name:

     check_column |    constraint_name    
    --------------+-----------------------
     demo_text    | 2200_18896_2_not_null
    (1 row)

On the bright side, you can name primary key and foreign key constraints.
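
By contrast, an explicit check constraint keeps the name you give it. A quick sketch against the same demo table:

ALTER TABLE demo ADD CONSTRAINT ck_demo_1 CHECK (demo_text IS NOT NULL);
 
SELECT constraint_name
FROM   information_schema.check_constraints
WHERE  constraint_name = 'ck_demo_1';

The query returns the ck_demo_1 name rather than a system-generated one.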

Written by maclochlainn

October 7th, 2019 at 10:35 pm

Python-Postgres Query

without comments

As I committed to a student, here are sample programs for writing a Python query against the Postgres 11 database. The first returns rows as tuples. The second formats the returned columns as comma-separated values.

If you’re one of many looking for the key psycopg2 driver library, you can find it in most distro repositories as: python3-psycopg2. You can use dnf or yum to install it separately or you can install pgadmin4, which includes the psycopg2 library.
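
Both programs query a new_hire table with two rows. If you need to recreate it, here's a minimal sketch inferred from the output below; the column sizes are assumptions:

CREATE TABLE new_hire
( new_hire_id  SERIAL PRIMARY KEY
, first_name   VARCHAR(20) NOT NULL
, middle_name  VARCHAR(20)
, last_name    VARCHAR(20) NOT NULL
, hire_date    DATE        NOT NULL );
 
INSERT INTO new_hire
( new_hire_id, first_name, middle_name, last_name, hire_date )
VALUES
 (1001,'Malcolm','Jacob','Lewis','2018-02-14')
,(1002,'Henry',NULL,'Chabot','1990-07-31');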

The first example returns the entire row from a new_hire table with two rows:

import psycopg2
 
# Pre-declare the connection so the finally block can test it safely.
connection = None
 
try:
  # Open a connection to the database.
  connection = psycopg2.connect( user="student"
                               , password="student"
                               , port="5432"
                               , dbname="videodb")
 
  # Open a cursor.
  cursor = connection.cursor()
 
  # Assign a static query.
  query = "SELECT * FROM new_hire"
 
  # Parse and execute the query.
  cursor.execute(query)
 
  # Fetch all rows from a table.
  records = cursor.fetchall()
 
  # Read through and print the rows as tuples.
  for row in range(0, len(records)):
    print(records[row]) 
 
except (Exception, psycopg2.Error) as error:
  print("Error while fetching data from PostgreSQL", error)
 
finally:
  # Close the database connection.
  if (connection):
    cursor.close()
    connection.close()

The first example returns the rows as tuples, which is probably desired if you want to consume the result in another Python program. Here’s the output retrieved:

(1001, 'Malcolm', 'Jacob', 'Lewis', datetime.date(2018, 2, 14))
(1002, 'Henry', None, 'Chabot', datetime.date(1990, 7, 31))

The second one returns the rows and formats the columns into output for a csv style file:

import psycopg2
 
# Pre-declare the connection so the finally block can test it safely.
connection = None
 
try:
  # Open a connection to the database.
  connection = psycopg2.connect( user="student"
                               , password="student"
                               , port="5432"
                               , dbname="videodb")
 
  # Open a cursor.
  cursor = connection.cursor()
 
  # Assign a static query.
  query = "SELECT * FROM new_hire"
 
  # Parse and execute the query.
  cursor.execute(query)
 
  # Read through and print the formatted columns of each row.
  for (new_hire_id, first_name, middle_name, last_name, hire_date) in cursor:
    if (isinstance(middle_name,type(None))):
      print("{},'{} {}','{:%d-%b-%Y}'".format(new_hire_id, first_name, last_name, hire_date))
    else:
      print("{},'{} {} {}','{:%d-%b-%Y}'".format(new_hire_id, first_name, middle_name, last_name, hire_date))
 
except (Exception, psycopg2.Error) as error:
  print("Error while fetching data from PostgreSQL", error)
 
finally:
  # Close the database connection.
  if (connection):
    cursor.close()
    connection.close()

It returns the rows as formatted, csv-style output:

1001,'Malcolm Jacob Lewis','14-Feb-2018'
1002,'Henry Chabot','31-Jul-1990'

As always, I hope these help those looking for a starting place with Python and Postgres.

Written by maclochlainn

September 23rd, 2019 at 1:23 am

Cassandra on Fedora 30

without comments

The first thing to do with Fedora 30 is to check which Apache Cassandra packages are installed. You can use the following rpm command to determine that:

rpm -qa | grep cassandra

My Fedora 30 returned the following values:

cassandra-java-libs-3.11.1-12.fc30.x86_64
cassandra-python2-cqlshlib-3.11.1-12.fc30.x86_64
cassandra-3.11.1-12.fc30.x86_64
python2-cassandra-driver-3.18.0-1.fc30.x86_64

Notably missing from the rpm list is the cassandra-server package. You install cassandra-server with the dnf utility:

dnf install -y cassandra-server

You should get a standard dnf installation log for the cassandra-server package and its dependencies.

Fedora Magazine has a great Get Started with Apache Cassandra on Fedora article covering all the steps required to set up clusters. This article only covers creating and enabling the Cassandra service, and setting up a single node Cassandra instance.

You start Cassandra with the following command as the root user:

systemctl start cassandra

You enable Cassandra with the following command as the root user:

systemctl enable cassandra

It creates the following symlink:

Created symlink /etc/systemd/system/multi-user.target.wants/cassandra.service → /usr/lib/systemd/system/cassandra.service.

You can connect to the Test cluster with the following command:

cqlsh

You should see the following:

Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.

You can see the options by typing the help command:

Documented shell commands:
===========================
CAPTURE  CLS          COPY  DESCRIBE  EXPAND  LOGIN   SERIAL  SOURCE   UNICODE
CLEAR    CONSISTENCY  DESC  EXIT      HELP    PAGING  SHOW    TRACING
 
CQL help topics:
================
AGGREGATES               CREATE_KEYSPACE           DROP_TRIGGER      TEXT     
ALTER_KEYSPACE           CREATE_MATERIALIZED_VIEW  DROP_TYPE         TIME     
ALTER_MATERIALIZED_VIEW  CREATE_ROLE               DROP_USER         TIMESTAMP
ALTER_TABLE              CREATE_TABLE              FUNCTIONS         TRUNCATE 
ALTER_TYPE               CREATE_TRIGGER            GRANT             TYPES    
ALTER_USER               CREATE_TYPE               INSERT            UPDATE   
APPLY                    CREATE_USER               INSERT_JSON       USE      
ASCII                    DATE                      INT               UUID     
BATCH                    DELETE                    JSON            
BEGIN                    DROP_AGGREGATE            KEYWORDS        
BLOB                     DROP_COLUMNFAMILY         LIST_PERMISSIONS
BOOLEAN                  DROP_FUNCTION             LIST_ROLES      
COUNTER                  DROP_INDEX                LIST_USERS      
CREATE_AGGREGATE         DROP_KEYSPACE             PERMISSIONS     
CREATE_COLUMNFAMILY      DROP_MATERIALIZED_VIEW    REVOKE          
CREATE_FUNCTION          DROP_ROLE                 SELECT          
CREATE_INDEX             DROP_TABLE                SELECT_JSON

Here’s my script that creates a Cassandra keyspace, which is more or less a database. You use the USE command to connect to the keyspace, like you would connect to a database in MySQL. Cassandra does not provide sequences because they’re not a good fit for a distributed architecture, and it does not support a native procedural extension like relational databases do. You must create user-defined functions (UDFs) by embedding the logic in Java.

This script does the following:

  • Creates a keyspace
  • Uses the keyspace
  • Conditionally drops tables and functions
  • Creates two tables
  • Inserts data into the two tables
  • Queries data from the tables

I also included a call to a UDF inside a query in two of the examples. One of the queries demonstrates how to return a JSON structure from a query. To simplify things and clarify the script’s behavior, the details are outlined below.

  • The first segment of the script creates the keyspace, changes the scope to use the keyspace, conditionally drops tables, creates tables, and inserts values into the tables:

    /* Create a keyspace in Cassandra, which is like a database
       in MySQL or a schema in Oracle. */
    CREATE KEYSPACE IF NOT EXISTS student
      WITH REPLICATION = {
         'class':'SimpleStrategy'
        ,'replication_factor': 1 }
      AND DURABLE_WRITES = true;
     
    /* Use the keyspace or connect to the database. */
    USE student;
     
    /* Drop the member table from the student keyspace. */
    DROP TABLE IF EXISTS member;
     
    /* Create a member table in the student keyspace. */
    CREATE TABLE member
    ( member_number       VARCHAR
    , member_type         VARCHAR
    , credit_card_number  VARCHAR
    , credit_card_type    VARCHAR
    , PRIMARY KEY ( member_number ));
     
    /* Conditionally drop the contact table from the student keyspace. */
    DROP TABLE IF EXISTS contact;
     
    /* Create a contact table in the student keyspace. */
    CREATE TABLE contact
    ( contact_number      VARCHAR
    , contact_type        VARCHAR
    , first_name          VARCHAR
    , middle_name         VARCHAR
    , last_name           VARCHAR
    , member_number       VARCHAR
    , PRIMARY KEY ( contact_number ));
     
    /* Insert a row into the member table. */
    INSERT INTO member
    ( member_number, member_type, credit_card_number, credit_card_type )
    VALUES
    ('SFO-12345','GROUP','2222-4444-5555-6666','VISA');
     
    /* Insert a row into the contact table. */
    INSERT INTO contact
    ( contact_number, contact_type, first_name, middle_name, last_name, member_number )
    VALUES
    ('CUS_00001','FAMILY','Barry', NULL,'Allen','SFO-12345');
     
    /* Insert a row into the contact table. */
    INSERT INTO contact
    ( contact_number, contact_type, first_name, middle_name, last_name, member_number )
    VALUES
    ('CUS_00002','FAMILY','Iris', NULL,'West-Allen','SFO-12345');
     
    /* Insert a row into the member table. */
    INSERT INTO member
    ( member_number, member_type, credit_card_number, credit_card_type )
    VALUES
    ('SFO-12346','GROUP','3333-8888-9999-2222','VISA');
     
    /* Insert a row into the contact table. */
    INSERT INTO contact
    ( contact_number, contact_type, first_name, middle_name, last_name, member_number )
    VALUES
    ('CUS_00003','FAMILY','Caitlin','Marie','Snow','SFO-12346');

    The following queries the member table:

    /* Select all columns from the member table. */
    SELECT * FROM member;

    It returns the following:

     member_number | credit_card_number  | credit_card_type | member_type
    ---------------+---------------------+------------------+-------------
         SFO-12345 | 2222-4444-5555-6666 |             VISA |       GROUP
         SFO-12346 | 3333-8888-9999-2222 |             VISA |       GROUP
  • Create a concatenate User-defined function (UDF) for Cassandra. The first step requires you to edit the cassandra.yaml file, which you find in the /etc/cassandra/default.conf directory. There is a single parameter you need to edit: the enable_user_defined_functions parameter. By default the parameter is set to false, and you need to enable it to create UDFs.

    If you open the cassandra.yaml file as the root user, you should find the parameter on line 987, like:

    # If unset, all GC Pauses greater than gc_log_threshold_in_ms will log at
    # INFO level
    # UDFs (user defined functions) are disabled by default.
    # As of Cassandra 3.0 there is a sandbox in place that should prevent execution of evil code.
    enable_user_defined_functions: false

    After you make the edit, the cassandra.yaml file should look like this:

    # If unset, all GC Pauses greater than gc_log_threshold_in_ms will log at
    # INFO level
    # UDFs (user defined functions) are disabled by default.
    # As of Cassandra 3.0 there is a sandbox in place that should prevent execution of evil code.
    enable_user_defined_functions: true

    After you make the change, you can create your own UDF. The following UDF formats the first, middle, and last name so there's only one whitespace between the first and last name when the middle name value is null.

    This type of function must use a CALLED ON NULL INPUT clause in lieu of a RETURNS NULL ON NULL INPUT clause. The latter would force the function to return a null value if any one of the parameters were null.

    /* Drop the concatenate function because a replace disallows changing a
       RETURNS NULL ON NULL INPUT with a CALLED ON NULL INPUT without raising
       an "89: InvalidRequest" exception. */
    DROP FUNCTION concatenate;
     
    /* Create a user-defined function to concatenate names. */
    CREATE OR REPLACE FUNCTION concatenate (first_name VARCHAR, middle_name VARCHAR, last_name VARCHAR)
    CALLED ON NULL INPUT
    RETURNS VARCHAR
    LANGUAGE java
    AS $$
      /* Concatenate first and last names when middle name is null, and
         first, middle, and last names when middle name is not null. */
      String name;
     
      /* Check for null middle name. */
      if (middle_name == null) {
        name = first_name + " " + last_name; }
      else {
        name = first_name + " " + middle_name + " " + last_name; }
     
      return name;
    $$;
  • Query the values from the contact table with the UDF function in the SELECT-list:

    /* Query the contact information. */
    SELECT member_number
    ,      contact_number
    ,      contact_type
    ,      concatenate(first_name, middle_name, last_name) AS full_name
    FROM   contact;

    It returns the following:

     member_number | contact_number | contact_type | full_name
    ---------------+----------------+--------------+--------------------
         SFO-12345 |      CUS_00001 |       FAMILY |        Barry Allen
         SFO-12345 |      CUS_00002 |       FAMILY |    Iris West-Allen
         SFO-12346 |      CUS_00003 |       FAMILY | Caitlin Marie Snow

    Query the values from the contact table with a JSON format:

    /* Query the contact information and return in a JSON format. */
    SELECT JSON
           contact_number
    ,      contact_type
    ,      concatenate(first_name, middle_name, last_name) AS full_name
    FROM   contact;

    It returns the following:

     [json]
    -------------------------------------------------------------------------------------------------
    {"contact_number": "CUS_00001", "contact_type": "FAMILY", "full_name": "Barry Allen"}
    {"contact_number": "CUS_00002", "contact_type": "FAMILY", "full_name": "Iris West-Allen"}
    {"contact_number": "CUS_00003", "contact_type": "FAMILY", "full_name": "Caitlin Marie Snow"}
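
If you want to confirm the keyspace's tables exist before querying them, you can check Cassandra's system_schema keyspace (assuming a Cassandra 3.x release), like:

SELECT table_name
FROM   system_schema.tables
WHERE  keyspace_name = 'student';

It should return the contact and member tables.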

Written by maclochlainn

September 12th, 2019 at 1:17 am