Archive for the ‘Fedora’ Category
Postgres Reading Files
I’m working on items for migrating my database class from Oracle to PostgreSQL. I ran into an interesting limitation when I tried using the COPY
command to read an external CSV file.
I had prepared the system by creating a new directory hierarchy owned by the postgres user on top of a /u01/app
mount point. I set the ownership of the directories and files with the following command from the /u01/app
mount point:
chown -R postgres:postgres postgres
After running the following command:
COPY transaction_upload FROM '/u01/app/upload/postgres/transaction_upload_postgres.csv' DELIMITERS ',' CSV;
The command raised the following error:
COPY transaction_upload FROM '/u01/app/upload/postgres/transaction_upload_postgres.csv' DELIMITERS ',' CSV;
ERROR:  must be superuser or a member of the pg_read_server_files role to COPY from a file
HINT:  Anyone can COPY to stdout or from stdin. psql's \copy command also works for anyone.
The two options for fixing the problem are: changing the student user to a superuser, or granting the pg_read_server_files role to the student user. Changing the student user to a superuser isn’t really a practical option, so I connected as the postgres superuser and granted the pg_read_server_files role to the student user. It is a system-level role, so the grant isn’t limited to only the videodb database.
As the postgres user, type the following command to grant the pg_read_server_files role to the student user:
GRANT pg_read_server_files TO student;
After granting the role to the student
user, I created a small test case. The test table definition is:
CREATE TABLE test
( id          INTEGER
, first_name  VARCHAR(20)
, last_name   VARCHAR(20));
I created a test.csv
file in the /u01/app/upload/postgres
directory, like:
1,Simon,Bolivar
2,Peter,Davenport
3,Michael,Swan
The test.csv
file requires the following permissions and ownerships:
-rw-r--r--. 1 postgres postgres 49 Nov 13 10:56 test.csv
The permissions are user read-write, group read, and other read. The file should be owned by postgres, and its group should be the postgres user’s primary group, which should also be postgres.
You can then connect to psql
as the student
user with the database set to videodb
and run the following copy
command:
COPY test FROM '/u01/app/upload/postgres/test.csv' DELIMITERS ',' CSV;
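Assuming the three-row test.csv file above loaded cleanly, a quick sanity check is to query the table; the names should match the file:

SELECT * FROM test;

It should return something like:

 id | first_name | last_name
----+------------+------------
  1 | Simon      | Bolivar
  2 | Peter      | Davenport
  3 | Michael    | Swan
(3 rows)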
If you put a comma at the end of each line, like you would for MySQL, the trailing comma raises the following error:
ERROR:  extra data after last expected column
If you forget a delimiting comma somewhere on a line, the copy command raises the following error:
ERROR:  missing data for column "last_name"
CONTEXT:  COPY tester, line 3: "3,Michael Swan"
The error points to the first column left without data, and the context gives the line number and displays the offending text.
You should take careful note that the copy command appends rows to the target table. If you run it a second time, you insert a duplicate set of values.
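If you need to reload from scratch, one simple approach is to empty the target table before rerunning the load; a minimal sketch with the test table from above:

TRUNCATE TABLE test;
COPY test FROM '/u01/app/upload/postgres/test.csv' DELIMITERS ',' CSV;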
After experimenting, it’s time to fix my student instance. The transaction_upload_mysql.csv file has two critical errors that need to be fixed. They are:
- A comma terminates each line, which would raise an extra data after last expected column error.
- A comma terminates each line followed by some indefinite amount of whitespace, which would also raise an extra data after last expected column error.
Since I have students with little expertise in Unix or Linux commands, I must provide a single command that they can use to convert the problem file into a clean one. However, they should work on a copy of the transaction_upload_mysql.csv file to ensure they don’t break the equivalent functionality for the MySQL solution space.
They should copy two files as the root
user from the mysql
directory to the postgres
directory, as follows:
cp /u01/app/mysql/upload/transaction_upload_mysql.csv /u01/app/postgres/upload/transaction_upload_postgres.csv
cp /u01/app/mysql/upload/transaction_upload2_mysql.csv /u01/app/postgres/upload/transaction_upload2_postgres.csv
As the root user in the /u01/app/postgres/upload directory, run the following command:
cat transaction_upload_postgres.csv | sed -e 's/\,$//g' > x; cat x | sed -e 's/\,[[:space:]]*$//g' > y; mv y transaction_upload_postgres.csv; rm x
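If the installed sed supports in-place editing (GNU sed on Fedora does), the same cleanup collapses into a single command; this is only a sketch of an alternative, not the command I gave the students:

sed -i -e 's/,[[:space:]]*$//' transaction_upload_postgres.csv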
Please check the file permissions and ownerships with the ll
(long list) command. If the file isn’t like this:
-rw-r--r--. 1 postgres postgres 49 Nov 13 10:56 transaction_upload_postgres.csv
Then, they should be able to change it as the root user with these commands:
chown postgres:postgres transaction_upload_postgres.csv
chmod 644 transaction_upload_postgres.csv
Lastly, they should connect to psql as the student user, using the videodb database, and run the following command:
COPY transaction_upload FROM '/u01/app/postgres/upload/transaction_upload_postgres.csv' DELIMITERS ',' CSV;
A query of the import table with this:
SELECT COUNT(*) FROM transaction_upload;
should return:
 count
-------
 11520
(1 row)
As always, I hope this helps those looking for some explanation and example on the copy
feature of PostgreSQL.
PostgreSQL and LPAD
While porting my Oracle code to PostgreSQL I encountered a little quirk. It’s probably not a quirk except for the fact that I’ve worked in Oracle so long. Oracle implicitly type casts so well that we seldom notice.
PostgreSQL doesn’t work like Oracle. It does type cast sometimes, but not very often. I tried porting the following segment from my Oracle stored procedure to PostgreSQL:
/* Add account number with zeros. */
FOR j IN 1..50 LOOP
  INSERT INTO account_list
  VALUES
  ( lv_airport_code||'-'||LPAD(j,6,'0')
  , NULL
  , NULL
  , 1002
  , 1002 );
END LOOP;
Oracle implicitly casts the integer j to a text string before running the LPAD function. PostgreSQL doesn’t cast the integer to a text string before calling the LPAD function. Since the LPAD function isn’t overloaded with an (integer, integer, text) signature, the call fails at runtime, but the code doesn’t raise an exception when you compile it as a stored procedure.
At runtime, it raises the following error:
ERROR:  invalid input syntax for integer: "function lpad(integer, integer, unknown) does not exist"
CONTEXT:  PL/pgSQL function seed_account_list() line 48 at assignment
You can fix the code by explicitly casting the LPAD function’s first parameter to a text string, as follows:
/* Add account number with zeros. */
FOR j IN 1..50 LOOP
  INSERT INTO account_list
  VALUES
  ( lv_airport_code||'-'||LPAD(j::text,6,'0')
  , NULL
  , NULL
  , 1002
  , 1002 );
END LOOP;
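You can confirm the behavior from psql without the stored procedure; either the :: cast or an ANSI-style CAST works here (a quick sketch):

SELECT LPAD(1::text, 6, '0');
SELECT LPAD(CAST(1 AS text), 6, '0');

Both return 000001.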
This fixes the casting problem. As always, I hope this helps those looking for a solution.
Postgres Overloaded Routines
Earlier I showed how to write an anonymous block in PostgreSQL PL/pgSQL to drop routines, like functions and procedures. However, it would only work when they’re not overloaded functions or procedures. The following lets you drop all routines, including overloaded functions and procedures. Overloaded procedures are those that share the same name but have different parameter lists.
Before you can test the anonymous block, you need to create a set of overloaded functions or procedures. You can create a set of overloaded hello
procedures with the following syntax:
CREATE FUNCTION hello() RETURNS text AS
$$
DECLARE
  output VARCHAR;
BEGIN
  SELECT 'Hello World!' INTO output;
  RETURN output;
END
$$ LANGUAGE plpgsql;

CREATE FUNCTION hello(whom text) RETURNS text AS
$$
DECLARE
  output VARCHAR;
BEGIN
  SELECT CONCAT('Hello ',whom,'!') INTO output;
  RETURN output;
END
$$ LANGUAGE plpgsql;

CREATE FUNCTION hello(id int, whom text) RETURNS text AS
$$
DECLARE
  output VARCHAR;
BEGIN
  SELECT CONCAT('[',id,'] Hello ',whom,'!') INTO output;
  RETURN output;
END
$$ LANGUAGE plpgsql;
You can test the overloaded hello
function, like so from the videodb
schema:
videodb=> SELECT hello();
    hello
--------------
 Hello World!
(1 row)

videodb=> SELECT hello('Captain Marvel');
         hello
-----------------------
 Hello Captain Marvel!
(1 row)

videodb=> SELECT hello(1,'Captain America');
           hello
----------------------------
 [1] Hello Captain America!
(1 row)
Then, you can query the information_schema
to verify that you’ve created a set of overloaded procedures with the following query:
SELECT   proc.specific_schema AS procedure_schema
,        proc.specific_name
,        proc.routine_name AS procedure_name
,        proc.external_language
,        args.parameter_name
,        args.parameter_mode
,        args.data_type
FROM     information_schema.routines proc LEFT JOIN information_schema.parameters args
ON       proc.specific_schema = args.specific_schema
AND      proc.specific_name = args.specific_name
WHERE    proc.routine_schema NOT IN ('pg_catalog', 'information_schema')
AND      proc.routine_type IN ('FUNCTION','PROCEDURE')
ORDER BY procedure_schema
,        specific_name
,        procedure_name
,        args.ordinal_position;
It should return the following:
 procedure_schema | specific_name | procedure_name | external_language | parameter_name | parameter_mode | data_type
------------------+---------------+----------------+-------------------+----------------+----------------+-----------
 public           | hello_35451   | hello          | PLPGSQL           |                |                |
 public           | hello_35452   | hello          | PLPGSQL           | whom           | IN             | text
 public           | hello_35453   | hello          | PLPGSQL           | id             | IN             | integer
 public           | hello_35453   | hello          | PLPGSQL           | whom           | IN             | text
(4 rows)
The set session
command maps the videodb
catalog for the following anonymous block program.
SET SESSION "videodb.catalog_name" = 'videodb';
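You can verify the setting before running the block, because the block reads it back with the current_setting function:

SELECT current_setting('videodb.catalog_name');

It should return videodb.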
The following anonymous block lets you get rid of any ordinary or overloaded function and procedure:
DO
$$
DECLARE
  /* Declare an indefinite length string for SQL statement. */
  sql  VARCHAR;

  /* Declare variables to manage cursor return values. */
  row  RECORD;
  arg  VARCHAR;

  /* Declare parameter list. */
  list VARCHAR;

  /* Declare a routine cursor. */
  routine_cursor CURSOR FOR
    SELECT routine_name
    ,      specific_name
    ,      routine_type
    FROM   information_schema.routines
    WHERE  specific_catalog = current_setting('videodb.catalog_name')
    AND    routine_schema = 'public';

  /* Declare a parameter cursor. */
  parameter_cursor CURSOR (cv_specific_name varchar) FOR
    SELECT args.data_type
    FROM   information_schema.parameters args
    WHERE  args.specific_schema = 'public'
    AND    args.specific_name = cv_specific_name;

BEGIN
  /* Open the cursor. */
  OPEN routine_cursor;
  <<row_loop>>
  LOOP
    /* Fetch table names. */
    FETCH routine_cursor INTO row;

    /* Exit when no more records are found. */
    EXIT row_loop WHEN NOT FOUND;

    /* Initialize parameter list. */
    list := '(';

    /* Open the parameter cursor. */
    OPEN parameter_cursor(row.specific_name::varchar);
    <<parameter_loop>>
    LOOP
      FETCH parameter_cursor INTO arg;

      /* Exit the parameter loop. */
      EXIT parameter_loop WHEN NOT FOUND;

      /* Add parameter and delimit more than one parameter with a comma. */
      IF LENGTH(list) > 1 THEN
        list := CONCAT(list,',',arg);
      ELSE
        list := CONCAT(list,arg);
      END IF;
    END LOOP;

    /* Close the parameter list. */
    list := CONCAT(list,')');

    /* Close the parameter cursor. */
    CLOSE parameter_cursor;

    /* Concatenate together a DDL to drop the table with prejudice. */
    sql := 'DROP '||row.routine_type||' IF EXISTS '||row.routine_name||list;

    /* Execute the DDL statement. */
    EXECUTE sql;
  END LOOP;

  /* Close the routine_cursor. */
  CLOSE routine_cursor;
END;
$$;
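After running the block, re-running a count against the information_schema.routines view should show that the overloaded hello functions are gone; a quick check, assuming everything lived in the public schema:

SELECT COUNT(*)
FROM   information_schema.routines
WHERE  routine_schema = 'public'
AND    routine_type IN ('FUNCTION','PROCEDURE');

It should return a count of 0.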
Now, you possess the magic to automate cleaning up your schema when you combine this with my earlier post on dynamically dropping tables, sequences, and triggers.
DBeaver for PostgreSQL
I’m migrating my database classes from the Oracle database to the PostgreSQL database. Using the Oracle Express Edition has always required a virtualized image because students use Windows and Mac OS. Also, the university doesn’t like my use of a virtualized image, because virtualization imposes an incremental cost on students, who need higher-end laptops.
The available Docker images don’t typically support the Oracle Express Edition. That means there are licensing implications tied to Oracle.
As a committee, we decided to use PostgreSQL as our new database platform. We opted to use PostgreSQL over MySQL because it supports arrays and stored procedures. PostgreSQL’s PL/pgSQL is a solid implementation of stored procedures. While MySQL supports a stored procedure language, it effectively discourages using it.
We use the sqlplus, mysql, or psql Command-Line Interface (CLI) as the primary interface, because the CLI supports production-oriented deployment and DevOps practices. The Open Source DBeaver project provides a clean native GUI management console for PostgreSQL. DBeaver also supports other databases, which pgAdmin doesn’t.
You click the Download button from the DBeaver home page. It supports Windows, Mac OS, and Linux distributions. I downloaded the Linux RPM package to my Downloads directory. You can run the RPM with the following syntax:
rpm -ivh dbeaver-ce-6.2.3-stable.x86_64.rpm
It doesn’t check for dependencies. The installation is quick. You can verify that it’s installed with the following command:
which dbeaver
It installs here:
/usr/bin/dbeaver
If you failed to install the PostgreSQL JAR file, you’ll see an error message (screenshot omitted). On Linux, the DBeaver console will look like the following (screenshot omitted).
You should have an entry for the user in the pg_hba.conf file, like the following:
host    all    all    127.0.0.1/32    trust
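If you add or change that line, remember that PostgreSQL only reads pg_hba.conf at startup or on a reload; on my Fedora instance with PostgreSQL 11 that means something like:

systemctl restart postgresql-11.service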
As always, I hope this helps those solving problems.
Java and Postgres
I wanted to get Java working with PostgreSQL to test some GUI interfaces on Linux. Figuring out the necessary JAR file for JDBC was my first hurdle. I found it was the postgresql-42.2.5.jar file.
You can download it with the following command line:
wget https://jdbc.postgresql.org/download/postgresql-42.2.5.jar
I downloaded it to a Java directory off the /home/student directory. Then, I added the following CLASSPATH to a local java.env environment file.
#!/usr/bin/bash

# Set the Java CLASSPATH environment variable to include the JDBC jar file.
export set CLASSPATH=/home/student/Java/postgresql-42.2.5.jar:.
I sourced the java.env file and I wrote the following JavaTest.java program:
/* Import classes. */
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.logging.Level;
import java.util.logging.Logger;

public class JavaTest {
  public static void main(String[] args) {
    /* Set default strings for the connection. */
    String url = "jdbc:postgresql://localhost:5432/videodb";
    String user = "student";
    String password = "student";

    /* Try the connection and statement. */
    try {
      Connection conn = DriverManager.getConnection(url, user, password);
      Statement st = conn.createStatement();
      ResultSet rs = st.executeQuery("SELECT VERSION()");
      if (rs.next()) {
        System.out.println(rs.getString(1));
      }

      /* Close the result set and statement. */
      rs.close();
      st.close();
    } catch (SQLException e) {
      Logger logger = Logger.getLogger(JavaTest.class.getName());
      logger.log(Level.SEVERE, e.getMessage(), e);
    }
  }
}
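A minimal sketch of sourcing the environment file and then compiling and running the program looks like this, assuming java.env and JavaTest.java sit in the same directory:

. ./java.env
javac JavaTest.java
java JavaTest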
I compiled the JavaTest.java
program and tested it. It failed with the following error:
FATAL: Ident authentication failed for user - Unable to connect to PostgreSQL
The failure occurred because I hadn’t allowed the connection in PostgreSQL’s pg_hba.conf
file. I changed the following line in my pg_hba.conf
file:
host    all    all    127.0.0.1/32    ident
to
host    all    all    127.0.0.1/32    trust
Then, I restarted the postgresql-11
service, like this:
systemctl restart postgresql-11.service
The JavaTest
program ran successfully and returned:
PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), 64-bit
You can extend the logic to output a comma-separated value file by leveraging the JDBC help page, like this:
/* Import classes. */
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.logging.Level;
import java.util.logging.Logger;

/* Create class with only a static method for runtime testing. */
public class JavaReturnSet {
  public static void main(String[] args) {
    /* Set default strings for the connection. */
    String url = "jdbc:postgresql://localhost:5432/videodb";
    String user = "student";
    String password = "student";

    /* Try the connection and statement. */
    try {
      /* Set connection, statement, and result set. */
      Connection conn = DriverManager.getConnection(url, user, password);
      Statement st = conn.createStatement();

      /* Use the + to concatenate lines for query clarity. */
      ResultSet rs = st.executeQuery("SELECT m.account_number\n" +
                                     ", CONCAT(c.last_name,', ',c.first_name)\n" +
                                     "FROM member m JOIN contact c\n" +
                                     "ON m.member_id = c.member_id");

      /* Get query metadata for subsequent management of results. */
      ResultSetMetaData rsmd = rs.getMetaData();
      String line;

      while (rs.next()) {
        /* Initialize the line output for each row. */
        line = "";

        /* Process the columns. */
        for (int i = 0; i < rsmd.getColumnCount(); i++) {
          if (rsmd.getColumnType(i + 1) == 12) {
            line = line + "\"" + rs.getString(i + 1) + "\"";
          }
          else {
            line = line + rs.getInt(i + 1);
          }

          /* Put a comma between output columns. */
          if (i < rsmd.getColumnCount() - 1) {
            line = line + ",";
          }
        }
        System.out.println(line);
      }

      /* Close the result set and statement. */
      rs.close();
      st.close();
    } catch (SQLException e) {
      Logger logger = Logger.getLogger(JavaReturnSet.class.getName());
      logger.log(Level.SEVERE, e.getMessage(), e);
    }
  }
}
As always, I hope this helps those looking for a clean solution.
Postgres Foreign Constraints
You can’t disable a foreign key constraint in Postgres, like you can do in Oracle. However, you can remove the foreign key constraint from a column and then re-add it to the column.
Here’s a quick test case in five steps:
- Drop the big and little tables if they exist. The first drop statement requires a cascade because there is a dependent little table that holds a foreign key constraint against the primary key column of the big table. The second drop statement does not require the cascade keyword because there is no dependent foreign key constraint.

  DROP TABLE IF EXISTS big CASCADE;
  DROP TABLE IF EXISTS little;
- Create the big and little tables:

  -- Create the big table.
  CREATE TABLE big
  ( big_id    SERIAL
  , big_text  VARCHAR(20) NOT NULL
  , CONSTRAINT pk_little_1 PRIMARY KEY (big_id));

  -- Display the big table.
  \d big

  -- Create little table.
  CREATE TABLE little
  ( little_id    SERIAL
  , big_id       INTEGER NOT NULL
  , little_text  VARCHAR(20) NOT NULL
  , CONSTRAINT fk_little_1 FOREIGN KEY (big_id) REFERENCES big (big_id));

  -- Display the little table.
  \d little
  If you failed to designate the big_id column as a primary key, Postgres will raise the following exception:

  ERROR:  there is no unique constraint matching given keys for referenced table "big"
- Insert a non-compliant row in the little table. An insert statement into the little table with a value for the foreign key column that does not exist in the big_id column of the big table would fail with the following error:

  ERROR:  insert or update on table "little" violates foreign key constraint "fk_little_1"
  DETAIL:  Key (big_id)=(2) is not present in table "big".
  With the foreign key constraint in place, the insert statement succeeds after you first insert a new row into the big table whose primary key matches the foreign key value intended for the little table. The following two insert statements add a row to both the big and little tables:

  -- Insert into a big table.
  INSERT INTO big (big_text) VALUES ('Cat in the Hat 2');

  -- Insert into a little table.
  INSERT INTO little (big_id, little_text) VALUES ( 2,'Thing 3');
Then, you can query it like this:
SELECT * FROM big b JOIN little l ON b.big_id = l.big_id;
   big_id |     big_text     | little_id | big_id | little_text
  --------+------------------+-----------+--------+-------------
        1 | Cat in the Hat 1 |         1 |      1 | Thing 1
        1 | Cat in the Hat 1 |         2 |      1 | Thing 2
        2 | Cat in the Hat 2 |         3 |      2 | Thing 3
  (3 rows)
- You can drop a foreign key constraint with the following syntax:
ALTER TABLE little DROP CONSTRAINT fk_little_1;
- You can add a foreign key constraint with the following syntax:
ALTER TABLE little ADD CONSTRAINT fk_little_1 FOREIGN KEY (big_id) REFERENCES big (big_id);
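If you want to confirm the constraint is back, a quick query against the pg_constraint catalog works; this is just a sketch:

SELECT conname
,      contype
FROM   pg_constraint
WHERE  conrelid = 'little'::regclass;

The fk_little_1 row should reappear with a contype of f.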
As always, I hope this helps you solve problems.
Postgres Remove Constraints
You can’t disable a not null constraint in Postgres, like you can do in Oracle. However, you can remove the not null constraint from a column and then re-add it to the column.
Here’s a quick test case in four steps:
- Drop a demo table if it exists:
DROP TABLE IF EXISTS demo;
- Create the demo table:

  CREATE TABLE demo
  ( demo_id    SERIAL
  , demo_text  VARCHAR(20) NOT NULL );
- Insert a compliant row in the demo table:

  INSERT INTO demo (demo_text) VALUES ('Thing 1');
  Attempt to insert another row with a null value in the demo_text column:

  INSERT INTO demo (demo_text) VALUES (NULL);
It raises the following error:
  INSERT 0 1
  psql:remove_not_null.sql:22: ERROR:  null value in column "demo_text" violates not-null constraint
  DETAIL:  Failing row contains (2, null).
- You can drop the not null constraint from the demo_text column:

  ALTER TABLE demo ALTER COLUMN demo_text DROP NOT NULL;
  You can now successfully insert a row with a demo_text column value of null. After you have performed your table maintenance, you can add the not null constraint back onto the demo_text column.

  You need to update any row with a null value in the demo_text column to a valid value before you re-add the not null constraint. The following update statement replaces the null value with a text string:

  UPDATE demo SET demo_text = 'Thing 2' WHERE demo_text IS NULL;
  Now, you can change the demo_text column back to a not null constrained column with the following syntax:

  ALTER TABLE demo ALTER COLUMN demo_text SET NOT NULL;
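You can confirm the column is constrained again with a query against the information_schema.columns view; a quick check:

SELECT column_name
,      is_nullable
FROM   information_schema.columns
WHERE  table_name = 'demo';

The demo_text column should report NO in the is_nullable column.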
While you cannot defer the constraint, removing it and adding it back works well.
Postgres Check Constraints
The Postgres 11 database documentation says that it supports naming constraints. While you can create a table with named constraints inside the CREATE TABLE
statement, the names are not assigned to the not null check
constraint.
Here’s a quick test case in three steps:
- Drop a demo table if it exists:
DROP TABLE IF EXISTS demo;
- Create the demo table with a named not null constraint:

  CREATE TABLE demo
  ( demo_id    SERIAL
  , demo_text  VARCHAR(20) CONSTRAINT nn_demo_1 NOT NULL );
- Query the check constraint name from the information_schema:

  SELECT substr(check_clause,1,strpos(check_clause,' ')-1) AS check_column
  ,      constraint_name
  FROM   information_schema.check_constraints
  WHERE  check_clause LIKE 'demo_text%';
  You should see the following output with a parsed check_column name and the system-generated check constraint name rather than the nn_demo_1 constraint name:

   check_column |    constraint_name
  --------------+-----------------------
   demo_text    | 2200_18896_2_not_null
  (1 row)
On the bright side, you can name primary key and foreign key constraints.
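For example, a hypothetical parent and child pair shows the pattern; the names you give primary and foreign key constraints are kept, unlike the not null constraint name above:

CREATE TABLE parent
( parent_id   SERIAL
, CONSTRAINT pk_parent_1 PRIMARY KEY (parent_id));

CREATE TABLE child
( child_id    SERIAL
, parent_id   INTEGER NOT NULL
, CONSTRAINT pk_child_1 PRIMARY KEY (child_id)
, CONSTRAINT fk_child_1 FOREIGN KEY (parent_id) REFERENCES parent (parent_id));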
Python-Postgres Query
As I committed to a student, here are sample programs for writing a Python query against the Postgres 11 database. The first returns rows as tuples. The second formats the returned values into text columns.
If you’re one of many looking for the key psycopg2 driver library, you can find it in most distro repositories as python3-psycopg2. You can use dnf or yum to install it separately, or you can install pgadmin4, which includes the psycopg2 library.
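On Fedora, for example, the install is a one-line dnf command run as root; a sketch:

dnf install -y python3-psycopg2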
The first example returns the entire row from a new_hire
table with two rows:
import psycopg2

try:
  # Open a connection to the database.
  connection = psycopg2.connect( user="student"
                               , password="student"
                               , port="5432"
                               , dbname="videodb")

  # Open a cursor.
  cursor = connection.cursor()

  # Assign a static query.
  query = "SELECT * FROM new_hire"

  # Parse and execute the query.
  cursor.execute(query)

  # Fetch all rows from a table.
  records = cursor.fetchall()

  # Read through and print the rows as tuples.
  for row in range(0, len(records)):
    print(records[row])

except (Exception, psycopg2.Error) as error :
  print("Error while fetching data from PostgreSQL", error)

finally:
  # Close the database connection.
  if (connection):
    cursor.close()
    connection.close()
The first example returns the rows as tuples, which is probably desired if you want to consume the result in another Python program. Here’s the output retrieved:
(1001, 'Malcolm', 'Jacob', 'Lewis', datetime.date(2018, 2, 14))
(1002, 'Henry', None, 'Chabot', datetime.date(1990, 7, 31))
The second one returns the rows and formats the columns into output for a csv
style file:
import psycopg2

try:
  # Open a connection to the database.
  connection = psycopg2.connect( user="student"
                               , password="student"
                               , port="5432"
                               , dbname="videodb")

  # Open a cursor.
  cursor = connection.cursor()

  # Assign a static query.
  query = "SELECT * FROM new_hire"

  # Parse and execute the query.
  cursor.execute(query)

  # Read through and print the formatted columns of each row.
  for (new_hire_id, first_name, middle_name, last_name, hire_date) in cursor:
    if (isinstance(middle_name, type(None))):
      print("{},'{} {}','{:%d-%b-%Y}'".format(new_hire_id, first_name, last_name, hire_date))
    else:
      print("{},'{} {} {}','{:%d-%b-%Y}'".format(new_hire_id, first_name, middle_name, last_name, hire_date))

except (Exception, psycopg2.Error) as error :
  print("Error while fetching data from PostgreSQL", error)

finally:
  # Close the database connection.
  if (connection):
    cursor.close()
    connection.close()
It returns the following csv style output:
1001,'Malcolm Jacob Lewis','14-Feb-2018'
1002,'Henry Chabot','31-Jul-1990'
As always, I hope these help those looking for a starting place with Python and Postgres.
Cassandra on Fedora 30
The first thing to do with Fedora 30 is to check which parts of Apache Cassandra are installed. You can use the following rpm command to determine that:
rpm -qa | grep cassandra
My Fedora 30 returned the following values:
cassandra-java-libs-3.11.1-12.fc30.x86_64
cassandra-python2-cqlshlib-3.11.1-12.fc30.x86_64
cassandra-3.11.1-12.fc30.x86_64
python2-cassandra-driver-3.18.0-1.fc30.x86_64
Notably missing from the rpm list is the cassandra-server package. You install the cassandra-server package with the dnf utility:
dnf install -y cassandra-server
You should get an installation log like the following for the cassandra-server
package:
Last metadata expiration check: 0:26:07 ago on Wed 11 Sep 2019 09:10:08 PM MDT. Package cassandra-3.11.1-12.fc30.x86_64 is already installed. Dependencies resolved. =========================================================================================================================== Package Architecture Version Repository Size =========================================================================================================================== Installing: cassandra-server x86_64 3.11.1-12.fc30 fedora 180 k Installing dependencies: sigar x86_64 1.6.5-0.20.git58097d9.fc27 fedora 76 k Transaction Summary =========================================================================================================================== Install 2 Packages Total download size: 255 k Installed size: 738 k Is this ok [y/N]: y Downloading Packages: (1/2): sigar-1.6.5-0.20.git58097d9.fc27.x86_64.rpm 131 kB/s | 76 kB 00:00 (2/2): cassandra-server-3.11.1-12.fc30.x86_64.rpm 233 kB/s | 180 kB 00:00 --------------------------------------------------------------------------------------------------------------------------- Total 116 kB/s | 255 kB 00:02 Running transaction check Transaction check succeeded. Running transaction test Transaction test succeeded. Running transaction Preparing : 1/1 Installing : sigar-1.6.5-0.20.git58097d9.fc27.x86_64 1/2 Running scriptlet: sigar-1.6.5-0.20.git58097d9.fc27.x86_64 1/2 Running scriptlet: cassandra-server-3.11.1-12.fc30.x86_64 2/2 Installing : cassandra-server-3.11.1-12.fc30.x86_64 2/2 Running scriptlet: cassandra-server-3.11.1-12.fc30.x86_64 2/2 Verifying : cassandra-server-3.11.1-12.fc30.x86_64 1/2 Verifying : sigar-1.6.5-0.20.git58097d9.fc27.x86_64 2/2 Installed: cassandra-server-3.11.1-12.fc30.x86_64 sigar-1.6.5-0.20.git58097d9.fc27.x86_64 Complete! |
Fedora Magazine has a great Get Started with Apache Cassandra on Fedora article on all the steps required to setup clusters. This article only covers creating and enabling the Cassandra service, and setting up a single node Cassandra instance.
You start Cassandra with the following command as the root
user:
systemctl start cassandra
You enable Cassandra with the following command as the root user:
systemctl enable cassandra
It creates the following symlink:
Created symlink /etc/systemd/system/multi-user.target.wants/cassandra.service → /usr/lib/systemd/system/cassandra.service.
You can connect to the Test cluster with the following command:
cqlsh
You should see the following:
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
You can see the options by typing the help
command:
Documented shell commands: =========================== CAPTURE CLS COPY DESCRIBE EXPAND LOGIN SERIAL SOURCE UNICODE CLEAR CONSISTENCY DESC EXIT HELP PAGING SHOW TRACING CQL help topics: ================ AGGREGATES CREATE_KEYSPACE DROP_TRIGGER TEXT ALTER_KEYSPACE CREATE_MATERIALIZED_VIEW DROP_TYPE TIME ALTER_MATERIALIZED_VIEW CREATE_ROLE DROP_USER TIMESTAMP ALTER_TABLE CREATE_TABLE FUNCTIONS TRUNCATE ALTER_TYPE CREATE_TRIGGER GRANT TYPES ALTER_USER CREATE_TYPE INSERT UPDATE APPLY CREATE_USER INSERT_JSON USE ASCII DATE INT UUID BATCH DELETE JSON BEGIN DROP_AGGREGATE KEYWORDS BLOB DROP_COLUMNFAMILY LIST_PERMISSIONS BOOLEAN DROP_FUNCTION LIST_ROLES COUNTER DROP_INDEX LIST_USERS CREATE_AGGREGATE DROP_KEYSPACE PERMISSIONS CREATE_COLUMNFAMILY DROP_MATERIALIZED_VIEW REVOKE CREATE_FUNCTION DROP_ROLE SELECT CREATE_INDEX DROP_TABLE SELECT_JSON |
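Before running the script below, you can check which keyspaces already exist with the DESCRIBE command from the list above:

DESCRIBE KEYSPACES;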
Here’s my script that creates a Cassandra keyspace, which is more or less a database. You use the USE command to connect to the keyspace or database, like you would in MySQL. You do not have sequences in Cassandra because they’re not a good fit for a distributed architecture. Cassandra also does not support a native procedural extension like relational databases do. You must create user-defined functions (UDFs) by embedding the logic in Java.
This script does the following:
- Creates a keyspace
- Uses the keyspace
- Conditionally drops tables and functions
- Creates two tables
- Inserts data into the two tables
- Queries data from the tables
I also included a call to a UDF inside a query in two of the examples. One of the queries demonstrates how to return a JSON structure from a query. To simplify things and clarify the script’s behavior, the details are outlined below.
- The first segment of the script creates the keyspace, changes the scope to use the keyspace, conditionally drops tables, creates tables, and inserts values into the tables:
  /* Create a keyspace in Cassandra, which is like a database
     in MySQL or a schema in Oracle. */
  CREATE KEYSPACE IF NOT EXISTS student
    WITH REPLICATION = {
       'class':'SimpleStrategy'
      ,'replication_factor': 1 }
    AND DURABLE_WRITES = true;

  /* Use the keyspace or connect to the database. */
  USE student;

  /* Drop the member table from the student keyspace. */
  DROP TABLE IF EXISTS member;

  /* Create a member table in the student keyspace. */
  CREATE TABLE member
  ( member_number       VARCHAR
  , member_type         VARCHAR
  , credit_card_number  VARCHAR
  , credit_card_type    VARCHAR
  , PRIMARY KEY ( member_number ));

  /* Conditionally drop the contact table from the student keyspace. */
  DROP TABLE IF EXISTS contact;

  /* Create a contact table in the student keyspace. */
  CREATE TABLE contact
  ( contact_number      VARCHAR
  , contact_type        VARCHAR
  , first_name          VARCHAR
  , middle_name         VARCHAR
  , last_name           VARCHAR
  , member_number       VARCHAR
  , PRIMARY KEY ( contact_number ));

  /* Insert a row into the member table. */
  INSERT INTO member
  ( member_number, member_type, credit_card_number, credit_card_type )
  VALUES
  ('SFO-12345','GROUP','2222-4444-5555-6666','VISA');

  /* Insert a row into the contact table. */
  INSERT INTO contact
  ( contact_number, contact_type, first_name, middle_name, last_name, member_number )
  VALUES
  ('CUS_00001','FAMILY','Barry', NULL,'Allen','SFO-12345');

  /* Insert a row into the contact table. */
  INSERT INTO contact
  ( contact_number, contact_type, first_name, middle_name, last_name, member_number )
  VALUES
  ('CUS_00002','FAMILY','Iris', NULL,'West-Allen','SFO-12345');

  /* Insert a row into the member table. */
  INSERT INTO member
  ( member_number, member_type, credit_card_number, credit_card_type )
  VALUES
  ('SFO-12346','GROUP','3333-8888-9999-2222','VISA');

  /* Insert a row into the contact table. */
  INSERT INTO contact
  ( contact_number, contact_type, first_name, middle_name, last_name, member_number )
  VALUES
  ('CUS_00003','FAMILY','Caitlin','Marie','Snow','SFO-12346');
The following queries the member table:
  /* Select all columns from the member table. */
  SELECT * FROM member;
It returns the following:
   member_number | credit_card_number  | credit_card_type | member_type
  ---------------+---------------------+------------------+-------------
   SFO-12345     | 2222-4444-5555-6666 | VISA             | GROUP
   SFO-12346     | 3333-8888-9999-2222 | VISA             | GROUP
- Create a concatenate user-defined function (UDF) for Cassandra. The first step requires you to edit the cassandra.yaml file, which you find in the /etc/cassandra/default.conf directory. There is a single parameter that you need to edit, and it is the enable_user_defined_functions parameter. By default the parameter is set to false, and you need to enable it to create UDFs.

  If you open the cassandra.yaml file as the root user, you should find the parameter on line 987, like:

  983   # If unset, all GC Pauses greater than gc_log_threshold_in_ms will log at
  984   # INFO level
  985   # UDFs (user defined functions) are disabled by default.
  986   # As of Cassandra 3.0 there is a sandbox in place that should prevent execution of evil code.
  987   enable_user_defined_functions: false
  After you make the edit, the cassandra.yaml file should look like this:

  983   # If unset, all GC Pauses greater than gc_log_threshold_in_ms will log at
  984   # INFO level
  985   # UDFs (user defined functions) are disabled by default.
  986   # As of Cassandra 3.0 there is a sandbox in place that should prevent execution of evil code.
  987   enable_user_defined_functions: true
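  The new value only takes effect after Cassandra rereads its configuration; restarting the service as the root user, as shown earlier, is the simplest way on a single-node install:

  systemctl restart cassandra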
  After you make the change, you can create your own UDF. The following UDF formats the first, middle, and last name so there’s only one whitespace between the first and last name when the middle name value is null.
  This type of function must use a CALLED ON NULL INPUT clause in lieu of a RETURNS NULL ON NULL INPUT clause. The latter would force the function to return a null value if any one of the parameters were null.

  /* Drop the concatenate function because a replace disallows changing a
     RETURNS NULL ON NULL INPUT with a CALLED ON NULL INPUT without raising
     an "89: InvalidRequest" exception. */
  DROP FUNCTION concatenate;

  /* Create a user-defined function to concatenate names. */
  CREATE OR REPLACE FUNCTION concatenate
  (first_name VARCHAR, middle_name VARCHAR, last_name VARCHAR)
  CALLED ON NULL INPUT
  RETURNS VARCHAR
  LANGUAGE java AS
  $$
    /* Concatenate first and last names when middle name is null, and
       first, middle, and last names when middle name is not null. */
    String name;

    /* Check for null middle name. */
    if (middle_name == null) {
      name = first_name + " " + last_name; }
    else {
      name = first_name + " " + middle_name + " " + last_name; }

    return name;
  $$;
Query the values from the contact
table with the UDF function in the SELECT
-list:
/* Query the contact information. */
SELECT member_number
,      contact_number
,      contact_type
,      concatenate(first_name, middle_name, last_name) AS full_name
FROM   contact;
It returns the following:
 member_number | contact_number | contact_type |      full_name
---------------+----------------+--------------+--------------------
 SFO-12345     | CUS_00001      | FAMILY       | Barry Allen
 SFO-12345     | CUS_00002      | FAMILY       | Iris West-Allen
 SFO-12346     | CUS_00003      | FAMILY       | Caitlin Marie Snow
Query the values from the contact
table with a JSON format:
/* Query the contact information and return in a JSON format. */
SELECT JSON contact_number
,           contact_type
,           concatenate(first_name, middle_name, last_name) AS full_name
FROM        contact;
It returns the following:
 [json]
-------------------------------------------------------------------------------------------------
 {"contact_number": "CUS_00001", "contact_type": "FAMILY", "full_name": "Barry Allen"}
 {"contact_number": "CUS_00002", "contact_type": "FAMILY", "full_name": "Iris West-Allen"}
 {"contact_number": "CUS_00003", "contact_type": "FAMILY", "full_name": "Caitlin Marie Snow"}