Basic Programming | MacLochlainns Weblog

My students working in Linux would have a series of labs to negotiate and I’d have them log the activities of their Oracle SQL scripts. Many of them would suffer quite a bit because they didn’t know how to find the errors in the log files.

I wrote this bash function for them to put in their .bashrc files. It searches all the .txt files in the current directory for errors and organizes them by log file, line number, and descriptive error message.

errors () 
{ 
  # Default to .txt log files when no extension is passed.
  ext="${1:-txt}";
  label="File Name:Line Number:Error Code";
  list=`ls ./*.${ext} 2>/dev/null | wc -l`;
  if [[ ${list} -ge 1 ]]; then
    echo ${label};
    echo "----------------------------------------";
    # -H prints the file name even when only one log file exists.
    grep --color=auto -Hin -e 'ora-' -e 'pls-' -e 'sp2-' ./*.${ext};
  fi
}
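
To see the kind of output this search produces, here’s a small self-contained sketch. The lab1.txt file name and its contents are made up for the demonstration, and the sketch calls grep directly rather than the function, so treat it as an illustration and not as part of the .bashrc setup.

```shell
# Build a throwaway log file (hypothetical content) in a temporary directory.
logdir=$(mktemp -d)
printf '%s\n' \
  'SQL> SELECT * FROM missing_table;' \
  'ORA-00942: table or view does not exist' \
  'SQL> @nofile.sql' \
  'SP2-0310: unable to open file "nofile.sql"' > "${logdir}/lab1.txt"

# -H prints the file name, -i ignores case, and -n prints the line number.
matches=$(cd "${logdir}" && grep -Hin -e 'ora-' -e 'pls-' -e 'sp2-' *.txt)
echo "${matches}"

# Clean up the temporary directory.
rm -rf "${logdir}"
```

Each line of output has the file name, the line number, and the error text separated by colons, which is the layout the label describes.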

I hope it helps others now too.

While teaching a class on the Linux Command-Line (CLI), the book gave an example of generating a list of random US telephone numbers into a file. The book uses the RANDOM variable to generate segments of the telephone number, and then the grep command to identify malformed telephone numbers.

My students wanted me to explain why the numbers were malformed. I had to explain that the RANDOM variable returns a random integer between 0 and 32,767. That means it may return a 1- to 5-digit number, so you may get a 1-digit or 2-digit number when you request a 3-digit random number, or a 1- to 3-digit number when you request a 4-digit random number.
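
If you want to check the range for yourself, here’s a minimal sketch that samples RANDOM in a loop. The variable names and the sample size of 2,000 are my own choices for the demonstration, and because the values are random the numbers you see will differ from run to run.

```shell
# Sample RANDOM and record the largest value and the fewest digits seen.
max=0
shortest=5
for i in {1..2000}; do
  n=${RANDOM}
  if (( n > max )); then max=${n}; fi
  if (( ${#n} < shortest )); then shortest=${#n}; fi
done

echo "largest value: ${max} (bash never returns more than 32767)"
echo "fewest digits: ${shortest}"
```

Any sample with fewer digits than you slice off with ${RANDOM:0:3} or ${RANDOM:0:4} produces a short segment, which is what the grep command in the book catches.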

The author’s example is:

for i in {1..10}; do
  echo "(${RANDOM:0:3}) ${RANDOM:0:3}-${RANDOM:0:4}" >> list.txt
done

They asked if there was a way to write a shell script that guaranteed random but well-formed US telephone numbers. I said yes; one way is to write a recursive bash shell function and assign the result to a global variable set in the shell script.

They seemed doubtful, so I wrote it for them. Here’s the script if you’re interested in learning more about bash shell scripting. While I implemented it with a bash array, that’s optional.

#!/usr/bin/bash
 
# ============================================================
#  Name:   telephone.sh
#  Author: Michael McLaughlin
#  Date:   05-May-2020
# ------------------------------------------------------------
#  Purpose: Demonstrate how to generate random telephone
#           numbers. The RANDOM variable returns a random
#           number between 0 and 32767; and while you can
#           easily shave off an extra digit, guaranteeing
#           a value above 100 is impossible without logic.
# ============================================================
 
targetLength()
{
  # Declare variable in function-level scope.
  randomString=''
 
  # Check the number of parameters to process.
  if [[ ${#} = 2 ]]; then
    # Assign value to function-level and local variables.
    randomString=${1}
    formatLength=${2}
 
    # Get the length of the telephone number as integer.
    length=`echo -n ${randomString} | wc -c`
 
    # Calculate any shortfall.
    short=$((${formatLength}-${length}))
 
    # Check if the telephone number is too short.
    if [[ ${short} -gt 0 ]]; then
      randomString=`echo "${randomString}${RANDOM:0:${short}}"`
    fi
  fi
 
  # Check if the combination of random numbers equals the target length
  # and assign the value to the global variable, or repeat processing
  # by making a recursive function call.
  if [[ `echo -n ${randomString} | wc -c` = ${formatLength} ]]; then
    result=${randomString}
  else
    targetLength ${randomString} ${formatLength}
  fi
}
 
# Declare global variable to support targetLength().
result=''
 
# Declare an indexed array of strings.
declare -a telephone_parts
 
# Generate one hundred random telephone numbers.
for i in {1..100}; do
  # Create random three digit area code.
  targetLength ${RANDOM:0:3} 3
  telephone_parts[1]=${result}
 
  # Create random three digit prefix code.
  targetLength ${RANDOM:0:3} 3
  telephone_parts[2]=${result}
 
  # Create random four digit number code.
  targetLength ${RANDOM:0:4} 4
  telephone_parts[3]=${result}
 
  # Print the telephone numbers.
  echo "[${i}] (${telephone_parts[1]}) ${telephone_parts[2]}-${telephone_parts[3]}"
done


For reference, a recursive function call isn’t required here. It could be done more effectively with the following while loop:

targetLength()
{
  # Declare variable in function-level scope.
  randomString=''
  short=1
 
  # Check the number of parameters to process.
  if [[ ${#} = 2 ]]; then
    # Assign value to function-level and local variables.
    randomString=${1}
    formatLength=${2}
 
    # Check if the telephone number is too short.
    while [[ ${short} -gt 0 ]]; do
      # Get the length of the telephone number as integer.
      length=`echo -n ${randomString} | wc -c`
 
      # Calculate any shortfall.
      short=$((${formatLength}-${length}))
 
      # Assign new value to randomString.
      randomString=`echo "${randomString}${RANDOM:0:${short}}"`
    done
 
    # Assign randomString to global result variable.
    result=${randomString}
  fi
}
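
For comparison, here’s a sketch of a third approach that needs neither recursion nor a loop. It’s a different technique from the one above: it uses modulo arithmetic to bound each segment and printf to zero-pad it. The phoneNumber function name is hypothetical, and I’ve assumed that the area code and prefix should fall between 100 and 999 while the last four digits may start with a zero.

```shell
# Hypothetical helper: build one well-formed telephone number.
# RANDOM % 900 + 100 always yields a value from 100 to 999, and
# %04d left-pads the last segment with zeros to four digits.
phoneNumber()
{
  printf '(%03d) %03d-%04d\n' \
    $((RANDOM % 900 + 100)) $((RANDOM % 900 + 100)) $((RANDOM % 10000))
}

# Print five random telephone numbers.
for i in {1..5}; do
  phoneNumber
done
```

The trade-off is that the digits no longer come straight from slices of RANDOM, but every number has the right length on the first try.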

As always, I hope this helps those who want to learn or solve a problem.

It’s possible folks didn’t notice, but Mac OS X no longer includes XQuartz by default from Mavericks forward. You need to download XQuartz and install it. I’d recommend doing that after you install Xcode.

Launch XQuartz and then either use the bash shell it opens or open a Terminal bash shell session. Inside the shell, you might start Secure Shell (ssh) like this:

Mac-Pro-3:~ michaelmclaughlin$ ssh student@192.168.2.170
student@192.168.2.170's password: 
Last login: Thu Jun  4 14:33:37 2015
[student@localhost ~]$ xclock &
[1] 10422
[student@localhost ~]$ Error: Can't open display:

Granted, that’s a trivial error, and running the xclock X11 application isn’t crucial. A more important example of the same failure is the following from Oracle’s old Designer/2000 application:

FRM-91111: Internal Error: window system startup failure.
FRM-10039: Unable to start up the Form Builder.

This is the desired behavior. Secure shell (ssh) can’t run it unless you make the connection with the -Y flag. You should use the following syntax:

Mac-Pro-3:~ michaelmclaughlin$ ssh -Y student@192.168.2.170
student@192.168.2.170's password: 
Last login: Tue Jun  9 14:56:55 2015 from 192.168.2.1
/usr/bin/xauth:  file /home/student/.Xauthority does not exist
[student@localhost ~]$ xclock &
[1] 10760

You can safely ignore the .Xauthority does not exist warning message because it’ll create a .Xauthority file and store the magic cookie after the warning message. You should see the xclock program running in the upper left hand corner of your console.

It’s terrific that you don’t get a font warning like you typically would using UTF-8 on Linux. Nice that the Mac OS fonts are so well done that there isn’t a raised exception.
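
A quick way to tell whether X11 forwarding is active before you launch anything is to look at the DISPLAY variable on the remote system. Here’s a minimal sketch; the checkDisplay function name is hypothetical, and the exact value you see (typically something like localhost:10.0 over ssh) depends on your setup.

```shell
# Hypothetical helper: report whether this shell has an X11 display to use.
checkDisplay()
{
  if [[ -n "${DISPLAY:-}" ]]; then
    echo "DISPLAY is set to ${DISPLAY}"
  else
    echo "DISPLAY is empty; reconnect with ssh -Y" >&2
    return 1
  fi
}
```

You could then run checkDisplay && xclock & so that xclock only starts when a display is available.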

Using xclock or xeyes isn’t very useful as a rule, but this method also lets you run any of the Linux GUI applications. For example, the gedit command lets you run the gedit utility from a Mac OS console. If you’ve installed the gedit plug-ins, you also can use the Terminal console on the remote system.

The process sequence for the command-line is shown below:

PID   PPID  CMD                   - Description
1030     1  /usr/sbin/sshd -D     - The root process launches the ssh daemon
3145  1030  sshd: student [priv]  - The sshd launches a ssh session to manage a student ssh session
3152  3145  sshd: student@pts/1   - The ssh session launched to manage the ssh session
3166  3152  -bash                 - The bash shell launched by connecting through the ssh session
3240  3166  gedit                 - The gedit command issued inside a ssh session
3166  3240  gnome-pty-helper      - Launching the gedit session across X11 
3169  3240  /bin/bash             - Launching the Terminal session inside the gedit session across X11
3269  3884  ps -ef                - Command run inside the gedit Terminal session

Hope that helps those who want to use X11 applications on the Mac OS.

Student questions are always interesting! They get me to think and to write. The question this time is: “How do I write a Bash Shell script to process multiple MySQL script files?” This post builds the following model (courtesy of MySQL Workbench) by using a bash shell script and MySQL script files, but there’s a disclaimer on this post. It shows both insecure and secure approaches and you should avoid the insecure ones.

It seems a quick refresher on how to use arrays in bash shell may be helpful. While it’s essential in a Linux environment, it seems not everyone masters the bash shell.

That’s especially true since I checked my copy of Learning the Bash Shell (2nd Edition) and found a typo on page 161 in how you handle arrays in the bash shell, and it’s a mistake that could hang newbies up. Perhaps I should update my copy because I bought it in 1998. 😉 It was good then, and the new edition is probably better. The error is probably corrected in the current Learning the Bash Shell, but if not, the following examples show you how to use arrays in loops.

Naturally, these examples presume some knowledge of working with the bash shell: that the first line is always the same in any bash shell script; that you open an if-statement with an if and close it with a fi; that else-if is elif; and that a semicolon between a for-statement and the do statement is required when they’re on the same line because they’re two statements.

If you’re new to bash shell arrays, the brief tutorial below takes you through three progressive examples of working with bash arrays.

Working with bash Arrays

A basic example of working with an array in bash shell is the following list1.sh script:

#!/usr/bin/bash
 
# Define an array.
declare -a cmd=("one" "two" "three")
 
# Call the array elements.
for i in ${cmd[*]}; do
  echo ${i}
done

Line 4 declares the cmd array by assigning three strings. Line 7 returns the elements of the array to the ${i} variable, which lets you manage them one at a time. You use the chmod command to make the list1.sh executable, like this:

chmod 755 list1.sh

Then, you can run it like this from the present working directory (pwd):

./list1.sh

It should print:

one
two
three

The list2.sh example changes the cmd array declaration from list1.sh. It declares the cmd array as an empty array, and then it assigns elements by index numbers (using a zero-based index), as shown below:

#!/usr/bin/bash
 
# Define an array.
declare -a cmd
 
# Assign elements to an array.
cmd[0]="one"
cmd[1]="two"
cmd[2]="three"
 
# Call the array elements.
for i in ${cmd[*]}; do
  echo ${i}
done

Lines 7 through 9 assign values to the elements of the cmd array. You would chmod the file, and run the file as qualified above for the list1.sh script.
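
One caveat about the loops in these examples: the unquoted ${cmd[*]} works because none of the elements contain a space. Here’s a short sketch of the difference, using a made-up element with a space in it. The counter variable names are mine, and the sketch assumes the default IFS.

```shell
# Define an array where one element contains a space.
declare -a cmd=("one" "two words" "three")

# Unquoted ${cmd[*]} is split on whitespace, so this loop runs four times.
unquoted=0
for i in ${cmd[*]}; do
  unquoted=$((unquoted + 1))
done

# Quoted "${cmd[@]}" keeps each element whole, so this loop runs three times.
quoted=0
for i in "${cmd[@]}"; do
  quoted=$((quoted + 1))
done

echo "unquoted: ${unquoted}, quoted: ${quoted}"
```

If your array may ever hold file names or strings with spaces, the quoted "${cmd[@]}" form is the safer habit.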

The last pre-implementation example requires that you create three demonstration scripts, the one.sh, two.sh, and three.sh scripts. You should put them in the same directory as the list3.sh script.

The demonstration scripts should all have the same code, like this:

#!/usr/bin/bash
 
# Print script name.
echo $0

Line 4 returns command line parameter $0 or ${0}, which is always the command line program’s file name. The file name may be provided as a relative or absolute file name, and if that’s new to you please check out The Linux Command Line: A Complete Introduction (also downloadable as a PDF for free).

The list3.sh script should contain the following:

#!/usr/bin/bash
 
# Define an array.
declare -a cmd
 
# Assign elements to an array.
cmd[0]="one.sh"
cmd[1]="two.sh"
cmd[2]="three.sh"
 
# Call the array elements.
for i in ${cmd[*]}; do
  `pwd`/${i}
done

When you run the list3.sh script from the /home/student/Code/bash directory with a local syntax, the script should return the fully qualified file names of the subshell programs. The output should look like this:

/home/student/Code/bash/one.sh
/home/student/Code/bash/two.sh
/home/student/Code/bash/three.sh

The list3.sh script provides the present working directory (pwd) and the one.sh, two.sh, and three.sh scripts return only their executable name. For example, if you ran one.sh with the following syntax:

./one.sh

It returns

./one.sh

Only one more trick needs to be qualified before our main MySQL examples. That trick is how you pass parameters to a bash shell script. For reference, this is the part that’s insecure because user command histories are available inside the Linux OS.

Here’s a hello_whom.sh script that demonstrates the concept of parameter passing:

#!/usr/bin/bash
 
# This says hello to the argument while managing no argument.
if [[ ${#} = 1 ]]; then
  echo 'The '${0}' program says: "Hello '${1}'!"'
elif [[ ${#} -gt 1 ]]; then
  echo 'The '${0}' program wants to know if you have more than one name?'
else
  echo 'The '${0}' program wants to know if you have a name?'
fi

If you need more on how parameters are passed and managed, you can check a prior blog post on Handling bash Parameters, or check the bash help pages. The following leverages bash arrays to run scripts and query the MySQL database from the command line.

You will need the three batch SQL files first, so here they are:

Setup SQL Files

The actor.sql file:

-- Use the sampledb database.
USE sampledb;
 
-- Disable foreign key checking.
SET foreign_key_checks = 0;
 
-- Drop an actor table. 
DROP TABLE IF EXISTS actor;
 
-- Create an actor table.
CREATE TABLE actor
( actor_id    INT UNSIGNED PRIMARY KEY AUTO_INCREMENT
, actor_name  VARCHAR(30)  NOT NULL );
 
-- Insert three rows.
INSERT INTO actor (actor_name) VALUES ('Chris Hemsworth');
INSERT INTO actor (actor_name) VALUES ('Chris Pine');
INSERT INTO actor (actor_name) VALUES ('Chris Pratt');

The film.sql file:

-- Use the sampledb database.
USE sampledb;
 
-- Disable foreign key checking.
SET foreign_key_checks = 0;
 
-- Drop a film table. 
DROP TABLE IF EXISTS film;
 
-- Create a film table.
CREATE TABLE film
( film_id    INT UNSIGNED PRIMARY KEY AUTO_INCREMENT
, film_name  VARCHAR(30)  NOT NULL );
 
-- Insert rows.
INSERT INTO film (film_name) VALUES ('Thor');
INSERT INTO film (film_name) VALUES ('Thor: The Dark World');
INSERT INTO film (film_name) VALUES ('Star Trek');
INSERT INTO film (film_name) VALUES ('Star Trek into Darkness');
INSERT INTO film (film_name) VALUES ('Guardians of the Galaxy');

The movie.sql file:

-- Use the sampledb database.
USE sampledb;
 
-- Disable foreign key checking.
SET foreign_key_checks = 0;
 
-- Drop a movie table. 
DROP TABLE IF EXISTS movie;
 
-- Create a movie table.
CREATE TABLE movie
( movie_id   int unsigned PRIMARY KEY AUTO_INCREMENT
, actor_id   int unsigned NOT NULL
, film_id    int unsigned NOT NULL
, CONSTRAINT actor_fk FOREIGN KEY (actor_id)
  REFERENCES actor (actor_id)
, CONSTRAINT film_fk  FOREIGN KEY (film_id)
  REFERENCES film(film_id));
 
-- Use scalar subqueries to discover surrogate keys by using the faux natural key.
INSERT INTO movie
( actor_id
, film_id )
VALUES
((SELECT   actor_id
  FROM     actor
  WHERE    actor_name = 'Chris Hemsworth')
,(SELECT   film_id
  FROM     film
  WHERE    film_name = 'Thor'));
 
-- Use scalar subqueries to discover surrogate keys by using the faux natural key.
INSERT INTO movie
( actor_id
, film_id )
VALUES
((SELECT   actor_id
  FROM     actor
  WHERE    actor_name = 'Chris Hemsworth')
,(SELECT   film_id
  FROM     film
  WHERE    film_name = 'Thor: The Dark World'));
 
-- Use scalar subqueries to discover surrogate keys by using the faux natural key.
INSERT INTO movie
( actor_id
, film_id )
VALUES
((SELECT   actor_id
  FROM     actor
  WHERE    actor_name = 'Chris Pine')
,(SELECT   film_id
  FROM     film
  WHERE    film_name = 'Star Trek'));
 
-- Use scalar subqueries to discover surrogate keys by using the faux natural key.
INSERT INTO movie
( actor_id
, film_id )
VALUES
((SELECT   actor_id
  FROM     actor
  WHERE    actor_name = 'Chris Pine')
,(SELECT   film_id
  FROM     film
  WHERE    film_name = 'Star Trek into Darkness'));
 
-- Use scalar subqueries to discover surrogate keys by using the faux natural key.
INSERT INTO movie
( actor_id
, film_id )
VALUES
((SELECT   actor_id
  FROM     actor
  WHERE    actor_name = 'Chris Pratt')
,(SELECT   film_id
  FROM     film
  WHERE    film_name = 'Guardians of the Galaxy'));

The following list_mysql.sh shell script expects to receive the username, password, database and fully qualified path in that specific order. The script names are entered manually because this should be a unit test script. Naturally, you can extend the script to manage those parameters but as mentioned I see this type of solution as a developer machine only script to simplify unit testing. Anything beyond that is risky!

#!/usr/bin/bash
 
# Assign user and password
username="${1}"
password="${2}"
database="${3}"
directory="${4}"
 
# List the parameter values passed.
echo "Username:  " ${username}
echo "Password:  " ${password}
echo "Database:  " ${database}
echo "Directory: " ${directory}
echo ""
 
# Define an array.
declare -a cmd
 
# Assign elements to an array.
cmd[0]="actor.sql"
cmd[1]="film.sql"
cmd[2]="movie.sql"
 
# Call the array elements.
for i in ${cmd[*]}; do
  mysql -s -u${username} -p${password} -D${database} < ${directory}/${i} > /dev/null 2>/dev/null
done
 
# Connect and pipe the query result minus errors and warnings to the while loop.
mysql -u${username} -p${password} -D${database} <<<'show tables' 2>/dev/null |
 
# Read through the piped result until it's empty but format the title.
while IFS=$'\n' read list; do
  if [[ ${list} = "Tables_in_sampledb" ]]; then
    echo $list
    echo "----------------------------------------"
  else
    echo $list
  fi
done
echo ""
 
# Connect and pipe the query result minus errors and warnings to the while loop.
mysql -u${username} -p${password} -D${database} <<<'SELECT CONCAT(a.actor_name," in ",f.film_name) AS "Actors in Films" FROM actor a INNER JOIN movie m ON a.actor_id = m.actor_id INNER JOIN film f ON m.film_id = f.film_id' 2>/dev/null |
 
# Read through the piped result until it's empty but format the title.
while IFS=$'\n' read actor_name; do
  if [[ ${actor_name} = "Actors in Films" ]]; then
    echo $actor_name
    echo "----------------------------------------"
  else
    echo $actor_name
  fi
done

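The IFS (Internal Field Separator) works with whitespace by default. The IFS on lines 33 and 47 is meant to set the IFS to a line return, so that read keeps the spaces inside each row. Be aware that bash only treats \n as a line return inside ANSI-C quoting, $'\n'; inside plain single quotes, '\n' sets the IFS to a backslash and the letter n. That’s the trick to display the data, and you can read more about the IFS in this question and answer post.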
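
A common alternative for reading whole lines is to empty the IFS and add the -r option to read. Here’s a minimal sketch that compares it with a plain read; the sample row is made up, and the kept and trimmed variable names are mine.

```shell
# A sample row with leading spaces and a backslash (hypothetical data).
row='  C:\temp in Thor'

# IFS= keeps leading and trailing whitespace; -r keeps backslashes literal.
kept=$(printf '%s\n' "${row}" |
  while IFS= read -r line; do printf '%s' "${line}"; done)

# A plain read trims the leading spaces and consumes the backslash.
trimmed=$(printf '%s\n' "${row}" |
  while read line; do printf '%s' "${line}"; done)

printf 'kept:    [%s]\n' "${kept}"
printf 'trimmed: [%s]\n' "${trimmed}"
```

The query results in this post have no leading spaces or backslashes, so either form prints them the same way; the difference only shows up with data like the sample row.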

You can run this script with the following input parameters from the local directory where you deploy it. The parameters are: (1) username, (2) password, (3) database, and (4) a fully qualified path to the SQL setup files.

./list_mysql.sh student student sampledb "/home/student/Code/bash/mysql"

With valid input values, the list_mysql.sh bash script generates the following output, which confirms inputs and verifies actions taken by the scripts with queries:

Username:   student
Password:   student
Database:   sampledb
Directory:  /home/student/Code/bash/mysql
 
Tables_in_sampledb
----------------------------------------
actor
film
movie
 
Actors in Films
----------------------------------------
Chris Hemsworth in Thor
Chris Hemsworth in Thor: The Dark World
Chris Pine in Star Trek
Chris Pine in Star Trek into Darkness
Chris Pratt in Guardians of the Galaxy

If you forgot to provide the required inputs to the list_mysql.sh bash script, it alternatively returns the following output:

Username:  
Password:  
Database:  
Directory: 
 
./list_mysql.sh: line 25: /actor.sql: No such file or directory
./list_mysql.sh: line 25: /film.sql: No such file or directory
./list_mysql.sh: line 25: /movie.sql: No such file or directory

The secure way removes the password at a minimum! The refactored program will require you to manually enter the password for all elements of the array (three in this sample), and twice for the two queries. Here’s the refactored code:

#!/usr/bin/bash
 
# Assign user, database, and directory
username="${1}"
database="${2}"
directory="${3}"
 
# List the parameter values passed.
echo "Username:  " ${username}
echo "Database:  " ${database}
echo "Directory: " ${directory}
echo ""
 
# Define an array.
declare -a cmd
 
# Assign elements to an array.
cmd[0]="actor.sql"
cmd[1]="film.sql"
cmd[2]="movie.sql"
 
# Call the array elements.
for i in ${cmd[*]}; do
  mysql -s -u${username} -p -D${database} < ${directory}/${i} > /dev/null 2>/dev/null
done
 
# Connect and pipe the query result minus errors and warnings to the while loop.
mysql -u${username} -p -D${database} <<<'show tables' 2>/dev/null |
 
# Read through the piped result until it's empty.
while IFS=$'\n' read list; do
  if [[ ${list} = "Tables_in_sampledb" ]]; then
    echo $list
    echo "----------------------------------------"
  else
    echo $list
  fi
done
echo ""
 
# Connect and pipe the query result minus errors and warnings to the while loop.
mysql -u${username} -p -D${database} <<<'SELECT CONCAT(a.actor_name," in ",f.film_name) AS "Actors in Films" FROM actor a INNER JOIN movie m ON a.actor_id = m.actor_id INNER JOIN film f ON m.film_id = f.film_id' 2>/dev/null |
 
# Read through the piped result until it's empty.
while IFS=$'\n' read actor_name; do
  if [[ ${actor_name} = "Actors in Films" ]]; then
    echo $actor_name
    echo "----------------------------------------"
  else
    echo $actor_name
  fi
done
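
Typing the password five times gets old quickly. One alternative, sketched below under my own assumptions, is to put the credentials in a private option file and point the mysql client at it with the --defaults-extra-file option, which has to be the first option on the command line. The makeClientFile function name and the .list_mysql.cnf file name are hypothetical, and the commented usage lines aren’t part of the script above.

```shell
# Hypothetical helper: write a private MySQL option file for one user.
# Arguments: (1) username, (2) password, (3) option file name.
makeClientFile()
{
  # Set the umask in a subshell so a newly created file is only
  # readable and writable by its owner.
  ( umask 077
    printf '[client]\nuser=%s\npassword=%s\n' "${1}" "${2}" > "${3}" )
}

# Possible usage: prompt once without echoing the password, then reuse the file.
#   read -s -p "Password: " password; echo ""
#   makeClientFile student "${password}" "${HOME}/.list_mysql.cnf"
#   mysql --defaults-extra-file="${HOME}/.list_mysql.cnf" -Dsampledb < actor.sql
#   rm -f "${HOME}/.list_mysql.cnf"
```

The password never appears on a command line or in the shell history this way, but it does sit in a file until you remove it, so I’d still keep this on a developer machine only. The umask only protects a file the function creates; it doesn’t change the permissions of a file that already exists.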

Please let me know if you think there should be any more scaffolding for newbies in this post. As always, I hope this helps those looking for this type of solution.

From time to time, folks ask questions about how to solve common problems in Linux or Unix. Today, the question is: “How do I find a list of files that contain a specific string?” There are two alternatives with the find command, and the following sample searches look for files that contain a sqlite3 string literal.

Search for only the file names:

find . -type f | xargs grep -li sqlite3

Or, the more verbose:

find . -type f -exec grep -li sqlite3 /dev/null {} +

Search for the file names and text line:

find . -type f | xargs grep -i sqlite3

Or, the more verbose:

find . -type f -exec grep -i sqlite3 /dev/null {} +

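Don’t exclude the /dev/null from the verbose syntax. It guarantees that grep always receives more than one file name, so grep prints the file name with each match even when find hands it a single file. Messages about files you lack permissions to inspect go to standard error, and you can suppress them by appending 2>/dev/null to the command. I don’t post a lot of Linux or Unix tips and techniques, and you may find this site more useful to answer these types of questions: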

Unix & Linux Stack Exchange web site
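
One caveat with the xargs forms above is that xargs splits its input on whitespace, so a file name with a space in it gets broken into pieces. Here’s a short sketch of the usual workaround with the -print0 and -0 options, which GNU and BSD versions of find and xargs support. The directory and file names are made up for the demonstration.

```shell
# Build a throwaway directory with a file name that contains a space.
demo=$(mktemp -d)
echo "import sqlite3" > "${demo}/my script.py"
echo "nothing here"   > "${demo}/other.txt"

# -print0 and -0 pass names separated by NUL bytes instead of whitespace.
found=$(cd "${demo}" && find . -type f -print0 | xargs -0 grep -li sqlite3)
echo "${found}"

# Clean up the temporary directory.
rm -rf "${demo}"
```

The -exec ... {} + forms don’t have this problem because find passes the file names to grep directly.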

As always, I hope this helps those who land on the blog page.

I put together a post on how to upload to MySQL from a CSV file with dates. It was more or less for my students but one of them was curious how the mega formula worked. As commented, the easier solution is to create a custom format. Oddly, Open Office does support the MySQL default format natively.

Excel doesn’t support the native MySQL date format as a default format mask, which is YYYY-MM-DD, or 2009-06-02 for June 2, 2009. That means you have to convert it from a scalar date to a string or create a custom format mask (see Dmitri’s comment below). If you just shook your head at the term scalar date, maybe a custom format mask is best. However, if you want a programming solution let me explain that Excel supports only three data types. They’re a string literal, a numeric literal, and a formula. Dates in Excel are merely formatted numbers. When the numbers are integers, the date is a date, but when the number has a fractional component, the date is really a timestamp.

Here’s a brief description of the process required to convert a date in Excel into a MySQL date format string literal in a CSV file. You need the following Excel functions:

Date Functions

The DAY(date) function returns a 1 or 2 digit numeric value for the day of the month, with ranges of 1 to 28, 1 to 29, 1 to 30, or 1 to 31 dependent on the month and year.
The MONTH(date) function returns a 1 or 2 digit numeric value for the month of the year.
The YEAR(date) function returns a 4 digit numeric value for the year.

Logical Functions

The IF(logical_expression,truth_action,false_action) function returns the truth action when the expression is true, and the false action when the expression isn’t true.

Text Functions

The CONCATENATE(string_1, string_2, ...) function glues strings together.
The LEN(value) function returns the length of a string or number.

MySQL requires that you return a string with eight digits. The first four digits must be a valid year, the fifth and sixth digits a valid month, and the seventh and eighth digits a valid day in the context of the year and month provided. Unfortunately, the DAY() and MONTH() functions may return a 1 or 2 digit value. That can’t happen in the CSV file’s string for a date, so you use the IF() and LEN() functions to guarantee a 2 digit return value.

Here are the examples that guarantee 2 digit day and month values, assuming that the base date is in the A1 cell. The concatenation of a "0" (zero between two double quotes) or the "" (two double quotes or a string null) ensures the number data types become strings.

=IF(LEN(DAY(A1))=1,CONCATENATE("0",DAY(A1)),DAY(A1))
=IF(LEN(MONTH(A1))=1,CONCATENATE("0",MONTH(A1)),MONTH(A1))

A zero is placed before the day or month when the logical condition is met, which means the day or month value is a single digit string. A null is placed before the day or month when the logical condition isn’t met, which means the day or month value is a two digit string. There’s only one problem with these mega functions. They return a number.

The year calculation doesn’t require the explicit casting when you concatenate it with the other strings because it is implicitly cast as a string. However, it’s a better practice to include it for clarity (most folks don’t know about the implicit casting behaviors in Excel).

=CONCATENATE(YEAR(A1),"-",IF(LEN(MONTH(A1))=1,CONCATENATE("0",MONTH(A1)),MONTH(A1)),"-",IF(LEN(DAY(A1)) = 1,CONCATENATE("0",DAY(A1)),DAY(A1)))
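
If you’d rather do the padding outside of Excel, the same zero-padding idea takes one printf call in bash. This is only a sketch of the equivalent logic, not part of the Excel solution, and the mysqlDate function name is hypothetical.

```shell
# Hypothetical helper: format a year, month, and day as YYYY-MM-DD.
# %02d pads a single digit month or day with a leading zero.
mysqlDate()
{
  printf '%04d-%02d-%02d\n' "${1}" "${2}" "${3}"
}

mysqlDate 2009 6 2
```

It prints 2009-06-02 for June 2, 2009. Pass the month and day without leading zeros, because printf reads a value like 08 as an invalid octal number.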

As Goodwin reported in a comment, there’s an easier way that I missed. You can simply use the TEXT function when the source column is a valid serialized date value.

=TEXT(A1,"YYYYMMDD")

You can see the full MySQL import from CSV in the previous post. Naturally, you may want to copy and paste special the value before creating the CSV file. Also, don’t forget to delete any unused columns to the right or rows beneath because if you don’t your file won’t map to your table definition.

Parsing strings isn’t the easiest thing to do in Excel 2007 or Excel 2008 but it is an important thing to know how to do. You’ll learn how to parse a set of names into first, middle and last names. You can find the data set for these examples at the bottom of the blog post.

Parsing the left substring

This shows you how to parse a left substring from a text cell in Microsoft Excel.

There are two built in functions that let you parse dynamic substrings from the left of a string. They are the LEFT and FIND functions. The LEFT function actually does the parsing but the FIND function lets you dynamically find a delimiting character, like a space.

Assuming cell A1 holds the value of Joseph F. Smith, you can parse Joseph by using a static value of seven for the first white space in the string. This works because each character maps to a position in the string, and positions start with the number one. The LEFT function supports two formal parameters: the first is the string you’re parsing, and the second is the number of characters to return. The LEFT function always starts at the left most position, and the second parameter is optional because it defaults to one character.

This is the simplest use of the LEFT function with both of its parameters.

=LEFT(A1,7)

It returns the substring Joseph, followed by the white space in the seventh position, from the string Joseph F. Smith, which is stored in cell A1.

The simplest solution merely illustrates a concept. You’re not going to do this unless you have a bit more data. Naturally, first names have different lengths in the real world. You must include the FIND function inside the call to the LEFT function to make your function dynamic. The FIND function lets you capture the position of a white space delimiter.

The FIND function supports three formal parameters: the first is the substring you’re looking for in the string, the second is the string to search, and the third is where to start. The position you start at is assumed to be one, which makes the last parameter an optional parameter. You only provide the start with parameter when you want to start someplace other than the left most position. While not required, you should consider always providing the start with parameter to the FIND function.

The following shows you how to use the FIND function to locate the first occurrence of a white space in a string stored in cell A1.

=FIND(" ",A1,1)

You can now make your LEFT function dynamic by putting the FIND function inside it. As qualified, the FIND function looks for the first occurrence of a white space, and returns it into the second parameter of the LEFT function. The sample formula now has a nested function, which makes it a mega formula in Microsoft Excel parlance. The following sample also includes the FIND function’s optional start with parameter for clarity.

=LEFT(A1,FIND(" ",A1,1))

The problem with the foregoing solution occurs when a name in the list doesn’t have a middle initial or name, or last name. The lack of a second name means that there won’t be any white space between to substrings in the base string. This situation causes the logic to fail because the FIND function returns a #VALUE! error when it can’t find a white space in the string.

You can prevent the error by wrapping the nested FIND function and the LEFT function inside two IFERROR functions. The first IFERROR function returns the positional value of the FIND function or, when no white space exists, a copy of the base string.

=IFERROR(FIND(" ",A1,1),A1)

This type of logic inside the LEFT function causes the LEFT function to fail when the base string is returned to it instead of a position number. You must wrap the LEFT function inside another IFERROR function to guarantee that you don’t throw an error. This also lets you return the base string as the valid substring when appropriate.

=IFERROR(LEFT(A1,IFERROR(FIND(" ",A1),A1)-1),A1)

Assuming you put this formula in cell C1, you can copy this relative reference formula down the column. It expects a list of full names in the A column that follow the pattern of first name, white space, middle initial or name, white space, and last name.
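The error-trapped first name logic can also be sketched in Python to check the edge cases. The function name first_name is hypothetical, and the -1 that str.find returns plays the role of the #VALUE! error that the IFERROR functions trap.

```python
def first_name(full_name: str) -> str:
    """Return the text before the first space, or the whole string."""
    position = full_name.find(" ")  # 0-based; -1 when there is no space
    if position == -1:
        # Same result as the outer IFERROR: fall back to the base string.
        return full_name
    # Same result as LEFT(A1,FIND(" ",A1)-1): everything before the space.
    return full_name[:position]


for name in ("Joseph F. Smith", "Brigham Young", "Joseph"):
    print(name, "->", first_name(name))
```

A single name such as Joseph comes back unchanged, which is the same outcome the outer IFERROR function gives you in the spreadsheet.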

Parsing the right substring ↓

This shows you how to parse a right substring from a text cell in Microsoft Excel.

There are three built-in functions that let you parse dynamic substrings from the right of strings. Two of them are the RIGHT and FIND functions, and you may have covered the FIND function when reading how to parse from the left. The LEN function is the third, and it lets you find the length of a string. Together these functions let you find the length of a substring on the right.

The RIGHT function actually does the parsing but the FIND and LEN functions let you dynamically find where to cut a substring out of a base string. This example continues to use the string Joseph F. Smith.

The RIGHT function has only two required parameters. The first parameter is the string that you’re parsing. The second parameter is the length of the substring. If you inspect the string, Smith is only five characters long. A static call to the RIGHT function is shown below.

=RIGHT(A1,5)

It returns the substring Smith from the string Joseph F. Smith, which is stored in cell A1.

Like the LEFT function example, this static approach to parsing merely illustrates a concept. You’re not going to do this with real data because last names have different lengths in the real world. You must include the FIND function twice inside your call to the RIGHT function because you’re parsing the string based on the second occurrence of a white space in the string. While the FIND function lets you dynamically capture the position of the white space delimiter, a nested FIND function lets you capture the correct start position. That value is one position after the first occurrence of a white space.

The FIND function supports three parameters: the first is the substring you’re looking for, the second is the string to search, and the third is the position where the search starts. The start position is assumed to be one, which makes the last parameter optional. You only provide it when you want to start someplace other than the leftmost position. While not required, you should consider always providing the start position to the FIND function. It improves readability.

The following shows how to use a FIND function to locate the position one beyond where the first occurrence of a white space is found in a string.

=FIND(" ",A1,1)+1

If you pass the preceding FIND function call as the start with parameter to another FIND function, you can locate the second occurrence of a white space in a string stored in cell A1. Nesting function calls inside functions creates what are known as mega formulas in Excel. The following demonstrates a mega formula to find the second instance of a white space in a string.

=FIND(" ",A1,FIND(" ",A1,1)+1)

This returns the value of ten. You now know where to start but not the length of the substring on the right. You first need to find the length of the total string. You use the LEN function to find that, like the following.

=LEN(A1)

The LEN function returns fifteen. You can calculate the length of the substring as five by subtracting the position of the second white space from the length of the string. While you could inspect that value in this one cell, you can’t do that when there are one hundred or one hundred thousand names in a list. The way to dynamically capture the right hand side substring length is shown below.

=LEN(A1)-FIND(" ",A1,FIND(" ",A1,1)+1)
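The arithmetic behind that formula can be checked with a short Python sketch. The function name right_length is hypothetical, and the sketch assumes the string contains at least two white spaces, just as the formula does at this point.

```python
def right_length(text: str) -> int:
    """Length of the substring that follows the second space in text."""
    # str.index is 0-based, so add one to get Excel's 1-based position.
    first_space = text.index(" ") + 1
    # A 0-based start of first_space equals Excel's start position of
    # first_space + 1, which is what FIND(" ",A1,1)+1 supplies.
    second_space = text.index(" ", first_space) + 1
    # LEN(A1) minus the position of the second white space.
    return len(text) - second_space


name = "Joseph F. Smith"
length = right_length(name)
print(length, name[len(name) - length:])
```

For Joseph F. Smith the sketch works through the same numbers as the text: a string length of fifteen, a second white space at position ten, and a remainder of five characters.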

Now that you know how to capture the length of the substring, you can create a larger mega formula to parse the substring on the right from the base string. The working example follows below.

=RIGHT(A1,LEN(A1)-FIND(" ",A1,FIND(" ",A1,1)+1))

The problem with the foregoing solution is that it fails when one of the names in the list doesn’t have a middle initial or name. The failure occurs because there would only be one white space in the base string, and the logic expects two. The FIND function looking for the second white space returns a #VALUE! error.

You can prevent this error by wrapping the nested FIND function calls and the RIGHT function call in IFERROR functions. The inner IFERROR function returns the position of the second white space or, when there isn’t one, the position of the first white space.

=IFERROR(RIGHT(A1,LEN(A1)-IFERROR(FIND(" ",A1,FIND(" ",A1)+1),FIND(" ",A1))),A1)

The IFERROR function inside the RIGHT function still returns an error when the base string has no white space at all, and that causes the RIGHT function to fail. Therefore, you must also wrap the RIGHT function inside another IFERROR function to avoid an error. This guarantees the return of the base string as a valid substring.

Assuming you put this formula in cell C1, you can copy this relative reference formula down the column. It expects a list of full names in the A column that follow the pattern of first name, white space, middle initial or name, white space, and last name.
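The two levels of error trapping in the last name formula can be sketched in Python as well. The function name last_name is hypothetical, and each -1 returned by str.find corresponds to a #VALUE! error caught by one of the IFERROR functions.

```python
def last_name(full_name: str) -> str:
    """Return the text after the second space, with two fallbacks."""
    first = full_name.find(" ")  # 0-based; -1 when there is no space
    if first == -1:
        # Outer IFERROR: no white space at all, so return the base string.
        return full_name
    second = full_name.find(" ", first + 1)
    # Inner IFERROR: without a second space, cut at the first one instead.
    cut = second if second != -1 else first
    # Same result as RIGHT(A1,LEN(A1)-position): everything after the cut.
    return full_name[cut + 1:]


for name in ("Joseph F. Smith", "Joseph Smith", "Joseph"):
    print(name, "->", last_name(name))
```

Like the formula, the sketch returns everything after the second white space, so a name with four parts would keep its last two parts together.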

Parsing the middle substring ↓

This shows you how to parse a middle substring from a text cell in Microsoft Excel.

There are three built-in functions that are required to let you parse dynamic substrings from the middle of base strings. Two of them are the MID and FIND functions. You’ve seen how to use the FIND function in the left and right parsing examples of this blog. The third function is the IFERROR function, which is required when a middle string doesn’t exist. There are two more functions that let you trap for the possibility of a single base string. They are the IF and ISNUMBER functions.

The MID function takes three required parameters. The first is the text value or cell reference, the second is the start with value, and the third is the length of the substring. Dynamic substrings require you to bracket them, which means you need to find their beginning and ending positions and measure their length.

This basic idea means you parse the middle string from a set of three strings by finding their delimiters. As in the other examples, you’ll work with the string Joseph F. Smith as a base string. First, you find the first character of the middle string. You do this by finding the position of the first delimiting white space with a FIND function, and then you add one to the returned result value. The example is below.

=FIND(" ",A1,1)+1

Next, you find the position of the second delimiter. This requires that you create what is known as a mega formula, which you create by nesting one or more formulas in another. You can use the following formula to do that.

=FIND(" ",A1,FIND(" ",A1)+1)

You can then calculate the length of the middle string by subtracting the first result from the second one, as shown below:

=FIND(" ",A1,FIND(" ",A1)+1)-FIND(" ",A1)-1

After you’ve mastered those formulas, you need to create a mega formula with the MID function. This doesn’t have any error trapping yet, so it is very dependent on data that contains three substrings separated by white spaces.

=MID(A1,FIND(" ",A1)+1,FIND(" ",A1,FIND(" ",A1)+1)-FIND(" ",A1)-1)

The preceding formula returns F. with its period. If the middle string were a middle name, it would return the middle name. It fails when you have a base string that lacks three substrings. You need to wrap the nested FIND function and the MID function inside two respective IFERROR functions. An IFERROR function returns the positional value of the FIND function, or an alternate value, like a fresh copy of the base string or a numeric equivalent. The following example uses an IFERROR function call to substitute a zero length because the absence of a second white space means there isn’t a middle string.

=MID(A1,FIND(" ",A1)+1,IFERROR(FIND(" ",A1,FIND(" ",A1)+1)-FIND(" ",A1)-1,0))

A second IFERROR function wrapping the MID function lets you return an empty string for the middle name when there is only one name in the base string, like Joseph.

=IFERROR(MID(A1,FIND(" ",A1)+1,IFERROR(FIND(" ",A1,FIND(" ",A1)+1)-FIND(" ",A1)-1,0)),"")

Assuming you put this formula in cell D1, you can copy this relative reference formula down the column. It expects a list of full names in the A column that follow the pattern of first name, white space, middle initial or name, white space, and last name.
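The bracketing idea behind the middle name formula can be sketched in Python too. The function name middle_name is hypothetical, and the two early returns stand in for the two IFERROR functions.

```python
def middle_name(full_name: str) -> str:
    """Return the text between the first and second spaces, else ''."""
    first = full_name.find(" ")  # 0-based; -1 when there is no space
    if first == -1:
        # Outer IFERROR: a single name has no middle string.
        return ""
    second = full_name.find(" ", first + 1)
    if second == -1:
        # Inner IFERROR: a zero length, so MID returns an empty string.
        return ""
    # Same result as MID: start one past the first space and stop
    # just before the second space.
    return full_name[first + 1:second]


for name in ("Joseph F. Smith", "George Albert Smith", "Joseph Smith"):
    print(name, "->", repr(middle_name(name)))
```

The sketch keeps the period on a middle initial, just as the MID formula does.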

Data set ↓

A list of native strings, parsed first, middle, and last names, and concatenated names.

Original Name         First Name  Middle Name  Last Name  Resorted Name
Joseph Smith          Joseph                   Smith      Smith, Joseph
Brigham Young         Brigham                  Young      Young, Brigham
John Taylor           John                     Taylor     Taylor, John
Wilford Woodruff      Wilford                  Woodruff   Woodruff, Wilford
Lorenzo Snow          Lorenzo                  Snow       Snow, Lorenzo
Joseph F. Smith       Joseph      F            Smith      Smith, Joseph F
Heber J. Grant        Heber       J            Grant      Grant, Heber J
George Albert Smith   George      Albert       Smith      Smith, George Albert
David O. Mckay        David       O            Mckay      Mckay, David O
Joseph Fielding Smith Joseph      Fielding     Smith      Smith, Joseph Fielding
Harold B. Lee         Harold      B            Lee        Lee, Harold B
Spencer W. Kimball    Spencer     W            Kimball    Kimball, Spencer W
Ezra Taft Benson      Ezra        Taft         Benson     Benson, Ezra Taft
Howard W. Hunter      Howard      W            Hunter     Hunter, Howard W
Gordon B. Hinckley    Gordon      B            Hinckley   Hinckley, Gordon B
Thomas S. Monson      Thomas      S            Monson     Monson, Thomas S
