Archive for the ‘basic programming’ Category
Bash Debug Function
My students working in Linux would have a series of labs to negotiate and I’d have them log the activities of their Oracle SQL scripts. Many of them would suffer quite a bit because they didn’t know how to find the errors in the log files.
I wrote this SQL function for them to put in their .bashrc files. It searches all the .txt files for errors and organizes them by log file, line number, and descriptive error message.
errors () { label="File Name:Line Number:Error Code"; list=`ls ./*.$1 | wc -l`; if [[ ${list} -eq 1 ]]; then echo ${label}; echo "----------------------------------------"; filename=`ls *.txt`; echo ${filename}:`find . -type f | grep -in *.txt -e ora\- -e pls\- -e sp2\-`; else if [[ ${list} -gt 1 ]]; then echo ${label}; echo "----------------------------------------"; find . -type f | grep --color=auto -in *.txt -e ora\- -e pls\- -e sp2\-; fi; fi } |
I hope it helps others now too.
Recursive bash function
While teaching a class on the Linux Command-Line (CLI), the book gave an example of generating a list of random US telephone numbers into a file. The book uses the RANDOM
function to generate segments of the telephone number, and then the grep
command to identify malformed telephone numbers.
My students wanted me to explain why the numbers were malformed. I had to explain that the RANDOM
function returns a random number between 1 and 99,999. The RANDOM
function may return a 1 to 5 digit random number, which means you may get a 1-digit or 2-digit number when you request a 3-digit random number or a 1- to 3-digit number when you request a 4-digit random number.
The author’s example is:
for i in {1..10}; do echo "(${RANDOM:0:3}) ${RANDOM:0:3}-${RANDOM:0:4}" >> list.txt done |
They asked if there was a way to write a shell script that guaranteed random but well-formed US telephone numbers. I said yes, however, you need to write a recursive bash shell function and assign the result to a global variable set in the shell script.
They seemed doubtful, so I wrote it for them. Here’s the script if you’re interested in learning more about bash shell scripting. While I implemented it with an bash array, that’s optional.
#!/usr/bin/bash # ============================================================ # Name: telephone.sh # Author: Michael McLaughlin # Date: 05-May-2020 # ------------------------------------------------------------ # Purpose: Demonstrate how to generate random telehpone # numbers. The RANDOM function returns a random # number between 1 and 99999; and while you can # easily shave off a extra digit guarnteeing a # value above 100 is impossible without logic. # ============================================================ targetLength() { # Declare variable in function-level scope. randomString='' # Check the number of parameters to process. if [[ ${#} = 2 ]]; then # Assign value to function-level and local variables. randomString=${1} formatLength=${2} # Get the length of the telephone number as integer. length=`echo -n ${randomString} | wc -c` # Calculate any shortfall. short=$((${formatLength}-${length})) # Check if the telephone number is too short. if [[ ${short} > 0 ]]; then randomString=`echo "${randomString}${RANDOM:0:${short}}"` fi fi # Check if the combination of random numbers equals the target length # and assign the value to the global variable, or repeat processing # by making a recursive function call. if [[ `echo -n ${randomString} | wc -c` = ${formatLength} ]]; then result=${randomString} else targetLength ${randomString} ${formatLength} fi } # Declare global variable to support targetLength(). result='' # Declare an array of strings. declare -A telephone_parts # Generate one hundred random telephone numbers. for i in {1..100}; do # Create random three digit area code. targetLength ${RANDOM:0:3} 3 telephone_parts[1]=${result} # Create random three digit prefix code. targetLength ${RANDOM:0:3} 3 telephone_parts[2]=${result} # Create random four digit number code. targetLength ${RANDOM:0:4} 4 telephone_parts[3]=${result} # Print the telephone numbers. echo "[${i}] (${telephone_parts[1]}) ${telephone_parts[2]}-${telephone_parts[3]}" done |
For reference, a recursive function call isn’t required here. It could be done more effectively with the following while
loop:
targetLength() { # Declare variable in function-level scope. randomString='' short=1 # Check the number of parameters to process. if [[ ${#} = 2 ]]; then # Assign value to function-level and local variables. randomString=${1} formatLength=${2} # Check if the telephone number is too short. while [[ ${short} > 0 ]]; do # Get the length of the telephone number as integer. length=`echo -n ${randomString} | wc -c` # Calculate any shortfall. short=$((${formatLength}-${length})) # Assign new value to randomString. randomString=`echo "${randomString}${RANDOM:0:${short}}"` done # Assign randomString to global result variable. result=${randomString} fi } |
As always, I hope this helps those you want to learn or solve a problem.
Run X11 Apps on Mac
It’s possible folks didn’t notice but Mac OS X no longer includes XQuartz by default from Maverick forward. You need to download XQuartz and install it. I’d recommend after you install Xcode.
Launch XQuartz and then either use the bash
shell it opens or open a Terminal bash
shell session. Inside the shell, you might start Secure Shell (ssh
) like this:
Mac-Pro-3:~ michaelmclaughlin$ ssh student@192.168.2.170 student@192.168.2.170's password: Last login: Thu Jun 4 14:33:37 2015 [student@localhost ~]$ xclock & [1] 10422 [student@localhost ~]$ Error: Can't open display: |
Granted that’s a trivial error and running the xclock X11 applications isn’t crucial, an error that makes it more important is the following from Oracle’s old Designer/2000 application:
FRM-91111: Internal Error: window system startup failure. FRM-10039: Unable to start up the Form Builder. |
This is the desired behavior. Secure shell (ssh
) can’t run it unless you make the connection with the -Y
flag. You should use the following syntax:
Mac-Pro-3:~ michaelmclaughlin$ ssh -Y student@192.168.2.170 student@192.168.2.170's password: Last login: Tue Jun 9 14:56:55 2015 from 192.168.2.1 /usr/bin/xauth: file /home/student/.Xauthority does not exist [student@localhost ~]$ xclock & [1] 10760 |
You can safely ignore the .Xauthority does not exist warning message because it’ll create a .Xauthority
file and store the magic cookie after the warning message. You should see the xclock
program running in the upper left hand corner of your console, like:
It’s terrific that you don’t get a font warning like you typically would using UTF-8 on Linux. Nice that the Mac OS fonts are so well done that there isn’t a raised exception.
Using xclock
or xeyes
isn’t very useful as a rule, but this method also lets you run any of the Linux GUI applications. For example, the following gedit
command lets you run the gedit
utility from a Mac OS console. If you’ve installed the gedit
plug-ins, you also can use the Terminal console on the remote system.
The process sequence for the command-line is shown below:
1030 1 /usr/sbin/sshd -D - The root process launches the ssh daemon 3145 1030 sshd: student [priv] - The sshd launches a ssh session to manage a student ssh session 3152 3145 sshd: student@pts/1 - The ssh session launched to manage the ssh session 3166 3152 -bash - The bash shell launched by connecting through the ssh session 3240 3166 gedit - The gedit command issued inside a ssh session 3166 3240 gnome-pty-helper - Launching the gedit session across X11 3169 3240 /bin/bash - Launching the Terminal session inside the gedit session across X11 3269 3884 ps -ef - Command run inside the gedit Terminal session |
Hope that helps those who want to use X11 applications on the Mac OS.
Bash Arrays & MySQL
Student questions are always interesting! They get me to think and to write. The question this time is: “How do I write a Bash Shell script to process multiple MySQL script files?” This post builds the following model (courtesy of MySQL Workbench) by using a bash shell script and MySQL script files, but there’s a disclaimer on this post. It shows both insecure and secure approaches and you should avoid the insecure ones.
It seems a quick refresher on how to use arrays in bash
shell may be helpful. While it’s essential in a Linux environment, it’s seems not everyone masters the bash
shell.
Especially, since I checked my Learning the Bash Shell (2nd Edition) and found a typo on how you handle arrays in the bash
shell, and it’s a mistake that could hang newbies up (on page 161). Perhaps I should update my copy because I bought it in 1998. 😉 It was good then, and the new edition is probably better. The error is probably corrected in the current Learning the Bash Shell, but if not, the following examples show you how to use arrays in loops.
Naturally, these do presume some knowledge of working with bash
shell, like the first line always is the same in any bash
shell script. That you open an if-statement with an if
and close it with a fi
, and that you else-if is elif
; and that a semicolon between a for-statement and the do
statement is required when they’re on the same line because they’re two statements.
If you’re new to bash
shell arrays, click on the link below to expand a brief tutorial. It takes you through three progressive examples of working with bash
arrays.
Working with bash
Arrays ↓
A basic example of working with an array in bash
shell is the following list1.sh
script:
1 2 3 4 5 6 7 8 9 10 11 12 13 | #!/usr/bin/bash # Print script name. echo $0 #!/usr/bin/bash # Define an array. declare -a cmd=("one" "two" "three") # Call the array elements. for i in ${cmd[*]}; do echo ${i} done |
Line 8 declares the cmd
array by assigning three strings. Line 12 returns the elements of the array to the ${i}
variable, which lets you manage them one at a time. You use the chmod
command to make the list1.sh
executable, like this:
chmod 755 list1.sh |
Then, you can run it like this from the present working directory (pwd
):
./list1.sh |
It should print:
one two three |
The list2.sh
example changes the cmd
array declaration from list1.sh
. It declares the cmd
array as an empty array, and then it assigns elements by index numbers (using a zero-based index), as shown below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | #!/usr/bin/bash # Define an array. declare -a cmd # Assign elements to an array. cmd[0]="one" cmd[1]="two" cmd[2]="three" # Call the array elements. for i in ${cmd[*]}; do echo ${i} done |
Lines 7 through 9 assign values to the elements of the cmd
array. You would chmod
the file, and run the file as qualified above for the list1.sh
script.
The last pre-implementation example requires that you create three demonstration scripts, the one.sh
, two.sh
, and three.sh
scripts. You should put them in the same directory as the list3.sh
script.
The demonstration scripts should all have the same code, like this:
1 2 3 4 | #!/usr/bin/bash # Print script name. echo $0 |
Line 4 returns command line parameter $0
or ${0}
, which is always the command line program’s file name. The file name may be provided as a relative or absolute file name, and if that’s new to you please check out The Linux Command Line: A Complete Introduction (also downloadable as a PDF for free).
The list3.sh
script should contain the following:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | #!/usr/bin/bash <code> # Define an array. declare -a cmd # Assign elements to an array. cmd[0]="one.sh" cmd[1]="two.sh" cmd[2]="three.sh" # Call the array elements. for i in ${cmd[*]}; do `pwd`/${i} done |
When you run the list3.sh
script from the /home/student/Code/bash directory with a local syntax, the script should return the fully qualified file names of the subshell programs. The output should look like this:
/home/student/Code/bash/one.sh /home/student/Code/bash/two.sh /home/student/Code/bash/three.sh |
The list3.sh
script provides the present working directory (pwd
) and the one.sh
, two.sh
, and three.sh
scripts return only their executable name. For example, if you ran one.sh
with the following syntax:
./one.sh |
It returns
./one.sh |
Only one more trick needs to be qualified before our main MySQL examples. That trick is how you pass parameters to a bash
shell script. For reference, this is the part that’s insecure because user command histories are available inside the Linux OS.
Here’s a hello_whom.sh
script to demonstrates the concept of parameter passing:
1 2 3 4 5 6 7 8 9 10 | #!/usr/bin/bash # This says hello to the argument while managing no argument. if [[ ${#} = 1 ]]; then echo 'The '${0}' program says: "Hello '${1}'!"' elif [[ ${#} > 1 ]]; then echo 'The '${0}' program wants to know if you have more than one name?' else echo 'The '${0}' program wants to know if you have a name?' fi |
If you need more on how parameters are passed and managed, you can check a prior blob post on Handling bash
Parameters, or check the bash help pages. The following leverages bash
arrays to run scripts and query the MySQL database from the command line.
You will need the three batch SQL files first, so here they are:
Setup SQL Files ↓
The actor.sql
file:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | -- Use the sampledb database. USE sampledb; -- Disable foreign key checking. SET foreign_key_checks = 0; -- Drop an actor table. DROP TABLE IF EXISTS actor; -- Create an actor table. CREATE TABLE actor ( actor_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT , actor_name VARCHAR(30) NOT NULL ); -- Insert two rows. INSERT INTO actor (actor_name) VALUES ('Chris Hemsworth'); INSERT INTO actor (actor_name) VALUES ('Chris Pine'); INSERT INTO actor (actor_name) VALUES ('Chris Pratt'); |
The film.sql
file:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | -- Use the sampledb database. USE sampledb; -- Disable foreign key checking. SET foreign_key_checks = 0; -- Drop a film table. DROP TABLE IF EXISTS film; -- Create a film table. CREATE TABLE film ( film_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT , film_name VARCHAR(30) NOT NULL ); -- Insert rows. INSERT INTO film (film_name) VALUES ('Thor'); INSERT INTO film (film_name) VALUES ('Thor: The Dark World'); INSERT INTO film (film_name) VALUES ('Star Trek'); INSERT INTO film (film_name) VALUES ('Star Trek into Darkness'); INSERT INTO film (film_name) VALUES ('Guardians of the Galaxy'); |
The movie.sql
file:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 | -- Use the sampledb database. USE sampledb; -- Disable foreign key checking. SET foreign_key_checks = 0; -- Drop an movie table. DROP TABLE IF EXISTS movie; -- Create an movie table. CREATE TABLE movie ( movie_id int unsigned PRIMARY KEY AUTO_INCREMENT , actor_id int unsigned NOT NULL , film_id int unsigned NOT NULL , CONSTRAINT actor_fk FOREIGN KEY (actor_id) REFERENCES actor (actor_id) , CONSTRAINT film_fk FOREIGN KEY (film_id) REFERENCES film(film_id)); -- Use scalar subqueries to discover surrogate keys by using the faux natural key. INSERT INTO movie ( actor_id , film_id ) VALUES ((SELECT actor_id FROM actor WHERE actor_name = 'Chris Hemsworth') ,(SELECT film_id FROM film WHERE film_name = 'Thor')); -- Use scalar subqueries to discover surrogate keys by using the faux natural key. INSERT INTO movie ( actor_id , film_id ) VALUES ((SELECT actor_id FROM actor WHERE actor_name = 'Chris Hemsworth') ,(SELECT film_id FROM film WHERE film_name = 'Thor: The Dark World')); -- Use scalar subqueries to discover surrogate keys by using the faux natural key. INSERT INTO movie ( actor_id , film_id ) VALUES ((SELECT actor_id FROM actor WHERE actor_name = 'Chris Pine') ,(SELECT film_id FROM film WHERE film_name = 'Star Trek')); -- Use scalar subqueries to discover surrogate keys by using the faux natural key. INSERT INTO movie ( actor_id , film_id ) VALUES ((SELECT actor_id FROM actor WHERE actor_name = 'Chris Pine') ,(SELECT film_id FROM film WHERE film_name = 'Star Trek into Darkness')); -- Use scalar subqueries to discover surrogate keys by using the faux natural key. INSERT INTO movie ( actor_id , film_id ) VALUES ((SELECT actor_id FROM actor WHERE actor_name = 'Chris Pratt') ,(SELECT film_id FROM film WHERE film_name = 'Guardians of the Galaxy')); |
The following list_mysql.sh
shell script expects to receive the username
, password
, database
and fully qualified path
in that specific order. The script names are entered manually because this should be a unit test script. Naturally, you can extend the script to manage those parameters but as mentioned I see this type of solution as a developer machine only script to simplify unit testing. Anything beyond that is risky!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | #!/usr/bin/bash # Assign user and password username="${1}" password="${2}" database="${3}" directory="${4}" # List the parameter values passed. echo "Username: " ${username} echo "Password: " ${password} echo "Database: " ${database} echo "Directory: " ${directory} echo "" # Define an array. declare -a cmd # Assign elements to an array. cmd[0]="actor.sql" cmd[1]="film.sql" cmd[2]="movie.sql" # Call the array elements. for i in ${cmd[*]}; do mysql -s -u${username} -p${password} -D${database} < ${directory}/${i} > /dev/null 2>/dev/null done # Connect and pipe the query result minus errors and warnings to the while loop. mysql -u${username} -p${password} -D${database} <<<'show tables' 2>/dev/null | # Read through the piped result until it's empty but format the title. while IFS='\n' read list; do if [[ ${list} = "Tables_in_sampledb" ]]; then echo $list echo "----------------------------------------" else echo $list fi done echo "" # Connect and pipe the query result minus errors and warnings to the while loop. mysql -u${username} -p${password} -D${database} <<<'SELECT CONCAT(a.actor_name," in ",f.film_name) AS "Actors in Films" FROM actor a INNER JOIN movie m ON a.actor_id = m.actor_id INNER JOIN film f ON m.film_id = f.film_id' 2>/dev/null | # Read through the piped result until it's empty but format the title. while IFS='\n' read actor_name; do if [[ ${actor_name} = "Actors in Films" ]]; then echo $actor_name echo "----------------------------------------" else echo $actor_name fi done |
The IFS
(Internal Field Separator) works with whitespace by default. The IFS
on lines 33 and 47 sets the IFS
to a line return ('\n'
). That’s the trick to display the data, and you can read more about the IFS
in this question and answer post.
You can run this script with the following input parameters from the local directory where you deploy it. The a parameters are: (1) username
, (2) password
, (3) database
, and (4) a fully qualified path to the SQL setup files.
./list_mysql.sh student student sampledb "/home/student/Code/bash/mysql" |
With valid input values, the list_mysql.sh
bash
script generates the following output, which confirms inputs and verifies actions taken by the scripts with queries:
Username: student Password: student Database: sampledb Directory: /home/student/Code/bash/mysql Tables_in_sampledb ---------------------------------------- actor film movie Actors in Films ---------------------------------------- Chris Hemsworth in Thor Chris Hemsworth in Thor: The Dark World Chris Pine in Star Trek Chris Pine in Star Trek into Darkness Chris Pine in Guardians of the Galaxy |
If you forgot to provide the required inputs to the list_mysql.sh
bash
script, it alternatively returns the following output:
Username: Password: Database: Directory: ./list_mysql.sh: line 25: /actor.sql: No such file or directory ./list_mysql.sh: line 25: /film.sql: No such file or directory ./list_mysql.sh: line 25: /movie.sql: No such file or directory |
The secure way removes the password at a minimum! The refactored program will require you to manually enter the password for all elements of the array (three in this sample), and twice for the two queries. Here’s the refactored code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | #!/usr/bin/bash # Assign user and password username="${1}" database="${2}" directory="${3}" # List the parameter values passed. echo "Username: " ${username} echo "Database: " ${database} echo "Directory: " ${directory} echo "" # Define an array. declare -a cmd # Assign elements to an array. cmd[0]="actor.sql" cmd[1]="film.sql" cmd[2]="movie.sql" # Call the array elements. for i in ${cmd[*]}; do mysql -s -u${username} -p -D${database} < ${directory}/${i} > /dev/null 2>/dev/null done # Connect and pipe the query result minus errors and warnings to the while loop. mysql -u${username} -p -D${database} <<<'show tables' 2>/dev/null | # Read through the piped result until it's empty. while IFS='\n' read list; do if [[ ${list} = "Tables_in_sampledb" ]]; then echo $list echo "----------------------------------------" else echo $list fi done echo "" # Connect and pipe the query result minus errors and warnings to the while loop. mysql -u${username} -p -D${database} <<<'SELECT CONCAT(a.actor_name," in ",f.film_name) AS "Actors in Films" FROM actor a INNER JOIN movie m ON a.actor_id = m.actor_id INNER JOIN film f ON m.film_id = f.film_id' 2>/dev/null | # Read through the piped result until it's empty. while IFS='\n' read actor_name; do if [[ ${actor_name} = "Actors in Films" ]]; then echo $actor_name echo "----------------------------------------" else echo $actor_name fi done |
Please let me know if you think there should be any more scaffolding for newbies in this post. As always, I hope this helps those looking for this type of solution.
Find a string in files
From time to time, folks ask questions about how to solve common problems in Linux or Unix. Today, the question is: “How do I find a list of files that contain a specific string?” There are two alternatives with the find
command, and the following sample searches look for files that contain a sqlite3
string literal.
- Search for only the file names:
find . -type f | xargs grep -li sqlite3 |
Or, the more verbose:
find . -type f -exec grep -li sqlite3 /dev/null {} + |
- Search for the file names and text line:
find . -type f | xargs grep -i sqlite3 |
Or, the more verbose:
find . -type f -exec grep -i sqlite3 /dev/null {} + |
Don’t exclude the /dev/null
from the verbose syntax or you’ll get the things you lack permissions to inspect or that raise other errors. I don’t post a lot of Linux or Unix tips and techniques, and you may find this site more useful to answer these types of questions:
Unix & Linux Stack Exchange web site
As always, I hope this helps those you land on the blog page.
Excel date conversion
I put together a post on how to upload to MySQL from a CSV file with dates. It was more or less for my students but one of them was curious how the mega formula worked. As commented, the easier solution is to create a custom format. Oddly, Open Office does support the MySQL default format natively.
Excel doesn’t support the native MySQL date format as a default format mask, which is YYYY-MM-DD, or 2009-06-02 for June 2, 2009. That means you have to convert it from a scalar date to a string or create a custom format mask (see Dmitri’s comment below). If you just shook your head at the term scalar date, maybe a custom format mask is best. However, if you want a programming solution let me explain that Excel supports only three data types. They’re a string literal, a numeric literal, and a formula. Dates in Excel are merely formatted numbers. When the numbers are integers, the date is a date, but when the number has a fractional component, the date is really a timestamp.
Here’s a brief description of the process required to convert a date in Excel into a MySQL date format string literal in a CSV file. You need the following Excel functions:
Date Functions
- The
DAY(date)
function returns a 1 or 2 digit numeric value for the day of the month, with ranges of 1 to 28, 1 to 29, 1 to 30, or 1 to 31 dependent on the month and year. - The
MONTH(date)
function returns a 1 or 2 digit numeric value for the month of the year. - The
YEAR(date)
function returns a 4 digit numeric value for the year.
Logical Functions
- The
IF(logical_expression,truth_action,false_action)
function returns the truth action when the expression is true, and the false action when the expression isn’t true.
MySQL Server
CONCATENATE(string_1, string_2, ...)
glues strings together.LEN(numeric_value)
function returns the length of a string or number.
MySQL requires that you return an eight character string of numbers. The first four numbers must be a valid year, the fifth and sixth numbers a valid month, and the seventh and eigth numbers a valid day in the context of the year and month provided. Unfortunately, the DAY()
and MONTH()
functions may return a 1 or 2 digit value. That can’t happen in the CSV file’s string for a date, so you use the IF()
and LEN()
functions to guarantee a 2 digit return value.
Here are the examples that guarantee 2 digit day and month values, assuming that the base date is in the A1 cell. The concatenation of a "0"
(zero between two double quotes) or the ""
(two double quotes or a string null) ensures the number data types become strings.
=IF(LEN(DAY(A1))=1,CONCATENATE("0",DAY(A1)),DAY(A1)) =IF(LEN(MONTH(A1))=1,CONCATENATE("0",MONTH(A1)),MONTH(A1)) |
A zero is placed before the day or month when the logical condition is met, which means the day or month value is a single digit string. A null is place before the day or month when the logical condition isn’t met, which means the day or month value is a two digit string. There’s only one problem with these mega functions. They return a number.
The year calculation doesn’t require the explicit casting when you concatenate it with the other strings because it is implicitly cast as a string. However, it’s a better practice to include it for clarity (most folks don’t know about the implicit casting behaviors in Excel).
=CONCATENATE(YEAR(A1),"-",IF(LEN(MONTH(A1))=1,CONCATENATE("0",MONTH(A1)),MONTH(A1)),"-",IF(LEN(DAY(A1)) = 1,CONCATENATE("0",DAY(A1)),DAY(A1))) |
As Goodwin reported in a comment, there’s an easier way that I missed. You can simply use the TEXT function when the source column is a valid serialized date value.
=TEXT(A1,"YYYYMMDD") |
You can see the full MySQL import from CSV in the previous post. Naturally, you may want to copy and paste special the value before creating the CSV file. Also, don’t forget to delete any unused columns to the right or rows beneath because if you don’t your file won’t map to your table definition.
Excel string parsing
Parsing strings isn’t the easiest thing to do in Excel 2007 or Excel 2008 but it is an important thing to know how to do. You’ll learn how to parse a set names into first, middle and last names. You can find the data set for these examples at the bottom of the blog post.
Parsing the left substring ↓
This shows you how to parse a left substring from a text cell in Microsoft Excel.
There are two built in functions that let you parse dynamic substrings from the left of a string. They are the LEFT
and FIND
functions. The LEFT
function actually does the parsing but the FIND
function lets you dynamically find a delimiting character, like a space.
Assuming cell A1
holds the value of Joseph F. Smith, you can parse Joseph by using a static value of seven for the first white space in the string. This works because each character maps to a string, and strings start with the number one. The LEFT
function supports three formal parameters, the first is the string you’re parsing, the second is where to stop, and the third is where to start. The position you start at is assumed to be one, which makes the last parameter an optional parameter. You only provide the start with parameter when you want to start someplace other than the left most position.
This is the simplest use of the LEFT
function with its two required parameters.
=LEFT(A1,7) |
It returns the substring Joseph from the string Joseph F. Smith, which is stored in cell A1
.
The simplest solution merely illustrates a concept. You’re not going to do this unless you have a bit more data. Naturally, the list of first names have different lengths in the real world. You must include the FIND
function inside the call to the LEFT
function to make your function dynamic. The FIND
function lets you capture the position of a white space delimiter.
The FIND
function supports three formal parameters, the first is the substring you’re looking for in the string, the second is the string to search, and the third is where to start. Like the LEFT
function, the position you start at is assumed to be one, which makes the last parameter an optional parameter. You only provide the start with parameter when you want to start someplace other than the left most position. While not required, you should consider always providing the start with parameter to the FIND
function.
The following shows you how to use the FIND
function to locate the first occurrence of a white space in a string stored in cell A1
.
=FIND(" ",A1,1) |
You can now make you’re LEFT
function dynamic by putting the FIND
function inside it. As qualified, the FIND
function looks for the first occurrence of a white space, and returns it into the second parameter of the LEFT
function. The sample formula now has a nested function, which makes it a mega formula in Microsoft Excel parlance. The following sample also includes the optional start with parameter for clarity.
=LEFT(A1,FIND(" ",A1,1),1) |
The problem with the foregoing solution occurs when a name in the list doesn’t have a middle initial or name, or last name. The lack of a second name means that there won’t be any white space between to substrings in the base string. This situation causes the logic to fail because the FIND
function returns a #VALUE!
error when it can’t find a white space in the string.
You can prevent the error by wrapping the nested FIND
function and LEFT
function inside two IFERROR
functions. This IFERROR
function returns the positional value of the FIND
function, or an alternative copy of the base string.
=IFERROR(FIND(" ",A1,1),A1) |
This type of logic inside the LEFT
function causes the LEFT
function to fail when the base string is returned to it instead of a position number. You must wrap the LEFT
function inside another IFERROR
function to guarantee that you don’t throw an error. This also lets you return the base string as the valid substring when appropriate.
=IFERROR(LEFT(A1,IFERROR(FIND(" ",A1),A1)-1),A1) |
Assuming you put this formula in cell C1
, you could copy this relative reference formula down in a column. It would require that you had a list of full names that followed a first name, white space, middle initial or name, white space, and last name in the A
column.
Parsing the right substring ↓
This shows you how to parse a right substring from a text cell in Microsoft Excel.
There are three built in functions that let you parse dynamic substrings from the right of strings. They are the RIGHT
and FIND
functions that you may have covered when reading how to parse from the left. The LEN
function is the other function, and it lets you find the length of a string. Together these functions lets you find the length of a substring on the right.
The RIGHT
function actually does the parsing but the FIND
and LEN
functions let you dynamically find where to cut a substring out of a base string. This example continues to use the string Joseph F. Smith.
The RIGHT
function has only two required parameters. The first parameter is the string that you’re parsing. The second parameter is the length of the substring. If you inspect the string, Smith is only five characters long. A static call to the RIGHT
function is shown below.
=RIGHT(A1,5) |
It returns the substring Smith from the string Joseph F. Smith, which is stored in cell A1
.
Like the LEFT
function example, this static approach to parsing merely illustrates a concept. You’re not going to do this with real data because the list of first names have different lengths in the real world. You must include the FIND
function twice inside your call to the RIGHT
function because you’re parsing the string based on the second occurrence of a white space in the string. While the FIND
function lets you dynamically capture the position of the white space delimiter, a nested FIND
function lets you capture the correct start with parameter value. That value is one position after the first occurrence of a white space.
The FIND
function supports three formal parameters, the first is the substring you’re looking for in the string, the second is the string to search, and the third is where to start. Like the LEFT
function, the position you start at is assumed to be one, which makes the last parameter an optional parameter. You only provide the start with parameter when you want to start someplace other than the left most position. While not required, you should consider always providing the start with parameter to the FIND
function. It improves readability.
The following shows how to use a FIND
function to locate the position one beyond where the first occurrence of a white space is found in a string.
=FIND(" ",A1,1)+1 |
If you pass the preceding FIND
function call as the start with parameter to another FIND
function, you can locate the second occurrence of a white space in a string stored in cell A1
. Nesting function calls inside functions creates what are known as mega formulas in Excel. The following demonstrates a mega formula to find the second instance of a white space in a string.
=FIND(" ",A1,FIND(" ",A1,1)+1) |
This returns the value of ten. You now know where to start but not the length of the substring on the right. You first need to find the length of the total string. You use the LEN
function to find that, like the following.
=LEN(A1) |
The LEN
function returns fifteen. You can calculate the length of the substring as five by subtracting the position of the second white space from the length of the string. While you could inspect that value in this one cell, you can’t do that when there are one hundred or one hundred thousand names in a list. The way to dynamically capture the right hand side substring length is shown below.
=LEN(A1)-FIND(" ",A1,FIND(" ",A1,1)+1) |
Now that you know how to capture the length of the substring, you can create a larger mega forumla to parse the substring on the right from the base string. The working example follows below.
=RIGHT(A1,LEN(A1)-FIND(" ",A1,FIND(" ",A1,1)+1)) |
The problem with the foregoing solution is that it fails when one of the names in the list doesn’t have a middle initial or name. The failure occurs because there would only be one white space in the base string, and the logic expects two. The FIND
function looking for the second white space returns a #VALUE!
error.
You can prevent this error by wrapping the nested FIND
function calls and RIGHT
function with calls with an IFERROR
function. An IFERROR
function returns the positional value of the FIND
function, or an alternate value, like a fresh copy of the base string.
=IFERROR(RIGHT(A1,LEN(A1)-IFERROR(FIND(" ",A1,FIND(" ",A1)+1),FIND(" ",A1))),A1) |
The IFERROR
function inside the RIGHT
function can cause the RIGHT
function to fail when a base string is returned instead of a position number. Therefore, you must also wrap the RIGHT
function inside another IFERROR
function to avoid an error. This guarantees the return of the base string as a valid substring.
Assuming you put this formula in cell C1
, you could copy this relative reference formula down in a column. It would require that you had a list of full names that followed a first name, white space, middle initial or name, white space, and last name in the A
column.
Parsing the middle substring ↓
This shows you how to parse a middle substring from a text cell in Microsoft Excel.
There are three built in functions that are required to let you parse dynamic substrings from the middle of base strings. Two of them are the MID
and FIND
functions. You’ve seen how to use the FIND
function in the left and right parsing examples of this blog. The third function is the IFERROR
function, which is required when a middle string doesn’t exist. There are two more functions that let you trap for the possibility of a single base string. They are the IF
and ISNUMBER
functions.
The MID
function takes three required parameters. The first is the text value or cell reference, the second is the start with value, and the third is the length of the substring. Dynamic substrings require you to bracket them, which means you need to find their beginning and ending positions and measure their length.
This basic idea means you parse the middle string from a set of three strings by finding their delimiters. As in the other examples, you’ll work with the string Joseph F. Smith as a base string. First, you find the first character of the middle string. You do this by finding the position of the first delimiting white space with a FIND
function, and then you add one to the returned result value. The example is below.
=FIND(" ",A1,1)+1 |
Next, you find the position of the second delimiter. This requires that you create what is known as a mega formula, which you create by nesting one or more formulas in another. You can use the following formula to do that.
=FIND(" ",A1,FIND(" ",A1)+1) |
You can then calculate the length of the middle string by subracting the first result from the second one, as shown below:
=FIND(" ",A1,FIND(" ",A1)+1)-FIND(" ",A1)-1 |
After you’ve mastered those formulas, you need to create a mega formula with the MID
function. This doesn’t have any error trapping yet, so it is very dependent on data that contains three substrings separated by white spaces.
=MID(A1,FIND(" ",A1)+1,FIND(" ",A1,FIND(" ",A1)+1)-FIND(" ",A1)-1) |
The preceding function returns F.. If the middle string were a middle name, it would return the middle name. It fails when you have a base string that lacks three substrings. You need to wrap the nested FIND
function and MID
function inside two respective IFERROR
functions. An IFERROR
function returns the positional value of the FIND
function, or an alternate value, like a fresh copy of the base string or a numeric equivalent. The following example uses an IFERROR
function call to substitutes a zero value because the absence of a second white space means there isn’t a middle string.
=MID(A5,FIND(" ",A5)+1,IFERROR(FIND(" ",A5,FIND(" ",A5)+1)-FIND(" ",A5)-1,0)) |
A second IFERROR
wrapping the MID
function lets you return a null value for middle name when there is only one name in the base string, like Joseph.
=IFERROR(MID(A5,FIND(" ",A5)+1,IFERROR(FIND(" ",A5,FIND(" ",A5)+1)-FIND(" ",A5)-1,0)),"") |
Assuming you put this formula in cell D1
, you could copy this relative reference formula down in a column. It would require that you had a list of full names that followed a first name, white space, middle initial or name, white space, and last name in the A
column.
Data set ↓
A list of native strings, parsed first, middle, and last names, and concatenated names.
Original Name First Name Middle Name Last Name Resorted Name Joseph Smith Joseph Smith Smith, Joseph Brigham Young Brigham Young Young, Brigham John Taylor John Taylor Taylor, John Wilford Woodruff Wilford Woodruff Woodruff, Wilford Lorenzo Snow Lorenzo Snow Snow, Lorenzo Joseph F. Smith Joseph F Smith Smith, Joseph F Heber J. Grant Heber J Grant Grant, Heber J George Albert Smith George Albert Smith Smith, George Albert David O. Mckay David O Mckay Mckay, David O Joseph Fielding Smith Joseph Fielding Smith Smith, Joseph Fielding Harold B. Lee Harold B Lee Lee, Harold B Spencer W. Kimball Spencer W Kimball Kimball, Spencer W Ezra Taft Benson Ezra Taft Benson Benson, Ezra Taft Howard W. Hunter Howard W Hunter Hunter, Howard W Gordon B. Hinckley Gordon B Hinckley Hinckley, Gordon B Thomas S. Monson Thomas S Monson Monson, Thomas S |