## Excel string parsing

Parsing strings isn’t the easiest thing to do in Excel 2007 or Excel 2008 but it is an important thing to know how to do. You’ll learn how to parse a set names into first, middle and last names. You can find the data set for these examples at the bottom of the blog post.

** Parsing the left substring** ↓

This shows you how to parse a left substring from a text cell in Microsoft Excel.

There are two built in functions that let you parse dynamic substrings from the left of a string. They are the `LEFT`

and `FIND`

functions. The `LEFT`

function actually does the parsing but the `FIND`

function lets you dynamically find a delimiting character, like a space.

Assuming cell `A1`

holds the value of *Joseph F. Smith*, you can parse *Joseph* by using a static value of seven for the first white space in the string. This works because each character maps to a string, and strings start with the number one. The `LEFT`

function supports three formal parameters, the first is the string you’re parsing, the second is where to stop, and the third is where to start. The position you start at is assumed to be one, which makes the last parameter an optional parameter. You only provide the *start with* parameter when you want to start someplace other than the left most position.

This is the simplest use of the `LEFT`

function with its two required parameters.

=LEFT(A1,7) |

It returns the substring *Joseph* from the string *Joseph F. Smith*, which is stored in cell `A1`

.

The simplest solution merely illustrates a concept. You’re not going to do this unless you have a bit more data. Naturally, the list of first names have different lengths in the real world. You must include the `FIND`

function inside the call to the `LEFT`

function to make your function dynamic. The `FIND`

function lets you capture the position of a white space delimiter.

The `FIND`

function supports three formal parameters, the first is the substring you’re looking for in the string, the second is the string to search, and the third is where to start. Like the `LEFT`

function, the position you start at is assumed to be one, which makes the last parameter an optional parameter. You only provide the *start with* parameter when you want to start someplace other than the left most position. While not required, you should consider always providing the *start with* parameter to the `FIND`

function.

The following shows you how to use the `FIND`

function to locate the first occurrence of a white space in a string stored in cell `A1`

.

=FIND(" ",A1,1) |

You can now make you’re `LEFT`

function dynamic by putting the `FIND`

function inside it. As qualified, the `FIND`

function looks for the first occurrence of a white space, and returns it into the second parameter of the `LEFT`

function. The sample formula now has a nested function, which makes it a *mega formula* in Microsoft Excel parlance. The following sample also includes the optional *start with* parameter for clarity.

=LEFT(A1,FIND(" ",A1,1),1) |

The problem with the foregoing solution occurs when a name in the list doesn’t have a middle initial or name, or last name. The lack of a second name means that there won’t be any white space between to substrings in the base string. This situation causes the logic to fail because the `FIND`

function returns a `#VALUE!`

error when it can’t find a white space in the string.

You can prevent the error by wrapping the nested `FIND`

function and `LEFT`

function inside two `IFERROR`

functions. This `IFERROR`

function returns the positional value of the `FIND`

function, or an alternative copy of the base string.

=IFERROR(FIND(" ",A1,1),A1) |

This type of logic inside the `LEFT`

function causes the `LEFT`

function to fail when the base string is returned to it instead of a position number. You must wrap the `LEFT`

function inside another `IFERROR`

function to guarantee that you don’t throw an error. This also lets you return the base string as the valid substring when appropriate.

=IFERROR(LEFT(A1,IFERROR(FIND(" ",A1),A1)-1),A1) |

Assuming you put this formula in cell `C1`

, you could copy this relative reference formula down in a column. It would require that you had a list of full names that followed a first name, white space, middle initial or name, white space, and last name in the `A`

column.

** Parsing the right substring** ↓

This shows you how to parse a right substring from a text cell in Microsoft Excel.

There are three built in functions that let you parse dynamic substrings from the right of strings. They are the `RIGHT`

and `FIND`

functions that you may have covered when reading how to parse from the left. The `LEN`

function is the other function, and it lets you find the length of a string. Together these functions lets you find the length of a substring on the right.

The `RIGHT`

function actually does the parsing but the `FIND`

and `LEN`

functions let you dynamically find where to cut a substring out of a base string. This example continues to use the string *Joseph F. Smith*.

The `RIGHT`

function has only two required parameters. The first parameter is the string that you’re parsing. The second parameter is the length of the substring. If you inspect the string, *Smith* is only five characters long. A static call to the `RIGHT`

function is shown below.

=RIGHT(A1,5) |

It returns the substring *Smith* from the string *Joseph F. Smith*, which is stored in cell `A1`

.

Like the `LEFT`

function example, this static approach to parsing merely illustrates a concept. You’re not going to do this with real data because the list of first names have different lengths in the real world. You must include the `FIND`

function twice inside your call to the `RIGHT`

function because you’re parsing the string based on the second occurrence of a white space in the string. While the `FIND`

function lets you dynamically capture the position of the white space delimiter, a nested `FIND`

function lets you capture the correct *start with* parameter value. That value is one position after the first occurrence of a white space.

The `FIND`

function supports three formal parameters, the first is the substring you’re looking for in the string, the second is the string to search, and the third is where to start. Like the `LEFT`

function, the position you start at is assumed to be one, which makes the last parameter an optional parameter. You only provide the *start with* parameter when you want to start someplace other than the left most position. While not required, you should consider always providing the *start with* parameter to the `FIND`

function. It improves readability.

The following shows how to use a `FIND`

function to locate the position one beyond where the first occurrence of a white space is found in a string.

=FIND(" ",A1,1)+1 |

If you pass the preceding `FIND`

function call as the *start with* parameter to another `FIND`

function, you can locate the second occurrence of a white space in a string stored in cell `A1`

. Nesting function calls inside functions creates what are known as *mega formulas* in Excel. The following demonstrates a *mega formula* to find the second instance of a white space in a string.

=FIND(" ",A1,FIND(" ",A1,1)+1) |

This returns the value of ten. You now know where to start but not the length of the substring on the right. You first need to find the length of the total string. You use the `LEN`

function to find that, like the following.

=LEN(A1) |

The `LEN`

function returns fifteen. You can calculate the length of the substring as five by subtracting the position of the second white space from the length of the string. While you could inspect that value in this one cell, you can’t do that when there are one hundred or one hundred thousand names in a list. The way to dynamically capture the right hand side substring length is shown below.

=LEN(A1)-FIND(" ",A1,FIND(" ",A1,1)+1) |

Now that you know how to capture the length of the substring, you can create a larger *mega forumla* to parse the substring on the right from the base string. The working example follows below.

=RIGHT(A1,LEN(A1)-FIND(" ",A1,FIND(" ",A1,1)+1)) |

The problem with the foregoing solution is that it fails when one of the names in the list doesn’t have a middle initial or name. The failure occurs because there would only be one white space in the base string, and the logic expects two. The `FIND`

function looking for the second white space returns a `#VALUE!`

error.

You can prevent this error by wrapping the nested `FIND`

function calls and `RIGHT`

function with calls with an `IFERROR`

function. An `IFERROR`

function returns the positional value of the `FIND`

function, or an alternate value, like a fresh copy of the base string.

=IFERROR(RIGHT(A1,LEN(A1)-IFERROR(FIND(" ",A1,FIND(" ",A1)+1),FIND(" ",A1))),A1) |

The `IFERROR`

function inside the `RIGHT`

function can cause the `RIGHT`

function to fail when a base string is returned instead of a position number. Therefore, you must also wrap the `RIGHT`

function inside another `IFERROR`

function to avoid an error. This guarantees the return of the base string as a valid substring.

Assuming you put this formula in cell `C1`

, you could copy this relative reference formula down in a column. It would require that you had a list of full names that followed a first name, white space, middle initial or name, white space, and last name in the `A`

column.

** Parsing the middle substring** ↓

This shows you how to parse a middle substring from a text cell in Microsoft Excel.

There are three built in functions that are required to let you parse dynamic substrings from the middle of base strings. Two of them are the `MID`

and `FIND`

functions. You’ve seen how to use the `FIND`

function in the left and right parsing examples of this blog. The third function is the `IFERROR`

function, which is required when a middle string doesn’t exist. There are two more functions that let you trap for the possibility of a single base string. They are the `IF`

and `ISNUMBER`

functions.

The `MID`

function takes three required parameters. The first is the text value or cell reference, the second is the *start with* value, and the third is the length of the substring. Dynamic substrings require you to bracket them, which means you need to find their beginning and ending positions and measure their length.

This basic idea means you parse the middle string from a set of three strings by finding their delimiters. As in the other examples, you’ll work with the string *Joseph F. Smith* as a base string. First, you find the first character of the middle string. You do this by finding the position of the first delimiting white space with a `FIND`

function, and then you add one to the returned result value. The example is below.

=FIND(" ",A1,1)+1 |

Next, you find the position of the second delimiter. This requires that you create what is known as a *mega formula*, which you create by nesting one or more formulas in another. You can use the following formula to do that.

=FIND(" ",A1,FIND(" ",A1)+1) |

You can then calculate the length of the middle string by subracting the first result from the second one, as shown below:

=FIND(" ",A1,FIND(" ",A1)+1)-FIND(" ",A1)-1 |

After you’ve mastered those formulas, you need to create a *mega formula* with the `MID`

function. This doesn’t have any error trapping yet, so it is very dependent on data that contains three substrings separated by white spaces.

=MID(A1,FIND(" ",A1)+1,FIND(" ",A1,FIND(" ",A1)+1)-FIND(" ",A1)-1) |

The preceding function returns *F.*. If the middle string were a middle name, it would return the middle name. It fails when you have a base string that lacks three substrings. You need to wrap the nested `FIND`

function and `MID`

function inside two respective `IFERROR`

functions. An `IFERROR`

function returns the positional value of the `FIND`

function, or an alternate value, like a fresh copy of the base string or a numeric equivalent. The following example uses an `IFERROR`

function call to substitutes a zero value because the absence of a second white space means there isn’t a middle string.

=MID(A5,FIND(" ",A5)+1,IFERROR(FIND(" ",A5,FIND(" ",A5)+1)-FIND(" ",A5)-1,0)) |

A second `IFERROR`

wrapping the `MID`

function lets you return a null value for middle name when there is only one name in the base string, like *Joseph*.

=IFERROR(MID(A5,FIND(" ",A5)+1,IFERROR(FIND(" ",A5,FIND(" ",A5)+1)-FIND(" ",A5)-1,0)),"") |

Assuming you put this formula in cell `D1`

, you could copy this relative reference formula down in a column. It would require that you had a list of full names that followed a first name, white space, middle initial or name, white space, and last name in the `A`

column.

** Data set** ↓

A list of native strings, parsed first, middle, and last names, and concatenated names.

Original Name First Name Middle Name Last Name Resorted Name Joseph Smith Joseph Smith Smith, Joseph Brigham Young Brigham Young Young, Brigham John Taylor John Taylor Taylor, John Wilford Woodruff Wilford Woodruff Woodruff, Wilford Lorenzo Snow Lorenzo Snow Snow, Lorenzo Joseph F. Smith Joseph F Smith Smith, Joseph F Heber J. Grant Heber J Grant Grant, Heber J George Albert Smith George Albert Smith Smith, George Albert David O. Mckay David O Mckay Mckay, David O Joseph Fielding Smith Joseph Fielding Smith Smith, Joseph Fielding Harold B. Lee Harold B Lee Lee, Harold B Spencer W. Kimball Spencer W Kimball Kimball, Spencer W Ezra Taft Benson Ezra Taft Benson Benson, Ezra Taft Howard W. Hunter Howard W Hunter Hunter, Howard W Gordon B. Hinckley Gordon B Hinckley Hinckley, Gordon B Thomas S. Monson Thomas S Monson Monson, Thomas S |