Mr. Clean swept his way into American homes in 1958 after Proctor and Gamble bought the rights from its creator, Linwood Burton. Interestingly, Regular Expressions were first described in the 50’s by Stephen Cole Kleene. Coincidence? I think not.

However, today’s post is not about household cleaning products, nor is it a history lesson in Computer Science. It is about scrubbing data in VBA using Regular Expressions (RegExp).

Substitute ()

I see many folks using multiple nested levels of the Substitute() function to try to clean their data. This can be time consuming and can lead to some inaccuracies. Instead, we can use Regular Expressions (RegExp) to increase efficiency and accuracy as well as handle complex strings and large data sets quickly and efficiently.

User Defined Functions

I also see folks create User Defined Functions (UDF’s) to manipulate strings. Here is a nice example the other day from Doug Jenkins over at Newton Excel Bach. (btw, If you are not following Doug’s blog, you should be) But is there a better way? Enter Regular Expressions (RegExp).

Regular Expressions (RegExp)

A Regular Expression is a sequence of characters that create a pattern. The sequence could be something complicated like <([A-Z][A-Z0-9]*)\b[^>]*>(.*?) to something simple like \d . Regular Expressions are very useful in VBA for working with many different scenarios of strings and introducing automation for transforming your data before loading to target databases for OLAP such as Essbase, Power Pivot or SSAS.

Regular Expression – Methods

Regular Expressions in VBA offers 3 Methods:

  1. Test
  2. Replace
  3. Execute

For today, I will focus on the Execute Method and look at the other Methods in future posts.

An example

Here’s a silly nonsensical string, 12kj$%23fg^&*34950…345. Let’s say I’m only interested in the numbers in the string so I want to return 122334950345. To return just the numbers from the string, I’ll use the pattern [0-9].

Option Explicit

Sub RegExFoo()
    'Author: Winston Snyder
    'Date: 12/11/2013
    'Extract a pattern of interest from an input string

    'Declare variables
        Dim wb As Workbook
        Dim ws As Worksheet
        Dim RE As Object
        Dim Match As Object
        Dim Matches As Object
        Dim strNumber As String
        Dim i As Long
    'Initialize variables
        Set wb = ThisWorkbook
        Set ws = wb.Worksheets("Sheet1")
        Set RE = CreateObject("VBScript.RegExp")
    'Criteria for Regular Expression
        With RE
            .Pattern = "[0-9]"
            .Global = True
            Set Matches = .Execute(ws.Range("A1").Value)
        End With
    'Loop Matches collection to build string of all numbers in the sample string
        strNumber = ""
        For i = 0 To Matches.Count - 1
            strNumber = strNumber + Matches(i)
        Next i
        MsgBox strNumber
    'Tidy up
        'Destroy objects
            Set Matches = Nothing
            Set RE = Nothing
            Set ws = Nothing
            Set wb = Nothing
End Sub

Output : 122334950345

Great! The code returned exactly what I was looking for.

Case Study: RegExp

The sample is a bit obscure and seems somewhat unlikely, but who knows – might be just the thing someone is looking for. Let’s take a look as something a bit more realistic.

Let’s say we receive a load file from Financial Analysis and Planning (FP&A) that they would like loaded to a cube as a forecast scenario. For our sample, we’ll use Customer Codes and Total Revenues. We receive the file, but we immediately see that there is a problem.


There is no delimiter between the customer code and the total revenue amount. Additionally, both sets of substrings are of varying length meaning we cannot use text to columns without some manual cleanup work. Let’s look at one way we might split these strings using Regular Expressions.

I’m going to:

  1. Read the strings into an array
  2. Loop the array
  3. Split the string into substrings – load the substrings into new arrays
  4. Output the contents of the new arrays to a worksheet

A Quick Note – Early / Late Binding

A discourse on Early / Late Binding is beyond the scope of this post. Suffice to say, I am using Late Binding as demonstrated:

 Dim RegEx                               As Object
'Create a regular epression object
        Set RegEx = GetRegEx
Private Function GetRegEx() As Object

    On Error Resume Next
    Set GetRegEx = CreateObject("VBScript.RegExp")
End Function

There are additional links at the bottom of the post for more information on Early / Late Binding.

The Complete Code

You can open a new workbook, launch the Visual Basic Editor (VBE) add a new module and paste the code below into the module. I broke the Subs() and Functions() into separate snippets to improve readability. Alternatively, you can download the workbook, the link is at the bottom of the post.

 Option Explicit
 Sub SplitStringNoDelimiter()
    'Author:        Winston Snyder
    'Date:          12/15/2013
    'Purpose:       Split string into text and value components
    'Comments:      No delimiter
    '               Loop array for output
    Dim RegEx                               As Object
    Dim wb                                  As Workbook
    Dim wsInput                             As Worksheet
    Dim wsOutput                            As Worksheet
    Dim rngInput                            As Range
    Dim rngOutputDescriptions               As Range
    Dim rngOutputValues                     As Range
    Dim arrInput()                          As Variant
    Dim arrOutputDescriptions()             As Variant
    Dim arrOutputValues()                   As Variant
    Dim i                                   As Long
    Dim lngRowsData                         As Long
    Const strPatternDescriptions            As String = "\D+"
    Const strPatternValues                  As String = "\d+(\.\d{1,2})?"
    Const lngColumnDescriptions             As Long = 1
    Const lngColumnValues                   As Long = 2
    'Excel enrionment - speed things up
        With Application
            .ScreenUpdating = False
            .DisplayAlerts = False
            .EnableEvents = False
            .Calculation = xlCalculationManual
        End With
        Set wb = ThisWorkbook
        With wb
            Set wsInput = .Worksheets("Input")
            Set wsOutput = .Worksheets("Output")
        End With
    'Clear previous output
    'Input range without header
        With wsInput
            lngRowsData = GetRow(ws:=wsInput)
            Set rngInput = .Range(.Cells(2, 1), .Cells(lngRowsData, 1))
        End With
    'Transfer input range to input array
        arrInput = rngInput
    'Dimension output arrays
        ReDim arrOutputDescriptions(LBound(arrInput) To UBound(arrInput))
        ReDim arrOutputValues(LBound(arrInput) To UBound(arrInput))
    'Create a regular epression object
        Set RegEx = GetRegEx
    'Loop through each string in the input array
        For i = LBound(arrInput) To UBound(arrInput)
            'Pass the string to regular expression function to return the descriptive portion of the string
                arrOutputDescriptions(i) = GetSubString(objRegEx:=RegEx, _
                                                        strString:=CStr(arrInput(i, 1)), _
            'Pass the string to regualr expressions functions to return the value portion of the string
                arrOutputValues(i) = GetSubString(objRegEx:=RegEx, _
                                                  strString:=CStr(arrInput(i, 1)), _
        Next i
    'Output all elements of each array to an output range
    'Description in Column 1, Values in Column 2
            Call OutputArray(ws:=wsOutput, _
                             vTmpArray:=arrOutputDescriptions, _
            Call OutputArray(ws:=wsOutput, _
                             vTmpArray:=arrOutputValues, _
    'Add a header to the data
        With wsOutput
            .Range("A1").EntireRow.Insert shift:=xlDown
            .Cells(1, 1) = "Descriptions"
            .Cells(1, 2) = "Values"
        End With

    'Tidy up
        'Erase arrays
            Erase arrInput
            Erase arrOutputDescriptions
            Erase arrOutputValues
        'Destroy objects
            Set RegEx = Nothing
            Set rngInput = Nothing
            Set wsInput = Nothing
            Set wsOutput = Nothing
            Set wb = Nothing
        'Restore Excel environment
            With Application
                .ScreenUpdating = True
                .DisplayAlerts = True
                .EnableEvents = True
                .Calculation = xlCalculationAutomatic
            End With
End Sub


Private Sub OutputArray(ws As Worksheet, _
                        vTmpArray() As Variant, _
                        lngColumn As Long)

    Dim j As Long
    For j = LBound(vTmpArray) To UBound(vTmpArray)
        ws.Cells(j, lngColumn).Value = vTmpArray(j)
    Next j
End Sub


Private Function GetRow(ws As Worksheet) As Long

    GetRow = ws.Cells(Rows.Count, 1).End(xlUp).Row

End Function


Private Function GetRange(ws As Worksheet, _
                          lngRowsStart As Long, _
                          lngRowsEnd As Long, _
                          lngColumn As Long) As Range

    Dim rng As Range

    With ws
        Set rng = .Range(.Cells(lngRowsStart, lngColumn), .Cells(lngRowsEnd, lngColumn))
    End With
    Set GetRange = rng
End Function


Private Function GetRegEx() As Object

    On Error Resume Next
    Set GetRegEx = CreateObject("VBScript.RegExp")

End Function


Private Function GetSubString(objRegEx As Object, _
                              strString As String, _
                              strPattern As String) As String
    Dim reMatches As Object
    Dim strResult As String
    strResult = "No Match"
    With objRegEx
        .Pattern = strPattern
        .Global = True

        Set reMatches = .Execute(strString)
        If reMatches.Count <> 0 Then
            strResult = reMatches.Item(0)
        End If
    End With
    GetSubString = strResult

End Function

The Results

Great! Exactly what I was looking for.

Tidy up

    Final Thoughts

    Regular Expressions are found in nearly all programming languages and much like Duct Tape, they have a Million and one uses. You are truly only limited by your imagination and ability to concoct the correct patterns. I use Regular Expressions as I Extract data from various Data Silos to Transform and normalize data prior to Loading to target reporting databases. Let us know how you use Regular Expressions in the comments.


    Download the workbook from OneDrive.

    Additional Resources

, , , ,

13 comments untill now

  1. Thanks for the kind comments about my blog!

    I have to admit that regular expressions have been a bit of a blind spot for me. I’ve been aware of them, but never really make any use of them, but I’ve got a potential application in mind, so I’ll try and have a look over the Christmas break.

  2. Hi Doug,
    No worries. I’ve been reading Newton Excel Bach for some time. Always interesting to see the different ways folks push the Excel envelope.

  3. […] after I wrote about extracting numbers from text strings, Winston Snyder at wrote a detailed article about using “regular expressions” to separate text from […]

  4. Thanks Doug!

  5. C++ seems to be kryptonite to VBAers, but on the off chance you have an open mind about it hooks the C++ regular expression library into Excel.
    One funny thing I’ve found is Excel converts three periods into the ellipsis character x85.

  6. Thank you so much for taking the time to carefully develop and explain “SplitStringNoDelimeter”. I am a RegEx beginner and would like advice on how to find specific key words contained in two specific columns within a spreadsheet, columns “Z” & “AA”. Once a match is found, I would like to copy the entire row from which the match was found, and copy to a new worksheet for further comparison and analysis. I need the power of RegEx to complete the task so I can write the pattern to catch any misspelled key words. I want to ensure preservation of all original data and minimize the chance of overlooking valuable data based on the fact a word was misspelled.
    Any suggestions or modifications are greatly appreciated. Thank you in advance for your time.

  7. Hi Dana,

    The most efficient approach is to always try to use AutoFilter or AdvancedFilter Methods of the Range Object first, if all else fails, then loop by cell. You can use wildcard characters with the AdvancedFilter Method to handle possible misspellings.

    Nice sample on Tushar Mehta’s site:

  8. Jeff Weir @ 2015-02-12 03:47

    Hi Winston. Great post as always. Hey, the download link doesn’t seem to be working. Can you amend? I can’t wait to take a closer look.

  9. Hi Jeff,

    Link is fixed – thanks


  10. Jeff Weir @ 2015-02-13 13:30

    I just stumbled across an incredible article on Joshua Honig’s Byte Comb blog titled Increasing Performance of Regular Expressions in VBA at
    Here’ a direct quote

    Behind the scenes, every regular expression pattern gets compiled into a tiny little program that executes in order to actually apply the regular expression. The first time you call any of the methods on a RegExp object there will be a short pause while the expression is compiled. Once the regular expression has been compiled, though, it can execute right away.

    The pause for compilation is usually on the order of a few tens of milliseconds. That’s small enough that you might never notice it even when stepping through code. But it can really add up when the expression is going to be applied to a few hundred thousand rows.

    If you’re using a RegExp object in a procedure while looping through a recordset programmatically you can just keep using the same RegExp object. But what about when the same pattern is used in independent calls to the same function — or even to different functions entirely? The solution is to cache all of your regular expressions objects, and serve them up using a simple factory pattern. In this article I’ll show how to do just that.

    This is recommended reading for anyone mucking about with Regex.

  11. Hi !

    can you please send me the soft copy. I cannot dounload from the link.

    Thank you very much

  12. Hi Sharif,

    Should be fixed. Give it another try and let me know if not.


  13. Thanks it works just fine !

Add your comment now