MrCleanFinalFinal

In my last post on working with strings, I demonstrated some VBA with a call to the Regular Expression (RegExp) library to split alpha characters from numeric characters where no delimiter was present. Today I received a data set that contained 10’s of thousand of strings that contained some trailing stuff that I wanted to remove. My initial thought was to use the RegExp engine with the correct pattern, but I discovered a better way.

The Requirements

Upon review of the strings, the pattern I discovered:

  1. The string always begin with an alpha or numeric character with a mix of upper and lower case
  2. The string I need to preserve end with an alpha or numeric character with a mix of upper and lower case
  3. Everything from the beginning of the of the first alphanumeric to the last alphanumeric must be preserved as is, spaces, case, special characters, whatever
  4. Everything trailing to right of the last alphanumeric may safely be removed, special characters, non-printable characters, spaces, whatever
  5. Strings are of random lengths both input and output

So, I need to :

  1. Search from the end (right) of the string
  2. Find the first alphanumeric character irregardless of case
  3. Return the string beginning from the first character to the character position determined in the previous step

Quick Segue – The Functions

Before I get to the Sub Procedure, I would like to review all of the Functions I am using to make the process fairly dynamic :

GetSelectedSheet

Here I am using an InputBox to allow the user to select a worksheet at run-time. The InputBox Method has 1 required parameter and 7 optional parameters. If the optional parameters are not utilized, then the InputBox returns a text value. However, The optional Type parameter makes the InputBox more powerful. In the function, I am using Type:=8 to return a cell reference as a Range Object. You can read more about the InputBox Method here.

Public Function GetSelectedSheet() As String
    
    'Declare variables
        Dim ws                          As Worksheet
        Dim rng                         As Range
    
    'Users - select a cell on a worksheet
        Set rng = InputBox( _
                    Prompt:="Please select a cell on a worksheet", _
                    Title:="Select a worksheet", _
                    Default:=ActiveCell.Address, _
                    Type:=8) 'Range selection
                    
    'Get the parent worksheet of the selected cell
        Set ws = rng.Parent
        
    'Pass the name of the worksheet to the function
        GetSelectedSheet = ws.Name
    
    'Tidy up
        Set rng = Nothing
        Set ws = Nothing
 End Function

GetRows

Fairly straight forward, I pass the selected worksheet to the function and it returns the maximum rows of data based on Column 1. I could make this more dynamic by passing a column number to the function as well, but I generally always use Column 1.

Public Function GetRows(ws As Worksheet) As Long

    Dim r As Long
    
    With ws
        r = .Cells(Rows.Count, 1).End(xlUp).Row
    End With
    
    GetRows = r
End Function

GetColumns

Straight forward, I pass the selected worksheet to the function and it returns the maximum columns of data based on Row 1. I could make this more dynamic by passing a row number to the function as well, but I generally always use Row 1.

Public Function GetColumns(ws As Worksheet) As Long

     'Declare variables    
           Dim c As Long
    
     'Get column count, store it in a variable    
          With ws
               c = .Cells(1, Columns.Count).End(xlToLeft).Column
          End With

     'Pass the variable value to the function
          GetColumns = c
End Function

GetUserInput

Again, fairly straight forward. Get a text value from the user to search for in the next function.
I call the function like this :

    'User - What is search term?
        strSearchTerm = GetUserInput(strPrompt:="What is the search term?", _
                                     strTitle:="Find Column Number")
Public Function GetUserInput(strPrompt As String, _
                             strTitle As String) As String
       
    'Declare variables
         Dim strUserInput As String
       
    'Call the InputBox Method, pass user input to a variable
          strUserInput = InputBox(Prompt:=strPrompt, _
                                  Title:=strTitle)

    'Pass the variable value to the function                                 
         GetUserInput = strUserInput

End Function

GetColumnNumber

The function has 2 arguments, a worksheet and a string value that I got from the user in the last function. The function will create a Range Object and search that Range for the term supplied by the user. Again, I am using Row 1 here, but I could make it more dynamic by passing a row number to the function as one of its arguments. Below, I am using the Named Argument, LookAt:= and passing the value xlPart instead of xlWhole. You may want to consider this in your VBA Projects as you program defensively around what the user may input. Since I am using this for myself, I am not too concerned for now,

Public Function GetColumnNumber(ws As Worksheet, _
                                strSearchTerm As String) As Long

    'Declare variables
        Dim rng As Range
        Dim MaxColumns As Long
        Dim lngField As Long
        
    'Initialize
        MaxColumns = GetColumns(ws:=ws)
        With ws
            Set rng = .Range(.Cells(1, 1), .Cells(1, MaxColumns))
        End With
        
    'Find columns number
        lngField = rng.Find(What:=strSearchTerm, _
                            LookIn:=xlValues, _
                            LookAt:=xlPart, _
                            MatchCase:=False).Column
                            
    'Pass the column number to the function
        GetColumnNumber = lngField
        
    'Tidy up
        Set rng = Nothing

End Function

GetCleanAlphaNumeric

This is the money! All other functions to this point were setup work to allow this function to do the heavy lifting. The function uses the LIKE operator to compare a character beginning at the right-most position of the string to the pattern, “[0-9A-Za-z]”. As soon as a match is found, the For..Next Loop is exited, thus saving time by not checking characters unnecessarily. I then use the MID() Function to get the string from the 1st character to the last alphanumeric character position determined in the For..Next Loop. More on the LIKE operator here.

Public Function GetCleanAlphaNumeric(strChar As String) As String
    
    'Comments   :   Remove non-alpha numeric characters from end of string
   
    'Declare variables
        Dim i As Long
        Dim lngLengthString As Long
        Dim blnTest As Boolean
        Dim posLastAlphaNumeric As Long
        Dim strClean As String
   
    'Initialize
        blnTest = False

    'Length of string to check
        lngLengthString = Len(CStr(strChar))
        
    'Compare each charcter to pattern
    'Begin at end of string
    'Stop as soon as find alphanumeric
        For posLastAlphaNumeric = lngLengthString To 1 Step -1
            blnTest = Mid(CStr(strChar), posLastAlphaNumeric, 1) Like "[0-9A-Za-z]"
            If blnTest = True Then Exit For
        Next posLastAlphaNumeric
        
    'posLastAlphaNumeric is the position of last AlphaNumeric character
    'Use the position of the last alphanumeric to get the final length of the string
    'Assign the value to the range
        strClean = CStr(Mid(strChar, 1, posLastAlphaNumeric))
        
    'Pass the clean string to the function
        GetCleanAlphaNumeric = strClean

 End Function

The Main Procedure

Here is the main procedure that calls all of the functions. Note: Screen updating must be on for the user to select a cell on a worksheet. Turn ScreenUpdating off after the user selects a cell on a worksheet.

Option Explicit
Sub CleanStrings()
     
    'Author:        Winston Snyder
    'Date:          3/28/14
    'Purpose:       Get string excluding non-alphanumeric trailing characters
    '---------------------------------------------------------------------------------------------------------------------------------------------
     
    'Declare variables
        Dim wb                                  As Workbook
        Dim ws                                  As Worksheet
        Dim rng                                 As Range
        Dim C                                   As Range
        Dim strSearchTerm                       As String
        Dim strStringToBeCleaned                As String
        Dim lngColumnNumber                     As Long
        Dim MaxRows                             As Long
     
    'Excel environment - speed things up
        With Application
            .DisplayAlerts = False
            .EnableEvents = False
            .Calculation = xlCalculationManual
        End With
     
    'Initialize
        Set wb = ThisWorkbook
        Set ws = wb.Worksheets(GetSelectedSheet)
        Application.ScreenUpdating = False
        
    'Get maximum number of rows on the worksheet
        MaxRows = GetRows(ws:=ws)
         
    'User - What is search term?
        strSearchTerm = GetUserInput(strPrompt:="What is the search term?", _
                                     strTitle:="Find Column Number")
                                     
    'Get the column number based on the search term
        lngColumnNumber = GetColumnNumber(ws:=ws, _
                                          strSearchTerm:=strSearchTerm)
                                          
    'Define the range that contains strings to be cleaned
        With ws
            Set rng = .Range(.Cells(2, lngColumnNumber), .Cells(MaxRows, lngColumnNumber))
        End With
        
    'Clean each string in the range
        For Each C In rng
            strStringToBeCleaned = CStr(C.Value)
            C.Value = GetCleanAlphaNumeric(strChar:=strStringToBeCleaned)
        Next C

    'Tidy up
             
        'Destroy objects
            Set C = Nothing
            Set rng = Nothing
            Set ws = Nothing
            Set wb = Nothing
             
        'Restore Excel environment
            With Application
                .ScreenUpdating = True
                .DisplayAlerts = True
                .EnableEvents = True
                .Calculation = xlCalculationAutomatic
            End With
End Sub

Tidy up

    Final Thoughts

    That’s it today. I like the LIKE operator. This process is fast, reviewed 25K strings, and updated them when needed in no time. Awesome!

,