MrCleanFinalFinal

In my last post on working with strings, I demonstrated some VBA with a call to the Regular Expression (RegExp) library to split alpha characters from numeric characters where no delimiter was present. Today I received a data set that contained 10’s of thousand of strings that contained some trailing stuff that I wanted to remove. My initial thought was to use the RegExp engine with the correct pattern, but I discovered a better way.

The Requirements

Upon review of the strings, the pattern I discovered:

  1. The string always begin with an alpha or numeric character with a mix of upper and lower case
  2. The string I need to preserve end with an alpha or numeric character with a mix of upper and lower case
  3. Everything from the beginning of the of the first alphanumeric to the last alphanumeric must be preserved as is, spaces, case, special characters, whatever
  4. Everything trailing to right of the last alphanumeric may safely be removed, special characters, non-printable characters, spaces, whatever
  5. Strings are of random lengths both input and output

So, I need to :

  1. Search from the end (right) of the string
  2. Find the first alphanumeric character irregardless of case
  3. Return the string beginning from the first character to the character position determined in the previous step

Quick Segue – The Functions

Before I get to the Sub Procedure, I would like to review all of the Functions I am using to make the process fairly dynamic :

GetSelectedSheet

Here I am using an InputBox to allow the user to select a worksheet at run-time. The InputBox Method has 1 required parameter and 7 optional parameters. If the optional parameters are not utilized, then the InputBox returns a text value. However, The optional Type parameter makes the InputBox more powerful. In the function, I am using Type:=8 to return a cell reference as a Range Object. You can read more about the InputBox Method here.

Public Function GetSelectedSheet() As String
    
    'Declare variables
        Dim ws                          As Worksheet
        Dim rng                         As Range
    
    'Users - select a cell on a worksheet
        Set rng = InputBox( _
                    Prompt:="Please select a cell on a worksheet", _
                    Title:="Select a worksheet", _
                    Default:=ActiveCell.Address, _
                    Type:=8) 'Range selection
                    
    'Get the parent worksheet of the selected cell
        Set ws = rng.Parent
        
    'Pass the name of the worksheet to the function
        GetSelectedSheet = ws.Name
    
    'Tidy up
        Set rng = Nothing
        Set ws = Nothing
 End Function

GetRows

Fairly straight forward, I pass the selected worksheet to the function and it returns the maximum rows of data based on Column 1. I could make this more dynamic by passing a column number to the function as well, but I generally always use Column 1.

Public Function GetRows(ws As Worksheet) As Long

    Dim r As Long
    
    With ws
        r = .Cells(Rows.Count, 1).End(xlUp).Row
    End With
    
    GetRows = r
End Function

GetColumns

Straight forward, I pass the selected worksheet to the function and it returns the maximum columns of data based on Row 1. I could make this more dynamic by passing a row number to the function as well, but I generally always use Row 1.

Public Function GetColumns(ws As Worksheet) As Long

     'Declare variables    
           Dim c As Long
    
     'Get column count, store it in a variable    
          With ws
               c = .Cells(1, Columns.Count).End(xlToLeft).Column
          End With

     'Pass the variable value to the function
          GetColumns = c
End Function

GetUserInput

Again, fairly straight forward. Get a text value from the user to search for in the next function.
I call the function like this :

    'User - What is search term?
        strSearchTerm = GetUserInput(strPrompt:="What is the search term?", _
                                     strTitle:="Find Column Number")
Public Function GetUserInput(strPrompt As String, _
                             strTitle As String) As String
       
    'Declare variables
         Dim strUserInput As String
       
    'Call the InputBox Method, pass user input to a variable
          strUserInput = InputBox(Prompt:=strPrompt, _
                                  Title:=strTitle)

    'Pass the variable value to the function                                 
         GetUserInput = strUserInput

End Function

GetColumnNumber

The function has 2 arguments, a worksheet and a string value that I got from the user in the last function. The function will create a Range Object and search that Range for the term supplied by the user. Again, I am using Row 1 here, but I could make it more dynamic by passing a row number to the function as one of its arguments. Below, I am using the Named Argument, LookAt:= and passing the value xlPart instead of xlWhole. You may want to consider this in your VBA Projects as you program defensively around what the user may input. Since I am using this for myself, I am not too concerned for now,

Public Function GetColumnNumber(ws As Worksheet, _
                                strSearchTerm As String) As Long

    'Declare variables
        Dim rng As Range
        Dim MaxColumns As Long
        Dim lngField As Long
        
    'Initialize
        MaxColumns = GetColumns(ws:=ws)
        With ws
            Set rng = .Range(.Cells(1, 1), .Cells(1, MaxColumns))
        End With
        
    'Find columns number
        lngField = rng.Find(What:=strSearchTerm, _
                            LookIn:=xlValues, _
                            LookAt:=xlPart, _
                            MatchCase:=False).Column
                            
    'Pass the column number to the function
        GetColumnNumber = lngField
        
    'Tidy up
        Set rng = Nothing

End Function

GetCleanAlphaNumeric

This is the money! All other functions to this point were setup work to allow this function to do the heavy lifting. The function uses the LIKE operator to compare a character beginning at the right-most position of the string to the pattern, “[0-9A-Za-z]”. As soon as a match is found, the For..Next Loop is exited, thus saving time by not checking characters unnecessarily. I then use the MID() Function to get the string from the 1st character to the last alphanumeric character position determined in the For..Next Loop. More on the LIKE operator here.

Public Function GetCleanAlphaNumeric(strChar As String) As String
    
    'Comments   :   Remove non-alpha numeric characters from end of string
   
    'Declare variables
        Dim i As Long
        Dim lngLengthString As Long
        Dim blnTest As Boolean
        Dim posLastAlphaNumeric As Long
        Dim strClean As String
   
    'Initialize
        blnTest = False

    'Length of string to check
        lngLengthString = Len(CStr(strChar))
        
    'Compare each charcter to pattern
    'Begin at end of string
    'Stop as soon as find alphanumeric
        For posLastAlphaNumeric = lngLengthString To 1 Step -1
            blnTest = Mid(CStr(strChar), posLastAlphaNumeric, 1) Like "[0-9A-Za-z]"
            If blnTest = True Then Exit For
        Next posLastAlphaNumeric
        
    'posLastAlphaNumeric is the position of last AlphaNumeric character
    'Use the position of the last alphanumeric to get the final length of the string
    'Assign the value to the range
        strClean = CStr(Mid(strChar, 1, posLastAlphaNumeric))
        
    'Pass the clean string to the function
        GetCleanAlphaNumeric = strClean

 End Function

The Main Procedure

Here is the main procedure that calls all of the functions. Note: Screen updating must be on for the user to select a cell on a worksheet. Turn ScreenUpdating off after the user selects a cell on a worksheet.

Option Explicit
Sub CleanStrings()
     
    'Author:        Winston Snyder
    'Date:          3/28/14
    'Purpose:       Get string excluding non-alphanumeric trailing characters
    '---------------------------------------------------------------------------------------------------------------------------------------------
     
    'Declare variables
        Dim wb                                  As Workbook
        Dim ws                                  As Worksheet
        Dim rng                                 As Range
        Dim C                                   As Range
        Dim strSearchTerm                       As String
        Dim strStringToBeCleaned                As String
        Dim lngColumnNumber                     As Long
        Dim MaxRows                             As Long
     
    'Excel environment - speed things up
        With Application
            .DisplayAlerts = False
            .EnableEvents = False
            .Calculation = xlCalculationManual
        End With
     
    'Initialize
        Set wb = ThisWorkbook
        Set ws = wb.Worksheets(GetSelectedSheet)
        Application.ScreenUpdating = False
        
    'Get maximum number of rows on the worksheet
        MaxRows = GetRows(ws:=ws)
         
    'User - What is search term?
        strSearchTerm = GetUserInput(strPrompt:="What is the search term?", _
                                     strTitle:="Find Column Number")
                                     
    'Get the column number based on the search term
        lngColumnNumber = GetColumnNumber(ws:=ws, _
                                          strSearchTerm:=strSearchTerm)
                                          
    'Define the range that contains strings to be cleaned
        With ws
            Set rng = .Range(.Cells(2, lngColumnNumber), .Cells(MaxRows, lngColumnNumber))
        End With
        
    'Clean each string in the range
        For Each C In rng
            strStringToBeCleaned = CStr(C.Value)
            C.Value = GetCleanAlphaNumeric(strChar:=strStringToBeCleaned)
        Next C

    'Tidy up
             
        'Destroy objects
            Set C = Nothing
            Set rng = Nothing
            Set ws = Nothing
            Set wb = Nothing
             
        'Restore Excel environment
            With Application
                .ScreenUpdating = True
                .DisplayAlerts = True
                .EnableEvents = True
                .Calculation = xlCalculationAutomatic
            End With
End Sub

Tidy up

    Final Thoughts

    That’s it today. I like the LIKE operator. This process is fast, reviewed 25K strings, and updated them when needed in no time. Awesome!

,

SpyVsSpyLg

In January, I wrote a post that demonstrated how to combine Excel Workbooks. That post assumed that each of the source workbooks contained at least one ListObject Object.

Today, I will revise that code a bit to add a ListObject Object (LO) if the lo does not exist. Then I will demonstrate the code to merge Workbooks using Range Objects. Lastly, I will compare both processes to see which is the fastest using the Windows High Resolution Timer.


    edit: As is my usual practice, I am going to use Late Binding in the sample snippets below. A discussion on Late / Early Binding is beyond the scope of this post. Please see these links for a detailed explanation of Late / Early Binding.

    1. Beyond Excel
    2. JP Software Technologies


The ListObject Object (LO)

One of the very useful features the LO, is that it offers us the three distinct properties: HeaderRowRange, DataBodyRange, TotalsRowRange. Once we have a LO, we can leverage these properties to streamline the heavy lifting.

ListObjectRanges

Check If The ListObject Object Exists

I can check if the LO exists, if not, I can add one while I’m looping through workbooks:

Option Explicit

Sub AddListObjectIfDoesNotExist()

    'Does a  ListObject Object exist?
    
    Dim wb As Workbook
    Dim ws As Worksheet
    Dim lo As ListObject
    
    Set wb = ThisWorkbook
    For Each ws In wb.Worksheets
        Set lo = ws.Range("A1").ListObject
        If lo Is Nothing Then
            Set lo = ws.ListObjects.Add( _
                                    SourceType:=xlSrcRange, _
                                    Source:=ws.Range("A1").CurrentRegion, _
                                    Destination:=ws.Range("A1"))
        Else
            'ListObject already exists - do nothing
        End If
    Next ws
    
    'Tidy up
        Set lo = Nothing
        Set wb = Nothing
End Sub

Now that I can add an LO dynamically, I can exploit its properties with assurance.

Here is the fully revised code to merge Excel Workbooks using ListObject Objects:

Option Explicit
 
Sub MergeWorbooksUsingListObjects()
     
    'Log:
    'Date               Author                      Action                  Comment
    '-------------------------------------------------------------------------------------------------------------------
    '1/26/2014           ws                         Created                 Merge workbooks using ListObjects from source files
    '3/23/2014           ws                         Modified                If ListObject does not exist, add ListObject
    '-------------------------------------------------------------------------------------------------------------------
     
    'Delare variables
        Dim wb                  As Workbook
        Dim wbData              As Workbook
        Dim ws                  As Worksheet
        Dim wsData              As Worksheet
        Dim rngData             As Range
        Dim rngDestination      As Range
        Dim lo                  As ListObject
        Dim fso                 As Object
        Dim fsoFolder           As Object
        Dim fsoFile             As Object
        Dim strSelectedFolder   As String
        Dim strCurrentPath      As String
        Const strSpecifiedPath  As String = "C:\"
        Dim lngRows             As Long
        Dim blnFlag             As Boolean
         
    'Excel environment - speed things up
        With Application
            .ScreenUpdating = False
            .DisplayAlerts = False
            .EnableEvents = False
            .Calculation = xlCalculationManual
        End With
         
    'Initialize variables
        Set wb = ThisWorkbook
        Set ws = wb.Worksheets("Data")
        blnFlag = True
         
    'Clear data from control workbook from previous consolidations
        ws.UsedRange.ClearContents
     
    'Get the current path, so reset the path at the end of the procedure
        strCurrentPath = CurDir()
         
    'Set the target directory to get the user closer to the working folder
    'This will minimize the time the user must spend drilling into the file system
    'once they are presented with the FileDialog
        ChDir (strSpecifiedPath)
     
    'Create a FileSystemObject
        Set fso = GetFSO
 
    'Prompt the user to select a folder
    'Return the path of the selected folder
        strSelectedFolder = GetSelectedFolder
         
    'Get the FSO Folder of the selected folder
        Set fsoFolder = fso.GetFolder(strSelectedFolder)
         
    'Loop each file in folder
    'Copy data from each file to control workbook
        For Each fsoFile In fsoFolder.Files
            Set wbData = Workbooks.Open(fsoFile)
            Set wsData = wbData.Worksheets("Sheet1")
             
            'Get next blank row from destination worksheet
            'If first time, need row 1, else, next blank row
                lngRows = GetRows(ws:=ws)
                If blnFlag = False Then lngRows = lngRows + 1
                 
            'The Destination Range
                Set rngDestination = ws.Cells(lngRows, 1)
             
            With wsData
                'If a ListObject does not exist, add a ListObject
                    Set lo = .Range("A1").ListObject
                    If lo Is Nothing Then
                        Set lo = .ListObjects.Add( _
                                              SourceType:=xlSrcRange, _
                                              Source:=.Range("A1").CurrentRegion, _
                                              Destination:=ws.Range("A1"))
                    Else
                        'Do nothing, ListObject already exists
                    End If
                
                'If first workbook, include the header row
                    For Each lo In .ListObjects
                        If blnFlag = True Then
                            Set rngData = Union(lo.HeaderRowRange, lo.DataBodyRange)
                            blnFlag = False
                        Else
                            Set rngData = lo.DataBodyRange
                        End If
                    Next lo
            End With
  
            'Copy the Data Range to the Destination Range
                rngData.Copy
                rngDestination.PasteSpecial xlPasteValuesAndNumberFormats
                 
            'Close the source file without saving
                wbData.Close
         
        Next fsoFile
         
    'Tidy up
        'Restore to original path
            ChDir (strCurrentPath)
             
        'Restore Excel environment
            With Application
                .ScreenUpdating = True
                .DisplayAlerts = True
                .EnableEvents = True
                .Calculation = xlCalculationAutomatic
            End With
             
        'Destroy objects
            Set fsoFolder = Nothing
            Set fso = Nothing
            Set lo = Nothing
            Set rngData = Nothing
            Set rngDestination = Nothing
            Set ws = Nothing
            Set wb = Nothing
         
End Sub

And the functions:


Public Function GetRows(ws As Worksheet) As Long
 
    Dim r       As Long
     
    With ws
        r = .Cells(Rows.Count, 1).End(xlUp).Row
        GetRows = r
    End With
     
End Function

'-------------------------------------------------------------------------
Public Function GetFSO()
 
    Dim fso             As Object
     
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set GetFSO = fso
     
    Set fso = Nothing
 
End Function

'-------------------------------------------------------------------------
Public Function GetSelectedFolder() As String
     
    Dim diaFolder       As FileDialog
    Dim strFolder       As String
 
    Set diaFolder = Application.FileDialog(msoFileDialogFolderPicker)
    With diaFolder
        .AllowMultiSelect = False
        .Show
        strFolder = .SelectedItems(1)
    End With
 
    GetSelectedFolder = strFolder
End Function

Works great!

The Range Object

The Range Object does not have the properties: HeaderRowRange, DataBodyRange, TotalsRowRange. I am ok with the entire Range from the first Workbook when Merging multiple Workbooks. But on subsequent Workbooks, I want to exclude the header row.

Assume I have a Range named rng with address $A$1:$G$10, from this, I would like to exclude the header row on Row 1. The first step is to count the number of rows and columns in the range

Dim r As Long
dim c As Long
r = rng.Rows.Count
c = rng.Columns.Count

Next, from the rng, move down 1 Row. This means the rng address is now $A$2:$G$11

rng.Offset(1)

The $A$2 part of the address is good, but the $G$11 part means I now have a blank row. So now I need to Resize the rng to exclude the blank row:

rng.Offset(1).Resize(r-1,c)

And test the final Range Address to make sure all is as expected:

Debug.Print rng.Address

$A$2:$G$10

Perfect! Exactly what I was looking for. I’m now ready to modify the code again to use Range Objects instead of ListObject Objects.

Merge Workbooks Using Range Objects

Here’s the working code to merge workbooks using Range Objects instead of ListObject Objects:

Option Explicit
 
Sub MergeWorbooksUsingRangeObjects()
     
    'Log:
    'Date               Author                      Action                  Comment
    '-------------------------------------------------------------------------------------------------------------------------------------------
    '1/26/2014           ws                         Created                 Merge workbooks using ListObjects from source files
    '3/23/2014           ws                         Modified                If ListObject does not exist, add ListObject
    '3/23/2014           ws                         Modified                Use Range Object, modify range to exclude header row if not File = 1
    '-------------------------------------------------------------------------------------------------------------------------------------------
     
    'Delare variables
        Dim wb                  As Workbook
        Dim wbData              As Workbook
        Dim ws                  As Worksheet
        Dim wsData              As Worksheet
        Dim rngData             As Range
        Dim rngDestination      As Range
        Dim fso                 As Object
        Dim fsoFolder           As Object
        Dim fsoFile             As Object
        Dim strSelectedFolder   As String
        Dim strCurrentPath      As String
        Const strSpecifiedPath  As String = "C:\"
        Dim lngRows             As Long
        Dim r                   As Long
        Dim c                   As Long
        Dim blnFlag             As Boolean
         
    'Excel environment - speed things up
        With Application
            .ScreenUpdating = False
            .DisplayAlerts = False
            .EnableEvents = False
            .Calculation = xlCalculationManual
        End With
         
    'Initialize variables
        Set wb = ThisWorkbook
        Set ws = wb.Worksheets("Data")
        blnFlag = True
         
    'Clear data from control workbook from previous consolidations
        ws.UsedRange.ClearContents
     
    'Get the current path, so reset the path at the end of the procedure
        strCurrentPath = CurDir()
         
    'Set the target directory to get the user closer to the working folder
    'This will minimize the time the user must spend drilling into the file system
    'once they are presented with the FileDialog
        ChDir (strSpecifiedPath)
     
    'Create a FileSystemObject
        Set fso = GetFSO
 
    'Prompt the user to select a folder
    'Return the path of the selected folder
        strSelectedFolder = GetSelectedFolder
         
    'Get the FSO Folder of the selected folder
        Set fsoFolder = fso.GetFolder(strSelectedFolder)
         
    'Loop each file in folder
    'Copy data from each file to control workbook
        For Each fsoFile In fsoFolder.Files
            Set wbData = Workbooks.Open(fsoFile)
            Set wsData = wbData.Worksheets("Sheet1")
             
            'Get next blank row from destination worksheet
            'If first time, need row 1, else, next blank row
                lngRows = GetRows(ws:=ws)
                If blnFlag = False Then lngRows = lngRows + 1
                 
            'The Destination Range
                Set rngDestination = ws.Cells(lngRows, 1)
             
            With wsData
                'Create the data range
                    Set rngData = .Range("A1").CurrentRegion
                    r = rngData.Rows.Count
                    c = rngData.Columns.Count
                
                'If first workbook, include the header row, otherwise resize the range to exclude the header row
                    If blnFlag = True Then
                        blnFlag = False
                    Else
                        Set rngData = rngData.Offset(1).Resize(r - 1, c)
                    End If
            End With
  
            'Copy the Data Range to the Destination Range
                rngData.Copy
                rngDestination.PasteSpecial xlPasteValuesAndNumberFormats
                 
            'Close the source file without saving
                wbData.Close
         
        Next fsoFile
         
    'Tidy up
        'Restore to original path
            ChDir (strCurrentPath)
             
        'Restore Excel environment
            With Application
                .ScreenUpdating = True
                .DisplayAlerts = True
                .EnableEvents = True
                .Calculation = xlCalculationAutomatic
            End With
             
        'Destroy objects
            Set fsoFolder = Nothing
            Set fso = Nothing
            Set rngData = Nothing
            Set rngDestination = Nothing
            Set ws = Nothing
            Set wb = Nothing
         
End Sub

That worked well, next I’ll add a little bit of code to time each of the Subs() to see if there is any appreciable difference.

Windows High-Resolution Timer (WHRT)

To use WHRT, I’ll just need to add 3 lines of code:

Declare a reference to the kernel32 library

Public Declare Function GetTickCount Lib "kernel32.dll" () As Long

Get the tick count and assign it to a variable

t = GetTickCount

Get the tick count at the end, subtract the first to calculate elapsed time

debug.print GetTickCount - t & " Milliseconds"

I’ll put the Public Function “GetTickCount” in a Module with my other Public Functions.
I normally put the first call to the Public Function near the top of a Sub(), but in this case I’m using a FileDialog to interact with the user so I want to place the call to the Function after the FileDialog:

    'Prompt the user to select a folder
    'Return the path of the selected folder
        strSelectedFolder = GetSelectedFolder
        
    'Get the tick count
        t = GetTickCount

The last call to the GetTickCount Function goes at the very end of the code:

            Set ws = Nothing
            Set wb = Nothing
            
        'Get elapsed time
             Debug.Print GetTickCount - t & " Milliseconds"

The Results

The Sub() adding the ListObject Object dynamically:

1420 Milliseconds
1233 Milliseconds
1419 Milliseconds
1419 Milliseconds
1248 Milliseconds
1451 Milliseconds
1435 Milliseconds
1264 Milliseconds
1653 Milliseconds
1264 Milliseconds
——————–
1380.6 Milliseconds Avg

The Sub() Resizing the Range Object:

1217 Milliseconds
1217 Milliseconds
1451 Milliseconds
1264 Milliseconds
1232 Milliseconds
1232 Milliseconds
1248 Milliseconds
1263 Milliseconds
1264 Milliseconds
1248 Milliseconds
————————
1263.6 Milliseconds Avg

Tidy Up

    Final Thoughts

    The Sub() using the Range Object is a bit faster, but not much. Bear in mind, I ran the process on 3 files -each file with 50 rows and 7 columns. The results may be more pronounced on more files with more data in each file.

    Downloads

    Download the file containing both Subs() and all Functions here.

    Additional Resources

    ListObject Objects

    ListObject Object Model
    ListObjects Object Model
    Working with Tables in Excel 2013, 2010 and 2007 (VBA)
    Working with Microsoft Excel tables in VBA

    Range Object

    The Range Object – Object Model

, , ,

ExportCrop

A member over on one of the LinkedIn Excel Groups had a question about how to export 1 to several worksheets from an Excel Workbook to a pdf file.

Below is the method I use to load a Sheets Array from the list the user defined on a worksheet. There are other ways to do this such as based on the ColorIndex of the tab, some character in the name of the worksheet or the code name of the worksheet.

Option Explicit
    
Sub ExportXLToPDF()

    'Comments:
    'Assume list of worksheets to be included in output are listed in Column 1 on "List"

    Dim wb                  As Workbook
    Dim ws                  As Worksheet
    Dim Arr()               As String
    Dim MaxRows             As Long
    Dim i                   As Long
    Dim strPath             As String
    Dim strFileName         As String
    Const strEXTENSION      As String = ".pdf"
    
    Set wb = ThisWorkbook
    Set ws = wb.Worksheets("List")
    
    'User - where to save the output file
        strPath = GetFolder & "\"
        
    'User - what to name the output file
        strFileName = GetUserInput(strPrompt:="Please enter a name for the output file", _
                                   strTitle:="File Name")
        
    'Assume list to be included in sheets array in on worksheet named list in Column 1 beginning in Row 1
    'Total number of rows is dynamic
        MaxRows = GetRows(ws:=ws)
        
    'Redim the array to hold the name of the worksheets
        ReDim Preserve Arr(1 To MaxRows)
    
    'Load the list of sheets to be included into the array
        For i = 1 To MaxRows
            Arr(i) = ws.Cells(i, 1).Value
        Next i
        
    'Select the sheets array
        Sheets(Arr).Select
 
    'Export to the sheets array to pdf
        ActiveSheet.ExportAsFixedFormat Type:=xlTypePDF, _
                                        Filename:=strPath & strFileName & strEXTENSION, _
                                        Quality:=xlQualityStandard, _
                                        IncludeDocProperties:=True, _
                                        IgnorePrintAreas:=False, _
                                        OpenAfterPublish:=False
                    
    'Tidy up
        'Erase arrays
            Erase Arr
        
        'Destroy objects
            Set ws = Nothing
            Set wb = Nothing
End Sub

And the functions:


Public Function GetRows(ws As Worksheet) As Long
 
    Dim r       As Long
     
    With ws
        r = .Cells(Rows.Count, 1).End(xlUp).Row
        GetRows = r
    End With
     
End Function

Public Function GetUserInput(strPrompt As String, _
                             strTitle As String) As String
      
    Dim strUserInput As String
      
    strUserInput = InputBox(Prompt:=strPrompt, _
                            Title:=strTitle)
                              
    GetUserInput = strUserInput
  
End Function

Public Function GetFolder() As String
  
    Dim fd As FileDialog
    Dim strFolderName As String
    Set fd = Application.FileDialog(msoFileDialogFolderPicker)
      
    With fd
        .Title = "Please select a folder"
        .AllowMultiSelect = False
        .Show
        strFolderName = .SelectedItems(1)
    End With
  
    GetFolder = strFolderName
      
    Set fd = Nothing
End Function
,

StoogesFinal

I’m a huge fan of the 3 Stooges (pictured here) and Curly Howard in particular. Without a doubt, they appeal to my 5 year-old sense of humor with their brand of inane slapstick.

Shemp Howard, Curly’s brother, was ok, but Curly Joe? Ah Marone! Don’t even get me started!

Curly Howard was pure slapstick comedic genius – undisputed. Unfortunately, Curly left us far too early.

Today’s post, however, is not about the Stooges or Curly. It is about Cartesian Products and creating “Controlled” random data.

Random Data Generators

Dick Kusleika and Jimmy Pena offer Random Data Generator add-ins on their respective sites. Jimmy’s will cost you a couple of bucks:

  1. Daily Dose of Excel – Dick Kusleika
  2. JP Software Technologies – Jimmy Pena

Cartesian Product

A Cartesian Product returns everything from Table A with everything from Table B which is generally unwanted and means you have not configured your join(s) properly.

Here I have a couple of Tables of NFL Teams and Divisions that I would like to join in a query to get the teams into their respective divisions.

AccessTables

Here’s my query in the Query Design Window

NFLQry1

When I run the query, I get a Cartesian Product where every team is returned against every division. It is not possible for the Arizona Cardinals to belong to 8 different divisions:

NFLCartesian

Revised Join

I revised the join to show the correct relationship between Division and Team:

NFLTeamDivRelate

Now I get the correct results when I run the query:

NFLTeamDivRelateData

Create “Controlled” Sample / “Dummy” Data

Now, I want to create some sample data to load for a PivotTable so I want some random values, but I want to control the Row Label and Column Field values. For Row Labels, I’ll use Regions and Representatives. This is a standard relationship, so we will reflect as such in the query design:

RegionsReps

Additionally, I would like to generate some sample (dummy) data for each rep for each day of the year. I’ll use a Cartesian Join to generate the dataset:

RegionsRepsCart

Note that there is no join between the Reps/Regions and Dates tables. The query results:

RegionsRepsCartResultRecAnnon

Let’s Create Some Random / “Dummy” Data

I’ll need a function in Access that I can add to the query to generate a random number

Public Function GetRandomValue(Optional x As Integer) As Double

    Dim dblRandom As Double
    
        dblRandom = (1000000 - 500000 + 1) * Rnd() + 500000
        GetRandomValue = dblRandom
        
End Function

Note the difference: in MS Access, the Built-in Function for generating a random number is Rnd(). In MS Excel, it’s Rand(). I added the User-Defined Function to the Query Design Grid and named the output field as SalesAmount:

QueryDesignRandomFunctionFinal

And the query results:

QueryRandomResultsFinal

The first value looks great, but do you notice how the same value keep repeating through every row? That’s not very random. The problem is the function is only called one time, so the random value is generated once and then repeated on every row. I’ll need to pass a unique value to the function to make it generate a new random number on every row of the query results.

To create the unique value, I’ll concatenate 3 fields in the query together

[RepName] & "-" & [RegionName] & "-" & [RecordDate]

I can then pass that as a unique value to the Random Function. First, I need to modify the Random Function a little bit. The Function was looking for me to pass an Integer Value. So I’ll just change the argument to a Variant.

'Old
Public Function GetRandomValue(Optional x As Integer) As Double

'New
Public Function GetRandomValue(v As Variant) As Double

Now I can update the query in the design grid:

QueryRandomValueUpdate

And the query results:

QueryResultsRandomValuesFinalAnno

Perfect! Ready to connect to the query with Excel PivotTable and pivot and slice ’til you get your fill.

MS Query

I was able to generate some random data with some controlled data in MS Access, can I repeat the same in MS Query? I’ll begin by copying the from each of the 3 Access tables to 3 different worksheets in an Excel Workbook:

AccessDataToExcel

Next, I’ll import the data from the 3 worksheets into MS Query

  1. Click on Data Tab on the Ribbon
  2. In the Get External Data group, click on From Other Sources
  3. In the resulting pop-up menu, click on From Microsoft Query

MSQueryMenuSteps

At the Choose Data Source Dialog, click on Excel Files and OK

ChooseDataSourceDiaAnn

At the Select Workbook Dialog, browse through the directory structure to find the workbook that contains the data that you would like to use to import from including the the current workbook.

SelectWBDiaAnn

At the Query Wizard – Choose Columns Dialog, expand nodes next to the tables in the panel on the left, find the columns from each table that you would like included in your query results, use arrow buttons to add and remove items from the pane, “Columns in your query”

QryWizardChooseColDia

You may receive an error message:

The Query Wizard can not continue because it can not join the tables in your query. You must join the tables manually in Microsoft Query by dragging the fields to join between the tables.

Go ahead and click, “OK”. The Query Results will be displayed with no joins between the tables:

MicrosoftQueryInitialResultsAnno

In the initial Query Results Window, notice that there are no Joins, so MS Query went ahead and created a Cartesian Product of Table A, B and C thus returning 14.6K records.

Go ahead and add the Join, by dragging RegionKey from the Reps Table to RegionsKey on the Regions Table. The query results will update to the 3,650 records as expected:

MicrosoftQueryUpdateResultsAnno

Once you have the query returning the correct results, click on, “Return Data” icon on the MS Query Toolbar.

MicrosoftQryReturnDataIconAnno

At the Import Data Dialog, select the option to view the data as a Table in the workbook and at the where to put the data prompt, choose, “New worksheet” option.

ImportDataDiaAnn

In $D$1 add a title for the column that will hold values such as “SalesAmount”. In $D$2, enter a randomization formula in the general form =Rand()*(High-Low)+Low. I used =Rand()*(1000000-500000)+500000. Double-click on the fill handle in the lower-right corner of the cell to send the formula to the bottom.

ExcelTableRandFrmla

ExcelTableSalesAmounts

Looks great! Ready to pivot and slice ’til you get your fill.

Tidy Up

    Final Thoughts

    Hopefully, I presented a case for when a Cartesian Product can be a good thing. How many times have you copied the same data move down 20 rows..paste..repeat..over and over? Lots I bet – I know I have.

    I prefer the MS Access method above, but I wanted to present some alternatives. I also wanted to lay some foundation for MS Query a very underutilized feature in Excel IMHO. I’ll take a look at it a bit more in future post(s).

    That’s it for today. I’m going to watch some Stooges – nyuk..nyuk..nyuk.

    Downloads

    Download the Access Database or the Excel Workbook from here on OneDrive.

    Additional Resource

    1. Contextures
    2. Excel User
, ,

FishingTrim

I’m not much of an angler these days. In my younger days, I pulled my share out of the Skunk River and lakes around central Iowa and southern Minnesota. Sit back and I’ll tell you a tale about the one that got away….

Kidding – today’s post is the third in a series about Microsoft Scripting in VBA. In the first post I covered the FileSystemObjct (FSO). In the second post, I looked at Regular Expressions (RegExp).

Previous posts:

  1. FileSystemObject (FSO)
  2. Regular Expressions

Today, I’ll look at the TextStream Object.

The TextStream Object

The TextStream Object enables you to read from and write to text files from Excel using VBA and the Microsoft Scripting Runtime Library (scrrun.dll)


    edit: I am going to use Late Binding in the sample snippets below. A discussion on Late / Early Binding is beyond the scope of this post. Please see these links for a detailed explanation of Late / Early Binding.

    1. Beyond Excel
    2. JP Software Technologies


Public Variables

Because I am using Late Binding, I’ll declare some Public Constants for working with the TextStream Object

Public Const gclForReading As Long = 1
Public Const gclForWriting As Long = 2
Public Const gclForAppending As Long = 8
Public Const gclTristateUseDefault As Long = -2
Public Const gclTristateTrue As Long = -1
Public Const gclTristateFalse As Long = 0

Tristate specifies the format of the text file:

  1. TristateUseDefault = -2 ; Opens the file using the system default
  2. TristateTrue = -1 ; Opens the file as Unicode
  3. TristateFalse = 0 ; Opens the file as ASCII

I also would like to create a function to let users choose a file or a folder. To use the function I’ll create global constants using the MsoFileDialogType Enumeration:

Public Const gclmsoFileDialogFilePicker As Long = 3
Public Const gclmsoFileDialogFolderPicker As Long = 4
Public Const gclmsoFileDialogOpen As Long = 1
Public Const gclmsoFileDialogSaveAs As Long = 2

CreateTextFile Method

I’ll use the CreateTextFile Method of the FileSystemObject (FSO) to create a TextStream Object and write a little message to the file:

Option Explicit
Sub WriteToText()

    'Author     :   Winston Snyder
    'Date       :   2/12/2014
    'Purpose    :   Write to a text file

    'Declare variables
        Dim fso             As Object
        Dim fsoFolder       As Object
        Dim fsoFile         As Object
        Dim ts              As Object
        Dim strFileName     As String
        Dim strFolderName   As String

    'Allow the user to choose a folder location to save the text file to
        strFolderName = GetFolder

    'Get file name from user
        strFileName = GetUserInput(strPrompt:="What would you like to name the file", _
                                   strTitle:="File Name")

    'Create a FileSystemObject (FSO)
        Set fso = GetFileSystemObject

    'Create a text file
        fso.CreateTextFile strFolderName & "\" & strFileName & ".txt"

    'Create an FSO file for the text file just created
        Set fsoFile = fso.GetFile(strFolderName & "\" & strFileName & ".txt")

    ' Open a TextStream for output.
        Set ts = fsoFile.OpenAsTextStream(gclForWriting, gclTristateUseDefault)

    ' Write to the TextStream
        ts.WriteLine "Jeff Weir"
        ts.WriteLine "is a prolific blogger!"
        ts.Close

    'Tidy up
        Set ts = Nothing
        Set fsoFile = Nothing
        Set fso = Nothing

End Sub

Textstream2

Append To A Text File

We don’t always want to create a new text file. Sometimes we want to append to an existing text file. We can use the OpenTextFile Method of the FileSystemObject:

Sub AppendToText()

    'Author     :   Winston Snyder
    'Date       :   2/15/2014
    'Purpose    :   Append to a text file

    'Declare variables
        Dim fso             As Object
        Dim fsoFile         As Object
        Dim ts              As Object
        Dim strFileName     As String

    'Allow the user to choose a file to append to
        strFileName = GetFile()

    'Create a FileSystemObject (FSO)
        Set fso = GetFileSystemObject

    'Create an FSO file for the text file just created
        Set fsoFile = fso.GetFile(strFileName)

    ' Open a TextStream for output.
        Set ts = fsoFile.OpenAsTextStream(gclForAppending, gclTristateUseDefault)

    ' Write to the TextStream
        ts.WriteLine
        ts.WriteLine "Mike Alexander is a handsome devil!"
        ts.Close

    'Tidy up
        Set ts = Nothing
        Set fsoFile = Nothing
        Set fso = Nothing

End Sub

At the GetFile() Dialog prompt, I chose the file I created in the “CreateTextFile Method”. I then used the constant, “gclForAppending” to specify that the new content was to be appended to the existing file. Had I used the constant, “gclForWriting”, the original contents would have been overwritten.

TextFileAppend2

The Functions

Here are the Functions() I used with the code snippets above

FileDialogFolderPicker

Public Function GetFolder() As String
 
    Dim fd As FileDialog
    Dim strFolderName As String
    Set fd = Application.FileDialog(msoFileDialogFolderPicker)
     
    With fd
        .Title = "Please select a folder"
        .AllowMultiSelect = False
        .Show
        strFolderName = .SelectedItems(1)
    End With
 
    GetFolder = strFolderName
     
    Set fd = Nothing
End Function

GetFileSystemObject

Public Function GetFileSystemObject() As Object
     
    On Error Resume Next
    Set GetFileSystemObject = CreateObject("Scripting.FileSystemObject")
     
End Function

GetUserInput

Public Function GetUserInput(strPrompt As String, _
                             strTitle As String) As String
     
    Dim strUserInput As String
     
    strUserInput = InputBox(Prompt:=strPrompt, _
                            Title:=strTitle)
                             
    GetUserInput = strUserInput
 
End Function

FileDialogFilePicker

Public Function GetFile() As String
 
    Dim fd As FileDialog
    Dim strFileName As String
    Set fd = Application.FileDialog(msoFileDialogFilePicker)
     
    With fd
        .Title = "Please select a file"
        .AllowMultiSelect = False
        .Show
        strFileName = .SelectedItems(1)
    End With
 
    GetFile = strFileName
     
    Set fd = Nothing
End Function

The sample Procedures() given are fine for learning and understanding how to create text files and how to write and append data to the files. But how can we leverage these in a business environment to solve business needs and goals?

Merge Text Files

We may have a series of text files that we wish to merge into 1. Back in November, 2013, Ken Puls showed us how to use Power Query to import multiple text files into a Data Model (Power Pivot). All well and good. Let’s see if we can merge text files using the TextStream Object.

Here are some .csv files that I would like to merge together:

TextFilesForMerge

Sub MergeTextFiles()
    
    'Author     :   Winston Snyder
    'Date       :   2/15/2014
    'Purpose    :   Merge text files in a folder
    
    'Declare variables
        Dim fso                             As Object
        Dim fsoFolder                       As Object
        Dim fsoFileOutput                   As Object
        Dim fsoFile                         As Object
        Dim tsOutput                        As Object
        Dim tsInput                         As Object
        Dim strFolderInputFiles             As String
        Dim strFolderOutputFiles            As String
        Dim strFileName                     As String
        Dim strMisc                         As String

    'Allow the user to choose a folder that contains files to be merged
        strFolderInputFiles = GetFolder(strTitle:="Folder for files to be merged")
        
    'Allow the user to choose a folder location for the output file
        strFolderOutputFiles = GetFolder(strTitle:="Folder for output files")

    'What to name the output file
        strFileName = GetUserInput(strPrompt:="What would you like to name the file", _
                                   strTitle:="File Name")

    'Create a FileSystemObject (FSO)
        Set fso = GetFileSystemObject
        
    'Get an FSO folder for the input files
        Set fsoFolder = fso.GetFolder(strFolderInputFiles)
        
    'Create a text file for output
        fso.CreateTextFile strFolderOutputFiles & "\" & strFileName & ".txt"
        
    'Create an FSO file for the text file just created
        Set fsoFileOutput = fso.GetFile(strFolderOutputFiles & "\" & strFileName & ".txt")

    'Open a TextStream for output.
        Set tsOutput = fsoFileOutput.OpenAsTextStream(gclForAppending, gclTristateUseDefault)
        
    'Loop through the files in the input folder
        For Each fsoFile In fsoFolder.Files
            Set tsInput = fsoFile.OpenAsTextStream(gclForReading, gclTristateUseDefault)
            Do Until tsInput.AtEndOfStream
                strMisc = tsInput.ReadLine                  'Read from the input file
                tsOutput.WriteLine strMisc                  'Write to the output file
            Loop
        Next fsoFile
        
    'Tidy up
        Set tsInput = Nothing
        Set tsOutput = Nothing
        Set fsoFolder = Nothing
        Set fso = Nothing

End Sub

Here is the merged file.

MergedFilesMultipleHeaders

Pretty good, except the headers are repeating. I want to revise the code a bit to only include the header from the first file and skip the header on the subsequent files. The revised code:

Sub MergeTextFilesFirstFileHeader()
    
    'Author     :   Winston Snyder
    'Date       :   2/15/2014
    'Purpose    :   Merge text files in a folder
    'Comments   :   Only uses header row from the first file
    '               subsequent files, header row is skipped
    
    'Declare variables
        Dim fso                             As Object
        Dim fsoFolder                       As Object
        Dim fsoFileOutput                   As Object
        Dim fsoFile                         As Object
        Dim tsOutput                        As Object
        Dim tsInput                         As Object
        Dim strFolderInputFiles             As String
        Dim strFolderOutputFiles            As String
        Dim strFileName                     As String
        Dim strMisc                         As String
        Dim blnTest                         As Boolean
        
    'Initialize variables
        blnTest = True                                          'First time through loop

    'Allow the user to choose a folder that contains files to be merged
        strFolderInputFiles = GetFolder(strTitle:="Folder for files to be merged")
        
    'Allow the user to choose a folder location for the output file
        strFolderOutputFiles = GetFolder(strTitle:="Folder for output files")

    'What to name the output file
        strFileName = GetUserInput(strPrompt:="What would you like to name the file", _
                                   strTitle:="File Name")

    'Create a FileSystemObject (FSO)
        Set fso = GetFileSystemObject
        
    'Get an FSO folder for the input files
        Set fsoFolder = fso.GetFolder(strFolderInputFiles)
        
    'Create a text file for output
        fso.CreateTextFile strFolderOutputFiles & "\" & strFileName & ".txt"
        
    'Create an FSO file for the text file just created
        Set fsoFileOutput = fso.GetFile(strFolderOutputFiles & "\" & strFileName & ".txt")

    'Open a TextStream for output.
        Set tsOutput = fsoFileOutput.OpenAsTextStream(gclForAppending, gclTristateUseDefault)
        
    'Loop through the files in the input folder
        For Each fsoFile In fsoFolder.Files
            Set tsInput = fsoFile.OpenAsTextStream(gclForReading, gclTristateUseDefault)
            If blnTest = True Then
                blnTest = False
            Else
                tsInput.SkipLine                            'Move the file pointer to the line below the header
            End If
            Do Until tsInput.AtEndOfStream
                strMisc = tsInput.ReadLine                  'Read from the input file
                tsOutput.WriteLine strMisc                  'Write to the output file
            Loop
        Next fsoFile
        
    'Tidy up
        Set tsInput = Nothing
        Set tsOutput = Nothing
        Set fsoFolder = Nothing
        Set fso = Nothing

End Sub

MergedFilesOneHeader

Awesome! The solution was to use the SkipLine Method of the TextStream Object everytime through the loop other than the first time:

If blnTest = True Then
   blnTest = False
Else
   tsInput.SkipLine                        'Move the file pointer to the line below the header
End If

Read From Text File

Another common problem, is to read the contents of a text file into Excel. I’ll demonstrate some code to take care of this using comma separated values (csv), but you can use any kind of delimiter such as tab, space, pipe “|”, etc….

Sub ReadTextFileIntoExcel()
    
    'Author     :   Winston Snyder
    'Date       :   2/15/2014
    'Purpose    :   Read data from text file, output to Excel
    
    'Declare variables
        Dim fso                         As Object
        Dim fsoFile                     As Object
        Dim ts                          As Object
        
        Dim vArrData()                  As Variant
        Dim strLine()                   As String
        Dim strData()                   As String
        
        Dim i                           As Long
        Dim j                           As Long
        
        Dim strTextFileName             As String
        Dim strFolderOutputFiles        As String
        Dim strExcelFileName            As String
        Dim strReadAll                  As String
        
    'User - choose a text file to read into Excel
        strTextFileName = GetFile()
        
    'User - choose a folder location for the output file
        strFolderOutputFiles = GetFolder(strTitle:="Folder for output files")

    'User - name the output file
        strExcelFileName = GetUserInput(strPrompt:="What would you like to name the file", _
                                        strTitle:="File Name")

    'Create a FileSystemObject (FSO)
        Set fso = GetFileSystemObject
        
    'Create an FSO file for the user selected text file
        Set fsoFile = fso.GetFile(strTextFileName)

    'Open a TextStream for reading
        Set ts = fsoFile.OpenAsTextStream(gclForReading, gclTristateUseDefault)
        
    'Read the text file and store it in a string variable
        strReadAll = ts.ReadAll
    
    'Split each line of the text document based on the new line delimiter
        strLine = Split(strReadAll, vbNewLine)
        
    'Get number of elements in the line
        strData = Split(strLine(0), ",")
        
    'Redim the data array
        ReDim vArrData(LBound(strData) To UBound(strData), LBound(strLine) To UBound(strLine))
        
    'Erase the strData Array
        Erase strData
        
    'Loop the strLine Array, split each line into data elements, load the data elements into the data array
        For i = LBound(strLine) To UBound(strLine)
            strData = Split(strLine(i), ",")
            For j = LBound(strData) To UBound(strData)
                vArrData(j, i) = strData(j)
            Next j
        Next i

    'Output the array to an Excel Worksheet
        Call ArrayToRange(vArr:=vArrData, _
                          strPath:=strFolderOutputFiles, _
                          strFileName:=strExcelFileName)
        
    'Tidy up
        'Erase arrays
            Erase vArrData
            Erase strLine
            Erase strData
            
        'Destroy objects
            Set ts = Nothing
            Set fsoFile = Nothing
            Set fso = Nothing

End Sub

'----------------------------------------------------------------------------------------

Public Sub ArrayToRange(ByRef vArr() As Variant, _
                        strPath As String, _
                        strFileName As String)
                        
    Dim wbNew As Workbook
    Dim wsNew As Worksheet
    Dim rngNew As Range
    Dim r As Long
    Dim c As Long
    
    c = UBound(vArr, 1)                                     '1st dimension of array
    r = UBound(vArr, 2)                                     '2nd dimension of array
    
    Set wbNew = Workbooks.Add
    Set wsNew = wbNew.Worksheets("Sheet1")
    Set rngNew = wsNew.Range("A1")

    'Resize the destination range
    'Use +1, +1 for Rows and columns since array begins at (0,0)
        rngNew.Resize(r + 1, c + 1).Value = Application.Transpose(vArr)
    
    wbNew.SaveAs strPath & "\" & strFileName & ".xlsx"
    wbNew.Close
    
    Set rngNew = Nothing
    Set wsNew = Nothing
    Set wbNew = Nothing
    
End Sub

TextToExcelFinal

Tidy Up

That’s it for today. Long post, sorry ’bout that. That last snjppet ended up being about Arrays as much as it was about the TextStream Object. Now, where did I leave my worms? I’m going fishing.

Download the file from SkyDrive

Previous Posts At dataprose.org – Scripting

  1. The FileSystemObject
  2. Regular Expressions

Microsoft Scripting

  1. Microsoft Scripting Center
  2. FileSystemObject Reference (Windows Scripting)
  3. TextStream Object
  4. 4 Guys From Rolla – FSO

Additional Resources – Arrays

  1. VBA Arrays And Worksheet Ranges
  2. Understanding Arrays
  3. Redim Statement
, , , ,