StoogesFinal

I’m a huge fan of the 3 Stooges (pictured here) and Curly Howard in particular. Without a doubt, they appeal to my 5 year-old sense of humor with their brand of inane slapstick.

Shemp Howard, Curly’s brother, was ok, but Curly Joe? Ah Marone! Don’t even get me started!

Curly Howard was pure slapstick comedic genius – undisputed. Unfortunately, Curly left us far too early.

Today’s post, however, is not about the Stooges or Curly. It is about Cartesian Products and creating “Controlled” random data.

Random Data Generators

Dick Kusleika and Jimmy Pena offer Random Data Generator add-ins on their respective sites. Jimmy’s will cost you a couple of bucks:

  1. Daily Dose of Excel – Dick Kusleika
  2. JP Software Technologies – Jimmy Pena

Cartesian Product

A Cartesian Product returns everything from Table A with everything from Table B which is generally unwanted and means you have not configured your join(s) properly.

Here I have a couple of Tables of NFL Teams and Divisions that I would like to join in a query to get the teams into their respective divisions.

AccessTables

Here’s my query in the Query Design Window

NFLQry1

When I run the query, I get a Cartesian Product where every team is returned against every division. It is not possible for the Arizona Cardinals to belong to 8 different divisions:

NFLCartesian

Revised Join

I revised the join to show the correct relationship between Division and Team:

NFLTeamDivRelate

Now I get the correct results when I run the query:

NFLTeamDivRelateData

Create “Controlled” Sample / “Dummy” Data

Now, I want to create some sample data to load for a PivotTable so I want some random values, but I want to control the Row Label and Column Field values. For Row Labels, I’ll use Regions and Representatives. This is a standard relationship, so we will reflect as such in the query design:

RegionsReps

Additionally, I would like to generate some sample (dummy) data for each rep for each day of the year. I’ll use a Cartesian Join to generate the dataset:

RegionsRepsCart

Note that there is no join between the Reps/Regions and Dates tables. The query results:

RegionsRepsCartResultRecAnnon

Let’s Create Some Random / “Dummy” Data

I’ll need a function in Access that I can add to the query to generate a random number

Public Function GetRandomValue(Optional x As Integer) As Double

    Dim dblRandom As Double
    
        dblRandom = (1000000 - 500000 + 1) * Rnd() + 500000
        GetRandomValue = dblRandom
        
End Function

Note the difference: in MS Access, the Built-in Function for generating a random number is Rnd(). In MS Excel, it’s Rand(). I added the User-Defined Function to the Query Design Grid and named the output field as SalesAmount:

QueryDesignRandomFunctionFinal

And the query results:

QueryRandomResultsFinal

The first value looks great, but do you notice how the same value keep repeating through every row? That’s not very random. The problem is the function is only called one time, so the random value is generated once and then repeated on every row. I’ll need to pass a unique value to the function to make it generate a new random number on every row of the query results.

To create the unique value, I’ll concatenate 3 fields in the query together

[RepName] & "-" & [RegionName] & "-" & [RecordDate]

I can then pass that as a unique value to the Random Function. First, I need to modify the Random Function a little bit. The Function was looking for me to pass an Integer Value. So I’ll just change the argument to a Variant.

'Old
Public Function GetRandomValue(Optional x As Integer) As Double

'New
Public Function GetRandomValue(v As Variant) As Double

Now I can update the query in the design grid:

QueryRandomValueUpdate

And the query results:

QueryResultsRandomValuesFinalAnno

Perfect! Ready to connect to the query with Excel PivotTable and pivot and slice ’til you get your fill.

MS Query

I was able to generate some random data with some controlled data in MS Access, can I repeat the same in MS Query? I’ll begin by copying the from each of the 3 Access tables to 3 different worksheets in an Excel Workbook:

AccessDataToExcel

Next, I’ll import the data from the 3 worksheets into MS Query

  1. Click on Data Tab on the Ribbon
  2. In the Get External Data group, click on From Other Sources
  3. In the resulting pop-up menu, click on From Microsoft Query

MSQueryMenuSteps

At the Choose Data Source Dialog, click on Excel Files and OK

ChooseDataSourceDiaAnn

At the Select Workbook Dialog, browse through the directory structure to find the workbook that contains the data that you would like to use to import from including the the current workbook.

SelectWBDiaAnn

At the Query Wizard – Choose Columns Dialog, expand nodes next to the tables in the panel on the left, find the columns from each table that you would like included in your query results, use arrow buttons to add and remove items from the pane, “Columns in your query”

QryWizardChooseColDia

You may receive an error message:

The Query Wizard can not continue because it can not join the tables in your query. You must join the tables manually in Microsoft Query by dragging the fields to join between the tables.

Go ahead and click, “OK”. The Query Results will be displayed with no joins between the tables:

MicrosoftQueryInitialResultsAnno

In the initial Query Results Window, notice that there are no Joins, so MS Query went ahead and created a Cartesian Product of Table A, B and C thus returning 14.6K records.

Go ahead and add the Join, by dragging RegionKey from the Reps Table to RegionsKey on the Regions Table. The query results will update to the 3,650 records as expected:

MicrosoftQueryUpdateResultsAnno

Once you have the query returning the correct results, click on, “Return Data” icon on the MS Query Toolbar.

MicrosoftQryReturnDataIconAnno

At the Import Data Dialog, select the option to view the data as a Table in the workbook and at the where to put the data prompt, choose, “New worksheet” option.

ImportDataDiaAnn

In $D$1 add a title for the column that will hold values such as “SalesAmount”. In $D$2, enter a randomization formula in the general form =Rand()*(High-Low)+Low. I used =Rand()*(1000000-500000)+500000. Double-click on the fill handle in the lower-right corner of the cell to send the formula to the bottom.

ExcelTableRandFrmla

ExcelTableSalesAmounts

Looks great! Ready to pivot and slice ’til you get your fill.

Tidy Up

    Final Thoughts

    Hopefully, I presented a case for when a Cartesian Product can be a good thing. How many times have you copied the same data move down 20 rows..paste..repeat..over and over? Lots I bet – I know I have.

    I prefer the MS Access method above, but I wanted to present some alternatives. I also wanted to lay some foundation for MS Query a very underutilized feature in Excel IMHO. I’ll take a look at it a bit more in future post(s).

    That’s it for today. I’m going to watch some Stooges – nyuk..nyuk..nyuk.

    Downloads

    Download the Access Database or the Excel Workbook from here on OneDrive.

    Additional Resource

    1. Contextures
    2. Excel User
, ,

BeatlesLg
I like music – lots of different kinds of music, including The Beatles pictured here. These days, I’m more into “Adult Alternative” music, but I still like to listen to some stuff from the ’60’s and ’70’s every now and again.

Around these parts, we get “Breakfast With The Beatles” with Andre Gardner every Sunday from 7am-9am. Check your local listings to see if it is available in your area or you can stream it on KSLX. Really excellent show if you are into The Beatles.

However, today’s post is not about music or The Beatles. It is about Merging Excel Workbooks. Over on the LinkedIn Excel Groups there are many questions about merging data from several Excel Workbooks into 1 Workbook. Many folks suggest linking the workbooks using a formula. Others suggest using the INDIRECT() function. I prefer to use some VBA to copy the desired data from the Source Workbooks to the Destination Workbook.

All testing, screen shots, code in this post are from Office 365. If you are using another version of Excel, your results may vary (though they should not).

Linking Files

A common way to get data from one file to another is to link them. In this sample, I opened 2 Excel files, entered a value in 1 file, activated another file, typed, “=” and clicked on a cell in the 1st file:

LinkFiles2

DestinationWB.xlsx in the active workbook and cell $A$1 is selected. The formula bar give us:

  1. [SourceWB.xlsx] <- The name of the workbook
  2. Sheet1! <- The name of the worksheet
  3. $A$1 <- The cell reference

When I close the source workbook, the file path is added to the linked cell formula:

LinkedCellwPath

The folder location ‘C:\Data\ has been added to the linked formula.

There are a few things I do not like about linked cells

  1. The absolute references were added automatically. If I drag the formula to the right, the formula will still reference $A$1, not $B$1 as we may require
  2. What happens if a Row is added at Row 1 pushing all subsequent rows down by 1? This could cause an error in the linked formula, erroneous results, or at least the need for additional maintenance.
  3. What happens if the source file(s) are moved from the referenced folder location?

Here is the formula with the file in the folder ‘C:\OriginalFolder\.

PathReference

Linking Files – File Migration

One problem with linking files is that files move in our directory structures from time-to-time. We may initiate this move ourselves as we come up with a new way to organize our file structures. Other times this move may be initiated by the IT Department as they are updating or migrating storage.

Let’s see what happens to the original link formula when I move the source file.
I moved the file from ‘C:\OriginalFolder\ to ‘C:\NewFolder\ in Windows Explorer.

MoveFileFinal

Now when I try to update the link value in the Destination Workbook, I receive an error message

EditLinksErrorFinal

I can click on the change source button, but I would like a solution that provides for as little maintenance needs as possible.

Linking Files – Data Moves On Worksheet

The original linked formula is linked to cell A1 in the source file. What happens if, unbeknownst to you, a user inserts a blank row at A1 and all of the data shifts from A1 to A2?

UpdateValueFinal

The Edit Links Dialog status reads, “OK”, but it is not. The Linked Cell formula is still linked to Cell $A$1 and the updated Cell Value is now 0. We could use Find-Replace to update from $1 to $2, but that could have unintended consequences. What if other workbook moved data 2-3 rows? You see the point I’m sure.

Copy Data From Source To Destination

I propose to loop through a folder and copy some data from a worksheet for each workbook found in the folder – I would like the solution to be as dynamic as possible. As I am looping, I will paste the data on a worksheet in Top-Down fashion such that the first file results will be near the top, then the data from the second file and so on.

For today, my source files all have an Excel Table in them, though we could make the process work with Ranges as well with just a bit more work. For additional resources working with Excel Tables, see the links at the bottom of the post.

Some Functions & Properties

First, I’ll look at some VBA functions that I am going to use in the final code. These functions will make your code more dynamic and user-friendly.

CurDir

CurDir returns the current path. I use CurDir to trap the current path so I can restore it at then before I change the path to make navigating the file structure faster. Use as follows:

Option Explicit

Sub foo2()

    Dim strDirectory As String
    strDirectory = CurDir()
    Debug.Print strDirectory
    
End Sub

Output:

C:\Users\wsnyder\Documents

ChDir

ChDir changes the the current path. I use ChDir to change the current path to get the user closer the final folder they will eventually choose using the FileDialog to select a folder for processing. Use as follows:

Option Explicit

Sub foo2()

    Dim strDirectory As String
    strDirectory = "C:\Data\"
    ChDir (strDirectory)
    Debug.Print CurDir()
    
End Sub

Output:

C:\Data

FileDialog Property

The FileDialog Property of the Application Object returns a FileDialog Object. This give you the ability to interact with users at runtime by allowing the user to choose File(s) or Folder(s) to work with. The FileDialog accepts one argument, the DialogType.

There are 4 DialogTypes in the MsoFileDialogType Enumeration :

  1. msoFileDialogFilePicker
  2. msoFileDialogFolderPicker
  3. msoFileDialogOpen
  4. msoFileDialogSaveAs

For today’s purposes, I’ll use msoFileDialogFolderPicker

Set diaFolder = Application.FileDialog(msoFileDialogFolderPicker)

The Setup…

As I mentioned earlier, each source file has an Excel Table in it. I would like to loop through each file in the source folder and copy the DataBodyRange of the Excel Table to the destination workbook. Additionally, on the first pass, I would also like to copy the HeaderRowRange to create headers in the destination workbook. Lastly, I would like to add some data to the right of the data from the source files, such as the date the data was copied and the name of the source file.

Create Some Sample Data and Files

I quickly whipped up some sample data using Dick’s Random Data Generator and creatively saved the files as File1.xlsx, File2.xlsx, File3.xlsx.

TablesLayered

Again, I highlighted the data [Ctrl] + [a] and added an Excel Table [Ctrl] + [t] in each file. I applied a different Table Style to each Table simply to highlight the fact that there are 3 different Tables in 3 different Excel files. If you need to brush up on Excel Tables or need to start at the beginning:

  1. Excel Table Tutorial – Contextures
  2. Sur la Excel Table
  3. Listing Toward ListObjects

Loop Through Files In A Folder

When I need to loop through files in a folder – I use the FileSystemObject (FSO). The FileSystemObject is a top-level object in the Microsoft Scriping Runtime Library (Scrrun.dll). Here are some additional references if you need to brush up or are not familiar with FSO.

  1. JP Software Tech
  2. Chip Pearson
  3. 4 Guys From Rolla
  4. dataprose.org
  5. MSDN

edit: I am going to use Late Binding in the sample snippets below. A discussion on Late / Early Binding is beyond the scope of this post. Please see the “Additional Resources” at bottom for links to detailed explanation of Late / Early Binding.

Copying Data From Source To Destination

It’s finally time to copy the data from the source workbooks to the destination workbook. I’m using three files with 50 records each, but you could use this code with an unlimited number of records or variable number of records and unlimited number of files (3, 10, 50,…) – as long as you do not exceed 1,048,576 rows (though I would never use that many row in Excel – time to consider a database).

Option Explicit

Sub CopyDataFromSourceFiles()
    
    'Author         :           Winston Snyder
    'Created Date   :           1/26/2014
    'Comments       :           Assumes each source file contains at least one list object (Excel Table)
    
    'Delare variables
        Dim wb                  As Workbook
        Dim wbData              As Workbook
        Dim ws                  As Worksheet
        Dim wsData              As Worksheet
        Dim rngData             As Range
        Dim rngDestination      As Range
        Dim lo                  As ListObject
        Dim fso                 As Object
        Dim fsoFolder           As Object
        Dim fsoFile             As Object
        Dim strSelectedFolder   As String
        Dim strCurrentPath      As String
        Const strSpecifiedPath  As String = "C:\"
        Dim lngRows             As Long
        Dim blnFlag             As Boolean
        
    'Excel environment - speed things up
        With Application
            .ScreenUpdating = False
            .DisplayAlerts = False
            .EnableEvents = False
            .Calculation = xlCalculationManual
        End With
        
    'Initialize variables
        Set wb = ThisWorkbook
        Set ws = wb.Worksheets("Data")
        blnFlag = True
        
    'Clear data from control workbook from previous consolidations
        ws.UsedRange.ClearContents
    
    'Get the current path, so reset the path at the end of the procedure
        strCurrentPath = CurDir()
        
    'Set the target directory to get the user closer to the working folder
    'This will minimize the time the user must spend drilling into the file system
    'once they are presented with the FileDialog
        ChDir (strSpecifiedPath)
    
    'Create a FileSystemObject
        Set fso = GetFSO

    'Prompt the user to select a folder
    'Return the path of the selected folder
        strSelectedFolder = GetSelectedFolder
        
    'Get the FSO Folder of the selected folder
        Set fsoFolder = fso.GetFolder(strSelectedFolder)
        
    'Loop each file in folder
    'Copy data from each file to control workbook
        For Each fsoFile In fsoFolder.Files
            Debug.Print fsoFile.Name
            Set wbData = Workbooks.Open(fsoFile)
            Set wsData = wbData.Worksheets("Sheet1")
            
            'Get next blank row from destination worksheet
            'If first time, need row 1, else, next blank row
                lngRows = GetRows(ws:=ws)
                If blnFlag = False Then lngRows = lngRows + 1
                
            'The Destination Range
                Set rngDestination = ws.Cells(lngRows, 1)
            
            'If first time, include the header row
                With wsData
                    For Each lo In .ListObjects
                        If blnFlag = True Then
                            Set rngData = Union(lo.HeaderRowRange, lo.DataBodyRange)
                            blnFlag = False
                        Else
                            Set rngData = lo.DataBodyRange
                        End If
                    Next lo
                End With
                
            'Copy the Data Range to the Destination Range
                rngData.Copy
                rngDestination.PasteSpecial xlPasteValuesAndNumberFormats
                
            'Close the source file
                wbData.Close
        
        Next fsoFile
        
    'Tidy up
        'Restore to original path
            ChDir (strCurrentPath)
            
        'Restore Excel environment
            With Application
                .ScreenUpdating = True
                .DisplayAlerts = True
                .EnableEvents = True
                .Calculation = xlCalculationAutomatic
            End With
            
        'Destroy objects
            Set fsoFolder = Nothing
            Set fso = Nothing
            Set rngData = Nothing
            Set rngDestination = Nothing
            Set ws = Nothing
            Set wb = Nothing
        
End Sub
'------------------------------------------------------------------------------------
Private Function GetRows(ws As Worksheet) As Long

    Dim r       As Long
    
    With ws
        r = .Cells(Rows.Count, 1).End(xlUp).Row
        GetRows = r
    End With
    
End Function
'------------------------------------------------------------------------------------
Private Function GetFSO()

    Dim fso             As Object
    
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set GetFSO = fso
    
    Set fso = Nothing

End Function
'-------------------------------------------------------------------------------
Private Function GetSelectedFolder() As String
    
    Dim diaFolder       As FileDialog
    Dim strFolder       As String

    Set diaFolder = Application.FileDialog(msoFileDialogFolderPicker)
    With diaFolder
        .AllowMultiSelect = False
        .Show
        strFolder = .SelectedItems(1)
    End With

    GetSelectedFolder = strFolder
End Function

MergeSourceFiles

I formatted the output and hid some rows to show that there are 151 records as expected (3 files * 50 records each + 1 header row).

My favorite part of the code is here

'If first time, include the header row
                With wsData
                    For Each lo In .ListObjects
                        If blnFlag = True Then
                            Set rngData = Union(lo.HeaderRowRange, lo.DataBodyRange)
                            blnFlag = False
                        Else
                            Set rngData = lo.DataBodyRange
                        End If
                    Next lo
                End With

The properties of the ListObject (Excel Table) such as HeaderRowRange and DataBodyRange are 2 reasons why the ListObject is far superior to the Range Object. Couple these kinds of properties with the fact that you can move the Excel Table anywhere on the worksheet you want and add rows to the Table or redact rows from the Table and the consolidation code will still work flawlessly. No Excel Hell! Awesome!

Tidy Up

, , , , , , ,