UndauntedFinalII

Some time back, Shane Devenshire and I added our two cents to a question a user asked on LinkedIn here. So when I decided to write a post regarding UnPivot using nested For..Next Loops, it only made sense to ask Shane if he would like to write a guest-post and tutorial on his process and Normalization Utility.

So, without further ado, take it away Shane!

Un-pivoting Excel Spreadsheet data – Creating a Normalized Table

Introduction:
Recently I was working with Tableau and discovered that the company had an Excel add-in for taking data laid out like a pivot table or standard spreadsheet layout:

SDTable1

And converting it to data laid out like a table (normalized):

SDTable2

When I tested the add-in on a large data it took over an hour. I knew I could do this faster, even manually, but I decided to automate the process, which uses a number of Excel’s built-in features. The attached is the initial result of this endeavor.

When one wants to convert spreadsheet data, laid out as in a pivot table, into what is called a normalized database or table one can use a VBA approach which loops through cells rearranging the data and copying repeated data as many times as necessary. You could do this manually but it’s rather labor intensive. However, even the looping approach is not very efficient, which becomes apparent when the data sets are large. An alternate approach using some of Excel’s less familiar features can speed up the process dramatically.

I created the three step wizard shown below to provide some enhancements:

In the first step, see Figure 1, the user indicates the top left corner of the values area, the sample show what we are looking for. This allows the VBA routine to decide where the column titles are and which columns will be retained in their default structure and which column will be “transposed” (placed in a singe “values” column.

Figure 1 – First step of the Wizard:

SDTable3

If the user clicks Finish, the code accepts all the defaults and created the normalized table. If the user clicks Next the second step of the Wizard is displayed, see Figure 2.

Figure 2 – This is the second step of the Wizard:

SDTable4

Here the Wizard displays the first ten fields to the left of the values area allowing the user to choose which ones to include. If there are dates at the top of the values area a default name for the new field is Date which is automatically entered in the Field Name box which the user can change or leave blank. The Delimiter box allows the user to indicate what delimiter to use when concatenating the fields together (discussed later). The delimiter should not be a character found in the chosen fields on the left in the dialog box.

Clicking Next brings you to the third step of the wizard, shown in Figure 3.

Figure 3 – This is the third step of the Wizard:

SDTable5

The first two options on this screen allow the user to retain or remove rows of data in the output table which have either a 0 (zero) or are blank in the values column. This reduces the file size when appropriate. Because the Excel feature used to normalize the data generates an Excel “table”, the last three options allow the user to turn on or off various features of tables or convert the table to a range.

Overview:

You can create a pivot table in Excel with a feature called Multiple Consolidation Ranges. You can then use the Show Details command on the Grand/Grand Total to produce a normalized table. Both of these commands are very fast! One limitation of this command is that is was designed to be used with data which has only one label field to the left of the values area. To bypass this limitation one can combine all the label fields by using a concatenation formula, but to later break it apart a delimiter is need. Then the multiple consolidation command is run against this one field and all of the value columns. The concatenated field of the output is then parsed using Excel’s Text to Columns command with the delimiter specified by the user. These steps are very fast. As noted in the discussion of step 3 of the Wizard, I have also incorporated a number of additional options.

I create two modules and a user form to control this process. All of the code and form can be found in the attached file. I have tried not to obfuscate the code. There are four main components to consider

Option Explicit		
Dim k	As Integer	number of fields to be output
		
Sub NormalizeData()		
Dim sTempSheet	As String	Name of the sheet where the pivot table is placed
Dim j	As Integer	loop counter
Dim i	As Integer	loop counter
Dim sMyFormula	As String	the concatenated formula
Dim sMySheet	As String	Name of the sheet where initial data is
Dim lMyColumn	As Long	column number of column used for concatenated formula
Dim sDataSheet	As String	Name of the sheet where the final output is placed
Dim sSource	As String	Source range for the pivot table
Dim sMyTitles	As String	The concatenated titles
    		
    'Speeds things up		
    With Application		
        .ScreenUpdating = False	
        .Calculation = xlCalculationManual	
    End With		
    		
    sMySheet = ActiveSheet.Name	
    		
    'uses the ref edit address to place the cursor in the top left corner of the values to be transposed
    Range(frmNormalize.refFirstData).Activate
    		
    'Inserts new blank column for concatenation formula
    ActiveCell.EntireColumn.Insert	
    lMyColumn = ActiveCell.Column	
    		
    'initializes loop counters		
    j = 1		
    k = 0		
    		
    'loops through each column and concatenates it if that column was choosen
    'this loop runs a max of ten times	
    For i = iTotCols To 1 Step -1	
        If frmNormalize.Controls("Checkbox" & j) = True Then
            If sMyFormula = "" Then	
                sMyFormula = "=RC[" & -i & "]"	
            Else		
                sMyFormula = sMyFormula & "&" & """" & sDelimiter & """" & "&RC[" & -i & "]"
            End If		
            k = k + 1		
        End If		
        j = j + 1		
    Next i		

The above code creates a concatenation formula of the forma =A5&”*”&B5&”*”&C5, which is then converted to values like 2008*Coffee*Seattle. The above code loops once to build the concatenated formula for a single row, and then fills it down. All very fast. The delimiter must be unique because it will be used by the Text to Columns command to parse the data.

    'Selects the title row and inserts the formula there.
    'Here it will be the concatenated titles	
    ActiveCell.Offset(-1, 0).Select	
    ActiveCell = sMyFormula		
    Range(ActiveCell, Cells(lLastRow, ActiveCell.Column)).FillDown
    ActiveSheet.Calculate		
    		
    'converts the range to values	
    Selection = Selection.Value	
    		
    'the concatenated titles is saved for later use since it would be lost during the next commands
    sMyTitles = ActiveCell		
    		
    'Creates "consolidated ranges" pivot table, which will later be removed
    sSource = Range(Cells(lFirstDataRow - 1, lFirstDataColumn), Cells(lLastRow, lLastColumn)).Address(, , xlR1C1)
    ActiveWorkbook.PivotCaches.Create(SourceType:=xlConsolidation, _
        SourceData:=sSource).CreatePivotTable _
        TableDestination:="", _	
       TableName:="NPT"		
    On Error Resume Next	
    ActiveSheet.PivotTables("NPT").PivotFields("Count of Value").Function = xlSum

The above code is one of the steps that make this process so fast. Excel builds a “consolidated range” pivot table. If some of the data is non-numeric the data field would use “Count” which we want to convert to “Sum”, this is the reason for lines 67 and 68. Manually you can reach this command by pressing Alt+D+P to expose the PivotTable and PivotChart Wizard from Excel 2003, shown in Figure 4, below:

Figure 4 – This is the first step of Microsoft’s Wizard, which the code is emulating:

SDTable6

The command we are using is Multiple consolidation ranges. Excel gives us a pivot table which Is, in and of itself, not very useful. But in the following code, the ShowDetail command used on the Grand/Grand Total creates the desired output. And it’s very fast!

    'creates database detail from the above pivot table, which is pretty much what we want
    sTempSheet = ActiveSheet.Name	
    ActiveCell.SpecialCells(xlLastCell).ShowDetail = True   'Generates normalized database
    		
    'you can turn off these alerts earlier in the macro but its often safer to turn them of and on only where needed
    ‘The pivot table sheet is no longer need and can be deleted
    With Application		
        .DisplayAlerts = False		
        Sheets(sTempSheet).Delete	
        .DisplayAlerts = True

The following code first inserts some blank columns and then uses Excel’s Text to Columns command to convert the concatenated field back to its original individual fields. Yet another very fast command!

        'inserts as many columns as need to display all the concatenated fields
        Range("B1:" & Cells(1, k).Address).EntireColumn.Insert
        'puts the concatenated titles in cell A1
        Range("A1") = sMyTitles
        Range("A1", [A1].End(xlDown)).Select
        
        'the text to columns command overwrites the table's default titles
        .DisplayAlerts = False
    
        'converts concatenated field to a set of columns
        Selection.TextToColumns _
            Destination:=Range("A1"), _
            DataType:=xlDelimited, _
            ConsecutiveDelimiter:=False, _
            Other:=True, _
            OtherChar:=sDelimiter
        .DisplayAlerts = True
    End With
    
    'best fit the data
    Range("A1:" & Cells(1, iTotCols).Address).EntireColumn.AutoFit
    
    'enters the field name for the transposed row
    Range("A1").Offset(0, k) = frmNormalize.txtNewFieldName
    
    'Options - removes rows With blanks or zeros in the value field
    If frmNormalize.cbBlanks = True Then
        RemoveBlanks
    End If
    If frmNormalize.cbZeros = True Then
        RemoveZeros
    End If
    ConvertToRange
    
    'Cleanup
    sDataSheet = ActiveSheet.Name
    Sheets(sMySheet).Activate

   ‘removes the concatenated formula column from the original data.
    Cells(1, lMyColumn).EntireColumn.Delete    
    Sheets(sDataSheet).Activate
    Application.Calculation = xlCalculationAutomatic
    
End Sub

Deleting rows can be done by looping from the bottom and deleting each row that meets a given condition. This is very slow! If you can select all the rows which meet some condition you can then delete them with one command. This is faster. However, if Excel needs to adjust cell references based on the deleted rows this can also slow things down. One solution is to move all the rows you want to delete to the bottom of the dataset and then delete them with a single command. In the following two modules something like that is done. For removing rows with blanks in the values column you simple need to sort by the values and the use the Go to Special, Blanks command to select all the rows.

RemoveBlanks Subroutine

Removes rows with blanks in value column
Private Sub RemoveBlanks()
    
    'puts all blank cells at the bottom of the table making deleting rows faster
    With ActiveSheet.Sort
        .SortFields.Clear
        .SortFields.Add _
            Key:=Cells(1, k + 2), _
            SortOn:=xlSortOnValues, _
            Order:=xlAscending
        .SetRange ActiveCell.CurrentRegion
        .Header = xlYes
        .Orientation = xlTopToBottom
        .Apply
    End With
    
    'this command deletes all rows with blanks in the values column
    Selection.Offset(0, k + 1).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
End Sub

Removing zeros is a little trickier, because, if the user has chosen not to remove blanks you can’t clear all cells with 0’s and then run the above command, extra row may be deleted. In this case you can replace the 0’s with text, then clear the cells with zeros and then use line 18 above to remove the rows. Finally you can clear the cells containing text. I chose not to do that; instead I entered the formula =1/0 in all the cells with 0 and then sorted putting this group near the bottom. Then I selected an deleted all the cells with error formulas with one step, line 25.

RemoveZeros Subroutine

This code removes all rows with zeros in the values field
Private Sub RemoveZeros()
    
    'replaces all cells with 0 with a formula generating an DIV/0! error
    Range("A1").CurrentRegion.Resize(, 1).Offset(0, k + 1).Select
        Selection.Replace _
            What:="0", _
            Replacement:="=1/0", _
            LookAt:=xlWhole
        
    'this sorts all cells with errors to the bottom making deleting rows faster
    With ActiveSheet.Sort
        .SortFields.Clear
        .SortFields.Add _
            Key:=Cells(1, k + 2), _
            SortOn:=xlSortOnValues, _
            Order:=xlAscending
        .SetRange ActiveCell.CurrentRegion
        .Header = xlYes
        .Orientation = xlTopToBottom
        .Apply
    End With
    
    'this command deletes all rows with errors
    Selection.SpecialCells(xlCellTypeFormulas, 16).EntireRow.Delete
End Sub

ConvertToRange Subroutine

'The user can keep the table or convert it to a regular spreadsheet range
'The user can keep or remove the autofilters in either case
'The user can keep or remove the formatting in either case
Private Sub ConvertToRange()
    Dim rList As Range
 
    With ActiveSheet.ListObjects(1)
        Set rList = .Range
        If frmNormalize.cbConvertToRange = True Then
            .Unlist                                                                     'converts a table to a range
            If frmNormalize.cbFilter = False Then Range("A1").CurrentRegion.AutoFilter  'turns on  autofilter
            If frmNormalize.cbFormatting = True Then                                    'removes table formatting
                With rList
                    .Interior.ColorIndex = xlColorIndexNone
                    .Font.ColorIndex = xlColorIndexAutomatic
                    .Borders.LineStyle = xlLineStyleNone
                End With
            End If
        Else
            If frmNormalize.cbFilter = True Then .Range.AutoFilter  'Turns off autofilter
            If frmNormalize.cbFormatting = True Then                'Turns off table formatting
                .TableStyle = ""
            End If
        End If
    End With
End Sub

Tidy Up

That’s it for today. HUGE thanks to Shane Devenshire for sharing his Data Normalization Utility with us and allowing me to post it on my blog. I hope you enjoy it and find it useful in your daily work.

Downloads

Download the file from OneDrive. The zip archive contains the Data Normalizer in .xlsm format as well as a Microsoft Word version of the tutorial.

Other posts regarding data normalization or “UnPivot” at dataprose.org

    Undaunted by UnPivot

Other posts regarding data normalization or “UnPivot” around the horn

    UnPivot Shootout – DDoE
    Reshape Data
, ,


Thanksgiving2
May rich blessings – Plenty, Peace and Prosperity be yours this Thanksgiving Day. Many good harvests to gladden you. -Source unknown

I can’t improve on that, so I won’t try. I hope you are able to spend some time today with family and friends.

Best regards,
Winston