OpenRowSetExcelArrowSQL

My grandparents had a small farm in Iowa . We only lived a few miles away and my brothers and I spent a great deal of time at the farm with our grandparents when we weren’t in school.

I traded the rolling corn fields of Iowa for the rusty-reddish brown soil of Arizona many years ago, but I frequently think about my grandparents and working around the farm with the crops, the garden, the animals and the machinery.

However, today’s post is not about feeding the chickens, or shucking the corn – it is about inserting data from Excel Worksheets to SQL Tables using the T-SQL Function OpenRowSet.


SQLServer

You’ll need SQL Server installed for this to work. I’m using SQL Server 2012 Developer on my laptop, but you could also use SQL Server Express which is free but limited in some of its functionality.

You can pick up a Developer copy of SQL Server 2014 Developer Edition from CDW. It will cost you ~ US $50. You can download SQL Server Express here. It’s free! However, you will need the Developer Edition if you want to do any testing with SSAS Tabular or Multidimensional.


Security300

You’ll need to change the default security for the SQL Server Instance you want to use the OpenRowSet Function on. I’ll use Windows Authentication instead of SQL Server Authentication.

Services1

  • Close SQL Management Studio if it is open.
  • Launch Services.msc. I’m using Windows 8.1 so I went to Search, typed in services and selected View Local Services
  • Jump to SQL Server by typing “S” and then scroll as needed until you find the instance of SQL Server that you want to change security settings.

ServicesClickStopFinal

  • Right-click on the Service and click on Stop.

ServicesClickProperties

  • Right-click on the Service and click on Properties.

PropertiesDialog

  • In the Properties Dialog, Click on the Logon Tab, enter your Windows Login information, Click OK.
  • Right-click on the Service and click on Start.

Settings

Depending on your system configuration, you may need to download and install the correct data connectivity components if you are using Excel 15.0 file formats (.xlsx etc…)

InsertFinal

I’ll use a SQL INSERT statement with the OpenRowSet Function to insert data from an Excel Worksheet to a table in my database

DataSample

Here is the Excel data source, so I’ll need to create a Table in my SQL Database with appropriate fields and data types to receive the data.

NewDatabase

I added a new database “Tecumseh” to my default instance of SQL Server. Now I can add a Table to the Database.

use tecumseh

CREATE TABLE scores
(
Reps nvarchar(50),
TrxDate date,
Region nvarchar(50),
Score float(53),
);

ScoresTableCreatedFinal

The Scores table with the required fields has been added to the Tecumseh database. Next, I’ll write some SQL to insert the data from the Scores.xlsx Workbook to the Scores table. Make sure the Excel file is closed or you will receive file exclusivity warnings.

USE tecumseh

INSERT INTO dbo.scores
SELECT *
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
	        'Excel 12.0 Xml;HDR=YES;Database=C:\data\scores.xlsx',
		'SELECT * FROM [sheet1$]');

The last step is to check that what is in the SQL Table matches what is in the Excel Workbook.

SQLResults

ExcelRows

Both The SQL Table and the Excel Workbook have 590 records! Sweet success!

Final Thoughts

That’s it for today. OpenRowSet and other functions (BULK INSERT) can be used in place of developing Integration Packages (SSIS). I receive many disconnected and disjointed files from a variety of sources each week/month. I prefer to clean everything up and move it to a database such as SQL or MS Access. Your thoughts?

, , , , ,

BearFF

In my last post on using ActiveX Data Objects (ADO) with Excel VBA, I demonstrated some code to load a Recordset, filter the Recordset using the Recordset’s Filter Prpoerty, and copy the Filtered Recordset to a Worksheet using the CopyFromRecordset Method of the Range Object.

I put a link to the blog post on the Excel VBA and Users Group on LinkedIn. To our good fortune, James Wilson was reading. James responded with some nice comments and some very good code of his own. I was impressed and asked James if he would like to do a write up to post on the blog.

James kindly accepted my offer as follows in James’ words. Take it away James!



Bringing the full power of SQL to bear in Excel

James Wilson
September 13, 2014

“I feel the need, the need for speed.” Top Gun

I love Excel, but sometimes you just want a bit more power to analyse your data. My favourite tool for analysing large quantities of data has always been SQL. While Microsoft includes MS Query in Excel out-of-the-box, it does have many limitations and is relatively slow. Using VBA and ADO is the next logical step.

The code below is the latest incarnation of a general purpose SQL function I’ve been using for the last five years or so. For me the data is the thing – I want to be able to start querying my data using SQL without having to start coding from scratch each time. Just copy and paste into a module in your workbook, and you’re ready to go.

Code first then some explanation:

Function SQL(ByVal SQLstr As String, ByVal Destination As String, Optional ByVal ConnectionString As String) As Boolean

    On Error GoTo ErrorHandler
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    Application.DisplayAlerts = False

    Dim myConnection As Object
    Dim myRecordSet As Object
    Dim myQueryTable As QueryTable

    ThisWorkbook.Sheets(Destination).Activate
    ThisWorkbook.Sheets(Destination).Cells.Delete

    If ConnectionString = "" Then ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & ThisWorkbook.FullName & ";Extended Properties=Excel 12.0"
    
    Set myConnection = CreateObject("ADODB.Connection")
    Set myRecordSet = CreateObject("ADODB.Recordset")
    
    myConnection.ConnectionString = ConnectionString
    myConnection.Open
    myRecordSet.ActiveConnection = ConnectionString
    myRecordSet.Source = SQLstr
    myRecordSet.Open
    
    Set myQueryTable = Sheets(Destination).QueryTables.Add(Connection:=myRecordSet, Destination:=Range("'" & Destination & "'!a1"))
    myQueryTable.Refresh
    
    If myRecordSet.State <> adStateClosed Then myRecordSet.Close
    If Not myRecordSet Is Nothing Then Set myRecordSet = Nothing
    If Not myConnection Is Nothing Then Set myConnection = Nothing
    
    Err.Clear
ErrorHandler:
    If Err Then
        Sheets(Destination).Cells(1, 1) = "SQL Error: " & Err.Description
        SQL = False
    Else
        SQL = True
    End If
    
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    Application.DisplayAlerts = True
End Function

I wrote this as a VBA function rather than a sub procedure, because I wanted to be able to trap errors in the calling procedure. It is a really a matter of preference whether you like your functions to return True or False (lines 28 and 30). I’ve chosen to return True if it worked, so I’d call it using coding something like:

Sub test()
    DidItWork = SQL("SELECT * FROM [Sheet1$] WHERE [Sheet1$].[date] > #01/03/2014#", "Sheet2")
    If DidItWork = False Then … 'Do some error handling
End Sub

Basically this function sets up an ADO link to a data source, executes an SQL query on that data, and returns the results of that query to a worksheet of our choice in the current workbook using a query table.

When you call this function (line 1), you pass a string with your SQL query, a string with the name of the worksheet you want the data to be returned to, and optionally a connection string to the data source. By default the data source is the workbook the code is in. That’s right – you can use SQL to query data in other tabs in the same workbook (make sure your workbook is saved first).

If you find that most of the time you are querying a corporate database or other data source, then you’d just tweak line 15 of the coding to default to the connection string to the data source you are using most often.

Lines 4 to 6 and 42 to 44 are just the standard VBA codes that you’d put in to speed up any bit of coding. If you are calling this function and have these bits of coding in the calling procedure, then you can safely delete these lines from this function.

Lines 8 to 10 are to set up a local connection and recordset object (we’re going to use ADO to get our data), and a query table (which we are going to use to return the results of the SQL query to Excel).

The way this function is written, your data output is always going to be a worksheet in the current workbook in Cell A1. You can have no other data in this worksheet as Lines 12 to 13 delete the contents of the worksheet, before it is refreshed again with the query table set up in Line 26.

Line 17 to 27 is the meat of the function, setting up an ADO link and returning the data using a query table.

Lines 29 to 31 are to tidy up objects and connections. Line 36 is to give you a clue if you’ve made an error in your SQL.

So I’ve a personal library function that allows me to use SQL in Excel without much further thought – what do I do with it? Let me give a few simple examples to give an idea of the possibilities.

  1. Treat my current spreadsheet a bit like a mini-database and run queries on it – that would be much harder to do just using VBA or manually copying and pasting.
  2. Suck data out of multiple corporate databases and spreadsheets and join it together. You don’t need even to open the spreadsheets to get the data (as I said to start with – it’s all about speed and power). So for example let’s say you have one spreadsheet from your sales guys with sales volumes, and you have another spreadsheet with the confidential prices for each customer, then you can do a bit of SQL coding like:
    DidItWork = SQL(“SELECT A.*, B.[Price], A.[Volume]*B.[Price] as [Revenue] from [C:\Sales Volumes.xlsx].[Data$] A LEFT OUTER JOIN [F:\Prices.xlsx].[Sheet1$] B ON A.[Product] = B.[Product] AND A.[Customer No] = B.[Customer] “, “Sales Forecast”)
    So I’m using SQL aliases A and B for brevity, and by using multipart identifiers specifying the full path and filename of the Excel workbooks I can suck data out of any file I have access to. Note if you specify the data source fully, the connection string ADO uses is virtually irrelevant.
  3. By using For…Next loops in VBA and a bit of text manipulation and the SQL command UNION I can consolidate multiple similar data sources simply. So using a string variable like below in a loop:

mySQLstring = mySQLstring & ” UNION ” & …
Good for consolidating budgets submitted in a similar format.

The limitation is really your knowledge of SQL. Beware of missing spaces and extra commas in your SQL if you are using the VBA & _ to join long strings together to form your SQL.


Tidy Up

Thanks James – great job! How do you use ADO, SQL, Recordsets and QueryTables in your Projects?

, , , , , , , , ,

StoogesFinal

I’m a huge fan of the 3 Stooges (pictured here) and Curly Howard in particular. Without a doubt, they appeal to my 5 year-old sense of humor with their brand of inane slapstick.

Shemp Howard, Curly’s brother, was ok, but Curly Joe? Ah Marone! Don’t even get me started!

Curly Howard was pure slapstick comedic genius – undisputed. Unfortunately, Curly left us far too early.

Today’s post, however, is not about the Stooges or Curly. It is about Cartesian Products and creating “Controlled” random data.

Random Data Generators

Dick Kusleika and Jimmy Pena offer Random Data Generator add-ins on their respective sites. Jimmy’s will cost you a couple of bucks:

  1. Daily Dose of Excel – Dick Kusleika
  2. JP Software Technologies – Jimmy Pena

Cartesian Product

A Cartesian Product returns everything from Table A with everything from Table B which is generally unwanted and means you have not configured your join(s) properly.

Here I have a couple of Tables of NFL Teams and Divisions that I would like to join in a query to get the teams into their respective divisions.

AccessTables

Here’s my query in the Query Design Window

NFLQry1

When I run the query, I get a Cartesian Product where every team is returned against every division. It is not possible for the Arizona Cardinals to belong to 8 different divisions:

NFLCartesian

Revised Join

I revised the join to show the correct relationship between Division and Team:

NFLTeamDivRelate

Now I get the correct results when I run the query:

NFLTeamDivRelateData

Create “Controlled” Sample / “Dummy” Data

Now, I want to create some sample data to load for a PivotTable so I want some random values, but I want to control the Row Label and Column Field values. For Row Labels, I’ll use Regions and Representatives. This is a standard relationship, so we will reflect as such in the query design:

RegionsReps

Additionally, I would like to generate some sample (dummy) data for each rep for each day of the year. I’ll use a Cartesian Join to generate the dataset:

RegionsRepsCart

Note that there is no join between the Reps/Regions and Dates tables. The query results:

RegionsRepsCartResultRecAnnon

Let’s Create Some Random / “Dummy” Data

I’ll need a function in Access that I can add to the query to generate a random number

Public Function GetRandomValue(Optional x As Integer) As Double

    Dim dblRandom As Double
    
        dblRandom = (1000000 - 500000 + 1) * Rnd() + 500000
        GetRandomValue = dblRandom
        
End Function

Note the difference: in MS Access, the Built-in Function for generating a random number is Rnd(). In MS Excel, it’s Rand(). I added the User-Defined Function to the Query Design Grid and named the output field as SalesAmount:

QueryDesignRandomFunctionFinal

And the query results:

QueryRandomResultsFinal

The first value looks great, but do you notice how the same value keep repeating through every row? That’s not very random. The problem is the function is only called one time, so the random value is generated once and then repeated on every row. I’ll need to pass a unique value to the function to make it generate a new random number on every row of the query results.

To create the unique value, I’ll concatenate 3 fields in the query together

[RepName] & "-" & [RegionName] & "-" & [RecordDate]

I can then pass that as a unique value to the Random Function. First, I need to modify the Random Function a little bit. The Function was looking for me to pass an Integer Value. So I’ll just change the argument to a Variant.

'Old
Public Function GetRandomValue(Optional x As Integer) As Double

'New
Public Function GetRandomValue(v As Variant) As Double

Now I can update the query in the design grid:

QueryRandomValueUpdate

And the query results:

QueryResultsRandomValuesFinalAnno

Perfect! Ready to connect to the query with Excel PivotTable and pivot and slice ’til you get your fill.

MS Query

I was able to generate some random data with some controlled data in MS Access, can I repeat the same in MS Query? I’ll begin by copying the from each of the 3 Access tables to 3 different worksheets in an Excel Workbook:

AccessDataToExcel

Next, I’ll import the data from the 3 worksheets into MS Query

  1. Click on Data Tab on the Ribbon
  2. In the Get External Data group, click on From Other Sources
  3. In the resulting pop-up menu, click on From Microsoft Query

MSQueryMenuSteps

At the Choose Data Source Dialog, click on Excel Files and OK

ChooseDataSourceDiaAnn

At the Select Workbook Dialog, browse through the directory structure to find the workbook that contains the data that you would like to use to import from including the the current workbook.

SelectWBDiaAnn

At the Query Wizard – Choose Columns Dialog, expand nodes next to the tables in the panel on the left, find the columns from each table that you would like included in your query results, use arrow buttons to add and remove items from the pane, “Columns in your query”

QryWizardChooseColDia

You may receive an error message:

The Query Wizard can not continue because it can not join the tables in your query. You must join the tables manually in Microsoft Query by dragging the fields to join between the tables.

Go ahead and click, “OK”. The Query Results will be displayed with no joins between the tables:

MicrosoftQueryInitialResultsAnno

In the initial Query Results Window, notice that there are no Joins, so MS Query went ahead and created a Cartesian Product of Table A, B and C thus returning 14.6K records.

Go ahead and add the Join, by dragging RegionKey from the Reps Table to RegionsKey on the Regions Table. The query results will update to the 3,650 records as expected:

MicrosoftQueryUpdateResultsAnno

Once you have the query returning the correct results, click on, “Return Data” icon on the MS Query Toolbar.

MicrosoftQryReturnDataIconAnno

At the Import Data Dialog, select the option to view the data as a Table in the workbook and at the where to put the data prompt, choose, “New worksheet” option.

ImportDataDiaAnn

In $D$1 add a title for the column that will hold values such as “SalesAmount”. In $D$2, enter a randomization formula in the general form =Rand()*(High-Low)+Low. I used =Rand()*(1000000-500000)+500000. Double-click on the fill handle in the lower-right corner of the cell to send the formula to the bottom.

ExcelTableRandFrmla

ExcelTableSalesAmounts

Looks great! Ready to pivot and slice ’til you get your fill.

Tidy Up

    Final Thoughts

    Hopefully, I presented a case for when a Cartesian Product can be a good thing. How many times have you copied the same data move down 20 rows..paste..repeat..over and over? Lots I bet – I know I have.

    I prefer the MS Access method above, but I wanted to present some alternatives. I also wanted to lay some foundation for MS Query a very underutilized feature in Excel IMHO. I’ll take a look at it a bit more in future post(s).

    That’s it for today. I’m going to watch some Stooges – nyuk..nyuk..nyuk.

    Downloads

    Download the Access Database or the Excel Workbook from here on OneDrive.

    Additional Resource

    1. Contextures
    2. Excel User
, ,

Very good post regarding SQL Joins by Kosta Hristov on his blog, “Developing The Future”. Check it out.