Blog

Archive for the ‘Marketing’ Category

Accessing .Net Libraries in SQL Server

Wednesday, May 11th, 2011

In previous posts we demonstrated calling CenterSpace’s NMath libraries from Excel. Some customers asked if we could call these libraries from Microsoft’s SQL Server so we decided to give it a shot. It turns out the problem is a substantially harder problem to solve as the SQL environment is an order of magnitude more complex. This post will show it is possible, but with a few caveats.

There are a myriad of ways SQL Server can be setup and for this post, we took whatever shortcut we could find. Our goal was to get this running and worry about doing it “right” later. We found ourselves on a fairly steep learning curve without a lot of examples to work off of. Consider this more of a “proof of concept” post than the definitive “how to”.

As mentioned above there are lot of different approaches to this problem. In our minds, we wanted to see if we could couple our library at the server side and access results via Reporting Services and Report Builder. We believed that the computations would be more efficient to be handled at the server while the result could take advantage of the graphing packages in Report Services.

Our approach was to build a standalone system to develop and test this work on. We choose a standard PC platform running Windows 7. The first task at hand was to decide which version of Microsoft SQL Server to work with. We chose SQL Server 2008R2 full version as we needed the Report Services support. Microsoft supplies a six month trial version that has all services available which was perfect for our project. We installed the full version with support for the Management Studio, Report Services, and Analysis services. In addition, we downloaded Microsoft’s AdventureWorks sample databases for our testing environment.

We got this all running and ran some sample scripts to make sure we had everything running correctly before moving on. Since we were going to be developing code we installed Visual Studio 2010 next and made sure we included the necessary development templates for the SQL Server environment.

After installing CenterSpace’s NMath 5.0 library, we were ready to write some code.

We will start by creating a new project in Visual Studio called NMathSQL.

Creating a new SQL CLR project in Visual Studio

We then asked to pick a server and identify the database we will be using.


Connecting to the Database

In the next screen we will want to enable SQL/CLR debugging.

Enabling SQL/CLR debugging

At this point we need to start running SQL scripts as part of the overall setup. We could exit VS and perform these tasks in the Management Studio, but VS can do the job. By selecting from the VS menu bar Data>Transact-SQL Editor>New Query Connection, we get a new window to enter scripts to be run on our database.

Accessing the Transact SQL Editor in Visual Studio

We found it easier to copy and paste our scripts into this windows as we needed to run SQL statements.

Our first task at hand is to load the assembly NMath.dll into the database we have selected to work in. It is at this point we hit our first real obstacle. It turns out that in our shipping NMath.dll we are using some instructions to automatically load x86 or x64. These are illegal in the SQL CLR. We were forced to remove these instructions and build an entirely separate version of NMath.dll for SQL Server. Stepping back and looking at the big picture, we decided that this was an opportunity to also include some of the “glue” code we knew we wanted to write.

We had been looking at writing some User Defined Types (UDT) and User Defined Aggregate (UDA) functions to handle the calls into our library. It seemed to make sense to extend our libraries type definitions to be T-SQL types. The result of this approach would enable T-SQL to use NMath types such as DoubleVector and DoubleMatrix. We needed to have some way to load our new data types so we built UDAs LoadDoubleVector, LoadDoubleMatrixByRow, and LoadDoubleMatrixByColumn. After building our new, custom NMath.dll, we can then run a script to load our assembly into the database.

Before we could do this, though, we discovered that we needed some basic assemblies that SQL server had left out of their CLR. The following SQL script loads those assemblies.

Loading Supporting Assemblies to a SQL Database

We are now ready to run the following SQL script to load NMath for SQL.

Loading NMath assembly in the SQL Database

The next step is to add the reference to our project. From the main toolbar launch the Solution Explorer and right click on Reference then select Add Reference. Add the new assemblies we have loaded so far.

Adding NMath reference to the SQL/CLR project

We will now have to tell SQL about the new types and functions we have added. We will start by adding the new UDTs DoubleVector and DoubleMatrix.

SQL commands to add UDTs for LoadDoubleVector

Now that SQL knows about our new types we can add the UDA functions with the following SQL commands.

SQL Commands to add NMaths UDAs to the Database

At this point we are ready to test our library. We have chosen a problem that represents the work expected to be perform. We will create our sample datapoint tables rather than use tables from the AdventureWorks database. We will then load the data from the tables into DoubleVectors and call our library using a User Defined Function(UDF). Our UDF will be a call to library to find the function values for a curve that fits our datapoints. From our function solution we will create a table with datapoints that represent our curve function to compare against our original datapoints.

We will start by creating the UDF call to our library. From the Solution Explorer, we will add a UDF template called NMathFunction.cs.

Here is the C# code for this function call:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using CenterSpace.NMath.Core;
using CenterSpace.NMath.Analysis;
 
 
public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction]
    public static DoubleVector NMathFunction(DoubleVector xValues, DoubleVector yValues, DoubleVector start)
    {
        NMathFunctions.GeneralizedDoubleUnaryFunction f = AnalysisFunctions.FourParameterLogistic;
        OneVariableFunctionFitter fitter = new OneVariableFunctionFitter(f);
        return new DoubleVector(fitter.Fit(xValues, yValues, start));
    }
}

We can now tell VS to build the assembly NmathSQL.dll and use SQL to load the assembly to our database.

Loading our custom NMathFunctions to the database

In order for our UDF NmathFunction to be recognized we need to run a SQL create function command. Note that this command must be the only command in the script.

SQL Commands to add our UDF to the database

We have now loaded our NMath function call in the database that can be called by scripts. We are ready to write a script to solve the problem we described above.

We will start by using the same data from the example we did with our excel post.

Sample Data for our example

We can now build a script that uses the functionality we have built in with our library to find the solution.

-------------------------------------------------------
Declare @xv DoubleVector
Declare @yv DoubleVector
 
select @xv = dbo.LoadDoubleVector(xcol) from xvalues
select @yv = dbo.LoadDoubleVector(ycol) from yvalues
 
Declare @sPar DoubleVector
Declare @solution DoubleVector
 
set @sPar = '[ 0.1 0.1 0.1 0.1]'
 
select @solution = dbo.NMathFunction(@xv, @yv, @sPar)
 
select @solution.ToString()
 
declare @newxval float
declare @newyval float
declare @inc float
declare @a float
declare @b float
declare @c float
declare @d float
 
set @a = dbo.DVItem(@solution, 0)
set @b = dbo.DVItem(@solution, 1)
set @c = dbo.DVItem(@solution, 2)
set @d = dbo.DVItem(@solution, 3)
 
create table SolutionTBL (newxval float, newyval float)
	set @newxval = 0
 
	while @newxval < 35.10
	begin
 
	    set @newyval = @d+((@a-@d)/(1+POWER((@newxval/@c), @b)))
 
	    insert into SolutionTBL values(@newxval, @newyval)
 
	  set @newxval = (@newxval + 0.33)
	end
	select * from SolutionTBL
	go
---------------------------------------------------------------------

Here is the output from this script

Returned Solution from NMath library call

At this point we can pull data from tables, call our Math libraries, and put the solution in a table to be displayed. In production, this last script would be a stored procedure that would be run from the Management Studio. All of this work would reside on the SQL Server.

We can now move on to the Reporting Services to see how this solution could be displayed.

After launching Report Builder and establishing connection to the database. The data is accessed by setting up a dataset with a query into the table for the necessary data.

Selecting the chart data in Report Builder

From there it is merely Report Builder to build the necessary graphs. Unfortunately, the chart wizard doesn’t include scatterplots. It is best to select a chart type of Line and then run through the wizard. After your chart is set up you can change its properties to scatterplot.

Report Builder ScatterPlot

We can include our solution in the same report and generate a smooth line as we did in excel. The Report Builder is a very powerful tool which we are still learning.

Solution Fitted Line

Our results are equivilent to our previous excel post and demonstrate that the approach on the SQL Server is as accurate.

In conclusion, we have demonstrated that we can call the CenterSpace NMath libraries from SQL Server and display the results using Report Services powerful charting capabilities. We plan to work on more examples and compare how we might improve on what Analysis Services can produce. We certainly will entertain feedback on useful approaches that should be examined. As a reminder this interface is not possible with our current release of NMath 5.0 and require a “special” version of our assembly. Depending on customer interest and feedback we may or may not decide to develop a product for this interface.

Advanced Curve Fitting using Excel and NMath

Monday, March 14th, 2011

In recent blog posts, we have discussed how to call CenterSpace’s Libraries from Excel. In his March 2010 blog post, CenterSpace’s Ken Baldwin demonstrated how to replicate Excel’s existing Trendline functions using C# and NMath. In this post, we will demonstrate the advanced curve fitting functions available in the CenterSpace libraries that could be easily be integrated into Excel analysis work.

Curve fitting is one of the most practical applications of mathematics as we are often asked to explain or make predictions based on a collection of data points. For example, if we collect a series of changing data values over time, we look to explain the relationship that time effects the generated values or in mathematical terms y=f(x). Here we are using x to represent time values and f(x) to represent the relationship or function that generates the resulting values or y. So if we can find a function f(x) that represents a good fit of the data we should be able to predict the result at any moment in time.

With this in mind, let us get started with some data points (x, y) that we have collected. I have entered the following values in an Excel spreadsheet.

We can now use Excel’s charting function to create the following XY scatterplot with our data:

At this point, we can add an Excel trendline to get the best fit it can provide. By right clicking on a data value in our chart we will get the option to add a trendline. Choose a 2nd order polynomial and these options.

Name the trendline “2nd Order Polynomial” and check “Display equation on chart” and “Display R-squared value on chart”.

Excel calculates and plots the line while returning the equation and the R2 value. If our R2 equals 1 we would have found a perfect fit. For a second order polynomial Excel returned a value of 0.8944 which means that roughly 10.6 percent (1-.894) are not on this line.

If we continue increasing our polynomial orders up to the maximum of six we can achieve the best R2 value of 0.9792, but look at the curve we have fitted to these points.

A better fit might be an exponential function so let us try Excel’s trendline option using exponentials.

Clearly the results are visually a better fit but the R2 value tells us that over 15% of the data points are not on this line. This pretty much represents the best we can do with Excel’s trendline functions for our data.

Looking to CenterSpace’s NMath and NStat libraries to give us more robust analysis, we can utilize more powerful curve fitting tools quickly and with little effort.

Using the linkage provided by ExcelDNA that we examined in our previous posts, we can create the following C# code for our ExcelDNA text file.

<![CDATA[
 
using System;
using ExcelDna.Integration;
using CenterSpace.NMath.Core;
using CenterSpace.NMath.Matrix;
using CenterSpace.NMath.Analysis;
using CenterSpace.NMath.Stats;
 
public class NMathExcelCurveFit
{
 
 [ExcelFunction(Description="Four parameterized Fit")]
 public static double[] NOneVarFunctFitFour(double[] xValues, double[] yValues, double[] start)
 {
  DoubleVector TempVx = new DoubleVector(xValues);
  DoubleVector TempVy = new DoubleVector(yValues);
  DoubleVector TempVs = new DoubleVector(start);
  OneVariableFunctionFitter
<TrustRegionMinimizer> fitter = new  OneVariableFunctionFitter
<TrustRegionMinimizer>( AnalysisFunctions.FourParameterLogistic );
  DoubleVector solution = fitter.Fit(TempVx, TempVy, TempVs);
  return solution.ToArray();
 }
 
 [ExcelFunction(Description="Four parameterized R2")]
 public static double NOneVarFunctFitFourR2(double[] xValues, double[] yValues, double[] start)
 {
  DoubleVector TempVx = new DoubleVector(xValues);
  DoubleVector TempVy = new DoubleVector(yValues);
  DoubleVector TempVs = new DoubleVector(start);
  OneVariableFunctionFitter
<TrustRegionMinimizer> fitter = new  OneVariableFunctionFitter
<TrustRegionMinimizer>( AnalysisFunctions.FourParameterLogistic );
  DoubleVector solution = fitter.Fit(TempVx, TempVy, TempVs);
  GoodnessOfFit gof = new GoodnessOfFit(fitter, TempVx, TempVy, solution);
  return gof.RSquared;
 }
}
]]>

As you can see by the code, I have chosen to use NMath’s OneVariableFunctionFitter with the FourParameterLogistic predefined generalized function.

From CenterSpace’s documentation, we get the above equation and need to solve for the model parameters a, b, c, d .

The OneVariableFunction call will require us to provide a starting guess along with the ranges containing the xValues and yValues. The following screen shows us making the function call from Excel with the necessary parameters.

After computing these values, we can call for the R2 value and generate some points to be plotted.

Note that I choose to hide the rfows from r17 to r107 for display purposes. You will need to have them unhid for the chart to look right. As you can see we returned the best R2 value so far at 0.9923.

In the next screen, we will have added the series to our chart and drawn a line through our calculated points.

This example illustrates the ease that Excel can use the power of NMath curve fitting routines to compute accurate fits to a collection of data.

Mike Magee

Calling External .NET Libraries from Excel

Wednesday, December 8th, 2010

There are many circumstances where you may need to access an external library of functions or routines from Excel.  For example, if you need a complex function such as fitting data to a surface, or portfolio optimization, that is not natively available in Excel.  There also may be a need to protect proprietary calculations by using user defined functions to process algorithms in a black box manner.  I was looking for a way to rapidly prototype some calculations without setting up a complex development environment.

Harking back to the old rule of development projects of  “two out of three”, when the three metrics are fast, cheap, and quality.  On any time limited project you only can plan to achieve two metrics, never all three. Initially I like to dive in and start with fast and cheap and work my way towards quality as necessary.  So, we’ll start with the quick and dirty approach to calling external libraries from Excel.

Project Setup

You must have a version of .NET Framework of 2.0 or greater.  The latest version is free from Microsoft at this link.
http://www.microsoft.com/net/download.aspx.

You’ll also need:

  • Excel 97 or later.
  • External library assemblies that you need to access from Excel. In our case we will use Centerspace’s NMath.dll and NMathStats.dll.
  • A freeware product called ExcelDNA written by Govert van Drimmelen that can be downloaded at http://exceldna.codeplex.com/ .

The first order of business is to unpack the downloaded file, ExcelDNA.zip, into a working directory.  For our example, we will use CenterSpaceExcel as our directory name. After unpacking you should have two folders Distribution and Source in our CenterSpaceExcel directory.  Inside the Distribution folder locate the file ExcelDNA.xll and rename it to NMathExcel.xll.

We now need to locate in the same directory the file ExcelDna.dna and rename it to NMathExcel.dna.  Then using notepad, or your favorite code editor, and open the file NMathExcel.dna.

You should see the following code:

<DnaLibrary>
<![CDATA[ 
     Public Module Module1   
       Function AddThem(x, y)      
          AddThem = x + y   
       End Function 
    End Module 
]]>
</DnaLibrary>
Assuming CenterSpace NMath and NStat are installed in the standard locations. Change it to read as follows and save:
<DnaLibrary>
 
<Reference Name="CenterSpace.NMath.Core" />
<Reference Path="C:\Program Files\CenterSpace\NMath 4.1\Assemblies\NMath.dll" />
<Reference Name="CenterSpace.NMath.Stats" />
<Reference Path="C:\Program Files\CenterSpace\NMath Stats 3.2\Assemblies\NMathStats.dll" />
 
<![CDATA[
	Imports NMath = CenterSpace.NMath.Core
	Imports Stats = CenterSpace.NMath.Stats
 
	Public Module NMathExcel
 
		<ExcelFunction(Description:="Returns x to the y power")> _
	    	Function NPower(x as double, y As double) As double
			NPower = NMath.NMathFunctions.PowFunction(x, y)
		End Function
 
		<ExcelFunction(IsMacroType:=True, IsVolatile:=True)> _
		Function NRand() As double
			dim rand As New NMath.RandGenMTwist
			NRand = rand.Next53BitRes()
		End Function
 
		<ExcelFunction(Description:="Binomial Distribution: Number of Successes, Trials, Probability, Cumulative is True or False")> _
		Function NBinomDist(NSuccess As Int32, NTrials As Int32, Prob As double, Cumul As Boolean) As double
			dim nbin As New Stats.BinomialDistribution
			nbin.N = NTrials
			nbin.P = Prob
			IF Cumul
				NBinomDist = nbin.CDF(NSuccess)
			Else
				NBinomDist = nbin.PDF(NSuccess)
			End If
		End Function
 
		Function NDoubleMatrixRand(rsize As integer, csize As integer, RandLBx As integer, RandUBy As integer) As Object(,)
			dim rng As New NMath.RandGenUniform(RandLBx,RandUBy)
			Rng.Reset(&#038;H124)
			dim TempA As New NMath.DoubleMatrix(rsize, csize, Rng)
			NDoubleMatrixRand = NCopyArray(TempA, rsize, csize)
 
		End Function
 
		Function NCopyArray(IMatrix As Object, rsize As integer, csize As integer) As Object(,)
			dim i As Integer
			dim j As Integer
			dim OArray(rsize, csize) As Object
			for i = 0 to rsize - 1
			   for j = 0 to csize - 1
				OArray(i,j) = IMatrix(i,j)
			   next j
			next i
			NCopyArray = OArray
		End Function		
 
	End Module
]]>

We now have created the VB code to call our CenterSpace Math and Statistics libraries with the following five functions.

  1. The first function shows a simple math library call to the Power function which takes a number x and raises it to the y power and returns the value.
  2. The second function shows a call to obtain a fast random number from the math library.  Since we want a new number each time the spreadsheet is re-calculated we have made the function volatile.
  3. The third function call shows how to set values that need to be accessed by a function in our .NET assemble; in this case, the Binomial Distribution.
  4. The fourth function demonstrates the creation of a DoubleMatrix that is the filled with random uniformly distributed numbers.
  5. The fifth function is a helper sub-routine to transfer data across the com interface.

Test our setup in Excel

Open Excel and move your cursor the Tools menu item.  Usually towards the bottom of the drop down menu you will find the selection Add-Ins.  After selecting Add-Ins, you see the pop-up window with the option to select Microsoft supplied Add-ins.  Choose the Browse option and go to the working directory we created at the beginning.  In our case, this will be the CenterSpaceExcel directory.  Next select the Distribution folder and you should see the renamed file: NMathExcel.xll.  Select it and you should now see the following screen.

Selecting a user created XLL as an Add-in for Excel

Make sure NMathExcel is checked and click OK. If you get an error and this point it is probably due to a typo in the DNA file, otherwise you will get the expected new sheet ready for entry.

Select an empty cell and then select from the menu bar Insert then from the pulldown Function.  You should see the following pop-up.

Selecting the category containing our NMath functions

At the bottom of the category pull down you should see our NMathExcel Functions;  Select it and you should have these options.:

NMath Excel Function Category

If we choose NPower, we will get the next screen,

Calling NMath Library Power function in Excel

I arbitrarily typed the value of 3.2 for x and 3.327 for y.  You can see the result of 47.9329301 before selecting OK.

Select OK and Excel will insert the value into the cell.  Select another blank cell and this time choose our NRand() function.  You will notice there is no opportunity to enter values and finish by selecting OK.  At this point you should see a number between 0 and 1 in the cell.  Each time you press F9 (sheet recalc) a new random number will appear.  If we had not made this function volatile the number would not change unless you edit the cell.

To test our Binomial Distribution function, again we will select a new cell and use the insert function option to insert the NBinomDist function with the following values.

Calling NMath Statistical function Binomial Distribution from Excel

At this point we have made successful calls into both of CenterSpace’s NMath and NMath Stats .NET math libraries.

In our fourth example, we will see how Excel handles matrices and look at issues passing array arguments across the COM interface.  Excel 2003 was limited to a maximum of 60,000 cells in an array, but Excel 2007 was expanded to handle 2 million.  Excel has some quirky ways of displaying matrices, and I’ll cover the in’s and out’s of these quirks.

We have written the basic code to set up a function called NDoubleMatrixRand for the purpose of creating a matrix with supplied dimensions and filled with uniform Random numbers over a specified distribution.  We will select another blank cell and again go to insert function and this time choose NDoubleMatrixRand.  Suppose we want to create a 6×6 matrix filled with random numbers between -2 and 2.  Our input will look like the following screen.

Creating a DoubleMatrix in Excel using NMath

Notice the equal sign in the middle right of the above screen is equal to {-0.994818527251482,-0.08

Values inclosed in curly brackets that are separated by commas indicates that an matrix was actually created, but you can see the Formula result is only displaying a partial value due to display size. At this point when you select OK, you will have a cell with a single value.  Here is where the fun begins.  Start at the cell and drag a 6×6 range as shown in the following screen.

Selecting the area the matrix is to be displayed in

Now get your fingers limbered. Here is where it gets a bit obscure – do exactly as follows.

  • Press the F2 key.  (pressing F2 may be  optional but is recommended by Excel as the cell leaves the edit mode)
  • Press and hold the Ctrl key followed by
  • pressing and holding the Shift key followed by
  • pressing the Enter key

and presto chango!  You should see a screen like this.

Displaying a Matrix in Excel

Notice that your cell’s formula is now enclosed in { }, indicating to Excel that the contained formula is an array function.  This is the only way to get matrices displayed.  Also, if you try to edit this cell you will get an error that changes are not allowed.  If  you want to change the dimensions simply reference the values from another cell when you create the function.

The fifth function NCopyArray copies the library matrix across the COM bridge into an Excel array object.  As I stated in the beginning this would be a quick and dirty approach and would leave room for improvement.

Summary

In my next post, I will provide the above code in C# and add more function calls with some matrices with hopefully an improved approach to NCopyArray.  Future posts will include creating a packaged XLL and a more complex example such as curve fitting.

Since time is our most precious asset, being able to quickly access complex math functions with a general purpose tool like Excel should save time and money!

At CenterSpace, we are interested if this blog is helpful and if there is a need for more examples of how our libraries can be accessed by Excel.  Let us know what areas are of interest to you.

Mike  Magee

Thanks and Resources
Also, a special thanks to Govert van Drimmelen for writing a wonderful tool such as ExcelDNA.

SuperComputing 2010

Friday, October 15th, 2010

CenterSpace Software will be at SC 10 in New Orleans. Our team will be delivering a presentation at the Intel booth on November 18th at noon. Our CEO, Trevor Misfeldt, and CTO, Paul Shirkey, will be presenting a customer success story on NMath and MKL.


SuperComputing 2010 logo



If you’re attending, please join us at the presentation. We look forward to seeing you. Let us know in advance and we can schedule some time to talk about NMath, consulting, cloud computing and new features.

NMath Upgrades Available

Tuesday, April 14th, 2009

We’re pleased to announce new versions of the NMath libraries – NMath 4.1, and NMath Stats 3.2.

Added functionality includes…

NMath Core

  • Custom debug visualizers for matrix and vector classes.
  • Improved performance on multi-core computers.
  • and more

NMath Matrix

  • General sparse matrix classes and factorizations
  • Weighted least squares (WLS) by complete orthogonal (CO) decomposition.
  • Iteratively reweighted least squares (IRLS)
  • and more

NMath Stats

  • Data clustering via nonnegative matrix factorization (NMF).
  • Custom debug visualizer for DataFrame.
  • and more

NMath Analsyis

  • Nonlinear least squares minimization using the Trust-Region method, a variant of the Levenberg-Marquardt method.
  • Curve and surface fitting by nonlinear least squares.
  • and more

Upgrades are provided free of charge to customers with current annual maintenance contracts. To request an upgrade, please contact sales@centerspace.net. Maintenance contracts are available through our webstore.