SQL Server Analysis Services: Analysis vs. Data Mining
by: Charles Tournear, Sr. Consultant, MCT, MCSE, MCSD, MCDBA, CRCP
Microsoft SQL Server Analysis Services provides tools both to analyze data and
to mine it. This
article begins to answer the question: “What is the difference between
simple analysis and data-mining?”
Analytics (or data analysis) looks at existing data to see where we are now or
what we’ve done in the past. It
lets you compare data over time and identify trends, so you can estimate where
you might be next month compared to this month or to the same month last year.
You can use Key Performance Indicators (KPIs) to look at the status of
current data and drill down to the detail data to see what is driving the
trend an indicator shows. You can
retrieve data using Transact-SQL (T-SQL) queries or
Multidimensional Expressions (MDX).
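To make the kind of question that analysis answers concrete, here is a minimal Python sketch of a month-over-month comparison, a year-over-year comparison, and a simple KPI status check. The sales figures, the target, the 5% tolerance, and the function names are all assumptions made up for illustration; in practice these values would come from a T-SQL or MDX query, and SSAS defines KPIs in its own way.

```python
# Hypothetical monthly sales totals keyed by (year, month); not real data.
monthly_sales = {
    (2007, 5): 120_000.0,
    (2008, 4): 130_000.0,
    (2008, 5): 143_000.0,
}


def percent_change(current: float, previous: float) -> float:
    """Return the change from previous to current as a percentage."""
    return (current - previous) / previous * 100.0


def kpi_status(actual: float, target: float, tolerance: float = 0.05) -> str:
    """Classify actual against target, roughly as a simple KPI indicator might."""
    if actual >= target:
        return "on target"
    if actual >= target * (1.0 - tolerance):
        return "slightly below"
    return "off target"


this_month = monthly_sales[(2008, 5)]
vs_last_month = percent_change(this_month, monthly_sales[(2008, 4)])
vs_last_year = percent_change(this_month, monthly_sales[(2007, 5)])
status = kpi_status(this_month, target=150_000.0)

print(f"Month over month: {vs_last_month:.1f}%")
print(f"Year over year:   {vs_last_year:.1f}%")
print(f"KPI status:       {status}")
```

Everything here describes data that already exists. Nothing is being predicted from discovered patterns, which is what keeps it on the analysis side of the line.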
Data Mining is Predictive Analytics.
It looks for previously unknown patterns in existing data, or tries to answer
questions like “If we do this or that, what might happen next?”
Data Mining uses patterns in data to define a set of rules that help in
predicting future outcomes, based on a mathematical algorithm.
It is very easy to create questions based on your existing data that sound
like data-mining questions but are in reality just analysis questions.
So do you need data-mining?
If most of the questions that you would ask about your data, including requests
to retrieve data “that meets a particular condition”, could be answered by a
T-SQL query, no matter how complex, or by an MDX query against a cube, then
the answer is No.
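As an illustration, the sketch below answers a question that may sound sophisticated but is really just a condition on existing data. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for SQL Server, so the SQL is SQLite's dialect rather than T-SQL, and the orders table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Avery", "West", 1200.0),
        ("Avery", "West", 900.0),
        ("Blake", "East", 300.0),
        ("Casey", "West", 450.0),
        ("Casey", "West", 700.0),
    ],
)

# "Which West-region customers spent more than 1,000 in total?"
# This is a retrieval question: a condition on existing data, no prediction.
rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE region = 'West'
    GROUP BY customer
    HAVING SUM(amount) > 1000
    ORDER BY total DESC
    """
).fetchall()

print(rows)
conn.close()
```

If a question can be phrased this way, with filters, groupings, and aggregates over data you already hold, a query answers it and no mining model is needed.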
To determine whether you need to use data-mining you need a thorough
understanding of T-SQL and of the analysis tools available in SQL Server
Analysis Services (SSAS). That includes performing trend analysis through
cubes or KPIs with tools such as ProClarity or PerformancePoint Server,
and exploring data using T-SQL and MDX.
Without such knowledge you won’t be able to decide whether the questions that
you need to ask about your data require data-mining.
In most cases, analysis is enough.
Also, in many cases you will find that you do not have all the data you need
to do true data mining.
You may need to send out surveys
to collect more detail about people, places, or products before new
patterns become visible.
In SSAS, data mining is done by first selecting a set of data that fits the
model you are trying to analyze.
The next step is to select an appropriate algorithm and apply it to that
data to create a set of rules. The last step is to apply those rules to
similar data to get a predictive result.
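The three steps above can be sketched in a few lines of Python. This is a deliberately simple majority-vote rule learner over a single attribute, chosen only to show the shape of the workflow. It is not one of the algorithms SSAS provides, and the customer records, attribute names, and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Step 1: select the data for the model. Hypothetical past customers:
# (age_band, owns_home, bought_product)
training_cases = [
    ("under_30", False, False),
    ("under_30", False, False),
    ("under_30", True, True),
    ("30_to_50", True, True),
    ("30_to_50", True, True),
    ("30_to_50", False, False),
    ("over_50", True, True),
    ("over_50", False, True),
]


def learn_rules(cases, attribute_index):
    """Step 2: derive one rule per attribute value (the majority outcome)."""
    outcomes = defaultdict(Counter)
    for case in cases:
        outcomes[case[attribute_index]][case[-1]] += 1
    return {
        value: counts.most_common(1)[0][0]
        for value, counts in outcomes.items()
    }


def predict(rules, value, default=False):
    """Step 3: apply the learned rules to a new, similar case."""
    return rules.get(value, default)


rules_by_age = learn_rules(training_cases, attribute_index=0)
new_customers = ["under_30", "over_50"]
predictions = [predict(rules_by_age, age) for age in new_customers]

print(rules_by_age)
print(predictions)
```

The rules come from patterns in past data and are then applied to cases whose outcome is not yet known. That is the difference from a query, which only reports what is already recorded.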
The advantage of the data-mining tools provided by SSAS is that you don’t have
to hire a PhD in statistical analysis to apply a complex
set of mathematical equations and hand you different collections of data, which
you then have to decide whether any of it is really useful.
The hardest part, especially after someone has demonstrated data-mining and
shown you how spectacular the results can look, is to determine whether
you have the necessary data, are willing to pay the extra cost to get it,
and have the time to evaluate whether any of the
output from a particular algorithm is useful.
So the easiest way to begin is to start building a list of questions or results
that you would like to get from your data.
Then see if it’s possible to use standard analysis processes to answer
those questions. In most cases you
will find that data-mining just isn’t what you really need right now.
Most of us ask questions to find out where we are NOW and what we should be
doing next to survive, and don’t really have the data or time to invest in
looking at where we will be 5 or 10 years from now based on certain
trends or patterns in our data. We
tend to focus first on ways to improve how we’re doing things now and how we
can improve the efficiency of what we’re doing.
Once those questions are answered (once we’re sure we can survive
through tomorrow), we can use the additional tools of data-mining to help
us predict the future, or to learn what data we need to collect to find
hidden patterns in our processes. If
you really want to look forward to the future now, though, consider looking at
some of the algorithms that are available in data-mining to find out what
additional data you could be collecting, so that in the future you are prepared
to perform the analysis.