Approaches for Data Mining with MicroStrategy

You can incorporate MicroStrategy into the data mining workflow in a number of ways. These alternatives are grouped based on how the model is ultimately deployed and where the scoring takes place. The common approaches for incorporating data mining within MicroStrategy are described below:

Scoring the Database: Records are scored in batches and saved as tables or columns.
Database Does the Scoring: The database scores records in response to queries.
MicroStrategy Does the Scoring: MicroStrategy scores records using metrics and reports.

While MicroStrategy supports all three approaches, each has positive and negative aspects. The next sections describe each approach in detail.

Scoring the Database

In this approach, records are scored and inserted into the database either as new tables or as new columns in existing tables. Most often, a third-party scoring engine receives a result set and scores the records. Then the scores are added to the database. Once they are part of the database, MicroStrategy attributes or metrics can reference those scores, just like any other data in the database. Historically, this approach has been the most common. Its pros and cons are described below.

Pros

Because an external scoring engine performs the scoring calculation, model complexity and performance are hidden within the scoring engine. Thus, the scoring calculation does not consume database resources and does not impact other business intelligence work.
At run time, data is simply read from the database without having to calculate the score on the fly. Scoring on the fly can slow analysis, especially if millions of scores are involved.
MicroStrategy can use this approach by just creating metrics or attributes for the scored data.

Cons

This approach requires database space and the support of a database administrator.
New records that are inserted after the batch scoring are not scored.
Updating the model or scores requires more database and database administrator overhead.
In many companies, adding or updating information in the enterprise data warehouse is not done easily or whenever desired. The cross-functional effort required to score the database limits the frequency of scoring and prevents the vast majority of users from trying new models or changing existing ones.

This approach is no different from adding other entities to a MicroStrategy project. For more information, see the Project Design Help.
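
The batch flow described in this section can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the data warehouse and a hypothetical linear `score` function as a stand-in for the third-party scoring engine; the table name, column names, and coefficients are invented for illustration and are not part of any MicroStrategy product.

```python
import sqlite3

# Hypothetical model standing in for a third-party scoring engine.
def score(age: float, income: float) -> float:
    return round(0.01 * age + 0.00002 * income, 4)

# In-memory SQLite stands in for the enterprise data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, age REAL, income REAL)")
conn.executemany(
    "INSERT INTO customer (id, age, income) VALUES (?, ?, ?)",
    [(1, 30, 50000), (2, 45, 80000)],
)

# Batch step: read a result set, score it outside the database,
# then write the scores back as a new column.
conn.execute("ALTER TABLE customer ADD COLUMN churn_score REAL")
rows = conn.execute("SELECT id, age, income FROM customer").fetchall()
conn.executemany(
    "UPDATE customer SET churn_score = ? WHERE id = ?",
    [(score(age, income), cid) for cid, age, income in rows],
)

# A record inserted after the batch run stays unscored until the next run.
conn.execute("INSERT INTO customer (id, age, income) VALUES (3, 52, 61000)")

scored = conn.execute("SELECT id, churn_score FROM customer ORDER BY id").fetchall()
print(scored)
```

Once the `churn_score` column exists, a reporting tool can read it like any other column. The third record illustrates the drawback listed above: its score is empty until the batch job runs again.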

Database Does the Scoring

In this approach, data mining features of the database system are used to perform the scoring. Nearly all major databases have the ability to score data mining models. The most common approach persists the model in the database and then generates scores by using extensions to the SQL queries processed by the database to invoke the model. A key feature of this approach is that the model can be scored in a system that is different from the data mining tool that developed the model.

The model can be saved in the database as a Predictive Model Markup Language (PMML) object, or, less frequently, in some form of executable code. For more information on PMML, see PMML Overview. Persisting the model in this way is possible because the sophisticated algorithms needed to create the model are not required to score it. Scoring simply involves mathematical calculations on a set of inputs to generate a result. The ability to represent the model and score it outside of the model creation tool is relatively new, but more companies are adopting this approach. Its advantages and disadvantages are described below.

Pros

Scores can be calculated on the fly even if new records are added.
Updating the model is easier than in the Scoring the Database approach.
This approach requires less database space than the Scoring the Database approach.
When the database supports accessing its data mining features via SQL, MicroStrategy can take advantage of this approach using its SQL Engine.

Cons

This approach requires support from a database administrator and application knowledge of the database's data mining tool. However, the database administrator usually does not have this knowledge.
The database data mining tool is typically an additional cost.

MicroStrategy has documented how to implement this approach for the IBM DB2 Intelligent Miner product. Contact MicroStrategy Technical Support for this Tech Note.
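
As a rough illustration of scoring inside the database at query time, the sketch below registers a scoring function with SQLite and invokes it from SQL. This is only an analogy: SQLite's user-defined functions stand in for a database vendor's data mining extensions, whose actual function names and syntax are vendor-specific and are not shown here. The `churn_score` function and its coefficients are hypothetical.

```python
import sqlite3

# Hypothetical model; in a real deployment the model would be persisted
# in the database by the vendor's data mining feature.
def churn_score(age, income):
    return 0.01 * age + 0.00002 * income

conn = sqlite3.connect(":memory:")
# Expose the model to SQL as a two-argument function.
conn.create_function("churn_score", 2, churn_score)

conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, age REAL, income REAL)")
conn.executemany(
    "INSERT INTO customer (id, age, income) VALUES (?, ?, ?)",
    [(1, 30, 50000), (2, 45, 80000)],
)

# A record added later needs no batch job: it is scored when queried.
conn.execute("INSERT INTO customer (id, age, income) VALUES (3, 52, 61000)")

scores = conn.execute(
    "SELECT id, churn_score(age, income) FROM customer ORDER BY id"
).fetchall()
print(scores)
```

No score column is stored, and every record present at query time receives a score, which corresponds to the first and third advantages listed above.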

MicroStrategy Does the Scoring

In this approach, predictive models are applied from within the business intelligence platform, without requiring support from the database or from database administrators to implement data mining models. This direct approach reduces the time required, the potential for data inconsistencies, and cross-departmental dependencies.

MicroStrategy Data Mining Services uses enterprise data resources without significantly increasing the overhead, and it allows sophisticated data mining techniques to be applied directly within the business intelligence environment. Like the other approaches, it has advantages and disadvantages, as described below:

Pros

MicroStrategy stores the predictive model in its metadata as a predictive metric that can be used just like any other metric.
Scores can be calculated on the fly even if new records are added.
The predictive model can be viewed in MicroStrategy Developer.
The predictive model is easily updated using MicroStrategy Developer.
This approach does not require database space or support from a database administrator.
MicroStrategy can take advantage of this approach by using the Analytical Engine.

Cons

This approach does not take advantage of the database data mining features.
Predictor inputs need to be passed from the database to Intelligence Server. For large result sets, databases typically handle data operations more efficiently than moving data to MicroStrategy and scoring it there.

A key enabler of this process is MicroStrategy's ability to import predictive models using PMML. Therefore, it is necessary for you to have a basic understanding of PMML and how it is used. This is provided in the next section.

PMML Overview

PMML (Predictive Model Markup Language) is an XML standard that represents data mining models. It was developed by the Data Mining Group (DMG), an independent consortium consisting of over two dozen companies including MicroStrategy. The language thoroughly describes how to apply a predictive model. It supports a range of model types, including the following:

Regression
Neural networks
Clustering
Trees
Rule set
Support vector machine
Ensembles of models
Association rules
Time series

It also supports data transformation and descriptive statistics. PMML is generated by many data mining applications, including ANGOSS®, FairIsaac®, KXEN®, MicroStrategy, IBM®, Salford Systems®, SAS®, SPSS®, StatSoft®, and others. MicroStrategy can import PMML and also generate PMML for certain types of models.
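
To give a sense of what a PMML document contains and why scoring it requires only simple arithmetic, the following minimal sketch parses a small hand-written regression model and applies it to one record. The model, its field names, and its coefficients are hypothetical, and the `score_regression` helper is an illustration that handles only a single regression table with numeric predictors. It does not represent how MicroStrategy imports or scores PMML.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical PMML regression model (illustration only).
PMML_DOC = """<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <Header description="Hypothetical example model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="spend_model" functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="spend" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="2.0"/>
      <NumericPredictor name="income" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_4"}

def score_regression(pmml_text, inputs):
    """Score one record: intercept + sum(coefficient * input ** exponent)."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    result = float(table.get("intercept"))
    for pred in table.findall("p:NumericPredictor", NS):
        exponent = int(pred.get("exponent", "1"))  # exponent defaults to 1
        result += float(pred.get("coefficient")) * inputs[pred.get("name")] ** exponent
    return result

prediction = score_regression(PMML_DOC, {"age": 40.0, "income": 100.0})
print(prediction)  # 10.0 + 2.0 * 40.0 + 0.5 * 100.0 = 140.0
```

The model definition travels as plain XML, so any application that can read the document and perform the arithmetic can apply the model, independent of the tool that created it.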

PMML is a major advance for the industry, as it allows the sophisticated, and sometimes esoteric, work of statisticians and data analysts to be easily deployed to other environments. PMML closes the loop between data mining tools and the applications that use data mining models. Several data mining and database vendors have announced integrations based on PMML. MicroStrategy is the first business intelligence platform to support the standard. By allowing predictive metrics to be accessible to all users in the enterprise, MicroStrategy makes data mining for the masses possible.

For more information on PMML, check the DMG website and other related documentation.