MicroStrategy ONE
Data Mining Dataset Reports
Data mining dataset reports have a very simple structure. The data usually focuses on a specific subject or attribute, for example, customers, transactions, or products; this information is used to develop a predictive model.
A dataset report is like a table in a database and usually has the following features:
- Each row represents one record of the subject attribute, such as a single customer, transaction, or product.
- The first column is a unique identifier for the specific record, such as customer name, customer identification number, transaction number, or product SKU number.
- Each of the remaining columns of the dataset report contains data that describes the item in that row, such as customer age or annual purchases, transaction location or amount, or product color or cost. These columns can be either of the following:
  - Inputs to the predictive model, referred to as predictive inputs and also called independent variables
  - Representations of outcomes worth predicting, also called dependent variables
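The structure described above can be sketched in a few lines of Python. This is only an illustration: the column names (`customer_id`, `age`, `annual_purchases`, `responded_to_campaign`) and all values are hypothetical and do not come from any MicroStrategy project.

```python
# Hypothetical dataset report: one row per customer.
# Column names and values are invented for illustration only.
ID_COLUMN = "customer_id"
PREDICTIVE_INPUTS = ["age", "annual_purchases"]  # independent variables
DEPENDENT_VARIABLE = "responded_to_campaign"     # outcome worth predicting

dataset_report = [
    {"customer_id": 1001, "age": 34, "annual_purchases": 1250.0, "responded_to_campaign": 1},
    {"customer_id": 1002, "age": 52, "annual_purchases": 310.5, "responded_to_campaign": 0},
    {"customer_id": 1003, "age": 27, "annual_purchases": 890.0, "responded_to_campaign": 1},
]


def split_dataset(rows):
    """Separate identifiers, predictive inputs, and the dependent variable."""
    ids = [row[ID_COLUMN] for row in rows]
    inputs = [[row[name] for name in PREDICTIVE_INPUTS] for row in rows]
    outcomes = [row[DEPENDENT_VARIABLE] for row in rows]
    return ids, inputs, outcomes


ids, inputs, outcomes = split_dataset(dataset_report)

# The first column must identify each record uniquely.
assert len(set(ids)) == len(ids)
```

Keeping the identifier separate from the predictive inputs reflects the role described above: it labels the record but is not itself an input to the model.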
The following is an example of a part of a dataset report for customer information:
Notice that each attribute, such as Customer Age Range, has two attribute forms on the report: the ID and the description. Some data mining software works better with numeric values, such as the ID, while the description is included to make the report easier to read.
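As a rough illustration of why both forms are useful, the sketch below passes only the numeric ID to a modeling step and uses the description when showing results to people. The age range values and the `age_range_id` column name are hypothetical.

```python
# Hypothetical attribute forms for Customer Age Range; values are invented.
age_range_forms = [
    {"id": 1, "desc": "24 and under"},
    {"id": 2, "desc": "25 to 34"},
    {"id": 3, "desc": "35 to 44"},
]

# Lookup from the numeric ID form back to the readable description form.
desc_by_id = {form["id"]: form["desc"] for form in age_range_forms}

# Rows as a data mining tool might receive them: numeric IDs only.
model_rows = [
    {"customer_id": 1001, "age_range_id": 2},
    {"customer_id": 1002, "age_range_id": 3},
]

# Re-attach the descriptions when presenting results.
readable = [(row["customer_id"], desc_by_id[row["age_range_id"]]) for row in model_rows]
```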
Once the dataset report is ready, it can be used in a data mining analysis, usually in one of two ways:
- Creating a predictive metric using MicroStrategy: The dataset report is used to create a data mining model using MicroStrategy. More information on this approach can be found in Creating a Predictive Model.
- Creating a predictive metric with a third-party data mining tool: The dataset report is made available for analysis by an external application, usually in one of the following ways:
  - The dataset report is created in the database as a table using MicroStrategy's Data Mart feature. Third-party data mining applications can easily access databases using ODBC and SQL. This setup also promotes consistency between the dataset report and the data that the third-party application analyzes, especially for variable names and data types. It does require database access and database storage space. For more information on data marts, see Accessing Subsets of Data: Data Marts.
  - The dataset report is exported to a file using MicroStrategy's export capabilities. Third-party data mining applications can read many file formats, such as Microsoft Excel files and text files. With an exported file, the data mining application must determine the data type of each variable on the fly, and the user may need to correct its interpretation. On the other hand, this approach is usually easier for most people and does not require help from a database administrator. See the MicroStrategy online help for more information on exporting reports from MicroStrategy.
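The difference between the two hand-off approaches can be illustrated with a minimal Python sketch. It makes several assumptions: an in-memory SQLite database stands in for the data mart table (a real deployment would connect to the warehouse through an ODBC driver), a short CSV string stands in for the exported file, and the table name `customer_dataset`, its columns, and the `schema` mapping are all hypothetical.

```python
import csv
import io
import sqlite3

# --- Database table (stand-in for a data mart table) ---
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_dataset ("
    "customer_id INTEGER PRIMARY KEY, age INTEGER, annual_purchases REAL)"
)
conn.executemany(
    "INSERT INTO customer_dataset VALUES (?, ?, ?)",
    [(1001, 34, 1250.0), (1002, 52, 310.5)],
)
db_rows = conn.execute(
    "SELECT customer_id, age, annual_purchases "
    "FROM customer_dataset ORDER BY customer_id"
).fetchall()
conn.close()
# Values arrive as integers and floats, matching the column declarations.

# --- Exported text file (stand-in for an exported report) ---
exported = "customer_id,age,annual_purchases\n1001,34,1250.0\n1002,52,310.5\n"
raw_rows = list(csv.DictReader(io.StringIO(exported)))
# Every field is a string here; the reader has to decide the types.

# Explicit schema supplied by the user to correct the interpretation.
schema = {"customer_id": int, "age": int, "annual_purchases": float}
typed_rows = [tuple(schema[name](row[name]) for name in schema) for row in raw_rows]
```

After the explicit conversion, `typed_rows` matches `db_rows`. The extra schema step is the kind of user correction that the file-based approach may require.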