MicroStrategy ONE

Regression analysis

Regression analysis is used to analyze the relationships between a set of independent variables and a single dependent variable. The magnitude of the relationships between these variables is determined, leading to the generation of a model that describes these relationships. This model can then be used to make predictions.

MicroStrategy supports the following types of regression analysis:

Linear regression

If this type of regression is specified, the function attempts to calculate the coefficients of a straight line that best fits the input data. The calculated formula has the following format: y = b0 + b1*x1 + b2*x2 + ... + bn*xn, where y is the target (dependent) variable, x1 ... xn are the independent variables, and b0 ... bn are the calculated coefficients.

The dependent variable is always continuous in nature, while the independent variables may be continuous or categorical.
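As an illustration of the formula above, here is a minimal sketch of an ordinary least-squares fit in plain Python. It is not MicroStrategy's implementation. The function names `fit_linear` and `predict_linear` and the sample data are assumptions made for this example.

```python
def fit_linear(rows, targets):
    """Return [b0, b1, ..., bn] for y = b0 + b1*x1 + ... + bn*xn (least squares)."""
    # Prepend a constant 1.0 to every row so that b0 is fitted as the intercept.
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    # Build the normal equations (X^T X) b = X^T y as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("inputs are collinear; coefficients are not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict_linear(coeffs, row):
    """Evaluate b0 + b1*x1 + ... + bn*xn for one row of inputs."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
targets = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coeffs = fit_linear(rows, targets)
```

Because the sample data lies exactly on a plane, `coeffs` recovers the generating coefficients up to floating-point rounding. Real data would leave residual error.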

Exponential regression

If this type of regression is specified, the function attempts to calculate the coefficients of an exponential curve that best fits the input data. The calculated formula has the following format: y = b0 * (b1^x1) * (b2^x2) * ... * (bn^xn), where y is the target (dependent) variable, x1 ... xn are the independent variables, and b0 ... bn are the calculated coefficients.

The dependent variable is always continuous in nature, while the independent variables may be continuous or categorical.
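One common way to fit a curve of this form is to take the logarithm of both sides, which turns it into a straight line: ln(y) = ln(b0) + x1*ln(b1) + ... + xn*ln(bn). The sketch below does this for a single independent variable. Whether MicroStrategy uses this transformation internally is not stated here, so treat it as an assumption. The function name `fit_exponential` and the sample data are also hypothetical.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = b0 * b1**x by least squares on ln(y); every y must be > 0."""
    if any(y <= 0 for y in ys):
        raise ValueError("the log transform needs strictly positive targets")
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    # Closed-form slope and intercept of the line ln(y) = intercept + slope * x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    slope = sxl / sxx
    intercept = mean_log - slope * mean_x
    # Undo the log transform: b0 = e^intercept, b1 = e^slope.
    return math.exp(intercept), math.exp(slope)


# Hypothetical data generated from y = 3 * 2**x.
xs = [0, 1, 2, 3, 4]
ys = [3 * 2 ** x for x in xs]
b0, b1 = fit_exponential(xs, ys)
```

Note that this minimizes squared error on the log scale, which is not identical to minimizing squared error on the original scale when the data is noisy.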

Logistic regression

Unlike linear regression, where the outcome (or dependent variable) is a continuous number, in logistic regression the outcome is always categorical (such as True/False or High/Medium/Low). Logistic regression is similar to linear regression and uses many of the same statistical calculations and techniques. While linear regression generates an equation that represents the line that best fits the data, logistic regression generates an equation for each possible outcome, and then selects the outcome determined to be the most strongly supported.
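The "one equation per outcome, then pick the best supported" idea can be sketched as follows. The coefficients are made up for illustration and are not fitted from data. Converting the scores to probabilities with a softmax is an assumption of this sketch; the text above does not say how MicroStrategy measures support.

```python
import math

# Hypothetical coefficients: one linear equation [b0, b1, b2] per outcome.
EQUATIONS = {
    "Low": [2.0, -1.0, -0.5],
    "Medium": [0.5, 0.2, 0.1],
    "High": [-2.0, 1.0, 0.6],
}


def outcome_probabilities(equations, row):
    """Score every outcome with its own equation, then normalize via softmax."""
    scores = {
        label: c[0] + sum(b * x for b, x in zip(c[1:], row))
        for label, c in equations.items()
    }
    # Subtract the largest score before exponentiating for numerical stability.
    peak = max(scores.values())
    weights = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def predict_outcome(equations, row):
    """Return the outcome whose equation gives the highest probability."""
    probs = outcome_probabilities(equations, row)
    return max(probs, key=probs.get)


low_case = predict_outcome(EQUATIONS, (0.0, 0.0))
medium_case = predict_outcome(EQUATIONS, (1.5, 1.0))
high_case = predict_outcome(EQUATIONS, (4.0, 3.0))
```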

Tree regression

Another type of regression model generated by MicroStrategy uses the model selection feature provided by PMML. Specifically, a tree regression model is generated that consists of a decision tree model, with an individual regression model defined for each leaf node. This type of model allows for the segregation of data based on one or more criteria (for example, geography or product group).

This feature is triggered by selecting one or more inputs to serve as segmentation metrics. The number selected determines the depth of the generated decision tree. Each leaf node defines a unique regression model, which can be of any of the three supported regression model types listed above.
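The structure described above can be sketched as a lookup from a path of segmentation values to a per-leaf regression model. This is only an illustration of the idea, not the PMML that MicroStrategy generates. The field names (`region`, `product`, `units`, `revenue`), the function names, and the data are all hypothetical, and each leaf uses a simple one-variable linear fit.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one leaf."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def fit_segmented(records, segment_keys, input_key, target_key):
    """Group records by their segmentation values and fit one model per leaf."""
    leaves = defaultdict(lambda: ([], []))
    for rec in records:
        # One tree level per segmentation metric: the tuple is the path to a leaf.
        path = tuple(rec[k] for k in segment_keys)
        xs, ys = leaves[path]
        xs.append(rec[input_key])
        ys.append(rec[target_key])
    return {path: fit_line(xs, ys) for path, (xs, ys) in leaves.items()}


def predict_segmented(models, segment_keys, input_key, record):
    """Walk to the record's leaf and apply that leaf's regression model."""
    intercept, slope = models[tuple(record[k] for k in segment_keys)]
    return intercept + slope * record[input_key]


# Hypothetical (intercept, slope) used to generate data for each leaf.
TRUE_LINES = {
    ("East", "A"): (10.0, 2.0),
    ("East", "B"): (5.0, 1.0),
    ("West", "A"): (0.0, 3.0),
    ("West", "B"): (20.0, -1.0),
}
records = [
    {"region": r, "product": p, "units": x, "revenue": a + b * x}
    for (r, p), (a, b) in TRUE_LINES.items()
    for x in (1, 2, 3)
]
SEGMENTS = ["region", "product"]  # two segmentation metrics -> tree depth 2
models = fit_segmented(records, SEGMENTS, "units", "revenue")
forecast = predict_segmented(
    models, SEGMENTS, "units", {"region": "West", "product": "A", "units": 10}
)
```

With two segmentation metrics, the dictionary keys are two-element paths, which mirrors a decision tree of depth two with four leaves.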

Stepwise regression

Stepwise regression refers to the variable reduction techniques used by MicroStrategy. These techniques (forward and backward regression) apply only to the linear and exponential regression methods described above. They rely on a user-specified parameter, Variable Importance, which is used internally to calculate a threshold alpha, where alpha = 1 - Variable Importance. During regression analysis, an alpha value (a measure of significance) is calculated for each variable, and the final model contains only variables whose alpha value is less than or equal to the user-specified alpha.

  • The forward regression method starts with a null model (with no variables) and in every iteration adds the variable which is the most significant (that is, the lowest alpha value). At each step, new alpha values are calculated, and this process continues until no variables can be added to the model whose significance is less than or equal to the user-specified alpha.

  • The backward regression method starts with a full model (with all of the variables) and in every iteration removes the variable which is the least significant (that is, the highest alpha value). At each step, new alpha values are calculated, and this process continues until no variables can be eliminated from the model whose significance is greater than the user-specified alpha.
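The two loops described above can be sketched as follows. To keep the example short, the per-variable alpha values come from a hypothetical callable, `alpha_values_for`, which stands in for refitting the regression and recomputing significance at each step. The variable names and numbers are invented for illustration and are not computed from data.

```python
def forward_selection(candidates, alpha_values_for, alpha):
    """Start empty; repeatedly add the most significant remaining variable."""
    model = []
    remaining = list(candidates)
    while remaining:
        # Alpha value of each candidate if it were added to the current model.
        trial = {v: alpha_values_for(model + [v])[v] for v in remaining}
        best = min(trial, key=trial.get)
        if trial[best] > alpha:
            break  # nothing left that meets the threshold
        model.append(best)
        remaining.remove(best)
    return model


def backward_elimination(candidates, alpha_values_for, alpha):
    """Start full; repeatedly drop the least significant variable."""
    model = list(candidates)
    while model:
        current = alpha_values_for(model)
        worst = max(current, key=current.get)
        if current[worst] <= alpha:
            break  # every remaining variable meets the threshold
        model.remove(worst)
    return model


def toy_alpha_values(variables):
    """Hypothetical stand-in for refitting: returns {variable: alpha value}."""
    # "discount" overlaps with "price", so it loses significance once
    # "price" is in the model. These numbers are invented.
    table = {
        "price": 0.001,
        "season": 0.03,
        "weekday": 0.40,
        "discount": 0.20 if "price" in variables else 0.01,
    }
    return {v: table[v] for v in variables}


variable_importance = 0.95
alpha = 1 - variable_importance  # threshold derived from Variable Importance
names = ["price", "season", "weekday", "discount"]
forward_model = forward_selection(names, toy_alpha_values, alpha)
backward_model = backward_elimination(names, toy_alpha_values, alpha)
```

In this toy setup both methods settle on the same two variables, but in general forward and backward regression can produce different models because they evaluate variables in different contexts.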

If stepwise reduction is not desired, the single pass method is available. This option performs a single pass over the variables, calculates an alpha value for each, and eliminates those variables whose alpha values are greater than the user-specified alpha.