Version 2021

Variables tab, Correlation Matrix sub-tab

One advantage of the data mining process is the ability to determine the relationship, or correlation, between the variables that participate in the model. While this descriptive information does not affect the scoring of the model, correlations can be helpful in understanding how variables are related to each other.

To include these correlations in predictive metrics, enable the Include extended statistical analysis with the model option when you create a training metric with the Training Metric Wizard. This option, and therefore the correlation matrix, is not available for Association Rules or Time Series models. MicroStrategy also displays correlation matrices that are included in PMML generated by third-party products.

The Correlation Matrix shows, for each pair of variables, a correlation value that measures how strongly, and in which direction, one variable varies with the other. These values are normalized to a color-coded scale that ranges from 1 to -1, where:

  • 1 (Green): The variables are perfectly directly correlated. For example, the distance from work to your home measured in miles is directly correlated with the same distance measured in kilometers: twice as many miles means twice as many kilometers.

  • 0 (White): The variables have no correlation. For example, the distance from work to your home bears no relationship to the weather: the distance is the same whether it is raining or sunny.

  • -1 (Red): The variables are perfectly inversely correlated. For example, your average speed on the drive home is inversely correlated with the time it takes you to get home: if you drive twice as fast, you get there in half the time.
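The two extreme cases above can be checked numerically. The following is a minimal sketch using textbook formulas, not MicroStrategy's implementation; the sample data and the helper names `pearson`, `ranks`, and `spearman` are hypothetical. Converting miles to kilometers is an exact linear relationship, so Pearson's coefficient is 1. Speed and travel time always move in opposite directions, so Spearman's rank correlation is exactly -1; Pearson's coefficient, which measures only linear association, comes out close to but above -1 because time = distance / speed is not a straight line.

```python
import math

# Textbook sketches for illustration; not MicroStrategy's implementation.


def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length numeric sequences.

    Assumes neither sequence is constant (zero variance makes it undefined).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def ranks(values):
    """Return 1-based ranks; assumes all values are distinct (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical sample data for the examples above.
miles = [1.0, 2.0, 5.0, 10.0, 20.0]
km = [m * 1.609344 for m in miles]        # exact linear conversion

distance = 30.0                           # miles, held constant
speed = [15.0, 30.0, 45.0, 60.0, 75.0]    # miles per hour
hours = [distance / s for s in speed]     # time = distance / speed

print(round(pearson(miles, km), 6))       # 1.0
print(round(spearman(speed, hours), 6))   # -1.0
print(pearson(speed, hours))              # roughly -0.90, not exactly -1
```

Which coefficient is appropriate depends on whether you care about linear association (Pearson) or only about consistent ordering (Spearman).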

The main diagonal of the matrix always contains 1s because a variable is always perfectly directly correlated with itself. The matrix is also symmetrical, meaning that the correlation between variable A and variable B is the same as the correlation between variable B and variable A.
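Both properties can be seen in a small sketch that builds a correlation matrix from named columns. It uses Pearson's coefficient for every pair, which is an assumption made for simplicity (the actual matrix can use a different method per pair); the data and the function names `pearson` and `correlation_matrix` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient; assumes neither sequence is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def correlation_matrix(columns):
    """Build a symmetric correlation matrix from a dict of named numeric columns.

    The diagonal is set to 1.0 (every variable is perfectly correlated with
    itself), and each off-diagonal value is computed once and mirrored,
    because corr(A, B) == corr(B, A).
    """
    names = list(columns)
    n = len(names)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = pearson(columns[names[i]], columns[names[j]])
            matrix[i][j] = matrix[j][i] = r
    return names, matrix


# Hypothetical commute data: three variables observed on five trips.
data = {
    "miles": [10.0, 12.0, 15.0, 9.0, 11.0],
    "km": [16.1, 19.3, 24.1, 14.5, 17.7],
    "mph": [40.0, 35.0, 50.0, 30.0, 45.0],
}

names, matrix = correlation_matrix(data)
for name, row in zip(names, matrix):
    print(f"{name:>6}", " ".join(f"{v:6.2f}" for v in row))
```

Because of the symmetry, only the n(n - 1)/2 pairs above the diagonal need to be computed; the rest of the matrix is filled by mirroring.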

Each pair of variables in the matrix has a correlation value and a correlation method. You can display the values and the methods by selecting the Show Values and Show Methods check boxes, respectively.

The possible correlation methods are:

  • Pearson: Pearson's correlation coefficient

  • Spearman: Spearman's rank correlation coefficient

  • KendallT: Kendall's tau (τ)

  • Cont: Contingency table

  • X2_Test: Chi-square test

  • CramersV: Cramér's V

  • FisherEx: Fisher's exact test
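To make two of the listed methods more concrete, here is a minimal sketch of Kendall's tau (the simple tau-a variant, with no tie correction) and Cramér's V, which is derived from the chi-square statistic of a contingency table. These are textbook formulas, not MicroStrategy's implementation, and the function names and sample data are hypothetical. Note that Kendall's tau ranges from -1 to 1, whereas Cramér's V, being based on a squared quantity, ranges only from 0 (no association) to 1 (perfect association) and carries no direction.

```python
import math
from collections import Counter


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs.

    No tie correction; assumes at least two observations.
    """
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def chi_square(x, y):
    """Pearson's chi-square statistic for the contingency table of x versus y."""
    n = len(x)
    observed = Counter(zip(x, y))  # cell counts; missing cells count as 0
    row_totals = Counter(x)
    col_totals = Counter(y)
    chi2 = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return chi2


def cramers_v(x, y):
    """Cramér's V: chi-square rescaled to the range 0..1.

    Assumes each variable has at least two distinct categories.
    """
    n = len(x)
    k = min(len(set(x)), len(set(y)))
    return math.sqrt(chi_square(x, y) / (n * (k - 1)))


# Hypothetical ordinal data: faster speed always means less time.
speed = [15, 30, 45, 60, 75]
hours = [30 / s for s in speed]
print(kendall_tau_a(speed, hours))  # -1.0

# Hypothetical categorical data: commute mode fully determined by weather.
weather = ["rain", "sun", "rain", "sun"]
mode = ["bus", "bike", "bus", "bike"]
print(cramers_v(weather, mode))     # 1.0
```

The chi-square and Cramér's V functions work on category labels rather than numbers, which is why methods of this kind are used for pairs of categorical variables.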