MicroStrategy ONE

Statistical Descriptors - Simple Example

The report and report components used in this example can be found in the MicroStrategy Tutorial project in the following folder: MicroStrategy Tutorial\Public Objects\Reports\ MicroStrategy Platform Capabilities\Advanced Analytics\ Statistics and Forecasting\Statistical Descriptors

The Component Objects - Advanced folder contains additional statistical descriptor examples.

How Do I Learn More About My Customers?

You can get more information about the customers in each region by performing statistical calculations on their spending. This example uses analysis to answer the following questions:

  • How many customers exist in each customer region?
  • What is the average spending of customers in each customer region?
  • What is the median spending of customers in each customer region?
  • What is the standard deviation of customer spending in each customer region?
  • How much is the 25th and 75th percentile of customer spending in each customer region?

These are statistical descriptors about customers in each customer region. This type of data can help you understand how customers spend their money.

To answer the questions above, you must create a set of simple metrics based on other metrics. These metrics are defined using nested group-value functions.

  1. This metric determines the number of customers in each region:

    Count of Customers = Sum(Count(1) {~+, Customer+} ) {~+}

  2. This metric determines the average spending of customers in each region:

    a.Mean Revenue Per Customer = Avg(Sum(Revenue) {~+, Customer+} ) {~+}

  3. This metric determines the median spending of customers in each region:

    b.Median Revenue Per Customer = Median(Sum(Revenue) {~+, Customer+} ) {~+}

  4. This metric determines the standard deviation of customer spending in each region:

    e.StDev of Revenue Per Customer = Stdev(Sum(Revenue) {~+, Customer+} ) {~+}

  5. This metric determines the threshold for the 25th percentile:

    25th Percentile of Revenue Per Customer = Percentile(Sum(Revenue) {~+, Customer+} , 0.25) {~+}

  6. This metric determines the threshold for the 75th percentile:

    75th Percentile of Revenue Per Customer = Percentile(Sum(Revenue) {~+, Customer+} , 0.75) {~+}

    Create a report and place all of the metrics described above on the row axis and place the attribute Customer Region on the column axis. Execute the Statistical Descriptors - Simple report, which is shown below:

    This report provides information on the behavior of customer spending in each customer region. For example, the Northwest customer region has the least number of customers, but most customers in that region spend more than any other region, as seen in the larger mean and median revenues per customer. This can highlight that it is worthwhile to gain a larger customer base in this region.

    However, the somewhat lower value for the standard deviation of revenue per customer in the Northwest region also indicates that this result might be due to the smaller sample size and a few customers that spend abnormally large amounts. This standard deviation value helps to show that while a campaign to increase the customer base in the region is still an attractive idea, the results should be monitored to determine if revenues continue to be higher per customer.

    The SQL generated for the Statistical Descriptors - Simple report is as follows:

    Pass0 -  Execution Duration: 0:00:00.00
    create table ZZT42W71C95MD000 (
    	CUSTOMER_ID LONG, 
    	WJXBFS1 LONG, 
    	WJXBFS2 DOUBLE)
    Pass1 -  Execution Duration: 0:00:00.06
    insert into ZZT42W71C95MD000 
    select a11.[CUSTOMER_ID] AS CUSTOMER_ID,
    	count(1.0) AS WJXBFS1,
    	sum(a11.[TOT_DOLLAR_SALES]) AS WJXBFS2
    from [CUSTOMER_SLS] a11
    group by a11.[CUSTOMER_ID]
    Pass2 -  Execution Duration: 0:00:00.09
    select distinct pa11.[CUSTOMER_ID] AS CUSTOMER_ID,
    	a14.[CUST_REGION_ID] AS CUST_REGION_ID,
    	a15.[CUST_REGION_NAME] AS CUST_REGION_NAME0,
    	pa11.[WJXBFS1] AS WJXBFS1,
    	pa11.[WJXBFS2] AS WJXBFS2
    from [ZZT42W71C95MD000] pa11, 
    	[LU_CUSTOMER] a12, 
    	[LU_CUST_CITY] a13, 
    	[LU_CUST_STATE] a14, 
    	[LU_CUST_REGION] a15
    where pa11.[CUSTOMER_ID] = a12.[CUSTOMER_ID] and 
    	a12.[CUST_CITY_ID] = a13.[CUST_CITY_ID] and 
    	a13.[CUST_STATE_ID] = a14.[CUST_STATE_ID] and 
    	a14.[CUST_REGION_ID] = a15.[CUST_REGION_ID]
    Pass3 -  Execution Duration: 0:00:00.00
    [Analytical SQL calculated by the Analytical Engine:
    	select CUST_REGION_ID,
    		CUST_REGION_NAME0,
    		sum(WJXBFS1),
    		avg(WJXBFS2),
    		Median(WJXBFS2),
    		Stdev(WJXBFS2),
    		Percentile(WJXBFS2, 0.25),
    		Percentile(WJXBFS2, 0.75)
    	from [previous pass]
    ]
    Pass4 -  Execution Duration: 0:00:00.00
    [Populate Report Data]
    Pass5 -  Execution Duration: 0:00:00.00
    drop table ZZT42W71C95MD000
  • The first two passes of SQL (Pass0 and Pass1) prepare and calculate the values used in other metrics.
  • The Intelligence Server optimizes the process by pulling only a single column of revenue data even though it is used in several other metrics.
  • In Pass2 and Pass3 the Intelligence Server retrieves the results from Pass1 to compute the other metrics. In Pass3 the Intelligence Server acts as an in-memory database, using the values from Pass2 to calculate the following group-value functions: Sum, Avg, Median, Stdev, and Percentile.
  • The remaining passes prepare the report and drop the temporary tables.