MicroStrategy ONE

Schema types: Data retrieval performance versus redundant storage

There are many ways to structure your data warehouse and no method is inherently right or wrong. How you choose to structure the warehouse depends on the nature of your data, the available storage space, and the requirements of your user community. Typically, one of the schema types, or a combination of them, is used to organize the physical schema to enhance query performance while maintaining an acceptable amount of data storage space. These schema types are:

In each of these schemas a base fact table and any number of aggregate fact tables are used (For more information about aggregate fact tables, see Using summary tables to store data: Aggregate tables). Fact table keys consist of attribute keys relevant to the level of data stored in the table. The schema examples that follow show data at the Item/Call Center/Date level.

When comparing the different schema types, you should keep in mind the following concepts:

  • Redundant data can cause a couple of drawbacks. The most obvious drawback is that redundant data requires more storage space to hold the same amount of data as a system with no redundancy.

  • Data redundancy also makes updating data a more intensive and difficult process because data resides in multiple places. With no data redundancy, data only has to be updated in a single place.

  • Joins are SQL operations that are required to combine two tables together in order to retrieve data. These operations are necessary, but as with any operation performed on your data warehouse, the number of joins required to build your queries affects the performance of those queries.

  • The sections below are not meant to be an exhaustive list of all possible schema types. However, the sections below are meant to give a description of the most common or general schema types that are used to develop a physical warehouse schema.