Source systems for data collection

Source systems are the systems or files that capture or hold the data of interest. A bank is an example of a business with many source systems: a typical bank offers several services, such as account activity updates and loan disbursement, and needs many source systems to support them. For example, one source system, a database file on the bank’s server, might track deposits and withdrawals as they occur, while a different source system, another file on the same server, tracks each customer’s contact information.
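The separation between the two source systems in this example can be sketched in Python with two independent SQLite databases. This is a minimal illustration under stated assumptions: in-memory databases stand in for the files on the bank’s server, and the table names, columns, and helper functions (`record_activity`, `balance_cents`) are hypothetical. Each system can answer questions about its own data, but neither knows about the other.

```python
import sqlite3

# Hypothetical source system 1: records deposits and withdrawals as they occur.
# An in-memory database stands in for a database file on the bank's server.
transactions_db = sqlite3.connect(":memory:")
transactions_db.execute(
    "CREATE TABLE account_activity ("
    " account_id INTEGER, kind TEXT, amount_cents INTEGER)"
)

# Hypothetical source system 2: stores customer contact information separately.
contacts_db = sqlite3.connect(":memory:")
contacts_db.execute(
    "CREATE TABLE customer_contact ("
    " customer_id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
)


def record_activity(account_id, kind, amount_cents):
    """Append one deposit or withdrawal to the activity source system."""
    with transactions_db:  # commits on success, rolls back on error
        transactions_db.execute(
            "INSERT INTO account_activity VALUES (?, ?, ?)",
            (account_id, kind, amount_cents),
        )


def balance_cents(account_id):
    """Compute a balance using only the activity system's own data."""
    row = transactions_db.execute(
        "SELECT COALESCE(SUM(CASE WHEN kind = 'deposit' THEN amount_cents"
        " ELSE -amount_cents END), 0)"
        " FROM account_activity WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    return row[0]


record_activity(1, "deposit", 50_000)
record_activity(1, "withdrawal", 12_500)

with contacts_db:
    contacts_db.execute(
        "INSERT INTO customer_contact VALUES (?, ?, ?)",
        (1, "A. Customer", "555-0100"),
    )

print(balance_cents(1))  # 37500
```

Because `transactions_db` holds no customer names and `contacts_db` holds no balances, a question that spans both, such as a customer’s name alongside their balance, cannot be answered from either system alone.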

A source system is usually where most online transaction processing (OLTP) takes place. Transaction processing is the routine recording of transactions and other business data, such as sales, inventory, e-commerce, deposits, website usage, and order processing. Nearly every industry, including health care, telecommunications, manufacturing, and many others, relies on this processing every day.

OLTP systems are databases or mainframes that store data as it is processed in real time. They have the following characteristics:

Data access is optimized for frequent reading and writing, because the system records very large volumes of data every day. Credit card transactions are a typical example: an OLTP system might record a very large number of them in a single day. By contrast, data warehouses are often designed for reading data for analysis, with a minimum number of updates, insertions, or deletions. For more information on data warehouse design, see Data warehouse for data storage and relational design.
Data is organized by application, that is, by business activity and workflow.
Data formats are not necessarily uniform across systems.
Data history is limited to recent or current data.
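The following Python sketch shows how some of these characteristics might look in a small OLTP-style table. It is an illustration, not a reference design: the `card_txn` table, the `record_txn` and `purge_old` helpers, and the 90-day `RETENTION_DAYS` window are all assumptions. Each write runs in its own short transaction, an index supports lookups for a single card, and older rows are deleted so that only recent data remains.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical OLTP table for card transactions; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE card_txn ("
    " txn_id INTEGER PRIMARY KEY,"
    " card_id INTEGER NOT NULL,"
    " txn_date TEXT NOT NULL,"  # ISO-8601 date string, so text order = date order
    " amount_cents INTEGER NOT NULL)"
)
# Index supports the typical OLTP lookup: recent activity for one card.
conn.execute("CREATE INDEX idx_card_txn_card ON card_txn (card_id, txn_date)")

RETENTION_DAYS = 90  # assumed retention window, not taken from the text


def record_txn(card_id, txn_date, amount_cents):
    """Write a single transaction in its own short database transaction."""
    with conn:
        conn.execute(
            "INSERT INTO card_txn (card_id, txn_date, amount_cents) VALUES (?, ?, ?)",
            (card_id, txn_date.isoformat(), amount_cents),
        )


def purge_old(today):
    """Delete rows older than the retention window and return how many were removed."""
    cutoff = (today - timedelta(days=RETENTION_DAYS)).isoformat()
    with conn:
        cur = conn.execute("DELETE FROM card_txn WHERE txn_date < ?", (cutoff,))
    return cur.rowcount


today = date(2024, 6, 30)
record_txn(42, today - timedelta(days=200), 1_999)  # older than the window
record_txn(42, today - timedelta(days=3), 4_250)
record_txn(42, today, 899)

print(purge_old(today))  # 1
print(conn.execute(
    "SELECT COUNT(*) FROM card_txn WHERE card_id = ?", (42,)
).fetchone()[0])  # 2
```

SQLite is used here only because it ships with Python’s standard library; the same pattern of small, frequent writes and a bounded history applies to other transactional databases.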

Recall the example of a bank that relies on several source systems to store data related to the many services it offers. Each of these business services has its own specific workflow.

At an automated teller machine (ATM), you can withdraw or deposit money and check balances. To get a money order, however, you must go into the bank and complete the transaction with a teller. The two services are supported by different operational systems, each designed to perform its own specific tasks.

If a bank wants a unified view of a particular customer that combines the customer’s ATM activity, loan status, account balances, and money market account information, it must consolidate the customer information stored in each of these separate systems. This consolidation is achieved through the extraction, transformation, and loading (ETL) process.

The ETL process consolidates data so it can be stored in a data warehouse.
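A minimal ETL sketch in Python follows, assuming three hypothetical source systems (ATM, loan, and contact) whose records use different customer keys and currency units, in line with the non-uniform data formats mentioned above. All record layouts, function names, and the `customer_summary` table are illustrative. The extract step is represented by in-memory lists, the transform step maps every record to one integer customer ID and to cents, and the load step writes the unified rows to an in-memory SQLite database standing in for the warehouse.

```python
import sqlite3
from decimal import Decimal

# --- Extract: hypothetical records as each source system stores them ---
# The ATM system keys customers by a string ID and stores amounts in dollars.
atm_records = [
    {"cust": "C001", "withdrawal_usd": "60.00"},
    {"cust": "C001", "withdrawal_usd": "40.00"},
]
# The loan system keys customers by an integer ID and stores cents.
loan_records = [
    {"customer_id": 1, "status": "current", "balance_cents": 1_250_000},
]
# The contact system uses yet another layout.
contact_records = [
    {"id": 1, "full_name": "A. Customer"},
]


# --- Transform: map every source to one shared customer key and unit ---
def atm_customer_id(code):
    return int(code.lstrip("C"))  # assumes ATM IDs look like "C001"


def to_cents(usd_text):
    return int(Decimal(usd_text) * 100)  # Decimal avoids float rounding


def build_customer_view():
    view = {}
    for c in contact_records:
        view[c["id"]] = {"name": c["full_name"], "atm_withdrawn_cents": 0,
                         "loan_status": None, "loan_balance_cents": 0}
    for r in atm_records:
        view[atm_customer_id(r["cust"])]["atm_withdrawn_cents"] += to_cents(r["withdrawal_usd"])
    for r in loan_records:
        row = view[r["customer_id"]]
        row["loan_status"] = r["status"]
        row["loan_balance_cents"] = r["balance_cents"]
    return view


# --- Load: write the unified rows into a warehouse table ---
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customer_summary (customer_id INTEGER PRIMARY KEY, name TEXT,"
    " atm_withdrawn_cents INTEGER, loan_status TEXT, loan_balance_cents INTEGER)"
)
with warehouse:
    warehouse.executemany(
        "INSERT INTO customer_summary VALUES (?, ?, ?, ?, ?)",
        [(cid, v["name"], v["atm_withdrawn_cents"], v["loan_status"],
          v["loan_balance_cents"])
         for cid, v in build_customer_view().items()],
    )

print(warehouse.execute("SELECT * FROM customer_summary").fetchall())
# [(1, 'A. Customer', 10000, 'current', 1250000)]
```

Amounts are converted with `Decimal` rather than `float` so that text values such as `"60.00"` become exactly 6000 cents before they are summed.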