A Warehouse

Assume that you are the manager of a large retail store where your customers want range of products including cosmetics, soft drinks, toiletries, Over-the-counter drugs, toys, processed foods and more. How does your customer feel when you have cosmetics in a separate store and then processed foods in another store in a far away distance. In reality, such a retail store does not serve the purpose of one stop shopping. This will create inconvinience to the customer as well loose the sales for the store owner.

Let us expand this analogy to an organization. An organisation might have one system that handles customer-relationship, a system that handles employees, systems that handles sales data or production data, yet another system for finance and budgeting data etc. In practice, these systems are often poorly or not at all integrated.  Collection of such heterogeneous systems cannot even answer some simple questions without lot of work. IT team may have to work a lot to answer below questions.

  • How much time did sales person A spend on customer C?
  • How much did we sell to Customer C?
  • Was customer C happy with the provided service?
  • Did Customer C pay his bills?

To address such questions and to facilitate effective reporting and analysis, organizations build data warehouse.

So, what is the definition of data warehouse?

A simple definition is, a data warehouse is a repository of an organization's electronically stored data. Data warehousing is the process of building such a repository.

There are two most popular definitions of data warehouse as given by Bill Inmon and Ralph Kimball. These definitions will further elaborate on what a data warehouse is.

Data warehousing - ETL process

Bill Inmon definition of data warehouse

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.

Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.

Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.

Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a transactions system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, where a data warehouse can hold all addresses associated with a customer.

Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.

Ralph Kimball definition of data warehouse

Ralph Kimball provided a more concise definition of a data warehouse:

A data warehouse is a copy of transaction data specifically structured for query and analysis.

This is a functional view of a data warehouse. Kimball did not address how the data warehouse is built like Inmon did; rather he focused on the functionality of a data warehouse.