Data warehouse v Database

Author: Nishith

Data warehouse vs. database is a long and old debate. In this article I am going to discuss the difference between data warehouse and database. I will also discuss that which one is better to use.

Similarities in Database and Data warehouse:

  • Both database and data warehouse are databases.
  • Both database and data warehouse have some tables containing data.
  • Both database and data warehouse have indexes, keys, views etc.

Difference between Database and Data warehouse:

So is that ‘Data warehouse' really different from the tables in you application? And if the two aren't really different, maybe you can just run your queries and reports directly from your application databases!

Well, to be fair, that may be just what you are doing right now, running some end-of-day reports as complex SQL queries and shipping them off to those who need them. And this scheme might just be serving you fine right now. Nothing wrong with that if it works for you.

But before you start patting yourself on the back for having avoided a data warehouse altogether, do spend a moment to understand the differences, and to appreciate the pros and cons of either approach.

  • The Application database is not your Data Warehouse for the simple reason that your application database is never designed to answer queries.

  • The database is designed and optimized to record while the data warehouse is designed and optimized to respond to analysis questions that are critical for your business.

  • Application databases are On-Line Transaction processing systems where every transition has to be recorded, and super-fast at that.

  • A Data Warehouse on the other hand is a database that is designed for facilitating querying and analysis.

  • A data warehouse is designed as On-Line Analytical processing systems . A data warehouse contains read-only data that can be queried and analyzed far more efficiently as compared to regular OLTP application databases. In this sense an OLAP system is designed to be read-optimized.

  • Separation from your application database also ensures that your business intelligence solution is scalable, better documented and managed and can answer questions far more efficiently and frequently.

Data warehouse is better than a database:

The Data Warehouse is the foundation of any analytics initiative. You take data from various data sources in the organization, clean and pre-process it to fit business needs, and then load it into the data warehouse for everyone to use. This process is called ETL which stands for ‘Extract, transform, and load'.

Suppose you are running your reports off the main application database. Now the question is would the solution still work next year with 20% more customers, 50% more business, 70% more users, and 300% more reports? What about the year after next? If you are sure that your solution will run without any changes, great!! However, if you have already budgeted to buy new state-of-the-art hardware and 25 new Oracle licenses with those partition-options and the 33 other cool-sounding features, you might consider calling up Oracle and letting them know. There's a good chance they'd make you their brand ambassador.

Creation of a data warehouse leads to a direct increase in quality of analyses as you can keep only the needed information in simpler tables, standardized, and denormalized to reduce the linkages between tables and the corresponding complexity of queries.

A data warehouse drastically reduces the cost-per-analysis and thus permits more analysis per FTE.

It's probably more sensible and simpler to create a new data warehouse exclusively for your needs. And if you are cash strapped, you could easily do that at extremely low costs by using excellent open source databases like MySQL.