Conventional analytical processing tools include OLAP cubes and pivot tables such as the ones provided by Microsoft Excel. These tools take up huge chunk of data and summarize it by grouping records based on a set of criteria. For example, an OLAP cube might summarize sales data by grouping it on the basis of product, region and period. In this case, each grid cell would display the total sales for a particular product, in a particular region, and for a specific period. This cell would normally represent data from several records in the original data source
Data analysis and processing tools allow users to redefine grouping criteria dynamically (online). This makes it easy to perform ad-hoc data analysis and discover hidden patterns.
For example, consider the following table:
Date |
Product |
Region |
Sales |
Oct 2015 |
Product A |
North |
12 |
Oct 2015 |
Product B |
North |
15 |
Oct 2015 |
Product C |
South |
4 |
Oct 2015 |
Product A |
South |
3 |
Nov 2015 |
Product A |
South |
6 |
Nov 2015 |
Product C |
North |
8 |
Nov 2015 |
Product A |
North |
10 |
Nov 2015 |
Product B |
North |
3 |
Now suppose you were asked to analyze this data and answer questions such as:
In order to answer these simple questions, you would have to summarize the data to obtain tables such as these:
Sales by Date and by Product
Date |
Product A |
Product B |
Product C |
Total |
Oct 2007 |
15 |
15 |
4 |
34 |
Nov 2007 |
16 |
3 |
8 |
27 |
Total |
31 |
18 |
12 |
61 |
Sales by Product and by Region
Product |
North |
South |
Total |
Product A |
22 |
9 |
31 |
Product B |
18 |
|
18 |
Product C |
8 |
4 |
12 |
Total |
48 |
13 |
61 |
Each cell in the summary tables represents several records in the original data source, where one or more values fields are summarized (sum of sales in this case) and categorized on the basis of other fields (date, product, or region in this case).
This can be done easily in a spreadsheet, but the work is tedious, repetitive, and error-prone. Even if you write a custom application to summarize the data, probably you spend a lot of time maintaining it to add new views, and users might get constrained in their analyses to the views implemented by you.
OLAP allows users to define the views they want in an interactive, and ad-hoc fashion. They can use pre-defined views or create and save new ones. Any changes to the underlying data are reflected automatically in the views, and users can create and share reports showing these views. In short, OLAP control provides flexible and efficient data analysis process.