Rollup, Cube, Grouping

Title GROUPING_ID

Author: Markus Jägle (markus.jaegle@trivadis.com)

Info Technical background info (April 2002)

Source from the Oracle9i Data Warehousing Guide and the Trivadis courses New Features Oracle9i and Advanced SQL.

 

Introduction

Oracle8i introduced the GROUP BY functions ROLLUP and CUBE. With the same version, Oracle also offered the GROUPING function to analyze the aggregates created by the ROLLUP and CUBE functions. But since this function sometimes hogs large amounts of memory, a new function was developed with the version Oracle9i to solve this problem: the GROUPING_ID function.

 

ROLLUP, CUBE and GROUPING

The ROLLUP function introduced with Oracle8i allows the creation of sorted subtotals at every level of an aggregation up to the sum total. ROLLUP first calculates the aggregates specified in the GROUP BY clause. Subtotals of the higher levels are then created progressively, navigating from right to left through the grouping attributes. Finally, the sum total is output (see following example).

 

SQL> SELECT deptno, job, COUNT(*), SUM(sal)

2 FROM emp

3 GROUP BY ROLLUP (deptno, job);

DEPTNO JOB COUNT(*) SUM(SAL)

---------- --------- ---------- ----------

10 CLERK 1 1300

10 MANAGER 1 2450

10 PRESIDENT 1 5000

10 3 8750

20 ANALYST 2 6000

20 CLERK 2 1900

20 MANAGER 1 2975

20 5 10875

30 CLERK 1 950

30 MANAGER 1 2850

30 SALESMAN 4 5600

30 6 9400

14 29025

13 rows selected.

Example 1: ROLLUP

 

The CUBE function, also introduced with Oracle8i, forms all possible combinations of subtotals, including the sum total (asymmetrical aggregation). This allows very simple cross-classified table reports, such as are necessary in drill-down functionalities of client tools. With multi-dimensional analyses, the CUBE function generates all the subtotals that can be formed for a cube with the specified dimensions (see following example).

SQL> SELECT deptno, job, COUNT(*), SUM(sal)

2 FROM emp

3 GROUP BY CUBE(deptno, job);

DEPTNO JOB COUNT(*) SUM(SAL)

---------- --------- ---------- ----------

10 CLERK 1 1300

10 MANAGER 1 2450

10 PRESIDENT 1 5000

10 3 8750

20 ANALYST 2 6000

20 CLERK 2 1900

20 MANAGER 1 2975

20 5 10875

30 CLERK 1 950

30 MANAGER 1 2850

30 SALESMAN 4 5600

30 6 9400

ANALYST 2 6000

CLERK 4 4150

MANAGER 3 8275

PRESIDENT 1 5000

SALESMAN 4 5600

14 29025

18 rows selected.

Example 2: CUBE

 

With the help of the GROUPING function, it is possible to identify the super aggregate rows within a GROUP BY expression (see ROLLUP and CUBE). The following example shows how the GROUPING function is used. GROUPING returns the value 1 if the row is an aggregate/subtotal; otherwise, the value 0 is returned.

SQL> SELECT deptno

2 ,job

3 ,COUNT (*)

4 ,SUM (sal)

5 ,GROUPING(deptno) AS group_dept

6 ,GROUPING(job) AS group_job

7 FROM emp

8 GROUP BY CUBE (deptno, job);

DEPTNO JOB COUNT(*) SUM(SAL) GROUP_DEPT GROUP_JOB

---------- --------- ---------- ---------- ---------- ----------

10 CLERK 1 1300 0 0

10 MANAGER 1 2450 0 0

10 PRESIDENT 1 5000 0 0

10 3 8750 0 1

20 ANALYST 2 6000 0 0

20 CLERK 2 1900 0 0

20 MANAGER 1 2975 0 0

20 5 10875 0 1

30 CLERK 1 950 0 0

30 MANAGER 1 2850 0 0

30 SALESMAN 4 5600 0 0

30 6 9400 0 1

ANALYST 2 6000 1 0

CLERK 4 4150 1 0

MANAGER 3 8275 1 0

PRESIDENT 1 5000 1 0

SALESMAN 4 5600 1 0

14 29025 1 1

18 rows selected.

Example 3: CUBE with GROUPING

 

The value 1 in the GROUP_DEPT and GROUP_JOB columns of this statement's result shows clearly whether the respective row is a total. If, for example, there is a 1 in the GROUP_JOB column, then we are looking at the subtotal of all JOBs of a DEPTNO (as, for instance, in the 4th line of the example). The sum total row can be recognized in that 1 is returned in all GROUPING functions for all GROUP BY columns (see last row in the

example).

 

GROUPING_ID

Since the information of the GROUPING function must be output for the identification of the super aggregate rows for each GROUP BY column, this can hog a lot of memory if the query result is to be stored in tables, e.g. with materialized views. A select statement, for example, that uses a GROUP BY across four columns requires four GROUPING functions in order to be analyzed.

 

The GROUPING_ID function offers a possibility for saving memory in identifying the super aggregate rows. Similar to the GROUPING function, this function identifies these rows and returns the GROUP BY level as a bit vector.

 

GROUPING_ID takes all the 1's and 0's that were generated using the GROUPING function and combines them to form a bit vector. The bit vector is treated as a binary number, and the value is returned by the GROUPING_ID function. If, for example, a grouping is carried out with the expression CUBE (a,b), the following values are possible.

 

Aggregation Level Bit Vector GROUPING_ID

a, b 0 0 0

a 0 1 1

b 1 0 2

Sum total 1 1 3

The following SQL statements should illustrate this functionality with a simple example. In

these statements, only the CUBE function is used, but GROUPING_ID can also be used in

the same way in combination with the ROLLUP function.

SQL> SELECT deptno,job

2 ,COUNT (*)

3 ,SUM (sal)

4 ,GROUPING(deptno) AS grp_dept

5 ,GROUPING(job) AS grp_job

6 ,GROUPING_ID(deptno, job) AS grp_id

7 FROM emp

8 GROUP BY CUBE (deptno, job);

DEPTNO JOB COUNT(*) SUM(SAL) GRP_DEPT GRP_JOB GRP_ID

------ --------- -------- -------- -------- ------- ------

10 CLERK 1 1300 0 0 0

10 MANAGER 1 2450 0 0 0

10 PRESIDENT 1 5000 0 0 0

10 3 8750 0 1 1

20 ANALYST 2 6000 0 0 0

20 CLERK 2 1900 0 0 0

20 MANAGER 1 2975 0 0 0

20 5 10875 0 1 1

30 CLERK 1 950 0 0 0

30 MANAGER 1 2850 0 0 0

30 SALESMAN 4 5600 0 0 0

30 6 9400 0 1 1

ANALYST 2 6000 1 0 2

CLERK 4 4150 1 0 2

MANAGER 3 8275 1 0 2

PRESIDENT 1 5000 1 0 2

SALESMAN 4 5600 1 0 2

14 29025 1 1 3

18 rows selected.

Example 4: CUBE with GROUPING and GROUPING_ID

 

In the result of Example 4, the value of the bit vector (cf. Example 3) is also displayed in the last column. The bit vector of the last row in the above example, for instance—which shows the sum totals—is "1 1" and thus has the value of 3, returned by the GROUPING ID function. The bit vector of the subtotals for the jobs is "1 0" (see rows 13 to 17 in the above example) and has the value 2. The bit vector for the subtotals of the departments (rows 4, 8

and 12 in the above example) is "0 1," so the GROUPING_ID function returns the value 1.

 

The subtotals for departments and jobs have the bit vector "0 0," and GROUPING_ID returns the value 0. The two columns with the information of the GROUPING function are now no longer necessary for evaluation of the superaggregate rows and can be omitted.

 

SQL> SELECT deptno,job

2 ,COUNT (*)

3 ,SUM (sal)

4 ,GROUPING_ID(deptno, job) AS grp_id

5 FROM emp

6 GROUP BY CUBE (deptno, job);

DEPTNO JOB COUNT(*) SUM(SAL) GRP_ID

---------- --------- ---------- ---------- ----------

10 CLERK 1 1300 0

10 MANAGER 1 2450 0

10 PRESIDENT 1 5000 0

10 3 8750 1

20 ANALYST 2 6000 0

20 CLERK 2 1900 0

20 MANAGER 1 2975 0

20 5 10875 1

30 CLERK 1 950 0

30 MANAGER 1 2850 0

30 SALESMAN 4 5600 0

30 6 9400 1

ANALYST 2 6000 2

CLERK 4 4150 2

MANAGER 3 8275 2

PRESIDENT 1 5000 2

SALESMAN 4 5600 2

14 29025 3

18 rows selected.

Example 5: CUBE with GROUPING_ID

 

One possible use of this function would be the following example.

SQL> SELECT DECODE(GROUPING_ID(deptno, job),

2 1, 'Sub-total Department',

3 2, 'Sub-total Job',

4 3, 'Total',

5 null) AS totals

6 ,deptno

7 ,job

8 ,COUNT (*)

9 ,SUM (sal)

10 FROM emp

11 GROUP BY CUBE (deptno, job);

TOTALS DEPTNO JOB COUNT(*) SUM(SAL)

------------------------ ------ --------- -------- --------

10 CLERK 1 1300

10 MANAGER 1 2450

10 PRESIDENT 1 5000

Sub-total Department 10 3 8750

20 ANALYST 2 6000

20 CLERK 2 1900

20 MANAGER 1 2975

Sub-total Department 20 5 10875

30 CLERK 1 950

30 MANAGER 1 2850

30 SALESMAN 4 5600

Sub-total Department 30 6 9400

Sub-total Job ANALYST 2 6000

Sub-total Job CLERK 4 4150

Sub-total Job MANAGER 3 8275

Sub-total Job PRESIDENT 1 5000

Sub-total Job SALESMAN 4 5600

Total 14 29025

18 rows selected.

Example 6: CUBE with DECODE of GROUPING_ID

 

Conclusion

The GROUPING_ID function allows the user to quickly and easily identify and analyze the super aggregate rows for GROUP BY expressions and thus to display and evaluate the level of aggregation in one column, saving space and making it easier to read.

 

Markus Jägle e-mail: markus.jaegle@trivadis.com

Trivadis Projektentwicklung GmbH Tel: +49 761 45571 54

Sasbacher Str. 2 Fax: +49 761 45571 30

D-79111 Freiburg Internet: www.trivadis.com

No comments :

Post a Comment