SQL Aggregate Functions

The sql aggregate functions are a very important part of the language as they enable us to group a set of rows together to return a single result.  There are over 40 sql aggregate functions defined

for the Oracle database but there are only five that are very frequently used:

  • SUM
  • COUNT
  • MIN
  • MAX
  • and AVG

A complete list of the different  aggregate functions is
available in the Oracle SQL Language Reference manual.

There are 3 places where aggregate

functions can appear in a query:-

  • in the select list (the items before the FROM clause)
  • in the ORDER BY clause
  • and in the HAVING clause

You might believe that these functions always have to appear in the select list but that is not the case, it is perfectly valid to just use the function in the ORDER BY or HAVING clauses.

For example you might want a list of departments ordered by the average salary. To do this you would select the department id (or join to the departments table to get the department name) and ORDER BY average salary like so:

SELECT department_id FROM employees
GROUP BY 
department_id ORDER BY avg(salary);

SQL aggregate functions are commonly used with the GROUP BY clause in queries (or sub queries) to return constants or other columns that are not being aggregated. The GROUP BY clause is mandatory when selecting expressions that exclude an sql aggregate function when it is used elsewhere in the query as shown in the above example. It is used to tell Oracle how to group the results. 

It might seem obvious that if you wish to see the average salary of employees you would group by the department id, but you could just as easily group by the manager id and/or any other column that is not being aggregated and it is not necessary to include in the select list any of the columns that you are grouping by.

We could change our previous example to this:

SELECT "X" FROM employees 
GROUP BY 
department_id ORDER BY avg(salary);

which would return as many "X"s as there were different department ids in the table.

Or we could change the query to do this:

SELECT department_id FROM employees
GROUP BY 
department_id,manager_id ORDER BY avg(salary);

which would return one row for each combination of department ids and manager ids in the table. These particular queries may not make sense from a business point of view but they are syntactically valid and you occasionally get quirky business requirements which make these seemingly strange queries necessary. 


Oracle Tips & Tricks to SKYROCKET Your Career!

If you're not already a subscriber to our ezine you're missing out on a myriad of tips and tricks in SQL and PL/SQL to help you become a better, faster, smarter Oracle developer. Subscribe now and ignite your career.


General Rules for Sql Aggregate Functions

There are a few general rules and conditions for sql aggregate functions that we need to be aware of.

Firstly, for some of them (particularly the most used ones (SUMCOUNTAVGMAXand MIN), the aggregate can be calculated using either ALL values (the default) or just on DISTINCT (UNIQUE) values.

For example if we wanted to calculate the average salary of employees, if we calculated this on DISTINCT values then any duplicates would be ignored, so if 2 employees had the same salary only one would be used to compute the average. 

This query which looks at all salaries in the employees table:

SELECT ROUND(AVG(salary)) FROM employees
;

produces a result of 6350 when run on our test database. Whereas this next query which ignores duplicates produces a result of 6861 in our test database.

SELECT ROUND(AVG(DISTINCT salary)) FROM employees;

The DISTINCT keyword is most commonly used with the SQL COUNT function to count unique occurrences of a particular item.

What about NULL values?

In Oracle the NULL value means the value of something is undefined so it can't take place in calculations. NULL does not mean zero and it can never be equal to any value, not even NULL (because it is undefined - it doesn't have a value). Therefore 10+NULL has a value of NULL (i.e. undefined) because obviously if you add 10 to an undefined number (one whose value you don't know) the result is going to be undefined.  This means that if you attempt to calculate the aggregate of a null value, the answer will be null (except for COUNT(NULL) when the answer will be 0) but if there is at least one non-NULL value included in the aggregate then the NULL values will be ignored.

For example the query:

SELECT SUM(salary) FROM employees;

will produce a result of NULL if all employees have a NULL salary.

The way to overcome this is to either use the NVL function to convert NULLs to a specific value (usually zero for numbers) or to restrict the rows fetched to just those with a not NULL value.

This affects all the SQL aggregate functions except for COUNT, GROUPING and GROUPING_ID.

Nesting SQL Aggregate Functions

Aggregate functions can be nested in the same way as other functions such as to_number or to_char, so for example we can find out which department has the highest average salary with a query such as this:

SELECT MAX(AVG(salary)) FROM employees GROUP BY department_id;

The average salary (assuming salary can't be NULL) for each department is calculated first and then the largest of these is returned as the final result.

Eliminating Groups With the Having Clause

Just as the WHERE clause is used to eliminate individual rows from the calculation, the HAVING clause is used to eliminate aggregates from the results.

The following query for example calculates the average of the maximum salaries for those departments that have a maximum salary less than 12000.

SELECT AVG(MAX(salary)) FROM employees 
GROUP BY department_id HAVING MAX(salary)<12000;

In other words, the maximum salary for all departments is calculated and any with a maximum salary >= 12000 are then eliminated from the calculation of the average. Obviously you're not restricted to using the HAVING clause just when you're nesting functions. One important point to note is that the HAVING clause works at the group level - after the aggregate has been calculated - so in performance terms it is better to eliminate data from the calculation by use of the WHERE clause as this restricts the number of rows that have to be processed.


As well as reading our other Oracle tutorials, one of the best ways to learn more about Oracle is to attend a formal training course with one of our partners. See the Oracle Training page for more details.

Return from sql aggregate functions to Oracle tutorials