Analytical Functions in Oracle and it’s benefits
Hello Everyone,
Today we will discuss about Analytical Functions in Oracle.
Analytic functions in Oracle calculate an aggregate value based on a group of
rows and return multiple rows for each group. In other word we can say Analytical
functions are used to do analyze data over multiple rows and return the result
in the current row. Also with analytic
functions, however, you can group the data that is used to calculate the
function’s result, but show a value one each record in the result set, without
grouping as well.
Advantage of Analytical Function is as below:
Improve the Performance of Query
Analtical Functions are easier to write than sub queries and joins
to get the same result.
Easy to maintain.
Before Starting of analytical function we need to understand below two points:
OVER:
The scope of an analytical function is defined in an
OVER clause. OVER is mandatory for all analytical functions. If you do not give
any arguments after OVER, it will assume the scope as ALL.
PARTITION BY:
We know aggregate functions use GROUP BY to group rows
whereas analytical functions use PARTITION BY clause .
Oracle had around 26 analytical functions , but commonly used Analytical Functions are as below:
ROW_NUMBER
RANK
DENSE_RANK
LEAD AND LAG
FIRST_VALUE AND LAST_VALUE
WINDOW CLAUSE
Let’s start with ROW_NUMBER analytical Function in
Oracle :
ROW_NUMBER()
is an analytic function in Oracle that assigns a sequential unique integer to
each row to which it is applied, either each row in the partition or each row
in the result set. ROW_NUMBER gives a running serial number to a set of
records. ORDER BY clause plays a crucial role in Analytical Function.
Syntax for ROW_number :
ROW_NUMBER() OVER (
[query_partition_clause]
order_by_clause
)
Here order_by_clause is required. It specifies the
order of rows in each partition or in the whole result set.
Note : Here Query_partition_clause
is optional. It distributes the rows into partitions to which the function is
applied. If you omit the query_partition_clause, the function will treat the
whole result set as a single partition.
Below is the simple example where i can see the row_number for each department by using partition clause.
SELECT
ename,
hiredate,
deptno,
row_number() over (partition by deptno order by
hiredate) r1
FROM
Emp;
Output of above query is as below:
Another Simple example of ROW_NUMBER Analytical
Function by using sub query is as below:
select
ename, deptno,hiredate
from
(
SELECT
ename,
hiredate,
deptno,
row_number() over (partition by deptno order by
hiredate) r1
FROM emp
) where r1=5;
Here we are getting the records for row number 5 for
each department. Below is the output of above query.
RANK : RANK analytical function also
gives a serial number to a set of rows ordered in a specific way like the
ROW_NUMBER. The RANK() function is an analytic function that calculates the
rank of a value in a set of values. The RANK() function returns the same rank
for the rows with the same values. It adds the number of gap to the next rank.
The difference is: ROW_NUMBER gives a unique number to
each row even if the rows are duplicate. But RANK gives the same number to the
duplicate rows.
Syntax for RANK Function is similar to ROW_NUMBER
RANK()
OVER
([ query_partition_clause ] order_by_clause)
Example of Rank : In below query we will see the
output as both row_number and rank for better understanding.
SELECT
ename,
hiredate,
deptno,
row_number() over (order by hiredate) r1,
rank() over( order by hiredate) r2
FROM emp
Below is the output of above query:
Here we can easily understand that row_number for Jones
is 10 but rank is 9 only. As we put condition against the order by hiredate .
Another example of RANK by using subquery where we can see the multiple records as we
put condition for RANK.
select * from (
SELECT
ename,
hiredate,
deptno,
row_number() over (order by hiredate) r1,
rank() over( order by hiredate) r2
FROM emp)
where r2=9;
Below is the output of above query :
DENSE_RANK : DENSE_RANK works in a similar way as the
RANK. Main difference is that it does not skip numbers while assigning to
duplicate rows. In other word it’s not providing gap and provide the real rank
value.
Note : DENSE_RANK() is an analytic function that
calculates the rank of a row in an ordered set of rows. The returned rank is an
integer starting from 1.
Syntax is Similar to RANK only .
DENSE_RANK( ) OVER([ query_partition_clause ]
order_by_clause)
Example of DENSE_RANK
function is as below to get the exact RANK :
SELECT
ename,
hiredate,
deptno,
rank() over(order by hiredate) r1,
dense_rank() over( order by hiredate) r2
FROM emp
Below is the output of above
query :
Here you can easily understand
the dense rank of Scott coming as 11 and RANK is 12.
Below is the another
DENSE_RANK function by using Subquery where we will able to see both RANK and
DENSE_RANK for Scott only .
select * from(
SELECT
ename,
hiredate,
deptno,
rank() over(order by hiredate)
r1,
dense_rank() over( order by
hiredate) r2
FROM emp)
where r2=11;
Below is the output of above
query
LEAD AND LAG :
LEAD() is an analytic function
that allows you to access the following row from the current row . In another
word LEAD function allows us to apply
any computation on the next rows(rows after the current row).
Syntax of LEAD is as below :
LEAD(<expression>,<offset>,<default>)
over(<analytic clause>)
Here <expression> is the
expression to apply on the leading row
<offset> is the number
of the leading row relative to the current rows.
<default> is the value
to be returned if the <offset> points to a row outside the partition
range
Let’s write a query with using
LEAD and LAG where we find out the
previous lower salary and the next higher salary for department 30, the query is as below:
SELECT
ename,
sal,
lead(sal,1,0) over(partition
by deptno order by sal desc) next_lowsal,
lag(sal,1,0) over(partition by
deptno order by sal desc) prev_highsal
FROM emp
where deptno=30;
Note : LAG use the same
syntax. The <offset> in LAG points to the previous rows related to the
current row.
Output of above query is as
below :
FIRST_VALUE AND LAST_VALUE Analytical Function : The
FIRST_VALUE() is an analytic function that allows you to get the first value in
an ordered set of value. Whereas The LAST_VALUE() allows us to obtain the last value in an
ordered set of values.
Let's take a simple example where we have to find the
number of days between the first hire and current hiredate of each employee of
department 30, the query is as below:
SELECT
ename,
sal,
hiredate-first_value(hiredate) over(order by hiredate)
first_value_days
FROM emp
where deptno=30
Output of above query is as below:
LAST_VALUE Query we can write in similar way .
SELECT
ename,
sal,
hiredate-last_value(hiredate) over(order by hiredate)
last_value_days
FROM emp
where deptno=30
Output of Above query is as below:
WINDOW CLAUSE :
Partition clause is not the only method of limiting
the scope of an analytical function. We can use ROWS BETWEEN or RANGE BETWEEN
clause to define an additional scope in Analytical Function.
Syntax:
[ROW or RANGE] BETWEEN <start_expr> AND
<end_expr>
Note : <start_expr> can be any one of the
following UNBOUNDED PRECEDING
CURRENT ROW
<sql_expr> PRECEDING or FOLLOWING.
<end_expr> can be any one of the following UNBOUNDED
FOLLOWING or
CURRENT ROW or <sql_expr>
PRECEDING or FOLLOWING.
Below is a query which gives the highest salary in the
previous rows relative to the current row.
SELECT
ename,
hiredate,
sal,
max(sal) over(order by hiredate,ename
ROWS BETWEEN
UNBOUNDED PRECEDING AND 1 PRECEDING) maximumbefore_salary
FROM emp;
Output of above query is as below:
Here, for each row, the highest salary before the
current row is returned. ORDER BY clause
here is not used for ranking but for specifying a window. UNBOUNDED PRECEDING
is the lowest possible bound i.e. the first row.
Note :
Difference between Aggregate and Analytical function is Aggregate
functions group the rows and return the output for the entire result set.
whereas analytical functions return the aggregated output but they do not group
the result set. They return the value for each row in the result set.
Please go through this blog carefully as Analytical
Functions in Oracle are very powerful tools to aggregate and analyze the data
across multiple dimensions. The execution speed is also much better than the
normal aggregate functions , Also improve the query performance as compared to
aggregate functions. Please go through each topic and for any doubts put it in
comment.
Thanks.
It is very useful for me. Thanks...
ReplyDeleteDevOps Online Training
Best DevOps Online Training in Hyderabad
Nice Blog and its very useful for me. Thanks...
ReplyDeleteSnowflake Online Training