What Is the OVER() Clause in SQL? (2024)

Window functions are one of SQL’s most powerful resources, but they are not frequently used by the average SQL developer. In this article, we will explain how you can define different kinds of window frames using the OVER clause.

The OVER clause is essential to SQL window functions. Like aggregation functions, window functions perform calculations based on a set of records – e.g. finding the average salary across a group of employees.

The best way to learn window functions is our interactive Window Functions course. There are 218 exercises that will teach you how window functions work, what functions there are, and how to apply them to real-world problems. You only need a web browser and some basic SQL knowledge.

In some cases, aggregate functions cannot be used because they collapse all the individual records into a group; this makes it impossible to refer to specific values (like one employee’s salary out of the group). In these situations, window functions are preferred because they do not collapse rows; you can refer to a column value at the row level as well as the aggregate value.

There are other scenarios where window functions are useful. For example, we may need to do arithmetic involving an individual column and a calculation based on a set of rows. A real-world instance of this is calculating the difference between the average department salary and the salary of each employee in the department.

When using window functions, the definition of the set of records where the function will be calculated is critical. This set of records is called the window frame; we define it using the SQL OVER clause.
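To make the salary comparison described above concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module backed by SQLite 3.25 or later (the first version with window functions). The employees table and its rows are hypothetical and are not part of this article's schema.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "IT", 5000.0), ("Bob", "IT", 3000.0), ("Cy", "HR", 4000.0)],
)

# Each row keeps its own salary and also sees its department's average.
salary_gaps = conn.execute(
    """
    SELECT name,
           department,
           salary,
           AVG(salary) OVER (PARTITION BY department) AS dept_avg,
           salary - AVG(salary) OVER (PARTITION BY department) AS gap
    FROM employees
    ORDER BY department, name
    """
).fetchall()

for row in salary_gaps:
    print(row)
```

Every employee row survives in the result, which is what a plain GROUP BY on department could not give us.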

Throughout this article, we will demonstrate SQL queries using the database of a small luxury watch company. The company stores its sales information in a table called sales:

sale_day   | sale_month | sale_time | branch  | article   | quantity | revenue
2021-08-11 | AUG        | 11:00     | NewYork | RolexP1   | 1        | 3000.00
2021-08-14 | AUG        | 11:20     | NewYork | RolexP1   | 2        | 6000.00
2021-08-17 | AUG        | 10:00     | Paris   | Omega100  | 3        | 4000.00
2021-08-19 | AUG        | 10:00     | London  | Omega100  | 1        | 1300.00
2021-07-17 | JUL        | 09:30     | Paris   | CartierA1 | 1        | 2000.00
2021-07-11 | JUL        | 10:10     | NewYork | CartierA1 | 1        | 2000.00
2021-07-10 | JUL        | 11:40     | London  | Omega100  | 2        | 2600.00
2021-07-15 | JUL        | 10:30     | London  | Omega100  | 3        | 4000.00

The window frame is a set of rows that depends on the current row; thus, the set of rows could change for each row processed by the query. We define window frames using the OVER clause. The syntax is:

 OVER ([PARTITION BY columns] [ORDER BY columns])

The PARTITION BY subclause defines the criteria that the records must satisfy to be part of the window frame. In other words, PARTITION BY defines the groups into which the rows are divided; this will be clearer in our next example query. The ORDER BY subclause defines the order of the records in the window frame.
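As an illustration of how ORDER BY inside OVER changes the result, the sketch below computes a running total of units per article. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later and loads only a subset of the sales columns. When ORDER BY is present and no explicit frame is given, the default frame runs from the start of the partition through the current row, which is why SUM behaves as a running total here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subset of the article's sales table: only the columns this example needs.
conn.execute(
    "CREATE TABLE sales (sale_day TEXT, branch TEXT, article TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2021-08-11", "NewYork", "RolexP1", 1),
        ("2021-08-14", "NewYork", "RolexP1", 2),
        ("2021-08-17", "Paris", "Omega100", 3),
        ("2021-08-19", "London", "Omega100", 1),
        ("2021-07-17", "Paris", "CartierA1", 1),
        ("2021-07-11", "NewYork", "CartierA1", 1),
        ("2021-07-10", "London", "Omega100", 2),
        ("2021-07-15", "London", "Omega100", 3),
    ],
)

# PARTITION BY restarts the sum for each article;
# ORDER BY makes it accumulate in date order within the article.
running = conn.execute(
    """
    SELECT article,
           sale_day,
           quantity,
           SUM(quantity) OVER (PARTITION BY article ORDER BY sale_day) AS running_units
    FROM sales
    ORDER BY article, sale_day
    """
).fetchall()

for row in running:
    print(row)
```

Removing the ORDER BY from the OVER clause would make every row of an article show the same article-wide total instead.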

Let’s see the SQL OVER clause in action. Here’s a simple query that returns the total quantity of units sold for each article.

SELECT
  sale_day,
  sale_time,
  branch,
  article,
  quantity,
  revenue,
  SUM(quantity) OVER (PARTITION BY article) AS total_units_sold
FROM sales

This query will show all the records of the sales table with a new column displaying the total number of units sold for the relevant article. We could also obtain the number of units sold per article using the SUM aggregate function with GROUP BY, but then we couldn’t show the individual records.

In this query, the OVER (PARTITION BY article) clause indicates that the window frame is determined by the values in the article column; all records with the same article value will be in one group. Below, we have the result of this query:

sale_day   | sale_time | branch  | article   | quantity | revenue | total_units_sold
2021-07-11 | 10:10     | NewYork | CartierA1 | 1        | 2000.00 | 2
2021-07-17 | 09:30     | Paris   | CartierA1 | 1        | 2000.00 | 2
2021-08-19 | 10:00     | London  | Omega100  | 1        | 1300.00 | 9
2021-07-15 | 10:30     | London  | Omega100  | 3        | 4000.00 | 9
2021-08-17 | 10:00     | Paris   | Omega100  | 3        | 4000.00 | 9
2021-07-10 | 11:40     | London  | Omega100  | 2        | 2600.00 | 9
2021-08-11 | 11:00     | NewYork | RolexP1   | 1        | 3000.00 | 3
2021-08-14 | 11:20     | NewYork | RolexP1   | 2        | 6000.00 | 3

The total_units_sold column of the report was obtained by the expression:

SUM(quantity) OVER (PARTITION BY article) total_units_sold
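To see the contrast between a grouped aggregate and the window version side by side, the following sketch runs both against the same data. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later and loads only the columns needed from the sales table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subset of the article's sales table: only the columns this example needs.
conn.execute("CREATE TABLE sales (sale_day TEXT, article TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2021-08-11", "RolexP1", 1),
        ("2021-08-14", "RolexP1", 2),
        ("2021-08-17", "Omega100", 3),
        ("2021-08-19", "Omega100", 1),
        ("2021-07-17", "CartierA1", 1),
        ("2021-07-11", "CartierA1", 1),
        ("2021-07-10", "Omega100", 2),
        ("2021-07-15", "Omega100", 3),
    ],
)

# GROUP BY collapses the rows: one output row per article.
grouped = conn.execute(
    "SELECT article, SUM(quantity) FROM sales GROUP BY article ORDER BY article"
).fetchall()

# The window version keeps all eight sales and attaches the total to each one.
windowed = conn.execute(
    """
    SELECT sale_day,
           article,
           quantity,
           SUM(quantity) OVER (PARTITION BY article) AS total_units_sold
    FROM sales
    ORDER BY article, sale_day
    """
).fetchall()

print(len(grouped), "grouped rows;", len(windowed), "windowed rows")
```

Both queries compute the same totals; they differ only in whether the individual sales remain visible.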

For those readers who want to go deeper into the subject, I suggest the following two articles: "What Is The Difference Between GROUP BY and PARTITION BY" and "Window Functions in SQL Server: Part One: The OVER() Clause".

The SQL OVER Clause In Action

For each article, suppose we want to compare the total quantity of this article sold in each month of 2021 with the total quantity of this article sold in the whole year. To do this, we will create a simple report with the columns article, month, units_sold_month and units_sold_year. The query is:

SELECT DISTINCT
  article,
  EXTRACT('month' FROM sale_day) AS month,
  SUM(quantity) OVER (PARTITION BY article, sale_month) AS units_sold_month,
  SUM(quantity) OVER (PARTITION BY article) AS units_sold_year
FROM sales
WHERE EXTRACT('YEAR' FROM sale_day) = 2021
ORDER BY article, month

The results of the query are:

article   | month | units_sold_month | units_sold_year
CartierA1 | 7     | 2                | 2
Omega100  | 7     | 5                | 9
Omega100  | 8     | 4                | 9
RolexP1   | 8     | 3                | 3

Here, we calculated the total units sold using two different grouping granularities: month and year. The first OVER clause ...

OVER (PARTITION BY article, sale_month) 

… allows us to obtain the number of units of each article sold in a month. The second OVER clause ...

OVER (PARTITION BY article)

… allows us to compute the total number of units of a given article sold in the whole year.

In the next query, we will just add the column month_percentage to show the percentage of the yearly total that each month represents. We can calculate it using the following query:

SELECT DISTINCT
  article,
  EXTRACT('month' FROM sale_day) AS month,
  SUM(quantity) OVER (PARTITION BY article, sale_month) AS units_sold_month,
  SUM(quantity) OVER (PARTITION BY article) AS units_sold_year,
  (
    (
      SUM(quantity) OVER (PARTITION BY article, sale_month)::decimal
      / SUM(quantity) OVER (PARTITION BY article)::decimal
    ) * 100
  ) AS month_percentage
FROM sales
WHERE EXTRACT('YEAR' FROM sale_day) = 2021
ORDER BY article, month

Note that in the previous query, we didn’t introduce any new OVER clauses; we reused the same two window functions and combined them to calculate a percentage. You can see the results below:

article   | month | units_sold_month | units_sold_year | month_percentage
CartierA1 | 7     | 2                | 2               | 100.00
Omega100  | 7     | 5                | 9               | 55.55
Omega100  | 8     | 4                | 9               | 44.44
RolexP1   | 8     | 3                | 3               | 100.00
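When the same OVER definition is repeated several times, as in the percentage query, one option is to name each window once with a WINDOW clause and refer to it by name. The sketch below is an adaptation, not the article's query: it assumes Python's built-in sqlite3 module with SQLite 3.25 or later, so it uses strftime in place of EXTRACT and ROUND in place of the ::decimal casts. The window names by_month and by_article are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subset of the article's sales table: only the columns this example needs.
conn.execute(
    "CREATE TABLE sales (sale_day TEXT, sale_month TEXT, article TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2021-08-11", "AUG", "RolexP1", 1),
        ("2021-08-14", "AUG", "RolexP1", 2),
        ("2021-08-17", "AUG", "Omega100", 3),
        ("2021-08-19", "AUG", "Omega100", 1),
        ("2021-07-17", "JUL", "CartierA1", 1),
        ("2021-07-11", "JUL", "CartierA1", 1),
        ("2021-07-10", "JUL", "Omega100", 2),
        ("2021-07-15", "JUL", "Omega100", 3),
    ],
)

# Each window is defined once in the WINDOW clause and reused by name.
percentages = conn.execute(
    """
    SELECT DISTINCT
           article,
           CAST(strftime('%m', sale_day) AS INTEGER) AS month,
           SUM(quantity) OVER by_month AS units_sold_month,
           SUM(quantity) OVER by_article AS units_sold_year,
           ROUND(100.0 * SUM(quantity) OVER by_month
                 / SUM(quantity) OVER by_article, 2) AS month_percentage
    FROM sales
    WHERE strftime('%Y', sale_day) = '2021'
    WINDOW by_month AS (PARTITION BY article, sale_month),
           by_article AS (PARTITION BY article)
    ORDER BY article, month
    """
).fetchall()

for row in percentages:
    print(row)
```

Because this version rounds rather than truncates, the July share of Omega100 comes out as 55.56 instead of the 55.55 shown in the table above.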

We will now create a different report that looks at how various branches are performing. We want to see the columns branch and month. We also need calculations to obtain the:

  • Total revenue for that month.
  • Revenue grouped by branch and month.
  • Monthly average branch revenue.
  • Difference between each branch’s revenue and the monthly average revenue.

SELECT DISTINCT
  branch,
  EXTRACT('month' FROM sale_day) AS month,
  SUM(revenue) OVER (PARTITION BY sale_month) AS total_revenue_month,
  SUM(revenue) OVER (PARTITION BY branch, sale_month) AS branch_revenue_month,
  -- Next column is the branch average revenue in the current month
  (
    SUM(revenue) OVER (PARTITION BY sale_month)::decimal
    / (SELECT COUNT(DISTINCT branch) FROM sales)::decimal
  ) AS average_month_branch,
  -- Next column is the difference between branch revenue and average branch revenue
  SUM(revenue) OVER (PARTITION BY branch, sale_month)
  - (
    SUM(revenue) OVER (PARTITION BY sale_month)::decimal
    / (SELECT COUNT(DISTINCT branch) FROM sales)::decimal
  ) AS gap_branch_average
FROM sales
WHERE EXTRACT('YEAR' FROM sale_day) = 2021
ORDER BY branch, month

Once again we used only two OVER clauses, but we used different arithmetic expressions to obtain certain values. We used ...

SUM(revenue) OVER (PARTITION BY sale_month) 

… to calculate the total revenue of the month, but we also used it in an arithmetic expression to obtain the monthly average branch revenue.

We used ...

SUM(revenue) OVER (PARTITION BY branch, sale_month) 

… to calculate the monthly branch revenue and the difference between that branch’s monthly revenue and the average.

The next table is the query result. Notice that the gap_branch_average column can contain positive or negative numbers. A negative number indicates this branch’s monthly revenue was less than the average revenue.

branch  | month | total_revenue_month | branch_revenue_month | average_month_branch | gap_branch_average
London  | 7     | 10600               | 6600                 | 3533.33              | 3066.66
London  | 8     | 14300               | 1300                 | 4766.66              | -3466.66
NewYork | 7     | 10600               | 2000                 | 3533.33              | -1533.33
NewYork | 8     | 14300               | 9000                 | 4766.66              | 4233.33
Paris   | 7     | 10600               | 2000                 | 3533.33              | -1533.33
Paris   | 8     | 14300               | 4000                 | 4766.66              | -766.66

For additional information on window functions in SQL, I suggest "SQL Window Function Example With Explanations", an entry-level article on window functions. For more advanced readers, "How to Rank Rows Within a Partition in SQL" shows how to create rankings in your reports using the RANK() window function.

The OVER Clause and Analytical Window Functions

In the previous queries, we used window functions to compare figures at different levels of granularity: units sold per month against the yearly total, and each branch’s monthly revenue against the monthly average. In this section, we will use ordered window frames, which allow us to pick a record in the frame based on its position. For example, we can choose the first record in the window frame, the record before the current record, or the record after the current record. These analytical window functions provide great expressive power to SQL.

In the following query, we will show the revenue increase/decrease for the same branch in two contiguous months. To do that, we need to calculate the difference between the current month’s revenue and the previous month's revenue. This requires the analytic window function LAG(), which can obtain a column value from a row previous to the current row.

WITH branch_month_sales AS (
  SELECT DISTINCT
    branch,
    EXTRACT('MONTH' FROM sale_day) AS month,
    SUM(revenue) OVER (PARTITION BY branch, sale_month) AS revenue
  FROM sales
)
SELECT
  branch,
  month,
  revenue AS revenue_current_month,
  LAG(revenue) OVER (PARTITION BY branch ORDER BY month) AS revenue_prev_month,
  revenue - LAG(revenue) OVER (PARTITION BY branch ORDER BY month) AS revenue_delta
FROM branch_month_sales
ORDER BY branch, month

In this query, we used a common table expression (CTE) called branch_month_sales to store the total revenue of each branch and month. Then we wrote a second query that uses the LAG() window function to obtain the revenue of the previous month (using the info from branch_month_sales). Note that the window frame is ordered by month.

Here are the results:

branch  | month | revenue_current_month | revenue_prev_month | revenue_delta
London  | 7     | 6600                  | null               | null
London  | 8     | 1300                  | 6600               | -5300
NewYork | 7     | 2000                  | null               | null
NewYork | 8     | 9000                  | 2000               | 7000
Paris   | 7     | 2000                  | null               | null
Paris   | 8     | 4000                  | 2000               | 2000
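The null values in the first month of each branch come from LAG() having no previous row to read. As a hedged sketch of the optional offset and default arguments, the example below supplies 0.0 as the default and also shows LEAD(), which looks forward instead of backward. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later, and it builds the monthly totals with GROUP BY rather than SELECT DISTINCT, which is a substitution made here for brevity. The 0.0 default only illustrates the parameter; whether zero is a sensible stand-in for a missing month depends on the report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subset of the article's sales table: only the columns this example needs.
conn.execute("CREATE TABLE sales (sale_day TEXT, branch TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2021-08-11", "NewYork", 3000.0),
        ("2021-08-14", "NewYork", 6000.0),
        ("2021-08-17", "Paris", 4000.0),
        ("2021-08-19", "London", 1300.0),
        ("2021-07-17", "Paris", 2000.0),
        ("2021-07-11", "NewYork", 2000.0),
        ("2021-07-10", "London", 2600.0),
        ("2021-07-15", "London", 4000.0),
    ],
)

neighbours = conn.execute(
    """
    WITH branch_month_sales AS (
        SELECT branch,
               CAST(strftime('%m', sale_day) AS INTEGER) AS month,
               SUM(revenue) AS revenue
        FROM sales
        GROUP BY branch, month
    )
    SELECT branch,
           month,
           revenue,
           -- offset 1, default 0.0 when there is no previous month
           LAG(revenue, 1, 0.0) OVER (PARTITION BY branch ORDER BY month) AS revenue_prev_month,
           -- no default given, so the last month of each branch yields NULL
           LEAD(revenue) OVER (PARTITION BY branch ORDER BY month) AS revenue_next_month
    FROM branch_month_sales
    ORDER BY branch, month
    """
).fetchall()

for row in neighbours:
    print(row)
```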

In all the queries shown in this article, we’ve used just a few window functions. There are quite a few other window functions in SQL. Here’s a list of the most common ones:

function | syntax | return value
AVG() | AVG(expression) | The average within the OVER partition.
COUNT() | COUNT(expression) | The number of rows within the OVER partition.
MAX() | MAX(expression) | The maximum value of a column or expression for each partition.
MIN() | MIN(expression) | The minimum value of a column or expression for each partition.
SUM() | SUM(expression) | The total of all values in a column within a partition.
ROW_NUMBER() | ROW_NUMBER() | Assigns a unique number to each row within a partition. Rows with identical values are given different row numbers.
RANK() | RANK() | Ranks rows by column values within a partition. Gaps and tied rankings are permitted.
DENSE_RANK() | DENSE_RANK() | Ranks rows by column values within a partition. There are no gaps in the ranking, but tied rankings are permitted.
PERCENT_RANK() | PERCENT_RANK() | Assigns a percentile ranking number to each row in a partition. To calculate a value in the [0, 1] interval, we use (rank - 1) / (total number of rows - 1).
CUME_DIST() | CUME_DIST() | Shows the cumulative distribution of a value within a group of values, i.e. the number of rows with values less than or equal to the current row’s value divided by the total number of rows.
LEAD() | LEAD(expr, offset, default) | The value from the row that is offset rows after the current row. The offset and default arguments are optional; by default, it returns the value from the next row.
LAG() | LAG(expr, offset, default) | The value from the row that is offset rows before the current row. The offset and default arguments are optional; by default, it returns the value from the previous row.
NTILE() | NTILE(n) | Divides rows within a partition into n groups and assigns each row a group number.
FIRST_VALUE() | FIRST_VALUE(expr) | The value for the first row within the window frame.
LAST_VALUE() | LAST_VALUE(expr) | The value for the last row within the window frame.
NTH_VALUE() | NTH_VALUE(expr, n) | The value for the n-th row within the window frame.
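The difference between ROW_NUMBER(), RANK() and DENSE_RANK() is easiest to see on data with ties. The sketch below ranks the individual sales by revenue, where 4000.00 and 2000.00 each appear twice. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later and loads only two columns of the sales table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subset of the article's sales table: only the columns this example needs.
conn.execute("CREATE TABLE sales (article TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("RolexP1", 3000.0),
        ("RolexP1", 6000.0),
        ("Omega100", 4000.0),
        ("Omega100", 1300.0),
        ("CartierA1", 2000.0),
        ("CartierA1", 2000.0),
        ("Omega100", 2600.0),
        ("Omega100", 4000.0),
    ],
)

# No PARTITION BY: the whole table is a single partition ordered by revenue.
ranking = conn.execute(
    """
    SELECT revenue,
           ROW_NUMBER() OVER (ORDER BY revenue DESC) AS row_num,
           RANK()       OVER (ORDER BY revenue DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY revenue DESC) AS dense_rnk
    FROM sales
    ORDER BY row_num
    """
).fetchall()

for row in ranking:
    print(row)
```

Tied revenues share the same RANK() and DENSE_RANK() values; RANK() then skips ahead (2, 2, 4) while DENSE_RANK() does not (2, 2, 3), and ROW_NUMBER() numbers the tied rows in an unspecified order.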

Learn More About the SQL OVER Clause

In this article, we covered the OVER clause and the PARTITION BY and ORDER BY subclauses.

If you want to continue learning about window functions, there is a pair of articles I would like to share. The first one is "8 Best SQL Window Function Articles", which will point you to other great articles. The second is a window function cheat sheet that includes syntax, examples, and images; it’s my favorite article about window functions.

For those who want to go deeper, I suggest our interactive Window Functions SQL course. If you want to increase your SQL skills in general, try the track SQL From A to Z. It’s a comprehensive look at everything you need to know to work effectively with SQL.

Author: Kimberely Baumbach CPA
