Hello everyone! I will talk about SQL in this article, more precisely on how to have two completely different order rules for your returning records according to the index. In my case, I was using PostgreSQL, but this will work in any database. You only need to do some small changes according to your database.
So, let’s get started!
What is the problem?
Recently, I got a peculiar task that asked to add two sort rules. The top 3 rows should be order by one specific rule, let’s name it top_3_rows_order_rule
. The rest of the rows, from the fourth onwards, should be ordered by another specific rule, let’s name it rest_of_the_rows_order_rule
.
I never wrote any SQL query like that before. Usually, we just have one sort rule that may have multiple conditions that will be applied to all rows, but in this scenario, it’s a little bit different.
Something like this:
SELECT *
FROM <table_name>
ORDER BY
<table_name>.column_1,
<table_name>.column_2,
...
Sometimes we need to add a CASE
or write a more complex ORDER BY
statement, but for this task was different.
Just to give an example, imagine we have a employments
table with name
(text
), department
(text
), and salary
(integer
) columns.
And we want the top 3 rows to return the top 3 salary
where department
is development
and the rest of the rows to return the name
ordered alphabetically.
In this case, we need two completely different logics to order. A SQL query should be like this:
SELECT *
FROM <table_name>
ORDER BY
CASE WHEN <is_top_3_row>
THEN <top_3_rows_order_rule>
ELSE <rest_of_the_rows_order_rule>
END
At first, you may think this is just writing the two logic like the above sample, but we have a problem here: how to identify the 3 first rows?
What I mean is that we don’t know the row number, so we don’t have a way to figure out this: is_top_3_row
.
My first attempt was to use the UNION
operator so, let’s try it out.
Using union
You may think that using UNION
or UNION ALL
may be the best solution. Because you can write two queries:
- Create a query for the top 3 rows.
- Create another query for the rest of the rows.
- Combine both using
UNION ALL
.
Example:
-- Query for the Top 3 rows
SELECT *
FROM <table_name>
ORDER BY <top_3_rows_order_rule>
LIMIT 3
UNION ALL
-- Query for the rest of the rows
SELECT *
FROM <table_name>
ORDER BY <rest_of_the_rows_order_rule>
But unfortunately, this will not solve the issue.
I can mention at least two problems here.
Duplicated rows problem
The second query is not ignoring the rows from the first SELECT
, and this will lead to duplicated rows.
In other words, the second query (rest_of_the_rows_order_rule
) is not ignoring the rows from the first query (top_3_rows_order_rule
).
To fix that, you would have to remove the result from the first query in the second query, like:
-- Query for the Top 3 rows
SELECT *
FROM <table_name>
ORDER BY <top_3_rows_order_rule>
LIMIT 3
UNION ALL
-- Query for the rest of the rows
SELECT *
FROM <table_name>
WHERE id NOT IN (
-- The first query
SELECT id
FROM <table_name>
ORDER BY <top_3_rows_order_rule>
LIMIT 3
)
ORDER BY <rest_of_the_rows_order_rule>
The problems are:
- Every time you change the first query, you need to change it in two places.
- The database will probably run the first query twice.
- We are using the
NOT IN
operator, which will not have the best performance. Check this Stack Overflow answer for more info: https://stackoverflow.com/questions/17037508/sql-when-it-comes-to-not-in-and-not-equal-to-which-is-more-efficient-and-why/17038097#17038097
Ordering rows problem
Another problem is that the UNION
or UNION ALL
does not ensure that the order will be the same.
In other words, if the first SELECT
returns:
row 1
row 2
row 3
And the second SELECT
returns:
row 4
row 5
There is no guarantee that the result will be:
row 1
row 2
row 3
row 4
row 5
To fix that, you need to add an extra "order" column inside each query. Then, wrap the two queries inside another one and use this "order" column to sort again!
You can check this Stack Overflow answer for more info: https://stackoverflow.com/a/21181766/4538055
The downsides are:
- We need to add an extra column to sort.
- Even after ordering each sub-query, we need to order everything again.
Using ROW_NUMBER()
The ROW_NUMBER()
window function allows us to create a new column with the number of each row.
Inside this window function, you must provide a ORDER BY
, and it will return the row number starting with 1
.
You can check more about ROW_NUMBER()
here: SQL Server ROW_NUMBER Function
If you want to know more about window function, check this article.
Now, going back to our problem. We can use this window function to solve it.
Let me share the SQL query, and explain each part of it:
-- Query A
SELECT * FROM (
-- Query B
SELECT *
ROW_NUMBER() OVER (
-- Query B - Order By
ORDER BY <top_3_rows_order_rule>
) AS top_3_order_row_number
FROM <table_name>
) temp_table
-- Query A - Order By
ORDER BY
CASE WHEN top_3_order_row_number <= 3 THEN top_3_order_row_number ELSE 4 END,
<rest_of_the_rows_order_rule>;
Inside of ROW_NUMBER()
, we are ordering by top_3_rows_order_rule
( Query B - Order By
). So, the top_3_order_row_number
column with the values 1
, 2
and 3
will have the top 3 rows we want to show first.
Then, we add a SELECT
(Query A
) around the original SELECT
(Query B
), because this way we can use top_3_order_row_number
column name. In other words, the column top_3_order_row_number
is not accessible inside Query B
.
The last ORDER BY
(Query A - Order By
) is where the magic happens:
-- Query A - Order By
ORDER BY
CASE WHEN top_3_order_row_number <= 3 THEN top_3_order_row_number ELSE 4 END,
<rest_of_the_rows_order_rule>;
Since we already know that the rows with top_3_order_row_number
equals 1
, 2
, and 3
are the top rows, we just return the top_3_order_row_number
if it’s bellow or equals 3
, otherwise, we will return 4
.
This will pass the values 1
, 2
, 3
and 4
for ORDER BY
, that will order it.
So, if top_3_order_row_number
is higher than 3
, this rule will be "ignored", since all will have the same value (4
):
CASE WHEN top_3_order_row_number <= 3 THEN top_3_order_row_number ELSE 4 END
Then, the rest of the rows will be ordered by another rule:
<rest_of_the_rows_order_rule>;
And that will solve all of our problems! 🚀
Now, let me show an example so you can understand it better.
Test database
I created a simple table so you can see better what is going on and even test locally.
Here is the gist with the SQL to create and populate the database: Gist
This script will add a employments
table with the following columns:
id
-> Auto-increment (integer
).name
-> The employee name (text
).department
-> The employee’s department (text
).salary
-> The employee’s salary (integer
).
Using multiple sorts
Let’s say we want to show first the top 3 development
salaries and the higher salaries regardless of which department
is in.
Using ROW_NUMBER()
we can create this SQL Query:
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (
ORDER BY
CASE department
WHEN 'development' THEN 1
ELSE 2
END,
salary DESC
) AS top_dev_salaries_row_number
FROM employments
) employments
ORDER BY
CASE WHEN top_dev_salaries_row_number <= 3 THEN top_dev_salaries_row_number ELSE 4 END,
salary DESC;
Again, let’s check each part of this query!
We will put the top 3 ORDER BY
logic inside ROW_NUMBER()
:
ROW_NUMBER() OVER (
ORDER BY
CASE department
WHEN 'development' THEN 1
ELSE 2
END,
salary DESC
) AS top_dev_salaries_row_number
First, we are checking the department
. If the department is development
we are going to return 1
, otherwise, 2
so, the ORDER BY
will put first the rows where department = 'development'
.
This is the same thing we’ve done with CASE WHEN top_dev_salaries_row_number <= 3 THEN top_dev_salaries_row_number ELSE 4 END
.
After sorting first the development
, we also need to order by salary
. Because of that, we added salary DESC
.
Then, we have the last ORDER BY
:
ORDER BY
CASE WHEN top_dev_salaries_row_number <= 3 THEN top_dev_salaries_row_number ELSE 4 END,
salary DESC;
Again, the top_dev_salaries_row_number
will have the top 3 order so, we just want to preserve that order. Otherwise, it returns 4
for the rest of the rows.
Then, all the rest of the rows will have the same weight, and the next ORDER BY
clause will affect the order: salary DESC
.
In other words, the rest of the rows will be ordered by salary DESC
.
Avoiding the SELECT
wrapper
If you don't want to create an extra SELECT
to wrap your original query only to have access to row_number
column, I have two news:
The good news is that there is a way to prevent it.
The bad news is that it's not so pretty 😅.
You just need to replace all row_number
inside ORDER BY
by the actual ROW_NUMBER()
logic.
In order words, copy the whole ROW_NUMBER()
logic and paste it in ORDER BY
:
SELECT *,
ROW_NUMBER() OVER (
ORDER BY
CASE department WHEN 'development' THEN 1 ELSE 2 END,
salary DESC
) AS top_dev_salaries_row_number
FROM employments
ORDER BY
CASE WHEN
ROW_NUMBER() OVER (
ORDER BY
CASE department WHEN 'development' THEN 1 ELSE 2 END,
salary DESC
) <= 3
THEN ROW_NUMBER() OVER (
ORDER BY
CASE department WHEN 'development' THEN 1 ELSE 2 END,
salary DESC
)
ELSE 4 END,
salary DESC;
As you can see, the query is a little messy, but it’s good to know you can do that.
Adding more sorting rules
Here I'm using two different sorting rules, but you can use as many as you want to.
You only need to add more ROW_NUMBER()
and using them in your final ORDER BY
, like:
SELECT * FROM (
SELECT *
ROW_NUMBER() OVER (
ORDER BY <rule_1>
) AS row_number_rule_1,
ROW_NUMBER() OVER (
ORDER BY <rule_2>
) AS row_number_rule_2,
ROW_NUMBER() OVER (
ORDER BY <rule_3>
) AS row_number_rule_3
FROM <table_name>
) temp_table
ORDER BY
CASE WHEN row_number_rule_1 <= 3 THEN row_number ELSE 4 END, -- Rows 1-3 using rule 1
CASE WHEN row_number_rule_2 <= 2 THEN row_number ELSE 3 END, -- Rows 4-5 using rule 2
CASE WHEN row_number_rule_3 <= 5 THEN row_number ELSE 6 END, -- Rows 6-8 using rule 3
<rule_4>; -- Rows 9~ using rule 3
Performance
When I was creating this query in my client against a huge table, I didn’t see any performance issues.
My change was adding the ROW_NUMBER()
and the last ORDER BY
in an existing query, and the time to execute the query didn’t increase too much.
I also ran EXPLAIN ANALYZE
and the cost didn’t increase much.
But if you are adding this logic to a vital part of your application or a huge table, I recommend you check it.
Using our database
Here I will run some queries with EXPLAIN ANALYZE
. But keep in mind that this is a really small database, and it didn’t have an "original query" like in my client repository.
What I mean by "original query" is the current query we were using, with a bunch of INNER/LEFT JOIN
s, WHERE
‘s, and custom SELECT
‘s columns.
In other words, in my client, I had to add more rules in an existing complexity query, and here we don’t have that. It’s just a simple database to help us to understand the problem.
So, I will share here some queries with EXPLAIN ANALYZE
just to have a clue how PostgreSQL will handle it that.
Simple SELECT
EXPLAIN ANALYZE SELECT *
FROM employments;
Simple SELECT + ORDER BY
EXPLAIN ANALYZE SELECT *
FROM employments
ORDER BY salary DESC;
Simple SELECT + ROWS_NUMBER
EXPLAIN ANALYZE SELECT *,
ROW_NUMBER() OVER (
ORDER BY
CASE department
WHEN 'development' THEN 1
ELSE 2
END,
salary DESC
) AS top_dev_salaries_row_number
FROM employments;
Final Query
EXPLAIN ANALYZE SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (
ORDER BY
CASE department
WHEN 'development' THEN 1
ELSE 2
END,
salary DESC
) AS top_dev_salaries_row_number
FROM employments
) employments
ORDER BY
CASE WHEN top_dev_salaries_row_number <= 3 THEN top_dev_salaries_row_number ELSE 4 END,
salary DESC;
As you can see, the "Planning Time" and "Execution Time" increases when we add more logic (obviously), but it didn’t increase drastically our performance.
Again, you must check that in your database with your query to see if this logic increases or not the query’s performance.
Conclusion
I really liked how this order works. In the beginning, I thought this task would be really hard but doing some deep research and mainly, talking with my coworkers help me a lot.
If you have any thoughts or better solutions, let us know 😉!
See you in the next post! 🚀
We want to work with you. Check out our "What We Do" section!