Presto follows the SQL Standard faithfully. We extend it only when it is well justified,
we strive to never break it and we always prefer the standard way of doing things.
There was one situation where we stumbled, though. We had a non-standard way of limiting
query results with LIMIT n
without implementing the standard way of doing that first.
We have corrected that, adding ANSI SQL way of limiting query results, discarding initial
results and – a hidden gem – retaining initial results in case of ties.
Limiting query results #
Probably everyone using relational databases knows the LIMIT n
syntax for limiting query
results. It is supported by e.g. MySQL, PostgreSQL and many more SQL engines following
their example. It is so common that one could think that LIMIT n
is the standard way
of limiting the query results. Let’s have a look at how various popular SQL engines
provide this feature.
- DB2, MySQL, MariaDB, PostgreSQL, Redshift, MemSQL, SQLite and many others provide the
... LIMIT n
syntax. - SQL Server provides
SELECT TOP n ...
syntax. - Oracle provides
... WHERE ROWNUM <= n
syntax.
And what does the SQL Standard say?
SELECT *
FROM my_table
FETCH FIRST n ROWS ONLY
If we look again at the database systems mentioned above, it turns out many of them support the standard syntax too: Oracle, DB2, SQL Server and PostgreSQL (although that’s not documented currently).
And Presto? Presto has LIMIT n
support since 2012. In Presto 310,
we added also the FETCH FIRST n ROWS ONLY
support.
Let’s have a look beyond the limits.
Tie break #
Admittedly, FETCH FIRST n ROWS ONLY
syntax is way more verbose than the short LIMIT n
syntax Presto
always supported (and still does). However, it is also more powerful: it allows selecting rows “top n,
ties included”. Consider a case where you want to list top 3 students with highest score on an exam.
What happens if the 3rd, 4th and 5th persons have equal score? Which
one should be returned? Instead of getting an arbitrary (and indeterminate) result you can use
the FETCH FIRST n ROWS WITH TIES
syntax:
SELECT student_name, score
FROM student s JOIN exam_result e ON s.id = e.student_id
ORDER BY score
FETCH FIRST 3 ROWS WITH TIES
The FETCH FIRST n ROWS WITH TIES
clause retains all rows with equal values of the ordering keys (the ORDER BY
clause) as
the last row that would be returned by the FETCH FIRST n ROWS ONLY
clause.
Offset #
Per the SQL Standard, the FETCH FIRST n ROWS ONLY
clause can be prepended with OFFSET m
, to skip m
initial rows.
In such a case, it makes sense to use FETCH NEXT ...
variant of the clause – it’s allowed with and without OFFSET
,
but definitely looks better with that clause.
SELECT student_name, score
FROM student s JOIN exam_result e ON s.id = e.student_id
ORDER BY score
OFFSET 5
FETCH NEXT 3 ROWS WITH TIES
As an extension to SQL Standard, and for the brevity of this syntax, we also allow OFFSET
with LIMIT
:
SELECT student_name, score
FROM student s JOIN exam_result e ON s.id = e.student_id
ORDER BY score
OFFSET 5
LIMIT 3
Concluding notes #
LIMIT
/ FETCH FIRST ... ROWS ONLY
, FETCH FIRST ... WITH TIES
and OFFSET
are powerful and very useful clauses
that come especially handy when writing ad-hoc queries over big data sets. They offer certain syntactic freedom beyond
what is described here, so check out documentation of OFFSET Clause and
LIMIT or FETCH FIRST Clauses for all the options.
Since semantics of these clauses depend on query results being well ordered, they are best used with ORDER BY
that
defines proper ordering. Without proper ordering the results are arbitrary (except for WITH TIES
) which may or may
not be a problem, depending on the use case.
For scheduled queries, or queries that are part of some workflow (as opposed to ad-hoc), we recommend using query
predicates (where relevant) instead of OFFSET
. Read more at
https://use-the-index-luke.com/sql/partial-results/fetch-next-page.
□