diff options
| author | Mads Jensen <mje@inducks.org> | 2017-09-18 15:42:29 +0200 |
|---|---|---|
| committer | Tim Graham <timograham@gmail.com> | 2017-09-18 09:42:29 -0400 |
| commit | d549b8805053d4b064bf492ba90e90db5d7e2a6b (patch) | |
| tree | 2beee237ae541804ba18367d81e82840745d6e47 /docs | |
| parent | da1ba03f1dfb303df9bfb5c76d36216e45d05edc (diff) | |
Fixed #26608 -- Added support for window expressions (OVER clause).
Thanks Josh Smeaton, Mariusz Felisiak, Sergey Fedoseev, Simon Charettes,
Adam Chainz/Johnson and Tim Graham for comments and reviews and Jamie
Cockburn for initial patch.
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/ref/models/database-functions.txt | 129 | ||||
| -rw-r--r-- | docs/ref/models/expressions.txt | 209 | ||||
| -rw-r--r-- | docs/ref/models/querysets.txt | 5 | ||||
| -rw-r--r-- | docs/releases/2.0.txt | 13 |
4 files changed, 356 insertions, 0 deletions
diff --git a/docs/ref/models/database-functions.txt b/docs/ref/models/database-functions.txt index 875786c640..f5a9af3a05 100644 --- a/docs/ref/models/database-functions.txt +++ b/docs/ref/models/database-functions.txt @@ -819,3 +819,132 @@ Usage example:: 'minute': 'minute': datetime.datetime(2014, 6, 15, 14, 30, tzinfo=<UTC>), 'second': datetime.datetime(2014, 6, 15, 14, 30, 50, tzinfo=<UTC>) } + +.. _window-functions: + +Window functions +================ + +.. versionadded:: 2.0 + +There are a number of functions to use in a +:class:`~django.db.models.expressions.Window` expression for computing the rank +of elements or the :class:`Ntile` of some rows. + +``CumeDist`` +------------ + +.. class:: CumeDist(*expressions, **extra) + +Calculates the cumulative distribution of a value within a window or partition. +The cumulative distribution is defined as the number of rows preceding or +peered with the current row divided by the total number of rows in the frame. + +``DenseRank`` +------------- + +.. class:: DenseRank(*expressions, **extra) + +Equivalent to :class:`Rank` but does not have gaps. + +``FirstValue`` +-------------- + +.. class:: FirstValue(expression, **extra) + +Returns the value evaluated at the row that's the first row of the window +frame, or ``None`` if no such value exists. + +``Lag`` +------- + +.. class:: Lag(expression, offset=1, default=None, **extra) + +Calculates the value offset by ``offset``, and if no row exists there, returns +``default``. + +``default`` must have the same type as the ``expression``, however, this is +only validated by the database and not in Python. + +``LastValue`` +------------- + +.. class:: LastValue(expression, **extra) + +Comparable to :class:`FirstValue`, it calculates the last value in a given +frame clause. + +``Lead`` +-------- + +.. class:: Lead(expression, offset=1, default=None, **extra) + +Calculates the leading value in a given :ref:`frame <window-frames>`. Both +``offset`` and ``default`` are evaluated with respect to the current row. + +``default`` must have the same type as the ``expression``, however, this is +only validated by the database and not in Python. + +``NthValue`` +------------ + +.. class:: NthValue(expression, nth=1, **extra) + +Computes the row relative to the offset ``nth`` (must be a positive value) +within the window. Returns ``None`` if no row exists. + +Some databases may handle a nonexistent nth-value differently. For example, +Oracle returns an empty string rather than ``None`` for character-based +expressions. Django doesn't do any conversions in these cases. + +``Ntile`` +--------- + +.. class:: Ntile(num_buckets=1, **extra) + +Calculates a partition for each of the rows in the frame clause, distributing +numbers as evenly as possible between 1 and ``num_buckets``. If the rows don't +divide evenly into a number of buckets, one or more buckets will be represented +more frequently. + +``PercentRank`` +--------------- + +.. class:: PercentRank(*expressions, **extra) + +Computes the percentile rank of the rows in the frame clause. This +computation is equivalent to evaluating:: + + (rank - 1) / (total rows - 1) + +The following table explains the calculation for the percentile rank of a row: + +===== ===== ==== ============ ============ +Row # Value Rank Calculation Percent Rank +===== ===== ==== ============ ============ +1 15 1 (1-1)/(7-1) 0.0000 +2 20 2 (2-1)/(7-1) 0.1666 +3 20 2 (2-1)/(7-1) 0.1666 +4 20 2 (2-1)/(7-1) 0.1666 +5 30 5 (5-1)/(7-1) 0.6666 +6 30 5 (5-1)/(7-1) 0.6666 +7 40 7 (7-1)/(7-1) 1.0000 +===== ===== ==== ============ ============ + +``Rank`` +-------- + +.. class:: Rank(*expressions, **extra) + +Comparable to ``RowNumber``, this function ranks rows in the window. The +computed rank contains gaps. Use :class:`DenseRank` to compute rank without +gaps. + +``RowNumber`` +------------- + +.. class:: RowNumber(*expressions, **extra) + +Computes the row number according to the ordering of either the frame clause +or the ordering of the whole query if there is no partitioning of the +:ref:`window frame <window-frames>`. diff --git a/docs/ref/models/expressions.txt b/docs/ref/models/expressions.txt index 0974a9dd51..6ef1001a9f 100644 --- a/docs/ref/models/expressions.txt +++ b/docs/ref/models/expressions.txt @@ -353,6 +353,13 @@ The ``Aggregate`` API is as follows: generated. Specifically, the ``function`` will be interpolated as the ``function`` placeholder within :attr:`template`. Defaults to ``None``. + .. attribute:: window_compatible + + .. versionadded:: 2.0 + + Defaults to ``True`` since most aggregate functions can be used as the + source expression in :class:`~django.db.models.expressions.Window`. + The ``expression`` argument can be the name of a field on the model, or another expression. It will be converted to a string and used as the ``expressions`` placeholder within the ``template``. @@ -649,6 +656,184 @@ should avoid them if possible. force you to acknowledge that you're not interpolating your SQL with user provided data. +Window functions +---------------- + +.. versionadded:: 2.0 + +Window functions provide a way to apply functions on partitions. Unlike a +normal aggregation function which computes a final result for each set defined +by the group by, window functions operate on :ref:`frames <window-frames>` and +partitions, and compute the result for each row. + +You can specify multiple windows in the same query which in Django ORM would be +equivalent to including multiple expressions in a :doc:`QuerySet.annotate() +</topics/db/aggregation>` call. The ORM doesn't make use of named windows, +instead they are part of the selected columns. + +.. class:: Window(expression, partition_by=None, order_by=None, frame=None, output_field=None) + + .. attribute:: filterable + + Defaults to ``False``. The SQL standard disallows referencing window + functions in the ``WHERE`` clause and Django raises an exception when + constructing a ``QuerySet`` that would do that. + + .. attribute:: template + + Defaults to ``%(expression)s OVER (%(window)s)'``. If only the + ``expression`` argument is provided, the window clause will be blank. + +The ``Window`` class is the main expression for an ``OVER`` clause. + +The ``expression`` argument is either a :ref:`window function +<window-functions>`, an :ref:`aggregate function <aggregation-functions>`, or +an expression that's compatible in a window clause. + +The ``partition_by`` argument is a list of expressions (column names should be +wrapped in an ``F``-object) that control the partitioning of the rows. +Partitioning narrows which rows are used to compute the result set. + +The ``output_field`` is specified either as an argument or by the expression. + +The ``order_by`` argument accepts a sequence of expressions on which you can +call :meth:`~django.db.models.Expression.asc` and +:meth:`~django.db.models.Expression.desc`. The ordering controls the order in +which the expression is applied. For example, if you sum over the rows in a +partition, the first result is just the value of the first row, the second is +the sum of first and second row. + +The ``frame`` parameter specifies which other rows that should be used in the +computation. See :ref:`window-frames` for details. + +For example, to annotate each movie with the average rating for the movies by +the same studio in the same genre and release year:: + + >>> from django.db.models import Avg, ExtractYear, F, Window + >>> Movie.objects.annotate( + >>> avg_rating=Window( + >>> expression=Avg('rating'), + >>> partition_by=[F('studio'), F('genre')], + >>> order_by=ExtractYear('released').asc(), + >>> ), + >>> ) + +This makes it easy to check if a movie is rated better or worse than its peers. + +You may want to apply multiple expressions over the same window, i.e., the +same partition and frame. For example, you could modify the previous example +to also include the best and worst rating in each movie's group (same studio, +genre, and release year) by using three window functions in the same query. The +partition and ordering from the previous example is extracted into a dictionary +to reduce repetition:: + + >>> from django.db.models import Avg, ExtractYear, F, Max, Min, Window + >>> window = { + >>> 'partition': [F('studio'), F('genre')], + >>> 'order_by': ExtractYear('released').asc(), + >>> } + >>> Movie.objects.annotate( + >>> avg_rating=Window( + >>> expression=Avg('rating'), **window, + >>> ), + >>> best=Window( + >>> expression=Max('rating'), **window, + >>> ), + >>> worst=Window( + >>> expression=Min('rating'), **window, + >>> ), + >>> ) + +Among Django's built-in database backends, MySQL 8.0.2+, PostgreSQL, and Oracle +support window expressions. Support for different window expression features +varies among the different databases. For example, the options in +:meth:`~django.db.models.Expression.asc` and +:meth:`~django.db.models.Expression.desc` may not be supported. Consult the +documentation for your database as needed. + +.. _window-frames: + +Frames +~~~~~~ + +For a window frame, you can choose either a range-based sequence of rows or an +ordinary sequence of rows. + +.. class:: ValueRange(start=None, end=None) + + .. attribute:: frame_type + + This attribute is set to ``'RANGE'``. + + PostgreSQL has limited support for ``ValueRange`` and only supports use of + the standard start and end points, such as ``CURRENT ROW`` and ``UNBOUNDED + FOLLOWING``. + +.. class:: RowRange(start=None, end=None) + + .. attribute:: frame_type + + This attribute is set to ``'ROWS'``. + +Both classes return SQL with the template:: + + %(frame_type)s BETWEEN %(start)s AND %(end)s + +Frames narrow the rows that are used for computing the result. They shift from +some start point to some specified end point. Frames can be used with and +without partitions, but it's often a good idea to specify an ordering of the +window to ensure a deterministic result. In a frame, a peer in a frame is a row +with an equivalent value, or all rows if an ordering clause isn't present. + +The default starting point for a frame is ``UNBOUNDED PRECEDING`` which is the +first row of the partition. The end point is always explicitly included in the +SQL generated by the ORM and is by default ``UNBOUNDED FOLLOWING``. The default +frame includes all rows from the partition to the last row in the set. + +The accepted values for the ``start`` and ``end`` arguments are ``None``, an +integer, or zero. A negative integer for ``start`` results in ``N preceding``, +while ``None`` yields ``UNBOUNDED PRECEDING``. For both ``start`` and ``end``, +zero will return ``CURRENT ROW``. Positive integers are accepted for ``end``. + +There's a difference in what ``CURRENT ROW`` includes. When specified in +``ROWS`` mode, the frame starts or ends with the current row. When specified in +``RANGE`` mode, the frame starts or ends at the first or last peer according to +the ordering clause. Thus, ``RANGE CURRENT ROW`` evaluates the expression for +rows which have the same value specified by the ordering. Because the template +includes both the ``start`` and ``end`` points, this may be expressed with:: + + ValueRange(start=0, end=0) + +If a movie's "peers" are described as movies released by the same studio in the +same genre in the same year, this ``RowRange`` example annotates each movie +with the average rating of a movie's two prior and two following peers:: + + >>> from django.db.models import Avg, ExtractYear, F, RowRange, Window + >>> Movie.objects.annotate( + >>> avg_rating=Window( + >>> expression=Avg('rating'), + >>> partition_by=[F('studio'), F('genre')], + >>> order_by=ExtractYear('released').asc(), + >>> frame=RowRange(start=-2, end=2), + >>> ), + >>> ) + +If the database supports it, you can specify the start and end points based on +values of an expression in the partition. If the ``released`` field of the +``Movie`` model stores the release month of each movies, this ``ValueRange`` +example annotates each movie with the average rating of a movie's peers +released between twelve months before and twelve months after the each movie. + + >>> from django.db.models import Avg, ExpressionList, F, ValueRange, Window + >>> Movie.objects.annotate( + >>> avg_rating=Window( + >>> expression=Avg('rating'), + >>> partition_by=[F('studio'), F('genre')], + >>> order_by=F('released').asc(), + >>> frame=ValueRange(start=-12, end=12), + >>> ), + >>> ) + .. currentmodule:: django.db.models Technical Information @@ -677,6 +862,30 @@ calling the appropriate methods on the wrapped expression. Tells Django that this expression contains an aggregate and that a ``GROUP BY`` clause needs to be added to the query. + .. attribute:: contains_over_clause + + .. versionadded:: 2.0 + + Tells Django that this expression contains a + :class:`~django.db.models.expressions.Window` expression. It's used, + for example, to disallow window function expressions in queries that + modify data. Defaults to ``True``. + + .. attribute:: filterable + + .. versionadded:: 2.0 + + Tells Django that this expression can be referenced in + :meth:`.QuerySet.filter`. Defaults to ``True``. + + .. attribute:: window_compatible + + .. versionadded:: 2.0 + + Tells Django that this expression can be used as the source expression + in :class:`~django.db.models.expressions.Window`. Defaults to + ``False``. + .. method:: resolve_expression(query=None, allow_joins=True, reuse=None, summarize=False, for_save=False) Provides the chance to do any pre-processing or validation of diff --git a/docs/ref/models/querysets.txt b/docs/ref/models/querysets.txt index c7ab413f53..f62050d818 100644 --- a/docs/ref/models/querysets.txt +++ b/docs/ref/models/querysets.txt @@ -1681,6 +1681,11 @@ raised if ``select_for_update()`` is used in autocommit mode. ``select_for_update()`` you should use :class:`~django.test.TransactionTestCase`. +.. admonition:: Certain expressions may not be supported + + PostgreSQL doesn't support ``select_for_update()`` with + :class:`~django.db.models.expressions.Window` expressions. + .. versionchanged:: 1.11 The ``skip_locked`` argument was added. diff --git a/docs/releases/2.0.txt b/docs/releases/2.0.txt index da898db67d..fd9cf8dfd3 100644 --- a/docs/releases/2.0.txt +++ b/docs/releases/2.0.txt @@ -52,6 +52,14 @@ Mobile-friendly ``contrib.admin`` The admin is now responsive and supports all major mobile devices. Older browser may experience varying levels of graceful degradation. +Window expressions +------------------ + +The new :class:`~django.db.models.expressions.Window` expression allows +adding an ``OVER`` clause to querysets. You can use :ref:`window functions +<window-functions>` and :ref:`aggregate functions <aggregation-functions>` in +the expression. + Minor features -------------- @@ -404,6 +412,11 @@ backends. requires that the arguments to ``OF`` be columns rather than tables, set ``DatabaseFeatures.select_for_update_of_column = True``. +* To enable support for :class:`~django.db.models.expressions.Window` + expressions, set ``DatabaseFeatures.supports_over_clause`` to ``True``. You + may need to customize the ``DatabaseOperations.window_start_rows_start_end()`` + and/or ``window_start_range_start_end()`` methods. + * Third-party database backends should add a ``DatabaseOperations.cast_char_field_without_max_length`` attribute with the database data type that will be used in the |
