Fixed #26608 -- Added support for window expressions (OVER clause).

Thanks Josh Smeaton, Mariusz Felisiak, Sergey Fedoseev, Simon Charettes, Adam Chainz/Johnson and Tim Graham for comments and reviews and Jamie Cockburn for initial patch.
author: Mads Jensen <mje@inducks.org> 2017-09-18 15:42:29 +0200
committer: Tim Graham <timograham@gmail.com> 2017-09-18 09:42:29 -0400
commit: d549b8805053d4b064bf492ba90e90db5d7e2a6b (patch)
tree: 2beee237ae541804ba18367d81e82840745d6e47 /docs
parent: da1ba03f1dfb303df9bfb5c76d36216e45d05edc (diff)
4 files changed, 356 insertions, 0 deletions
diff --git a/docs/ref/models/database-functions.txt b/docs/ref/models/database-functions.txt
index 875786c640..f5a9af3a05 100644
--- a/docs/ref/models/database-functions.txt
+++ b/docs/ref/models/database-functions.txt
@@ -819,3 +819,132 @@ Usage example::
      'minute': 'minute': datetime.datetime(2014, 6, 15, 14, 30, tzinfo=<UTC>),
      'second': datetime.datetime(2014, 6, 15, 14, 30, 50, tzinfo=<UTC>)
     }
+
+.. _window-functions:
+
+Window functions
+================
+
+.. versionadded:: 2.0
+
+There are a number of functions to use in a
+:class:`~django.db.models.expressions.Window` expression for computing the rank
+of elements or the :class:`Ntile` of some rows.
+
+``CumeDist``
+------------
+
+.. class:: CumeDist(*expressions, **extra)
+
+Calculates the cumulative distribution of a value within a window or partition.
+The cumulative distribution is defined as the number of rows preceding or
+peered with the current row divided by the total number of rows in the frame.
+
+``DenseRank``
+-------------
+
+.. class:: DenseRank(*expressions, **extra)
+
+Equivalent to :class:`Rank` but does not have gaps.
+
+``FirstValue``
+--------------
+
+.. class:: FirstValue(expression, **extra)
+
+Returns the value evaluated at the row that's the first row of the window
+frame, or ``None`` if no such value exists.
+
+``Lag``
+-------
+
+.. class:: Lag(expression, offset=1, default=None, **extra)
+
+Calculates the value offset by ``offset``, and if no row exists there, returns
+``default``.
+
+``default`` must have the same type as the ``expression``, however, this is
+only validated by the database and not in Python.
+
+``LastValue``
+-------------
+
+.. class:: LastValue(expression, **extra)
+
+Comparable to :class:`FirstValue`, it calculates the last value in a given
+frame clause.
+
+``Lead``
+--------
+
+.. class:: Lead(expression, offset=1, default=None, **extra)
+
+Calculates the leading value in a given :ref:`frame <window-frames>`. Both
+``offset`` and ``default`` are evaluated with respect to the current row.
+
+``default`` must have the same type as the ``expression``, however, this is
+only validated by the database and not in Python.
+
+``NthValue``
+------------
+
+.. class:: NthValue(expression, nth=1, **extra)
+
+Computes the row relative to the offset ``nth`` (must be a positive value)
+within the window. Returns ``None`` if no row exists.
+
+Some databases may handle a nonexistent nth-value differently. For example,
+Oracle returns an empty string rather than ``None`` for character-based
+expressions. Django doesn't do any conversions in these cases.
+
+``Ntile``
+---------
+
+.. class:: Ntile(num_buckets=1, **extra)
+
+Calculates a partition for each of the rows in the frame clause, distributing
+numbers as evenly as possible between 1 and ``num_buckets``. If the rows don't
+divide evenly into a number of buckets, one or more buckets will be represented
+more frequently.
+
+``PercentRank``
+---------------
+
+.. class:: PercentRank(*expressions, **extra)
+
+Computes the percentile rank of the rows in the frame clause. This
+computation is equivalent to evaluating::
+
+    (rank - 1) / (total rows - 1)
+
+The following table explains the calculation for the percentile rank of a row:
+
+===== ===== ==== ============ ============
+Row # Value Rank Calculation  Percent Rank
+===== ===== ==== ============ ============
+1     15    1    (1-1)/(7-1)  0.0000
+2     20    2    (2-1)/(7-1)  0.1666
+3     20    2    (2-1)/(7-1)  0.1666
+4     20    2    (2-1)/(7-1)  0.1666
+5     30    5    (5-1)/(7-1)  0.6666
+6     30    5    (5-1)/(7-1)  0.6666
+7     40    7    (7-1)/(7-1)  1.0000
+===== ===== ==== ============ ============
+
+``Rank``
+--------
+
+.. class:: Rank(*expressions, **extra)
+
+Comparable to ``RowNumber``, this function ranks rows in the window. The
+computed rank contains gaps. Use :class:`DenseRank` to compute rank without
+gaps.
+
+``RowNumber``
+-------------
+
+.. class:: RowNumber(*expressions, **extra)
+
+Computes the row number according to the ordering of either the frame clause
+or the ordering of the whole query if there is no partitioning of the
+:ref:`window frame <window-frames>`.
diff --git a/docs/ref/models/expressions.txt b/docs/ref/models/expressions.txt
index 0974a9dd51..6ef1001a9f 100644
--- a/docs/ref/models/expressions.txt
+++ b/docs/ref/models/expressions.txt
@@ -353,6 +353,13 @@ The ``Aggregate`` API is as follows:
         generated. Specifically, the ``function`` will be interpolated as the
         ``function`` placeholder within :attr:`template`. Defaults to ``None``.
 
+    .. attribute:: window_compatible
+
+        .. versionadded:: 2.0
+
+        Defaults to ``True`` since most aggregate functions can be used as the
+        source expression in :class:`~django.db.models.expressions.Window`.
+
 The ``expression`` argument can be the name of a field on the model, or another
 expression. It will be converted to a string and used as the ``expressions``
 placeholder within the ``template``.
@@ -649,6 +656,184 @@ should avoid them if possible.
     force you to acknowledge that you're not interpolating your SQL with user
     provided data.
 
+Window functions
+----------------
+
+.. versionadded:: 2.0
+
+Window functions provide a way to apply functions on partitions. Unlike a
+normal aggregation function which computes a final result for each set defined
+by the group by, window functions operate on :ref:`frames <window-frames>` and
+partitions, and compute the result for each row.
+
+You can specify multiple windows in the same query which in Django ORM would be
+equivalent to including multiple expressions in a :doc:`QuerySet.annotate()
+</topics/db/aggregation>` call. The ORM doesn't make use of named windows,
+instead they are part of the selected columns.
+
+.. class:: Window(expression, partition_by=None, order_by=None, frame=None, output_field=None)
+
+    .. attribute:: filterable
+
+        Defaults to ``False``. The SQL standard disallows referencing window
+        functions in the ``WHERE`` clause and Django raises an exception when
+        constructing a ``QuerySet`` that would do that.
+
+    .. attribute:: template
+
+        Defaults to ``%(expression)s OVER (%(window)s)'``. If only the
+        ``expression`` argument is provided, the window clause will be blank.
+
+The ``Window`` class is the main expression for an ``OVER`` clause.
+
+The ``expression`` argument is either a :ref:`window function
+<window-functions>`, an :ref:`aggregate function <aggregation-functions>`, or
+an expression that's compatible in a window clause.
+
+The ``partition_by`` argument is a list of expressions (column names should be
+wrapped in an ``F``-object) that control the partitioning of the rows.
+Partitioning narrows which rows are used to compute the result set.
+
+The ``output_field`` is specified either as an argument or by the expression.
+
+The ``order_by`` argument accepts a sequence of expressions on which you can
+call :meth:`~django.db.models.Expression.asc` and
+:meth:`~django.db.models.Expression.desc`. The ordering controls the order in
+which the expression is applied. For example, if you sum over the rows in a
+partition, the first result is just the value of the first row, the second is
+the sum of first and second row.
+
+The ``frame`` parameter specifies which other rows that should be used in the
+computation. See :ref:`window-frames` for details.
+
+For example, to annotate each movie with the average rating for the movies by
+the same studio in the same genre and release year::
+
+    >>> from django.db.models import Avg, ExtractYear, F, Window
+    >>> Movie.objects.annotate(
+    >>>     avg_rating=Window(
+    >>>         expression=Avg('rating'),
+    >>>         partition_by=[F('studio'), F('genre')],
+    >>>         order_by=ExtractYear('released').asc(),
+    >>>     ),
+    >>> )
+
+This makes it easy to check if a movie is rated better or worse than its peers.
+
+You may want to apply multiple expressions over the same window, i.e., the
+same partition and frame. For example, you could modify the previous example
+to also include the best and worst rating in each movie's group (same studio,
+genre, and release year) by using three window functions in the same query. The
+partition and ordering from the previous example is extracted into a dictionary
+to reduce repetition::
+
+    >>> from django.db.models import Avg, ExtractYear, F, Max, Min, Window
+    >>> window = {
+    >>>    'partition': [F('studio'), F('genre')],
+    >>>    'order_by': ExtractYear('released').asc(),
+    >>> }
+    >>> Movie.objects.annotate(
+    >>>     avg_rating=Window(
+    >>>         expression=Avg('rating'), **window,
+    >>>     ),
+    >>>     best=Window(
+    >>>         expression=Max('rating'), **window,
+    >>>     ),
+    >>>     worst=Window(
+    >>>         expression=Min('rating'), **window,
+    >>>     ),
+    >>> )
+
+Among Django's built-in database backends, MySQL 8.0.2+, PostgreSQL, and Oracle
+support window expressions. Support for different window expression features
+varies among the different databases. For example, the options in
+:meth:`~django.db.models.Expression.asc` and
+:meth:`~django.db.models.Expression.desc` may not be supported. Consult the
+documentation for your database as needed.
+
+.. _window-frames:
+
+Frames
+~~~~~~
+
+For a window frame, you can choose either a range-based sequence of rows or an
+ordinary sequence of rows.
+
+.. class:: ValueRange(start=None, end=None)
+
+    .. attribute:: frame_type
+
+        This attribute is set to ``'RANGE'``.
+
+    PostgreSQL has limited support for ``ValueRange`` and only supports use of
+    the standard start and end points, such as ``CURRENT ROW`` and ``UNBOUNDED
+    FOLLOWING``.
+
+.. class:: RowRange(start=None, end=None)
+
+    .. attribute:: frame_type
+
+        This attribute is set to ``'ROWS'``.
+
+Both classes return SQL with the template::
+
+    %(frame_type)s BETWEEN %(start)s AND %(end)s
+
+Frames narrow the rows that are used for computing the result. They shift from
+some start point to some specified end point. Frames can be used with and
+without partitions, but it's often a good idea to specify an ordering of the
+window to ensure a deterministic result. In a frame, a peer in a frame is a row
+with an equivalent value, or all rows if an ordering clause isn't present.
+
+The default starting point for a frame is ``UNBOUNDED PRECEDING`` which is the
+first row of the partition. The end point is always explicitly included in the
+SQL generated by the ORM and is by default ``UNBOUNDED FOLLOWING``. The default
+frame includes all rows from the partition to the last row in the set.
+
+The accepted values for the ``start`` and ``end`` arguments are ``None``, an
+integer, or zero. A negative integer for ``start`` results in ``N preceding``,
+while ``None`` yields ``UNBOUNDED PRECEDING``. For both ``start`` and ``end``,
+zero will return ``CURRENT ROW``. Positive integers are accepted for ``end``.
+
+There's a difference in what ``CURRENT ROW`` includes. When specified in
+``ROWS`` mode, the frame starts or ends with the current row. When specified in
+``RANGE`` mode, the frame starts or ends at the first or last peer according to
+the ordering clause. Thus, ``RANGE CURRENT ROW`` evaluates the expression for
+rows which have the same value specified by the ordering. Because the template
+includes both the ``start`` and ``end`` points, this may be expressed with::
+
+    ValueRange(start=0, end=0)
+
+If a movie's "peers" are described as movies released by the same studio in the
+same genre in the same year, this ``RowRange`` example annotates each movie
+with the average rating of a movie's two prior and two following peers::
+
+    >>> from django.db.models import Avg, ExtractYear, F, RowRange, Window
+    >>> Movie.objects.annotate(
+    >>>     avg_rating=Window(
+    >>>         expression=Avg('rating'),
+    >>>         partition_by=[F('studio'), F('genre')],
+    >>>         order_by=ExtractYear('released').asc(),
+    >>>         frame=RowRange(start=-2, end=2),
+    >>>     ),
+    >>> )
+
+If the database supports it, you can specify the start and end points based on
+values of an expression in the partition. If the ``released`` field of the
+``Movie`` model stores the release month of each movies, this ``ValueRange``
+example annotates each movie with the average rating of a movie's peers
+released between twelve months before and twelve months after the each movie.
+
+    >>> from django.db.models import Avg, ExpressionList, F, ValueRange, Window
+    >>> Movie.objects.annotate(
+    >>>     avg_rating=Window(
+    >>>         expression=Avg('rating'),
+    >>>         partition_by=[F('studio'), F('genre')],
+    >>>         order_by=F('released').asc(),
+    >>>         frame=ValueRange(start=-12, end=12),
+    >>>     ),
+    >>> )
+
 .. currentmodule:: django.db.models
 
 Technical Information
@@ -677,6 +862,30 @@ calling the appropriate methods on the wrapped expression.
         Tells Django that this expression contains an aggregate and that a
         ``GROUP BY`` clause needs to be added to the query.
 
+    .. attribute:: contains_over_clause
+
+        .. versionadded:: 2.0
+
+        Tells Django that this expression contains a
+        :class:`~django.db.models.expressions.Window` expression. It's used,
+        for example, to disallow window function expressions in queries that
+        modify data. Defaults to ``True``.
+
+    .. attribute:: filterable
+
+        .. versionadded:: 2.0
+
+        Tells Django that this expression can be referenced in
+        :meth:`.QuerySet.filter`. Defaults to ``True``.
+
+    .. attribute:: window_compatible
+
+        .. versionadded:: 2.0
+
+        Tells Django that this expression can be used as the source expression
+        in :class:`~django.db.models.expressions.Window`. Defaults to
+        ``False``.
+
     .. method:: resolve_expression(query=None, allow_joins=True, reuse=None, summarize=False, for_save=False)
 
         Provides the chance to do any pre-processing or validation of
diff --git a/docs/ref/models/querysets.txt b/docs/ref/models/querysets.txt
index c7ab413f53..f62050d818 100644
--- a/docs/ref/models/querysets.txt
+++ b/docs/ref/models/querysets.txt
@@ -1681,6 +1681,11 @@ raised if ``select_for_update()`` is used in autocommit mode.
     ``select_for_update()`` you should use
     :class:`~django.test.TransactionTestCase`.
 
+.. admonition:: Certain expressions may not be supported
+
+    PostgreSQL doesn't support ``select_for_update()`` with
+    :class:`~django.db.models.expressions.Window` expressions.
+
 .. versionchanged:: 1.11
 
     The ``skip_locked`` argument was added.
diff --git a/docs/releases/2.0.txt b/docs/releases/2.0.txt
index da898db67d..fd9cf8dfd3 100644
--- a/docs/releases/2.0.txt
+++ b/docs/releases/2.0.txt
@@ -52,6 +52,14 @@ Mobile-friendly ``contrib.admin``
 The admin is now responsive and supports all major mobile devices.
 Older browser may experience varying levels of graceful degradation.
 
+Window expressions
+------------------
+
+The new :class:`~django.db.models.expressions.Window` expression allows
+adding an ``OVER`` clause to querysets. You can use :ref:`window functions
+<window-functions>` and :ref:`aggregate functions <aggregation-functions>` in
+the expression.
+
 Minor features
 --------------
 
@@ -404,6 +412,11 @@ backends.
   requires that the arguments to ``OF`` be columns rather than tables, set
   ``DatabaseFeatures.select_for_update_of_column = True``.
 
+* To enable support for :class:`~django.db.models.expressions.Window`
+  expressions, set ``DatabaseFeatures.supports_over_clause`` to ``True``. You
+  may need to customize the ``DatabaseOperations.window_start_rows_start_end()``
+  and/or ``window_start_range_start_end()`` methods.
+
 * Third-party database backends should add a
   ``DatabaseOperations.cast_char_field_without_max_length`` attribute with the
   database data type that will be used in the
author	Mads Jensen <mje@inducks.org>	2017-09-18 15:42:29 +0200
committer	Tim Graham <timograham@gmail.com>	2017-09-18 09:42:29 -0400
commit	d549b8805053d4b064bf492ba90e90db5d7e2a6b (patch)
tree	2beee237ae541804ba18367d81e82840745d6e47 /docs
parent	da1ba03f1dfb303df9bfb5c76d36216e45d05edc (diff)