[2.1.X] Fixed CVE-2019-14233 -- Prevented excessive HTMLParser recursion in strip_tags() when handling incomplete HTML entities.

Thanks to Guido Vranken for the initial report.
author: Florian Apolloner <florian@apolloner.eu> 2019-07-15 12:00:06 +0200
committer: Carlton Gibson <carlton.gibson@noumenal.es> 2019-07-29 11:12:53 +0200
commit: 5ff8e791148bd451180124d76a55cb2b2b9556eb
tree: eb9f93019462f82a18ea6f89263f275d53563623
parent: c23723a1551340cc7d3126f04fcfd178fa224193
4 files changed, 38 insertions, 2 deletions
diff --git a/django/utils/html.py b/django/utils/html.py
index 5dd67fd151..4514759d81 100644
--- a/django/utils/html.py
+++ b/django/utils/html.py
@@ -186,8 +186,8 @@ def strip_tags(value):
     value = str(value)
     while '<' in value and '>' in value:
         new_value = _strip_once(value)
-        if len(new_value) >= len(value):
-            # _strip_once was not able to detect more tags
+        if value.count('<') == new_value.count('<'):
+            # _strip_once wasn't able to detect more tags.
             break
         value = new_value
     return value
diff --git a/docs/releases/1.11.23.txt b/docs/releases/1.11.23.txt
index 6058bb8a81..c95ffd9a50 100644
--- a/docs/releases/1.11.23.txt
+++ b/docs/releases/1.11.23.txt
@@ -19,3 +19,20 @@ filters, which were thus vulnerable.
 The regular expressions used by ``Truncator`` have been simplified in order to
 avoid potential backtracking issues. As a consequence, trailing punctuation may
 now at times be included in the truncated output.
+
+CVE-2019-14233: Denial-of-service possibility in ``strip_tags()``
+=================================================================
+
+Due to the behavior of the underlying ``HTMLParser``,
+:func:`django.utils.html.strip_tags` would be extremely slow to evaluate
+certain inputs containing large sequences of nested incomplete HTML entities.
+The ``strip_tags()`` function is used to implement the corresponding
+:tfilter:`striptags` template filter, which was thus also vulnerable.
+
+``strip_tags()`` now stops making repeated ``HTMLParser`` passes once a pass
+removes no further tags, even if incomplete HTML entities remain in the input.
+
+Remember that absolutely NO guarantee is provided about the results of
+``strip_tags()`` being HTML safe. So NEVER mark safe the result of a
+``strip_tags()`` call without escaping it first, for example with
+:func:`django.utils.html.escape`.
diff --git a/docs/releases/2.1.11.txt b/docs/releases/2.1.11.txt
index f4ee3dbd30..9cae1e6f2e 100644
--- a/docs/releases/2.1.11.txt
+++ b/docs/releases/2.1.11.txt
@@ -19,3 +19,20 @@ filters, which were thus vulnerable.
 The regular expressions used by ``Truncator`` have been simplified in order to
 avoid potential backtracking issues. As a consequence, trailing punctuation may
 now at times be included in the truncated output.
+
+CVE-2019-14233: Denial-of-service possibility in ``strip_tags()``
+=================================================================
+
+Due to the behavior of the underlying ``HTMLParser``,
+:func:`django.utils.html.strip_tags` would be extremely slow to evaluate
+certain inputs containing large sequences of nested incomplete HTML entities.
+The ``strip_tags()`` function is used to implement the corresponding
+:tfilter:`striptags` template filter, which was thus also vulnerable.
+
+``strip_tags()`` now stops making repeated ``HTMLParser`` passes once a pass
+removes no further tags, even if incomplete HTML entities remain in the input.
+
+Remember that absolutely NO guarantee is provided about the results of
+``strip_tags()`` being HTML safe. So NEVER mark safe the result of a
+``strip_tags()`` call without escaping it first, for example with
+:func:`django.utils.html.escape`.
diff --git a/tests/utils_tests/test_html.py b/tests/utils_tests/test_html.py
index 94b8f946cc..8feb4d8e82 100644
--- a/tests/utils_tests/test_html.py
+++ b/tests/utils_tests/test_html.py
@@ -88,6 +88,8 @@ class TestUtilsHtml(SimpleTestCase):
             ('&gotcha&#;<>', '&gotcha&#;<>'),
             ('<sc<!-- -->ript>test<<!-- -->/script>', 'ript>test'),
             ('<script>alert()</script>&h', 'alert()h'),
+            ('><!' + ('&' * 16000) + 'D', '><!' + ('&' * 16000) + 'D'),
+            ('X<<<<br>br>br>br>X', 'XX'),
         )
         for value, output in items:
             with self.subTest(value=value, output=output):
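
The change to the loop in strip_tags() can be illustrated outside Django with a minimal sketch. It compares the old and new termination checks from the html.py hunk using the same loop shape. The names old_should_stop, new_should_stop, toy_strip_once and strip_tags_sketch are hypothetical. toy_strip_once is a deliberately simplified stand-in for _strip_once(): it does not reproduce what HTMLParser actually does with incomplete entities, and only imitates a pass that shortens the input without removing a tag.

```python
import re


def old_should_stop(value, new_value):
    # Pre-patch check: stop only when a pass fails to shorten the string.
    return len(new_value) >= len(value)


def new_should_stop(value, new_value):
    # Patched check: stop as soon as a pass removes no '<' characters.
    return value.count('<') == new_value.count('<')


def toy_strip_once(value):
    # HYPOTHETICAL stand-in for _strip_once(); not HTMLParser's real behavior.
    # Removes innermost complete tags, then drops one '&' to imitate a pass
    # that shortens the input without removing any tag.
    value = re.sub(r'<[^<>]*>', '', value)
    return value.replace('&', '', 1)


def strip_tags_sketch(value, should_stop, strip_once=toy_strip_once):
    # Same loop shape as strip_tags() in the diff; also returns the pass count.
    passes = 0
    while '<' in value and '>' in value:
        new_value = strip_once(value)
        passes += 1
        if should_stop(value, new_value):
            break
        value = new_value
    return value, passes


payload = '><!' + '&' * 50 + 'D'
print(strip_tags_sketch(payload, old_should_stop)[1])           # 51
print(strip_tags_sketch(payload, new_should_stop)[1])           # 1
print(strip_tags_sketch('X<<<<br>br>br>br>X', new_should_stop))  # ('XX', 4)
```

Under this toy model the old check keeps looping for as long as the string shrinks, so the number of passes grows with the length of the '&' run, while the new check exits after one pass and returns the input unchanged. Nested tags are still stripped layer by layer, which matches the two test cases added in this commit.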