author: Anubhav Joshi <anubhav9042@gmail.com> 2014-07-22 17:55:22 +0530
committer: Loic Bistuer <loic.bistuer@gmail.com> 2014-10-16 02:31:17 +0700
commit: 10b17a22bec2eaf44c3315614aea87c127caee46
tree: 39145c16ca06aa33050e1642076db4216d663a10 /docs/ref/unicode.txt
parent: 3af5af1a61d73c533aca4fb0ea1f53e4f6300b17
Fixed #19508 -- Implemented uri_to_iri as per RFC.
Thanks Loic Bistuer for helping in shaping the patch and Claude Paroz for the review.
Diffstat (limited to 'docs/ref/unicode.txt')
-rw-r--r-- docs/ref/unicode.txt 31
1 files changed, 24 insertions, 7 deletions
diff --git a/docs/ref/unicode.txt b/docs/ref/unicode.txt
index 90201d2d33..21e8c537c8 100644
--- a/docs/ref/unicode.txt
+++ b/docs/ref/unicode.txt
@@ -173,11 +173,11 @@ URL from an IRI_ -- very loosely speaking, a URI_ that can contain Unicode
characters. Quoting and converting an IRI to URI can be a little tricky, so
Django provides some assistance.
-* The function ``django.utils.encoding.iri_to_uri()`` implements the
- conversion from IRI to URI as required by the specification (:rfc:`3987`).
+* The function :func:`django.utils.encoding.iri_to_uri()` implements the
+ conversion from IRI to URI as required by the specification (:rfc:`3987#section-3.1`).
-* The functions ``django.utils.http.urlquote()`` and
- ``django.utils.http.urlquote_plus()`` are versions of Python's standard
+* The functions :func:`django.utils.http.urlquote()` and
+ :func:`django.utils.http.urlquote_plus()` are versions of Python's standard
``urllib.quote()`` and ``urllib.quote_plus()`` that work with non-ASCII
characters. (The data is converted to UTF-8 prior to encoding.)
@@ -213,12 +213,29 @@ you can construct your IRI without worrying about whether it contains
non-ASCII characters and then, right at the end, call ``iri_to_uri()`` on the
result.
-The ``iri_to_uri()`` function is also idempotent, which means the following is
-always true::
+Similarly, Django provides :func:`django.utils.encoding.uri_to_iri()` which
+implements the conversion from URI to IRI as per :rfc:`3987#section-3.2`.
+It decodes all percent-encodings except those that don't represent a valid
+UTF-8 sequence.
+
+An example to demonstrate::
+
+ >>> uri_to_iri('/%E2%99%A5%E2%99%A5/?utf8=%E2%9C%93')
+ '/♥♥/?utf8=✓'
+ >>> uri_to_iri('%A9helloworld')
+ '%A9helloworld'
+
+In the first example, the UTF-8 characters and reserved characters are
+unquoted. In the second, the percent-encoding remains unchanged because it
+lies outside the valid UTF-8 range.
+
+Both ``iri_to_uri()`` and ``uri_to_iri()`` functions are idempotent, which means the
+following is always true::
iri_to_uri(iri_to_uri(some_string)) = iri_to_uri(some_string)
+ uri_to_iri(uri_to_iri(some_string)) = uri_to_iri(some_string)
-So you can safely call it multiple times on the same IRI without risking
+So you can safely call it multiple times on the same URI/IRI without risking
double-quoting problems.
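
The idempotence claim can be illustrated with a minimal sketch that uses only
Python's standard library. The helper name ``iri_to_uri_sketch`` and the
``SAFE`` character set are assumptions made for illustration; they are not
taken from this patch and may differ from Django's implementation. The key
point is that ``%`` is treated as safe, so existing percent-encodings are not
encoded a second time.

```python
from urllib.parse import quote

# Assumption: characters left untouched by the conversion. "%" is included
# so that already percent-encoded sequences pass through unchanged.
SAFE = "/#%[]=:;$&()+,!?*@'~"


def iri_to_uri_sketch(iri: str) -> str:
    """Hypothetical helper: percent-encode non-ASCII characters as UTF-8."""
    return quote(iri, safe=SAFE)


once = iri_to_uri_sketch("/♥♥/?utf8=✓")
twice = iri_to_uri_sketch(once)

# Applying the conversion a second time changes nothing.
assert once == twice

# By contrast, quote() with its default safe set re-encodes "%",
# which is the double-quoting problem described above.
assert quote(once) != once
```

Leaving ``%`` out of the safe set would turn ``%E2`` into ``%25E2`` on the
second call, so repeated calls would no longer be harmless.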
.. _URI: http://www.ietf.org/rfc/rfc2396.txt