engelbert gruber | 1 Sep 2008 16:26

Patch [ 1878977 ] make_id(): deaccent characters

Hello,

Any objections to applying this patch?

--- docutils/nodes.py   (revision 5503)
+++ docutils/nodes.py   (working copy)
@@ -1766,13 +1766,183 @@
    .. _HTML 4.01 spec: http://www.w3.org/TR/html401
    .. _CSS1 spec: http://www.w3.org/TR/REC-CSS1
    """
-    id = _non_id_chars.sub('-', ' '.join(string.lower().split()))
+    if isinstance(string, unicode):
+        id = string.lower().translate(_non_id_translate)
+    else:
+        try:
+            id = string.decode().lower().translate(_non_id_translate)
+        except UnicodeDecodeError:
+            id = string.lower()
+    id = _non_id_chars.sub('-', ' '.join(id.split()))
    id = _non_id_at_ends.sub('', id)
    return str(id)

 _non_id_chars = re.compile('[^a-z0-9]+')
 _non_id_at_ends = re.compile('^[-0-9]+|-+$')
+_non_id_translate = {
+    # From Latin-1 Supplement
+    0x00df: u'ss',      # sharp s
+    0x00e0: ord('a'),   # a with grave
and 180 other mappings
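
For readers who want to try the idea outside the patch, here is a minimal, self-contained sketch in Python 3 (not the patch itself). It assumes a two-entry stand-in for the translation table, and since Python 3 has a single text type, the ``unicode``/``str`` branching from the patch is left out.

```python
import re

# Abbreviated stand-in for the patch's table (assumption: only the two
# entries quoted above; the real patch maps about 180 more characters).
_non_id_translate = {
    0x00df: 'ss',       # sharp s
    0x00e0: ord('a'),   # a with grave
}
_non_id_chars = re.compile('[^a-z0-9]+')
_non_id_at_ends = re.compile('^[-0-9]+|-+$')


def make_id(string):
    """Lowercase, deaccent via the table, then reduce to [a-z0-9-]."""
    id = string.lower().translate(_non_id_translate)
    # Collapse whitespace, then replace each run of non-id characters
    # with a single hyphen.
    id = _non_id_chars.sub('-', ' '.join(id.split()))
    # Strip leading hyphens/digits and trailing hyphens.
    id = _non_id_at_ends.sub('', id)
    return id


if __name__ == '__main__':
    print(make_id('Straße à la carte'))
```

Characters missing from the table still fall through to ``_non_id_chars`` and become hyphens, as before the patch.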

Is the test ``isinstance(string, unicode)`` required?

Tests pass, and I would extend test_nodes.test_make_id a little.

cheers

