src/pyams_utils/doctests/unicode.txt
changeset 435 4504a27af426
parent 434 7f256d281e84
child 436 f7154a8ec9eb

Unicode functions
-----------------

When working with extended character sets containing accented characters, strings must be
converted to UTF-8 so that they can be used without any conversion problem.

    >>> from pyams_utils import unicode

'translate_string' is a utility function which can be used, for example, to generate an object's ID
without spaces and with accented characters converted to their unaccented equivalents:

    >>> sample = 'Mon titre accentué'
    >>> unicode.translate_string(sample)
    'mon titre accentue'
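
The accent removal shown above can be approximated with the standard library. The following is
only a minimal sketch, assuming an approach based on Unicode decomposition; 'strip_accents' is a
hypothetical helper and is not necessarily how 'translate_string' is implemented:

```python
import unicodedata


def strip_accents(value, force_lower=True):
    """Hypothetical helper: replace accented characters with unaccented ones."""
    # NFKD decomposition splits 'é' into 'e' followed by a combining accent
    decomposed = unicodedata.normalize('NFKD', value)
    # Drop the combining marks and keep only the base characters
    stripped = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower() if force_lower else stripped


print(strip_accents('Mon titre accentué'))         # mon titre accentue
print(strip_accents('Mon titre accentué', False))  # Mon titre accentue
```

Decomposing first and filtering afterwards avoids maintaining a hand-written table of accented
characters.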
       
Results are lower-cased by default; to preserve the original case, set the 'force_lower' argument
to False:

    >>> unicode.translate_string(sample, force_lower=False)
    'Mon titre accentue'
    >>> unicode.translate_string(sample, force_lower=True, spaces='-')
    'mon-titre-accentue'

Punctuation is also removed by default; the 'remove_punctuation', 'spaces' and 'keep_chars'
arguments control how punctuation and spaces are handled:

    >>> sample = 'Texte accentué avec "ponctuation" !'
    >>> unicode.translate_string(sample, force_lower=True, spaces=' ')
    'texte accentue avec ponctuation'
    >>> unicode.translate_string(sample, force_lower=True, remove_punctuation=False, spaces=' ')
    'texte accentue avec "ponctuation" !'
    >>> unicode.translate_string(sample, force_lower=True, remove_punctuation=False, spaces='-')
    'texte-accentue-avec-"ponctuation"-!'
    >>> unicode.translate_string(sample, force_lower=True, remove_punctuation=True, spaces='-')
    'texte-accentue-avec-ponctuation'
    >>> unicode.translate_string(sample, force_lower=True, remove_punctuation=True, spaces=' ', keep_chars='!')
    'texte accentue avec ponctuation !'
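
The interplay between punctuation removal, kept characters and space replacement can be sketched
as follows. 'clean_punctuation' is a hypothetical helper, and the use of 'string.punctuation' is an
assumption: the real function evidently uses a different character set, since it keeps the dot in
a file name such as 'test.txt'.

```python
import string


def clean_punctuation(value, spaces=' ', keep_chars=''):
    """Hypothetical helper: drop ASCII punctuation, then replace spaces."""
    # Punctuation to remove, minus any characters the caller wants to keep
    to_remove = set(string.punctuation) - set(keep_chars)
    cleaned = ''.join(ch for ch in value if ch not in to_remove)
    # split() without arguments collapses whitespace runs and trims both ends
    return spaces.join(cleaned.split())


text = 'texte accentue avec "ponctuation" !'
print(clean_punctuation(text))                  # texte accentue avec ponctuation
print(clean_punctuation(text, spaces='-'))      # texte-accentue-avec-ponctuation
print(clean_punctuation(text, keep_chars='!'))  # texte accentue avec ponctuation !
```

Splitting and re-joining on whitespace also removes the trailing space that is left behind once
the final '!' has been dropped.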
       
If the input string contains slashes (/) or backslashes (\), they are removed by default.
With the 'escape_slashes' parameter, the input string is instead split on these characters and
only the last element is returned, which is handy for handling file names on Windows:

    >>> sample = 'Autre / chaîne / accentuée'
    >>> unicode.translate_string(sample)
    'autre chaine accentuee'
    >>> unicode.translate_string(sample, escape_slashes=True)
    'accentuee'
    >>> sample = 'C:\\Program Files\\My Application\\test.txt'
    >>> unicode.translate_string(sample)
    'cprogram filesmy applicationtest.txt'
    >>> unicode.translate_string(sample, escape_slashes=True)
    'test.txt'
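
The "keep only the last element" behaviour can be illustrated with a regular expression split.
This is a sketch under the assumption that both separators are treated alike;
'last_path_element' is a hypothetical name, and no accent or case conversion is applied here:

```python
import re


def last_path_element(value):
    """Hypothetical helper: keep the text after the last slash or backslash."""
    # Split on either separator and keep the final piece, trimmed of spaces
    return re.split(r'[/\\]', value)[-1].strip()


print(last_path_element('Autre / chaîne / accentuée'))                    # accentuée
print(last_path_element('C:\\Program Files\\My Application\\test.txt'))  # test.txt
```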
       
To remove the remaining spaces or replace them with another string, use the 'spaces' parameter,
which accepts any string to substitute for the original spaces:

    >>> sample = 'C:\\Program Files\\My Application\\test.txt'
    >>> unicode.translate_string(sample, spaces=' ')
    'cprogram filesmy applicationtest.txt'
    >>> unicode.translate_string(sample, spaces='-')
    'cprogram-filesmy-applicationtest.txt'

Space replacement is performed as the last step, so when it is combined with the 'escape_slashes'
parameter it only affects the final result:

    >>> unicode.translate_string(sample, escape_slashes=True, spaces='-')
    'test.txt'
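
Because 'test.txt' contains no space, the effect of this ordering is easier to see with a file
name that does. The sketch below uses a hypothetical 'slug_sketch' function that only models the
order of the two steps; it does not remove accents or punctuation, so the colon is kept:

```python
import re


def slug_sketch(value, escape_slashes=False, spaces=' '):
    """Hypothetical pipeline: handle slashes first, replace spaces last."""
    if escape_slashes:
        # Step 1a: keep only the last path element
        value = re.split(r'[/\\]', value)[-1]
    else:
        # Step 1b: simply drop slashes and backslashes
        value = re.sub(r'[/\\]', '', value)
    # Step 2 (last): collapse whitespace and substitute the replacement string
    return spaces.join(value.lower().split())


path = 'C:\\Program Files\\My Application\\my file.txt'
print(slug_sketch(path, escape_slashes=True, spaces='-'))  # my-file.txt
print(slug_sketch(path, spaces='-'))  # c:program-filesmy-applicationmy-file.txt
```

Only the spaces that survive the slash handling are replaced, which is why the directory names
leave no trace in the first result.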
       
The unicode module also provides encoding and decoding functions:

    >>> var = b'Cha\xeene accentu\xe9e'
    >>> unicode.decode(var, 'latin1')
    'Chaîne accentuée'
    >>> unicode.encode(unicode.decode(var, 'latin1'), 'latin1') == var
    True

    >>> utf = 'Chaîne accentuée'
    >>> unicode.encode(utf, 'latin1')
    b'Cha\xeene accentu\xe9e'
    >>> unicode.decode(unicode.encode(utf, 'latin1'), 'latin1') == utf
    True
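
In the examples above, 'decode' and 'encode' give the same results as Python's built-in
'bytes.decode' and 'str.encode'. Assuming that equivalence (it is not confirmed here), the same
round trip can be reproduced with the built-ins alone, along with the UTF-8 form of the string:

```python
raw = b'Cha\xeene accentu\xe9e'       # Latin-1 encoded bytes

text = raw.decode('latin1')           # bytes -> str
print(text)                           # Chaîne accentuée
print(text.encode('latin1') == raw)   # True

# In UTF-8, 'î' and 'é' each take two bytes instead of one
utf8 = text.encode('utf-8')
print(utf8)                           # b'Cha\xc3\xaene accentu\xc3\xa9e'
print(len(raw), len(utf8))            # 16 18
```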