      1 
      2 Multilingualization of w3m
      3                                                               2003/03/08
      4                                                               H. Sakamoto
      5 
      6 Introduction
      7 
      8   I have attempted the multilingualization of w3m (w3m-m17n).
      9   The patch for w3m-0.4.1 is available at the following site.
     10 
     11     http://www2u.biglobe.ne.jp/~hsaka/w3m/index.html#m17n
     12                                           patch/w3m-0.4.1-m17n-20030308.tar.gz
     13                                           patch/README.m17n
     14 
     15   This is a development version, and it has not been tested enough
     16   because I understand only Japanese. Please use it, test it, and report bugs.
     17 
     18   w3m-m17n currently provides the following features.
     19 
     20 Supported encoding schemes (character sets)
     21 
     22   * Japanese
     23       EUC-JP           - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212
     24       (EUC-JISX0213)     (JIS X 0213)
     25       ISO-2022-JP      - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212, etc.
     26       ISO-2022-JP-2    - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212,
     27                          GB 2312, KS X 1001, ISO 8859-1, ISO 8859-7, etc.
     28       ISO-2022-JP-3    - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0213, etc.
     29       Shift_JIS(CP932) - US_ASCII, JIS X 0208, JIS X 0201, CP932 extension
     30       Shift_JISX0213   - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0213
     31   * Chinese (simplified)
     32       EUC-CN(GB2312) - US_ASCII, GB 2312
     33       ISO-2022-CN    - US_ASCII, GB 2312, CNS-11643-1,..7, etc.
     34       GBK(CP936)     - US_ASCII, GB 2312, GBK
     35       GB18030        - US_ASCII, GB 2312, GBK, GB18030, Unicode
     36       HZ-GB-2312     - US_ASCII, GB 2312
     37   * Chinese (Taiwan, traditional)
     38       EUC-TW        - US_ASCII, CNS 11643-1,..16
     39       ISO-2022-CN   - US_ASCII, CNS-11643-1,..7, GB 2312, etc.
     40       Big5          - Big5
     41       HKSCS         - Big5, HKSCS
     42   * Korean
     43       EUC-KR        - US_ASCII, KS X 1001 Wansung
     44       ISO-2022-KR   - US_ASCII, KS X 1001 Wansung, etc.
     45       Johab         - US_ASCII, KS X 1001 Johab
     46       UHC(CP949)    - US_ASCII, KS X 1001 Wansung, UHC
     47   * Vietnamese
     48       TCVN-5712 VN-1, VISCII 1.1, VPS, CP1258
     49   * Thai
     50       TIS-620 (ISO-8859-11), CP874
     51   * Other
     52       US_ASCII, ISO-8859-1 to 10, 13 to 15,
     53       KOI8-R, KOI8-U, NeXT, CP437, CP737, CP775, CP850, CP852, CP855, CP856,
     54       CP857, CP860, CP861, CP862, CP863, CP864, CP865, CP866, CP869, CP1006,
     55       CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257
     56   * Unicode (UCS-4)
     57       UTF-8, UTF-7
     58 
     59   NOTE:
     60     * The left half of JIS X 0201 and GB 1988 (Chinese ASCII) are
     61       treated as US_ASCII because they are used in the tags of HTML documents.
     62       Other variants of US_ASCII are left unchanged.
     63     * JIS C 6226 (old JIS) is treated as JIS X 0208.
     64     * The sequence '~\n' of HZ is not supported.
     65 
     66 Display
     67 
     68   There are three methods for multilingual display.
     69 
     70   (1) kterm + ISO-2022-JP/CN/KR
     71 
     72     * kterm can handle JIS X 0213 and CNS 11643 if the following patch
     73       is applied.
     74         http://www.st.rim.or.jp/~hanataka/kterm-6.2.0.ext02.patch.gz
     75 
     76     * Specify the fontList for kterm with -fl option or in ~/.Xdefaults.
     77     
     78         -fl "*--16-*-jisx0213.2000-*,\
     79              *--16-*-jisx0212.1990-0,\
     80              *--16-*-ksc5601.1987-0,\
     81              *--16-*-gb2312.1980-0,\
     82              *--16-*-cns11643.1992-*,\
     83              *--16-*-iso8859-*"
     84 
     85       JIS X 0213 fonts are available at
     86         http://www.mars.sphere.ne.jp/imamura/jisx0213.html
     87 
     88     * Set "display_charset" to ISO-2022-JP (or ISO-2022-JP-2, KR, CN)
     89       and "strict_iso2022" to OFF on the option panel (see below).
     90 
     91   (2) xterm + UTF-8
     92 
     93     * Use the xterm (xterm-140 or later) from XFree86.
     94         http://www.clark.net/pub/dickey/xterm/xterm.html
     95 
     96     * Unicode fonts are available at
     97         http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html
     98         http://openlab.ring.gr.jp/efont/index.html.en
     99 
    100     * Run xterm with the -u8 option.
    101       Fonts can be specified as follows:
    102         -fn "*-medium-*--13-*-iso10646-1" \
    103         -fb "*-bold-*--13-*-iso10646-1" \
    104         -fw "*-medium-*-ja-13-*-iso10646-1"
    105 
    106     * Set "display_charset" to UTF-8.
    107       Setting "pre_conv" to ON is also recommended.
    108 
    109   (3) mlterm + ISO-2022-JP/KR/CN
    110 
    111     * Homepage
    112         http://mlterm.sourceforge.net/
    113 
    114     * Set the encoding of mlterm to ISO-2022-JP/KR/CN or UTF-8.
    115 
    116     * Set "display_charset" to ISO-2022-JP/KR/CN or UTF-8.
    117 
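The settings for method (2) can be combined into a single command line.
The following is a minimal sketch in Python. It assumes that xterm's -e
option is used to start w3m and that w3m's "-o option=value" flag sets
"display_charset" and "pre_conv" for that session. The helper name
xterm_utf8_argv and the value 1 for ON are assumptions made for
illustration and are not part of the w3m-m17n patch.

```python
import shlex

# Font patterns copied from the xterm + UTF-8 settings in this README.
FONT_OPTIONS = [
    "-fn", "*-medium-*--13-*-iso10646-1",
    "-fb", "*-bold-*--13-*-iso10646-1",
    "-fw", "*-medium-*-ja-13-*-iso10646-1",
]


def xterm_utf8_argv(url):
    """Return an argument list that starts w3m inside a UTF-8 xterm.

    Hypothetical helper: it only builds the list and runs nothing.
    """
    w3m = [
        "w3m",
        "-o", "display_charset=UTF-8",  # display charset recommended above
        "-o", "pre_conv=1",             # assumption: 1 stands for ON
        url,
    ]
    return ["xterm", "-u8", *FONT_OPTIONS, "-e", *w3m]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(xterm_utf8_argv("http://www2u.biglobe.ne.jp/~hsaka/")))
```

The list can be passed to subprocess.run() once the fonts named in the
patterns are installed.
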
    118 Command line options
    119 
    120    -I <document charset>
    121    -O <display/output charset>
    122 
    123         j(p):      ISO-2022-JP
    124         j(p)2:     ISO-2022-JP-2
    125         j(p)3:     ISO-2022-JP-3
    126         cn:        ISO-2022-CN
    127         kr:        ISO-2022-KR
    128         e(j):      EUC-JP
    129         ec,g(b):   EUC-CN(GB2312)
    130         et:        EUC-TW
    131         ek:        EUC-KR
    132         s(jis):    Shift_JIS
    133         sjisx0213: Shift_JISX0213
    134         gbk:       GBK
    135         gb18030:   GB18030
    136         h(z):      HZ-GB-2312
    137         b(ig5):    Big5
    138         hk(scs):   HKSCS
    139         jo(hab):   Johab
    140         uhc:       UHC
    141         l?:        ISO-8859-?
    142         t(is):     TIS-620(ISO-8859-11)
    143         tc(vn):    TCVN-5712 VN-1
    144         v(iscii):  VISCII 1.1
    145         vp(s):     VPS
    146         ko(i8r):   KOI8-R
    147         koi8u:     KOI8-U
    148         n(ext):    NeXT
    149         cp???:     CP???
    150         w12??:     CP12??
    151         u(tf8):    UTF-8
    152         u(tf)7:    UTF-7
    153 
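The abbreviations in the table above appear to use parentheses for
optional letters and a comma for alternatives, so "j(p)" stands for "j"
or "jp", and "ec,g(b)" for "ec", "g", or "gb". The Python sketch below
expands the notation under that reading. It only illustrates the table
and is not w3m's own option parser; the function name expand_abbrev is
hypothetical.

```python
import re


def expand_abbrev(spec):
    """Expand README notation such as 'u(tf)7' into the names it stands for."""
    names = []
    for alt in spec.split(","):            # a comma separates alternatives
        alt = alt.strip()
        m = re.fullmatch(r"([^()]*)\(([^()]*)\)([^()]*)", alt)
        if m is None:                      # no optional part in this name
            names.append(alt)
            continue
        head, optional, tail = m.groups()
        names.append(head + tail)              # short form, optional letters dropped
        names.append(head + optional + tail)   # long form, optional letters kept
    return names


if __name__ == "__main__":
    for spec in ["j(p)", "u(tf)7", "ec,g(b)", "gbk"]:
        print(spec, "->", expand_abbrev(spec))
```

Under this reading, "-I e" and "-I ej" would both select EUC-JP as the
document charset.
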
    154 Option panel
    155 
    156    display_charset
    157        Display charset.
    158    document_charset
    159        Default document charset.
    160    auto_detect
    161        Automatic charset detection when loading. (Default: ON)
    162    system_charset
    163        System charset. It is used for configuration files and file names.
    164    follow_locale
    165        System charset follows locale($LANG). (Default: ON)
    166    ext_halfdump
    167        Output in the display charset when -halfdump is used.
    168    search_conv
    169        Adjust search string for document charset. (Default: ON)
    170    use_wide
    171        Use multi-column characters. (Default: ON)
    172    use_combining
    173        Use combining characters. (Default: ON)
    174    use_language_tag
    175        Use Unicode language tags. (Default: ON)
    176    ucs_conv
    177        Charset conversion using Unicode map. (Default: ON)
    178    pre_conv
    179        Charset conversion when loading. (Default: OFF)
    180    fix_width
    181        Fix character width during conversion. (Default: ON)
    182        If it is OFF, the rendering may break.
    183    use_gb12345_map
    184        Use GB 12345 Unicode map instead of GB 2312's. (Default: OFF)
    185        If it is ON, GB2312 can be converted to Big5, EUC-TW, or EUC-JP.
    186    use_jisx0201
    187        Use JIS X 0201 Roman for ISO-2022-JP. (Default: OFF)
    188    use_jisc6226
    189        Use JIS C 6226:1978 for ISO-2022-JP. (Default: OFF)
    190    use_jisx0201k
    191        Use JIS X 0201 Katakana. (Default: OFF)
    192    use_jisx0212
    193        Use JIS X 0212:1990 (Supplemental Kanji). (Default: OFF)
    194    use_jisx0213
    195        Use JIS X 0213:2000 (2000JIS). (Default: OFF)
    196    strict_iso2022
    197        Strict ISO-2022-JP/KR/CN. (Default: ON)
    198        If it is OFF, all ISO 2022 based character sets can be displayed
    199        with ISO-2022-JP/KR/CN.
    200    east_asian_width
    201        Use double width for some Unicode characters. (Default: OFF)
    202        If it is ON, treat East Asian Ambiguous characters as double width.
    203    gb18030_as_ucs
    204        Treat 4-byte characters of GB18030 as Unicode. (Default: OFF)
    205    simple_preserve_space
    206        Simple space preservation.
    207        If it is ON, a space is retained in Japanese and some other languages.
    208 
    209    alt_entity
    210        Use alternate expression with ASCII for entities. (Default: ON)
    211        If it is OFF, entities are treated as ISO 8859-1.
    212    graphic_char
    213        Use DEC special graphics for the borders of tables and menus.
    214        If it is OFF, ruled-line characters are used with CJK charsets or UTF-8.
    215 
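To illustrate what "east_asian_width" changes, the following Python
sketch counts display columns with the standard unicodedata module. It
treats Wide and Fullwidth characters as two columns and lets a flag
decide whether East Asian Ambiguous characters take one column or two.
The function name display_width is hypothetical, combining characters
are ignored for brevity, and w3m's own width tables may differ.

```python
import unicodedata


def display_width(text, ambiguous_wide=False):
    """Count terminal columns for text (simplified; ignores combining characters)."""
    columns = 0
    for ch in text:
        kind = unicodedata.east_asian_width(ch)
        if kind in ("W", "F"):                 # Wide or Fullwidth: always two columns
            columns += 2
        elif kind == "A" and ambiguous_wide:   # Ambiguous: two columns only on request
            columns += 2
        else:
            columns += 1
    return columns


if __name__ == "__main__":
    # GREEK SMALL LETTER ALPHA, HIRAGANA LETTER A, LATIN CAPITAL LETTER A
    sample = "\u03b1\u3042A"
    print(display_width(sample), display_width(sample, ambiguous_wide=True))
```

With the flag set, the Greek letter is counted as two columns, which
matches terminals that draw such characters with CJK fonts.
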
    216 Code conversion
    217 
    218   The following special code conversions are supported.
    219     * EUC-JP <-> ISO-2022-JP <-> Shift_JIS
    220     * EUC-CN <-> ISO-2022-CN <-> HZ-GB-2312
    221     * EUC-TW <-> ISO-2022-CN
    222     * EUC-KR <-> ISO-2022-KR <-> Johab (only Symbol and Hanja)
    223 
    224   Other conversions are based on Unicode.
    225 
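Direct conversion inside one group is presumably possible because the
encodings carry the same character set codes in different byte layouts.
As an illustration, the Python sketch below converts EUC-JP to Shift_JIS
arithmetically, without a Unicode table. It assumes that the input
contains only US_ASCII, JIS X 0208, and JIS X 0201 Katakana. It is not
the code used in w3m-m17n, and the function names are hypothetical.

```python
def jis_to_sjis(j1, j2):
    """Map a JIS X 0208 code (two bytes in 0x21..0x7E) to Shift_JIS bytes."""
    s1 = ((j1 - 0x21) >> 1) + 0x81
    if s1 > 0x9F:
        s1 += 0x40                 # lead bytes jump over 0xA0..0xDF
    if j1 & 1:                     # odd row: second byte starts at 0x40
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1                # 0x7F is never used as a second byte
    else:                          # even row: second byte starts at 0x9F
        s2 = j2 + 0x7E
    return s1, s2


def euc_jp_to_sjis(data):
    """Convert EUC-JP bytes to Shift_JIS without going through Unicode."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                       # US_ASCII passes through
            out.append(b)
            i += 1
        elif b == 0x8E:                    # SS2: one JIS X 0201 Katakana byte follows
            out.append(data[i + 1])
            i += 2
        elif 0xA1 <= b <= 0xFE:            # JIS X 0208: clear the high bits first
            out.extend(jis_to_sjis(b & 0x7F, data[i + 1] & 0x7F))
            i += 2
        else:                              # SS3 (JIS X 0212) has no Shift_JIS form
            raise ValueError("unsupported byte 0x%02X" % b)
    return bytes(out)


if __name__ == "__main__":
    # Compare against Python's own codecs for a short mixed string.
    text = "w3m \u6f22\u5b57\u3042\u30a2"
    print(euc_jp_to_sjis(text.encode("euc_jp")) == text.encode("shift_jis"))
```
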
    226 Change document charset
    227 
    228    Press '=' (show document information) and select the document charset.
    229 
    230    If you specify the following keymaps,
    231      keymap C CHARSET
    232      keymap M-c DEFAULT_CHARSET
    233    you can press `C' to change the current document charset,
    234    and `M-c' to change the default document charset.
    235 
    236 Line Editing 
    237 
    238   The input coding system follows the display coding system.
    239 
    240   NOTE:
    241     * HZ cannot be used as an input coding system.
    242     * Input with ISO-2022-CN or ISO-2022-KR will probably fail, because
    243       SI(\017) and SO(\016) are already assigned to other command keys
    244       (SO is assigned to `next-history'). If you want to use SI and SO,
    245       press C-@(^@). After that, SI, SO, SS2, SS3, LS2, and LS3 of
    246       7-bit ISO-2022 are recognized. When you press C-@ again, the default
    247       bindings are restored.
    248 
    249 Regular expression
    250 
    251    Multilingual regular expressions are supported.
    252 
    253 -----------------------------------
    254 Change log
    255 
    256 2003/03/08      w3m-0.4.1-m17n-20030308
    257  * Based on w3m-0.4.1
    258 
    259 2003/02/24      w3m-0.4-m17n-20030224
    260  * Based on w3m-0.4
    261 
    262 2003/02/11      w3m-0.4rc1-m17n-20030211
    263  * Based on w3m-0.4rc1
    264 
    265 2003/02/07      w3m-0.3.2.2-m17n-20030207
    266  * Based on w3m-0.3.2.2+cvs-1.742
    267 
    268 2003/02/01      w3m-0.3.2.2-m17n-20030201
    269  * Based on w3m-0.3.2.2+cvs-1.734
    270 
    271 2003/01/31      w3m-0.3.2.2-m17n-20030131
    272  * Based on w3m-0.3.2.2+cvs-1.732
    273 
    274 2003/01/23      w3m-0.3.2.2-m17n-20030123
    275  * Based on w3m-0.3.2.2+cvs-1.705
    276 
    277 2003/01/22      w3m-0.3.2.2-m17n-20030122
    278  * Based on w3m-0.3.2.2+cvs-1.699
    279 
    280 2003/01/01      w3m-0.3.2.2-m17n-20030101
    281  * Based on w3m-0.3.2.2+cvs-1.655
    282 
    283 2002/12/22      w3m-0.3.2.2-m17n-20021222
    284  * Based on w3m-0.3.2.2+cvs-1.640
    285 
    286 2002/12/19      w3m-0.3.2.2-m17n-20021219
    287  * Based on w3m-0.3.2.2+cvs-1.635
    288 
    289 2002/12/07      w3m-0.3.2.2-m17n-20021207
    290  * Based on w3m-0.3.2.2+cvs-1.599
    291  * Fixed a problem on int != long system
    292 
    293 2002/11/27	w3m-0.3.2.1-m17n-20021127
    294  * Based on w3m-0.3.2.1+cvs-1.562
    295 
    296 2002/11/20	w3m-0.3.2-m17n-20021120
    297  * Based on w3m-0.3.2+cvs-1.538
    298 
    299 2002/11/18
    300  * Added UTF-7 to auto detection of charset.
    301 
    302 2002/11/16	w3m-0.3.2-m17n-20021116
    303  * Based on w3m-0.3.2+cvs-1.526
    304 
    305 2002/11/13	w3m-0.3.2-m17n-20021113
    306  * Based on w3m-0.3.2+cvs-1.506
    307 
    308 2002/11/12	w3m-0.3.2-m17n-20021112
    309  * Based on w3m-0.3.2+cvs-1.498
    310 
    311 2002/11/09	w3m-0.3.2-m17n-20021109
    312  * Based on w3m-0.3.2+cvs-1.490
    313 
    314 2002/11/07	w3m-0.3.2-m17n-20021107
    315  * Based on w3m-0.3.2
    316  * Applied [w3m-dev 03371]
    317 
    318 2002/10/22	w3m-0.3.1-m17n-20021022
    319  * Based on w3m-0.3.1+cvs-1.444
    320 
    321 2002/07/17	w3m-0.3.1-m17n-20020717
    322  * Based on w3m-0.3.1
    323 
    324 2002/05/29	w3m-0.3-m17n-20020529
    325  * Based on w3m-0.3+cvs-1.379.
    326 
    327 2002/03/16	w3m-0.3-m17n-20020316
    328  * Based on w3m-0.3+cvs-1.353.
    329 
    330 2002/03/11	w3m-0.3-m17n-20020311
    331  * Based on w3m-0.3+cvs-1.342.
    332  * Some bug fixes.
    333 
    334 2002/02/16	w3m-0.2.5-m17n-20020216
    335  * Based on w3m-0.2.5+cvs-1.319.
    336  * Added an option "use_wide"
    337 
    338 2002/02/05	w3m-0.2.5-m17n-20020205
    339  * Based on w3m-0.2.5+cvs-1.302.
    340 
    341 2002/02/02	w3m-0.2.5-m17n-20020202
    342  * Based on w3m-0.2.5+cvs-1.291.
    343 
    344 2002/01/31	w3m-0.2.4-m17n-20020131
    345  * Based on w3m-0.2.4+cvs-1.278.
    346 
    347 2002/01/29	w3m-0.2.4-m17n-20020129
    348  * Based on w3m-0.2.4+cvs-1.268.
    349  * Some bug fixes.
    350 
    351 2002/01/28	w3m-0.2.4-m17n-20020128
    352  * Based on w3m-0.2.4+cvs-1.265.
    353 
    354 2002/01/08	w3m-0.2.4-m17n-20020108
    355  * Based on w3m-0.2.4.
    356 
    357 2002/01/07
    358  * Replaced some wc_conv,wc_Str_conv with wc_conv_strict,wc_Str_conv_strict.
    359 
    360 2001/12/31
    361  * Added the conversion between HKSCS and Unicode.
    362  * Changed the conversion table between Big5 and Unicode.
    363  * Deleted the special conversion between Big5 and CNS11643.
    364  * Fixed HKSCS.
    365 
    366 2001/12/30	w3m-0.2.3.2-m17n-20011230
    367  * Based on w3m-0.2.3.2+cvs-1.196.
    368 
    369 2001/12/22	w3m-0.2.3.2-m17n-20011222
    370  * Based on w3m-0.2.3.2.
    371  * [w3m-dev-en 00660] can't compile if INET6 is defined
    372  * [w3m-dev-en 00663] double meanings for WC_N_??? 
    373 
    374 2001/12/21	w3m-0.2.3.1-m17n-20011221
    375  * Based on w3m-0.2.3.1.
    376  * Support of HKSCS, KOI8-U, UTF-7.
    377    The conversion table between HKSCS and Unicode is not yet available.
    378  * Add the conversion between ISO 8859-16 and Unicode.
    379  * Add option 'ext_halfdump'.
    380 
    381 2001/04/14	w3m-(0.2.1)-m17n-0.20
    382  * Support of UTF-7.
    383  * [w3m-dev 01913] ([w3m-dev-en 00452])
    384 
    385 2001/04/12	w3m-(0.2.1)-m17n-0.19
    386  * TILDE of JISX0212, JISX0213 -> FULLWIDTH TILDE of Unicode.
    387  * MICRO SIGN of Unicode -> GREEK SMALL MU of JISX0208.
    388  * [w3m-dev 01892], [w3m-dev 01894], [w3m-dev 01898], [w3m-dev 01902]
    389 
    390 2001/03/31
    391  * Changed the implementation of <_SYMBOL> again.
    392  * With the -dump option, "pre_conv" is false by default.
    393 
    394 2001/03/29
    395  * Support combining characters of TCVN 5712.
    396  * [w3m-dev 01873], [w3m-dev-en 00411].
    397 
    398 2001/03/28
    399  * Setting -suffix="" is now accepted by configure. (thanks to naddy!)
    400  * Bugfix: when #define USE_SSL and #undef USE_SSL_VERIFY, rc.c
    401    doesn't compile. (thanks to naddy!)
    402  * [w3m-dev 01859].
    403  * Bugfix: 0xA0 is an error in Shift_JIS.
    404  * Changed the implementation of <_SYMBOL> ([w3m-dev 01852]).
    405 
    406 2001/03/24	w3m-(0.2.1)-m17n-0.18
    407  * Based on w3m-0.2.1.
    408  * [w3m-dev 01703], [w3m-dev 01814], [w3m-dev 01823]
    409  * Separated ISO-2022-JP-3 from ISO-2022-JP.
    410  * Improved auto detection.
    411 
    412 2001/03/23
    413  * Based on w3m-0.2.0.
    414 
    415 2001/03/21
    416  * Added functions (CHARSET and DEFAULT_CHARSET).
    417  * Improved document charset detection of frame HTML.
    418 
    419 2001/03/20
    420  * Conversion from FULLWIDTH variants, except ASCII, to normal characters.
    421 
    422 2001/03/18	w3m-(0.1.11-pre-hsaka24)-m17n-0.17
    423  * Based on "[w3m-dev 01779] w3m-0.1.11-pre-hsaka24".
    424  * Prefer JIS X 0213 to JIS X 0212.
    425 
    426 2001/03/14      w3m-(0.1.11-pre-kokb23)-m17n-0.16
    427  * Add the conversion between JIS X 0213 and Unicode Extension B.
    428  * Bugfix: conversion between JIS X 0213 and Unicode.
    429  * Bugfix: treat UHC as Hangul.
    430  * Ignore "search_conv" if "pre_conv" is ON.
    431 
    432 2001/03/09	w3m-(0.1.11-pre-kokb23)-m17n-0.15
    433  * Improvement of wc_wchar_t (mainly for Unicode).
    434  * Some bugfixes for Unicode.
    435  * Ignore "use_gb12345_map" option when output with GBK or GB18030.
    436  * With the -dump option, "pre_conv" is always true.
    437  * With the -dump or -halfdump option, some processing is skipped.
    438  * Get system charset from the environment variable LC_CTYPE -> LANG -> LC_ALL.
    439  * Bugfixes: [w3m-dev 01724], [w3m-dev 01726], [w3m-dev 01752],
    440    [w3m-dev 01753], [w3m-dev 01754]
    441 
    442 2001/03/06	w3m-(0.1.11-pre-kokb23)-m17n-0.14
    443  * Support of Language tag (UTR#7).
    444  * Bugfix: conversion between GB18030, Johab and Unicode.
    445 
    446 2001/03/04	w3m-(0.1.11-pre-kokb23)-m17n-0.13
    447  * Support of GBK(CP936), GB18030, UHC(CP949) !
    448  * Unicode mapping table of GB2312 and GB12345 became compatible with
    449    CP936, GB18030. (Code point: 0xA1A4, 0xA1AA)
    450  * Allow 0xFFFE and 0xFFFF in Unicode (due to compatibility with GB18030).
    451  * Bugfix: code point of NBSP in Unicode.
    452 
    453 2001/03/03	w3m-(0.1.11-pre-kokb23)-m17n-0.12
    454  * I wrote English README.m17n.
    455 
    456 -------------------------------------------
    457 Hironori Sakamoto <hsaka@mth.biglobe.ne.jp>
    458  http://www2u.biglobe.ne.jp/~hsaka/
    459