
                       Multilingualization of w3m
                                                        2003/03/08
                                                        H. Sakamoto

Introduction

  I have tried the multilingualization of w3m (w3m-m17n).
  The patch for w3m-0.4.1 is available on the following site.

    http://www2u.biglobe.ne.jp/~hsaka/w3m/index.html#m17n
      patch/w3m-0.4.1-m17n-20030308.tar.gz
      patch/README.m17n

  It is a development version, and it has not been tested enough
  because I can understand only Japanese. Please use it, test it, and
  report bugs.

  w3m-m17n currently has the following features.

Supported encoding schemes (character set)

  * Japanese
      EUC-JP            - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212
      (EUC-JISX0213)      (JIS X 0213)
      ISO-2022-JP       - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212, etc.
      ISO-2022-JP-2     - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212,
                          GB 2312, KS X 1001, ISO 8859-1, ISO 8859-7, etc.
      ISO-2022-JP-3     - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0213, etc.
      Shift_JIS(CP932)  - US_ASCII, JIS X 0208, JIS X 0201, CP932 extension
      Shift_JISX0213    - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0213
  * Chinese (simplified)
      EUC-CN(GB2312)    - US_ASCII, GB 2312
      ISO-2022-CN       - US_ASCII, GB 2312, CNS 11643-1,..7, etc.
      GBK(CP936)        - US_ASCII, GB 2312, GBK
      GB18030           - US_ASCII, GB 2312, GBK, GB18030, Unicode
      HZ-GB-2312        - US_ASCII, GB 2312
  * Chinese (Taiwan, traditional)
      EUC-TW            - US_ASCII, CNS 11643-1,..16
      ISO-2022-CN       - US_ASCII, CNS 11643-1,..7, GB 2312, etc.
      Big5              - Big5
      HKSCS             - Big5, HKSCS
  * Korean
      EUC-KR            - US_ASCII, KS X 1001 Wansung
      ISO-2022-KR       - US_ASCII, KS X 1001 Wansung, etc.
      Johab             - US_ASCII, KS X 1001 Johab
      UHC(CP949)        - US_ASCII, KS X 1001 Wansung, UHC
  * Vietnamese
      TCVN-5712 VN-1, VISCII 1.1, VPS, CP1258
  * Thai
      TIS-620 (ISO-8859-11), CP874
  * Other
      US_ASCII, ISO-8859-1 through ISO-8859-10, ISO-8859-13 through ISO-8859-15,
      KOI8-R, KOI8-U, NeXT, CP437, CP737, CP775, CP850, CP852, CP855, CP856,
      CP857, CP860, CP861, CP862, CP863, CP864, CP865, CP866, CP869, CP1006,
      CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257
  * Unicode (UCS-4)
      UTF-8, UTF-7

  NOTE:
  * The left halves of JIS X 0201 and GB 1988 (Chinese ASCII) are treated
    as US_ASCII, because they are used in the tags of HTML documents.
    Other variants of US_ASCII are treated without change.
  * JIS C 6226 (old JIS) is treated as JIS X 0208.
  * The sequence '~\n' of HZ is not supported.

Display

  There are three methods for multilingual display.

  (1) kterm + ISO-2022-JP/CN/KR

    * kterm can handle JIS X 0213 and CNS 11643 if the following patch
      is applied.
        http://www.st.rim.or.jp/~hanataka/kterm-6.2.0.ext02.patch.gz

    * Specify the fontList for kterm with the -fl option or in ~/.Xdefaults.

        -fl "*--16-*-jisx0213.2000-*,\
             *--16-*-jisx0212.1990-0,\
             *--16-*-ksc5601.1987-0,\
             *--16-*-gb2312.1980-0,\
             *--16-*-cns11643.1992-*,\
             *--16-*-iso8859-*"

      JIS X 0213 fonts are available at
        http://www.mars.sphere.ne.jp/imamura/jisx0213.html

    * Set "display_charset" to ISO-2022-JP (or ISO-2022-JP-2, -KR, -CN)
      and "strict_iso2022" to OFF on the option panel (see below).

  (2) xterm + UTF-8

    * Use the xterm of XFree86 (xterm-140 or later).
        http://www.clark.net/pub/dickey/xterm/xterm.html

    * Unicode fonts are available at
        http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html
        http://openlab.ring.gr.jp/efont/index.html.en

    * Run xterm with the -u8 option. The fonts are specified as follows:
        -fn "*-medium-*--13-*-iso10646-1" \
        -fb "*-bold-*--13-*-iso10646-1" \
        -fw "*-medium-*-ja-13-*-iso10646-1"

    * Set "display_charset" to UTF-8.
      Setting "pre_conv" to ON is also recommended.

  (3) mlterm + ISO-2022-JP/KR/CN

    * Homepage
        http://mlterm.sourceforge.net/

    * Set the encoding of mlterm to ISO-2022-JP/KR/CN or UTF-8.

    * Set "display_charset" to ISO-2022-JP/KR/CN or UTF-8.

Command line options

  -I <document charset>
  -O <display/output charset>

    j(p):         ISO-2022-JP
    j(p)2:        ISO-2022-JP-2
    j(p)3:        ISO-2022-JP-3
    cn:           ISO-2022-CN
    kr:           ISO-2022-KR
    e(j):         EUC-JP
    ec, g(b):     EUC-CN (GB2312)
    et:           EUC-TW
    ek:           EUC-KR
    s(jis):       Shift_JIS
    sjisx0213:    Shift_JISX0213
    gbk:          GBK
    gb18030:      GB18030
    h(z):         HZ-GB-2312
    b(ig5):       Big5
    hk(scs):      HKSCS
    jo(hab):      Johab
    uhc:          UHC
    l?:           ISO-8859-?
    t(is):        TIS-620 (ISO-8859-11)
    tc(vn):       TCVN-5712 VN-1
    v(iscii):     VISCII 1.1
    vp(s):        VPS
    ko(i8r):      KOI8-R
    koi8u:        KOI8-U
    n(ext):       NeXT
    cp???:        CP???
    w12??:        CP12??
    u(tf8):       UTF-8
    u(tf)7:       UTF-7

Option panel

  display_charset
      Display charset.
  document_charset
      Default document charset.
  auto_detect
      Detect the charset automatically when loading. (Default: ON)
  system_charset
      System charset. It is used for configuration files and file names.
  follow_locale
      System charset follows the locale ($LANG). (Default: ON)
  ext_halfdump
      Output in the display charset with -halfdump.
  search_conv
      Convert the search string to the document charset. (Default: ON)
  use_wide
      Use multi-column characters. (Default: ON)
  use_combining
      Use combining characters. (Default: ON)
  use_language_tag
      Use Unicode language tags. (Default: ON)
  ucs_conv
      Convert charsets using Unicode maps. (Default: ON)
  pre_conv
      Convert the charset when loading. (Default: OFF)
  fix_width
      Fix character width during conversion. (Default: ON)
      If it is OFF, the layout may break.
  use_gb12345_map
      Use the GB 12345 Unicode map instead of GB 2312's. (Default: OFF)
      If it is ON, GB 2312 can be converted to Big5, EUC-TW, or EUC-JP.
  use_jisx0201
      Use JIS X 0201 Roman for ISO-2022-JP. (Default: OFF)
  use_jisc6226
      Use JIS C 6226:1978 for ISO-2022-JP. (Default: OFF)
  use_jisx0201k
      Use JIS X 0201 Katakana. (Default: OFF)
  use_jisx0212
      Use JIS X 0212:1990 (Supplemental Kanji). (Default: OFF)
  use_jisx0213
      Use JIS X 0213:2000 (2000JIS). (Default: OFF)
  strict_iso2022
      Strict ISO-2022-JP/KR/CN. (Default: ON)
      If it is OFF, all ISO 2022 based character sets can be displayed
      with ISO-2022-JP/KR/CN.
  east_asian_width
      Use double width for some Unicode characters. (Default: OFF)
      If it is ON, East Asian Ambiguous characters are treated as double
      width.
  gb18030_as_ucs
      Treat 4-byte characters of GB18030 as Unicode. (Default: OFF)
  simple_preserve_space
      Simple space preservation.
      If it is ON, spaces are kept in Japanese and some other languages.
  alt_entity
      Use alternative ASCII representations for entities. (Default: ON)
      If it is OFF, entities are treated as ISO 8859-1.
  graphic_char
      Use DEC special graphics for the borders of tables and menus.
      If it is OFF, the ruled-line characters of the CJK charset or UTF-8
      are used.

Code conversion

  The following special code conversions are supported.
  * EUC-JP <-> ISO-2022-JP <-> Shift_JIS
  * EUC-CN <-> ISO-2022-CN <-> HZ-GB-2312
  * EUC-TW <-> ISO-2022-CN
  * EUC-KR <-> ISO-2022-KR <-> Johab (only Symbol and Hanja)

  Other conversions are based on Unicode.

Change document charset

  Press '=' (show document information) and select the document charset.
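
  For JIS X 0208 characters, the special EUC-JP <-> ISO-2022-JP <-> Shift_JIS
  conversion listed under "Code conversion" needs no Unicode table: all
  three encodings carry the same 94x94 row/cell pair, and only the byte
  arithmetic differs. The following is a minimal sketch of that arithmetic
  for a single two-byte character. It is written in Python for brevity and
  is NOT w3m-m17n's implementation; the function names are hypothetical,
  and ISO-2022-JP escape sequences, JIS X 0201, JIS X 0212/0213, and CP932
  extensions are deliberately ignored.

```python
from typing import Tuple

# Hypothetical helpers (not w3m-m17n's API). Each converts ONE two-byte
# JIS X 0208 character between three byte forms:
#   EUC-JP    : both bytes in 0xA1..0xFE
#   JIS       : both bytes in 0x21..0x7E (what follows ESC $ B in ISO-2022-JP)
#   Shift_JIS : lead 0x81..0x9F or 0xE0..0xEF (JIS X 0208 range),
#               trail 0x40..0xFC except 0x7F
# Input validation and escape-sequence handling are omitted on purpose.

Pair = Tuple[int, int]


def euc_to_jis(e1: int, e2: int) -> Pair:
    """EUC-JP -> JIS: clear the high bit of both bytes."""
    return e1 & 0x7F, e2 & 0x7F


def jis_to_euc(j1: int, j2: int) -> Pair:
    """JIS -> EUC-JP: set the high bit of both bytes."""
    return j1 | 0x80, j2 | 0x80


def jis_to_sjis(j1: int, j2: int) -> Pair:
    """JIS -> Shift_JIS. Two JIS rows share one lead byte: the odd row
    takes trail bytes 0x40..0x9E (skipping 0x7F), the even row 0x9F..0xFC."""
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 & 1:
        s2 = j2 + (0x1F if j2 <= 0x5F else 0x20)
    else:
        s2 = j2 + 0x7E
    return s1, s2


def sjis_to_jis(s1: int, s2: int) -> Pair:
    """Shift_JIS -> JIS: the inverse of jis_to_sjis."""
    row = ((s1 - 0x70) if s1 <= 0x9F else (s1 - 0xB0)) << 1
    if s2 < 0x9F:  # trail byte belongs to the odd JIS row
        return row - 1, s2 - (0x20 if s2 >= 0x80 else 0x1F)
    return row, s2 - 0x7E  # even JIS row


if __name__ == "__main__":
    # HIRAGANA LETTER A is A4 A2 in EUC-JP.
    jis = euc_to_jis(0xA4, 0xA2)
    sjis = jis_to_sjis(*jis)
    print("JIS %02X %02X -> Shift_JIS %02X %02X" % (*jis, *sjis))
    print("back to EUC-JP: %02X %02X" % jis_to_euc(*sjis_to_jis(*sjis)))
```

  Run as a script, it converts HIRAGANA LETTER A from EUC-JP A4 A2 to the
  JIS bytes 24 22 and the Shift_JIS bytes 82 A0, then recovers A4 A2.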

  If you specify the following keymaps,
      keymap C CHARSET
      keymap M-c DEFAULT_CHARSET
  you can press `C' to change the current document charset
  and `M-c' to change the default document charset.

Line Editing

  The input coding system follows the display coding system.

  NOTE:
  * HZ cannot be used as the input coding system.
  * Input with ISO-2022-CN or ISO-2022-KR will probably fail, because
    SI (\017) and SO (\016) are already bound to other commands
    (SO is bound to `next-history'). To use SI and SO, press C-@ (^@).
    After that, SI, SO, SS2, SS3, LS2, and LS3 of 7-bit ISO-2022 are
    recognized. Pressing C-@ again restores the default bindings.

Regular expression

  Multilingual regular expressions are supported.

-----------------------------------
Change log

2003/03/08  w3m-0.4.1-m17n-20030308
  * Based on w3m-0.4.1

2003/02/24  w3m-0.4-m17n-20030224
  * Based on w3m-0.4

2003/02/11  w3m-0.4rc1-m17n-20030211
  * Based on w3m-0.4rc1

2003/02/07  w3m-0.3.2.2-m17n-20030207
  * Based on w3m-0.3.2.2+cvs-1.742

2003/02/01  w3m-0.3.2.2-m17n-20030201
  * Based on w3m-0.3.2.2+cvs-1.734

2003/01/31  w3m-0.3.2.2-m17n-20030131
  * Based on w3m-0.3.2.2+cvs-1.732

2003/01/23  w3m-0.3.2.2-m17n-20030123
  * Based on w3m-0.3.2.2+cvs-1.705

2003/01/22  w3m-0.3.2.2-m17n-20030122
  * Based on w3m-0.3.2.2+cvs-1.699

2003/01/01  w3m-0.3.2.2-m17n-20030101
  * Based on w3m-0.3.2.2+cvs-1.655

2002/12/22  w3m-0.3.2.2-m17n-20021222
  * Based on w3m-0.3.2.2+cvs-1.640

2002/12/19  w3m-0.3.2.2-m17n-20021219
  * Based on w3m-0.3.2.2+cvs-1.635

2002/12/07  w3m-0.3.2.2-m17n-20021207
  * Based on w3m-0.3.2.2+cvs-1.599
  * Fixed a problem on systems where int != long.

2002/11/27  w3m-0.3.2.1-m17n-20021127
  * Based on w3m-0.3.2.1+cvs-1.562

2002/11/20  w3m-0.3.2-m17n-20021120
  * Based on w3m-0.3.2+cvs-1.538

2002/11/18
  * Added UTF-7 to charset auto detection.

2002/11/16  w3m-0.3.2-m17n-20021116
  * Based on w3m-0.3.2+cvs-1.526

2002/11/13  w3m-0.3.2-m17n-20021113
  * Based on w3m-0.3.2+cvs-1.506

2002/11/12  w3m-0.3.2-m17n-20021112
  * Based on w3m-0.3.2+cvs-1.498

2002/11/09  w3m-0.3.2-m17n-20021109
  * Based on w3m-0.3.2+cvs-1.490

2002/11/07  w3m-0.3.2-m17n-20021107
  * Based on w3m-0.3.2
  * Applied [w3m-dev 03371]

2002/10/22  w3m-0.3.1-m17n-20021022
  * Based on w3m-0.3.1+cvs-1.444

2002/07/17  w3m-0.3.1-m17n-20020717
  * Based on w3m-0.3.1

2002/05/29  w3m-0.3-m17n-20020529
  * Based on w3m-0.3+cvs-1.379

2002/03/16  w3m-0.3-m17n-20020316
  * Based on w3m-0.3+cvs-1.353

2002/03/11  w3m-0.3-m17n-20020311
  * Based on w3m-0.3+cvs-1.342
  * Some bug fixes.

2002/02/16  w3m-0.2.5-m17n-20020216
  * Based on w3m-0.2.5+cvs-1.319
  * Added the option "use_wide".

2002/02/05  w3m-0.2.5-m17n-20020205
  * Based on w3m-0.2.5+cvs-1.302

2002/02/02  w3m-0.2.5-m17n-20020202
  * Based on w3m-0.2.5+cvs-1.291

2002/01/31  w3m-0.2.4-m17n-20020131
  * Based on w3m-0.2.4+cvs-1.278

2002/01/29  w3m-0.2.4-m17n-20020129
  * Based on w3m-0.2.4+cvs-1.268
  * Some bug fixes.

2002/01/28  w3m-0.2.4-m17n-20020128
  * Based on w3m-0.2.4+cvs-1.265

2002/01/08  w3m-0.2.4-m17n-20020108
  * Based on w3m-0.2.4

2002/01/07
  * Replaced some wc_conv, wc_Str_conv with wc_conv_strict, wc_Str_conv_strict.

2001/12/31
  * Added the conversion between HKSCS and Unicode.
  * Changed the conversion table between Big5 and Unicode.
  * Deleted the special conversion between Big5 and CNS 11643.
  * Fixed HKSCS.

2001/12/30  w3m-0.2.3.2-m17n-20011230
  * Based on w3m-0.2.3.2+cvs-1.196

2001/12/22  w3m-0.2.3.2-m17n-20011222
  * Based on w3m-0.2.3.2
  * [w3m-dev-en 00660] can't compile if INET6 is defined
  * [w3m-dev-en 00663] double meanings for WC_N_???

2001/12/21  w3m-0.2.3.1-m17n-20011221
  * Based on w3m-0.2.3.1
  * Support for HKSCS, KOI8-U, UTF-7.
    The conversion table between HKSCS and Unicode is not yet available.
  * Added the conversion between ISO 8859-16 and Unicode.
  * Added the option "ext_halfdump".

2001/04/14  w3m-(0.2.1)-m17n-0.20
  * Support for UTF-7.
  * [w3m-dev 01913] ([w3m-dev-en 00452])

2001/04/12  w3m-(0.2.1)-m17n-0.19
  * TILDE of JIS X 0212, JIS X 0213 -> FULLWIDTH TILDE of Unicode.
  * MICRO SIGN of Unicode -> GREEK SMALL MU of JIS X 0208.
  * [w3m-dev 01892], [w3m-dev 01894], [w3m-dev 01898], [w3m-dev 01902]

2001/03/31
  * Changed the implementation of <_SYMBOL> again.
  * With the -dump option, "pre_conv" is false by default.

2001/03/29
  * Support for combining characters of TCVN 5712.
  * [w3m-dev 01873], [w3m-dev-en 00411]

2001/03/28
  * Setting -suffix="" is now accepted by configure. (thanks to naddy!)
  * Bugfix: rc.c did not compile when USE_SSL was defined and
    USE_SSL_VERIFY was undefined. (thanks to naddy!)
  * [w3m-dev 01859]
  * Bugfix: 0xA0 is an error in Shift-JIS.
  * Changed the implementation of <_SYMBOL> ([w3m-dev 01852]).

2001/03/24  w3m-(0.2.1)-m17n-0.18
  * Based on w3m-0.2.1
  * [w3m-dev 01703], [w3m-dev 01814], [w3m-dev 01823]
  * Separated ISO-2022-JP-3 from ISO-2022-JP.
  * Improved auto detection.

2001/03/23
  * Based on w3m-0.2.0

2001/03/21
  * Added functions (CHARSET and DEFAULT_CHARSET).
  * Improved document charset detection of frame HTML.

2001/03/20
  * Conversion from FULLWIDTH variants (except ASCII) to normal characters.

2001/03/18  w3m-(0.1.11-pre-hsaka24)-m17n-0.17
  * Based on "[w3m-dev 01779] w3m-0.1.11-pre-hsaka24".
  * Prefer JIS X 0213 to JIS X 0212.

2001/03/14  w3m-(0.1.11-pre-kokb23)-m17n-0.16
  * Added the conversion between JIS X 0213 and Unicode Extension B.
  * Bugfix: conversion between JIS X 0213 and Unicode.
  * Bugfix: treat UHC as Hangul.
  * Ignore "search_conv" if "pre_conv" is ON.

2001/03/09  w3m-(0.1.11-pre-kokb23)-m17n-0.15
  * Improvement of wc_wchar_t (mainly for Unicode).
  * Some bugfixes for Unicode.
  * Ignore the "use_gb12345_map" option when output is GBK or GB18030.
  * With the -dump option, "pre_conv" is always true.
  * With the -dump or -halfdump option, some processing is skipped.
  * Get the system charset from the environment variable
    LC_CTYPE -> LANG -> LC_ALL.
  * Bugfixes: [w3m-dev 01724], [w3m-dev 01726], [w3m-dev 01752],
    [w3m-dev 01753], [w3m-dev 01754]

2001/03/06  w3m-(0.1.11-pre-kokb23)-m17n-0.14
  * Support for language tags (UTR #7).
  * Bugfix: conversion between GB18030, Johab, and Unicode.

2001/03/04  w3m-(0.1.11-pre-kokb23)-m17n-0.13
  * Support for GBK (CP936), GB18030, and UHC (CP949)!
  * The Unicode mapping tables of GB 2312 and GB 12345 became compatible
    with CP936 and GB18030. (Code points: 0xA1A4, 0xA1AA)
  * Allow 0xFFFE and 0xFFFF in Unicode (for compatibility with GB18030).
  * Bugfix: code point of NBSP in Unicode.

2001/03/03  w3m-(0.1.11-pre-kokb23)-m17n-0.12
  * Wrote the English README.m17n.

-------------------------------------------
Hironori Sakamoto <hsaka@mth.biglobe.ne.jp>
http://www2u.biglobe.ne.jp/~hsaka/