STORY.html (9428B)
<html>
<head>
<title>History of w3m</title>
</head>
<body>
<h1>History of w3m</h1>
<div align=right>
1999/2/18<br>
1999/3/8 revised<br>
1999/6/11 translated into English<br>
Akinori Ito<br>
aito@fw.ipsj.or.jp
</div>
<h2>Introduction</h2>
W3m is a text-based pager and WWW browser.
It is an application similar to the famous text-based
browser <a href="http://www.lynx.browser.org/">Lynx</a>.
However, w3m has several advantages over Lynx. For example,
<UL>
<LI>W3m can render tables.
<LI>W3m can render frames (by converting them into tables).
<LI>As w3m is a pager, it can read a document from standard input.
(I heard that Lynx can also display a document given on standard input, like this:
<pre>
lynx /dev/fd/0 > file
</pre>
Hmm, it works on Linux.)
<LI>W3m is small. Its stripped binary for Sparc (compiled with
gcc -O2, version beta-990217) is only 260 Kbytes, while the Lynx
binary is larger than 1.8 Mbytes. (Actually, Lynx is 800K on my i386 system, and w3m is 200K + libgc.)
</UL>
It is true that Lynx is an excellent browser, which has many
features w3m doesn't have. For example,
<UL>
<LI>Lynx can handle cookies.
<LI>Lynx has many options.
<LI>Lynx is multilingual. (W3m is Japanese-English bilingual.)
</UL>
etc. It is also a great advantage that Lynx has a lot of
documentation.
<P>
<b>I don't intend w3m to be a substitute for any other browser,
including Netscape and Lynx.</b> Why did I write w3m?
Because I found conventional browsers inconvenient
for `taking a look' at web pages.
I browse web pages in a LAN environment. When I want to take
a glance at a web page, I don't want to wait for Netscape to start up.
Lynx also takes a few seconds to start up (you can get Lynx's startup time down to almost zero if you rm /etc/mailcap). On the other hand,
w3m starts immediately with little load on the host machine.
After looking at the information using w3m, I use another browser
if I want to read the page in detail. As for me, however,
w3m is enough to read most web pages.

<h2>The birth of w3m</h2>
<P>
W3m was derived from a pager named `fm'. Fm was written before
1991 (I don't remember the exact date), when the WWW was not popular.
At that time, the word `browser' meant a file browser like
`more' or `less'.
<P>
I wrote fm to debug a program for my research. To trace the status
of the program, it dumped megabytes of variable values into a file,
and I debugged it by checking the dumped file. The program dumped
the information for a given time on one line, which made each dumped line
several hundred characters long. When I looked at the file using `more' or
`less', one line was folded into several lines and it was very hard
to read. Therefore, I wrote fm, which didn't fold lines. Fm displayed
one logical line as one physical line. To show the hidden
part of a line, fm shifted the entire screen. As I used an 80x24 terminal at that
time, fm was very useful for debugging.
<P>
Several years later, I got to know the WWW and began to use it.
I used XMosaic and Chimera. I liked Chimera because it was light.
As I was interested in the mechanism of the WWW, I learned HTML and
HTTP, and I found them simpler than I had expected. The earlier version
of HTTP was very similar to the Gopher protocol. HTML 2.0 was
simple enough to render. All I had to do seemed to be line folding
and itemized display. So I made a small modification to fm
and turned it into a web browser. That was the first version of w3m.
The name `w3m' is an abbreviation of the Japanese phrase `WWW wo miru',
which means `see WWW'. It was inherited from `fm', which
was an abbreviation of `File wo miru'. The first version of w3m
was released at the beginning of 1995.
<h2>Death and rebirth of w3m</h2>
<p>
I had used w3m as a pager to read files, e-mails and online manuals.
It was a substitute for less. Sometimes I used w3m as a web browser,
but there were many pages w3m couldn't display correctly, most of
which used tables for page layout. Once I tried to implement a table
renderer, but I gave up because it seemed too difficult for me.
<P>
It was 1998 when I tried to modify w3m again. There were two reasons.
The first is that I had some time to do it. I was staying at Boston University
as a visiting researcher at that time. The second reason is that
I wanted to use tables in my personal web page. I had been writing my research
log in HTML, and I wanted to put a table in it. At first I used
&lt;pre&gt;..&lt;/pre&gt; to write tables, but it was not cool at all.
One day I used the &lt;table&gt; tag, which forced me to use Netscape to
read the research log. Then I decided to implement a table renderer
in w3m.
<P>
I didn't intend to write a perfect table renderer because the tables
I used were not very complicated. However, incomplete table rendering
made the display of table-layout pages horrible. I realized that
an almost-perfect table renderer was required
to do well at both `rendering (real) tables' and `fine display of
table-layout pages.' It was a thorny path.
<P>
After several months, I finished a `fair' table renderer.
Then I implemented forms in w3m. Finally, w3m was reborn as a
practical web browser.

<h2>Table rendering algorithm in w3m</h2>

HTML table rendering is difficult. The tabular environment
of LaTeX is not very difficult, because it makes the width of a column
either a specified value or the maximum width needed to fit the items in it.
On the other hand, an HTML table renderer has to decide
the width of each column so that the entire table fits into the
display appropriately, and fold the contents of the table according
to the column widths. An inappropriate choice of column widths makes
the table ugly. Moreover, tables can be nested, which makes the algorithm
more complicated.

<OL>
<LI>First, calculate the maximum and minimum width of each column.
The maximum width is the width required to display the column
without folding the contents. Generally, it is the length of the longest
paragraph delimited by &lt;BR&gt; or &lt;P&gt;.
The minimum width is the lower limit needed to display the contents.
If the column contains the word `internationalization', the minimum
width will be 20. If the column contains
&lt;pre&gt;..&lt;/pre&gt;, the maximum width of the preformatted
text will be the minimum width of the column.

<LI>If the width of the column is specified by the WIDTH attribute,
fix the column width to that value. If the specified width is
smaller than the minimum width of the column, fix the column width
to the minimum width.

<LI>Calculate the sum of the maximum widths (or fixed widths) of
all columns and check whether the sum exceeds the screen width.
If it does not exceed the screen width, these values are used as
the widths of the columns.

<LI>If the sum is larger than the screen width, determine the width
of each column according to the following steps.
<OL>
<LI>Let W be the screen width minus the sum of the widths of
the fixed-width columns.
<LI>Distribute W among the columns whose widths are not yet decided,
in proportion to the logarithm of the maximum width of each column.
<li>If the distributed width of a column is smaller than its minimum width,
fix the width of that column to the minimum width and
do the distribution again.
</OL>
</OL>

In this process, the distributed width is proportional to the logarithm of
the maximum width, but I am not sure that this heuristic is the best.
It could be, for example, the square root of the maximum width.
<P>
The algorithm above assumes that the screen width is known.
But this is not true for nested tables. According to the algorithm above,
the column widths of the outer table have to be known to render
the inner table, while the total width of the inner table has to
be known to determine the column widths of the outer table.
If the WIDTH attribute exists, there is no problem. Otherwise, w3m
assumes that the inner table is 0.8 times as wide as the outer
table. It works fine, but if there are two tables side by side in an outer
table, the width of the outer table always exceeds the screen width.
To render this kind of table correctly, one has to render the table once,
check the width of the outermost table, and then render the entire table again.
Netscape might employ this kind of algorithm.

<h2>Libraries</h2>

W3m uses the
<a href="http://reality.sgi.com/boehm/gc.html">Boehm GC</a>
library. This library was written by H. Boehm and A. Demers.
I could distribute w3m without this library because one can
get the library separately, but I decided to include it in the
w3m distribution for the convenience of those who install it.
W3m doesn't use libwww.
<P>
Boehm GC is a garbage collector for C and C++. I began to use this
library when I implemented tables, and it was great. I couldn't have
implemented tables and forms without this library.
<P>
Versions older than beta-990304 used
<a href="http://home.cern.ch/~orel/libftp/libftp/libftp.html">LIBFTP</a>
because I was tired of writing code to handle the FTP protocol.
But I rewrote the FTP code myself to make w3m completely free.
It made w3m slightly smaller.
<P>
By the way, w3m doesn't use the standard UNIX regexp and curses libraries.
That is because I want to use Japanese. When I wrote fm, there were
no free regexp/curses libraries that could handle Japanese. Now both libraries
are available, and they look faster than the w3m code.

<h2>Future work</h2>

...Nothing. As w3m's virtues are its small size and rendering speed,
adding more features might lose these advantages. On the other hand,
w3m is still known to have many bugs, and I will continue fixing them.

</body>
</html>