#+options: creator:nil timestamp:nil author:nil

w3mail

w3mail is a program for sending web pages via email while filtering
out unwanted content.

* Introduction

There are many ways of browsing the Web. In many cases, I prefer
using my email reader for managing the web pages I read.

In addition to removing distractions like advertisements and excessive
navigational noise, this approach avoids an excessive number of open
tabs in my web browser, lowers memory and CPU usage, improves
readability, and gives me powerful management of unread and read web
pages: their marking, expiry and deletion. It is asynchronous, and
the actual reading takes minimal keystrokes and no aiming with the
mouse at all.

#+begin_quote
To look at page I send mail to a demon which runs wget and mails the
page back to me. It is very efficient use of my time, but it is slow
in real time. -- [[http://lwn.net/Articles/262570/][rms]]
#+end_quote

I've used various conventional web browsers like [[http://www.mozilla.com/firefox/][Firefox]] and also some
unconventional ones like [[http://emacs-w3m.namazu.org/][emacs-w3m]], [[http://emacs-w3m.namazu.org/info/emacs-w3m_69.html][emacs-w3m/shimbun]] and [[http://surf.suckless.org/][surf]], but
none of them seems to be the right choice. There seem to be two kinds
of web pages I look at:

1. quickly disposable pages that I skim in search of a particular
   (often brief) piece of information

2. pages with "deep", valuable information

For disposable browsing, Firefox or emacs-w3m works well. w3mail
tries to fill the gap in the second case, for browsing web pages that
carry non-trivial information of long-term value or those that require
more focus and time to read.

* Dependencies

Linux only.

** Build dependencies

- git
- gcc
- make

If you are using Ubuntu, you can install these programs by running:

: $ sudo apt-get install git-core gcc make

** Runtime dependencies

*** Required runtime dependencies

- wget
- md5sum (coreutils)

If you are using Ubuntu, you can install these programs by running:

: $ sudo apt-get install wget coreutils

*** Optional runtime dependencies

- sendmail (mailutils)
- xmlstarlet
- tidy

If you are using Ubuntu, you can install these programs by running:

: $ sudo apt-get install mailutils
: $ sudo apt-get install xmlstarlet
: $ sudo apt-get install tidy

* Download

Clone the git repository:

: $ git clone http://logand.com/git/w3mail.git

* Building from sources

Switch to the new directory and make the w3mail executable:

: $ cd w3mail
: $ make

This will build the w3mail and dirpop3d programs.

* Configuration

** Executable path

First, it is convenient to put the w3mail program somewhere reachable
from $PATH, e.g.

- create a symlink to the w3mail executable file in your ~/bin
  directory if the ~/bin directory is in your $PATH

- or add the w3mail git directory into your $PATH.
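
The first option above can be sketched as a few shell commands. This
is only a sketch under assumptions that the text does not state: the
repository is assumed to have been cloned and built in ~/w3mail
(overridable through the hypothetical W3MAIL_DIR variable), and
in_path is a helper name made up for this example.

#+begin_src sh
# Assumption: the repository was cloned and built in ~/w3mail.
# W3MAIL_DIR is a hypothetical override; w3mail itself does not read it.
W3MAIL_DIR=${W3MAIL_DIR:-$HOME/w3mail}

# in_path DIR (hypothetical helper): succeed if DIR is listed in $PATH.
in_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Link the executable into ~/bin only if it has actually been built.
if [ -x "$W3MAIL_DIR/w3mail" ]; then
    mkdir -p "$HOME/bin"
    ln -sf "$W3MAIL_DIR/w3mail" "$HOME/bin/w3mail"
fi

# Remind the user when ~/bin is not reachable from $PATH.
in_path "$HOME/bin" || echo "add $HOME/bin to your PATH" >&2
#+end_src

The check wraps $PATH in colons so that a directory matches only as a
whole entry, not as a prefix of another entry.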

** Configuration directory

Next, set up the configuration directory:

: $ mkdir ~/.w3mail

and put the following lines into your ~/.w3mail/config file:

- If you want to use the local pop3 daemon dirpop3d:

  : cat /dev/stdin >`mktemp ~/.w3mail/inbox/username/XXXXXX`
  : email@address
  : email@address
  : host.name

  In this case, you will also need to create the inbox directory:

  : $ mkdir ~/.w3mail/inbox

  and an inbox directory for one user:

  : $ mkdir ~/.w3mail/inbox/username

- If you want to use sendmail from your local machine:

  : sendmail -t
  : email@address
  : email@address
  : host.name

- If you want to use sendmail from a remote machine:

  : ssh username@host.name -e none /usr/lib/sendmail -t
  : email@address
  : email@address
  : host.name

In the examples above, replace username, email@address and host.name
with the correct values.

** Content filters

Finally, set up the filter directory:

: $ mkdir ~/.w3mail/filter

and put some filters there:

- Edit ~/.w3mail/filter/default

  : #!/bin/sh
  : tidy -q -n -c -asxml -f /dev/null | xmlstarlet ed -O -N x="http://www.w3.org/1999/xhtml" -d "//x:script" -d "//x:object" -d "//x:form"

- Edit ~/.w3mail/filter/xpath

  : #!/bin/sh
  : XPATH=`echo "$1" | tr \" \'`
  : tidy -q -n -c -asxml -f /dev/null | xmlstarlet sel -O -N x="http://www.w3.org/1999/xhtml" -t -c "$XPATH" | xmlstarlet ed -O -N x="http://www.w3.org/1999/xhtml" -d "//x:script" -d "//x:object" -d "//x:form"

- Edit ~/.w3mail/filter/bbc

  : #!/bin/sh
  : tidy -q -n -c -asxml -f /dev/null | xmlstarlet sel -O -N x="http://www.w3.org/1999/xhtml" -t -c "//x:*[@class='story-body']" -c "//x:*[@class='storybody']" | xmlstarlet ed -O -N x="http://www.w3.org/1999/xhtml" -d "//x:script" -d "//x:form" -d "//x:*[@id='page-bookmark-links-head']" -d "//x:object" -d "//x:*[@class='hidden']" -d "//x:*[@class='hyperpuff']" -d "//x:*[@class='links-list']" -d "//x:*[@class='warning']//x:p" -d "//x:*[@class='story-feature related narrow']" -d "//x:*[@class='comment-introduction']"

Put any custom filters into the ~/.w3mail/filter directory.

Then make the filters executable:

: $ chmod +x ~/.w3mail/filter/*

Also, tell w3mail when to use those filters by adding filter
definitions to the ~/.w3mail/tidy file:

: bbc http://www.bbc.co.uk/ filter bbc
: emacswiki http://www.emacswiki.org/ xpath //*[@class="content browse"]

Each line consists of the filter name (unused for now), the URL prefix
to match, and the filter type:

- filter: followed by the name of the filter to look up and execute in
  the ~/.w3mail/filter directory

- xpath: followed by the XPath expression of the web page DOM element
  holding the interesting content.

If no filter is specified in the ~/.w3mail/tidy file, the default
filter ~/.w3mail/filter/default is run.

Note: to find the XPath expression of the element I am interested in,
I use Firebug (a Firefox plug-in), point to that element and choose
"Copy XPath".

* Invocation from shell

I anticipate a few ways of using w3mail:

- Send a single web page:

  : $ w3mail 'http://logand.com/'

- Send many web pages:

  First save the URLs into a file, one URL per line.
  Then run:

  : $ cat file | w3mail

  Or, much faster, in parallel with at most 20 processes:

  : $ cat file | xargs -n1 -P20 w3mail

- Run w3mail in the background:

  First start the server:

  : $ echo >~/.w3mail/in; tail -f ~/.w3mail/in | xargs -n1 -P20 w3mail 2>>~/.w3mail/log &

  Then request sending a web page by running:

  : $ echo 'url' >>~/.w3mail/in

  or send many web pages by:

  : $ cat file >>~/.w3mail/in

  Watch the log for errors:

  : $ tail -f ~/.w3mail/log

If you configured w3mail to save web pages into the
~/.w3mail/inbox/username directory, you can use dirpop3d to retrieve
those web pages as email messages. You will need to set up the
following:

1) Run dirpop3d:

   : $ dirpop3d 3333 ~/.w3mail/inbox/username &

2) Add the pop3 server at localhost:3333 to your email client and, when
   asked to authenticate, enter the username and a password (any
   password works, since it is not checked on this local pop3 server).
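
Queueing URLs for the background server described above can be wrapped
in a small shell function. This is a sketch only: w3mail_queue and the
W3MAIL_IN variable are names invented for this example, and the queue
file defaults to the ~/.w3mail/in file used above.

#+begin_src sh
# w3mail_queue URL... (hypothetical helper): append each URL on its own
# line to the queue file that the background server follows with tail -f.
# W3MAIL_IN is an assumed override for this sketch; w3mail does not read it.
w3mail_queue() {
    for url in "$@"; do
        printf '%s\n' "$url"
    done >>"${W3MAIL_IN:-$HOME/.w3mail/in}"
}
#+end_src

For example:

: $ w3mail_queue 'http://logand.com/' 'http://www.emacswiki.org/'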

* Using w3mail with Emacs

Put the following emacs-lisp code into your ~/.emacs file:

#+begin_src emacs-lisp
  (defun w3mail (url &optional new-window)
    "Send the web page at URL by running the w3mail program.
  NEW-WINDOW is unused."
    (interactive (browse-url-interactive-arg "URL: "))
    (shell-command (format "w3mail '%s' &" (browse-url-encode-url url))))

  (defun w3m-w3mail (url)
    "Read URL the way emacs-w3m does, encode it and pass it to `w3mail'."
    (interactive (list (w3m-input-url nil nil nil nil 'feeling-lucky)))
    (when (and (stringp url)
               (not (interactive-p)))
      (setq url (w3m-canonicalize-url url)))
    (set-text-properties 0 (length url) nil url)
    (setq url (w3m-uri-replace url))
    (unless (or (w3m-url-local-p url)
                (string-match "\\`about:" url))
      (w3m-string-match-url-components url)
      (setq url (concat
                 (w3m-url-transfer-encode-string
                  (substring url 0 (match-beginning 8))
                  (or w3m-current-coding-system
                      w3m-default-coding-system))
                 (if (match-beginning 8)
                     (concat "#" (match-string 9 url))
                   ""))))
    (w3mail url))

  (global-set-key [f5] 'w3m-w3mail)
#+end_src

Pressing the f5 key will prompt for the URL of the web page to be
sent.

It is better to run w3mail as a server, as mentioned above. Then it is
possible to replace the

: w3mail '%s' &

parameter in the w3mail emacs-lisp function with

: echo '%s' >>~/.w3mail/in

which won't block Emacs at all.

* Future plans

** TODO Fix fragile tidy

Tidying (X)HTML is rather fragile at the moment and I haven't found a
good tool for that yet.

I imagine

- the tidy program needs to be fixed ([[http://lists.w3.org/Archives/Public/html-tidy/2010OctDec/0022.html][unlikely]]);
- the w3m program could be changed to dump xhtml;
- use parser from Firefox or Webkit;
- or I need to write yet another tolerant parser.

*** emacs-w3m

: $ cvs -d :pserver:anonymous@cvs.namazu.org:/storage/cvsroot login
: $ cvs -d :pserver:anonymous@cvs.namazu.org:/storage/cvsroot co emacs-w3m

*** w3m

How do I check out the CVS repository directly? The official
repository doesn't work.

: wget http://www.w3m.org/download/source/w3m-0.1.10-tb2.tar.gz
: tar zxvf w3m-0.1.10-tb2.tar.gz

** TODO Fix fragile xmlstarlet pyx and p2x

Removing namespaces from xhtml doesn't work reliably either. Probably
a bug in xmlstarlet?

** TODO Handle RSS and Atom feeds better

** TODO Handle mime-types better

For example, application/xml is quite common but doesn't work yet.

** TODO Avoid base64 and use text/plain

This might be configurable, but text/plain email messages would be
searchable using a simple grep command.

It would be good if the plain text messages contained the links too.

If I don't use base64, won't there be problems with line length in
mail messages?

* Licence

[[http://www.gnu.org/licenses/][GPLv3+]]

* Feedback

Please send [[http://logand.com/contact.html][me]] an email.