w3mail

program to send a web page by email
git clone https://logand.com/git/w3mail.git/

      1 #+options: creator:nil timestamp:nil author:nil
      2 
      3 w3mail
      4 
      5 w3mail is a program for sending web pages via email while filtering
      6 out unwanted content.
      7 
      8 * Introduction
      9 
     10 There are many ways of browsing the Web.  In many cases, I prefer
     11 using my email reader for managing the web pages I read.
     12 
     13 In addition to removing distractions like advertisements and excessive
     14 navigational noise, this means fewer open tabs in my web browser, lower
     15 memory and CPU usage, better readability and powerful management of
     16 unread and read web pages: their marking, expiry and deletion.  It's
     17 asynchronous, and the actual reading takes minimal keystrokes and no
     18 aiming with the mouse at all.
     19 
     20 #+begin_quote
     21 To look at page I send mail to a demon which runs wget and mails the
     22 page back to me.  It is very efficient use of my time, but it is slow
     23 in real time.  -- [[http://lwn.net/Articles/262570/][rms]]
     24 #+end_quote
     25 
     26 I've used various conventional web browsers like [[http://www.mozilla.com/firefox/][Firefox]] and also some
     27 unconventional ones like [[http://emacs-w3m.namazu.org/][emacs-w3m]], [[http://emacs-w3m.namazu.org/info/emacs-w3m_69.html][emacs-w3m/shimbun]] and [[http://surf.suckless.org/][surf]] but
     28 none of them seems the right choice.  There seem to be two kinds of
     29 web pages I look at:
     30 
     31 1. quickly disposable pages that I skim in search of a particular
     32    (often brief) piece of information
     33 
     34 2. pages with "deep", valuable information
     35 
     36 For the disposable browsing, Firefox or emacs-w3m works well.  w3mail
     37 tries to fill the gap in the second case, for browsing web pages that
     38 carry non-trivial information of long-term value or those that require
     39 more focus and time to read.
     40 
     41 * Dependencies
     42 
     43 Linux only.
     44 
     45 ** Build dependencies
     46 
     47 - git
     48 - gcc
     49 - make
     50 
     51 If you are using Ubuntu, you can install these programs by running:
     52 
     53 : $ sudo apt-get install git gcc make
     54 
     55 ** Runtime dependencies
     56 
     57 *** Required runtime dependencies
     58 
     59 - wget
     60 - md5sum (coreutils)
     61 
     62 If you are using Ubuntu, you can install these programs by running:
     63 
     64 : $ sudo apt-get install wget coreutils
     65 
     66 *** Optional runtime dependencies
     67 
     68 - sendmail (mailutils)
     69 - xmlstarlet
     70 - tidy
     71 
     72 If you are using Ubuntu, you can install these programs by running:
     73 
     74 : $ sudo apt-get install mailutils
     75 : $ sudo apt-get install xmlstarlet
     76 : $ sudo apt-get install tidy
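
Before building, it can help to check which of the programs listed above are already installed.  The following is only a minimal sketch and not part of w3mail; the function name missing_commands is made up for illustration.

```shell
#!/bin/sh
# missing_commands NAME...
# Print the name of every given command that cannot be found on $PATH.
# Hypothetical helper for illustration; w3mail does not ship it.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}
```

An empty result means that nothing is missing:

: $ missing_commands git gcc make wget md5sum
: $ missing_commands sendmail xmlstarlet tidy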
     77 
     78 * Download
     79 
     80 Clone the git repository:
     81 
     82 : $ git clone http://logand.com/git/w3mail.git
     83 
     84 * Building from sources
     85 
     86 Switch to the new directory and build the w3mail executable:
     87 
     88 : $ cd w3mail
     89 : $ make
     90 
     91 This will build the w3mail and dirpop3d programs.
     92 
     93 * Configuration
     94 
     95 ** Executable path
     96 
     97 First, it is convenient to put the w3mail program somewhere reachable
     98 from $PATH, e.g.
     99 
    100 - create a symlink to the w3mail executable file in your ~/bin
    101   directory if the ~/bin directory is in your $PATH
    102 
    103 - or add the w3mail git directory into your $PATH.
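
The symlink option could be sketched as follows.  This is an illustration only: the function name link_into_bin is hypothetical, and it assumes the executable was built in the top level of the cloned directory.

```shell
#!/bin/sh
# link_into_bin SRC_DIR BIN_DIR
# Symlink the w3mail executable found in SRC_DIR into BIN_DIR.
# Hypothetical helper for illustration; not part of w3mail.
link_into_bin() {
    # Resolve SRC_DIR to an absolute path so the link stays valid
    # regardless of the directory it is later used from.
    src_dir=$(cd "$1" && pwd) || return 1
    mkdir -p "$2" || return 1
    ln -sf "$src_dir/w3mail" "$2/w3mail"
}
```

For example, run from the directory that contains the clone:

: $ link_into_bin w3mail ~/bin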
    104 
    105 ** Configuration directory
    106 
    107 Next, set up the configuration directory:
    108 
    109 : $ mkdir ~/.w3mail
    110 
    111 and put the following lines into your ~/.w3mail/config file:
    112 
    113 - If you want to use the local pop3 daemon dirpop3d:
    114 
    115   : cat /dev/stdin >`mktemp ~/.w3mail/inbox/username/XXXXXX`
    116   : email@address
    117   : email@address
    118   : host.name
    119 
    120   In this case, you will also need to create the inbox directory:
    121 
    122   : $ mkdir ~/.w3mail/inbox
    123 
    124   and an inbox directory for one user:
    125 
    126   : $ mkdir ~/.w3mail/inbox/username
    127 
    128 - If you want to use sendmail from your local machine:
    129 
    130   : sendmail -t
    131   : email@address
    132   : email@address
    133   : host.name
    134 
    135 - If you want to use sendmail from a remote machine:
    136 
    137   : ssh username@host.name -e none /usr/lib/sendmail -t
    138   : email@address
    139   : email@address
    140   : host.name
    141 
    142 In the examples above, replace username, email@address and host.name
    143 with the correct values.
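
In all three variants the config file has the same shape: four lines holding the delivery command, two email addresses and a host name.  A minimal sketch for writing such a file from the shell follows; the function name write_w3mail_config is made up for illustration, and the arguments are written out verbatim without any validation.

```shell
#!/bin/sh
# write_w3mail_config DIR COMMAND ADDRESS1 ADDRESS2 HOSTNAME
# Write the four config lines, in the order shown above, to DIR/config.
# Hypothetical helper for illustration; not part of w3mail.
write_w3mail_config() {
    mkdir -p "$1" || return 1
    printf '%s\n' "$2" "$3" "$4" "$5" >"$1/config"
}
```

For example, for the local sendmail variant:

: $ write_w3mail_config ~/.w3mail 'sendmail -t' email@address email@address host.name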
    144 
    145 ** Content filters
    146 
    147 Finally, set up the filter directory:
    148 
    149 : $ mkdir ~/.w3mail/filter
    150 
    151 and put some filters there:
    152 
    153 - Edit ~/.w3mail/filter/default
    154 
    155   : #!/bin/sh
    156   : tidy -q -n -c -asxml -f /dev/null | xmlstarlet ed -O -N x="http://www.w3.org/1999/xhtml" -d "//x:script" -d "//x:object" -d "//x:form"
    157 
    158 - Edit ~/.w3mail/filter/xpath
    159 
    160   : #!/bin/sh
    161   : XPATH=`echo "$1" | tr \" \'`
    162   : tidy -q -n -c -asxml -f /dev/null | xmlstarlet sel -O -N x="http://www.w3.org/1999/xhtml" -t -c "$XPATH" | xmlstarlet ed -O -N x="http://www.w3.org/1999/xhtml" -d "//x:script" -d "//x:object" -d "//x:form"
    163 
    164 - Edit ~/.w3mail/filter/bbc
    165 
    166   : #!/bin/sh
    167   : tidy -q -n -c -asxml -f /dev/null | xmlstarlet sel -O -N x="http://www.w3.org/1999/xhtml" -t -c "//x:*[@class='story-body']" -c "//x:*[@class='storybody']" | xmlstarlet ed -O -N x="http://www.w3.org/1999/xhtml" -d "//x:script" -d "//x:form" -d "//x:*[@id='page-bookmark-links-head']" -d "//x:object" -d "//x:*[@class='hidden']" -d "//x:*[@class='hyperpuff']" -d "//x:*[@class='links-list']" -d "//x:*[@class='warning']//x:p" -d "//x:*[@class='story-feature related narrow']" -d "//x:*[@class='comment-introduction']"
    168 
    169 Put any custom filters in the ~/.w3mail/filter directory.
    170 
    171 Then make the filters executable:
    172 
    173 : $ chmod +x ~/.w3mail/filter/*
    174 
    175 Also, tell w3mail when to use those filters by adding filter
    176 definitions to the ~/.w3mail/tidy file:
    177 
    178 : bbc http://www.bbc.co.uk/ filter bbc
    179 : emacswiki http://www.emacswiki.org/ xpath //*[@class="content browse"]
    180 
    181 Each line consists of the filter name (unused for now), the URL
    182 prefix to match and the filter type:
    183 
    184 - filter: followed by the name of the filter to look up and execute
    185   in the ~/.w3mail/filter directory
    186 
    187 - xpath: followed by the XPath expression of the web page DOM element
    188   holding the interesting content.
    189 
    190 If no filter is specified in the ~/.w3mail/tidy file, the default
    191 filter ~/.w3mail/filter/default is run.
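
To make the format of the tidy file concrete, here is a sketch of how a URL could be matched against its definitions.  It is not taken from the w3mail sources: the function name lookup_filter is hypothetical, and it assumes that the first matching prefix wins and that the default filter applies when nothing matches.

```shell
#!/bin/sh
# lookup_filter URL TIDY_FILE
# Print "TYPE ARGUMENT" for the first definition in TIDY_FILE whose URL
# prefix matches URL, or "filter default" when no definition matches.
# Illustration only; w3mail's own matching rules may differ.
lookup_filter() {
    if [ -r "$2" ]; then
        # Fields: name, URL prefix, filter type, rest of the line.
        while read -r name prefix type arg; do
            [ -n "$prefix" ] || continue    # skip blank or short lines
            case $1 in
                "$prefix"*) printf '%s %s\n' "$type" "$arg"; return 0 ;;
            esac
        done <"$2"
    fi
    echo "filter default"
}
```

With the two definitions above, a BBC URL would resolve to the bbc filter:

: $ lookup_filter 'http://www.bbc.co.uk/news' ~/.w3mail/tidy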
    192 
    193 Note: to find the XPath expression of the element I am interested in,
    194 I use Firebug (a Firefox plug-in), point to that element and choose
    195 "Copy XPath".
    196 
    197 * Invocation from shell
    198 
    199 I anticipate a few ways of using w3mail:
    200 
    201 - Send a single web page:
    202 
    203   : $ w3mail 'http://logand.com/'
    204 
    205 - Send many web pages:
    206 
    207   First save the URLs into a file, one URL per line.  Then run:
    208 
    209   : $ cat file | w3mail
    210 
    211   Or, much faster, in parallel with at most 20 processes:
    212 
    213   : $ cat file | xargs -n1 -P20 w3mail
    214 
    215 - Run w3mail in the background:
    216 
    217   First start the server:
    218 
    219   : $ echo >~/.w3mail/in; tail -f ~/.w3mail/in | xargs -n1 -P20 w3mail 2>>~/.w3mail/log &
    220 
    221   Then request sending a web page by running:
    222 
    223   : $ echo 'url' >>~/.w3mail/in
    224 
    225   or send many web pages by:
    226 
    227   : $ cat file >>~/.w3mail/in
    228 
    229   Watch the log for errors:
    230 
    231   : $ tail -f ~/.w3mail/log
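
Requests to the background server are simply lines appended to the queue file, so they are easy to wrap in a small helper.  The sketch below is illustrative only, and the function name w3mail_enqueue is made up.  Because the server reads the queue through xargs, which splits its input on blanks and interprets quotes and backslashes, URLs containing such characters would need to be percent-encoded first.

```shell
#!/bin/sh
# w3mail_enqueue QUEUE_FILE URL...
# Append each URL on its own line to QUEUE_FILE (e.g. ~/.w3mail/in).
# Hypothetical helper for illustration; not part of w3mail.
w3mail_enqueue() {
    queue=$1
    shift
    for url in "$@"; do
        printf '%s\n' "$url"
    done >>"$queue"
}
```

For example:

: $ w3mail_enqueue ~/.w3mail/in 'http://logand.com/' 'http://www.bbc.co.uk/'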
    232 
    233 If you configured w3mail to save web pages into the
    234 ~/.w3mail/inbox/username directory, you can use dirpop3d to retrieve
    235 those web pages as email messages.  You will need to do the
    236 following:
    237 
    238 1) Run dirpop3d:
    239 
    240    : $ dirpop3d 3333 ~/.w3mail/inbox/username &
    241 
    242 2) Add the pop3 server at localhost:3333 to your email client and, when
    243    asked to authenticate, enter the username and any password (the
    244    password is not checked on this local pop3 server).
    245 
    246 * Using w3mail with Emacs
    247 
    248 Put the following emacs-lisp code into your ~/.emacs file:
    249 
    250 #+begin_src emacs-lisp
    251 (defun w3mail (url &optional new-window)
    252   (interactive (browse-url-interactive-arg "URL: "))
    253   (shell-command (format "w3mail '%s' &" (browse-url-encode-url url))))
    254 
    255 (defun w3m-w3mail (url)
    256   (interactive (list (w3m-input-url nil nil nil nil 'feeling-lucky)))
    257   (when (and (stringp url)
    258              (not (called-interactively-p 'interactive)))
    259     (setq url (w3m-canonicalize-url url)))
    260   (set-text-properties 0 (length url) nil url)
    261   (setq url (w3m-uri-replace url))
    262   (unless (or (w3m-url-local-p url)
    263               (string-match "\\`about:" url))
    264     (w3m-string-match-url-components url)
    265     (setq url (concat
    266                (w3m-url-transfer-encode-string
    267                 (substring url 0 (match-beginning 8))
    268                 (or w3m-current-coding-system
    269                     w3m-default-coding-system))
    270                (if (match-beginning 8)
    271                    (concat "#" (match-string 9 url))
    272                    ""))))
    273   (w3mail url))
    274 
    275 (global-set-key [f5] 'w3m-w3mail)
    276 #+end_src
    277 
    278 Pressing the f5 key prompts for the URL of the web page to be sent.
    279 
    280 It is better to run w3mail as a server, as described above.  You can
    281 then replace the
    282 
    283 : w3mail '%s' &
    284 
    285 format string in the w3mail emacs-lisp function with
    286 
    287 : echo '%s' >>~/.w3mail/in
    288 
    289 which won't block Emacs at all.
    290 
    291 * Future plans
    292 
    293 ** TODO Fix fragile tidy
    294 
    295 Tidying (X)HTML is rather fragile at the moment and I haven't found a
    296 good tool for that yet.
    297 
    298 I can imagine a few ways forward:
    299 
    300 - fix the tidy program ([[http://lists.w3.org/Archives/Public/html-tidy/2010OctDec/0022.html][unlikely]]);
    301 - change the w3m program to dump xhtml;
    302 - use the parser from Firefox or WebKit;
    303 - or write yet another tolerant parser.
    304 
    305 *** emacs-w3m
    306 
    307 : $ cvs -d :pserver:anonymous@cvs.namazu.org:/storage/cvsroot login
    308 : $ cvs -d :pserver:anonymous@cvs.namazu.org:/storage/cvsroot co emacs-w3m
    309 
    310 *** w3m
    311 
    312 How do I check out the CVS repository directly?  The official
    313 repository doesn't work.
    314 
    315 : wget http://www.w3m.org/download/source/w3m-0.1.10-tb2.tar.gz
    316 : tar zxvf w3m-0.1.10-tb2.tar.gz
    317 
    318 ** TODO Fix fragile xmlstarlet pyx and p2x
    319 
    320 Removing namespaces from xhtml doesn't work reliably either.  Possibly
    321 a bug in xmlstarlet?
    322 
    323 ** TODO Handle RSS and Atom feeds better
    324 
    325 ** TODO Handle mime-types better
    326 
    327 For example, application/xml is quite common but doesn't work yet.
    328 
    329 ** TODO Avoid base64 and use text/plain
    330 
    331 This might be configurable, but text/plain email messages would be
    332 searchable using a simple grep command.
    333 
    334 It would be good if the plain text messages contained the links too.
    335 
    336 If I don't use base64, won't there be problems with line length in
    337 mail messages?
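
On the line-length question: RFC 5322 requires that each line of a message be no longer than 998 characters (excluding the CRLF) and recommends at most 78.  A rough way to see whether a given page would exceed those limits is sketched below; the function name count_long_lines is hypothetical, and the check counts bytes per line in the C locale.

```shell
#!/bin/sh
# count_long_lines FILE [LIMIT]
# Print how many lines of FILE are longer than LIMIT bytes (default 998).
# Hypothetical helper for illustration; not part of w3mail.
count_long_lines() {
    LC_ALL=C awk -v limit="${2:-998}" \
        'length($0) > limit { n++ } END { print n + 0 }' "$1"
}
```

For example, to count the lines of a saved page that exceed the hard limit or the recommended limit:

: $ count_long_lines page.html
: $ count_long_lines page.html 78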
    338 
    339 * Licence
    340 
    341 [[http://www.gnu.org/licenses/][GPLv3+]]
    342 
    343 * Feedback
    344 
    345 Please send [[http://logand.com/contact.html][me]] an email.