w3mail

program to send a web page by email
git clone https://logand.com/git/w3mail.git/

commit 411c87eb7e8f28401f5926003a222ca9262a35b2
parent b1cd1b5ccd153c9ac9dd9577a8c7a53ef892c697
Author: Tomas Hlavaty <tom@logand.com>
Date:   Tue,  1 Feb 2011 02:36:43 +0100

index.org added

Diffstat:
A index.org | 345 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 345 insertions(+), 0 deletions(-)

diff --git a/index.org b/index.org
@@ -0,0 +1,345 @@
+#+options: creator:nil timestamp:nil author:nil
+
+w3mail
+
+w3mail is a program for sending web pages via email while filtering
+out unwanted content.
+
+* Introduction
+
+There are many ways of browsing the Web. In many cases, I prefer
+using my email reader for managing the web pages I read.
+
+In addition to removing distractions like advertisements and excessive
+navigational noise, this gives me fewer open tabs in my web browser,
+smaller memory usage, better readability and powerful management of
+unread and read web pages, their marking, expiry and deletion. It's
+asynchronous and the actual reading takes minimal keystrokes and no
+aiming with the mouse at all.
+
+#+begin_quote
+To look at page I send mail to a demon which runs wget and mails the
+page back to me. It is very efficient use of my time, but it is slow
+in real time. -- [[http://lwn.net/Articles/262570/][rms]]
+#+end_quote
+
+I've used various conventional web browsers like [[http://www.mozilla.com/firefox/][Firefox]] and also some
+unconventional ones like [[http://emacs-w3m.namazu.org/][emacs-w3m]], [[http://emacs-w3m.namazu.org/info/emacs-w3m_69.html][emacs-w3m/shimbun]] and [[http://surf.suckless.org/][surf]] but
+none of them seems the right choice. There seem to be two kinds of
+web pages I look at:
+
+1. quickly disposable pages to skim in search of a particular (often
+   brief) piece of information
+
+2. pages with "deep", valuable information
+
+For the disposable browsing, Firefox or emacs-w3m works well. w3mail
+tries to fill the gap in the second case, for browsing web pages that
+carry non-trivial information of long-term value or those that require
+more focus and time to read.
+
+* Dependencies
+
+Linux only.
+
+** Build dependencies
+
+- git
+- gcc
+- make
+
+If you are using Ubuntu, you can install these programs by running:
+
+: $ sudo apt-get install git-core gcc make
+
+** Runtime dependencies
+
+*** Required runtime dependencies
+
+- wget
+- md5sum (coreutils)
+
+If you are using Ubuntu, you can install these programs by running:
+
+: $ sudo apt-get install wget coreutils
+
+*** Optional runtime dependencies
+
+- sendmail (mailutils)
+- xmlstarlet
+- tidy
+
+If you are using Ubuntu, you can install these programs by running:
+
+: $ sudo apt-get install mailutils
+: $ sudo apt-get install xmlstarlet
+: $ sudo apt-get install tidy
+
+* Download
+
+Clone the git repository:
+
+: $ git clone http://logand.com/git/w3mail.git
+
+* Building from sources
+
+Switch to the new directory and make the w3mail executable:
+
+: $ cd w3mail
+: $ make
+
+This will build the w3mail and dirpop3d programs.
+
+* Configuration
+
+** Executable path
+
+First, it is convenient to put the w3mail program somewhere reachable
+from $PATH, e.g.
+
+- create a symlink to the w3mail executable file in your ~/bin
+  directory if the ~/bin directory is in your $PATH
+
+- or add the w3mail git directory into your $PATH.
+
+** Configuration directory
+
+Next, set up the configuration directory:
+
+: $ mkdir ~/.w3mail
+
+and put the following lines into your ~/.w3mail/config file:
+
+- If you want to use the local pop3 daemon dirpop3d:
+
+  : cat /dev/stdin >`mktemp ~/.w3mail/inbox/username/XXXXXX`
+  : email@address
+  : email@address
+  : host.name
+
+  In this case, you will also need to create the inbox directory:
+
+  : $ mkdir ~/.w3mail/inbox
+
+  and an inbox directory for one user:
+
+  : $ mkdir ~/.w3mail/inbox/username
+
+- If you want to use sendmail from your local machine:
+
+  : sendmail -t
+  : email@address
+  : email@address
+  : host.name
+
+- If you want to use sendmail from a remote machine:
+
+  : ssh username@host.name -e none /usr/lib/sendmail -t
+  : email@address
+  : email@address
+  : host.name
+
+In the examples above, replace username, email@address and host.name
+with the correct values.
+
+** Content filters
+
+Finally, set up the filter directory:
+
+: $ mkdir ~/.w3mail/filter
+
+and put some filters there:
+
+- Edit ~/.w3mail/filter/default
+
+  : #!/bin/sh
+  : tidy -q -n -c -asxml -f /dev/null | xmlstarlet ed -O -N x="http://www.w3.org/1999/xhtml" -d "//x:script" -d "//x:object" -d "//x:form"
+
+- Edit ~/.w3mail/filter/xpath
+
+  : #!/bin/sh
+  : XPATH=`echo "$1" | tr \" \'`
+  : tidy -q -n -c -asxml -f /dev/null | xmlstarlet sel -O -N x="http://www.w3.org/1999/xhtml" -t -c "$XPATH" | xmlstarlet ed -O -N x="http://www.w3.org/1999/xhtml" -d "//x:script" -d "//x:object" -d "//x:form"
+
+- Edit ~/.w3mail/filter/bbc
+
+  : #!/bin/sh
+  : tidy -q -n -c -asxml -f /dev/null | xmlstarlet sel -O -N x="http://www.w3.org/1999/xhtml" -t -c "//x:*[@class='story-body']" -c "//x:*[@class='storybody']" | xmlstarlet ed -O -N x="http://www.w3.org/1999/xhtml" -d "//x:script" -d "//x:form" -d "//x:*[@id='page-bookmark-links-head']" -d "//x:object" -d "//x:*[@class='hidden']" -d "//x:*[@class='hyperpuff']" -d "//x:*[@class='links-list']" -d "//x:*[@class='warning']//x:p" -d "//x:*[@class='story-feature related narrow']" -d "//x:*[@class='comment-introduction']"
+
+Put any custom filters into the ~/.w3mail/filter directory.
+
+Then make the filters executable:
+
+: $ chmod +x ~/.w3mail/filter/*
+
+Also, tell w3mail when to use those filters by adding filter
+definitions to the ~/.w3mail/tidy file:
+
+: bbc http://www.bbc.co.uk/ filter bbc
+: emacswiki http://www.emacswiki.org/ xpath //*[@class="content browse"]
+
+Each line consists of the filter name (unused for now), the URL
+prefix to match and the filter type:
+
+- filter: followed by the name of the filter to look up and execute
+  in the ~/.w3mail/filter directory
+
+- xpath: followed by the XPath expression of the web page DOM element
+  holding the interesting content.
+
+If no filter is specified in the ~/.w3mail/tidy file, the default
+filter ~/.w3mail/filter/default is run.
+
+Note: to find out the XPath expression of the element I am interested
+in, I use Firebug (Firefox plug-in), point to that element and choose
+"Copy XPath".
+
+* Invocation from shell
+
+I anticipate a few ways of using w3mail:
+
+- Send a single web page:
+
+  : $ w3mail 'http://logand.com/'
+
+- Send many web pages:
+
+  First save the URLs into a file, one URL per line. Then run:
+
+  : $ cat file | w3mail
+
+  Or much faster, in parallel with a maximum of 20 processes:
+
+  : $ cat file | xargs -n1 -P20 w3mail
+
+- Run w3mail in the background:
+
+  First start the server:
+
+  : $ echo >~/.w3mail/in; tail -f ~/.w3mail/in | xargs -n1 -P20 w3mail 2>>~/.w3mail/log &
+
+  Then request sending a web page by running:
+
+  : $ echo 'url' >>~/.w3mail/in
+
+  or send many web pages by:
+
+  : $ cat file >>~/.w3mail/in
+
+  Watch the log for errors:
+
+  : $ tail -f ~/.w3mail/log
+
+If you configured w3mail to save web pages into the
+~/.w3mail/inbox/username directory, you can use dirpop3d to retrieve
+those web pages as email messages. You will need to set up the
+following:
+
+1) Run dirpop3d:
+
+   : $ dirpop3d 3333 ~/.w3mail/inbox/username &
+
+2) Add the pop3 server at localhost:3333 to your email client and when
+   asked to authenticate, enter the username and a password (anything,
+   as the password is not checked on this local pop3 server).
+
+* Using w3mail with Emacs
+
+Put the following emacs-lisp code into your ~/.emacs file:
+
+#+begin_src emacs-lisp
+(defun w3mail (url &optional new-window)
+  (interactive (browse-url-interactive-arg "URL: "))
+  (shell-command (format "w3mail '%s' &" (browse-url-encode-url url))))
+
+(defun w3m-w3mail (url)
+  (interactive (list (w3m-input-url nil nil nil nil 'feeling-lucky)))
+  (when (and (stringp url)
+             (not (interactive-p)))
+    (setq url (w3m-canonicalize-url url)))
+  (set-text-properties 0 (length url) nil url)
+  (setq url (w3m-uri-replace url))
+  (unless (or (w3m-url-local-p url)
+              (string-match "\\`about:" url))
+    (w3m-string-match-url-components url)
+    (setq url (concat
+               (w3m-url-transfer-encode-string
+                (substring url 0 (match-beginning 8))
+                (or w3m-current-coding-system
+                    w3m-default-coding-system))
+               (if (match-beginning 8)
+                   (concat "#" (match-string 9 url))
+                 ""))))
+  (w3mail url))
+
+(global-set-key [f5] 'w3m-w3mail)
+#+end_src
+
+Pressing the f5 key will ask for the URL of the web page to be sent.
+
+It is better to run w3mail as a server as mentioned above; then
+it is possible to replace the
+
+: w3mail '%s' &
+
+parameter in the w3mail emacs-lisp function by
+
+: echo '%s' >>~/.w3mail/in
+
+which won't block emacs at all.
+
+* Future plans
+
+** TODO fix fragile tidy
+
+Tidying (X)HTML is rather fragile at the moment and I haven't found a
+good tool for that yet.
+
+I imagine
+
+- the tidy program needs to be fixed ([[http://lists.w3.org/Archives/Public/html-tidy/2010OctDec/0022.html][unlikely]]);
+- the w3m program could be changed to dump xhtml;
+- use parser from Firefox or Webkit;
+- or I need to write yet another tolerant parser.
+
+*** emacs-w3m
+
+: $ cvs -d :pserver:anonymous@cvs.namazu.org:/storage/cvsroot login
+: $ cvs -d :pserver:anonymous@cvs.namazu.org:/storage/cvsroot co emacs-w3m
+
+*** w3m
+
+How do I check out the CVS repository directly? The official
+repository doesn't work.
+
+: wget http://www.w3m.org/download/source/w3m-0.1.10-tb2.tar.gz
+: tar zxvf w3m-0.1.10-tb2.tar.gz
+
+** TODO fix fragile xmlstarlet pyx and p2x
+
+Removing namespaces from xhtml doesn't work reliably either. Probably
+a bug in xmlstarlet?
+
+** TODO handle RSS and Atom feeds better
+
+** TODO handle mime-types better
+
+For example, application/xml is quite common but doesn't work yet.
+
+** TODO avoid base64 and use text/plain
+
+This might be configurable but text/plain email messages would be
+searchable using a simple grep command.
+
+It would be good if the plain text messages contained the links too.
+
+If I don't use base64, won't there be problems with line length in
+mail messages?
+
+* Licence
+
+[[http://www.gnu.org/licenses/][GPLv3+]]
+
+* Feedback
+
+Please send [[http://logand.com/contact.html][me]] an email.
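
The filter selection described under "Content filters" in the diff above can be illustrated with a small shell sketch. It is not taken from the w3mail sources: the function name w3mail_pick_filter is hypothetical, and the sketch assumes that each line of the tidy file holds a name, a URL prefix, a filter type and the rest of the line as the type's argument, and that the first matching prefix wins.

```shell
#!/bin/sh
# Hypothetical helper, not part of w3mail: print the filter type and its
# argument for a URL, using a file in the ~/.w3mail/tidy format
# (name, URL prefix, type, argument).
# Assumption: the first line whose URL prefix matches wins.
w3mail_pick_filter() {
    url=$1
    tidy_file=${2:-$HOME/.w3mail/tidy}
    if [ -r "$tidy_file" ]; then
        # "|| [ -n "$name" ]" also handles a last line without a newline.
        while read -r name prefix type arg || [ -n "$name" ]; do
            # Skip blank or incomplete lines; an empty prefix would
            # otherwise match every URL.
            [ -n "$prefix" ] || continue
            case $url in
                "$prefix"*)
                    printf '%s %s\n' "$type" "$arg"
                    return 0
                    ;;
            esac
        done <"$tidy_file"
    fi
    # No definition matched: fall back to the default filter.
    printf 'filter default\n'
}
```

With the two sample definitions from the diff, w3mail_pick_filter 'http://www.bbc.co.uk/news/' prints "filter bbc", and a URL with no matching prefix prints "filter default". How w3mail itself resolves several matching prefixes is not stated in the README, so the first-match rule here is only a guess.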