unoffice

Reclaim text from office documents
git clone https://logand.com/git/unoffice.git/

undocx (302B)


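#!/usr/bin/env bash
# Print the plain text of the .docx file given as $1.
# Stages: dump the archive, keep lines that contain text runs, start a new
# line at each paragraph, strip tags and carriage returns, decode XML
# entities, then squeeze repeated blank lines.
# &amp; is decoded last, and written as \& because a bare & in a sed
# replacement stands for the matched text.
set -euo pipefail
unzip -p "$1" \
    | grep -a '<w:r' \
    | sed 's/<w:p[^<\/]*>/\n/g' \
    | sed 's/<[^<]*>//g' \
    | sed 's/\r//g' \
    | sed 's/&lt;/</g' \
    | sed 's/&gt;/>/g' \
    | sed "s/&apos;/'/g" \
    | sed 's/&quot;/"/g' \
    | sed 's/&amp;/\&/g' \
    | cat -s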
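
The filter stages can be tried without a real .docx file. The following is a minimal sketch, assuming GNU sed (for `\n` in the replacement and `\r` in the pattern). The function name `docx_xml_to_text` and the sample string are hypothetical and not part of the repository. The `&amp;` replacement is written as `\&`, since an unescaped `&` in a sed replacement inserts the matched text rather than a literal ampersand.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: the undocx filter stages without the unzip step,
# reading WordprocessingML from stdin.
docx_xml_to_text() {
    grep -a '<w:r' \
        | sed 's/<w:p[^<\/]*>/\n/g' \
        | sed 's/<[^<]*>//g' \
        | sed 's/\r//g' \
        | sed 's/&lt;/</g' \
        | sed 's/&gt;/>/g' \
        | sed "s/&apos;/'/g" \
        | sed 's/&quot;/"/g' \
        | sed 's/&amp;/\&/g' \
        | cat -s
}

# Made-up sample: two paragraphs, each with a single text run.
sample='<w:body><w:p><w:r><w:t>Tom &amp; Jerry</w:t></w:r></w:p><w:p><w:r><w:t>a &lt; b</w:t></w:r></w:p></w:body>'

printf '%s\n' "$sample" | docx_xml_to_text
```

Under these assumptions the output should contain the lines `Tom & Jerry` and `a < b`, one per paragraph. Decoding `&amp;` last matters: doing it first would turn an escaped `&amp;lt;` into `&lt;` and then into `<`, which is not what the document contained.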