[Esd-l] Has anyone tried removing HTML "code" via a sanitizer
petchema at concept-micro.com
Thu Oct 23 16:42:46 PDT 2003
On Thu, 23 Oct 2003 11:55:03 -0600, Jim Bucks <jbucks at coloradostudios.com> wrote:
> - Find a way to strip the html tags from the
> messages, leaving just the ascii text.
> I'm not sure if this is even reasonably doable.
Sylpheed-claws, my MUA of choice, does that, and it does a reasonable job.
Some formatting doesn't translate nicely (tables come to mind), but most
HTML email uses no formatting at all, or formatting you're glad to filter
out (colors, stupid choice of fonts, ...), so 95% of the time it works just
fine.
Some emails are braindead enough to have a multipart/alternative with a
blank text variant, so those messages will appear blank (since the text
variant has higher priority than html-to-text conversion), but as far as I
can remember those have always been spam to start with.
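
To make the priority rule concrete, here is a minimal Python sketch of picking which variant of a multipart/alternative message to show. It is not how Sylpheed-claws is implemented: unlike the behaviour described above, it falls back to the HTML variant when the text variant is blank. The names displayable_body and crude_html_to_text are made up for illustration, and the regex converter is only a placeholder.

```python
import re
from email import message_from_string
from email.message import Message
from typing import Callable, Optional


def crude_html_to_text(html: str) -> str:
    """Placeholder converter (hypothetical): drop anything that looks like a tag."""
    return re.sub(r"<[^>]+>", "", html).strip()


def displayable_body(
    msg: Message,
    html_to_text: Callable[[str], str] = crude_html_to_text,
) -> str:
    """Prefer the text/plain variant; use converted HTML if that variant is blank."""
    plain: Optional[str] = None
    html: Optional[str] = None
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue  # skips the multipart container and any attachments
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        # Keep only the first part of each type.
        if ctype == "text/plain" and plain is None:
            plain = text
        elif ctype == "text/html" and html is None:
            html = text
    if plain is not None and plain.strip():
        return plain
    if html is not None:
        return html_to_text(html)
    return plain or ""
```

Calling displayable_body(message_from_string(raw_message)) returns the text variant when it has content, and the converted HTML otherwise. Whether a blank text variant should instead be treated as a spam hint is a policy choice this sketch does not make.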
If it can be done in C, no doubt it can be done in Perl.
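
For what it's worth, the same thing is only a few lines in Python too. The sketch below uses the standard library's html.parser; TextExtractor and strip_html are names invented for this example, and the list of block-level tags is an assumption, not a complete one. It drops tags (and with them colors and fonts), skips script and style content, and makes no attempt to lay out tables.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, turning block-level tags into line breaks."""

    # Assumed, incomplete set of tags that should start a new line.
    BLOCK_TAGS = {"p", "br", "div", "tr", "li", "h1", "h2", "h3", "blockquote"}
    # Content of these elements is never shown to the reader.
    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def strip_html(html: str) -> str:
    """Return the text of an HTML fragment, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

For example, strip_html('<p><font color="red">Hello</font> world</p>') gives back just the words, with the font tag and its color gone.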
(Another solution for the lazy: use lynx's -dump feature?)
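
A rough sketch of the lynx route, driven from Python: the HTML is piped to lynx on standard input and the rendered text is read back. It assumes lynx is installed and on the PATH, and that the message is UTF-8; the function name lynx_dump is made up, and the -force_html and -nolist options are my additions to the plain -dump suggestion.

```python
import shutil
import subprocess


def lynx_dump(html: str, timeout: float = 10.0) -> str:
    """Render HTML to text with `lynx -dump` (assumes lynx is on the PATH)."""
    if shutil.which("lynx") is None:
        raise FileNotFoundError("lynx is not installed or not on PATH")
    result = subprocess.run(
        # -stdin: read the document from standard input
        # -force_html: treat the input as HTML (stdin has no file extension)
        # -nolist: omit the numbered list of links after the text
        ["lynx", "-dump", "-stdin", "-force_html", "-nolist"],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Spawning a process per message is slow for a busy mail server, so this is more of a quick fix than something to put in the delivery path.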