[Esd-l] Has anyone tried removing HTML "code" via a sanitizer process??

Joaquin Ferrero atari at pucela.net
Fri Oct 24 04:23:41 PDT 2003


On Thu, 23 Oct 2003 at 19:55, Jim Bucks wrote:
> Hello All,
> 
> I was wondering if anyone has tried removing HTML code via a sanitizer
> process.  I know the resulting text is going to be extremely ugly - and
> probably unreadable.  
> 

We use this solution (from Randal Schwartz, www.stonehenge.com/merlyn, with
a little patch of mine). It strips the HTML part of a multipart/alternative
message and keeps the plain-text part.

Insert this in your .procmailrc file:

--8<-- 
# Get rid of the stupid duplicate HTML version
:0 Hfw
* ! ^From:.*alerta
* ^Content-type:.*multipart/alternative;
| $HOME/bin/Strip-HTML.pl
--8<--

and put this in $HOME/bin/Strip-HTML.pl (the path the recipe above calls):
--8<--
#!/usr/bin/perl -w
#
# Filter messages to remove the HTML part.
#
# Randal L. Schwartz. 2000
# Joaquin Ferrero 2002
#

use strict;
$|++;

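# The first line is expected to be the mbox "From " envelope line; keep it
# aside so that MIME::Parser only sees the message itself.
my $envelope = <STDIN>;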

use MIME::Parser;
use MIME::Entity;
use MIME::QuotedPrint ();

my $parser = MIME::Parser->new;
$parser->output_to_core(1);
$parser->tmp_to_core(1);

my $ent = $parser->parse(\*STDIN);
#$ent->dump_skeleton(\*STDERR); exit 1; #DEBUG

if ($ent->effective_type eq "multipart/alternative"
    and $ent->parts == 2
    and $ent->parts(0)->effective_type eq "text/plain"
    and $ent->parts(1)->effective_type eq "text/html") {
    
    my $part     = $ent->parts(0);
    my $charset  = $part->head->mime_attr('content-type.charset');
    # Keep the declared transfer encoding if there is one; otherwise let
    # MIME::Entity suggest one.
    my $encoding = $part->head->count('Content-Transfer-Encoding')
        ? $part->head->mime_encoding
        : '-SUGGEST';

    # bodyhandle holds the body as already decoded by MIME::Parser, so no
    # manual quoted-printable or base64 decoding is needed here.
    my $newbody = $part->bodyhandle->as_string
                . "\n\n[[HTML version removed]]\n";
#               . "Charset:$charset\nEncoding:$encoding\n";  #DEBUG

    my $newent = MIME::Entity->build(
        Data     => $newbody,
        Charset  => $charset,
        Encoding => $encoding,
    );

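    $ent->parts([$newent]);
    $ent->make_singlepart;
    $ent->sync_headers(Length => 'COMPUTE', Nonstandard => 'ERASE');
}

# Always write the message back out. The recipe uses this script as a filter,
# so a message that does not match the test above must pass through unchanged.
print $envelope;
$ent->print;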
--8<--
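
To try the logic without touching procmail, something like the following
sketch may help. It builds a sample two-part multipart/alternative message,
parses it the way the filter would, and applies the same test and rewrite.
The sub name strip_html_alternative, the addresses and the body text are
made up for the test. It reads the decoded text through bodyhandle, which is
a choice made for this sketch, and it assumes MIME-tools is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Entity;
use MIME::Parser;

# Same test and rewrite as the filter, wrapped in a sub (hypothetical name).
# Returns 1 if the HTML alternative was removed, 0 if the message was left alone.
sub strip_html_alternative {
    my ($ent) = @_;
    return 0 unless $ent->effective_type eq 'multipart/alternative'
                and $ent->parts == 2
                and $ent->parts(0)->effective_type eq 'text/plain'
                and $ent->parts(1)->effective_type eq 'text/html';

    my $plain  = $ent->parts(0);
    my $newent = MIME::Entity->build(
        # bodyhandle gives the body as decoded by MIME::Parser
        Data     => $plain->bodyhandle->as_string
                    . "\n\n[[HTML version removed]]\n",
        Charset  => $plain->head->mime_attr('content-type.charset'),
        Encoding => $plain->head->mime_encoding,
    );

    $ent->parts([$newent]);
    $ent->make_singlepart;
    $ent->sync_headers(Length => 'COMPUTE', Nonstandard => 'ERASE');
    return 1;
}

# Sample message (made-up data): text/plain first, text/html second.
my $msg = MIME::Entity->build(
    Type    => 'multipart/alternative',
    From    => 'sender@example.com',
    To      => 'rcpt@example.com',
    Subject => 'test',
);
$msg->attach(Type => 'text/plain', Encoding => 'quoted-printable',
             Charset => 'iso-8859-1', Data => "Hola, se\xF1or\n");
$msg->attach(Type => 'text/html', Encoding => 'quoted-printable',
             Charset => 'iso-8859-1', Data => "<p>Hola, se&ntilde;or</p>\n");

# Round-trip through MIME::Parser, as the filter sees the message on STDIN.
my $parser = MIME::Parser->new;
$parser->output_to_core(1);
$parser->tmp_to_core(1);
my $ent = $parser->parse_data($msg->stringify);

my $changed = strip_html_alternative($ent);
print $ent->stringify;
```

If $changed comes back as 0, the message did not have the plain-then-HTML
shape the test looks for; the commented-out dump_skeleton line in the script
can show what structure the parser actually found.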
-- 
Joaquin Ferrero <atari at pucela.net>


