[Esd-l] Has anyone tried removing HTML "code" via a sanitizer process??

Jim Bucks jbucks at coloradostudios.com
Thu Oct 23 10:55:03 PDT 2003


Hello All,

I was wondering if anyone has tried removing HTML code via a sanitizer
process.  I know the resulting text is going to be extremely ugly - and
probably unreadable.  

Here are more details on the issue:
  1) Using Redhat 6.x
     Sendmail 8.12.9
     Procmail 3.21
     Sanitizer 1.139

  2) Mail clients are predominantly Netscape Communicator 4.79

  3) The defanging process is working GREAT!  However, I have 
     a small group of users that are having problems (they 
     refuse to view the message source) with HTML email they 
     have received.  The Sanitizer is properly modifying the
     html tags, but then the message just comes up as blank.

  4) Most of the time, the original recipient (R) can see 
     the HTML message.  When this person (R) then forwards
     or replies to that message, the message body disappears
     from the forwarded message.

     See below for the message source from one of the 
     "interesting" messages.

  5) My thoughts on fixing this are:
     - Turn off html sanitizing. I'm fighting this.

     - Find a way to strip the html tags from the 
       messages, leaving just the ascii text.
       I'm not sure if this is even reasonably doable.

     - Find a way to strip just the meta and style sections, 
       leaving the remaining part of the message intact.
       I'm sure this will still be pretty ugly.

     - Look into an alternative to Sanitizer.
       Probably end up being big $$$$$.
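
To make the "strip the html tags" and "strip the style sections" ideas
above concrete, here is a minimal sketch in Python using only the
standard library.  It is not part of Sanitizer or Procmail; the names
TextOnly and html_to_text are hypothetical, and the "defanged_" tag
names are an assumption about how the renamed tags reach the parser.
It also assumes the HTML part has already been decoded from
quoted-printable.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes, ignoring anything inside the SKIP elements."""

    # Elements whose contents are not message text. The "defanged_" names
    # are an assumption about how a sanitizer might have renamed the tags.
    SKIP = {"style", "script", "title", "defanged_style", "defanged_script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &nbsp; etc. into characters
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in ("p", "br", "div"):
            # Block-level tags become line breaks in the text output.
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML fragment, one line per block."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # split() also treats non-breaking spaces as whitespace, so runs of
    # &nbsp; collapse to a single space and column alignment is lost.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

Collapsing whitespace keeps the sketch short, but it also throws away
the spacing that lined up the columns in the sample message, so the
result is readable rather than pretty.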

Do y'all have any lessons learned / words of wisdom I can use as
guidance for this?  Other than shooting the users?

Jim




Source of "interesting" message............................

Return-Path: <ccccc at ddddd.xxx>
Received: from eeeee.ddddd.xxx ([88.88.88.88])
        by gw1.bbbbb.xxx (8.12.9/8.11.6) with ESMTP id
h9MLNfG1025024
        for <aaaaa at bbbbb.xxx>; Wed, 22 Oct 2003 15:23:43
-0600
X-MimeOLE: Produced By Microsoft Exchange V6.0.6375.0
content-class: urn:content-classes:message
MIME-Version: 1.0
X-Security: MIME headers sanitized on fffff
        See http://www.impsec.org/email-tools/sanitizer-intro.html
        for details. $Revision: 1.139 $Date: 2003-09-07 10:14:23-07 
Content-Type: multipart/alternative;
        boundary="----_=_NextPart_001_01C398E2.B9D528E1"
Subject: Why Me
Date: Wed, 22 Oct 2003 16:23:24 -0500
Message-ID:
<6A7AD98CA7919B45A429AA7AC5D88D37175613 at eeeeee.ddddd.xxx>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: Why Me
Thread-Index: AcOY4rnTfLE/hfwFSmGufHKejINLyw==
From: "Bill Smith" <ccccc at ddddd.com>
To: <aaaaa at bbbbb.com>

This is a multi-part message in MIME format.

------_=_NextPart_001_01C398E2.B9D528E1
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

blah blah blah blah blah blah blah blah blah blah blah blah blah blah bl
ah blah blah blah blah bla.

=20

blah bla    blah     blah blah

111111       1111   1111111111

111111       1111   1111111111

111111       1111   1111111111

11111       1111   1111111111

1111111     1111   1111111111

1111111     1111   1111111111


------_=_NextPart_001_01C398E2.B9D528E1
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<html>

<head>
<DEFANGED_META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3Dus-ascii">


<DEFANGED_meta name=3DGenerator content=3D"Microsoft Word 10
(filtered)">

 <!-- <DEFANGED_STYLE>
<!--
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman";}
a:link, span.MsoHyperlink
        {color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {font-family:Arial;
        color:windowtext;}
@page Section1
        {size:8.5in 11.0in;
        margin:1.0in 1.25in 1.0in 1.25in;}
div.Section1
        {page:Section1;}
-->
 --> </DEFANGED_STYLE>

</head>

<body lang=3DEN-US link=3Dblue vlink=3Dpurple>

<div class=3DSection1>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;

font-family:Arial'>blah blah blah blah blah blah blah blah blah blah bla
h blah ?&nbsp; blah blah blah blah blah blah blah b</sman></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;

font-family:Arial'>&nbsp;</span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;

font-family:Arial'>blah bla&nbsp;&nbsp;&nbsp; blah&nbsp;&nbsp;&nbsp; =
blah blah</span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;

 font-family:Arial'>111111</span></font><font size=3D2 =
face=3DArial><span

style=3D'font-size:10.0pt;font-family:Arial'>&nbsp;&nbsp;&nbsp;&nbsp;&nbs=

p;&nbsp; 1111&nbsp;&nbsp; 1111111111</span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;

 font-family:Arial'>111111</span></font><font size=3D2 =
face=3DArial><span

style=3D'font-size:10.0pt;font-family:Arial'>&nbsp;&nbsp;&nbsp;&nbsp;&nbs=

p;&nbsp; 1111&nbsp;&nbsp; 1111111111</span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;

 font-family:Arial'>111111</span></font><font size=3D2 =
face=3DArial><span

style=3D'font-size:10.0pt;font-family:Arial'>&nbsp;&nbsp;&nbsp;&nbsp;&nbs=

p;&nbsp; 1111&nbsp;&nbsp; 1111111111</span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;

 font-family:Arial'>111111</span></font><font size=3D2 =
face=3DArial><span

style=3D'font-size:10.0pt;font-family:Arial'>&nbsp;&nbsp;&nbsp;&nbsp;&nbs=

p;&nbsp; 1111&nbsp;&nbsp; 1111111111</span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;

 font-family:Arial'>1111111</span></font><font size=3D2 =
face=3DArial><span

style=3D'font-size:10.0pt;font-family:Arial'>&nbsp;&nbsp;&nbsp;&nbsp; =
1111&nbsp;&nbsp; 1111111111</span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;

 font-family:Arial'>1111111</span></font><font size=3D2 =
face=3DArial><span

style=3D'font-size:10.0pt;font-family:Arial'>&nbsp;&nbsp;&nbsp;&nbsp; =
1111&nbsp;&nbsp; 1111111111</span></font></p>

</div>

</body>

</html>
=00
------_=_NextPart_001_01C398E2.B9D528E1--
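
The message above is multipart/alternative and already carries a
text/plain part, so another way to get at the ASCII text is to pick
that part out rather than convert the HTML.  A minimal sketch using
Python's standard email package; plain_text_body is a hypothetical
helper name, and this is only an illustration, not something Sanitizer
does.

```python
from email import message_from_string
from email.policy import default


def plain_text_body(raw_message):
    """Return the decoded text/plain body of a raw message, or None."""
    # policy=default yields an EmailMessage, which provides get_body().
    msg = message_from_string(raw_message, policy=default)
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return None
    # get_content() undoes the transfer encoding (e.g. quoted-printable)
    # and decodes the bytes using the part's declared charset.
    return part.get_content()
```

Messages that arrive with only an HTML part return None here, so this
would only help for mail that includes a plain-text alternative.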


-- 
Jim Bucks - IT/IS Support     www.coloradostudios.com 
2400 N. Ulster St.  Denver, Co.                 80238
jbucks at coloradostudios.com               303-388-8500

