[Esd-l] Has anyone tried removing HTML "code" via a sanitizer process??

daniel lance herrick dan.herrick at pbs.proquest.com
Thu Oct 23 13:22:14 PDT 2003


On Thu, 23 Oct 2003, Jim Bucks wrote:

> Hello All,
>
> I was wondering if anyone has tried removing HTML code via a sanitizer
> process.  I know the resulting text is going to be extremely ugly - and
> probably unreadable.

You have multipart/alternative and the browser is
choosing the web-imitation alternative. If you
change the "Content-Type: text/html" to something
like "Content-Type: text/petunia" then the browser
won't know how to display the "text/petunia" part
and will choose the other part. (Then you could
distribute a mailcap that says to use lynx to
display text/petunia and really send them in
circles.)
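Dan's relabeling trick could be run as a filter after Sanitizer. Below is a minimal Python 3 sketch using the standard `email` package. `demote_html_alternatives` and `filter_stream` are hypothetical names, and `text/petunia` is Dan's deliberately bogus type. The sketch only touches a `multipart/alternative` that also contains a `text/plain` part, so the client always has something to fall back to. That Netscape Communicator 4.79 then shows the plain part is Dan's expectation, not something tested here.

```python
import email
from email.message import Message


def demote_html_alternatives(msg: Message, new_type: str = "text/petunia") -> int:
    """Relabel text/html parts inside multipart/alternative containers that
    also hold a text/plain part, so that a MIME-aware client finds no HTML
    alternative it recognizes and should show the plain-text part instead.

    Returns the number of parts relabeled. "text/petunia" is Dan's
    deliberately bogus type, used here as a placeholder.
    """
    changed = 0
    for part in msg.walk():
        if part.get_content_type() != "multipart/alternative":
            continue
        children = part.get_payload()  # list of sub-Messages
        if not any(c.get_content_type() == "text/plain" for c in children):
            continue  # no plain-text fallback: leave this container alone
        for child in children:
            if child.get_content_type() == "text/html":
                # set_type() swaps the type but keeps parameters such as charset.
                child.set_type(new_type)
                changed += 1
    return changed


def filter_stream(inp, out):
    """Hypothetical filter wrapper: read one message from the text stream
    `inp`, relabel its HTML alternatives, and write the result to `out`."""
    msg = email.message_from_file(inp)
    demote_html_alternatives(msg)
    out.write(msg.as_string())
```

Called as `filter_stream(sys.stdin, sys.stdout)` from a small script, it would act as a stdin-to-stdout filter of the kind procmail runs with an `fw` recipe; where it should sit relative to Sanitizer is an assumption to check.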

dan


> Here's more details on the issue:
>   1) Using Red Hat 6.x
>      Sendmail 8.12.9
>      Procmail 3.21
>      Sanitizer 1.139
>
>   2) Mail clients are predominantly Netscape Communicator 4.79
>
>   3) The defanging process is working GREAT!  However, a small
>      group of users is having problems with HTML email they have
>      received (and they refuse to view the message source).  The
>      Sanitizer is properly modifying the HTML tags, but the
>      message then just comes up blank.
>
>   4) Most of the time, the original recipient (R) can see
>      the HTML message.  When this person (R) then forwards
>      or replies to that message, the message body disappears
>      from the forward or reply.
>
>      See below for the message source from one of the
>      "interesting" messages.
>
>   5) My thoughts on fixing this are:
>      - Turn off html sanitizing. I'm fighting this.
>
>      - Find a way to strip the HTML tags from the
>        messages, leaving just the ASCII text.
>        I'm not sure if this is even reasonably doable.
>
>      - Find a way to strip just the meta and style sections,
>        leaving the remaining part of the message intact.
>        I'm sure this will still be pretty ugly.
>
>      - Look into an alternative to Sanitizer.
>        That would probably end up costing big $$$$$.
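For a message like this one, the second and third options look doable. Below is a rough Python 3 sketch, assuming the HTML part has already been decoded from quoted-printable (in the source below, `=3D` is an encoded `=`). `html_to_text` (option 2) uses the standard-library `html.parser` to keep only the text, skipping STYLE/SCRIPT content and comments. `strip_meta_and_style` (option 3) is a regex approximation that removes META tags and STYLE blocks, including the `DEFANGED_` forms and the `<!--` placed in front of `<DEFANGED_STYLE>` in the sample. All names are hypothetical, and regexes are only a rough tool for HTML.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML part. Content of STYLE/SCRIPT
    elements is skipped, and comments go to handle_comment(), which the
    base class ignores, so they never reach the output."""

    # HTMLParser lowercases tag names. The defanged_* entries assume the
    # DEFANGED_ prefix seen in the sample message.
    SKIP = {"style", "script", "defanged_style", "defanged_script"}
    # Tags that should start a new line in the plain-text output.
    BREAK = {"p", "div", "br", "li", "tr"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # "&nbsp;" arrives as "\xa0"
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BREAK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BREAK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Ordinary HTML whitespace collapses to one space; "\xa0" is kept.
            self.parts.append(re.sub(r"[ \t\r\n\f]+", " ", data))


def html_to_text(html: str) -> str:
    """Option 2: strip all tags and return plain text, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts).replace("\xa0", " ")
    lines = (line.strip() for line in text.split("\n"))
    return "\n".join(line for line in lines if line)


# Option 3: keep the HTML but drop META tags and STYLE blocks, including the
# DEFANGED_ forms and a "<!--" directly in front of the STYLE tag.
META_RE = re.compile(r"<(?:defanged_)?meta\b[^>]*>", re.IGNORECASE)
STYLE_RE = re.compile(
    r"(?:<!--\s*)?<(?:defanged_)?style\b.*?</(?:defanged_)?style\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_meta_and_style(html: str) -> str:
    return STYLE_RE.sub("", META_RE.sub("", html))
```

On the defanged head from the sample, `html_to_text` alone lets a stray `-->` through, because of the nested comment markers around `<DEFANGED_STYLE>`; running `strip_meta_and_style` first avoids that.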
>
> Do y'all have any lessons learned / words of wisdom I can use as
> guidance for this?  Other than shoot the users?
>
> Jim
>
>
>
>
> Source of "interesting" message............................
>
> Return-Path: <ccccc at ddddd.xxx>
> Received: from eeeee.ddddd.xxx ([88.88.88.88])
>         by gw1.bbbbb.xxx (8.12.9/8.11.6) with ESMTP id h9MLNfG1025024
>         for <aaaaa at bbbbb.xxx>; Wed, 22 Oct 2003 15:23:43 -0600
> X-MimeOLE: Produced By Microsoft Exchange V6.0.6375.0
> content-class: urn:content-classes:message
> MIME-Version: 1.0
> X-Security: MIME headers sanitized on fffff
>         See http://www.impsec.org/email-tools/sanitizer-intro.html
>         for details. $Revision: 1.139 $Date: 2003-09-07 10:14:23-07
> Content-Type: multipart/alternative;
>         boundary="----_=_NextPart_001_01C398E2.B9D528E1"
> Subject: Why Me
> Date: Wed, 22 Oct 2003 16:23:24 -0500
> Message-ID: <6A7AD98CA7919B45A429AA7AC5D88D37175613 at eeeeee.ddddd.xxx>
> X-MS-Has-Attach:
> X-MS-TNEF-Correlator:
> Thread-Topic: Why Me
> Thread-Index: AcOY4rnTfLE/hfwFSmGufHKejINLyw==
> From: "Bill Smith" <ccccc at ddddd.com>
> To: <aaaaa at bbbbb.com>
>
> This is a multi-part message in MIME format.
>
> ------_=_NextPart_001_01C398E2.B9D528E1
> Content-Type: text/plain; charset="us-ascii"
> Content-Transfer-Encoding: quoted-printable
>
> blah blah blah blah blah blah blah blah blah blah blah blah blah blah
> blah blah blah blah blah bla.
>
> =20
>
> blah bla    blah     blah blah
>
> 111111       1111   1111111111
>
> 111111       1111   1111111111
>
> 111111       1111   1111111111
>
> 11111       1111   1111111111
>
> 1111111     1111   1111111111
>
> 1111111     1111   1111111111
>
>
> ------_=_NextPart_001_01C398E2.B9D528E1
> Content-Type: text/html; charset="us-ascii"
> Content-Transfer-Encoding: quoted-printable
>
> <html>
>
> <head>
> <DEFANGED_META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
> charset=3Dus-ascii">
>
>
> <DEFANGED_meta name=3DGenerator content=3D"Microsoft Word 10
> (filtered)">
>
>  <!-- <DEFANGED_STYLE>
> <!--
>  /* Style Definitions */
>  p.MsoNormal, li.MsoNormal, div.MsoNormal
>         {margin:0in;
>         margin-bottom:.0001pt;
>         font-size:12.0pt;
>         font-family:"Times New Roman";}
> a:link, span.MsoHyperlink
>         {color:blue;
>         text-decoration:underline;}
> a:visited, span.MsoHyperlinkFollowed
>         {color:purple;
>         text-decoration:underline;}
> span.EmailStyle17
>         {font-family:Arial;
>         color:windowtext;}
> @page Section1
>         {size:8.5in 11.0in;
>         margin:1.0in 1.25in 1.0in 1.25in;}
> div.Section1
>         {page:Section1;}
> -->
>  --> </DEFANGED_STYLE>
>
> </head>
>
> <body lang=3DEN-US link=3Dblue vlink=3Dpurple>
>
> <div class=3DSection1>
>
> <p class=3DMsoNormal><font size=3D2 face=3DArial><span =
> style=3D'font-size:10.0pt;
>
> font-family:Arial'>blah blah blah blah blah blah blah blah blah blah
> blah blah ?&nbsp; blah blah blah blah blah blah blah b</sman></font></p>
>
> <p class=3DMsoNormal><font size=3D2 face=3DArial><span =
> style=3D'font-size:10.0pt;
>
> font-family:Arial'>&nbsp;</span></font></p>
>
> <p class=3DMsoNormal><font size=3D2 face=3DArial><span =
> style=3D'font-size:10.0pt;
>
> font-family:Arial'>blah bla&nbsp;&nbsp;&nbsp; blah&nbsp;&nbsp;&nbsp; =
> blah blah</span></font></p>
>
> <p class=3DMsoNormal><font size=3D2 face=3DArial><span =
> style=3D'font-size:10.0pt;
>
>  font-family:Arial'>111111</span></font><font size=3D2 =
> face=3DArial><span
>
> style=3D'font-size:10.0pt;font-family:Arial'>&nbsp;&nbsp;&nbsp;&nbsp;&nbs=
>
> p;&nbsp; 1111&nbsp;&nbsp; 1111111111</span></font></p>
>
> <p class=3DMsoNormal><font size=3D2 face=3DArial><span =
> style=3D'font-size:10.0pt;
>
>  font-family:Arial'>111111</span></font><font size=3D2 =
> face=3DArial><span
>
> style=3D'font-size:10.0pt;font-family:Arial'>&nbsp;&nbsp;&nbsp;&nbsp;&nbs=
>
> p;&nbsp; 1111&nbsp;&nbsp; 1111111111</span></font></p>
>
> <p class=3DMsoNormal><font size=3D2 face=3DArial><span =
> style=3D'font-size:10.0pt;
>
>  font-family:Arial'>111111</span></font><font size=3D2 =
> face=3DArial><span
>
> style=3D'font-size:10.0pt;font-family:Arial'>&nbsp;&nbsp;&nbsp;&nbsp;&nbs=
>
> p;&nbsp; 1111&nbsp;&nbsp; 1111111111</span></font></p>
>
> <p class=3DMsoNormal><font size=3D2 face=3DArial><span =
> style=3D'font-size:10.0pt;
>
>  font-family:Arial'>111111</span></font><font size=3D2 =
> face=3DArial><span
>
> style=3D'font-size:10.0pt;font-family:Arial'>&nbsp;&nbsp;&nbsp;&nbsp;&nbs=
>
> p;&nbsp; 1111&nbsp;&nbsp; 1111111111</span></font></p>
>
> <p class=3DMsoNormal><font size=3D2 face=3DArial><span =
> style=3D'font-size:10.0pt;
>
>  font-family:Arial'>1111111</span></font><font size=3D2 =
> face=3DArial><span
>
> style=3D'font-size:10.0pt;font-family:Arial'>&nbsp;&nbsp;&nbsp;&nbsp; =
> 1111&nbsp;&nbsp; 1111111111</span></font></p>
>
> <p class=3DMsoNormal><font size=3D2 face=3DArial><span =
> style=3D'font-size:10.0pt;
>
>  font-family:Arial'>1111111</span></font><font size=3D2 =
> face=3DArial><span
>
> style=3D'font-size:10.0pt;font-family:Arial'>&nbsp;&nbsp;&nbsp;&nbsp; =
> 1111&nbsp;&nbsp; 1111111111</span></font></p>
>
> </div>
>
> </body>
>
> </html>
> =00
> ------_=_NextPart_001_01C398E2.B9D528E1--
>
>
> --
> Jim Bucks - IT/IS Support     www.coloradostudios.com
> 2400 N. Ulster St.  Denver, Co.                 80238
> jbucks at coloradostudios.com               303-388-8500
> _______________________________________________
> Esd-l mailing list
> Esd-l at spconnect.com
> http://www.spconnect.com/mailman/listinfo/esd-l
>
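To see what each alternative actually contains after Sanitizer, the saved source can be decoded part by part. Below is a small Python 3 sketch using the standard `email` package (the function name is hypothetical). It walks the message, skips the multipart container, and returns each leaf part's content type with its payload decoded from quoted-printable. In the source above, `=3D` decodes to `=`, `=20` to a space, and the `=00` just before the closing boundary to a NUL byte; whether that byte has anything to do with the blank display is not established in this thread.

```python
import email


def describe_parts(raw: str):
    """Return (content type, decoded bytes) for each non-container MIME part."""
    msg = email.message_from_string(raw)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers carry no payload of their own
        # decode=True reverses the Content-Transfer-Encoding, so "=3D"
        # becomes "=" and "=00" becomes a NUL byte.
        parts.append((part.get_content_type(), part.get_payload(decode=True)))
    return parts
```

For example, `for ctype, body in describe_parts(open("msg.txt").read()): print(ctype, len(body))` lists each alternative and its decoded size; `msg.txt` is a hypothetical file holding the saved source.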


