[Esd-l] Why are urls in html decoded?

Anders Nielsen anielsen at diku.dk
Mon Mar 24 05:43:00 PST 2003


Hi!

I am using revision 1.138 of the sanitizer. I have noticed that it
URL-decodes the links in the HTML part of a message. Is this correct? I
don't understand why it does that. Aren't URLs in HTML supposed to keep
these encodings?

In other words: why is &q=http%3A%2F%2Fwww.jobindex.dk turned into
&q=http://www.jobindex.dk?
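
To illustrate what I mean, here is a minimal Python sketch (not the
sanitizer's own code, which I have not traced) that reproduces the same
transformation with the standard urllib.parse module. The "cats%26dogs"
URL is a made-up example, added only to show a case where decoding the
whole link would change what the query means.

```python
from urllib.parse import parse_qs, quote, unquote, urlsplit

# The link as it appears in the original message.
original_link = (
    "http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1"
    "&q=http%3A%2F%2Fwww.jobindex.dk"
)

# Percent-decoding the whole URL gives the form seen after sanitizing.
decoded_link = unquote(original_link)
print(decoded_link)

# Re-encoding only the parameter value restores the original form.
print(quote("http://www.jobindex.dk", safe=""))

# Hypothetical example: an encoded "&" inside a value. Once decoded, it
# is read as a parameter separator and the value is cut short.
hypothetical = "http://www.example.com/search?q=cats%26dogs"
q_encoded = parse_qs(urlsplit(hypothetical).query)["q"]
q_decoded = parse_qs(urlsplit(unquote(hypothetical)).query)["q"]
print(q_encoded, q_decoded)
```

In my message the decoded link happens to parse to the same q value, but
as the made-up case suggests, that need not hold for every link.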

Below is an example: a diff of a message after (<) and before (>) it
has passed through the sanitizer. The last hunk (51c44) is the one
related to my question.

Best regards
  Anders Nielsen. 


$ diff cur/1048514032.13681_1.mail.jobsafari.dk\:2,S 
backup/new/1048514032.13681_0.mail.jobsafari.dk 
17,19d16
< X-Security: MIME headers sanitized on mail.jobsafari.dk
< 	See http://www.impsec.org/email-tools/sanitizer-intro.html
< 	for details. $Revision: 1.138 $Date: 2003-01-26 11:25:54-08 
26,29d22
< X-Spam-Status: No, hits=1.7 required=5.0
< 	tests=UPPERCASE_25_50,MAILTO_LINK,AWL
< 	version=2.31
< X-Spam-Level: *
47,48c40,41
<   <DEFANGED_META HTTP-EQUIV="Content-Type" CONTENT="text/html;
CHARSET=UTF-8">
<   <DEFANGED_META NAME="GENERATOR" CONTENT="GtkHTML/1.1.8">
---
>   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
>   <META NAME="GENERATOR" CONTENT="GtkHTML/1.1.8">
51c44
< <A
HREF="http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=http://www.jobindex.dk">test</A><BR>
---
> <A
HREF="http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=http%3A%2F%2Fwww.jobindex.dk">test</A><BR>



-- 
Anders Nielsen <anielsen at diku.dk>


