[Esd-l] Why are urls in html decoded?

John D. Hardin jhardin at impsec.org
Mon Mar 24 07:11:27 PST 2003


On 24 Mar 2003, Anders Nielsen wrote:

> I am using revision 1.138 of the sanitizer. I have noticed that it
> URL-decodes the links in the HTML part of a message. Is this
> correct? I don't understand why it does that - aren't URLs in HTML
> supposed to have these encodings?

The encodings are quite often used by spammers to confuse
string-matching antispam filters. The sanitizer decodes printable
characters (alphanumerics and certain punctuation marks) so that
something like "%46%52%45%45", which has no legitimate reason to be
encoded, becomes "FREE" and thus might contribute to the
classification of a message as spam.
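
As a rough illustration of that kind of selective decoding, here is a
minimal Python sketch. It is not the sanitizer's actual code: the
function name decode_printable and the DECODABLE character set are
assumptions made for this example, and the sanitizer's real list of
punctuation marks may differ.

```python
import re
import string

# Assumed set of characters considered safe to decode. This list is
# hypothetical; the sanitizer's real list may differ.
DECODABLE = set(string.ascii_letters + string.digits + "-_.:/")

# Matches one percent-escape such as %46 or %2f.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_printable(text):
    """Replace %XX with its literal character when that character is in DECODABLE."""
    def _sub(match):
        char = chr(int(match.group(1), 16))
        # Leave the escape untouched if the character is not on the list.
        return char if char in DECODABLE else match.group(0)
    return _ESCAPE.sub(_sub, text)


print(decode_printable("%46%52%45%45"))                     # FREE
print(decode_printable("&q=http%3A%2F%2Fwww.jobindex.dk"))  # &q=http://www.jobindex.dk
print(decode_printable("a%20b%26c"))                        # a%20b%26c (unchanged)
```

Escapes for characters outside the list, such as the space and "&" in
the last call, are left encoded, so a string matcher sees "FREE" while
the separators that give a URL its structure stay as they were.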

This is about the only nod to spam filtering that the sanitizer makes.

> In other words: Why is &q=http%3A%2F%2Fwww.jobindex.dk turned into
> &q=http://www.jobindex.dk?

Did that break the link?
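
One quick way to check is to compare how a query-string parser reads
the two forms. The sketch below uses Python's standard
urllib.parse.parse_qs; the helper name same_query and the leading
"a=1" parameter are made up for the example. It only shows how this
one parser behaves, so a given web server or script could still treat
the two forms differently.

```python
from urllib.parse import parse_qs


def same_query(first, second):
    """Return True if both query strings parse to the same parameters."""
    return parse_qs(first) == parse_qs(second)


# "a=1" is a hypothetical leading parameter so that "&q=" has a neighbour.
encoded = "a=1&q=http%3A%2F%2Fwww.jobindex.dk"
decoded = "a=1&q=http://www.jobindex.dk"

print(same_query(encoded, decoded))  # True
print(parse_qs(decoded)["q"])        # ['http://www.jobindex.dk']
```

Here ":" and "/" do not act as separators inside the query string, so
the value of q comes out the same either way.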

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 jhardin at impsec.org                        pgpk -a jhardin at impsec.org
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
 ...voice or no voice, the people can always be brought to the bidding
 of the leaders. That is easy. All you have to do is tell them they
 are being attacked and denounce the pacifists for lack of patriotism
 and exposing the country to danger. It works the same way in any
 country.
                                            -- Hermann Goering
-----------------------------------------------------------------------
   59 days until The Matrix Reloaded
