If you allow user-contributed content on your site, you run into the problem of handling user-supplied HTML safely. The most secure approach, of course, is to strip or escape all HTML from user input fields. Unfortunately, there are many situations where it would be nice to allow a large subset of HTML input while blocking anything potentially dangerous.
SafeHTML is a lightweight PHP user input sanitizer that does just that. Run any input field through the SafeHTML filter and any JavaScript, object tags, or layout-breaking tags will be stripped from the supplied text. It also does a reasonable job of correcting gnarly, malformed code, which is another common problem with user-contributed data.
Using it is easy. Just instantiate the SafeHTML object and call its parse method:
require_once('classes/safehtml.php');
$safehtml = new SafeHTML();
$cleaninput = '';
if ( isset( $_POST["inputfield"] ) )
{
    // Sanitize the submitted field; $cleaninput stays empty if nothing was posted.
    $inputfield = $_POST["inputfield"];
    $cleaninput = $safehtml->parse($inputfield);
}
This will take the posted "inputfield" parameter, strip any baddies, XHTMLify what's left, and the result will be stored in the $cleaninput variable. It's a simple addition to your code, and a lot more straightforward than trying to roll your own.
My only beef with the package is that it's written with a default-allow policy: it strips out tags listed in its deleteTags array but lets essentially everything else through. If you'd rather only let through tags you specifically want to allow, I'd recommend adding an allowTags array property to the class and adjusting the _openHandler method by adding the following after the deleteTags check:
// Default deny: return before the tag is output unless it is explicitly allowed.
if ( ! in_array($name, $this->allowTags)) {
    return true;
}
You'll need to fill allowTags with everything you know to be safe and welcome. You may miss a few tags that people legitimately want to use, but that is easily corrected, and a default-deny policy is much safer in the long run.
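To make the difference between the two policies concrete, here is a small self-contained sketch. It does not use SafeHTML: `passesDenyList()` and `passesAllowList()` are hypothetical helpers, and both tag lists are illustrative rather than SafeHTML's actual deleteTags array.

```php
<?php
// Hypothetical helpers, not part of SafeHTML: each decides whether an
// opening tag survives filtering under one of the two policies.

// Default allow: anything not explicitly blocked gets through.
function passesDenyList(string $name, array $deleteTags): bool
{
    return !in_array(strtolower($name), $deleteTags, true);
}

// Default deny: the deleteTags check runs first (mirroring the order
// suggested above), then the tag must also be explicitly allowed.
function passesAllowList(string $name, array $deleteTags, array $allowTags): bool
{
    $name = strtolower($name);
    if (in_array($name, $deleteTags, true)) {
        return false;
    }
    return in_array($name, $allowTags, true);
}

// Illustrative lists only.
$deleteTags = ['script', 'object', 'iframe', 'embed'];
$allowTags  = ['p', 'b', 'i', 'em', 'strong', 'a', 'ul', 'ol', 'li', 'blockquote'];

foreach (['p', 'script', 'marquee'] as $tag) {
    $deny  = passesDenyList($tag, $deleteTags) ? 'kept' : 'stripped';
    $allow = passesAllowList($tag, $deleteTags, $allowTags) ? 'kept' : 'stripped';
    echo "$tag: deny-list $deny, allow-list $allow\n";
}
```

A tag neither list anticipated, such as `marquee`, is kept under the deny-list policy but stripped under the allow-list policy, which is exactly the case a default-deny approach is meant to catch.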
Whilst SafeHTML is a good library, I'm actually in the process of moving an internal project I develop at work from SafeHTML to HTMLPurifier.
SafeHTML is perfectly fine with the current set of tags and so forth, but should HTML be extended in the future, the blocklist it currently ships with could become an issue, since any new tag that isn't on the list would be let through.
My reasoning is that I'm not always going to be around (I could get hit by a bus tomorrow), and although HTMLPurifier is slower, it might be safer in the long run.
A SafeHTML alternative: htmLawed. Fantastic, and superfast:
bioinformatics.org/phplabware/internal_utilities/htmLawed/index.php
Good ole HTML Tidy has been used this way too:
http://tidy.sourceforge.net/
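Filters like this can sometimes be fooled easily, for example by input that only turns into a dangerous tag after one round of stripping. To guard against that, I re-run the parser until its output stops changing: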
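if ( isset( $_POST["inputfield"] ) )
{
    $inputfield = $_POST["inputfield"];
    $cleaninput = $safehtml->parse($inputfield);
    // Keep re-parsing until a pass changes nothing. The cap of 10 passes is
    // arbitrary; it only stops a filter that never settles from looping forever.
    $passes = 1;
    while ( $inputfield !== $cleaninput && $passes < 10 )
    {
        $inputfield = $cleaninput;
        $cleaninput = $safehtml->parse($inputfield);
        $passes++;
    }
}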
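The loop is a fixed-point iteration: keep filtering until a pass changes nothing. The sketch below illustrates why that can matter, using a deliberately naive, hypothetical `naiveStrip()` filter rather than SafeHTML (nothing here claims SafeHTML itself is fooled by this particular input). `stripUntilStable()` is likewise a hypothetical helper, and its limit of 10 passes is an arbitrary choice.

```php
<?php
// Hypothetical naive filter, NOT SafeHTML: deletes literal "<script>" and
// "</script>" substrings in a single str_ireplace() call.
function naiveStrip(string $s): string
{
    return str_ireplace(['<script>', '</script>'], '', $s);
}

// Re-apply $filter until the output stops changing (a fixed point).
// $maxPasses is an arbitrary cap so a filter that never settles cannot hang.
function stripUntilStable(callable $filter, string $input, int $maxPasses = 10): string
{
    $current = $input;
    for ($i = 0; $i < $maxPasses; $i++) {
        $next = $filter($current);
        if ($next === $current) {
            return $current; // nothing left to strip
        }
        $current = $next;
    }
    return $current;
}

// Removing the inner tags glues the outer fragments into new, complete tags.
$attack = '<scr<script>ipt>alert(1)</scr</script>ipt>';

echo naiveStrip($attack), "\n";                     // <script>alert(1)</script>
echo stripUntilStable('naiveStrip', $attack), "\n"; // alert(1)
```

A single pass hands back a working script tag; only repeating the filter until the output is stable removes it.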