Sanitize your users' HTML input
Courtenay: August 25th, 2008
The default Rails sanitize helper is actually quite powerful. Here’s an example of its usage:
<%= sanitize @article.body, :tags => %w(table tr td), :attributes => %w(id class style) %>
However, as the docs say: “Please note that sanitizing user-provided text does not guarantee that the resulting markup is valid.”
We were having trouble with users submitting bad markup and leaving their tags unclosed, like this:
This is <a href="http://foo.com">my dog<a/> and he’s super cool!
We solved it by running their input through Hpricot before saving:
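before_save :clean_html

# Re-serialize the body through Hpricot's parser so that
# tags the user left open come back closed.
def clean_html
  self.body = Hpricot(body).to_html
end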
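Running the input through Hpricot means it is parsed into a tree and written back out, and the tree has no notion of an unclosed tag. As a rough, hypothetical illustration of the kind of repair involved (not how Hpricot works internally), here is a naive plain-Ruby sketch that tracks open tags on a stack and appends closers for any left open. The helper name close_open_tags and the VOID_TAGS list are made up for this example, and the regex-based scanning assumes no comments, CDATA sections, or ">" characters inside attribute values.

```ruby
require 'strscan'

# Elements that never take a closing tag in HTML.
VOID_TAGS = %w(area base br col embed hr img input link meta param source track wbr).freeze

# Naive sketch (hypothetical helper, not Hpricot): closes tags left open in
# an HTML fragment. Assumes no comments, CDATA, or ">" inside attribute values.
def close_open_tags(html)
  scanner = StringScanner.new(html)
  stack = [] # names of currently open elements, innermost last
  out = []

  until scanner.eos?
    if scanner.scan(%r{</([a-zA-Z][a-zA-Z0-9]*)\s*>})
      name = scanner[1].downcase
      # Closing tags with no matching opener are dropped.
      next unless stack.include?(name)
      # Close any inner elements still open, then this one.
      until (top = stack.pop) == name
        out << "</#{top}>"
      end
      out << "</#{name}>"
    elsif scanner.scan(/<([a-zA-Z][a-zA-Z0-9]*)[^>]*>/)
      tag = scanner.matched
      name = scanner[1].downcase
      out << tag
      # Void elements and "/>" tags don't need a closer.
      stack.push(name) unless VOID_TAGS.include?(name) || tag.end_with?("/>")
    elsif scanner.scan(/[^<]+/)
      out << scanner.matched
    else
      out << scanner.getch # a lone "<" that doesn't start a tag
    end
  end

  # Append closers for everything still open, innermost first.
  out.concat(stack.reverse.map { |name| "</#{name}>" }).join
end
```

On the example above, this sketch appends the missing </a> but leaves the stray <a/> in place, which is one reason to lean on a real parser like Hpricot for this job.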
For performance reasons, you should probably run Hpricot and sanitize on the way into the database rather than in the views. Both are somewhat slow, and the result only needs to be computed once per save instead of on every render.
In fact, instead of using a callback, you could override the attribute writer like so:
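# Override the writer so the markup is cleaned as it is assigned,
# before it ever reaches the database.
def body=(new_body)
  write_attribute :body, Hpricot(new_body).to_html
end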
You’ll need to include ActionView::Helpers::SanitizeHelper in your model to make ‘sanitize’ available there.
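To make the performance argument concrete, here is a plain-Ruby sketch with no Rails or Hpricot involved. The Article class is a hypothetical stand-in for the ActiveRecord model, the block passed to it stands in for the Hpricot-plus-sanitize step, and the cleanings counter is an illustrative addition that records how often that step runs.

```ruby
# Plain-Ruby stand-in for the model (hypothetical; no Rails or Hpricot).
# The block given to new plays the role of Hpricot(...).to_html plus sanitize.
class Article
  attr_reader :body, :cleanings

  def initialize(&cleaner)
    @cleaner = cleaner
    @cleanings = 0 # how many times the expensive cleaning step has run
  end

  # Same idea as the overridden writer: clean once, on the way in.
  def body=(new_body)
    @cleanings += 1
    @body = @cleaner.call(new_body)
  end
end

# Stand-in cleaner for the demo; a real one would parse and whitelist.
article = Article.new { |html| html.strip }
article.body = "  <p>Hi</p>  "
3.times { article.body } # simulate three page renders
```

Reading the body three times leaves cleanings at 1, whereas doing the work in the view would repeat it on every render. The catch, as one commenter points out, is that the user's original markup is no longer stored.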
9 Responses to “Sanitize your users' HTML input”
August 25th, 2008 at 08:37 AM
Good idea!
I built xss terminate (http://code.google.com/p/xssterminate/) to help do this sanitization automatically when records are saved. I included support for sanitizing with HTML5lib (http://code.google.com/p/html5lib/) which parses HTML like browsers do to try to fix the invalid HTML problem, but I didn't try just running it through hpricot.
August 25th, 2008 at 09:13 AM
Good solution, but the thing that I don't like is including a view helper in the model. There is a reason for not having view helpers, route generators and sessions available in the model. Otherwise you'll get a really fat model.
August 25th, 2008 at 09:16 AM
http://gist.github.com/7086
You could put hpricot in there as well, and do everything in the controller where it belongs.
August 25th, 2008 at 10:38 AM
technoweenie's white_list helper made it into Rails; that sanitize method IS white_list. Just so you know.
August 25th, 2008 at 10:38 AM
oh i hate markdown
August 25th, 2008 at 10:42 AM
One issue I have is that this changes the user's original body, which is why I tend to save to an alternate field like formatted_body. Also, if you look at the sanitize_helper, the HTML sanitizers are classes from the HTML tokenizer library. There's no need to include helpers:
August 25th, 2008 at 04:49 PM
If anyone misses Perl's powerful HTML::Scrubber module, Michael Moen wrote HpricotScrub (http://github.com/UnderpantsGnome/hpricot_scrub/tree/master).
Examples of scrubbing rules: http://github.com/UnderpantsGnome/hpricot_scrub/tree/master/test/hpricot_scrub_test.rb
August 26th, 2008 at 06:42 PM
I'm using Tidy on a project to make sure user input doesn't produce invalid HTML. I'm not sure how its performance compares to Hpricot, but I'm doing it on the way out of the database, and the performance hit is negligible.
August 27th, 2008 at 04:33 AM
I've used hpricot_scrub before too, with a great deal of success. I actually do not store the scrubbed version. That way you don't have to worry about synchronizing the fields and pulling out twice as much data from the DB every time. If you properly cache your views, you won't actually be calling the scrub operation very often at all.