In recent years XSS surpassed buffer overflows to become the most common of all publicly reported security vulnerabilities. [ed: the last time I wrote about this, in early 2007, buffer overflows were more common.] Likely at least 70% of websites are open to XSS attacks on their users. Site administrators rarely fix XSS problems and, when they do, the hole is likely to have been open for more than a month and a half. In general, cross-site scripting holes can be seen as vulnerabilities present in web pages which allow attackers to bypass security mechanisms. By finding clever ways of injecting malicious scripts into web pages, an attacker can gain elevated access privileges to sensitive page content, session cookies, and a variety of other objects.
Incredibly scary stuff. And it's all due to insufficient sanitization of user input wherever HTML, or some subset of HTML, is allowed. Check out some of the standard XSS exploits for examples of the clever ways hackers can exploit the tiniest of oversights in your HTML input sanitizing. Think there are just five or six ways to build an <a> or <img> tag? Think again. There are hundreds!
So that's my challenge with the WMD editor. I have to write XSS-proof code to sanitize the HTML input on the server before I write it to the database. I'd like your feedback on how best to do this. Here's my general approach, in pseudocode form. Given an arbitrary HTML string:
1. Run a regular expression to match all the HTML <tags> in the HTML string.
2. For each individual tag match, verify that it passes our tag regular expression whitelist.
3. If the tag match does not pass, remove the entire tag from the content.
4. Repeat from step 2 until we're out of tags.
5. Return the sanitized HTML string.
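To make those steps concrete, here is a minimal sketch in Python. It is not my C# SanitizeHtml routine. The names `TAG_PATTERN`, `WHITELIST`, `is_whitelisted`, and `sanitize_html` are made up for illustration, and the whitelist is a deliberately tiny assumption, not a vetted set of safe tags. Treat it as a picture of the approach, not as XSS-proof code.

```python
import re

# Step 1: anything that looks like a tag, i.e. '<', a run of non-'>' characters, then '>'.
TAG_PATTERN = re.compile(r"<[^>]*>")

# Hypothetical whitelist (an assumption for this sketch). A tag survives only if
# one of these patterns matches the ENTIRE tag, attributes included.
WHITELIST = [
    re.compile(r"</?(?:b|i|em|strong|p|pre|code|blockquote|ul|ol|li)>", re.IGNORECASE),
    re.compile(r"<br\s?/?>", re.IGNORECASE),
    re.compile(r'<a\shref="https?://[^"<>\s]+">', re.IGNORECASE),
    re.compile(r"</a>", re.IGNORECASE),
]


def is_whitelisted(tag):
    """Step 2: does this single tag fully match any whitelist pattern?"""
    return any(pattern.fullmatch(tag) for pattern in WHITELIST)


def sanitize_html(html):
    """Steps 3 to 5: drop every tag that fails the whitelist, keep the rest."""
    def keep_or_drop(match):
        tag = match.group(0)
        return tag if is_whitelisted(tag) else ""

    # re.sub visits each tag match in turn, which covers the "repeat" step.
    return TAG_PATTERN.sub(keep_or_drop, html)


print(sanitize_html('<b>hi</b><script>alert(1)</script>'))
print(sanitize_html('<a href="javascript:alert(1)">x</a>'))
```

Note that only the tags themselves are removed, exactly as in the steps above: the text between a stripped `<script>` pair stays behind as plain text, and stripping a bad opening `<a>` leaves its closing `</a>` orphaned. A real sanitizer would probably want to balance tags afterwards.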
Update: I removed some unnecessary extra code; all input is now processed by the HTML sanitizer. It's slightly too much code to post here in a blog entry, so I have posted my C# SanitizeHtml routine on RefactorMyCode.com [ed. note: site is spam now, so link has been removed]. Please take a look and let me know what you think. (Scroll to the bottom, however, to see the latest "refactoring".) Help me refactor my code, because I make bad software, with bugs!
I've been itching for an excuse to link to RefactorMyCode for a while. It's a great site for coders, and signing up to submit code is super easy through OpenID, with no redundant account creation necessary! Even if you have no interest whatsoever in my crappy SanitizeHtml function, I encourage you to visit RefactorMyCode [ed. note: Actually, don't. URL campers have it and put something shady there] and consider the value of many internet eyes on a snippet of your code.