Securing SVG Sanitizers: A Deep Dive into Untrusted Input & Software Engineering Reports
In the world of web development, handling user-generated content, especially complex formats like SVG, presents significant security challenges. A recent GitHub discussion on the security of php-svg-optimizer, initiated by MathiasReker, brought to light critical considerations for developers aiming to create robust sanitizers. This community insight delves into the expert advice shared, offering a comprehensive guide to securing SVG processing against untrusted input, crucial for any software engineering reports on secure development practices.
The Challenge: Securing Untrusted SVG Input
MathiasReker sought community review for his PHP SVG sanitizer, designed to optimize and secure SVG files. His primary concern was ensuring the sanitizer could safely handle untrusted SVG input, preventing vulnerabilities like Cross-Site Scripting (XSS) and other potential attacks. The sanitizer focused on removing unsafe elements, non-standard tags, and risky attributes. He shared an example of its usage:
')
    ->withRules(
        removeNonStandardAttributes: true,
        removeNonStandardTags: true,
        removeUnsafeElements: true,
    )
    ->allowRisky()
    ->optimize()
    ->saveToFile('path/to/output.svg');
} catch (\Exception $exception) {
    echo $exception->getMessage();
}
?>
While a good starting point, the community quickly pointed out that a more rigorous approach is necessary when dealing with truly untrusted data.
Expert Recommendations for Bulletproof SVG Sanitization
The most comprehensive feedback came from midiakiasat, who outlined a set of stringent requirements, emphasizing that for untrusted SVG, the goal must be "no executable surface left." These recommendations are vital for any software engineering reports on application security:
1. Whitelist, Not Blacklist
- Explicitly define allowed tags: Instead of removing "unsafe" elements, maintain a strict list of allowed tags (e.g., svg, g, path, rect, circle, defs, linearGradient). Drop everything else.
- Attribute-per-tag whitelisting: For each allowed tag, explicitly list its permitted attributes. Many standard SVG features can be exploited.
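A minimal sketch of such an allowlist, assuming a single map from tag name to permitted attribute names. The constant name, the helper functions, and the specific entries are illustrative assumptions, not the policy used by php-svg-optimizer or a vetted production list.

```php
<?php
declare(strict_types=1);

// Hypothetical allowlist: tag name => attribute names permitted on that tag.
// Anything not listed here is dropped. The entries are examples only.
const ALLOWED_ATTRIBUTES_BY_TAG = [
    'svg'    => ['viewBox', 'width', 'height'],
    'g'      => ['transform', 'fill', 'stroke'],
    'path'   => ['d', 'fill', 'stroke', 'stroke-width'],
    'rect'   => ['x', 'y', 'width', 'height', 'fill'],
    'circle' => ['cx', 'cy', 'r', 'fill'],
];

function isAllowedTag(string $tag): bool
{
    // Exact, case-sensitive match: XML names are case-sensitive.
    return array_key_exists($tag, ALLOWED_ATTRIBUTES_BY_TAG);
}

function isAllowedAttribute(string $tag, string $attribute): bool
{
    // An attribute is only acceptable in the context of an allowed tag.
    return isAllowedTag($tag)
        && in_array($attribute, ALLOWED_ATTRIBUTES_BY_TAG[$tag], true);
}
```

Keeping attributes per tag, rather than in one global list, means an attribute that is harmless on one element is not automatically accepted on another.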
2. Eliminate All Script Vectors
- Remove script-related tags.
- Strip all on* attributes (e.g., onload, onclick).
- Reject attribute values starting with javascript:, vbscript:, data:text/html.
- Strictly validate href/xlink:href attributes.
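The attribute checks above could be sketched roughly as follows. The function names are hypothetical, and the href policy shown (same-document fragments only) is one strict assumption among several reasonable ones. The checks are meant to run on attribute values read from a parsed DOM, where character entities have already been decoded.

```php
<?php
declare(strict_types=1);

function isEventHandler(string $name): bool
{
    // Any attribute whose name starts with "on" (onload, onclick, ...).
    return stripos($name, 'on') === 0;
}

function hasDangerousScheme(string $value): bool
{
    // Conservative normalization: drop whitespace and control characters,
    // which browsers may tolerate inside a URL, then lowercase.
    $normalized = strtolower((string) preg_replace('/[\x00-\x20]+/', '', $value));
    foreach (['javascript:', 'vbscript:', 'data:text/html'] as $prefix) {
        if (str_starts_with($normalized, $prefix)) {
            return true;
        }
    }
    return false;
}

function isSafeHref(string $value): bool
{
    // Assumed strict policy: only same-document references like "#gradient1".
    return preg_match('/^#[A-Za-z_][A-Za-z0-9_.-]*$/', $value) === 1;
}
```

Note that isSafeHref accepts a known-good shape instead of trying to enumerate bad schemes, which is the same allowlist idea applied to values.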
3. Block External Resource Loading
- Disallow remote URLs in elements that load resources and in CSS url().
- Remove @import in <style> tags.
- Ideally, remove <style> entirely for untrusted input or disallow external fonts.
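One way to detect external loading in CSS text is sketched below, assuming that only same-document fragment references are acceptable. The helper names are hypothetical. A regular expression cannot see through CSS escape sequences, so this is a coarse filter and not a substitute for removing style content or using a real CSS parser.

```php
<?php
declare(strict_types=1);

function isLocalReference(string $target): bool
{
    // Only same-document fragments such as "#grad1" count as local.
    return preg_match('/^#[A-Za-z_][A-Za-z0-9_.-]*$/', trim($target)) === 1;
}

/** @return string[] every target found inside url(...) */
function cssUrlTargets(string $css): array
{
    // Matches url(x), url("x") and url('x'), with optional inner whitespace.
    preg_match_all('/url\(\s*(["\']?)(.*?)\1\s*\)/is', $css, $matches);
    return $matches[2];
}

function loadsExternalResource(string $css): bool
{
    // @import always pulls in another stylesheet, so its presence is enough.
    if (stripos($css, '@import') !== false) {
        return true;
    }
    foreach (cssUrlTargets($css) as $target) {
        if (!isLocalReference($target)) {
            return true;
        }
    }
    return false;
}
```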
4. Harden XML Parsing (Critical)
- Disable DTD (Document Type Definition) processing.
- Disable external entity resolution to prevent XXE (XML External Entity) attacks.
- Prevent entity expansion.
- When using libraries like libxml, ensure the entity loader and network access are disabled.
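With PHP's DOM extension, which is built on libxml, a hardened loader might look like the sketch below. The function name loadUntrustedSvg is hypothetical. The policy of rejecting every DOCTYPE outright is an assumption that goes slightly further than merely disabling DTD processing, and the external entity loader setting is process-wide, so it affects other XML loading in the same process.

```php
<?php
declare(strict_types=1);

function loadUntrustedSvg(string $xml): DOMDocument
{
    // Cheap pre-check: refuse anything that declares a DTD or entities.
    if ($xml === ''
        || stripos($xml, '<!DOCTYPE') !== false
        || stripos($xml, '<!ENTITY') !== false
    ) {
        throw new InvalidArgumentException('Empty input or DOCTYPE/ENTITY declaration');
    }

    // Process-wide: any attempt to fetch an external entity resolves to nothing.
    libxml_set_external_entity_loader(static fn () => null);

    $previous = libxml_use_internal_errors(true);
    try {
        $document = new DOMDocument();
        // LIBXML_NONET blocks network access while parsing. LIBXML_NOENT,
        // LIBXML_DTDLOAD and LIBXML_DTDATTR are deliberately left out because
        // they turn on entity substitution and DTD loading.
        if (!$document->loadXML($xml, LIBXML_NONET)) {
            throw new InvalidArgumentException('Input is not well-formed XML');
        }
        // Second check in case the string pre-check was bypassed by an
        // unusual encoding.
        if ($document->doctype !== null) {
            throw new InvalidArgumentException('DOCTYPE is not allowed');
        }
        return $document;
    } finally {
        // Restore the caller's libxml error handling.
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}
```

A common pitfall is that LIBXML_NOENT, despite its name, enables entity substitution, so it should not be passed for untrusted input.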
5. CSS is an Execution Surface
- If <style> is allowed: Strip expression(), url(javascript:...), and @import.
- Reject unknown CSS properties unless explicitly whitelisted.
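For inline style attributes, the property allowlist could be sketched as below. The constant, the function name, and the four listed properties are illustrative assumptions. The value pattern only admits plain keywords, numbers, hex colours, and units, so expression(), url(), @import, and backslash escapes are rejected as a side effect instead of being searched for individually. The contents of a <style> element would need a real CSS parser, which this sketch does not attempt.

```php
<?php
declare(strict_types=1);

// Hypothetical allowlist of CSS properties; extend only after review.
const ALLOWED_CSS_PROPERTIES = ['fill', 'stroke', 'stroke-width', 'opacity'];

function sanitizeStyleAttribute(string $style): string
{
    $kept = [];
    foreach (explode(';', $style) as $declaration) {
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2) {
            continue; // not a "property: value" pair
        }
        $property = strtolower(trim($parts[0]));
        $value = trim($parts[1]);
        if (!in_array($property, ALLOWED_CSS_PROPERTIES, true)) {
            continue; // unknown property: dropped
        }
        // Values may contain only letters, digits, "#", ".", "%", "-" and
        // whitespace, e.g. "red", "#ff0000", "0.5", "2px".
        if (preg_match('/^[#A-Za-z0-9.%\s-]+$/', $value) !== 1) {
            continue;
        }
        $kept[] = $property . ': ' . $value;
    }
    return implode('; ', $kept);
}
```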
6. Namespace Control
- Reject unknown namespaces.
- Reject embedded HTML/MathML.
- Normalize namespaces before validation.
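A rough sketch of a namespace check on an already parsed document follows. Comparing namespace URIs and local names, never prefixes, is what normalizes the input here, because a prefix can be bound to any URI. The function name is hypothetical, and the choice to permit only the SVG namespace for elements and only XLink for namespaced attributes is an assumption.

```php
<?php
declare(strict_types=1);

const SVG_NAMESPACE = 'http://www.w3.org/2000/svg';
const XLINK_NAMESPACE = 'http://www.w3.org/1999/xlink';

/** @return string[] names of elements and attributes outside the allowed namespaces */
function findForeignNodes(DOMDocument $document): array
{
    $foreign = [];
    $xpath = new DOMXPath($document);

    // Every element must live in the SVG namespace; embedded HTML or MathML
    // elements carry a different namespace URI and are reported.
    foreach ($xpath->query('//*') as $element) {
        if ($element->namespaceURI !== SVG_NAMESPACE) {
            $foreign[] = $element->nodeName;
        }
    }

    // Attributes without a namespace are fine at this stage; namespaced ones
    // must be XLink.
    foreach ($xpath->query('//@*') as $attribute) {
        $namespace = $attribute->namespaceURI;
        if ($namespace !== null && $namespace !== '' && $namespace !== XLINK_NAMESPACE) {
            $foreign[] = $attribute->nodeName;
        }
    }
    return $foreign;
}
```

A sanitizer would typically reject the whole document when this list is non-empty.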
7. Rebuild, Don’t Mutate
The most secure approach is to parse the input SVG into a DOM, then construct a new, clean DOM tree using only explicitly allowed nodes and attributes. Finally, serialize this new DOM. Never attempt to partially edit or "clean" the raw input directly.
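The rebuild approach can be sketched with PHP's DOM extension as follows. The allowlist, the function names, and the decision to drop text nodes, comments, and namespaced attributes are all assumptions made for illustration. This is not how php-svg-optimizer is implemented.

```php
<?php
declare(strict_types=1);

const SVG_NS = 'http://www.w3.org/2000/svg';

// Hypothetical allowlist: tag => permitted (un-namespaced) attributes.
const REBUILD_ALLOWLIST = [
    'svg'  => ['viewBox', 'width', 'height'],
    'g'    => ['transform'],
    'path' => ['d', 'fill'],
    'rect' => ['x', 'y', 'width', 'height', 'fill'],
];

function copyAllowed(DOMElement $source, DOMDocument $target): ?DOMElement
{
    $tag = $source->localName;
    if ($source->namespaceURI !== SVG_NS || !array_key_exists($tag, REBUILD_ALLOWLIST)) {
        return null; // unknown element: dropped together with its subtree
    }
    // A fresh element is created; nothing from the input node is reused.
    $clean = $target->createElementNS(SVG_NS, $tag);
    foreach ($source->attributes as $attribute) {
        if ($attribute->namespaceURI === null
            && in_array($attribute->localName, REBUILD_ALLOWLIST[$tag], true)
        ) {
            $clean->setAttribute($attribute->localName, $attribute->value);
        }
    }
    foreach ($source->childNodes as $child) {
        // Text, comments, CDATA and processing instructions are not copied.
        if ($child instanceof DOMElement) {
            $copy = copyAllowed($child, $target);
            if ($copy !== null) {
                $clean->appendChild($copy);
            }
        }
    }
    return $clean;
}

function rebuildSvg(string $xml): string
{
    $previous = libxml_use_internal_errors(true);
    $input = new DOMDocument();
    $loaded = $xml !== '' && $input->loadXML($xml, LIBXML_NONET);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded || $input->doctype !== null) {
        throw new InvalidArgumentException('Input rejected');
    }

    $output = new DOMDocument('1.0', 'UTF-8');
    $root = copyAllowed($input->documentElement, $output);
    if ($root === null || $root->localName !== 'svg') {
        throw new InvalidArgumentException('Root element must be an SVG svg element');
    }
    $output->appendChild($root);
    return (string) $output->saveXML($root);
}
```

Because the output tree starts empty, anything the allowlist does not name can never reach the serialized result, including constructs the author of the sanitizer has not heard of.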
8. allowRisky() is a Red Flag
A sanitizer designed for untrusted input must not offer a "risky" mode. Security must be deterministic, non-optional, and always at its highest level for such critical operations.
Conclusion
The discussion underscores that SVG sanitization for untrusted input is a complex task requiring a highly defensive, whitelist-based strategy. Developers working on such tools, or integrating them, must consider every potential execution surface. These insights provide a robust framework for enhancing security and are invaluable for any software engineering reports on secure coding practices.