Class HtmlParserUtil

Object
com.liferay.portal.kernel.util.HtmlParserUtil

public class HtmlParserUtil extends Object
Provides utility methods for rendering HTML text. This class uses XSS recommendations from http://www.owasp.org/index.php/Cross_Site_Scripting#How_to_Protect_Yourself when escaping HTML text.
Author:
Brian Wing Shun Chan, Clarence Shen, Harry Mark, Samuel Kong
  • Constructor Details

    • HtmlParserUtil

      public HtmlParserUtil()
  • Method Details

    • extractText

      public static String extractText(String html)
      Extracts the raw text from the HTML input, compressing its whitespace and removing all attributes, scripts, and styles.

      For example, raw text returned by this method can be stored in a search index.

      Parameters:
      html - the HTML text
      Returns:
      the raw text from the HTML input, or null if the HTML input is null
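
      The behavior described above can be approximated with a small stand-in. This is a minimal sketch, not Liferay's implementation: the class name ExtractTextSketch and its regular expressions are assumptions made for illustration, and it does not decode character entities or handle malformed markup.

```java
import java.util.regex.Pattern;

// Illustrative stand-in only; NOT the implementation behind HtmlParserUtil.extractText.
class ExtractTextSketch {

    // Whole <script>...</script> and <style>...</style> elements, across lines.
    private static final Pattern SCRIPT_OR_STYLE = Pattern.compile(
        "<(script|style)\\b[^>]*>.*?</\\1\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Any remaining tag, together with its attributes.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String extractText(String html) {
        if (html == null) {
            return null; // mirrors the documented null contract
        }
        // Drop scripts and styles first so their bodies never reach the output.
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        // Replace tags with a space so adjacent blocks do not run together.
        text = TAG.matcher(text).replaceAll(" ");
        // Compress every whitespace run to a single space.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<div class=\"a\"><style>p{color:red}</style>"
            + "<p>Hello,\n   <b>world</b></p><script>alert(1)</script></div>";
        System.out.println(extractText(html)); // prints "Hello, world"
    }
}
```

      A string flattened this way is the kind of value that could be passed to a search indexer, which is the use case the method description mentions.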
    • findAttributeValue

      public static String findAttributeValue(Predicate<Function<String,String>> findValuePredicate, Function<Function<String,String>,String> returnValueFunction, String html, String startTagName)
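
      This page gives no description for this method, so the following is only one plausible reading of the signature: each start tag named startTagName is exposed as an attribute lookup (attribute name to value), the predicate picks the first matching tag, and the second function chooses what to return from it. The stand-in below is a sketch under that assumption. The class FindAttributeValueSketch and its regex-based scanning are hypothetical, it only understands double-quoted attributes, and the real method may behave differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in that mirrors the documented signature; semantics are assumed.
class FindAttributeValueSketch {

    // Double-quoted attributes only: name="value".
    private static final Pattern ATTRIBUTE =
        Pattern.compile("([\\w:-]+)\\s*=\\s*\"([^\"]*)\"");

    static String findAttributeValue(
            Predicate<Function<String, String>> findValuePredicate,
            Function<Function<String, String>, String> returnValueFunction,
            String html, String startTagName) {

        Pattern startTag = Pattern.compile(
            "<" + Pattern.quote(startTagName) + "\\b([^>]*)>",
            Pattern.CASE_INSENSITIVE);
        Matcher tagMatcher = startTag.matcher(html);

        while (tagMatcher.find()) {
            // Collect this tag's attributes into a name -> value map.
            Map<String, String> attributes = new HashMap<>();
            Matcher attributeMatcher = ATTRIBUTE.matcher(tagMatcher.group(1));
            while (attributeMatcher.find()) {
                attributes.put(attributeMatcher.group(1), attributeMatcher.group(2));
            }

            // The lookup returns null for attributes the tag does not have.
            Function<String, String> lookup = attributes::get;
            if (findValuePredicate.test(lookup)) {
                return returnValueFunction.apply(lookup);
            }
        }
        return null; // no start tag satisfied the predicate
    }

    public static void main(String[] args) {
        String html = "<meta name=\"author\" content=\"Ann\">"
            + "<meta property=\"og:title\" content=\"Home\">";
        String title = findAttributeValue(
            attr -> "og:title".equals(attr.apply("property")),
            attr -> attr.apply("content"),
            html, "meta");
        System.out.println(title); // prints "Home"
    }
}
```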
    • render

      public static String render(String html)
      Renders the HTML content as plain text. The result is a human-readable version of the content, modeled on the way Mozilla Thunderbird® and other email clients automatically convert HTML content to text for the alternative MIME part of an email.

      With the default settings, the output complies with the text/plain; format=flowed (DelSp=no) format described in RFC 3676.

      Parameters:
      html - the HTML text
      Returns:
      the rendered HTML text, or null if the HTML text is null
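
      To make the format=flowed convention concrete: in RFC 3676, a line that ends with a space is a soft break that continues on the next line, and with DelSp=no that trailing space stays part of the text when lines are rejoined. The sketch below illustrates only that rule. The class FlowedTextSketch and its methods are hypothetical, it is not how render is implemented, and it omits space-stuffing, quoting, and the signature separator.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of RFC 3676 soft line breaks (DelSp=no); not Liferay code.
class FlowedTextSketch {

    // Wraps one paragraph so that no line, trailing space included, exceeds width
    // (a single word longer than width still gets a line of its own).
    static List<String> encode(String paragraph, int width) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();

        for (String word : paragraph.trim().split("\\s+")) {
            // Flush when adding this word plus its trailing space would overflow.
            if (line.length() > 0 && line.length() + word.length() + 1 > width) {
                lines.add(line.toString()); // ends with a space: soft break
                line.setLength(0);
            }
            line.append(word).append(' ');
        }

        // The last line loses its trailing space, which marks the paragraph end.
        lines.add(line.substring(0, Math.max(0, line.length() - 1)));
        return lines;
    }

    // Rejoins flowed lines; the trailing space of a soft break is kept (DelSp=no).
    static String decode(List<String> lines) {
        StringBuilder text = new StringBuilder();
        for (String line : lines) {
            text.append(line);
            if (!line.endsWith(" ")) {
                text.append('\n'); // fixed line: the paragraph ends here
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<String> lines =
            encode("The quick brown fox jumps over the lazy dog", 16);
        for (String line : lines) {
            System.out.println("[" + line + "]");
        }
        System.out.print(decode(lines));
    }
}
```

      RFC 3676 recommends keeping lines to 78 characters or fewer; the width of 16 in main is only there to make the wrapping visible.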