I'm using Thymeleaf to process HTML templates. I understand how to insert inline strings from my controller, but now I want to insert a fragment of HTML code into the page.
For example, let's say that I have this in my Java application:
String n="<span><i class=\"icon-leaf\"></i>"+str+"</span> \n";
final WebContext ctx = new WebContext(request, response,
servletContext, request.getLocale());
ctx.setVariable("n", n);
What do I need to write in the HTML page so that it would be replaced by the value of the n variable and be processed as HTML code instead of it being encoded as text?
You can use th:utext attribute that stands for unescaped text (see documentation). Use this with caution and avoid user input in th:utext as it can cause security problems.
<div th:remove="tag" th:utext="${n}"></div>
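Because th:utext writes the value out verbatim, one way to follow the caution above is to escape the dynamic part (str in the question) before concatenating it into the markup. A minimal sketch using only the JDK; FragmentBuilder, escapeHtml and buildFragment are hypothetical names for illustration, not Thymeleaf APIs:

```java
final class FragmentBuilder {

    // Hypothetical helper: escapes the five characters that are
    // significant in HTML text and attribute values.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds the same fragment as in the question, but with the
    // dynamic part escaped so only the trusted markup stays raw.
    static String buildFragment(String str) {
        return "<span><i class=\"icon-leaf\"></i>" + escapeHtml(str) + "</span> \n";
    }

    public static void main(String[] args) {
        System.out.println(buildFragment("<script>alert(1)</script>"));
    }
}
```

The result of buildFragment can then be passed to ctx.setVariable("n", ...) as in the question; the span and i tags are rendered as HTML while anything inside str is shown as text.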
If you want the shorthand syntax, you can use the following:
[(${variable})]
The escaped shorthand syntax is
[[${variable}]]
but if you replace the inner square brackets [ ] with parentheses ( ), the HTML is not escaped.
Example within tags:
<div>
[(${variable})]
</div>
Starting with Thymeleaf 3.0, the HTML-friendly attribute would be:
<div class="mailbox-read-message" data-th-utext="*{body}">
I am attempting to convert a bunch of HTML documents to XML compliance (via a java method) and there are a lot of <br> tags that either (1) are unclosed or (2) contain attributes. For some reason the regex I'm using does not address the tags that contain attributes. Here is the code:
htmlString = htmlString.replaceAll("(?i)<br *>", "<br/>");
This code works fine for all the <br> tags in the documents; it replaces them with <br/>. However, for tags like
<BR style="PAGE-BREAK-BEFORE: always" clear=all>
it doesn't do anything. I'd like all br tags to just be <br/>, regardless of any attributes in the tag prior to conversion.
What do I need to add to my regex in order to achieve this?
This regex will do what you want: <(BR|br)[^>]*>
Here is a working example: Regex101
You probably want <br\b[^>]*> to match all tags that:
Start with <br
Have a word boundary after the <br (so you wouldn't match a <brown> tag, for example)
Contain any number of non-> characters, including 0
End with a >
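Applied to the Java code from the question, the pattern described above could look like the following sketch. BrTagNormalizer and normalize are hypothetical names; the (?i) flag is taken from the question's own regex:

```java
import java.util.regex.Pattern;

final class BrTagNormalizer {

    // (?i)   case-insensitive, so <BR ...> is matched too
    // \b     word boundary, so <brown> is left alone
    // [^>]*  any attributes, stopping at the first '>'
    private static final Pattern BR_TAG = Pattern.compile("(?i)<br\\b[^>]*>");

    static String normalize(String html) {
        return BR_TAG.matcher(html).replaceAll("<br/>");
    }

    public static void main(String[] args) {
        String html = "a<br>b<BR style=\"PAGE-BREAK-BEFORE: always\" clear=all>c<brown>d";
        System.out.println(normalize(html)); // prints a<br/>b<br/>c<brown>d
    }
}
```

Because [^>]* cannot cross a >, two br tags on the same line are replaced separately instead of being swallowed as one match.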
You have to use .* instead of * :
htmlString.replaceAll("(?i)<br .*>", "<br/>")
//-----------------------------^^
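because:
* matches the preceding character or subexpression zero or more times.
and
.* matches any character zero or more times. Note that .* is greedy, so on a line with several tags it runs to the last >, whereas [^>]* stops at the first one.
So for your case: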
String htmlString = "<BR style=\"PAGE-BREAK-BEFORE: always\" clear=all>";
System.out.println(htmlString.replaceAll("(?i)<br .*>", "<br/>"));
Output
<br/>
Using regular expressions to parse HTML is not a good idea because HTML is not regular. You should use a proper parsing library like NekoHTML.
NekoHTML is a simple HTML scanner and tag balancer that enables
application programmers to parse HTML documents and access the
information using standard XML interfaces. The parser can scan HTML
files and "fix up" many common mistakes that human (and computer)
authors make in writing HTML documents. NekoHTML adds missing parent
elements; automatically closes elements with optional end tags; and
can handle mismatched inline element tags.
We are storing HTML in our data model. I need to output it in a FreeMarker template.
Example:
[#assign value = model.value!]
${value}
value = '<p>This is <a href='somelink'>Some link</a></p>'
I have tried [#noescape], but it throws an error saying there is no escape block. See FREEMARKER: avoid escaping HTML chars. That solution did not work for me.
[#noescape] or <#noescape> is only valid when used inside an [#escape] tag. Your data is probably stored with the HTML encoded. You need to get the backend to un-encode the html.
Otherwise you'll need to do something like...
${value?replace("&gt;", ">")?replace("&lt;", "<")}
But that isn't a good approach because it won't catch all the encoded values and shouldn't be done in the view layer.
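If the backend is a Java application (an assumption; the question does not say), un-encoding there could look roughly like this. HtmlUnescaper and unescapeBasic are hypothetical names, and the sketch only reverses five common escapes, so it has the same coverage limits as the template workaround; a dedicated library would handle the full entity set:

```java
final class HtmlUnescaper {

    // Hypothetical helper: reverses only the five most common HTML escapes.
    static String unescapeBasic(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                // &amp; goes last, so "&amp;lt;" becomes "&lt;" and not "<"
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String stored = "&lt;p&gt;This is &lt;a href=&#39;somelink&#39;&gt;Some link&lt;/a&gt;&lt;/p&gt;";
        System.out.println(unescapeBasic(stored));
    }
}
```

Doing this once before the value reaches the model keeps the template down to a plain ${value}.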
I need to store the html retrieved from a <jsp:include> in a javascript variable. So I will have something like this
<script>
var html = '<jsp:include page="...">';
</script>
The problem is the jsp file has lots of whitespace and newlines which makes the javascript invalid! I tried using the trimDirectiveWhitespaces directive as suggested here, but that does not remove newlines.
How can I remove newlines as well from html so it can be a valid javascript string?
Or, another solution is welcome as well.
EDIT:
The snippet should eventually look like this (but with many more options):
<script>
var html = '<label class="someClass">Label</label><select><option value="val1">Value</option></select>';
</script>
I am using this pattern to remove all HTML tags (Java code):
String html="text <a href=#>link</a> <b>b</b> pic<img src=#>";
html=html.replaceAll("\\<.*?\\>", "");
System.out.println(html);
Now, I want to keep tag <a ...> (with </a>) and tag <img ...>
I want the result to be:
text <a href=#>link</a> b pic<img src=#>
How to do this?
I don't want to use an HTML parser for this, because I need the pattern to filter a lot of HTML fragments, so I am looking for a regex solution.
You could do this using a negative lookahead:
"<(?!(?:a|/a|img)\\b).*?>"
Rubular
However this has a number of problems and I would recommend instead that you use an HTML parser if you want a robust solution.
For more information see this question:
What HTML parsing libraries do you recommend in Java
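For reference, here is the lookahead pattern from above run against the sample string from the question. TagFilter and stripExceptLinksAndImages are hypothetical names used only for this sketch:

```java
import java.util.regex.Pattern;

final class TagFilter {

    // The negative lookahead rejects a match when '<' is followed by
    // a, /a or img as a whole word; every other tag is matched lazily
    // up to its first '>'.
    private static final Pattern UNWANTED_TAG =
            Pattern.compile("<(?!(?:a|/a|img)\\b).*?>");

    static String stripExceptLinksAndImages(String html) {
        return UNWANTED_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "text <a href=#>link</a> <b>b</b> pic<img src=#>";
        // prints: text <a href=#>link</a> b pic<img src=#>
        System.out.println(stripExceptLinksAndImages(html));
    }
}
```

One of the problems mentioned above shows up quickly: . does not match line breaks by default, so a tag whose attributes span several lines is not removed.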
Check out http://sourceforge.net/projects/regexcreator/ . It is a very handy GUI regex editor.
Hey! Here is your answer:
You can’t parse [X]HTML with regex.
Use a proper HTML parser, for example htmlparser, Jericho or the validator.nu HTML parser. Then use the parser’s API, SAX or DOM to pull out the stuff you’re interested in.
If you insist on using regular expressions, you’re almost certain to make some small mistake that will lead to breakage, and possibly to cross-site scripting attacks, depending on what you’re doing with the markup.
See also this answer.
I recommend you use strip_tags (a PHP function)
string strip_tags ( string $str [, string $allowable_tags ] )
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
OUTPUT
Test paragraph. Other text
<p>Test paragraph.</p> Other text
In files.jsp I am using the following combination of an anchor and JSTL c:url:
<c:url value="downloadfile.jsp" var="dwnUrl" scope="request">
<c:param name="fileType" value="PDF"/>
<c:param name="fileId" value="${file.fileId}"/>
<c:param name="fileName" value="${file.fileName}"/>
</c:url>
<a href="${dwnUrl}">Download</a>
In downloadfile.jsp I read the file name into a JavaScript variable like this:
selectedFile = <c:out value='${param.fileName}'>
Now, if the file name contains a special character, e.g. XYZ 2/3" Technical, then on the other page I get a different string: XYZ 2/3&#034; Technical
However, if I print request.getParameter("fileName"), it gives the correct name. What is wrong?
By default, <c:out> escapes XML entities, such as the double quote. This is done to produce well-formed XML and to avoid XSS.
To fix this, you should either get rid of <c:out> (since JSP 2.0, EL works perfectly fine in template text as well):
selectedFile = '${param.fileName}';
.. or, if you're still on legacy JSP 1.2 or older, set its escapeXml attribute to false:
selectedFile = '<c:out value="${param.fileName}" escapeXml="false"/>';
Note that I have added the single quotes and the semicolon to make the JS code valid.
Needless to say, you'll need to keep XSS risks in mind if you do so.
The funky characters in your <c:param> values are being URL encoded by <c:url> as they should be. As far as downloadfile.jsp is concerned, the servlet container takes care of URL decoding incoming variables so you don't have to. This is normal behavior and shouldn't pose any problems for you.
If you simply set escapeXml to false as @BalusC suggests, you will add an XSS vulnerability to your page. Instead, you should encode the user input at the point where it is injected into the destination language, escaping the characters that would be evaluated in that language. In this case, if the user input contained a single quote character (I'm assuming the string literal in your original example was supposed to be wrapped in single quotes, but the same would be true for double quotes if you were using them), any JavaScript code that followed it would be interpreted by the browser and executed. To safely do what you are trying to do, you should change the line in downloadfile.jsp to:
selectedFile = '${fn:replace(param.fileName, "'", "\\'")}';
That will escape only single quotes, which would otherwise end the string literal declaration.
If you were using double quotes, then this would be appropriate:
selectedFile = "${fn:replace(param.fileName, '"', '\\"')}";
It is worth noting that escapeXml could be appropriate for escaping JavaScript string literals (and it often is) when the string literal will eventually be dumped into HTML markup. However, in this case, the value should not be XML escaped as it is evaluated in the context of a file path, rather than in the context of HTML.
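Escaping only the quote character still leaves the backslash unhandled: a value ending in \ would turn the closing quote into an escaped one. If the escaping is moved into Java code, a slightly more defensive version might look like this sketch. JsStringEscaper and escapeJsString are hypothetical names, and the set of escaped characters is my assumption of a reasonable minimum, not a complete JavaScript encoder:

```java
final class JsStringEscaper {

    // Hypothetical helper: escapes a value for use inside a quoted
    // JavaScript string literal that is embedded in a <script> block.
    static String escapeJsString(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break;  // backslash itself
                case '\'': out.append("\\'"); break;   // single quote
                case '"': out.append("\\\""); break;   // double quote
                case '\n': out.append("\\n"); break;   // raw newlines end the literal
                case '\r': out.append("\\r"); break;
                // written as a JS unicode escape so the text "</script>"
                // cannot close the surrounding script element
                case '<': out.append("\\u003C"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("selectedFile = '" + escapeJsString("XYZ 2/3\" Technical") + "';");
    }
}
```

The escaped value could be placed in a request attribute and printed with plain EL, so the page itself needs no fn:replace calls.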