Write HTML file using Java


I want my Java application to write HTML code to a file. Right now, I am hard-coding HTML tags using the java.io.BufferedWriter class. For example:
BufferedWriter bw = new BufferedWriter(new FileWriter(file));
bw.write("<html><head><title>New Page</title></head><body><p>This is Body</p></body></html>");
bw.close();
Is there any easier way to do this, as I have to create tables and it is becoming very inconvenient?

If you want to do that yourself, without using any external library, a clean way would be to create a template.html file with all the static content, for example:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>$title</title>
</head>
<body>$body
</body>
</html>
Put a tag like $tag for any dynamic content and then do something like this:
File htmlTemplateFile = new File("path/template.html");
// Read and write with an explicit charset that matches the template's meta tag.
String htmlString = FileUtils.readFileToString(htmlTemplateFile, StandardCharsets.UTF_8);
String title = "New Page";
String body = "This is Body";
htmlString = htmlString.replace("$title", title);
htmlString = htmlString.replace("$body", body);
File newHtmlFile = new File("path/new.html");
FileUtils.writeStringToFile(newHtmlFile, htmlString, StandardCharsets.UTF_8);
Note: I used org.apache.commons.io.FileUtils for simplicity.
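For reference, the same placeholder idea can be sketched with the JDK alone. This is only a minimal sketch, assuming Java 11 or later; the class name TemplateFiller and the fill helper are made up for illustration, and the paths are the same placeholders as above. Note that a simple replace like this does not escape the values, and a placeholder that is a prefix of another one (for example $title and $titleSuffix) would need more care.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical helper, not part of any library.
class TemplateFiller {

    // Replaces every "$name" placeholder with the matching value from the map.
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("$" + entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder paths: point them at your own files.
        Path in = Path.of("path/template.html");
        Path out = Path.of("path/new.html");
        String template = Files.readString(in, StandardCharsets.UTF_8);
        String html = fill(template, Map.of("title", "New Page", "body", "This is Body"));
        Files.writeString(out, html, StandardCharsets.UTF_8);
    }
}
```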

A few months ago I had the same problem, and every library I found provided too much functionality and complexity for my final goal. So I ended up developing my own library, HtmlFlow, which provides a very simple and intuitive API that lets me write HTML in a fluent style. Check it out here: https://github.com/fmcarvalho/HtmlFlow (it also supports dynamic binding to HTML elements).
Here is an example of binding the properties of a Task object to HTML elements. Consider a Task Java class with three properties: Title, Description, and Priority. We can then produce an HTML document for a Task object in the following way:
import htmlflow.HtmlView;
import model.Priority;
import model.Task;
import java.awt.Desktop;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.net.URI;
public class App {
private static HtmlView<Task> taskDetailsView(){
HtmlView<Task> taskView = new HtmlView<>();
taskView
.head()
.title("Task Details")
.linkCss("https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css");
taskView
.body().classAttr("container")
.heading(1, "Task Details")
.hr()
.div()
.text("Title: ").text(Task::getTitle)
.br()
.text("Description: ").text(Task::getDescription)
.br()
.text("Priority: ").text(Task::getPriority);
return taskView;
}
public static void main(String [] args) throws IOException{
HtmlView<Task> taskView = taskDetailsView();
Task task = new Task("Special dinner", "Have dinner with someone!", Priority.Normal);
try(PrintStream out = new PrintStream(new FileOutputStream("Task.html"))){
taskView.setPrintStream(out).write(task);
Desktop.getDesktop().browse(URI.create("Task.html"));
}
}
}

You can use jsoup or wffweb (which is HTML5-based).
Sample code for jsoup:
Document doc = Jsoup.parse("<html></html>");
doc.body().addClass("body-styles-cls");
doc.body().appendElement("div");
System.out.println(doc.toString());
This prints:
<html>
<head></head>
<body class=" body-styles-cls">
<div></div>
</body>
</html>
Sample code for wffweb:
Html html = new Html(null) {{
new Head(this);
new Body(this,
new ClassAttribute("body-styles-cls"));
}};
Body body = TagRepository.findOneTagAssignableToTag(Body.class, html);
body.appendChild(new Div(null));
System.out.println(html.toHtmlString());
//directly writes to file
html.toOutputStream(new FileOutputStream("/home/user/filepath/filename.html"), "UTF-8");
This prints (in minified format):
<html>
<head></head>
<body class="body-styles-cls">
<div></div>
</body>
</html>

Velocity is a good candidate for writing this kind of stuff.
It allows you to keep your html and data-generation code as separated as possible.

I would highly recommend you use a very simple templating language such as FreeMarker.

It really depends on the type of HTML file you're creating.
For such tasks, I usually create an object, serialize it to XML, then transform it with XSL. The pros of this approach are:
The strict separation between source code and HTML template,
The possibility to edit HTML without having to recompile the application,
The ability to serve different HTML in different cases based on the same XML, or even serve XML directly when needed (for a further deserialization for example),
The shorter amount of code to write.
The cons are:
You must know XSLT and know how to implement it in Java.
You must write XSLT (and it's torture for many developers).
When transforming XML to HTML with XSLT, some parts may be tricky. A few examples: self-closed <textarea/> tags (which make the page unusable), the XML declaration (which can cause problems with IE), whitespace (inside <pre></pre> tags, etc.), HTML entities (&nbsp;), etc.
The performance will be reduced, since serialization to XML wastes lots of CPU resources and XSL transformation is very costly too.
Now, if your HTML is very short or very repetitive, or if it has a volatile structure which changes dynamically, this approach should not be considered. On the other hand, if you serve HTML files which all have a similar structure and you want to reduce the amount of Java code and use templates, this approach may work.
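To make the XML-to-HTML step concrete, here is a minimal sketch using only the JDK's javax.xml.transform API. The stylesheet, the <tasks>/<task> element names, and the class name XslToHtml are all invented for illustration; in a real project the XSL would live in its own file so it can be edited without recompiling.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslToHtml {

    // Hypothetical stylesheet: renders <tasks><task title="..."/></tasks> as a table.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' encoding='UTF-8'/>"
        + "<xsl:template match='/tasks'>"
        + "<html><body><table>"
        + "<xsl:for-each select='task'>"
        + "<tr><td><xsl:value-of select='@title'/></td></tr>"
        + "</xsl:for-each>"
        + "</table></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML and returns the resulting HTML.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<tasks><task title='Write report'/>"
                   + "<task title='Fix &lt;bug&gt;'/></tasks>";
        System.out.println(transform(xml));
    }
}
```

The same XML could be fed to a different stylesheet to get a different page, which is the flexibility listed among the pros above.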

I also had trouble finding something simple enough to satisfy my needs, so I decided to write my own library (under the MIT license).
It's mainly based on the composite and builder patterns.
A basic declarative example is:
import static com.github.manliogit.javatags.lang.HtmlHelper.*;
html5(attr("lang -> en"),
head(
meta(attr("http-equiv -> Content-Type", "content -> text/html; charset=UTF-8")),
title("title"),
link(attr("href -> xxx.css", "rel -> stylesheet"))
)
).render();
A fluent example is:
ul()
.add(li("item 1"))
.add(li("item 2"))
.add(li("item 3"))
You can check more examples here
I also created an online converter that transforms any HTML snippet (from a complex Bootstrap template to a simple single snippet) on the fly (i.e. html -> javatags).

Templates and other methods based on preliminary creation of the document in memory are likely to impose certain limits on resulting document size.
Meanwhile a very straightforward and reliable write-on-the-fly approach to creation of plain HTML exists, based on a SAX handler and default XSLT transformer, the latter having intrinsic capability of HTML output:
String encoding = "UTF-8";
FileOutputStream fos = new FileOutputStream("myfile.html");
OutputStreamWriter writer = new OutputStreamWriter(fos, encoding);
StreamResult streamResult = new StreamResult(writer);
SAXTransformerFactory saxFactory =
(SAXTransformerFactory) TransformerFactory.newInstance();
TransformerHandler tHandler = saxFactory.newTransformerHandler();
tHandler.setResult(streamResult);
Transformer transformer = tHandler.getTransformer();
transformer.setOutputProperty(OutputKeys.METHOD, "html");
transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
writer.write("<!DOCTYPE html>\n");
writer.flush();
tHandler.startDocument();
tHandler.startElement("", "", "html", new AttributesImpl());
tHandler.startElement("", "", "head", new AttributesImpl());
tHandler.startElement("", "", "title", new AttributesImpl());
tHandler.characters("Hello".toCharArray(), 0, 5);
tHandler.endElement("", "", "title");
tHandler.endElement("", "", "head");
tHandler.startElement("", "", "body", new AttributesImpl());
tHandler.startElement("", "", "p", new AttributesImpl());
tHandler.characters("5 > 3".toCharArray(), 0, 5); // note '>' character
tHandler.endElement("", "", "p");
tHandler.endElement("", "", "body");
tHandler.endElement("", "", "html");
tHandler.endDocument();
writer.close();
Note that the XSLT transformer relieves you of the burden of escaping special characters like >, as it takes care of that by itself.
And it is easy to wrap SAX methods like startElement() and characters() into something more convenient to one's taste...
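As one possible shape for such a wrapper, here is a rough sketch. The class HtmlSaxWriter and its open/text/close/finish methods are hypothetical names, not an existing API; the sketch sets the output method before attaching the result and passes the tag name as both local name and qualified name.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical fluent layer over TransformerHandler.
class HtmlSaxWriter {
    private final TransformerHandler handler;

    HtmlSaxWriter(Writer out) throws TransformerConfigurationException, SAXException {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
        handler.setResult(new StreamResult(out));
        handler.startDocument();
    }

    // Opens an element without attributes.
    HtmlSaxWriter open(String tag) throws SAXException {
        handler.startElement("", tag, tag, new AttributesImpl());
        return this;
    }

    // Writes text; the serializer escapes special characters.
    HtmlSaxWriter text(String s) throws SAXException {
        handler.characters(s.toCharArray(), 0, s.length());
        return this;
    }

    HtmlSaxWriter close(String tag) throws SAXException {
        handler.endElement("", tag, tag);
        return this;
    }

    void finish() throws SAXException {
        handler.endDocument();
    }

    // Small demonstration that writes into a String instead of a file.
    static String demo() {
        try {
            StringWriter out = new StringWriter();
            new HtmlSaxWriter(out)
                    .open("html").open("body")
                    .open("p").text("1 < 2 & 3").close("p")
                    .close("body").close("html")
                    .finish();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Passing a FileWriter or OutputStreamWriter instead of the StringWriter keeps the write-on-the-fly behaviour described above.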

If you are willing to use Groovy, the MarkupBuilder is very convenient for this sort of thing, but I don't know that Java has anything like it.
http://groovy.codehaus.org/Creating+XML+using+Groovy's+MarkupBuilder

If this is becoming repetitive work, I think you should reuse code. Why not simply write functions that "write" small building blocks of HTML? For example, you could have a function that takes a string, wraps it in a paragraph tag, and returns the result. Of course, you would also need some basic logic to decide where each block goes (how would the function know where to attach the paragraph?). I don't think you are a beginner, so I am not elaborating, but do tell me if anything is unclear.
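A minimal sketch of that building-block idea, with made-up names (HtmlBlocks, tag, p, table). Each function returns a string, so the caller decides where a block goes simply by nesting calls, which sidesteps the "where to attach" question for simple pages:

```java
import java.util.List;

// Hypothetical helper class; none of these names come from a library.
class HtmlBlocks {

    // Escapes the characters that are significant in HTML text.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Wraps already-built inner HTML in an element.
    static String tag(String name, String innerHtml) {
        return "<" + name + ">" + innerHtml + "</" + name + ">";
    }

    // A paragraph containing plain text.
    static String p(String text) {
        return tag("p", escape(text));
    }

    // A table with one <tr> per row and one <td> per cell.
    static String table(List<List<String>> rows) {
        StringBuilder body = new StringBuilder();
        for (List<String> row : rows) {
            StringBuilder cells = new StringBuilder();
            for (String cell : row) {
                cells.append(tag("td", escape(cell)));
            }
            body.append(tag("tr", cells.toString()));
        }
        return tag("table", body.toString());
    }

    public static void main(String[] args) {
        String page = tag("html", tag("body",
                p("This is Body") + table(List.of(List.of("a", "b"), List.of("c", "d")))));
        System.out.println(page);
    }
}
```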

Try the ujo-web library, which supports building HTML pages using the Element class. Here is a sample use case based on a Java servlet:
@WebServlet("/form-servlet")
public class FormServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response)
throws IOException {
try (HtmlElement html = HtmlElement.niceOf(response, "/style.css")) {
try (Element body = html.addBody()) {
body.addHeading("Simple form");
try (Element form = body.addForm("form-inline")) {
form.addLabel("control-label").addText("Note:");
form.addInput("form-control", "col-lg-1")
.setNameValue(NOTE, NOTE.of(request));
form.addSubmitButton("btn", "btn-primary")
.addText("Submit");
}
}
}
}
enum Attrib implements HttpParameter {
NOTE;
@Override public String toString() {
return name().toLowerCase();
}
}
}
The servlet generates the following HTML code:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8"/>
<title>Demo</title>
<link href="/css/regexp.css" rel="stylesheet"/>
</head>
<body>
<h1>Simple form</h1>
<form class="form-inline">
<label class="control-label">Note:</label>
<input class="form-control col-lg-1" name="note" value=""/>
<button class="btn btn-primary" type="submit">Submit</button>
</form>
</body>
</html>
See more information here: https://ujorm.org/www/web/

Related

API design for Java String results which contain charset-specific data

In the API of a document converter, which generates HTML (or XHTML), I want to expose these methods:
// Convert the input file to a file using the specified charset
void convert(File in, File out, Charset charset);
// Convert the input document to a string using the specified charset
String convert(String in, Charset charset);
There is no way for client code to produce faulty documents with the file-based method: it safely writes a result document with the specified charset.
The String-based method will obviously lead to problems if the client code does not respect the chosen charset, for example if the charset parameter is ISO-8859-1 but the result String is served as UTF-8 content in a web application:
String html = convert(getInputDocument(), ISO_8859_1);
...
response.setContentType("text/html;charset=UTF-8");
response.setCharacterEncoding("UTF-8");
try (PrintWriter out = response.getWriter()) {
out.print(html);
}
Question: which options should I consider to design the API so that users are guided to correct usage of the result string?
deprecate the method and provide a method which returns a byte array
use method names which contain the encoding (convertToUTF_8, convertToISO_8859_1 ...)
The result string could for example be
<!DOCTYPE html>
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Untitled document</title>
</head>
<body>
<p>Motörhead</p>
</body>
</html>

I don't know your exact use case, but one possibility is to wrap the document in a proper object (instead of it just being a String):
public interface Document {
void writeTo(ServletResponse response);
}
This way you can retain all control of how that "string" can be written to different targets.
I'm not sure whether you need a convert method at all, since the document could automatically convert its content if it sees that the response already has a different encoding. But even if you do need one, you could do it this way:
public interface Document {
void writeTo(ServletResponse response);
Document convert(Charset targetCharset);
}
This would return a new document which is of a different charset.
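A rough sketch of what such an object could look like. To keep it runnable without the servlet API, this variant writes to a plain OutputStream rather than a ServletResponse; the class name EncodedDocument and its methods are invented for illustration. A real converter would also have to rewrite the charset named in the document's meta tag when converting, which this sketch does not do.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical class: keeps the content and its charset together.
final class EncodedDocument {
    private final String content;
    private final Charset charset;

    EncodedDocument(String content, Charset charset) {
        this.content = content;
        this.charset = charset;
    }

    // Value for a Content-Type header, so callers need not hard-code a charset.
    String contentType() {
        return "text/html;charset=" + charset.name();
    }

    // The document encoded in its own charset.
    byte[] toBytes() {
        return content.getBytes(charset);
    }

    // Writes the encoded bytes, so the caller cannot pick a mismatching encoding.
    void writeTo(OutputStream out) {
        try {
            out.write(toBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns a new document that will be encoded in the target charset.
    EncodedDocument convert(Charset target) {
        return new EncodedDocument(content, target);
    }

    public static void main(String[] args) {
        EncodedDocument doc =
                new EncodedDocument("<p>Mot\u00f6rhead</p>", StandardCharsets.ISO_8859_1);
        System.out.println(doc.contentType() + ": " + doc.toBytes().length + " bytes");
        EncodedDocument utf8 = doc.convert(StandardCharsets.UTF_8);
        System.out.println(utf8.contentType() + ": " + utf8.toBytes().length + " bytes");
    }
}
```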

Change HTML element's CSS style using java

I am using JSP to create my web page. I need to use java classes to access the data that I need to pull from another website's JSON (this CANNOT change).
Say I have the code:
<div class="fruit apple"></div>
<div class="fruit banana"></div>
//"fruit peach", "fruit orange", and so on...
style.fruit {display: none;}
I need to change the HTML element using JAVA, not javascript. In my JSP file, it will be in a <% %> tag.
<% var divClassINeedToChange = "banana";
//some sort of JAVA code that is equivalent to:
//document.getElementsByClass(divClassINeedToChange).style.display = "block"; %>
I cannot find the line of java code that is equivalent to the above line.

I hope this helps.
You can parse your page using a DOM or SAX parser, for example:
DocumentBuilderFactory factory=DocumentBuilderFactory.newInstance();
DocumentBuilder builder=factory.newDocumentBuilder();
Document doc=builder.parse(new File(filename));
Element e = doc.getElementById(divClassINeedToChange);
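One caveat: getElementById looks elements up by ID, not by class, so for the question's class-based markup you would probably have to walk the div elements yourself. A minimal sketch, assuming the markup is well-formed XML (a plain XML parser will reject typical loose HTML); the class name ShowByClass and its methods are made up for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ShowByClass {

    // Parses well-formed markup from a string.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("Markup is not well-formed XML", e);
        }
    }

    // Sets style="display: block;" on every <div> whose class list contains
    // cssClass and returns how many elements were changed.
    static int show(Document doc, String cssClass) {
        NodeList divs = doc.getElementsByTagName("div");
        int changed = 0;
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            for (String name : div.getAttribute("class").split("\\s+")) {
                if (name.equals(cssClass)) {
                    div.setAttribute("style", "display: block;");
                    changed++;
                    break;
                }
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Document doc = parse("<body><div class=\"fruit apple\"/>"
                           + "<div class=\"fruit banana\"/></body>");
        System.out.println(show(doc, "banana") + " element(s) changed");
    }
}
```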

substringBetween() returns null when trying to extract <html>..</html>

I am building a small Java application to fetch five Wikipedia pages and find substrings in the html source code. I am using the library org.apache.commons.lang3.StringUtils. However a Wikipedia article can be big, and there seems to be a limitation in StringUtils:
String html;
try {
html = Jsoup.connect("http://en.wikipedia.org/wiki/Canada").get().html();
} catch(IOException e) {
html = "";
}
String trimmedHtml = substringBetween(html, "<html>", "</html>");
System.out.println(html); // prints the whole source code fine
System.out.println(trimmedHtml); // prints null
Why does the console print null for trimmedHtml? The output should be (almost) as big as for html. Is there a maximum length for the string output or for the parameters of substringBetween()?

The string util methods work and are well tested - there is no "limitation" or "bug" here.
Viewing the page source reveals that <html> will not match:
<html lang="en" dir="ltr" class="client-nojs">
A great example of why string processing of HTML is not a good idea in general. Keep using the support offered by Jsoup instead, for example by obtaining the <html> element and calling html() on it.
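Purely to illustrate why the exact-match search comes back empty, here is a small JDK-only sketch. HtmlTagSearch and insideHtml are hypothetical names, and the prefix search is deliberately naive; Jsoup remains the more robust route.

```java
class HtmlTagSearch {

    // Returns the text between the end of the opening <html ...> tag and the
    // last </html>, or null if either is missing.
    static String insideHtml(String page) {
        int open = page.indexOf("<html");
        if (open < 0) {
            return null;
        }
        int start = page.indexOf('>', open);
        int end = page.lastIndexOf("</html>");
        if (start < 0 || end < start) {
            return null;
        }
        return page.substring(start + 1, end);
    }

    public static void main(String[] args) {
        String page = "<!DOCTYPE html><html lang=\"en\" dir=\"ltr\">"
                    + "<body>x</body></html>";
        // The literal "<html>" never occurs because the tag carries attributes.
        System.out.println(page.indexOf("<html>"));
        System.out.println(insideHtml(page));
    }
}
```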

How to escape HTML by default in StringTemplate?

It is very good practice for HTML template engines to HTML-escape placeholder text by default, to help prevent XSS (cross-site scripting) attacks. Is it possible to achieve this behavior in StringTemplate?
I have tried to register a custom AttributeRenderer which escapes HTML unless the format is "raw":
stg.registerRenderer(String.class, new AttributeRenderer() {
@Override
public String toString(Object o, String format, Locale locale) {
String s = (String)o;
return Objects.equals(format, "raw") ? s : StringEscapeUtils.escapeHtml4(s);
}
});
But it fails, because in this case StringTemplate escapes not only the placeholder text but also the template text itself. For example, this template:
example(title, content) ::= <<
<html>
<head>
<title>$title$</title>
</head>
<body>
$content; format = "raw"$
</body>
</html>
>>
Is rendered as:
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Example Title&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
<p>Not escaped because of <code>format = "raw"</code>.</p>
&lt;/body&gt;
&lt;/html&gt;
Can anybody help?

There is no good solution to encode by default. The template is passed through the AttributeRenderer for the string data type, and there is no context information to detect if it is processing the template or a variable. So all strings, including the template, are encoded by default since you cannot specify "raw" for the template.
An alternative solution is to use format="xml-encode" in the variables that need to be encoded. The built-in StringRenderer has support for several formats:
upper
lower
cap
url-encode
xml-encode
So your example would be:
example(title, content) ::= <<
<html>
<head>
<title>$title; format="xml-encode"$</title>
</head>
<body>
$content$
</body>
</html>
>>
In order to encode by default, you have limited options. The alternatives are:
Use a custom data type (not String) for your variables, so you can register your HtmlEscapeStringRenderer for the custom data type. This is difficult if you use complex objects as variables that are already using standard strings.
Add the raw and the escaped variables to the model manually, e.g. add title (escaped) and title_raw (raw). You do not need a custom AttributeRenderer in this case. StringTemplate has a strict view/model separation and you need to have the model populated before it is rendered with both the raw and escaped values.
Neither option is particularly desirable, but I do not see any other alternatives with StringTemplate4.
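The second alternative (adding both variants to the model) could be sketched like this. The class EscapedModel and its helpers are hypothetical, and the escaping function is a hand-written stand-in for something like StringEscapeUtils.escapeHtml4; each map entry would then be handed to the template, for instance through ST.add(name, value).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that prepares a model before rendering.
class EscapedModel {

    // Minimal stand-in for a library escaping function.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Stores the escaped value under name and the untouched value under name_raw.
    static void put(Map<String, Object> model, String name, String value) {
        model.put(name, escapeHtml(value));
        model.put(name + "_raw", value);
    }

    public static void main(String[] args) {
        Map<String, Object> model = new LinkedHashMap<>();
        put(model, "title", "Tom & Jerry <b>");
        System.out.println(model);
    }
}
```

With this, $title$ is safe by default in the template and $title_raw$ is the explicit opt-out.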

The answer is to revert to StringTemplate v3.

Escaping html tags in html report

I have to write an HTML report from a Java class, and the report contains the source code of web pages. The problem is that as soon as the source of a web page is encountered, the browser treats it as the end of the HTML tags of the main report page, so the output is not rendered correctly. An example is shown below:
<html>
<body>
<li>
<pre>
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
The page was not found on this server.
</body>
</html>
</pre>
</li>
</body>
</html>
I want everything inside the pre tags to be treated as normal text and not as HTML markup. I tried replacing < with &lt;, > with &gt;, & with &amp;, etc., but it doesn't seem to work. Any tips on how to make this possible?
EDIT:
This is what I tried (a is the part inside the pre tags):
File aFile = new File(filename);
try {
BufferedWriter out = new BufferedWriter(new FileWriter(aFile,aFile.exists()));
a.replaceAll("<","&lt;");a.replaceAll(">","&gt;");a.replaceAll("\"","&;quot;");a.replaceAll("&","&amp;");
out.write(a + "\r\n");
out.close();
}
EDIT 2:
So the correct solution involved a=a.replaceAll(...), but another thing to note is that if I replace < with &lt; and later on replace & with &amp; (as I do in the above example), it will again mess up my output (< will become &amp;lt;). So the order must also be changed (replace & first and then <).

In Java, String objects are immutable. That means a.replaceAll doesn’t change a but returns a new String object in which the replacement took place.
So to fix this, you need to work with the returned object instead:
a = a.replaceAll("&","&amp;").replaceAll("<","&lt;");
And you actually only need to replace the & and < for your specific application.

Do:
a = a.replaceAll("<","&lt;");
instead of:
a.replaceAll("<","&lt;");
and the same for the others,
since the replaceAll method doesn't change the string but returns a new one.

Well, replaceAll may work. However, I always prefer to use StringEscapeUtils:
a = StringEscapeUtils.escapeHtml4(a);

The sequence you posted in the comment:
a.replaceAll("<","&lt;");
a.replaceAll(">","&gt;");
a.replaceAll("\"","&;quot;");
a.replaceAll("&","&amp;");
won't work, since the replaceAll() method doesn't change the String it is called on. It can't: Strings are immutable in Java.
Also, as @Rishabh points out, your last replace call will mess up the previous replacements, so you need to change the order.
You need to do
a = a.replaceAll("&","&amp;");
a = ...
Or just chain them all without saving the intermediate results:
a = a.replaceAll("&","&amp;").replaceAll("<","&lt;").replaceAll(">","&gt;").replaceAll("\"","&quot;");
Also, you should probably use the replace() method instead of replaceAll(); there is no need to use regexes in this case.
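To see why the order matters, here is a small sketch using replace() (no regexes). The class and method names are made up for illustration; the second method is wrong on purpose.

```java
class EscapeOrder {

    // Correct order: '&' is replaced first, so the entities added afterwards stay intact.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Wrong on purpose: '&' is replaced last and re-escapes the entities just inserted.
    static String escapeWrongOrder(String s) {
        return s.replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("&", "&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escape("<h1>"));           // &lt;h1&gt;
        System.out.println(escapeWrongOrder("<h1>")); // &amp;lt;h1&amp;gt;
    }
}
```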

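Replace this line:
a.replaceAll("<","&lt;");a.replaceAll(">","&gt;");a.replaceAll("\"","&;quot;");a.replaceAll("&","&amp;");
with this (assigning the result back, and replacing & first so the inserted entities are not escaped again):
a = a.replaceAll("&","&amp;").replaceAll("<","&lt;").replaceAll(">","&gt;").replaceAll("\"","&quot;");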
