For example I have
<html lang="en"> ...... web page </html>
I want to extract the string "en" with Jsoup.
I tried with selector and attribute without success.
Document htmlDoc = Jsoup.parse(html);
Element taglang = htmlDoc.select("html").first();
System.out.println(taglang.text());
Looks like you want to get value of lang attribute. In that case you can use attr("nameOfAttribute") like
System.out.println(taglang.attr("lang"));
I have a problem reading a value that contains '&' from the url using java spring.
My summary.jsp file contains the following code:
`
<h3>
<fmt:message key="contact.title"/>
<c:if test="${!editMode}">
<label class="eightyfivefont">
<a class="termslink eightyfivefont"
href="?submitForm=true&
institutionName=<c:out value="${institution.institutionName}" />&
repositoryName=<c:out value="${institution.repositoryName}" />&
editMode=true">
(edit)</a>
</label>
</c:if>
</h3>
`
which produce the following url:
...summary.html?submitForm=true&institutionName=r&r&repositoryName=r&r
The problem is when I am trying to fetch the institutionName that hold the value "r&r".
when fetching the value using the following command:
String name = request.getParameter("institutionName");
it fetches only the string "r" and not "r&r".
The string is stored in XML file as "<institutionName>r&r</institutionName>" which is parsed and added using:
Document doc = DocumentHelper.createDocument();
Element root = doc.addElement("institution");
and for reading from the xml:
Document doc = DocumentHelper.parseText(xml);
Element root = doc.getRootElement();
(I assume the issue isn't with the XML).
Is there a solution for this problem?
The ampersand & is used to divide key-value pairs. If you want to have it as a value, or key, for that matter, you have to urlencode it. For example, like this:
encodeURIComponent('&') = "%26"
You should be able to use JavaScript for that. Or, of course, some Java method, if the URL is being created in the jsp itself.
You should use URLEncoder to encode the institutionName
java.net.URLEncoder.encode("r&r", "UTF-8")
this outputs the URL-encoded, which is fine as a GET parameter:
r%26r
check this answer. Create a custom EL function or otherwise you can use scriptlet though it is not recommended.
I need to turn an entire HTML document into one, valid JavaScript string.
This is a process that will need to be done regularly.
For example:This:
<html>
<!--here is my html-->
<div id="Main">some content</div>
</html>
needs to become this:
var htmlString="<html><!--here is my html --><div id=\"Main\">some content</div><html>"
Based on what I've ready my initial thought is to read the contents of the file with Java and write my own parser. I don't think this would be possible with a shell script or the like? (I'm on OSX)
Can someone recommend a good approach?
Similar Questions I've Read
Reading entire html file to String?
convert html to javascript
Try this live, jsFiddle
JS
var htmlString = document.getElementsByTagName('html')[0].innerHTML;
console.log(htmlString);
HTML
<p>Hello</p>
<div>Hi div</div>
How about a jQuery GET request?
e.g.
$.get( "http://www.domain.com/some/page.html", function( data ) {
// data == entire page in html string
});
It will work on php script also (resulting html output that is).
Alternatively, if you want to achieve this PHP, something like this will also work:
<?php
$html_str = file_get_contents('http://www.domain.com/some/page.html');
?>
I want my Java application to write HTML code in a file. Right now, I am hard coding HTML tags using java.io.BufferedWriter class. For Example:
BufferedWriter bw = new BufferedWriter(new FileWriter(file));
bw.write("<html><head><title>New Page</title></head><body><p>This is Body</p></body></html>");
bw.close();
Is there any easier way to do this, as I have to create tables and it is becoming very inconvenient?
If you want to do that yourself, without using any external library, a clean way would be to create a template.html file with all the static content, like for example:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>$title</title>
</head>
<body>$body
</body>
</html>
Put a tag like $tag for any dynamic content and then do something like this:
File htmlTemplateFile = new File("path/template.html");
String htmlString = FileUtils.readFileToString(htmlTemplateFile);
String title = "New Page";
String body = "This is Body";
htmlString = htmlString.replace("$title", title);
htmlString = htmlString.replace("$body", body);
File newHtmlFile = new File("path/new.html");
FileUtils.writeStringToFile(newHtmlFile, htmlString);
Note: I used org.apache.commons.io.FileUtils for simplicity.
A few months ago I had the same problem and every library I found provides too much functionality and complexity for my final goal. So I end up developing my own library - HtmlFlow - that provides a very simple and intuitive API that allows me to write HTML in a fluent style. Check it here: https://github.com/fmcarvalho/HtmlFlow (it also supports dynamic binding to HTML elements)
Here is an example of binding the properties of a Task object into HTML elements. Consider a Task Java class with three properties: Title, Description and a Priority and then we can produce an HTML document for a Task object in the following way:
import htmlflow.HtmlView;
import model.Priority;
import model.Task;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
public class App {
private static HtmlView<Task> taskDetailsView(){
HtmlView<Task> taskView = new HtmlView<>();
taskView
.head()
.title("Task Details")
.linkCss("https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css");
taskView
.body().classAttr("container")
.heading(1, "Task Details")
.hr()
.div()
.text("Title: ").text(Task::getTitle)
.br()
.text("Description: ").text(Task::getDescription)
.br()
.text("Priority: ").text(Task::getPriority);
return taskView;
}
public static void main(String [] args) throws IOException{
HtmlView<Task> taskView = taskDetailsView();
Task task = new Task("Special dinner", "Have dinner with someone!", Priority.Normal);
try(PrintStream out = new PrintStream(new FileOutputStream("Task.html"))){
taskView.setPrintStream(out).write(task);
Desktop.getDesktop().browse(URI.create("Task.html"));
}
}
}
You can use jsoup or wffweb (HTML5) based.
Sample code for jsoup:-
Document doc = Jsoup.parse("<html></html>");
doc.body().addClass("body-styles-cls");
doc.body().appendElement("div");
System.out.println(doc.toString());
prints
<html>
<head></head>
<body class=" body-styles-cls">
<div></div>
</body>
</html>
Sample code for wffweb:-
Html html = new Html(null) {{
new Head(this);
new Body(this,
new ClassAttribute("body-styles-cls"));
}};
Body body = TagRepository.findOneTagAssignableToTag(Body.class, html);
body.appendChild(new Div(null));
System.out.println(html.toHtmlString());
//directly writes to file
html.toOutputStream(new FileOutputStream("/home/user/filepath/filename.html"), "UTF-8");
prints (in minified format):-
<html>
<head></head>
<body class="body-styles-cls">
<div></div>
</body>
</html>
Velocity is a good candidate for writing this kind of stuff.
It allows you to keep your html and data-generation code as separated as possible.
I would highly recommend you use a very simple templating language such as Freemarker
It really depends on the type of HTML file you're creating.
For such tasks, I use to create an object, serialize it to XML, then transform it with XSL. The pros of this approach are:
The strict separation between source code and HTML template,
The possibility to edit HTML without having to recompile the application,
The ability to serve different HTML in different cases based on the same XML, or even serve XML directly when needed (for a further deserialization for example),
The shorter amount of code to write.
The cons are:
You must know XSLT and know how to implement it in Java.
You must write XSLT (and it's torture for many developers).
When transforming XML to HTML with XSLT, some parts may be tricky. Few examples: <textarea/> tags (which make the page unusable), XML declaration (which can cause problems with IE), whitespace (with <pre></pre> tags etc.), HTML entities ( ), etc.
The performance will be reduced, since serialization to XML wastes lots of CPU resources and XSL transformation is very costly too.
Now, if your HTML is very short or very repetitive or if the HTML has a volatile structure which changes dynamically, this approach must not be taken in account. On the other hand, if you serve HTML files which have all a similar structure and you want to reduce the amount of Java code and use templates, this approach may work.
I had also problems in finding something simple to satisfy my needs so I decided to write my own library (with MIT license).
It's mainly based on composite and builder pattern.
A basic declarative example is:
import static com.github.manliogit.javatags.lang.HtmlHelper.*;
html5(attr("lang -> en"),
head(
meta(attr("http-equiv -> Content-Type", "content -> text/html; charset=UTF-8")),
title("title"),
link(attr("href -> xxx.css", "rel -> stylesheet"))
)
).render();
A fluent example is:
ul()
.add(li("item 1"))
.add(li("item 2"))
.add(li("item 3"))
You can check more examples here
I also created an on line converter to transform every html snippet (from complex bootstrap template to simple single snippet) on the fly (i.e. html -> javatags)
Templates and other methods based on preliminary creation of the document in memory are likely to impose certain limits on resulting document size.
Meanwhile a very straightforward and reliable write-on-the-fly approach to creation of plain HTML exists, based on a SAX handler and default XSLT transformer, the latter having intrinsic capability of HTML output:
String encoding = "UTF-8";
FileOutputStream fos = new FileOutputStream("myfile.html");
OutputStreamWriter writer = new OutputStreamWriter(fos, encoding);
StreamResult streamResult = new StreamResult(writer);
SAXTransformerFactory saxFactory =
(SAXTransformerFactory) TransformerFactory.newInstance();
TransformerHandler tHandler = saxFactory.newTransformerHandler();
tHandler.setResult(streamResult);
Transformer transformer = tHandler.getTransformer();
transformer.setOutputProperty(OutputKeys.METHOD, "html");
transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
writer.write("<!DOCTYPE html>\n");
writer.flush();
tHandler.startDocument();
tHandler.startElement("", "", "html", new AttributesImpl());
tHandler.startElement("", "", "head", new AttributesImpl());
tHandler.startElement("", "", "title", new AttributesImpl());
tHandler.characters("Hello".toCharArray(), 0, 5);
tHandler.endElement("", "", "title");
tHandler.endElement("", "", "head");
tHandler.startElement("", "", "body", new AttributesImpl());
tHandler.startElement("", "", "p", new AttributesImpl());
tHandler.characters("5 > 3".toCharArray(), 0, 5); // note '>' character
tHandler.endElement("", "", "p");
tHandler.endElement("", "", "body");
tHandler.endElement("", "", "html");
tHandler.endDocument();
writer.close();
Note that XSLT transformer will release you from the burden of escaping special characters like >, as it takes necessary care of it by itself.
And it is easy to wrap SAX methods like startElement() and characters() to something more convenient to one's taste...
If you are willing to use Groovy, the MarkupBuilder is very convenient for this sort of thing, but I don't know that Java has anything like it.
http://groovy.codehaus.org/Creating+XML+using+Groovy's+MarkupBuilder
if it is becoming repetitive work ; i think you shud do code reuse ! why dont you simply write functions that "write" small building blocks of HTML. get the idea? see Eg. you can have a function to which you could pass a string and it would automatically put that into a paragraph tag and present it. Of course you would also need to write some kind of a basic parser to do this (how would the function know where to attach the paragraph!). i dont think you are a beginner .. so i am not elaborating ... do tell me if you do not understand..
Try the ujo-web library, which supports building HTML pages using the Element class. Here is a sample use case based on a Java servlet:
#WebServlet("/form-servlet")
public class FormServlet extends HttpServlet {
#Override
protected void doGet(HttpServletRequest request, HttpServletResponse response)
throws IOException {
try (HtmlElement html = HtmlElement.niceOf(response, "/style.css")) {
try (Element body = html.addBody()) {
body.addHeading("Simple form");
try (Element form = body.addForm("form-inline")) {
form.addLabel("control-label").addText("Note:");
form.addInput("form-control", "col-lg-1")
.setNameValue(NOTE, NOTE.of(request));
form.addSubmitButton("btn", "btn-primary")
.addText("Submit");
}
}
}
}
enum Attrib implements HttpParameter {
NOTE;
#Override public String toString() {
return name().toLowerCase();
}
}
}
The servlet generates the next HTML code:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8"/>
<title>Demo</title>
<link href="/css/regexp.css" rel="stylesheet"/>
</head>
<body>
<h1>Simple form</h1>
<form class="form-inline">
<label class="control-label">Note:</label>
<input class="form-control col-lg-1" name="note" value=""/>
<button class="btn btn-primary" type="submit">Submit</button>
</form>
</body>
</html>
See more information here: https://ujorm.org/www/web/
I am having HTML contents as given below. The tag that i am looking out for here are "img src" and "!important". Does Java provide any HTML parsing techniques?
<fieldset>
<table cellpadding='0'border='0'cellspacing='0'style="clear :both">
<tr valign='top' ><td width='35' >
<a href='http://mypage.rediff.com/android/32868898'class='space' onmousedown="return
enc(this,'http://track.rediff.com/clickurl=___http%3A%2F%2Fmypage.rediff.com%2Fandroid%2F3 868898___&service=mypage_feeds&clientip=202.137.232.117&pos=0&feed_id=12942949154d255f839677925642&prc_id=32868898&rowid=2064549114')" >
<div style='width:25px;height:25px;overflow:hidden;'>
<img src='http://socialimg04.rediff.com/image.php?uid=32868898&type=thumb' width='25' vspace='0' /></div></a></td> <td><span>
<a href='http://mypage.rediff.com/android/32868898' class="space" onmousedown="return enc(this,'http://track.rediff.com/click?url=___http%3A%2F%2Fmypage.rediff.com%2Fandroid%2F32868898___&service=mypage_feeds&clientip=202.137.232.117&pos=0&feed_id=12942949154d255f839677925642&prc_id=32868898&rowid=2064549114')" >Android </a> </span><span style='color:#000000
!important;'>android se updates...</span><div class='divtext'></div></td></tr><tr><td height='5' ></td></tr></table></fieldset><br/>
String value = Jsoup.parse(new File("d:\\1.html"), "UTF-8").select("img").attr("src");
System.out.println(value); //http://socialimg04.rediff.com/image.php?uid=32868898&type=thumb
System.out.println(Jsoup.parse(new File("d:\\1.html"), "UTF-8").select("span[style$=important;]").first().text());//android se updates...
JSoup
What-are-the-pros-and-cons-of-the-leading-java-html-parsers
Try NekoHtml. This is the HTML parsing library used by various higher-level testing frameworks such as HtmlUnit.
NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements; automatically closes elements with optional end tags; and can handle mismatched inline element tags.
I used jsoup - this library have nice selector syntax (http://jsoup.org/cookbook/extracting-data/selector-syntax), and for your problem you can use code like this:
File input = new File("input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
Elements pngs = doc.select("img[src$=.png]");
I like using Jericho: http://jericho.htmlparser.net/docs/index.html
It is invulnerable to bad formed html, links leading to unavailable locations etc.
There's a lot of examples on their page, you just get all IMG tags and analyze their attributes to extracts those that pass your needs.