Java getElementById() or Alternative - java

So, I have an XHTML report skeleton that I want to populate by getting elements with certain IDs and setting their contents.
I tried getElementById(), and had null returned (because, as I found out, id is not implicitly "id" and needs to be declared in a schema).
panel.setDocument(Main.class.getResource("/halreportview/DefaultSiteDetails.html").toString());
panel = populateDefaultReport(panel);
Element header1 = panel.getDocument().getElementById("header1");
header1.setTextContent("<span class=\"b\">Instruction Type:</span> Example<br/><span class=\"b\">Allocated To:</span> "+employee.toString()+"<br/><span class=\"b\">Scheduled Date:</span> "+dateFormat.format(scheduledDate));
So, I tried some work-arounds because I don't want to have to validate my XHTML documents. I tried adding a quick DTD to the top of the file in question like so;
<?xml version="1.0"?>
<!DOCTYPE foo [<!ATTLIST bar id ID #IMPLIED>]>
But getElementById() still returned null. So I tried using xml:id instead of id in the XHTML document in the hope that it was supported, but again no luck. Next I tried getElementsByTagName() and looped through the results checking ids. This worked and found the correct element (as confirmed by the output printing "Found it"), but when I call setTextContent on this element I still get a NullPointerException. Code below:
Element header1;
NodeList sections = panel.getDocument().getElementsByTagName("p");
for (int i = 0; i < sections.getLength(); ++i) {
if (((Element)sections.item(i)).getAttribute("id").equals("header1")) {
System.out.println("Found it");
header1 = (Element) sections.item(i);
header1.setTextContent("<span class=\"b\">Instruction Type:</span> Example<br/><span class=\"b\">Allocated To:</span> "+employee.toString()+"<br/><span class=\"b\">Scheduled Date:</span> "+dateFormat.format(scheduledDate));
}
}
I'm losing my mind on this one. I must be suffering from some kind of fundamental misunderstanding of how this is supposed to work. Any ideas?
Edit; Excerpt from my XHTML file below with CSS removed.
<html>
<head>
<title>Site Details</title>
<style type="text/css">
</style>
</head>
<body>
<div class="header">
<p></p>
<img src="#" alt="Logo" height="81" width="69"/>
<p id="header1"><span class="b">Instruction Type:</span> Example<br/><span class="b">Allocated To:</span> Example<br/><span class="b">Scheduled Date:</span> Example</p>
</div>
</body>
</html>

I am not sure why it is not working, but I have put together an example for you, and it works!
Note: my example uses the following libraries:
Apache Commons IO
Jsoup HTML Parser
Apache Commons Lang
My input XHTML file:
<html>
<head>
<title>Site Details</title>
<style type="text/css">
</style>
</head>
<body>
<div class="header">
<p></p>
<img src="#" alt="Logo" height="81" width="69" />
<p id="header1">
<span class="b">Instruction Type:</span> Example<br />
<span class="b">Allocated To:</span> Example<br />
<span class="b">Scheduled Date:</span> Example
</p>
</div>
</body>
</html>
The Java code that works (please read the comments):
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Date;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang3.StringEscapeUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Test {
/**
* @param args
* @throws IOException
* @throws InterruptedException
*/
public static void main(String[] args) throws IOException, InterruptedException {
//loading file from project
//When it is exported as JAR the files inside jar are not files but they are stream
InputStream stream = Test.class.getResourceAsStream("/test.xhtml");
//convert stream to file
File xhtmlfile = File.createTempFile("xhtmlFile", ".tmp");
FileOutputStream fileOutputStream = new FileOutputStream(xhtmlfile);
IOUtils.copy(stream, fileOutputStream);
xhtmlfile.deleteOnExit();
//get html string from file
String htmlString = FileUtils.readFileToString(xhtmlfile);
//parse using jsoup
Document doc = Jsoup.parse(htmlString);
//get all elements
Elements allElements = doc.getAllElements();
for (Element el : allElements) {
//if element id is header 1
if (el.id().equals("header1")) {
//dummy emp name
String employeeName = "dummyEmployee";
//update text
el.text("<span class=\"b\">Instruction Type:</span> Example<br/><span class=\"b\">Allocated To:</span> "
+ employeeName.toString() + "<br/><span class=\"b\">Scheduled Date:</span> " + new Date());
//dont loop further
break;
}
}
//now get html from the updated document
String html = doc.html();
//we need to unescape the html
String escapeHtml4 = StringEscapeUtils.unescapeHtml4(html);
//print html
System.out.println(escapeHtml4);
}
}
Output:
<html>
<head>
<title>Site Details</title>
<style type="text/css">
</style>
</head>
<body>
<div class="header">
<p></p>
<img src="#" alt="Logo" height="81" width="69" />
<p id="header1"><span class="b">Instruction Type:</span> Example<br/><span class="b">Allocated To:</span> dummyEmployee<br/><span class="b">Scheduled Date:</span> Sat Nov 02 07:37:12 GMT 2013</p>
</div>
</body>
</html>
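If an extra library is not an option, the lookup can also be done on the standard W3C DOM without any DTD or schema. The sketch below is only an illustration under assumptions: it parses a small well-formed XHTML string with the JDK parser (not the panel's own document), and the class and helper names are made up. It finds the element with an XPath test on the id attribute, which does not depend on id being declared as type ID. It also builds the span and br children as real nodes, because setTextContent treats its argument as plain text and would escape any markup passed to it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class FindByIdSketch {

    // Parses a well-formed XHTML string into a DOM document (no DTD or schema involved).
    static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    // Hypothetical helper: returns the first element whose "id" attribute equals id, or null.
    // Assumes the id contains no quote characters.
    static Element findById(Document doc, String id) throws Exception {
        String expr = "//*[@id='" + id + "']";
        return (Element) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODE);
    }

    // Hypothetical helper: appends <span class="b">label</span> followed by " value" as text.
    static void appendLabelled(Element parent, String label, String value) {
        Document doc = parent.getOwnerDocument();
        Element span = doc.createElement("span");
        span.setAttribute("class", "b");
        span.setTextContent(label);
        parent.appendChild(span);
        parent.appendChild(doc.createTextNode(" " + value));
    }

    // Serializes the document back to a string using the XML output method.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><body><p id=\"header1\">old text</p></body></html>");
        Element header1 = findById(doc, "header1");
        header1.setTextContent(""); // removes the existing children
        appendLabelled(header1, "Instruction Type:", "Example");
        header1.appendChild(doc.createElement("br"));
        appendLabelled(header1, "Allocated To:", "dummyEmployee");
        System.out.println(serialize(doc));
    }
}
```

Whether the same approach works on the document returned by panel.getDocument() depends on how that panel builds its DOM, which is not shown in the question.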

Related

Unable to get the output of python executed from Java

I am working on my graduation project, which uses Spring Boot. I have to run a Python script from Java code, using input from an HTML form. I have prepared three files: HTML, Java and Python.
<!DOCTYPE HTML>
<html xmlns:th="http://www.thymeleaf.org">
<head>
<meta charset="UTF-8" />
<title>Add Info</title>
<link rel="stylesheet" type="text/css" th:href="@{/css/style.css}"/>
</head>
<body>
<h1>Insert an Info:</h1>
<!--
In Thymeleaf the equivalent of
JSP's ${pageContext.request.contextPath}/edit.html
would be @{/edit.html}
-->
<form th:action="@{/index}" method="get">
<input type="text" th:name="Coord1"/> </br>
<input type="text" th:name="Coord2"/> </br>
<input type="text" th:name="Coord3"/> </br>
<input type="text" th:name="Coord4"/> </br>
<input type="text" th:name="datedeb"/> </br>
<input type="text" th:name="datefin"/> </br>
<input type="submit"/>
</form>
<br/>
<!-- Check if errorMessage is not null and not empty -->
<div th:if="${errorMessage}" th:utext="${errorMessage}"
style="color:red;font-style:italic;">
...
</div>
</body>
</html>
the java code:
package com.example.project.controller;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;
@Controller
public class MainController {
float Coord1;
float Coord2;
public static String s;
@RequestMapping(value="/index",method=RequestMethod.GET)
public void addAObjectForm(@RequestParam("Coord1") float Coord1,@RequestParam("Coord2")
float Coord2,@RequestParam("Coord3") float Coord3, @RequestParam("Coord4")float Coord4,@RequestParam("datedeb")
String datedeb,@RequestParam("datefin") String datefin) throws IOException {
//System.out.println(Coord1);
try
{
String pathPython = "test1.py";
String [] cmd = new String[8];
cmd[0] = "python";
cmd[1] = pathPython;
cmd[2] = "Coord1";
cmd[3] = "Coord2";
cmd[4] = "Coord3";
cmd[5] = "Coord4";
cmd[6] = "datedeb";
cmd[7] = "datefin";
Runtime r = Runtime.getRuntime();
Process p = r.exec(cmd);
BufferedReader in = new BufferedReader(new InputStreamReader(p.getInputStream()));
while((s=in.readLine()) != null){
System.out.println(s);
System.out.println(Coord1);
}
// BufferedReader in = new BufferedReader(new InputStreamReader(p.getInputStream()));
}
catch(Exception e){
}
}
}
and python code:
import sys
import os
def getDataFromJava(arg1=(sys.argv[0]),arg2=(sys.argv[1]),arg3=(sys.argv[2]),arg4=(sys.argv[3]),arg5=(sys.argv[4]),arg6=(sys.argv[5])):
cord1=arg1
cord2=arg2
cord3=arg3
cord4=arg4
datedeb=arg5
datefin=arg6
print(cord1)
print(cord2)
#print(arg3_val)
return cord1,cord2,cord3,cord4,datedeb,datefin
When I check whether my Python script returns a result, the String s in the Java code is null.
Any help getting a result would be appreciated.
Thank you in advance.
# append to your current test1.py
import sys
getDataFromJava( \
str(sys.argv[0]), str(sys.argv[1]), str(sys.argv[2]), \
str(sys.argv[3]), str(sys.argv[4]), str(sys.argv[5]))
Ref
I am a beginner at Python. I googled to find what I think is missing in your solution. I think the Python function needs to be called, and the shell parameters need to be retrieved and passed in to the Python function.
Update 1
Based on your edit, you have updated the function definition def getDataFromJava(...):. You have not added a function call getDataFromJava(...) below your function definition. I have provided a runnable script below.
import sys
import json
def getDataFromJava(arg1,arg2,arg3,arg4,arg5,arg6):
cord1=arg1
cord2=arg2
cord3=arg3
cord4=arg4
datedeb=arg5
datefin=arg6
return cord1,cord2,cord3,cord4,datedeb,datefin
print(json.dumps(getDataFromJava( \
str(sys.argv[1]), str(sys.argv[2]), str(sys.argv[3]), \
str(sys.argv[4]), str(sys.argv[5]), str(sys.argv[6]))))
$ python test1.py qw er ty ui op as
["qw", "er", "ty", "ui", "op", "as"]
I added json because I saw that you are returning a list of items back to the Java side, and JSON seemed a good data interchange format between the Python side and the Java side. Your Java code now needs to parse the returned JSON-formatted string. If you are returning only a few items and want to skip JSON, you can return something like abc#def#ghi#jkl.
print('#'.join(getDataFromJava( \
str(sys.argv[1]), str(sys.argv[2]), str(sys.argv[3]), \
str(sys.argv[4]), str(sys.argv[5]), str(sys.argv[6]))))
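On the Java side, note that the question's code passes the literal strings "Coord1", "Coord2" and so on instead of the request values. Its empty catch block and unread stderr would also hide any Python error. Below is a minimal sketch of the calling side, assuming python is on the PATH and the script prints one '#'-joined line as suggested above. The class name, method names and sample dates are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PythonRunnerSketch {

    // Builds the command line from the actual values, not from the parameter names.
    static List<String> buildCommand(String script, float c1, float c2, float c3, float c4,
                                     String dateDeb, String dateFin) {
        return Arrays.asList("python", script,
                String.valueOf(c1), String.valueOf(c2), String.valueOf(c3), String.valueOf(c4),
                dateDeb, dateFin);
    }

    // Runs the command and returns every line it printed (stdout and stderr merged).
    static List<String> run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr so Python tracebacks are not lost
        Process p = pb.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("command exited with status " + exit + ": " + lines);
        }
        return lines;
    }

    // Splits one '#'-joined output line back into its fields.
    static String[] parseFields(String line) {
        return line.split("#");
    }

    public static void main(String[] args) throws Exception {
        // Pass the script path as the first argument to actually run it;
        // without an argument the sketch only prints the command it would run.
        String script = args.length > 0 ? args[0] : "test1.py";
        List<String> cmd = buildCommand(script, 1.5f, 2.5f, 3.5f, 4.5f, "2020-01-01", "2020-12-31");
        System.out.println(cmd);
        if (args.length > 0) {
            for (String line : run(cmd)) {
                System.out.println(Arrays.toString(parseFields(line)));
            }
        }
    }
}
```

Throwing on a non-zero exit status is a design choice for the sketch: it makes a failing script visible instead of leaving the reader with an empty result.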

How to unescape escaped special characters while reading XML in Java

I'm working on extracting ISO-8859-2 encoded text from an XML file. It works fine; however, some special characters are written as their corresponding HTML codes.
The XML file:
<?xml version="1.0" encoding="iso-8859-2"?>
<!DOCTYPE TEI.2 SYSTEM "http://mek.oszk.hu/mekdtd/prose/TEI-MEK-prose.dtd">
<!-- ?xml-stylesheet type="text/xsl" href="http://mek.oszk.hu/mekdtd/xsl/boszorkany_txt.xsl"? -->
<TEI.2 id="MEK-00798">
<text type="novel">
<front>
<titlePage>
<docAuthor>Jókai Mór</docAuthor>
<docTitle>
<titlePart>Az arany ember</titlePart>
</docTitle>
</titlePage>
</front>
<body>
<div type="part">
<head>
<title>A Szent Borbála</title>
</head>
<div type="chapter">
<head>
<title>I. A VASKAPU</title>
</head>
<p text-align="justify">A kitartó hetes vihar. – Ez járhatlanná teszi a Dunát a Vaskapu
között.
</p>
</div>
</div>
</body>
</text>
</TEI.2>
A snippet of the code I use:
SAXReader reader = new SAXReader();
reader.setEncoding("ISO-8859-2");
Document document = reader.read(file);
Node node = document.selectSingleNode("//*[@type='chapter']/p");
String text = node.getStringValue();
// String text = org.jsoup.parser.Parser.unescapeEntities(node.getStringValue(), true);
// String text = org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(node.getStringValue());
I also included in comments some libraries I tried, without any success.
What I want to see is:
A kitartó hetes vihar. - Ez járhatlanná teszi a Dunát a Vaskapu között.
What I see when I debug is:
A kitartó hetes vihar . Ez járhatlanná teszi a Dunát a Vaskapu között.
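One possible explanation, assuming the source file contains a named reference such as &ndash; at that position: XML itself predefines only five entities (&lt;, &gt;, &amp;, &apos; and &quot;), so any other name has to be declared by the DTD. If the parser never sees such a declaration, the character may already be missing from the parsed text, and unescaping afterwards has nothing left to work on. The sketch below is an illustration only. It uses the JDK DOM parser rather than SAXReader, a made-up ASCII-only document, and an internal DTD subset that declares the entity, to show that the parser expands the reference itself once it is declared.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EntityDemo {

    // Parses the XML and returns the text content of the first <p> element.
    static String paragraphText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("p").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The internal subset declares ndash as U+2013 (en dash); the element names are hypothetical.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE doc [ <!ENTITY ndash \"&#8211;\"> ]>\n"
                + "<doc><p>A kitarto hetes vihar. &ndash; Ez</p></doc>";
        // The parser has already replaced &ndash; with the en dash at this point.
        System.out.println(paragraphText(xml));
    }
}
```

For the file in the question, the equivalent check would be whether the external DTD named in the DOCTYPE is actually being loaded and whether it declares the entities used in the text.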

How to validate html using java? getting issues with jsoup library

I need to validate HTML using Java, so I tried the jsoup library, but some of my test cases fail with it.
For example, this is my HTML content. I don't have any control over it; I get it from an external source provider.
String invalidHtml = "<div id=\"myDivId\" ' class = claasnamee value='undaa' > <<p> p tagil vanne <br> <span> span close cheythillee!! </p> </div>";
doc = Jsoup.parseBodyFragment(invalidHtml);
For above html I am getting this output.
<html>
<head></head>
<body>
<div id="myDivId" '="" class="claasnamee" value="undaa">
<
<p> p tagil vanne <br /> <span> span close cheythillee!! </span></p>
</div>
</body>
</html>
The single quote in my string above comes out like this ('=""). How can I fix this issue? Can anyone help me, please?
The best place to validate your HTML would be http://validator.w3.org/, but that would be a manual process. Don't worry, jsoup can do this for you as well. The program below is a workaround, but it serves the purpose.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
public class JsoupValidate {
public static void main(String[] args) throws Exception {
String invalidHtml = "<div id=\"myDivId\" ' class = claasnamee value='undaa' > <<p> p tagil vanne <br> <span> span close cheythillee!! </p> </div>";
Document initialDoc = Jsoup.parseBodyFragment(invalidHtml);
Document validatedDoc = Jsoup.connect("http://validator.w3.org/check")
.data("fragment", initialDoc.html())
.data("st", "1")
.post();
System.out.println("******");
System.out.println("Errors");
System.out.println("******");
for(Element error : validatedDoc.select("li.msg_err")){
System.out.println(error.select("em").text() + " : " + error.select("span.msg").text());
}
System.out.println();
System.out.println("**************");
System.out.println("Cleaned output");
System.out.println("**************");
Document cleanedOuput = Jsoup.parse(validatedDoc.select("pre.source").text());
cleanedOuput.select("meta[name=generator]").first().remove();
cleanedOuput.outputSettings().indentAmount(4);
cleanedOuput.outputSettings().prettyPrint(true);
System.out.println(cleanedOuput.html());
}
}
Another option is to let jsoup's own parser track the errors:
var invalidHtml = "<div id=\"myDivId\" ' class = claasnamee value='undaa' > <<p> p tagil vanne <br> <span> span close cheythillee!! </p> </div>";
var parser = Parser.htmlParser()
.setTrackErrors(10); // Set the number of errors it can track. 0 by default so it's important to set that
var dom = Jsoup.parse(invalidHtml, "" /* this is the default */, parser);
System.out.println(parser.getErrors()); // Do something with the errors, if any

How to inject snippets of html into an string containing valid html?

I have the following html (sized down for literary content) that is passed into a java method.
However, I want to take this passed in html string and add a <pre> tag that contains some text passed in and add a section of <script type="text/javascript"> to the head.
String buildHTML(String htmlString, String textToInject)
{
// Inject textToInject into pre tag and add javascript sections
String newHTMLString = <build new html sections>
}
-- htmlString --
<html>
<head>
</head>
<body>
<body>
</html>
-- newHTMLString
<html>
<head>
<script type="text/javascript">
window.onload=function(){alert("hello?";}
</script>
</head>
<body>
<div id="1">
<pre>
<!-- Inject textToInject here into a newly created pre tag-->
</pre>
</div>
<body>
</html>
What is the best tool to do this from within java other than a regex?
Here's how to do this with Jsoup:
public String buildHTML(String htmlString, String textToInject)
{
// Create a document from string
Document doc = Jsoup.parse(htmlString);
// create the script tag in head
doc.head().appendElement("script")
.attr("type", "text/javascript")
.text("window.onload=function(){alert(\'hello?\';}");
// Create div tag
Element div = doc.body().appendElement("div").attr("id", "1");
// Create pre tag
Element pre = div.appendElement("pre");
pre.text(textToInject);
// Return as string
return doc.toString();
}
I've used chaining a lot, which means that:
doc.body().appendElement(...).attr(...).text(...)
is exactly the same as
Element example = doc.body().appendElement(...);
example.attr(...);
example.text(...);
Example:
final String html = "<html>\n"
+ " <head>\n"
+ " </head>\n"
+ " <body>\n"
+ " <body>\n"
+ "</html>";
String result = buildHTML(html, "This is a test.");
System.out.println(result);
Result:
<html>
<head>
<script type="text/javascript">window.onload=function(){alert('hello?';}</script>
</head>
<body>
<div id="1">
<pre>This is a test.</pre>
</div>
</body>
</html>

Getting data from a form using java

<div></div>
<div></div>
<div></div>
<div></div>
<ul>
<form id=the_main_form method="post">
<li>
<div></div>
<div> <h2>
<a onclick="xyz;" target="_blank" href="http://sample.com" style="text-decoration:underline;">This is sample</a>
</h2></div>
<div></div>
<div></div>
</li>
There are 50 <li> elements like that.
I have posted a snippet taken from a much larger HTML page.
<div> </div> means there was data between the tags; I removed it because it is not necessary.
I would like to know what the jsoup select statement should look like to extract the href and the text.
I tried doc.select("div div div ul xxxx");
where xxxx is the form. Should I give the form id, or how should I do that?
Try this:
Elements allLis = doc.select("#the_main_form > li");
for (Element li : allLis) {
// select() returns Elements, so take the first match
Element a = li.select("div:eq(1) > h2 > a").first();
if (a == null) continue; // this <li> has no matching link
String href = a.attr("href");
String text = a.text();
}
Hope it helps!
EDIT:
Elements allLis = doc.select("#the_main_form > li");
This part of the code gets all <li> tags that are direct children of the <form> with id the_main_form.
Element a = li.select("div:eq(1) > h2 > a").first();
Then we iterate over all the <li> tags and get the <a> tag in each: first the second <div> inside the <li> (index 1, hence div:eq(1)), then its <h2>, which contains our <a>.
Hope you understand now! :)
Please try this:
package com.stackoverflow.works;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
/*
* @author: sarath_sivan
*/
public class HtmlParserService {
public static void parseHtml(String html) {
Document document = Jsoup.parse(html);
Element linkElement = document.select("a").first();
String linkHref = linkElement.attr("href"); // "http://sample.com"
String linkText = linkElement.text(); // "This is sample"
System.out.println(linkHref);
System.out.println(linkText);
}
public static void main(String[] args) {
String html = "<a onclick=\"xyz;\" target=\"_blank\" href=\"http://sample.com\" style=\"text-decoration:underline;\">This is sample</a>";
parseHtml(html);
}
}
Hope you have the Jsoup Library in your classpath.
Thank you!
