Java getElementById() or Alternative

I have an XHTML report skeleton that I want to populate by getting the elements with certain IDs and setting their contents.
I tried getElementById() and got null back (because, as I found out, an attribute named "id" is not implicitly of type ID; it has to be declared as such in a DTD or schema).
panel.setDocument(Main.class.getResource("/halreportview/DefaultSiteDetails.html").toString());
panel = populateDefaultReport(panel);
Element header1 = panel.getDocument().getElementById("header1");
header1.setTextContent("<span class=\"b\">Instruction Type:</span> Example<br/><span class=\"b\">Allocated To:</span> "+employee.toString()+"<br/><span class=\"b\">Scheduled Date:</span> "+dateFormat.format(scheduledDate));
So I tried some workarounds, because I don't want to have to validate my XHTML documents. First I added a quick DTD to the top of the file in question, like so:
<?xml version="1.0"?>
<!DOCTYPE foo [<!ATTLIST bar id ID #IMPLIED>]>
But getElementById() still returned null. Next I tried using xml:id instead of id in the XHTML document, in the hope that it was supported, but again no luck. Finally I used getElementsByTagName() and looped through the results checking the ids. This worked and found the correct element (as confirmed by the output "Found it"), but when I call setTextContent() on this element I still get a NullPointerException. Code below:
Element header1;
NodeList sections = panel.getDocument().getElementsByTagName("p");
for (int i = 0; i < sections.getLength(); ++i) {
    if (((Element)sections.item(i)).getAttribute("id").equals("header1")) {
        System.out.println("Found it");
        header1 = (Element) sections.item(i);
        header1.setTextContent("<span class=\"b\">Instruction Type:</span> Example<br/><span class=\"b\">Allocated To:</span> "+employee.toString()+"<br/><span class=\"b\">Scheduled Date:</span> "+dateFormat.format(scheduledDate));
    }
}
I'm losing my mind on this one. I must be suffering from some fundamental misunderstanding of how this is supposed to work. Any ideas?
Edit: excerpt from my XHTML file below, with the CSS removed.
<html>
<head>
<title>Site Details</title>
<style type="text/css">
</style>
</head>
<body>
<div class="header">
<p></p>
<img src="#" alt="Logo" height="81" width="69"/>
<p id="header1"><span class="b">Instruction Type:</span> Example<br/><span class="b">Allocated To:</span> Example<br/><span class="b">Scheduled Date:</span> Example</p>
</div>
</body>
</html>

I am not sure why it is not working, but I have put together an example for you, and it works.
Note: my example uses the following libraries:
Apache Commons IO
Jsoup HTML parser
Apache Commons Lang
My input XHTML file:
<html>
<head>
<title>Site Details</title>
<style type="text/css">
</style>
</head>
<body>
<div class="header">
<p></p>
<img src="#" alt="Logo" height="81" width="69" />
<p id="header1">
<span class="b">Instruction Type:</span> Example<br />
<span class="b">Allocated To:</span> Example<br />
<span class="b">Scheduled Date:</span> Example
</p>
</div>
</body>
</html>
The Java code that works (read the comments):
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Date;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang3.StringEscapeUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Test {
/**
 * @param args
 * @throws IOException
 * @throws InterruptedException
 */
public static void main(String[] args) throws IOException, InterruptedException {
//loading file from project
//When it is exported as JAR the files inside jar are not files but they are stream
InputStream stream = Test.class.getResourceAsStream("/test.xhtml");
//convert stream to file
File xhtmlfile = File.createTempFile("xhtmlFile", ".tmp");
FileOutputStream fileOutputStream = new FileOutputStream(xhtmlfile);
IOUtils.copy(stream, fileOutputStream);
xhtmlfile.deleteOnExit();
//get html string from file
String htmlString = FileUtils.readFileToString(xhtmlfile);
//parse using jsoup
Document doc = Jsoup.parse(htmlString);
//get all elements
Elements allElements = doc.getAllElements();
for (Element el : allElements) {
//if element id is header 1
if (el.id().equals("header1")) {
//dummy emp name
String employeeName = "dummyEmployee";
//update text
el.text("<span class=\"b\">Instruction Type:</span> Example<br/><span class=\"b\">Allocated To:</span> "
+ employeeName.toString() + "<br/><span class=\"b\">Scheduled Date:</span> " + new Date());
//dont loop further
break;
}
}
//now get html from the updated document
String html = doc.html();
//el.text() escaped the markup we passed in, so we need to unescape the HTML
String escapeHtml4 = StringEscapeUtils.unescapeHtml4(html);
//print html
System.out.println(escapeHtml4);
}
}
Output:
<html>
<head>
<title>Site Details</title>
<style type="text/css">
</style>
</head>
<body>
<div class="header">
<p></p>
<img src="#" alt="Logo" height="81" width="69" />
<p id="header1"><span class="b">Instruction Type:</span> Example<br/><span class="b">Allocated To:</span> dummyEmployee<br/><span class="b">Scheduled Date:</span> Sat Nov 02 07:37:12 GMT 2013</p>
</div>
</body>
</html>
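
For readers who would rather stay with the JDK's own DOM classes, here is a minimal sketch under a few assumptions. It looks the element up with the XPath expression //*[@id='...'], which matches on the plain attribute name and so needs no DTD or schema, and it builds the span and br children as real nodes, because setTextContent() treats its argument as plain text and would escape any markup passed to it. The class name HeaderFiller and the helpers fill, findById and appendField are hypothetical names made up for this illustration, the sample document is a cut-down version of the excerpt above, and the sketch does not try to explain the NullPointerException seen with the panel in the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class HeaderFiller {

    /** Finds an element by its plain "id" attribute. Assumes the id contains no quote characters. */
    static Element findById(Document doc, String id) throws XPathExpressionException {
        return (Element) XPathFactory.newInstance().newXPath()
                .evaluate("//*[@id='" + id + "']", doc, XPathConstants.NODE);
    }

    /** Appends <span class="b">label</span> followed by a text node holding the value. */
    static void appendField(Document doc, Element parent, String label, String value) {
        Element span = doc.createElement("span");
        span.setAttribute("class", "b");
        span.setTextContent(label);
        parent.appendChild(span);
        parent.appendChild(doc.createTextNode(" " + value));
    }

    /** Parses the XHTML, rebuilds the content of the "header1" element, and serializes the result. */
    static String fill(String xhtml, String employee, String date) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            Element header = findById(doc, "header1");
            if (header == null) {
                throw new IllegalArgumentException("no element with id header1");
            }
            // Drop the placeholder children before adding the new ones.
            while (header.hasChildNodes()) {
                header.removeChild(header.getFirstChild());
            }
            appendField(doc, header, "Instruction Type:", "Example");
            header.appendChild(doc.createElement("br"));
            appendField(doc, header, "Allocated To:", employee);
            header.appendChild(doc.createElement("br"));
            appendField(doc, header, "Scheduled Date:", date);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Force XML output; a root element named "html" would otherwise select HTML output.
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("could not fill the report", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><div class=\"header\">"
                + "<p id=\"header1\">placeholder</p></div></body></html>";
        System.out.println(fill(xhtml, "dummyEmployee", "2013-11-02"));
    }
}
```

Because the values go in as text nodes, characters such as < in an employee name are escaped on output instead of being interpreted as markup, so no unescaping pass is needed afterwards.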

Related

Unable to get the output of python executed from Java

I am currently working on my graduation project, for which I have to use Spring Boot. I need to run a Python script from Java code, using the input of an HTML form. I have prepared three files: HTML, Java and Python.
<!DOCTYPE HTML>
<html xmlns:th="http://www.thymeleaf.org">
<head>
<meta charset="UTF-8" />
<title>Add Info</title>
<link rel="stylesheet" type="text/css" th:href="@{/css/style.css}"/>
</head>
<body>
<h1>Insert an Info:</h1>
<!--
In Thymeleaf the equivalent of
JSP's ${pageContext.request.contextPath}/edit.html
would be @{/edit.html}
-->
<form th:action="@{/index}" method="get">
<input type="text" th:name="Coord1"/> <br/>
<input type="text" th:name="Coord2"/> <br/>
<input type="text" th:name="Coord3"/> <br/>
<input type="text" th:name="Coord4"/> <br/>
<input type="text" th:name="datedeb"/> <br/>
<input type="text" th:name="datefin"/> <br/>
<input type="submit"/>
</form>
<br/>
<!-- Check if errorMessage is not null and not empty -->
<div th:if="${errorMessage}" th:utext="${errorMessage}"
style="color:red;font-style:italic;">
...
</div>
</body>
</html>
The Java code:
package com.example.project.controller;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;
@Controller
public class MainController {
float Coord1;
float Coord2;
public static String s;
@RequestMapping(value="/index",method=RequestMethod.GET)
public void addAObjectForm(@RequestParam("Coord1") float Coord1,@RequestParam("Coord2")
float Coord2,@RequestParam("Coord3") float Coord3, @RequestParam("Coord4")float Coord4,@RequestParam("datedeb")
String datedeb,@RequestParam("datefin") String datefin) throws IOException {
//System.out.println(Coord1);
try
{
String pathPython = "test1.py";
String [] cmd = new String[8];
cmd[0] = "python";
cmd[1] = pathPython;
cmd[2] = "Coord1";
cmd[3] = "Coord2";
cmd[4] = "Coord3";
cmd[5] = "Coord4";
cmd[6] = "datedeb";
cmd[7] = "datefin";
Runtime r = Runtime.getRuntime();
Process p = r.exec(cmd);
BufferedReader in = new BufferedReader(new InputStreamReader(p.getInputStream()));
while((s=in.readLine()) != null){
System.out.println(s);
System.out.println(Coord1);
}
// BufferedReader in = new BufferedReader(new InputStreamReader(p.getInputStream()));
}
catch(Exception e){
}
}
}
And the Python code:
import sys
import os
def getDataFromJava(arg1=(sys.argv[0]),arg2=(sys.argv[1]),arg3=(sys.argv[2]),arg4=(sys.argv[3]),arg5=(sys.argv[4]),arg6=(sys.argv[5])):
    cord1=arg1
    cord2=arg2
    cord3=arg3
    cord4=arg4
    datedeb=arg5
    datefin=arg6
    print(cord1)
    print(cord2)
    #print(arg3_val)
    return cord1,cord2,cord3,cord4,datedeb,datefin
When I try to check that my Python script returns a result, the String s in the Java code is null.
Can anyone help me get a result?
Thank you in advance.

# append to your current test1.py
import sys
# sys.argv[0] is the script name, so the values passed from Java start at index 1
getDataFromJava(
    str(sys.argv[1]), str(sys.argv[2]), str(sys.argv[3]),
    str(sys.argv[4]), str(sys.argv[5]), str(sys.argv[6]))
I am a beginner at Python, so I googled to find what I think is missing from your solution. The Python function is defined but never called, and the command-line arguments need to be read from sys.argv and passed to it.
Update 1
Based on your edit, you have updated the function definition def getDataFromJava(...):. You have not added a function call getDataFromJava(...) below your function definition. I have provided a runnable script below.
import sys
import json
def getDataFromJava(arg1,arg2,arg3,arg4,arg5,arg6):
    cord1=arg1
    cord2=arg2
    cord3=arg3
    cord4=arg4
    datedeb=arg5
    datefin=arg6
    return cord1,cord2,cord3,cord4,datedeb,datefin
print(json.dumps(getDataFromJava( \
    str(sys.argv[1]), str(sys.argv[2]), str(sys.argv[3]), \
    str(sys.argv[4]), str(sys.argv[5]), str(sys.argv[6]))))
$ python test1.py qw er ty ui op as
["qw", "er", "ty", "ui", "op", "as"]
I added json because I saw that you are returning several items to the Java side, and I thought JSON would be a good data interchange format between the Python side and the Java side. Your Java code now needs to parse the returned JSON-formatted string. If you are returning only a few items and want to skip JSON, you can return something like abc#def#ghi#jkl instead.
print('#'.join(getDataFromJava( \
str(sys.argv[1]), str(sys.argv[2]), str(sys.argv[3]), \
str(sys.argv[4]), str(sys.argv[5]), str(sys.argv[6]))))
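
On the Java side, two further things in the controller from the question look worth checking: the cmd array holds the literal strings "Coord1", "Coord2" and so on rather than the request values, and the empty catch block hides any exception that exec() throws. The following is only a sketch of one way to address both with ProcessBuilder. The class name PythonRunner and the helpers buildCommand and run are hypothetical, and it assumes that a python executable is on the PATH and that test1.py sits in the working directory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PythonRunner {

    /** Builds the argument vector: interpreter, script, then the actual values as strings. */
    static List<String> buildCommand(String interpreter, String script, Object... values) {
        List<String> cmd = new ArrayList<>();
        cmd.add(interpreter);
        cmd.add(script);
        for (Object value : values) {
            cmd.add(String.valueOf(value)); // the value itself, not the variable's name
        }
        return cmd;
    }

    /** Runs the command and returns every line it printed (stdout and stderr merged). */
    static List<String> run(List<String> cmd) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(cmd);
        builder.redirectErrorStream(true); // Python tracebacks arrive on the same stream
        Process process = builder.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            // Surface failures instead of swallowing them.
            throw new IOException("script exited with " + exitCode + ": " + lines);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample values standing in for the six request parameters.
        List<String> cmd = buildCommand("python", "test1.py",
                1.5f, 2.5f, 3.0f, 4.0f, "2020-01-01", "2020-01-31");
        System.out.println(run(cmd));
    }
}
```

In the controller, the six @RequestParam values would take the place of the sample values, and the returned lines could then be parsed as JSON or split on '#', depending on which of the two Python variants above is used.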

How to unescape escaped special characters while reading XML in Java

I'm working on extracting ISO-8859-2 encoded text from an XML file. It works fine, except that some special characters appear as their corresponding HTML character codes.
The XML file:
<?xml version="1.0" encoding="iso-8859-2"?>
<!DOCTYPE TEI.2 SYSTEM "http://mek.oszk.hu/mekdtd/prose/TEI-MEK-prose.dtd">
<!-- ?xml-stylesheet type="text/xsl" href="http://mek.oszk.hu/mekdtd/xsl/boszorkany_txt.xsl"? -->
<TEI.2 id="MEK-00798">
<text type="novel">
<front>
<titlePage>
<docAuthor>Jókai Mór</docAuthor>
<docTitle>
<titlePart>Az arany ember</titlePart>
</docTitle>
</titlePage>
</front>
<body>
<div type="part">
<head>
<title>A Szent Borbála</title>
</head>
<div type="chapter">
<head>
<title>I. A VASKAPU</title>
</head>
<p text-align="justify">A kitartó hetes vihar.  Ez járhatlanná teszi a Dunát a Vaskapu
között.
</p>
</div>
</div>
</body>
</text>
</TEI.2>
A snippet of the code I use:
SAXReader reader = new SAXReader();
reader.setEncoding("ISO-8859-2");
Document document = reader.read(file);
Node node = document.selectSingleNode("//*[@type='chapter']/p");
String text = node.getStringValue();
// String text = org.jsoup.parser.Parser.unescapeEntities(node.getStringValue(), true);
// String text = org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(node.getStringValue());
I also included in comments some libraries I tried, without any success.
What I want to see is:
A kitartó hetes vihar. - Ez járhatlanná teszi a Dunát a Vaskapu között.
What I see when I debug is:
A kitartó hetes vihar . Ez járhatlanná teszi a Dunát a Vaskapu között.

How to validate html using java? getting issues with jsoup library

I need to validate HTML using Java, so I tried the jsoup library, but some of my test cases fail with it.
For example, this is my HTML content. I have no control over it; I receive it from an external source provider.
String invalidHtml = "<div id=\"myDivId\" ' class = claasnamee value='undaa' > <<p> p tagil vanne <br> <span> span close cheythillee!! </p> </div>";
doc = Jsoup.parseBodyFragment(invalidHtml);
For the above HTML I get this output:
<html>
<head></head>
<body>
<div id="myDivId" '="" class="claasnamee" value="undaa">
<
<p> p tagil vanne <br /> <span> span close cheythillee!! </span></p>
</div>
</body>
</html>
The stray single quote in my string comes out as '="" in the output. How can I fix this? Can anyone help, please?

The best place to validate your HTML is http://validator.w3.org/, but that is a manual process. Don't worry, though: jsoup can automate it by submitting the markup to the validator and reading the results. The program below is something of a workaround, but it serves the purpose.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
public class JsoupValidate {
public static void main(String[] args) throws Exception {
String invalidHtml = "<div id=\"myDivId\" ' class = claasnamee value='undaa' > <<p> p tagil vanne <br> <span> span close cheythillee!! </p> </div>";
Document initialDoc = Jsoup.parseBodyFragment(invalidHtml);
Document validatedDoc = Jsoup.connect("http://validator.w3.org/check")
.data("fragment", initialDoc.html())
.data("st", "1")
.post();
System.out.println("******");
System.out.println("Errors");
System.out.println("******");
for(Element error : validatedDoc.select("li.msg_err")){
System.out.println(error.select("em").text() + " : " + error.select("span.msg").text());
}
System.out.println();
System.out.println("**************");
System.out.println("Cleaned output");
System.out.println("**************");
Document cleanedOuput = Jsoup.parse(validatedDoc.select("pre.source").text());
cleanedOuput.select("meta[name=generator]").first().remove();
cleanedOuput.outputSettings().indentAmount(4);
cleanedOuput.outputSettings().prettyPrint(true);
System.out.println(cleanedOuput.html());
}
}

var invalidHtml = "<div id=\"myDivId\" ' class = claasnamee value='undaa' > <<p> p tagil vanne <br> <span> span close cheythillee!! </p> </div>";
var parser = Parser.htmlParser()
.setTrackErrors(10); // Set the number of errors it can track. 0 by default so it's important to set that
var dom = Jsoup.parse(invalidHtml, "" /* this is the default */, parser);
System.out.println(parser.getErrors()); // Do something with the errors, if any
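
If the incoming markup is supposed to be XHTML, a stricter check is possible with the JDK alone: run it through the built-in XML parser and treat any well-formedness error as a validation failure. This is a different technique from the two jsoup approaches above, not a replacement for them, and it rejects markup that is legal HTML but not XML (an unclosed <br>, for instance). The class name WellFormedCheck and the method problems are hypothetical; the sketch also assumes fragments carry no DOCTYPE, since it switches DOCTYPE declarations off to avoid fetching external entities from untrusted input.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {

    /** Returns the problems the XML parser reports; an empty list means well-formed. */
    static List<String> problems(String markup) {
        List<String> found = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Untrusted input: refuse DOCTYPE declarations so no external entity is resolved.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { found.add(describe(e)); }
                @Override public void error(SAXParseException e) { found.add(describe(e)); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // parsing cannot continue after a well-formedness error
                }
            });
            builder.parse(new InputSource(new StringReader(markup)));
        } catch (SAXParseException e) {
            found.add(describe(e));
        } catch (Exception e) {
            found.add(String.valueOf(e.getMessage()));
        }
        return found;
    }

    /** Formats a parser error with its position in the input. */
    static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        String invalidHtml = "<div id=\"myDivId\" ' class = claasnamee value='undaa' > <<p> p tagil vanne "
                + "<br> <span> span close cheythillee!! </p> </div>";
        System.out.println(problems(invalidHtml));
        System.out.println(problems("<div id=\"myDivId\"><p>fine<br/></p></div>"));
    }
}
```

An XML parser stops at the first well-formedness error, so this reports at most one such problem per run, whereas the jsoup error tracking above can collect several.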

How to inject snippets of html into an string containing valid html?

I have the following HTML (cut down for this post) that is passed into a Java method.
I want to take this HTML string, add a <pre> tag containing some text that is also passed in, and add a <script type="text/javascript"> section to the head.
String buildHTML(String htmlString, String textToInject)
{
// Inject textToInject into a pre tag and add the JavaScript section
String newHTMLString = <build new html sections>
}
-- htmlString --
<html>
<head>
</head>
<body>
</body>
</html>
-- newHTMLString
<html>
<head>
<script type="text/javascript">
window.onload=function(){alert("hello?";}
</script>
</head>
<body>
<div id="1">
<pre>
<!-- Inject textToInject here into a newly created pre tag-->
</pre>
</div>
</body>
</html>
What is the best tool to do this from within java other than a regex?

Here's how to do this with Jsoup:
public String buildHTML(String htmlString, String textToInject)
{
// Create a document from string
Document doc = Jsoup.parse(htmlString);
// create the script tag in head
doc.head().appendElement("script")
.attr("type", "text/javascript")
.text("window.onload=function(){alert(\'hello?\';}");
// Create div tag
Element div = doc.body().appendElement("div").attr("id", "1");
// Create pre tag
Element pre = div.appendElement("pre");
pre.text(textToInject);
// Return as string
return doc.toString();
}
I've used method chaining a lot, which means that:
doc.body().appendElement(...).attr(...).text(...)
is exactly the same as
Element example = doc.body().appendElement(...);
example.attr(...);
example.text(...);
Example:
final String html = "<html>\n"
+ " <head>\n"
+ " </head>\n"
+ " <body>\n"
+ " <body>\n"
+ "</html>";
String result = buildHTML(html, "This is a test.");
System.out.println(result);
Result:
<html>
<head>
<script type="text/javascript">window.onload=function(){alert('hello?';}</script>
</head>
<body>
<div id="1">
<pre>This is a test.</pre>
</div>
</body>
</html>

Getting data from a form using java

<div></div>
<div></div>
<div></div>
<div></div>
<ul>
<form id=the_main_form method="post">
<li>
<div></div>
<div> <h2>
<a onclick="xyz;" target="_blank" href="http://sample.com" style="text-decoration:underline;">This is sample</a>
</h2></div>
<div></div>
<div></div>
</li>
There are 50 <li> elements like that.
I have posted a snippet from a much bigger HTML page.
<div></div> means there was data in between, which I removed because it is not necessary here.
What should the jsoup select statement look like to extract the href and the text?
I tried doc.select("div div div ul xxxx");
where xxxx is the form. Should I give the form id, or how should I do that?

Try this:
Elements allLis = doc.select("#the_main_form > li");
for (Element li : allLis) {
    // select() returns Elements, so take the first match (null if there is none)
    Element a = li.select("div:eq(1) > h2 > a").first();
    String href = a.attr("href");
    String text = a.text();
}
Hope it helps!
EDIT:
Elements allLis = doc.select("#the_main_form > li");
This line gets all <li> tags that are directly inside the <form> with the id the_main_form.
Element a = li.select("div:eq(1) > h2 > a").first();
Then we iterate over the <li> tags. For each one, div:eq(1) picks the second <div> (index 1), and from there we descend to the <h2> and the <a> inside it.
Hope that makes it clear! :)

Please try this:
package com.stackoverflow.works;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
/*
 * @author: sarath_sivan
 */
public class HtmlParserService {
public static void parseHtml(String html) {
Document document = Jsoup.parse(html);
Element linkElement = document.select("a").first();
String linkHref = linkElement.attr("href"); // "http://sample.com"
String linkText = linkElement.text(); // "This is sample"
System.out.println(linkHref);
System.out.println(linkText);
}
public static void main(String[] args) {
String html = "<a onclick=\"xyz;\" target=\"_blank\" href=\"http://sample.com\" style=\"text-decoration:underline;\">This is sample</a>";
parseHtml(html);
}
}
Hope you have the Jsoup Library in your classpath.
Thank you!
