I have an XML file that uses an XSS and an XSL file stored in the folder to display the XML in a proper format.
When I use the following code:
JEditorPane editor = new JEditorPane();
editor.setBounds(114, 65, 262, 186);
frame.getContentPane().add(editor);
editor.setContentType( "html" );
File file=new File("c:/r/testResult.xml");
editor.setPage(file.toURI().toURL());
All I can see is the text content of the XML without any styling. What should I do to make it display with the style sheet?
The JEditorPane does not automatically process XSLT style-sheets. You must perform the transformation yourself:
try (InputStream xslt = getClass().getResourceAsStream("StyleSheet.xslt");
InputStream xml = getClass().getResourceAsStream("Document.xml")) {
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = db.parse(xml);
StringWriter output = new StringWriter();
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer(new StreamSource(xslt));
transformer.transform(new DOMSource(doc), new StreamResult(output));
String html = output.toString();
// JEditorPane doesn't like the META tag...
html = html.replace("<META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">", "");
editor.setContentType("text/html; charset=UTF-8");
editor.setText(html);
} catch (IOException | ParserConfigurationException | SAXException | TransformerException e) {
editor.setText("Unable to format document due to:\n\t" + e);
}
editor.setCaretPosition(0);
Use an appropriate InputStream or StreamSource for your particular xslt and xml documents.
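As a rough illustration of that last point, the sketch below applies a stylesheet to a document through generic Source arguments, so the same method works for files, class-path resources, or strings. The names XsltToHtml, toHtml, SAMPLE_XML and SAMPLE_XSLT are made up for this example, and the sample XML and XSLT are in-memory stand-ins rather than the asker's real files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltToHtml {

    // Hypothetical sample input, standing in for the real XML file.
    static final String SAMPLE_XML = "<result><name>login</name></result>";

    // Hypothetical stylesheet that renders the name element in bold.
    static final String SAMPLE_XSLT =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" indent=\"no\"/>"
        + "<xsl:template match=\"/\">"
        + "<html><body><b><xsl:value-of select=\"result/name\"/></b></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML source and returns the result as a String.
    static String toHtml(Source xml, Source xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer(xslt);
            StringWriter output = new StringWriter();
            transformer.transform(xml, new StreamResult(output));
            return output.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Unable to format document", e);
        }
    }

    public static void main(String[] args) {
        // For files on disk, pass new StreamSource(new java.io.File(path)) instead.
        String html = toHtml(new StreamSource(new StringReader(SAMPLE_XML)),
                             new StreamSource(new StringReader(SAMPLE_XSLT)));
        System.out.println(html);
    }
}
```

The returned string can then be handed to editor.setText(html) after calling editor.setContentType("text/html"), as in the answer above.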
So I wanted to see if there was a way to convert an XML file containing a SOAP message to a string and then update the values of particular tags. Here are the tags that I am talking about:
<o:Username>Bill</o:Username>
<o:Password Type="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText">Hello123</o:Password>
What I had originally done was update the XML file itself with the new username and password, as seen in the code below.
try {
String namespace = "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd";
configProperties.load(SecurityTokenHandler.class.getResourceAsStream(PROPERTIES_FILE));
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
Document requestDoc = documentBuilderFactory.newDocumentBuilder().parse(SecurityTokenHandler.class.getResourceAsStream(SOAP_REQUEST_FILE));
Element docElement = requestDoc.getDocumentElement();
docElement.getElementsByTagNameNS(namespace, "Username").item(0).setTextContent(configProperties.getProperty("username"));
docElement.getElementsByTagNameNS(namespace,"Password").item(0).setTextContent(configProperties.getProperty("password"));
Transformer docTransformer = TransformerFactory.newInstance().newTransformer();
DOMSource source = new DOMSource(requestDoc);
StreamResult result = new StreamResult(SecurityTokenHandler.class.getResource(SOAP_REQUEST_FILE).getFile());
docTransformer.transform(source, result);
} catch(IOException | ParserConfigurationException | SAXException | TransformerException exception) {
LOGGER.error("There was an error loading the properties file", exception);
}
However, I found out later that, because this is a resource file, I'm not allowed to modify the file itself. I have to load the XML into memory, update the username and password values there, and then return a byte array of the XML with the updated values (without modifying the original document). Any idea how I can accomplish this?
So the solution I came up with was basically to change the result to a ByteArrayOutputStream rather than the XML file itself. Posting updated code:
try {
String namespace = "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd";
configProperties.load(SecurityTokenHandler.class.getClassLoader().getResourceAsStream(PROPERTIES_FILE));
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
Document requestDoc = documentBuilderFactory.newDocumentBuilder().parse(SecurityTokenHandler.class.getClassLoader().getResourceAsStream(SOAP_REQUEST_FILE));
Element docElement = requestDoc.getDocumentElement();
docElement.getElementsByTagNameNS(namespace, "Username").item(0).setTextContent(configProperties.getProperty("username"));
docElement.getElementsByTagNameNS(namespace,"Password").item(0).setTextContent(configProperties.getProperty("password"));
Transformer docTransformer = TransformerFactory.newInstance().newTransformer();
try (ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream()) {
StreamResult result = new StreamResult(byteArrayOutputStream);
DOMSource source = new DOMSource(requestDoc);
docTransformer.transform(source, result);
b = byteArrayOutputStream.toByteArray();
}
} catch(IOException | ParserConfigurationException | SAXException | TransformerException exception) {
LOGGER.error("There was an error loading the properties file", exception);
}
Using javax.xml.transform I created this ISO-8859-1 document, which contains two &#-encoded characters, &#50108; and &#50102;:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xml>&#50108; and &#50102;</xml>
Question: how will a standards-compliant XML reader interpret the &#50108; and &#50102;:
just as the plain &#... strings (not converted back to 쎼 and 쎶), or
as 쎼 and 쎶?
Code to generate the XML:
public void testInvalidCharacter() {
try {
String str = "\uC3BC and \uC3B6"; // 쎼 and 쎶
System.out.println(str);
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.newDocument();
Element root = doc.createElement("xml");
root.setTextContent(str);
doc.appendChild(root);
DOMSource domSource = new DOMSource(doc);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, StandardCharsets.ISO_8859_1.name());
StringWriter out = new StringWriter();
transformer.transform(domSource, new StreamResult(out));
System.out.println(out.toString());
} catch (ParserConfigurationException | DOMException | IllegalArgumentException | TransformerException e) {
e.printStackTrace(System.err);
}
}
An XML parser will recognize the '&#...' escape syntax and properly return 쎼 and 쎶 through its API for the text of the element.
E.g. in Java, the org.w3c.dom.Element.getTextContent() method for the Element with the tag name 'xml' will return a String containing those Unicode characters, even though your XML document itself is encoded as ISO-8859-1.
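A small check along these lines, assuming the generated document uses decimal character references for U+C3BC and U+C3B6, is sketched below. The class name CharRefDemo and the helper textOf are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class CharRefDemo {

    // Parses the XML from ISO-8859-1 bytes and returns the root element's text content.
    static String textOf(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Unable to parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                   + "<xml>&#50108; and &#50102;</xml>";
        String text = textOf(xml);
        // The parser resolves the character references back to the original characters.
        System.out.println(text.equals("\uC3BC and \uC3B6"));
        System.out.println(text.length()); // 7 chars: one char + " and " + one char
    }
}
```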
I am trying to parse a large XML file (60000 lines) using a DOM parser and XPath, but my code seems to break because of the file size. When I try to print the XML, the output starts from the middle of the document. Any ideas how I can avoid this?
FileInputStream file = new FileInputStream(new File(filePath));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
// XPath steps are always separated by "/", not by the platform-specific File.separator.
disclaimer = xPath.compile(disclaimerPath + "/title").evaluate(xmlDocument);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(xmlDocument), new StreamResult(writer));
System.out.println(writer.getBuffer().toString().replaceAll("\n|\r", ""));
I am trying to create an org.w3c.dom.Document from an XML string. I am using the question "How to convert string to xml file in java" as a basis. I am not getting an exception; the problem is that my document is always null. The XML is system generated and well formed. I wish to convert it to a Document object so that I can add new nodes etc.
public static org.w3c.dom.Document stringToXML(String xmlSource) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputStream input = IOUtils.toInputStream(xmlSource); //uses Apache commons to obtain InputStream
BOMInputStream bomIn = new BOMInputStream(input); //create BOMInputStream from InputStream
InputSource is = new InputSource(bomIn); // InputSource with BOM removed
Document document = builder.parse(new InputSource(new StringReader(xmlSource)));
Document document2 = builder.parse(is);
System.out.println("Document=" + document.getDoctype()); // always null
System.out.println("Document2=" + document2.getDoctype()); // always null
return document;
}
I have tried these things: I created a BOMInputStream, thinking that a BOM was causing the conversion to fail, but passing the BOMInputStream to the InputSource doesn't make a difference. I have even tried passing a literal String of simple XML and still get nothing but null. The toString method returns [#document: null].
I am using Xpages, a JSF implementation that uses Java 6. Full name of Document class used to avoid confusion with Xpage related class of the same name.
Don't rely on what toString is telling you. It provides diagnostic information that the current class thinks is useful, which in this case is nothing more than...
"["+getNodeName()+": "+getNodeValue()+"]";
Which isn't going to help you. Instead, you will need to try and transform the model back into a String, for example...
String text
= "<fruit>"
+ "<banana>yellow</banana>"
+ "<orange>orange</orange>"
+ "<pear>yellow</pear>"
+ "</fruit>";
InputStream is = null;
try {
is = new ByteArrayInputStream(text.getBytes());
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(is);
System.out.println("Document=" + document.toString()); // prints [#document: null], but the document itself is not null
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.INDENT, "yes");
tf.setOutputProperty(OutputKeys.METHOD, "xml");
tf.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
ByteArrayOutputStream os = null;
try {
os = new ByteArrayOutputStream();
DOMSource domSource = new DOMSource(document);
StreamResult sr = new StreamResult(os);
tf.transform(domSource, sr);
System.out.println(new String(os.toByteArray()));
} finally {
os.close();
}
} catch (ParserConfigurationException | SAXException | IOException | TransformerConfigurationException exp) {
exp.printStackTrace();
} catch (TransformerException exp) {
exp.printStackTrace();
} finally {
try {
is.close();
} catch (Exception e) {
}
}
Which outputs...
Document=[#document: null]
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<fruit>
<banana>yellow</banana>
<orange>orange</orange>
<pear>yellow</pear>
</fruit>
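A lighter-weight way to confirm that parsing worked is to look at the document element rather than at toString() or getDoctype(). Note that getDoctype() returns null whenever the XML has no DOCTYPE declaration, so a null there does not indicate a failed parse. The following is a minimal sketch; the class name DocumentCheck and the helper parse are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DocumentCheck {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Unable to parse document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<fruit><banana>yellow</banana>"
                + "<orange>orange</orange><pear>yellow</pear></fruit>");
        Element root = doc.getDocumentElement();

        System.out.println(doc.getDoctype() == null);           // true: no DOCTYPE in the input
        System.out.println(root.getNodeName());                 // fruit
        System.out.println(root.getChildNodes().getLength());   // 3
        System.out.println(root.getFirstChild().getTextContent()); // yellow
    }
}
```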
You can try using this: http://www.wissel.net/blog/downloads/SHWL-8MRM36/$File/SimpleXMLDoc.java
I currently have some code that converts a .doc document to HTML, but the code I am using for .docx files unfortunately doesn't get the text and convert it. Below is my code.
private void convertWordDocXtoHTML(File file) throws ParserConfigurationException, TransformerConfigurationException, TransformerException, IOException {
XWPFDocument wordDocument = null;
try {
wordDocument = new XWPFDocument(new FileInputStream(file));
} catch (IOException ex) {
Exceptions.printStackTrace(ex);
}
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
ByteArrayOutputStream out = new ByteArrayOutputStream();
DOMSource domSource = new DOMSource(htmlDocument);
StreamResult streamResult = new StreamResult(out);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer serializer = tf.newTransformer();
serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
serializer.setOutputProperty(OutputKeys.INDENT, "yes");
serializer.setOutputProperty(OutputKeys.METHOD, "html");
serializer.transform(domSource, streamResult);
out.close();
String result = new String(out.toByteArray());
acDocTextArea.setText(newDocText);
String htmlText = result;
}
Any ideas as to why this isn't working would be much appreciated. The ByteArrayOutputStream should contain the entire HTML, but it is empty and has no text.
Mark, you're using the HWPF package, which supports only the .doc format; see the package description. That document also mentions attempts to provide an interface for .docx files through the XWPF package. However, the project seems to lack human resources, and users are encouraged to submit extensions. Limited functionality should be available, though, and text extraction should be part of it.
You should also see this question: How to Extract docx (word 2007 above) using apache POI.
I too was stuck at this point.
I now know there is a third-party API that converts docx to HTML, and it works fine:
https://code.google.com/p/xdocreport/wiki/XWPFConverterXHTML