Include delimiter when performing substring operation

Include delimiter when performing substring operation - java

How do I include the delimiter when performing a substring operation?
i.e. given the string message which looks like this:
<nutrition>
<daily-values>
<total-fat units="g">65</total-fat>
<saturated-fat units="g">20</saturated-fat>
<cholesterol units="mg">300</cholesterol>
<sodium units="mg">2400</sodium>
<carb units="g">300</carb>
<fiber units="g">25</fiber>
<protein units="g">50</protein>
</daily-values>
</nutrition>
<food>
<name>Avocado Dip</name>
<mfr>Sunnydale</mfr>
<serving units="g">29</serving>
<calories total="110" fat="100"/>
<total-fat>11</total-fat>
<saturated-fat>3</saturated-fat>
<cholesterol>5</cholesterol>
<sodium>210</sodium>
<carb>2</carb>
<fiber>0</fiber>
<protein>1</protein>
<vitamins>
<a>0</a>
<c>0</c>
</vitamins>
<minerals>
<ca>0</ca>
<fe>0</fe>
</minerals>
</food>
and then
message = message.substring(message.indexOf("<food>"), message.indexOf("</food>"));
returns
<food>
<name>Avocado Dip</name>
<mfr>Sunnydale</mfr>
<serving units="g">29</serving>
<calories total="110" fat="100"/>
<total-fat>11</total-fat>
<saturated-fat>3</saturated-fat>
<cholesterol>5</cholesterol>
<sodium>210</sodium>
<carb>2</carb>
<fiber>0</fiber>
<protein>1</protein>
<vitamins>
<a>0</a>
<c>0</c>
</vitamins>
<minerals>
<ca>0</ca>
<fe>0</fe>
</minerals>
How do I get it to keep the last </food> tag given I don't know the surrounding content of the XML file?

Here's a solution using javax.xml. It aims to solve the case when multiple <food> elements are present in the document. In order to handle this case correctly, you need to
deserialize your XML into org.w3c.dom.Document
extract the list of <food> nodes as org.w3c.dom.NodeList
serialize back to String at the end
Here's a simplified example:
private static final String XML =
"<?xml version = \"1.0\" encoding = \"UTF-8\"?>\n"
+ "<message>\n"
+ " <food>\n"
+ " <name>A</name>\n"
+ " </food>\n"
+ " <food>\n"
+ " <name>B</name>\n"
+ " </food>\n"
+ "</message>\n";
#Test
public void xpath() throws Exception {
// Deserialize
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
Document document;
try (InputStream in = new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8))) {
document = factory.newDocumentBuilder().parse(in);
}
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xPath.compile("//food");
NodeList nodeList = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
System.out.println(node.getNodeName() + ": " + node.getTextContent().trim());
}
// Serialize
Document exportDoc = factory.newDocumentBuilder().newDocument();
Node exportNode = exportDoc.importNode(nodeList.item(0), true);
exportDoc.appendChild(exportNode);
String content = serialize(exportDoc);
System.out.println(content);
}
private static String serialize(Document doc) throws TransformerException {
DOMSource domSource = new DOMSource(doc);
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
// set indent
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(domSource, result);
return writer.toString();
}
The 1st output shows all <food> elements are deserialized correctly:
food: A
food: B
The 2nd output shows the 1st element are serialized back to string:
<food>
<name>A</name>
</food>

Related

How to modify values of multiple XML tags?

I have been trying to modify values of more than one XML tag in java. So far I am able to get the values of the two nodes that I want to modify but while setting up values it always overrides the first one with the second one.
XML
<driver>
<BirthDate>1977-07-18</BirthDate>
<Age>40</Age>
<Gender>M</Gender>
<PrimaryResidence>OwnCondo</PrimaryResidence>
</driver>
I am trying to change Gender and PrimaryResidence tags.
Code
// Modifies multiple XML nodes
public static String changeCoreDiscountType(String reqXML) {
Document document = null;
String updatedXML = null;
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(reqXML));
document = builder.parse(is);
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression expression = xPath.compile("/driver/Gender | /driver/PrimaryResidence");
NodeList nodeList = (NodeList) expression.evaluate(document,XPathConstants.NODESET);
for(int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
node.setTextContent("F");
node.setTextContent("OwnCondo");
String value = node.getTextContent();
}
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(new StringWriter());
transformer.transform(source, result);
updatedXML = result.getWriter().toString();
} catch (Exception ex) {
ex.printStackTrace();
}
return updatedXML;
}
Any help is appreciated.

You need to check you are updating the correct node first, e.g.
for(int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if(node.getNodeName() == "Gender")
node.setTextContent("F");
if(node.getNodeName() == "PrimaryResidence")
node.setTextContent("OwnCondo");
}
Full Demo

Pretty print XML in java 8

I have an XML file stored as a DOM Document and I would like to pretty print it to the console, preferably without using an external library. I am aware that this question has been asked multiple times on this site, however none of the previous answers have worked for me. I am using java 8, so perhaps this is where my code differs from previous questions? I have also tried to set the transformer manually using code found from the web, however this just caused a not found error.
Here is my code which currently just outputs each xml element on a new line to the left of the console.
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class Test {
public Test(){
try {
//java.lang.System.setProperty("javax.xml.transform.TransformerFactory", "org.apache.xalan.xsltc.trax.TransformerFactoryImpl");
DocumentBuilderFactory dbFactory;
DocumentBuilder dBuilder;
Document original = null;
try {
dbFactory = DocumentBuilderFactory.newInstance();
dBuilder = dbFactory.newDocumentBuilder();
original = dBuilder.parse(new InputSource(new InputStreamReader(new FileInputStream("xml Store - Copy.xml"))));
} catch (SAXException | IOException | ParserConfigurationException e) {
e.printStackTrace();
}
StringWriter stringWriter = new StringWriter();
StreamResult xmlOutput = new StreamResult(stringWriter);
TransformerFactory tf = TransformerFactory.newInstance();
//tf.setAttribute("indent-number", 2);
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(new DOMSource(original), xmlOutput);
java.lang.System.out.println(xmlOutput.getWriter().toString());
} catch (Exception ex) {
throw new RuntimeException("Error converting to String", ex);
}
}
public static void main(String[] args){
new Test();
}
}

In reply to Espinosa's comment, here is a solution when "the original xml is not already (partially) indented or contain new lines".
Background
Excerpt from the article (see References below) inspiring this solution:
Based on the DOM specification, whitespaces outside the tags are perfectly valid and they are properly preserved. To remove them, we can use XPath’s normalize-space to locate all the whitespace nodes and remove them first.
Java Code
public static String toPrettyString(String xml, int indent) {
try {
// Turn xml string into a document
Document document = DocumentBuilderFactory.newInstance()
.newDocumentBuilder()
.parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));
// Remove whitespaces outside tags
document.normalize();
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",
document,
XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); ++i) {
Node node = nodeList.item(i);
node.getParentNode().removeChild(node);
}
// Setup pretty print options
TransformerFactory transformerFactory = TransformerFactory.newInstance();
transformerFactory.setAttribute("indent-number", indent);
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
// Return pretty print xml string
StringWriter stringWriter = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(stringWriter));
return stringWriter.toString();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
Sample usage
String xml = "<root>" + //
"\n " + //
"\n<name>Coco Puff</name>" + //
"\n <total>10</total> </root>";
System.out.println(toPrettyString(xml, 4));
Output
<root>
<name>Coco Puff</name>
<total>10</total>
</root>
References
Java: Properly Indenting XML String published on MyShittyCode
Save new XML node to file

I guess that the problem is related to blank text nodes (i.e. text nodes with only whitespaces) in the original file. You should try to programmatically remove them just after the parsing, using the following code. If you don't remove them, the Transformer is going to preserve them.
original.getDocumentElement().normalize();
XPathExpression xpath = XPathFactory.newInstance().newXPath().compile("//text()[normalize-space(.) = '']");
NodeList blankTextNodes = (NodeList) xpath.evaluate(original, XPathConstants.NODESET);
for (int i = 0; i < blankTextNodes.getLength(); i++) {
blankTextNodes.item(i).getParentNode().removeChild(blankTextNodes.item(i));
}

This works on Java 8:
public static void main (String[] args) throws Exception {
String xmlString = "<hello><from>ME</from></hello>";
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document document = documentBuilder.parse(new InputSource(new StringReader(xmlString)));
pretty(document, System.out, 2);
}
private static void pretty(Document document, OutputStream outputStream, int indent) throws Exception {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
if (indent > 0) {
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", Integer.toString(indent));
}
Result result = new StreamResult(outputStream);
Source source = new DOMSource(document);
transformer.transform(source, result);
}

I've written a simple class for for removing whitespace in documents - supports command-line and does not use DOM / XPath.
Edit: Come to think of it, the project also contains a pretty-printer which handles existing whitespace:
PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().ignoreWhitespace().build();

Underscore-java has static method U.formatXml(string). I am the maintainer of the project. Live example
import com.github.underscore.U;
public class MyClass {
public static void main(String args[]) {
String xml = "<root>" + //
"\n " + //
"\n<name>Coco Puff</name>" + //
"\n <total>10</total> </root>";
System.out.println(U.formatXml(xml));
}
}
Output:
<root>
<name>Coco Puff</name>
<total>10</total>
</root>

I didn't like any of the common XML formatting solutions because they all remove more than 1 consecutive new line character (for some reason, removing spaces/tabs and removing new line characters are inseparable...). Here's my solution, which was actually made for XHTML but should do the job with XML as well:
public String GenerateTabs(int tabLevel) {
char[] tabs = new char[tabLevel * 2];
Arrays.fill(tabs, ' ');
//Or:
//char[] tabs = new char[tabLevel];
//Arrays.fill(tabs, '\t');
return new String(tabs);
}
public String FormatXHTMLCode(String code) {
// Split on new lines.
String[] splitLines = code.split("\\n", 0);
int tabLevel = 0;
// Go through each line.
for (int lineNum = 0; lineNum < splitLines.length; ++lineNum) {
String currentLine = splitLines[lineNum];
if (currentLine.trim().isEmpty()) {
splitLines[lineNum] = "";
} else if (currentLine.matches(".*<[^/!][^<>]+?(?<!/)>?")) {
splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
++tabLevel;
} else if (currentLine.matches(".*</[^<>]+?>")) {
--tabLevel;
if (tabLevel < 0) {
tabLevel = 0;
}
splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
} else if (currentLine.matches("[^<>]*?/>")) {
splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
--tabLevel;
if (tabLevel < 0) {
tabLevel = 0;
}
} else {
splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
}
}
return String.join("\n", splitLines);
}
It makes one assumption: that there are no <> characters except for those that comprise the XML/XHTML tags.

Create xml file :
new FileInputStream("xml Store - Copy.xml") ;// result xml file format incorrect !
so that, when parse the content of the given input source as an XML document
and return a new DOM object.
Document original = null;
...
original.parse("data.xml");//input source as an XML document

Need of transformer function

I am linking to a webservice all works fine. Below is my codes.
private static void printSOAPResponse(SOAPMessage soapResponse) throws Exception {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
Source sourceContent = soapResponse.getSOAPPart().getContent();
System.out.print("\nResponse SOAP Message = \n");
StreamResult result = new StreamResult(System.out);
transformer.transform(sourceContent, result);
}
I am not too sure of the transformerfactory. What I need to do now is to traverse through the results and look for below tag.
<Table diffgr:id="Table1" > and there after there will be few tags in it for e.g.
<rID>1212</rID>
<sNo>15677</sNo>
So what is the best way as some require to covert it into string is that necessary?

Transform to document (unchecked):
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
DOMResult result = new DOMResult();
transformer.transform(sourceContent, result);
Document doc = (Document) result.getNode();
Find in document:
String tag = "Table";
String attr = "diffgr:id";
String attrValue = "Table1";
NodeList list = doc.getElementsByTagName("Table");
Element tableNode = null;
for (int i = 0; i < list.getLength(); i++) {
tableNode = ((Element) list.item(i));
String currentAttrValue = tableNode.getAttribute(attr);
if (attrValue.equals(currentAttrValue)) {
break;
}
}
String childTag1 = "rID";
String childTag2 = "sNo";
Node child1 = (Node) tableNode.getElementsByTagName(childTag1).item(0);
Node child2 = (Node) tableNode.getElementsByTagName(childTag2).item(0);
String rIDValue = child1.getTextContent();
String sNoValue = child1.getTextContent();

You code Transformer transformer = transformerFactory.newTransformer(); is creating an "identity transformer" which copies the input unchanged, so it's not really doing anything useful. What you want here is a real (XSLT) transformer which actually extracts the information you need: something like
<xsl:template match="/">
<xsl:copy-of select="//Table[#diffgr:id='Table1']"/>
</xsl:template>
which you can compile using transformerFactory.newTemplates().

updated xml data not updated in the xml file

i have made a method for updating my xml in the xml file by a using a GUI..
but when I update it everything seem to be working fine and the console is printing out the correct things.
But when I open the xml file and press refrah nothing is updated.
What is my problem?
public void updateObjType(String newTxt, int x) throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
System.out.println("String value : " + newTxt);
System.out.println("Index value : " + x);
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse("xmlFiles/CoreDatamodel.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
// Go thru the Object_types in the XML file and get item x.
NodeList nodeList = (NodeList) xPath.compile("//OBJECT_TYPE/text()")
.evaluate(xmlDocument, XPathConstants.NODESET);
// Set new NodeValue
nodeList.item(x).setNodeValue(newTxt);
String value = nodeList.item(x).getTextContent();
System.out.println(value);
}
this is the output from the console :
Original data : IF150Data
Incoming String value : Data
Index value : 4
updated data : Data

I solved it by using a transformer.
Full solution :
// Update the object type name from the object type list.
public void updateObjType(String newTxt, int x)
throws ParserConfigurationException, SAXException, IOException,
XPathExpressionException {
File file = new File("xmlFiles/CoreDatamodel.xml");
System.out.println("Incoming String value : " + newTxt);
System.out.println("Index value : " + x);
DocumentBuilderFactory builderFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xPath.compile("//OBJECT_TYPE/text()")
.evaluate(xmlDocument, XPathConstants.NODESET);
// Set new NodeValue
nodeList.item(x).setNodeValue(newTxt);
// Save the new updates
try {
save(file, xmlDocument);
} catch (Exception e) {
e.printStackTrace();
}
}
And then the method I added :
public void save(File file, Document doc) throws Exception {
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
String s = writer.toString();
System.out.println(s);
FileWriter fileWriter = new FileWriter(file);
BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
bufferedWriter.write(s);
bufferedWriter.flush();
bufferedWriter.close();
}

How to retrieve XML including tags using the DOM parser

I am using org.w3c.dom to parse an XML file. Then I need to return the ENTIRE XML for a specific node including the tags, not just the values of the tags. I'm using the NodeList because I need to count how many records are in the file. But I also need to read the file wholesale from the beginning and then write it out to a new XML file. But my current code only prints the value of the node, but not the node itself. I'm stumped.
public static void main(String[] args) {
try {
File fXmlFile = new File (args[0]);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
NodeList listOfRecords = doc.getElementsByTagName("record");
int totalRecords = listOfRecords.getLength();
System.out.println("Total number of records : " + totalRecords);
int amountToSplice = queryUser();
for (int i = 0; i < amountToSplice; i++) {
String stringNode = listOfRecords.item(i).getTextContent();
System.out.println(stringNode);
}
} catch (Exception e) {
e.printStackTrace();
}
}

getTextContent() will only "return the text content of this node and its descendants" i.e. you only get the content of the 'text' type nodes. When parsing XML it's good to remember there are several different types of node, see XML DOM Node Types.
To do what you want, you could create a utility method like this...
public static String nodeToString(Node node)
{
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.setOutputProperty(OutputKeys.INDENT, "yes");
StringWriter sw = new StringWriter();
t.transform(new DOMSource(node), new StreamResult(sw));
return sw.toString();
}
Then loop and print like this...
for (int i = 0; i < amountToSplice; i++)
System.out.println(nodeToString(listOfRecords.item(i)));

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Include delimiter when performing substring operation - java

Related

How to modify values of multiple XML tags?

Pretty print XML in java 8

Need of transformer function

updated xml data not updated in the xml file

How to retrieve XML including tags using the DOM parser

Categories

Resources