Avoid repeated instantiation of InputSource with XPath in Java - java

Currently I am parsing XML messages with XPath Expression. It works very well. However I have the following problem:
I am parsing the whole data of the XML, so I instantiate a new InputSource for every call made to xPath.evaluate.
StringReader xmlReader = new StringReader(xml);
InputSource source = new InputSource(xmlReader);
XPathExpression xpe = xpath.compile("msg/element/@attribute");
String attribute = (String) xpe.evaluate(source, XPathConstants.STRING);
Now I would like to go deeper into my XML message and evaluate more information. For this I found myself in the need to instantiate source another time. Is this required? If I don't do it, I get Stream closed Exceptions.

Parse the XML to a DOM and keep a reference to the node(s). Example:
XPath xpath = XPathFactory.newInstance()
.newXPath();
InputSource xml = new InputSource(new StringReader("<xml foo='bar' />"));
Node root = (Node) xpath.evaluate("/", xml, XPathConstants.NODE);
System.out.println(xpath.evaluate("/xml/@foo", root));
This avoids parsing the string more than once.
If you must reuse the InputSource for a different XML string, you can probably use the setters with a different reader instance.
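To make the "parse once, query many times" idea concrete, here is a minimal sketch using only the JDK. The class name, the helper methods, and the sample message are assumptions made for illustration; only the `msg/element` structure is taken from the question.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ParseOnceExample {

    // Parses the XML string a single time and returns the document node.
    static Node parse(XPath xpath, String xml) {
        try {
            InputSource source = new InputSource(new StringReader(xml));
            return (Node) xpath.evaluate("/", source, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Evaluates an expression against an already parsed node; no stream is consumed here.
    static String read(XPath xpath, String expression, Node context) {
        try {
            return xpath.evaluate(expression, context);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Hypothetical message, loosely modelled on the question's msg/element structure.
        String xml = "<msg><element attribute='a1'><child>text</child></element></msg>";
        Node root = parse(xpath, xml);

        // Both lookups reuse the same DOM, so the reader is only consumed once.
        System.out.println(read(xpath, "/msg/element/@attribute", root)); // prints "a1"
        System.out.println(read(xpath, "/msg/element/child", root));      // prints "text"
    }
}
```

Because the second and later lookups run against the DOM node rather than the InputSource, the "Stream closed" problem from the question should not occur.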

Related

XML with different namespaces drilling down to needed value

I am trying to figure out how to go about getting the value of jxdm:ID from the following XML file:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<My:Message
xmlns:Abcd="http://...."
xmlns:box-1="http://...."
xmlns:bulb="http://...."
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xsi:schemaLocation="http://....stores.xsd">
<Abcd:StoreDataSection>
<Abcd:DataSection>
<Abcd:FirstStore>
<box-1:Response>
<box-1:DataSection>
<box-1:Release>
<box-1:Activity>
<bulb:Date>2017-04-29</bulb:Date>
<bulb:Store xsi:type="TPIR:Organization">
<bulb:StoreID>
<bulb:ID>D79G2102</bulb:ID>
</bulb:StoreID>
</bulb:Store>
</box-1:Activity>
</box-1:Release>
</box-1:DataSection>
</box-1:Response>
</Abcd:FirstStore>
</Abcd:DataSection>
</Abcd:StoreDataSection>
</ My:Message>
I keep getting "null" as the value of node
Node node = (Node) xPath.evaluate(expression, document, XPathConstants.NODE);
This is my current Java code:
try {
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document document = builder.parse(new File("c:/temp/testingNamespace.xml"));
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "//My/Message//Abcd/StoreDataSection/DataSection/FirstStore//box-1/Response/DataSection/Release/Activity//bulb/Store/StoreID/ID";
Node node = (Node) xPath.evaluate(expression, document, XPathConstants.NODE);
node.setTextContent("changed ID");
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(document), new StreamResult(new File("C:/temp/test-updated.xml")));
} catch (Exception e) {
System.out.println(e.getMessage());
}
How would the correct XPath be formatted in order for me to get that value and change it?
Update 1
So something like this?
String expression = "/My:Message/Abcd:StoreDataSection/Abcd:DataSection/Abcd:FirstStore/box-1:Response/box-1:DataSection/box-1:Release/box-1:Activity/bulb:Store/bulb:StoreID/bulb:ID";
The problem is that you should address nodes by prefix (if you want to), but with a different syntax, for example //bulb:StoreID if you want to access StoreID.
Even then it would still not work, because you need to tell XPath how to resolve namespace prefixes.
You should check this answer : How to query XML using namespaces in Java with XPath?
for details on how to implement and use a NamespaceContext.
The bottom line is that you need to implement a javax.xml.namespace.NamespaceContext and set it to the XPath.
XPath xpath = XPathFactory.newInstance().newXPath();
NamespaceContext context = new MyNamespaceContext();
xpath.setNamespaceContext(context);
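The answer does not show what `MyNamespaceContext` looks like, so here is one possible map-backed sketch. The class name `MapNamespaceContext`, the URI `urn:example:bulb`, and the shortened sample XML are assumptions; the real namespace URIs are elided in the question and must be taken from the actual document.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Simplified NamespaceContext backed by a prefix -> URI map.
// It does not special-case the reserved "xml" and "xmlns" prefixes.
class MapNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri;

    MapNamespaceContext(Map<String, String> prefixToUri) {
        this.prefixToUri = prefixToUri;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = prefixToUri.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) {
                return entry.getKey();
            }
        }
        return null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        String prefix = getPrefix(namespaceURI);
        return prefix == null
                ? Collections.<String>emptyIterator()
                : Collections.singleton(prefix).iterator();
    }

    // Looks up the store ID in a namespace-aware DOM using the prefix "b".
    static String firstStoreId(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so elements carry their namespace URI
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // "urn:example:bulb" is a placeholder URI, not the one from the question.
            xpath.setNamespaceContext(new MapNamespaceContext(
                    Collections.singletonMap("b", "urn:example:bulb")));
            return xpath.evaluate("//b:StoreID/b:ID", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath lookup failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<s:Store xmlns:s='urn:example:bulb'>"
                + "<s:StoreID><s:ID>D79G2102</s:ID></s:StoreID></s:Store>";
        System.out.println(firstStoreId(xml)); // prints "D79G2102"
    }
}
```

Note that the prefix used in the XPath (`b`) does not have to match the prefix used in the document (`s`); only the namespace URI has to agree.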
Two things wrong here:
Your XML is not namespace-well-formed; it does not declare the used namespace prefixes.
Once namespace prefixes are properly declared in the XML and in your Java code, you use them in XPath via : not via /. So, it'd be not /Abcd/StoreDataSection but rather /Abcd:StoreDataSection (and so on for the rest of the steps in your XPath).
See also How does XPath deal with XML namespaces?
I am unable to change anything in the XML so I have to go with it as-is sadly.
Technically you might be able to use some XML tools with undeclared namespaces, because this omission renders the XML only namespace-not-well-formed. Many tools, however, expect not only well-formed but also namespace-well-formed XML. (See Namespace-Well-Formed for the difference.)
Otherwise, see How to parse invalid (bad / not well-formed) XML? to repair your XML.

How to convert xpath to java code

I have the XPath of an element and need to write Java code that gives me exactly that element as an object. I believe I need to use SAX or DOM? I am a total newbie.
xpath :
/*[local-name(.)='feed']/*[local-name(.)='entry']/*[local-name(.)='title']
Your comment suggests you want to use DOM4J, which supports XPath out of the box:
SAXReader reader = new SAXReader();
Document doc = reader.read(new File(....)); // or URL, or wherever the XML comes from
Node selectedNode = doc.selectSingleNode("/*[local-name(.)='feed']/*[local-name(.)='entry']/*[local-name(.)='title']");
(or there's also selectNodes which returns a List, if there might be more than one node matching that XPath expression - quite likely if this is an Atom feed).
But rather than using the local-name hack like this, if you know the namespace URI of the elements in your XML you can declare a prefix for this namespace and select the nodes by their fully qualified name:
SAXReader reader = new SAXReader();
Map<String, String> namespaces = new HashMap<>();
namespaces.put("atom", "http://www.w3.org/2005/Atom");
reader.getDocumentFactory().setXPathNamespaceURIs(namespaces);
Document doc = reader.read(new File(....)); // or URL, or wherever the XML comes from
List selectedNodes = doc.selectNodes("/atom:feed/atom:entry/atom:title");
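If adding DOM4J is not an option, the XPath from the question can also be evaluated with the JDK's built-in `javax.xml.xpath` API. The following is a sketch under assumptions: the class name, the helper method, and the tiny Atom-like sample feed are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LocalNameXPathExample {

    // Returns the text of every feed/entry/title element, ignoring namespaces.
    static List<String> titles(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/*[local-name(.)='feed']/*[local-name(.)='entry']/*[local-name(.)='title']",
                    doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical feed; the feed-level title is not under an entry, so it is skipped.
        String xml = "<feed xmlns='http://www.w3.org/2005/Atom'>"
                + "<title>Feed</title>"
                + "<entry><title>First</title></entry>"
                + "<entry><title>Second</title></entry>"
                + "</feed>";
        System.out.println(titles(xml)); // prints "[First, Second]"
    }
}
```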
Read here:
https://howtodoinjava.com/java/xml/java-xpath-tutorial-example/
I found it while searching for how to convert an XPath PMD rule to a Java rule. It did not have what I needed,
but maybe you can find what you need in it.

How to get value within an xml tag using java code

I have a String variable in java with xml tags as its value:
eg: String xml="<root><name>abcd</name><age>22</age><gender>male</gender></root>";
Now I need to get the value within the name tag i.e "abcd" from this variable and store the value in another string variable. How to go about this using java. Can anyone please help me out with this?
It is not quite clear what you want, but I think what you will need is something to read an XML document (as a file or directly as a string), an XML parser.
There is a whole list (and many more) of different XML parsers you can use for this:
JDOM
Woodstox
XOM
dom4j
VTD-XML
Xerces-J
Crimson
I would recommend dom4j for its ease of use. Here is an example of a dom4j implementation:
String xmlPath = "myXmlDocument.xml";
SAXReader reader = new SAXReader();
Document document = reader.read(xmlPath);
Element rootElement = document.getRootElement();
System.out.println("Root Element: "+rootElement.getName());
You can directly feed in a String to be parsed to an XML document too:
String xmlString = "<name>Hello</name>";
// DocumentHelper.parseText parses the string directly, so no SAXReader is needed here.
Document document = DocumentHelper.parseText(xmlString);
Element rootElement = document.getRootElement();
System.out.println("Root Element: "+rootElement.getName());
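The snippets above stop at the root element. To actually pull "abcd" out of the name tag without an extra library, a JDK-only sketch could look like the following; the class and method names are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XmlValueExample {

    // Returns the string value of the first node matching the expression,
    // or an empty string when nothing matches.
    static String valueOf(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><name>abcd</name><age>22</age><gender>male</gender></root>";
        String name = valueOf(xml, "/root/name");
        System.out.println(name); // prints "abcd"
    }
}
```

This re-parses the string on every call, which is fine for a single lookup; for several values from the same document it is cheaper to parse once and query the resulting DOM.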
References
Best XML parser for Java
http://dom4j.sourceforge.net/dom4j-1.6.1/faq.html#from-string

Xpath returns empty element object

I have an XML document as a string without any namespace, and I want to parse it using Java, JDOM and XPath and create an object tree. Since XPath always seems to require a prefix and a namespace to query, I added a namespace and a prefix to the root and later to the node I want to get, but I see XPath requires a namespace on every node in the document, not only on the root.
So is there a way, at the start, to add the namespace to all of the elements in the document object so that my XPath query works correctly?
There are probably other mistakes and bad approaches in the code as well. I will be glad for any ideas.
String response = "myXmlString";
ByteArrayInputStream stream = new ByteArrayInputStream(
response.getBytes());
SAXBuilder builder = new SAXBuilder();
Document doc = builder.build(stream);
org.jdom.Element request=(org.jdom.Element) doc.getRootElement();
request.setNamespace(Namespace.getNamespace("myNamespace"));
createRequest(request);
And then
public Request createRequest(Element requestXML) {
Request request = new Request();
requestXML.detach();
Document doc = new Document(requestXML);
XPath xpath = XPath.newInstance(myExpression);
xpath.addNamespace("m", doc.getRootElement().getNamespaceURI());
xpath.selectSingleNode(doc);
}
This last line returns an empty result; it is not null, but a JDOM exception is thrown inside.
XPath and XML do NOT require namespaces. Go back to your original XML and remove any namespace/prefix hackery in your code.

how to use Pattern matcher in java?

lets say the string is <title>xyz</title>
I want to extract the xyz out of the string.
I used:
Pattern titlePattern = Pattern.compile("&lttitle&gt\\s*(.+?)\\s*&lt/title&gt");
Matcher titleMatcher = titlePattern.matcher(line);
String title = titleMatcher.group(1);
but I am getting an error for titlePattern.matcher(line);
You say your error occurs earlier (what is the actual error? It runs without an error for me), but after solving that you will need to call find() on the matcher once to actually search for the pattern:
if(titleMatcher.find()){
String title = titleMatcher.group(1);
}
Note that if you really match against a string with non-escaped HTML entities like
<title>xyz</title>
Then your regular expression will have to use these, not the escaped entities:
"<title>\\s*(.+?)\\s*</title>"
Also, you should be careful about how far you try to get with this, as you can't really parse HTML or XML with regular expressions. If you are working with XML, it's much easier to use an XML parser, e.g. JDOM.
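For simple, well-controlled input where a regular expression is still acceptable, the two points above (calling find() first and matching the literal tags) can be combined as in this minimal sketch. The class and method names are assumptions made for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleRegexExample {
    // Matches the literal, unescaped tags and captures the trimmed text between them.
    private static final Pattern TITLE = Pattern.compile("<title>\\s*(.+?)\\s*</title>");

    // Returns the captured title, or an empty Optional when the line has no match.
    static Optional<String> extractTitle(String line) {
        Matcher matcher = TITLE.matcher(line);
        // group(1) is only valid after a successful find() (or matches()).
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractTitle("<title> xyz </title>").orElse("(no title)")); // prints "xyz"
        System.out.println(extractTitle("no markup here").orElse("(no title)"));       // prints "(no title)"
    }
}
```

Calling group(1) without a prior successful find() throws an IllegalStateException, which is why the check comes first.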
Not technically an answer but you shouldn't be using regular expressions to parse HTML. You can try and you can get away with it for simple tasks but HTML can get ugly. There are a number of Java libraries that can parse HTML/XML just fine. If you're going to be working a lot with HTML/XML it would be worth your time to learn them.
As others have suggested, it's probably not a good idea to parse HTML/XML with regex. You can parse XML Documents with the standard java API, but I don't recommend it. As Fabian Steeg already answered, it's probably better to use JDOM or a similar open source library for parsing XML.
With javax.xml.parsers you can do the following:
String xml = "<title>abc</title>";
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse(new InputSource(new StringReader(xml)));
NodeList nodeList = doc.getElementsByTagName("title");
String title = nodeList.item(0).getTextContent();
This parses your XML string into a Document object which you can use for further lookups. The API is kinda horrible though.
Another way is to use XPath for the lookup:
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xPath = xpathFactory.newXPath();
String titleByXpath = xPath.evaluate("/title/text()", new InputSource(new StringReader(xml)));
// or use the Document for lookup
String titleFromDomByXpath = xPath.evaluate("/title/text()", doc);
