How to get unique value of repeated nodes using dom parser

I have an XML document with repeated nodes, and I have to parse it using a DOM parser. After a lot of research I couldn't find anything on the internet that helps me. My XML looks like this:
<nos1>
<Name>aqwer</Name>
<class>sas</class>
<class>xcd</class>
<class>asd</class>
<Name>cfg</Name>
<Name>cfg</Name>
</nos1>
Any suggestions on how I can parse this XML to get the unique values of the repeated nodes?

You can use a W3C DOM Document to parse your XML as follows:
DocumentBuilderFactory df = DocumentBuilderFactory.newInstance();
try
{
DocumentBuilder db = df.newDocumentBuilder();
InputStream is = new ByteArrayInputStream(response.getContent().getBytes("UTF-8"));
org.w3c.dom.Document doc = db.parse(is);
NodeList links = doc.getElementsByTagName("class");
for(int i=0; i< links.getLength(); i++)
{
Node link = links.item(i);
System.out.println(link.getTextContent());
}
}
catch(Exception ex)
{
ex.printStackTrace(); // report parse errors instead of swallowing them silently
}
Hope this helps you.

You should read all elements and after reading eliminate the duplicates via a Set. Here is an example using XMLBeam, but any other library will do.
public class TestMultipleElements {
@XBDocURL("resource://test.xml")
public interface Projection {
@XBRead("/nos1/Name")
List<String> getNames();
@XBRead("/nos1/class")
List<String> getClasses();
}
@Test
public void uniqueElements() throws IOException {
Projection projection = new XBProjector().io().fromURLAnnotation(Projection.class);
for (String name : new HashSet<String>(projection.getNames())) {
System.out.println("Found Name:" + name);
}
for (String clazz : new HashSet<String>(projection.getClasses())) {
System.out.println("Found Name:" + clazz);
}
}
}
This prints out:
Found Name:aqwer
Found Name:cfg
Found Name:xcd
Found Name:sas
Found Name:asd
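The same idea can also be sketched with the plain DOM parser from the first answer, without any extra library. This is only a minimal sketch: the class name UniqueNodeValues and the helper uniqueValues are made up for illustration, the XML is inlined as a string instead of being read from a response or file, and a LinkedHashSet is used (my choice, not part of either answer) so that the values keep the order in which they first appear.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the original answers.
final class UniqueNodeValues {

    // Returns the distinct text values of all elements named tagName,
    // in the order in which they first appear in the document.
    static Set<String> uniqueValues(String xml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            Set<String> values = new LinkedHashSet<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // A Set ignores values that were already added.
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<nos1><Name>aqwer</Name><class>sas</class><class>xcd</class>"
                + "<class>asd</class><Name>cfg</Name><Name>cfg</Name></nos1>";
        System.out.println(uniqueValues(xml, "Name"));
        System.out.println(uniqueValues(xml, "class"));
    }
}
```

A HashSet would work just as well if the order of the results does not matter.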

Related

Java How Compare XML Data with file extensions

I'm new here and hoping to get some help with my problem.
I have an XML file, and I would like to compare the strings in it with a file extension. For example, given Example.txt, I want to compare every string in the XML with the file's extension.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href"tx ?>
<zip>
<exclusions>
<switch> .bak </switch>
<switch> .tmp </switch>
<switch> .frm </switch>
<switch> .opt </switch>
<switch> .met </switch>
<switch> .i </switch>
</exclusions>
</zip>
This is my code to print the XML. My idea was to store all the strings in an array and compare them with my extension, but I don't know how.
I hope you have some ideas for me.
Thanks
public class xmlFileExten {
public static void main(String[] args) {
try {
File file = new File(xmlFile);
DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance()
.newDocumentBuilder();
Document doc = dBuilder.parse(file);
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
if (doc.hasChildNodes()) {
printNote(doc.getChildNodes());
}
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
private static void printNote(NodeList nodeList) {
for (int count = 0; count < nodeList.getLength(); count++) {
Node tempNode = nodeList.item(count);
if (tempNode.getNodeType() == Node.ELEMENT_NODE) {
System.out.println("Node Value =" + tempNode.getTextContent());
}
}
}
}

You can use the following code. Main changes:
1) It returns a List instead of an array.
2) It checks for text nodes and uses getNodeValue() instead of getTextContent() (getNodeValue() returns only the text of that node itself).
3) It uses a recursive function.
public class xmlFileExten
{
public static void main(final String[] args)
{
final List<String> extensionList = getExtensionList("1.xml");
System.out.print(extensionList); // prints [.bak, .tmp, .frm, .opt, .met, .i]
}
private static List<String> getExtensionList(final String fileName)
{
final List<String> results = new ArrayList<>();
try
{
final File file = new File(fileName);
final DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance()
.newDocumentBuilder();
final Document doc = dBuilder.parse(file);
if (doc.hasChildNodes())
{
results.addAll(getExtensionList(doc.getChildNodes()));
}
}
catch (final Exception e)
{
System.out.println(e.getMessage());
}
return results;
}
private static List<String> getExtensionList(final NodeList nodeList)
{
final List<String> results = new ArrayList<>();
for (int count = 0; count < nodeList.getLength(); count++)
{
final Node tempNode = nodeList.item(count);
final String value = tempNode.getNodeValue();
if (tempNode.getNodeType() == Node.TEXT_NODE && value != null && !value.trim().isEmpty())
{
results.add(value.trim());
}
results.addAll(getExtensionList(tempNode.getChildNodes()));
}
return results;
}
}

I think the main problem here is that you are not able to parse the file properly. Refer to "Parse XML TO JAVA POJO in efficient way",
and you can use http://pojo.sodhanalibrary.com/ to generate the POJO classes required for your task.
Once you have the POJOs, you can compare the extensions.
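None of the answers shows the comparison step itself, so here is a minimal sketch of it. The class ExtensionFilter and the method isExcluded are hypothetical names, the list is hard-coded where the result of getExtensionList("1.xml") would normally go, and the case-insensitive match is an assumption about what is wanted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the original answers.
final class ExtensionFilter {

    // Returns true if fileName ends with one of the excluded extensions.
    // Matching is case-insensitive (an assumption); blank entries are ignored.
    static boolean isExcluded(String fileName, List<String> excludedExtensions) {
        String lowerName = fileName.toLowerCase(Locale.ROOT);
        for (String extension : excludedExtensions) {
            String ext = extension.trim().toLowerCase(Locale.ROOT);
            if (!ext.isEmpty() && lowerName.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for the list read from the XML file.
        List<String> exclusions = Arrays.asList(".bak", ".tmp", ".frm", ".opt", ".met", ".i");
        System.out.println("Example.txt excluded: " + isExcluded("Example.txt", exclusions));
        System.out.println("backup.BAK excluded: " + isExcluded("backup.BAK", exclusions));
    }
}
```

The trim() call matters because the <switch> values in the XML are padded with spaces.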

Java XML DOM: Why does Java XML Attr.isId() return false when it should return true

I have an xsd file with a section like this:
<xsd:complexType name="tDefinitions">
<xsd:attribute name="id" type="xsd:ID" use="optional"/>
I am creating a DocumentBuilder like this:
private final DocumentBuilderFactory _docBldF = DocumentBuilderFactory.newInstance();
....
public synchronized void addSchema(String schemaUri) {
if(!_schemaUris.contains(schemaUri)){
_schemaUris.add(schemaUri);
}
Source[] allSources = new Source[_schemaUris.size()];
for (int i = 0; i < allSources.length; i++) {
allSources[i] = new StreamSource(_schemaUris.get(i));
}
Schema schema;
SchemaFactory scFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
try {
schema = scFactory.newSchema(allSources);
} catch (SAXException e) {
throw new RuntimeException(e);
}
_docBldF.setSchema(schema);
_docBldF.setNamespaceAware(true);
}
....
exec.addSchema(getClass().getResource("model/xsd/some.xsd").toURI().toString());
and here's how I load the XML:
DocumentBuilder dBuilder = _docBldF.newDocumentBuilder();
Document document = dBuilder.parse(xml);
document.normalizeDocument();
then I am walking around this doc:
definitionsElement = document.getDocumentElement();
NodeList defNodes = definitionsElement.getChildNodes();
.....
And now I want to know the ID of an element:
NamedNodeMap attrs = element.getAttributes();
String id = null;
for (int i = 0; i < attrs.getLength(); i++) {
Attr attr = (Attr) attrs.item(i);
if (attr.isId()) {
id = attr.getValue();
break;
}
}
isId() returns false even though in the debugger I see:
attr.getSchemaTypeInfo().getTypeName().equals("ID") // true
and other traces that the schema was actually loaded.
Please advise what I am missing, as I think I am doing everything I should in the correct order: loading the XSD, setting namespace awareness, normalizing the document.
The XSD has multiple imports and includes and still seems to be loaded correctly with all of them.
addSchema is called only once for now, but it is intended to be called multiple times with different XSD files, so that the DocumentBuilder can load XML files of any format (schema) that we have added. I would expect it might corrupt the DocumentBuilder when called a second time, but for now it is called only once.
Update, just for fun:
_document.getElementById("some_id").getAttributeNode("id").isId() // returns false
The parser is able to locate an element by its id, but can't tell that the id attribute is an ID. With my level of knowledge I would have considered this impossible, so please help.
As a workaround, I am going to check whether
attr.getSchemaTypeInfo().getTypeName().equals("ID")
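For reference, the workaround could be sketched roughly as follows. This is only an illustration under several assumptions: the class IdAttributeLookup, its methods, and the tiny inline schema and document are invented for the example, and normalizeDocument() is deliberately not called. Comparing only the type name also ignores the type's namespace and any types derived from xsd:ID, so it is a stopgap rather than a full answer to why isId() returns false.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.TypeInfo;
import org.xml.sax.InputSource;

// Hypothetical helper class illustrating the workaround from the question.
final class IdAttributeLookup {

    // Invented sample data, much smaller than the real XSD in the question.
    static final String SAMPLE_XSD =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
          + "<xsd:element name='definitions'><xsd:complexType>"
          + "<xsd:attribute name='id' type='xsd:ID' use='optional'/>"
          + "</xsd:complexType></xsd:element></xsd:schema>";
    static final String SAMPLE_XML = "<definitions id='some_id'/>";

    // Returns the value of the first attribute that is flagged as an ID or whose
    // schema type is named "ID"; returns null if the element has no such attribute.
    static String findId(Element element) {
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            TypeInfo type = attr.getSchemaTypeInfo();
            boolean typedAsId = type != null && "ID".equals(type.getTypeName());
            if (attr.isId() || typedAsId) {
                return attr.getValue();
            }
        }
        return null;
    }

    // Parses xml with a namespace-aware, schema-validating DocumentBuilder.
    static Document parseWithSchema(String xsd, String xml) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setSchema(schema);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML against schema", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseWithSchema(SAMPLE_XSD, SAMPLE_XML);
        System.out.println(findId(doc.getDocumentElement()));
    }
}
```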

How to make XML Parser aware of all Character Entity References?

I get arbitrary XML from a server and parse it using this Java code:
String xmlStr; // arbitrary XML input
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xmlStr));
return builder.parse(is);
}
catch (SAXException | IOException | ParserConfigurationException e) {
LOGGER.error("Failed to parse XML.", e);
}
Every once in a while, the XML input contains an unknown entity reference such as &nbsp; and parsing fails with an error such as org.xml.sax.SAXParseException: The entity "nbsp" was referenced, but not declared.
I could solve this problem by preprocessing the original xmlStr and translating all problematic entity references before parsing. Here's a dummy implementation that works:
protected static String translateEntityReferences(String xml) {
String newXml = xml;
Map<String, String> entityRefs = new HashMap<>();
entityRefs.put("&nbsp;", "&#160;");
entityRefs.put("&laquo;", "&#171;");
entityRefs.put("&raquo;", "&#187;");
// ... and 250 more...
for(Entry<String, String> er : entityRefs.entrySet()) {
newXml = newXml.replace(er.getKey(), er.getValue());
}
return newXml;
}
However, this is really unsatisfactory, because there are a huge number of entity references, and I don't want to hard-code all of them into my Java class.
Is there any easy way of teaching this entire list of character entity references to the DocumentBuilder?

If you can change your code to work with StAX instead of DOM, the trivial solution is to use the XMLInputFactory property IS_REPLACING_ENTITY_REFERENCES set to false.
public static void main(String[] args) throws Exception
{
String doc = "<doc>&nbsp;</doc>";
ByteArrayInputStream is = new ByteArrayInputStream(doc.getBytes());
XMLInputFactory xif = XMLInputFactory.newFactory();
xif.setProperty(javax.xml.stream.XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
XMLStreamReader xr = xif.createXMLStreamReader(is);
while(xr.hasNext())
{
int t = xr.getEventType();
switch(t) {
case XMLEvent.ENTITY_REFERENCE:
System.out.println("Entity: "+ xr.getLocalName());
break;
case XMLEvent.START_DOCUMENT:
System.out.println("Start Document");
break;
case XMLEvent.START_ELEMENT:
System.out.println("Start Element: " + xr.getLocalName());
break;
case XMLEvent.END_DOCUMENT:
System.out.println("End Document");
break;
case XMLEvent.END_ELEMENT:
System.out.println("End Element: " + xr.getLocalName());
break;
default:
System.out.println("Other: ");
break;
}
xr.next();
}
}
Output:
Start Document
Start Element: doc
Entity: nbsp null
End Element: doc
But that may require too much rewrite in your code if you really need the full DOM tree in memory.
I spent an hour tracing through the DOM implementation and couldn't find any way to make the DOM parser read from an XMLStreamReader.
Also there is evidence in the code that the internal DOM parser implementation has an option similar to IS_REPLACING_ENTITY_REFERENCES but I couldn't find any way to set it from the outside.
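If the full DOM tree is needed, one alternative that is not part of the answer above is to declare the entities in an internal DTD subset that is prepended to the input before parsing. The sketch below is only illustrative: the class EntityAwareParser and its parse method are invented names, only three entities are declared, the root element name is passed in by the caller, and the input is assumed to have neither an XML declaration nor a DOCTYPE of its own. It also assumes the parser configuration allows DOCTYPE declarations, which hardened setups often disable.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
final class EntityAwareParser {

    // Illustrative subset only; a complete list would have to be generated
    // from the published HTML entity tables.
    private static final String ENTITY_DECLARATIONS =
            "<!ENTITY nbsp \"&#160;\">"
          + "<!ENTITY laquo \"&#171;\">"
          + "<!ENTITY raquo \"&#187;\">";

    // Prepends a DOCTYPE whose internal subset declares the entities, then parses.
    // Assumes xmlBody has no XML declaration and no DOCTYPE of its own.
    static Document parse(String xmlBody, String rootName) {
        String withDoctype =
                "<!DOCTYPE " + rootName + " [" + ENTITY_DECLARATIONS + "]>" + xmlBody;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(withDoctype)));
        } catch (Exception e) {
            throw new IllegalStateException("Failed to parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<doc>a&nbsp;b</doc>", "doc");
        String text = doc.getDocumentElement().getTextContent();
        // The entity reference has been replaced by a non-breaking space (U+00A0).
        System.out.println(text.equals("a\u00A0b"));
    }
}
```

Unlike string replacement on the raw input, this leaves the entity expansion to the parser, so the declarations only need to be maintained in one place.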

Saxon XPathEvaluator only returns the first result

I am trying to run some XPath queries against an XML document using the Saxon 9.5 HE Java library. I created the query with net.sf.saxon.xpath.XPathEvaluator and wanted to get all book titles out of an XML document. Unfortunately, I only get the first title. Here is my sample code:
public static void main(String[] args)
{
try
{
InputSource is = new InputSource(new File("books.xml").toURI().toURL().toString());
String x = new XPathEvaluator().evaluate("//book/title", is);
System.out.println(x);
}
catch(Exception e)
{
e.printStackTrace();
}
}
Thanks and kind regards :)

I just implemented a solution with the XPathCompiler, and it works fine. If you are interested, here is the source code:
public static void main(String[] args)
{
try
{
Processor proc = new Processor(false);
DocumentBuilder builder = proc.newDocumentBuilder();
XPathCompiler xpc = proc.newXPathCompiler();
XPathSelector selector = xpc.compile("//book/title").load();
selector.setContextItem(builder.build(new File("books.xml")));
for (XdmItem item: selector)
{
System.out.println(item.getStringValue());
}
}
catch(Exception e)
{
e.printStackTrace();
}
}

Which XPathEvaluator is that, the JAXP one? I think that implements XPath 1.0 semantics, where converting a node set to a string yields the string value of the first selected node only. You would either need to evaluate to a node set, or you could use the XPath "string-join(//book/title, ', ')".
If you want to have a sequence of string values you could use http://www.saxonica.com/html/documentation/javadoc/net/sf/saxon/sxpath/XPathEvaluator.html although that documentation suggests the preferred way is to use s9api with http://www.saxonica.com/html/documentation/javadoc/net/sf/saxon/s9api/XPathCompiler.html.
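To illustrate the difference between the two result types, here is a small sketch that uses the XPath 1.0 implementation bundled with the JDK rather than Saxon (that substitution, the class name AllTitles and the inline sample document are my assumptions, not part of the answers). Evaluating to a string gives only the first title, while asking for XPathConstants.NODESET gives all of them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class using the JDK's JAXP XPath API, not Saxon's s9api.
final class AllTitles {

    // Invented sample document standing in for books.xml.
    static final String BOOKS =
            "<books><book><title>First</title></book>"
          + "<book><title>Second</title></book></books>";

    // String evaluation: XPath 1.0 converts the node set to the string value
    // of its first node, so only one title comes back.
    static String firstTitleOnly(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("//book/title", new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Node-set evaluation: every matching title element is returned.
    static List<String> allTitles(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//book/title", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstTitleOnly(BOOKS));
        System.out.println(allTitles(BOOKS));
    }
}
```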

What's wrong with this Java XML-Parsing code?

I'm trying to parse an XML file and be able to insert a path and get the value of the field.
It looks as follows:
import java.io.IOException;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
public class XMLConfigManager {
private Element config = null;
public XMLConfigManager(String file) {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
Document domTree;
DocumentBuilder db = dbf.newDocumentBuilder();
domTree = db.parse(file);
config = domTree.getDocumentElement();
}
catch (IllegalArgumentException iae) {
iae.printStackTrace();
}
catch (ParserConfigurationException pce) {
pce.printStackTrace();
}
catch (SAXException se) {
se.printStackTrace();
}
catch (IOException ioe) {
ioe.printStackTrace();
}
}
public String getStringValue(String path) {
String[] pathArray = path.split("\\|");
Element tempElement = config;
NodeList tempNodeList = null;
for (int i = 0; i < pathArray.length; i++) {
if (i == 0) {
if (tempElement.getNodeName().equals(pathArray[0])) {
System.out.println("First element is correct, do nothing here (just in next step)");
}
else {
return "**This node does not exist**";
}
}
else {
tempNodeList = tempElement.getChildNodes();
tempElement = getChildElement(pathArray[i],tempNodeList);
}
}
return tempElement.getNodeValue();
}
private Element getChildElement(String identifier, NodeList nl) {
String tempNodeName = null;
for (int i = 0; i < nl.getLength(); i++) {
tempNodeName = nl.item(i).getNodeName();
if (tempNodeName.equals(identifier)) {
Element returner = (Element)nl.item(i).getChildNodes();
return returner;
}
}
return null;
}
}
The XML looks like this (for test purposes):
<?xml version="1.0" encoding="UTF-8"?>
<amc>
<controller>
<someOtherTest>bla</someOtherTest>
<general>
<spam>This is test return String</spam>
<interval>1000</interval>
</general>
</controller>
<agent>
<name>test</name>
<ifc>ifcTest</ifc>
</agent>
</amc>
Now I can call the class like this
XMLConfigManager xmlcm = new XMLConfigManager("myConfig.xml");
System.out.println(xmlcm.getStringValue("amc|controller|general|spam"));
Here, I'm expecting the value of the tag spam, so this would be "This is test return String". But I'm getting null.
I've tried to fix this for days now and I just can't get it to work. The iteration works, so it gets to the tag spam, but then, as I said, it returns null instead of the text.
Is this a bug, or am I just doing it wrong? Why? :(
Thank you very much for help!
Regards, Flo

You're calling Node.getNodeValue() - which is documented to return null when you call it on an element. You should call getTextContent() instead - or use a higher level API, of course.
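The difference is easy to see in isolation. The following is a minimal sketch (the class NodeValueDemo and the one-element document are invented for the demonstration): getNodeValue() on the element is null, while getTextContent() on the element, or getNodeValue() on its text child, returns the text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original answers.
final class NodeValueDemo {

    // Parses xml and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element spam = parseRoot("<spam>This is test return String</spam>");
        // Element nodes have no node value of their own.
        System.out.println("getNodeValue on element:    " + spam.getNodeValue());
        // getTextContent concatenates the text of all descendant text nodes.
        System.out.println("getTextContent on element:  " + spam.getTextContent());
        // The text lives in a child text node, whose node value is the text itself.
        System.out.println("getNodeValue on text child: " + spam.getFirstChild().getNodeValue());
    }
}
```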

As others mentioned before me, you seem to be reinventing the concept of XPath. You can replace your code with the following:
javax.xml.xpath.XPath xpath = javax.xml.xpath.XPathFactory.newInstance().newXPath();
String expression = "/amc/controller/general/spam";
org.xml.sax.InputSource inputSource = new org.xml.sax.InputSource("myConfig.xml");
String result = xpath.evaluate(expression, inputSource);
See also: XML Validation and XPath Evaluation in J2SE 5.0
EDIT:
An example of extracting a collection with XPath:
NodeList result = (NodeList) xpath.evaluate(expression, inputSource, XPathConstants.NODESET);
for (int i = 0; i < result.getLength(); i++) {
System.out.println(result.item(i).getTextContent());
}
The javax.xml.xpath.XPath interface is documented here, and there are a few more examples in the aforementioned article.
In addition, there are third-party libraries for XML manipulation, which you may find more convenient, such as dom4j (suggested by duffymo) or JDOM. Regardless of which library you use, you can leverage the quite powerful XPath language.

Because you're using getNodeValue() rather than getTextContent().
Doing this by hand is an accident waiting to happen; either use the built-in XPath solutions, or a third-party library as suggested by @duffymo. This is not a situation where re-invention adds value, IMO.

I'd wonder why you're not using a library like dom4j and built-in XPath. You're doing a lot of work with a very low-level API (W3C DOM).
Step through with a debugger and see what children that <spam> node has. You should quickly figure out why it's null. It'll be faster than asking here.
