How to get all attributes from each element separately? - java

Here's some basic xml doc:
<h1>My Heading</h1>
<p align = "center"> My paragraph
<img src="smiley.gif" alt="Smiley face" height="42" width="42"></img>
<img src="sad.gif" alt="Sad face" height="45" width="45"></img>
<img src="funny.gif" alt="Funny face" height="48" width="48"></img>
</p>
<p>My para</p>
What I am trying to do is find each element and all of its attributes, and save the attribute name and attribute value for each element. Here's my code so far:
private Map <String, String> tag = new HashMap <String,String> ();
public Map <String, String> findElement () {
try {
FileReader fRead = new FileReader (sourcePage);
BufferedReader bRead = new BufferedReader (fRead);
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance ();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder ();
Document doc = docBuilder.parse(new FileInputStream (new File (sourcePage)));
XPathFactory xFactory = XPathFactory.newInstance ();
XPath xPath = xFactory.newXPath ();
NodeList nl = (NodeList) xPath.evaluate("//img/@*", doc, XPathConstants.NODESET);
for( int i=0; i<nl.getLength (); i++) {
Attr attr = (Attr) nl.item(i);
String name = attr.getName();
String value = attr.getValue();
tag.put (name,value);
}
bRead.close ();
fRead.close ();
}
catch (Exception e) {
e.printStackTrace();
System.err.println ("An error has occured.");
}
The problem appears when I look for the img elements' attributes, because they all have identical attribute names. A HashMap is not suitable for this, since it overwrites values that share the same key. Maybe I'm using the wrong expression to find all attributes. Is there another way to get the attribute names and values of the nth img element?

First, let's level the field a little. I cleaned up your code a bit to have a compiling starting point. I removed the unnecessary code and fixed the method according to my best guess of what it is supposed to do. I also generalized it a little so that it accepts a tagName parameter. It's still the same code and makes the same mistake, but now it compiles (Java 7 features are used for convenience; switch back to Java 6 if you want). I also split the try-catch into multiple blocks just for the sake of it:
public Map<String, String> getElementAttributesByTagName(String tagName) {
Document document;
try (InputStream input = new FileInputStream(sourcePage)) {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
document = docBuilder.parse(input);
} catch (IOException | ParserConfigurationException | SAXException e) {
throw new RuntimeException(e);
}
NodeList attributeList;
try {
XPath xPath = XPathFactory.newInstance().newXPath();
attributeList = (NodeList)xPath.evaluate("//" + tagName + "/@*", document, XPathConstants.NODESET);
} catch (XPathExpressionException e) {
throw new RuntimeException(e);
}
Map<String, String> tagInfo = new HashMap<>();
for (int i = 0; i < attributeList.getLength(); i++) {
Attr attribute = (Attr)attributeList.item(i);
tagInfo.put(attribute.getName(), attribute.getValue());
}
return tagInfo;
}
When run against your example code above, it returns:
{height=48, alt=Funny face, width=48, src=funny.gif}
The solution depends on your expected output. You want either:
To get the attributes of only one of the <img> elements (say, the first one)
To get a list of all <img> elements and their attributes
For the first solution, it's enough to change your XPath expression to
//descendant::img[1]/@*
or
"//descendant::" + tagName + "[1]/@*"
with the tagName parameter. Beware that this is not the same as //img[1]/@* even though it returns the same element in this particular case.
When changed this way, the method returns:
{height=42, alt=Smiley face, width=42, src=smiley.gif}
which are correctly returned attributes of the first <img> element.
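The difference between the two forms only shows up once the `<img>` elements have different parents. Here is a minimal sketch, assuming a hypothetical document (not the one from the question) and the single-slash form `/descendant::img[1]`, which selects the first `<img>` in document order. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstImgDemo {
    // Hypothetical sample: two <p> elements, each with its own <img> children.
    static final String XML =
            "<body>"
          + "<p><img src='a.gif'/><img src='b.gif'/></p>"
          + "<p><img src='c.gif'/></p>"
          + "</body>";

    // Returns how many nodes the given XPath expression selects in XML.
    static int countMatches(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Every <img> that is the first <img> child of its parent: a.gif and c.gif.
        System.out.println(countMatches("//img[1]"));
        // Only the first <img> in document order: a.gif.
        System.out.println(countMatches("/descendant::img[1]"));
    }
}
```

In other words, `//img[1]` applies the `[1]` predicate per parent, while the `descendant` axis numbers the matches across the whole subtree.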
Note that you don't even have to use an XPath expression for this kind of work. Here's a non-XPath version:
public Map<String, String> getElementAttributesByTagNameNoXPath(String tagName) {
Document document;
try (InputStream input = new FileInputStream(sourcePage)) {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
document = docBuilder.parse(input);
} catch (IOException | ParserConfigurationException | SAXException e) {
throw new RuntimeException(e);
}
Node node = document.getElementsByTagName(tagName).item(0);
NamedNodeMap attributeMap = node.getAttributes();
Map<String, String> tagInfo = new HashMap<>();
for (int i = 0; i < attributeMap.getLength(); i++) {
Node attribute = attributeMap.item(i);
tagInfo.put(attribute.getNodeName(), attribute.getNodeValue());
}
return tagInfo;
}
The second solution needs to change things a bit. We want to return the attributes of all <img> elements in the document. Multiple elements means we'll use a List which will hold multiple Map<String, String> instances, where every Map represents one <img> element.
A complete XPath version in case you actually need some complex XPath expression:
public List<Map<String, String>> getElementsAttributesByTagName(String tagName) {
Document document;
try (InputStream input = new FileInputStream(sourcePage)) {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
document = docBuilder.parse(input);
} catch (IOException | ParserConfigurationException | SAXException e) {
throw new RuntimeException(e);
}
NodeList nodeList;
try {
XPath xPath = XPathFactory.newInstance().newXPath();
nodeList = (NodeList)xPath.evaluate("//" + tagName, document, XPathConstants.NODESET);
} catch (XPathExpressionException e) {
throw new RuntimeException(e);
}
List<Map<String, String>> tagInfoList = new ArrayList<>();
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
NamedNodeMap attributeMap = node.getAttributes();
Map<String, String> tagInfo = new HashMap<>();
for (int j = 0; j < attributeMap.getLength(); j++) {
Node attribute = attributeMap.item(j);
tagInfo.put(attribute.getNodeName(), attribute.getNodeValue());
}
tagInfoList.add(tagInfo);
}
return tagInfoList;
}
To get rid of the XPath part, you can simply switch it to a one-liner:
NodeList nodeList = document.getElementsByTagName(tagName);
Both these versions, when run against your test case above with an "img" parameter, return this: (formatted for clarity)
[ {height=42, alt=Smiley face, width=42, src=smiley.gif},
{height=45, alt=Sad face, width=45, src=sad.gif },
{height=48, alt=Funny face, width=48, src=funny.gif } ]
which is a correct list of all the <img> elements.
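The question also asks about the nth `<img>` element. One possible sketch, assuming the question's markup is wrapped in a root element so that it parses (the `<html>` wrapper, the class name and the method name are assumptions), is to parenthesise the path and index it. A TreeMap is used here only to make the printed order predictable.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NthElementAttributes {
    // The question's markup, wrapped in an assumed <html> root element.
    static final String XML =
            "<html><h1>My Heading</h1><p align='center'> My paragraph"
          + "<img src='smiley.gif' alt='Smiley face' height='42' width='42'></img>"
          + "<img src='sad.gif' alt='Sad face' height='45' width='45'></img>"
          + "<img src='funny.gif' alt='Funny face' height='48' width='48'></img>"
          + "</p><p>My para</p></html>";

    // Returns the attributes of the n-th element (1-based, document order)
    // with the given tag name, or an empty map if there is no such element.
    static Map<String, String> attributesOfNth(String xml, String tagName, int n) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // The parentheses make [n] index the whole result, not each parent's children.
            String expression = "(//" + tagName + ")[" + n + "]/@*";
            NodeList attrs = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            Map<String, String> result = new TreeMap<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr attr = (Attr) attrs.item(i);
                result.put(attr.getName(), attr.getValue());
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // prints {alt=Sad face, height=45, src=sad.gif, width=45}
        System.out.println(attributesOfNth(XML, "img", 2));
    }
}
```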

try using
Map <String, ArrayList<String>> tag = new HashMap <String, ArrayList<String>> ();
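A rough sketch of how such a map could be filled, assuming you want every value of each attribute name collected in document order. The class name, method name and sample markup are hypothetical, and the value type is declared as List rather than ArrayList.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AttributeValuesByName {
    // Groups the attribute values of all elements named tagName by attribute name.
    static Map<String, List<String>> collect(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList elements = doc.getElementsByTagName(tagName);
            Map<String, List<String>> values = new TreeMap<>();
            for (int i = 0; i < elements.getLength(); i++) {
                NamedNodeMap attrs = elements.item(i).getAttributes();
                for (int j = 0; j < attrs.getLength(); j++) {
                    Node attr = attrs.item(j);
                    // Create the list on first use, then append this element's value.
                    values.computeIfAbsent(attr.getNodeName(), k -> new ArrayList<>())
                          .add(attr.getNodeValue());
                }
            }
            return values;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample markup.
        String xml = "<p><img src='a.gif' alt='A'/><img src='b.gif' alt='B'/></p>";
        // prints {alt=[A, B], src=[a.gif, b.gif]}
        System.out.println(collect(xml, "img"));
    }
}
```

Note that this layout groups values by attribute name, so it only tells you which element a value belongs to through its position in the list.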

You can use a map inside the map:
Map<Map<Integer, String>, String> // Integer = some index (0, 1, ...), inner String = attribute name (e.g. src), outer String = attribute value (e.g. smiley.gif)
OR
You can invert it and keep that in mind when using it, like:
Map<String, String> // key = smiley.gif, value = src

Related

Filling Vector from NodeList

I am trying to fill a String Vector with data from a NodeList (whose values are Strings too), but it doesn't work and the Vector stays empty.
What am I doing wrong, and how do I fix it?
Thanks in advance!
Document doc = parseFile(xml);
Vector <String> x = new Vector <>();
NodeList list = doc.getElementsByTagName("Stuff");
for (int i = 0; i < list.getLength(); i++) {
x.addElement(list.item(i).getFirstChild().getNodeValue());
}
public Document parseFile(File file) {
// declared locally so the method compiles on its own
Document doc = null;
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
doc = builder.parse(file);
} catch (Exception e) { e.printStackTrace(); }
return doc;
}
I made a mistake in the tag name in NodeList list = doc.getElementsByTagName("Stuff");
I was looking for the wrong "Stuff" :)
Sorry, and thank you all for the help anyway.

How to get an XML value from an unzipped file

I need to get values such as "Symbol" etc. from an XML file and put them into a list.
For now my code looks like this:
Scanner sc = null;
byte[] buff = new byte[1 << 13];
List<String> question2 = new ArrayList<String>();
question2 = <MetodToGetFile>(sc,fileListQ);
for ( String strLista : question2){
ByteArrayInputStream in = new ByteArrayInputStream(strLista.getBytes());
try(InputStream reader = Base64.getMimeDecoder().wrap(in)){
try (GZIPInputStream gis = new GZIPInputStream(reader)) {
try (ByteArrayOutputStream out = new ByteArrayOutputStream()){
int readGis = 0;
while ((readGis = gis.read(buff)) > 0)
out.write(buff, 0, readGis);
byte[] buffer = out.toByteArray();
String s2 = new String(buffer);
}
}
}
}
}
I want to know how I can continue from here and take the values "xxx" and "zzzz" and put them into another list, because I need to compare some values.
XML looks like this:
<?xml version="1.0" encoding="utf-8"?>
<Name Name="some value">
<Group Names="some value">
<Package Guid="{7777-7777-7777-7777-7777}">
<Attribute Typ="" Name="Symbol">xxx</Attribute>
<Attribute Type="" Name="Surname">xxx</Attribute>
<Attribute Type="Address" Name="Name">zzzz</Attribute>
<Attribute Type="Address" Name="Country">zzzz</Attribute>
</Package>
EDIT: Hello, I hope my solution will be useful for someone :)
try{
//Wrap s2 (the XML string read from the stream) in an InputSource
InputSource is = new InputSource(new StringReader(s2));
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(is);
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
//Get "some value" from the Name attribute
String name= (String) xpath.evaluate("/Name/@Name", doc, XPathConstants.STRING);
//Get the GUID from the Guid attribute
String guid= (String) xpath.evaluate("/Name/Group/Package/@Guid", doc, XPathConstants.STRING);
//Get the text "xxx" of the Attribute element whose Name is "Symbol"
String symbol= xpath.evaluate("/Name/Group/Package/Attribute[@Name=\"Symbol\"]", doc.getDocumentElement());
System.out.println(name);
System.out.println(guid);
System.out.println(symbol);
}catch(Exception e){
e.printStackTrace();
}
I would be happy if my code helps someone :)
Add a method like this to retrieve all of the elements that match a given XPath expression:
public List<Node> getNodes(Node sourceNode, String xpathExpresion) throws XPathExpressionException {
// You could cache/reuse xpath for better performance
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xpath.evaluate(xpathExpresion,sourceNode,XPathConstants.NODESET);
ArrayList<Node> list = new ArrayList<Node>();
for(int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
list.add(node);
}
return list;
}
Add another method to build a Document from an XML input:
public Document buildDoc(InputStream is) throws Exception {
DocumentBuilderFactory fact = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = fact.newDocumentBuilder();
Document newDoc = parser.parse(is);
newDoc.normalize();
is.close();
return newDoc;
}
And then put it all together:
// buildDoc expects an InputStream, so wrap the XML string's bytes
InputStream is = new ByteArrayInputStream("... your XML string here".getBytes(StandardCharsets.UTF_8));
Document doc = buildDoc(is);
List<Node> nodes = getNodes(doc, "/Name/Group/Package/Attribute");
for (Node node: nodes) {
// for the text body of an element, first get its nested Text child
Text text = (Text) node.getChildNodes().item(0);
// Then ask that Text child for its value
String content = text.getNodeValue();
}
I hope I copied and pasted this correctly. I pulled this from a working class in an open source project of mine and cleaned it up a bit to answer your specific question.
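To get from here to the list the question asks for, one option is to collect each Attribute element's Name and text into a map that can then be compared. This is only a sketch: the class and method names are hypothetical, the closing `</Group>` and `</Name>` tags are assumed because the sample in the question is cut off, and the `Typ` attribute is spelled `Type`. In the question's code, the decoded string s2 would be passed in.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PackageAttributes {
    // The question's sample; the closing </Group> and </Name> tags are assumed.
    static final String XML =
            "<Name Name='some value'><Group Names='some value'>"
          + "<Package Guid='{7777-7777-7777-7777-7777}'>"
          + "<Attribute Type='' Name='Symbol'>xxx</Attribute>"
          + "<Attribute Type='' Name='Surname'>xxx</Attribute>"
          + "<Attribute Type='Address' Name='Name'>zzzz</Attribute>"
          + "<Attribute Type='Address' Name='Country'>zzzz</Attribute>"
          + "</Package></Group></Name>";

    // Maps each <Attribute> element's Name attribute to its text, in document order.
    static Map<String, String> valuesByName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList attributes = doc.getElementsByTagName("Attribute");
            Map<String, String> values = new LinkedHashMap<>();
            for (int i = 0; i < attributes.getLength(); i++) {
                Element attribute = (Element) attributes.item(i);
                values.put(attribute.getAttribute("Name"), attribute.getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> values = valuesByName(XML);
        // Compare two of the collected values.
        System.out.println(values.get("Symbol").equals(values.get("Surname")));
    }
}
```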

Get value of XML tag using java

I'm trying to read the value of the "release" tag of a remote XML file and return its value. I'm able to get the value of the "release" tag using getElementText(), but not with getElementValue().
Java Code..
try {
URL url1 = new URL("http://hsv-artifactory.emrsn.org:8081/artifactory/libs-release-local/com/avocent/commonplatform/cps/symbols/gdd/GDDResources/maven-metadata.xml");
XMLStreamReader reader1 = XMLInputFactory.newInstance().createXMLStreamReader(url1.openStream());
String Latest = null;
while (reader1.hasNext()) {
if (reader1.next() == XMLStreamConstants.START_ELEMENT) {
if (reader1.getLocalName().equals("release")) {
Latest = reader1.getElementText();
break;
}
}
}
System.out.println("Latest version in Artifactory is :"+Latest);
} catch (IOException ex) {
// handle exception
Logger.getLogger(SVNRepoConnector1.class.getName()).log(Level.SEVERE, null, ex);
} catch (XMLStreamException ex) {
// handle exception
Logger.getLogger(SVNRepoConnector1.class.getName()).log(Level.SEVERE, null, ex);
} finally {
// close the stream
}
In the code above the value is stored in a String variable, but I want to store it in an integer variable so that I can perform operations such as addition and subtraction on it afterwards. Please help.
DOM Solution:
URL url1 = new URL("http://hsv-artifactory.emrsn.org:8081/artifactory/libs-release-local/com/avocent/commonplatform/cps/symbols/gdd/GDDResources/maven-metadata.xml");
InputSource xmlInputSource = new InputSource(url1.openStream());
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document dom = dBuilder.parse(xmlInputSource);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("//release");//all release elements
NodeList nodes = (NodeList) expr.evaluate(dom, XPathConstants.NODESET);
ArrayList<Integer> releaseList = new ArrayList<Integer>();
for (int i = 0; i < nodes.getLength(); i++) {
Element releaseElem = (Element) nodes.item(i);
// assumes the release value is a plain integer; otherwise parseInt throws NumberFormatException
releaseList.add(Integer.parseInt(releaseElem.getTextContent().trim()));
}
Just catch the exceptions.
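If the release value turns out to be a dotted version rather than a single number, Integer.parseInt on the whole string will fail. Here is a small sketch, assuming a purely numeric dotted format such as "1.4.2" (the actual format of the remote file is not shown in the question, and the class and method names are made up):

```java
import java.util.Arrays;

class ReleaseVersion {
    // Splits a dotted version such as "1.4.2" into its numeric parts.
    // Throws NumberFormatException if a part is not a plain integer (e.g. "1.0-SNAPSHOT").
    static int[] parse(String release) {
        String[] parts = release.trim().split("\\.");
        int[] numbers = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    public static void main(String[] args) {
        int[] version = parse("1.4.2");
        version[2] += 1; // arithmetic on a single component
        // prints [1, 4, 3]
        System.out.println(Arrays.toString(version));
    }
}
```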

Java XML Parsing into a List and Grabbing Nodes

I am parsing an XML document and I need to put every child node into a List; once they are in the List, I need to be able to grab a specific child node by its index. My code so far only grabs every child node, but I don't know how to put them in a List, and looping through it doesn't seem to work. Here is what I have so far:
public static void main(String[] args){
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
try {
URL url = new URL ("http://feeds.cdnak.neulion.com/fs/nhl/mobile/feeds/data/20140401.xml");
URLConnection connection = url.openConnection();
InputStream is = connection.getInputStream();
// use the factory to create a documentbuilder
DocumentBuilder builder = factory.newDocumentBuilder();
// create a new document from input stream
Document doc = builder.parse(is); // get the first element
Element element = doc.getDocumentElement();
System.out.println(element);
// get all child nodes
NodeList nodes = element.getChildNodes();
// print the text content of each child
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println("" + nodes.item(i).getTextContent());
} } catch (Exception ex) {
ex.printStackTrace();
}
}
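One way to do this is to copy the element children of the root into a java.util.List and then index into it. This is only a sketch: the class name, method names and sample XML are hypothetical, since the structure of the feed is not shown in the question. Whitespace between elements shows up as text nodes, which is why non-element children are skipped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildElements {
    // Parses the XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Copies the direct element children of parent into a List,
    // skipping text nodes (such as whitespace) and comments.
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node instanceof Element) {
                result.add((Element) node);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        Element root = parseRoot("<games>\n<game>A</game>\n<game>B</game>\n</games>");
        List<Element> children = childElements(root);
        // Grab a specific child by index; prints B
        System.out.println(children.get(1).getTextContent());
    }
}
```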

Node.getTextContent() is there a way to get text content of the current node, not the descendant's text

Node.getTextContent() returns the text content of the current node and its descendants.
Is there a way to get the text content of the current node only, without the descendants' text?
Example
<paragraph>
<link>XML</link>
is a
<strong>browser based XML editor</strong>
editor allows users to edit XML data in an intuitive word processor.
</paragraph>
expected output
paragraph = is a editor allows users to edit XML data in an intuitive word processor.
link = XML
strong = browser based XML editor
I tried the code below:
String str = "<paragraph>"+
"<link>XML</link>"+
" is a "+
"<strong>browser based XML editor</strong>"+
"editor allows users to edit XML data in an intuitive word processor."+
"</paragraph>";
org.w3c.dom.Document domDoc = null;
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder;
try {
docBuilder = docFactory.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes());
domDoc = docBuilder.parse(bis);
} catch (ParserConfigurationException e1) {
e1.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
DocumentTraversal traversal = (DocumentTraversal) domDoc;
NodeIterator iterator = traversal.createNodeIterator(
domDoc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
String tagname = ((Element) n).getTagName();
System.out.println(tagname + "=" + ((Element)n).getTextContent());
}
but it gives output like this:
paragraph=XML is a browser based XML editoreditor allows users to edit XML data in an intuitive word processor.
link=XML
strong=browser based XML editor
Note that the paragraph element contains the text of the link and strong tags, which I don't want.
Please suggest some ideas.
What you want is to filter the children of your <paragraph> node and keep only those with node type Node.TEXT_NODE.
Here is an example of a method that returns the desired content:
public static String getFirstLevelTextContent(Node node) {
NodeList list = node.getChildNodes();
StringBuilder textContent = new StringBuilder();
for (int i = 0; i < list.getLength(); ++i) {
Node child = list.item(i);
if (child.getNodeType() == Node.TEXT_NODE)
textContent.append(child.getTextContent());
}
return textContent.toString();
}
Applied to your example:
String str = "<paragraph>" + //
"<link>XML</link>" + //
" is a " + //
"<strong>browser based XML editor</strong>" + //
"editor allows users to edit XML data in an intuitive word processor." + //
"</paragraph>";
Document domDoc = null;
try {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes());
domDoc = docBuilder.parse(bis);
} catch (Exception e) {
e.printStackTrace();
}
DocumentTraversal traversal = (DocumentTraversal) domDoc;
NodeIterator iterator = traversal.createNodeIterator(domDoc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
String tagname = ((Element) n).getTagName();
System.out.println(tagname + "=" + getFirstLevelTextContent(n));
}
Output:
paragraph= is a editor allows users to edit XML data in an intuitive word processor.
link=XML
strong=browser based XML editor
It iterates over all the children of a Node, keeps only the text nodes (thus excluding comments, elements and so on), and accumulates their text content.
There is no direct method in Node or Element to get only the text content at first level.
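The same first-level filtering can also be sketched with XPath instead of a manual node-type check: the relative expression `text()` selects only the text-node children of the context node. This is an alternative technique, not the method from the answer above, and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OwnText {
    // Parses the XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Concatenates the text-node children of node, ignoring descendants' text.
    static String ownText(Node node) {
        try {
            NodeList texts = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("text()", node, XPathConstants.NODESET);
            StringBuilder content = new StringBuilder();
            for (int i = 0; i < texts.getLength(); i++) {
                content.append(texts.item(i).getNodeValue());
            }
            return content.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Element paragraph = parseRoot("<paragraph><link>XML</link> is a "
                + "<strong>browser based XML editor</strong>"
                + "editor allows users to edit XML data in an intuitive word processor."
                + "</paragraph>");
        System.out.println("paragraph=" + ownText(paragraph));
    }
}
```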
If you change the last for loop into the following one it behaves as you wanted
for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
String tagname = ((Element) n).getTagName();
StringBuilder content = new StringBuilder();
NodeList children = n.getChildNodes();
for(int i=0; i<children.getLength(); i++) {
Node child = children.item(i);
if(child.getNodeName().equals("#text"))
content.append(child.getTextContent());
}
System.out.println(tagname + "=" + content);
}
I do this with Java 8 streams and a helper class:
import java.util.*;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class NodeLists
{
/** converts a NodeList to java.util.List of Node */
static List<Node> list(NodeList nodeList)
{
List<Node> list = new ArrayList<>();
for(int i=0;i<nodeList.getLength();i++) {list.add(nodeList.item(i));}
return list;
}
}
And then
NodeLists.list(node.getChildNodes())
.stream()
.filter(n->n.getNodeType()==Node.TEXT_NODE)
.map(Node::getTextContent)
.reduce("",(s,t)->s+t);
There is no built-in function for a node's own text, but a simple trick can help: check whether node.getTextContent() contains "\n". If it does, the node has no text of its own.
Hope this helps.
