Is it possible to get the contents of an XML tag as a String in Java using Simple XML?
I'm trying to do it using a Converter. I can obtain <tag1> as an InputNode object, but there is no API to retrieve its contents as a String. I could iterate over the children with InputNode.getNext() and reconstruct the content by recursively retrieving names, attributes, values, etc., but I could never be sure that the result matches the original XML.
Example:
<root>
<tag1>
<unknownTag>Unknown</unknownTag>
<otherUnknownTag>
<children1>hello</children1>
<children2>bye</children2>
</otherUnknownTag>
</tag1>
<tag2>
...
</tag2>
</root>
I would like to retrieve the following contents of <tag1> as a String (and prevent deserialisation for all <tag1> children):
<unknownTag>Unknown</unknownTag>
<otherUnknownTag>
<children1>hello</children1>
<children2>bye</children2>
</otherUnknownTag>
The contents of <tag1> are not known at deserialisation time.
As far as I know it is possible only partially. This is how far I've got:
public String getNodeAsString(InputNode node) throws Exception {
StringBuilder builder = new StringBuilder();
String value = node.getValue();
if (value != null) {
builder.append(value);
}
InputNode child = node.getNext();
while (child != null) {
builder.append("<").append(child.getName());
for (String attribute : child.getAttributes()) {
builder.append(" ")
.append(attribute)
.append("=\"")
.append(child.getAttribute(attribute).getValue())
.append("\"");
}
builder.append(">")
.append(child.getValue())
.append("</").append(child.getName()).append(">");
value = node.getValue();
if (value != null) {
builder.append(value);
}
child = child.getNext();
}
return builder.toString();
}
This kind of works but has two flaws:
The order of attributes is not preserved, because Simple XML stores attributes in a map and iteration follows the order of the map keys.
It cannot handle tags nested inside a direct child of the InputNode; at least, I don't know how to get the list of children of a child node.
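If parsing the raw XML a second time is acceptable, one way around both flaws is to step outside Simple XML for this one element and let the JDK's own DOM and Transformer APIs serialize the children of <tag1>. This is only a sketch under that assumption, not something Simple XML offers; the class and method names (InnerXmlDemo, innerXml) are made up for illustration. The serializer may normalise details such as quote style or empty-element syntax, so expect equivalent XML rather than a guaranteed byte-for-byte copy.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InnerXmlDemo {

    // Hypothetical helper: returns the markup found inside the first <tagName> element.
    public static String innerXml(String xml, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node parent = doc.getElementsByTagName(tagName).item(0);
        if (parent == null) {
            return null; // no such element in the document
        }

        // Identity transformer used purely as a serializer, without the XML declaration.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // Serialize each child in document order into the same writer.
        StringWriter out = new StringWriter();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            serializer.transform(new DOMSource(children.item(i)), new StreamResult(out));
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><tag1><unknownTag>Unknown</unknownTag>"
                + "<otherUnknownTag><children1>hello</children1></otherUnknownTag>"
                + "</tag1><tag2/></root>";
        System.out.println(innerXml(xml, "tag1"));
    }
}
```

Nested children are handled because the serializer walks each subtree itself, and attribute handling is left to the DOM implementation instead of a hand-built loop.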
I am trying to replace null values from an ArrayList with this code. I am getting null values in a tag in my XML file; the values in that tag come from the ArrayList. I want to remove the null from the tag and leave the tag empty instead. My code is something like this:
for(String s:a.getList){
// here I set the values in the tag by creating the tag and then appending child nodes using the DOM parser
}
where a=object that contains list
output is like this:
<value>1</value>
<value>2</value>
<value>null</value>
<value>3</value>
<value>4</value>
<value>null</value>
.
.
.and so on
Expected output:
<value>1</value>
<value>2</value>
<value/>
<value>3</value>
<value>4</value>
<value/>
null should be removed and tag should look something like this
code I am trying is:
for(String s:a.list){
if(s.equals("null")){
s.replace("null","");
my code;
}
Always getting null pointer exception and don't know if this runs what will be output.
Please help..
You are not updating the list. Because Strings are immutable, replace() returns a new String instance, which you then discard. Just set the value you want in the list if the current value is "null":
for(int i = 0; i < list.size(); ++i){
if("null".equals(list.get(i))){
list.set(i, "");
}
}
The condition won't throw for a null element, because equals is called on the literal, but it only matches the string "null". If you also want to replace actual null references, add that check to the condition.
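For what it's worth, here is a minimal sketch that covers both cases in one pass, assuming Java 8 or later for List.replaceAll. The class and method names (NullCleaner, blankOutNulls) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NullCleaner {

    // Returns a copy in which null references and the literal string "null" become "".
    public static List<String> blankOutNulls(List<String> values) {
        List<String> cleaned = new ArrayList<>(values);
        cleaned.replaceAll(s -> (s == null || "null".equals(s)) ? "" : s);
        return cleaned;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("1", "2", null, "3", "4", "null");
        System.out.println(blankOutNulls(raw));
    }
}
```

Working on a copy keeps the caller's list untouched; call replaceAll directly on the original list if you want to update it in place.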
If you want to remove elements from an ArrayList while looping over it by index, iterate from last to first, so that a removal does not shift the indices you have yet to visit.
for (int i = list.size()-1; i >= 0; i--) {
if ("null".equalsIgnoreCase(list.get(i))) {
list.remove(i);
}
}
"null".equalsIgnoreCase(list.get(i)) will avoid null pointer exception
For removing and printing value
for (String str : abc) {
if ("null".equalsIgnoreCase(str)) {
System.out.println("<value/>");
} else {
System.out.println("<value>"+str+"</value>");
}
}
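Since the question builds the tags with DOM rather than printing them, here is a rough sketch of the same idea on that side: create each value element and only append a text node when there is something to write. The root element name "values" and the class name ValueListXml are assumptions for illustration; the JDK's default serializer writes a childless element as <value/>.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ValueListXml {

    // Builds <values><value>..</value>...</values>, leaving null or "null" entries empty.
    public static String toXml(List<String> values) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("values"); // hypothetical wrapper element
        doc.appendChild(root);

        for (String s : values) {
            Element value = doc.createElement("value");
            if (s != null && !"null".equals(s)) {
                value.appendChild(doc.createTextNode(s)); // only real values get a text node
            }
            root.appendChild(value);
        }

        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(Arrays.asList("1", "2", null, "3", "4", "null")));
    }
}
```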
To make it clearer
public static void main(String... args) {
ArrayList<String> a = new ArrayList<String>();
a.add("one");
a.add(null);
a.add("two");
a.removeAll(Collections.singleton(null));
for(String value : a) {
System.out.println(value);
}
}
Output
one
two
I am using JTidy and XPath to parse HTML, but parsing text is giving me a little trouble because it may include a b tag inside. I don't want to loop over its child nodes; I simply want to remove the 'b' tags after the HTML is loaded.
How can I delete tags from a DOM document?
Document doc = tidy.parseDOM(url.openStream(), System.out);
For example, pseudocode for it: doc.removeTag('<b>');
Is it possible?
You have tagged this with 'jdom', but your document is a DOM document (not JDOM).
Of course, if it was JDOM, you could replace the Elements with their content using a relatively simple document scan. Or, you can use a custom SAXHandler to skip adding the Element in the first place.
Using JDOM, you could, for example, do something like:
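// Assumes JDOM 2 (org.jdom2.*, org.jdom2.filter.Filters, java.util.*).
// Collect the <b> elements first: the descendants iterator has no add(),
// and the tree should not be restructured while it is being walked.
List<Element> bolds = new ArrayList<Element>();
for (Element e : document.getDescendants(Filters.element("b"))) {
    bolds.add(e);
}
for (Element b : bolds) {
    Element parent = b.getParentElement();
    if (parent == null) continue; // root element: nothing to splice into
    int index = parent.indexOf(b);
    List<Content> kids = b.removeContent(); // detaches and returns the children
    b.detach();
    parent.addContent(index, kids); // children take the place of <b>
}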
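As the document in the question is a plain org.w3c.dom.Document, the same unwrapping can be sketched with the standard DOM API alone. This is an untested-against-JTidy sketch: it assumes the Document implementation supports insertBefore and removeChild, and that the tag being removed is never the root element. The names UnwrapTag, parse and unwrap are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UnwrapTag {

    // Helper for the demo only; in the question the Document comes from tidy.parseDOM(...).
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Replaces every <tagName> element with its own children, in place.
    public static void unwrap(Document doc, String tagName) {
        // Snapshot the matches first, because the NodeList may be live.
        NodeList found = doc.getElementsByTagName(tagName);
        List<Node> targets = new ArrayList<>();
        for (int i = 0; i < found.getLength(); i++) {
            targets.add(found.item(i));
        }
        for (Node target : targets) {
            Node parent = target.getParentNode();
            // insertBefore moves the child out of target and in front of it.
            while (target.getFirstChild() != null) {
                parent.insertBefore(target.getFirstChild(), target);
            }
            parent.removeChild(target);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<p>Hello <b>bold</b> world</p>");
        unwrap(doc, "b");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```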
I know this was asked many times but I still cannot get it to work. I convert xml string to Document object and then parse it. Here is the code:
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;
import org.xml.sax.InputSource;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
try
{
builder = factory.newDocumentBuilder();
Document document = builder.parse( new InputSource( new StringReader( result ) ) );
Node head = document.getFirstChild();
if(head != null)
{
NodeList airportList = head.getChildNodes();
for(int i=0; i<airportList.getLength(); i++) {
Node n = airportList.item(i);
Element airportElem = (Element)n;
}
}
}
catch (Exception e) {
e.printStackTrace();
}
When I cast the Node object n to Element I get an exception java.lang.ClassCastException: org.apache.harmony.xml.dom.TextImpl cannot be cast to org.w3c.dom.Element. When I check the node type of the Node object it says Node.TEXT_NODE. I believe it should be Node.ELEMENT_NODE. Am I right?
So how do I convert a Node to an Element, so that I can call something like element.getAttribute("attrName")?
Here is my XML:
<?xml version="1.0" encoding="utf-8" ?>
<ArrayOfCity xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
<City>
<strName>Abu Dhabi</strName>
<strCode>AUH</strCode>
</City>
<City>
<strName>Amsterdam</strName>
<strCode>AMS</strCode>
</City>
<City>
<strName>Antalya</strName>
<strCode>AYT</strCode>
</City>
<City>
<strName>Bangkok</strName>
<strCode>BKK</strCode>
</City>
</ArrayOfCity>
Thanks in advance!
I think you need something like this:
NodeList airportList = head.getChildNodes();
for (int i = 0; i < airportList.getLength(); i++) {
Node n = airportList.item(i);
if (n.getNodeType() == Node.ELEMENT_NODE) {
Element elem = (Element) n;
}
}
When I cast the Node object n to Element I get an exception java.lang.ClassCastException: org.apache.harmony.xml.dom.TextImpl cannot be cast to org.w3c.dom.Element. When I check the node type of the Node object it says Node.TEXT_NODE. I believe it should be Node.ELEMENT_NODE. Am I right?
Probably not, the parser is probably right. It means that some of the nodes in what you're parsing are text nodes. For example:
<foo>bar</foo>
In the above, we have a foo element containing a text node. (The text node contains the text "bar".)
Similarly, consider:
<foo>
<bar>baz</bar>
</foo>
If your XML document literally looks like the above, it contains a root element foo with these child nodes (in order):
A text node with some whitespace in it
A bar element
A text node with some more whitespace in it
Note that the bar element is not the first child of foo. If it looked like this:
<foo><bar>baz</bar></foo>
then the bar element would be the first child of foo.
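To see this for yourself, a small sketch like the following lists the node names of the root element's children for both layouts. Text nodes report the name "#text". The class and method names (ChildNodeDemo, childNodeNames) are hypothetical, and it assumes the default non-validating JDK parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildNodeDemo {

    // Returns the node names of the root element's direct children, in order.
    public static List<String> childNodeNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList children = doc.getDocumentElement().getChildNodes();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Line breaks between the tags become whitespace-only text nodes.
        System.out.println(childNodeNames("<foo>\n<bar>baz</bar>\n</foo>"));
        // No whitespace between the tags, so bar is the only child.
        System.out.println(childNodeNames("<foo><bar>baz</bar></foo>"));
    }
}
```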
You can also guard the cast with an instanceof check:
Node n = airportList.item(i);
if (n instanceof Element)
{
Element airportElem = (Element)n;
// ...
}
But as others have pointed out, you have text nodes, and this check simply skips them. Make sure you don't need them, or use the else branch of the condition to process them with different code.
I know how to parse XML documents with DOM when they are in the form:
<tagname> valueIWant </tagname>
However, the element I'm now trying to get is instead in the form
<photo farm="9" id="8147664661" isfamily="0" isfriend="0" ispublic="1"
owner="8437609#N04" secret="4902a217af" server="8192" title="Rainbow"/>
I usually use cel.getTextContent() to return the value, but that doesn't work in this case. Neither does cel.getAttributes(), which I thought would work...
Ideally, I need to just get the id and owner numerical values. However if someone can help on how to get all of it, then I can deal with removing the parts I don't want later.
What you're looking to retrieve are the values of the attributes attached to an Element. Use the getAttribute(String name) method to get a single attribute by name.
If you want to retrieve all the attributes, you can do so using getAttributes() and iterating through the result. An example of both of these methods might be something like this:
private void getData(Document document){
if(document == null)
return;
NodeList list = document.getElementsByTagName("photo");
Element photoElement = null;
if(list.getLength() > 0){
photoElement = (Element) list.item(0);
}
if(photoElement != null){
System.out.println("ID: "+photoElement.getAttribute("id"));
System.out.println("Owner: "+photoElement.getAttribute("owner"));
NamedNodeMap childList = photoElement.getAttributes();
Attr attribute;
for(int index = 0; index < childList.getLength(); index++){
if(childList.item(index).getNodeType() == Node.ATTRIBUTE_NODE){
attribute = ((Attr)childList.item(index));
System.out.println(attribute.getNodeName()+" : "+attribute.getNodeValue());
}else{
System.out.println(childList.item(index).getNodeType());
}
}
}
}
Something like:
Element photo = (Element)yournode;
photo.getAttribute("farm");
will get you the value of the farm attribute. You need to treat your node as an Element to have access to these attributes.
I want to modify an XML file using DOM, but when I call node.getNodeValue() it returns null, and I don't know why. My XML file contains the following tags:
[person], which contains the child [name], which in turn contains the children [firstname, middleInitial, lastName]
I want to update the first name, middle initial and last name using DOM.
this is my java dom processing file:
NodeList refPeopleList = doc.getElementsByTagName("person");
for (int i = 0; i < refPeopleList.getLength(); i++) {
NodeList personList = refPeopleList.item(i).getChildNodes();
for (int personDetalisCnt = 0; personDetalisCnt < personList.getLength(); personDetalisCnt++) { // bound by personList, the list being indexed
{
currentNode = personList.item(personDetalisCnt);
String nodeName = currentNode.getNodeName();
System.out.println("node name is " + nodeName);
if (nodeName.equals("name")) {
System.out.println("indise name");
NodeList nameList = currentNode.getChildNodes();
for(int cnt=0;cnt<nameList.getLength();cnt++)
{
currentNode=nameList.item(cnt);
if(currentNode.getNodeName().equals("firstName"))
{
System.out.println("MODIFID NAME :"+currentNode.getNodeValue()); //prints null
System.out.println("indide fname"+" node name is "+currentNode.getNodeName()); //prints firstName
String nodeValue="salma";
currentNode.setNodeValue(nodeValue);
System.out.println("MODIFID NAME :"+currentNode.getNodeValue());//prints null
}
}
}
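} // closes the extra block opened after the inner for
} // closes the inner for over personList
} // closes the outer for over refPeopleList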
Rather than calling getNodeValue() / setNodeValue() on the <firstName> element node, try getting the firstName element's text node child, and call getNodeValue() / setNodeValue() on it.
Try
if(currentNode.getNodeName().equals("firstName"))
{
Node textNode = currentNode.getFirstChild();
System.out.println("Initial value:" + textNode.getNodeValue());
String nodeValue="salma";
textNode.setNodeValue(nodeValue);
System.out.println("Modified value:" + textNode.getNodeValue());
}
From the DOM spec,
"The attributes nodeName, nodeValue and attributes are included as a mechanism to get at node information without casting down to the specific derived interface. In cases where there is no obvious mapping of these attributes for a specific nodeType (e.g., nodeValue for an Element or attributes for a Comment), this returns null."
Similarly in the Java docs for the Node interface, the table near the top shows that the nodeValue of an element is null.
This is why using getNodeValue on an element will always return null, and why you need to use getFirstChild() first in order to get the text node (assuming there are no other child nodes). If there is a mixture of element and text child nodes, you can use getNodeType() to check which child is which (text is type 3).
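If the element only holds text, another option is getTextContent() / setTextContent() from DOM Level 3 (available on org.w3c.dom.Node since Java 5), which works on the element itself and also copes with an element that has no text child yet, where getFirstChild() would return null. A small sketch, with made-up names (TextContentDemo, parse, setText):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextContentDemo {

    // Helper for the demo only.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Sets the text of every <tagName> element and returns how many were changed.
    public static int setText(Document doc, String tagName, String newText) {
        NodeList matches = doc.getElementsByTagName(tagName);
        for (int i = 0; i < matches.getLength(); i++) {
            // setTextContent replaces all children of the element with one text node.
            matches.item(i).setTextContent(newText);
        }
        return matches.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<person><name><firstName>old</firstName></name></person>");
        setText(doc, "firstName", "salma");
        Node first = doc.getElementsByTagName("firstName").item(0);
        System.out.println("nodeValue:   " + first.getNodeValue());   // still null for an element
        System.out.println("textContent: " + first.getTextContent());
    }
}
```

Keep in mind that setTextContent removes any child elements as well, so it is only appropriate for leaf elements such as firstName here.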
Is it firstName or firstname? Watch the case: getNodeName() comparisons are case-sensitive.