JDOM get string of containing content


Let's say I have some XML:
<document>blabla<bold>test<list><item>hello<italics>dfh</italics></item></list></bold>sdfsd</document>
and I now need to get the content of <document> as a string, so I would have
blabla<bold>test<list><item>hello<italics>dfh</italics></item></list></bold>sdfsd
I have been turning this over in my head for a while now and haven't been able to figure it out.
I hope to get some direction on what I have to do.
EDIT:
Just to be clear, let's say I have the XML like this:
SAXBuilder sb = new SAXBuilder();
Document doc = sb.build(new StringReader("<document>blabla<bold>test<list><item>hello<italics>dfh</italics></item></list></bold>sdfsd</document>"));
and I now need to get the content of <document> as a string.

It is very unusual to need an inconsistent subset of an XML document like the one you want. It's much more common to want just the text content: blabla test hello dfh sdfsd
Note that you can get that subset as the content list of the root element (getContent()), and then output just that list as a string:
XMLOutputter xout = new XMLOutputter();
String txt = xout.outputString(doc.getRootElement().getContent());
System.out.println(txt);
For example, I wrote this code:
public static void main(String[] args) throws JDOMException, IOException {
    SAXBuilder sb = new SAXBuilder();
    Document doc = sb.build(new StringReader("<document>blabla<bold>test<list><item>hello<italics>dfh</italics></item></list></bold>sdfsd</document>"));
    XMLOutputter xout = new XMLOutputter();
    // Output only the root element's children, not the root element itself
    String txt = xout.outputString(doc.getRootElement().getContent());
    System.out.println(txt);
}
and it output:
blabla<bold>test<list><item>hello<italics>dfh</italics></item></list></bold>sdfsd

Related

how to create xml file in runtime?

I am trying to create an XML file at run-time under my web content folder, but a No such file or directory error was displayed.
My code:
Document document = DocumentHelper.createDocument();
Element rootElement = document.addElement("Students");
Element studentElement = rootElement.addElement("student").addAttribute("country", "USA");
studentElement.addElement("id").addText("1");
studentElement.addElement("name").addText("Peter");
XMLWriter writer = new XMLWriter(new FileWriter("/WebContent/Students.xml"));
// Note that you can format this XML document:
/*
 * FileWriter output = new FileWriter(new File("Students.xml"));
 * OutputFormat format = OutputFormat.createPrettyPrint();
 * XMLWriter writer = new XMLWriter(output, format); // <- will format the output
 */
// You can print this to the console and see what it looks like
String xmlElement = document.asXML();
System.out.println(xmlElement);
writer.write(document);
writer.close();
I don't know how to do this. Can anyone help me to fix my code?

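I found the answer: I just changed the path from /WebContent/Students.xml to
WebContent/Students.xml.
That is, I removed the / before WebContent. With the leading slash the path is absolute, so it points to a WebContent folder at the root of the file system rather than to one relative to the working directory.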
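The difference between the two paths can be sketched with java.nio alone, without dom4j. This is only a minimal sketch, not the poster's code: the class name RelativePathDemo, the helper writeUnder, and the sample content are made up for illustration. It resolves a relative path against an explicit base directory and creates any missing parent folder before writing, since a missing parent folder is another common cause of a "No such file or directory" error.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class RelativePathDemo {

    // Hypothetical helper: writes `content` to `relative`, resolved against `baseDir`.
    static Path writeUnder(Path baseDir, String relative, String content) {
        Path target = baseDir.resolve(relative).normalize();
        try {
            // Create WebContent/ (or any other missing parent folder) first.
            Files.createDirectories(target.getParent());
            Files.write(target, content.getBytes(StandardCharsets.UTF_8));
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A relative path is resolved against the JVM's working directory.
        Path relative = Paths.get("WebContent/Students.xml");
        System.out.println(relative + " resolves to " + relative.toAbsolutePath());

        // With a leading slash the path no longer depends on the working directory.
        Path rooted = Paths.get("/WebContent/Students.xml");
        System.out.println(rooted + " resolves to " + rooted.toAbsolutePath());
    }
}
```

In a web application the working directory depends on how the server was started, so passing an explicit base directory, as writeUnder does, is usually more predictable than relying on a bare relative path.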

Dom4j get single node text value

Assume I have
<Sports>
<Soccer>
<Players>
<Player_1> Messi Leonel </Player_1>
</Players>
</Soccer>
</Sports>
How can I get the Player_1 node text in one line, without iteration, using Dom4J?
Return value should be: Messi Leonel
Thanks

Got it. For anyone looking for something like this:
File file = new File("/path/to/file.xml");
SAXReader reader = new SAXReader();
Document document = reader.read(file);
String name = document.selectSingleNode("//Sports/Soccer/Players/Player_1").getText();
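For comparison, the same single-expression lookup can be sketched with the XPath API that ships with the JDK. This is not dom4j; the class name SingleNodeText and the helper textAt are invented for this example. The sketch also trims the result, because the text of Player_1 in the sample XML is padded with spaces.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class SingleNodeText {

    // Hypothetical helper: returns the trimmed string value of the first node
    // matched by `expression`, or an empty string if nothing matches.
    static String textAt(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            String raw = xpath.evaluate(expression, new InputSource(new StringReader(xml)));
            return raw.trim();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Sports><Soccer><Players>"
                + "<Player_1> Messi Leonel </Player_1>"
                + "</Players></Soccer></Sports>";
        System.out.println(textAt(xml, "/Sports/Soccer/Players/Player_1"));
    }
}
```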

get node raw text

How do I get a node's value together with its child nodes? For example, I have the following node parsed into a DOM Document instance:
<root>
<ch1>That is a text with <value name="val1">value contents</value></ch1>
</root>
I select the ch1 node using XPath. Now I need to get its contents, that is, everything between <ch1> and </ch1>, e.g. That is a text with <value name="val1">value contents</value>.
How can I do it?

I have found the following code snippet, which uses a transformation; it gives almost exactly what I want. The result can be tuned by changing the output method.
public static String serializeDoc(Node doc) {
    StringWriter outText = new StringWriter();
    StreamResult sr = new StreamResult(outText);
    Properties oprops = new Properties();
    oprops.put(OutputKeys.METHOD, "xml");
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer t = null;
    try {
        t = tf.newTransformer();
        t.setOutputProperties(oprops);
        t.transform(new DOMSource(doc), sr);
    } catch (Exception e) {
        System.out.println(e);
    }
    return outText.toString();
}
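serializeDoc serializes the node it is given, so for ch1 the output still includes the <ch1> tags themselves. One way to get only what sits between them, sketched here under the assumption that the JDK's default transformer is in use, is to serialize each child of the node in turn and suppress the XML declaration. The class name InnerXml and its helpers are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class InnerXml {

    // Serializes every child of `parent` in document order and concatenates the results.
    static String innerXml(Node parent) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
                t.transform(new DOMSource(child), new StreamResult(out));
            }
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize node content", e);
        }
    }

    // Small parsing helper so the example is self-contained.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<root><ch1>That is a text with <value name=\"val1\">value contents</value></ch1></root>");
        Node ch1 = doc.getElementsByTagName("ch1").item(0);
        System.out.println(innerXml(ch1));
    }
}
```

Because text nodes also go through the transformer, special characters such as & come out escaped, just as they would inside a full document.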

If this is server-side Java (i.e. you do not need to worry about it running on other JVMs) and you are using the Sun/Oracle JDK, you can do the following. Note that these are internal JDK classes, so they may not be accessible on newer JDK versions:
import com.sun.org.apache.xml.internal.serialize.OutputFormat;
import com.sun.org.apache.xml.internal.serialize.XMLSerializer;
...
Node n = ...;
OutputFormat outputFormat = new OutputFormat();
outputFormat.setOmitXMLDeclaration(true);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
XMLSerializer ser = new XMLSerializer(baos, outputFormat);
ser.serialize(n);
System.out.println(new String(baos.toByteArray()));
Remember that the final conversion to a String may need an explicit encoding argument if the serialized XML uses an encoding other than your platform's default; otherwise you will get garbage for the unusual characters.

You could use jOOX to wrap your DOM objects and get many utility functions from it, such as the one you need. In your case, this will produce the result you need (using CSS-style selectors to find <ch1/>):
String xml = $(document).find("ch1").content();
Or with XPath as you did:
String xml = $(document).xpath("//ch1").content();
Internally, jOOX will use a transformer to generate that output, as others have mentioned.

As far as I know, there is no equivalent of innerHTML in Document. DOM is meant to hide the details of the markup from you.
You can probably get the effect you want by going through the children of that node. Suppose for example that you want to copy out the text, but replace each "value" tag with a programmatically supplied value:
HashMap<String, String> values = ...;
StringBuilder str = new StringBuilder();
// Children can be text nodes as well as elements, so iterate over Node, not Element
for (Node child = ch1.getFirstChild(); child != null; child = child.getNextSibling()) {
    if (child.getNodeType() == Node.TEXT_NODE) {
        // Copy plain text through unchanged
        str.append(child.getTextContent());
    } else if (child.getNodeName().equals("value")) {
        // Replace the element with the value registered under its "name" attribute
        str.append(values.get(child.getAttributes().getNamedItem("name").getTextContent()));
    }
}
String output = str.toString();

How to rename an xml node to a html tag

Say I have a Java String which has xml data like so:
String content = "<abc> Hello <mark> World </mark> </abc>";
Now, I want to render this String as text on a web page and highlight/mark the word "World". The tag "abc" could change dynamically, so is there a way to rename the outermost XML tag in a String using Java?
I would like to convert the above String to the format shown below:
String content = "<i> Hello <mark> World </mark> </i>";
Now, I could use the new String to set html content and display the text in italics and highlight the word World.
Thanks,
Sony
PS: I am using xquery over files in BaseX xml database. The String content is essentially a result of an xquery which uses ft:extract(), a function to extract full text search results.

XML "parsing" with regexes can be cumbersome. If there is a possibility that your XML string can be more complicated than the one used in your example, you should consider processing it as a real XML node.
String newName = "i";
// parse String as DOM
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(new StringReader(content)));
// modify DOM
doc.renameNode(doc.getDocumentElement(), null, newName);
This code assumes that the element that needs to be renamed is always the outermost element, that is, the root element.
Now the document is a DOM tree. It can be converted back to a String object with a transformer.
// output DOM as String
Transformer transformer = TransformerFactory.newInstance().newTransformer();
StringWriter sw = new StringWriter();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(doc), new StreamResult(sw));
String italicsContent = sw.toString();

Perhaps a simple regex?
String content = "<abc> Sample text <mark> content </mark> </abc>";
Pattern outerTags = Pattern.compile("^<(\\w+)>(.*)</\\1>$");
Matcher m = outerTags.matcher(content);
if (m.matches()) {
    content = "<i>" + m.group(2) + "</i>";
    System.out.println(content);
}
Alternatively, use a DOM parser, find the children of the outer tag, and print them, preceded and followed by your desired tag as strings.
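The DOM alternative could look roughly like the sketch below. It assumes the JDK's built-in parser and transformer, and the names OuterTagSwap and rewrap are invented for illustration. The children of the root element are serialized one by one and wrapped in the new tag, so the original outer tag name never has to be known.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class OuterTagSwap {

    // Hypothetical helper: replaces the outermost tag of `xml` with `newTag`,
    // keeping everything inside it (text and nested elements) as it was.
    static String rewrap(String xml, String newTag) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            out.write("<" + newTag + ">");
            for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
                t.transform(new DOMSource(child), new StreamResult(out));
            }
            out.write("</" + newTag + ">");
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalArgumentException("Could not rewrap XML", e);
        }
    }

    public static void main(String[] args) {
        String content = "<abc> Hello <mark> World </mark> </abc>";
        System.out.println(rewrap(content, "i"));
    }
}
```

Unlike the regex, this also copes with attributes on the outer tag and with content that spans several lines, at the cost of a full parse.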

How to save parsed and changed DOM document in xml file?

I have an XML file. I need to read it, make some changes, and write the changed version to a new destination.
I managed to read, parse, and patch this file (with DocumentBuilderFactory, DocumentBuilder, Document, and so on).
But I cannot find a way to save the file. Is there a way to get its plain-text form (as a String), or a better way?

Something like this works:
Transformer transformer = TransformerFactory.newInstance().newTransformer();
Result output = new StreamResult(new File("output.xml"));
Source input = new DOMSource(myDocument);
transformer.transform(input, output);
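A slightly fuller sketch of the same idea covers the whole read, change, save cycle using only JDK classes. The class name SaveDom, the helpers load, save and sample, and the sample document are assumptions made for this example. It sets the encoding and indentation explicitly and writes through a stream that is closed by try-with-resources.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

class SaveDom {

    // Parses the XML file at `source` into a DOM document.
    static Document load(Path source) {
        try (InputStream in = Files.newInputStream(source)) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (IOException | ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Could not read " + source, e);
        }
    }

    // Writes `doc` to `target` as indented UTF-8 XML.
    static void save(Document doc, Path target) {
        try (OutputStream out = Files.newOutputStream(target)) {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.transform(new DOMSource(doc), new StreamResult(out));
        } catch (IOException | TransformerException e) {
            throw new IllegalStateException("Could not write " + target, e);
        }
    }

    // Builds a tiny in-memory document to stand in for the patched one.
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("Students");
            doc.appendChild(root);
            Element student = doc.createElement("student");
            student.setAttribute("country", "USA");
            root.appendChild(student);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM parser available", e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("students", ".xml");
        save(sample(), target);
        System.out.println("Saved to " + target);
    }
}
```

To get the document as a String instead of a file, the same transformer can write to a StreamResult wrapped around a StringWriter.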

That will work, provided you're using xerces-j:
public String serialise(org.w3c.dom.Document document) throws java.io.IOException {
    java.io.ByteArrayOutputStream data = new java.io.ByteArrayOutputStream();
    java.io.PrintStream ps = new java.io.PrintStream(data);
    // Indented XML output, encoded as ISO-8859-1
    org.apache.xml.serialize.OutputFormat of =
        new org.apache.xml.serialize.OutputFormat("XML", "ISO-8859-1", true);
    of.setIndent(1);
    of.setIndenting(true);
    org.apache.xml.serialize.XMLSerializer serializer =
        new org.apache.xml.serialize.XMLSerializer(ps, of);
    // As a DOM Serializer
    serializer.asDOMSerializer();
    serializer.serialize(document);
    ps.flush();
    // Decode with the same encoding the serializer wrote
    return data.toString("ISO-8859-1");
}

This lets you define the XML output format:
new XMLWriter(new FileOutputStream(fileName),
    new OutputFormat() {{
        setEncoding("UTF-8");
        setIndent(" ");
        setTrimText(false);
        setNewlines(true);
        setPadText(true);
    }}).write(document);
