Insert blank lines in jdom2 pretty printing - java

I'm trying to simply add some blank lines to my jdom xml output. I've tried the following without luck:
Element root = new Element("root");
root.addContent(new CDATA("\n"));
root.addContent(new Text("\n"));
I figured the all-whitespace entry was being ignored, so I tried creating my own XMLOutputProcessor like this:
class TweakedOutputProcessor extends AbstractXMLOutputProcessor {
@Override
public void process(java.io.Writer out, Format format, Text text) throws IOException {
if ("\n".equals(text.getText())) {
out.write("\n");
} else {
super.process(out, format, text);
}
}
}
... called like this:
public static void printDocument(Document doc) {
XMLOutputter xmlOutput = new XMLOutputter(new TweakedOutputProcessor());
xmlOutput.setFormat(Format.getPrettyFormat());
try {
xmlOutput.output(doc, System.out);
} catch (IOException e) { }
}
The unexpected thing here was that process(..., Text) was never called. After some experimentation I've found that process(..., Document) is being called, but none of the other process(..., *) methods are.
I also tried overriding the printText(...) and printCDATA(...) methods, but neither is being called -- even when the text is non-whitespace! Yet printElement(...) is being called.
So...
What is going on here? What's doing the work if not these methods?
How do I simply insert a blank line?

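Set the standard XML attribute xml:space="preserve" on the element whose whitespace you want to keep. JDOM honours that XML whitespace-handling attribute when it writes the document, so the text inside such an element is not trimmed by the pretty format.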

Related

How to make newlines between attributes in XML file by using dom4j?

I want to generate a xml file in the following format by using java :
each attribute should be in separate line.
<parameters>
<parameter
name="Tom"
city="York"
number="123"
/>
</parameters>
But I can only get all attributes in one line
<parameters>
<parameter name="Tom" city="York" number="123"/>
</parameters>
I'm using dom4j, could anyone tell how I can make it? Does dom4j supports this kind of format?
Thanks.
You cannot do it with the XMLWriter unless you want to substantially rewrite the main logic. However, since XMLWriter is also a SAX ContentHandler, it can consume SAX events and serialize them to XML, and in this mode of operation XMLWriter uses a different code path which is easier to customize. The following subclass will give you almost what you want, except that empty elements will not use the short form <element/>. Maybe that can be fixed by further tweaking.
static class ModifiedXmlWriter extends XMLWriter {
// indentLevel is private, need reflection to read it
Field il;
public ModifiedXmlWriter(OutputStream out, OutputFormat format) throws UnsupportedEncodingException {
super(out, format);
try {
il = XMLWriter.class.getDeclaredField("indentLevel");
il.setAccessible(true);
} catch (NoSuchFieldException e) {
throw new RuntimeException(e);
}
}
int getIndentLevel() {
try {
return il.getInt(this);
} catch (IllegalAccessException e) {
throw new RuntimeException(e);
}
}
@Override
protected void writeAttributes(Attributes attributes) throws IOException {
int l = getIndentLevel();
setIndentLevel(l+1);
super.writeAttributes(attributes);
setIndentLevel(l);
}
@Override
protected void writeAttribute(Attributes attributes, int index) throws IOException {
writePrintln();
indent();
super.writeAttribute(attributes, index);
}
}
public static void main(String[] args) throws Exception {
String XML = "<parameters>\n" +
" <parameter name=\"Tom\" city=\"York\" number=\"123\"/>\n" +
"</parameters>";
Document doc = DocumentHelper.parseText(XML);
XMLWriter writer = new ModifiedXmlWriter(System.out, OutputFormat.createPrettyPrint());
SAXWriter sw = new SAXWriter(writer);
sw.write(doc);
}
Sample output:
<?xml version="1.0" encoding="UTF-8"?>
<parameters>
<parameter
name="Tom"
city="York"
number="123"></parameter>
</parameters>
Generally speaking, very few XML serializers give you this level of control over the output format.
You can get something close to this with the Saxon serializer if you specify the options method=xml, indent=yes, saxon:line-length=20. The Saxon serializer is capable of taking a DOM4J tree as input. You will need Saxon-PE or -EE because it requires a serialization parameter in the Saxon namespace. It still won't be exactly what you want because the first attribute will be on the same line as the element name and the others will be vertically aligned underneath the first.

Extend a JDOM Document

For a project at university, I need to parse a GML file. GML files are XML based so I use JDOM2 to parse it. To fit my purposes, I extended org.jdom2.Document like so:
package datenbank;
import java.io.File;
// some more imports
public class GMLDatei extends org.jdom2.Document {
public void saveAsFile() {
// ...
}
public GMLKnoten getRootElement(){
return (GMLKnoten) this.getDocument().getRootElement();
}
public void setRootElement(GMLKnoten root){
this.getDocument().setRootElement(root);
}
}
I also extended org.jdom2.Element and named the subclass GMLKnoten but this does not matter too much for my question.
When testing, I try to load a GML file. When using the native document and element classes, it loads fine, but when using my subclasses, I get the following scenario:
I load the file using:
SAXBuilder saxBuilder = new SAXBuilder();
File inputFile = new File("gml/Roads_Munich_Route_Lines.gml");
GMLDatei document = null;
ArrayList<String> types = new ArrayList<String>();
try {
document = (GMLDatei) saxBuilder.build(inputFile);
} catch (JDOMException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
In the line
document = (GMLDatei) saxBuilder.build(inputFile);
I get a Cast-Exception:
Exception in thread "main" java.lang.ClassCastException: org.jdom2.Document cannot be cast to datenbank.GMLDatei
at datenbank.GMLTest.main(GMLTest.java:27)
I thought that casting should be no problem as I am subclassing org.jdom2.Document. What am I missing?
In general I want to "challenge" your requirement to extend Document: what value do you get from your custom classes that is not already part of the native implementation? I ask this for 2 reasons:
as the maintainer of JDOM, should I be adding some new feature?
I am just curious.....
JDOM has a system in place for allowing you to extend its core classes and have a different implementation of them when parsing a document. It is done by extending the JDOMFactory.
Consider the JDOMFactory interface. When SAXBuilder parses a document it uses those methods to build the document.
There is a default, overridable implementation in DefaultJDOMFactory that you can extend. In your implementation, you must override the non-final "Element" methods like:
@Override
public Element element(final int line, final int col, final String name,
String prefix, String uri) {
return new Element(name, prefix, uri);
}
and instead have:
@Override
public Element element(final int line, final int col, final String name,
String prefix, String uri) {
return new GMLKnoten (name, prefix, uri);
}
Note that you will have to override all methods that are non-final and return content that is to be customised (for example, you will have to override 4 Element methods by my count).
With your own GMLJDOMFactory you can then use SAXBuilder by either using the full constructor new SAXBuilder(null, null, new GMLJDOMFactory()) or by setting the JDOMFactory after you have constructed it with setJDOMFactory(...).
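To see why the cast in the question fails and why a factory fixes it, here is a minimal sketch that uses only the JDK. BaseDoc, GmlDoc and TinyBuilder are hypothetical stand-ins for org.jdom2.Document, GMLDatei and SAXBuilder; none of this is JDOM's actual code. A downcast only succeeds when the object was really created as the subclass, and the builder decides which class gets created.

```java
import java.util.function.Supplier;

// Hypothetical stand-ins for org.jdom2.Document and the GMLDatei subclass.
class BaseDoc { }
class GmlDoc extends BaseDoc { }

// Hypothetical stand-in for a builder such as SAXBuilder: it only creates
// subclass instances if the factory handed to it does so.
final class TinyBuilder {
    private final Supplier<BaseDoc> factory;

    TinyBuilder(Supplier<BaseDoc> factory) {
        this.factory = factory;
    }

    BaseDoc build() {
        return factory.get();
    }
}

final class FactoryCastDemo {
    // Returns true if the built object can be downcast to GmlDoc.
    static boolean castSucceeds(TinyBuilder builder) {
        try {
            GmlDoc doc = (GmlDoc) builder.build();
            return doc != null;
        } catch (ClassCastException e) {
            // The object was created as a plain BaseDoc, so the cast is illegal.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default factory: creates the base class, so the cast fails.
        System.out.println(castSucceeds(new TinyBuilder(BaseDoc::new)));
        // Custom factory: creates the subclass, so the cast works.
        System.out.println(castSucceeds(new TinyBuilder(GmlDoc::new)));
    }
}
```

The same reasoning applies to the real classes: SAXBuilder creates plain Document and Element objects unless the JDOMFactory it uses returns your subclasses.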

Turn off date comment in properties file [duplicate]

Is it possible to force Properties not to add the date comment in front? I mean something like the first line here:
#Thu May 26 09:43:52 CEST 2011
main=pkg.ClientMain
args=myargs
I would like to get rid of it altogether. I need my config files to be diff-identical unless there is a meaningful change.
Guess not. This timestamp is printed in a private method of Properties and there is no property to control that behaviour.
Only idea that comes to my mind: subclass Properties, override store and copy/paste the content of the store0 method so that the date comment will not be printed.
Or - provide a custom BufferedWriter that prints all but the first line (which will fail if you add real comments, because custom comments are printed before the timestamp...)
Given the source code of Properties, no, it's not possible. BTW, since Properties is in fact a hash table and since its keys are thus not sorted, you can't rely on the properties to be always in the same order anyway.
I would use a custom algorithm to store the properties if I had this requirement. Use the source code of Properties as a starter.
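One possible shape for such a custom algorithm, as a rough sketch rather than a full reimplementation of store0: let Properties do the escaping into a StringWriter, then drop the comment lines and sort what is left. The class name StablePropertiesText and the method render are made up for this illustration. It relies on store writing each entry on a single line, and it sorts whole lines, which is close to but not exactly the same as sorting by key.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Properties;
import java.util.stream.Collectors;

final class StablePropertiesText {
    // Renders the properties as sorted key=value lines without the date comment.
    static String render(Properties props) {
        StringWriter buffer = new StringWriter();
        try {
            // store() handles escaping; the comment lines it adds are filtered below.
            props.store(buffer, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Arrays.stream(buffer.toString().split("\\R"))
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .sorted()
                .collect(Collectors.joining("\n", "", "\n"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("main", "pkg.ClientMain");
        props.setProperty("args", "myargs");
        System.out.print(render(props));
    }
}
```

The resulting string can be written to the config file with any Writer; because it contains no timestamp and has a fixed order, two runs with the same properties should produce identical files.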
Based on https://stackoverflow.com/a/6184414/242042 here is the implementation I have written that strips out the first line and sorts the keys.
public class CleanProperties extends Properties {
private static class StripFirstLineStream extends FilterOutputStream {
private boolean firstlineseen = false;
public StripFirstLineStream(final OutputStream out) {
super(out);
}
@Override
public void write(final int b) throws IOException {
if (firstlineseen) {
super.write(b);
} else if (b == '\n') {
firstlineseen = true;
}
}
}
private static final long serialVersionUID = 7567765340218227372L;
@Override
public synchronized Enumeration<Object> keys() {
return Collections.enumeration(new TreeSet<>(super.keySet()));
}
@Override
public void store(final OutputStream out, final String comments) throws IOException {
super.store(new StripFirstLineStream(out), null);
}
}
Cleaning looks like this
final Properties props = new CleanProperties();
try (final Reader inStream = Files.newBufferedReader(file, Charset.forName("ISO-8859-1"))) {
props.load(inStream);
} catch (final MalformedInputException mie) {
throw new IOException("Malformed on " + file, mie);
}
if (props.isEmpty()) {
Files.delete(file);
return;
}
try (final OutputStream os = Files.newOutputStream(file)) {
props.store(os, "");
}
This variant is useful if you modify the given xxx.conf file in place.
The write method skips the first line (#Thu May 26 09:43:52 CEST 2011) emitted by the store method: it discards everything up to the end of the first line, and after that it writes normally.
public class CleanProperties extends Properties {
private static class StripFirstLineStream extends FilterOutputStream {
private boolean firstlineseen = false;
public StripFirstLineStream(final OutputStream out) {
super(out);
}
@Override
public void write(final int b) throws IOException {
if (firstlineseen) {
super.write(b);
} else if (b == '\n') {
// Emit a newline where the stripped first line ended; without this,
// the output continues directly after the existing content of the given file
super.write('\n');
firstlineseen = true;
}
}
}
private static final long serialVersionUID = 7567765340218227372L;
@Override
public synchronized Enumeration<java.lang.Object> keys() {
return Collections.enumeration(new TreeSet<>(super.keySet()));
}
@Override
public void store(final OutputStream out, final String comments)
throws IOException {
super.store(new StripFirstLineStream(out), null);
}
}
Can you not just flag up in your application somewhere when a meaningful configuration change takes place and only write the file if that is set?
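A variant of this idea, sketched with only the JDK: instead of tracking a flag, compare the in-memory properties with what is already on disk and skip the write when they are equal. WriteIfChanged and storeIfChanged are hypothetical names used for illustration. The comparison ignores comments and ordering, so the timestamp line never triggers a rewrite.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

final class WriteIfChanged {
    // Stores the properties only if they differ from the file's current
    // content. Returns true if the file was written.
    static boolean storeIfChanged(Properties current, Path file) {
        try {
            if (Files.exists(file)) {
                Properties onDisk = new Properties();
                try (InputStream in = Files.newInputStream(file)) {
                    onDisk.load(in);
                }
                if (onDisk.equals(current)) {
                    return false; // no meaningful change, leave the file alone
                }
            }
            try (OutputStream out = Files.newOutputStream(file)) {
                current.store(out, null);
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this approach the date comment is still present in the file, but it only changes when a key or value actually changed, so diffs stay quiet otherwise.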
You might want to look into Commons Configuration which has a bit more flexibility when it comes to writing and reading things like properties files. In particular, it has methods which attempt to write the exact same properties file (including spacing, comments etc) as the existing properties file.
You can handle this question by following this Stack Overflow post to retain order:
Write in a standard order:
How can I write Java properties in a defined order?
Then write the properties to a string and remove the comments as needed. Finally write to a file.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
properties.store(baos,null);
String propertiesData = baos.toString(StandardCharsets.UTF_8.name());
propertiesData = propertiesData.replaceAll("(?m)^#.*(\r|\n)+",""); // remove all comment lines ((?m) makes ^ match at the start of every line)
FileUtils.writeStringToFile(fileTarget,propertiesData,StandardCharsets.UTF_8);
// you may want to validate the file is readable by reloading and doing tests to validate the expected number of keys matches
InputStream is = new FileInputStream(fileTarget);
Properties testResult = new Properties();
testResult.load(is);

JUnit testing for HTML parsing

I'm trying to set up unit tests on a web crawler and am rather confused as to how I would test them. (I've only done unit testing once and it was on a calculator program.)
Here are two example methods from the program:
protected static void HttpURLConnection(String URL) throws IOException {
try {
URL pageURL = new URL(URL);
HttpURLConnection connection = (HttpURLConnection) pageURL
.openConnection();
stCode = connection.getResponseCode();
System.out.println("HTTP Status code: " + stCode);
// append to CVS string
CvsString.append(stCode);
CvsString.append("\n");
// retrieve URL
siteURL = connection.getURL();
System.out.println(siteURL + " = URL");
CvsString.append(siteURL);
CvsString.append(",");
} catch (MalformedURLException e) {
e.printStackTrace();
}
}
and:
public static void HtmlParse(String line) throws IOException {
// create new string reader object
aReader = new StringReader(line);
// create HTML parser object
HTMLEditorKit.Parser parser = new ParserDelegator();
// parse A anchor tags whilst handling start tag
parser.parse(aReader, new HTMLEditorKit.ParserCallback() {
// method to handle start tags
public void handleStartTag(HTML.Tag t, MutableAttributeSet a,
int pos) {
// check if A tag
if (t == HTML.Tag.A) {
Object link = a.getAttribute(HTML.Attribute.HREF);
if (link != null) {
links.add(String.valueOf(link));
// cast to string and pass to methods to get title,
// status
String pageURL = link.toString();
try {
parsePage(pageURL); // Title - To print URL, HTML
// page title, and HTTP status
HttpURLConnection(pageURL); // Status
// pause for half a second between pages
Thread.sleep(500);
} catch (IOException e) {
e.printStackTrace();
} catch (BadLocationException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
}, true);
aReader.close();
}
I've set up a test class in Eclipse and have outline test methods along these lines:
@Test
public void testHttpURLConnection() throws IOException {
classToTest.HttpURLConnection( ? );
assertEquals("Result", ? ? )
}
I don't really know where to go from here. I'm not even sure whether I should be testing live URLs or local files.
I found this question here: https://stackoverflow.com/questions/5555024/junit-testing-httpurlconnection
but I couldn't really follow it and I'm not sure it was solved anyway.
Any pointers appreciated.
There is no one conclusive answer to your question, what you test depends on what your code does and how deep you want to test it.
So if you have a parse method that takes an HTML string and returns the string "this is a parsed html" (obviously not very useful, but just making a point), you'll test it like:
@Test
public void testHtmlParseSuccess() throws IOException {
assertEquals("this is a parsed html", classToTest.parse(html)); // the values are equal, so the test passes
}
@Test
public void testHtmlParseFailure() throws IOException {
assertEquals("this is a wrong answer", classToTest.parse(html)); // the values differ, so the test fails
}
There are a lot more methods besides assertEquals(), so you should look at the JUnit Assert documentation.
Eventually it is up to you to decide what parts to test and how to test them.
Think about what effects your methods should have. In the first case, the expected thing that should happen when HttpURLConnection(url) is called seems to be that the status code and URL are appended to something called CvsString. You will have to implement something in CvsString so that you can inspect whether what you expected actually happened.
However: Looking at your code I would suggest you consult a book about unit testing and how to refactor code so that it becomes well testable. In your code snippets I see a lot of reasons why unit testing your code is difficult if not impossible, e. g. overall use of static methods, methods with side effects, very little separation of concerns etc. Because of this it is impossible to answer your question fully in this context.
Don't get me wrong, this isn't meant in an offensive way. It is well worth learning these things; it will improve your coding abilities a lot.
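As a small illustration of what "well testable" could look like here, the sketch below separates the network lookup from the CSV bookkeeping. CrawlReport and its methods are hypothetical names, not part of the question's code, and the CSV layout (url,status per line) is an assumption. The status lookup is passed in as a function, so a test can supply a stub instead of opening a real connection.

```java
import java.util.function.ToIntFunction;

final class CrawlReport {
    // Looks up the HTTP status for a URL; production code would wrap
    // HttpURLConnection here, a test passes a stub.
    private final ToIntFunction<String> statusLookup;
    private final StringBuilder csv = new StringBuilder();

    CrawlReport(ToIntFunction<String> statusLookup) {
        this.statusLookup = statusLookup;
    }

    // Appends one "url,status" line for the given URL.
    void record(String url) {
        int status = statusLookup.applyAsInt(url);
        csv.append(url).append(',').append(status).append('\n');
    }

    // Exposes the accumulated CSV so a test can inspect it.
    String csv() {
        return csv.toString();
    }
}
```

A unit test can then create new CrawlReport(url -> 200), call record(...) and compare csv() with the expected text using assertEquals, with no live URLs involved.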

How do I edit a XML node in a file object, using Java

There are a lot of examples on the internet of "reading" files but I can't find anything on "editing" a node value and writing it back out to the original file.
I have a non-working xml writer class that looks like this:
import org.w3c.dom.Document;
public class RunIt {
public static Document xmlDocument;
public static void main(String[] args)
throws TransformerException, IOException {
try {
xmlDocument = DocumentBuilderFactory.newInstance()
.newDocumentBuilder().parse("thor.xml");
} catch (IOException ex) {
ex.printStackTrace();
} catch (SAXException ex) {
ex.printStackTrace();
} catch (ParserConfigurationException ex) {
ex.printStackTrace();
}
addElement("A", "New");
writeDoc();
}
public static void addElement(String path, String val){
Element e = xmlDocument.createElement(path);
e.appendChild(xmlDocument.createTextNode(val));
xmlDocument.getDocumentElement().appendChild(e);
}
public static void writeDoc() throws TransformerException, IOException {
StringWriter writer = new StringWriter();
Transformer tf;
try {
tf = TransformerFactory.newInstance().newTransformer();
tf.transform(new DOMSource(xmlDocument), new StreamResult(writer));
writer.close();
} catch (TransformerConfigurationException e) {
e.printStackTrace();
} catch (TransformerFactoryConfigurationError e) {
e.printStackTrace();
}
}
}
For this example, let's say this is the XML and I want to add a "C" node (inside the A node) that contains the value "New":
<A>
<B>Original</B>
</A>
You use the Document object to create new nodes. Adding nodes as you suggest involves creating a node, setting its content and then appending it to the root element. In this case your code would look something like this:
Element e = xmlDocument.createElement("C");
e.appendChild(xmlDocument.createTextNode("new"));
xmlDocument.getDocumentElement().appendChild(e);
This will add the C node as a new child of A right after the B node.
Additionally, Element has some convenience functions that reduce the amount of required code. The second line above could have been replaced with
e.setTextContent("new");
More complicated efforts involving non-root elements will involve using XPath to fetch the target node to be edited. If you do start to use XPath to target nodes, bear in mind that the JDK XPath performance is abysmal. Avoid using an XPath of "@foo" in favor of constructs like e.getAttribute("foo") whenever you can.
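A minimal sketch of the XPath approach, using only the JDK's javax.xml APIs. The class XPathAppend and its method are hypothetical helpers written for this illustration. It parses an XML string, locates the parent node with an XPath expression, appends a new text element, and returns the serialized result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XPathAppend {
    // Appends <name>text</name> to the first node matched by parentXPath.
    static String appendChild(String xml, String parentXPath, String name, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node parent = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(parentXPath, doc, XPathConstants.NODE);
            if (parent == null) {
                throw new IllegalArgumentException("No node matches " + parentXPath);
            }
            Element child = doc.createElement(name);
            child.setTextContent(text);
            parent.appendChild(child);

            // Serialize the modified document back to a string.
            Transformer tf = TransformerFactory.newInstance().newTransformer();
            tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            tf.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            // Parser, XPath and transformer failures are wrapped for brevity.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(appendChild("<A><B>Original</B></A>", "/A", "C", "New"));
    }
}
```

For the XML in the question, the expression "/A" selects the root; a deeper target would only need a different expression such as "/A/B".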
EDIT: Formatting the document back to a string which can be written to a file can be done with the following code.
Document xmlDocument;
StringWriter writer = new StringWriter();
TransformerFactory.newInstance().newTransformer().transform(new DOMSource(xmlDocument), new StreamResult(writer));
writer.close();
String xmlString = writer.toString();
EDIT: Re: updated question with code.
Your code doesn't work because you're conflating 'path' and 'element name'. The parameter to Document.createElement() is the name of the new node, not the location in which to place it. In the example code I wrote I didn't get into locating the appropriate node because you were asking specifically about adding a node to the document parent element. If you want your addElement() to behave the way I think you're expecting it to behave, you'd have to add another parameter for the xpath of the target parent node.
The other problem with your code is that your writeDoc() function doesn't have any output. My example shows writing the XML to a String value. You can write it to any writer you want by adapting the code, but in your example code you use a StringWriter but never extract the written string out of it.
I would suggest rewriting your code something like this
// shared by the helper methods below
private static Document xmlDocument;
public static void main(String[] args) throws Exception {
File xmlFile = new File("thor.xml");
xmlDocument = DocumentBuilderFactory.newInstance()
.newDocumentBuilder().parse(xmlFile);
// this is effective because we know we're adding to the
// document root element
// if you want to write to an arbitrary node, you must
// include code to find that node
addTextElement(xmlDocument.getDocumentElement(), "C", "New");
writeDoc(new FileWriter(xmlFile));
}
public static Element addTextElement(Node parent, String element, String val) {
Element e = addElement(parent, element);
e.appendChild(xmlDocument.createTextNode(val));
return e;
}
public static Element addElement(Node parent, String element) {
// the argument is the name of the new element, not a path
Element e = xmlDocument.createElement(element);
parent.appendChild(e);
return e;
}
public static void writeDoc(Writer writer) throws TransformerException, IOException {
try {
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.transform(new DOMSource(xmlDocument), new StreamResult(writer));
} finally {
writer.close();
}
}
In order to write your document back to a file, you'll need an XML serializer or write your own. If you are using the Xerces library, check out XMLSerializer. For sample usage, you can also check out the DOMWriter samples page.
For more information on Xerces, see the Xerces documentation.
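If you would rather not depend on Xerces classes directly, the JDK also ships the DOM Load and Save API (org.w3c.dom.ls), which provides a serializer. The sketch below is a swapped-in alternative to the XMLSerializer mentioned above, not the same class. LsSerializerSketch and its methods are hypothetical names, and it assumes the DOM implementation supports the "LS" feature, which the JDK's built-in parser does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

final class LsSerializerSketch {
    // Serializes a DOM document to a string without an XML declaration.
    static String serialize(Document doc) {
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(doc);
    }

    // Small helper so the example is self-contained: parses XML from a string.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(parse("<A><B>Original</B></A>")));
    }
}
```

The returned string can then be written back to the original file with any Writer.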
