Externalize XML construction from a stream of CSV in Java


I get a stream of values as CSV. Based on some condition, I need to generate an XML document that includes only a subset of the values from the CSV. For example:
Input: a:value1, b:value2, c:value3, d:value4, e:value5
if (condition1)
XML O/P = <Request><ValueOfA>value1</ValueOfA><ValueOfE>value5</ValueOfE></Request>
else if (condition2)
XML O/P = <Request><ValueOfB>value2</ValueOfB><ValueOfD>value4</ValueOfD></Request>
I want to externalize the process so that, given a template, the output XML is generated accordingly. String manipulation is the easiest way to implement this, but I don't want to produce broken XML if special characters appear in the input. Please suggest an approach.

Perhaps you could benefit from a templating engine, something like Apache Velocity.

I would suggest creating an xsd and using JAXB to create the Java binding classes that you can use to generate the XML.

I recommend my own templating engine, JATL (http://code.google.com/p/jatl/). Although it's geared to (X)HTML, it's also very good at generating XML.
I didn't solve the whole problem for you (that is, splitting the input twice, first on "," and then on ":"), but this is how you would use JATL:
final String a = "stuff";
HtmlWriter html = new HtmlWriter() {
    @Override
    protected void build() {
        // If condition1: emit only the elements this condition needs
        start("Request").start("ValueOfA").text(a).end().end();
    }
};
// Now write.
StringWriter writer = new StringWriter();
String results = html.write(writer).getBuffer().toString();
Which would generate
<Request><ValueOfA>stuff</ValueOfA></Request>
All the correct escaping is handled for you.
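For comparison, the JDK's own StAX API (`javax.xml.stream.XMLStreamWriter`) also escapes text for you, without a third-party jar. The sketch below also does the double split the answer left out. It assumes values never contain commas (no CSV quoting) and that a "template" is simply an ordered map from CSV key to element name, for example one map per condition loaded from configuration. `CsvToXml`, `parse` and `buildRequest` are hypothetical names, not part of any library.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical helper: "key:value" CSV -> <Request> XML, driven by a template map. */
final class CsvToXml {

    /** Splits "a:value1, b:value2" first on ',' and then on the first ':'. */
    static Map<String, String> parse(String csv) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String pair : csv.split(",")) {
            String[] kv = pair.trim().split(":", 2); // limit 2: values may contain ':'
            if (kv.length == 2) {
                values.put(kv[0].trim(), kv[1].trim());
            }
        }
        return values;
    }

    /**
     * Emits only the keys named in the template (CSV key -> element name, in
     * output order). XMLStreamWriter escapes '<', '&' etc. in text content.
     */
    static String buildRequest(Map<String, String> values, Map<String, String> template)
            throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartElement("Request");
        for (Map.Entry<String, String> field : template.entrySet()) {
            String value = values.get(field.getKey());
            if (value != null) { // skip keys missing from this CSV record
                xml.writeStartElement(field.getValue());
                xml.writeCharacters(value);
                xml.writeEndElement();
            }
        }
        xml.writeEndElement();
        xml.flush();
        xml.close();
        return out.toString();
    }
}
```

Switching conditions then means passing a different template map, so adding or reordering output elements becomes a configuration change rather than a code change.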

Related

Interpolate JSON values into a string

I am writing an application/class that will take in a template text file and a JSON value and return interpolated text back to the caller.
The format of the input template text file needs to be determined. For example: my name is ${fullName}
Example of the JSON:
{"fullName": "Elon Musk"}
Expected output:
"my name is Elon Musk"
I am looking for a widely used library/format that can accomplish this.
What format should the template text file be?
What library would support the template text file format defined above and accept JSON values?
It's easy to build my own parser, but there are many edge cases that need to be taken care of, and I do not want to reinvent the wheel.
For example, if we have a slightly complex JSON object with lists, nested values etc. then I will have to think about those as well and implement it.

I have always used the org.json library, found at http://www.json.org/.
It makes it really easy to work with JSON objects.
For example, to create a new object:
JSONObject person = new JSONObject();
person.put("fullName", "Elon Musk");
person.put("phoneNumber", 3811111111L); // too large for an int, so it needs the L suffix
The JSON Object would look like:
{
"fullName": "Elon Musk",
"phoneNumber": 3811111111
}
Retrieving a value from the object is similar:
String name = person.getString("fullName");
You can read out the file with BufferedReader and parse it as you wish.
Hopefully I helped out. :)

This is how we do it (in Groovy, using StrSubstitutor from Apache Commons Lang):
Map inputMap = ["fullName": "Elon Musk"]
String finalText = StrSubstitutor.replace("my name is \${fullName}", inputMap)
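If you want to avoid a dependency, the core of `${...}` substitution fits in a few lines of plain Java. This is a minimal sketch, not a replacement for StrSubstitutor: it assumes the JSON has already been parsed and flattened into a `Map` (for nested objects you might use hypothetical dotted keys such as `address.city`), and it supports no escape syntax or default values. `TemplateInterpolator` is an illustrative name.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateInterpolator {

    // Matches ${name}; the name may contain letters, digits, '_' and '.'.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([\\w.]+)\\}");

    /**
     * Replaces each ${key} with values.get(key). Unknown keys are left as-is,
     * like StrSubstitutor's default behaviour for unresolved variables.
     */
    static String interpolate(String template, Map<String, ?> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = values.get(m.group(1));
            String replacement = value != null ? value.toString() : m.group(0);
            // quoteReplacement stops '$' or '\' in the value being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

On Java 9+, `TemplateInterpolator.interpolate("my name is ${fullName}", Map.of("fullName", "Elon Musk"))` returns "my name is Elon Musk".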

You can try this:
https://github.com/alibaba/fastjson
Fastjson is a Java library that can be used to convert Java Objects into their JSON representation. It can also be used to convert a JSON string to an equivalent Java object. Fastjson can work with arbitrary Java objects including pre-existing objects that you do not have source-code of.

XML generated by xstream.toXml() is printing to one line [duplicate]

I want to format the XML output generated by XStream to make it more readable. Currently, a newline is added after each element, but I would like a newline to be added after every attribute. Is there a way to do this?
PrettyPrintWriter is used by default to format the XML output, but this doesn't suffice for me. I want a newline to be added after every attribute.

XStream includes a PrettyPrintWriter.
After building your XStream...
XStream xstream = //...whatever
Instead of:
// p is my object needing XML serialization
xstream.toXML(p);
Use something like this to make it pretty:
BufferedOutputStream stdout = new BufferedOutputStream(System.out);
PrettyPrintWriter writer = new PrettyPrintWriter(new OutputStreamWriter(stdout));
xstream.marshal(p, writer);
writer.flush(); // the streams are buffered, so flush or nothing may be printed

Take a look at their tutorial on tweaking the output.

I've used this too:
xstream = new XStream(new DomDriver());
But it's not as efficient as StaxDriver.

Read Xml Data and store in Text file Dynamically

I need to read XML data and store it in a text file. In my code I am hard-coding getTagValue for every tag name. With 4 tag names I could hard-code getTagValue, but now I have 200 tags. How can I write the data to a text file without hard-coding getTagValue?

When using DOM to parse the XML, you must know the exact structure of the XML, so there is no real way to avoid what you are doing.
If you have an XSD (if not, you can write one), you can generate Java classes from it using an XML binding framework such as XmlBeans; then, with one line, you can parse the XML and start working with regular Java objects.
Sample code:
File xmlFile = new File("c:\\employees.xml"); // backslashes must be escaped in Java string literals
// Bind the instance to the generated XMLBeans types.
EmployeesDocument empDoc =
EmployeesDocument.Factory.parse(xmlFile);
// Get and print pieces of the XML instance.
Employees emps = empDoc.getEmployees();
Employee[] empArray = emps.getEmployeeArray();
for (int i = 0; i < empArray.length; i++)
{
System.out.println(empArray[i]);
}
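If all you need is name/value lines in a text file, a generic walk over the DOM tree can avoid hard-coding any tag names. A sketch, assuming every value of interest sits in a leaf element (one with no child elements) and that a slash-separated path is an acceptable key in the output; attributes are ignored. `XmlToText` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class XmlToText {

    /** Writes one "path=text" line per leaf element, without naming any tag. */
    static void dumpLeaves(InputStream xml, Writer out)
            throws IOException, ParserConfigurationException, SAXException {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(xml).getDocumentElement();
        walk(root, root.getTagName(), out);
        out.flush();
    }

    private static void walk(Element element, String path, Writer out) throws IOException {
        boolean hasChildElements = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                Element childElement = (Element) child;
                walk(childElement, path + "/" + childElement.getTagName(), out);
            }
        }
        if (!hasChildElements) {
            // Leaf element: its text content is the value to export.
            out.write(path + "=" + element.getTextContent().trim() + "\n");
        }
    }
}
```

To write a file, pass something like `Files.newBufferedWriter(Paths.get("employees.txt"))` as the `Writer`. For `<employees><employee><name>Ann</name></employee></employees>` this writes the line `employees/employee/name=Ann`.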

Are there any advantages to using an XSLT stylesheet compared to manually parsing an XML file using a DOM parser

For one of our applications, I've written a utility that uses Java's DOM parser. It basically takes an XML file, parses it and then processes the data, using the following methods to actually retrieve the data:
getElementsByTagName()
getElementAtIndex()
getFirstChild()
getNextSibling()
getTextContent()
Now I have to do the same thing, but I am wondering whether it would be better to use an XSLT stylesheet. The organisation that sends us the XML file keeps changing its schema, which means we have to change our code to cater for these schema changes. I'm not very familiar with XSLT processing, so I'm trying to find out whether I'm better off using XSLT stylesheets rather than "manual parsing".
XSLT stylesheets look attractive because I think that if the schema of the XML file changes, I will only need to change the stylesheet. Is this correct?
The other thing I would like to know is which of the two (XSLT transformer or DOM parser) performs better. For the manual option, I just use the DOM parser to parse the XML file. How does the XSLT transformer actually parse the file? Does it add overhead compared to parsing the XML file manually? I ask because performance is important given the nature of the data I will be processing.
Any advice?
Thanks
Edit
Basically, what I am currently doing is parsing an XML file and processing the values in some of its elements. I don't transform the XML file into any other format. I just extract some values, fetch a row from an Oracle database and save a new row into a different table. The XML file I parse just contains reference values I use to retrieve some data from the database.
Is XSLT not suitable in this scenario? Is there a better approach that I can use to avoid code changes if the schema changes?
Edit 2
Apologies for not being clear enough about what I am doing with the XML data. Basically, there is an XML file which contains some information. I extract this information from the XML file and use it to retrieve more information from a local database. The data in the XML file is more like reference keys for the data I need in the database. I then take the content I extracted from the XML file, plus the content I retrieved from the database using a specific key from the XML file, and save that data into another database table.
The problem I have is that I know how to write a DOM parser to extract the information I need from the XML file, but I was wondering whether using an XSLT stylesheet was a better option, as I wouldn't have to change the code if the schema changes.
Reading the responses below, it sounds like XSLT is only used for transforming an XML file into another XML file or some other format. Given that I don't intend to transform the XML file, there is probably no need to add the overhead of parsing the XSLT stylesheet as well as the XML file.

Transforming XML documents into other formats is XSLT's reason for being. You can use XSLT to output HTML, JSON, another XML document, or anything else you need. You don't specify what kind of output you want. If you're just grabbing the contents of a few elements, then maybe you won't want to bother with XSLT. For anything more, XSLT offers an elegant solution. This is primarily because XSLT understands the structure of the document it's working on. Its processing model is tree traversal and pattern matching, which is essentially what you're manually doing in Java.
You could use XSLT to transform your source data into the representation of your choice. Your code will always work on this structure. Then, when the organization you're working with changes the schema, you only have to change your XSLT to transform the new XML into your custom format. None of your other code needs to change. Why should your business logic care about the format of its source data?
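A minimal sketch of that idea with the JDK's built-in `javax.xml.transform` API. The source layout (`batch/order/customer/key`) and the target `<refs><ref>` format are hypothetical; the point is that when the partner's schema changes, only `EXAMPLE_XSLT` is edited, while the Java that consumes `<refs>` stays the same. Compiling the stylesheet once into a `Templates` object also avoids re-parsing it for every input file, which may matter given the performance concern. `XmlNormalizer` is an illustrative name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XmlNormalizer {

    // Hypothetical stylesheet mapping the partner's current layout
    // (batch/order/customer/key) onto our own fixed <refs><ref> format.
    static final String EXAMPLE_XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<refs><xsl:for-each select='//order'>"
        + "<ref><xsl:value-of select='customer/key'/></ref>"
        + "</xsl:for-each></refs>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    private final Templates templates; // compiled once; Templates is thread-safe

    XmlNormalizer(String stylesheet) throws TransformerConfigurationException {
        templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(stylesheet)));
    }

    /** Transforms partner XML into our format; business code only reads the result. */
    String normalize(String sourceXml) throws TransformerException {
        StringWriter out = new StringWriter();
        templates.newTransformer().transform(
                new StreamSource(new StringReader(sourceXml)), new StreamResult(out));
        return out.toString();
    }
}
```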

You are right that XSLT's processing model, which is rule-based and event-driven, makes your code more resilient to changes in the schema.
Because it's a different processing model to the procedural/navigational approach that you use with DOM, there is a learning and familiarisation curve, which some people find frustrating; if you want to go this way, be patient, because it will be a while before the ideas click into place. Once you are there, it's much easier than DOM programming.
The performance of a good XSLT processor will be good enough for your needs. It's of course possible to write very inefficient code, just as it is in any language, but I've rarely seen a system where XSLT was the bottleneck. Very often the XML parsing takes longer than the XSLT processing (and that's the same cost as with DOM or JAXB or anything else.)
As others have said, a lot depends on what you want to do with the XML data, which you haven't really explained.

I think that what you need is actually an XPath expression. You could configure that expression in some property file or whatever you use to retrieve your setup parameters.
In this way, you'd just change the XPath expression whenever your customer hides away the info you use in yet another place.
Basically, XSLT is overkill here; you just need an XPath expression. A single XPath expression lets you home in on each value you are after.
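A sketch of keeping the expression outside the code, assuming a hypothetical properties file that maps a logical name to an XPath (for example `maxThread=/config/param[@id='MaxThread']`). Only the JDK's `javax.xml.xpath` API and `java.util.Properties` are used; `ConfiguredLookup` is an illustrative name.

```java
import java.io.StringReader;
import java.util.Properties;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class ConfiguredLookup {
    private final Properties xpaths; // e.g. loaded from a hypothetical xpaths.properties

    ConfiguredLookup(Properties xpaths) {
        this.xpaths = xpaths;
    }

    /** Evaluates the XPath expression configured under the given key. */
    String lookup(String key, String xml) throws XPathExpressionException {
        String expression = xpaths.getProperty(key);
        if (expression == null) {
            throw new IllegalArgumentException("No XPath configured for " + key);
        }
        return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(xml)));
    }
}
```

When the customer moves the value, only the property changes. With the mapping above, `lookup("maxThread", xml)` on the sample config file shown in the update returns "250".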
Update
Since we are now talking about JDK 1.4 I've included below 3 different ways of fetching text in an XML file using XPath. (as simple as possible, no NPE guard fluff I'm afraid ;-)
Starting from the most up to date.
0. First the sample XML config file
<?xml version="1.0" encoding="UTF-8"?>
<config>
<param id="MaxThread" desc="MaxThread" type="int">250</param>
<param id="rTmo" desc="RespTimeout (ms)" type="int">5000</param>
</config>
1. Using JAXP 1.3 (standard part of Java SE 5.0)
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
public class TestXPath {
private static final String CFG_FILE = "test.xml" ;
private static final String XPATH_FOR_PRM_MaxThread = "/config/param[@id='MaxThread']/text()";
public static void main(String[] args) {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
docFactory.setNamespaceAware(true);
DocumentBuilder builder;
try {
builder = docFactory.newDocumentBuilder();
Document doc = builder.parse(CFG_FILE);
XPathExpression expr = XPathFactory.newInstance().newXPath().compile(XPATH_FOR_PRM_MaxThread);
Object result = expr.evaluate(doc, XPathConstants.NUMBER);
if ( result instanceof Double ) {
System.out.println( ((Double)result).intValue() );
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
2. Using JAXP 1.2 (standard part of Java SE 1.4-2)
import javax.xml.parsers.*;
import org.apache.xpath.XPathAPI;
import org.w3c.dom.*;
public class TestXPath {
private static final String CFG_FILE = "test.xml" ;
private static final String XPATH_FOR_PRM_MaxThread = "/config/param[@id='MaxThread']/text()";
public static void main(String[] args) {
try {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
docFactory.setNamespaceAware(true);
DocumentBuilder builder = docFactory.newDocumentBuilder();
Document doc = builder.parse(CFG_FILE);
Node param = XPathAPI.selectSingleNode( doc, XPATH_FOR_PRM_MaxThread );
if ( param instanceof Text ) {
System.out.println( Integer.decode(((Text)(param)).getNodeValue() ) );
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
3. Using JAXP 1.1 (standard part of Java SE 1.4) + JDOM + Jaxen
You need to add these two jars (available from www.jdom.org in the binary distribution, which includes Jaxen).
import java.io.File;
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.xpath.XPath;
public class TestXPath {
private static final String CFG_FILE = "test.xml" ;
private static final String XPATH_FOR_PRM_MaxThread = "/config/param[@id='MaxThread']/text()";
public static void main(String[] args) {
try {
SAXBuilder sxb = new SAXBuilder();
Document doc = sxb.build(new File(CFG_FILE));
Element root = doc.getRootElement();
XPath xpath = XPath.newInstance(XPATH_FOR_PRM_MaxThread);
Text param = (Text) xpath.selectSingleNode(root);
Integer maxThread = Integer.decode( param.getText() );
System.out.println( maxThread );
} catch (Exception e) {
e.printStackTrace();
}
}
}

Since performance is important, I would suggest using a SAX parser for this. JAXB will give you roughly the same performance as DOM parsing, PLUS it will be much easier and more maintainable. Schema changes should not affect you badly if you are using JAXB either: just get the new schema and regenerate the classes. If you have a bridge between JAXB and your domain logic, the changes can be absorbed in that layer without worrying about XML. I prefer treating XML as just a message used in the messaging layer; all the application code should be agnostic of the XML schema.
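A minimal SAX sketch, using only the JDK, that streams the sample `<config>` file from the XPath answer above into an id-to-value map without building a DOM tree. The `param`/`id` names come from that sample; `ParamHandler` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Streams the config file with SAX; no DOM tree is ever built. */
final class ParamHandler extends DefaultHandler {
    private final Map<String, String> params = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private String currentId; // id of the <param> being read, or null

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("param".equals(qName)) {
            currentId = atts.getValue("id");
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // characters() may be called several times per element, so accumulate.
        if (currentId != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("param".equals(qName) && currentId != null) {
            params.put(currentId, text.toString().trim());
            currentId = null;
        }
    }

    static Map<String, String> parse(InputStream in)
            throws IOException, ParserConfigurationException, SAXException {
        ParamHandler handler = new ParamHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.params;
    }
}
```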

How to parse multiple XML feeds at once from an array of URLs with SAX Parser for Java?

I am working on an Android application that parses one or more XML feeds based on user preferences. Is it possible to parse (using SAX Parser) more than one XML feed at once by providing the parser with an array of URLs of my XML feeds?
If no, what would be an alternative way of listing the parsed items from different XML feeds in one list? An intuitive approach is to use java.io.SequenceInputStream to merge the two input streams. However, this throws a NullPointerException:
try {
URL urlOne = new URL("http://example.com/feedone.xml");
URL urlTwo = new URL("http://example.com/feedtwo.xml");
InputStream streamOne = urlOne.openStream();
InputStream streamTwo = urlTwo.openStream();
InputStream streamBoth = new SequenceInputStream(streamOne, streamTwo);
InputSource sourceBoth = new InputSource(streamBoth);
//Parsing
stream = xmlHandler.getStream();
}
catch (Exception error) {
error.printStackTrace();
}
List<Item> content = stream.getList();
return content;

The tactic of appending the streams before parsing is not likely to work well, as the appended XML will not be valid XML. Each XML input has its own root element, so the appended XML will have multiple roots, which is not permitted in XML. Additionally, it's likely to have multiple XML headers like
<?xml version="1.0" encoding="UTF-8"?>
which is also invalid.
While it's possible to preprocess the input to work around these issues, you're likely better off parsing them separately and dealing with getting the results combined later.
It's possible to make a SAX parser add the parsed elements to an existing list of elements. If you post code in your question showing how you're parsing a single file, we might be able to help figure out how to adjust it to your need for multiple inputs.
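Since the question's handler code isn't shown, here is a hedged sketch of the "parse separately, combine later" approach using only the JDK: one handler instance is given a shared list, and each feed is parsed as its own document. It assumes the text of interest is in `<title>` elements, which is a simplification (a real RSS `<channel>` also has a `<title>`). `FeedMerger` and `TitleHandler` are hypothetical names standing in for the question's `xmlHandler` and `Item`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class FeedMerger {

    /** Appends the text of every <title> element to a list it does not own. */
    static final class TitleHandler extends DefaultHandler {
        private final List<String> titles;
        private StringBuilder current; // non-null while inside <title>

        TitleHandler(List<String> titles) {
            this.titles = titles;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("title".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName) && current != null) {
                titles.add(current.toString().trim());
                current = null;
            }
        }
    }

    /** Parses each feed as its own document; every result lands in one shared list. */
    static List<String> parseAll(List<InputStream> feeds)
            throws IOException, ParserConfigurationException, SAXException {
        List<String> titles = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TitleHandler handler = new TitleHandler(titles);
        for (InputStream feed : feeds) {
            try (InputStream in = feed) { // close each stream after parsing
                parser.parse(in, handler);
            }
        }
        return titles;
    }
}
```

In the question's setting, the list would be built from `urlOne.openStream()` and `urlTwo.openStream()`; on Android this network work has to run off the main thread.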
