This is my first time using Stack Overflow. I'm a student doing an analytics project, but the company stores all its records in XML, and I have to convert them and write a program that sends an automated report by email.
I'm using Java for the XML parsing, and I'm now trying Apache Commons Digester, because the other parsers I looked at need XSLT for this. I want a program that doesn't depend on XSLT, because the company wants a system that sends a summary report roughly every 5 minutes, and from some of the answers here XSLT may be quite slow. How can I do this with Digester or another method? If possible, please show some example code for the conversion.
Here is the sample code I have built with Digester:
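import java.io.File;
import java.io.IOException;
// Commons Digester 1.x/2.x package; Digester 3 uses org.apache.commons.digester3
import org.apache.commons.digester.Digester;
import org.xml.sax.SAXException;
public class SampleDigester {
    public static void main(String[] args) throws IOException, SAXException {
        new SampleDigester().run();
    }
    public void run() throws IOException, SAXException {
        Digester digester = new Digester();
        // Push this (SampleDigester) instance onto the Digester's object stack,
        // making its methods available to the processing rules.
        digester.push(this);
        // Call addDataSource(...) with five parameters for every <datasource> element.
        digester.addCallMethod("datasources/datasource", "addDataSource", 5);
        digester.addCallParam("datasources/datasource/name", 0);
        digester.addCallParam("datasources/datasource/driver", 1);
        digester.addCallParam("datasources/datasource/url", 2);
        digester.addCallParam("datasources/datasource/username", 3);
        digester.addCallParam("datasources/datasource/password", 4);
        File file = new File("C:\\Users\\1206432E\\Desktop\\datasource.xml");
        // Start parsing the document.
        digester.parse(file);
    }
    // Invoked by Digester once per <datasource>; this body is only a placeholder.
    public void addDataSource(String name, String driver, String url,
                              String username, String password) {
        System.out.println("Data source: " + name + " (" + driver + ", " + url + ")");
    }
}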
Another one, built with DOM, converts XML to CSV but still relies on an XSLT file:
import java.io.File;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.*;
import org.w3c.dom.Document;
public class XmlToCsv {
    public static void main(String[] args) throws Exception {
        File stylesheet = new File("C:\\Users\\1206432E\\Desktop\\Style.xsl");
        File xmlSource = new File("C:\\Users\\1206432E\\Desktop\\data.xml");
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(xmlSource);
        StreamSource stylesource = new StreamSource(stylesheet);
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(stylesource);
        Source source = new DOMSource(document);
        Result outputTarget = new StreamResult(
                new File("C:\\Users\\1206432E\\Desktop\\temp.csv"));
        transformer.transform(source, outputTarget);
    }
}
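For comparison, the same kind of conversion can be done without any XSLT by walking the DOM in plain Java. This is a minimal sketch, assuming the XML has the <datasources>/<datasource> shape used in the Digester example; the column list, class name and CSV quoting rule are illustrative choices, not something from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlToCsvNoXslt {
    // Assumed column names: the child elements used in the Digester example.
    static final String[] COLUMNS = {"name", "driver", "url", "username", "password"};

    /** Converts <datasources><datasource>...</datasource></datasources> into CSV text. */
    public static String toCsv(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder csv = new StringBuilder(String.join(",", COLUMNS)).append('\n');
        NodeList rows = doc.getElementsByTagName("datasource");
        for (int i = 0; i < rows.getLength(); i++) {
            Element row = (Element) rows.item(i);
            for (int c = 0; c < COLUMNS.length; c++) {
                NodeList cell = row.getElementsByTagName(COLUMNS[c]);
                // A missing child element becomes an empty field.
                String value = cell.getLength() > 0 ? cell.item(0).getTextContent().trim() : "";
                if (c > 0) csv.append(',');
                csv.append(quote(value));
            }
            csv.append('\n');
        }
        return csv.toString();
    }

    // Quote a field that contains a comma, double quote or newline (RFC 4180 style).
    static String quote(String v) {
        if (v.contains(",") || v.contains("\"") || v.contains("\n")) {
            return "\"" + v.replace("\"", "\"\"") + "\"";
        }
        return v;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<datasources><datasource><name>db1</name><driver>org.h2.Driver</driver>"
                + "<url>jdbc:h2:mem:a</url><username>sa</username><password></password>"
                + "</datasource></datasources>";
        System.out.print(toCsv(xml));
    }
}
```

Because the conversion is ordinary Java, it can be called from a scheduler every few minutes without loading and compiling a stylesheet each time.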
Related
I am trying to change a single value in a large (5 MB) XML file. I always know the value will be in the first 10 lines, so I do not need to read 99% of the file. Yet doing a partial XML read in Java seems quite tricky.
I have read a lot about XML in Java and the best practices for handling it. However, in this case I am unsure what the best approach would be: DOM, StAX and SAX parsers all seem to have different ideal use cases, and I am not sure which would best suit this problem, since all I need to do is edit one value.
Perhaps I shouldn't use an XML parser at all and just go with a regex, but using regex on XML seems like a pretty bad idea.
I'm hoping someone can point me in the right direction.
Thanks!
I would choose DOM over SAX or StAX simply for the (relative) simplicity of the API. Yes, there is some boilerplate code to get the DOM populated, but once you get past that it is fairly straightforward.
Having said that, if your XML source is 100s or 1000s of megabytes, one of the streaming APIs would be better suited. As it is, 5MB is not what I would consider a large dataset, so go ahead and use DOM and call it a day:
import java.io.File;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
public class ChangeVersion
{
    public static void main(String[] args)
        throws Exception
    {
        if (args.length < 3) {
            System.err.println("Usage: ChangeVersion <input> <output> <new version>");
            System.exit(1);
        }
        File inputFile = new File(args[0]);
        File outputFile = new File(args[1]);
        int updatedVersion = Integer.parseInt(args[2], 10);
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = domFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(inputFile);
        XPathFactory xpathFactory = XPathFactory.newInstance();
        XPath xpath = xpathFactory.newXPath();
        // select the Version attribute of every Project element under the root
        XPathExpression expr = xpath.compile("/PremiereData/Project/@Version");
        NodeList versionAttrNodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
        for (int i = 0; i < versionAttrNodes.getLength(); i++) {
            Attr versionAttr = (Attr) versionAttrNodes.item(i);
            versionAttr.setNodeValue(String.valueOf(updatedVersion));
        }
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.transform(new DOMSource(doc), new StreamResult(outputFile));
    }
}
You can use a StAX parser to write the XML as you read it, replacing content as it is parsed. With StAX, only part of the XML is held in memory at any given time.
import java.io.*;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
public class ReplaceVersion {
    public static void main(String[] args) throws Exception {
        final String newProjectId = "888";
        File inputFile = new File("in.xml");
        File outputFile = new File("out.xml");
        System.out.println("Reading " + inputFile);
        System.out.println("Writing " + outputFile);
        XMLInputFactory inFactory = XMLInputFactory.newInstance();
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();
        // try-with-resources closes the file streams; XMLEventWriter.close()
        // does not close the underlying stream it writes to.
        try (InputStream in = new FileInputStream(inputFile);
             Writer out = new FileWriter(outputFile)) {
            XMLEventReader eventReader = inFactory.createXMLEventReader(in);
            XMLEventWriter writer = factory.createXMLEventWriter(out);
            while (eventReader.hasNext()) {
                XMLEvent event = eventReader.nextEvent();
                boolean useExistingEvent = true; // write the reader's event unchanged?
                // look for our Project element
                if (event.getEventType() == XMLEvent.START_ELEMENT) {
                    StartElement elemEvent = event.asStartElement();
                    Attribute attr = elemEvent.getAttributeByName(QName.valueOf("ObjectID"));
                    Attribute versionAttr = elemEvent.getAttributeByName(QName.valueOf("Version"));
                    // check to see if this is the project we want
                    // TODO: put what logic you want here
                    if ("Project".equals(elemEvent.getName().getLocalPart())
                            && attr != null && attr.getValue().equals("1")
                            && versionAttr != null) {
                        // build a new attribute list that excludes the old Version attribute
                        List<Attribute> newAttrs = new ArrayList<>();
                        Iterator<Attribute> existingAttrs = elemEvent.getAttributes();
                        while (existingAttrs.hasNext()) {
                            Attribute existing = existingAttrs.next();
                            if (!existing.getName().getLocalPart().equals("Version")) {
                                newAttrs.add(existing);
                            }
                        }
                        // add the replacement Version attribute
                        newAttrs.add(eventFactory.createAttribute(versionAttr.getName(), newProjectId));
                        // we're writing our own event instead of the existing one
                        useExistingEvent = false;
                        writer.add(eventFactory.createStartElement(
                                elemEvent.getName(), newAttrs.iterator(), elemEvent.getNamespaces()));
                    }
                }
                // persist the existing event
                if (useExistingEvent) {
                    writer.add(event);
                }
            }
            writer.flush();
            writer.close();
            eventReader.close();
        }
    }
}
I have a writeScore.class which is doing the following:
Gets the URL, and then an InputStream, of the relative file score.xml
Builds a new XML document by parsing the existing one
Appends a new record to the document
The last thing the program should do is write the new document in place of the old one. Sadly, I can't find any way to use the relative path I already have. I've tried using URLConnection and then .getOutputStream(), but I get a "protocol doesn't support output" error. I also tried IOUtils.copy(is, os) to copy the InputStream into an OutputStream; although there are no errors, for some reason the file doesn't change, and its last-modified date points to the last time I wrote it using the direct file address. (I did a system-wide search in case a new file was created somewhere else, and found nothing.)
Here is simplified code using the URLConnection method:
public class writeScore {
    public writeScore(Score NewScore, Interface obj) {
        try {
            // prepare file path
            URL scoreUrl = new URL("file","localhost","save/score.xml");
            InputStream is = scoreUrl.openStream();
            final URLConnection scoreCon = scoreUrl.openConnection();
            // throws UnknownServiceException ("protocol doesn't support output") for file: URLs
            final OutputStream os = scoreCon.getOutputStream();
            // build the document
            DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
            Document doc = docBuilder.parse(is);
            // root element
            Element root = doc.getDocumentElement();
            Element rootElement = doc.getDocumentElement();
            // add new save
            /* removed for code cleaning */
            // save source
            DOMSource source = new DOMSource(doc);
            // set up the transformer
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            // add properties of the output file
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            // save the file
            StreamResult result = new StreamResult(os);
            transformer.transform(source, result);
        } catch (Exception e) {
            // exception handling removed for brevity
            e.printStackTrace();
        }
    }
}
If need be, I'm happy to change the way I read the file as well, as long as I can use a relative path from Java/bin/game/writeScore.class to Java/save/score.xml, where 'Java' is my project directory. It also has to work once the game is packaged into a .jar, with save as a folder outside that jar.
Just use new File("name"), which resolves the name against the current working directory; new File("../name") refers to a file in the parent directory. You then need to wrap the File in a FileOutputStream.
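Applied to the score file, that could look roughly like this. It is a minimal sketch: the <score> element and its player/points attributes are hypothetical stand-ins for the record-building code that was removed from the question. The relative path is resolved against the directory the JVM was started from, which is not necessarily the folder the .jar lives in.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ScoreFile {
    /** Appends a (hypothetical) <score> record to the file and writes it back in place. */
    public static void appendScore(File file, String player, int points) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
        Element score = doc.createElement("score"); // hypothetical element and attribute names
        score.setAttribute("player", player);
        score.setAttribute("points", Integer.toString(points));
        doc.getDocumentElement().appendChild(score);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // parse(File) has already read the whole file, so overwriting it now is safe.
        try (OutputStream os = new FileOutputStream(file)) {
            transformer.transform(new DOMSource(doc), new StreamResult(os));
        }
    }

    public static void main(String[] args) throws Exception {
        // Relative to the current working directory, e.g. the project directory "Java".
        appendScore(new File("save/score.xml"), "player1", 42);
    }
}
```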
But I would consider using JAXB.unmarshal(file, clazz) for reading and JAXB.marshal(object, file) for writing. Just delete the old file before you write the new one; I have never had trouble updating or overwriting files this way.
Here is how I did it, with the exception handling and logging removed. It is pretty concise and generic, too.
public static <T> void writeXml(T obj, File f)
{
JAXB.marshal(obj, f);
}
I have an XML file. I need to read it, make some changes, and write the changed version to a new destination.
I managed to read, parse and patch this file (with DocumentBuilderFactory, DocumentBuilder, Document and so on).
But I cannot find a way to save that file. Is there a way to get its plain-text view (as a String), or is there a better way?
Something like this works:
Transformer transformer = TransformerFactory.newInstance().newTransformer();
Result output = new StreamResult(new File("output.xml"));
Source input = new DOMSource(myDocument);
transformer.transform(input, output);
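Since the question also asks for the document as a String, the same identity Transformer can write into a StringWriter instead of a File. A minimal sketch; the class name and the choice to omit the XML declaration are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomToString {
    /** Serializes a DOM Document to a String with the JDK's identity Transformer. */
    public static String toXmlString(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Leave out <?xml ...?> so the String holds only the element markup.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<cv><name>Old</name></cv>")));
        doc.getElementsByTagName("name").item(0).setTextContent("New"); // patch the DOM
        System.out.println(toXmlString(doc));
    }
}
```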
This will also work, provided you're using Xerces-J (note that its org.apache.xml.serialize package is deprecated):
public String serialise(org.w3c.dom.Document document) throws java.io.IOException {
    java.io.ByteArrayOutputStream data = new java.io.ByteArrayOutputStream();
    org.apache.xml.serialize.OutputFormat of =
        new org.apache.xml.serialize.OutputFormat("XML", "ISO-8859-1", true);
    of.setIndent(1);
    of.setIndenting(true);
    org.apache.xml.serialize.XMLSerializer serializer =
        new org.apache.xml.serialize.XMLSerializer(data, of);
    // As a DOM Serializer
    serializer.asDOMSerializer();
    serializer.serialize(document);
    // decode the bytes with the same encoding the serializer wrote
    return data.toString("ISO-8859-1");
}
dom4j's XMLWriter and OutputFormat let you define the XML output format (here document is an org.dom4j.Document):
new XMLWriter(new FileOutputStream(fileName),
new OutputFormat(){{
setEncoding("UTF-8");
setIndent(" ");
setTrimText(false);
setNewlines(true);
setPadText(true);
}}).write(document);
I want to change my name in an XML CV file, but when I use these statements:
XMLOutputFactory xof = XMLOutputFactory.newInstance();
XMLStreamWriter xtw = null;
xtw = xof.createXMLStreamWriter(new FileWriter("eman.xml"));
all the content of the file is removed and it becomes empty. Basically, I want to open eman.xml for modification without erasing its content.
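If you want to use StAX to process your XML file, you should know that StAX can only read from a file or write to a file; it cannot modify a file in place. (new FileWriter("eman.xml") truncates the file as soon as it is opened, which is why it ends up empty.) If you want to modify the XML when a specific StAX event occurs, you could do that processing with DOM.
Here is an example: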
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader xmlReader = factory.createXMLStreamReader(new FileReader(fileName));
int eventType;
while (xmlReader.hasNext()) {
    eventType = xmlReader.next();
    if (eventType == XMLEvent.START_ELEMENT) {
        QName qName = xmlReader.getName();
        if ("YOURTAG".equals(qName.getLocalPart())) {
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = dbFactory.newDocumentBuilder();
            File file = new File("YOURXML.xml");
            Document doc = builder.parse(file);
            // make the required processing for your file
        }
    }
}
xmlReader.close();
The question about reading and writing at the same time with stax is also answered here: How to modify a huge XML file by StAX?
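A common way to "modify" a file with StAX is to stream it into a temporary file and then move the copy over the original, so the original is never truncated while it is still being read. A minimal sketch, assuming a UTF-8 input and that every element with the given local name should get new text content; the class name and the "name" element in main are hypothetical.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

public class StaxRewrite {
    /** Replaces the content of every element named elementName with newText. */
    public static void replaceText(Path file, String elementName, String newText) throws Exception {
        // Write next to the original; a writer opened on the original would truncate it.
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "stax", ".xml");
        XMLEventFactory events = XMLEventFactory.newInstance();
        try (InputStream in = Files.newInputStream(file);
             OutputStream out = Files.newOutputStream(tmp)) {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
            int depth = 0; // > 0 while skipping the old content of a matched element
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (depth > 0) {
                    if (event.isStartElement()) depth++;
                    else if (event.isEndElement()) depth--;
                    if (depth > 0) continue; // drop the old content
                    writer.add(event);       // the matched element's own end tag
                } else {
                    writer.add(event);
                    if (event.isStartElement()
                            && event.asStartElement().getName().getLocalPart().equals(elementName)) {
                        writer.add(events.createCharacters(newText));
                        depth = 1;
                    }
                }
            }
            writer.flush();
            writer.close();
            reader.close();
        }
        // Swap the rewritten copy into place only after both streams are closed.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws Exception {
        replaceText(Paths.get("eman.xml"), "name", "Eman"); // hypothetical element name
    }
}
```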
I'm parsing (a lot of) XML files that contain entity references I don't know in advance (and I can't change that).
For example:
String xml = "<tag>I'm content with &funny; &entity; &references;.</tag>";
When I try to parse this using the following code:
final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
final DocumentBuilder db = dbf.newDocumentBuilder();
final InputSource is = new InputSource(new StringReader(xml));
final Document d = db.parse(is);
I get the following exception:
org.xml.sax.SAXParseException: The entity "funny" was referenced, but not declared.
What I want instead is for the parser to replace every undeclared entity (unknown to the parser) with an empty string.
Or even better, is there a way to pass a map to the parser like:
Map<String,String> entityMapping = ...
entityMapping.put("funny","very");
entityMapping.put("entity","important");
entityMapping.put("references","stuff");
so that I could do the following:
final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
final DocumentBuilder db = dbf.newDocumentBuilder();
final InputSource is = new InputSource(new StringReader(xml));
db.setEntityResolver(entityMapping); // wished-for API: setEntityResolver actually takes an EntityResolver, not a Map
final Document d = db.parse(is);
If I then extracted the text from the document, I should get:
I'm content with very important stuff.
Any suggestions? Of course, I would already be happy just to replace the unknown entities with empty strings.
Thanks,
The StAX API has support for this. Have a look at XMLInputFactory: it has a runtime property, IS_REPLACING_ENTITY_REFERENCES, which dictates whether internal entities are expanded or left in place. If it is set to false, the StAX event stream will contain EntityReference instances representing the unexpanded entities.
If you still want a DOM as the end result, you can chain it together like this:
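import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Node;
XMLInputFactory inputFactory = XMLInputFactory.newInstance();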
inputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
String xml = "my xml";
StringReader xmlReader = new StringReader(xml);
XMLEventReader eventReader = inputFactory.createXMLEventReader(xmlReader);
StAXSource source = new StAXSource(eventReader);
DOMResult result = new DOMResult();
transformer.transform(source, result);
Node document = result.getNode();
In this case, the resulting DOM will contain nodes of org.w3c.dom.EntityReference mixed in with the text nodes. You can then process these as you see fit.
Since your XML input seems to be available as a String, could you not do a simple pre-processing with regular expression replacement?
xml = "...";
/* replace entities before parsing */
for (Map.Entry<String,String> entry : entityMapping.entrySet()) {
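xml = xml.replaceAll(java.util.regex.Pattern.quote("&" + entry.getKey() + ";"), java.util.regex.Matcher.quoteReplacement(entry.getValue())); // quote both so '$' or '\' in a value is taken literally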
}
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
...
It's quite hacky, and you may want to spend some extra effort to ensure that the regexps only match where they really should (think <entity name="&don't-match-me;"/>), but at least it's something...
Of course, there are more efficient ways to achieve the same effect than calling replaceAll() a lot of times.
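One such approach is a single pass with Matcher.appendReplacement, which also covers the "unknown entity becomes an empty string" wish. A minimal sketch: it leaves the five predefined XML entities and numeric character references alone, inserts mapped values as-is (so they should not contain markup), and shares the caveat above about matching inside attributes, comments or CDATA. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityPreprocessor {
    // &name; references; &#...; character references do not match (no letter after '&').
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z_][\\w.-]*);");
    // Entities every XML parser already knows; these must be left for the parser.
    private static final Set<String> PREDEFINED =
            new HashSet<>(Arrays.asList("amp", "lt", "gt", "quot", "apos"));

    /** Replaces each mapped entity with its value and each unknown one with "". */
    public static String expand(String xml, Map<String, String> mapping) {
        Matcher m = ENTITY.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String replacement = PREDEFINED.contains(name)
                    ? m.group()                        // keep &amp;, &lt;, ... unchanged
                    : mapping.getOrDefault(name, "");  // unknown entity -> empty string
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("funny", "very");
        mapping.put("entity", "important");
        System.out.println(expand("<tag>I'm content with &funny; &entity; &references;.</tag>", mapping));
    }
}
```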
You could add the entity declarations at the beginning of the file. Look here for more info.
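Declaring the entities at the beginning means prepending an internal DTD subset before parsing. A minimal sketch, assuming the input has no XML declaration or DOCTYPE of its own, that you pass the root element's name, and that the replacement values contain no double quote, '&' or '%' characters. Every entity must be in the map, since undeclared ones still fail. The class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class InternalSubset {
    /** Prepends <!DOCTYPE root [<!ENTITY ...>...]> built from the map, then parses. */
    public static Document parseWithEntities(String xml, String rootName,
                                             Map<String, String> entities) throws Exception {
        StringBuilder dtd = new StringBuilder("<!DOCTYPE ").append(rootName).append(" [");
        for (Map.Entry<String, String> e : entities.entrySet()) {
            dtd.append("<!ENTITY ").append(e.getKey())
               .append(" \"").append(e.getValue()).append("\">");
        }
        dtd.append("]>");
        // The default DocumentBuilderFactory expands internal entities into text.
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(dtd + xml)));
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> entities = new LinkedHashMap<>();
        entities.put("funny", "very");
        entities.put("entity", "important");
        entities.put("references", "stuff");
        Document d = parseWithEntities(
                "<tag>I'm content with &funny; &entity; &references;.</tag>", "tag", entities);
        System.out.println(d.getDocumentElement().getTextContent());
    }
}
```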
You could also take a look at this thread, where someone seems to have implemented the EntityResolver interface (you could also implement EntityResolver2) to process the entities on the fly (e.g. with your proposed Map).
Warning: there is a bug in JDK 6 affecting this, but you could try it with JDK 5.