My Java application loads an XML file and then parses the XML.
What I would like to do is a search/replace on the file before I create the SAXBuilder. How can I do this in memory (without having to write to the file)?
Here's my code, and where I envision doing the search/replace :
private String xmlFile = "D:\\mycomputer\\extract.xml";
File myXMLFile = new File(xmlFile);
// TODO
// REPLACE ALL "<content>" in xmlFile with "<content><![CDATA["
// REPLACE ALL "</content>" with "]]></content>"
SAXBuilder builder = new SAXBuilder("org.apache.xerces.parsers.SAXParser");
document = builder.build(myXMLFile); // myXMLFile is already a File
Read the file into memory, do the search/replace, and pass the result to SAXBuilder's build(Reader) method using a StringReader.
You can first read the file into a String with Apache Commons IO, then change the input source for the SAXBuilder, as in the following snippet:
String fileStr = FileUtils.readFileToString(myXMLFile, StandardCharsets.UTF_8); // assumes a UTF-8 file
// replace() is literal; replaceAll() would treat the first argument as a regular expression
fileStr = fileStr.replace("<content>", "<content><![CDATA[");
fileStr = fileStr.replace("</content>", "]]></content>");
SAXBuilder builder = new SAXBuilder("org.apache.xerces.parsers.SAXParser");
// A StringReader avoids re-encoding with the platform default charset, which getBytes() would use
document = builder.build(new StringReader(fileStr));
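If you'd rather not depend on JDOM and Commons IO for this step, the same read/replace/parse-from-memory idea can be sketched with the JDK alone. This is a swap, not the answer's code: it uses the built-in DocumentBuilder instead of SAXBuilder, assumes the file is UTF-8, and the class and method names are made up. It will still break if a <content> body itself contains ]]>.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataWrapper {

    // Wraps every <content> body in a CDATA section (literal replacement, no regex).
    static String wrapContent(String xml) {
        return xml.replace("<content>", "<content><![CDATA[")
                  .replace("</content>", "]]></content>");
    }

    // Reads the file, fixes it up in memory, and parses the result without writing anything to disk.
    static Document parseFixed(Path file) throws Exception {
        String xml = new String(Files.readAllBytes(file), StandardCharsets.UTF_8); // assumes UTF-8
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(wrapContent(xml))));
    }
}
```

With input like `<root><content>1 & 2 < 3</content></root>`, which is not well-formed on its own, parseFixed returns a Document whose <content> text is `1 & 2 < 3`.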
You answered the question yourself: read the whole file into a StringBuilder, perform the replacement in it, and then parse it.
The string can be passed to SAXBuilder using StringReader:
StringBuilder sb = new StringBuilder();
loadFileContent(filePath, sb); // your own helper that appends the file's text to sb
// ... perform the replacements on sb here ...
document = builder.build(new StringReader(sb.toString()));
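loadFileContent isn't shown in the answer, so here is one hypothetical way to write it, together with a helper that performs the replacements directly on the StringBuilder. Both method names are illustrative, and reading the file as UTF-8 is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class InPlaceReplace {

    // Hypothetical loadFileContent: appends the whole file (read as UTF-8) to sb.
    static void loadFileContent(Path filePath, StringBuilder sb) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(filePath, StandardCharsets.UTF_8)) {
            char[] buf = new char[8192];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
    }

    // Replaces every occurrence of target in sb with replacement, in place.
    static void replaceAll(StringBuilder sb, String target, String replacement) {
        int from = 0;
        int idx;
        while ((idx = sb.indexOf(target, from)) != -1) {
            sb.replace(idx, idx + target.length(), replacement);
            from = idx + replacement.length(); // skip the inserted text so it is never re-matched
        }
    }
}
```

Each StringBuilder.replace shifts the rest of the buffer, so this is fine for moderately sized files; with very many matches, building a new String with String.replace is usually just as good.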
P.S. Following up on theglauber's answer:
If the file is really big (~100 MB), it's impractical to read it fully into memory and then parse it into a DOM tree as well. In that case, consider using a SAXParser and doing the replacement while the file is being parsed.
Depending on how large these files are, either read the file into a String, do your replacements in memory, and build the XML from the String; or spawn a new thread that reads the file, does the replacements, and writes out the result, then build the XML from that thread's output.
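A rough sketch of the second option (a background thread feeding the parser) using java.io pipes. Assumptions: each <content> or </content> tag sits on a single line, the JDK's DocumentBuilder stands in for SAXBuilder, and the class and method names are hypothetical. The pipe means the fixed-up text is never held in memory as one String, although a DOM parser still builds the whole tree; for really large files you would hand the same Reader to a SAX parser instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PipedFixup {

    // Starts a thread that copies source line by line into a pipe, applying the
    // <content> -> CDATA replacement on the way, and returns the pipe's read end.
    static Reader fixInBackground(Reader source) throws IOException {
        PipedReader readEnd = new PipedReader(64 * 1024);
        PipedWriter writeEnd = new PipedWriter(readEnd);
        Thread worker = new Thread(() -> {
            try (BufferedReader br = new BufferedReader(source); PipedWriter w = writeEnd) {
                String line;
                while ((line = br.readLine()) != null) {
                    // Assumes a tag never spans two lines.
                    w.write(line.replace("<content>", "<content><![CDATA[")
                                .replace("</content>", "]]></content>"));
                    w.write('\n');
                }
            } catch (IOException e) {
                e.printStackTrace(); // the parser will then see a truncated stream and fail
            }
        });
        worker.setDaemon(true);
        worker.start();
        return readEnd;
    }

    // Parses the fixed-up stream as it is produced by the worker thread.
    static Document parse(Reader source) throws Exception {
        Reader fixed = fixInBackground(source);
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new InputSource(fixed));
    }
}
```

For a file, the source could be something like Files.newBufferedReader(Paths.get(xmlFile), StandardCharsets.UTF_8).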
(I would suggest parsing and modifying the XML tree or using an XML filter, but I suspect you want to do this string-based replacement because the current content of your files is not well-formed XML.)
Related
I'm trying to read an XML file from an Android app, using XOM as the XML library. I'm trying this:
Builder parser = new Builder();
Document doc = parser.build(context.openFileInput(XML_FILE_LOCATION));
But I get nu.xom.ParsingException: Premature end of file. when the file is empty.
I need to parse a very simple XML file, and I'm ready to use another library instead of XOM, so let me know if there's a better one, or just give me a solution to the problem using XOM.
In case it helps, I'm using Xerces to get the parser.
------Edit-----
PS: The purpose of this wasn't to parse an empty file; the file just happened to be empty on the first run, which is what showed this error.
If you follow this post to the end, it seems that this has to do with Xerces and the fact that it's an empty file, and they didn't reach a solution on the Xerces side.
So I handled the issue as follows:
Document doc = null;
try {
Builder parser = new Builder();
doc = parser.build(context.openFileInput(XML_FILE_LOCATION));
}catch (ParsingException ex) { //other catch blocks are required for other exceptions.
//fails to open the file with a parsing error.
//I create a new root element and a new document.
//I fill them with xml data (else where in the code) and save them.
Element root = new Element("root");
doc = new Document(root);
}
Then I can do whatever I want with doc. You can also add extra checks to make sure the cause really is an empty file (for example, check the file size, as suggested in one of sam's comments on the question).
An empty file is not a well-formed XML document. Throwing a ParsingException is the right thing to do here.
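Because an empty file simply isn't XML, one way to keep that exception meaningful is to check for emptiness before parsing. Here is a sketch of that, using the JDK's DOM parser instead of XOM (an assumption, to keep it dependency-free); on Android, context.getFileStreamPath(XML_FILE_LOCATION) should give you the File behind openFileInput. The "root" element name follows the answer above; the class and method names are made up.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class LoadOrCreate {

    // Parses the file if it has content; otherwise starts a new document with a <root> element.
    // Because emptiness is checked up front, a SAXParseException still means "malformed file".
    static Document loadOrCreate(File file) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        if (!file.exists() || file.length() == 0) {
            Document doc = builder.newDocument();
            doc.appendChild(doc.createElement("root"));
            return doc;
        }
        return builder.parse(file);
    }
}
```

The JDK parser behaves the same way as XOM here: parsing an empty file directly throws a SAXParseException reporting a premature end of file.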
I get a SOAP message from a web service, and I can convert the response string to an XML file using the code below. This works fine. But my requirement is not to write the SOAP message to a file: I just need to keep this XML document object in memory and extract some elements for further processing. However, if I just try to access the document object below, it comes back empty.
Can somebody please tell me how I can convert a String to an in-memory XML object (without having to write to a file)?
String xmlString = new String(data);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
try
{
builder = factory.newDocumentBuilder();
// Use String reader
Document document = builder.parse( new InputSource(
new StringReader( xmlString ) ) );
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
Source src = new DOMSource( document );
Result dest = new StreamResult( new File( "xmlFileName.xml" ) );
aTransformer.transform( src, dest );
}
Remove the last five lines of the try block (everything related to the Transformer), and you'll have just the DOM document in memory. Store this document in a field rather than in a local variable.
If that isn't sufficient, then please explain, with code, what you mean by "if I just try to access the document object below, it comes as empty".
JB Nizet is right: the first steps create a DOM out of xmlString. That loads your xmlString (or SOAP message) into an in-memory Document. What the following steps do (everything related to the Transformer) is serialize the DOM to a file (xmlFileName.xml), which is not what you want, right?
When you said that your DOM is empty, I think you tried to print out its content with document.toString(), which returned something like "[#document: null]". That doesn't mean your DOM is empty; it actually contains data. You now need to use the DOM API to access the nodes inside your document. Try something like document.getChildNodes(), document.getElementsByTagName(), etc.
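A minimal sketch of that advice: parse the string once, keep the Document in a field, and pull values out with the DOM API. The class and method names are illustrative, and using getElementsByTagNameNS with the "*" wildcard (so the SOAP prefixes don't matter) is just one choice among several.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SoapHolder {

    // Kept in a field so the parsed message outlives the method that built it.
    private Document document;

    void load(String xmlString) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // SOAP envelopes rely on namespaces
        document = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xmlString)));
    }

    // Returns the text of the first element with this local name in any namespace, or null.
    String firstText(String localName) {
        NodeList nodes = document.getElementsByTagNameNS("*", localName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }
}
```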
I have XML data as a string that has to be parsed. I am converting the XML string to an InputSource using the following code:
StringReader reader1 = new StringReader(xmlstring);
InputSource inputSource1 = new InputSource(reader1);
Then I pass the input source to the builder:
Document doc = builder.build(inputSource1);
I want to use the same inputSource1 in another parser class as well, but I get a "stream closed" error.
How should I handle this? Or is there another way to pass XML data to a parser, other than a file?
Looking at the Javadoc, it seems that InputSource is not designed to be shared and reused by multiple parsers:
standard processing of both byte and character streams is to close them on as part of end-of-parse cleanup, so applications should not attempt to re-use such streams after they have been handed to a parser.
So you will have to create a new InputSource. If you are really reading from a String, there would be no difference in I/O or memory cost anyway.
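In practice this just means creating a fresh InputSource (and StringReader) each time you hand the string to a parser. A small sketch with two JDK parsers; the helper names are made up, and the SAX element counter only stands in for "another parser class".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class FreshSources {

    // The String is reused freely; only the reader wrapped around it is single-use.
    static InputSource sourceFor(String xml) {
        return new InputSource(new StringReader(xml));
    }

    static Document parseWithDom(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(sourceFor(xml));
    }

    // A second, independent parse of the same string with SAX: counts start tags.
    static int countElements(String xml) throws Exception {
        int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(sourceFor(xml), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                count[0]++;
            }
        });
        return count[0];
    }
}
```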
I'm working on a project in which I have to take a raw file from the server and convert it into an XML file.
Is there any tool available in Java that can help me accomplish this task, the way JAXP can be used to parse an XML document?
I guess you will need your objects for later use, so create MyObject, a bean that you load with the values from your raw file, and then write it to someFile.xml:
FileOutputStream os = new FileOutputStream("someFile.xml");
XMLEncoder encoder = new XMLEncoder(os);
MyObject p = new MyObject();
p.setFirstName("Mite");
encoder.writeObject(p);
encoder.close();
Or you can go with a TransformerFactory if you don't need the objects for later use.
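If you only need the XML in memory (or want to check what XMLEncoder produces), the same encoding can target a ByteArrayOutputStream. A sketch, assuming MyObject is a plain JavaBean with a firstName property; the toXml helper and the nesting inside BeanToXml are illustrative.

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class BeanToXml {

    // A JavaBean: public class, public no-arg constructor, getter/setter pair.
    public static class MyObject {
        private String firstName;

        public MyObject() { }

        public String getFirstName() { return firstName; }

        public void setFirstName(String firstName) { this.firstName = firstName; }
    }

    // Serializes the bean to XML in memory; use a FileOutputStream instead to write someFile.xml.
    static String toXml(Object bean) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) { // close() flushes the document
            encoder.writeObject(bean);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8); // XMLEncoder writes UTF-8 by default
    }
}
```

Only properties that differ from a freshly constructed bean are written, which is why the bean needs a public no-arg constructor.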
Yes. This assumes that the text in the raw file is already XML.
You start with the DocumentBuilderFactory to get a DocumentBuilder, and then you can use its parse() method to turn an input stream into a Document, which is an internal XML representation.
If the raw file contains something other than XML, you'll want to scan it somehow (your own code here) and use the stuff you find to build up from an empty Document.
I then usually use a Transformer from a TransformerFactory to convert the Document into XML text in a file, but there may be a simpler way.
JAXP can also be used to create a new, empty document:
Document dom = DocumentBuilderFactory.newInstance()
.newDocumentBuilder()
.newDocument();
Then you can use that Document to create elements, and append them as needed:
Element root = dom.createElement("root");
dom.appendChild(root);
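To tie those steps together, here is a sketch that scans raw lines, builds a Document, and serializes it with a Transformer. The raw format (one name=value pair per line) and the <entry> element are pure assumptions, since the question doesn't say what the file contains.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class RawToXml {

    // Builds <root><entry name="...">value</entry>...</root> from hypothetical "name=value" lines.
    static Document build(List<String> rawLines) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = dom.createElement("root");
        dom.appendChild(root);
        for (String line : rawLines) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // skip lines that don't match the assumed format
            }
            Element entry = dom.createElement("entry");
            entry.setAttribute("name", line.substring(0, eq).trim());
            entry.setTextContent(line.substring(eq + 1).trim());
            root.appendChild(entry);
        }
        return dom;
    }

    // Serializes the Document to XML text; a StreamResult over a File would write it to disk instead.
    static String toXmlString(Document dom) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }
}
```

Because the text goes through the DOM, characters such as < in the raw values are escaped automatically on output.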
But, as Jørn noted in a comment on your question, it all depends on what you want to do with this "raw" file: how it should be turned into XML. Only you know that.
I think it will be fine if you try to load it into an XmlDocument.
I need your expertise once again. I have a Java class that searches a directory for XML files (displaying the files it finds in the Eclipse console window), applies the specified XSLT to them, and sends the output to a directory.
What I want to do now is create an XML document containing the file names and file format types. The format should be something like:
<file>
<fileName> </fileName>
<fileType> </fileType>
</file>
<file>
<fileName> </fileName>
<fileType> </fileType>
</file>
For every file found in the directory, a new <file> element should be created.
Any help is truly appreciated.
Use an XML library. There are plenty around, and the third-party ones are almost all easier to use than Java's built-in DOM API. Last time I used it, JDOM was pretty good. (I haven't had to do much XML recently.)
Something like:
Element rootElement = new Element("root"); // You didn't show what this should be
Document document = new Document(rootElement);
for (Whatever file : files) // "Whatever" stands for your own file type
{
    Element fileElement = new Element("file");
    fileElement.addContent(new Element("fileName").addContent(file.getName()));
    fileElement.addContent(new Element("fileType").addContent(file.getType()));
    rootElement.addContent(fileElement); // attach each <file> to the root
}
String xml = new XMLOutputter().outputString(document); // outputString is an instance method
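For comparison, the same output can be produced with only the JDK's built-in DOM and a Transformer, which also escapes special characters for you. This is a sketch, not the JDOM code above: the <files> root element is an assumption (the question doesn't name one), and so is treating the file extension as the "file type".

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class FileListXml {

    // Assumption: "file type" means the extension after the last dot (empty if there is none).
    static String typeOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    static String toXml(File[] files) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("files"); // the question doesn't name the root element
        doc.appendChild(root);
        for (File f : files) {
            Element file = doc.createElement("file");
            Element name = doc.createElement("fileName");
            name.setTextContent(f.getName()); // & and < are escaped when serialized
            Element type = doc.createElement("fileType");
            type.setTextContent(typeOf(f.getName()));
            file.appendChild(name);
            file.appendChild(type);
            root.appendChild(file);
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

You would call it with the same File[] your directory search already produces, for example the result of listFiles().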
Have a look at DOM and ECS. The following example was adapted to your requirements from here:
XMLDocument document = new XMLDocument();
for (File f : files) {
    document.addElement(new XML("file")
        .addXMLAttribute("fileName", f.getName())
        .addXMLAttribute("fileType", getType(f)) // getType is your own helper; java.io.File has no type method
    );
}
You can use the StringBuilder approach suggested by Vinze, but one caveat is that you need to make sure your file names contain no special XML characters, and escape them if they do (for example, replace < with &lt;, and deal with quotes appropriately).
In this case it probably won't arise and you will get away without it; however, if you ever reuse this code elsewhere, you may get bitten by it. So you might want to look at an XMLWriter class, which will do all the escaping work for you.
Well, just use a StringBuilder:
StringBuilder builder = new StringBuilder();
for(File f : files) {
builder.append("<file>\n\t<fileName>").append(f.getName()).append("</fileName>\n");
[...]
}
System.out.println(builder.toString());
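Combining the StringBuilder approach with the escaping caveat gives something like the following sketch. The <files> root element, the indentation, and using the extension as the file type are assumptions, and escapeXml is a hypothetical helper rather than a library call.

```java
import java.io.File;

public class FileListBuilder {

    // Escapes the five characters that are special in XML text and attribute values.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Builds the <file> list; the "type" is assumed to be the file extension.
    static String toXml(File[] files) {
        StringBuilder builder = new StringBuilder("<files>\n");
        for (File f : files) {
            String name = f.getName();
            int dot = name.lastIndexOf('.');
            String type = dot < 0 ? "" : name.substring(dot + 1);
            builder.append("  <file>\n")
                   .append("    <fileName>").append(escapeXml(name)).append("</fileName>\n")
                   .append("    <fileType>").append(escapeXml(type)).append("</fileType>\n")
                   .append("  </file>\n");
        }
        return builder.append("</files>\n").toString();
    }
}
```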