I have an XML file, and its structure is like this:
<?xml version="1.0" encoding="MS949"?>
<pmd-cpd>
<duplication lines="123" tokens="123">
<file line="1" path=".."/>
<file line="1" path=".."/>
<codefragment><![CDATA[........]]></codefragment>
</duplication>
<duplication>
...
</duplication>
</pmd-cpd>
I want to delete the 'codefragment' node, because my parser fails with the error 'invalid XML character(0x1)'.
My parsing code is like this:
private void parseXML(File f){
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
Document document = null;
try {
builder = factory.newDocumentBuilder();
document = builder.parse(f);
}catch(...)
The error happens in document = builder.parse(f); so I cannot use the parser to delete the codefragment node.
This is why I want to delete these lines without the parser.
How can I delete this node without using the parser?
This is a followup answer to OP's self-answer, and the comment I made to that answer. Here's the recap, plus some extra:
Never do String += String in a loop. Use StringBuilder.
Read the XML in blocks, not lines.
Don't use String.replaceAll(). It has to recompile the regex every time, a regex you already have. Use Matcher.replaceAll().
Remember to close() the Reader. Better yet, use try-with-resources.
No need to save the clean XML back out, just use it directly.
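Read the file with an explicit charset instead of the platform default. XML is usually in UTF-8, which the code below assumes; the file in the question declares MS949, so use that charset there.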
Don't print and ignore errors. Let caller handle errors.
private static void parseXML(File f) throws IOException, ParserConfigurationException, SAXException {
StringBuilder xml = new StringBuilder();
try (BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(f),
StandardCharsets.UTF_8))) {
Pattern badChars = Pattern.compile("[^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]+");
char[] cbuf = new char[1024];
for (int len; (len = in.read(cbuf)) != -1; )
xml.append(badChars.matcher(CharBuffer.wrap(cbuf, 0, len)).replaceAll(""));
}
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder domBuilder = domFactory.newDocumentBuilder();
Document document = domBuilder.parse(new InputSource(new StringReader(xml.toString())));
// insert code using DOM here
}
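The original goal of the question, dropping the codefragment elements before the parser ever sees them, can also be sketched with a regex over the raw text. This is only a sketch under assumptions: the class and method names are made up for illustration, the whole file is assumed to fit in memory as a String, and the CDATA content is assumed never to contain the literal text </codefragment>.

```java
import java.util.regex.Pattern;

public class CodeFragmentStripper {
    // DOTALL lets '.' match line breaks, since the CDATA content usually spans lines.
    // The reluctant '.*?' stops at the first closing tag, so each element is removed on its own.
    private static final Pattern CODE_FRAGMENT =
            Pattern.compile("<codefragment>.*?</codefragment>", Pattern.DOTALL);

    /** Returns the XML text with every codefragment element removed. */
    public static String stripCodeFragments(String xml) {
        return CODE_FRAGMENT.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = "<pmd-cpd><duplication lines=\"1\" tokens=\"2\">"
                + "<file line=\"1\" path=\"a\"/>"
                + "<codefragment><![CDATA[int x;\u0001\nint y;]]></codefragment>"
                + "</duplication></pmd-cpd>";
        System.out.println(stripCodeFragments(xml));
    }
}
```

Invalid characters outside codefragment would still break the parser, so this complements the character filter above rather than replacing it.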
I solved this problem by removing the bad characters such as 0x01, saving the result as a new XML file, and then parsing the new file.
Because I could not even parse my old XML file, I could not remove the node with the parser.
The code for removing the invalid characters and saving the new file looks like this.
// parse the XML string into a DOM document
public static Document stringToDom(String xmlSource)
throws SAXException, ParserConfigurationException, IOException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
return builder.parse(new InputSource(new StringReader(xmlSource)));
}
//get the file and remove bad characters in it
private static void cleanString(File fileName) {
try {
BufferedReader in = new BufferedReader(new FileReader(fileName));
String xmlLines, cleanXMLString="";
Pattern p = null;
Matcher m = null;
p = Pattern.compile("[^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]");
while (((xmlLines = in.readLine()) != null)){
m = p.matcher(xmlLines);
if (m.find()){
cleanXMLString = cleanXMLString + xmlLines.replaceAll("[^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]", "")+"\n";
}else
cleanXMLString = cleanXMLString + xmlLines+"\n";
}
Document doc = stringToDom(cleanXMLString);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File("\\new\\"+fileName.getName()));
transformer.transform(source, result);
} catch (IOException | SAXException | ParserConfigurationException | TransformerException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
This may not be a good method, since it takes quite a long time even for a small file (under 5 MB).
But if your file is small, you can try this.
I wanted to see if there was a way to convert an XML file containing a SOAP message to a string and then update the values of particular tags. Here are the tags I am talking about:
<o:Username>Bill</o:Username>
<o:Password Type="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText">Hello123</o:Password>
What I had originally done was update the XML file itself with the new username and password, as seen in the code below.
try {
String namespace = "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd";
configProperties.load(SecurityTokenHandler.class.getResourceAsStream(PROPERTIES_FILE));
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
Document requestDoc = documentBuilderFactory.newDocumentBuilder().parse(SecurityTokenHandler.class.getResourceAsStream(SOAP_REQUEST_FILE));
Element docElement = requestDoc.getDocumentElement();
docElement.getElementsByTagNameNS(namespace, "Username").item(0).setTextContent(configProperties.getProperty("username"));
docElement.getElementsByTagNameNS(namespace,"Password").item(0).setTextContent(configProperties.getProperty("password"));
Transformer docTransformer = TransformerFactory.newInstance().newTransformer();
DOMSource source = new DOMSource(requestDoc);
StreamResult result = new StreamResult(SecurityTokenHandler.class.getResource(SOAP_REQUEST_FILE).getFile());
docTransformer.transform(source, result);
} catch(IOException | ParserConfigurationException | SAXException | TransformerException exception) {
LOGGER.error("There was an error loading the properties file", exception);
}
However, I found out later that, as this is a resource file, I'm not allowed to modify the file itself. I have to load the XML, update the username and password values in memory, and then return a byte array of the XML with the updated values (without modifying the original document). Any idea how I can accomplish this?
The solution I came up with was to change the result to a ByteArrayOutputStream rather than the XML file itself. Posting the updated code:
try {
String namespace = "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd";
configProperties.load(SecurityTokenHandler.class.getClassLoader().getResourceAsStream(PROPERTIES_FILE));
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
Document requestDoc = documentBuilderFactory.newDocumentBuilder().parse(SecurityTokenHandler.class.getClassLoader().getResourceAsStream(SOAP_REQUEST_FILE));
Element docElement = requestDoc.getDocumentElement();
docElement.getElementsByTagNameNS(namespace, "Username").item(0).setTextContent(configProperties.getProperty("username"));
docElement.getElementsByTagNameNS(namespace,"Password").item(0).setTextContent(configProperties.getProperty("password"));
Transformer docTransformer = TransformerFactory.newInstance().newTransformer();
try (ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream()) {
StreamResult result = new StreamResult(byteArrayOutputStream);
DOMSource source = new DOMSource(requestDoc);
docTransformer.transform(source, result);
b = byteArrayOutputStream.toByteArray();
}
} catch(IOException | ParserConfigurationException | SAXException | TransformerException exception) {
LOGGER.error("There was an error loading the properties file", exception);
}
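For anyone who wants to try the idea without the resource files, here is a minimal self-contained sketch. The class name, method name and sample credentials are made up for illustration; it takes the SOAP XML as a String instead of a classpath resource and assumes the message contains at least one Username and one Password element in the WS-Security namespace.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SoapCredentialUpdater {
    static final String WSSE_NS =
            "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd";

    /** Parses the XML in memory, replaces the credentials, and returns the serialized bytes. */
    public static byte[] withCredentials(String soapXml, String user, String password)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS to match
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(soapXml)));
        doc.getElementsByTagNameNS(WSSE_NS, "Username").item(0).setTextContent(user);
        doc.getElementsByTagNameNS(WSSE_NS, "Password").item(0).setTextContent(password);
        // Serialize into memory; nothing on disk is touched.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<o:Security xmlns:o=\"" + WSSE_NS + "\">"
                + "<o:Username>Bill</o:Username><o:Password>Hello123</o:Password>"
                + "</o:Security>";
        byte[] updated = withCredentials(xml, "alice", "s3cret");
        System.out.println(new String(updated, StandardCharsets.UTF_8));
    }
}
```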
I'm trying to read in an RSS Feed/XML file into my application. The problem is that there's a BOM (Byte Order Mark) that my inputStream doesn't like and it throws an error which throws another error and everything dies.
Here's the method:
private Document getDomFromXMLString(String xml) {
Document doc = null;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xml));
doc = db.parse(is);
} catch (Exception e) {
e.printStackTrace();
}
return doc;
}
So I'm trying to figure out how to skip the BOM and read in the rest of the file.
If you have a character stream (and a String is one), then skipping the BOM is as easy as stripping the first character when it is the BOM:
if (!xml.isEmpty() && xml.charAt(0) == '\ufeff')
xml = xml.substring(1);
What you should really do, though, is ask the source to fix its feed; the BOM shouldn't be there in the first place.
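If the feed arrives as a byte stream rather than a String, the same idea can be sketched with a PushbackInputStream. This is only a sketch: the class and method names are made up, and it handles the UTF-8 BOM (bytes EF BB BF) only, not the UTF-16 or UTF-32 variants.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class BomSkipper {
    /** Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if one is present. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so keep reading until 3 bytes or EOF
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            // Not a BOM: put the bytes back so the caller sees the stream unchanged
            pushback.unread(head, 0, n);
        }
        return pushback;
    }
}
```

The returned stream can then be wrapped in an InputSource or an InputStreamReader as usual.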
I have made a method for updating a value in my XML file from a GUI.
When I update it, everything seems to work fine and the console prints the correct values.
But when I open the XML file and press refresh, nothing has been updated.
What is my problem?
public void updateObjType(String newTxt, int x) throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
System.out.println("String value : " + newTxt);
System.out.println("Index value : " + x);
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse("xmlFiles/CoreDatamodel.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
// Go thru the Object_types in the XML file and get item x.
NodeList nodeList = (NodeList) xPath.compile("//OBJECT_TYPE/text()")
.evaluate(xmlDocument, XPathConstants.NODESET);
// Set new NodeValue
nodeList.item(x).setNodeValue(newTxt);
String value = nodeList.item(x).getTextContent();
System.out.println(value);
}
this is the output from the console :
Original data : IF150Data
Incoming String value : Data
Index value : 4
updated data : Data
I solved it by using a transformer.
Full solution :
// Update the object type name from the object type list.
public void updateObjType(String newTxt, int x)
throws ParserConfigurationException, SAXException, IOException,
XPathExpressionException {
File file = new File("xmlFiles/CoreDatamodel.xml");
System.out.println("Incoming String value : " + newTxt);
System.out.println("Index value : " + x);
DocumentBuilderFactory builderFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xPath.compile("//OBJECT_TYPE/text()")
.evaluate(xmlDocument, XPathConstants.NODESET);
// Set new NodeValue
nodeList.item(x).setNodeValue(newTxt);
// Save the new updates
try {
save(file, xmlDocument);
} catch (Exception e) {
e.printStackTrace();
}
}
And then the method I added :
public void save(File file, Document doc) throws Exception {
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
String s = writer.toString();
System.out.println(s);
FileWriter fileWriter = new FileWriter(file);
BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
bufferedWriter.write(s);
bufferedWriter.flush();
bufferedWriter.close();
}
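One caveat with the save method above: the Transformer writes an encoding="UTF-8" declaration into the string, while FileWriter encodes with the platform default charset, so the two can disagree for non-ASCII text. A possible variant, sketched here with a made-up class name, hands the Transformer an OutputStream so that it does the encoding itself:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class DomFileSaver {
    /** Serializes the in-memory DOM to the given file, overwriting its content. */
    public static void save(File file, Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // The Transformer encodes the bytes itself, so they match the XML declaration
        try (OutputStream out = new FileOutputStream(file)) {
            transformer.transform(new DOMSource(doc), new StreamResult(out));
        }
    }
}
```

Either way, the point stays the same: setNodeValue only changes the DOM in memory, and the file changes only when the document is serialized back to it.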
I am trying to create an org.w3c.dom.Document from an XML string. I am using "How to convert string to xml file in java" as a basis. I am not getting an exception; the problem is that my document is always null. The XML is system generated and well formed. I wish to convert it to a Document object so that I can add new nodes etc.
public static org.w3c.dom.Document stringToXML(String xmlSource) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputStream input = IOUtils.toInputStream(xmlSource); //uses Apache commons to obtain InputStream
BOMInputStream bomIn = new BOMInputStream(input); //create BOMInputStream from InputStream
InputSource is = new InputSource(bomIn); // InputSource with BOM removed
Document document = builder.parse(new InputSource(new StringReader(xmlSource)));
Document document2 = builder.parse(is);
System.out.println("Document=" + document.getDoctype()); // always null
System.out.println("Document2=" + document2.getDoctype()); // always null
return document;
}
I have tried these things: I created a BOMInputStream, thinking that a BOM was causing the conversion to fail, but passing the BOMInputStream to the InputSource doesn't make a difference. I have even tried passing a literal String of simple XML and still get nothing but null. The toString method returns [#document: null]
I am using Xpages, a JSF implementation that uses Java 6. Full name of Document class used to avoid confusion with Xpage related class of the same name.
Don't rely on what toString is telling you. It provides diagnostic information that it thinks is useful about the current class, which is, in this case, nothing more than...
"["+getNodeName()+": "+getNodeValue()+"]";
Which isn't going to help you. Instead, you will need to try and transform the model back into a String, for example...
String text
= "<fruit>"
+ "<banana>yellow</banana>"
+ "<orange>orange</orange>"
+ "<pear>yellow</pear>"
+ "</fruit>";
InputStream is = null;
try {
is = new ByteArrayInputStream(text.getBytes());
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(is);
System.out.println("Document=" + document.toString()); // prints [#document: null], even though parsing succeeded
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.INDENT, "yes");
tf.setOutputProperty(OutputKeys.METHOD, "xml");
tf.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
ByteArrayOutputStream os = null;
try {
os = new ByteArrayOutputStream();
DOMSource domSource = new DOMSource(document);
StreamResult sr = new StreamResult(os);
tf.transform(domSource, sr);
System.out.println(new String(os.toByteArray()));
} finally {
try {
os.close();
} finally {
}
}
} catch (ParserConfigurationException | SAXException | IOException | TransformerConfigurationException exp) {
exp.printStackTrace();
} catch (TransformerException exp) {
exp.printStackTrace();
} finally {
try {
is.close();
} catch (Exception e) {
}
}
Which outputs...
Document=[#document: null]
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<fruit>
<banana>yellow</banana>
<orange>orange</orange>
<pear>yellow</pear>
</fruit>
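The same goes for getDoctype() in the question: it returns null whenever the XML has no DOCTYPE declaration, so it says nothing about whether parsing worked. A quick way to check the parsed content is to look at the document element instead. A minimal sketch (the class name is made up for illustration):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DocumentInspection {
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<fruit><banana>yellow</banana></fruit>");
        // null, because the XML contains no <!DOCTYPE ...> declaration
        System.out.println(doc.getDoctype());
        // the root element is there: prints "fruit"
        System.out.println(doc.getDocumentElement().getNodeName());
        // and so is its content: prints "yellow"
        System.out.println(doc.getElementsByTagName("banana").item(0).getTextContent());
    }
}
```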
You can try using this: http://www.wissel.net/blog/downloads/SHWL-8MRM36/$File/SimpleXMLDoc.java
I have the XML text below as a string.
<someNode>
<id>A124</id>
<status>404</status>
<message>No data</message>
</someNode>
Is it possible to convert this text into an XML file and archive the generated XML file?
Thanks!
DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = dbfac.newDocumentBuilder();
Document doc = docBuilder.parse(new InputSource(new StringReader(theString)));
public class StringToXML {
public static void main(String[] args) {
String xmlString = "<?xml version=\"1.0\" encoding=\"utf-8\"?><soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\"></soap:Envelope>";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
try
{
builder = factory.newDocumentBuilder();
// Use String reader
Document document = builder.parse( new InputSource(
new StringReader( xmlString ) ) );
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
Source src = new DOMSource( document );
Result dest = new StreamResult( new File( "xmlFileName.xml" ) );
aTransformer.transform( src, dest );
} catch (Exception e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
This information is helpful.
Thanks,
Pavan
It's as simple as this:
String text = "<your><xml>data</xml></your>";
Writer writer = new FileWriter("/tmp/filename.xml");
writer.write(text);
writer.flush();
writer.close();
You can use java.io.FileWriter to save your file.
String fileData = "<sample><xml>data</xml></sample>";
File outputFile = new File("someFile.xml");
BufferedWriter bw = null;
try{
bw = new BufferedWriter(new FileWriter(outputFile));
bw.write(fileData);
}
catch(IOException e)
{
e.printStackTrace();
}
finally
{
try{bw.close();}catch(Exception e){}
}
In case you need to manipulate the XML, do as Kazekage Gaara said:
DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = dbfac.newDocumentBuilder();
Document doc = docBuilder.parse(new InputSource(new StringReader(theString)));
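And to save, you can do as I said above. Note that doc.toString() does not serialize the document (it only returns something like [#document: null]), so to transform the document back to a string, use a Transformer:
StringWriter sw = new StringWriter();
TransformerFactory.newInstance().newTransformer().transform(new DOMSource(doc), new StreamResult(sw));
fileData = sw.toString();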
I would recommend using commons-io. It has a single method that does everything you need.
The code would look something like this (recent commons-io versions take the charset as a third argument):
FileUtils.writeStringToFile(new File("filename.xml"), xml, StandardCharsets.UTF_8);
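If adding a dependency is not an option, roughly the same thing can be sketched with java.nio.file from the standard library (Java 7 or later). The class and method names here are made up for illustration, and UTF-8 is an assumption that should match the encoding declared in the XML.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class XmlStringWriter {
    /** Writes the XML text to the target file as UTF-8, replacing any existing content. */
    public static void writeXml(Path target, String xml) throws IOException {
        Files.write(target, xml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        String xml = "<someNode><id>A124</id><status>404</status>"
                + "<message>No data</message></someNode>";
        writeXml(Paths.get("someFile.xml"), xml);
    }
}
```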