Groovy script to find and replace characters in an XML file - java

I have the Groovy script snippet below. I want to replace some characters inside a list of XML files. How could I do this?
println "Remove Invisible characters in CustomMetadata"
def customMetadata = ant.fileScanner {
fileset(dir: '${target.dir}') {
include(name: 'customMetadata/*.md')
}
}
// m is the file
for (m in customMetadata) {
// Want to get the content of the file and replace any of the specified characters
println("Found file $m")
}

If you need to replace characters in the whole file, just read it with
def content = new File('[your file name]').text
use replaceAll() to replace your characters via a regular expression (it returns a new string, so assign the result back to content), and write the file back with
new File('[your file name]').write(content)
For replacing unnecessary whitespace, this should work.
A "cleaner" solution would be to parse the file, replace the characters in the XML content and write it back. This is more complicated and might lead to some problems with XML namespaces. To give it a try, search for XmlSlurper or XmlParser: http://www.groovy-lang.org/processing-xml.html
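The read, replace, write-back steps above can be sketched as follows, written in Java to match the rest of this page. This is only a minimal sketch: the class name InvisibleCharCleaner is hypothetical, and the character class is an assumption about what "invisible" means here (zero-width characters and the byte order mark), so adjust it to the characters you actually need to remove.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

public class InvisibleCharCleaner {
    // Assumption: "invisible" means U+200B..U+200D (zero-width characters)
    // and U+FEFF (byte order mark). Change this class to suit your files.
    private static final Pattern INVISIBLE = Pattern.compile("[\\u200B-\\u200D\\uFEFF]");

    // Returns the content with every matching character removed.
    static String clean(String content) {
        return INVISIBLE.matcher(content).replaceAll("");
    }

    // Reads the file as UTF-8, cleans it, and rewrites it only if something changed.
    static void cleanFile(Path file) throws IOException {
        String original = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String cleaned = clean(original);
        if (!cleaned.equals(original)) {
            Files.write(file, cleaned.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Inside the loop from the question you would call cleanFile once per scanned file, passing the file's path.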

Related

How to print the escape characters as it is while using PrettyPrintWriter?

I am using PrettyPrintWriter to pretty print the XML file.
In the generated XML file the ' (apostrophe) is written as &apos;
I want it to be printed as '
I am using the following to pretty print:
xstream.marshal(obj, new PrettyPrintWriter(writer))
Any suggestions on how to print the escape characters as they are?
You can provide your own implementation of PrettyPrintWriter, which extends that class and overrides its writeText(QuickWriter, String) method.
In its most basic form that would be something like this:
import com.thoughtworks.xstream.core.util.QuickWriter;
import com.thoughtworks.xstream.io.xml.PrettyPrintWriter;
import java.io.Writer;
public class MyPrettyPrintWriter extends PrettyPrintWriter {
public MyPrettyPrintWriter(Writer writer) {
super(writer);
}
@Override
public void writeText(QuickWriter writer, String string) {
writer.write(string);
}
}
You would use this as follows:
String s = "Foo'Bar";
XStream xstream = new XStream();
FileWriter writer = new FileWriter("my_demo.xml");
xstream.marshal(s, new MyPrettyPrintWriter(writer));
The output file contains the following:
<string>Foo'Bar</string>
This is basic - it just passes the tag contents through to the file unchanged - nothing is escaped.
You are OK for content containing ", ' and >. But this will be a problem for text containing < and &, which should still be escaped. So you can enhance your writeText method to handle those cases as needed. See What characters do I need to escape in XML documents? for more details.
Note also this is only for text values - not for XML attributes. There is a separate writeAttributeValue method for that (probably not needed in your scenario).
It is worth adding: There should be no need to do any of this. The XML is valid, with escaped values such as &apos;. Any process (any half-way decent XML library or tool) reading that data should handle them correctly.
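If you do go down this route, the enhancement mentioned above could look something like the helper below. It is a minimal sketch, not part of XStream: MinimalXmlEscaper and escapeText are hypothetical names, and the idea is that the overridden writeText would call writer.write(MinimalXmlEscaper.escapeText(string)) instead of writing the raw string. It escapes only & and <, leaves ', " and > alone, and does not handle the rare ]]> sequence, where > must also be escaped.

```java
public class MinimalXmlEscaper {
    // Escapes only the characters that are never allowed unescaped in
    // XML text content: '&' and '<'. Apostrophes and quotes pass through.
    static String escapeText(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }
}
```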

How to validate a file name in java

I am working on a Coverity issue that requires me to validate a file name
using a regex in Java. My application supports .pdf, .txt, .csv, etc. The
file name comes from the user as xxx.txt. I want to validate that the file
name has a proper extension and does not include any special character other
than the dot (e.g. .txt).
filePath = properties.getProperty("DOCUMENT.LIBRARY.LOCATION");
String fileName = (String) request.getParameter("read");
Only if the file name passes proper validation should the code below run.
filePath += "/" + fileName;
This is a terrible answer as it only verifies the filename ends with the desired extension, but doesn't verify the rest of the filename as requested in the original question. Something more like this would be MUCH better:
fileName.matches("[-_. A-Za-z0-9]+\\.(pdf|txt|csv)");
This ensures the filename contains only ONE OR MORE -, _, PERIOD, SPACE, or alphanumeric characters, followed by exactly one of .pdf, .txt or .csv at the end of the filename. Your system might allow other characters in filenames and you could add them to this list if desired. An alternate, less secure approach is to prevent 'bad' characters something like:
fileName.matches("[^/\\\\]+\\.(pdf|txt|csv)");
Which simply prevents / or \ characters from being in the file name before the required ending extension. But this doesn't prevent potentially other dangerous characters, like NULL bytes, for example.
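As a minimal sketch, the whitelist pattern can be wrapped in a small helper so the check happens once, before the name is appended to the path. The class and method names here are hypothetical, and the allowed character set is the one from this answer.

```java
import java.util.regex.Pattern;

public class FileNameValidator {
    // Whitelist: one or more of -, _, period, space or alphanumerics,
    // followed by exactly one of the allowed extensions.
    private static final Pattern ALLOWED =
            Pattern.compile("[-_. A-Za-z0-9]+\\.(pdf|txt|csv)");

    // matches() requires the whole name to fit the pattern, so a name
    // containing / or \ is rejected outright.
    static boolean isValid(String fileName) {
        return fileName != null && ALLOWED.matcher(fileName).matches();
    }
}
```

In the question's code you would only run filePath += "/" + fileName when isValid(fileName) returns true.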
Have a look at String.endsWith() method
if (fileName.endsWith(".pdf")) {
// do something
}
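Or use the method String.matches(). Note that matches() must match the whole string, so the pattern needs a leading .* to cover the name itself:
fileName.matches(".*\\.(pdf|txt|csv)")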

Java regex for Windows file path

I'm trying to build a Java regex to search a .txt file for a Windows formatted file path, however, due to the file path containing literal backslashes, my regex is failing.
The .txt file contains the line:
C\Windows\SysWOW64\ntdll.dll
However, some of the filenames in the text file are formatted like this:
C\Windows\SysWOW64\ntdll.dll (some developer stuff here...)
So I'm unable to use String.equals
To match this line, I'm using the regex:
String filename = "C\\Windows\\SysWOW64\\ntdll.dll";
read = BufferedReader.readLine();
if (Pattern.compile(Pattern.quote(filename), Pattern.CASE_INSENSITIVE).matcher(read).find()) {
I've tried escaping the literal backslashes, using the replace method, i.e:
filename.replace("\\", "\\\\");
However, this fails to find a match. I'm guessing this is because I need to further escape the backslashes after the Pattern has been built. I'm thinking I might need to escape up to an additional four backslashes, i.e.:
Pattern.replaceAll("\\\\", "\\\\\\\\");
However, each time I try, the pattern doesn't get matched. I'm certain it's a problem with the backslashes, but I'm not sure where to do the replacement, or if there's a better way of building the pattern.
I think the problem is further compounded because the replaceAll method also uses a regex, which means the pattern will have its own backslashes in there, to deal with the case insensitivity.
Any input or advice would be appreciated.
Thanks
Seems like you're attempting to do a direct comparison of one String against another. For exact matches, you could do
if (read.equalsIgnoreCase(filename)) {
or simply
if (read.startsWith(filename)) {
Try this :
While reading each line from the file, replace '\' by '\\'.
Then :
String lLine = "C\\Windows\\SysWOW64\\ntdll.dll";
Pattern lPattern = Pattern.compile("C\\\\Windows\\\\SysWOW64\\\\ntdll\\.dll");
Matcher lMatcher = lPattern.matcher(lLine);
if(lMatcher.find()) {
System.out.println(lMatcher.group());
}
lLine = "C\\Windows\\SysWOW64\\ntdll.dll (some developer stuff here...)";
lMatcher = lPattern.matcher(lLine);
if(lMatcher.find()) {
System.out.println(lMatcher.group());
}
The correct usage will be:
String filename = "C\\Windows\\SysWOW64\\ntdll.dll";
String file = filename.replace('\\', ' ');
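For what it's worth, the Pattern.quote approach from the question needs no extra escaping, provided the search string holds the path with single literal backslashes (written as \\ in Java source). Pattern.quote treats every character as a literal, so no further replace or replaceAll is required. A minimal sketch, with hypothetical class and method names:

```java
import java.util.regex.Pattern;

public class PathLineMatcher {
    // Returns true if the line contains the path, ignoring case.
    // Pattern.quote makes backslashes and dots in the path literal.
    static boolean containsPath(String line, String path) {
        return Pattern.compile(Pattern.quote(path), Pattern.CASE_INSENSITIVE)
                .matcher(line)
                .find();
    }
}
```

If this still fails on real input, it is worth printing the line read from the file to check that it really contains what you expect.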

Forcing escaped characters when writing to XML

I'm using org.w3c and javax.xml.parsers in Java for reading and writing xml files.
When I read an XML file, the escaped line breaks (&#10;) are replaced by real line breaks. When I write the content back to the file, I lose the escaping and the content of the file changes unintentionally.
So
<somenode>First line.&#10;Second line</somenode>
will be replaced by:
<somenode>First line.
Second line.</somenode>
Before writing xml content back to disk I tried:
String content = node.getTextContent().replace("\n", "&#10;");
node.setTextContent(content);
Of course it does not work, it will be escaped to &amp;#10; in the file.
I do not want to litter the file with CDATA tags!
What I want to do is legal XML output so there has to be a way to do it.
Thanks in advance for any ideas :)
Do it by setting the following property for the JAXB Marshaller:
marshaller.setProperty("jaxb.encoding", "Unicode");

How to place an XML file in form of a String inside Java file

I am using Eclipse IDE
I have a big XML file.
I want to copy this XML file and provide it in the form of a String.
String XMLStringSource = "XML Content Here" ;
I am getting errors with the double quotes in the XML file. Please tell me how we can resolve this.
You should not do that. In fact, it is impossible beyond a certain size as there is a limit of 64KB on the bytecode of methods (which include initializers).
The correct way to do it is put the XML file next to the source code and use Class.getResourceAsStream() to read the file.
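A minimal sketch of that approach follows. The class name XmlResourceLoader and the resource name in the usage note are hypothetical, and the sketch assumes the file is UTF-8 encoded.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class XmlResourceLoader {
    // Reads the whole stream into a String (UTF-8 assumed) and closes it.
    static String readAll(InputStream in) {
        try (InputStream input = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = input.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks the resource up relative to this class, next to the source code.
    static String loadResource(String name) {
        InputStream in = XmlResourceLoader.class.getResourceAsStream(name);
        if (in == null) {
            throw new IllegalArgumentException("Resource not found: " + name);
        }
        return readAll(in);
    }
}
```

With a file called, say, data.xml in the same package, String xml = XmlResourceLoader.loadResource("data.xml"); gives you the content with its double quotes intact and no escaping needed.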
You can configure Eclipse to escape text when pasting into a string literal.
Go to Window > Preferences > Java > Editor > Typing.
Select the checkbox which says "Escape text when pasting into a string literal".
Press Apply.
Now create a String literal e.g.
String xml = "";
Copy your xml and paste it inside the quotes. Eclipse will automatically escape it for you.
This is quite handy for small bits of xml or text.
If you have a large file, then you should read the file into a string instead.
You will need to escape the quotes. But this will change the look of the XML; it will be a combination of Java String and XML. Also, if the XML file is as big as you say, you will need to do a search and replace of quotes " with escaped quotes \" before pasting into the Java file.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// try-with-resources closes the reader even if reading fails
try (BufferedReader reader = new BufferedReader(new FileReader("D:\\File.xml"))) {
    String y;
    // each line is held in y exactly as it appears in the file, quotes included
    while ((y = reader.readLine()) != null) {
        System.out.println(y);
    }
} catch (IOException e) {
    e.printStackTrace();
}
Is this acceptable? Just read the file line by line; the String variable will contain the double quotes and they won't generate an error. Reading the file this way avoids the escaping problem.
