Remove indentation of xml files - java

I am writing a function to use in my unit tests. I want to compare XML files, but since one of them is created by a third-party library, I want to ignore any differences caused by indentation. So I wrote the following function:
private String normalizeXML(String xmlString) {
String res = xmlString.replaceAll("[ \t]+", " ");
// leading whitespaces are inconsistent in the resulting xmls.
res = res.replaceAll("^\\s+", "");
return res.trim();
}
However, this function does not remove the leading whitespace on each line of the XML.
When I write the function in this way (difference in the first regex):
private String normalizeXMLs(String xmlString) {
String res = xmlString.replaceAll("\\s+", " ");
// leading whitespaces are inconsistent in the resulting xmls.
res = res.replaceAll("^\\s+", "");
return res.trim();
}
It does remove the indentation, but it also collapses the XML into a single line, which makes it very hard to compare the differences.
I just cannot see why the first implementation does not remove the leading whitespace. Any ideas?
EDIT: Even more interesting is that if I make a single line manipulation:
String res = xmlString.replaceAll("^\\s+", "");
This line does not remove any of the indentation!

Rather than trying to manipulate the string representations, it would be safer to use a dedicated XML comparison tool such as XMLUnit, which lets you define exactly which differences are significant and which aren't. Modifying XML data with regular expressions is rarely a good idea; use a proper XML parser that knows all the rules of well-formed XML.

Maybe:
String res = xmlString.replaceAll("[ \\t]+", " ");
That is, write \\t rather than \t.

This one worked for me:
private static String normalizeXMLs(String xmlString) {
String res = xmlString.replaceAll("\\t", "");
return res.trim();
}
Good luck :)
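A likely reason the first version leaves the indentation alone: in Java regular expressions, ^ matches only at the very start of the input unless the MULTILINE flag is set, so replaceAll("^\\s+", "") can only ever touch the first line. The sketch below turns the flag on with (?m) and strips only spaces and tabs, so line breaks survive. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class XmlIndentation {
    // (?m) enables MULTILINE, so ^ matches at the start of every line,
    // not only at the start of the whole string.
    private static final Pattern LEADING_BLANKS = Pattern.compile("(?m)^[ \\t]+");

    /** Removes leading spaces and tabs from each line; line breaks are kept. */
    static String stripLeadingWhitespace(String xml) {
        return LEADING_BLANKS.matcher(xml).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String xml = "<a>\n    <b>text</b>\n\t<c/>\n</a>";
        System.out.println(stripLeadingWhitespace(xml));
    }
}
```

This only evens out indentation. Differences in attribute order or whitespace inside elements would still need an XML-aware comparison.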

Related

Android Java: Extract substring from uri string after particular characters

I would like to extract a substring starting from particular substring.
I'm getting an array of URIs of multiple images from Photo Library via this solution. But the URIs are something like this
content://com.android.providers.media.documents/document/image%3A38
I would like to remove content:// and get only
com.android.providers.media.documents/document/image%3A38
I've searched the Internet but found no good solution. I'd prefer to avoid regex because it's kind of heavy.
At the moment I'd rather not take the substring after the second '/', because that feels kind of hardcoded.
Not sure if I've missed a good solution, but please help.
If you need to get whatever string comes after a certain substring, in this case "content://", you could use the split method.
String string = "content://com.android.providers.media.documents/document/image%3A38";
String uri = string.split("content://")[1];
Or you could use the substring and indexOf methods like in the other answer, but add on the length of the substring.
String string = "content://com.android.providers.media.documents/document/image%3A38";
String sub = "content://";
String uri = string.substring(string.indexOf(sub) + sub.length());
You can just use the substring method in order to create new strings without content://, something like this :
String string = "content://com.android.providers.media.documents/document/image%3A38";
String secondString = string.substring(string.indexOf("com.android"));
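The approaches above assume the prefix is always present: split(...)[1] throws if it is missing, and indexOf returns -1. A slightly more defensive sketch checks with startsWith first. The helper name stripPrefix is hypothetical, not part of any Android API.

```java
final class UriPrefix {
    /** Returns value without prefix if it starts with it, otherwise value unchanged. */
    static String stripPrefix(String value, String prefix) {
        return value.startsWith(prefix) ? value.substring(prefix.length()) : value;
    }

    public static void main(String[] args) {
        String uri = "content://com.android.providers.media.documents/document/image%3A38";
        System.out.println(stripPrefix(uri, "content://"));
    }
}
```

No regex is involved, and a string without the prefix is passed through untouched.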

How to automatically unescape the escape characters in a string

I am receiving data from a service with escape-sequence characters in it. I have managed to eliminate them with this code:
results=results.replace("\\\"", "\"");
if(results.startsWith("\"")) {
results=results.substring(1,results.length());
}
if(results.endsWith("\"")) {
results=results.substring(0,results.length()-1);
}
It works fine, but for some strings it throws an exception while creating the JSON object. How do I automatically unescape the escape characters in the result? I have searched for answers, but many of them say to use a third-party library. What is the best way to achieve this?
I think Apache Commons works pretty well. It has a StringEscapeUtils class with a bunch of static methods for escaping and unescaping strings, so I think you should check it out.
Good luck!
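If adding a library is not an option, a small hand-written unescaper is possible. One plausible cause of the failures in the question is that replacing only \" leaves other sequences such as \\ or \n untouched. The sketch below is a minimal version that assumes the input is a well-formed JSON-style quoted string; the class and method names are invented for illustration.

```java
final class JsonStringUnescape {
    /** Strips one pair of surrounding quotes, then resolves backslash escapes in one pass. */
    static String unquote(String raw) {
        String s = raw;
        if (s.length() >= 2 && s.charAt(0) == '"' && s.charAt(s.length() - 1) == '"') {
            s = s.substring(1, s.length() - 1);
        }
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 == s.length()) {
                out.append(c); // ordinary character, or a lone trailing backslash
                continue;
            }
            char next = s.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    // Assumption: exactly four hex digits follow, as JSON requires.
                    out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    out.append(next); // covers \" \\ and \/
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String fromService = "\"{\\\"name\\\":\\\"Overland Park\\\"}\"";
        System.out.println(unquote(fromService));
    }
}
```

Walking the string once matters: chained replace calls can misread an escaped backslash followed by n as a newline escape.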
Place this code below the array parsing:
// remove all <p>, </p> and <br /> tags by replacing them with ""
content = content.replace("<br />", "");
content = content.replace("<p>", "");
content = content.replace("</p>", "");
Here content is my own object; use your own variable in place of "content" as necessary.

Replace backslashes in a string using Java/Groovy

Trying to get a simple string replace to work using a Groovy script. Tried various things, including escaping strings in various ways, but can't figure it out.
String file ="C:\\Test\\Test1\\Test2\\Test3\\"
String afile = file.toString()
println "original string: " + afile
afile.replace("\\\\", "/")
afile.replaceAll("\\\\", "/")
println "replaced string: " + afile
This code results in:
original string: C:\Test\Test1\Test2\Test3\
replaced string: C:\Test\Test1\Test2\Test3\
----------------------------
The answer, as inspired by Sorrow, looks like this:
// first, replace each backslash with a forward slash
String afile = file.toString().replaceAll("\\\\", "/")
// then, collapse any doubled forward slashes into one
String fixed = afile.replaceAll("//", "/")
replace returns a different string. In Java Strings cannot be modified, so you need to assign the result of replacing to something, and print that out.
String other = afile.replaceAll("\\\\", "/")
println "replaced string: " + other
Edited: as Neftas pointed out in the comment, \ is a special character in regex and thus has to be escaped twice.
In Groovy you can't even write \\ - it is "an unsupported escape sequence". So, all answers I see here are incorrect.
If you mean one backslash, you should write \\\\. So, changing backslashes to normal slashes will look as:
scriptPath = scriptPath.replaceAll("\\\\", "/")
If you want to replace pairs of backslashes, you need to double the effort:
scriptPath = scriptPath.replaceAll("\\\\\\\\", "/")
Those lines are successfully used in the Gradle/Groovy script I have intentionally launched just now once more - just to be sure.
What is even more funny, to show these necessary eight backslashes "\\\\\\\\" in the normal text here on StackOverflow, I have to use sixteen of them! Sorry, I won't show you these sixteen, for I would need 32! And it will never end...
If you're working with paths, you're better off using the java.io.File object. It will automatically convert the given path to the correct operating-system dependent path.
For example, (on Windows):
String path = "C:\\Test\\Test1\\Test2\\Test3\\";
// Prints C:\Test\Test1\Test2\Test3
System.out.println(new File(path).getAbsolutePath());
path = "/Test/Test1/Test2/Test3/";
// Prints C:\Test\Test1\Test2\Test3
System.out.println(new File(path).getAbsolutePath());
1) afile.replace(...) doesn't modify the string you're calling it on, it just returns a new string.
2) The input strings (String file ="C:\\Test\\Test1\\Test2\\Test3\\";), from Java's perspective, only contain single backslashes. The first backslash is the escape character, then the second backslash tells it that you actually want a backslash.
so
afile.replace("\\\\", "/");
afile.replaceAll("\\\\", "/");
should be...
afile = afile.replace("\\", "/");
afile = afile.replaceAll("\\\\", "/"); // replaceAll takes a regex, so the backslash must be escaped twice
In Groovy you can use regex in this way as well:
afile = afile.replaceAll(/(\\)/, "/")
println("replaced string: "+ afile)
Note that (as Sorrow said) replaceAll returns the result; it doesn't modify the string. So you need to assign it to a variable before printing.
A String object is immutable, so any method that appears to modify it actually returns a new String object. You therefore need to store the result returned by replaceAll() in a String variable.
As found here, the best candidate might be the static Matcher method:
Matcher.quoteReplacement( ... )
According to my experiments this doubles single backslashes. Despite the method name... and despite the slightly cryptic Javadoc: "Slashes ('\') and dollar signs ('$') will be given no special meaning"
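The points made in this thread can be checked with a short plain-Java sketch: replace works on literal text, replaceAll takes a regex (so one backslash needs four characters in source), neither modifies the original string, and Matcher.quoteReplacement escapes backslashes and dollar signs for use in a replacement string. The class and method names are illustrative only.

```java
import java.util.regex.Matcher;

final class BackslashReplace {
    /** Literal replacement: no regex involved, so a single escaped backslash is enough. */
    static String toForwardSlashes(String path) {
        return path.replace('\\', '/');
    }

    /** Regex replacement: the pattern \\ (one literal backslash) is written "\\\\" in source. */
    static String toForwardSlashesRegex(String path) {
        return path.replaceAll("\\\\", "/");
    }

    public static void main(String[] args) {
        String path = "C:\\Test\\Test1\\Test2\\Test3\\";
        path.replace('\\', '/');                         // result discarded: path is unchanged
        System.out.println(path);
        System.out.println(toForwardSlashes(path));      // the result must be captured and used
        System.out.println(toForwardSlashesRegex(path));
        System.out.println(Matcher.quoteReplacement("a\\b$"));
    }
}
```

For a single character, replace(char, char) is the simplest choice because it avoids regex escaping entirely.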

Adding string inside an XML Tag in Java?

String tagIdentifier=12;
String sMyXML = "";
sMyXML += "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
sMyXML += "<Tag>";
sMyXML += "<Header>";
sMyXML += "<TagIdentifier></TagIdentifier>";....
Here I want to use the String value tagIdentifier inside the XML:
String tagIdentifier=12;
How can I add this string inside the tags?
<TagIdentifier>12</TagIdentifier>
sMyXML += "<TagIdentifier>" + tagIdentifier + "</TagIdentifier>";
but of course,
String tagIdentifier = 12;
isn't valid in the first place.
Building up XML as strings is a headache though. For all but the most trivial applications, I would recommend an XML library of which there are several - I'm using dom4j at the moment.
You can build the String yourself as mjg123 wrote (for a large XML string I recommend StringBuilder rather than concatenating plain strings), or you can use an API for building XML, for example dom4j.
It depends on how general you want it to be. If you want it to be really generic, you can use an XML parser; if you want it to work without much effort on your side, you can write
"<TagIdentifier>"+tagIdentifier+"</TagIdentifier>".
However, in your code I would also change
String tagIdentifier=12;
to
String tagIdentifier = String.valueOf(12);
String.valueOf produces the String that the XML needs, and it keeps working when you later replace the literal 12 with a variable, e.g. a number the user entered.
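As an alternative to dom4j, the DOM and transformer APIs that ship with the JDK can build the same structure and take care of escaping. The sketch below uses the element names from the question; the class and method names are made up, and the exact serialized form (for example the XML declaration) may vary between JDK versions.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class TagXmlBuilder {
    /** Builds <Tag><Header><TagIdentifier>value</TagIdentifier></Header></Tag> as a string. */
    static String buildTagXml(String tagIdentifier) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element tag = doc.createElement("Tag");
            doc.appendChild(tag);
            Element header = doc.createElement("Header");
            tag.appendChild(header);
            Element id = doc.createElement("TagIdentifier");
            id.setTextContent(tagIdentifier); // special characters are escaped on output
            header.appendChild(id);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Parser or transformer misconfiguration; not expected with JDK defaults.
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildTagXml(String.valueOf(12)));
    }
}
```

Compared with string concatenation, this is more verbose, but a value containing < or & cannot break the document.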

What is the most efficient way to format UTF-8 strings in java?

I am doing the following:
String url = String.format(WEBSERVICE_WITH_CITYSTATE, cityName, stateName);
String urlUtf8 = new String(url.getBytes(), "UTF8");
Log.d(TAG, "URL: [" + urlUtf8 + "]");
Reader reader = WebService.queryApi(url);
The output that I am looking for is essentially to get the city name with blanks (e.g., "Overland Park") to be formatted as Overland%20Park.
Is this the best way?
Assuming you actually want to encode your string for use in a URL (i.e., "Overland Park" can also be formatted as "Overland+Park"), you want URLEncoder.encode(url, "UTF-8"). Other unsafe characters will be converted to the %xx format you are asking for.
The simple answer is to use URLEncoder.encode(...) as stated by @Recurse. However, if part or all of the URL has already been encoded, then this can lead to double encoding. For example:
http://foo.com/pages/Hello%20There
or
http://foo.com/query?keyword=what%3f
Another concern with URLEncoder.encode(...) is that it doesn't understand that certain characters should be escaped in some contexts and not others. So for example, a '?' in a query parameter should be escaped, but the '?' that marks the start of the "query part" should not be escaped.
I think a safer way to add missing escapes would be the following:
String safeURI = new URI(url).toASCIIString();
However, I haven't tested this ...
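A caveat on the last suggestion: the single-argument URI constructor rejects input that contains an unescaped space with a URISyntaxException, so it may not help with "Overland Park" as is. The multi-argument constructors do quote illegal characters, and URLEncoder can be applied to one value at a time instead of the whole URL. A minimal sketch, with hypothetical helper names:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

final class UrlEscaping {
    /** Encodes one query or path value; URLEncoder emits '+' for spaces, so map it to %20. */
    static String encodeValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Lets the multi-argument URI constructor quote illegal characters in the path. */
    static String buildUrl(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URI components", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeValue("Overland Park"));
        System.out.println(buildUrl("http", "foo.com", "/pages/Hello There"));
    }
}
```

Encoding each value before inserting it into the format string avoids escaping the separators (? & = /) that belong to the URL itself.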
