Can XmlPullParser be configured to NOT ignore whitespace? - java

I'm using XmlPullParser to parse some custom XML. I open the XML file like this...
XmlPullParser xpp = activity.getResources().getXml(R.xml.myXML);
And later I read the following XML node
<L> ####</L>
using the code
String str = "";
if( xpp.next() == XmlPullParser.TEXT )
str = xpp.getText();
return str;
Instead of returning
' ####'
I get
'####'
(single quotes added by me for clarity.) NOTE The missing leading space.
It appears getText is stripping the leading space? When the XML doesn't contain a leading space, my code works as expected.
I can't find any property of XmlPullParser that lets me tell it to keep all whitespace. Nor can I change the XML to add double quotes around the text with leading whitespace.

XmlPullParser.next() and XmlPullParser.getText() can return the content in several pieces, in an unpredictable way. In your case, the leading space may be returned as a separate first piece and then silently dropped if your program keeps calling xpp.next() without concatenating the pieces. The loop should look more like this:
String str = "";
while (xpp.next() == XmlPullParser.TEXT) {
str += xpp.getText();
}
return str;

Related

Regex: extract String from String

I need a regex that makes it possible to extract part of a String. I get this String by parsing an XML document with DOM. I then look for the "§regex" part in this String and try to extract its value, e.g. "([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})", from the rest.
The problem is that I don't know how to make sure the extracted part ends with a ")".
The regex needs to work for every value given. The goal is to write only the value in brackets after "§regex=", including the brackets, into a String.
<UML:TaggedValue tag="description" value=" random Text §regex=([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3}) random text"/>
private List<String> findRegex() {
List<String> forReturn = new ArrayList<String>();
for (String str : attDescription) {
if (str.contains("§regex=")) {
String s = str.replaceAll(regex);
forReturn.add(s);
}
}
return forReturn;
}
attDescription is a list containing all attributes found in the parsed XML document.
So far I tried this regex: ".*(§regex=)(.*)[)$].*", "$2" but this cuts off the ")" and does not delete the text in front of the searched part. Even with the help of http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html I really don't understand how to get what I need.
It seems to work for me (with this example anyway) if I use this in place of String s = str.replaceAll(regex);
String s = str.replaceAll( ".*§regex=(\\(.*\\)).*", "$1" );
It's just looking for a substring enclosed by parentheses following §regex=.
This seems to work:
String s = str.replaceAll(".*§regex=\\((.*)[)].*", "$1");
Note:
Escape the leading bracket
The $ inside a character class is a literal $ - ignore it, because your regex should always end with a bracket
No need to capture the fixed text
Test code, noting that this works with brackets in/around the regex:
String str = "random Text §regex=(([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})) random text";
String s = str.replaceAll(".*§regex=\\((.*)[)].*", "$1");
System.out.println(s);
Output:
([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})
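For comparison, here is a minimal sketch of the same extraction using Pattern and Matcher.find() instead of replaceAll, which avoids the leading and trailing .* and returns null when the marker is missing. The class name RegexTagExtractor and the method extractAfterMarker are made up for illustration, the section sign is written as \u00A7 only to keep the source ASCII-safe, and the sample value is a simplified version of the one in the question (no umlauts). Unlike the second answer, it keeps the outer brackets, which is what the question asks for.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTagExtractor {
    // "\u00A7" is the section sign; the marker text itself comes from the question.
    static final String MARKER = "\u00A7regex=";

    // Greedy ".*" runs to the LAST ')' in the input, so nested brackets are kept,
    // but a ')' in the trailing text would be swallowed as well.
    private static final Pattern BRACKETED =
            Pattern.compile(Pattern.quote(MARKER) + "(\\(.*\\))");

    // Hypothetical helper: returns the bracketed value after the marker, or null.
    static String extractAfterMarker(String description) {
        Matcher m = BRACKETED.matcher(description);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String value = " random Text " + MARKER
                + "([A-Z]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3}) random text";
        System.out.println(extractAfterMarker(value));
        // prints ([A-Z]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})
    }
}
```

If the free text after the value can itself contain a ")", the greedy match will capture too much, and some other end marker would be needed.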

How do I check if a string contains at least one character from another string?

I'm using Java 6. I have a string that contains "special" characters -- "!##$%^&*()_". How do I write a Java expression to check if another string, "password", contains at least one of the characters defined in the first string? I have
regForm.getPassword().matches(".*[\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)\\_\\+].*")
but I don't want to hard-code the special characters, rather load them into a string from a properties file. So I'm having trouble figuring out how to escape everything properly after I load it from the file.
You can try creating regex from string that contains special characters and escape symbols using Pattern.quote. Try this:
String special = "!##$%^&*()_";
String pattern = ".*[" + Pattern.quote(special) + "].*";
regForm.getPassword().matches(pattern);
I think simply looping over the special characters and checking each one might work better, and it will work in all cases:
String special = "!##$%^&*()_";
boolean found = false;
for (int i=0; i<special.length(); i++) {
if (regForm.getPassword().indexOf(special.charAt(i)) >= 0) { // indexOf returns 0 for a match at the first position
found = true;
break;
}
}
if (found) { // password has one of the allowed characters
//...
//...
}
One option is to use StringTokenizer, and see if it returns more than 1 substring. It has a constructor that allows specifying the characters to split by.
Anyway, my favourite option would be just iterating the characters and using String.indexOf.
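To make the two main suggestions above concrete, here is a small sketch that wraps each in a method. The class and method names (SpecialCharCheck, containsAny, containsAnyRegex) are invented for this example, and the hard-coded specials string is only a stand-in for whatever gets loaded from the properties file. The regex variant uses find() rather than matches(), so no surrounding .* is needed.

```java
import java.util.regex.Pattern;

class SpecialCharCheck {
    // Loop variant: no regex involved, so nothing needs escaping.
    static boolean containsAny(String text, String specials) {
        for (int i = 0; i < specials.length(); i++) {
            // indexOf returns -1 when absent and 0 for a hit at the first position
            if (text.indexOf(specials.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Regex variant: Pattern.quote wraps the characters in \Q...\E so that
    // regex metacharacters among them are treated literally inside the class.
    static boolean containsAnyRegex(String text, String specials) {
        Pattern p = Pattern.compile("[" + Pattern.quote(specials) + "]");
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String specials = "!##$%^&*()_"; // stand-in for the value from the properties file
        System.out.println(containsAny("!secret", specials));      // true
        System.out.println(containsAnyRegex("secret", specials));  // false
    }
}
```

Both variants assume specials is non-empty; an empty string would make the loop return false and the regex variant fail to compile.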

Removing a substring between two characters (java)

I have a java string such as this:
String string = "I <strong>really</strong> want to get rid of the strong-tags!";
And I want to remove the tags. I have some other strings where the tags are way longer, so I'd like to find a way to remove everything between "<>" characters, including those characters.
One way would be to use the built-in string method that compares the string to a regEx, but I have no idea how to write those.
Caution is advised when using regex to parse HTML (due to its allowable complexity), however for "simple" HTML, and simple text (text without literal < or > in it) this will work:
String stripped = html.replaceAll("<.*?>", "");
To avoid Regex:
String toRemove = StringUtils.substringBetween(string, "<", ">");
String result = StringUtils.remove(string, "<" + toRemove + ">");
For multiple instances:
String[] allToRemove = StringUtils.substringsBetween(string, "<", ">");
String result = string;
if (allToRemove != null) { // substringsBetween returns null when nothing matches
for (String toRemove : allToRemove) {
result = StringUtils.remove(result, "<" + toRemove + ">");
}
}
Apache StringUtils functions are null-, empty-, and no-match-safe.
You should use
String stripped = html.replaceAll("<[^>]*>", "");
or
String stripped = html.replaceAll("<[^<>]*>", "");
where <[^>]*> matches substrings starting with <, then zero or more chars other than > (or the chars other than < and > if you choose the second version) and then a > char.
Note that <.*?>
is less efficient than a negated character class (see Which would be better non-greedy regex or negated character class?)
does not find substrings spanning across multiple lines (see How do I match any character across multiple lines in a regular expression?), but it can be solved with (?s)<.*?>, <(?s:.)*?>, <[\w\W]*?>, and many other not-so-efficient variations.
See the regex demo.
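A quick way to see the multi-line difference described above is to run both patterns on a tag that contains a line break. This is only a sketch; TagStripper and its two methods are names made up for the example, and the usual caveat about regex and real-world HTML still applies.

```java
class TagStripper {
    // Negated character class: '<', then any run of characters except '>', then '>'.
    // [^>] also matches line breaks, so tags that span lines are removed too.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    // Lazy-dot variant from the earlier answer; '.' stops at line breaks by default.
    static String stripTagsLazy(String html) {
        return html.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        String single = "I <strong>really</strong> want to get rid of the strong-tags!";
        System.out.println(stripTags(single));
        // prints: I really want to get rid of the strong-tags!

        String multiLine = "<a\nhref=\"x\">link</a>";
        System.out.println(stripTags(multiLine));      // prints: link
        System.out.println(stripTagsLazy(multiLine));  // leaves the opening tag in place
    }
}
```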

reading line in bufferedReader

From the javadoc
public String readLine()
throws IOException
Read a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
I have following kind of text :
Now the earth was formless and empty. Darkness was on the surface
of the deep. God's Spirit was hovering over the surface
of the waters.
I am reading lines as:
while(buffer.readLine() != null){
}
But the problem is that this treats everything up to a newline as one line. I would like a line to end wherever the text has a period (.) instead. How would I do it?
You can use a Scanner and set your own delimiter using useDelimiter(Pattern).
Note that the input delimiter is a regex, so you will need to provide the regex \. (you need to break the special meaning of the character . in regex)
You can read a character at a time, and copy the data to a StringBuilder
Reader reader = ...;
StringBuilder sb = new StringBuilder();
int ch;
while((ch = reader.read()) >= 0) {
if(ch == '.') break;
sb.append((char) ch);
}
Use a java.util.Scanner instead of a buffered reader, and set the delimiter to "\\." with Scanner.useDelimiter().
(but be aware that the delimiter is consumed, so you'll have to add it again!)
or read the raw string and split it on each .
You could split the whole text by every .:
String text = "Your test.";
String[] lines = text.split("\\.");
After you split the text you get an array of lines. You could also use a regex if you want more control, e.g. to split the text also by : or ;. Just google it.
PS.: Perhaps you have to remove the new line characters first with something like:
text = text.replaceAll("\n", "");
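Putting the Scanner suggestion together, a rough sketch could look like the following. SentenceReader and readSentences are hypothetical names, and the choice to collapse line breaks into single spaces and to re-append the consumed "." is an assumption about what the asker wants. It treats every period as a sentence end, so abbreviations such as "e.g." would be split as well.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SentenceReader {
    // Reads period-terminated "lines" from any Readable (a BufferedReader works too).
    static List<String> readSentences(Readable source) {
        List<String> sentences = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            scanner.useDelimiter("\\."); // the delimiter is a regex, so the dot is escaped
            while (scanner.hasNext()) {
                // Collapse line breaks and runs of spaces, then drop outer whitespace.
                String sentence = scanner.next().replaceAll("\\s+", " ").trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence + "."); // the delimiter was consumed, add it back
                }
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Now the earth was formless and empty. Darkness was on the surface\n"
                + "of the deep. God's Spirit was hovering over the surface\n"
                + "of the waters.\n";
        for (String sentence : readSentences(new StringReader(text))) {
            System.out.println(sentence);
        }
    }
}
```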

Remove end of line characters from end of Java String

I have a string, and I'd like to remove the end-of-line characters from the very end of the string only, using Java:
"foo\r\nbar\r\nhello\r\nworld\r\n"
which I'd like to become
"foo\r\nbar\r\nhello\r\nworld"
(This question is similar to, but not the same as question 593671)
You can use s = s.replaceAll("[\r\n]+$", "");. This trims the \r and \n characters at the end of the string
The regex is explained as follows:
[\r\n] is a character class containing \r and \n
+ is one-or-more repetition of the preceding element
$ is the end-of-string anchor
References
regular-expressions.info/Anchors, Character Class, Repetition
Related topics
You can also use String.trim() to trim any whitespace characters from the beginning and end of the string:
s = s.trim();
If you need to check if a String contains nothing but whitespace characters, you can check if it isEmpty() after trim():
if (s.trim().isEmpty()) {
//...
}
Alternatively you can also see if it matches("\\s*"), i.e. zero-or-more of whitespace characters. Note that in Java, the regex matches tries to match the whole string. In flavors that can match a substring, you need to anchor the pattern, so it's ^\s*$.
Related questions
regex, check if a line is blank or not
how to replace 2 or more spaces with single space in string and delete leading spaces only
Wouldn't String.trim do the trick here?
i.e. you'd call the method .trim() on your string and it should return a copy of that string minus any leading or trailing whitespace.
The Apache Commons Lang StringUtils.stripEnd(String str, String stripChars) will do the trick; e.g.
String trimmed = StringUtils.stripEnd(someString, "\n\r");
If you want to remove all whitespace at the end of the String:
String trimmed = StringUtils.stripEnd(someString, null);
Well, everyone gave some way to do it with regex, so I'll give the fastest way possible instead:
public String replace(String val) {
for (int i=val.length()-1;i>=0;i--) {
char c = val.charAt(i);
if (c != '\n' && c != '\r') {
return val.substring(0, i+1);
}
}
return "";
}
Benchmark says it operates ~45 times faster than regexp solutions.
If you have Google's guava-libraries in your project (if not, you arguably should!) you'd do this with a CharMatcher:
String result = CharMatcher.anyOf("\r\n").trimTrailingFrom(input);
String text = "foo\r\nbar\r\nhello\r\nworld\r\n";
String result = text.replaceAll("[\r\n]+$", "");
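As a sanity check on the approaches above, this sketch compares a regex-free loop with the "[\r\n]+$" replacement and shows where trim() differs. TrailingLineBreaks and stripTrailingLineBreaks are illustrative names only, and the two leading spaces in the sample are added here just to make the trim() difference visible.

```java
class TrailingLineBreaks {
    // Walks back from the end while the last character is '\r' or '\n'.
    static String stripTrailingLineBreaks(String s) {
        int end = s.length();
        while (end > 0) {
            char c = s.charAt(end - 1);
            if (c != '\n' && c != '\r') {
                break;
            }
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String text = "  foo\r\nbar\r\nhello\r\nworld\r\n";
        String viaLoop = stripTrailingLineBreaks(text);
        String viaRegex = text.replaceAll("[\r\n]+$", "");
        System.out.println(viaLoop.equals(viaRegex));                          // true
        System.out.println(viaLoop.equals("  foo\r\nbar\r\nhello\r\nworld"));  // true
        System.out.println(text.trim().startsWith("foo"));  // true: trim() also drops the leading spaces
    }
}
```

So trim() is only equivalent when the string has no leading whitespace and no trailing spaces or tabs that should be kept.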
"foo\r\nbar\r\nhello\r\nworld\r\n".replaceAll("\\s+$", "")
or
"foo\r\nbar\r\nhello\r\nworld\r\n".replaceAll("[\r\n]+$", "")
